Python Web Scraping - Beautiful Soup

Kenneth Law
Aug 21, 2021

In this story I record what I have learned about web scraping. I want to apply it to scraping some Hong Kong news websites, because I used to work in advertising and that area is familiar to me. This tutorial scrapes timesjobs.com.

Here are the points I think are important.

Read HTML

The first step is to parse the HTML text with Beautiful Soup, using 'lxml' as the parser.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_text, 'lxml')
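The line above needs html_text to come from somewhere. Here is a minimal sketch of the parsing step. The inline HTML string is a made-up stand-in for the downloaded page, and the fallback to Python's built-in parser is my own addition so the snippet still runs when lxml is not installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source. In a real run, html_text would
# be the body of an HTTP response for the page you want to scrape.
html_text = """
<html><body>
  <h1>Job listings</h1>
  <p>Sample page</p>
</body></html>
"""

# Prefer lxml when it is installed; otherwise use the built-in parser.
try:
    import lxml  # noqa: F401
    parser = "lxml"
except ImportError:
    parser = "html.parser"

soup = BeautifulSoup(html_text, parser)

# Quick check that the document was parsed: read the first <h1>.
heading = soup.h1.text
print(heading)
```

Once soup exists, every later step is just a search inside it.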

Job posts

Several job posts from timesjobs.com appear on the page. Let's inspect them.

Each job post sits in an <li> tag with a class. We use soup.find_all to scrape every job post.

jobs = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')
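To see what find_all returns, here is a small sketch on a made-up HTML fragment. The job titles and the extra "ad-banner" item are invented for illustration, and html.parser is used only so the snippet runs without extra installs.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two job posts and one unrelated list item.
html_text = """
<ul>
  <li class="clearfix job-bx wht-shd-bx"><h2>Python Developer</h2></li>
  <li class="clearfix job-bx wht-shd-bx"><h2>Data Analyst</h2></li>
  <li class="ad-banner">Sponsored</li>
</ul>
"""
soup = BeautifulSoup(html_text, "html.parser")

# find_all returns a list-like result with every matching <li>.
jobs = soup.find_all("li", class_="clearfix job-bx wht-shd-bx")
titles = [job.h2.text for job in jobs]
print(len(jobs), titles)

# A CSS selector is an alternative that matches each class separately.
jobs_by_selector = soup.select("li.clearfix.job-bx.wht-shd-bx")
print(len(jobs_by_selector))
```

Passing several classes as one string to class_ matches the class attribute as an exact string, so the order of the classes matters. The select call does not depend on the order, which may be safer if the site shuffles its class names.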

Scraping company name

For each job in jobs, the company name is inside an <h3> tag with the class joblist-comp-name.

company_name = job.find('h3', class_='joblist-comp-name').text.replace(' ', '')
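One thing to watch: replace(' ', '') removes every space, including the ones between words, and it leaves line breaks in place. The sketch below compares it with strip() on a made-up job post. The company name "Acme Software" is invented, and strip() is my suggested alternative, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical job post with padding around the company name.
html_text = (
    '<li class="clearfix job-bx wht-shd-bx">'
    '<h3 class="joblist-comp-name">\n   Acme Software \n</h3>'
    '</li>'
)
soup = BeautifulSoup(html_text, "html.parser")
job = soup.find("li", class_="clearfix job-bx wht-shd-bx")

raw_name = job.find("h3", class_="joblist-comp-name").text

# Approach used in this story: drop every space character.
no_spaces = raw_name.replace(" ", "")

# Alternative: trim only the leading and trailing whitespace.
stripped = raw_name.strip()

print(repr(no_spaces))
print(repr(stripped))
```

If the company names you scrape have more than one word, strip() keeps them readable.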

Scraping more info

Inside the <h2> tag there is a link. It points to the detail page of the job post, so clicking it shows more details.

The <a> tag is inside the <h2>. Its href attribute (href stands for hypertext reference) holds the URL.


more_info = job.header.h2.a['href']
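Here is a small sketch of reading the link on a made-up job post. The example.com URL is a placeholder I invented. It also shows .get('href'), which returns None instead of raising an error when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical job post; the URL is a placeholder, not a real listing.
html_text = """
<li class="clearfix job-bx wht-shd-bx">
  <header>
    <h2><a href="https://example.com/job/123">Python Developer</a></h2>
  </header>
</li>
"""
soup = BeautifulSoup(html_text, "html.parser")
job = soup.find("li", class_="clearfix job-bx wht-shd-bx")

# Walk down the tags: <header> -> <h2> -> <a>, then read the attribute.
more_info = job.header.h2.a["href"]

# .get() behaves like dict.get: a missing attribute gives None.
missing = job.header.h2.a.get("target")

print(more_info, missing)
```

The dotted access (job.header.h2.a) always takes the first matching tag at each step, which is enough here because each job post has one title link.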

Final code
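The steps above can be put together roughly as follows. This is a sketch under my own assumptions, not the exact final script: the function name parse_jobs, the sample HTML, and the example.com links are all invented for illustration, and it parses an inline string instead of downloading the live page.

```python
from bs4 import BeautifulSoup


def parse_jobs(html_text, parser="html.parser"):
    """Return one dict per job post found in html_text.

    parser defaults to the built-in one; pass "lxml" if it is installed.
    """
    soup = BeautifulSoup(html_text, parser)
    results = []
    for job in soup.find_all("li", class_="clearfix job-bx wht-shd-bx"):
        # Company name: same lookup as in the steps above.
        company_name = job.find("h3", class_="joblist-comp-name").text.replace(" ", "")
        # Link to the job details page.
        more_info = job.header.h2.a["href"]
        results.append({"company": company_name.strip(), "more_info": more_info})
    return results


# Hypothetical sample page with two job posts.
sample_html = """
<ul>
  <li class="clearfix job-bx wht-shd-bx">
    <header><h2><a href="https://example.com/jobs/1">Python Developer</a></h2></header>
    <h3 class="joblist-comp-name"> Acme </h3>
  </li>
  <li class="clearfix job-bx wht-shd-bx">
    <header><h2><a href="https://example.com/jobs/2">Data Analyst</a></h2></header>
    <h3 class="joblist-comp-name"> Globex </h3>
  </li>
</ul>
"""

rows = parse_jobs(sample_html)
for row in rows:
    print(f"Company: {row['company']}")
    print(f"More info: {row['more_info']}")
```

To run it against a real page, html_text would be the downloaded page source rather than the sample string.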
