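{%hackmd @yun-cheng/theme %}

# Python BeautifulSoup Notes

###### tags: `Python` `BeautifulSoup` `Notes`

[TOC]

## Getting Started

```python
import requests
from bs4 import BeautifulSoup as bs

url = ''  # URL of the page to scrape
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}

# Reuse one session so cookies persist across requests
session = requests.Session()
page = session.get(url, headers=headers)
page_source = bs(page.text, 'html.parser')
```

## General Filtering Methods

### Using `find_all`

> The filters can be combined in a single call.

```python
import re

# class equals "A"
class_A_tags = page_source.find_all(class_='A')
# id contains "ABC"
id_ABC_tags = page_source.find_all(id=re.compile('ABC'))
# all <div> tags
tag_div_tags = page_source.find_all('div')
# combined: <div> tags whose class is "A"
div_class_A_tags = page_source.find_all('div', class_='A')
```

### Using `select`

`select` takes a CSS selector and returns a list of every match. It is said to be faster than `find`, although the gap depends on the bs4 version and the selector, so measure before relying on it. For the differences between the two, see:
https://stackoverflow.com/questions/38028384/beautifulsoup-difference-between-find-and-select

```python
tags_1 = page_source.select('#um > p:nth-child(2) > strong > a')
# select returns a list, so search inside each element rather than the list itself
tags_2 = [t for tag in tags_1 for t in tag.select('.title a')]
```

### Using `xpath`

```python
from lxml import etree

# `page` is the response from Getting Started; lxml parses it into its own tree
tree = etree.HTML(page.content)
# Browsers insert <tbody> automatically; drop it from the path if the raw HTML lacks it
tag = tree.xpath('//*[@class="Info"]/tbody/tr[2]/td[2]/a/span')
```

## Special Filtering Methods

> Use `find_next_sibling` to get the first matching tag that follows a given tag at the same level. Use `find_next_siblings` to get all of them.

```python
# First following sibling of tag_A with class "B" (None if there is none)
class_B_tag_after_tag_A = tag_A.find_next_sibling(class_='B')
```

## Reading Content

```python
# Text
tag[0].text
```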
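
Besides `.text`, the other values usually wanted from a tag are its whitespace-stripped text and its attributes. The following is a minimal sketch of that, not part of the original notes: the HTML snippet and the variable names are made up for illustration, and it parses an inline string so that it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for illustration
html = """
<div id="um">
  <p class="title"><a href="/first">  First link </a></p>
  <p class="title"><a href="/second">Second link</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.select(".title a")  # list of matching <a> tags

first = links[0]
raw_text = first.text                    # text as written, whitespace included
clean_text = first.get_text(strip=True)  # text with surrounding whitespace removed
href = first["href"]                     # attribute value; raises KeyError if absent
missing = first.get("target")            # attribute value, or None if absent

# Collect one attribute from every match
all_hrefs = [a["href"] for a in links]
```

Indexing with `tag["href"]` suits attributes that must be present, while `tag.get(...)` is the safer choice when an attribute may be missing.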