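{%hackmd @yun-cheng/theme %}

# Python BeautifulSoup Notes

###### tags: `Python` `BeautifulSoup` `Notes`

[TOC]

## Getting started

```python=
import requests
from bs4 import BeautifulSoup as bs

url = ''  # fill in the target page URL
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}

# Reuse one session so cookies and connections persist across requests
session = requests.session()
page = session.get(url, headers=headers)
page_source = bs(page.text, 'html.parser')
```

## General filtering

### Using `find_all`

> The filters below can be combined in a single call.

```python=
import re

# Tags whose class is "A"
class_A_tags = page_source.find_all(class_='A')
# Tags whose id contains "ABC"
id_ABC_tags = page_source.find_all(id=re.compile('ABC'))
# All <div> tags
tag_div_tags = page_source.find_all('div')
```

### Using `select`

`select` takes a CSS selector and returns a list of all matches. It may be faster than `find` in some cases, but that depends on the query and parser; for the differences between the two, see
https://stackoverflow.com/questions/38028384/beautifulsoup-difference-between-find-and-select

```python=
tags_1 = page_source.select('#um > p:nth-child(2) > strong > a')
# select() returns a list, so run a nested query on one element, not on the list
tags_2 = tags_1[0].select('.title a')
```

### Using `xpath`

XPath queries go through `lxml` rather than BeautifulSoup.

```python=
from lxml import etree

# Parse the raw response bytes from the request above
tree = etree.HTML(page.content)
tag = tree.xpath('//*[@class="Info"]/tbody/tr[2]/td[2]/a/span')
```

## Special filtering

> Use `find_next_sibling` to find the first matching sibling that comes after a given tag (`find_next_siblings` returns all of them).

```python=
class_B_tag_after_tag_A = tag_A.find_next_sibling(class_='B')
```

## Reading content

```python=
# Text of the first matched tag
tag[0].text
```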
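
As a minimal sketch of how the filtering and reading steps above fit together, the snippet below runs them against a small inline HTML string instead of a live page. The HTML, the `SAMPLE_HTML` name, and the variable names are made up for illustration; only `find_all`, `select`, `find_next_sibling`, `find_next_siblings`, `get_text`, and attribute indexing are actual BeautifulSoup calls.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this sketch (not from a real site)
SAMPLE_HTML = """
<div id="um">
  <p class="A">first</p>
  <p class="B"><strong><a href="/one" class="title">Link one</a></strong></p>
  <p class="B" id="xABCx">second B</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all: filter by class, or by an id that contains "ABC"
class_a_tags = soup.find_all(class_="A")
id_abc_tags = soup.find_all(id=re.compile("ABC"))

# select: CSS selector, always returns a list
links = soup.select("#um > p:nth-child(2) > strong > a")

# Siblings after the first class "A" tag: first match vs. all matches
first_b_after_a = class_a_tags[0].find_next_sibling(class_="B")
all_b_after_a = class_a_tags[0].find_next_siblings(class_="B")

# Reading content: text and attributes of a matched tag
link_text = links[0].get_text(strip=True)
link_href = links[0]["href"]

print(link_text, link_href)
```

Because `select` and `find_all` return lists, index into the result (or loop over it) before reading `.text` or an attribute; `find_next_sibling` returns a single tag, or `None` when nothing matches.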