Python
BeautifulSoup
Notes
import re
import requests
from bs4 import BeautifulSoup as bs
url = ''
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}
session = requests.Session()
page = session.get(url, headers=headers)
page_source = bs(page.text, 'html.parser')
find_all
The filters below can be combined in a single find_all call.
class_A_tags = page_source.find_all(class_='A')
# id contains ABC
id_ABC_tags = page_source.find_all(id=re.compile('ABC'))
tag_div_tags = page_source.find_all('div')
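As a small illustration of combining filters, the sketch below runs find_all on an inline HTML snippet. The snippet, its class names, and its ids are made up for this example and do not come from any real page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration
html = """
<div id="ABC1" class="A">first</div>
<div id="xABCx" class="B">second</div>
<p id="ABC2" class="A">third</p>
<div id="other" class="A">fourth</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# Tag name, class, and id pattern combined in one call:
# only <div> tags with class "A" whose id contains "ABC" match
combined = soup.find_all('div', class_='A', id=re.compile('ABC'))
combined_texts = [tag.get_text() for tag in combined]
print(combined_texts)
```

Each keyword narrows the result further, so a tag has to satisfy all of the filters to be returned.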
select
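select is reported to be faster than find, though this may depend on the bs4 version and the query. Note that select returns a list of all matches, while find returns only the first match.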
https://stackoverflow.com/questions/38028384/beautifulsoup-difference-between-find-and-select
tags_1 = page_source.select('#um > p:nth-child(2) > strong > a')
# select returns a list, so run the nested select on each matched tag
tags_2 = [a for tag in tags_1 for a in tag.select('.title a')]
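The following sketch shows what select and select_one return, using a made-up HTML snippet (the ids, classes, and hrefs are assumptions for illustration only). It assumes a bs4 version recent enough to support the nth-child pseudo-class.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration
html = """
<div id="um">
  <p>intro</p>
  <p><strong><a href="/x">target</a></strong></p>
</div>
<div class="title"><a href="/t1">t1</a><a href="/t2">t2</a></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# select always returns a list, even when only one tag matches
matches = soup.select('#um > p:nth-child(2) > strong > a')
match_texts = [a.get_text() for a in matches]

# select_one returns the first matching tag, or None if nothing matches
first_title_link = soup.select_one('.title a')

# A Tag (not the list) can be searched again with select
title_block = soup.select_one('div.title')
title_links = [a['href'] for a in title_block.select('a')]
print(match_texts, first_title_link['href'], title_links)
```

Because select returns a list, chaining a second select requires picking an element first (or looping), whereas select_one gives a Tag that can be queried directly.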
xpath
from lxml import etree
# Parse the response fetched above; a separate name keeps the BeautifulSoup object intact
tree = etree.HTML(page.content)
tag = tree.xpath('//*[@class="Info"]/tbody/tr[2]/td[2]/a/span')
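A minimal sketch of the same XPath query against an inline document, assuming lxml is installed. The table content is invented for the example. One thing worth keeping in mind: browsers insert tbody when rendering a table, so an XPath copied from browser dev tools may contain tbody even though the raw HTML does not, and the lxml HTML parser does not add it.

```python
from lxml import etree

# Hypothetical HTML used only for this illustration
html = b"""
<table class="Info">
  <tbody>
    <tr><td>h1</td><td>h2</td></tr>
    <tr><td>name</td><td><a href="/u"><span>value</span></a></td></tr>
  </tbody>
</table>
"""
tree = etree.HTML(html)

# xpath returns a list of matching elements
spans = tree.xpath('//*[@class="Info"]/tbody/tr[2]/td[2]/a/span')
span_texts = [s.text for s in spans]

# A variant that also works when the source has no <tbody>
spans_loose = tree.xpath('//*[@class="Info"]//tr[2]/td[2]/a/span')
loose_texts = [s.text for s in spans_loose]
print(span_texts, loose_texts)
```

If a query that works in the browser returns an empty list here, dropping tbody from the path (as in the second query) is a reasonable first thing to try.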
Use find_next_sibling to find the first matching sibling that comes after a given tag (find_next_siblings returns all of them).
class_B_tags_after_tag_A = tag_A.find_next_sibling(class_='B')
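The sketch below contrasts find_next_sibling with find_next_siblings on a made-up snippet. The class names and text are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration
html = (
    '<div>'
    '<p class="A">a</p>'
    '<p class="C">c</p>'
    '<p class="B">b1</p>'
    '<p class="B">b2</p>'
    '</div>'
)
soup = BeautifulSoup(html, 'html.parser')
tag_a = soup.find(class_='A')

# First following sibling with class "B" (non-matching siblings are skipped)
next_b = tag_a.find_next_sibling(class_='B')

# All following siblings with class "B"
all_b = tag_a.find_next_siblings(class_='B')
all_b_texts = [t.get_text() for t in all_b]
print(next_b.get_text(), all_b_texts)
```

find_next_sibling returns a single Tag or None, so the plural form is the one to use when several matches are expected.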
# text content
tag[0].text
Jul 31, 2020