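---
tags: Python, Scrawler
---

# Scraping the Douban Music Chart with Python requests + lxml

[](https://hackmd.io/GFh4gerYSPySSUqrGQdI4Q)

1. Import the libraries

```python
import requests
from lxml import etree
```

2. Download the page and decode the HTML. Define the headers before the request; they send a browser `User-Agent`, because Douban may reject requests that use the default one.

```python
# headers must exist before it is passed to requests.get()
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
res = requests.get('https://music.douban.com/chart', headers=headers)
content = res.content.decode()  # raw bytes -> str (UTF-8)
```

3. Convert the HTML into an lxml element tree

```python
html = etree.HTML(content)
```

4. Copy the XPath from the web page:

> F12 > right-click the content you want > Copy > Copy XPath

5. Paste it into `html.xpath()` and append `/text()` to get the text inside the tag (the song titles)

```python
main_data = html.xpath('//*[@id="content"]/div/div[1]/div/ul/li/div//a/text()')
```

6. Get the artist and play count, drop the empty lines, and split the artist from the play count

```python
raw = html.xpath('//*[@id="content"]/div/div[1]/div/ul/li/div/p/text()')
describe = [s.split('\xa0/\xa0') for s in (t.strip() for t in raw) if s]  # each item: [artist, play count]
```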
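Steps 5 and 6 return two separate page-wide lists, which only line up if every chart entry contributes exactly one title and one non-empty description line. The sketch below shows one alternative under the same assumptions as the XPaths above. It selects each `li` once, then runs relative XPaths inside it, so each title stays paired with its own artist and play count. This per-entry approach is swapped in for illustration and is not the original author's method. The names `parse_chart`, `fetch_chart`, `ENTRY_XPATH` and `USER_AGENT` are hypothetical. The assumptions that the first link in an entry is the song title and that artist and play count are separated by `\xa0/\xa0` come from steps 5 and 6, not from any documented Douban page structure.

```python
from __future__ import annotations

from lxml import etree

CHART_URL = 'https://music.douban.com/chart'
# Assumption: entry path is the common prefix of the XPaths in steps 5 and 6;
# it only holds while Douban keeps the same page layout.
ENTRY_XPATH = '//*[@id="content"]/div/div[1]/div/ul/li'
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36')


def parse_chart(content: str) -> list[dict]:
    """Turn chart HTML into one {'title', 'artist', 'plays'} dict per entry."""
    html = etree.HTML(content)
    if html is None:  # defensive: no root element was parsed
        return []
    records = []
    for li in html.xpath(ENTRY_XPATH):
        # Relative XPaths search only inside this <li>, so the title and the
        # "artist / plays" line always come from the same entry.
        titles = [t.strip() for t in li.xpath('div//a/text()') if t.strip()]
        info = ''.join(li.xpath('div/p/text()')).strip()
        if not titles or not info:
            continue  # entry does not match the assumed layout; skip it
        # Same separator as step 6: a slash padded with non-breaking spaces
        artist, _, plays = info.partition('\xa0/\xa0')
        records.append({'title': titles[0],
                        'artist': artist.strip(),
                        'plays': plays.strip()})
    return records


def fetch_chart() -> list[dict]:
    """Download the live chart page and parse it (requires network access)."""
    import requests  # imported lazily so parse_chart() works without it

    res = requests.get(CHART_URL, headers={'User-Agent': USER_AGENT}, timeout=10)
    res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return parse_chart(res.content.decode())
```

With this split, `parse_chart()` can be tried on a saved copy of the page without contacting Douban. Only `fetch_chart()` needs the network. The `timeout` and `raise_for_status()` calls are additions not present in the steps above. They make a blocked or failed request fail loudly instead of quietly producing an empty list.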