Web Crawler in Practice: Scraping Sports Event Photos

###### tags: `web crawler`,`web scraping`,`image download`

**Today we will scrape the photos from a sports event album.**

The site is here: <https://running.biji.co/index.php?q=album>

For the script below, be sure to open an individual album first. The page must show multiple photos (as in the second screenshot below) before you copy its URL.

![](https://i.imgur.com/mExSyJG.jpg)
![](https://i.imgur.com/cLvkTzW.jpg)

:bulb: **A first look at the page structure**

![](https://i.imgur.com/7z3TElQ.jpg)![](https://i.imgur.com/2dL3Gz2.jpg)

```python=
import os              # check for and create the output directory
import urllib.request  # save the images to disk
import requests
from bs4 import BeautifulSoup

# Example album URL:
# https://running.biji.co/index.php?q=album&act=photo_list&album_id=47168&cid=10113&type=album&subtitle=2022%20ELLE%20RUN%20WITH%20STYLE-5K%E6%8A%98%E8%BF%945
url = input("Enter the album URL: ")
rq1 = requests.get(url)
rq2 = BeautifulSoup(rq1.text, "html5lib")  # parse the page

title = rq2.find("h1", "album-title flex-1").text.strip()  # album title
imgs = rq2.find_all("img", "photo_img photo-img")          # all photo <img> tags

# Create a directory named after the album title to hold the images
imgDir = title + "/"
if not os.path.exists(imgDir):  # only download when the directory does not exist yet
    os.mkdir(imgDir)
    # Process every <img> tag
    num = 0
    for image in imgs:
        # Read the src attribute (the full URL of the image file)
        Path = image.get("src")  # image["src"] also works, but raises KeyError if src is missing
        if Path is not None:
            imgName = Path.split('/')[-1]  # file name of the image
            # Save the image inside the album directory
            try:
                urllib.request.urlretrieve(Path, os.path.join(imgDir, imgName))
                num += 1
            except Exception:
                print("{} could not be downloaded!!".format(imgName))
    print("Downloaded", num, "images!!")
else:
    print("The album already exists!!")
```

**urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)**

* Copies the file at `url` into a directory on your computer
* `url` parameter: the address to download from
* `filename` parameter: the local file name to save to

Copy a network object denoted by a URL to a local file. If the URL points to a local file, the object will not be copied unless filename is supplied. Return a tuple (filename, headers) where filename is the local file name under which the object can be found, and headers is whatever the info() method of the object returned by urlopen() returned (for a remote object). Exceptions are the same as for urlopen().

**Result**

![](https://i.imgur.com/C9pK9xN.jpg)
![](https://i.imgur.com/bZb09pe.jpg)
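
:bulb: **A possible refinement of the file-naming step**

The script takes the image file name from the last `/`-separated piece of the `src` URL and passes it to `urlretrieve`. Here is a minimal sketch of that step on its own, using only the standard library. The helper names `local_path_for` and `download_images` are hypothetical and not part of the original script. The sketch also assumes every `src` value is an absolute URL, and that an image URL may carry a query string such as `?v=2`; neither has been checked against the site.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlsplit


def local_path_for(img_url, album_dir):
    """Build the local file path for an image URL inside album_dir.

    Hypothetical helper, not part of the original script.
    """
    # Use only the path part of the URL so that a query string
    # (for example "?v=2") does not end up in the file name.
    url_path = urlsplit(img_url).path
    img_name = posixpath.basename(url_path)
    if not img_name:
        raise ValueError("URL has no file name: {}".format(img_url))
    # os.path.join picks the right separator for the current OS.
    return os.path.join(album_dir, img_name)


def download_images(img_urls, album_dir):
    """Download each URL into album_dir and return how many were saved.

    Hypothetical helper; assumes every entry is an absolute URL.
    """
    os.makedirs(album_dir, exist_ok=True)
    saved = 0
    for img_url in img_urls:
        try:
            target = local_path_for(img_url, album_dir)
            # urlretrieve returns a (local_filename, headers) tuple.
            urllib.request.urlretrieve(img_url, target)
            saved += 1
        except (OSError, ValueError) as err:
            # URLError and HTTPError are subclasses of OSError.
            print("{} could not be downloaded: {}".format(img_url, err))
    return saved
```

With the main script, this could be called as `download_images([img.get("src") for img in imgs if img.get("src")], title)`. Catching `OSError` and `ValueError` instead of every exception means a `KeyboardInterrupt` still stops the run.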