Python Image Processing & Web Scraping

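## Fetching images from the NCKU website and processing them

### Taking this image as an example, start from a given image URL and try processing it.

https://www.ncku.edu.tw/var/file/0/1000/pictures/955/m/mczh-tw1920x800_small259803_196914831722.jpg

```python
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt


def download_image(url):
    """Download an image and return it as a PIL Image."""
    response = requests.get(url)  # fetch the content at the URL
    response.raise_for_status()  # stop early on an HTTP error
    # BytesIO wraps the raw bytes in a file-like object that PIL can open
    image = Image.open(BytesIO(response.content))
    return image


def concatenate_images(image1, image2):
    """Place two images side by side."""
    total_width = image1.width + image2.width  # add the widths (object attributes)
    max_height = max(image1.height, image2.height)  # use the taller of the two heights
    new_image = Image.new('RGB', (total_width, max_height))
    new_image.paste(image1, (0, 0))
    new_image.paste(image2, (image1.width, 0))
    return new_image


# Image URL
image_url = "https://www.ncku.edu.tw/var/file/0/1000/pictures/955/m/mczh-tw1920x800_small259803_196914831722.jpg"
image = download_image(image_url)  # store the downloaded image in a variable
gray_image = image.convert("L")  # "L" is grayscale mode
combined_image = concatenate_images(image, gray_image)  # join the original and the grayscale image
plt.imshow(combined_image)
plt.axis('off')  # hide the axes
plt.show()
```

Result:
![image](https://hackmd.io/_uploads/Hkxf5kMaXa.png)

### Suppose the image URL is not given and only the NCKU homepage URL is available:

In that case, we have to scrape the image link from the NCKU page, download the image, and then process it.

```python
import requests
from bs4 import BeautifulSoup
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt


def download_image(url):
    """Download an image and return it as a PIL Image."""
    response = requests.get(url)  # fetch the content at the URL
    response.raise_for_status()  # stop early on an HTTP error
    # BytesIO wraps the raw bytes in a file-like object that PIL can open
    image = Image.open(BytesIO(response.content))
    return image


def concatenate_images(image1, image2):
    """Place two images side by side."""
    total_width = image1.width + image2.width  # add the widths (object attributes)
    max_height = max(image1.height, image2.height)  # use the taller of the two heights
    new_image = Image.new('RGB', (total_width, max_height))
    new_image.paste(image1, (0, 0))
    new_image.paste(image2, (image1.width, 0))
    return new_image


url_list = []  # list that will hold the image URLs
# URL of the target website
url = "https://www.ncku.edu.tw/"
front = "https://www.ncku.edu.tw"
# Send an HTTP GET request and fetch the page content
response = requests.get(url)
# Check that the request succeeded (raises on an HTTP error)
response.raise_for_status()
# Parse the page content into a BeautifulSoup object
soup = BeautifulSoup(response.text, 'html.parser')
```

Next comes the scraping itself, which is basically the same as the image-link scraping done before: take the link suffix from the tag's attribute dictionary and prepend the NCKU site URL to get the full link. Here, however, I want to concatenate every scraped image. I first put all the image links in a list, then use a for loop with the first image as the base (image1). After the second image is merged with the first, the merged result is assigned back to image1; each later link is downloaded into image2 and merged with image1 in turn.

```python
# NOTE: `element1` is not defined in these notes; it is assumed to hold the
# section containers selected from `soup` in a step that is not shown here.
for element in element1:
    t = element.find('h2', class_='mt-title')
    if t and t.text == '成大快訊':  # narrow down to the "成大快訊" (NCKU News) section
        elements_2 = element.find_all('div', class_='d-item v-it col-sm-3')
        for element3 in elements_2:
            result = element3.find('div', class_="d-img")
            result = result.find("img")
            # the suffix result["src"] still needs the prefix https://www.ncku.edu.tw
            img_url = front + result["src"]
            # print(img_url)
            url_list.append(img_url)  # add each image URL to the list

image1 = download_image(url_list[0])  # image1 = the first image
# Iterate from the second image on: the merged image is assigned back to
# image1, which is then merged with the next image
for x in url_list[1:]:
    image2 = download_image(x)
    image1 = concatenate_images(image1, image2)
plt.axis('off')
plt.imshow(image1)
plt.show()
```

Execution result:
![image](https://hackmd.io/_uploads/SkroGzaX6.png)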
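
The scraping step builds each full link by prepending the site prefix to the `src` suffix. That works as long as every `src` is a site-relative path. As a minimal sketch, assuming some `src` values could already be absolute URLs or could be empty, the standard library's `urllib.parse.urljoin` can resolve them against the page URL instead. The helper name `absolute_image_urls` and the sample paths below are hypothetical and only for illustration; they are not taken from the NCKU page.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.ncku.edu.tw/"  # page URL used in the notes


def absolute_image_urls(base_url, src_values):
    """Resolve scraped src values against the page URL.

    Hypothetical helper, not part of the original script.
    """
    urls = []
    for src in src_values:
        if not src:  # skip missing or empty src attributes
            continue
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        urls.append(urljoin(base_url, src))
    return urls


# Made-up src values for illustration only
samples = [
    "/var/file/0/1000/pictures/example.jpg",       # site-relative path
    "https://www.ncku.edu.tw/images/example.png",  # already absolute
    "",                                            # empty src, skipped
]
resolved = absolute_image_urls(BASE_URL, samples)
for u in resolved:
    print(u)
```

In the scraping loop, this would stand in for the string concatenation of the prefix and `result["src"]`; whether it is needed depends on what the page's `src` attributes actually look like.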