# Web Scraping in Practice

###### tags: `光復高中`, `光啟高中`, `Web Scraping`, `Python`

## Step 1: Course Materials

* 光復高中: https://github.com/ycwang812/KFSH
* 光啟高中: https://github.com/ycwang812/PHSH

## Step 2: Online Environment

Google Colaboratory - https://colab.research.google.com/notebooks/

## Step 3: Create a Notebook

File > New Notebook > name it Starbuzz.ipynb

## Step 4: Fetch the Coffee Bean Page

```python=
import urllib.request

# Download the page and decode the raw bytes into a string
page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
text = page.read().decode("utf8")
print(text)
```

## Step 5: Extract the Price

```python=
import urllib.request

page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
text = page.read().decode("utf8")
# The price sits at a fixed position in the page (characters 234 to 237)
price = text[234:238]
print(price)
```

## Step 6: Locate the Price Dynamically

```python=
import urllib.request

page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
text = page.read().decode("utf8")
# Search for the ">$" marker instead of relying on a fixed position
where = text.find(">$")
start_of_price = where + 2           # skip over ">$"
end_of_price = start_of_price + 4    # the price is four characters long
price = text[start_of_price:end_of_price]
print(price)
```

## Step 7: Find the Lowest Price (This Fails!)
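This step fails on the second check of the loop condition: the slice taken from `text` is a `str`, and Python 3 raises a `TypeError` when a `str` is compared with a `float`. The sketch below reproduces the error without any network access. The value `"5.51"` is a made-up stand-in for the slice, not data from the page.

```python
# Hypothetical stand-in for the slice taken from the downloaded page
price = "5.51"

try:
    price > 5.9  # str compared with float: not allowed in Python 3
except TypeError as error:
    print("Comparison failed:", error)

# Converting the string first makes the comparison valid
converted = float(price)
print(converted > 5.9)
```

The course code that triggers this error: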
```python=
import urllib.request

price = 99.99
while price > 5.9:
    page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
    text = page.read().decode("utf8")
    where = text.find(">$")
    start_of_price = where + 2
    end_of_price = start_of_price + 4
    # Bug: price is now a str, so the next "price > 5.9" check raises TypeError
    price = text[start_of_price:end_of_price]
print("Buy!")
```

## Step 8: Convert the Variable Type

```python=
import urllib.request

price = 99.99
while price > 5.9:
    page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
    text = page.read().decode("utf8")
    where = text.find(">$")
    start_of_price = where + 2
    end_of_price = start_of_price + 4
    # Convert the string to a float so it can be compared with 5.9
    price = float(text[start_of_price:end_of_price])
print("Buy!")
```

## Step 9: Reduce Server Load

```python=
import urllib.request
import time

price = 99.99
while price > 5.9:
    # Wait 900 seconds (15 minutes) before each request
    time.sleep(900)
    page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
    text = page.read().decode("utf8")
    where = text.find(">$")
    start_of_price = where + 2
    end_of_price = start_of_price + 4
    price = float(text[start_of_price:end_of_price])
print("Buy!")
```

## Step 10: Reduce Code Duplication

```python=
import urllib.request

def get_price():
    # Fetch the page and print the current price
    page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
    text = page.read().decode("utf8")
    where = text.find(">$")
    start_of_price = where + 2
    end_of_price = start_of_price + 4
    print(text[start_of_price:end_of_price])

get_price()
```

## Step 11: Function Return Value

```python=
import urllib.request

def get_price():
    # Fetch the page and return the current price as a string
    page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
    text = page.read().decode("utf8")
    where = text.find(">$")
    start_of_price = where + 2
    end_of_price = start_of_price + 4
    return text[start_of_price:end_of_price]

price = get_price()
print(price)
```

## Step 12: Add a Conditional

```python=
import urllib.request
import time

def get_price():
    # Fetch the page and return the current price as a float
    page = urllib.request.urlopen("http://beans-r-us.appspot.com/prices.html")
    text = page.read().decode("utf8")
    where = text.find(">$")
    start_of_price = where + 2
    end_of_price = start_of_price + 4
    return float(text[start_of_price:end_of_price])

price_now = input("Do you want to see the price now (Y/N)?")
if price_now == "Y":
    print(get_price())
else:
    price = 99.99
    while price > 5.9:
        time.sleep(900)
        # Call the function; without the parentheses, price would hold the function object
        price = get_price()
    print("Buy!")
```
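
The steps above assume that the `>$` marker is always present and that the price is always four characters long. As a minimal sketch, not part of the course code, the parsing can be separated from the download so it can be tried without a network connection. The helper name `extract_price` and the `SAMPLE_HTML` markup are hypothetical and only illustrate the idea.

```python
def extract_price(text, marker=">$"):
    """Return the price that follows `marker` in `text` as a float.

    Hypothetical helper for illustration; not part of the course code.
    """
    where = text.find(marker)
    if where == -1:
        # str.find returns -1 when the marker is missing
        raise ValueError("price marker not found")
    start_of_price = where + len(marker)
    # Read up to the next tag instead of assuming four characters
    end_of_price = text.find("<", start_of_price)
    if end_of_price == -1:
        end_of_price = len(text)
    return float(text[start_of_price:end_of_price])

# Made-up sample markup, assumed to resemble the price page
SAMPLE_HTML = "<p>Current price of coffee beans = <strong>$5.49</strong></p>"
sample_price = extract_price(SAMPLE_HTML)
print(sample_price)
```

With this split, `get_price()` would only need to download the page and pass `text` to the helper, and a missing marker produces a clear error instead of a confusing `float()` failure.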