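# Beep beep beep... motor oil tastes awful 🤖

## Web Crawler

#### A first look at web crawlers

```
print("Hello ! I am Maximilian aka Snackeyes.")
```

---

## I heard you touched on this last semester

---

## First, a quick intro to what a web crawler is

![](https://i.imgur.com/FJqsQFs.png =30%x)

----

### Imagine you want to download a lot of images
### but you are also very lazy

----

# So what do you do?

----

## ~~Go to sleep!~~
## Dreams have everything
### No, no, no!!

----

### Simply put, you get to play the exploitative boss
### and make a web crawler do the work for you

----

### What can it do?
#### Download photos of pretty girls, scrape posts, grab BLACKPINK tickets, and so on

![](https://i.imgur.com/kdSWxXN.png =60%x)

----

### Basically, the crawler poses as a browser to fetch the data you want
### or carries out the commands you give it

![](https://i.imgur.com/KH2dmBb.png =50%x)

----

### Things to know before you crawl

* You need to understand how a website is structured first, e.g. HTML
* Be an ethical crawler (!?)

---

# Selenium

[Installation guide](https://selenium-python.readthedocs.io/installation.html)

## Setup

----

### A quick mention of pipenv while we are here

```
pip3 install pipenv
pipenv install selenium
```

![](https://i.imgur.com/s0FyZFO.png =60%x)

### Check that the installation succeeded

----

### Download Chrome Driver

[Here](https://chromedriver.chromium.org/downloads)

##### Chrome -> Settings -> About Chrome -> Chrome Version

![](https://i.imgur.com/0EveoCT.png =70%x)

#### Download the version that matches your Chrome
#### Unzip it to the desktop

---

## Syntax tutorial

----

### Environment setup

```
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import requests
from bs4 import BeautifulSoup
import io
import time
```

----

### Opening moves

```
PATH = " "  # location of Chrome Driver
driver = webdriver.Chrome(service=Service(PATH))  # the browser to use
driver.get(" ")  # the page Chrome Driver should open
```

##### On a Mac, you can put Chrome Driver in a bin directory that is on your PATH

```
/usr/local/bin
```

----

### Example

```
PATH = "/usr/local/bin/chromedriver"  # chromedriver path (chromedriver.exe on Windows)
driver = webdriver.Chrome(service=Service(PATH))  # open the browser
driver.get("https://www.dcard.tw/f")
```

```
==> Opens the Dcard home page
```

----

## Essential knowledge

----

#### Kinds of locators

![](https://i.imgur.com/b56mALA.png)

##### The common ones are name, class, id...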
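----

#### Sketch: where name, class, and id live in HTML

This is a minimal sketch to show what those three attributes look like in a page, so it is clearer what a locator matches on. It uses only Python's standard-library `html.parser` and needs no browser. The HTML fragment and the `AttributeIndex` class are made up for illustration and do not come from Dcard or any real site.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, made up for this example.
SAMPLE_HTML = """
<form id="search-form">
  <input name="query" class="search-box big" placeholder="Search">
  <button id="submit" class="btn big">Go</button>
</form>
"""


class AttributeIndex(HTMLParser):
    """Record which tags carry a given id, name, or class."""

    def __init__(self):
        super().__init__()
        self.by_id = {}
        self.by_name = {}
        self.by_class = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "id" in attrs:
            self.by_id[attrs["id"]] = tag
        if "name" in attrs:
            self.by_name.setdefault(attrs["name"], []).append(tag)
        # A class attribute may hold several space-separated class names.
        for cls in (attrs.get("class") or "").split():
            self.by_class.setdefault(cls, []).append(tag)


index = AttributeIndex()
index.feed(SAMPLE_HTML)

print(index.by_id)     # tags found for each id
print(index.by_name)   # tags found for each name
print(index.by_class)  # tags found for each single class name
```

In Selenium these attributes correspond to `By.ID`, `By.NAME`, and `By.CLASS_NAME`. On a page like the fragment above, `driver.find_element(By.NAME, "query")` would match the `<input>`, and `By.CLASS_NAME` takes one class name at a time, which is why the sketch splits the class attribute on spaces.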
----

### Basic operations

```
print(driver.title)  # page title
search = driver.find_element(By.NAME, "element-name")  # locator strategy, locator value
search.clear()  # clear any default text
search.send_keys("search text")
time.sleep(3)  # pause the crawler for this many seconds
search.send_keys(Keys.RETURN)  # press Enter
```

----

### Example

```
print(driver.title)
search = driver.find_element(By.NAME, "query")
search.clear()
search.send_keys("比特幣")  # "Bitcoin"
time.sleep(3)
search.send_keys(Keys.RETURN)
```

```
==> Search results for 比特幣 (Bitcoin) appear
```

----

### Get the post titles

```
# Wait (up to 10 seconds) until the given element is present, then run the code below
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "atm_26_loj7t9"))
)
titles = driver.find_elements(By.CLASS_NAME, "atm_cs_1urozh")
for title in titles:  # loop over every title element
    print(title.text)
```

![](https://i.imgur.com/lV75kZe.png)

---

## Let's build something fun!

#### Let's play a little web game

[TSJ](https://tsj.tw)

![](https://i.imgur.com/QqnzOGB.jpg =50%x)

###### Any resemblance is purely coincidental

----

#### First set up the environment and the page to open

```
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import requests
from bs4 import BeautifulSoup
import io

PATH = "/usr/local/bin/chromedriver"  # chromedriver.exe on Windows
driver = webdriver.Chrome(service=Service(PATH))  # Selenium 4 takes the driver path via Service
driver.get("https://tsj.tw/")
```

----

#### Locate the button to click
#### and the counter

```
blow = driver.find_element(By.ID, 'click')
blow_count = driver.find_element(By.XPATH, '//*[@id="app"]/div[2]/div[4]/div[2]/h4[2]')
```

###### Note: wrap the XPath in single quotes ('), because the XPath itself contains double quotes (")

----

#### Wait... XPath?
![](https://i.imgur.com/7qC7sCV.png =40%x)

#### Simply put, it is a path that pins down where an element sits in the HTML
##### When the element you want shares its tag or attributes with others, use it to point at the exact one~

----

#### Get the item buttons

```
items = []
items.append(driver.find_element(By.XPATH, '//*[@id="app"]/div[2]/div[4]/div[4]/table/tbody/tr[4]/td[5]/button[1]'))
items.append(driver.find_element(By.XPATH, '//*[@id="app"]/div[2]/div[4]/div[4]/table/tbody/tr[3]/td[5]/button[1]'))
items.append(driver.find_element(By.XPATH, '//*[@id="app"]/div[2]/div[4]/div[4]/table/tbody/tr[2]/td[5]/button[1]'))
```

----

#### Get the item prices

```
prices = []
prices.append(driver.find_element(By.XPATH, '//*[@id="app"]/div[2]/div[4]/div[4]/table/tbody/tr[4]/td[4]'))
prices.append(driver.find_element(By.XPATH, '//*[@id="app"]/div[2]/div[4]/div[4]/table/tbody/tr[3]/td[4]'))
prices.append(driver.find_element(By.XPATH, '//*[@id="app"]/div[2]/div[4]/div[4]/table/tbody/tr[2]/td[4]'))
```

----

#### Prepare the clicker

```
actions = ActionChains(driver)  # drives the mouse
for n in range(10):
    actions.click(blow)  # queue ten clicks on the button
```

----

#### Buy items automatically

```
for i in range(10000):
    actions.perform()  # run the queued clicks
    # Strip the surrounding text so that only the number is left
    count = int(blow_count.text.replace("您目前擁有", "").replace("技術點", ""))
    for j in range(3):  # compare against each of the three prices
        price = int(prices[j].text.replace("技術點", ""))
        if count >= price:  # buy
            upgrade_actions = ActionChains(driver)
            upgrade_actions.move_to_element(items[j])
            upgrade_actions.click()
            upgrade_actions.perform()  # run the purchase click
            break
```

---

## Not enough for you?

[Examples](https://blog.jiatool.com/series/python網路爬蟲實例/)

###### ~~Your senior is a terrible teacher~~🥲

---

# 🔥🦐 (Thanks!)

#### Looking forward to seeing you build some 🐮🍺 (awesome) bots~~
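---

#### Appendix sketch: trying the purchase logic without a browser

The buy loop in the game example does two things that can be checked on their own: it strips the text around a number, and it picks the first item whose price is affordable. The sketch below pulls those two steps into plain functions so they can be tried without Selenium. The function names `parse_points` and `first_affordable` and the sample strings are assumptions made up for illustration. A regular expression is swapped in here for the chained `replace` calls used in the slides.

```python
import re


def parse_points(text: str) -> int:
    """Return the first run of digits in `text` as an int."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group())


def first_affordable(count: int, prices: list[int]):
    """Return the index of the first price that `count` covers, or None."""
    for index, price in enumerate(prices):
        if count >= price:
            return index
    return None


# Made-up sample strings in the same shape as the page text.
count = parse_points("您目前擁有120技術點")
prices = [parse_points(s) for s in ["1000技術點", "100技術點", "10技術點"]]

print(count, prices, first_affordable(count, prices))
```

Listing the prices from most to least expensive, as the slides do with rows 4, 3, 2, means the first match is the priciest item the current points can cover.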