Scraping Dynamic Web Pages with R: RSelenium
LHB阿好伯, 2021/07/15
https://github.com/ropensci/RSelenium
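Introduction
Selenium is a tool for automating web browsers.
It can also serve as a scraper for dynamic web pages.
At this point you may be wondering what a dynamic web page is.
In an earlier post, "Scraping historical monitoring-station data with R and plotting a wind rose with ggplot2 (Daliao station as an example)",
the data on the page changed with the URL,
so the data I needed could be retrieved simply by editing the URL.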
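Sometimes, however, you run into a page like the one below,
where the data changes in response to the menus you interact with on the page.
In that case you can use Selenium to have your program operate the page the way a person would.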
Install Java
https://java.com/zh-TW/download/ie_manual.jsp?locale=zh_TW
Install WebDriver
https://chromedriver.chromium.org/
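Create a directory to hold the executable, for example C:\WebDriver\bin or /opt/WebDriver/bin.
On Windows, open a Command Prompt as administrator and run the following command to permanently add that directory to the PATH for all users on your computer (the command below uses C:\R\WebDriver; substitute the directory you created):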
setx /m path "%path%;C:\R\WebDriver"
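You are now ready to test the change. Close every open Command Prompt window and open a new one.
Type the name of one of the binaries in the folder you created in the previous step, for example: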
chromedriver
Install Selenium Server
https://www.selenium.dev/downloads/
You do not need to run the downloaded file directly.
Just place it in the Command Prompt (CMD) path.
Run
java -jar selenium-server-standalone-3.141.59.jar
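Here is a key point I have not seen mentioned anywhere online:
do not close the Command Prompt at this point, because RSelenium only works while the server is running.
That means you have to start selenium-server-standalone every time before you use the package.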
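Common functions
Opening and closing the browser
The browser will show a notice that it is being controlled by automated test software.
When you see that, it worked XD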
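A minimal sketch of opening and closing a browser from R. It assumes the Selenium server started above is listening on localhost port 4444 (the usual default for the standalone 3.x server) and that Chrome with a matching chromedriver is installed. `open_browser()` is a hypothetical helper name for illustration, not part of RSelenium.

```r
# Hypothetical helper: connect to a running Selenium server and open a browser.
# host, port, and browser are assumptions; change them to match your setup.
open_browser <- function(host = "localhost", port = 4444L, browser = "chrome") {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = host,
    port = port,
    browserName = browser
  )
  remDr$open()  # a browser window controlled by Selenium should appear
  remDr
}

# Usage (requires the Selenium server to be running):
# remDr <- open_browser()
# remDr$close()  # close the browser session when finished
```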
Opening a specific page
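A short sketch of loading a page, assuming `remDr` is an open RSelenium session. `visit()` is a hypothetical helper, and the URL in the usage comment is only a placeholder.

```r
# Hypothetical helper: load a URL and return the address the browser ended up on.
visit <- function(remDr, url) {
  remDr$navigate(url)
  remDr$getCurrentUrl()[[1]]  # getCurrentUrl() returns a list; take its first element
}

# Usage (placeholder URL):
# visit(remDr, "https://example.com/")
```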
Locating elements
To locate an object on the page, use the function
findElements( using = c("xpath", "css selector", "id", "name", "tag name", "class name", "link text", "partial link text"), value )
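Here using can be "xpath", "css selector", "id", "name", "tag name", "class name", "link text", or "partial link text",
and value is the matching selector string for that method.
For example, inspecting the drop-down on this page gives the XPath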
//*[(@id = "CPH1_ddl_Unit")]
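As a sketch, the XPath above can be passed to `findElement()`, which returns the first match (`findElements()` returns a list of all matches). `find_by_xpath()` is a hypothetical wrapper, and `remDr` is assumed to be an open session already on the target page.

```r
# XPath taken from the example above.
unit_xpath <- '//*[(@id = "CPH1_ddl_Unit")]'

# Hypothetical wrapper: return the first element matching an XPath.
find_by_xpath <- function(remDr, xpath) {
  remDr$findElement(using = "xpath", value = xpath)
}

# Usage:
# unit_menu <- find_by_xpath(remDr, unit_xpath)
# Since this element has an id, using = "id" with value = "CPH1_ddl_Unit"
# should locate the same element.
```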
Methods for locating elements on a web page
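Mouse clicks
Combine the located element with clickElement()
to simulate a mouse click and pick an option from the list.
click(buttonId = 0): the default buttonId of 0 is a left click,
1 is a middle click (the scroll wheel button),
and 2 is a right click.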
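A minimal sketch of clicking a located element. `click_xpath()` is a hypothetical helper, and the `option[2]` path in the usage comment is an assumption about how the drop-down is structured, so check it against the real page.

```r
# Hypothetical helper: find the first element matching an XPath and left-click it.
click_xpath <- function(remDr, xpath) {
  elem <- remDr$findElement(using = "xpath", value = xpath)
  elem$clickElement()
  invisible(elem)
}

# Usage: click the drop-down, then one of its options
# (the option index is an assumption for illustration).
# click_xpath(remDr, '//*[(@id = "CPH1_ddl_Unit")]')
# click_xpath(remDr, '//*[(@id = "CPH1_ddl_Unit")]/option[2]')
```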
Alternatively, use mouseMoveToLocation(x = NA_integer_, y = NA_integer_, webElement = NULL)
to control where the mouse pointer moves.
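One way these two calls might be combined: move the pointer onto an element, then click at that position with `click()`. `move_and_click()` is a hypothetical helper, and `elem` is assumed to be a web element returned by `findElement()`.

```r
# Hypothetical helper: move the pointer to an element, then click there.
# buttonId follows the convention above: 0 = left, 1 = middle, 2 = right.
move_and_click <- function(remDr, elem, buttonId = 0) {
  remDr$mouseMoveToLocation(webElement = elem)
  remDr$click(buttonId = buttonId)
}

# Usage:
# elem <- remDr$findElement(using = "xpath", value = '//*[(@id = "CPH1_ddl_Unit")]')
# move_and_click(remDr, elem)
```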
Controlling input boxes
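A sketch of typing into a text box with `clearElement()` and `sendKeysToElement()`. `type_into()` is a hypothetical helper, and the XPath and text in the usage comment are placeholders, not taken from the page above.

```r
# Hypothetical helper: clear an input box, type text, and optionally press Enter.
type_into <- function(remDr, xpath, text, submit = FALSE) {
  box <- remDr$findElement(using = "xpath", value = xpath)
  box$clearElement()  # remove any existing text first
  keys <- list(text)
  if (submit) keys <- c(keys, list(key = "enter"))  # named "key" entries send special keys
  box$sendKeysToElement(keys)
  invisible(box)
}

# Usage (placeholder XPath and text):
# type_into(remDr, '//input[@name = "q"]', "RSelenium", submit = TRUE)
```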

Extracting data
getElementTagName() # returns the element's tag name
getElementText() # returns the element's text
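As a sketch, the text of every element matching a selector can be collected into a character vector. `element_texts()` is a hypothetical helper, and the `//table//td` XPath in the usage comment is a placeholder for whatever cells hold the data on your page.

```r
# Hypothetical helper: return the visible text of all elements matching an XPath.
element_texts <- function(remDr, xpath) {
  elems <- remDr$findElements(using = "xpath", value = xpath)
  # getElementText() returns a list; take its first element for each match
  vapply(elems, function(e) e$getElementText()[[1]], character(1))
}

# Usage (placeholder XPath):
# cells <- element_texts(remDr, "//table//td")
```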
Other useful functions
References
https://cran.r-project.org/web/packages/RSelenium/RSelenium.pdf
https://docs.ropensci.org/RSelenium/
https://mran.microsoft.com/snapshot/2017-12-11/web/packages/RSelenium/vignettes/RSelenium-basics.html#sending-mouse-events-to-elements
🌟 The full post can be read, or added to, at the links below
Full post shared at
https://www.facebook.com/LHB0222/
https://www.instagram.com/ahb0222/
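If you have questions or want to discuss, feel free to leave a comment below
If you liked this, please share it with all your friends \o/
Corrections are always welcome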
