Selenium: Basic Usage and Functions

Installation and Testing

Chrome webdriver:
https://chromedriver.chromium.org/
Install the Selenium module:

pip install selenium

Launching Chrome with Selenium (test)

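from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep

# Selenium 4 takes the driver path through a Service object;
# Selenium 3 accepted webdriver.Chrome('./chromedriver') directly.
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.implicitly_wait(10)  # implicit wait: poll for up to 10 seconds when locating elements
driver.get('http://example.com')

sleep(5)  # forced wait: always pause for 5 seconds

driver.quit()  # close every open browser window and end the WebDriver session cleanly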

Loading a local file

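import os

# Build a file URL from the file's absolute path ("file_name" is a placeholder)
file_path = "file:///" + os.path.abspath("file_name")

#os.path.abspath(".")    # absolute path of the current directory
#os.path.abspath(r"..")  # absolute path of the parent directory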
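As an alternative sketch, the standard-library pathlib module can build the same kind of URL. Path.as_uri() produces a well-formed file:// URL on both Windows and POSIX systems and percent-encodes characters such as spaces, which plain string concatenation does not do. The helper name to_file_url is only illustrative, and Exception.html is just a sample file name.

```python
from pathlib import Path


def to_file_url(file_name: str) -> str:
    """Return a file:// URL for file_name, resolved against the current directory."""
    # resolve() makes the path absolute; as_uri() only accepts absolute paths.
    return Path(file_name).resolve().as_uri()


if __name__ == "__main__":
    # The result can be passed straight to driver.get().
    print(to_file_url("Exception.html"))
```

The returned string can replace file_path in the snippet above.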

Available driver attributes (driver.*)

title	page_source
Returns the page title	Returns the page's HTML source
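
A minimal sketch of reading both attributes. page_summary is a hypothetical helper, not part of Selenium; it relies only on the title and page_source attributes listed above, so any started driver can be passed in.

```python
def page_summary(driver) -> dict:
    """Collect the title and a short preview of the current page's HTML source."""
    source = driver.page_source  # full HTML of the page as a string
    return {
        "title": driver.title,         # text of the <title> element
        "source_length": len(source),  # number of characters in the HTML
        "source_head": source[:60],    # first 60 characters, for a quick look
    }
```

It would typically be called after driver.get(...) and before driver.quit(), for example print(page_summary(driver)).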

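Driver functions for locating page elements (driver.*)

Locator function	Description
find_element_by_id()	Locates an element by its id attribute
find_element_by_name()	Locates an element by its name attribute
find_element_by_xpath()	Locates an element by an XPath expression
find_element_by_link_text()	Locates a link by its exact text
find_element_by_partial_link_text()	Locates a link by part of its text
find_element_by_tag_name()	Locates an element by its tag name
find_element_by_class_name()	Locates an element by its class attribute
find_element_by_css_selector()	Locates an element by a CSS selector
Note: each function above returns only the first match. To get every match, change element to elements in the function name.
Example: find_elements_by_name()
Note: find_elements_by_id() exists as well. Because an id value should be unique within an HTML page, it normally returns a list with at most one element.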
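
The find_element_by_* helpers in the table belong to the Selenium 3 API. Selenium 4.3 removed them in favor of find_element(by, value) and find_elements(by, value). The sketch below maps each legacy name to its locator strategy. The strategy strings are the values of the constants in selenium.webdriver.common.by.By (for example, By.ID is "id"), written out here so the mapping is visible. LEGACY_TO_STRATEGY and locate are hypothetical names used only for illustration.

```python
# Legacy helper name -> locator strategy string (the value of the matching By constant).
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",                                # By.ID
    "find_element_by_name": "name",                            # By.NAME
    "find_element_by_xpath": "xpath",                          # By.XPATH
    "find_element_by_link_text": "link text",                  # By.LINK_TEXT
    "find_element_by_partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
    "find_element_by_tag_name": "tag name",                    # By.TAG_NAME
    "find_element_by_class_name": "class name",                # By.CLASS_NAME
    "find_element_by_css_selector": "css selector",            # By.CSS_SELECTOR
}


def locate(driver, legacy_name: str, value: str, all_matches: bool = False):
    """Run the Selenium 4 equivalent of a legacy find_element(s)_by_* call."""
    strategy = LEGACY_TO_STRATEGY[legacy_name]
    if all_matches:
        # Counterpart of find_elements_by_*: returns a list of matches.
        return driver.find_elements(strategy, value)
    # Counterpart of find_element_by_*: returns the first match only.
    return driver.find_element(strategy, value)
```

For instance, locate(driver, "find_element_by_css_selector", "h2.content") issues the same call as driver.find_element(By.CSS_SELECTOR, "h2.content").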

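Attributes and methods available on a located element (driver.<locator function>.*)

tag_name	get_attribute("attribute name")	text
Returns the tag name	Returns the value of the attribute	Returns the element's visible text

get_attribute("innerHTML") returns the HTML inside the element (excluding the element's own tag)
get_attribute("outerHTML") returns the HTML of the element (including its own tag)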
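
A short sketch that gathers these values from one located element. describe_element is a hypothetical helper, and the attribute name href is only an example; element is assumed to be whatever one of the locator functions returned.

```python
def describe_element(element, attribute: str = "href") -> dict:
    """Collect the commonly used properties of a located element."""
    return {
        "tag": element.tag_name,                      # tag name, e.g. "a"
        "text": element.text,                         # visible text of the element
        attribute: element.get_attribute(attribute),  # value of the requested attribute
        "inner": element.get_attribute("innerHTML"),  # markup inside the tag
        "outer": element.get_attribute("outerHTML"),  # markup including the tag itself
    }
```

Comparing the inner and outer entries of the result makes the difference between innerHTML and outerHTML easy to see.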

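Exception classes

Exception	Description
ElementNotSelectableException	The element being selected does not allow selection
ElementNotVisibleException	The element exists but is not visible
ErrorInResponseException	The server side responded with an error
NoSuchAttributeException	The requested attribute does not exist on the selected element
NoSuchElementException	The selected element does not exist
TimeoutException	The operation exceeded its time limit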

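from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import os

# Selenium 4 style; Selenium 3 used webdriver.Chrome("./chromedriver")
# and driver.find_element_by_css_selector("h2.content").
driver = webdriver.Chrome(service=Service("./chromedriver"))
html_path = "file:///" + os.path.abspath("Exception.html")
driver.get(html_path)

try:
    content = driver.find_element(By.CSS_SELECTOR, "h2.content")
    print(content.text)
except NoSuchElementException:
    print("The selected element does not exist")
finally:
    driver.quit()  # always close the browser, even after an unexpected error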
