# table of contents
[toc]
# requirements
- Install the following requirements first.
```bash!
pip install selenium
sudo apt update
sudo apt install -y chromium-chromedriver
```
## make sure your python3 version is >= 3.11
### check version
- `python3 --version`
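The same check can also be done from inside a script, which helps when several interpreters are installed. This is only a minimal sketch: the helper name `meets_minimum` is hypothetical, and the `(3, 11)` default simply mirrors the requirement stated above.

```python
import sys


def meets_minimum(version_info, minimum=(3, 11)):
    """Return True if the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Report the running interpreter and whether it satisfies the requirement.
    status = "OK" if meets_minimum(sys.version_info) else "too old"
    print(sys.version.split()[0], status)
```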
### Using conda to Specify Python 3 Version
1. Install conda first.
2. Add conda to the PATH:
- ```bash
echo 'export PATH="/opt/conda/bin:$PATH"' >> ~/.bashrc
```
3. Create a conda virtual environment:
- ```bash
conda create --name myenv python=3.11
```
4. Activate the virtual environment:
- ```bash
conda activate myenv
```
- Confirm the Python 3 version in use:
- ```bash
python3 --version
```
- ```bash
which python3
```
5. Deactivate the conda virtual environment:
- ```bash
conda deactivate
```
## Activate conda
- ```bash
conda activate myenv
```
- Add the activation to `.bashrc`:
- ```bash
echo 'conda activate myenv' >> ~/.bashrc
```
### (Obsolete) configure python3 to use version python3.11
- First, make sure Python 3.11 is installed in your WSL Ubuntu environment. You can do this by running:
```
sudo apt update
sudo apt install python3.11
```
- Now, use the update-alternatives command to set Python 3.11 as the default:
```
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
```
- The trailing `1` is the priority. In automatic mode, `update-alternatives` selects the alternative with the highest priority value.
- You can then use the following command to configure the alternatives and choose Python 3.11 as the default:
```
sudo update-alternatives --config python3
```
- After selecting Python 3.11, you can verify the change by running:
```
python3 --version
```
### Resolve the `ModuleNotFoundError: No module named 'apt_pkg'` issue
```
sudo apt remove python3-apt
sudo apt autoclean
sudo apt install python3-apt
```
# running version: GUI vs no GUI
## GUI version
### sample code
```python!
import time

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-infobars')
options.add_argument('--disable-extensions')
wd = webdriver.Chrome(options=options)
try:
    # Navigate to the specified URL
    wd.get('target_url')
    # Keep the browser window open until Ctrl-C
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    pass
finally:
    # Close the WebDriver
    wd.quit()
```
## no GUI version with firefox (run on a server)
### requirements
```bash
apt-get update
apt install firefox-geckodriver
```
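- Add the following lines to your code. Note that `sys.path` only affects Python imports; if Selenium still cannot locate geckodriver, pass the binary's path with `Service(executable_path=...)` from `selenium.webdriver.firefox.service`.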
```python
import sys
sys.path.insert(0,'/usr/lib/firefox-geckodriver')
```
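Before running the sample, it can help to confirm that the driver binary is actually reachable. The sketch below uses only the standard library; `find_driver` is a hypothetical helper, and it assumes the package installed an executable named `geckodriver` somewhere on the shell `PATH`.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    print(path if path else "geckodriver not found on PATH")
```

If this reports that nothing was found, the driver location probably has to be given to Selenium explicitly.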
### sample code
```python!
import sys

from selenium import webdriver
from selenium.webdriver.common.by import By

sys.path.insert(0, '/usr/lib/firefox-geckodriver')

options = webdriver.FirefoxOptions()
options.add_argument('--headless')  # Run Firefox in headless mode (no GUI)
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Firefox(options=options)
try:
    # Specify the URL of the web page
    url = "target_url"
    driver.get(url)
    element = driver.find_element(By.XPATH, '//tagname[@attribute="value"]')
finally:
    # Always close the browser, even if a step above fails
    driver.quit()
```
# some common commands
## Get element by XPATH
```python
element = driver.find_element(By.XPATH, '//tagname[@attribute="value"]/tagname[@attribute="value"]...')
```
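To make the pattern above concrete, here is a small sketch that assembles such an XPath from `(tag, attribute, value)` steps. The helper `build_xpath` and the page structure in the example are hypothetical, and values containing double quotes are not handled.

```python
def build_xpath(steps):
    """Build an XPath from (tag, attribute, value) steps.

    The first step is matched anywhere in the document (`//`); each later
    step must be a direct child of the previous one (`/`).
    """
    parts = [f'{tag}[@{attribute}="{value}"]' for tag, attribute, value in steps]
    return "//" + "/".join(parts)


# Hypothetical page structure: a link inside a div with id "main".
xpath = build_xpath([("div", "id", "main"), ("a", "class", "download")])
print(xpath)  # //div[@id="main"]/a[@class="download"]
```

The resulting string can be passed to `driver.find_element(By.XPATH, xpath)`.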
## wait
### Explicit Waits
- Wait up to a given timeout for a condition to become true.
#### sample code
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
```
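Conceptually, an explicit wait is a polling loop: check the condition, sleep briefly, and give up once the timeout has passed. The sketch below illustrates that idea in plain Python without a browser. It is not Selenium's implementation; `wait_until` and `element_appears` are hypothetical names, and the real `WebDriverWait` raises `TimeoutException` and polls every 0.5 seconds by default.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a slow page: the "element" only appears on the third check.
attempts = {"count": 0}


def element_appears():
    attempts["count"] += 1
    return "myDynamicElement" if attempts["count"] >= 3 else None


print(wait_until(element_appears, timeout=2.0, poll_interval=0.1))
```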
### Implicit Waits
- Make the driver poll the DOM for up to the given number of seconds whenever it looks up an element that is not immediately available.
```python
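from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.implicitly_wait(10)  # seconds
driver.get("http://somedomain/url_that_delays_loading")
# find_element_by_id was removed in Selenium 4; use find_element with By.ID
myDynamicElement = driver.find_element(By.ID, "myDynamicElement")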
```
## action chains
- Define a chain of actions, then run them in order with `perform()`.
- sample code (not runnable on its own: `driver` and `button_element` must already exist)
```python
from selenium.webdriver.common.action_chains import ActionChains

# Move to the element, then click it
actions = ActionChains(driver)
actions.move_to_element(button_element)
actions.click(button_element)
try:
    actions.perform()
    # button_element.click()
except Exception as e:
    # Handle other exceptions
    print(f"An unexpected error occurred: {e}")
```
## take screenshot
```python
driver.save_screenshot("file_name.png")
```
## with OCR
### install
```bash
sudo apt-get install tesseract-ocr
pip install pytesseract Pillow
```
### sample code
```python
from PIL import Image
import pytesseract
# Open the captured screenshot
screenshot = Image.open('captcha_area.png')
# Perform OCR to extract text
extracted_text = pytesseract.image_to_string(screenshot)
# Print the extracted text
print("Extracted Text:")
print(extracted_text)
```
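OCR output often carries stray whitespace or punctuation, so a cleanup step before using the text can help. The sketch below assumes the expected text consists only of ASCII letters and digits, which may not hold for every captcha; `clean_ocr_text` is a hypothetical helper and the sample string is made up.

```python
import re


def clean_ocr_text(raw):
    """Keep only ASCII letters and digits from raw OCR output."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)


# Made-up example of noisy OCR output, not a real Tesseract result.
print(clean_ocr_text(" a7 K-9x\n\x0c"))  # a7K9x
```

In the sample above, this would be applied as `clean_ocr_text(extracted_text)`.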
# References
- https://steam.oxxostudio.tw/category/python/spider/selenium.html
- [XPath in Selenium: All You Need to Know](https://www.simplilearn.com/tutorials/selenium-tutorial/xpath-in-selenium)
- [selenium-python.readthedocs.io](https://selenium-python.readthedocs.io/waits.html)