### Espaço de Tecnologias e Artes - Sesc Avenida Paulista
## Python study group
### `hackmd.io/@sesc-av-paulista/estudos-em-python-26-agosto`
### Web scraping - first steps
- A site made for practice: https://toscrape.com/
- Guilherme Felitti's example from the "Mares de Texto" activity (requests + Beautiful Soup): https://colab.research.google.com/drive/1fbwglei7YcrdeKJZqZPHWQsRnWoV61ad?usp=sharing
- [Tiago's site, which scrapes Sesc events - calendariosp.com](https://www.calendariosp.com/)
### Libraries
- requests
- BeautifulSoup
- [Scrapy](https://www.scrapy.org/)
- [Selenium](https://www.selenium.dev/documentation/webdriver/getting_started/first_script/?language=python)
- Dunossauro's Selenium course: https://www.youtube.com/watch?v=PHHXksljGNA&list=PLOQgLBuj2-3LqnMYKZZgzeC7CKCPF375B&ab_channel=EduardoMendes
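
As a first taste of the first two libraries, here is a minimal sketch against quotes.toscrape.com, one of the sandboxes linked from toscrape.com. The function names `parse_quotes` and `fetch_quotes` are made up for illustration, and the selectors (`div.quote`, `span.text`, `small.author`) assume the markup that sandbox uses at the time of writing.

```python
import requests
from bs4 import BeautifulSoup


def parse_quotes(html):
    """Return (text, author) pairs from a quotes.toscrape.com-style page."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for quote in soup.select("div.quote"):
        text = quote.select_one("span.text")
        author = quote.select_one("small.author")
        # Skip blocks that lack either piece instead of crashing on None
        if text is None or author is None:
            continue
        pairs.append((text.get_text(strip=True), author.get_text(strip=True)))
    return pairs


def fetch_quotes(url="https://quotes.toscrape.com/"):
    """Download the page and parse it; raises on network or HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return parse_quotes(response.text)
```

Calling `fetch_quotes()` needs network access. `parse_quotes` can be tried offline on any HTML string, which makes it easier to experiment with selectors.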
### Felitti's example
It worked on the national Sesc site (sesc.com.br) but not on the São Paulo one (sescsp.org.br).
```python=
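import requests
from bs4 import BeautifulSoup

# Download the page; stop if the request fails
try:
    r = requests.get('https://www.sesc.org.br')
except Exception as e:
    print(str(e))
    exit()

# Parse the raw HTML
html = r.content
site = BeautifulSoup(html, 'html.parser')

# Each news item sits in a <div class="item box_loop1">
noticias = site.find_all('div', {'class': 'item box_loop1'})
for noticia in noticias:
    # The headline is stored in the title attribute of the <h3>
    print(noticia.find('h3').get('title'))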
```
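
One fragile spot in the example above is `noticia.find('h3')`, which returns `None` when a block has no `<h3>`, so the following `.get('title')` raises `AttributeError`. An empty `noticias` list is also worth checking for: the markup may simply not be in the HTML that `requests` receives, which may be what happened with sescsp.org.br (an assumption, since these notes do not record why it failed). The sketch below runs on an inline HTML string so it works offline; `extract_titles` and the sample markup are invented for illustration.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div class="item box_loop1"><h3 title="First story">First story</h3></div>
<div class="item box_loop1"><p>No heading here</p></div>
<div class="box_loop1 item"><h3 title="Classes in another order">...</h3></div>
"""


def extract_titles(html):
    """Collect the title attribute of each news block, skipping incomplete ones."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # The CSS selector requires both classes but does not care about their order
    for block in soup.select("div.item.box_loop1"):
        heading = block.find("h3")
        if heading is None or not heading.get("title"):
            continue  # no <h3> or no title attribute: nothing to report
        titles.append(heading["title"])
    return titles


titles = extract_titles(SAMPLE_HTML)
if not titles:
    print("No matches: the content may not be present in the downloaded HTML")
print(titles)
```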
### Example with Selenium
```python=
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome()
actions = ActionChains(driver)

driver.get("https://www.sescsp.org.br")
sleep(1)  # give the page a moment to load

# Click the policy button (presumably the cookie/privacy banner)
b = driver.find_element(By.CLASS_NAME, 'button-policy')
b.click()

# Scroll to the bottom of the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# Find the "load more" button, move to it and click it
b_load_more = driver.find_element(By.CLASS_NAME, 'destaques-home-load-more')
actions.move_to_element(b_load_more).perform()
b_load_more.click()

## attempts that did not work very well
#sleep(5)
#actions.move_to_element(b_load_more).perform()
#b_load_more.click()
#sleep(2)
#actions.move_to_element(b_load_more).perform()
#b_load_more.click()
```
### Tip for anyone having trouble installing and using the webdriver
```python=
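import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.sescsp.org.br"  # placeholder: the page you want to open

try:
    # Download a matching chromedriver and hand its path to Selenium
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
except Exception as e:
    print(e)
    # Fall back to whatever driver Selenium can locate on its own
    driver = webdriver.Chrome()

driver.get(url)
time.sleep(1)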
```
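
Recent Selenium releases (4.6 and later) ship Selenium Manager, which fetches a matching driver on its own, so plain `webdriver.Chrome()` may already work without `webdriver_manager`. Either way, none of the snippets in these notes closes the browser. A small context manager is one way to guarantee that; `open_browser` is a hypothetical helper, not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def open_browser(make_driver):
    """Create a driver with make_driver() and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs even if the scraping code raised, so no browser is left open
        driver.quit()
```

With Selenium installed, `with open_browser(webdriver.Chrome) as driver:` followed by `driver.get(url)` opens the page and closes Chrome when the block ends. Passing the factory in, rather than importing Selenium inside the helper, keeps it usable with Firefox or any other driver class.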