### Espaço de Tecnologias e Artes - Sesc Avenida Paulista

## Python study group

### `hackmd.io/@sesc-av-paulista/estudos-em-python-26-agosto`

### Web scraping - first steps

- Site for testing! https://toscrape.com/
- Guilherme Felitti's example from the "Mares de Texto" activity (requests + Beautiful Soup) https://colab.research.google.com/drive/1fbwglei7YcrdeKJZqZPHWQsRnWoV61ad?usp=sharing
- [Tiago's site, which scrapes Sesc events - calendariosp.com](https://www.calendariosp.com/)

### Libraries

- requests
- BeautifulSoup
- [Scrapy](https://www.scrapy.org/)
- [Selenium](https://www.selenium.dev/documentation/webdriver/getting_started/first_script/?language=python)
    - Dunossauro's Selenium course https://www.youtube.com/watch?v=PHHXksljGNA&list=PLOQgLBuj2-3LqnMYKZZgzeC7CKCPF375B&ab_channel=EduardoMendes

### Felitti's example

It worked on the Sesc Nacional site (sesc.com.br) but not on the São Paulo one (sescsp.org.br).

```python
import sys

import requests
from bs4 import BeautifulSoup

try:
    # Download the home page; stop if the request fails.
    r = requests.get('https://www.sesc.org.br', timeout=10)
    r.raise_for_status()
except requests.RequestException as e:
    print(str(e))
    sys.exit(1)

html = r.content
site = BeautifulSoup(html, 'html.parser')

# Each news item lives in a <div class="item box_loop1">.
noticias = site.find_all('div', {'class': 'item box_loop1'})
for noticia in noticias:
    # The headline text is stored in the title attribute of the <h3>.
    print(noticia.find('h3').get('title'))
```

### Example with Selenium

```python
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome()
actions = ActionChains(driver)

driver.get("https://www.sescsp.org.br")
sleep(1)

# Dismiss the privacy-policy banner.
b = driver.find_element(By.CLASS_NAME, 'button-policy')
b.click()

# Scroll to the bottom of the page so the "load more" button is reachable.
driver.execute_script(
    "window.scrollTo(0, document.body.scrollHeight);")

b_load_more = driver.find_element(By.CLASS_NAME, 'destaques-home-load-more')
actions.move_to_element(b_load_more).perform()
b_load_more.click()

## attempts that did not work very well
#sleep(5)
#actions.move_to_element(b_load_more).perform()
#b_load_more.click()
#sleep(2)
#actions.move_to_element(b_load_more).perform()
#b_load_more.click()
```

### Tip for anyone having trouble installing and using the webdriver

```python
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.sescsp.org.br"  # same site as the Selenium example

try:
    # Let webdriver_manager download a chromedriver and hand it to Selenium.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
except Exception as e:
    print(e)
    # Fall back to whatever chromedriver Selenium can find on its own.
    driver = webdriver.Chrome()

driver.get(url)
time.sleep(1)
```
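
The Selenium example above relies on fixed `sleep` calls, and the commented-out attempts to click the "load more" button again did not work well. One likely cause is that a fixed pause is sometimes too short for the page to finish rendering. A common alternative is to poll for a condition until it holds or a timeout expires. The sketch below shows that idea using only the standard library, so it can be tried without a browser. The names `wait_until` and `button_is_ready` are hypothetical helpers written for this note and are not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    Any exception raised by condition() is treated as "not ready yet"
    (a simplification chosen for this sketch).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # e.g. the element is not in the page yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for a page whose button only shows up on the third check
# (hypothetical; a real check would call driver.find_element).
attempts = {"count": 0}


def button_is_ready():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("button not rendered yet")
    return "button"


found = wait_until(button_is_ready, timeout=2.0, poll=0.01)
print(found, "after", attempts["count"], "attempts")
```

With a real driver, the same helper could be called as `wait_until(lambda: driver.find_element(By.CLASS_NAME, 'destaques-home-load-more'))` before each click. Selenium also ships its own version of this pattern, `WebDriverWait` in `selenium.webdriver.support.ui`, whose `until(...)` method is probably the better choice in real scripts.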