# Messenger Bot: Example Bot with Web Scraping
###### tags: `MessengerBot`
To extend the Echo Bot, I built a RateMyProf bot using the same tools and skills, plus a little bit of web scraping.
---
## The Process
Goal: Take a professor's name as input and return the link to their Rate My Professors page.
Since I want to send a link back to the user, I'll probably need:
- send_buttons()
- URL button
- A way to scrape ratemyprofessors.com for specific information
---
## Web Scraping - BeautifulSoup
Beautiful Soup is a library that makes it easy to get information from web pages.
[BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
Install dependencies (in the terminal):
```bash
pip3 install --user beautifulsoup4 requests
```
Steps of the web scraping process:
1. Make a request to a URL with the `requests` module.
2. Retrieve the HTML content as text.
3. Examine the HTML structure to identify the element that holds the data you want. To do this, right-click the web page in the browser and select "Inspect" to view the structure.
4. Use BeautifulSoup to find that element in the response and extract its text.
Import dependencies:
```python
import requests
from bs4 import BeautifulSoup
```
---
## RateMyProf Website
I'll use Prof. Schurgers as an example: http://www.ratemyprofessors.com/ShowRatings.jsp?tid=605064
Notice how every professor's page URL is differentiated by an id number appended to the end of the URL. However, we don't have this information! Instead, we want to search by name.
How can we do that? Search bar! 🔍
----
### Search Query
Searching for a name in the ratemyprofessors.com search bar leads to a URL like this:
http://www.ratemyprofessors.com/search.jsp?query=curt+schurgers
Notice that the search query is structured as follows: http://www.ratemyprofessors.com/search.jsp?query=[YOUR SEARCH HERE]
So, the first step is for my bot to be able to return the search query for an input name.
From the Echo Bot, we know we can get the user's input using `messaging_event['message']['text']`.
Now, we only need to process that string and append it to the end of the search URL.
``` python
# Split the input string into a list of lowercase words
# (named name_parts so it doesn't shadow Python's built-in input())
name_parts = messaging_event['message']['text'].lower().split()
# Join the words back into one string, separated by +
prof_search_query = "+".join(name_parts)
```
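Joining on `+` works for plain names, but a name with an apostrophe or accent needs URL encoding. Here is a minimal sketch using the standard library's `urllib.parse.quote_plus`. The helper name `build_search_url` is made up for illustration, and the base URL is assumed to be the https form of the search URL shown above.

```python
from urllib.parse import quote_plus

# Assumed base URL: the search URL from this tutorial, in its https form
RMP_SEARCH_URL = "https://www.ratemyprofessors.com/search.jsp?query="

def build_search_url(name):
    """Build the search URL for a professor's name (hypothetical helper)."""
    # Lowercase and collapse repeated whitespace, like the split/join version
    cleaned = " ".join(name.lower().split())
    # quote_plus turns spaces into + and percent-encodes other special characters
    return RMP_SEARCH_URL + quote_plus(cleaned)

print(build_search_url("Curt Schurgers"))
# https://www.ratemyprofessors.com/search.jsp?query=curt+schurgers
```

For simple names this produces the same query string as the `"+".join(...)` approach.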
----
### Making the Request
``` python
# Base URL for the Rate My Professors search
rmp_url = "https://www.ratemyprofessors.com/search.jsp?query="
# Make the request to get the search results page
page = requests.get(rmp_url + prof_search_query)
# Parse the HTML content with BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')
```
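By default, `requests.get` has no timeout and does not raise an error on HTTP failure codes, so a bot could hang or end up parsing an error page. Here is a hedged sketch of a more defensive request. The helper name `fetch_search_page` and the 10-second timeout are assumptions, not part of the original bot.

```python
import requests

def fetch_search_page(url, timeout=10):
    """Return the HTML text at `url`, or None if the request fails.

    Hypothetical helper; the 10-second timeout is an arbitrary choice.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raise an exception for 4xx/5xx responses instead of parsing them
        response.raise_for_status()
    except requests.RequestException:
        # Covers connection errors, timeouts, and HTTP error statuses
        return None
    return response.text
```

The caller can then check for `None` and send the user a "try again later" message rather than crashing.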
----
### Parsing the HTML Content
``` python
# Get the href suffix of the first professor
prof_link = soup.find('li', {'class': 'listing PROFESSOR'}).findChild('a', href=True, recursive=False).get('href')
# Append suffix to get full url
full_rmp_link = "https://www.ratemyprofessors.com" + prof_link
```
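`soup.find` returns `None` when nothing matches, so the chained call above raises `AttributeError` if the search has no results. Here is a minimal sketch that guards against that case. The helper name `first_professor_link` and the sample HTML are made up for illustration, and the markup is only assumed to resemble the real results page.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

RMP_BASE = "https://www.ratemyprofessors.com"

def first_professor_link(html):
    """Return the full URL of the first professor listing, or None."""
    soup = BeautifulSoup(html, "html.parser")
    listing = soup.find("li", {"class": "listing PROFESSOR"})
    if listing is None:
        return None  # no professor matched the search
    # Only look at <a> tags that are direct children of the listing
    anchor = listing.find("a", href=True, recursive=False)
    if anchor is None:
        return None
    return urljoin(RMP_BASE, anchor["href"])

# Made-up sample markup, modeled on the selectors used above
SAMPLE_HTML = """
<ul>
  <li class="listing PROFESSOR">
    <a href="/ShowRatings.jsp?tid=605064">Schurgers, Curt</a>
  </li>
</ul>
"""

print(first_professor_link(SAMPLE_HTML))
print(first_professor_link("<ul></ul>"))
```

When the function returns `None`, the bot can reply with a "professor not found" message instead of a button.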
---
### Return the Link as a Button
``` python
# Define the WEB_URL button
buttons = [
    ActionButton(ButtonType.WEB_URL, "Rate My Professor", url=full_rmp_link)
]
# Send the link as a button
messager.send_buttons(sender_id, "Here's the RMP Link: ", buttons)
```
---
Full Code:
```python=
# Split the input string into a list of lowercase words
# (named name_parts so it doesn't shadow Python's built-in input())
name_parts = messaging_event['message']['text'].lower().split()
# Join the words back into one string, separated by +
prof_search_query = "+".join(name_parts)
# Base url for Rate My Professor search
rmp_url = "https://www.ratemyprofessors.com/search.jsp?query="
# Make the request to get the page
page = requests.get(rmp_url + prof_search_query)
# Use BeautifulSoup to get the HTML content
soup = BeautifulSoup(page.content, 'html.parser')
# Get the href suffix of the first professor
prof_link = soup.find('li', {'class': 'listing PROFESSOR'}).findChild('a', href=True, recursive=False).get('href')
# Append suffix to get full url
full_rmp_link = "https://www.ratemyprofessors.com" + prof_link
# Define the WEB_URL button
buttons = [ActionButton(ButtonType.WEB_URL, "Rate My Professor", url=full_rmp_link)]
# Send the link as a button
messager.send_buttons(sender_id, "Here's the RMP Link: ", buttons)
```
---
Warning ⚠: Some websites can't be scraped with BeautifulSoup alone because they generate their content dynamically with JavaScript.
For these, you'd need another tool such as Selenium to do the scraping. [Check out this tutorial](https://www.scrapingbee.com/blog/selenium-python/).
Another Warning ⚠: Unfortunately, Selenium is not compatible with Glitch.com, so you might want to consider alternative options (see below).
#### Alternative options:
- Using an API (next week's tutorial 😎)
- (Sometimes) Manually parse the response text (`page.text`) returned by `requests`
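
As a rough sketch of the manual-parsing option, a regular expression can pull professor links straight out of the response text. This assumes the links appear as plain `href="/ShowRatings.jsp?tid=..."` attributes, a pattern inferred from the example professor URL earlier. The helper name `find_rating_paths` and the sample text are made up.

```python
import re

# Matches href values shaped like the example URL: /ShowRatings.jsp?tid=<digits>
RATING_HREF = re.compile(r'href="(/ShowRatings\.jsp\?tid=\d+)"')

def find_rating_paths(html_text):
    """Return every professor page path found in the raw HTML text."""
    return RATING_HREF.findall(html_text)

# Made-up sample text standing in for page.text
sample_text = '<li><a href="/ShowRatings.jsp?tid=605064">Schurgers, Curt</a></li>'
print(find_rating_paths(sample_text))
# ['/ShowRatings.jsp?tid=605064']
```

This is brittle compared with a real HTML parser, so it only makes sense when the markup is simple and stable.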