MessengerBot
To extend from the Echo Bot, I made a RateMyProf bot using the same tools and skills, with a little bit of web scraping.
Goal: Input a professor's name and output a link to their RateMyProfessors page.
Since I want to send a link back to the user, I'll need a way to pull that link out of a web page.
Beautiful Soup is a library that makes it easy to get information from web pages.
Install dependencies (in the terminal):
pip3 install --user bs4 requests
Steps of web scraping process:
Import dependencies
import requests
from bs4 import BeautifulSoup
I'll use Prof. Schurgers as an example: http://www.ratemyprofessors.com/ShowRatings.jsp?tid=605064
Notice how every professor's page URL is differentiated by an id number appended to the end of the URL. However, we don't have this information! Instead, we want to search by name.
How can we do that? The search bar!
Searching for a name on ratemyprofessors.com produces a URL like this:
http://www.ratemyprofessors.com/search.jsp?query=curt+schurgers
Notice that the search query is structured as follows: http://www.ratemyprofessors.com/search.jsp?query=[YOUR SEARCH HERE]
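As a rough sketch of that URL pattern, the snippet below builds a search URL from a name using only the standard library. The helper name `build_search_url` is made up for illustration, and `quote_plus` is swapped in here as one way to encode the name: it joins words with `+` and also percent-encodes characters such as apostrophes.

```python
from urllib.parse import quote_plus

# Base search URL, copied from the pattern shown above
SEARCH_URL = "http://www.ratemyprofessors.com/search.jsp?query="


def build_search_url(name: str) -> str:
    """Hypothetical helper: turn a professor's name into a search URL."""
    # Lowercase the name and collapse repeated whitespace into single spaces
    normalized = " ".join(name.lower().split())
    # quote_plus replaces spaces with "+" and percent-encodes unsafe characters
    return SEARCH_URL + quote_plus(normalized)


print(build_search_url("Curt Schurgers"))
# http://www.ratemyprofessors.com/search.jsp?query=curt+schurgers
```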
So, the first step is for my bot to be able to return the search query for an input name.
From the Echo Bot, we know we can get the user's input using messaging_event['message']['text']
Now, we only need to process that string and append it to the end of the search URL.
# This splits the input string into a list of lowercase words
# (named name_parts so it doesn't shadow Python's built-in input())
name_parts = messaging_event['message']['text'].lower().split()
# This turns the list back into a string but separated by +
prof_search_query = "+".join(name_parts)
# Base url for Rate My Professor search
rmp_url = "https://www.ratemyprofessors.com/search.jsp?query="
# Make the request to get the page
page = requests.get(rmp_url + prof_search_query)
# Use BeautifulSoup to get the HTML content
soup = BeautifulSoup(page.content, 'html.parser')
# Get the href suffix of the first professor
prof_link = soup.find('li', {'class': 'listing PROFESSOR'}).findChild('a', href=True, recursive=False).get('href')
# Append suffix to get full url
full_rmp_link = "https://www.ratemyprofessors.com" + prof_link
# Define the WEB_URL button
buttons = [
    ActionButton(ButtonType.WEB_URL, "Rate My Professor", url=full_rmp_link)
]
# Send the link as a button
messager.send_buttons(sender_id, "Here's the RMP Link: ", buttons)
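One thing to watch for in the steps above: `soup.find(...)` returns `None` when the search page has no matching listing, so chaining calls on it raises an `AttributeError`. Here is a minimal sketch of a more defensive lookup. The function name `first_professor_link` and the sample HTML are made up for illustration, and the sample only assumes the page structure implied by the selector above, not the real site's markup.

```python
from typing import Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup

RMP_BASE = "https://www.ratemyprofessors.com"


def first_professor_link(html: str) -> Optional[str]:
    """Hypothetical helper: return the first professor's full URL, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Same selector as in the walkthrough; find() gives None if nothing matches
    listing = soup.find("li", {"class": "listing PROFESSOR"})
    if listing is None:
        return None
    # Only look at <a> tags that are direct children of the listing
    anchor = listing.find("a", href=True, recursive=False)
    if anchor is None:
        return None
    # Join the relative href onto the site's base URL
    return urljoin(RMP_BASE, anchor["href"])


# Assumed sample markup, shaped to match the selector (not copied from the site)
sample = (
    '<ul><li class="listing PROFESSOR">'
    '<a href="/ShowRatings.jsp?tid=605064">Schurgers, Curt</a>'
    "</li></ul>"
)
print(first_professor_link(sample))
print(first_professor_link("<ul></ul>"))
```

With a helper like this, the bot can reply with a "no professor found" message when the result is `None` instead of crashing on an unexpected page.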
Full Code:
# This splits the input string into a list of lowercase words
# (named name_parts so it doesn't shadow Python's built-in input())
name_parts = messaging_event['message']['text'].lower().split()
# This turns the list back into a string but separated by +
prof_search_query = "+".join(name_parts)
# Base url for Rate My Professor search
rmp_url = "https://www.ratemyprofessors.com/search.jsp?query="
# Make the request to get the page
page = requests.get(rmp_url + prof_search_query)
# Use BeautifulSoup to get the HTML content
soup = BeautifulSoup(page.content, 'html.parser')
# Get the href suffix of the first professor
prof_link = soup.find('li', {'class': 'listing PROFESSOR'}).findChild('a', href=True, recursive=False).get('href')
# Append suffix to get full url
full_rmp_link = "https://www.ratemyprofessors.com" + prof_link
# Define the WEB_URL button
buttons = [ActionButton(ButtonType.WEB_URL, "Rate My Professor", url=full_rmp_link)]
# Send the link as a button
messager.send_buttons(sender_id, "Here's the RMP Link: ", buttons)
Warning: Some websites can't be scraped with BeautifulSoup alone, because they generate their content dynamically with JavaScript.
For these, you'd need to use another tool like Selenium to do the scraping. Check out this tutorial.
Another Warning: Unfortunately, Selenium is not compatible with Glitch.com, so you might want to consider alternative options (see below).