# Sets and Dictionaries

## Ice Breaker 🧊

Share something you've recently gotten into or want to get into. Why?

#### Any lingering questions from the last lesson?

Yesterday we discussed hashtables and how efficiently they can store data for particular problems. However, hashtables aren't the only data structure that is good for efficiency. Today, we will discuss two more data structures that build on the hashtable idea: how they store data and when it is appropriate to use them.

### Objectives

By the end of this lesson, you will learn 🧐:

- About sets and dictionaries in Python
- How to use the basic operations and methods associated with sets and dictionaries
- Practical use cases for sets and dictionaries

Follow along with this [Google Colab](https://colab.research.google.com/drive/1gSHTW9LGKvcm0QBdt4nYnYboolh3PiPn#scrollTo=LTCx_uzMMwX2) link.

## Sets

A set is a data structure for collecting data that:

- Holds unordered data
- Is mutable
- Is iterable
- Collects unique elements

### Creating Sets

There are two ways to create a set in Python. The first is to put curly braces, `{}`, around the data values. Let's create a set of some of the data types we've learned so far.

```python=
# We can put different data types in one set.
# This one holds an int, a float, a string, and a boolean.
datatypes = {5, 2.0, "one", True}
```

If we print this set we will get:

```python=
# Output (the order may differ when you run it)
{True, 2.0, 'one', 5}
```

Another way to create a set is to use the built-in function `set()`. You can pass it the values as a list (square brackets), like below, or as a tuple (parentheses). For instance:

```python=
another_dataset = set([5, 2.0, "one", True])
```

This also prints:

```python=
# Output (the order may differ when you run it)
{True, 2.0, 'one', 5}
```

Notice that the order the set was printed in is not the same as the order we declared it in. That is because sets are unordered: they do not keep track of the order of their elements.

###### Footnote: By "unique" we mean that a set holds each element only once.
The values themselves are just whatever we put in, so they are not unique in any other sense.

### No Duplicates

Sets automatically filter out duplicate data, so each element can be present only once. To demonstrate, let's bring back Nicole's grades from the lists lesson as a set, but with another 88 in the list.

```python=
nicole_grades = set([75, 100, 90, 96, 88, 82, 45, 80, 88])
```

If we print out Nicole's grades, instead of giving us the same values back, Python drops one of the 88s and gives us a set with no duplicates.

```python=
print(nicole_grades)
# Output (order may vary): {96, 100, 75, 45, 80, 82, 88, 90}
```

**Discussion:** Why do you think it would be useful to remove duplicates from data? What could be the downsides?

<details>
<summary> Think, then click 🤔 </summary>

Pro:
- **Time and Space Efficiency:** Removing duplicates can help us process the data quickly. For example, if we wanted to look through a list of names without any repeated entries, a set does that for us faster than checking by hand or writing an extra function.

Con:
- **Bad Data Preservation:** Keeping every piece of data, duplicates included, can be very important in some programs and projects. For instance, if we wanted to find the total of all of Nicole's grades, we shouldn't use a set to store them.

</details>

### Methods for Sets

Sets, like other data structures, have built-in methods. For sets these include `add()`, `remove()`, `discard()`, and `clear()`. Some of them work similarly to methods you have seen on other data structures.

`add()` - adds an element to the set

```python=
mosaic_tas = {"Dior", "Melyssa", "Muhiim", "Kamryn"}
mosaic_tas.add("Nicole")
print(mosaic_tas)
# Output (order may vary): {'Nicole', 'Kamryn', 'Muhiim', 'Melyssa', 'Dior'}
```

`discard()` and `remove()` both delete an item from a set. The difference is that `discard()` raises no error when the item isn't in the set, while `remove()` raises a `KeyError`.
`discard()`

```python=
my_set_with_discard = {1, 2, 3}
my_set_with_discard.discard(2)  # Removes 2 from the set
my_set_with_discard.discard(4)  # 4 isn't in the set, so no error and no change
print(my_set_with_discard)      # Output: {1, 3}
```

`remove()`

```python=
my_set_with_remove = {1, 2, 3}
my_set_with_remove.remove(2)    # Removes 2 from the set
# my_set_with_remove.remove(4)  # Raises KeyError since 4 isn't in the set
print(my_set_with_remove)       # Output: {1, 3}
```

`clear()` - removes all the elements in a set.

```python=
clear_set = {"one", "two", "three"}
print(clear_set)  # Output (order may vary): {'two', 'one', 'three'}
clear_set.clear()
print(clear_set)  # Output: set()
```

We can convert a list to a set, and vice versa:

```python=
monster = ["very", "very", "scary"]
set(monster)        # Output (order may vary): {"very", "scary"}
list(set(monster))  # Output (order may vary): ["very", "scary"]
```

<details>
<summary> Think then Open! Why would we want to use a set instead of a list? </summary>

Unlike lists, sets don't keep track of positions (because of their hashtable implementation) and don't store duplicate data. Because a set is built on a hashtable, checking whether a value is in it takes roughly constant time on average, while a list has to be scanned element by element.

</details>

## Dictionaries

Dictionaries are another way we can store information in Python. They map keys to values: you give the dictionary a key, and it uses that key to look up the value stored for it. (The concept of keys is similar to keys in hashtables.) Dictionaries are sometimes referred to as hashtables.

### Creating Dictionaries

Dictionaries are instantiated using curly braces.

```python=
tas_to_zodiac = {}
```

We can add keys and values to the dictionary. Let’s say we want the keys to be strings (TA names) and the values to also be strings (zodiac signs).
This is how we add to a dictionary:

```python=
tas_to_zodiac["Kam"] = "Capricorn"
tas_to_zodiac["Nicole"] = "Aries"
tas_to_zodiac["Fernando"] = "Libra"
```

#### Accessing Elements

Then, if we wanted to look up Nicole’s zodiac sign, we can use her name as the key! The dictionary returns the value to us in constant time on average.

```python=
print(tas_to_zodiac["Nicole"])
# Output: Aries
```

#### Modifying Elements

Modifying the value of a key in a dictionary uses almost the same syntax as accessing it: `dictionary[key] = new_value`.

Let's say we got Nicole's zodiac sign wrong and want to change it. We can do the following.

```python=
tas_to_zodiac["Nicole"] = "Scorpio"
print(tas_to_zodiac["Nicole"])
# Output: Scorpio
```

Now let's say we want to remove Fernando's entry from the dictionary since he is no longer a TA. We can use `pop()`. We used `pop()` on lists to remove an element. It works similarly here, except that for dictionaries you pass the key instead of the index.

```python=
tas_to_zodiac.pop("Fernando")
print(tas_to_zodiac)
# Output: {'Kam': 'Capricorn', 'Nicole': 'Scorpio'}
```

### Methods for Dictionaries

Just like sets, dictionaries have built-in methods. These methods operate on the key-value pairs of the dictionary. They are `keys()`, `values()`, and `items()`.

`keys()` - returns an object containing the keys of the dictionary.

```python=
brown_dining_ratings = {"Ratty": 5, "V-Dub": 7, "Andrews": 8.5, "Ivy Room": 10, "Jos": 6}
brown_dining_ratings.keys()
# Output: dict_keys(['Ratty', 'V-Dub', 'Andrews', 'Ivy Room', 'Jos'])
```

`values()` - returns an object containing the values of the dictionary.

```python=
project_grades = {"Jack": 10, "Kassidy": 80, "Chris": 60, "Crystal": 70}
project_grades.values()
# Output: dict_values([10, 80, 60, 70])
```

`items()` - returns an object containing the key-value pairs as tuples.
```python=
fruit_dictionary = {'Apples': 3, 'Oranges': 2, 'Bananas': 6}
fruit_dictionary.items()
# Output: dict_items([('Apples', 3), ('Oranges', 2), ('Bananas', 6)])
```

### Practice

Let's go back to the student grades from Professor Williams's class. Recall that we stored the students and their grades in a nested list last week.

**The Task**: Implement a function that returns a specific student's grades given a name.

Here is the data to keep in mind.

| Nicole | Melyssa | Muhiim |
| ------ | ------- | ------ |
| 85     | 92      | 78     |
| 90     | 88      | 85     |
| 78     | 95      | 80     |

What data structure would you choose? You can choose between a dictionary and a set. There is more than one good answer!

**Guiding Questions:**
- What do we need to keep in mind before choosing that data structure?
- What are the tradeoffs of each structure?
- What should this function return?

```python=
def student_grade(grades, student):
    ...
```

<details>
<summary>Solution!</summary>

For this solution, since grades can contain duplicates and we want to look students up by name, we should use a dictionary that maps each name to a list of grades.

```python=
grades = {"Nicole": [85, 90, 78], "Melyssa": [92, 88, 95], "Muhiim": [78, 85, 80]}

def student_grade(grades, student):
    if student in grades:        # checks whether the student is a key in the dictionary
        return grades[student]   # returns the list of grades for that student
    print("Person not found")    # only reached if the student is not in the dictionary
    return None
```

If you wanted, you could have used both data structures by making each student's grade list into a set (keeping in mind that a set would drop any repeated grades). You can try that on your own later if interested.

</details>

## Differences between Sets and Dictionaries

<details>
<summary> Discussion: When would you choose a set over a dictionary? And when would you choose a dictionary over a set? 🤔 </summary>

Sets are useful when all you need is to check whether a value exists. Dictionaries are great when you need to hold more information about each item (as a key-value pair), and they also offer efficient data retrieval by key.
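
As a rough sketch of that difference, the snippet below uses a set to answer "is this name present?" and a dictionary to answer "what is stored for this name?". The names `enrolled`, `grades_by_student`, and `lookup` are made up for this illustration and are not part of the lesson's Colab; the grades reuse the practice table.

```python
# Hypothetical names, made up for this illustration.
enrolled = {"Nicole", "Melyssa", "Muhiim"}   # set: only the names
grades_by_student = {                         # dictionary: name -> list of grades
    "Nicole": [85, 90, 78],
    "Melyssa": [92, 88, 95],
    "Muhiim": [78, 85, 80],
}


def lookup(name):
    """Return the grades stored for name, or None if the name is not enrolled."""
    if name not in enrolled:           # membership check on the set
        return None
    return grades_by_student[name]     # key lookup on the dictionary


print(lookup("Melyssa"))  # [92, 88, 95]
print(lookup("Dior"))     # None
```

The set can only say yes or no, which is enough when existence is all we care about. Once each name has data attached to it, the dictionary is the better fit.
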
</details>

Some key info to know:

| Sets                     | Dictionaries                                             |
| ------------------------ | -------------------------------------------------------- |
| Can't contain duplicates | Can't have duplicate keys, but can have duplicate values |
| Mutable                  | Mutable                                                  |
| Can't index into         | Can index into by key                                    |

Sets are useful for handling unique data, checking membership, and performing set operations like union, intersection, and difference. Dictionaries are widely used for data indexing, mapping, and efficient data retrieval.

## Challenges

### Word Counts continued

Let's do some more File IO work. Use this [Google Colab](https://colab.research.google.com/drive/1RH9e5w5aYqvFfc31AyPWkDRldL7G6C9h?usp=sharing) link to do the challenges. It's the same one from last week.

Last week, we talked about how we could count how many times a word appears using lists and loops. What if we wanted to count how many times every single word appears? Is it possible to do it all in one go?

Let's try writing a helper function to do that. Since we've decided to use a dictionary, we'll start by writing the skeleton of the function, and then add the dictionary (which starts out empty) and return it.

```python=
def count_words(s: str):
    word_count = {}
    # Code goes here, I don't know what yet
    return word_count
```

Now we just have to figure out how to convert the input string into the dictionary. To do that, we need to break the goal down into subtasks. What are they?

<details>
<summary>Think, then click!</summary>

We need to:
* break up the input string into words; and
* count the number of times each word occurs

To break up the input, we'll use the `split()` method again. To count the words, we'll loop over the list it gives back!

</details>

### Challenge 1: Word Counts

Create a helper function that counts the number of times each word appears and stores those counts in a dictionary. What will the key be? What is the value?
<details>
<summary>Solution!</summary>

```python=
def count_words(s: str):
    word_count = {}                  # initialize the counts dictionary
    for word in s.split():           # splits the string into words and loops over them
        if word not in word_count:   # checks if the word is already in the dictionary
            word_count[word] = 1     # if not, adds it with a count of one
        else:
            word_count[word] += 1    # if so, adds one to its existing count
    return word_count                # hands the finished dictionary back to the caller
```

</details>

### Challenge 2: Common Words

What can we do with this dictionary? Well, lots of things! But for today, let's try to figure out what the most common word in the text is.

Create a function that finds the most common word in the text using the dictionary you made in `count_words`, `word_count`.

<details>
<summary>Solution!</summary>

There are a few ways to do this, but let's opt for another loop. We can loop over a dictionary just like over a list; the loop gives us its keys.

```python=
def most_common(word_count: dict):
    most_common_word = ''    # starts as an empty string, since the keys are strings
    most_common_count = 0    # starts as an int, since the values are counts
    for word in word_count:  # loops through the keys of the word_count dictionary
        if word_count[word] > most_common_count:  # is this word's count higher than the best so far?
            most_common_word = word               # if so, remember this word
            most_common_count = word_count[word]  # and remember its count
    return most_common_word
```

</details>

### Questions?

### Helpful Resources

- [Further Information on Sets](https://www.geeksforgeeks.org/sets-in-python/)
- [Sets over Lists](https://towardsdatascience.com/3-reasons-to-use-sets-over-lists-82b36980c9fd)