{%hackmd @88u1wNUtQpyVz9FsQYeBRg/r1vSYkogS %}

# Datacamp Python Data Science Toolbox (Part 2)
> Lee Tsung-Tang
###### tags: `python` `iterator` `list comprehension` `generator` `datacamp`

---

[TOC]

---

## You’ve learned:
* Writing custom functions
* Using custom functions in data science
* List comprehensions
    * Wrangle data to create other lists
* Iterators
    * You’ve encountered these before!
    * Rapidly iterate data science protocols and procedures over sets of objects

---

## Using iterators in PythonLand
### Iterating with a for loop
* We can iterate over a list using a for loop
```python=
In [1]: employees = ['Nick', 'Lore', 'Hugo']

In [2]: for employee in employees:
   ...:     print(employee)
Nick
Lore
Hugo
```

```python=
In [1]: for letter in 'DataCamp':
   ...:     print(letter)
D
a
t
a
C
a
m
p
```

```python=
In [1]: for i in range(4):
   ...:     print(i)
0
1
2
3
```

---

### Iterators vs. iterables
* Iterable
    * Examples: lists, strings, dictionaries, file connections
    * An object with an associated `iter()` method
    * Applying `iter()` to an iterable creates an iterator
* Iterator
    * Produces next value with `next()`

---

### Iterating over iterables: next()
```python=
In [1]: word = 'Da'

In [2]: it = iter(word)

In [3]: next(it)
Out[3]: 'D'

In [4]: next(it)
Out[4]: 'a'

In [5]: next(it)
-----------------------------------------------------------------
StopIteration                    Traceback (most recent call last)
<ipython-input-11-2cdb14c0d4d6> in <module>()
----> 1 next(it)
StopIteration:
```

---

### Iterating at once with `*`
```python=
In [1]: word = 'Data'

In [2]: it = iter(word)

In [3]: print(*it)
D a t a

In [4]: print(*it)  # No more values to go through!
```

---

### Iterating over dictionaries
```python=
In [1]: pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

In [2]: for key, value in pythonistas.items():
   ...:     print(key, value)
hugo bowne-anderson
francis castro
```

---

### Iterating over file connections
```python=
In [1]: file = open('file.txt')

In [2]: it = iter(file)

In [3]: print(next(it))
This is the first line.

In [4]: print(next(it))
This is the second line.
```

---

### practice

- Iterators vs Iterables
:::info
Let's do a quick recall of what you've learned about iterables and iterators. Recall from the video that an iterable is an object that can return an iterator, while an iterator is an object that keeps state and produces the next value when you call `next()` on it. In this exercise, you will identify which object is an iterable and which is an iterator.

The environment has been pre-loaded with the variables `flash1` and `flash2`. Try printing out their values with `print()` and `next()` to figure out which is an iterable and which is an iterator.

<br>

**Possible Answers**

```python=
In [1]: flash1
Out[1]: ['jay garrick', 'barry allen', 'wally west', 'bart allen']

In [2]: flash2
Out[2]: <list_iterator at 0x7f7f3b5700f0>
```
- [ ] Both flash1 and flash2 are iterators.
- [ ] Both flash1 and flash2 are iterables.
- [x] flash1 is an iterable and flash2 is an iterator.
:::

---

- Iterating over iterables (1)
:::info
Great, you're familiar with what iterables and iterators are! In this exercise, you will reinforce your knowledge about these by iterating over and printing from iterables and iterators.

You are provided with a list of strings `flash`. You will practice iterating over the list by using a `for` loop. You will also create an iterator for the list and access the values from the iterator.
:::

:::success
* Create a `for` loop to loop over `flash` and print the values in the list. Use `person` as the loop variable.
* Create an iterator for the list `flash` and assign the result to `superspeed`.
* Print each of the items from `superspeed` using `next()` 4 times.

```python=
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)
>>> jay garrick
>>> barry allen
>>> wally west
>>> bart allen

# Create an iterator for flash: superspeed
superspeed = iter(flash)

# Print each item from the iterator
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))
>>> jay garrick
>>> barry allen
>>> wally west
>>> bart allen
```
:::

---

- Iterating over iterables (2)
:::info
One of the things you learned about in this chapter is that not all iterables are actual lists. A couple of examples that we looked at are strings and the use of the `range()` function. In this exercise, we will focus on the `range()` function.

You can use `range()` in a `for` loop as if it's a list to be iterated over:
```python=
for i in range(5):
    print(i)
```
Recall that `range()` doesn't actually create the list; instead, it creates a range object with an iterator that produces the values until it reaches the limit (in the example, until the value 4). If `range()` created the actual list, calling it with a value of $10^{100}$ may not work, especially since a number as big as that may go over a regular computer's memory. The value $10^{100}$ is actually what's called a Googol which is a 1 followed by a hundred 0s. That's a huge number!

Your task for this exercise is to show that calling `range()` with $10^{100}$ won't actually pre-create the list.
:::

:::success
* Create an iterator object `small_value` over `range(3)` using the function `iter()`.
* Using a `for` loop, iterate over `range(3)`, printing the value for every iteration. Use `num` as the loop variable.
* Create an iterator object `googol` over `range(10 ** 100)`.
```python=
# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))
>>> 0
>>> 1
>>> 2

# Loop over range(3) and print the values
for num in range(3):
    print(num)
>>> 0
>>> 1
>>> 2

# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
>>> 0
>>> 1
>>> 2
>>> 3
>>> 4
```
:::

---

- Iterators as function arguments
:::info
You've been using the `iter()` function to get an iterator object, as well as the `next()` function to retrieve the values one by one from the iterator object.

There are also functions that take iterators and iterables as arguments. For example, the `list()` and `sum()` functions return a list and the sum of elements, respectively.

In this exercise, you will use these functions by passing an iterable from `range()` and then printing the results of the function calls.
:::

:::success
* Create a `range` object that would produce the values from 10 to 20 using `range()`. Assign the result to `values`.
* Use the `list()` function to create a list of values from the range object `values`. Assign the result to `values_list`.
* Use the `sum()` function to get the sum of the values from 10 to 20 from the range object `values`. Assign the result to `values_sum`.
```python=
In [1]: # Create a range object: values
values = range(10, 21)

# Print the range object
print(values)
range(10, 21)

In [2]: # Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

In [3]: # Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)
165
```
:::

---

## Playing with iterators
### Using enumerate()
Builds an iterator of (index, value) pairs.

```python=
In [1]: avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

In [2]: e = enumerate(avengers)

In [3]: print(type(e))
#<class 'enumerate'>

In [4]: e_list = list(e)

In [5]: print(e_list)
[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]
```
A typical loop then works on the index or the value of each tuple.

---

### enumerate() and unpack
```python=
In [1]: avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

In [2]: for index, value in enumerate(avengers):
   ...:     print(index, value)
0 hawkeye
1 iron man
2 thor
3 quicksilver

In [3]: for index, value in enumerate(avengers, start=10):
   ...:     print(index, value)
10 hawkeye
11 iron man
12 thor
13 quicksilver
```

---

### Using zip()
Builds an iterator that pairs up the values of two lists.

```python=
In [1]: avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

In [2]: names = ['barton', 'stark', 'odinson', 'maximoff']

In [3]: z = zip(avengers, names)

In [4]: print(type(z))
#<class 'zip'>

In [5]: z_list = list(z)

In [6]: print(z_list)
[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
```

---

### zip() and unpack
```python=
In [1]: avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

In [2]: names = ['barton', 'stark', 'odinson', 'maximoff']

In [3]: for z1, z2 in zip(avengers, names):
   ...:     print(z1, z2)
hawkeye barton
iron man stark
thor odinson
quicksilver maximoff
```

---

### Print zip with `*`
```python=
In [1]: avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

In [2]: names = ['barton', 'stark', 'odinson',
'maximoff']

In [3]: z = zip(avengers, names)

In [4]: print(*z)
('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
```

---

### practice

- Using enumerate
:::info
You're really getting the hang of using iterators, great job!

You've just gained several new ideas on iterators from the last video and one of them is the `enumerate()` function. Recall that `enumerate()` returns an *enumerate object* that produces a ++sequence of tuples++, and each of the tuples is an ++index-value++ pair.

In this exercise, you are given a list of strings `mutants` and you will practice using `enumerate()` on it by printing out a list of tuples and unpacking the tuples using a `for` loop.
:::

:::success
* Create a list of tuples from `mutants` and assign the result to `mutant_list`. Make sure you generate the tuples using `enumerate()` and turn the result from it into a list using `list()`.
* Complete the first `for` loop by unpacking the tuples generated by calling `enumerate()` on `mutants`. Use `index1` for the index and `value1` for the value when unpacking the tuple.
* Complete the second `for` loop similarly as with the first, but this time change the starting index to start from `1` by passing it in as an argument to the `start` parameter of `enumerate()`. Use `index2` for the index and `value2` for the value when unpacking the tuple.
```python=
In [1]: # Create a list of strings: mutants
mutants = ['charles xavier',
           'bobby drake',
           'kurt wagner',
           'max eisenhardt',
           'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)
[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]

In [2]: # Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)
0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pryde

In [3]: # Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)
1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pryde
```
:::

---

- Using zip
:::info
Another interesting function that you've learned is `zip()`, which takes any number of iterables and returns a *zip object* that is an ++iterator of tuples++. If you wanted to print the values of a `zip` object, you can convert it into a list and then print it. Printing just a `zip` object will not return the values unless you unpack it first. In this exercise, you will explore this for yourself.

Three lists of strings are pre-loaded: `mutants`, `aliases`, and `powers`. First, you will use `list()` and `zip()` on these lists to generate a list of tuples. Then, you will create a `zip` object using `zip()`. Finally, you will unpack this `zip` object in a `for` loop to print the values in each tuple. Observe the different output generated by printing the list of tuples, then the `zip` object, and finally, the tuple values in the `for` loop.
:::

:::success
* Using `zip()` with `list()`, create a list of tuples from the three lists `mutants`, `aliases`, and `powers` (in that order) and assign the result to `mutant_data`.
* Using `zip()`, create a zip object called `mutant_zip` from the three lists `mutants`, `aliases`, and `powers`.
* Complete the `for` loop by unpacking the `zip` object you created and printing the tuple values. Use `value1`, `value2`, `value3` for the values from each of `mutants`, `aliases`, and `powers`, in that order.

```python=
In [10]: # Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)
[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pryde', 'shadowcat', 'intangibility')]

In [11]: # Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)
<zip object at 0x7f2938773088>

In [12]: # Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)
charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pryde shadowcat intangibility
```
:::

---

- Using * and zip to 'unzip'
:::info
You know how to use `zip()` as well as how to print out values from a `zip` object. Excellent!

Let's play around with `zip()` a little more. There is no unzip function for doing the reverse of what `zip()` does. We can, however, reverse what has been zipped together by using `zip()` with a little help from `*`! `*` unpacks an iterable such as a list or a tuple into positional arguments in a function call.

In this exercise, you will use `*` in a call to `zip()` to unpack the tuples produced by `zip()`.

Two tuples of strings, `mutants` and `powers` have been pre-loaded.
:::

:::success
* Create a `zip` object by using `zip()` on `mutants` and `powers`, in that order. Assign the result to `z1`.
* Print the tuples in `z1` by unpacking them into positional arguments using the `*` operator in a `print()` call.
* Because the previous `print()` call would have exhausted the elements in `z1`, recreate the `zip` object you defined earlier and assign the result again to `z1`.
* 'Unzip' the tuples in `z1` by unpacking them into positional arguments using the `*` operator in a `zip()` call. Assign the results to `result1` and `result2`, in that order.
* The last `print()` statements print the output of comparing `result1` to `mutants` and `result2` to `powers`. Click `Submit Answer` to see if the unpacked `result1` and `result2` are equivalent to `mutants` and `powers`, respectively.

```python=
In [1]: # Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)
('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')

In [2]: # Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original tuples
print(result1 == mutants)
print(result2 == powers)
True
True
```
:::

---

## Using iterators for big data
### Loading data in chunks
* There can be too much data to hold in memory
* Solution: load data in chunks!
    * Pandas function: `read_csv()`
        * Specify the chunk: `chunksize`

---

### Iterating over data
```python=
In [1]: import pandas as pd

In [2]: result = []

In [3]: for chunk in pd.read_csv('data.csv', chunksize=1000):
   ...:     result.append(sum(chunk['x']))

In [4]: total = sum(result)

In [5]: print(total)
4252532
```

```python=
In [1]: import pandas as pd

In [2]: total = 0

In [3]: for chunk in pd.read_csv('data.csv', chunksize=1000):
   ...:     total += sum(chunk['x'])

In [4]: print(total)
4252532
```

### practice

- Processing large amounts of Twitter data
:::info
Sometimes, the data we have to process reaches a size that is too much for a computer's memory to handle. This is a common problem faced by data scientists. A solution to this is to process an entire data source chunk by chunk, instead of all at once in a single go.

In this exercise, you will do just that. You will process a large csv file of Twitter data in the same way that you processed `'tweets.csv'` in the *Bringing it all together* exercises of the prequel course, but this time, working on it in chunks of 10 entries at a time.

If you are interested in learning how to access Twitter data so you can work with it on your own system, refer to Part 2 of the DataCamp course on Importing Data in Python.

The pandas package has been imported as `pd` and the file `'tweets.csv'` is in your current directory for your use. Go for it!
:::

:::success
* Initialize an empty dictionary `counts_dict` for storing the results of processing the Twitter data.
* Iterate over the `'tweets.csv'` file by using a `for` loop. Use the loop variable `chunk` and iterate over the call to `pd.read_csv()` with a `chunksize` of 10.
* In the inner loop, iterate over the column `'lang'` in `chunk` by using a `for` loop. Use the loop variable `entry`.

Read the data in batches and count the occurrences of each value in the `'lang'` column, one entry at a time.

```python=
In [7]: counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict.keys():
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)
{'en': 97, 'et': 1, 'und': 2}
```
:::

---

- Extracting information for large amounts of Twitter data
:::info
Great job chunking out that file in the previous exercise. You now know how to deal with situations where you need to process a very large file and that's a very useful skill to have!

It's good to know how to process a file in smaller, more manageable chunks, but it can become very tedious having to write and rewrite the same code for the same task each time. In this exercise, you will be making your code more *reusable* by putting your work in the last exercise in a ***function definition***.

The pandas package has been imported as `pd` and the file `'tweets.csv'` is in your current directory for your use.
:::

:::success
* Define the function `count_entries()`, which has 3 parameters. The first parameter is `csv_file` for the filename, the second is `c_size` for the chunk size, and the last is `colname` for the column name.
* Iterate over the file in `csv_file` by using a `for` loop. Use the loop variable `chunk` and iterate over the call to `pd.read_csv()`, passing `c_size` to `chunksize`.
* In the inner loop, iterate over the column given by `colname` in `chunk` by using a `for` loop. Use the loop variable `entry`.
* Call the `count_entries()` function by passing to it the filename `'tweets.csv'`, the size of chunks `10`, and the name of the column to count, `'lang'`. Assign the result of the call to the variable `result_counts`.
```python=
In [4]: def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)
{'en': 97, 'et': 1, 'und': 2}
```
:::

---

### What’s next?
* List comprehensions and generators
* List comprehensions:
    * Create lists from other lists, DataFrame columns, etc.
    * Single line of code
    * Often more efficient than building the list with a for loop

___

## List comprehensions and generators
### Populate a list with a for loop
```python=
In [1]: nums = [12, 8, 21, 3, 16]

In [2]: new_nums = []

In [3]: for num in nums:
   ...:     new_nums.append(num + 1)

In [4]: print(new_nums)
[13, 9, 22, 4, 17]
```

---

### A list comprehension
```python=
In [1]: nums = [12, 8, 21, 3, 16]

In [2]: new_nums = [num + 1 for num in nums]

In [3]: print(new_nums)
[13, 9, 22, 4, 17]
```

---

### For loop and list comprehension syntax
![](https://i.imgur.com/QEVgLrg.png)

---

### List comprehension with range()
```python=
In [1]: result = [num for num in range(11)]

In [2]: print(result)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

---

### List comprehensions
* Collapse for loops for building lists into a single line
* Components
    * Iterable
    * Iterator variable (represents members of the iterable)
    * Output expression

---

### Nested loops (1)
```python=
In [1]: pairs_1 = []

In [2]: for num1 in range(0, 2):
   ...:     for num2 in range(6, 8):
   ...:         pairs_1.append((num1, num2))  # append one tuple per pair

In [3]: print(pairs_1)
[(0, 6), (0, 7), (1, 6), (1, 7)]
```
How to do this with a list comprehension?
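
As an aside before the comprehension version: the pairs built by the nested loops above are the Cartesian product of the two ranges. A minimal sketch, assuming the standard-library `itertools.product` (not part of the course material; the names `pairs_loop` and `pairs_product` are illustrative only), checks that both approaches give the same result:

```python
from itertools import product

# Nested loops: the outer loop drives num1, the inner loop drives num2.
pairs_loop = []
for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_loop.append((num1, num2))

# product() yields the same tuples in the same order (rightmost iterable varies fastest).
pairs_product = list(product(range(0, 2), range(6, 8)))

print(pairs_loop == pairs_product)
```

The ordering matters when comparing results: the first `for` is the outer loop, in the nested-loop form and in `product()` alike.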

---

### Nested loops (2)
```python=
In [1]: pairs_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

In [2]: print(pairs_2)
[(0, 6), (0, 7), (1, 6), (1, 7)]
```
**Tradeoff: readability**

---

### practice

- Write a basic list comprehension
:::info
In this exercise, you will practice what you've learned from the video about writing list comprehensions. You will write a list comprehension and identify the output that will be produced.

The following list has been pre-loaded in the environment.
```
doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']
```
What would a list comprehension that produces a list of the *first character* of each string in `doctor` look like? Note that the list comprehension uses `doc` as the iterator variable. What will the output be?
:::

:::success
- [ ] The list comprehension is `[for doc in doctor: doc[0]]` and produces the list `['h', 'c', 'c', 't', 'w']`.
- [x] The list comprehension is `[doc[0] for doc in doctor]` and produces the list `['h', 'c', 'c', 't', 'w']`.
- [ ] The list comprehension is `[doc[0] in doctor]` and produces the list `['h', 'c', 'c', 't', 'w']`.

```python=
In [1]: [for doc in doctor: doc[0]]
  File "<stdin>", line 1
    [for doc in doctor: doc[0]]
       ^
SyntaxError: invalid syntax

In [2]: [doc[0] for doc in doctor]
Out[2]: ['h', 'c', 'c', 't', 'w']

In [3]: [doc[0] in doctor]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    [doc[0] in doctor]
NameError: name 'doc' is not defined
```
:::

---

- List comprehension over iterables
:::info
You know that list comprehensions can be built over iterables. Given the following objects below, which of these can we build list comprehensions over?
```python=
doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

range(50)

underwood = 'After all, we are nothing more or less than what we choose to reveal.'

jean = '24601'

flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

valjean = 24601
```
:::

:::success
- [ ] You can build list comprehensions over all the objects except the string of number characters `jean`.
- [ ] You can build list comprehensions over all the objects except the string lists `doctor` and `flash`.
- [ ] You can build list comprehensions over all the objects except `range(50)`.
- [x] You can build list comprehensions over all the objects except the integer object `valjean`.
:::

You cannot loop over a number: an integer is not iterable.

___

- Writing list comprehensions
:::info
You now have all the knowledge necessary to begin writing list comprehensions! Your job in this exercise is to write a list comprehension that produces a list of the squares of the numbers ranging from 0 to 9.
:::

:::success
Using the range of numbers from `0` to `9` as your iterable and `i` as your iterator variable, write a list comprehension that produces a list of numbers consisting of the squared values of `i`.

```python=
In [1]: squares = [i**2 for i in range(10)]

In [2]: print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
:::

---

- Nested list comprehensions
:::info
Great! At this point, you have a good grasp of the basic syntax of list comprehensions. Let's push your code-writing skills a little further. In this exercise, you will be writing a list comprehension *within* another list comprehension, or nested list comprehensions. It sounds a little tricky, but you can do it!

Let's step aside for a while from strings. One of the ways in which lists can be used is in representing multi-dimensional objects such as **matrices**. Matrices can be represented as a list of lists in Python. For example, a 5 x 5 matrix with values `0` to `4` in each row can be written as:
```python=
matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]
```
Your task is to recreate this matrix by using nested list comprehensions.
Recall that you can create one of the rows of the matrix with a single list comprehension. To create the list of lists, you simply have to supply the list comprehension as the **output expression** of the overall list comprehension:

**++`[`[output expression] `for` iterator variable `in` iterable`]`++**

Note that here, the **output expression** is itself a list comprehension.
:::

:::success
* In the inner list comprehension - that is, the **output expression** of the nested list comprehension - create a list of values from `0` to `4` using `range()`. Use `col` as the iterator variable.
* In the iterable part of your nested list comprehension, use `range()` to count 5 rows - that is, create a list of values from `0` to `4`. Use `row` as the iterator variable; note that you won't be needing this to create values in the list of lists.

```python=
In [1]: # The target 5 x 5 matrix written out as a list of lists: matrix
matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]

# Build the same matrix with a nested list comprehension
[[col for col in range(5)] for row in range(5)]
Out[1]:
[[0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4]]
```
:::

---

## Advanced comprehensions
### Conditionals in comprehensions
Conditionals on the iterable
```python=
In [1]: [num ** 2 for num in range(10) if num % 2 == 0]
Out[1]: [0, 4, 16, 36, 64]
```

Python documentation on the % operator:
![](https://i.imgur.com/GYOtRbr.png)

```python=
In [1]: 5 % 2
Out[1]: 1

In [2]: 6 % 2
Out[2]: 0
```

Conditionals on the output expression
```python=
In [2]: [num ** 2 if num % 2 == 0 else 0 for num in range(10)]
Out[2]: [0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
```

---

### Dict comprehensions
* Create dictionaries
* Use curly braces `{}` instead of brackets `[]`

```python=
In [1]: pos_neg = {num: -num for num in range(9)}

In [2]: print(pos_neg)
{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}

In [3]: print(type(pos_neg))
<class 'dict'>
```

---

### practice

- Using
conditionals in comprehensions (1)
:::info
You've been using list comprehensions to build lists of values, sometimes using operations to create these values.

An interesting mechanism in list comprehensions is that you can also create lists with values that meet only a certain condition. One way of doing this is by using conditionals on iterator variables. In this exercise, you will do exactly that!

Recall from the video that you can apply a conditional statement to test the iterator variable by adding an `if` statement in the optional predicate expression part after the `for` statement in the comprehension:

++`[` output expression `for` iterator variable `in` iterable `if` predicate expression `]`++.

You will use this recipe to write a list comprehension for this exercise. You are given a list of strings `fellowship` and, using a list comprehension, you will create a list that only includes the members of `fellowship` that have 7 characters or more.
:::

:::success
* Use `member` as the iterator variable in the list comprehension. For the conditional, use `len()` to evaluate the iterator variable. Note that you only want strings with 7 characters or more.

```python=
In [2]: # Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]

# Print the new list
print(new_fellowship)
['samwise', 'aragorn', 'legolas', 'boromir']
```
:::

---

- Using conditionals in comprehensions (2)
:::info
In the previous exercise, you used an `if` conditional statement in the predicate expression part of a list comprehension to evaluate an iterator variable. In this exercise, you will use an `if-else` statement on the output expression of the list.

You will work on the same list, `fellowship` and, using a list comprehension and an `if-else` conditional statement in the output expression, create a list that keeps members of `fellowship` with 7 or more characters and replaces others with an empty string. Use `member` as the iterator variable in the list comprehension.
:::

:::success
* In the output expression, keep the string as-is **if** the number of characters is >= 7, **else** replace it with an empty string - that is, `''` or `""`.

```python=
In [1]: # Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else "" for member in fellowship]

# Print the new list
print(new_fellowship)
['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']
```
:::

---

- Dict comprehensions
:::info
Comprehensions aren't relegated merely to the world of lists. There are many other objects you can build using comprehensions, such as dictionaries, pervasive objects in Data Science. You will create a dictionary using the comprehension syntax for this exercise. In this case, the comprehension is called a **dict comprehension**.

Recall that the main difference between a list comprehension and a dict comprehension is the use of curly braces `{}` instead of `[]`. Additionally, members of the dictionary are created using a colon `:`, as in `<key> : <value>`.

You are given a list of strings `fellowship` and, using a **dict comprehension**, create a dictionary with the members of the list as the keys and the length of each string as the corresponding values.
:::

:::success
* Create a dict comprehension where the key is a string in `fellowship` and the value is the length of the string. Remember to use the syntax `<key> : <value>` in the output expression part of the comprehension to create the members of the dictionary. Use `member` as the iterator variable.
```python=
In [3]: # Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)
{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}
```
:::

---

- List comprehensions vs generators
:::info
You've seen from the videos that list comprehensions and generator expressions look very similar in their syntax, except for the use of ++parentheses `()` in generator expressions++ and ++brackets `[]` in list comprehensions++.

In this exercise, you will recall the difference between list comprehensions and generators. To help with that task, the following code has been pre-loaded in the environment:
```python=
# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)
```
Try to play around with `fellow1` and `fellow2` by figuring out their types and printing out their values. Based on your observations and what you can recall from the video, select from the options below the best description for the difference between list comprehensions and generators.
:::

:::success
- [ ] List comprehensions and generators are not different at all; they are just different ways of writing the same thing.
- [x] A list comprehension produces a list as output, a generator produces a generator object.
- [ ] A list comprehension produces a list as output that can be iterated over, a generator produces a generator object that can't be iterated over.
```python=
In [2]: fellow1 == fellow2
Out[2]: False

In [3]: type(fellow1)
Out[3]: list

In [4]: type(fellow2)
Out[4]: generator
```
:::

---

- Write your own generator expressions
:::info
You are familiar with what generators and generator expressions are, as well as how they differ from list comprehensions. In this exercise, you will practice building generator expressions on your own.

Recall that generator expressions basically have the same syntax as list comprehensions, except that they use parentheses `()` instead of brackets `[]`; this should make things feel familiar! Furthermore, if you have ever iterated over a dictionary with `.items()`, or used the `range()` function, you have already worked with objects that, like generators, produce their values lazily instead of storing them all in a list.

Now, you will start simple by creating a generator object that produces numeric values.
:::

:::success
* Create a generator object that will produce values from `0` to `30`. Assign the result to `result` and use `num` as the iterator variable in the generator expression.
* Print the first `5` values by using `next()` appropriately in `print()`.
* Print the rest of the values by using a `for` loop to iterate over the generator object.

```python=
In [9]: result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))
0
1
2
3
4

In [10]: # Print the rest of the values
for value in result:
    print(value)
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
```
:::

---

- Changing the output in generator expressions
:::info
Great! At this point, you already know how to write a basic generator expression. In this exercise, you will push this idea a little further by adding to the output expression of a generator expression.
Because generator expressions and list comprehensions are so alike in syntax, this should be a familiar task for you!

You are given a list of strings `lannister` and, using a generator expression, create a generator object that you will iterate over to print its values.
:::

:::success
* Write a generator expression that will generate the **lengths** of each string in `lannister`. Use `person` as the iterator variable. Assign the result to `lengths`.
* Supply the correct iterable in the `for` loop for printing the values in the generator object.

```python=
In [4]: lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)
6
5
5
6
7
```
:::

---
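
The exercises above contrast list comprehensions with generator expressions. Below is a minimal sketch of the practical difference. The size `n` and the variable names are arbitrary and chosen only for illustration. The standard-library `sys.getsizeof` reports the size of the container object itself, not of the elements it references.

```python
import sys

# Arbitrary size, chosen only for illustration.
n = 100_000

squares_list = [num ** 2 for num in range(n)]  # every value is built up front
squares_gen = (num ** 2 for num in range(n))   # values are produced on demand

# The list object grows with n; the generator object stays small.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))

# A generator is single-pass: once consumed, it has nothing left to yield.
total = sum(squares_gen)
leftover = list(squares_gen)
print(total == sum(squares_list), leftover)
```

If the values are needed more than once, either keep the list or recreate the generator, in the same way the `zip` object had to be recreated after `print(*z1)` exhausted it.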