
# More Testing, Comprehensions, and Problem Solving

## Table of Contents

0. [Tim's Homework](#tim)
1. [Testing functions that modify memory](#memory)
2. [Problem Solving](#solving)
3. [Comprehensions](#comprehensions)

### A note on these notes

In class, we shuffled the content a little bit. These notes cover two different lectures: Friday September 17th and Monday September 20th.

## Tim's Homework <a name="tim"></a>

Someone asked how they could run the interactive Python prompt with everything in a file already loaded. I looked into this, and the answer is to use the `-i` flag when running Python. On Tim's laptop, this looks like:

```
python3 -i <filename.py>
```

### A reminder: Tim's office hours

Tim's conceptual hours are Mondays from 2:30 to 3:20pm. (Sometimes these run a bit later, but not always; hence the :20.) In order to safely encourage many people to attend, these are held on Zoom at [this link](https://brown.zoom.us/my/tim.browncs).

_Note: a previous version of these notes erroneously said 3-3:50, which is wrong!_

If you need to talk one-on-one about anything private, say so privately by email or other means to make sure Tim makes time.

## Testing functions that modify memory <a name="memory"></a>

Let’s say we have this (strange) function:

```python
def add_len_to_list(l: list):
    l.append(len(l))
```

How can we test it? We can't look at what it returns, since it doesn’t return anything (technically, it returns the value `None`). We might try to change the function so that its body is `return l.append(len(l))`, but it turns out that doesn't help: the function still returns `None`. This is because the `append` method itself returns `None`. It's not designed to produce a new list, but rather to modify the current list.

So what can we do? We can start by creating a test list, calling `add_len_to_list` on it, and then asserting something about the test list.
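You can check both of these claims yourself, for example at a prompt started with `python3 -i`. The sketch below shows that the call gives back `None`, yet the list it was given has still changed. That changed list is what a test can examine. Here `my_list` is just an illustrative name.

```python
def add_len_to_list(l: list):
    l.append(len(l))

my_list = [5, 7]                   # an illustrative list with two elements
result = add_len_to_list(my_list)  # call it and keep whatever it returns

print(result)         # None: the function has no return statement
print(my_list)        # [5, 7, 2]: the list itself was changed
print([1].append(2))  # None: append modifies a list but returns None
```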
An example might look like:

```python
def test_add_len():
    l = []
    add_len_to_list(l)
    assert l == [0]
```

This test checks the behavior of `add_len_to_list` when it's given an empty list. What's going on in memory when we run this test? (See the lecture capture for pictures.)

### A warning

Watch out for reusing containers between tests that modify those containers. For instance, suppose I'd written:

```python
def test_add_len():
    l = []
    add_len_to_list(l)
    assert l == [0]
    add_len_to_list(l)
    assert l == [0]
```

I might have been very surprised. (Why?)

This is a toy example of something that can become a problem in a larger suite of tests. Be careful about _changes to state_ in your test functions. Try to keep changes isolated to a single function, and be aware of how the changes happen within it.

### A digression on equality

Note that I said "an" empty list, not "the" empty list. It's an important distinction, but unfortunately one that many people (including me!) can miss when speaking. The difference comes down to context: do we mean to identify a list by its contents only, or are we speaking of a particular object in memory? More on this in a future lecture, but for now, notice what happens if we run:

```python
list1 = [2]
list2 = [2]
print(list1 == list2)
```

What does this tell you about how Python compares lists? Is it comparing them according to their contents, or according to whether they are the same object in memory?

## Problem Solving <a name="solving"></a>

Let's say we have a problem we want to solve by writing a program. How should we start? In an earlier lecture, I recommended writing down what data you have (and what form it's in), and what you want to produce. From there, you can break down the problem into smaller pieces. If you can solve each subproblem, and then combine them together, you can solve the whole problem.

Some subproblems can be solved by writing helper functions, and we’ve seen a number of examples of this.
Other times, subproblems correspond to calls to built-in functions, or to particular variables we keep track of.

Breaking down problems is a skill you develop over time, and it's a skill we'll be practicing a lot in this class. The most important thing to remember is this: even if you aren't sure how to do something in Python, you can still break it down into smaller pieces. Here's an example.

### Example 1: Cast of Characters

Let’s say we wanted to build a set of all of the words in a text that start with a capital letter, with each of those words fully uppercased. We can call our function `cast_of_chars`. Let’s go ahead and write some tests for this first:

```python
# in wordcount.py:
def cast_of_chars(txt: str) -> set:
    pass
```

```python
# in wordcount_test.py:
from wordcount import cast_of_chars

def test_cast_of_chars():
    assert cast_of_chars("") == set()
    assert cast_of_chars("Ashley") == {"ASHLEY"}
    assert cast_of_chars("'hello,' said Ashley") == {"ASHLEY"}
    assert cast_of_chars("hello Ashley hello Ben hello Ashley") == {"ASHLEY", "BEN"}
```

Ok, now we have a (very) rough understanding of what the function should do. How do we get started? Well, let's follow the recipe from class.

* We know that we're taking in a single string containing the text we want to process.
* We know we want to return a set of strings, each containing a fully capitalized word.

Now, let's break down the task into a bunch of little pieces. What might we need to do?

<details>
<summary>Think, then click!</summary>

In class, some thoughts included:

* splitting the input into a list of separate words;
* removing punctuation from the words;
* filtering out uncapitalized words; and
* fully capitalizing each remaining word.

</details>

Note that these suggest a bit of structure: one naturally flows into another, and so on. This won't always be the case, but it certainly helps here!

Now we can write a skeleton for the function.
Maybe we don't yet know how to actually do all the subtasks, but we can give the result of each a name and record which uses which.

<details>
<summary>Think, then click!</summary>

```python
# in wordcount.py:
def cast_of_chars(s: str) -> set:
    words = []       # ??? should use s
    caps_words = []  # ??? should use words
    cleaned = []     # ??? should use caps_words
    all_caps = []    # ??? should use cleaned
    return set(all_caps)
```

</details>

Do we know how to split a string up by blank space? Yes: use `split()`! Also, do we have an idea of how we might pass over a list, looking for capitalized words? Yes: use a loop! Even if we don't know what to do inside that loop, or exactly how to detect capitalization, we can still write the loop itself. And so on. (Remember to look for the `# ???` comments I've left to remind myself to fill in a hole later.)

<details>
<summary>Think, then click!</summary>

```python
# in wordcount.py:
def cast_of_chars(s: str) -> set:
    words = s.split()  # split by blank space
    caps_words = []
    for word in words:
        if True:  # ??? add only if capitalized
            caps_words += [word]
    cleaned = []
    for word in caps_words:
        cleaned += [word]  # ??? add version without punctuation
    all_caps = []
    for word in cleaned:
        all_caps += [word]  # ??? add capitalized version
    return set(all_caps)
```

</details>

Instead of trying to write the entire function at once, I've left placeholders. These include a `True` instead of a real test for capitalization, and places that just add a word without first cleaning or capitalizing it.
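If you aren't sure which string methods might fill these placeholders, it can help to experiment at the interactive prompt first. The sketch below tries Python's built-in string methods `isupper`, `replace`, and `upper` on a couple of sample words. The words are just for illustration.

```python
# Experimenting with string methods that could fill in the placeholders.
word = "Ashley,"
starts_capitalized = word[0].isupper()  # True: "A" is uppercase
without_commas = word.replace(",", "")  # "Ashley"
shouted = without_commas.upper()        # "ASHLEY"

# A word that starts with punctuation doesn't count as capitalized,
# which is why "'hello,'" is left out in the tests above.
quoted = "'hello,'"
quoted_starts_capitalized = quoted[0].isupper()  # False: "'" is not a letter

print(starts_capitalized, without_commas, shouted, quoted_starts_capitalized)
```

Since `split()` with no arguments never produces empty strings, indexing `word[0]` is safe for every word it returns.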
Gradually, we can fill these in, too:

<details>
<summary>Think, then click!</summary>

```python
# in wordcount.py:
def cast_of_chars(s: str) -> set:
    words = s.split()  # split by blank space
    caps_words = []
    for word in words:
        if word[0].isupper():  # check first letter
            caps_words += [word]
    cleaned = []
    for word in caps_words:
        cleaned += [word.replace(",", "")]  # remove commas
    all_caps = []
    for word in cleaned:
        all_caps += [word.upper()]  # fully uppercase
    return set(all_caps)  # the tests expect a set, not a list
```

</details>

Breaking problems down like this is useful, even if you have decades of programming experience. If you watch a long-time expert programmer at work, it might appear that they aren't following this process. More likely, they've internalized it so well that it's hard to see. If you're just starting to learn to play piano, would you judge your progress against someone with decades of full-time experience? I hope not!

### Example 2: the "Rainfall" problem

Let's say we are tracking daily rainfall near Brown University. Suppose we want to compute the average rainfall over the period for which we have useful sensor readings. Our rainfall sensor is a bit unreliable, and it reports data in a weird format. You're likely to encounter both of these problems when dealing with real-world data! In particular, our sensor data is a list of numbers like:

```python
sensor_data = [1, 6, -2, 4, -999, 4, 5]
```

The `-999` represents the end of the period we're interested in. This might seem strange: why not just end the list after the first `4` value? The truth is that real-world raw data formats sometimes use a "terminator" symbol like this one. The other negative numbers represent sensor errors; we can’t really have a negative amount of rainfall. So we want to take the average of the non-negative numbers in the input list before the `-999`.

How would we solve this problem? What are the subproblems?
<details>
<summary>Think, then click!</summary>

* Finding the list segment before the `-999`
* Filtering out the negative values
* Computing the average of the remaining (non-negative) readings

</details>

This time, you will drive the entire process of building the function:

* note what your input and output look like;
* write a few tests to understand the shape of the problem;
* brainstorm the steps you might use to solve the problem (without worrying about how to actually perform them);
* create a function skeleton (I like using `# ???` to record places I'm leaving undone); and
* gradually fill in the skeleton.

Since these notes are being written before lecture, it's tough to anticipate the solutions you'll come up with, but here are two potential solutions:

<details>
<summary>Think, then click!</summary>

```python
def average_rainfall(sensor_input: list) -> float:
    number_of_readings = 0
    total_rainfall = 0
    for reading in sensor_input:
        if reading == -999:  # terminator: stop and compute the average
            return total_rainfall / number_of_readings
        elif reading >= 0:  # skip negative values (sensor errors)
            number_of_readings += 1
            total_rainfall += reading
    # no terminator in the input: average everything we saw
    return total_rainfall / number_of_readings
```

In this solution, we loop over the list once. The first two subproblems are solved by returning early from the loop and by ignoring the negative values in our loop. The final subproblem is solved with the `number_of_readings` and `total_rainfall` variables.

```python
def list_before(l: list, item) -> list:
    result = []
    for element in l:
        if element == item:
            return result
        result.append(element)
    return result

def average_rainfall(sensor_input: list) -> float:
    readings_in_period = list_before(sensor_input, -999)
    good_readings = [reading for reading in readings_in_period if reading >= 0]
    return sum(good_readings) / len(good_readings)
```

In this solution, the first subproblem is solved with a helper function and the second with a list comprehension (see [Comprehensions](#comprehensions)). The third is solved with calls to the built-in `sum` and `len` functions.

</details>

#### I wonder...
What are some advantages and disadvantages of these two approaches?

## Comprehensions <a name="comprehensions"></a>

Recall possible solutions to the cast-of-characters prompt above. Here's one:

```python
def cast_of_chars(txt: str) -> set:
    words = txt.split()
    s = set()
    for word in words:
        if word[0].isupper():
            s.add(word.upper())
    return s
```

Python gives us another way to write the same thing. This other way can be much more concise, and it is often more readable. Comprehensions let us write for-loops that build sets, dictionaries, or lists in one line. Using a comprehension, we could have written:

```python
def cast_of_chars(txt: str) -> set:
    words = txt.split()
    s = {word.upper() for word in words if word[0].isupper()}
    return s
```

The most basic comprehension looks like this:

```python
[x for x in l]
```

This loops over `l` and creates a list of every element. It makes a list because we used square brackets; using braces would make a set or dictionary. We could do something else to `x`:

```python
[x + 1 for x in l]
```

And we can add a conditional:

```python
[x + 1 for x in l if x > 4]
```

We can also build dictionaries with comprehensions:

```python
{x: x + 1 for x in l if x > 4}
```

You don’t have to use comprehensions in your own code just yet, but you can if you want. I'll be using them sometimes, so I wanted to introduce them now.
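To see what each of these comprehensions actually produces, here is a small sketch that runs them on a sample list `l`. The values are just for illustration. It also includes the ordinary for-loop that builds the same list as the filtered comprehension.

```python
l = [3, 4, 5, 6]  # sample input, chosen for illustration

basic = [x for x in l]                    # list of every element
plus_one = [x + 1 for x in l]             # transform each element
filtered = [x + 1 for x in l if x > 4]    # transform only elements > 4
as_dict = {x: x + 1 for x in l if x > 4}  # dictionary comprehension
as_set = {x % 2 for x in l}               # set comprehension: duplicates collapse

# The same result as `filtered`, written as an ordinary for-loop.
filtered_loop = []
for x in l:
    if x > 4:
        filtered_loop.append(x + 1)

print(basic)     # [3, 4, 5, 6]
print(plus_one)  # [4, 5, 6, 7]
print(filtered)  # [6, 7]
print(as_dict)   # {5: 6, 6: 7}
print(as_set)    # {0, 1}
print(filtered == filtered_loop)  # True
```

One thing to watch for is that `{}` on its own is an empty dictionary, not an empty set. That's why the cast-of-characters tests compare against `set()`.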