Program Performance

# Program Performance ## Table of Contents 1. [Starting to think about performance](#starting) 2. [Scaling vs. runtime](#scaling) 3. [Two more examples](#examples) ## Starting to think about performance We've talked some about making programs more readable and concise, but we haven't talked much yet about making programs fast (or efficient in other ways). Today, we’ll start to learn how to reason carefully about program performance. Here’s a Python program: ```python def count_the(l: list) -> int: s = 0 for s in l: if s == 'the': s += 1 return s ``` What's the running time of the `count_the` function? One way to find out would be to actually time an execution on some real data. We could load in Frankenstein again, call the function on the list that `split()` returns, and actually time it. ``` tim@somerville sep24 % python3 prep.py frankenstein.txt 4066 ``` That went by far faster than we could actually count with a stopwatch. But the approach has other flaws too. What are they? <details> <summary>Think, then click!</summary> * Some of the time will be taken up by starting the program, loading the file into memory, and other operations that have nothing to do with the runtime of `count_the`. * Our stopwatch only tells us the runtime for a specific computer, under a specific existing workload, in a specific environment. There will also be some randomness involved. * This tells us how long the program takes on Frankenstein, but what about other inputs? How is the function likely to scale as the books get bigger? We can't run the function on all possible books, so how can we really predict its performance _overall_? </details> ## Runtime vs. Scaling The truth is that, when we talk about how time-efficient a program is, there are two very different things we might mean. (1) How long is the program taking to run on a particular input, or on a set of inputs? (2) How well does the program _scale_ as the input becomes larger or more complex? Which question you're asking depends on what problems you're trying to solve. For today, we'll focus on the second question: how well does a program scale? Let’s see if we can figure out a rough estimate of how long it takes to run `count_the` _without_ actually running it. We can start by counting the "basic operations," like adding numbers or assigning to a variable, executed in the function: ``` def count_the(l: list) -> int: s = 0 # 1 creation, 1 assignment for s in l: # repeated len(l) times if s == 'the': # 1 string comparison s += 1 # 1 add, 1 assignment return s # 1 to return ``` So for a list of length L, the program executes (3 * L) + 3 operations. This might sound odd. Not all operations take the same amount of time (and there's a fair bit of variation across different computers). We are losing a bit of precision, but we're gaining generality. Here's what I mean: in general, what is going to dominate the runtime of the program: the setup, or the loop? And, as inputs get bigger, what's going to dominate the runtime: the number of times the loop runs, or the time it takes to run the loop? If we're really concerned with how our function's performance _scales_ as inputs get bigger, the exact details of how long an operation takes often become irrelevant. Instead, we'll say that the running time of `count_the` is _proportional to (2 * L) + 3_. Notice how many assumptions we’re making here! For one, we’re not counting the for-loop itself as an operation; do we need to? Python’s operations probably don't actually take the same amount of time; does it matter? For some purposes, the answers to these questions absolutely matter. Some code is run inside Google every time a user does a web search. The exact running time of that code is very important to Google! But if you're trying to pick a data structure or an algorithm (more on this later) concrete runtime is less important than how your choice will scale. In fact, even this much precision is more than we’ll generally use going forward! It's hard to be sure about how long specific operations take: if returning from a function takes a while, then the _real_ runtime could be more like (2 * L) + 100. If addition takes a while it could be (50 * L) + 3. So we'll focus on the most important value here: the list length. We'll say that our function's running time is *linear in L*. That is, it does some constant amount of work for each element of L, and an additional constant amount of work regardless. ## Two more examples Next class we'll introduce some commonly-used notation for expressing this idea. But before that, let's look at a couple more functions. ### Example 1: testing membership in a list How about this one? ``` def is_member(l: list, ele) -> bool: for x in l: if x == ele: return True return False ``` How long does this function take, for a list of length L? The running time of member depends on its arguments even more than the running time of `count_the` did! Best Case: If we call it on a list whose first element is `ele`, it really doesn’t matter how long the list is. It's going to check the first element and no more, so it'll take the same, constant, amount of time regardless of how long the list is. We can express this by saying that the _best case_ time is _constant_. Computer programming is an engineering discipline, though. If I'm building a bridge, I'm probably not all that interested in the _maximum_ weight it can tolerate under ideal conditions! What we're usually interested in (though we’ll see some exceptions, later in the course) is _worst-case_ running time: how long will the function take on the worst possible input or inputs? Worst Case: If `ele` isn’t actually in `l`, then the loop in `is_member` needs to traverse the entire list. As with `count_the`, we'll say that the function runs time _linear_ in the length of `l`. ### Example 2: removing duplicates from a list We’ll take a look at one last function: ``` def distinct(l: list) -> list: seen = [] for x in l: if x not in seen: seen.append(x) return seen ``` This one is a bit more complicated. The for loop runs once for every element in the list, but how long does every iteration of the loop take? To figure that out, we need to learn something about how lists function in Python. In Python, `x not in seen` is basically equivalent in to calling our previous example function. To check membership in a list, Python will walk the list just like our function did. But that means that every time the `for` loop in `distinct` runs, that single iteration takes linear time in the length of `seen`. Best Case: What’s the best case? Well, if all of the elements in `l` are the same, `x not in seen` never takes more than a constant number of operations. So in that case, the function runs in linear time. Worst Case: How about the worst case? What if the elements are all _different_? <details> <summary>Think, then click!</summary> If all the elements are different, then every `x` will be new, and thus won't be found in the `seen` list. But then the size of the `seen` list grows every time the loop iterates: at first it's 0, then 1, then 2, and so on. Some geometric intuition might be useful here: ![](https://hackmd.io/_uploads/r1KnPwsQK.png) How many blocks are there in this diagram? `(0 + 1 + 2 + ... + (len(l) - 1))` of them. With a little algebra, this simplifies to `(L * (L - 1))/2`. Which means that the function will scale with the _square_ of the length of the input. We'll call this _quadratic_ time. </details> How much worse is this? Does it actually matter in practice? Let's find out. We'll time how long it takes to run`count_the` on the text of Frankenstein, and how long it takes to run `distinct` on the same input. ``` tim@somerville sep24 % time python3 prep.py frankenstein.txt python3 prep.py frankenstein.txt 0.03s user 0.01s system 73% cpu 0.059 total tim@somerville sep24 % time python3 prep.py frankenstein.txt python3 prep.py frankenstein.txt 2.27s user 0.03s system 96% cpu 2.384 total ``` The difference is actually quite noticeable even without a timer: 30 milliseconds versus over 2 full seconds. Nearly 2 orders of magnitude. What if we tried on a larger book? I've downloaded the full text of an English translation of Mallory's _The Death of Arthur_, which is a bit longer: ``` tim@somerville sep24 % wc frankenstein.txt 7743 78122 448821 frankenstein.txt tim@somerville sep24 % wc arthur.txt 34131 368286 1965003 arthur.txt ``` (That's 78122 words versus 368286 words.) ``` tim@somerville sep24 % time python3 prep.py arthur.txt python3 prep.py arthur.txt 0.08s user 0.03s system 59% cpu 0.197 total tim@somerville sep24 % time python3 prep.py arthur.txt python3 prep.py arthur.txt 8.57s user 0.07s system 98% cpu 8.795 total ``` Now it's an even bigger gap! 80 milliseconds versus over 8 seconds. And it'll only get worse as the data gets bigger, unless we get "lucky" and the input is closer to the best case. ## Why does runtime matter? (If time permits) It's useful to consider the ways that runtime matters in the world. People often point out that a faster program uses less electricity and therefore has a lower climate impact. While this is true, I personally tend to be skeptical of generalizing this, since faster execution also makes room for running _more_ programs. Sure, there are lots of business-focused reasons to have fast code, but there is at least one social factor: access. When building software, we don't know what kind of computer our eventual users will have. A laptop that a highly-paid developer in San Diego or New York might consider "average" could be a powerful and expensive luxury for others.