Program Performance

Table of Contents

  1. Starting to think about performance
  2. Scaling vs. runtime
  3. Two more examples

Starting to think about performance

We've talked some about making programs more readable and concise, but we haven't talked much yet about making programs fast (or efficient in other ways). Today, we’ll start to learn how to reason carefully about program performance.

Here’s a Python program:

def count_the(l: list) -> int:
    s = 0
    for s in l:
        if s == 'the':
            s += 1
    return s

What's the running time of the count_the function? One way to find out would be to actually time an execution on some real data. We could load in Frankenstein again, call the function on the list that split() returns, and actually time it.

tim@somerville sep24 % python3 prep.py frankenstein.txt
4066

That went by far faster than we could actually count with a stopwatch. But the approach has other flaws too. What are they?

Think, then click!
  • Some of the time will be taken up by starting the program, loading the file into memory, and other operations that have nothing to do with the runtime of count_the.

  • Our stopwatch only tells us the runtime for a specific computer, under a specific existing workload, in a specific environment. There will also be some randomness involved.

  • This tells us how long the program takes on Frankenstein, but what about other inputs? How is the function likely to scale as the books get bigger? We can't run the function on all possible books, so how can we really predict its performance overall?


Runtime vs. Scaling

The truth is that, when we talk about how time-efficient a program is, there are two very different things we might mean.

(1) How long is the program taking to run on a particular input, or on a set of inputs?

(2) How well does the program scale as the input becomes larger or more complex?

Which question you're asking depends on what problems you're trying to solve. For today, we'll focus on the second question: how well does a program scale?

Let’s see if we can figure out a rough estimate of how long it takes to run count_the without actually running it. We can start by counting the "basic operations," like adding numbers or assigning to a variable, executed in the function:

def count_the(l: list) -> int:
    s = 0               # 1 creation, 1 assignment
    for s in l:         # repeated len(l) times
        if s == 'the':  # 1 string comparison
            s += 1      # 1 add, 1 assignment
    return s            # 1 to return          

So for a list of length L, the program executes (3 * L) + 3 operations.

This might sound odd. Not all operations take the same amount of time (and there's a fair bit of variation across different computers). We are losing a bit of precision, but we're gaining generality. Here's what I mean: in general, what is going to dominate the runtime of the program: the setup, or the loop? And, as inputs get bigger, what's going to dominate the runtime: the number of times the loop runs, or the time it takes to run the loop?

If we're really concerned with how our function's performance scales as inputs get bigger, the exact details of how long an operation takes often become irrelevant. Instead, we'll say that the running time of count_the is proportional to (2 * L) + 3.

Notice how many assumptions we’re making here! For one, we’re not counting the for-loop itself as an operation; do we need to? Python’s operations probably don't actually take the same amount of time; does it matter?

For some purposes, the answers to these questions absolutely matter. Some code is run inside Google every time a user does a web search. The exact running time of that code is very important to Google! But if you're trying to pick a data structure or an algorithm (more on this later) concrete runtime is less important than how your choice will scale.

In fact, even this much precision is more than we’ll generally use going forward! It's hard to be sure about how long specific operations take: if returning from a function takes a while, then the real runtime could be more like (2 * L) + 100. If addition takes a while it could be (50 * L) + 3. So we'll focus on the most important value here: the list length. We'll say that our function's running time is linear in L. That is, it does some constant amount of work for each element of L, and an additional constant amount of work regardless.

Two more examples

Next class we'll introduce some commonly-used notation for expressing this idea. But before that, let's look at a couple more functions.

Example 1: testing membership in a list

How about this one?

def is_member(l: list, ele) -> bool:
    for x in l:
        if x == ele:
            return True
    return False

How long does this function take, for a list of length L?

The running time of member depends on its arguments even more than the running time of count_the did!

Best Case: If we call it on a list whose first element is ele, it really doesn’t matter how long the list is. It's going to check the first element and no more, so it'll take the same, constant, amount of time regardless of how long the list is. We can express this by saying that the best case time is constant.

Computer programming is an engineering discipline, though. If I'm building a bridge, I'm probably not all that interested in the maximum weight it can tolerate under ideal conditions! What we're usually interested in (though we’ll see some exceptions, later in the course) is worst-case running time: how long will the function take on the worst possible input or inputs?

Worst Case: If ele isn’t actually in l, then the loop in is_member needs to traverse the entire list. As with count_the, we'll say that the function runs time linear in the length of l.

Example 2: removing duplicates from a list

We’ll take a look at one last function:

def distinct(l: list) -> list:
    seen = []
    for x in l:
        if x not in seen:
            seen.append(x)
    return seen

This one is a bit more complicated. The for loop runs once for every element in the list, but how long does every iteration of the loop take? To figure that out, we need to learn something about how lists function in Python.

In Python, x not in seen is basically equivalent in to calling our previous example function. To check membership in a list, Python will walk the list just like our function did. But that means that every time the for loop in distinct runs, that single iteration takes linear time in the length of seen.

Best Case: What’s the best case? Well, if all of the elements in l are the same, x not in seen never takes more than a constant number of operations. So in that case, the function runs in linear time.

Worst Case: How about the worst case? What if the elements are all different?

Think, then click!

If all the elements are different, then every x will be new, and thus won't be found in the seen list. But then the size of the seen list grows every time the loop iterates: at first it's 0, then 1, then 2, and so on.

Some geometric intuition might be useful here:

How many blocks are there in this diagram? (0 + 1 + 2 + ... + (len(l) - 1)) of them. With a little algebra, this simplifies to (L * (L - 1))/2. Which means that the function will scale with the square of the length of the input. We'll call this quadratic time.


How much worse is this? Does it actually matter in practice? Let's find out. We'll time how long it takes to runcount_the on the text of Frankenstein, and how long it takes to run distinct on the same input.

tim@somerville sep24 % time python3 prep.py frankenstein.txt
python3 prep.py frankenstein.txt  0.03s user 0.01s system 73% cpu 0.059 total

tim@somerville sep24 % time python3 prep.py frankenstein.txt
python3 prep.py frankenstein.txt  2.27s user 0.03s system 96% cpu 2.384 total

The difference is actually quite noticeable even without a timer: 30 milliseconds versus over 2 full seconds. Nearly 2 orders of magnitude.

What if we tried on a larger book? I've downloaded the full text of an English translation of Mallory's The Death of Arthur, which is a bit longer:

tim@somerville sep24 % wc frankenstein.txt
    7743   78122  448821 frankenstein.txt
tim@somerville sep24 % wc arthur.txt
   34131  368286 1965003 arthur.txt

(That's 78122 words versus 368286 words.)

tim@somerville sep24 % time python3 prep.py arthur.txt
python3 prep.py arthur.txt  0.08s user 0.03s system 59% cpu 0.197 total

tim@somerville sep24 % time python3 prep.py arthur.txt      
python3 prep.py arthur.txt  8.57s user 0.07s system 98% cpu 8.795 total

Now it's an even bigger gap! 80 milliseconds versus over 8 seconds. And it'll only get worse as the data gets bigger, unless we get "lucky" and the input is closer to the best case.

Why does runtime matter?

(If time permits) It's useful to consider the ways that runtime matters in the world. People often point out that a faster program uses less electricity and therefore has a lower climate impact. While this is true, I personally tend to be skeptical of generalizing this, since faster execution also makes room for running more programs.

Sure, there are lots of business-focused reasons to have fast code, but there is at least one social factor: access. When building software, we don't know what kind of computer our eventual users will have. A laptop that a highly-paid developer in San Diego or New York might consider "average" could be a powerful and expensive luxury for others.