---
tags: CS200Prep, 2021
title: CS200 Bridge--Running Times
---

# CS200 Bridge Assignment: Running Times

In the last week of class, we got a brief introduction to how computer scientists measure program efficiency: they count the number of operations that a computation takes to perform. Being able to work with the basics of this is the main non-programming topic in the gap between CS17 and CS111. CS200 will work more with this topic. For now, we need you to get warmed up with it.

This document provides lecture notes on the topic prior to the problems down below. The lecture notes go into more detail on program cost than we did in lecture.

There is no coding on this assignment. You'll submit your work in a file named `runningtimes.pdf`. You can work in a word processor or on paper and upload a photo of your work. Submit to the Gradescope assignment for bridge no. 5.

:::info
See **[here](https://brown.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=de555c80-9e78-42c3-aec8-b0ac011484c8)** for the summer 2022 office hours recording on this assignment.

*Note: as with all of the office hours recordings, you will get a lot more out of watching this recording ***after*** you've attempted the problems in this assignment.*
:::

---

## The Basic Rules of Computing Program Costs

This is a concise summary of the rules for computing program costs. The textbook chapter that we point you to further down provides a deeper look.

### Costs of operations

- Accessing a variable (in the directory) takes 1 step
- Arithmetic operations each take 1 step
- Calling a function takes 1 step
- Updating the value of a variable in the directory takes 1 step (for the `=`), plus the time to compute the new value
- Updating the value stored in a field in the heap takes 1 step for each location you have to follow to get to the right spot, plus the time to compute the new value. For example,
  - `myacct.balance = 100` takes 3 steps: 1 to access `myacct`, 1 to get to the `balance` location, and 1 for the `=`. (There is no cost for using a number, boolean, or string.)
  - `custA.account.balance = 100` would take 4 steps: the extra step compared to the previous example is because we follow two locations (one per `.`) rather than one
- Walking through all elements of a list (whether with recursion or a for-loop) takes a number of steps equal to the length of the list. Typically, we introduce a fresh variable for the size of each input whose size can vary

### Combining costs

Three rules guide how to combine the costs of individual programming constructs into the cost of an entire program (the sketch after this list shows all three rules at work):

- if two statements/expressions occur sequentially (one after the other), we add their costs
- if we have a loop or recursive function, we multiply the size of the data (the number of iterations) by the work that we do on each individual element
- if we have an `if/else` statement, we add the cost of computing the question to the larger of the costs of the `if` and `else` answers
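To see the three rules working together, here is a small sketch in Python (our own illustration, not code from the course; the function name and numbers are made up for this example). Each comment counts steps using the rules above.

```
def sum_large(nums):                  # suppose nums has length N
    total = 0                         # 1 step (the =; the literal 0 is free)
    for n in nums:                    # loop: N iterations * cost of one body
        if n > 10:                    # question: 2 steps (access n, the >)
            total = total + n + 5     # 5 steps (=, two +, access total, access n)
        else:
            total = total + n         # 4 steps (=, one +, access total, access n)
    return total                      # 1 step (access total)
```

Sequencing says to add the costs of the `total = 0`, the loop, and the return. The `if/else` rule says each iteration costs the question plus the larger branch: $2 + max(5, 4) = 7$ steps. The loop rule multiplies that by $N$, so the whole function costs $1 + (N * 7) + 1 = N * 7 + 2$ steps.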
### Keeping it simple: focus on the worst case

In practice, we can't predict the exact number of steps that a program will take without knowing the exact input. For example, consider a program to check whether an item is in a list:

```
def member(some_list, data_to_find):
    for elt in some_list:
        if elt == data_to_find:
            return True
    return False
```

How many times will we perform the comparison check? It depends on where (if anywhere) the `data_to_find` is in the input list. How do we predict the cost of this code, then? We will make a **worst-case assumption** that the `data_to_find` is not in `some_list`, and hence that we will check every element. This is a sufficient approximation for many settings, and it is a more manageable way to get started thinking about costs. We'll begin to relax it in CS200.

---

## When is one cost better than another?

After using the rules to compute costs, we are left with cost summaries along the lines of

- $N * 2 + 4$
- $N * 2 + 7$
- $N + N + N + 8$
- $N * (N + 3) + 2$
- etc.

When should we say that one cost is worse than another? Technically, the costs in these expressions get worse from the first expression to the last. But do we really care about a difference of 3 operations (as between the first two) in the context of processing a large list? If the list were large enough, the cost of visiting every element would clearly be more impactful than an extra operation here and there.

In practice, we classify program costs by the nature of the function from the size of the input to the cost. Let's recast our four examples as functions named $T$ (for "time"):

- $T1(N) = N * 2 + 4$
- $T2(N) = N * 2 + 7$
- $T3(N) = N + N + N + 8$
- $T4(N) = N * (N + 3) + 2$

If we were to graph these functions over increasing values of $N$, we'd find that the first three yield straight-line graphs (constant slopes), while the fourth curves upward as $N$ grows. Back in high school, you learned terminology for functions based on the shapes of their graphs, such as "linear", "quadratic", and "exponential". As a reminder, here's a pictorial view:

![](https://i.imgur.com/wrNK86V.png)

*Linear (red), quadratic (blue), and exponential (green) growth [image from [Wikipedia](https://en.wikipedia.org/wiki/Exponential_growth)]*

In computer science, we typically describe the running times of programs in these terms. For example, for a list-membership function, we might say

- `member` is linear, or
- `member` is in $O(N)$

(where $O(N)$ is the set of all functions whose growth is linear in the input size $N$).

Specifically, the terms you will see in computer science are:

- constant -- $O(1)$
- logarithmic -- $O(\log N)$
- linear -- $O(N)$
- $O(N \log N)$ [serves as its own name]
- quadratic -- $O(N^2)$
- exponential -- $O(2^N)$

*[For those feeling rusty on logarithmic growth: it is the inverse of exponential growth, and it grows much more slowly than linear. For a graphical representation, look at the top-left red/blue/green graph on this [Wikipedia page on logarithmic scales](https://en.wikipedia.org/wiki/Logarithmic_scale) (about halfway down on the right).]*

When we compare algorithms for time, the higher in this list an algorithm's classification falls, the faster it will run as the data get large. We don't worry about time comparisons for small data (they'll be fast enough because the data are small). We worry about larger data. The short experiment below makes the difference concrete.
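As an illustration (ours, not from the lecture notes), here is a small Python snippet that evaluates the four cost functions at a few input sizes. The constant-factor differences among the first three barely matter, while the quadratic $T4$ quickly dwarfs them all:

```
def T1(N): return N * 2 + 4
def T2(N): return N * 2 + 7
def T3(N): return N + N + N + 8
def T4(N): return N * (N + 3) + 2

for N in [10, 1000, 100000]:
    print(N, T1(N), T2(N), T3(N), T4(N))

# N = 10:     24, 27, 38, 132             -- all in the same ballpark
# N = 100000: 200004, 200007, 300008,
#             10000300002                 -- T4 is ~50000x the others
```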
---

## The Assignment

For this assignment, we want you to practice determining the costs of programs, using code that you wrote in the programming component. Specifically, for each of the problems below:

1. mark up or comment a copy of the code with the costs of each statement/expression
2. produce the algebraic expression for the total cost of the program as a function of the input size
3. state the running time in its summary form

**The Problems to Analyze**

* insertion sort
* mergesort
* quicksort
* binary search tree lookup (`has-elt` on the binary search tree assignment)

### An example of what we're looking for

Here's a sample showing what we want to see for a filter-like function (you can type comments on the code or write on a copy of the code, as you prefer).

```
fun is-large(num :: Number) -> Boolean:
  num > 15  # costs 3
end

#|
Cost Formula: Tis-large(num) = 3

Summary Form: Tis-large is in O(1)  [or Tis-large is constant]
|#

fun keep-large(numList :: List<Number>) -> List<Number>:
  cases(List) numList:
    | empty => empty                # costs 2 (question, get empty)
    | link(fst, rst) =>             # costs 1 (for the question)
      if is-large(fst):             # costs 2 + Tis-large(fst)
        link(fst, keep-large(rst))  # costs 4 + Tkeep-large(N-1)
      else:
        keep-large(rst)             # costs Tkeep-large(N-1)
      end
  end
end

#|
Cost Formula:
  Tkeep-large(N) = 2 + 1 + (2 + Tis-large(fst)) + max(4 + Tkeep-large(N-1), Tkeep-large(N-1))
  Tkeep-large(N) = 5 + O(1) + 4 + Tkeep-large(N-1)
  Tkeep-large(N) = O(1) + Tkeep-large(N-1)
  Tkeep-large(N) = O(1) + O(1) + ... + O(1)   [N times]

Summary Form: Tkeep-large is in O(N)  [or Tkeep-large is linear]
|#
```

If you want more detail, particularly on turning the code annotations and recurrence relations into cost formulas, [section 13.10 of the textbook](https://dcic-world.org/2023-02-21/predicting-growth.html) goes through this (it uses $c$ to represent a constant value). All of chapter 13 is an expanded version of the notes in this handout.

If you want more detail than what's been written in this handout, you can look at Kathi Fisler's Fall 2018 lecture video on chapter 18 (which was the chapter number for this content in an old edition of DCIC):

- [Lecture Capture from CS-111, Dec 2018](https://brown.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=5376e59e-d1bc-45b6-bf02-a9b300f9b1b3)
- [Code handout that went with lecture capture](https://docs.google.com/document/d/1t7CQdqtZfJANw8FKpIXsBXPa9y66zq_Y2cRTztNBlao/edit?usp=sharing)

<!-- - [Lecture from CS0190, Sept 14 2018](https://brown.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=1fc33c24-b6c4-4f59-ad0a-a95b0105d612) -->

## Reflection

Write a few sentences about what you've learned from these problems (either on paper, in a file, etc.). Also mention any questions you have based on these exercises.