---
tags: fall 2018 cs61b
---

# Asymptotic Analysis Notes

Hello. If you spot any errors or are particularly confused on anything, feel free to [email](mailto:tinazhao@berkeley.edu) me.

## Table of Contents
[TOC]

## Introduction

### Why use asymptotic analysis?

Suppose I'm trying to compare my code to a friend's code to see whose runs better. The natural way to do this is to measure whose code runs faster (faster code suggests more efficient code). We can do that in the following ways:

**1. Time How Fast the Code Runs**

One way to measure speed is to just time how fast the code runs in minutes (hopefully not hours!). However, the measured time can vary based on the programming language, compiler, type of machine, etc. Therefore, this is not an objective way of measuring runtime.

**2. Measure the Number of Operations Based on Input Size**

This is the better, machine-independent way to measure runtime, but what does this even *mean*? By **operations**, we mean
- Basic operations ($a += 1$) [*Constant time*]
- Assignments ($a = b$) [*Constant time*]
- For/while loops
- Recursion

Runtime is often measured as a function, $f(N)$, with $N$ representing the input size. We use notations to represent the runtime of a method with a familiar function. If the parameter is an array, we'll measure the runtime of the function by the length of the array (`N = array.length`). The number of operations that our function does can vary based on input size.

Asymptotic analysis in CS 61B is pretty reminiscent of CS 61A orders of growth. Since we are measuring the runtime of a method at **large** inputs, we can make many simplifications (think of it as taking the **limit of the function $f(N)$ as $N \rightarrow \infty$**).

**1. Drop constants:**
- $\Theta(N + 1) = \Theta(N)$
- $\Theta(5N) = \Theta(N)$

**2. Drop smaller terms:**
- $\Theta(N + \textrm{log } N) = \Theta(N)$

**3. Omit logarithmic bases:**
- Since a logarithm in one base can be converted to a logarithm in any other base by multiplying by a constant, the base of $\Theta(\textrm{log } N)$ does not need to be specified!

### Common Orders of Growth

Constant — $\Theta(1)$
Logarithmic — $\Theta(\textrm{log } N)$
Linear — $\Theta(N)$
Linearithmic — $\Theta(N \textrm{ log } N)$
Polynomial — $\Theta(N^x)$
Exponential — $\Theta(x^N)$
Factorial — $\Theta(N!)$

These orders of growth are functions that we use to define and compare runtimes. Sometimes, it's hard to tell if the function that represents a runtime matches a function in the list of orders of growth. Instead, we consider the upper bound and lower bound of the runtime function in terms of these familiar functions to make the runtime easier to compare.

### Notations

$O$ - "Big Oh" notation for an upper bound of $f(N)$
$\Omega$ - "Big Omega" notation for a lower bound of $f(N)$
$\Theta$ - "Big Theta" notation for a tight bound of $f(N)$; use only if the $O$ and $\Omega$ bounds match

The graphs below are a good visualization. $f(N)$ is the measured runtime of a method, and sometimes it's difficult to tell what type of function it is (in terms of the common orders of growth). Therefore, we use an upper bound and a lower bound to better define $f(N)$.

Note: We might notice that the upper bound is less than $f(N)$ at the very beginning. This is totally okay! We're only looking at **large inputs** for asymptotic analysis, and the upper bound will always be greater than $f(N)$ at larger inputs.

![](https://i.imgur.com/0Kfi6Y3.jpg)

In graph **( a )**, $f(n)$ can be tight bounded by some constant multiple of $g(n)$, implying it can be both lower bounded and upper bounded by $cg(n)$.
In graph **( b )**, $f(n)$ can be upper bounded by some constant multiple of $g(n)$.
In graph **( c )**, $f(n)$ can be lower bounded by some constant multiple of $g(n)$.
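To see how these bounds get used, here is a minimal sketch of a linear search. The method `contains` below is a hypothetical example (not from the course materials); it just shows a method whose upper and lower bounds differ.

```Java
/** Hypothetical example: returns true if target appears in a, where N = a.length. */
public static boolean contains(int[] a, int target) {
    for (int i = 0; i < a.length; i++) {
        if (a[i] == target) {
            return true;    // may return after checking a single element
        }
    }
    return false;           // may have to scan the entire array
}
```

Even for a huge array, the target might sit at index 0, so the runtime is $\Omega(1)$; if the target is missing, the loop runs $N$ times, so the runtime is $O(N)$. Since the two bounds differ, we cannot describe the overall runtime of `contains` with a single $\Theta$ bound.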
:::warning
A common misconception that students run into is assuming that the best case in a problem is based on a small input. This is wrong because asymptotic analysis looks at LARGE INPUTS only.

Example:
```Java
public int p1(int N) {
    if (N <= 0) {
        return 0;
    } else {
        for (int i = 0; i < N; i++) {
            System.out.println("hi");
        }
        return N;
    }
}
```
Misconception: The best case is $\Theta(1)$ because $N$ can be less than or equal to $0$ and instantly hit the first if-statement. This is wrong because $N$ must be a large input. Therefore, the best case is $\Theta(N)$.
:::

### Notation Order

Given two functions that represent *possibly* different runtimes, $f(N)$ and $g(N)$, we want to compare their runtimes and bound them. Therefore, we use a limit.

$$\lim_{N \rightarrow \infty}\frac{f(N)}{g(N)} = c$$

If $0 \leq c < \infty$, then $f(N) = O(g(N))$
If $0 < c \leq \infty$, then $f(N) = \Omega(g(N))$
If $0 < c < \infty$, then $f(N) = \Theta(g(N))$

For example, if $f(N) = 3N^2 + N$ and $g(N) = N^2$, the limit is $3$, so $f(N) = \Theta(N^2)$.

### Best case vs Worst Case vs Overall

An asymptotic problem will either ask for **just the tight asymptotic runtime bound** or the **tight asymptotic runtime bound in the best case, worst case, and overall**.

**Best case:** the runtime in the scenario where the method does the least work; given as a tight bound using $\Theta$ notation
**Worst case:** the runtime in the scenario where the method does the most work; given as a tight bound using $\Theta$ notation

:::warning
A common misconception that students have is using $\Omega$ notation for best case runtime and $O$ notation for worst case runtime when asked for the tight asymptotic bound in the best case or worst case! Best case refers to **one** scenario in which the code runs the fastest. Worst case also refers to **one** scenario in which the code runs the slowest. Therefore, to give them the tightest bound, they must both use $\Theta$ notation!
:::

**Overall** looks at all possible cases for the method runtime; we use the best case and worst case runtimes to bound the runtime of all possible cases:
* uses $\Theta$ notation if best case = worst case
* otherwise, uses $\Omega$ notation denoting the best case runtime and $O$ notation denoting the worst case runtime

**Asymptotic Tight Bound Only:**
* uses $\Theta$ notation if best case = worst case
* otherwise, give the worst case runtime in $O$ notation

## Analyzing Recursive Functions

For recursive functions, the best approach is to draw out a tree. A node in the tree represents a call to the function, and the runtime we write in the node represents the work done by that single call (not counting the recursive calls it makes). The branching factor (number of children per node) of the tree represents the number of times we recurse on the function per function call. The height of the tree represents the time it takes to stop recursing/opening new frames/reach the base case. Summing the work across all nodes gives the total runtime (a small worked example appears after the Common Summations list below).

## Common Summations

Summations are often used to sum up the amount of work done in iterative and recursive functions. The ones below are very common:

$1 + 2 + 4 + \dots + 2^{\log_2 N} = 2N - 1 \in \Theta(N)$ (note that $2^{\log_2 N} = N$)
$1 + 2 + 4 + \dots + 2^{N} = 2^{N + 1} - 1 \in \Theta(2^N)$
$1 + 2 + 3 + \dots + N = \frac{N(N + 1)}{2} \in \Theta(N^2)$
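To see where the first summation comes from, and how the tree-drawing approach works, here is a minimal, hypothetical recursive method (`g` is not from the course materials; it is only a sketch for illustration).

```Java
/** Hypothetical example: two recursive calls on half the input,
 *  plus a constant amount of work per call. */
public static int g(int N) {
    if (N <= 1) {
        return 1;               // base case: constant work
    }
    return g(N / 2) + g(N / 2); // constant work besides the two recursive calls
}
```

Drawing the tree: the branching factor is $2$ and the input halves with every call, so the height is about $\log_2 N$. Level $i$ of the tree has $2^i$ nodes, each doing constant work, so the total work is $1 + 2 + 4 + \dots + 2^{\log_2 N} = 2N - 1$, giving a runtime of $\Theta(N)$.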
### Confused about where these summations come from?

**Sum of a finite arithmetic series:**

$$S_n = \frac{n}{2}(a_1 + a_n)$$

$n$ = number of terms
$a_1$ = first term in the sequence
$a_n$ = last term in the sequence

For example, $1 + 2 + 3 + \dots + N = \frac{N}{2}(1 + N)$.

**Sum of a finite geometric series:**

$$S_n = a_1\left( \frac{1-r^n}{1-r} \right)$$

$n$ = number of terms
$a_1$ = first term in the sequence
$r$ = common ratio

For example, $1 + 2 + 4 + \dots + 2^{k} = \frac{1 - 2^{k+1}}{1 - 2} = 2^{k+1} - 1$.

## Examples

### Independent Double Nested For-Loops

```Java
public static void f(int N) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            // code that runs in constant time
        }
    }
}
```

**Analysis:** The inner for loop runs $N$ times each time the outer loop runs once. Since the outer loop runs $N$ times, we get the summation:

$$\sum_{i = 0}^{N-1}N = N^2$$

**Runtime:** $\Theta(N^2)$

### Dependent Double Nested For-Loops

```Java
public static void f(int N) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < i; j++) {
            // code that runs in constant time
        }
    }
}
```

**Analysis:** When $i = 0$, the inner loop does $0$ work. When $i = 1$, the inner loop does $1$ work. When $i = 2$, the inner loop does $2$ work, and the pattern continues until $i = N - 1$, where the inner loop does $N - 1$ work. Therefore, we can formulate the following summation:

$$0 + 1 + 2 + 3 + 4 + \dots + (N - 1) = \sum_{i = 0}^{N-1}i = \frac{N(0 + N - 1)}{2} = \frac{N(N - 1)}{2}$$

**Runtime:** $\Theta(N^2)$

### Analyzing Runtime of Triple Dependent Nested For-Loops

```Java
public static void f(int N) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j <= i; j++) {
            for (int k = 0; k < j; k++) {
                // code that runs in constant time
            }
        }
    }
}
```

Again, we can do the same analysis (the innermost loop does $j$ work for each pair of $i$ and $j$):
When $i = 0$ and $j = 0$, $k$ does $0$ work.
When $i = 1$ and $j = 0$, $k$ does $0$ work. When $i = 1$ and $j = 1$, $k$ does $1$ work.
When $i = 2$ and $j = 0$, $k$ does $0$ work. When $i = 2$ and $j = 1$, $k$ does $1$ work. When $i = 2$ and $j = 2$, $k$ does $2$ work.
And the pattern continues.

If you're interested in a more visual representation, try to make sense of the grid below. The x-axis represents $i$ and the y-axis represents $j$. The numbers inside represent the amount of work that $k$ does at that particular $i$ and $j$.

![](https://i.imgur.com/f7A4IGV.jpg)

From our analysis, we get the summation (we can drop the zeros):

$$(1) + (1 + 2) + (1 + 2 + 3) + \dots + (1 + 2 + \dots + (N - 1)) = \sum_{j = 1}^{N-1}\sum_{i = 1}^{j} i$$

There's not a particularly easy way to calculate the resulting double summation, so instead, we can manipulate the summation by grouping together equal values (i.e. group all the 1's, 2's, etc.) to get:

$$1(N-1) + 2(N-2) + 3(N-3) + \dots + (N-1)(1) = \sum_{i = 1}^{N-1} i (N - i)$$

$$= \sum_{i = 1}^{N-1} (iN - i^2) = N\sum_{i = 1}^{N-1} i - \sum_{i = 1}^{N-1} i^2$$

$$= N\left(\frac{(N-1)N}{2}\right) - \frac{(N-1)N(2N - 1)}{6} = \frac{(N-1)N(N+1)}{6} \in \Theta(N^3)$$

Note: The second summation is calculated using the **sum of squares formula**.

Therefore the runtime is $\Theta(N^3)$.

## Resources

[CS 61B General Asymptotics Resources](https://drive.google.com/drive/folders/1RZtoJjZpIKCXqKMEsUkWIfcGXW0ULirt?usp=sharing): compiled by Tina
[Amortized Analysis Video](https://www.youtube.com/watch?v=Jp7hEXdTjSE&index=10&list=PLv24fkNrbcYiwBZz1NkfatR3PnWgQYXSd): old CSM video that covers amortized analysis
[Tina's Amortized Runtime Notes](https://hackmd.io/s/rJqVfs6qQ): amortized runtime notes based on the asymptotic CSM problem