<center> <h1> Divide and Conquer </h1> </center>

Divide and Conquer is a programming or problem-solving paradigm that **partitions a problem (divide)** until it is **small enough to be solved (conquer)**. The solved subproblems are then **combined to solve bigger problems (combine)** until the whole problem is solved. **Recursion** is particularly useful in divide and conquer since we can divide a problem recursively until we reach a **base case** whose solution we already know. The combining is then done by the calling function when a recursive call returns. Note that recursion is not strictly necessary, but it is very helpful for divide and conquer algorithms.

![picture](https://drive.google.com/uc?export=view&id=1xcdTJkJcHG4qo8YjYu2icxaRlTnlhzms)

# Programming Example: Searching

Let's say we have a sorted list of student numbers of those enrolled in EEE 121 (let's limit it to 10 as an example):

| index | Student Number |
| --- | :-: |
| 0 | 1503 |
| 1 | 1549 |
| 2 | 3228 |
| 3 | 5537 |
| 4 | 6769 |
| 5 | 8184 |
| 6 | 8487 |
| 7 | 9064 |
| 8 | 9584 |
| 9 | 9932 |

We want to know the index of a particular student in the list (i.e. `index_of(6769)` should evaluate to `4`) or return `-1` if the student is not in the list.

## The *Lame* Approach: Linear Search

We can simply start at index `0` and check each element until we find the one we need:

```python
sn_list = [1503, 1549, 3228, 5537, 6769, 8184, 8487, 9064, 9584, 9932]
n = len(sn_list)

def index_of(key):
    i = 0                     # start checking from index 0
    while i < n:              # check until index is within list length
        if sn_list[i] == key:
            return i          # key is found at index i
        i += 1                # go to next index
    return -1                 # key is not found

index_of(1500)
```

The worst case is when the element does not exist in the list and the `while` loop iterates `n` times. Thus, we can say that the *time complexity* of this implementation is $O(n)$. *Space complexity*, on the other hand, is $O(1)$ since the function is not recursive and only uses a few temporary variables of fixed size.

## The *Chad* Approach: Binary Search

We can take advantage of the fact that the list is already sorted and apply the divide and conquer paradigm in our thought process. Instead of looking at all the elements of the list, why don't we just look at half? Since we know it's sorted, we can look at the middle element and compare it to the element we want to find, which should give us a good idea of which partition to look into:

```python
sn_list = [1503, 1549, 3228, 5537, 6769, 8184, 8487, 9064, 9584, 9932]
n = len(sn_list)

def index_of(key):
    mid = n // 2              # find the middle index
    start = 0                 # starting index to check
    end = n                   # last index to check + 1

    if key > sn_list[mid]:
        start = mid           # key is in the upper half
    else:
        end = mid             # key is in the lower half

    # Search Loop
    i = start
    while i < end:            # check until index is within list length
        if sn_list[i] == key:
            return i          # key is found at index i
        i += 1                # go to next index
    return -1                 # key is not found

index_of(9932)
```

But wait, that just halves the number of elements. The worst-case time complexity is:

$$ O(T(n)) = O(n/2) = O(n) $$

It's still linear! What if we partition the partition again? What if we partition *that* partition again? But how do we know how many times we need to partition? If we use recursion, we actually don't need to know how many partitions we need. We just need to know when to stop partitioning, that is, properly define our base case.
Here is a recursive implementation of binary search:

```python
sn_list = [1503, 1549, 3228, 5537, 6769, 8184, 8487, 9064, 9584, 9932]
n = len(sn_list)

def index_of(key, start = 0, end = n):
    # Base Case
    if end - start == 1:      # partition has only 1 element
        # key is either the remaining element or not in the list
        return start if sn_list[start] == key else -1
    elif end <= start:        # partition has no elements
        return -1             # key not found

    mid = (end + start) // 2  # find the middle index (average)
    if key > sn_list[mid]:
        return index_of(key, mid, end)    # key is in the upper half
    elif key < sn_list[mid]:
        return index_of(key, start, mid)  # key is in the lower half
    else:
        return mid            # key is equal to middle element (we found it!)

index_of(8184)
```

Okay, it looks like it works. Now, let's analyze it!

### Time Complexity

We must first formulate the **recurrence relation** of binary search by looking at how the input changes between recursive calls. If we assume the worst case where the element is never in the middle of the list and we let $n$ represent the length of the list to be searched, then we can say:

$$ T(n) = \begin{cases} T(n/2) + c_1 & n > 1 \\ c_2 & n \leq 1 \\ \end{cases} $$

When there is more than one element in the partition, we call `index_of()` with one of the boundaries changed to `mid`. `mid` is computed as the middle of `start` and `end`, so the new boundaries constitute approximately half the size of the original partition, hence the $T(n/2)$ term. Of course, let's not forget that our function has other operations, but we assume all of these to run in constant time, hence the $c_1$ term. Then, we consider the base case where there is only one element left in the list, which executes in constant time $c_2$.

Let's use the substitution method to get a closed-form expression of $T(n)$:

$$
\begin{aligned}
T(n) = T(n/2) + c_1, & \quad \rm starting \space point \\
T(n) = [T(n/4) + c_1] + c_1, & \quad {\rm expanding \space} T(n/2) \\
T(n) = T(n/4) + 2c_1, & \quad {\rm combining \space} c_1 {\rm \space terms} \\
T(n) = [T(n/8) + c_1] + 2c_1, & \quad {\rm expanding \space} T(n/4) \\
T(n) = T(n/8) + 3c_1, & \quad {\rm combining \space} c_1 {\rm \space terms} \\
T(n) = T(n/16) + 4c_1, & \quad {\rm expanding \space} T(n/8) \\
T(n) = T(n/32) + 5c_1, & \quad {\rm expanding \space} T(n/16) \\
\end{aligned}
$$

Notice how $n$ is always divided by increasing powers of $2$, and the coefficient of $c_1$ increases after each expansion as well. In fact, if we *substitute* $k$ as the coefficient of $c_1$, we can rewrite the expression as follows:

$$ T(n) = T\left(\frac{n}{2^k}\right) + k{c}_{1} $$

We can prove the expression by induction but let's leave that as something to work on in our spare time. Next, we equate $n$ to $2^k$ and get an expression for $k$:

$$ n = 2^k \rightarrow k = \log_{2}n$$

Plugging it into $T(n)$ we get:

$$ T(n) = T(n/n) + c_1 \log_{2}n $$

$$ T(n) = c_2 + c_1 \log_{2}n $$

Note that $n/n = 1$ and $T(1) = c_2$. Lastly, we take the asymptotic bound:

$$ O(T(n)) = O(c_2 + c_1 \log_{2}n ) = O({\rm lg \space} n) $$

Thus, we can now confirm that *binary search* **is better than** *linear search* in terms of runtime.
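To see this logarithmic behaviour concretely, here is a quick sketch of my own (the name `index_of_counted` and the call counter are not part of the implementation above, just an illustration) that counts the recursive calls made by the same halving strategy and compares the count against $\log_{2}n$:

```python
import math

def index_of_counted(lst, key, start=0, end=None, calls=0):
    # Same halving idea as index_of() above, but it also returns how many
    # calls were made so we can compare the count against log2(n).
    if end is None:
        end = len(lst)
    if end - start == 1:
        return (start if lst[start] == key else -1), calls + 1
    elif end <= start:
        return -1, calls + 1
    mid = (end + start) // 2
    if key > lst[mid]:
        return index_of_counted(lst, key, mid, end, calls + 1)
    elif key < lst[mid]:
        return index_of_counted(lst, key, start, mid, calls + 1)
    return mid, calls + 1

for n in [10, 100, 1000, 10000]:
    lst = list(range(n))
    _, calls = index_of_counted(lst, -1)   # worst case: key is not in the list
    print(f"n = {n:5d} -> {calls:3d} calls, log2(n) ~ {math.log2(n):.1f}")
```

The printed call counts should stay within one or two of $\log_{2}n$ even as $n$ grows by factors of 10, which is exactly what the closed-form expression above predicts.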
### Auxiliary Space Complexity

To get the auxiliary space complexity we must first visualize the recursion tree of the function, considering a possible worst case where the element we need is at index `0`:

<center><img src="https://drive.google.com/uc?export=view&id=1MKVvnWEI0s4vBZQ8prfdCfwz4djYam6T" width="300" /></center>

Notice that for each level, the partition length is halved and continues to be halved until the base case where only one element remains. Thus, we can say that:

$$ partition \space length = \frac{n}{2^{level}} $$

To estimate the recursion depth we compute the level where the partition length is approximately $1$:

$$ 1 \approx \frac{n}{2^{depth}} \rightarrow depth \approx \log_{2}{n}$$

Assuming a constant ($c$) auxiliary space for a single call of `index_of()`, the total auxiliary space can be expressed as:

$$ M(n) = \left(\log_{2}n\right) \times c$$

Thus, the worst-case auxiliary space complexity is:

$$ O(M(n)) = O((\log_{2}{n}) \times c) = O({\rm lg \space}n)$$

This means that the *recursive binary search* **uses more space than** *linear search*, which could matter when we implement this on low-memory computers.

## The Verdict

*Is binary search better than linear search?* The answer depends on which resource you are willing to give up. Binary search uses a bit more memory but is significantly faster than linear search. You may even find this to be a recurring trade-off in future algorithms, where we incur more memory usage in order to speed up our solution.

*Is it possible to implement binary search without recursion?* Absolutely, and it might even be better in terms of memory! The verdict here is strictly based on the implementation above. The recursive implementation was shown to better prep you for the algorithms to come. It may look simple to do for binary search, but future divide and conquer algorithms would be a little more complex and harder to do iteratively.

# Programming Example: Merge Sort

Let's apply divide and conquer to our ever-present problem of sorting a list of elements. So far, you should have encountered the $O(n^2)$ algorithms which pass through each element possibly multiple times. For merge sort, we first **partition the list** and **sort these partitions separately**. Once sorted, the partitions can easily be **merged into a sorted list** since they're already sorted. But how would we sort the partitions? With merge sort, of course! Below is an annotated recording from [visualgo.net](https://visualgo.net/en/sorting):

[![Merge Sort Video](https://drive.google.com/uc?export=view&id=1I_X7D1VrQhhFuDuN_21kL8jTZiODvQRq)](https://drive.google.com/file/d/1XqMIOawZPIg6i_5mPBDq6TR4kcWZFf4Z/view?usp=sharing "Merge Sort Video")

In summary, once we have two partitions that we know are sorted, we merge these partitions one element at a time. We look at the smallest element of each (which is also the first one, since we know these partitions are sorted), *pluck* out whichever is smaller, and append it to a temporary list. This is repeated until both partitions are empty and we arrive at a merged and sorted list that we copy back to the original one.
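To make that merging step concrete before looking at the full algorithm, here is a minimal sketch of merging two already-sorted lists. The name `merge_two` and the index-based style are my own; the actual implementation that follows uses `pop(0)` instead:

```python
def merge_two(a, b):
    # Merge two already-sorted lists into one sorted list
    # by repeatedly taking the smaller front element.
    temp = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            temp.append(a[i])   # front of a is smaller
            i += 1
        else:
            temp.append(b[j])   # front of b is smaller (or equal)
            j += 1
    # One of the lists is exhausted; append whatever remains of the other.
    return temp + a[i:] + b[j:]

print(merge_two([3, 5, 38, 44], [2, 15, 26, 47]))  # [2, 3, 5, 15, 26, 38, 44, 47]
```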
We can now implement the full merge sort in Python:

```python
def merge_sort(arr):
    # Merge Sort helper function, calls recursive Merge Sort function
    global _debug

    def merge(a, b, l=0):
        temp = []                     # temporary empty list
        # Inserting elements into 'temp'
        while len(a) * len(b) > 0:
            if a[0] < b[0]:           # Check for smaller element
                temp.append(a.pop(0)) # a[0] is removed and appended to temp
            else:
                temp.append(b.pop(0)) # b[0] is removed and appended to temp
        if _debug: print(" "*l, "Sorted", temp + a + b)
        return temp + a + b           # Remaining elements appended

    def merge_sort_r(arr, l=0):
        if _debug: print(" "*l, "Merge Sort:", arr)
        n = len(arr)
        # Base Case
        if n <= 1:                    # partition has 1 element or less
            # partition is already sorted
            return arr

        mid = n // 2                  # find the middle index (average)
        left = merge_sort_r(arr[:mid], l + 1)   # merge sort lower partition
        right= merge_sort_r(arr[mid:], l + 1)   # merge sort upper partition
        if _debug: print(" "*l, "Merging:", left, right)
        return merge(left, right, l)

    if len(arr) > 1:
        return merge_sort_r(arr, 0)
    return arr

_debug = False
my_list = [3, 44, 38, 5, 47, 15, 36, 26, 27, 2, 46, 4, 19, 50, 48]
print("Before Sorting:", my_list)
my_list = merge_sort(my_list)
print("After Sorting:", my_list)
```

The first function in the implementation, `merge_sort`, is what we call a *helper function*: it just calls the recursive `merge_sort_r` function, which is what actually implements the merge sort algorithm. A *helper function* is commonly used with recursive functions as a form of input validation. It also allows us to reduce the number of parameters that need to be passed in the main function call, since we can hide the parameters that are only tracked during recursion (`l` in this case), which makes it a little more user friendly.

The actual recursive function, `merge_sort_r`, is called twice, once for each partition. When `merge_sort_r(arr[:mid], l + 1)` is called, merge sort is applied to the partition starting from index `0` up to index `mid-1`. The element at index `mid` is included in the second partition, i.e. when `merge_sort_r(arr[mid:], l + 1)` is called; this ensures only one partition includes the middle element.

Lastly, the `merge` function, as the name suggests, merges the partitions stored as `left` and `right`. A temporary list called `temp` is filled with elements from the two partitions one element at a time. We assume that each partition is sorted, so we are sure the first element in each partition is its smallest element. Thus, we only need to look at the first elements to know which to add first to `temp` and preserve the 'sorted-ness' of the result.

Note that `l` and `_debug` are additional variables used in debugging. Setting `_debug = True` generates additional `print()` statements that are indented based on the recursion level. These are really for debugging purposes only and are not used by the actual algorithm.

Now that we understand the implementation, let's analyze it!

## Time Complexity

We focus on the recursive function `merge_sort_r`, which calls itself twice, each time with half of its input, and also calls `merge`. Thus, our recurrence relation should look like:

$$ T(n) = \begin{cases} 2T(n/2) + T_{\rm merge}(n) + c_1 & n > 1 \\ c_2 & n \leq 1 \\ \end{cases} $$

where $n$ is the size of the partition for which the functions are called and $c_1$ and $c_2$ are the constant times for the other operations and the base case of the function, respectively.
We first need to determine $T_{\rm merge}(n)$ by analyzing the `merge` function, which we rewrite below with comments on the runtime of each line:

```python
def merge(a, b, l=0):
    temp = []                     # O(1)
    while len(a) * len(b) > 0:    # (len(a) + len(b)) * O(1) = O(n/2 + n/2)
        if a[0] < b[0]:
            temp.append(a.pop(0)) # Assume append() and pop() are O(1)
        else:
            temp.append(b.pop(0)) # Assume append() and pop() are O(1)
    if _debug: print(" "*l, "Sorted", temp + a + b)   # Does not count
    return temp + a + b           # Assume O(1)
```

We first assume that the `append()` and `pop()` operations are $O(1)$. These operations get repeated as long as `len(a)` and `len(b)` are both greater than `0`. Note that the `pop()` operations remove an item from a list, which means that after each iteration, `len(a)` or `len(b)` decreases by 1. The worst-case scenario is when the two partitions are emptied at nearly the same rate, so the loop repeats for all but one element. The runtime therefore depends linearly on the sizes of `a` and `b` combined. Moreover, when `merge_sort_r()` calls `merge()`, both inputs have approximately $\frac{n}{2}$ elements. Thus, the worst-case time complexity of `merge()` is:

$$ O(T_{merge}) = O(n) $$

We can then rewrite the recurrence relation as:

$$ T(n) = \begin{cases} 2T(n/2) + n + c_1 & n > 1 \\ c_2 & n \leq 1 \\ \end{cases} $$

and start simplifying it with either the substitution or the recursion tree method. Since we already used substitution for binary search, we draw the recursion tree for our implementation of merge sort:

![picture](https://drive.google.com/uc?export=view&id=1YhUhPl0GBI7A1daTzNokPWg_3_J1at6T)

Notice that for each node, the runtime has been annotated in terms of the size of the original input ($n$), and the total runtime per level is shown on the right. The $n$ terms are halved from one level to the next, but the number of nodes per level doubles, so the coefficient of $n$ taken per level stays the same at $1$. On the other hand, the constant $c_{1}$ does not change, but the number of nodes still doubles per level. We can then generalize the expression for the runtime of level $k$ as:

$$ T_{k}(n) = n + 2^{k}c_{1} $$

Similar to binary search, we can get the depth $d$ at which the partition length is approximately $1$:

$$ \frac{n}{2^{d}} \approx 1 \rightarrow d \approx \log_{2}{n}$$

We can then set up and simplify the summation of the total runtime as:

$$ T(n) = 2^{d}c_2 + \sum_{k = 0}^{d-1} \left(n + 2^{k}c_1\right) = 2^{d}c_2 + dn + (2^{d} - 1)c_1 $$

$$ T(n) = nc_{2} + (\log_{2}n)n + (n-1)c_{1}$$

Thus, the time complexity would be:

$$ O(T(n)) = O(n + n \log_{2}{n} + n) = O(n \space {\rm lg \space}n) $$

$O(n \space {\rm lg \space}n)$ or **log-linear** time complexity is definitely better than $O(n^2)$ and is a clear improvement over the sorting algorithms we have previously encountered.
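As a quick sanity check on that log-linear growth, here is a sketch of my own (the name `merge_sort_counted` and the comparison counter are assumptions for illustration, not the implementation above) that counts element comparisons during merging and prints them next to $n \log_{2} n$:

```python
import math
import random

def merge_sort_counted(arr):
    # Standard top-down merge sort that also returns the number of element
    # comparisons, so we can compare the count against n * log2(n).
    if len(arr) <= 1:
        return arr, 0
    mid = len(arr) // 2
    left, cl = merge_sort_counted(arr[:mid])
    right, cr = merge_sort_counted(arr[mid:])
    merged, comparisons = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged += left[i:] + right[j:]      # append whatever remains
    return merged, cl + cr + comparisons

for n in [16, 256, 4096]:
    data = random.sample(range(10 * n), n)
    _, count = merge_sort_counted(data)
    print(f"n = {n:5d} -> {count:6d} comparisons, n*log2(n) ~ {n * math.log2(n):.0f}")
```

The comparison count should track $n \log_{2} n$ (it will be somewhat lower, since each merge stops as soon as one partition runs out), which is consistent with the recursion-tree result above.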
## Space Complexity

The space complexity is a little trickier for merge sort because of the additional `merge()` function that is called. Thus, the total worst-case auxiliary space complexity for merge sort is generally:

$$ M(n) = (memory \space for \space recursion) + (memory \space for \space merge) $$

Memory for recursion can be obtained by getting the total memory used by one 'path' from the initial function call down to one of the function calls in the deepest level, like this one:

![picture](https://drive.google.com/uc?export=view&id=1wyfaYf0J3LuDWcMBJAXJlbsrNG-UJ_M3)

Note that $c_{1}$ in the figure is the constant space needed for the ordinary variables used (i.e. `n` and `mid`), while $c_{2}$ is the space required to store one element of the sorted partitions (i.e. `left` and `right`). Notice that $c_{2}$ is multiplied by $2$ at each level because we need to store 2 partitions, but it is also continuously halved because the partition sizes are halved at each level. Thus, the total space for recursion can be expressed as:

$$ M_{recursion}(n) = dc_{1} + \sum_{k=0}^{d-1}\left( 2\times \frac{n}{2^{k+1}}c_{2}\right) = dc_{1} + \left(2 - \frac{2}{2^{d}}\right)nc_{2} $$

We already know that the depth ($d$) is equal to $\log_2(n)$, so we can further simplify the expression to:

$$ M_{recursion}(n) = c_{1}\log_2(n) + 2nc_{2} - \frac{2n}{n}c_{2} = c_{1}\log_2(n) + (2n-2)c_{2} $$

Lastly, we look at the memory needed by `merge()`, which is largely dictated by the space allotted for the temporary list `temp` that stores the resulting merged list. Obviously, the size of `temp` is the combined size of the inputs `a` and `b`. This means that:

$$ O(M_{merge}(n)) = O(n) $$

We can now finally get the overall auxiliary space complexity for our merge sort implementation:

$$ O(M_{recursion}(n) + M_{merge}(n)) = O(\log_2(n) + 2n + n) = O(n) $$

This means that **the price to pay for the log-linear running time is a linear space complexity**, at least for our specific implementation of merge sort. Note that this may differ from our references or other online sources since they may have a different, more optimized, implementation.
# Programming Example: The Maximum Sub-Array Problem

Note that this is hugely based on Chapter 4 of [CLRS](https://www.amazon.com/Introduction-Algorithms-3rd-MIT-Press/dp/0262033844), with my own implementation in Python3 and made more *aligned* with the discussion in class.

```python
# Some functions to help visualize the examples later
# Run this code snippet before anything else
import matplotlib.pyplot as plt

def plot_array(arr, l=-1, r=-1):
    plt.plot(arr, color="b", label="arr", marker='x')
    if l >= 0 and r >= 0:
        sum = 0
        for e in arr[l:r+1]:
            sum += e
        plt.plot(range(l, r+1), arr[l:r+1], color="r", \
                 label="total = " + str(sum), marker='*', alpha=0.4)
    plt.legend()
    plt.show()

d = True # Replace with False to disable debug commands
def debug(*s):
    if d: print(*s)
```

## Problem Statement

Suppose that you have an array of numbers (e.g. stock price changes) and you want to find the maximum sub-array or slice of the array that results in the highest sum (e.g. profit). Shown in the code below is a sample array, and highlighted in red is the expected highest sub-array.

```python
arr = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
plot_array(arr, 7, 10)
```

## A $\Theta(n^2)$ approach

We can simply check each possible sub-array and store whichever has the highest sum:

```python
def max_sub_array_i(arr):
    i = 0                    # start checking from index 0
    max_sum = -10000000      # some large negative number
    start = 0                # storage for first index of max sub-array
    stop = 0                 # storage for last index of max sub-array
    while i < len(arr):      # check until last index
        j = i                # end of current sub-array
        sum = 0              # current sum of sub-array
        while j < len(arr):  # check until last index
            sum += arr[j]    # add last included element
            debug("Getting sum from", i, "to", j, ":", sum)
            if sum > max_sum:      # new max sub-array found
                max_sum = sum      # update values
                start = i
                stop = j
                debug(" New maximum!")
                #plot_array(arr, i, j)
            j += 1           # update j
        i += 1               # update i
    return (max_sum, start, stop)

m, l, r = max_sub_array_i(arr)
print("Maximum sum found from", l, "to", r, ": ", m)
plot_array(arr, l, r)
```

The worst-case time complexity of the algorithm is, as suggested by the section header, $\Theta(n^2)$, where `n` is the length of `arr`. The inner `while` loop iterates `n` times when `i = 0`, then `n - 1` times when `i` is incremented, then `n - 2` times, and so on. The total number of inner iterations is therefore:

$$ T(n) = n + (n-1) + (n-2) + ... + 2 + 1 = \frac{n(n+1)}{2} $$

which gives $O(T(n)) = O(n^2)$, assuming all operations inside the loops run in constant time. The function only uses temporary variables that are constant in size, thus the auxiliary space complexity is constant, or $O(M(n)) = O(1)$.

Correctness is left as an exercise to the reader, although it would be helpful to note that the inner loop invariant would read like "`arr[start:stop+1]` is the maximum sub-array in `arr[i:j]`" and the outer loop invariant would be hugely based on the termination condition of the inner loop.

## A Divide and Conquer Approach

So how do we create a divide and conquer approach?
Of course, we can always start with a *template* where we progressively split the array into two, similar to what we did for merge sort:

```python
def max_sub_array_r(arr, start, end):
    if end-start > 1:                     # if more than one element
        mid = (end+start)//2              # find middle
        max_sub_array_r(arr, start, mid)  # operate on left sub-array
        max_sub_array_r(arr, mid, end)    # operate on right sub-array
        # do something in the middle
        return
    return
```

Now what? Well, `max_sub_array_r(arr, start, mid)` finds the maximum sub-array in the left partition and `max_sub_array_r(arr, mid, end)` finds the maximum sub-array in the right partition, and these should be the *contenders* for the maximum sub-array of the whole thing. But we're missing one more *contender*, and that is the maximum *crossing* sub-array, which contains the midpoint of the partitioned array.

<center><img src="https://drive.google.com/uc?export=view&id=1eCk3jhk0UlqoJYNcHMDHQsOXcLUR6Qhc" />
<br>Figure from <a href="https://www.amazon.com/Introduction-Algorithms-3rd-MIT-Press/dp/0262033844"> CLRS </a>. (Note that $low$ is `start`, $high$ is `end`, and mind the difference in how we define slices in Python3.)
</center>

Knowing that, we can modify our *template* above to compare the contenders and define the return values:

```python
def max_sub_array_r(arr, start, end):
    if end-start > 1:                     # if more than one element
        debug("Getting max sum from", start, "to", end, "- 1")
        mid = (end+start)//2              # find middle
        debug(" Getting max left sum")
        c1 = max_sub_array_r(arr, start, mid)            # operate on left sub-array
        debug(" Left contender", c1[1], "to", c1[2], ":", c1[0])
        debug(" Getting max right sum")
        c2 = max_sub_array_r(arr, mid, end)              # operate on right sub-array
        debug(" Right contender", c2[1], "to", c2[2], ":", c2[0])
        c3 = max_sub_array_cross(arr, start, mid, end)   # operate on crossing
        debug(" Crossing contender", c3[1], "to", c3[2], ":", c3[0])

        # Note that c1, c2, c3 are 3-element tuples
        #   [0] := maximum sum
        #   [1] := start of max sub-array
        #   [2] := end of max sub-array
        cmax = c1 if c1[0] > c2[0] else c2
        cmax = c3 if c3[0] > cmax[0] else cmax
        debug(" Max contender from", start, "to", end, "- 1:", \
              cmax[1], "to", cmax[2], "=", cmax[0])
        return cmax
    elif end-start == 1:
        debug(" Single element", start, "to", end, "- 1", ":", arr[start])
        return (arr[start], start, start)  # return single-element sub-array
    else:
        debug(" Empty list", start, "to", end, "- 1", ".")
        return (-10000, start, end)        # return a large negative value
```

What is left is to define `max_sub_array_cross`, but that's easier said than done. We can always just check *every possible* sub-array like the $O(n^2)$ approach, but wouldn't that mean an $O(n^2)$ runtime as well? Not exactly, because we have the added restriction that the midpoint must be included, and that changes a lot! As shown in Figure 4.4(b), a sub-array containing the midpoint consists of two sub-arrays: one ending at the midpoint and the other starting at the midpoint. By maximizing these two sub-arrays, we can find the maximum crossing sub-array by combining them.
Since we already know one endpoint of each of these two sub-arrays, finding the other endpoint is simpler:

```python
def max_sub_array_cross(arr, start, mid, end):
    debug(" Getting max crossing sum from", start, "to", mid, "to", end, "- 1")

    # Getting maximum sub-array ending at mid
    max_sum_left = -100000      # some large negative number
    left = -1                   # initial border of left max sub-array
    sum_left = 0                # initial sum
    i = mid-1                   # start loop at mid-1
    while i >= start:
        sum_left += arr[i]
        if sum_left > max_sum_left:   # new max sub-array found
            max_sum_left = sum_left   # update values
            left = i
        i -= 1                  # decrement since we're moving to the left
    debug(" Max left at", left, ":", max_sum_left)

    # Getting maximum sub-array starting at mid
    max_sum_right = -100000     # some large negative number
    right = -1                  # initial border of right max sub-array
    sum_right = 0               # initial sum
    i = mid                     # start loop at mid
    while i < end:
        sum_right += arr[i]
        if sum_right > max_sum_right:   # new max sub-array found
            max_sum_right = sum_right   # update values
            right = i
        i += 1                  # increment since we're moving to the right
    debug(" Max right at", right, ":", max_sum_right)

    return (max_sum_left+max_sum_right, left, right)
```

Now, we just need our helper function:

```python
def max_sub_array(arr):
    return max_sub_array_r(arr, 0, len(arr))

m, l, r = max_sub_array(arr)
print("Maximum sum found from", l, "to", r, ": ", m)
plot_array(arr, l, r)
```

## Complexity Analysis

To analyze the algorithm, let us rewrite the function without the debug statements and some of the comments (which are basically half the code lol):

```python
def max_sub_array_r(arr, start, end):
    if end-start > 1:                     # if more than one element
        mid = (end+start)//2              # find middle
        c1 = max_sub_array_r(arr, start, mid)            # operate on left sub-array
        c2 = max_sub_array_r(arr, mid, end)              # operate on right sub-array
        c3 = max_sub_array_cross(arr, start, mid, end)   # operate on crossing
        cmax = c1 if c1[0] > c2[0] else c2
        cmax = c3 if c3[0] > cmax[0] else cmax
        return cmax
    elif end-start == 1:
        return (arr[start], start, start)  # return single-element sub-array
    else:
        return (-10000, start, end)        # return a large negative value
```

Apart from looking friendlier, it's much easier to analyze. We can now define the recurrence relation to be:

$$ T(n) = \begin{cases} 2T(n/2) + T_{cross}(n) & n > 1 \\ O(1) & n \leq 1 \\ \end{cases} $$

where $n$ is the size of the array and $T_{cross}$ is the time complexity of `max_sub_array_cross`. Looking at that function, we see two `while` loops. The first loop iterates exactly `mid-start` times while the second iterates exactly `end-mid` times. Together they take `end-start` iterations, which is equal to `n`, so assuming all other operations are constant:

$$T_{cross}(n) = O(n)$$

Thus the recurrence becomes:

$$ T(n) = \begin{cases} 2T(n/2) + O(n) & n > 1 \\ O(1) & n \leq 1 \\ \end{cases} $$

which is the same as merge sort. Thus:

$$T(n) = \Theta(n \log_2{n}).$$

This would also generate a similar recursion tree, but unlike `merge`, `max_sub_array_cross` only consumes constant space. Thus the auxiliary space complexity is proportional to the recursion depth:

$$M(n) = \Theta(\log_2{n}).$$

## Algorithm Correctness

Our proof would sound similar to our process in creating the algorithm itself; it's just a matter of formalizing the statements. We can state the predicate as:

$$ msa(start, end) = \max{\left(msa\left(start, mid\right), msa\left(mid, end \right), msa_{cross}\left(start, end \right)\right)} $$

where $mid = \left\lfloor \frac{end+start}{2} \right\rfloor$ and $msa$ stands for the `max_sub_array` function (abbreviated because it was long).
Using induction, we have the base case where either: (a) $end-start = 1$, which results in a single element, and thus there is only a single sub-array and that is the element itself, or (b) $end-start < 1$, which means the list is empty and no sub-arrays exist.

In the inductive step, we assume that $msa\left(start, mid\right)$ and $msa\left(mid, end \right)$ both return the maximum sub-arrays in their respective partitions. You can then prove (on your own hehe) that $msa_{cross}\left(start, end \right)$ returns the maximum sub-array containing the middle of the list. Next, notice that if the maximum sub-array in the list *does not* contain the midpoint, it *must* lie entirely within one of the partitions. But we already know the maximum sub-arrays of each partition because of the inductive assumption, so we just have to find the maximum among the contenders.

Lastly, the terminating condition is when $end - start = n$, where $n$ is the length of the array, and we find the maximum sub-array over the whole array.

## The Verdict

Below is a table of the worst-case time and space complexities of both implementations:

| Implementation | O(T(n)) | O(M(n)) |
| --- | :-: | :-: |
| Iterative | O(n^2) | O(1) |
| Divide-and-Conquer | O(n log n) | O(log n) |

The divide-and-conquer approach, like merge sort, has a much better time complexity at the cost of some additional space. Thus, if a faster algorithm is more important to you, divide-and-conquer approaches typically deliver it by trading some additional space for less time.

# The Master Theorem

You may have noticed that the recurrence relations for both binary search and merge sort have a similar form. Actually, it has been observed that the recurrence relation for most, if not all, divide-and-conquer algorithms can be generalized as:

$$ T(n) = a \times T\left( \frac{n}{b} \right) + f(n)$$

where $f(n)$ describes the running time of the *merge* operations, $b$ describes the *number of partitions* made, and $a$ describes the *number of conquered* partitions. For binary search, $f(n)$ is a constant ($c$) since 'merging' is just returning a value, $b$ is $2$ since the list is partitioned in half, and $a$ is $1$ since only one partition is searched. On the other hand, merge sort has an $f(n)$ that is linear ($n$) because the `merge` function is linear, $b$ is also $2$ since it generates $2$ partitions each time, and $a$ is $2$ since both partitions are sorted.

The master theorem tells us that, depending on the relationship of $f(n)$, $a$, and $b$, the following can be said about the time complexity, provided that the base case runs in constant time:

1. if $O(f(n)) < n^{\log_{b}{a}}$, then $\Theta(T(n)) = n^{\log_{b}{a}}$
2. if $O(f(n)) = n^{\log_{b}{a}}$, then $\Theta(T(n)) = n^{\log_{b}{a}} \space {\rm lg \space} n$
3. if $O(f(n)) > n^{\log_{b}{a}}$, then $\Theta(T(n)) = \Theta(f(n))$

Remember that the $\Theta$ notation means a *tight* bound: it is both an upper and a lower bound on the time complexity. Applying it to binary search, we compare $O(f(n)) = 1$ and $n^{\log_{2}{1}} = n^0$. Since $n^0 = 1$, this is case 2 of the master theorem and thus:

$$ \Theta(T(n)) = n^{\log_{2}{1}} \space {\rm lg \space} n = {\rm lg \space} n $$

which is consistent with our example above. Try and apply the master theorem on merge sort yourself and confirm that our analysis arrived at the same answer.
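As one more worked illustration (my own addition, not from the original notes), here is the master theorem applied to the divide-and-conquer maximum sub-array recurrence derived earlier, $T(n) = 2T(n/2) + O(n)$. There are two recursive calls on halves of the array and a linear crossing step, so:

$$ a = 2, \quad b = 2, \quad f(n) = O(n) $$

$$ n^{\log_{b}{a}} = n^{\log_{2}{2}} = n^{1} = n $$

Since $O(f(n)) = n = n^{\log_{b}{a}}$, this is case 2 of the master theorem, and thus:

$$ \Theta(T(n)) = n^{\log_{2}{2}} \space {\rm lg \space} n = n \space {\rm lg \space} n $$

which matches the $\Theta(n \log_2{n})$ we obtained from the recursion tree for the same recurrence.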