# Searching & Sorting Notes
Algorithms Course by Grow with Google
###### tags: Data Structures & Algorithms Practice
`{%hackmd theme-dark %}`
## Binary Search
Let's say you have an array sorted in numerical order and you want to see if a number you have exists in this array.
If you start at the front and check every number in the array, the search takes O(n) time in the worst case, e.g. when your number is near the end of the array.
Cool trick: binary search!
Rather than starting at the front of the array, let's instead start at the middle.
You could ask, "is my number bigger or smaller than the one in the middle?"
Let's say the number is bigger... then you know that the number will be in the second half of the array, and now you don't even have to search the first half.
* The efficiency of binary search is: O(log(n))
### Binary Search Practice
* Searches and sorts can be very hard to visualize and understand. [Here](http://www.cs.armstrong.edu/liang/animation/web/BinarySearch.html) is a supplementary visualization that might help as well!
Python lists have a method called `index()`, which just does a search and returns the first index with an instance of that value. Next, you're going to write a binary search function that has the same result, but searches faster. Keep in mind the constraint for this exercise—for binary search, elements need to be in increasing order.
You're going to write a binary search function. You should use an iterative approach - meaning using loops. Your function should take two inputs: a *Python list* to search through, and the *value* you're searching for.
Assume the list only has distinct elements, meaning there are no repeated values, and elements are in a strictly increasing order. Return the index of value, or -1 if the value doesn't exist in the list.
```python
def binary_search(input_array, value):
    # low and high keep track of which part of the list you'll search in
    low = 0
    high = len(input_array) - 1
    # while the search range still contains at least one element...
    while low <= high:
        # ...check the middle element
        mid = (low + high) // 2
        guess = input_array[mid]
        # found the element
        if guess == value:
            return mid
        # the guess was too high
        if guess > value:
            high = mid - 1
        # the guess was too low
        else:
            low = mid + 1
    # the item doesn't exist
    return -1

# Let's test it!
test_list = [1, 3, 9, 11, 15, 19, 29]
test_val1 = 25
test_val2 = 15
print(binary_search(test_list, test_val1))  # -1
print(binary_search(test_list, test_val2))  # 4
```
Also check out [this visualization](https://www.cs.usfca.edu/~galles/visualization/Search.html).
## Recursion
The idea is to create a function that calls itself.
1. First, a recursive function needs to call itself at some point
2. Second, a recursive function needs a base case
3. Third, a recursive function needs to alter its input parameter at some point, so that it moves toward the base case
Here is a basic example in pseudo code:
```python
def recursive(input):
    # Second, a recursive function needs a base case
    if input <= 0:
        return input
    else:
        # First, a recursive function needs to call itself at some point
        # (and the input parameter is altered: input - 1)
        output = recursive(input - 1)
        return output
```
You can think of a recursive function kind of like a while loop in some regard
* you keep going through the code again and again and again until you hit some exit condition
The *base case* is like the exit condition: it lets you know when you need to stop. Without a base case you could get stuck in infinite recursion.
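The analogy above can be sketched with a simple countdown, written once as a while loop and once recursively (a minimal sketch; the function names are just for illustration):

```python
def countdown_loop(n):
    # keep going until the exit condition, just like the text describes
    while n > 0:
        n = n - 1
    return n

def countdown_recursive(n):
    # base case: the "exit condition"
    if n <= 0:
        return n
    # recursive call with an altered input, moving toward the base case
    return countdown_recursive(n - 1)

print(countdown_loop(5))       # 0
print(countdown_recursive(5))  # 0
```

Both versions do the same work; the loop's exit condition plays the role of the recursion's base case.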
### Recursion Practice
We're going to take a look at recursion with a famous example—the Fibonacci Sequence.
The Fibonacci Sequence follows one rule: get the next number in the sequence by adding the two previous numbers. Here is an example of the sequence:
```python
0,1,1,2,3,5,8,13,21,34...
```
Step through each value. You start with 0 and 1. 0 + 1 = 1, so you add 1 to the sequence. 1 + 1 = 2, so 2 is added. 1 + 2 = 3, so 3. 2 + 3 = 5, et cetera.
You could represent these numbers as a Python list, and it would look something like this:
```
fib_seq = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
# e.g. fib_seq[0] == 0, fib_seq[4] == 3, fib_seq[9] == 34
```
We can generate this sequence using an iterative method (with loops) - here's one way, in JavaScript:
```javascript
function getFib(position) {
    if (position == 0) { return 0; }
    if (position == 1) { return 1; }
    var first = 0,
        second = 1,
        next = first + second;
    for (var i = 2; i < position; i++) {
        first = second;
        second = next;
        next = first + second;
    }
    return next;
}
```
Now implement `get_fib()` in a recursive way: write a function that recursively computes the desired Fibonacci sequence value. Your code should have the same input/output as the iterative code above.
```python
def get_fib(position):
    if position == 0 or position == 1:
        return position
    return get_fib(position - 1) + get_fib(position - 2)

# Test cases
print(get_fib(9))   # 34
print(get_fib(11))  # 89
print(get_fib(0))   # 0
```
You may have noticed that this solution will compute the values of some inputs more than once. For example, `get_fib(4)` calls `get_fib(3)` and `get_fib(2)`; `get_fib(3)` calls `get_fib(2)` and `get_fib(1)`; and so on. The number of recursive calls grows exponentially with the position.
In practice, if we were to use recursion to solve this problem, we should use a hash table to store and fetch previously calculated results (a technique called *memoization*). This will increase the space needed, but will drastically improve the runtime efficiency.
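Here's a minimal sketch of that idea, using a Python dictionary as the cache (the function and variable names are just illustrative):

```python
def get_fib_memo(position, cache=None):
    # the cache stores previously computed values, so each position is solved once
    if cache is None:
        cache = {}
    if position in cache:
        return cache[position]
    if position == 0 or position == 1:
        return position
    cache[position] = get_fib_memo(position - 1, cache) + get_fib_memo(position - 2, cache)
    return cache[position]

print(get_fib_memo(9))   # 34
print(get_fib_memo(50))  # a large position like this finishes instantly,
                         # unlike the plain recursive version
```

Each distinct position is now computed only once, so the runtime drops from exponential to linear in the position, at the cost of O(n) space for the cache.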
## Intro to sorting
* An *in place* sorting algorithm just rearranges the elements in the data structure they're already in, without needing to recreate the data structure by copying everything to a new one
* has low space complexity
    * because you don't need to recreate the data structure
## Bubble Sort (Sinking sort)
* Naive approach
* Go through an array comparing elements side by side and switching them whenever necessary
For example, take this array to sort [8,3,1,7,0]:
1. If the first element is bigger than the second one, switch them
2. Repeat for each adjacent pair of elements in the array
3. A full pass over the 5-element array takes four comparisons, i.e. (n - 1) comparisons, and leaves us with [3,1,7,0,8]
the array is still not completely sorted but it looks better now...
4. The largest element, 8, moved up to the end of the array where it belongs - in each pass, the largest remaining element will bubble up to the top
5. Repeat the process until sorted - each pass again takes (n - 1) comparisons, and after (n - 1) passes we get [0,1,3,7,8]
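The passes described above can be sketched as a short Python function (a minimal sketch using in-place swaps; after pass i, the last i elements are already in place, so they're skipped):

```python
def bubble_sort(arr):
    n = len(arr)
    # each pass bubbles the largest remaining element to the end
    for i in range(n - 1):
        # elements after index n-1-i are already in their final position
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                # switch adjacent elements that are out of order
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

print(bubble_sort([8, 3, 1, 7, 0]))  # [0, 1, 3, 7, 8]
```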
### Efficiency
The runtime is O((# comparisons per pass) × (# passes))
*Passes × Comparisons* = (n-1)(n-1) = n² - 2n + 1 = **O(n²)**
The most common version of bubble sort skips comparing the last few elements that are already in the right place. It assumes that after the first iteration you know that 8 is in the right place: [3,1,0,~~7~~,~~8~~]
Bubble sort is an *in place* sorting algorithm
*Space Complexity* = **O(1)**
Meaning we didn't need any extra arrays or data structures in the whole process
### Bubble Sort Practice
Check out the [Wikipedia page](https://en.wikipedia.org/wiki/Bubble_sort) for each sort. There's normally some colorful illustration near the top, then a GIF showing the sort in action.
## Merge Sort
You split a huge array down as much as possible, and over time build it back up, doing comparisons and sorting at each step along the way - *Divide & Conquer*
For example, take this array to sort:
[8,3,1,7,0,10,2]
1. First we break up the array into a bunch of arrays with one element each
[8] [3] [1] [7] [0] [10] [2]
2. Now we need to start building it back up - after this step we should only have arrays of size two and size one left
[8] [3,1] [7,0] [10,2]
3. First compare [3,1] to see which is smaller; since 1 is smaller it'll go first in the new array, and since 3 is bigger it'll go second (3>1)
4. Now move on and repeat for the next pair of arrays - overall these steps took 3 comparisons: [8] [1,3] [0,7] [2,10]
5. Now combine the first two arrays and the last two arrays to create an array of three and an array of four [8,1,3] [0,7,2,10]
6. [1,3] are already sorted
* (8>1) so 1 will be the first element
* (8>3) so 3 will be the next element
* tack 8 on the end
[1,3,8]
7. Sort the second array
* (0<2) so 0 will be the first element
* (7>2) so 2 will be the next element
* (7<10) 7 will be the next element
[0,2,7,10]
These steps required 5 comparisons: [1,3,8] [0,2,7,10]
8. We need to go through one more round to get the completely sorted array - repeat the process from the last steps
Sorted array required 6 comparisons: [0,1,2,3,7,8,10]
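The divide-and-conquer process above can be sketched in Python (a minimal top-down version; the `merge` helper name is just for illustration):

```python
def merge_sort(arr):
    # base case: arrays of size 0 or 1 are already sorted
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    # divide: split the array and sort each half recursively
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # conquer: merge the two sorted halves with pairwise comparisons
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    # repeatedly take the smaller front element, as in the steps above
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # tack on whatever is left in either half
    result.extend(left[i:])
    result.extend(right[j:])
    return result

print(merge_sort([8, 3, 1, 7, 0, 10, 2]))  # [0, 1, 2, 3, 7, 8, 10]
```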
### Efficiency
At each step we did one less comparison than the size of the array we were building - when we were building an array of seven, we only did six comparisons.
So if n is the size of the array, each step does roughly n comparisons, and the overall complexity is O(n × # steps).
How many steps are there? In binary search, the number of iterations went up by one at every power of two. The same thing happens here: *the number of steps goes up by one every time the array size doubles*, so there are about log(n) steps.
In summary, we're doing roughly n comparisons for each of log(n) steps, which gives an overall complexity of **O(n log(n))**.
The efficiency of merge sort, O(n log(n)), is better than the efficiency of bubble sort, O(n × n) = O(n²).
However, the space efficiency of bubble sort (which sorts in place and didn't use extra arrays) is better than that of merge sort (which used many copied arrays).
### Merge Sort Practice
Sorting is pretty tedious—take a break and check out [this comic](https://xkcd.com/1185/) inspired by merge sort.
You can learn more about merge sort, as well as see many more visualizations, in the online [_Algorithms_ textbook](http://algs4.cs.princeton.edu/22mergesort/). This book is often nice for a more in-depth analysis of topics, but beware—all of the examples are in Java! For merge sort, it's particularly worth reading up on top-down and bottom-up merge sort.
## Quick Sort
In many cases, quick sort is one of the most efficient sorting algorithms:
1. Pick one of the values in the array at random (the *pivot*)
2. Move all values larger than the pivot above it, and all values smaller than the pivot below it
3. Continue recursively: pick a pivot in the upper and the lower sections of the array and sort them the same way, until the whole array is sorted
### Efficiency
The magic of this algorithm is that it cuts the number of comparisons you need to do by splitting the array roughly in half every time.
*Worst case:* the pivot doesn't split the array in half, so you have to do all of the comparisons every time - O(n²)
*Average & best case:* **O(n log(n))** - in a good case the pivot ends up near the middle and you get to divide the array in half every time
*Space Complexity* = **O(1)**
Meaning the in-place version doesn't need any extra arrays or data structures in the whole process - the swaps all happen inside the original array
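That space claim is about the in-place version, which partitions by swapping elements within the original array instead of building new lists. Here's a rough sketch using the Lomuto partition scheme (the function names are illustrative, and note the recursion itself still uses some stack space):

```python
def quicksort_in_place(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        # partition around a pivot, then sort each side recursively
        p = partition(arr, low, high)
        quicksort_in_place(arr, low, p - 1)
        quicksort_in_place(arr, p + 1, high)
    return arr

def partition(arr, low, high):
    # Lomuto scheme: use the last element of this section as the pivot
    pivot = arr[high]
    i = low
    for j in range(low, high):
        # move values smaller than the pivot below it by swapping in place
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    # put the pivot between the smaller and larger values
    arr[i], arr[high] = arr[high], arr[i]
    return i

print(quicksort_in_place([8, 3, 1, 7, 0, 10, 2]))  # [0, 1, 2, 3, 7, 8, 10]
```

By contrast, the practice solution below builds new sub-lists at every level, which is simpler to read but uses extra space.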
### Quick Sort Practice
Implement quick sort in Python:
1. Input a list.
2. Output a sorted list.
```python
def quicksort(array):
    if len(array) < 2:
        # base case: arrays with 0 or 1 element are already "sorted"
        return array
    else:
        # recursive case
        pivot = array[0]
        # sub-array of all the elements less than or equal to the pivot
        less = [i for i in array[1:] if i <= pivot]
        # sub-array of all the elements greater than the pivot
        greater = [i for i in array[1:] if i > pivot]
        return quicksort(less) + [pivot] + quicksort(greater)

test = [21, 4, 1, 3, 9, 20, 25, 6, 21, 14]
print(quicksort(test))  # [1, 3, 4, 6, 9, 14, 20, 21, 21, 25]
```
Investigate some of the other famous sorting algorithms. You can watch all of them in action [here](http://visualgo.net/en/sorting/).