# **Data Structures**

# Algorithms: [Notes from Algorithmic Toolbox](https://drive.google.com/file/d/1A46qXWdQnDnxcaNC4v_KxbnznqcDUBD5/view?usp=sharing)

# Week 1

* [Dynamic memory management](https://www.programiz.com/cpp-programming/memory-management#:~:text=of%20an%20array%2C-,the,returns%20the%20address%20of%20the%20first%20element%20of%20the%20array.,-From%20the%20example)
* [Understanding pointers](https://www.programiz.com/cpp-programming/pointers-function#:~:text=n2%3B%0A*n2%20%3D%20temp%3B-,*n1,respectively.,-Since)
* [Arrow operator](https://www.geeksforgeeks.org/arrow-operator-in-c-c-with-examples/)
* [Why aren't pointers initialised with null by default?](https://stackoverflow.com/questions/1910832/why-arent-pointers-initialized-with-null-by-default)

## Singly Linked List

![](https://i.imgur.com/83sXJ8V.png)

```
PushFront(key)
  node ← new node
  node.key ← key
  node.next ← head
  head ← node
  if tail = nil:
    tail ← head

PopFront()
  if head = nil:
    ERROR: empty list
  head ← head.next
  if head = nil:
    tail ← nil

PushBack(key)
  node ← new node
  node.key ← key
  node.next ← nil
  if tail = nil:
    head ← tail ← node
  else:
    tail.next ← node
    tail ← node

PopBack()
  if head = nil:
    ERROR: empty list
  if head = tail:
    head ← tail ← nil
  else:
    p ← head                  // p is a temporary pointer
    while p.next.next ≠ nil:
      p ← p.next
    p.next ← nil
    tail ← p

AddAfter(node, key)
  node2 ← new node
  node2.key ← key
  node2.next ← node.next
  node.next ← node2
  if tail = node:
    tail ← node2
```

Running time:

![](https://i.imgur.com/AjL4FEa.png)

## Doubly Linked List

![](https://i.imgur.com/T1N1CU0.png)

```
PushBack(key)
  node ← new node
  node.key ← key
  node.next ← nil
  if tail = nil:
    head ← tail ← node
    node.prev ← nil
  else:
    tail.next ← node
    node.prev ← tail
    tail ← node

PopBack()
  if head = nil:
    ERROR: empty list
  if head = tail:
    head ← tail ← nil
  else:
    tail ← tail.prev
    tail.next ← nil

AddAfter(node, key)
  node2 ← new node
  node2.key ← key
  node2.next ← node.next
  node2.prev ← node
  node.next ← node2
  if node2.next ≠ nil:
    node2.next.prev ← node2
  if tail = node:
    tail ← node2

AddBefore(node, key)
  node2 ← new node
  node2.key ← key
  node2.next ← node
  node2.prev ← node.prev
  node.prev ← node2
  if node2.prev ≠ nil:
    node2.prev.next ← node2
  if head = node:
    head ← node2
```

#### Running time:

![](https://i.imgur.com/xZki8TG.png)

## Stacks

* [struct in cpp](https://www.geeksforgeeks.org/structures-in-cpp/)
* [constructors for struct](https://stackoverflow.com/questions/1127396/struct-constructor-in-c)
* [initializer list](https://www.geeksforgeeks.org/when-do-we-use-initializer-list-in-c/)

### Abstract data type with the following operations:

![](https://i.imgur.com/O6fOgIf.png)

#### Question

![](https://i.imgur.com/qVCmKOC.png)
![](https://i.imgur.com/xPLCiSE.png)
![](https://i.imgur.com/XfhWy2R.png)

Stacks are used in compilers and many other applications.

Stack implementation with an array:
* limitation: has a maximum size

Stack with a linked list:
* no wasted space
* no limit on size

Summary:
* each operation — Push, Pop, Top, Empty — is O(1)
* stacks are occasionally known as LIFO (last in, first out) queues

## Queues

### Abstract data type operations:
* Enqueue(key): adds key to the collection
* Dequeue(): removes and returns the least recently added key
* Boolean Empty(): are there any elements?

FIFO: first in, first out

#### Implementation with a linked list

![](https://i.imgur.com/LkL2QoT.png)

* can grow arbitrarily large, but we pay the extra space of a pointer per element.

### Implementation with an array (circular buffer) — see the sketch after this list:
* we keep two extra variables, read and write
* there is at least one slot where we cannot write (so read = write unambiguously means "empty")
* Enqueue, Dequeue, and Empty each cost O(1)
* bounded in size
* [gfg implementation](https://www.geeksforgeeks.org/circular-queue-set-1-introduction-array-implementation/)
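A minimal C++ sketch of the circular-buffer queue just described (the class name and API are my own, not from the course; one slot is deliberately left unused so that read = write means empty):

```cpp
#include <iostream>
#include <vector>

// A bounded queue over a circular buffer.
class CircularQueue {
    std::vector<int> buf;
    std::size_t read = 0, write = 0;  // read: next to dequeue; write: next free slot
public:
    explicit CircularQueue(std::size_t capacity) : buf(capacity + 1) {}
    bool empty() const { return read == write; }
    bool full()  const { return (write + 1) % buf.size() == read; }
    bool enqueue(int key) {                 // O(1)
        if (full()) return false;
        buf[write] = key;
        write = (write + 1) % buf.size();
        return true;
    }
    bool dequeue(int &key) {                // O(1)
        if (empty()) return false;
        key = buf[read];
        read = (read + 1) % buf.size();
        return true;
    }
};

int main() {
    CircularQueue q(3);
    q.enqueue(1); q.enqueue(2); q.enqueue(3);
    int k;
    while (q.dequeue(k)) std::cout << k << ' ';  // prints: 1 2 3
}
```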
## Trees

![](https://i.imgur.com/mNzV6M7.png)
![](https://i.imgur.com/PqScl8R.png)

Binary search tree:
* the key of each node is larger than every key in its left subtree and smaller than every key in its right subtree.

A tree is either:
* empty, or
* a node with:
  * a key, and
  * a list of child trees

In CS, trees normally grow down.

* Siblings: nodes that share the same parent
* Leaf: a node with no children
* Level: 1 + the number of edges between the root and the node
* Height: the length of the longest path from the node down to its farthest leaf
* Forest: a collection of trees
* A node contains:
  * key
  * children: a list of child nodes
  * (optional) parent
* A binary tree node contains:
  * key
  * left
  * right
  * (optional) parent

```
Height(tree)
  if tree = nil:
    return 0
  return 1 + Max(Height(tree.left), Height(tree.right))
```

## Tree Traversal

Walking a tree:
* we want to visit the nodes of the tree in a particular order.
* Depth-first: we completely traverse one subtree before exploring a sibling subtree.
* Breadth-first: we traverse all nodes at one level before progressing to the next level.

```
InOrderTraversal(tree)
  if tree = nil:
    return
  InOrderTraversal(tree.left)
  print(tree.key)
  InOrderTraversal(tree.right)
```

![](https://i.imgur.com/hHp9H3t.png)
![](https://i.imgur.com/Dwh7W9O.png)
![](https://i.imgur.com/oCw7Vxy.png)
![](https://i.imgur.com/lcOzGrL.png)
![](https://i.imgur.com/yqW6lnG.png)
![](https://i.imgur.com/cQmCMxK.png)
![](https://i.imgur.com/c7R9Cqe.png)

---

# Week 2

## Dynamic Arrays

* [Template class](https://www.geeksforgeeks.org/templates-cpp/)
* [free vs delete for new](https://stackoverflow.com/questions/4061514/c-object-created-with-new-destroyed-with-free-how-bad-is-this#:~:text=Yes%20it%20does%20matter.,where%20it%20has%20allocated%20memory.)
* [free and delete](https://www.geeksforgeeks.org/delete-and-free-in-cpp/)
* [GFG dynamic array](https://www.geeksforgeeks.org/how-do-dynamic-arrays-work/)
* [Pointer and array](https://www.programiz.com/cpp-programming/pointers-arrays)

Semi-solution: dynamically allocated arrays:

```
int *my_array = new int[size];
```

Problem: we might not know the maximum size when allocating the array.

"All problems in computer science can be solved by another level of indirection."

Solution: dynamic arrays (also known as resizable arrays).

Idea: store a pointer to a dynamically allocated array, and replace it with a newly allocated array as needed.

![](https://i.imgur.com/WGgGUL8.png)

We will store:
* arr: the dynamically allocated array
* capacity: the size of the dynamically allocated array
* size: the number of elements currently in the array

```
Get(i)
  if i < 0 or i ≥ size:
    ERROR: index out of range
  return arr[i]

PushBack(val)
  if size = capacity:
    allocate new_arr[2 × capacity]
    for i from 0 to size − 1:
      new_arr[i] ← arr[i]
    free arr
    arr ← new_arr; capacity ← 2 × capacity
  arr[size] ← val
  size ← size + 1

Remove(i)
  if i < 0 or i ≥ size:
    ERROR: index out of range
  for j from i to size − 2:
    arr[j] ← arr[j + 1]
  size ← size − 1
```

#### Runtimes
* Get(i): O(1)
* Set(i, val): O(1)
* PushBack(val): O(n) worst case
* Remove(i): O(n)
* Size(): O(1)
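The pseudocode above translates almost directly into C++ with `new[]`/`delete[]` (a hedged sketch of my own; names are illustrative, element type fixed to `int`, and copying is disabled for brevity):

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// A doubling dynamic array, mirroring the Get/PushBack pseudocode above.
class DynamicArray {
    int *arr;
    std::size_t size_ = 0, capacity_;
public:
    explicit DynamicArray(std::size_t cap = 1) : arr(new int[cap]), capacity_(cap) {}
    ~DynamicArray() { delete[] arr; }      // use delete[], never free, with new[]
    DynamicArray(const DynamicArray &) = delete;
    DynamicArray &operator=(const DynamicArray &) = delete;

    int get(std::size_t i) const {
        if (i >= size_) throw std::out_of_range("index out of range");
        return arr[i];
    }
    void pushBack(int val) {
        if (size_ == capacity_) {          // resize: copy into a 2x buffer
            int *new_arr = new int[2 * capacity_];
            std::copy(arr, arr + size_, new_arr);
            delete[] arr;
            arr = new_arr;
            capacity_ *= 2;
        }
        arr[size_++] = val;
    }
    std::size_t size() const { return size_; }
};
```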
## Amortized Analysis: Aggregate Method

* Sometimes looking at the individual worst case is too severe; we may instead want to know the total worst-case cost of a *sequence* of operations.

**Definition** Given a sequence of n operations, the amortized cost per operation is:

$$\text{Cost}(n \text{ operations})/n$$

![](https://i.imgur.com/JzfJJDF.png)

## Amortized Analysis: Banker's Method

* Charge extra for each cheap operation.
* Save the extra charge as tokens in your data structure (conceptually).
* Use the tokens to pay for expensive operations.

Dynamic array, n calls to PushBack: charge 3 for each insertion — 1 token pays the raw cost of the insertion. When a resize is needed, pay for moving each element with the token sitting on that element. Place one token on the newly inserted element, and one token on the element capacity/2 positions earlier.

## Amortized Analysis: Physicist's Method

* Define a potential function $\Phi$ which maps states of the data structure to integers:
  * $\Phi(h_0) = 0$
  * $\Phi(h_t) \ge 0$
* Amortized cost of operation t: $c_t + \Phi(h_t) - \Phi(h_{t-1})$
* Choose $\Phi$ so that:
  * if $c_t$ is small, the potential increases
  * if $c_t$ is large, the potential decreases by the same scale

![](https://i.imgur.com/6wAGzS1.png)

With a resize when adding element i, let $k = size_{i-1} = cap_{i-1}$. Then:

$$\Phi(h_{i-1}) = 2 \cdot size_{i-1} - cap_{i-1} = 2k - k = k$$

$$\Phi(h_i) = 2 \cdot size_i - cap_i = 2(k+1) - 2k = 2$$

Amortized cost of adding element i:

$$c_i + \Phi(h_i) - \Phi(h_{i-1}) = size_i + 2 - k = (k+1) + 2 - k = 3$$

### Summary

* We calculate the amortized cost of an operation in the context of a sequence of operations.
* Three ways to do the analysis:
  * Aggregate method (brute-force sum)
  * Banker's method (tokens)
  * Physicist's method (potential function $\Phi$)
* Nothing changes in the code: this is runtime analysis only.
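As a concrete instance of the aggregate method above (my own recap, not from the notes): for n PushBacks with capacity doubling, each call pays 1 to write its element, and the j-th resize copies $2^j$ elements, so

$$\frac{\text{Cost}(n \text{ PushBacks})}{n} = \frac{n + \sum_{j=1}^{\lfloor \log_2 n \rfloor} 2^j}{n} \le \frac{n + 2n}{n} = 3 = O(1).$$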
# Week 3

## Priority Queues

### Naive Implementations

#### Using an unsorted list/array
* Inserting an element is easy: just add it to the end.
* But finding the maximum element requires scanning the whole list/array, so the running time is O(n).

#### Using a sorted list/array
* Finding the maximum element takes O(1).
* For insertion, binary search finds the position in O(log n), but we then have to shift elements to make room, so insertion is O(n).
* With a sorted doubly linked list instead:
  * getting the max element: O(1)
  * splicing an element into a known position: O(1)
  * but binary search doesn't work on a linked list, so finding the position takes O(n) again.

## Heap

### Binary Heaps

A binary max-heap is a binary tree (each node has zero, one, or two children) where the value of each node is at least the value of its children. Equivalently: for each edge of the tree, the value of the parent is at least the value of the child.

#### Basic Operations
* GetMax: the maximum value is stored at the root.
* Insert: attach the new element as a leaf, then sift it up to its proper position. Running time: O(tree height).
* ExtractMax: replace the root with a leaf, then sift it down. Running time: O(tree height).
* ChangePriority: sift the changed element up or down. Running time: O(tree height).
* Remove: change the node's priority to ∞ so it sifts up to the root, then ExtractMax, restoring the heap order. Running time: O(tree height).

#### Complete Binary Trees
* A binary tree is complete if all its levels are filled, except possibly the last one, which is filled from left to right.
* Advantage: its height is at most O(log n) for n nodes.

![](https://i.imgur.com/ufsjra4.png)

* Second advantage: we can store it as an array.

![](https://i.imgur.com/xULveEs.png)

* Cost of keeping the tree complete:
  * Insert(): insert the new element in the leftmost vacant position in the last level, then sift it up.
  * ExtractMax(): extract the value at the root, move the last element to the root position, then sift it down.

Pseudocode, where maxSize is the maximum number of elements in the heap, size is the current size, and H[1 . . . maxSize] is an array in which the heap occupies the first size elements:

```
Parent(i)     return ⌊i/2⌋
LeftChild(i)  return 2i
RightChild(i) return 2i + 1

SiftUp(i)
  while i > 1 and H[Parent(i)] < H[i]:
    swap H[Parent(i)] and H[i]
    i ← Parent(i)

SiftDown(i)
  maxIndex ← i
  ℓ ← LeftChild(i)
  if ℓ ≤ size and H[ℓ] > H[maxIndex]:
    maxIndex ← ℓ
  r ← RightChild(i)
  if r ≤ size and H[r] > H[maxIndex]:
    maxIndex ← r
  if i ≠ maxIndex:
    swap H[i] and H[maxIndex]
    SiftDown(maxIndex)

Insert(p)
  if size = maxSize:
    return ERROR
  size ← size + 1
  H[size] ← p
  SiftUp(size)

ExtractMax()
  result ← H[1]
  H[1] ← H[size]
  size ← size − 1
  SiftDown(1)
  return result

Remove(i)
  H[i] ← ∞
  SiftUp(i)
  ExtractMax()

ChangePriority(i, p)
  oldp ← H[i]
  H[i] ← p
  if p > oldp:
    SiftUp(i)
  else:
    SiftDown(i)
```

## Heap Sort

```
HeapSort(A[1 . . . n])
  create an empty priority queue
  for i from 1 to n:
    Insert(A[i])
  for i from n downto 1:
    A[i] ← ExtractMax()
```

* The resulting algorithm is comparison-based and runs in O(n log n) — hence asymptotically optimal!
* It is a natural generalization of selection sort: instead of simply scanning the rest of the array to find the maximum value, use a smart data structure.
* Not in-place: it uses additional space to store the priority queue.

Turning an array into a heap:

```
BuildHeap(A[1 . . . n])
  size ← n
  for i from ⌊n/2⌋ downto 1:
    SiftDown(i)
```

* We repair the heap property going from the bottom to the top.
* Initially the heap property is already satisfied in all the leaves (i.e., all subtrees of depth 0).
* We then start repairing the heap property in all subtrees of depth 1.
* When we reach the root, the heap property is satisfied in the whole tree.
* [Heap sort visualization](https://www.cs.usfca.edu/~galles/visualization/HeapSort.html)
* Running time: O(n log n)

In-place version:

```
HeapSort(A[1 . . . n])
  BuildHeap(A)    // size = n
  repeat (n − 1) times:
    swap A[1] and A[size]
    size ← size − 1
    SiftDown(1)
```

* Needs no additional array.
* Worst-case running time: O(n log n).

## Building a Heap
* The O(n log n) bound on BuildHeap comes from calling SiftDown on O(n) nodes, but it is not tight:
* if a node is already close to the leaves, sifting it down is fast,
* and we have many such nodes, so BuildHeap actually runs in O(n).

![](https://i.imgur.com/noTlO7g.png)

Partial sorting:
* Input: an array A[1 . . . n] and an integer 1 ≤ k ≤ n.
* Output: the last k elements of a sorted version of A.
* Can be solved in O(n) if k = O(n/log n)!

```
PartialSorting(A[1 . . . n], k)
  BuildHeap(A)
  for i from 1 to k:
    ExtractMax()
```

Running time: O(n + k log n)

Heap sort is a time- and space-efficient comparison-based algorithm: it runs in O(n log n) and uses no additional space.
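A compact C++ version of the in-place algorithm (my sketch, using the conventional 0-indexed layout where the children of i are 2i+1 and 2i+2):

```cpp
#include <cstddef>
#include <utility>

// Restore the max-heap property below index i in a[0..size-1].
void siftDown(int a[], std::size_t size, std::size_t i) {
    while (true) {
        std::size_t maxIndex = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < size && a[l] > a[maxIndex]) maxIndex = l;
        if (r < size && a[r] > a[maxIndex]) maxIndex = r;
        if (maxIndex == i) return;
        std::swap(a[i], a[maxIndex]);
        i = maxIndex;
    }
}

void heapSort(int a[], std::size_t n) {
    for (std::size_t i = n / 2; i-- > 0; )   // BuildHeap: O(n)
        siftDown(a, n, i);
    for (std::size_t size = n; size > 1; ) { // repeat n-1 times
        std::swap(a[0], a[size - 1]);        // move the max to its final slot
        --size;
        siftDown(a, size, 0);
    }
}
```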
## Disjoint Sets

**Definition** A disjoint-set data structure supports the following operations:
* MakeSet(x) creates a singleton set {x}
* Find(x) returns the ID of the set containing x:
  * if x and y lie in the same set, then Find(x) = Find(y)
  * otherwise, Find(x) ≠ Find(y)
* Union(x, y) merges the two sets containing x and y

Example application — checking reachability in a maze:

```
Preprocess(maze)
  for each cell c in maze:
    MakeSet(c)
  for each cell c in maze:
    for each neighbor n of c:
      Union(c, n)

IsReachable(A, B)
  return Find(A) = Find(B)
```

### Naive Implementation: Using the Smallest Element as ID
* Use the smallest element of a set as its ID.
* Use an array smallest[1 . . . n]: smallest[i] stores the smallest element in the set that i belongs to.

![](https://i.imgur.com/PlxpQCc.png)

```
MakeSet(i)
  smallest[i] ← i

Find(i)
  return smallest[i]

Union(i, j)
  i_id ← Find(i)
  j_id ← Find(j)
  if i_id = j_id:
    return
  m ← min(i_id, j_id)
  for k from 1 to n:
    if smallest[k] in {i_id, j_id}:
      smallest[k] ← m
```

* Current bottleneck: Union.
* Which basic data structure allows efficient merging? → the linked list.
* Idea: represent a set as a linked list and use the list's tail as the ID of the set.

## Trees for Disjoint Sets

![](https://i.imgur.com/ZF7OqX3.png)

```
MakeSet(i)
  parent[i] ← i
  // Running time: O(1)

Find(i)
  while i ≠ parent[i]:
    i ← parent[i]
  return i
  // Running time: O(tree height)
```

![](https://i.imgur.com/QeuEwGX.png)

* We hang the shorter tree under the root of the taller one, because we want to keep trees shallow.
* To find the height of a tree quickly, we keep the height of each subtree in an array rank[1 . . . n]: rank[i] is the height of the subtree whose root is i.
* Hanging a shorter tree under a taller one is called the union by rank heuristic.

With ranks added:

```
MakeSet(i)
  parent[i] ← i
  rank[i] ← 0

Find(i)            // Find doesn't need to change
  while i ≠ parent[i]:
    i ← parent[i]
  return i
```

![](https://i.imgur.com/bsZH800.png)
![](https://i.imgur.com/tA0w7PD.png)
![](https://i.imgur.com/OkMPe17.png)
![](https://i.imgur.com/xJ5ZVmh.png)

The union by rank heuristic guarantees that Union and Find run in O(log n).

### Path Compression

![](https://i.imgur.com/8ZyDN6f.png)

```
Find(i)
  if i ≠ parent[i]:
    parent[i] ← Find(parent[i])
  return parent[i]
```

#### Iterated Logarithm (Definition)
* The iterated logarithm of n, log\* n, is the number of times the logarithm function needs to be applied to n before the result is less than or equal to 1.

![](https://i.imgur.com/NyPW5Pg.png)

* For any practical value of n, log\* n is at most 5.
* Lemma: assume the data structure is initially empty and we make a sequence of m operations including n calls to MakeSet. Then the total running time is O(m log\* n).
* In other words: the amortized time of a single operation is O(log\* n).

##### Analysis: [video](https://www.coursera.org/learn/data-structures/lecture/GQQLN/analysis-optional)
* Goal: prove that when both the union by rank heuristic and the path compression heuristic are used, the average running time of each operation is nearly constant.
* With path compression, rank[i] is no longer equal to the height of the subtree rooted at i.
* Still, the height of the subtree rooted at i is at most rank[i].
* And it is still true that a root node of rank k has at least $2^k$ nodes in its subtree: a root node is not affected by path compression.

![](https://i.imgur.com/tNtXsqz.png)

* Important properties:
  * there are at most $n/2^k$ nodes of rank k
  * for any node i, rank[i] < rank[parent[i]]
  * once an internal node, always an internal node

![](https://i.imgur.com/G6Ie0zD.png)
![](https://i.imgur.com/eIKdQHs.png)
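Putting union by rank and path compression together, a C++ sketch (names and layout are my own):

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Disjoint sets with union by rank and path compression, as described above.
struct DisjointSets {
    std::vector<int> parent, rank_;
    explicit DisjointSets(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // MakeSet for every i
    }
    int find(int i) {                  // path compression: reattach i to the root
        if (i != parent[i])
            parent[i] = find(parent[i]);
        return parent[i];
    }
    void unite(int i, int j) {         // union by rank: hang shorter under taller
        int a = find(i), b = find(j);
        if (a == b) return;
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
    }
};
```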
# Week 4

## Direct Addressing

![](https://i.imgur.com/K2YH4cQ.png)

* How to implement the data structure for the counter C:
  * create an integer array A of size $2^{32}$,
  * use A[int(IP)] in place of C[IP].

![](https://i.imgur.com/00LJ6Of.png)

* Concatenating all 4 bytes of an IP gives 32 bits, which we then interpret as a decimal number.

![](https://i.imgur.com/K16v9cj.png)

```
UpdateAccessList(log, i, j, C)
  while log[i].time ≤ Now():
    C[log[i].IP] ← C[log[i].IP] + 1
    i ← i + 1
  while log[j].time ≤ Now() − 3600:
    C[log[j].IP] ← C[log[j].IP] − 1
    j ← j + 1

AccessedLastHour(IP, C)
  return C[IP] > 0

int(IP)
  return IP[1]·2^24 + IP[2]·2^16 + IP[3]·2^8 + IP[4]

UpdateAccessList(log, i, j, A)
  while log[i].time ≤ Now():
    A[int(log[i].IP)] ← A[int(log[i].IP)] + 1
    i ← i + 1
  while log[j].time ≤ Now() − 3600:
    A[int(log[j].IP)] ← A[int(log[j].IP)] − 1
    j ← j + 1

AccessedLastHour(IP)
  return A[int(IP)] > 0
```

## List-Based Mapping
* Direct addressing requires too much memory,
* so let's store the IP addresses in a list instead.

```
UpdateAccessList(log, i, L)
  while log[i].time ≤ Now():
    log_line ← L.FindByIP(log[i].IP)
    if log_line ≠ NULL:
      L.Erase(log_line)
    L.Append(log[i])
    i ← i + 1
  while L.Top().time ≤ Now() − 3600:
    L.Pop()

AccessedLastHour(IP, L)
  return L.FindByIP(IP) ≠ NULL

AccessCountLastHour(IP, L)
  return L.CountIP(IP)
```

## Hash Functions
* We need to encode IPs.
* Direct addressing was fast but used a lot of memory.
* **Definition** For any set of objects S and any integer m > 0, a function h : S → {0, 1, . . . , m − 1} is called a hash function.
* **Definition** m is called the cardinality of the hash function h.
* h should ideally give different values for different objects.

Collisions: **Definition** When h(o₁) = h(o₂) and o₁ ≠ o₂, this is a collision.

## Chaining

**Definition** A map from S to V is a data structure with methods HasKey(O), Get(O), Set(O, v), where O ∈ S, v ∈ V.

Example:

![](https://i.imgur.com/YA8Bkj0.png)

```
HasKey(O)
  L ← A[h(O)]
  for (O′, v′) in L:
    if O′ = O:
      return true
  return false

Get(O)
  L ← A[h(O)]
  for (O′, v′) in L:
    if O′ = O:
      return v′
  return n/a

Set(O, v)
  L ← A[h(O)]
  for p in L:
    if p.O = O:
      p.v ← v
      return
  L.Append(O, v)
```

## Hash Tables

**Definition** A set is a data structure with methods Add(O), Remove(O), Find(O).

Here h : S → {0, 1, . . . , m − 1}, with O, O′ ∈ S, and A is an array of m lists (chains).

```
Find(O)
  L ← A[h(O)]
  for O′ in L:
    if O′ = O:
      return true
  return false

Add(O)
  L ← A[h(O)]
  for O′ in L:
    if O′ = O:
      return
  L.Append(O)

Remove(O)
  if not Find(O):
    return
  L ← A[h(O)]
  L.Erase(O)
```
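A C++ sketch of the chaining scheme above for a set of strings (illustrative only; `std::hash` stands in for a hash function chosen from a universal family):

```cpp
#include <functional>
#include <list>
#include <string>
#include <vector>

// A set with chaining, mirroring Find/Add/Remove above.
class ChainedSet {
    std::vector<std::list<std::string>> chains;
    std::size_t bucket(const std::string &o) const {
        return std::hash<std::string>{}(o) % chains.size();
    }
public:
    explicit ChainedSet(std::size_t m) : chains(m) {}   // m = cardinality
    bool find(const std::string &o) const {
        for (const auto &e : chains[bucket(o)])   // scan the chain for o
            if (e == o) return true;
        return false;
    }
    void add(const std::string &o) {
        if (!find(o)) chains[bucket(o)].push_back(o);
    }
    void remove(const std::string &o) {
        chains[bucket(o)].remove(o);   // erases o from its chain if present
    }
};
```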
## Hash Functions: the Phone Book Problem

* Phone book problem — the data structure should be able to do the following quickly: add and delete contacts, look up a phone number by name, and determine who is calling given their phone number.
* Direct addressing:
  * int(123-45-67) = 1234567
  * create an array Name of size $10^L$, where L is the maximum allowed phone number length;
  * store the name corresponding to phone number P in Name[int(P)];
  * if there is no contact with phone number P, Name[int(P)] = N/A.
* Chaining:

![](https://i.imgur.com/uJpP2vZ.png)

  Parameters:
  * n — phone numbers stored
  * m — cardinality of the hash function
  * c — length of the longest chain
  * O(n + m) memory is used
  * $\alpha = n/m$ is called the load factor
  * operations run in time O(c + 1)
  * you want both m and c small

* A hash function must be deterministic, so we cannot use random numbers directly.
* A good hash function is deterministic, fast to compute, distributes keys well across the cells, and causes few collisions.
* There is no single universal hash function:

  **Lemma** If the number of possible keys is big (|U| ≫ m), then for any fixed hash function h there is a bad input resulting in many collisions.

![](https://i.imgur.com/pe6FoHX.png)

* Universal family: instead, we define a *family* of hash functions and choose a random function from that family.

![](https://i.imgur.com/59r5CWm.png)

**How the randomization works**
* h(x) = random({0, 1, 2, . . . , m − 1}) would give collision probability exactly 1/m,
* but it is not deterministic, so we can't use it.
* Instead, all hash functions in H are deterministic;
* we select a random function h from H,
* and that fixed h is used throughout the algorithm.

**Running Time**
* Lemma: if h is chosen randomly from a universal family, the average length of the longest chain c is $O(1 + \alpha)$, where $\alpha = n/m$ is the load factor of the hash table.
* Corollary: if h is from a universal family, operations on the hash table run in average time $O(1 + \alpha)$.

**Choosing the Hash Table Size**

![](https://i.imgur.com/iwelydm.png)

**Dynamic Hash Tables**
* If we don't know the number of keys n in advance, we can use the idea of dynamic arrays: when $\alpha$ becomes too large, resize the hash table, choose a new hash function, and rehash all the objects. For example, to keep the load factor below 0.9:

```
Rehash(T)
  loadFactor ← T.numberOfKeys/T.size
  if loadFactor > 0.9:
    Create Tnew of size 2 × T.size
    Choose hnew with cardinality Tnew.size
    For each object O in T:
      Insert O in Tnew using hnew
    T ← Tnew, h ← hnew
```

Similarly to dynamic arrays, a single rehashing takes O(n) time, but the amortized running time of each operation is still O(1) on average, because rehashing is rare.

## Hashing Integers

Take phone numbers up to length 7, for example 148-25-67:
* convert phone numbers to integers from 0 to $10^7 - 1$ = 9 999 999: 148-25-67 → 1 482 567
* choose a prime number bigger than $10^7$, e.g. p = 10 000 019
* choose the hash table size, e.g. m = 1 000

**Lemma**

![](https://i.imgur.com/dQp3mmZ.png)

**Example**

![](https://i.imgur.com/7RH11Tv.png)

General case:
* define the maximum length L of a phone number
* convert phone numbers to integers from 0 to $10^L - 1$
* choose a prime number $p > 10^L$
* choose the hash table size m
* choose a random hash function from the universal family $\mathcal{H}_p$ (choose random a ∈ [1, p − 1] and b ∈ [0, p − 1])

## Hashing Strings
* Given a string S, compute its hash value.
* S = S[0]S[1] . . . S[|S| − 1], where the S[i] are individual characters.
* We should use all the characters in the hash function,
* otherwise there will be many collisions:
* for example, if S[0] is not used, then h("aa") = h("ba") = · · · = h("za").

**Polynomial Hashing**

![](https://i.imgur.com/9G004zE.png)

```
PolyHash(S, p, x)
  hash ← 0
  for i from |S| − 1 downto 0:
    hash ← (hash · x + S[i]) mod p
  return hash
```

## Finding a Substring in Text: the Rabin–Karp Algorithm

The naive approach compares the pattern against every position of the text:

```
AreEqual(S1, S2)
  if |S1| ≠ |S2|:
    return False
  for i from 0 to |S1| − 1:
    if S1[i] ≠ S2[i]:
      return False
  return True

FindSubstring(T, P)
  positions ← empty list
  for i from 0 to |T| − |P|:
    if AreEqual(T[i..i + |P| − 1], P):
      positions.Append(i)
  return positions
```

Rabin–Karp speeds this up by comparing hashes first: compute the polynomial hash of P once and of every window of T, and only call AreEqual when the hashes match.
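A hedged C++ sketch of Rabin–Karp using the polynomial hash above, with the window hashes precomputed by a rolling update; the particular prime p and multiplier x are illustrative choices, not fixed by the course:

```cpp
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::size_t> rabinKarp(const std::string &T, const std::string &P) {
    std::vector<std::size_t> positions;
    if (P.empty() || P.size() > T.size()) return positions;
    const long long p = 1000000007LL, x = 263;

    // PolyHash from above: H(S) = sum of S[i] * x^i mod p
    auto polyHash = [&](const char *s, std::size_t len) {
        long long h = 0;
        for (std::size_t i = len; i-- > 0; )
            h = (h * x + static_cast<unsigned char>(s[i])) % p;
        return h;
    };
    long long pHash = polyHash(P.data(), P.size());

    // y = x^|P| mod p, needed to drop the trailing character when rolling
    long long y = 1;
    for (std::size_t i = 0; i < P.size(); ++i) y = (y * x) % p;

    // H[i] = hash of T[i..i+|P|-1], computed right to left via
    // H[i] = x*H[i+1] + T[i] - y*T[i+|P|]  (mod p)
    std::size_t windows = T.size() - P.size() + 1;
    std::vector<long long> H(windows);
    H[windows - 1] = polyHash(T.data() + windows - 1, P.size());
    for (std::size_t i = windows - 1; i-- > 0; )
        H[i] = ((H[i + 1] * x + static_cast<unsigned char>(T[i])
                 - y * static_cast<unsigned char>(T[i + P.size()])) % p + p) % p;

    for (std::size_t i = 0; i < windows; ++i)
        if (H[i] == pHash && T.compare(i, P.size(), P) == 0)  // verify on hash match
            positions.push_back(i);
    return positions;
}
```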
# Week 5

## Binary Search Trees

* Local search: a data structure that stores a number of elements, each with a key coming from an ordered set. It supports these operations:
  * RangeSearch(x, y): returns all elements with keys between x and y.
  * NearestNeighbors(z): returns the elements with keys on either side of z.

Parts of a tree:

![](https://i.imgur.com/RyhfhHp.png)

Search tree property: X's key is larger than the key of any descendant of its left child, and smaller than the key of any descendant of its right child.

![](https://i.imgur.com/3azcs8z.png)
![](https://i.imgur.com/ZUcWa8B.png)
![](https://i.imgur.com/bXYERur.png)
![](https://i.imgur.com/uvT0jOO.png)
![](https://i.imgur.com/X72iJjE.png)

* In the presented code, if N is the largest element in the tree, you get an error when finding the RightAncestor after trying to find the grandparent of the root node. To fix this, modify RightAncestor to return null when its input is null.

* Range search — Input: numbers x, y and the root R. Output: a list of nodes with keys between x and y.

![](https://i.imgur.com/pyphkSP.png)

* Insert:

![](https://i.imgur.com/mIVg0g9.png)

* Delete:

![](https://i.imgur.com/KB3B7OO.png)

* Runtime:
  * Balance: we want the left and right subtrees to have approximately the same size.
  * Rebalancing the tree:

![](https://i.imgur.com/Ey6dlc6.png)

##### AVL Trees

* Introduction: add a height field to the nodes. For balance we want the sizes of the subtrees to be roughly the same; we achieve that by forcing their heights to be roughly the same.

![](https://i.imgur.com/uPrcX8a.png)

We need to show that this forces height = O(log n); equivalently, that large height implies many nodes.

![](https://i.imgur.com/eIWgMm8.png)

If we can maintain the AVL property, we can perform operations in O(log n) time.

Implementation — insertion that does rebalancing:

![](https://i.imgur.com/RK1abak.png)
![](https://i.imgur.com/g1cXFIN.png)
![](https://i.imgur.com/4zOO4hL.png)

A bad case where rotate-right doesn't work:

![](https://i.imgur.com/aNZszKt.png)
![](https://i.imgur.com/gH5TSn6.png)
![](https://i.imgur.com/O146IYU.png)

Deletions:

![](https://i.imgur.com/QTqvVzo.png)

Merge (takes O(n) time if done naively):
* Input: roots R1 and R2 of trees, with all keys in R1's tree smaller than those in R2's.
* Output: the root of a new tree with all the elements of both trees.
* Easy if we have an extra node to add as the root:

![](https://i.imgur.com/yNQ4eCc.png)
![](https://i.imgur.com/bCTg1s5.png)

If the extra node for the merge is not given, we take the largest element of the left tree and turn it into the root of the new tree:

![](https://i.imgur.com/QAOikGM.png)

This merge may not preserve the balance properties. To fix that, we go down the taller tree until we can merge with a subtree of the same height:

![](https://i.imgur.com/XxYDbAb.png)

* Implementation:

![](https://i.imgur.com/lVhwwtx.png)
![](https://i.imgur.com/6jc5UIX.png)

* Analysis:

![](https://i.imgur.com/DJBcSqq.png)

Split:

![](https://i.imgur.com/TiwDnq0.png)

* Using AVLMergeWithRoot maintains balance:

![](https://i.imgur.com/dfzqcDO.png)

Applications:
* Order statistics — Input: the root of a tree T and a number k. Output: the k-th smallest element in T. We add a new field to each node: N.size.

![](https://i.imgur.com/IH6XdUa.png)
![](https://i.imgur.com/VWs0jQa.png)
![](https://i.imgur.com/DgyCmrD.png)

Runtime: O(h)
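A sketch of the order-statistics query in C++, assuming each node carries the size field described above (the node layout is my own, not from the course):

```cpp
#include <cstddef>

// A BST node augmented with a subtree-size field.
struct Node {
    int key;
    std::size_t size = 1;        // number of nodes in this subtree
    Node *left = nullptr, *right = nullptr;
};

std::size_t sizeOf(const Node *n) { return n ? n->size : 0; }

// k-th smallest key, 1-indexed; assumes 1 <= k <= sizeOf(root).
const Node *orderStatistic(const Node *root, std::size_t k) {
    std::size_t leftSize = sizeOf(root->left);
    if (k == leftSize + 1) return root;               // root is the k-th element
    if (k <= leftSize)     return orderStatistic(root->left, k);
    return orderStatistic(root->right, k - leftSize - 1);  // skip left subtree + root
}
```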
* Colour flips:

![](https://i.imgur.com/t6OjUcv.png)

* Operations:

![](https://i.imgur.com/aN9ZBSz.png)

Create:

![](https://i.imgur.com/ig78ujr.png)

Find:

![](https://i.imgur.com/qKfGQR6.png)

Flip:

![](https://i.imgur.com/eZlRBz4.png)

##### Splay Trees

* Introduction: we want commonly searched nodes near the root, so after each search we bring the searched node up to the root.

![](https://i.imgur.com/3trGqm1.png)

Modifications:

Zig-Zig:

![](https://i.imgur.com/5nNXrCp.png)

Zig-Zag:

![](https://i.imgur.com/GtQq55P.png)

Zig:

![](https://i.imgur.com/44C4cIL.png)

Splay operation:

![](https://i.imgur.com/mRjuM1v.png)

Implementation — amortized analysis:

![](https://i.imgur.com/8nF60CA.png)

Find:

![](https://i.imgur.com/gnFjvT7.png)

* Important point: even if we fail to find k, we still have to splay the closest node, otherwise the operation does O(D) work with nothing to amortize it against.

Insert:

![](https://i.imgur.com/WYmEvpJ.png)
![](https://i.imgur.com/iiivJVM.png)

Split:

![](https://i.imgur.com/uUE34nM.png)

CutLeft:

![](https://i.imgur.com/aFLvcaP.png)

Merge:

![](https://i.imgur.com/OkdEQMJ.png)

Splay trees perform all of these operations in O(log n) amortized time.

**Analysis of splay trees:**

Rank:

![](https://i.imgur.com/6Db2G8N.png)

Zig analysis:

![](https://i.imgur.com/Fe2hbuC.png)

Zig-Zig analysis:

![](https://i.imgur.com/MYpbnzL.png)
![](https://i.imgur.com/3AFgP2C.png)

Zig-Zag analysis:

![](https://i.imgur.com/wPBiglB.png)

Total change:

![](https://i.imgur.com/mRozSq1.png)

Weighted nodes:

![](https://i.imgur.com/38lZX7V.png)

Dynamic finger:

![](https://i.imgur.com/JmUWHYF.png)

Working set bound:

![](https://i.imgur.com/2QLYD8T.png)
![](https://i.imgur.com/WaMf8Z0.png)

# **Graph Theory**

* [MIT 6.006 Introduction to Algorithms (lecture videos)](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/)

## BFS

* [BFS video](https://www.youtube.com/watch?v=oDqjPvD54Ss)

![](https://i.imgur.com/cTKt87K.png)

## Coding Notes

* Subtracting 1 from an unsigned int holding 0 (e.g. `vector.size() - 1` on an empty vector) wraps around to the largest representable value, one less than a power of two. [link](https://stackoverflow.com/questions/48646518/why-vector-size-1-gives-garbage-value)
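A tiny demonstration of the wraparound (assumes a 64-bit `size_t`):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;                  // empty: v.size() == 0
    // size() returns an unsigned type, so 0 - 1 wraps around to the
    // largest representable value instead of -1.
    std::cout << v.size() - 1 << '\n';   // huge number, e.g. 18446744073709551615
    // Safer: cast before subtracting when the result can go negative.
    for (int i = 0; i < static_cast<int>(v.size()) - 1; ++i) { /* ... */ }
}
```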