
CS2040C Data Structures and Algorithms

Notes.

Some algorithms are in pseudocode/Java; some are in C++.

Useful Website:

Introduction

Table of Contents (Link)
Table of Contents (Tree)
CS2040C Data Structures and Algorithms
├── Searching/Sorting
│   └── Binary Search, Merge Sort, Quick Sort, Heap Sort*
├── Linked List
├── BST
├── Hash Table
│   ├── Hash Collision
│   │   ├── Chaining
│   │   └── Open Addressing
│   │       ├── Linear Probing
│   │       ├── Quadratic Probing
│   │       └── second hash function (cuckoo hashing)
│   ├── Hashing Techniques
│   │   ├── Division Method
│   │   └── Multiplication Method
│   └── Hash Table Size
├── Priority Queue
│   ├── Priority Queue Operations
│   ├── Store Heap in Array
│   └── Heap Sort
├── Disjoint Set
│   ├── Quick Find
│   ├── Quick Union
│   ├── Weighted Union
│   ├── Weighted Union with Path Compression
│   └── Union Find Algorithm Summary
├── Graph
│   ├── Graph Representations
│   ├── Traversal Algorithms
│   │   ├── DFS
│   │   ├── BFS
│   │   └── Analysis
│   ├── Shortest Path Algorithms
│   │   ├── Dijkstra's algorithm
│   │   ├── Bellman-Ford algorithm
│   │   └── Analysis
│   ├── Minimum Spanning Tree Algorithms
│   │   ├── Prim's algorithm
│   │   ├── Kruskal's algorithm
│   │   ├── Boruvka’s algorithm
│   │   └── Analysis
│   └── Topological Sort
└── Binary Space Partitioning

Data structures serve as the basis for abstract data types (ADTs): the ADT lives at the logical level, while the data structure lives at the implementation level.

| ADT | Data Structure |
| --- | --- |
| Stack, Queue | Linked List, Array |
| Tree | Linked List, Array (too slow for AVL) |
| Binary Search Tree, Balanced Tree | Tree |
| Priority Queue | Binary Heap (using array) |
| Disjoint Set | |
| Graph | Graph* |
| | Hash Table |

Put simply, whether something counts as a data structure or an ADT really depends on how we look at it. stackoverflow

  • An ADT can be both an ADT and a data structure?
  • TBH I am still not sure which one is which

Searching & Sorting

int arr[100] = {...};

int binarySearch(const int arr[], int size, int target) {
    int start = 0; 
    int end = size;
    int mid = 0;
    while (start < end) {
        mid = start + (end - start) / 2; // avoid overflow for large indices
        if (arr[mid] == target) {
            return mid;
        }
        if (arr[mid] > target) {
            end = mid; // end is exclusive, so keep searching [start, mid)
        } else {
            start = mid + 1;
        }
    }
    return -1;
}

Merge Sort

// mergeSort(arr, 0, 99) <- end is inclusive in this version, so 99, not 100!
int arr[100] = {...};

void merge(int arr[], int start, int mid, int end) {
    int leftSize = mid - start + 1;
    int rightSize = end - mid;
    int currIndexLeft = 0;
    int currIndexRight = 0;
    int currIndexMerged = start;

    // temporary array
    int* leftArray = new int[leftSize];
    int* rightArray = new int[rightSize];

    for (int i = 0; i < leftSize; i += 1) {
        leftArray[i] = arr[start + i];
    }
    for (int i = 0; i < rightSize; i += 1) {
        rightArray[i] = arr[mid + 1 + i];
    }
    
    // merge
    while (currIndexLeft < leftSize && currIndexRight < rightSize) {
        if (leftArray[currIndexLeft] <= rightArray[currIndexRight]) {
            arr[currIndexMerged] = leftArray[currIndexLeft];
            currIndexLeft += 1;
        } else {
            arr[currIndexMerged] = rightArray[currIndexRight];
            currIndexRight += 1;
        }
        currIndexMerged += 1;
    }

    // remaining left
    while (currIndexLeft < leftSize) {
        arr[currIndexMerged] = leftArray[currIndexLeft];
        currIndexLeft += 1;
        currIndexMerged += 1;
    }

    // remaining right
    while (currIndexRight < rightSize) {
        arr[currIndexMerged] = rightArray[currIndexRight];
        currIndexRight += 1;
        currIndexMerged += 1;
    }

    delete[] leftArray;
    delete[] rightArray;
}

void mergeSort(int arr[], int start, int end) {
    if (start < end) {
        int mid = start + (end - start) / 2; // prevent overflow for big numbers
        mergeSort(arr, start, mid);
        mergeSort(arr, mid + 1, end);
        merge(arr, start, mid, end);
    }
}

Quick Sort

// quickSort(arr, 0, 100) <- end is exclusive here, so 100, not 99!
int arr[100] = {...};

int partition(int arr[], int start, int end) { // A = [1...n], A[end - 1] = n
    int pivotIndex = start;
    int pivot = arr[start]; // select first element as pivot
    //int pivot = arr[end-1]; // select last element as pivot
    
    start += 1; // since A[start] is pivot
    while (start < end) {
        while ((start < end) && (arr[start] <= pivot)) { // bounds check first, to avoid reading arr[end]
            start++;
        }
        while ((start < end) && (arr[end - 1] > pivot)) {
            end--;
        }
        if (start < end) {
            // swap A[start], A[end-1];
            int temp = arr[start];
            arr[start] = arr[end - 1];
            arr[end - 1] = temp;
        }
    }
    // swap A[pivotIndex], A[start - 1];
    arr[pivotIndex] = arr[start - 1];
    arr[start - 1] = pivot;
    return (start - 1);
}

void quickSort(int arr[], int start, int end) { // A = [1...n], A[end - 1] = n
    if (start < end) {
        int p = partition(arr, start, end); // p is "sorted"
        quickSort(arr, start, p);
        quickSort(arr, p + 1, end);
    }
}

Linked List

TODO

Linked List Tricks:

Middle of Linked List:

public Node middleNode(Node head) {
    Node fast = head;
    Node slow = head;

    // The tricky part is the loop condition.
    // For a list of even length, the fast pointer will reach the end, i.e. null.
    // For odd length, the fast pointer will stop at the tail.
    // So terminate the loop once fast == null or fast.next == null.

    while (fast != null && fast.next != null) {
        fast = fast.next.next; // move fast by 2 steps
        slow = slow.next;      // move slow by 1 step
    }
    return slow;
}

Nth node from the end of the Linked List:

public Node getNodeFromEnd(Node head, int n) {
    Node fast = head;
    Node slow = head;

    // move fast ahead by n to create a gap of n nodes
    for (int i = 0; i < n; i++) {
        fast = fast.next;
    }

    // when fast runs off the end, slow is the nth node from the end
    while (fast != null) {
        fast = fast.next;
        slow = slow.next;
    }
    return slow;
}
// NOTE: if n is more than the length of the list, handle that case according to
// the problem given

Detect Cycle

// code from my leetcode solution
/**
 * Definition for singly-linked list.
 * struct ListNode {
 *     int val;
 *     ListNode *next;
 *     ListNode(int x) : val(x), next(NULL) {}
 * };
 */
class Solution {
public:
    bool hasCycle(ListNode *head) {
        // Not so good solution but worth trying: hash table / store node visited
        if (head) {
            ListNode* slow = head;
            ListNode* fast = head->next;
            while (fast) {
                if (fast == slow) {
                    return true;
                }
                fast = fast->next;
                if (fast) {
                    fast = fast->next;
                }
                slow = slow->next;
            }
        }
        return false;
    }
};

Binary Search Tree (BST)

Tree: graph that is circuit free and connected

Binary Tree: a rooted tree in which every parent has at most 2 children, which is referred to as the left child and the right child

Binary Search Tree (BST): a rooted binary tree with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree

Height-balanced BST: a BST in which the height of the left and the right subtree of any node differ by not more than 1*

  • *different height-balanced BSTs will have slightly different definitions
  • e.g. AVL tree and Red-black tree

TODO

Hash Table

Motivation:

When the ordering of the data is not important, a hash table is a better choice as it provides constant-time* search, insert and delete operations.

*performance degrades to O(n) as the load factor increases

Hash Collisions

Definition: Two distinct keys $k_1$ and $k_2$ collide $\iff h(k_1) = h(k_2)$.

It is impossible to have a hash function with no collisions, by the pigeonhole principle (there are more possible keys than buckets).

Resolving collision:

  1. chaining (put both items in the same bucket)
  2. open addressing (find another bucket for the new item)
    • linear probing
    • quadratic probing
    • second hash function (cuckoo hashing)

Hashing with Chaining

Idea: Each bucket contains a linked list of items

Space complexity: $O(m + n)$, where $m$ = table size and $n$ = total number of items across all the linked lists

Operations:

// Insertion
1. insert(key, value)
2. Calculate h(key)
3. Lookup h(key) and add (key,value) to the linked list
// Searching / Deletion
1. Calculate h(key)
2. Search for (key,value) in the linked list

Analysis:

Optimistic

  • Consider the Simple Uniform Hashing Assumption (SUHA), where every key is equally likely to map to every bucket. Then, as long as there are enough buckets, we won’t get too many keys in any one bucket.
  • Under SUHA with $n$ items and $m$ buckets, define $\text{load} = \frac{n}{m}$ = average number of items per bucket.
  • Then by SUHA, expected search time $= \underbrace{1}_{\text{hash function + array access}} + \underbrace{\frac{n}{m}}_{\text{linked list traversal}}$.
  • If $m > n$, expected search time $= 1 + \frac{n}{m} = O(1)$.
  • Worst-case search time $= O(n)$ (all keys land in one bucket).
  • Worst-case insertion time $= O(1)$ (insert at the head of the list).

Reality

  • BUT Simple Uniform Hashing doesn’t exist. In reality, keys are not random and have lots of regularity which can induce patterns in hash functions.

Open Addressing

Idea: When a collision happens, probe a sequence of buckets until an empty one is found.

Probing:

Operation:

//searching
function search(key) {
    int i = 0;
    while (i < m) { // at most m probes; i <= m would probe one slot twice
        int bucket = h(key, i);
        if (T[bucket] == null) // Empty bucket!
            return key-not-found;
        if (T[bucket].key == key) // Full bucket.
            return T[bucket].data;
        i++;
    }
    return key-not-found; // Exhausted entire table.
}
// deletion
1. search() to get the position
2. Set the slot to be “DELETED”, DON'T set to "NULL"
// insertion
1. algorithm similar to search()
2. replace an item on the slot of "DELETED"

Analysis:

Performance

  • When $m = n$, the table is full: we cannot insert any more items, and we cannot search efficiently (chaining can still search and insert efficiently).
  • Define load $\alpha = \frac{n}{m}$ and assume $\alpha < 1$. Then, assuming SUHA, the expected cost of an operation is $E \le \frac{1}{1-\alpha}$.
  • The proof that $E \le \frac{1}{1-\alpha}$ is as below.
Proof for cost of operation, E

First probe: probability that the first bucket is full: $\frac{n}{m}$

Second probe: given the first bucket is full, probability that the second bucket is also full: $\frac{n-1}{m-1}$

Third probe: probability that the third bucket is also full: $\frac{n-2}{m-2}$

Expected number of probes:

$$(*) = \underbrace{1}_{\text{first probe}} + \underbrace{\tfrac{n}{m}}_{\text{collision on 1st probe}}\Big(1 + \underbrace{\tfrac{n-1}{m-1}}_{\text{collision on 2nd probe}}\Big(1 + \tfrac{n-2}{m-2}\big(1 + \cdots\big)\Big)\Big)$$

Note that $\frac{n-i}{m-i} \le \frac{n}{m} = \alpha$. So,

$$(*) \le 1 + \alpha(1 + \alpha(1 + \alpha(1 + \cdots))) = 1 + \alpha + \alpha^2 + \alpha^3 + \cdots = \frac{1}{1-\alpha}$$

Advantages

  • Save space
  • Rarely allocate memory, as no new list-node allocations
  • Better cache performance
    • table all in one place in memory
    • fewer accesses to bring table into cache.
    • linked lists can wander all over the memory

Disadvantages

  • More sensitive to choice of hash functions
    • clustering is a common problem
  • More sensitive to load
    • performance degrades badly as $\alpha \to 1$

Hashing Techniques

Two common hashing techniques:

  • Division Method
  • Multiplication Method

Division Method

Hash Function:

$h(k) = k \bmod m$, where $m$ = table size

Then, two keys $k_1$ and $k_2$ collide $\iff k_1 \bmod m = k_2 \bmod m$.

Problems and solutions:

  • Problem: if a key $k$ and $m$ have a common divisor $d$, then since $k = (k \bmod m) + im$, the value $h(k) = k \bmod m$ is also divisible by $d$. So for keys that are all divisible by $d$, only a $\frac{1}{d}$ fraction of the hash table will be utilized.
    • Solution: choose $m$ to be a prime number.
  • Problem: division is slow.

Multiplication Method

Hash Function:

$h(k) = \big((A \cdot k) \bmod 2^w\big) \gg (w - r)$, where

  • $m = 2^r$ = table size, for some constant $r$
  • $w$ = size of a key in bits
  • $A$ = (odd) constant

TODO: need more diagrams to illustrate how it works

Pros:

  • Multiplication method is faster than the division method
  • Works reasonably well when $A$ is an odd integer $> 2^{w-1}$
    • Odd: if it is even, then we lose at least one bit’s worth
    • Big enough: use all the bits in $A$

Hash Table Size

Let the number of items $= n$ and the table size $= m$.

If $m < 2n$, there are too many collisions.

If $m > 10n$, there is too much wasted space.

Also, we don’t know $n$ in advance.

Idea:

  • Start with small (constant) table size
  • Grow (and shrink) table as necessary

Increasing table size:

  • Increment table size by 1
  • Double the size of the table
    • Cannot wait until $n = m$
    • Resizing happened when $n$ was $\frac{n}{2}, \frac{n}{4}, \ldots$
    • Total time complexity $= O(1 + \cdots + \frac{n}{8} + \frac{n}{4} + \frac{n}{2} + n) = O(n)$
    • On average, every addition of an item costs $O(1)$
  • Square the size of the table

Shrinking table:

  • Similar to increasing the table size
  • Need to allow some buffer before shrinking

Analysis:

Cost

  • Scanning the old hash table: $O(m_1)$, where $m_1$ = size of the old hash table
  • Creating the new hash table: $O(m_2)$, where $m_2$ = size of the new hash table
  • Inserting each element into the new hash table: $O(1)$
  • Total cost: $O(m_1 + m_2 + n)$

Extra Questions

  1. Say a company has about 600+ employees and we would like to store them in a hash table of size $m = 4096$. To avoid confusion when two employees from different departments share a name, we hash the concatenation of an employee's name with his/her department. We convert the string to "binary" by taking the ASCII value of each character $\bmod\ 2$. For example, KIMACCOUNTING → 75 73 77 65 67 67 79 85 78 84 73 78 71 → 1 1 1 1 1 1 1 1 0 0 1 0 1. Reading the bits as one binary number gives $1111111100101_2 = 8165$ in decimal, and the final hash value is $h(8165) = 8165 \bmod 4096 = 4069$. Is this hashing method good or bad? Give reasons to support your answer.

    • There will be a lot of collisions (each character only contributes one bit).
    • A lot of hash table entries are empty: after $\bmod\ 4096$, only the 12 least significant bits are used, and those bits all come from the department part of the string.

Priority Queue

The aim is to develop an ADT that manages a collection of objects with associated priorities. This ADT requires several operations:

| Operation | Description |
| --- | --- |
| insert(Key k, Priority p) | insert k with priority p |
| extractMin() / extractMax() | remove the key with minimum / maximum priority |
| decreaseKey(Key k, Priority p) / increaseKey(Key k, Priority p) | reduce / increase the priority of key k to priority p |
| contains(Key k) | does the priority queue contain key k? |
| isEmpty() | is the priority queue empty? |

Assume data items are unique.

Why Binary Heap?

| Data Structure | insert(x) | extractMax() | Sort Time | In-place? |
| --- | --- | --- | --- | --- |
| Sorted Array | O(n) | O(1) | O(n^2) | Yes (insertion sort) |
| Unsorted Array | O(1) | O(n) | O(n^2) | Yes (selection sort) |
| AVL Tree | O(log n) | O(log n) | O(n log n) | No (AVL sort) |
| Heap / Binary Heap | O(log n) | O(log n) | O(n log n) | Yes (heap sort) |

A binary heap can be visualized as a binary tree (NOT BST), and it can be implemented using an array.

Q: Heap vs. AVL Tree

  • same cost for operations
  • slightly simpler
    • no rotations
    • implement using array
  • slightly better concurrency

Q: Can we implement AVL tree using array?

  • too slow for AVL
  • how many array items does a single right/left rotation need to move?
  • an AVL tree is not a complete tree, so storing it in an array wastes memory

Properties of Heap:

  1. Heap Ordering:
    • Max Heap: priority[parent] >= priority[child]
    • Min Heap: priority[parent] <= priority[child]
  2. Complete binary tree:
    • every level is full, except possibly the last
    • all nodes are as far left as possible
  3. Height: $O(\log n)$

Priority Queue Operations

Refer to the slides or VisualAlgo for detailed explanations and motivations behind the operations.

// bubble up (max heap)
function bubbleUp(Node v) {
    while (v != root) { // the root has no parent
        if (priority(v) > priority(parent(v))) {
            swap(v, parent(v));
            v = parent(v);
        } else {
            return;
        }
    }
}
// bubble down (max heap)
function bubbleDown(Node v) {
    while (!leaf(v)) {
        // if v has only a left child, treat left(v) as the bigger child
        leftP = priority(left(v));
        rightP = priority(right(v));
        biggerChild = leftP > rightP ? left(v) : right(v);
        if (priority(biggerChild) > priority(v)) {
            swap(v, biggerChild);
            v = biggerChild;
        } else {
            return;
        }
    }
}
// increase key, O(log n)
1. update priority: increaseKey(5 -> 25)
2. bubble up
// decrease key, O(log n)
1. update priority: decreaseKey(25 -> 5)
2. bubble down
// insert, O(log n)
1. add a new leaf with priority p
2. bubble up
function insert(Priority p, Key k) {
    Node v = m_completeTree.insert(p,k);
    bubbleUp(v);
}
// delete, O(log n)
1. swap the node to be deleted with the last node (right-most leaf of the bottom level)
2. remove the last node (which, after the swap, holds the node to be deleted)
3. restore heap order at the swapped-in node: bubble up if it beats its parent, else bubble down
// extract max priority, O(log n)
1. Node v = root;
2. delete(root);

Store Heap in Array

  • map each node in the complete binary tree into a slot in an array
  • each level $i$ starts at array index $2^i - 1$, taking the root as level 0
  • let $x$ be the array index of node $v$:
    • $\text{index}(\text{left}(v)) = 2x + 1$
    • $\text{index}(\text{right}(v)) = 2x + 2$
    • $\text{index}(\text{parent}(v)) = \lfloor \frac{x-1}{2} \rfloor$

Illustration: the max heap with values 24, 10, 17, 8, 4, 6, 7, 1 (root 24; children 10 and 17; then 8, 4 under 10 and 6, 7 under 17; leaf 1 under 8) maps to the array:

| indices | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| values | 24 | 10 | 17 | 8 | 4 | 6 | 7 | 1 |

Heap Sort

// heap sort, phase 1
1. convert the unsorted array into a heap array (heapify)
    1.1 idea: given two heaps and one new node, how do we join all of them into a single heap?
    1.2 join them and bubble down the root
    1.3 base case: each leaf is already a heap, so half of the tree needs no bubbling down
    1.4 recursion: left + right subtrees are heaps

int[] A = array of unsorted integers
for (int i = (n-1); i >= 0; i--) {
    bubbleDown(i, A); // O(height of node i); starting from i = n/2 - 1 suffices
}

// alternative idea: repeatedly re-insert each element into a growing heap
for (int i = 0; i < n; i++) {
    int value = A[i];
    A[i] = EMPTY;
    heapInsert(value, A, 0, i);
}
// NOTE: the above alternative is not correct as written and still needs modification

2. convert heap array into sorted array

int[] A = array stored as a heap
for (int i = (n-1); i >= 0; i--) { // O(n log n)
    int value = extractMax(A); // O(log n)
    A[i] = value;
}

Analysis:

  1. unsorted array → heap array

    • the cost of bubbleDown() depends on the height of the node
    • observation: at least $\frac{n}{2}$ of the nodes are leaves
    • observation: most nodes have small height

    | Height | Number of Nodes |
    | --- | --- |
    | 0 | n/2 |
    | 1 | n/4 |
    | 2 | n/8 |
    | 3 | n/16 |
    | ... | ... |
    | log(n) | 1 |

    • Cost of building a heap: $O(n)$

Cost of building a heap is O(n)

$$\sum_{h=0}^{\log n} \underbrace{\frac{n}{2^h}}_{\text{upper bound on \# nodes at height } h} \times \underbrace{O(h)}_{\text{cost of bubbling down a node at height } h} = cn \sum_{h=0}^{\log n} \frac{h}{2^h} = O(n), \quad \text{since } \sum_{h=0}^{\infty} \frac{h}{2^h} = 2 \text{ (derivable from the geometric series)}$$

 

  2. heap array → sorted array

    • extractMax() is $O(\log n)$
    • so the loop takes $O(n \log n)$

  3. Summary:

    • unsorted array → heap array: $O(n)$
    • heap array → sorted array: $O(n \log n)$
    • so overall: $O(n \log n)$
    • heap sort is in-place, but unstable
      • what if two different keys have the same priority? their relative order may change

Q: Why not just use merge sort or quick sort?

  • heap sort is in-place
  • when sorting the whole array is unnecessary and only the first $m$ elements are needed, heap sort is efficient: build the heap in $O(n)$, then extract $m$ times, for $O(n + m \log n)$ total
  • a little slower than quick sort in practice, but always completes in $O(n \log n)$

Q: Can we improve heap sort?

  • ternary (3-way) heap sort is a little faster
  • ternary heap sort is not difficult to implement; we just need to figure out the indices of parent and children
  • the asymptotic complexity stays the same; we only improve the constant, from $\log_2 n$ levels to $\log_3 n$

Extra Questions

  1. Given an array storing a max heap, how do we convert it into a min heap?

    Answer 1
    • Approach 1: treat the “max heap” array as an unsorted array and heapify it with the min-heap property: $O(n)$.
    • Approach 2: if you are given the max heap implementation and its raw array, and the contents are numbers (integers or floating points), you can simply reinsert the negation of each number of that max heap. This essentially converts a max heap into a min heap (just ignore the -ve signs, which only reverse the meaning of min and max).

  2. Given $n$ items spread across $k$ sorted lists (so each sorted list has $\frac{n}{k}$ items), how fast can you merge them into one single sorted list? What if $k = O(n)$, e.g. $k = \frac{n}{10}$?

    Answer 2
    • Idea: min heap + output array. Put the first element of each of the $k$ sorted lists into the heap; repeatedly extract the min, append it to the output, and push the next element from the list the extracted item came from.
    • Each of the $n$ items passes through a heap of size at most $k$, so the total cost is $O(n \log k)$; for $k = O(n)$ this becomes $O(n \log n)$.

  3. True or false: if a max heap is stored in an array, we can always convert it to a min heap by reversing the array, and vice versa. For example, if an array A stores a max heap [10,9,8,7], then reversing A to [7,8,9,10] gives a min heap.

    Answer 3

    False. Counterexample: A = [10,9,3,8,7,1]; reversed, [1,7,8,3,9,10] is not a min heap (node 7 has the smaller child 3).

Disjoint Set

Definition: A disjoint-set ADT maintains a collection of disjoint (non-overlapping) sets. It supports two primary operations: find and union.

Purpose: Primarily used to track elements split into multiple sets and to easily merge these sets.

Core Operations:

| Operation | Description |
| --- | --- |
| DisjointSet(int N) | constructor: N objects |
| boolean find(Key p, Key q) | are p and q in the same set? (is there a path connecting the two objects?) |
| void union(Key p, Key q) | replace the sets containing p and q with their union |

Quick Find

Data Structure:

  • uses an array componentID[] of size N, where N is the number of elements
  • initially, each element is in its own set, so componentID[i] = i for $0 \le i \le N-1$
  • two objects are connected if they have the same component identifier






Illustration: components {0}, {1, 3, 4}, {2, 6}, {5}, {7} (edges 1–3, 1–4, 2–6) are stored as:

| object | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| componentID | 0 | 1 | 2 | 1 | 1 | 5 | 2 | 7 |
// find
bool find(int p, int q) { // O(1)
    return componentID[p] == componentID[q];
}
// union (note: "union" is a C++ keyword, so use another name in real code)
void union(int p, int q) { // O(n)
    for (int i = 0; i < componentID.size(); i++) {
        if (componentID[i] == componentID[q]) {
            componentID[i] = componentID[p];
        }
    }
}

Quick Union

Data Structure:

  • uses an array parent[] of size N, where N is the number of elements
  • initially, each element is in its own set, so parent[i] = i for $0 \le i \le N-1$
  • two objects are connected if they are part of the same tree






Illustration: the forest with parent pointers 0→6, 1→2, 2→7, 4→6, 5→1, 8→7 (roots: 3, 6, 7) is stored as:

| object | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| parent | 6 | 2 | 7 | 3 | 6 | 1 | 6 | 7 | 7 |

// find
bool find(int p, int q) { // O(n)
    while (parent[p] != p) {
        p = parent[p];
    }
    while (parent[q] != q) {
        q = parent[q];
    }
    return p == q;
}
// union
void union(int p, int q) { // O(n)
    while (parent[p] != p) {
        p = parent[p];
    }
    while (parent[q] != q) {
        q = parent[q];
    }
    parent[p] = q;
}

Weighted Union

Data Structure:

  • enhanced version of quick union
  • uses an array parent[] of size N, where N is the number of elements
  • an additional size[] array tracks the size of each tree
  • when merging 2 trees, always attach the root of the smaller tree to the root of the larger tree
  • the maximum depth of a tree will be $O(\log n)$ (for the proof, refer to the slides)






Illustration: a forest with parent[] = [6, 2, 7, 3, 6, 1, 6, 7, 7] (roots: 3, 6, 7) and tree sizes tracked alongside in size[]:

| object | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| size | 1 | 2 | 3 | 1 | 1 | 1 | 3 | 5 | 1 |
| parent | 6 | 2 | 7 | 3 | 6 | 1 | 6 | 7 | 7 |

// find (same as quick union)
bool find(int p, int q) { // O(log n)
    while (parent[p] != p) {
        p = parent[p];
    }
    while (parent[q] != q) {
        q = parent[q];
    }
    return p == q;
}
// union
void union(int p, int q) { // O(log n)
    while (parent[p] != p) {
        p = parent[p];
    }
    while (parent[q] != q) {
        q = parent[q];
    }
    if (size[p] > size[q]) {
        parent[q] = p;
        size[p] += size[q];
    } else {
        parent[p] = q;
        size[q] += size[p];
    }
}

Weighted Union with Path Compression

  • extended from weighted union
  • after finding the root, set the parent of each traversed node to the root

Example: find(11, 32)







Illustration: before path compression, node 11's path to the root is 11 → 6 → 1 → 2 → 7. After the find, every node traversed on that path (11, 6, 1, 2) points directly at the root 7, while the untouched subtrees (e.g. nodes 12 and 13 under node 11) come along unchanged.

// findRoot v1: full path compression
// also need to update the size[] array, omitted here
int findRoot(int p) {
    int root = p;
    while (parent[root] != root) { // first pass: find the root
        root = parent[root];
    }
    while (parent[p] != p) {       // second pass: point every traversed node at the root
        int temp = parent[p];
        parent[p] = root;
        p = temp;
    }
    return root;
}
// findRoot v2: path halving (each node on the path points to its grandparent)
// also need to update the size[] array, omitted here
int findRoot(int p) {
    int root = p;
    while (parent[root] != root) {
        parent[root] = parent[parent[root]];
        root = parent[root];
    }
    return root;
}

Union-Find Algorithm Summary

| Algorithm | find | union |
| --- | --- | --- |
| Quick Find | O(1) | O(n) |
| Quick Union | O(n) | O(n) |
| Weighted Union | O(log n) | O(log n) |
| Path Compression | O(log n) | O(log n) |
| Weighted Union with Path Compression | α(m,n) | α(m,n) |

  • $\alpha$ is the inverse Ackermann function (an amortized bound)

Extra Questions

  1. Given an array equations of strings that represent relationships between variables in one of two forms: "a==b" or "a!=b". Return true if and only if it is possible to assign integers to the variable names so as to satisfy all the given equations.

    Answer 1
    1. Create a Union-Find Data Structure:
      • For each variable in the equations, create a disjoint set. Initially, each variable is its own set.
    2. Process "==" Equations:
      • Iterate through the equations and for each "a==b" equation, union the sets to which variables 'a' and 'b' belong.
    3. Check "!=" Equations:
      • Iterate through the equations again and for each "a!=b" equation, check if 'a' and 'b' belong to the same set in the Union-Find data structure. If they do, the equations are contradictory, and you return False.
    4. If No Contradictions, Return True:
      • If you have iterated through all the equations without finding any contradictions, return True, indicating that it's possible to assign integers to variables to satisfy all the given equations.
  2. Leetcode 100

    Answer 2
    • Approach 1: DFS/BFS
    • Approach 2: UF
      • Union-Find (or Disjoint Set) data structure allows you to efficiently perform union and find operations. Union operation merges two sets into one, and find operation determines the set to which an element belongs.
      • For each '1' cell, check its adjacent cells.
      • If adjacent cells are also '1', it means they are part of the same island. Therefore, union these cells in the Union-Find data structure.
      • The number of disjoint sets remaining in the Union-Find data structure after processing all cells represents the number of islands. Each disjoint set represents a separate island.

Graph

For more detailed graph basics, refer to the CS1231S notes.

Notations:

$G = (V, E)$, comprising

  • a nonempty set of vertices/nodes, $V = \{v_1, v_2, \ldots, v_n\}$
  • a set of edges, $E = \{e_1, e_2, \ldots, e_k\}$
  • $e = \{v, w\}$ for an undirected edge incident on vertices $v$ and $w$
  • $e = (v, w)$ for a directed edge from vertex $v$ to vertex $w$

Definition: (not in CS1231S)

  • Diameter: the maximum distance between two nodes, following the shortest path.

Motivation:

  • Graphs can model a wide array of real-world problems, which we can then apply graph-based algorithms to solve these problems efficiently.

Graph Representations

Adjacency List

  • an array of linked list
  • the index represents a node, and each element in the linked list represents the nodes that are connected to it
  • the image below shows an undirected graph, can also be a directed graph
(image of an adjacency list omitted)
// structure:
// there are many different implementations;
// refer to assignment 5 and the lecture slides

Adjacency Matrix

  • a 2D array, where matrix[u][v] indicates the presence/weight of an edge between u and v
  • Neat Property: let $A$ be an adjacency matrix; then the number of walks of length $n$ from $v_i$ to $v_j$ equals the $(i, j)$ entry of $A^n$
  • the image below shows an undirected graph, but the representation works for a directed graph too
(image of an adjacency matrix omitted)
// structure
class Graph {
    std::vector<std::vector<int>> _adjMatrix;
}

Comparison

| | Adjacency List | Adjacency Matrix |
| --- | --- | --- |
| Memory usage for graph G = (V, E) | O(V + E) | O(V^2) |
| Memory usage for a cycle graph | O(V) | O(V^2) |
| Memory usage for a clique | O(V^2) | O(V^2) |
| find any neighbour of v | fast | slow |
| are v and w neighbours? | slow | fast |
| enumerate all neighbours | fast | slow |

Base rule: if the graph is dense, use an adjacency matrix; else use an adjacency list.

Traversal Algorithms

Depth-First Search (DFS)

  • DFS uses stack

Pseudocode:

1. Add start-node to stack.
2. Repeat until stack is empty:
    2.1 Pop node v from the top of the stack.
    2.2 Visit v.
    2.3 Explore all outgoing edges of v.
    2.4 Push all unvisited neighbors of v on the top of the stack.

C++ code:

// code snippet from TIC2001 AY2020 S1 PE
// may not be 100% correct
void Graph::DFS(int s, List<int>& output, bool resetVisited) {
    Stack<int> nodes;
    if (resetVisited) {
        _resetVisited();
    }

    nodes.push(s);
    _setVisited(s);
    while (!nodes.empty()) {
        int currNode = nodes.pop();
        _setVisited(currNode);
        output.insertTail(currNode);
        _al[currNode].start();
        while (!_al[currNode].end()) {
            int n = _al[currNode].current();
            if (!_isVisited(n)) {
                nodes.push(n);
                _setVisited(n);
            }
            _al[currNode].next();
        }
    }
}

Breadth-First Search (BFS)

  • BFS uses queue

Pseudocode:

1. Add start-node to queue.
2. Repeat until queue is empty:
    2.1 Remove node v from the front of the queue.
    2.2 Visit v.
    2.3 Explore all outgoing edges of v.
    2.4 Add all unvisited neighbors of v to the queue.

C++ code:

// code snippet from TIC2001 AY2020 S1 PE
// may not be 100% correct
void Graph::BFS(int s, List<int>& output, bool resetVisited) {
    Queue<int> nodes;
    if (resetVisited) {
        _resetVisited();
    }

    nodes.enqueue(s);
    _setVisited(s);
    while (!nodes.empty()) {
        int currNode = nodes.dequeue();
        _setVisited(currNode);
        output.insertTail(currNode);
        _al[currNode].start();
        while (!_al[currNode].end()) {
            int n = _al[currNode].current();
            if (!_isVisited(n)) {
                nodes.enqueue(n);
                _setVisited(n);
            }
            _al[currNode].next();
        }
    }
}

Analysis

TODO AFTER PRACTICAL EXAM, FOR FINAL EXAM

Shortest Path Algorithms

Dijkstra's algorithm
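
This section is still a stub, so here is a minimal sketch (not the lecture's reference code): a lazy-deletion Dijkstra over a hypothetical adjacency list `adj`, where `adj[u]` holds `(neighbour, weight)` pairs and all weights are non-negative. The function name and representation are assumptions.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical representation: adj[u] is a list of (neighbour, weight) pairs.
// Returns the shortest distance from src to every vertex (INF if unreachable).
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int, int>>>& adj, int src) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    // min-heap of (distance, vertex); lazy deletion instead of decrease-key
    std::priority_queue<std::pair<long long, int>,
                        std::vector<std::pair<long long, int>>,
                        std::greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;          // stale heap entry, skip it
        for (auto [v, w] : adj[u]) {
            if (dist[u] + w < dist[v]) {    // relax edge (u, v)
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}
```

Each vertex is finalised the first time it is popped; skipping stale entries instead of doing decrease-key gives $O(E \log E)$ with a binary heap, which is $O(E \log V)$ up to constants.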

Bellman-Ford algorithm
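
Likewise, a hedged sketch (names and the edge-list representation are my assumptions): Bellman-Ford relaxes every edge $V-1$ times, so unlike Dijkstra it tolerates negative edge weights and can detect a reachable negative-weight cycle.

```cpp
#include <limits>
#include <tuple>
#include <vector>

// Hypothetical representation: directed edges as (u, v, w) triples.
// Returns shortest distances from src, or an empty vector if a
// negative-weight cycle reachable from src is detected.
std::vector<long long> bellmanFord(
        int n, const std::vector<std::tuple<int, int, int>>& edges, int src) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(n, INF);
    dist[src] = 0;
    // Relax every edge n-1 times; any shortest path has at most n-1 edges
    for (int i = 0; i < n - 1; ++i)
        for (auto [u, v, w] : edges)
            if (dist[u] != INF && dist[u] + w < dist[v])
                dist[v] = dist[u] + w;
    // One extra pass: a further improvement implies a negative cycle
    for (auto [u, v, w] : edges)
        if (dist[u] != INF && dist[u] + w < dist[v])
            return {};
    return dist;
}
```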

Analysis

Question: why can't we use DFS or BFS to find the shortest path in a weighted graph?

  • BFS minimises the number of edges, not the total weight, so it is only correct when every edge has the same weight
  • DFS gives no guarantee at all about the length of the path it finds

Minimum Spanning Tree Algorithms

Prim's algorithm

For a connected weighted graph G with n vertices:
1. pick any vertex v of G and let T be the graph with this vertex only
2. let V be the set of all vertices of G except v
3. for (i = 1 to n − 1)
    3.1 find the edge e in G with the least weight among all edges connecting T to a vertex in V
    3.2 let w be the endpoint of e in V
    3.3 add e and w to the edge and vertex sets of T
    3.4 delete w from V
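
The steps above can be sketched in C++ (a hedged version; the function name, the adjacency-list shape, and returning only the total weight are my assumptions). Instead of literally deleting w from V, a visited flag plus a min-heap of edges crossing the cut does the same job:

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical representation: adj[u] holds (neighbour, weight) pairs of an
// undirected connected graph. Returns the total weight of an MST.
long long primMstWeight(
        const std::vector<std::vector<std::pair<int, int>>>& adj) {
    int n = (int)adj.size();
    std::vector<bool> inTree(n, false);
    // min-heap of (edge weight, vertex) candidates crossing the cut
    std::priority_queue<std::pair<int, int>,
                        std::vector<std::pair<int, int>>,
                        std::greater<>> pq;
    long long total = 0;
    pq.push({0, 0});                 // start from vertex 0 at cost 0
    while (!pq.empty()) {
        auto [w, u] = pq.top(); pq.pop();
        if (inTree[u]) continue;     // already reached via a lighter edge
        inTree[u] = true;
        total += w;
        for (auto [v, wt] : adj[u])
            if (!inTree[v]) pq.push({wt, v});
    }
    return total;
}
```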

Kruskal's algorithm

For a connected weighted graph G with n vertices:
1. initialise T to have all the vertices of G and no edges
2. let E be the set of all edges in G; let m = 0
3. while (m < n-1)
    3.1 find and remove the edge e in E of least weight
    3.2 if adding e to the edge set of T does not produce a circuit:
         i. add e to the edge set of T
        ii. set m = m + 1
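
A sketch of the loop above (the function name and edge-list shape are assumptions), using a small union-find with path compression, covered earlier in these notes, as the circuit test:

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <tuple>
#include <vector>

// Hypothetical representation: edges as (w, u, v) triples so that sorting
// orders them by weight. Returns the total weight of an MST.
long long kruskalMstWeight(int n, std::vector<std::tuple<int, int, int>> edges) {
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    // find with path compression
    std::function<int(int)> find = [&](int x) {
        return parent[x] == x ? x : parent[x] = find(parent[x]);
    };
    std::sort(edges.begin(), edges.end());   // least weight first
    long long total = 0;
    int used = 0;
    for (auto [w, u, v] : edges) {
        int ru = find(u), rv = find(v);
        if (ru == rv) continue;              // edge would create a circuit
        parent[ru] = rv;                     // union the two components
        total += w;
        if (++used == n - 1) break;          // spanning tree complete
    }
    return total;
}
```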

Boruvka’s algorithm

Analysis

https://cs.stackexchange.com/questions/18797/minimum-spanning-tree-vs-shortest-path

Topological Sort

  • Post Order DFS

Pseudocode:

For a graph with n vertices and m directed edges:
1. Initialize a stack and a visited array of size n
2. For each unvisited vertex in the graph:
    2.1 Call the DFS function with the vertex as the parameter
    2.2 In the DFS function, mark the vertex as visited and recursively call the DFS function for all unvisited neighbors of the vertex
    2.3 Once all the neighbors have been visited, push the vertex onto the stack
3. After all vertices have been visited, pop elements from the stack and append them to the output list until the stack is empty

C++ code:

#include <forward_list>
#include <iostream>
#include <stack>
#include <vector>

using std::forward_list;
using std::stack;
using std::vector;

class GraphEdge {
private:
    int _dest;

public:
    GraphEdge() : _dest(-1) {} // default-constructible, for vector
    GraphEdge(int index) : _dest(index) {}
    GraphEdge(const GraphEdge& e) : _dest(e._dest) {}
    int dest() const { return _dest; }
};

class Graph {
private:
    vector<forward_list<GraphEdge>> _vertices;
    void _topologicalSortUtil(int, vector<bool>&, stack<int>&);

public:
    Graph(int n) : _vertices(n) {}
    void addEdge(int source, int dest);
    void topologicalSort();
};

void Graph::addEdge(int source, int dest) {
    // Assumes that the edge doesn't already exist!
    _vertices[source].emplace_front(dest);
}

void Graph::_topologicalSortUtil(int v, vector<bool>& visited, stack<int>& Stack) {
    visited[v] = true;
    // reference, to avoid copying the adjacency list of v
    const forward_list<GraphEdge>& curr_edge_neighbour = _vertices[v];

    for (auto i = curr_edge_neighbour.begin(); i != curr_edge_neighbour.end(); i++) {
        if (!visited[i->dest()]) {
            _topologicalSortUtil(i->dest(), visited, Stack);
        }
    }
        }
    }
    Stack.push(v);
}

void Graph::topologicalSort() {
    stack<int> Stack;
    vector<bool> visited(_vertices.size(), false);
    /*
    bool* visited = new bool[_vertices.size()];
    for (int i = 0; i < _vertices.size(); i++) {
        visited[i] = false;
    }
    */
        
    for (int i = 0; i < _vertices.size(); i += 1) {
        if (!visited[i]) {
            _topologicalSortUtil(i, visited, Stack);
        }
    }
    
    // print the content of the stack
    // output here
    while (!Stack.empty()) {
        std::cout << Stack.top() << " ";
        Stack.pop();
    }
    std::cout << std::endl;
}

Mix and Match

Motivation:

Combine multiple basic data structures to achieve better performance.

Example:

  • adjacency list - an array of linked lists
  • a linked list can also be a BST, by augmenting the nodes with *nextPtr, *leftPtr, *rightPtr
  • queue - printInOrder() will be $O(n \log n)$
    • TBC

Conclusion:

Pointers are powerful and useful, but complicated to manage.

SkipList

Data Structure:

  • TODO

Analysis:

  • TODO

Binary Space Partitioning

Motivation:

Convex Hull

Definition:

Convex: A point set $P$ is convex if $\forall x, y \in P,\; \overline{xy} \subseteq P$.

Convex Combination: Let $S = \{p_i \in \mathbb{R}^d \mid i = 1, \dots, n\}$. A convex combination of $S$ is a point $\sum_{p_i \in S} \lambda_i p_i$ such that $\sum_{i=1}^{n} \lambda_i = 1$ and $\lambda_i \ge 0$.

Convex Hull: The convex hull of $S$, denoted $\mathrm{conv}(S)$, is the collection of all convex combinations of points of $S$.

Font size of math is increased because there is some rendering problem with $\overline$

Alternative explanation:

Let $C \subseteq \mathbb{R}^n$. $C$ is convex if, for all $x, y \in C$, the line segment connecting $x$ and $y$ is included in $C$.

A convex combination is a linear combination of points (which can be vectors, scalars, or more generally points in an affine space) where all coefficients are non-negative and sum to 1.

Source: Wikipedia

Motivation:

Applications of Convex Hull:

  • In general, a “simpler” representation of a complicated object
    • Graphics: Bezier curve drawing/intersection, ray tracing
    • Data mining/computer vision: Classification of data
    • GIS, path finding

Convex Hull Algorithms

Gift Wrapping algorithm

a.k.a Jarvis march

1. Take the leftmost vertex
2. Repeat until the walk returns to the starting vertex:
    2.1 Search for the next vertex on the convex hull by choosing the one with the minimal turning angle

Graham's scan

1. Starting from the leftmost vertex, go around the polygon
2. If it is a convex corner, continue
3. If it is a concave vertex, “make it convex” by “filling” it
    - By connecting its two neighbors
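
The concave-corner test above is usually implemented with a cross product. As a hedged illustration (not the lecture's code), here is Andrew's monotone chain, a closely related variant that sorts by x-coordinate instead of walking the polygon, popping concave corners exactly as described:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Point = std::pair<long long, long long>;

// Cross product of OA and OB; > 0 means a left (convex) turn at O
long long cross(const Point& o, const Point& a, const Point& b) {
    return (a.first - o.first) * (b.second - o.second)
         - (a.second - o.second) * (b.first - o.first);
}

// Andrew's monotone chain: sort by x, then build the lower and upper hulls,
// popping any vertex that makes a concave (or collinear) corner.
std::vector<Point> convexHull(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end());
    int n = (int)pts.size();
    if (n <= 2) return pts;
    std::vector<Point> hull(2 * n);
    int k = 0;
    for (int i = 0; i < n; ++i) {                     // lower hull
        while (k >= 2 && cross(hull[k-2], hull[k-1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (int i = n - 2, lo = k + 1; i >= 0; --i) {    // upper hull
        while (k >= lo && cross(hull[k-2], hull[k-1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    hull.resize(k - 1);              // last point repeats the first
    return hull;
}
```

Sorting dominates, so this runs in $O(n \log n)$, matching the table in the Analysis below.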

Divide and Conquer

1. Sort vertices in a direction, e.g. y direction
2. Repeat:
    2.1 Merge every neighboring convex hull
    2.2 For every merge, find the two “supporting lines”
        2.2.1 By taking a walk from an initial line (by joining the two neighbors)

Incremental method

1. Add points one at a time in sorted order, updating the hull incrementally

Quickhull

General idea: discard points that are not on the hull as quickly as possible.

1. Find the extreme points (maximum and minimum) in the x and y directions
2. Construct a quadrilateral from these four points
3. Discard any point inside it
4. Repeat:
    4.1 For each side, find the furthest point outside it
    4.2 Include that point in the convex hull
    4.3 Eliminate any point within the resulting triangle

Analysis

| Algorithm          | Expected Time  | Worst Time     | Moving up to 3D |
| ------------------ | -------------- | -------------- | --------------- |
| Gift Wrapping      | $O(nh)$        | $O(n^2)$       | $O(fn)$         |
| Graham's scan      | $O(n \log n)$  | $O(n \log n)$  | -               |
| Divide and Conquer | $O(n \log n)$  | $O(n \log n)$  | $O(n \log n)$   |
| Incremental method | $O(n \log n)$  | $O(n \log n)$  | $O(n \log n)$   |
| Quickhull          | $O(n \log n)$  | $O(n^2)$       | $O(n^2)$        |
  • $n$ is the number of points in the set
  • $h$ is the number of points in the hull
  • $f$ is the number of faces on the convex hull, which is $O(n)$

Can we do it in $O(n)$?

  • No. Otherwise, we could solve the sorting problem in $O(n)$ time, contradicting the $\Omega(n \log n)$ lower bound for comparison sorts
  • Idea (for the reduction): given $n$ numbers, project them onto a parabola; their convex hull visits the points in sorted order