# 19 - Quick-Sort and random binary search trees
## Quicksort
```
quicksort(a,i,j) //sort the subarray a_i,..,a_j (i <= j)
if i < j:
p = random element in a_i,...,a_j //the element to partition around
k = partition(a,i,j,p) //the index of p
quicksort(a,i,k-1)
quicksort(a,k+1, j)
```
Compared to Insertion Sort: $\binom{n}{2}$ comparisons at most, average $\frac{\binom{n}{2}}{2}$, $n-1$ at best
How bad is quicksort?
In the algorithm, we pick some random partition point and partition the array to everything less than the point
Say we have an array $1, 4, 7, 2, 6, 5, 3$
The worst partition point we can pick is either $1$ or $7$ because then we'd partition the array into one part with every other element and the other with a single element and we'd do $n-1$ comparisons.
Should we continue picking either the greatest or least elements in the array, we'd continue doing $n-2, n-3...1$ comparisons.
This results in $\binom{n}{2}$ comparisons in the worst case and is more likely to happen then insertion sort so is this better than insertion sort?
Define $X = \text{"the number of comparisons performed by quicksort(a,1,n)"}$
Assume that each element in the list is unique are the elements $1,...,n$. The probability of picking any partition point is uniformly random and independent of each other.
$E(X)?$
Not a trivial problem
For each $1 \leq x < y \leq n$ define
$I_{x,y} = 1$ if quicksort compares $x$ and $y$
$I_{x,y} = 0$ if quicksort never compares $x$ and $y$
$X = \sum_{x=1}^{n-1}\sum_{y=x+1}^n I_{x,y}$
We convince outselves that $x$ and $y$ will only ever compare themselves at most once, so the number of comparisons is exactly equal tto $X$
$E(X) = E(\sum_{x=1}^{n-1}\sum_{y=x+1}^n I_{x,y}) = \sum_{x=1}^{n-1}\sum_{y=x+1}^n E(I_{x,y})$
$=\sum_{x=1}^{n-1}\sum_{y=x+1}^n Pr(\text{quicksort compares x and y})$
We convince ourselves that x and y will only be compared if either are picked as a partition point. The number of possible partition points are the total number of numbers in the range of $x$ and $y$ which is $x-y+1$ so:
$=\sum_{x=1}^{n-1}\sum_{y=x+1}^n \frac{2}{y-x+1} =\sum_{x=1}^{n-1}(\frac{2}{2} + \frac{2}{3} + \frac{2}{4} + ... + \frac{2}{n-x})$
$=\sum_{x}2(\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + ... + \frac{1}{n-x})$
$=\sum_{x}2(H_n-1) \leq \sum_{x=1}^{n-1}2H_n \leq 2n H_n \leq 2n \ln n \leq 1.38n \log_2 n + 2n$
So the average number of comparisons is roughly $1.38n log_2 n$
Ex:

$I_{1,n}
$I_{1,7} = 0$ because in the optimal case where $4$ is the partition
$Pr(I_{1,n}) = Pr(I_{1,7}) = \frac{2}{7}$
## Random binary search trees
What happens when you take a random list of numbers and insert them into a binary search tree
The number of comparisons that each element in the list will experience in the tree is the same as that of quicksort. Pick the first number as the root, each number will get compared to that at least, after that, we define a BST structure
###### tags: `COMP2804` `Probability`