# Algorithm - Probabilistic Analysis and Randomized Algorithms
# Hiring Problem
So here is the example we will use(basically same as hiring problem but number version):
```cpp=
int max=0
for(int i=0;i<n;i++){
if(arr[i]>max) //compare(interview)
max=arr[i]; //assign(hire)
}
```
there are 2 main operation `compare` and `assign`.
* `compare`: cost $c_c$
* `assign`: cost $c_a$
For our intuition, we calculate the worst case complexity, which is $nc_c+nc_a$; however, in the best situation, we got $nc_c+c_a$, which have big different. Then, how should we anlyze this code? Let's ignore the `compare`, we got complexity $O(nc_a)=O(n)$
btw, the complexity of original worst case is $nc_c+nc_a=O(n\times(c_c+c_a))=O(n)$
# Probabilistic Analysis
:::info
Probabilistic Analysis is the use of probability theory to analyse the running time of a **deterministic algorithm** (that if we have same input, we'll get same output).
:::
To slove the problem we just mention, we can use probabilistic analysis. We consider all situation, including worst and best case. We need to know the **distribution of input**. For the example, we assume it is a uniform random distribution:
How many different inputs are possible?
There are $n!$ possible permutations! And all of them are equally like! Take $n=4$ for instance:
| | | ||||||
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| 1,2,3,4| 4 | 2,1,3,4|3|3,1,2,4|2|4,1,2,3|1|
| 1,2,4,3| 3 | 2,1,4,3|2|3,1,4,2|2|4,1,3,2|1|
| 1,3,2,4| 3 | 2,3,1,4|3|3,2,1,4|2|4,2,1,3|1|
| 1,3,4,2| 3 | 2,3,4,1|3|3,2,4,1|2|4,2,3,1|1|
| 1,4,2,3| 2 | 2,4,1,3|2|3,4,1,2|2|4,3,1,2|1|
| 1,4,3,2| 2 | 2,4,3,1|2|3,4,2,1|2|4,3,2,1|1|
The average # of `assign` is actually $\frac{50}{24}\approx2.083$. Much less than 4.
## Calculate the Average Case (Expectation) of Complexity
That what we do in probabilistic analysis. How we analyze array below?
$arr[n]=\{x_1, x_2, x_3,x_4, ..., x_n\}$
We can get through it step by step:
* If $x_1$ is the max of $arr[1:1]$, probability to `assign` $x_1$ $\rightarrow 1\times1$
* expect # of `assign` $= 1\times1$
* If $x_2$ is the max of $arr[1:2]$, probability to `assign` $x_2$ $\rightarrow 1\times\frac{1}{2}$
* expect # of `assign` $= 1\times1+1\times\frac{1}{2}$
* If $x_3$ is the max of $arr[1:3]$, probability to `assign` $x_3$ $\rightarrow 1\times\frac{1}{3}$
* expect # of `assign` $= 1\times1+1\times\frac{1}{2}+1\times\frac{1}{3}$
* If $x_4$ is the max of $arr[1:4]$, probability to `assign` $x_4$ $\rightarrow 1\times\frac{1}{4}$
* expect # of `assign` $= 1\times1+1\times\frac{1}{2}+1\times\frac{1}{3}+1\times\frac{1}{4}$
$$
\vdots
$$
* If $x_n$ is the max of $arr[1:n]$, probability to `assign` $x_n$ $\rightarrow 1\times\frac{1}{n}$
* expect # of `assign` $= 1\times1+1\times\frac{1}{2}+1\times\frac{1}{3}+1\times\frac{1}{4}+\dots+1\times\frac{1}{n}$
We got:
expect # of `assign` is
$1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\dots+\frac{1}{n}<log(n)+1$
The average case of complexity is $O(log(n))$, which is the result of **Probabilistic Analysis**.
# Randomized Algorithms
Different form **deterministic algorithm**, which if we have same input, we'll get same output, the **randomized algorithms** may give **different output with same input**. But don't worry, you can still derive the expected running time without knowing the distribution of input.
## Why use randomized algorithms?
* because we want to avoid bad inputs(for axample: worst case)
* because the deterministic algorithm may take too long(for axample: worst case)
* because you are doing cryptography or security.
## Example: Found 1
Given an array with n elements (n is even). Half of the array contains 0s, the other half contains 1s.
**Goal**: Found an index that contains a 1.
**Code**:
### Generate the question
```python=
n=100000
arr= [0]*int(n/2)+[1]*int(n/2)# worst case for linear search
```
### Monte Carlo method
In this case, linear search will compare 50000 times to find an index that contains a 1. However, for the randomized algorithm below. It only compare 100 times with tiny probability to failed ($\frac{1}{2^{100}}\approx 7.8886091e-31\approx0$)
```python=
flag=0
for i in range(100):
k = random.randint(0, n-1)
if (arr[k]==1):
flag=1
print(k)
break
if(flag==0): print("failed!")
```
### Las Vegas method
In this method, the worst case is it can run forever; however, it stopped immediately after it find the answer(we choose Monte Carlo method most in most of case)
```python=
count=0
while 1:
k = random.randint(0, n-1)
count+=1
if (arr[k]==1):
flag=1
print(k, ", we tried", count,"times.")
break
```
Here is the result:
```
74593
We tried 4 times.
```
Which has preety good performence in this example.
**There are more good examples in the second reference, please check out**
## Hiring Problem
Let's get back to **Hiring Problem**, we can actually shuffle the list before we start hiring! Which we have only $\frac{1}{n!}$ chance to get the worst case($O(n)$). And that's why we learn randomized algorithms.
# Reference
1. https://www.youtube.com/watch?v=fC7HaZa_HI4
2. https://www.youtube.com/watch?v=3qa48SlptP8
3. https://www.youtube.com/watch?v=mXy4iO-8J3A