Algorithm - Probabilistic Analysis and Randomized Algorithms

# Algorithm - Probabilistic Analysis and Randomized Algorithms # Hiring Problem So here is the example we will use(basically same as hiring problem but number version): ```cpp= int max=0 for(int i=0;i<n;i++){ if(arr[i]>max) //compare(interview) max=arr[i]; //assign(hire) } ``` there are 2 main operation `compare` and `assign`. * `compare`: cost $c_c$ * `assign`: cost $c_a$ For our intuition, we calculate the worst case complexity, which is $nc_c+nc_a$; however, in the best situation, we got $nc_c+c_a$, which have big different. Then, how should we anlyze this code? Let's ignore the `compare`, we got complexity $O(nc_a)=O(n)$ btw, the complexity of original worst case is $nc_c+nc_a=O(n\times(c_c+c_a))=O(n)$ # Probabilistic Analysis :::info Probabilistic Analysis is the use of probability theory to analyse the running time of a **deterministic algorithm** (that if we have same input, we'll get same output). ::: To slove the problem we just mention, we can use probabilistic analysis. We consider all situation, including worst and best case. We need to know the **distribution of input**. For the example, we assume it is a uniform random distribution: How many different inputs are possible? There are $n!$ possible permutations! And all of them are equally like! Take $n=4$ for instance: | | | |||||| | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | | 1,2,3,4| 4 | 2,1,3,4|3|3,1,2,4|2|4,1,2,3|1| | 1,2,4,3| 3 | 2,1,4,3|2|3,1,4,2|2|4,1,3,2|1| | 1,3,2,4| 3 | 2,3,1,4|3|3,2,1,4|2|4,2,1,3|1| | 1,3,4,2| 3 | 2,3,4,1|3|3,2,4,1|2|4,2,3,1|1| | 1,4,2,3| 2 | 2,4,1,3|2|3,4,1,2|2|4,3,1,2|1| | 1,4,3,2| 2 | 2,4,3,1|2|3,4,2,1|2|4,3,2,1|1| The average # of `assign` is actually $\frac{50}{24}\approx2.083$. Much less than 4. ## Calculate the Average Case (Expectation) of Complexity That what we do in probabilistic analysis. How we analyze array below? $arr[n]=\{x_1, x_2, x_3,x_4, ..., x_n\}$ We can get through it step by step: * If $x_1$ is the max of $arr[1:1]$, probability to `assign` $x_1$ $\rightarrow 1\times1$ * expect # of `assign` $= 1\times1$ * If $x_2$ is the max of $arr[1:2]$, probability to `assign` $x_2$ $\rightarrow 1\times\frac{1}{2}$ * expect # of `assign` $= 1\times1+1\times\frac{1}{2}$ * If $x_3$ is the max of $arr[1:3]$, probability to `assign` $x_3$ $\rightarrow 1\times\frac{1}{3}$ * expect # of `assign` $= 1\times1+1\times\frac{1}{2}+1\times\frac{1}{3}$ * If $x_4$ is the max of $arr[1:4]$, probability to `assign` $x_4$ $\rightarrow 1\times\frac{1}{4}$ * expect # of `assign` $= 1\times1+1\times\frac{1}{2}+1\times\frac{1}{3}+1\times\frac{1}{4}$ $$ \vdots $$ * If $x_n$ is the max of $arr[1:n]$, probability to `assign` $x_n$ $\rightarrow 1\times\frac{1}{n}$ * expect # of `assign` $= 1\times1+1\times\frac{1}{2}+1\times\frac{1}{3}+1\times\frac{1}{4}+\dots+1\times\frac{1}{n}$ We got: expect # of `assign` is $1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\dots+\frac{1}{n}<log(n)+1$ The average case of complexity is $O(log(n))$, which is the result of **Probabilistic Analysis**. # Randomized Algorithms Different form **deterministic algorithm**, which if we have same input, we'll get same output, the **randomized algorithms** may give **different output with same input**. But don't worry, you can still derive the expected running time without knowing the distribution of input. ## Why use randomized algorithms? * because we want to avoid bad inputs(for axample: worst case) * because the deterministic algorithm may take too long(for axample: worst case) * because you are doing cryptography or security. ## Example: Found 1 Given an array with n elements (n is even). Half of the array contains 0s, the other half contains 1s. **Goal**: Found an index that contains a 1. **Code**: ### Generate the question ```python= n=100000 arr= [0]*int(n/2)+[1]*int(n/2)# worst case for linear search ``` ### Monte Carlo method In this case, linear search will compare 50000 times to find an index that contains a 1. However, for the randomized algorithm below. It only compare 100 times with tiny probability to failed ($\frac{1}{2^{100}}\approx 7.8886091e-31\approx0$) ```python= flag=0 for i in range(100): k = random.randint(0, n-1) if (arr[k]==1): flag=1 print(k) break if(flag==0): print("failed!") ``` ### Las Vegas method In this method, the worst case is it can run forever; however, it stopped immediately after it find the answer(we choose Monte Carlo method most in most of case) ```python= count=0 while 1: k = random.randint(0, n-1) count+=1 if (arr[k]==1): flag=1 print(k, ", we tried", count,"times.") break ``` Here is the result: ``` 74593 We tried 4 times. ``` Which has preety good performence in this example. **There are more good examples in the second reference, please check out** ## Hiring Problem Let's get back to **Hiring Problem**, we can actually shuffle the list before we start hiring! Which we have only $\frac{1}{n!}$ chance to get the worst case($O(n)$). And that's why we learn randomized algorithms. # Reference 1. https://www.youtube.com/watch?v=fC7HaZa_HI4 2. https://www.youtube.com/watch?v=3qa48SlptP8 3. https://www.youtube.com/watch?v=mXy4iO-8J3A