Probabilistic Dynamic Programming I

During the Algorithms and Probability class, which I took this semester, I got introduced to the concept of probabilistic dynamic programming. At the start, it felt very unintiuitve but after some time studying, I think I have a decent understanding of it. But to improve my understanding I decided to write a short post about the problems that I've seen and detailed explanation on how to solve them. Ok too much talk, let's start with the first problem. **Vanishing Bag** *DESCRIPTION*: We have a special bag from which the objects can vanish. Initially there are some number of white and black stones in the bag. The game proceeds by two players, you and your friend, alternatively choosing a stone from the bag and removing it. The players are required to do this without looking into the bag, so each stone is considered to be drawn uniformly at random. However, this on its own would not be as fun, so there is a catch. Every time someone draws a stone from the bag, another stone chosen uniformly at random vanishes. The game ends once a white stone is drawn or the bag becomes empty. The winner is the player who draws the white stone or if the bag becomes empty the winner is the player who played second. Out of courtesy, your friend draws first and begins the game (thus, if the bag becomes empty you win). What is the probability of you winning the game? So before getting into how to construct the dp-table and fill it, let's approach the problem from purely theoretical perspective. Let $X_{i,j,k}$ be the event that represents us winning the game considering we have $i ≤ w$ white stones, $j ≤ b$ black stones left in the bag and current player is player $k \in \{0,1\}$. $k=0$ implies that the current player is the friend and k=1 implies it is our turn. Let $Y/\overline Y$ represent the occurance "A black/white stone vanished from the bag.". Let $Z/\overline Z$ represent "Current player picked a black/white stone.". We have the following recursive relationship: $$ Pr[X_{i,j,0}] = Pr[Y \cap Z]Pr[X_{i,j,0}|Y \cap Z] +Pr[Y\cap \overline Z]Pr[X_{i,j,0}|Y\cap \overline Z]+\\Pr[\overline Y\cap Z]Pr[X_{i,j,0}|\overline Y\cap Z]+Pr[\overline Y \cap \overline Z]Pr[X_{i,j,0}|\overline Y\cap \overline Z]\\ =Pr[Z]Pr[Y|Z]Pr[X_{i,j,0}|Y \cap Z] +Pr[\overline Z]Pr[Y| \overline Z]Pr[X_{i,j,0}|Y\cap \overline Z]+\\Pr[Z]Pr[\overline Y| Z]Pr[X_{i,j,0}|\overline Y\cap Z]+Pr[\overline Z]Pr[\overline Y | \overline Z]Pr[X_{i,j,0}|\overline Y\cap \overline Z]\\=\frac{j}{i+j}\cdot\frac{j-1}{i+j-1}\cdot Pr[X_{i,j-2,1}]+\frac{i}{i+j}\cdot\frac{j}{i+j-2}\cdot 0 + \\ \frac{j}{i+j}\cdot\frac{i}{i+j-1}\cdot Pr[X_{i-1,j-1,1}]+ \frac{i}{i+j}\cdot \frac{i-1}{i+j-1}\cdot 0 \\= \frac{j}{i+j}\cdot\frac{j-1}{i+j-1}\cdot Pr[X_{i,j-2,1}]+\frac{j}{i+j}\cdot\frac{i}{i+j-1}\cdot Pr[X_{i-1,j-1,1}]$$ $$Pr[X_{i,j,1}] =Pr[Y \cap Z]Pr[X_{i,j,1}|Y \cap Z] +Pr[Y\cap \overline Z]Pr[X_{i,j,1}|Y\cap \overline Z]+\\Pr[\overline Y\cap Z]Pr[X_{i,j,1}|\overline Y\cap Z]+Pr[\overline Y \cap \overline Z]Pr[X_{i,j,1}|\overline Y\cap \overline Z]\\= Pr[Z]Pr[Y|Z]Pr[X_{i,j,1}|Y \cap Z] +Pr[\overline Z]Pr[Y| \overline Z]Pr[X_{i,j,1}|Y\cap \overline Z]+\\Pr[Z]Pr[\overline Y| Z]Pr[X_{i,j,1}|\overline Y\cap Z]+Pr[\overline Z]Pr[\overline Y | \overline Z]Pr[X_{i,j,1}|\overline Y\cap \overline Z]\\=\frac{j}{j+i}\cdot\frac{j-1}{i+j-1}\cdot Pr[X_{i,j-2,0}]+\frac{i}{i+j}\cdot\frac{j}{i+j-1}\cdot 1+\\ \frac{j}{i+j}\cdot\frac{i}{j+i-1}\cdot Pr[X_{i-1,j-1,0}]+\frac{i}{i+j}\cdot\frac{i-1}{j+i-1}\cdot 1\\=\frac{i}{i+j}+\frac{j}{j+i}\cdot\frac{j-1}{i+j-1}\cdot Pr[X_{i,j-2,0}]+\frac{j}{i+j}\cdot\frac{i}{j+i-1}\cdot Pr[X_{i-1,j-1,0}]$$ Packing these two in to one equation, we have: $$Pr[X_{i,j,k}] = (1-k)\cdot\\ (\frac{j}{i+j}\cdot\frac{j-1}{i+j-1}\cdot Pr[X_{i,j-2,1}]+\frac{j}{i+j}\cdot\frac{i}{i+j-1}\cdot Pr[X_{i-1,j-1,1}])\\+k\cdot\\(\frac{i}{i+j}+\frac{j}{j+i}\cdot\frac{j-1}{i+j-1}\cdot Pr[X_{i,j-2,0}]+\frac{j}{i+j}\cdot\frac{i}{j+i-1}\cdot Pr[X_{i-1,j-1,0}])$$ Even though we are getting close, it is neither done nor correct yet. We have to set some base cases and make sure we add the probabilities to calculation when it is possible for them to occur (e.g. if we have only one black stone in the bag, a case where a player picks a black stone and a black stone vanishes cannot happen.). So we define a indicator variable $\mathbb I_{j\geq2}=1\Leftrightarrow$ "We have more than 1 black stone left.", 0 otherwise. In addition to that, for bases cases, we have: $$Pr[X_{i>0,0,k}]=k,Pr[X_{0,j,k}]=1$$ We construct two more indicator variables $\mathbb I_{i>0,j=0}=1\Leftrightarrow i>0\wedge j=0$ $\mathbb I_{i=0}=1\Leftrightarrow i=0$ Packing it all into one equation: $$Pr[X_{i,j,k}] = \mathbb I_{i>0,j=0}\cdot k +(1-\mathbb I_{i>0,j=0})\cdot\mathbb I_{i=0}+ (1-\mathbb I_{i=0})\cdot(1-\mathbb I_{i>0,j=0})\cdot( (1-k)\cdot\\ (\mathbb I_{j\geq2}\cdot\frac{j}{i+j}\cdot\frac{j-1}{i+j-1}\cdot Pr[X_{i,j-2,1}]+\frac{j}{i+j}\cdot\frac{i}{i+j-1}\cdot Pr[X_{i-1,j-1,1}])\\+k\cdot\\(\mathbb I_{j\geq2}\cdot\frac{i}{i+j}+\frac{j}{j+i}\cdot\frac{j-1}{i+j-1}\cdot Pr[X_{i,j-2,0}]+\frac{j}{i+j}\cdot\frac{i}{j+i-1}\cdot Pr[X_{i-1,j-1,0}]))$$ Hmm yeah looks like that's all. Since we have this equation, we can easily write the code:![](https://hackmd.io/_uploads/SyWIBnM5n.png) As one can see, before calling the recursive method, we fill the table with -1 so that we don't compute the values we already computed. This brings the code to the time complexity $O((w+1)(b+1)2)\in O(wb)$. And that's all, good night!