---
tags: information theory
---
# Basic Properties of Mutual Information
## Definition
### Mutual Information
$X$, $Y$: random variables
Define the mutual information between $X$ and $Y$, denoted by $I(X:Y)$, as
$$
I(X:Y) = H(X) - H(X|Y)
$$
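As a quick numerical sanity check, here is a minimal Python sketch of this definition, computing $I(X:Y) = H(X) - H(X|Y)$ for a small joint pmf. It assumes numpy is available; the helper name `H` and the toy distribution `pxy` are arbitrary choices, not something fixed by the notes.
```python
import numpy as np

def H(p):
    """Shannon entropy in bits; zero-probability entries contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Toy joint pmf p(x, y): rows index x, columns index y.
pxy = np.array([[0.30, 0.10],
                [0.05, 0.55]])

px = pxy.sum(axis=1)   # marginal of X
py = pxy.sum(axis=0)   # marginal of Y

# H(X|Y) = sum_y p(y) H(X | Y=y)
HX_given_Y = sum(py[y] * H(pxy[:, y] / py[y]) for y in range(pxy.shape[1]))

print(f"I(X:Y) = {H(px) - HX_given_Y:.4f} bits")
```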
### Conditional Mutual Information
$X$, $Y$, $Z$: random variables
Then we define the conditional mutual information $I(X:Y|Z)$ as
$$
\begin{align}
I(X:Y|Z) &:= H(X|Z) - H(X|Y,Z)
\notag \\
& = \sum_{z} P(z) I(X:Y|Z=z)
\end{align}
$$
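The two expressions in this definition can be checked against each other numerically. The sketch below is a minimal Python check on a random joint pmf; the alphabet sizes, seed, and helper names (`H`, `mi`) are arbitrary choices.
```python
import numpy as np

rng = np.random.default_rng(0)

def H(p):
    """Shannon entropy in bits of a (possibly multi-dimensional) pmf."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf p(x, y, z) on a 2 x 3 x 2 alphabet.
pxyz = rng.random((2, 3, 2))
pxyz /= pxyz.sum()

pxz = pxyz.sum(axis=1)          # p(x, z)
pyz = pxyz.sum(axis=0)          # p(y, z)
pz  = pxyz.sum(axis=(0, 1))     # p(z)

# First form: I(X:Y|Z) = H(X|Z) - H(X|Y,Z), using H(A|B) = H(A,B) - H(B).
form1 = (H(pxz) - H(pz)) - (H(pxyz) - H(pyz))

# Second form: I(X:Y|Z) = sum_z P(z) I(X:Y|Z=z).
def mi(pab):
    """I(A:B) = H(A) + H(B) - H(A,B) for a 2-D joint pmf."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

form2 = sum(pz[z] * mi(pxyz[:, :, z] / pz[z]) for z in range(pxyz.shape[2]))

print(form1, form2)   # agree up to floating-point error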
### Remark
$$
I(X:Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y) = I(Y:X)
$$
Similarly,
$$
I(X:Y|Z) = I(Y:X|Z)
$$
## Basic Properties
### Proposition 1
$$
I(X:Y) \ge 0
$$
and
$$
I(X:Y) \le H(X)
$$
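A quick empirical check of Proposition 1 is sketched below in Python: for many randomly generated joint pmfs, $I(X:Y)$ stays between $0$ and $H(X)$. The alphabet sizes, number of trials, and tolerance are arbitrary.
```python
import numpy as np

rng = np.random.default_rng(1)

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

for _ in range(1000):
    pxy = rng.random((4, 5))
    pxy /= pxy.sum()                      # random joint pmf p(x, y)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    I = H(px) + H(py) - H(pxy)            # I(X:Y)
    assert -1e-12 <= I <= H(px) + 1e-12   # 0 <= I(X:Y) <= H(X)

print("0 <= I(X:Y) <= H(X) held in every trial")
```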
### Proposition 2
$I(X:Y|Z) = 0$ if and only if $X$ and $Y$ are conditionally independent given $Z$, that is, $X-Z-Y$ [forms a Markov chain](https://hackmd.io/40iNuBaSTrmnjdgwWIfb1g#Corollary2).
#### proof
$(\Rightarrow)$ If $I(X:Y|Z) = 0$, then every term of $\sum_{z} P(z) I(X:Y|Z=z)$ must vanish, since each $I(X:Y|Z=z) \ge 0$; and $I(X:Y|Z=z) = 0$ forces $X$ and $Y$ to be independent given $Z=z$ (the equality case of the nonnegativity of mutual information).
$(\Leftarrow)$ Suppose $X-Z-Y$, so that $p(y|X=x, Z=z) = p(y|Z=z)$. Then
$$
\begin{align}
I(X:Y|Z) &= I(Y:X|Z)
\notag \\
&=
H(Y|Z) - H(Y|X,Z)
\notag \\
&=
H(Y|Z) - \sum_{x, z} p(x, z) H(Y|X=x, Z=z)
\notag\\
&=
H(Y|Z) + \sum_{x, z} p(x,z) \sum_{y} \underbrace{p(y|X=x, Z=z)}_{p(y|Z=z)} \log p(y|X=x, Z=z)
\notag \\
&=H(Y|Z) + \sum_{x,z} p(x,z) \sum_y p(y|Z=z) \log p(y|Z = z)
\notag \\
&=
H(Y|Z) - H(Y|Z) = 0
\end{align}
$$
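To see Proposition 2 numerically, the Python sketch below builds a joint pmf with the Markov structure $X-Z-Y$, i.e. $p(x,y,z) = p(z)p(x|z)p(y|z)$, and checks that $I(X:Y|Z)$ vanishes up to rounding. The alphabet sizes, seed, and helper names are arbitrary.
```python
import numpy as np

rng = np.random.default_rng(2)

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def cmi(pxyz):
    """I(X:Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), axes ordered (x, y, z)."""
    return (H(pxyz.sum(axis=1)) + H(pxyz.sum(axis=0))
            - H(pxyz) - H(pxyz.sum(axis=(0, 1))))

# X - Z - Y: p(x, y, z) = p(z) p(x|z) p(y|z).
pz   = rng.dirichlet(np.ones(3))              # p(z)
px_z = rng.dirichlet(np.ones(4), size=3)      # row z is p(x|z)
py_z = rng.dirichlet(np.ones(4), size=3)      # row z is p(y|z)
pxyz = np.einsum('z,zx,zy->xyz', pz, px_z, py_z)

print(cmi(pxyz))   # ~0, as Proposition 2 predicts for X - Z - Y
```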
### Theorem 1 (Chain Rule)
$$
\begin{align}
I(X:Y^n) = I(X: Y_1,\ldots, Y_n) = \sum_{i=1}^{n} I(Y_i:X|Y^{i-1})
\end{align}
$$
where $Y^{i-1} := (Y_1, \ldots, Y_{i-1})$ and the $i = 1$ term is understood as $I(Y_1:X)$.
#### proof
$$
\begin{align}
I(X: Y^n)
&=
I(Y^n:X)
\notag \\
&=
H(Y^n) - H(Y^n|X)
\notag \\
&=
H(Y_n | Y^{n-1}) + H(Y_{n-1} | Y^{n-2}) + \cdots + H(Y_2 | Y_1) + H(Y_1)
\notag \\
& - H(Y_n | Y^{n-1}, X) - H(Y_{n-1}|Y^{n-2}, X) - \cdots - H(Y_2 | Y_1, X) - H(Y_1|X)
\notag \\
&= I(Y_n:X |Y^{n-1}) + I(Y_{n-1} :X | Y^{n-2}) + \cdots + I(Y_2:X|Y_1) + I(Y_1:X)
\end{align}
$$
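The chain rule can be verified numerically for $n = 3$. In the Python sketch below, axis 0 of the array `p` plays the role of $X$ and axes 1-3 play $Y_1, Y_2, Y_3$; each mutual-information term is written as a combination of joint entropies. The binary alphabets and seed are arbitrary.
```python
import numpy as np

rng = np.random.default_rng(3)

def H(p):
    """Shannon entropy in bits of a (possibly multi-dimensional) pmf."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf p(x, y1, y2, y3) on binary alphabets.
p = rng.random((2, 2, 2, 2))
p /= p.sum()

def Hm(*axes):
    """Entropy of the marginal of p on the listed axes."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    return H(p.sum(axis=drop)) if drop else H(p)

# Left-hand side: I(X : Y_1, Y_2, Y_3).
lhs = Hm(0) + Hm(1, 2, 3) - Hm(0, 1, 2, 3)

# Right-hand side: I(Y_1:X) + I(Y_2:X|Y_1) + I(Y_3:X|Y_1,Y_2).
rhs = (Hm(0) + Hm(1) - Hm(0, 1)) \
    + (Hm(0, 1) + Hm(1, 2) - Hm(0, 1, 2) - Hm(1)) \
    + (Hm(0, 1, 2) + Hm(1, 2, 3) - Hm(0, 1, 2, 3) - Hm(1, 2))

print(lhs, rhs)   # equal up to floating-point error
```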
### Theorem 2 (Data Processing Inequality)
For a Markov chain $X-Y-Z$, that is, $P_{X,Y,Z} = P_X P_{Y|X} P_{Z|Y}$, we have
$$
I(X:Y) \ge I(X:Z)
$$
#### proof
Since $X-Y-Z$ is a Markov chain, we have $I(X:Z|Y) = 0$ (Proposition 2). Hence the chain rule gives
$$
I(X:Y,Z) = I(X:Z|Y) + I(X:Y) = I(X:Y)
$$
and
$$
I(X:Y,Z) = I(X:Y|Z) + I(X:Z)
$$
Therefore,
$$
I(X:Y) = I(X:Y|Z) + I(X:Z) \ge I(X:Z)
$$
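A numerical illustration of Theorem 2 is sketched below in Python: construct a Markov chain $X-Y-Z$ by multiplying $p(x)$, $p(y|x)$, $p(z|y)$ and compare $I(X:Y)$ with $I(X:Z)$. The alphabet sizes and seed are arbitrary choices.
```python
import numpy as np

rng = np.random.default_rng(4)

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mi(pab):
    """I(A:B) = H(A) + H(B) - H(A,B) for a 2-D joint pmf."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

# Markov chain X - Y - Z: p(x, y, z) = p(x) p(y|x) p(z|y).
px   = rng.dirichlet(np.ones(3))
py_x = rng.dirichlet(np.ones(4), size=3)   # row x is p(y|x)
pz_y = rng.dirichlet(np.ones(3), size=4)   # row y is p(z|y)
pxyz = np.einsum('x,xy,yz->xyz', px, py_x, pz_y)

print(mi(pxyz.sum(axis=2)), mi(pxyz.sum(axis=1)))   # I(X:Y) >= I(X:Z)
```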
### Exercise
For $Z= g(Y)$ being a deterministic function of $Y$, we have
$$
H(Y) \ge H(Z)
$$
and
$$
I(X:Y) \ge I(X:Z)
$$
#### proof
Hint: Use the fact that $-(p_1 + p_2) \log (p_1 + p_2) \le -(p_1 \log p_1 + p_2 \log p_2)$.
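Beyond the hint, both claims are easy to check numerically. The Python sketch below takes a random joint pmf for $(X, Y)$ and a non-injective map $g$ of my own choosing, then compares the entropies and mutual informations directly.
```python
import numpy as np

rng = np.random.default_rng(5)

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mi(pab):
    """I(A:B) = H(A) + H(B) - H(A,B) for a 2-D joint pmf."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

# Random joint pmf p(x, y) with |Y| = 4, and a non-injective g: Y -> Z.
pxy = rng.random((3, 4))
pxy /= pxy.sum()
g = np.array([0, 0, 1, 1])      # g(0)=g(1)=0, g(2)=g(3)=1

# Joint pmf p(x, z) for Z = g(Y): merge the columns of p(x, y) according to g.
pxz = np.zeros((3, 2))
for y, z in enumerate(g):
    pxz[:, z] += pxy[:, y]

print(H(pxy.sum(axis=0)), H(pxz.sum(axis=0)))   # H(Y) >= H(Z)
print(mi(pxy), mi(pxz))                         # I(X:Y) >= I(X:Z)
```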
### Exercise
$$
X_1 - X_2 - X_3 - X_4 \Rightarrow I(X_1:X_4) \le I(X_2:X_3)
$$
#### proof idea
From Theorem 2, since $X_2 - X_3 - X_4$ is a Markov chain, we have
$$
I(X_2:X_3) \ge I(X_2:X_4)
$$
Thus, we only need to show that $I(X_2:X_4) \ge I(X_1:X_4)$. By Theorem 2 again, it suffices to prove that $X_1 - X_2 - X_4$ is a Markov chain; that is,
$$
P(X_4|X_2, X_1) = P(X_4|X_2)
$$
To prove this, you can follow [this](https://hackmd.io/40iNuBaSTrmnjdgwWIfb1g#Theorem-1).
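For a concrete check of this exercise, the Python sketch below builds a four-step Markov chain from random transition kernels and compares $I(X_1:X_4)$ with $I(X_2:X_3)$; the binary alphabets and seed are arbitrary.
```python
import numpy as np

rng = np.random.default_rng(6)

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mi(pab):
    """I(A:B) = H(A) + H(B) - H(A,B) for a 2-D joint pmf."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

# Markov chain X1 - X2 - X3 - X4 on binary alphabets:
# p(x1, x2, x3, x4) = p(x1) p(x2|x1) p(x3|x2) p(x4|x3).
p1   = rng.dirichlet(np.ones(2))
p2_1 = rng.dirichlet(np.ones(2), size=2)
p3_2 = rng.dirichlet(np.ones(2), size=2)
p4_3 = rng.dirichlet(np.ones(2), size=2)
p = np.einsum('a,ab,bc,cd->abcd', p1, p2_1, p3_2, p4_3)

print(mi(p.sum(axis=(1, 2))), mi(p.sum(axis=(0, 3))))   # I(X1:X4) <= I(X2:X3)
```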
#### Remark
For a communication system in which a message $W$ is encoded as $X^n$, sent through a channel that outputs $Y^n$, and decoded as $\tilde W$, the chain $W-X^n-Y^n-\tilde W$ is Markov, so $I(W:\tilde W)$ is bounded by $I(X^n:Y^n)$; this bound leads to the notion of channel capacity (defined later).
### Proposition 3
For a Markov Chain $X-Y-Z$, we have $I(X:Y|Z) \le I(X:Y)$
#### proof
$$
\begin{align}
I(X:Y|Z)
&=
H(X|Z) - H(X|Y,Z)
\notag \\
&= H(X|Z) - H(X|Y) \quad (\text{since } X-Y-Z \text{ implies } H(X|Y,Z) = H(X|Y))
\notag \\
& \le
H(X) - H(X|Y) = I(X:Y) \quad (\text{conditioning reduces entropy})
\end{align}
$$
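Proposition 3 can be checked on the same kind of randomly generated Markov chain $X-Y-Z$ as in the earlier sketches; here $I(X:Y|Z)$ is written as a combination of joint entropies. The sizes, seed, and names are arbitrary.
```python
import numpy as np

rng = np.random.default_rng(7)

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Markov chain X - Y - Z: p(x, y, z) = p(x) p(y|x) p(z|y).
px   = rng.dirichlet(np.ones(3))
py_x = rng.dirichlet(np.ones(3), size=3)
pz_y = rng.dirichlet(np.ones(3), size=3)
pxyz = np.einsum('x,xy,yz->xyz', px, py_x, pz_y)

pxy, pxz, pyz = pxyz.sum(axis=2), pxyz.sum(axis=1), pxyz.sum(axis=0)
pz = pxyz.sum(axis=(0, 1))

I_XY   = H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)   # I(X:Y)
I_cond = H(pxz) + H(pyz) - H(pxyz) - H(pz)                  # I(X:Y|Z)
print(I_cond, I_XY)   # I(X:Y|Z) <= I(X:Y)
```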
#### Remark
For a Markov chain $X-Y-Z$, conditioning on $Z$ can only reduce the mutual information between $X$ and $Y$. However, this is not true in general: for instance, if $X$ and $Y$ are independent fair bits and $Z = X \oplus Y$, then $I(X:Y) = 0$ while $I(X:Y|Z) = 1$.
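The XOR example above is easy to verify directly; the following Python sketch does so by building the joint pmf of $(X, Y, Z)$ explicitly.
```python
import numpy as np

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# X, Y independent fair bits, Z = X XOR Y.
pxyz = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        pxyz[x, y, x ^ y] = 0.25

pxy, pxz, pyz = pxyz.sum(axis=2), pxyz.sum(axis=1), pxyz.sum(axis=0)
pz = pxyz.sum(axis=(0, 1))

I_XY   = H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)   # = 0 bits
I_cond = H(pxz) + H(pyz) - H(pxyz) - H(pz)                  # = 1 bit
print(I_XY, I_cond)
```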