# Markov matrix

## Setting

We use $s_t$ to denote the state of a variable in period $t$. Suppose there are only finitely many states $e_1, \dots, e_n$, and that the probability of being in any state next period, conditional on the entire history, equals the probability conditional on the current state alone. That is,
$$P(s_{t+1} = e_i \mid s_{t}, s_{t-1}, s_{t-2}, \dots ) = P(s_{t+1}= e_i \mid s_t).$$

We can use a **transition matrix** to describe this model. There are two equivalent ways to write the transition matrix; here we use the one conventional in linear algebra.

### Transition matrix

Given the above setting, each element of the transition matrix $A$ is
$$ A_{ij} = P(s_{t+1} = e_i \mid s_t = e_j).$$

A transition matrix must satisfy the following properties:

1. $0 \le A_{ij} \le 1$ for all $i,j$, since each entry is a probability.
2. $\sum_{i=1}^n A_{ij} = 1$ for all $j$: each column sums to $1$, since the probabilities of the mutually exclusive states sum to $1$.

### Probability vector

A non-negative vector $x$ is a **probability vector** if $\sum_{i=1}^n x_{i} = 1$. If $x$ gives the probability of each state at time $t$, then the probability of each state at time $t+k$ is
$$ A^k x. $$

## 1 is an eigenvalue

### Statement

If $A$ is a transition matrix, then $1$ is an eigenvalue of $A$.

<details>
<summary> Proof: </summary>

Let $u$ be the vector all of whose elements are $1$, i.e., $u=(1, \dots, 1)^T$. Since every column of $A$ sums to $1$, we have
$$A^T u = u,$$
so $1$ is an eigenvalue of $A^T$. Because $\det(A - \lambda I) = \det(A^T - \lambda I)$, $A^T$ and $A$ share the same eigenvalues. Hence, $1$ is an eigenvalue of $A$.

</details>

## Eigenvectors corresponding to $1$

### Statement

If $A$ is a transition matrix with positive entries, i.e. $A_{ij}>0$ for all $i, j$, then the eigenspace corresponding to the eigenvalue $1$ is one-dimensional. In other words, if $u, v$ are two vectors such that
$$Au=u,\quad Av=v,$$
then $u = \alpha v$ for some $\alpha$ in the field ($\mathbb{R}$ or $\mathbb{C}$).

<details>
<summary> Proof: </summary>

Consider $A^T$, and let $u$ be the vector all of whose elements are $1$. We want to show that if $A^Tv =v$, then $v = \alpha u$ for some $\alpha$ in the field. Suppose not. Then there is an element $v_i$ of $v$ that is greater than or equal to every other element and strictly greater than at least one of them. Let $A^T_i$ be the $i$th row of $A^T$; since $A^T v = v$, we must have
$$ A^T_i v = v_i. $$
However, all entries of $A^T$ are positive, $A^T_{ii}<1$, and $\sum_{j=1}^n A^T_{ij}=1$, so the left-hand side is a weighted average, with strictly positive weights, of the elements of $v$, at least one of which is strictly smaller than $v_i$. Such an average cannot equal $v_i$, which is a contradiction. Hence, $u$ spans the eigenspace of $A^T$ corresponding to $1$.

Because row rank equals column rank, $\operatorname{rank}(A - \lambda I) = \operatorname{rank}(A^T - \lambda I)$, so the eigenspaces of $A$ and $A^T$ corresponding to the same eigenvalue have the same dimension. Therefore, the eigenspace of $A$ corresponding to $1$ is one-dimensional.

</details>

## No eigenvalue larger than $1$ in absolute value

The following theorem is the **Gershgorin circle theorem**. We extend the field from the real numbers to the complex numbers here, which we will also need later.

### Definition

Let $A$ be an $n \times n$ matrix over the complex field, and define $R_i$ as the sum of the absolute values of the non-diagonal entries in the $i$th row,
$$ R_{i}=\sum _{j\neq {i}}\left|A_{ij}\right|. $$
Let $D(A_{ii},R_{i})\subseteq \mathbb {C}$ be the closed disc centered at $A_{ii}$ with radius $R_{i}$.
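To make the definition concrete, here is a small numerical sketch; the use of `numpy` and the example matrix are illustrative assumptions, not part of the text. It computes the disc centers $A_{ii}$ and radii $R_i$ for a transition matrix and lists its eigenvalues; the statement below asserts that every eigenvalue falls inside at least one of these discs.

```python
import numpy as np

# Illustrative column-stochastic transition matrix:
# non-negative entries, each column sums to 1.
A = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

# Gershgorin data: centers are the diagonal entries A_ii,
# radii are the off-diagonal absolute row sums R_i = sum_{j != i} |A_ij|.
centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)

for lam in np.linalg.eigvals(A):
    # Is this eigenvalue inside at least one closed disc D(A_ii, R_i)?
    inside = bool(np.any(np.abs(lam - centers) <= radii + 1e-12))
    print("eigenvalue", np.round(lam, 4), "-> in some Gershgorin disc:", inside)
```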
### Statement

Every eigenvalue of $A$ lies within at least one of the Gershgorin discs $D(A_{ii},R_{i})$.

<details>
<summary> Proof: </summary>

Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $x$. We want to show that $\left|\lambda -A_{ii}\right|\leq R_{i}$ for some $i$. Choose $i$ such that $x_{i}$ is the element of $x$ with the largest absolute value; since $x \neq 0$, we have $x_i \neq 0$. Since $Ax=\lambda x$, taking the $i$th component of that equation gives
$$\sum _{j}A_{ij}x_{j}=\lambda x_{i}.$$
Moving $A_{ii}x_i$ to the other side,
$$\sum _{j\neq i}A_{ij}x_{j}=(\lambda -A_{ii})x_{i}.$$
Dividing each side by $x_i$ and taking absolute values,
$$\left|\lambda -A_{ii}\right|=\left|\sum _{j\neq i}{\frac {A_{ij}x_{j}}{x_{i}}}\right| .$$
Applying the triangle inequality,
$$\left|\sum _{j\neq i}{\frac {A_{ij}x_{j}}{x_{i}}}\right| \le \sum _{j\neq i}\left|{ A_{ij}\frac {x_{j}}{x_{i}}}\right| .$$
Because $x_i$ is the element of $x$ with the largest absolute value, for any element $x_j$,
$${\frac {\left|x_{j}\right|}{\left|x_{i}\right|}}\leq 1.$$
Hence,
$$\left|\lambda -A_{ii}\right|=\left|\sum _{j\neq i}{\frac {A_{ij}x_{j}}{x_{i}}}\right| \leq \sum _{j\neq i}\left|{ A_{ij}\frac {x_{j}}{x_{i}}}\right| \leq \sum _{j\neq i}\left|A_{ij}\right|=R_{i}.$$

</details>

### Corollary

Suppose $A$ is a transition matrix over the complex field and $\lambda$ is an eigenvalue of $A$. Then $|\lambda| \le 1$.

<details>
<summary> Proof: </summary>

Consider $A^T$. By the definition of a transition matrix, each row of $A^T$ sums to $1$, so $R_i = 1-A_{ii}$. If $\lambda$ is an eigenvalue of $A^T$, the Gershgorin circle theorem gives
$$ \left|\lambda -A_{ii}\right| \le 1- A_{ii} \implies \left|\lambda\right| \le 1. $$
Since $A^T$ and $A$ share the same eigenvalues, no eigenvalue of $A$ can have absolute value greater than $1$.

</details>

## The limit of distributions

### Statement

If $A$ is a transition matrix with positive entries, i.e. $A_{ij}>0$ for all $i, j$, then

1. $\lim_{k \to \infty} A^k$ exists.
2. Every column of $\lim_{k \to \infty} A^k$ equals the eigenvector of $A$ corresponding to $1$ (suitably normalized).
3. That eigenvector is a probability vector.

<details>
<summary> Proof: </summary>

If $A$ is diagonalizable, by the previous statements we can write
$$ A = Q^{-1 } \begin{pmatrix} 1 & 0 &\dots & 0\\ 0 & \lambda_2 & \dots & 0\\ & &\ddots & \\ 0 & 0 & \dots & \lambda_n\end{pmatrix} Q, $$
where $|\lambda_i|< 1$ for $i=2, \dots, n$. (The corollary only gives $|\lambda_i| \le 1$; for a transition matrix with strictly positive entries, $1$ is in fact the only eigenvalue of modulus $1$, which is part of the Perron–Frobenius theorem.) Hence,
$$ \lim_{k\to \infty}A^k = Q^{-1 } \begin{pmatrix} 1 & 0 &\dots & 0\\ 0 & \lim_{k\to \infty} \lambda_2^k & \dots & 0\\ & &\ddots & \\ 0 & 0 & \dots & \lim_{k\to \infty} \lambda_n^k\end{pmatrix} Q = Q^{-1} \begin{pmatrix} 1 & 0 &\dots & 0\\ 0 & 0 & \dots & 0\\ & &\ddots & \\ 0 & 0 & \dots & 0\end{pmatrix} Q. $$
Moreover, the first column of $Q^{-1}$ is the eigenvector of $A$ corresponding to $1$, which we denote by $v$, and the first row of $Q$ is the eigenvector of $A^T$ corresponding to $1$, which we can normalize to be $(1, \dots, 1)$. Therefore,
$$\lim_{k\to \infty}A^k = \begin{pmatrix} v & v & \dots & v \end{pmatrix}.$$

If $A$ is not diagonalizable, we need the **Jordan normal form**; this is the reason we extended the field to the complex numbers when we proved the Gershgorin circle theorem. If $A$ is a matrix over the complex numbers, we can decompose it as
$$ A = Q^{-1 } \begin{pmatrix} 1 & & & \\ 0 & J_2 & & \\ & &\ddots & \\ 0 & & & J_m\end{pmatrix} Q, $$
where each $J_i$ is a square Jordan block associated with an eigenvalue $\lambda_i$ of modulus less than $1$,
$$ J_i = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i\end{pmatrix}, \quad i=2, \dots, m. $$
Since $|\lambda_i| < 1$, every power $J_i^k$ tends to the zero matrix, and after some calculation we get the same result,
$$\lim_{k\to \infty}A^k = \begin{pmatrix} v & v & \dots & v \end{pmatrix}.$$

Finally, since $\lim_{k\to \infty}A^k$ is a limit of non-negative matrices, $v$ must be non-negative. Moreover, $v$ is the first column of $Q^{-1}$ and $u^T= (1, \dots, 1)$ is the first row of $Q$, so $Q Q^{-1} = I$ implies $u^T v =1$; the entries of $v$ therefore sum to $1$, and $v$ is a probability vector.

</details>
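A quick numerical check of this statement (a minimal sketch assuming `numpy`; the positive column-stochastic matrix below is an arbitrary example, not from the text): raise $A$ to a high power and compare its columns, and $A^k x$ for an arbitrary probability vector $x$, with the eigenvector for the eigenvalue $1$ rescaled to sum to $1$.

```python
import numpy as np

# Column-stochastic matrix with strictly positive entries (illustrative example).
A = np.array([[0.6, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.5]])

# Eigenvector of A for the eigenvalue 1, rescaled so its entries sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
idx = np.argmin(np.abs(eigvals - 1.0))   # locate the eigenvalue closest to 1
v = np.real(eigvecs[:, idx])
v = v / v.sum()                          # normalize to a probability vector

# A^k for a large k: every column should be (numerically) equal to v.
Ak = np.linalg.matrix_power(A, 50)
print("v            =", np.round(v, 6))
print("col of A^50  =", np.round(Ak[:, 0], 6))

# Starting from any probability vector x, A^k x converges to v as well.
x = np.array([1.0, 0.0, 0.0])
print("A^50 x       =", np.round(Ak @ x, 6))
```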
### Corollary

Clearly, if $A^m$ has all positive entries for some power $m$, then $\lim_{k\to \infty}A^k$ still exists, even if some entries of $A$ are zero. Let $L = \lim_{k\to \infty}A^k = (v, \dots, v)$. Given any probability vector $x$, $Lx= v$, because the entries of $x$ sum to $1$.

## Recursive Macroeconomic Theory

In the macroeconomics textbook *Recursive Macroeconomic Theory*, the authors use $P$ and $\pi$ to denote the transition matrix and the probability vector, respectively, and they define the transition matrix as the transpose of the definition above. With the probability vector $\pi_t$ at time $t$, the probability vector at time $t+k$ is
$$ \pi_{t+k}' = \pi_t' P^k, $$
where they write $\pi'$ rather than $\pi^T$ for the transpose. They also state the following definitions and theorems.

### Stationary distribution

An unconditional distribution is called **stationary** or **invariant** if it satisfies
$$ \pi' P = \pi'. $$

### Asymptotic stationarity

Let $\pi_{\infty}$ be the unique probability vector that satisfies $(I - P')\pi_{\infty}= 0$. If for all initial distributions $\pi_0$ it is true that $\pi_0' P^k$ converges to the same $\pi_{\infty}'$, we say that the Markov chain is **asymptotically stationary** with a unique invariant distribution $\pi_{\infty}$.

### Theorem 2.2.1

Let $P$ be a stochastic matrix with $P_{ij} > 0$ for all $(i, j)$. Then $P$ has a unique stationary distribution, and the process is asymptotically stationary.

### Theorem 2.2.2

Let $P$ be a stochastic matrix for which $P^n_{ij} > 0$ for all $(i, j)$ for some value of $n \ge 1$. Then $P$ has a unique stationary distribution, and the process is asymptotically stationary.
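Under this row-stochastic convention, the stationary distribution solves $\pi' P = \pi'$, i.e. $(I - P')\pi = 0$ together with $\sum_i \pi_i = 1$. Below is a minimal sketch of how one might compute it numerically; `numpy` and the example matrix $P$ (which has some zero entries but has an all-positive power, as in Theorem 2.2.2) are assumptions for illustration only.

```python
import numpy as np

# Row-stochastic matrix in the Ljungqvist-Sargent convention: each ROW sums to 1.
# Illustrative example; P has zeros, but P^2 already has all positive entries.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])
n = P.shape[0]

# Stack (I - P') pi = 0 with the normalization sum(pi) = 1,
# then solve the overdetermined system by least squares.
M = np.vstack([np.eye(n) - P.T, np.ones((1, n))])
b = np.zeros(n + 1)
b[-1] = 1.0
pi_inf, *_ = np.linalg.lstsq(M, b, rcond=None)

# Compare with pi_0' P^k for a large k and an arbitrary initial distribution pi_0.
pi_0 = np.array([1.0, 0.0, 0.0])
pi_k = pi_0 @ np.linalg.matrix_power(P, 200)

print("stationary pi =", np.round(pi_inf, 6))
print("pi_0' P^200   =", np.round(pi_k, 6))
```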
## Reference

Stephen H. Friedberg, Arnold J. Insel, and Lawrence E. Spence, *Linear Algebra*, Prentice-Hall, 2003.

Ljungqvist, Lars, and Thomas J. Sargent, *Recursive Macroeconomic Theory*, MIT Press, 2018.

https://en.wikipedia.org/wiki/Gershgorin_circle_theorem

https://en.wikipedia.org/wiki/Jordan_normal_form