---
tags: math, macro
---
# Markov matrix
## Setting
We use $s_t$ to denote the state of a variable in period $t$. Suppose there are only finitely many states $e_1, \dots, e_n$, and the probability of being in any state next period, conditional on the entire history, equals the probability conditional on the current state alone. That is,
$$P(s_{t+1} = e_i | s_{t}, s_{t-1}, s_{t-2}, \dots ) = P(s_{t+1}= e_i| s_t).$$
We can use a **transition matrix** to describe this model. There are two equivalent ways to write the transition matrix; here we use the convention common in linear algebra.
### Transition matrix
Given the above setting, each element of the transition matrix $A$ is,
$$ A_{ij} = P(s_{t+1} = e_i | s_t = e_j).$$
A transition matrix must satisfy the following properties:
1. $0 \le A_{ij} \le 1$ for all $i,j$, since each entry is a conditional probability.
2. $\sum_{i=1}^n A_{ij} = 1$ for all $j$. Each column sums to $1$ because, given the current state $e_j$, the next-period states are mutually exclusive and exhaustive.
### Probability vector
A non-negative vector $x$ is a **probability vector** if $\sum_{i=1}^n x_{i} = 1$. If $x$ gives the probability of each state at time $t$, then the distribution over states at time $t+k$ is
$$
A^k x.
$$
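As a quick numerical sketch (using `numpy`; the matrix below is made up for illustration), we can check the two properties above and propagate a probability vector forward:
```python
import numpy as np

# Column-stochastic transition matrix: A[i, j] = P(s_{t+1} = e_i | s_t = e_j).
A = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])
assert np.all(A >= 0) and np.allclose(A.sum(axis=0), 1.0)  # properties 1 and 2

x = np.array([1.0, 0.0, 0.0])                # start in state e_1 with certainty
x_k = np.linalg.matrix_power(A, 10) @ x      # distribution after 10 periods: A^k x
print(x_k, x_k.sum())                        # still a probability vector
```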
## 1 is an eigenvalue
### Statement
If $A$ is a transition matrix, then $1$ is an eigenvalue of $A$.
<details>
<summary>
Proof:
</summary>
Let $u$ be the vector whose elements are all $1$, i.e., $u=(1, \dots, 1)^T$. Since every column of $A$ sums to $1$, we have
$$A^T u = u.$$
Hence $1$ is an eigenvalue of $A^T$. Because $\det(A - \lambda I) = \det(A^T - \lambda I)$, $A$ and $A^T$ have the same characteristic polynomial and therefore the same eigenvalues. Hence, $1$ is an eigenvalue of $A$.
</details>
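A quick numerical sanity check (a sketch with a randomly generated column-stochastic matrix, using `numpy`):
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
A /= A.sum(axis=0)                        # normalize columns: A is now a transition matrix

eigvals = np.linalg.eigvals(A)
print(eigvals)
assert np.any(np.isclose(eigvals, 1.0))   # 1 is (numerically) an eigenvalue
```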
## Eigenvectors corresponding to $1$
### Statement
If $A$ is a transition matrix with positive entries, i.e. $A_{ij}>0$ for all $i, j$, then the eigenspace corresponding to the eigenvalue $1$ is one-dimensional. In other words, if $u, v$ are two vectors such that
$$Au=u,\quad Av=v,$$
then $u = \alpha v$ for some $\alpha$ in the field ($\mathbb{R}$ or $\mathbb{C}$).
<details>
<summary>
Proof:
</summary>
Consider $A^T$ and let $u$ be the vector whose elements are all $1$. We want to show that if $A^T v = v$, then $v = \alpha u$ for some $\alpha$ in the field. (For a complex eigenvector, apply the argument below to its real and imaginary parts, each of which is fixed by $A^T$.)
Suppose not. Then $v$ has an entry $v_i$ that is greater than or equal to every other entry and strictly greater than at least one of them. Let $A^T_i$ be the $i$th row of $A^T$; since $A^T v = v$, we must have
$$
A^T_i v = v_i.
$$
However, all entries of $A^T_i$ are positive and sum to $1$ (in particular $A^T_{ii}<1$), so $A^T_i v$ is a weighted average of the entries of $v$ that puts positive weight on an entry strictly smaller than $v_i$. Such an average is strictly less than $v_i$, a contradiction. Hence $u$ spans the eigenspace of $A^T$ corresponding to $1$.
Because row rank equals column rank, $rank(A - \lambda I) = rank(A^T - \lambda I)$, so the eigenspaces of $A$ and $A^T$ corresponding to the same eigenvalue have the same dimension. Therefore, the eigenspace of $A$ corresponding to $1$ is one-dimensional.
</details>
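Numerically, one way to see this (a sketch using `numpy`; the random matrix is just for illustration): for a positive transition matrix $A$, the matrix $A - I$ has rank $n-1$, so its null space, the eigenspace of $1$, is one-dimensional.
```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.random((n, n))
A /= A.sum(axis=0)                           # positive column-stochastic matrix

# rank(A - I) = n - 1, so the eigenspace of 1 (the null space of A - I) has dimension 1
print(np.linalg.matrix_rank(A - np.eye(n)))  # expect n - 1 = 4
```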
## No eigenvalue larger than $1$ in absolute value
The following theorem is the **Gershgorin circle theorem**. We extend the field from the real numbers to the complex numbers for later use.
### Definition
Let $A$ be an $n \times n$ matrix over the complex field, and define $R_i$ as the sum of the absolute values of the off-diagonal entries in the $i$th row:
$$
R_{i}=\sum _{j\neq {i}}\left|A_{ij}\right|.
$$
Let $D(A_{ii},R_{i})\subseteq \mathbb {C}$ be the closed disc centered at $A_{ii}$ with radius $R_{i}$.
### Statement
Every eigenvalue of $A$ lies within at least one of the Gershgorin discs $D(A_{ii},R_{i})$.
<details>
<summary>
Proof:
</summary>
Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $x$. Choose $i$ such that $x_{i}$ is the element of $x$ with the largest absolute value; note $x_i \neq 0$ since $x \neq 0$. We want to show that $\left|\lambda -A_{ii}\right|\leq R_{i}$ for this $i$. Since $Ax=\lambda x$, taking the $i$th component of that equation gives
$$\sum _{j}A_{ij}x_{j}=\lambda x_{i}.$$
Moving the $A_{ii}x_{i}$ term to the other side,
$$\sum _{j\neq i}A_{ij}x_{j}=(\lambda -A_{ii})x_{i}.$$
Dividing both sides by $x_i$ and taking absolute values,
$$\left|\lambda -A_{ii}\right|=\left|\sum _{j\neq i}{\frac {A_{ij}x_{j}}{x_{i}}}\right|.$$
Applying the triangle inequality,
$$\left|\sum _{j\neq i}{\frac {A_{ij}x_{j}}{x_{i}}}\right| \le \sum _{j\neq i}\left|{ A_{ij}\frac {x_{j}}{x_{i}}}\right| .$$
Because $x_i$ is the element of $x$ with the largest absolute value, for any element $x_j$,
$${\frac {\left|x_{j}\right|}{\left|x_{i}\right|}}\leq 1.$$
Hence,
$$\left|\lambda -A_{ii}\right|=\left|\sum _{j\neq i}{\frac {A_{ij}x_{j}}{x_{i}}}\right| \leq \sum _{j\neq i}\left|{ A_{ij}\frac {x_{j}}{x_{i}}}\right| \leq \sum _{j\neq i}\left|A_{ij}\right|=R_{i}.$$
</details>
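A small numerical check of the theorem (a sketch using `numpy` on a made-up complex matrix):
```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

centers = np.diag(A)                           # the A_ii
R = np.abs(A).sum(axis=1) - np.abs(centers)    # off-diagonal row sums R_i

for lam in np.linalg.eigvals(A):
    # every eigenvalue must lie in at least one disc D(A_ii, R_i)
    assert np.any(np.abs(lam - centers) <= R + 1e-9)
print("every eigenvalue lies in some Gershgorin disc")
```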
### Corollary
Suppose $A$ is a transition matrix, viewed as a matrix over the complex field, and $\lambda$ is an eigenvalue of $A$. Then $|\lambda| \le 1$.
<details>
<summary>
Proof:
</summary>
Consider $A^T$. The $i$th row of $A^T$ is the $i$th column of $A$, so by the definition of a transition matrix its entries are non-negative and sum to $1$; hence $R_i = 1 - A^T_{ii} = 1-A_{ii}$. If $\lambda$ is an eigenvalue of $A^T$, the Gershgorin circle theorem gives
$$
\left|\lambda -A_{ii}\right| \le 1- A_{ii},
$$
and since $0 \le A_{ii} \le 1$, the triangle inequality yields $\left|\lambda\right| \le \left|\lambda - A_{ii}\right| + A_{ii} \le 1$.
Since $A^T$ and $A$ share the same eigenvalues, no eigenvalue of $A$ has absolute value greater than $1$.
</details>
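A numerical check of the corollary (a sketch using `numpy` with a randomly generated positive transition matrix):
```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((6, 6))
A /= A.sum(axis=0)                      # positive transition matrix

eigvals = np.linalg.eigvals(A)
print(np.max(np.abs(eigvals)))          # spectral radius: 1, up to rounding error
assert np.all(np.abs(eigvals) <= 1 + 1e-9)
```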
## The limit of distributions
### Statement
If $A$ is a transition matrix with positive entries, i.e. $A_{ij}>0$ for all $i, j$, then
1. $\lim_{k \to \infty} A^k$ exists.
2. All columns of $\lim_{k \to \infty} A^k$ are equal to the same eigenvector $v$ of $A$ corresponding to $1$.
3. This eigenvector $v$ is a probability vector.
<details>
<summary>
Proof:
</summary>
If $A$ is diagonalizable, then by the previous statements we can write
$$
A = Q^{-1 }
\begin{pmatrix} 1 & 0 &\dots & 0\\
0 & \lambda_2 & \dots & 0\\
& &\dots & \\
0 & 0 & \dots & \lambda_n\end{pmatrix}
Q,
$$
where $|\lambda_i|< 1$ for $i=2, \dots, n$. (By the Gershgorin argument applied to $A^T$, every eigenvalue lies in some disc $D(A_{ii}, 1-A_{ii})$ with $A_{ii}>0$, and the only point of such a disc on the unit circle is $1$ itself; since the eigenspace of $1$ is one-dimensional, $1$ appears exactly once on the diagonal.) Hence,
$$
\lim_{k\to \infty}A^k = Q^{-1 }
\begin{pmatrix} 1 & 0 &\dots & 0\\
0 & \lim_{k\to \infty} \lambda_2^k & \dots & 0\\
& &\dots & \\
0 & 0 & \dots & \lim_{k\to \infty} \lambda_n^k\end{pmatrix}
Q = Q^{-1}
\begin{pmatrix} 1 & 0 &\dots & 0\\
0 & 0 & \dots & 0\\
& &\dots & \\
0 & 0 & \dots & 0\end{pmatrix}
Q.
$$
Moreover, the first column of $Q^{-1}$ is an eigenvector of $A$ corresponding to $1$, which we denote by $v$; the first row of $Q$ is (the transpose of) an eigenvector of $A^T$ corresponding to $1$, so after rescaling we may take it to be $u^T = (1, \dots, 1)$. Therefore,
$$\lim_{k\to \infty}A^k =
\begin{pmatrix}
v & v & \dots & v
\end{pmatrix}.$$
If $A$ is not diagonalizable, we need to use the **Jordan normal form**. This is the reason we extended the field to the complex numbers when we proved the Gershgorin circle theorem. Viewing $A$ as a matrix over the complex numbers, we can decompose it into $k$ Jordan blocks as
$$
A = Q^{-1 }
\begin{pmatrix} 1 & & & \\
0 & J_2 & & \\
& &\dots \\
0 & & & J_k\end{pmatrix}
Q,
$$
where $J_i$ is a Jordan block corresponding to an eigenvalue $\lambda_i$, of the form
$$
J_i =
\begin{pmatrix} \lambda_i & 1 & 0 & \dots & 0 \\
0 & \lambda_i & 1 & \dots & 0 \\
& &\dots & \\
0 & & \dots & 0 & \lambda_i\end{pmatrix}
, \quad i=2, \dots, k,
$$
By the same argument as in the diagonalizable case, $|\lambda_i| < 1$ for $i = 2, \dots, k$, and the powers of a Jordan block with $|\lambda_i| < 1$ converge to the zero matrix. After some calculations, we get the same result,
$$\lim_{k\to \infty}A^k =
\begin{pmatrix}
v & v & \dots & v
\end{pmatrix}.$$
Finally, since $\lim_{k\to \infty}A^k$ is the limit of non-negative matrices, $v$ must be non-negative. Moreover, $v$ is the first column of $Q^{-1}$ and $u^T= (1, \dots, 1)$ is the first row of $Q$, so $QQ^{-1} = I$ implies that $u^T v =1$; the entries of $v$ sum to $1$, and $v$ is a probability vector.
</details>
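A numerical sketch of the statement (using `numpy`; the matrix is the made-up example from above): the columns of $A^k$ converge to the probability eigenvector $v$ with $Av = v$.
```python
import numpy as np

A = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])      # positive column-stochastic matrix

# Eigenvector of A for eigenvalue 1, rescaled to be a probability vector.
eigvals, eigvecs = np.linalg.eig(A)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v /= v.sum()

L = np.linalg.matrix_power(A, 200)      # approximates lim_{k -> inf} A^k
print(v)
print(L)                                # every column is (approximately) v
assert np.allclose(L, np.column_stack([v, v, v]))
```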
### Corollary
Clearly, if $A^m$ has all positive entries for some $m$, then $\lim_{k\to \infty}A^k$ still exists even if some entries of $A$ are zero.
Let $L = \lim_{k\to \infty}A^k = (v, \dots, v)$. Given any probability vector $x$, $Lx= v$, since the entries of $x$ sum to $1$.
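A sketch of this corollary with a made-up matrix that has one zero entry but a strictly positive square:
```python
import numpy as np

A = np.array([[0.0, 0.5],
              [1.0, 0.5]])                        # column-stochastic, one zero entry
assert np.all(np.linalg.matrix_power(A, 2) > 0)   # but A^2 has all positive entries

L = np.linalg.matrix_power(A, 100)                # approximates lim A^k = (v, v)
x = np.array([0.3, 0.7])                          # an arbitrary probability vector
print(L @ x)                                      # = v = (1/3, 2/3), independent of x
```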
## Recursive Macroeconomic Theory
In this macroeconomics textbook, Ljungqvist and Sargent use $P$ and $\pi$ to denote the transition matrix and the probability vector, respectively. They define the transition matrix as the transpose of the definition above, so each row of $P$ sums to $1$. Given the probability vector $\pi_t$ at time $t$, the probability vector at time $t+k$ is
$$
\pi_{t+k}' = \pi_t' P^k,
$$
where they write $\pi'$ rather than $\pi^T$ for the transpose.
They also state the following definitions and theorems.
### Stationary distribution
An unconditional distribution is called **stationary** or **invariant** if it satisfies
$$
\pi' P = \pi'.
$$
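In this row-stochastic convention, a stationary distribution is a left eigenvector of $P$ for the eigenvalue $1$, i.e., an eigenvector of $P'$. A minimal `numpy` sketch (the matrix is made up for illustration):
```python
import numpy as np

P = np.array([[0.90, 0.05, 0.05],
              [0.20, 0.70, 0.10],
              [0.10, 0.30, 0.60]])       # row-stochastic: each row sums to 1

# Stationary distribution: pi' P = pi'  <=>  P^T pi = pi.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi /= pi.sum()

print(pi)
print(pi @ P)                            # equals pi: the distribution is invariant
```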
### Asymptotic stationarity
Let $\pi_{\infty}$ be the unique probability vector that satisfies $(I - P')\pi_{\infty}= 0$.
If for all initial distributions $\pi_0$ it is true that $\pi_0' P^k$ converges to the same $\pi_{\infty}'$, we say that the Markov chain is **asymptotically stationary** with a unique invariant distribution $\pi_{\infty}$.
### Theorem 2.2.1.
Let $P$ be a stochastic matrix with $P_{ij} > 0 \quad \forall (i, j)$. Then $P$ has a unique stationary distribution, and the process is asymptotically stationary.
### Theorem 2.2.2.
Let $P$ be a stochastic matrix for which $(P^n)_{ij} > 0 \quad \forall (i, j)$ for some value of $n \ge 1$. Then $P$ has a unique stationary distribution, and the process is asymptotically stationary.
## Reference
Friedberg, Stephen H., Arnold J. Insel, and Lawrence E. Spence. *Linear Algebra*. Prentice-Hall, 2003.
Ljungqvist, Lars, and Thomas J. Sargent. *Recursive Macroeconomic Theory*. MIT Press, 2018.
https://en.wikipedia.org/wiki/Gershgorin_circle_theorem
https://en.wikipedia.org/wiki/Jordan_normal_form