# Absorbing Markov Chains

We consider a finite Markov chain with $r$ transient states and $s$ absorbing states. Namely, we consider a transition matrix of the form
$$P = \begin{bmatrix} Q & R \\ O_{s\times r} & I_s\end{bmatrix},$$
where
- $Q$ is the $r\times r$ matrix of transition probabilities among the transient states,
- $R$ is the $r\times s$ matrix of transition probabilities from the transient states to the absorbing states,
- $O_{s\times r}$ is the zero matrix of size $s\times r$ and $I_s$ is the $s\times s$ identity matrix.

Block multiplication gives the $k$-step transition matrix
$$P^k = \begin{bmatrix} Q^k & \left(\sum_{j=0}^{k-1} Q^j\right) R \\ O_{s\times r} & I_s\end{bmatrix}.$$

We say that $P$ is an absorbing chain if every transient state can reach some absorbing state in a finite number of steps. Notice that each entry of $\left(\sum_{j=0}^{k-1} Q^j\right) R$ is non-decreasing in $k$, and since each of the first $r$ rows of $P^k$ sums to $1$, we know that $P$ is absorbing if and only if there exists some $K\in\mathbb{N}$ such that every row sum of $Q^K$ is less than $1$. Using the [row-sum upper bound on the spectral radius](https://hackmd.io/@sueiwenchen/on-spectral-radius), we know that the spectral radius of $Q^K$ (and hence of $Q$, since $\rho(Q^K)=\rho(Q)^K$) is less than $1$. This means that $I_r-Q$ is invertible, $(I_r-Q)^{-1}=I_r + Q + Q^2 + \cdots$, and $\lim_{k\to\infty} Q^k=0$. In summary, the limiting behavior of the chain $P$ is given by
$$ \lim_{k\to\infty} P^k = \begin{bmatrix} O_r & (I_r- Q)^{-1} R \\ O_{s\times r} & I_s\end{bmatrix}.$$

We are interested in the random time of absorption $T_i$ when starting at the transient state $i$. For transient states $i$ and $j$, let $N_{ij}$ be the number of visits to $j$ given that the chain starts at $i$, that is, $N_{ij}=\sum_{k=0}^\infty \mathbb{1}_{X_k=j}$ conditioned on $X_0=i$. Hence (by the monotone convergence theorem)
$$\mathbb{E}[N_{ij}]=\sum_{k=0}^\infty \mathbb{P}(X_k=j\mid X_0=i) = \sum_{k=0}^\infty P^k_{ij} = \sum_{k=0}^\infty Q^k_{ij}.$$
Therefore the $(i,j)$ entry of $(I_r- Q)^{-1}$ is the expected number of visits to $j$ given that the chain started at $i$. In particular, since $T_i=\sum_{j} N_{ij}$, the expected number of steps before absorption given that the chain starts at $i$ is the $i$-th row sum of the matrix $(I_r- Q)^{-1}$. In vector notation, let $T=[T_1 \cdots T_r]^t$ and let $\mathbf{1}$ be the $r$-dimensional vector consisting of $1$'s. Then
$$\mathbb{E}[T] = (I_r- Q)^{-1}\mathbf{1}.$$
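As a quick sanity check, here is a minimal NumPy sketch (mine, not one of the notebooks linked below) that assembles $Q$ and $R$ for a fair gambler's-ruin walk on $\{0,1,\dots,5\}$ and computes the fundamental matrix $(I_r-Q)^{-1}$ together with $\mathbb{E}[T]$; the choice of chain and all variable names are illustrative.

```python
import numpy as np

# Fair gambler's ruin on {0, 1, ..., 5}: positions 1..4 are transient,
# 0 and 5 are absorbing; from position i the walk moves to i +/- 1 w.p. 1/2.
r, p = 4, 0.5
Q = np.zeros((r, r))                  # transient -> transient block
for i in range(r - 1):
    Q[i, i + 1] = p                   # one step up
    Q[i + 1, i] = 1 - p               # one step down
R = np.zeros((r, 2))                  # transient -> absorbing block
R[0, 0] = 1 - p                       # position 1 -> ruin at 0
R[-1, 1] = p                          # position 4 -> win at 5

N = np.linalg.inv(np.eye(r) - Q)      # fundamental matrix (I_r - Q)^{-1}
ET = N @ np.ones(r)                   # expected absorption times E[T_i]
B = N @ R                             # limiting absorption probabilities (I_r - Q)^{-1} R

print(ET)             # [4. 6. 6. 4.], i.e. i * (5 - i) for the fair walk
print(B.sum(axis=1))  # all ones: absorption is certain
```

For instance, `B[0]` recovers the classical ruin probabilities $(4/5,\ 1/5)$ when starting one step above $0$.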
## Higher moments of $N_{ij}$

Note that by the law of total expectation, conditioning on the first step $X_1=k$ (and using $N_{kj}=0$ whenever $k$ is absorbing),
$$\begin{align} \mathbb{E}[N_{ij}^2] & = \sum_{k} P_{ik} \mathbb{E}[N_{ij}^2 \mid X_1 = k] = \sum_{k} P_{ik} \mathbb{E}[(\delta_{ij}+N_{kj})^2 ]\\ & = \delta_{ij} + 2\sum_{k=1}^r P_{ik} \mathbb{E}[N_{kj} ]\delta_{ij} + \sum_{k=1}^r P_{ik} \mathbb{E}[N_{kj}^2]. \end{align}$$

For a matrix $A$, let $A_{dg}$ be the matrix with the same diagonal entries as $A$ and zeros elsewhere. Also let $M^{(l)}:=(\mathbb{E}[N_{ij}^l])_{1\leq i,j\leq r}$. With these matrix notations we have $M^{(1)}=(I_r-Q)^{-1}$ and
$$M^{(2)} = I_r + 2(QM^{(1)})_{dg} + QM^{(2)} \implies M^{(2)} = M^{(1)}\left(I_r+2(Q M^{(1)})_{dg}\right).$$
The same argument gives the recursive relation
$$M^{(k)} = M^{(1)}\left(I_r+\sum_{l=1}^{k-1}\binom{k}{l} (Q M^{(l)})_{dg} \right).$$
Noting that $(AB_{dg})_{dg} = A_{dg} B_{dg}$ and setting $\Phi^{(k)}:=(QM^{(k)})_{dg}$, the equation above implies
$$\Phi^{(k)} = \Phi^{(1)} + \sum_{l=1}^{k-1} \binom{k}{l} \Phi^{(1)}\Phi^{(l)}.$$
Solving this recursion yields the following formula for $k\geq 2$:
$$\Phi^{(k)} = \Phi^{(1)} \left[I_r + \sum_{l=1}^{k-1} \sum_{1\leq n_1<\cdots<n_l < k} \binom{k}{n_l} \prod_{j=2}^l \binom{n_j}{n_{j-1}} {\Phi^{(1)}}^l\right].$$
Substituting this back into the recursive relation for $M^{(k)}$ gives
$$M^{(k)} = M^{(1)} \left[I_r + \sum_{l=1}^{k-1} \sum_{1\leq n_1<\cdots<n_l < k} \binom{k}{n_l} \prod_{j=2}^l \binom{n_j}{n_{j-1}} {\Phi^{(1)}}^l\right].$$

## Higher moments of $T_i$

Let $t^{(n)}$ be the vector consisting of the $n$-th moments of the $T_i$'s (so that $t^{(1)}=\mathbb{E}[T]=M^{(1)}\mathbf{1}$). Conditioning on the first step again,
$$\mathbb{E}[T_i^n] = \sum_{k=1}^{r+s} P_{ik}\mathbb{E}[(1+T_k)^n].$$
Noting that $T_k=0$ for any absorbing state $k$, we have
$$t^{(n)} = (I_r-Q)^{-1}\left[ \mathbf{1} + \sum_{k=1}^{n-1} \binom{n}{k} Qt^{(k)}\right]= t^{(1)} + \sum_{k=1}^{n-1} \binom{n}{k} (I_r-Q)^{-1}Qt^{(k)}.$$
Setting $A:=Q(I_r-Q)^{-1}$ gives
$$t^{(n)} = t^{(1)} + \sum_{k=1}^{n-1}\binom{n}{k}At^{(k)}$$
and therefore
$$t^{(k)} = \left[I_r + \sum_{l=1}^{k-1} \sum_{1\leq n_1<\cdots<n_l < k} \binom{k}{n_l} \prod_{j=2}^l \binom{n_j}{n_{j-1}} A^l\right] t^{(1)}.$$

## What is that combinatorial term?

The term
$$C_{k,l}:=\sum_{1\leq n_1<\cdots<n_l < k} \binom{k}{n_l} \prod_{j=2}^l \binom{n_j}{n_{j-1}}, \quad l=1,\dots,k-1,$$
has a natural combinatorial interpretation. For illustration take $k=4$. Then
$$C_{4,1} = \binom{4}{3}+ \binom{4}{2}+ \binom{4}{1},\quad C_{4,2} = \binom{4}{3}\binom{3}{2}+ \binom{4}{3}\binom{3}{1}+ \binom{4}{2}\binom{2}{1},\quad C_{4,3} =\binom{4}{3}\binom{3}{2}\binom{2}{1}.$$
Thus $C_{k,l}$ is the number of ways to iteratively select a subset from a set of size $k$ in exactly $l$ steps, under the constraint that at each step the selected subset must be a proper, nonempty subset of the previous one. Note that the product almost telescopes into a multinomial coefficient, except that consecutive sizes here must differ by at least $1$. Nonetheless, one can show using multinomial coefficients that
$$\sum_{j=0}^l \binom{l+1}{j+1} C_{k,j} = \sum_{j=0}^l \binom{l+1}{j} C_{k,l-j} = (l+1)^k \quad\text{for } k\geq 1,\ 0\leq l \leq k-1,$$
where $C_{k,0}:=1$. From here one can obtain another expression
$$C_{k,l} = (l+1)^k + \sum_{m=1}^l \left[ \sum_{\substack{j\geq 0 \text{ s.t.}\\m=n_0<\cdots < n_j < n_{j+1}=l+1}} (-1)^{j+1} \prod_{i=0}^j\binom{n_{i+1}}{n_i}\right]m^k,$$
where it is implicit that when $j=0$ the innermost sum gives the term $-\binom{l+1}{m}$. You can find the code that computes the term $C_{k,l}$ [here](https://github.com/SueiWenChen/Absorbing-time-of-markov-chains/blob/main/Ckl_computation.ipynb). The complexity of this term suggests that this is probably not the right way to approach the problem.

## Tail probabilities

Even though we managed to obtain a formula for the moments of the absorption time, the combinatorial term is hard to analyze. Instead we take a more direct and natural point of view. Let $T_\tau$ denote the random time of absorption when the initial (transient) state is distributed according to the row vector $\tau$, so that $T_i$ corresponds to $\tau = e_i^t$. Then $\mathbb{P}(T_\tau > k) = \tau Q^k \mathbf{1}$, and hence the cdf of $T_\tau$ is $F_{T_\tau}(k) = 1-\tau Q^k \mathbf{1}$ for $k\geq 0$. Thus the probability mass function is
$$\mathbb{P}(T_\tau=k) = F_{T_\tau}(k)-F_{T_\tau}(k-1) = \tau Q^{k-1}(I_r-Q)\mathbf{1}_r = \tau Q^{k-1}R\mathbf{1}_s$$
for $k\geq 1$, where the last equality holds because the rows of $\begin{bmatrix} Q & R\end{bmatrix}$ sum to $1$. In particular, we have
$$\mathbb{P}(T_i > k) =e_i^tQ^k \mathbf{1}$$
and
$$t^{(n)}_i = \mathbb{E}[T_i^n] = \sum_{k=1}^\infty k^n\, \mathbb{P}(T_i = k) = e_i^t \left( \sum_{k=0}^\infty (k+1)^n Q^k\right)(I_r-Q)\mathbf{1},$$
i.e.
$$t^{(n)} = \left( \sum_{k=0}^\infty (k+1)^n Q^k\right)(I_r-Q)\mathbf{1}.$$
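The identities above are easy to test numerically. Below is a short sketch (again mine, independent of the author's notebooks; the helper names `moments`, `moments_tail`, `C` and the truncation level `K` are ad hoc choices) that, on the gambler's-ruin chain from the first snippet, computes $t^{(2)}$ and $t^{(3)}$ both by the recursion $t^{(n)} = t^{(1)} + \sum_{k=1}^{n-1}\binom{n}{k}At^{(k)}$ and by the truncated tail sum, and brute-forces $C_{k,l}$ to check the binomial identity.

```python
import itertools
import math

import numpy as np

# Reuse the fair gambler's-ruin chain from the earlier snippet.
r, p = 4, 0.5
Q = np.zeros((r, r))
for i in range(r - 1):
    Q[i, i + 1], Q[i + 1, i] = p, 1 - p
M1 = np.linalg.inv(np.eye(r) - Q)       # M^{(1)} = (I_r - Q)^{-1}
A = Q @ M1                              # A = Q (I_r - Q)^{-1}
t1 = M1 @ np.ones(r)                    # t^{(1)} = E[T]

def moments(n_max):
    """t^{(n)} via the recursion t^{(n)} = t^{(1)} + sum_k binom(n, k) A t^{(k)}."""
    t = [None, t1]                      # index 0 unused
    for n in range(2, n_max + 1):
        t.append(t1 + sum(math.comb(n, k) * (A @ t[k]) for k in range(1, n)))
    return t

def moments_tail(n_max, K=500):
    """t^{(n)} via sum_{k>=0} (k+1)^n Q^k (I_r - Q) 1, truncated at K terms
    (Q^k decays geometrically, so the truncation error is negligible here)."""
    pmf = np.zeros((K, r))              # pmf[k - 1, i] = P(T_i = k)
    Qk = np.eye(r)
    for k in range(K):
        pmf[k] = Qk @ (np.eye(r) - Q) @ np.ones(r)
        Qk = Qk @ Q
    ks = np.arange(1, K + 1, dtype=float)
    return [None] + [(ks[:, None] ** n * pmf).sum(axis=0) for n in range(1, n_max + 1)]

def C(k, l):
    """Brute-force C_{k,l} straight from its definition; C_{k,0} = 1 by convention."""
    if l == 0:
        return 1
    total = 0
    for chain in itertools.combinations(range(1, k), l):   # 1 <= n_1 < ... < n_l < k
        term = math.comb(k, chain[-1])                     # binom(k, n_l)
        for a, b in zip(chain, chain[1:]):
            term *= math.comb(b, a)                        # binom(n_j, n_{j-1})
        total += term
    return total

t_rec, t_tail = moments(3), moments_tail(3)
print(np.allclose(t_rec[2], t_tail[2]), np.allclose(t_rec[3], t_tail[3]))   # True True
k = 6
print(all(sum(math.comb(l + 1, j + 1) * C(k, j) for j in range(l + 1)) == (l + 1) ** k
          for l in range(k)))                                               # True
```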
## Simulation

See [here](https://github.com/SueiWenChen/Absorbing-time-of-markov-chains/blob/main/absorbing_mc.ipynb) for a simple simulation that generates trajectories of a simple random walk on a graph with absorbing states.

## Applications

The moment formula may be useful in analyzing the convergence of Monte Carlo algorithms in episodic Markov decision processes where the absorption time is random. There are two graphs naturally associated with a Monte Carlo MDP control problem, namely the graph on the state space and the graph on the [policy space](https://hackmd.io/@sueiwenchen/graph-perspective-on-gpi). If we want to adopt undiscounted rewards, there must be some restrictions on the transition dynamics for the return to be finite. Examples of such assumptions are stochastic feed-forward (SFF) and optimal policy feed-forward (OPFF) [1], which induce topological orderings on the graph of the state space. Under SFF, the transition matrix $P=P^\pi$ induced by any policy $\pi$ is upper triangular, so the associated $Q$ is strictly upper triangular and the powers $Q^l$ have receding diagonals; in particular $Q^r = O_r$, so absorption occurs within $r$ steps.

# References

[1] [Dong, Zixuan, Che Wang, and Keith Ross. "On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs." arXiv preprint arXiv:2209.02864 (2022).](https://arxiv.org/abs/2209.02864)