# Basic Markov chain theory

Taken from Chapter 11 of the [online version](https://www.probabilitycourse.com/chapter11/11_2_1_introduction.php) of the textbook "Introduction to Probability, Statistics, and Random Processes" by Hossein Pishro-Nik.

## Discrete-time

### Finite state space

We first consider a finite number of states, i.e. we can use matrices to represent the transition probabilities.

- Chapman-Kolmogorov equation:
$$p_{ij}^{(m+n)}=P(X_{m+n}=j|X_0=i)=\sum_{k \in \text{states}} p_{ik}^{(m)} p_{kj}^{(n)}$$
where $p_{ij}^{(m+n)}$ is the probability of getting from state $i$ to state $j$ in $(m+n)$ steps. This describes how to compute probabilities of moving between states over several steps (a short numerical check appears after the periodicity bullet below).
- Accessibility: we say state $j$ is accessible from $i$ if there is a possibility of getting from $i$ to $j$ after some time.
- Communication: if two states are accessible from each other, they communicate. This is an equivalence relation, meaning we can break up the states into communicating classes where all members of the same class communicate with each other.
- Irreducibility: if all the states in the chain belong to one communicating class, the resulting MC is irreducible. Intuitively, say there are 2 classes and our MC contains states from both; then not all states in our chain communicate with each other, i.e. the chain is reducible. For example, with classes 1 and 2, the chain stays in class 1 for a while and then moves to class 2, from which it is not possible to move back to class 1 (if it were possible, they would be the same class); hence the chain has states from both classes.
- Recurrence and transience. A state can be either recurrent or transient. Recurrent means the state will be visited infinitely often; equivalently, if we leave the state, we will come back to it with probability 1. Transient means the state is visited only finitely often. Say we know that the probability of returning to a transient state $i$ is some $f_{ii} < 1$; then the total number of visits to state $i$ follows Geometric($1-f_{ii}$). (Reminder: a Geometric variable models the number of independent trials it takes until we succeed, given the probability of success.) We can show that within a class, all members are either recurrent or transient; hence the class itself can be labelled as recurrent or transient. A chain in a transient class visits the transient states finitely often with probability 1, and once it leaves the class, it never comes back. For recurrent classes, once the chain enters a recurrent class, it never leaves: if it could leave and come back, then whatever class it left for would actually belong to the same class, since the states can move to and fro; and if it left and never came back, the class would have been a transient class.
- Periodicity. The period of state $i$ is the largest integer $d$ such that whenever $n$ is not divisible by $d$, we have $p_{ii}^{(n)}=0$ (zero probability of returning). In other words, only when $n$ is divisible by $d$ (after some time that is a multiple of $d$) can we possibly return to state $i$. If the period is 1, state $i$ is aperiodic; else, i.e. period > 1, it is periodic. Again, members of the same class have the same period, and the whole chain is aperiodic if all of its states are aperiodic. Note that if $p_{ii} > 0$, i.e. self-transition is possible, then we can possibly return after 1 step, so $d=1$ and the state is aperiodic. Also, if we can return to a state after $m$ steps and after $l$ steps, then $d$ divides $\gcd(l,m)$; in particular, if $l$ and $m$ are co-prime, then $d=1$ (aperiodic) as well.
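As a quick numerical check of the Chapman-Kolmogorov equation, here is a minimal numpy sketch on a made-up two-state chain (the transition matrix values are arbitrary assumptions, not from the textbook): the $(m+n)$-step transition matrix is the product of the $m$-step and $n$-step matrices.

```python
import numpy as np

# Toy 2-state transition matrix (arbitrary values; each row sums to 1).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

m, n = 3, 5

# The k-step transition probabilities are the entries of P^k.
P_m = np.linalg.matrix_power(P, m)
P_n = np.linalg.matrix_power(P, n)
P_mn = np.linalg.matrix_power(P, m + n)

# Chapman-Kolmogorov: P^(m+n) = P^(m) P^(n).
assert np.allclose(P_mn, P_m @ P_n)
print(P_mn)
```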
- Absorption. An absorbing state is one such that, upon visiting it, the chain is stuck at that state forever. For example, our MC may have several transient and recurrent classes. We can think of a recurrent class as a single state and then say that once we enter that class, we cannot escape. We can compute the probability of getting absorbed into some state $l$ starting from state $i$ using the recursive expressions $a_i = \sum_k a_k p_{ik}$. We also know that $a_l = 1$, and $a_j = 0$ if state $j$ is another absorbing state (from which we can never reach $l$).
- Mean hitting times. The expected time taken to visit some set $A$ of states starting from $i$. Again, for the finite case we can write things down recursively: $t_i = 1 + \sum_k t_k p_{ik}$ for $i \notin A$. Of course $t_j=0$ if we start from a state in $A$. The "$1+$" accounts for taking at least one step if you are not already in $A$. (A worked linear-system solve for absorption probabilities and hitting times appears at the end of this subsection.)
- Mean return times. The expected time for a chain starting from state $l$ to return to state $l$ for the first time. The recursive expression is $r_l = 1 + \sum_k t_k p_{lk}$, where $t_k$ is the mean hitting time of state $l$ starting from $k$. Naturally, $t_l=0$, since starting at $l$ means we are already there.
- Limiting distribution. The long-term behaviour of the MC, i.e. the distribution of the states after applying the transition matrix $n$ times and letting $n \to \infty$. For large $n$, we expect our chain to be absorbed into one of the recurrent classes, hence we ignore transient classes when it comes to the limiting behaviour of the chain. However, if there is more than one recurrent class and we start in one of them, we cannot go to the other, i.e. the limiting distribution depends on where we start -- the limiting behaviour is not unique. Similarly, consider a two-state MC which switches states with probability 1, i.e. period 2 (periodic); then the distribution of the state at any time also depends on where we start, e.g. if I start at state $i$ then for all odd $n$ I will be in the other state. In fact the distribution does not converge at all. To have a limiting distribution that is unique (does not depend on the initial distribution), we require our chain to be irreducible and aperiodic. We can find that limiting distribution by solving for the stationary distribution, i.e. solving $\pi P = \pi$. Given the limiting distribution $\pi$ with $\sum_{j \in \text{states}} \pi_j = 1$, the mean return time (expected time between revisits) for state $j$ is $r_j = 1/\pi_j$.
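To make the recursive expressions concrete, here is a minimal numpy sketch on a small gambler's-ruin chain (a toy example of my own, not from the textbook). Restricting $a_i = \sum_k a_k p_{ik}$ and $t_i = 1 + \sum_k t_k p_{ik}$ to the transient states turns them into the linear systems $(I-Q)a = r$ and $(I-Q)t = \mathbf{1}$, where $Q$ is the transient-to-transient block of the transition matrix.

```python
import numpy as np

# Gambler's ruin on {0,1,2,3,4} with a fair coin (toy example).
# States 0 and 4 are absorbing; from 1..3 we move +/-1 with probability 1/2.
P = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

transient = [1, 2, 3]
Q = P[np.ix_(transient, transient)]  # transient-to-transient block
I = np.eye(len(transient))

# Absorption probabilities into state 4: (I - Q) a = r with r_i = p_{i4}.
r = P[np.ix_(transient, [4])].ravel()
a = np.linalg.solve(I - Q, r)
print("absorption probabilities into 4:", a)  # [0.25, 0.5, 0.75]

# Mean hitting times of A = {0, 4}: t = 1 + Q t, i.e. (I - Q) t = 1.
t = np.linalg.solve(I - Q, np.ones(len(transient)))
print("mean hitting times of {0,4}:", t)  # [3.0, 4.0, 3.0]
```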
### Countable state space

- For countably infinite state spaces, e.g. taking the integers as the states, we have two finer notions of recurrence.
- For a state $i$, if $r_i < \infty$ (a finite expected time between revisits), the state is positive recurrent. Else, $r_i = \infty$ means that state $i$ is null recurrent (recurrent in that we will revisit with probability 1, but the expected waiting time between revisits is infinite). All the states in a recurrent class are either positive recurrent or null recurrent. Side-note: all transient classes/states have infinite expected return times.
- Intuition: positive recurrence means that the return time to state $i$ might be an unbounded random variable, but it must have finite expectation $r_i$. Unbounded here means that for any value $a$, the probability of the return time exceeding $a$ is larger than 0. In other words, the random variable can be unbounded but must be integrable.
- Intuition: null recurrence means that the unbounded random variable has infinite expectation. For example, the symmetric random walk on the integer lattice is null recurrent in dimensions 1 and 2, and is transient in dimension 3 and above (proved by George Pólya in 1921).
- Irreducibility and aperiodicity are no longer enough. Given an irreducible and aperiodic MC, we can have any of the following:
  - all states are transient (a stationary distribution does not exist);
  - all states are null recurrent (a stationary distribution does not exist);
  - all states are positive recurrent, in which case we have a limiting distribution that is the unique stationary distribution.
- In practice, given a chain, we try to find the stationary distribution (see the birth-death sketch below). If it exists and is unique, the chain is positive recurrent and the stationary distribution is the limiting distribution. Otherwise, when no solution exists, the chain is either transient or null recurrent, and in both cases we have $$\lim_{n\to \infty} P(X_n=j|X_0=i) = 0 \quad \forall i,j.$$
- Conclusion: for things to work 'nicely' (an MC with a stationary distribution), here we need irreducibility, aperiodicity and positive recurrence.
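To make the trichotomy concrete, consider the random walk on $\{0,1,2,\dots\}$ reflected at 0 (a standard birth-death example, my own choice rather than the textbook's): move up with probability $p$ and down with probability $1-p$, staying put at 0 with probability $1-p$. Solving $\pi P = \pi$ gives $\pi_j \propto (p/(1-p))^j$, which is normalizable exactly when $p < 1/2$; the chain is then positive recurrent with a geometric stationary distribution, while $p = 1/2$ gives null recurrence and $p > 1/2$ transience. The sketch below checks the closed form numerically on a truncated version of the chain.

```python
import numpy as np

# Birth-death walk on {0, 1, 2, ...}, reflected at 0 (assumed toy example):
# from i >= 1 move up with prob p, down with prob 1-p; from 0 move up with
# prob p, stay with prob 1-p. Positive recurrent iff p < 1/2.
p = 0.3
rho = p / (1 - p)

# Truncate the infinite chain to N states to solve pi P = pi numerically.
N = 200
P = np.zeros((N, N))
P[0, 0], P[0, 1] = 1 - p, p
for i in range(1, N - 1):
    P[i, i - 1], P[i, i + 1] = 1 - p, p
P[N - 1, N - 2], P[N - 1, N - 1] = 1 - p, p  # reflect at the truncation edge

# Stationary distribution = left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

# Closed form for the untruncated chain: pi_j = (1 - rho) * rho**j.
closed = (1 - rho) * rho ** np.arange(N)
print(np.max(np.abs(pi - closed)))  # tiny, since rho**N is negligible
```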
## Continuous-time Markov Chains

Generally, we consider a countable state space $S$. Continuous-time means that the duration the chain stays at some state $i$ (the holding time) is itself a continuous random variable. Unsurprisingly, the holding time follows an exponential distribution with rate parameter $\lambda_i$, so as to impart the memorylessness needed for a valid MC. We assume $\lambda_i$ is bounded for all $i \in S$. On average the chain stays at state $i$ for a duration of $1/\lambda_i$ (the mean of the exponential distribution).

- We do not worry about periodicity in CT MCs because the time spent at a state can be any positive real number, so it will not be a multiple of any period. We still care about irreducibility.
- We can think of a CT MC as a discrete MC with a first jump, a second jump and so on, but instead of a unit time step there is now a continuous random time in between. Therefore, a CT MC has 2 parts: first, the jump chain (embedded MC) that gives the transition probabilities between states; second, the holding time parameter for each state.
- Note that if we remain at some state, it is as if we just started there (memorylessness), hence we do not care about self-transitions unless the state is absorbing: if some state $i$ is not absorbing, we take $p_{ii} = 0$. If some state $i$ is absorbing, then $p_{ii} = 1$ and $p_{ij} = 0$ for all $j \neq i$, and the chain spends forever at state $i$, i.e. the holding time is infinite, corresponding to $\lambda_i = 0$.
- Transition probability. Defined in a similar way as in discrete time, except there is now a time parameter: $p_{ij}(t)=P(X(t)=j|X(0)=i)$. As usual, every row of $P(t)$ must sum to 1. Also, since $p_{ii}(0) = 1$ and $p_{ij}(0) = 0$ for $i \neq j$, we see that $P(0)$ is the identity. Please do not confuse this with the jump chain transition matrix: the jump chain tells us what to do upon a transition, while the de facto transition probability tells us the probability of starting and ending at particular states after some time. This can be tricky, I'm still a bit new to this idea.
- Chapman-Kolmogorov equation. For all times $s$ and $t$, we have $P(s+t) = P(s)P(t)$.
- Stationary distribution: $\pi P(t) = \pi$ for all $t \geq 0$, where $\sum_j \pi_j = 1$.
- Limiting distribution: $\pi_j = \lim_{t \to \infty} P(X(t)=j|X(0)=i)$ for any starting state $i$, and it sums to 1.
- For "nice" chains, we will find a unique stationary distribution which equals the limiting distribution.

So it seems we have two methods to find the limiting/stationary distribution, but in practice the CT MC transition probabilities are hard to compute, so we use the jump chain instead (a numerical sketch of both routes follows this list).

- Take the transition matrix of the jump chain (the embedded discrete MC) and find its stationary distribution $\tilde \pi$. Since the CT MC spends on average $1/\lambda_i$ at each state $i$, the corresponding continuous-time limiting distribution is
$$\pi_j = \frac{\tilde\pi_j/\lambda_j}{\sum_k \tilde\pi_k/\lambda_k}$$
where the denominator is the normalizing constant.
- Generator. The generator matrix $G$ has diagonal entries $g_{ii} = -\lambda_i$ and off-diagonal entries $g_{ij} = \lambda_i p_{ij}$, where $p_{ij}$ is the (constant) transition probability from the jump chain. The rows of $G$ sum to 0.
- We can relate $G$ to the CT MC transition probabilities at a very small time $\delta$ using the approximations $$p_{jj}(\delta) \approx 1 + g_{jj} \delta$$ and $$p_{kj}(\delta) \approx g_{kj}\delta \quad \forall k \neq j.$$
- Forward and backward equations: $dP(t)/dt = P(t)G$ (forward) and $dP(t)/dt = GP(t)$ (backward).
- Using the approximations and the forward/backward equations, we can show that a $\pi$ satisfying $\pi G = 0$ also solves $\pi P(t) = \pi$. So we can use $G$ to find the stationary distribution.
- If we draw the CT MC diagram using the values of $G$, it is called a transition rate diagram.
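Here is a minimal numpy sketch (the rates and jump probabilities are arbitrary assumptions of mine) that builds the generator $G$ from a jump chain and holding rates, then finds the stationary distribution two ways: by solving $\pi G = 0$ directly, and by reweighting the jump chain's stationary distribution $\tilde\pi$ with the mean holding times $1/\lambda_i$. The two answers agree.

```python
import numpy as np

# Toy 3-state CT MC (assumed values): jump chain probabilities (zero
# diagonal, i.e. no self-transitions) and holding rates lambda_i.
Pjump = np.array([
    [0.0, 0.7, 0.3],
    [0.5, 0.0, 0.5],
    [0.4, 0.6, 0.0],
])
lam = np.array([2.0, 1.0, 3.0])

# Generator: g_ii = -lambda_i, g_ij = lambda_i * p_ij; rows sum to 0.
G = lam[:, None] * Pjump - np.diag(lam)

# Method 1: solve pi G = 0 with sum(pi) = 1 by replacing the last
# (redundant) equation with the normalization constraint.
A = np.vstack([G.T[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)

# Method 2: stationary distribution of the jump chain, reweighted by the
# mean holding times 1/lambda_i and renormalized.
vals, vecs = np.linalg.eig(Pjump.T)
tilde = np.real(vecs[:, np.argmax(np.real(vals))])
tilde = tilde / tilde.sum()
pi2 = (tilde / lam) / (tilde / lam).sum()

print(pi)
assert np.allclose(pi, pi2)  # both routes give the same distribution
```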