# Periodic Markov-chain example
This short note (and a corresponding [colab notebook](https://colab.research.google.com/drive/1VNrX571McG6EVBzI62p_h7HUXpbv48w4?usp=sharing)) is about an example of a Markov chain whose state distribution does not converge to a stationary distribution.
**These are probably irrelevant technicalities for reinforcement learning, but an interesting topic to understand nevertheless.**
Consider a positive integer $k$ and the homogeneous Markov chain $S_{t}$ on the states $\{0, 1, \dots, 2k-1\}$ with transition probabilities:
$$
\mathbb{P}(S_{t+1} = (n+1) \bmod 2k \vert S_t = n) = \mathbb{P}(S_{t+1} = (n-1) \bmod 2k \vert S_t = n) = \frac{1}{2}
$$
In other words, the state $S_t$ is a counter that we either increase or decrease by $1$, each with probability $0.5$. I added the $\mod 2k$ bit so that the counter always stays finite, bounded between $0$ and $2k-1$.
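To make this concrete, here is a minimal sketch of the transition matrix of this chain in numpy (with $k=3$ chosen arbitrarily for illustration):

```python
import numpy as np

k = 3                    # arbitrary choice for illustration
n_states = 2 * k         # the chain lives on {0, ..., 2k-1}

# From state n we move to (n+1) mod 2k or (n-1) mod 2k, each with probability 1/2.
P = np.zeros((n_states, n_states))
for n in range(n_states):
    P[n, (n + 1) % n_states] = 0.5
    P[n, (n - 1) % n_states] = 0.5

assert np.allclose(P.sum(axis=1), 1.0)   # each row is a probability distribution
```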
The problem with this Markov-chain, from the perspective of converging to a stationary distribution, is that its states are periodic: an even state can only be reached from another even state after an even number of steps, so every state has period $2$. Because the states are periodic, the Markov-chain is not ergodic. See [Wikipedia](https://en.wikipedia.org/wiki/Markov_chain#Properties) for the definition of periodicity of Markov chains.
## Ergodicity
A Markov-chain is called ergodic if
* all states are reachable from one another with positive probability (irreducibility),
* all states are aperiodic, i.e. have a period of $1$, and
* all states are recurrent, i.e. starting from a state we return to it with probability $1$ (and, for positive recurrence, in finite expected time).
An ergodic Markov chain has a stationary distribution, and the state distribution converges to this stationary distribution *irrespective of the starting distribution*. Let's denote the stationary distribution by $\mu(s)$ as in the paper. Then we have that
$$
\mu(s) = \lim_{t\rightarrow \infty} \mathbb{P}(S_t = s\vert S_0 = s_0)
$$
And $\mu(s)$ does not depend on $s_0$. Moreover, if $\nu(s_0)$ is any initial state distribution, then
$$
\mu(s) = \lim_{t\rightarrow \infty}\sum_{s_0} \mathbb{P}(S_t = s\vert S_0 = s_0)\nu(s_0)
$$
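As a quick illustration of this convergence (not part of the original example, just a hedged sketch): a *lazy* version of the same walk, which stays put with probability $1/2$, has a self-loop in every state, hence is aperiodic and ergodic, and its state distribution converges to uniform from any starting state.

```python
import numpy as np

k, n_states = 3, 6
# Lazy walk: stay with prob 1/2, otherwise step up or down by 1 (mod 2k).
P_lazy = np.zeros((n_states, n_states))
for n in range(n_states):
    P_lazy[n, n] += 0.5
    P_lazy[n, (n + 1) % n_states] += 0.25
    P_lazy[n, (n - 1) % n_states] += 0.25

dist = np.zeros(n_states)
dist[0] = 1.0                 # start deterministically in state 0
for _ in range(200):
    dist = dist @ P_lazy
print(dist)                   # ~ [1/6, 1/6, ...]: uniform, independent of the start state
```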
## Back to our non-ergodic example
However, if the Markov-chain is non-ergodic, this convergence need not happen. For example, in the Markov chain we defined above, the one with period $2$, the limit
$$
\lim_{t\rightarrow \infty} \mathbb{P}(S_t = s\vert S_0 = s_0)
$$
simply does not exist. This sequence of probability distributions doesn't converge; it keeps oscillating between a uniform distribution over the odd states and a uniform distribution over the even states.
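A quick numerical sketch of this oscillation (again with the hypothetical choice $k=3$, starting from state $0$):

```python
import numpy as np

k, n_states = 3, 6
P = np.zeros((n_states, n_states))
for n in range(n_states):
    P[n, (n + 1) % n_states] = 0.5
    P[n, (n - 1) % n_states] = 0.5

dist = np.zeros(n_states)
dist[0] = 1.0                 # start in state 0
for _ in range(200):
    dist = dist @ P

# After an even number of steps all mass sits on the even states,
# after one more step it sits on the odd states -- no single limit exists.
print(dist)                   # t = 200: mass only on even states, ~1/3 each
print(dist @ P)               # t = 201: mass only on odd states, ~1/3 each
```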
However, consider the initial distribution $\nu(s_0=0) = \nu(s_0=1) = 0.5$. Now, starting from this initial distribution, the state distribution does converge.
$$
\mu(s) = \lim_{t\rightarrow \infty}\sum_{s_0} \mathbb{P}(S_t = s\vert S_0 = s_0)\nu(s_0) = \frac{1}{2k}
$$
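A small sketch verifying this numerically (same hypothetical $k=3$ setup as above): mixing one even and one odd starting state washes out the oscillation, and the state distribution settles at $1/(2k)$ everywhere.

```python
import numpy as np

k, n_states = 3, 6
P = np.zeros((n_states, n_states))
for n in range(n_states):
    P[n, (n + 1) % n_states] = 0.5
    P[n, (n - 1) % n_states] = 0.5

# Initial distribution split evenly between states 0 and 1 (one even, one odd).
dist = np.zeros(n_states)
dist[0] = dist[1] = 0.5
for _ in range(500):
    dist = dist @ P
print(dist)                   # every entry close to 1/(2k) = 1/6
```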
Let's take the definition of the *discounted* stationary distribution, which isn't really a thing in Markov chains, but we apparently use it in RL:
$$
\mu_\gamma(s\vert s_0) = \sum_{t=0}^\infty \gamma^t \mathbb{P}(S_t = s\vert S_0 = s_0)
$$
This sum converges even for non-ergodic Markov chains: each term is bounded by $\gamma^t$, so for $0 \le \gamma < 1$ the series is dominated by a convergent geometric series. (Note that, as written, it sums to $\frac{1}{1-\gamma}$ rather than $1$; multiplying by $(1-\gamma)$ turns it into a probability distribution.) However, this *discounted* stationary distribution-ish thing can depend on the initial state $s_0$.
I'm also not sure that this discounted distribution is ever independent of the initial condition $s_0$, even for ergodic Markov chains. Intuitively, it won't be: the $t=0$ term alone places all of its weight on $s_0$, so states at and near $s_0$ always receive extra mass.
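Here is a minimal sketch of computing this discounted distribution numerically, using the geometric-series identity $\sum_t \gamma^t P^t = (I - \gamma P)^{-1}$ and normalising by $(1-\gamma)$ so the result sums to one (the normalisation is my addition, not part of the definition above):

```python
import numpy as np

k, n_states, gamma = 3, 6, 0.9
P = np.zeros((n_states, n_states))
for n in range(n_states):
    P[n, (n + 1) % n_states] = 0.5
    P[n, (n - 1) % n_states] = 0.5

def discounted_distribution(s0: int) -> np.ndarray:
    """(1 - gamma) * sum_t gamma^t P(S_t = . | S_0 = s0), via (I - gamma P)^{-1}."""
    e = np.zeros(n_states)
    e[s0] = 1.0
    return (1 - gamma) * e @ np.linalg.inv(np.eye(n_states) - gamma * P)

print(discounted_distribution(0))   # extra mass at and around state 0
print(discounted_distribution(3))   # a different distribution when starting from state 3
```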
### Colab notebook
Here's a [colab notebook](https://colab.research.google.com/drive/1VNrX571McG6EVBzI62p_h7HUXpbv48w4?usp=sharing) illustrating the limiting behaviour of this Markov-chain.
### Homework:
Calculate the discounted stationary distribution as a function of $s_0$ and $\gamma$.