Periodic Markov-chain example

This short note (and a corresponding colab notebook) is about an example of a Markov chain whose state distribution does not converge to a stationary distribution.

These are probably irrelevant technicalities for reinforcement learning, but they are nevertheless an interesting topic to understand.

Consider an integer $k$ and the homogeneous Markov chain $S_t$ such that the transition probabilities are:

$$P(S_{t+1} = n + 1 \bmod 2k \mid S_t = n) = P(S_{t+1} = n - 1 \bmod 2k \mid S_t = n) = \frac{1}{2}$$

In other words, the state $S_t$ is a counter, and at each step we either increase or decrease it by $1$, each with probability $0.5$. I added the $\bmod 2k$ bit so that the counter always stays finite, bounded between $0$ and $2k - 1$.
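To make this concrete, here is a minimal NumPy sketch of the transition matrix (the helper name `counter_chain` and the choice $k = 3$ are mine, not from the note):

```python
import numpy as np

def counter_chain(k):
    """Transition matrix of the counter chain on states 0, ..., 2k - 1:
    from state n we move to (n + 1) mod 2k or (n - 1) mod 2k,
    each with probability 1/2."""
    n_states = 2 * k
    P = np.zeros((n_states, n_states))
    for n in range(n_states):
        P[n, (n + 1) % n_states] += 0.5
        P[n, (n - 1) % n_states] += 0.5
    return P

P = counter_chain(k=3)                   # 6 states: 0, ..., 5
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a distribution
```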

The problem with this Markov chain, from the perspective of converging to a stationary distribution, is that the states are periodic: an even state can only be reached from another even state after an even number of steps. Therefore, the period of every state is $2$. Because the states are periodic, the Markov chain is not ergodic. See Wikipedia for a definition of periodicity of Markov chains.

Ergodicity

A Markov chain is ergodic if

  • all states are reachable from one another with positive probability (irreducibility),
  • all states are aperiodic, i.e. have a period of 1, and
  • all states are recurrent, i.e. starting from any state, we return to it with probability 1 (not merely with non-zero probability).
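For a finite chain, the first two conditions together are equivalent to the transition matrix being primitive, i.e. some power of it having all entries strictly positive, and recurrence then follows automatically. Here is a rough numerical check along those lines, reusing `counter_chain` from the sketch above (the function name and the use of Wielandt's bound are my choices, not from the note):

```python
import numpy as np

def is_ergodic(P):
    """Check irreducibility + aperiodicity of a finite chain by testing
    whether some power of P is strictly positive (primitivity).
    Wielandt's bound: it suffices to check powers up to (n - 1)**2 + 1."""
    n = P.shape[0]
    Q = np.eye(n)
    for _ in range((n - 1) ** 2 + 1):
        Q = Q @ P
        if np.all(Q > 0):
            return True
    return False

print(is_ergodic(counter_chain(3)))   # False: every state has period 2
```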

An ergodic Markov chain has a stationary distribution, and the state distribution converges to this stationary distribution irrespective of the starting distribution. Let's denote the stationary distribution by $\mu(s)$ as in the paper. Then we have that

$$\mu(s) = \lim_{t \to \infty} P(S_t = s \mid S_0 = s_0)$$

And $\mu(s)$ does not depend on $s_0$. Moreover, it is also true that if $\nu(s_0)$ is an initial state distribution, then

$$\mu(s) = \lim_{t \to \infty} \sum_{s_0} P(S_t = s \mid S_0 = s_0)\, \nu(s_0)$$
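As a hedged numerical illustration: a "lazy" variant of the counter chain (stay put with probability 1/2, otherwise step up or down) is aperiodic and hence ergodic, and its state distribution converges to uniform regardless of the start. The lazy construction is my addition, not from the note:

```python
import numpy as np

k = 3
n = 2 * k
# Lazy counter chain: the self-loops break the period-2 structure,
# making every state aperiodic, so the chain is ergodic.
P_lazy = 0.5 * np.eye(n) + 0.5 * counter_chain(k)

dist = np.zeros(n)
dist[0] = 1.0                  # start deterministically in state 0
for _ in range(200):
    dist = dist @ P_lazy
print(dist)                    # ~ uniform: every entry close to 1 / (2k)
```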

Back to our non-ergodic example

However, if the Markov chain is non-ergodic, this convergence won't happen. For example, in the Markov chain we defined above, the one with period 2, the limit

$$\lim_{t \to \infty} P(S_t = s \mid S_0 = s_0)$$

simply does not exist. The sequence of state distributions doesn't converge; it keeps oscillating between a uniform distribution over the odd states and a uniform distribution over the even states.
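This oscillation is easy to see numerically (a small sketch using the `counter_chain` helper from above):

```python
import numpy as np

k = 3
P = counter_chain(k)
dist = np.zeros(2 * k)
dist[0] = 1.0                  # start in state 0 (an even state)
for _ in range(1000):          # 1000 steps: an even number
    dist = dist @ P
print(dist)                    # ~ uniform over even states, zeros on odd
print(dist @ P)                # one more step: ~ uniform over odd states
```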

However, consider the initial distribution $\nu(s_0 = 0) = \nu(s_0 = 1) = 0.5$. Now, starting from this initial distribution, the state distribution does converge:

$$\mu(s) = \lim_{t \to \infty} \sum_{s_0} P(S_t = s \mid S_0 = s_0)\, \nu(s_0) = \frac{1}{2k}$$
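Numerically, with the same helper, the mixed start washes out the parity oscillation:

```python
import numpy as np

k = 3
P = counter_chain(k)
dist = np.zeros(2 * k)
dist[0] = dist[1] = 0.5        # nu: equal mass on states 0 and 1
for _ in range(1000):
    dist = dist @ P
print(dist)                    # every entry ~ 1 / (2k); no oscillation
```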

Let's take the definition of the discounted stationary distribution, which isn't really a thing in Markov chains, but we use it in RL apparently:

$$\mu_\gamma(s \mid s_0) = \sum_{t=0}^{\infty} \gamma^t P(S_t = s \mid S_0 = s_0)$$

This sum exists even for non-ergodic Markov chains: each term is bounded by $\gamma^t$, so for $\gamma < 1$ the series converges absolutely by comparison with the geometric series $\sum_t \gamma^t = \frac{1}{1 - \gamma}$. However, this discounted stationary distribution-ish thing can depend on the initial state $s_0$.

I'm also not sure that this discounted distribution is ever independent of the initial condition $s_0$, even for ergodic Markov chains. Intuitively, it won't be: the early time steps carry the largest weights $\gamma^t$, and those are exactly the steps that still remember $s_0$.
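The discounted sum has a closed form via the Neumann series $\sum_{t \ge 0} (\gamma P)^t = (I - \gamma P)^{-1}$, which makes it easy to compute and to inspect its dependence on $s_0$. A sketch (I normalize by $1 - \gamma$ so that each row sums to 1; the note's definition omits this factor):

```python
import numpy as np

def discounted_distribution(P, gamma):
    """Row s0 holds mu_gamma(. | s0) = (1 - gamma) * sum_t gamma^t P^t,
    computed in closed form as (1 - gamma) * (I - gamma * P)^{-1}."""
    n = P.shape[0]
    return (1 - gamma) * np.linalg.inv(np.eye(n) - gamma * P)

mu = discounted_distribution(counter_chain(3), gamma=0.9)
print(np.allclose(mu.sum(axis=1), 1.0))   # True: rows are distributions
print(mu[0])                              # depends on s0 = 0 ...
print(mu[1])                              # ... and differs for s0 = 1
```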

Colab notebook

Here's a colab notebook illustrating the limiting behaviour of this Markov chain.

Homework:

Calculate the discounted stationary distribution as a function of $s_0$ and $\gamma$.