This is a short note (and a corresponding colab notebook) about an example of a Markov chain whose state distribution does not converge to a stationary distribution.
This is probably an irrelevant technicality for reinforcement learning, but an interesting topic to understand nevertheless.
Consider an integer $N$ and the homogeneous Markov chain $(X_t)_{t \ge 0}$ on the states $\{0, 1, \dots, 2N-1\}$ such that the transition probabilities are:

$$P\big(X_{t+1} = (x+1) \bmod 2N \mid X_t = x\big) = P\big(X_{t+1} = (x-1) \bmod 2N \mid X_t = x\big) = \frac{1}{2}.$$
In other words, the state is a counter, and we either increase or decrease the counter by $1$, each with probability $\frac{1}{2}$. I added the $\bmod\, 2N$ bit so that the counter always stays finite, bounded between $0$ and $2N-1$.
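To make this concrete, here is a minimal numpy sketch of the transition matrix (the choice $N = 5$ and all variable names are mine, for illustration only):

```python
import numpy as np

# Hypothetical concrete instance: N = 5, so the states are 0, 1, ..., 9.
N = 5
S = 2 * N

# From state x, step to (x + 1) mod 2N or (x - 1) mod 2N, each with
# probability 1/2.
P = np.zeros((S, S))
for x in range(S):
    P[x, (x + 1) % S] = 0.5
    P[x, (x - 1) % S] = 0.5

assert np.allclose(P.sum(axis=1), 1.0)  # every row is a distribution
```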
The problem with this Markov chain, from the perspective of converging to a stationary distribution, is that the states are periodic. An even state can only be reached from another even state after an even number of steps. Therefore, the period of every state is $2$. Because the states are periodic, the Markov chain is not ergodic. See Wikipedia for a definition of the periodicity of Markov chains.
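A quick numerical check of this period-$2$ structure, under the same assumed setup as above:

```python
import numpy as np

N = 5
S = 2 * N
P = np.zeros((S, S))
for x in range(S):
    P[x, (x + 1) % S] = 0.5
    P[x, (x - 1) % S] = 0.5

# P^t[0, x] is the probability of being in state x after t steps from
# state 0; it is zero whenever x and t have different parities.
for t in range(1, 6):
    Pt = np.linalg.matrix_power(P, t)
    reachable = np.flatnonzero(Pt[0] > 0)
    assert all(x % 2 == t % 2 for x in reachable)
    print(t, reachable)
```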
A Markov chain is ergodic if it is irreducible and aperiodic: every state can be reached from every other state, and no state is periodic.
An ergodic Markov chain has a stationary distribution, and the state distribution converges to this stationary distribution irrespective of the starting distribution. Let's denote the stationary distribution by $\mu$, as in the paper. Then we have that

$$\lim_{t \to \infty} P(X_t = x \mid X_0 = x_0) = \mu(x),$$
and the limit does not depend on the initial state $x_0$. Moreover, it is also true that if $\mu_0$ is an initial state distribution, then

$$\lim_{t \to \infty} \mu_0 P^t = \mu.$$
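The chain defined above is periodic, so to see these equations in action we need a different chain; a lazy variant of the walk (stay put with probability $\frac{1}{2}$; this modification is mine, purely for illustration) is aperiodic and hence ergodic:

```python
import numpy as np

N = 5
S = 2 * N

# Lazy variant: stay with probability 1/2, otherwise step left or right.
# The self-loops make every state aperiodic, so this chain is ergodic.
P_lazy = np.zeros((S, S))
for x in range(S):
    P_lazy[x, x] = 0.5
    P_lazy[x, (x + 1) % S] = 0.25
    P_lazy[x, (x - 1) % S] = 0.25

mu = np.zeros(S)
mu[3] = 1.0  # arbitrary deterministic start, X_0 = 3
for _ in range(2000):
    mu = mu @ P_lazy

# By symmetry the stationary distribution is uniform, and mu_0 P^t
# converges to it no matter which start state we pick.
print(np.max(np.abs(mu - 1.0 / S)))   # ~0
print(np.allclose(mu, mu @ P_lazy))   # mu is (numerically) stationary
```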
However, if the Markov chain is non-ergodic, this convergence won't happen. For example, in the Markov chain we defined above, the one with periodicity $2$, the limit

$$\lim_{t \to \infty} P(X_t = x \mid X_0 = x_0)$$
simply does not exist. This sequence of probability distributions doesn't converge; it keeps oscillating between a uniform distribution over the odd states and a uniform distribution over the even states.
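Here is a sketch of the oscillation, again under the assumed $N = 5$ setup:

```python
import numpy as np

N = 5
S = 2 * N
P = np.zeros((S, S))
for x in range(S):
    P[x, (x + 1) % S] = 0.5
    P[x, (x - 1) % S] = 0.5

mu = np.zeros(S)
mu[0] = 1.0  # deterministic start at state 0
for _ in range(1000):
    mu = mu @ P

# Even after 1000 steps the distribution keeps alternating: all mass on
# the even states, then all mass on the odd states, and so on.
print(np.round(mu, 3))      # ~uniform over the even states
print(np.round(mu @ P, 3))  # ~uniform over the odd states
```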
However, consider an initial distribution that puts equal mass on the even and the odd states, for example $\mu_0(0) = \mu_0(1) = \frac{1}{2}$. Now, starting from this initial distribution, the state distribution does converge, to the uniform distribution over all $2N$ states.
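And a sketch of this convergent case, under the same assumptions:

```python
import numpy as np

N = 5
S = 2 * N
P = np.zeros((S, S))
for x in range(S):
    P[x, (x + 1) % S] = 0.5
    P[x, (x - 1) % S] = 0.5

# Split the initial mass evenly between the two parity classes.
mu = np.zeros(S)
mu[0] = 0.5
mu[1] = 0.5

for _ in range(1000):
    mu = mu @ P

# The parity classes stay balanced, so the oscillation cancels out and
# mu_0 P^t converges to the uniform distribution over all 2N states.
print(np.round(mu, 3))      # ~0.1 everywhere
print(np.round(mu @ P, 3))  # one more step barely changes anything
```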
Let's take the definition of the discounted stationary distribution, which isn't really a thing in Markov chains, but we use it in RL apparently:

$$\mu_\gamma(x \mid x_0) = (1 - \gamma) \sum_{t=0}^{\infty} \gamma^t P(X_t = x \mid X_0 = x_0).$$
This limit exists even for non-ergodic Markov chains: each probability is at most $1$, so the series is dominated by the convergent geometric series $\sum_t \gamma^t$ and converges absolutely. However, this discounted stationary distribution-ish thing could depend on the initial state $x_0$.
I'm also not sure that this discounted distribution is ever independent of the initial condition $x_0$, even for ergodic Markov chains. Intuitively, it won't be: the largest weights $(1-\gamma)\gamma^t$ fall on the earliest time steps, and the early state distributions obviously depend on $x_0$.
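Here is a sketch of how one could compute this quantity by truncating the sum in the definition (the names and the truncation horizon are my choices):

```python
import numpy as np

N = 5
S = 2 * N
P = np.zeros((S, S))
for x in range(S):
    P[x, (x + 1) % S] = 0.5
    P[x, (x - 1) % S] = 0.5

def discounted_distribution(P, x0, gamma, T=2000):
    """Truncation of (1 - gamma) * sum_t gamma^t P(X_t = . | X_0 = x0)."""
    mu_t = np.zeros(P.shape[0])
    mu_t[x0] = 1.0                  # distribution of X_0
    total = np.zeros_like(mu_t)
    for t in range(T):
        total += gamma**t * mu_t    # accumulate the weighted sum
        mu_t = mu_t @ P             # advance to the distribution of X_{t+1}
    return (1 - gamma) * total

# The limit exists even for this periodic chain, but it clearly depends
# on both gamma and the initial state x0:
print(np.round(discounted_distribution(P, x0=0, gamma=0.9), 3))
print(np.round(discounted_distribution(P, x0=3, gamma=0.9), 3))
print(np.round(discounted_distribution(P, x0=0, gamma=0.5), 3))
```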
Here's a colab notebook illustrating the limiting behaviour of this Markov chain.
Calculate the discounted stationary distribution $\mu_\gamma(\cdot \mid x_0)$ as a function of $\gamma$ and $x_0$.
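As a starting hint (mine, not from the note), write $e_{x_0}$ for the point-mass distribution at $x_0$ as a row vector; then the geometric series can be summed in closed form:

$$\mu_\gamma(\cdot \mid x_0) = (1 - \gamma) \sum_{t=0}^{\infty} \gamma^t\, e_{x_0} P^t = (1 - \gamma)\, e_{x_0} \left(I - \gamma P\right)^{-1},$$

where the inverse exists because the spectral radius of $\gamma P$ is $\gamma < 1$.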