{%hackmd 5xqeIJ7VRCGBfLtfMi0_IQ %}

# Variance estimation

## Question

Given a set of samples $\bx = (x_1, \ldots, x_N)$, what is its variance?  It is generally agreed that the mean is

$$
\overline{\bx} = \frac{1}{N}\sum_{i=1}^N x_i.
$$

For the variance, you may have seen both

$$
\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \overline{\bx})^2 \text{ and } s^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{\bx})^2.
$$

Which formula for the variance is correct?

## Experiments

You need: [handout](https://www.math.nsysu.edu.tw/~chlin/math-runway/covariance-estimation.pdf), 5 dice per group

1. Review that the variance of a _fair_ die is $\frac{35}{12} = 2.916\cdots$.
2. Roll 5 dice at once.  Record their numbers, calculate the mean, and calculate the value $s^2$.
3. Calculate the mean of the column of $s^2$ values.  Check whether it is close to $2.916\cdots$.
4. If there are several groups, we may combine the data together.

## Intuition

In probability theory, we assume we know the details of the distribution of a random variable $X$, where the event $X = x$ happens with probability $p_x$.  Here is an example of a fair die.

| value | 1 | 2 | 3 | 4 | 5 | 6 |
| ----------- | --- | --- | --- | --- | --- | --- |
| probability | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |

There are formal definitions for the mean and the variance of $X$:

$$
\mathbb{E}[X] = \sum_x p_x\cdot x \text{ and } \operatorname{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2].
$$

This variance is usually called the _population variance_, indicating that you know the details of the whole population.

However, we never know whether a die is fair or not.  We can only collect some samples and use these data to _estimate_ the population variance.  Here comes the problem: you obtain some samples $\bx = (x_1, \ldots, x_N)$, but when you want to estimate the variance $\operatorname{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]$, you do not know the mean $\mathbb{E}[X]$ either.
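As a concrete instance of the definitions above, the mean and variance of a fair die can be computed directly from the table; the sketch below is my own illustration (the variable names are not from the handout), using exact rational arithmetic.

```python
from fractions import Fraction

# A fair die: each value 1..6 occurs with probability 1/6.
dist = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum over x of p_x * x
mean = sum(p * x for x, p in dist.items())

# Var(X) = E[(X - E[X])^2]
var = sum(p * (x - mean) ** 2 for x, p in dist.items())

print(mean)  # 7/2
print(var)   # 35/12, i.e. 2.916...
```

Working with `Fraction` instead of floats keeps the answer exact, which is why the value $2.916\cdots$ shows up as the rational number $35/12$.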
The only thing we can do is to replace it by the _sample mean_

$$
\overline{\bx} = \frac{1}{N}\sum_{i=1}^N x_i,
$$

and then calculate the _sample variance_

$$
s^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{\bx})^2.
$$

The point is that $s^2$ is itself a random variable, depending on your samples.  Ideally, when you run the experiment several times, the average of $s^2$ should be close to the real answer $\operatorname{Var}(X)$.  As you have seen in the experiment, setting the denominator to $N - 1$ surprisingly did the job!

## More questions to think about

1. Calculate the variance of a _fair_ coin with two sides $0$ and $1$.
2. Consider the random variable $\overline{\bx}$, the mean of five dice.  Describe its probability distribution.
3. Consider the random variable $s^2$, the sample variance of five dice.  Describe its probability distribution.

## Resources

1. [YouTube: Why Sample Variance is Divided by n-1 by Krish Naik](https://youtu.be/vGsRwB3TsiE?si=MYVKGRhHdANgt85-)
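The dice experiment can also be repeated in software many more times than is practical by hand.  The sketch below is a minimal simulation of my own (the seed and trial count are arbitrary choices, not from the handout): it rolls five fair dice repeatedly and averages both candidate formulas, the $N$-denominator $\sigma^2$ and the $(N-1)$-denominator $s^2$.

```python
import random
import statistics

random.seed(0)

TRIALS = 50_000
DICE = 5

total_pop = 0.0   # running sum of sigma^2 values (denominator N)
total_samp = 0.0  # running sum of s^2 values     (denominator N - 1)
for _ in range(TRIALS):
    rolls = [random.randint(1, 6) for _ in range(DICE)]
    total_pop += statistics.pvariance(rolls)   # divides by N
    total_samp += statistics.variance(rolls)   # divides by N - 1

print(total_pop / TRIALS)   # noticeably below 35/12 = 2.916...
print(total_samp / TRIALS)  # close to 35/12
```

The average of the $\sigma^2$ column settles near $\frac{N-1}{N}\cdot\frac{35}{12} = \frac{7}{3}$ rather than $\frac{35}{12}$, which is exactly the bias that the $N-1$ denominator corrects.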