Variance estimation

Question

Given a set of samples $x = (x_1, \ldots, x_N)$, what is its variance? It is generally agreed that the mean is

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

For the variance, you may have heard of

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 \quad\text{and}\quad s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2.$$

Which formula for the variance is correct?

Experiments

You need: handout, 5 dice per group

  1. Review that the variance of a fair die is $2.916\cdots$.
  2. Roll 5 dice at once. Record their numbers, calculate the mean, and the value $\tau^2$.
  3. Calculate the mean for the column of $\sigma^2$. Check if it is close to $2.916\cdots$.
  4. If there are several groups, we may combine the data together.
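
The experiment can also be simulated on a computer. The following is a minimal sketch, assuming Python's standard `random` module. The function name `simulate`, the number of trials, and the seed are illustrative choices and not part of the handout. It rolls 5 dice many times and averages both candidate formulas, the one with denominator $N$ and the one with denominator $N-1$.

```python
import random

def simulate(trials=100_000, n_dice=5, seed=0):
    """Roll n_dice fair dice `trials` times and return the averages of the
    1/N and the 1/(N-1) variance estimates over all trials."""
    rng = random.Random(seed)
    sum_with_n = 0.0          # running total of the estimates with denominator N
    sum_with_n_minus_1 = 0.0  # running total of the estimates with denominator N-1
    for _ in range(trials):
        rolls = [rng.randint(1, 6) for _ in range(n_dice)]
        mean = sum(rolls) / n_dice
        squares = sum((r - mean) ** 2 for r in rolls)
        sum_with_n += squares / n_dice
        sum_with_n_minus_1 += squares / (n_dice - 1)
    return sum_with_n / trials, sum_with_n_minus_1 / trials

avg_with_n, avg_with_n_minus_1 = simulate()
print(f"average with denominator N:   {avg_with_n:.3f}")
print(f"average with denominator N-1: {avg_with_n_minus_1:.3f}")
```

With this many trials, one of the two averages should land close to $2.916\cdots$ and the other noticeably below it. Changing `seed` or `trials` shows how much the averages fluctuate from run to run.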

Intuition

In probability theory, we assume we know the details of the probability distribution of a random variable $X$, where the event $X = x$ happens with probability $p_x$. Here is an example for a fair die.

value        1    2    3    4    5    6
probability  1/6  1/6  1/6  1/6  1/6  1/6

There are formal definitions for the mean and the variance of $X$:

$$E[X] = \sum_x p_x \cdot x \quad\text{and}\quad \operatorname{Var}(X) = E[(X - E[X])^2].$$

This variance is usually called the population variance, indicating that you know the details of the whole population.
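
The two definitions can be evaluated directly for the fair die in the table above. This is a small sketch using exact fractions. The names `distribution`, `population_mean`, and `population_variance` are made up for illustration.

```python
from fractions import Fraction

# Fair die: each value 1..6 has probability 1/6.
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

def population_mean(dist):
    """E[X] = sum over x of p_x * x."""
    return sum(p * x for x, p in dist.items())

def population_variance(dist):
    """Var(X) = E[(X - E[X])^2], computed from the known distribution."""
    mu = population_mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

mean = population_mean(distribution)
var = population_variance(distribution)
print(mean, var)  # 7/2 35/12
```

The fraction 35/12 is the exact value behind the decimal 2.916... used in the experiment.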

However, we never know whether a die is fair or not. We can only take some samples and use these data to estimate the population variance. Here comes the problem. You obtain some samples $x = (x_1, \ldots, x_N)$. When we want to estimate the variance $\operatorname{Var}(X) = E[(X - E[X])^2]$, we do not know the mean $E[X]$ either. The only thing we can do is to replace it by the sample mean

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

Then calculate the sample variance

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2.$$

The point is that $s^2$ is itself a random variable, depending on your samples. Ideally, when you run the experiment several times, the average of $s^2$ should be close to the real answer $\operatorname{Var}(X)$. As you have seen in the experiment, setting the denominator to $N - 1$ surprisingly did the job!
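
Why $N-1$ works can be sketched in a short derivation. It assumes the samples are independent draws from $X$. The symbol $\mu$ is introduced here as shorthand for $E[X]$ and does not appear elsewhere in this note.

```latex
\begin{align*}
\sum_{i=1}^{N}(x_i-\bar{x})^2
  &= \sum_{i=1}^{N}(x_i-\mu)^2 - N(\bar{x}-\mu)^2, \\
E\Big[\sum_{i=1}^{N}(x_i-\mu)^2\Big]
  &= N\operatorname{Var}(X), \\
E\big[N(\bar{x}-\mu)^2\big]
  &= N\cdot\frac{\operatorname{Var}(X)}{N} = \operatorname{Var}(X), \\
E\Big[\sum_{i=1}^{N}(x_i-\bar{x})^2\Big]
  &= (N-1)\operatorname{Var}(X).
\end{align*}
```

Dividing the sum of squares by $N-1$ therefore gives an estimate whose average is exactly $\operatorname{Var}(X)$, while dividing by $N$ gives $\frac{N-1}{N}\operatorname{Var}(X)$, which is too small on average.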

More questions to think about

  1. Calculate the variance of a fair coin with two sides $0$ and $1$.
  2. Consider the random variable $\bar{x}$ defined as the mean of five dice. Describe its probability distribution.
  3. Consider the random variable $s^2$ defined as the sample variance of five dice. Describe its probability distribution.
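
One possible way to explore questions 2 and 3 is to enumerate all $6^5 = 7776$ equally likely outcomes of five fair dice. The sketch below assumes the dice are fair and independent. The names `mean_dist` and `s2_dist` are illustrative. It tabulates both distributions exactly and leaves describing them to the reader.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N = 5  # number of dice rolled together
outcomes = list(product(range(1, 7), repeat=N))  # all 6**5 equally likely rolls
weight = Fraction(1, len(outcomes))              # probability of each single roll

mean_dist = Counter()  # probability of each possible value of the sample mean
s2_dist = Counter()    # probability of each possible value of the sample variance

for rolls in outcomes:
    mean = Fraction(sum(rolls), N)
    s2 = sum((r - mean) ** 2 for r in rolls) / (N - 1)
    mean_dist[mean] += weight
    s2_dist[s2] += weight

# Averages of the two random variables over the whole population of rolls.
expected_mean = sum(v * p for v, p in mean_dist.items())
expected_s2 = sum(v * p for v, p in s2_dist.items())
print("E[mean] =", expected_mean)
print("E[s^2]  =", expected_s2)
```

Printing `sorted(mean_dist.items())` or `sorted(s2_dist.items())` lists every possible value together with its probability.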

Resources

  1. YouTube: Why Sample Variance is Divided by n-1 by Krish Naik