Given a set of samples $x_1, x_2, \dots, x_n$, what is its covariance? Everyone agrees that the mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
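For the covariance, however, you may have heard of two different formulas:
$$\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 \qquad\text{and}\qquad \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2.$$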
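As a quick numerical illustration, here are both formulas applied to one small sample. The five rolls below are made up for this example, not data from the handout. Python's standard `statistics` module happens to provide both versions: `statistics.pvariance` divides by $n$ and `statistics.variance` divides by $n-1$.

```python
import statistics

# Hypothetical rolls of 5 dice, chosen only for illustration.
xs = [2, 5, 3, 6, 4]
n = len(xs)

mean = sum(xs) / n                     # sample mean: 20 / 5 = 4.0
ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations: 10.0

cov_n = ss / n                # formula with denominator n
cov_n_minus_1 = ss / (n - 1)  # formula with denominator n - 1

print(mean, cov_n, cov_n_minus_1)  # 4.0 2.0 2.5

# Cross-check against the standard library.
assert cov_n == statistics.pvariance(xs)
assert cov_n_minus_1 == statistics.variance(xs)
```

The two answers differ even on the same data, so the question below is not just a matter of notation.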
Which formula for covariance is correct?
You need: handout, 5 dice per group
In probability theory, we assume we know the details of the probability distribution $P(X = x_k) = p_k$, where the event $X = x_k$ happens with probability $p_k$. Here is an example of a fair die.
value | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
probability | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
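There are formal definitions for the mean and the covariance of $X$:
$$\mu = E[X] = \sum_k p_k x_k, \qquad \sigma^2 = E\big[(X-\mu)^2\big] = \sum_k p_k (x_k-\mu)^2.$$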
This covariance $\sigma^2$ is usually called the population covariance, indicating that you know the details of the whole population.
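For the fair die in the table, both definitions can be evaluated exactly. The short sketch below uses Python's `fractions.Fraction` so the results come out as exact rationals rather than rounded floats; the variable names are only illustrative.

```python
from fractions import Fraction

# Fair die from the table: each value 1..6 has probability 1/6.
dist = {value: Fraction(1, 6) for value in range(1, 7)}

# Population mean: mu = sum_k p_k * x_k
mu = sum(p * x for x, p in dist.items())

# Population covariance: sigma^2 = sum_k p_k * (x_k - mu)^2
sigma2 = sum(p * (x - mu) ** 2 for x, p in dist.items())

print(mu, sigma2)  # 7/2 35/12
```

So a fair die has $\mu = 7/2$ and $\sigma^2 = 35/12 \approx 2.917$; this is the "real answer" any estimate from samples is trying to hit.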
However, we never know for sure whether a die is fair. We can only collect some samples and use these data to estimate the population covariance. Here comes the problem. Suppose you obtain samples $x_1, x_2, \dots, x_n$. When we want to estimate the covariance $\sigma^2 = E[(X-\mu)^2]$, we do not know the mean $\mu$ either. The only thing we can do is replace it with the sample mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
Then calculate the sample covariance
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2.$$
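The dice experiment can be mimicked in code. The sketch below assumes fair dice and uses two hypothetical helpers, `sample_covariance` and `run_experiment` (not from any library): it rolls 5 dice, computes the sample covariance with both denominators, repeats this many times, and averages each estimator. For reference, the population covariance of a fair die is $35/12 \approx 2.917$.

```python
import random


def sample_covariance(xs, denominator):
    """Hypothetical helper: squared deviations from the sample mean, divided by `denominator`."""
    n = len(xs)
    xbar = sum(xs) / n  # sample mean replaces the unknown population mean
    return sum((x - xbar) ** 2 for x in xs) / denominator


def run_experiment(n_dice=5, repetitions=100_000, seed=0):
    """Hypothetical helper: roll `n_dice` fair dice `repetitions` times
    and return the average of each estimator (denominator n, then n - 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(repetitions):
        rolls = [rng.randint(1, 6) for _ in range(n_dice)]
        total_n += sample_covariance(rolls, n_dice)
        total_n_minus_1 += sample_covariance(rolls, n_dice - 1)
    return total_n / repetitions, total_n_minus_1 / repetitions


avg_n, avg_n_minus_1 = run_experiment()
print(f"average with denominator n:     {avg_n:.3f}")
print(f"average with denominator n - 1: {avg_n_minus_1:.3f}")
print(f"population covariance 35/12:    {35 / 12:.3f}")
```

With 5 dice, the $n-1$ average should land near $35/12 \approx 2.917$, while the $n$ average should land near $\frac{4}{5}\cdot\frac{35}{12} = 7/3 \approx 2.333$, a systematic underestimate.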
The point is that $s^2$ is itself a random variable: it depends on your samples. Ideally, if you run the experiment many times, the average of $s^2$ should be close to the true answer $\sigma^2$; in other words, we want $E[s^2] = \sigma^2$. As you have seen in the experiment, setting the denominator to $n-1$ surprisingly does the job!
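A simulation only gets close to the answer. Because 5 dice have just $6^5 = 7776$ equally likely outcomes, the average over infinitely many experiments (the expectation of the sample covariance) can be computed exactly by brute force. The sketch below assumes fair, independent dice; `exact_average` is a hypothetical helper name, and exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import product


def exact_average(n_dice, use_n_minus_1):
    """Hypothetical helper: exact expected sample covariance of `n_dice` fair dice,
    averaged over all 6**n_dice equally likely outcomes."""
    denominator = n_dice - 1 if use_n_minus_1 else n_dice
    total = Fraction(0)
    outcomes = 0
    for rolls in product(range(1, 7), repeat=n_dice):
        xbar = Fraction(sum(rolls), n_dice)          # exact sample mean
        ss = sum((x - xbar) ** 2 for x in rolls)     # exact sum of squared deviations
        total += ss / denominator
        outcomes += 1
    return total / outcomes


print(exact_average(5, use_n_minus_1=False))  # 7/3
print(exact_average(5, use_n_minus_1=True))   # 35/12
```

The $n-1$ version reproduces the population covariance of a fair die, $35/12$, exactly, while the $n$ version gives $7/3 = \frac{4}{5}\cdot\frac{35}{12}$, i.e. it is too small by a factor of $(n-1)/n$.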