
Basics about probability distribution and Gaussians

written by @marc_lelarge

Probability recap

We start with real random variables (r.v.).

1- Why is variance positive?

Recall that $\mathrm{Var}(X)=E[X^2]-E[X]^2$, so $\mathrm{Var}(X)\geq 0$ means that $E[X^2]\geq E[X]^2$.

answer: Start with

$$E[(X-E[X])^2]=E[X^2-2XE[X]+E[X]^2]=E[X^2]-2E[X]E[X]+E[X]^2=E[X^2]-E[X]^2=\mathrm{Var}(X).$$

Similarly, we have for the covariance of the random variables $X$ and $Y$:

$$\mathrm{Cov}(X,Y)=E[(X-E[X])(Y-E[Y])]=E[XY]-E[X]E[Y].$$

Note that $\mathrm{Var}(X)=\mathrm{Cov}(X,X)$. We have, for $a,b\in\mathbb{R}$, $\mathrm{Var}(aX+b)=a^2\mathrm{Var}(X)$; note that we use the standard notation where capital letters denote random variables and lowercase letters denote constants or parameters.

We have

$$\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)+2\,\mathrm{Cov}(X,Y).$$
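This identity is easy to check numerically on a small discrete joint distribution. The sketch below (assuming `numpy` is available; the three support points are chosen arbitrarily for illustration) compares both sides exactly:

```python
import numpy as np

# A small discrete joint distribution for (X, Y): three equally likely pairs.
# (Support points chosen arbitrarily for illustration.)
pairs = np.array([(1.0, 2.0), (3.0, -1.0), (0.0, 4.0)])
p = np.full(3, 1/3)
x, y = pairs[:, 0], pairs[:, 1]

def mean(v):   return float(np.sum(p * v))
def var(v):    return mean(v**2) - mean(v)**2
def cov(u, v): return mean(u * v) - mean(u) * mean(v)

lhs = var(x + y)
rhs = var(x) + var(y) + 2 * cov(x, y)
```

Any finite distribution works here: the identity holds term by term once the variances and covariance are expanded.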

2- How to compute moments?

We start with a remark: if a random variable has a symmetric density function, i.e. $p(X=x)=p(X=-x)$ for all $x\in\mathbb{R}$, then its odd moments are zero: $E[X^{2k+1}]=0$.

answer thanks to the moment generating function:

$$M_X(t)=E[e^{tX}],$$ so that we get:
$$E[X^k]=\left[\frac{d^k M_X(t)}{dt^k}\right]_{t=0}.$$

To understand why this is true, we can write:

$$e^{tX}=\sum_{k\geq 0}\frac{(tX)^k}{k!}=1+tX+\frac{(tX)^2}{2}+\frac{(tX)^3}{6}+\cdots$$ so that we have:
$$\frac{d}{dt}e^{tX}=X+tX^2+\frac{t^2X^3}{2}+\cdots,\qquad \frac{d^2}{dt^2}e^{tX}=X^2+tX^3+\cdots$$

Let's apply this method to the normalized Gaussian random variable $Z$. We have

$$M_Z(t)=E[e^{tZ}]=\int e^{tz}\frac{e^{-z^2/2}}{\sqrt{2\pi}}dz=\int \frac{e^{-(z^2-2tz+t^2)/2}}{\sqrt{2\pi}}e^{t^2/2}dz=e^{t^2/2}.$$

In particular, we have

$$M_Z'(t)=te^{t^2/2},\quad M_Z''(t)=(1+t^2)e^{t^2/2},\quad M_Z^{(3)}(t)=(3t+t^3)e^{t^2/2},\quad M_Z^{(4)}(t)=(3+6t^2+t^4)e^{t^2/2},\ldots$$
so that we have $E[Z]=0$, $E[Z^2]=1$, $E[Z^3]=0$, $E[Z^4]=3$.

Note that we already knew that the odd moments are zero, but if you need the fourth moment, you need to compute the fourth derivative anyway.
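As a sanity check, the moments of $Z$ can be verified by direct numerical integration against the $\mathcal{N}(0,1)$ density (a minimal sketch assuming `numpy`; the grid bounds and resolution are arbitrary choices):

```python
import numpy as np

# Numerically integrate z^k times the N(0,1) density on a fine symmetric grid.
z = np.linspace(-12.0, 12.0, 200001)
dz = z[1] - z[0]
pdf = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def moment(k):
    # Riemann sum; the tails beyond |z| = 12 are negligible.
    return float(np.sum(z**k * pdf) * dz)

m1, m2, m3, m4 = moment(1), moment(2), moment(3), moment(4)
```

The odd moments vanish by symmetry of the grid, and the even moments match $E[Z^2]=1$, $E[Z^4]=3$ to high accuracy.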

Note that for simple distributions, the direct computation might be easier. For example, for the uniform distribution on the interval $[a,b]$ with $a<b$, we have for $U\sim\mathrm{Unif}(a,b)$:

$$M_U(t)=\frac{e^{tb}-e^{ta}}{t(b-a)},$$

and
$$E[U]=\int_a^b \frac{x\,dx}{b-a}=\frac{b^2-a^2}{2(b-a)}=\frac{a+b}{2},\qquad E[U^2]=\int_a^b \frac{x^2\,dx}{b-a}=\frac{a^2+ab+b^2}{3},$$

so that $\mathrm{Var}(U)=\frac{(b-a)^2}{12}$.
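These closed forms can be checked the same way by integrating against the uniform density (a sketch assuming `numpy`; the endpoints $a=2$, $b=5$ are arbitrary):

```python
import numpy as np

# Numerical check of the Unif(a, b) moments, with a = 2, b = 5 (illustrative).
a, b = 2.0, 5.0
x = np.linspace(a, b, 1_000_001)
dx = x[1] - x[0]
density = 1.0 / (b - a)

m1 = float(np.sum(x * density) * dx)      # should be (a + b) / 2 = 3.5
m2 = float(np.sum(x**2 * density) * dx)   # should be (a^2 + ab + b^2) / 3 = 13
var_u = m2 - m1**2                        # should be (b - a)^2 / 12 = 0.75
```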

3- Independence implies null covariance but null covariance does not imply independence!

Here is a simple example: consider the random variables $(X,Y)$ equal to $(-1,1)$, $(0,-2)$ or $(1,1)$ with equal probability.

We clearly have $E[X]=E[Y]=0$ and

$$\mathrm{Cov}(X,Y)=E[XY]=\frac{-1}{3}+\frac{0}{3}+\frac{1}{3}=0,$$

but $X$ and $Y$ are not independent, as knowing $X$ determines $Y$. More formally, we have for example $E[X^2]=2/3$ and $E[X^2Y]=2/3\neq E[X^2]E[Y]$.
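Since the distribution is finite, all these quantities can be computed exactly by enumeration (a minimal sketch assuming `numpy`):

```python
import numpy as np

# The three equally likely values of (X, Y) from the example above.
points = np.array([(-1.0, 1.0), (0.0, -2.0), (1.0, 1.0)])
x, y = points[:, 0], points[:, 1]

ex, ey = x.mean(), y.mean()          # E[X] = E[Y] = 0
cov_xy = (x * y).mean() - ex * ey    # Cov(X, Y) = 0: uncorrelated
ex2y = (x**2 * y).mean()             # E[X^2 Y] = 2/3, yet E[X^2] E[Y] = 0
```

The nonzero value of `ex2y` is exactly the witness that $X$ and $Y$ are dependent despite being uncorrelated.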

Gaussian random variables

4- If $X$ is a Gaussian r.v. and $Y$ is another Gaussian r.v. such that $X\perp Y$ (i.e. $X$ and $Y$ are independent), then $(X,Y)$ is a Gaussian vector.

5- If $(X,Y)$ is a Gaussian vector, then $X\perp Y$ is equivalent to $\mathrm{Cov}(X,Y)=0$.

But even if $X\sim\mathcal{N}(0,1)$, $Y\sim\mathcal{N}(0,1)$ and $\mathrm{Cov}(X,Y)=0$, this does not imply that $(X,Y)$ is a Gaussian vector. Here is a simple counter-example: take $X\sim\mathcal{N}(0,1)$ and define for $a>0$:

$$Y=\begin{cases} X & \text{if } |X|>a,\\ -X & \text{if } |X|\leq a.\end{cases}$$

It is easy to see that $Y\sim\mathcal{N}(0,1)$; moreover, we have

$$\mathrm{Cov}(X,Y)=E[XY]=E[X^2\mathbf{1}(|X|>a)]-E[X^2\mathbf{1}(|X|\leq a)].$$

We see that for $a\to 0$, we have $\mathrm{Cov}(X,Y)\to 1$, and when $a\to\infty$, we have $\mathrm{Cov}(X,Y)\to -1$, so by continuity there exists a value of $a>0$ for which $\mathrm{Cov}(X,Y)=0$.
But $X+Y$ is never a Gaussian r.v., as

$$X+Y=\begin{cases} 2X & \text{if } |X|>a,\\ 0 & \text{if } |X|\leq a.\end{cases}$$
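The special value of $a$ can be located numerically: $\mathrm{Cov}(X,Y)=1-2\,E[X^2\mathbf{1}(|X|\leq a)]$, which is decreasing in $a$, so a simple bisection finds the root (a sketch assuming `numpy`; grid resolution and bracket are arbitrary choices):

```python
import numpy as np

# Cov(X, Y) = 1 - 2 * E[X^2 1(|X| <= a)], computed by a fine Riemann sum.
z = np.linspace(-10.0, 10.0, 400001)
dz = z[1] - z[0]
integrand = z**2 * np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def cov(a):
    inside = float(np.sum(integrand[np.abs(z) <= a]) * dz)
    return 1.0 - 2.0 * inside

# cov decreases from +1 (a = 0) towards -1 (a large): bisect for the root.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov(mid) > 0 else (lo, mid)
a_star = 0.5 * (lo + hi)
```

The root lands near $a\approx 1.54$; any $a$ solving $E[X^2\mathbf{1}(|X|\leq a)]=1/2$ works.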

6- Moments of Gaussian r.v.

We have for $X\sim\mathcal{N}(0,1)$:

$$M_X(t)=E[e^{tX}]=\int e^{tx}\frac{e^{-x^2/2}}{\sqrt{2\pi}}dx=\int \frac{e^{-(x^2-2tx+t^2)/2}}{\sqrt{2\pi}}dx\; e^{t^2/2}=e^{t^2/2}\int \frac{e^{-(x-t)^2/2}}{\sqrt{2\pi}}dx=e^{t^2/2}.$$

Hence the moments for $X\sim\mathcal{N}(0,1)$ are given by $E[X^{2m+1}]=0$ and

$$E[X^{2m}]=\frac{(2m)!}{2^m m!}.$$

In general, for a Gaussian $\mu+\sigma X\sim\mathcal{N}(\mu,\sigma^2)$, we have
$$E[(\mu+\sigma X)^k]=\sum_{m=0}^{k}\binom{k}{m}\mu^m\sigma^{k-m}E[X^{k-m}].$$
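The closed form for the even moments agrees with the recurrence $E[X^{2m}]=(2m-1)\,E[X^{2m-2}]$ (obtained by integration by parts), which is easy to verify with exact integer arithmetic:

```python
from math import factorial

# Closed form: E[X^{2m}] = (2m)! / (2^m m!)
def even_moment(m):
    return factorial(2 * m) // (2**m * factorial(m))

# Recurrence E[X^{2m}] = (2m - 1) * E[X^{2m-2}], starting from E[X^0] = 1.
rec = [1]
for m in range(1, 6):
    rec.append((2 * m - 1) * rec[-1])
```

For $m=2$ this gives $4!/(4\cdot 2)=3$, matching $E[X^4]=3$ above.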

Gaussian vectors

7- Partitioned Gaussians

We consider now a Gaussian vector $x\sim\mathcal{N}(\mu,\Sigma)$ that we decompose as

$$x=\begin{pmatrix} x_a\\ x_b \end{pmatrix}.$$

We consider the same decomposition for the parameters:

$$\mu=\begin{pmatrix} \mu_a\\ \mu_b \end{pmatrix},\qquad \Sigma=\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}.$$

Note that $\Sigma_{ba}=\Sigma_{ab}^T$. We also introduce the precision matrix $\Lambda=\Sigma^{-1}$ and decompose it as:

$$\Lambda=\begin{pmatrix} \Lambda_{aa} & \Lambda_{ab}\\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}.$$

Note that $\Lambda_{aa}\neq \Sigma_{aa}^{-1}$; indeed, we can use the following formula for the inverse of a partitioned matrix:

$$\begin{pmatrix} A & B\\ C & D \end{pmatrix}^{-1}=\begin{pmatrix} M & -MBD^{-1}\\ -D^{-1}CM & D^{-1}+D^{-1}CMBD^{-1} \end{pmatrix},$$

where $M=(A-BD^{-1}C)^{-1}$.

Hence we see that

$$\Lambda_{aa}=(\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}.$$

Conditional distribution:

$$p(x_a|x_b)=\mathcal{N}(x_a|\mu_{a|b},\Lambda_{aa}^{-1}),$$
with
$$\mu_{a|b}=\mu_a-\Lambda_{aa}^{-1}\Lambda_{ab}(x_b-\mu_b).$$

Marginal distribution:

$$p(x_a)=\mathcal{N}(x_a|\mu_a,\Sigma_{aa}).$$
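The Schur-complement identity $\Lambda_{aa}=(\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}$ can be checked numerically on any positive-definite covariance (a sketch assuming `numpy`; the $4\times 4$ matrix below is an arbitrary illustrative choice):

```python
import numpy as np

# A small symmetric positive-definite covariance, partitioned into 2+2 blocks.
Sigma = np.array([[4.0, 1.0, 0.5, 0.2],
                  [1.0, 3.0, 0.3, 0.1],
                  [0.5, 0.3, 2.0, 0.4],
                  [0.2, 0.1, 0.4, 1.5]])
Saa, Sab = Sigma[:2, :2], Sigma[:2, 2:]
Sba, Sbb = Sigma[2:, :2], Sigma[2:, 2:]

Laa = np.linalg.inv(Sigma)[:2, :2]   # the aa-block of the precision matrix
schur_inv = np.linalg.inv(Saa - Sab @ np.linalg.inv(Sbb) @ Sba)
```

Comparing `Laa` to `Sigma[:2, :2]` inverted directly also makes the point that $\Lambda_{aa}\neq\Sigma_{aa}^{-1}$.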

8- Marginal and Conditional Gaussians

Consider a Gaussian vector:

$$p(x)=\mathcal{N}(x|\mu,\Lambda^{-1}),$$
and a linear Gaussian model:
$$p(y|x)=\mathcal{N}(y|Ax+b,L^{-1}),$$

where $A$, $b$, $\mu$ are parameters governing the means, and $\Lambda$ and $L$ are precision matrices. Then $z=\begin{pmatrix} x\\ y \end{pmatrix}$ is a Gaussian vector and we have

$$p(y)=\mathcal{N}(y|A\mu+b,\,L^{-1}+A\Lambda^{-1}A^T),$$
$$p(x|y)=\mathcal{N}(x|\Sigma\{A^TL(y-b)+\Lambda\mu\},\,\Sigma),\qquad \text{with } \Sigma=(\Lambda+A^TLA)^{-1}.$$
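The marginal covariance of $y$ can be cross-checked against the joint precision of $z=(x,y)$, which for this model is $\begin{pmatrix}\Lambda+A^TLA & -A^TL\\ -LA & L\end{pmatrix}$ (this is the standard result from Bishop, PRML, Section 2.3.3). A sketch assuming `numpy`, with illustrative parameter values:

```python
import numpy as np

# Illustrative 2-d parameters for the linear Gaussian model.
Lam = np.array([[2.0, 0.3], [0.3, 1.5]])   # precision of x
L   = np.array([[1.0, 0.2], [0.2, 2.0]])   # precision of y given x
A   = np.array([[1.0, -1.0], [0.5, 2.0]])

# Joint precision of z = (x, y).
R = np.block([[Lam + A.T @ L @ A, -A.T @ L],
              [-L @ A,            L       ]])

cov_y = np.linalg.inv(R)[2:, 2:]           # y-block of the joint covariance
cov_y_formula = np.linalg.inv(L) + A @ np.linalg.inv(Lam) @ A.T
```

The agreement of the two expressions is exactly the Woodbury identity applied to the $y$-block of the joint precision.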

tags: public tutorials