---
disqus: ierosodin
---
# Gaussian
> Organization contact [name= [ierosodin](ierosodin@gmail.com)]
###### tags: `machine learning` `學習筆記`
==[Back to Catalog](https://hackmd.io/@ierosodin/Machine_Learning)==
Here we want to derive the Gaussian distribution. To begin, we want a distribution that satisfies the following two conditions:
1. Points equidistant from the mean have equal probability
2. Substituting the parameters ${\mu,\ \sigma^2}$ determines the distribution uniquely

We will need an important integration tool: the Jacobian.
## Jacobian determinant
First, a quick refresher on some basic linear algebra:
1. The area of the parallelogram spanned by two vectors ${\vec{x_1},\ \vec{x_2}}$ is ${det\left[\begin{array}{c}
\vec{x_1} \\
\vec{x_2}
\end{array}\right]}$
2. If the basis undergoes a linear transformation ${A}$, the area is scaled by a factor of ${det A}$
Next we introduce the Jacobian matrix. To transform the original basis ${\{x,\ y\}}$ into ${\{u,\ v\}}$, the transformation matrix is:
${J(u,\ v)\ =\ \left[\begin{array}{cc}
\frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\
\frac{\partial y}{\partial u} & \frac{\partial y}{\partial v}
\end{array}\right]}$
Therefore, an infinitesimal area ${dxdy}$ becomes ${det(J(u,\ v))dudv}$ after the change of variables.
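The area-scaling fact above is easy to verify numerically. Below is a minimal NumPy sketch (my own illustration, not part of the original notes) computing the Jacobian of the polar-coordinate map ${x\ =\ r\cos\theta,\ y\ =\ r\sin\theta}$; its determinant comes out to ${r}$, which is exactly the factor that appears in the Gaussian integral:

```python
import numpy as np

def polar_jacobian(r, theta):
    """Jacobian matrix of (x, y) = (r*cos(theta), r*sin(theta)) w.r.t. (r, theta)."""
    return np.array([
        [np.cos(theta), -r * np.sin(theta)],   # dx/dr, dx/dtheta
        [np.sin(theta),  r * np.cos(theta)],   # dy/dr, dy/dtheta
    ])

r, theta = 2.0, 0.7
J = polar_jacobian(r, theta)
# det(J) = r*cos^2(theta) + r*sin^2(theta) = r, so dx dy = r dr dtheta
print(np.linalg.det(J))  # ≈ 2.0
```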
### Solving the Gaussian integral
${\begin{split}
\int_{-\infty}^{\infty}e^{-x^2}dx\ &=\ \sqrt{\int_{-\infty}^{\infty}e^{-x^2}dx\int_{-\infty}^{\infty}e^{-y^2}dy}\ =\ \sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(x^2+y^2)}dxdy} \\
&=\ \sqrt{\int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^2}det(J)drd\theta} \\
&= \sqrt{\int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^2}rdrd\theta} \\
&=\ \sqrt{2\pi\int_0^\infty e^{-r^2}d\frac{r^2}{2}} \\
&=\ \sqrt{\pi\int_0^\infty e^{-r^2}dr^2} \\
&=\ \sqrt{\pi(-e^{-r^2})|_0^\infty} \\
&=\ \sqrt{\pi}
\end{split}}$
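The result ${\sqrt{\pi}}$ can be checked numerically (a small sketch of my own; the grid bounds are arbitrary, since ${e^{-x^2}}$ decays so fast that truncating at ${\pm 10}$ loses essentially nothing):

```python
import numpy as np

# Riemann-sum approximation of the Gaussian integral over [-10, 10];
# the tail mass beyond that interval is on the order of e^{-100}.
x, dx = np.linspace(-10, 10, 200001, retstep=True)
integral = np.sum(np.exp(-x**2)) * dx
print(integral, np.sqrt(np.pi))  # both ≈ 1.7724539
```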
## The closed form of the Gaussian distribution
With these tools in hand, we can start deriving the Gaussian distribution.
First, since we want the probability to be equal at points equidistant from the mean, the following must hold:
${P(x,\ y)\ =\ P(x)P(y)\ =\ P(r)}$
That is, ${x}$ and ${y}$ are independent of each other, and the probability depends only on the distance from the center. Since it depends only on the distance ${r}$ and not on ${\theta}$:
${\frac{d(P(x)P(y))}{d\theta}\ =\ \frac{dP(r)}{d\theta}\ =\ 0}$
Moreover, ${\theta}$ is a function of ${x}$ and ${y}$, so by the chain rule:
${\Rightarrow\ \frac{dP(r)}{d\theta}\ =\ \frac{dP(x)}{dx}\frac{dx}{d\theta}P(y)\ +\ \frac{dP(y)}{dy}\frac{dy}{d\theta}P(x)\ =\ 0}$
${\because\ x\ =\ r\cos\theta,\ y\ =\ r\sin\theta \\
\Rightarrow\ \frac{dx}{d\theta}\ =\ -y,\ \frac{dy}{d\theta}\ =\ x \\
\therefore\ \frac{dP(x)}{dx}\frac{dx}{d\theta}P(y)\ +\ \frac{dP(y)}{dy}\frac{dy}{d\theta}P(x)\ =\ -\frac{dP(x)}{dx}yP(y)\ +\ \frac{dP(y)}{dy}xP(x)\ =\ 0 \\
\Rightarrow\ \frac{dP(x)}{dx}yP(y)\ =\ \frac{dP(y)}{dy}xP(x) \\
\Rightarrow\ \frac{dP(x)}{dx}\frac{1}{xP(x)}\ =\ \frac{dP(y)}{dy}\frac{1}{yP(y)}}$
The left-hand side depends only on ${x}$ and the right-hand side only on ${y}$; since ${x}$ and ${y}$ are independent, both sides must equal the same constant:
${\frac{dP(x)}{dx}\frac{1}{xP(x)}\ =\ c \\
\Rightarrow\ \frac{dP(x)}{P(x)}\ =\ cxdx \\
\Rightarrow\ \ln P(x)\ =\ c\frac{x^2}{2}\ +\ d \\
\Rightarrow\ P(x)\ =\ e^{\frac{cx^2}{2}\ +\ d}\ =\ ae^{\frac{cx^2}{2}}}$
> It suffices to solve for one dimension.
Next, apply the probability condition that the density must integrate to one:
${\int_{-\infty}^\infty P(x)dx\ =\ 1 \\
\Rightarrow\ a\int_{-\infty}^\infty e^{\frac{cx^2}{2}}dx\ =\ 1 \\
\Rightarrow\ c\ <\ 0}$
So let ${c\ =\ -2k,\ k\ >\ 0}$:
${\Rightarrow\ a\int_{-\infty}^\infty e^{-kx^2}dx\ =\ 1}$
Then by the Gaussian integral (substituting ${u\ =\ \sqrt{k}x}$):
${a\int_{-\infty}^\infty e^{-kx^2}dx\ =\ a\sqrt{\frac{\pi}{k}}\ =\ 1 \\
\Rightarrow\ a\ =\ \sqrt{\frac{k}{\pi}} \\
\Rightarrow\ P(x)\ =\ \sqrt{\frac{k}{\pi}}e^{-kx^2}}$
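The normalization can be spot-checked numerically for a few values of ${k}$ (a sketch of my own; the grid and the chosen ${k}$ values are arbitrary):

```python
import numpy as np

# Check that P(x) = sqrt(k/pi) * exp(-k x^2) integrates to 1 for any k > 0.
def p(x, k):
    return np.sqrt(k / np.pi) * np.exp(-k * x**2)

x, dx = np.linspace(-20, 20, 400001, retstep=True)
totals = {k: np.sum(p(x, k)) * dx for k in (0.5, 1.0, 3.0)}
print(totals)  # every value ≈ 1.0
```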
Next, substitute ${(\mu,\ \sigma^2)}$ to obtain the unique function:
${\sigma^2\ =\ \int_{-\infty}^\infty (x\ -\ \mu)^2P(x)dx\ =\ \int_{-\infty}^\infty x^2P(x)dx}$
> Since the function we want is symmetric about ${\mu}$, shifting ${x}$ by ${\mu}$ does not change the result of the integral.
Here we use integration by parts, with ${u\ =\ x}$ and ${dv\ =\ xe^{-kx^2}dx}$:
${\int_{-\infty}^\infty x^2e^{-kx^2}dx\ =\ \frac{-x}{2k}e^{-kx^2}\Big|_{-\infty}^\infty\ +\ \frac{1}{2k}\int_{-\infty}^\infty e^{-kx^2}dx\ =\ \frac{1}{2k}\sqrt{\frac{\pi}{k}} \\
\Rightarrow\ \sqrt{\frac{k}{\pi}}\int_{-\infty}^\infty x^2e^{-kx^2}dx\ =\ \sqrt{\frac{k}{\pi}}\frac{1}{2k}\sqrt{\frac{\pi}{k}}\ =\ \frac{1}{2k}\ =\ \sigma^2 \\
\Rightarrow\ k\ =\ \frac{1}{2\sigma^2} \\
\Rightarrow\ P(x)\ =\ \frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}}$
This yields the unique probability distribution satisfying the two conditions stated earlier: the Gaussian distribution.
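As a final check of the closed form (my own numeric sketch; ${\mu}$ and ${\sigma}$ here are arbitrary test values), the density should integrate to 1 and have variance ${\sigma^2}$:

```python
import numpy as np

# Verify the final density numerically: it should integrate to 1 and
# have variance sigma^2. mu = 1.3, sigma = 0.8 are arbitrary choices.
mu, sigma = 1.3, 0.8
x, dx = np.linspace(mu - 12, mu + 12, 400001, retstep=True)
pdf = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

total = np.sum(pdf) * dx
var = np.sum((x - mu)**2 * pdf) * dx
print(total, var)  # ≈ 1.0 and ≈ 0.64
```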
## MLE on Gaussian
Here we find the MLE estimates of ${\mu}$ and ${\sigma^2}$ for a Gaussian:
${L(\theta\ =\ (\mu,\ \sigma^2)|D)\ =\ P(D|\theta)\ =\ \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x_i-\mu)^2}{2\sigma^2}}}$
As before, when maximizing a product, take the ${\log}$:
${\begin{split}
\log\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x_i-\mu)^2}{2\sigma^2}}\ &=\ \sum_{i=1}^{n}\log\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x_i-\mu)^2}{2\sigma^2}}\ =\ n\log(2\pi\sigma^2)^{\frac{-1}{2}}\ -\ \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2} \\
&=\ \frac{-n}{2}\log(2\pi\sigma^2)\ -\ \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}
\end{split}}$
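The simplification above can be double-checked numerically: evaluate the log-likelihood term by term and compare with the closed form (a sketch on synthetic data; the sample and parameter values are arbitrary):

```python
import numpy as np

# Check the log-likelihood simplification on synthetic data:
# summing the log of the per-point density should match the closed form.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100)
mu, sigma2 = 2.0, 1.5**2
n = len(x)

# term by term: sum_i log( (2 pi sigma2)^(-1/2) * exp(-(x_i - mu)^2 / (2 sigma2)) )
direct = np.sum(np.log(np.exp(-(x - mu)**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)))
# closed form: -(n/2) log(2 pi sigma2) - sum_i (x_i - mu)^2 / (2 sigma2)
closed = -n / 2 * np.log(2 * np.pi * sigma2) - np.sum((x - mu)**2) / (2 * sigma2)
print(direct, closed)  # the two values agree
```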
Take partial derivatives with respect to ${\mu}$ and ${\sigma^2}$ (writing ${s\ =\ \sigma^2}$ in the latter):
${\frac{d}{d\mu}\left(\frac{-n}{2}\log(2\pi\sigma^2)\ -\ \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right)\ =\ -\frac{d}{d\mu}\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\ =\ 0 \\
\Rightarrow\ \sum_{i=1}^{n} x_i\ =\ n\mu \\
\Rightarrow\ \mu_{MLE}\ =\ \frac{\sum_{i=1}^{n} x_i}{n}}$
${\frac{d}{d\sigma^2}\left(\frac{-n}{2}\log(2\pi\sigma^2)\ -\ \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right) \\
\begin{split}
\Rightarrow\ \frac{d}{ds}\left(\frac{-n}{2}\log(2\pi s)\ -\ \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2s}\right)\ &=\ \frac{d}{ds}\left(\frac{-n}{2}\log(2\pi)\ +\ \frac{-n}{2}\log s\ -\ \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2s}\right) \\
&=\ \frac{-n}{2s}\ +\ \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2s^2} \\
&=\ 0
\end{split} \\
\Rightarrow\ s\ =\ \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}\ =\ \sigma_{MLE}^2}$
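The two MLE formulas can be checked against NumPy's built-ins on synthetic data (a sketch; note that ${\sigma_{MLE}^2}$ divides by ${n}$, which matches the default ${1/n}$ convention of `np.var`):

```python
import numpy as np

# MLE for a Gaussian: the sample mean and the *biased* variance (divide by n).
rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=10000)

mu_mle = np.sum(x) / len(x)
var_mle = np.sum((x - mu_mle)**2) / len(x)

print(mu_mle)   # ≈ 5.0
print(var_mle)  # ≈ 4.0 (np.var uses the same 1/n convention by default)
```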
## Conjugate on Gaussian
The Gaussian distribution also has a conjugate prior. Here we only prove the case of a given prior ${N(\mu_0,\ \sigma_0^2)}$:
if we have a set of data whose likelihood has a known variance, and whose mean itself follows a Gaussian distribution, then the posterior is again a Gaussian distribution.
> Note: the only variable here is ${\mu}$
By Bayes' theorem:
${P(\theta|D)\ =\ \frac{P(D|\theta)P(\theta)}{P(D)} \\
P(D|\theta)P(\theta)\ =\ P(D|\mu)P(\mu)\ =\ \prod_{i=1}^n P(x_i|\mu,\ \sigma^2)N(\mu|\mu_0,\ \sigma_0^2)}$
> Here ${\mu}$ is itself just an x inside an N. Think of it this way: we have a pile of new data of which we only know the variance; without knowing ${\mu}$, we use the likelihood and the prior to obtain an updated posterior, and ${\mu}$ itself can also be treated as a data point.
${\prod_{i=1}^n P(x_i|\mu,\ \sigma^2)N(\mu|\mu_0,\ \sigma_0^2) \\
=\ \frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x_1-\mu)^2}{2\sigma^2}}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x_2-\mu)^2}{2\sigma^2}}...\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x_n-\mu)^2}{2\sigma^2}}\frac{1}{\sqrt{2\pi}\sigma_0}e^{\frac{-(\mu-\mu_0)^2}{2\sigma_0^2}} \\
=\ \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^ne^{\sum_{i=1}^n\frac{-(x_i-\mu)^2}{2\sigma^2}}\frac{1}{\sqrt{2\pi}\sigma_0}e^{\frac{-(\mu-\mu_0)^2}{2\sigma_0^2}} \\
=\ \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n\left(\frac{1}{\sqrt{2\pi}\sigma_0}\right)e^{\sum_{i=1}^n\frac{-(x_i-\mu)^2}{2\sigma^2}\ +\ \frac{-(\mu-\mu_0)^2}{2\sigma_0^2}}}$
We want the exponent to take the form of a squared term plus a constant (matching the Gaussian form):
${\sum_{i=1}^n\frac{-(x_i-\mu)^2}{2\sigma^2}\ -\ \frac{(\mu-\mu_0)^2}{2\sigma_0^2} \\
=\ \frac{-1}{2\sigma^2}\sum_{i=1}^n (x_i^2\ +\ \mu^2\ -\ 2x_i\mu)\ -\ \frac{1}{2\sigma_0^2}(\mu^2\ +\ \mu_0^2\ -\ 2\mu\mu_0) \\
=\ \frac{-1}{2}\mu^2\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)\ +\ \mu\left(\frac{\sum_{i=1}^n x_i}{\sigma^2}\ +\ \frac{\mu_0}{\sigma_0^2}\right)\ -\ \frac{1}{2}\left(\frac{\sum_{i=1}^n x_i^2}{\sigma^2}\ +\ \frac{\mu_0^2}{\sigma_0^2}\right) \\
=\ \frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right) \\
\left(\mu^2-\frac{2\left(\frac{\sum_{i=1}^n x_i}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)}{\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)}\mu+\left(\frac{\left(\frac{\sum_{i=1}^n x_i}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)}{\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)}\right)^2+\frac{\left(\frac{\sum_{i=1}^n x_i^2}{\sigma^2}\ +\ \frac{\mu_0^2}{\sigma_0^2}\right)}{\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)}-\left(\frac{\left(\frac{\sum_{i=1}^n x_i}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)}{\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)}\right)^2\right)}$
Let ${\mu_n\ =\ \frac{\left(\frac{\sum_{i=1}^n x_i}{\sigma^2}\ +\ \frac{\mu_0}{\sigma_0^2}\right)}{\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)}}$:
${\begin{split}
&\Rightarrow\ \sum_{i=1}^n\frac{-(x_i-\mu)^2}{2\sigma^2}\ -\ \frac{(\mu-\mu_0)^2}{2\sigma_0^2} \\
&=\ \frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)(\mu\ -\ \mu_n)^2\ +\ \frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)\left(\frac{\left(\frac{\sum_{i=1}^n x_i^2}{\sigma^2}\ +\ \frac{\mu_0^2}{\sigma_0^2}\right)}{\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)}\ -\ \mu_n^2\right) \\
&=\ \frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)(\mu\ -\ \mu_n)^2\ +\ D
\end{split} \\
\Rightarrow\ P(D|\theta)P(\theta)\ =\ Ce^{\frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)(\mu\ -\ \mu_n)^2\ +\ D}\ =\ Ae^{\frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)(\mu\ -\ \mu_n)^2}}$
where ${C}$, ${D}$, and ${A\ =\ Ce^D}$ are constants that do not depend on ${\mu}$.
The marginal is the integral over all possible values of ${\mu}$ (substitute ${u\ =\ \mu\ -\ \mu_n}$, then apply the Gaussian integral):
${P(D)\ =\ \int_{-\infty}^\infty Ae^{\frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)(\mu\ -\ \mu_n)^2}d\mu\ =\ \int_{-\infty}^\infty Ae^{\frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)u^2}du\ =\ A\sqrt{\frac{\pi}{\frac{1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)}}}$
Therefore:
${P(\theta|D)\ =\ \frac{Ae^{\frac{-1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)(\mu\ -\ \mu_n)^2}}{A\sqrt{\frac{\pi}{\frac{1}{2}\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)}}}}$
Let ${\sigma_n^2\ =\ \frac{1}{\left(\frac{n}{\sigma^2}\ +\ \frac{1}{\sigma_0^2}\right)}}$:
${\Rightarrow\ P(\theta|D)\ =\ \frac{1}{\sqrt{2\pi}\sigma_n}e^{\frac{-(\mu-\mu_n)^2}{2\sigma_n^2}}}$
This proves that the posterior is again a Gaussian distribution.
Observe that ${\mu_n\ =\ \sigma_n^2\left(\frac{\sum_{i=1}^n x_i}{\sigma^2}\ +\ \frac{\mu_0}{\sigma_0^2}\right)\ =\ \sigma_n^2\left(\frac{1}{\sigma^2}n\mu_{MLE}\ +\ \frac{1}{\sigma_0^2}\mu_0\right)\ =\ \sigma_n^2\frac{n}{\sigma^2}\mu_{MLE}\ +\ \frac{\sigma_n^2}{\sigma_0^2}\mu_0}$
and that ${\sigma_n^2\frac{n}{\sigma^2}\ +\ \frac{\sigma_n^2}{\sigma_0^2}\ =\ 1}$,
so ${\mu_n}$ must lie between ${\mu_{MLE}}$ and ${\mu_0}$.
We can also observe that as ${n\ \rightarrow\ 0}$, ${\sigma_n^2\ \rightarrow\ \sigma_0^2}$ and ${\mu_n\ \rightarrow\ \mu_0}$, so ${P(\theta|D)\ =\ \frac{1}{\sqrt{2\pi}\sigma_0}e^{\frac{-(\mu-\mu_0)^2}{2\sigma_0^2}}\ =\ prior}$
As ${n\ \rightarrow\ \infty}$,
${\sigma_n^2\ \rightarrow\ 0 \\
\mu_n\ \rightarrow\ \frac{\sum_{i=1}^n x_i}{n}\ =\ \mu_{MLE}}$
that is, we can be 100% certain that the answer is the mean of the data.
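The conjugate update derived above fits in a few lines of NumPy (a sketch; `posterior_params` is my own helper name and the parameter values are hypothetical):

```python
import numpy as np

def posterior_params(x, sigma2, mu0, sigma0_2):
    """Conjugate update derived above: posterior N(mu_n, sigma_n^2) over the
    mean, for known likelihood variance sigma2 and prior N(mu0, sigma0_2)."""
    n = len(x)
    sigma_n2 = 1.0 / (n / sigma2 + 1.0 / sigma0_2)
    mu_n = sigma_n2 * (np.sum(x) / sigma2 + mu0 / sigma0_2)
    return mu_n, sigma_n2

# Hypothetical numbers for illustration: 50 points with known variance 1,
# and a weak prior centered at 0.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=50)
mu0, sigma0_2, sigma2 = 0.0, 10.0, 1.0
mu_n, sigma_n2 = posterior_params(x, sigma2, mu0, sigma0_2)

mu_mle = x.mean()
print(mu_n, mu_mle, sigma_n2)  # mu_n sits between mu0 = 0 and mu_mle
```

Because the two weights on ${\mu_0}$ and ${\mu_{MLE}}$ sum to 1, `mu_n` is a convex combination of the prior mean and the sample mean, and `sigma_n2` shrinks toward 0 as more data arrives.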