共變異數矩陣

# 共變異數矩陣 ![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png) This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/). $\newcommand{\trans}{^\top} \newcommand{\adj}{^{\rm adj}} \newcommand{\cof}{^{\rm cof}} \newcommand{\inp}[2]{\left\langle#1,#2\right\rangle} \newcommand{\dunion}{\mathbin{\dot\cup}} \newcommand{\bzero}{\mathbf{0}} \newcommand{\bone}{\mathbf{1}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\tr}{\operatorname{tr}} \newcommand{\nul}{\operatorname{null}} \newcommand{\rank}{\operatorname{rank}} %\newcommand{\ker}{\operatorname{ker}} \newcommand{\range}{\operatorname{range}} \newcommand{\Col}{\operatorname{Col}} \newcommand{\Row}{\operatorname{Row}} \newcommand{\spec}{\operatorname{spec}} \newcommand{\vspan}{\operatorname{span}} \newcommand{\Vol}{\operatorname{Vol}} \newcommand{\sgn}{\operatorname{sgn}} \newcommand{\idmap}{\operatorname{id}} \newcommand{\am}{\operatorname{am}} \newcommand{\gm}{\operatorname{gm}} \newcommand{\mult}{\operatorname{mult}} \newcommand{\iner}{\operatorname{iner}}$ ```python from lingeo import random_int_list ``` ## Main idea Let $\bx = (x_1, \ldots, x_N)$ be a collection of $N$ numbers. The **mean** of $\bx$ is $$ \mu = \frac{1}{n}(x_1 + \cdots + x_N), $$ which can also be computed by $\frac{1}{N}\inp{\bx}{\bone}$. The **variance** of $\bx$ is $$ \sigma^2 = \frac{1}{N-1}\left[(x_1 - \mu)^2 + \cdots + (x_N - \mu)^2\right], $$ which can also be computed by $\frac{1}{N-1}\inp{\bx -\mu\bone}{\bx - \mu\bone}$. That is, one may shift the data and replace $\bx$ with $\bx - \mu\bone$. Thus, the new data is centered at the origin, and the variance of it is simply $\frac{1}{N-1}\inp{\bx}{\bx}$. Let $\bx = (x_1, \ldots, x_N)$ and $\by = (y_1, \ldots, y_N)$ be two collections of numbers. Let $\mu_\bx$ and $\mu_\by$ be the means of $\bx$ and $\by$, respectively. The **covariance** of $\bx$ and $\by$ is $$ \frac{1}{N-1}[(x_1 - \mu_\bx)(y_1 - \mu_\by) + \cdots + (x_N - \mu_\bx)(y_N - \mu_\by)]. $$ Similarly, the covariance can be computed by $\frac{1}{N-1}\inp{\bx - \mu_\bx\bone}{\by - \mu_\by\bone}$, and the covariance of $\bx$ and $\bx$ is the variance of itself. As mentioned in 606, data is often stored in a matrix such that each row represents a sample while each column represents a feature. When $X$ is a such matrix of dimension $N\times d$, the row vectors are called the **sameple vectors** while the columns are called the **feature vectors**. One may make each feature vector centered at $0$ in a few steps. 1. Let $\mu\trans = \frac{1}{N}\bone\trans X$ be the vector that records the mean of each feature vector. 2. Replace $X$ with $X - \bone\mu\trans = X - \frac{1}{N}JX = (I - \frac{1}{N}J)X$, where $J$ is the $N\times N$ all-ones matrix. Once each feature is centered at $0$, the matrix $C = \frac{1}{N-1}X\trans X$ is called the **covariance matrix**, whose $i,j$-entry is the covariance between the $i$-th and the $j$-th feature. ## Side stories - np.random.multivariate_normal ## Experiments ##### Exercise 1 執行以下程式碼。 ```python ### code set_random_seed(None) print_ans = True n = 5 while True: X = matrix(n, random_int_list(2*n)) if sum(X.transpose()[0]) % n == 0 and sum(X.transpose()[1]) % n == 0: break pretty_print(LatexExpr("X ="), X) if print_ans: mu = (1/n) * ones_matrix(1,n) * X print("means of x and y:", mu) X_shifted = X - (1/n) * ones_matrix(n,n) * X print("covariance matrix =") pretty_print(LatexExpr(r"\frac{1}{%s}"%(n-1)), X_shifted.transpose() * X_shifted) ``` ##### Exercise 1(a) 令 $\bx$ 和 $\by$ 分別為 $X$ 的兩個行向量。求 $\bx$ 的平均值和變異數、$\by$ 的平均值和變異數、以及 $\bx$ 和 $\by$ 的共變異數。 $Ans :$ 由 `seed = 0` 可以得到 $X = \begin{bmatrix} 0 & -2 \\ -1 & -1 \\ 3 & 2 \\ 0 & -4 \\ -2 & 5 \end{bmatrix}$。令 $\bx = \begin{bmatrix} 0 \\ -1 \\ 3 \\ 0 \\ -2 \end{bmatrix} 、 \by = \begin{bmatrix} -2 \\ -1 \\ 2 \\ -4 \\ 5 \end{bmatrix}$ ， $\bx$ 的平均值 $\mu_\bx = \frac{1}{5}(0+(-1)+3+0+(-2)) = 0$ ， $\bx$ 的變異數 $\sigma_\bx^2 = \frac{1}{5-1}((0-0)^2+(-1-0)^2+(3-0)^2+(0-0)^2+(-2-0)^2) = \frac{14}{4}$ ， $\by$ 的平均值 $\mu_\by = \frac{1}{5}((-2)+(-1)+2+(-4)+5) = 0$ ， $\by$ 的變異數 $\sigma_\by^2 = \frac{1}{5-1}((-2-0)^2+(-1-0)^2+(2-0)^2+(-4-0)^2+(5-0)^2) = \frac{50}{4}$ ， $\bx$ 和 $\by$ 的共變異數 $\sigma_{\bx\by}^2 = \frac{1}{5-1}((0-0)(-2-0)+(-1-0)(-1-0)+(3-0)(2-0)+(0-0)(-4-0)+(-2-0)(5-0)) = \frac{-3}{4}$ 。 ##### Exercise 1(b) 將 $X$ 的各行當成資料的特徵。求特徵和特徵之間的共變異數矩陣。 $Ans :$ 令 $X_0 = X-\frac{1}{N}JX = \begin{bmatrix} 0 & -2 \\ -1 & -1 \\ 3 & 2 \\ 0 & -4 \\ -2 & 5 \end{bmatrix} - \frac{1}{5}\begin{bmatrix} 1 & 1 & 1 & 1 &1 \\ 1 & 1 & 1 & 1 &1 \\ 1 & 1 & 1 & 1 &1 \\ 1 & 1 & 1 & 1 &1 \\ 1 & 1 & 1 & 1 &1 \end{bmatrix}\begin{bmatrix} 0 & -2 \\ -1 & -1 \\ 3 & 2 \\ 0 & -4 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 0 & -2 \\ -1 & -1 \\ 3 & 2 \\ 0 & -4 \\ -2 & 5 \end{bmatrix}$ ，共變異數矩陣 $C = \frac{1}{N-1}X_0\trans X_0 = \frac{1}{5-1}\begin{bmatrix} 0 & -1 & 3 & 0 & -2 \\ -2 & -1 & 2 & -4 & 5 \end{bmatrix}\begin{bmatrix} 0 & -2 \\ -1 & -1 \\ 3 & 2 \\ 0 & -4 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} \frac{14}{4} & \frac{-3}{4} \\ \frac{-3}{4} & \frac{50}{4} \end{bmatrix}$ 。 ## Exercises ##### Exercise 2 說明資料矩陣的共變異數矩陣必定是半正定。 $Ans :$ 資料矩陣的共變異數矩陣 $C = \frac{1}{N-1}X_0\trans X_0 = (\frac{1}{\sqrt{N-1}}X_0\trans)(\frac{1}{\sqrt{N-1}}X_0) = M\trans M$ 為格拉姆矩陣，故資料矩陣的共變異數矩陣必定是半正定。 :::success Bingo! ::: ##### Exercise 3 令 $\bx$ 和 $\by$ 為兩筆平均值為 $0$ 的資料。令 $X$ 為由 $\bx$ 和 $\by$ 作為行向量的資料矩陣。若已知 $C = \frac{1}{N-1}X\trans X$ 為其共變異數矩陣。利用 $C$ 來求 $\bx + \by$ 的變異數。 $Ans:$ 由於 $\bx$ 和 $\by$ 為兩筆平均值為 $0$ 的資料，因此 $x - \mu_\bx = x$ 及 $y -\mu_\by = y$，令 $\bx = \begin{bmatrix} x_1 \\ | \\ x_n \end{bmatrix} 、 \by = \begin{bmatrix} y_1 \\ | \\ y_n \end{bmatrix}$ ， $C = \frac{1}{N-1}X\trans X = \frac{1}{N-1}\begin{bmatrix}-x\trans-\\-y\trans- \end{bmatrix}\begin{bmatrix} |& | \\x& y \\|& | \end{bmatrix} = \begin{bmatrix} \sigma_{x}^2& \sigma_{xy}^2\\ \sigma_{xy}^2& \sigma_{y}^2\end{bmatrix}$. 令 $C = \begin{bmatrix} a& b\\ b& c\end{bmatrix}$. $$ \begin{aligned} \sigma_{x+y} &= \frac{1}{N-1}\inp{\bx + \by}{\bx + \by}\\ &= \frac{1}{N-1}((x_1+y_1)^2+( x_2+y_2)^2+…+( x_n+y_n)^2)\\ &= \frac{1}{N-1}((x_1^2+2 x_1y_1+y_1^2) +(x_2^2+2 x_2y_1+y_2^2)+…+(x_n^2+2 x_ny_n+y_n^2))\\ &= \frac{1}{N-1}((x_1^2+…+x_n^2) +2(x_1y_1+…+ x_ny_n)+ (y_1^2+…+y_n^2))\\ &= \sigma_{x}^2 + \sigma_{xy}^2 + \sigma_{xy}^2 + \sigma_{y}^2\\ &= a + 2b + c . \end{aligned} $$ ##### Exercise 4 執行以下程式碼。每張圖中都代表一筆二維資料，每個點的 $x$-座標組成一個特徵向量 $\bx$，而每個點的 $y$-座標組成一個特徵向量 $\by$。 ```python ### code import numpy as np import matplotlib.pyplot as plt fig = plt.figure(figsize=(12,4)) axs = fig.subplots(1, 3) mu = np.array([0,0]) covs = [np.array([[5,0],[0,1]]), np.array([[1,0.9], [0.9,1]]), np.array([[1,-0.9], [-0.9,1]])] for i in range(3): axs[i].set_xlim(-5,5) axs[i].set_ylim(-5,5) data = np.random.multivariate_normal(mu, covs[i], (100,)) axs[i].scatter(*data.T) ``` ##### Exercise 4(a) 利用圖形來判斷哪一筆資料中的 $\bx$ 的變異數最大。 $Ans:$ ![](https://i.imgur.com/KgZnzVa.png) 左邊那張圖的點相對於其他兩張圖的點分布較為零散，且點的 $\bx$ 座標大約分布在 $-4$ 到 $4$ 之間，其他兩張點的 $\bx$ 座標約分布在 $-2$ 和 $2$ 之間。因此在 $\bx$ 座標上，左邊的圖每點與平均數的差平方和最大， $\bx$ 的變異數也最大。 ##### Exercise 4(b) 判斷每張圖中 $\bx$ 和 $\by$ 的共變異數為正、負、或是零。 $Ans:$ 在左邊的圖上，每個點之間的 $\bx$ 和 $\by$ 座標並無呈現明顯變化趨勢，因此共變異數約為零。在中間的圖上，每個點之間的 $\bx$ 和 $\by$ 座標變化趨勢一致，即點的 $\bx$ 軸越大， $\by$ 軸越大，因此共變異數為正。而右邊的圖上，每個點之間的 $\bx$ 和 $\by$ 座標變化趨勢相反，即點的 $\bx$ 軸越大， $\by$ 軸越小，因此共變異數為負。 :::success 怎麼那麼棒 :smiley: ::: ##### Exercise 5 查找資料並解釋 `np.random.multivariate_normal` 的用法。 :::warning - [x] 通常程式碼、或是變數名稱的部份都會用 `mean` 這種格式 - [x] 中英數之間空格 ::: $Ans:$ 此函數定義: `numpy.random.multivariate_normal(mean, cov, size, check_valid, tol)` 其中 mean 和 cov 為必要的參數，而 size，check_valid 以及 tol 為可選參數。輸入: mean: 欲生成矩陣的特徵數 (N)。 cov: 設定多元常態分佈的共變異數矩陣，此矩陣必須是對稱矩陣和半正定矩陣(爲 N 乘 N 的矩陣)。 size: $size = (n, m)$，其中 n 控制要生成區塊矩陣的數量，m 控制的是區塊矩陣的列數。若該值未給定，則返回一個特徵數為 N 的樣本，其中 N 為 mean 的值。 check_valid: 當共變異數(上面的 cov )矩陣不是半正定矩陣時，程式的處理方式（一共有三種方式：{ ‘warn’, ‘raise’, ‘ignore’ }）。igore: 忽略共變異數矩陣不是半正定矩陣的問題，生成數組。warn: 輸出警告，但是還是會生成數組。raise: 程式報錯，且不會生成數組。輸出: 隨機生成一個符合常態分佈的 $nm\times N$ 矩陣。 ##### Exercise 6 令 $\bx$ 和 $\by$ 為兩筆長度為 $N$ 的資料、而 $\mu_\bx$ 和 $\mu_\by$ 分別為它們的平均數。以下探討計算共變異數、變異數不同計算方法。 ##### Exercise 6(a) 證明 $\bx$ 和 $\by$ 的共變異數也可寫成 $$ \frac{1}{N-1}(\inp{\bx}{\by} - N\mu_\bx\mu_\by). $$ :::warning - [x] 最後半型句點 ::: $Ans:$ $\bx$ 和 $\by$ 的共變異數 $$ \begin{aligned} \sigma_{\bx\by}^2 &= \frac{1}{N-1}\inp{\bx - \mu_\bx\bone}{\by - \mu_\by\bone} \\ &= \frac{1}{N-1}\left[(x_1 - \mu_\bx)(y_1 - \mu_\by) + \cdots + (x_N - \mu_\bx)(y_N - \mu_\by)\right] \\ &= \frac{1}{N-1}(x_1y_1 - x_1\mu_\by - y_1\mu_\bx + \mu_\bx\mu_\by + \cdots + x_Ny_N - x_N\mu_\by - y_N\mu_\bx + \mu_\bx\mu_\by) \\ &= \frac{1}{N-1}(\inp{\bx}{\by} + N\mu_\bx\mu_\by - \mu_\by\inp{\bx}{\bone} - \mu_\bx\inp{\by}{\bone}) \\ &= \frac{1}{N-1}(\inp{\bx}{\by} + N\mu_\bx\mu_\by - N\mu_\by \times \frac{1}{N}\inp{\bx}{\bone} - N\mu_\bx \times \frac{1}{N}\inp{\by}{\bone}) \\ &= \frac{1}{N-1}(\inp{\bx}{\by} + N\mu_\bx\mu_\by - N\mu_\by\mu_\bx - N\mu_\bx\mu_\by) \\ &= \frac{1}{N-1}(\inp{\bx}{\by} - N\mu_\bx\mu_\by). \end{aligned} $$ ##### Exercise 6(a) 證明 $\bx$ 的變異數也可寫成 $$ \frac{1}{N-1}(\|\bx\|^2 - N\mu_\bx^2). $$ $Ans:$ $\bx$ 的變異數 $$ \begin{aligned} \sigma_\bx^2 &= \frac{1}{N-1}\inp{\bx - \mu_\bx\bone}{\bx - \mu_\bx\bone} \\ &= \frac{1}{N-1}\left[(x_1 - \mu_\bx)^2 + \cdots + (x_N - \mu_\bx)^2\right] \\ &= \frac{1}{N-1}(x_1^2 - 2x_1\mu_\bx + \mu_\bx^2 + \cdots + x_N^2 - 2x_N\mu_\bx + \mu_\bx^2) \\ &= \frac{1}{N-1}(\inp{\bx}{\bx} + N\mu_\bx^2 - 2\mu_\bx\inp{\bx}{\bone}) \\ &= \frac{1}{N-1}(\|\bx\|^2 + N\mu_\bx^2 - 2N\mu_\bx \times \frac{1}{N}\inp{\bx}{\bone}) \\ &= \frac{1}{N-1}(\|\bx\|^2 + N\mu_\bx^2 - 2N\mu_\bx^2) \\ &= \frac{1}{N-1}(\|\bx\|^2 - N\mu_\bx^2). \end{aligned} $$ :::success Well done! ::: :::info 分數 = 6.5 :::

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.