# CHEATSHEET

## ACTIVATION FUNCTIONS

$$Step(x)= \begin{cases} 0 & \text{if } x<0 \\ 1 & \text{if } x\ge0 \end{cases}; \quad Sign(x)= \begin{cases} -1 & \text{if } x<0 \\ 1 & \text{if } x\ge0 \end{cases}$$

$$ReLU(x)= \max(0, x) = \begin{cases} 0 & \text{if } x<0 \\ x & \text{if } x\ge0 \end{cases}$$

$$LeakyReLU(x)= \max(\alpha x, x)=\begin{cases} x & \text{if } x\ge0 \\ \alpha x & \text{if } x<0 \end{cases}, \quad 0<\alpha<1$$

$$Linear(x)=x; \quad Sigmoid(x)=\frac{1}{1+e^{-x}}; \quad Tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}$$

## PLA - PERCEPTRON LEARNING ALGORITHM

- **Input**: $D=\{(x_1, y_1),(x_2, y_2),\dots,(x_N, y_N)\}$ with $N$ samples and labels $y_i \in \{-1, 1\}$; perceptron model $y = f_w(x)$

1. _Set $w_0=-\text{threshold}$ and $x_0=1$ (if a threshold is used)_
2. _Initialize the weight vector $w$ to a fixed value (usually $w=0$)_
3. _Compute $y_{pi} = f(w^T x_i)$. If $y_{pi} \ne y_i$, update $w=w+y_ix_i$_
4. Set $i = i + 1$ and go back to step 3, cycling through the samples until a full pass makes no update (this only terminates if the data are linearly separable)

- Model accuracy = number of correctly predicted points / number of test points

## LINEAR REGRESSION

- **Input**: $D=\{(x_1, y_1),(x_2, y_2),\dots,(x_N, y_N)\}$ with $N$ samples; model $y = f_w(x)$
- **Fitting the model (finding $w$)**:
  - Build $X=\begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}, y=\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$; then $w=(X^TX)^{-1}X^Ty$ and $y_p=\begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_N) \end{bmatrix}$
  $$MSE= \frac{1}{N}\|y_p - y\|^2$$
  - If the samples are weighted, put the weights on the diagonal of a matrix $A$ and replace every $X^T$ in the formula for $w$ with $X^TA$: $w=(X^TAX)^{-1}X^TAy$
  - If the problem asks you to train the model from a given $w$, use **Gradient Descent** with gradient $\nabla_w MSE_{\text{train}} = \frac{2}{N}\left(X^TXw - X^Ty\right)$; dropping the constant $\frac{1}{N}$ gives the commonly used form $2X^TXw-2X^Ty$, which only rescales the learning rate

## GRADIENT DESCENT/ASCENT

Find a local minimum/maximum of $f(\theta)$ with $\theta = (\theta_1, \theta_2,\dots,\theta_N)$, tolerance $\varepsilon$ and learning rate $\eta \in (0,1)$:

1. Derive the general gradient vector with respect to $\theta$: $\nabla_\theta f = [\frac{\partial f}{\partial \theta_1}, \frac{\partial f}{\partial \theta_2}, \dots, \frac{\partial f}{\partial \theta_N}]^T$
2. Set $i = 0$
3. Randomly initialize $\theta^{(i)}$
4. If $\|\nabla_{\theta^{(i)}} f \|< \varepsilon$, stop and return $\theta^{(i)}$
5. $\theta^{(i+1)} = \theta^{(i)} \mp \eta\,\nabla_{\theta^{(i)}}f$ (use $-$ for descent, $+$ for ascent)
6. Set $i = i + 1$ and go back to step 4

## COMPUTATIONAL GRAPH

Given a vector $u$ of size $m$, a vector $v$ of size $n$, matrices $a$ ($m \times n$) and $b$ ($n \times p$), and real numbers $x, m, n, p$. Derivative rules:

- $f(x)=f_1(x)+f_2(x) \rightarrow \frac{\partial f}{\partial x}=\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial x}$
- $f(x)=f_1(x)-f_2(x) \rightarrow \frac{\partial f}{\partial x}=\frac{\partial f_1}{\partial x}-\frac{\partial f_2}{\partial x}$
- $f(x)=f_1(x)\,f_2(x) \rightarrow \frac{\partial f}{\partial x}=f_2\frac{\partial f_1}{\partial x}+f_1\frac{\partial f_2}{\partial x}$
- $f(x)=\frac{f_1(x)}{f_2(x)} \rightarrow \frac{\partial f}{\partial x}=\frac{1}{f_2}\frac{\partial f_1}{\partial x}-\frac{f_1}{f_2^2}\frac{\partial f_2}{\partial x}$
- $f(g(x)) \rightarrow \frac{\partial f}{\partial x}=\frac{\partial f}{\partial g} \frac{\partial g}{\partial x} \text{ (Chain rule)}$
- $f(x)=x^n \rightarrow \frac{\partial f}{\partial x}=n\,x^{n-1}$
- $f(x)=e^x \rightarrow \frac{\partial f}{\partial x}=e^x$
- $f(x)=\sigma(x)=\frac{1}{1+e^{-x}} \rightarrow \frac{\partial f}{\partial x}=f(x)(1-f(x))$
- $f(x)=\tanh(x) \rightarrow \frac{\partial f}{\partial x}=1-f^2(x)$
- $v=f(u) \rightarrow \frac{\partial v}{\partial u}=\begin{bmatrix} \frac{\partial v_1}{\partial u_1} &\cdots & \frac{\partial v_n}{\partial u_1} \\ \vdots & \ddots &\vdots \\ \frac{\partial v_1}{\partial u_m} & \cdots & \frac{\partial v_n}{\partial u_m} \end{bmatrix} \in \mathbb{R}^{m \times n} \text{ (Jacobian matrix)}$
- $c=a\cdot b \rightarrow \frac{\partial c}{\partial a}=b^T$

### Practical computations and examples

- ==L2 gate==:
  ![Screenshot 2024-01-01 at 21.00.50](https://hackmd.io/_uploads/B1ejBSeup.png)
  \begin{gather} z = L_2(x) = \sum_{i=1}^nx^2_i \\ \\ \frac{\partial{L}}{\partial{x}} = 2x\frac{\partial{L}}{\partial{z}} \end{gather}
- ==Matrix-matrix multiplication gate (matrix-vector is analogous)==, for $Z = WX$:
  ![Screenshot 2024-01-01 at 21.05.11](https://hackmd.io/_uploads/BJNoIBxOT.png)
  \begin{gather} \frac{\partial{L}}{\partial{W}} = \frac{\partial{L}}{\partial{Z}}X^T\\ \\ \frac{\partial{L}}{\partial{X}} = W^T\frac{\partial{L}}{\partial{Z}} \end{gather}
- ==Sigmoid gate (element-wise)==, for $z = \sigma(x)$:
  ![Screenshot 2024-01-01 at 21.11.26](https://hackmd.io/_uploads/Sy_MuHxOT.png)
  \begin{gather} \frac{\partial{L}}{\partial{x}} = (1-z)z\frac{\partial{L}}{\partial{z}}\\ \end{gather}
- ==Softmax gate==:
  ![Screenshot 2024-01-01 at 21.13.04](https://hackmd.io/_uploads/Hy6_uBlO6.png)
  \begin{gather} p_i=\frac{e^{z_i}}{\sum_{k=1}^n e^{z_k}}\\\\ \frac{\partial{p_j}}{\partial{z_i}} = p_i(1-p_i), i=j \\\\ \frac{\partial{p_j}}{\partial{z_i}} = -p_ip_j, i\neq j\\ \end{gather}
- ==Softmax gate combined with the cross-entropy (CE) loss==: for a one-hot target $t$, the gradient with respect to the softmax input simplifies to $\frac{\partial L}{\partial z_i} = p_i - t_i$.
  ![Screenshot 2024-01-01 at 21.21.20](https://hackmd.io/_uploads/B1hv9Be_T.png)
  ![Screenshot 2023-12-31 at 22.25.51](https://hackmd.io/_uploads/SkkfuWk_T.png =1000x)

### Backpropagation

![Screenshot 2024-01-01 at 15.58.56](https://hackmd.io/_uploads/r1tJyWxOa.png =1000x)

- Compute the values at the nodes: $a^1 = \begin{bmatrix}0.964\\-0.995\end{bmatrix},\,a^2 = \begin{bmatrix}-0.005\\0.961\end{bmatrix},\,a^3 = \begin{bmatrix}1.034\end{bmatrix},\,C = \frac{(1.034-4)^2}{1} = 8.797$
- To update the three weights $w^1, w^2, w^3$, compute the partial derivative of $C$ with respect to each weight:
  - Derivative of $C$ with respect to $w^1$: $\frac{\partial{C}}{\partial{w^1}} = \frac{\partial{C}}{\partial{a^3}}\frac{\partial{a^3}}{\partial{a^2}}\frac{\partial{a^2}}{\partial{a^1}}\frac{\partial{a^1}}{\partial{w^1}}$

## LOGISTIC REGRESSION

- Logistic function: $f(x)=\sigma(x)=\frac{1}{1+e^{-x}}$
- Likelihood for $y \in \{0, 1\}$: $P(y|x,w)=\sigma(w^Tx)^y\,(1-\sigma(w^Tx))^{1-y}$
- Train the model with **Gradient Descent** on the average negative log-likelihood:
$$f(w)=-\frac{1}{N} \sum_{i=1}^N\left(y_i\log\sigma(w^Tx_i)+(1-y_i)\log(1-\sigma(w^Tx_i))\right)$$
$$\nabla_wf=\frac{1}{N} \sum_{i=1}^Nx_i\left(\sigma(w^Tx_i)-y_i\right)$$

## DECISION TREE

Let $S$ be a dataset with $C$ classes, $p = \{p_i\}^C_{i=1}$ the fraction of elements of $S$ that belong to class $i$, and $A$ an attribute (one column of $S$).

==1. Entropy (disorder):==
$$\textbf{Entropy}(S) = -\sum_{i=1}^Cp_i\log_2p_i$$
1.1 Average entropy over $A$:
$$\textbf{AE}(S, A) = \sum_{v\in{Values(A)}} \frac{|S_v|}{|S|}Entropy(S_v)$$
1.2 Information gain (choose the attribute with the largest gain):
$$\textbf{Gain}(S,A) = Entropy(S) - AE(S,A)$$

==2. Gini impurity:==
$$\textbf{GiniImp}(S) = 1 - \sum_{i=1}^Cp_i^2$$
2.1 Gini index over $A$ (choose the attribute with the smallest index):
$$\textbf{GiniIndex}(S, A) = \sum_{v\in{Values(A)}}\frac{|S_v|}{|S|}GiniImp(S_v)$$

==3. Misclassification impurity (error):==
$$\textbf{MisImp}(S) = 1 - \max\{p_i\}^C_{i=1}$$
3.1 Misclassification index over $A$ (choose the attribute with the smallest index):
$$\textbf{MisIndex}(S, A) = \sum_{v\in{Values(A)}}\frac{|S_v|}{|S|}MisImp(S_v)$$

==4. Association rules of the form $X \rightarrow Y$, i.e. $If\,X\,Then\,Y$==
- Support: $$\textbf{support}(X, Y) = P(X, Y) = \frac{\#count(X,Y)}{\text{total samples}}$$
- Confidence: $$\textbf{confidence}(X \rightarrow Y) = P(Y\,|\,X) = \frac{\#count(X, Y)}{\#count(X)}$$
- How to write the rules:
  $\textbf{IF} (Outlook = Sunny)\,\wedge\,(Humidity = High) \textbf{ THEN } PlayTennis = No$
  $\textbf{ELIF} (Outlook = Rain)\,\wedge\,(Wind = Weak) \hspace{0.5cm}\textbf{ THEN } PlayTennis = Yes$
  $\textbf{ELIF} \hspace{8.75cm} \textbf{ THEN } \textbf{failure}$

## CONVOLUTION

### 1. Convolution

![Screenshot 2024-01-01 at 20.55.03](https://hackmd.io/_uploads/H17BVrxOa.png)

### 2. Transposed Convolution

![Screenshot 2023-12-31 at 12.34.58](https://hackmd.io/_uploads/ByAY6_0wp.png)
![Screenshot 2023-12-31 at 12.34.25](https://hackmd.io/_uploads/ry6D6_RvT.png)
![Screenshot 2023-12-31 at 12.35.18](https://hackmd.io/_uploads/Bk7jaO0P6.png)

### 3. Dilated convolutions

![Screenshot 2023-12-31 at 12.39.20](https://hackmd.io/_uploads/B1bo0OCP6.png)
![image](https://hackmd.io/_uploads/HJoxOZZ_6.png)

### 4. Self-attention Layer

![Screenshot 2024-01-01 at 21.55.27](https://hackmd.io/_uploads/ryjDz8euT.png)

- **Example:** $X = (x_1, x_2, x_3) = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix}$ (each column $x_i$ is one input vector), $W_K = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1\\ 1 & 0 & 0 & 0 \end{bmatrix}$, $W_Q = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1\\ 1 & 0 & 1 & 1 \end{bmatrix}$, $W_V = \begin{bmatrix} 0 & 0 & 1 & 1\\ 2 & 3 & 0 & 1\end{bmatrix}$
  $K = W_KX = \begin{bmatrix} 0 & 4 & 2 \\ 1 & 4 & 3 \\ 1 & 0 & 1\end{bmatrix}$, $Q = W_QX = \begin{bmatrix} 1 & 2 & 2 \\ 0 & 2 & 1 \\ 2 & 2 & 3\end{bmatrix}$, $V = W_VX = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 8 & 6\end{bmatrix}$
  Attention score matrix: $S = (s_1, s_2, s_3) = Q^TK = \begin{bmatrix} 2 & 4 & 4 \\ 4 & 16 & 12 \\ 4 & 12 & 10\end{bmatrix}$
  Softmax of each column: $\alpha = (\alpha_1, \alpha_2, \alpha_3) = \begin{bmatrix} 0.063 & 0.006 \times 10^{-3} & 0.295 \times 10^{-3} \\ 0.468 & 0.982 & 0.881 \\ 0.468 & 0.018 & 0.119\end{bmatrix}$
  Outputs: $y_1 = V\alpha_1 = \begin{bmatrix} 1.937\\ 6.683\end{bmatrix}$, $y_2 = V\alpha_2 = \begin{bmatrix} 2.000\\ 7.964\end{bmatrix}$, $y_3 = V\alpha_3 = \begin{bmatrix} 2.000\\ 7.760\end{bmatrix}$

## Two-class confusion matrix

Consider a two-class problem with classes (+) and (-):
- Number of true positives: TP
- Number of true negatives: TN
- Number of false positives: FP
- Number of false negatives: FN
- Number of actual positives: P = TP + FN
- Number of actual negatives: N = TN + FP

![Screenshot 2023-12-31 at 20.32.43](https://hackmd.io/_uploads/H1YYaJy_a.png =400x)

### Classification related metrics

- **Accuracy** (acc): the number of correct predictions divided by the total number of predictions. $$acc = \frac{TP + TN}{P + N}$$
- **Error rate** (err): $$err = 1 - acc$$
- **Precision** (pre): $$pre = \frac{TP}{TP + FP}$$
- **Sensitivity** (recall, hit rate, or true positive rate tpr): the fraction of (+) samples that are predicted correctly. $$tpr = \frac{TP}{P}$$
- **Specificity** (selectivity or true negative rate tnr): the fraction of (-) samples that are predicted correctly. $$tnr = \frac{TN}{N}$$
- **F1-score** (F1): the harmonic mean of **Precision** and **Sensitivity** $$F1 = \frac{2 \cdot precision \cdot sensitivity}{precision + sensitivity}$$
- **Fall-out** or false positive rate (fpr): $$fpr = \frac{FP}{N}$$

### Regression related metrics

- **Mean squared error (MSE)**: the mean of the squared errors between predicted and true values $$MSE=\frac{1}{n} \sum_{i=1}^n\left(y_i-\hat{y}_i\right)^2$$
- **Cross-Entropy (CE)** is a measure of the difference between two probability distributions. The cross-entropy between a "true" distribution $\textbf{t} = (t_1, \dots, t_C)$ and an estimated distribution $p = (p_1, \dots, p_C)$ is: $$CE(p, t) = -\sum_{i=1}^Ct_i\ln{p_i}$$
- Example: ![Screenshot 2024-01-01 at 15.20.43](https://hackmd.io/_uploads/Sk9yLxlda.png =500x)

## MULTILAYER PERCEPTRON (MLP)

An MLP is a computational graph made of several layers. The first layer is the input layer, the last layer is the output layer, and the remaining layers are hidden layers. Each layer contains several nodes (neurons), and nodes are connected by links that propagate signals.
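A minimal sketch of such a graph in plain Python, assuming one sigmoid hidden layer and a softmax output trained with cross-entropy. The names `init_params`, `forward`, `backward` and `count_params` and the uniform initialisation are hypothetical, chosen for illustration. The backward pass applies the gate rules listed earlier: softmax + CE gives $p - t$, the matrix-multiplication gate gives $\frac{\partial L}{\partial Z}h^T$ and $W^T\frac{\partial L}{\partial Z}$, and the sigmoid gate multiplies by $h(1-h)$. `count_params` applies the "(previous layer nodes + 1) × layer nodes" count, where the $+1$ is the bias.

```python
import math
import random


def matvec(W, x):
    # (W x)_j = sum_k W[j][k] * x[k]; W[j][k] is the weight from node k to node j
    return [sum(w_jk * x_k for w_jk, x_k in zip(row, x)) for row in W]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)  # subtract the max so exp() cannot overflow
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_params(n_in, n_hidden, n_out, seed=0):
    # Hypothetical initialisation: uniform weights in [-1, 1], zero biases
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    return W1, [0.0] * n_hidden, W2, [0.0] * n_out


def forward(params, x, t):
    """Input -> sigmoid hidden layer -> softmax output; returns (cache, CE loss)."""
    W1, b1, W2, b2 = params
    h = sigmoid([z + b for z, b in zip(matvec(W1, x), b1)])
    p = softmax([z + b for z, b in zip(matvec(W2, h), b2)])
    loss = -sum(ti * math.log(pi) for ti, pi in zip(t, p))
    return (x, h, p), loss


def backward(params, cache, t):
    """Return (dW1, db1, dW2, db2) built from the gate rules."""
    _, _, W2, _ = params
    x, h, p = cache
    dz2 = [pi - ti for pi, ti in zip(p, t)]            # softmax + CE gate: p - t
    dW2 = [[d * hk for hk in h] for d in dz2]           # matmul gate: dL/dZ * h^T
    dh = [sum(W2[j][k] * dz2[j] for j in range(len(dz2)))
          for k in range(len(h))]                       # matmul gate: W^T * dL/dZ
    dz1 = [hk * (1 - hk) * g for hk, g in zip(h, dh)]   # sigmoid gate: h(1 - h) * dL/dh
    dW1 = [[d * xk for xk in x] for d in dz1]
    return dW1, dz1, dW2, dz2  # bias gradients equal dL/dz of each layer


def count_params(layer_sizes):
    # (previous layer nodes + 1) * layer nodes, summed over all layers
    return sum((a + 1) * b for a, b in zip(layer_sizes, layer_sizes[1:]))


params = init_params(4, 3, 2)
cache, loss = forward(params, [0.5, -1.0, 2.0, 0.1], [0.0, 1.0])
grads = backward(params, cache, [0.0, 1.0])
print(count_params([4, 3, 4, 2]))  # 41
```

Plain lists keep the sketch dependency-free. A finite-difference check on any single weight is a quick way to confirm that the hand-derived gradients match the loss.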
- $w^{l}_{jk}$ is the weight of the link from node $k$ in layer $l-1$ to node $j$ in layer $l$
- The parameters of each layer are a weight matrix $w^l$ and a bias vector $b^l$

![Screenshot 2023-12-31 at 14.43.13](https://hackmd.io/_uploads/S1xijcAw6.png =400x)

### Counting the total number of parameters and variables

- Number of parameters per layer: $N = (n_{\text{prev}} + 1) \times n_{\text{layer}}$, where $n_{\text{prev}}$ is the number of nodes in the previous layer, $n_{\text{layer}}$ is the number of nodes in the current layer, and the $+1$ accounts for the bias

![Screenshot 2023-12-31 at 20.29.04](https://hackmd.io/_uploads/Hyxh3yk_T.png =400x)

- For the network above (layers of 4, 3, 4 and 2 nodes) there are $(4+1)\cdot 3 + (3+1)\cdot 4 + (4+1)\cdot 2 = 15 + 16 + 10 = 41$ parameters and $4 + 3 + 4 + 2 = 13$ variables (nodes)

### Softmax Gate

$$f_t(x) = f_t(x_1, x_2,\dots,x_k) = \frac{e^{x_t}}{\sum_{j=1}^{k}e^{x_j}}, \quad \text{for } t = 1,\dots,k$$

- A numerically more stable version (avoids overflow):
$$f_t(x) = f_t(x_1, x_2,\dots,x_k) = \frac{e^{x_t-M}}{\sum_{j=1}^{k}e^{x_j-M}}, \quad \text{where } M = \max(x_1,x_2,\dots,x_k)$$

### Cost Function

- When training a neural network, the choice of output layer depends on the problem being learned:
  - Classification: sigmoid, softmax, ... $\rightarrow$ the cost function is $Cross\,Entropy$
  - Regression: linear, ... $\rightarrow$ the cost function is $MSE$

## STATISTICAL LEARNING

### 1. Bayes Theorem

$$P(h \mid \mathcal{D})=\frac{P(\mathcal{D} \mid h) P(h)}{P(\mathcal{D})}$$

- Where:
  - $P(h)$: prior probability (belief)
  - $P(\mathcal{D})$: probability of the dataset
  - $P(h \mid \mathcal{D})$: posterior probability (belief)
  - $P(\mathcal{D} \mid h)$: likelihood

### 2. Basic probability rules

- Sum rule, for random variables $X, Y$:
  - $p(X)=\sum_Y p(X, Y)$
  - $p(X)=\int p(X, Y)\, dY$ (if $Y$ is continuous)
- Product rule: $p(X, Y)=p(Y \mid X)\, p(X)$

### 3. Bayes Learning

- Definition: the process of updating beliefs over the hypothesis space based on the dataset $\mathcal{D}$.
$$P(h_i \mid \mathcal{D}) = \alpha\, P(\mathcal{D} \mid h_i)\, P(h_i)$$
- Where:
  - $P(h_i)$ is the prior belief.
  - $\alpha$ is the normalization constant.
  - $P(\mathcal{D} \mid h_i)$ is the likelihood.
- Predictions average over all hypotheses, each weighted by its contribution, i.e. its posterior $P(h_i \mid \mathcal{D})$:
$$P(y \mid \mathcal{D}) = \sum_i P(y \mid h_i)\, P(h_i \mid \mathcal{D})$$

### 4. Naive Bayes Algorithm

**Maximum likelihood learning** for $\hat{P}(y=c)$ and $\hat{P}\left(x_i=a \mid y=c\right)$:
$$
\begin{gathered}
\hat{P}(y=c) \leftarrow \frac{n_c}{n} \\
\hat{P}\left(x_i=a \mid y=c\right) \leftarrow \frac{n_a}{n_c}
\end{gathered}
$$
- $n$ is the number of training examples
- $n_c$ is the number of training examples for which $y=c$
- $n_a$ is the number of examples for which $y=c$ and $x_i=a$

### 5. Naive Bayes model

![Screenshot 2024-01-01 at 14.19.26](https://hackmd.io/_uploads/B1sKP1eO6.png)
![Screenshot 2024-01-01 at 14.21.08](https://hackmd.io/_uploads/BJ4l_1xO6.png)

### 6. Avoiding the zero-probability problem

A typical solution is the Bayesian (add-one) estimate for $\hat{P}(y=c)$ and $\hat{P}\left(x_i=a \mid y=c\right)$:
$$
\begin{gathered}
\hat{P}(y=c) \leftarrow \frac{n_c+1}{n+C} \\
\hat{P}\left(x_i=a \mid y=c\right) \leftarrow \frac{n_a+1}{n_c+r}
\end{gathered}
$$
- $n$ is the number of training examples
- $n_c$ is the number of training examples for which $y=c$
- $C$ is the number of classes
- $n_a$ is the number of examples for which $y=c$ and $x_i=a$
- $r$ is the number of values of attribute $x_i$

### 7. Bigram

![Screenshot 2024-01-01 at 20.20.49](https://hackmd.io/_uploads/HJLUnNlda.png)
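The counting estimates from the Naive Bayes sections (4: maximum likelihood, 6: add-one smoothing) can be sketched in plain Python as shown below. This sketch assumes categorical attributes stored as tuples and takes $r$ to be the number of distinct values of $x_i$ seen in the training data. `train_naive_bayes` and `predict` are hypothetical names. `predict` picks the class that maximises $\hat{P}(y=c)\prod_i \hat{P}(x_i=a \mid y=c)$.

```python
from collections import Counter


def train_naive_bayes(X, y, smoothing=True):
    """X: list of tuples of categorical attribute values, y: list of class labels.

    Returns (classes, prior, cond): prior(c) estimates P(y=c) and
    cond(i, a, c) estimates P(x_i=a | y=c).
    """
    n = len(y)
    classes = sorted(set(y))
    C = len(classes)                       # number of classes
    n_c = Counter(y)                       # n_c[c]: examples with y = c
    # n_a[(i, a, c)]: examples with y = c and x_i = a
    n_a = Counter((i, a, c) for row, c in zip(X, y) for i, a in enumerate(row))
    # Assumption: r[i] = number of distinct values of attribute x_i in the training data
    r = [len({row[i] for row in X}) for i in range(len(X[0]))]

    def prior(c):
        if smoothing:
            return (n_c[c] + 1) / (n + C)  # Bayesian (add-one) estimate
        return n_c[c] / n                  # maximum likelihood estimate

    def cond(i, a, c):
        if smoothing:
            return (n_a[(i, a, c)] + 1) / (n_c[c] + r[i])
        return n_a[(i, a, c)] / n_c[c]

    return classes, prior, cond


def predict(model, x):
    """Return the class c maximising P(y=c) * prod_i P(x_i | y=c)."""
    classes, prior, cond = model

    def score(c):
        s = prior(c)
        for i, a in enumerate(x):
            s *= cond(i, a, c)
        return s

    return max(classes, key=score)


X = [("sunny", "hot"), ("sunny", "cool"), ("rain", "cool"), ("rain", "hot")]
y = ["no", "no", "yes", "yes"]
print(predict(train_naive_bayes(X, y), ("sunny", "hot")))  # no
```

With `smoothing=False`, the value "sunny" never occurs with class "yes", so $\hat{P}(x_0=\text{sunny} \mid y=\text{yes}) = 0$ and the whole product for "yes" collapses to zero. The add-one estimate keeps it at $\frac{0+1}{2+2} = 0.25$.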
