# Deep Learning

###### tags: `machine learning`

## Multi-layer Perceptron

### Limitation: XOR

**Feature selection** is a method to reduce the number of variables: use certain criteria to select the variables that are most useful for the model to predict the target.

## Deep Learning

$$\mathbf{y} = \sigma(\mathbf{A}\mathbf{x} + \mathbf{b})$$

### Why non-linearity? Why deep?

It can model more complex functions.

Example: the XOR function is not implementable by a single perceptron.

## Recurrent Neural Network (RNN)

Used when the data (or labels) have a sequential relationship (e.g., in audio, earlier and later frames depend on each other).

Idea: maintain a hidden state $\mathbf{h}_t$ such that

$$\mathbf{y}_t,\ \mathbf{h}_t = \sigma(\mathbf{A}\mathbf{x}_t + \mathbf{B}\mathbf{h}_{t-1})$$

## Convolutional Neural Network (CNN)

Image recognition: a feature (receptive field) contains a "pattern" that may appear "any place" in the picture -> no need for a fully connected layer!

- Padding: fill in values where the filter extends beyond the image border.
  - Zero padding: pad with zeros.
- Pooling: keeping only certain fixed rows/columns does not change the essential properties of the image -> less computation, but accuracy may drop.
  - Max pooling: take the maximum value within each window.
- Convolutional layer -> feature map.
- Flatten: stretch the matrix coming out of the convolution layers into a vector, then feed it into a fully-connected network.

CNN is not invariant to scaling & rotation: rotated or rescaled images can be hard for a CNN to recognize.

## Self-attention

How to handle input of variable size?

Attention: map a query and a set of key-value pairs to an output.

Multiplying Q and K generates the attention score $\alpha$, which measures the correlation between the query and the key; then apply a normalization layer (e.g., softmax) to get the normalized $\alpha'$. After that, use the attention scores to weight the input values to get the output. In short,

$$\text{Attention}(Q,K,V) = \sigma(QK^T)V$$

Self-attention is a core building block for sequence modeling.
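
## Code sketches

A toy sketch of the feature selection idea above, assuming absolute correlation with the target as the selection criterion; the data, the choice of `k`, and the helper `select_top_k` are all made up for illustration.

```python
import numpy as np

# A minimal criterion: keep the k features most correlated with the target.
def select_top_k(X, y, k):
    # |Pearson correlation| between each feature column and the target.
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    keep = np.argsort(scores)[::-1][:k]    # indices of the k best features
    return X[:, keep], keep

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))             # 100 samples, 10 candidate features
y = 2 * X[:, 0] - X[:, 4] + 0.1 * rng.normal(size=100)  # depends on features 0 and 4
X_small, keep = select_top_k(X, y, k=2)
print(keep)                                # likely {0, 4}, the truly useful features
```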
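
A minimal NumPy sketch of the XOR limitation from the Multi-layer Perceptron section: one layer of the form $\mathbf{y} = \sigma(\mathbf{A}\mathbf{x}+\mathbf{b})$ cannot compute XOR, but two stacked layers can. The weights are hand-picked for illustration, with a hard step standing in for $\sigma$.

```python
import numpy as np

def step(z):
    # Hard threshold standing in for the activation sigma.
    return (z > 0).astype(float)

# Hidden layer: unit 0 fires for OR(x1, x2), unit 1 fires for AND(x1, x2).
A1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output layer: fires when OR is on but AND is off, i.e. XOR.
A2 = np.array([[1.0, -1.0]])
b2 = np.array([-0.5])

def xor_mlp(x):
    h = step(A1 @ x + b1)      # first layer: y = sigma(Ax + b)
    return step(A2 @ h + b2)   # second layer stacked on top

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_mlp(np.array(x, dtype=float)))   # -> [0.], [1.], [1.], [0.]
```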
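
A sketch of the recurrent update from the RNN section, assuming (as one common reading of the formula) that the output $\mathbf{y}_t$ is read directly off the hidden state $\mathbf{h}_t$; the dimensions and random weights are arbitrary placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                      # arbitrary input / hidden sizes
A = rng.normal(size=(d_h, d_in))      # input-to-hidden weights
B = rng.normal(size=(d_h, d_h))       # recurrent hidden-to-hidden weights

def rnn(xs):
    h = np.zeros(d_h)                 # initial hidden state h_0
    ys = []
    for x in xs:                      # the same A and B are reused every step
        h = sigmoid(A @ x + B @ h)    # h_t = sigma(A x_t + B h_{t-1})
        ys.append(h)                  # read y_t directly off h_t
    return np.stack(ys)

xs = rng.normal(size=(5, d_in))       # a length-5 input sequence
print(rnn(xs).shape)                  # (5, 4): one output per time step
```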
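
A small NumPy sketch of the CNN pipeline from the section above: zero padding, a convolution producing a feature map, 2x2 max pooling, then flattening before the fully-connected part. The image, the kernel, and all sizes are made up for illustration.

```python
import numpy as np

def conv2d(img, kernel, pad=1):
    img = np.pad(img, pad)            # zero padding: extend the border with 0s
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):     # slide the receptive field over the image
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, k=2):
    # Take the maximum value within each k x k window.
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h*k, :w*k].reshape(h, k, w, k).max(axis=(1, 3))

img = np.random.default_rng(1).random((6, 6))     # toy grayscale image
kernel = np.array([[1., 0., -1.],                 # a 3x3 edge-like filter
                   [1., 0., -1.],
                   [1., 0., -1.]])
fmap = conv2d(img, kernel)    # feature map, still 6x6 thanks to padding
pooled = max_pool(fmap)       # 3x3 after 2x2 max pooling (less computation)
vec = pooled.flatten()        # flatten into a vector for the fully-connected net
print(fmap.shape, pooled.shape, vec.shape)        # (6, 6) (3, 3) (9,)
```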
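
A sketch of the attention formula from the Self-attention section, with softmax playing the role of the normalization $\sigma$. The projections `Wq`, `Wk`, `Wv` and the token count are illustrative assumptions; the original Transformer additionally scales the scores by $\sqrt{d_k}$, omitted here to match the note's formula.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T                    # alpha: query-key correlations
    weights = softmax(scores, axis=-1)  # normalized alpha'
    return weights @ V                  # weight the values by attention

rng = np.random.default_rng(2)
n, d = 4, 8                             # 4 tokens of dimension 8 (arbitrary)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv        # Q, K, V are projections of the input
print(attention(Q, K, V).shape)         # (4, 8): one output per input token
```

Because the weights are computed from the inputs themselves, the same code works for any sequence length `n`, which is how self-attention handles variable-size input.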