---
tags: Lecture Note, Neural Networks
---

Introduction to Neural Networks
===

<style>
.blue {color: blue;}
.ph { display: block; margin-left: auto; margin-right: auto; width: 80%}
.ph_4 { display: block; margin-left: auto; margin-right: auto; width: 40%}
.ph_6 { display: block; margin-left: auto; margin-right: auto; width: 60%}
.ph_8 { display: block; margin-left: auto; margin-right: auto; width: 80%}
</style>

<DIV style="text-align:center">
<font size=4>----------------------------</font> <font size=5> 09/16 </font> <font size=4>----------------------------</font>
</DIV>
<DIV style="text-align:center">
<font size=2>Cloud Layer, Fog Layer, AI DL ML</font>
</DIV>
<DIV style="text-align:center">
<font size=4>--------------------------------------------------------------------</font>
</DIV>

## Artificial Intelligence vs. Machine Learning vs. Deep Learning

<img src=https://i.imgur.com/r0eW6as.png class="ph">

* **Artificial Intelligence (AI)** is a broad field of study dedicated to complex problem solving.
* **Machine Learning (ML)** is usually considered a subfield of AI. ML is a data-driven approach focused on creating algorithms that have the ability to learn from data without being explicitly programmed.
* **Deep Learning (DL)** is a subfield of ML focused on deep neural networks (NNs) that automatically learn hierarchical representations.
* **What's the difference?** In ML, feature extraction is done by humans; DL learns end to end (N to N).

<img src=https://i.imgur.com/cU2HK06.png class="ph">

## Cloud Layer & Fog Layer

<img src=https://i.imgur.com/HlIbAtD.png class="ph">

- **Sensor Layer**: the edge (e.g., a phone), where data is collected.
- **Fog Layer**: the point of adding a fog layer is to split the data flow for real-time response. If the edge cannot finish processing, the task moves to the fog layer; if the fog layer cannot handle it either, it moves to the cloud layer.
- **Cloud Layer vs. Fog Layer**
  <img src=https://i.imgur.com/MBZbgdU.png class="ph">
  - The main difference is transmission distance.
  - Transmission times differ by roughly a factor of 7.

## Origins of AI

- AI originated from imitating the human nervous system.
- Each neuron's role is to transmit signals and to amplify or inhibit them.

## Structure of Network

- Single Layer Neural Network
  <img src=https://i.imgur.com/obNW7Us.png class="ph">
- Recurrent Neural Networks
  <img src=https://i.imgur.com/P7T1hQe.png class="ph">
  - $Z^{-1}$ is a unit delay: the output is fed back as an input at the next time step.
  - Images are digital signals.

---

<DIV style="text-align:center">
<font size=4>----------------------------</font> <font size=5> 09/23 </font> <font size=4>----------------------------</font>
</DIV>
<DIV style="text-align:center">
<font size=2> Machine Learning, Deep Learning</font>
</DIV>
<DIV style="text-align:center">
<font size=4>--------------------------------------------------------------------</font>
</DIV>

## Machine Learning and Deep Learning

1. **Classification:** Predict the class of an object (cat/dog from an image, male/female from a user's web activity, spam/ham from an email's contents, etc.). Requires labels; classifies existing data by finding suitable grouping criteria.
2. **Regression:** Predict a continuous value for an object (sales in the next month, price of a house with specific features, energy consumption, etc.).
3. **Clustering:** Group similar objects together (find user groups that differ by their behaviour on the site, cluster customers into meaningful groups, etc.). No labels; data are grouped based on the outputs.
4. **Ranking:** Arrange a list of objects to maximize some utility (i.e., by click probability, or by relevance to a query).
5. **Recommendations:** Filter a small subset of objects from a large collection and recommend them to a user (increase sales and client loyalty, etc.).
6. **Feature extraction/Dimensionality reduction:** "Compress" the data from a high-dimensional representation into a lower-dimensional one (useful for visualization or as an internal transformation for other ML algorithms).
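To make the first three task types concrete, here is a minimal sketch on toy data. It assumes scikit-learn (which is not used elsewhere in these notes), and the data and model choices are hypothetical illustrations, not part of the course material:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # toy feature matrix

# Classification: labels are known (e.g., cat/dog)
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(clf.predict(X[:5]))

# Regression: predict a continuous value (e.g., house price)
y_reg = 3 * X[:, 0] - 2 * X[:, 1]
reg = LinearRegression().fit(X, y_reg)
print(reg.predict(X[:5]))

# Clustering: no labels; group similar points together
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_[:5])
```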
---

<DIV style="text-align:center">
<font size=4>----------------------------</font> <font size=5> 09/30 </font> <font size=4>----------------------------</font>
</DIV>
<DIV style="text-align:center">
<font size=2> Lab session, ML </font>
</DIV>
<DIV style="text-align:center">
<font size=4>--------------------------------------------------------------------</font>
</DIV>

## ANACONDA

- [Install](https://www.anaconda.com/products/individual)

```bash
conda install <package>                  # install a package
conda list                               # list installed packages
conda create -n nn_class python=3.7.3    # create the virtual environment
conda env list                           # list existing environments
activate nn_class                        # enter the virtual environment
conda install matplotlib
conda install tensorflow==1.14
conda install keras==2.3.1
conda install jupyter
conda install numpy
conda install ipykernel                  # the ipykernel module registers the environment in Jupyter Notebook's kernelspec list
python -m ipykernel install --user --name nn_class --display-name "Python 3.7"
```

## KERAS

- Define your output
- Choose the losses
- Batch size
- Epoch
  <img src = https://i.imgur.com/GxLvZML.png class = "ph_6">
- Activation function (a numpy sketch of a few of these follows this section)
  <img src = https://i.imgur.com/8rxjBIt.png class = "ph_6">
  1. **softmax**
     - `softmax(x, axis=-1)`
     - Arguments
       - x: tensor.
       - axis: integer, the axis along which the softmax is applied.
     - Returns
       - The tensor after the softmax transformation.
  2. **elu**
     - `elu(x, alpha=1.0)`
     - Arguments
       - x: tensor.
       - alpha: a scalar, the slope of the negative part.
     - Returns
       - Exponential linear activation: x if x > 0; alpha * (exp(x) - 1) if x < 0.
     - **selu:** Scaled Exponential Linear Unit; see *Self-Normalizing Neural Networks*.
       - `selu(x)`
       - Arguments
         - x: a tensor or variable on which to compute the activation.
       - Returns
         - Scaled exponential linear activation: scale * elu(x, alpha).
  3. **softplus**
     - `softplus(x)`
     - Arguments
       - x: tensor.
     - Returns
       - Softplus activation: log(exp(x) + 1).
  4. **softsign**
     - `softsign(x)`
     - Arguments
       - x: tensor.
     - Returns
       - Softsign activation: x / (abs(x) + 1).
  5. **relu**
     - `relu(x, alpha=0.0, max_value=None, threshold=0.0)`
     - With default values, it returns element-wise max(x, 0). Otherwise it follows: f(x) = max_value if x >= max_value; f(x) = x if threshold <= x < max_value; f(x) = alpha * (x - threshold) otherwise.
     - Arguments
       - x: tensor.
       - alpha: slope of the negative part. Defaults to 0.
       - max_value: maximum output value.
       - threshold: float, the threshold for thresholded activation.
     - Returns
       - A tensor.
  6. **tanh**
     - `tanh(x)`
  7. **sigmoid**
     - `sigmoid(x)`
  8. **hard_sigmoid**
     - `hard_sigmoid(x)`
     - Faster to compute than the sigmoid activation.
     - Arguments
       - x: tensor.
     - Returns
       - Hard sigmoid activation:
         - 0 if x < -2.5.
         - 1 if x > 2.5.
         - 0.2 * x + 0.5 if -2.5 <= x <= 2.5.
  9. **exponential**
     - `exponential(x)`
  10. **linear**
      - `linear(x)`
- [Optimizer options](https://keras.io/api/optimizers/)
  - **Adam**: generally cheaper to train than SGD; see "Adam - A Method for Stochastic Optimization", which includes recommended parameter values. Parameters:
    - lr: learning rate for approaching the optimum; defaults to 0.001.
    - beta_1: exponential decay factor for the first-moment estimates; defaults to 0.9.
    - beta_2: exponential decay factor for the second-moment estimates; defaults to 0.999.
    - epsilon: a number greater than but close to 0, placed in the denominator to avoid division-by-zero errors; defaults to 1e-08.
    - decay: the rate at which the learning rate decays after each update.
  - **SGD** (Stochastic Gradient Descent): uses partial derivatives to step along the descent direction toward the optimum. Parameters:
    - Learning Rate (lr): the rate of approach to the optimum. If set too small, finding the optimum takes a long time; if too large, the search may oscillate around the optimum without ever finding it.
    - momentum: update momentum. The learning step can be larger at the start and should shrink near the optimum; commonly set to 0.5, and may be changed to 0.9.
    - decay: the rate at which the learning rate decays after each update.
    - nesterov: whether to use Nesterov momentum; [see this reference](http://blog.csdn.net/luo123n/article/details/48239963).
  - **RMSprop**
- [Losses options](https://keras.io/api/losses/)
  - Probabilistic losses
  - Regression losses
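To make the piecewise definitions above concrete, here is a minimal numpy sketch of a few of the listed activations. This is my own illustration of the documented formulas, not Keras source code:

```python
import numpy as np

def softplus(x):
    return np.log(np.exp(x) + 1)              # log(exp(x) + 1)

def softsign(x):
    return x / (np.abs(x) + 1)                # x / (|x| + 1)

def relu(x, alpha=0.0, max_value=None, threshold=0.0):
    # alpha * (x - threshold) below the threshold, x above it,
    # clipped at max_value if one is given
    y = np.where(x >= threshold, x, alpha * (x - threshold))
    return y if max_value is None else np.minimum(y, max_value)

def hard_sigmoid(x):
    # 0 for x < -2.5, 1 for x > 2.5, 0.2 * x + 0.5 in between
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x))          # [0. 0. 0. 1. 3.]
print(hard_sigmoid(x))  # [0.  0.3 0.5 0.7 1. ]
```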
## NN_MNIST

```python
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.utils import to_categorical
from keras.optimizers import SGD

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# Normalization: scale pixel values into [0, 1]
x_train = X_train.reshape(60000, 28, 28, 1) / 255
x_test = X_test.reshape(10000, 28, 28, 1) / 255

# Model structure
model = Sequential([
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),
])

# Compile the model
model.compile(SGD(lr=.005), loss='categorical_crossentropy', metrics=['accuracy'])
# model.compile(optimizer='adam', loss='cosine_proximity', metrics=['accuracy'])

# Training
history = model.fit(
    x_train, to_categorical(Y_train),
    batch_size=100,
    epochs=5,
    validation_data=(x_test, to_categorical(Y_test)),
)
model.summary()

# Plot accuracy and loss curves for the training and validation sets
plt.title("Model Accuracy")
plt.plot(np.arange(0, 5, 1), history.history['accuracy'], color='blue', label='Train')
plt.plot(np.arange(0, 5, 1), history.history['val_accuracy'], color='orange', label='Validation')
plt.legend()
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.show()

plt.title("Model Loss")
plt.plot(np.arange(0, 5, 1), history.history['loss'], color='blue', label='Train')
plt.plot(np.arange(0, 5, 1), history.history['val_loss'], color='orange', label='Validation')
plt.legend()
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
```
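As a follow-up to the training run above, this short sketch (my own addition; it assumes the `model`, `x_test`, and `Y_test` defined in the block above) evaluates the model and inspects a few predictions:

```python
# Evaluate on the test set (returns the loss and the compiled metric)
loss, acc = model.evaluate(x_test, to_categorical(Y_test), verbose=0)
print("test loss: %.4f, test accuracy: %.4f" % (loss, acc))

# Predicted class = argmax over the 10 softmax probabilities
probs = model.predict(x_test[:5])               # shape (5, 10)
print(np.argmax(probs, axis=1), Y_test[:5])     # predicted vs. true digits
```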
## CNN_MNIST

- Flatten
- Pooling

<img src = https://i.imgur.com/jfPzi3R.png class = "ph">

```python
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from keras.utils import to_categorical
from keras.optimizers import SGD

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# Normalization: scale pixel values into [0, 1]
x_train = X_train.reshape(60000, 28, 28, 1) / 255
x_test = X_test.reshape(10000, 28, 28, 1) / 255

# Model structure
model = Sequential([
    Conv2D(8, 3, input_shape=(28, 28, 1), use_bias=False, padding='same'),
    MaxPooling2D(pool_size=2),
    Flatten(),
    Dense(10, activation='softmax'),
])

# Compile the model
model.compile(SGD(lr=.005), loss='categorical_crossentropy', metrics=['accuracy'])

# Training
history = model.fit(
    x_train, to_categorical(Y_train),
    batch_size=100,
    epochs=5,
    validation_data=(x_test, to_categorical(Y_test)),
)
model.summary()

# Plot accuracy and loss curves for the training and validation sets
plt.title("Model Accuracy")
line1, = plt.plot(np.arange(0, 5, 1), history.history['accuracy'], color='blue', label='Train')
line2, = plt.plot(np.arange(0, 5, 1), history.history['val_accuracy'], color='orange', label='Validation')
plt.legend(handles=[line1, line2], loc='upper left')
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.xlim(0, 4)
plt.show()

plt.title("Model Loss")
line1, = plt.plot(np.arange(0, 5, 1), history.history['loss'], color='blue', label='Train')
line2, = plt.plot(np.arange(0, 5, 1), history.history['val_loss'], color='orange', label='Validation')
plt.legend(handles=[line1, line2], loc='upper left')
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.xlim(0, 4)
plt.show()
```

```python
keras.layers.Conv2D(
    filters = 16,
    kernel_size = 5,
    strides = (1, 1),
    padding = 'valid',
    data_format = None,
    dilation_rate = (1, 1),
    activation = None,
    use_bias = True,
    kernel_initializer = 'glorot_uniform',
    bias_initializer = 'zeros',
    kernel_regularizer = None,
    bias_regularizer = None,
    activity_regularizer = None,
    kernel_constraint = None,
    bias_constraint = None)
```

- filters: integer, the dimensionality of the output space (i.e., the number of output filters in the convolution).
- kernel_size: an integer, or a tuple/list of 2 integers, specifying the width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
- strides: an integer, or a tuple/list of 2 integers, specifying the strides of the convolution along the width and height. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
- padding: "valid" or "same".
- data_format: a string, one of channels_last (default) or channels_first, the ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels), while channels_first corresponds to (batch, channels, height, width). It defaults to the image_data_format value found in the Keras config file at ~/.keras/keras.json; if you never set it, channels_last is used.
- dilation_rate: an integer, or a tuple/list of 2 integers, specifying the dilation rate for dilated convolution. Can be a single integer to specify the same value for all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any stride value != 1.
- activation: the activation function to use (see activations). If none is specified, no activation is applied (i.e., linear activation: a(x) = x).
- use_bias: Boolean, whether the layer uses a bias vector.
- [kernel_initializer](https://keras.io/zh/initializers/): initializer for the kernel weights matrix.
- [bias_initializer](https://keras.io/zh/initializers/): initializer for the bias vector.
- [kernel_regularizer](https://keras.io/zh/regularizers/): regularizer function applied to the kernel weights matrix.
- [bias_regularizer](https://keras.io/zh/regularizers/): regularizer function applied to the bias vector.
- [activity_regularizer](https://keras.io/zh/regularizers/): regularizer function applied to the output of the layer (its activation).
- [kernel_constraint](https://keras.io/zh/constraints/): constraint function applied to the kernel weights matrix.
- [bias_constraint](https://keras.io/zh/constraints/): constraint function applied to the bias vector.
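To see how kernel_size, strides, and padding interact, here is a small helper (my own sketch, not part of Keras) implementing the standard output-size arithmetic for one spatial dimension of a convolution:

```python
import math

def conv_output_size(n, kernel_size, stride=1, padding='valid'):
    """Output length along one spatial dimension of a 2D convolution."""
    if padding == 'same':
        return math.ceil(n / stride)            # zero-pads so the window covers the whole input
    return (n - kernel_size) // stride + 1      # 'valid': no padding

# 28x28 MNIST input with a 3x3 kernel, matching the CNN above:
print(conv_output_size(28, 3, padding='same'))             # 28
print(conv_output_size(28, 3, padding='valid'))            # 26
print(conv_output_size(28, 3, stride=2, padding='valid'))  # 13
```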
---

<DIV style="text-align:center">
<font size=4>----------------------------</font> <font size=5> 10/07 </font> <font size=4>----------------------------</font>
</DIV>
<DIV style="text-align:center">
<font size=2> Section 2 Learning Method of NN </font>
</DIV>
<DIV style="text-align:center">
<font size=4>--------------------------------------------------------------------</font>
</DIV>

## Learning Steps in NNs

- Stimulation by the environment
- Parameters are changed as a result of the stimulation
- Response to the environment is observed due to this change

## Basic Learning Rules

### Error-correction learning

<img src = https://i.imgur.com/kX5sJ4X.png class = "ph">

- Needs learning patterns (a desired response is available)
- Delta rule (Widrow-Hoff rule or LMS learning rule):
  - ![](https://i.imgur.com/VhUmHF1.png)
  - where
    - η = learning-rate coefficient
    - $x_j(n)$ = jth input variable
- Example
  - <img src = https://i.imgur.com/9sduWyZ.png class = "ph">

### Memory-based learning

- Data from all training cases are retained
- Any new case is compared to the training cases
- The closest (nearest) case is selected
- "Nearest" is typically defined using the Euclidean distance between the input vectors
- The output of this case is assumed to be the output for the new case
- This method is also known as nearest-neighbor classification
- K-Nearest Neighbor Classification
  - Advantage: minimizes the effect of outlying cases
  - Disadvantage: cannot correctly classify outlying cases
- Outliers vs. Noise
  - Outlier: values near the decision boundary are difficult to judge correctly.
  - Noise: irrelevant or meaningless data
- Example
  - ![](https://i.imgur.com/m11h1Ts.png)
- Different inputs should first be normalized so that every input influences the output to the same degree; if different inputs should carry different degrees of influence, each can then be multiplied by its own weighting.

### Competitive learning

- Output neurons compete for different patterns in a winner-take-all competition
- Only the winner's weights are adjusted
- Weights are initialized randomly or to specified initial values
- Weights are adjusted to strengthen winning situations
- All inputs and outputs are binary values

<img src = https://i.imgur.com/dDVwqiE.png class = "ph_6">

- For all neurons (normalized weights):
  - ![](https://i.imgur.com/TvtAEE7.png)
- For the winning neuron:
  - $y_k=1$ (output = 1)
  - $Δw_{kj}=η(x_j-w_{kj})$, where $η$ is the learning rate
- For losing neurons:
  - $y_k=0$ (output = 0)
  - $Δw_{kj}=0$
- Useful for a preliminary classification of the data

## Learning Paradigms

### Supervised learning

<img src = https://i.imgur.com/vlNoznO.png class = "ph_8">

- Also known as "learning with a teacher"
- Involves using a large set of "training" examples (with known solutions)
- Utilizes error-correction learning
- Good for pattern classification and function approximation problems

### Unsupervised (self-organized) learning

<img src = https://i.imgur.com/jGFFfVE.png class = "ph_6">

- Involves using a large set of "training" examples (with unknown solutions)
- Typically utilizes competitive learning
- Good for feature extraction

### Reinforcement learning

<img src = https://i.imgur.com/pZ1JWuS.png class = "ph_8">

- No teacher provides a desired response at each step of the learning process.
- The learning machine must solve a temporal credit assignment problem. (That is, the machine must be able to assign credit or blame individually to each action in the sequence of time steps that led to the final outcome.)

## Learning Tasks

- Pattern association or pattern recognition
- Function approximation
- System identification
  - <img src = https://i.imgur.com/S1JV5FR.png class = "ph_6">
- Inverse system identification
  - <img src = https://i.imgur.com/7zeV1Cp.png class = "ph_6">
  - First train the model from input-output pairs; then use the inverse model to find the input corresponding to a given output.
- Control
  - <img src = https://i.imgur.com/DWPSkNV.png class = "ph_6">
  - Direct learning
  - Indirect learning
- Filtering
  - Noise cancellation
  - Speech recognition

---

<DIV style="text-align:center">
<font size=4>----------------------------</font> <font size=5> 10/14 </font> <font size=4>----------------------------</font>
</DIV>
<DIV style="text-align:center">
<font size=2> Section 3: Feedforward & Kernel-based Networks </font>
</DIV>
<DIV style="text-align:center">
<font size=2> Part 1: First-order Optimization Algorithms: Backpropagation Algorithms </font>
</DIV>
<DIV style="text-align:center">
<font size=4>--------------------------------------------------------------------</font>
</DIV>

### System Identification

System identification: to determine a mathematical model for an unknown system by observing its input-output data pairs.

Purposes:
- To predict a system's behavior
- To explain the interactions and relationships between inputs and outputs of a system
- To design a controller based on the model of a system

The two main steps:
1. Structure identification: build a parameterized model.
2. Parameter identification: find the parameters so that the model can fully represent the system to be described.

<img src = https://i.imgur.com/p8enxTp.png class = "ph_6">

- Least-Squares Methods
- Derivative-Free Optimization
  - Genetic algorithms
  - Random search
- Derivative-based Optimization
  - Gradient descent methods
  - Newton's methods
  - Conjugate gradient methods
  - Nonlinear least-squares problems
- Reference Books
  - J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Englewood Cliffs, N.J.: Prentice-Hall, 1983.
  - E. K. P. Chong and S. H. Zak, An Introduction to Optimization, John Wiley & Sons, 2nd ed., 2001.
  - A. J. Shepherd, Second-Order Methods for Neural Networks, Springer-Verlag New York, 1997.

### Activation Function

- [Keras](#KERAS)
- Threshold (Step) Function:
  <img src = https://i.imgur.com/gr3xgey.png class = "ph_6">
- Piecewise-Linear Function:
  <img src = https://i.imgur.com/X3iCZyd.png class = "ph_6">
- Sigmoid Function:
  <img src = https://i.imgur.com/K12hkQU.png class = "ph_6">
- Radial Basis Function:
  <img src = https://i.imgur.com/nkSbUl7.png class = "ph_6">
- Hyperbolic Tangent Function:
  <img src = https://i.imgur.com/6qsWd9Q.png class = "ph_6">

### Backpropagation Algorithm

[Reference](https://medium.com/@wenwu53/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92%E8%87%AA%E5%AD%B8%E7%AD%86%E8%A8%9808-backpropagation-94e66e40a09c)

<img src = https://i.imgur.com/NsonIfr.png class = "ph_4">

#### Forward

- Starting at the input layer, use the input data of the training set to compute the output of each neuron
- Iteratively use the outputs of neurons as inputs to compute the outputs of neurons in the next layer
- When the output layer is reached, compute the error signal
- Store all necessary input and output values for the backward pass

#### Backward

- Starting at the output layer:
  - Use the error signal to compute the gradient
  - Use the gradient to compute the change of the weights
- Going backward one hidden layer at a time:
  - Sum up the error signals from the output layer
  - Use the summed error signals to compute the change in the weights for the hidden layers
- Continue the second step until the input layer is reached (a numpy sketch of one forward/backward pass appears after this section)

### Derivation of Learning Rules

<img src = https://i.imgur.com/34GPwc2.png class = "ph_6">
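Tying the forward and backward passes together, below is a minimal two-layer numpy sketch of backpropagation. It is my own illustration (sigmoid units, squared error, a toy linearly separable task), not the course's official code; note that the output-layer update is exactly the delta rule from the error-correction section above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # training inputs
d = (X[:, :1] + X[:, 1:] > 0).astype(float)         # desired outputs, shape (100, 1)

W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)  # input -> hidden
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)  # hidden -> output
eta = 0.5                                           # learning rate

for epoch in range(1000):
    # Forward pass: compute and store each layer's output
    h = sigmoid(X @ W1 + b1)                        # hidden-layer outputs
    y = sigmoid(h @ W2 + b2)                        # network outputs
    e = d - y                                       # error signal at the output layer

    # Backward pass: local gradients, starting at the output layer
    delta2 = e * y * (1 - y)                        # output-layer gradient (delta rule)
    delta1 = (delta2 @ W2.T) * h * (1 - h)          # hidden layer: summed error signals

    # Weight changes from the gradients (averaged over the batch)
    W2 += eta * h.T @ delta2 / len(X)
    b2 += eta * delta2.mean(axis=0)
    W1 += eta * X.T @ delta1 / len(X)
    b1 += eta * delta1.mean(axis=0)

print("final mean squared error:", float(np.mean(e ** 2)))  # should shrink toward 0
```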
<DIV style="text-align:center">
<font size=4>----------------------------</font> <font size=5> 11/4 </font> <font size=4>----------------------------</font>
</DIV>
<DIV style="text-align:center">
<font size=2> Lab session </font>
</DIV>
<DIV style="text-align:center">
<font size=4>--------------------------------------------------------------------</font>
</DIV>

### Lab session

#### Neural Network

- For regression, a new error function can be defined.
- Once the error function is defined, differentiate it; the procedure is the same as for the activation function.
- Cross entropy is mainly used for estimating probabilities.
- Imbalance
  - oversampling
  - undersampling
  - SMOTE
  - Accuracy on imbalanced data cannot be trusted.
  - Balance the data first, then train.
- adult dataset
  - sigmoid
- MNIST
  - relu, softmax
- NN_bike
  - relu
  - For regression, categorical data needs preprocessing.
    - get_dummies
    - Converts categorical data into binary columns.
- loss
  - MAE
- Optimizer
  - Momentum
  - Changing the learning rate
  - Adagrad weight update equation
  - Adam
- Overfitting
  - Accuracy stays the same while the validation error rises.
- Stop criterion
  - Set up a mechanism to stop training.
- Varying the learning rate lets you observe where the initial value should be placed.

#### Radial Basis Function

- "Different radial basis functions" refers to using a Gaussian or some other basis function (a short sketch appears at the end of these notes).
- K-means determines the centers that are needed.
- The batch size can be set so that the weights are updated once per a given percentage of the data.

## Other

- ARM and Intel are arch-rivals; ARM makes tiny chips.
- Google and NVIDIA: the former develops software, the latter hardware; their development directions are entirely different.
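Returning to the Radial Basis Function notes above, here is a closing sketch (my own illustration on assumed toy data) of Gaussian RBF features whose centers are chosen by K-means, followed by a least-squares output layer:

```python
import numpy as np
from sklearn.cluster import KMeans

# Gaussian radial basis features: phi_j(x) = exp(-||x - c_j||^2 / (2 * sigma^2)),
# with the centers c_j picked by K-means, as suggested in the notes above.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))               # toy 1-D inputs
y = np.sin(X[:, 0])                                 # toy regression target

k, sigma = 8, 0.8
centers = KMeans(n_clusters=k, n_init=10).fit(X).cluster_centers_

dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)  # (200, k)
phi = np.exp(-dists ** 2 / (2 * sigma ** 2))                          # RBF features

# Linear output layer fitted by least squares
w, *_ = np.linalg.lstsq(phi, y, rcond=None)
print("train MSE:", float(np.mean((phi @ w - y) ** 2)))
```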