---
tags: Kermadec Class, Machine Learning, TensorFlow, Deep Learning, Activation Function, Vanishing Gradient
---
Machine Learning Week 7
=
# Day 1:
## Machine Learning vs Deep Learning:
ML: **Human has to do manual** feature extraction, including:
- Cleaning input data.
- Engineering the features: creating new features or changing existing ones.
- Selecting useful features.
DL: **Machine will do** feature extraction.
However, **DL needs a lot of data**.
### What if there is not much data?
**Performance** of Deep Learning and Machine Learning is **about the same**.
## Deep Learning:
**Shallow layers** detect low-level features (simple ones): edges and lines at different angles.
**Middle layers** detect mid-level features (a little more complex): parts of an object.
**Deep layers** detect high-level features (very complex ones): the whole object.
## Activation Function in Deep Learning:
An activation function turns a **linear** function into a **non-linear** one.
Deep Learning is a combination of multiple machine learning models, where each model uses the output of the previous model as its input.
Similar to how a curved line is a combination of short straight segments.
- **Sigmoid**: values from **0 to 1**; its derivative can **easily reach 0** (the **vanishing gradient** problem) when there are many layers and neurons in a Deep Learning model.
- **Hyperbolic Tangent (tanh)**: values from **-1 to 1**; its derivative can also **easily reach 0** (but **not as easily as Sigmoid**), causing vanishing gradients when there are many layers.
- **Rectified Linear Unit (ReLU)**: values from **0 to infinity**; its derivative **hardly reaches 0**, even with many neurons and layers.
**Tanh** and **Sigmoid** are good for **small** Deep Learning models.
**ReLU** is the best for **big** Deep Learning models (see the gradient sketch below).
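A minimal sketch (assuming TensorFlow 2.x) comparing the gradients of sigmoid and ReLU at a large input, to show how the sigmoid gradient vanishes:
```
import tensorflow as tf

x = tf.Variable([5.0])  # a fairly large pre-activation value

with tf.GradientTape(persistent=True) as tape:
    y_sigmoid = tf.sigmoid(x)
    y_relu = tf.nn.relu(x)

print(tape.gradient(y_sigmoid, x))  # ~0.0066 -> gradient almost vanishes
print(tape.gradient(y_relu, x))     # 1.0     -> gradient stays alive
del tape
```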

## More layers are better than more neurons in one layer
Because the **result** of each layer can be **reused by the next layer**, which also **reduces computing resources**: neurons in **one layer do not have to learn everything again from scratch**.
## Overfitting
Deep Learning models hardly ever underfit; most of the time they fit well or overfit.
## Min Loss Function
If the loss is stable and small enough, the model is good enough.
There is no practical way to find the global minimum of the loss, because the loss function in Deep Learning is defined over a huge (effectively infinite) parameter space.
## Hyperparameters for Tuning:
- Learning Rate
- Weight Initialization for Breaking Symmetry
- Hidden Layer
- Hidden Neuron
- Batch Size
- ...
# Day 2
## TensorFlow
Similar to NumPy, but can run on GPU/TPU.
<img src="https://miro.medium.com/max/4044/1*c8vKRVnog07DK6D5WJ3f3Q.png">
* TF supports **distributed computing** across multiple devices and servers
* TF can extract the **computation graph** from a Python function, then optimize it (e.g., by pruning unused nodes)
<img src="https://i.imgur.com/Jrz1ikn.png">
### Computation graph
A **computation graph** is universal across languages. It is like a set of instructions for the computer to perform the calculations, which any programming language can understand.
* Computation graphs can be executed **efficiently** (e.g., by automatically running independent operations in parallel)
* Computation graphs can be exported to a **portable format**, so you can run them in another environment (e.g., a mobile device)
### TensorFlow Concept
Build a computation graph for each model, then let the tensors flow through the graph to produce the output.
<img src="https://github.com/Amin-Tgz/awesome-tensorflow-2/raw/master/imgs/TF.png">
## Basic Codes
### Random Values:
Create a tensor of random values drawn from a normal distribution:
```
tf.random.normal(shape=(2, 2), mean=0., stddev=1.)
```
Mean = 0
Standard Deviation = 1
### Assigning New Value to a Variable
A Tensor is immutable.
A tf.Variable is mutable.
To keep the variable's "state" when assigning a new value, you must use `.assign` (example values below are illustrative):
```
a = tf.Variable([1.0, 2.0])
# DO NOT do this: a = new_value  (it rebinds the Python name, not the Variable's contents)
a.assign([3.0, 4.0])
```
### GradientTape:
```
a = tf.constant([3.])
b = tf.constant([4.])
with tf.GradientTape() as tape:
    tape.watch(a)  # Start recording the history of operations applied to `a`
    c = tf.sqrt(tf.square(a) + tf.square(b))  # Do some math using `a`

# What's the gradient of `c` with respect to `a`?
dc_da = tape.gradient(c, a)
print(dc_da)
```
**TensorFlow watches Variables by default in gradient operations**.
When computing a **gradient with respect to a constant tensor**, TensorFlow does not track it until you call:
`.watch`: start recording the **gradient** operation history of the constant tensor.
### Decorator tf.function:
`@tf.function` transforms a Python function (a normal function) into a computation graph.
-> The function runs faster, but is more difficult to debug.
**Execution flow**:
**Python function**: step by step, following the sequence of the code.
**Computation graph**: the steps of the Python function are optimized (and may be reordered).
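A small illustrative example (the function name `hypotenuse` is made up here):
```
import tensorflow as tf

@tf.function  # traced into a computation graph on the first call
def hypotenuse(a, b):
    return tf.sqrt(tf.square(a) + tf.square(b))

print(hypotenuse(tf.constant(3.0), tf.constant(4.0)))  # tf.Tensor(5.0, shape=(), dtype=float32)
```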
## Keras:
### History:
Keras saves information about the model while training, accessible through the `History` object returned by `model.fit` (see the sketch below):
- Accuracy of training and validation.
- Loss and metric results for each training epoch.
- ...
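A minimal sketch of reading the history, assuming `model`, `X_train`, `y_train`, `X_valid`, `y_valid` are already defined and the model was compiled with an `accuracy` metric:
```
history = model.fit(X_train, y_train, epochs=5,
                    validation_data=(X_valid, y_valid))
print(history.history.keys())       # e.g. dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
print(history.history['val_loss'])  # validation loss after each epoch
```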
### Callbacks:
A callback is an object that is called at different points during training (e.g., at the end of every batch or at the end of every epoch) and performs actions such as Early Stopping or Model Checkpointing.
```
# Instantiate some callbacks
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=2),          # if validation stops improving, stop training
    tf.keras.callbacks.ModelCheckpoint(filepath='my_model.keras',
                                       save_best_only=True),  # save the best model only
]
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid),
                    callbacks=callbacks)
```
### Ways to Define TensorFlow Models:
<img src="https://keras-dev.s3.amazonaws.com/tutorials-img/spectrum-of-workflows.png">
#### Sequential
1 straight flow of layers.
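A minimal Sequential sketch (the layer sizes and the 20-feature input shape are assumptions for illustration):
```
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),  # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid'),                   # output layer
])
```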

#### Functional API
Multiple flows of layers.
Example 1:
<img src="https://i.imgur.com/G2fZMVz.png">
Example 2:
<img src="https://i.imgur.com/lQwcGmJ.png">
#### Subclassing API
Edit the logic of the layers using classes (OOP).
Similar to building a customized Pipeline.
<img src="https://keras-dev.s3.amazonaws.com/tutorials-img/model-building-spectrum.png">
# Day 3
## Review of the Steps in a Machine Learning Model:
**Forward**: calculate y_hat.
**Backward**: calculate the gradient (derivative).
**Update**: calculate new weights and biases using the gradient.
**Repeat**.
A Deep Learning model is just a nesting of multiple Machine Learning models (see the sketch below).
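A minimal sketch of one training step with `tf.GradientTape`, assuming `model`, `optimizer`, `loss_fn`, and one batch `x`, `y` are already defined:
```
with tf.GradientTape() as tape:
    y_hat = model(x, training=True)   # Forward: compute predictions
    loss = loss_fn(y, y_hat)          # Compute the loss

grads = tape.gradient(loss, model.trainable_variables)             # Backward: compute gradients
optimizer.apply_gradients(zip(grads, model.trainable_variables))   # Update: apply new weights and biases
```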
## Tuning a Deep Learning Model:
### Mini Batch:
The smaller the batch, the harder it is for the loss to decrease, but the faster the model trains.
**Reason**: the model only generalizes over 1 small batch of samples at a time, which may not be suitable for the other batches.
The bigger the batch, the more stably the loss decreases.
### Learning Rate Decay:
Decrease the learning rate over the training epochs (see the sketch below):
- Decrease after a specific epoch, with or without a specific value.
- Decrease after a specific condition (e.g., the loss does not decrease for a certain number of epochs).
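A minimal sketch using `tf.keras.callbacks.LearningRateScheduler`; the "halve every 10 epochs" rule is just an example:
```
def schedule(epoch, lr):
    # Halve the learning rate every 10 epochs (illustrative rule)
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)
# model.fit(..., callbacks=[lr_callback])
```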
### Prevent Overfitting:
- **Early Stopping**: stop training after certain conditions are met.
- **Dropout**: drop neurons randomly, making the model less dependent on any particular neurons, so it generalizes better. Very similar in effect to weight regularization (see the sketch after this list).
- **Weight Regularization**: penalize large weights (e.g., L1/L2 penalties) in the loss so the model stays simpler.
- **Data Augmentation**: produce new inputs from the original inputs with small adjustments (rotation, color changes, added noise, ...) to prevent the model from memorizing the input.
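A minimal sketch combining Dropout and L2 weight regularization in a Keras model (layer sizes and the 0.5 / 0.01 rates are assumptions for illustration):
```
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # L2 weight regularization
    tf.keras.layers.Dropout(0.5),                       # randomly drop 50% of the neurons during training
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```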
### Optimizer
The optimizer contains the algorithm for decreasing the loss (see the sketch below).
- "SGD": Gradient Descent, optionally with Momentum.
- "RMSprop": Gradient Descent with an adaptive (per-parameter) learning rate.
- "Adam" is the best: Gradient Descent with Momentum and an adaptive learning rate.
# Day 4
## Keras:
Keras is different from TensorFlow.
Keras is a sub-library inside TensorFlow.
**import keras**: the old standalone version of Keras; TensorFlow does **not support** this library directly.
**import tensorflow.keras**: the **newest** version of Keras that TensorFlow supports.
`tensorflow.keras.datasets`: a limited set of datasets.
`tensorflow_datasets` (TFDS): a vast variety of datasets.
## Activation function:
- **softmax**: probability of each class, good for **multi-class** classification. The output of softmax is an **array** (a probability for each class).
- **sigmoid**: probability of a class, good for **binary classification**. The output of sigmoid is a **scalar** (the probability of class 1 vs class 0); see the example below.
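A small illustrative example of both outputs:
```
logits = tf.constant([2.0, 1.0, 0.1])
print(tf.nn.softmax(logits))          # array of 3 probabilities that sum to 1
print(tf.sigmoid(tf.constant(0.5)))   # a single probability between 0 and 1
```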
## Loss:
**CategoricalCrossentropy**: if y_hat (prediction) **shape == y (true label) shape** (i.e., the labels are one-hot encoded) -> use **CategoricalCrossentropy**.
**SparseCategoricalCrossentropy**: if y_hat (prediction) **shape != y (true label) shape** (i.e., the labels are integer class indices) -> use **SparseCategoricalCrossentropy** (see the sketch below).
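A minimal sketch of choosing the loss, assuming `model` is a multi-class classifier that is already defined:
```
# One-hot labels, e.g. y = [[0, 0, 1], [1, 0, 0]]  (same shape as y_hat)
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

# Integer labels, e.g. y = [2, 0]  (different shape from y_hat)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
```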
## Extract Model:
### Use tf.keras.callbacks.ModelCheckpoint:
**.h5**: file extension for saving a TensorFlow model/weights.
Can save at any epoch as one file (1 epoch - 1 file).
Must call `model.save()` to save the final model yourself (see the sketch below).
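A minimal sketch of saving and reloading, assuming `model` is already trained (the filename is illustrative):
```
model.save('my_model.h5')                                    # one .h5 file for the whole model
restored_model = tf.keras.models.load_model('my_model.h5')   # load it back later
```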
## pathlib to make folder directories universal (cross-platform)
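A minimal pathlib sketch (the folder names are hypothetical):
```
from pathlib import Path

model_dir = Path('models') / 'week7'          # hypothetical folder names
model_dir.mkdir(parents=True, exist_ok=True)  # same code works on Windows, Linux, macOS
save_path = model_dir / 'my_model.h5'         # can be passed to model.save(str(save_path))
```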
## Pipeline in Tensorflow
Pipeline to preprocess data for model training.
`tf.data.Dataset`
For train dataset -> Map -> Cache -> Repeat -> Shuffle -> Batch -> Prefetch
For test dataset -> Map -> Cache (optional) -> Batch -> Prefetch (optional)
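A minimal sketch of the Map and Cache steps that come before the split below; `features`, `labels`, and `preprocess` are hypothetical placeholders:
```
# Hypothetical setup: `features`, `labels` are NumPy arrays and `preprocess`
# is a per-example preprocessing function (e.g. normalization).
ds = tf.data.Dataset.from_tensor_slices((features, labels))
ds_map_cache = ds.map(preprocess,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE).cache()
```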
```
# Assumes `ds_map_cache` (a mapped + cached tf.data.Dataset, as sketched above)
# and `train_size` are already defined.
# Split into train and test set
train_dataset = ds_map_cache.take(train_size)
test_dataset = ds_map_cache.skip(train_size)
final_train_dataset = (train_dataset.repeat()
                       .shuffle(SHUFFLE_BUFFER_SIZE)
                       .batch(BATCH_SIZE)
                       .prefetch(tf.data.experimental.AUTOTUNE))
final_test_dataset = (test_dataset.batch(BATCH_SIZE)
                      .prefetch(tf.data.experimental.AUTOTUNE))
```
# [Keras + Pipeline Cheat Sheet](https://colab.research.google.com/drive/1_ZpagBFBBv1NzxsQn2T8niBIneEymX4o#scrollTo=yQDw42Ydf4is)
## Transfer Learning
Fetch a copy of MobileNet v2 from [tf.keras.applications](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/applications)
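A minimal transfer-learning sketch with MobileNetV2 (the input shape and the binary output head are assumptions for illustration):
```
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,   # drop the ImageNet classifier head
                                         weights='imagenet')
base.trainable = False  # freeze the pre-trained feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # hypothetical binary classification head
])
```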

--------------------------------------------------

# Day 5
## Flask Website
CSS
Similar to a dictionary in Python, but each property ends with `;`:
```
h1 {
    color: orange;
    text-align: center;
}
```
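A minimal Flask sketch (assuming Flask is installed and a hypothetical `templates/index.html` that links the CSS above):
```
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def home():
    # Renders a hypothetical template that links the CSS shown above
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)
```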