# Pre-work: Deep Learning

## 1. Deep Learning Overview

![](https://i.imgur.com/IkNQuhK.png)
![](https://i.imgur.com/6esm268.png)
![](https://i.imgur.com/jKGis5d.png)

**Use cases:**

- Sephora: clustering (unsupervised learning), to group clients by their preferences.
- Google DeepMind
- Tesla: image processing
- Google Translate: NLP

![](https://i.imgur.com/a5J1kos.png)

We need to control the selection of the hyperparameters and find the best strategies for that. Always: Math + Code.

### Test 1.

**1. State whether the following statement is true or false: "Deep Learning is inspired by the human brain."**

**Ans:** True

## 2. Artificial Neural Networks - Regression

A neural network is analogous to a linear equation, i.e. linear regression:

y = m * x + b
output = weight * input + bias

The more features we use, the more precision we can get, but the model can fall into overfitting.

![](https://i.imgur.com/oEJljL1.png)
![](https://i.imgur.com/0sfoXKc.png)

- Blue circles are inputs.
- The red circle is the output.
- W are the weights.

**The higher the number of inputs and weights, the lower the Root Mean Squared Error (RMSE).**

![](https://i.imgur.com/U0GNy0D.png)
![](https://i.imgur.com/XCBIPA5.png)

### Test 2.

**1. If we have 5 different attributes to predict the price of a flight, how many neurons should there be in the input layer?**

**Ans:** 5

Why?: The input layer is the first layer of the neural network, and it has one neuron per input attribute.

**2. The process of taking an input value and going from the left to the right of an artificial neural network is called forward propagation?**

**Ans:** True

Why?: Forward propagation is the movement from the input layer (left) to the output layer (right) of the neural network.

## 3. Artificial Neural Networks - Classification

### 3.1 Binary Outcome

This is for binary classification cases. It uses the **sigmoid function**; otherwise it is similar to a regression artificial neural network.

![](https://i.imgur.com/3zWXbqf.png)

**Example: the probability that a car will be sold.** The output is always between 0 and 1.

![](https://i.imgur.com/zfuTMmL.png)

X1: Mileage

The sigmoid function activates the result.

### 3.2 Multiclass Outcome

### 3.2.1 Activation Function

There can be more than one output: that is a **multiclass outcome**, used when you want multiple results.

![](https://i.imgur.com/IRsjTM7.png)

Each connection has a weight, and each output has a corresponding equation. Three different equations:

![](https://i.imgur.com/HggQh6j.png)

The point of this is to predict the probability of success for each event. The probabilities always add up to **1**.

![](https://i.imgur.com/zMsY5tD.png)

- 0.5 chance that the car is sold
- 0.2 chance that the car is not sold
- 0.4 chance that the car is leased

### 3.2.2 Softmax Activation Function

We can also use the softmax activation function; in the same way, the probabilities add up to one.

![](https://i.imgur.com/Q6aJCZV.png)

## 4. Google Colab

### Test 4.

**Where does Google Colab run?**

**Ans:** In the cloud.

## 5. TensorFlow and Keras

### Test 5.

**1. TensorFlow is the only package available to train neural networks?**

**Ans:** False

Why?: Besides TensorFlow, we have the PyTorch and Keras packages to train neural networks.

**2. TensorFlow and Keras were developed by?**

**Ans:** TensorFlow was released by Google in 2015; Keras was also released in 2015, created by François Chollet (a Google engineer).

**3. Does the sequential model mean adding the layers from left to right?**

**Ans:** Yes, the sequential model means adding the layers one after another, from left to right (or top to bottom).
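As a minimal sketch of sections 2-5 in code (the layer sizes and the 5-attribute input are illustrative, not from the course), a Keras `Sequential` model for a binary outcome might look like this:

```python
# A minimal sketch (illustrative sizes): a Sequential model stacks layers
# one after another, matching the left-to-right forward-propagation picture.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(5,)),                # 5 input attributes -> 5 input neurons
    layers.Dense(8, activation="relu"),     # one hidden layer
    layers.Dense(1, activation="sigmoid"),  # binary outcome: probability in [0, 1]
])

# For a multiclass outcome (e.g. sold / not sold / leased), the output layer
# would instead be: layers.Dense(3, activation="softmax")
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```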
## 6. Tensors

![](https://i.imgur.com/onrseh9.png)

Tensors have different ranks, and each rank needs a different number of indices:

- Rank 0: a scalar, no index.
- Rank 1: a vector, one index (row).
- Rank 2: a matrix, two indices (row and column).
- Rank 3: three indices (row, column, and depth).

Indexing always starts at 0.

![](https://i.imgur.com/XnLRyeY.png)

A common example of rank-3 (3D) data in deep learning is an image: each pixel has 3 values (red, green, blue).

![](https://i.imgur.com/TDVQvpg.png)

**GPU**

GPUs help us build deep learning models much faster than CPUs do.

![](https://i.imgur.com/4m5q3Xo.png)

### Test 6.

**1. Which of the following are examples of tensors?**

**Ans:** Scalar, vector, matrix, and cube.

**2. What is the rank of the tensor below?**

    1 2 3
    4 5 6
    7 8 9

**Ans:** 2

**Why?:** The rank is the number of indices needed to access a particular element. In the tensor above, accessing the number 2 requires two indices: a row number and a column number.

## 7. Deep Learning

The perceptron was created in 1957 by a psychologist named Frank Rosenblatt. It took images of 400 pixels.

![](https://i.imgur.com/yBmWQMB.png)

By 1970, ANNs similar to a linear regression or linear equation had been created. The discovery of the perceptron is where AI began.

At the beginning the neural network was simple; researchers then realised that this was not enough.

![](https://i.imgur.com/rDJfnUD.png)

In the real world, not everything behaves like a linear equation; the curve can have different characteristics.

![](https://i.imgur.com/Gj59bX4.png)
![](https://i.imgur.com/8MDl4HW.png)

With a hidden layer, the output curve gets an elbow (a bend) instead of being a straight line.

![](https://i.imgur.com/1Cm97JU.png)
![](https://i.imgur.com/DsQu5ug.png)

- **Inputs:** the features of the model, X1, X2.
- **Hidden layers:** hold the weights; there can be 1, 2, 3, etc.
- **Output:** the ANN can have many outputs if desired.

Once there are two hidden layers, we officially say this is a **deep learning model**, and we can keep adding layers to make it deeper.

![](https://i.imgur.com/rQQelgJ.png)

Deep learning dates back to around 1980, but it only gained importance in 2012. Every year since 2010, the **error** of deep learning models has dropped lower and lower, and precision grows as we add more hidden layers.

![](https://i.imgur.com/8udoZF6.png)

GPT-3 is a deep learning model created for general purposes; it is built on 96 *hidden layers* and can generate text similar to Wikipedia articles.

![](https://i.imgur.com/2lA4HL2.png)

### Test 7.

**1. How many hidden layers can be added to a neural network architecture?**

**Ans:** Depends on the problem statement.

Why?: The number of hidden layers is a hyperparameter whose selection depends on the problem statement.

**2. Does increasing the number of hidden layers in neural networks result in overfitting?**

**Ans:** True

Why?: As the number of hidden layers increases, the network tries to capture a complex function that performs very well on the train data but fails on the test data.

## 8. Linear Algebra in Data Science

Main and relevant operations.

![](https://i.imgur.com/xPqy0ZE.png)

See more about this in the PDFs.

### Test 8.

**1. What is the determinant of the following matrix?**

    10 5
     8 4

**Ans:** 0 (10 * 4 - 5 * 8 = 0)

**2. Can you multiply a 3x4 matrix by a 4x2 matrix?**

**Ans:** Yes, because the inner dimensions match; the result is a 3x2 matrix.
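As a quick sketch of the two Test 8 checks in code (NumPy is my choice here, not something the course names in this section):

```python
# A quick sketch of the Test 8 answers using NumPy.
import numpy as np

A = np.array([[10, 5],
              [8, 4]])
print(np.linalg.det(A))    # ~0.0 (up to floating-point error): rows are linearly dependent

B = np.random.rand(3, 4)   # 3x4 matrix
C = np.random.rand(4, 2)   # 4x2 matrix
print((B @ C).shape)       # (3, 2): the inner dimensions (4 and 4) match
```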
## 9. Activation Functions

![](https://i.imgur.com/J1rkltX.png)

If we only combine inputs with weights and biases, no matter how much information we provide, we are still using a linear equation, and that equation can only predict linear relationships; it will not be able to predict non-linear relationships.

HOWEVER, if we apply an **activation function** to the weighted sums, the model becomes able to predict **non-linear relationships**.

So let's see the kinds of activation functions:

![](https://i.imgur.com/UffIoqn.png)

These are the most popular ones. See the descriptions in the PDF: Activation Functions.

### Test 9.

**1. What is the range of the tanh activation function?**

**Ans:** Tanh is an activation function with an S-shaped curve and a range of -1 to 1.

**2. Which activation function can be used in the output layer for a regression problem?**

**Ans:** ReLU

For a regression problem, you can simply create the output layer without any activation function, since we are interested in numerical values; ReLU can also be used in the output layer.

## 10. Introduction to Gradient Descent

![](https://i.imgur.com/7oO5dzQ.png)

The goal is to find the lowest point to descend the mountain; this is what gradient descent does. The problem is that you cannot see the lowest point, but you can feel around with your feet and then step in whichever direction the gradient is steepest.

The model needs to learn one weight and one bias, and there is an unlimited number of possible combinations.

![](https://i.imgur.com/uCmdxoy.png)

Loss is the error of our model.

![](https://i.imgur.com/zDen4cV.png)

Cross-entropy is analogous to MSE:

- MSE measures the loss for a regression problem.
- Cross-entropy measures the loss for a classification problem.

**Learning rate:** you select how big a step the algorithm takes.

- Low learning rate: smaller steps.
- High learning rate: bigger steps.

![](https://i.imgur.com/N85IaLa.png)

**Backpropagation**

This is the counterpart of forward propagation. Instead of going left to right (taking the data and computing the biases, weights, and outputs), you start from the right: you calculate the loss, and the biases and weights are updated from right to left in the model.

### Gradient descent steps.

1) ![](https://i.imgur.com/svalMkE.png)
2) ![](https://i.imgur.com/soebRLM.png)
3) ![](https://i.imgur.com/gtkh5Wi.png)
4) The loss needs to reach an acceptable level.

![](https://i.imgur.com/9kbc1dr.png)

And of course a new bias and weight on each step. This is how the deep learning model works.

### Test 10.

**1. The gradient descent algorithm makes the deep learning model learn ___?**

**Ans:** Weights and bias.

Why?: With the help of forward propagation and backpropagation, neural networks try to learn the weights and biases, and reduce the loss.

**2. Which of the given functions can be used as a loss function for a classification problem?**

**Ans:** Cross-entropy

Why?: Cross-entropy is a commonly used loss function for classification problems.
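As a minimal sketch of sections 10 and 11 from scratch (the data, learning rate, and epoch count are illustrative, not from the course), learning one weight and one bias by gradient descent:

```python
# A minimal gradient-descent sketch for pred = weight * x + bias with an MSE loss.
# Data and hyperparameters are illustrative.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                 # the "true" relationship the model should learn

weight, bias = 0.0, 0.0
learning_rate = 0.05              # how big a step we take on each update

for epoch in range(200):
    pred = weight * x + bias      # forward propagation
    error = pred - y
    loss = np.mean(error ** 2)    # MSE loss
    # Backpropagation: gradients of the MSE loss w.r.t. weight and bias
    # (d(loss)/d(error) = 2 * error, d(error)/d(weight) = x, as in section 11).
    d_weight = np.mean(2 * error * x)
    d_bias = np.mean(2 * error)
    weight -= learning_rate * d_weight
    bias -= learning_rate * d_bias

print(weight, bias)               # should approach 2.0 and 1.0
```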
## 11. Calculating the Gradient Descent

See the PDF, and YouTube videos about gradient descent and derivatives.

Here we see that d(loss)/d(error) = 2 * error, obtained with the power rule for derivatives (since loss = error^2):

![](https://i.imgur.com/1MYksoo.png)

Then we **"propose"** the derivative of the error relative to the weight, which is a linear relationship:

error = x * weight - y

- x * weight: our prediction
- y: the **actual** value (it can absorb the **bias**)

We can drop the bias term because our interest is the **prediction**, x * weight. When we take the derivative with respect to the weight, y is only a constant, so it is eliminated (basic calculus), leaving d(error)/d(weight) = x.

![](https://i.imgur.com/g8ksjcd.png)

### Test 11.

**Gradient descent is an optimization algorithm used to reduce the loss?**

Yes, because gradient descent is an optimization technique used to improve deep learning and neural-network-based models by minimizing the loss.

## 12. Gradient Descent Optimization

This can happen: say the climber wants to descend, but the path he chose takes him into a pocket and he gets stuck; he is in a **local minimum**. What we really need to reach is the **global minimum**.

![](https://i.imgur.com/ywePKqh.png)

Fortunately, we have hyperparameters we can use to avoid this tragedy.

![](https://i.imgur.com/msHdth6.png)

See the PDF.

### Test 12.

**1. State whether the following statement is true or false: "The time taken to train the model decreases as the number of epochs increases."**

**Ans:** False

The number of epochs is a hyperparameter that defines the number of times the learning algorithm works through the entire training dataset. As the number of epochs increases, the model is exposed to the whole dataset more times, which leads to a slower training process.

**2. Which of the following are variants of optimizers (gradient descent)?**

- Adagrad
- RMSprop
- Adam
- All of the above **(Ans)**

Why?: Stochastic Gradient Descent, Adagrad, RMSprop, SGD with Momentum, and Adam are all variants of optimizers.

## 13. Normalizing

See the PDF.

### Test 13.

**1. What is the function of Z-score normalization?**

**Ans:** It forces all the data points to fit a normal distribution with mean = 0 and standard deviation = 1.

Z-score normalization refers to normalizing every value in a dataset so that the mean of all the values is 0 and the standard deviation is 1.

## 14. ANN Hyperparameters

See the PDF.

Sometimes the train and validation loss curves decrease together. But in this case the validation loss starts to increase as the epochs increase, so this is a case of overfitting. We need to restructure the model and reduce its complexity: maybe decrease the number of epochs, the number of neurons per layer, or the number of layers.

![](https://i.imgur.com/kqmWyRV.png)

The tips (they are out of order in the PDF, so see this image):

![](https://i.imgur.com/ztXJS7I.png)

**5) Don't worry about perfection**

Eventually we need to say: "Hey, this is good enough." No model is perfect, but a model can still be useful. It's good enough to provide some value, so I'm going to put it into production, and I'm not going to lose sleep over the fact that there might have been a 0.001% potential increase in accuracy out there if I had chosen a different parameter.

### Test 14.

**Which of the following are hyperparameters in neural networks?**

- A. Learning rate
- B. Optimizers
- C. Epochs

**Ans:** All of the above.

The hyperparameters to tune are the number of neurons, activation function, optimizer, learning rate, batch size, and epochs.

## Test 15 (Hands-on GridSearchCV).

**Can we use GridSearchCV for tuning the neural network model?**

Yes, we can use GridSearchCV and RandomizedSearchCV by wrapping the Keras model with KerasClassifier to tune the neural network model.
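A minimal sketch of the Test 15 idea, assuming the `scikeras` package (the current home of the `KerasClassifier` wrapper); the synthetic data, network sizes, and parameter grid are all illustrative:

```python
# A minimal sketch of tuning a Keras model with GridSearchCV, assuming the
# scikeras package provides the KerasClassifier wrapper. Data is synthetic.
import numpy as np
from sklearn.model_selection import GridSearchCV
from scikeras.wrappers import KerasClassifier
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    # A small binary-classification network (illustrative sizes).
    model = keras.Sequential([
        keras.Input(shape=(5,)),
        layers.Dense(8, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

clf = KerasClassifier(model=build_model, verbose=0)
param_grid = {"batch_size": [16, 32], "epochs": [5, 10]}  # hyperparameters to tune
grid = GridSearchCV(clf, param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```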