Pre-work: Deep Learning

1. Deep Learning Overview

Test 1.

Use cases:

  • Sephora: clustering (unsupervised learning), to group clients by their preferences.
  • GoogleMind
  • Tesla: image processing
  • Google Translate: NLP

We need to control how the hyperparameters are selected and find the best strategies for doing so.

Always: Math + Code

1. State whether the following statement is true or false

Ans:
"Deep Learning is inspired by the human brain."

2. Artificial Neural Networks - Regression

A neural network is analogous to a linear equation, that is, to linear regression.

y = m * x + b
output = weight * input + bias

The more features (characteristics) we use, the more precise the model can be, but it may fall into overfitting.
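The analogy above can be sketched as a single neuron in plain Python. This is only an illustration: the function name neuron and all the numbers are made up for the example and do not come from the course material.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias: the multi-input form of y = m * x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# One input: identical to the line y = m * x + b with m = 2.0 and b = 1.0.
single = neuron([3.0], [2.0], 1.0)

# Three inputs (hypothetical values): each input gets its own weight.
multi = neuron([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.5)

print(single, multi)
```

With one input the neuron is exactly the straight line; with more inputs it is the same equation with one weight per feature.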

In the network diagram, the blue circles are the inputs,
the red circle is the output,
and W are the weights.

The higher the number of inputs and weights, the lower the Root Mean Squared Error (RMSE) tends to be.

Test 2.

1. If we have 5 different attributes to predict the price of a flight, how many neurons should be there in the input layer?

Ans: 5

Why?:
Input layers are the layers at the beginning of the neural network. The number of neurons in the input layer equals the number of input attributes, so 5 attributes need 5 neurons.

2. The process of taking an input value and going from the left to right of an artificial neural network is called forward propagation?

Ans: True

Why?:
Forward Propagation is the way to move from the Input layer (left) to the Output layer (right) in the neural network.
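A minimal sketch of forward propagation for the flight-price question, assuming a hypothetical network with 5 inputs, one hidden layer of 2 neurons, and 1 output. The helper dense, the feature values, and all weights are invented for illustration.

```python
def dense(inputs, weights, biases):
    """One layer: each neuron computes a weighted sum of all inputs plus its bias."""
    return [sum(w * x for w, x in zip(neuron_w, inputs)) + b
            for neuron_w, b in zip(weights, biases)]

# Five hypothetical flight attributes -> five input neurons.
features = [1.0, 0.0, 2.0, 1.0, 3.0]

# Hidden layer with 2 neurons (each has 5 weights), then 1 output neuron.
hidden_w = [[0.1, 0.2, 0.3, 0.4, 0.5],
            [0.5, 0.4, 0.3, 0.2, 0.1]]
hidden_b = [0.0, 1.0]
output_w = [[1.0, -1.0]]
output_b = [0.5]

hidden = dense(features, hidden_w, hidden_b)   # input layer -> hidden layer
price = dense(hidden, output_w, output_b)      # hidden layer -> output layer

print(hidden, price)
```

The data only ever moves from the input side to the output side, which is what "forward" refers to.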

3. Artificial Neural Networks - Classification

3.1 Binary Outcome

This is for binary classification cases.
It uses the sigmoid function.

It is similar to a regression artificial neural network.

Let's take the case of the probability that a car will be sold.
It is always between 0 and 1.

X1: Mileage
The sigmoid function is applied to the result and turns it into a probability.
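A small sketch of the car example, assuming a single input X1 (mileage, here in thousands of km) and made-up values for the weight and bias. The names probability_sold, weight, and bias are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weight and bias for X1 (mileage).
weight, bias = -0.05, 2.0

def probability_sold(mileage):
    # Linear part first (weight * input + bias), then the sigmoid activation.
    return sigmoid(weight * mileage + bias)

low_mileage = probability_sold(10.0)     # linear part is 1.5
high_mileage = probability_sold(100.0)   # linear part is -3.0

print(low_mileage, high_mileage)
```

Whatever the linear part produces, the output stays between 0 and 1, so it can be read as a probability.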

3.2 Multiclass Outcome

3.2.1 Activation Function

There can be more than one output; that is a multiclass outcome.

Use it when you want multiple results.

Each one of the connections has a weight.
Each output has a corresponding equation,
so three outputs give three different equations.

The reason for doing this is to predict the probability of each event.
The probabilities always add up to 1.

  • 0.5 chance that the car is sold
  • 0.2 chance that the car is not sold
  • 0.3 chance that the car is leased.

3.2.2 Sigmoid Activation Function

We can also use the softmax activation function.
In the same way, the probabilities add up to one.
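A minimal sketch of softmax for three outputs (sold, not sold, leased). The raw scores are hypothetical; the point is only that the resulting probabilities sum to 1.

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for: sold, not sold, leased.
probs = softmax([2.0, 1.0, 0.1])

print(probs, sum(probs))
```

The largest raw score receives the largest probability, and the three values can be compared directly because they share the same total.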

4. Google Colab

Test 4.

Where does Google Colab run?

Ans: In the cloud

5. TensorFlow and Keras

Test 5.

1. Is TensorFlow the only package available to train neural networks?

Ans: False

Why?:
Besides TensorFlow, packages such as PyTorch and Keras can also be used to train neural networks.

2. TensorFlow and Keras were developed by?

Ans: TensorFlow was developed by Google and released in 2015. Keras was created by François Chollet, a Google engineer, and was also released in 2015.

3. Does the sequential model mean adding the layers from left to right?

Ans: Yes, the sequential model means adding the layers from left to right, or top to bottom.

6. Tensors

Tensors have different ranks, and each rank uses a different number of indices.
rank 0: no index (a single value)
rank 1: one index (row)
rank 2: two indices (row and column)
rank 3: three indices (row, column and depth)

In all of them the index starts from 0.
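The ranks can be sketched with nested Python lists. The helper rank is a hypothetical function written for this note; it assumes regular, non-empty nesting.

```python
def rank(tensor):
    """Count how many indices are needed to reach a single number."""
    depth = 0
    while isinstance(tensor, list):
        tensor = tensor[0]   # step one level deeper
        depth += 1
    return depth

scalar = 7                                   # rank 0: no index
vector = [1, 2, 3]                           # rank 1: vector[row]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # rank 2: matrix[row][column]
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # rank 3: cube[row][column][depth]

first = matrix[0][0]   # indices start at 0
two = matrix[0][1]     # the number 2 sits at row 0, column 1

print(rank(scalar), rank(vector), rank(matrix), rank(cube))
```

The number of square brackets needed to reach one value is the rank.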

Deep learning has many cases of rank-3 (3D) data. A typical one is an image, where every pixel holds 3 different values: red, green, and blue.

GPU
GPUs help us build deep learning models faster than CPUs.

Test 6.

  1. Which of the following are examples of tensors?
    They are: Scalar, Vector, Matrix, Cube.

  2. What is the rank of the below tensor?

1 2 3
4 5 6
7 8 9

Ans: 2
Why?:
The rank is the number of indices needed to access a particular element. In the above tensor, accessing the number 2 requires two indices, i.e. the row number and the column number.

7. Deep Learning.

The perceptron was created in 1957 by a psychologist named Frank Rosenblatt. It worked on photos of 400 pixels. This is where AI began.

Then, in 1970, the ANN was created, similar to a linear regression or linear equation.

At the beginning the neural network was simple; researchers then realised that it was not perfect.

In the real world not everything behaves like a linear equation; the curve can have different characteristics.

With a hidden layer, the outcome can bend like an elbow.

The inputs are: the features of the model, X1, X2
Hidden layers: hold the weights; there can be 1, 2, 3, etc.
Output: the ANN can have many outputs if desired.

Once there are two hidden layers, we officially call it a deep learning model. We can add more and more layers, and the model becomes deeper.

DL has existed since around 1980, but it became important from 2012 onward. Every year since 2010, the error of DL models has been getting lower and lower.
Precision tends to increase as we add more hidden layers.

GPT-3 is a general-purpose deep learning model. It is built on 96 hidden layers and can create text similar to a Wikipedia article.

Test 7.

  1. How many hidden layers can be added to a neural network architecture?

Ans: Depends upon the problem statement

Why? The number of hidden layers to be included is a hyperparameter selection that depends on the problem statement.

  2. Does increasing the number of hidden layers in neural networks result in overfitting?

Ans: True

Why? As the number of hidden layers increases, the network tries to capture a more complex function, which can perform very well on training data but fail on test data.

8. Linear Algebra in Data Science

Main and relevant operations.

See the PDFs for more details.

Test 8.

What is the Determinant of the following matrix?

10 5
8 4

Ans: 0

Can you multiply the 3x4 matrix by the 4x2 matrix?

Ans: Yes
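Both answers can be checked with a short sketch in plain Python. The helpers det2 and matmul are written for this note (they are not from a library), and the matrices of ones are placeholder data used only to show the shapes.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]] is a*d - b*c."""
    (a, b), (c, d) = m
    return a * d - b * c

def matmul(A, B):
    """Multiply A (n x k) by B (k x m); the inner dimensions must match."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

d = det2([[10, 5], [8, 4]])          # 10*4 - 5*8

A = [[1] * 4 for _ in range(3)]      # 3x4 matrix of ones
B = [[1] * 2 for _ in range(4)]      # 4x2 matrix of ones
C = matmul(A, B)                     # inner dimensions 4 and 4 match

print(d, len(C), len(C[0]))
```

The product exists because the 4 columns of the first matrix match the 4 rows of the second, and the result takes the outer dimensions, 3x2.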

9. Activation Functions

In the end, no matter how many weights and biases we provide, we are still using a linear equation. That equation predicts linear relationships only; it will not be able to predict a non-linear relationship.

HOWEVER

If we apply an activation function to the weighted sums, this is no longer the case: the model will be able to predict NON-LINEAR RELATIONSHIPS.
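A small sketch of why this matters, using two stacked single-neuron layers with made-up weights. ReLU is used here only as one example of an activation function; the function names are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def two_linear_layers(x):
    h = 2.0 * x + 1.0        # layer 1: weight 2, bias 1
    return 3.0 * h - 4.0     # layer 2: weight 3, bias -4

def with_activation(x):
    h = relu(2.0 * x + 1.0)  # same layer 1, followed by ReLU
    return 3.0 * h - 4.0

xs = (-2.0, 0.0, 2.0)

# Without an activation, the two layers collapse into one line: y = 6x - 1.
collapsed = [two_linear_layers(x) == 6.0 * x - 1.0 for x in xs]

# With ReLU the output bends: equal steps in x no longer give equal steps in y.
ys = [with_activation(x) for x in xs]

print(collapsed, ys)
```

Stacking linear layers alone never leaves the family of straight lines, while a single activation between them is enough to produce a bend.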

SO LET'S SEE THE KINDS OF ACTIVATION FUNCTIONS


The most popular ones.
See descriptions in PDF: Activation Functions

Test 9.

1. What is the range of tanh activation function?

Ans: Tanh is an activation function having an s-shaped structure with a range of -1 to 1.

2. Which activation function can be used in the output layer for the regression problem?

Ans: ReLU

For regression problems, you can simply create the output layer without any activation function, since we are interested in numerical values. ReLU can also be used in the output layer when the target values cannot be negative.

10. Introduction to Gradient Descent.

The goal is to find the lowest point while descending a mountain; this is the idea behind gradient descent.
The problem is that you cannot see the lowest point.

But you can work out how to get there by feeling the ground around your feet and then following the direction in which the gradient is steepest (deepest).

The model needs one weight and one bias to learn.
There is an unlimited number of possible combinations.

Loss is the error in our model.

Cross-entropy is analogous to MSE:

MSE measures the loss for a regression problem.
Cross-entropy measures the loss for a classification problem.
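A minimal sketch of both losses, assuming the binary form of cross-entropy and predicted probabilities strictly between 0 and 1. All the numbers are illustrative.

```python
import math

def mse(actual, predicted):
    """Mean squared error: average of the squared differences (regression)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def binary_cross_entropy(actual, predicted):
    """Average negative log-likelihood of the true labels (binary classification)."""
    return -sum(a * math.log(p) + (1 - a) * math.log(1 - p)
                for a, p in zip(actual, predicted)) / len(actual)

regression_loss = mse([3.0, 5.0], [2.0, 7.0])    # (1 + 4) / 2

good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and right
bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong

print(regression_loss, good, bad)
```

In both cases a smaller number means the predictions are closer to the actual values.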

Learning rate:
It sets how big a step the algorithm takes.
- Low learning rate: smaller steps
- High learning rate: bigger steps
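The effect of the step size can be sketched on a toy loss, loss(w) = w**2, whose minimum is at w = 0. The loss, the starting point, and the three learning rates are assumptions chosen for illustration.

```python
def descend(learning_rate, steps=20, start=5.0):
    """Gradient descent on loss(w) = w**2, whose gradient is 2 * w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w

slow = descend(0.01)      # small steps: still far from the minimum at w = 0
fast = descend(0.4)       # bigger steps: very close to 0 after 20 steps
diverged = descend(1.1)   # steps too big: each update overshoots and moves away

print(slow, fast, diverged)
```

A low rate is safe but slow, a higher rate gets there sooner, and a rate that is too high can make the loss grow instead of shrink.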

Backpropagation

This is in contrast to forward propagation, which goes from left to right: take the data and compute the outputs using the weights and biases.

In backpropagation you start from the right: you calculate the loss, and then the biases and weights are updated from right to left through the model.

Gradient descent steps.


2)

3)

4) Loss needs to reach an acceptable level.

And, of course, we get a new bias and a new weight.
This is the work of the deep learning model.

Test 10.

1. Gradient descent algorithm makes the deep learning model to learn ___?

Ans: Weights and Bias

Why?:
With the help of forward propagation and backpropagation, neural networks learn the weights and biases that reduce the loss.

2. Which of the below-given functions can be used as a loss function used for the Classification problem?

Ans: Cross-Entropy

Why?:
Cross-entropy is a commonly used loss function for classification problems.

11. Calculating the Gradient Descent.

See the PDF.
See youtube videos about Gradient Descent and derivative.

Here we see that d(loss)/d(error) = 2 * error,
which was obtained with the power rule for derivatives (loss = error^2).

Then we take the derivative of the error with respect to the weight, which is a linear relationship.
error = x * weight - y
x * weight: our prediction
y: the actual value (it could also be the bias)
We can drop y because our interest is the prediction, x * weight.
y is only a constant with respect to the weight, so it is eliminated when we differentiate (a basic rule of calculus), leaving d(error)/d(weight) = x.
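The two derivatives can be combined with the chain rule into a small training loop. This is a sketch for a single weight with no bias; the data point (x = 2, y = 6), the learning rate, and the function name train_weight are assumptions made for the example.

```python
def train_weight(x, y, weight=0.0, learning_rate=0.1, steps=50):
    """Fit a single weight so that x * weight approaches y."""
    for _ in range(steps):
        error = x * weight - y            # prediction minus actual
        d_loss_d_error = 2 * error        # power rule on loss = error ** 2
        d_error_d_weight = x              # y is a constant, so it drops out
        gradient = d_loss_d_error * d_error_d_weight   # chain rule
        weight -= learning_rate * gradient
    return weight

learned = train_weight(x=2.0, y=6.0)      # the ideal weight is 3

print(learned)
```

Each step multiplies the two derivatives to get d(loss)/d(weight) and moves the weight a little in the opposite direction.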

Test 11.

Gradient Descent is an optimization algorithm used to reduce the loss?

Yes, because gradient descent is an optimization technique that is used to improve deep learning and neural network-based models by minimizing the loss.

12. Gradient Descent Optimization

This could happen: the climber wants to descend, but the way he chose takes him into a pocket where he gets stuck. He is in a local minimum, and what we need to reach is the global minimum.

Fortunately, we have hyperparameters that we can use to avoid this tragedy.
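The climber's problem can be reproduced on a made-up loss curve with two valleys. The curve w**4 - 3*w**2 + w, the starting points, and the learning rate are assumptions chosen only to show the effect.

```python
def loss(w):
    return w ** 4 - 3 * w ** 2 + w   # hypothetical curve with two valleys

def gradient(w):
    return 4 * w ** 3 - 6 * w + 1    # derivative of the loss above

def descend(start, learning_rate=0.01, steps=2000):
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

right_valley = descend(2.0)    # gets stuck in the shallower valley (local minimum)
left_valley = descend(-2.0)    # reaches the deeper valley (global minimum)

print(right_valley, loss(right_valley))
print(left_valley, loss(left_valley))
```

Both runs stop where the gradient is zero, but only one of them ends at the lowest loss, which is why the starting point and the optimizer settings matter.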

See the PDF

Test 12.

1. State whether the following statement is true or false

"The time taken to train the model decreases as the number of epochs increases"

Ans: False

The number of epochs is a hyperparameter that defines the number of times that the learning algorithm will work through the entire training dataset. Thus, as the number of epochs increases, the model is exposed to the whole dataset more times, which increases the training time.

2. Which of the following are the variants of Optimizers(Gradient Descent)?

Adagrad

RMSprop

Adam

All of the above (Ans)

Why?:
Stochastic Gradient Descent, Adagrad, RMSprop, SGD with Momentum, and Adam are variants of the Optimizers.

13. Normalizing

See the PDF

Test 13.

1. What is the function of Z-Score Normalization?
Ans:
Rescales all the data points so that they have Mean = 0 and Standard Deviation = 1 (it does not change the shape of the distribution).

Z-score normalization refers to the process of normalizing every value in a dataset such that the mean of all of the values is 0 and the standard deviation is 1.
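A minimal sketch of z-score normalization with the standard library, assuming the population standard deviation is used. The sample values are made up so that the mean is 5 and the standard deviation is 2.

```python
import statistics

def z_score(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)   # population standard deviation
    return [(v - mean) / std for v in values]

raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # mean 5, standard deviation 2
scaled = z_score(raw)

print(scaled)
print(statistics.mean(scaled), statistics.pstdev(scaled))
```

After scaling, each value says how many standard deviations it lies above or below the mean.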

14. ANN Hyperparameter

See the PDF

Sometimes the training and validation curves decrease in a similar way.

But in this case the validation curve starts to increase as the epochs increase.

So this is a case of overfitting.

We need to restructure the model and reduce its complexity, for example by decreasing the number of epochs, the number of neurons per layer, or the number of layers, with the aim of improving the validation accuracy.
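The pattern described above can be detected from the loss values themselves. This sketch uses invented loss curves and a hypothetical helper best_epoch; it is not tied to any particular training library.

```python
def best_epoch(val_losses):
    """Return the index (from 0) of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])

# Hypothetical loss curves: training keeps improving, validation turns upward.
train_loss = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18]
val_loss = [0.95, 0.70, 0.55, 0.50, 0.53, 0.60, 0.70]

stop_at = best_epoch(val_loss)

# Overfitting signal: after the best epoch, validation got worse while training got better.
overfitting = (val_loss[-1] > val_loss[stop_at]
               and train_loss[-1] < train_loss[stop_at])

print(stop_at, overfitting)
```

Training past the epoch with the lowest validation loss only improves the fit to the training data, so that epoch is a natural place to stop.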

Tips: the list in the PDF is out of order, so let's look at this image instead.

5) Don't worry about perfection

Eventually we need to say: hey, this is good enough. No models are perfect, but they are useful. If the model is good enough to provide some value, I'm going to put it into production.

So I'm not going to lose sleep over the fact that there might have been a potential 0.001% increase in accuracy out there if I had chosen a different parameter.

Test 14.

Which of the following are the hyperparameters in Neural Networks?

A. Learning Rate

B. Optimizers

C. Epochs

Ans: All of the above
The hyperparameters to tune are the number of neurons, activation function, optimizer, learning rate, batch size, and epochs.

Test 15 (Hands-on GridSearchCV).

Can we use GridSearchCV for tuning the neural network model?

Ans: Yes, we can use GridSearchCV and RandomizedSearchCV to tune the neural network model by wrapping it in a Keras classifier (KerasClassifier).
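The idea behind a grid search can be sketched in plain Python. This is not the scikit-learn GridSearchCV API and it does no cross-validation: grid_search is a hypothetical helper, and fake_validation_score is a stand-in for "train the network with these hyperparameters and return its validation score".

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in the grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for training a model and measuring validation accuracy.
def fake_validation_score(learning_rate, epochs):
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(epochs - 20) / 100

grid = {"learning_rate": [0.001, 0.01, 0.1], "epochs": [10, 20, 50]}
best_params, best_score = grid_search(grid, fake_validation_score)

print(best_params, best_score)
```

A grid of 3 learning rates and 3 epoch counts means 9 training runs, which is why the grid is usually kept small.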