Use cases:
We need to control how the hyperparameters are selected and find the best strategies for doing so.
Always: Math + Code
1. State whether the following statement is true or false
Ans:
"Deep Learning inspired by the Human Brain".
The neural network is analogous to a linear equation, as used in linear regression.
y = m * x + b
output = weight * input + bias
The more features (characteristics) the model has, the more precise it can be, but it may also start to overfit.
Blue circles are the inputs
The red circle is the output
W are the weights
The higher the number of inputs and weights, the lower the Root Mean Squared Error (RMSE) tends to be on the training data.
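The analogy between a neuron and the linear equation can be sketched in a few lines of Python. This is only an illustration: the function name neuron_output and all the numbers are assumptions, not values from the course material.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias: the linear part of a neuron."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# One input: identical to y = m * x + b with m = 2 and b = 1.
print(neuron_output([3.0], [2.0], 1.0))  # 7.0

# Three inputs (hypothetical values): one weight per input, one shared bias.
print(neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.5))  # 5.0
```

With a single input the neuron is exactly the line y = m * x + b; more inputs only add more weight * input terms to the sum.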
1. If we have 5 different attributes to predict the price of a flight, how many neurons should be there in the input layer?
Ans: 5
Why?:
Input layers are the layers at the beginning of the neural network; the number of neurons in the input layer equals the number of input attributes, here 5.
2. The process of taking an input value and going from left to right through an artificial neural network is called forward propagation?
Ans: True
Why?:
Forward Propagation is the way to move from the Input layer (left) to the Output layer (right) in the neural network.
This is for binary cases: classification.
It uses the sigmoid function.
It's similar to a regression artificial neural network (RANN).
Let's take a case: the probability that a car will be sold.
The output is always between 0 and 1.
X1: Mileage
The sigmoid function activates the result, squeezing it into that range.
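A minimal sketch of the car example, assuming a single input X1 (mileage, here in thousands of km). The weight and bias are made-up numbers chosen only so that a higher mileage gives a lower probability; a real model would learn them.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def probability_sold(mileage_thousands, weight, bias):
    """Linear equation first, then the sigmoid activation on its result."""
    return sigmoid(weight * mileage_thousands + bias)

# Hypothetical weight and bias, not learned from any data.
p_low_mileage = probability_sold(20, weight=-0.05, bias=3.0)
p_high_mileage = probability_sold(150, weight=-0.05, bias=3.0)
print(round(p_low_mileage, 3), round(p_high_mileage, 3))  # 0.881 0.011
```

Whatever the inputs are, the result stays between 0 and 1, so it can be read as a probability.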
There could be more than one output. That is a multiclass outcome,
used when you want multiple results.
Each one of the connections has a weight.
Each output will have a corresponding equation.
Three different equations:
The reason for doing this is to predict the probability of success for each event.
The probabilities always add up to 1.
For this we can use the Softmax activation function,
which makes the probabilities add up to one.
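A small sketch of Softmax for three outputs. The three scores stand in for the results of the three equations and are assumed values; subtracting the maximum score is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that add up to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the three equations.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The largest score gets the largest probability, and the three values together sum to 1 (up to floating-point rounding).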
Where does Google Colab run?
Ans: In the cloud
1. Is TensorFlow the only package available to train neural networks?
Ans: False Why?: Besides TensorFlow, we have the PyTorch and Keras packages to train neural networks.
2. Who developed TensorFlow and Keras?
Ans: TensorFlow was developed by Google and released in 2015. Keras was created by François Chollet, an engineer at Google at the time, and was also first released in 2015.
3. Does the sequential model mean adding the layers from left to right?
Ans: Yes, the sequential model means adding the layers one after another, from left to right or top to bottom.
Tensors have different ranks, and each rank needs a different number of indices.
rank 0: it has no index (a scalar)
rank 1: one index, e.g. a row (a vector)
rank 2: row and column (a matrix)
rank 3: row, column and depth
Indices all start from 0.
Rank-3 (3D) data appears in Deep Learning, for example in images: each pixel holds 3 different values: red, green, blue.
GPU
GPUs help us to build deep learning models faster than CPUs.
Which of the following are examples of tensors?
They are: Scalar, Vector, Matrix, Cube.
What is the rank of the tensor below?
1 2 3
4 5 6
7 8 9
Ans: 2
Why?:
The rank is the number of indices needed to access a particular element. In the tensor above, to access the number 2, two indices must be used, i.e. the row number and the column number.
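The idea of "number of indices needed" can be checked with plain nested Python lists. This is a sketch under the assumption that the lists are non-empty and regularly nested; tensor_rank is a hypothetical helper, not a library function.

```python
def tensor_rank(tensor):
    """Count how many indices are needed to reach a single number."""
    rank = 0
    while isinstance(tensor, list):
        rank += 1
        tensor = tensor[0]  # assumes non-empty, regularly nested lists
    return rank

scalar = 5
vector = [1, 2, 3]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
image = [[[255, 0, 0], [0, 255, 0]]]  # 1 row, 2 pixels, 3 colour values each

print(tensor_rank(scalar), tensor_rank(vector), tensor_rank(matrix), tensor_rank(image))  # 0 1 2 3
print(matrix[0][1])  # 2 (row index 0, column index 1; indices start from 0)
```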
The perceptron was created in 1957 by a psychologist named Frank Rosenblatt. It worked on photos of 400 pixels.
Then, in 1970, the ANN was created, similar to a linear regression or linear equation.
The perceptron was the discovery here. This is where AI began.
At the beginning the neural network was simple; then researchers realised that it was not perfect.
In the real world not everything behaves like a linear equation; the curve can have different characteristics.
With a hidden layer, the outcome can bend like an elbow.
The inputs are: Features of the model, X1, X2
Hidden layers: Weights; there can be 1, 2, 3, etc.
Output: The ANN can have many outputs if desired.
From two hidden layers onward we officially call it a Deep Learning model; we can keep adding layers and the model becomes deeper.
Deep learning has existed since the 1980s, but it only gained importance from 2012. Every year since 2010 the error of DL models has been getting lower and lower.
Precision tends to increase as we add more hidden layers.
GPT-3 is a deep learning model created for general purposes; it has 96 hidden layers and can generate text similar to a Wikipedia article.
Ans: Depends upon the problem statement
Why? The number of hidden layers to be included is a hyperparameter selection that depends on the problem statement.
Why? As the number of hidden layers increases in the network, it tries to capture the complex function which performs very well on train data but fails on test data.
Main and relevant operations.
See more in the PDFs.
What is the Determinant of the following matrix?
10 5
8 4
Ans: 0
Can you multiply the 3x4 matrix by the 4x2 matrix?
Ans: Yes
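Both answers can be verified with a short plain-Python sketch. The helper names det2 and matmul are assumptions for illustration, and the matrices of ones are placeholders used only to check the shapes.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: a*d - b*c."""
    (a, b), (c, d) = m
    return a * d - b * c

def matmul(a, b):
    """Multiply two matrices; columns of a must equal rows of b."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

print(det2([[10, 5], [8, 4]]))  # 0, because 10*4 - 5*8 = 0

a = [[1] * 4 for _ in range(3)]  # 3x4 matrix of ones
b = [[1] * 2 for _ in range(4)]  # 4x2 matrix of ones
product = matmul(a, b)
print(len(product), len(product[0]))  # 3 2
```

The 3x4 by 4x2 product works because the inner dimensions (4 and 4) match, and the result has the outer dimensions, 3x2.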
In the end, no matter how much information we provide through weights and biases, we are still using a linear equation, and that equation can only predict linear relationships; it will not be able to predict non-linear relationships.
HOWEVER
If we apply an activation function to the weighted sums, this is no longer the case: the model will be able to predict NON-LINEAR RELATIONSHIPS.
SO LET'S SEE THE KINDS OF ACTIVATION FUNCTIONS
The most popular ones.
See the descriptions in the PDF: Activation Functions
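A tiny sketch of why the activation function matters, using one input and made-up weights and biases (2, 1, 3, -4 are assumptions). Two stacked linear layers collapse into one linear equation, while putting ReLU between them breaks that.

```python
def relu(z):
    """ReLU activation: negative values become 0."""
    return max(0.0, z)

def two_linear_layers(x):
    hidden = 2.0 * x + 1.0      # first layer, no activation
    return 3.0 * hidden - 4.0   # second layer

def single_linear_layer(x):
    return 6.0 * x - 1.0        # same thing algebraically: 3*(2x + 1) - 4

def two_layers_with_relu(x):
    hidden = relu(2.0 * x + 1.0)  # activation applied to the weighted sum
    return 3.0 * hidden - 4.0

for x in (-2.0, 0.0, 2.0):
    print(x, two_linear_layers(x), single_linear_layer(x), two_layers_with_relu(x))
```

For x = -2.0 the two linear versions both give -13.0, while the ReLU version gives -4.0, so the network with the activation is no longer a single straight line.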
1. What is the range of tanh activation function?
Ans: Tanh is an activation function having an s-shaped structure with a range of -1 to 1.
2. Which activation function can be used in the output layer for the regression problem?
Ans: ReLU
For regression problems, you can simply create the output layer without any activation function, since we are interested in numerical values. ReLU can also be used in the output layer when the predicted values cannot be negative.
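The two answers above can be checked numerically. In this sketch, linear stands for "no activation" and is a hypothetical helper name; the sample inputs are arbitrary.

```python
import math

def relu(z):
    """ReLU: passes positive values through, turns negatives into 0."""
    return max(0.0, z)

def linear(z):
    """No activation: the value is returned unchanged."""
    return z

# tanh stays between -1 and 1 no matter how large the input is.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, round(math.tanh(z), 4), relu(z), linear(z))
```

A linear output can return any number, which suits a general regression target; a ReLU output can never return a negative number.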
The goal is to find the lowest point when descending the mountain; this is the idea behind Gradient Descent.
The problem is that you cannot see the lowest point.
But you can work out how to reach it by feeling around with your feet and then moving in the direction where the gradient is steepest.
You need one weight and one bias to learn.
There is an unlimited number of possible combinations.
Loss is the error of our model.
Cross-entropy is analogous to MSE:
MSE measures the loss for regression problems.
Cross-entropy measures the loss for classification problems.
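Both losses can be written in a few lines. This is a sketch, assuming binary (0/1) labels for the cross-entropy and made-up predictions; the eps clipping is an assumption added only to avoid log(0).

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: the average of the squared errors (regression)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def binary_cross_entropy(actual, predicted, eps=1e-12):
    """Average cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for a, p in zip(actual, predicted):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(a * math.log(p) + (1 - a) * math.log(1 - p))
    return total / len(actual)

print(mse([3.0, 5.0], [2.0, 7.0]))  # 2.5

good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and right
bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong
print(round(good, 3), round(bad, 3))  # 0.105 2.303
```

In both cases a lower number means the predictions are closer to the actual values.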
Learning rate:
It sets how big a step the algorithm takes.
-Low learning rate: smaller steps
-High learning rate: bigger steps
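The effect of the step size can be seen on a toy loss. This sketch assumes loss(w) = w**2 (lowest point at w = 0), a starting point of 1.0 and three arbitrary learning rates; none of these come from the course material.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Run gradient descent on loss(w) = w**2, whose gradient is 2 * w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w -= learning_rate * gradient  # step against the gradient
    return w

low = descend(0.01)      # small steps: still far from 0 after 20 steps
medium = descend(0.1)    # bigger steps: much closer to 0
too_high = descend(1.1)  # steps so big that they overshoot and move away
print(low, medium, too_high)
```

Small steps are safe but slow, and steps that are too big jump over the lowest point again and again.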
Backpropagation
This is in contrast to forward propagation, which goes left to right, e.g., takes the data and applies the weights and biases to get the outputs.
Here you start from the right: you calculate the loss, then update the biases and weights from right to left through the model.
2)
3)
4) Loss needs to reach an acceptable level.
And of course a new bias and weight.
This is the work of the deep learning model.
1. The gradient descent algorithm makes the deep learning model learn ___?
Ans: Weights and Bias
Why?:
With the help of forward propagation and backpropagation, neural networks learn the weights and biases and reduce the loss.
2. Which of the below-given functions can be used as a loss function used for the Classification problem?
Ans: Cross-Entropy
Why?:
Cross-entropy is a commonly used loss function for classification problems.
See the PDF.
See YouTube videos about Gradient Descent and derivatives.
Here we see that d(loss)/d(error) = 2 * error,
which is obtained with the power rule of derivatives, since loss = error^2.
Then we take the derivative of the error with respect to the weight, which is a linear relationship.
error = x * weight - y
x * weight: our prediction
y: the actual value (it could also include the bias)
y does not depend on the weight, so it drops out of the derivative; what matters is the prediction x * weight.
y acts as a constant, and the derivative of a constant is zero (a concept from calculus), so d(error)/d(weight) = x.
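The chain rule from these notes, d(loss)/d(weight) = d(loss)/d(error) * d(error)/d(weight) = 2 * error * x, can be turned into a one-weight training loop. This is a sketch with assumed values: a single data point x = 2, y = 6, a learning rate of 0.05 and 50 steps.

```python
def gradient(x, y, weight):
    """d(loss)/d(weight) for loss = error**2 and error = x * weight - y."""
    error = x * weight - y
    d_loss_d_error = 2 * error  # power rule
    d_error_d_weight = x        # y is a constant, so it drops out
    return d_loss_d_error * d_error_d_weight

def fit_weight(x, y, learning_rate=0.05, steps=50):
    """Repeatedly step the weight against the gradient, starting from 0."""
    weight = 0.0
    for _ in range(steps):
        weight -= learning_rate * gradient(x, y, weight)
    return weight

weight = fit_weight(x=2.0, y=6.0)
print(round(weight, 4))  # 3.0, since 2 * 3 = 6
```

The weight moves towards 3, the value that makes the prediction x * weight equal to y.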
Is Gradient Descent an optimization algorithm used to reduce the loss?
Yes, because Gradient Descent is an optimization technique that is used to improve deep learning and neural network-based models by minimizing the loss.
This could happen: say the climber wants to descend, but the way he chose takes him into a pocket where he gets stuck. He is in a local minimum, and we need to reach the global minimum instead.
Fortunately, we have hyperparameters that we can use to avoid this tragedy.
See the PDF
1. State whether the following statement is true or false
"The time taken to train the model decreases as the number of epochs increases"
Ans: False
The number of epochs is a hyperparameter that defines the number of times the learning algorithm works through the entire training dataset. Thus, as the number of epochs increases, the model passes over the whole dataset more times, which increases the training time.
2. Which of the following are variants of optimizers (Gradient Descent)?
Adagrad
RMSprop
Adam
All of the above (Ans)
Why?:
Stochastic Gradient Descent, Adagrad, RMSprop, SGD with Momentum, and Adam are variants of the Optimizers.
See the PDF
1. What is the function of Z-Score Normalization?
Ans:
Rescales all the data points so that they have Mean = 0 and Standard Deviation = 1 (the shape of the distribution itself is not changed)
Z-score normalization refers to the process of normalizing every value in a dataset such that the mean of all of the values is 0 and the standard deviation is 1.
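A minimal sketch of Z-score normalization with the Python standard library. It assumes the population standard deviation (statistics.pstdev) and a small made-up dataset; z_score_normalize is a hypothetical helper name.

```python
import statistics

def z_score_normalize(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mean) / std for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
z = z_score_normalize(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After the transformation the values have mean 0 and standard deviation 1, while their order and relative spacing stay the same.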
See the PDF
Sometimes the training and validation loss curves decrease together.
But in this case the validation loss starts to increase as the epochs go on.
So this is a case of overfitting.
We need to restructure the model and reduce its complexity, for example: decrease the # of epochs, the # of neurons per layer or the # of layers, etc., with the aim of increasing the validation accuracy.
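One common way to "decrease the # of epochs" is to stop training once the validation loss stops improving. The sketch below is an assumption for illustration, not the method from the PDF: the helper early_stopping_epoch, the patience of 2 and the loss values are all made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` epochs in a row.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Hypothetical validation losses: they fall, then rise again (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]
print(early_stopping_epoch(val_losses))  # (3, 5)
```

Here the best model is the one from epoch 3, and training would stop at epoch 5 instead of running all the epochs.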
The tips in the PDF are out of order, so let's look at this image.
5) Don't worry about perfection
Eventually we need to say: hey, this is good enough. No models are perfect, but they are useful. It's good enough to provide some value, and I'm going to put it into production.
So I'm not going to lose sleep over the fact that there might have been a potential 0.001 % increase in accuracy out there if I had chosen a different parameter.
Which of the following are the hyperparameters in Neural Networks?
A. Learning Rate
B. Optimizers
C. Epochs
Ans: All of above
The hyperparameters to tune are the number of neurons, activation function, optimizer, learning rate, batch size, and epochs.
Test
Can we use GridSearchCv for tuning the neural network model?
Yes, we can use GridSearchCV and RandomizedSearchCV by wrapping the Keras model in a KerasClassifier to tune the neural network model.
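The idea behind a grid search (try every combination of hyperparameters and keep the best one) can be sketched in plain Python. This is not the scikit-learn or Keras API: it is a stand-in that tunes a one-weight model on a single made-up data point, scores it on that same point instead of using cross-validation, and uses an assumed grid of values.

```python
from itertools import product

def train_weight(x, y, learning_rate, epochs):
    """Fit error = x * weight - y with gradient descent, starting from 0."""
    weight = 0.0
    for _ in range(epochs):
        error = x * weight - y
        weight -= learning_rate * 2 * error * x
    return weight

def final_loss(x, y, learning_rate, epochs):
    """Squared error after training with the given hyperparameters."""
    weight = train_weight(x, y, learning_rate, epochs)
    return (x * weight - y) ** 2

# Hypothetical grid of hyperparameters to try.
grid = {"learning_rate": [0.001, 0.01, 0.1], "epochs": [5, 20]}

results = {}
for lr, ep in product(grid["learning_rate"], grid["epochs"]):
    results[(lr, ep)] = final_loss(2.0, 6.0, lr, ep)

best = min(results, key=results.get)
print(best)  # (0.1, 20)
```

A real search would score each combination on held-out data; RandomizedSearchCV follows the same idea but samples only some of the combinations.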