---
title: Linear Neural Networks
tags: ml
---
### Linear Neural Networks
#### Linear regression
Assumptions
- The relationship between the independent variables $\mathbf{x}$ and the dependent variable $y$ is linear
- Any noise is well-behaved (following a Gaussian distribution)
$\hat{y} = w_1 x_1 + \cdots + w_d x_d + b$
- The $w$’s are the weights and $b$ is the bias (see the sketch below)
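A minimal NumPy sketch of this model (the names `linreg`, `X`, `w`, and `b` are illustrative): the prediction for every example at once is a single matrix-vector product.

```python
import numpy as np

def linreg(X, w, b):
    """Linear model: y_hat = Xw + b, computed for all rows of X at once."""
    return X @ w + b

# Toy usage: 5 examples with d = 3 features
X = np.random.randn(5, 3)
w = np.array([1.0, -2.0, 0.5])
b = 1.0
y_hat = linreg(X, w, b)  # shape (5,)
```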
#### Loss Function
- Quantifies the distance between the real and predicted values of the target
- The loss will usually be a non-negative number where smaller values are better and perfect predictions have a loss of 0
- Squared error is the most popular loss function in regression problems

    $l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$
- Average the losses on the training set to measure the quality of a model on the entire dataset of $n$ examples (a code sketch follows)

    $L(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2$
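Both quantities as a short NumPy sketch, with `squared_loss` and `average_loss` as assumed helper names:

```python
import numpy as np

def squared_loss(y_hat, y):
    """Per-example squared error; the 1/2 factor simplifies the gradient."""
    return 0.5 * (y_hat - y) ** 2

def average_loss(y_hat, y):
    """Average the per-example losses over the whole dataset."""
    return squared_loss(y_hat, y).mean()

# Perfect predictions give a loss of 0
y = np.array([1.0, 2.0, 3.0])
print(average_loss(y, y))  # 0.0
```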
#### Analytic Solution
- Subsume the bias $b$ into the parameter $\mathbf{w}$ by appending a column of all ones to the design matrix
- Minimize $\|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2$, where $\mathbf{y}$ is the vector of observations, $\mathbf{X}$ is the design matrix of features, and $\mathbf{w}$ is the weight vector
- Take the derivative with respect to $\mathbf{w}$ and set it equal to zero, yielding the closed-form solution $\mathbf{w}^* = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$
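A sketch of the analytic solution on toy data (in practice `np.linalg.lstsq` is numerically safer than forming the normal equations explicitly):

```python
import numpy as np

n, d = 100, 3
X = np.random.randn(n, d)
y = np.random.randn(n)

# Subsume the bias: append a column of all ones to the design matrix
X1 = np.hstack([X, np.ones((n, 1))])

# Setting the derivative of ||y - X1 w||^2 to zero gives the
# normal equations  X1^T X1 w = X1^T y
w_full = np.linalg.solve(X1.T @ X1, X1.T @ y)
w, b = w_full[:-1], w_full[-1]
```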
#### Minibatch Stochastic Gradient Descent
- Gradient descent
    - Iteratively reduce the error by updating the parameters in the direction that incrementally lowers the loss function
    - Requires taking the derivative of the loss function, which is an average of the losses computed on every single example in the dataset
    - Can be super slow
- Minibatch stochastic gradient descent
    - Sample a random minibatch of examples every time an update needs to be computed

- Algorithm
    - Initialize the values of the model parameters, typically at random
    - Iteratively sample random minibatches from the data, updating the parameters in the direction of the negative gradient (a code sketch follows below)
    - For quadratic losses and affine transformations, we can write it as

        $\mathbf{w} \leftarrow \mathbf{w} - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \mathbf{x}^{(i)} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)$

        $b \leftarrow b - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)$

    - $|\mathcal{B}|$ is the number of examples in each minibatch (batch size)
    - $\eta$ is the learning rate
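A sketch of the full loop in NumPy (the function name `sgd_linreg` and its default hyperparameter values are illustrative), implementing the update above:

```python
import numpy as np

def sgd_linreg(X, y, lr=0.03, batch_size=10, num_epochs=3):
    """Minibatch SGD for a linear model with squared loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0  # initialize parameters (zeros suffice here: the loss is convex)
    for _ in range(num_epochs):
        idx = np.random.permutation(n)  # fresh random order each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            err = X[batch] @ w + b - y[batch]        # w^T x^(i) + b - y^(i)
            w -= lr * X[batch].T @ err / len(batch)  # step along the negative gradient
            b -= lr * err.mean()
    return w, b
```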
- Hyperparameters
    - Parameters that are tunable but not updated in the training loop
    - ex: batch size, learning rate
- Hyperparameter tuning
    - Process by which hyperparameters are chosen
    - Typically requires adjusting them based on the results of the training loop as assessed on a separate validation set (see the sketch below)
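A naive grid-search sketch over these two hyperparameters, reusing the hypothetical `sgd_linreg` from the previous section and scoring each configuration on a held-out validation set (all data here is synthetic):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 3)), rng.normal(size=80)
X_val, y_val = rng.normal(size=(20, 3)), rng.normal(size=20)

def validation_loss(lr, batch_size):
    """Train with one hyperparameter setting, then score on validation data."""
    w, b = sgd_linreg(X_train, y_train, lr=lr, batch_size=batch_size)
    return 0.5 * ((X_val @ w + b - y_val) ** 2).mean()

# Try every (learning rate, batch size) pair; keep the one with lowest validation loss
grid = itertools.product([0.01, 0.03, 0.1], [8, 16, 32])
best_lr, best_bs = min(grid, key=lambda cfg: validation_loss(*cfg))
```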
#### Normal distribution and squared loss