[TOC]

# Deep Learning for academics

These notes are for the Introduction To Deep Learning course offered by the ECE department.

[**Class recordings, two lectures from April 2023**](https://www.youtube.com/playlist?list=PLBY2MLoAD0Ex2zM5yL2EiD9EPFt_j_9MF)

Batch 2021 covered only the first 5 chapters of the recommended book, and we barely had classes after the midsem. The entire month of April went to project discussions and classes moved online, so no topic was covered in detail. Projects were mostly simple tutorial-level ones; try to bring some uniqueness and present results comparing different learning rates and optimizers.

Book to follow: *Deep Learning* by Ian Goodfellow, first 6 chapters

[Playlist if you are bored, by Chai Time Data Science](https://www.youtube.com/playlist?list=PLLvvXm0q8zUbYF1nCy5nc7iyZsgPqPt21)

:::info
**Important but not in the book**
Histograms, basically frequency plotting
:::

## Ch1

Skip this chapter, not much to see here.

## Ch2 Linear Algebra

- Rank of a matrix
- Trace of a matrix
- Span of a matrix
- L1, L2 norms
- [Principal Component Analysis](https://www.youtube.com/watch?v=MLaJbA82nzk)
  - Most important topic in the entire course; it was asked in both the midsem and endsem evaluations (see the PCA sketch at the end of these notes).

## Ch3 Information Theory

- Basics of probability and random processes
- Mass distributions and moments (mean, variance)
- Chain rule of conditional probabilities, see pg75 and pg76 for the diagram
- Gaussian
- Gaussian with the $\beta$ parameter
- Gaussian in matrix form (asked in the endsem for 10 marks: the difference between ***A*** and ***A+***)
- Functions and their properties
  - Sigmoid and softplus
  - Try to derive all 10 relations on pg67; this was asked in the midsem for 5 marks (see the identities at the end of these notes)
- Information theory
  - H(x), entropy
  - KL divergence

## Ch4 Gradient descent

- Jacobian and Hessian (the Jacobian is the matrix of first derivatives, the Hessian is the matrix of second derivatives)

:::info
Good to understand: saddle point, pg86
:::

- KKT optimization, min-max optimization
- Linear least squares; simply remember the formulas for the midsem (see the gradient-descent sketch at the end of these notes)

## Ch5

- Underfitting and overfitting, pg127 diagram
- What is irreducible error? (error due to minor disturbances in our sampled data)
- Stochastic Gradient Descent vs mini-batch
- Curse of dimensionality, small 2-mark question
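Since PCA comes up in both evaluations, here is a minimal NumPy sketch of the usual route via the covariance eigendecomposition. The function name `pca`, the parameter `n_components`, and the toy data are my own choices for illustration, not anything from the course or the book.

```python
import numpy as np

def pca(X, n_components=2):
    # X: (n_samples, n_features) data matrix
    X_centred = X - X.mean(axis=0)                  # subtract each feature's mean
    cov = np.cov(X_centred, rowvar=False)           # (n_features, n_features) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)          # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # sort components by explained variance
    components = eigvecs[:, order[:n_components]]   # top-k eigenvectors as columns
    return X_centred @ components, eigvals[order]   # projected data, sorted variances

# Toy usage: 200 samples of 5 correlated features reduced to 2 dimensions.
X = np.random.randn(200, 5) @ np.random.randn(5, 5)
Z, variances = pca(X, n_components=2)
print(Z.shape, variances)
```

The eigenvectors of the covariance matrix are the principal directions, and the corresponding eigenvalues tell you how much variance each direction explains, which is the part worth being able to argue in an exam answer.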
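For the sigmoid/softplus derivations, these are the standard identities of the kind the book lists around pg67, with $\zeta$ denoting softplus. I am reproducing them from memory, so verify them against the book before the exam.

$$
\begin{aligned}
\sigma(x) &= \frac{1}{1 + e^{-x}} = \frac{e^{x}}{e^{x} + 1} \\
\frac{d}{dx}\,\sigma(x) &= \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \\
1 - \sigma(x) &= \sigma(-x) \\
\zeta(x) &= \log\bigl(1 + e^{x}\bigr) \\
\frac{d}{dx}\,\zeta(x) &= \sigma(x) \\
\log \sigma(x) &= -\zeta(-x) \\
\zeta(x) - \zeta(-x) &= x \\
\sigma^{-1}(x) &= \log\!\left(\frac{x}{1-x}\right), \quad x \in (0,1) \\
\zeta^{-1}(x) &= \log\bigl(e^{x} - 1\bigr), \quad x > 0
\end{aligned}
$$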
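To connect linear least squares (Ch4) with SGD vs mini-batch (Ch5), here is a rough NumPy sketch. The learning rate, batch sizes, epoch count, and toy data are arbitrary choices of mine for illustration, not values prescribed in the course.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=500)

def mse_grad(w, Xb, yb):
    # Gradient of (1/2n) * ||Xb w - yb||^2 with respect to w
    return Xb.T @ (Xb @ w - yb) / len(yb)

def train(batch_size, lr=0.1, epochs=50):
    w = np.zeros(3)
    n = len(y)
    for _ in range(epochs):
        idx = rng.permutation(n)                 # reshuffle samples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w -= lr * mse_grad(w, X[batch], y[batch])
    return w

print("full batch :", train(batch_size=500))     # classic gradient descent
print("mini-batch :", train(batch_size=32))      # mini-batch SGD
print("SGD        :", train(batch_size=1))       # stochastic, batch size 1
print("closed form:", np.linalg.lstsq(X, y, rcond=None)[0])  # least-squares formula
```

Shrinking the batch size makes each update noisier but cheaper: batch size 1 is plain SGD, the full dataset recovers classic gradient descent, and mini-batches sit in between, which is exactly the trade-off the Ch5 discussion is about.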