# Probability and Statistics

Statistics forms the heart of Machine Learning, and a thorough understanding of it is pivotal to truly grasp the rationale behind the various learning algorithms. Some of the resources below are not disjoint and cover essentially the same material; such redundant resources have been added purposefully so that you can figure out what works best for you.

The Must-Know section is, well, a must-know, so make sure you are very comfortable with these topics; do not try to jump ahead if you are not familiar with these basics. Exercises 2-6 are based on these essentials, so consider solving them before you move on to the next section. The later sections might be a little more challenging, but we hope that you cover at least the 'Good to know' section before the lecture. Completing CS229 over the next month or two is highly recommended and will greatly benefit you in your journey ahead. The resource on 'Introduction to Statistical Learning Theory' is slightly more advanced, but we hope all of you try tackling it; don't be disheartened if you find it challenging. Lastly, resources have also been provided for Bayesian Analysis; try to at least quickly go through them before the lecture, though reading both of these resources is not compulsory at the moment.

You are expected to solve all the questions in the Exercises section before the lecture. The answers to all the Problem Sets can be easily found online; refrain from looking at them directly. If you have spent a considerable amount of time on a problem and still can't wrap your head around it, feel free to contact any senior in the club for help (discussions are also welcome).

## Must Know

1. [Stats 110](https://www.youtube.com/playlist?list=PL2SOU6wwxB0uwwH80KTQ6ht66KWxbzTIo): Lectures 1-16 (basics of probability, conditional probability, common distributions, expected values)
2. [Goodfellow: Chapter 3](https://github.com/janishar/mit-deep-learning-book-pdf/blob/master/chapter-wise-pdf/%5B7%5Dpart-1-chapter-3.pdf): Up to Section 3.11
3. [CS229 Notes on Probability](https://cs229.stanford.edu/lectures-spring2022/cs229-probability_review.pdf): Concise notes on the probability and statistics required to start understanding Machine Learning
4. [MIT 18.05: Hypothesis Testing](https://ocw.mit.edu/courses/18-05-introduction-to-probability-and-statistics-spring-2014/11b300b528689cba71f91588d6248143_MIT18_05S14_Reading17b.pdf): A good resource to develop a sense of hypothesis testing

## Good to know

1. [ISLR](https://static1.squarespace.com/static/5ff2adbe3fe4fe33db902812/t/6009dd9fa7bc363aa822d2c7/1611259312432/ISLR+Seventh+Printing.pdf): Chapter 3 (Linear Regression), Sections 3.1, 3.2 and 3.3
2. [Mathematical Statistics by Hogg and Craig](https://minerva.it.manchester.ac.uk/~saralees/statbook2.pdf): For hypothesis testing, Sections 4.5 to 4.8

## Additional Resources

1. [d2l.ai](https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/index.html): A good resource to see how things are implemented.
2. [CS229: Machine Learning](https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU): An amazing course; try to complete as many lectures as possible
3. [MIT 15.097: Introduction to Statistical Learning Theory](https://ocw.mit.edu/courses/15-097-prediction-machine-learning-and-statistics-spring-2012/3f3332b76e8248226fb2285b91cfc6db_MIT15_097S12_lec14.pdf): Introduction to Statistical Learning Theory
4. [MIT 15.097: Probabilistic Modeling and Bayesian Analysis](https://ocw.mit.edu/courses/15-097-prediction-machine-learning-and-statistics-spring-2012/553a0822984b08bc611306c93533a0a3_MIT15_097S12_lec15.pdf): Introduction to Probabilistic Modeling and Bayesian Analysis

## Exercises

These questions are meant to make sure you understand the basics of the topics mentioned above. Do not hesitate to approach any senior within the group with your doubts.

1. [Problem Set 1](https://ocw.mit.edu/courses/18-s096-topics-in-mathematics-with-applications-in-finance-fall-2013/86e64fb3acfdb8aab38af462aaac1ece_MIT18_S096F13_pset2.pdf): Solve Questions A1 and B4 before the lecture; try solving most of the problem set if possible
2. [Stats 110 - PSet 4](https://projects.iq.harvard.edu/sites/projects.iq.harvard.edu/files/stat110/files/strategic_practice_and_homework_4.pdf): Q2-2 and Q2-3
3. [Stats 110 - PSet 5](https://projects.iq.harvard.edu/sites/projects.iq.harvard.edu/files/stat110/files/strategic_practice_and_homework_5.pdf): Q1-2
4. [Stats 110 - PSet 6](https://projects.iq.harvard.edu/sites/projects.iq.harvard.edu/files/stat110/files/strategic_practice_and_homework_6.pdf): Q1-4
5. [Stats 110 - PSet 7](https://projects.iq.harvard.edu/sites/projects.iq.harvard.edu/files/stat110/files/strategic_practice_and_homework_7.pdf): Q1-1 and Q1-4
6. [Stats 110 - PSet 8](https://projects.iq.harvard.edu/sites/projects.iq.harvard.edu/files/stat110/files/strategic_practice_and_homework_8.pdf): Q1-1, Q1-2 and Q1-3

## Programming Exercises

1. Generate a random array of 10 values between 0 and 1 and compute its average. Repeat the experiment 100 times, storing each average in a list. Plot the averages along with the mean of the distribution.
2. (On the Central Limit Theorem) Generate 100 uniform random variables between 50 and 100 and take their average. Repeat the experiment 1000 times, noting down the average each time. Check whether the distribution of the means is close to a normal distribution by computing the proportion of values between `mu - sigma` and `mu + 2*sigma`. You will also need to compute the proportion of values between `mu - sigma` and `mu + 2*sigma` for a normal distribution, for comparison.
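One possible sketch of the first programming exercise, using only the Python standard library (NumPy and Matplotlib would work equally well); the seed and variable names here are our own choices, and the plotting step is indicated only as a comment:

```python
import random
import statistics

random.seed(0)  # for reproducibility (our choice, not part of the exercise)

# Repeat the experiment 100 times: draw 10 uniform values in [0, 1)
# and record the average of each sample.
sample_means = []
for _ in range(100):
    sample = [random.random() for _ in range(10)]
    sample_means.append(statistics.mean(sample))

# The mean of the Uniform(0, 1) distribution is 0.5, so the sample
# averages should scatter around it.
overall = statistics.mean(sample_means)
print(f"mean of the 100 sample averages: {overall:.3f}")

# To finish the exercise, plot sample_means (e.g. with matplotlib's
# plt.plot) together with a horizontal line at 0.5, the distribution mean.
```

The sample averages of only 10 draws are fairly noisy, but their overall mean should already sit close to 0.5.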
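A hedged sketch of the second programming exercise (the CLT check). The interval `[mu - sigma, mu + 2*sigma]` is taken exactly as the exercise states it; the reference proportion for a normal distribution is computed from the standard normal CDF via `math.erf`:

```python
import math
import random
import statistics

random.seed(0)  # for reproducibility (our choice, not part of the exercise)

# Average 100 Uniform(50, 100) draws; repeat 1000 times, recording each average.
means = []
for _ in range(1000):
    draws = [random.uniform(50, 100) for _ in range(100)]
    means.append(statistics.mean(draws))

# Empirical proportion of averages inside [mu - sigma, mu + 2*sigma],
# where mu and sigma come from the 1000 recorded averages.
mu = statistics.mean(means)
sigma = statistics.stdev(means)
prop = sum(mu - sigma <= m <= mu + 2 * sigma for m in means) / len(means)

# Reference value for a normal distribution: P(-1 < Z < 2) = Phi(2) - Phi(-1),
# using Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
theory = 0.5 * (math.erf(2 / math.sqrt(2)) - math.erf(-1 / math.sqrt(2)))

print(f"empirical: {prop:.3f}, normal reference: {theory:.3f}")
```

If the CLT holds, the empirical proportion should land near the normal reference value of about 0.819; a histogram of `means` (e.g. with matplotlib) is a good visual companion to this check.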