# Course Notes and Tutorial
*"Machine Learning"*
by Andrew Ng, Stanford University
**Week 1**
* Machine learning examples: Web search, photo tagging, email anti-spam, database mining, self-customizing programs.
* AI -> achieved through learning algorithms
* Machine learning -> the field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel's definition, quoted in the lecture)
* Algorithms:
* Supervised learning -> the algorithm is given data together with the correct answers ("right answers") so it can learn the mapping from input to output
* Classification -> predict a discrete-valued output (one of a small set of categories)
* Regression -> predict a continuous-valued output
* Unsupervised learning -> the algorithm is given data without knowing which answers are correct, so it finds structure on its own by grouping the most similar examples together (clustered relationships within the data)
* Model Representation
* Linear regression (univariate: one input variable x)
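In the course's notation, this hypothesis is a straight line with intercept θ0 and slope θ1:

```latex
h_\theta(x) = \theta_0 + \theta_1 x
```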

* Cost function -> measures the accuracy of our hypothesis function
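For linear regression, the cost is the averaged squared error over the m training examples (x^(i), y^(i)):

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
```

The extra factor of 1/2 is a convenience; it cancels when the derivative is taken for gradient descent.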

The training data set is scattered on the x-y plane. We try to fit a straight line (defined by hθ(x)) that passes through these scattered data points.

The ideal situation is a cost function of 0. When θ1 = 1, we get a slope of 1 which goes through every single data point in our model, so the cost is 0. We try to minimize the cost function; in this case, θ1 = 1 is our global minimum. Our objective is to find, or have the software determine, which values of θ0 and θ1 are best.
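A minimal Octave/MATLAB sketch of this cost computation (the name computeCost and the design-matrix layout are illustrative choices, not necessarily the course's exact starter code):

```matlab
% Squared-error cost J(theta0, theta1) for linear regression.
% X: m-by-2 design matrix [ones(m,1), x]; y: m-by-1 targets;
% theta: 2-by-1 parameter vector [theta0; theta1].
function J = computeCost(X, y, theta)
  m = length(y);                     % number of training examples
  errors = X * theta - y;           % h_theta(x^(i)) - y^(i) for every i
  J = (errors' * errors) / (2 * m); % vectorized sum of squared errors
end
```

With data lying exactly on the line y = x and theta = [0; 1], every error is zero and the function returns J = 0, matching the ideal situation above.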
* Gradient descent

The derivative term -> shrinks or grows θ1 so that it approaches the minimum value (the sign of the slope determines the direction of the step)

The learning rate (alpha) -> controls how large each update step is
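The update rule itself (the "gradient descent equations" referred to below); both parameters must be updated simultaneously:

```latex
\text{repeat until convergence:}\quad
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
\qquad (j = 0, 1)
```

Substituting the squared-error cost gives the linear-regression form:

```latex
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right),
\qquad
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
```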

Gradient descent can converge to a local minimum, even with the learning rate fixed.


The point of all this is that if we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
Gradient descent can be susceptible to local minima in general. However, the optimization problem we have posed here for linear regression has only one global optimum and no other local optima, because the squared-error cost is a convex quadratic function; thus gradient descent always converges to the global minimum (assuming the learning rate α is not too large).
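A short Octave/MATLAB sketch of batch gradient descent for this problem (the function name, alpha = 0.1, and the 1500-iteration count are illustrative assumptions, not values fixed by the course):

```matlab
% Batch gradient descent for linear regression.
% X: m-by-2 design matrix [ones(m,1), x]; y: m-by-1 targets;
% theta: initial 2-by-1 parameters; alpha: learning rate.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  for iter = 1:num_iters
    errors = X * theta - y;                      % prediction errors
    theta = theta - (alpha / m) * (X' * errors); % simultaneous update of theta0 and theta1
  end
end

% Usage sketch (data lying exactly on y = x):
%   x = [1; 2; 3];  y = [1; 2; 3];
%   X = [ones(3, 1), x];
%   theta = gradientDescent(X, y, zeros(2, 1), 0.1, 1500)  % approaches [0; 1]
```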
* Matrix and Vector (Matlab Tutorial)
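A few of the basic operations such a tutorial covers, as an Octave/MATLAB sketch (all values are illustrative):

```matlab
A = [1 2; 3 4; 5 6];   % 3x2 matrix
v = [1; 2];            % 2x1 column vector
size(A)                % ans = [3 2]  (rows, columns)
length(v)              % ans = 2      (size of the longest dimension)
b = A * v;             % matrix-vector product: [5; 11; 17]
At = A';               % transpose: a 2x3 matrix
s = A(2, 1);           % indexing: row 2, column 1 -> 3
```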
* ...