---
title: 'Tâm - Week 4'
tags: CoderSchool, Mariana
---
Week 4
===
## Table of Contents
[TOC]
## Monday
### Intro to Machine Learning
- Scalar: a single number; vector: an ordered list of numbers
- Matrix: a rectangular grid of numbers; each column (or row) can be viewed as a vector
- Vector addition: element-wise; the vectors must have the same dimension
- Norm: the length of a vector (the distance from its start point to its end point)
- Manhattan distance ($L_1$ norm): the sum of the absolute differences of the coordinates
- Infinity norm $\|v\|_\infty$: the largest absolute value among the elements
- $L_p$ norm: $\sqrt[p]{|x_1|^p + |x_2|^p + ... + |x_n|^p}$
- Tensor: a multidimensional generalization of vectors and matrices
- Matrix multiplication (visualized at http://matrixmultiplication.xyz/): the number of columns of the first matrix must equal the number of rows of the second
- **Derivative:**
- sum
- power
- log
- product
- quotient
- chain rule
- Overfitting
- Mean squared error (MSE): $\frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$, the average squared difference between predictions and true values
- A loss function (cost function) measures how wrong the model is; here it is the mean squared error
- **Vectorization:**
- use vector operations to apply a computation to all elements at once instead of looping
- Numpy type casting
- numpy will convert the type of all elements to one type if possible
- `numpy.shape`
- reshape(3,1): three rows, one column `[[1],[2],[3]]`; reshape(1,3) gives one row `[[1,2,3]]`
- shape (3,) is 1D: `[1,2,3]`
- create list of zeros with `np.zeros` and list of ones with `np.ones`
- `np.ones((2,5))` creates an array with 2 rows and 5 columns
- `c = np.full((2,2), 7)` creates a 2x2 array filled with 7s
- Numpy reshaping: reshape(-1,1) or reshape(-1,n) lets numpy infer the -1 dimension; n must divide the total number of elements
- Statistics: `mean`, `min`, `max`, `std`, `var`; pass `axis=1` to compute per row, e.g. `a.max(axis=1)`
- Array broadcasting: numpy automatically stretches arrays of compatible shapes so element-wise operations work without explicit loops
- Vectorized numpy operations can be hundreds of times faster than the equivalent Python for loop (around 400 times in our case); see the sketch below
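A minimal numpy sketch of the ideas above (the array contents and benchmark size are made up for illustration):
```
import time
import numpy as np

a = np.array([1, 2, 3])          # shape (3,), 1-D
col = a.reshape(3, 1)            # shape (3, 1): [[1], [2], [3]]
row = a.reshape(1, -1)           # -1 lets numpy infer the dimension: shape (1, 3)

zeros = np.zeros((2, 5))         # 2x5 array of 0.0
ones = np.ones((2, 5))           # 2x5 array of 1.0
sevens = np.full((2, 2), 7)      # 2x2 array of 7s

# Matrix multiplication: inner dimensions must match, (2, 5) @ (5, 3) -> (2, 3)
product = ones @ np.ones((5, 3))

# Broadcasting: the scalar is "stretched" across the whole array
print(a + 10)                    # [11 12 13]
print(sevens.mean(), sevens.max(axis=1))

# Vectorization vs. a pure-Python loop
x = np.random.rand(1_000_000)
start = time.time()
slow = sum(v * 2 for v in x)     # element-by-element Python loop
loop_time = time.time() - start
start = time.time()
fast = (x * 2).sum()             # vectorized numpy version
vec_time = time.time() - start
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```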
## Tuesday
### ML overview
- Review ML basics from Week 1
- How machines learn: from their errors -> start with a random line -> calculate the error -> adjust and repeat to reduce the error
### Linear Regression
- **Simple Linear Regression:**
- Data Engineer: takes care of ETL (extract, transform, load)
- Mean Squared Error: the average of the squared errors between the line's predictions and the data points
### Polynomial Regression
### Overfitting
### Practice
`from sklearn.model_selection import train_test_split`
`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)`
- `random_state` fixes the random seed so the train/test split is reproducible
```
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train,y_train)
```
`model.fit` returns the model fitted to the data, which exposes `model.coef_` (the slope) and `model.intercept_`
`model.predict(X_test)` returns the model's predictions for the test set
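To sanity-check the fit, we can score the predictions on the held-out test set; a minimal sketch using sklearn's built-in metrics (variable names follow the snippets above):
```
from sklearn.metrics import mean_squared_error

y_pred = model.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", model.score(X_test, y_test))  # 1.0 would be a perfect fit
```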
### Degree Complexity
- Early stopping: stop increasing the polynomial degree (model complexity) once the test-set error starts increasing, even if the training error keeps falling; see the sketch below
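A sketch of this idea using sklearn's `PolynomialFeatures`; the degree range is made up, and `X_train`/`X_test` are assumed to come from the split in the Practice section:
```
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Track train/test error as model complexity grows
for degree in range(1, 10):
    poly = PolynomialFeatures(degree=degree)
    X_train_poly = poly.fit_transform(X_train)
    X_test_poly = poly.transform(X_test)
    model = LinearRegression().fit(X_train_poly, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train_poly))
    test_err = mean_squared_error(y_test, model.predict(X_test_poly))
    # Stop increasing the degree once the test MSE starts climbing
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```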
### Data Leaking
- Data leakage: the training features contain information derived from the label (or from the test set), so the model scores unrealistically well in evaluation; see the sketch below
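One common way leakage sneaks in is preprocessing before splitting. A minimal sketch of the safe order, assuming sklearn's `StandardScaler` and the `X_train`/`X_test` split from above:
```
from sklearn.preprocessing import StandardScaler

# Leaky: fitting the scaler on all of X lets test-set statistics
# influence the training features
# X_scaled = StandardScaler().fit_transform(X)

# Safe: fit the scaler on the training data only, then apply it to both
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```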
### Extra bonus problem
- https://hackmd.io/@sDKJIkjlTzyk6tlHoM59Jg/ry4fBqCAQ?type=view
- get data : https://data.worldbank.org/?locations=MY-VN-US-CN
## Wednesday
### Logistic Regression
- Confusion matrix: the goal is to maximize the diagonal, where we predict correctly. Type I error is a false positive (predicting positive when the true label is negative); Type II error is a false negative (predicting negative when the true label is positive). See the example below
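A quick sketch with sklearn's `confusion_matrix` (the labels are made up for illustration):
```
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
# Rows are true classes, columns are predicted classes;
# the off-diagonal cells are the Type I / Type II errors
print(confusion_matrix(y_true, y_pred))
# [[3 1]   <- 1 false positive (Type I)
#  [1 3]]  <- 1 false negative (Type II)
```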
### Natural Language Processing
- Bag of Words: the vector length equals the total vocabulary size, and its elements sum to the number of words in the sentence
- Problems: the vocabulary (and therefore each vector) can be huge
- Sparsity: most entries in each vector are zero
- Rare words carry very little signal
- TF-IDF
- TF-IDF reduces the score of frequent words that appear in all reviews, highlighting the distinctive words instead; see the sketch below
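A small sketch contrasting raw counts with TF-IDF scores, using sklearn's vectorizers on a made-up toy corpus:
```
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the movie was great", "the movie was terrible", "great acting"]

# Bag of Words: one column per vocabulary word, values are raw counts
bow = CountVectorizer()
print(bow.fit_transform(corpus).toarray())
print(bow.get_feature_names_out())

# TF-IDF: words appearing in every document (e.g. "the", "movie")
# get lower scores than distinctive words like "terrible"
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray().round(2))
```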
## Thursday
- What is an API? An interface that lets one program request data or functionality from another
### Linear Regression from Scratch
- We use gradient descent to minimize cost function (mean-squared-error)
- $w = w - \alpha \cdot dW$
- $b = b - \alpha \cdot db$
- The learning rate $\alpha$ controls the step size of each gradient descent update
- Forward propagation: compute predictions and the cost function from the current weights and bias
- Backward propagation: compute the gradients of the cost and update w and b
- One forward pass plus one backward pass is one iteration
- One epoch is one pass through all of the data
- Mini-batch: a technique where each update uses only a subset of the data
- With cost $J = \frac{1}{m}\sum_{i=1}^{m}\big(y_i - (w x_i + b)\big)^2$, the chain rule gives:
- $dW = \frac{\partial J}{\partial w} = \frac{2}{m}\sum_i \big(y_i - (w x_i + b)\big)\big(y_i - (w x_i + b)\big)' = -\frac{2}{m}\sum_i x_i \big(y_i - (w x_i + b)\big)$
- $db = \frac{\partial J}{\partial b} = -\frac{2}{m}\sum_i \big(y_i - (w x_i + b)\big)$
- **Techniques:**
- Standardization: transform the data to have mean 0 and std of 1
- Normalization: transform the data into the range [0, 1]
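Putting the pieces together, a minimal from-scratch sketch of these update rules in numpy; the toy data, learning rate, and iteration count are made up, and the feature is standardized first as described above:
```
import numpy as np

# Toy data roughly following y = 3x + 2 (made up for illustration)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)

# Standardization: mean 0, std 1
x = (x - x.mean()) / x.std()

w, b = 0.0, 0.0
alpha = 0.1          # learning rate
m = len(x)

for i in range(1000):                        # each full pass is one epoch here
    y_hat = w * x + b                        # forward: predictions
    cost = ((y - y_hat) ** 2).mean()         # mean squared error
    dW = -(2 / m) * (x * (y - y_hat)).sum()  # backward: gradient w.r.t. w
    db = -(2 / m) * (y - y_hat).sum()        # backward: gradient w.r.t. b
    w -= alpha * dW                          # update step
    b -= alpha * db

print(f"w = {w:.3f}, b = {b:.3f}, final cost = {cost:.3f}")
```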
## Friday
### Logistic Regression from scratch
:::info
**Find this document incomplete?** Leave a comment!
:::
###### tags: `CoderSchool` `Mariana` `MachineLearning`