---
title: 'Tâm - Week 4'
tags: CoderSchool, Mariana
---
Week 4
===
## Table of Contents
[TOC]
## Monday
### Intro to Machine Learning
- Scalar: a single number; vector: an ordered list of numbers
- Matrix: a rectangular grid of numbers; each column (or row) can be viewed as a vector
- Vector addition: element-wise; the vectors must have the same dimension
- Norm: the length of a vector (the distance from its start point to its end point)
- Manhattan distance ($L_1$ norm): the sum of the absolute differences of the coordinates
- Infinity norm $\|v\|_\infty$: the largest absolute value among the elements
- $L_p$ norm: $\sqrt[p]{|x_1|^p + |x_2|^p + ... + |x_n|^p}$
- Tensor: a multidimensional generalization of vectors and matrices
- Matrix multiplication (visualized at http://matrixmultiplication.xyz/): the number of columns of the first matrix must equal the number of rows of the second
- **Derivative:**
- sum
- power
- log
- product
- quotient
- chain rule
- Overfitting
- Mean squared error (MSE): $\frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$, the average squared difference between predictions and true values
- A loss function (cost function) measures how wrong the model is; here it is the mean squared error
- **Vectorization:**
- use vector operations to apply a computation to all elements at once instead of looping
- Numpy type casting
- numpy will convert the type of all elements to one type if possible
- `numpy.shape`
- reshape(3,1): three rows, one column `[[1],[2],[3]]`; reshape(1,3) gives one row `[[1,2,3]]`
- shape (3,) is 1D: `[1,2,3]`
- create list of zeros with `np.zeros` and list of ones with `np.ones`
- `np.ones((2,5))` creates an array with 2 rows and 5 columns
- `c = np.full((2,2), 7)` creates a 2x2 array filled with 7s
- Numpy reshaping: reshape(-1,1) or reshape(-1,n) lets numpy infer the -1 dimension; n must divide the total number of elements
- Statistics: `mean`, `min`, `max`, `std`, `var`; pass `axis=1` to compute per row, e.g. `a.max(axis=1)`
- Array broadcasting: numpy automatically stretches arrays of compatible shapes so element-wise operations work without explicit loops
- Vectorized numpy operations can be hundreds of times faster than the equivalent Python for loop (around 400 times in our case); see the sketch below
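A minimal numpy sketch of the ideas above (the array contents and benchmark size are made up for illustration):
```
import time
import numpy as np

a = np.array([1, 2, 3])          # shape (3,), 1-D
col = a.reshape(3, 1)            # shape (3, 1): [[1], [2], [3]]
row = a.reshape(1, -1)           # -1 lets numpy infer the dimension: shape (1, 3)

zeros = np.zeros((2, 5))         # 2x5 array of 0.0
ones = np.ones((2, 5))           # 2x5 array of 1.0
sevens = np.full((2, 2), 7)      # 2x2 array of 7s

# Matrix multiplication: inner dimensions must match, (2, 5) @ (5, 3) -> (2, 3)
product = ones @ np.ones((5, 3))

# Broadcasting: the scalar is "stretched" across the whole array
print(a + 10)                    # [11 12 13]
print(sevens.mean(), sevens.max(axis=1))

# Vectorization vs. a pure-Python loop
x = np.random.rand(1_000_000)
start = time.time()
slow = sum(v * 2 for v in x)     # element-by-element Python loop
loop_time = time.time() - start
start = time.time()
fast = (x * 2).sum()             # vectorized numpy version
vec_time = time.time() - start
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```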
## Tuesday
### ML overview
- Review ML basics from Week 1
- How machines learn: from their errors -> start with a random line -> calculate the error -> adjust and repeat to reduce the error
### Linear Regression
- **Simple Linear Regression:**
- Data Engineer: takes care of ETL (extract, transform, load)
- Mean Squared Error: the average of the squared errors between the line's predictions and the data points
### Polynomial Regression
### Overfitting
### Practice
`from sklearn.model_selection import train_test_split`
`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)`
- `random_state` fixes the random seed so the train/test split is reproducible
```
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train,y_train)
```
`model.fit` returns the model fitted to the data, which exposes `model.coef_` (the slope) and `model.intercept_`
`model.predict(X_test)` returns the model's predictions for the test set
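To sanity-check the fit, we can score the predictions on the held-out test set; a minimal sketch using sklearn's built-in metrics (variable names follow the snippets above):
```
from sklearn.metrics import mean_squared_error

y_pred = model.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", model.score(X_test, y_test))  # 1.0 would be a perfect fit
```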
### Degree Complexity
- Early stopping: stop increasing the polynomial degree (model complexity) once the test-set error starts increasing, even if the training error keeps falling; see the sketch below
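A sketch of this idea using sklearn's `PolynomialFeatures`; the degree range is made up, and `X_train`/`X_test` are assumed to come from the split in the Practice section:
```
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Track train/test error as model complexity grows
for degree in range(1, 10):
    poly = PolynomialFeatures(degree=degree)
    X_train_poly = poly.fit_transform(X_train)
    X_test_poly = poly.transform(X_test)
    model = LinearRegression().fit(X_train_poly, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train_poly))
    test_err = mean_squared_error(y_test, model.predict(X_test_poly))
    # Stop increasing the degree once the test MSE starts climbing
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```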
### Data Leaking
- Data leakage: the training features contain information derived from the label (or from the test set), so the model scores unrealistically well in evaluation; see the sketch below
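One common way leakage sneaks in is preprocessing before splitting. A minimal sketch of the safe order, assuming sklearn's `StandardScaler` and the `X_train`/`X_test` split from above:
```
from sklearn.preprocessing import StandardScaler

# Leaky: fitting the scaler on all of X lets test-set statistics
# influence the training features
# X_scaled = StandardScaler().fit_transform(X)

# Safe: fit the scaler on the training data only, then apply it to both
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```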
### Extra bonus problem
- https://hackmd.io/@sDKJIkjlTzyk6tlHoM59Jg/ry4fBqCAQ?type=view
- get data : https://data.worldbank.org/?locations=MY-VN-US-CN
## Wednesday
### Logistic Regression
- Confusion matrix: the goal is to maximize the diagonal, where we predict correctly. Type I error is a false positive (predicting positive when the true label is negative); Type II error is a false negative (predicting negative when the true label is positive). See the example below
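A quick sketch with sklearn's `confusion_matrix` (the labels are made up for illustration):
```
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
# Rows are true classes, columns are predicted classes;
# the off-diagonal cells are the Type I / Type II errors
print(confusion_matrix(y_true, y_pred))
# [[3 1]   <- 1 false positive (Type I)
#  [1 3]]  <- 1 false negative (Type II)
```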
### Natural Language Processing
- Bag of Words: the vector length equals the total vocabulary size, and its elements sum to the number of words in the sentence
- Problems: the vocabulary (and therefore each vector) can be huge
- Sparsity: most entries in each vector are zero
- Rare words carry very little signal
- TF-IDF
- TF-IDF reduces the score of frequent words that appear in all reviews, highlighting the distinctive words instead; see the sketch below
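A small sketch contrasting raw counts with TF-IDF scores, using sklearn's vectorizers on a made-up toy corpus:
```
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the movie was great", "the movie was terrible", "great acting"]

# Bag of Words: one column per vocabulary word, values are raw counts
bow = CountVectorizer()
print(bow.fit_transform(corpus).toarray())
print(bow.get_feature_names_out())

# TF-IDF: words appearing in every document (e.g. "the", "movie")
# get lower scores than distinctive words like "terrible"
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray().round(2))
```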
## Thursday
- What is an API? An interface that lets one program request data or functionality from another
### Linear Regression from Scratch
- We use gradient descent to minimize cost function (mean-squared-error)
- $w = w - \alpha \cdot dW$
- $b = b - \alpha \cdot db$
- The learning rate $\alpha$ controls the step size of each gradient descent update
- Forward propagation: compute predictions and the cost function from the current weights and bias
- Backward propagation: compute the gradients of the cost and update w and b
- One forward pass plus one backward pass is one iteration
- One epoch is one pass through all of the data
- Mini-batch: a technique where each update uses only a subset of the data
- With cost $J = \frac{1}{m}\sum_{i=1}^{m}\big(y_i - (w x_i + b)\big)^2$, the chain rule gives:
- $dW = \frac{\partial J}{\partial w} = \frac{2}{m}\sum_i \big(y_i - (w x_i + b)\big)\big(y_i - (w x_i + b)\big)' = -\frac{2}{m}\sum_i x_i \big(y_i - (w x_i + b)\big)$
- $db = \frac{\partial J}{\partial b} = -\frac{2}{m}\sum_i \big(y_i - (w x_i + b)\big)$
- **Techniques:**
- Standardization: transform the data to have mean 0 and std of 1
- Normalization: transform the data into the range [0, 1]
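Putting the pieces together, a minimal from-scratch sketch of these update rules in numpy; the toy data, learning rate, and iteration count are made up, and the feature is standardized first as described above:
```
import numpy as np

# Toy data roughly following y = 3x + 2 (made up for illustration)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)

# Standardization: mean 0, std 1
x = (x - x.mean()) / x.std()

w, b = 0.0, 0.0
alpha = 0.1          # learning rate
m = len(x)

for i in range(1000):                        # each full pass is one epoch here
    y_hat = w * x + b                        # forward: predictions
    cost = ((y - y_hat) ** 2).mean()         # mean squared error
    dW = -(2 / m) * (x * (y - y_hat)).sum()  # backward: gradient w.r.t. w
    db = -(2 / m) * (y - y_hat).sum()        # backward: gradient w.r.t. b
    w -= alpha * dW                          # update step
    b -= alpha * db

print(f"w = {w:.3f}, b = {b:.3f}, final cost = {cost:.3f}")
```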
## Friday
### Logistic Regression from scratch
:::info
**Find this document incomplete?** Leave a comment!
:::
###### tags: `CoderSchool` `Mariana` `MachineLearning`