# Machine Learning & Scikit-learn
###### tags: `Machine Learning` `Python` `Scikit-learn`
---
## Before Modeling
### Split Data
* sklearn.model_selection.train_test_split(X, y, test_size=0.2)
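A minimal sketch with synthetic stand-in data (shapes, `random_state`, and variable names here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```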
### Data Normalization
* sklearn.preprocessing.StandardScaler()
* sklearn.preprocessing.MinMaxScaler()
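A minimal sketch (synthetic data assumed); the key point is fitting the scaler on the training split only:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X_train, X_test = np.random.rand(80, 4), np.random.rand(20, 4)  # synthetic stand-in data
scaler = StandardScaler()                      # or MinMaxScaler() for a [0, 1] range
X_train_std = scaler.fit_transform(X_train)    # fit on training data only
X_test_std = scaler.transform(X_test)          # reuse the training statistics
```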
### Validate Trained Model
* model.predict
* sklearn.metrics.mean_squared_error()
* sklearn.metrics.r2_score()
* sklearn.metrics.accuracy_score()
* sklearn.metrics.confusion_matrix()
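A minimal sketch with an assumed fitted classifier (any fitted estimator works the same way):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

X_train, y_train = np.random.rand(80, 4), np.random.randint(0, 2, 80)  # synthetic data
X_test, y_test = np.random.rand(20, 4), np.random.randint(0, 2, 20)
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# regression counterparts: mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred)
```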
---
## Regression Supervised Learning
### Linear Regression
* sklearn.linear_model.LinearRegression()
* .fit
* .coef_
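A minimal sketch; the target is built from a known linear relation (an assumption) so `.coef_` has something to recover:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.rand(100, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.5        # known linear relation (assumption)
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)               # should recover ~[1, 2, 3] and ~0.5
```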
### Polynomial Regression
* sklearn.preprocessing.PolynomialFeatures(degree=n)
* .transform
* .fit
* .fit_transform
* .coef_
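A minimal sketch; a quadratic target (an assumption) shows how the expanded features feed a plain linear model:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.random.rand(100, 1)
y = 2 * X[:, 0] ** 2 + 1                       # quadratic target (assumption)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)                 # columns [1, x, x^2]
reg = LinearRegression().fit(X_poly, y)
print(reg.coef_)
```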
### Ridge, Lasso, ElasticNet
* sklearn.linear_model.Lasso(alpha=n)
* sklearn.linear_model.Ridge(alpha=n)
* sklearn.linear_model.ElasticNet(alpha=n, l1_ratio=m)
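A minimal sketch with synthetic data; the `alpha` and `l1_ratio` values are only illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

X, y = np.random.rand(100, 5), np.random.rand(100)   # synthetic stand-in data
for m in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    m.fit(X, y)
    print(type(m).__name__, m.coef_)                 # Lasso/ElasticNet may zero out coefs
```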
---
## Classification Supervised Learning
### Logistic Regression
* sklearn.linear_model.LogisticRegression()
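A minimal sketch with synthetic binary labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:5]))          # hard labels
print(clf.predict_proba(X[:5]))    # class probabilities
```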
### K-Nearest Neighbor (KNN)
* sklearn.neighbors.KNeighborsClassifier(n_neighbors=n)
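A minimal sketch; `n_neighbors=5` is just an illustrative choice:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.score(X, y))    # mean accuracy on the given data
```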
### Decision Tree
#### CART
* sklearn.tree.DecisionTreeClassifier(max_depth=m)
* sklearn.tree.DecisionTreeClassifier(criterion='gini', max_depth=m)
#### ID3
* sklearn.tree.DecisionTreeClassifier(criterion='entropy', max_depth=m)
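A minimal sketch; note that scikit-learn always uses an optimized CART algorithm, so `criterion='entropy'` only swaps the split measure to the one ID3 uses:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
cart = DecisionTreeClassifier(criterion='gini', max_depth=3).fit(X, y)       # CART-style
id3ish = DecisionTreeClassifier(criterion='entropy', max_depth=3).fit(X, y)  # ID3-style criterion
```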
### Naive Bayes
* sklearn.naive_bayes.GaussianNB()
* sklearn.naive_bayes.MultinomialNB()
* When using Multinomial Naive Bayes, the feature values must be non-negative (>= 0)
* Min-Max scaling is therefore the suggested normalization (see the sketch below)
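A minimal sketch; the data is drawn with negative values on purpose to show why MultinomialNB needs the Min-Max rescale:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X, y = np.random.randn(100, 4), np.random.randint(0, 2, 100)  # contains negative values
gnb = GaussianNB().fit(X, y)                   # handles real-valued features directly
X_mm = MinMaxScaler().fit_transform(X)         # rescale to [0, 1]: MultinomialNB needs >= 0
mnb = MultinomialNB().fit(X_mm, y)
```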
### Random Forests
* sklearn.ensemble.RandomForestClassifier(max_depth=m, n_estimators=n)
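A minimal sketch; `n_estimators` and `max_depth` are illustrative values:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
rf = RandomForestClassifier(n_estimators=100, max_depth=5).fit(X, y)
print(rf.feature_importances_)    # per-feature importance from the ensemble
```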
### Support Vector Machine (SVM)
* sklearn.svm.SVC(kernel='rbf', C=c)
* sklearn.svm.SVC(kernel='poly', C=c)
* sklearn.svm.SVC(kernel='linear', C=c)
* the kernel maps the data into a higher-dimensional space
* C is the penalty (regularization) parameter
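A minimal sketch comparing the three kernels on the same synthetic data (`C=1.0` is just the default-like choice):

```python
import numpy as np
from sklearn.svm import SVC

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
for kernel in ('linear', 'poly', 'rbf'):
    svc = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, svc.score(X, y))
```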
---
## Model Selection
### K-fold Model Evaluation
* sklearn.model_selection.cross_val_score(estimator, X, y, cv=n, scoring='accuracy')
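A minimal sketch using KNN as an arbitrary example estimator:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5, scoring='accuracy')
print(scores.mean(), scores.std())    # average fold accuracy and its spread
```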
### Learning Curve
* sklearn.model_selection.learning_curve(estimator, X, y, cv=n, scoring='accuracy', train_sizes=np.linspace(0.1, 1.0, k))
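A minimal sketch; `train_sizes` here asks for 5 points between 10% and 100% of the training data:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
sizes, train_scores, val_scores = learning_curve(
    KNeighborsClassifier(), X, y, cv=5, scoring='accuracy',
    train_sizes=np.linspace(0.1, 1.0, 5))
```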
### Grid Search
* sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring='accuracy', cv=n)
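A minimal sketch with an illustrative SVC parameter grid:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']},
                    scoring='accuracy', cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```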
---
## Classification Unsupervised Learning
### K-means
* sklearn.cluster.KMeans(n_clusters=n)
* .cluster_centers_ (centroids)
* silhouette score
* sklearn.metrics.silhouette_score(X, labels)
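A minimal sketch; `n_clusters=3` is arbitrary, and `n_init=10` is set explicitly to stay version-independent:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(100, 2)                 # synthetic stand-in data
km = KMeans(n_clusters=3, n_init=10).fit(X)
print(km.cluster_centers_)                 # centroids
print(silhouette_score(X, km.labels_))     # closer to 1 = better-separated clusters
```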
### DBSCAN
* Density-based spatial clustering of applications with noise
* sklearn.cluster.DBSCAN(eps=r, min_samples=n)
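A minimal sketch; `eps=0.2` is only a plausible radius for points in the unit square:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(100, 2)                             # synthetic stand-in data
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(labels)    # -1 marks noise points
```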
### EM
* Expectation-Maximization
* sklearn.mixture.GaussianMixture(n_components=n)
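A minimal sketch; the mixture is fitted with EM, and `predict_proba` exposes the soft responsibilities:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(100, 2)                     # synthetic stand-in data
gmm = GaussianMixture(n_components=3).fit(X)   # fitted via EM
print(gmm.predict(X))                          # hard cluster assignment
print(gmm.predict_proba(X))                    # soft (EM) responsibilities
```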
---
## Dimension Reduction
### SVD
* sklearn.decomposition.TruncatedSVD(n_components=n)
### PCA
* sklearn.decomposition.PCA(n_components=n)
### t-SNE
* sklearn.manifold.TSNE(n_components=n)
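A minimal sketch reducing 10 synthetic features to 2 components with each method:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD, PCA
from sklearn.manifold import TSNE

X = np.random.rand(100, 10)                          # synthetic stand-in data
X_svd = TruncatedSVD(n_components=2).fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2).fit_transform(X)       # mainly for visualization
```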
---
## Ensemble Learning
### Bagging method
* sklearn.ensemble.BaggingClassifier(base_estimator=None, n_estimators=n)
* sklearn.ensemble.BaggingRegressor(base_estimator=None, n_estimators=n)
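A minimal sketch; the base estimator is passed positionally, and a decision tree is the usual (illustrative) choice:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10).fit(X, y)
```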
### Boosting method
* sklearn.ensemble.AdaBoostClassifier(base_estimator=None, n_estimators=n)
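A minimal sketch; with no base estimator given, AdaBoost defaults to depth-1 decision trees (stumps):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
ada = AdaBoostClassifier(n_estimators=50).fit(X, y)   # default base learner: depth-1 tree
```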
### XGBOOST
* https://towardsdatascience.com/xgboost-python-example-42777d01001e
* xgboost.XGBRegressor(n_estimators=n, reg_lambda=1, gamma=0, max_depth=3)
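A minimal sketch using the parameter values from the bullet above (requires the `xgboost` package):

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(100, 4), np.random.rand(100)   # synthetic stand-in data
model = xgb.XGBRegressor(n_estimators=100, reg_lambda=1, gamma=0, max_depth=3)
model.fit(X, y)
```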
### Averaging
* https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html
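A minimal sketch; `voting='soft'` averages the members' predicted probabilities, which is the averaging idea named above (the member estimators are an illustrative pick):

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # synthetic stand-in data
vote = VotingClassifier([('lr', LogisticRegression()), ('knn', KNeighborsClassifier())],
                        voting='soft')   # 'soft' averages predicted probabilities
vote.fit(X, y)
```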
---
## Feature Engineering
### Data Exploration
### Feature Cleaning
### Feature Engineering
### Feature Selection