---
lang: ja-jp
breaks: true
---

# Python scikit-learn Machine Learning Library 2021-07-23

## References

> Python - Introductory machine learning with scikit-learn (Hello, world!)
> https://9cubed.info/article/python/0046

> [Python Course] #15: Introductory machine learning with scikit-learn (Hello, world!) [Monologue]
> https://youtu.be/YEo13-vH8Zk
> {%youtube YEo13-vH8Zk %}

> scikit-learn
> https://scikit-learn.org/stable/

> scikit-learn/scikit-learn
> https://github.com/scikit-learn/scikit-learn

> User Guide
> https://scikit-learn.org/stable/user_guide.html

> Examples
> https://scikit-learn.org/stable/auto_examples/index.html

> Choosing the right estimator
> https://scikit-learn.org/stable/_static/ml_map.png
> ![](https://scikit-learn.org/stable/_static/ml_map.png)

## Tested Environment

Note that `pip list` shows the `sklearn` PyPI package (a thin wrapper around scikit-learn) as version 0.0; the actual library version is what `sklearn.__version__` reports.

```shell=
>pip list | find "sklearn"
sklearn            0.0
```

```python=
import sklearn
print("sklearn :", sklearn.__version__)
# sklearn : 0.24.2
```

```shell=
>python -V
Python 3.8.10
```

## Supported Algorithms

1. Classification with a support vector machine
    ```=
    clf = LinearSVC()
    ```
2. Classification with a random forest
    ```=
    clf = RandomForestClassifier()
    ```
3. Classification with k-nearest neighbors
    ```=
    clf = KNeighborsClassifier(n_neighbors=1)
    ```

The scikit-learn User Guide covers:

1. Supervised learning
    - 1.1. Linear Models
    - 1.2. Linear and Quadratic Discriminant Analysis
    - 1.3. Kernel ridge regression
    - 1.4. Support Vector Machines
    - 1.5. Stochastic Gradient Descent
    - 1.6. Nearest Neighbors
    - 1.7. Gaussian Processes
    - 1.8. Cross decomposition
    - 1.9. Naive Bayes
    - 1.10. Decision Trees
    - 1.11. Ensemble methods
    - 1.12. Multiclass and multioutput algorithms
    - 1.13. Feature selection
    - 1.14. Semi-supervised learning
    - 1.15. Isotonic regression
    - 1.16. Probability calibration
    - 1.17. Neural network models (supervised)
2. Unsupervised learning
    - 2.1. Gaussian mixture models
    - 2.2. Manifold learning
    - 2.3. Clustering
    - 2.4. Biclustering
    - 2.5. Decomposing signals in components (matrix factorization problems)
    - 2.6. Covariance estimation
    - 2.7. Novelty and Outlier Detection
    - 2.8. Density Estimation
    - 2.9. Neural network models (unsupervised)
3. Model selection and evaluation
    - 3.1. Cross-validation: evaluating estimator performance
    - 3.2. Tuning the hyper-parameters of an estimator
    - 3.3. Metrics and scoring: quantifying the quality of predictions
    - 3.4. Validation curves: plotting scores to evaluate models
4. Inspection
    - 4.1. Partial Dependence and Individual Conditional Expectation plots
    - 4.2. Permutation feature importance
5. Visualizations
    - 5.1. Available Plotting Utilities
6. Dataset transformations
    - 6.1. Pipelines and composite estimators
    - 6.2. Feature extraction
    - 6.3. Preprocessing data
    - 6.4. Imputation of missing values
    - 6.5. Unsupervised dimensionality reduction
    - 6.6. Random Projection
    - 6.7. Kernel Approximation
    - 6.8. Pairwise metrics, Affinities and Kernels
    - 6.9. Transforming the prediction target (y)
7. Dataset loading utilities
    - 7.1. Toy datasets
    - 7.2. Real world datasets
    - 7.3. Generated datasets
    - 7.4. Loading other datasets
8. Computing with scikit-learn
    - 8.1. Strategies to scale computationally: bigger data
    - 8.2. Computational Performance
    - 8.3. Parallelism, resource management, and configuration
9. Model persistence
    - 9.1. Python specific serialization
    - 9.2. Interoperable formats
10. Common pitfalls and recommended practices
    - 10.1. Inconsistent preprocessing
    - 10.2. Data leakage
    - 10.3. Controlling randomness

###### tags: `Python` `scikit-learn` `sklearn` `Machine Learning`
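The three classifier snippets above can be turned into a complete, runnable example in the spirit of the referenced "Hello, world!" tutorial. This is a minimal sketch: the AND-gate training data is an illustrative assumption, not taken from this note, and any small labeled dataset works the same way.

```python
# Minimal end-to-end classification sketch (assumed AND-gate data).
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier

# Features: pairs of binary inputs; labels: the logical AND of each pair
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]

# Train a linear SVM classifier and predict on a sample
clf = LinearSVC()
clf.fit(X, y)
print(clf.predict([[1, 1]]))

# The estimator is interchangeable: only the constructor line changes.
# With n_neighbors=1, predicting a training point simply returns
# the label stored for its nearest (identical) neighbor.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
print(knn.predict([[1, 1], [0, 1]]))  # -> [1 0]
```

Every scikit-learn estimator exposes the same `fit` / `predict` interface, which is why the three snippets in the list above differ only in the constructor call.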