# Data Science Project List

## 1. Machine Learning

### 1.1 Core ML Libraries
- [scikit-learn](https://scikit-learn.org/): Fundamental machine learning library in Python

### 1.2 Automated Machine Learning
- [AutoML](https://www.automl.org/automl/): Overview of AutoML concepts and applications
- [AutoGluon](https://auto.gluon.ai/stable/index.html): AutoML toolkit for deep learning models

### 1.3 Feature Engineering
- Tools:
  - [Feature-engine](https://feature-engine.trainindata.com/)
  - [Featuretools](https://www.featuretools.com/)
- Resource: [Feature Engineering Tools](https://neptune.ai/blog/feature-engineering-tools)

### 1.4 ML Model Interpretation & Explainability
- Tools:
  - [LIME](https://github.com/marcotcr/lime)
  - [SHAP](https://github.com/shap/shap)
  - [Mlxtend](https://rasbt.github.io/mlxtend/)
  - [InterpretML](https://interpret.ml/)
- Resources:
  - [ML Model Interpretation Tools](https://neptune.ai/blog/ml-model-interpretation-tools)
  - [Explainability & Auditability in ML](https://neptune.ai/blog/explainability-auditability-ml-definitions-techniques-tools)
  - [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/)

### 1.5 Hyperparameter Optimization
- Tools:
  - [Ray Tune](https://docs.ray.io/en/latest/tune/index.html)
  - [Scikit-optimize](https://scikit-optimize.github.io/stable/index.html)
  - [Optuna](https://optuna.org/)
  - [Hyperopt](https://hyperopt.github.io/hyperopt/)
  - [BayesianOptimization](https://bayesian-optimization.github.io/BayesianOptimization/1.5.1/)
  - [GPyOpt](https://sheffieldml.github.io/GPyOpt/)
  - [SigOpt](https://sigopt.org/)
- Resources:
  - [Best Tools for Model Tuning and Hyperparameter Optimization](https://neptune.ai/blog/best-tools-for-model-tuning-and-hyperparameter-optimization)
  - [Hyperparameter Tuning in Python: Complete Guide](https://neptune.ai/blog/hyperparameter-tuning-in-python-complete-guide)
  - [Optuna Guide](https://neptune.ai/blog/optuna-guide-how-to-monitor-hyper-parameter-optimization-runs)
  - [Optuna vs. Hyperopt](https://neptune.ai/blog/optuna-vs-hyperopt)

### 1.6 Model Quality Testing
- Tools:
  - [Deepchecks](https://deepchecks.com/)
  - [Kolena](https://docs.kolena.com/)
- Resource: [Tools for ML Model Testing](https://neptune.ai/blog/tools-ml-model-testing)

### 1.7 Boosting Algorithms
- Tools:
  - [CatBoost](https://catboost.ai/)
  - [XGBoost](https://xgboost.readthedocs.io/)
  - [LightGBM](https://lightgbm.readthedocs.io/)
- Resources:
  - [When to Choose CatBoost Over XGBoost or LightGBM](https://neptune.ai/blog/when-to-choose-catboost-over-xgboost-or-lightgbm)
  - [XGBoost vs. LightGBM](https://neptune.ai/blog/xgboost-vs-lightgbm)
  - [XGBoost: Everything You Need to Know](https://neptune.ai/blog/xgboost-everything-you-need-to-know)

### 1.8 Kernel Methods
- Tool: [CodPy](https://github.com/johnlem/codpy_alpha)
- Resource: [CodPy: a Python library for numerics, machine learning, and statistics](https://arxiv.org/abs/2402.07084)

## 2. Time Series Analysis

### 2.1 Time Series Forecasting
- Tools:
  - [Skforecast](https://skforecast.org)
  - [Nixtla](https://www.nixtla.io/)
  - [Prophet](https://facebook.github.io/prophet/)
  - [GluonTS](https://auto.gluon.ai/stable/tutorials/timeseries/index.html)
- Book: ["Forecasting: principles and practice"](https://OTexts.com/fpp3) by Hyndman & Athanasopoulos

### 2.2 Anomaly Detection
- Tool: [PyOD](https://pyod.readthedocs.io/en/latest/)
- Resources:
  - [Anomaly Detection Resources](https://github.com/yzhao062/anomaly-detection-resources)
  - [How to Use Python for Anomaly Detection](https://dataheadhunters.com/academy/how-to-use-python-for-anomaly-detection-in-data-detailed-steps/)

### 2.3 Change Point Detection
- Tools:
  - [ruptures](https://centre-borelli.github.io/ruptures-docs/)
  - [Kats](https://facebookresearch.github.io/Kats/)
- Resources:
  - [Change Point Detection In Time Series With Python](https://forecastegy.com/posts/change-point-detection-time-series-python/)
  - [A Brief Introduction to Change Point Detection using Python](https://techrando.com/2019/08/14/a-brief-introduction-to-change-point-detection-using-python/)

## 3. Computer Vision

### 3.1 Object Detection
- Tools:
  - [ImageAI](https://imageai.readthedocs.io/)
  - [GluonCV](https://cv.gluon.ai/)
  - [Detectron2](https://ai.meta.com/blog/-detectron2-a-pytorch-based-modular-object-detection-library-/)
- Resource: [Object Detection Algorithms and Libraries](https://neptune.ai/blog/object-detection-algorithms-and-libraries)

### 3.2 Image Processing
- Tools:
  - [OpenCV](https://opencv.org/)
  - [Pillow (PIL Fork)](https://pillow.readthedocs.io/en/stable)
- Resources:
  - [Image Processing in Python](https://neptune.ai/blog/image-processing-python)
  - [Image Processing Python Libraries](https://neptune.ai/blog/image-processing-python-libraries-for-machine-learning)
  - [PIL Image Tutorial](https://neptune.ai/blog/pil-image-tutorial-for-machine-learning)

### 3.3 Graph Neural Networks (GNN)
- Tools:
  - [PyG](https://pytorch-geometric.readthedocs.io/)
  - [DGL](https://neptune.ai/blog/graph-neural-networks-libraries-tools-learning-resources)
  - [Graph Nets](https://github.com/google-deepmind/graph_nets)

## 4. Natural Language Processing (NLP)

### 4.1 NLP Libraries
- Tools:
  - [NLTK](https://www.nltk.org/)
  - [pyLDAvis](https://pyldavis.readthedocs.io/)
  - [Wordcloud](https://amueller.github.io/word_cloud/index.html)
  - [AdaptNLP](https://novetta.github.io/adaptnlp)
  - [Flair](https://flairnlp.github.io/)
  - [Snorkel](https://www.snorkel.org/)
  - [GluonNLP](https://github.com/dmlc/gluon-nlp)
  - [spaCy](https://spacy.io/)

### 4.2 Exploratory Data Analysis for NLP
- Resources:
  - [EDA for NLP Tools](https://neptune.ai/blog/exploratory-data-analysis-natural-language-processing-tools)
  - [Sentiment Analysis in Python](https://neptune.ai/blog/sentiment-analysis-python-textblob-vs-vader-vs-flair)
  - [pyLDAvis Topic Modelling](https://neptune.ai/blog/pyldavis-topic-modelling-exploration-tool-that-every-nlp-data-scientist-should-know)

### 4.3 Large Language Models (LLMs)
- Repo: [LLM101n](https://github.com/karpathy/LLM101n)
- Resources:
  - [A Very Gentle Introduction to Large Language Models without the Hype](https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e)
  - [LLM Training: RLHF and Its Alternatives](https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives)
  - [Prompt Engineering: Classification of Techniques and Prompt Tuning](https://medium.com/the-modern-scientist/prompt-engineering-classification-of-techniques-and-prompt-tuning-6d4247b9b64c)
  - [Automated Prompt Engineering: The Definitive Hands-on Guide](https://medium.com/towards-data-science/automated-prompt-engineering-the-definitive-hands-on-guide-1476c8cd3c50)

## 5. Optimization

### 5.1 Convex Optimization
- Tool: [CVXPY](https://www.cvxpy.org/)
- Book: [Convex Optimization](https://web.stanford.edu/~boyd/cvxbook/) by Boyd and Vandenberghe
- Tutorials:
  - [CVXPY Tutorial](https://youtu.be/kXqu-TqEl7Q?si=FqTy3758kXgZhmHH)
  - [Solving Sudoku with CVXPY](https://youtu.be/USaishDES9s?si=WPBcALQt1Nh5aJxm)

### 5.2 Genetic Algorithms
- Tool: [PyGAD](https://pygad.readthedocs.io/)
- Resource: [Train PyTorch Models Using Genetic Algorithm with PyGAD](https://neptune.ai/blog/train-pytorch-models-using-genetic-algorithm-with-pygad)
- Book: [Introduction to Evolutionary Computing](https://link.springer.com/book/10.1007/978-3-662-44874-8) by Eiben & Smith

### 5.3 Portfolio Optimization
- Tools:
  - [skfolio](https://skfolio.org/)
  - [PyPortfolioOpt](https://pyportfolioopt.readthedocs.io/)

## 6. Reinforcement Learning (RL)
- Tool: [Gymnasium](https://gymnasium.farama.org/) (the maintained successor to OpenAI's Gym)
- Resources:
  - [The Best Tools for Reinforcement Learning in Python](https://neptune.ai/blog/the-best-tools-for-reinforcement-learning-in-python)
  - [AI Reinforcement Learning with OpenAI's Gym](https://levelup.gitconnected.com/ai-reinforcement-learning-with-openais-gym-ead06726663a)

## 7. Other Topics

### 7.1 Bayesian Data Analysis
- Tools:
  - [PyMC](https://www.pymc.io/welcome.html)
  - [ArviZ](https://www.arviz.org/en/latest/)
  - [Bambi](https://bambinos.github.io/bambi/)
  - [NumPyro](https://num.pyro.ai/en/stable/index.html)
- Book: [Bayesian Data Analysis](http://www.stat.columbia.edu/~gelman/book/) by Gelman et al.

### 7.2 Causal Machine Learning
- Tool: [CausalML](https://causalml.readthedocs.io/en/latest/index.html)
- Books:
  - [Python Causality Handbook](https://matheusfacure.github.io/python-causality-handbook/landing-page.html)
  - [Causal Inference: What If](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)

### 7.3 Probabilistic Machine Learning
- Repo: [pyprobml](https://github.com/probml/pyprobml)
- Books:
  - ["Probabilistic Machine Learning" - a book series](http://probml.ai/) by Kevin Murphy
  - [Bayesian Reasoning and Machine Learning](http://www.cs.ucl.ac.uk/staff/d.barber/brml/) by David Barber

### 7.4 Recommender Systems
- Tools:
  - [Surprise](https://surpriselib.com/)
  - [Recommenders](https://github.com/recommenders-team/recommenders)
  - [RecTools](https://github.com/MobileTeleSystems/RecTools)
- Tutorial: [Practical Guide to Building Scalable Recommender Systems in Python](https://medium.com/@anilcogalan/practical-guide-to-building-scalable-recommender-systems-in-python-b175547e6fce)

### 7.5 Symbolic Regression
- Tool: [PySR](https://astroautomata.com/PySR/)
- Tutorials:
  - [A tutorial on simulation-based inference](https://astroautomata.com/blog/simulation-based-inference/)
  - [Powerful ‘Machine Scientists’ Distill the Laws of Physics From Raw Data](https://www.quantamagazine.org/machine-scientists-distill-the-laws-of-physics-from-raw-data-20220510/)