# Intermediate Python: Machine Learning

https://hackmd.io/@k8hertweck/RML

**Sign in to each class meeting** [here](https://goo.gl/forms/j4MbWJuPoIYeJET12)

This page is for easy access to links we'll use during class. Bookmark it for future reference.

Have you installed Anaconda and run the conda script to add plotnine? Instructions [here](http://www.fredhutch.io/software/#python-jupyter-notebooks)

If you have feedback about this course, please [comment here](https://goo.gl/forms/Bw8dTV0Wghq2iG5i2)

Complete class notes [here](https://github.com/fredhutchio/python_machine_learning)

If you're having trouble viewing a notebook directly in GitHub, try pasting the URL into this [Jupyter Notebook Viewer](https://nbviewer.jupyter.org/) to render the notebook as a static webpage.

_**Note on directory structure**_: Whatever you name your directory for this course, make sure it contains these three subdirectories: `data/`, `img/`, and `notebooks/`

**Week 1: Machine Learning and CRISP-DM Overview; EDA and Data Preparation**

* Files
    * [week1.ipynb](https://raw.githubusercontent.com/fredhutchio/python_machine_learning/master/notebooks/week1.ipynb) <-- using `save as...`, save this notebook inside `notebooks/`, making sure it has the `.ipynb` filename extension
    * (1) [commute-times-train.csv](https://github.com/fredhutchio/python_machine_learning/raw/master/data/commute-times-train.csv); (2) [commute-times-test.csv](https://github.com/fredhutchio/python_machine_learning/raw/master/data/commute-times-test.csv) <-- using `save as...`, save these 2 files inside `data/`, making sure they have the `.csv` filename extension
* Reference
    * [Week 1 Slides from Concepts in ML](https://github.com/fredhutchio/concepts_machine_learning/blob/master/slides/ML_concepts_wk1_slides.pdf)
    * [CRISP-DM](https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining)
    * [Machine Learning](https://en.wikipedia.org/wiki/Machine_learning)

**Week 2: Case Study in Regression**

* Files
    * [week2.ipynb](https://raw.githubusercontent.com/fredhutchio/python_machine_learning/master/notebooks/week2.ipynb) <-- using `save as...`, save this notebook inside `notebooks/`, making sure it has the `.ipynb` filename extension
* Reference
    * [Week 2 Slides from Concepts in ML](https://github.com/fredhutchio/concepts_machine_learning/blob/master/slides/ML_concepts_wk2_slides.pdf)
    * [Supervised Learning](https://en.wikipedia.org/wiki/Supervised_learning)
    * [Regression Analysis](https://en.wikipedia.org/wiki/Regression_analysis)

**Week 3: Classification with Logistic Regression and Random Forests**

* Files
    * [week3.ipynb](https://raw.githubusercontent.com/fredhutchio/python_machine_learning/master/notebooks/week3.ipynb) <-- using `save as...`, save this notebook inside `notebooks/`, making sure it has the `.ipynb` filename extension
    * [Confusion_Matrix.png](https://github.com/fredhutchio/python_machine_learning/raw/master/img/Confusion_Matrix.png) <-- using `save as...`, save this image file inside `img/`, making sure it has the `.png` filename extension
    * [tennis.txt](https://raw.githubusercontent.com/fredhutchio/python_machine_learning/master/data/tennis.txt) <-- using `save as...`, save this text file inside `data/`, making sure it has the `.txt` filename extension
* Reference
    * [Week 3 Slides from Concepts in ML](https://github.com/fredhutchio/concepts_machine_learning/blob/master/slides/ML_concepts_wk3_slides.pdf)
    * [Statistical Classification](https://en.wikipedia.org/wiki/Statistical_classification)
    * [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression)
    * [Confusion Matrix](https://en.wikipedia.org/wiki/Confusion_matrix)
    * [Receiver Operating Characteristic (ROC Curve)](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)
    * [Random Forest](https://en.wikipedia.org/wiki/Random_forest)

**Week 4: PCA and Clustering; Case Study in Unsupervised Learning**

* Files
    * [week4.ipynb](https://github.com/fredhutchio/python_machine_learning/raw/master/notebooks/week4.ipynb) <-- using `save as...`, save this notebook inside `notebooks/`, making sure it has the `.ipynb` filename extension
    * (1) [NCI60_X.csv](https://github.com/fredhutchio/python_machine_learning/raw/master/data/NCI60_X.csv); (2) [NCI60_y.csv](https://github.com/fredhutchio/python_machine_learning/raw/master/data/NCI60_y.csv); (3) [USArrests.csv](https://github.com/fredhutchio/python_machine_learning/raw/master/data/USArrests.csv) <-- using `save as...`, save these 3 files inside `data/`, making sure they have the `.csv` filename extension
    * [pca.gif](https://github.com/fredhutchio/python_machine_learning/raw/master/img/pca.gif) <-- using `save as...`, save this image file inside `img/`, making sure it has the `.gif` filename extension
    * (1) [clusters.png](https://github.com/fredhutchio/python_machine_learning/raw/master/img/clusters.png); (2) [iris-measurements.png](https://github.com/fredhutchio/python_machine_learning/raw/master/img/iris-measurements.png); (3) [kmeans.png](https://github.com/fredhutchio/python_machine_learning/raw/master/img/kmeans.png); (4) [letters-dendrogram.png](https://github.com/fredhutchio/python_machine_learning/raw/master/img/letters-dendrogram.png); (5) [letters-grouped.png](https://github.com/fredhutchio/python_machine_learning/raw/master/img/letters-grouped.png); (6) [letters-ungrouped.png](https://github.com/fredhutchio/python_machine_learning/raw/master/img/letters-ungrouped.png) <-- using `save as...`, save these 6 files inside `img/`, making sure they have the `.png` filename extension
* Reference
    * [Week 4 Slides from Concepts in ML](https://github.com/fredhutchio/concepts_machine_learning/blob/master/slides/ML_concepts_wk4_slides.pdf)
    * [Unsupervised Learning](https://en.wikipedia.org/wiki/Unsupervised_learning)
    * [Cluster Analysis](https://en.wikipedia.org/wiki/Cluster_analysis)
    * [Principal Component Analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis)
    * [Curse of Dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality)

**Resources for continued learning**

* Learn about other courses through fredhutch.io [here](http://www.fredhutch.io/resources/).
* The Fred Hutch Bioinformatics and Data Science Cooperative, or the Coop, hosts many community meetings and office hours about data science. Learn more about these groups [here](https://research.fhcrc.org/coop/en/community/hosted-groups.html).
* Join the [Coop Community Slack](https://join.slack.com/t/fhbig/shared_invite/enQtMzUyMDIxNzk3MDU3LWE5NGUyMTY1NGU0N2VmMmEyNTM5YzM1MmNlMTk2YmM1OWNkMmJiNTQxMTQ4OTNkMTFjMjk3M2Q0MzkwYzQ3NDA) to talk about data science with other Hutch researchers!
* The [Fred Hutch Biomedical Data Science Wiki](https://sciwiki.fredhutch.org) is written by Hutch researchers and staff, and is a great place to find information about data management, bioinformatics, computing, and more.

###### tags: `fredhutch.io` `Machinelearning` `python`
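If you would rather script the setup than use `save as...` for each file, here is a minimal sketch using only the Python standard library. It creates the `data/`, `img/`, and `notebooks/` subdirectories described in the note on directory structure, then downloads the Week 1 files (URLs copied from the Week 1 list) into them. The helper names `make_course_dirs`, `destination`, and `download_files` are hypothetical, written for this example only, and are not part of the course materials. The script assumes each URL's last path segment is the file name you want to keep, which holds for every link on this page.

```python
from pathlib import Path
from urllib.request import urlretrieve

# The three subdirectories the course notebooks expect to find
COURSE_SUBDIRS = ("data", "img", "notebooks")

# Week 1 files (URLs from the Week 1 list), keyed by destination subdirectory
WEEK1_FILES = {
    "notebooks": [
        "https://raw.githubusercontent.com/fredhutchio/python_machine_learning/master/notebooks/week1.ipynb",
    ],
    "data": [
        "https://github.com/fredhutchio/python_machine_learning/raw/master/data/commute-times-train.csv",
        "https://github.com/fredhutchio/python_machine_learning/raw/master/data/commute-times-test.csv",
    ],
}


def make_course_dirs(root):
    """Create root/data, root/img and root/notebooks if they don't exist yet."""
    root = Path(root)
    for name in COURSE_SUBDIRS:
        # exist_ok=True makes this safe to run again each week
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def destination(root, subdir, url):
    """Local path for a download: keep the file name (and extension) from the URL."""
    return Path(root) / subdir / url.rsplit("/", 1)[-1]


def download_files(root, files):
    """Download each URL into its subdirectory under root (needs network access)."""
    for subdir, urls in files.items():
        for url in urls:
            urlretrieve(url, destination(root, subdir, url))
```

To use it, run `root = make_course_dirs("ml_course")` followed by `download_files(root, WEEK1_FILES)`. Here `"ml_course"` is only a placeholder; substitute whatever you named your course directory. For later weeks, build a similar dictionary from that week's links. Because the file name comes straight from the URL, each download keeps its `.ipynb`, `.csv`, `.txt`, `.png`, or `.gif` extension.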