## Group 2 Project: Machine Learning
## Project Discussion
### Planning ###
1. Define Goals
- Steph:
- build a Docker image so someone else can click and run an analysis pipeline
- gain experience using GitHub, Slack, and HackMD for managing a project and version control
- apply workflow management tools (if time)
- learn what works and what doesn't for reproducible science
- Sarah:
- gain experience with Docker in the context of other code
- learn how to work with data repositories and what works well (and what doesn't) for storing and accessing data
- improve Python and R skills by working with others' code
- learn how to implement workflow management tools
- Toktam:
- Learn how to apply reproducible research methods to my own research.
- Learn how to work with containers such as Docker and Singularity.
- Learn professional project management using GitHub.
- Learn to add Binder to GitHub repositories to launch the Docker images and run the code automatically.
- Use resources such as HackMD for project documentation.
- Learn to use workflow tools for the projects.
2. Synthesize Goals
- Build a Docker image and integrate it with Binder for easy launch
- Start with the pre-processing step and, if time allows, add analysis
- Gain more confidence in reproducible computing tools and interdisciplinary project management
- If time, reproduce analysis
- If time, implement automated workflow in Snakemake or similar
3. Pick Topic
*"MNIST dataset classification using machine learning models"*
4. Scaffold Collaboration
* [GitHub repo](https://github.com/cyber-carpentry/group2-machine-learning) - code, project management, final results, and final write-up
* Slack - messaging
* [HackMD](https://hackmd.io/8IlRqMagSr-wxBMXtmtgnA) - notes and resources, which is this page :)
## Project Goals
1. Design a neural network to classify each of the three types of data.
2. Make it reproducible using GitHub, Docker, etc.
3. Dockerize the neural network so that it is reusable on other data.
## Project Resources
* [Our GitHub Page](https://github.com/cyber-carpentry/group2-machine-learning)
* [Original Project Page](https://hackmd.io/@dasberry/H15UULJfr)
* [Our Final Presentation](https://docs.google.com/presentation/d/1WaltYDpdswiPNjTpKWBtGwahz8-ENnlFB8Iw8DHlJYo/edit?usp=sharing)
* [Example Dockerization](https://github.com/NCBI-Hackathons/RNA-Seq-in-the-Cloud/tree/master/Generative%20Adversarial%20Networks)
* [MNIST](http://yann.lecun.com/exdb/mnist/) | [Example solution to MNIST](https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py)
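For orientation, here is a minimal sketch of the kind of classifier the linked Keras example builds: a small fully connected network trained on MNIST. It assumes the TensorFlow/Keras stack that ships with the `jupyter/tensorflow-notebook` image used below; the layer sizes and epoch count are illustrative defaults, not settings from our final notebook.
```python
# Minimal MNIST classifier sketch (illustrative hyperparameters only).
import tensorflow as tf

# Load MNIST, flatten the 28x28 images, and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# A small multilayer perceptron, similar in spirit to the Keras mnist_mlp example.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=128, epochs=5,
          validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))
```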
## Project Code
### Adding code from other repos to this GitHub repo ###
```sh
# clone the repo whose code you want to bring in (replace repoyouwant with its URL)
git clone repoyouwant
cd repoyouwant
# point a new remote named "destination" at our group repo and drop the old origin
git remote add destination https://github.com/cyber-carpentry/group2-machine-learning
git remote rm origin
# merge our group repo's existing history into this clone, then push everything back
git pull --allow-unrelated-histories https://github.com/cyber-carpentry/group2-machine-learning
git push -u destination master
```
### Building a Docker image to run a TensorFlow Jupyter notebook ###
```sh
# pull the official Jupyter + TensorFlow notebook image and serve it on port 8888
docker pull jupyter/tensorflow-notebook
docker run -p 8888:8888 jupyter/tensorflow-notebook
```
With the Jupyter server running, open the URL (including the token) that the container prints in your browser, then write the notebook code or upload an existing notebook.
Make a Dockerfile that reads as follows, where mynotebook.ipynb is the name of the Jupyter notebook you want to use:
```dockerfile
# start from a pinned tag of the Jupyter TensorFlow notebook image
FROM jupyter/tensorflow-notebook:7a3e968dd212
# copy the notebook into the notebook user's home directory inside the image
ADD --chown=100 mynotebook.ipynb /home/jovyan/
```
Build a Docker image that preloads your notebook, where `tfnotebook` is the name you want to give the image:
```sh
docker build -f Dockerfile -t ${DOCKERUSERNAME}/tfnotebook .
```
Then run the image:
```sh
docker run -p 8888:8888 ${DOCKERUSERNAME}/tfnotebook
```
### [Snakemake Notes](https://hackmd.io/vxFyjBMIRcavIDJpWyPsDA#Load-snakemake-Docker-image)
## Ongoing Questions:
If we want to use Docker containers to run multiple instances that train our neural network model via distributed training, how do we set that up?
- Amazon SageMaker seems to automatically deploy multiple instances and run distributed training; a container-based alternative using TensorFlow itself is sketched below
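One non-SageMaker option is TensorFlow's `MultiWorkerMirroredStrategy`, which does data-parallel training across several identical containers, each told who it is through a `TF_CONFIG` environment variable. The sketch below only illustrates the pattern: the worker addresses and the model are placeholders, and in older TensorFlow 2.x releases the strategy lives under `tf.distribute.experimental`.
```python
# Hedged sketch of multi-worker distributed training with TensorFlow.
# Each container sets TF_CONFIG before launching the same script, e.g. for worker 0:
# TF_CONFIG='{"cluster": {"worker": ["worker0:12345", "worker1:12345"]},
#             "task": {"type": "worker", "index": 0}}'
import tensorflow as tf

# In older TF 2.x releases this is tf.distribute.experimental.MultiWorkerMirroredStrategy().
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Build and compile the model inside the strategy scope so variables are replicated.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Calling model.fit(...) on each worker then trains one model across all of them,
# with the input pipeline sharded according to the cluster defined in TF_CONFIG.
```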
## Deliverables:
1. GitHub repo
2. Python source
3. Dockerfile