---
tags: ggg, ggg2020, ggg298
---
# Syllabus for GGG 298: Tools to support data-intensive research (Winter 2019/2020)
UC Davis
## URL: http://bit.ly/ggg298syllabus
[toc]
## Code of Conduct
Please abide by [my lab's Code of Conduct](http://ivory.idyll.org/lab/coc.html) in this course.
In particular, this is not an intellectual contest, and please realize that we all have plenty of things to learn.
## Course sessions
[GitHub site for class](https://github.com/ngs-docs/2020-GGG298) - redundant with links below, but more permanent!
### Lab
What: hands-on computational work
When: Wed 9:15am-noon
Where: Bennett Conference Room 203 (2nd floor of the Center for Companion Animal Health)
Sticky notes!!
### Discussion
What: read & discuss a paper.
When: Fri 12-1pm
Where: Shields Library, room 360 (DataLab/Data Science Initiative main classroom)
## Homework
There will be 8 paragraph-length homeworks due, one each week on the reading; they'll be assigned a week in advance and due on Fridays at 11am.
## Grading
The course is pass/fail, and only graded on homework; you need to hand in 6 of the 8 homeworks to pass.
## Office hours
Office hours to meet with Titus will be from 3-5pm on Wednesdays in CCAH 251 (just down the hallway from Bennett); please use [this online signup sheet](https://calendly.com/ctitusbrown/office-hours). Don't worry too much about the specific time; signing up just signals to me that you want to talk to me (and puts it on my calendar!). (If no one signs up by 1pm that Wednesday, I will feel free not to show up!)
Note that Titus is busy on 1/15 from 3-5pm, and out of town on 2/5 and 2/26.
Shannon's office hour will be from 11am to 12pm on Fridays in the DataLab. Please email me (Shannon) to let me know if you plan to come.
If neither of the listed office hours works and you have questions, please email Shannon (sejoslin@ucdavis.edu) to set up a time to meet.
## Instructors
C. Titus Brown (IOR) (<ctbrown@ucdavis.edu>), Shannon Joslin (<sejoslin@ucdavis.edu>).
## Course description
This course will provide a practical introduction to common tools used in data-intensive research, including the UNIX shell, version control with git, RMarkdown, JupyterLab, and workflows with snakemake. The associated discussion section will connect the lab practicals to foundational concepts in data science, including repeatability/reproducibility, statistics, and publication ethics.
This course is open to all graduate students. No prior computational experience is required or assumed. There will be some minimal overlap with GGG 201(b) topics. All materials will be open to the community and freely available online.
## Schedule of lab topics
Wednesdays, 9-noon: Bennett Conference Room (2nd floor Center for Companion Animal Health).
These will be lab practicals where we take a solid look at a given piece of technology.
1. 1/08 : [Basic UNIX + R/RMarkdown](https://hackmd.io/@ctb/S1_mb0fe8)
2. 1/15 : [UNIX bash shell for file manipulation](https://github.com/ngs-docs/2020-GGG298/tree/master/Week2-UNIX_for_file_manipulation)
3. 1/22 : [conda for software installation](https://hackmd.io/@ctb/BkbkefV-U)
4. 1/29 : [snakemake for data intensive workflows](https://hackmd.io/@ctb/H1MUty3ZU)
5. 2/05 : [Project organization and file manipulation](https://hackmd.io/ppKOha6USvWvk8J6KV9omA)
6. 2/12 : [git and GitHub for change tracking in scripts](https://hackmd.io/@ctb/ryBQTVxXL)
7. 2/19 : [Slurm and the Farm cluster for doing analysis](https://github.com/ngs-docs/2020-GGG298/tree/master/Week7-Slurm_and_Farm_cluster_for_doing_analysis)
8. 2/26 : [RMarkdown for reports, documentation & beyond](https://github.com/ngs-docs/2020-GGG298/blob/master/Week8-Rmarkdown_for_reports_documentation_and_beyond/README.md)
9. 3/04 : [Integrating all the things - a sourmash project!](https://hackmd.io/XgI03HNBRtS6kyKcycKFLA?view)
10. 3/11 : [Advanced Intro UNIX and integration](https://github.com/ngs-docs/2020-GGG298/tree/master/Week10-advanced_intro_UNIX_and_integration)
## Paper discussions
Fridays, noon-1pm: 360 Shields Library (Data Science Initiative classroom).
These will be discussion periods where we explore some of the literature on techniques and processes for (biological) data science.
1. 1/10 - Read and discuss: [A preliminary review of influential works in data-driven discovery](https://springerplus.springeropen.com/articles/10.1186/s40064-016-2888-8), Stalzer & Mentzel, 2016.
2. 1/17 - More on week 1 paper; [first homework due](https://hackmd.io/@ctb/S1_mb0fe8#Homework-for-Week-2).
3. 1/24 - [second homework due](https://hackmd.io/O6MaR9tMSxazAC_UCr_hTg?view#Homework-for-Week-3)
4. 1/31 - [third homework due](https://hackmd.io/To23drs_STONN1zdFb2hkw?view#Homework-for-Week-4)
5. 2/07 - [fourth homework due](https://hackmd.io/UuYTlGyVQ7WLTL-3kX3K5A?view#Homework-for-Week-5)
6. 2/14
7. 2/21 - [homework for week 7](https://hackmd.io/GOM86OhHQ-WKdwxCk1Xtyw?view#Friday-discussion---221)
8. 2/28
9. 3/06
10. 3/13