# DC Genomics Workshop 06-27/28-2018

###### tags: `dc genomics`

Example DC genomics outline: https://dib-lab.github.io/2018-06-27-DIBSI-Genomics/

## Workflow suggestions

+ We propose a change to the order of lessons. We propose:
    + Day 1 = Introduction to command line for bioinformatics
        + Why shell? (use tools, automate)
        + [Why of cloud computing?](http://www.datacarpentry.org/cloud-genomics/01-why-cloud-computing/) (more space. also note you need shell to cloud compute)
            + remove "choosing a cloud section"
            + *NB* this section will be quite short.
        + [Cloud Genomics, Episode 2: Logging onto cloud](http://www.datacarpentry.org/cloud-genomics/)
            + Talk about command structure when `ssh`ing
        + [Shell Genomics](http://www.datacarpentry.org/shell-genomics/) - on cloud, written around a text file. This could be the metadata file, that we reveal later. It could include all [2,443 Lenski samples](https://www.ebi.ac.uk/ena/data/view/PRJNA380528). [Meta-data here](https://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=PRJNA380528&result=read_run&fields=study_accession,sample_accession,secondary_sample_accession,experiment_accession,run_accession,tax_id,scientific_name,instrument_model,library_layout,fastq_ftp,fastq_galaxy,submitted_ftp,submitted_galaxy,sra_ftp,sra_galaxy,cram_index_ftp,cram_index_galaxy&download=txt). Include `fasta` file as well.
            + [Episode 3](http://www.datacarpentry.org/shell-genomics/03-working-with-files/index.html) needs a rewrite. We think we need to cover `cd`, `rm`, `head`, `tail`, `cat`, `print`, `mv`, `cp`, `grep`, `wc`, `less`, `man`, `scp` (teach with `cp`), `curl`
                + Show `grep` by grepping for our 6 samples.
                + We think this could be named "Exploring the Shell"
                + @tomsing1 pirate treasure hunt to demonstrate folder structure in a rewarding way
                + Add **optional** episode that includes `cut`, `paste`, `sort`, `uniq`, `awk`
        + [Shell Genomics, Episode 4: Pipes & Redirection](http://www.datacarpentry.org/shell-genomics/04-redirection/index.html)
            + right now includes `>`, `|`, `sort`, `wc`, and `uniq`, `cut`, `paste`
            + We think it should only include `>` and `|`
        + [Shell Genomics, Episode 05: Writing Scripts](http://www.datacarpentry.org/shell-genomics/05-writing-scripts/index.html).
            + Change name to "Writing For loops & scripts"
            + Don't write a script using `history`.
            + Write the script in `nano`
            + Modify to include `for` loop, addressing variables (`$`) and arguments (`$1`)
            + Also use `print` in the for loop, like [@ctb's Beginner Unix lesson](http://dib-training.readthedocs.io/en/pub/2016-01-13-adv-beg-shell.html).
            + **For non-novice learners optional:** Introduce `tmux`/`screen`, perhaps with for loops.
            + Consider making two episodes
    + Day 2
        + Move [Shell Genomics, Episode 6: Project Organization](http://www.datacarpentry.org/shell-genomics/06-organization/index.html) to [Wrangling Genomics: Variant Calling Workflow](http://www.datacarpentry.org/wrangling-genomics/03-automation/index.html)
        + Project organization and management
            + Data Tidiness (see https://hackmd.io/DbFMOsvATai6Fb8UbEmZLw#)
            + nix formal [Genomics Organization, Episode 02: Planning for NGS Projects](http://www.datacarpentry.org/organization-genomics/02-project-planning/), roll this info into data tidiness where we will have a more relevant spreadsheet to work with
        + Download data instead of moving from hidden files. Download a subsampled dataset that we post on figshare. Note that figshare is acting as our backup.
            + 90% of the chromosome can be thrown out, keeping high coverage of the remaining 10%
        + [Wrangling Genomics, Episode 01: Assessing Read Quality](http://www.datacarpentry.org/wrangling-genomics/00-quality-control/index.html)
            + Backup plan with Cyberduck/FileZilla
        + [Wrangling Genomics, Episode 02: Trimming and Filtering](http://www.datacarpentry.org/wrangling-genomics/01-trimming/index.html)
            + Add a section to show the Trimmomatic manual.
        + [Wrangling Genomics, Episode 03: Variant Calling Workflow](http://www.datacarpentry.org/wrangling-genomics/03-automation/index.html)
            + add information on all of the flags used in the different commands
        + [Wrangling Genomics, Episode 04: Automating a Variant Calling Workflow](http://www.datacarpentry.org/wrangling-genomics/03-automation/index.html)
            + Only live code the "Automating QCing" section.
            + Allow the learners to download a full automated script to look at
        + Move "[Genomics Organization, Episode 04: Examining Data on the NCBI SRA Database](http://www.datacarpentry.org/organization-genomics/03-ncbi-sra/)" to the end of Day 2, and include other resources. Demonstrate finding the SRR accession number in the paper, searching for it in the ENA, and downloading a fastq file with `wget`.
            + This is also nice because people are tired at the end of the day, and we can give them goodies here :)

## Additional suggestions

+ use GitBash instead of PuTTY. Include pasting instructions in GitBash, and note that `open` and `man` don't work in GitBash.
  Relates to https://github.com/datacarpentry/genomics-workshop/issues/41
+ Change the dataset to longer reads (~150bp) from the Lenski lab as suggested in https://github.com/datacarpentry/genomics-workshop/issues/42
+ we suggest removing the use of `df` and `tree` in favor of reinforcing the command `ls`
+ Remove the lesson "Fine tuning your cloud setup"
+ Download subsampled fastq files from an S3 bucket or figshare and talk about write-protecting data once downloaded
    + We need to demonstrate a best practice workflow (for example, if we tell people to write-protect files we should then write-protect them in our workflow)
+ we need to find (or make) a new *exemplar* csv file of metadata that answers a biological question and corresponds to the new dataset proposed in https://github.com/datacarpentry/genomics-workshop/issues/42
    + make sure that the SNPs are biologically relevant
    + we think this paper relates to the data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988878/
    + We need to create a terrible and beautiful metadata file for these data
+ the workshop template really needs editing so that the default does not populate a workshop webpage with R, Python, and shell instructions
+ List of resources at end of Day 2
    + NCBI SRA
    + ENA
    + NCBI Taxonomy (help finding a genome)
    + GenBank vs RefSeq
+ The lesson is really lacking in philosophy and overview.
+ Include links to manuals and pull up the Trimmomatic manual to dissect the flags we include.
+ Add bash vocab!
    + Some overlapping and confusing terms we hear a lot (at least how Mike Lee thinks of them; not peer reviewed!)
        + “command line” or “terminal” – a text-based environment capable of taking input and providing output
        + “shell” – what runs in a terminal, it is your ambassador to your operating system so you can tell it very specific things; the shell translates our language into the computer’s language and then back again so we can communicate with it
        + “bash” – a specific type of shell (by far the most common these days), this describes the programming language your shell understands; the language you use to talk to the shell so it can talk to the computer for you
+ explain the difference between mapping and aligning
+ broken link at http://www.datacarpentry.org/genomics-workshop/setup/
+ `bcftools mpileup` instead of `samtools mpileup`
    + And pipe the pileup output instead of saving it to a bcf file
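
As a rough sketch of the last suggestion (piping `bcftools mpileup` straight into the caller instead of saving an intermediate bcf file), something like the following could work. The function name `call_variants`, the argument order, and the `--ploidy 1` setting are our assumptions for illustration, not what the lesson currently uses.

```bash
#!/usr/bin/env bash
# Sketch only: call_variants and its arguments are hypothetical placeholders.

call_variants() {
    local reference="$1"   # reference genome in FASTA format
    local bam="$2"         # sorted alignment file
    local out_vcf="$3"     # where to write the variant calls

    # mpileup: -O u emits uncompressed BCF (cheap to pipe), -f names the reference.
    # call:    -m uses the multiallelic caller, -v keeps variant sites only,
    #          -O v writes plain-text VCF, --ploidy 1 assumes a haploid genome.
    bcftools mpileup -O u -f "$reference" "$bam" |
        bcftools call --ploidy 1 -m -v -O v -o "$out_vcf"
}
```

It would be called as `call_variants reference.fasta sample.aligned.sorted.bam sample_variants.vcf` (placeholder file names). Nothing is written to disk between the two steps, which is the point of the pipe.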

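
For the proposed "Writing For loops & scripts" episode, a minimal sketch of where the lesson could end up is below. It combines a `for` loop, variables addressed with `$`, and a first argument `$1`. The name `count_reads` and the `.fastq` layout are hypothetical, it uses `echo` for output, and it assumes unwrapped FASTQ files with four lines per read.

```bash
#!/usr/bin/env bash
# Hypothetical teaching example: count_reads is not part of any existing lesson.

count_reads() {
    local dir="$1"                   # first argument: directory to scan
    local file lines
    for file in "$dir"/*.fastq; do
        [ -e "$file" ] || continue   # skip the unexpanded pattern if nothing matches
        lines=$(wc -l < "$file")
        # Assuming four lines per FASTQ record, reads = lines / 4
        echo "$(basename "$file"): $((lines / 4)) reads"
    done
}
```

Learners could type the same loop into `nano` as a script, where `$1` would be the first argument given on the command line, and run it as `count_reads path/to/fastq_dir` (placeholder path).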