VM allocations:
Frances: ec2-52-18-173-70.eu-west-1.compute.amazonaws.com:1
Graeme: ec2-34-252-97-49.eu-west-1.compute.amazonaws.com:1
Password is ed-dash-1307
Abhishek (A): ec2-34-252-20-175.eu-west-1.compute.amazonaws.com:1
Notes:
* GG: Allow up to 30 minutes to get everyone set up
* GG: Cover how to resize the gedit text editor, as it fills the entire VNC window.
* Have a callout box with info on dataset
* Lesson 1, exercise 1: provide a hint in the question, e.g. piping into `less` and then using search
* Typo etoh_60.fq in several places
Ep 03:
* Typo where _1 and _2 are reversed in the kallisto rule
* Exercise "running the kallisto quant rule" should just be a demo
Ep 04: **How Snakemake chooses what jobs to run**
* Gotcha 1: Changing the rules: note there is no space in `-Rtrimreads`
~~~
snakemake -j1 -Rtrimreads -p kallisto.ref1/abundance.h5
~~~
Answers to Question 1:
* `-p` prints shell commands that will be executed
Challenge2:
Completed part 1? A
Completed part 2? A
Ep2: How many reads in the temp33_1_1.fq
20593
Exercise 2 - once you have an answer please share it here, or shout
if you have problems or unexpected errors.
```
rule countreads:
    output: "counts/{asample}.txt"
    input:  "reads/{asample}.fq"
    shell:
        "echo $(( $(wc -l <{input}) / 4 )) > {output}"

rule countreads:
    output: "{asample}_counts/fq.{n}.count"
    input:  "reads/{asample}_{n}.fq"
    shell:
        "echo $(( $(wc -l <{input}) / 4 )) > {output}"

rule countreads:
    output: "countreads_{n}.txt"  # many to one, so shouldn't use
    input:  "reads/{asample}_{n}.fq"
    shell:
        "echo $(( $(wc -l <{input}) / 4 )) > {output}"
```
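For comparison, a minimal Python sketch of the same arithmetic the shell command does (line count divided by 4). It assumes a well-formed FASTQ file with exactly four lines per read; the file contents and the name `count_reads` are made up for illustration.

```python
import tempfile
from pathlib import Path


def count_reads(fastq_path):
    """Return the number of reads, assuming exactly 4 lines per FASTQ record."""
    with open(fastq_path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


# Tiny made-up FASTQ file with two records, for illustration only.
demo = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.fq"
    demo_path.write_text(demo)
    n_reads = count_reads(demo_path)
    print(n_reads)
```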
Adding the trimreads rule. Give me thumbs-up once you have it working, or let me know if you get an error. A(:+1:)
Workflow example (Abhishek):
1. Preprocess training and test data (I: unprocessed data, O: processed data)
2. Fit a model (I: processed data, O: model parameters)
3. Run a prediction on test data (I: (model parameters, test data), O: predictions)
4. Validation of predictions (I: (test data, predictions), O: validation metrics)
Workflow example (Graeme):
1. Run QC on fastq files using fastqc. input: fastq files, output: html report, zip file
1. Index a transcriptome using salmon. input: fasta transcriptome file, output: salmon index directory
1. Quantify transcripts from fastq reads using salmon. input: fastq files, salmon index dir, output: quant directory
```
Answer from Tim
Illumina demultiplexing
bcl2fastq: input = bcl files, sample sheet
output = fastq files, logs
cutadapt: input = fastq_files
output = metrics
report: input = bcl2fastq_logs, cutadapt_metrics
output = html report
```
```
Answer from Frances
align_to_genome:
input = reference genome, fastq files
output = per_sample_bam_files
quantify_expression:
input = bam files
output = counts files
differential_expression:
input = counts files, contrast file
output = differential expression results
```
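The workflow answers above all have the same shape: each step's outputs become a later step's inputs, and that is enough to work out the running order. A rough Python sketch of that idea, using the align / quantify / differential expression example. The file labels are simplified stand-ins (e.g. "bam files" for per_sample_bam_files), and this is not how Snakemake itself is implemented.

```python
from graphlib import TopologicalSorter

# Each step lists what it reads and writes (labels are illustrative only).
steps = {
    "align_to_genome": {
        "input": {"reference genome", "fastq files"},
        "output": {"bam files"},
    },
    "quantify_expression": {
        "input": {"bam files"},
        "output": {"counts files"},
    },
    "differential_expression": {
        "input": {"counts files", "contrast file"},
        "output": {"differential expression results"},
    },
}

# A step depends on any other step whose outputs overlap its inputs.
depends_on = {
    name: {other for other, spec in steps.items() if spec["output"] & step["input"]}
    for name, step in steps.items()
}

# static_order() yields each step only after everything it depends on.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```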
```abhishek
Waiting at most 5 seconds for missing files.
MissingOutputException in line 15 of /home/training/yeast/Snakefile:
Job Missing files after 5 seconds:
kallisto.ref1/abundances.h5
kallisto.ref1/abundances.tsv
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
File "/home/training/miniconda3/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 583, in handle_job_success
File "/home/training/miniconda3/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 252, in handle_job_success
```
GG
~~~
Building DAG of jobs...
MissingRuleException:
No rule to produce kallisto.ref1/abundance.tsv (if you use input functions make sure that they don't raise unexpected exceptions).
~~~
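One possible cause of a MissingRuleException like this is an output pattern that does not quite match the requested target, e.g. `abundances` vs `abundance`. The sketch below assumes a hypothetical rule output of `kallisto.{sample}/abundances.tsv` (the real Snakefile is not shown here) and mimics wildcard matching with a plain regex. It is a simplification, not Snakemake's own code, and it does not handle a wildcard name repeated within one pattern.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'dir.{sample}/file' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # Even positions are literal text, odd positions are wildcard names.
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(regex + "$")


def match_target(pattern, target):
    """Return the wildcard values if target fits pattern, else None."""
    m = pattern_to_regex(pattern).match(target)
    return m.groupdict() if m else None


target = "kallisto.ref1/abundance.tsv"
# Hypothetical misspelt output pattern: no rule can produce the target.
typo_result = match_target("kallisto.{sample}/abundances.tsv", target)
# Corrected pattern: the target matches and the wildcard is filled in.
good_result = match_target("kallisto.{sample}/abundance.tsv", target)
print(typo_result, good_result)
```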
GG: 1 index, 6 trim, 3 quant
A: 10 jobs (1 index, 3 quant, 6 trim)
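A quick check of the job count, assuming three samples with paired reads (_1 and _2), one shared index, one quant job per sample and one trim job per read file. The sample names are placeholders, not the real ones.

```python
from collections import Counter

samples = ["sampleA", "sampleB", "sampleC"]  # placeholder names
read_ends = ["1", "2"]  # paired reads: _1 and _2

jobs = [("index", None)]  # one index job shared by all samples
jobs += [("quant", s) for s in samples]  # one quant job per sample
jobs += [("trim", f"{s}_{e}") for s in samples for e in read_ends]  # one per read file

counts = Counter(rule for rule, _ in jobs)
print(len(jobs), dict(counts))
```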