VM allocations:
Frances: ec2-52-18-173-70.eu-west-1.compute.amazonaws.com:1
Graeme: ec2-34-252-97-49.eu-west-1.compute.amazonaws.com:1
Password is ed-dash-1307
Abhishek (A): ec2-34-252-20-175.eu-west-1.compute.amazonaws.com:1

Notes:
* GG: Allow up to 30 minutes to get everyone set up.
* GG: Cover how to resize the gedit text editor, as it fills the entire VNC window.
* Have a callout box with info on the dataset.
* Lesson 1, exercise 1: provide a hint in the question, e.g. piping into `less` and then using search.
* Typo `etoh_60.fq` in several places.

Ep 03:
* Typo: `_1` and `_2` are reversed in the kallisto rule.
* The exercise "running the kallisto quant rule" should just be a demo.

Ep 04:
**How Snakemake chooses what jobs to run**
* Gotcha 1: when forcing a rule to re-run with `-R`, put no space between `-R` and the rule name (`-Rtrimreads`):
~~~
snakemake -j1 -Rtrimreads -p kallisto.ref1/abundance.h5
~~~

Answers to Question 1:
* `-p` prints the shell commands that will be executed.

Challenge 2:
Completed part 1? A
Completed part 2? A

Ep 02: How many reads are in temp33_1_1.fq?
20593

Exercise 2: once you have an answer, please share it here, or shout if you have problems or unexpected errors.
```
# Three alternative answers from the group. Use only one of them:
# rule names must be unique within a Snakefile.

# Answer 1: one count file per FASTQ file
rule countreads:
    output: "counts/{asample}.txt"
    input:  "reads/{asample}.fq"
    shell:  "echo $(( $(wc -l <{input}) / 4 )) > {output}"

# Answer 2: sample name and read number as separate wildcards
rule countreads:
    output: "{asample}_counts/fq.{n}.count"
    input:  "reads/{asample}_{n}.fq"
    shell:  "echo $(( $(wc -l <{input}) / 4 )) > {output}"

# Answer 3: don't use. It is many-to-one: {asample} appears in the input
# but not in the output, so Snakemake cannot work out its value.
rule countreads:
    output: "countreads_{n}.txt"
    input:  "reads/{asample}_{n}.fq"
    shell:  "echo $(( $(wc -l <{input}) / 4 )) > {output}"
```

Adding the trimreads rule. Give me a thumbs-up once you have it working, or let me know if you get an error.
A (:+1:)

Workflow example (Abhishek):
1. Preprocess training and test data. I: unprocessed data; O: processed data
2. Fit a model. I: processed data; O: model parameters
3. Run a prediction on test data. I: (model parameters, test data); O: predictions
4. Validate predictions. I: (test data, predictions); O: validation metrics

Workflow example (Graeme):
1. Run QC on fastq files using fastqc. Input: fastq files; output: HTML report, zip file
2. Index a transcriptome using salmon. Input: FASTA transcriptome file; output: salmon index directory
3. Quantify transcripts from fastq reads using salmon. Input: fastq files, salmon index directory; output: quant directory

```
Answer from Tim
Illumina demultiplexing
bcl2fastq: input = bcl files, sample sheet; output = fastq files, logs
cutadapt: input = fastq_files; output = metrics
report: input = bcl2fastq_logs, cutadapt_metrics; output = html report
```
```
Answer from Frances
align_to_genome: input = reference genome, fastq files; output = per_sample_bam_files
quantify_expression: input = bam files; output = counts files
differential_expression: input = counts files, contrast file; output = differential expression results
```
Abhishek:
```
Waiting at most 5 seconds for missing files.
MissingOutputException in line 15 of /home/training/yeast/Snakefile:
Job Missing files after 5 seconds:
kallisto.ref1/abundances.h5
kallisto.ref1/abundances.tsv
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
  File "/home/training/miniconda3/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 583, in handle_job_success
  File "/home/training/miniconda3/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 252, in handle_job_success
```
Likely cause: kallisto quant writes `abundance.h5` and `abundance.tsv` (no "s"), so an `output:` that lists `abundances.h5`/`abundances.tsv` is never created and Snakemake raises this error even though the command itself succeeded.

GG:
~~~
Building DAG of jobs...
MissingRuleException:
No rule to produce kallisto.ref1/abundance.tsv (if you use input functions make sure that they don't raise unexpected exceptions).
~~~

GG: 1 index, 6 trim, 3 quant
A: 10 jobs (1 index, 3 quant, 6 trim)
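The Ep 02 read count and the `countreads` shell command both rely on the same arithmetic: a FASTQ record is four lines (header, sequence, `+`, quality), so reads = lines / 4. A minimal Python sketch of that calculation, assuming unwrapped 4-line records; `count_reads` is a hypothetical helper, and unlike the shell version it raises an error instead of silently rounding down when the line count is not a multiple of 4.

```python
import io
from typing import TextIO


def count_reads(handle: TextIO) -> int:
    """Count FASTQ records as (number of lines) / 4.

    Assumes every record is exactly 4 lines with no wrapped sequences,
    the same assumption made by `echo $(( $(wc -l <{input}) / 4 ))`.
    """
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # The shell version would silently round down here.
        raise ValueError(f"{n_lines} lines is not a multiple of 4; not a 4-line FASTQ?")
    return n_lines // 4


# Two made-up records for illustration (not from the course data).
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nIIII\n"
)
print(count_reads(example))  # 2
```

On a real file this would be used as `with open(path) as fh: count_reads(fh)`, where `path` is wherever `temp33_1_1.fq` lives on your VM.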
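Why answer 3 to Exercise 2 cannot work comes down to how wildcards flow: Snakemake matches the requested file against a rule's `output` pattern to get wildcard values, then fills those values into the `input` pattern. The sketch below is a simplified model of that idea for illustration, not Snakemake's actual implementation; `pattern_to_regex` and `resolve_input` are hypothetical helpers, and matching each wildcard with `.+` is an assumption.

```python
import re

# Matches "{name}" placeholders in a pattern such as "reads/{asample}.fq".
WILDCARD = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn e.g. 'counts/{asample}.txt' into a regex with one named group per wildcard."""
    pieces = WILDCARD.split(pattern)  # literal, name, literal, name, ..., literal
    regex = "".join(
        f"(?P<{piece}>.+)" if i % 2 else re.escape(piece)
        for i, piece in enumerate(pieces)
    )
    return re.compile(regex)


def resolve_input(output_pattern: str, input_pattern: str, target: str) -> str:
    """Work out which input file a rule needs in order to build `target`."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    if match is None:
        raise ValueError(f"{target!r} does not match output pattern {output_pattern!r}")
    wildcards = match.groupdict()
    # Every wildcard used in the input must be recoverable from the output.
    missing = set(WILDCARD.findall(input_pattern)) - set(wildcards)
    if missing:
        raise ValueError(f"cannot determine {sorted(missing)} from {target!r}")
    return input_pattern.format(**wildcards)


# Answer 1 and answer 2 both resolve to a concrete input file.
print(resolve_input("counts/{asample}.txt", "reads/{asample}.fq",
                    "counts/temp33_1_1.txt"))        # reads/temp33_1_1.fq
print(resolve_input("{asample}_counts/fq.{n}.count", "reads/{asample}_{n}.fq",
                    "temp33_1_counts/fq.1.count"))   # reads/temp33_1_1.fq

# Answer 3: {asample} is not in the output, so there is nothing to fill it with.
try:
    resolve_input("countreads_{n}.txt", "reads/{asample}_{n}.fq", "countreads_1.txt")
except ValueError as err:
    print("answer 3 fails:", err)
```

The same rule of thumb applies to real Snakefiles: every wildcard in `input:` has to appear in `output:`.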
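GG and A both arrived at 10 jobs (1 index, 6 trim, 3 quant). That total follows from the shape of the DAG: one shared index, one trim job per read file, and one quant job per sample that waits on the index and both of its trimmed reads. A small sketch of that counting, assuming 3 samples with paired `_1`/`_2` reads; the sample names and the job labels `index`, `trimreads:…` and `kallisto_quant:…` are hypothetical, and Python's `graphlib` (3.9+) stands in for Snakemake's scheduler.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical sample names, each with paired reads _1 and _2.
SAMPLES = ["ref1", "etoh60_1", "temp33_1"]


def build_dag(samples):
    """Return {job: set of jobs it depends on} for index + trim + quant."""
    dag = {"index": set()}  # one transcriptome index shared by all samples
    for sample in samples:
        trims = {f"trimreads:{sample}_{read}" for read in (1, 2)}
        for job in trims:
            dag[job] = set()  # trimming only needs the raw reads
        # Quantification needs the index and both trimmed read files.
        dag[f"kallisto_quant:{sample}"] = {"index"} | trims
    return dag


dag = build_dag(SAMPLES)
per_rule = Counter(job.split(":")[0] for job in dag)
print(per_rule["index"], per_rule["trimreads"], per_rule["kallisto_quant"])  # 1 6 3
print(len(dag))  # 10

# One valid execution order: every quant job comes after the jobs it needs.
order = list(TopologicalSorter(dag).static_order())
print(order)
```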
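The workflow examples above are all written as steps with inputs and outputs, which is exactly the information Snakemake uses to build its DAG: a step depends on whichever step produces one of its inputs. A toy sketch of that idea using Abhishek's example, written in scrambled order to show that the run order is derived rather than declared. `STEPS` and `run_order` are hypothetical names, and "test data" is treated as a pre-existing input because no step lists it as an output.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Abhishek's example as {step: (inputs, outputs)}, deliberately out of order.
STEPS = {
    "validate": ({"test data", "predictions"}, {"validation metrics"}),
    "predict": ({"model parameters", "test data"}, {"predictions"}),
    "fit": ({"processed data"}, {"model parameters"}),
    "preprocess": ({"unprocessed data"}, {"processed data"}),
}


def run_order(steps):
    """Derive an execution order by matching each input to the step that produces it."""
    producer = {}
    for step, (_, outputs) in steps.items():
        for out in outputs:
            if out in producer:
                raise ValueError(f"{out!r} is produced by both {producer[out]!r} and {step!r}")
            producer[out] = step
    # Inputs that no step produces are treated as files that already exist.
    graph = {
        step: {producer[i] for i in inputs if i in producer}
        for step, (inputs, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(run_order(STEPS))  # ['preprocess', 'fit', 'predict', 'validate']
```

Graeme's, Tim's and Frances's examples can be written the same way; the duplicate-producer check mirrors the fact that each output file should have exactly one rule that makes it.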