# Revision session - Day 5

## Feedback

Obviously, yesterday I was a bit overzealous and covered too much ground at once. I apologise for that. A lot of it was new to many of you and this must have felt overwhelming or frustrating. I should have cut some aspects (scripting in bash, for example). On the other hand, you have access to the material and the videos, and you can contact us anytime after the course, so hopefully that extra material will come to fruition at some point :slightly_smiling_face:

I will briefly revisit some of the aspects.

* Reproducible Research: we have covered all the minimal requirements (great!) and even if `git` has a steep learning curve, it is worth learning. Bastian mentioned Vince Buffalo's book; I also (specifically for git) refer to the [git book](https://git-scm.com/book/en/v2). What we saw yesterday, which is what one needs 99% of the time (and for the rest there is Google), is covered in sections 1.1 to 1.4 and 2.1 to 2.5.
* Salmon / bash scripting. The idea is as follows: writing scripts is useful for reusability and documentation. A script could be as simple as the solution listed at the end of the [revision session](/q3hEfPmsRuaAl41OraWgWA) yesterday. I introduced some extra concepts:
    * passing arguments
    * validating input
    * separating the logic from the data

```mermaid
flowchart LR
    id1([runScript.sh]) --> id2([tool])
    id3[/submitScriptDS1.sh/] --> id1
    id4[\submitScriptDS2.sh\] --> id1
    id5[/dataset1/] --> id3
    id6[\dataset2\] --> id4
    id2 --> id7[/result1/]
    id2 --> id8[\result2\]
```

The diagram above exemplifies the last point: the parallelograms represent two different datasets (the data), while the rounded boxes represent the logical unit. The logic is agnostic of the data and can easily be reused (e.g. runSalmon.sh), while the submission script (e.g. submitSalmon.sh) is aware of the data and details the specifics (input directory, index directory, output directory, etc.).

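
To make the three concepts a bit more concrete, here is a minimal sketch, not the scripts we used in class. The function names `run_tool` and `submit_dataset`, the directory layout and the `processed.txt` file are all hypothetical, and the real tool call is replaced by a placeholder that only lists the input files.

```shell
#!/bin/sh
# Sketch only: all names below are hypothetical stand-ins.
set -eu

# The "logic" (runScript.sh in the diagram): knows nothing about any dataset.
run_tool() {
  # Passing arguments: everything the logic needs arrives as parameters.
  if [ "$#" -ne 2 ]; then
    echo "usage: run_tool <input_dir> <output_dir>" >&2
    return 1
  fi
  local in_dir="$1"
  local out_dir="$2"

  # Validating input: fail early with a clear message.
  if [ ! -d "$in_dir" ]; then
    echo "error: input directory '$in_dir' not found" >&2
    return 1
  fi
  mkdir -p "$out_dir"

  # Placeholder for the real tool call: record which files would be processed.
  local f
  for f in "$in_dir"/*; do
    [ -e "$f" ] || continue
    basename "$f"
  done > "$out_dir/processed.txt"
}

# The "data" side (submitScriptDS1.sh / submitScriptDS2.sh in the diagram):
# a thin wrapper that only holds dataset-specific paths.
submit_dataset() {
  local name="$1"
  local base="$2"
  run_tool "$base/$name" "$base/results/$name"
}
```

With this layout, handling a second dataset would only need another call such as `submit_dataset dataset2 /path/to/project` (path made up for illustration); the logic in `run_tool` stays untouched.
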
These concepts might feel a bit abstract, but they are at the core of any bioinformatics (actually broader, _i.e._ computer science really!) pipeline, including those written with workflow managers such as Nextflow or Snakemake.

## Nextflow

Bastian has written the whole pipeline you ran during this course as a Nextflow pipeline, and he will briefly demonstrate how to use it.

## Package installation

Download the package from Slack (general channel).

```R
pkgs <- c("ComplexHeatmap", "DESeq2", "DOSE", "DRIMSeq", "EGSEA", "EGSEAdata",
          "EnsDb.Hsapiens.v86", "GOfuncR", "MASS", "UpSetR", "edgeR", "enrichplot",
          "europepmc", "ggvenn", "gplots", "ggridges", "here", "hyperSpec", "learnr",
          "limma", "org.Hs.eg.db", "palmerpenguins", "stageR", "tidyverse", "topGO",
          "tximport", "vsn")

# install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

# install the dependencies (CRAN and Bioconductor)
BiocManager::install(pkgs)

# if R was started in the directory where the downloaded package is
install.packages("RnaSeqTutorials_0.6.4.tar.gz", repos = NULL)

# otherwise, give the path to the file (adjust it to where you saved it)
install.packages("~/Downloads/RnaSeqTutorials_0.6.4.tar.gz", repos = NULL)
```