<!-- .slide: data-background="https://raw.githubusercontent.com/maxulysse/maxulysse.github.io/main/assets/img/svg/green_white_bg.svg" -->

<a href="https://www.nf-co.re"><img src="https://github.com/nf-core/logos/raw/master/nf-core-logos/nf-core-logo-darkbg.svg" width="65%"></a>

## Reproducible Pipelines for Core Facilities (and you!)

Franziska Bonath ▸ KTH | NGI | SciLifeLab

Maxime U Garcia ▸ Seqera Labs

---

## Outline

* Data Flow to NGI
* QC Steps from Sequencer to Delivery
* Pipelines and Workflow Managers
* nf-core: A Community Curated Set of Pipelines using Nextflow
* Nextflow Pipeline Case Study: Sarek

---

<!-- .slide: data-background="https://hackmd.io/_uploads/H1CZ2IH2j.jpg" -->

## NGI
#### (Stockholm node, Illumina projects only)

|<font size = 6> projects: 631</font>| |<font size = 6> samples: 31686</font>|
| -------- | -------- | -------- |
| | <font size = 7> 2022 </font> | |
|<font size = 6> bases: 1373 Gbp / day</font>| |<font size = 6> 1 human genome / 3.39 minutes</font>|

---

## Data flow at NGI

<img src="https://hackmd.io/_uploads/rkvgNrrhj.jpg" width="90%"/>

[comment]: <NAS = Network Attached Storage>

---

## Bioinformatics at NGI
### 1) Primary QC of flowcells

* Did the flowcell/lane get enough reads?
* Is the average quality of all reads acceptable?
  * % of reads above Phred Q30
  * PhiX error rate below 2 %

![](https://hackmd.io/_uploads/r1x9Gy_2s.png) ![](https://hackmd.io/_uploads/BJH9zJu3s.png)

---

## Bioinformatics at NGI
### 2) Demultiplexing

<div style="text-align: left; float: left;"><font size=6>

* Did all samples get enough reads?
* Are there excessive amounts of undetermined reads?
* Are there valid indexes within the undetermined reads?

</font></div>

<span><img src="https://hackmd.io/_uploads/B1CDU1O3s.jpg"/></span>

---

## Bioinformatics at NGI
### 2) Demultiplexing

<img src="https://hackmd.io/_uploads/ry9UBvHhs.png" width="393"/>
<span><img src="https://hackmd.io/_uploads/H1_VUPH3o.png" width="500"/></span>
<span><img src="https://hackmd.io/_uploads/BkjM_PHho.png" width="810"/></span>

---

## Bioinformatics at NGI
### 3) QC reports by sample

&nbsp;

<div style="display: flex; justify-content: space-evenly; align-items:center;">
<div style="text-align: left; float: left;"><font size=5>

* Quality on sample level
  * % of reads above Phred Q30
* Contamination report (FastQ Screen)
  * mapping against most common species
* Summary of QC report in MultiQC

</font></div><img src="https://hackmd.io/_uploads/HJXD0eOho.jpg" width="35%">
</div>

---

## Bioinformatics at NGI
### 4) "Best Practice" Analysis

<div style="text-align: left; float: left;"><font size=6>

- Analysis to control for library preparation issues
- Specific to library preparation type
- First steps of data analysis for the data type
- NGI _cannot_ do project-specific analysis

</font></div>

<div style="text-align: left;"><font size=6>

- Use of Nextflow pipelines under nf-core
- Results are summarized using MultiQC

</font></div>

<a href="https://www.nextflow.io/"><img src="https://github.com/nextflow-io/trademark/raw/master/nextflow2014_no-bg-bright.png" width="40%"/></a><a href="https://www.nf-co.re"><img src="https://github.com/nf-core/logos/raw/master/nf-core-logos/nf-core-logo-square.svg" width="10%"></a><img src="https://github.com/ewels/MultiQC/raw/master/docs/images/MultiQC_logo_darkbg.png" width="30%"/>

---

## Bioinformatics at NGI
### 5) Generation of project reports

* Will contain:
  * General QC stats for the flowcell and each sample
  * Information on
    * Library prep
    * Sequencing setup
    * Accreditation status and deviations

---

## Bioinformatics at NGI
### 6) Deliveries
![](https://hackmd.io/_uploads/HJw9Xhwni.png)

<div style="display: flex; justify-content: space-evenly; align-items:center;">
<div style="text-align: left; float: left;"><font size=5>
<p data-markdown>- For sensitive data</p>
<p data-markdown>- Hosted by UPPMAX</p>
<p data-markdown>- Requires a SNIC account</p>
</font></div>
<div style="text-align: right; float: right;"><font size=5>
<p data-markdown>- (Currently) only for non-sensitive data</p>
<p data-markdown>- Hosted by SciLifeLab Data Centre</p>
<p data-markdown>- Email with access link sent to user</p>
</font></div></div>

---

<!-- .slide: data-background="https://dog.dnr.alaska.gov/resources/images/backgrounds/pipeline.jpg" -->

# Pipelines

---

## What is a pipeline?

<img src="https://hackmd.io/_uploads/BJ0vVftns.png" width="50%">

<font size=5>
<a href="https://doi.org/10.1038/s41592-021-01254-9">10.1038/s41592-021-01254-9</a>
</font>

---

## What is a workflow manager?

<img src="https://hackmd.io/_uploads/HyEpPfY3i.png" width="70%">

<font size=5>
<a href="https://doi.org/10.1038/s41592-021-01254-9">10.1038/s41592-021-01254-9</a>
</font>

---

### Some available workflow managers

<img src="https://hackmd.io/_uploads/S1rJYzY3o.png" width="90%">

<font size=5>
<a href="https://doi.org/10.1038/s41592-021-01254-9">10.1038/s41592-021-01254-9</a>
</font>

---

<a href="https://www.nf-co.re"><img src="https://github.com/nf-core/logos/raw/master/nf-core-logos/nf-core-logo-darkbg.svg" width="65%"></a>

---

## Reproducibility is central

<a href="https://academic.oup.com/view-large/figure/118918033/giy077fig1.jpg"><img src="https://maxulysse.github.io/assets/img/slides/gigascience_giy077_fig1.jpg" width="50%"></a>

<font size=5>
<a href="https://doi.org/10.1093/gigascience/giy077">10.1093/gigascience/giy077</a>
</font>

---

# What is nf-core?

> A community effort to collect a curated set of analysis pipelines built using Nextflow.

---

# What is Nextflow?
<a href="https://www.nextflow.io/"><img src="https://maxulysse.github.io/assets/img/slides/nextflow.png" width="50%"></a>

* Workflow manager
* Data-driven language
* Portable
  * executable on multiple platforms
* Shareable and reproducible
  * with containers or virtual environments

---

## Data-driven language

The execution graph depends on the input data and is computed on the fly.

In `snakemake` it's the other way around:

the execution graph depends on the final target and is computed before launch.

---

## Portability

[www.nextflow.io/docs/latest/executor.html](https://www.nextflow.io/docs/latest/executor.html)

- <i class="fa fa-server"></i> Sun Grid Engine, SLURM, PBS/Torque...
- <i class="fa fa-cloud"></i> AWS Batch, Kubernetes, Google Life Sciences

---

## Reproducibility

<a href="https://docs.conda.io/"><img src="https://maxulysse.github.io/assets/img/svg/conda_logo.svg" width="50%"></a> | <a href="https://www.docker.com/"><img src="https://maxulysse.github.io/assets/img/svg/docker_logo.svg" width="50%"></a> | <a href="https://sylabs.io/singularity/"><img src="https://maxulysse.github.io/assets/img/svg/singularity_logo.svg" width="50%"></a>
:-:|:-:|:-:

---

# What is nf-core: community

![](https://hackmd.io/_uploads/rkWJyefPo.png)

---

# What is nf-core: for users

![](https://hackmd.io/_uploads/Hk5vpJGDi.png)

---

# What is nf-core: for developers

![](https://hackmd.io/_uploads/BJAcR1Mws.png)

---

# What does nf-core provide

- **Pipelines**: ready-made pipelines [n=68]
- **Docs <i class="fa fa-globe"></i>**: Guidelines, tutorials, videos
- **Subworkflows <i class="fa fa-globe"></i>**: multi-tool wrappers [n=31]
- **Modules <i class="fa fa-globe"></i>**: single-tool wrappers [n=797]
- **Configs <i class="fa fa-globe"></i>**: shared infrastructure configs
- **Test datasets <i class="fa fa-globe"></i>**: test data for :point_up_2:
- **Tools <i class="fa fa-globe"></i>**: linting, template + automation for :point_up_2:

<i class="fa fa-globe"></i>
provided for the larger community

---

## Pipeline requirements

[<i class="fa fa-globe"></i> nf-co.re/docs/contributing/adding_pipelines](https://nf-co.re/docs/contributing/adding_pipelines)

- Nextflow based
- Common structure
- Stable release tags
- MIT license
- Software bundled for reproducibility
- Continuous Integration testing
- _lagom_

---

## Sarek

[<i class="fa fa-globe"></i> nf-co.re/sarek](https://nf-co.re/sarek)

- Based on GATK Best Practices
- Alignment, Variant Calling, Annotation
- SNPs, Indels, SVs, CNV, MSI...
- Germline, Somatic or Tumor only

---

<a href="https://nf-co.re/tools/"><img src="https://maxulysse.github.io/assets/img/svg/nf-core-tools_logo.svg" width="60%"></a>

---

## A companion tool

[<i class="fa fa-globe"></i> https://nf-co.re/tools](https://nf-co.re/tools)

- **[launch](https://nf-co.re/tools#launch-a-pipeline)** - with interactive prompts
- **[download](https://nf-co.re/tools#downloading-pipelines-for-offline-use)** - for offline use
- **[lint](https://nf-co.re/tools#linting-a-workflow)** - check code against guidelines
- **[modules](https://nf-co.re/tools/#modules)** - List, update, lint, create...
- **[subworkflows](https://nf-co.re/tools/#subworkflows)** - List, update, lint, create...
- ...

---

## Configurations

All pipelines come with a sensible default configuration for a regular-sized HPC (including UPPMAX)

---

## Configurations

[<i class="fa fa-github"></i> github.com/nf-core/configs](https://github.com/nf-core/configs/)

allows shared configurations between pipelines for a specific HPC

* cpus, time and memory requirements
* scheduler
* queues
* environments
* paths to common reference files
* ...
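
---

## Configurations: a sketch

A minimal, hypothetical example of what a shared cluster profile could look like, covering the items listed above. The queue name, resource ceilings and paths are illustrative placeholders, not taken from any real profile in nf-core/configs. `max_cpus`, `max_memory`, `max_time` and `igenomes_base` are assumed to be the parameter names exposed by the pipeline template in use, and they may differ between template versions.

```groovy
// Hypothetical institutional profile (all values are placeholders)
params {
    config_profile_description = 'Example HPC cluster (illustrative only)'

    // Upper bounds on cpus, memory and time for any single job
    max_cpus   = 16
    max_memory = 128.GB
    max_time   = 48.h

    // Path to common reference files
    igenomes_base = '/path/to/references/igenomes'
}

process {
    executor = 'slurm'   // scheduler
    queue    = 'core'    // queue / partition name
}

singularity {
    enabled  = true      // software environment
    cacheDir = '/path/to/singularity/cache'
}
```

Such a profile would typically be selected at launch with Nextflow's `-profile` option, alongside the pipeline's own parameters.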

---

## <i class="fa fa-laptop"></i> Training and other events

[<i class="fa fa-globe"></i> https://nf-co.re/events](https://nf-co.re/events)

<a href="https://nf-co.re/events/2020/hackathon-francis-crick-2020"><img src="https://maxulysse.github.io/assets/img/slides/nf-core_hackathon_crick2020.jpg" width="60%"></a>

[<i class="fa fa-globe"></i> nf-co.re/events/2023/training-march-2023](https://nf-co.re/events/2023/training-march-2023)

---

## Need help?

<!-- .slide: data-background="https://raw.githubusercontent.com/maxulysse/maxulysse.github.io/main/assets/img/svg/green_white_bg.svg" -->

Website: [`https://nf-co.re`](https://nf-co.re)

Chat: [`https://nf-co.re/join`](https://nf-co.re/join) <img src="https://cdn.brandfolder.io/5H442O3W/at/pl546j-7le8zk-6gwiyo/Slack_Mark.svg" width=7.5%>

<div style="margin-top:0.1em">&nbsp;</div>

<p align="center">
Follow nf-core on
<a href="https://www.twitter.com/nf_core"><img src="https://openmoji.org/data/color/svg/E040.svg" width=6%></a>
<a href="https://mstdn.science/@nf_core"><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Mastodon_Logotype_%28Simple%29.svg/216px-Mastodon_Logotype_%28Simple%29.svg.png" width=5%></a>
<a href="https://github.com/nf-core"><img src="https://openmoji.org/data/color/svg/E045.svg" width=6%></a>
<a href="https://www.youtube.com/c/nf-core"><img src="https://openmoji.org/data/color/svg/E044.svg" width=6%></a>
</p>

<a href="https://nf-co.re/" style="color: #000000; font-family:Monaco, monospace; font-weight:bold;">https://nf-co.re/</a>

<div style="display: flex; justify-content: space-evenly; align-items:center;">
<img src="https://chanzuckerberg.com/wp-content/themes/czi/img/logo.svg" width=15%>
<div style="font-style:italic; font-size: 0.5em; color: #666;">Icons:<br><a href="https://openmoji.org">openmoji.org</a></div></div>

<style>
body {
  background-image: url(https://raw.githubusercontent.com/nf-core/logos/master/nf-core-logos/nf-core-logo-square.svg);
  background-size:
7.5%; background-repeat: no-repeat; background-position: 3% 96%; background-color: #181a1b; } .reveal body { font-family: 'Roboto', sans-serif; font-weight: 300; color: white; } .reveal p { font-family: 'Roboto', sans-serif; font-weight: 300; color: white; } .reveal h1 { font-family: 'Roboto', sans-serif; font-style: bold; font-weight: 400; color: white; font-size: 62px; } .reveal h2 { font-family: 'Roboto', sans-serif; font-weight: 300; color: white; } .reveal h3 { font-family: 'Roboto', sans-serif; font-style: italic; font-weight: 300; color: white; } .reveal p { font-family: 'Roboto', sans-serif; font-weight: 300; color: white; } .reveal li { font-family: 'Roboto', sans-serif; font-weight: 300; color: white; } .reveal pre { background-color: #272822 !important; display: inline-block; border-radius: 7px; color: #aaaba9; } .reveal pre code { color: #eeeeee; background-color: #272822; font-size: 100%; } .reveal code { background-color: #272822; font-size: 75%; } .reveal .progress { color: #24B064; } .reveal .controls button { color: #24B064; } .reveal blockquote { display: block; position: relative; width: 90%; margin: 20px auto; padding: 5px; background: rgba(255, 255, 255, 0.05); box-shadow: 0px 0px 2px rgb(0 0 0 / 20%); } </style>
<!-- Title: Introduction to bioinformatics using NGS Data - NBIS course -->