---
title: nf-core/sarek a Best-Practices community pipeline for WGS/WES
tags: nf-core, sarek, GenPhySE
type: slide
---
<!-- .slide: data-background="https://raw.githubusercontent.com/maxulysse/maxulysse.github.io/main/assets/img/svg/green_white_bg.svg" -->
<a href="https://www.nf-co.re/sarek"><img src="https://raw.githubusercontent.com/nf-core/sarek/master/docs/images/logos/nf-core_sarek/nf-core_sarek_dark_color.svg" width="65%"></a>
#### A Best-Practices community pipeline for WGS/WES
<small>
Maxime U Garcia ▸ [<i class="fa fa-github"></i>@maxulysse](https://github.com/maxulysse/)
[Seqera Labs](https://seqera.io/) ▸ Barcelona | Stockholm
2023-03-06@[Genphyse](https://genphyse.toulouse.inra.fr/seminars/2023/maxime-garcia)
</small>
---
## Reproducibility is central
<a href="https://academic.oup.com/view-large/figure/118918033/giy077fig1.jpg"><img src="https://maxulysse.github.io/assets/img/slides/gigascience_giy077_fig1.jpg" width="50%"></a>
<font size=5>
<a href="https://doi.org/10.1093/gigascience/giy077">10.1093/gigascience/giy077</a>
</font>
---
## What is Sarek?
<!-- .slide: data-background="https://maxulysse.github.io/assets/img/background/Sarek-beer.jpg" data-background-opacity="0.5" -->
A [National Park](https://www.sverigesnationalparker.se/en/choose-park---list/sarek-national-park/) in Northern Sweden.
<small>
* Long, deep, narrow valleys and wild, turbulent water
* A tortuous delta landscape
* Completely lacking in comfortable accommodations
* One of Sweden’s most inaccessible national parks
* There are no roads leading up to the national park
</small>
---
<!-- .slide: data-background="https://maxulysse.github.io/assets/img/background/Sarek-Park-02.jpg" -->
---
<!-- .slide: data-background="https://maxulysse.github.io/assets/img/background/Sarek-Park-02.jpg" data-background-opacity="0.5" -->
## Explore strange new worlds
<div class="multi-column">
<div>
<small>
* [Sarek](https://github.com/SciLifeLab/Sarek/releases/tag/2.0.0)
* [Ruotes](https://github.com/SciLifeLab/Sarek/releases/tag/2.1.0)
* [Skårki](https://github.com/SciLifeLab/Sarek/releases/tag/2.2.0)
* [Äpar](https://github.com/SciLifeLab/Sarek/releases/tag/2.3)
</small>
</div>
<div>
<small>
* [Ålkatj](https://github.com/nf-core/sarek/releases/tag/2.5)
* [Årjep-Ålkatjjekna](https://github.com/nf-core/sarek/releases/tag/2.5.1)
* [Jåkkåtjkaskajekna](https://github.com/nf-core/sarek/releases/tag/2.5.2)
* [Piellorieppe](https://github.com/nf-core/sarek/releases/tag/2.6)
* [Gådokgaskatjåhkkå](https://github.com/nf-core/sarek/releases/tag/2.6.1)
* [Pårte](https://github.com/nf-core/sarek/releases/tag/2.7)
* [Pårtejekna](https://github.com/nf-core/sarek/releases/tag/2.7.1)
* [Áhkká](https://github.com/nf-core/sarek/releases/tag/2.7.2)
</small>
</div>
<div>
<small>
* [Skierfe](https://github.com/nf-core/sarek/releases/tag/3.0)
* [Saiva](https://github.com/nf-core/sarek/releases/tag/3.0.1)
* [Lájtávrre](https://github.com/nf-core/sarek/releases/tag/3.0.2)
* [Rapaätno](https://github.com/nf-core/sarek/releases/tag/3.1)
* [Lilla Luleälven](https://github.com/nf-core/sarek/releases/tag/3.1.1)
* [Lesser Lule River](https://github.com/nf-core/sarek/releases/tag/3.1.2)
</small>
</div>
</div>
---
[nf-core/sarek](https://nf-co.re/sarek)
* Open-Source Nextflow Pipeline
* Community driven
---
[Nextflow](https://www.nextflow.io/)
* Workflow manager
* Data driven language
* Portable
  * executable on multiple platforms
* Shareable and reproducible
  * with containers or virtual environments
---
## Data driven language
The execution graph depends on the input data
and is resolved on the fly, as data arrives.
In `snakemake`, it is the other way around:
the execution graph depends on the final target
and is computed before launch.
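
The two models contrasted above can be illustrated with a toy scheduler. This is a minimal sketch in plain Python, not Nextflow or Snakemake code: the function names, the `steps` and `rules` dictionaries and the sample names are all hypothetical and only serve to show the difference between push (data driven) and pull (target driven) execution.

```python
from collections import deque

def run_data_driven(samples, steps):
    """Push model: tasks are created as data arrives on the queue."""
    executed = []
    queue = deque(("fastq", sample) for sample in samples)
    while queue:
        kind, sample = queue.popleft()
        # Any step consuming this kind of data fires and emits new data
        for name, (consumes, produces) in steps.items():
            if consumes == kind:
                executed.append(f"{name}({sample})")
                queue.append((produces, sample))
    return executed

def plan_target_driven(target, rules):
    """Pull model: the full plan is resolved from the target before anything runs."""
    plan = []
    def visit(name):
        # Dependencies are planned first, then the file that needs them
        for dependency in rules[name]:
            visit(dependency)
        if name not in plan:
            plan.append(name)
    visit(target)
    return plan

# Hypothetical two-step workflow: map reads, then call variants
steps = {"map": ("fastq", "bam"), "call": ("bam", "vcf")}
print(run_data_driven(["s1", "s2"], steps))
# ['map(s1)', 'map(s2)', 'call(s1)', 'call(s2)']

# Hypothetical dependency rules: each file lists what it is built from
rules = {"s1.vcf": ["s1.bam"], "s1.bam": ["s1.fastq"], "s1.fastq": []}
print(plan_target_driven("s1.vcf", rules))
# ['s1.fastq', 's1.bam', 's1.vcf']
```

In the first function the number of tasks is only known once every sample has flowed through; in the second the whole plan exists before any task starts.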
---
## Portability
[www.nextflow.io/docs/latest/executor.html](https://www.nextflow.io/docs/latest/executor.html)
* <i class="fa fa-server"></i> Sun Grid Engine, SLURM, PBS/Torque...
* <i class="fa fa-cloud"></i> AWS, Azure, Google...
---
## Reproducibility
<div class="multi-column">
<div>
[Bioconda](https://bioconda.github.io/)
</div>
<div>
[Docker](https://www.docker.com/)
</div>
<div>
[Singularity](https://sylabs.io/singularity/)
</div>
</div>
---
[nf-core/sarek](https://nf-co.re/sarek)
<div class="multi-column">
<div>
<div>Started at</div>
* [NGI](https://ngisweden.scilifelab.se/)
* [NBIS](https://nbis.se/)
* [BTB](https://ki.se/forskning/barntumorbanken)
</div>
<div>
<div>Continued at</div>
* [QBiC](https://www.qbic.uni-tuebingen.de/)
* [DNGC](https://eng.ngc.dk/)
* [GHGA](https://www.ghga.de/)
* [Seqera Labs](https://seqera.io/)
</div>
</div>
---
<!-- .slide: data-background="https://maxulysse.github.io/assets/img/background/Sarek-Park-02.jpg" data-background-opacity="0.5" -->
## Multiple flavors

<div class="multi-column">
<div>

</div>
<div>

</div>
</div>
---
<!-- .slide: data-background="https://maxulysse.github.io/assets/img/background/Sarek-Park-02.jpg" data-background-opacity="0.7" -->
## Multiple modes

---
## Reference genomes
* Currently using [AWS iGenomes](https://registry.opendata.aws/aws-igenomes/)
* Considering [refgenie](http://refgenie.databio.org/en/latest/)
---
## Preprocessing
[GATK](https://software.broadinstitute.org/gatk/best-practices/)
Based on GATK Best Practices (GATK 4.3.0.0)
* Reads mapped to reference genome with `bwa mem`
* Duplicates marked with `picard MarkDuplicates`
* Base quality scores recalibrated with `GATK BaseRecalibrator`
---
## Germline Variant Calling
<small>
* SNVs and small indels
  * DeepVariant
  * Freebayes
  * HaplotypeCaller
  * mpileup
  * Strelka2
* Structural variants
  * Manta
  * TIDDIT
* CNVs
  * CNVkit
</small>
---
## Somatic Variant Calling
<small>
* SNVs and small indels
  * Freebayes
  * Mutect2
  * Strelka2
* Structural variants
  * Manta
  * TIDDIT
* Sample heterogeneity, ploidy and CNVs
  * ASCAT
  * CNVkit
  * Control-FREEC
* Microsatellite instability
  * MSIsensor-pro
</small>
---
## Annotation
<small>
* Ensembl-VEP and SnpEff
* ClinVar, COSMIC, dbSNP, GENCODE, gnomAD, PolyPhen, SIFT, etc.
</small>
---
## Reports
[MultiQC](https://multiqc.info/)
---
[nf-core/sarek 3.1.1](https://github.com/nf-core/sarek/releases/tag/3.1.1)
---
## Sarek in Nextflow Tower
[cloud.tower.nf](https://cloud.tower.nf/)
---
[nf-core](https://nf-co.re)
> A community effort to collect a curated set of analysis pipelines built using Nextflow.
---
# Community

---
# Users

---
# Developers

---
# What does nf-core provide?
* **Pipelines**: ready-made pipelines
* **Docs <i class="fa fa-globe"></i>**: Guidelines, tutorials, videos
* **Modules <i class="fa fa-globe"></i>**: single-tool wrappers
* **Subworkflows <i class="fa fa-globe"></i>**: multi-tool wrappers
* **Configs <i class="fa fa-globe"></i>**: shared infrastructure configs
* **Test datasets <i class="fa fa-globe"></i>**: test data for :point_up_2:
* **Tools <i class="fa fa-globe"></i>**: linting, template + automation for :point_up_2:
---
## Pipeline requirements
[<i class="fa fa-globe"></i> nf-co.re/docs/contributing/adding_pipelines](https://nf-co.re/docs/contributing/adding_pipelines)
* Nextflow based
* Common structure
* Stable release tags
* MIT license
* Software bundled for reproducibility
* Continuous Integration testing
* _lagom_
---
## <i class="fa fa-laptop"></i> Training and other events
[<i class="fa fa-globe"></i> https://nf-co.re/events](https://nf-co.re/events)
<a href="https://nf-co.re/events/2020/hackathon-francis-crick-2020"><img src="https://maxulysse.github.io/assets/img/slides/nf-core_hackathon_crick2020.jpg" width="60%"></a>
<small>
* [<i class="fa fa-globe"></i> nf-co.re/events/2023/training-march-2023](https://nf-co.re/events/2023/training-march-2023)
* [<i class="fa fa-globe"></i> nf-co.re/events/2023/hackathon-march-2023](https://nf-co.re/events/2023/hackathon-march-2023)
</small>
---
## Need help?
<!-- .slide: data-background="https://raw.githubusercontent.com/maxulysse/maxulysse.github.io/main/assets/img/svg/green_white_bg.svg" -->
Website: [`https://nf-co.re`](https://nf-co.re)
Chat: [`https://nf-co.re/join`](https://nf-co.re/join) <img src="https://cdn.brandfolder.io/5H442O3W/at/pl546j-7le8zk-6gwiyo/Slack_Mark.svg" width="7.5%">
<div style="margin-top:0.1em"> </div>
[<i class="fa fa-twitter fa-2x"></i>](https://www.twitter.com/nf_core) <a href="https://mstdn.science/@nf_core"><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Mastodon_Logotype_%28Simple%29.svg/216px-Mastodon_Logotype_%28Simple%29.svg.png" width="7%"></a> [<i class="fa fa-github fa-2x"></i>](https://github.com/nf-core) [<i class="fa fa-youtube fa-2x"></i>](https://www.youtube.com/c/nf-core)
<a href="https://nf-co.re/" style="color: #000000; font-family:Monaco, monospace; font-weight:bold;">https://nf-co.re/</a>
<img src="https://chanzuckerberg.com/wp-content/themes/czi/img/logo.svg">
<style>
body {
background-image: url(https://raw.githubusercontent.com/nf-core/logos/master/nf-core-logos/nf-core-logo-square.svg);
background-size: 7.5%;
background-repeat: no-repeat;
background-position: 3% 96%;
background-color: #181a1b;
}
.reveal body {
font-family: 'Roboto', sans-serif;
font-weight: 300;
color: white;
}
.reveal p {
font-family: 'Roboto', sans-serif;
font-weight: 300;
color: white;
}
.reveal h1 {
font-family: 'Roboto', sans-serif;
font-weight: 400;
color: white;
font-size: 62px;
}
.reveal h2 {
font-family: 'Roboto', sans-serif;
font-weight: 300;
color: white;
}
.reveal h3 {
font-family: 'Roboto', sans-serif;
font-style: italic;
font-weight: 300;
color: white;
}
.reveal li {
font-family: 'Roboto', sans-serif;
font-weight: 300;
color: white;
}
.reveal pre {
background-color: #272822 !important;
display: inline-block;
border-radius: 7px;
color: #aaaba9;
}
.reveal pre code {
color: #eeeeee;
background-color: #272822;
font-size: 100%;
}
.reveal code {
background-color: #272822;
font-size: 75%;
}
.reveal .progress {
color: #24B064;
}
.reveal .controls button {
color: #24B064;
}
.reveal blockquote {
display: block;
position: relative;
width: 90%;
margin: 20px auto;
padding: 5px;
background: rgba(255, 255, 255, 0.05);
box-shadow: 0px 0px 2px rgb(0 0 0 / 20%);
}
.multi-column{
display: grid;
grid-auto-flow: column;
}
</style>