---
tags: pipeline
---
# sarek annotation
## How to customise snpeff and vep annotation
### Using the nf-core containers with pre-downloaded cache
All is already configured within the [igenomes.config](https://github.com/nf-core/sarek/blob/master/conf/igenomes.config) file, so nothing to be done there.
Note: These containers are only created for some species and some cache/tools versions combinations (cf DockerHub tags for these containers [`nfcore/snpeff`](https://hub.docker.com/r/nfcore/snpeff/tags) and [`nfcore/vep`](https://hub.docker.com/r/nfcore/vep/tags).
These containers can be quite huge especially for human, it is recommended to use annotation cache on a path if possible
### Create containers with pre-downloaded cache
For each tool, an helper script `build.sh` can be found at the root of the tool folder in the nf-core module repo ([snpeff](https://github.com/nf-core/modules/tree/master/modules/nf-core/snpeff) and [ensemblvep](https://github.com/nf-core/modules/tree/master/modules/nf-core/ensemblvep)), and can be adapted for your usage.
### Use Sarek to download cache and annotate in one go
Use the params `--download_cache`, and specify with `--tools` for which annotation tool you need to download the cache (`snpeff` and or `vep`)
Sarek will automatically download the cache, use the biocontainers container for said tools, and use it to annotate any vcfs produced.
### Only download cache
Using the params `--build_only_index` allow for only downloading the cache for the specified tools.
### Location for the cache
Cache can be downloaded in the specified `--outdir_cache` location.
Else, it will be downloaded in `cache/` in the specified `--outdir` location.
To download cache on a cloud infrastructure, an absolute path is needed.
Params `--snpeff_cache` and `--vep_cache` are to used to specify the locations to the root of the annotation cache folder.
For example this is what can be seen when cache has been downloaded for `GATK.GRCh38` and `WBcell235` for both tools using the default values in the [igenomes.config](https://github.com/nf-core/sarek/blob/master/conf/igenomes.config) file:
```
ls /data/snpeff_cache /data/vep_cache/*
/data/snpeff_cache:
GRCh38.105
WBcel235.105
/data/vep_cache/caenorhabditis_elegans:
106_WBcel235
/data/vep_cache/homo_sapiens:
106_GRCh38
```
### Change cache version and species
By default all is specified in the [igenomes.config](https://github.com/nf-core/sarek/blob/master/conf/igenomes.config) file.
Explanation can be found for all params in the documentation:
- [snpeff_db](https://nf-co.re/sarek/latest/parameters#snpeff_db)
- [snpeff_genome](https://nf-co.re/sarek/latest/parameters#snpeff_genome)
- [vep_genome](https://nf-co.re/sarek/latest/parameters#vep_genome)
- [vep_species](https://nf-co.re/sarek/latest/parameters#vep_species)
- [vep_cache_version](https://nf-co.re/sarek/latest/parameters#vep_cache_version)
With the previous example of `GRCh38`, these are the values that were used for these params:
```
snpeff_db = 'GRCh38.105'
snpeff_genome = 'GRCh38'
vep_genome = 'GRCh38'
vep_species = 'homo_sapiens'
vep_cache_version = '106'
```
### Usage recommendation with AWS iGenomes
Annotation cache is a resource separated from AWS iGenomes, which as its own structure and a frequent update cycle.
So it is not recommended to put any annotation cache in your local AWS iGenomes folder.
A classical organisation could be:
```
/data/igenomes/
/data/cache/ensemblvep
/data/cache/snpeff
```
which can then be used this way in sarek:
```
nextflow run nf-core/sarek \\
--igenomes_base /data/igenomes/ \\
--snpeff_cache /data/cache/snpeff/ \\
--vep_cache /data/cache/ensemblvep/ \\
...
```
Or similarly on the cloud:
```
s3://data/igenomes/
s3://data/cache/ensemblvep
s3://data/cache/snpeff
```
which can then be used this way in sarek:
```
nextflow run nf-core/sarek \\
--igenomes_base s3://data/igenomes/ \\
--snpeff_cache s3://data/cache/snpeff/ \\
--vep_cache s3://data/cache/ensemblvep/ \\
...
```
These params can be specified in a config file or in a profile using the params scope, or even in a json or a yaml file using the `-params-file` nextflow option.