A fast introduction to snakemake (in 5 examples)

# A rapid introduction to snakemake (in 5 examples)

See github repo: [ctb/2026-rapid-snakemake-intro](https://github.com/ctb/2026-rapid-snakemake-intro)

---

## 1. Compress a file

```
rule compress_file:
    shell:
        """
        gzip -c example.txt > example.txt.gz
        """
```

Put this in `Snakefile`; run with `snakemake -j 1`.

---

## 2. Compress a file _once_

```
rule compress_file:
    output: "example.txt.gz"
    shell:
        """
        gzip -c example.txt > example.txt.gz
        """
```

Run with `snakemake -j 1`.

---

## 3. Compress _any_ .txt file

```
rule compress_file:
    input: '{name}.txt'
    output: '{name}.txt.gz'
    shell: 'gzip -c {input} > {output}'
```

Run:

```
snakemake -j 1 example.txt.gz # NOTE: filename req'd!
```

---

## 4. Compress _all_ .txt files (including in subdirs)

```
MATCHES = glob_wildcards('{name}.txt')

rule all:
    input:
        expand('{name}.txt.gz', name=MATCHES.name)

rule compress_file:
    input: '{name}.txt'
    output: '{name}.txt.gz'
    shell: 'gzip -c {input} > {output}'
```

Run:

```
snakemake -j 4 ## NOTE: uses up to 4 CPUs; no filename req'd
```

---

## 5. Multistage: compress & do something else

```
MATCHES = glob_wildcards('{name}.txt')

rule all:
    input:
        expand('{name}.info', name=MATCHES.name)

rule compress_file:
    input: '{x}.txt'
    output: '{x}.txt.gz'
    shell: 'gzip -c {input} > {output}'

rule info:
    input: '{y}.txt.gz'
    output: '{y}.info'
    shell: 'gzip -lv {input} > {output}'
```

---

Points to make:

* snakemake basically just builds shell commands and runs them in a particular order.
* the first rule is the default rule, but you can give a rule name to run, or a filename to make, and it will run that instead.
* it works hand in hand with consistent naming schemes...
* it tries to avoid rerunning jobs unnecessarily.
* it's really good at parallelizing jobs!
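
To make the first and fourth points concrete, here is a toy Python sketch of the idea. It is not how snakemake is implemented. It matches a requested filename against an output pattern, fills in the wildcards to build the shell command, and uses a simple timestamp check to decide whether to run. The names `match_rule`, `plan_command`, `needs_run`, and `RULE` are hypothetical, each wildcard is assumed to appear only once per pattern, and real snakemake considers more than file timestamps when deciding to rerun.

```python
import re
from pathlib import Path

# Hypothetical stand-in for the compress_file rule from example 3.
RULE = {
    "input": "{name}.txt",
    "output": "{name}.txt.gz",
    "shell": "gzip -c {input} > {output}",
}


def match_rule(target, output_pattern):
    """Return the wildcard values if target fits output_pattern, else None."""
    # Splitting on {wildcard} leaves literal text at even indices
    # and wildcard names at odd indices.
    parts = re.split(r"\{(\w+)\}", output_pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            # '.+' also matches '/', so files in subdirectories match too.
            regex += f"(?P<{part}>.+)"
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None


def plan_command(target, rule):
    """Build the shell command that would create target, or None if no match."""
    wildcards = match_rule(target, rule["output"])
    if wildcards is None:
        return None
    source = rule["input"].format(**wildcards)
    return rule["shell"].format(input=source, output=target)


def needs_run(source, target):
    """Simplified check: run if target is missing or older than source."""
    src, dst = Path(source), Path(target)
    return not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime


if __name__ == "__main__":
    print(plan_command("example.txt.gz", RULE))
    # -> gzip -c example.txt > example.txt.gz
```

Asking for `example.txt.gz` is what lets the pattern `{name}.txt.gz` resolve `name` to `example`, which is why example 3 needs a filename on the command line.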
