---
tags: tutorials
---
[TOC]
# Sourmash GGG298 Tutorial:
## Putting together everything we learned!
### 1. Install packages with Conda
First, create a conda environment, then install `snakemake-minimal` and `sourmash` into it:
```
conda create -y --name lab9_ggg298 -c conda-forge -c bioconda
conda activate lab9_ggg298
conda install -c bioconda snakemake-minimal
conda install -c bioconda sourmash
sourmash
```
Running `sourmash` with no arguments prints its help menu, which confirms that the install worked.
The environment info can be saved to an `export.yml` file:
`conda env export > export.yml`
The export lists every dependency, so we delete all of the dependency lines that aren't sourmash or snakemake. The file then contains:
```
name: lab9_ggg298
channels:
- r
- conda-forge
- bioconda
- defaults
dependencies:
- sourmash=3.2.2=py37h516909a_0
- snakemake-minimal=5.10.0=py_0
prefix: /home/hehouts/miniconda3/envs/lab9_ggg298
```
Then we can run
`conda env create -f export.yml -n <new name>` to create a new environment from that file.
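Editing the export by hand is easy to get wrong, so here is a minimal Python sketch of the same trimming step. The helper name `trim_export` and the entry `some-other-package` are made up for illustration; the sketch assumes the simple layout shown above (top-level keys with `- item` lines beneath them) and is not a general YAML parser.

```python
# Hypothetical helper: keep only the packages we installed on purpose
# in the text of a `conda env export` file.
KEEP = ("sourmash", "snakemake-minimal")


def trim_export(text, keep=KEEP):
    """Return the export text with only the `keep` packages under dependencies."""
    kept_lines = []
    in_dependencies = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # drop blank lines
        if not line.startswith((" ", "-")):
            # A top-level key such as "name:", "channels:" or "dependencies:".
            in_dependencies = stripped == "dependencies:"
            kept_lines.append(line)
        elif in_dependencies:
            # Entries look like "- sourmash=3.2.2=py37h516909a_0".
            package = stripped.lstrip("- ").split("=")[0]
            if package in keep:
                kept_lines.append(line)
        else:
            # Items under other keys (for example channels) are kept as-is.
            kept_lines.append(line)
    return "\n".join(kept_lines) + "\n"


# "some-other-package" is an invented entry standing in for the extra dependencies.
example = """name: lab9_ggg298
channels:
  - conda-forge
  - bioconda
dependencies:
  - some-other-package=1.0=0
  - sourmash=3.2.2=py37h516909a_0
  - snakemake-minimal=5.10.0=py_0
prefix: /home/hehouts/miniconda3/envs/lab9_ggg298
"""

print(trim_export(example))
```

To use it on a real file, read `export.yml` into a string, pass it through `trim_export`, and write the result back out.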
### 2. Github
Created a GitHub repo called "lab9_ggg298" and a GitHub project called "Sourmash_lab9_ggg298", then cloned the repo:
```
git clone https://github.com/hehouts/lab9_ggg298
```
Git warns that an empty repository was cloned, which is expected for a brand-new repo.
### 3. Snakemake
Now we are going to make a snakefile!
`nano -ET4 Snakefile`
The first rule we are writing is to download files:
```
rule download_data:
    output:
        "data_files/1.fa.gz",
        "data_files/2.fa.gz",
        "data_files/3.fa.gz",
        "data_files/4.fa.gz",
        "data_files/5.fa.gz"
    shell: """
        wget https://osf.io/t5bu6/download -O data_files/1.fa.gz
        wget https://osf.io/ztqx3/download -O data_files/2.fa.gz
        wget https://osf.io/w4ber/download -O data_files/3.fa.gz
        wget https://osf.io/dnyzp/download -O data_files/4.fa.gz
        wget https://osf.io/ajvqk/download -O data_files/5.fa.gz
    """
```
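The five `wget` lines differ only in the URL and the sample number, so the mapping can be kept in one place. Below is a small Python sketch of that idea; the names `URLS` and `wget_command` are made up for illustration, and the URLs are copied from the rule above.

```python
# Sample ID -> OSF download URL, copied from the wget lines in the rule.
URLS = {
    "1": "https://osf.io/t5bu6/download",
    "2": "https://osf.io/ztqx3/download",
    "3": "https://osf.io/w4ber/download",
    "4": "https://osf.io/dnyzp/download",
    "5": "https://osf.io/ajvqk/download",
}


def wget_command(sample, outdir="data_files"):
    """Build the wget command line that downloads one sample."""
    return f"wget {URLS[sample]} -O {outdir}/{sample}.fa.gz"


for sample in sorted(URLS):
    print(wget_command(sample))
```

Because a Snakefile is Python underneath, a dictionary like this could also sit at the top of the Snakefile, though the rule above works fine as written.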
### 4. Sourmash
#### compute
Run `sourmash compute` on the gzipped genomes:
```
rule sourmash_compute_sigs:
    input: "data_files/{sample}.fa.gz"
    output: "sourmash_sigs/{sample}.fa.gz.sig"
    shell: """
        sourmash compute -k 31 {input} -o {output}
    """
```
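To see what the `{sample}` wildcard does, here is a simplified Python imitation of the matching step. It is only a sketch under the assumption that a wildcard matches any non-empty text; the function name `match_wildcards` is made up, and this is not Snakemake's actual implementation.

```python
import re


def match_wildcards(pattern, filename):
    """Return the wildcard values if `filename` fits `pattern`, else None."""
    # Split into literal text (even positions) and wildcard names (odd positions).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, filename)
    return match.groupdict() if match else None


# Asking for one signature file fixes the value of {sample} ...
wildcards = match_wildcards("sourmash_sigs/{sample}.fa.gz.sig",
                            "sourmash_sigs/3.fa.gz.sig")
print(wildcards)

# ... and that value is then filled into the input pattern.
print("data_files/{sample}.fa.gz".format(**wildcards))
```

The point is that the wildcard is inferred from the requested output file, and the same value is then substituted into the input.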
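Snakemake builds the first rule in the file by default, so we also need a `rule all` at the top that lists the final signature files: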
```
# list out sample IDs
SAMPLES = ['1', '2', '3', '4', '5']

rule all:
    input:
        # create a new filename for every entry in SAMPLES,
        # replacing {name} with each entry.
        expand("sourmash_sigs/{name}.fa.gz.sig", name=SAMPLES)

# this replaces sourmash_compute_sigs above; keep only one of the two,
# since both would produce the same output files
rule sourmash_compute:
    input: "data_files/{sample}.fa.gz"
    output: "sourmash_sigs/{sample}.fa.gz.sig"
    shell:
        "sourmash compute -k 31 {input} -o {output}"
```
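A quick way to check which filenames `expand` will ask for is to imitate it in plain Python before running Snakemake. The helper `expand_one` below is made up and handles only a single wildcard; it is a sketch of the idea, not Snakemake's own `expand`.

```python
def expand_one(pattern, name, values):
    """Fill the {name} placeholder in `pattern` once for each value."""
    return [pattern.replace("{" + name + "}", value) for value in values]


sample_ids = ["1", "2", "3", "4", "5"]
for target in expand_one("sourmash_sigs/{name}.fa.gz.sig", "name", sample_ids):
    print(target)
```

Whatever the entries in the list contain is pasted into the pattern verbatim. If the entries already include the directory and the `.fa.gz` extension, the pattern should be just `{name}.sig`, otherwise the extension ends up in the filename twice.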
#### compare
```
rule sourmash_compare:
    input:
        "sourmash_sigs/1.fa.gz.sig",
        "sourmash_sigs/2.fa.gz.sig",
        "sourmash_sigs/3.fa.gz.sig",
        "sourmash_sigs/4.fa.gz.sig",
        "sourmash_sigs/5.fa.gz.sig"
    output:
        cmp="sm_compare_all.cmp",
        csv="sm_compare_all.csv"
    shell: """
        sourmash compare {input} -o {output.cmp} --csv {output.csv}
    """
```
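For intuition about what the comparison matrix holds, here is a toy Python sketch of pairwise Jaccard similarity between k-mer sets. It rests on several assumptions: the sequences are invented, k is 3 instead of 31 so the toy data can overlap, and full k-mer sets are compared directly, whereas sourmash estimates similarity from MinHash sketches of the signatures.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Shared k-mers divided by all k-mers seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented sequences standing in for the five genomes.
TOY_GENOMES = {
    "1": "ACGTACGTGACC",
    "2": "ACGTACGTGATT",
    "3": "TTGACCGTACGA",
}

names = sorted(TOY_GENOMES)
sets = {name: kmers(TOY_GENOMES[name], k=3) for name in names}
matrix = [[jaccard(sets[a], sets[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

Like the real output, the matrix is symmetric and has 1.0 on the diagonal, since every sample is identical to itself.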
Actions:
Questions:
What does the label with a number in it mean in some conda install commands?
e.g. the `cf201901` in `conda install -c bioconda/label/cf201901 sourmash`
Thoughts: