---
tags: tutorials
---

[TOC]

# Sourmash GGG298 Tutorial:

## Putting together everything we learned!

### 1. Install packages with Conda

Make a conda environment, then install snakemake-minimal and sourmash:

```
conda create -y --name lab9_ggg298 -c conda-forge -c bioconda
conda activate lab9_ggg298
conda install -c bioconda snakemake-minimal
conda install -c bioconda sourmash
sourmash
```

Running `sourmash` with no arguments returns the help menu.

An `export.yml` file can be created that stores the environment info:

`conda env export > export.yml`

but we should edit out all of the dependency lines in that file that aren’t sourmash or snakemake, so the contents are:

```
name: lab9_ggg298
channels:
  - r
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - sourmash=3.2.2=py37h516909a_0
  - snakemake-minimal=5.10.0=py_0
prefix: /home/hehouts/miniconda3/envs/lab9_ggg298
```

Then we can use `conda env create -f export.yml -n <new name>` to create a new environment from that file.

### 2. Github

Made a GitHub repo called "lab9_ggg298".

Made a GitHub project called "Sourmash_lab9_ggg298".

```
git clone https://github.com/hehouts/lab9_ggg298
```

This throws the warning that an empty repository was cloned.

### 3. Snakemake

Now we are going to make a Snakefile!

`nano -ET4 Snakefile`

The first rule we are writing downloads the data files:

```
rule download_data:
    output:
        "data_files/1.fa.gz",
        "data_files/2.fa.gz",
        "data_files/3.fa.gz",
        "data_files/4.fa.gz",
        "data_files/5.fa.gz"
    shell: """
        wget https://osf.io/t5bu6/download -O data_files/1.fa.gz
        wget https://osf.io/ztqx3/download -O data_files/2.fa.gz
        wget https://osf.io/w4ber/download -O data_files/3.fa.gz
        wget https://osf.io/dnyzp/download -O data_files/4.fa.gz
        wget https://osf.io/ajvqk/download -O data_files/5.fa.gz
    """
```

### 4. Sourmash

#### compute

Run `sourmash compute` on the gzipped genomes. The shell command reads from `{input}` and writes to `{output}` so the signature lands where the rule says it will:

```
rule sourmash_compute_sigs:
    input:
        "data_files/{sample}.fa.gz"
    output:
        "sourmash_sigs/{sample}.fa.gz.sig"
    shell: """
        sourmash compute -k 31 {input} -o {output}
    """
```

But we will also need a `rule all`, so:

```
# list out samples
SAMPLES=['data_files/1.fa.gz',
         'data_files/2.fa.gz',
         'data_files/3.fa.gz',
         'data_files/4.fa.gz',
         'data_files/5.fa.gz']

rule all:
    input:
        # create a new filename for every entry in SAMPLES,
        # replacing {name} with each entry. The entries already
        # end in .fa.gz, so only .sig is appended.
        expand("{name}.sig", name=SAMPLES),

rule sourmash_compute:
    output: "{sample}.sig"
    input: "{sample}"
    shell:
        "sourmash compute -k 31 {wildcards.sample} -o {output}"
```

#### compare

```
rule sourmash_compare:
    input:
        "sourmash_sigs/1.fa.gz.sig",
        "sourmash_sigs/2.fa.gz.sig",
        "sourmash_sigs/3.fa.gz.sig",
        "sourmash_sigs/4.fa.gz.sig",
        "sourmash_sigs/5.fa.gz.sig"
    output:
        cmp="sm_compare_all.cmp",
        csv="sm_compare_all.csv"
    shell: """
        sourmash compare {input} -o {output.cmp} --csv {output.csv}
    """
```

Actions:

Questions: what is the number on conda installs? e.g. `conda install -c bioconda/label/cf201901 sourmash`

Thoughts:
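
One thought on the `rule all` targets: it helps to check what a filename pattern actually expands to before running Snakemake. The sketch below is plain Python that mimics a one-wildcard `expand()`; it is not Snakemake's own implementation, and `expand_pattern` and `SAMPLE_IDS` are hypothetical names introduced here only for illustration.

```python
# Plain-Python sketch of a one-wildcard expand(); not Snakemake's implementation.
# SAMPLE_IDS and expand_pattern are hypothetical names used only for illustration.
SAMPLE_IDS = ["1", "2", "3", "4", "5"]


def expand_pattern(pattern, **wildcards):
    """Fill `pattern` once for each value of the single keyword given."""
    key, values = next(iter(wildcards.items()))
    return [pattern.format(**{key: value}) for value in values]


# Bare sample IDs: the pattern supplies the directory and the full suffix.
sig_targets = expand_pattern("sourmash_sigs/{name}.fa.gz.sig", name=SAMPLE_IDS)

# Full paths that already end in .fa.gz: a pattern that adds .fa.gz again
# produces a doubled extension that no compute rule will create.
doubled = expand_pattern("{name}.fa.gz.sig", name=["data_files/1.fa.gz"])

print(sig_targets)
print(doubled)
```

Listing bare IDs and letting the pattern carry the directory and extension would let `rule all` request files under `sourmash_sigs/`, which are the same paths the compare rule takes as input.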