# R03 June 2021
[toc]
## June Goals
#### Make species classifications against
+ "Run genome grist" on the 106 samples
    + [Genome Grist Docs](https://pypi.org/project/genome-grist/)
    + Taylor has these on farm
    + [QUESTION] are these already cleaned?
+ **write out full** workflow:
    + download from SRA
    + remove human reads, k-mer trim
+ [QUESTION] calculate "grist level species catalogs" for the 106 samples
    + prefetch against microbial GenBank (gather)
    + only need gather db, not XXXXX
    + subcommand: `gather genbank` or `genbank gather`
    + ~24 hr per sample in compute time
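The ~24 hr figure adds up quickly: 106 samples × 24 h = 2,544 compute-hours. A rough sketch for turning that into wall-clock time, assuming every sample takes the same 24 h and that a fixed number of samples can run at once (both simplifications; `estimate_wall_hours` is a hypothetical helper):

```python
import math


def estimate_wall_hours(n_samples: int, hours_per_sample: float, parallel_jobs: int) -> float:
    """Rough wall-clock estimate when samples run in equal-length parallel batches.

    Assumes every sample takes the same time and that `parallel_jobs`
    samples can run at once (a hypothetical scheduling model).
    """
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be >= 1")
    batches = math.ceil(n_samples / parallel_jobs)  # last batch may be partly empty
    return batches * hours_per_sample


if __name__ == "__main__":
    total = 106 * 24  # total compute-hours for all samples
    print(f"total compute: {total} h")
    for jobs in (1, 8, 16):
        hours = estimate_wall_hours(106, 24, jobs)
        print(f"{jobs:>2} parallel jobs: ~{hours:.0f} h (~{hours / 24:.1f} days)")
```

With 8 samples running at once this gives 14 batches, about 336 h (14 days); the real number will depend on queue limits and per-sample variation.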
#### Steps
1. Ask Taylor (GHI) about the sample set
2. Ask whether these have already been cleaned of human reads and/or preprocessed with khmer
3. If not, do that preprocessing
4. [QUESTION] How do we confirm that we are removing only the human data?
5. Grist species catalog: subset to 3 samples to test first
6. Reports: make a "Week 1" issue
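The "subset to 3 to test" step could use a small config with the same keys as the `conf-r03.yml` used in W1. A minimal sketch, assuming the full sample list is available as a Python list of names; `make_subset_config` and its defaults are hypothetical:

```python
def make_subset_config(samples, n=3, outdir="outputs.r03/", trim_memory="1e9"):
    """Return YAML text for a genome-grist test config covering the first n samples.

    Writes only the keys used in conf-r03.yml; any other settings fall back
    to genome-grist's defaults.
    """
    subset = list(samples)[:n]
    if not subset:
        raise ValueError("need at least one sample")
    lines = ["sample:"]
    lines += [f"- {name}" for name in subset]
    lines.append(f"outdir: {outdir}")
    lines.append(f"metagenome_trim_memory: {trim_memory}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The three samples from the W1 test run; the real list would have 106 names.
    all_samples = ["CSM67UEW_TR", "PSM6XBW3", "PSM7J177"]
    print(make_subset_config(all_samples, n=3), end="")
```

Taking the first `n` names keeps the test set reproducible; `random.Random(seed).sample(samples, n)` would give a less biased subset if the list is sorted in some meaningful order.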
## Goals after June:
- submit the data access request for the human-subjects data covering the rest of the iHMP data (titus)
- figure out which batch to do next
- start working on visualization for the species catalogs we have
- add viral catalogs (hannah)
- figure out strain-level validation (bry)
## W1:
+ moved to using `/scratch/` for running
+ `genome-grist run conf-r03.yml summarize &> log.txt` with 64 GB memory (node bm15)
+ `35715670 .. bmm smashtax hehouts R .. 5:35:55 .. 1 1 64G bm15`
+ request 8 cores with `-c 8` in `srun` and pass `-J 8` to `genome-grist run`
+ got error:
```
MissingOutputException in line 209 of /home/hehouts/miniconda3/envs/grist2/lib/python3.8/site-packages/genome_grist/conf/Snakefile:
Job Missing files after 5 seconds:
/tmp/tmp101jynny/PSM6XBW3.d/PSM6XBW3_1.fastq
/tmp/tmp101jynny/PSM6XBW3.d/PSM6XBW3_2.fastq
/tmp/tmp101jynny/PSM6XBW3.d/PSM6XBW3.fastq
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 22 completed successfully, but some output files are missing. 22
File "/home/hehouts/miniconda3/envs/grist2/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 589, in handle_job_success
File "/home/hehouts/miniconda3/envs/grist2/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 252, in handle_job_success
Removing output files of failed job download_sra_wc since they might be corrupted:
outputs.r03/raw/PSM6XBW3_1.fastq.gz, outputs.r03/raw/PSM6XBW3_2.fastq.gz, outputs.r03/raw/PSM6XBW3_unpaired.fastq.gz, /tmp/tmp101jynny/PSM6XBW3.d
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /scratch/hehouts/.snakemake/log/2021-06-18T172411.180308.snakemake.log
sample: ['CSM67UEW_TR', 'PSM6XBW3', 'PSM7J177']
outdir: outputs.r03
Error in snakemake invocation: Command '['snakemake', '-s', '/home/hehouts/miniconda3/envs/grist2/lib/python3.8/site-packages/genome_grist/conf/Snakefile', '-j', '1', '--use-conda', 'summarize', '--configfile', '/home/hehouts/miniconda3/envs/grist2/lib/python3.8/site-packages/genome_grist/conf/defaults.conf', '/home/hehouts/miniconda3/envs/grist2/lib/python3.8/site-packages/genome_grist/conf/system.conf', 'conf-r03.yml']' returned non-zero exit status 1.
```
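The failed `download_sra_wc` job removed its outputs, so after a crash like this it helps to see which samples still lack raw reads before rerunning. A minimal sketch; the three output names come from the log above, and `missing_raw_files` is a hypothetical helper:

```python
from pathlib import Path

# Output names taken from the failed download_sra_wc job in the log;
# other rules may name their outputs differently.
RAW_SUFFIXES = ("_1.fastq.gz", "_2.fastq.gz", "_unpaired.fastq.gz")


def missing_raw_files(outdir, samples):
    """Map each sample to the raw download outputs that are not on disk."""
    raw_dir = Path(outdir) / "raw"
    missing = {}
    for sample in samples:
        absent = [
            str(raw_dir / f"{sample}{suffix}")
            for suffix in RAW_SUFFIXES
            if not (raw_dir / f"{sample}{suffix}").exists()
        ]
        if absent:  # only report samples with at least one missing file
            missing[sample] = absent
    return missing


if __name__ == "__main__":
    samples = ["CSM67UEW_TR", "PSM6XBW3", "PSM7J177"]
    for sample, paths in missing_raw_files("outputs.r03", samples).items():
        print(sample, "missing:", ", ".join(paths))
```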
+ so rerunning with `srun -p bmm -J grist -t 4-00:00:00 --mem=80G -c 8 --pty bash`
+ from bm3, in `/scratch/hehouts`
+ moved the sample files into `abundtrim/`
+ copied `hehouts/r03/*` to `/scratch/hehouts`
+ running: `genome-grist run conf-r03.yml summarize -J 8 &> log.txt`
with this config (`conf-r03.yml`):
```
sample:
- CSM67UEW_TR
- PSM6XBW3
- PSM7J177
outdir: outputs.r03/
metagenome_trim_memory: 1e9
# omitting this setting searches all of GenBank (when running on farm)
#sourmash_database_glob_pattern: gtdb-r95.nucleotide-k31-scaled1000.sbt.zip
```
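Since the sample files were moved into `abundtrim/` by hand, a quick pre-flight check that every sample in the config has a trimmed-reads file can catch typos before launching a long run. A minimal sketch: it reads only the flat `sample:` list layout shown above, it assumes `abundtrim/` sits under the config's `outdir`, and the `{sample}.abundtrim.fq.gz` filename pattern is an assumption about genome-grist's naming, not confirmed here. `read_samples` and `missing_abundtrim` are hypothetical helpers.

```python
from pathlib import Path


def read_samples(conf_text):
    """Collect the '- name' entries under a top-level 'sample:' key.

    Handles only the flat list layout used in conf-r03.yml; a real
    YAML parser (e.g. PyYAML) is the safer choice for anything fancier.
    """
    samples = []
    in_samples = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing space
        if not line:
            continue
        if line == "sample:":
            in_samples = True
        elif in_samples and line.lstrip().startswith("- "):
            samples.append(line.lstrip()[2:].strip())
        else:
            in_samples = False  # any other key ends the sample list
    return samples


def missing_abundtrim(outdir, samples, pattern="{sample}.abundtrim.fq.gz"):
    """Return samples whose trimmed reads are not in <outdir>/abundtrim/.

    The default filename pattern is an assumption; check it against the
    files actually present in abundtrim/.
    """
    trim_dir = Path(outdir) / "abundtrim"
    return [s for s in samples if not (trim_dir / pattern.format(sample=s)).exists()]


if __name__ == "__main__":
    conf = Path("conf-r03.yml")
    if conf.exists():
        samples = read_samples(conf.read_text())
        print("samples:", samples)
        print("missing abundtrim:", missing_abundtrim("outputs.r03", samples))
```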
## W2:
Most recent failure log:
`/home/hehouts/r03/grist2/.snakemake/log/2021-06-22T124419.842847.snakemake.log`
The command was:
`genome-grist run conf-r03.yml summarize outputs/genbank/PSM7J177.x.genbank.prefetch.zip`
Even though I specified (by adding a "goal" file on the command line) that only the rules up to `sourmash_prefetch_gather_wc` should run, these jobs were scheduled:
```
1 gather_genbank_wc
1 make_combined_info_csv
1 make_notebook_wc
1 set_kernel
1 smash_abundtrim_wc
1 sourmash_gather_wc
1 sourmash_prefetch_gather_wc
1 split_query_known_unknown_wc
1 summarize
2 summarize_samtools_depth_wc
```
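One thing worth checking: the command also names `summarize` as a target alongside the prefetch file, and Snakemake schedules everything needed for every target it is given, so that target alone may explain the extra jobs. The sketch below parses a count-first job table like the one above and lists rules outside an expected set; `parse_job_counts`, `unexpected_rules`, and the `expected` set are hypothetical, not genome-grist's actual dependency graph.

```python
def parse_job_counts(table_text):
    """Parse 'count rule' lines (count first, as above) into {rule: count}.

    Header lines and a 'total N' line are skipped because their first
    field is not a number.
    """
    counts = {}
    for line in table_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1]] = int(parts[0])
    return counts


def unexpected_rules(counts, expected):
    """Rules that were scheduled but are not in the expected set, sorted."""
    return sorted(set(counts) - set(expected))


if __name__ == "__main__":
    table = """\
1 gather_genbank_wc
1 make_combined_info_csv
1 make_notebook_wc
1 set_kernel
1 smash_abundtrim_wc
1 sourmash_gather_wc
1 sourmash_prefetch_gather_wc
1 split_query_known_unknown_wc
1 summarize
2 summarize_samtools_depth_wc
"""
    counts = parse_job_counts(table)
    print("total jobs:", sum(counts.values()))  # 11 for the table above
    # Hypothetical: rules assumed to be needed up to the prefetch output.
    expected = {"smash_abundtrim_wc", "sourmash_prefetch_gather_wc"}
    print("unexpected:", unexpected_rules(counts, expected))
```

Rerunning with only the prefetch file as the target (no `summarize`) would be the direct way to test this.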