---
title: Update reference genomes
tags: OMM
description: Personal hackmd notes
image: https://partechshaker.com/wp-content/uploads/2018/10/logo_square.png
robots: noindex, nofollow
GA: UA-165598729-1
---
---
[TOC]
---
Update reference genomes
===
Raw files
---
:::info
Recompute the `bam` files against the new reference genomes. In particular, check for similar regions where reads match more than one location (such reads are drawn in white in the *IGV viewer*).
:::
:::warning
- **location:** `/home/aime/projects/oligomm-claudia/databases/omm_new`
:::
**Workflow**
- `OMM_new_genomes.zip` received by mail from Baerbel, contains 12 `fasta` files
- uploaded genomes to [github](https://github.com/philippmuench/OligoMM-report/tree/master/databases/omm_new)
- remove whitespace between concatenated fasta entries (e.g. trailing whitespace in single fasta files)
- use strain names in fasta header
- validated the `fasta` format: the files as received led to errors in bowtie2, so they needed manual curation; uploaded the curated files to github
| file name | fasta entries |
| -------- | -------- |
|`B_animalis_YL2.fasta`| Bifidobacterium_animalis_YL2_1, Bifidobacterium_animalis_YL2_1 |
|`L_reuteri_I49.fasta` | Lactobacillus_reuteri_I49_1, Lactobacillus_reuteri_I49_2, Lactobacillus_reuteri_I49_3 |
|`A_muciniphila_YL44.fasta` | Akkermansia_muciniphila |
|`T_muris.fasta` | T_muris |
|`E_faecalis_KB1.fasta` | Enterococcus_faecalis|
|`M_intestinale_YL27.fasta` | Muribaculum_intestinale |
|`F_plautii.fasta` | F_plautii_1, F_plautii_2|
|`A_muris_KB18.fasta` | Acutalibacter_muris |
|`C_innocuum_I46.fasta` | Clostridium_innocuum |
|`B_caecimuris.fasta` | B_caecimuris |
|`B_coccoiedes_YL58.fasta` | Blautia_coccoides |
|`Clostridioforme.fasta` | Clostridioforme|
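
The table above can be regenerated from the files instead of being typed by hand. A minimal sketch, assuming the headers are plain `>name` lines; the function names `list_fasta_entries` and `duplicate_fasta_entries` are made up for this note and are not part of the pipeline. The second function helps to spot entry names that occur twice, which would collide in the joined reference.

```bash
# List the entry names (header lines without ">") of each FASTA file as
# "file<TAB>name1, name2, ...".
# Usage: list_fasta_entries *.fasta
list_fasta_entries() {
  local f
  for f in "$@"; do
    awk -v file="$f" '
      /^>/ { sub(/^>/, ""); names = names (n++ ? ", " : "") $1 }
      END  { printf "%s\t%s\n", file, names }
    ' "$f"
  done
}

# Print every entry name that occurs more than once across the given files.
# Usage: duplicate_fasta_entries *.fasta
duplicate_fasta_entries() {
  grep -h '^>' "$@" | sed 's/^>//' | awk '{ print $1 }' | sort | uniq -d
}
```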
```bash
cat *.fasta > joined_reference.fasta
code joined_reference.fasta # manually curate double whitespaces
```
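
The manual curation above could probably be replaced by a scripted cleanup. A rough sketch, not what was actually run: it assumes the only problems are trailing whitespace (including Windows line endings) and empty lines between entries. `join_fasta_clean` is a hypothetical helper name.

```bash
# Concatenate FASTA files, stripping trailing spaces, tabs and carriage
# returns from every line and dropping lines that end up empty.
# awk reads the files one by one, so a missing final newline in one file
# does not glue its last line to the next header.
# Usage: join_fasta_clean *.fasta > joined_reference.fasta
join_fasta_clean() {
  awk '{ sub(/[ \t\r]+$/, ""); if (length($0) > 0) print }' "$@"
}
```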
Read and re-write the fasta to make sure all entries are in the same format (e.g. `nbchar` set to 60, i.e. 60 characters per sequence line)
```R!
fasta <- seqinr::read.fasta(file = "joined_reference.fasta")
seqinr::write.fasta(fasta, names(fasta), file.out = "joined_reference_curated.fasta", open = "w", nbchar = 60, as.string = FALSE)
```
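
To confirm the re-written file really uses the expected line width, the longest sequence line can be checked. A small sketch; `max_seq_line_width` is a made-up helper name, and the expected value of 60 only holds if `nbchar = 60` was used as above.

```bash
# Print the length of the longest sequence (non-header) line in a FASTA file.
# Usage: max_seq_line_width joined_reference_curated.fasta
max_seq_line_width() {
  awk '!/^>/ { if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}
```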
Prokka annotation of the concatenated reference
---
```!bash
conda activate prokka
prokka joined_reference_curated.fasta --cpus 40 --outdir prokka
```
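
A quick sanity check of the annotation is to count the features per type in the resulting GFF. This is only a sketch: it assumes Prokka wrote a GFF3 file with an embedded `##FASTA` section into the `prokka` output directory; the exact file name is not recorded in these notes, and `count_gff_features` is a hypothetical helper.

```bash
# Count features per type (column 3) in a GFF3 file, ignoring comment lines
# and everything from the embedded "##FASTA" section onwards.
# Usage: count_gff_features prokka/<prefix>.gff
count_gff_features() {
  awk -F'\t' '
    /^##FASTA/ { exit }
    /^#/       { next }
    NF >= 9    { count[$3]++ }
    END        { for (t in count) printf "%s\t%d\n", t, count[t] }
  ' "$1" | sort
}
```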
Create databases and restart pipeline
---
```!bash
ssh pmuench@grid.bifo.helmholtz-hzi.de
qrsh -l h='atlas-compute-01.bifo.helmholtz-hzi.de'
screen -r 16753.pts-3.atlas-compute-01
cd /net/sgi/oligomm_ab/OligoMM-report/databases/omm_new
git pull
bowtie-build joined_reference_curated.fasta joined_reference_curated # see fix error below, this needs to be bowtie2-build
bwa index joined_reference_curated.fasta
samtools faidx joined_reference_curated.fasta
vim ../../config/config.json # edit paths to joined_reference_curated.fasta, and index for bowtie2
cd ../../
rm -rf processed/1423_S34/
snakemake --jobs 40
```
Fix error
---
```!
(ERR): "/net/sgi/oligomm_ab/OligoMM-report/databases/omm_new/joined_reference_curated" does not exist or is not a Bowtie 2 index
Exiting now ...
```
It seems the index was built with `bowtie-build` (Bowtie 1) and not with `bowtie2-build`, so rebuild it:
```!bash
cd /net/sgi/oligomm_ab/OligoMM-report/databases/omm_new
bowtie2-build joined_reference_curated.fasta joined_reference_curated
cd /net/sgi/oligomm_ab/OligoMM-report
rm -rf processed/1423_S34/
snakemake --jobs 40
```
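
A pre-flight check before starting `snakemake` would have caught the error above. A minimal sketch, assuming the usual file suffixes written by `bowtie2-build` (`.bt2`, or `.bt2l` for large indexes), `bwa index` and `samtools faidx`; `check_reference_indexes` is a hypothetical helper and not part of the pipeline.

```bash
# Report which index files are missing for a reference FASTA and its
# Bowtie 2 index prefix. Returns non-zero if anything is missing.
# Usage: check_reference_indexes joined_reference_curated.fasta joined_reference_curated
check_reference_indexes() {
  local fasta=$1 prefix=$2 missing=0 f
  # bowtie2-build writes six files; an index built with bowtie-build
  # (Bowtie 1) has .ebwt files instead and is reported as missing here
  for f in 1 2 3 4 rev.1 rev.2; do
    if [ ! -e "$prefix.$f.bt2" ] && [ ! -e "$prefix.$f.bt2l" ]; then
      echo "missing Bowtie 2 index: $prefix.$f.bt2"
      missing=1
    fi
  done
  # bwa index writes five files next to the FASTA
  for f in amb ann bwt pac sa; do
    if [ ! -e "$fasta.$f" ]; then
      echo "missing bwa index: $fasta.$f"
      missing=1
    fi
  done
  # samtools faidx writes a single .fai file
  if [ ! -e "$fasta.fai" ]; then
    echo "missing samtools index: $fasta.fai"
    missing=1
  fi
  return "$missing"
}
```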
Inspect new alignment
---
Open the IGV viewer on the HZI server with the new alignment files and reference genome. See [IGV Viewer](https://hackmd.io/YMMKtwsaQ7OLaxYitbUjVQ). While zoomed in and reads are still loading (shown in the lower left corner), do nothing, otherwise IGV will freeze!
To load files:
- <kbd>Genomes</kbd> > <kbd>Load genome from file</kbd>: Select `/net/sgi/oligomm_ab/OligoMM-report/databases/omm_new/joined_reference_curated.fasta`
- <kbd>File</kbd> > <kbd>Load from file</kbd>: Select `bam` files and `vcf` files from `/net/sgi/oligomm_ab/OligoMM-report/processed/1423_S34/`
Change the display of the bam tracks via <kbd>leftclick</kbd> > <kbd>Squished</kbd>
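
The loading steps above could also be scripted with an IGV batch file. A rough sketch, not tested on the HZI server: it assumes the `bam` and `vcf` files sit directly in the sample directory and that IGV is started with a batch file (e.g. `igv.sh -b igv_batch.txt`). `write_igv_batch` and the file name `igv_batch.txt` are made up for this note; the locus is whatever contig name or region should be opened first.

```bash
# Print an IGV batch script that loads the reference genome and all bam/vcf
# files of a sample directory, jumps to a locus and squishes the tracks.
# Usage: write_igv_batch joined_reference_curated.fasta processed/1423_S34 Akkermansia_muciniphila > igv_batch.txt
write_igv_batch() {
  local genome=$1 sample_dir=$2 locus=$3 f
  echo "new"
  echo "genome $genome"
  for f in "$sample_dir"/*.bam "$sample_dir"/*.vcf; do
    # skip patterns that did not match any file
    if [ -e "$f" ]; then
      echo "load $f"
    fi
  done
  echo "goto $locus"
  echo "squish"
}
```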
Lofreq variants (*bowtie2*)
---
Zoomed in to SNP on _A. muciniphila_ - looks good :+1:

Zoomed in to SNP on _B. caecimuris_ - many variants?

Another variant also looks odd

Another region on the same contig, same problem with reads aligning to different locations

Zoomed in to a region with many variants on _B. caecimuris_: same problem here, variants are detected in a region with abnormal coverage and evidence of ambiguous alignments?

Move to another region, _M. intestinale_, SNP seems legit :+1:

Move to another region, _Clostridioforme_, SNP seems legit :+1:

Many variants are on 16S rRNA genes, which makes sense?

SNP on region with abnormal coverage
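
The impression of ambiguous alignments from IGV can be put into numbers by counting reads with very low mapping quality in a region. A minimal sketch that reads SAM text on stdin; it assumes that Bowtie 2 gives reads with several equally good hits a MAPQ of 0 or 1, hence the default threshold of 2. `low_mapq_fraction` is a hypothetical helper, and the bam file name and region in the usage line are placeholders.

```bash
# Read SAM records on stdin and print "low<TAB>total<TAB>fraction", where
# "low" is the number of records with MAPQ (column 5) below the threshold
# given as first argument (default 2). Header lines ("@...") are skipped.
# Usage: samtools view <sample>.bam <contig>:<start>-<end> | low_mapq_fraction
low_mapq_fraction() {
  awk -F'\t' -v thr="${1:-2}" '
    /^@/ { next }
    { total++; if ($5 + 0 < thr + 0) low++ }
    END {
      if (total == 0) { print "0\t0\t0.000"; exit }
      printf "%d\t%d\t%.3f\n", low, total, low / total
    }
  '
}
```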

Varscan variants (*bowtie2*)
---