---
title: Update reference genomes
tags: OMM
description: Personal hackmd notes
image: https://partechshaker.com/wp-content/uploads/2018/10/logo_square.png
robots: noindex, nofollow
GA: UA-165598729-1
---

---

[TOC]

---

Update reference genomes
===

Raw files
---

:::info
Recalculate the `bam` files with the new reference genomes. In particular, inspect whether there are similar regions where reads match more than one location (marked white in the *IGV viewer*).
:::

:::warning
- **location:** `/home/aime/projects/oligomm-claudia/databases/omm_new`
:::

**Workflow**

- `OMM_new_genomes.zip`, received by mail from Baerbel, contains 12 `fasta` files
- uploaded genomes to [github](https://github.com/philippmuench/OligoMM-report/tree/master/databases/omm_new)
- removed whitespace between concatenated fasta entries (e.g. trailing whitespace in single fasta files)
- used strain names in the fasta headers
- validated the `fasta` format; the files caused errors in bowtie2, so they had to be curated manually, and the updated files were uploaded to github

| file name | fasta entries |
| -------- | -------- |
| `B_animalis_YL2.fasta` | Bifidobacterium_animalis_YL2_1, Bifidobacterium_animalis_YL2_1 |
| `L_reuteri_I49.fasta` | Lactobacillus_reuteri_I49_1, Lactobacillus_reuteri_I49_2, Lactobacillus_reuteri_I49_3 |
| `A_muciniphila_YL44.fasta` | Akkermansia_muciniphila |
| `T_muris.fasta` | T_muris |
| `E_faecalis_KB1.fasta` | Enterococcus_faecalis |
| `M_intestinale_YL27.fasta` | Muribaculum_intestinale |
| `F_plautii.fasta` | F_plautii_1, F_plautii_2 |
| `A_muris_KB18.fasta` | Acutalibacter_muris |
| `C_innocuum_I46.fasta` | Clostridium_innocuum |
| `B_caecimuris.fasta` | B_caecimuris |
| `B_coccoiedes_YL58.fasta` | Blautia_coccoides |
| `Clostridioforme.fasta` | Clostridioforme |

```bash
cat *.fasta > joined_reference.fasta
code joined_reference.fasta # manually curate double whitespace
```

Read and rewrite the fasta with seqinr to make sure all entries are in the same format (e.g. `nbchar` set to 60)

```R!
fasta <- seqinr::read.fasta(file = "joined_reference.fasta")
seqinr::write.fasta(fasta, names(fasta), file.out = "joined_reference_curated.fasta", open = "w", nbchar = 60, as.string = FALSE)
```

Prokka annotation of concatenated fasta
---

```!bash
conda activate prokka
prokka joined_reference_curated.fasta --cpus 40 --outdir prokka
```

Create databases and restart pipeline
---

```!bash
ssh pmuench@grid.bifo.helmholtz-hzi.de
qrsh -l h='atlas-compute-01.bifo.helmholtz-hzi.de'
screen -r 16753.pts-3.atlas-compute-01
cd /net/sgi/oligomm_ab/OligoMM-report/databases/omm_new
git pull
bowtie-build joined_reference_curated.fasta joined_reference_curated # see "Fix error" below, this needs to be bowtie2-build
bwa index joined_reference_curated.fasta
samtools faidx joined_reference_curated.fasta
vim ../../config/config.json # edit paths to joined_reference_curated.fasta and to the bowtie2 index
cd ../../
rm -rf processed/1423_S34/
snakemake --jobs 40
```

Fix error
---

```!
(ERR): "/net/sgi/oligomm_ab/OligoMM-report/databases/omm_new/joined_reference_curated" does not exist or is not a Bowtie 2 index
Exiting now ...
```

It seems the index was not created with bowtie2 (`bowtie-build` was used instead of `bowtie2-build`).

```!bash
cd /net/sgi/oligomm_ab/OligoMM-report/databases/omm_new
bowtie2-build joined_reference_curated.fasta joined_reference_curated
cd /net/sgi/oligomm_ab/OligoMM-report
rm -rf processed/1423_S34/
snakemake --jobs 40
```

Inspect new alignment
---

Open the IGV viewer on the HZI server with the new alignment files and the reference genome. See [IGV Viewer](https://hackmd.io/YMMKtwsaQ7OLaxYitbUjVQ). When zoomed in and reads are being loaded (shown in the lower left corner), do nothing, otherwise it will freeze!

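
As a complement to the visual check in IGV, a rough per-contig count of low-MAPQ reads can hint at which contigs have regions where reads match more than one location. This is only a sketch: the function name `low_mapq_per_contig` is made up for these notes, and the default cutoff of 1 is an assumption about how the aligner scores ambiguous placements, so adjust it as needed. It reads SAM records on stdin and prints contig, low-MAPQ reads, and total mapped reads.

```shell
# Hypothetical helper (not part of the pipeline): count reads with
# MAPQ <= cutoff per reference sequence. Reads SAM records on stdin.
# The default cutoff of 1 is an assumption, pass another value as $1.
low_mapq_per_contig() {
    cutoff="${1:-1}"
    awk -F '\t' -v cutoff="$cutoff" '
        /^@/      { next }          # skip SAM header lines
        $3 == "*" { next }          # skip unmapped reads
                  { total[$3]++ }   # all mapped reads per contig
        $5 <= cutoff { low[$3]++ }  # reads at or below the MAPQ cutoff
        END {
            for (c in total)
                printf "%s\t%d\t%d\n", c, low[c] + 0, total[c]
        }
    ' | LC_ALL=C sort
}
```

Possible usage, with the BAM path as a placeholder: `samtools view path/to/sample.bam | low_mapq_per_contig 1`. Contigs where the second column is a large share of the third are candidates for a closer look in IGV.
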
To load files:

- <kbd>Genomes</kbd> > <kbd>Load genome from file</kbd>: Select `/net/sgi/oligomm_ab/OligoMM-report/databases/omm_new/joined_reference_curated.fasta`
- <kbd>File</kbd> > <kbd>Load from file</kbd>: Select `bam` files and `vcf` files from `/net/sgi/oligomm_ab/OligoMM-report/processed/1423_S34/`

Change the display of the bam tracks via <kbd>leftclick</kbd> > <kbd>squished</kbd>

Lofreq variants (*bowtie2*)
---

Zoomed in to a SNP on _A. muciniphila_ - looks good :+1:

![](https://i.imgur.com/wGUIJon.png)

Zoomed in to a SNP on _B. caecimuris_ - many variants?

![](https://i.imgur.com/zhYMhpu.png)

Another variant also looks odd

![](https://i.imgur.com/r6pyJB5.png)

Another region on the same contig, same problem with alignments to different locations

![](https://i.imgur.com/ySkl8Bg.png)

Zoomed in to a region with many variants on _B. caecimuris_. Same problem here: are variants detected in a region with abnormal coverage and evidence for ambiguous alignments?

![](https://i.imgur.com/TYjjhiK.png)

Moved to another region, _M. intestinale_, SNP seems legit :+1:

![](https://i.imgur.com/nLTKgM7.png)

Moved to another region, _Clostridioforme_, SNP seems legit :+1:

![](https://i.imgur.com/6o6PWC1.png)

Many variants are on the 16S rRNA, which makes sense?

![](https://i.imgur.com/YCQV2Fa.png)

SNP in a region with abnormal coverage

![](https://i.imgur.com/8vZkBk7.png)

Varscan variants (*bowtie2*)
---