# *OMM* slide deck (cummulative) --- # Week 18 (meeting Eric) --- ## Agenda 0. Recap 1. Tried out new reference genomes 2. spot-checked one duplicated region 3. removal of duplicated regions --- ## Recap - Some regions have reads marked as "duplicated mapping location - These regions have often variants detected by *Lofreq*/*Varscan* ---- ![](https://i.imgur.com/8g5k6J7.png) --- ## 1. Tried out new reference genomes - Maybe the duplicated regions we saw are due to low quality of refrence genomes? - New genomes have less number of conigs and seem to be more complete - But no real difference when inspecting the alignment files ---- | file name | # contig | | -------- | -------- | |`L_reuteri_I49.fasta` | 3 | |`B_animalis_YL2.fasta`| 2 | |`F_plautii.fasta` | 2| |`B_caecimuris.fasta` |1 | |`B_coccoiedes_YL58.fasta` | 1 | |`Clostridioforme.fasta` |1| ---- | file name | # contig | | -------- | -------- | |`A_muris_KB18.fasta` |1 | |`C_innocuum_I46.fasta` | 1 | |`A_muciniphila_YL44.fasta` | 1 | |`T_muris.fasta` | 1 | |`E_faecalis_KB1.fasta` | 1| |`M_intestinale_YL27.fasta` |1 | ---- ![](https://i.imgur.com/wGUIJon.png) ---- ![](https://i.imgur.com/zhYMhpu.png) ---- ![](https://i.imgur.com/r6pyJB5.png) ---- ![](https://i.imgur.com/ySkl8Bg.png) ---- ![](https://i.imgur.com/TYjjhiK.png) ---- ![](https://i.imgur.com/nLTKgM7.png) ---- ![](https://i.imgur.com/6o6PWC1.png) ---- ![](https://i.imgur.com/YCQV2Fa.png) ---- ![](https://i.imgur.com/8vZkBk7.png) --- ## 2. spot-checked one duplicated region - choosed this region, reads are marked duplicated on `bwa mem` mapping - region is on position 1411179 to 1413329 ![](https://i.imgur.com/CRC1IiE.png) ---- Extracted fasta `region.bed` (tab delim) ``` B_caecimuris 1411179 1413329 ``` ```bash! bedtools getfasta -fi joined_reference_curated.fasta -bed region.bed > duplicated_sequence.fasta ``` ---- And aligned it to reference ```bash! bowtie2 -x joined_reference_curated -f duplicated_sequence.fasta -S region1.sam ``` seem to be really duplicated: ``` 1 reads; of these: 1 (100.00%) were unpaired; of these: 0 (0.00%) aligned 0 times 0 (0.00%) aligned exactly 1 time 1 (100.00%) aligned >1 times 100.00% overall alignment rate ``` But `sam` shows only one mapping ---- > By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). ---- To show all ```bash! bowtie2 -x joined_reference_curated -f duplicated_sequence.fasta -S region1.sam --no-contain -a --no-mixed ``` And finds 7 regions ![](https://i.imgur.com/NusV5uG.png) ---- [Full report](https://hackmd.io/@pmuench/SkKTfxPFL) --- ## 3. removal of duplicated regions ```bash! java -jar -Xmx30g picard.jar MarkDuplicates INPUT=1423_S34_bowtie2_mapped_omm_sorted.bam OUTPUT=1423_S34/1423_S34_bowtie2_mapped_omm_sorted_picard.bam METRICS_FILE=output.dup_metrics CREATE_INDEX=TRUE VALIDATION_STRINGENCY=SILENT ``` Seems there are 6.9% duplicated reads, and no big difference between *bowtie2* and *bwa mem* ---- Using [Samtools `markdup`](http://www.htslib.org/doc/samtools-markdup.html) marks duplicate alignments in a coordinate sorted file and use `-r` parameter to create a bam file with removed duplicates. ---- ![](https://i.imgur.com/hpFMgA3.png) ---- Alignment tacks: 1. bowtie2 2. bowtie2 with removed duplicated reads via `samtools markdup -r` 3. bwa mem 4. bwa mem with removed duplicated reads via `samtools markdup -r` --- Conclusion: - [ ] use Mauve to check for regions that overlap and exclude these from the variant calling - [ ] Quantification of all variants --- # Week 19 (meeting Eric) --- ## Agenda 0. Recap 1. _New Genome Browser_ 2. Quantification of all variants --- ## _New Genome Browser_ - Full report [https://hackmd.io/@pmuench/rJmrF4dKU](https://hackmd.io/@pmuench/rJmrF4dKU) - [link to server](http://3.123.8.123/catgenome/#/omm/Akkermansia_muciniphila/1073088/1152560?tracks=%5B%7B%22h%22:100,%22s%22:%7B%22rt%22:false,%22rsfs%22:true,%22rsrs%22:false%7D,%22f%22:%22REFERENCE%22,%22l%22:true,%22b%22:%22omm%22,%22p%22:%22omm%22%7D,%7B%22h%22:70,%22s%22:%7B%22v%22:%22Collapsed%22%7D,%22i1%22:%22%252Fngs%252F%252F1423_S34_bwa_lofreq.vcf.gz.tbi%22,%22f%22:%22VCF%22,%22l%22:true,%22b%22:%22%252Fngs%252F%252F1423_S34_bwa_lofreq.vcf.gz%22,%22p%22:%22omm%22%7D,%7B%22b%22:%22%252Fngs%252F%252F1423_S34_bwa_mapped_omm_sorted_updated.bam%22,%22n%22:%22%252Fngs%252F%252F1423_S34_bwa_mapped_omm_sorted_updated.bam%22,%22i1%22:%22%252Fngs%252F%252F1423_S34_bwa_mapped_omm_sorted_updated.bam.bai%22,%22l%22:true,%22f%22:%22BAM%22,%22p%22:%22omm%22,%22h%22:250,%22s%22:%7B%22a%22:true,%22aa%22:true,%22c%22:%22noColor%22,%22c1%22:true,%22d%22:true,%22g1%22:%22default%22,%22i%22:true,%22m%22:true,%22r%22:%221%22,%22s1%22:false,%22s2%22:true,%22s3%22:false,%22v1%22:false,%22cls%22:false,%22csm%22:%22default%22%7D%7D%5D) ---- ![](https://i.imgur.com/bAp4OgZ.png) --- ## Quantification of all variants - Full report [https://hackmd.io/@pmuench/BJO7_zdtI](https://hackmd.io/@pmuench/BJO7_zdtI) ---- ![](https://i.imgur.com/BgZfwDd.png) AF can be quanttified by ```bash! samtools mpileup -uv -Q 0 -f joined_reference_curated.fasta -r "Akkermansia_muciniphila:1-250" file.bam > file.vcf ``` ---- ```bash! Akkermansia_muciniphila 237 . C G,T,<X> 0 . DP=54;I16=48,3,1,2,1887,72557,111,4107,3060,183600,180,10800,1100,26006,75,1875;QS=0.944165,0.0372233,0.0186117,0;VDB=0.0481132;SGB=-0.511536;RPB=0.730199;MQB=1;MQSB=1;BQB=1;MQ0F=0 PL 0,89,247,120,231,255,154,253,255,255 ``` - For this position: `QS=0.944165,0.0372233,0.0186117` is the AF of the three alleles (ref, alternative 1, alternative 2). - So the _major AF_ is $max(QS)$ ---- quantification of the major AF: ```bash! bcftools view file.vcf -o file.bcf bcftools query -f '%CHROM\t%POS\t%QS]\n' file.bcf | cut -f3 | sed "s/\,/\ /g" | awk '{m=$1;for(i=1;i<=NF;i++)if($i<m)m=$i;print NR" ",m}' ``` ---- applied on one sample ```bash! $ less major_allele_bins.dat 0.0 0.1 141 0.1 0.2 0 0.2 0.3 0 0.3 0.4 4 0.4 0.5 983 0.5 0.6 765 0.6 0.7 1643 0.7 0.8 5650 0.8 0.9 19809 0.9 1.0 4888812 1.0 1.1 30253736 ``` ---- ![](https://i.imgur.com/vxHXWfP.png) ---- Values < 0.5? major AF-backround ``` 0.993274 0.992812 0.966161 0.89012 0.975505 0.48508 <- 0.976565 0.99231 0.990577 0.991922 0.996273 ```
{"metaMigratedAt":"2023-06-15T07:21:31.463Z","metaMigratedFrom":"YAML","title":"OMM Slide deck","breaks":true,"description":"View the slide with \"Slide Mode\".","slideOptions":"{\"transition\":\"slide\"}","contributors":"[{\"id\":\"6c2e9acb-a863-46f0-8240-af54ded6ea16\",\"add\":7732,\"del\":1028}]"}
    213 views