# *OMM* slide deck (cummulative)
---
# Week 18 (meeting Eric)
---
## Agenda
0. Recap
1. Tried out new reference genomes
2. spot-checked one duplicated region
3. removal of duplicated regions
---
## Recap
- Some regions have reads marked as "duplicated mapping location
- These regions have often variants detected by *Lofreq*/*Varscan*
----

---
## 1. Tried out new reference genomes
- Maybe the duplicated regions we saw are due to low quality of refrence genomes?
- New genomes have less number of conigs and seem to be more complete
- But no real difference when inspecting the alignment files
----
| file name | # contig |
| -------- | -------- |
|`L_reuteri_I49.fasta` | 3 |
|`B_animalis_YL2.fasta`| 2 |
|`F_plautii.fasta` | 2|
|`B_caecimuris.fasta` |1 |
|`B_coccoiedes_YL58.fasta` | 1 |
|`Clostridioforme.fasta` |1|
----
| file name | # contig |
| -------- | -------- |
|`A_muris_KB18.fasta` |1 |
|`C_innocuum_I46.fasta` | 1 |
|`A_muciniphila_YL44.fasta` | 1 |
|`T_muris.fasta` | 1 |
|`E_faecalis_KB1.fasta` | 1|
|`M_intestinale_YL27.fasta` |1 |
----

----

----

----

----

----

----

----

----

---
## 2. spot-checked one duplicated region
- choosed this region, reads are marked duplicated on `bwa mem` mapping
- region is on position 1411179 to 1413329

----
Extracted fasta
`region.bed` (tab delim)
```
B_caecimuris 1411179 1413329
```
```bash!
bedtools getfasta -fi joined_reference_curated.fasta
-bed region.bed > duplicated_sequence.fasta
```
----
And aligned it to reference
```bash!
bowtie2 -x joined_reference_curated
-f duplicated_sequence.fasta -S region1.sam
```
seem to be really duplicated:
```
1 reads; of these:
1 (100.00%) were unpaired; of these:
0 (0.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time
1 (100.00%) aligned >1 times
100.00% overall alignment rate
```
But `sam` shows only one mapping
----
> By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied).
----
To show all
```bash!
bowtie2 -x joined_reference_curated
-f duplicated_sequence.fasta
-S region1.sam
--no-contain -a --no-mixed
```
And finds 7 regions

----
[Full report](https://hackmd.io/@pmuench/SkKTfxPFL)
---
## 3. removal of duplicated regions
```bash!
java -jar -Xmx30g picard.jar MarkDuplicates
INPUT=1423_S34_bowtie2_mapped_omm_sorted.bam
OUTPUT=1423_S34/1423_S34_bowtie2_mapped_omm_sorted_picard.bam
METRICS_FILE=output.dup_metrics
CREATE_INDEX=TRUE VALIDATION_STRINGENCY=SILENT
```
Seems there are 6.9% duplicated reads, and no big difference between *bowtie2* and *bwa mem*
----
Using [Samtools `markdup`](http://www.htslib.org/doc/samtools-markdup.html) marks duplicate alignments in a coordinate sorted file and use `-r` parameter to create a bam file with removed duplicates.
----

----
Alignment tacks:
1. bowtie2
2. bowtie2 with removed duplicated reads via `samtools markdup -r`
3. bwa mem
4. bwa mem with removed duplicated reads via `samtools markdup -r`
---
Conclusion:
- [ ] use Mauve to check for regions that overlap and exclude these from the variant calling
- [ ] Quantification of all variants
---
# Week 19 (meeting Eric)
---
## Agenda
0. Recap
1. _New Genome Browser_
2. Quantification of all variants
---
## _New Genome Browser_
- Full report [https://hackmd.io/@pmuench/rJmrF4dKU](https://hackmd.io/@pmuench/rJmrF4dKU)
- [link to server](http://3.123.8.123/catgenome/#/omm/Akkermansia_muciniphila/1073088/1152560?tracks=%5B%7B%22h%22:100,%22s%22:%7B%22rt%22:false,%22rsfs%22:true,%22rsrs%22:false%7D,%22f%22:%22REFERENCE%22,%22l%22:true,%22b%22:%22omm%22,%22p%22:%22omm%22%7D,%7B%22h%22:70,%22s%22:%7B%22v%22:%22Collapsed%22%7D,%22i1%22:%22%252Fngs%252F%252F1423_S34_bwa_lofreq.vcf.gz.tbi%22,%22f%22:%22VCF%22,%22l%22:true,%22b%22:%22%252Fngs%252F%252F1423_S34_bwa_lofreq.vcf.gz%22,%22p%22:%22omm%22%7D,%7B%22b%22:%22%252Fngs%252F%252F1423_S34_bwa_mapped_omm_sorted_updated.bam%22,%22n%22:%22%252Fngs%252F%252F1423_S34_bwa_mapped_omm_sorted_updated.bam%22,%22i1%22:%22%252Fngs%252F%252F1423_S34_bwa_mapped_omm_sorted_updated.bam.bai%22,%22l%22:true,%22f%22:%22BAM%22,%22p%22:%22omm%22,%22h%22:250,%22s%22:%7B%22a%22:true,%22aa%22:true,%22c%22:%22noColor%22,%22c1%22:true,%22d%22:true,%22g1%22:%22default%22,%22i%22:true,%22m%22:true,%22r%22:%221%22,%22s1%22:false,%22s2%22:true,%22s3%22:false,%22v1%22:false,%22cls%22:false,%22csm%22:%22default%22%7D%7D%5D)
----

---
## Quantification of all variants
- Full report [https://hackmd.io/@pmuench/BJO7_zdtI](https://hackmd.io/@pmuench/BJO7_zdtI)
----

AF can be quanttified by
```bash!
samtools mpileup -uv -Q 0 -f joined_reference_curated.fasta -r "Akkermansia_muciniphila:1-250" file.bam > file.vcf
```
----
```bash!
Akkermansia_muciniphila 237 . C G,T,<X> 0 . DP=54;I16=48,3,1,2,1887,72557,111,4107,3060,183600,180,10800,1100,26006,75,1875;QS=0.944165,0.0372233,0.0186117,0;VDB=0.0481132;SGB=-0.511536;RPB=0.730199;MQB=1;MQSB=1;BQB=1;MQ0F=0 PL 0,89,247,120,231,255,154,253,255,255
```
- For this position: `QS=0.944165,0.0372233,0.0186117` is the AF of the three alleles (ref, alternative 1, alternative 2).
- So the _major AF_ is $max(QS)$
----
quantification of the major AF:
```bash!
bcftools view file.vcf -o file.bcf
bcftools query -f '%CHROM\t%POS\t%QS]\n' file.bcf | cut -f3 | sed "s/\,/\ /g" | awk '{m=$1;for(i=1;i<=NF;i++)if($i<m)m=$i;print NR" ",m}'
```
----
applied on one sample
```bash!
$ less major_allele_bins.dat
0.0 0.1 141
0.1 0.2 0
0.2 0.3 0
0.3 0.4 4
0.4 0.5 983
0.5 0.6 765
0.6 0.7 1643
0.7 0.8 5650
0.8 0.9 19809
0.9 1.0 4888812
1.0 1.1 30253736
```
----

----
Values < 0.5?
major AF-backround
```
0.993274
0.992812
0.966161
0.89012
0.975505
0.48508 <-
0.976565
0.99231
0.990577
0.991922
0.996273
```
{"metaMigratedAt":"2023-06-15T07:21:31.463Z","metaMigratedFrom":"YAML","title":"OMM Slide deck","breaks":true,"description":"View the slide with \"Slide Mode\".","slideOptions":"{\"transition\":\"slide\"}","contributors":"[{\"id\":\"6c2e9acb-a863-46f0-8240-af54ded6ea16\",\"add\":7732,\"del\":1028}]"}