---
title: Spot check duplicated regions
tags: OMM
description: Personal hackmd notes
image: https://partechshaker.com/wp-content/uploads/2018/10/logo_square.png
robots: noindex, nofollow
GA: UA-165598729-1
---
---
[TOC]
---
Spot check duplicated regions
===
Extract coordinates of one region
---
Region 1 (transposase)
----
The duplicated region spans 1411179 to 1413329 on `B_caecimuris`. I wrote a BED file for this region to `/net/sgi/oligomm_ab/OligoMM-report/IGV_files/` and extracted the sequence with [bedtools getfasta](https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html).
`region.bed` (tab-delimited; the three columns must be separated by tabs):
```
B_caecimuris 1411179 1413329
```
```bash!
bedtools getfasta -fi ../databases/omm_new/joined_reference_curated.fasta -bed region.bed > duplicated_sequence.fasta
```
```!
>B_caecimuris:1411179-1413329
aaatagtacccattttttactctgtttgcttttgtatgtcattggattttagtatatttacattgttggaaaatttaaagtcgattgacccactattatactctaaattgacccatcaaaaaaatattagatttaaataaagaaaaatgcattttaacccgggtcattttcaattataataatctacaatatatactatttctttttagcattaatctttcttacagaatccccggtaagttcaatcttgtgtgctgtatgtacaattctgtccaggattgcgtcagctactgtaggatcaccaattgcatcataccggctttcgacaggaagttgtgatgttattatgattgattttaatccgtgtctgtcttctattatttccataaggattgacctttccctggcatccagtcctataagaaacagatcgtcaagaatcagcagttgacacttttctatttttttcatctcagattctatagtgcccttgtttttggcaattttaagctgtcccataagctttgacgcatttgaatacaaagtctttattccattcttacatgcctcatatcctatggctgaagctatatagcttttacctgtaccggagcttcccgtgatgaaaacattctgtccgtctcttataaaatcaagagatgcaagtctctcaagctggttacggtcaaggttacgttttatggtatagtctattttttccatatatgccttatatctgaagtttgcagactttatcagtctctcaatgcttacattgcgtctgtaatcatattcgctttcaagaagccatttaagaaactcatcgtttgtcataccatcagaggatgtggtcctgcagtcatttctatatgtttcaagcataccgtaaaaacgtaacttggagagtagttccattattctgtccatattttttccgacagttctgcttgttttattattacttgtcatttttatccatgtttttagagttaaaataatccttgcctctgagatttctgtgtttgggggtaagttctggagcctgtccctccatctgtacatgatacttctcatcttccctgtttaccagaatactttcaagttcgttgtatccgaacataccgaattccattgccacctgacatgaaagaaccaccctgtcatgtccgaaacgctccacaagacttaatatgccatcggctgaacgtacggcctggaccggatatttcttggcaacggccacacgctttatgtactcttccaataccggatcaattccacatgccttgcggtatatatcatccattctgattctgtattcatgtggcactcccgaaagtttgtgggacggtttttctgaataagtgaaaggtgtatcatcctttcgatgtgtcgttatgtgtctgaacttatgatatatctccactgtgtccccatcatacagtaattctacagtatcgccgatatactctttaggaacactgtaatagtgattgttaagcgatacataactgtttctcatgacagttgccgttttccggctttttgatataaattttgttgccggcaatgtatgcagcctgtctttctcgacttcgaggaaacgttcccgacggctgtagttgcgattgtacatctttcggctgttcaacgcatccgtatgcttcattatttctatgttcaaggcttcaagatcattgaatttcaatcccgtcatctttgaatacacctccctgtagagcagtcttacagcattctcaaccagagctttgtctttaggctttcgtactcttgccgggaagacaacacatccataatggtctgcaaatgcagcaaagtcgtcattgattacaggttcaacacctccaggctttgttacggctgatctcaggttgtctggaactatggcattggggacacctccaaaataatggaaagcattttcacatgcctggataaggtgttctttcttttgtgatggtacagcctcgtaataggta
atctggctgcatggaagtatggcggcaaatacttctacgggaaccttattgcctgttctttcgtatgagagataaagtttgtcaccggcaaaatccacatacatctgatcaccggctatatgatctatgcgtccaacaggactctttact
```
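A note on coordinates, as a hedged aside: BED intervals are 0-based and half-open, so `bedtools getfasta` returns `end - start` bases, and the BED line above should yield 2150 bp starting at 1-based position 1411180. If 1411179 was read off IGV (which displays 1-based positions), the BED start would need to be one lower. The sketch below shows that conversion; `region_to_bed` and `bed_length` are hypothetical helper names for illustration, not part of bedtools.

```bash
# Convert a 1-based, inclusive region (as shown in a genome browser)
# into a tab-delimited BED line (0-based start, exclusive end).
region_to_bed() {
    local chrom=$1 start=$2 end=$3
    printf '%s\t%d\t%d\n' "$chrom" "$((start - 1))" "$end"
}

# Number of bases getfasta extracts for each BED line on stdin: end - start.
bed_length() {
    awk -F'\t' '{ print $3 - $2 }'
}

# BED line as used above: 1413329 - 1411179 = 2150 bases.
printf 'B_caecimuris\t1411179\t1413329\n' | bed_length

# Same region treated as 1-based inclusive: start shifts to 1411178, 2151 bases.
region_to_bed B_caecimuris 1411179 1413329
region_to_bed B_caecimuris 1411179 1413329 | bed_length
```

Whether the extra base matters depends on how the region boundaries were chosen; for a spot check of a roughly 2 kb duplication it probably does not change the mapping result.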
Align this region to the reference `databases/omm_new/joined_reference_curated.fasta`:
```bash!
# -f: the input is FASTA; -U: unpaired input file (the documented form)
bowtie2 -x /net/sgi/oligomm_ab/OligoMM-report/databases/omm_new/joined_reference_curated -f -U IGV_files/duplicated_sequence.fasta -S IGV_files/region1.sam
```
```
1 reads; of these:
  1 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    1 (100.00%) aligned >1 times
100.00% overall alignment rate
```
> By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied).
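
So we rerun it with `-a`, which reports all valid alignments instead of only the best one, to obtain more than one mapping location. The `--no-contain` and `--no-mixed` flags are passed as well, although as far as I can tell from the bowtie2 manual they only apply to paired-end reads and should not matter for this single unpaired sequence.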
```bash!
bowtie2 -x /net/sgi/oligomm_ab/OligoMM-report/databases/omm_new/joined_reference_curated -f -U IGV_files/duplicated_sequence.fasta -S IGV_files/region1.sam --no-contain -a --no-mixed
```
The resulting `IGV_files/region1.sam` shows 7 valid mapping locations, all on the same contig.
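
To re-check the number of mapping locations without opening the SAM file by hand, something like the following could be used. This is a minimal sketch: `list_alignments` is a hypothetical helper name, and it relies only on the standard SAM columns (FLAG in column 2, RNAME in column 3, POS in column 4, MAPQ in column 5).

```bash
# Print one line per reported alignment in a SAM file:
# reference name, 1-based leftmost position, strand, MAPQ.
list_alignments() {
    awk -F'\t' '
        /^@/ { next }                    # skip header lines
        int($2 / 4) % 2 == 1 { next }    # skip unmapped records (FLAG bit 0x4)
        {
            strand = (int($2 / 16) % 2 == 1) ? "-" : "+"   # FLAG bit 0x10 = reverse strand
            print $3 "\t" $4 "\t" strand "\t" $5
        }
    ' "$1"
}

# Usage:
#   list_alignments IGV_files/region1.sam                        # one row per location
#   list_alignments IGV_files/region1.sam | wc -l                # number of locations
#   list_alignments IGV_files/region1.sam | cut -f1 | sort | uniq -c   # locations per contig
```

With `-a`, bowtie2 writes the additional locations as secondary alignments of the same read, so each location is its own SAM record and the line count equals the number of locations.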

## Mauve on all genomes in OMM set

However, exporting from Mauve does not seem to work, so I am testing other approaches.
## Mummer
```bash
cd /home/aime/projects/oligomm-claudia
mkdir mummer && cd mummer
wget https://github.com/mummer4/mummer/releases/download/v4.0.0beta2/mummer-4.0.0beta2.tar.gz
tar xvzf mummer-4.0.0beta2.tar.gz
cd mummer-4.0.0beta2
./configure
make
sudo make install
mummer -h
```
Manual at https://github.com/mummer4/mummer/blob/master/MANUAL.md
```bash!
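# Usage is nucmer [options] <reference> <query>; -p sets the output prefix (writes C_innocuum_I46.delta)
nucmer -p C_innocuum_I46 ../../databases/omm_new/joined_reference_curated.fasta ../../databases/omm_new/C_innocuum_I46.fasta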
show-coords C_innocuum_I46.delta -w -o
show-diff C_innocuum_I46.delta
```
Description of `show-diff` from the MUMmer documentation:
> Outputs a list of structural differences for each sequence in
> the reference and query, sorted by position. For a reference
> sequence R, and its matching query sequence Q, differences are
> categorized as GAP (gap between two mutually consistent alignments),
> DUP (inserted duplication), BRK (other inserted sequence), JMP
> (rearrangement), INV (rearrangement with inversion), SEQ
> (rearrangement with another sequence). The first five columns of
> the output are seq ID, feature type, feature start, feature end,
> and feature length. Additional columns are added depending on the
> feature type. Negative feature lengths indicate overlapping adjacent
> alignment blocks.
>
> This program classifies alignment breakpoints for the quantification of macroscopic differences between two genomes. It takes a standard, unfiltered delta file as input, determines the best mapping between the two sequence sets, and reports on the breaks in that mapping.

Columns per feature type:
```
IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff
IDR DUP dup-start dup-end dup-length
IDR BRK gap-start gap-end gap-length
IDR JMP gap-start gap-end gap-length
IDR INV gap-start gap-end gap-length
IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence
```
From the meeting with Eric: `BRK` seems to be the sequence before/after a break and is uninteresting for us (the intended use case for MUMmer is strain comparison). So only the `DUP` rows are interesting for us. Example output of `show-diff`:
```
$ show-diff test/Cin.delta | xclip -i -selection clipboard
/home/aime/projects/oligomm-claudia/databases/omm_new/joined_reference_curated.fasta /home/aime/projects/oligomm-claudia/databases/omm_new/C_innocuum_I46.fasta
NUCMER
[SEQ] [TYPE] [S1] [E1] [LEN 1]
Acutalibacter_muris BRK 1 3406851 3406851
Acutalibacter_muris DUP 3406852 3407744 893
Acutalibacter_muris BRK 3407745 3802913 395169
Blautia_coccoides BRK 1 1237502 1237502
Blautia_coccoides DUP 1237503 1239438 1936
Blautia_coccoides BRK 1239439 1410980 171542
Blautia_coccoides DUP 1410981 1411521 541
Blautia_coccoides BRK 1411522 5128582 3717061
Clostridioforme BRK 1 3951294 3951294
Clostridioforme DUP 3951295 3951472 178
Clostridioforme DUP 3951296 3953537 2242
Clostridioforme BRK 3953538 3958272 4735
Clostridioforme DUP 3958273 3958492 220
Clostridioforme BRK 3958493 5416939 1458447
Clostridioforme DUP 5416940 5421324 4385
Clostridioforme BRK 5421325 6123472 702148
Clostridioforme DUP 6123473 6123642 170
Clostridioforme BRK 6123643 7157610 1033968
Muribaculum_intestinale BRK 1 3306967 3306967
Muribaculum_intestinale DUP 3306968 3307069 102
```
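Since only the `DUP` rows matter here, the `show-diff` output above can be reduced to a per-sequence summary. The following is a rough sketch, assuming the five-column layout shown above (sequence ID, type, start, end, length); `summarize_dups` is a hypothetical helper name.

```bash
# Keep only DUP rows from show-diff output (read from stdin) and report,
# per sequence: number of DUP features and their summed length.
summarize_dups() {
    awk '
        $2 == "DUP" && NF >= 5 {
            count[$1]++
            total[$1] += $5
        }
        END {
            for (name in count)
                print name "\t" count[name] "\t" total[name]
        }
    ' | sort
}

# Usage:
#   show-diff C_innocuum_I46.delta | summarize_dups
```

Note that overlapping `DUP` intervals (such as the two Clostridioforme features starting at 3951295 and 3951296) are counted separately, so the summed length can exceed the number of distinct duplicated bases.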
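However, the `show-coords` output looks similar, and I am not sure what exactly the differences are. Judging from the two outputs, `show-coords` lists every alignment with its coordinates in both sequences and its percent identity, whereas `show-diff` only reports the features along the reference. The `DUP` intervals do reappear as `show-coords` rows (for example Clostridioforme 3951295 to 3951472).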
```
$ show-coords test/Cin.delta | xclip -i -selection clipboard
```
Open _details_ to see the full (large) output:
:::spoiler
```
[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [TAGS]
=====================================================================================
1 4469084 | 1 4469084 | 4469084 4469084 | 100.00 | Clostridium_innocuum Clostridium_innocuum
37000 37732 | 3640005 3639271 | 733 735 | 89.52 | Clostridium_innocuum Clostridium_innocuum
39423 41073 | 3637673 3636020 | 1651 1654 | 92.26 | Clostridium_innocuum Clostridium_innocuum
39423 41061 | 1487976 1486335 | 1639 1642 | 93.61 | Clostridium_innocuum Clostridium_innocuum
39423 41053 | 3248872 3247242 | 1631 1631 | 90.32 | Clostridium_innocuum Clostridium_innocuum
79146 79568 | 1252622 1252201 | 423 422 | 99.05 | Clostridium_innocuum Clostridium_innocuum
104939 105753 | 650018 649205 | 815 814 | 85.12 | Clostridium_innocuum Clostridium_innocuum
104941 105740 | 2146199 2145398 | 800 802 | 83.83 | Clostridium_innocuum Clostridium_innocuum
129616 129979 | 2036278 2035915 | 364 364 | 99.45 | Clostridium_innocuum Clostridium_innocuum
236448 236616 | 2639811 2639643 | 169 169 | 94.08 | Clostridium_innocuum Clostridium_innocuum
236785 236887 | 584147 584045 | 103 103 | 100.00 | Clostridium_innocuum Clostridium_innocuum
236886 238278 | 2412253 2410861 | 1393 1393 | 99.86 | Clostridium_innocuum Clostridium_innocuum
236886 238278 | 2639373 2637981 | 1393 1393 | 99.28 | Clostridium_innocuum Clostridium_innocuum
241359 241713 | 2407916 2407586 | 355 331 | 92.68 | Clostridium_innocuum Clostridium_innocuum
343208 343568 | 3266878 3266522 | 361 357 | 96.95 | Clostridium_innocuum Clostridium_innocuum
352727 352898 | 3375830 3375659 | 172 172 | 88.95 | Clostridium_innocuum Clostridium_innocuum
370683 371495 | 383901 383089 | 813 813 | 99.38 | Clostridium_innocuum Clostridium_innocuum
370687 371475 | 2146179 2145391 | 789 789 | 85.17 | Clostridium_innocuum Clostridium_innocuum
370688 371574 | 649995 649111 | 887 885 | 83.60 | Clostridium_innocuum Clostridium_innocuum
370687 371486 | 1613773 1612974 | 800 800 | 85.27 | Clostridium_innocuum Clostridium_innocuum
371368 371504 | 383079 382943 | 137 137 | 98.54 | Clostridium_innocuum Clostridium_innocuum
383089 383901 | 371495 370683 | 813 813 | 99.38 | Clostridium_innocuum Clostridium_innocuum
383769 383924 | 370677 370522 | 156 156 | 98.09 | Clostridium_innocuum Clostridium_innocuum
432119 433906 | 2887872 2886085 | 1788 1788 | 99.83 | Clostridium_innocuum Clostridium_innocuum
432121 433907 | 1939454 1937668 | 1787 1787 | 99.89 | Clostridium_innocuum Clostridium_innocuum
443901 445698 | 1903722 1901924 | 1798 1799 | 99.83 | Clostridium_innocuum Clostridium_innocuum
443906 445693 | 3165444 3163656 | 1788 1789 | 99.94 | Clostridium_innocuum Clostridium_innocuum
562366 564120 | 2412615 2410861 | 1755 1755 | 99.77 | Clostridium_innocuum Clostridium_innocuum
574892 575057 | 4202779 4202614 | 166 166 | 100.00 | Clostridium_innocuum Clostridium_innocuum
584045 584147 | 236887 236785 | 103 103 | 100.00 | Clostridium_innocuum Clostridium_innocuum
649205 650018 | 105753 104939 | 814 815 | 85.12 | Clostridium_innocuum Clostridium_innocuum
702391 702844 | 2578323 2577846 | 454 478 | 87.03 | Clostridium_innocuum Clostridium_innocuum
735357 737142 | 1939454 1937669 | 1786 1786 | 99.94 | Clostridium_innocuum Clostridium_innocuum
735355 737142 | 2887872 2886085 | 1788 1788 | 99.78 | Clostridium_innocuum Clostridium_innocuum
743331 745085 | 1746527 1744772 | 1755 1756 | 99.49 | Clostridium_innocuum Clostridium_innocuum
743334 745085 | 1024096 1022345 | 1752 1752 | 99.77 | Clostridium_innocuum Clostridium_innocuum
747132 747404 | 851433 851160 | 273 274 | 95.99 | Clostridium_innocuum Clostridium_innocuum
841624 842073 | 2578313 2577846 | 450 468 | 87.23 | Clostridium_innocuum Clostridium_innocuum
851160 851433 | 747404 747132 | 274 273 | 95.99 | Clostridium_innocuum Clostridium_innocuum
1022343 1024096 | 3044228 3042474 | 1754 1755 | 99.54 | Clostridium_innocuum Clostridium_innocuum
1022345 1024096 | 745085 743334 | 1752 1752 | 99.77 | Clostridium_innocuum Clostridium_innocuum
1023857 1024096 | 584046 583807 | 240 240 | 100.00 | Clostridium_innocuum Clostridium_innocuum
1023858 1024098 | 573406 573166 | 241 241 | 99.59 | Clostridium_innocuum Clostridium_innocuum
1022345 1024098 | 1802632 1800878 | 1754 1755 | 99.60 | Clostridium_innocuum Clostridium_innocuum
1231365 1231504 | 2034465 2034326 | 140 140 | 97.14 | Clostridium_innocuum Clostridium_innocuum
1237205 1239156 | 1675607 1673656 | 1952 1952 | 98.00 | Clostridium_innocuum Clostridium_innocuum
1239015 1239154 | 1231504 1231365 | 140 140 | 97.14 | Clostridium_innocuum Clostridium_innocuum
1251152 1252621 | 2472388 2470940 | 1470 1449 | 97.55 | Clostridium_innocuum Clostridium_innocuum
1251153 1252621 | 3947639 3946191 | 1469 1449 | 92.85 | Clostridium_innocuum Clostridium_innocuum
1251153 1252622 | 3951009 3949560 | 1470 1450 | 97.48 | Clostridium_innocuum Clostridium_innocuum
1251154 1252624 | 1264719 1263250 | 1471 1470 | 92.67 | Clostridium_innocuum Clostridium_innocuum
1251154 1252621 | 2328923 2327476 | 1468 1448 | 92.92 | Clostridium_innocuum Clostridium_innocuum
1251851 1252621 | 4468984 4468214 | 771 771 | 97.15 | Clostridium_innocuum Clostridium_innocuum
1252201 1252622 | 79568 79146 | 422 423 | 99.05 | Clostridium_innocuum Clostridium_innocuum
1263250 1264719 | 1252624 1251154 | 1470 1471 | 92.67 | Clostridium_innocuum Clostridium_innocuum
1293617 1297262 | 3597352 3593708 | 3646 3645 | 90.45 | Clostridium_innocuum Clostridium_innocuum
1298550 1299860 | 3592331 3591021 | 1311 1311 | 90.78 | Clostridium_innocuum Clostridium_innocuum
1301902 1304875 | 3589426 3586453 | 2974 2974 | 98.72 | Clostridium_innocuum Clostridium_innocuum
1307136 1312128 | 3585210 3580230 | 4993 4981 | 82.52 | Clostridium_innocuum Clostridium_innocuum
1312975 1313835 | 3579404 3578544 | 861 861 | 83.33 | Clostridium_innocuum Clostridium_innocuum
1323918 1324105 | 3570362 3570178 | 188 185 | 84.13 | Clostridium_innocuum Clostridium_innocuum
1324785 1326773 | 3569496 3567507 | 1989 1990 | 92.38 | Clostridium_innocuum Clostridium_innocuum
1436799 1436902 | 4212099 4211996 | 104 104 | 99.04 | Clostridium_innocuum Clostridium_innocuum
1436799 1436909 | 1899631 1899521 | 111 111 | 95.50 | Clostridium_innocuum Clostridium_innocuum
1463710 1464016 | 3551157 3550860 | 307 298 | 86.17 | Clostridium_innocuum Clostridium_innocuum
1463710 1464016 | 4286109 4285812 | 307 298 | 87.62 | Clostridium_innocuum Clostridium_innocuum
1486344 1487976 | 3075989 3074357 | 1633 1633 | 92.11 | Clostridium_innocuum Clostridium_innocuum
1486335 1487976 | 41061 39423 | 1642 1639 | 93.61 | Clostridium_innocuum Clostridium_innocuum
1486345 1489503 | 2372099 2368940 | 3159 3160 | 93.83 | Clostridium_innocuum Clostridium_innocuum
1486333 1487976 | 1771751 1770107 | 1644 1645 | 93.86 | Clostridium_innocuum Clostridium_innocuum
1673656 1675607 | 1239156 1237205 | 1952 1952 | 98.00 | Clostridium_innocuum Clostridium_innocuum
1673657 1675610 | 2034466 2032521 | 1954 1946 | 97.45 | Clostridium_innocuum Clostridium_innocuum
1673652 1675610 | 2137134 2135153 | 1959 1982 | 96.42 | Clostridium_innocuum Clostridium_innocuum
1675609 1677057 | 2328924 2327476 | 1449 1449 | 93.93 | Clostridium_innocuum Clostridium_innocuum
1675609 1677060 | 4291981 4290530 | 1452 1452 | 99.93 | Clostridium_innocuum Clostridium_innocuum
1675610 1677058 | 2472386 2470939 | 1449 1448 | 99.86 | Clostridium_innocuum Clostridium_innocuum
1675610 1677057 | 3951008 3949561 | 1448 1448 | 99.93 | Clostridium_innocuum Clostridium_innocuum
1675610 1677057 | 3947638 3946191 | 1448 1448 | 93.86 | Clostridium_innocuum Clostridium_innocuum
1675715 1676529 | 80490 79676 | 815 815 | 91.66 | Clostridium_innocuum Clostridium_innocuum
1676287 1677057 | 4468984 4468214 | 771 771 | 96.37 | Clostridium_innocuum Clostridium_innocuum
1676637 1677057 | 79568 79147 | 421 422 | 98.82 | Clostridium_innocuum Clostridium_innocuum
1677058 1677132 | 2032529 2032455 | 75 75 | 98.67 | Clostridium_innocuum Clostridium_innocuum
1680693 1681030 | 3244881 3244544 | 338 338 | 93.79 | Clostridium_innocuum Clostridium_innocuum
1722532 1722808 | 3530140 3529865 | 277 276 | 94.58 | Clostridium_innocuum Clostridium_innocuum
1744770 1746526 | 3942290 3940535 | 1757 1756 | 99.54 | Clostridium_innocuum Clostridium_innocuum
1744757 1744939 | 333046 332867 | 183 180 | 98.36 | Clostridium_innocuum Clostridium_innocuum
1744771 1746525 | 3475307 3473554 | 1755 1754 | 99.72 | Clostridium_innocuum Clostridium_innocuum
1744771 1744937 | 585696 585530 | 167 167 | 100.00 | Clostridium_innocuum Clostridium_innocuum
1744771 1746528 | 1802633 1800875 | 1758 1759 | 99.94 | Clostridium_innocuum Clostridium_innocuum
1744810 1746182 | 574881 573509 | 1373 1373 | 100.00 | Clostridium_innocuum Clostridium_innocuum
1744772 1746524 | 3044226 3042474 | 1753 1753 | 99.94 | Clostridium_innocuum Clostridium_innocuum
1744772 1746524 | 4219583 4217831 | 1753 1753 | 99.32 | Clostridium_innocuum Clostridium_innocuum
1746286 1746526 | 331384 331144 | 241 241 | 100.00 | Clostridium_innocuum Clostridium_innocuum
1744772 1746527 | 745085 743331 | 1756 1755 | 99.49 | Clostridium_innocuum Clostridium_innocuum
1748276 1750029 | 4120763 4119010 | 1754 1754 | 99.83 | Clostridium_innocuum Clostridium_innocuum
1767803 1768415 | 3251076 3250464 | 613 613 | 85.67 | Clostridium_innocuum Clostridium_innocuum
1770107 1771740 | 3248872 3247240 | 1634 1633 | 91.06 | Clostridium_innocuum Clostridium_innocuum
1770107 1771751 | 1487976 1486333 | 1645 1644 | 93.86 | Clostridium_innocuum Clostridium_innocuum
1770107 1771743 | 3637673 3636038 | 1637 1636 | 92.85 | Clostridium_innocuum Clostridium_innocuum
1772787 1772864 | 1795872 1795795 | 78 78 | 100.00 | Clostridium_innocuum Clostridium_innocuum
1795795 1795872 | 1772864 1772787 | 78 78 | 100.00 | Clostridium_innocuum Clostridium_innocuum
1800875 1802633 | 1746528 1744771 | 1759 1758 | 99.94 | Clostridium_innocuum Clostridium_innocuum
1872687 1874666 | 616002 614023 | 1980 1980 | 99.34 | Clostridium_innocuum Clostridium_innocuum
1872687 1874666 | 4003662 4001683 | 1980 1980 | 99.34 | Clostridium_innocuum Clostridium_innocuum
1872687 1872813 | 616139 616013 | 127 127 | 100.00 | Clostridium_innocuum Clostridium_innocuum
1937668 1939454 | 433907 432121 | 1787 1787 | 99.89 | Clostridium_innocuum Clostridium_innocuum
1937669 1939454 | 737142 735357 | 1786 1786 | 99.94 | Clostridium_innocuum Clostridium_innocuum
2032455 2032529 | 1677132 1677058 | 75 75 | 98.67 | Clostridium_innocuum Clostridium_innocuum
2032521 2034466 | 1675610 1673657 | 1946 1954 | 97.45 | Clostridium_innocuum Clostridium_innocuum
2034326 2034465 | 1231504 1231365 | 140 140 | 97.14 | Clostridium_innocuum Clostridium_innocuum
2035915 2036278 | 129979 129616 | 364 364 | 99.45 | Clostridium_innocuum Clostridium_innocuum
2061085 2061195 | 2223479 2223369 | 111 111 | 93.69 | Clostridium_innocuum Clostridium_innocuum
2099398 2100271 | 2243385 2242512 | 874 874 | 97.60 | Clostridium_innocuum Clostridium_innocuum
2099427 2100251 | 3467729 3466905 | 825 825 | 90.31 | Clostridium_innocuum Clostridium_innocuum
2099410 2100271 | 3276602 3275741 | 862 862 | 89.48 | Clostridium_innocuum Clostridium_innocuum
2135153 2137134 | 1675610 1673652 | 1982 1959 | 96.42 | Clostridium_innocuum Clostridium_innocuum
2217821 2217916 | 3746130 3746035 | 96 96 | 95.83 | Clostridium_innocuum Clostridium_innocuum
2217821 2217983 | 4113548 4113386 | 163 163 | 98.77 | Clostridium_innocuum Clostridium_innocuum
2217816 2217983 | 343573 343406 | 168 168 | 97.62 | Clostridium_innocuum Clostridium_innocuum
2217821 2217983 | 4235599 4235437 | 163 163 | 97.55 | Clostridium_innocuum Clostridium_innocuum
2242380 2243414 | 2799280 2798221 | 1035 1060 | 90.85 | Clostridium_innocuum Clostridium_innocuum
2242512 2242760 | 2100868 2100621 | 249 248 | 93.60 | Clostridium_innocuum Clostridium_innocuum
2242512 2243385 | 2100271 2099398 | 874 874 | 97.60 | Clostridium_innocuum Clostridium_innocuum
2242536 2243353 | 4384925 4384108 | 818 818 | 91.69 | Clostridium_innocuum Clostridium_innocuum
2253484 2253683 | 4411320 4411121 | 200 200 | 95.50 | Clostridium_innocuum Clostridium_innocuum
2295618 2297379 | 4120773 4119010 | 1762 1764 | 99.72 | Clostridium_innocuum Clostridium_innocuum
2295628 2297380 | 3475306 3473553 | 1753 1754 | 99.94 | Clostridium_innocuum Clostridium_innocuum
2327476 2328924 | 1677057 1675609 | 1449 1449 | 93.93 | Clostridium_innocuum Clostridium_innocuum
2368940 2372099 | 3250392 3247242 | 3160 3151 | 91.31 | Clostridium_innocuum Clostridium_innocuum
2368940 2372099 | 1489503 1486345 | 3160 3159 | 93.83 | Clostridium_innocuum Clostridium_innocuum
2368940 2372099 | 3639200 3636043 | 3160 3158 | 93.35 | Clostridium_innocuum Clostridium_innocuum
2369213 2369507 | 3212063 3211769 | 295 295 | 99.32 | Clostridium_innocuum Clostridium_innocuum
2407586 2407916 | 241713 241359 | 331 355 | 92.68 | Clostridium_innocuum Clostridium_innocuum
2407790 2410720 | 567190 564260 | 2931 2931 | 99.97 | Clostridium_innocuum Clostridium_innocuum
2407790 2410720 | 241348 238418 | 2931 2931 | 99.97 | Clostridium_innocuum Clostridium_innocuum
2407585 2412618 | 4103577 4098545 | 5034 5033 | 99.62 | Clostridium_innocuum Clostridium_innocuum
2410861 2412253 | 238278 236886 | 1393 1393 | 99.86 | Clostridium_innocuum Clostridium_innocuum
2410861 2412615 | 564120 562366 | 1755 1755 | 99.77 | Clostridium_innocuum Clostridium_innocuum
2412521 2412618 | 236618 236521 | 98 98 | 96.94 | Clostridium_innocuum Clostridium_innocuum
2446342 2446605 | 3641083 3640820 | 264 264 | 96.59 | Clostridium_innocuum Clostridium_innocuum
2470939 2472386 | 1677058 1675610 | 1448 1449 | 99.86 | Clostridium_innocuum Clostridium_innocuum
2470940 2472388 | 1252621 1251152 | 1449 1470 | 97.55 | Clostridium_innocuum Clostridium_innocuum
2509085 2510854 | 4120773 4119004 | 1770 1770 | 99.60 | Clostridium_innocuum Clostridium_innocuum
2510609 2510848 | 573406 573167 | 240 240 | 100.00 | Clostridium_innocuum Clostridium_innocuum
2509084 2510858 | 1802646 1800871 | 1775 1776 | 99.38 | Clostridium_innocuum Clostridium_innocuum
2577846 2578313 | 842073 841624 | 468 450 | 87.23 | Clostridium_innocuum Clostridium_innocuum
2577846 2578323 | 702844 702391 | 478 454 | 87.03 | Clostridium_innocuum Clostridium_innocuum
2634649 2635036 | 567590 567200 | 388 391 | 98.47 | Clostridium_innocuum Clostridium_innocuum
2634703 2639764 | 4103578 4098519 | 5062 5060 | 99.41 | Clostridium_innocuum Clostridium_innocuum
2634705 2635035 | 241713 241359 | 331 355 | 92.39 | Clostridium_innocuum Clostridium_innocuum
2635036 2637839 | 567063 564260 | 2804 2804 | 99.96 | Clostridium_innocuum Clostridium_innocuum
2635035 2637839 | 241222 238418 | 2805 2805 | 99.96 | Clostridium_innocuum Clostridium_innocuum
2637981 2639764 | 564120 562338 | 1784 1783 | 98.65 | Clostridium_innocuum Clostridium_innocuum
2637981 2639373 | 238278 236886 | 1393 1393 | 99.28 | Clostridium_innocuum Clostridium_innocuum
2639643 2639811 | 236616 236448 | 169 169 | 94.08 | Clostridium_innocuum Clostridium_innocuum
2698252 2700225 | 3318245 3316272 | 1974 1974 | 99.95 | Clostridium_innocuum Clostridium_innocuum
2698252 2700221 | 3954040 3952071 | 1970 1970 | 99.95 | Clostridium_innocuum Clostridium_innocuum
2798221 2799280 | 2243414 2242380 | 1060 1035 | 90.85 | Clostridium_innocuum Clostridium_innocuum
2798283 2799104 | 3467726 3466905 | 822 822 | 90.88 | Clostridium_innocuum Clostridium_innocuum
2798295 2799124 | 3276570 3275741 | 830 830 | 90.60 | Clostridium_innocuum Clostridium_innocuum
2886085 2887872 | 433906 432119 | 1788 1788 | 99.83 | Clostridium_innocuum Clostridium_innocuum
2916757 2917173 | 2939721 2939365 | 417 357 | 83.45 | Clostridium_innocuum Clostridium_innocuum
2916956 2917826 | 3036175 3035305 | 871 871 | 95.06 | Clostridium_innocuum Clostridium_innocuum
2917364 2917827 | 2939364 2938901 | 464 464 | 95.26 | Clostridium_innocuum Clostridium_innocuum
2938901 2939364 | 2917827 2917364 | 464 464 | 95.26 | Clostridium_innocuum Clostridium_innocuum
2939368 2939555 | 3467084 3466897 | 188 188 | 93.62 | Clostridium_innocuum Clostridium_innocuum
2939365 2939721 | 2917173 2916757 | 357 417 | 83.45 | Clostridium_innocuum Clostridium_innocuum
2949804 2950673 | 3329924 3329055 | 870 870 | 97.82 | Clostridium_innocuum Clostridium_innocuum
2957610 2958473 | 3329924 3329061 | 864 864 | 98.50 | Clostridium_innocuum Clostridium_innocuum
3022506 3023237 | 4320497 4319766 | 732 732 | 100.00 | Clostridium_innocuum Clostridium_innocuum
3035305 3036175 | 2917826 2916956 | 871 871 | 95.06 | Clostridium_innocuum Clostridium_innocuum
3042474 3044228 | 1024096 1022343 | 1755 1754 | 99.54 | Clostridium_innocuum Clostridium_innocuum
3071933 3072678 | 3640005 3639258 | 746 748 | 88.52 | Clostridium_innocuum Clostridium_innocuum
3074357 3075989 | 1487976 1486344 | 1633 1633 | 92.11 | Clostridium_innocuum Clostridium_innocuum
3074357 3075989 | 3637673 3636042 | 1633 1632 | 91.86 | Clostridium_innocuum Clostridium_innocuum
3074357 3075989 | 3248872 3247241 | 1633 1632 | 92.11 | Clostridium_innocuum Clostridium_innocuum
3163656 3165444 | 445693 443906 | 1789 1788 | 99.94 | Clostridium_innocuum Clostridium_innocuum
3173589 3173821 | 80732 80500 | 233 233 | 98.28 | Clostridium_innocuum Clostridium_innocuum
3173589 3175075 | 2328923 2327476 | 1487 1448 | 94.49 | Clostridium_innocuum Clostridium_innocuum
3173694 3174528 | 80490 79676 | 835 815 | 93.77 | Clostridium_innocuum Clostridium_innocuum
3173588 3175075 | 3947639 3946191 | 1488 1449 | 94.43 | Clostridium_innocuum Clostridium_innocuum
3174286 3175075 | 4468984 4468214 | 790 771 | 95.58 | Clostridium_innocuum Clostridium_innocuum
3173589 3175076 | 1264719 1263252 | 1488 1468 | 89.67 | Clostridium_innocuum Clostridium_innocuum
3211769 3212063 | 2369507 2369213 | 295 295 | 99.32 | Clostridium_innocuum Clostridium_innocuum
3244544 3244881 | 1681030 1680693 | 338 338 | 93.79 | Clostridium_innocuum Clostridium_innocuum
3247242 3247696 | 41053 40599 | 455 455 | 92.11 | Clostridium_innocuum Clostridium_innocuum
3247242 3250128 | 2372099 2369213 | 2887 2887 | 92.67 | Clostridium_innocuum Clostridium_innocuum
3247241 3248872 | 3075989 3074357 | 1632 1633 | 92.11 | Clostridium_innocuum Clostridium_innocuum
3247240 3248872 | 1771740 1770107 | 1633 1634 | 91.06 | Clostridium_innocuum Clostridium_innocuum
3250433 3251076 | 37763 37120 | 644 644 | 84.81 | Clostridium_innocuum Clostridium_innocuum
3266521 3266878 | 4113549 4113188 | 358 362 | 95.59 | Clostridium_innocuum Clostridium_innocuum
3266522 3266878 | 343568 343208 | 357 361 | 96.95 | Clostridium_innocuum Clostridium_innocuum
3266522 3266878 | 3512283 3511924 | 357 360 | 96.12 | Clostridium_innocuum Clostridium_innocuum
3266522 3266878 | 4235599 4235239 | 357 361 | 95.57 | Clostridium_innocuum Clostridium_innocuum
3275733 3276585 | 4384958 4384105 | 853 854 | 93.21 | Clostridium_innocuum Clostridium_innocuum
3275741 3276570 | 2799124 2798295 | 830 830 | 90.60 | Clostridium_innocuum Clostridium_innocuum
3276059 3276456 | 3035842 3035445 | 398 398 | 84.92 | Clostridium_innocuum Clostridium_innocuum
3276134 3276456 | 2939364 2939042 | 323 323 | 85.76 | Clostridium_innocuum Clostridium_innocuum
3316272 3318245 | 2700225 2698252 | 1974 1974 | 99.95 | Clostridium_innocuum Clostridium_innocuum
3316271 3318246 | 4151450 4149475 | 1976 1976 | 99.90 | Clostridium_innocuum Clostridium_innocuum
3328977 3329924 | 2950755 2949804 | 948 952 | 95.39 | Clostridium_innocuum Clostridium_innocuum
3329061 3329924 | 2958473 2957610 | 864 864 | 98.50 | Clostridium_innocuum Clostridium_innocuum
3375659 3375830 | 352898 352727 | 172 172 | 88.95 | Clostridium_innocuum Clostridium_innocuum
3466897 3467084 | 2939555 2939368 | 188 188 | 93.62 | Clostridium_innocuum Clostridium_innocuum
3466912 3467740 | 4384922 4384094 | 829 829 | 93.97 | Clostridium_innocuum Clostridium_innocuum
3466905 3467726 | 2799104 2798283 | 822 822 | 90.88 | Clostridium_innocuum Clostridium_innocuum
3473553 3475306 | 2297380 2295628 | 1754 1753 | 99.94 | Clostridium_innocuum Clostridium_innocuum
3473554 3475306 | 1750029 1748277 | 1753 1753 | 99.94 | Clostridium_innocuum Clostridium_innocuum
3473555 3475306 | 1024096 1022345 | 1752 1752 | 99.89 | Clostridium_innocuum Clostridium_innocuum
3511924 3512283 | 3266878 3266522 | 360 357 | 96.12 | Clostridium_innocuum Clostridium_innocuum
3529865 3530140 | 1722808 1722532 | 276 277 | 94.58 | Clostridium_innocuum Clostridium_innocuum
3550860 3551157 | 1464016 1463710 | 298 307 | 86.17 | Clostridium_innocuum Clostridium_innocuum
3567507 3569496 | 1326773 1324785 | 1990 1989 | 92.38 | Clostridium_innocuum Clostridium_innocuum
3570178 3570362 | 1324105 1323918 | 185 188 | 84.13 | Clostridium_innocuum Clostridium_innocuum
3578544 3579404 | 1313835 1312975 | 861 861 | 83.33 | Clostridium_innocuum Clostridium_innocuum
3580230 3585210 | 1312128 1307136 | 4981 4993 | 82.52 | Clostridium_innocuum Clostridium_innocuum
3586453 3589426 | 1304875 1301902 | 2974 2974 | 98.72 | Clostridium_innocuum Clostridium_innocuum
3591021 3592331 | 1299860 1298550 | 1311 1311 | 90.78 | Clostridium_innocuum Clostridium_innocuum
3593708 3597352 | 1297262 1293617 | 3645 3646 | 90.45 | Clostridium_innocuum Clostridium_innocuum
3632259 3634044 | 1903717 1901932 | 1786 1786 | 100.00 | Clostridium_innocuum Clostridium_innocuum
3632259 3634044 | 3165444 3163659 | 1786 1786 | 100.00 | Clostridium_innocuum Clostridium_innocuum
3636020 3637673 | 41073 39423 | 1654 1651 | 92.26 | Clostridium_innocuum Clostridium_innocuum
3636038 3637673 | 1771743 1770107 | 1636 1637 | 92.85 | Clostridium_innocuum Clostridium_innocuum
3636043 3639200 | 2372099 2368940 | 3158 3160 | 93.35 | Clostridium_innocuum Clostridium_innocuum
3639258 3640005 | 3072678 3071933 | 748 746 | 88.52 | Clostridium_innocuum Clostridium_innocuum
3639271 3640005 | 1768415 1767683 | 735 733 | 88.57 | Clostridium_innocuum Clostridium_innocuum
3639271 3640005 | 37732 37000 | 735 733 | 89.52 | Clostridium_innocuum Clostridium_innocuum
3640820 3641083 | 2446605 2446342 | 264 264 | 96.59 | Clostridium_innocuum Clostridium_innocuum
3756417 3757639 | 3761967 3760745 | 1223 1223 | 84.38 | Clostridium_innocuum Clostridium_innocuum
3760745 3761967 | 3757639 3756417 | 1223 1223 | 84.38 | Clostridium_innocuum Clostridium_innocuum
3855196 3855819 | 4430334 4429731 | 624 604 | 92.47 | Clostridium_innocuum Clostridium_innocuum
3856032 3856747 | 4429535 4428820 | 716 716 | 97.35 | Clostridium_innocuum Clostridium_innocuum
3940537 3942288 | 1024096 1022345 | 1752 1752 | 99.83 | Clostridium_innocuum Clostridium_innocuum
3940536 3942288 | 1750029 1748277 | 1753 1753 | 99.89 | Clostridium_innocuum Clostridium_innocuum
3940535 3942290 | 1746526 1744770 | 1756 1757 | 99.54 | Clostridium_innocuum Clostridium_innocuum
3952046 3954044 | 4165203 4163206 | 1999 1998 | 99.60 | Clostridium_innocuum Clostridium_innocuum
3952068 3954043 | 3937706 3935733 | 1976 1974 | 99.80 | Clostridium_innocuum Clostridium_innocuum
3990989 3991071 | 4292168 4292086 | 83 83 | 100.00 | Clostridium_innocuum Clostridium_innocuum
4098545 4103577 | 2412618 2407585 | 5033 5034 | 99.62 | Clostridium_innocuum Clostridium_innocuum
4098519 4103578 | 2639764 2634703 | 5060 5062 | 99.41 | Clostridium_innocuum Clostridium_innocuum
4119005 4120762 | 4204373 4202615 | 1758 1759 | 99.77 | Clostridium_innocuum Clostridium_innocuum
4119004 4120773 | 2510854 2509085 | 1770 1770 | 99.60 | Clostridium_innocuum Clostridium_innocuum
4119010 4120763 | 1750029 1748276 | 1754 1754 | 99.83 | Clostridium_innocuum Clostridium_innocuum
4119010 4120773 | 2297379 2295618 | 1764 1762 | 99.72 | Clostridium_innocuum Clostridium_innocuum
4149475 4151450 | 3318246 3316271 | 1976 1976 | 99.90 | Clostridium_innocuum Clostridium_innocuum
4149476 4151445 | 3954040 3952071 | 1970 1970 | 99.95 | Clostridium_innocuum Clostridium_innocuum
4163206 4165203 | 3954044 3952046 | 1998 1999 | 99.60 | Clostridium_innocuum Clostridium_innocuum
4202614 4202779 | 575057 574892 | 166 166 | 100.00 | Clostridium_innocuum Clostridium_innocuum
4202615 4204373 | 4120762 4119005 | 1759 1758 | 99.77 | Clostridium_innocuum Clostridium_innocuum
4211996 4212099 | 1436902 1436799 | 104 104 | 99.04 | Clostridium_innocuum Clostridium_innocuum
4235437 4235599 | 2217983 2217821 | 163 163 | 97.55 | Clostridium_innocuum Clostridium_innocuum
4285812 4286109 | 1464016 1463710 | 298 307 | 87.62 | Clostridium_innocuum Clostridium_innocuum
4290530 4291981 | 1677060 1675609 | 1452 1452 | 99.93 | Clostridium_innocuum Clostridium_innocuum
4290533 4291980 | 1252621 1251154 | 1448 1468 | 97.48 | Clostridium_innocuum Clostridium_innocuum
4292086 4292168 | 3991071 3990989 | 83 83 | 100.00 | Clostridium_innocuum Clostridium_innocuum
4292090 4292168 | 3988823 3988745 | 79 79 | 98.73 | Clostridium_innocuum Clostridium_innocuum
4319766 4320497 | 3023237 3022506 | 732 732 | 100.00 | Clostridium_innocuum Clostridium_innocuum
4319763 4320497 | 1772752 1772018 | 735 735 | 99.86 | Clostridium_innocuum Clostridium_innocuum
4384108 4384925 | 2243353 2242536 | 818 818 | 91.69 | Clostridium_innocuum Clostridium_innocuum
4384105 4384958 | 3276585 3275733 | 854 853 | 93.21 | Clostridium_innocuum Clostridium_innocuum
4384094 4384922 | 3467740 3466912 | 829 829 | 93.97 | Clostridium_innocuum Clostridium_innocuum
4411121 4411320 | 2253683 2253484 | 200 200 | 95.50 | Clostridium_innocuum Clostridium_innocuum
4428820 4429535 | 3856747 3856032 | 716 716 | 97.35 | Clostridium_innocuum Clostridium_innocuum
4429731 4430334 | 3855819 3855196 | 604 624 | 92.47 | Clostridium_innocuum Clostridium_innocuum
3951295 3951472 | 3074534 3074357 | 178 178 | 97.19 | Clostridioforme Clostridium_innocuum
3951295 3951472 | 1770284 1770107 | 178 178 | 94.94 | Clostridioforme Clostridium_innocuum
3951296 3953537 | 2370644 2368404 | 2242 2241 | 91.27 | Clostridioforme Clostridium_innocuum
3958273 3958492 | 3067381 3067163 | 220 219 | 85.52 | Clostridioforme Clostridium_innocuum
3958273 3958492 | 1762678 1762460 | 220 219 | 85.52 | Clostridioforme Clostridium_innocuum
5416940 5421324 | 3597351 3592961 | 4385 4391 | 90.08 | Clostridioforme Clostridium_innocuum
6123473 6123642 | 74263 74094 | 170 170 | 83.04 | Clostridioforme Clostridium_innocuum
1237503 1239438 | 1314865 1312919 | 1936 1947 | 83.10 | Blautia_coccoides Clostridium_innocuum
1410981 1411521 | 1322232 1321692 | 541 541 | 86.53 | Blautia_coccoides Clostridium_innocuum
3306968 3307069 | 720839 720738 | 102 102 | 100.00 | Muribaculum_intestinale Clostridium_innocuum
3406852 3407744 | 69318 68426 | 893 893 | 78.24 | Acutalibacter_muris Clostridium_innocuum
```
:::
:::danger
There are many intra-sample duplications. Is this real, an artifact of the alignment, or due to a bad reference genome?
:::
---
### Generate MUMmer output for all _new_ OMM~12~ reference sequences
- Added a snakemake pipeline for the analysis ([commit](https://github.com/philippmuench/OligoMM-report/commit/f443ef8ccb4be68a2dc6b255b45b30ff8827ccc5))
- Run on the new genomes: `snakemake -s Snakefile_reference --jobs 40`
- Run on the old genomes: `snakemake -s Snakefile_reference_old --jobs 40`
### Generate duplication statistics
```bash
cd /home/aime/projects/oligomm-claudia/databases/omm_new/duplication_coords
cat *.tsv > merged.tsv
# manually remove all lines that have _identity_ in the last column
# -> rds_all_genomes.tsv
```
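
The manual clean-up step could be scripted. The following is a minimal sketch, assuming the lines removed by hand are per-file header rows whose last tab-separated field is the literal word `identity`. The function name `merge_tsv` and this header convention are assumptions and are not taken from the pipeline.

```python
def merge_tsv(paths, out_path, header_marker="identity"):
    """Concatenate TSV files, dropping rows whose last column equals header_marker.

    Assumption: the rows removed by hand are repeated header lines.
    Returns the number of rows written.
    """
    written = 0
    with open(out_path, "w") as out:
        for path in sorted(paths):
            with open(path) as handle:
                for line in handle:
                    fields = line.rstrip("\n").split("\t")
                    if not line.strip() or fields[-1].strip() == header_marker:
                        continue  # skip blank lines and header rows
                    # Make sure every written row ends with a newline.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written


# Example call (file names are illustrative):
# merge_tsv(["genome_a.tsv", "genome_b.tsv"], "rds_all_genomes.tsv")
```

Passing an explicit list of input files, rather than `*.tsv`, avoids merging an earlier `merged.tsv` into itself when the step is rerun.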
- 2903 duplications in the new reference set (2904 in the old)
- of which 2618 (90%) are within the same genome
- 285 are between species
- have same size (mean = 1123.511; sd = 1187.204)

Counts of same-genome similarities
```
Clostridioforme 738
B_caecimuris 437
Clostridium_innocuum 263
Muribaculum_intestinale 255
Acutalibacter_muris 244
T_muris 166
Lactobacillus_reuteri_I49_1 142
F_plautii_1 113
Enterococcus_faecalis 111
Blautia_coccoides 66
Akkermansia_muciniphila 61
Bifidobacterium_animalis_YL2_1 15
Bifidobacterium_animalis_YL2_2 7
```
Counts of between-genome similarities
```
Clostridioforme 83
F_plautii_1 62
Acutalibacter_muris 37
Blautia_coccoides 35
Muribaculum_intestinale 30
B_caecimuris 20
Clostridium_innocuum 10
Bifidobacterium_animalis_YL2_1 3
Bifidobacterium_animalis_YL2_2 3
Enterococcus_faecalis 1
Lactobacillus_reuteri_I49_1 1
```
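
Counts like the two lists above can be derived directly from the coords rows. This is a hedged sketch, not the script that was actually used: it assumes rows in the pipe-separated layout shown in the MUMmer output earlier (`S1 E1 | S2 E2 | LEN1 LEN2 | IDY | name1 name2`), counts each hit under the first of the two names, and uses `LEN1` as the duplication length. The names `parse_coords_line` and `summarise` are made up for illustration.

```python
from collections import Counter
from statistics import mean


def parse_coords_line(line):
    """Return (first_name, second_name, length) or None for non-data lines."""
    fields = [part.split() for part in line.split("|")]
    if len(fields) != 5 or len(fields[4]) != 2:
        return None  # header, separator, or malformed row
    try:
        length = int(fields[2][0])
    except (ValueError, IndexError):
        return None
    first_name, second_name = fields[4]
    return first_name, second_name, length


def summarise(lines):
    """Split hits into same-genome and between-genome counts, keyed by first name."""
    same, between, lengths = Counter(), Counter(), []
    for line in lines:
        row = parse_coords_line(line)
        if row is None:
            continue
        first_name, second_name, length = row
        target = same if first_name == second_name else between
        target[first_name] += 1
        lengths.append(length)
    return same, between, lengths


# Three rows copied from the coords output above.
example = [
    "4319766 4320497 | 3023237 3022506 | 732 732 | 100.00 | Clostridium_innocuum Clostridium_innocuum",
    "4319763 4320497 | 1772752 1772018 | 735 735 | 99.86 | Clostridium_innocuum Clostridium_innocuum",
    "3306968 3307069 | 720839 720738 | 102 102 | 100.00 | Muribaculum_intestinale Clostridium_innocuum",
]
same, between, lengths = summarise(example)
print(same.most_common(), between.most_common(), mean(lengths))
```

Note that a self-alignment reports each repeat pair in both directions, so the counts reflect rows in the file rather than unique repeat pairs.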
Duplicated nucleotides
- between: 320,672
- same genome: 2,941,351

Genome sizes new
```
Akkermansia_muciniphila 2737357
Acutalibacter_muris 3802913
Bifidobacterium_animalis_YL2_1 1133782
Bifidobacterium_animalis_YL2_2 888144
B_caecimuris 4800606
Blautia_coccoides 5128582
Clostridium_innocuum 4469084
Clostridioforme 7157610
Enterococcus_faecalis 3025655
F_plautii_1 3743035
F_plautii_2 70670
Lactobacillus_reuteri_I49_1 2045096
Lactobacillus_reuteri_I49_2 14338
Lactobacillus_reuteri_I49_3 4170
Muribaculum_intestinale 3307069
T_muris 2887949
```
- Total genome size, old reference set: 45,215,430
- Total genome size, new reference set: 45,216,060
:::success
- 7.2 % of the total reference is duplicated (similar for the new and old reference sets)
:::
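
The duplicated fraction can be checked from the numbers above. A small sketch, assuming the between-genome and same-genome nucleotide counts do not overlap and can simply be added (the function name `duplicated_percent` is illustrative):

```python
def duplicated_percent(between_bp, same_genome_bp, total_bp):
    """Percentage of the reference covered by duplications (no overlap correction)."""
    return 100 * (between_bp + same_genome_bp) / total_bp


# Values for the new reference set, taken from the notes above.
print(f"{duplicated_percent(320_672, 2_941_351, 45_216_060):.1f} %")  # prints "7.2 %"
```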
Variants located in duplicated regions, per sequence
```
genome var_in_dup var_total %
CP021420.1 0 11 0
CP021421.1 13 56 23
CP022713.1 10 24 42
NHMU01000001.1 32 34 94
NHMU01000002.1 1 3 33
NHMU01000003.1 11 18 61
NHMU01000004.1 56 66 85
NHMU01000005.1 5 10 50
NHMU01000008.1 14 14 100
NHMU01000009.1 12 45 27
NHMU01000012.1 0 3 0
NHMU01000018.1 7 7 100
NHTR01000001.1 18 68 26
NHTR01000002.1 27 36 75
NHTR01000003.1 0 5 0
NHTR01000004.1 21 30 70
NHTR01000006.1 29 29 100
NHTR01000007.1 6 6 100
NHTR01000008.1 7 8 88
NHTR01000009.1 26 28 93
NHTR01000011.1 10 45 22
CP022722.1 9 28 32
NHMP01000002.1 0 7 0
NHMP01000003.1 0 2 0
NHMP01000004.1 2 9 22
NHMQ01000001.1 14 38 37
NHMQ01000002.1 1 15 7
NHMQ01000003.1 0 2 0
NHMU01000013.1 0 5 0
NHMP01000005.1 0 18 0
NHMP01000007.1 0 10 0
NHMR02000001.1 0 2 0
NHMR02000002.1 0 4 0
NHMT01000001.1 9 19 47
NHMP01000012.1 11 11 100
NHTR01000010.1 0 2 0
CP021422.1 0 1 0
CP022712.1 3 3 100
NHMP01000009.1 0 5 0
```
Variants in duplicated regions: 48.69326% (354 of 727, pooled over all sequences)
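
The overall percentage is the pooled ratio, i.e. the sum of `var_in_dup` divided by the sum of `var_total`, not the mean of the per-row percentages. Summing all 39 rows of the table gives 354 / 727, which reproduces the 48.69 % figure. A minimal sketch of that calculation, assuming the whitespace-separated layout of the table above (`pooled_percent` is a made-up helper name):

```python
def pooled_percent(table_text):
    """Pooled percentage of variants in duplicated regions from a text table.

    Expects rows of the form: genome var_in_dup var_total [percent].
    Header and blank lines are skipped.
    """
    in_dup = total = 0
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header or blank line
        in_dup += int(fields[1])
        total += int(fields[2])
    if total == 0:
        raise ValueError("no variant counts found")
    return 100 * in_dup / total


# First rows of the table above, for illustration only.
subset = """genome var_in_dup var_total %
CP021421.1 13 56 23
CP022713.1 10 24 42
NHMU01000001.1 32 34 94
"""
print(f"{pooled_percent(subset):.2f} %")
```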