---
title: 'ROH'
disqus: hackmd
---
Interpreting ROH
===
NOTE TO WARREN:
I re-ran this analysis with a more informed parameterization, and the results told a completely different story: the Newfoundland (island population) samples have a higher count of ROH when shorter ROH (<1000 bp) are included. The results below suggested that at least one individual (LIC46) is highly inbred, with a high proportion of the genome (FROH ≈ 0.098) in ROH. So I think the parameterization of this analysis needs some additional thought. Please see the updated results here:
and read the general description below. Thank you!
#### A brief overview of ROH and what we're doing here:
{add more here but basically...} Shorter ROH reflect more ancient inbreeding, while longer ROH reflect more recent inbreeding [24].
I've used the detectRUNS (v0.9.6) package in R to detect runs of homozygosity and summarize the results.
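
As a rough illustration of the length/age relationship above: under a simple model, the expected length of a segment inherited identical-by-descent from a common ancestor g generations back is about 100/(2g) cM. The sketch below converts that to Mb assuming a uniform recombination rate of 1 cM/Mb, which is a placeholder and not an estimate for lynx. The function names are hypothetical and not part of detectRUNS.

```python
def expected_roh_length_mb(generations, cm_per_mb=1.0):
    """Expected ROH length (Mb) for a common ancestor `generations` back.

    Uses the approximation E[length] = 100 / (2 * g) cM and an assumed
    uniform recombination rate (cm_per_mb). Both are simplifications.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    length_cm = 100.0 / (2.0 * generations)
    return length_cm / cm_per_mb


def generations_for_length(length_mb, cm_per_mb=1.0):
    """Invert the approximation: generations back implied by an ROH length."""
    if length_mb <= 0:
        raise ValueError("length_mb must be positive")
    return 100.0 / (2.0 * length_mb * cm_per_mb)


# Longer ROH point to more recent common ancestors.
for g in (5, 50, 500):
    print(g, "generations ->", expected_roh_length_mb(g), "Mb")
```

With the placeholder rate, a 1 Mb ROH would correspond to roughly 50 generations; the real figure depends on the recombination map, so this is only for intuition.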
#### A full description of the analyses can be found here:
## A couple of notes:
1. I think some changes will be made to the parameterization of the sliding-window ROH detection (particularly minLengthBps and maxMissing).
This paper was a great resource for informed parameter-setting: http://europepmc.org/article/PMC/5787230. It's important to parameterize carefully because, according to simulations, a too-small minimum window size (e.g. 5 SNPs) will overestimate inbreeding rates.
3. We use PLINK ped/map inputs here, but I suspect we have an unplaced scaffold in the mix (`--allow-extra-chr` in PLINK). We may need to rerun PLINK and try new input files without SNPs on the unplaced scaffold, so that it will play nicely with R.
Request an interactive session to run this command:
`bsub -q interactive -R rusage[mem=24000] -n 1 -W 30 -Is bash`
Use plink, not vcftools:
`plink --recode --vcf allsamples.vcf.gz --allow-extra-chr`
5. I've used the 6SV.vcf, which is a filtered VCF that has *not* been further reduced (e.g. by LD-pruning). I've not determined whether SNPs need to be in linkage equilibrium to analyze ROH. It seems like LD-pruning would inherently reduce our ability to accurately detect ROH, but I don't know and need to look into this further.
6. We need to evaluate three ROH length classes because they tell us different things about historic Ne. However, what constitutes a short/medium/long ROH is hard to pinpoint. I've evaluated the literature and found some consensus around these size classes:
**short** (less than 100 kb) and
**medium** (0.1 to 3 Mb) ROH regions have *more* deleterious variants than **long** (> 3 Mb) ROH regions.
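
For the unplaced-scaffold concern in the notes above, here is a minimal sketch of how the scaffold names could be pulled out of a PLINK .map file (column 1 is the chromosome name). It assumes unplaced scaffolds can be recognised by an `NW_` accession prefix, as opposed to `NC_` for assembled chromosomes; the function names and toy lines are hypothetical.

```python
def chroms_in_map(map_lines):
    """Return the unique chromosome names (column 1) of PLINK .map lines, in order."""
    seen = []
    for line in map_lines:
        fields = line.split()
        if fields and fields[0] not in seen:
            seen.append(fields[0])
    return seen


def split_scaffolds(chrom_names, unplaced_prefix="NW_"):
    """Partition chromosome names into (placed, unplaced) lists.

    Assumes unplaced scaffolds share a common accession prefix.
    """
    placed, unplaced = [], []
    for name in chrom_names:
        if name.startswith(unplaced_prefix):
            unplaced.append(name)
        else:
            placed.append(name)
    return placed, unplaced


# Toy .map lines: chromosome, SNP id, genetic distance, bp position.
toy_map = [
    "NC_044303.1 snp1 0 100",
    "NC_044303.1 snp2 0 200",
    "NW_022059698.1 snp3 0 50",
]
placed, unplaced = split_scaffolds(chroms_in_map(toy_map))
print("keep:", placed)
print("drop:", unplaced)
```

The resulting list of names to drop could then be handed to a PLINK chromosome-exclusion option such as `--not-chr` when regenerating the ped/map files; that usage is worth checking against the PLINK documentation before relying on it.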
# Results
These are the results from a first run of analyses; I've since adjusted the minimum number of SNPs so that shorter (<250,000 bp) ROH are detected.
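
To show why the minimum-SNP setting matters, here is a toy consecutive-runs detector. It is a simplified sketch, not the detectRUNS algorithm: it tolerates no heterozygous or missing calls inside a run, and the function name, genotype vector and positions are all made up for illustration.

```python
def find_homozygous_runs(is_het, positions, min_snps, min_length_bp=0):
    """Return (start_bp, end_bp, n_snps) for each run of consecutive homozygous SNPs.

    is_het: sequence of booleans, True where the genotype is heterozygous.
    positions: SNP positions in bp, same order and length as is_het.
    Runs shorter than min_snps SNPs or min_length_bp base pairs are dropped.
    """
    if len(is_het) != len(positions):
        raise ValueError("is_het and positions must have the same length")
    runs = []
    start = None
    for i, het in enumerate(is_het):
        if not het and start is None:
            start = i                      # a new homozygous stretch begins
        if het and start is not None:
            runs.append((start, i - 1))    # a heterozygous call ends the stretch
            start = None
    if start is not None:
        runs.append((start, len(is_het) - 1))
    kept = []
    for first, last in runs:
        n_snps = last - first + 1
        length_bp = positions[last] - positions[first]
        if n_snps >= min_snps and length_bp >= min_length_bp:
            kept.append((positions[first], positions[last], n_snps))
    return kept


# Toy individual: 13 SNPs spaced 1 kb apart, heterozygous at indices 3 and 10.
toy_het = [False, False, False, True, False, False, False,
           False, False, False, True, False, False]
toy_pos = [i * 1000 for i in range(13)]
print(len(find_homozygous_runs(toy_het, toy_pos, min_snps=2)))  # permissive setting
print(len(find_homozygous_runs(toy_het, toy_pos, min_snps=5)))  # stricter setting
```

On this toy input the permissive setting keeps three runs and the stricter one keeps a single run, which is the same direction of effect as lowering the minimum SNP count in the real analysis.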
1. Min/Max/Mean
Note that none of the ROH are > 3 Mb; they are all in the short to medium range.
Maximum ROH length (Mb): 1.26534
Minimum ROH length (Mb): 0.250011
Mean ROH length (Mb) by group:
| NFLD|northSLR|southSLR|
| -------- | -------- | -------- |
| 0.3117209| 0.3069324|0.3396391 |
2. ROH count by group
Group names: Newfoundland, North of the St Lawrence River, South of the St Lawrence River. Note that population assignment was based on results from several clustering analyses (PCA, sNMF).
| NFLD|northSLR|southSLR|
| -------- | -------- | -------- |
| 665 | 1396 | 6695 |
3. Distribution of ROH by length and individual

Fig. 1: The number of ROH by individual (colored by group), distributed by mean length of ROH in Mb (x-axis). Group names are abbreviated for "Newfoundland", "North-" and "South of the St Lawrence River". The outlier to the top right is LIC46, a lynx from Maine, south of the St Lawrence River.

Fig. 2: Same as above, but distributed along the sum of ROH in Mb (x-axis). Note that NFLD and North are somewhat tightly distributed, but Maine has a much broader distribution of ROH and includes outlier LIC46 (top right, blue).
5. Estimating genome-wide FROH
FROH is a measure of inbreeding that ranges from 0 to 1 (like the pedigree inbreeding coefficient FP). It estimates the fraction of the genome in ROH, i.e. in regions where the two chromosome copies are identical by descent and coalesce in a “recent” ancestor.
See: https://www.pnas.org/content/pnas/early/2018/02/16/1714475115.full.pdf
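
As a minimal sketch of the calculation, FROH for one individual is the summed length of its ROH divided by the genome length used as the denominator. The function name and the numbers in the example are made up for illustration; they are not the lynx values, and the appropriate genome length is an assumption that depends on how the analysis defines it.

```python
def froh(roh_lengths_bp, genome_length_bp):
    """Fraction of the genome in ROH: sum of ROH lengths / genome length."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    total_roh_bp = sum(roh_lengths_bp)
    if total_roh_bp > genome_length_bp:
        raise ValueError("total ROH length exceeds genome length")
    return total_roh_bp / genome_length_bp


# Hypothetical individual: three ROH on a 100 Mb toy genome.
toy_roh = [250_000, 500_000, 1_250_000]
print(froh(toy_roh, 100_000_000))
```

In the table below, the `sum` column plays the role of the numerator and `genome_wide_FROH` is the resulting fraction.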
id group sum genome_wide_FROH
1 a109 southSLR 109076291 0.045283027
2 a182 southSLR 88868071 0.036893584
3 a202 northSLR 19946211 0.00828067
4 a33 northSLR 16021104 0.006651162
5 a475 NFLD 69859782 0.029002292
6 a494 NFLD 61099018 0.02536526
7 a507 southSLR 145277232 0.060311849
8 a697 southSLR 1627327 0.000675585
9 a772 northSLR 17801140 0.007390144
10 a794 southSLR 42909881 0.017814039
11 a803 northSLR 16911461 0.007020794
12 a818 northSLR 38136027 0.015832173
13 a857 southSLR 51597748 0.021420807
14 b114 northSLR 27589392 0.011453737
15 b124 northSLR 34457599 0.014305074
16 b13 southSLR 15865049 0.006586376
17 b188 northSLR 47453535 0.019700337
18 b23 southSLR 98115926 0.040732831
19 b276 NFLD 76335584 0.031690721
20 b554 northSLR 37901829 0.015734946
21 b90 southSLR 2303022 0.0009561
22 c165 northSLR 16788137 0.006969596
23 c323 northSLR 19320646 0.008020967
24 c548 southSLR 97195636 0.040350773
25 f264 northSLR 14402970 0.005979394
26 f457 northSLR 16097324 0.006682805
27 fha_024 northSLR 25417751 0.01055218
28 fha_042 northSLR 52954667 0.021984132
29 fha_043 northSLR 13356791 0.005545072
30 l09_003 southSLR 33181222 0.013775186
31 l09_007 northSLR 13921106 0.005779348
32 l09_015 southSLR 83475846 0.034655001
33 L155 southSLR 6660924 0.002765283
34 LIC11 southSLR 16738026 0.006948792
35 LIC20 southSLR 22522262 0.009350118
36 LIC23 southSLR 16031019 0.006655278
37 LIC24 southSLR 43162205 0.017918791
38 LIC27B southSLR 19214958 0.007977091
39 LIC28 southSLR 18186905 0.007550294
40 LIC31 southSLR 51106214 0.021216747
41 LIC32 southSLR 149313408 0.061987468
42 LIC36 southSLR 67959915 0.028213562
43 LIC46 southSLR 236214996 0.098064666
44 LIC47 southSLR 116967778 0.048559178
45 LIC48 southSLR 123256833 0.05117008
46 LIC54 southSLR 70507565 0.029271219
47 LIC57 southSLR 90552558 0.037592898
48 LIC60 southSLR 84435937 0.035053583
49 LIC8 southSLR 32280875 0.013401407
50 LIC9 southSLR 65922805 0.027367855
51 LIT2 southSLR 26886377 0.011161881
52 LIT5 southSLR 56071155 0.023277942
53 LRK10 southSLR 66665460 0.027676169
54 LRK11 southSLR 15109721 0.006272801
55 LRK12 southSLR 6523076 0.002708055
56 LRK13 southSLR 19836946 0.008235309
57 LRK17 southSLR 54441704 0.022601476
58 LRK22 southSLR 27821056 0.011549913
Maximum FROH: 0.098064666
^This is LIC46 (a lynx from Maine, south of the St Lawrence River).
I wanted to break the results up into three categories (small: 10-100 kb, medium: 100 kb-1 Mb, long: > 1 Mb), so I binned the results. Here they are:
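
The binning described above can be sketched as follows. The thresholds come from the three categories just listed; how the exact boundaries are treated (100 kb counted as medium, exactly 1 Mb counted as medium) is an assumption, and the function names are hypothetical.

```python
from collections import Counter

SMALL_MIN_BP = 10_000       # lower bound of the small class
MEDIUM_MIN_BP = 100_000     # small/medium boundary
LONG_MIN_BP = 1_000_000     # medium/long boundary


def roh_class(length_bp):
    """Assign an ROH length (bp) to 'small', 'medium' or 'long'.

    Returns None for ROH shorter than the small-class lower bound.
    """
    if length_bp < SMALL_MIN_BP:
        return None
    if length_bp < MEDIUM_MIN_BP:
        return "small"
    if length_bp <= LONG_MIN_BP:
        return "medium"
    return "long"


def count_by_class(lengths_bp):
    """Count ROH per class, ignoring ROH below the small-class lower bound."""
    classes = (roh_class(length) for length in lengths_bp)
    return Counter(c for c in classes if c is not None)


# 250011 bp and 1265340 bp are the minimum and maximum ROH lengths
# reported above; the other two lengths are invented.
print(count_by_class([50_000, 250_011, 300_000, 1_265_340]))
```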
# Medium
$summary_ROH_count_chr
NFLD northSLR southSLR
NC_044303.1 966 3406 7915
NC_044304.1 589 2270 5050
NC_044305.1 545 1743 4821
NC_044306.1 911 3035 6662
NC_044307.1 610 2392 5333
NC_044308.1 577 2013 5240
NC_044309.1 599 2171 5199
NC_044310.1 1099 3230 6742
NC_044311.1 791 2286 5770
NC_044312.1 457 1577 3614
NC_044313.1 311 1168 2678
NC_044314.1 378 1104 2561
NC_044315.1 355 1184 3020
NC_044316.1 132 540 1081
NC_044317.1 193 597 1647
NC_044318.1 112 320 889
NC_044319.1 233 710 2022
NC_044320.1 357 1049 2402
NC_044321.1 569 2154 3942
NW_022059694.1 NA 4 11
NW_022059695.1 NA 2 17
NW_022059696.1 NA 2 14
NW_022059697.1 NA NA 2
NW_022059698.1 6 16 26
NW_022059699.1 NA NA 3
$summary_ROH_percentage_chr
NFLD northSLR southSLR chrom
NC_044303.1 0.0986721144 0.10329663664 0.10324676172 NC_044303.1
NC_044304.1 0.0601634321 0.06884420587 0.06587443420 NC_044304.1
NC_044305.1 0.0556690501 0.05286143208 0.06288725688 NC_044305.1
NC_044306.1 0.0930541369 0.09204500652 0.08690207537 NC_044306.1
NC_044307.1 0.0623084780 0.07254420283 0.06956601140 NC_044307.1
NC_044308.1 0.0589376915 0.06104994996 0.06835287826 NC_044308.1
NC_044309.1 0.0611848825 0.06584174931 0.06781805612 NC_044309.1
NC_044310.1 0.1122574055 0.09795893610 0.08794563076 NC_044310.1
NC_044311.1 0.0807967314 0.06932945137 0.07526643274 NC_044311.1
NC_044312.1 0.0466802860 0.04782700998 0.04714261489 NC_044312.1
NC_044313.1 0.0317671093 0.03542292178 0.03493301679 NC_044313.1
NC_044314.1 0.0386108274 0.03348193977 0.03340681703 NC_044314.1
NC_044315.1 0.0362614913 0.03590816729 0.03939421609 NC_044315.1
NC_044316.1 0.0134831461 0.01637703576 0.01410104225 NC_044316.1
NC_044317.1 0.0197139939 0.01810572286 0.02148419666 NC_044317.1
NC_044318.1 0.0114402451 0.00970491008 0.01159650931 NC_044318.1
NC_044319.1 0.0237997957 0.02153276924 0.02637586256 NC_044319.1
NC_044320.1 0.0364657814 0.03181390835 0.03133275068 NC_044320.1
NC_044321.1 0.0581205312 0.06532617596 0.05142119200 NC_044321.1
NW_022059694.1 NA 0.00012131138 0.00014348887 NW_022059694.1
NW_022059695.1 NA 0.00006065569 0.00022175552 NW_022059695.1
NW_022059696.1 NA 0.00006065569 0.00018262219 NW_022059696.1
NW_022059697.1 NA NA 0.00002608888 NW_022059697.1
NW_022059698.1 0.0006128703 0.00048524550 0.00033915550 NW_022059698.1
NW_022059699.1 NA NA 0.00003913333 NW_022059699.1
$summary_ROH_count
NFLD northSLR southSLR
0-1 9790 32973 76661
$summary_ROH_percentage
NFLD northSLR southSLR CLASS
0-1 1 1 1 0-1
$summary_ROH_mean_chr
group chrom sum
1 NFLD NC_044303.1 0.1535463
2 NFLD NC_044304.1 0.1539820
3 NFLD NC_044305.1 0.1507903
4 NFLD NC_044306.1 0.1525205
5 NFLD NC_044307.1 0.1551508
6 NFLD NC_044308.1 0.1493451
7 NFLD NC_044309.1 0.1496168
8 NFLD NC_044310.1 0.1591317
9 NFLD NC_044311.1 0.1552105
10 NFLD NC_044312.1 0.1502754
11 NFLD NC_044313.1 0.1511492
12 NFLD NC_044314.1 0.1521889
13 NFLD NC_044315.1 0.1513121
14 NFLD NC_044316.1 0.1369083
15 NFLD NC_044317.1 0.1512648
16 NFLD NC_044318.1 0.1522134
17 NFLD NC_044319.1 0.1504932
18 NFLD NC_044320.1 0.1461599
19 NFLD NC_044321.1 0.1697721
20 NFLD NW_022059698.1 0.1352262
21 northSLR NC_044303.1 0.1414698
22 northSLR NC_044304.1 0.1425300
23 northSLR NC_044305.1 0.1412502
24 northSLR NC_044306.1 0.1443095
25 northSLR NC_044307.1 0.1465497
26 northSLR NC_044308.1 0.1432942
27 northSLR NC_044309.1 0.1443501
28 northSLR NC_044310.1 0.1437766
29 northSLR NC_044311.1 0.1436200
30 northSLR NC_044312.1 0.1421168
31 northSLR NC_044313.1 0.1444451
32 northSLR NC_044314.1 0.1434877
33 northSLR NC_044315.1 0.1456807
34 northSLR NC_044316.1 0.1423822
35 northSLR NC_044317.1 0.1447175
36 northSLR NC_044318.1 0.1343179
37 northSLR NC_044319.1 0.1409540
38 northSLR NC_044320.1 0.1408060
39 northSLR NC_044321.1 0.1559180
40 northSLR NW_022059694.1 0.2428732
41 northSLR NW_022059695.1 0.1550300
42 northSLR NW_022059696.1 0.1401060
43 northSLR NW_022059698.1 0.1404041
44 southSLR NC_044303.1 0.1578864
45 southSLR NC_044304.1 0.1564522
46 southSLR NC_044305.1 0.1626767
47 southSLR NC_044306.1 0.1580781
48 southSLR NC_044307.1 0.1598886
49 southSLR NC_044308.1 0.1602480
50 southSLR NC_044309.1 0.1654044
51 southSLR NC_044310.1 0.1549127
52 southSLR NC_044311.1 0.1605186
53 southSLR NC_044312.1 0.1549133
54 southSLR NC_044313.1 0.1577230
55 southSLR NC_044314.1 0.1534337
56 southSLR NC_044315.1 0.1604805
57 southSLR NC_044316.1 0.1475240
58 southSLR NC_044317.1 0.1540291
59 southSLR NC_044318.1 0.1508201
60 southSLR NC_044319.1 0.1616367
61 southSLR NC_044320.1 0.1510919
62 southSLR NC_044321.1 0.1568657
63 southSLR NW_022059694.1 0.1683893
64 southSLR NW_022059695.1 0.1359663
65 southSLR NW_022059696.1 0.1328786
66 southSLR NW_022059697.1 0.1132430
67 southSLR NW_022059698.1 0.1304428
68 southSLR NW_022059699.1 0.1232833
$summary_ROH_mean_class
CLASS NFLD northSLR southSLR
1 0-1 0.15364 0.1441207 0.1581674
$result_Froh_genome_wide
id group sum Froh_genome
1 a109 southSLR 500214126 0.20766392
2 a182 southSLR 444333451 0.18446505
3 a202 northSLR 253977611 0.10543882
4 a33 northSLR 208768163 0.08667011
5 a475 NFLD 494094599 0.20512339
6 a494 NFLD 481967470 0.20008882
7 a507 southSLR 507341991 0.21062305
8 a697 southSLR 90644194 0.03763094
9 a772 northSLR 240653104 0.09990715
10 a794 southSLR 369240707 0.15329030
11 a803 northSLR 247714149 0.10283854
12 a818 northSLR 300235387 0.12464273
13 a857 southSLR 336983719 0.13989881
14 b114 northSLR 274217810 0.11384154
15 b124 northSLR 297928339 0.12368496
16 b13 northSLR 231167241 0.09596909
17 b188 northSLR 352222884 0.14622535
18 b23 southSLR 464663171 0.19290494
19 b276 NFLD 528073995 0.21922994
20 b554 northSLR 327500262 0.13596175
21 b90 southSLR 65837734 0.02733254
22 c165 northSLR 250877808 0.10415193
23 c323 northSLR 250642243 0.10405414
24 c548 southSLR 426806006 0.17718853
25 f264 northSLR 196504680 0.08157893
26 f457 northSLR 215528370 0.08947661
27 fha_024 northSLR 275475563 0.11436369
28 fha_042 northSLR 381146748 0.15823309
29 fha_043 northSLR 210809513 0.08751758
30 l09_003 southSLR 286369948 0.11888650
31 l09_007 northSLR 236723032 0.09827558
32 l09_015 southSLR 415054760 0.17231000
33 L155 southSLR 143948574 0.05976026
34 LIC11 southSLR 193906446 0.08050027
35 LIC20 southSLR 231921805 0.09628235
36 LIC23 southSLR 212912245 0.08839053
37 LIC24 southSLR 312754269 0.12983995
38 LIC27B southSLR 227583189 0.09448117
39 LIC28 southSLR 230852367 0.09583837
40 LIC31 southSLR 342720076 0.14228025
41 LIC32 southSLR 488152642 0.20265659
42 LIC36 southSLR 367325064 0.15249502
43 LIC46 southSLR 567174790 0.23546264
44 LIC47 southSLR 439697230 0.18254032
45 LIC48 southSLR 487932958 0.20256539
46 LIC54 southSLR 376154944 0.15616074
47 LIC57 southSLR 409829475 0.17014072
48 LIC60 southSLR 380937637 0.15814628
49 LIC8 southSLR 281120952 0.11670738
50 LIC9 southSLR 336736235 0.13979606
51 LIT2 southSLR 253068681 0.10506147
52 LIT5 southSLR 358666815 0.14890054
53 LRK10 southSLR 413566148 0.17169200
54 LRK11 southSLR 196388639 0.08153075
55 LRK12 southSLR 144269102 0.05989332
56 LRK13 southSLR 228110291 0.09470000
57 LRK17 southSLR 360399688 0.14961995
58 LRK22 southSLR 231652294 0.09617046
$result_Froh_class
id group Sum_Class_0 Froh_Class_0
1 a109 southSLR 500214126 0.20766392
2 a182 southSLR 444333451 0.18446505
3 a202 northSLR 253977611 0.10543882
4 a33 northSLR 208768163 0.08667011
5 a475 NFLD 494094599 0.20512339
6 a494 NFLD 481967470 0.20008882
7 a507 southSLR 507341991 0.21062305
8 a697 southSLR 90644194 0.03763094
9 a772 northSLR 240653104 0.09990715
10 a794 southSLR 369240707 0.15329030
11 a803 northSLR 247714149 0.10283854
12 a818 northSLR 300235387 0.12464273
13 a857 southSLR 336983719 0.13989881
14 b114 northSLR 274217810 0.11384154
15 b124 northSLR 297928339 0.12368496
16 b13 northSLR 231167241 0.09596909
17 b188 northSLR 352222884 0.14622535
18 b23 southSLR 464663171 0.19290494
19 b276 NFLD 528073995 0.21922994
20 b554 northSLR 327500262 0.13596175
21 b90 southSLR 65837734 0.02733254
22 c165 northSLR 250877808 0.10415193
23 c323 northSLR 250642243 0.10405414
24 c548 southSLR 426806006 0.17718853
25 f264 northSLR 196504680 0.08157893
26 f457 northSLR 215528370 0.08947661
27 fha_024 northSLR 275475563 0.11436369
28 fha_042 northSLR 381146748 0.15823309
29 fha_043 northSLR 210809513 0.08751758
30 l09_003 southSLR 286369948 0.11888650
31 l09_007 northSLR 236723032 0.09827558
32 l09_015 southSLR 415054760 0.17231000
33 L155 southSLR 143948574 0.05976026
34 LIC11 southSLR 193906446 0.08050027
35 LIC20 southSLR 231921805 0.09628235
36 LIC23 southSLR 212912245 0.08839053
37 LIC24 southSLR 312754269 0.12983995
38 LIC27B southSLR 227583189 0.09448117
39 LIC28 southSLR 230852367 0.09583837
40 LIC31 southSLR 342720076 0.14228025
41 LIC32 southSLR 488152642 0.20265659
42 LIC36 southSLR 367325064 0.15249502
43 LIC46 southSLR 567174790 0.23546264
44 LIC47 southSLR 439697230 0.18254032
45 LIC48 southSLR 487932958 0.20256539
46 LIC54 southSLR 376154944 0.15616074
47 LIC57 southSLR 409829475 0.17014072
48 LIC60 southSLR 380937637 0.15814628
49 LIC8 southSLR 281120952 0.11670738
50 LIC9 southSLR 336736235 0.13979606
51 LIT2 southSLR 253068681 0.10506147
52 LIT5 southSLR 358666815 0.14890054
53 LRK10 southSLR 413566148 0.17169200
54 LRK11 southSLR 196388639 0.08153075
55 LRK12 southSLR 144269102 0.05989332
56 LRK13 southSLR 228110291 0.09470000
57 LRK17 southSLR 360399688 0.14961995
58 LRK22 southSLR 231652294 0.09617046
# Long
`> long_result_summary`
$summary_ROH_count_chr
southSLR
NC_044303.1 1
NC_044308.1 1
NC_044309.1 2
NC_044310.1 2
NC_044311.1 1
NC_044321.1 2
$summary_ROH_percentage_chr
southSLR chrom
NC_044303.1 0.1111111 NC_044303.1
NC_044308.1 0.1111111 NC_044308.1
NC_044309.1 0.2222222 NC_044309.1
NC_044310.1 0.2222222 NC_044310.1
NC_044311.1 0.1111111 NC_044311.1
NC_044321.1 0.2222222 NC_044321.1
$summary_ROH_count
southSLR
1-2 9
$summary_ROH_percentage
southSLR CLASS
1-2 1 1-2
$summary_ROH_mean_chr
group chrom sum
1 southSLR NC_044303.1 1.142770
2 southSLR NC_044308.1 1.265340
3 southSLR NC_044309.1 1.084336
4 southSLR NC_044310.1 1.091748
5 southSLR NC_044311.1 1.053286
6 southSLR NC_044321.1 1.208071
$summary_ROH_mean_class
CLASS southSLR
1 0-1 1.136634
$result_Froh_genome_wide
id group sum Froh_genome
1 a182 southSLR 1029250 0.0004272932
2 a507 southSLR 2176237 0.0009034649
3 c548 southSLR 2416142 0.0010030615
4 LIC32 southSLR 2196056 0.0009116927
5 LIC46 southSLR 1146681 0.0004760447
6 LIC47 southSLR 1265340 0.0005253060
$result_Froh_class
id group Sum_Class_0 Froh_Class_0 Sum_Class_1 Froh_Class_1
1 a182 southSLR 1029250 0.0004272932 1029250 0.0004272932
2 a507 southSLR 2176237 0.0009034649 2176237 0.0009034649
3 c548 southSLR 2416142 0.0010030615 2416142 0.0010030615
4 LIC32 southSLR 2196056 0.0009116927 2196056 0.0009116927
5 LIC46 southSLR 1146681 0.0004760447 1146681 0.0004760447
6 LIC47 southSLR 1265340 0.0005253060 1265340 0.0005253060
Next Steps:
1. Run ROH on bobcat for comparison.
2. Write up results in Chapter 2.
###### tags: `chapter2_popgen_metrics` `ROH` `Runs of Homozygosity` `Templates` `Documentation`
Update: ROH Islands
Looking at ROH islands for NFLD lynx (specifically looking at the longest ROH) in Chr
BCFtools & reference genome method
BCFtools & PLINK comparison
ANGSD SNPs --> PLINK
Make sure to use SNPs shared by all individuals (SNP set, LD, etc.).
Some say that ROH detection is affected by coverage and should only be used with high-coverage genomes.