---
title: 'ROH'
disqus: hackmd
---

Interpreting ROH
===

NOTE TO WARREN: I re-ran this analysis with a more informed parameterization and the results told a completely different story: the Newfoundland (island population) samples have a higher count of ROH when shorter ROH (<1000bp) are included. The results below suggest that at least one individual (LIC46) is highly inbred, with a high proportion of the genome (0.09) in ROH. So I think the parameterization of this analysis needs some additional thought. Please see the updated results here: and read the general description below. Thank you!

#### A brief overview of ROH and what we're doing here:
{add more here but basically...} Shorter ROH reflect more ancient inbreeding, while longer ROH reflect more recent inbreeding [24]. I've used the detectRUNS (0.9.6) package in R to detect runs of homozygosity and summarize the results.

#### A full description of the analyses can be found here:

## A couple of notes:
1. I think some changes will be made to the parameterization of the sliding-window ROH detection (particularly minLengthBps and maxMissing). This paper was a great resource for informed parameter-setting: http://europepmc.org/article/PMC/5787230. It's important to parameterize carefully because, according to simulations, a too-small minimum window size (e.g. 5 SNPs) will overestimate inbreeding.
3. We use PLINK ped/map inputs here, but I suspect we have an unplaced scaffold in the mix (`--allow-extra-chr` in PLINK). We may need to rerun PLINK and try new input files without SNPs on the unplaced scaffold, so that it will play nicely with R.

Request an interactive session to run this command:

`bsub -q interactive -R rusage[mem=24000] -n 1 -W 30 -Is bash`

Use plink, not vcftools:

`plink --recode --vcf allsamples.vcf.gz --allow-extra-chr`

5. I've used the 6SV.vcf, which is a filtered VCF that has *not* been further reduced (e.g. by LD-pruning).
I've not determined whether SNPs need to be in linkage equilibrium to analyze ROH. It seems like LD-pruning would inherently reduce our ability to accurately detect ROH, but I don't know and need to look into this further.
6. We need to evaluate three ROH length classes because they tell us different things about historic Ne. However, what constitutes a short/medium/long ROH is hard to pinpoint. I've evaluated the literature and found some consensus around these size classes: **short** (less than 100 kb) and **medium** (0.1 to 3 Mb) ROH regions have *more* deleterious variants than **long** (> 3 Mb) ROH regions.

# Results
These are the results from a first run of analyses, but I've since adjusted the minimum number of SNPs to detect shorter (<250000bp) ROH.

1. Min/Max/Mean

Note that none of the ROH are > 3 Mb; they are all in the short to medium range.

maximum length ROH (Mb): 1.26534

minimum length ROH (Mb): 0.250011

mean length ROH (Mb) by group

| NFLD | northSLR | southSLR |
| -------- | -------- | -------- |
| 0.3117209 | 0.3069324 | 0.3396391 |

2. ROH count by group

Group names: Newfoundland, North of the St Lawrence River, South of the St Lawrence River. Note that population assignment was based on results from several clustering analyses (PCA, sNMF).

| NFLD | northSLR | southSLR |
| -------- | -------- | -------- |
| 665 | 1396 | 6695 |

3. Distribution of ROH by length and individual

![](https://i.imgur.com/NWF5aW6.png)

Fig1: The number of ROH by individual (colored by group), distributed by mean length of ROH in Mb (x-axis). Group names are abbreviated for "Newfoundland", "North-" and "South of the St Lawrence River". The outlier to the top right is LIC46, a lynx from Maine, south of the St. Lawrence River.

![](https://i.imgur.com/6ovuYlE.png)

Fig2: Same as above, but distributed along the sum of ROH in Mb (x-axis). Note that NFLD and North are somewhat tightly distributed, but Maine has a much broader distribution of ROH and includes outlier LIC46 (top right, blue).

5. Estimating genome-wide FROH

FROH is a measure of inbreeding that ranges from 0 to 1 (like the pedigree inbreeding coefficient FP), and estimates the fraction of the genome in ROH, where identical-by-descent chromosome copies coalesce in a “recent” ancestor. See: https://www.pnas.org/content/pnas/early/2018/02/16/1714475115.full.pdf

| | id | group | sum | genome_wide_FROH |
| -------- | -------- | -------- | -------- | -------- |
| 1 | a109 | southSLR | 109076291 | 0.045283027 |
| 2 | a182 | southSLR | 88868071 | 0.036893584 |
| 3 | a202 | northSLR | 19946211 | 0.00828067 |
| 4 | a33 | northSLR | 16021104 | 0.006651162 |
| 5 | a475 | NFLD | 69859782 | 0.029002292 |
| 6 | a494 | NFLD | 61099018 | 0.02536526 |
| 7 | a507 | southSLR | 145277232 | 0.060311849 |
| 8 | a697 | southSLR | 1627327 | 0.000675585 |
| 9 | a772 | northSLR | 17801140 | 0.007390144 |
| 10 | a794 | southSLR | 42909881 | 0.017814039 |
| 11 | a803 | northSLR | 16911461 | 0.007020794 |
| 12 | a818 | northSLR | 38136027 | 0.015832173 |
| 13 | a857 | southSLR | 51597748 | 0.021420807 |
| 14 | b114 | northSLR | 27589392 | 0.011453737 |
| 15 | b124 | northSLR | 34457599 | 0.014305074 |
| 16 | b13 | southSLR | 15865049 | 0.006586376 |
| 17 | b188 | northSLR | 47453535 | 0.019700337 |
| 18 | b23 | southSLR | 98115926 | 0.040732831 |
| 19 | b276 | NFLD | 76335584 | 0.031690721 |
| 20 | b554 | northSLR | 37901829 | 0.015734946 |
| 21 | b90 | southSLR | 2303022 | 0.0009561 |
| 22 | c165 | northSLR | 16788137 | 0.006969596 |
| 23 | c323 | northSLR | 19320646 | 0.008020967 |
| 24 | c548 | southSLR | 97195636 | 0.040350773 |
| 25 | f264 | northSLR | 14402970 | 0.005979394 |
| 26 | f457 | northSLR | 16097324 | 0.006682805 |
| 27 | fha_024 | northSLR | 25417751 | 0.01055218 |
| 28 | fha_042 | northSLR | 52954667 | 0.021984132 |
| 29 | fha_043 | northSLR | 13356791 | 0.005545072 |
| 30 | l09_003 | southSLR | 33181222 | 0.013775186 |
| 31 | l09_007 | northSLR | 13921106 | 0.005779348 |
| 32 | l09_015 | southSLR | 83475846 | 0.034655001 |
| 33 | L155 | southSLR | 6660924 | 0.002765283 |
| 34 | LIC11 | southSLR | 16738026 | 0.006948792 |
| 35 | LIC20 | southSLR | 22522262 | 0.009350118 |
| 36 | LIC23 | southSLR | 16031019 | 0.006655278 |
| 37 | LIC24 | southSLR | 43162205 | 0.017918791 |
| 38 | LIC27B | southSLR | 19214958 | 0.007977091 |
| 39 | LIC28 | southSLR | 18186905 | 0.007550294 |
| 40 | LIC31 | southSLR | 51106214 | 0.021216747 |
| 41 | LIC32 | southSLR | 149313408 | 0.061987468 |
| 42 | LIC36 | southSLR | 67959915 | 0.028213562 |
| 43 | LIC46 | southSLR | 236214996 | 0.098064666 |
| 44 | LIC47 | southSLR | 116967778 | 0.048559178 |
| 45 | LIC48 | southSLR | 123256833 | 0.05117008 |
| 46 | LIC54 | southSLR | 70507565 | 0.029271219 |
| 47 | LIC57 | southSLR | 90552558 | 0.037592898 |
| 48 | LIC60 | southSLR | 84435937 | 0.035053583 |
| 49 | LIC8 | southSLR | 32280875 | 0.013401407 |
| 50 | LIC9 | southSLR | 65922805 | 0.027367855 |
| 51 | LIT2 | southSLR | 26886377 | 0.011161881 |
| 52 | LIT5 | southSLR | 56071155 | 0.023277942 |
| 53 | LRK10 | southSLR | 66665460 | 0.027676169 |
| 54 | LRK11 | southSLR | 15109721 | 0.006272801 |
| 55 | LRK12 | southSLR | 6523076 | 0.002708055 |
| 56 | LRK13 | southSLR | 19836946 | 0.008235309 |
| 57 | LRK17 | southSLR | 54441704 | 0.022601476 |
| 58 | LRK22 | southSLR | 27821056 | 0.011549913 |

Maximum FROH: 0.098064666

^This is LIC46 (a lynx from Maine, south of the St Lawrence River).

I wanted to break the results up into three categories (small: 10-100 kb, medium: 100 kb-1 Mb, long: >1 Mb). So I binned the results and here they are:

# Medium

`$summary_ROH_count_chr`

| | NFLD | northSLR | southSLR |
| -------- | -------- | -------- | -------- |
| NC_044303.1 | 966 | 3406 | 7915 |
| NC_044304.1 | 589 | 2270 | 5050 |
| NC_044305.1 | 545 | 1743 | 4821 |
| NC_044306.1 | 911 | 3035 | 6662 |
| NC_044307.1 | 610 | 2392 | 5333 |
| NC_044308.1 | 577 | 2013 | 5240 |
| NC_044309.1 | 599 | 2171 | 5199 |
| NC_044310.1 | 1099 | 3230 | 6742 |
| NC_044311.1 | 791 | 2286 | 5770 |
| NC_044312.1 | 457 | 1577 | 3614 |
| NC_044313.1 | 311 | 1168 | 2678 |
| NC_044314.1 | 378 | 1104 | 2561 |
| NC_044315.1 | 355 | 1184 | 3020 |
| NC_044316.1 | 132 | 540 | 1081 |
| NC_044317.1 | 193 | 597 | 1647 |
| NC_044318.1 | 112 | 320 | 889 |
| NC_044319.1 | 233 | 710 | 2022 |
| NC_044320.1 | 357 | 1049 | 2402 |
| NC_044321.1 | 569 | 2154 | 3942 |
| NW_022059694.1 | NA | 4 | 11 |
| NW_022059695.1 | NA | 2 | 17 |
| NW_022059696.1 | NA | 2 | 14 |
| NW_022059697.1 | NA | NA | 2 |
| NW_022059698.1 | 6 | 16 | 26 |
| NW_022059699.1 | NA | NA | 3 |

`$summary_ROH_percentage_chr`

| | NFLD | northSLR | southSLR | chrom |
| -------- | -------- | -------- | -------- | -------- |
| NC_044303.1 | 0.0986721144 | 0.10329663664 | 0.10324676172 | NC_044303.1 |
| NC_044304.1 | 0.0601634321 | 0.06884420587 | 0.06587443420 | NC_044304.1 |
| NC_044305.1 | 0.0556690501 | 0.05286143208 | 0.06288725688 | NC_044305.1 |
| NC_044306.1 | 0.0930541369 | 0.09204500652 | 0.08690207537 | NC_044306.1 |
| NC_044307.1 | 0.0623084780 | 0.07254420283 | 0.06956601140 | NC_044307.1 |
| NC_044308.1 | 0.0589376915 | 0.06104994996 | 0.06835287826 | NC_044308.1 |
| NC_044309.1 | 0.0611848825 | 0.06584174931 | 0.06781805612 | NC_044309.1 |
| NC_044310.1 | 0.1122574055 | 0.09795893610 | 0.08794563076 | NC_044310.1 |
| NC_044311.1 | 0.0807967314 | 0.06932945137 | 0.07526643274 | NC_044311.1 |
| NC_044312.1 | 0.0466802860 | 0.04782700998 | 0.04714261489 | NC_044312.1 |
| NC_044313.1 | 0.0317671093 | 0.03542292178 | 0.03493301679 | NC_044313.1 |
| NC_044314.1 | 0.0386108274 | 0.03348193977 | 0.03340681703 | NC_044314.1 |
| NC_044315.1 | 0.0362614913 | 0.03590816729 | 0.03939421609 | NC_044315.1 |
| NC_044316.1 | 0.0134831461 | 0.01637703576 | 0.01410104225 | NC_044316.1 |
| NC_044317.1 | 0.0197139939 | 0.01810572286 | 0.02148419666 | NC_044317.1 |
| NC_044318.1 | 0.0114402451 | 0.00970491008 | 0.01159650931 | NC_044318.1 |
| NC_044319.1 | 0.0237997957 | 0.02153276924 | 0.02637586256 | NC_044319.1 |
| NC_044320.1 | 0.0364657814 | 0.03181390835 | 0.03133275068 | NC_044320.1 |
| NC_044321.1 | 0.0581205312 | 0.06532617596 | 0.05142119200 | NC_044321.1 |
| NW_022059694.1 | NA | 0.00012131138 | 0.00014348887 | NW_022059694.1 |
| NW_022059695.1 | NA | 0.00006065569 | 0.00022175552 | NW_022059695.1 |
| NW_022059696.1 | NA | 0.00006065569 | 0.00018262219 | NW_022059696.1 |
| NW_022059697.1 | NA | NA | 0.00002608888 | NW_022059697.1 |
| NW_022059698.1 | 0.0006128703 | 0.00048524550 | 0.00033915550 | NW_022059698.1 |
| NW_022059699.1 | NA | NA | 0.00003913333 | NW_022059699.1 |

`$summary_ROH_count`

| | NFLD | northSLR | southSLR |
| -------- | -------- | -------- | -------- |
| 0-1 | 9790 | 32973 | 76661 |

`$summary_ROH_percentage`

| | NFLD | northSLR | southSLR | CLASS |
| -------- | -------- | -------- | -------- | -------- |
| 0-1 | 1 | 1 | 1 | 0-1 |

`$summary_ROH_mean_chr`

| | group | chrom | sum |
| -------- | -------- | -------- | -------- |
| 1 | NFLD | NC_044303.1 | 0.1535463 |
| 2 | NFLD | NC_044304.1 | 0.1539820 |
| 3 | NFLD | NC_044305.1 | 0.1507903 |
| 4 | NFLD | NC_044306.1 | 0.1525205 |
| 5 | NFLD | NC_044307.1 | 0.1551508 |
| 6 | NFLD | NC_044308.1 | 0.1493451 |
| 7 | NFLD | NC_044309.1 | 0.1496168 |
| 8 | NFLD | NC_044310.1 | 0.1591317 |
| 9 | NFLD | NC_044311.1 | 0.1552105 |
| 10 | NFLD | NC_044312.1 | 0.1502754 |
| 11 | NFLD | NC_044313.1 | 0.1511492 |
| 12 | NFLD | NC_044314.1 | 0.1521889 |
| 13 | NFLD | NC_044315.1 | 0.1513121 |
| 14 | NFLD | NC_044316.1 | 0.1369083 |
| 15 | NFLD | NC_044317.1 | 0.1512648 |
| 16 | NFLD | NC_044318.1 | 0.1522134 |
| 17 | NFLD | NC_044319.1 | 0.1504932 |
| 18 | NFLD | NC_044320.1 | 0.1461599 |
| 19 | NFLD | NC_044321.1 | 0.1697721 |
| 20 | NFLD | NW_022059698.1 | 0.1352262 |
| 21 | northSLR | NC_044303.1 | 0.1414698 |
| 22 | northSLR | NC_044304.1 | 0.1425300 |
| 23 | northSLR | NC_044305.1 | 0.1412502 |
| 24 | northSLR | NC_044306.1 | 0.1443095 |
| 25 | northSLR | NC_044307.1 | 0.1465497 |
| 26 | northSLR | NC_044308.1 | 0.1432942 |
| 27 | northSLR | NC_044309.1 | 0.1443501 |
| 28 | northSLR | NC_044310.1 | 0.1437766 |
| 29 | northSLR | NC_044311.1 | 0.1436200 |
| 30 | northSLR | NC_044312.1 | 0.1421168 |
| 31 | northSLR | NC_044313.1 | 0.1444451 |
| 32 | northSLR | NC_044314.1 | 0.1434877 |
| 33 | northSLR | NC_044315.1 | 0.1456807 |
| 34 | northSLR | NC_044316.1 | 0.1423822 |
| 35 | northSLR | NC_044317.1 | 0.1447175 |
| 36 | northSLR | NC_044318.1 | 0.1343179 |
| 37 | northSLR | NC_044319.1 | 0.1409540 |
| 38 | northSLR | NC_044320.1 | 0.1408060 |
| 39 | northSLR | NC_044321.1 | 0.1559180 |
| 40 | northSLR | NW_022059694.1 | 0.2428732 |
| 41 | northSLR | NW_022059695.1 | 0.1550300 |
| 42 | northSLR | NW_022059696.1 | 0.1401060 |
| 43 | northSLR | NW_022059698.1 | 0.1404041 |
| 44 | southSLR | NC_044303.1 | 0.1578864 |
| 45 | southSLR | NC_044304.1 | 0.1564522 |
| 46 | southSLR | NC_044305.1 | 0.1626767 |
| 47 | southSLR | NC_044306.1 | 0.1580781 |
| 48 | southSLR | NC_044307.1 | 0.1598886 |
| 49 | southSLR | NC_044308.1 | 0.1602480 |
| 50 | southSLR | NC_044309.1 | 0.1654044 |
| 51 | southSLR | NC_044310.1 | 0.1549127 |
| 52 | southSLR | NC_044311.1 | 0.1605186 |
| 53 | southSLR | NC_044312.1 | 0.1549133 |
| 54 | southSLR | NC_044313.1 | 0.1577230 |
| 55 | southSLR | NC_044314.1 | 0.1534337 |
| 56 | southSLR | NC_044315.1 | 0.1604805 |
| 57 | southSLR | NC_044316.1 | 0.1475240 |
| 58 | southSLR | NC_044317.1 | 0.1540291 |
| 59 | southSLR | NC_044318.1 | 0.1508201 |
| 60 | southSLR | NC_044319.1 | 0.1616367 |
| 61 | southSLR | NC_044320.1 | 0.1510919 |
| 62 | southSLR | NC_044321.1 | 0.1568657 |
| 63 | southSLR | NW_022059694.1 | 0.1683893 |
| 64 | southSLR | NW_022059695.1 | 0.1359663 |
| 65 | southSLR | NW_022059696.1 | 0.1328786 |
| 66 | southSLR | NW_022059697.1 | 0.1132430 |
| 67 | southSLR | NW_022059698.1 | 0.1304428 |
| 68 | southSLR | NW_022059699.1 | 0.1232833 |

`$summary_ROH_mean_class`

| | CLASS | NFLD | northSLR | southSLR |
| -------- | -------- | -------- | -------- | -------- |
| 1 | 0-1 | 0.15364 | 0.1441207 | 0.1581674 |

`$result_Froh_genome_wide`

| | id | group | sum | Froh_genome |
| -------- | -------- | -------- | -------- | -------- |
| 1 | a109 | southSLR | 500214126 | 0.20766392 |
| 2 | a182 | southSLR | 444333451 | 0.18446505 |
| 3 | a202 | northSLR | 253977611 | 0.10543882 |
| 4 | a33 | northSLR | 208768163 | 0.08667011 |
| 5 | a475 | NFLD | 494094599 | 0.20512339 |
| 6 | a494 | NFLD | 481967470 | 0.20008882 |
| 7 | a507 | southSLR | 507341991 | 0.21062305 |
| 8 | a697 | southSLR | 90644194 | 0.03763094 |
| 9 | a772 | northSLR | 240653104 | 0.09990715 |
| 10 | a794 | southSLR | 369240707 | 0.15329030 |
| 11 | a803 | northSLR | 247714149 | 0.10283854 |
| 12 | a818 | northSLR | 300235387 | 0.12464273 |
| 13 | a857 | southSLR | 336983719 | 0.13989881 |
| 14 | b114 | northSLR | 274217810 | 0.11384154 |
| 15 | b124 | northSLR | 297928339 | 0.12368496 |
| 16 | b13 | northSLR | 231167241 | 0.09596909 |
| 17 | b188 | northSLR | 352222884 | 0.14622535 |
| 18 | b23 | southSLR | 464663171 | 0.19290494 |
| 19 | b276 | NFLD | 528073995 | 0.21922994 |
| 20 | b554 | northSLR | 327500262 | 0.13596175 |
| 21 | b90 | southSLR | 65837734 | 0.02733254 |
| 22 | c165 | northSLR | 250877808 | 0.10415193 |
| 23 | c323 | northSLR | 250642243 | 0.10405414 |
| 24 | c548 | southSLR | 426806006 | 0.17718853 |
| 25 | f264 | northSLR | 196504680 | 0.08157893 |
| 26 | f457 | northSLR | 215528370 | 0.08947661 |
| 27 | fha_024 | northSLR | 275475563 | 0.11436369 |
| 28 | fha_042 | northSLR | 381146748 | 0.15823309 |
| 29 | fha_043 | northSLR | 210809513 | 0.08751758 |
| 30 | l09_003 | southSLR | 286369948 | 0.11888650 |
| 31 | l09_007 | northSLR | 236723032 | 0.09827558 |
| 32 | l09_015 | southSLR | 415054760 | 0.17231000 |
| 33 | L155 | southSLR | 143948574 | 0.05976026 |
| 34 | LIC11 | southSLR | 193906446 | 0.08050027 |
| 35 | LIC20 | southSLR | 231921805 | 0.09628235 |
| 36 | LIC23 | southSLR | 212912245 | 0.08839053 |
| 37 | LIC24 | southSLR | 312754269 | 0.12983995 |
| 38 | LIC27B | southSLR | 227583189 | 0.09448117 |
| 39 | LIC28 | southSLR | 230852367 | 0.09583837 |
| 40 | LIC31 | southSLR | 342720076 | 0.14228025 |
| 41 | LIC32 | southSLR | 488152642 | 0.20265659 |
| 42 | LIC36 | southSLR | 367325064 | 0.15249502 |
| 43 | LIC46 | southSLR | 567174790 | 0.23546264 |
| 44 | LIC47 | southSLR | 439697230 | 0.18254032 |
| 45 | LIC48 | southSLR | 487932958 | 0.20256539 |
| 46 | LIC54 | southSLR | 376154944 | 0.15616074 |
| 47 | LIC57 | southSLR | 409829475 | 0.17014072 |
| 48 | LIC60 | southSLR | 380937637 | 0.15814628 |
| 49 | LIC8 | southSLR | 281120952 | 0.11670738 |
| 50 | LIC9 | southSLR | 336736235 | 0.13979606 |
| 51 | LIT2 | southSLR | 253068681 | 0.10506147 |
| 52 | LIT5 | southSLR | 358666815 | 0.14890054 |
| 53 | LRK10 | southSLR | 413566148 | 0.17169200 |
| 54 | LRK11 | southSLR | 196388639 | 0.08153075 |
| 55 | LRK12 | southSLR | 144269102 | 0.05989332 |
| 56 | LRK13 | southSLR | 228110291 | 0.09470000 |
| 57 | LRK17 | southSLR | 360399688 | 0.14961995 |
| 58 | LRK22 | southSLR | 231652294 | 0.09617046 |

`$result_Froh_class`

This element has columns id, group, Sum_Class_0 and Froh_Class_0. Its 58 rows are identical, value for value, to `$result_Froh_genome_wide` directly above (Sum_Class_0 = sum, Froh_Class_0 = Froh_genome), so they are not repeated here.

# Long

`> long_result_summary`

`$summary_ROH_count_chr`

| | southSLR |
| -------- | -------- |
| NC_044303.1 | 1 |
| NC_044308.1 | 1 |
| NC_044309.1 | 2 |
| NC_044310.1 | 2 |
| NC_044311.1 | 1 |
| NC_044321.1 | 2 |

`$summary_ROH_percentage_chr`

| | southSLR | chrom |
| -------- | -------- | -------- |
| NC_044303.1 | 0.1111111 | NC_044303.1 |
| NC_044308.1 | 0.1111111 | NC_044308.1 |
| NC_044309.1 | 0.2222222 | NC_044309.1 |
| NC_044310.1 | 0.2222222 | NC_044310.1 |
| NC_044311.1 | 0.1111111 | NC_044311.1 |
| NC_044321.1 | 0.2222222 | NC_044321.1 |

`$summary_ROH_count`

| | southSLR |
| -------- | -------- |
| 1-2 | 9 |

`$summary_ROH_percentage`

| | southSLR | CLASS |
| -------- | -------- | -------- |
| 1-2 | 1 | 1-2 |

`$summary_ROH_mean_chr`

| | group | chrom | sum |
| -------- | -------- | -------- | -------- |
| 1 | southSLR | NC_044303.1 | 1.142770 |
| 2 | southSLR | NC_044308.1 | 1.265340 |
| 3 | southSLR | NC_044309.1 | 1.084336 |
| 4 | southSLR | NC_044310.1 | 1.091748 |
| 5 | southSLR | NC_044311.1 | 1.053286 |
| 6 | southSLR | NC_044321.1 | 1.208071 |

`$summary_ROH_mean_class`

| | CLASS | southSLR |
| -------- | -------- | -------- |
| 1 | 0-1 | 1.136634 |

`$result_Froh_genome_wide`

| | id | group | sum | Froh_genome |
| -------- | -------- | -------- | -------- | -------- |
| 1 | a182 | southSLR | 1029250 | 0.0004272932 |
| 2 | a507 | southSLR | 2176237 | 0.0009034649 |
| 3 | c548 | southSLR | 2416142 | 0.0010030615 |
| 4 | LIC32 | southSLR | 2196056 | 0.0009116927 |
| 5 | LIC46 | southSLR | 1146681 | 0.0004760447 |
| 6 | LIC47 | southSLR | 1265340 | 0.0005253060 |

`$result_Froh_class`

| | id | group | Sum_Class_0 | Froh_Class_0 | Sum_Class_1 | Froh_Class_1 |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| 1 | a182 | southSLR | 1029250 | 0.0004272932 | 1029250 | 0.0004272932 |
| 2 | a507 | southSLR | 2176237 | 0.0009034649 | 2176237 | 0.0009034649 |
| 3 | c548 | southSLR | 2416142 | 0.0010030615 | 2416142 | 0.0010030615 |
| 4 | LIC32 | southSLR | 2196056 | 0.0009116927 | 2196056 | 0.0009116927 |
| 5 | LIC46 | southSLR | 1146681 | 0.0004760447 | 1146681 | 0.0004760447 |
| 6 | LIC47 | southSLR | 1265340 | 0.0005253060 | 1265340 | 0.0005253060 |

Next Steps:

Run ROH on bobcat for comparison.

Write up results in Chapter 2.

:::info
**Find this document incomplete?** Leave a comment!
:::

###### tags: `chapter2_popgen_metrics` `ROH` `Rows of Heterozygosity` `Templates` `Documentation`

Update: ROH Islands

Looking at ROH islands for NFLD lynx (specifically looking at the longest ROH) in Chr

BCFtools & reference genome method

BCFtools & PLINK comparison

ANGSD SNPs --> PLINK; make sure to use SNPs shared by all individuals (snpset blah blah LD etc.)

Some say that ROH is affected by coverage and should only be used with high-coverage genomes.
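
For reference, the FROH arithmetic and the length-class binning used in the Results can be written out in a few lines. The block below is a minimal Python sketch, not the detectRUNS implementation: the function names, the `(sample_id, start_bp, end_bp)` input layout, the half-open bin edges, and the 1 Gb toy genome length are all assumptions made for illustration. It only shows that each FROH value is the summed ROH length divided by genome length, computed per length class and overall.

```python
from collections import defaultdict

# Length classes in base pairs, using the bins named in this note
# (short 10-100 kb, medium 100 kb-1 Mb, long > 1 Mb). Treating each bin as
# half-open [lower, upper) is an assumption made here for illustration.
LENGTH_CLASSES = [
    ("short", 10_000, 100_000),
    ("medium", 100_000, 1_000_000),
    ("long", 1_000_000, float("inf")),
]


def classify_roh(length_bp):
    """Return the class name for one ROH length, or None if below the smallest bin."""
    for name, lower, upper in LENGTH_CLASSES:
        if lower <= length_bp < upper:
            return name
    return None


def froh_by_class(runs, genome_length_bp):
    """Sum ROH lengths per sample and class, then divide by genome length.

    runs: iterable of (sample_id, start_bp, end_bp) tuples (hypothetical layout).
    Returns {sample_id: {"short": F, "medium": F, "long": F, "total": F}}.
    """
    sums = defaultdict(lambda: defaultdict(int))
    for sample_id, start_bp, end_bp in runs:
        length_bp = end_bp - start_bp
        roh_class = classify_roh(length_bp)
        if roh_class is None:
            continue  # shorter than the smallest bin: not counted
        sums[sample_id][roh_class] += length_bp
        sums[sample_id]["total"] += length_bp

    names = [name for name, _, _ in LENGTH_CLASSES] + ["total"]
    return {
        sample_id: {name: per_class[name] / genome_length_bp for name in names}
        for sample_id, per_class in sums.items()
    }


# Toy input: three made-up runs for one made-up sample on a made-up 1 Gb genome.
toy_runs = [
    ("sampleA", 0, 50_000),             # 50 kb  -> short
    ("sampleA", 1_000_000, 1_200_000),  # 200 kb -> medium
    ("sampleA", 5_000_000, 6_500_000),  # 1.5 Mb -> long
]
toy_froh = froh_by_class(toy_runs, genome_length_bp=1_000_000_000)
for name, value in toy_froh["sampleA"].items():
    print(f"{name}: {value:.6f}")
```

The bin edges matter for interpretation: with different thresholds (for example the 0.1 to 3 Mb "medium" class from the literature mentioned in the notes), the same runs would be assigned to different classes, so the thresholds are worth stating alongside any per-class FROH.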