Katrina Phillips
<h2>Loggerhead WGR migratory genomics workflow</h2>

---

<h3>Data summary</h3>

* Total number of unique individuals: 132
* Sampling groups: 5
  * Rookery KEY: GOM, SNWA
  * Rookery MB: GOM, SNWA, MAB
  * Goal: n ~25 per rookery/foraging site type
* Library prep details
  * Library pools: 2, each sequenced over 2 lanes of NovaSeq X
  * Used 7 PCR cycles to reduce the number of PCR duplicates
  * Details in [pooling spreadsheet](https://docs.google.com/spreadsheets/d/1zQLUvMfeJCWPn_rODM5VE9k3YrD6isMS/edit#gid=924633929)
* Expected sequencing depth: 11-12x
  * including ~20% loss to duplicates

---

<h3>Study System</h3>

![study_system](https://hackmd.io/_uploads/rJToaKYZC.png)

Samples are from nesting loggerhead turtles at two rookeries in Florida (black dots). Turtles nesting at the Gulf coast site on Keewaydin Island (KEY) migrate to 2 foraging regions (~77% to GOM, ~23% to SNWA). Turtles nesting at the east coast site at Melbourne Beach (MB) migrate to 4 foraging regions; this study focuses on 3 of them (~59% to SNWA, ~21% to GOM, ~17% to MAB; the ~3% migrating to SAB are not analyzed due to SIA assignment ambiguity).

---

<h3>Pipeline Summary</h3>

Data processing

1. Download raw data from Novogene
2. Check for download errors with MD5 checksums
3. Raw data QC
4. Concatenate libraries sequenced across lanes
5. Trim adapters
6. Trimmed data QC
7. Remove mitochondrial reads
8. Align to reference genome
9. Remove duplicates
10. Calculate coverage depth

Data analyses

1. Assess population structure
2. Fst outlier analysis
3. Cross-check candidate genes in BayPass?

<h3>Raw data</h3>

Downloaded by LMK to /nese/meclab/Shared/raw_reads/Cc_WGR/usftp21.novogene.com/01.RawData/

MD5 checksums completed by KFP (run_md5CheckSums.sh); an `md5sum -c` variant of this check appears at the end of this section.

```
LANE=$(sed -n ${SLURM_ARRAY_TASK_ID}p /nese/meclab/Katrina/cc/md5CheckSums/read_folders.txt)

for file in $LANE/*.gz
do
	LANE_PREFIX=$(echo $LANE | cut -d "/" -f 9)
	md5sum $file >> /nese/meclab/Katrina/cc/md5CheckSums/MD5_postTransfer_${LANE_PREFIX}.txt
done
```

Initial FastQC on raw data ([files and MultiQC](https://drive.google.com/drive/u/1/folders/1cHuyJgKvNmBAVynGJIGSUmSROZEiBcfX); see 01_FastQC_Trim_Align.sh)

```
fastqc $FQDIR/$SAMPLE/${SAMPLE}*_1.fq.gz --outdir ./FastQC/raw_reads
fastqc $FQDIR/$SAMPLE/${SAMPLE}*_2.fq.gz --outdir ./FastQC/raw_reads

multiqc /nese/meclab/Katrina/cc/FastQC/raw_reads/*_fastqc.zip
```

Of note:

1. There are 3 areas of low tile quality that appear across most individuals (though not all as severe as the example below). Raised this issue with Novogene, and they say it's specific to these libraries and not a flowcell problem.
   ![raw_pertileKEY440_2](https://hackmd.io/_uploads/HkyJ7Cqy0.png)
2. Mean quality scores mostly >30, dipping at the end of reads (flagged <FONT COLOR="#CBAF52">**MB205**</FONT>)
   ![raw_meanquality](https://hackmd.io/_uploads/rkU6ZR910.png)
3. Raw GC content (flagged <FONT COLOR="#FF0000">**KEY063**</FONT>)
   ![raw_GC](https://hackmd.io/_uploads/HkV9rR5yR.png)
4. High duplication rates for 6 libraries (<FONT COLOR="#FF0000">**KEY063, KEY435, KEY445, MB111, MB265, MB303**</FONT>)
   ![raw_seqduplication](https://hackmd.io/_uploads/HJYoLC51C.png)
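An alternative way to run the same checksum verification (a minimal sketch; it assumes the delivery includes an `MD5.txt` manifest inside each sample folder, which is the usual Novogene layout but should be confirmed): `md5sum -c` recomputes each sum and compares it against the shipped manifest in one step, so mismatches surface immediately.

```
# Verify every sample folder against its shipped manifest (manifest
# filename MD5.txt is an assumption; adjust to what actually arrived).
cd /nese/meclab/Shared/raw_reads/Cc_WGR/usftp21.novogene.com/01.RawData/
for dir in */
do
	(cd "$dir" && md5sum -c MD5.txt) >> ~/md5_check_results.txt
done

# Any line not ending in ": OK" flags a file to re-download.
grep -v ": OK$" ~/md5_check_results.txt
```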
<h3>Concatenate and Trim</h3>

Concatenated data from the 2 sequencing lanes (00_concat_lanes.sh) and double-checked that total sequences from Lane 1 + Lane 2 = concatenated total ([CcWGR_concat_check](https://docs.google.com/spreadsheets/d/1uYh6mS4ryzwFO_HXSGbr7jiyeAjxs55G/edit#gid=632560932))

```
SAMPLE=$(sed -n ${SLURM_ARRAY_TASK_ID}p ./all_samples.txt)

#R1
cat /nese/meclab/Shared/raw_reads/Cc_WGR/usftp21.novogene.com/01.RawData/$SAMPLE/${SAMPLE}*_1.fq.gz > /scratch/workspace/katrinaphill_umass_edu-simple/concat/${SAMPLE}_1.fq.gz
#R2
cat /nese/meclab/Shared/raw_reads/Cc_WGR/usftp21.novogene.com/01.RawData/$SAMPLE/${SAMPLE}*_2.fq.gz > /scratch/workspace/katrinaphill_umass_edu-simple/concat/${SAMPLE}_2.fq.gz
```

Distribution of raw read counts after concatenating
![raw_read_counts](https://hackmd.io/_uploads/SJ6azOTxR.png)

Ran Trimmomatic on each concatenated R1/R2 pair for quality trimming and removal of Illumina adapters (see 01_FastQC_Trim_Align.sh)

```
java -jar /modules/apps/trimmomatic/0.39/trimmomatic-0.39.jar PE -phred33 -threads 16 \
	$FQDIR/${SAMPLE}_1.fq.gz $FQDIR/${SAMPLE}_2.fq.gz \
	$TRIMDIR/paired/${SAMPLE}_R1.paired.fq.gz $TRIMDIR/unpaired/${SAMPLE}_R1.unpaired.fq.gz \
	$TRIMDIR/paired/${SAMPLE}_R2.paired.fq.gz $TRIMDIR/unpaired/${SAMPLE}_R2.unpaired.fq.gz \
	ILLUMINACLIP:/modules/apps/trimmomatic/0.39/adapters/NexteraPE-PE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:75
```

FastQC on trimmed data ([MultiQC](https://drive.google.com/drive/u/1/folders/1cHuyJgKvNmBAVynGJIGSUmSROZEiBcfX))

Of note:

1. Trimmed mean read quality looks good
   ![trimmed_meanquality](https://hackmd.io/_uploads/SJX18EikR.png)
2. One library has GC > AT (KEY063); a satellite-tagged individual, flagged in the raw GC content plot above. Is more trimming needed before moving forward? Perhaps this will be resolved when duplicates are removed?
   ![KEY063_GC](https://hackmd.io/_uploads/ByjI84oyA.jpg)
3. Trimming removed 30% of reads on average (range 7-86%). Seems like a lot? Perhaps due to the lower average peak size of my libraries? The number of reads remaining post-trim ranges from 4.5-182M (without two of the four hail-mary libraries: average 68M, range 15.4-182M). See [alternate trimming method](https://hackmd.io/JwGRyGu-T76Yl35ilowk2g?both#Re-visit-trimming-and-downstream-processing) below.

Percent of reads retained after trimming:
![reads_retained](https://hackmd.io/_uploads/BJgVLkCJR.png)

<h3>Remove mitochondrial reads</h3>

Aligned trimmed reads to the mitogenome to split mtDNA from nucDNA. Used the Mediterranean mitogenome [FR694649.1](https://www.ncbi.nlm.nih.gov/nuccore/FR694649.1) because the nuclear reference is Mediterranean, though Florida mitogenomes are available (only 2 bp differ?).

```
bbsplit.sh ref=$REFDIR/$MITOREF in1=$TRIMDIR/paired/${SAMPLE}_R1.paired.fq.gz in2=$TRIMDIR/paired/${SAMPLE}_R2.paired.fq.gz basename=$MITODIR/${SAMPLE}_mt_%.fq out1=$SPLITDIR/${SAMPLE}_nomt_R1.fq.gz out2=$SPLITDIR/${SAMPLE}_nomt_R2.fq.gz
```
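To see how much of each library was mitochondrial (a small sketch; file names follow the bbsplit call above, and the 4-lines-per-record FASTQ arithmetic is standard):

```
# Tally reads routed to the mitogenome vs. kept as nuclear for one sample.
# bbsplit writes the mito-matching reads uncompressed per the basename above.
MT=$(cat $MITODIR/${SAMPLE}_mt_*.fq | wc -l)
NUC=$(zcat $SPLITDIR/${SAMPLE}_nomt_R1.fq.gz $SPLITDIR/${SAMPLE}_nomt_R2.fq.gz | wc -l)

# FASTQ records are 4 lines each
echo "$SAMPLE: $((MT / 4)) mito reads, $((NUC / 4)) nuclear reads"
```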
<h3>Align reads to reference genome</h3>

Indexed the loggerhead reference genome

```
gunzip -c GCF_023653815.1_GSC_CCare_1.0_genomic.fna.gz > GCF_023653815.1_GSC_CCare_1.0_genomic.fna

module load bwa/0.7.17
bwa index GCF_023653815.1_GSC_CCare_1.0_genomic.fna

module load samtools/1.14
samtools faidx GCF_023653815.1_GSC_CCare_1.0_genomic.fna

module load picard/2.26.2
picard CreateSequenceDictionary R=GCF_023653815.1_GSC_CCare_1.0_genomic.fna O=GCF_023653815.1_GSC_CCare_1.0_genomic.dict
```

Which reference to use? Used the Mediterranean [GSC_CCare_1.0](https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_023653815.1/). Alternates: Cape Verde coming soon, Wild Genomes Project(?).

Aligned reads to the reference, skipping the .sam file and writing straight to .bam.

```
bwa mem -t 16 $REFDIR/$REF $SPLITDIR/${SAMPLE}_nomt_R1.fq.gz $SPLITDIR/${SAMPLE}_nomt_R2.fq.gz | samtools view -bS - > $OUTDIR/${SAMPLE}.bam
```

Produce alignment statistics

```
samtools flagstat $OUTDIR/${SAMPLE}.bam > $METDIR/alignment_metrics/${SAMPLE}_alignment_stats.txt
```

Percent of reads aligned:
![percent_aligned](https://hackmd.io/_uploads/ryD1vJCy0.png)

Number of reads aligned:
![seqs_aligned](https://hackmd.io/_uploads/HJidYJCJC.png)

Sort and index .bam files

```
samtools sort -o $OUTDIR/${SAMPLE}_SORT.bam $OUTDIR/${SAMPLE}.bam
samtools index $OUTDIR/${SAMPLE}_SORT.bam
```

Check mean depth (before removing duplicates)

```
conda activate gatk3
cd ./metrics/depth_metrics/pre_dupe/
mosdepth -n --fast-mode -Q 20 ${SAMPLE}_pre_dupe $OUTDIR/${SAMPLE}_SORT.bam
cd ../../../
#conda deactivate
```

Coverage (pre-dupe), line indicates 8x
![pre_dupe](https://hackmd.io/_uploads/SyApBykx0.png)

Mean pre-dupe coverage:
Keewaydin_GOM: 8.07
Keewaydin_SNWA: 6.88
Melbourne_GOM: 8.69
Melbourne_MAB: 7.98
Melbourne_SNWA: 10.07

Remove duplicates

```
java -Xms72G -Xmx90G -jar /modules/apps/picard/2.26.11/picard.jar MarkDuplicates \
	I=$OUTDIR/${SAMPLE}_SORT.bam TMP_DIR=$TMP \
	O=$OUTDIR/${SAMPLE}_SORT_DR.bam \
	M=$METDIR/dupe_metrics/${SAMPLE}_duplicate_metrics.txt \
	ASSUME_SORT_ORDER=coordinate \
	VALIDATION_STRINGENCY=SILENT \
	REMOVE_DUPLICATES=TRUE \
	MAX_RECORDS_IN_RAM=500000
```

Index files and check depth (post-dupe)

```
samtools index $OUTDIR/${SAMPLE}_SORT_DR.bam

conda activate gatk3
cd ./metrics/depth_metrics/post_dupe/
mosdepth -n --fast-mode -Q 20 ${SAMPLE}_post_dupe $OUTDIR/${SAMPLE}_SORT_DR.bam
cd ../../../
conda deactivate
```

Duplication rate, line indicates 20%
![percent_dupe](https://hackmd.io/_uploads/SJTmwMexC.png)

Coverage (post-dupe), line indicates 8x
![post_dupe](https://hackmd.io/_uploads/BkjZwfgeA.png)

Not the coverage we were hoping for (~11x), and most samples are below the coverage needed for hard calls. Was this due in part to the high percentage of reads trimmed at the start?

<h3>Re-visit trimming and downstream processing</h3>

Trimming seems dependent on library insert size - is there a less stringent trimming option that might keep a higher proportion of reads with low quality at the ends?
![trim_insertsize](https://hackmd.io/_uploads/S15ZtB4x0.png)

**Trim alternative 1 (bbtrim):** Instead of the Trimmomatic sliding-window approach, tried [bbduk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/) to see if a higher number of reads would be retained.

```
bbduk.sh \
	ref=./adapters.fa \
	in1=$FQDIR/${SAMPLE}_1.fq.gz \
	in2=$FQDIR/${SAMPLE}_2.fq.gz \
	out1=$TRIMBB/${SAMPLE}_R1.bbtrim.fq.gz \
	out2=$TRIMBB/${SAMPLE}_R2.bbtrim.fq.gz \
	minlen=75 qtrim=r trimq=20
```

Comparison of percent trimmed
![compare_trimming_percent](https://hackmd.io/_uploads/rkMKtH4lA.png)

Using bbduk did result in a lower percent of reads trimmed, so continued with the pipeline as above. Does a lower percent trimmed using bbduk result in higher coverage/more individuals >8x downstream?
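One way to tabulate that (a sketch, assuming mosdepth wrote its `.mosdepth.summary.txt` files under the pre/post directories used above; the per-pipeline output directories are assumptions):

```
# Pull per-sample mean depth from the mosdepth summaries; the "total"
# row's fourth column is the genome-wide mean.
for SUMMARY in ./metrics/depth_metrics/post_dupe/*_post_dupe.mosdepth.summary.txt
do
	SAMPLE=$(basename $SUMMARY _post_dupe.mosdepth.summary.txt)
	MEAN=$(awk '$1 == "total" {print $4}' $SUMMARY)
	echo -e "${SAMPLE}\t${MEAN}"
done > trimmomatic_post_dupe_means.tsv

# Repeat for the bbduk-trimmed run, then join on sample ID to compare.
```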
Comparison of coverage after removing duplicates
![compare_post_dupe](https://hackmd.io/_uploads/Sk842uNxR.png)

Using bbduk increased coverage just a bit for low-coverage samples, though it reduced coverage for some of the higher-coverage samples.

Number of IDs with >8x coverage, Trimmomatic --> bbduk
KEY_GOM: 5 --> 6
KEY_SNWA: 5 --> 5
MB_GOM: 8 --> 10
MB_MAB: 4 --> 5
MB_SNWA: 12 --> 14

There are 7 samples <1x using Trimmomatic, 6 samples <1x using bbduk.

**Trim alternative 2 (bbtrim36):** Trimming with bbduk and a 36 bp minimum retained length. The percent trimmed was a bit lower, but did not process this through all the following steps for now in case another option is better.

```
bbduk.sh \
	ref=./adapters.fa \
	in1=$FQDIR/${SAMPLE}_1.fq.gz \
	in2=$FQDIR/${SAMPLE}_2.fq.gz \
	out1=$TRIMBB/${SAMPLE}_R1.bbtrim36.fq.gz \
	out2=$TRIMBB/${SAMPLE}_R2.bbtrim36.fq.gz \
	minlen=36 qtrim=r trimq=20
```

Comparison of percent trimmed with bbduk minlen=36 ([MultiQC](https://drive.google.com/drive/u/1/folders/1cHuyJgKvNmBAVynGJIGSUmSROZEiBcfX))
![compare_trim_length36](https://hackmd.io/_uploads/S1f5ugig0.png)

Not a huge reduction in reads trimmed; did not continue this version through the pipeline to alignment.

**Trim alternative 3 (bb_noq):** Trimming adapters with bbduk but with no quality or length filters.

```
bbduk.sh \
	ref=./adapters.fa \
	in1=$FQDIR/${SAMPLE}_1.fq.gz \
	in2=$FQDIR/${SAMPLE}_2.fq.gz \
	out1=$TRIMBB/${SAMPLE}_R1.bb_noq.fq.gz \
	out2=$TRIMBB/${SAMPLE}_R2.bb_noq.fq.gz \
	ktrim=r k=31 mink=8 hdist=1 tpe tbo
```

This method **retained 100% of reads** while still trimming adapter bases out of reads. Average read length per individual after adapter trimming: 99-148.
![read_lengths_bb_noq](https://hackmd.io/_uploads/SyWedqClA.png)

Thoughts: Variation in mean read length may impact coverage? Presumably retaining low-quality reads will reduce alignment rates, as bwa soft-clips while aligning to the genome. Alignment rates were similar:
![compare_percent_aligned_bb_bbnoq_Trimmomatic](https://hackmd.io/_uploads/HkXer90eC.png)

Though with more reads going in, the total number of reads aligned was higher:
![compare_seqs_aligned_bb_bbnoq_Trimmomatic](https://hackmd.io/_uploads/B1a7U5AlC.png)

Note: mosdepth has a quality filter, currently set to ignore reads with Q<20. But are those reads still there?

Coverage (after duplicates removed) is higher; line at 8x
![compare_post_dupe_bb_bbnoq_Trimmomatic](https://hackmd.io/_uploads/rkxkeGJbR.png)
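A quick way to tabulate reads retained per trimming method for one sample (a sketch; file names mirror the commands above, and since read count is line count divided by 4, the ratio of line counts equals the ratio of reads):

```
# Percent of R1 reads surviving each trimming run, relative to the raw
# concatenated input.
RAW=$(zcat $FQDIR/${SAMPLE}_1.fq.gz | wc -l)

for TRIMMED in $TRIMDIR/paired/${SAMPLE}_R1.paired.fq.gz \
	$TRIMBB/${SAMPLE}_R1.bbtrim.fq.gz \
	$TRIMBB/${SAMPLE}_R1.bb_noq.fq.gz
do
	KEPT=$(zcat $TRIMMED | wc -l)
	PCT=$(echo "scale=1; 100 * $KEPT / $RAW" | bc)
	echo -e "${TRIMMED}\t${PCT}%"
done
```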
**Trim parameter trouble-shooting**

Trimming with bbduk and no quality filter retained all reads, but why is quality trimming not working as expected? Tested 3 IDs with average insert lengths of 111, 200, and 400 to see how small changes to the Trimmomatic and bbduk trimming parameters impact reads retained.

Original parameters in the Trimmomatic script:

```
ILLUMINACLIP:/modules/apps/trimmomatic/0.39/adapters/NexteraPE-PE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:75
```

Tested: minlen75 (shown above), minlen36, no leading/trailing + minlen75, no leading/trailing + minlen36

Original parameters in the bbduk script:

```
minlen=75 qtrim=r trimq=20
```

Tested: minlen75 + qtrim=r + q20 (shown above), minlen36 + qtrim=r + q20, minlen75 + qtrim=rl + q20, minlen36 + qtrim=rl + q20, minlen75 + qtrim=rl + q15, minlen75 + qtrim=rl + q10, minlen75 + qtrim=rl + q5, minlen36 + qtrim=rl + q5

Trim alternative 3 (above) was the bbduk script with no quality filters, which retained 100% of reads:

```
ktrim=r k=31 mink=8 hdist=1 tpe tbo
```

Finally, added a right-kmer trim and minimum k to a quality filter, which retained ~99.4% of reads:

```
ktrim=r mink=8 minlen=36 qtrim=rl trimq=15 tpe tbo
```

![Rplot03](https://hackmd.io/_uploads/SyRvb3eLA.png)

The "duk" in "bbduk" stands for Decontamination Using Kmers. It's also used to compare reads against the genome of a potential contamination source (e.g., E. coli, PhiX) and remove any matching reads. Here's what I think was happening: we'd been using bbduk to search for matches to whole adapter sequences, which for "normal" insert sizes didn't happen all that often. For libraries with short inserts, many more reads included the whole adapter sequence, and any read containing the whole adapter was treated as a match to contamination and removed. From the bbduk guide: "The default is filtering - any read matching a reference kmer will be discarded." By adding the `ktrim=r` parameter, bbduk instead *trims* each read from the adapter match and everything to the right of it. The bbktrim option trims only slightly more reads than the no-quality-trimming option (plot below), and for many IDs slightly increases final depth.

![trim_pair_percent_retained](https://hackmd.io/_uploads/ry4Odil8C.png)
![Rplot04](https://hackmd.io/_uploads/HyJIQheI0.png)
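For reference, here is the winning parameter set dropped into the same bbduk call used above (a sketch; the `bbktrim` output names are assumptions, chosen to match the `bb_ktrim` labels on the BAM files later in these notes):

```
bbduk.sh \
	ref=./adapters.fa \
	in1=$FQDIR/${SAMPLE}_1.fq.gz \
	in2=$FQDIR/${SAMPLE}_2.fq.gz \
	out1=$TRIMBB/${SAMPLE}_R1.bbktrim.fq.gz \
	out2=$TRIMBB/${SAMPLE}_R2.bbktrim.fq.gz \
	ktrim=r mink=8 minlen=36 qtrim=rl trimq=15 tpe tbo
```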
I did not re-run all my samples with updated Trimmomatic parameters, but going back to that option... From the [Trimmomatic manual](http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf):

> "Trimmomatic uses two strategies for adapter trimming: Palindrome and Simple. With 'simple' trimming, each adapter sequence is tested against the reads, and if a sufficiently accurate match is detected, the read is clipped appropriately. 'Palindrome' trimming is **specifically designed for the case of 'reading through' a short fragment** into the adapter sequence on the other end. In this approach, the appropriate adapter sequences are 'in silico ligated' onto the start of the reads, and the combined adapter+read sequences, forward and reverse, are aligned. If they align in a manner which indicates 'read-through', the forward read is clipped and the reverse read dropped (since it contains no new data)."

So you end up with unpaired reads. Does that explain the high % of reads lost? Perhaps not what we want; may be why bbduk is better.

Adapter file for Trimmomatic:

```
>PrefixNX/1
AGATGTGTATAAGAGACAG
>PrefixNX/2
AGATGTGTATAAGAGACAG
>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC
```

We had not been using the full form ILLUMINACLIP:\<fastaWithAdaptersEtc>:\<seed mismatches>:\<palindrome clip threshold>:\<simple clip threshold>**:\<minAdapterLength>:\<keepBothReads>**, for which:

**minAdapterLength**: In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.

**keepBothReads**: After read-through has been detected by palindrome mode and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying 'true' for this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads.

Trimmomatic with these additions retained ~98.8% of reads.

```
ILLUMINACLIP:/modules/apps/trimmomatic/0.39/adapters/NexteraPE-PE.fa:2:30:10:1:true LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:36
```

<h3>Prep for analyses in ANGSD</h3>

Working through these steps for now with the initial bbduk-trimmed dataset (in <FONT COLOR="#F8766D">**red**</FONT> above) for an initial look at things.

Prepare files: add RG header

```
java -jar -Xms72G -Xmx90G /modules/apps/picard/2.26.11/picard.jar AddOrReplaceReadGroups \
	I=$OUTDIR/${SAMPLE}_bb_SORT_DR.bam \
	O=$OUTDIR/${SAMPLE}_bb_SORT_DR_RG.bam \
	TMP_DIR=$TMP \
	SORT_ORDER=coordinate RGID=1 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=${SAMPLE} CREATE_INDEX=True
```

Index the new bam files (with RG header)

```
#samtools index $OUTDIR/${SAMPLE}_bb_SORT_DR_RG.bam
```

Re-align around indels for use with ANGSD

```
conda activate gatk3

gatk3 \
	-T RealignerTargetCreator \
	-R $REFDIR/$REF \
	-I $OUTDIR/${SAMPLE}_bb_SORT_DR_RG.bam \
	-o $OUTDIR/${SAMPLE}.intervals \
	-drf BadMate

gatk3 \
	-T IndelRealigner \
	-R $REFDIR/$REF \
	-I $OUTDIR/${SAMPLE}_bb_SORT_DR_RG.bam \
	--targetIntervals $OUTDIR/${SAMPLE}.intervals \
	--consensusDeterminationModel USE_READS \
	-o $OUTDIR/${SAMPLE}_bb_SORT_DR_RG_IR.bam

conda deactivate
```

Removed ~5-10% of reads for failing MappingQualityZeroFilter.

Final sort and index

```
samtools sort $OUTDIR/${SAMPLE}_bb_SORT_DR_RG_IR.bam -o $OUTDIR/${SAMPLE}_bb_DR_RG_IR_SORT.bam
samtools index $OUTDIR/${SAMPLE}_bb_DR_RG_IR_SORT.bam
```
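The next step runs ANGSD per chromosome; one way to build that region list (a sketch; in RefSeq assemblies, chromosome-scale sequences carry NC_ accessions while unplaced scaffolds are NW_, but verify against the .fai):

```
# Pull chromosome-scale sequence names from the reference index for use
# as -r regions in per-chromosome SLURM array jobs.
grep "^NC_" $REFDIR/${REF}.fai | cut -f1 > chrom_list.txt
wc -l chrom_list.txt   # expect 28 chromosomes (plus MT if annotated)
```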
Generate beagle files to move forward with analyses based on genotype likelihoods in ANGSD. The loggerhead reference genome includes chromosomes and ~2000 scaffolds; using just the 28 chromosomes for assessing population structure (see 02_ANGSD.sh).

```
module load angsd/0.935

angsd -GL 1 -out $OUTDIR/all_${CHR} -nThreads 16 -doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1 -C 50 -bam ./all_bam_paths.txt -r $CHR -ref $REFDIR/$REF
```

Concatenate the per-chromosome outputs into a single file

```
cat all*.beagle.gz > all.beagle.gz
```
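One caveat with the plain `cat`: each per-chromosome beagle file carries its own header row, so the merged file contains stray headers mid-file. A header-aware merge (a sketch; whether PCAngsd tolerates the stray headers is untested here, and the first file name assumes chromosome 1 is NC_064473.1):

```
# Keep the header from the first chromosome only, then append the data
# rows of every per-chromosome file (gzip members concatenate cleanly).
zcat all_NC_064473.1.beagle.gz | head -1 | gzip > all.beagle.gz
for F in all_NC_064*.beagle.gz
do
	zcat $F | tail -n +2 | gzip >> all.beagle.gz
done
```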
<h3>Preliminary population structure analysis</h3>

<h4>bbduk-trimmed dataset (with quality filtering)</h4>

Covariance matrix estimation with PCAngsd for PCA

```
conda activate pcangsd
pcangsd -b $OUTDIR/8x.beagle.gz -o $OUTDIR/Results/8x_PCAngsd_covmat
conda deactivate
```

Initial PCA including all samples regardless of coverage (n=132)
![all_bb_PCA](https://hackmd.io/_uploads/HynK9eogA.png)

There were quite a few outliers here, so ran again with just the samples at 8x coverage or above.

PCA including only samples 8x and above (n=40)
![8x_bb_PCA](https://hackmd.io/_uploads/S130cgog0.png)

Two pairs of samples from Melbourne seem to be true outliers. From different populations? Their mitochondrial haplotypes are not unique: MB113, MB121 (mtDNA Cc-A2.1); MB103, MB316 (mtDNA ~Cc-A1.1). Should they be removed before Fst analyses?

PCA including only samples >8x (4 outliers removed)
![8x-4IDs_PCA](https://hackmd.io/_uploads/S1aSGykWA.png)

Admixture estimation with PCAngsd

```
conda activate pcangsd
pcangsd -b $INDIR/8x.beagle.gz --admix -t 16 -e $k -o $OUTDIR/8x_Admix_$k
conda deactivate
```

For the >8x dataset:

k=2
![8x_admix2](https://hackmd.io/_uploads/H1iN2logA.png)
k=3
![8x_admix3](https://hackmd.io/_uploads/Hyo4neie0.png)
k=4
![8x_admix4](https://hackmd.io/_uploads/S1iV2lse0.png)

<h4>bbduk-trimmed dataset (no quality filtering at trimming step)</h4>

PCA including all samples regardless of coverage, PC1-PC2 (n=132)
![PCA1-2_bb_noq](https://hackmd.io/_uploads/HkWWi5UWC.png)

Two outliers are still clear outliers (MB103, MB316), but the other two outliers from the previous dataset are less distinct now (MB113, MB121) with the added coverage in other IDs.

PCA including all samples regardless of coverage, PC2-PC3 (n=132)
![PCA2-3_bb_noq](https://hackmd.io/_uploads/rkhPi9U-A.png)

PC3 distinguishes MB113 & MB121 from other IDs.

<h3>Filtering</h3>

Which samples to drop from subsequent analysis due to data quality? Seven IDs aligned to the reference at <90%.

| Sample_ID | % aligned | coverage |
| --------- | --------- | -------- |
| MB303 | 26.59 | 0.37 |
| KEY063_2** | 59.02 | 1.71 |
| MB111 | 62.93 | 4.38* |
| MB265** | 67.34 | 0.28 |
| KEY445 | 74.55 | 1.51 |
| KEY435** | 75.12 | 0.37 |
| KEY456** | 83.97 | 0.47 |

*filtered for alignment even though coverage >3
**satellite-tagged turtle

One additional ID has coverage <3x: KEY287 (1.31x).

With these eight removed (aligned <90% and/or coverage <3x), n=124 remain:
Keewaydin_GOM (-1) 24/25
Keewaydin_SNWA (-4) 21/25
Melbourne_GOM (-1) 24/25
Melbourne_MAB (-1) 28/29
Melbourne_SNWA (-1) 27/28

PCA filtered data PC1-PC2 (n=124) 3x-filtered-bbktrim
![PCA1-2_3xfiltered_ktrim](https://hackmd.io/_uploads/rJfYC3NHR.png)

PCA filtered data PC2-PC3 (n=124) 3x-filtered-bbktrim
![PCA2-3_3xfiltered_ktrim](https://hackmd.io/_uploads/SJu502NrC.png)

PCA filtered data PC3-PC4 (n=124) 3x-filtered-bbktrim
![PCA3-4_3xfiltered_ktrim](https://hackmd.io/_uploads/Sykh02NrC.png)

Low variance explained by PCs 1-4; no clear clusters among rookeries or foraging areas.

PCAngsd admixture analysis, 3x-filtered-bbktrim
![Admix_3xfiltered-ktrim_labeled](https://hackmd.io/_uploads/SyOzUDvBC.jpg)

<h3>Prelim Fst Outlier Analyses</h3>

Trial first with all 124 filtered IDs and parameters from the leatherback dataset (see 03_Fst_outliers.sh):

MIN_MAF=0.05
MIN_DEPTH=1
MAX_DEPTH_FACTOR=4
MAX_DEPTH = (4 * 124) = 496
PERCENT_IND=0.60
MIN_IND = (0.6 * 124) = 74

**Question: What is the optimal value for max depth?**

Create MAF files to select sites that meet the criteria above:

```
angsd -P $NB_CPU -nQueueSize 50 \
	-doMaf 2 -GL 2 -doGlf 2 -doMajorMinor 1 -doCounts 1 \
	-ref $REFDIR/$REF -remove_bads 1 -minMapQ 30 -minQ 20 -skipTriallelic 1 \
	-minInd $MIN_IND -minMaf $MIN_MAF -SNP_pval 1e-6 -setMaxDepth $MAX_DEPTH -setMinDepthInd $MIN_DEPTH \
	-b $DIR/3x_filtered_bb_noq_bam_paths.txt -r ${CHR} \
	-out $OUTDIR/all_maf"$MIN_MAF"_pctind"$PERCENT_IND"_maxdepth"$MAX_DEPTH_FACTOR"_${CHR}
```

Convert the per-chromosome MAF files to a single file and format it for use with the -sites option for the SFS (keep just the first four columns)

```
for q in {473..500}; do
	zcat all_maf0.05_pctind0.60_maxdepth4_NC_064${q}.1.mafs.gz | grep "^NC_064" >> outputfile.mafs
done

awk 'BEGIN { OFS="\t" } {print $1,$2,$3,$4}' outputfile.mafs > sites_all_maf0.05_pctind0.60_maxdepth4
```

Index the sites file

```
angsd sites index /nese/meclab/Katrina/cc/bam_paths/sites_all_maf0.05_pctind0.60_maxdepth4
```

First trial with MAB vs SNWA - SFS by group. Pooled IDs from both rookeries for SNWA; MAB is only from Melbourne.
MAB: n = 28
SNWA: n = 48

```
for POP in MAB SNWA
do
	echo $POP
	REFDIR=/nese/meclab/Shared/reference_genomes/ST_reference_genomes/CCare_1.0
	REF=GCF_023653815.1_GSC_CCare_1.0_genomic.fna
	OUTDIR=/scratch/workspace/katrinaphill_umass_edu-simple/ANGSD/for_Fst
	DIR=/nese/meclab/Katrina/cc/bam_paths
	MIN_DEPTH=1
	MAX_DEPTH_FACTOR=4
	PERCENT_IND=0.60
	N_IND=$(cat $DIR/$POP'_3x_noq_bam.txt' | wc -l)
	MIN_IND_FLOAT=$(echo "($N_IND * $PERCENT_IND)" | bc -l)
	MIN_IND=${MIN_IND_FLOAT%.*}
	MAX_DEPTH=$(echo "($N_IND * $MAX_DEPTH_FACTOR)" | bc -l)

	angsd -P $NB_CPU -nQueueSize 50 \
		-doMaf 2 -GL 2 -doMajorMinor 3 -dosaf 1 -docounts 1 \
		-b $DIR/$POP'_3x_noq_bam.txt' -ref $REFDIR/$REF -anc $REFDIR/$REF -out $OUTDIR/sfs_results_M-S/$POP \
		-remove_bads 1 -minMapQ 30 -minQ 20 -skipTriallelic 1 \
		-minInd $MIN_IND -setMaxDepth $MAX_DEPTH -setMinDepthInd $MIN_DEPTH \
		-sites "$OUTDIR"/sites_all_maf0.05_pctind0.60_maxdepth4

	realSFS $OUTDIR/sfs_results_M-S/$POP.saf.idx -fold 1 > $OUTDIR/sfs_results_M-S/$POP.folded.sfs
done
```
Generate the 2D SFS between the 2 groups; capped the number of sites at 300,000,000.

```
realSFS $OUTDIR/sfs_results_M-S/MAB.saf.idx $OUTDIR/sfs_results_M-S/SNWA.saf.idx -nSites 300000000 > $OUTDIR/sfs_results_M-S/MAB.SNWA.sfs
```

Calculate Fst values

```
realSFS fst index $OUTDIR/sfs_results_M-S/MAB.saf.idx $OUTDIR/sfs_results_M-S/SNWA.saf.idx -sfs $OUTDIR/sfs_results_M-S/MAB.SNWA.sfs -fstout $OUTDIR/sfs_results_M-S/MAB.SNWA -whichFst 1
```

In the .err file from running this script: "IMPORTANT: please make sure that your saf files hasnt been folded with -fold 1 in -doSaf in angsd"

Sliding-window Fst scan with realSFS fst stats2

```
realSFS fst stats2 $OUTDIR/sfs_results_M-S/MAB.SNWA.fst.idx -win 10000 -step 1000 > $OUTDIR/sfs_results_M-S/MAB.SNWA.fst.slidingwindow
```

Fst outliers per chromosome for MAB-SNWA foragers
![MAB-SNWA_Fst_plot](https://hackmd.io/_uploads/B1l84muA-0.png)

Max Fst 0.18 on chromosome 5 (NC_064477.1)

**Fst outlier parameter testing**

*All below with the filtered bb-ktrim dataset*

**Parameters tested**
Minimum MAF: 0.05, 0.03, 0.01
Percent of individuals: 60%, 17%
Max depth factor: 4, 15, 30
Sites file based on subsets or the full dataset
No sites file (with and without min MAF)

**Combinations tested**
Fst: KEY-GOM v SNWA, pooled-GOM v SNWA, MB-GOM v SNWA, MB-MAB v SNWA, MB-MAB v GOM, KEY-random v random, pooled-rookery v rookery
PBS: MB-GOM v SNWA v MAB, pooled-GOM v SNWA v MAB

[Summary of Fst results](https://drive.google.com/drive/u/0/folders/1wk9s2wj4VLvmBrTQu1e1gVrwss-zh-pR): see fst_analyses_summary.pdf

Results comparisons of note

1. Depth factor 4 -> 15
   ![KEY-depth4_to_depth15](https://hackmd.io/_uploads/HJCE89l8R.png)
   The average depth of this dataset is ~9x, so the depth factor needs adjusting.
2. Depth factor 15 -> 30
   ![pooled-depth15_to_depth30](https://hackmd.io/_uploads/SJHtI9lLC.png)
   Only slight differences between depth factors 15 and 30. Moving forward using depth factor 30.
3. Fst between 2 foraging sites vs between 2 randomly sorted groups
   ![KEY-depth15_to_random](https://hackmd.io/_uploads/HJ5NT5xLA.png)
   The overall range of Fst values doesn't differ much, though the locations of outlier peaks do. **Is Fst between foraging sites any more informative than random differences?**
4. Fst between two foraging sites vs between the two rookeries
   ![pooled_depth30_to_rookery](https://hackmd.io/_uploads/rkBrj9g8R.png)
   Right panel is less 6 IDs for equal sample sizes.

**Notes for creating the -sites file for Fst analyses using all samples:**
MIN_MAF=0.05 (retained more sites with a lower min MAF, but the resulting Fst peaks were similar)
MIN_DEPTH=1
MAX_DEPTH_FACTOR=30
PERCENT_IND=0.17 (because the smallest group is Keewaydin-SNWA with 21 samples out of the 124 total) *revisit, perhaps go back to 0.60

And for the SFS per group:
MIN_DEPTH=1
MAX_DEPTH_FACTOR=30
PERCENT_IND=0.60

<h3>Fst Outlier Tests</h3>

3-way PBS within the MB rookery vs across rookeries
![PBS_depth30_vMAB_MB_to_pooled](https://hackmd.io/_uploads/B1Tk14hD0.png)

Reduced group sizes for equal sampling based on foraging-assignment posterior probability (n=24 x3 for MB and n=28 x3 when pooled).

In the examples above, there may be a peak of interest on chromosome 6 when comparing foraging sites GOM (n=48) and SNWA (n=48). This peak does not appear when comparing these phenotypes just within the KEY rookery (left panel of #3, n=21 v n=24).
![pooled-3xbbktrim-depth30-GOM.SNWA.fst_plot_99.9_chrom6_composite](https://hackmd.io/_uploads/ry6EQjgLA.png)

Perhaps related to DLK1.

Keep an eye on chromosome 15; early trials including the MAB foraging group indicated an outlier peak near the end of the chromosome.
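To pull the windows above an empirical quantile (e.g., the 99.9th percentile used in the plot labels), a sketch; `realSFS fst stats2` typically writes region, chr, midPos, Nsites, and Fst columns, but the column order and any header line are worth confirming against the actual output:

```
# Top 0.1% of windows by Fst (assumes Fst is column 5; strip any header
# line first, e.g. with tail -n +2).
FST=$OUTDIR/sfs_results_M-S/MAB.SNWA.fst.slidingwindow
TOTAL=$(wc -l < $FST)
N_TOP=$((TOTAL / 1000))    # top 0.1% of windows
sort -g -k5,5 $FST | tail -n $N_TOP > MAB.SNWA.top0.1pct_windows.txt
```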
When re-doing the Fst analysis in ANGSD for just the region of the peak shown, with a 1-bp sliding window to ID the location of outliers:
![smaller_windows_lowres](https://hackmd.io/_uploads/rJT7J5Alye.png)

How to set the outlier threshold? The Fst values are higher with the smaller sliding window than in the original analysis.
![trio_region_fst_options](https://hackmd.io/_uploads/Hk9ERjQzJx.png)
![duo_region_fst_options](https://hackmd.io/_uploads/ryGdCiXfkl.png)

[Poster presented at Evolution 2024](https://drive.google.com/drive/u/0/folders/1dsKPCLf2JoiBhqN6GCyeDMkVc9yRpRZz): see Evolution_Phillips_draft5.pdf

<h3>Prelim Baypass Analyses</h3>

Started with data from one rookery (KEY), comparing 2 foraging sites (GOM - SNWA) - core model, no covariates.
![KEY_G.S_XtX_plot_99.9](https://hackmd.io/_uploads/S1BBeEhvR.png)

12 outliers (in <FONT COLOR="#FF0000">red</FONT> above) across 6 chromosomes: chrom1, chrom2, chrom4, chrom9, chrom13, chrom26.

Peak on chromosome 1 similar to the Fst outlier plot for KEY:GOM-SNWA (below)
![KEY-3xbbktrim_n-21_depth30_sites124_GOM.SNWA.fst_plot_99.9](https://hackmd.io/_uploads/H1BCg4hPC.png)

BayPass outliers (Rookery: Keewaydin; Foraging sites: GOM-SNWA); *also indicated as an outlier in the Fst analyses

| chromosome | position | gene |
| ---------- | -------- | ---- |
| NC_064473.1 | 25554446 | MRE11* |
| NC_064473.1 | 25570950 | MRE11* |
| NC_064473.1 | 25588436 | MRE11* |
| NC_064473.1 | 25589016 | MRE11* |
| NC_064473.1 | 188840374 | LOC125645292 |
| NC_064473.1 | 282743125 | HMGA2 |
| NC_064474.1 | 153545914 | LOC125632159*; LOC125632161*; MYLIP |
| NC_064474.1 | 263278740 | no gene at this location? |
| NC_064476.1 | 33553153 | KIAA1549L |
| NC_064481.1 | 86517141 | no gene at this location? |
| NC_064485.1 | 36053543 | SLC26A11* |
| NC_064498.1 | 1645240 | KMT2D |

For the same analysis in the other rookery (MB), comparing the same 2 foraging sites (GOM - SNWA) - core model, no covariates: 0 outliers.
![MB_G.S_XtX_plot_99.9](https://hackmd.io/_uploads/HyP-LE1a0.png)

The equivalent Fst plot for MB:GOM-SNWA (below)
![MB_G.S_n-24_depth30_sites124_plot_99.9](https://hackmd.io/_uploads/rkzsCKCg1e.png)

Next up - PCA with just outlier SNPs? From Fst + BayPass.
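For checking the "no gene at this location?" entries systematically, something like bedtools closest against the RefSeq annotation could work (a sketch; `baypass_outliers.txt` is a hypothetical two-column chrom/position file transcribed from the table above, and the GFF filename is assumed from NCBI's naming):

```
# Convert outlier positions to BED (0-based start), then report the
# nearest annotated gene and its distance for each site.
awk 'BEGIN{OFS="\t"} {print $1, $2-1, $2}' baypass_outliers.txt | \
	sort -k1,1 -k2,2n > outliers.bed

awk '$3 == "gene"' GCF_023653815.1_GSC_CCare_1.0_genomic.gff | \
	sort -k1,1 -k4,4n > genes.gff

bedtools closest -a outliers.bed -b genes.gff -d > outliers_nearest_gene.txt
```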
<h3>GATK haplotype calling</h3>

For the 82 samples that have coverage >=8x:
Keewaydin_GOM 16/25
Keewaydin_SNWA 14/25
Melbourne_GOM 17/25
Melbourne_MAB 15/29
Melbourne_SNWA 20/28

**Consider including lower-coverage samples?**

1. Call haplotypes per individual, see `10_GATK_HaplotypeCaller-perindiv.sh`

```
gatk \
	HaplotypeCaller \
	--java-options "-Xms60G -Xmx90G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
	-R ${REFDIR}/${REF} \
	-ERC GVCF \
	-mbq 20 \
	-I ${ALIGNDIR}/${SAMPLE}_bb_ktrim_DR_RG_IR_SORT.bam \
	-O ${OUTDIR}/${SAMPLE}_bb_ktrim_DR_RG_IR_SORT.bam.g.vcf
```

2. Import into a database per chromosome, see `11_GATK_GenomicsDBImport.sh`

```
gatk --java-options "-Xmx240g -Xms120g -XX:ParallelGCThreads=2" GenomicsDBImport \
	--sample-name-map sample_map \
	--genomicsdb-shared-posixfs-optimizations true \
	--batch-size 20 \
	--merge-input-intervals \
	--genomicsdb-workspace-path ${OUTDIR}/${CHR} \
	-L ${CHR}
```

Completed in ~18 hours.

3. Joint genotyping, see `12_GATK_JointGenotyping.sh`

```
gatk --java-options "-Xmx2g -Xms2g -XX:ParallelGCThreads=2" GenotypeGVCFs \
	-R ${REFDIR}/${REF} \
	-V gendb:///scratch2/workspace/katrinaphill_umass_edu-cc_WGR/VCFs/${CHR} \
	-O ${OUTDIR}/WGR_82_Cc_${CHR}.vcf
```

4. Select variants, see `13_GATK_selectVariants.sh`

```
gatk \
	SelectVariants \
	--java-options "-Xms60G -Xmx90G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
	-R ${REFDIR}/${REF} \
	--select-type-to-include SNP \
	--select-type-to-include INDEL \
	-L ${CHR} \
	-V ${INDIR}/${VCF} \
	-O ${OUTDIR}/${SelV_VCF}
```

5. Filter variants, see `14_GATK_FilterVariants.sh`

```
for CHR in {473..500}; do
	input_vcf=/scratch2/workspace/katrinaphill_umass_edu-cc_WGR/SelectVariants_VCFs/SelV_82_Cc_NC_064${CHR}.1.vcf #nee tvcf
	fileprefix=$(echo NC_064$CHR.1)
	vcftools --vcf $input_vcf --min-meanDP 5 --max-meanDP 200 --recode --mac 1 --out /scratch2/workspace/katrinaphill_umass_edu-cc_WGR/filtered_VCFs/${fileprefix} #nee /fvcf/
done
```

6. Concatenate the filtered VCFs, see `15_GATK_Concat_filteredvcf.sh`

```
bcftools concat $INDIR/NC_064*.recode.vcf -o /scratch2/workspace/katrinaphill_umass_edu-cc_WGR/filtered_VCFs/Cc_82_combined_filtered.vcf
```
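The PCA below reads `Cc_WGR.eigenvec`/`.eigenval`, which match PLINK `--pca` output; the notes don't show that step, so here is a plausible version (a sketch; PLINK 1.9 assumed, 20 PCs to match the R code, and the `--allow-extra-chr`/`--set-missing-var-ids` flags handle non-human chromosome names and unnamed variants):

```
# Convert the combined VCF to PLINK binary format, then run PCA.
plink --vcf Cc_82_combined_filtered.vcf \
	--allow-extra-chr --set-missing-var-ids @:# \
	--make-bed --out Cc_WGR

plink --bfile Cc_WGR --allow-extra-chr --pca 20 --out Cc_WGR
```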
PCA of the VCF file

```
# R script using the eigenvec and eigenval output

# read in data
pca <- read_table("./VCF/Cc_WGR.eigenvec", col_names = FALSE)
eigenval <- scan("./VCF/Cc_WGR.eigenval")

# sort out the pca data
# remove nuisance column
pca <- pca[,-1]

# set names
names(pca)[1] <- "ind"
names(pca)[2:ncol(pca)] <- paste0("PC", 1:(ncol(pca)-1))

# sort out by population/location
loc <- rep(NA, length(pca$ind))
loc[grep("KEY", pca$ind)] <- "Keewaydin"
loc[grep("MB", pca$ind)] <- "Melbourne"

# remake data.frame
pca <- as_tibble(data.frame(pca, loc))

# first convert to percentage variance explained
pve <- data.frame(PC = 1:20, pve = eigenval/sum(eigenval)*100)

# make plot
a <- ggplot(pve, aes(PC, pve)) + geom_bar(stat = "identity")
a + ylab("Percentage variance explained") + theme_light()

# calculate the cumulative sum of the percentage variance explained
cumsum(pve$pve)

# plot pca
b <- ggplot(pca, aes(PC1, PC2, col = loc)) + geom_point(size = 3) +
	scale_colour_manual(values = c("red", "blue")) +
	coord_equal() + theme_light() +
	xlab(paste0("PC1 (", signif(pve$pve[1], 3), "%)")) +
	ylab(paste0("PC2 (", signif(pve$pve[2], 3), "%)"))
```

![vcf_PCA](https://hackmd.io/_uploads/B1za1cd4Jx.png)

Not the same two outliers as the ones from the ANGSD PCA (these two are from a different rookery).

Optional: snpEff.sh to annotate variant sites.

Another PCA analysis using the package SNPRelate ([tutorial](https://www.bioconductor.org/packages/release/bioc/vignettes/SNPRelate/inst/doc/SNPRelate.html))

```
library(gdsfmt)
library(SNPRelate)

# CONVERT PLINK FILES TO GDS
bed.fn <- "./Cc_WGR_numericChr.bed"
fam.fn <- "./Cc_WGR_numericChr.fam"
bim.fn <- "./Cc_WGR_numericChr.bim"

# Convert
snpgdsBED2GDS(bed.fn, fam.fn, bim.fn, "test.gds")

# Summary
snpgdsSummary("test.gds")

# DATA ANALYSIS
# Open the GDS file
genofile <- snpgdsOpen("test.gds")

# Get population information
pop_code <- scan("pop.txt", what=character())

# Get sample id
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

# LD pruning
set.seed(1000)
# Try different LD thresholds for sensitivity analysis
snpset <- snpgdsLDpruning(genofile, ld.threshold=0.2)

# Get all selected snp ids
snpset.id <- unlist(unname(snpset))
head(snpset.id)

# Run PCA
pca <- snpgdsPCA(genofile, snp.id=snpset.id, num.thread=2)

# variance explained (%)
pc.percent <- pca$varprop*100
head(round(pc.percent, 2))

# assume the order of sample IDs is the same as the population codes
head(cbind(sample.id, pop_code))

# Make a data.frame
tab <- data.frame(sample.id = pca$sample.id,
	pop = factor(pop_code)[match(pca$sample.id, sample.id)],
	EV1 = pca$eigenvect[,1],    # the first eigenvector
	EV2 = pca$eigenvect[,2],    # the second eigenvector
	stringsAsFactors = FALSE)
head(tab)

# Draw PC1-PC2
plot(tab$EV1, tab$EV2, pch=19, col=as.integer(tab$pop), xlab="eigenvector 1", ylab="eigenvector 2")
legend("bottomright", legend=levels(tab$pop), pch=19, cex=1, col=1:nlevels(tab$pop))

# Plot the principal component pairs for the first four PCs:
lbls <- paste("PC", 1:4, "\n", format(pc.percent[1:4], digits=2), "%", sep="")
pairs(pca$eigenvect[,1:4], col=tab$pop, pch=19, labels=lbls)
```

![PCA_1-2](https://hackmd.io/_uploads/ryfgyWI9Jg.png)
![PCA-grid_1-4](https://hackmd.io/_uploads/HkMl1WIckg.png)

As an aside, analysis of relatedness: identity by descent (IBD) using the PLINK method of moments (MoM)

```
## Relatedness - Keewaydin
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))
KEY.id <- sample.id[pop_code == "Keewaydin"]

# Estimate IBD coefficients
ibd <- snpgdsIBDMoM(genofile, sample.id=KEY.id, snp.id=snpset.id, maf=0.05, missing.rate=0.05, num.thread=2)

# Make a data.frame
ibd.coeff <- snpgdsIBDSelection(ibd)
head(ibd.coeff)

plot(ibd.coeff$k0, ibd.coeff$k1, xlim=c(0,1), ylim=c(0,1),
	xlab="k0", ylab="k1", main="KEY samples (MoM)")
lines(c(0,1), c(1,0), col="red", lty=2)
```

k0 = IBD coefficient, the probability of sharing ZERO alleles IBD
k1 = IBD coefficient, the probability of sharing ONE allele IBD
(For reference: unrelated pairs expect k0=1, k1=0; parent-offspring pairs k0=0, k1=1; full siblings k0=0.25, k1=0.5.)

Not entirely sure how to interpret, but there seems to be higher relatedness among Melbourne Beach samples than Keewaydin samples, which would be surprising.

![kinship-MoM-sidebyside-KEY-MB](https://hackmd.io/_uploads/S1ntlWI5yx.png)
![kinship-MoM-all](https://hackmd.io/_uploads/HJ_ygbU9kl.png)
