maripav
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Write
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Make a copy
    • Transfer ownership
    • Delete this note
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Help
Menu
Options
Engagement control Make a copy Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Write
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # COVID mutations analysis --- ## Summary We analyzed the variants within patients (intra-host diversity) in combination with the variants in the *covid* population. Thus, we had *two* levels of variability. Within and between patients. The goal of the analysis was to understand the within-patient variability. The following sub-questions were tackled: (i) can we trust the within patient variability; (ii) How should the frequency of the within-patient variability be modeled; (iii) Can we infer selective forces by combining the within-patient and between-patient variability? Briefly, the results show that the within-patients frequencies are noisy and characterized by large variance. Also, at least, low frequency variants cannot be trusted and they may often be artifacts. Finally, by combining the within-patient and between-patient variability we may infer mutations that may be under selection forces even though they are - still - in low frequency. ## Methods ### Data **Greek samples**: NGS samples from the Greek cohort were obtained by following the **paragon** protocol. The primer set comprises the following locations. The BED file is located here: [The **paragon** BED file](SARSCoV2.paragon.bed) Primer-sets are important because in the analysis the part of the short reads that corresponds to the primer locations were soft-clipped. Thus, they didn't participate in the SNP calling analysis. ### Data Location All analysis is located in the folder `/home/pavlos/covid_analysis_pavlos`. The directory is actually hosted in the msc-server (ip: 139.91.75.56, directory:/home/pavlos/covid). This directory is mapped to `rantaplan`, in the folder `covid_analysis_pavlos`, and in `zoya`, in the folder `/home/pavlos/covid_analysis_pavlos`. The *signal-pipeline* has been performed within the directory `covid-19-signal`. There, we can see the following directories: * `resources`: contains different files necessary for the analysis such as the primer sets. We have put there the **paragon** primer sets. By default, the directory contains (in the folder 'primer_schemes') the artic_v3 and the liverpool primer scheme. * Note: in our primer scheme (the paragon), described in the file `SARSCoV2.paragon.bed`, I have substracted 1bp from the beginning of the left end of the left primer. This is because the primer masking was not performed correctly during the `ivar` process. * `data`: The data directory contains mainly fasta and gff3 files necessary for the analysis. The `ls` command, returns the following files: ```{text} pavlos@curie:~/covid/covid-19-signal/data$ ls composite_human_viral_reference.fna composite_human_viral_reference.fna.sa MN908947.3.gbk composite_human_viral_reference.fna.amb GRC38_no_alt_analysis_set.fna MN908947.3.gff3 composite_human_viral_reference.fna.ann Kraken2 NexteraPE-PE.fa composite_human_viral_reference.fna.bwt MN908947.3.fasta composite_human_viral_reference.fna.pac MN908947.3.fasta.fai ``` * `/home/pavlos/covid/covid-19-signal/tal_results_dir`: contains all the results. The results are organized in folders, one per sample, for the signal pipeline. Furthermore, we have created some additional folders, the `pair_alignments`, the `snp_consistency` and the `depth` and `consensus` which contain some specific analysis as we will describe later in the text. * `/home/pavlos/covid/scripts`: Here we have saved the scripts, mainly R and perl, for the analyses. The `ls` command gives the following output: ```{text} ## usage: ./align_sequences.pl - | -sam <INPUT FILE> -goodAl <OUTPUT FILE> -alignments <OUTPUT FILE> -stats <OUTPUT FILE> -misal <OUTPUT FILE> ## performs needleman-wunsch pairwise alignment. It is used to discover whether pair-reads match together or they are characterized by mismatches or indels. A strange thing in the data is that despite paired reads should start/end at the same location, they often don't do so. Probably because larger fragments were obtained in the first place. Also there are several mismatches. align_sequences.pl --------------------------------------------------------------------------------------- ## Given some positions on the genome, this script just goes into the mpileup file to count what nucleotides were found at certain positions. It is used to study whether certain variants found to be polymorphic within individuals from Greece, are also polymorphic in other locations (especially those that the variants do not segregate in the population (e.g. Brazil)) analyze_mpileup.R --------------------------------------------------------------------------------------- ## It reads the gisaid alignment and a list of interesting positions (in our case, these are the polymorphic positions in our samples from the Talianidis lab). Then, given the positions it maps the positions back to the reference genome *in the alignment* to find the new positions in the aligment (maybe these are different due to the '-' character') and it reports: - per continent - per country The states that are found in these genomic locations. This script was used to find out whether the within-individuals polymorphic positions are also polymorphic in different populations around the globe continents_info_gisaid.R --------------------------------------------------------------------------------------- ## To script to exei grapsei h Maria ## TODO ... description double_covered_regions.r --------------------------------------------------------------------------------------- ## scripts from the UCSC to convert fa to Tab and VCF format respectively faToTab faToVcf --------------------------------------------------------------------------------------- ## Creates a bed file with amplicons instead of primers ./get_amplicons_from_primers.pl -in <PRIMER FILE> As input, it uses the bed file for the primers (paragon file). The output looks like that: NC_045512.2 29482 29574 340 nCoV-2019_2 + NC_045512.2 29570 29666 341 nCoV-2019_1 + NC_045512.2 29655 29751 342 nCoV-2019_2 + NC_045512.2 29731 29844 343 nCoV-2019_1 + get_amplicons_from_primers.pl --------------------------------------------------------------------------------------- ## Creates a bed file with amplicons instead of primers ./get_amplicons_from_primers.pl -in <PRIMER FILE> As input, it uses the bed file for the primers (paragon file). The output looks like that: NC_045512.2 29482 29574 340 nCoV-2019_2 + NC_045512.2 29570 29666 341 nCoV-2019_1 + NC_045512.2 29655 29751 342 nCoV-2019_2 + NC_045512.2 29731 29844 343 nCoV-2019_1 + get_amplicons_from_primers.pl --------------------------------------------------------------------------------------- ## A: keep snp positions from ivar.tsv # saved at: '/home/maria/covid_analysis_pavlos/gisaid/our_SNPpos.txt' #B: make matrix of snps from all samples: #PINAKAS ME ROWS=POSITIONS, COLS=SAMPLES kai VALUES=ALT_FREQ #C: same pinakas as B, containing a column for the gisaid samples as well # all (onoma) our_snpPOS.r --------------------------------------------------------------------------------------- ## it plots the coverage and the number of primers per position. Primers may overlap so in some regions, there are two amplicons that are present and therefore the coverage is greater. plot_depth.R --------------------------------------------------------------------------------------- ## it plots the coverage and the number of primers per position. Primers may overlap so in some regions, there are two amplicons that are present and therefore the coverage is greater. plot.R --------------------------------------------------------------------------------------- ## TODO ## It prints the number of mismatches and gaps of the paired end reads. plot_stats_tsvs.R --------------------------------------------------------------------------------------- ## TODO plot_tsvs.R --------------------------------------------------------------------------------------- ## TODO check if this is correct Prints out the number of mismatches and gaps from the paired end reads. posListScore2.pl --------------------------------------------------------------------------------------- ## samme as posListScore2.pl posListScore3.pl --------------------------------------------------------------------------------------- ## samme as posListScore2.pl posListScore.pl --------------------------------------------------------------------------------------- ## converts the tab format from paragon to bed format ./primTabtoBed.pl -in <FILE>\n\n primTabtoBed.pl --------------------------------------------------------------------------------------- # PERIGRAFI: # kaloume to IVAR gia ta BAMS tou kathe amplicon # kanoume merge ta BAMS apo ta diadoxika amplicon kai kaloume to IVAR ## First run the script split_all_bams.r (it's in the scripts folder) to split the bams of each sample ## into separate bam files, one for each amplicon ## go to the ~/covid_analysis_pavlos/covid-19-signal/tal_results_dir/snp_consistency and type 'Rscript split_all_bams.r' ## then each folder will contain SAM files ## activate the conda environment ## ACTIVATE CONDA OUTSIDE THE SCRIPT ## conda activate /home/maria/SNAKE_COVID/conda/5258de7f ## 1. go to the ~/covid_analysis_pavlos/covid-19-signal/tal_results_dir/snp_consistency directory runIvarForSample.sh --------------------------------------------------------------------------------------- ## It runs ivar after producing the mpileup file runIvar.sh --------------------------------------------------------------------------------------- ## # Xwrizoume ta BAM arxeia kathe deigmatos # se mini SAMs pou to kathe ena periexei ta reads pou proerxontai apo ena amplicon #input: */core/*_viral_reference.mapping.primertrimmed.sorted.bam #output dir: covid_analysis_pavlos/covid-19-signal/tal_results_dir/snp_consistency ### NOTE: edw exw kalesei to consistencyCheck.pl (9/2/2021) ### to opoio metaonomastike kai egine store sto: scripts/splitBAM_to_amplicons.pl split_all_bams.r --------------------------------------------------------------------------------------- ## "It checks the amplicon overlap regions to study whether amlplicons from different primers give consistent results\n./s\ amtools view -h BAM | consistencyCheck.pl -bed <FILE WITH PRIMERS INFO *> -snp <SNP FILE>\n\n\t* CHROM LEFT_PRIMER_START RIGHT_PRIM\ ER_END\n\n" splitBAM_to_amplicons.pl --------------------------------------------------------------------------------------- ## prints out the p-values for the test: The variants in the current position do not depend on the major state on the next positon, based on the HyperGeometric distribution transition_prob3.R --------------------------------------------------------------------------------------- ## it finds out whether there is a bias on the type of mutations (2nd more frequent base per site) given the state in the next position. This version is for the gisaid data transition_prob_gisaid.R --------------------------------------------------------------------------------------- ## TODO variant_consistency_statistics.R ``` * `/home/pavlos/covid_analysis_pavlos/gisaid`: contains gisaid data and some subsets of it (brazilian samples). The version of the gisaid data is the 0312 (March 12th, 2021) ### The signal pipeline The signal pipeline is described nicely on the log files generated when the signal pipeline was executed. The signal pipeline is described in this file: [The signal pipeline](2021-01-28T194932.692546.snakemake.log). #### Within patients variability The within-patient-variability was estimated by the typical process of mapping short-read to the reference genome and following the **SIGNAL** pipeline. For the Greek samples used in the analysis, we follow the exact SIGNAL pipeline to identify as accurately as possible the mutations present in the samples as well as their frequency. For the Brazilian samples we followed a simpler pipeline, in which only mapping (bwa) and `samtools mpileup` was used, since we were interested only in the bases that are mapped at certain locations. ##### Generating a profile for the within-patient variability We developed R-functions to process the *mpileup* files (bcf files). Note here, that we do not perform variant calling. **Instead we count all A,C,G,T states that have been called no matter the quality of the call**. We argue for not performing the whole (more strict) variant calling pipeline because the information from many individuals is summed up. There is no reason of systematic bias for all individuals. Furthermore, we really do care about **the most frequent and the second-most-frequent** base call, which probably shouldn't be mistakes. The resulting table from all the within-patient variability is a `4xL` matrix (normalized or unormalized), where `L` is the length of the viral genome. ![Normalized-per-column count of states (from the mpileup file)](https://i.imgur.com/GanvLCV.png) ##### Normalization of the within-patient variability **For a single position**, each individual does not have the same number of reads. Thus, individuals with more reads at a position, provide more information for this position, affecting the total table more than other individuals. To correct for this (that perhaps represents a bias), we normalized the number of reads to a certain fixed number of reads (here we used 10,000, but actually this number doesn't play any role since eventually we do care only about the frequences.) #### Between patients divergence The between-patient-variability was measured by data from the *GISAID* website. We used the 0312 version (i.e. March 12th, 2021) (full data). Data are already aligned with MAFFT (details are not provided by the GISAID website). ##### Generating a profile for the between-patient divergence We developed an R-script to process the GISAID MSA and *count the occurences of each state for every position* (only for the A,C,G,T states). The main function is illustrated here: ```{r} ## alphabet = c(A,C,G,T) getFeatures <- function(alignment, pos){ res <- list() columns <- as.numeric(refPos["aligned.ref", as.character(pos)]) for(i in columns){ res[[length(res)+1]] <- table(factor(alignment[,i], levels=alphabet)) } names(res) <- as.character(pos) return(res) } ```

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully