aglucaci

@aglucaci

Joined on Jan 21, 2021

  • Title Evolutionary analyses of Vertebrate Nucleoporin 214 (Nup214): Insights into a Fundamental Component of the Nuclear Pore Complex Results Figure 1. Phylogenetic Tree Figure 2. Purifying and Positive selection in vertebrate Nup214 You can view the interactive results here: http://vision.hyphy.org/FEL?resultsUrl=https://raw.githubusercontent.com/aglucaci/AOC-Nup214/main/results/Craniata/nup214_processed_distributed_recombinationfree.fas.FEL.json We find a high degree of purifying selection (2266/5716) ~40% (All sites). We find a high degree of purifying selection (2266/3594) ~63% (Non-invariant sites). We find 2122 invariant sites out of 5716 sites, ~37%. Positive selection (re: adaptation) in 46/3594 = 1.3% of sites. Note: These need to be mapped to the Human reference
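The percentages quoted in the Nup214 note can be re-derived from the four site counts it gives. The following is a minimal sketch that only redoes that arithmetic; the variable names and the `percent` helper are illustrative and are not taken from the AOC-Nup214 repository.

```python
# Site counts as quoted in the Nup214 note (FEL results).
TOTAL_SITES = 5716
INVARIANT_SITES = 2122
PURIFYING_SITES = 2266
POSITIVE_SITES = 46

# Non-invariant sites are whatever remains after removing invariant ones.
VARIABLE_SITES = TOTAL_SITES - INVARIANT_SITES


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "purifying_all_sites": percent(PURIFYING_SITES, TOTAL_SITES),
    "purifying_variable_sites": percent(PURIFYING_SITES, VARIABLE_SITES),
    "invariant_all_sites": percent(INVARIANT_SITES, TOTAL_SITES),
    "positive_variable_sites": percent(POSITIVE_SITES, VARIABLE_SITES),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

The two purifying-selection figures differ only in the denominator: all 5716 sites versus the 3594 non-invariant sites.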
  • Observable page: https://observablehq.com/@aglucaci/visualizing-selection-analysis-results-for-evolution-of-t/11 WHO Variant tracking page: https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/ Outbreak.info page for C.37: https://outbreak.info/situation-reports?pango=C.37 CoVSpectrum page for C.37: https://cov-spectrum.ethz.ch/explore/Switzerland/AllSamples/AllTimes/variants/json=%7B%22variant%22%3A%7B%22name%22%3A%22C.37%22%2C%22mutations%22%3A[]%7D%2C%22matchPercentage%22%3A1%7D Data retrieved from GISAID on July 7th, 2021: 2,127 sequences. (Parameters: low-coverage sequences excluded.)
  • The Wuhan reference has amino acid L at this site, so we are looking for L452 to anything. This initial look excludes Mu (B.1.621). Also of note, it excludes any codon coding for leucine (L), meaning synonymous changes at L452 are not counted. L452 to anything WILDTYPE_452 = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"] Initial software and exports are located in
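The L452 screen described in the note can be sketched as follows. Only the `WILDTYPE_452` list comes from the note; the function names, the assumption that the input is an in-frame, gap-free spike coding sequence, and the choice to skip ambiguous codons are illustrative and may differ from the original software.

```python
# The six leucine codons, as listed in the note.
WILDTYPE_452 = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]


def codon_at(cds: str, site: int) -> str:
    """Return the codon at 1-based amino-acid position `site` of an in-frame CDS."""
    start = 3 * (site - 1)
    return cds[start:start + 3].upper()


def is_l452_change(spike_cds: str, site: int = 452) -> bool:
    """True if the codon at `site` is a complete, unambiguous non-leucine codon."""
    codon = codon_at(spike_cds, site)
    # Assumption: codons that are truncated or contain gaps/ambiguity codes
    # are skipped rather than counted as changes.
    if len(codon) != 3 or set(codon) - set("ACGT"):
        return False
    return codon not in WILDTYPE_452


# Toy sequences: 451 alanine codons, then the codon of interest at site 452.
prefix = "GCT" * 451
print(is_l452_change(prefix + "CTG"))  # leucine codon, not counted
print(is_l452_change(prefix + "CAG"))  # glutamine codon (L452Q), counted
```

Testing membership in the full codon list, rather than comparing against a single reference codon, is what keeps synonymous leucine changes out of the count.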
  • Authors: Alexander G Lucaci1, Jordan D Zehr1, Stephen D Shank1, Darren Martin2, Anton Nekrutenko3, Sergei L Kosakovsky Pond1 Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, USA Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA Objectives We identify genomic sites in μ (B.1.621, Latif et al. 2021) clade sequences that may be subject to selective forces and could be prioritized for further studies, but have not yet reached high frequencies. Our approach
  • Last update 3/4/2021 Data in: /home/aglucaci/FMM_SARS2/2021/SARS-CoV-2 /home/aglucaci/FMM_SARS2/2021/SARS-CoV-2/data/fasta/1 Data curation Ran through a modified version of SARS2 pipeline https://github.com/veg/SARS-CoV-2/tree/master
  • Abstract Introduction Results Quick summary text In general, there is not much positive selection, but there is extensive negative selection in the main functional NGF domain. The BUSTEDS result supports this: it did not find evidence for gene-wide positive selection. If we dig deeper with more sensitive selection analyses, MEME shows some activity in the early half of the gene (loosely, a prepro/regulatory region). FEL shows a lot of negative selection in the latter half of the gene (loosely, the NGF domain).
  • An application for user-friendly, state-of-the-art molecular sequence analysis. Software availability GitHub repository https://github.com/aglucaci/GeneInvestigator Retrieve input data from NCBI Orthologs Transcript and Protein data from orthologous sequences. For example, if we are interested in the TP53 gene: https://www.ncbi.nlm.nih.gov/gene/7157/ortholog/?scope=117570&term=TP53
  • Thoughts Use: https://www.ncbi.nlm.nih.gov/kis/ortholog/627/?scope=7776 for data ~163 species Use 1 gene per species. Subtranscripts would just be copies of the "main" transcript and may also bias the dataset. Don't bealign. We can map back to human sites (meaning sites corresponding to the human BDNF) since that is what we are interested in
  • The Problem When you set out to ask a question such as "How has the selective pressure on gene X changed between N species?", one might think it necessary to gather as many sequences as possible. However (speculation), there may be a point of diminishing returns, where less is actually more. Capturing sufficient diversity while reducing the total number of sequences in your dataset might be a useful place to start. As datasets grow larger and larger, methods will have to adapt to handle the workloads... etc Abstract Here, we present an evaluation of two subsampling procedures based on (1) trimming a phylogeny or (2) reducing an alignment based on genetic distances. We run both the "full" dataset and our subsampled datasets through a series of standard techniques in the molecular evolution toolkit. These include FEL and MEME, available through the widely used HyPhy software suite. Our results on empirical and synthetic datasets indicate that an optimal subsampled alignment exists for alignments or phylogenies with short branches (clarify this). The subsampling procedure is applicable to datasets with short branches (or distances) or in situations with large datasets where direct computation is impractical or computationally burdensome (infeasible?). Organization of the work Evaluation with Treemmer Empirical alignment with p53
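To make procedure (2) from the subsampling note concrete, here is a minimal sketch of one possible way to reduce an alignment by genetic distance. It is not the authors' procedure: the greedy strategy, the use of uncorrected p-distance, the `min_distance` threshold, and both function names are assumptions chosen for illustration.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap or N in either sequence."""
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def subsample_by_distance(alignment: Dict[str, str], min_distance: float) -> Dict[str, str]:
    """Greedily keep a sequence only if it is at least `min_distance` from every kept sequence."""
    kept: Dict[str, str] = {}
    for name, seq in alignment.items():
        if all(p_distance(seq, other) >= min_distance for other in kept.values()):
            kept[name] = seq
    return kept


# Toy alignment: b duplicates a, c differs from a at one site, d at three sites.
toy = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAC",
    "c": "ACGTACGTAA",
    "d": "TTTTACGTAC",
}
print(list(subsample_by_distance(toy, min_distance=0.2)))
```

A greedy pass like this depends on input order, so results may change if the alignment is reordered; a tree-based approach such as Treemmer, which the note also evaluates, avoids that particular sensitivity.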