Viruses are as old as life itself, infecting everything from bacteria to humans. Since the very beginning, viruses have left an indelible imprint on the human genome, human history, and medical research. SARS-CoV-2, the viral pathogen responsible for the COVID-19 pandemic, is just the latest in a long line of highly impactful human pathogens, including smallpox, Influenza A virus (IAV), and HIV-1. By some estimates, close to 30% of human proteins are involved in combating viral infections. Viral pathogens, especially RNA viruses such as HIV-1, IAV, and SARS-CoV-2, mutate very quickly, with some viruses generating every possible single-nucleotide mutation in each new viral generation. Immune responses, antiviral drugs, competition between viral strains, and transmission between species and hosts create some of the strongest evolutionary forces reported in evolutionary biology. Viruses are also among the most sequenced organisms in existence (e.g., ~15M SARS-CoV-2 genomes were sequenced in 2020-2022).

The mission of the Center for Viral Evolution at Temple University is to:
Create computational and statistical learning approaches for the analysis of genomic data from rapidly evolving viral pathogens.
Develop scalable software tools for processing large volumes of viral sequence data and delivering actionable and interpretable results.
Apply these tools and techniques to learn how the past evolution of viruses informs their present ability to adapt to our responses, and to predict the likely future paths viruses may take.
1/7/2023
BW Indexed in: https://hackmd.io/@hannahkimincompbio/Sk9T_TIBY
Writeup leader: Steven Weaver
Google doc draft 1: https://docs.google.com/document/d/1ERRQVBIyBt_98uRQ7f4EvkgT1L2xJ9pNT-pmHBd1MzA/edit?usp=sharing
Google doc draft 2: https://docs.google.com/document/d/1rnGZZZrcIzI6YtZlFgXri3j-mknFHXHonn_kK8h8U_g/edit?usp=sharing
Project board: https://github.com/users/stevenweaver/projects/2/views/1
Authors: Sergei Pond, Steven Weaver, Jordan D Zehr, Alexander Lucaci, Hannah Kim, Avery Selberg
Institutions: iGEM
Potential delivery date: November 19th, 2021 (earliest)
Abstract
Write last
1/7/2023
Your task is to write a Python script that implements a simple Neighbor Joining algorithm for phylogeny inference based on the Jukes-Cantor distance and runs reasonably fast for about 20 sequences. Do not use "prepackaged" routines from BioPython, other than to read sequence files.
Input: a FASTA multiple sequence alignment
Output: inferred phylogenetic tree with branch lengths (you can choose the format, but Newick format is standard)
For example:
$ python3 NJ.py --msa test.fas
...
(((human:0.01, chimp:0.02):0.03, gorilla:0.03, orangutan:0.03):0.01, gibbon:0.05)
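One possible shape for a solution, offered as a minimal sketch rather than a reference answer. It assumes the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3), where p is the fraction of mismatched sites, and it skips any column where either sequence has a gap or ambiguous base. All function names (`read_fasta`, `jc_distance`, `neighbor_joining`, `infer_tree`) are hypothetical, and the FASTA reader is written in plain Python only to keep the block self-contained.

```python
import math
from itertools import combinations


def read_fasta(path):
    """Read a FASTA alignment into {name: sequence} (hypothetical helper)."""
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = ""
            elif name is not None:
                seqs[name] += line.upper()
    return seqs


def jc_distance(a, b):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3); p is the mismatch fraction."""
    # Assumption: columns with gaps or ambiguous bases are skipped pairwise.
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC correction")
    return 0.0 if p == 0 else -0.75 * math.log(1 - 4 * p / 3)


def neighbor_joining(names, d):
    """Build a Newick string from distances keyed by frozenset({a, b})."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = dict(d)          # working copy; grows as internal nodes are added
    nodes = list(names)  # active nodes, each labelled by its Newick subtree
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        new = f"({i}:{li:.5f},{j}:{dij - li:.5f})"
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Resolve the last three nodes as a trifurcation at the (unrooted) root.
    a, b, c = nodes
    dab, dac, dbc = (d[frozenset(p)] for p in ((a, b), (a, c), (b, c)))
    la, lb, lc = (dab + dac - dbc) / 2, (dab + dbc - dac) / 2, (dac + dbc - dab) / 2
    return f"({a}:{la:.5f},{b}:{lb:.5f},{c}:{lc:.5f});"


def infer_tree(path):
    """FASTA alignment in, Newick string out."""
    seqs = read_fasta(path)
    names = list(seqs)
    d = {frozenset((a, b)): jc_distance(seqs[a], seqs[b])
         for a, b in combinations(names, 2)}
    return neighbor_joining(names, d)
```

Wiring `infer_tree` to an `--msa` argument (for instance with `argparse`) would give the command-line form shown in the task. Each join recomputes row sums and scans all pairs, so the run is roughly cubic in the number of sequences, which is fine for about 20. Neighbor Joining can produce small negative branch lengths on noisy data; this sketch reports them unchanged.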
10/24/2022
From Drabeck et al. "Tall" dataset.
This dataset has unusual dimensions: it is short (30 codons) and has relatively many (199) sequences. This creates some statistical issues that could be impactful.
Many branch lengths are 0 (for example, HyPhy collapses 130 internal tree branches because they are 0). This is not biologically realistic. If possible, it would be better to estimate branch lengths from a longer gene alignment, even if not all species are present.
The precision with which non-zero branch lengths are estimated is degraded (and the estimates are likely biased). This could create downstream issues with all methods.
Methods that draw power from sequence length (e.g. aBSREL, BUSTED, and also PAML) are going to suffer power loss.
Basic data exploration.
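As a first exploration step, one could count how many branch lengths in a fitted tree are effectively zero. The following is a minimal sketch, assuming the tree is available as a Newick string whose labels contain no colons. The function names and the tolerance `tol` are illustrative choices, not HyPhy's own collapsing criterion.

```python
import re

# Matches the number after each ':' (plain or scientific notation).
_LENGTH = re.compile(r":\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)")


def branch_lengths(newick):
    """Return every branch length in the Newick string, in order of appearance."""
    return [float(x) for x in _LENGTH.findall(newick)]


def zero_length_summary(newick, tol=1e-8):
    """Return (number of branches with |length| <= tol, total number of branches)."""
    lengths = branch_lengths(newick)
    zeros = sum(abs(x) <= tol for x in lengths)
    return zeros, len(lengths)
```

Called on a tree string, `zero_length_summary` returns the count of near-zero branches and the total, which makes it easy to compare the same species tree fitted to this short alignment and to a longer gene.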
9/19/2022