# FEL with parametric bootstrap and much better visualization.
> Improving method performance for smaller alignments/test sets (HyPhy versions 2.5.33 and later)
FEL (Fixed Effects Likelihood) is a tool that we [originally developed in 2005](https://academic.oup.com/mbe/article/22/5/1208/1066893) to perform a "non-parametric" test of natural selection acting on individual alignment sites.
The method estimates, site by site, a pair of evolutionary substitution rates, α (synonymous substitutions) and β (non-synonymous substitutions), and performs a statistical hypothesis test: <tt>is α = β?</tt>. If the null hypothesis is rejected at some significance level (e.g. p≤0.05), then selection is inferred: negative/purifying if β < α and positive/diversifying if β > α.
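The decision rule for a single site can be sketched in a few lines of Python. This is an illustration only, not HyPhy code: the function names are hypothetical, the log-likelihoods of the null (α = β) and alternative (α and β free) fits are assumed to be already available, and the asymptotic p-value is assumed to come from a chi-square distribution with one degree of freedom. It follows the usual convention that purifying selection corresponds to β < α.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """Asymptotic p-value of a likelihood ratio test, assuming chi-square with 1 df."""
    # The test statistic cannot be negative; clamp small numerical noise to zero.
    lrt = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lrt / 2.0))


def classify_site(alpha, beta, p_value, threshold=0.05):
    """Label a site from its rate estimates and p-value (hypothetical helper)."""
    if p_value > threshold:
        return "not significant"
    return "negative" if beta < alpha else "positive"


p = lrt_p_value(lnl_null=-101.92, lnl_alt=-100.0)
print(classify_site(alpha=1.0, beta=0.1, p_value=p))
```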
The significance for the test is derived using standard asymptotics, which [work well if the sample size is large enough](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0094534). In this case, the sample size is roughly the number of tested branches, which can be small (even ONE!). Motivated in part by our own analysis of small samples (~20 sequences) of canine and feline coronaviruses, we modified FEL to use parametric bootstrap at each site to obtain significance. This is of course much more expensive (×K, where K is the number of bootstrap replicates), but should result in a more accurate characterization of the null distribution of the test statistic and better detection of non-neutral evolution.
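As a rough illustration of how a bootstrap p-value is obtained, the sketch below takes the observed test statistic and the K statistics computed from data simulated under the null model, and reports the fraction of replicates that are at least as extreme. The function name is hypothetical, and the add-one correction (which keeps the p-value away from exactly zero) is a common convention assumed here, not necessarily what HyPhy does.

```python
def bootstrap_p_value(observed_lrt, null_lrts):
    """Empirical p-value from K test statistics simulated under the null model."""
    k = len(null_lrts)
    if k == 0:
        raise ValueError("at least one bootstrap replicate is required")
    # Count replicates whose statistic is at least as large as the observed one.
    exceed = sum(1 for value in null_lrts if value >= observed_lrt)
    # Add-one correction (an assumption): the smallest possible p-value is 1 / (K + 1).
    return (exceed + 1) / (k + 1)


print(bootstrap_p_value(5.0, [0.1, 0.5, 6.0, 1.0]))
```

With this convention, the resolution of the p-value is limited by K, so testing at p≤0.05 needs noticeably more than 20 replicates.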
Another context in which FEL is often used is the estimation of site-by-site dN/dS (ω). These estimates are going to be quite noisy, and generally we do not recommend using them directly. But when coupled with some degree of uncertainty assessment, the estimates will be more useful. To make this possible, we added an option to compute profile likelihood confidence intervals for each site.
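To make the idea of a profile likelihood interval concrete, here is a minimal sketch under stated assumptions. It assumes the profile log-likelihood has already been evaluated on a grid of ω values (in a real analysis the other parameters would be re-optimized at each grid point), and it uses the standard cutoff of 1.92 log-likelihood units, half the 95% critical value of a chi-square distribution with one degree of freedom. The function name and the toy likelihood curve are hypothetical.

```python
def profile_interval(values, log_likelihoods, cutoff=1.92):
    """Bounds of the grid values whose log-likelihood is within `cutoff` of the maximum.

    Assumes a single-peaked profile; with several peaks the bounds
    would span all of the accepted regions.
    """
    if not values or len(values) != len(log_likelihoods):
        raise ValueError("values and log_likelihoods must be non-empty and equal length")
    best = max(log_likelihoods)
    inside = [v for v, lnl in zip(values, log_likelihoods) if best - lnl <= cutoff]
    return min(inside), max(inside)


# Toy profile (hypothetical): a parabola peaking at omega = 1.
grid = [i / 100 for i in range(0, 201)]
toy_lnl = [-2.0 * (w - 1.0) ** 2 for w in grid]
lower, upper = profile_interval(grid, toy_lnl)
print(lower, upper)
```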
These options are available via command line arguments (`--resample N`, where `N` is the number of bootstrap replicates to draw, and `--ci Yes` to compute confidence intervals) and the updated www.datamonkey.org submission page.
Finally, we completely reworked the FEL visualization page (http://vision.hyphy.org/FEL), which is now based on the ObservableHQ framework with targeted visualization design.
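For scripted analyses, the two options can be assembled into an argument list, for example to pass to `subprocess.run`. In the sketch below only `--resample` and `--ci` come from the description above; the helper name, the `hyphy fel` invocation and the `--alignment` argument are assumptions about a typical HyPhy command line and should be checked against your installation.

```python
def build_fel_command(alignment, replicates=0, compute_ci=False):
    """Assemble a FEL command line as a list of arguments (hypothetical helper)."""
    # `hyphy fel --alignment <file>` is assumed here; verify against your HyPhy version.
    command = ["hyphy", "fel", "--alignment", alignment]
    if replicates > 0:
        # Number of parametric bootstrap replicates to draw per site.
        command += ["--resample", str(replicates)]
    if compute_ci:
        # Request confidence intervals for the site-level estimates.
        command += ["--ci", "Yes"]
    return command


print(" ".join(build_fel_command("example.fas", replicates=100, compute_ci=True)))
```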
### Site-by-site estimates of dN/dS with uncertainty quantification
>Maximum likelihood estimates of dN/dS at each site, together with estimated profile confidence intervals (if available). dN/dS = 1 (neutrality) is depicted as a horizontal gray line.
![](https://i.imgur.com/Hbg0n9Y.png)
### Site-by-site estimates of individual rates
>Maximum likelihood estimates of synonymous (α) and non-synonymous rates (β) at each site shown as bars. The line shows the estimates under the null model (α=β). Estimates above 10 are censored at this value.
![](https://i.imgur.com/0cHMLO7.png)
### Comparing asymptotic and bootstrapped p-values
Compare the level of agreement between the two sets of p-values and identify which sites differ in classification (crosses).
![](https://i.imgur.com/mT8D3wx.png)
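The same comparison can be done outside the visualization. The sketch below, a hypothetical helper rather than part of HyPhy, takes per-site asymptotic and bootstrap p-values and returns the 1-based indices of sites that are significant under one test but not the other at a chosen threshold.

```python
def discordant_sites(asymptotic_p, bootstrap_p, threshold=0.05):
    """1-based indices of sites whose significance call differs between the two tests."""
    if len(asymptotic_p) != len(bootstrap_p):
        raise ValueError("both p-value lists must cover the same sites")
    return [
        site
        for site, (a, b) in enumerate(zip(asymptotic_p, bootstrap_p), start=1)
        if (a <= threshold) != (b <= threshold)
    ]


print(discordant_sites([0.01, 0.2, 0.04, 0.06], [0.02, 0.3, 0.08, 0.03]))
```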
### Identify the degree of agreement between asymptotic and bootstrap p-value distributions
Here's an example of good and poor agreement, depending on the site.
![](https://i.imgur.com/1DDympc.png)

