(AOC) Mammalia: BDNF

Abstract

Introduction

Results

Quick summary text
In general, there is little evidence of positive selection, but extensive negative selection in the main functional NGF domain.

This is supported by the BUSTEDS result, which did not find evidence for gene-wide positive selection.

Digging deeper with more sensitive site-level analyses, MEME shows some activity in the first half of the gene (loosely, a prepro/regulatory region).
FEL finds a lot of negative selection in the latter half of the gene (loosely, the NGF domain).
Some sites may be coevolving, as supported by BGM (an MCMC method for detecting intragenic coevolution). Is this confirmed by structure (left to do)?

aBSREL shows that few branches are under episodic selection; the few that are detected are… (which clades/species do these lead towards?)

Lineage assignment: remember we are looking at Mammalia, which we have subdivided into a number of clades. We only take clades with >= 3 species for further analyses (can we lower this? 1 branch analysis?). This leaves ~20 species unanalyzed with branch methods (RELAX/CFEL).

RELAX shows relaxed selection for several groups, meaning there is a move towards neutrality in the test group as compared to the reference (background). Relaxed selection lowers the "velocity" of selection; a classic example is the opsin gene family in nocturnal animals (bats, etc.). Interpretation?

CFEL?

SLAC, FUBAR, PRIME (need to be added. SLAC+FUBAR can confirm/increase confidence in FEL sites. Or plot the negative sites with the size of each bubble being the number of methods that detect the site. Go with MEME for episodic positive sites. Maybe ignore pervasive positive sites? Useful or no?)
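
The bubble-plot idea needs, for each site, the number of methods that report it. A minimal sketch of that tally follows; the function name and the site lists are hypothetical placeholders for illustration, not results from this analysis.

```python
from collections import Counter


def count_method_support(sites_by_method):
    """Map each site index to the number of methods that report it."""
    support = Counter()
    for sites in sites_by_method.values():
        # set() guards against a site being listed twice by one method
        for site in set(sites):
            support[site] += 1
    return dict(support)


# Hypothetical site lists, for illustration only.
example = {
    "FEL": [10, 42, 130, 131],
    "SLAC": [42, 130],
    "FUBAR": [42, 130, 131, 200],
}

support = count_method_support(example)
for site in sorted(support):
    print(site, support[site])
```

The resulting counts could be used directly as bubble sizes, with site position on the x-axis.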

FADE? What to root on? IQTREE automated rooting?

Figure 1. BUSTEDS analysis did not find evidence (LRT, p-value > 0.05) of gene-wide episodic diversifying selection in the selected test branches of our phylogeny.

Figure 2. MEME analysis of BDNF found 33 of 449 (7.35%) sites to be statistically significant (LRT p-value <= 0.1).

Figure 3. FEL analysis of BDNF found 185 of 449 (41.20%) sites to be statistically significant (p-value <= 0.1) for pervasive negative/purifying selection.

Figure 4. BGM analysis of BDNF found 67 pairs of coevolving sites, out of 449 total sites, to be statistically significant (posterior probability threshold 0.5).

Figure 5. aBSREL analysis of BDNF found 5 of 247 branches (colored in red) to be statistically significant (p-value <= 0.05) for episodic diversifying selection.

Internal notes on lineage assignment
23 Artiodactyla.clade
19 Carnivora.clade
17 Chiroptera.clade
3 Eulipotyphla.clade
23 Glires.clade
3 Perissodactyla.clade
25 Primates.clade
113 total

Should be 132, what happened to 132-113=19? The rest are in small clades with <3 members and are excluded from further analyses.
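
The bookkeeping in this note can be checked with a short sketch. The clade counts and the total of 132 species are taken from the notes above; the function name and the `MIN_CLADE_SIZE` constant are assumptions introduced for illustration.

```python
MIN_CLADE_SIZE = 3  # clades need >= 3 species to be used with RELAX/CFEL


def split_clades(clade_sizes, min_size=MIN_CLADE_SIZE):
    """Split clades into those kept for lineage analyses and those dropped."""
    kept = {c: n for c, n in clade_sizes.items() if n >= min_size}
    dropped = {c: n for c, n in clade_sizes.items() if n < min_size}
    return kept, dropped


# Clade sizes as listed in the internal notes on lineage assignment.
clade_sizes = {
    "Artiodactyla": 23,
    "Carnivora": 19,
    "Chiroptera": 17,
    "Eulipotyphla": 3,
    "Glires": 23,
    "Perissodactyla": 3,
    "Primates": 25,
}

kept, dropped = split_clades(clade_sizes)
total_in_tree = 132
analyzed = sum(kept.values())
unanalyzed = total_in_tree - analyzed
print(f"analyzed: {analyzed}, unanalyzed: {unanalyzed}")
```

Lowering `min_size` would show how many of the remaining species could be recovered, provided the small clades are added to `clade_sizes`.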

Figure 6. Patterns of natural selection across taxonomic groups under the Partitioned Descriptive model of the RELAX method. Selection profiles for BDNF are shown along Reference and Test branches for each taxonomic group. Three omega parameters and the relative proportion of sites they represent are plotted for Test (orange) and Reference (blue) branches. Only omega categories representing nonzero proportions of sites are shown. Neutral evolution corresponds to omega = 1.0 on the log10-scaled x-axis. The taxonomic groups shown are those where significant (p <= 0.05) evidence of relaxed selection was detected between test and reference branches.

RELAX interpretation note: intensified selection pushes all omega categories away from neutrality (omega = 1), whereas relaxed selection pushes all omega categories toward neutrality.
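
A small numerical sketch may make this note concrete. RELAX models the test-branch omegas as the reference omegas raised to a selection intensity parameter k (omega_test = omega_reference^k), so k < 1 indicates relaxation and k > 1 intensification. The omega values below are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
import math


def apply_relax_k(reference_omegas, k):
    """Test-branch omegas under the RELAX parameterisation omega_test = omega_ref ** k."""
    return [w ** k for w in reference_omegas]


def distance_from_neutrality(omega):
    """Distance from omega = 1 on a log10 scale (the scale used in Figure 6)."""
    return abs(math.log10(omega))


# Hypothetical reference omega categories, for illustration only.
reference = [0.01, 0.5, 4.0]

relaxed = apply_relax_k(reference, 0.5)      # k < 1: relaxed selection
intensified = apply_relax_k(reference, 2.0)  # k > 1: intensified selection

for label, omegas in [("reference", reference),
                      ("relaxed", relaxed),
                      ("intensified", intensified)]:
    print(label, [round(w, 4) for w in omegas])
```

Because log10(omega^k) = k * log10(omega), every category moves toward omega = 1 when k < 1 and away from it when k > 1, which is the pattern to look for between the blue and orange profiles.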

Figure 7. CFEL

Methods

Data Retrieval

For this study, we queried the NCBI Ortholog database via https://www.ncbi.nlm.nih.gov/kis/ortholog/627/?scope=7776. As we are interested in mammalian BDNF evolution, we limited our search to species from this taxonomic group (mammals, Mammalia). This returned 162 full gene transcripts and protein sequences. We downloaded all available files: RefSeq protein sequences, RefSeq transcript sequences, and tabular data (CSV, metadata).

Table of species included in this analysis: [XX]

Data Cleaning

We use the protein sequences and full gene transcripts to derive coding sequences (CDS) via a custom script, scripts/codons.py.

However, this process failed for 20 protein sequences flagged "LOW QUALITY PROTEIN" (some also "PREDICTED"), which contained invalid characters; these sequences were excluded from analysis. This step removes low-quality protein sequences, which may inflate rates of nonsynonymous change, as well as sequences with incorrect 'X' or otherwise unresolved amino acids. The excluded sequences are:

ref|XP_005064867.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Mesocricetus auratus]
ref|XP_004755948.1| PREDICTED: LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Mustela putorius furo]
ref|XP_017201122.1| PREDICTED: LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Oryctolagus cuniculus]
ref|XP_006980880.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Peromyscus maniculatus bairdii]
ref|XP_028622754.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Grammomys surdaster]
ref|XP_034352200.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Arvicanthis niloticus]
ref|XP_036040876.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Onychomys torridus]
ref|XP_038187507.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Arvicola amphibius]
ref|XP_041534969.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Microtus oregoni]
ref|XP_005364109.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Microtus ochrogaster]
ref|XP_026928761.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Acinonyx jubatus]
ref|XP_040351506.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Puma yagouaroundi]
ref|XP_025784820.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Puma concolor]
ref|XP_019312122.1| PREDICTED: LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Panthera pardus]
ref|XP_032737137.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Lontra canadensis]
ref|XP_032214651.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Mustela erminea]
ref|XP_004607917.1| PREDICTED: LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Sorex araneus]
ref|XP_036076939.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Rousettus aegyptiacus]
ref|XP_023406540.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Loxodonta africana]
ref|XP_022363282.1| LOW QUALITY PROTEIN: brain-derived neurotrophic factor [Enhydra lutris kenyoni]

Analysis of Orthologous Collections (AOC)

The Analysis of Orthologous Collections (AOC) application is designed for comprehensive molecular sequence analysis. The application accepts two input files: an unaligned protein sequence fasta file and an unaligned transcript sequence fasta file. Typically these can be retrieved from databases such as NCBI Orthologs. In addition, the application is easily modifiable to accept a CDS input, if that data is available.

If protein and transcript files are provided, a custom script, scripts/codons.py, is executed and returns coding sequences where available. (Note: this script is currently set to use the standard genetic code and will need to be modified for alternate codon tables.) The script also removes "low-quality" sequences for which no match is found; see the "Data Cleaning" section above.
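
The matching step can be sketched as follows. This is not the code in scripts/codons.py, only a minimal illustration of one way to recover a CDS: translate the transcript in each of the three forward frames using the standard genetic code and look for the protein sequence. The function names and the toy sequences are assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    nt = nucleotides.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )


def find_cds(transcript, protein):
    """Return the transcript region encoding `protein` (stop codon excluded),
    or None if no forward frame contains a match."""
    protein = protein.rstrip("*")
    if not protein:
        return None
    for frame in range(3):
        translated = translate(transcript[frame:])
        start = translated.find(protein)
        if start != -1:
            nt_start = frame + 3 * start
            return transcript[nt_start:nt_start + 3 * len(protein)]
    return None


# Toy example: 2 nt of 5' UTR, a 4-codon CDS, a stop codon, 2 nt of 3' UTR.
toy_transcript = "CCATGACCATCCTTTAAGG"
print(find_cds(toy_transcript, "MTIL"))
print(find_cds(toy_transcript, "MXIL"))
```

In this sketch a protein containing an unresolved 'X' never matches the translated transcript, so `find_cds` returns None and the sequence would be dropped, which mirrors the behaviour described above for low-quality sequences.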

Step 1. Alignment. The hyphy-analyses codon-aware multiple sequence alignment procedure is executed. {Note about how many sequences are lost during pre-msa/post-msa.}

Step 2. Recombination detection. This is done manually via RDP; see the "Recombination detection" section below. A recombination-free file is placed in the following folder: results/{GeneLabel}/Recombinants/. The analysis currently will only analyse one recombinant file, but this can be modified.

Step 3. Tree inference and selection analyses. For the recombination-free fasta file, we perform maximum likelihood phylogenetic inference (IQ-TREE). The recombination-free alignment and phylogenetic tree are then used for a standard suite of molecular evolutionary analyses. This set of selection analyses includes {}.

Step 4. Lineage assignment and tree annotation. For the recombination-free phylogenetic tree, we perform lineage discovery via NCBI/ete3 and assign lineages to K (by default, K = 20) taxonomic groups. The aim is to have a broad representation of taxonomic groups, rather than lineages that are heavily clustered: we aim for <40% of species to be assigned to any one group. We perform tree labelling via the hyphy-analyses/Label-Trees method, resulting in one annotated tree per lineage.
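
The <40% balance criterion in Step 4 can be expressed as a simple check. This is a sketch only: the function names are hypothetical, the species-to-group mappings are toy examples, and the taxonomy lookup itself (done via NCBI/ete3 in AOC) is not shown.

```python
from collections import Counter


def largest_group_share(species_to_group):
    """Return the most populated group and its share of all species."""
    counts = Counter(species_to_group.values())
    group, n = counts.most_common(1)[0]
    return group, n / len(species_to_group)


def is_balanced(species_to_group, max_share=0.40):
    """True if no single group holds max_share or more of the species."""
    _, share = largest_group_share(species_to_group)
    return share < max_share


# Toy assignments, for illustration only.
balanced = {
    "Homo sapiens": "Primates",
    "Pan troglodytes": "Primates",
    "Mus musculus": "Glires",
    "Rattus norvegicus": "Glires",
    "Felis catus": "Carnivora",
    "Canis lupus": "Carnivora",
}
clustered = {
    "Homo sapiens": "Primates",
    "Pan troglodytes": "Primates",
    "Gorilla gorilla": "Primates",
    "Mus musculus": "Glires",
}

print(largest_group_share(balanced), is_balanced(balanced))
print(largest_group_share(clustered), is_balanced(clustered))
```

If the check fails at one taxonomic rank, one option would be to repeat the assignment at a finer rank until the largest group falls below the threshold.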

Step 5. Selection analyses on lineages. Here, the recombination-free fasta file and the set of annotated phylogenetic trees are used for analyses with the RELAX and Contrast-FEL methods.

Data gathering and codons.py

Alignment

Recombination detection (manually RDP)

AOC procedure on recombination free dataset
–Tree
–Rooted tree
–Selection analyses

AOC Lineage assignment

AOC on Lineages (RELAX, CFEL)

Recombination detection

Manually tested via RDPv5.5 [ref] with modified settings as follows:

  • Recombination events were 'accepted' in cases where more than 2 methods agreed.
  • Slightly modified default parameters
  • Sequences are linear
  • List events detected by >2 methods
  • Alignment was saved as a distributed alignment (with recombinant regions separated).
  • This single fasta was separated into two files, the first a recombination-free file with XX sequences (with recombinant regions separated out).
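
The acceptance rule in the list above (an event is kept only when more than 2 detection methods agree) was applied manually in RDP. For illustration, the same rule is sketched below; the event records are hypothetical and do not reflect RDP's export format or any events found in this dataset.

```python
def accepted_events(events, min_methods=3):
    """Keep events supported by more than 2 (i.e. at least 3) distinct methods."""
    return [e for e in events if len(set(e["methods"])) >= min_methods]


# Hypothetical event records, for illustration only.
events = [
    {"id": 1, "methods": ["RDP", "GENECONV", "MaxChi"]},
    {"id": 2, "methods": ["RDP", "BootScan"]},
    {"id": 3, "methods": ["MaxChi", "Chimaera", "SiScan", "3Seq"]},
    {"id": 4, "methods": ["RDP", "RDP", "GENECONV"]},  # duplicate counts once
]

kept_ids = [e["id"] for e in accepted_events(events)]
print(kept_ids)
```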

Software availability

This application is freely available via a dedicated GitHub repository at: https://github.com/aglucaci/AnalysisOfOrthologousCollections

Discussion

References

Supplementary Material

Tables

MEME, FEL, BGM, aBSREL, CFEL