GeneInvestigator (GI)

An application for user-friendly, state-of-the-art molecular sequence analysis.

Software availability

GitHub repository: https://github.com/aglucaci/GeneInvestigator

Retrieve input data from NCBI Orthologs

Transcript and Protein data from orthologous sequences.

For example, if we are interested in the TP53 gene: https://www.ncbi.nlm.nih.gov/gene/7157/ortholog/?scope=117570&term=TP53

Download all information:

  • Tabular data (contains accession numbers, species scientific names, and other metadata)
  • Reference transcript sequence
  • Reference protein sequence

The analysis is typically performed on one sequence of the gene per species; optionally, all transcripts per species can be analyzed.
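
As a minimal sketch of reducing the tabular download to one record per species: the column names (`species`, `accession`, `transcript_length`), the placeholder accessions, and the "keep the longest transcript" rule are all assumptions made for illustration; the actual NCBI export headers and the rule used by GeneInvestigator may differ.

```python
import csv
import io

# Hypothetical excerpt of the tabular download; real column names
# and accessions in the NCBI Orthologs export may differ.
EXAMPLE_TSV = (
    "species\taccession\ttranscript_length\n"
    "Homo sapiens\tACC_0001\t2512\n"
    "Homo sapiens\tACC_0002\t2400\n"
    "Pan troglodytes\tACC_0003\t2490\n"
)

def one_transcript_per_species(tsv_text):
    """Keep the longest transcript for each species (one possible rule)."""
    best = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        length = int(row["transcript_length"])
        current = best.get(row["species"])
        if current is None or length > int(current["transcript_length"]):
            best[row["species"]] = row
    return best

selected = one_transcript_per_species(EXAMPLE_TSV)
for species, row in selected.items():
    print(species, row["accession"])
```

Skipping this filter corresponds to the "all transcripts per species" option.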

Pipeline

Introduction for TP53

Excerpt from: https://www.ncbi.nlm.nih.gov/gene/7157

This gene encodes a tumor suppressor protein containing transcriptional activation, DNA binding, and oligomerization domains. The encoded protein responds to diverse cellular stresses to regulate expression of target genes, thereby inducing cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. Mutations in this gene are associated with a variety of human cancers, including hereditary cancers such as Li-Fraumeni syndrome. Alternative splicing of this gene and the use of alternate promoters result in multiple transcript variants and isoforms. Additional isoforms have also been shown to result from the use of alternate translation initiation codons from identical transcript variants (PMIDs: 12032546, 20937277). [provided by RefSeq, Dec 2016]

Example Results

Quick summary text

{Todo}

Number of species in alignment
Number of sites in alignment
GC content?
Summarize positive and negative sites
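
The first three items of this summary could be computed directly from the alignment. The sketch below assumes a nucleotide alignment in FASTA format; the function names are illustrative, and the site count is reported in nucleotide columns (divide by 3 for codon sites).

```python
def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def summarize_alignment(records):
    """Return sequence count, alignment length, and GC content."""
    seqs = list(records.values())
    n_sites = len(seqs[0]) if seqs else 0
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences are not aligned to the same length")
    bases = "".join(seqs).upper()
    gc = sum(bases.count(b) for b in "GC")
    acgt = sum(bases.count(b) for b in "ACGT")  # gaps and ambiguity codes ignored
    return {
        "n_sequences": len(seqs),
        "n_sites": n_sites,
        "gc_content": gc / acgt if acgt else 0.0,
    }

EXAMPLE = ">seq1\nATGC--\n>seq2\nATGCGC\n"
summary = summarize_alignment(read_fasta(EXAMPLE))
print(summary)
```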

Results of BUSTEDS analysis for a gene-level overview

Figure 1. BUSTEDS found evidence (LRT, p-value ≤ 0.05) of gene-wide episodic diversifying selection in the selected test branches of your phylogeny.

Describe the differences in the omega {1,2,3} values and proportions between the unconstrained and constrained models. Is there anything to say about the evidence ratio (ER) plots? Should the ER plot be thresholded?

Results of MEME analysis for episodic positive selection

All sites are plotted; the significant positive sites are highlighted in color.

Figure 2. MEME analysis of your gene of interest found 126 of 1167 sites to be statistically significant (p-value ≤ 0.1) for episodic positive/diversifying selection.
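
A count like the one in this caption can be obtained by thresholding the per-site p-values. The sketch below assumes a HyPhy-style result layout in which `MLE.headers` lists `[name, description]` pairs and `MLE.content["0"]` holds one row per site; the toy dictionary is made up, so check the layout against your own output file before relying on it.

```python
def significant_sites(result, column="p-value", threshold=0.1):
    """Return 1-based site numbers whose value in `column` is <= threshold."""
    headers = [header[0] for header in result["MLE"]["headers"]]
    col = headers.index(column)
    rows = result["MLE"]["content"]["0"]
    return [i + 1 for i, row in enumerate(rows) if row[col] <= threshold]

# Toy stand-in for a parsed result file (values are invented).
toy_result = {
    "MLE": {
        "headers": [["alpha", "description"], ["p-value", "description"]],
        "content": {"0": [[0.5, 0.03], [1.2, 0.67], [0.1, 0.10]]},
    }
}

meme_sites = significant_sites(toy_result)
print(f"{len(meme_sites)} of {len(toy_result['MLE']['content']['0'])} sites")
```

Looking the column up by name rather than by position keeps the function usable if the column order differs between methods.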

Results of FEL analysis for negative selection

Only the negative sites are plotted here.

Figure 3. FEL analysis of your gene of interest found 411 of 1167 sites to be statistically significant (p-value ≤ 0.1) for pervasive negative/purifying selection.

Results of FUBAR analysis for pervasive selection

All sites are plotted; the negative sites are highlighted.

"ProbNegative" here corresponds to "Prob[alpha>beta]" from the FUBAR output.

Figure 4. FUBAR analysis of your gene of interest found 418 of 1167 sites to be statistically significant (posterior probability threshold 0.9) for pervasive negative/purifying selection, and 2 of 1167 sites to be statistically significant (posterior probability threshold 0.9) for pervasive positive/diversifying selection.
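
The two counts in this caption come from applying the same posterior probability threshold to two different columns. A minimal sketch, assuming each site has already been reduced to a `(site, Prob[alpha>beta], Prob[alpha<beta])` tuple; the toy values are invented.

```python
def fubar_calls(sites, threshold=0.9):
    """Split sites into negative and positive calls by posterior probability."""
    negative = [site for site, p_neg, p_pos in sites if p_neg >= threshold]
    positive = [site for site, p_neg, p_pos in sites if p_pos >= threshold]
    return negative, positive

# (site, Prob[alpha>beta], Prob[alpha<beta]); invented values.
toy_sites = [
    (1, 0.97, 0.01),
    (2, 0.40, 0.35),
    (3, 0.02, 0.95),
    (4, 0.90, 0.05),
]

negative_sites, positive_sites = fubar_calls(toy_sites)
print(len(negative_sites), "negative;", len(positive_sites), "positive")
```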

Results of BGM analysis for coevolving sites

The plot does not show "triangle"-linked coevolving sites, e.g. site pairs (1,2) and (2,3).

"ProbS1andS2" is the posterior probability that sites 1 and 2 are not conditionally independent.

"Shared subs" is the number of substitutions shared between a pair of sites.

Figure 5. BGM analysis of your gene of interest found 395 pairs of coevolving sites, out of 1167 total sites, to be statistically significant (posterior probability threshold 0.5).
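
Linked pairs such as (1,2) and (2,3) can be recovered by treating significant pairs as edges of a graph and collecting its connected components. The sketch below is one way to do that; the `(site1, site2, posterior)` tuple format and the toy values are assumptions, not the BGM output format.

```python
from collections import defaultdict

def coevolving_groups(pairs, threshold=0.5):
    """Group sites connected through pairs with posterior >= threshold."""
    neighbours = defaultdict(set)
    for site1, site2, prob in pairs:
        if prob >= threshold:
            neighbours[site1].add(site2)
            neighbours[site2].add(site1)
    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            site = stack.pop()
            if site in group:
                continue
            group.add(site)
            stack.extend(neighbours[site] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# (site1, site2, posterior); invented values.
toy_pairs = [(1, 2, 0.8), (2, 3, 0.6), (10, 11, 0.9), (4, 5, 0.2)]
groups = coevolving_groups(toy_pairs)
print(groups)
```

Groups with more than two sites are the linked cases that a pair-only plot leaves out.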

Results of aBSREL analysis for episodic diversifying selection on branches

Significant branches and nodes are displayed in red.

Figure 6. aBSREL analysis of your gene of interest found 38 of 513 branches to be statistically significant (p-value ≤ 0.05) for episodic diversifying selection.
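
Because every branch is tested, the branch-level p-values need a multiple-testing correction before the 0.05 cutoff is applied. The sketch below implements the Holm-Bonferroni step-down procedure, on the assumption that this is the correction behind the "Corrected P-value" column of the aBSREL output; the input p-values are invented.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    corrected = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        adjusted = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, adjusted)
        corrected[idx] = running_max
    return corrected

toy_p_values = [0.01, 0.04, 0.03, 0.005]
corrected = holm_bonferroni(toy_p_values)
significant_branches = [i for i, p in enumerate(corrected) if p <= 0.05]
print(corrected, significant_branches)
```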

Which lineages are represented in my alignment?

269 species in the alignment.
Aim for between 5 and 10 lineage descriptions.

What am I left with? 6 clade descriptions

Mammalia (perhaps subdivide?)
Osteoglossocephalai = bony fishes
Sauropsida = reptiles and birds
Gymnophiona = caecilians (limbless amphibians)
Erpetoichthys = ropefish (~1% = 2-3 species)
Batrachia = frogs and salamanders (1 species?)
Semionotiformes = gars (1 species?)
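
One way to arrive at a manageable number of clade descriptions is to walk down the taxonomy until the number of distinct groups falls in the target range. This is only a sketch of that idea, not necessarily how the groupings above were chosen; the lineages in the example are heavily simplified and the function name is hypothetical.

```python
def pick_grouping_depth(lineages, low=5, high=10):
    """Find the shallowest taxonomy depth giving between low and high groups.

    lineages maps each species to its clade names ordered from root to tip.
    Returns (depth, sorted group names), or None if no depth fits.
    """
    max_depth = max(len(path) for path in lineages.values())
    for depth in range(max_depth):
        # Species with a shorter lineage keep their deepest known clade.
        groups = {path[min(depth, len(path) - 1)] for path in lineages.values()}
        if low <= len(groups) <= high:
            return depth, sorted(groups)
    return None

# Heavily simplified lineages, for illustration only.
toy_lineages = {
    "Homo sapiens": ["Vertebrata", "Mammalia", "Primates"],
    "Mus musculus": ["Vertebrata", "Mammalia", "Rodentia"],
    "Gallus gallus": ["Vertebrata", "Sauropsida", "Aves"],
    "Danio rerio": ["Vertebrata", "Actinopterygii", "Cypriniformes"],
}

choice = pick_grouping_depth(toy_lineages, low=2, high=3)
print(choice)
```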

Discussion

Is this gene under selection?

Do the significant sites from MEME and FEL overlap? They are expected to overlap for positively selected sites, and not to overlap for sites that FEL reports as statistically significant for negative selection.

Are coevolving sites under positive selection, negative selection, or linked selection?
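
This comparison reduces to set operations once the FEL sites are split by direction of selection. A minimal sketch, assuming FEL rows have been reduced to `(site, alpha, beta, p_value)` tuples and MEME to a set of significant site numbers; all values are invented.

```python
def fel_site_sets(fel_rows, threshold=0.1):
    """Split significant FEL sites into positive (beta > alpha) and negative."""
    positive, negative = set(), set()
    for site, alpha, beta, p_value in fel_rows:
        if p_value > threshold:
            continue
        if beta > alpha:
            positive.add(site)
        elif beta < alpha:
            negative.add(site)
    return positive, negative

# (site, alpha, beta, p_value); invented values.
toy_fel = [
    (1, 0.5, 2.0, 0.04),
    (2, 1.5, 0.1, 0.01),
    (3, 1.0, 1.1, 0.60),
    (4, 2.0, 0.2, 0.09),
]
toy_meme = {1, 4, 7}

fel_positive, fel_negative = fel_site_sets(toy_fel)
shared_positive = toy_meme & fel_positive  # expected overlap
conflicting = toy_meme & fel_negative      # sites worth a closer look
print(sorted(shared_positive), sorted(conflicting))
```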

BUSTEDS

RELAX: how many lineages do I have? Do the analysis between branches.

aBSREL
MEME, FEL, SLAC, FUBAR

GARD: subsample to ~30 sequences and discuss recombination.
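
The subsampling step could look like the sketch below. Uniform random sampling with a fixed seed is an assumption made here for reproducibility; sampling evenly across clades would be an equally reasonable choice, and the sequence names are placeholders.

```python
import random

def subsample_sequences(names, k=30, seed=0):
    """Return a reproducible random subset of at most k sequence names."""
    names = list(names)
    if len(names) <= k:
        return names
    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    return sorted(rng.sample(names, k))

# Placeholder names standing in for the 269 sequences in the alignment.
all_names = [f"seq_{i:03d}" for i in range(269)]
subset = subsample_sequences(all_names)
print(len(subset), subset[:3])
```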

References

HyPhy
Datamonkey
All HyPhy methods
Accessory methods, TN93, MAFFT, IQTREE
Evolution of Viral Genomes: Interplay Between Selection, Recombination, and Other Forces

Supplementary Tables

Table 1. BUSTEDS Fits, test results

Table 2. MEME Significant sites

Table 3. FEL Negative Significant sites

Table 4. aBSREL

Columns (the rows below are space-separated; the bracketed Rate Distributions value is a single field):
# | Baseline MG94xREV | Baseline MG94xREV omega ratio | Corrected P-value | Full adaptive model | Full adaptive model (non-synonymous subs/site) | Full adaptive model (synonymous subs/site) | LRT | Nucleotide GTR | Rate Distributions | Rate classes | Uncorrected P-value | original name
1 0.271015 0.248128 0.00101124 1.24717 1.16192 0.0852501 23.9811 0.219682 [[0.08351739391025588, 0.814034501673088], [25.78312631442658, 0.185965498326912]] 2 2.05536e-06 nan
2 0.247485 0.402128 0.0224881 379.635 379.548 0.0870347 17.756 0.193016 [[0.1794147285931847, 0.8288530517605159], [9090.000000003736, 0.1711469482394841]] 2 4.68503e-05 nan
3 0.0148508 0.518222 0.000322385 0.175039 0.172598 0.00244112 26.2757 0.0177287 [[0, 0.9744626224056592], [987.8020916848121, 0.02553737759434083]] 2 6.49969e-07 nan
4 0.0688569 1e+10 0.0483007 1.04917 1.04569 0.00347803 16.2193 0.0859847 [[0, 0.8800201943708488], [894.04943259379, 0.1199798056291512]] 2 0.000101472 nan
5 0.147321 0.783096 0.0114947 0.658764 0.639233 0.0195312 19.1077 0.145414 [[0.3838030590330714, 0.8444078195665582], [72.96547749683984, 0.1555921804334418]] 2 2.37494e-05 nan
6 0.0494127 0.248901 0.0391371 2.74932 2.73446 0.0148621 16.6416 0.0494891 [[0.08171724964798617, 0.957908917963494], [1557.699535943612, 0.04209108203650602]] 2 8.20484e-05 nan
7 0.214756 0.410058 0.000978347 0.871563 0.806064 0.0654993 24.051 0.180228 [[0.1095580176944264, 0.8408750960441533], [27.01378932551229, 0.1591249039558467]] 2 1.98448e-06 nan
8 0.216423 0.404559 0.00334964 0.909491 0.84621 0.0632809 21.5789 0.192698 [[0.08907333920270652, 0.8257657450393591], [26.96026079730138, 0.1742342549606409]] 2 6.86401e-06 nan
9 0.0193913 0.467868 0.00343106 2.81498 2.81013 0.00485433 21.5269 0.0181038 [[0.1905853889518029, 0.978009820385468], [9383.731856578368, 0.02199017961453198]] 2 7.04531e-06 nan
10 0.0239495 0.406299 0 0.582777 0.580037 0.00273964 121.617 0.0215566 [[0, 0.9502300085151673], [1517.731423200614, 0.04976999148483274]] 2 0 nan
11 0.0562303 0.445824 0.000189855 0.323557 0.30473 0.0188271 27.3394 0.0539543 [[0.2697852411103061, 0.968801038023459], [176.7157679676714, 0.03119896197654104]] 2 3.81235e-07 nan
12 0.040963 0.863876 0 0.40471 0.40117 0.00353967 126.751 0.040804 [[0.05420569097218374, 0.9502046359232122], [811.0053477596006, 0.04979536407678775]] 2 0 nan
13 0.173598 0.346965 0.0120047 2.73766 2.72766 0.0100021 19.0172 0.134369 [[1, 0.8545620545506069], [663.1141152380948, 0.1454379454493931]] 2 2.48545e-05 nan
14 0.114996 0.809018 0.0028013 1.7344 1.73436 4.28557e-05 21.9431 0.100148 [[1, 0.8556210297601691], [100000, 0.1443789702398309]] 2 5.71693e-06 nan
15 0.122433 0.478445 0.000576403 1.01034 0.995354 0.0149887 25.1094 0.102575 [[0.2539175969811507, 0.8727106483495446], [184.3916331081659, 0.1272893516504554]] 2 1.16681e-06 nan
16 0.00817209 0.340903 0.0347516 0.119071 0.117071 0.00199951 16.8821 0.0080027 [[0, 0.993626957337727], [3277.778847969584, 0.00637304266227301]] 2 7.2702e-05 XM_005749774_1_PREDICTED_Pundamilia_nyererei_cellular_tumor_antigen_p53_like_LOC102203120_mRNA_1
17 0.103901 0.293289 0.00233172 10.3832 10.3375 0.0456859 22.3126 0.0981326 [[0.1357129025907057, 0.9482367361682671], [1557.113665935761, 0.05176326383173291]] 2 4.74892e-06 XM_007952255_1_PREDICTED_Orycteropus_afer_afer_tumor_protein_p53_TP53_mRNA_1
18 0.0457502 0.32037 1.27578e-11 0.241786 0.22455 0.0172363 60.3389 0.0455824 [[0.02364363647801504, 0.9625574726250689], [123.5301901932855, 0.03744252737493114]] 2 2.53131e-14 XM_008010192_2_PREDICTED_Chlorocebus_sabaeus_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1
19 0.356247 0.320588 7.02686e-07 3.65051 3.53951 0.110996 38.5186 0.312879 [[0.1159697821727439, 0.7769833043336732], [50.61114657575612, 0.2230166956663268]] 2 1.40537e-09 XM_008321788_3_PREDICTED_Cynoglossus_semilaevis_tumor_protein_p53_tp53_mRNA_1
20 0.110145 0.398121 0.00331385 0.673168 0.630454 0.0427146 21.6043 0.107742 [[0.05381489102857581, 0.9132353699648752], [60.12599823633681, 0.0867646300351248]] 2 6.7768e-06 XM_010571169_1_PREDICTED_Haliaeetus_leucocephalus_tumor_protein_p53_TP53_mRNA_1
21 0.0315783 0.366843 2.68195e-08 6.57414 6.56455 0.0095854 45.047 0.0318192 [[0, 0.9731198669707395], [9090.000000003736, 0.02688013302926051]] 2 5.34253e-11 XM_011727613_2_PREDICTED_Macaca_nemestrina_tumor_protein_p53_TP53_mRNA_1
22 0.248904 0.221391 0.0160571 102.609 102.513 0.0962467 18.4343 0.204709 [[0.1563804928982661, 0.9012698341888755], [3847.530319680883, 0.09873016581112448]] 2 3.33135e-05 XM_014019513_1_PREDICTED_Austrofundulus_limnaeus_tumor_protein_p53_tp53_mRNA_1
23 0.121 0.319189 1.87822e-12 4.44712 4.42298 0.024138 64.1631 0.11054 [[0.0168368360727766, 0.885082564787816], [568.7597482656681, 0.114917435212184]] 2 3.71925e-15 XM_014261886_1_PREDICTED_Pseudopodoces_humilis_tumor_protein_p53_TP53_mRNA_1
24 0.176395 0.243335 9.1023e-05 1.9998 1.93635 0.0634534 28.8093 0.15838 [[0.02835874316489893, 0.8674912613583818], [81.9787911678715, 0.1325087386416182]] 2 1.82411e-07 XM_014893382_1_PREDICTED_Sturnus_vulgaris_tumor_protein_p53_TP53_partial_mRNA_1
25 0.135205 0.7309 0 81.0506 81.0302 0.0203742 88.5054 0.134317 [[0.1397099080155256, 0.8439130611686997], [9090.000000003736, 0.1560869388313003]] 2 0 XM_015541817_1_PREDICTED_Panthera_tigris_altaica_tumor_protein_p53_TP53_partial_mRNA_1
26 0.0422313 0.749838 6.26628e-10 7.36183 7.35356 0.00827325 52.5537 0.0416236 [[0.3056022114043056, 0.9651459173578737], [9090.000000003736, 0.03485408264212631]] 2 1.24578e-12 XM_015562573_1_PREDICTED_Myotis_davidii_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1
27 0.125564 0.678655 0 35.3403 35.3302 0.0101213 192.752 0.124534 [[0.0676211734326852, 0.8629977433488712], [9090.000000003736, 0.1370022566511288]] 2 0 XM_015992323_1_PREDICTED_Peromyscus_maniculatus_bairdii_tumor_protein_p53_Tp53_transcript_variant_X1_mRNA_1
28 0.0714231 0.796249 0 13.9542 13.9481 0.00612411 157.307 0.0720728 [[0, 0.9106059867613868], [9090.000000003736, 0.08939401323861318]] 2 0 XM_016931470_2_PREDICTED_Pan_troglodytes_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1
29 0.0594683 0.427825 5.62883e-14 0.561761 0.551066 0.0106943 70.5412 0.0498929 [[0.1259844837537699, 0.9514527920641109], [376.2254588321816, 0.04854720793588907]] 2 1.11022e-16 XM_025195179_1_PREDICTED_Alligator_sinensis_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1
30 0.041931 0.258989 0.0219854 0.101613 0.0799033 0.0217093 17.8051 0.0408591 [[0.03863496214427223, 0.9690592305995114], [41.23128517951871, 0.03094076940048862]] 2 4.57078e-05 XM_028780049_1_PREDICTED_Grammomys_surdaster_tumor_protein_p53_Tp53_transcript_variant_X1_mRNA_1
31 0.0610096 0.310105 0.00420115 5.10637 5.07529 0.0310796 21.1196 0.0621175 [[0.1018540946120266, 0.9626603774309264], [1557.699906332703, 0.03733962256907364]] 2 8.64434e-06 XM_030474375_1_PREDICTED_Strigops_habroptila_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1
32 0.0550798 0.651481 0.00619516 0.335227 0.323262 0.0119644 20.3421 0.0548318 [[0.2411625338720781, 0.9514469608672388], [193.8137073349213, 0.04855303913276121]] 2 1.27735e-05 XM_030548110_1_PREDICTED_Gopherus_evgoodei_tumor_protein_p53_TP53_mRNA_1
33 0.0355974 1.00236 0 25.751 25.7489 0.00206024 87.036 0.0354803 [[0.7066674874553129, 0.9554163256482412], [100000, 0.04458367435175881]] 2 0 XM_030861565_1_PREDICTED_Globicephala_melas_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1
34 0.120016 0.254143 0.022593 0.893713 0.83993 0.0537829 17.7426 0.108033 [[0.09255911594996524, 0.9144690076952635], [64.15460824206325, 0.08553099230473649]] 2 4.7167e-05 XM_030970056_1_PREDICTED_Camarhynchus_parvulus_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1
35 0.0242619 0.599177 0.000241366 0.0477902 0.0401435 0.00764665 26.8568 0.0250599 [[0.2009160516548929, 0.9798753216487776], [83.28854940137336, 0.02012467835122245]] 2 4.85646e-07 XM_032280430_1_PREDICTED_Sapajus_apella_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1
36 0.0515846 0.500838 0.000548347 15.0164 15.0013 0.0150812 25.2129 0.0499321 [[0.1917560263656985, 0.9609786199719969], [9090.000000003736, 0.03902138002800315]] 2 1.10777e-06 XM_032794536_1_PREDICTED_Chelonoidis_abingdonii_tumor_protein_p53_TP53_mRNA_1
37 0.213434 0.37986 3.69073e-08 95.6181 95.564 0.0540627 44.4054 0.193231 [[0.1554275170793705, 0.8360599805867974], [3846.116864262084, 0.1639400194132026]] 2 7.36673e-11 XM_035135727_1_PREDICTED_Zootoca_vivipara_tumor_protein_p53_TP53_transcript_variant_X2_mRNA_1
38 0.0430373 0.588996 0 0.65763 0.649333 0.00829716 87.2536 0.0433632 [[0.2025151800612972, 0.9628558985353769], [746.4563181259871, 0.0371441014646231]] 2 0 XM_041664596_1_PREDICTED_Microtus_oregoni_tumor_protein_p53_Tp53_transcript_variant_X1_mRNA_1
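
The rows of this table can be read back programmatically. The sketch below assumes the column order of the header line (an index, eight numeric columns, the bracketed rate distribution, the number of rate classes, the uncorrected p-value, and the name) and reads each rate distribution entry as an `[omega, proportion]` pair; the field names in the returned dictionary are illustrative.

```python
import ast
import re

# index, eight numbers, [[...]] rate distribution, rate classes, p-value, name
ROW = re.compile(r"^(\d+) (.+?) (\[\[.*\]\]) (\d+) (\S+) (\S+)$")

def parse_absrel_row(line):
    """Parse one space-separated row of the aBSREL table into a dict."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError("row does not match the expected layout")
    index, numbers, dist, classes, p_raw, name = match.groups()
    values = [float(x) for x in numbers.split()]
    return {
        "index": int(index),
        "baseline_omega": values[1],
        "corrected_p": values[2],
        "lrt": values[6],
        "rate_distribution": ast.literal_eval(dist),
        "rate_classes": int(classes),
        "uncorrected_p": float(p_raw),
        "name": None if name == "nan" else name,  # unnamed rows show "nan"
    }

row_3 = (
    "3 0.0148508 0.518222 0.000322385 0.175039 0.172598 0.00244112 "
    "26.2757 0.0177287 [[0, 0.9744626224056592], "
    "[987.8020916848121, 0.02553737759434083]] 2 6.49969e-07 nan"
)
record = parse_absrel_row(row_3)
print(record["lrt"], record["rate_distribution"])
```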

Phylogenetic tree analysis: plot a histogram of branch lengths, and report the total tree length.
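
Both quantities can be read off a Newick string without a tree library. The sketch below pulls out every `:<number>` branch length with a regular expression, which assumes taxon labels contain no colons; the toy tree and the bin width are arbitrary.

```python
import re
from collections import Counter

def branch_lengths(newick):
    """Extract every ':<number>' branch length from a Newick string."""
    return [float(x) for x in re.findall(r":([0-9][0-9.eE+-]*)", newick)]

def length_histogram(lengths, bin_width):
    """Count branch lengths per bin; bin i covers [i*w, (i+1)*w)."""
    return Counter(int(length // bin_width) for length in lengths)

toy_tree = "((A:0.1,B:0.2):0.05,C:0.3);"
lengths = branch_lengths(toy_tree)
total_tree_length = sum(lengths)
histogram = length_histogram(lengths, bin_width=0.25)

print(f"total tree length: {total_tree_length:.2f}")
for bin_index in sorted(histogram):
    print(f"bin {bin_index}: {'#' * histogram[bin_index]}")
```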

AlignmentProfiler (Skip for now)

v1 https://colab.research.google.com/drive/1JWvp9zTEqCslH5P2_9UhwQllmOJOGAF9?usp=sharing

v2 https://colab.research.google.com/drive/1fNjBYkIOH9DFD8hdmXJTekj8Hzg-g3Jl?usp=sharing

Left over (can delete)

BUSTEDS Unconstrained model BUSTEDS Constrained model

Figure 1. BUSTEDS found evidence (LRT, p-value ≤ 0.05) of gene-wide episodic diversifying selection in the selected test branches of your phylogeny.