An application for user-friendly, state-of-the-art molecular sequence analysis.
GitHub repository: https://github.com/aglucaci/GeneInvestigator
Transcript and Protein data from orthologous sequences.
For example, if we are interested in the TP53 gene: https://www.ncbi.nlm.nih.gov/gene/7157/ortholog/?scope=117570&term=TP53
Download all information:
Typically performed on one gene per species, but all transcripts per species can also be analyzed as an option.
Excerpt from: https://www.ncbi.nlm.nih.gov/gene/7157
This gene encodes a tumor suppressor protein containing transcriptional activation, DNA binding, and oligomerization domains. The encoded protein responds to diverse cellular stresses to regulate expression of target genes, thereby inducing cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. Mutations in this gene are associated with a variety of human cancers, including hereditary cancers such as Li-Fraumeni syndrome. Alternative splicing of this gene and the use of alternate promoters result in multiple transcript variants and isoforms. Additional isoforms have also been shown to result from the use of alternate translation initiation codons from identical transcript variants (PMIDs: 12032546, 20937277). [provided by RefSeq, Dec 2016]
{Todo}
Number of species in alignment
Number of sites in alignment
GC content?
Summarize positive and negative sites
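The first three summary items above (species count, site count, GC content) can be sketched directly from an aligned FASTA file. This is a minimal sketch, not the notebook's own code: the function names `parse_fasta` and `summarize_alignment` and the two-sequence example are hypothetical, and GC content is computed over A/C/G/T characters only, ignoring gaps and ambiguity codes.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def summarize_alignment(records):
    """Return sequence count, alignment length, and GC fraction.

    Gaps and ambiguous characters are ignored when computing GC content.
    """
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    gc = sum(seq.count("G") + seq.count("C") for seq in records.values())
    acgt = sum(sum(seq.count(base) for base in "ACGT") for seq in records.values())
    return {
        "n_sequences": len(records),
        "n_sites": lengths.pop(),  # nucleotide columns, not codons
        "gc_fraction": gc / acgt if acgt else 0.0,
    }


# Hypothetical two-sequence alignment, for illustration only.
example = ">seq_a\nATGGCC---\n>seq_b\nATGGCATTT\n"
summary = summarize_alignment(parse_fasta(example))
print(summary)
```

Note that `n_sites` counts nucleotide columns; for a codon alignment, the number of codon sites reported by the selection analyses would be this value divided by three.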
Figure 1. BUSTEDS found evidence (LRT, p-value ≤ 0.05) of gene-wide episodic diversifying selection in the selected test branches of your phylogeny.
Describe differences between omega {1,2,3} value and proportion between unconstrained and constrained model. Anything to say with ER plots? Threshold the ER plot?
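One way to threshold an ER plot is sketched below. It assumes that a per-site evidence ratio is taken as the ratio of the site likelihood under the unconstrained model to that under the constrained model, and that per-site log-likelihoods are available as two equal-length lists. The function names, the example values, and the threshold of 10 are all hypothetical.

```python
import math


def evidence_ratios(lnl_unconstrained, lnl_constrained):
    """Per-site evidence ratio: exp(lnL_unconstrained - lnL_constrained)."""
    if len(lnl_unconstrained) != len(lnl_constrained):
        raise ValueError("both models must report the same number of sites")
    return [math.exp(u - c) for u, c in zip(lnl_unconstrained, lnl_constrained)]


def sites_above(ratios, threshold):
    """Return 1-based site indices whose evidence ratio meets the threshold."""
    return [i + 1 for i, ratio in enumerate(ratios) if ratio >= threshold]


# Hypothetical per-site log-likelihoods, for illustration only.
lnl_unc = [-10.0, -8.0, -12.5]
lnl_con = [-10.0, -11.0, -12.0]
ratios = evidence_ratios(lnl_unc, lnl_con)
flagged = sites_above(ratios, threshold=10.0)
print(flagged)
```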
Plotting everything but highlighting (via color) the significant positive sites.
Figure 2. MEME analysis of your gene of interest found 126 of 1167 sites to be statistically significant (p-value <= 0.1).
Only plotting the negative sites here
Figure 3. FEL analysis of your gene of interest found 411 of 1167 sites to be statistically significant (p-value <= 0.1) for pervasive negative/purifying selection.
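FEL reports a synonymous rate (alpha), a non-synonymous rate (beta), and a p-value for each site; a significant site is read as negatively selected when beta < alpha and positively selected when beta > alpha. The sketch below applies that rule at the 0.1 threshold used in the caption. The tuple layout `(site, alpha, beta, p_value)` and the example rows are assumptions for illustration, not the notebook's actual data structure.

```python
def classify_fel_sites(rows, p_threshold=0.1):
    """Split significant FEL sites into negative and positive selection.

    rows: iterable of (site, alpha, beta, p_value) tuples (assumed layout).
    """
    negative, positive = [], []
    for site, alpha, beta, p_value in rows:
        if p_value > p_threshold:
            continue  # not significant at the chosen threshold
        if beta < alpha:
            negative.append(site)
        elif beta > alpha:
            positive.append(site)
    return {"negative": negative, "positive": positive}


# Hypothetical rows, for illustration only.
fel_rows = [
    (1, 1.20, 0.10, 0.01),  # significant, beta < alpha
    (2, 0.50, 0.45, 0.80),  # not significant
    (3, 0.20, 1.50, 0.05),  # significant, beta > alpha
]
fel_result = classify_fel_sites(fel_rows)
print(fel_result)
```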
Plot everything, but highlight the negative sites
"ProbNegative" here corresponds to "Prob[alpha>beta]" from the FUBAR output.
Figure 4. FUBAR analysis of your gene of interest found 418 of 1167 sites to be statistically significant (posterior probability threshold 0.9) for pervasive negative/purifying selection, and 2 of 1167 sites to be statistically significant (posterior probability threshold 0.9) for pervasive positive/diversifying selection.
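The column renaming described above, followed by the 0.9 posterior threshold, could look roughly like this. It is a sketch under assumptions: the input is taken to be a CSV export with one row per site, "ProbPositive" is a hypothetical counterpart name for "Prob[alpha<beta]", and the three example rows are made up.

```python
import csv
import io

# "ProbNegative" comes from the notes; "ProbPositive" is a hypothetical counterpart.
RENAME = {"Prob[alpha>beta]": "ProbNegative", "Prob[alpha<beta]": "ProbPositive"}


def load_fubar_table(csv_text):
    """Read per-site posterior probabilities and rename the FUBAR columns."""
    rows = []
    for site, raw in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        row = {"Site": site}
        for old_name, new_name in RENAME.items():
            row[new_name] = float(raw[old_name])
        rows.append(row)
    return rows


def sites_passing(rows, column, threshold=0.9):
    """Return the sites whose posterior probability in `column` meets the threshold."""
    return [row["Site"] for row in rows if row[column] >= threshold]


# Hypothetical three-site table, for illustration only.
example_csv = """Prob[alpha>beta],Prob[alpha<beta]
0.95,0.02
0.40,0.55
0.03,0.93
"""
fubar_rows = load_fubar_table(example_csv)
negative_sites = sites_passing(fubar_rows, "ProbNegative")
positive_sites = sites_passing(fubar_rows, "ProbPositive")
print(negative_sites, positive_sites)
```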
Does not show "triangle"-linked coevolving sites, e.g. site pairs (1,2) and (2,3).
"ProbS1andS2" is the posterior probability that sites 1 and 2 are not conditionally independent.
"Shared subs" is the number of substitutions shared between a pair of sites.
Figure 5. BGM analysis of your gene of interest found 395 pairs of coevolving sites out of 1167 total sites to be statistically significant (posterior probability threshold 0.5).
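Site pairs that share a member, such as (1,2) and (2,3), can be collected into groups by treating significant pairs as edges of a graph and taking connected components. The sketch below does this with the standard library only; the `(site1, site2, probability)` row layout, the function names, and the example rows are assumptions for illustration.

```python
from collections import defaultdict


def significant_pairs(rows, threshold=0.5):
    """Keep (site1, site2) pairs whose posterior probability meets the threshold."""
    return [(a, b) for a, b, prob in rows if prob >= threshold]


def linked_site_groups(pairs):
    """Group sites into connected components of the pair graph."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = set()
    groups = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over one component
            site = stack.pop()
            if site in group:
                continue
            group.add(site)
            stack.extend(neighbors[site] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical BGM rows, for illustration only.
bgm_rows = [(1, 2, 0.8), (2, 3, 0.6), (10, 11, 0.9), (4, 5, 0.2)]
groups = linked_site_groups(significant_pairs(bgm_rows))
print(groups)
```

Groups with more than two members correspond to the chained or "triangle" relationships that a plain pair listing does not show.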
Displaying significant branches and nodes in red.
Figure 6. aBSREL analysis of your gene of interest found 38 of 513 branches to be statistically significant (p-value <= 0.05) for episodic diversifying selection.
269 Species in the alignment.
Aim for between 5-10 lineage descriptions.
What am I left with? 6 clade descriptions
Mammalia (perhaps subdivide?)
Osteoglossocephalai = bony fishes
Sauropsida = reptiles and birds
Gymnophiona = caecilians (limbless amphibians)
Erpetoichthys = ropefish (~1% = 2-3 species.)
Batrachia = frogs and salamanders (1 species?)
Semionotiformes = (1 species?)
Is this gene under selection?
Do significant sites between MEME and FEL overlap? They should for positively selected sites, and should not for negatively selected (statistically significant) FEL sites.
Are coevolving sites under positive or negative selection, or linked selection?
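The MEME/FEL overlap question above reduces to set intersections once each method's significant sites are available as lists of site numbers. A minimal sketch follows; the function name, the dictionary keys, and the example site lists are hypothetical.

```python
def overlap_report(meme_sites, fel_positive_sites, fel_negative_sites):
    """Compare MEME-significant sites against FEL positive and negative sites."""
    meme = set(meme_sites)
    fel_pos = set(fel_positive_sites)
    fel_neg = set(fel_negative_sites)
    return {
        "meme_and_fel_positive": sorted(meme & fel_pos),
        "meme_and_fel_negative": sorted(meme & fel_neg),  # expected to be empty
        "meme_only": sorted(meme - fel_pos - fel_neg),
    }


# Hypothetical site lists, for illustration only.
report = overlap_report(
    meme_sites=[5, 12, 40, 77],
    fel_positive_sites=[12, 40],
    fel_negative_sites=[3, 8, 90],
)
print(report)
```

Sites that appear under `meme_only` would be candidates for episodic selection that a pervasive-selection test does not pick up.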
BUSTEDS
RELAX, how many lineages do I have? Do analysis between branches.
aBSREL
MEME, FEL, SLAC, FUBAR
GARD, subsample to ~30, Discuss recombination
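The notes do not say how the subsample of roughly 30 sequences for GARD should be drawn. One simple option, shown here as an assumption rather than the intended method, is a seeded uniform random sample of sequence names; a clade-aware scheme might be preferable in practice. The function name and the dummy records are hypothetical.

```python
import random


def subsample_records(records, n=30, seed=0):
    """Return a reproducible random subset of n sequences from {name: sequence}."""
    names = sorted(records)  # sort first so the result does not depend on dict order
    if len(names) <= n:
        return dict(records)
    rng = random.Random(seed)
    chosen = rng.sample(names, n)
    return {name: records[name] for name in sorted(chosen)}


# Hypothetical alignment with 269 dummy sequences, for illustration only.
all_records = {f"species_{i:03d}": "ATG" for i in range(269)}
subset = subsample_records(all_records, n=30, seed=42)
print(len(subset))
```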
HyPhy
Datamonkey
All HyPhy methods
Accessory methods, TN93, MAFFT, IQTREE
Evolution of Viral Genomes: Interplay Between Selection, Recombination, and Other Forces
# | Baseline MG94xREV | Baseline MG94xREV omega ratio | Corrected P-value | Full adaptive model | Full adaptive model (non-synonymous subs/site) | Full adaptive model (synonymous subs/site) | LRT | Nucleotide GTR | Rate Distributions | Rate classes | Uncorrected P-value | original name |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.271015 | 0.248128 | 0.00101124 | 1.24717 | 1.16192 | 0.0852501 | 23.9811 | 0.219682 | [[0.08351739391025588, 0.814034501673088], [25.78312631442658, 0.185965498326912]] | 2 | 2.05536e-06 | nan |
2 | 0.247485 | 0.402128 | 0.0224881 | 379.635 | 379.548 | 0.0870347 | 17.756 | 0.193016 | [[0.1794147285931847, 0.8288530517605159], [9090.000000003736, 0.1711469482394841]] | 2 | 4.68503e-05 | nan |
3 | 0.0148508 | 0.518222 | 0.000322385 | 0.175039 | 0.172598 | 0.00244112 | 26.2757 | 0.0177287 | [[0, 0.9744626224056592], [987.8020916848121, 0.02553737759434083]] | 2 | 6.49969e-07 | nan |
4 | 0.0688569 | 1e+10 | 0.0483007 | 1.04917 | 1.04569 | 0.00347803 | 16.2193 | 0.0859847 | [[0, 0.8800201943708488], [894.04943259379, 0.1199798056291512]] | 2 | 0.000101472 | nan |
5 | 0.147321 | 0.783096 | 0.0114947 | 0.658764 | 0.639233 | 0.0195312 | 19.1077 | 0.145414 | [[0.3838030590330714, 0.8444078195665582], [72.96547749683984, 0.1555921804334418]] | 2 | 2.37494e-05 | nan |
6 | 0.0494127 | 0.248901 | 0.0391371 | 2.74932 | 2.73446 | 0.0148621 | 16.6416 | 0.0494891 | [[0.08171724964798617, 0.957908917963494], [1557.699535943612, 0.04209108203650602]] | 2 | 8.20484e-05 | nan |
7 | 0.214756 | 0.410058 | 0.000978347 | 0.871563 | 0.806064 | 0.0654993 | 24.051 | 0.180228 | [[0.1095580176944264, 0.8408750960441533], [27.01378932551229, 0.1591249039558467]] | 2 | 1.98448e-06 | nan |
8 | 0.216423 | 0.404559 | 0.00334964 | 0.909491 | 0.84621 | 0.0632809 | 21.5789 | 0.192698 | [[0.08907333920270652, 0.8257657450393591], [26.96026079730138, 0.1742342549606409]] | 2 | 6.86401e-06 | nan |
9 | 0.0193913 | 0.467868 | 0.00343106 | 2.81498 | 2.81013 | 0.00485433 | 21.5269 | 0.0181038 | [[0.1905853889518029, 0.978009820385468], [9383.731856578368, 0.02199017961453198]] | 2 | 7.04531e-06 | nan |
10 | 0.0239495 | 0.406299 | 0 | 0.582777 | 0.580037 | 0.00273964 | 121.617 | 0.0215566 | [[0, 0.9502300085151673], [1517.731423200614, 0.04976999148483274]] | 2 | 0 | nan |
11 | 0.0562303 | 0.445824 | 0.000189855 | 0.323557 | 0.30473 | 0.0188271 | 27.3394 | 0.0539543 | [[0.2697852411103061, 0.968801038023459], [176.7157679676714, 0.03119896197654104]] | 2 | 3.81235e-07 | nan |
12 | 0.040963 | 0.863876 | 0 | 0.40471 | 0.40117 | 0.00353967 | 126.751 | 0.040804 | [[0.05420569097218374, 0.9502046359232122], [811.0053477596006, 0.04979536407678775]] | 2 | 0 | nan |
13 | 0.173598 | 0.346965 | 0.0120047 | 2.73766 | 2.72766 | 0.0100021 | 19.0172 | 0.134369 | [[1, 0.8545620545506069], [663.1141152380948, 0.1454379454493931]] | 2 | 2.48545e-05 | nan |
14 | 0.114996 | 0.809018 | 0.0028013 | 1.7344 | 1.73436 | 4.28557e-05 | 21.9431 | 0.100148 | [[1, 0.8556210297601691], [100000, 0.1443789702398309]] | 2 | 5.71693e-06 | nan |
15 | 0.122433 | 0.478445 | 0.000576403 | 1.01034 | 0.995354 | 0.0149887 | 25.1094 | 0.102575 | [[0.2539175969811507, 0.8727106483495446], [184.3916331081659, 0.1272893516504554]] | 2 | 1.16681e-06 | nan |
16 | 0.00817209 | 0.340903 | 0.0347516 | 0.119071 | 0.117071 | 0.00199951 | 16.8821 | 0.0080027 | [[0, 0.993626957337727], [3277.778847969584, 0.00637304266227301]] | 2 | 7.2702e-05 | XM_005749774_1_PREDICTED_Pundamilia_nyererei_cellular_tumor_antigen_p53_like_LOC102203120_mRNA_1 |
17 | 0.103901 | 0.293289 | 0.00233172 | 10.3832 | 10.3375 | 0.0456859 | 22.3126 | 0.0981326 | [[0.1357129025907057, 0.9482367361682671], [1557.113665935761, 0.05176326383173291]] | 2 | 4.74892e-06 | XM_007952255_1_PREDICTED_Orycteropus_afer_afer_tumor_protein_p53_TP53_mRNA_1 |
18 | 0.0457502 | 0.32037 | 1.27578e-11 | 0.241786 | 0.22455 | 0.0172363 | 60.3389 | 0.0455824 | [[0.02364363647801504, 0.9625574726250689], [123.5301901932855, 0.03744252737493114]] | 2 | 2.53131e-14 | XM_008010192_2_PREDICTED_Chlorocebus_sabaeus_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1 |
19 | 0.356247 | 0.320588 | 7.02686e-07 | 3.65051 | 3.53951 | 0.110996 | 38.5186 | 0.312879 | [[0.1159697821727439, 0.7769833043336732], [50.61114657575612, 0.2230166956663268]] | 2 | 1.40537e-09 | XM_008321788_3_PREDICTED_Cynoglossus_semilaevis_tumor_protein_p53_tp53_mRNA_1 |
20 | 0.110145 | 0.398121 | 0.00331385 | 0.673168 | 0.630454 | 0.0427146 | 21.6043 | 0.107742 | [[0.05381489102857581, 0.9132353699648752], [60.12599823633681, 0.0867646300351248]] | 2 | 6.7768e-06 | XM_010571169_1_PREDICTED_Haliaeetus_leucocephalus_tumor_protein_p53_TP53_mRNA_1 |
21 | 0.0315783 | 0.366843 | 2.68195e-08 | 6.57414 | 6.56455 | 0.0095854 | 45.047 | 0.0318192 | [[0, 0.9731198669707395], [9090.000000003736, 0.02688013302926051]] | 2 | 5.34253e-11 | XM_011727613_2_PREDICTED_Macaca_nemestrina_tumor_protein_p53_TP53_mRNA_1 |
22 | 0.248904 | 0.221391 | 0.0160571 | 102.609 | 102.513 | 0.0962467 | 18.4343 | 0.204709 | [[0.1563804928982661, 0.9012698341888755], [3847.530319680883, 0.09873016581112448]] | 2 | 3.33135e-05 | XM_014019513_1_PREDICTED_Austrofundulus_limnaeus_tumor_protein_p53_tp53_mRNA_1 |
23 | 0.121 | 0.319189 | 1.87822e-12 | 4.44712 | 4.42298 | 0.024138 | 64.1631 | 0.11054 | [[0.0168368360727766, 0.885082564787816], [568.7597482656681, 0.114917435212184]] | 2 | 3.71925e-15 | XM_014261886_1_PREDICTED_Pseudopodoces_humilis_tumor_protein_p53_TP53_mRNA_1 |
24 | 0.176395 | 0.243335 | 9.1023e-05 | 1.9998 | 1.93635 | 0.0634534 | 28.8093 | 0.15838 | [[0.02835874316489893, 0.8674912613583818], [81.9787911678715, 0.1325087386416182]] | 2 | 1.82411e-07 | XM_014893382_1_PREDICTED_Sturnus_vulgaris_tumor_protein_p53_TP53_partial_mRNA_1 |
25 | 0.135205 | 0.7309 | 0 | 81.0506 | 81.0302 | 0.0203742 | 88.5054 | 0.134317 | [[0.1397099080155256, 0.8439130611686997], [9090.000000003736, 0.1560869388313003]] | 2 | 0 | XM_015541817_1_PREDICTED_Panthera_tigris_altaica_tumor_protein_p53_TP53_partial_mRNA_1 |
26 | 0.0422313 | 0.749838 | 6.26628e-10 | 7.36183 | 7.35356 | 0.00827325 | 52.5537 | 0.0416236 | [[0.3056022114043056, 0.9651459173578737], [9090.000000003736, 0.03485408264212631]] | 2 | 1.24578e-12 | XM_015562573_1_PREDICTED_Myotis_davidii_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1 |
27 | 0.125564 | 0.678655 | 0 | 35.3403 | 35.3302 | 0.0101213 | 192.752 | 0.124534 | [[0.0676211734326852, 0.8629977433488712], [9090.000000003736, 0.1370022566511288]] | 2 | 0 | XM_015992323_1_PREDICTED_Peromyscus_maniculatus_bairdii_tumor_protein_p53_Tp53_transcript_variant_X1_mRNA_1 |
28 | 0.0714231 | 0.796249 | 0 | 13.9542 | 13.9481 | 0.00612411 | 157.307 | 0.0720728 | [[0, 0.9106059867613868], [9090.000000003736, 0.08939401323861318]] | 2 | 0 | XM_016931470_2_PREDICTED_Pan_troglodytes_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1 |
29 | 0.0594683 | 0.427825 | 5.62883e-14 | 0.561761 | 0.551066 | 0.0106943 | 70.5412 | 0.0498929 | [[0.1259844837537699, 0.9514527920641109], [376.2254588321816, 0.04854720793588907]] | 2 | 1.11022e-16 | XM_025195179_1_PREDICTED_Alligator_sinensis_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1 |
30 | 0.041931 | 0.258989 | 0.0219854 | 0.101613 | 0.0799033 | 0.0217093 | 17.8051 | 0.0408591 | [[0.03863496214427223, 0.9690592305995114], [41.23128517951871, 0.03094076940048862]] | 2 | 4.57078e-05 | XM_028780049_1_PREDICTED_Grammomys_surdaster_tumor_protein_p53_Tp53_transcript_variant_X1_mRNA_1 |
31 | 0.0610096 | 0.310105 | 0.00420115 | 5.10637 | 5.07529 | 0.0310796 | 21.1196 | 0.0621175 | [[0.1018540946120266, 0.9626603774309264], [1557.699906332703, 0.03733962256907364]] | 2 | 8.64434e-06 | XM_030474375_1_PREDICTED_Strigops_habroptila_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1 |
32 | 0.0550798 | 0.651481 | 0.00619516 | 0.335227 | 0.323262 | 0.0119644 | 20.3421 | 0.0548318 | [[0.2411625338720781, 0.9514469608672388], [193.8137073349213, 0.04855303913276121]] | 2 | 1.27735e-05 | XM_030548110_1_PREDICTED_Gopherus_evgoodei_tumor_protein_p53_TP53_mRNA_1 |
33 | 0.0355974 | 1.00236 | 0 | 25.751 | 25.7489 | 0.00206024 | 87.036 | 0.0354803 | [[0.7066674874553129, 0.9554163256482412], [100000, 0.04458367435175881]] | 2 | 0 | XM_030861565_1_PREDICTED_Globicephala_melas_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1 |
34 | 0.120016 | 0.254143 | 0.022593 | 0.893713 | 0.83993 | 0.0537829 | 17.7426 | 0.108033 | [[0.09255911594996524, 0.9144690076952635], [64.15460824206325, 0.08553099230473649]] | 2 | 4.7167e-05 | XM_030970056_1_PREDICTED_Camarhynchus_parvulus_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1 |
35 | 0.0242619 | 0.599177 | 0.000241366 | 0.0477902 | 0.0401435 | 0.00764665 | 26.8568 | 0.0250599 | [[0.2009160516548929, 0.9798753216487776], [83.28854940137336, 0.02012467835122245]] | 2 | 4.85646e-07 | XM_032280430_1_PREDICTED_Sapajus_apella_tumor_protein_p53_TP53_transcript_variant_X1_mRNA_1 |
36 | 0.0515846 | 0.500838 | 0.000548347 | 15.0164 | 15.0013 | 0.0150812 | 25.2129 | 0.0499321 | [[0.1917560263656985, 0.9609786199719969], [9090.000000003736, 0.03902138002800315]] | 2 | 1.10777e-06 | XM_032794536_1_PREDICTED_Chelonoidis_abingdonii_tumor_protein_p53_TP53_mRNA_1 |
37 | 0.213434 | 0.37986 | 3.69073e-08 | 95.6181 | 95.564 | 0.0540627 | 44.4054 | 0.193231 | [[0.1554275170793705, 0.8360599805867974], [3846.116864262084, 0.1639400194132026]] | 2 | 7.36673e-11 | XM_035135727_1_PREDICTED_Zootoca_vivipara_tumor_protein_p53_TP53_transcript_variant_X2_mRNA_1 |
38 | 0.0430373 | 0.588996 | 0 | 0.65763 | 0.649333 | 0.00829716 | 87.2536 | 0.0433632 | [[0.2025151800612972, 0.9628558985353769], [746.4563181259871, 0.0371441014646231]] | 2 | 0 | XM_041664596_1_PREDICTED_Microtus_oregoni_tumor_protein_p53_Tp53_transcript_variant_X1_mRNA_1 |
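
The corrected p-values in the table above look consistent with a Holm-Bonferroni step-down correction over all 513 tested branches. For example, row 4 has the largest uncorrected p-value among the 38 significant branches (0.000101472), and 0.000101472 × (513 − 38 + 1) ≈ 0.0483007, which matches its corrected value. The sketch below implements that correction in plain Python. The function name and the `n_tests` parameter are hypothetical, and the small example p-values are made up.

```python
def holm_bonferroni(pvalues, n_tests=None):
    """Return Holm-Bonferroni adjusted p-values in the input order.

    n_tests: total number of tests. Defaults to len(pvalues). If it is larger,
    the p-values not supplied are assumed to exceed all supplied ones.
    """
    m = len(pvalues) if n_tests is None else n_tests
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_max = 0.0
    for rank, idx in enumerate(order):
        value = min(1.0, pvalues[idx] * (m - rank))
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Made-up p-values, for illustration only.
adjusted = holm_bonferroni([0.01, 0.04, 0.03])
print(adjusted)
```

In this example the second and third values share the same adjusted p-value because the step-down procedure never lets an adjusted value fall below that of a smaller raw p-value.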
v1 https://colab.research.google.com/drive/1JWvp9zTEqCslH5P2_9UhwQllmOJOGAF9?usp=sharing
v2 https://colab.research.google.com/drive/1fNjBYkIOH9DFD8hdmXJTekj8Hzg-g3Jl?usp=sharing
BUSTEDS Unconstrained model | BUSTEDS Constrained model |
---|---|
Figure 1. BUSTEDS found evidence (LRT, p-value ≤ 0.05) of gene-wide episodic diversifying selection in the selected test branches of your phylogeny.