Phylogeny
Notes
NONA
Winclada
TNT
TNT tutorials
https://www.researchgate.net/publication/323615953_A_guide_for_the_analysis_of_continuous_and_landmark_characters_in_TNT_Tree_Analysis_using_New_Technologies
https://matthewvavrek.com/2011/10/19/tnt-and-ubuntu/
https://sites.google.com/site/rossmounce/palass2011
TNT-Bootstrap
http://faculty.baruch.cuny.edu/jwahlert/bio1003/winclada_tnt_separate_basics.html
https://www.academia.edu/10365340/Guia_para_o_software_T.N.T._capítulo_de_dissertação_de_mestrado_atualizado_Guide_for_the_software_T.N.T._updated_MSc_thesis_chapter_in_Portuguese
Basic TNT instructions for bootstrapping
By Peter Unmack
TNT can be a little confusing to use. The following provides minimal instructions for running a bootstrap analysis.
Open TNT, go to Settings > Memory, and set the memory to 600 MB and the number of trees to 10,000. Open your data file (a Nexus file is fine). Set an outgroup under Data > Outgroup taxon, then select one (you can only select one). Next go to Trees > Multiple tags > Store tree tags. To set up the bootstrap run, go to Analyze > Resampling and choose Bootstrap, standard. Untick "frequency differences" and tick "absolute frequency" right above it, set the number of replicates, and use a traditional search. I don't change any of the other options, as they all seem quite appropriate.
To view the results once the search has completed go Trees>multiple tags>show/save tags
You can either print out the tree or to save it hit m, which will prompt you to save the file as an enhanced metafile. At that point you are done. Hit escape to get back to the main TNT interface. You can also save the file to a nexus tree file, however, I've never messed with this. This link had the options, but the links are currently broken and I couldn't find the correct link to update them! http://tnt.insectmuseum.org/index.php/Commands/export. The FAQ explains how to convert it to newick format too. http://tnt.insectmuseum.org/index.php/FAQ#How_can_I_export_trees_then_convert_them_to_Newick_format.3F
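For readers who want to see what the "standard" bootstrap does conceptually, here is a minimal Python sketch. It is not TNT's implementation: it only shows that each pseudoreplicate draws characters (columns) with replacement until it has as many characters as the original matrix. The function name `bootstrap_replicate`, the toy matrix, and the seed are illustrative assumptions.

```python
import random

def bootstrap_replicate(matrix, rng):
    """Return (pseudoreplicate, picked_columns) for one bootstrap draw.

    `matrix` maps taxon name -> list of character states (equal lengths).
    """
    n_chars = len(next(iter(matrix.values())))
    # Draw n_chars column indices with replacement.
    picks = [rng.randrange(n_chars) for _ in range(n_chars)]
    resampled = {
        taxon: [states[i] for i in picks] for taxon, states in matrix.items()
    }
    return resampled, picks

# Hypothetical toy matrix: 3 taxa, 4 binary characters.
toy = {
    "outgroup": ["0", "0", "0", "0"],
    "taxon_A":  ["1", "0", "1", "0"],
    "taxon_B":  ["1", "1", "1", "0"],
}
replicate, picks = bootstrap_replicate(toy, random.Random(1))
```

Each replicate would then be searched for trees, and the frequency with which a group appears across replicates is the value TNT stores in the tree tags.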
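The notes below record operational details from building the morphological phylogenetic tree for the final part of my thesis. I hope to turn them into a tutorial someday, so that others (and I) can avoid repeating the same detours.
Winclada is a dedicated program for building character matrices, viewing trees, and acting as a front end that launches other tree-building programs. It is essentially free and has reportedly been renamed "Alisa?" (name uncertain) as a transitional release ahead of the next version. NONA, whose name comes from "No name", is a tree-search program by Pablo Goloboff (Winclada itself is by Kevin Nixon). It is designed specifically for maximum parsimony searches and can be launched from Winclada. Winclada can be regarded as a GUI for NONA and similar programs; other software that can be launched from it includes TNT and MrBayes.
NONA only runs on 32-bit operating systems, so I run it in a virtual machine (VirtualBox) with Windows 7.
The Nexus files that Winclada can read follow an older version of the format, which differs from the Nexus format in common use today. See the series of introductory videos by Kristen Porter-Utley on YouTube; a format template is attached at the end of this document.
TNT can read .tnt or .nex files. A .tnt file must be saved in ANSI encoding; if it is saved as UTF-8, reading fails. Alternatively, create a template .tnt file with TNT first and then edit its contents; as long as the saved encoding is not changed, it should load without trouble. The cause may be the default encoding of the computer on which TNT is installed, or the default encoding assumed when TNT was developed.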
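If a matrix has already been saved as UTF-8, it can be re-saved in a single-byte encoding with a few lines of Python. This is only a sketch under one assumption: that "ANSI" here means the Windows-1252 code page (`cp1252`), which may differ on machines with another system locale. The function name `utf8_to_ansi` is hypothetical.

```python
from pathlib import Path

def utf8_to_ansi(src, dst, ansi_codec="cp1252"):
    """Re-save a UTF-8 text file in a single-byte Windows code page.

    ansi_codec is an assumption: "ANSI" is taken to mean Windows-1252.
    """
    # Decode from bytes so the original line endings are preserved;
    # "utf-8-sig" also drops a byte-order mark if the editor wrote one.
    text = Path(src).read_bytes().decode("utf-8-sig")
    # encode() raises UnicodeEncodeError if a character has no
    # equivalent in the target code page, instead of corrupting it.
    Path(dst).write_bytes(text.encode(ansi_codec))
    return dst
```

A call such as `utf8_to_ansi("matrix_utf8.tnt", "matrix_ansi.tnt")` (both file names hypothetical) writes a new file and leaves the original untouched. An encoding error at this step usually points to non-Latin characters in taxon names or comments.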
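The "fit" reported by TNT is actually the "distortion" value, so lower is better.
Implied weighting is the method proposed by Goloboff (1993). Its basic idea is this: if the characters in the matrix show homoplasy, the actual evolutionary tree should have more steps than the most parsimonious tree. Given a value of k, the method computes the weight of each character on a particular tree and iterates to find the best-performing tree (the one with the lowest "best score" reported by TBR) (Congreve & Lamsdell, 2016, p. 453). The best tree found this way is not necessarily the most parsimonious tree (MPT), but it is the tree that best fits the pattern of character change.
The parsimony principle originally grew out of the idea that, when characters are homologous, character change is not easy. If the characters are no longer homologous, the most parsimonious tree by definition is not the most likely evolutionary scenario. Implied weighting therefore essentially redefines what "parsimony" means, and the tree obtained under implied weighting can be regarded as the tree most consistent with the characters in the matrix under a particular weighting.
Proponents of implied weighting basically look at parsimony from a different angle. In practice, the trees found by both approaches are the trees with the fewest steps under their respective conditions, and from that point of view there is no such problem as a "non-most-parsimonious" tree.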
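A small numerical sketch may make the difference between "fewest steps" and "lowest distortion" concrete. It assumes the usual per-character formulas, distortion = e / (e + k) and fit = k / (k + e), where e is the number of extra (homoplastic) steps of a character on the tree. The two trees and their step counts are invented for illustration.

```python
def distortion(extra_steps, k):
    """Sum of e / (e + k) over characters; lower is better."""
    return sum(e / (e + k) for e in extra_steps)

def total_fit(extra_steps, k):
    """Sum of k / (k + e) over characters; higher is better."""
    return sum(k / (k + e) for e in extra_steps)

# Hypothetical extra steps per character on two competing trees.
tree_a = [2, 2, 2]   # 6 extra steps, spread over three characters
tree_b = [7, 0, 0]   # 7 extra steps, all in one character

k = 3
print(sum(tree_a), round(distortion(tree_a, k), 3))  # 6 1.2
print(sum(tree_b), round(distortion(tree_b, k), 3))  # 7 0.7
```

Tree B needs one more step than tree A, yet its distortion is lower, because a single highly homoplastic character is down-weighted while the other two fit perfectly. This is the sense in which the implied-weights optimum need not be the MPT.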
Legg DA et al. Nature communications 4: 2485.
This paper attributes its explanation of the meaning of k to Goloboff (1993), but the citation is incorrect.
DeSilva et al. Zoologica Scripta 44: 59-71.
This paper states that fit gradually decreases as k increases. That is true, but the Goloboff (1993) it cites does not say so.
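The trend itself follows directly from the form of the score. Assuming the per-character distortion is e / (e + k), with e the extra steps of a character (this formula is supplied here for illustration and is not quoted from either paper), the total distortion D on a fixed tree satisfies:

```latex
D(k) = \sum_i \frac{e_i}{e_i + k},
\qquad
\frac{\partial D}{\partial k} = -\sum_i \frac{e_i}{(e_i + k)^2} \le 0 .
```

So on any fixed tree the distortion can only fall as k rises, with equality only when no character has extra steps. Because the optimal score is the minimum over trees of such non-increasing functions, it is non-increasing in k as well, which means scores obtained under different k values cannot be compared as measures of tree quality.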
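In practice, relative Bremer support is often computed using only the trees already used to compute absolute Bremer support, although in theory it should be computed independently, as in Goloboff's discussion of Bremer support and its variants in the paper introducing symmetric resampling: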
An example is the Bremer support and its variants, where the support is measured by comparing the fit of the data to optimal and suboptimal trees. The absolute (Bremer, 1988, 1994) and relative (Goloboff and Farris, 2001) Bremer supports measure two different aspects of group support. One aspect of the support is the absolute amount of favorable evidence, measured by the absolute Bremer support. The other aspect is the ratio between favorable and contradictory evidence, measured by the relative Bremer support. Ideally, these two quantities should be measured separately, because they represent two aspects of the support that can vary independently, but in practical terms it will often be preferable to combine them in a single value. (Goloboff et al., 2003, p. 326)
Under implied weighting, Bremer support takes fractional values. Fractional Bremer supports are unintuitive and hard to interpret, whereas the relative value measures the conflict within the data and is easier to read. See the following passage:
Under weighting methods such as those of Goloboff (1993, 1997), the Bremer supports may be hard to interpret, but the relative supports, for different weighting strengths, [are] directly comparable. (Goloboff & Farris, 2001, p. S32, upper left)
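The two quantities can be illustrated with a minimal sketch. It assumes the relative fit difference form, relative = (F - C) / F, with F the favorable and C the contradictory evidence for a group, and absolute = F - C. In a real analysis F and C are estimated from suboptimal trees; the numbers below are invented.

```python
def bremer_supports(favorable, contradictory):
    """Return (absolute, relative) support for one group.

    favorable (F) and contradictory (C) are amounts of evidence, in
    steps or in weighted fit units. Assumed forms: absolute = F - C,
    relative = (F - C) / F.
    """
    if favorable <= 0:
        raise ValueError("favorable evidence must be positive")
    absolute = favorable - contradictory
    return absolute, absolute / favorable

print(bremer_supports(10, 8))  # (2, 0.2)
print(bremer_supports(2, 0))   # (2, 1.0)
```

Both groups have the same absolute support of 2, but the first is heavily contradicted and the second is not contradicted at all. The relative value is a ratio, so it stays on a 0 to 1 scale for supported groups even when F and C are fractional fit values under implied weighting.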
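The following is summarized from Goloboff et al. (2003):
Jackknife and bootstrap are measures of the stability of the data. Group support can be inferred from them indirectly, but what they actually measure is stability: given the data, would the same result turn up again and again? Group support, by contrast, asks about the ratio of favorable to contradictory evidence in the existing data, i.e. whether the favorable evidence currently outweighs the contradictory evidence, much as Bayesian statistics asks whether a hypothesis can be "supported" or "rejected". Supported groups usually pass a sensitivity test (equivalent to a stability test), but they need not. A group that is well supported under particular conditions (the existing data) remains supported under those conditions, even if it would not be supported in other scenarios.
GC stands for Group present/Contradicted. Three reference values, -1, 0, and +1, mark different degrees of support.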
Value | Meaning |
---|---|
-1 | maximum contradiction |
0 | indifference |
+1 | maximum support |
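GC remains applicable even when the absolute frequency is < 0.5.
The ideal case is absolute frequency > 0.5 and 0 < GC <= +1.
Basically, among negative values, the closer the value is to 0, the higher the support.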
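As a minimal sketch of how a GC value arises, assume GC is the frequency of a group minus the frequency of its most frequent contradicting group across pseudoreplicates. The group representation (frozensets of taxon labels), the helper names, and the replicate data are illustrative assumptions, not TNT output.

```python
from collections import Counter

def conflicts(a, b):
    """Groups conflict when they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)

def gc_value(target, replicates):
    """GC = freq(target) - freq(most frequent group contradicting it)."""
    target = frozenset(target)
    n = len(replicates)
    # Count in how many replicates each group appears.
    counts = Counter(g for rep in replicates for g in set(rep))
    freq_target = counts[target] / n
    freq_contra = max(
        (c / n for g, c in counts.items() if conflicts(target, g)),
        default=0.0,
    )
    return freq_target - freq_contra

AB, BC = frozenset("AB"), frozenset("BC")
# 10 hypothetical pseudoreplicates: AB found in 4, BC in 1, neither in 5.
reps = [{AB}] * 4 + [{BC}] * 1 + [set()] * 5
print(round(gc_value("AB", reps), 2))  # 0.3
```

Here the group A+B has an absolute frequency of only 0.4, yet GC is positive because the strongest contradicting group is much rarer. Swapping the counts (A+B in 2 replicates, B+C in 6) gives GC = -0.4.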
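Run tree searches under all candidate parameter settings, take the consensus of all the resulting evolutionary hypotheses, and report that tree. This is the approach used by Mirande (2009, 2010, 2011), which is why the trees in his main figures carry no support values; where support values are given, they were computed under one particular parameter set.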
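The consensus step can be sketched as a set intersection: only groups recovered under every parameter setting survive. This is an illustration, not Mirande's actual procedure; the k values and the clades below are invented, and each tree is represented simply as a set of frozenset clades.

```python
def strict_consensus(trees):
    """Groups present in every tree; each tree is a set of frozenset clades."""
    trees = [set(t) for t in trees]
    if not trees:
        return set()
    return set.intersection(*trees)

# Hypothetical optimal trees for taxa A-E under three values of k.
results_by_k = {
    3:  {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
    7:  {frozenset("AB"), frozenset("DE"), frozenset("ABDE")},
    12: {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
}
consensus = strict_consensus(results_by_k.values())
print(sorted("".join(sorted(g)) for g in consensus))  # ['AB', 'DE']
```

Because the reported tree pools analyses run under different weighting strengths, a single set of support values does not belong to it, which matches the observation about the main figures.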
Soltis PS, Soltis DE. 2003. Applying the bootstrap in phylogeny reconstruction. Stat Sci 18: 256-267
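All four values (CI, Ci, RI, Ri) satisfy 0 <= index <= 1.
CI describes the amount of homoplasy in the whole character matrix on a given tree. The closer it is to 1, the less homoplasy the matrix shows overall; the lower it is, the more often character states arise repeatedly. Ci is the same measure, computed character by character.
RI is similar to CI, but it reflects how well the characters support clades, i.e. the degree to which characters act as synapomorphies of a clade. The closer it is to 1, the better the characters as a whole support the clades of the given topology, and vice versa. Ri, computed per character, is useful as supplementary information alongside Ci: two characters with the same Ci can have different Ri (for one character on one matrix, both indices are fixed by its step count, so the contrast only arises between characters). A higher Ri means that, although the character is homoplastic, some clades are still supported by it.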
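The indices can be checked by hand with the standard formulas, ci = m / s and ri = (g - s) / (g - m), where m is the minimum possible number of steps for a character, s the observed steps on the tree, and g the maximum possible steps. The sketch below uses invented (m, s, g) values; the function names are hypothetical.

```python
def ci(m, s):
    """Consistency index: minimum possible steps / observed steps."""
    return m / s

def ri(m, s, g):
    """Retention index: (g - s) / (g - m); None if g == m (uninformative)."""
    if g == m:
        return None
    return (g - s) / (g - m)

def ensemble(chars):
    """Ensemble (CI, RI) from per-character (m, s, g) triples."""
    M = sum(m for m, s, g in chars)
    S = sum(s for m, s, g in chars)
    G = sum(g for m, s, g in chars)
    return M / S, (G - S) / (G - M)

char_1 = (1, 2, 3)  # m, s, g
char_2 = (1, 2, 2)
print(ci(*char_1[:2]), ri(*char_1))  # 0.5 0.5
print(ci(*char_2[:2]), ri(*char_2))  # 0.5 0.0
```

Both characters have the same ci, but only the first retains any synapomorphy information. Since m <= s <= g, every index must fall between 0 and 1, so a value above 1 from any script signals a computation problem rather than a property of the data.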
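When computing CI and RI with the STATS.RUN script that ships with TNT, note the following: if a weighting set has been defined (xpiwe [ ;]), the subsequent STATS.RUN run hits an error of unknown cause when setting var. 'this', and the resulting CI and RI come out greater than 1, which is very strange. Without a weighting set the problem does not occur. In addition to STATS.RUN, it is advisable to double-check the values with CharStats.run, a script written by Martín Ramírez and published on his web page.
Martín Ramírez's Google Sites page: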
https://sites.google.com/site/teosiste/tp/archivos
xpiwe (*
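Lets different characters use different weights; the "*" means the weight is determined by the amount of missing data in each character.
Fixing garbled characters in the results window: open Format/Optional table formats. Once it is enabled, the previously garbled output is printed as tables, in English.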
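Partitioned Bremer Support (PBS) code
Usage of k-search.run
After reading in the data matrix with "piwe= ;" set, enter "k-search.run def ;". Note that this script does not tolerate deactivated taxa, i.e. you cannot use the "-" function of taxcode to deactivate taxa in the data matrix.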
Wheeler, W.C., Gatesy, J., DeSalle, R., 1995. Elision: a method for accommodating multiple molecular sequence alignments with alignment-ambiguous sites. Mol. Phylogenet. Evol. 4, 1–9.
Wheeler, W. C. 1995. Sequence alignment, parameter sensitivity, and the phylogenetic analysis of molecular data. Syst. Biol. 44:321-331.
https://research.amnh.org/scicomp/pdfs/wheeler/Wheeler1995.pdf
Wheeler WC. 1999. Cladistics 15: 131-135
https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1096-0031.1999.tb00255.x
Giribet G. 2003. Stability in phylogenetic formulations and its relationship to nodal support. Syst Biol 52: 554-564 (the most important review!)
https://www.jstor.org/stable/3651143?seq=1#metadata_info_tab_contents
Prendini, L., 2000. Phylogeny and classification of the superfamily Scorpionoidea Latreille 1802 (Chelicerata, Scorpiones): an exemplar approach. Cladistics 16, 1–78.
Giribet G et al. 2002. Phylogeny and systematic position of Opiliones: a combined analysis of Chelicerate relationships using morphological and molecular data. Cladistics 18: 5-70.
https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1096-0031.2002.tb00140.x
Whiting, M.F., Carpenter, J.M., Wheeler, Q.D., Wheeler, W.C., 1997. The Strepsiptera problem: phylogeny of the holometabolous insect orders inferred from 18S and 28S ribosomal DNA sequences and morphology. Syst. Biol. 46, 1–68.
And
Mirande 2009 Cladistics.
Vanegas-Rios J, Faustino-Fuster DR, Meza-Vargas V, Ortega H. 2020. J Zool Syst Evol Res 58: 387-407.
https://onlinelibrary.wiley.com/doi/pdf/10.1111/jzs.12346
Supplementary:
Wheeler WC. 2003. Cladistics 19: 261-268
https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1096-0031.2003.tb00369.x
Likelihood-Based Tests of Topologies in Phylogenetics
https://academic.oup.com/sysbio/article/49/4/652/1678908
Templeton test in parsimony
http://phylobotanist.blogspot.com/2015/09/the-templeton-test-in-parsimony.html
Basic MrBayes tutorial
https://github.com/reelab/nie-phylogenetics/wiki
Explanation of the commands for running MrBayes with morphological data
http://phylobotanist.blogspot.com/2016/07/the-markov-k-model-for-discrete.html
Trying models other than the Mk model for morphological Bayesian trees
https://academic.oup.com/sysbio/article/67/2/328/4102005
RevBayes - graphical models for Bayesian phylogenetic analysis, language and program
https://academic.oup.com/sysbio/article/65/4/726/1753608
Result
Although generally poorly supported, the strict consensus tree (Fig. 1) is well resolved and reveals some interesting patterns. The monophyly of Boloria s.l. (clade 1) is supported strongly by three unique autapomorphies: uncus bifid (character 12: 1, Fig. 3C), female ductus bursae raised vertically with an arched, sclerotized plate just posterior to ductus seminalis (character 44: 1, Fig. 6A) and female bursa ostium with sclerotized ventral plates (character 46: 1, Fig. 6A).
Peña, C., Wahlberg, N., Weingartner, E., Kodandaramaiah, U., Nylin, S., Freitas, A.V.L. & Brower, A.V.Z. (2006) Higher level phylogeny of Satyrinae butterflies (Lepidoptera: Nymphalidae) based on DNA sequence data. Molecular Phylogenetics and Evolution, 40, 29–49.
negative Bremer support script
https://www.sciencedirect.com/science/article/abs/pii/S1055790306000583?via%3Dihub
Hemiptera-TNT+Winclada+NONA
https://onlinelibrary.wiley.com/doi/pdf/10.1111/syen.12140?casa_token=29t_jCVvKEAAAAAA:4NwRSz4U50_QriKCC41qSwlL2L8sbE1AYj68EUfGWAY3pU-nEpFefYdpIY17KPcduIemf-lp1q6Iz_37cw
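Bark beetles-TNT+Winclada+NONA >> possibly erroneous: the paper says implied weighting was used, yet it reports integer Bremer support values, which should not be possible. It also contains an odd command: find*;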
https://onlinelibrary.wiley.com/doi/pdf/10.1111/syen.12149?casa_token=N0eH77BKJuYAAAAA:ecdOUV6lcbMfBPdUkhzd_7KWT1RaRrElWQH-Ol_Zy7pmKsDHnyG__QXuQ8b2MwzDttox7bWJKDvZiKLTMA
Bremer support - good explanation
https://books.google.com.tw/books?id=csgACAAAQBAJ&pg=PA22&lpg=PA22&dq=bremer+support&source=bl&ots=VRoOLeQqLx&sig=ACfU3U1O_QIfqTJ0jSE0gEtNvue0-EG9Ew&hl=zh-TW&sa=X&ved=2ahUKEwjgocqqwqnpAhXDGaYKHcV9D2oQ6AEwE3oECAkQAQ#v=onepage&q=bremer support&f=false
Vespid wasp phylogeny: uses a self-written script to calculate the k value
https://onlinelibrary.wiley.com/doi/pdf/10.1111/syen.12105?casa_token=11P39LeLe1oAAAAA:hvsQq5HBZag1W16sALFSnxYZeaZ9sotpxrUQpnHGnZQTQkorE9OBlwB7QOvm10-2cVwXF5pRR28imUg82g
Trichoptera phylogeny
https://onlinelibrary.wiley.com/doi/pdf/10.1111/syen.12225?casa_token=8_PDMExt7DQAAAAA:V0hmnzOaew–EF40GwrmvTFNMdQTC6LEsQYfNrh14rS7jR7Cd9OANh4WxElI5Px61owx00rLAtb-rNGNDQ
Sawfly phylogeny: uses the sutures on the back of the head as characters
https://onlinelibrary.wiley.com/doi/pdf/10.1111/syen.12314
Thrips phylogeny
https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1365-3113.2009.00511.x
A method that converts continuous data into ratios and then applies gap weighting
https://onlinelibrary.wiley.com/doi/full/10.1111/zsc.12120
Duke Wiki: several guides on tree construction and statistical values (ancestral state reconstruction)
https://wiki.duke.edu/display/AnthroTree/2.3.2+Calculating+the+CI+and+RI+in+Mesquite
Google Books, the "white book": Kitching IJ, Forey PL, Humphries CJ, Williams DM. 1998. Cladistics: the theory and practice of parsimony analysis, 2nd ed. Oxford: Oxford University Press.
https://books.google.com.tw/books?id=F4JvBaeAj_oC&pg=PA95&lpg=PA95&dq=ensemble+consistent+index&source=bl&ots=2Q4dYc7_1D&sig=ACfU3U2xz7fskX0DxhLwynPDRYCl4fN6rQ&hl=zh-TW&sa=X&ved=2ahUKEwiu0oHdpbLpAhVIHKYKHeKkA7oQ6AEwCnoECAUQAQ#v=onepage&q=ensemble consistent index&f=false
Klingenberg CP et al. 2010. Testing and quantifying phylogenetic signals and homoplasy in morphometric data. Systematic Biology 59: 245-261
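Infers phylogenetic relationships from morphometric data. Interesting, because the result differs from the molecular one; so what evolutionary information does this hierarchical signal actually reflect?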
https://academic.oup.com/sysbio/article/59/3/245/1699888
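Re-discretize the continuous-data portion of the reduced data matrix and run the trees again, because the class intervals change once taxa have been removed.
Treat the phylogenetic tree as a kind of multivariate analysis: remove the non-monophyletic parts, then recompute clade support to see whether support improves and a more stable tree results. >> This later failed: support did not change, which means the characters cannot resolve the finer clades further down the tree.
The anterior margin of the ovipositor in Washamyia should be recoded as "-", not "?".
Test the evolutionary scenario for the Taiwanese species of Rhopalomyia: speciation on a single host, or repeated host shifts?
Advisor: the evolution of gall-inducing traits >> but our sampling of the monophyletic group is insufficient (no species from China), and support is low.
Me: plants constrain gall formation >> but more and larger trees would be needed to show that plant organs constrain gall morphology.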
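The note above about re-discretizing after removing taxa can be illustrated with a minimal gap-weighting sketch. It assumes the usual range-standardization formula, x_s = (x - min) / (max - min) * max_state, followed by rounding. The taxon names, ratios, and `max_state` value are invented.

```python
def gap_weight(values, max_state):
    """Range-standardise values to 0..max_state and round to integers.

    Assumed formula: x_s = (x - min) / (max - min) * max_state.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {taxon: 0 for taxon in values}
    return {
        taxon: int((x - lo) / (hi - lo) * max_state + 0.5)  # round half up
        for taxon, x in values.items()
    }

ratios = {"sp1": 1.0, "sp2": 2.0, "sp3": 3.0, "sp4": 5.0}  # hypothetical
print(gap_weight(ratios, 4))   # {'sp1': 0, 'sp2': 1, 'sp3': 2, 'sp4': 4}

# Dropping the taxon with the extreme value rescales every other code.
reduced = {t: x for t, x in ratios.items() if t != "sp4"}
print(gap_weight(reduced, 4))  # {'sp1': 0, 'sp2': 2, 'sp3': 4}
```

The codes for sp2 and sp3 change even though their measurements did not, which is why the discretized part of a reduced matrix has to be regenerated rather than copied from the full matrix.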