# Tutorial relax

###### tags: `sloth`

- Table of Content

[ToC]

# 1-) Software and files needed to run the analyses

I recommend making a conda environment with all of the software below. You can download my scripts from GitHub.

## 1.1-) Software needed

- Hyphy - https://anaconda.org/bioconda/hyphy/
- Phyml - https://anaconda.org/bioconda/phyml
- mafft - https://anaconda.org/bioconda/mafft
- Hyphy codon-msa - https://github.com/veg/hyphy-analyses/blob/master/codon-msa/pre-msa.bf
- Marcela's scripts: https://github.com/marcelauliano/mChoDid1/tree/main/hyphy

## 1.2-) Files needed

https://drive.google.com/drive/u/1/folders/1X0cuBfl7XuBICiM32vTvxgZWJXXltmC8

# 2-) Find the sloth gene of interest

We want to test the oxidative phosphorylation (and several other) genes for relaxed selection, so I have annotated the sloth genes with KEGG pathways. You need to find a KO on the KEGG map to select the sloth gene you want to test. Follow the steps below.

For oxidative phosphorylation, go here: https://www.genome.jp/pathway/map00190

Click one of the pathway modules, for example M00144: https://www.genome.jp/module/M00144

Let's work with K03884.

## 2.1-) Find the sloth gene given the KO

Grep the KO in `user_ko.txt`. The command below uses K01810 as the example; replace it with the KO you picked:

```
grep "K01810" user_ko.txt
```

This will give you the gene name, as below:

`mChoDid1_lcl|NC_051333.1_prot_XP_037675410.1_52482 K01810`

Once you have the gene name, you can find the orthogroup that contains the genes for all the other species. Use the gene name from your own grep; the ID below comes from a different example:

```
grep "mChoDid1_lcl|NC_006924.1_prot_YP_220691.1_54703" Orthogroups.txt
```

Then we find that the orthogroup is OG0002625 (just an example).

## 2.2-) Getting all nucleotide and protein sequences for that orthogroup

Once you have the orthogroup name, you can use my scripts to get the fasta sequences for all species. First get all the partial IDs:

```
python get_orthogroup_edited_IDs.py -f Orthogroups.txt -i OG0002625
```

Then get the nucleotide sequences:

```
python get_Fasta_byPartial_id.py -f all.longest.n1.nucl.fa -l OG0002625.ids -o OG0002625.nuc.fa
```

# 3-) Preparing alignments

Once you have both files, you need to run hyphy codon-msa. The first step is to run the pre script on the nucleotide file:

```
hyphy pre-msa.bf --input OG0002625.nuc.fa
```

This run will give you a protein fasta file, alongside the `OG0002625.nuc.fa_nuc.fas` file used in the post step. You need to align the protein file with mafft:

```
mafft --globalpair --maxiterate 100 --inputorder OG0002625.nuc.fa_protein.fas > OG0002625.nuc.fa_protein.mafft.fas
```

Then, finally, you run the post script:

```
hyphy post-msa.bf --protein-msa OG0002625.nuc.fa_protein.mafft.fas --nucleotide-sequences OG0002625.nuc.fa_nuc.fas --output OG0002625.nuc.final.fas
```

The output of this last script is the alignment you will give to phyml to build a tree, and to hyphy relax.

# 4-) Building a tree with phyml

PS: you can use any tree-building method you'd like! It does not have to be phyml. You just need to be sure you have a reliable tree.

For hyphy relax, you will need to input a tree and a fasta file that have exactly the same IDs. This is the part that is most annoying to automate for me at the moment, so for now you need to edit the IDs by hand. Let's say you have two nucleotide sequences that start with "Dasnov3". You will need to edit one of them so the software doesn't get confused by the repeated name. After you edit the IDs, you should shorten them, for example by cutting everything from the first underscore onwards:

```
sed 's/_.*//g' OG0002625.nuc.final.fas > OG0002625.nuc.final1.fas
```

## 4.1-) Run phyml

First convert the fasta into PHYLIP format with the Python snippet below.
```
from Bio import SeqIO

# Read the shortened-ID alignment and write it back out in PHYLIP format.
# Note: Biopython's "phylip" writer truncates IDs to 10 characters,
# so the shortened IDs must still be unique within that length.
records = SeqIO.parse("OG0002625.nuc.final1.fas", "fasta")
count = SeqIO.write(records, "OG0002625.nuc.final1.fas.phylip", "phylip")
print("Converted %i records" % count)
```

Edit the file names in the `records` and `count` lines, save as a Python script and run it. Once you have the PHYLIP file, run phyml:

```
phyml -i OG0002625.nuc.final1.fas.phylip
```

Attention!! Check your tree at the end with something like dendroscope!! Your relax result will only be as reliable as your tree is!!!

# 5-) Run hyphy relax

Edit the phyml tree, putting {test} right after the sloth ID (mChoDid1), and run hyphy relax as:

```
hyphy relax --alignment "OG0002625.nuc.final1.fas" --tree "OG0002625.nuc.final1.fas.phylip_phyml_tree.txt" --test "test" --output "OG0002625.json"
```

Open the results at: http://vision.hyphy.org/RELAX

# 6-) Other considerations

Luisa and Jacqui, I'd like to ask you to record here the orthogroup and KO numbers you have run the analyses for, so we don't double the work!!

https://docs.google.com/spreadsheets/d/1OEiORJr_ObbMYGH5oOPc1ThIe-jN66Nko4DH5_1fQas/edit#gid=0

# 7-) Genes we need to look at in the sloth

I mentioned the oxidative phosphorylation genes, but we should have a look at all of the ones below. We just need to find the KO for each, as described above:

```
List of the 10 glycolysis genes that I assume would be conserved in sloths:
Hexokinase, Glucose-6-phosphate isomerase (also known as phosphohexose isomerase),
Phosphofructokinase-1, Fructose-bisphosphate aldolase (or just aldolase),
Triosephosphate isomerase, Glyceraldehyde-3-phosphate dehydrogenase,
Phosphoglycerate kinase, Phosphoglycerate mutase,
Phosphopyruvate hydratase (AKA enolase) and Pyruvate kinase.

List of the lactic acid fermentation gene that I assume would be conserved in sloths:
Lactate dehydrogenase.

List of citric acid cycle-related genes that I assume might be "relaxed" (i.e. altered) in sloths:
Mitochondrial pyruvate carrier 1, Mitochondrial pyruvate carrier 2,
pyruvate dehydrogenase complex, Citrate synthase, Aconitase,
Isocitrate dehydrogenase, α-Ketoglutarate dehydrogenase complex,
Succinyl-CoA synthetase, Succinate dehydrogenase, Fumarase and Malate dehydrogenase.

List of oxidative phosphorylation-related genes that I assume might be "relaxed" (i.e. altered) in sloths:
Important: most of the complexes below are formed of many protein subunits (i.e. many genes).
Some are nuclear, some are from the mitochondrial genome. For instance, of the 44 subunits
of the Respiratory Complex I (AKA NADH dehydrogenase), seven are encoded by the mitochondrial genome.
The list: Complex I (AKA NADH dehydrogenase), Complex II (AKA succinate dehydrogenase),
Ubiquinone (AKA Coenzyme Q), Complex III (AKA Q-cytochrome c oxidoreductase),
Complex IV (Cytochrome c oxidase) and Complex V (AKA F1F0 ATP synthase or simply ATP synthase).

I've found some papers dealing with biochemical, histological and physiological findings in sloths
that agree with the current findings, for instance, altered mitochondria shape and lack of cristae,
absence of white muscle fibers in sloths, alterations in pancreas and insulin and minor ATPase activity.
I'll take a proper look at these papers and I'll send another word document soon, with these physiological changes.
```
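
A note on the manual steps in sections 4 and 5 (editing the IDs by hand and adding `{test}` to the tree): below is a minimal sketch of how they could be scripted. It is not part of Marcela's scripts. The function names (`shorten_ids`, `leaf_names`, `label_test_branch`) and the `_1`, `_2` suffix scheme for repeated names are assumptions made for illustration, and the sketch assumes the tree is a single-line Newick string with unquoted names. Still check the IDs and the tree by eye before running relax.

```python
import re
from collections import Counter


def shorten_ids(ids):
    """Cut each ID at the first underscore (same idea as sed 's/_.*//').

    Prefixes that occur more than once (e.g. two "Dasnov3" sequences)
    get a numeric suffix (_1, _2, ...) so every shortened ID is unique.
    """
    prefixes = [seq_id.split("_", 1)[0] for seq_id in ids]
    totals = Counter(prefixes)
    seen = Counter()
    short = []
    for prefix in prefixes:
        if totals[prefix] > 1:
            seen[prefix] += 1
            short.append(f"{prefix}_{seen[prefix]}")
        else:
            short.append(prefix)
    return short


def leaf_names(newick):
    """Return the leaf (tip) names of a Newick string, in order.

    A leaf name is taken to be the text right after "(" or "," and
    before ":", "," or ")". Support values follow ")" and are skipped.
    """
    return re.findall(r"(?<=[(,])[^(),:;]+(?=[:,)])", newick)


def label_test_branch(newick, taxon, label="test"):
    """Append {label} to the single leaf called `taxon`."""
    pattern = re.compile(r"(?<=[(,])" + re.escape(taxon) + r"(?=[:,)])")
    labelled, hits = pattern.subn(
        lambda m: m.group(0) + "{" + label + "}", newick
    )
    if hits != 1:
        raise ValueError(f"expected exactly one leaf named {taxon!r}, found {hits}")
    return labelled


def ids_match(fasta_ids, newick):
    """True when the alignment IDs and the tree tips are the same set."""
    return sorted(fasta_ids) == sorted(leaf_names(newick))
```

For example, you could call `shorten_ids` on the fasta headers, rewrite the fasta with the new IDs, build the tree, confirm `ids_match` returns `True`, and then write the output of `label_test_branch(tree, "mChoDid1")` to the tree file you give to hyphy relax.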