# Sloth Paper 1
###### tags: `sloth`
- Table of Content
[ToC]
~~## 1. CAFE
~~
Link to the file with some CAFE5 results. I'll make a small dataset to test selection analysis first. Lines 29-37 bellow
https://docs.google.com/spreadsheets/d/1QYtrGpDhsTL9aRUT-3sCg3VyyvDJYzRv3D8lxVtpER4/edit#gid=0
~~## 1.1 Ortho species~~
https://docs.google.com/spreadsheets/d/1pOhjjqSJCwKo0ehonp8zEI-evi0o-Nv2GHhhOHWh5NI/edit#gid=0
~~## 1.1a Sloth orthology drafts~~
https://docs.google.com/document/d/1vHdluWB7Jr5K1V71UcPswP8GVkVjKFZ3OeihU5Hjw6w/edit
~~### 1.2 Sloth-only orthogroups~~
sloth only orthogroups and GO
```/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Mar25/Orthogroups/sloth_OG_orthos_perline_GO```
Gain and losses
In total, we identified a set of 30,382 orthogroups across the 11 mammal species in our sample. 18,787 orthogroups contained sloth genes. 3740 were Xenarthra specific - containing armadillo or sloth only genes. We identified 2235 sloth-specific orthogroups: for the 3625 protein-genes identified, only 1685 have GO terms assigned to them.
```/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Mar25/Orthogroups/```
cat sloth_OG_orthos_perline | awk '{print $2}' | sort -u | wc
3625 3624 86977
wc sloth_OG_orthos_perline_GO1
1685 1685 40440 sloth_OG_orthos_perline_GO1
~~### 1.2.1 Sloth-only orthogroups revigo~~

GO terms for the above: ```/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Mar25/Orthogroups/sloth_OG_orthos_perline_GO21 ```
Retinoic acid binding
To date, two enhancers of Tbx5 that are specific to the limb or fin have been identified. The first of these enhancers, which is conserved among mammals but is not found in other vertebrates, was identified within Tbx5 intron 2 and can restrict early Tbx5 expression to the prospective forelimb bud mesenchyme of mice32. The enhancer sequence contains binding sites for HOX proteins, which are direct upstream regulators of Tbx5. Through this enhancer, activating Hox signals (rostral Hox4 and Hox5 paralogous group genes) and repress ing Hox signals (caudal Hoxc8, Hoxc9 and Hoxc10) function to restrict Tbx5 expression to the appropriate rostrocaudal level32,33 (FIG. 2). The intron 2 enhancer also contains retinoic acid response elements and a T cell factor/lymphoid enhancer factor (TCF/LEF)binding site that are required for its activity16. A
## 2. Phylopypruner
For expanded gene families.
```
/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Mar25/Orthogroup_Sequences/orthosCAFE5-copy/oi2/phylopypruner_output
/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Mar25/Orthogroups/CAFE5/witherror_2gamma_2/expanded-pylopypruner/
```
grep "mOrnAna1" OG0000857.fa.out.trml.edited.trml_pruned.fa
>mOrnAna1@XP_028902934.1
- [ ] Look for expanded genes in Lemur?
https://bioinformatics.stackexchange.com/questions/13958/extract-sequences-from-partial-header/13959
### Selection analysis
Ok, to do selection I need the nucleotide sequences. For that, I wrote a code to get fasta sequences by partial IDs.
https://github.com/marcelauliano/phython_scripts/blob/master/get_Fasta_byPartial_id.py
```(pal2nal) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Mar25/Orthogroups/CAFE5/witherror_2gamma_2/expanded-pylopypruner/phylopypruner_output/output_alignments```
### Protein domains
Will run it to have a look at expanded domains
## Literature
### Review on Limb development: inside google drive sloth-paper/literature
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4472170/
https://www.cam.ac.uk/research/news/novel-genetic-mutations-cause-low-metabolic-rate-and-obesity
## 3. CAFE again with more species
Steps on the older run: https://docs.google.com/document/d/1vHdluWB7Jr5K1V71UcPswP8GVkVjKFZ3OeihU5Hjw6w/edit
extra 4 species
https://docs.google.com/spreadsheets/d/1pOhjjqSJCwKo0ehonp8zEI-evi0o-Nv2GHhhOHWh5NI/edit#gid=1528816751
```. 26403 patterns found (out of a total of 89475 sites).
. 53791 sites without polymorphism (60.12%).
. Computing pairwise distances...
. Building BioNJ tree...
. WARNING: this analysis requires at least 789 MB of memory space.
. Do you really want to proceed? [Y/n] y
. Refining the tree...
```
250 single copy orthologs:

```
python /software/team311/mu2/CAFE5/CAFE/tutorial/prep_r8s.py -i SuperMatrix.phylip_phyml_tree -o SuperMatrix2.ultrametric.phylip_phyml_tree -s 53791 -p 'mChoDid1,Dasnov' -c 68
/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Mar25/Single_Copy_Orthologue_Sequences$
/software/team311/mu2/r8s1.81/src/r8s -b -f SuperMatrix2.ultrametric.phylip_phyml_tree > r8s_tmp.txt
tail -n 1 r8s_tmp.txt | cut -c 16- > Sept2021mammals_tree2.txt ```
```

```mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups$ python small_orthogroups.py Orthogroups.txt 1.xenathra 2.xenathra 3.xenarthra
```
| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
| 51775 | total orthogroups | | |
| 1988 | orthogroups with sloths only |2.sloth.1 |
| 18465 | orthogroups containing sloths | 1.sloth.1 |
| 3256 | xenarthra specific orthogroups | 2.xena.1 |
| 19403 | orthogroups containing xenarthrans | 1.xena.1 |
| 22659 | Orthogroups to be analyzed : 2.sloth.1 + 1.sloth.1 + 2.xena.1 + 1.xena.1 | allselecte.orth |
~~## 3.1 CAFE~~
http://tol-1-10-2:41715/lab
```
import pandas as pd
my_names = ["Orthogroup", "Bostaurus", "Canislupus", "Dasnov", "Mmusculus", "Human", "Lafricana", "Mmur", "MonDom", "koala", "platypus", "sloth", "echinops", "giantPanda", "bonobo", "bat", "total"]
my_types = {"Orthogroup":"string", "Bostaurus":"Int64", "Canislupus":"Int64", "Dasnov":"Int64", "Mmusculus":"Int64", "Human":"Int64", "Lafricana":"Int64", "Mmur":"Int64", "MonDom":"Int64", "koala":"Int64", "platypus":"Int64", "sloth":"Int64", "echinops":"Int64", "giantPanda":"Int64", "bonobo":"Int64", "bat":"Int64", "total":"Int64"}
df = pd.read_csv("/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups/Orthogroups.GeneCount.tsv", names=my_names, sep="\t", dtype=my_types, skiprows=1)
df
n=["Orthogroup"]
m_type={"Orthogroup":"string"}
c=pd.read_csv("/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups/allselecte.orthedi", names=n , dtype=m_type)
result = pd.merge(df, c, on='Orthogroup')
result['Desc'] = "(null)"
result
final = result[["Desc", "Orthogroup", "Bostaurus", "Canislupus","Dasnov", "Mmusculus", "Human", "Lafricana", "Mmur", "MonDom", "koala", "platypus", "sloth", "echinops", "giantPanda", "bonobo", "bat"]]
final.to_csv("/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups/cafe/cafe.txt", index=False, sep="\t")
```
```
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups/cafe$ /software/team311/mu2/CAFE5/CAFE/bin/cafexp -i cafe.txt -t Sept2021mammals_tree2.txt -e
```
~~## 3.1.1 CAFE k3 (finding significant orthogroups)~~
```/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups/cafe/k3/witherror_3gamma ```
```
echo $'#nexus\nbegin trees;'>Significant_trees.tre
grep "*" Gamma_asr.tre >>Significant_trees.tre
echo "end;">>Significant_trees.tre
```
```grep -F "mChoDid1<13>*_" Significant_trees.tre > mChoDid1.Significant_trees.tre
sed 's/=.*//g' mChoDid1.Significant_trees.tre > mChoDid1.Significant_trees.tre1
sed 's/.*TREE //g' mChoDid1.Significant_trees.tre1 > mChoDid1.Significant_trees.tre.f
```
```
(goatools) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups/cafe/k3/witherror_3gamma/GO-analysis$ python /software/team311/mu2/miniconda3/envs/goatools/bin/find_enrichment.py expanded-mchodid1.id mchodid1_GCF_015220235.1_proteinsID job_MM_lk5mhhkb_annotations.partial.noko.tsvONLYGO
go-basic.obo: fmt(1.2) rel(2021-02-01) 47,291 GO Terms
HMS:0:00:00.671656 41,184 annotations READ: job_MM_lk5mhhkb_annotations.partial.noko.tsvONLYGO
Study: 6280 vs. Population 54527
0 of 0 results have uncorrected P-values <= 0.05=pval
# Generated by GOATOOLS vv1.0.15 (2021-10-02)
# min_ratio=None pval=0.05
```
Plotting GO terms
https://gitlab.com/evogenlab/GO-Figure/-/wikis/Example-plots
http://wencke.github.io
https://wego.genomics.cn

Contracted gene families
```
/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups/cafe/k3/witherror_3gamma/ortho_Sig_contracted.genes_function_mChoDid1_13.txt
```
### 3.2 CAFE k3 (finding significant orthogroups for node 16)
```
(goatools) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder_names/orthofinder-with-mary/OrthoFinder/Results_Mar09/WorkingDirectory/OrthoFinder/Results_Sep23/Orthogroups/cafe/k3/witherror_3gamma$ grep -F "<16>*_" Significant_trees.tre > 16.Significant_trees.tre
sed 's/=.*//g' 16.Significant_trees.tre > 16.Significant_trees.tre1
sed 's/.*TREE //g' 16.Significant_trees.tre1 > 16.Significant_trees.tre.f
```
~~### 3.3. Expanded in mchodid1~~
```
OG0000852 +4 mChoDid1_XP_037676968.1 dystrophin isoform X1 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676969.1 dystrophin isoform X2 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676970.1 dystrophin isoform X3 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676971.1 dystrophin isoform X4 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676972.1 dystrophin isoform X5 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676973.1 dystrophin isoform X6 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676975.1 dystrophin isoform X7 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676976.1 dystrophin isoform X1 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676977.1 dystrophin isoform X1 [Choloepus didactylus]
OG0000852 +4 mChoDid1_XP_037676978.1 dystrophin isoform X8 [Choloepus didactylus]
```
https://www.parentprojectmd.org/about-duchenne/what-is-duchenne/types-of-mutations/
https://www.biostars.org/p/388936/
http://doua.prabi.fr/software/isosel
Lupiáñez et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell (2015)
. For instance, a genomic inversion overlapping the boundary of the Epha4 gene containing TAD can result in several human limb malformations due to new interactions between Epha4 enhancers and the Wnt6 gene
EMBO Lab Management
## 4 LONGEST ISOFORM
08/10/2021
Ok, so noticed I did it all wrong so far as I had many isoforms in my analysis. Worked to get only the longest now using ncbi cds and cds_translate and wrote a number of scripts to deal with them. The scripts are called getting_longestIso_ncbi*.sh at /software/team311/mu2/
Orthofinder is now running again here:
```
/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest
```
Running phyml
. 1230085 patterns found (out of a total of 5304363 sites).
. 2949523 sites without polymorphism (55.61%).
ThemouseUlnalessmutationderegulatesposteriorHoxD
gene expression and alters appendicular patterning. Development 124:3481–92
```
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/Single_Copy_Orthologue_Sequences$ python /software/team311/mu2/CAFE5/CAFE/tutorial/prep_r8s.py -i SuperMatrix3.phylip_phyml_tree -o r8s_ctl_file.txt -s 5304363 -p 'mChoDid1,Dasnov3' -c '68'
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/Single_Copy_Orthologue_Sequences$ /software/team311/mu2/r8s1.81/src/r8s -b -f r8s_ctl_file.txt > r8s_tmp.txt
tail -n 1 r8s_tmp.txt | cut -c 16- > ultrametric_tree.txt
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/Orthogroups/cafe$ echo ' /software/team311/mu2/CAFE5/CAFE/bin/cafexp -i cafe.txt -t ultrametric_tree.txt -e ' | bsub -n4 -q long -R"span[hosts=1] select[mem>55000] rusage[mem=55000]" -M55000 -P team311 -o error.%J -e er.error.%J
Job <174498> is submitted to queue <long>.
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/Orthogroups/cafe$ echo '/software/team311/mu2/CAFE5/CAFE/bin/cafexp -t ultrametric_tree.txt -i cafe.txt -o witherror_3gamma -p -k 3 -eBase_error_model.txt ' | bsub -n 4 -q basement -R"span[hosts=1] select[mem>2000] rusage[mem=2000]" -M2000 -P team311 -o cafe.%J -e er.cafe.%J
Job <175469> is submitted to queue <basement>.
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/Orthogroups/cafe$ echo '/software/team311/mu2/CAFE5/CAFE/bin/cafexp -t ultrametric_tree.txt -i cafe.txt -o witherror_2gamma -p -k 2 -eBase_error_model.txt ' | bsub -n 4 -q basement -R"span[hosts=1] select[mem>2000] rusage[mem=2000]" -M2000 -P team311 -o cafe.%J -e er.cafe.%J
Job <175470> is submitted to queue <basement>.
```
### 4.1. witherror_2gamma
Localy files are at:
```
/Users/mu2/Sanger/mChoDid1/cafe-longISO/with_gammak2
```
```
(goatools) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/Orthogroups/cafe/witherror_2gamma/GO_expanded$ python /software/team311/mu2/miniconda3/envs/goatools/bin/find_enrichment.py ortho_Sigexpanded_genes_function_mChoDid1_153 mchodid1.longestISO-go21 mchodid1.longestISO-go2
go-basic.obo: fmt(1.2) rel(2021-02-01) 47,291 GO Terms
HMS:0:00:00.240366 19,382 annotations READ: mchodid1.longestISO-go2
Study: 326 vs. Population 22969
0 of 0 results have uncorrected P-values <= 0.05=pval
# Generated by GOATOOLS vv1.0.15 (2021-10-18)
# min_ratio=None pval=0.05
```
### 4.2 only limited number of genomes
Orthofinder run:
```
/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/Single_Copy_Orthologue_Sequences
```
```
(CAFE) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/Single_Copy_Orthologue_Sequences$ python /software/team311/mu2/CAFE5/CAFE/tutorial/prep_r8s.py -i SuperMatrix.phylip_phyml_tree -o r8s_ctl_file.txt -s 5822109 -p 'mChoDid1,Dasnov3' -c '68'
Running cafetutorial_clade_and_size_filter.py as a standalone...
(CAFE) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/Single_Copy_Orthologue_Sequences$ /software/team311/mu2/r8s1.81/src/r8s -b -f r8s_ctl_file.txt > r8s_tmp.txt
(CAFE) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/Single_Copy_Orthologue_Sequences$ tail -n 1 r8s_tmp.txt | cut -c 16- > ultrametric_tree.txt
CAFE5/CAFE/bin/cafexp -t ultrametric_tree.txt -i cafe.txt -o Less_2gamma -p -k 2 -eBase_error_model.txt ' | bsub -n 4 -q basement -R"span[hosts=1] select[mem>2000] rusage[mem=2000]" -M2000 -P team311 -o cafe.%J -e er.cafe.%J
Job <181046> is submitted to queue <basement>.
19/10/2021
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe$ echo ' /software/team311/mu2/CAFE5/CAFE/bin/cafexp -i cafe2.txt -t ultrametric_tree.txt -e -o cafe2-error' | bsub -n4 -q long -R"span[hosts=1] select[mem>55000] rusage[mem=55000]" -M55000 -P team311 -o error.%J -e er.error.%J
Job <181566> is submitted to queue <long>.
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe$ echo '/software/team311/mu2/CAFE5/CAFE/bin/cafexp -t ultrametric_tree.txt -i cafe2.txt -o cafe2witherror_3gamma -p -k 3 -eBase_error_model2.txt ' | bsub -n 4 -q basement -R"span[hosts=1] select[mem>2000] rusage[mem=2000]" -M2000 -P team311 -o cafe2.%J -e er.cafe2.%J
Job <181635> is submitted to queue <basement>.
(CAFE) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe$ echo '/software/team311/mu2/CAFE5/CAFE/bin/cafexp -t ultrametric_tree.txt -i cafe2.txt -o cafe2witherror_2gamma -p -k 2 -eBase_error_model2.txt ' | bsub -n 4 -q basement -R"span[hosts=1] select[mem>2000] rusage[mem=2000]" -M2000 -P team311 -o cafe2.%J -e er.cafe2.%J
Job <181637> is submitted to queue <basement>.
(CAFE) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe$ echo '/software/team311/mu2/CAFE5/CAFE/bin/cafexp -t ultrametric_tree.txt -i cafe.txt -o cafewitherror_Nop_k3 -eBase_error_model.txt -k 3' | bsub -n 4 -q basement -R"span[hosts=1] select[mem>2000] rusage[mem=2000]" -M2000 -P team311 -o cafe.%J -e er.cafe.%J
Job <185706> is submitted to queue <basement>.
(CAFE) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe$ echo '/software/team311/mu2/CAFE5/CAFE/bin/cafexp -t ultrametric_tree.txt -i cafe.txt -o cafewitherror_Nop_k4 -eBase_error_model.txt -k 4' | bsub -n 4 -q basement -R"span[hosts=1] select[mem>2000] rusage[mem=2000]" -M2000 -P team311 -o cafe.%J -e er.cafe.%J
Job <185708> is submitted to queue <basement>.
(CAFE) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe$ echo '/software/team311/mu2/CAFE5/CAFE/bin/cafexp -t ultrametric_tree.txt -i cafe.txt -o cafewitherror_Nop_k5 -eBase_error_model.txt -k 5' | bsub -n 4 -q basement -R"span[hosts=1] select[mem>2000] rusage[mem=2000]" -M2000 -P team311 -o cafe.%J -e er.cafe.%J
Job <185709> is submitted to queue <basement>.
```
```
(goatools) mu2@tol-head2:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe/cafewitherror_Nop_k3/GO$ python /software/team311/mu2/miniconda3/envs/goatools/bin/find_enrichment.py ortho_Sigexpanded_genes_function_mChoDid1_5.names mchodid1.longestISO-go2_1 mchodid1.longestISO-go2
go-basic.obo: fmt(1.2) rel(2021-02-01) 47,291 GO Terms
HMS:0:00:00.267350 19,382 annotations READ: mchodid1.longestISO-go2
Study: 548 vs. Population 22969
0 of 0 results have uncorrected P-values <= 0.05=pval
# Generated by GOATOOLS vv1.0.15 (2021-10-24)
# min_ratio=None pval=0.05
```
https://en.wikipedia.org/wiki/Aldo-keto_reductase_family_1,_member_A1
Less species run:
```
/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe/
```

# 5. Less species (longest ISO)
## 5.1 with gamma k=4 noP
```
goatools) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe/cafewitherror_Nop_k4/GO$ python /software/team311/mu2/miniconda3/envs/goatools/bin/find_enrichment.py ortho_Sigexpanded_genes_function_mChoDid1_5.names mchodid1.longestISO-go2_1 mchodid1.longestISO-go2
go-basic.obo: fmt(1.2) rel(2021-02-01) 47,291 GO Terms
HMS:0:00:00.261859 19,382 annotations READ: mchodid1.longestISO-go2
Study: 602 vs. Population 22969
0 of 0 results have uncorrected P-values <= 0.05=pval
# Generated by GOATOOLS vv1.0.15 (2021-11-06)
# min_ratio=None pval=0.05
```
No GO enriched for expanded.
```
(goatools) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe/cafewitherror_Nop_k4/GO$ python /software/team311/mu2/miniconda3/envs/goatools/bin/find_enrichment.py ortho_SigContracted_genes_function_mChoDid1_5.names mchodid1.longestISO-go2_1 mchodid1.longestISO-go2
go-basic.obo: fmt(1.2) rel(2021-02-01) 47,291 GO Terms
HMS:0:00:00.250526 19,382 annotations READ: mchodid1.longestISO-go2
Study: 45 vs. Population 22969
0 of 0 results have uncorrected P-values <= 0.05=pval
# Generated by GOATOOLS vv1.0.15 (2021-11-06)
# min_ratio=None pval=0.05
```
No GO enriched for contracted.
## 5.2 base model (longest iso)
```
(goatools) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe/Less_errorandbase/GO$ python /software/team311/mu2/miniconda3/envs/goatools/bin/find_enrichment.py ortho_Sigexpanded_genes_function_mChoDid1_5.names mchodid1.longestISO-go2_1 mchodid1.longestISO-go2
go-basic.obo: fmt(1.2) rel(2021-02-01) 47,291 GO Terms
HMS:0:00:01.750468 19,382 annotations READ: mchodid1.longestISO-go2
Study: 898 vs. Population 22969
0 of 0 results have uncorrected P-values <= 0.05=pval
# Generated by GOATOOLS vv1.0.15 (2021-11-06)
# min_ratio=None pval=0.05
```
http://localhost:8888/notebooks/longestISO-witherrorgamak2.ipynb
```
(goatools) mu2@tol-head1:/lustre/scratch116/vr/projects/vgp/user/mu2/mChoDid1-paper/orthofinder-Longest/OrthoFinder/Results_Oct08/WorkingDirectory/OrthoFinder/Results_Oct14/cafe/Less_errorandbase/GO$ python /software/team311/mu2/miniconda3/envs/goatools/bin/find_enrichment.py ortho_SiC_genes_function_mChoDid1_5.names mchodid1.longestISO-go2_1 mchodid1.longestISO-go2
go-basic.obo: fmt(1.2) rel(2021-02-01) 47,291 GO Terms
HMS:0:00:00.448642 19,382 annotations READ: mchodid1.longestISO-go2
Study: 56 vs. Population 22969
0 of 0 results have uncorrected P-values <= 0.05=pval
# Generated by GOATOOLS vv1.0.15 (2021-11-06)
# min_ratio=None pval=0.05
```
https://docs.google.com/presentation/d/1AIsquf0cknDEwEYJL439-znIrOdwuLaD_nJk7HSrHAc/edit#slide=id.p
http://localhost:8888/notebooks/Google%20Drive/My%20Drive/mChodDid1/witherror-k3-nop-longest.ipynb
http://localhost:8888/notebooks/longestISO-witherrorgamak2.ipynb
https://docs.google.com/spreadsheets/d/1QYtrGpDhsTL9aRUT-3sCg3VyyvDJYzRv3D8lxVtpER4/edit#gid=1264905726
# 6. Metabolic genes
## OG0000084
```
/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes
```
```
(mafft_trimal) mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes$ mafft --globalpair --maxiterate 100 --inputorder OG0000084.aa.fa > translation_alignment.fasta
(mafft_trimal) mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes$ mafft --globalpair --maxiterate 100 --inputorder OG0000084.CDS.fa > nucleotide_alignment.fasta
```
```
(base) mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes$ pal2nal.pl translation_alignment.2fasta nucleotide_alignment.2fasta -output fasta -codontable 1 > codon_alignment.fasta
```
### 6.1 edits for pal2nal
cd /software/team311/mu2/miniconda3/bin/
source activate base
all cds sequences:
/lustre/scratch123/tol/teams/tola/users/mu2/from_scratch116/mChoDid1-paper/CDS/all1.lesspecies-CDS.fa
#### 6.1.1 Get exact same fasta header
```
cat translation_alignment.fasta | sed 's+|.*XP_+_+' | sed 's+|.*NP_+_+' > translation_alignment2.fasta
cat nucleotide_alignment.fasta | sed 's+|.*XP_+_+' | sed 's+|.*NP_+_+' > translation_alignment2.fasta
pal2nal.pl translation_alignment.2fasta nucleotide_alignment.2fasta -output fasta -codontable 1 > codon_alignment.fasta
```
#### 6.2- again
Working on the folder bellow
```
/Users/mu2/Google Drive/My Drive/sloth_paper/metabolic_genes/dev/again
```
```
python get_protID_by_name.py -l ../list -i ../mChoDid1.prot.tab.ids
ls -ltrh *.id.csv > ls
cat ls | sed 's/.*ID.//g' | sed 's/.id.csv//g' | sed 's/^/mkdir /g' > mkdir.sh
sh mkdir.sh
(base) mib112024i:NADH_dehydrogenase mu2$ python ../../get_ort.py -ort ../../Orthogroups.txt -id NADH_dehydrogenase.csv.id
```
#### 6.3. Hyphy codon-msa
https://github.com/veg/hyphy-analyses/tree/master/codon-msa
```
hyphy pre-msa.bf --input /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/small_test_again/OG0000084.CDS1.fa
(hyphy) mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/small_test_again$ muscle -align OG0000084.CDS1.fa_protein.fas -output aln.afa
(hyphy) mu2@tol-head1:/software/team311/mu2/hyphy-analyses/codon-msa$ hyphy post-msa.bf --protein-msa /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/small_test_again/aln.afa --nucleotide-sequences /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/small_test_again/OG0000084.CDS1.fa_nuc.fas --output /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/small_test_again/example.msa
(hyphy) mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/small_test_again$ pal2nal.pl example.msa aln.afa -output fasta -codontable 1
ERROR: number of input seqs differ (aa: 70; nuc: 71)!!
```
#### 6.4 Get gene, orthogrop and sequence - step by step
Names in "list" must be all low caption
```
(base) mib112024i:26-07 mu2$ python get_protID_by_name.py -l list -i ../mChoDid1.prot.tab.ids
```
This gave me glucose-6-phosphate_isomerase.id.csv, for example
Then I go and find the orthogroup
```
python ../get_ort.py -id glucose-6-phosphate_isomerase.id.csv -ort ../Orthogroups.txt
```
Now I have this XP_037675410.1.csv wchi is like
```
OG0003182 Dasnov3_lcl|NW_004465803.1_prot_XP_023442251.1_3924 GRCh38_lcl|NC_000019.10_prot_NP_001276718.1_101672 GRCm39_lcl|NC_000073.7_prot_NP_032181.2_36770 Loxafr3_lcl|NW_003573463.1_prot_XP_023411392.1_26581 Mhudiblu_lcl|NC_048258.1_prot_XP_034800742.1_59740 Mmur_lcl|NC_033681.1_prot_XP_012630577.1_51021 MonDom5_lcl|NC_008801.1_prot_XP_001365167.3_6431 bostaurus_lcl|NC_037345.1_prot_NP_001035561.1_41217 cfam_lcl|NC_051805.1_prot_XP_038513467.1_4105 cfam_lcl|NC_051808.1_prot_XP_038518325.1_10412 koala_lcl|NW_018343971.1_prot_XP_020840672.1_8528 mChoDid1_lcl|NC_051333.1_prot_XP_037675410.1_52482 mOrnAna1_lcl|NC_041738.1_prot_XP_028931237.1_22321
```
Then I need to get all the partial IDS so I can get the CDSs
```
(base) mib112024i:OG0003182 mu2$ cat orth | sed 's/.*XP_//g' | sed 's/.*NP_//g' > orth1
```
This is the gene tree. Dog has two sequences

```
(base) mib112024i:OG0003182 mu2$ python ../../get_Fasta_byPartial_id.py -l orth1.1 -f all1.lesspecies-CDS.fa -o OG0003182.cds.fa
```
Using codon-msa
https://github.com/veg/hyphy-analyses/blob/master/codon-msa/README.md
(hyphy) mu2@tol-head2:/software/team311/mu2/hyphy-analyses/codon-msa$ hyphy pre-msa.bf --input /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/OG0003182/OG0003182.cds.fa
```
(hyphy) mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/OG0003182$ mafft --globalpair --maxiterate 100 --inputorder OG0003182.cds.fa_protein.fas > OG0003182.cds.fa_protein.mafft.fas
```
```
(hyphy) mu2@tol-head2:/software/team311/mu2/hyphy-analyses/codon-msa$ hyphy post-msa.bf --protein-msa /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/OG0003182/OG0003182.cds.fa_protein.mafft.fas --nucleotide-sequences /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/OG0003182/OG0003182.cds.fa_nuc.fas --output /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/OG0003182/OG0003182.cds.fa_nuc.fas.msa.txt
```
https://docs.google.com/spreadsheets/d/1G-xiB4RUvcqZ5S_ArvqOKxARRf7DJN7IYehlJWO-G14/edit#gid=0
https://docs.google.com/spreadsheets/d/19fr88zM3N9gfis_0pZgn-BiX3C9AlnCWQBQYthqqUUM/edit#gid=0
https://hackmd.io/UqPtu7l2RgSXRR7tHsQYgQ?both
https://hackmd.io/t0XiRkLWQWuuSNEnQLIaNQ?view
```
>bostaurus_lcl_NC_037345_1_cds_NP_001035561_1_41217_1
>cfam_lcl_NC_051808_1_cds_XP_038518325_1_10412_1
>Dasnov3_lcl_NW_004465803_1_cds_XP_023442251_1_3924_1
>GRCh38_lcl_NC_000019_10_cds_NP_001276718_1_101672_1
>GRCm39_lcl_NC_000073_7_cds_NP_032181_2_36770_1
>koala_lcl_NW_018343971_1_cds_XP_020840672_1_8528_1
>Loxafr3_lcl_NW_003573463_1_cds_XP_023411392_1_26581_1
>mChoDid1_lcl_NC_051333_1_cds_XP_037675410_1_52482_1
>Mhudiblu_lcl_NC_048258_1_cds_XP_034800742_1_59740_1
>Mmur_lcl_NC_033681_1_cds_XP_012630577_1_51021_1
>MonDom5_lcl_NC_008801_1_cds_XP_001365167_3_6431_1
```
| Column 1 | Column 2 |
| -------- | -------- |
| >cfam_lcl_NC_051805_1_cds_XP_038513467_1_4105_1 | cfam1 |
# 6.5 Oxdative phosporylation genes
Putting the logest isoform of CD predicted proteins on KEGG
https://www.kegg.jp/kegg-bin/blastkoala_result?id=a52aa739f2cd68914c4eeeb0862720f2a8a0323e&passwd=qkAuw0&type=ghostkoala


## 6.5.1
Working on Oxidative Phosporilation K00234
Took a orthogroup from a run with longedt isoforms but also including CH (not treated for longest iso)
Orthogroups for this run are here:
```
/Users/mu2/Google Drive/My Drive/sloth_paper/metabolic_genes/
```
Getting the ids:
```
(base) mib112024i:Results_Jul25 mu2$ cat OG0011943 | sed 's/.*XP_//g' | sed 's/.*NP_//g' > OG0011943.1
```
Alice Urzi results:
```
https://drive.google.com/drive/folders/1PomW4lNgwOmrt8d8_qqXaxrs4bxWM19C
```
Ok so unfortunately I haven't found the nucleotide counterpart of the CH augustus annotation =(
)
## 6.5.1.1 Oxidative Phosporilation K00234
```
(base) mib112024i:Oxidative_phosporilation mu2$ cat OG0012715 | sed 's/.*XP_//g' | sed 's/.*NP_//g' > OG0012715.1
```
```
(base) mib112024i:Oxidative_phosporilation mu2$ python ../get_Fasta_byPartial_id.py -l OG0012715.1 -f ../all1.lesspecies-CDS.fa -o OG0012715.1.cds.fa
```
Where is get_Fasta_byPartial_id.py
```
get_Fasta_byPartial_id.py -> /Users/mu2/Google Drive/My Drive/sloth_paper/metabolic_genes/dev/get_Fasta_byPartial_id.py
```
```
python ../get_Fasta_byPartial_id.py -l OG0012715.1a -f ../all1.lesspecies-CDS.fa -o OG0012715.1.cds.fa
```
```
(base) mib112024i:Oxidative_phosporilation mu2$ cat OG0012715.1.cds.fa | sed 's/NC.*//g' | sed 's/NW.*//g' > OG0012715.1.cds.n.fa
```
Send to farm to use codon-msa
```
(hyphy) mu2@tol-head2:/software/team311/mu2/hyphy-analyses/codon-msa$ hyphy pre-msa.bf --input /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation/OG0012715.1.cds.n.fa
```
```
(hyphy) mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation$ mafft --globalpair --maxiterate 100 --inputorder OG0012715.1.cds.n.fa_protein.fas > OG0012715.1_protein.mafft.fas
```
```
(hyphy) mu2@tol-head2:/software/team311/mu2/hyphy-analyses/codon-msa$ hyphy post-msa.bf --protein-msa /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation/OG0012715.1_protein.mafft.fas --nucleotide-sequences /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation/OG0012715.1.cds.n.fa_nuc.fas --output /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation/OG0012715.1.cds.n.fa_nuc.msa.txt
```
Download to convert aligment and
http://phylogeny.lirmm.fr/phylo_cgi/get_result.cgi
run phyml to get tree
```
(hyphy) mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation$ phyml -i OG0012715.1.cds.n.fa_nuc.msa.txt.phy
```
```
(hyphy) mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation$ sed 's/lcl__1//g' OG0012715.1.cds.n.fa_nuc.msa.txt > OG0012715.1.cds.n.fa_nuc.msa.1.txt
```
```
(hyphy) mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation$ hyphy relax --alignment OG0012715.1.cds.n.fa_nuc.msa.1.txt --tree OG0012715.1.cds.n.fa_nuc.msa.txt.phy_phyml_tree.1.txt --test test --output OG0012715-output.json
```
## 6.5.1.2 OG0004707
```
(hyphy) mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic_genes/Oxidative_phosporilation$ hyphy relax --alignment OG0004707.3.cds.fa_nuc.1.msa --tree OG0004707.tre --test test --output OG0004707-output.json
```
>No significant evidence for relaxation (or intensification) of selection among **test** branches _relative_ to the **reference** branches at P<=0.05
# 7. Mapping bradypus tridactylus to reference
Will do it in another hackmd: https://hackmd.io/53UHfEFKT2GsSvYrwYpM7A
# 8. Getting longest iso for more species
Where are the files:
```
/Users/mu2/Google Drive/My Drive/sloth_paper/more_orthologs/files/all.aa.more.longest.n1.fa
/Users/mu2/Google Drive/My Drive/sloth_paper/more_orthologs/files/all.more.cds.longest.n1.fa
/Users/mu2/Google Drive/My Drive/sloth_paper/more_orthologs/files/Orthogroups.txt
```
# 9. From new orthofinder.
Here are the files
```
/Users/mu2/Google Drive/My Drive/sloth_paper/more_orthologs/files
```
```
(hyphy) mu2@tol-head1:/software/team311/mu2/hyphy-analyses/codon-msa$ hyphy pre-msa.bf --input /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic-genes-orthofinder-MoreSpecies-jack/OG0004492.nucl1.fa
```
```
(hyphy) mu2@tol-head1:/software/team311/mu2/hyphy-analyses/codon-msa$ mafft --globalpair --maxiterate 100 --inputorder /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic-genes-orthofinder-MoreSpecies-jack/OG0004492.nucl1.fa_protein.fas > /lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic-genes-orthofinder-MoreSpecies-jack/OG0004492.nucl1.fa_protein.mafft.fas
```
# 10. More proteins from Jack, cafe
Orthofinder run:
```
/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences
```
```
less /software/team311/mu2/singleCopy_phylogeny.sh
```
```
mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences$ sh /software/team311/mu2/mafft.sh
```
```
(mafft_trimal) mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences$ sh trimal.sh
```
```
mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences$ python /software/team311/mu2/Utensils/geneStitcher.py -d _ -in *.trml
```
# 11. (base) mib112024i:sloth_hyphy mu2$ grep "K22138" user_ko.txt
mChoDid1_lcl|NC_051307.1_prot_XP_037670269.1_904 K22138
mChoDid1_lcl|NC_051330.1_prot_XP_037673754.1_49156 K22138
```
[filter] Processing sequence 16
WARNING: Sequence ASM31398v2_lcl_NW_022107431_1_cds_XP_004702142_2_17830 failed to align to any of the in-frame references. Try setting --E flag to a lower value
[filter] Processing sequence 20
WARNING: Sequence ARS_UCD1_lcl_NC_037336_1_cds_XP_024852616_1_23393 failed to align to any of the in-frame references. Try setting --E flag to a lower value
[Next steps] Please run **/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic-genes-orthofinder-MoreSpecies-jack/OG0000924.nuc.fa_protein.fas** through an MSA program, and then run post-msa.bf on the output and **/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/metabolic-genes-orthofinder-MoreSpecies-jack/OG0000924.nuc.fa_nuc.fas** to recover the nucleotide MSA
```
# 12. citrate syntese OG0015558
```
(base) mib112024i:sloth_hyphy mu2$ grep "K00614" user_ko.txt
mChoDid1_lcl|NC_051312.1_prot_XP_037695016.1_17873 K00614
```
OG0015558
Not significant
# 13. aconitase
```
(base) mib112024i:sloth_hyphy mu2$ grep "K01681" user_ko.txt
mChoDid1_lcl|NC_051314.1_prot_XP_037700944.1_23647 K01681
mChoDid1_lcl|NC_051316.1_prot_XP_037652820.1_29164 K01681
```
OG0008334 - No significant evidence for relaxation
OG0003636
# 14. Isocitrate dehydrogenase
```
(base) mib112024i:sloth_hyphy mu2$ grep "K00030" user_ko.txt
mChoDid1_lcl|NC_051310.1_prot_XP_037689200.1_13163 K00030 - isocitrate dehydrogenase (NAD(+)) 3 catalytic subunit alpha (IDH3A), transcript variant X1
mChoDid1_lcl|NC_051312.1_prot_XP_037694388.1_18829 K00030
mChoDid1_lcl|NC_051325.1_prot_XP_037667757.1_42863 K00030
mChoDid1_lcl|NC_051335.1_prot_XP_037678779.1_54491 K00030
(base) mib112024i:sloth_hyphy mu2$ grep "K00031" user_ko.txt
mChoDid1_lcl|NC_051310.1_prot_XP_037688976.1_12943 K00031
mChoDid1_lcl|NC_051315.1_prot_XP_037705915.1_27544 K00031
mChoDid1_lcl|NC_051322.1_prot_XP_037662285.1_37314 K00031
```
```
import pandas as pd
my_types = {'ids':'string'}
my_names=['ids']
my_types1 = {'s_ids':'string'}
my_names1=['s_ids']
df =pd.read_csv("/Users/mu2/Google Drive/My Drive/sloth_paper/more_orthologs/single_copy/allgene_trees.ids1", names=my_names,dtype=my_types)
df
single =pd.read_csv("/Users/mu2/Google Drive/My Drive/sloth_paper/more_orthologs/single_copy/single1.ids1", names=my_names1,dtype=my_types1)
single
#result.sort_values(['ids', 's_ids'], ascending=[True, True])
df['some_col']= single['s_ids']
df.sort_values(['ids', 'some_col'], ascending=[True, True])
df["Unique_ids"] = df["Ligand_miss"][~df["Ligand_miss"].isin(df["Ligand_hit"])].drop_duplicates()
```
# 15. CAFE all species
```
mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences$ python /software/team311/mu2/CAFE5/CAFE/tutorial/prep_r8s.py -i SuperMatrix.phylip_phyml_tree -o SuperMatrix3.phylip_phyml_tree -s 2629902 -p 'mChoDid1,Dasnov3' -c 68
mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences$ /software/team311/mu2/r8s1.81/src/r8s -b -f SuperMatrix3.phylip_phyml_tree > r8s_tmp.txt
mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences$ tail -n 1 r8s_tmp.txt | cut -c 16- > ultrametric-all-3.tree
mu2@tol-head1:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/cafe-allsites$ echo '/software/team311/mu2/CAFE5/CAFE/bin/cafexp -t ultrametric-all-3.tree -i allspecies-cafe.txt -p -e ' | bsub -n 2 -q long -R"span[hosts=1] select[mem>78000] rusage[mem=78000]" -M78000 -P team311 -o error.%J -e er.error.%J
Job <467370> is submitted to queue <long>.
```
# 16 GenEra
```/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/genEra/chlo.27675_phylostrata_assignation.tsv ```
## 16.2.2 GenEra with all proteins (including isoforms)
Gene count
```
22 NA_Absent from the DIAMOND/MMseqs2 results
45 NA_Possible contamination or HGT
33410 1_cellular organisms
6655 2_Eukaryota
1671 3_Opisthokonta
1962 4_Metazoa
3180 5_Eumetazoa
1669 6_Bilateria
203 7_Deuterostomia
233 8_Chordata
894 9_Vertebrata
1548 10_Gnathostomata
405 11_Euteleostomi
77 12_Sarcopterygii
22 13_Dipnotetrapodomorpha
759 14_Tetrapoda
197 15_Amniota
90 16_Mammalia
133 17_Theria
563 18_Eutheria
34 19_Xenarthra
755 20_Choloepus didactylus
```
```
Founder summary
```
12 NA_Possible contamination or HGT
3763 1_cellular organisms
1151 2_Eukaryota
249 3_Opisthokonta
241 4_Metazoa
548 5_Eumetazoa
294 6_Bilateria
40 7_Deuterostomia
38 8_Chordata
187 9_Vertebrata
402 10_Gnathostomata
112 11_Euteleostomi
20 12_Sarcopterygii
6 13_Dipnotetrapodomorpha
236 14_Tetrapoda
56 15_Amniota
39 16_Mammalia
48 17_Theria
225 18_Eutheria
2 19_Xenarthra
497 20_Choloepus didactylus
```
```
## 16.1. 27/01/23
Ok, I'm looking into the GenEra result from the longest isoforms
```mu2@tol-head2:/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/genEra/longest ```
```
12 NA_Absent from the DIAMOND/MMseqs2 results
35 NA_Possible contamination or HGT
14947 1_cellular organisms
2467 2_Eukaryota
619 3_Opisthokonta
706 4_Metazoa
1211 5_Eumetazoa
709 6_Bilateria
92 7_Deuterostomia
107 8_Chordata
383 9_Vertebrata
685 10_Gnathostomata
212 11_Euteleostomi
42 12_Sarcopterygii
17 13_Dipnotetrapodomorpha
403 14_Tetrapoda
92 15_Amniota
60 16_Mammalia
78 17_Theria
312 18_Eutheria
6 19_Xenarthra
531 20_Choloepus didactylus
```
### 16.1.1.

Figure: only genes that found orthology only with sloth. 'New genes'
### 16.1.2.
What are those new genes?
```/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/genEra/longest/chlo.longest.fa ```
q
- [ ] Get GO for choloepus genes and run revigo
- [ ] How many orthogroups these genes represent
- [ ] Are they multigenic families?
### 16.1.3 02/06/23
Ok, I want to include the ``` -s ``` parameter for a new genEra run (https://github.com/josuebarrera/GenEra#installation)
So I need to run this R script with a pylogenetic tree, but the tree needs to have taxIDs as names
```
(R) mu2@tol-head2:/software/team311/mu2/miniconda3/bin
> library(ape)
> tree<-read.tree(file="/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences/SuperMatrix.phylip_phyml_tree")
> distances<-cophenetic.phylo(x = tree)
> write.table(distances,"/lustre/scratch123/tol/teams/tola/users/mu2/mChoDid1/orthofinder-MoreSpecies-jack/proteins/OrthoFinder/Results_Aug27/WorkingDirectory/OrthoFinder/Results_Aug28/Single_Copy_Orthologue_Sequences/pairwise_distances.tsv", row.names = TRUE, sep = "\t")
```
# 17 Discussions with Camila
- [ ] Install and run Siry
- [ ] Olfactory receptors
https://academic.oup.com/gbe/article/14/9/evac125/6656105