# Fundamentals of Bioinformatics and Genomic Analysis
## Schedule
| Date | Topic |
| ---------- | ----- |
| 23/03/2025 | Course overview and basic concepts of bioinformatics |
| 31/03/2025 | Introduction to Linux |
| 07/04/2025 | Sequencing and omics |
| 14/04/2025 | Sequence alignment, AB1 files and primer design |
| 28/04/2025 | Genome annotation and assembly |
| 05/05/2025 | Genome evolution, analysis of mobile genetic elements |
| 12/05/2025 | Phylogenetics and interpretation of phylogenetic trees |
| 19/05/2025 | Genomic epidemiology, identification of markers |
| 26/05/2025 | Protein structure analysis and prediction |
| 02/06/2025 | Pangenome analysis |
| 09/06/2025 | Online tools for bioinformatics analysis |
| 16/06/2025 | Assessment I |
| 30/06/2025 | Assessment II |
| 07/07/2025 | Assessment III |
| 14/07/2025 | Assessment IV |
# Script repository
https://hackmd.io/VPKSiywKTES91DUDQW1dwg
# Bioinformatics software
[Prokka](https://github.com/tseemann/prokka)
[Roary](https://sanger-pathogens.github.io/Roary/)
[Panaroo](https://github.com/gtonkinhill/panaroo)
[Rast](https://rast.nmpdr.org/rast.cgi)
[Galaxy](https://usegalaxy.org/)
[Spades](https://github.com/ablab/spades)
# Week 1
[Lecture 1: Basic concepts of bioinformatics](https://teodianobastoslab.net/Bioinformatica/LivroBioinformatica.pdf)
# Week 2
Based on the course taught by [Prof. Dr. Alessandro de Mello Varani](https://www.fcav.unesp.br/#!/departamentos/tecnologia/docentes/alessandro-de-mello-varani/main-page/english/)
[Introduction to Linux](https://hackmd.io/@lamoroso92/minicurso25-linux)
[Class guide](https://uoguelphca-my.sharepoint.com/:b:/g/personal/lpizauro_uoguelph_ca/ESxv3qn6O1RAp8AIdToHI4wBRBqtJlZNN8nCseqTcQ4PjQ?e=v39NGT)
[VirtualBox](https://www.virtualbox.org/wiki/Downloads)
[Biolinux](https://uoguelphca-my.sharepoint.com/:u:/g/personal/lpizauro_uoguelph_ca/EevArqdRDc1IsmMVO7RZev4BuqT_5z860FCfkZT5w6JDcQ?e=ketJaX) - Password: bioinfo2025
Here are the essential commands to get started:
| Command | Function | Example |
| ------- | -------- | ------- |
| `ls` | Lists files and directories | `ls -la` (long listing, including hidden files) |
| `mkdir` | Creates a directory | `mkdir nova_pasta` |
| `cd` | Changes the current directory | `cd /home/usuario` |
| `pwd` | Prints the current directory | `pwd` (e.g. `/home/usuario`) |
| `cp` | Copies files or directories | `cp arquivo.txt copia.txt` |
| `mv` | Moves or renames files/directories | `mv arquivo.txt /tmp/novo.txt` |
| `rm` | Removes files or directories | `rm -r pasta` (removes recursively) |
| `touch` | Creates an empty file or updates its timestamp | `touch novo_arquivo.txt` |
| `cat` | Displays, concatenates or creates files | `cat arquivo.txt` (shows the contents) |
| `head` | Shows the first lines of a file | `head -n 5 arquivo.txt` (first 5 lines) |
| `less` | Views a file with scrolling | `less arquivo.txt` (scroll with the arrow keys, quit with `q`) |
| `tr` | Translates or transforms characters in text | `tr 'a-z' 'A-Z' < arquivo.txt` |
| `chmod` | Changes file/directory permissions | `chmod 755 arquivo.txt` |
| `chown` | Changes the owner of files/directories | `chown usuario:grupo arquivo.txt` |
| `find` | Searches for files/directories | `find / -name "arquivo.txt"` |
| `grep` | Searches for text in files | `grep "texto" arquivo.txt` |
| `df` | Shows disk usage | `df -h` (human-readable sizes) |
| `top` | Monitors processes in real time | `top` (active processes) |
| `sudo` | Runs a command as administrator | `sudo apt update` |
Create a directory called `bioinfo_pratica` and move into it:
```
mkdir bioinfo_pratica
cd bioinfo_pratica
```
Create a simple FASTA file with the `echo` command (`-e` makes `echo` interpret `\n` as a line break):
```
echo -e ">seq1\nATGCGTACG\n>seq2\nTTAGCCATG" > sequencias.fasta
```
#### Creating and merging .fasta files (`cat >>`)
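A minimal sketch of creating a second FASTA file and merging it with `cat`. It first recreates the two-sequence example file with `printf` (same content as the `echo -e` command above) so the snippet runs on its own; the file names `new_seqs.fasta` and `merged.fasta` and the sequence `seq3` are hypothetical examples.

```shell
# (Re)create the two-sequence example file from the previous step
printf '>seq1\nATGCGTACG\n>seq2\nTTAGCCATG\n' > sequencias.fasta

# Create a second FASTA file with a "here document" (cat > creates or OVERWRITES the file)
cat > new_seqs.fasta << 'EOF'
>seq3
GGCATTACG
EOF

# Merge both files into a new file, leaving the originals untouched
cat sequencias.fasta new_seqs.fasta > merged.fasta

# Append the new sequence to the end of sequencias.fasta
# (>> appends; a single > here would erase the existing sequences)
cat new_seqs.fasta >> sequencias.fasta

grep -c "^>" sequencias.fasta   # 3
```

The main pitfall is the difference between `>` and `>>`: redirecting with `>` to an existing FASTA file silently replaces its contents.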
View the file:
```
cat sequencias.fasta
```
Count the number of sequences (lines starting with `>`):
```
grep -c "^>" sequencias.fasta
```
Search for sequences containing "ATG" (`-B 1` also prints the line before each match, i.e. the header):
```
grep -B 1 "ATG" sequencias.fasta
```
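`grep` matches any line containing the pattern, including header lines, so a header that happens to contain "ATG" would also be reported. A minimal alternative sketch with `awk` searches only the sequence lines and prints the ID of each matching record; `motif_ids` is a hypothetical function name and, like the commands above, it assumes each sequence is on a single line.

```shell
# Print the ID of every record whose sequence contains a motif (default: ATG).
# Assumes each sequence is on a single line, as in sequencias.fasta.
motif_ids() {
    awk -v motif="${2:-ATG}" '
        /^>/ { id = substr($1, 2); next }   # remember the current ID (without ">")
        index($0, motif) { print id }       # literal substring search, no regex
    ' "$1"
}

# Usage: motif_ids sequencias.fasta ATG
```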
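Another way to inspect the file: print the ID of each header followed by the next line (the sequence). This only works when each sequence is on a single line: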
```
awk '/^>/{print $1; getline; print}' sequencias.fasta
```
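The `grep` and `awk` commands above assume one line per sequence, but FASTA files usually wrap long sequences (the Toxocara sequences later in this guide use 70 characters per line). A minimal sketch, using a hypothetical function `linearize_fasta` and a hypothetical demo file `multiline.fasta`, joins the wrapped lines so that each record becomes a header line plus a single sequence line:

```shell
# Join wrapped sequence lines: each record becomes exactly "header" + "full sequence"
linearize_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }   # flush previous sequence, print header
        { seq = seq $0 }                                           # accumulate sequence lines
        END { if (seq != "") print seq }                           # flush the last sequence
    ' "$@"
}

# Small demo file whose first sequence is wrapped over two lines
printf '>seqA\nATGCGT\nACGTAA\n>seqB\nTTAGCC\n' > multiline.fasta
linearize_fasta multiline.fasta
# >seqA
# ATGCGTACGTAA
# >seqB
# TTAGCC
```

Redirecting the output (e.g. `linearize_fasta multiline.fasta > linear.fasta`) gives a file on which the single-line commands above work as expected.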
Update the package lists and install BLAST+ (package `ncbi-blast+`):
```
sudo apt update
sudo apt install ncbi-blast+
```
Check the installation:
```
blastn -version
```
#### Using BLAST
Create a BLAST database from the file `sequencias.fasta`:
```
makeblastdb -in sequencias.fasta -dbtype nucl
```
Run a simple search (here `sequencias.fasta` is searched against itself):
```bash
blastn -query sequencias.fasta -db sequencias.fasta -out resultado.txt
```
View the result:
```
cat resultado.txt
```
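#### Automating tasks with scripts
Teach the students how to write a simple script to automate tasks. Open a new file called `contar_sequencias.sh` with `nano contar_sequencias.sh`, type the content below, then save and exit (Ctrl+O, Enter, Ctrl+X):
```bash
#!/bin/bash
# Counts the FASTA records (lines starting with ">") in the file given as the first argument
echo "Counting sequences in file $1"
grep -c "^>" "$1"
```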
Make the script executable:
```
chmod +x contar_sequencias.sh
```
Run the script:
```
./contar_sequencias.sh sequencias.fasta
```
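The script above handles only one file and does not check its argument. A slightly more robust sketch (hypothetical name `count_fasta.sh`) accepts several files, checks that each one exists, and prints one `file<TAB>count` line per file:

```shell
# Write the hypothetical script count_fasta.sh
cat > count_fasta.sh << 'EOF'
#!/bin/bash
# Counts FASTA records (lines starting with ">") in each file given as an argument
if [ "$#" -eq 0 ]; then
    echo "Usage: $0 file1.fasta [file2.fasta ...]" >&2
    exit 1
fi
status=0
for f in "$@"; do
    if [ ! -f "$f" ]; then
        echo "File not found: $f" >&2
        status=1
        continue
    fi
    # Quoting "$f" keeps file names with spaces intact
    printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
done
exit "$status"
EOF
chmod +x count_fasta.sh

# Usage: ./count_fasta.sh sequencias.fasta other.fasta
```

Returning a non-zero exit status when a file is missing lets the script be used safely inside larger pipelines or other scripts.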
Installing UGENE and MEGA on Linux (Ubuntu and derivatives)
#### Ugene
Download the installer: go to ugene.net/download and download the file `ugeneInstaller_64bit.tar` (for 64-bit systems).
Extract the archive: in the terminal, go to the download directory (e.g. `~/Downloads`):
```
cd ~/Downloads
tar -xf ugeneInstaller_64bit.tar
```
Make the installer executable:
```
chmod +x ugeneInstaller_64bit
```
(This step uses `chmod`, from the command table above, to change the file permissions.)
Run the installer:
```
./ugeneInstaller_64bit
```
Follow the graphical installation wizard.
Start UGENE: after installation, launch it from the applications menu or from the terminal with `ugene`.
#### Installing MEGA
Go to megasoftware.net/downloads and select the Ubuntu version (e.g. `MEGA_12.0.9_Ubuntu_64bit.deb` for MEGA 12, the most recent version when this guide was written).
Alternatively, download it directly from the terminal, which is the usual route on servers (replace the link with the correct one for the version you want):
```
cd ~/Downloads
wget https://www.megasoftware.net/releases/MEGA_12.0.9_Ubuntu_64bit.deb
```
Check the download: confirm that the file is in the directory:
```
ls
```
Install the MEGA package from the `.deb` file:
```
sudo apt install ./MEGA_12.0.9_Ubuntu_64bit.deb
```
If any dependencies are missing, fix them with:
```
sudo apt install -f
```
# AB1 files
[16S rRNA sequence](https://drive.google.com/file/d/16cSnb5l0bJKXTirITJUa7RmP_u8ZBdgm/view?usp=sharing)
[Galaxy](https://usegalaxy.org/)
[Rast server](https://rast.nmpdr.org/rast.cgi)
# Hands-on phylogenetics
The sequences below (*Toxocara* and related species) are to be used for the alignment and for building the phylogenetic trees.

---
<details>
<summary>18S rRNA sequences in FASTA format (14 sequences; the first is MF072699.1: Acanthocheilus rotundatus isolate Ar2 18S ribosomal RNA gene, partial sequence)</summary>

```fasta
>MF072699.1
ACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCG
GGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGC
GGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCT
CTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCC
GATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGA
CAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCA
CGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATC
CTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATC
AGCTCACGTTG
>JN256979.1
ACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCG
GGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGC
GGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCT
CTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCC
GATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGA
CAAGCGGTATTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCA
CGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATC
CTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATC
AGCTCACGTTG
>EF180059.1
CCCGATTGATTCTGTCGGCGGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGG
CCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTATATACTTGATTTTGATGTCCTACGTG
GATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCCGATTTTCTGACGAGCGCATCTATTAG
ATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAATAACTGTAGCTGATCGCATGGTCCAG
AACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTT
GTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGG
AAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTC
TCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCT
GGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTA
GTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGG
TTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCT
TCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGT
TTTCTGATCTGAGATAATGGTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATT
CTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAA
GTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTC
GGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTT
GCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACT
CAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGG
TGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGA
CTCTAGCCTACTAAATAGTCATCGGATAAACAAGTGCGGAAGACTTCTTAGAGGGACAAGCGGTGTTCAG
CCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTG
GAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGA
TCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATT
ACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACT
GCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGG
GCAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAA
>U94382.1
GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTTTAAAGGTGAAACCGC
GAACGGCTCATTATAACAGCTATTATATACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTA
GAGCTAATACATGCACCAAAGCTCCGATTTTGTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT
CGGCCCGTAAATTGGTGACTCTGAATAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATC
AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT
AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA
TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG
GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA
ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG
ACTTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC
ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG
AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG
GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG
TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA
TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA
CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG
AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC
TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC
GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGT
CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAAT
AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA
ACCATTGCCGAAAGGTATTGGTAACCCGTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT
CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC
ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCG
AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT
TTCC
>EF180078.1
GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTTTAAAGGTGAAACCGC
GAACGGCTCATTATAACAGCTATTATATACTTGATGTTGATGTGTTACGTGGATAACTGTGGTAATTCTA
GAGCTAATACATGCACCAAAGCTCCGATTTTTTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT
CGGCCCGTAAATTGGTGACTCTGAATAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATC
AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT
AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA
TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG
GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA
ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG
ACTTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTGTGCTGGTTTTCCCTACGTTGCCTTC
ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG
AATAGTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG
GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG
TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA
TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA
CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG
AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC
TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC
GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGT
CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAAT
AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA
ACCATTACCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT
GCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCYCACGTTGATTACGTCCCTGCCCTTTGTAC
ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGAGACTGCTGTTTCGAGACCTTCCG
AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT
TTCC
>AF036608.1
GCCATGCATGTCTAAGTCAAATGGCCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTATA
TACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCCGA
TTTTGTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAATA
ACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGATG
GTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGCCT
GAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGTAG
TGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTAAC
GAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCGTC
ATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAACT
GGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGAGT
TTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAATA
GGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATGGTTAAGAGGGACGGACGGGGGCATTCG
TATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAAGAA
TGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGTA
AACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTCT
TTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGAG
TGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACAG
ATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCT
GGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGACTT
CTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTC
CAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACCC
GTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAGT
GTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGACTGA
GCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAACCGC
CTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTATCAAGGTTTCCGTAGTGAACTGCAG
>U94368.1
GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTATAAAGGTGAAACCGC
GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGATATCCTACGTGGATAACTGTGGTAATTCTA
GAGCTAATACATGCACCAAAGCTCCGATTTTCTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT
CGGCCCGTCAATTGGTGACTCTGAATAACTATAGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC
AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT
AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA
TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG
GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA
ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG
ACCTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC
ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG
AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG
GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG
TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA
TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA
CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG
AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC
TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC
GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTATTAAATAGT
CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTATTCAGCCGCATGAAGTTGAGCAAT
AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA
ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT
CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC
ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG
AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT
TTCC
>U94379.1
GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTATAAAGGTGAAACCGC
GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGATATCCTACGTGGATAACTGTGGTAATTCTA
GAGCTAATACATGCACCAAAGCTCCGAATTTTTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT
CGGCCCGTCAATTGGTGACTCTGAATAACTATAGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC
AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT
AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA
TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG
GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA
ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG
ACTTGGTCCGCCCACTGGGCAAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC
ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG
AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG
GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG
TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA
TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA
CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG
AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC
TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC
GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGT
CATCGGATAAACAGGTCCGGAAGACTTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAAT
AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA
ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT
CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC
ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG
AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT
TTCC
>U94367.1
GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTATAAAGGTGAAACCGC
GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGATATCCTACGTGGATAACTGTGGTAATTCTA
GAGCTAATACATGCACCAAAGCTCCGATTTTCTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT
CGGCCCGTCAATTGGTGACTCTGAATAACTATAGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC
AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT
AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA
TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG
GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA
ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG
ACCTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC
ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG
AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG
GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG
TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA
TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATATGCCTTGA
CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG
AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC
TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC
GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTATTAAATAGT
CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTATTCAGCCGCATGAAGTTGAGCAAT
AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA
ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT
CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC
ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG
AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT
TTCC
>U94383.1
GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTATAAAGGTGAAACCGC
GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGATCTCCTACGTGGATAACTGTGGTAATTCTA
GAGCTAATACATGCACCAAAGCTCAGATTTTCTGACGAGCGCATTTATTAGATTAAAACCAATCGGGTTT
CGGCCCGTCAATTGGTGACTCTGAATAACTATTGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC
AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT
AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA
TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG
GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA
ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG
ACCTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC
ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG
AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG
GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG
TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA
TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA
CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG
AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC
TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC
GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGT
CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTATTCAGCCGCATGAAGTTGAGCAAT
AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA
ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT
CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC
ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG
AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT
TTCC
>U94366.1
GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTAAAAAGGTGAAACCGC
GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGAAATCCTACGTGGATAACTGTGGTAATTCTA
GAGCTAATACATGCACCAAAGCTCCGAATTTTTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT
CGGCCCGTAAATTGGTGACTCTGAATAACTATAGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC
AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT
AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA
TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG
GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA
ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG
ACCTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC
ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG
AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG
GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG
TCCGACTGCGAATGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA
TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATATGCCTTGA
CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG
AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC
TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC
GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTATTAAATAGT
CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTATTCAGCCGCATGAAGTTGAGCAAT
AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA
ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT
CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC
ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG
AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT
TTCC
>JN256975.1
AGCCATGCATGTCTAAGTTCAAATGGCCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTA
TATACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCC
GATTTTCTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAA
TAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGA
TGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGC
CTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGT
AGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTA
ACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCG
TCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAA
CTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGA
GTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAA
TAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATGGTTAAGAGGGACGGACGGGGGGCAT
TCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAA
GAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACC
GTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAG
TCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAG
GAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGA
CAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTG
TCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTGCGGAAGA
CTTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGAT
GTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAA
CCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTA
AGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGAC
TGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAAC
CGCCTTAATCGCAGTGGCTGAACCG
>JN256976.1
AGCCATGCATGTCTAAGTTCAAATGGCCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTA
TATACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCC
GATTTTGTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAA
TAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGA
TGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGC
CTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGT
AGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTA
ACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCG
TCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAA
CTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGA
GTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAA
TAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATGGTTAAGAGGGACGGACGGGGGCATT
CGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAAG
AATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACCG
TAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGT
CTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGG
AGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGAC
AGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGT
CTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGAC
TTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATG
TCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAAC
CCGTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAA
GTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGACT
GAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAACC
GCCTTAATCGCAGTGGCTTGAACCG
>JN256982.1
AGCCATGCATGTCTAAGTTTCAATGGCCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTA
TATACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCC
GATTTTGTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAA
TAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGA
TGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGC
CTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGT
AGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTA
ACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCG
TCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAA
CTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGA
GTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAA
TAGGATCTCGGTTCTATTTTGTTGGTTTTTCTGATCTGAGATAATGTAAGAGGGACGGACGGGGGCATTC
GTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAAGA
ATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGT
AAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTC
TTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGA
GTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACA
GATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTC
TGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGACT
TCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGT
CCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACC
CGTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAG
TGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGACTG
AGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAACCG
CCTTAATCGCAGTGGCTGAACCG
```
</details>
# FASTA sequence metadata
The table below lists the metadata of the FASTA sequences provided for the hands-on phylogenetics class: accession number, species, gene, sequence type, length in base pairs (bp), host and geographic location (host and location are simulated for teaching purposes).
| Accession Number | Species | Gene | Sequence type | Length (bp) | Host | Geographic location |
|------------------|---------------------------|-------------------|---------------|-------------|------------------|--------------------------|
| MF072699.1 | Acanthocheilus rotundatus | 18S ribosomal RNA | Partial | 571 | Freshwater fish | Amazon, Brazil |
| JN256979.1 | Toxascaris leonina | 18S ribosomal RNA | Partial | 571 | Wild dog | Serengeti, Tanzania |
| EF180059.1 | Toxocara cati | 18S ribosomal RNA | Partial | 1798 | Domestic cat | São Paulo, Brazil |
| U94382.1 | Toxocara canis | 18S ribosomal RNA | Partial | 1754 | Domestic dog | Buenos Aires, Argentina |
| EF180078.1 | Toxocara vitulorum | 18S ribosomal RNA | Partial | 1754 | Buffalo | Uttar Pradesh, India |
| AF036608.1 | Toxocara canis | 18S ribosomal RNA | Partial | 1741 | Domestic dog | Queensland, Australia |
| U94368.1 | Baylisascaris procyonis | 18S ribosomal RNA | Partial | 1754 | Raccoon | Ontario, Canada |
| U94379.1 | Porrocaecum depressum | 18S ribosomal RNA | Partial | 1754 | Waterbirds | Nile Delta, Egypt |
| U94367.1 | Ascaris suum | 18S ribosomal RNA | Partial | 1754 | Pig | Sichuan, China |
| U94383.1 | Toxascaris leonina | 18S ribosomal RNA | Partial | 1754 | Wild dog | Patagonia, Chile |
| U94366.1 | Ascaris lumbricoides | 18S ribosomal RNA | Partial | 1754 | Human | Lagos, Nigeria |
| JN256975.1 | Toxocara cati | 18S ribosomal RNA | Partial | 1705 | Domestic cat | Lisbon, Portugal |
| JN256976.1 | Toxocara canis | 18S ribosomal RNA | Partial | 1705 | Domestic dog | Mexico City, Mexico |
| JN256982.1 | Toxascaris leonina | 18S ribosomal RNA | Partial | 1703 | Wild dog | Hokkaido, Japan |
**Notes**:
- All sequences are partial, as stated in their sequence descriptions (e.g. "partial sequence").
- The sequences U94366.1 (*Ascaris lumbricoides*) and U94367.1 (*Ascaris suum*) are recommended as outgroups for the phylogenetic analysis.
- The **Host** and **Geographic location** fields are simulated for teaching purposes and do not reflect real GenBank data.
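Before aligning, it can be useful to confirm that the sequences were copied correctly. A minimal sketch (hypothetical function `fasta_stats`) prints, for each record, the ID, the length and the number of non-ACGT characters. After saving the sequences above to a file (for example a hypothetical `toxocara_18S.fasta`), the lengths should match the length column of the table above, and records containing IUPAC ambiguity codes stand out (EF180078.1, for instance, contains a `Y`, meaning C or T).

```shell
# For each record: ID <TAB> length (bp) <TAB> number of non-ACGT characters
# (ambiguity codes such as Y or N, or alignment gaps). Works with wrapped (multi-line) FASTA.
fasta_stats() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len "\t" amb
            id = substr($1, 2); len = 0; amb = 0
            next
        }
        {
            seq = toupper($0)
            gsub(/[ \t\r]/, "", seq)          # ignore spaces, tabs and Windows line endings
            len += length(seq)
            amb += gsub(/[^ACGT]/, "", seq)   # gsub returns the number of characters replaced
        }
        END { if (id != "") print id "\t" len "\t" amb }
    ' "$@"
}

# Usage: fasta_stats toxocara_18S.fasta
```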
## Choosing the evolutionary model for your data
Guide to phylogenetic models in MEGA
Choosing the right model for a phylogenetic analysis in MEGA depends on the data, their evolutionary characteristics and the assumptions being made. Some of the models available in MEGA are Jukes-Cantor (JC), Kimura 2-parameter (K2P), Tamura-Nei (TN93), Hasegawa-Kishino-Yano (HKY) and General Time Reversible (GTR).
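JC and K2P assume equal base frequencies (0.25 each), while TN93, HKY and GTR allow them to differ, so the observed base composition is a quick first check. A minimal sketch (hypothetical function `base_freq`) pools all sequences in the given file(s) and reports the A/C/G/T frequencies, ignoring gaps and ambiguity codes:

```shell
# Overall A/C/G/T frequencies across all sequences in one or more FASTA files.
# Header lines are skipped; non-ACGT characters (gaps, ambiguity codes) are ignored.
base_freq() {
    awk '
        !/^>/ {
            seq = toupper($0)
            gsub(/[^ACGT]/, "", seq)
            for (i = 1; i <= length(seq); i++) count[substr(seq, i, 1)]++
            total += length(seq)
        }
        END {
            if (total == 0) exit 1   # no valid bases found
            printf "A\t%.3f\nC\t%.3f\nG\t%.3f\nT\t%.3f\n",
                count["A"] / total, count["C"] / total, count["G"] / total, count["T"] / total
        }
    ' "$@"
}

# Usage: base_freq toxocara_18S.fasta
```

Frequencies far from 0.25 suggest preferring a model with unequal base frequencies (TN93, HKY or GTR) over JC or K2P.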
1. Jukes-Cantor (JC)
Suposições: Frequências de bases iguais (A = C = G = T = 0,25) e taxas de substituição iguais entre todos os nucleotídeos.
Quando Usar: Sequências pouco divergentes ou análises com suposições mínimas. Ideal para sequências muito próximas.
Caso de Uso: Análises preliminares ou quando não há informações sobre padrões de substituição.
Limitações: Simples demais; não considera variação de frequências de bases ou diferenças transição/transversão.
2. Kimura 2-Parâmetros (K2P)
Suposições: Frequências de bases iguais, mas distingue transições (A↔G, C↔T) de transversões (A↔C, A↔T, G↔C, G↔T) com taxas diferentes.
Quando Usar: Dados com maior frequência de transições. Bom para sequências com divergência moderada.
Caso de Uso: DNA mitocondrial ou genes nucleares com divergência moderada.
Limitações: Assume frequências de bases iguais, o que pode não ser verdade.
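To make the transition/transversion distinction concrete: K2P uses the proportion of transitions ($P$) and transversions ($Q$) observed between two aligned sequences and computes $d = -\frac{1}{2}\ln(1-2P-Q) - \frac{1}{4}\ln(1-2Q)$. A minimal bash/awk sketch for illustration only (the function `k2p_distance` is hypothetical; MEGA computes this for you), comparing only sites where both sequences have A, C, G or T:

```bash
# K2P (Kimura 2-parameter) distance between two ALIGNED DNA sequences.
# Only sites where both sequences have A, C, G or T are compared
# (gaps and ambiguous bases are skipped).
k2p_distance() {
  awk -v s1="$1" -v s2="$2" 'BEGIN {
    s1 = toupper(s1); s2 = toupper(s2)
    n = 0; ts = 0; tv = 0
    for (i = 1; i <= length(s1); i++) {
      a = substr(s1, i, 1); b = substr(s2, i, 1)
      if (a !~ /[ACGT]/ || b !~ /[ACGT]/) continue
      n++
      if (a == b) continue
      # transition: purine<->purine (A,G) or pyrimidine<->pyrimidine (C,T)
      if ((a ~ /[AG]/ && b ~ /[AG]/) || (a ~ /[CT]/ && b ~ /[CT]/)) ts++
      else tv++
    }
    if (n == 0) { print "NA"; exit }
    P = ts / n; Q = tv / n
    x = 1 - 2*P - Q; y = 1 - 2*Q
    if (x <= 0 || y <= 0) { print "NA"; exit }   # too divergent: distance undefined
    d = 0 - 0.5*log(x) - 0.25*log(y)
    printf "P=%.4f Q=%.4f d=%.4f\n", P, Q, d
  }'
}
```

For example, `k2p_distance ACGTACGTAC GCGTACGTAA` has one transition and one transversion in 10 sites and prints `P=0.1000 Q=0.1000 d=0.2341`.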
3. Tamura-Nei (TN93)
Assumptions: unequal base frequencies; distinguishes the two types of transitions (A↔G vs. C↔T) from transversions. Rate variation among sites can optionally be added (gamma distribution).
When to use: data with unequal base frequencies and a transition/transversion bias. Suitable for a wide range of evolutionary distances.
Use case: nuclear or organellar DNA with varied substitution patterns; common in phylogenies across species.
Limitations: more parameters than JC or K2P, so it requires more data.
4. Hasegawa-Kishino-Yano (HKY)
Assumptions: unequal base frequencies and different rates for transitions and transversions, with a single transition rate (A↔G = C↔T).
When to use: data with unequal base frequencies and transition/transversion differences, but without the extra complexity of TN93. Common for vertebrate mitochondrial DNA.
Use case: data of moderate complexity, such as mitochondrial or chloroplast DNA.
Limitations: less flexible than TN93 when the two transition rates differ.
5. General Time Reversible (GTR)
Assumptions: the most flexible of these models; unequal base frequencies and a separate rate for each of the six substitution types (A↔C, A↔G, A↔T, C↔G, C↔T, G↔T). A gamma distribution (+G) and invariant sites (+I) can be added.
When to use: complex data with substantial divergence and varied patterns. Well suited to deep phylogenies.
Use case: broad taxonomic studies (e.g., across genera or families).
Limitations: computationally intensive; requires a lot of data to avoid overfitting.
How to choose the right model
Model testing: use MEGA's model selection tool (menu "Models" > "Find Best DNA/Protein Models (ML)") to compare models by AIC or BIC. Choose the model with the lowest value, which balances goodness of fit against complexity.
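The criteria are $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k\ln(n) - 2\ln L$, where $\ln L$ is the log-likelihood, $k$ the number of free parameters and $n$ the number of alignment sites; BIC penalizes extra parameters more heavily. MEGA already reports these values (BIC and AICc, a small-sample-corrected AIC), so the sketch below only illustrates the arithmetic. It assumes a hypothetical plain-text table with the columns `model lnL k`; the function name `rank_models` is also hypothetical.

```bash
# Rank models by AIC and BIC from a whitespace-separated table with columns:
#   model  lnL  k
# where lnL is the log-likelihood and k the number of free parameters.
# $1 = table file, $2 = number of alignment sites (n, used by BIC).
# Output: model, AIC, BIC (tab-separated), sorted by BIC (lowest = preferred).
rank_models() {
  awk -v n="$2" '
    NF == 3 && $1 !~ /^#/ {
      aic = 2*$3 - 2*$2
      bic = $3*log(n) - 2*$2
      printf "%s\t%.2f\t%.2f\n", $1, aic, bic
    }
  ' "$1" | sort -k3,3n
}
```

With real values, the first line is the BIC choice; comparing it with the AIC column shows whether the two criteria agree, which they do not always do.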
Data characteristics:
Divergence: JC or K2P for closely related sequences; TN93, HKY or GTR for more divergent ones.
Base frequencies: if they are unequal (check in MEGA), prefer TN93, HKY or GTR.
Rate variation: add a gamma distribution (+G) or invariant sites (+I) to TN93, HKY or GTR if rates vary among sites.
Dataset size: simple models (JC, K2P) for small datasets; GTR for large and diverse ones.
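To check outside MEGA whether base frequencies are roughly equal (about 0.25 each, as JC and K2P assume), a minimal sketch that counts A, C, G and T across all records of a FASTA file. The function name `base_freqs` is hypothetical; gaps and ambiguous bases are ignored.

```bash
# Overall base composition (A, C, G, T) of a FASTA file, ignoring
# header lines, gaps and ambiguous bases.
base_freqs() {
  awk '
    /^>/ { next }
    {
      s = toupper($0)
      for (i = 1; i <= length(s); i++) {
        b = substr(s, i, 1)
        if (b ~ /[ACGT]/) { count[b]++; total++ }
      }
    }
    END {
      if (total == 0) exit 1
      printf "A=%.3f C=%.3f G=%.3f T=%.3f\n",
        count["A"]/total, count["C"]/total, count["G"]/total, count["T"]/total
    }
  ' "$1"
}
```

Frequencies far from 0.25 suggest that TN93, HKY or GTR fit better than JC or K2P.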
Phylogenetic method:
Distance (NJ, UPGMA): JC or K2P are usually sufficient.
Maximum likelihood (ML): TN93, HKY or GTR for greater accuracy.
Maximum parsimony (MP): does not use an explicit substitution model, although substitution patterns (e.g., weighting transitions and transversions differently) can still help.
Bayesian inference (e.g., MrBayes): GTR+G+I is common for complex data.
Practical recommendations
Initial test: use MEGA's model selection to identify the best-fitting model.
Default choices:
Closely related taxa (within a species): JC or K2P.
Moderate divergence (between species): TN93 or HKY.
Deep phylogenies (between genera/families): GTR+G or GTR+G+I.
Rate variation: add a gamma distribution (+G) for long alignments.
Computational resources: GTR is demanding; use HKY or TN93 for small datasets or limited resources.
Validation: after building the tree (NJ or ML), use bootstrap (100 to 1000 replicates) to assess its robustness. If support is inconsistent, test a more complex model or revise the alignment.
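The idea behind the bootstrap can be sketched as follows: each pseudo-replicate resamples alignment columns with replacement, a tree is built from each replicate, and the support of a branch is the percentage of replicates in which it appears. The hypothetical bash/awk function below writes one replicate. This is the classic non-parametric bootstrap; IQ-TREE's `-B` option uses the faster ultrafast bootstrap approximation instead.

```bash
# Write one bootstrap pseudo-replicate of an aligned FASTA file
# (assumes all sequences have the same length).
# Columns are sampled WITH replacement, keeping the same number of columns,
# and the same sampled columns are used for every sequence.
# $1 = aligned FASTA, $2 = random seed (optional, default 1).
bootstrap_replicate() {
  awk -v seed="${2:-1}" '
    /^>/ { n++; name[n] = $0; next }
    { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }
    END {
      srand(seed)
      L = length(seq[1])
      for (j = 1; j <= L; j++) col[j] = int(rand() * L) + 1   # column indices 1..L
      for (i = 1; i <= n; i++) {
        out = ""
        for (j = 1; j <= L; j++) out = out substr(seq[i], col[j], 1)
        print name[i]; print out
      }
    }
  ' "$1"
}
```

Running it with seeds 1 to 100 would give 100 replicate alignments, each of which would then be used to build a tree.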
## Programs to view and annotate phylogenetic trees
[TvBOT](https://www.chiplot.online/tvbot.html): easier to use.
[iTOL](https://itol.embl.de/): more complex and requires an account, but more powerful.
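# One-click online phylogeny
[Phylogeny.fr](https://www.phylogeny.fr/simple_phylogeny.cgi)
Google Chrome must be installed. To install it on Linux, download the `.deb` package from the Chrome website and run the commands below (`google-chrome-stable_current_amd64.deb` is the default file name; adjust it if yours differs). The second command installs any missing dependencies reported by `dpkg`.
```
sudo dpkg -i ./google-chrome-stable_current_amd64.deb
sudo apt-get install -f
```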
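# Command-line phylogeny
Often considered the gold standard for phylogenetic inference.
Install IQ-TREE on Linux:
```
sudo apt install iqtree
```
Depending on the version provided by your distribution, the executable may be called `iqtree` (version 1.x) or `iqtree2` (version 2.x).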
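Align the sequences with MUSCLE:
```
muscle -in ./sequencias.fas -out ./muscle_aln.fas
```
This is the MUSCLE v3 syntax; in MUSCLE v5 the equivalent command is `muscle -align ./sequencias.fas -output ./muscle_aln.fas`.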
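IQ-TREE expects every sequence in the alignment to have the same length. A minimal bash/awk check (the function `check_alignment` is hypothetical) that prints the number of sequences and the alignment length, and returns a non-zero status if any record differs:

```bash
# Check that every record in an aligned FASTA file has the same length
# (gaps "-" are counted, since they are part of the alignment).
check_alignment() {
  awk '
    # compare the length of the record that just ended with the first one
    function flush() {
      if (id == "") return
      n++
      if (n == 1) ref = l
      else if (l != ref) { print "length mismatch: " id " has " l ", expected " ref; bad = 1 }
    }
    /^>/ { flush(); id = substr($1, 2); l = 0; next }
    { gsub(/[ \t\r]/, ""); l += length($0) }
    END {
      flush()
      if (n == 0 || bad) exit 1
      print n " sequences, alignment length " ref
    }
  ' "$1"
}
```

Usage: `check_alignment muscle_aln.fas`.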
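Run IQ-TREE on the MUSCLE alignment:
```
iqtree2 -s muscle_aln.fas -m MFP -B 1000 -nt AUTO
```
`-m MFP` runs ModelFinder to choose the best-fit model before building the tree, `-B 1000` computes 1000 ultrafast bootstrap replicates and `-nt AUTO` chooses the number of CPU threads automatically. With IQ-TREE 1.x, run `iqtree` and use `-bb 1000` instead of `-B 1000`.
Check the log file (`muscle_aln.fas.log`) or the report (`muscle_aln.fas.iqtree`) to see which model was selected.
Compare the trees obtained with the three approaches (MEGA, Phylogeny.fr and IQ-TREE) and how much the models differed between them.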
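The selected model can also be extracted from the report on the command line. This sketch assumes the `.iqtree` report contains a line such as `Best-fit model according to BIC: <model>` (check that your report uses this wording); `best_fit_model` is a hypothetical helper.

```bash
# Print the best-fit model chosen by ModelFinder from an IQ-TREE report
# (.iqtree file), based on a line such as:
#   Best-fit model according to BIC: HKY+F+G4
# Output format: "<model> (<criterion>)"
best_fit_model() {
  sed -n 's/^Best-fit model according to \([A-Za-z]*\): *\([^ ]*\).*/\2 (\1)/p' "$1" | head -n 1
}
```

Usage: `best_fit_model muscle_aln.fas.iqtree`.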
# Pangenome
Class files:
[Pangenome](https://drive.google.com/drive/folders/1TNEkrkOjtFaZ3uVeBd5OvW6941p6qNi3?usp=sharing)
## Download the genomes from NCBI
Install NCBI Entrez Direct:
```
sudo apt install ncbi-entrez-direct
```
Run the command below to create a file listing the accessions to download:
```
echo -e "NC_007795.1\nNZ_CP104478.1\nNZ_CP011526.1\nNZ_CP035101.1\nNZ_CP040998.1" > genomes.txt
```
Run the loop below to download each genome in FASTA format (`.fna`):
```
while read -r id; do efetch -db nuccore -id "$id" -format fasta > "${id}.fna"; done < genomes.txt
```
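A failed download may leave an empty or non-FASTA `.fna` file. The hypothetical helper below reads `genomes.txt`, checks each `<accession>.fna` and reports its size in bases, returning a non-zero status if any file is missing or not FASTA.

```bash
# For each accession in the list file (default: genomes.txt), check that
# <accession>.fna exists, is non-empty and starts with a FASTA header.
check_downloads() {
  local list="${1:-genomes.txt}" id status=0 bp
  while read -r id || [ -n "$id" ]; do
    [ -z "$id" ] && continue
    if [ -s "${id}.fna" ] && [ "$(head -c 1 "${id}.fna")" = ">" ]; then
      # count sequence characters, skipping header lines
      bp=$(awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "${id}.fna")
      echo "${id}: OK (${bp} bp)"
    else
      echo "${id}: MISSING or not FASTA"
      status=1
    fi
  done < "$list"
  return "$status"
}
```

Run `check_downloads genomes.txt` in the folder where the `.fna` files were saved.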
## Run Prokka on the downloaded genomes
```
for i in *.fna; do prokka --outdir "${i%.fna}" --force --addgenes --addmrna --rfam --prefix "${i%.fna}" --locustag "${i%.fna}" "$i"; done
```
Create a folder for the GFF files:
```
mkdir -p gff
```
Copy the GFF files generated by Prokka into the new folder (files already inside `gff` are skipped):
```
find . -name '*.gff' -not -path './gff/*' -exec cp -t ./gff {} +
```
## Install Roary
```
sudo apt install roary
```
## Run Roary on the GFF files
```
# Run Roary on the GFF files generated by Prokka (from inside the gff folder)
cd gff
roary -f roary_output -e -n -v *.gff
# The output files used below are written to roary_output
cd roary_output
## Run the roary_plots.py script (from Roary's contrib folder; copy it here first) to generate additional plots
python roary_plots.py ./accessory_binary_genes.fa.newick gene_presence_absence.csv
# If Python libraries are missing,
# install them with pip install "library name", for example:
pip install seaborn
pip install biopython
# Install R on Biolinux
sudo apt install r-base
# If the next command fails, edit the R plotting script as described right after this block
# Create the plots with R
create_pan_genome_plots
chmod +x ./roary2svg.pl
./roary2svg.pl gene_presence_absence.csv > pan_genome.svg
```
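Roary also writes `summary_statistics.txt` to its output folder, with one tab-separated line per category (core, soft core, shell, cloud and total genes), giving the category name, the percentage range of strains and the gene count. Assuming that layout, the hypothetical helper below prints the counts and the fraction of the pangenome that is core.

```bash
# Summarise Roary's summary_statistics.txt (tab-separated: category, range, count)
# and report what fraction of the pangenome is core.
pangenome_summary() {
  awk -F '\t' '
    { print $1 ": " $3 }
    $1 == "Core genes"  { core = $3 }
    $1 == "Total genes" { total = $3 }
    END { if (total > 0) printf "Core fraction: %.1f%%\n", 100 * core / total }
  ' "$1"
}
```

Usage, inside the Roary output folder: `pangenome_summary summary_statistics.txt`.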
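Editing the R plots script
`which roary` shows the folder where the Roary scripts are installed (here `/usr/bin`); open the plotting script from that folder in a text editor (saving changes there requires administrator rights):
```
which roary
sudo featherpad /usr/bin/create_pan_genome_plots
```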
## Access the pangenome genes
Use the Roary output file `pan_genome_reference.fa`, for example for functional annotation with eggNOG-mapper:
[eggNOG-mapper](http://eggnog-mapper.embl.de/)
# Virulence genes with ABRicate
[ABRicate](https://github.com/tseemann/abricate)
# Protein structure analysis and prediction
Tools used in this class:
[UniProt](https://www.uniprot.org/): protein database.
[AlphaFold Colab](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb#scrollTo=11l8k--10q0C): tertiary structure prediction.
[PSIPRED](http://bioinf.cs.ucl.ac.uk/psipred/): secondary structure prediction.
[Molstar Viewer](https://molstar.org/viewer/): viewing tertiary structures.
[SwissModel](https://swissmodel.expasy.org/): tertiary structure modelling (homology modelling).
[FATCAT](https://fatcat.godziklab.org/fatcat/fatcat_pair.html): protein structure comparison.
[PyMOL](https://pymolwiki.org/index.php/Main_Page): local visualization.
To install PyMOL on Biolinux:
```
sudo apt install pymol
```
### Obtaining the amino acid sequences
In this exercise, we will work with the human insulin protein.
Go to the [PDB](https://www.rcsb.org/) and find the FASTA of the human protein:
- Get the FASTA of normal insulin (2JUM) and of the mutant variant (1K3M).
EXTRA
Go to NCBI and get the insulin gene of the California sea hare (*Aplysia californica*):
- [NM_001204686.1](https://www.ncbi.nlm.nih.gov/nuccore/NM_001204686.1)
View the secondary structure in [PSIPRED](http://bioinf.cs.ucl.ac.uk/psipred/): join the insulin sequences into a single sequence, paste it into the site, wait for the result and interpret it.
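If the PDB FASTA contains one record per chain (for insulin, chains A and B), the records can be joined on the command line before pasting them into PSIPRED. A minimal sketch with a hypothetical function `join_fasta`; it simply concatenates the sequences in file order.

```bash
# Join all records of a FASTA file into a single sequence and print it
# as one FASTA record (sequence on a single line).
# $1 = input FASTA, $2 = header for the joined record (default: joined).
join_fasta() {
  awk -v name="${2:-joined}" '
    /^>/ { next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { print ">" name; print seq }
  ' "$1"
}
```

For example, `join_fasta 2JUM.fasta insulin_2JUM` (hypothetical file name) prints a single record ready to paste.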
Predict the tertiary structure with [AlphaFold Colab](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb#scrollTo=11l8k--10q0C) and [SwissModel](https://swissmodel.expasy.org/), wait for the results and compare them.
**In SwissModel, evaluate:**
GMQE (Global Model Quality Estimate), e.g. 0.77
Ranges from 0 to 1.
The closer to 1, the higher the expected quality of the model.
GMQE takes into account both the similarity between your sequence and the template and the coverage of the model.
QMEANDisCo Global, e.g. 0.73 ± 0.08
Also ranges from 0 to 1.
Measures the overall reliability of the predicted structure, based on structural statistics (knowledge-based potentials and comparison with experimentally determined structures).
Values above 0.7 generally indicate a reliable model.
Install PyMOL on Biolinux, open the predicted protein structures, align them and inspect them.
Generate a figure for a paper.