# Fundamentals of Bioinformatics and Genomic Analysis

## Schedule

| Date (dd/mm/yyyy) | Topic |
| -------- | -------- |
| 23/03/2025 | General introduction to the course and basic concepts of bioinformatics |
| 31/03/2025 | Introduction to Linux |
| 07/04/2025 | Sequencing and omics |
| 14/04/2025 | Sequence alignment, AB1 files, and primer design |
| 28/04/2025 | Genome annotation and assembly |
| 05/05/2025 | Genome evolution and analysis of mobile genetic elements |
| 12/05/2025 | Phylogeny and interpretation of phylogenetic trees |
| 19/05/2025 | Genomic epidemiology and marker identification |
| 26/05/2025 | Protein structure analysis and prediction |
| 02/06/2025 | Pangenome analysis |
| 09/06/2025 | Online tools for bioinformatics analysis |
| 16/06/2025 | Graded assessment I |
| 30/06/2025 | Graded assessment II |
| 07/07/2025 | Graded assessment III |
| 14/07/2025 | Graded assessment IV |

# Script bank

https://hackmd.io/VPKSiywKTES91DUDQW1dwg

# Bioinformatics programs

[Prokka](https://github.com/tseemann/prokka)
[Roary](https://sanger-pathogens.github.io/Roary/)
[Panaroo](https://github.com/gtonkinhill/panaroo)
[RAST](https://rast.nmpdr.org/rast.cgi)
[Galaxy](https://usegalaxy.org/)
[SPAdes](https://github.com/ablab/spades)

# Week 1

[Lecture 1: Basic concepts of bioinformatics](https://teodianobastoslab.net/Bioinformatica/LivroBioinformatica.pdf)

# Week 2

Based on the course taught by [Prof. Dr. Alessandro de Mello Varani](https://www.fcav.unesp.br/#!/departamentos/tecnologia/docentes/alessandro-de-mello-varani/main-page/english/)

[Introduction to Linux](https://hackmd.io/@lamoroso92/minicurso25-linux)
[Class guide](https://uoguelphca-my.sharepoint.com/:b:/g/personal/lpizauro_uoguelph_ca/ESxv3qn6O1RAp8AIdToHI4wBRBqtJlZNN8nCseqTcQ4PjQ?e=v39NGT)
[VirtualBox](https://www.virtualbox.org/wiki/Downloads)
[Biolinux](https://uoguelphca-my.sharepoint.com/:u:/g/personal/lpizauro_uoguelph_ca/EevArqdRDc1IsmMVO7RZev4BuqT_5z860FCfkZT5w6JDcQ?e=ketJaX) - Password: bioinfo2025

Here are the essential commands to get started:

| Command | Function | Example |
| ------- | -------------- | ------- |
| ls | Lists files and directories | `ls -la` (details and hidden files) |
| mkdir | Creates a directory | `mkdir nova_pasta` |
| cd | Navigates between directories | `cd /home/usuario` |
| pwd | Shows the current directory | `pwd` (/home/usuario) |
| cp | Copies files or directories | `cp arquivo.txt copia.txt` |
| mv | Moves or renames files/directories | `mv arquivo.txt /tmp/novo.txt` |
| rm | Removes files or directories | `rm -r pasta` (recursive removal) |
| touch | Creates an empty file or updates its timestamp | `touch novo_arquivo.txt` |
| cat | Displays, concatenates, or creates files | `cat arquivo.txt` (shows the contents) |
| head | Displays the first lines of a file | `head -n 5 arquivo.txt` (5 lines) |
| less | Views files with navigation | `less arquivo.txt` (navigate with the arrow keys) |
| tr | Translates or manipulates characters in text | `tr 'a-z' 'A-Z' < arquivo.txt` |
| chmod | Changes file/directory permissions | `chmod 755 arquivo.txt` |
| chown | Changes the owner of files/directories | `chown usuario:grupo arquivo.txt` |
| find | Searches for files/directories | `find / -name "arquivo.txt"` |
| grep | Searches for text in files | `grep "texto" arquivo.txt` |
| df | Shows disk usage | `df -h` (human-readable format) |
| top | Monitors processes in real time | `top` (active processes) |
| sudo | Runs a command as administrator | `sudo apt update` |

Create a directory called bioinfo_pratica and move into it:

```
mkdir bioinfo_pratica
cd bioinfo_pratica
```

Create a simple FASTA file with the echo command:

```
echo -e ">seq1\nATGCGTACG\n>seq2\nTTAGCCATG" > sequencias.fasta
```

#### Creating and joining .fasta files with "cat >>"

View the file:

```
cat sequencias.fasta
```

Count the number of sequences (lines that start with >):

```
grep -c "^>" sequencias.fasta
```

Search for sequences that contain "ATG" (`-B 1` also prints the line before each match, which is the header when each sequence sits on a single line):

```
grep -B 1 "ATG" sequencias.fasta
```

Another way to inspect the file (prints each header followed by the first line of its sequence):

```
awk '/^>/{print $1; getline; print}' sequencias.fasta
```

Update the system and install BLAST:

```
sudo apt update
sudo apt install ncbi-blast+
```

Check the installation:

```
blastn -version
```

#### Using BLAST

Create a database from the sequencias.fasta file:

```
makeblastdb -in sequencias.fasta -dbtype nucl
```

Run a simple search:

```
blastn -query sequencias.fasta -db sequencias.fasta -out resultado.txt
```

View the result:

```
cat resultado.txt
```

#### Automation with scripts

A simple script can automate repetitive tasks. Open a new file with `nano contar_sequencias.sh` and enter the following contents:

```
#!/bin/bash
# Count the sequences (header lines starting with ">") in the FASTA file given as $1
echo "Counting sequences in file $1"
grep -c "^>" "$1"
```

Make the script executable:

```
chmod +x contar_sequencias.sh
```

Run the script:

```
./contar_sequencias.sh sequencias.fasta
```

Installing UGENE and MEGA on Linux (Ubuntu and derivatives)

#### UGENE

Download the installer: go to ugene.net/download and download the file ugeneInstaller_64bit.tar for 64-bit systems.

Unpack the file: in the terminal, go to the download directory (e.g., ~/Downloads):

```
cd ~/Downloads
tar -xf ugeneInstaller_64bit.tar
```

Make the installer executable:

```
chmod +x ugeneInstaller_64bit
```

(This uses `chmod` from the table above to change permissions.)

Run the installer:

```
./ugeneInstaller_64bit
```

Follow the graphical installation wizard.
Start UGENE: after installation, launch it from the applications menu or from the terminal: `ugene`

#### Installing MEGA

Go to megasoftware.net/downloads and select the Ubuntu version (e.g., MEGA_12.0.9_Ubuntu_64bit.deb for the most recent version, MEGA 12).

Alternatively, use the terminal to download it directly (replace the link with the correct one for the version you want).

Terminal mode (for those who plan to work on servers):

```
cd ~/Downloads
wget https://www.megasoftware.net/releases/MEGA_12.0.9_Ubuntu_64bit.deb
```

Check the downloaded file by confirming that it is in the directory:

```
ls
```

Install the MEGA .deb package:

```
sudo apt install ./MEGA_12.0.9_Ubuntu_64bit.deb
```

If any dependencies are missing, fix them with:

```
sudo apt install -f
```

# AB1 files

[16SRNA sequence](https://drive.google.com/file/d/16cSnb5l0bJKXTirITJUa7RmP_u8ZBdgm/view?usp=sharing)
[Galaxy](https://usegalaxy.org/)
[RAST server](https://rast.nmpdr.org/rast.cgi)

# Phylogeny practical

The sequences below, from *Toxocara* and related nematodes, are to be used for the alignment and for building the phylogenetic trees.

---

<details>
<summary>Sequence MF072699.1: Acanthocheilus rotundatus isolate Ar2 18S ribosomal RNA gene, partial sequence</summary>

```fasta
>MF072699.1
ACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCG
GGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGC
GGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCT
CTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCC
GATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGA
CAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCA
CGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATC
CTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATC
AGCTCACGTTG
>JN256979.1
ACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCG
GGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGC
GGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCT CTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCC GATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGA CAAGCGGTATTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCA CGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATC CTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATC AGCTCACGTTG >EF180059.1 CCCGATTGATTCTGTCGGCGGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGG CCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTATATACTTGATTTTGATGTCCTACGTG GATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCCGATTTTCTGACGAGCGCATCTATTAG ATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAATAACTGTAGCTGATCGCATGGTCCAG AACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTT GTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGG AAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTC TCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCT GGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTA GTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGG TTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCT TCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGT TTTCTGATCTGAGATAATGGTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATT CTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAA GTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTC GGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTT GCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACT CAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGG TGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGA CTCTAGCCTACTAAATAGTCATCGGATAAACAAGTGCGGAAGACTTCTTAGAGGGACAAGCGGTGTTCAG CCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTG 
GAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGA TCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATT ACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACT GCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGG GCAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAA >U94382.1 GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTTTAAAGGTGAAACCGC GAACGGCTCATTATAACAGCTATTATATACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTA GAGCTAATACATGCACCAAAGCTCCGATTTTGTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT CGGCCCGTAAATTGGTGACTCTGAATAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATC AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG ACTTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGT CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAAT AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA ACCATTGCCGAAAGGTATTGGTAACCCGTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC 
ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCG AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT TTCC >EF180078.1 GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTTTAAAGGTGAAACCGC GAACGGCTCATTATAACAGCTATTATATACTTGATGTTGATGTGTTACGTGGATAACTGTGGTAATTCTA GAGCTAATACATGCACCAAAGCTCCGATTTTTTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT CGGCCCGTAAATTGGTGACTCTGAATAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATC AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG ACTTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTGTGCTGGTTTTCCCTACGTTGCCTTC ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG AATAGTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGT CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAAT AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA ACCATTACCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT GCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCYCACGTTGATTACGTCCCTGCCCTTTGTAC ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGAGACTGCTGTTTCGAGACCTTCCG AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT TTCC >AF036608.1 
GCCATGCATGTCTAAGTCAAATGGCCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTATA TACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCCGA TTTTGTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAATA ACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGATG GTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGCCT GAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGTAG TGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTAAC GAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCGTC ATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAACT GGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGAGT TTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAATA GGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATGGTTAAGAGGGACGGACGGGGGCATTCG TATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAAGAA TGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGTA AACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTCT TTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGAG TGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACAG ATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCT GGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGACTT CTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTC CAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACCC GTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAGT GTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGACTGA GCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAACCGC CTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTATCAAGGTTTCCGTAGTGAACTGCAG >U94368.1 GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTATAAAGGTGAAACCGC GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGATATCCTACGTGGATAACTGTGGTAATTCTA GAGCTAATACATGCACCAAAGCTCCGATTTTCTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT 
CGGCCCGTCAATTGGTGACTCTGAATAACTATAGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG ACCTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTATTAAATAGT CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTATTCAGCCGCATGAAGTTGAGCAAT AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT TTCC >U94379.1 GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTATAAAGGTGAAACCGC GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGATATCCTACGTGGATAACTGTGGTAATTCTA GAGCTAATACATGCACCAAAGCTCCGAATTTTTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT CGGCCCGTCAATTGGTGACTCTGAATAACTATAGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT 
AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG ACTTGGTCCGCCCACTGGGCAAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGT CATCGGATAAACAGGTCCGGAAGACTTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAAT AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT TTCC >U94367.1 GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTATAAAGGTGAAACCGC GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGATATCCTACGTGGATAACTGTGGTAATTCTA GAGCTAATACATGCACCAAAGCTCCGATTTTCTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT CGGCCCGTCAATTGGTGACTCTGAATAACTATAGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG 
GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG ACCTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATATGCCTTGA CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTATTAAATAGT CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTATTCAGCCGCATGAAGTTGAGCAAT AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT TTCC >U94383.1 GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTATAAAGGTGAAACCGC GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGATCTCCTACGTGGATAACTGTGGTAATTCTA GAGCTAATACATGCACCAAAGCTCAGATTTTCTGACGAGCGCATTTATTAGATTAAAACCAATCGGGTTT CGGCCCGTCAATTGGTGACTCTGAATAACTATTGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG 
ACCTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG TCCGACTGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGA CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGT CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTATTCAGCCGCATGAAGTTGAGCAAT AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT TTCC >U94366.1 GGTTATATGCTTATCTCAAAGGCTAAGCCATGCATGTCTAAGTTCAAATGGCCTAAAAAGGTGAAACCGC GAACGGCTCATTACAACAGCTATTATATACTTGATCTTGAAATCCTACGTGGATAACTGTGGTAATTCTA GAGCTAATACATGCACCAAAGCTCCGAATTTTTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTT CGGCCCGTAAATTGGTGACTCTGAATAACTATAGCTGATCGCATGGTCTCGAACCGGCGACGTGTCTATC AAGTGTCTGCCTTATCAACTGTCGATGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAAT AAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAA TTACCCACTCTCGGCATGAGGAGGTAGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCG GAATGGGTACAATTTAAACCCGTTAACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTA ATTCCAGCTCTCAAAGTGTATATCGTCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGG ACCTGGTCCGCCCACTGGGCGAGAACTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTC ATCGGTCGCGTAGGGTGGCTAGCGAGTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTG 
AATACTCGTGCATGGAATAATAGAATAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATG GTTAAGAGGGACGGACGGGGGCATTCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACG TCCGACTGCGAATGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGA TCAGATACCGCCCTAGTTCTGACCGTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATATGCCTTGA CGGGCAGCTTCCCGGAAACGAAAGTCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAG AAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACC TGGCCCGGACACCGTGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCC GTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTATTAAATAGT CATCGGATAAACAAGTCCGGAAGACTTCTTAGAGGGACAAGCGGTATTCAGCCGCATGAAGTTGAGCAAT AACAGGTCTGTGATGCCCTTAGATGTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTA ACCATTGCCGAAAGGTATTGGTAACCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTT CCCTTGAACGAGGAATTCCTAGTAAGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTAC ACACCGCCCGTCGCTGCCCGGGACTGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTTCG AGGTGGAGATTCTTTGGTGGAAACCGCCTTAATCGCAGTGGCTTGAACCGGGCAAAAGTCGTAACAAGGT TTCC >JN256975.1 AGCCATGCATGTCTAAGTTCAAATGGCCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTA TATACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCC GATTTTCTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAA TAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGA TGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGC CTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGT AGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTA ACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCG TCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAA CTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGA GTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAA TAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATGGTTAAGAGGGACGGACGGGGGGCAT TCGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAA 
GAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACC GTAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAG TCTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAG GAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGA CAGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTG TCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTGCGGAAGA CTTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGAT GTCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAA CCCCTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTA AGTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGAC TGAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAAC CGCCTTAATCGCAGTGGCTGAACCG >JN256976.1 AGCCATGCATGTCTAAGTTCAAATGGCCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTA TATACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCC GATTTTGTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAA TAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGA TGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGC CTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGT AGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTA ACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCG TCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAA CTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGA GTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAA TAGGATCTCGGTTCTATTTTGTTGGTTTTCTGATCTGAGATAATGGTTAAGAGGGACGGACGGGGGCATT CGTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAAG AATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACCG TAAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGT CTTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGG 
AGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGAC AGATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGT CTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGAC TTCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATG TCCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAAC CCGTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAA GTGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGACT GAGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAACC GCCTTAATCGCAGTGGCTTGAACCG >JN256982.1 AGCCATGCATGTCTAAGTTTCAATGGCCTTTAAAGGTGAAACCGCGAACGGCTCATTATAACAGCTATTA TATACTTGATCTTGATGTCCTACGTGGATAACTGTGGTAATTCTAGAGCTAATACATGCACCAAAGCTCC GATTTTGTGACGAGCGCATCTATTAGATTAAAACCAATCGGGTTTCGGCCCGTAAATTGGTGACTCTGAA TAACTGTAGCTGATCGCATGGTCCAGAACCGGCGACGTGTCTATCAAGTGTCTGCCTTATCAACTGTCGA TGGTAGTTTATGTGCCTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGC CTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCTCGGCATGAGGAGGT AGTGACGAAAAATAACGAGACCGTTCTCTATGAGGCCGGTTATCGGAATGGGTACAATTTAAACCCGTTA ACGAGGATCTATGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCAAAGTGTATATCG TCATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGCGCCTCAGGACTTGGTCCGCCCACTGGGCGAGAA CTGGGCTCCTGGGCTAGTTCTGCTGGTTTTCCCTACGTTGCCTTCATCGGTCGCGTAGGGTGGCTAGCGA GTTTACTTTGAAAAAATTAGAGTGCTTCACGCGGGCTTATGTCTGAATACTCGTGCATGGAATAATAGAA TAGGATCTCGGTTCTATTTTGTTGGTTTTTCTGATCTGAGATAATGTAAGAGGGACGGACGGGGGCATTC GTATCGCTGCGTGAGAGGTGAAATTCTTGGACCGTAGCGAGACGTCCGACTGCGAAAGCATTTGCCAAGA ATGTCTTCATTAATCAAGAACGAAAGTCAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGT AAACGATACCAACTAGCGTTCCGTCGGCGGTAAATACGCCTTGACGGGCAGCTTCCCGGAAACGAAAGTC TTTCGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGAAATTGACGGAAGGGCACCACCAGGA GTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCTGGCCCGGACACCGTGAGGATTGACA GATTGAGAGCTCTTTCTTGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTC TGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTCATCGGATAAACAAGTCCGGAAGACT 
TCTTAGAGGGACAAGCGGTGTTCAGCCGCATGAAGTTGAGCAATAACAGGTCTGTGATGCCCTTAGATGT
CCAGGGCTGCACGCGCGCTACACTGGAGGAATCAGCGTGCTGTAACCATTGCCGAAAGGTATTGGTAACC
CGTTGAAAATCCTCCGTGATCGGGATCGGGAATTGCAATTATTTCCCTTGAACGAGGAATTCCTAGTAAG
TGTGAGTCATCAGCTCACGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTGCCCGGGACTG
AGCCGTTTCGAGAAAAGCGGGGACTGCTGTTTCGAGACCTTCCGAGGTGGAGATTCTTTGGTGGAAACCG
CCTTAATCGCAGTGGCTGAACCG
```

</details>

# FASTA sequence metadata

The table below lists the metadata of the FASTA sequences provided for the phylogeny practical: accession number, species, gene, sequence type, length in base pairs (bp), host, and geographic location. The host and geographic location are simulated for teaching purposes.

| Accession Number | Species | Gene | Sequence type | Length (bp) | Host | Geographic location |
|------------------|-----------------------------|----------------------|-------------------|--------------|-------------------------|--------------------------|
| MF072699.1 | Acanthocheilus rotundatus | 18S ribosomal RNA | Partial | 571 | Freshwater fish | Amazon, Brazil |
| JN256979.1 | Toxascaris leonina | 18S ribosomal RNA | Partial | 571 | Wild dog | Serengeti, Tanzania |
| EF180059.1 | Toxocara cati | 18S ribosomal RNA | Partial | 1798 | Domestic cat | São Paulo, Brazil |
| U94382.1 | Toxocara canis | 18S ribosomal RNA | Partial | 1754 | Domestic dog | Buenos Aires, Argentina |
| EF180078.1 | Toxocara vitulorum | 18S ribosomal RNA | Partial | 1754 | Buffalo | Uttar Pradesh, India |
| AF036608.1 | Toxocara canis | 18S ribosomal RNA | Partial | 1741 | Domestic dog | Queensland, Australia |
| U94368.1 | Baylisascaris procyonis | 18S ribosomal RNA | Partial | 1754 | Raccoon | Ontario, Canada |
| U94379.1 | Porrocaecum depressum | 18S ribosomal RNA | Partial | 1754 | Waterfowl | Nile Delta, Egypt |
| U94367.1 | Ascaris suum | 18S ribosomal RNA | Partial | 1754 | Pig | Sichuan, China |
| U94383.1 | Toxascaris leonina | 18S ribosomal RNA | Partial | 1754 | Wild dog | Patagonia, Chile |
| U94366.1 | Ascaris lumbricoides | 18S ribosomal RNA | Partial | 1754 | Human | Lagos, Nigeria |
| JN256975.1 | Toxocara cati | 18S ribosomal RNA | Partial | 1705 | Domestic cat | Lisbon, Portugal |
| JN256976.1 | Toxocara canis | 18S ribosomal RNA | Partial | 1705 | Domestic dog | Mexico City, Mexico |
| JN256982.1 | Toxascaris leonina | 18S ribosomal RNA | Partial | 1703 | Wild dog | Hokkaido, Japan |

**Notes**:
- All sequences are partial, as indicated in their headers (e.g., "partial sequence").
- Sequences U94366.1 (*Ascaris lumbricoides*) and U94367.1 (*Ascaris suum*) are recommended as outgroups for the phylogenetic analysis.
- The **Host** and **Geographic location** fields are simulated for teaching purposes and do not reflect real GenBank data.

## Choosing the evolutionary model for my data

Guide to phylogenetic models for MEGA

Choosing the right model for a phylogenetic analysis in MEGA depends on the data, their evolutionary characteristics, and the assumptions being made. Some of the models available in MEGA are Jukes-Cantor (JC), Kimura 2-Parameter (K2P), Tamura-Nei (TN93), Hasegawa-Kishino-Yano (HKY), and General Time Reversible (GTR).

**1. Jukes-Cantor (JC)**

- **Assumptions**: Equal base frequencies (A = C = G = T = 0.25) and equal substitution rates among all nucleotides.
- **When to use**: Sequences with little divergence, or analyses with minimal assumptions. Best suited to very closely related sequences.
- **Use case**: Preliminary analyses, or when nothing is known about the substitution patterns.
- **Limitations**: Too simple; it ignores variation in base frequencies and the difference between transitions and transversions.

**2. Kimura 2-Parameter (K2P)**

- **Assumptions**: Equal base frequencies, but transitions (A↔G, C↔T) and transversions (A↔C, A↔T, G↔C, G↔T) have different rates.
- **When to use**: Data in which transitions are more frequent than transversions. A good fit for sequences with moderate divergence.
- Use case: mitochondrial DNA or nuclear genes with moderate divergence.
- Limitations: assumes equal base frequencies, which may not hold.

3. Tamura-Nei (TN93)

- Assumptions: unequal base frequencies; distinguishes two types of transitions (A↔G vs. C↔T) from transversions. Can include rate variation among sites (gamma distribution, optional).
- When to use: data with unequal base frequencies and a transition/transversion bias. Suitable for a range of evolutionary distances.
- Use case: nuclear or organellar DNA with varied patterns; common in between-species phylogenies.
- Limitations: more parameters than JC or K2P, so it needs more data.

4. Hasegawa-Kishino-Yano (HKY)

- Assumptions: unequal base frequencies and different rates for transitions and transversions. Assumes a single transition rate (A↔G = C↔T).
- When to use: data with unequal frequencies and transition/transversion differences, but less complex than TN93 requires. Common for vertebrate mitochondrial DNA.
- Use case: data of moderate complexity, such as mitochondrial or chloroplast DNA.
- Limitations: less flexible than TN93 when the two transition rates differ.

5. General Time Reversible (GTR)

- Assumptions: the most flexible model; unequal base frequencies and a separate rate for each substitution type (6 rates). Can include a gamma distribution (+G) and invariable sites (+I).
- When to use: complex data with substantial divergence and varied patterns. Best suited to deep phylogenies.
- Use case: broad taxonomic studies (e.g., between genera or families).
- Limitations: computationally intensive; needs a lot of data to avoid overfitting.

How to choose the right model

- Model testing: use the model selection tool in MEGA ("Models" menu) to compare models by AIC or BIC. Choose the model with the lowest value, which balances fit and complexity.
- Data characteristics:
- Divergence: JC or K2P for closely related sequences; TN93, HKY, or GTR for more divergent ones.
- Base frequencies: if they are unequal (check in MEGA), prefer TN93, HKY, or GTR.
- Rate variation: add a gamma distribution (+G) or invariable sites (+I) to TN93, HKY, or GTR if rates vary among sites.
- Dataset size: simple models (JC, K2P) for small datasets; GTR for large, diverse ones.
- Phylogenetic method:
- Distance (NJ, UPGMA): JC or K2P are sufficient.
- Maximum Likelihood (ML): TN93, HKY, or GTR for greater accuracy.
- Maximum Parsimony (MP): substitution models matter less, but substitution patterns can still help.
- Bayesian Inference (e.g., MrBayes): GTR+G+I is common for complex data.

Practical recommendations

- Initial test: use MEGA's model selection to identify the best fit.
- Default choices:
- Closely related taxa (within a species): JC or K2P.
- Moderate divergence (between species): TN93 or HKY.
- Deep phylogenies (between genera or families): GTR+G or GTR+G+I.
- Rate variation: add a gamma distribution (+G) for long alignments.
- Computational resources: GTR is heavy; use HKY or TN93 for small datasets or limited resources.
- Validation: after building the tree (NJ or ML), run a bootstrap (100–1000 replicates) to assess robustness. If support is inconsistent, try a more complex model or revise the alignment.
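To make the difference between the two simplest models concrete, the sketch below computes the JC and K2P distances for one pair of aligned sequences using the standard textbook formulas. It is only an illustration, not what MEGA does internally: the function names and the two toy sequences are assumptions made up for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (P, Q): proportions of transitions and transversions
    over the sites where both sequences have A, C, G, or T."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T stay within one chemical class: transitions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return transitions / sites, transversions / sites


def jc_distance(seq1, seq2):
    """Jukes-Cantor: every substitution is treated the same way."""
    P, Q = count_differences(seq1, seq2)
    p = P + Q  # total proportion of differing sites
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter: transitions and transversions get separate rates."""
    P, Q = count_differences(seq1, seq2)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, not taken from the class dataset)
    s1 = "AAAAAAAAAA"
    s2 = "GAAAAAAAAA"  # one transition (A<->G) in ten sites
    print("JC :", round(jc_distance(s1, s2), 4))
    print("K2P:", round(k2p_distance(s1, s2), 4))
```

When all differences are transitions, K2P gives a slightly larger distance than JC, which is the bias the guide describes; for real data, rely on MEGA's model selection rather than this sketch.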
## Programs for viewing and decorating the phylogenetic tree

[TvBOT](https://www.chiplot.online/tvbot.html) easier to use
[Itol](https://itol.embl.de/) more complicated and requires an account, but better

# Online phylogeny in one click

[Phylofr](https://www.phylogeny.fr/simple_phylogeny.cgi)

Google Chrome must be installed. To install it on Linux, use the command below:
```
sudo dpkg -i ./"downloaded Google Chrome file"
```

# Command-line phylogeny: the gold standard of phylogenetics*

Install iqtree on Linux:
```
sudo apt install iqtree
```
Align the sequences with "muscle":
```
# MUSCLE v3 syntax; MUSCLE v5 uses -align and -output instead
muscle -in ./sequencias.fas -out ./muscle_aln.fas
```
Run iqtree on the file aligned by muscle:
```
# If the installed binary is iqtree (version 1.x), use -bb 1000 instead of -B 1000
iqtree2 -s muscle_aln.fas -m MFP -B 1000 -nt AUTO
```
Check the log file to see which model was selected.
Compare the trees from all three approaches and see how much they differ depending on the model used.

# Pangenome

Class files: [Pangenoma](https://drive.google.com/drive/folders/1TNEkrkOjtFaZ3uVeBd5OvW6941p6qNi3?usp=sharing)

## Download the genomes from NCBI

Install NCBI entrez-direct:
```
sudo apt install ncbi-entrez-direct
```
Run the command below to create a file with the accessions to download:
```
echo -e "NC_007795.1\nNZ_CP104478.1\nNZ_CP011526.1\nNZ_CP035101.1\nNZ_CP040998.1" > genomes.txt
```
Run the script below to download the genomes listed in the file in fna format:
```
while read -r id; do
  efetch -db nuccore -id "$id" -format fasta > "${id}.fna"
done < genomes.txt
```

## Run Prokka on the downloaded genomes

```
for i in *.fna; do
  prokka --outdir "${i%.fna}" --force --addgenes --addmrna --rfam --prefix "${i%.fna}" --locustag "${i%.fna}" "$i"
done
```
Create a folder to hold the gff files:
```
mkdir gff
```
Copy the gff files into the new gff folder:
```
# Skip files already inside ./gff so they are not copied onto themselves
find . -name '*.gff' -not -path './gff/*' -exec cp -t ./gff {} +
```

## Install Roary

```
sudo apt install roary
```

## Run Roary on the annotated genomes

```
# Run Roary on the gff files generated by Prokka
# (run from inside the gff folder, or pass gff/*.gff instead)
roary -f roary_output -e -n -v *.gff

# Run the roary_plots script to generate additional plots
python roary_plots.py ./accessory_binary_genes.fa.newick gene_presence_absence.csv

# If Python libraries are missing,
# install each one with pip install "library name"
pip install seaborn
pip install biopython   # provides the Bio module

# Install R on Biolinux
sudo apt install r-base
# If this fails, see "Changing the R plots file"

# Create the plots with R
create_pan_genome_plots

chmod +x ./roary2svg.pl
./roary2svg.pl gene_presence_absence.csv > pan_genome.svg
```
Changing the R plots file:
```
which roary
featherpad /usr/bin/create_pan_genome_plots
```

## Access the pangenome genes

Use the Roary output file pan_genome_reference.fa with [eggnog](http://eggnog-mapper.embl.de/)

# Virulence genes with ABRICATE

[ABRICATE](https://github.com/tseemann/abricate)

# Protein structure analysis and prediction

Tools used:

[UniProt](https://www.uniprot.org/) protein database
[AlphaFold Colab](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb#scrollTo=11l8k--10q0C) for tertiary structure prediction
[PSIPRED](http://bioinf.cs.ucl.ac.uk/psipred/) for secondary structure prediction
[Molstar Viewer](https://molstar.org/viewer/) to inspect tertiary structures
[SwissModel](https://swissmodel.expasy.org/) generates tertiary structures

Program for comparing protein structures: [FATCAT](https://fatcat.godziklab.org/fatcat/fatcat_pair.html)

For local visualization, use [PyMOL](https://pymolwiki.org/index.php/Main_Page)

To install PyMOL on Biolinux:
```
sudo apt install pymol
```

### Obtaining the amino acid sequences

In this case we will work with the human insulin protein.

Go to the [PDB](https://www.rcsb.org/)
- find the FASTA of the human protein
- Download the FASTA of normal insulin (2JUM) and of the mutant variant (1K3M)

EXTRA
Go to NCBI and get the insulin gene of the sea hare (*Aplysia californica*)
- [NM_001204686.1](https://www.ncbi.nlm.nih.gov/nuccore/NM_001204686.1)

View the secondary structure in [PSIPRED](http://bioinf.cs.ucl.ac.uk/psipred/): join the insulin sequences, paste them into the site, wait for the result, and interpret it.

Predict the tertiary structure in [AlphaFold Colab](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb#scrollTo=11l8k--10q0C) and in [SwissModel](https://swissmodel.expasy.org/), wait for the results, and compare them.

**In SwissModel, evaluate**

GMQE (Global Model Quality Estimate): 0.77
Ranges from 0 to 1. The closer to 1, the higher the expected quality of the model. GMQE takes into account both the similarity between your sequence and the template and the coverage of the modelling.

QMEANDisCo Global: 0.73 ± 0.08
Also ranges from 0 to 1. It measures the overall reliability of the predicted structure, based on structural statistics (knowledge-based potentials and comparison with real structures). Values above 0.7 generally indicate a reliable model.

Install PyMOL on Biolinux.
Open the generated protein structures, align them, and inspect the result.
Generate a figure suitable for an article.
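For the PSIPRED step, one possible reading of "join the insulin sequences" is to concatenate the chain records of the downloaded FASTA into a single string before pasting it into the site. The sketch below does that with plain Python. The function names are hypothetical, and the two short records are toy placeholders, not the actual 2JUM or 1K3M entries.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def join_chains(records):
    """Concatenate the sequences of all records, in file order."""
    return "".join(seq for _, seq in records)


if __name__ == "__main__":
    # Toy placeholder records (hypothetical); replace with the PDB FASTA content
    example = ">chain_A\nGIVEQ\n>chain_B\nFVN\nQH\n"
    records = read_fasta(example)
    for header, seq in records:
        print(header, len(seq))
    print(join_chains(records))
```

To use it with a downloaded file, read the file content with `open(path).read()` and pass it to `read_fasta`; check that the chain order in the file is the order you want before submitting the joined sequence.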