# Datalab 5: Repeat Review Session
#### April 15th, 2021
Let’s keep things centralized in a collab document!
This is the link to this page; save it so you can come back to it: https://hackmd.io/@yinhsieh/repeatreviewsession
### Today’s Schedule
- [x] 10.15 - 10.25 : Setup & How To
- [x] 10.25 - 10.30 : Comments from Homework 4
- [x] 10.30 - 11.10 : Overview of Review Topics
- [x] 11.10 - 11.20 : Break
- [x] 11.20 - 11.35 : Finish Overview of Review Topics
- [x] 11.35 - 12.15 : Start on Homework 5
- [x] 12.15 - 13.00 : Free time / Work on Homework 5
## Comments on Homework 4
1. Canvas homework submission issues
- pdf / docx format
2. Finding protein sequence from Blast hit
- accession number -> NCBI entry page
3. Use of *blastp, blastn, blastx, tblastn,* etc.
- *blastp, blastn* most sensitive!
4. Can you identify the **species** of a protein if a hit has 100% identity?
5. Mass spectrometry is used for **proteins** and **metabolites**, not DNA!
## Breakout Room Assignment
When we start on the homework, you can choose which breakout room you want to be in. Please write yourself under one of the rooms. You can also work alone, then just stay in the main meeting room.
When you have a question while in a breakout room, write it under the room assignment (otherwise if I am in another breakout room I cannot see it).
If you have a question in the main meeting room, just write it under the **Questions** section below.
I will do my best to respond to questions asap ;-)
- Room 1: Jens, Marcus, Johanne
- Room 2
- Room 3
- Room 4: Guro, Jenny, Pernille, The Tristan<3
## Questions
---
Questions?
---
## Brief Review of Previous Datalabs/Homework
Everything below matches the slides I show in the Zoom, with the links.
### Homework 1: Intro to Biological Sequences
*Concepts*
- DNA vs. RNA vs protein
- DNA transcription (DNA → RNA)
- Translation (RNA → protein; starting from DNA: DNA → RNA → protein)
- Sequence similarity probabilities +++
- What does an E-value of 1 or more mean?
- An E-value of 1 means we expect one hit this good purely by chance in a database of this size; an E-value of 2 means two such chance hits, and so on.
- For a hit to be significant, we want the E-value to be as close to 0 as possible: then we do not expect a match this good by chance, so the hit is most likely biologically related to our original sequence.
- Mutation rate calculations ++
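
To make the E-value bullets above concrete, here is a minimal sketch using the standard relation between bit score and E-value, E = m · n · 2^(−S'), where m is the query length, n the database length and S' the bit score. The function names and the example numbers are made up for illustration, and real BLAST uses corrected "effective" lengths, so its reported values will differ somewhat.

```python
import math


def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E-value: expected number of chance hits scoring at least bit_score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def prob_at_least_one_chance_hit(e_value: float) -> float:
    """Chance hits are roughly Poisson-distributed, so P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


# Hypothetical numbers, chosen only for illustration.
query_len = 250            # residues in the query
db_len = 1_000_000_000     # total residues in the database

for bits in (30, 40, 60, 100):
    e = expected_chance_hits(bits, query_len, db_len)
    p = prob_at_least_one_chance_hit(e)
    print(f"bit score {bits:>3}: E = {e:.3g}, P(at least one chance hit) = {p:.3g}")
```

Two things to notice: every extra bit of score halves the E-value, and a larger database raises the E-value for the same score.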
*New Tools*
- Expasy Translate tool (6-frame translation of DNA or RNA sequences)
- https://web.expasy.org/translate/
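
As a rough illustration of what a 6-frame translation does, the sketch below translates a DNA or RNA sequence in three reading frames on each strand using the standard genetic code. It is not the Expasy implementation; the function names are chosen here for illustration, and incomplete or ambiguous codons are simply reported as `X` or dropped.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(sequence: str) -> dict:
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    dna = sequence.upper().replace("U", "T")  # accept RNA input as well
    frames = {}
    for strand, strand_seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

Usually only one of the six frames gives a long stretch without stop codons (`*`), which is how you pick the likely reading frame.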
### Homework 2: Sequence alignment and biological databases
*Concepts*
- sequence similarity search in databases (via Blast)
- protein information search in UniProt +
- protein structure search in the PDB +++
- likelihood of finding sequence in database and hit significance (e-value) +
- local vs. global sequence alignment +
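
The difference between local and global alignment can be seen in a small scoring sketch. It computes only the best alignment score with simple dynamic programming (Needleman-Wunsch style for global, Smith-Waterman style for local). The scoring values (match +1, mismatch −1, gap −2) are arbitrary assumptions for illustration; BLAST and other real tools use substitution matrices and more refined gap penalties.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score of a and b.

    global (local=False): both sequences must be aligned end to end.
    local  (local=True):  only the best-matching region counts.
    """
    # First row: aligning a prefix of b against nothing costs gaps (global only).
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


short, long_seq = "ACGT", "TTTTACGTTTTT"
print("global:", alignment_score(short, long_seq, local=False))
print("local: ", alignment_score(short, long_seq, local=True))
```

The short sequence matches perfectly inside the long one, so the local score is high, while the global score is dragged down by the gaps needed to cover the unmatched ends.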
*New Tools*
- NCBI (super)database
- https://www.ncbi.nlm.nih.gov/
- BLAST database querying
- https://blast.ncbi.nlm.nih.gov/Blast.cgi
- UniProt standardised protein database
- https://www.uniprot.org/
- PDB structural database
- https://www.rcsb.org/
- What is a big or a small resolution?
### Homework 3: Multiple sequence alignments (MSA)
*Concepts*
- from biological question to bioinformatic analysis +
- extracting sequences for various organisms
- making multiple sequence alignments +
- reading and interpreting multiple sequence alignments +
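
One simple way to read a multiple sequence alignment is to ask, column by column, how many sequences share the most common residue. The sketch below does that for a toy alignment. The function name and the example sequences are made up, and this is a much cruder measure than the conservation symbols that ClustalOmega prints.

```python
from collections import Counter


def column_conservation(aligned_sequences: list) -> list:
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ('-') never count as the shared residue, but they do count
    in the denominator, so gappy columns score lower.
    """
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned sequences")
    scores = []
    for column in zip(*aligned_sequences):
        counts = Counter(residue for residue in column if residue != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


# Toy alignment, for illustration only.
alignment = ["ACDE", "ACDF", "AC-E"]
for position, score in enumerate(column_conservation(alignment), start=1):
    print(f"column {position}: {score:.2f}")
```

Columns with a score of 1.00 are fully conserved; these are the positions worth checking against known active sites or binding residues.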
*New Tools*
- KEGG pathway database for enzyme function
- https://www.genome.jp/kegg/
- ClustalOmega aligner
- https://www.ebi.ac.uk/Tools/msa/clustalo/
### Homework 4: More BLAST, peptide mass, and alignments
*Concepts*
- species identification
- how to use BLAST to identify sequences + certainty +
- mass spectrometry +
- using peptide masses to find modified amino acids +
- recap of conservation: coverage vs percentage identity
Questions:
What are regional (local) and global alignment?
I don’t really understand the difference between percent identity and query cover.
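
For the percent identity vs. query cover question, a small sketch may help. It takes one aligned region (query and subject written with `-` for gaps) and computes both numbers. The function names and sequences are made up, and BLAST itself combines all aligned regions of a hit when reporting query cover, so treat this as a simplified single-region version.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical positions as a percentage of the alignment length."""
    if len(aligned_query) != len(aligned_subject) or not aligned_query:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(
        q == s and q != "-" for q, s in zip(aligned_query, aligned_subject)
    )
    return 100.0 * identical / len(aligned_query)


def query_cover(aligned_query: str, full_query_length: int) -> float:
    """Percentage of the full query that takes part in the alignment."""
    aligned_residues = sum(residue != "-" for residue in aligned_query)
    return 100.0 * aligned_residues / full_query_length


# Hypothetical hit: only 10 residues of a 50-residue query align, all identical.
aligned_q = "ACDEFGHIKL"
aligned_s = "ACDEFGHIKL"
print("percent identity:", percent_identity(aligned_q, aligned_s))
print("query cover:     ", query_cover(aligned_q, full_query_length=50))
```

So a hit can have 100% identity but low query cover: the part that aligns is identical, but most of the query is not matched at all. A convincing hit needs both to be high.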
*New Tools*
- Expasy peptide mass tool
- https://web.expasy.org/peptide_mass/
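
The idea of using peptide masses to find modified amino acids can be sketched as follows: compute the theoretical mass of the unmodified peptide, then check whether the difference to the observed mass matches a known modification. The function names, the example peptide and the 0.01 Da tolerance are assumptions for illustration. The sketch uses neutral monoisotopic masses; tools may instead report [M+H]+, which is about 1.00728 Da higher.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# A few common modifications and their monoisotopic mass shifts (Da).
MODIFICATION_SHIFT = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "oxidation": 15.99491,
}


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def explain_mass_shift(sequence: str, observed_mass: float,
                       tolerance: float = 0.01) -> list:
    """Names of single modifications whose shift matches observed - theoretical."""
    delta = observed_mass - peptide_mass(sequence)
    return [
        name for name, shift in MODIFICATION_SHIFT.items()
        if abs(delta - shift) <= tolerance
    ]


peptide = "SAMPLER"                               # made-up example peptide
observed = peptide_mass(peptide) + 79.96633       # pretend measurement
print(f"theoretical mass: {peptide_mass(peptide):.4f} Da")
print("possible modification:", explain_mass_shift(peptide, observed))
```

A match only says that a modification of that mass is present somewhere on the peptide; which residue carries it has to be worked out from the sequence (for example, phosphorylation needs an S, T or Y).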
Can you come to breakout room 1?