
# BOSS WORKSHOP NOV 1-5

:::info
- **Location:** Virtual
- **Zoom Link:** https://us02web.zoom.us/j/81086617166?pwd=VWcvbVdsYjFmWWJkTm01U01KSkJodz09
- **Date:** Nov 1-5, 2021, 9:00 am EAT (GMT+3)
- **Pre-workshop survey:** https://forms.gle/LgWuvnihQGgzArjh6
- **Post-workshop survey:** https://forms.gle/QDG9yc6pYx8jtEEY6
:::

:::info
## November 5

***Topics***
- Introduction to Git/GitHub - Caleb Kibet
- Introduction to Galaxy - Peter van Heusden

**Participants Roll call**
1. Mbu'u Mbanwi Cyrille, UYI, Cameroon
2. Winnie Mumbi
3. Joseph Gisaina; University of EMBU
4. Meshack Wadegu; KEMRI
5. Audrey Oronda
6. Beatrice Oduor, University of Nairobi
7. Martha Nginya, BHKi
8. Oscar Mwaura ;Pwani university
9. munduku Benoni;jcrc; Uganda
10. Okoko Irene; @okoko_mkavi
11. Faith Ndung'u;JKUAT; @Njoki_Ndung'u
12. Dativa pereus, National Institute for Medical Research-(NIMR Tanzania), University of Nairobi
13. Gladys Rotich; ICIPE
14. Sharon Watiri; KEMRI
15. Ahmed Abbi Abdille; PAUSTI @ahmedabbi
16. Stephen Tavasi ; Masinde Muliro University
17. Michael Ambutsi; MMUST

### Notes
***Git and GitHub***
- Git is a version control system that keeps track of changes to your work and facilitates collaboration on projects.
- GitHub is a web-based service that hosts Git repositories. It is also a social networking site for developers.
- GitHub can also be used to host websites for events, organizations, study groups and GitBooks. It can also host code documentation, CVs, to-do lists and open educational resources (OERs).

***Galaxy***
- Galaxy is a web-based bioinformatics environment available through public servers.
- Galaxy aims to make bioinformatics accessible by providing a user-friendly interface and by making analysis workflows reusable. It follows open-source and FAIR principles.
- Register on https://usegalaxy.org/ or https://usegalaxy.eu/
- Build workflows in Galaxy so that analyses can be reproduced
- Short introduction to Galaxy: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html
- Galaxy Project YouTube channel: https://www.youtube.com/c/GalaxyProject/playlists
- Demonstration workflow: https://usegalaxy.eu/u/pvanheus/w/my-first-analysis-workflow
- Example of a published Galaxy workflow: https://workflowhub.eu/workflows/155
- Galaxy 101 tutorial: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-101/tutorial.html

**Questions**
:::

:::info
## November 4

***Topics***
- Sequence alignment - Sonal Henson
- Sequence assembly - James Otieno

**Participants Roll call**
Name/email/affiliation
1. Joseph Gisaina Amwoma; University of EMBU
2. Winnie Mumbi
3. Benjamin Opot; USAMRU-K
4. Kirunda Jeremy Menya; Joint Clinical Research Centre; @Kirunda_J
5. Stephen Tavasi ; mmust
6. Gladys Rotich; ICIPE
7. Audrey Oronda
8. Meshack Wadegu;KEMRI
9. Oscar Mwaura; Pwani University
10. Mbu'u Mbanwi Cyrille
11. Boakye Emmanuel; Kwame Nkrumah University Of Science And Technology,Ghana; @thescientistgh
12. Beatrice Oduor, University of Nairobi
13. Edwin Mwakio; USAMRU-K
14. munduku Benoni; jcrc; Uganda
15. Sharon Watiri;KEMRI
16. Michael Ambutsi; MMUST
17. Kimani Ndung'u, KALRO,FCRC-Njoro, Kenya
18. Dativa pereus, National Institute for Medical Research-(NIMR Tanzania), University of Nairobi

**Questions**

### Notes
Miniconda: https://docs.conda.io/en/latest/miniconda.html

Conda installation, followed by an example tool install (SPAdes from the bioconda channel):
```bash
# Download and run the Miniconda installer (Linux, x86_64)
wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh

# Update conda itself
conda update conda

# Example: install SPAdes from bioconda
conda install -c bioconda spades
```
:::

:::info
## November 3

***Topics***
- Quality control and assessment - Shaun Aron
- Practical session - QC - Shaun Aron
- Scientific writing - Joy Owango

**Participants Roll call**
1. Sharon Watiri
2. Benjamin Opot; USAMRU-k
3. Winnie Mumbi; @ShizlerMumbi
4. Edwin Mwakio;USAMRU-K
5. Kirunda Jeremy Menya; Joint Clinical Research Centre; @Kirunda_J
6. Joseph Gisaina; University Embu
7. Karim Mtengai,Copperbelt University
8. Meshack Wadegu, KEMRI
9. Ahmed Abbi Abdille, @ahmedabbi
10. Stephen Tavasi ; Masinde Muliro University
11. Beatrice Oduor, UON
12. Audrey Oronda
13. Ambutsi Michael; MMUST; @Ambutsi2
14. Mbu'u Mbanwi Cyrille, UYI, Cameroon
15. Gladys Rotich; ICIPE; @jerono7_gladys
16. Oscar Mwaura; Pwani University @OscarMwaura2
17. Faith Ndung'u, JKUAT, @Njoki_Ndung'u
18. Phyllys Langat, KEMRI,@phyllismemo
19. Kimani Ndung'u, KALRO,FCRC-Njoro, Kenya
20. Okoko Irene; @okoko_mkavi
21. Boakye Emmanuel; Kwame Nkrumah University Of Science And Technology,Ghana; @thescientistgh
22. munduku benoni; jcrc; uganda
23. Dativa pereus, National Institute for Medical Research-(NIMR Tanzania), University of Nairobi

**Link to course website**
- https://canvas.instructure.com/courses/3735217/modules

**Links to data**
```bash
wget https://hpc.ilri.cgiar.org/~mkofia/NGS_QC/SRR6319976.zip
wget https://hpc.ilri.cgiar.org/~mkofia/NGS_QC/SRR957824.zip
```

**Link to register for the next session**
- https://bit.ly/3EGw6Zc

### Notes
**Quality control and assessment**
- Errors can be introduced at various steps of a sequencing experiment, which makes quality control an important step before any downstream analysis.
- NGS read quality starts off high and drops off towards the end of the reads.
- FastQC is a common tool for generating quality metrics for raw reads because it is easy to use and intuitive.
- MultiQC can combine the FastQC results for multiple FASTQ files into a single HTML report.
- A FastQC run report:

**Scientific writing**

**Questions**
:::

:::info
## November 2

***Topics***
- Advanced Linux, Awk and Sed - Sumir Panji

**Participants Roll call**
1. Mbu'u Mbanwi Cyrille, University of Yaounde I, Biotechnology Centre, Yaounde, Cameroon
2. Benjamin Opot; USAMRU-K
3. Joseph Gisaina; University of Embu; Kenya
4. Audrey Oronda
5. Oscar Mwaura; Pwani University;Kenya @OscarMwaura2
6. Boakye Emmanuel; Kwame Nkrumah University Of Science And Technology,Ghana; @thescientistgh
7. Okoko Irene; @okoko_mkavi
8. Musundi Sebastian. JKUAT, Kenya
9. Winnie Mumbi; @ShizlerMumbi
10. Edwin Mwakio;USAMRU-K
11. Gladys Rotich; ICIPE; @jerono7_gladys
12. Kirunda Jeremy Menya, Joint Clinical Research Centre, Uganda; @Kirunda_J
13. Beatrice Oduor, UON, @Betty_Oduor
14. Faith Ndung'u, JKUAT, @Njoki_Ndung'u
15. Ahmed Abbi Abdille, PAUSTI, ahmedabbi
16. Kimani Ndung'u; KALRO,FCRC-Njoro, Kenya
17. munduku benoni; JCRC; uganda
18. Stephen Tavasi ; Masinde Muliro University
19. Michael Ambutsi; MMUST; @Ambutsi2
20. Karim Mtengai;Copperbelt University @Kmtengai
21. Meshack Wadegu, KEMRI
22. Phyllys Langat, KEMRI,@phyllismemo
23. Dativa pereus, National Institute for Medical Research-(NIMR Tanzania), University of Nairobi

**Links to data**
```bash
wget https://raw.githubusercontent.com/eanbit-rt/IntroductoryLinux-2019/master/Data/exercises.fasta
wget https://raw.githubusercontent.com/eanbit-rt/IntroductoryLinux-2019/master/Data/sequences.fasta
wget https://raw.githubusercontent.com/eanbit-rt/IntroductoryLinux-2019/master/Data/genes.gff
wget https://raw.githubusercontent.com/bioinformatics-hub-ke/Boss-workshops/main/exercises.bed
```

**Questions**
:::

:::info
## November 1

***Topics***
- Intro to sequencing technologies - Martha Luka
- Data file formats - Talk - Shaun Aron
- Introduction to Unix - Sumir Panji

**Participants Roll call**
***Name; Institution; Twitter handle***
1. Benjamin Opot; USAMRU-K
2. Winnie Mumbi; @ShizlerMumbi
3. Gladys Rotich; ICIPE; @jerono7_gladys
4. Stephen Tavasi; Masinde Muliro University Of Sci & Techn.
5. Sharon Watiri
6. Susan Watitwa
7. Faith Ndung'u, JKUAT @Njoki_Ndung'u
8. Audrey Oronda
9. Mbu'u Mbanwi Cyrille, University of Yaounde I, Cameroon
10. Edwin Mwakio; USAMRU-K
11. Felix Maingi, @Felix_Wine, Jomo Kenyatta University of Agriculture and Technology
12. Boakye Emmanuel; Kwame Nkrumah University Of Science And Technology,Ghana; @thescientistgh
13. Michael Ambutsi,MMUST, @Ambutsi2
14. munduku Benoni; joint clinical research center, Uganda; @benon_munduku
15. Okoko Irene; @okoko_mkavi
16. Beatrice Oduor,@Betty_Oduor,University of Nairobi
17. Kirunda Jeremy Menya; Joint Clinical Research Centre, Uganda; @Kirunda_J
18. Joseph Gisaina; University of EMBU;@isain254
19. Ssegawa Abdulkarim;Makerere University,uganda;@karimkarsha
20. Kimani Ndung'u; KALRO,FCRC-Njoro, Kenya;
21. Ahmed Abbi Abdille; Pan African University; Somalia
22. Karim Mtengai Copperbelt University,Zambia @Kmtengai
23. Musundi Sebastian, JKUAT, Kenya
24. Meshack Wadegu, KEMRI
25. Phyllys Langat,KEMRI,@phyllismemo
26. Dativa pereus, National Institute for Medical Research-(NIMR Tanzania), University of Nairobi

### Notes
**Introduction to sequencing technologies**
- The first DNA sequencing method was published in 1977 by Frederick Sanger, and it was gel-based.
- The speed of sequencing has increased over the years thanks to advances in technology.
- First-generation sequencing determined the sequence of one DNA fragment at a time, whereas second-generation sequencing enables massively parallel sequencing of many fragments at once.
- Third-generation sequencing increases the length of the sequenced fragments (long reads), which improves outputs such as genome assemblies.
- Automated Sanger sequencing uses fluorescently labelled ddNTPs. The labelled fragments pass an automatic detector, which produces a chromatogram from which the individual bases are called. Sanger sequencing is expensive per base, especially for large projects.
- Second-generation sequencing:
  - Ion Torrent detects the hydrogen ions released during DNA polymerization. Sequencing happens in real time, and it suits small labs with modest sequencing needs.
  - ABI SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequences by ligation: only probes complementary to the template are ligated and therefore detected. It is much slower than other methods. It is useful for DNA methylation/epigenetics studies.
  - Illumina uses sequencing by synthesis: genomic DNA is sheared into small fragments, adapters are added to build the library, and unique indices are added so that samples can be pooled and later told apart by their index. Strengths: high base-calling accuracy, low running cost, deep coverage for metagenomics and de novo assembly, and high-quality pairwise alignments. Limitation: it produces short reads, even for long genomes.
- Third-generation sequencing:
  - PacBio is a long-read technology. It is good for epigenetics because native (already methylated) DNA is sequenced, so base modifications can be detected. Applications include whole-genome sequencing and targeted sequencing.
  - Oxford Nanopore sequencing can sequence native DNA without a PCR step, works for both short and long reads, and is portable and very cost-effective, but it has a high error rate.

**Quality control and assessment**
- Raw read data generated by sequencing platforms is passed through tools that assess read quality to determine whether or not the reads need to be trimmed.
- Raw sequence data comes in the FASTQ format, which extends the FASTA format by encoding the quality of each base as an ASCII character.
- QC and trimming: FastQC takes FASTQ files as input, checks read quality, adapter content, etc., and generates a report.
- Alignment vs assembly: de novo assembly merges overlapping DNA fragments into contigs and is used when no reference genome is available. Alignment is done when a reference genome is available.
- The reference genome is indexed before alignment. When mapping with a tool such as BWA, the index is used to map the reads onto the reference genome efficiently.
- IGV (Integrative Genomics Viewer) is an alignment viewer.
- The output of read alignment is a SAM (Sequence Alignment/Map) file, which is human-readable.
- BAM (Binary Alignment/Map) is the binary form of SAM and is therefore not human-readable. Tools like …
- CRAM files compress alignments further than BAM and are the most space-efficient way to store them. The compression is reference-based, so the reference genome is needed to read them.
- Variant calling is the process of identifying variants from sequence data.
- VCF (Variant Call Format) files are the output of variant calling; they store the variants together with additional information such as quality and annotations.

**Introduction to Unix**
- Unix is efficient for working with large data sets.
- Much biological data is in text format, which most Unix tools can process directly.
- Linux commands can be combined in many ways using pipes.
- You can navigate through the terminal using relative or absolute paths.
  An absolute path specifies the location starting from the root directory (`/`), whereas a relative path is interpreted from the current working directory.
- `cd`, `ls`, `pwd`, `mkdir` and `rmdir` are some of the most important Unix commands.
- Linux is case-sensitive, so using tab completion is advised. Use underscores rather than spaces between words in file names.
- File permissions are set separately for the owner (user), the group and others. The `chmod` command changes the permissions of a file or directory, using either symbolic notation or octal digits.
- You can edit file contents with text editors such as nano, emacs, gedit and vim.
- Commands for viewing the contents of a file include `cat`, `more`, `less`, `head` and `tail`.
- `wc` gives basic counts for a file (lines, words and bytes).

**Introduction to high performance computing**
- An HPC cluster is a collection of many computers, called nodes, connected via a fast interconnect.
- A cluster node differs from a personal computer in the quantity, quality and power of its components.

**Questions**
1. How would the error rates in ONT affect genetic diversity studies?
2. Is it necessary to sort SAM files before variant calling?
3. Is a sequence read a continuation of the one before?
:::
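The Nov 1 notes on automated Sanger sequencing describe chain termination with dye-labelled ddNTPs followed by size-based detection. The toy simulation below is only an illustration under strong simplifying assumptions: exactly one terminated fragment of every length is produced, each fragment's last base carries the dye of its ddNTP, and sorting by length stands in for electrophoresis. The function names `complement` and `sanger_read` are hypothetical.

```python
import random


def complement(base: str) -> str:
    """Return the Watson-Crick complement of one DNA base."""
    return {"A": "T", "T": "A", "G": "C", "C": "G"}[base]


def sanger_read(template: str, seed: int = 0) -> str:
    """Toy chain-termination read of `template`.

    Idealisation: one terminated fragment of every possible length, each
    ending in a dye-labelled ddNTP that identifies its last base.
    """
    synthesized = "".join(complement(b) for b in template)
    # (fragment length, dye label of the terminating ddNTP)
    fragments = [(i + 1, synthesized[i]) for i in range(len(synthesized))]
    random.Random(seed).shuffle(fragments)  # the reaction is a mixed pool
    # Electrophoresis orders fragments by size; the detector reads each
    # fragment's dye colour from shortest to longest.
    fragments.sort(key=lambda frag: frag[0])
    return "".join(label for _, label in fragments)


print(sanger_read("TACGGA"))  # ATGCCT
```

The read is simply the complement of the template, recovered from a shuffled pool purely by ordering fragments on length.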
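The Illumina notes mention that unique indices let libraries be pooled and separated afterwards. A minimal sketch of that demultiplexing step follows. It assumes each read arrives as a `(read_id, index, sequence)` tuple with the index already extracted, uses exact index matching only (real demultiplexing tools typically tolerate mismatches), and the sample sheet and reads are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample sheet: index sequence -> sample name
SAMPLE_SHEET = {"ACGTAC": "sample_1", "TTGCAA": "sample_2"}


def demultiplex(reads, sample_sheet):
    """Group pooled reads by sample using an exact match on the index."""
    by_sample = defaultdict(list)
    for read_id, index, sequence in reads:
        # Reads whose index is not in the sheet are set aside as undetermined
        sample = sample_sheet.get(index, "undetermined")
        by_sample[sample].append((read_id, sequence))
    return dict(by_sample)


pooled = [
    ("r1", "ACGTAC", "GATTACA"),
    ("r2", "TTGCAA", "CCGGTTA"),
    ("r3", "ACGTAC", "TTAGGCA"),
    ("r4", "GGGGGG", "ACACACA"),
]
result = demultiplex(pooled, SAMPLE_SHEET)
for sample, sample_reads in result.items():
    print(sample, sample_reads)
```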
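The QC notes (Nov 1 and Nov 3) say that FASTQ stores each base's quality as an ASCII character and that read quality tends to drop towards the end of the reads. The sketch below decodes qualities assuming the Phred+33 offset (Sanger / Illumina 1.8+) and averages the score at each read position, which is roughly the idea behind FastQC's per-base quality plot; it is not FastQC's implementation, and the two reads are made up.

```python
import io
from typing import Iterator, List, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 encoding (Sanger / Illumina 1.8+)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def phred_scores(quality: str) -> List[int]:
    """Convert ASCII quality characters to Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def mean_quality_per_position(records) -> List[float]:
    """Average Phred score at each read position across all reads."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, quality in records:
        for pos, score in enumerate(phred_scores(quality)):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += score
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIHH55\n"
    "@read2\nACGTTCGA\n+\nIIIHHH##\n"
)
print(mean_quality_per_position(read_fastq(example)))
# [40.0, 40.0, 40.0, 39.5, 39.0, 39.0, 11.0, 11.0]
```

With Phred+33, `I` decodes to 40 and `#` to 2, so the falling averages at the last positions mimic the quality drop described in the notes.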
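The notes describe de novo assembly as merging overlapping fragments into contigs. As a toy illustration only, here is a greedy overlap-merge sketch that repeatedly joins the pair of fragments with the longest suffix-prefix overlap. Real assemblers such as SPAdes (installed via conda in the Nov 4 notes) use de Bruijn graphs and cope with sequencing errors, repeats and contained reads, none of which this handles; the `min_overlap` threshold and the reads are made up.

```python
from itertools import permutations
from typing import List, Tuple


def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: List[str], min_overlap: int = 3) -> List[str]:
    """Merge fragments greedily by largest overlap; return the resulting contigs."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best: Tuple[int, int, int] = (0, -1, -1)  # (overlap length, i, j)
        for i, j in permutations(range(len(contigs)), 2):
            olen = overlap(contigs[i], contigs[j], min_overlap)
            if olen > best[0]:
                best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no overlaps left: these are the final contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTGCA", "TGCAATC"]
print(greedy_assemble(reads))  # ['ATGGCGTGCAATC']
```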
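To make the SAM notes concrete, the sketch below splits one alignment line into the 11 mandatory tab-separated fields and decodes the bitwise FLAG field. Field names and flag bits follow the SAM specification; the alignment record is invented, and in practice `samtools view` or a library such as pysam would do this parsing.

```python
from typing import Dict, List

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# FLAG bits as defined in the SAM specification
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line: str) -> Dict[str, str]:
    """Map the 11 mandatory SAM columns to their names (optional tags ignored)."""
    columns = line.rstrip("\n").split("\t")
    return dict(zip(SAM_FIELDS, columns[:11]))


def decode_flag(flag: int) -> List[str]:
    """List the properties whose bits are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


record = parse_sam_line(
    "read1\t99\tchr1\t100\t60\t8M\t=\t300\t208\tACGTACGT\tIIIIIIII\tNM:i:0"
)
print(record["RNAME"], record["POS"], decode_flag(int(record["FLAG"])))
# chr1 100 ['paired', 'proper_pair', 'mate_reverse_strand', 'first_in_pair']
```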
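Similarly, a VCF data line starts with eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) before any per-sample columns. The sketch below parses one invented record and splits the semicolon-separated INFO field into key/value pairs, storing flag-type entries (which have no value) as `True`. It is an illustration, not a full VCF parser: header lines, per-sample genotype columns and type conversion of INFO values are not handled.

```python
from typing import Dict, Union

VCF_FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line: str) -> Dict[str, object]:
    """Parse the 8 fixed columns of one VCF data line (header lines not handled)."""
    columns = line.rstrip("\n").split("\t")
    record: Dict[str, object] = dict(zip(VCF_FIXED, columns[:8]))
    record["POS"] = int(columns[1])        # 1-based position on CHROM
    record["ALT"] = columns[4].split(",")  # one or more alternate alleles
    info: Dict[str, Union[str, bool]] = {}
    if columns[7] != ".":                  # "." means no INFO annotations
        for entry in columns[7].split(";"):
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True  # flag entries carry no value
    record["INFO"] = info
    return record


variant = parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=30;AF=0.5,0.1;DB")
print(variant["POS"], variant["ALT"], variant["INFO"])
# 12345 ['G', 'T'] {'DP': '30', 'AF': '0.5,0.1', 'DB': True}
```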
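The difference between absolute and relative paths can also be checked programmatically. This sketch uses Python's `posixpath` module so it follows Unix path rules on any operating system; the directory names are hypothetical and nothing is read from disk.

```python
import posixpath  # POSIX path rules, so results are the same on any OS

# Hypothetical directory layout, for illustration only
home = "/home/student"
data_file = "/home/student/workshop/data/reads.fastq"

print(posixpath.isabs(data_file))        # True: starts at the root "/"
print(posixpath.isabs("workshop/data"))  # False: resolved from the current directory

# Relative path from the home directory to the data file
rel = posixpath.relpath(data_file, start=home)
print(rel)  # workshop/data/reads.fastq

# Joining the relative path onto its start directory gives back the absolute path
print(posixpath.join(home, rel) == data_file)  # True

# ".." refers to the parent directory, as in `cd ..`
print(posixpath.normpath("/home/student/workshop/../data"))  # /home/student/data
```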
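The symbolic and octal forms of `chmod` encode the same thing: each of owner, group and others gets one octal digit, the sum of read (4), write (2) and execute (1). The helper `symbolic_to_octal` below is hypothetical, written only for this illustration; it converts a 9-character permission string like the one shown by `ls -l` (setuid/setgid/sticky letters are not handled) and cross-checks the result with constants from Python's standard `stat` module.

```python
import stat


def symbolic_to_octal(perms: str) -> str:
    """Convert a 9-character string like 'rwxr-x---' to octal digits like '750'."""
    if len(perms) != 9:
        raise ValueError("expected 9 characters: owner, group, others")
    values = {"r": 4, "w": 2, "x": 1, "-": 0}  # special bits (s, t) not supported
    digits = []
    for start in (0, 3, 6):  # owner, group, others
        digits.append(str(sum(values[c] for c in perms[start:start + 3])))
    return "".join(digits)


print(symbolic_to_octal("rwxr-x---"))  # 750, as in `chmod 750 script.sh`
print(symbolic_to_octal("rw-r--r--"))  # 644, as in `chmod 644 notes.txt`

# The same 750 value assembled from the stat module's permission bits
mode = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
print(oct(mode))  # 0o750
```

For example, `chmod u=rwx,g=rx,o= script.sh` and `chmod 750 script.sh` set the same permissions.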
