# Virtual Meet-up: Bioinformatics Hub of Kenya initiative

**2020-06-19**

[TOC]

### 1. What is the advice to an individual interested in bioinformatics who has only the basics? What are the key basic skills required for someone with no prior knowledge of bioinformatics?

- key skills:
    - search, filter, extract, cross-reference **data from large databases**
        - make use of data & knowledge that's already out there!
    - **sequence alignment**
        - core concept in evolution/phylogeny, functional genomics, genome assembly, differential expression analyses, transcriptomics, metagenomics, etc.
    - **parsing data**
        - reading data from many different (often messy!) file formats
    - **organisation**
        - keep track of what/where your data is, which analyses you've run, with what parameters/settings, etc.
- start with **web-based tools** e.g. [EMBL-EBI](https://www.ebi.ac.uk/services)/[NCBI](https://www.ncbi.nlm.nih.gov/) resources
    - [EBI Train Online](https://www.ebi.ac.uk/training/online/) has a huge amount of freely-accessible content introducing the **fundamental concepts & guiding users on getting started**
- once you begin working with larger amounts of data, you'll probably need to learn some **command line** computing (avoid long waits/costs of uploading data & downloading results)
    - many great resources to learn the basics, e.g. [Software Carpentry Shell](http://swcarpentry.github.io/shell-novice/) & [Data Carpentry Genomics](https://datacarpentry.org/shell-genomics/)
- The [Galaxy platform](https://galaxyproject.org/) provides a fantastic [GUI alternative](https://usegalaxy.eu/) for those unfamiliar with command line computing
    - [Learn Galaxy](https://galaxyproject.org/learn/) and [Galaxy Training Network](https://training.galaxyproject.org/) have many excellent tutorials to learn the platform **and** bioinformatics simultaneously
    - Galaxy can be [installed locally](https://galaxyproject.org/admin/get-galaxy) to avoid upload/download of data over the Internet, but this requires access to an available server and some knowledge of server administration
- some understanding of **statistics** is also necessary
    - [Bernd Klaus' teaching material](https://www.huber.embl.de/users/klaus/teaching.html#statistical-methods-in-bioinformatics) is a good place to start (if you know R) & [Modern Statistics for Modern Biology](https://www.huber.embl.de/msmb/introduction.html) by Huber & Holmes is a more comprehensive, but less accessible, guide to modern methods
- other good, free, online resources I know of for learning **bioinformatics**:
    - H3ABioNet Resources:
        - [Online Training](https://www.h3abionet.org/training)
        - [Workshops](https://www.h3abionet.org/training)
    - Simon Cockell's [Lockdown Learning Bioinformatics-along](https://www.youtube.com/playlist?list=PLzfP3sCXUnxEu5S9oXni1zmc1sjYmT1L9) videos
    - [Applied Computational Genomics](https://github.com/quinlan-lab/applied-computational-genomics) from Aaron Quinlan's Lab at University of Utah
        - for more, see "Teaching" section of http://quinlanlab.org/
    - more [here](https://bio-it.embl.de/online-learning/)
- Finally: know that, if you're spending a lot of time searching the internet for help/answers, **you're not alone**!
    - (search first, to see if your question was already asked by someone else!)
    - http://seqanswers.com/
    - http://www.biostars.org/

### 2. Is there a specific programming language preferred in the bioinformatics field?

- **Python & R** are equally popular and great places to start
    - free, open source, easy to install, huge online community, many resources to help you learn
- **Choose whichever language your friends/colleagues are already using**
    - I suspect this is the single biggest predictor of success
- Otherwise: Python is good for **image analysis** (so is **ImageJ**/Fiji, which provides a graphical interface) and is more broadly applicable/useful outside bioinformatics; R has more **cutting-edge statistical methods** because of [Bioconductor](http://bioconductor.org/)
    - if using/learning Python: check out [Biopython](https://biopython.org/)

---

### 3. How to build strong skills in a given programming language for data analysis and visualization?

- study other people's code
    - how do they do what they do?
- Python: learn numpy; pandas; matplotlib
    - Use JupyterLab or Jupyter Notebook
- R: learn Tidyverse (dplyr; readr; tidyr; purrr; ggplot2; etc.)
    - Use RStudio; work in RMarkdown
- Make it Open & Reproducible
    - https://github.com/BioinfoNet & https://bioinfonet.github.io/OpenScienceKE/
    - https://openlifesci.org/
- data analysis:
    - [Jake Vanderplas's Data Science Handbook (**Python**)](https://jakevdp.github.io/PythonDataScienceHandbook/)
    - [Hadley Wickham's **R** for Data Science](https://r4ds.had.co.nz/)
    - [Wes McKinney's **Python** for Data Analysis](https://wesmckinney.com/pages/book.html)
        - sadly not free: eBook PDF (no DRM) costs ~€34
- [Rosalind](http://rosalind.info/) programming challenges will help you to simultaneously develop programming skills and insight into bioinformatic algorithms & approaches
- for data viz: use an **interactive environment** like Jupyter or RStudio
    - makes iterating over/exploring new visualisations much more fun.

-----
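
To give a flavour of the small exercises that build both programming skills and bioinformatics insight (and of the "parsing data" skill listed under question 1), here is a minimal sketch in plain Python that reads FASTA-formatted text and reports the GC content of each record. The function names `parse_fasta` and `gc_content` and the example records are hypothetical, made up for illustration; this is not taken from any of the resources above, and for real data a dedicated library such as Biopython will cope with far messier input.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them and join later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical example records, invented for illustration only.
EXAMPLE = """>seq1 made-up example
ATGCGC
GGCC
>seq2 another made-up example
ATATATAT
"""

records = list(parse_fasta(StringIO(EXAMPLE)))
for header, sequence in records:
    # Tab-separated: header, sequence length, GC fraction to 2 decimals.
    print(f"{header}\t{len(sequence)}\t{gc_content(sequence):.2f}")
```

Pasting this into a Jupyter cell and changing the example text is an easy way to experiment; swapping `StringIO(EXAMPLE)` for an opened file should work the same way, since the parser only needs something it can iterate over line by line.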