# Virtual Meet-up: Bioinformatics Hub of Kenya initiative
**2020-06-19**
[TOC]
### 1. What advice would you give to someone interested in bioinformatics who has only the basics? What are the key skills required for someone with no prior knowledge of bioinformatics?
- key skills:
    - search, filter, extract, cross-reference **data from large databases** - make use of data & knowledge that's already out there!
    - **sequence alignment** - core concept in evolution/phylogeny, functional genomics, genome assembly, differential expression analyses, transcriptomics, metagenomics, etc.
    - **parsing data** - reading data from many different (often messy!) file formats
    - **organisation** - keep track of what/where your data is, which analyses you've run, with what parameters/settings, etc.
- start with **web-based tools** e.g. [EMBL-EBI](https://www.ebi.ac.uk/services)/[NCBI](https://www.ncbi.nlm.nih.gov/) resources
    - [EBI Train Online](https://www.ebi.ac.uk/training/online/) has a huge amount of freely accessible content introducing the **fundamental concepts & guiding users on getting started**
- once you begin working with larger amounts of data, you'll probably need to learn some **command line** computing (avoid long waits/costs of uploading data & downloading results)
    - many great resources to learn the basics, e.g. [Software Carpentry Shell](http://swcarpentry.github.io/shell-novice/) & [Data Carpentry Genomics](https://datacarpentry.org/shell-genomics/)
- The [Galaxy platform](https://galaxyproject.org/) provides a fantastic [GUI alternative](https://usegalaxy.eu/) for those unfamiliar with command line computing
    - [Learn Galaxy](https://galaxyproject.org/learn/) and [Galaxy Training Network](https://training.galaxyproject.org/) have many excellent tutorials to learn the platform **and** bioinformatics simultaneously
    - Galaxy can be [installed locally](https://galaxyproject.org/admin/get-galaxy) to avoid upload/download of data over the Internet, but this requires access to a server and some knowledge of server administration
- some understanding of **statistics** is also necessary
    - [Bernd Klaus' teaching material](https://www.huber.embl.de/users/klaus/teaching.html#statistical-methods-in-bioinformatics) is a good place to start (if you know R) & [Modern Statistics for Modern Biology](https://www.huber.embl.de/msmb/introduction.html) by Huber & Holmes is a more comprehensive, but less accessible, guide to modern methods
- other good, free, online resources I know of for learning **bioinformatics**:
    - H3ABioNet Resources:
        - [Online Training](https://www.h3abionet.org/training)
        - [Workshops](https://www.h3abionet.org/training)
    - Simon Cockell's [Lockdown Learning Bioinformatics-along](https://www.youtube.com/playlist?list=PLzfP3sCXUnxEu5S9oXni1zmc1sjYmT1L9) videos
    - [Applied Computational Genomics](https://github.com/quinlan-lab/applied-computational-genomics) from Aaron Quinlan's Lab at University of Utah
        - for more, see "Teaching" section of http://quinlanlab.org/
    - more [here](https://bio-it.embl.de/online-learning/)
- Finally: know that, if you're spending a lot of time searching the internet for help/answers, **you're not alone**!
    - (search first, to see if your question was already asked by someone else!)
    - http://seqanswers.com/
    - http://www.biostars.org/
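
To make the **parsing data** skill listed above a little more concrete, here is a minimal sketch of reading FASTA records using only the Python standard library. The function names (`read_fasta`, `gc_content`) and the example records are made up for illustration; in real projects an established parser, such as the ones Biopython provides, is usually the safer choice.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, which messy files often contain
        if line.startswith(">"):
            # A new record starts: emit the previous one, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped and mixed-case; normalise to upper
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical example data standing in for a real file on disk
example = StringIO(">seq1 hypothetical record\nATGC\natgc\n\n>seq2\nGGCC\n")
records = list(read_fasta(example))
for header, seq in records:
    print(header, len(seq), round(gc_content(seq), 2))
```

The same generator works on a real file by passing `open("my_sequences.fasta")` (a hypothetical file name) instead of the `StringIO` object.
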
### 2. Is there a specific programming language preferred in the bioinformatics field?
- **Python & R** are equally popular and great places to start - free, open source, easy to install, huge online community, many resources to help you learn
- **Choose whichever language your friends/colleagues are already using** - I suspect this is the single biggest predictor of success
- Otherwise: Python is good for **image analysis** (so is **ImageJ**/Fiji, which provides a graphical interface) and is more broadly applicable/useful outside bioinformatics, while R has more **cutting-edge statistical methods** because of [Bioconductor](http://bioconductor.org/)
- if using/learning Python: check out [Biopython](https://biopython.org/)
---
### 3. How can I build strong skills in a given programming language for data analysis and visualization?
- study other people's code - how do they do what they do?
- Python: learn numpy; pandas; matplotlib
    - Use JupyterLab or Jupyter Notebook
- R: learn the Tidyverse (dplyr; readr; tidyr; purrr; ggplot2; etc.)
    - Use RStudio; work in R Markdown
- Make it Open & Reproducible
    - https://github.com/BioinfoNet & https://bioinfonet.github.io/OpenScienceKE/
    - https://openlifesci.org/
- data analysis:
    - [Jake Vanderplas's Data Science Handbook (**Python**)](https://jakevdp.github.io/PythonDataScienceHandbook/)
    - [Hadley Wickham's **R** for Data Science](https://r4ds.had.co.nz/)
    - [Wes McKinney's **Python** for Data Analysis](https://wesmckinney.com/pages/book.html)
        - sadly not free: eBook PDF (no DRM) costs ~€34
- [Rosalind](http://rosalind.info/) programming challenges will help you to simultaneously develop programming skills and insight into bioinformatic algorithms & approaches
- for data viz: use an **interactive environment** like Jupyter or RStudio - makes iterating over/exploring new visualisations much more fun.
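
One small habit that supports the "Make it Open & Reproducible" advice above is saving the settings of every analysis run next to its results. The sketch below shows one possible way to do that with the Python standard library; the function name `make_run_record`, the tool name, the parameters and the file names are all hypothetical placeholders, not part of any particular tool.

```python
import json
import platform
from datetime import datetime, timezone


def make_run_record(tool, parameters, input_files):
    """Collect the details needed to repeat an analysis later."""
    return {
        "tool": tool,
        "parameters": parameters,
        "input_files": sorted(input_files),  # sorted for a stable record
        "python_version": platform.python_version(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


record = make_run_record(
    tool="my_aligner",  # hypothetical tool name
    parameters={"min_quality": 20, "threads": 4},  # hypothetical settings
    input_files=["sample_B.fastq", "sample_A.fastq"],  # hypothetical files
)

# Print as JSON; in practice this could be written to a file kept
# alongside the analysis output
print(json.dumps(record, indent=2))
```
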
-----