# Day 2 - Data Analysis and Visualization with R workshop
### tags: change me!
> Type on the left :arrow_left: and see the result on the right. :arrow_right:
> **For roll calls, click on the box. Check right** :arrow_right:
## :memo: What you need
- A computer/laptop
- Internet
:::info
- **Location:** Zoom
- **Date:** March 25th, 2021 8:30 AM (EAT)
- **Schedule**
08:30 AM Recap Day1
09:30 AM Manipulating, analyzing and exporting data with tidyverse
11:00 AM Break
11:15 AM Data Visualisation with ggplot2
12:45 PM Wrap-up
12:50 PM Post-Workshop Survey
01:00 PM END
- **Contact:** <bioinformaticshubofkenya@gmail.com / info@bhki.org>, or [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE) **follow us!**
- **Host:** BHKi
:::
# Roll calls
- Cynthia Awuor/jkuat/cynthiaawuor18@gmail.com
- Ronald Tonui/Moi University SOM/tonuironald@gmail.com
- Jackline Kosgei/KEMRI/jackieruto@yahoo.com/ @JacklineKosgei
- Peris Ambala/IPR/perisambal@gmail.com/ @AmbalaPeris
- Henrick Aduda/EANBIT-ICIPE/henrickkl@gmail.com/ @adudahenrick
- Janet Majanja/KEMRI/jmajanja@gmail.com
- Njuki Susan/CVL/suzannenjuki@gmail.com/ Njuki_Sue
- Grace Kioko/NMK/mwendegrace56@gmail.com /@gracianamwesh
- Sylvia Milanoi/KEMRI/sylviamilanoi@gmail.com
- Kevin Arasa/KNH/arasakev@gmail.com/ @Arasavin
- Harriet Natabona/jkuat/nattybona2012@gmail.com/@hnatabona
- Brian Polo/KEMRI/otienobrn09@gmail.com
- owilifrank/kemri/owilifrank@gmail.com/@owilifrank
- Rispah Torrorey/Moi University/torrorey@gmail.com
- Cynthia King’ori/ICIPE/cynthiakingori19@gmail.com
## Code block
- **Codes for day 2**:
```r=
getwd() #to see the directory you are in
setwd("path to data")
## https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/portal_data_joined.csv #Link to dataset
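#download data (create the data_raw folder first if it does not exist)
dir.create("data_raw", showWarnings = FALSE)
download.file(url = "https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/portal_data_joined.csv",
              destfile = "data_raw/portal_data_joined.csv")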
#load package
library(tidyverse)
#read data into R
# ===== we use the function read_csv(), to read the data and save it in an object called surveys
surveys <- read_csv("./data_raw/portal_data_joined.csv")
# look at the content of the loaded data, the first few lines
head(surveys)
# specify the first 50 rows
head(surveys, n=50)
print(surveys, n = 50)
# subset just the first 100 rows for testing computations
surveys_sample <- head(surveys, 100)
# look at the structure of a dataset with str
str(surveys)
# further inspect your data set with more functions
# Size
dim(surveys)
nrow(surveys)
ncol(surveys)
# Summary
str(surveys)
summary(surveys)
# ===== Indexing and Subsetting data frames =========================
# ===================================================================
# first row, first column (a 1 x 1 tibble, because read_csv() returns a tibble)
surveys[1, 1]
# first row, 6th column
surveys[1, 6]
# first column (still a tibble; use surveys[[1]] to get a vector)
surveys[, 1]
# first column of the data frame (as a data.frame)
surveys[1]
# first three rows of the 6th column
surveys[1:3, 6]
surveys[1:10, 7]
# the 3rd row of the data frame (as a data.frame)
surveys[3, ]
# equivalent to head_surveys <- head(surveys)
head_surveys <- surveys[1:6, ]
# you can also subset by excluding indices
surveys[, -1] # The whole data frame, except the first column
surveys[-(7:34786), ] # Equivalent to head(surveys)
# or by calling their column names
surveys["species_id"] # Result is a data.frame
surveys[, "species_id"] # Result is a tibble here (a vector for a plain data.frame)
surveys[["species_id"]] # Result is a vector
surveys$species_id # Result is a vector
# ===== Factors =====================================================
# we can convert a column to a factor using:
surveys$sex <- factor(surveys$sex)
# check that it worked
summary(surveys$sex)
# By default, R always sorts levels in alphabetical order
levels(surveys$sex) #F comes before M
# check the number of levels
nlevels(surveys$sex)
# ===== converting factors ==========================================
# a vector of levels
sex <- factor(c("male", "female", "female", "male"))
sex # current order
# reorder the levels
sex <- factor(sex, levels = c("male", "female"))
sex
# If you need to convert a factor to a character vector, you use as.character(x)
as.character(sex)
# ===== Renaming factors ============================================
# when data is stored as a factor we can plot to get a quick glance at the number of observations
plot(surveys$sex)
# but we have 1700 NA's, where sex hasn't been recorded
# to show them in the plot we can turn the missing values into a factor
# first subset the sex data
sex <- surveys$sex
levels(sex)
# add NA as level
sex <- addNA(sex)
levels(sex)
# by using indices, we can rename the 3rd element of the levels, i.e. NA, to a more informative name
levels(sex)[3] <- "undetermined"
levels(sex)
# now plotting the data again
plot(sex)
levels(sex)[1:2] <- c("female", "male")
sex <- factor(sex, levels = c("undetermined", "female", "male"))
plot(sex)
#corrected code from the challenge
animal_data <- data.frame(
animal = c("dog", "cat", "sea cucumber"),
feel = c("furry", "squishy", "spiny"),
weight = c(45, 8, 0.8))
#corrected code from the challenge
country_climate <- data.frame(
country = c("Canada", "Panama", "South Africa", "Australia"),
climate = c("cold", "hot", "temperate", "hot/temperate"),
temperature = c(10, 30, 18, 15),
northern_hemisphere = c(TRUE, TRUE, FALSE, FALSE),
has_kangaroo = c(FALSE, FALSE, FALSE, TRUE)
)
#select columns (a minus sign in front of a name drops that column)
select(surveys, -record_id, -species_id)
#select rows
filter(surveys, year != 1995)
#intermediate workflows
surveys2 <- filter(surveys, weight < 5)
surveys_sml <- select(surveys2, species_id, sex, weight)
filter(surveys, taxa == 'Bird')
#nested
factor(surveys$sex, levels = c("M","F"))
#pipe = %>%
surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)
#plots
library(ggplot2)
ggplot_dataset <- "https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/surveys_complete.csv"
download.file(ggplot_dataset, destfile = "data_raw/surveys_complete.csv")
surveys_complete <- read_csv("data_raw/surveys_complete.csv")
# scatter plot of weight against hindfoot length, one panel per genus
ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, aes(color = species_id)) +
  facet_wrap(facets = vars(genus))
# boxplot of weight per species with the raw points jittered on top
weight_plot <- ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(alpha = 0, aes(color = species_id)) +
  geom_jitter(alpha = 0.3, aes(color = species_id)) +
  labs(title = "My Plot", x = "species id")
ggsave("weight_plot.png", weight_plot, width=25, height=10)
```
## Challenge
**1. Question**
1. Based on the output of str(surveys), can you answer the following questions?
- What is the class of the object surveys?
- How many rows and how many columns are in this object?
**Answers**
- Class: `spec_tbl_df`, `tbl_df`, `tbl`, `data.frame` (a tibble, which is also a data frame)
- Number of rows: 34,786
- Number of columns: 13
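These answers can also be checked directly instead of reading them off the `str()` output. A minimal sketch, assuming a small made-up data frame (`surveys_demo`, a hypothetical stand-in with invented values) so that it runs without downloading the workshop data:

```r
# Hypothetical stand-in for `surveys`; the real object is a tibble with 34,786 rows
surveys_demo <- data.frame(
  record_id = 1:4,
  species_id = c("NL", "DM", "DM", "PF"),
  weight = c(NA, 40, 48, 7)
)

str(surveys_demo)    # compact overview: class, dimensions and column types
class(surveys_demo)  # class of the object
nrow(surveys_demo)   # number of rows
ncol(surveys_demo)   # number of columns
dim(surveys_demo)    # rows and columns together
```

On the real data, `class(surveys)` lists `data.frame` last because a tibble is a data frame with extra classes layered on top.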
**2. Question**
Challenge
1. Create a data.frame (surveys_200) containing only the data in row 200 of the surveys dataset.
2. Notice how nrow() gave you the number of rows in a data.frame?
- Use that number to pull out just that last row in the data frame.
- Compare that with what you see as the last row using tail() to make sure it's meeting expectations.
- Pull out that last row using nrow() instead of the row number.
- Create a new data frame (surveys_last) from that last row.
3. Use nrow() to extract the row that is in the middle of the data frame. Store the content of this row in an object named surveys_middle.
4. Combine nrow() with the - notation above to reproduce the behavior of head(surveys), keeping just the first through 6th rows of the surveys dataset.
**Answers**
-
-
-
-
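The answer slots for this challenge were left blank in the notes. One possible set of answers is sketched below on a small made-up data frame (`surveys_demo`, hypothetical, 20 rows), so row 5 stands in for row 200 of the real `surveys` data. The object names `surveys_5`, `n_rows` and `surveys_head` are illustrative only.

```r
# Hypothetical stand-in for `surveys` so the sketch runs on its own
surveys_demo <- data.frame(record_id = 1:20, weight = seq(10, 200, by = 10))

# 1. a single row by position (use 200 on the real data)
surveys_5 <- surveys_demo[5, ]

# 2. the last row, using nrow() instead of a hard-coded row number
n_rows <- nrow(surveys_demo)
surveys_last <- surveys_demo[n_rows, ]
tail(surveys_demo, n = 1)  # should show the same row as surveys_last

# 3. the row in the middle of the data frame
surveys_middle <- surveys_demo[n_rows / 2, ]

# 4. reproduce head() by excluding rows 7 to the end with the - notation
surveys_head <- surveys_demo[-(7:n_rows), ]
```

With the real data, the same lines apply after replacing `surveys_demo` with `surveys`.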
**3 Question**
Rename "F" and "M" to "female" and "male" respectively.
Now that we renamed the factor level NA to "undetermined", can you recreate the barplot such that "undetermined" is first (before "female")?
**Answers**
```r=
# Start from the sex column with NA added as a third level
sex <- addNA(factor(surveys$sex))
levels(sex) # "F" "M" NA
# Levels are sorted alphabetically, so "F" is level 1 and "M" is level 2
levels(sex)[1] <- "female"
levels(sex)[2] <- "male"
levels(sex)[3] <- "undetermined"
plot(sex)
# Reorder the levels so that "undetermined" comes first (before "female")
sex <- factor(sex, levels = c("undetermined", "female", "male"))
plot(sex)
# check the final order of the levels
levels(sex)
```
## Question and answer
## Notes and links
Link to ggplot dataset = "https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/surveys_complete.csv"
Post-survey link = https://carpentries.typeform.com/to/UgVdRQ?slug=2021-03-24-BHKi-Online
# Day 1 - Data Analysis and Visualization with R workshop
### tags: change me!
> Type on the left :arrow_left: and see the result on the right. :arrow_right:
> **For roll calls, click on the box. Check right** :arrow_right:
## :memo: What you need
- A computer/laptop
- Internet
:::info
- **Location:** Zoom
- **Date:** March 24th, 2021 9:00 AM (EAT)
- **Schedule**
09:00 AM: Introductions - name/affiliation/background
09:05 AM: Pre-workshop Survey (confirm if it's done)
09:15 AM: Before we Start
10:15 AM: Introduction to R
11:30 AM: Break
11:45 AM: Starting with Data
12:55 PM: Wrap-up
01:00 PM: END
- **Instructors:**
- Bernice Waweru
- Jennifer Shelton
- **Helpers**
- Festus Nyasimi
- Pauline Karega
- David Kiragu
- Margaret Wanjiku
- Michael Kofia
- **Contact:** <bioinformaticshubofkenya@gmail.com / info@bhki.org>, or [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE) **follow us!**
- **Host:** BHKi
:::
# Roll calls
**Name/Affiliation/Email/Twitter**
- ==Festus Nyasimi/BHKi/nfestus14@gmail.com/[@Festus_nyasimi](https://twitter.com/Festus_nyasimi)==
- ==David Kiragu/BHKi/davkmwaura@gmail.com /@MwauraKiragu==
- ==Michael Kofia/BHKi/landycofia@gmail.com /@CofiaLandy==
- ==Karega Pauline/UoN/karegapaul@gmail.com /@KaregaP==
- Janet Majanja/KEMRI/jmajanja@gmail.com
- Harriet Natabona/jkuat/nattybona2012@gmail.com /@hnatabona
- Irene Waita/jkuat/waitairenee@gmail.com /@irene_2019
- Henrick Aduda/ EANBIT-ICIPE/ henrickkl@gmail.com/ @adudahenrick
- Cedrick Shikoli/ Institute of Primate Research/ cshikoli@gmail.com/ @cshikoli
- Grace Kioko/National Museums of Kenya/mwendegrace56@gmail.com / @gracianamwesh
- GeorgeMusula/Kemri/eouma79@gmail.com
- Peris Ambala/IPR/perisambal@gmail.com/ @AmbalaPeris
- Frank Owili/kemri/owilifrank@gmail.com /@owilifrank
- Sylvia Milanoi/KEMRI/sylviamilanoi@gmail.com
- Kevin Arasa/KNH/arasakev@gmail.com / @Arasavin
- Brian Polo/KEMRI/otienobrn09@gmail.com /@BRIANPOLO10
- Jackline Kosgei/KEMRI/jackieruto@yahoo.com /@JacklineKosgei
- Ronald Tonui/Moi University
- Njuki Susan/CVL/suzannenjuki@gmail.com/@Njuki_sue
- Cynthia Awuor/JKUAT/cynthiaawuor18@gmail.com
- George Musula/Kemri/eouma79@gmail.com
- Cynthia King'ori/ICIPE/cynthiakingori19@gmail.com
## Code block
- All codes will be updated in this section:
```r=
#for example ...lets have fun
install.packages("ggplot2")
install.packages("tidyverse")
#to check which folder you're working in
getwd()
??mean #returns anything that has mean
?mean #returns results for just arithmetic mean.
browseVignettes() #opens up available tutorials for different packages
#creating objects in R
weight_kgs <- 55
weight_kgs
weight_kgs * 2.2
#reassigning a value for a variable
weight_kgs <- 57.5 #simply delete the value and replace with new value
#Functions and their arguments
weight_kgs <- sqrt(10)
round(3.1459, digits = 2 )
args(round)
#Vectors and Data types
weight_g <- c(50,60,65,82)
animals <- c("mouse","rat","dog")
length(weight_g) #find out how many values are in weight_g
length(animals)
class(animals) #find out the type of values in the vectors
class(weight_g)
str(animals)
num_char <- c(1, 2, 3, "a")
class(num_char)
#Subsetting vectors
animals <- c("mouse" , "rat", "dog", "cat") #create vector animals
length(animals) #check length of vector animals
animals[2] #to check the value at position 2 in a vector
animals[c(3,2)] #check the values in two different positions of a vector
#conditional subsetting
weight_g[c(TRUE,FALSE,TRUE,FALSE)] #subsetting using logical values
weight_g[weight_g > 50] #subsetting using conditionals
weight_g[weight_g > 50 & weight_g < 80] #using multiple conditions
weight_g[weight_g > 50 | weight_g == 50] #the | symbol means or
#Missing data
heights <- c(2,4,4,NA,6)
mean(heights, na.rm = TRUE)
#===========Dataset Analysis===========#
getwd() #to see where you are or in which directory you are in
setwd("data_raw") #set your working directory to data_raw folder
library(tidyverse) #load tidyverse library
surveys <- read_csv("portal_data_joined.csv") #assign your downloaded data to variable surveys
print(surveys) #print your dataset
list.files() #list the files in the working directory (now data_raw)
#======OR======#
file <- "~/data-carpentry/data_raw/portal_data_joined.csv"
surveys <- read_csv(file)
# explore directories(folders) on the file system
list.files("~/Downloads")
# path examples
getwd()
# use this to find your working directory. You can see my working directory below.
code_handout <- "/Users/jshelton/data-carpentry/code-handout.R"
file.exists(code_handout)
code_handout <- "code-handout.R"
file.exists(code_handout)
code_handout <- "~/data-carpentry/code-handout.R"
file.exists(code_handout)
code_handout <- "./code-handout.R"
file.exists(code_handout)
code_handout <- "../data-carpentry/code-handout.R"
file.exists(code_handout)
```
## Challenge section
*NB: This section will be used for exercises during the workshop*
**Question 1**
- We’ve seen that atomic vectors can be of type character, numeric (or double), integer, and logical. But what happens if we try to mix these types in a single vector?
**Answer**: *R implicitly converts them to all be the same type*
What will happen in each of these examples? (hint: use class() to check the data type of your objects):
```r=
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")
```
Why do you think it happens?
**Answers**:*Vectors can be of only one data type. R tries to convert (coerce) the content of this vector to find a "common denominator" that doesn't lose any information.*
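The coercion can be confirmed by running `class()` on each vector from the challenge. A short sketch using only base R; the comments describe what each call returns:

```r
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

class(num_char)      # "character": the numbers become "1", "2", "3"
class(num_logical)   # "numeric": TRUE becomes 1
class(char_logical)  # "character": TRUE becomes "TRUE"
class(tricky)        # "character": "4" is a string, so the numbers are converted too

# converting back is possible when every element looks like a number
as.numeric(tricky)
```

R picks the first type in the order logical, integer, numeric, character that can hold every element, which is why mixing in a single string turns the whole vector into characters.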
**Missing data challenge**
1. Using this vector of heights in inches, create a new vector, heights_no_na, with the NAs removed.
```r=
heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)
```
2. Use the function median() to calculate the median of the heights vector.
3. Use R to figure out how many people in the set are taller than 67 inches.
**Answers**
1. `heights_no_na <- heights[!is.na(heights)]`, which gives c(63, 69, 60, 65, 68, 61, 70, 61, 59, 64, 69, 63, 63, 72, 65, 64, 70, 63, 65)
2. 64
3. 6
4. Mean = 64.94737, median = 64, and 6 heights are greater than 67.
## Question and answer
```r=
#Question one
heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)
mean(heights, na.rm = TRUE)
# Answer: 64.94737
```
```r=
#Question two
median(heights, na.rm = TRUE)
# Answer: 64
```
```r=
#Question three
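heights_no_na <- heights[!is.na(heights)] #remove the NAs without overwriting heights
heights_revised <- heights_no_na[heights_no_na > 67]
length(heights_revised) #count the values greater than 67
# Answer: 6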
```
## Challenge
Using pipes, subset the surveys data to include animals collected before 1995 and retain only the columns year, sex, and weight.
```r=
surveys %>%
  filter(year < 1995) %>%
  select(year, sex, weight)
```
```r=
# plots
library(ggplot2)
library(readr) # provides read_csv()
ggplot_dataset <- "https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/surveys_complete.csv"
download.file(ggplot_dataset,
destfile="data_raw/surveys_complete.csv")
surveys_complete <- read_csv("data_raw/surveys_complete.csv")
ggplot(data=surveys_complete,
mapping=aes(x=weight,
y=hindfoot_length)) +
geom_point(alpha=0.1, aes(color=species_id)) +
facet_wrap(facets = vars(genus))
# store the plot in weight_plot so that ggsave() below can find it
weight_plot <- ggplot(data=surveys_complete,
mapping=aes(x=species_id, y=weight)) +
geom_boxplot(alpha=0, aes(color=species_id)) +
geom_jitter(alpha = 0.3, aes(color=species_id)) +
labs(title = "My plot",
x="species id")
ggsave("weight_plot.png", weight_plot, width=15, height =10)
# basic R syntax:
# function(arg1="blah", arg2=14)
# ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
# <GEOM_FUNCTION>()
```
## Notes and links
http://rosalind.info/problems/locations/
http://rosalind.info/problems/list-view/?location=python-village
https://rmarkdown.rstudio.com/lesson-1.html
R for data science https://r4ds.had.co.nz/
# Pre-event meeting - R workshop 22^nd^ March 1500 EAT (Participants + Helpers)
### tags: change me!
> Type on the left :arrow_left: and see the result on the right. :arrow_right:
> **For roll calls, click on the box. Check right** :arrow_right:
## :memo: What you need
- A computer/laptop
- R installed
- Internet
## Roll calls
**Helpers**
- [x] Festus Nyasimi
- [x] Karega Pauline
- [x] David Kiragu
- [x] Margret Wanjiku
- [x] Michael Kofia
**Participants**
- [x] Rispah Torrorey
- [x] Ronald Tonui
- [x] Janet Majanja
- [x] Cynthia King'ori
- [x] Irene
- [x] Grace Kioko
- [ ] KELVIN PAUL ARASA
- [x] Elius Mbogori
- [x] Henrick Aduda
- [x] Peris Auma Ambala
- [x] Cedrick Shikoli
- [x] SUSAN WATITWA
- [x] Cynthia Awuor Odhiambo
- [x] Harriet Natabona
- [x] Susan Njuki
- [x] George Musula
- [x] Sylvia Milanoi
- [x] Frank Erastus Owili
- [x] Justo Ochung'
- [x] Brian Polo
- [ ] JACKLINE KOSKEI
## **Agenda**
1. Introduction - check in ritual (name + institution)
2. What is HackMD?
3. Check List
> a. Checking participants' connection works, catching any audio/visual/bandwidth problems early. Every member will test their bandwidth (www.fast.com).
>
> b. Learn about non-verbal feedback.
>
> ![permalink setting demo](https://assets.zoom.us/images/en-us/desktop/generic/in-meeting/participants-list-status-icons.png)
>
>
> c. Check for packages installed for the session. Make sure you already have R installed on your computer. We will demonstrate how to install all the packages needed in the workshop and troubleshoot any installation/setup problems.
>
> **Packages to be installed**
> - [x] tidyverse
> - [x] ggplot2
>
> d. Check that participants can share their screens and introduce breakout rooms to participants.
>
> e. Check that participants have downloaded datasets to be used.
> Download: https://ndownloader.figshare.com/files/2292169.
>
> f. Discuss with attendees if they would be comfortable being recorded. Note that the recordings will be used only by the attendees and for a period of time.
4. Q&A
5. AOB
- (comments)
### Notes
- Windows + ++ ++++
- Mac book + +
- Linux
-
**If you don't have R**
Install R first (https://cran.r-project.org/), then RStudio:
https://rstudio.com/products/rstudio/download/
:rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket:
# Pre-event rehearsal meeting - 22^nd^ March 1200 EAT (Instructors + Helpers)
### tags: change me!
> Type on the left :arrow_left: and see the result on the right. :arrow_right:
> **For roll calls, click on the box. Check right** :arrow_right:
## Roll calls
**Instructors**
- [x] Jennifer Shelton
- [x] Bernice Waweru
**Helpers**
- [x] Festus Nyasimi
- [x] Karega Pauline
- [x] David Kiragu
- [ ] Margret Wanjiku
- [x] Michael Kofia
## Agenda
1. Check-in ritual / gratitude ritual / how was your weekend?
2. What to discuss
> a. Scheduled time vs content delivered + who teaches what (*instructors*)
> **Workshop schedule:** https://bit.ly/3c5w2FW
> **Lessons:**
>
> b. Roles of the helpers (*According to The Carpentries guidelines*)
>
> - **Technical**: responsible for watching for learners reporting problems in the chat and providing assistance. Optionally, depending on instructor preference, they may facilitate question and answer sessions if the instructor needs a break or loses connection, or step in routinely to smooth transitions.
> - **Facilitator**: responsible for monitoring the room to mute learners as needed (requires host or co-host status on Zoom), watching for learner questions across platforms. This role may include oversight and triage, assigning help requests to specific helpers and elevating issues to the Instructor’s attention as needed.
> - **Breakout manager**: uses host status on Zoom to create and assign breakout rooms as needed.
> - **Document manager**: if you are using a collaborative notes document or keeping a command log, consider assigning a helper to keep this up to date.
>
> c. Discuss tips and tricks to make the workshop run smoothly (non-verbal feedback, collaborative documents, breakout rooms, exercises, etc.)
3. AOB
- (comments)
### Notes
- Number of participants expected 20 (so far).
- Question: when to stop registration - Modify registration form to allow for next workshop
- http://rosalind.info/problems/locations/
- https://docs.carpentries.org/topic_folders/hosts_instructors/hosts_instructors_checklist.html
- https://carpentries.org/online-workshop-recommendations/
- https://carpentries.org/workshop_faq/#online-workshops
- https://carpentries.org/blog/2020/05/centrally-organised-workshop-learnings/
BHKi in collaboration with Pine Biotech November Virtual Meet-up
===
:::info
- **Location:** Zoom
- **Date:** November 5th, 2020 4 PM (EAT)
- **Presenters:**
- Dr. Mohit Mazumder - Global Business Development
Bioinformatics Education & Research | Pine Biotech
- Dr. Harpeet Kaur
- **Contact:** <bioinformaticshubofkenya@gmail.com>, [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE)
- **Host:** BHKi
:::
Roll call
===
Festus Nyasimi / BHKi / @Festus_nyasimi
Michael Landi/BHKi/
Martha Luka/ Pwani University
Njuki Susan/ International Livestock Research Institute/
Peninah Wairagu/Technical University of Kenya/@pwairagu
Elizabeth Alfaro-Espinoza/ Federal University of Viçosa / @ealfaroe_drafts
Angela Muraya / BHKi / JKUAT / @angelmuraya
Sarah Nyanchera Nyakeri/JKUAT/@SarahNyancheraN
Rissy Makokha/ Chinhoyi University of Technology/ @rissymakokha
Silviane Miruka/ Center for Therapeutic Research Sciogenogenences/m_silviane
Kennedy Mwangi/ JKUAT / @wanjauk1
Winfred Gatua/ Pwani University & ICIPE /@gatuaprof
Dr. Habiba I. Atta/Ahmadu Bello University, Zaria, Nigeria/
Virginiah
Dr. Chiranjeevi Pasala,Ph.D Bioinformatics, SVIMS University,Tirupati, INDIA, chiranjeevipasala099@gmai.il
Agnes Maina/Jkuat/agnesmwangui@gmail.com
Auleria Ajiambo/JKUAT/aajiambo@gmail.com
Fredrick kebaso /BHK /icipe/@fredrickkebaso
Tracey Calvert-Joshua / SANBI (UWC), South Africa / @TCalvertJoshua
Kimutai Rogers / Kenyatta University /kimutairo@gmail.com
OGANYA DEBORAH/ogenyideborah@gmail.com
Billiah Bwana/ University of Embu/ @kemuntoBil
Okeyo Allan | Pwani University | @5_Allan
Questions
===
-
BHKi September Virtual Meet-up
===
:::info
- **Location:** Zoom
- **Date:** September 24th, 2020 10:30 AM (EAT)
- **Agenda**
10:30 am - 11:00 am : Keynote speech (Motivational letter and CV writing)
11:00 am - 11:20 am : Applying for conferences
11:20 am - 11:40 am : Technical Interviews
11:40am - 12:15 pm: Q&A
- **Presenters:**
- Caleb Kibet - *ICIPE*
- Verena Ras - *H3ABioNet*
- Jean-Baka Domelevo Entfellner - *BecA-ILRI*
- **Contact:** <bioinformaticshubofkenya@gmail.com>, [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE)
- **Host:** BHKi
:::
Felix Maingi/JKUAT/@felixsenior001
Margaret Chifwete/ICIPE/@moseleychichi
Festus Nyasimi/BHKi/@Festus_nyasimi
Winfred Gatua/ Pwani University/@gatuaprof
Eneza Mjema/ Pwani University/ @ene_yoel
Evans Mudibo/ Pwani University/ @mudibo_evans
Pranavathiyani G/BIC,PU/@pranavathiyani
Brenda Muthoni/ Pwani University/ @brenda_muthoni
Victor Sewe/KEMRI-CGHR,KISUMU/@SEWEVICTOR
Karega Pauline/ BHKi/ @Karegap
Peter Gichuki/ UON CEBIB/ @chukiptah
Verena Ras / H3ABioNet / @RasVerena E:verena.ras@uct.ac.za
Michael Kofia / BHKi / @CofiaLandy
Samson Mghanga/Tunde Investmensts/@SamsonMghanga
Jean-Baka Domelevo Entfellner / BecA-ILRI Hub / @JeanBakaDE
David Kiragu/BHKi/@MwauraKiragu
Fredrick kebaso /kenyatta university/@innocentkebaso
Faith Agnes Njeri/JKUAT/@aggierugami
Rose Wambui/JKUAT/@Rosegatheru
Jane Njeri / Pwani University / @NjeriAquila
Ndigezza Livingstone / Makerere University/ @ndigezzaliving
Boaz Wadugu/KCRI-Biotechnology laboratory/@waduguboaz
Caleb Kibet / ICIPE / @calkibet
Njuki Suzanne / ILRI/ @ Njuki_Sue
Margaret Wanjiku / BHKi / @meg_wanjiku
Muturi Njokah/KEMRI/@muturinjokah
Joseph Atemia /Pwani Uni & icipe / @MulamaJoe
Evalyne Wambui/icipe/@Samanthabobo_
Charles Kamonde Mwangi/ JKUAT/ @Kamonde_1
Chelsea Wairimu Gichuhi /JKUAT/ chelseagichuhi@gmail.com
Pauline King'ori/Pwani University/@paulah_kings
Hildah Njoroge/ BHKi / @wacuka_H
Stephen Tavasi/ Masinde Muliro University/ @zevon44
Justus kyalo kasivalu/cvl-kabete/kyalo988@gmail.com
Gershom Mbwambo/ Kilimanjaro Clinical Research Institute (KCRI)/ @gershommbwambo
Useful links
---
## Questions and Answers
Caleb
* **Which is the ideal CV or resume formatting?**
- That is mostly up to you, but some great examples include [Europass](https://europa.eu/europass/en/create-europass-cv). See also tips available [here](https://www.jobsinscience.com/info/cv.asp) and [here](https://www.thebalancecareers.com/academic-curriculum-vitae-example-2060817).
* **While looking for post-graduate positions, what approach is best when writing to a professor?**
* Be very intentional and specific. If you can get an introduction from someone who knows them, even better.
* If you have to cold email, choose a catchy, informative subject line that tells them at a glance why you are applying.
* Be brief, and be clear about your ask. They should be able to respond in a short time.
* **What opportunities are there to learn bioinformatics before one finally enrolls for an MSc?**
* Internships
* Online courses
* Self-learning
* **How do I condense my 6-page CV into an eye-catching 2-page one?**
* That is a resume. Only include the most important information.
* Change the format to a columnar or tabular one, e.g.
* ![](https://th.bing.com/th/id/OIP.7vljcwO54AOurnUEAX0NQwHaKe?pid=Api&rs=1)
Verena
* **What is the status of bioinformatics careers in Africa?**
Thanks to major consortia like H3Africa and H3ABioNet, a number of strong bioinformatics groups have been established across Africa, so more and more opportunities are becoming available. Bioinformatics is a growing field and people with these skills are increasingly in demand, so I expect the number of opportunities across Africa to keep increasing in the coming years.
* **How should one be conscious of their digital footprint? Does it impact selection?**
Your digital footprint is extremely important, and becoming more so. Whether it affects your chances of being hired depends on the company or organisation. As a bioinformatician it is definitely a good idea to start a GitHub account to display your code and projects. Many companies now also request your Twitter, Facebook and LinkedIn handles to check what kind of content you post online. Be extremely mindful of what you put online.
* **In the recent past, technological advancement has made some careers obsolete. Can bioinformatics withstand this pitfall as a career?**
The technology is constantly improving and shifting, so I believe new opportunities will always become available as it improves. You will need to keep abreast of all the updates and advancements, which requires constant learning throughout. This is however a pertinent point and something that must be considered in terms of your career progression and future plans.
JB
* **What's the link between data science and bioinformatics?**
Data science is a broad field that involves managing and analyzing large amounts of data, whereas bioinformatics is specific to biological data and its manipulation.
* **Is bioinformatics dynamic? What makes it dynamic (or static, if you prefer this)? How does it (the field) do that (change over time)?**
It is a dynamic field driven by changes in technology, changes in biotechnology (for instance the recent emergence of gene editing), and changes in computational capabilities (computational power, etc.).
* **How do I kick start a bioinformatics career with no work experience and how is the job market for bioinformatics in Kenya?**
If you are not currently in a position to get a bioinformatics job, contribute code to projects, for example by raising issues on existing code. Contribute to Stack Overflow. Kenya has a competitive edge with regards to attracting funding. Generally, there is a market for bioinformatics in Kenya.
=================================================
BHKi July Virtual Meet-up
===
###### tags: `Galaxy` `Open source bioinformatics` `reproducible research`
:::info
- **Location:** Zoom
- **Date:** July 30th, 2020 10:30 AM (EAT)
- **Agenda**
10:30 am - 10:45 am : Brief introduction to the Galaxy platform
10:45 am - 11:45 am : Galaxy tutorial 101
11:45 am - 12:15 pm : Break out session
12:15 pm - 12:30 pm : Q&A
- **Presenters:**
- Peter van Heusden - *Galaxy/SANBI*
- Tracey Calvert-Joshua - *SANBI*
- Kamohelo (Kamo) Direko - *SANBI*
- Susan Alicia Fernol - *SANBI*
- **Contact:** <bioinformaticshubofkenya@gmail.com>, [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE)
- **Host:** BHKi
:::
## Name/ Institution/ Twitter handle
Festus Nyasimi/ BHKi/[@Festus_nyasimi](https://twitter.com/Festus_nyasimi)
Michael Landi / BHKi/ @CofiaLandy
Margaret Wanjiku / BHKi / @meg_wanjiku
Rissy Makokha Wesonga/BHKi
Hellen Kariuki -UON
Andrew Mwangila/BHKi/@andrewmwa
Pranavathiyani G/BIC,PU/@pranavathiyani
Karega Pauline/ BHKi/@KaregaP
Collins Kigen/ *icipe*/@collinskigen
Tawich simon/icipe/@Tawich_kiplimo
lucas muiruri/ICRAF/@lucasmuiruri
Kennedy Mwangi / JKUAT/ @wanjauk1
Simeon Hebrew / JKUAT/ @HebrewSimeon
Arnold Lambisia /KWTRP/@Arnold_Sn
Stephen Tavasi/MMUST/@zevon44
John Gitau/UON/@Gitau_JohnK
Irene Waita/JKUAT/@irene_2019
Diana Kinyua/EGERTON/@DianaKinyua15
Joseph Mulama / PU & ICIPE / @MulamaJoe
Muturi Njokah/KEMRI/
Rose Wambui /JKUAT
Peter Gichuki /UON /@chukiptah
Harriet Natabona /JKUAT/ @hnatabona
Amayo Mordecai/ UON/@amayo_mordecai
Beatriz Serrano-Solano /EMBL & Galaxy EU / @Birthae
Irene karegi/JKUAT
Irene Mkavi/PAUSTI/@okoko_mkavi
Daniel OTRON/UFHB/@OtronDaniel
Samuel Oduor/BHKi/@Sam__Odi
Stephen/JKUAT
Brenda Muthoni/KEMRI/ @brenda_muthoni
Sumaya Kambal/NUBRI
Parwos_Abraham/UON&JKUAT/@parwosabraham
Bwanya Brian/icipe/@bwanya_brian
Ambutsi Mike/ MMUST/ @Ambutsi2
Gershom Mbwambo/KILIMANJARO CLINICAL RESEARCH INSTITUTE (KCRI)/ @gershommbwambo
Stephen Okeyo/CDC-KEMRI/CEBIB/@stephenokeyo65
Boaz Wadugu/KCRI-Biotechnology laboratory/@waduguboaz
Rogers Kimutai/KU
---
Useful links
---
Join slack https://join.slack.com/t/bhki/shared_invite/zt-gao8p7vl-W63ySPkw6cHB0lsiBdTyfQ
Tutorial for interpreting fastqc report/results: https://dnacore.missouri.edu/PDF/FastQC_Manual.pdf
fastq https://zenodo.org/record/582600/files/mutant_R1.fastq
Variant analysis https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.html
Tutorials https://galaxyproject.org/learn/#tutorials-by-galaxy-training-network
## Questions and Answers
1) **Is there a variant calling pipeline in the galaxy and how can one go about it?**
Yes. Here are some tutorials on variant calling of different types: https://training.galaxyproject.org/training-material/topics/variant-analysis/
2) **Are there measures put in place to ensure data privacy for the galaxy platform users?**
Each Galaxy server is different but here are the terms of service for usegalaxy.eu: https://galaxyproject.eu/gdpr/
This is what they say:
*"Use of Service.
The European Galaxy Sites and UseGalaxy Site are a free, public, Internet accessible resource (the “Service”). Data transfer is encrypted unless you choose to use unencrypted FTP access. Data storage is not encrypted. If there are restrictions on the way your research data can be stored and used, please consult your local institutional review board or the project principal investigator before uploading it to any public site, including this Service. If you have protected data, large data storage requirements, or short deadlines you are encouraged to set up your own local Galaxy instance and not use this Service. Your access to the service may be revoked at any time for reasons deemed necessary by the operators of the Service.
You acknowledge that you are responsible for compliance of all of your data processing activities carried out on the Galaxy Service with applicable laws and regulations of the Federal Republic of Germany, the European Union as well as any laws or regulations of other legislations or any other restrictions that might be applicable due to the provenance, intended use, legal ownership of or any licensing or other legal restrictions imposed on the data being processed.
You are strictly prohibited from making use of the Galaxy Service for the storage or processing of any Personal Data or Sensitive Personal Data as defined by the European Union’s General Data Protection Regulation (GDPR), including but not limited to potentially personally identifying medical data."*
I.e. while usegalaxy.eu is investigating becoming GDPR compliant, they are not currently compliant with that legislation.
The usegalaxy.org server has similar wording:
"This is a free, public, internet accessible resource. Data transfer and data storage are not encrypted. If there are restrictions on the way your research data can be stored and used, please consult your local institutional review board or the project PI before uploading it to any public site, including this Galaxy server. If you have protected data, large data storage requirements, or short deadlines you are encouraged to setup your own local Galaxy instance or run Galaxy on the cloud."
3) **Are there tools for Tajima’s D calculations on the galaxy platform and if yes, how can one go about it?**
This is not supported on usegalaxy.eu (yet) but there is a tool for this in the Galaxy toolshed based on vcftools SlidingWindow. That tool (written by Alexis Dereeper) is available on crop oriented Galaxy servers like SouthGreen Galaxy (https://galaxy.southgreen.fr/galaxy) and on the EiB (Excellence in Breeding) Galaxy: http://galaxy-demo.excellenceinbreeding.org/ . While there was a tutorial on Galaxy for Crops at the recent BCC2020 conference, this training material is not yet available on the GTN training hub (training.galaxyproject.org). The SlidingWindow tool should be straightforward to use if you have variants in a VCF file - it simply takes VCF input and produces output including Tajima's D.
4) **Is there an option available that would allow one to use galaxy tools on the command line or on the HPC environment?**
Not directly. However, Galaxy tools are generally wrappers around command line tools, so the underlying tool can usually be installed and run on the command line or on an HPC cluster.
5) **Main applications of Galaxy pipeline in metagenomics analysis?**
If by "metagenomic" you mean 16S amplicon data, there are tutorials on that using mothur: https://training.galaxyproject.org/training-material/topics/metagenomics/ - dada2 is also supported as a tool in Galaxy
For shotgun metagenomic data, metaphlan is available on some Galaxy servers (e.g. usegalaxy.eu)
6) **How does the galaxy platform relate to synthetic bio?**
Googling "galaxy synthetic biology" reveals that there is a project to make synthetic biology tools available on the Galaxy platform. More can be found out here: http://www.jfaulon.com/galaxy-synbiocad-portal/
As I personally don't know much (anything?) about synthetic biology I cannot evaluate the tools.
7) **What are the strategies that are important to the development and success of open-source bioinformatics tools?**
Torsten Seemann's "Ten recommendations for creating usable bioinformatics command line software" is a useful guide. Also, the Bionitio project aims to provide "templates" for creating command line bioinformatics tools: https://academic.oup.com/gigascience/article/8/9/giz109/5572530 . After you have created a tool that you think other people will find useful, you want to make it easy to install. Conda has become the de facto standard for installing bioinformatics tools. Here is a guide to creating a conda recipe for your tool: https://docs.conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs.html - and the Bioconda project is a good place for hosting such recipes: https://bioconda.github.io/ . Finally, Galaxy is a great way to make your tool accessible to people who do not use the command line. The GTN has some tutorials on writing Galaxy tool wrappers and packaging your tool for use in Galaxy: https://training.galaxyproject.org/training-material/topics/dev/
8) **What are some other open-source bioinformatics tools, what are their advantages and which ones are safe to analyze large datasets of genomic data?**
There are thousands of open-source bioinformatics tools - this question is difficult to answer. The majority of bioinformatics publications are based on use of open source tools.
9) **What are the most reliable open-source bioinformatics tools for RNA modelling?**
I cannot speak to reliability, especially as I don't know which aspect of RNA modelling is meant, but the Galaxy RNA Workbench (https://github.com/bgruening/galaxy-rna-workbench#training) is a project of the same group that hosts the usegalaxy.eu server and includes a number of RNA-oriented tools. These are available on usegalaxy.eu.
10) **What are the minimum computer specifications to run open-source bioinformatics tools?**
This depends on which tool is being asked about.
BHKi June Virtual Meet-up
===
:::info
- **Topic:** Core competencies in Bioinformatics
- **Location:** Zoom
- **Date:** June 19th, 2020 10:30 PM (EAT)
- **Program**
A brief introduction by the facilitator - 5 min
Presentation by Amel - 30 min
Questions session - 15 min
Live polls - 4min (Three questions)
Networking in breakout rooms (3 members each) - 10 min (Optional)
- **Presenters:**
- Amel Ghouila
- Yo Yehudi
- Malvika Sharan
- Caleb Kibet
- Toby Hodges
- **Contact:** <bioinformaticshubofkenya@gmail.com>, [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE)
- **Host:** BHKi
:::
## Participants roll call:
Please write your name and email address:
Festus Nyasimi - BHK
Gilbert Kibet-Rono - kibet.gilbert@ymail.com
Ronald Tonui - tonuironald@gmail.com
Michael Landi - BHKi
Lucy Njoki - lucynjokinjuki@gmail.com
Edwin Njuguna- eddyynjuguna@gmail.com
Yo Yehudi - yo@openlifesci.org
Karega Pauline - BHKi
Mugoya Trevor - greena.mugoya@gmail.com
John Oketch -oketchjohn9@gmail.com
Kibogo Phinehas - aramaphinehas@yahoo.com
Toby Hodges / tbyhdgs@gmail.com
Peter Gichuki - pgichuki1@gmail.com
Stephen Kanyerezi - kanyerezi30@gmail.com
Harriet Natabona - nattybona2012@gmail.com
Taremwa Yoweri-tyoweri@gmail.com
Japheth Kipkulei- jkipkulei@gmail.com
David Kiragu- BHKi
Caleb Kibet
cecilia katunge email:
justus kyalo kasivalu, cvl kabete, kyalo988@gmail.com
Jane Njaramba-njanekagure@gmail.com
Kiberu Davis: kiberu.i.davis@gmail.com
Martha Luka : mawia.martha@gmail.com
Kennedy Mwangi: wanjaukm@gmail.com
Stephen Njuguna: sephoh.njuguna@gmail.com
Evalyne wambui -samanthaeva98@gmail.com
Simeon Hebrew - simeonhebrew@gmail.com
Shamim Osata-shamimosata@gmail.com
Stella Esther Nabirye - stellanabirye@gmail.com
Yves Hermandez Tchiechoua - yvestchiechoua@yahoo.fr
Brenda Makena - brenda.mugambi@yahoo.com
Armel Tangomo Ngnintedem - tangomoarmel25@gmail.com
Erick Nyaga - ericknyaga21@gmail.com
Suzanne Njuki-Suzannenjuki@gmail.com
Samuel Oduor - samordil@gmail.com
Jacqueline Waeni - jacqwaeni@gmail.com
Edna Wanjiru - ednamacharia@gmail.com
Peter van Heusden - SANBI, UWC, South Africa
Shahiid Kiyaga-ashakykiyaga91@gmail.com
Emmanuel J. Mande - emande@idi.co.ug
Parwos Abraham -parwosabraham@yahoo.com
**Question phase 1:** These are questions submitted by participants prior to the meeting.
Amel
1. Application of bioinformatics in the fight against COVID-19, particularly in coming up with vaccines against SARS-CoV-2 and other coronaviruses
2. What are the ethics in bioinformatics and its practices?
Ethics in Bioinformatics - There is a need to understand where data comes from and whether consent was given for its use. When designing tools, there is also a need to understand where and for what those tools will be used.
Inclusivity in the design of tools, in order to serve communities better, is also key.
Applications in the fight against COVID-19 - Useful in helping understand the data and how to interpret it.
It is also important to know how to communicate results.
How can we increase bioinformatics awareness among scientists trained in pre-bioinformatics era and help them recognize bioinformatics potential in their studies? (from chat)
Malvika and Yo
1. What is open science? What is reproducible research? Why should we care? Is open distance learning a form of open educational resource?
Practice of sharing science—results, methods, code.
Crediting people for their output
Citizen participation
Reproducibility: redo, verify, use a different technique and get the same results.
Be thoughtful: share what you can, and don't share when you shouldn't. Be as open as possible and as closed as necessary.
Open distance learning: it counts as an open educational resource only when the materials are shared after delivery and others are allowed to reuse them. Share all supporting resources.
2. Is open science attainable in all fields of biological research and in underdeveloped research settings?
Note: This will also provide an opportunity to talk about OLS-2 and how interested individuals can apply.
It is attainable overall. Some studies may be limited in how much data or which techniques can be shared (always check the embargoes attached to your study).
OLS - https://openlifesci.org/ - applications open until June 30 2020, Webinar on the 23rd. (Visit openlifesci.org for joining details!)
For OLS questions please contact team@openlifesci.org, yo@openlifesci.org, or on twitter: @openlifesci, @yoyehudi. Very happy to answer any questions after the call!
Toby
1. What is the advice to an individual interested in bioinformatics who has basics only? The key basic skills required for someone with no prior knowledge on bioinformatics?
key skills:
search, filter, extract, cross-reference data from large databases - make use of data & knowledge that’s already out there!
sequence alignment - core concept in evolution/phylogeny, functional genomics, genome assembly, differential expression analyses, transcriptomics, metagenomics, etc etc etc
parsing data - reading data from many different (often messy!) file formats
organisation - keep track of what/where your data is, which analyses you’ve run, with what parameters/settings, etc
start with web-based tools e.g. EMBL-EBI (https://www.ebi.ac.uk/services)/NCBI (https://www.ncbi.nlm.nih.gov/) resources
EBI Train Online (https://www.ebi.ac.uk/training/online/) has a huge amount of freely-accessible content introducing the fundamental concepts & guiding users on getting started
once you begin working with larger amounts of data, you’ll probably need to learn some command line computing (avoid long waits/costs of uploading data & downloading results)
many great resources to learn the basics, e.g. Software Carpentry Shell (http://swcarpentry.github.io/shell-novice/) & Data Carpentry Genomics (https://datacarpentry.org/shell-genomics/) lessons
practice command line in your browser: https://cli-boot.camp
The Galaxy platform (https://galaxyproject.org/) provides a fantastic GUI alternative (https://usegalaxy.eu/) for those unfamiliar with command line computing
Learn Galaxy (https://galaxyproject.org/learn/) and Galaxy Training Network (https://training.galaxyproject.org/) have many excellent tutorials to learn the platform and bioinformatics simultaneously
Galaxy can be installed locally (https://galaxyproject.org/admin/get-galaxy) to avoid upload/download of data over the Internet, but this requires access to an available server and some knowledge of server administration
some understanding of statistics is also necessary
Bernd Klaus’ teaching material (https://www.huber.embl.de/users/klaus/teaching.html#statistical-methods-in-bioinformatics) is a good place to start (if you know R) & Modern Statistics for Modern Biology (https://www.huber.embl.de/msmb/introduction.html) by Huber & Holmes is a more comprehensive, but less accessible, guide to modern methods
other good, free, online resources I know of for learning bioinformatics:
H3ABioNet Resources:
Online Training (https://www.h3abionet.org/training)
Workshops (https://www.h3abionet.org/training)
Simon Cockell’s Lockdown Learning Bioinformatics-along (https://www.youtube.com/playlist?list=PLzfP3sCXUnxEu5S9oXni1zmc1sjYmT1L9) videos
Applied Computational Genomics (https://github.com/quinlan-lab/applied-computational-genomics) from Aaron Quinlan’s Lab at University of Utah
for more, see “Teaching” section of http://quinlanlab.org/
more at https://bio-it.embl.de/online-learning/
Prof David Tabb's lectures on bioinformatics and proteomics:
https://pickingupthetabb.wordpress.com/building-a-bioinformaticist/free-online-training-in-bioinformatics-and-biostatistics/
Materials from the 2020 SANBI Bioinformatics Course
http://biocourse.wp.sanbi.ac.za/?doing_wp_cron=1592560734.8887441158294677734375
Finally: know that, if you’re spending a lot of time searching the internet for help/answers, you’re not alone!
(search first, to see if your question was already asked by someone else!)
http://seqanswers.com/
http://www.biostars.org/
Twitter - if you follow the right people - is very useful for staying up to date with the field
does anyone have Twitter lists that they recommend?
rather more noisy, but also has a Slack: https://www.reddit.com/r/bioinformatics/
for parasitologists (with some e-resources): https://twitter.com/parasiteslack?lang=en
which also has an associated Slack
I’m sure many on this call can recommend other great resources! Please add links below
Peter van Heusden: Galaxy track at BCC2020 https://bcc2020.github.io/
Yo: +1 to this - there's also a lot of great low-cost virtual bioinformatics training available at BCC2020 too
2. Is there a specific programming language preferred in the bioinformatics field?
Python & R are equally popular and great places to start - free, open source, easy to install, huge online community, many resources to help you learn
Choose whichever language your friends/colleagues are already using - I suspect this is the single biggest predictor of success
Otherwise: Python is good for image analysis (so is ImageJ/Fiji, which provides a graphical interface) and is more broadly applicable outside bioinformatics; R has more cutting-edge statistical methods because of Bioconductor (http://bioconductor.org/)
if using/learning Python: check out Biopython (https://biopython.org/)
3. How to build strong skills in a given programming language for data analysis and visualization?
study other people’s code - how do they do what they do?
Python: learn numpy; pandas; matplotlib
Use JupyterLab or Jupyter Notebook
To install JupyterLab etc: https://www.anaconda.com/products/individual
R: learn Tidyverse (dplyr; readr; tidyr; purrr; ggplot2; etc)
Use RStudio; work in R Markdown
Make it Open & Reproducible
https://github.com/BioinfoNet & https://bioinfonet.github.io/OpenScienceKE/
https://openlifesci.org/ (OLS-2 applications now open!)
data analysis:
Jake Vanderplas’s Data Science Handbook (Python) (https://jakevdp.github.io/PythonDataScienceHandbook/)
Hadley Wickham’s R for Data Science (https://r4ds.had.co.nz/)
Wes McKinney’s Python for Data Analysis (https://wesmckinney.com/pages/book.html)
sadly not free: eBook PDF (no DRM) costs €~34
Python for Biologists - older edition free (http://userpages.fu-berlin.de/digga/p4b.pdf), newer edition not (https://pythonforbiologists.com/)
for installing command line software locally on Mac or Linux, the conda project (from Anaconda, listed above) and bioconda are good: https://bioconda.github.io/
Rosalind (http://rosalind.info/) provides programming challenges that will help you to simultaneously develop programming skills and insight into bioinformatic algorithms & approaches
for data viz: use an interactive environment like Jupyter or RStudio - makes iterating over/exploring new visualisations much more fun.
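As a small illustration of the Tidyverse route mentioned above, here is a minimal sketch of a dplyr pipeline. It assumes the dplyr package is installed, and it uses the `iris` dataset that ships with R purely as stand-in data; the dataset and the column choices are not from the session.

```r
# Assumes dplyr is installed: install.packages("dplyr")
library(dplyr)

# iris ships with base R, so no download is needed (illustrative data only)
species_summary <- iris %>%
  filter(Sepal.Length > 5) %>%              # keep rows with sepal length above 5
  group_by(Species) %>%                     # one group per species
  summarise(
    n = n(),                                # rows remaining in each group
    mean_petal_length = mean(Petal.Length)  # average petal length per group
  ) %>%
  arrange(desc(mean_petal_length))          # largest mean first

print(species_summary)
```

Running this line by line in RStudio, and changing one verb at a time, is a quick way to see what each step does to the table.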
I’m sure many on this call can recommend other great resources! Please add links below
Caleb
1. What are legal frameworks around bioinformatics and e-health in Kenya?
- There is no specific regulation that affects bioinformatics, but the recent Personal Data Protection Act would affect those dealing with personal genomic data. For e-health, the Kenya Health Policy Framework is still being used, although a conversation about regulating e-health is ongoing.
2. I am enthusiastic about delving into bioinformatics although my background is epidemiology and biostatistics, how do I transition to more bioinformatics work?
Toby has provided excellent answers to this question. Specifically, seek to build your molecular biology and genomics skills. As Amel mentioned, choose a path and develop the skills required for that competency.
3. How can we improve reproducibility in research?
See the tools Toby has shared.
- Document your work and include the required metadata with your data
- Share the data, where possible and consented
- Share your code
- use literate programming to share your analysis and results
- At a higher level, make use of workflow languages and containers.
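A few of the points above can be sketched in base R: fix the random seed, save results to a file, and record the software environment next to them. This is only a minimal sketch; the `out_dir` folder and the file names are hypothetical, and the random numbers stand in for a real analysis.

```r
# Fix the random seed so any random step gives the same result on every run
set.seed(42)
x <- rnorm(100)          # stand-in for a real analysis step
result <- mean(x)

# Hypothetical output folder (a temporary directory is used here)
out_dir <- file.path(tempdir(), "results")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Save the result as a plain-text table that others can read
write.csv(data.frame(statistic = "mean", value = result),
          file.path(out_dir, "summary.csv"), row.names = FALSE)

# Record the R version and loaded packages alongside the result
writeLines(capture.output(sessionInfo()),
           file.path(out_dir, "session_info.txt"))
```

Sharing the script together with `summary.csv` and `session_info.txt` lets someone else rerun the analysis and compare both the numbers and the software versions.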
**Question phase 2:** These are questions arising from the session.
Members can type below:
Insight on how to transition from a user to a scientist?
Understanding how tools work and getting into coding and programming is a good place to start.
Please elaborate the use of Bioinformatics in Cancer care and personalised medicine - (from chat)
Networking: In the break out rooms please share;
Name and institution affiliated to.
Briefly describe your current work in bioinformatics. For students, they can share their experience in Bioinformatics so far.
NB: 3min for each member
Resources for 3-month coding internships (some bioinformatics orgs participate)
- https://www.outreachy.org/ - paid internships that run twice a year, timed to the northern hemisphere summer and the southern hemisphere summer.
- Google Summer of Code https://summerofcode.withgoogle.com/ - unfortunately northern hemisphere-centric. Start applying around February/March.