Day 2 - Data Analysis and Visualization with R workshop

# Day 2 - Data Analysis and Visualization with R workshop

> Type on the left :arrow_left: and see the result on the right. :arrow_right:
> **For roll calls, click on the box. Check right** :arrow_right:

## :memo: What you need

- A computer/laptop
- Internet

:::info
- **Location:** Zoom
- **Date:** March 25th, 2021 8:30 AM (EAT)
- **Schedule**
  - 08:30 AM Recap Day 1
  - 09:30 AM Manipulating, analyzing and exporting data with tidyverse
  - 11:00 AM Break
  - 11:15 AM Data Visualisation with ggplot2
  - 12:45 PM Wrap-up
  - 12:50 PM Post-Workshop Survey
  - 01:00 PM END
- **Contact:** <bioinformaticshubofkenya@gmail.com> / <info@bhki.org>, or [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE) **follow us!**
- **Host:** BHKi
:::

# Roll calls

- Cynthia Awuor/jkuat/cynthiaawuor18@gmail.com
- Ronald Tonui/Moi University SOM/tonuironald@gmail.com
- Jackline Kosgei/KEMRI/jackieruto@yahoo.com/ @JacklineKosgei
- Peris Ambala/IPR/perisambal@gmail.com/ @AmbalaPeris
- Henrick Aduda/EANBIT-ICIPE/henrickkl@gmail.com/ @adudahenrick
- Janet Majanja/KEMRI/jmajanja@gmail.com
- Njuki Susan/CVL/suzannenjuki@gmail.com/ Njuki_Sue
- Grace Kioko/NMK/mwendegrace56@gmail.com /@gracianamwesh
- Sylvia Milanoi/KEMRI/sylviamilanoi@gmail.com
- Kevin Arasa/KNH/arasakev@gmail.com/ @Arasavin
- Harriet Natabona/jkuat/nattybona2012@gmail.com/@hnatabona
- Brian Polo/KEMRI/otienobrn09@gmail.com
- owilifrank/kemri/owilifrank@gmail.com/@owilifrank
- Rispah Torrorey/Moi University/torrorey@gmail.com
- Cynthia King’ori/ICIPE/cynthiakingori19@gmail.com

## Code block

- **Code for day 2**:

```r=
getwd() # to see the directory you are in
setwd("path to data")

# Link to dataset:
# https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/portal_data_joined.csv

# download data (the destination folder must exist before downloading)
dir.create("data_raw", showWarnings = FALSE)
download.file(url = "https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/portal_data_joined.csv",
              destfile = "data_raw/portal_data_joined.csv")

# load package
library(tidyverse)

# read data into R
# ===== we use the function read_csv() to read the data and save it in an object called surveys
surveys <- read_csv("./data_raw/portal_data_joined.csv")

# look at the content of the loaded data, the first few lines
head(surveys)
# specify the first 50 rows
head(surveys, n = 50)
print(surveys, n = 50)
# subset just the first 100 rows for testing computations
surveys_sample <- head(surveys, 100)
# look at the structure of a dataset with str
str(surveys)

# further inspect your data set with more functions
# Size
dim(surveys)
nrow(surveys)
ncol(surveys)
# Summary
str(surveys)
summary(surveys)

# ===== Indexing and Subsetting data frames =========================
# ===================================================================
# NOTE: read_csv() returns a tibble, so single-bracket subsetting
# returns a tibble rather than a vector
# first element in the first column (a 1 x 1 tibble)
surveys[1, 1]
# first element in the 6th column (a 1 x 1 tibble)
surveys[1, 6]
# first column of the data frame (a one-column tibble)
surveys[, 1]
# whole data frame except first column
surveys[, -1]
# first column of the data frame (as a data.frame)
surveys[1]
# first three rows of the 6th column
surveys[1:3, 6]
surveys[1:10, 7]
# the 3rd row of the data frame (as a data.frame)
surveys[3, ]
# equivalent to head_surveys <- head(surveys)
head_surveys <- surveys[1:6, ]
# you can also subset by excluding indices
surveys[, -1]            # The whole data frame, except the first column
surveys[-(7:34786), ]    # Equivalent to head(surveys)
# or by calling their column names
surveys["species_id"]    # Result is a data.frame
surveys[, "species_id"]  # Result is a one-column tibble (a vector only for a base data.frame)
surveys[["species_id"]]  # Result is a vector
surveys$species_id       # Result is a vector

# ===== Factors =====================================================
# we can convert a column to a factor using:
surveys$sex <- factor(surveys$sex)
# check that it worked
summary(surveys$sex)
# By default, R always sorts levels in alphabetical order
levels(surveys$sex) # F comes before M
# check the number of levels
nlevels(surveys$sex)

# ===== converting factors
# ==========================================
# a vector of levels
sex <- factor(c("male", "female", "female", "male"))
sex # current order
# reorder the levels
sex <- factor(sex, levels = c("male", "female"))
sex
# If you need to convert a factor to a character vector, you use as.character(x)
as.character(sex)

# ===== Renaming factors ============================================
# when data is stored as a factor we can plot to get a quick glance at the number of observations
plot(surveys$sex)
# but there are about 1700 NAs where sex hasn't been recorded
# to show them in the plot we can turn the missing values into a factor level
# first subset the sex data
sex <- surveys$sex
levels(sex)
# add NA as a level
sex <- addNA(sex)
levels(sex)
# by using indices, we can rename the 3rd level (NA) to something more informative
levels(sex)[3] <- "undetermined"
levels(sex)
# now plotting the data again
plot(sex)
levels(sex)[1:2] <- c("female", "male")
sex <- factor(sex, levels = c("undetermined", "female", "male"))
plot(sex)

# corrected code from the challenge
animal_data <- data.frame(
  animal = c("dog", "cat", "sea cucumber"),
  feel = c("furry", "squishy", "spiny"),
  weight = c(45, 8, 0.8))

# corrected code from the challenge
country_climate <- data.frame(
  country = c("Canada", "Panama", "South Africa", "Australia"),
  climate = c("cold", "hot", "temperate", "hot/temperate"),
  temperature = c(10, 30, 18, 15),
  northern_hemisphere = c(TRUE, TRUE, FALSE, FALSE),
  has_kangaroo = c(FALSE, FALSE, FALSE, TRUE)
)

# select all columns except record_id and species_id
select(surveys, -record_id, -species_id)
# select rows
filter(surveys, year != 1995)
# intermediate workflows
surveys2 <- filter(surveys, weight < 5)
surveys_sml <- select(surveys2, species_id, sex, weight)
filter(surveys, taxa == 'Bird')
# nested
factor(surveys$sex, levels = c("M", "F"))
# pipe = %>%
surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

# plots
library(ggplot2)
ggplot_dataset <- "https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/surveys_complete.csv"
download.file(ggplot_dataset, destfile = "data_raw/surveys_complete.csv")
# the object name must match the one used in the ggplot() calls below
surveys_complete <- read_csv("data_raw/surveys_complete.csv")

ggplot(surveys_complete, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, aes(color = species_id)) +
  facet_wrap(facets = vars(genus))

weight_plot <- ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(alpha = 0, aes(color = species_id)) +
  geom_jitter(alpha = 0.3, aes(color = species_id)) +
  labs(title = "My Plot", x = "species id")
ggsave("weight_plot.png", weight_plot, width = 25, height = 10)
```

## Challenge

**1. Question**

1. Based on the output of str(surveys), can you answer the following questions?
   - What is the class of the object surveys?
   - How many rows and how many columns are in this object?

**Answers**

- class: "spec_tbl_df" "tbl_df" "tbl" "data.frame"; rows: 34786; columns: 13
- Class: data frame; number of rows: 34,786; number of columns: 13
- class spec_tbl_df; rows and columns [34,786 x 13]

**2. Question**

Challenge

1. Create a data.frame (surveys_200) containing only the data in row 200 of the surveys dataset.
2. Notice how nrow() gave you the number of rows in a data.frame?
   - Use that number to pull out just that last row in the data frame.
   - Compare that with what you see as the last row using tail() to make sure it's meeting expectations.
   - Pull out that last row using nrow() instead of the row number.
   - Create a new data frame (surveys_last) from that last row.
3. Use nrow() to extract the row that is in the middle of the data frame. Store the content of this row in an object named surveys_middle.
4. Combine nrow() with the - notation above to reproduce the behavior of head(surveys), keeping just the first through 6th rows of the surveys dataset.

**Answers**

-
-
-
-

**3. Question**

Rename "F" and "M" to "female" and "male" respectively.
Now that we renamed the factor level NA to "undetermined", can you recreate the barplot such that "undetermined" is first (before "female")?

**Answers**

```r=
# Each answer below comes from a different participant and assumes a fresh
# surveys$sex with levels "F" and "M", plus NA added as a third level with addNA()

# Answer 1
levels(surveys$sex)[3] <- "undetermined"
levels(surveys$sex)[2] <- "male"
levels(surveys$sex)[1] <- "female"
plot(surveys$sex)
surveys$sex <- factor(surveys$sex, levels = c("undetermined", "male", "female"))
plot(surveys$sex)

# Answer 2 (labels corrected: level 1 is "F" and level 2 is "M")
levels(surveys$sex)[3] <- "undetermined"
levels(surveys$sex)[1] <- "Female"
levels(surveys$sex)[2] <- "Male"
plot(surveys$sex)
sex <- factor(surveys$sex, levels = c("undetermined", "Male", "Female"))
plot(sex)

# Answer 3
levels(sex)[3] <- "undetermined"
levels(sex)[2] <- "Male"
levels(sex)[1] <- "Female"
sex <- factor(sex, levels = c("undetermined", "Male", "Female"))
plot(sex)

# slice of first two levels
levels(surveys$sex)[1:2]
```

## Question and answer

## Notes and links

Link to ggplot dataset:

```
"https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/surveys_complete.csv"
```

Post-survey link = https://carpentries.typeform.com/to/UgVdRQ?slug=2021-03-24-BHKi-Online

# Day 1 - Data Analysis and Visualization with R workshop

> Type on the left :arrow_left: and see the result on the right. :arrow_right:
> **For roll calls, click on the box.
Check right** :arrow_right:

## :memo: What you need

- A computer/laptop
- Internet

:::info
- **Location:** Zoom
- **Date:** March 24th, 2021 9:00 AM (EAT)
- **Schedule**
  - 09:00 AM: Introductions - name/affiliation/background
  - 09:05 AM: Pre-workshop Survey (confirm if it's done)
  - 09:15 AM: Before we Start
  - 10:15 AM: Introduction to R
  - 11:30 AM: Break
  - 11:45 AM: Starting with Data
  - 12:55 PM: Wrap-up
  - 01:00 PM: END
- **Instructors:**
  - Bernice Waweru
  - Jennifer Shelton
- **Helpers**
  - Festus Nyasimi
  - Pauline Karega
  - David Kiragu
  - Margaret Wanjiku
  - Michael Kofia
- **Contact:** <bioinformaticshubofkenya@gmail.com> / <info@bhki.org>, or [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE) **follow us!**
- **Host:** BHKi
:::

# Roll calls

**Name/Affiliation/Email/Twitter**

- ==Festus Nyasimi/BHKi/nfestus14@gmail.com/[@Festus_nyasimi](https://twitter.com/Festus_nyasimi)==
- ==David Kiragu/BHKi/davkmwaura@gmail.com /@MwauraKiragu==
- ==Michael Kofia/BHKi/landycofia@gmail.com /@CofiaLandy==
- ==Karega Pauline/UoN/karegapaul@gmail.com /@KaregaP==
- Janet Majanja/KEMRI/jmajanja@gmail.com
- Harriet Natabona/jkuat/nattybona2012@gmail.com /@hnatabona
- Irene Waita/jkuat/waitairenee@gmail.com /@irene_2019
- Henrick Aduda/ EANBIT-ICIPE/ henrickkl@gmail.com/ @adudahenrick
- Cedrick Shikoli/ Institute of Primate Research/ cshikoli@gmail.com/ @cshikoli
- Grace Kioko/National Museums of Kenya/mwendegrace56@gmail.com / @gracianamwesh
- GeorgeMusula/Kemri/eouma79@gmail.com
- Peris Ambala/IPR/perisambal@gmail.com/ @AmbalaPeris
- Frank Owili/kemri/owilifrank@gmail.com /@owilifrank
- Sylvia Milanoi/KEMRI/sylviamilanoi@gmail.com
- Kevin Arasa/KNH/arasakev@gmail.com / @Arasavin
- Brian Polo/KEMRI/otienobrn09@gmail.com /@BRIANPOLO10
- Jackline Kosgei/KEMRI/jackieruto@yahoo.com/@JacklineKosgei
- Ronald Tonui/Moi University
- Njuki Susan/CVL/suzannenjuki@gmail.com/@Njuki_sue
- Cynthia Awuor/JKUAT/cynthiaawuor18@gmail.com
- George Musula/Kemri/eouma79@gmail.com
- Cynthia King'ori/ICIPE/cynthiakingori19@gmail.com

## Code block

- All code will be updated in this section:

```r=
# for example ...lets have fun
install.packages("ggplot2")
install.packages("tidyverse")

# to check which folder you're working in
getwd()

??mean # returns anything that has mean
?mean  # returns results for just arithmetic mean.
browseVignettes() # opens up available tutorials for different packages

# creating objects in R
weight_kgs <- 55
weight_kgs
weight_kgs * 2.2

# reassigning a value for a variable
weight_kgs <- 57.5 # simply delete the value and replace with new value

# Functions and their arguments
weight_kgs <- sqrt(10)
round(3.1459, digits = 2)
args(round)

# Vectors and Data types
weight_g <- c(50, 60, 65, 82)
animals <- c("mouse", "rat", "dog")
length(weight_g) # find out how many values are in weight_g
length(animals)
class(animals)   # find out the type of values in the vectors
class(weight_g)
str(animals)
num_char <- c(1, 2, 3, "a")
class(num_char)

# Subsetting vectors
animals <- c("mouse", "rat", "dog", "cat") # create vector animals
length(animals)    # check length of vector animals
animals[2]         # to check the value at position 2 in a vector
animals[c(3, 2)]   # check the values in two different positions of a vector

# conditional subsetting
weight_g[c(TRUE, FALSE, TRUE, FALSE)]    # subsetting using logical values
weight_g[weight_g > 50]                  # subsetting using conditionals
weight_g[weight_g > 50 & weight_g < 80]  # using multiple conditions
weight_g[weight_g > 50 | weight_g == 50] # the | symbol means or

# Missing data
heights <- c(2, 4, 4, NA, 6)
mean(heights, na.rm = TRUE)

#===========Dataset Analysis===========#
getwd()            # to see which directory you are in
setwd("data_raw")  # set your working directory to the data_raw folder
library(tidyverse) # load tidyverse library
surveys <- read_csv("portal_data_joined.csv") # read your downloaded data into the object surveys
print(surveys)     # print your dataset
list.files()       # list the files in the current directory (data_raw, after the setwd() above)

#======OR======#
file <- "~/data-carpentry/data_raw/portal_raw_data_joined.csv"
surveys <- read_csv(file)

# explore directories (folders) on the file system
list.files("~/Downloads")

# path examples
getwd() # use this to find your working directory. You can see my working directory below.
code_handout <- "/Users/jshelton/data-carpentry/code-handout.R"
file.exists(code_handout)
code_handout <- "code-handout.R"
file.exists(code_handout)
code_handout <- "~/data-carpentry/code-handout.R"
code_handout <- "./code-handout.R"
file.exists(code_handout)
code_handout <- "../data-carpentry/code-handout.R"
file.exists(code_handout)
```

## Challenge section

*NB: This section will be used for exercises during the workshop*

**Question 1**

- We’ve seen that atomic vectors can be of type character, numeric (or double), integer, and logical. But what happens if we try to mix these types in a single vector?

**Answer**: *R implicitly converts them to all be the same type*

What will happen in each of these examples? (hint: use class() to check the data type of your objects):

```r=
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")
```

Why do you think it happens?

**Answers**: *Vectors can be of only one data type. R tries to convert (coerce) the content of this vector to find a "common denominator" that doesn't lose any information.*

**Missing data challenge**

1. Using this vector of heights in inches, create a new vector, heights_no_na, with the NAs removed.

```r=
heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)
```

2. Use the function median() to calculate the median of the heights vector.
3. Use R to figure out how many people in the set are taller than 67 inches.

**Answers**

1. heights_no_na <- c(63, 69, 60, 65, 68, 61, 70, 61, 59, 64, 69, 63, 63, 72, 65, 64, 70, 63, 65)
2. 64
3. 6
4. median = 64, mean = 64.94737 and heights > 67 = 6 are the answers in my opinion
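For Question 1 of this challenge section, the coercion rule can be checked directly with `class()`. The following is a minimal sketch in base R, not part of the session code; the names `coerced` and `combined` are made up here for illustration.

```r
# Vectors from Question 1
num_char     <- c(1, 2, 3, "a")
num_logical  <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky       <- c(1, 2, 3, "4")

# `coerced` is a hypothetical helper list, used only to check all four at once
coerced <- list(num_char = num_char, num_logical = num_logical,
                char_logical = char_logical, tricky = tricky)
sapply(coerced, class)

# TRUE becomes 1 in a numeric vector, but the string "TRUE" in a character vector
num_logical
char_logical

# Coercion only goes one way: logical -> numeric -> character
combined <- c(num_logical, char_logical)
class(combined)
```

Note that `tricky` ends up as character even though every element looks like a number, so it would need `as.numeric(tricky)` before any arithmetic.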
## Question and answer

```r=
# Question one
heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)
mean(heights, na.rm = TRUE)
# Answer: 64.94737
```

```r=
# Question two
median(heights, na.rm = TRUE)
# Answer: 64
```

```r=
# Question three
heights <- heights[!is.na(heights)]
heights_revised <- heights[heights > 67]
str(heights_revised)
# Answer: 6 (the vector has 6 elements)
```

# Challenge

Using pipes, subset the surveys data to include animals collected before 1995 and retain only the columns year, sex, and weight.

```r=
surveys %>%
  filter(year < 1995) %>%
  select(year, sex, weight)
```

```r=
# plots
library(ggplot2)
ggplot_dataset <- "https://github.com/bioinformatics-hub-ke/R-workshop-24-03-2021/raw/main/surveys_complete.csv"
download.file(ggplot_dataset, destfile = "data_raw/surveys_complete.csv")
surveys_complete <- read_csv("data_raw/surveys_complete.csv")

ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, aes(color = species_id)) +
  facet_wrap(facets = vars(genus))

# store the plot in an object so that ggsave() can find it
weight_plot <- ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(alpha = 0, aes(color = species_id)) +
  geom_jitter(alpha = 0.3, aes(color = species_id)) +
  labs(title = "My plot", x = "species id")
ggsave("weight_plot.png", weight_plot, width = 15, height = 10)

# basic R syntax:
# function(arg1="blah", arg2=14)
# ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
#   <GEOM_FUNCTION>()
```

## Notes and links

- http://rosalind.info/problems/locations/
- http://rosalind.info/problems/list-view/?location=python-village
- https://rmarkdown.rstudio.com/lesson-1.html
- R for data science: https://r4ds.had.co.nz/

# Pre-event meeting - R workshop 22^nd^ March 1500 EAT (Participants + Helpers)

> Type on the left :arrow_left: and see the result on the right. :arrow_right:
> **For roll calls, click on the box.
Check right** :arrow_right:

## :memo: What you need

- A computer/laptop
- R installed
- Internet

## Roll calls

**Helpers**

- [x] Festus Nyasimi
- [x] Karega Pauline
- [x] David Kiragu
- [x] Margaret Wanjiku
- [x] Michael Kofia

**Participants**

- [x] Rispah Torrorey
- [x] Ronald Tonui
- [x] Janet Majanja
- [x] Cynthia King'ori
- [x] Irene
- [x] Grace Kioko
- [ ] KELVIN PAUL ARASA
- [x] Elius Mbogori
- [x] Henrick Aduda
- [x] Peris Auma Ambala
- [x] Cedrick Shikoli
- [x] SUSAN WATITWA
- [x] Cynthia Awuor Odhiambo
- [x] Harriet Natabona
- [x] Susan Njuki
- [x] George Musula
- [x] Sylvia Milanoi
- [x] Frank Erastus Owili
- [x] Justo Ochung'
- [x] Brian Polo
- [ ] JACKLINE KOSKEI

## **Agenda**

1. Introduction - check-in ritual (name + institution)
2. What is HackMD?
3. Check List
> a. Check that participants' connections work, catching any audio/visual/bandwidth problems early. Every member will test their bandwidth (www.fast.com).
>
> b. Learn about non-verbal feedback.
>
> ![Zoom non-verbal feedback icons](https://assets.zoom.us/images/en-us/desktop/generic/in-meeting/participants-list-status-icons.png)
>
> c. Check for packages installed for the session. Make sure you already have R installed on your computer. We will demonstrate how to install all the packages needed in the workshop and troubleshoot any installation/setup problems.
>
> **Packages to be installed**
> - [x] tidyverse
> - [x] ggplot2
>
> d. Check that participants can share their screens and introduce breakout rooms to participants.
>
> e. Check that participants have downloaded the datasets to be used.
> Download: <https://ndownloader.figshare.com/files/2292169>.
>
> f. Discuss with attendees whether they would be comfortable being recorded. Note that the recordings will be used only by the attendees and for a limited period of time.
4. Q&A
5. AOB - (comments)

### Notes

- Windows: + ++ ++++
- MacBook: + +
- Linux:
- **If you don't have R**
  RStudio: https://rstudio.com/products/rstudio/download/

:rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket:

# Pre-event rehearsal meeting - 22^nd^ March 1200 EAT (Instructors + Helpers)

> Type on the left :arrow_left: and see the result on the right. :arrow_right:
> **For roll calls, click on the box. Check right** :arrow_right:

## Roll calls

**Instructors**

- [x] Jennifer Shelton
- [x] Bernice Waweru

**Helpers**

- [x] Festus Nyasimi
- [x] Karega Pauline
- [x] David Kiragu
- [ ] Margaret Wanjiku
- [x] Michael Kofia

## Agenda

1. Check-in ritual / gratitude ritual / how was your weekend?
2. What to discuss
> a. Scheduled time vs content delivered + who teaches what (*instructors*)
> **Workshop schedule:** https://bit.ly/3c5w2FW
> **Lessons:**
>
> b. Roles of the helpers (*According to The Carpentries guidelines*)
>
> - **Technical**: responsible for watching for learners reporting problems in the chat and providing assistance. Optionally, depending on instructor preference, they may facilitate question and answer sessions if the instructor needs a break or loses connection, or step in routinely to smooth transitions.
> - **Facilitator**: responsible for monitoring the room to mute learners as needed (requires host or co-host status on Zoom), watching for learner questions across platforms. This role may include oversight and triage, assigning help requests to specific helpers and elevating issues to the Instructor’s attention as needed.
> - **Breakout manager**: uses host status on Zoom to create and assign breakout rooms as needed.
> - **Document manager**: if you are using a collaborative notes document or keeping a command log, consider assigning a helper to keep this up to date.
>
> c. Discuss tips and tricks to make the workshop run smoothly (non-verbal feedback, collaborative documents, breakout rooms, exercises, etc.)
3. AOB - (comments)

### Notes

- Number of participants expected: 20 (so far).
- Question: when to stop registration
- Modify registration form to allow for next workshop
- http://rosalind.info/problems/locations/
- https://docs.carpentries.org/topic_folders/hosts_instructors/hosts_instructors_checklist.html
- https://carpentries.org/online-workshop-recommendations/
- https://carpentries.org/workshop_faq/#online-workshops
- https://carpentries.org/blog/2020/05/centrally-organised-workshop-learnings/

BHKi in collaboration with Pine Biotech November Virtual Meet-up
===

:::info
- **Location:** Zoom
- **Date:** November 5th, 2020 4 PM (EAT)
- **Presenters:**
  - Dr. Mohit Mazumder - Global Business Development Bioinformatics Education & Research | Pine Biotech
  - Dr. Harpeet Kaur
- **Contact:** <bioinformaticshubofkenya@gmail.com>, [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE)
- **Host:** BHKi
:::

Roll call
===

- Festus Nyasimi / BHKi / @Festus_nyasimi
- Michael Landi/BHKi/
- Martha Luka/ Pwani University
- Njuki Susan/ International Livestock Research Institute/
- Peninah Wairagu/Technical University of Kenya/@pwairagu
- Elizabeth Alfaro-FredrEspinoza/ Federal University of Viçosa / @ealfaroe_drafts
- Angela Muraya / BHKi / JKUAT / @angelmuraya
- Sarah Nyanchera Nyakeri/JKUAT/@SarahNyancheraN
- Rissy Makokha/ Chinhoyi University of Technology/ @rissymakokha
- Silviane Miruka/ Center for Therapeutic Research Sciogenogenences/m_silviane
- Kennedy Mwangi/ JKUAT / @wanjauk1
- Winfred Gatua/ Pwani University & ICIPE /@gatuaprof
- Dr. Habiba I. Atta/Ahmadu Bello University, Zaria, Nigeria/
- Virginiah
- Dr.
Chiranjeevi Pasala, Ph.D Bioinformatics, SVIMS University, Tirupati, INDIA, chiranjeevipasala099@gmai.il
- Agnes Maina/Jkuat/agnesmwangui@gmail.com
- Auleria Ajiambo/JKUAT/aajiambo@gmail.com
- Fredrick kebaso /BHK /icipe/@fredrickkebaso
- Tracey Calvert-Joshua / SANBI (UWC), South Africa / @TCalvertJoshua
- Kimutai Rogers / Kenyatta University /kimutairo@gmail.com
- OGANYA DEBORAH/ogenyideborah@gmail.com
- Billiah Bwana/ University of Embu/ @kemuntoBil
- Okeyo Allan | Pwani University | @5_Allan

Questions
===

-

BHKi September Virtual Meet-up
===

:::info
- **Location:** Zoom
- **Date:** September 24th, 2020 10:30 AM (EAT)
- **Agenda**
  - 10:30 am - 11:00 am: Keynote speech (Motivational letter and CV writing)
  - 11:00 am - 11:20 am: Applying for conferences
  - 11:20 am - 11:40 am: Technical Interviews
  - 11:40 am - 12:15 pm: Q&A
- **Presenters:**
  - Caleb Kibet - *ICIPE*
  - Verena Ras - *H3ABioNet*
  - Jean-Baka Domelevo Entfellner - *BecA-ILRI*
- **Contact:** <bioinformaticshubofkenya@gmail.com>, [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE)
- **Host:** BHKi
:::

- Felix Maingi/JKUAT/@felixsenior001
- Margaret Chifwete/ICIPE/@moseleychichi
- Festus Nyasimi/BHKi/@Festus_nyasimi
- Winfred Gatua/ Pwani University/@gatuaprof
- Eneza Mjema/ Pwani University/ @ene_yoel
- Evans Mudibo/ Pwani University/ @mudibo_evans
- Pranavathiyani G/BIC,PU/@pranavathiyani
- Brenda Muthoni/ Pwani University/ @brenda_muthoni
- Victor Sewe/KEMRI-CGHR,KISUMU/@SEWEVICTOR
- Karega Pauline/ BHKi/ @Karegap
- Peter Gichuki/ UON CEBIB/ @chukiptah
- Verena Ras / H3ABioNet / @RasVerena E:verena.ras@uct.ac.za
- Michael Kofia / BHKi / @CofiaLandy
- Samson Mghanga/Tunde Investmensts/@SamsonMghanga
- Jean-Baka Domelevo Entfellner / BecA-ILRI Hub / @JeanBakaDE
- David Kiragu/BHKi/@MwauraKiragu
- Fredrick kebaso /kenyatta university/@innocentkebaso
- Faith Agnes Njeri/JKUAT/@aggierugami
- Rose Wambui/JKUAT/@Rosegatheru
- Jane Njeri / Pwani University / @NjeriAquila
- Ndigezza Livingstone / Makerere University/ @ndigezzaliving
- Boaz Wadugu/KCRI-Biotechnology laboratory/@waduguboaz
- Caleb Kibet / ICIPE / @calkibet
- Njuki Suzanne / ILRI/ @Njuki_Sue
- Margaret Wanjiku / BHKi / @meg_wanjiku
- Muturi Njokah/KEMRI/@muturinjokah
- Joseph Atemia /Pwani Uni & icipe / @MulamaJoe
- Evalyne Wambui/icipe/@Samanthabobo_
- Charles Kamonde Mwangi/ JKUAT/ @Kamonde_1
- Chelsea Wairimu Gichuhi /JKUAT/ chelseagichuhi@gmail.com
- Pauline King'ori/Pwani University/@paulah_kings
- Hildah Njoroge/ BHKi / @wacuka_H
- Stephen Tavasi/ Masinde Muliro University/ @zevon44
- Justus kyalo kasivalu/cvl-kabete/kyalo988@gmail.com
- Gershom Mbwambo/ Kilimanjaro Clinical Research Institute (KCRI)/ @gershommbwambo

Useful links
---

## Questions and Answers

Caleb

* **Which is the ideal CV or resume formatting?**
  * That is mostly up to you, but some great examples include [Europass](https://europa.eu/europass/en/create-europass-cv). See also tips available [here](https://www.jobsinscience.com/info/cv.asp) and [here](https://www.thebalancecareers.com/academic-curriculum-vitae-example-2060817).
* **While looking for post-graduate positions, what approach is best when writing to a professor?**
  * Be very intentional and specific. If you can get an introduction from someone who knows them, even better.
  * If you have to cold email, choose a catchy, informative subject line that tells them at a glance why you are writing.
  * Be brief, and be clear about your ask. They should be able to respond in a short time.
* **What opportunities are there to learn bioinformatics before one finally enrolls for an MSc?**
  * Internships
  * Online courses
  * Self-learning
* **How do I condense my 6-page CV into a 2-page one that is eye-catching?**
  * That is a resume. Only include the most important information.
  * Change the format to a columnar or tabular one, e.g.
Eg * ![](https://th.bing.com/th/id/OIP.7vljcwO54AOurnUEAX0NQwHaKe?pid=Api&rs=1) Verena * What is the status of Bioinformatics careers in Africa Thanks to some major consortiums like H3Africa and H3ABioNet, a number of strong bioinformatics groups have been implemented across Africa. This means that an increasing number of opportunities have been becoming available. Bioinformatics really is a growing field and people with these skills are becoming more and more in demand and so I expect that we will have an increasing number of opportunities across Africa in the coming years. * How should one be conscious of their digital footprint? Does it impact selection? Your digital footprint is extremely important. Whether it will affect your chances of being hired however depends on the company or organisation. Having said that, your digital footprint is becoming more and more important. As a bioinformatician it would definitely be a good idea to start a github account to display your code/projects, etc. Many companies now also request your twitter, facebook and linkedin handles in order to perform checks around what kind of content you post online, etc. Be extremely mindful of what you put online. * In recent past, the technological advancement has made some career obsolete. Can bioinformatics withstand this pitfall as a career? The technology is constantly improving and constantly shifting and so I believe new opportunities will always become available as the technology improves. It will require you to remain abreast with all the updates and advancements and will require constant learning throughout. This is however a pertinent point and something that must be considered in terms of your career progression and future plans. JB * What's the link between data science and Bioinformatics? Data science is a broad field that involves managing and analyzing large amounts of data whereas bioinformatics is more specific to biological data and its associated manupilation of the data. 
* Is bioinformatics dynamic? What makes it dynamic (or static, if you prefer this)? How does it (the field) do that (change over time)? It is a dynamic field driven by technology changes, changes in biotechnolgy for instance the recent emergence of gene editing, changes in coputational capabilities (computational power etc) * How do I kick start a bioinformatics career with no work experience and how is the job market for bioinformatics in Kenya? Contribute code to projects if you are not in a postion to get into bioinformatics as a job currently by raising issues to already existing code. Contribute to stack overflow. Kenya has a competitive edge with regards to attracting funding. Generally, there is a market for bioinformatics in Kenya. ================================================= BHKi July Virtual Meet-up === ###### tags: `Galaxy` `Open source bioinformatics` `reproducible research` :::info - **Location:** Zoom - **Date:** July 30th, 2020 10:30 PM (EAT) - **Agenda** 10:30 am - 10:45 am : Brief introduction to the Galaxy platform 10:45 am - 11:45 am : Galaxy tutorial 101 11:45 am - 12:15 pm : Break out session 12:15 pm - 12:30 pm : Q&A - **Presenters:** - Peter van Heusden - *Galaxy/SANBI* - Tracey Calvert-Joshua - *SANBI* - Kamohelo (Kamo) Direko - *SANBI* - Susan Alicia Fernol - *SANBI* - **Contact:** <bioinformaticshubofkenya@gmail.com>, [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE) - **Host:** BHKi ::: ## Name/ Institution/ Twitter handle Festus Nyasimi/ BHKi/[@Festus_nyasimi](https://twitter.com/Festus_nyasimi) Michael Landi / BHKi/ @CofiaLandy Margaret Wanjiku / BHKi / @meg_wanjiku Rissy Makokha Wesonga/BHKi Hellen Kariuki -UON Andrew Mwangila/BHKi/@andrewmwa Pranavathiyani G/BIC,PU/@pranavathiyani Karega Pauline/ BHKi/@KaregaP Collins Kigen/ *icipe*/@collinskigen Tawich simon/icipe/@Tawich_kiplimo lucas muiruri/ICRAF/@lucasmuiruri Kennedy Mwangi / JKUAT/ @wanjauk1 Simeon Hebrew / JKUAT/ @HebrewSimeon Arnold Lambisia /KWTRP/@Arnold_Sn Stephen 
Tavasi/MMUST/@zevon44 John Gitau/UON/@Gitau_JohnK Irene Waita/JKUAT/@irene_2019 Diana Kinyua/EGERTON/@DianaKinyua15 Joseph Mulama / PU & ICIPE / @MulamaJoe Muturi Njokah/KEMRI/ Rose Wambui /JKUAT Peter Gichuki /UON /@chukiptah Harriet Natabona /JKUAT/ @hnatabona Amayo Mordecai/ UON/@amayo_mordecai Beatriz Serrano-Solano /EMBL & Galaxy EU / @Birthae Irene karegi/JKUAT Irene Mkavi/PAUSTI/@okoko_mkavi Daniel OTRON/UFHB/@OtronDaniel Samuel Oduor/BHKi/@Sam__Odi Stephen/JKUAT Brenda Muthoni/KEMRI/ @brenda_muthoni Sumaya Kambal/NUBRI Parwos_Abraham/UON&JKUAT/@parwosabraham Bwanya Brian/icipe/@bwanya_brian Ambutsi Mike/ MMUST/ @Ambutsi2 Gershom Mbwambo/KILIMANJARO CLINICAL RESEARCH INSTITUTE (KCRI)/ @gershommbwambo Stephen Okeyo/CDC-KEMRI/CEBIB/@stephenokeyo65 Boaz Wadugu/KCRI-Biotechnology laboratory/@waduguboaz Rogers Kimutai/KU --- Useful links --- Join slack https://join.slack.com/t/bhki/shared_invite/zt-gao8p7vl-W63ySPkw6cHB0lsiBdTyfQ Tutorial for interpreting fastqc report/results: https://dnacore.missouri.edu/PDF/FastQC_Manual.pdf fastq https://zenodo.org/record/582600/files/mutant_R1.fastq Variant analysis https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.html Tutorials https://galaxyproject.org/learn/#tutorials-by-galaxy-training-network ## Questions and Answers 1) **Is there a variant calling pipeline in the galaxy and how can one go about it?** Yes. Here are some tutorials on variant calling of different types: https://training.galaxyproject.org/training-material/topics/variant-analysis/ 2) **Are there measures put in place to ensure data privacy for the galaxy platform users?** Each Galaxy server is different but here are the terms of service for usegalaxy.eu: https://galaxyproject.eu/gdpr/ This is what they say: *"Use of Service. The European Galaxy Sites and UseGalaxy Site are a free, public, Internet accessible resource (the “Service”). 
Data transfer is encrypted unless you choose to use unencrypted FTP access. Data storage is not encrypted. If there are restrictions on the way your research data can be stored and used, please consult your local institutional review board or the project principal investigator before uploading it to any public site, including this Service. If you have protected data, large data storage requirements, or short deadlines you are encouraged to set up your own local Galaxy instance and not use this Service. Your access to the service may be revoked at any time for reasons deemed necessary by the operators of the Service. You acknowledge that you are responsible for compliance of all of your data processing activities carried out on the Galaxy Service with applicable laws and regulations of the Federal Republic of Germany, the European Union as well as any laws or regulations of other legislations or any other restrictions that might be applicable due to the provenance, intended use, legal ownership of or any licensing or other legal restrictions imposed on the data being processed. You are strictly prohibited from making use of the Galaxy Service for the storage or processing of any Personal Data or Sensitive Personal Data as defined by the European Union’s General Data Protection Regulation (GDPR), including but not limited to potentially personally identifying medical data."* I.e. while usegalaxy.eu is investigating becoming GDPR compliant, they are not currently compliant with that legislation. The usegalaxy.org server has similar wording: "This is a free, public, internet accessible resource. Data transfer and data storage are not encrypted. If there are restrictions on the way your research data can be stored and used, please consult your local institutional review board or the project PI before uploading it to any public site, including this Galaxy server. 
If you have protected data, large data storage requirements, or short deadlines you are encouraged to setup your own local Galaxy instance or run Galaxy on the cloud."

3) **Are there tools for Tajima’s D calculations on the galaxy platform and if yes, how can one go about it?**
This is not supported on usegalaxy.eu (yet), but there is a tool for this in the Galaxy toolshed based on vcftools SlidingWindow. That tool (written by Alexis Dereeper) is available on crop-oriented Galaxy servers like SouthGreen Galaxy (https://galaxy.southgreen.fr/galaxy) and on the EiB (Excellence in Breeding) Galaxy: http://galaxy-demo.excellenceinbreeding.org/ . While there was a tutorial on Galaxy for Crops at the recent BCC2020 conference, this training material is not yet available on the GTN training hub (training.galaxyproject.org). The SlidingWindow tool should be straightforward to use if you have variants in a VCF file: it simply takes VCF input and produces output including Tajima's D.

4) **Is there an option available that would allow one to use galaxy tools on the command line or on the HPC environment?**
Not directly. Galaxy tools are, in general, interfaces to command line tools, though.

5) **Main applications of Galaxy pipeline in metagenomics analysis?**
If by "metagenomic" you mean 16S amplicon data, there are tutorials on that using mothur: https://training.galaxyproject.org/training-material/topics/metagenomics/ - dada2 is also supported as a tool in Galaxy. For shotgun metagenomic data, metaphlan is available on some Galaxy servers (e.g. usegalaxy.eu).

6) **How does the galaxy platform relate to synthetic bio?**
Googling "galaxy synthetic biology" reveals that there is a project to make synthetic biology tools available on the Galaxy platform. More information is available here: http://www.jfaulon.com/galaxy-synbiocad-portal/ As I personally don't know much (anything?) about synthetic biology, I cannot evaluate the tools.
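
To make the statistic from question 3 more concrete, here is a minimal Python sketch of Tajima's D computed directly from a small set of aligned sequences, using the standard formulas from Tajima (1989). It is not how the SlidingWindow tool or vcftools is implemented: the function names are hypothetical, the input is a toy alignment rather than a VCF file, and every character (including gaps or `N`) is naively treated as an allele.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns that contain more than one allele (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (illustrative only)."""
    n = len(seqs)
    if n < 4:
        # For n < 4 the variance term below is zero, so D is undefined.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    s = segregating_sites(seqs)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    pi = mean_pairwise_differences(seqs)

    # Normalising constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimates, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

Calling `tajimas_d(["TAAA", "ATAA", "AATA", "AAAA"])` on a toy alignment where every variant is a singleton gives a negative value, which is the usual signature of an excess of rare variants. For real data, a maintained tool such as the one mentioned above is the better choice.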

7) **What are the strategies that are important to the development and success of open-source bioinformatics tools?**
Torsten Seemann's "Ten recommendations for creating usable bioinformatics command line software" is a useful guide. Also, the Bionitio project aims to provide "templates" for creating command line bioinformatics tools: https://academic.oup.com/gigascience/article/8/9/giz109/5572530 . After you have created a tool that you think other people will find useful, you want to make it easy to install. Conda has become the de facto standard for installing bioinformatics tools. Here is a guide to creating a conda recipe for your tool: https://docs.conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs.html - and the bioconda project is a good place for hosting such recipes: https://bioconda.github.io/ Finally, Galaxy is a great way to make your tool accessible to people who are not using the command line. The GTN has some tutorials on writing Galaxy tool wrappers and packaging your tool for use in Galaxy: https://training.galaxyproject.org/training-material/topics/dev/

8) **What are some other open-source bioinformatics tools, what are their advantages and which ones are safe to analyze large datasets of genomic data?**
There are thousands of open-source bioinformatics tools, so this question is difficult to answer. The majority of bioinformatics publications are based on the use of open-source tools.

9) **What are the most reliable open-source bioinformatics tools for RNA modelling?**
I cannot speak to reliability, especially as I don't know which aspect of RNA modelling is meant, but the Galaxy RNA Workbench (https://github.com/bgruening/galaxy-rna-workbench#training) is a project of the same group that hosts the usegalaxy.eu server and includes a number of RNA-oriented tools. These are available on usegalaxy.eu.

10) **What are the minimum computer specifications to run open-source bioinformatics tools?**
This depends on which tool is being asked about.

BHKi June Virtual Meet-up
===

:::info
- **Topic:** Core competencies in Bioinformatics
- **Location:** Zoom
- **Date:** June 19th, 2020 10:30 PM (EAT)
- **Program**
A brief introduction by the facilitator - 5 min
Presentation by Amel - 30 min
Questions session - 15 min
Live polls - 4 min (Three questions)
Networking in breakout rooms (3 members each) - 10 min (Optional)
- **Presenters:**
    - Amel Ghouila
    - Yo Yehudi
    - Malvika Sharan
    - Caleb Kibet
    - Toby Hodges
- **Contact:** <bioinformaticshubofkenya@gmail.com>, [@BioinfoHub_KE](https://twitter.com/BioinfoHub_KE)
- **Host:** BHKi
:::

## Participants roll call:
Please write your name and email address:
Festus Nyasimi - BHK
Gilbert Kibet-Rono - kibet.gilbert@ymail.com
Ronald Tonui - tonuironald@gmail.com
Michael Landi - BHKi
Lucy Njoki - lucynjokinjuki@gmail.com
Edwin Njuguna- eddyynjuguna@gmail.com
Yo Yehudi - yo@openlifesci.org
Karega Pauline - BHKi
Mugoya Trevor - greena.mugoya@gmail.com
John Oketch -oketchjohn9@gmail.com
Kibogo Phinehas - aramaphinehas@yahoo.com
Toby Hodges / tbyhdgs@gmail.com
Peter Gichuki - pgichuki1@gmail.com
Stephen Kanyerezi - kanyerezi30@gmail.com
Harriet Natabona - nattybona2012@gmail.com
Taremwa Yoweri-tyoweri@gmail.com
Japheth Kipkulei- jkipkulei@gmail.com
David Kiragu- BHKi
Caleb Kibet
cecilia katunge email:
justus kyalo kasivalu, cvl kabete, kyalo988@gmail.com
Jane Njaramba-njanekagure@gmail.com
Kiberu Davis: kiberu.i.davis@gmail.com
Martha Luka : mawia.martha@gmail.com
Kennedy Mwangi: wanjaukm@gmail.com
Stephen Njuguna: sephoh.njuguna@gmail.com
Evalyne wambui -samanthaeva98@gmail.com
Simeon Hebrew - simeonhebrew@gmail.com
Shamim Osata-shamimosata@gmail.com
Stella Esther Nabirye - stellanabirye@gmail.com
Yves Hermandez Tchiechoua - yvestchiechoua@yahoo.fr
Brenda Makena - brenda.mugambi@yahoo.com
Armel Tangomo Ngnintedem - tangomoarmel25@gmail.com
Erick Nyaga - ericknyaga21@gmail.com
Suzanne Njuki-Suzannenjuki@gmail.com
Samuel Oduor - samordil@gmail.com
Jacqueline Waeni - jacqwaeni@gmail.com
Edna Wanjiru - ednamacharia@gmail.com
Peter van Heusden - SANBI, UWC, South Africa
Shahiid Kiyaga-ashakykiyaga91@gmail.com
Emmanuel J. Mande - emande@idi.co.ug
Parwos Abraham -parwosabraham@yahoo.com

**Question phase 1 :** These are questions submitted by participants prior to the meeting.

Amel
1. Application of bioinformatics in the fight against COVID-19, particularly in coming up with vaccines against SARS-CoV-2 and other coronaviruses
2. What are the ethics in bioinformatics and its practices?

Ethics in Bioinformatics - There is a need to understand where data comes from and whether it was consented or not. There is also a need to understand, when designing tools, where and for what these tools will be used. Inclusivity in the design of tools, in order to serve communities better, is also key.
Applications in the fight against COVID-19 - Useful in helping to understand the data and how to interpret it. It is also important to know how to communicate results.

How can we increase bioinformatics awareness among scientists trained in the pre-bioinformatics era and help them recognize the potential of bioinformatics in their studies? (from chat)

Malvika and Yo
1. What is open science? What is reproducible research? Why should we care? Is open distance learning a form of open educational resource?
- Practice of sharing science: results, methods, code.
- Crediting people for their output
- Citizen participation
- Reproducibility: redo, verify, use a different technique and get the same results.
- Be thoughtful. Share what you can, don't share when you shouldn't: be as open as possible and as closed as necessary.
- Open distance learning: only when it is shared after delivery and others are allowed to reuse it. Share all supporting resources.
2. Is open science attainable in all fields of biological research and in underdeveloped research settings?
(This will also provide an opportunity to talk about OLS-2 and how interested individuals can apply.)
It is attainable for some studies; some may be limited in how much data or which techniques can be shared (always check the embargoes attached to your study), but overall it is attainable.

OLS - https://openlifesci.org/ - applications open until June 30 2020, webinar on the 23rd. (Visit openlifesci.org for joining details!) For OLS questions please contact team@openlifesci.org, yo@openlifesci.org, or on Twitter: @openlifesci, @yoyehudi. Very happy to answer any questions after the call!

Toby
1. What is the advice to an individual interested in bioinformatics who has basics only? What are the key basic skills required for someone with no prior knowledge of bioinformatics?
- key skills:
    - search, filter, extract, cross-reference data from large databases - make use of data & knowledge that’s already out there!
    - sequence alignment - core concept in evolution/phylogeny, functional genomics, genome assembly, differential expression analyses, transcriptomics, metagenomics, etc.
    - parsing data - reading data from many different (often messy!) file formats
    - organisation - keep track of what/where your data is, which analyses you’ve run, with what parameters/settings, etc.
- start with web-based tools e.g. EMBL-EBI (https://www.ebi.ac.uk/services)/NCBI (https://www.ncbi.nlm.nih.gov/) resources
    - EBI Train Online (https://www.ebi.ac.uk/training/online/) has a huge amount of freely-accessible content introducing the fundamental concepts & guiding users on getting started
- once you begin working with larger amounts of data, you’ll probably need to learn some command line computing (avoid long waits/costs of uploading data & downloading results)
    - many great resources to learn the basics, e.g.
Software Carpentry Shell (http://swcarpentry.github.io/shell-novice/) & Data Carpentry Genomics (https://datacarpentry.org/shell-genomics/) lessons
    - practice command line in your browser: https://cli-boot.camp
- The Galaxy platform (https://galaxyproject.org/) provides a fantastic GUI alternative (https://usegalaxy.eu/) for those unfamiliar with command line computing
    - Learn Galaxy (https://galaxyproject.org/learn/) and Galaxy Training Network (https://training.galaxyproject.org/) have many excellent tutorials to learn the platform and bioinformatics simultaneously
    - Galaxy can be installed locally (https://galaxyproject.org/admin/get-galaxy) to avoid upload/download of data over the Internet, but requires access to an available server and some knowledge of server administration
- some understanding of statistics is also necessary
    - Bernd Klaus’ teaching material (https://www.huber.embl.de/users/klaus/teaching.html#statistical-methods-in-bioinformatics) is a good place to start (if you know R) & Modern Statistics for Modern Biology (https://www.huber.embl.de/msmb/introduction.html) by Huber & Holmes is a more comprehensive, but less accessible, guide to modern methods
- other good, free, online resources I know of for learning bioinformatics:
    - H3ABioNet Resources: Online Training (https://www.h3abionet.org/training), Workshops (https://www.h3abionet.org/training)
    - Simon Cockell’s Lockdown Learning Bioinformatics-along (https://www.youtube.com/playlist?list=PLzfP3sCXUnxEu5S9oXni1zmc1sjYmT1L9) videos
    - Applied Computational Genomics (https://github.com/quinlan-lab/applied-computational-genomics) from Aaron Quinlan’s Lab at University of Utah; for more, see “Teaching” section of http://quinlanlab.org/
    - more at https://bio-it.embl.de/online-learning/
    - Prof David Tabb's lectures on bioinformatics and proteomics: https://pickingupthetabb.wordpress.com/building-a-bioinformaticist/free-online-training-in-bioinformatics-and-biostatistics/
    - Materials from the 2020 SANBI Bioinformatics Course
http://biocourse.wp.sanbi.ac.za/?doing_wp_cron=1592560734.8887441158294677734375
- Finally: know that, if you’re spending a lot of time searching the internet for help/answers, you’re not alone! (search first, to see if your question was already asked by someone else!)
    - http://seqanswers.com/
    - http://www.biostars.org/
    - Twitter - if you follow the right people - is very useful for staying up to date with the field
        - does anyone have Twitter lists that they recommend?
    - rather more noisy, but also has a Slack: https://www.reddit.com/r/bioinformatics/
    - for parasitologists (with some e-resources): https://twitter.com/parasiteslack?lang=en which also has an associated Slack
- I’m sure many on this call can recommend other great resources! Please add links below
    - Peter van Heusden: Galaxy track at BCC2020 https://bcc2020.github.io/
    - Yo: +1 to this - there's also a lot of great low-cost virtual bioinformatics training available at BCC2020

2. Is there a specific programming language preferred in the bioinformatics field?
- Python & R are equally popular and great places to start - free, open source, easy to install, huge online community, many resources to help you learn
- Choose whichever language your friends/colleagues are already using - I suspect this is the single biggest predictor of success
- Otherwise: Python is good for image analysis (so is ImageJ/Fiji, which provides a graphical interface), and more broadly applicable/useful outside bioinformatics; R has more cutting-edge statistical methods because of Bioconductor (http://bioconductor.org/)
- if using/learning Python: check out Biopython (https://biopython.org/)

3. How to build strong skills in a given programming language for data analysis and visualization?
- study other people’s code - how do they do what they do?
- Python: learn numpy; pandas; matplotlib
    - Use JupyterLab or Jupyter Notebook
    - To install JupyterLab etc: https://www.anaconda.com/products/individual
- R: learn Tidyverse (dplyr; readr; tidyr; purrr; ggplot2; etc)
    - Use RStudio; work in RMarkdown
- Make it Open & Reproducible
    - https://github.com/BioinfoNet & https://bioinfonet.github.io/OpenScienceKE/
    - https://openlifesci.org/ (OLS-2 applications now open!)
- data analysis:
    - Jake Vanderplas’s Data Science Handbook (Python) (https://jakevdp.github.io/PythonDataScienceHandbook/)
    - Hadley Wickham’s R for Data Science (https://r4ds.had.co.nz/)
    - Wes McKinney’s Python for Data Analysis (https://wesmckinney.com/pages/book.html)
        - sadly not free: eBook PDF (no DRM) costs ~€34
    - Python for Biologists - older edition free (http://userpages.fu-berlin.de/digga/p4b.pdf), newer edition not (https://pythonforbiologists.com/)
- for installing command line software locally on Mac or Linux, the conda project (from Anaconda, listed above) and bioconda are good: https://bioconda.github.io/
- Rosalind (http://rosalind.info/) provides programming challenges that will help you to simultaneously develop programming skills and insight into bioinformatic algorithms & approaches
- for data viz: use an interactive environment like Jupyter or RStudio - makes iterating over/exploring new visualisations much more fun.
- I’m sure many on this call can recommend other great resources! Please add links below

Caleb
1. What are legal frameworks around bioinformatics and e-health in Kenya?
- There is no specific regulation that affects bioinformatics, but the recent Personal Data Protection Act would affect those dealing with personal genomic data. For e-health, the Kenya Health Policy Framework is still being used, although a conversation about regulating e-health is ongoing.
2. I am enthusiastic about delving into bioinformatics although my background is epidemiology and biostatistics, how do I transition to more bioinformatics work?
Toby has provided excellent answers to this question. Specifically, seek to build your molecular biology and genomics skills. As Amel mentioned, choose a path and develop the skills required for that competency.
3. How can we improve reproducibility in research?
See the tools Toby has shared.
- Document your work and include the required metadata with your data
- Share the data, where possible and consented
- Share your code
- Use literate programming to share your analysis and results
- At a higher level, make use of workflow languages and containers.

Question 2 : These are questions arising from the session. Members can type below:

Insight on how to transition from a user to a scientist?
Understanding how tools work and getting into coding and programming is a good place to start.

Please elaborate on the use of bioinformatics in cancer care and personalised medicine - (from chat)

Networking: In the breakout rooms please share:
- Name and institution you are affiliated with.
- A brief description of your current work in bioinformatics. Students can share their experience in bioinformatics so far.

NB: 3 min for each member

Resources for 3-month coding internships (some bioinformatics orgs participate)
- https://www.outreachy.org/ - paid internships that happen twice a year, in summer for the northern hemisphere and for the southern hemisphere.
- Google Summer of Code https://summerofcode.withgoogle.com/ - unfortunately northern hemisphere-centric. Start applying around February/March.
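
Returning to the reproducibility advice above (document your work and keep the required metadata alongside your data), the following is a minimal Python sketch of one way to do that: writing a small JSON "sidecar" file next to an input file, recording its checksum, the parameters used, and the software environment. The function names, the `.run.json` suffix, and the fields recorded are all assumptions made for illustration, not an established standard; workflow languages and containers, as mentioned above, address this more thoroughly.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_run_record(data_path, parameters, record_path=None):
    """Write a JSON sidecar describing one analysis run; return it as a dict.

    `parameters` is any JSON-serialisable dict of the settings used.
    """
    data_path = Path(data_path)
    record = {
        "input_file": data_path.name,
        "input_sha256": sha256_of(data_path),
        "input_bytes": data_path.stat().st_size,
        "parameters": parameters,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command": sys.argv,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    if record_path is None:
        # Hypothetical naming convention: <input file name>.run.json
        record_path = data_path.parent / (data_path.name + ".run.json")
    Path(record_path).write_text(json.dumps(record, indent=2))
    return record
```

For example, `write_run_record("reads.fastq", {"min_quality": 20})` would create `reads.fastq.run.json` in the same directory, which can then be shared together with the code and, where consented, the data.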
