###### tags: `R` `reshape` `reshape2::dcast` `with` `tidyr::spread` `tidyr::gather` `tidyr::separate` `reshape2::melt`

# R user group meets @ QIMR

![Sep 2019](https://i.imgur.com/cUdNIY9.jpg)

---

## Our speakers in 2020

**November** TBA

**December** TBA

---

## Upcoming meeting

### Meeting 18
* **Time**: **1-2pm, Tuesday 24 November 2020, AEST**
* **Bancroft Auditorium, Level 6, Bancroft Building, QIMR Berghofer**
* Join from PC, Mac, Linux, iOS or Android via https://qimrberghofer.zoom.us/j/85788177791?pwd=TTVFdGcxYmFNaUtvREZoY1hKbHRjdz09
* Password: 339501
* **Speaker**: Dr Olga Kondrashova
* **Affiliation**: Medical Genomics Group, QIMR Berghofer Medical Research Institute
* **Topic**: Tidyverse in action
* **Summary**: In this meeting, I will demonstrate key tidyverse packages and verbs that I use for my everyday data wrangling, summarising, and plotting. In particular, I will cover:
  * dplyr and purrr packages
  * ggplot2
  * cowplot

**RSVP** RSVP is required. This allows us to share presentation slides. Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2020-11-24). The link to the Zoom meeting can be copied in the RSVP form.

---

## Past meetings and presentation content

### Meeting 17
* **Time**: **1-2pm, Tuesday 27 October 2020, AEST**
* **Bancroft Auditorium, Level 6, Bancroft Building, QIMR Berghofer**
* Join from PC, Mac, Linux, iOS or Android via https://qimrberghofer.zoom.us/j/84518168113
* **Speaker**: Rebecca Johnston
* **Affiliation**: QIMR Berghofer Medical Research Institute
* **Topic**: R programming tips from the group
* **Summary**: In this special meeting, I will present R programming tips on behalf of the R user group members.
  These programming tips are largely based on tidyverse packages and include:
  * Dealing with file paths across operating systems
  * Reading multiple files into a single data frame
  * Summarising and reshaping data
  * Microbenchmarking code

**RSVP** RSVP is required. This allows us to share presentation slides. Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2020-10-27)

### Meeting 16
* **Time**: **1-2pm, Tuesday 29 September 2020, AEST**
* Join from PC, Mac, Linux, iOS or Android via https://uqz.zoom.us/j/96267953602
* **Speaker**: Jack Galbraith
* **Affiliation**: UQ Diamantina
* **Topic**: Clustering, classification and model assessment in R
* **Summary**: An introduction to K-means and hierarchical clustering as well as KNN, Naive Bayes, discriminant, and logistic classifiers using examples from the 'MASS', 'class' and 'naivebayes' packages. Data-splitting for training and test sets, clustering accuracy, and cross-validation will be covered using both the caret package and basic R. This should cover the basics for many machine learning projects.
* **RSVP**: RSVP is required. This allows us to send you a short notice if there is a last-minute change or to share presentation slides. Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2020-09-29)

PDF version, complete with comments, available at: https://jg648.github.io/ClustClass&Caret.pdf

Code only, available below:

```r!
library(knitr)
library(scatterplot3d)
library(MASS)
library(class)
library(caret)

str(iris)

# K-means on two simulated clusters
set.seed(100)
x<-c(rnorm(100, mean = 5, sd=1), rnorm(100, mean = 17, sd = 4))
y<-c(rnorm(100, mean = 50, sd = 7), rnorm(100, mean = 10, sd = 2))
xydata<-data.frame(x, y, stringsAsFactors = F)
ktest<-kmeans(xydata, 2, nstart=5)
plot(x,y, col=ktest$cluster)
ktest<-kmeans(xydata, 5, iter.max=10, nstart=5)
plot(x,y, col=ktest$cluster)

# K-means on iris (species column dropped), k = 3 then k = 5
k=3
Ktest<-kmeans(iris[,-length(iris)], k, iter.max = 20, nstart = 10)
head(Ktest)
colorss<-c(2:(2+k-1))
shapes<-as.numeric(iris$Species)
scatterplot3d(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Width,
              color = colorss[Ktest$cluster], angle=70, pch = shapes)
k=5
Ktest<-kmeans(iris[,-length(iris)], k, iter.max = 20, nstart = 10)
colorss<-c(2:(2+k-1))
scatterplot3d(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Width,
              color = colorss[Ktest$cluster], angle=70, pch = shapes)

# Scree plot: within-cluster SS as a fraction of total SS for 2 to 10 clusters
screepltVals<-c()
for(i in 2:10){
  Ktest<-kmeans(iris[,-length(iris)], i, iter.max = 20, nstart = 10)
  reslt<-Ktest$tot.withinss/Ktest$totss
  screepltVals<-append(screepltVals, reslt)
}
plot(x=2:10, screepltVals, type="l", xlab = "# Clusters", ylab = "WithinSS/TotSS")

# Hierarchical clustering
D.<-dist(iris[,-5])
H.<-hclust(D., method = "ward.D2")
# hclust labels are indexed by observation, so they must follow the original row order
H.$labels<-as.character(iris[,5])
plot(as.dendrogram(H.))
rect.hclust(H., 3, border = "red")
HClusts<-cutree(H., k=3)
#identify(H., function(x) print(x))

# caret: feature plots and data splitting
featurePlot(iris[,-5], iris$Species, plot="pairs", col=c(3,4,5))
irisPart.<-createDataPartition(iris$Species, p=0.8, list=F)
head(irisPart.)
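
# Sketch (not part of the original talk): one way to use a partition index like
# the one above to build training and test sets, then score a KNN classifier on
# the held-out rows. trainIdx, irisTrain, irisTest and knnPred are illustrative
# names, and k = 5 is an arbitrary choice.
library(caret)
library(class)
set.seed(100)
trainIdx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
irisTrain <- iris[trainIdx, ]   # rows selected by the partition
irisTest <- iris[-trainIdx, ]   # remaining rows, held out for testing
knnPred <- knn(train = irisTrain[, -5], test = irisTest[, -5],
               cl = irisTrain$Species, k = 5)
table(knnPred, irisTest$Species, dnn = c("Predicted", "Actual"))
mean(knnPred == irisTest$Species)  # hold-out accuracy
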
irisPart.<-createFolds(iris$Species, k=5)
kable(head(data.frame(irisPart.)),caption = "Example of k-fold Cross-validation")
irisPart.<-createResample(iris$Species, times=10)
names(irisPart.)<-paste("Btsmp", 1:10)
kable(head(data.frame(irisPart.)), caption = "Example of Bootstrapping Sampling")

fitControl.boot<-trainControl(method="boot", number=10)
KNN.caret.boot<-train(Species~., data=iris, trControl=fitControl.boot, method="knn")
KNN.caret.boot
table(predict(KNN.caret.boot, iris), iris$Species, dnn = c("Predicted", "Actual"))
RF.caret.boot<-train(Species~., data=iris, trControl=fitControl.boot, method="rf")
RF.caret.boot
table(predict(RF.caret.boot, iris), iris$Species, dnn = c("Predicted", "Actual"))

fitControl.CV<-trainControl(method="LOOCV", p=0.9)
pcaIRIS<-predict(preProcess(iris, method="pca"), iris)
kable(head(pcaIRIS), caption="Example of PCA preprocessing")
KNN.caret.PCAboot<-train(Species~., data=pcaIRIS, trControl=fitControl.boot, method="knn")
KNN.caret.PCAboot
featurePlot(pcaIRIS[,-1], y=pcaIRIS$Species, plot="pairs")
featurePlot(pcaIRIS[,-1], y=predict(KNN.caret.PCAboot, pcaIRIS), plot="pairs")
featurePlot(pcaIRIS[,-1], y=predict(KNN.caret.PCAboot, pcaIRIS), plot="density")
```

---

### Meeting 15
* **Time**: **1-2pm, Tuesday 25 August 2020**
* **Meeting room 2, Level 6, Central Building, QIMR Berghofer (Seating restricted to 10)**
* **Speaker**: Dr. Dwan Vilcins
* **Affiliation**: Children’s Health and Environment Program, Child Health Research Centre, UQ
* **Topic**: Tidy models and plotting in R, with Broom and Dotwhisker packages
* **Summary**: Plotting estimates from multiple models is made easy by combining the broom package (for tidy models) and dotwhisker (for plotting). This session will give an introduction to both packages.
* **RSVP**: RSVP is required.
  Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2020-08-25)
* **Laptop** Attendees are encouraged to bring their laptops and try the R code provided by the speaker

```r!
# Tidy models for plotting with Broom and DotWhisker
# R peer group August meeting
# Author: Dwan Vilcins
# Date: 21 August 2020

library(tidyverse)
library(broom)
library(lmerTest)
library(dotwhisker)

data(mtcars)
glimpse(mtcars)

# Broom package
# Useful for making tidy model output, that can be used for other purposes.
# 3 main functions: tidy, augment and glance

# A simple example
mod1 <- lm(mpg ~ wt, mtcars)
summary(mod1)

# What if you want the coefficients in a matrix?
# option 1
coef(summary(mod1))
# option 2
mod1_tidy <- tidy(mod1)
mod1_tidy
class(mod1_tidy)

# Predictions and residuals - use augment
mod1_aug <- augment(mod1)
mod1_aug
plot(mod1_aug$.resid)

# Summarise your regression model - use glance
mod1_glance <- glance(mod1)
mod1_glance

# What about mixed models?
# Use broom.mixed
data(swiss)
glimpse(swiss)

mod2 <- lm(Fertility ~ Catholic, data = swiss)
summary(mod2)
mod2a <- lm(Fertility ~ Catholic + Education, data = swiss)
summary(mod2a)

# Create a categorical education variable
summary(swiss$Education)
hist(swiss$Education)
swiss$educat <- cut(swiss$Education, c(0, 5, 20, 53),
                    labels = c("Low", "Medium","High"))
table(swiss$educat, swiss$Education)
table(swiss$educat)

swiss %>%
  ggplot(aes(x = Catholic, y = Fertility, group = educat, colour = educat)) +
  geom_line() +
  geom_point()

mod3 <- lmer(Fertility ~ Catholic + (1|educat), data = swiss)
summary(mod3)
anova(mod3)

library(broom.mixed)
tidy(mod3)

# How can I plot tidy models
# Lots of options, but my fav is dotwhisker
# Dotwhisker is intuitive, easy to use, and allows easy plotting of multiple model estimates

# Simple example
dwplot(mod2a)

# Plotting multiple models
# If plotting multiple models with same data it can be easy to do in one step
mod4 <- mtcars %>%
  group_by(am) %>%                                          # group data by trans
  do(tidy(lm(mpg ~ wt + cyl + disp + gear, data = .))) %>%  # run model on each grp
  rename(model=am) %>%                                      # make model variable
  relabel_predictors(c(wt = "Weight",                       # relabel predictors
                       cyl = "Cylinders",
                       disp = "Displacement",
                       gear = "Gear"))
mod4
dwplot(mod4)

# If plotting models from different data sets, you can use rbind on tidy models to make one useful df

# Now, let's play with our plot!
class(dwplot(mod4))

# Add a vline
dwplot(mod4, vline = geom_vline(xintercept = 0, colour = "grey60", linetype = 2))

# Change the plot features
dwplot(mod4, vline = geom_vline(xintercept = 0, colour = "grey60", linetype = 2)) +
  theme_bw()+
  labs(x = "Estimated mpg change",
       title = "Gas mileage by transmission",
       colour = "Transmission") +
  scale_colour_manual(labels = c("Automatic", "Manual"),
                      values = c("purple", "green"))
```

---

### Meeting 14
* **Time**: **1-2pm, Tuesday 28 July 2020**
* **Seminar room, Level 6, Central Building, QIMR Berghofer**
* **Speaker**: Lun-Hsien CHANG
* **Affiliation**: Immunology in Cancer and Infection, QIMR Berghofer Medical Research Institute
* **Topic**: Tackling repetitive tasks with R: an introduction to vectorised programming and parallel programming
* **Summary**: In this special meeting, I will cover the basics of the following topics:
  * Serial versus parallel computing
  * Vectorised operations
  * for loop
  * Some functions in the apply family
  * The parallel, doParallel and doSNOW packages

  I will also compare the time used by a self-written function run within a for loop, lapply, sapply, and a foreach loop. Users are encouraged to view my source code and documentation at [R user group meeting 14](https://hackmd.io/@Chang/R-user-group-meeting14)
* **RSVP**: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2020-07-28)

---

### Meeting 13
* **Time**: **1-2pm, Tuesday 25 Feb 2020**
* **Seminar room, Level 6, Central Building, QIMR Berghofer**
* **Speaker**: Lun-Hsien CHANG
* **Affiliation**: Genetic epidemiology lab, QIMR Berghofer Medical Research Institute
* **Topic**: Introduction to a data dashboard: your ultimate information management tool that enables real-time data visualisation and reporting
* **Summary**: In this meeting, I will demonstrate dashboards created in Microsoft Power BI.
  Users will be able to create their own dashboards via a graphical user interface (Microsoft Power BI).
* **RSVP**: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2020-02-25)

![](https://i.imgur.com/zLCDNWY.png)

---

### Meeting 12
* Time: **1-2pm, Wednesday 27 Nov 2019**
* **Seminar room, Level 6, Central Building, QIMR Berghofer**
* Speaker: Jeffrey Molendijk
* Affiliation: Hill Group, Precision and Systems Biomedicine, QIMR Berghofer Medical Research Institute
* Topic: [10 tips to R programming from the community](https://hackmd.io/@Chang/R-user-group-meeting12)

  In this special meeting, I will present R programming tips on behalf of the R user group members. These tips include data visualization using ggplot2, tidying up messy output from statistical functions, simultaneously merging multiple data frames, sorting data in a user-defined order, and examining and modifying data structures.
* RSVP: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2019-11-27)

---

### Meeting 11
* Time: **1-2pm, Wednesday 16 Oct 2019**
* **Meeting room 1, Level 6, Central Building, QIMR Berghofer**
* Speaker: Jessica Sexton
* Affiliation: University of Queensland – Mater Research Institute
* Topic: Spatial Analysis of Weather Data & Stillbirth Risk

  Today, we will discuss a method to import and collate raw weather station files from the Bureau of Meteorology and link them to clinical data, to explore whether the rate of stillbirth is associated with rainfall or ambient temperature.
* RSVP: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2019-10-16)

### Meeting 10
* Time: **1-2pm, Wednesday 25 September 2019**
* **Seminar room, Level 6, Central Building, QIMR Berghofer**
* Speaker: Stéphane Guillou
* Affiliation: Digital Scholars Hub (Library), University of Queensland
* Topic: **Gentle introduction to Git for R users**

  Git is a version control system that is widely used to save a history of anything text-based. Mainly used for collaborative programming, it can also help to write documentation and theses, back up files, or collaborate on an R package and make it available to the public on sites like GitHub or GitLab. This talk will introduce the main Git commands, explain RStudio's Git integration, and demonstrate how it can be useful for R users.
* RSVP: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2019-09-25)

### Meeting 09
* Time: **1:10 - 2:30 PM, Wednesday 28 August 2019**
* **Meeting room 2, Level 6, Central Building, QIMR Berghofer Medical Research Institute**
* Speaker: Lun-Hsien Chang and Dwan Vilcins
* Topic: (1) String manipulation in R and (2) markdown in R for beginners. This joint presentation will cover extracting patterns from string data, reading non-delimited text, string replacement and basic documentation skills using R markdown.
* RSVP: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2019-08-28)

**Manipulating string data in R**

[Slides viewable on Google drive](https://drive.google.com/open?id=1Atnrxe65BWhzGd1NGRcsupoDtbqhMbP5)

**Markdown for beginners - annotated script**

```r!
## Introduction to markdown
## This annotated script has been created to help new users learn markdown
## Author: Dwan Vilcins
## Date: 26 August 2019

## This is taken from my talk, and I have annotated this script so it
## steps users through the process.
## More advanced users will find it easier to produce their script directly
## in markdown, but I find it easier to write my script first
## and cut and paste the parts I want into markdown

library(tidyverse)
library(tableone)  # Creates a descriptive table object
library(psych)     # For describe and cs
library(sjPlot)    # for tab_model
library(gapminder) # Data for this example

# The below section is some example code for using in markdown ---------------------
# Part 1 - the syntax

# Examine the data
glimpse(gapminder)
describe(gapminder)

# Create a summary table
myvars <- cs(year, lifeExp, pop, gdpPercap, continent)
tabone <- CreateTableOne(vars = myvars, strata = "continent", data = gapminder)
print_tabone <- print(tabone, showAllLevels = TRUE, quote = FALSE, noSpaces = TRUE,
                      printToggle = FALSE, nonnormal = FALSE, contDigits = 2)
print_tabone
# I use knitr::kable(print_tabone) in markdown to get a nicely formatted table

# Plot life expectancy by gdpPercap
theme_set(
  theme_bw() +
    theme(legend.position = "top")
)
# alpha is a layer setting, so it belongs in geom_point() rather than ggplot()
ggplot(gapminder, aes(gdpPercap, lifeExp, colour = continent)) +
  geom_point(alpha = 0.5)
ggplot(gapminder, aes(y = lifeExp, x = continent)) +
  geom_boxplot()

# Make a simple regression
summary(object_a <- lm(lifeExp ~ gdpPercap, data = gapminder))
# I use tab_model(object_a) in markdown to get a formatted summary table
save(object_a, file = "object_a.RData")

# Make another regression
summary(object_b <- lm(lifeExp ~ gdpPercap + pop, data = gapminder))
save(object_b, file = "object_b.RData")

# Part 2------------------------------------------------------------------
# At this point you are ready to make a markdown document.
# Click the down arrow next to the new document icon on the top left of screen,
# and select R markdown
# This opens a new tab with the prefilled markdown code chunks as an example
# The first part is the yaml - use this for title, date, and document type
# You can change the document type by changing the word/html/pdf to another type
# Next add your libraries, data frames and other objects to the set-up chunk
# Use a markdown cheat sheet to set any other global options here
# (such as whether your syntax is printed, with the echo = TRUE option)
# Open a new chunk with control-alt-i or the insert button on the top right
# Cut and paste your code into the grey code chunks
# Write your text in the white gaps. You can format it, and the cheat sheet
# will help you learn the syntax for this.
# When complete, click on knit to get your new markdown document

# Part 3-----------------------------------------------------------------------
# Some common errors ----
# The statements in each example below are deliberately in the wrong order

# Running a function without loading the library first
truehist(gapminder$lifeExp)
library(MASS) # Need this first

#--------
# Using an object without loading it first (or creating it in markdown first)
plot(object_b)
load("object_b.RData") # Need this first

# ----------
# Using only part of the code in markdown that is required to get the desired output
tabone_noStrata <- CreateTableOne(vars = myvars1, data = gapminder)
print_tabone_nostrata <- print(tabone_noStrata, showAllLevels = TRUE, quote = FALSE,
                               noSpaces = TRUE, printToggle = FALSE,
                               nonnormal = FALSE, contDigits = 2)
knitr::kable(print_tabone_nostrata)
myvars1 <- cs(year, lifeExp, pop, gdpPercap) # Need this first
```

### Meeting 08
* Time: **1-2:30 PM, Wednesday 31 July 2019**
* **Seminar Room, Level 6, Central Building, QIMR Berghofer Medical Research Institute**
* Speaker: Ahmed Mohamed
* Topic: Introduction to R package development and quasiquotation: a demonstration of how you can transform your workflows into a publishable R package.
* RSVP: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2019-07-31)

### Meeting 07
* Time: **1-2 PM, Wednesday 26 June 2019**
* **Seminar Room, Level 6, Central Building, QIMR Berghofer Medical Research Institute (booked 12:30PM- 2PM)**
* Speaker: Muhammad Khan
* Topic: Generalized additive model (GAM): A non-linear approach to modeling temperature and bio-markers
* RSVP: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2019-06-26)

### Meeting 06
* Time: **1-2 PM, Wednesday 29 May 2019**
* **Seminar Room, Level 6, Central (booked 12:30PM- 2PM)**
* Speaker: Javier Cortes Ramirez
* Topic: Drawing maps with the Tmap package in R
* RSVP: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2019-05-29)

#### How to highlight your R code in hackMD?
After you click the insert code icon, type `r!` on the first line, as shown in the following screenshot. Note that hackMD supports syntax highlighting for most open source languages.

##### Highlighting R code
![](https://i.imgur.com/BMvFZTJ.png)

##### No highlighting
![](https://i.imgur.com/knipffi.png)

#### Peer group problem
We have our first online member problem! Karen needs help to recreate a graph. The purpose of this group is to provide advice and support, and I encourage you to engage with this problem. Please put any code snippets below or email them to Dwan.

This is the plot to be created:

![](https://i.imgur.com/jF4Goex.png)

Data can be provided. Please email Dwan to request a copy: d.vilcins@uq.edu.au

#### Solution to plot problem
Thanks to Stephane from CDS for providing the first lot of code for this problem. Can anyone address the next part outlined in the last line of code?

```r!
# create sample dataset ----
data <- data.frame(participant = 1:20,
                   tiring_light = sample(1:5, 20, T),
                   taxing_straighforward = sample(1:7, 20, T),
                   impossible_manageable = sample(1:2, 20, T),
                   boring_inspiring = sample(3:6, 20, T),
                   difficult_easy = sample(3:7, 20, T),
                   useless_useful = sample(5:6, 20, T))

# reshape for tidy data ----
library(tidyr)
data_long <- gather(data,
                    key = question,
                    value = answer,
                    useless_useful:tiring_light)

# summarise to get useful stats ----
library(dplyr)
summary <- data_long %>%
  group_by(question) %>%
  summarise(mean = mean(answer),
            median = median(answer),
            sd = sd(answer))

# with ggplot2 ----
# boxplot with original long data
library(ggplot2)
ggplot(data_long, aes(x = question, y = answer)) +
  geom_boxplot() +
  coord_flip()

# linerange with summarised stats: mean and standard deviation
p <- ggplot(summary, aes(x = question, ymin = mean - sd, ymax = mean + sd)) +
  geom_linerange() +
  geom_point(aes(y = mean)) +
  coord_flip()
p # show the plot
# ... we can add a vertical bar midway
p <- p + geom_hline(yintercept = 3.5, linetype = "dashed", colour = "grey")
p
# ... and add tick for all values
p <- p + scale_y_continuous(breaks = 1:7)
# ... and change the look
p + theme_linedraw() + ylab("answer")

# with ggpubr ----
# ggpubr has convenient functions to directly visualise summary statistics
library(ggpubr)
# errorplot with original long dataset
# the default is mean and standard error:
ggerrorplot(data_long, x = "question", y = "answer", orientation = "horizontal")
# we can change the stat for mean and standard deviation:
ggerrorplot(data_long, x = "question", y = "answer", orientation = "horizontal",
            desc_stat = "mean_sd")
# or mean and 95% confidence interval:
ggerrorplot(data_long, x = "question", y = "answer", orientation = "horizontal",
            desc_stat = "mean_ci")

# what next: how could we split the question names and use a secondary axis?
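
# ... one possible answer (a sketch, not from the original thread): split each
# question name at the underscore and show the two halves on opposite axes.
# plot_split_labels(), low_label, high_label and position are illustrative names.
# The questions are placed on a continuous axis here, an assumption made so that
# sec_axis() can be used for the second set of labels.
library(ggplot2)
library(tidyr)
plot_split_labels <- function(stats) {
  # stats needs the columns question, mean and sd, like the summary table above
  stats <- separate(stats, question, into = c("low_label", "high_label"),
                    sep = "_", remove = FALSE)
  stats$position <- seq_len(nrow(stats))
  ggplot(stats, aes(x = position, y = mean, ymin = mean - sd, ymax = mean + sd)) +
    geom_linerange() +
    geom_point() +
    scale_x_continuous(breaks = stats$position,
                       labels = stats$low_label,
                       sec.axis = sec_axis(~ ., breaks = stats$position,
                                           labels = stats$high_label)) +
    scale_y_continuous(breaks = 1:7) +
    coord_flip() +
    theme_linedraw() +
    labs(x = NULL, y = "answer")
}
# Usage with the summary table built above: plot_split_labels(summary)
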
```

### Meeting 05
* Time: **1-2 PM, Wednesday 24 April 2019**
* **Seminar Room, Level 6, Central (booked 12:30PM- 2PM)**
* Speaker: Jessica Sexton
* Topic: Predictive modelling/calibration/validation: code and packages
* RSVP: Please use our [Google Form to RSVP](https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?entry.1631135870=2019-04-24)

### Meeting 05 - Cancelled
[Miles McBain's LinkedIn profile](https://www.linkedin.com/in/miles-mcbain-58941365/)

[Miles McBain's GitHub](https://github.com/milesmcbain)

Please view Miles's profile, and let Chang know what topic you would like to hear Miles speak on. He has an interesting background, and is a strong programmer, so there may be an aspect of his experience you would like to draw upon.

---

### Meeting 04
* time: **1-2 PM, Wednesday 27 Feb 2019**
* **Seminar Room, Level 6, Central (booked 12:30PM- 2PM)**
* speaker: (1) Dwan Vilcins, ~~(2) Muhammad Khan~~
* topic: (1) binary and special operators in R. This brief presentation will introduce the most common binary and special operators in R, including the pipe operator. ~~(2) Introduction to Generalised Additive Modelling (GAM) in R.~~
* ~~Registration is now required to attend the meeting.~~ Please [sign up](https://goo.gl/forms/tWT2galKZMsfjKlm1)

Stéphane mentioned this eLife "computationally reproducible" article that he's excited about: https://repro.elifesciences.org/example.html

#### Special operators in Base R
* `%%` Modulo
* `%in%` Match
* `%/%` Integer divide, binary
* `%*%` Matrix product, binary
* `%o%` Outer product, binary
* `%x%` Kronecker product, binary

#### Match operator
```r!
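# Sketch (not from the original talk): the base R special operators listed
# above, tried on small inputs. m and sq are illustrative names.
7 %% 3              # modulo: the remainder of 7 / 3, which is 1
7 %/% 3             # integer division: 2
m <- matrix(1:4, nrow = 2)
sq <- m %*% m       # matrix product of m with itself
sq
(1:3) %o% (1:2)     # outer product: a 3 x 2 matrix
diag(2) %x% m       # Kronecker product: a 4 x 4 block matrix
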
# %in% operator
data(iris)
str(iris)

# Create a vector of values I am interested in
size <- c(1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9)

# Use %in% to see if these values are in my dataframe (returns a logical vector)
iris$Petal.Width %in% size

# Subset with %in%
med_petals <- iris[iris$Petal.Width %in% size,]
head(med_petals)
```

#### Custom operators
Make your own operators with the following syntax

```r!
`%anything%` <- function(x, y) {…}
```

#### The pipe operator
The pipe `%>%` comes from the magrittr package. It’s a forward pipe operator that aims to make code more readable, and allows sequencing without multiple intermediate objects. The pipe takes the object on the left and ‘pipes’ it into the right hand side (essentially, it means ‘and then’).

Shortcut: Ctrl + Shift + M

```r!
# The pipe operator
library(tidyverse) # Or load magrittr

iris %>%
  group_by(Species) %>%
  summarise(mean = mean(Petal.Width))

med_petals2 <- iris %>%
  filter(Petal.Width >=1 & Petal.Width < 2)
dim(med_petals2)
dim(med_petals)

iris %>%
  ggplot(aes(Petal.Length, Petal.Width, colour = (Species))) +
  geom_point()
```

#### Operators in packages
Magrittr
* `%T>%` returns left hand side value (not results of right side)
* `%$%` exposes the names from the left object to the right expressions
* `%<>%` result of a chained pipe assigned to left object

Foreach, doParallel
* `%do%` evaluates R expressions in foreach loop sequentially
* `%dopar%` evaluates the R expressions in parallel

ggplot2
* `%+%` allows use of new dataframe

---

### Meeting 03
* time: **1-2 PM, Wednesday 30 Jan 2019**
* **Seminar Room, Level 6, Central (booked 12:30PM- 2PM)**
* speaker: Adrian Campos
* topic: Simulating data and experiments in R
* visitors RSVP:
  * Dwan Vilcins
  * Jessica Sexton
  * Muhammad Khan
  * Jackie Creagh (maybe)
  * Stephane Guillou

---

### Meeting 02
#### Email text to QIMR seminar team
Dear Seminar organiser,

Can you please advertise our meeting as the following at What's On | This Week at QIMR Berghofer?

**1-2 PM, Wednesday 19 December 2018**

**R user group meeting**

**Seminar Room, Level 6, Central**

*Correct etiquette for writing and sharing code, ownership and attribution for work*

*Appropriate mediums for sharing this work*

*What are the processes behind administrating these types of sites*

Stephane Guillou, Centre for Digital Scholarship

The R peer group is meeting this Wednesday – please join us for support and advice with using R

Are you an R user? Come join our R peer group for support, advice, and the occasional vent! This group is being established to give R users a place to share advice, ideas and ask for help. We welcome all levels of R users, including those with beginner/basic skills who are looking for some peer support. Please note we will not be teaching people with zero experience how to use R, but we will be supporting each other as we learn together.

RSVPs and questions to Lun-Hsien Chang at luenhchang@gmail.com

---

### Meeting 01
* [time=Mon, Nov 26, 2018 2:28 PM]
* Presenter: Chang
* R script directory: `/mnt/backedup/home/lunC/scripts/test/reshape-data-between-long-and-wide.R`

#### Reshaping data from long to wide

```r!
install.packages("MethComp") # Method Comparison

# Unload package multcomp if it is attached, as it has a same-named object sbp
if ("package:multcomp" %in% search()) detach("package:multcomp", unload=TRUE)
library(MethComp)
library(dplyr) # provides %>% and select()

# Use a long-format data sbp from the MethComp package
data(sbp)

# Change header
sbp <- data.table::setnames(sbp
                            , old=c("meth","item","repl","y")
                            , new=c("method","patientID","replicate","systolic.blood.pressure")) %>%
  dplyr::select(patientID, method, replicate, systolic.blood.pressure)

# Order data for presentation
sbp <- sbp[with(sbp,order(patientID,method,replicate)),]

str(sbp)
# 'data.frame': 765 obs. of 4 variables:
# $ patientID : num 1 1 1 1 1 1 1 1 1 2 ...
# $ method : Factor w/ 3 levels "J","R","S": 1 1 1 2 2 2 3 3 3 1 ...
# $ replicate : num 1 2 3 1 2 3 1 2 3 1 ...
# $ systolic.blood.pressure: num 100 106 107 98 98 111 122 128 124 108 ...
```

##### Using `reshape()`

```r!
#-----------------------------------------------------------------------------
# Convert data from long to wide using reshape()
#-----------------------------------------------------------------------------
## v.names  A vector with names of measure variables
## idvar    A vector of variables which their unique combinations will form the row dimension of the reshaped data (85*3)
## timevar  Variables to widen, forming the column dimension of the reshaped data
sbp.wide.reshape <- reshape(sbp
                            ,v.names = "systolic.blood.pressure"
                            ,idvar = c("patientID","method")
                            ,timevar = "replicate"
                            ,direction = "wide"
                            ,sep="_")
## (+) v.names can take multiple measurement variables
## (-) A limitation is only 1 timevar can be used
```

##### Using `reshape2::dcast()`

```r!
#-----------------------------------------------------------------------------
# Convert data from long to wide using reshape2::dcast()
#-----------------------------------------------------------------------------
## value.var         1 measure variable
## On the left of ~  Determines row dimension of the reshaped data
## On the right of ~ Determines column dimension of the reshaped data
## drop= : Should missing combinations be dropped or kept? If variable contains NA, set drop= to TRUE; Otherwise set it to FALSE
sbp.wide.dcast <- reshape2::dcast(sbp
                                  ,patientID ~ method + replicate
                                  ,value.var = "systolic.blood.pressure"
                                  ,drop = FALSE)
## (+) Overcome the limitation in reshape() 1 timevar
## (-) Limitation: when there are duplicate variables, dcast() applies its default function length to your data (e.g. when your value.var takes a character column, but the reshaped values will become 1 or 0). To overcome this limitation, you will create another new column that uniquely identifies the combinations of the ~ variables.
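## Sketch of the duplicate limitation above, on hypothetical toy data (dup,
## dup_counts, dup_fixed and occurrence are illustrative names, not from the talk)
dup <- data.frame(id = c(1, 1, 2)
                  ,key = c("a", "a", "b")
                  ,value = c("x", "y", "z")
                  ,stringsAsFactors = FALSE)
## id 1 has two rows for key "a", so dcast() falls back to length() and returns counts
dup_counts <- reshape2::dcast(dup, id ~ key, value.var = "value")
## Adding a column that numbers the repeats makes every row combination unique,
## so the original values are kept
dup$occurrence <- with(dup, ave(seq_along(id), id, key, FUN = seq_along))
dup_fixed <- reshape2::dcast(dup, id + occurrence ~ key, value.var = "value")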
## (-) Limitation: value.var can take only 1 measure variable (as of packageVersion("reshape2") 1.4.3)
```

##### Using `tidyr::spread()`

```r!
#-----------------------------------------------------------------------------
# Convert data from long to wide using tidyr::spread()
#-----------------------------------------------------------------------------
# Make a key column that uniquely identifies columns to spread
sbp$method.replicate <- with(sbp,paste(method,replicate,sep = "_"))
sbp2 <- sbp[,c("patientID","method.replicate","systolic.blood.pressure")]

# spread() takes two columns (key & value), and spreads into multiple columns: it makes "long" data wider.
sbp2.wide.spread <- tidyr::spread(sbp2
                                  ,key= method.replicate
                                  ,value=systolic.blood.pressure
                                  ,convert=FALSE) # dim(sbp2.wide.spread) 85 10
```

---

#### Reshaping data from wide to long

##### Using `reshape2::melt()`

```r!
#----------------------------------------------------------------------
# Convert data from wide to long using reshape2::melt()
#----------------------------------------------------------------------
## id.vars        Names of one or more variables in long format that identify multiple records from the same group/individual.
## measure.vars   Names of measurement variables to reshape
## variable.name  A new column that will contain the names of measurement variables
## value.name     A new column that will contain the values of measurement variables
sbp.long.melt <- reshape2::melt(sbp.wide.dcast # dim(sbp.wide.dcast) 85 10
                                ,id.vars=c("patientID")
                                ,measure.vars=c("J_1","J_2","J_3","R_1","R_2","R_3","S_1","S_2","S_3")
                                ,variable.name="method.replicate"
                                ,value.name="systolic.blood.pressure"
                                ) # dim(sbp.long.melt) 765 3

# Split column method.replicate into two columns method and replicate
# (called directly rather than through %>%, so no pipe package needs to be attached)
sbp.long.melt2 <- tidyr::separate(sbp.long.melt
                                  ,method.replicate
                                  ,c("method","replicate")
                                  ,sep="_"
                                  ,remove=FALSE)
```

##### Using `tidyr::gather()`
* [How to reshape data in R: tidyr vs reshape2](http://www.milanor.net/blog/reshape-data-r-tidyr-vs-reshape2/)

```r!
#----------------------------------------------------------------------
# Convert data from wide to long using tidyr::gather()
#----------------------------------------------------------------------
# Collapse all measurement variables into 2 columns: (1) name.measurement.variable and (2) value.measurement.variable
## J_1:S_3     Names of source columns that contain values. Can take 1st.var:last.var
## -patientID  Lazy specification of names of source columns by deselecting columns
sbp.long.gather <- tidyr::gather(data=sbp.wide.dcast
                                 ,key="name.measurement.variable"
                                 ,value="value.measurement.variable"
                                 ,J_1:S_3)
```

##### Using `reshape()`
* [Reshaping data.frame from wide to long format](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format)

```r!
#----------------------------------------------------------------------
# Convert data from wide to long using reshape()
#----------------------------------------------------------------------
## varying Names of sets of variables in the wide format that correspond to single variables in long format ('time-varying')
## times The values to use for a newly created timevar variable in long format.
## timevar The variable in long format that differentiates multiple records from the same group or individual
## sep A character vector of length 1, indicating a separating character in the variable names in the wide format
sbp.long.reshape <- reshape(sbp.wide.dcast
                            ,direction = "long"
                            ,varying = c("J_1","J_2","J_3","R_1","R_2","R_3","S_1","S_2","S_3")
                            ,times = c("J_1","J_2","J_3","R_1","R_2","R_3","S_1","S_2","S_3")
                            ,idvar = c("patientID")
                            ,timevar = "method.replicate"
                            ,sep = "_"
                            ,v.names = "systolic.blood.pressure")
```

---

## To-do by meeting organisers

* Book the meeting room for the next meeting immediately after the current meeting
* Invite speakers for the next meeting
* Questions to presenters
    * **Mobile number** What is your mobile number?
    * **Laptop and mouse** Are you able to bring your own laptop and mouse? To connect to the projector, your device must have either an HDMI or a VGA port. I would recommend HDMI. Please bring the laptop power supply and mouse.
    * **Time control** Can you make sure your talk will be no longer than 20 minutes? Having 20 or fewer slides is a reasonable guide for 20 minutes, assuming about 1 minute per slide. Note that your talk may take longer if you demonstrate code. We have learned that a talk running over time risks losing the audience.
    * **Presentation topic** Can you provide a topic? This will be used for advertising your presentation. A specific topic can attract a larger audience.
    * **WIFI and password** Network: **QIMR Guests**. Password: **guest_access**
    * **On presentation day** Please arrive at about 12:30 and get your device set up.
      If you will be presenting R code, please adjust the font size or zoom the window so that audience members sitting in the back row can see the content. Here is an example of an enlarged window. Note that a mouse makes it a lot easier than a touchpad to move your cursor.

      ![](https://i.imgur.com/KDBG016.jpg)
* (Two weeks before a meeting) Update the registration Google Form: (1) change the meeting information, (2) set the prefilled date to the next meeting. Follow these steps to [send the form with prefilled answer](https://support.google.com/docs/answer/2839588?hl=en)
    * Open a form in Google Forms.
    * In the top right, click More **⋮**.
    * Choose **Get pre-filled link**.
    * Pre-populate **prefilled meeting date**
    * Click **GET LINK**.
    * Click on the popup **COPY LINK**. You will then see link **copied to clipboard**.
    * Examples of the link:
        * https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?entry.1631135870=2019-04-24
        * https://docs.google.com/forms/d/e/1FAIpQLSfb8OwP-yA7bdaFmzkQnmbFAzoJK1Op2f0dMbZ7M2zh8EVubw/viewform?usp=pp_url&entry.1631135870=2019-05-29
    * Use the link for RSVP
* Ask the QIMR Seminar team to advertise the event in the What's on this week institute email sent every Monday
* (24 hours before a meeting) Send the names of visitors to QIMR security, events, and Julie Walker so they can print name stickers
    * Contact: `QIMRSecurity.SharedMailbox@qimrberghofer.edu.au`,`Julie.Walker@qimrberghofer.edu.au`,`Events@qimrberghofer.edu.au`
* (On meeting day)
    * Send a Google Calendar event invitation to registrants.
    * Borrow a laser pointer from QIMR IT
* (On meeting day) Determine the day of the next meeting.
* How to request poster advertising in the lift (from Dr Nancy Cloake | Events and Engagement Officer | External Relations)

    Please find attached our poster template for seminars and workshops at the institute. You are welcome to populate this with the details of your R User Group Meetings if you wish.
    If you would like a unique poster, you can put a request through to our graphic designers for help with your template. Please use this link to submit your request - https://staff.qimrberghofer.edu.au/Form/ExternalRelationsTasks?category=design

    We can display your poster in the lift for up to 2 weeks prior to the meeting date. Posters are updated in the lifts on Wednesdays, so please make sure you provide the poster on the Monday before you want it to be displayed in the lift. For example, for your meeting on 31 July you would have needed to send your poster to **seminars@qimrberghofer.edu.au** by Monday 15 July (latest) so it could go up in the lift on 17 July.

    I hope this template and info is helpful. Let me know if this is what you would like to do moving forward.

    Kind Regards,
    Nancy

---

## Acknowledgements

Thanks to Chang for his hard work setting up this page

---

## Past group photos

![April 2019](https://i.imgur.com/a1G8pEP.jpg)