tags: R mtext bold bquote corrplot colorlegend par png diag mtext mar seq(from=,to=,length.out=) forestplot matrix rownames colnames lower.tri upper.tri dplyr::arrange plotrix::ablineclip() ggplot2::geom_linerange() ggplot2::scale_x_continuous() ggplot2::geom_line() gghighlight::gghighlight() ggplot2::geom_tile() ggplot2::scale_fill_continuous() ggplot2::facet_wrap() ggpubr::ggarrange(p1,p2,p3,p4) ggplot2::ylim()

Data visualisation in R

Chang's collection of working R code and plots.


Visualising time series data

Fitness data

  • Data source: activities.csv from Strava bulk download
  • X variable: Activity day of year derived from Activity.Date
  • Y variable: Cumulative riding distance derived from Distance
  • Legend variable: Activity year derived from Activity.Date
  • R script fitness.R
  • Core functions ggplot2::ggplot()+ggplot2::geom_line()+gghighlight::gghighlight()
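A minimal sketch of how the cumulative-distance plot described above might be built. The data frame is synthetic, and the names `ride.days`, `year`, `day.of.year` and `distance.cum` are illustrative rather than taken from fitness.R. The real `Activity.Date` column from Strava would first need parsing into a `Date`.

```r
library(ggplot2)

# Synthetic stand-in for activities.csv (hypothetical values)
set.seed(1)
ride.days <- sort(sample(0:1094, 300))
activities <- data.frame(
  Activity.Date = as.Date("2021-01-01") + ride.days,
  Distance = round(runif(300, min = 10, max = 80), 1)
)

# Derive the legend variable (year) and the X variable (day of year)
activities$year <- format(activities$Activity.Date, "%Y")
activities$day.of.year <- as.integer(format(activities$Activity.Date, "%j"))

# Accumulate distance within each year; rows must be in date order
activities <- activities[order(activities$Activity.Date), ]
activities$distance.cum <- ave(activities$Distance, activities$year, FUN = cumsum)

p <- ggplot(activities, aes(x = day.of.year, y = distance.cum, colour = year)) +
  geom_line() +
  labs(x = "Day of year", y = "Cumulative distance", colour = "Year")

# Highlight one year and grey out the others, if gghighlight is installed
if (requireNamespace("gghighlight", quietly = TRUE)) {
  p <- p + gghighlight::gghighlight(year == "2023",
                                    use_group_by = FALSE,
                                    use_direct_label = FALSE)
}
print(p)
```

The same pattern would apply to cumulative elevation by swapping `Distance` for `Elevation.Gain`.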



  • Data source: activities.csv from Strava bulk download
  • X variable: Activity day of year derived from Activity.Date
  • Y variable: Cumulative riding elevation derived from Elevation.Gain
  • Legend variable: Activity year derived from Activity.Date
  • R script fitness.R
  • Core functions ggplot2::ggplot()+ggplot2::geom_line()+gghighlight::gghighlight()



  • Data source: activities.csv from Strava bulk download
  • X variable: Activity week derived from Activity.Date
  • Y variable: Activity day derived from Activity.Date
  • Legend variable: Riding distance per day derived from Distance
  • R script fitness.R
  • Core functions ggplot2::ggplot()+ggplot2::geom_tile()+ggplot2::scale_fill_continuous()+ggplot2::facet_wrap()
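The week-by-weekday tile layout above can be sketched as follows. The `rides` data frame and its values are made up for illustration; only the function chain comes from the list above.

```r
library(ggplot2)

# Synthetic daily riding distance; NA marks days without a ride
set.seed(2)
rides <- data.frame(
  date = seq(as.Date("2022-01-01"), as.Date("2023-12-31"), by = "day")
)
n.days <- nrow(rides)
rides$distance <- ifelse(runif(n.days) < 0.4, round(runif(n.days, 10, 100)), NA)

# X variable: week of year (Monday as first day of the week)
rides$week <- as.integer(format(rides$date, "%W"))
# Y variable: weekday, with levels reversed so that Monday is drawn at the top
rides$weekday <- factor(format(rides$date, "%u"),
                        levels = as.character(7:1),
                        labels = c("Sun", "Sat", "Fri", "Thu", "Wed", "Tue", "Mon"))
rides$year <- format(rides$date, "%Y")

p <- ggplot(rides, aes(x = week, y = weekday, fill = distance)) +
  geom_tile(colour = "white") +
  scale_fill_continuous(name = "Distance per day", na.value = "grey90") +
  facet_wrap(~ year, ncol = 1) +
  labs(x = "Week of year", y = NULL)
print(p)
```

One panel per year keeps the week axis comparable across years.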



  • Data source: activities.csv from Strava bulk download
  • Variable: Moving time in hours derived from Moving.Time
  • R script fitness.R
  • Core functions calendR::calendR() ggplot2::ggsave()
  • References graphgallery/calendar-heatmaps



  • Data source: A single-activity .fit file downloaded from Strava, converted to CSV at Convert a FIT file to a CSV file
  • Common X variable: distance.km derived from distance
  • Y variables: heart_rate, speed, altitude, grade.edited
  • R script fitness.R
  • Core functions
    p1<-ggplot2::ggplot() + ggplot2::geom_line() + gghighlight::gghighlight() + ggplot2::ylim()
    p2<-ggplot2::ggplot() + ggplot2::geom_line() + gghighlight::gghighlight() + ggplot2::ylim()
    p3<-ggplot2::ggplot() + ggplot2::geom_line() + gghighlight::gghighlight() + ggplot2::ylim()
    p4<-ggplot2::ggplot() + ggplot2::geom_line() + gghighlight::gghighlight() + ggplot2::ylim()
    p<-ggpubr::ggarrange(p1,p2,p3,p4)
    ggpubr::annotate_figure(p)
  • References How to Set Axis Limits in ggplot2?



  • R version 4.3.1
  • packageVersion("ggplot2") 3.4.3
  • Data source: A single-activity .fit file downloaded from Strava, converted to CSV at Convert a FIT file to a CSV file
  • Common X variable: distance.km derived from distance
  • Y variables: heart_rate, speed, altitude, grade.edited
  • R script fitness.R
  • Core functions
    p1<-ggplot2::ggplot() + ggplot2::geom_area() + ggplot2::geom_area() + ggplot2::geom_area() + ggplot2::scale_fill_manual() + ggplot2::coord_cartesian()
    p2<-ggplot2::ggplot() + ggplot2::geom_area() + ggplot2::geom_area() + ggplot2::geom_area() + ggplot2::scale_fill_manual() + ggplot2::coord_cartesian()
    p3<-ggplot2::ggplot() + ggplot2::geom_area() + ggplot2::geom_area() + ggplot2::geom_area() + ggplot2::scale_fill_manual() + ggplot2::coord_cartesian()
    p4<-ggplot2::ggplot() + ggplot2::geom_area() + ggplot2::geom_area() + ggplot2::geom_area() + ggplot2::scale_fill_manual() + ggplot2::coord_cartesian()
    p<-ggpubr::ggarrange(p1,p2,p3,p4)
    ggpubr::annotate_figure(p)
  • References
    • ptt_r_090923|Wei-Chu Chen
    • Problem with geom_ribbon and alpha combined with coord-cartesian in R 4.1.0 on Windows #4498



Viedoc Electronic Data Capture System

  • Data source: Visualization report_pending forms.xlsx (exported from Viedoc database)
  • X variables on bottom x axis:
    • xmin=PendingSince.date.num
    • xmax=diff.pendingSince.today
  • X variables on top x axis:
    • strftime(x=unique(c(plot.data$PendingSince.date, plot.data$today)) ,format = "%d%b\n%y")
  • Y variable: SubjectId
  • Legends:
    • Items ordered from the most frequent on top to the least frequent at bottom.
    • 20 distinct colors picked from pals::glasbey(n=20)
    • Long legend item text wrapped to multiple lines
  • R script Viedoc-dashboard.R
  • Core functions ggplot2::ggplot()+ggplot2::geom_linerange()+ggplot2::scale_x_continuous()



  • Data source: Visualization report_pending forms.xlsx (exported from Viedoc database)
  • X variables on bottom x axis:
    • xmin=dummy.xmin (set to 0)
    • xmax=date.pending.until.today
  • Y variable: SubjectId
  • Legends:
    • Items ordered from the most frequent on top to the least frequent at bottom.
    • 20 distinct colors picked from pals::polychrome(n=20)
    • Long legend item text wrapped to multiple lines
  • R script Viedoc-dashboard.R
  • Core functions ggplot2::ggplot()+ggplot2::geom_linerange()+ggplot2::scale_x_continuous()
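A rough sketch of the second variant (bars starting at 0 and ending at the number of days pending), with legend items ordered by frequency and long labels wrapped. The subjects, form names and the columns `form` and `form.wrapped` are invented for illustration. The default colour palette is used here in place of the `pals` palettes.

```r
library(ggplot2)

today <- as.Date("2024-03-01")  # fixed date so the example is reproducible
plot.data <- data.frame(
  SubjectId = c("S01", "S02", "S03", "S04"),
  PendingSince.date = as.Date(c("2024-01-10", "2024-02-01", "2024-02-20", "2024-01-25")),
  form = c("Adverse events", "Concomitant medications and therapies",
           "Adverse events", "Vital signs")
)

# xmin is a dummy 0; xmax is the number of days a form has been pending
plot.data$dummy.xmin <- 0
plot.data$date.pending.until.today <- as.numeric(today - plot.data$PendingSince.date)

# Wrap long legend text onto multiple lines
plot.data$form.wrapped <- vapply(
  plot.data$form,
  function(x) paste(strwrap(x, width = 20), collapse = "\n"),
  character(1), USE.NAMES = FALSE
)

# Order legend items from most to least frequent
form.freq <- sort(table(plot.data$form.wrapped), decreasing = TRUE)
plot.data$form.wrapped <- factor(plot.data$form.wrapped, levels = names(form.freq))

p <- ggplot(plot.data, aes(y = SubjectId, xmin = dummy.xmin,
                           xmax = date.pending.until.today,
                           colour = form.wrapped)) +
  geom_linerange(linewidth = 3) +
  scale_x_continuous(name = "Days pending until today") +
  labs(colour = "Form")
print(p)
```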



Annotating images

Raw image


Annotated image


Treemap

Number of protocol deviation coded terms


Number of disposition coded terms


Medical history coded terms


A treemap that visualises hierarchical and categorical data[1]

Plot the relative proportions of categories and their sub-categories with a treemap. The categories and sub-categories are alleged conduct types and subtypes from CCC Allegations Data.

Treemap


Wordcloud


Donut plot

Number of study participants and sex ratio

Number of study participants and ethnicity ratio


Bar charts

Vertical bar plot

This plot displays the distribution of ages in females (F) and males (M) with means (dashed lines). The means are also shown in a table within the plot.


Horizontal bar plot. Frequency of ARM


Create multiple bar plots on one page[2].

This plot displays bars in user-defined colors (green, blue) for statistically significant results and pale colors (pale green, pale blue) for statistically non-significant results.
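One way to pick the bar colour from the p value is a lookup with `ifelse()`. This is a sketch with made-up numbers. The `results` data frame, the 0.05 cut-off and the colour names are assumptions, not values from the original script.

```r
# Hypothetical results: percent variance explained (r2) and p value per bar
results <- data.frame(
  outcome = c("cannabis", "cocaine", "cannabis", "cocaine"),
  prs = c("SI", "SI", "DPW", "DPW"),
  r2 = c(1.2, 0.4, 0.9, 0.1),
  p = c(0.001, 0.30, 0.02, 0.60)
)

# One strong and one pale colour per predictor
strong.col <- c(SI = "green", DPW = "blue")
pale.col <- c(SI = "palegreen", DPW = "lightblue")
alpha.level <- 0.05  # assumed significance threshold

# Strong colour when significant, pale colour otherwise
results$bar.col <- ifelse(results$p < alpha.level,
                          strong.col[results$prs],
                          pale.col[results$prs])

barplot(results$r2,
        names.arg = paste(results$outcome, results$prs, sep = "\n"),
        col = results$bar.col,
        ylab = "% variance explained")
```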

Bar charts


Heatmap

Symmetrical heatmap with identical groups in row and column dimension[3]

The two dimensions are taken from the same group. The diagonal (the correlation of each variable with itself) is skipped here. The coefficients are genetic correlation coefficients (rG) calculated using the Linux software package Linkage disequilibrium score regression. The input data are three full matrices populated from a data frame: rG coefficients, p values, and quotients of p values divided by significance thresholds.
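Populating the three full matrices from a long data frame might look like the sketch below. The trait names, coefficients and the Bonferroni-style threshold are invented for illustration. The resulting matrices can then be handed to whichever heatmap function is used.

```r
# Hypothetical long-format results: one row per pair of traits
traits <- c("alcohol", "tobacco", "cannabis")
rg.long <- data.frame(
  trait1 = c("alcohol", "alcohol", "tobacco"),
  trait2 = c("tobacco", "cannabis", "cannabis"),
  rG = c(0.45, 0.30, 0.55),
  p = c(0.001, 0.20, 0.0004)
)

# Empty square matrices with identical row and column names
rg.mat <- matrix(NA_real_, nrow = length(traits), ncol = length(traits),
                 dimnames = list(traits, traits))
p.mat <- rg.mat

# Fill the upper triangle by (row name, column name) pairs
idx <- cbind(rg.long$trait1, rg.long$trait2)
rg.mat[idx] <- rg.long$rG
p.mat[idx] <- rg.long$p

# Mirror the upper triangle into the lower triangle; the diagonal stays NA
rg.mat[lower.tri(rg.mat)] <- t(rg.mat)[lower.tri(rg.mat)]
p.mat[lower.tri(p.mat)] <- t(p.mat)[lower.tri(p.mat)]

# Quotient of p value over threshold: values below 1 are significant
threshold <- 0.05 / nrow(rg.long)  # assumed multiple-testing correction
quotient.mat <- p.mat / threshold
```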

Symmetrical heatmap


Asymmetrical heatmap with 1 group in row dimension and the other group in column dimension[4]

Create a heatmap to display the proportions of variance (R-squared, R2) of group 1 variables explained by group 2 variables. The row dimension represents 40 variables from group 2. The column dimension represents 30 variables from group 1. The higher the R2, the larger and more purple the dots.

Asymmetrical heatmap


Forest plot

A table and a forest plot

Create a table and forest plot to display odds ratios of binary outcome variables estimated from two different types of analyses[5].

forest plot


Venn Diagram

Visualise how participants in three groups overlap[6]

Visualise the number of participants across three phases of a study (NU1, NU2, NU3). The numbers of overlapping participants are in red and the total numbers of participants are in black.
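The counts that go into such a diagram can be computed with base R set operations. The ID vectors below are hypothetical stand-ins for the participant IDs of each phase.

```r
# Hypothetical participant IDs per study phase
nu1 <- c("id01", "id02", "id03", "id04", "id05")
nu2 <- c("id03", "id04", "id05", "id06")
nu3 <- c("id05", "id06", "id07")

# Total participants per phase (shown in black)
totals <- c(NU1 = length(nu1), NU2 = length(nu2), NU3 = length(nu3))

# Overlapping participants (shown in red)
overlaps <- c(
  NU1.NU2 = length(intersect(nu1, nu2)),
  NU1.NU3 = length(intersect(nu1, nu3)),
  NU2.NU3 = length(intersect(nu2, nu3)),
  NU1.NU2.NU3 = length(Reduce(intersect, list(nu1, nu2, nu3)))
)

totals
# NU1 NU2 NU3
#   5   4   3
overlaps
#     NU1.NU2     NU1.NU3     NU2.NU3 NU1.NU2.NU3
#           3           1           2           1
```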

Venn Diagram


Scatter plot

Multiple scatter plots[7]

Compare 2 variables (CTT-based scores in orange, IRT-based scores in light blue) between twin 1 and twin 2 in monozygotic twins (MZ) and dizygotic twins (DZ).

scatter plot


Manhattan plot[8]

A Manhattan plot is a specific type of scatter plot widely used in genome-wide association studies (GWAS). Each point represents a genetic variant. The X axis shows its position on a chromosome; the Y axis shows how strongly it is associated with a trait, as –log10(p-value). Source: Manhattan plot in R: a review.
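A base-R sketch of the two derived axes: a cumulative position so chromosomes sit side by side, and –log10(p). The `gwas` data frame is simulated, and the column names `CHR`, `BP` and `P` are assumed here rather than taken from the original script.

```r
# Simulated summary statistics for three chromosomes
set.seed(3)
gwas <- data.frame(
  CHR = rep(1:3, each = 100),
  BP = unlist(lapply(1:3, function(i) sort(sample(1:1e6, 100)))),
  P = runif(300)
)

# Offset each chromosome by the summed lengths of the preceding ones
chr.max <- tapply(gwas$BP, gwas$CHR, max)
chr.offset <- c(0, cumsum(as.numeric(chr.max))[-length(chr.max)])
names(chr.offset) <- names(chr.max)
gwas$pos.cum <- gwas$BP + chr.offset[as.character(gwas$CHR)]

# Y axis: strength of association
gwas$neg.log10.p <- -log10(gwas$P)

# Alternate point colours by chromosome and label each chromosome at its midpoint
plot(gwas$pos.cum, gwas$neg.log10.p,
     col = c("grey30", "steelblue")[gwas$CHR %% 2 + 1], pch = 20,
     xaxt = "n", xlab = "Chromosome", ylab = expression(-log[10](italic(p))))
axis(side = 1,
     at = tapply(gwas$pos.cum, gwas$CHR, function(x) mean(range(x))),
     labels = names(chr.max))
abline(h = -log10(5e-8), lty = 2, col = "red")  # genome-wide significance line
```

With uniform random p values no point is expected to reach the dashed line; real GWAS results would show peaks above it.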

Manhattan plot


Histogram

Multiple histograms on one page[9]

Compare the distributions of three scores (PSYCH6, SOMA6, and SPHERE12) computed with classical test theory (CTT, orange; panels A, B, C) and item response theory (IRT, blue; panels D, E, F).

histogram


Box plot

Distribution of eye test scores in left and right eyes across visits

A single plain box plot

A box plot[10] presents information from a five-number summary. It does not show a distribution in as much detail as a stem and leaf plot or histogram does, but is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set. Box and whisker plots are also very useful when large numbers of observations are involved and when two or more data sets are being compared.
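To make the five-number summary concrete, here is a small example on an invented vector of scores. `fivenum()` returns the minimum, lower hinge, median, upper hinge and maximum, and `boxplot.stats()` reports what the box plot actually draws, including points flagged as outliers.

```r
# Hypothetical scores with one unusually large value
scores <- c(12, 15, 15, 17, 18, 20, 21, 22, 24, 45)

# Minimum, lower hinge, median, upper hinge, maximum
fivenum(scores)
# [1] 12 15 19 22 45

# Whisker ends and hinges as drawn, plus flagged outliers
box.stats <- boxplot.stats(scores)
box.stats$stats
# [1] 12 15 19 22 24
box.stats$out
# [1] 45

boxplot(scores, horizontal = TRUE, xlab = "Score")
```

The upper whisker stops at 24 because 45 lies more than 1.5 times the hinge spread above the upper hinge.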

box plot


Line plot

Visualising different line types

19. R Base Graphs 2

# setting parameters back to default
par(mfrow=c(1,1), col.axis="black")

# Install the plotrix package (provides ablineclip()) if it is not yet available
if (!requireNamespace("plotrix", quietly = TRUE)) install.packages("plotrix")

library(plotrix) # add-on package for ablineclip()

plot(1:7, ylab="", main="Line Types lty 0:6", xlab="lty 0:6") # test plot

ablineclip(v=1, lty=1, col="sienna2", lwd=2) # solid (default)
ablineclip(v=2, lty=2, col="sienna2", lwd=2) # dashed
ablineclip(v=3, lty=3, col="sienna2", lwd=2) # dotted
ablineclip(v=4, lty=4, col="sienna2", lwd=2) # dotdash
ablineclip(v=5, lty=5, col="sienna2", lwd=2) # longdash
ablineclip(v=6, lty=6, col="sienna2", lwd=5) # twodash, thicker for comparison
ablineclip(v=7, lty=0, col="sienna2", lwd=2) # blank

lty 0 to 6


Multiple lines in one plot

Line plots[11].

line plots


Automate the generation of axis tick values from data to plot.

# Calculate the range of data values 
plot.value.ranges <- range(as.vector(plot.data$mean)
                           ,as.vector(plot.data$lower)
                           ,as.vector(plot.data$upper)
                           ,na.rm = TRUE)
# Define lower and upper limit of ticks on x axis                           
x.axis.tick.minimum <- plot.value.ranges[1] - 0.1
x.axis.tick.maximum <- plot.value.ranges[2] + 0.1

# Define ticks as 10 equally spaced numbers from the lower to the upper limit
x.axis.ticks.round <- round(seq( from=x.axis.tick.minimum
                                ,to=x.axis.tick.maximum
                                ,length.out = 10)
                            ,1)
x.axis.ticks.round
# [1] 0.2 0.5 0.8 1.2 1.5 1.8 2.1 2.5 2.8 3.1
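The tick calculation above can be wrapped in a small helper and passed to `axis()`. The function name `make_axis_ticks` and the toy `plot.data` are hypothetical; the toy values were chosen so that the data range is 0.3 to 3.0.

```r
# Hypothetical helper: n rounded, equally spaced ticks covering the padded data range
make_axis_ticks <- function(values, pad = 0.1, n = 10, digits = 1) {
  value.range <- range(values, na.rm = TRUE)
  round(seq(from = value.range[1] - pad,
            to = value.range[2] + pad,
            length.out = n),
        digits)
}

# Toy estimates with lower and upper confidence limits
plot.data <- data.frame(mean = c(0.8, 1.5, 2.2),
                        lower = c(0.3, 1.0, 1.6),
                        upper = c(1.4, 2.1, 3.0))

x.axis.ticks <- make_axis_ticks(c(plot.data$mean, plot.data$lower, plot.data$upper))
x.axis.ticks
# [1] 0.2 0.5 0.8 1.2 1.5 1.8 2.1 2.5 2.8 3.1

# Draw the estimates without a default x axis, then add the custom ticks
rows <- seq_len(nrow(plot.data))
plot(plot.data$mean, rows, xlim = range(x.axis.ticks), xaxt = "n",
     pch = 15, xlab = "Estimate", ylab = "")
segments(x0 = plot.data$lower, y0 = rows, x1 = plot.data$upper, y1 = rows)
axis(side = 1, at = x.axis.ticks)
```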

Add text within a plot using legend().

# Add text to top left corner within the subplot
text_topLeft_withinFigure=subplotInfoSource$plotTitle[i]
# Specify the text location. The location may also be specified by setting x to a single keyword from the list "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center" 
legend('topleft', text_topLeft_withinFigure, bty="n", cex=3) # add text to top left corner, remove legend frame

Add a lower-case bold letter to a plot

  • D:\Now\library_genetics_epidemiology\slave_NU\NU_analytical_output_fig12ab_histogram_sumScore_IRTScore_wide.tif
letter=letters[i]
boldedLetter=bquote(bold(.(letter)))
mtext(boldedLetter,side=3, adj=0.9, cex=1.5, line=-1.75)
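In context, `i` is typically a loop index over panels. A minimal sketch with two simulated histograms (the data and the panel layout are assumptions):

```r
# Two panels side by side; keep the old settings so they can be restored
old.par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

set.seed(4)
for (i in 1:2) {
  hist(rnorm(200), main = "", xlab = paste("Score", i))
  # bquote() substitutes the value of `letter` into the plotmath expression bold()
  letter <- letters[i]
  boldedLetter <- bquote(bold(.(letter)))
  # side = 3 is the top margin; a negative line moves the label inside the plot
  mtext(boldedLetter, side = 3, adj = 0.9, cex = 1.5, line = -1.75)
}

par(old.par)
```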


  1. R script file path:
    D:\googleDrive\Job\Queensland-Crime-Corruption-Commissioin_QLD-332553_data-scientist\Assessment1_create-dashboard\make-treemap_alledged-conduct-types_alledged-conduct-subtypes.R
    plot file path:
    D:\googleDrive\Job\Queensland-Crime-Corruption-Commissioin_QLD-332553_data-scientist\Assessment1_create-dashboard\make-treemap_alledged-conduct-types_alledged-conduct-subtypes.png ↩︎

  2. R script file path:
    /mnt/backedup/home/lunC/scripts/PRS_UKB_201711/PRS_UKB_201711_step18-04-06_barPlot_percen-variance-selective-phenotypes_explained-by-GSCAN-PRSs.R
    plot file path:
    /mnt/backedup/home/lunC/plots/zfig44-04_percent-variance-of-cocaine-amphetamine-hallucinogens-ecstasy-cannabis-AUD_explained-by-PRS-GSCAN-SI-DPW.png ↩︎

  3. R script file path:
    /mnt/backedup/home/lunC/scripts/MR_ICC_GSCAN_201806/MR_step08-04_heatmap_LDSC-genetic-correlations.R
    plot file path:
    /mnt/backedup/home/lunC/plots/MR_ICC_GSCAN_201806/genetic-correlation-between-use-4-substances.png ↩︎

  4. R script file path:
    /mnt/backedup/home/lunC/scripts/PRS_UKB_201711/PRS_UKB_201711_step18-04_heatmap_variance-explained-by-PRS_r-square_p-value.R
    plot file path:
    /mnt/backedup/home/lunC/plots/licit_substance_PRSs_predict_illicit_drug_use/zfig39_heatmap_corrplot_R2-alcoho-toba-drugs-diagSU-explained-by-GSCAN-PRS.pdf ↩︎

  5. R script file path:
    /mnt/backedup/home/lunC/scripts/MR_ICC_GSCAN_201806/MR_step10-03_forest-plot_odds-ratio-95percent-CI_observational-associations_MR-IVW.R
    plot file path:
    /mnt/backedup/home/lunC/plots/MR_ICC_GSCAN_201806/manu4_odds-ratio-95CI_observational-association_MR-IVW.png ↩︎

  6. R script file path:
    D:\Now\library_genetics_epidemiology\slave_NU\NU_analytical_programs_R\NU_014_vennDiagram_ID_with_at_least_1_SPHERE_item.R
    plot file path:
    D:\Now\library_genetics_epidemiology\slave_NU\NU_analytical_output\fig06_vennDiagram_ID_with_at_least_1_SPHERE_item.png ↩︎

  7. R script file path:
    D:\Now\library_genetics_epidemiology\slave_NU\NU_analytical_programs_R\NU_002c_scatterPlot_twinCorr_2Var.R
    plot file path:
    D:\Now\library_genetics_epidemiology\slave_NU\NU_analytical_output\fig10c_scatterPlot_twinCorr_2Var.png ↩︎

  8. R script file path:
    D:\Now\library_genetics_epidemiology\GWAS\scripts\PRS_UKB_201711\PRS_UKB_201711_step19-02_manhattan-plot_QCed-SNP_clumped-SNPs.R
    plot file path:
    D:\Now\library_genetics_epidemiology\GWAS\plots\zfig43-01-01_manhattan-plot_GSCAN-smoking-initiation_LD-clumped-SNPs_suggestive-line-at-p-smaller-than_5e-08.png ↩︎

  9. R script file path:
    D:/Now/library_genetics_epidemiology/slave_NU/NU_analytical_programs_R/NU_007a_histogram_rawScore_IRT_byVar.R
    plot file path:
    D:\Now\library_genetics_epidemiology\slave_NU\NU_analytical_output_fig12ab_histogram_sumScore_IRTScore_wide.png ↩︎

  10. R script file path:
    D:\Now\library_genetics_epidemiology\slave_NU\NU_analytical_programs_R\NU_003_boxplot_SPHERE12.R
    plot file path:
    D:\Now\library_genetics_epidemiology\slave_NU\NU_analytical_output\fig11_boxPlot_SPHERE12_raw_IRT_twins.png ↩︎

  11. R script file path:
    D:\z_old_files\national_inpatient_sample\NIS_analytical_programs\NIS_10_lineplot_number_discharge_weighted.R
    plot file path:
    D:\z_old_files\national_inpatient_sample\NIS_analytical_output\fig03_weighted_number_discharges_b.png ↩︎