
2022 UC Carpentries Fall Workshop, R Day 3, Notes and Cheatsheet

Plotting and report building with ggplot2 and knitr

Lesson Data

We're using a teaching version of the gapminder dataset. There are several ways to obtain this. Just choose one.

  1. Direct csv download
    • Right-click to save-as onto your computer, or
    • Right-click to copy the link and read the GitHub-hosted file directly into R:
      • gapminder <- read.csv("copied-link")
  2. Load the gapminder package in R
    • install.packages("gapminder") (as with all packages, you only need to do this once)
    • library("gapminder")

Gapminder Documentation


Intro - A Quick look at rstudio.cloud

A brief introduction to rstudio.cloud, including how to check which libraries are installed.
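One way to check for installed libraries from the console is sketched below. The package names in `needed` are an assumption based on what this page uses later; adjust the vector to your own setup.

```r
# Packages this page relies on (assumed list; edit as needed)
needed <- c("ggplot2", "knitr")

# installed.packages() returns a matrix with one row per installed package
installed <- rownames(installed.packages())

# Which of the needed packages are not installed yet?
not_installed <- setdiff(needed, installed)
print(not_installed)

# To install whatever is missing, uncomment the next line:
# install.packages(not_installed)
```

An empty result (`character(0)`) means everything in `needed` is already available.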


Plotting with ggplot2

The first session of Day 3 is Episode 8 from the Software Carpentry R for Reproducible Scientific Analysis Lesson.

ggplot Function Basics

Syntax of a Basic Call

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + 
  geom_point()

Putting geom_point() on its own line aids in readability, which is useful for a plot with many layers.

ggplot Elements

  • ggplot() creates a new ggplot.
  • aes() is how aesthetic mappings are constructed and associated with the data.
    • there are arguments for colors, how things are grouped, line size & shapes, positions, etc.
    • Aesthetic arguments set outside of the aes() function are fixed values that apply to all data points.
  • geom_*() functions add geometric objects. These are added to the plot in layers, which is why a ggplot call includes + something (geom/stat).
    • how are your data displayed? Lines, bar charts, heatmaps, contours, polygons, segments, etc.
    • individual geom_...() calls can include their own aesthetic mappings, both using the aes() function, and directly (remember, aesthetic arguments assigned outside the aes() function apply to all data points in that layer)
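The difference between mapping an aesthetic inside aes() and setting it outside can be sketched as follows. This assumes the ggplot2 and gapminder packages are installed; the object names `p_mapped` and `p_fixed` are only for illustration.

```r
library(ggplot2)
library(gapminder)  # provides the gapminder data frame

# Mapped: color depends on a column, so it goes inside aes() and produces a legend
p_mapped <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(mapping = aes(color = continent))

# Fixed: one constant value for every point, so it goes outside aes()
p_fixed <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "orange")

# Print either object to draw it, e.g. print(p_mapped)
```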

Challenge #1

Part A

In the gapminder example we've been using,

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + geom_point()

use the column "year" to show how life expectancy has changed over time.

Part B

We’ve been using the aes function to tell the scatterplot geom about the x and y locations of each point. Another aesthetic property we can modify is the point color. Modify the code from Part A to color the points by the “continent” column. Is it easier to detect trends?

Challenge #1 Solutions

Part A

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) + geom_point()

Part B

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, color=continent)) +
   geom_point()

About Layers

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, color=continent)) +
  geom_line()

add by= to draw one line per country (group= is the aesthetic name documented by ggplot2):

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, by=country, color=continent)) +
  geom_line()

add points:

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, by=country, color=continent)) +
  geom_line() + geom_point()

move color mapping:

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, by=country)) +
  geom_line(mapping = aes(color=continent)) + geom_point()

Challenge #2 (2-minute challenge)

Using the previous example:

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, by=country)) +
 geom_line(mapping = aes(color=continent)) + geom_point()

Switch the order of the point and line layers. What happened?

Answer: layers are drawn in the order they are added, so the lines are now drawn on top of the points.

Transformations and statistics

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

change scale:

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10()

add smoothing function, lm stands for linear model:

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10() + geom_smooth(method="lm")

make the line thicker with size= (ggplot2 3.4.0 and later use linewidth= for lines instead):

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10() + geom_smooth(method="lm", size=1.5)

About the Tilde ~

The tilde symbol ~ is often used as an operator to describe a statistical model formula.

The left side is optional, and denotes the target or dependent variable. The right side is the predictor or independent variable(s).
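As a small illustration of both forms, the sketch below fits a two-sided formula with lm() and inspects a one-sided formula like the one passed to facet_wrap(). It assumes the gapminder package is installed; the model is roughly what geom_smooth(method="lm") fits once scale_x_log10() has been applied, and the names `fit` and `one_sided` are only for illustration.

```r
library(gapminder)  # provides the gapminder data frame

# Two-sided formula: lifeExp (dependent) modeled by log10(gdpPercap) (predictor)
fit <- lm(lifeExp ~ log10(gdpPercap), data = gapminder)
print(coef(fit))  # intercept and slope of the fitted line

# One-sided formula: no left side, only the variable(s) on the right
one_sided <- ~ country
print(all.vars(one_sided))
```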

Challenge #3

Part A

Given

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10() + geom_smooth(method="lm", size=1.5)

Modify the color and size of the points on the point layer, but don't use the aes() function in that layer.

Part B

Modify your solution to Part A so that the points are now a different shape and are colored by continent with new trendlines. Hint: The color argument can be used inside the aes() function.

This cheatsheet helps with the syntax for assigning arguments. A PDF version is at RStudio Cheatsheets.

Challenge #3 Solutions

Part A

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
 geom_point(size=3, color="orange") + scale_x_log10() +
 geom_smooth(method="lm", size=1.5)

Part B

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
 geom_point(size=3, shape=17) + scale_x_log10() +
 geom_smooth(method="lm", size=1.5)

Multipaneled Figures

americas <- gapminder[gapminder$continent == "Americas",]
ggplot(data = americas, mapping = aes(x = year, y = lifeExp)) +
  geom_line() +
  facet_wrap( ~ country) +
  theme(axis.text.x = element_text(angle = 45))

add color, axis labels, a title, and a legend title:

ggplot(data = americas, mapping = aes(x = year, y = lifeExp, color=continent)) +
  geom_line() + facet_wrap( ~ country) +
  labs(
    x = "Year",              # x axis title
    y = "Life expectancy",   # y axis title
    title = "Figure 1",      # main title of figure
    color = "Continent"      # title of legend
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Exporting Plots

lifeExp_plot <- ggplot(data = americas, mapping = aes(x = year, y = lifeExp, color=continent)) +
  geom_line() + facet_wrap( ~ country) +
  labs(
    x = "Year",              # x axis title
    y = "Life expectancy",   # y axis title
    title = "Figure 1",      # main title of figure
    color = "Continent"      # title of legend
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggsave(filename = "results/lifeExp.png", plot = lifeExp_plot, width = 12, height = 10, dpi = 300, units = "cm")
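The ggsave() call writes into a results folder, so that folder has to exist first; depending on the ggplot2 version, a missing folder may produce an error or a prompt. A minimal sketch for creating it beforehand, using only base R (`out_dir` is an illustrative name):

```r
# Folder name taken from the ggsave() call above
out_dir <- "results"

# Create the folder only if it is not already there
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}
```

Run this once before ggsave(); it is safe to re-run because of the dir.exists() check.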

Cheatsheets, et al

ggplot_cheatsheet_page1

ggplot_cheatsheet_page2


Create Reports with knitr

The second session of Day 3 is Episode 15 from the Software Carpentry R for Reproducible Scientific Analysis Lesson.


Creating a Markdown file

Within RStudio, click File → New File → R Markdown.

You might need to install packages

install libraries message

Fill out as much info as you want, and it will prepopulate the document header.

Markdown Basics

  • bold with double-asterisks
  • italics with underscores (or single asterisks)
  • code-type font with backticks
  1. be consistent with your methods
  2. otherwise it will confuse collaborators
  3. or, maybe even your future self!

Title

Main section

Sub-section

Sub-sub section

with even smaller type
how small can it go?

When you knit the document, notice how RStudio jumps between the Console and the Render tab. Error messages, warnings, and other output from creating the document will appear in the Render tab.

Challenge 1

Create a new R Markdown document. Delete all of the R code chunks and write a bit of Markdown (some sections, some italicized text, and an itemized list).

Convert the document to a webpage.

More Markdown

  • hyperlinks: [Carpentries Home Page](https://carpentries.org/)
  • images: ![The Carpentries Logo](https://carpentries.org/assets/img/TheCarpentries.svg)
  • superscripts: F^2^ renders as F²
  • subscripts: F~2~ renders as F₂
  • LaTeX: $$y = \mu + \sum_{i=1}^p \beta_i x_i + \epsilon$$

R Code Chunks

```{r load_data}
gapminder <- read.csv("gapminder.csv")
```

How things are compiled

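  • knitr runs the R code chunks and writes the results into a plain Markdown file; Pandoc, which is a really cool tool for document conversion, then turns that Markdown into HTML, PDF, or Word.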
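The knitr step can be tried on its own, as sketched below. This assumes the knitr package is installed; the tiny R Markdown file, its chunk label `add`, and the temporary file names are made up for the example.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, built here to keep the example readable

# Write a tiny, hypothetical R Markdown file to a temporary location
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "# Demo",
  "",
  paste0(fence, "{r add}"),
  "1 + 1",
  fence,
  "",
  "Inline result: `r 1 + 1`"
), rmd_file)

# knit() runs the chunks and writes plain Markdown; Pandoc is not involved yet
md_file <- knit(rmd_file, output = tempfile(fileext = ".md"), quiet = TRUE)
md_lines <- readLines(md_file)
print(md_lines)
```

The Knit button in RStudio goes one step further and passes the Markdown on to Pandoc; rmarkdown::render() is the usual way to run both steps from the console.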

Chunk Options

```{r load_libraries, echo=FALSE, message=FALSE}
library("dplyr")
library("ggplot2")
```

```{r global_options, echo=FALSE}
knitr::opts_chunk$set(fig.path="Figs/", message=FALSE, warning=FALSE,
                      echo=FALSE, results="hide", fig.width=11)
```

Inline R

`r round(some_value, 2)`
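When the document is knit, the inline expression is replaced by its value. A quick way to preview that value is to run the same expression in the console; `some_value` is a placeholder, and the number assigned to it here is only for illustration.

```r
# some_value is a placeholder; any numeric object defined in an earlier chunk works
some_value <- 3.14159

# The same expression that appears between the backticks
rounded <- round(some_value, 2)
print(rounded)
```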