# ODSC WEST (blog post on ggplot2)
`ggplot2`: The `gg` stands for ‘get graphing’
================
> “*Hands down the best way to start learning data science is to focus on data visualization. Pick R or Python and practice building plots that tell a story. Everything else will follow.*” - Isaac Faber Ph.D., Director of AI Development | Stanford & CMU Instructor
If you’re attending [ODSC West](https://odsc.com/california/) and would like to learn about data visualization (or extend what you already know), please attend our [workshop on `ggplot2`](https://odsc.com/speakers/data-visualization-with-ggplot2-3/).
Below are some questions we received from attendees of our ODSC workshop.
## What is `ggplot2`?
`ggplot2` is an R package built around a graphing syntax (a grammar of graphics) that “*describes the properties of a plotting system.*” These properties include:
1. A dataset
2. Mappings from variables to visual aesthetics
3. A geometric object (visual elements or graph types)
4. A scale for each aesthetic mapping and a coordinate system
5. An optional faceting specification
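As a rough sketch of how these five components map onto code, the example below spells each one out explicitly. It uses R’s built-in `mtcars` dataset as a stand-in (an assumption made only so the snippet is self-contained), and the axis label is illustrative.

``` r
library(ggplot2)

# mtcars ships with R; it stands in here for any data frame
p <- ggplot(data = mtcars,                          # 1. a dataset
            mapping = aes(x = wt, y = mpg)) +       # 2. variables mapped to aesthetics
  geom_point() +                                    # 3. a geometric object
  scale_x_continuous(name = "Weight (1000 lbs)") +  # 4. a scale for the x aesthetic...
  coord_cartesian() +                               #    ...and a coordinate system
  facet_wrap(~ cyl)                                 # 5. an optional faceting specification

p
```

In everyday use the scale and coordinate system are usually left out, because `ggplot2` supplies defaults for both.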
## Why use `ggplot2`?
- If you’re using `ggplot2`:
  - *You’ll have a clear understanding of the data behind the visualizations you build*
  - *You’ll be able to iterate quickly through graph enhancements/revisions*
  - *The consistent syntax allows you to reproduce your graphs using ‘templates’*
## How can I get started with `ggplot2`?
Install `ggplot2` from [CRAN](https://cran.r-project.org/web/packages/ggplot2/index.html), or install the development version described on the [package website](https://ggplot2.tidyverse.org/).
``` r
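# install the released version from CRAN
install.packages("ggplot2")
# or install the development version from GitHub
install.packages("remotes")
remotes::install_github("tidyverse/ggplot2")
# load the package
library(ggplot2)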
```
## Where can I get some data?
We’ll use the `penguins` dataset provided by the [`palmerpenguins` package](https://allisonhorst.github.io/palmerpenguins/) by Allison Horst, Alison Hill, and Kristen Gorman.
``` r
# from CRAN
install.packages("palmerpenguins")
# from GitHub
remotes::install_github("allisonhorst/palmerpenguins")
library(palmerpenguins)
penguins <- palmerpenguins::penguins
```
``` r
head(penguins)
```
    ## # A tibble: 6 × 8
    ##   species island    bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex    year
    ##   <chr>   <chr>              <dbl>         <dbl>       <dbl>   <dbl> <chr> <dbl>
    ## 1 Adelie  Torgersen           39.1          18.7         181    3750 male   2007
    ## 2 Adelie  Torgersen           39.5          17.4         186    3800 fema…  2007
    ## 3 Adelie  Torgersen           40.3          18           195    3250 fema…  2007
    ## 4 Adelie  Torgersen           NA            NA            NA      NA <NA>   2007
    ## 5 Adelie  Torgersen           36.7          19.3         193    3450 fema…  2007
    ## 6 Adelie  Torgersen           39.3          20.6         190    3650 male   2007
    ## # … with abbreviated variable names ¹flipper_length_mm, ²body_mass_g
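Row 4 in the output above is missing its measurements. As an optional extra step, the base R sketch below counts the missing values in each column. This is worth knowing before plotting, because `ggplot2` drops rows with missing `x` or `y` values (with a warning) when it draws points.

``` r
library(palmerpenguins)

penguins <- palmerpenguins::penguins

# count the missing values in each column
missing_counts <- colSums(is.na(penguins))
missing_counts

# how many rows are complete (no missing values at all)?
sum(complete.cases(penguins))
```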
## How do I (*quickly*) build a graph with `ggplot2`?
`ggplot2` graphs are built in layers, and they all start with a `data` argument (in this case, it’s `penguins`).
``` r
ggplot(data = penguins)
```

Once we have an initialized plot, we’re ready to start mapping our graph aesthetics with `mapping = aes()` (i.e. providing variables and their locations). Let’s put `bill_length_mm` on the `x` axis and `flipper_length_mm` on the `y` axis.
``` r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = flipper_length_mm))
```
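The plot above shows axes but no data because no layers have been added yet. One way to confirm this is to store the plot in an object (here called `p`, a name chosen for illustration) and inspect it. A minimal sketch:

``` r
library(ggplot2)
library(palmerpenguins)

penguins <- palmerpenguins::penguins

p <- ggplot(data = penguins,
            mapping = aes(x = bill_length_mm, y = flipper_length_mm))

# the aesthetic mappings are stored on the plot object
names(p$mapping)

# no geoms have been added yet, so there are no layers to draw
length(p$layers)
```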

The graph above has 1) `data`, and 2) variables (`x` and `y`). The third step is to add a geom (or geometric object), which is the type of plot that we want to create. In this case, we’ll add `geom_point()` (for points, or a scatter-plot).
``` r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point()
```

In three lines of code, we’ve created a scatter-plot. We’ve also used the basic template for creating graphs with `ggplot2`:
``` r
ggplot(data = <DATA>,
       mapping = aes(x = <X VARIABLE>, y = <Y VARIABLE>)) +
  geom_*()
```
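To illustrate the template, here is one way to fill it in with a different geom. The choice of `species`, `body_mass_g`, and `geom_boxplot()` is ours, picked only to show that swapping the geom changes the graph type.

``` r
library(ggplot2)
library(palmerpenguins)

penguins <- palmerpenguins::penguins

# same template, different variables and geom: a box plot per species
p_box <- ggplot(data = penguins,
                mapping = aes(x = species, y = body_mass_g)) +
  geom_boxplot()

p_box
```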
While this graph might not be ready for publication, it is *infinitely extensible* because it was built using `ggplot2`’s grammar.
## How do I change a `ggplot2` graph?
A language is considered functional when it’s capable of [“*making infinite use of finite means*”](https://en.wikipedia.org/wiki/Wilhelm_von_Humboldt). `ggplot2` does this by providing an infinite number of potential graphs from a finite number of functions. Consider the graph we created above with three lines of code. We can add more aesthetics (with `aes()`) to highlight the differences between groups for the `x` and `y` variables.
``` r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(aes(color = species))
```

We can also include more geoms to further illustrate the group differences (with `geom_smooth()`).
``` r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(aes(color = species))
```
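Because layers inherit the aesthetics supplied to `ggplot()`, the repeated `aes(color = species)` can also be written once in the global mapping. The sketch below should draw the same graph as the code above (the object name `p_global` is ours):

``` r
library(ggplot2)
library(palmerpenguins)

penguins <- palmerpenguins::penguins

# color = species is set once and inherited by both layers
p_global <- ggplot(data = penguins,
                   mapping = aes(x = bill_length_mm, y = flipper_length_mm,
                                 color = species)) +
  geom_point() +
  geom_smooth()

p_global
```

Keeping an aesthetic inside a single geom is still useful when only that layer should use it.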

As you can see, with relatively few lines of code, we’re able to quickly iterate through versions of a graph. `ggplot2` also gives us fine-grained control over how graphs are displayed. For example, we can hide the legend entry for the smoothed lines and use facets to split the graph into small multiples, one per island.
``` r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(aes(color = species), show.legend = FALSE) +
  facet_wrap(~ island, nrow = 3)
```
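Note that `show.legend = FALSE` only affects the layer it is set on, so the points still contribute a legend. To remove the legend from the whole plot, one option (not used in the code above) is `theme(legend.position = "none")`. A minimal sketch, with `p_nolegend` as an illustrative name:

``` r
library(ggplot2)
library(palmerpenguins)

penguins <- palmerpenguins::penguins

p_nolegend <- ggplot(data = penguins,
                     mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(aes(color = species)) +
  facet_wrap(~ island, nrow = 3) +
  # drop every legend from the plot, regardless of which layer produced it
  theme(legend.position = "none")

p_nolegend
```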

We can add finishing touches with labels and themes.
``` r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(aes(color = species), show.legend = FALSE) +
  facet_wrap(~ island, nrow = 3) +
  labs(title = "Bill Length vs. Flipper Length",
       subtitle = "Adelie, Chinstrap, and Gentoo Penguins",
       caption = "source: palmerpenguins data",
       x = "Bill length (mm)", y = "Flipper length (mm)") +
  theme_minimal()
```

The consistent syntax and underlying philosophy of `ggplot2`’s grammar allow us to quickly generate new graphs (and make adjustments to existing graphs).
> Hadley Wickham, the package’s original author: “*My general thesis of visualization is that the quality of the best visualization has maybe improved 10% in the last 150 years. The best visualization you can make today is only slightly better than the best visualization someone could make 150 years ago. But the time it takes you to make them has probably decreased by three orders of magnitude.*” - [source](https://medium.com/nightingale/dataviz-and-the-20th-anniversary-of-r-an-interview-with-hadley-wickham-ea245078fc8a)
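One way to take advantage of this when iterating is to store the shared layers in an object and add to it. In the sketch below, the object names (`base_plot`, `by_island`) are ours, chosen for illustration:

``` r
library(ggplot2)
library(palmerpenguins)

penguins <- palmerpenguins::penguins

# the layers every version of the graph has in common
base_plot <- ggplot(data = penguins,
                    mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(aes(color = species))

# version 1: the scatter plot on its own
base_plot

# version 2: the same plot split into small multiples, with a different theme
by_island <- base_plot +
  facet_wrap(~ island, nrow = 3) +
  theme_minimal()

by_island
```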
## How big is `ggplot2`?
There are 100+ `ggplot2` [extensions](https://exts.ggplot2.tidyverse.org/gallery/), and the number is still growing. Extensions include additional geoms (like [`ggbeeswarm`](https://github.com/eclarke/ggbeeswarm)) and themes (like [`ggthemes`](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/)).
``` r
# install the extensions from GitHub (using remotes, installed earlier)
remotes::install_github("eclarke/ggbeeswarm")
remotes::install_github("jrnold/ggthemes")
library(ggbeeswarm)
library(ggthemes)
```
If you understand the `ggplot2` grammar, extensions are like plug-and-play features for graphs. We simply adapt our template for the new geom and theme layers…
``` r
ggplot(data = penguins,
       mapping = aes(x = island, y = body_mass_g)) +
  ggbeeswarm::geom_beeswarm(aes(color = species)) +
  ggthemes::theme_fivethirtyeight()
```

…and we have a new graph!
We hope you’ll come join us for the workshop! You’ll walk away with a solid introduction and lots of code examples to take home and tinker with.
## Additional Questions
1. Why does `ggplot2` use the `+` instead of the pipe (`%>%`)?
This can be confusing to new R users, especially if they’ve been using the pipe (`%>%`) from the `magrittr` package. The pipe allows us to easily pass the output from a function on the left as an input to the function on the right (in a ‘pipeline’). However, graph layers are added using the plus symbol (`+`). Hadley Wickham touches on the background for why it was implemented this way in [this interview](https://medium.com/nightingale/dataviz-and-the-20th-anniversary-of-r-an-interview-with-hadley-wickham-ea245078fc8a): “*I think I was reading about operator overloading and I thought “Oh maybe I could do this with ‘+’ instead”, and it kind of makes sense, you know, because you’re adding layers to the plot.*”
2. Where can I find `ggplot2` extensions?
[This website](https://exts.ggplot2.tidyverse.org/gallery/) contains a gallery of extensions for `ggplot2`. It’s always a good idea to check [`#ggplot2` on Twitter](https://twitter.com/search?q=%23ggplot2&src=typed_query&f=top), too.
3. Where can I learn more?
`ggplot2` has a [free online book](https://ggplot2-book.org/) and [package website](https://ggplot2.tidyverse.org/) with loads of examples.
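To illustrate the first question above, the sketch below combines both operators: the pipe prepares the data, and `+` adds layers once `ggplot()` has been called. It uses R’s native pipe (`|>`, available in R 4.1 and later) instead of `%>%` only to avoid an extra package. The filtering step and the name `p_piped` are our own illustration.

``` r
library(ggplot2)
library(palmerpenguins)

p_piped <- palmerpenguins::penguins |>
  subset(!is.na(sex)) |>                 # data steps are chained with the pipe
  ggplot(mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(aes(color = sex))           # plot layers are added with +

p_piped
```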