---
title: "Tutorial 3: Histogram"
author: "Elizabeth A. Albright, PhD, Nicholas School of the Environment"
output:
  word_document: default
  html_document: default
subtitle: ENV710 Applied Statistical Modeling for Environmental Management
editor_options:
  chunk_output_type: inline
---
# Tutorial 3: Histogram
Congratulations! You have made it to Tutorial 3! In this tutorial we will work on developing a histogram! We will need to install two new packages, ggplot2 and ggthemes. Be sure to install them in the console below with `install.packages()`, not in a chunk. You should already have the other packages we need (listed in the first chunk) installed and ready to load.

**ggplot2**: This is a data visualization package that we will use throughout the semester. The ggplot2 package enables you to develop all sorts of graphs and visualizations, including histograms, bar charts, and scatterplots.

**ggthemes**: Provides settings to make visualizations consistent and attractive.
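
If you are not sure whether these two packages are already installed, the sketch below, run once in the console, installs only the ones that are missing. The CRAN mirror given in `repos` is an assumption; RStudio usually sets a mirror for you, so you can leave that argument out there.

```r
# Packages that are new in this tutorial
new_pkgs <- c("ggplot2", "ggthemes")

# Keep only the packages that are not installed yet
missing_pkgs <- new_pkgs[!new_pkgs %in% rownames(installed.packages())]

# Install the missing ones from CRAN (run this once, in the console)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

After this, `library(ggplot2)` and `library(ggthemes)` in the chunk below should load without errors.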
```{r library}
library(wbstats) # a package that enables us to import data from the World Bank.
library(ggplot2) # a data visualization package.
library(ggthemes) # a package of themes for visualizations. Themes are settings that make our visualizations consistent and attractive.
library(moments) # allows us to calculate skewness and kurtosis
library(dplyr) # a package that helps us wrangle/manage data
library(tidyr) # a package that allows us to pivot the data
```
Let's start this tutorial afresh by removing all of our objects. Look at the Environment pane on the right. What objects are currently in our Environment? We can also use the function `ls()` to list all the objects in our Environment (the global environment).
```{r}
ls()
```
Let's remove all of the objects (we need to make sure we really want to do this!).
```{r}
rm(list = ls()) #this function removes all objects. rm is short for remove. ls() is a function that lists all objects.
```
Our Environment should now be empty! Please take a look at it on the right.
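
You do not always have to remove everything. The sketch below uses two hypothetical objects, `x` and `y`, to show how `rm()` can remove a single object by name and how `exists()` confirms what is left. It removes `y` at the end so the Environment is empty again.

```r
# Two hypothetical objects, for illustration only
x <- 1:5
y <- "urban"

ls()                           # includes "x" and "y"
rm(x)                          # remove only x
exists("x", inherits = FALSE)  # FALSE: x has been removed
exists("y", inherits = FALSE)  # TRUE: y is still in the Environment
rm(y)                          # tidy up so the Environment is empty again
```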
## Loading .RData
As you may remember from Tutorial 2, we saved our workspace as Tutorial2.RData. We should be able to load this file and get back the objects that we made in that tutorial. Let's try it. We will need to make sure that we load the workspace from the correct folder. You can go to Session > Load Workspace... to find the workspace. The function to load a workspace is `load()`.
Now let's see what our objects are. We will use the `ls()` function.
```{r}
load("Tutorial2.RData") # assumes Tutorial2.RData is in your working directory; otherwise, give the full path to where you saved it
ls()
```
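
To see what `load()` does behind the scenes, the sketch below saves two hypothetical objects to a temporary file with `save()`, removes them, and loads them back. `load()` restores the objects and also returns their names invisibly. (A whole workspace like Tutorial2.RData is typically written with `save.image()`, which saves every object in the Environment.)

```r
# Hypothetical objects saved to a temporary file (not your Tutorial 2 workspace)
demo_file <- tempfile(fileext = ".RData")
a <- 42
b <- c("urban", "rural")
save(a, b, file = demo_file)

rm(a, b)                   # remove both objects from the Environment
loaded <- load(demo_file)  # restores a and b; returns their names invisibly
loaded                     # "a" "b"
a                          # 42
```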
## Making a histogram
Now it's time to visualize the data! Let's make a histogram of the variable urban_pop and let's make it pretty. We will build it in phases, so you can see how it is done.
But before we get going, let's just look at the data to make sure it's all there. Please look at the top 10 rows of wb_data in the chunk below.
```{r data}
head(wb_data, n = 10) # show the first 10 rows (head() shows 6 by default)
```
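
Besides `head()`, it is worth checking the variable you plan to plot for missing values, because ggplot2 drops them before drawing and prints a warning about removed rows. The sketch below uses a small made-up data frame, `wb_demo`, as a stand-in for wb_data; the country labels and values are invented for illustration.

```r
# Made-up stand-in for wb_data, for illustration only
wb_demo <- data.frame(
  country   = c("A", "B", "C", "D", "E", "F", "G"),
  urban_pop = c(35.2, 81.0, NA, 56.4, 92.1, 18.7, 64.3)
)

head(wb_demo)                  # first 6 rows (the default)
head(wb_demo, n = 3)           # first 3 rows
summary(wb_demo$urban_pop)     # min, quartiles, median, mean, max, and number of NA's
sum(is.na(wb_demo$urban_pop))  # 1
```

On the real data you would swap `wb_demo` for `wb_data`.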
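We will start off with the ugly, basic histogram. The first argument to `ggplot()` is the data frame/tibble. The second argument is the aesthetic mapping, `aes()`, which tells ggplot which variable goes where; here we map urban_pop to the x-axis. The plus sign adds a layer to the plot, and placing it at the end of a line tells R that the command continues on the next line. The `geom_` functions tell R which geometry to draw. There are lots of options (`geom_histogram()`, `geom_boxplot()`, `geom_point()` for scatterplots, and many more). We will start with the histogram.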
Okay, please run the chunk below!
```{r}
ggplot(data=wb_data, aes(x=urban_pop)) +
geom_histogram()
```
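
When you run the basic histogram, ggplot2 prints a message that `stat_bin()` is using `bins = 30` and suggests picking a better value with `binwidth`. One alternative is to set the number of bins directly with `bins`. The sketch below uses simulated percentages (not the World Bank data) and ggplot2's `layer_data()`, which returns the computed bars, to confirm how many bars were drawn.

```r
library(ggplot2)

# Simulated percentages between 10 and 100, for illustration only
set.seed(710)
demo <- data.frame(urban_pop = runif(200, min = 10, max = 100))

p <- ggplot(data = demo, aes(x = urban_pop)) +
  geom_histogram(bins = 10)   # ask for 10 bins instead of the default 30

# layer_data() returns one row per bin
nrow(layer_data(p))           # 10
```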
Whoa, that's ugly! But we can make it better. I encourage you to read about histograms in ggplot2 on the [STHDA webpage](http://www.sthda.com/english/wiki/ggplot2-histogram-plot-quick-start-guide-r-software-and-data-visualization).
Next, let's add labels and set a binwidth of 5. It's important to note that binwidth is in the units of the variable (in this case, percentage points), so each bar covers a 5-point range of urban population.
```{r}
ggplot(data = wb_data, aes(x=urban_pop)) +
geom_histogram(binwidth = 5) +
labs(title = "Urban population as percentage of total population", x = "Urban population (%)")
```
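
Because binwidth is in the units of the variable, a variable that runs from 0 to 100% split into 5-point bins gives 100 / 5 = 20 bars. ggplot2 decides where the bin edges fall unless you tell it otherwise; the `boundary` argument of `geom_histogram()` pins an edge at a chosen value, so `boundary = 0` gives bars such as 10 to 15% and 15 to 20%. The sketch below uses simulated data and inspects the bar edges with `layer_data()`.

```r
library(ggplot2)

set.seed(710)
demo <- data.frame(urban_pop = runif(200, min = 10, max = 100))  # simulated, for illustration

p <- ggplot(data = demo, aes(x = urban_pop)) +
  geom_histogram(binwidth = 5, boundary = 0)  # 5-point bins with edges at multiples of 5

bars <- layer_data(p)
head(bars[, c("xmin", "xmax", "count")])  # left edge, right edge, and count of each bar
unique(round(bars$xmax - bars$xmin, 8))   # 5: every bar is 5 percentage points wide
```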
Okay, it's a little better. Now let's change its colors, which we can do inside `geom_histogram()`: `color` sets the outline of the bars and `fill` sets their inside.
```{r}
ggplot(data = wb_data, aes(x = urban_pop)) +
geom_histogram(binwidth = 5, color = "black", fill = "white") +
labs(title = "Urban population as percentage of total population", x = "Urban population (%)")
```
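
To check which argument controls which part of the bars, the sketch below uses simulated data and a different fill color. Named R colors such as `"steelblue"` and hex codes such as `"#2C7FB8"` both work. Note that ggplot2 stores `color` under its British spelling, `colour`, in the computed layer data.

```r
library(ggplot2)

set.seed(710)
demo <- data.frame(urban_pop = runif(200, min = 10, max = 100))  # simulated, for illustration

p <- ggplot(data = demo, aes(x = urban_pop)) +
  geom_histogram(binwidth = 5, color = "black", fill = "steelblue")  # outline vs. inside

bars <- layer_data(p)
unique(bars$colour)  # "black": the bar outlines
unique(bars$fill)    # "steelblue": the inside of the bars
```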
So I still don't love the histogram (this is when the line between data analyst and artist becomes blurry). Let's change the theme of the histogram. A theme controls the overall look of the non-data parts of a plot, such as the background, grid lines, and fonts. I'll use `theme_minimal()` as an illustration.
```{r}
ggplot(data = wb_data, aes(x = urban_pop)) +
geom_histogram(binwidth = 5, color = "black", fill = "white") +
labs(title = "Urban population as percentage of total population", x = "Urban population (%)") +
theme_minimal()
```
What changed in the histogram? Play around with other themes. For example, try `theme_economist()` from the ggthemes package.
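
A convenient way to compare themes is to store the shared layers in one object and then add a different theme to it each time, so you only write the histogram code once. The sketch below uses simulated data; `base_plot`, `p_minimal`, and `p_economist` are hypothetical names.

```r
library(ggplot2)
library(ggthemes)

set.seed(710)
demo <- data.frame(urban_pop = runif(200, min = 10, max = 100))  # simulated, for illustration

# Store the shared layers once...
base_plot <- ggplot(data = demo, aes(x = urban_pop)) +
  geom_histogram(binwidth = 5, color = "black", fill = "white") +
  labs(title = "Urban population as percentage of total population",
       x = "Urban population (%)")

# ...then add a different theme to compare looks
p_minimal   <- base_plot + theme_minimal()
p_economist <- base_plot + theme_economist()

p_minimal    # print to view
p_economist
```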
Now it is your turn! Please make a histogram of one of the other variables. Please be sure to label it appropriately and color it. Check out the STHDA website and play around with the figure. Feel free to change the theme.
## Saving an image as a .png
One way to save a figure is to change the chunk output to the console using the gear pull-down menu. You may get a pop-up menu asking to save your inline output (click yes). Now you can run the chunk with your final histogram. The image should pop up in the Plots pane in the lower right. From there, click Export > Save as Image... and save it as a .png.
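
Another option is to save the figure from code with ggplot2's `ggsave()`, which picks the file type from the extension. The sketch below uses simulated data and a hypothetical file name, and writes to a temporary folder so it is self-contained; in your own work you would give a path inside your project folder instead. The size and resolution are illustrative choices.

```r
library(ggplot2)

set.seed(710)
demo <- data.frame(urban_pop = runif(200, min = 10, max = 100))  # simulated, for illustration

p <- ggplot(data = demo, aes(x = urban_pop)) +
  geom_histogram(binwidth = 5, color = "black", fill = "white") +
  labs(x = "Urban population (%)") +
  theme_minimal()

# Hypothetical file name, saved to a temporary folder for this example
out_file <- file.path(tempdir(), "urban_pop_histogram.png")
ggsave(filename = out_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)
file.exists(out_file)  # TRUE
```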
Congratulations! You have finished Tutorial 3. Be sure to upload your histogram to Discussion/Data visualization/Forums!
**Bonus**
Here is a tutorial on how to make a publication-worthy scatterplot (Economist style): http://rstudio-pubs-static.s3.amazonaws.com/284329_c7e660636fec4a42a09eed968dc47f32.html