---
tags: ucsd-carpentries-archived
---
# UCSD Carpentries Bootcamp - R and Git (June 2022)
**Workshop Details**
Dates: June 13th - 16th, 2022
Days: Monday - Thursday
Time: 9am - 12 pm
**Workshop Agenda:**
https://kthoma2484.github.io/2022-06-13-UCSD/
**Software Installation:**
R Studio downloads: https://www.r-project.org/ - download the free version
Online/Cloud R Studio interface: https://rstudio.cloud/
*This is an online interface that can be used when unable to download R Studio*
Git software: https://git-scm.com/downloads
**Lesson Data (download)**
[Gapminder data](https://kthoma2484.github.io/2022-06-13-UCSD/data/gapminder-FiveYearData.csv) and [Feline-data](https://kthoma2484.github.io/2022-06-13-UCSD/data/feline-data.csv)
## NOTES:
A copy of the instructor live session notes will be made available to participants upon request at the end of the workshop.
## Questions after the workshop about working with R?
You can email UC San Diego Data Science Librarian [Stephanie Labou](mailto:slabou@ucsd.edu) or schedule a [Zoom consultation](https://calendly.com/slabou).
## Workshop Day 1 (10 attendees)
### First name and Last Name/Organization/Dept./Email
| Name (first & last) | Organization | Dept. | Email |
| ------------------------- | ------------ | ----- | --------------- |
| Chris Day | UCSD | Bio | cdday@ucsd.edu |
| Skyler Zheng | UCSD | CogSci | x3zheng@ucsd.edu |
| Andrew Muroyama | UCSD | Biological Sciences | amuroyama@ucsd.edu |
| Anne Marie Berry | UCSD | Biomedical Sciences | amberry@ucsd.edu |
| Ariel Flores | UCSD | Chem E | a6flores@ucsd.edu |
| Rio Aguina-Kang | UCSD | Psych |raguinakang@ucsd.edu |
| Christopher Taylor | UCSD | Environmental Systems: EBE | cdtaylor@ucsd.edu |
| Kya Barounis | UCSD | Psychiatry | kfawleyking@health.ucsd.edu |
| Peter Huang | UCSD | Bio | phuang@ucsd.edu |
| Dina Zangwill | UCSD | BioSci |dzangwil@ucsd.edu |
## Day 1 Questions:
Please enter any questions not answered during live session here:
1.
## Day 1 Live Notes
Intro to RStudio:
RStudio IDE overview
### Running statements in Console
Sometimes, when a statement is not complete, the console will prompt you to finish it. For example, if you type:
```r=
1+100+
```
The console will wait for you to complete the statement. You can either finish it or press the "esc" key to cancel the current statement.
### Boolean operators
`!=` -> Not equal
`==` -> Equal (Note here, not single, but DOUBLE equal sign!)
`<` -> Less than
`>` -> Greater than
`>=` -> Greater than or equal to
`<=` -> Less than or equal to
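A quick sketch of these operators in the console. The value of `x` is arbitrary and chosen only for illustration; each comparison returns a logical value, and comparisons against a vector are done element by element.

```r=
x <- 5                   # arbitrary example value
x != 3                   # TRUE: 5 is not equal to 3
x == 5                   # TRUE: note the double equal sign
x < 3                    # FALSE
x >= 5                   # TRUE
cmp <- c(1, 5, 9) <= x   # element-wise comparison against a vector
cmp                      # TRUE TRUE FALSE
```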
### Object assignment
It is preferred in R to use the assignment operator `<-` to link variable names to the objects (You can still use `=` for assignment). For example:
```r=
a <- 1/40
```
Now the value of `1/40` can be referred to as `a` from this point on. Assigning your output to a variable is helpful because you can then manipulate the variable and do further analysis.
### Naming convention
When naming your variables, there are a couple of rules:
- Cannot start with `_`
- Cannot start with numbers
- Can start with `.`; a variable whose name starts with `.` is hidden in the current environment, but you can still access it by typing its name.
- Normally, variable names start with letters
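The rules above can be tried out with a short sketch. The names `my_var` and `.hidden_var` are made up for illustration; the two invalid names are left as comments because running them would stop with a syntax error.

```r=
my_var <- 1            # starts with a letter: the usual choice
.hidden_var <- 2       # starts with a dot: valid, but hidden from ls()
# _bad <- 3            # syntax error: starts with an underscore
# 2bad <- 4            # syntax error: starts with a number
ls()                   # lists my_var but not .hidden_var
ls(all.names = TRUE)   # lists both
.hidden_var            # still accessible by name
```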
### Vector
```r=
1:5 # this will return 1 2 3 4 5
```
We can do vectorized operations as well:
```r=
2^(1:5) # returns 2 4 8 16 32
```
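A few more vectorized operations, as a small sketch using a made-up vector `v`. Arithmetic and comparisons are applied to each element, and the logical result of a comparison can be used to subset the vector.

```r=
v <- 1:5
v * 2                        # 2 4 6 8 10
v + v                        # 2 4 6 8 10
v > 2                        # FALSE FALSE TRUE TRUE TRUE
total_above_2 <- sum(v[v > 2])   # keep only elements greater than 2, then add them
total_above_2                # 12
```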
### Variable Management
To explore what objects are in your current environment, you can use the `ls()` function. If you want to see *all* variables, including the hidden ones, use `ls(all.names=T)`.
To remove variables, use the `rm()` function
```r=
rm(a) # remove the object with the name "a"
rm(list = ls()) # remove all non-hidden variables
```
### Boolean Values
There are two Boolean values in R: `TRUE` (or `T`) and `FALSE` (or `F`). Note that all letters are capitalized.
### Project Management
Go to File -> New Project -> Existing Directory / New Directory
A great resource on workflow and organizing folders for scientific computing, written by the Carpentries: [Good enough practices in scientific computing](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510)
### Dataframe
File -> New Script -> R Script | Open a new R script
#### Create New Dataframe
Create a new dataframe from scratch:
```r=
cats <- data.frame(coat = c("calico", "black", "tabby"),
weight = c(2.1, 5.0, 3.2),
likes_string = c(1,0,1))
```
Creates a dataframe that looks like
| coat | weight | likes_string |
| -------- | -------- | -------- |
| calico| 2.1| 1|
| black | 5.0| 0|
| tabby | 3.2| 1|
#### View Dataframe
If you want to examine the data, go to the top right **environment** pane and click on the data to view the dataframe.
Alternatively, you can call the `View(cats)` function to view the `cats` dataframe.
#### Output Dataframe to Other Formats (csv, tsv, etc)
To save the `cats` dataframe to a CSV file:
```r=
write.csv(cats, # the object to output
"data_output/feline-data.csv", # the sys path to write to
row.names=F)
```
**Tab-completion**: when you are typing a long system path to an existing file, pressing Tab prompts RStudio to complete the path for you. Tab completion works for saved variables as well.
#### Import data files
```r=
cats <- read.csv("data/feline-data.csv", stringsAsFactors = T) # note the spelling: stringsAsFactors
```
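A minimal round-trip sketch of writing and then reading a CSV file. It uses a temporary file from `tempfile()` instead of the project folders, and the object names `cats_demo`, `tmp_path`, and `cats_back` are assumptions made for this example only.

```r=
cats_demo <- data.frame(coat = c("calico", "black", "tabby"),
                        weight = c(2.1, 5.0, 3.2),
                        likes_string = c(1, 0, 1))
tmp_path <- tempfile(fileext = ".csv")             # temporary file, not a project path
write.csv(cats_demo, tmp_path, row.names = FALSE)  # leave out the row-number column
cats_back <- read.csv(tmp_path, stringsAsFactors = TRUE)
str(cats_back)                                     # coat comes back as a factor
```

With `stringsAsFactors = FALSE` the `coat` column would be read back as plain character strings instead.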
If you are unfamiliar with an option in a function, you can put `?` before the function name to read its documentation. For instance, if you don't know what `row.names=F` does in the function `write.csv()`, simply run
```r=
?write.csv # note here we do NOT include parentheses
```
The **help pane** in the bottom right corner will show you the documentation of the function.
Anything written after a `#` will not be executed by the computer. This is useful for commenting.
#### Column Access
Use the `$` to access or to modify components of an object. Specifically, for dataframe, `$` is used to access columns within a dataframe.
```r=
cats$weight # Access column "weight" from the "cats"
```
Modify all elements in a column and save the result to a new column in the dataframe.
```r=
cats$weight_minus2 <- cats$weight-2
```
To coerce an object into another type, for example converting a character vector to a numeric vector:
```r=
char_vct <- c('0', '2', '4')
num_vct <- as.numeric(char_vct) # returns 0 2 4, as numbers
char_vct <- c('0', 'abc', '4')
num_vct <- as.numeric(char_vct) # returns 0 NA 4 with a warning; 'abc' cannot be converted, so it becomes NA
```
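One way to find which values failed to convert, sketched with the same example vector. `suppressWarnings()` is used here only to keep the console quiet; the `NA` positions show where coercion failed.

```r=
char_vct <- c("0", "abc", "4")
num_vct <- suppressWarnings(as.numeric(char_vct))  # silence the coercion warning
failed <- is.na(num_vct)                           # FALSE TRUE FALSE
char_vct[failed]                                   # "abc": the value that could not be converted
int_vct <- as.integer(c("1", "2"))                 # use as.integer() when whole numbers are needed
typeof(int_vct)                                    # "integer"
```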
#### Rename columns
Rename the second column of the dataframe to 'weight_kg':
`names(cats)[2] <- "weight_kg"`
#### Subset Columns
Select the column coat and weight
```r=
cats_subset <- cats[c("coat", "weight")]
```
Note that you must use the `c()` function here, because the square brackets expect a single object (one vector of column names).
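A short sketch of equivalent ways to subset columns, using a made-up copy of the cats data (`cats_demo`) so that it runs on its own.

```r=
cats_demo <- data.frame(coat = c("calico", "black", "tabby"),
                        weight = c(2.1, 5.0, 3.2),
                        likes_string = c(1, 0, 1))
by_name  <- cats_demo[c("coat", "weight")]   # select by column name
by_index <- cats_demo[c(1, 2)]               # select by column position
identical(by_name, by_index)                 # TRUE: both give the same dataframe
cats_demo[["weight"]]                        # a single column as a vector, like cats_demo$weight
```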
### Object Types
| Type | Example |
| -------- | -------- |
| double | 3.14|
| integer| 3L |
| complex| 1+4i |
| logical| TRUE or FALSE|
| character| "cats"|
To ask for the object type, use `typeof()`. For example, we can do `typeof(cats$weight)` to ask R what is the object type of the column, `weight`.
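A quick check of each type with `typeof()`. Note that a plain number such as `3` is stored as a double; the `L` suffix is what marks an integer.

```r=
typeof(3.14)     # "double"
typeof(3)        # "double": numbers are doubles unless marked otherwise
typeof(3L)       # "integer": the L suffix makes an integer
typeof(1+4i)     # "complex"
typeof(TRUE)     # "logical"
typeof("cats")   # "character"
```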
**Exercise**
Start by making a vector with the numbers 1 through 26. Multiply the vector by 2, and give the resulting vector the names A through Z.
```r=
x <- 1:26
x <- x * 2          # multiply every element by 2
names(x) <- LETTERS # LETTERS is a built-in vector of "A" through "Z"
```
A **factor** in R is a categorical variable. R assigns an integer code to each unique string (level) and stores the codes in memory.
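A small sketch of how a factor stores its values, using a made-up `coat` vector. By default the levels are sorted alphabetically, and each element is stored as the integer position of its level.

```r=
coat <- factor(c("calico", "black", "tabby", "black"))
levels(coat)               # "black" "calico" "tabby"
codes <- as.integer(coat)  # the integer code stored for each element
codes                      # 2 1 3 1
typeof(coat)               # "integer"
nlevels(coat)              # 3
```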
### End Day 1
## Workshop Day 2 (11 attendees)
### First name and Last Name/Organization/Dept./Email
| Name (first & last) | Organization | Dept. | Email |
| ------------------------- | ------------ | ----- | --------------- |
| Rio Aguina-Kang | UCSD | Psychology | raguinakang@ucsd.edu |
| Skyler Zheng | UCSD | CogSci | x3zheng@ucsd.edu |
| Sina Ghaffarnejad |UCSD | Biology | sighaffa@ucsd.edu|
| Ariel Flores | UCSD | Chem E | a6flores@ucsd.edu |
| Andrew Muroyama | UCSD | Biology | amuroyama@ucsd.edu |
| Anne Marie Berry | UCSD | Biology | amberry@ucsd.edu |
| Chris Day | UCSD | Bio | cdday@ucsd.edu |
|Christopher Taylor |UCSD |ESYS:EBE |cdtaylor@ucsd.edu |
| Dina Zangwill | UCSD |BioSci |dzangwil@ucsd.edu |
| Peter Huang | UCSD | bio | phuang@ucsd.edu |
| Kya Barounis | UCSD | Psych | |
## Day 2 Questions:
Please enter any questions not answered during live session here:
1.
**Lesson Data (download)**
[Gapminder data](https://kthoma2484.github.io/2022-06-13-UCSD/data/gapminder-FiveYearData.csv) and [Feline-data](https://kthoma2484.github.io/2022-06-13-UCSD/data/feline-data.csv)
## Day 2 Live Notes
dplyr and tidyverse:
### Introduction to tidyverse
[tidyverse homepage](https://www.tidyverse.org)
Install tidyverse if you have not already done so:
```r=
install.packages("tidyverse")
```
Load the tidyverse library once it is installed:
```r=
library(tidyverse)
```
We will focus on *dplyr* and *tidyr*.
## *dplyr*
```r=
# Installation needs to be done only once
install.packages("tidyverse", dependencies = TRUE)
library(tidyverse)
# The output shows 'Attaching packages' and 'Conflicts' messages when the library is loaded; some packages have conflicting function names, but this is generally okay - it's primarily an FYI
gapminder <- read_csv("data/gapminder_data.csv")
rm(cats)
# Some common tidyverse functions are select(), filter(), group_by(), summarize(), and mutate(); we will also look at %>% (pipe) - this operator passes the result on its left to the function on its right
```
### select()
```r=
#select() lets you subset data by column (variable) name
smallr_gapminder_data <- gapminder %>%
dplyr::select(year, country, gdpPercap)
test1 <- dplyr::select(gapminder, year, country)
rm(test1)
```
### filter()
```r=
#filter() - lets you keep specific rows based on certain criteria
gapminder_europe <- gapminder %>%
filter(continent == "Europe")%>%
select(year,
country,
gdpPercap)
gapminder_europe2 <- gapminder %>%
filter(continent == "Europe")%>%
select(year,
country,
gdpPercap)%>%
rename(gdp= gdpPercap)
#You can use ftable(gapminder$continent) directly in the console to get a view of the data continent variable
```
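To see the difference between `filter()` (rows) and `select()` (columns) without the full gapminder file, here is a sketch on a tiny made-up dataframe. The object `toy` and its values are invented for illustration only.

```r=
library(dplyr)
toy <- data.frame(country = c("France", "Kenya", "Spain"),
                  continent = c("Europe", "Africa", "Europe"),
                  year = c(2007, 2007, 2007),
                  gdpPercap = c(30000, 1500, 28000))   # made-up values
toy_europe <- toy %>%
  filter(continent == "Europe") %>%   # keeps the rows where the condition is TRUE
  select(country, gdpPercap) %>%      # keeps only the named columns
  rename(gdp = gdpPercap)             # new name = old name
toy_europe                            # 2 rows, columns country and gdp
```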
### Challenge:
Write a single command (which can span multiple lines and includes pipes) that will produce a data frame that has the African values for lifeExp, country and year, but not for other Continents. How many rows does your data frame have and why?
```r=
year_country_lifeExp_Africa <- gapminder %>%
filter(continent == "Africa") %>%
select(year, country, lifeExp)
```
### filter() with %in%
```r=
test2 <- gapminder %>%
filter(continent %in% c("Africa",
"Europe")) %>%
select(year, continent, country, lifeExp)
```
### group_by() + summarize()
```r=
gdp_bycontinent <- gapminder%>%
group_by(continent)%>%
summarize(mean_gdp = mean(gdpPercap),
sd_gdp = sd(gdpPercap),
se_gdp = sd(gdpPercap)/sqrt(n()),
count = n())
```
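The same pattern on a tiny made-up dataframe, where the group means are easy to check by hand. The object `toy` and its values are assumptions for this sketch.

```r=
library(dplyr)
toy <- data.frame(continent = c("A", "A", "B", "B"),
                  gdpPercap = c(10, 20, 30, 50))   # made-up values
toy_summary <- toy %>%
  group_by(continent) %>%                 # one group per continent
  summarize(mean_gdp = mean(gdpPercap),   # one summary row per group
            count = n())
toy_summary                               # A: mean 15, count 2; B: mean 40, count 2
```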
### Challenge
Calculate the average life expectancy per country. Which has the longest average life expectancy and which has the shortest average life expectancy?
```r=
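# Mean life expectancy per country (this step was missing from the notes)
lifeExp_bycountry <- gapminder %>%
group_by(country) %>%
summarize(mean_life_exp = mean(lifeExp))
# Longest and shortest average life expectancy
challenge2r <- lifeExp_bycountry %>% filter(mean_life_exp == max(mean_life_exp))
challenge1r <- lifeExp_bycountry %>% filter(mean_life_exp == min(mean_life_exp))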
```
### mutate()
```r=
gdp_per_billion <-
gapminder%>%
mutate(gdp_per_billion = gdpPercap*pop/ 10^9)
# removing column (variable names)
remove_pop_year <-
gapminder%>%
select(-c(pop,year))
```
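A sketch of `mutate()` on made-up numbers, so the new column can be checked by hand. The object `toy` and its values are invented for this example.

```r=
library(dplyr)
toy <- data.frame(country = c("X", "Y"),
                  pop = c(2e6, 5e7),
                  gdpPercap = c(1000, 20000))   # made-up values
toy_gdp <- toy %>%
  mutate(gdp_billion = gdpPercap * pop / 10^9)  # adds a column, keeps the others
toy_gdp$gdp_billion                             # 2 1000
```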
## Introduction to *tidyr*
[tidyr homepage](https://tidyr.tidyverse.org)
[guide](https://swcarpentry.github.io/r-novice-gapminder/14-tidyr/index.html)
[link to gapminder data](https://drive.google.com/drive/folders/1Xz6CUK71n88UEbqn3OFCHUMEG-w7vudd?usp=sharing)
*tidyr* supersedes *reshape2* and *reshape*. *tidyr* is designed specifically for tidying data, not general reshaping (reshape2), or the general aggregation (reshape).
The goal of tidyr is to help you create tidy data. Tidy data are data where:
1. Every column is a variable.
2. Every row is an observation.
3. Every cell is a single value
***tidyr*** functions fall into five main categories (we will focus on pivoting):
- “Pivoting”: converts between long and wide forms. See pivot_longer() and pivot_wider(), and the vignette("pivot") for more details.
- “Rectangling”: turns deeply nested lists (as from JSON) into tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), and the vignette("rectangle") for more details.
- “Nesting”: converts grouped data to a form where each group becomes a single row containing a nested data frame, and “unnesting” does the opposite. See nest(), unnest(), and the vignette("nest") for more details.
- Splitting and combining character columns: use separate() and extract() to pull a single character column into multiple columns; use unite() to combine multiple columns into a single character column.
- Missing values: make implicit missing values explicit with complete(); make explicit missing values implicit with drop_na(); replace missing values with the next/previous value with fill(), or a known value with replace_na().
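A minimal pivoting sketch on a made-up two-country table, going from wide to long and back again. The object names (`wide`, `long`, `wide_again`) and the values are assumptions for illustration.

```r=
library(tidyr)
library(dplyr)
wide <- data.frame(country = c("X", "Y"),
                   pop_1952 = c(1, 2),
                   pop_2007 = c(3, 4))   # made-up values
long <- wide %>%
  pivot_longer(cols = starts_with("pop"),
               names_to = "obstype_year", values_to = "obs_values")
dim(long)          # 4 rows, 3 columns: one row per country and year
wide_again <- long %>%
  pivot_wider(names_from = obstype_year, values_from = obs_values)
names(wide_again)  # "country" "pop_1952" "pop_2007"
```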
### Linking *dplyr* to *ggplot2*
```r=
library(tidyr)
americas <- gapminder[gapminder$continent == "Americas", ]
ftable(americas$country)
levels(as.factor(americas$country))
# Make the plot
ggplot(data = americas, mapping = aes(x= year,
y= lifeExp)) +
geom_line() +
facet_wrap(~country) +
theme_bw() +
theme(axis.text.x = element_text(angle = 30,
hjust =1))
```
### Import wide-form gapminder data
#### Note: You will first need to obtain this file at the link below and add it to your project data folder:
[wide-form gapminder data](https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_wide.csv)
```r=
gap_wide <- read.csv("data/gapminder_wide.csv", stringsAsFactors = FALSE)
```
### Wide-form to long-form
```r=
## Note that this uses the piping notation, and similar to select() we use starts_with() to grab more than one column simultaneously
gap_long <- gap_wide %>%
pivot_longer(
cols = c(starts_with('pop'), starts_with('lifeExp'), starts_with('gdpPercap')),
names_to = "obstype_year", values_to = "obs_values"
)
# Check structure
str(gap_long)
# You can also use the '-' notation to exclude variables
## Note that this generates the same long-form data as the code above
gap_long <- gap_wide %>%
pivot_longer(
cols = c(-continent, -country),
names_to = "obstype_year", values_to = "obs_values"
)
str(gap_long)
# You can separate values in a column by a separator
## Note that in the example above obstype_year has two pieces of information in it
gap_long <- gap_long %>% separate(obstype_year, into = c('obs_type', 'year'), sep = "_")
# Convert year to an integer
gap_long$year <- as.integer(gap_long$year)
```
### Challenge 2
Using gap_long, calculate the mean life expectancy, population, and gdpPercap for each continent. Hint: use the group_by() and summarize() functions we learned in the dplyr lesson
```r=
#challenge 2 answer
gap_long %>% group_by(continent, obs_type) %>%
summarize(means = mean(obs_values))
```
### Go from long-form to the intermediate form of the raw data
```r=
gap_normal <- gap_long %>%
pivot_wider(names_from = obs_type, values_from = obs_values)
# Check dimensions
dim(gap_normal)
dim(gapminder)
# Check column names, and their order
names(gap_normal)
names(gapminder)
# Re-order levels in new data to match original data
gap_normal <- gap_normal[, names(gapminder)]
# Check for similarity between two datasets
all.equal(gap_normal, gapminder)
## There are differences... let's see why...
head(gap_normal)
head(gapminder)
## Ah, there are differences in how the columns are sorted. We can fix this...
gap_normal <- gap_normal %>% arrange(country, year)
# Check again...
all.equal(gap_normal, gapminder)
## All good! The differences are due to tibble vs. data frame (I think)
```
### Going back to wide-format
```r=
# You can unite variables to make it easier to go to wide-form
gap_temp <- gap_long %>% unite(var_ID, continent, country, sep = "_")
str(gap_temp)
# You can use the pipe to unite more than one group of variables at a time
gap_temp <- gap_long %>%
unite(ID_var, continent, country, sep = "_") %>%
unite(var_names, obs_type, year, sep = "_")
str(gap_temp)
# You can now pipe to pivot_wider
gap_wide_new <- gap_long %>%
unite(ID_var, continent, country, sep = "_") %>%
unite(var_names, obs_type, year, sep = "_") %>%
pivot_wider(names_from = var_names, values_from = obs_values)
str(gap_wide_new)
#Split ID_var into 2 columns
gap_wide_betterID <- gap_long %>%
unite(ID_var, continent, country, sep = "_") %>%
unite(var_names, obs_type, year, sep = "_") %>%
pivot_wider(names_from = var_names, values_from = obs_values) %>%
#separate() command splits a column based on a separator
separate(ID_var, c("continent", "country"), sep = "_")
str(gap_wide_betterID)
#Check against original data
all.equal(gap_wide, gap_wide_betterID)
```
### End Day 2
## Workshop Day 3
### First name and Last Name/Organization/Dept./Email
| Name (first & last) | Organization | Dept. | Email |
| ------------------------- | ------------ | ----- | --------------- |
|Rio Aguina-Kang |UCSD | Psychology |raguinakang@ucsd.edu |
| Skyler Zheng | UCSD | CogSci | x3zheng@ucsd.edu |
| Peter Huang |UCSD |Bioinformatics |phuang@ucsd.edu |
| Dina Zangwill | UCSD | BioSci | dzangwil@ucsd.edu |
|Christopher Taylor |UCSD | ESYS:EBE |cdtaylor@ucsd.edu |
| Andrew Muroyama | UCSD | Biology | amuroyama@ucsd.edu |
| Anne Marie Berry | UCSD | Bio | amberry@ucsd.edu |
| Chris Day | UCSD | Bio | cdday@ucsd.edu |
| Sina Ghaffarnejad | UCSD | Bio | sighaffa@ucsd.edu |
## Day 3 Notes
[Plotting with ggplot2](https://swcarpentry.github.io/r-novice-gapminder/08-plot-ggplot2/index.html)
[R graph gallery - ggplot2 examples](https://r-graph-gallery.com/ggplot2-package.html)
```r=
#Call library
library(tidyverse) #could also load ggplot2 by itself using library(ggplot2)
#Simple scatter plot
#specify data to use, x variable, y variable
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point() #specify type of plot (here, scatterplot with points)
#Let's try a transformation and adjust the point transparency, for clarity
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
#alpha is opacity - lower number = more transparent
#can change size of points
geom_point(alpha = 0.5, size = 0.8) +
#transform x axis, log transform
scale_x_log10() +
#remove default gray background by using a pre-built theme
theme_bw()
#Let's adjust labels
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
#alpha is opacity - lower number = more transparent
#can change size of points
geom_point(alpha = 0.5, size = 0.8) +
#transform x axis, log transform
scale_x_log10() +
#remove default gray background by using a pre-built theme
theme_bw() +
#change size of text
theme(axis.text.x = element_text(size = 5))
#Note that later theme settings override earlier ones
#so if you have a theme that made text size 10, but wanted text to be point 5, make sure to set axis.text AFTER theme_bw()
#Add a trend line
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point(alpha = 0.5, size = 0.8) +
scale_x_log10() +
#add linear trend line, set size of line
#if you don't specify method, uses default method
#can also set color
geom_smooth(method = "lm", size = 0.1) +
theme_bw() +
theme(axis.text.x = element_text(size = 5))
```
### Challenge
Modify the size and color of the points in the previous example
Hint: do not use the aes() function
```r=
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point(alpha = 0.3, size = 1, color = "red") +
scale_x_log10() +
geom_smooth(method = "lm", size = 0.2, color = "blue") +
theme_bw() +
theme(axis.text.x = element_text(size = 5))
```
## More on modifying plots
```r=
#Set color based on a variable, in this case continent
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(alpha = 0.75, size = 0.5) +
geom_smooth(method = "lm", color = "blue") +
scale_x_log10() +
theme_bw()
#Existing color palettes, example ColorBrewer
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(alpha = 0.75, size = 0.5) +
geom_smooth(method = "lm", color = "blue") +
scale_x_log10() +
theme_bw() +
#set color palette
#note: works with discrete variables
scale_color_brewer(palette = "Dark2")
#If want to manually set colors
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(alpha = 0.75, size = 0.5) +
geom_smooth(method = "lm", color = "blue") +
scale_x_log10() +
theme_bw() +
scale_color_manual(values = c("red", "orange", "yellow", "green", "blue"))
#Multi-panel figures
americas <- gapminder[gapminder$continent == "Americas",]
ggplot(data = americas, mapping = aes(x = year, y = lifeExp)) +
#make a line plot
geom_line() +
#have a separate panel for each country
facet_wrap(~country) +
#rotate axis labels
theme(axis.text.x = element_text(angle = 30))
#Adjust labels
ggplot(data = americas, mapping = aes(x = year, y = lifeExp)) +
geom_line() +
facet_wrap(~country) +
#adjust labels for x axis, y axis, title of figure
labs(x = "Year", y = "Life Expectancy", title = "Figure 1: Americas") +
theme(axis.text.x = element_text(angle = 30, hjust = 1),
#adjust alignment of plot title
plot.title = element_text(hjust = 0.5))
```
## Exporting plots
Can manually click 'save plot as image' and set file name, format, and size
Alternatively, assign plot to an object and use ggsave()
```r=
americas_plot <- ggplot(data = americas, mapping = aes(x = year, y = lifeExp)) +
geom_line() +
facet_wrap(~country) +
labs(x = "Year", y = "Life Expectancy", title = "Figure 1: Americas") +
theme(axis.text.x = element_text(angle = 30, hjust = 1),
plot.title = element_text(hjust = 0.5))
#specify extension, size, units
ggsave(filename = "results/americas_panels.png", plot = americas_plot, width = 12, height = 10, dpi = 300, units = "cm")
```
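A self-contained sketch of `ggsave()` that writes to a temporary file, so it runs without a `results/` folder. The plot, the data, and the name `out_file` are made up for this example.

```r=
library(ggplot2)
demo_plot <- ggplot(data.frame(x = 1:3, y = c(2, 4, 8)), aes(x = x, y = y)) +
  geom_line()
out_file <- tempfile(fileext = ".png")   # temporary path; the extension sets the format
ggsave(filename = out_file, plot = demo_plot,
       width = 12, height = 10, units = "cm", dpi = 300)
file.exists(out_file)                    # TRUE once the file has been written
```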
## Long vs wide data in plotting
What if we wanted to compare a single variable between two years among all countries? Which format (wide vs long) would be easiest to use for a scatterplot?
```r=
#See levels for country
levels(as.factor(gapminder$country))
#use wide format data
ggplot(data = gap_wide, mapping = aes(x = pop_2007, y = pop_1952, color = country)) +
geom_point() +
geom_smooth(method = "lm") +
#get rid of legend because it's very large
#(has each country as own color)
theme(legend.position = "none")
```
What if we wanted to compare the average of a single variable among continents?
```r=
#use long data, combine dplyr functions with ggplot functions
gap_long %>%
filter(obs_type == "gdpPercap" & year == '2007') %>%
ggplot(mapping = aes(x = continent, y = obs_values)) +
#make invisible point outliers from the boxplots
geom_boxplot(outlier.alpha = 0) +
#want scatterplot of points on top of boxplot to better see distribution
#use jitter to offset the points slightly
#can specify size, opacity, color, and width and height for offset
geom_jitter(height = 0, width = 0.1, size = 0.5, alpha = 0.5, color = "blue") +
#add a point denoting mean
stat_summary(fun = mean, geom = "point", shape = 22, size = 2, color = "red", fill = "red") +
#update naming for y axis label
ylab("GDP per capita")
```
How could we compare the average of all variables among continents?
```r=
gap_long %>%
filter(year == '2007') %>%
ggplot(mapping = aes(x = continent, y = obs_values)) +
geom_boxplot() +
facet_wrap(~obs_type, scales = "free_y")
#Faceting defaults to using the same axis ranges in every panel
#Can change this using the scales option to specify free or just free_x or free_y
#To facet by two variables, use facet_grid() rather than facet_wrap()
gap_long %>%
filter(year == '2007') %>%
ggplot(mapping = aes(x = continent, y = obs_values)) +
geom_boxplot() +
facet_grid(continent ~ obs_type, scales = "free")
#The plot doesn't look very pretty, but you can see how to facet by two variables
```
How could we compare population over time for each country?
```r=
gap_long %>%
filter(obs_type == "pop") %>%
ggplot(mapping = aes(x = year, y = obs_values, group = country)) +
geom_line(aes(color = country)) +
theme(axis.text.x = element_text(angle = 30, hjust = 1), legend.position = "none") +
scale_y_log10() +
ylab("Population") +
xlab("Year")
```
## Example: adding model outputs to plots
```r=
mod1 <- lm(gapminder$lifeExp ~ gapminder$pop)
summary(mod1)
mod1$coefficients
```
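One possible way to put model output on a plot, sketched with synthetic data rather than gapminder. The data, the object names (`demo`, `mod_demo`, `model_plot`), and the choice of `geom_abline()` plus a title are assumptions for illustration, not the approach used in the session.

```r=
library(ggplot2)
set.seed(1)
demo <- data.frame(x = 1:20)
demo$y <- 2 * demo$x + rnorm(20)              # synthetic data with a known slope of 2
mod_demo <- lm(y ~ x, data = demo)
intercept <- coef(mod_demo)[["(Intercept)"]]  # pull coefficients out of the model
slope <- coef(mod_demo)[["x"]]
model_plot <- ggplot(demo, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(intercept = intercept, slope = slope, color = "blue") +  # fitted line
  labs(title = sprintf("Fitted line: y = %.2f + %.2f x", intercept, slope))
model_plot
```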
## More R plotting resources
Make plots interactive with [plotly](https://plotly.com/r/)
Make web app plots with [shiny](https://shiny.rstudio.com/)
## Day 3 Questions:
Please enter any questions not answered during live session here:
1.
## Pre-Day 4 Git Installation
* Install Git Bash via this website: https://git-scm.com/downloads
* Set up a GitHub account via https://github.com/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home (follow the interface instructions; we recommend using your personal email rather than your UCSD email)
### End Day 3
## Workshop Day 4
GitHub Git Cheat Sheet - https://education.github.com/git-cheat-sheet-education.pdf
### First name and Last Name/Organization/Dept./Email
| Name (first & last) | Organization | Dept. | Email |
| ------------------------- | ------------ | ----- | --------------- |
| Andrew Muroyama | UCSD | Biology | amuroyama@ucsd.edu |
| Rio Aguina-Kang | UCSD | Psychology |raguinakang@ucsd.edu |
| Peter Huang | UCSD | Bio | phuang@ucsd.edu |
| Christopher Taylor|UCSD |ESYS:EBE |cdtaylor@ucsd.edu |
| Skyler Zheng | UCSD | CogSci | x3zheng@ucsd.edu |
| Anne Marie Berry |UCSD | Bio | amberry@ucsd.edu |
| Sina Ghaffarnejad | UCSD | Bio | sighaffa@ucsd.edu |
## Day 4 Questions:
Please enter any questions not answered during live session here:
1.
### End Day 4