---
tags: gps
---
# 2022-GPS-Data-Skills-Course
# Introduction to R Programming
here:https://hackmd.io/@U2NG/S1g1fvCKt
## Instructors & helpers
Instructor: Rick McCosh
Helpers: Stephanie Labou, Reid Otsuji, Kim Thomas
TA: Slade Mahoney
## Sign-in
**full name, email**
Sora Park, sop006@ucsd.edu
Gastelum, Erika, edgastel@ucsd.edu
Hung-Yang (Jason) Chien, hchien@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Woojong, Kim, w5kim@ucsd.edu
Morgan Cohen, m7cohen@ucsd.edu
Tendon, Stevinson, stendon@ucsd.edu
Simonian, Katie ksimonian@ucsd.edu
Jasmine Moheb jmoheb@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Xinyi Du,x8du@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Nicholas Valle njvalle@ucsd.edu
Hong En Jonas Lim, helim@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
Sam Cohen, szcohen@ucsd.edu
Nick Heimann, nheimann@ucsd.edu
Yimeng Yang, yiy047@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
Isaac Wang, iswang@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Yu Shi, yus064@ucsd.edu
Ziyuan Zhu, ziz063@ucsd.edu
Yilin Che, yiche@ucsd.edu
Sunny Xu,qixu@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Earley, Elisabeth, eearley@ucsd.edu
Broderick, Topil, btopil@ucsd.edu
Kwadwo Asiedu, kasiedu@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Yu Li, yul193@ucsd.edu
Alayna Bone, abone@ucsd.edu
Juliane Alfen, jalfen@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Laurence Jackson lsjackson@ucsd.edu
Yishu Tang, yit013@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Nikki Qi, haqi@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Koo Fum Kim, kfkim@ucsd.edu
Emily Irion, eirion@ucsd.edu
Ameer Othman, aothman@ucsd.edu
May Han, xuh042@ucsd.edu
Tommy Fang, z8fang@ucsd.edu
Haoran Jiang, haj005@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Maxwell Chien tcchien@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu
Ernesto Castaneda, ecastaneda@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Chengan Li, chl030@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Austin Brown, aubrown@ucsd.edu
David Reimer, dreimer@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Milena Zeray mzeray@ucsd.edu
Qihan Huang q7huang@ucsd.edu
Maykent Salazar mlsalazar@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
Wenjie Tang, w5tang@
Wilborn, Peter, pwilborn@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Bingru LI, bil005@ucsd.edu
Zizan Wang,ziw011@ucsd.edu
Weixiao Guo, w3guo@ucsd.edu
Masahiro Naka, mnaka@ucsd.edu
Kejun Chen, kec007@ucsd.edu
shivangi gupta, shg011@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Kevin Zhou, kezhou@ucsd.edu
Brenna Wayne bwayne@ucsd.edu
Gordon M. Magne, gmagne@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Junhui Xu, jux008@ucsd.edu
# Shared Notes
R can evaluate mathematical expressions and functions:
```r=
3 + 5 * 2
(3 + 5) * 2
sin(1)
log(10)
log10(10)
```
```r=
?log10 # putting ? in front of a function name opens its help page
```
Find R help on:
stackoverflow https://stackoverflow.com
blogs
Google
Comparisons return TRUE or FALSE:
```r=
1 == 1  # TRUE
1 != 1  # FALSE
1 != 2  # TRUE
1 < 2   # TRUE
3 > 1   # TRUE
```
```r=
x <- 1/10 #assigning a variable
x
log(x)
x <- x + 100
```
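```r=
x    # prints the current value of x
# x2 is a valid variable name (a letter followed by a digit)
# 2x is not a valid name: names cannot start with a digit, so typing 2x gives a syntax error
```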
# CHALLENGE 1
What will be the value of each variable after each statement in the following program?
1.) mass <- 50
2.) age <- 22
3.) mass <- mass * 2
4.) age <- age - 20
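One way to check your answers is to run the statements in order and print the variables. A minimal sketch; the values in the comments follow directly from the arithmetic:

```r
mass <- 50         # mass is 50
age <- 22          # age is 22
mass <- mass * 2   # mass is now 100; age is unchanged
age <- age - 20    # age is now 2; mass is still 100
mass
age
```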
# vectors
```r=
1:5                  # the integers 1 to 5
(1:5)^2              # operations apply to every element
vector <- 1:5
vector^2
a <- c(1, 3, 5, 7)   # c() combines values into a vector
```
```r=
rm(a) #delete variables. use carefully
ls()
```
# packages
```r=
install.packages("knitr")
install.packages("ggplot2")
# gapminder data set:
# https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv
```
Set the working directory to the folder where you saved the gapminder dataset, then read in the data using read.csv():
```r=
gapminder <- read.csv("gapminder.csv") # use whatever name you gave the csv you downloaded
```
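If read.csv() cannot find the file, the working directory is usually the cause. A small sketch of the idea, assuming a hypothetical demo file with made-up values (not the real gapminder data) written to a temporary folder so it runs anywhere:

```r
# Hypothetical demo file with made-up values
csv_path <- file.path(tempdir(), "gapminder_demo.csv")
demo <- data.frame(country = c("Canada", "Mexico"),
                   year = c(2007, 2007),
                   lifeExp = c(80, 76))
write.csv(demo, csv_path, row.names = FALSE)

getwd()                          # shows the current working directory
gap_demo <- read.csv(csv_path)   # a full path works whatever the working directory is
str(gap_demo)
```

With the real dataset you would either call setwd() on the folder that holds the csv, or pass the full path to read.csv().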
# Sign-in
full name , email
Gastelum, Erika, edgastel@ucsd.edu
Gordon M. Magne, gmagne@ucsd.edu
Juliane Alfen, jalfen@ucsd.edu
Sora Park, sop006@ucsd.edu
John Kim, jok015@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Nicholas Valle njvalle@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Yu li, yul193@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Stevinson Tendon, stendon@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
Katie Simonian, ksimonian@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Ernesto Castaneda, ecastaneda@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Wilborn, Peter, pwilborn@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Nick Heimann, nheimann@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Woojong, Kim, w5kim@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Emily Irion, eirion@ucsd.edu
Earley, Elisabeth, eearley@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Morgan Cohen, m7cohen@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
shivangi gupta, shg011@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Nikki Qi, haqi@ucsd.edu
Koo Fum Kim, kfkim@ucsd.edu
Tsu Ping, Wang, tsu002@ucsd.edu
Chengan Li, chl030@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Ziyuan Zhu,ziz063@ucsd.edu
Jared Hernandez,jah014@ucsd.edu
Emily Davalos,edavalos@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Haoran Jiang, haj005@ucsd.edu
May Han, xuh042@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
qihan huang, q7huang@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Maykent Salazar mlsalazar@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Yu Shi, yus064@ucsd.edu
Milena Zeray mzeray@ucsd.edu
Sunny Xu, qixu@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Broderick, Topil, btopil@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Kwaadwo Asiedu, kasiedu@ucsd.edu
maxwell chien, tcchien@ucsd.edu
David Reimer, dreimer@ucsd.edu
Yishu Tang, yit013@ucsd.edu
zizan wang, ziw011@ucsd.edu
Alayna Bone, abone@ucsd.edu
Yimeng Yang, yiy047@ucsd.edu
Weixiao Guo, w3guo@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Tommy Fang z8fang@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Bingru Li bil005@ucsd.edu
Brenna Wayne bwayne@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Yuki Imura. yimura@ucsd.edu
Kevin Zhou, kezhou@ucsd.edu
Hung-Yang Chien, hchien@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
Xinyi,Du x8du@ucsd.edu
Junhui Xu, jux008@ucsd.edu
# Notes:
Review
variables
vectors
set working directory
```r=
gapminder <- read.csv("gapminder.csv") # read in gapminder data and store it as gapminder
```
view gapminder data
```r=
View(gapminder) #remember View needs capital V
```
```r=
str(gapminder) #view data structure
```
```r=
head(gapminder)
tail(gapminder)
```
Bracket notation
```r
gapminder[1,1]
gapminder[1,6]
```
```r=
gapminder[1:5, c(1,3,5)]
```
```r=
gapminder[gapminder$country == "Australia","lifeExp"]
```
```r
gapminder[gapminder$country=="Australia" & gapminder$year==1952, c("year","lifeExp")]
```
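Bracket notation is always `[rows, columns]`, and a condition such as `gapminder$country == "Australia"` is just a TRUE/FALSE vector with one value per row. A small sketch on a toy data frame with made-up values (not the real gapminder data):

```r
# Toy data with made-up values
toy <- data.frame(country = c("A", "A", "B", "B"),
                  year = c(1952, 2007, 1952, 2007),
                  lifeExp = c(40, 60, 50, 70))

toy[1, 3]                                 # row 1, column 3
toy[1:2, c("country", "lifeExp")]         # rows 1 to 2, two columns chosen by name
is_b <- toy$country == "B"                # logical vector, one value per row
is_b
toy[is_b & toy$year == 2007, "lifeExp"]   # rows where both conditions are TRUE
```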
## challenge
Use your new subsetting skills to display the life expectancy and GDP per capita for Paraguay in 2007
answer:
```r=
gapminder[gapminder$country=="Paraguay" & gapminder$year==2007, c("lifeExp", "gdpPercap")]
```
writing out a table to a csv file
```r=
headGapminder <- head(gapminder) # first six rows; defined here so the example runs
write.table(
  headGapminder,
  file = "headGapminder.csv",
  sep = ",", quote = FALSE, row.names = FALSE
)
```
Other functions:
```r=
min(gapminder$lifeExp)    # smallest value
max(gapminder$lifeExp)    # largest value
mean(gapminder$lifeExp)   # average
sd(gapminder$lifeExp)     # standard deviation
rnorm(1, 3, 2)            # one random draw from a normal distribution with mean 3 and sd 2
rep(1, 4)                 # repeat the value 1 four times
seq(1, 6, 2)              # sequence from 1 to 6 in steps of 2
```
# knitr package
Use knitr to create reports that combine code, output (including plots), and notes in one document.
Start a new R Markdown file from the RStudio File menu.
R Markdown cheat sheet:
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
# 2022-02-01
## Sign-in
**full name, email**
Sora Park, sop006@ucsd.edu
John Kim, jok015@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Nick Heimann, nheimann@ucsd.edu
Wilborn, Peter, pwilborn@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Nicholas Valle, njvalle@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
morgan cohen, m7cohen@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Sam Cohen, szcohen@ucsd.edu
Jeff Myers, jmyers@ucsd.edu
Weixiao Guo, w3guo@ucsd.edu
Haoran Jiang, haj005@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Juliane Alfen, jalfen@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu
Ziyuan Zhu,ziz063@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Sunny Xu,qixu@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Alayna Bone, abone@ucsd.edu
Milena Zeray mzeray@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Jared Hernandez jah014@ucsd.edu
May Han, xuh042@ucsd.edu
Jasmine Moheb, jmoheb@ucsd.edu
ernesto castaneda, ecastaneda@ucsd.edu
Stevinson Tendon, stendon@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
yiy047@ucsd.edu
Chengan Li, chl030@ucsd.edu
Emily Irion, eirion@ucsd.edu
elisabeth earley, eearley@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
maxwell chien tcchien@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Nikki Qi. haqi@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Bingru LI, bil005@ucsd.edu
David Reimer, dreimer@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Broderick Topil, btopiL@ucsd.edu
Yu Li, yul193@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
Hung-Yang Chien, hchien@ucsd.edu
Junhui Xu, jux008@ucsd.edu
Zizan Wang, ziw011@ucsd.edu
Yishu Tang, yit013@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
kwadwo Asiedu, kasiedu@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Austin Brown, aubrown@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Xinyi Du,x8du@ucsd.edu
# Notes:
```r=
#install.packages("dplyr")
library(dplyr)
```
```r=
gapminder <- read.csv("gapminder.csv") # load data
```
```r=
gapminderCountry <- select(gapminder, country)
```
```r=
gapminder_year_country_lifeExp <- select(gapminder, year, country, lifeExp)
```
```r=
canada_data <- filter(gapminder, country == "Canada")
str(canada_data)
```
## using the Tidyverse package
The dplyr package is part of the Tidyverse:
filter() keeps the rows that match a condition
select() keeps the columns you name
```r=
gap_europe <- gapminder %>% filter(continent == "Europe")
gap_europe_lifeExp <- gap_europe %>% select(lifeExp, country, year)
str(gap_europe_lifeExp)
```
```r=
gap_europe <- gapminder %>%
filter(continent == "Europe") %>%
select(lifeExp, country, year)
gap_europe
```
## Challenge 1
Make a new data frame called "new_data" that has only the columns country, life expectancy, and year, and only the rows for countries in Africa
```r=
new_data <- gapminder %>%
filter(continent == "Africa") %>%
select(country, lifeExp, year)
str(new_data)
```
```r=
gap_no_lifeExp <- gapminder %>%
select(-lifeExp)
str(gap_no_lifeExp)
```
## mutate()
```r=
gap_GDP <- gapminder %>%
mutate(GDP = gdpPercap * pop)
str(gap_GDP)
```
## ifelse()
```r=
gap_50 <- gapminder %>%
mutate(is50 = ifelse(lifeExp >= 50,"over 50", "under 50"))
str(gap_50)
```
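ifelse() checks the condition for every element and picks the first label where it is TRUE and the second where it is FALSE. A minimal sketch on a short vector of made-up values:

```r
life_exp <- c(45, 50, 72)   # made-up values
labels <- ifelse(life_exp >= 50, "over 50", "under 50")
labels
table(labels)               # how many values fall under each label
```

Because the condition uses `>=`, a value of exactly 50 is labelled "over 50".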
```r=
gap_50 %>% filter(is50 == "under 50") %>%
count()
gap_50
```
```r=
gap50_num <- gapminder %>%
mutate(is50 = ifelse(lifeExp >= 50,"over 50", "under 50"))
str(gap50_num)
```
```r=
gap50_num %>% filter(is50 == "under 50") %>%
count()
gap_50
gap50_num
```
```r=
gap_50 %>% filter(is50 == "under 50") %>%
  count(continent) # number of "under 50" rows in each continent
```
```r=
gap_50 %>% filter(is50 == "under 50") %>%
  count(continent, sort = TRUE) # same counts, largest first
```
## Challenge 2
Create a new data frame called new_data2 from the gapminder data frame that has an additional column indicating if the population is “over 50 million people” or “less than 50 million people”
answer:
```r=
new_data2 <- gapminder %>%
  mutate(is50million = ifelse(pop >= 50000000, "over 50 million people", "less than 50 million people"))
str(new_data2)
```
## group_by() and summarize()
```r=
GdpPerCap_by_continent <- gapminder %>%
group_by(continent) %>%
summarize(meanGDPPerCap = mean(gdpPercap))
GdpPerCap_by_continent
```
## Challenge 3
Calculate the average life expectancy per country. Which country has the longest life expectancy? Which country has the shortest life expectancy?
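The general pattern is: compute one mean per country, then look up the rows with the largest and smallest mean. A base-R sketch of that pattern on a toy data frame with made-up values (this is not the workshop's dplyr answer, which would use group_by() and summarize()):

```r
# Toy data with made-up values
toy <- data.frame(country = c("A", "A", "B", "B", "C", "C"),
                  lifeExp = c(40, 60, 50, 70, 30, 40))

mean_life <- aggregate(lifeExp ~ country, data = toy, FUN = mean)  # one mean per country
mean_life
longest <- mean_life[which.max(mean_life$lifeExp), ]    # row with the highest mean
shortest <- mean_life[which.min(mean_life$lifeExp), ]   # row with the lowest mean
longest
shortest
```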
# 2022-02-03
## Sign-in
**full name, email**
John Kim, jok015@ucsd.edu
Sora Park, sop006@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Nick Heimann, nheimann@ucsd.edu
Wilborn Peter, pwilborn@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Broderick Topil, btopil@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Yu Li, yul193@ucsd.edu
morgan cohen, m7cohen@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Yimeng Yang, yiy047@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Milena Zeray, mzeray@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Nikki Qi, haqi@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Sunny Xu, qixu@ucsd.edu
Juliane Alfen, jalfen@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Emily Irion, eirion@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
rawlins, mackenna, mjrawlins@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Hung-Yang (Jason) CHien, hchien@ucsd.edu
Yishu Tang, yit013@ucsd.edu
Kevin Zhou, kezhou@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
Marissa Myers, maclan@ucsd.edu
YukiImura, yimura@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Chengan Li, chl030@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Tung Cheh (Maxwell) Chien tcchien@ucsd.edu
Ziyuan Zhu,ziz063@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
May Han, xuh042@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
David Reimer, dreimer@ucsd.edu
koo fum kim, kfkim@ucsd.edu
Alayna Bone, abone@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
kwadwo asiedu, kasiedu@ucsd.edu
Xinyi Du,x8du@ucsd.edu
Brenna Wayne bwayne@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Elisabeth Earley, eearley@ucsd.edu
Bingru LI, bil005@ucsd.edu
Zizan Wang, ziw011@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Sam Cohen, szcohen@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Junhui Xu, jux008@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu
# Notes
install and/or load packages:
```r=
library(lubridate)
library(dplyr)
```
These packages are part of the Tidyverse.
There is a tidyverse package that installs several packages useful for data analysis; load it with `library(tidyverse)`
### review
select()
filter()
count()
mutate()
group_by()
summarize()
```r
lifeExp_by_country <- gapminder %>%
group_by(country) %>%
summarize(MeanLifeExp = mean(lifeExp))
lifeExp_by_country
```
```r=
lifeExp_by_country <- gapminder %>%
group_by(continent, country) %>%
summarize(MeanLifeExp = mean(lifeExp))
View(lifeExp_by_country)
```
# tibbles
A tibble is a newer data structure in the Tidyverse that is a lot like a data frame, but with some formatting differences and some behind-the-scenes changes that are meant to make working with data frames, particularly large ones, simpler.
# lists
Lists are similar to vectors in that they consist of one or more values in a particular order, but unlike a vector, the values do not all need to be the same data type.
```r
length(gapminder)
```
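A short sketch of the difference, using made-up values: a vector converts everything to one type, a list keeps each element's own type, and a data frame is itself a list of columns, which is why length() on a data frame returns the number of columns.

```r
v <- c(1, "a", TRUE)      # a vector coerces every value to a single type
class(v)                  # "character"
l <- list(1, "a", TRUE)   # a list keeps each element's own type
sapply(l, class)          # "numeric" "character" "logical"

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
length(df)                # 2, the number of columns
```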
# Tidy data
each column is a variable
each row is an observation
# rename()
```r=
gap_GDPpc <- gapminder %>%
rename(GDPpc = gdpPercap)
```
## combining or adding rows to datasets
```r=
new_row <- data.frame(country = "USA",
year = 2002,
pop = 1000000,
continent = "Americas",
lifeExp = 100,
gdpPercap = 200)
gapminder_taller <- rbind(gapminder, new_row)
str(gapminder_taller)
tail(gapminder_taller)
```
```r=
set.seed(1)
new_column <- rnorm(nrow(gapminder)) # one random value per row (the full gapminder data has 1704 rows)
```
```r
gap_wider <- cbind(gapminder, new_column)
```
# separate()
`test <- data.frame(A = c(1, 2, 3), B = c("1-2", "2-3", "4-5"))`
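The notes stop before showing the split itself. As a sketch of the idea in base R (an assumption, not necessarily what was shown in class), column B can be split on the "-" into two new columns; the column names `start` and `end` are made up for illustration:

```r
test <- data.frame(A = c(1, 2, 3), B = c("1-2", "2-3", "4-5"),
                   stringsAsFactors = FALSE)

parts <- strsplit(test$B, "-")                     # list of character vectors, one per row
test$start <- as.numeric(sapply(parts, `[`, 1))    # first piece of each split
test$end <- as.numeric(sapply(parts, `[`, 2))      # second piece of each split
test

# With tidyr loaded, a similar result comes from:
# separate(test, B, into = c("start", "end"), sep = "-")
```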
# missing values (NA)
```r=
heights <- c(2, 4, NA, 6, 8, NA, 12)
mean(heights) # returns NA because some values are missing
max(heights)  # NA
min(heights)  # NA
mean(heights, na.rm = TRUE) # na.rm = TRUE drops the missing values before calculating
max(heights, na.rm = TRUE)
heights
heights2 <- na.omit(heights) # returns a vector with the missing values removed
heights2
heights2 <- heights[complete.cases(heights)] # keeps only the non-missing values
```
# identify NA
```r=
is.na(heights)
weights <- seq(1, 7,1)
cbind(heights, weights)
weights[is.na(heights)] # the weights at the positions where heights is missing
```
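Because is.na() returns TRUE/FALSE values, it can also be used to count missing values and find where they are. A minimal sketch with the same heights vector:

```r
heights <- c(2, 4, NA, 6, 8, NA, 12)
n_missing <- sum(is.na(heights))          # TRUE counts as 1, so this counts the NAs
n_missing
missing_at <- which(is.na(heights))       # positions of the missing values
missing_at
mean_height <- mean(heights, na.rm = TRUE)
mean_height
```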
# Multiple comparisons
```r=
gapminder %>%
filter(country == "Canada" | country == "United States" | country == "Mexico")
```
```r=
gapminder %>% filter(country %in% c("Canada", "Mexico", "United States"))
```
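The two filters above select the same rows: `%in%` is shorthand for a chain of `==` comparisons joined with `|`. A small base-R check on a made-up vector of country names:

```r
countries <- c("Canada", "Mexico", "Chile", "United States")   # made-up example vector
wanted <- c("Canada", "Mexico", "United States")

with_or <- countries == "Canada" | countries == "Mexico" | countries == "United States"
with_in <- countries %in% wanted
identical(with_or, with_in)   # both give the same TRUE/FALSE vector
countries[with_in]
```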
https://hackmd.io/4S3SFxUdQaOJRsBloTZftQ
# 2022-02-01
## Sign-in
full name , email
John Kim, jok015@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Broderick Topil, btopil@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Elisabeth Earley, eearley@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Sora Park, sop006@ucsd.edu
Bowen Deng b2deng@ucsd.edu
morgan cohen, m7cohen@ucsd.edu
Kevin Zhou, kezhou@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Tomas Lavados, tlavados@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Chengan Li, chl030@ucsd.edu
Milena Zeray, mzeray@ucsd.edu
Maxwell Chien, tcchien@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Sam Cohen, szcohen@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
Yu Li, yul193@ucsd.edu
Yimeng Yang, yiy047@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Bingru LI, bil005@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Nikki Qi, haqi@ucsd.edu
David Reimer, dreimer@ucsd.edu
Xinyi Du, x8du@ucsd.edu
Dahlia Lopez del006@ucsd.edu
Sunny Xu,qixu@ucsd.edu
qihan huang q7huang@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
Junhui Xu, jux008@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Zizan Wang, ziw011@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Kwadwo asiedu, kasiedu@ucsd.edu
Stevinson Tendon, stendon@ucsd.edu
Yishu Tang, yit013@ucsd.edu
Emily Irion, eirion@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Ziyuan Zhu,ziz063@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Wilborn, Peter, pwilborn@ucsd.edu
Hung-Yang (Jason) Chien, hchien@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
May Han, xuh042@ucsd.edu
Brenna Wayne bwayne@ucsd.edu
# creating plots with ggplot2
load the ggplot2 package
```r=
library(ggplot2)
```
load data
```r=
gapminder <- read.csv(file = "gapminder.csv")
```
```r=
ggplot(data= gapminder, aes(x = gdpPercap, y=lifeExp)) + geom_point()
```
Challenge 1:
Create a scatter plot showing life expectancy on the y-axis and population on the x-axis
Answer:
```r=
ggplot(data = gapminder, aes(x = pop, y = lifeExp)) +
  geom_point()
```
saving a plot
```r=
ggsave("filename.pdf") # saves the last plot to your R working directory; the file extension sets the format
```
you can also save your plot using the pdf() function
```r=
pdf("gdpPercap_vs_time.pdf", width=8, height=8)
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point()
dev.off() # closes the PDF device and finishes writing the file; always call it after pdf()
```
```r=
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() + theme_classic()
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() + theme_minimal()
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() + theme_bw()
```
### line plots
```r=
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) + geom_line() # group draws one line per country
```
```r=
P <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country))
P <- P + geom_line()
P
```
```r=
P <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent))
P <- P + geom_line()
P <- P + geom_point(color = "black")
P
```
changing layer order: draw the points first, then the lines on top
```r=
P <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent))
P <- P + geom_point(color = "black")
P <- P + geom_line()
P
```
### manually select colors
```r=
P <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent))
P <- P + geom_point(color = "black")
P <- P + geom_line(color = "darkOliveGreen")
P
```
manually specifying colors for continents
```r=
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent))
p <- p + geom_line()
p <- p + scale_colour_manual(values = c('red','pink','yellow','black','green'))
p
```
The ggplot2 cheatsheet https://www.rstudio.com/resources/cheatsheets/
change the plot to points
```r=
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent))
p <- p + geom_point() # geom_line() changed to geom_point() here
p <- p + scale_colour_manual(values = c('red','pink','yellow','black','green'))
p
```
### labels
```r=
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent))
p <- p + geom_line()
p <- p + labs(title = "Life Exp over time", x = "year", y = "Life Expectancy") #label text added here
p
```
### modifying x & y scales
```r
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent))
p <- p + geom_line()
p <- p + labs(title = "Life Exp over time", x = "year", y = "Life Expectancy")
p <- p + scale_y_continuous(limits = c(0,100), breaks = seq(0, 100, 10), labels = seq(0, 100, 10) ) # limits = axis range, breaks = tick mark positions, labels = text shown at the tick marks
p <- p + scale_x_continuous(limits = c(1952, 2007), breaks = seq(1952,2007, 10), labels = seq(1952,2007, 10) ) # same arguments for the x-axis
p
```
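The `breaks` and `labels` arguments above are ordinary numeric vectors built with seq(), so it can help to print them on their own before using them in a scale. A minimal sketch:

```r
y_breaks <- seq(0, 100, 10)         # 0, 10, 20, ... up to 100
y_breaks
year_breaks <- seq(1952, 2007, 10)  # steps of 10 starting at 1952
year_breaks
```

Note that 2007 is not one of the x-axis breaks: seq() stops at the last value that does not go past the upper limit, which here is 2002.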
## Challenge 2
Create a scatter plot with year on the X-axis and population on the y-axis.
Create an appropriate title for the plot
Change the x and y axis titles to something more readable
Change the x and y axis values to something more useful
answer:
```r
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_point()
p <- p + labs(title = "Challenge", x = "year", y = "population")
p <- p + scale_y_continuous(limits = c(0,2000000000) , breaks = seq(0, 2000000000, 1000000000 ) , labels = seq(0, 2000000000, 1000000000 ) )
p <- p + scale_x_continuous(limits = c(1952,2017) , breaks = seq(1952, 2017,15) , labels = seq(1952, 2017,15))
p
```
# subsetting data
# 2022-02-10 week 3 R
## Sign-in
John Kim, jok015@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Sora Park, sop006@ucsd.edu
Broderick Topil, btopil@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Sunny Xu,qixu@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Stevinson Tendon, stendon@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
David Reimer, dreimer@ucsd.edu
Wilborn, Peter, pwilborn@ucsd.edu
Nikki Qi, haqi@ucsd.edu
Qihan Huang, q7huang@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Emily Irion, eirion@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Milena Zeray, mzeray@ucsd.edu
Yishu Tang, yit013@ucsd.edu
maxwell chien,tcchien@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
Yu Li, yul193@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Chengan Li, chl030@ucsd.edu
morgan cohen, m7cohen@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Xinyi Du, x8du@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Bingru LI, bil005@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Junhui Xu, jux008@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Yaohong Wang,yaw045@ucsd.edu
Dahlia Lopez, del006@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Kwadwo Asiedu, kasiedu@ucsd.edu
Jaeyeon Park, jap013@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Ziyuan Zhu,ziz063@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Zizan Wang, ziw011@ucsd.edu
**assign a subset of the gapminder data to a variable**
```r
dataOceania <- gapminder %>%
filter(continent == "Oceania")
str(dataOceania)
```
```r=
p <- ggplot(data = dataOceania, aes(x = year, y = lifeExp, group = country))
p <- p + geom_line()
p
```
```r=
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country))
p <- p + geom_line(color = "red")
p <- p + geom_line(data = dataOceania, aes(x = year, y = lifeExp, group = country), color = 'black')
p
```
# Challenge 4
Create a scatter plot with year on the X-axis and population on the y-axis. Choose a continent to highlight in a different color (you can choose any continent and any colors)
```r=
gapAsia <- gapminder %>% filter(continent == "Asia") # subset to highlight; any continent works
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_point(color = "green")
p <- p + geom_point(data = gapAsia, aes(x = year, y = pop), color = 'yellow')
p
```
# transparency
```r=
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_point(color = "black", alpha = 0.05)
p <- p + scale_y_log10()
p <- p + labs(y = "log(pop)")
p
```
```r=
# Jitter
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_jitter(color = "black", alpha = 0.05)
p <- p + scale_y_log10()
p <- p + labs(y = "log(pop)")
p
```
```r=
# trend line
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_jitter(color = "black", alpha = 0.05, width = 1, height = 0)
p <- p + scale_y_log10()
p <- p + geom_smooth(method = "lm")
p <- p + labs(y = "log(pop)")
p
```
```r=
# adjust line size
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_jitter(color = "black", alpha = 0.05, width = 1, height = 0)
p <- p + scale_y_log10()
p <- p + geom_smooth(method = "lm", size = 2) # in ggplot2 3.4.0 and later, use linewidth = 2 instead of size for lines
p <- p + labs(y = "log(pop)")
p
```
# box plots
```r
gap2007 <- gapminder %>%
filter(year == 2007)
#populations per continent
p <- ggplot(data = gap2007, aes(x = continent, y = pop))
p <- p + geom_boxplot()
p <- p + scale_y_log10()
p
```
```r=
gapminder2007 <- gapminder %>% filter(year == 2007)
P <- ggplot(data = gapminder2007, aes(x = continent, y = pop))
P <- P + geom_boxplot()
P <- P + geom_jitter(height = 0, width = 0.1)
P <- P + scale_y_log10()
P <- P + labs(y = "Log(Pop)")
P
```
```r=
meanlifeExpContinent <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(MeanLifeExp = mean(lifeExp), SDLifeExp = sd(lifeExp))
meanlifeExpContinent
```
```r=
p <- ggplot(data = meanlifeExpContinent, aes(x = continent, y = MeanLifeExp))
p <- p + geom_col()
p
```
# multipanel plots
```r=
europe <- gapminder %>% filter(continent == "Europe")
P <- ggplot(data = europe, aes(x = year, y = pop))
P <- P + geom_line()
P <- P + facet_wrap(~ country)
P
```
```r
europe <- gapminder %>% filter(continent == "Europe")
P <- ggplot(data = europe, aes(x = year, y = pop))
P <- P + geom_line()
P <- P + facet_wrap(~ country)
P <- P + theme(axis.text.x = element_text(angle = 45))
P
```
Challenge
Create a plot of life expectancy over year for each country in Africa. Make the x and y axis labels nicer.
```r=
africas <- gapminder %>% filter(continent == "Africa")
P <- ggplot(data = africas, aes(x = year, y = lifeExp))
P <- P + geom_line()
P <- P + facet_wrap(~ country)
P <- P + labs (x = "Year", y = "Life Expectancy")
P <- P + theme(axis.text.x = element_text(angle = 45))
P
```