---
tags: gps
---

# 2022-GPS-Data-Skills-Course

# Introduction to R Programming

here: https://hackmd.io/@U2NG/S1g1fvCKt

91

## Instructors & helpers

Instructor: Rick McCosh
Helpers: Stephanie Labou, Reid Otsuji, Kim Thomas
TA: Slade Mahoney

## Sign-in

**full name, email**

Sora Park, sop006@ucsd.edu
Gastelum, Erika, edgastel@ucsd.edu
Hung-Yang (Jason) Chien, hchien@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Woojong, Kim, w5kim@ucsd.edu
Morgan Cohen, m7cohen@ucsd.edu
Tendon, Stevinson, stendon@ucsd.edu
Simonian, Katie ksimonian@ucsd.edu
Jasmine Moheb jmoheb@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Xinyi Du, x8du@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Nicholas Valle njvalle@ucsd.edu
Hong En Jonas Lim, helim@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
Sam Cohen, szcohen@ucsd.edu
Nick Heimann, nheimann@ucsd.edu
Yimeng Yang, yiy047@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
Isaac Wang, iswang@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Yu Shi, yus064@ucsd.edu
Ziyuan Zhu, ziz063@ucsd.edu
Yilin Che, yiche@ucsd.edu
Sunny Xu, qixu@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Earley, Elisabeth, eearley@ucsd.edu
Broderick, Topil, btopil@ucsd.edu
Kwadwo Asiedu, kasiedu@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Yu Li, yul193@ucsd.edu
Alayna Bone, abone@ucsd.edu
Juliane Alfen, jalfen@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Laurence Jackson lsjackson@ucsd.edu
Yishu Tang, yit013@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Nikki Qi, haqi@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Koo Fum Kim, kfkim@ucsd.edu
Emily Irion, eirion@ucsd.edu
Ameer Othman, aothman@ucsd.edu
May Han, xuh042@ucsd.edu
Tommy Fang, 
z8fang@ucsd.edu
Haoran Jiang, haj005@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Maxwell Chien tcchien@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu
Ernesto Castaneda, ecastaneda@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Chengan Li, chl030@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Austin Brown, aubrown@ucsd.edu
David Reimer, dreimer@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Milena Zeray mzeray@ucsd.edu
Qihan Huang q7huang@ucsd.edu
Maykent Salazar mlsalazar@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
Wenjie Tang, w5tang@
Wilborn, Peter, pwilborn@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Bingru LI, bil005@ucsd.edu
Zizan Wang, ziw011@ucsd.edu
Weixiao Guo, w3guo@ucsd.edu
Masahiro Naka, mnaka@ucsd.edu
Kejun Chen, kec007@ucsd.edu
shivangi gupta, shg011@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Kevin Zhou, kezhou@ucsd.edu
Brenna Wayne bwayne@ucsd.edu
Gordon M. Magne, gmagne@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Junhui Xu, jux008@ucsd.edu

# Shared Notes

R can evaluate mathematical expressions and functions

```r=
3 + 5 * 2
(3 + 5) * 2
sin(1)
log(10)    # natural log
log10(10)  # base-10 log
```

```r=
?log10 # using ? opens the help page for a function
```

Find R help on:
stackoverflow https://stackoverflow.com
blogs
Google

Comparisons can be done

```r=
1 == 1
1 != 1
1 != 2
1 < 2
3 > 1
# etc.
```

```r=
x <- 1/10 # assign a value to the variable x
log(x)
x <- x + 100
```

```r=
x          # valid name
x2 <- 2    # names may contain numbers
# 2x <- 2  # does not work, will error: a name cannot start with a number
```

# CHALLENGE 1

What will be the value of each variable after each statement in the following program?

1.) mass <- 50
2.) age <- 22
3.) mass <- mass * 2
4.) age <- age - 20

# vectors

```r=
1:5
(1:5)**2   # ** is the same as ^
vector <- 1:5
vector**2
a <- c(1, 3, 5, 7) # c() combines values into a vector
```

```r=
rm(a) # delete a variable; use carefully
ls()  # list the variables in the environment
```

# packages

```r=
install.packages("knitr")
install.packages("ggplot2")
# gapminder data set
# https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv
```

Set the working directory to whichever folder you saved the gapminder dataset to.
Read in the data using read.csv()

```
gapminder <- read.csv("gapminder.csv") # use whatever name you gave the csv you downloaded
```

81

# SIGN in

full name, email

Gastelum, Erika, edgastel@ucsd.edu
Gordon M. Magne, gmagne@ucsd.edu
Juliane Alfen, jalfen@ucsd.edu
Sora Park, sop006@ucsd.edu
John Kim, jok015@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Nicholas Valle njvalle@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Yu li, yul193@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Stevinson Tendon, stendon@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
Katie Simonian, ksimonian@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
J Ernesto Castaneda, ecastaneda@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Wilborn, Peter, pwilborn@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Nick Heimann, nheimann@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Woojong, Kim, w5kim@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Emily Irion, eirion@ucsd.edu
Earley, Elisabeth, eearley@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Morgan Cohen, m7cohen@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
shivangi gupta, shg011@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Nikki Qi, haqi@ucsd.edu
Koo Fum Kim, kfkim@ucsd.edu
Tsu Ping, Wang, tsu002@ucsd.edu
Chengan Li, chl030@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Ziyuan Zhu, ziz063@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Haoran Jiang, haj005@ucsd.edu
May Han, xuh042@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
qihan huang, q7huang@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Maykent Salazar mlsalazar@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Yu Shi, yus064@ucsd.edu
Milena Zeray mzeray@ucsd.edu
Sunny Xu, qixu@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Broderick, Topil, btopil@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Kwaadwo Asiedu, kasiedu@ucsd.edu
maxwell chien, tcchien@ucsd
David Reimer, dreimer@ucsd.edu
Yishu Tang, yit013@ucsd.edu
zizan wang, ziw011@ucsd.edu
Alayna Bone, abone@ucsd.edu
Yimeng Yang, yiy047@ucsd.edu
Weixiao Guo, w3guo@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Tommy Fang z8fang@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Bingru Li bil005@ucsd.edu
Brenna Wayne bwayne@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Yuki Imura. yimura@ucsd.edu
Kevin Zhou, kezhou@ucsd.edu
Hung-Yang Chien, hchien@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
Xinyi,Du x8du@ucsd.edu
Junhui Xu, jux008@ucsd.edu
qihan huang q7huang@ucsd.edu
Koo Fum Kim, kfkim@ucsd.edu

# Notes:

Review
variables
vectors
set working directory

```r=
gapminder <- read.csv("gapminder.csv") # read in gapminder data and keep it in a variable
```

view gapminder data

```r=
View(gapminder) # remember View needs capital V
```

```r=
str(gapminder) # view data structure
```

```r=
head(gapminder) # first six rows
tail(gapminder) # last six rows
```

Bracket notation: [row, column]

```r
gapminder[1,1]
gapminder[1,6]
```

```r=
gapminder[1:5, c(1,3,5)]
```

```r=
gapminder[gapminder$country == "Australia","lifeExp"]
```

```r
gapminder[gapminder$country=="Australia" & gapminder$year==1952, c("year","lifeExp")]
```

## challenge

Use your new ‘subsetting’ skills to display the life expectancy and GDP per capita for Paraguay in 2007

answer:

```r=
gapminder[gapminder$country=="Paraguay" & gapminder$year==2007, c("lifeExp", "gdpPercap")]
```

writing out a table to csv file

```r=
# assumption: headGapminder holds the first rows of gapminder (the notes do not show where it was created)
headGapminder <- head(gapminder)
write.table(
  headGapminder,
  file="headGapminder.csv",
  sep=",", quote=FALSE, row.names=FALSE
)
```

Other functions:

```r=
min(gapminder$lifeExp)
max(gapminder$lifeExp)
mean(gapminder$lifeExp)
sd(gapminder$lifeExp)
rnorm(1,3,2) # one random draw from a normal distribution with mean 3 and sd 2
rep(1,4)     # repeat the value 1 four times
seq(1,6,2)   # sequence from 1 to 6 in steps of 2
```

# knitr package

Use knitr to create reports that conveniently combine code, output (including plots), and notes in one document.
Start a new R Markdown file from the RStudio File menu.
R Markdown cheat sheet: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

62

# 2022-02-01

## Sign-in

**full name, email**

Sora Park, sop006@ucsd.edu
John Kim, jok015@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Nick Heimann, nheimann@ucsd.edu
Wilborn, Peter, pwilborn@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Nicholas Valle, njvalle@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
morgan cohen, m7cohen@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Sam Cohen, szcohen@ucsd.edu
Jeff Myers, jmyers@ucsd.edu
Weixiao Guo, w3guo@ucsd.edu
Haoran Jiang, haj005@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Juliane Alfen, jalfen@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu
Ziyuan Zhu, ziz063@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Sunny Xu, qixu@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Alayna Bone, abone@ucsd.edu
Milena Zeray mzeray@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Jared Hernandez jah014@ucsd.edu
May Han, xuh042@ucsd.edu
Jasmine Moheb, jmoheb@ucsd.edu
ernesto castaneda, ecastaneda@ucsd.edu
Stevinson Tendon, stendon@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
yiy047@ucsd.edu
Chengan Li, chl030@ucsd.edu
Emily Irion, eirion@ucsd.edu
elisabeth earley, eearley@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
maxwell chien tcchien@ucsd.eud
Elise Spencer, Enspencer@ucsd.edu
Nikki Qi. 
haqi@ucsd.edu
Bing Rethy, brethy.edu
Bingru LI, bil005@ucsd.edu
David Reimer, dreimer@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Broderick Topil, btopiL@ucsd.edu
Yu Li, yul193@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Jaeyeon, Park, jap0jap013@ucsd.edu
Hung-Yang Chien, hchien@ucsd.edu
Junhui Xu, jux008@ucsd.edu
Zizan Wang, ziw011@ucsd.edu
Yishu Tang, yit013@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
kwadwo Asiedu, kasiedu@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Hyun Ji Jung, hjjunc@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Austin Brown, aubrown@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Xinyi Du, x8du@ucsd.edu

# Notes:

```r=
#install.packages("dplyr")
library(dplyr)
```

```r=
gapminder <- read.csv("gapminder.csv") # load data
```

```r=
gapminderCountry <- select(gapminder, country)
```

```r=
gapminder_year_country_lifeExp <- select(gapminder, year, country, lifeExp)
```

```r=
canada_data <- filter(gapminder, country == "Canada")
str(canada_data)
```

## using the Tidyverse package

dplyr package
filter() keeps rows
select() keeps columns

```r=
gap_europe <- gapminder %>% filter(continent == "Europe")
gap_europe_lifeExp <- gap_europe %>% select(lifeExp, country, year)
str(gap_europe_lifeExp)
```

```r=
# the same two steps chained in one pipeline
gap_europe <- gapminder %>%
  filter(continent == "Europe") %>%
  select(lifeExp, country, year)
gap_europe
```

## Challenge 1

Make a new data frame called "new_data" that has only the columns country, life expectancy, and year for only the countries in Africa

```r=
new_data <- gapminder %>%
  filter(continent == "Africa") %>%
  select(country, lifeExp, year)
str(new_data)
```

```r=
# a minus sign drops a column
gap_no_lifeExp <- gapminder %>% select(-lifeExp)
str(gap_no_lifeExp)
```

## mutate()

```r=
gap_GDP <- gapminder %>% mutate(GDP = gdpPercap * pop)
str(gap_GDP)
```

## if else

```r=
gap_50 <- gapminder %>%
  mutate(is50 = ifelse(lifeExp >= 50, "over 50", "under 50"))
str(gap_50)
```

```r=
gap_50 %>% filter(is50 == "under 50") %>% count()
gap_50
```

```r=
gap50_num <- gapminder %>%
  mutate(is50 = ifelse(lifeExp >= 50, "over 50", "under 50"))
str(gap50_num)
```

```r=
gap50_num %>% filter(is50 == "under 50") %>% count()
gap_50
gap50_num
```

```r=
# assumption: gap_50cont was built from gap_50 (the notes omit the assignment)
gap_50cont <- gap_50 %>% filter(is50 == "under 50") %>% count()
gap_50cont
```

```r=
# assumption: same idea, counted per continent and sorted
gap_50cont50 <- gap_50 %>% filter(is50 == "under 50") %>% count(continent, sort = TRUE)
gap_50cont50
```

# Challenge 2

Create a new data frame called new_data2 from the gapminder data frame that has an additional column indicating if the population is “over 50 million people” or “less than 50 million people”

answer:

```r=
new_data2 <- gapminder %>%
  mutate(is50million = ifelse(pop >= 50000000, "over 50 million people", "less than 50 million people"))
str(new_data2)
```

## group_by() and summarize()

```r=
GdpPerCap_by_continent <- gapminder %>%
  group_by(continent) %>%
  summarize(meanGDPPerCap = mean(gdpPercap))
GdpPerCap_by_continent
```

## Challenge 3

Calculate the average life expectancy per country.
Which country has the longest life expectancy?
Which country has the shortest life expectancy?

67

# 2022-02-03

## Sign-in

**full name, email**

John Kim, jok015@ucsd.edu
Sora Park, sop006@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Nick Heimann, nheimann@ucsd.edu
Wilborn Peter, pwilborn@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
Broderick Topil, btopil@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Yu Li, yul193@ucsd.edu
morgan cohen, m7cohen@ucsd.edu
Chenhao Nie. 
cnie@ucsd.edu
Yimeng Yang, yiy047@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Milena Zeray, mzeray@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Nikki Qi, haqi@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Sunny Xu, qixu@ucsd.edu
Juliane Alfen, jalfen@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Emily Irion, eirion@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
rawlins, mackenna, mjrawlins@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Hung-Yang (Jason) CHien, hchien@ucsd.edu
Yishu Tang, yit013@ucsd.edu
Kevin Zhou, kezhou@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Chengan Li, chl030@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Tung Cheh (Maxwell) Chien tcchien@ucsd.edu
Ziyuan Zhu, ziz063@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
May Han, xuh042@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
David Reimer, dreimer@ucsd.edu
koo fum kim, kfkim@ucsd.edu
Alayna Bone, abone@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
kwadwo asiedu, kasiedu@ucsd.edu
Xinyi Du, x8du@ucsd.edu
Brenna Wayne bwayne@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Elisabeth Earley, eearley@ucsd.edu
Bingru LI, bil005@ucsd.edu
Zizan Wang, ziw011@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Sam Cohen, szcohen@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Junhui Xu, jux008@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu

# Notes

install and/or load packages:

```r=
library(lubridate)
library(dplyr)
```

These packages are part of the Tidyverse. There is also a tidyverse package that installs several packages useful for data analysis; load it with `library(tidyverse)`.

### review

select()
filter()
count()
mutate()
group_by()
summarize()

```r
lifeExp_by_country <- gapminder %>%
  group_by(country) %>%
  summarize(MeanLifeExp = mean(lifeExp))
lifeExp_by_country
```

```r=
lifeExp_by_country <- gapminder %>%
  group_by(continent, country) %>%
  summarize(MeanLifeExp = mean(lifeExp))
View(lifeExp_by_country)
```

# tibbles

A tibble is a newer data structure in the tidyverse that is a lot like a data frame, but with some formatting differences and some behind-the-scenes changes that are meant to make working with data frames, particularly large ones, simpler.

# lists

Lists are similar to vectors in that they consist of one or more values in a particular order, but unlike a vector, the values in a list do not all need to be the same data type.

```r
length(gapminder) # a data frame is a list of columns, so this returns the number of columns
```

# Tidy data

each column is a variable
each row is an observation

# rename()

```r=
gap_GDPpc <- gapminder %>% rename(GDPpc = gdpPercap) # new name = old name
```

## combining or adding rows to datasets

```r=
# data_frame() is deprecated; data.frame() (or tibble()) does the same job here
new_row <- data.frame(country = "USA", year = 2002, pop = 1000000, continent = "Americas", lifeExp = 100, gdpPercap = 200)
gapminder_taller <- rbind(gapminder, new_row)
str(gapminder_taller)
tail(gapminder_taller)
```

```r=
set.seed(1)
new_column <- rnorm(1704) # one value per row of gapminder
```

```r
gap_wider <- cbind(gapminder, new_column)
```

# separate()

`test <- data.frame(A = c(1,2,3), B = c("1-2", "2-3", "4-5"))`

```r=
heights <- c(2, 4, NA, 6, 8, NA, 12)
mean(heights) # returns NA because of the missing values
max(heights)
min(heights)
heights <- c(2, 4, NA, 6, 8, NA, 12)
mean(heights, na.rm = TRUE) # na.rm = TRUE ignores the missing values
max(heights, na.rm = TRUE)
heights
heights2 <- na.omit(heights) # returns a vector with the missing values removed
heights2
heights2 <- heights[complete.cases(heights)] # returns a vector with complete cases
```

# identify NA

```r=
is.na(heights)
weights <- seq(1, 7, 1)
cbind(heights, weights)
weights[is.na(heights)] # weights whose matching height is missing
```

# Multiple comparisons

```r=
gapminder %>%
  filter(country == "Canada" | country == "United States" | country == "Mexico")
```

```r=
gapminder %>%
  filter(country %in% c("Canada", "Mexico", "United States"))
```

612
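As a small, self-contained illustration of the two multiple-comparison filters in these notes, the sketch below uses base R on a made-up data frame (`toy`, with invented values, stands in for gapminder) to check that chained `==` tests joined with `|` select the same rows as `%in%`. The logical vectors built here are the same kind of expression that `filter()` receives.

```r
# Toy data standing in for gapminder (values are made up for illustration)
toy <- data.frame(
  country = c("Canada", "Mexico", "Japan", "United States", "Kenya"),
  lifeExp = c(80, 75, 82, 78, 60),
  stringsAsFactors = FALSE
)

north_america <- c("Canada", "Mexico", "United States")

# Chained == comparisons joined with | (OR)
keep_or <- toy$country == "Canada" |
  toy$country == "Mexico" |
  toy$country == "United States"

# The same membership test written with %in%
keep_in <- toy$country %in% north_america

identical(keep_or, keep_in) # both give the same logical vector
toy[keep_in, ]              # rows for the three countries

# Common mistake: == against a vector compares element by element
# (recycling the shorter vector) instead of testing membership.
# R also warns here because 5 is not a multiple of 3.
keep_wrong <- suppressWarnings(toy$country == north_america)
sum(keep_wrong) # fewer matches than sum(keep_in)
```

`%in%` tends to be easier to read and extend once the list of values grows past two or three, which is likely why the notes show it as the alternative to the chained `|` version.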
https://hackmd.io/4S3SFxUdQaOJRsBloTZftQ

61

# 2022-02-01

## Sign-in

full name, email

John Kim, jok015@ucsd.edu
Jaeyeon, Park, jap013@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Broderick Topil, btopil@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Elisabeth Earley, eearley@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Sora Park, sop006@ucsd.edu
Bowen Deng b2deng@ucsd.edu
morgan cohen, m7cohen@ucsd.edu
Kevin Zhou, kezhou@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Tomas Lavados, tlavados@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Chengan Li, chl030@ucsd.edu
Milena Zeray, mzeray@ucsd.edu
Maxwell Chien, tcchien@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Sam Cohen, szcohen@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
Yu Li, yul193@ucsd.edu
Yimeng Yang, yiy047@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Bingru LI, bil005@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Nikki Qi, haqi@ucsd.edu
David Reimer, dreimer@ucsd.edu
Xinyi Du, x8du@ucsd.edu
Dahlia Lopez del006@ucsd.edu
Sunny Xu, qixu@ucsd.edu
qihan huang q7huang@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
Junhui Xu, jux008@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Zizan Wang, ziw011@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Kwadwo asiedu, kasiedu@ucsd.edu
Stevinson Tendon, stendon@ucsd.edu
Yishu Tang, yit013@ucsd.edu
Emily Irion, eirion@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Ziyuan Zhu, ziz063@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Yaohong Wang, yaw045@ucsd.edu
Dan Bee Lee, dbl001@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Wilborn, Peter, pwilborn@ucsd.edu
Hung-Yang (Jason) Chien, hchien@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
May Han, xuh042@ucsd.edu
Brenna Wayne bwayne@ucsd.edu

# creating plots with ggplot2

load ggplot package

```r=
library(ggplot2)
```

load data

```r=
gapminder <- read.csv(file = "gapminder.csv")
```

```r=
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point()
```

Challenge 1: Create a scatter plot showing life expectancy on the y-axis and population on the x-axis

Answer:

```r=
ggplot(data = gapminder, aes(x = pop, y = lifeExp)) + geom_point()
```

saving a plot

```r=
ggsave("filename.pdf") # give a file name with an extension; the last plot is saved to your R working directory
```

you can also save your plot using the pdf() function

```r=
pdf("gdpPercap_vs_time.pdf", width=8, height=8)
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point()
dev.off() # always run this after pdf(); it closes the pdf device and writes the file
```

```r=
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + theme_classic()
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + theme_minimal()
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + theme_bw()
```

### line plots

```r=
# by = country draws one line per country (group = country is the documented aesthetic for this)
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country)) + geom_line()
```

```r=
# an aesthetic can only be mapped once, so continent goes to color rather than a second by
P <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, color = continent))
P <- P + geom_line()
P
```

```r=
P <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, color = continent))
P <- P + geom_line()
P <- P + geom_point(color = "black")
P
```

changing layer order: draw the points first, then the lines on top

```r=
P <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, color = continent))
P <- P + geom_point(color = "black")
P <- P + geom_line()
P
```

# manually select colors

```r=
P <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, color = continent))
P <- P + geom_point(color = "black")
P <- P + geom_line(color = "darkolivegreen")
P
```

manually specifying colors for continents

```r=
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, color = continent))
p <- p + geom_line()
p <- p + scale_colour_manual(values = c('red','pink','yellow','black','green')) # one color per continent
p
```

The ggplot2 cheatsheet https://www.rstudio.com/resources/cheatsheets/

change the plot to points

```r=
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, color = continent))
p <- p + geom_point() # geom_line() changed to geom_point() here
p <- p + scale_colour_manual(values = c('red','pink','yellow','black','green'))
p
```

### labels

```r=
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, color = continent))
p <- p + geom_line()
p <- p + labs(title = "Life Exp over time", x = "year", y = "Life Expectancy") # label text added here
p
```

### modifying x & y scales

```r
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, color = continent))
p <- p + geom_line()
p <- p + labs(title = "Life Exp over time", x = "year", y = "Life Expectancy")
# limits = axis range, breaks = tick mark positions, labels = text shown at the tick marks
p <- p + scale_y_continuous(limits = c(0,100), breaks = seq(0, 100, 10), labels = seq(0, 100, 10))
p <- p + scale_x_continuous(limits = c(1952, 2007), breaks = seq(1952, 2007, 10), labels = seq(1952, 2007, 10))
p
```

## Challenge 2

Create a scatter plot with year on the X-axis and population on the y-axis.
Create an appropriate title for the plot
Change the x and y axis titles to something more readable
Change the x and y axis values to something more useful

answer:

```r
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_point()
p <- p + labs(title = "Population over time", x = "Year", y = "Population")
p <- p + scale_y_continuous(limits = c(0, 2000000000),
                            breaks = seq(0, 2000000000, 1000000000),
                            labels = seq(0, 2000000000, 1000000000))
p <- p + scale_x_continuous(limits = c(1952, 2017),
                            breaks = seq(1952, 2017, 15),
                            labels = seq(1952, 2017, 15))
p
```

59

# subsetting data

59

# 2022-02-10 week 3 R

## Sign-in

John Kim, jok015@ucsd.edu
Yuki Imura, yimura@ucsd.edu
Sora Park, sop006@ucsd.edu
Broderick Topil, btopil@ucsd.edu
Yizhuo Liu, yil118@ucsd.edu
Jared Hernandez, jah014@ucsd.edu
Emily Davalos, edavalos@ucsd.edu
Sunny Xu, qixu@ucsd.edu
Vorathip Plengpanit, vplengpa@ucsd.edu
Wenjun Gong, w7gong@ucsd.edu
Tino Tirado, ttirado@ucsd.edu
Stevinson Tendon, stendon@ucsd.edu
Chenhao Nie. cnie@ucsd.edu
Salma Shaikh, sshaikh@ucsd.edu
Hyun Ji Jung, hjjung@ucsd.edu
Poonam Narewatt, pnarewat@ucsd.edu
Tomas Lavados, tlavados@UCSD.edu
David Reimer, dreimer@ucsd.edu
Wilborn, Peter, pwilborn@ucsd.edu
Nikki Qi, haqi@ucsd.edu
Qihan Huang, q7huang@ucsd.edu
Valentina Chanci Arrubla, vchanciarrubla@ucsd.edu
Emily Irion, eirion@ucsd.edu
Bowen Deng b2deng@ucsd.edu
Chuyu Liu, chl082@ucsd.edu
Meiyu Su, m2su@ucsd.edu
Bing Rethy, brethy@ucsd.edu
Rebecca Howard, r1howard@ucsd.edu
Milena Zeray, mzeray@ucsd.edu
Yishu Tang, yit013@ucsd.edu
maxwell chien, tcchien@ucsd.edu
Meghan Mattioli, mazavala@ucsd.edu
Yu Li, yul193@ucsd.edu
Amanda Lee-Low, aleelow@ucsd.edu
Chengan Li, chl030@ucsd.edu
morgan cohen, m7cohen@ucsd.edu
Jeffrey Myers, jmyers@ucsd.edu
Marissa Myers, maclan@ucsd.edu
Zhibei Wang, zhw048@ucsd.edu
Xinyi Du, x8du@ucsd.edu
Yunxin Liu, yul188@ucsd.edu
Bingru LI, bil005@ucsd.edu
Elizabeth Muthoni, emuthoni@ucsd.edu
Junhui Xu, jux008@ucsd.edu
Kejun Chen, kec007@ucsd.edu
Yaohong 
Wang, yaw045@ucsd.edu
Dahlia Lopez, del006@ucsd.edu
Rawlins, Mackenna, mjrawlins@ucsd.edu
Kwadwo Asiedu, kasiedu@ucsd.edu
Jaeyeon Park, jap013@ucsd.edu
Mariya Nikseresht, mniksere@ucsd.edu
Ziyuan Zhu, ziz063@ucsd.edu
Elise Spencer, Enspencer@ucsd.edu
Stella Lin, s3lin@ucsd.edu
Collin Boudreaux, cboudreaux@ucsd.edu
Merik Manzano, mmanzano@ucsd.edu
Patricia Resurreccion, paresurr@ucsd.edu
Zizan Wang, ziw011@ucsd.edu

**assign a subset of the gapminder data to a variable**

```r
dataOceania <- gapminder %>% filter(continent == "Oceania")
str(dataOceania)
```

```r=
p <- ggplot(data = dataOceania, aes(x = year, y = lifeExp, by = country))
p <- p + geom_line()
p
```

```r=
# plot every country in red, then draw the Oceania subset on top in black
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country))
p <- p + geom_line(color = "red")
p <- p + geom_line(data = dataOceania, aes(x = year, y = lifeExp, by = country), color = 'black')
p
```

# Challenge 4

Create a scatter plot with year on the X-axis and population on the y-axis.
Choose a continent to highlight in a different color (you can choose any continent and any colors)

```r=
gapAsia <- gapminder %>% filter(continent == "Asia") # subset to highlight
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_point(color = "green")
p <- p + geom_point(data = gapAsia, aes(x = year, y = pop), color = 'yellow')
p
```

# transparency

```r=
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_point(color = "black", alpha = 0.05) # alpha sets transparency (0 = invisible, 1 = solid)
p <- p + scale_y_log10()
p <- p + labs(y = "log(pop)")
p
```

```r=
# Jitter
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_jitter(color = "black", alpha = 0.05)
p <- p + scale_y_log10()
p <- p + labs(y = "log(pop)")
p
```

```r=
# trend line
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_jitter(color = "black", alpha = 0.05, width = 1, height = 0) # jitter horizontally only
p <- p + scale_y_log10()
p <- p + geom_smooth(method = "lm")
p <- p + labs(y = "log(pop)")
p
```

```r=
# adjust line size
p <- ggplot(data = gapminder, aes(x = year, y = pop))
p <- p + geom_jitter(color = "black", alpha = 0.05, width = 1, height = 0)
p <- p + scale_y_log10()
p <- p + geom_smooth(method = "lm", size = 2) # in ggplot2 3.4.0 and later, use linewidth = 2
p <- p + labs(y = "log(pop)")
p
```

# box plots

```r
gap2007 <- gapminder %>% filter(year == 2007)
# populations per continent
p <- ggplot(data = gap2007, aes(x = continent, y = pop))
p <- p + geom_boxplot()
p <- p + scale_y_log10()
p
```

```r=
gapminder2007 <- gapminder %>% filter(year == 2007)
P <- ggplot(data = gapminder2007, aes(x = continent, y = pop))
P <- P + geom_boxplot()
P <- P + geom_jitter(height = 0, width = 0.1) # overlay the individual countries
P <- P + scale_y_log10()
P <- P + labs(y = "Log(Pop)")
P
```

```r=
meanlifeExpContinent <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(MeanLifeExp = mean(lifeExp), SDLifeExp = sd(lifeExp))
meanlifeExpContinent
```

```r=
p <- ggplot(data = meanlifeExpContinent, aes(x = continent, y = MeanLifeExp))
p <- p + geom_col()
p
```

# multipanel plots

```r=
europe <- gapminder %>% filter(continent == "Europe")
P <- ggplot(data = europe, aes(x = year, y = pop))
P <- P + geom_line()
P <- P + facet_wrap(~ country) # one panel per country
P
```

```r
europe <- gapminder %>% filter(continent == "Europe")
P <- ggplot(data = europe, aes(x = year, y = pop))
P <- P + geom_line()
P <- P + facet_wrap(~ country)
P <- P + theme(axis.text.x = element_text(angle = 45)) # rotate the x-axis labels
P
```

Challenge
Create a plot of life expectancy by year for each country in Africa
Make the x and y axis labels nicer

```r=
africas <- gapminder %>% filter(continent == "Africa")
P <- ggplot(data = africas, aes(x = year, y = lifeExp))
P <- P + geom_line()
P <- P + facet_wrap(~ country)
P <- P + labs(x = "Year", y = "Life Expectancy")
P <- P + theme(axis.text.x = element_text(angle = 45))
P
```
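The multipanel plots in these notes all use `facet_wrap(~ country)` with its defaults. As a possible variation, the sketch below shows two optional `facet_wrap()` arguments, `ncol` and `scales`, plus `hjust` for the rotated axis labels. It runs on a made-up data frame (`toy`, with invented countries and values) so it does not need the gapminder file; none of this was covered in the session itself.

```r
library(ggplot2)

# Toy data standing in for gapminder (countries and values are made up)
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(c(1952, 1972, 1992, 2007), times = 3),
  lifeExp = c(40, 45, 50, 55,  60, 64, 68, 72,  35, 38, 44, 47)
)

P <- ggplot(data = toy, aes(x = year, y = lifeExp))
P <- P + geom_line()
# ncol sets the number of panel columns;
# scales = "free_y" lets each panel use its own y-axis range
P <- P + facet_wrap(~ country, ncol = 3, scales = "free_y")
P <- P + labs(x = "Year", y = "Life Expectancy")
# hjust = 1 right-aligns the rotated labels so they end at their tick marks
P <- P + theme(axis.text.x = element_text(angle = 45, hjust = 1))
P
```

Free y scales make the trend inside each panel easier to see, but the panels can then no longer be compared by height alone, so the shared default scale is often the safer choice when the goal is comparing countries.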