---
tags: bym4102
---
This document is available [online](https://hackmd.io/@_XewoO5-TXqsfl7SIIJhLQ/r1isbm3H_/edit)
# 1 Joins
Please visit [Data Transformation Cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf) at [RStudio cheatsheet site](https://www.rstudio.com/resources/cheatsheets/) page
## 1.1 mutating joins
let's generate some toy data
```r
library(tidyverse)
band <- tribble(
~name, ~band,
"Mick", "Stones",
"John", "Beatles",
"Paul", "Beatles"
)
# same as
# tibble(name=c("Mick","John","Paul"), band=c("Stones","Beatles","Beatles"))
# OR
# data_frame(name=c("Mick","John","Paul"), band=c("Stones","Beatles","Beatles"))
# OR
# data.frame(name=c("Mick","John","Paul"), band=c("Stones","Beatles","Beatles"))
# please check out datapasta and its datapasta::tribble_paste() function
instrument <- tribble(
~name, ~plays,
"John", "guitar",
"Paul", "bass",
"Keith", "guitar"
)
instrument2 <- tribble(
~artist, ~plays,
"John", "guitar",
"Paul", "bass",
"Keith", "guitar"
)
```
Let's add a column for vocalist
```r
band %>%
left_join(instrument) %>%
mutate(vocalist=if_else(is.na(plays), TRUE, FALSE))
```
### 1.1.1 Exercise
Please join the following table with `iris` data frame
```r
tibble::tribble(
~Species, ~State,
"virginica", "Virginia",
"versicolor", "California"
)
```
### 1.1.2 student exercises
#### ENY
```r
new <- tribble(
~CellLine, ~Race,
"bladder T24", "African",
"LeukemiaAML HL-60", "Asian",
"Osteo U-2 OS", "African",
"BreastCancer MDA-MB-231", "Asian"
)
read_csv("https://static-content.springer.com/esm/art%3A10.1186%2Fgb-2013-14-10-r115/MediaObjects/13059_2013_3156_MOESM19_ESM.csv") %>%
select(CellLine, Tissue, AgeAcceleration) %>%
left_join(new)
```
#### ENA
* Animal
```r
animal_list <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/MASS/Animals.csv") %>%
rename(animal=X1)
color <- tibble::tribble(
~animal, ~Color,
"Mountain beaver", "white",
"Cow", "black",
"Grey wolf", "grey"
)
left_join(animal_list, color)
```
* Electricity-water-gas consumption :+1:
:warning: 'join' can only insert one column to another. If you wanna insert more than one column u should use join two times:
```r
electricity <- read_csv("https://query.data.world/s/f4o6wvgdmw4sbudsvzra3dxvctwvrk")
water <- read_csv("https://query.data.world/s/oatsgkasj6pboou26gtyi5mp5uospk")
gas <- read_csv("https://query.data.world/s/iddoto7ptw4nhhrswtvo6vlxl4crna")
electricity %>%
inner_join(water, by="Date" ) %>%
inner_join(gas, by="Date", suffix=c(".water",".gas"))
```
#### FAA
```r
library(tidyverse)
gapp<-read_csv("https://pkgstore.datahub.io/core/gdp/gdp_csv/data/0048bc8f6228d0393d41cac4b663b90f/gdp_csv.csv")
happy <- data_frame("Country Name"=c("European Union","European Union","European Union","European Union","European Union"),Year=c(2012,2013,2014,2015,2016),Happiness=c("Yes","Yes","Yes","Yes","Yes"))
happy %>%
inner_join(gapp,.)
```
#### FK
```r
example <- tibble::tribble(
~state, ~fullname,
"AR", "Arkansas",
"CA", "California",
"AL", "Alabama",
"AZ", "Arizona")
data2 <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/CigarettesSW.csv")
data2 %>%
select(state,year,packs) %>%
right_join(example)
```
#### FZT
```r
new_data <- data.frame(menopause = c("ge40", "premeno", "post40"), oncogene = c("TP53", "BRC", "ERBB2"))
read_csv("https://pkgstore.datahub.io/machine-learning/breast-cancer/breast-cancer_csv/data/cfade1d9730bf286d3a83abf7d48829d/breast-cancer_csv.csv") %>%
select(menopause, `tumor-size`, `deg-malig`) %>%
left_join(new_data , .)
```
#### MC
Checking if the animal exists in the zoo
```r
library(tidyverse)
zoo_list <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/MASS/Animals.csv") %>%
rename(animal=X1)
varmidir <- tibble::tribble(
~animal, ~existence,
"Goat", "yes",
"Kangaroo", "yes",
"Rabbit", "no"
)
inner_join(zoo_list, varmidir)
```
#### MG
```r
a <- tribble(
~LOCATION, ~CONTINENT,
"AUS", "Europe",
"BEL", "Europe",
"CAN", "N. America",
"JPN", "Asia",
"MEX", "S. America"
)
read_csv("https://pkgstore.datahub.io/core/pharmaceutical-drug-spending/data_csv/data/db46fb3c420e7100e1d2b1f973e2cbcd/data_csv.csv") %>%
right_join(a)
```
#### ZA
## 1.2 filtering joins
## 1.3 the case about NA
# 2 Pivoting data / wide vs longer
## 2.1 pivot_longer
formerly known as `gather`
## 2.2 pivot_wider
formerly known as `spead`