# DIBSI Tigers Day 3
- **Day 1-2 hackmd:** https://hackmd.io/KYExEMA4CMAYFYC0BGA7NAzIgLMgxrItHuOIusNhqhpOMrAGZA==?both
***
### Link to Data Carpentry lesson: http://www.datacarpentry.org/R-ecology-lesson/01-intro-to-r.html
Comic on the `sudo` command: https://imgs.xkcd.com/comics/sandwich.png
## R and Rstudio with Tracy Teal
R can do math!
`3+4`
R is so powerful partly because we can write scripts in the script panel (the empty 4th panel in RStudio) and save them as a `.R` file. That way we keep a record of what we did!
- To make a new script you can click on the green plus sign in the top left corner in RStudio.
- Once you start typing in your R script, press `command + enter` (`ctrl + enter` on Windows/Linux) on a line to send it down to the R console.
- **Don't forget to save your files!**
- **To comment in R use `#`.** R will not run whatever comes after the pound symbol on that line.
- Comments are super helpful: they make you a friend of your future self, and of anyone else who uses your code.
# Current `tigersrock.R` file:
```
# assign 4 to a
a <- 4
# output a
a
# assign 3 to b
b <- 3
# add a + b
a + b
# Spacing makes your code more readable: the next line works, but `c <- 7` is easier to read
c<-7
c
# define pi
pi <- 3.1415
round(pi) # should round to 3!
round(pi, 3)
# Explicitly stating digits = 3 helps us as users be sure that the round function is doing *exactly* what we'd like! :)
round(pi, digits = 3)
# To get help on the round function do:
?round
############ Challenge question ################
#What are the values after each statement in the following?
mass <- 47.5 # mass?
age <- 122 # age?
mass <- mass * 2.0 # mass?
age <- age - 20 # age?
mass_index <- mass/age # mass_index?
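# --- Worked answers to the challenge (a sketch) ---
# The same statements are re-run on copies so nothing above is changed.
# The *_chk names are hypothetical, introduced only for this check.
mass_chk <- 47.5                      # mass_chk is 47.5
age_chk  <- 122                       # age_chk is 122
mass_chk <- mass_chk * 2.0            # mass_chk is now 95
age_chk  <- age_chk - 20              # age_chk is now 102
mass_index_chk <- mass_chk / age_chk  # 95 / 102, roughly 0.931
mass_index_chk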
# This will give us an error as we haven't defined what hello means
hello
# It's a *bad* idea to make variable names the same name as R functions! Avoid doing this for your safety and others'.
round <- 75.7
round(round)
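# Why round(round) still runs: when a name is used as a function call,
# R skips non-function objects and keeps looking for a function of that name.
# It is still confusing to read, so here is a small sketch of how to undo it:
round <- 75.7
round(round)   # calls the base function on the number 75.7, giving 76
rm(round)      # remove our variable; only the base function remains
round(75.7)    # still works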
# Download the survey data
download.file("https://ndownloader.figshare.com/files/2292169","portal_data_joined.csv")
# read the data to an object named surveys
surveys <- read.csv("portal_data_joined.csv")
surveys
View(surveys)
# show beginning of the data
head(surveys)
?head # Go down to the examples for extra awesome helpful stuff!
head(surveys, n = 2)
# Structure of the data
str(surveys)
### Based on the output of str(surveys), can you answer the following questions?
# What is the class of the object surveys? It's a data.frame!
# How many rows and how many columns are in this object? 34786 rows and 13 columns/variables!
# How many species have been recorded during these surveys?
# 40 species (the species column is a factor with 40 levels)
# Note: in R >= 4.0, read.csv() reads text columns as character by default;
# pass stringsAsFactors = TRUE to see the factor levels in str().
str(surveys)
# just get the first row of data
surveys[1,] # all columns of the 1st row
surveys[1,4] # 4th column value in 1st row
# first 3 rows (the comma matters: surveys[1:3] would return the first 3 columns)
surveys[1:3,]
# Make a new dataframe with the first 100 rows
surveys_100 <- surveys[1:100,]
surveys$year
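# Rows vs. columns: a small sketch on a made-up data frame
# (toy_df and its columns are hypothetical, not part of the survey data).
toy_df <- data.frame(id = 1:5,
                     year = c(1995, 1995, 1999, 1999, 2000),
                     weight = c(40, NA, 48, 7, 22))
toy_df[1, ]       # first row, all columns
toy_df[1, 3]      # 1st row, 3rd column: a single value
toy_df[1:3, ]     # first 3 rows (note the comma)
toy_df[1:3]       # no comma: first 3 COLUMNS, all rows
toy_df[, "year"]  # one column by name, returned as a vector
toy_df$year       # the same column using $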
#### INSTALLING PACKAGES!
install.packages("dplyr")
library(dplyr)
select(surveys, species)
select(surveys, species, genus)
select(surveys, genus, species)
filter(surveys, year == 1995)
surveys_cool <-
surveys %>%
filter(year == 1999) %>%
select(year, species_id, weight)
str(surveys_cool)
surveys_not_cool <-
surveys %>%
# We want to filter every year EXCEPT 1999
filter(year != 1999) %>%
select(year, species_id, weight)
str(surveys_not_cool)
surveys %>%
group_by(sex) %>%
summarize(mean_weight = mean(weight, na.rm = TRUE))
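# Why na.rm = TRUE matters: a minimal sketch on made-up data
# (toy_weights and toy_means are hypothetical; assumes dplyr is installed).
library(dplyr)
toy_weights <- data.frame(sex = c("F", "F", "M", "M"),
                          weight = c(10, 20, 30, NA))
mean(toy_weights$weight)                # NA, because one value is missing
mean(toy_weights$weight, na.rm = TRUE)  # 20, the mean of 10, 20 and 30
toy_means <- toy_weights %>%
  group_by(sex) %>%
  summarize(mean_weight = mean(weight, na.rm = TRUE))
toy_means                               # F: 15, M: 30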
###Afternoon ####
install.packages("ggplot2")
library(ggplot2) #include the ggplot2 functions
surveys
#How to get the column year?
surveys[,4]
surveys[,"year"]
select(surveys, year) #dplyr
# How to get the rows of 1999 in base R?
subset(surveys, year == 1999)
# Pipes
surveys %>%
head()
surveys %>%
filter(weight < 5) %>%
select(species_id, sex, weight) %>%
head()
#Mutate
surveys %>%
filter(weight < 5) %>%
mutate(weight_kg = weight / 1000) %>% # Make a new column named weight_kg (weight in kg)
head()
?is.na
# filter for NA (keep only the rows where the weight column has an NA)
surveys %>%
filter(is.na(weight)) %>%
mutate(weight_kg = weight / 1000) %>% # Make a new column named weight_kg (weight in kg)
head()
# using ! inverts the selection. The following removes all that have NA (or keeps only the NON NA rows)
surveys %>%
filter(!is.na(weight)) %>%
mutate(weight_kg = weight / 1000) %>% # Make a new column named weight_kg (weight in kg)
head()
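# A small sketch of what is.na() and !is.na() return
# (toy_w is a hypothetical vector, not part of the survey data).
toy_w <- c(40, NA, 48)
is.na(toy_w)          # FALSE TRUE FALSE
!is.na(toy_w)         # TRUE FALSE TRUE
toy_w[!is.na(toy_w)]  # 40 48
sum(is.na(toy_w))     # number of missing values: 1
toy_w < 45            # TRUE NA FALSE: comparing with NA gives NA,
                      # and dplyr's filter() drops rows where the test is NA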
# Pipes
surveys %>% # Select survey data
filter(weight < 5) %>% # keep only the samples that weigh less than 5
# Only pick 3 columns
select(species_id, sex, weight) %>%
head() # Only view the top 6 rows
# Mutate
surveys %>%
#filter(weight < 5) %>%
filter(!is.na(weight)) %>%
mutate(weight_kg = weight / 1000, # Make a new column named weight_kg (weight in kg)
day_of_analysis = "June_29") %>%
View()
# Dimensions of our data
dim(surveys)
# Spread command
surveys_gw <-
surveys %>%
filter(!is.na(weight)) %>%
group_by(genus, plot_id) %>%
summarize(mean_weight = mean(weight)) # Calculate mean weight
head(surveys_gw)
install.packages("tidyr")
library(tidyr)
?spread
surveys_gw_wide <-
surveys_gw %>%
spread(genus, mean_weight, fill = 0)
head(surveys_gw_wide)
surveys_gw_wide %>%
cor(use = "pairwise.complete")
# Long formatted Data
surveys_gw_long <-
surveys_gw_wide %>%
gather(genus, mean_weight, -plot_id)
View(surveys_gw_long)
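# spread() and gather() still work, but tidyr 1.0.0 and later recommend
# pivot_wider() and pivot_longer() instead. A hedged sketch on made-up data
# (toy_long, toy_wide and toy_back are hypothetical names; the scalar
# values_fill = 0 assumes tidyr >= 1.1).
library(tidyr)
toy_long <- data.frame(plot_id = c(1, 1, 2),
                       genus = c("Dipodomys", "Neotoma", "Dipodomys"),
                       mean_weight = c(50, 160, 45))
toy_wide <- pivot_wider(toy_long,
                        names_from = genus,
                        values_from = mean_weight,
                        values_fill = 0)  # like spread(genus, mean_weight, fill = 0)
toy_wide
toy_back <- pivot_longer(toy_wide,
                         cols = -plot_id,
                         names_to = "genus",
                         values_to = "mean_weight")  # like gather(genus, mean_weight, -plot_id)
toy_back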
## TIME TO PLOT!
surveys_complete <-
surveys %>%
filter(species_id != "",
!is.na(weight),
!is.na(hindfoot_length),
sex != "")
surveys_plot <-
ggplot(data = surveys_complete,
aes(x = weight, y = hindfoot_length)) +
geom_point()
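# Assigning a plot to a name (as with surveys_plot above) stores it without
# drawing it. A sketch of reusing a stored plot, on made-up data
# (toy_pts and toy_plot are hypothetical names):
library(ggplot2)
toy_pts <- data.frame(weight = c(10, 20, 30),
                      hindfoot_length = c(15, 22, 30))
toy_plot <- ggplot(data = toy_pts,
                   aes(x = weight, y = hindfoot_length))
toy_plot + geom_point()                          # draw the stored base with points
toy_plot + geom_point(color = "blue", size = 3)  # same base, different styling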
ggplot(data = surveys_complete,
aes(x = weight, y = hindfoot_length)) +
geom_point(aes(shape = sex), alpha = 0.1, color = "blue")
ggplot(data = surveys_complete,
aes(x = weight, y = hindfoot_length)) +
geom_point(aes(shape = sex, color = sex), alpha = 0.2, size = 3)
ggplot(data = surveys_complete,
aes(x = weight, y = hindfoot_length)) +
# color based on species ID
geom_point(aes(color = species_id), alpha = 0.1)
ggplot(data = surveys_complete,
aes(x = species_id, y = hindfoot_length)) +
geom_boxplot(aes(color = species_id)) +
theme(axis.text.x = element_text(angle = 30))
yearly_counts <-
surveys_complete %>%
group_by(year, species_id) %>%
tally()
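# What tally() returns: a sketch on made-up data
# (toy_obs and toy_counts are hypothetical names).
library(dplyr)
toy_obs <- data.frame(year = c(1999, 1999, 1999, 2000),
                      species_id = c("DM", "DM", "NL", "DM"))
toy_counts <- toy_obs %>%
  group_by(year, species_id) %>%
  tally()                         # one row per group, with the row count in column n
toy_counts
count(toy_obs, year, species_id)  # shortcut for group_by() followed by tally()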
ggplot(data = yearly_counts,
aes(x = year, y = n, group = species_id, color = species_id)) +
geom_line() +
labs(y = "Species Count", x = "Year") +
# facet_wrap
facet_wrap(~ species_id)
## ALL IN ONE GO
surveys_complete %>%
group_by(year, species_id) %>%
tally() %>%
ggplot(aes(x = year, y = n, group = species_id, color = species_id)) +
geom_line() +
labs(y = "Species Count", x = "Year") +
# facet_wrap
facet_wrap(~ species_id) +
theme_classic()
```
- **Figshare** is an awesome website to share your data!
- More information about it here: https://figshare.com/
> **Good Practice:** When loading data in R, it's always good to first run `head()` and then `str()` on it to make sure that everything loaded as expected.