DIBSI Tigers Day 3

# DIBSI Tigers Day 3

- **Day 1-2 hackmd:** https://hackmd.io/KYExEMA4CMAYFYC0BGA7NAzIgLMgxrItHuOIusNhqhpOMrAGZA==?both

***

### Link to data carpentry lesson:
http://www.datacarpentry.org/R-ecology-lesson/01-intro-to-r.html

Comic on the `sudo` command: https://imgs.xkcd.com/comics/sandwich.png

## R and RStudio with Tracy Teal

R can do math! `3+4`

R is so powerful because we can write scripts in the empty 4th panel in RStudio and it will create a `.R` file. That way we can save what we did!

- To make a new script you can click on the green plus sign in the top left corner in RStudio.
- Once you start typing in your R script you can press `command enter` on a line to send it down to the R console.
- **Don't forget to save your files!**
- **To comment in R use `#`** This means that R will not run whatever comes after the pound symbol.
    - Comments are super helpful and will help you be a friend of yourself in the future! They also make friends of others who will use your code.

# Current `tigersrock.R` file:

```
# assign 4 to a
a <- 4

# output a
a

# assign 3 to b
b <- 3

# add a + b
a + b

# Spacing makes your code more readable
c<-7
c

# define pi
pi <- 3.1415
round(pi) # should round to 3!
round(pi, 3)

# Specifically stating digits = 3 will help us as users be sure the round function is doing *exactly* what we'd like! :)
round(pi, digits = 3)

# To get help on the round function do:
?round

############ Challenge question ################
# What are the values after each statement in the following?
mass <- 47.5            # mass?
age  <- 122             # age?
mass <- mass * 2.0      # mass?
age  <- age - 20        # age?
mass_index <- mass/age  # mass_index?

# This will give us an error as we haven't defined what hello means
hello

# It's a *bad* idea to make variable names the same name as R functions! Avoid doing this for your safety and others'.
round <- 75.7
round(round)

# Download the survey data
download.file("https://ndownloader.figshare.com/files/2292169",
              "portal_data_joined.csv")

# read the data to an object named surveys
surveys <- read.csv("portal_data_joined.csv")
surveys
View(surveys)

# show beginning of the data
head(surveys)
?help # Go down to the examples for extra awesome helpful stuff!
head(surveys, n = 2)

# Structure of the data
str(surveys)

### Based on the output of str(surveys), can you answer the following questions?
# What is the class of the object surveys? It's a data.frame!
# How many rows and how many columns are in this object? 34786 rows and 13 columns/variables!
# How many species have been recorded during these surveys? 40 species (due to 40 factor levels in the str() output)

# just get the first row of data
surveys[1, ]

surveys[1, 4] # 4th column value in 1st row

# first 3 rows
surveys[1:3, ]

# Make a new dataframe with 100 rows
surveys_100 <- surveys[1:100, ]

surveys$year

#### INSTALLING PACKAGES!
install.packages("dplyr")
library(dplyr)

select(surveys, species)
select(surveys, species, genus)
select(surveys, genus, species)

filter(surveys, year == 1995)

surveys_cool <- surveys %>%
  filter(year == 1999) %>%
  select(year, species_id, weight)
str(surveys_cool)

surveys_not_cool <- surveys %>%
  # We want to filter every year EXCEPT 1999
  filter(year != 1999) %>%
  select(year, species_id, weight)
str(surveys_not_cool)

surveys %>%
  group_by(sex) %>%
  summarize(mean_weight = mean(weight, na.rm = TRUE))

### Afternoon ####
install.packages("ggplot2")
library(ggplot2) # include the ggplot2 functions

surveys

# How to get the column year?
surveys[, 4]
surveys[, "year"]
select(surveys, year) # dplyr

# How to get the rows of 1999 in base R?
subset(surveys, year == 1999)

# Pipes
surveys %>% head()

surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight) %>%
  head()

# Mutate
surveys %>%
  filter(weight < 5) %>%
  mutate(weight_kg = weight / 1000) %>% # Make a new column named weight in kg
  head()

?is.na

# filter for NA (keep only the rows where the weight column has an NA)
surveys %>%
  filter(is.na(weight)) %>%
  mutate(weight_kg = weight / 1000) %>% # Make a new column named weight in kg
  head()

# using ! inverts the selection. The following removes all rows that have NA (or keeps only the NON NA rows)
surveys %>%
  filter(!is.na(weight)) %>%
  mutate(weight_kg = weight / 1000) %>% # Make a new column named weight in kg
  head()

# Pipes
surveys %>%                            # Select survey data
  filter(weight < 5) %>%               # keep only samples that weigh less than 5
  # Only pick 3 columns
  select(species_id, sex, weight) %>%
  head()                               # Only view the top 6 rows

# Mutate
surveys %>%
  #filter(weight < 5) %>%
  filter(!is.na(weight)) %>%
  mutate(weight_kg = weight / 1000,    # Make a new column named weight in kg
         day_of_analysis = "June_29") %>%
  View()

# Dimensions of our data
dim(surveys)

# Spread command
surveys_gw <- surveys %>%
  filter(!is.na(weight)) %>%
  group_by(genus, plot_id) %>%
  summarize(mean_weight = mean(weight)) # Calculate mean weight

head(surveys_gw)

install.packages("tidyr")
library(tidyr)
?spread

surveys_gw_wide <- surveys_gw %>%
  spread(genus, mean_weight, fill = 0)

head(surveys_gw_wide)

surveys_gw_wide %>%
  cor(use = "pairwise.complete")

# Long formatted Data
surveys_gw_long <- surveys_gw_wide %>%
  gather(genus, mean_weight, -plot_id)

View(surveys_gw_long)

## TIME TO PLOT!
surveys_complete <- surveys %>%
  filter(species_id != "",
         !is.na(weight),
         !is.na(hindfoot_length),
         sex != "")

surveys_plot <- ggplot(data = surveys_complete,
                       aes(x = weight, y = hindfoot_length)) +
  geom_point()

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point(aes(shape = sex), alpha = 0.1, color = "blue")

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point(aes(shape = sex, color = sex), alpha = 0.2, size = 3)

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  # color based on species ID
  geom_point(aes(color = species_id), alpha = 0.1)

ggplot(data = surveys_complete, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot(aes(color = species_id)) +
  theme(axis.text.x = element_text(angle = 30))

yearly_counts <- surveys_complete %>%
  group_by(year, species_id) %>%
  tally()

ggplot(data = yearly_counts,
       aes(x = year, y = n, group = species_id, color = species_id)) +
  geom_line() +
  labs(y = "Species Count", x = "Year") +
  # facet_wrap
  facet_wrap(~ species_id)

## ALL IN ONE GO
surveys_complete %>%
  group_by(year, species_id) %>%
  tally() %>%
  ggplot(aes(x = year, y = n, group = species_id, color = species_id)) +
  geom_line() +
  labs(y = "Species Count", x = "Year") +
  # facet_wrap
  facet_wrap(~ species_id) +
  theme_classic()
```

- **Figshare** is an awesome website to share your data!
    - More information about it here: https://figshare.com/

> **Good Practice**
> When loading data in R, it's always good to first do a `head` and then a `str` to make sure that everything checks out fine.
>
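The good practice above can be wrapped into a small helper. This is only a sketch: `check_csv` is a made-up name (not part of the lesson or of any package), and the tiny CSV it reads is invented so the example runs without downloading the survey data.

```r
# Hypothetical helper: read a CSV, then inspect it right away.
check_csv <- function(path) {
  dat <- read.csv(path)
  print(head(dat))  # first rows: do the values look sensible?
  str(dat)          # column types and dimensions
  invisible(dat)    # return the data without printing it again
}

# Made-up two-row file standing in for a real dataset.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(year = c(1998, 1999), weight = c(40, NA)),
          tmp, row.names = FALSE)

toy <- check_csv(tmp)
```

With the real data, the same call would be `surveys <- check_csv("portal_data_joined.csv")`, assuming the file has already been downloaded to the working directory.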
