INBO CODING CLUB

# INBO CODING CLUB 29 March 2022 Welcome! ## Share your code snippet If you want to share your code snippet, copy paste your snippet within a section of three backticks (```): As an **example**: ``` library(tidyverse) ``` (*you can copy paste this example and add your code further down*) ## Challenge 1 ``` RAISA *datums staan in een ander formaat (dd/mm/yyyy vs mm/dd/yyyy) *bij een van de tabellen staan species en geslacht in 1 kolom *Eenheden staan bij slechts 1 kolom bij de meting (g) *Andere kolomnamen *Kalibratiecorrectheid kan beter in een variabele (boolean) worden opgenomen ``` ``` Amber Issues * Two tables in one sheet, but similar data so can be combined into one * Columns not consistent in the two tables * Date is written differently * species & sex divided vs combined * weight with different column name and units in the name vs in the value box * calibration as a variable ``` ## Challenge 2 ``` RAÏSA * There's one observation with maturity "A" which is not defined * Focal_N: Does the '/' signify that it has not been followed before (0) or is it unknown how often it has been followed before? * Sex: should be 'U' in stead of '/' for undetermined * Sea_State has undefined coding ARNO * Remove remarks * Remove the first line * change / to NA, U as NA ``` ``` Amber Issues * Different types of data in one table, should be split * Either sample ID or date & location as primary key * one table with sample data (sample id, date, location, movie) * one table with environmental data (date, location, cloud cover, sea state, temperature, tide) * one table with divers data (date, location, proximity, divers_NT, divers_N) * one table with manta data (sample ID, ID cat, focalN, sex, maturity, pregnant, injury, males_t, comments) * comments at the columns can go in metadata or description tables * replace / by NA * also the explanation of the values in manta data columns can go in metadata or lookup tables ``` ## Challenge 3 ARNO 3.1 habitat_types <- read.table("20220329_habitat_types.txt", sep=",", header=TRUE) habitat_types2 <- separate(habitat_types, hab_type, sep=";", into=c("hab1", "hab2", "hab3", "hab4") ) habitat_types2 <- separate_rows(habitat_types, hab_type, sep=";") 3.2 habitat_types_2 <- read.table("20220329_habitat_types_2.txt", sep=",", header=TRUE) ht_pivot <- pivot_longer(habitat_types_2, cols=3:9, names_to="hab_type", values_to="Bolean") ht_pivot <- select(ht_pivot[ht_pivot$Bolean,], -Bolean) ``` RAISA ## CHALLENGE 3.1 # read data habitats_raw <- read_csv("./data/20220329/20220329_habitat_types.txt") %>% separate_rows(hab_type, sep = ";") ## CHALLENGE 3.2 # read data habitats2_raw <- read_csv("./data/20220329/20220329_habitat_types_2.txt") %>% pivot_longer(cols = 3:9, names_to = "habitat_type", values_to = "value") %>% filter(value == TRUE) %>% select(-value) ## CHALLENGE 3.3 #' Do you get warnings? #' Sure, the number of columns is not constant: data are NOT TIDY! camtrap_data <- read.table("./data/20220329/20220329_camtrap_data.txt", sep = ",", col_names = paste0("V",seq_len(21))) camtrap_data camtrap_data_meta <- read_csv("./data/20220329/20220329_camtrap_metadata.txt") %>% mutate(record_type = as.factor(record_type)) levels(camtrap_data_meta$record_type) deployment <- camtrap_data %>% filter(V1 == "deployment") %>% select(2:(1 + sum(camtrap_data_meta$record_type == "deployment"))) colnames(deployment) <- unlist( camtrap_data_meta[camtrap_data_meta$record_type == "deployment","column_name"]) multimedia <- camtrap_data %>% filter(V1 == "multimedia") %>% select(2:(1 + sum(camtrap_data_meta$record_type == "multimedia"))) colnames(multimedia) <- unlist( camtrap_data_meta[camtrap_data_meta$record_type == "multimedia","column_name"]) observation <- camtrap_data %>% filter(V1 == "observation") %>% select(2:min((1 + sum(camtrap_data_meta$record_type == "observation")),17)) colnames(observation) <- unlist( camtrap_data_meta[camtrap_data_meta$record_type == "observation","column_name"]) ``` ## Bonus challenge ``` RAISA rings <- read_csv("./data/20220329/20220329_bird_rings_untidy.txt") %>% mutate(id = row_number()) %>% pivot_longer(cols = 2:3, values_to = "inscription_code", names_to = "inscription_number") %>% mutate(inscription_number = ifelse(inscription_number == "first_inscription", 1, 2)) %>% filter(!(inscription_number == 2 & inscription == inscription_code))%>% select(-inscription) rings #can we still get only the current (latest) ring codes? rings %>% group_by(id) %>% arrange(id, inscription_number) %>% summarize(currentcode = last(inscription_code)) %>% ungroup() ```