owned this note
owned this note
Published
Linked with GitHub
# INBO CODING CLUB
29 March 2022
Welcome!
## Share your code snippet
If you want to share your code snippet, copy paste your snippet within a section of three backticks (```):
As an **example**:
```
library(tidyverse)
```
(*you can copy paste this example and add your code further down*)
## Challenge 1
```
RAISA
*datums staan in een ander formaat (dd/mm/yyyy vs mm/dd/yyyy)
*bij een van de tabellen staan species en geslacht in 1 kolom
*Eenheden staan bij slechts 1 kolom bij de meting (g)
*Andere kolomnamen
*Kalibratiecorrectheid kan beter in een variabele (boolean) worden opgenomen
```
```
Amber
Issues
* Two tables in one sheet, but similar data so can be combined into one
* Columns not consistent in the two tables
* Date is written differently
* species & sex divided vs combined
* weight with different column name and units in the name vs in the value box
* calibration as a variable
```
## Challenge 2
```
RAÏSA
* There's one observation with maturity "A" which is not defined
* Focal_N: Does the '/' signify that it has not been followed before (0) or is it unknown how often it has been followed before?
* Sex: should be 'U' in stead of '/' for undetermined
* Sea_State has undefined coding
ARNO
* Remove remarks
* Remove the first line
* change / to NA, U as NA
```
```
Amber
Issues
* Different types of data in one table, should be split
* Either sample ID or date & location as primary key
* one table with sample data (sample id, date, location, movie)
* one table with environmental data (date, location, cloud cover, sea state, temperature, tide)
* one table with divers data (date, location, proximity, divers_NT, divers_N)
* one table with manta data (sample ID, ID cat, focalN, sex, maturity, pregnant, injury, males_t, comments)
* comments at the columns can go in metadata or description tables
* replace / by NA
* also the explanation of the values in manta data columns can go in metadata or lookup tables
```
## Challenge 3
ARNO
3.1
habitat_types <- read.table("20220329_habitat_types.txt", sep=",", header=TRUE)
habitat_types2 <- separate(habitat_types, hab_type, sep=";", into=c("hab1", "hab2", "hab3", "hab4") )
habitat_types2 <- separate_rows(habitat_types, hab_type, sep=";")
3.2
habitat_types_2 <- read.table("20220329_habitat_types_2.txt", sep=",", header=TRUE)
ht_pivot <- pivot_longer(habitat_types_2, cols=3:9, names_to="hab_type", values_to="Bolean")
ht_pivot <- select(ht_pivot[ht_pivot$Bolean,], -Bolean)
```
RAISA
## CHALLENGE 3.1
# read data
habitats_raw <- read_csv("./data/20220329/20220329_habitat_types.txt") %>%
separate_rows(hab_type, sep = ";")
## CHALLENGE 3.2
# read data
habitats2_raw <- read_csv("./data/20220329/20220329_habitat_types_2.txt") %>%
pivot_longer(cols = 3:9, names_to = "habitat_type",
values_to = "value") %>%
filter(value == TRUE) %>%
select(-value)
## CHALLENGE 3.3
#' Do you get warnings?
#' Sure, the number of columns is not constant: data are NOT TIDY!
camtrap_data <- read.table("./data/20220329/20220329_camtrap_data.txt", sep = ",",
col_names = paste0("V",seq_len(21)))
camtrap_data
camtrap_data_meta <- read_csv("./data/20220329/20220329_camtrap_metadata.txt") %>%
mutate(record_type = as.factor(record_type))
levels(camtrap_data_meta$record_type)
deployment <- camtrap_data %>% filter(V1 == "deployment") %>%
select(2:(1 + sum(camtrap_data_meta$record_type == "deployment")))
colnames(deployment) <- unlist( camtrap_data_meta[camtrap_data_meta$record_type == "deployment","column_name"])
multimedia <- camtrap_data %>% filter(V1 == "multimedia") %>%
select(2:(1 + sum(camtrap_data_meta$record_type == "multimedia")))
colnames(multimedia) <- unlist( camtrap_data_meta[camtrap_data_meta$record_type == "multimedia","column_name"])
observation <- camtrap_data %>% filter(V1 == "observation") %>%
select(2:min((1 + sum(camtrap_data_meta$record_type == "observation")),17))
colnames(observation) <- unlist( camtrap_data_meta[camtrap_data_meta$record_type == "observation","column_name"])
```
## Bonus challenge
```
RAISA
rings <- read_csv("./data/20220329/20220329_bird_rings_untidy.txt") %>%
mutate(id = row_number()) %>%
pivot_longer(cols = 2:3, values_to = "inscription_code",
names_to = "inscription_number") %>%
mutate(inscription_number = ifelse(inscription_number == "first_inscription",
1,
2)) %>%
filter(!(inscription_number == 2 & inscription == inscription_code))%>%
select(-inscription)
rings
#can we still get only the current (latest) ring codes?
rings %>% group_by(id) %>% arrange(id, inscription_number) %>%
summarize(currentcode = last(inscription_code)) %>% ungroup()
```