# INBO CODING CLUB
17 December 2024
Welcome!
## Share your code snippet
If you want to share your code snippet, copy paste your snippet within a section of three backticks (```):
As an **example**:
```
library(tidyverse)
```
(*you can copy paste this example and add your code further down*)
## Yellow sticky notes
No yellow sticky notes online. Put your name + " | " and add a "*" each time you solve a challenge (see below).
## Participants
Name | Challenges
--- | ---
Damiano Oldoni | ***
Emma Cartuyvels|
Heleen Deroo |
Adriaan Seynaeve |
Berber Meulepas |
Falk Mielke | *
Lissa Breugelmans |
Pieter Huybrechts | **
Bert Van Hecke |**
Stijn Cooleman|
Robin Daelemans |
Sanne Govaert | **
## Off-Topic
### reveal.js Presentations
Presentations in RMarkdown via `reveal.js`:
- <https://bookdown.org/yihui/rmarkdown/revealjs.html>
- <https://github.com/rstudio/revealjs>
reveal.js is an HTML presentation framework; RStudio/RMarkdown (like many other programming environments) has convenience functions to export to a reveal.js presentation.
Because your presentation will essentially be *HTML*, it is versatile with media content and animation.
For example, you can embed YouTube videos or interactive maps.
Info on reveal.js (with example): <https://revealjs.com>
There is an impressive demo included here: <https://quarto.org/docs/presentations/revealjs>
Note: the `INBOmd` package provides house-style presentations [(see here)](https://inbo.github.io/inbomd_examples), but they are exported to PDF and do not work with reveal.js.
### "stitching": converting a `.R` script to `.Rmd`
```r
library("knitr")
knitr::stitch("src/20241217/20241217_challenges.R")
```
See <https://yihui.org/knitr/demo/stitch/>.
Note: the other way around is called `purl`:
<https://bookdown.org/yihui/rmarkdown-cookbook/purl.html>
This enables a good workflow for producing well-documented R scripts: write an Rmd and `purl` it to a `.R` file.
With `documentation = 2`, all the supporting text becomes comments in the script.
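A minimal sketch of the `purl` workflow, assuming only that `knitr` is installed. The tiny Rmd file, its chunk label `add` and the temporary file names are made up for illustration. With `documentation = 2`, `knitr::purl()` keeps the prose as roxygen-style comments (`#'`) next to the code:

```r
library(knitr)

# Build the chunk fence programmatically to keep this example easy to paste
fence <- strrep("`", 3)

# Write a tiny, hypothetical Rmd file to a temporary location
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "# Demo",
  "",
  "This sentence explains the chunk below.",
  "",
  paste0(fence, "{r add}"),
  "x <- 1 + 1",
  fence
), rmd_file)

# documentation = 2 keeps the prose as roxygen-style comments (#')
r_file <- knitr::purl(
  input = rmd_file,
  output = tempfile(fileext = ".R"),
  documentation = 2,
  quiet = TRUE
)

# Inspect the generated script
script_lines <- readLines(r_file)
cat(script_lines, sep = "\n")
```

For a real document, replace `rmd_file` with the path to your own `.Rmd` and give `output` a permanent `.R` path.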
### References
````markdown
---
title: "References in RMarkdown Files"
output: html_document
bibliography: referencelist.json
---
# How to enter references?
- https://tutorials.inbo.be/tutorials/r_citations_markdown
- https://bookdown.org/yihui/rmarkdown-cookbook/bibliography.html
- https://docs.citationstyles.org/en/stable/primer.html
Examples can be found on the INBO tutorials [@vancalster2021]!
# How to get a CSL reference list?
You can export one from Zotero ("export item" >>> "CSL JSON").
Example:
```json
[
{
"id": "vancalster2021",
"type": "article-newspaper",
"container-title": "INBO Tutorials",
"title": "Citations in R Markdown",
"URL": "https://tutorials.inbo.be/tutorials/r_citations_markdown",
"author": [
{
"family": "Van Calster",
"given": "Hans"
},
{
"family": "Vanderhaeghe",
"given": "Floris"
}
],
"accessed": {
"date-parts": [
[
"2024",
12,
17
]
]
},
"issued": {
"date-parts": [
[
"2021",
8,
7
]
]
}
}
]
```
# References
You can explicitly place the references in your document:
<div id="refs"></div>
````
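To check which citation keys a CSL JSON file offers, you can read it from R. This is a rough sketch that assumes the `jsonlite` package is installed. It uses an abbreviated inline copy of the entry above rather than the real `referencelist.json`, and the variable names are illustrative only:

```r
library(jsonlite)

# Abbreviated copy of the CSL JSON entry shown above
csl_json <- '[
  {
    "id": "vancalster2021",
    "type": "article-newspaper",
    "title": "Citations in R Markdown",
    "URL": "https://tutorials.inbo.be/tutorials/r_citations_markdown"
  }
]'

# Parse without simplification so every reference stays a named list
refs <- jsonlite::fromJSON(csl_json, simplifyVector = FALSE)

# Collect the citation keys and format them as pandoc citations
ids <- vapply(refs, function(ref) ref$id, character(1))
citations <- paste0("[@", ids, "]")
print(citations)
```

With a file on disk, pass its path instead of the string, e.g. `jsonlite::fromJSON("referencelist.json", simplifyVector = FALSE)`.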
## Challenge 1
### Damiano's solution (example)
Copy paste this section to show your solutions.
```r
# dummy code
print("This is how to insert code.")
```
Or for including a whole markdown document:
````markdown
<...>
````
### Pieter's solution
````markdown
---
title: "20241217_anemone_analysis"
author: "PieterH"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Read and preprocess geese data
```{r load-libraries, message=FALSE, warning=FALSE}
# Load libraries:
library(tidyverse) # to do datascience
library(INBOtheme) # to apply INBO style to graphs
library(sf) # to work with geospatial vector data
library(mapview) # to make dynamic leaflet maps
```
## Introduction
In this document we will:
1. read occurrence cube data
2. explore data
3. preprocess data
## Read data
Read _Anemone_ data from the occurrence cube file `20241217_occurrence_cube_anemone.tsv`:
```{r read data}
anemone_cube <- readr::read_tsv(
file = "./data/20241217/20241217_occurrence_cube_anemone.tsv",
na = ""
)
```
```{r read grid}
# Read the Belgian grid from the geopackage file `20241217_utm1_be.gpkg`:
be_grid <- sf::st_read("./data/20241217/20241217_utm1_be.gpkg")
```
## Explore data
This dataset contains data from `r min(anemone_cube$year)` to `r max(anemone_cube$year)` related to `r length(unique(anemone_cube$specieskey))` species and their distribution in Belgium based on a grid of 1 km x 1 km.
Preview with the first 30 rows of the dataset:
```{r}
head(anemone_cube, n = 30)
```
## Taxonomic information
Species present in the dataset:
```{r}
anemone_cube %>% distinct(specieskey, species)
```
## Temporal information
The data are temporally defined at year level. Years present:
```{r}
anemone_cube %>% dplyr::distinct(year)
```
## Geographical information
The geographical information is represented by the `eeacellcode` column, which contains the identifiers of the grid cells containing at least one occurrence of the species.
The dataset contains `r length(unique(anemone_cube$eeacellcode))` unique grid cells.
## Preprocess data
Add geometrical information to the occurrence cube via `eeacellcode`, which contains the identifiers of the grid cells containing at least one occurrence of the species.
```{r add geom information}
cells_in_cube <- be_grid %>%
dplyr::filter(CELLCODE %in% unique(anemone_cube$eeacellcode)) %>%
dplyr::select(-c(EOFORIGIN, NOFORIGIN))
sf_anemone_cube <- cells_in_cube %>%
dplyr::left_join(anemone_cube, by = c("CELLCODE" = "eeacellcode")) %>%
dplyr::rename("eeacellcode" = "CELLCODE")
```
## Final (spatial) dataset:
```{r}
sf_anemone_cube %>% head(n = 30)
```
````
### Falk
This might be a useful addition concerning file paths:
````markdown
```{r setup}
knitr::opts_knit$set(root.dir = here::here())
```
````
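Some background on why this helps: by default `knitr` evaluates chunks in the folder of the Rmd file, so relative paths such as `./data/...` are resolved from there. Setting `root.dir` to `here::here()` resolves them from the project root instead. A small sketch of how `here::here()` builds such a path, assuming the `here` package is installed (the variable name `cube_path` is illustrative):

```r
# here::here() locates the project root (e.g. via an .Rproj or .git marker)
# and builds paths relative to it, independent of the working directory
cube_path <- here::here(
  "data", "20241217", "20241217_occurrence_cube_anemone.tsv"
)

# File name and parent folder are as given; the prefix is the project root
print(basename(cube_path))
print(basename(dirname(cube_path)))
```

The resulting absolute path can be passed directly to `readr::read_tsv()`, whatever the working directory is.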
Adding comments to rmarkdown ("hidden text", Emma's question):
````markdown
<!-- This is the only working way to add hidden text! -->
% this will not work, i.e. the text will show up, even in LaTeX export!
````
### Sanne's solution
````markdown
---
title: "Read and preprocess geese data"
author: "Sanne Govaert"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
knitr::opts_knit$set(root.dir = here::here())
```
Load libraries:
```{r, message = FALSE}
library(tidyverse) # to do datascience
library(INBOtheme) # to apply INBO style to graphs
library(sf) # to work with geospatial vector data
library(mapview) # to make dynamic leaflet maps
```
# Introduction
In this document we will:
1. read occurrence cube data
2. explore data
3. preprocess data
# Read data
Read _Anemone_ data from the occurrence cube file `20241217_occurrence_cube_anemone.tsv`:
```{r}
anemone_cube <- readr::read_tsv(
file = "./data/20241217/20241217_occurrence_cube_anemone.tsv",
na = ""
)
```
Read the Belgian grid from the geopackage file `20241217_utm1_be.gpkg`:
```{r}
be_grid <- sf::st_read("./data/20241217/20241217_utm1_be.gpkg")
```
# Explore data
This dataset contains data from `r min(anemone_cube$year)` to `r max(anemone_cube$year)` related to `r length(unique(anemone_cube$specieskey))` species and their distribution in Belgium based on a grid of 1 km x 1 km.
## Preview with the first 30 rows of the dataset:
```{r}
head(anemone_cube, n = 30)
```
## Taxonomic information
Species present in the dataset:
```{r}
anemone_cube %>% distinct(specieskey, species)
```
## Temporal information
The data are temporally defined at year level. Years present:
```{r}
anemone_cube %>% dplyr::distinct(year)
```
## Geographical information
The geographical information is represented by the `eeacellcode` column, which contains the identifiers of the grid cells containing at least one occurrence of the species.
The dataset contains `r length(unique(anemone_cube$eeacellcode))` unique grid cells.
# Preprocess data
Add geometrical information to the occurrence cube via `eeacellcode`, which contains the identifiers of the grid cells containing at least one occurrence of the species.
```{r}
cells_in_cube <- be_grid %>%
dplyr::filter(CELLCODE %in% unique(anemone_cube$eeacellcode)) %>%
dplyr::select(-c(EOFORIGIN, NOFORIGIN))
sf_anemone_cube <- cells_in_cube %>%
dplyr::left_join(anemone_cube, by = c("CELLCODE" = "eeacellcode")) %>%
dplyr::rename("eeacellcode" = "CELLCODE")
```
# Final (spatial) dataset:
```{r}
sf_anemone_cube %>% head(n = 30)
```
````
## Challenge 2
### Pieter's solution
````markdown
---
title: "20241217_anemone_analysis"
author: "PieterH"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
    toc: true
    toc_float: true
    number_sections: true
    theme: darkly
    highlight: zenburn
    code_folding: show
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Read and preprocess geese data
```{r load-libraries, message=FALSE, warning=FALSE}
# Load libraries:
library(tidyverse) # to do datascience
library(INBOtheme) # to apply INBO style to graphs
library(sf) # to work with geospatial vector data
library(mapview) # to make dynamic leaflet maps
```
## Introduction
In this document we will:
1. read occurrence cube data
2. explore data
3. preprocess data
## Read data
Read _Anemone_ data from the occurrence cube file `20241217_occurrence_cube_anemone.tsv`:
```{r read data}
anemone_cube <- readr::read_tsv(
file = "./data/20241217/20241217_occurrence_cube_anemone.tsv",
na = ""
)
```
```{r read grid}
# Read the Belgian grid from the geopackage file `20241217_utm1_be.gpkg`:
be_grid <- sf::st_read("./data/20241217/20241217_utm1_be.gpkg")
```
## Explore data
This dataset contains data from `r min(anemone_cube$year)` to `r max(anemone_cube$year)` related to `r length(unique(anemone_cube$specieskey))` species and their distribution in Belgium based on a grid of 1 km x 1 km.
Preview with the first 30 rows of the dataset:
```{r}
head(anemone_cube, n = 30)
```
## Taxonomic information
Species present in the dataset:
```{r}
anemone_cube %>% distinct(specieskey, species)
```
## Temporal information
The data are temporally defined at year level. Years present:
```{r}
anemone_cube %>% dplyr::distinct(year)
```
## Geographical information
The geographical information is represented by the `eeacellcode` column, which contains the identifiers of the grid cells containing at least one occurrence of the species.
The dataset contains `r length(unique(anemone_cube$eeacellcode))` unique grid cells.
## Preprocess data
Add geometrical information to the occurrence cube via `eeacellcode`, which contains the identifiers of the grid cells containing at least one occurrence of the species.
```{r add geom information}
cells_in_cube <- be_grid %>%
dplyr::filter(CELLCODE %in% unique(anemone_cube$eeacellcode)) %>%
dplyr::select(-c(EOFORIGIN, NOFORIGIN))
sf_anemone_cube <- cells_in_cube %>%
dplyr::left_join(anemone_cube, by = c("CELLCODE" = "eeacellcode")) %>%
dplyr::rename("eeacellcode" = "CELLCODE")
```
## Final (spatial) dataset:
```{r}
sf_anemone_cube %>% head(n = 30)
```
## Data visualization
In this section we will show how the number of occurrences and the number of occupied grid cells vary by year and species. Both static plots and dynamic maps are generated.
## Static plots {.tabset}
Show number of occurrences and number of occupied grid cells.
### per species
```{r per species, class.source = 'fold-hide', warning = FALSE, message = FALSE}
n_per_species <- sf_anemone_cube %>%
dplyr::group_by(species) %>%
dplyr::summarize(occurrences = sum(occurrences),
grid_cells = n_distinct(eeacellcode),
.groups = "drop") %>%
tidyr::pivot_longer(cols = c(occurrences, grid_cells),
names_to = "variable",
values_to = "n")
ggplot(n_per_species, aes(x = species, y = n)) +
geom_bar(stat = 'identity') +
facet_grid(.~variable, scales = "free_y") +
ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))
```
### per year
```{r per year, class.source = 'fold-hide'}
n_per_year <- sf_anemone_cube %>%
dplyr::group_by(year) %>%
dplyr::summarize(occurrences = sum(occurrences),
grid_cells = n_distinct(eeacellcode),
.groups = "drop") %>%
tidyr::pivot_longer(cols = c(occurrences, grid_cells),
names_to = "variable",
values_to = "n")
ggplot(n_per_year,aes(x = year, y = n)) +
geom_bar(stat = 'identity') +
facet_grid(.~variable, scales = "free_y") +
ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))
```
### per year and species
```{r year-and-province, class.source = 'fold-hide'}
n_occs_per_year_species <-
sf_anemone_cube %>%
dplyr::group_by(year, species) %>%
dplyr::summarize(occurrences = sum(occurrences),
grid_cells = n_distinct(eeacellcode),
.groups = "drop") %>%
tidyr::pivot_longer(cols = c(occurrences, grid_cells),
names_to = "variable",
values_to = "n")
ggplot(n_occs_per_year_species,
aes(x = year, y = n, fill = species)) +
geom_bar(stat = 'identity') +
facet_grid(.~variable, scales = "free_y") +
ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))
```
````
### Heleen's solution
````markdown
---
title: "20241217_anemone_analysis"
author: "Heleen Deroo"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
    toc: true
    toc_float: true
    toc_depth: 2
    number_sections: true
    code_folding: hide
---
(...)
# Data visualization
In this section we will show how the number of occurrences and the number of occupied grid cells vary by year and species. Both static plots and dynamic maps are generated.
## Static plots {.tabset}
Show number of occurrences and number of occupied grid cells (make a tabbed section out of it)
### per species
```{r}
n_per_species <- sf_anemone_cube %>%
dplyr::group_by(species) %>%
dplyr::summarize(occurrences = sum(occurrences),
grid_cells = n_distinct(eeacellcode),
.groups = "drop") %>%
tidyr::pivot_longer(cols = c(occurrences, grid_cells),
names_to = "variable",
values_to = "n")
ggplot(n_per_species, aes(x = species, y = n)) +
geom_bar(stat = 'identity') +
facet_grid(.~variable, scales = "free_y") +
ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))
```
### per year
```{r}
n_per_year <- sf_anemone_cube %>%
dplyr::group_by(year) %>%
dplyr::summarize(occurrences = sum(occurrences),
grid_cells = n_distinct(eeacellcode),
.groups = "drop") %>%
tidyr::pivot_longer(cols = c(occurrences, grid_cells),
names_to = "variable",
values_to = "n")
ggplot(n_per_year,aes(x = year, y = n)) +
geom_bar(stat = 'identity') +
facet_grid(.~variable, scales = "free_y") +
ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))
```
### per year and species
```{r warning = FALSE}
n_occs_per_year_species <-
sf_anemone_cube %>%
dplyr::group_by(year, species) %>%
dplyr::summarize(occurrences = sum(occurrences),
grid_cells = n_distinct(eeacellcode),
.groups = "drop") %>%
tidyr::pivot_longer(cols = c(occurrences, grid_cells),
names_to = "variable",
values_to = "n")
ggplot(n_occs_per_year_species,
aes(x = year, y = n, fill = species)) +
geom_bar(stat = 'identity') +
facet_grid(.~variable, scales = "free_y") +
ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))
```
## Dynamic maps
### Leaflet maps
We show a map with the distribution of _Anemone_ in Belgium. We show the total number of occurrences per grid cell. The color of the grid cells is based on the number of occurrences. The legend shows the color scale and the number of occurrences per grid cell.
```{r}
n_occs_per_cell <- sf_anemone_cube %>%
dplyr::group_by(eeacellcode) %>%
dplyr::summarize(
occurrences = sum(occurrences),
min_coordinateuncertaintyinmeters = min(mincoordinateuncertaintyinmeters),
min_mintemporaluncertainty = min(mintemporaluncertainty),
.groups = "drop")
map_anemone <- mapview::mapview(n_occs_per_cell,
zcol = "occurrences",
legend = TRUE
)
map_anemone
```
````
## Challenge 3
### Falk
*(For interested readers)*
`bookdown`s are great!
You can also produce book-like projects in quarto:
- https://quarto.org/docs/books/book-structure.html
Initiating a project is simple:
```shell
quarto create project book
```