# INBO CODING CLUB
25 February 2020
Welcome!
Welcome Tosca!
## Share your code snippet
To share a code snippet, paste it between two lines of three backticks (```):
As an **example**:
```
library(tidyverse)
```
(*you can copy paste this example and add your code further down*)
## Participants
1. Damiano
2. Sander Devisscher
3. Dirk Maes
4. Geert De Knijf
5. Britt Lonneville
6. Heidi Demolder
7. Anik Schneiders
8. Marie Robberecht
9. Loïc van Doorn
10. Patrik Oosterlynck
11. An Leyssen
12. David Halfmaerten
13. Nicolas Noé
14. Anja Leyman
15. Sabrina Neyrinck
16. Bart Christiaens
17. Jo Packet
18. Koen Thibau
19. Yi-Ming Gan
20. Emma Cartuyvels
21. Floris Vanderhaeghe
22. Inne Vucht
23. Jeroen Vandenborre
24. Andy Van Kerckvoorde
25. Marijke Thoonen
26. Mathias Wackenier
27. Dries Adriaens
## Challenge 1
```
Data structure Emma:
list.files(recursive = TRUE)
1 data/20170621_bird_obs.csv
2 data/202001_birdringing_raw.csv
3 messy_project.Rproj
4 output/figure01_count_per_gemeente.jpg
5 src/analysis.R
6 src/data_prep.R
7 src/helpers.R
8 src/visualisations.R
```
```
Data structure Nicolas:
1 data/2017-06-21_birds-observations.csv
2 data/2020-01_dirty-data.csv
3 messy_project.Rproj
4 reports/figures/figure1.jpg
5 src/01_data-preparation.R
6 src/02_analysis-final.R
7 src/02_analysis-finalfinal-i-should-use-git-instead.R
8 src/02_analysis-test-july2018.R
9 src/03_data-visualizations.R
10 src/helpers.R
```
```
Data structure Britt:
1 data/20170621_birdobservations.csv
2 data/20200100_birdringing_bel.csv
3 messy_project.Rproj
4 output/birdspermunicipality.jpg
5 src/20180700_analysis_test.R
6 src/analysis_paper.R
7 src/analysis_paper_afterrevision.R
8 src/createimage.R
9 src/datapreparation.R
10 src/helperfunctions.R
```
```
Data structure Ming:
.
├── README.md
├── data
│   └── raw
│       ├── 2017-06-21_bird-observation.csv
│       └── 2020-01_bird-ringing-in-belgium.csv
├── messy_project.Rproj
├── reports
│   └── figures
│       └── 01_plot-bird-obs.jpg
└── src
    ├── analysis
    │   ├── 01_data-analysis.R
    │   └── 02_revised-data-analysis.R
    ├── data
    │   └── 01_data-preparation.R
    ├── global
    │   └── 01_helpers.R
    ├── plots
    │   └── 01_plot-bird-obs.R
    └── tmp
        └── 2018-07_analysis-test.R
```
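The layouts above share a few habits: separate folders for data, scripts and output, and numbered script names that sort in running order. As a rough sketch only, a skeleton like that could be created from R as shown here. The folder and file names are hypothetical, and everything is written to a temporary directory so nothing in a real project is touched.

```
# Build a throwaway project skeleton in a temporary directory.
# "messy_project_demo" and all names below are made up for illustration.
project_root <- file.path(tempdir(), "messy_project_demo")

folders <- c("data/raw", "reports/figures", "src")
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Numbered script names keep the files sorted in the order they are run
scripts <- c("01_data-preparation.R",
             "02_analysis.R",
             "03_data-visualizations.R")
file.create(file.path(project_root, "src", scripts))

# Same call as used for the listings above; it returns files only, not folders
created <- list.files(project_root, recursive = TRUE)
print(created)
```

Empty folders such as `data/raw` do not show up in `list.files(recursive = TRUE)` until they contain a file.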
Anja:

    birds <- read_csv(here('data/2017-06-21_bird-obs.csv'))
    fig1 <- ggplot(birds, aes(x = PlaatsGemeente)) + geom_bar()
    fig1
    ggsave(here('output/fig1_nr-birds-gemeente.jpg'), fig1)

### Figure Dirk
```
fig1 <- ggplot(birds, aes(x = PlaatsGemeente)) +
  geom_bar(fill = "darkgreen") +
  xlab("Locality") +
  ylab("Number of birds") +
  scale_y_continuous(breaks = c(2, 4, 6, 8)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text = element_text(size = 10),
        axis.title = element_text(size = 15, face = "bold"),
        legend.text = element_text(size = 10),
        legend.title = element_text(size = 15),
        panel.background = element_rect(fill = "white", colour = "grey",
                                        size = 1, linetype = "solid"),
        panel.grid.major = element_line(size = 0.5,
                                        linetype = "solid",
                                        colour = "grey"))
fig1
```
## Challenge 2
Anja:
```
urban_gaia <- read_delim(here("data", "20200225_urban_gaia_policy.txt"),
                         delim = "\t")
```
Nicolas:
```
urban_gaia_clean <- clean_names(urban_gaia) # Automate most of the work...
urban_gaia_clean <- rename(urban_gaia_clean, definitions = definition_s) # ... but with a manual final touch
```
```
# Improve the column names
str(urban_gaia)
colnames(urban_gaia)
?clean_names
t <- urban_gaia %>%
  clean_names()
# Restore the UGBI acronym in capitals after clean_names() lowercased it
t <- t %>%
  rename(definition = definition_s,
         UGBI_directly_or_indirectly_mentioned = ugbi_directly_or_indirectly_mentioned,
         UGBI_central_to_approach = ugbi_central_to_approach,
         UGBI_lev_centr = ugbi_lev_centr)
colnames(t)
```
Britt:
```
urban_gaia <- read_delim(here("data", "20200225_urban_gaia_policy.txt"),
                         delim = "\t")
colnames(urban_gaia)
urban_gaia <- urban_gaia %>% clean_names("snake")
```
Sander:
```
urban_gaia <- urban_gaia %>%
  clean_names("snake")
```
Emma:
```
urban_gaia <- clean_names(urban_gaia) %>%
  rename(type = type_of_document,
         level = to_implement_at_level,
         ugbi_mentioned = ugbi_directly_or_indirectly_mentioned,
         definition = definition_s)
```
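To see roughly what kind of renaming `clean_names()` does, here is a simplified base R stand-in. It is not janitor's actual implementation, which handles many more cases (duplicates, accents, other case styles). The helper `to_snake()` and the raw column names are hypothetical, guessed from the cleaned names in the snippets above.

```
# Simplified stand-in for snake_case cleaning (NOT janitor's algorithm)
to_snake <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # runs of non-alphanumerics become "_"
  x <- gsub("^_+|_+$", "", x)         # drop leading and trailing underscores
  tolower(x)
}

# Hypothetical raw column names, assumed for illustration only
example_names <- c("Type of document",
                   "Definition(s)",
                   "UGBI central to approach")

cleaned <- to_snake(example_names)
print(cleaned)
```

This shows where a name like `definition_s` comes from: the brackets turn into underscores, which is why several people renamed that column by hand afterwards.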
## Challenge 3
```
library(tidyverse)
library(here)
# Read something
bird_obs <- read_csv(here("data", "20191024_species.csv"), na = "")
# ALT + -: inserts an assignment, including spaces " <- "
# CTRL + SHIFT + M: inserts a pipe, including spaces " %>% "
# Do something
bird_obs <- bird_obs %>%
  mutate(species_id = str_to_lower(species_id),
         taxa = str_to_lower(taxa))
### Do something 2
species_id_label <- str_sort(bird_obs$species_id)
### extract species_id labels longer than 2 letters
species_id_long <- bird_obs$species_id[str_length(bird_obs$species_id) > 2]
### tidyverse version
species_id_long <- bird_obs %>%
  filter(str_length(species_id) > 2) %>%
  pull(species_id)
# Add canonicalName as genus + species
bird_obs2 <- bird_obs %>%
  mutate(canonicalName = str_c(genus, species, sep = " "))
# Remove something from a column
species_bird_obs_taxa_clean <-
  bird_obs %>%
  # str_remove or str_remove_all, in this case no differences
  mutate(taxa = str_remove(taxa, "-not censused"))
species_bird_obs_clean <-
  bird_obs %>%
  # remove tabs
  mutate(new_col = str_remove_all(authorship,
                                  pattern = "\\t"),
         # remove only NAs that are preceded by | or followed by |
         new_col = str_remove_all(new_col,
                                  pattern = "((?<=\\|)NA)|(NA(?=\\|))"),
         # remove vertical pipes
         new_col = str_remove_all(new_col,
                                  pattern = "\\|"),
         # remove spaces at the end
         new_col = str_remove_all(new_col,
                                  pattern = "[:space:]+$"),
         # remove punctuation at the end
         new_col = str_remove_all(new_col,
                                  pattern = "[:punct:]$"))
View(species_bird_obs_clean)
# Print the smaller of the two row counts
if (nrow(bird_obs) <= nrow(bird_obs2)) {
  print(paste("Number of rows:", nrow(bird_obs)))
} else {
  print(paste("Number of rows:", nrow(bird_obs2)))
}
```
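The chain of removal steps on the `authorship` column is easier to follow on a few toy strings. The sketch below repeats the same steps in base R, with `gsub()` standing in for `str_remove_all()` so that it runs without extra packages (`perl = TRUE` is needed for the lookbehind and lookahead). The example strings are made up and do not come from the species file.

```
# Toy authorship strings, invented for illustration
authorship <- c("Linnaeus, 1758|NA", "NA|Smith, 1900.", "Brehm\t, 1831  ")

# 1. remove tabs
cleaned <- gsub("\t", "", authorship, fixed = TRUE)
# 2. remove only NAs that directly follow or precede a vertical pipe
cleaned <- gsub("((?<=\\|)NA)|(NA(?=\\|))", "", cleaned, perl = TRUE)
# 3. remove the vertical pipes themselves
cleaned <- gsub("\\|", "", cleaned)
# 4. remove whitespace at the end
cleaned <- gsub("[[:space:]]+$", "", cleaned)
# 5. remove a single punctuation character at the end
cleaned <- gsub("[[:punct:]]$", "", cleaned)

print(cleaned)
```

The order matters: trailing whitespace has to go before the trailing punctuation step, otherwise a string ending in `". "` would keep its full stop.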
## Additional links
https://ourcodingclub.github.io/course/wiz-viz/index.html