_1st ORDS ReproHack_
<img src="https://github.com/ORDS-Rostock/1st-ords-reprohack/blob/main/reprohack-rostock.png?raw=true" alt="drawing" width="200"/>
- :calendar: **11th May 2021 CEST**
- :watch: **9:30--15:00**
# Agenda
**9:30 - Opening and virtual come together**
* Introduction ORDS
* Icebreaker
* Introduction TKFDM
* Article including Code and Data
* Team formation
**10:00 - 1st part of the workshop**
* Hands-on analysis in teams
**11:30 - Rejoin and tell**
**12:00 - Lunch break :pizza: :stew: :strawberry:**
**13:00 - 2nd part of the workshop**
* Continue working in your team
* Prepare feedback to the group and for the authors
**14:30 - Evaluation and Goodbye**
### **Participants**
***Please sign in (Affiliation / Twitter / GitHub)***
#### If you have a twitter handle, please add it!
* Frank Krüger (University of Rostock / [@\_frank_k\_](https://twitter.com/_frank_k_) / [f-krueger](https://github.com/f-krueger) )
* Manuela Reichelt (FBN Dummerstorf / [@manuReichelt](https://twitter.com/ManuReichelt) / [ManuelaReichelt](https://github.com/ManuelaReichelt))
* Anja Eggert (FBN Dummerstorf / [@AnjaEggert42](https://twitter.com/AnjaEggert42) / [AnjaEggert](https://github.com/AnjaEggert))
* Max Schröder (University of Rostock / [@m6121](https://twitter.com/m6121) / [m6121](https://github.com/m6121))
* Jessica Rex (Technical University of Ilmenau / [@ThatDataStuff](https://twitter.com/ThatDataStuff))
* Roman Gerlach (Friedrich-Schiller-University Jena / [@FDMThueringen](https://twitter.com/FDMThueringen) )
* Kevin Lang (Bauhaus-Universität Weimar / [@kev_lan](https://twitter.com/kev_lan) / [GitHub](https://github.com/KujaEx))
* Anke Günther (Uni Rostock, Uni Greifswald)
* Fabian Dröge (Uni Jena / [GitLab](https://gitlab.com/FabianGD))
* Sheeba Samuel (Uni Jena / [@sheebasamuel](https://twitter.com/sheebasamuel) / [GitHub](https://github.com/Sheeba-Samuel/))
* Phillip Seeber (Uni Jena / [GitLab](https://gitlab.com/sheepforce))
* Kai Budde (Uni Rostock / [GitHub](https://github.com/buddekai))
* Stephanie Dahn (Uni Rostock) [GitHub](https://github.com/KangarooSurfer) [Twitter](https://twitter.com/stephaniedahn)
* Markus Zehner (Uni Jena / [GitHub](https://github.com/MarkusZehner))
* Henja Wehmann (Uni Rostock)
* Felix Cremer (DLR Jena / [GitHub](https://github.com/felixcremer))
* Sebastian Seidenath (Friedrich-Schiller-University Jena)
* Oscar Beltran (Leibniz Institute for Baltic Sea Research)
* Taufia Hussain (Uni Rostock)
* Inga Ulusoy (Scientific Software Center, Uni Heidelberg / [GitHub](https://github.com/iulusoy))
# Icebreaker
Form teams and try to answer the following questions in breakout rooms.
- Who are you?
- Why are you here?
- What is your level of repro-experience?
- What is your favorite (new) hobby after a year of on/off Corona lock down?
**As a group: name your room!**
- What do you have in common?
1. all-over-Germany-group
2. Thuringian-group
3. technical-issues-group
4. wannabe-musicians-group
5. white-wall-group
# :recycle: ReproHacking - Plan of Action
In contrast to other ReproHacks, here we focus on one particular paper rather than an entire list of papers. We selected the following article:
Luis M. Vilches-Blázquez & Daniela Ballari (2020):
**Unveiling the diversity of spatial data infrastructures in Latin America: evidence from an exploratory inquiry**, Cartography and Geographic Information Science, DOI: [10.1080/15230406.2020.1772113](https://doi.org/10.1080/15230406.2020.1772113)
(author copy available for [Download](https://unibox.uni-rostock.de/getlink/fi6BVET7tsTf89NdAdkHmGcc/SDI_10.1080%4015230406.2020.1772113.pdf))
# :computer: Form Teams
Feel free to either join the predefined teams *Beginners*, *Advanced*, or *Experts*, create your own team, or work individually on the paper.
## Beginners (Room 1)
The paper is analysed with respect to its published resources, and the original analysis is re-run to see whether it generates the same results.
_Participants are expected to have some basic knowledge of R_
* **Manuela Reichelt**
* **Anja Eggert**
* **Jessica Rex**
* **Franziska Koebsch**
* **Anke Günther**
* **Henja Wehmann** (knows R, but programming skills are a mess)
* Stephanie Dahn (**really** basic R knowledge)
* **Alexander Schwab** (no significant knowledge unfortunately)
* **Sebastian Seidenath** (no significant knowledge unfortunately)
* **Taufia Hussain (basic R knowledge)**
* Dietmar Zechner (no knowledge)
* Oscar Beltran
### Feedback
* Figure 2b: "Yes" bar missing for Colombia 2017 in the article, caused by zooming in too much on the y-axis (bottom)
* Round while preserving sum:
[Link to r-bloggers](https://www.r-bloggers.com/2016/07/round-values-while-preserve-their-rounded-sum-in-r/)
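
The "round while preserving the sum" idea noted above can be illustrated with a largest-remainder scheme: round everything down, then hand the missing units to the entries with the largest fractional parts. The following is a minimal Python sketch written for these notes, not the code from the linked post or from the paper; the function name `round_preserve_sum` and the sample values are assumptions.

```python
import math


def round_preserve_sum(values, digits=0):
    """Round values so that the rounded values add up to the rounded total."""
    scale = 10 ** digits
    scaled = [v * scale for v in values]
    floors = [math.floor(v) for v in scaled]
    # Number of units still missing after rounding every value down.
    missing = round(sum(scaled)) - sum(floors)
    # Indices ordered by fractional part, largest first.
    by_remainder = sorted(
        range(len(scaled)),
        key=lambda i: scaled[i] - floors[i],
        reverse=True,
    )
    # Give one extra unit to the entries with the largest fractional parts.
    for i in by_remainder[:missing]:
        floors[i] += 1
    return [f / scale for f in floors]


# Hypothetical percentages that would add up to 99 with plain rounding.
print(round_preserve_sum([33.3333, 33.3333, 33.3334]))
```

With plain rounding, each of the three sample values becomes 33 and the total drops to 99; the sketch assigns the missing unit to the value with the largest fractional part, so the total stays at 100.
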
## Advanced Python (Room 2)
The paper's analysis is re-implemented in a Python Jupyter notebook to see whether the same results can be generated in a different computational environment.
_Participants are expected to have some basic knowledge of R and Python_
* **Frank Krüger**
* **Kevin Lang** (Python with Jupyter)
* **Inga Ulusoy**
* **Fabian Dröge** --> Julia or Python, I don't mind :)
* **Felix Cremer** would like to use [Julia](https://julialang.org/)
* **Markus Zehner**
* **Sheeba Samuel**
### Feedback so far
* recreated first two plots with `python` and `julia`
* some raw values adjusted (inserted magic numbers)
* Translation to Julia seems complicated
* R code is fairly structured and easy to follow, makes the task of mapping into a different language a lot easier
* for the maps, we use shape files from https://tapiquen-sig.jimdofree.com/english-version/free-downloads/americas/
* the shapefiles were added manually over the bar plots
## Experts (Room 3)
A computational environment is created for both the original analysis and the re-implemented analysis.
_Participants are expected to have some basic knowledge of R, Python and Docker_
* **Max Schröder**
* **Roman Gerlach**
* **Phillip Seeber** (Haskell, Nix)
### Documentation
* The paper uses a survey to gather data
* Data and source code are published on figshare:
    * Excel sheet with cleaned data (years: 2019, 2017, 2014; for the cleaning process, see the paper)
    * R Markdown document
    * HTML with details and figures
* To produce the HTML, run `Rscript -e "library(knitr); knitr::knit2html('Survey_Trend_SDI.Rmd')"` or, better, `Rscript -e "library(knitr); rmarkdown::render('Survey_Trend_SDI.Rmd')"`
* We aim at two different computing environments:
    * [Docker](https://www.docker.com/)
    * [Nix](https://nixos.org)
### Docker
#### R
`install.R`:

    # Install pinned package versions from the FAU CRAN mirror.
    install.packages(c('remotes'), repos='https://ftp.fau.de/cran/')
    library(remotes)  # provides install_version()
    install_version("knitr", version='1.28', repos="https://ftp.fau.de/cran/")
    install_version("rmdformats", version='0.3.6', repos="https://ftp.fau.de/cran/")
    install_version("readxl", version='1.3.1', repos="https://ftp.fau.de/cran/")
    install_version("ggplot2", version='3.2.1', repos="https://ftp.fau.de/cran/")
    install_version("stringr", version='1.4.0', repos="https://ftp.fau.de/cran/")
    install_version("rworldmap", version='1.3-6', repos="https://ftp.fau.de/cran/")
    install_version("RColorBrewer", version='1.1-2', repos="https://ftp.fau.de/cran/")
    install_version("DT", version='0.12', repos="https://ftp.fau.de/cran/")
    install_version("plyr", version='1.8.6', repos="https://ftp.fau.de/cran/")
    install_version("tidyr", version='1.0.2', repos="https://ftp.fau.de/cran/")
    install_version("ggpubr", version='0.2.5', repos="https://ftp.fau.de/cran/")

`Dockerfile`:

    FROM r-base:3.6.3
    LABEL maintainer="max.schroeder@uni-rostock.de;roman.gerlach@uni-jena.de"
    RUN apt update \
        && apt install -y \
            libcurl4-openssl-dev \
            pandoc \
        && apt clean \
        && rm -rf /var/lib/apt/lists/*
    COPY install.R /tmp/install.R
    RUN Rscript /tmp/install.R

Build the image and render the document:

    docker build -t ords-reprohack:r-container .
    docker run --rm -u $(id -u) -v /data/reprohack:/opt/data -w /opt/data ords-reprohack:r-container Rscript -e "library(knitr); rmarkdown::render('Survey_Trend_SDI.Rmd')"
#### Python
Software dependencies:
* Jupyter Notebook with Python 3 kernel
* Python packages:
    * pandas
    * matplotlib
    * geopandas

`Dockerfile`:

    FROM jupyter/scipy-notebook:09fb66007615
    LABEL maintainer="max.schroeder@uni-rostock.de;roman.gerlach@uni-jena.de"
    RUN python3 -m pip install geopandas==0.9.0
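
Once the image is built, a check like the following could be run inside the container to record which versions of the listed packages are actually present. It is a minimal sketch using only the standard library; the helper name `check_versions` and the `EXPECTED` table are assumptions made for illustration, and only the geopandas pin is taken from these notes.

```python
from importlib import metadata

# Packages listed in the notes; only geopandas has a pinned version there.
EXPECTED = {"pandas": None, "matplotlib": None, "geopandas": "0.9.0"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, ok)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # A package passes if it is installed and matches the pin, if any.
        ok = installed is not None and (wanted is None or installed == wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "check"
        print(f"{name}: {installed or 'missing'} ({status})")
```

Saving the printed report next to the notebook output gives later readers a record of the environment that produced the figures.
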
### Nix
The Nix version reproduces the original output in a hermetic Nix build, down to the exact hashes.
The Nix expressions and a short description of how to build them are available on [GitLab](https://gitlab.com/sheepforce/reprohack).
The build definition is given by:

    { stdenvNoCC, lib, fetchurl, unzip, rPackages, rWrapper, pandoc }:

    let
      rPkgs = import ./pkgs.nix { inherit rPackages; };
      rWithPkgs = rWrapper.override { packages = rPkgs; };
    in stdenvNoCC.mkDerivation rec {
      pname = "ReproHack-Original";
      version = "1.0";
      src = fetchurl {
        url = "https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/22720802/SupplementaryMaterial.zip";
        sha256 = "00lsp163q44dn8adlra8vaf4cgcyiifv2nh0qabypsfcgzj0c2sd";
      };
      # The entries of the next three lists are omitted here; see the GitLab repository.
      nativeBuildInputs = [ ];
      phases = [ ];
      installTargets = [ ];
      unpackPhase = ''
        unzip $src
      '';
      buildPhase = ''
        Rscript -e "library(knitr); rmarkdown::render('Survey_Trend_SDI.Rmd')"
      '';
      installPhase = ''
        mkdir -p $out
        for i in ${toString installTargets}; do
          cp -r $i $out/.
        done
      '';
    }

This build is fully reproducible, as the dependencies are pinned exactly by Git and SHA-256 hashes.
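
The idea behind pinning a download by its hash can be sketched outside Nix as well: compute the SHA-256 digest of the fetched file and refuse to continue if it differs from the recorded value. The following Python sketch is only an illustration of that principle; the function names are assumptions, and it compares hexadecimal digests, whereas the hash string in the Nix expression uses Nix's own base32 encoding and is therefore not directly comparable.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Hex SHA-256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's content in chunks so large archives need not fit in memory."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def matches_pin(path, expected_hex):
    """True if the file at `path` has exactly the pinned hex digest."""
    return sha256_of_chunks(read_chunks(path)) == expected_hex.lower()
```

A workflow built on this would call `matches_pin` on the downloaded archive before unpacking it and stop if the result is `False`.
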
A JupyterLab environment that can fully run the Jupyter notebook from the advanced group can be obtained from within the `Python` directory by executing

    nix-shell --command "jupyter lab"

and then selecting the `ReproHackPython` kernel in JupyterLab.
# :books: Reproduce
- We attempt to reproduce the paper from the available materials and documentation
- Make notes about your experiences, in particular with respect to how easy it is to:
    - :earth_africa: navigate the materials
    - :repeat: reproduce the analysis
    - :recycle: reuse the materials
# :memo: Feedback to authors
* Fill in the author feedback form, documenting your experiences reproducing the paper in your chosen group
# :raised_hand_with_fingers_splayed: 5-finger Feedback for this event
## :point_up: One thing that you enjoyed
_(put your comments here)_
- to try something I have no idea about. Thanks a lot to the organizers
- high level overview of completely foreign data and working with them
- trying to get into a completely new language
- coding together +1
- learning
- I enjoyed taking on the role of inspecting in detail the results of a paper and exploring the tools available to do so.
- working on just one paper +1
- Getting to know new people
## :point_up: One thing that can be improved
_(put your comments here)_
- the tooling was communicated more openly on the website than it turned out to be in practice (reduction to Python, Julia)
- more short breaks in between, where technical issues can be addressed and where you have time to read into new things (and to have snacks :D )
- more structured sections for working and listening - doing everything at once sometimes was very exhausting for me
- perhaps have the paper at least one hour beforehand to know what it is about.
- give an overview about the data (files and meaning)
- give a short introduction about the paper and what we are reproducing
- after coming back from lunch, had problems entering a breakout room
- technical issues in joining the main room
- it would be good to have a place (online) to share intermediate results/code among the breakout group members. Maybe also a shared computational environment. +1
- provide a list with items that define what is good practice for reproducibility (like, documentation, requirements should be given with any piece of code, etc.)
- focus more on reproducing the actual results rather than the plots
## :point_up: One thing that you did not like
_(put your comments here)_
- working remotely :/
- doing only plots, it would also be interesting to have a paper with a more interesting methodology, where also the reproduction is harder, because we could have rounding errors and other subtle differences
- not enough time to complete tasks +1
## :point_up: One thing that you would like to keep
_(put your comments here)_
- separation into groups with different technologies and aspects +1
- openness to beginners
- different levels of expertise
- open to different programming languages +2
## :point_up: One thing that came up short
_(put your comments here)_
- time; I think it would be nicer to have a little bit more time
- intro into reproducibility and what we are actually looking for while doing the analysis +1