
N8 CIR Northern Tour ReproHack Series Notes

tags: Templates Reprohack hackpad

Plan of Action

Welcome and Introduction

  • Intro Slides

:computer: Form teams

Feel free to tackle papers individually or as teams.

:dart: Select papers

  • Choose a paper from the list of proposed papers
  • Register the paper selected and the participants reproducing it below. You can copy, paste and edit the following template:
    ### **Paper:** <Title of the paper reproduced>
    **Reviewers:** Reviewer 1, Reviewer 2 etc.
    

:books: Reproduce

  • Attempt to reproduce papers from available materials and documentation
  • Make notes about your experiences, in particular with respect to how easy it is to:
    • :earth_africa: navigate the materials
    • :repeat: reproduce the analysis
    • :recycle: reuse the materials

:memo: Feedback to authors

  • Fill in the author feedback form, documenting your experiences reproducing your chosen paper

Event Hackpad

Newcastle

  • Location: Newcastle University
  • Date: 2020-01-21
  • Participants:
    • Alison Clarke (Durham University / @alisonrclarke )
    • Jonathan Frawley (Durham University)
    • Marion Weinzierl (Durham University)
    • Juan Ojeda (Newcastle University)
    • Victoria Kurushina (Newcastle University)
    • Helen Clare (Jisc / @helenclare)

Reproduction Log

Log details of the papers reproduced and reviewers (see template in Agenda). You can add links to any derived materials produced here.

Paper: <Spectral measure of color variation of black-orange-black (BOB) pattern in small parasitoid wasps (Hymenoptera: Scelionidae), a statistical approach>

Reviewers: Juan Ojeda

Discussion notes

Easy to reproduce. The GitHub repository was downloaded and the code ran in RStudio without major problems. The only issue was that a folder called "Figures" had to be created manually; without it, the figures were not saved. The figures were identical to the ones in the paper.
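
The missing "Figures" folder is the kind of problem a script can avoid by creating its output directory before saving anything. The paper's code is in R, where `dir.create("Figures", showWarnings = FALSE)` would do the job; the block below is only a minimal Python sketch of the same idea. The function name `ensure_output_dir` is hypothetical and is not part of the paper's repository.

```python
from pathlib import Path
from typing import Optional


def ensure_output_dir(name: str = "Figures", base: Optional[Path] = None) -> Path:
    """Create the output folder if it is missing and return its path.

    `name` and `base` are illustrative parameters, not taken from the paper.
    """
    root = Path.cwd() if base is None else Path(base)
    out_dir = root / name
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling `ensure_output_dir()` once at the top of an analysis script, and saving every figure under the returned path, means a fresh clone runs without any manual setup.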

Paper: <Social-evaluative threat: Stress response stages and influences of biological sex and neuroticism>

Reviewers: Marion Weinzierl

Discussion notes

  • I got two plots that matched those in the paper. There was an additional plot and lots of .xlsx files produced, but I could not easily match them up with tables or plots in the paper.
  • The code gave lots of 'object not found' errors which I could not resolve.
  • The code had very good in-code documentation (comments), and a spreadsheet that explained the types of data. However, it did not have a README or other documentation, and the 'Steps to reproduce' description was extremely sparse.
  • The code was one very long script structured in sections. Better structuring, using functions and splitting the code into more files, might have made it easier to read. More descriptive variable names would also be good (without assuming that abbreviations will be understood by someone else).

Paper: <PREPRINT: Using digital epidemiology methods to monitor influenza-like illness in the Netherlands in real-time: the 2017-2018 season>

Reviewers: Alison Clarke, Helen Clare

Discussion notes

  • It would be useful to give a link in the README to download RStudio, for new R users.
  • It would also be good to say which files to run and in what order, e.g. I tried running the manuscript file first but that didn't work without first running the source code file.
  • "CAVE: Please Check - gtrendsR developer version required!" was output partway through running the code.
    • It might be more obvious to mention this in the README as I missed it the first time running.
    • It would be better to give a date or commit hash for the development version used, as the current development version will change over time. Describing the problem with the current version would also be useful, as then we could check the release notes to see if it's fixed in a later version.

Paper: <A multiscale Bayesian inference approach to analyzing subdiffusion in particle trajectories, K. Hinsen and G.R. Kneller, J. Chem. Phys. 145, 151101 (2016)>

Reviewers: Victoria Kurushina, Jonathan Frawley, Juan Ojeda

Discussion notes

Lots of issues setting this up:

  • Instructions are in general very vague: they just point to the "ActivePapers" tool and do not explain how to use it to run the code, etc.
  • Requires the "ActivePapers" tool, which does not seem to be well supported: the documentation has dead links
  • Does not specify which version of ActivePapers to install (there are at least 3: Python 2, Python 3, JVM). I figured out that they wanted Python 2 by process of elimination
  • Some dependencies missing for ActivePapers: h5py, matplotlib, numpy
  • Requires Python 2 which is now at end of life
  • No Windows install instructions, and we could not get it to work
  • And even once we ran this and got access to the data and code, the code does not work with the latest version of ActivePapers.
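
A quick pre-flight check can surface missing dependencies like these before any analysis code runs. The following is a minimal sketch, assuming only top-level module names are checked; the helper `missing_modules` is hypothetical and is not part of ActivePapers. The module list is the one noted above.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Dependencies the reviewers found were needed but not documented.
REQUIRED = ["h5py", "matplotlib", "numpy"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are available.")
```

This only confirms that the modules can be located; it does not check that their versions are compatible with the paper's code.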

Paper: <Evaluation of the ‘Irish Rules’: The Potato Late Blight Forecasting Model and Its Operational Use in the Republic of Ireland by Mladen Cucak, Adam Sparks, Rafael de Andrade Moral, Stephen Kildea, Keith Lambkin and Rowan Fealy>

Reviewers: Jonathan Frawley

Discussion notes

  • No docs on how to run it
  • Takes a long time to install dependencies (Over half an hour!)
  • Some errors when running from a clean install that needed to be fixed: bind_rows -> dplyr::bind_rows, rename -> dplyr::rename, add_column -> tibble::add_column
  • Some system level packages required (on Fedora 31): sudo dnf install libxml2 openssl-devel R-curl
  • Confused by the "(1)" suffix on some file names: does this indicate a newer version of the code?
  • But! In the end, I was able to reproduce a lot of the plots in the paper!

Feedback on the Feedback form

  • The introductory slide content and the feedback form are currently under active development. They will form the basis of the Reproducibility Review form in the new Hub and of the Reproducibility report to be published in ReScience C. The plans and current thinking are outlined in this issue

How could they be improved? Feel free to add suggestions :point_down:

  • Having more time. Finishing at 17:00
  • Making an exhaustive list of the software (with versions!) needed to reproduce ALL the papers and sending it to participants in advance, so that they can install everything before the session and start directly with the reproduction at the workshop.
  • Starting the workshop session with an easy paper and reproducing its results together with the instructor, with step-by-step guidance.

HPC Reprohack Special Edition

  • Try and discuss at CW2020

Ten Year Reproducibility challenge

Example discussion Topic 1: Remote ReproHacks

  • Can we make this happen?
  • What technologies would we need?
  • What would we lose compared to the localised experience? Could we make up for it?

Have Python / Django skills or just want to get involved? Check out the open issues in our new hub, currently under development: https://github.com/reprohack/reprohack_site

Group notes


Leeds

  • Location: University of Leeds
  • Date: 2020-02-14
  • Participants:
    • Anna Krystalli (University of Sheffield, @annakrystalli)
    • Isabel Birds (University of Leeds, @IzBirds)
    • Haroldas Bagdonas (University of York, @GABRAHREX)
    • Nick Sheppard (University of Leeds, @OpenResLeeds)
    • Joanna Leng (University of Leeds)
    • Alistair Curd (University of Leeds)
    • Nick Rhodes (University of Leeds)
    • Nujcharee Haswell ()

Reproduction Log

Log details of the papers reproduced and reviewers (see template in Agenda). You can add links to any derived materials produced here.

Paper: <Cell Contractility Facilitates Alignment of Cells and Tissues to Static Uniaxial Stretch>

(C and MATLAB)

Reviewers: Alistair Curd

  • Documentation consists of instructions to install C compiling software, and an initial command to type.
  • I installed the recommended C compilation software through Miniconda, avoiding problems with administrator privileges.
  • This software failed to compile with errors on my Windows 10 system.

Paper: <Sea level regulated tetrapod diversity dynamics through the Jurassic/Cretaceous interval>

Reviewers: Isabel Birds (MacOS)

  • Methods dense and not clear.
  • "Raw" data in supplementary has been manipulated.
  • Could not replicate actual raw data from PaleoDB - some detail missing?
  • Gave up.

Paper: <Where should new parkrun events be located? Modelling the potential impact of 200 new events on socio-economic inequalities in access and participation>

Reviewers: Stefano Maffei, Nujcharee Haswell (Ped).

Discussion notes

  • The Shiny app is very good and easy to navigate. Successfully ran it locally.
  • Suggest an alternative to the Red/Yellow/Green color legend. Not sure if this is accessible?
  • There are some comments on how to run the script in individual R files; however, it would be beneficial to have all instructions in Readme.md
  • Can successfully reproduce the p_imd_dist_runs plot as per the paper.
  • No documentation on package dependencies
  • It would be useful for the vignette to give details of the packages and versions it was last successfully run with.
  • src2_data_proc.R - england_sp object is commented out but referred to elsewhere
  • line 144 - error at srvd_lsoa when merging data frames, as the number of rows didn't match

Paper: <Multiscale Bayesian inference approach to analyzing subdiffusion in particle trajectories>

Reviewers: Alistair Curd

(Python)

  • In Active Papers format. The Active Papers Python tool could not be installed and used as the documentation says it should. Attempting to access the README through h5py was also unsuccessful.

Paper: <Bayesian determination of the effect of a deep eutectic solvent on the structure of lipid monolayers>

Reviewers: Haroldas Bagdonas (MacOS)

Discussion notes

  • Can't get the conda environment set up on MacOS Mojave (v10.14). Some dependencies are impossible to compile.
    I assume the author ran this analysis on Linux, as one of the packages requires gcc to compile. Apple MacOS uses clang to compile C++/C code.
  • Otherwise, I am pretty certain that everything would work if I were on my Linux machine.
    The readme.md and Jupyter notebooks look relatively complete and offer enough information for users to find solutions for themselves if specific dependencies fail.
  • Didn't manage to go through the reproduction in detail, because I could not get the conda environment set up properly.

Paper: <Evaluation of the ‘Irish Rules’: The Potato Late Blight Forecasting Model and Its Operational Use in the Republic of Ireland>

Reviewers: Isabel Birds (MacOS)

Discussion notes

  • Straightforward to run in RMarkdown, but not obvious to a new user, e.g. how to access the data.
  • A comprehensive list of the R packages used is given, but without version numbers - this could cause issues in future.
  • More clarity is needed about the data. E.g. the source of the historical weather data (Met Éireann synoptic weather station) is given, but no further information, e.g. a key. Perhaps a clear download link including metadata etc?
  • Clear description of quality control justifications.
  • Multiple versions of Rmd files on Github - why?
  • Overall good, but improvement possible.

Paper: A multiscale Bayesian inference approach to analyzing subdiffusion in particle trajectories

Reviewers: Nick Rhodes (Linux/Debian)

Discussion notes

  • Code, data and results are packaged together using ActivePapers in a single file, made easily accessible on Zenodo.
  • No instructions provided on how to install the software dependencies required to run the provided code.
  • No reference to the versions of the software dependencies required.
  • By referencing the ActivePapers website I was able to install activepapers via conda and libraries required. Note ActivePapers provides a non-conda install method (for users of Linux).
  • No readme provided within the ActivePapers file, but some clues given in a report.md file.
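
One lightweight way to address the missing version information is to record the versions of the installed dependencies alongside the results. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper names `installed_versions` and `as_requirements` are hypothetical, and the distribution names in the example list are illustrative assumptions rather than a confirmed dependency list for this paper.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(distributions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


def as_requirements(versions: Dict[str, Optional[str]]) -> str:
    """Format the installed entries as pinned 'name==version' lines."""
    lines = [
        f"{name}=={version}"
        for name, version in sorted(versions.items())
        if version is not None
    ]
    return "\n".join(lines)


# Illustrative names only; replace with the project's real dependencies.
print(as_requirements(installed_versions(["numpy", "h5py", "matplotlib"])))
```

Saving this output next to the archived results gives later reviewers a concrete starting point when the current releases no longer work.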

Paper: <Ecological Network assembly: how the regional metaweb influences local food webs>

Reviewers: Euan McDonnell (Linux/Debian)

Discussion notes

Paper: <Algorithm configuration data mining for CMA evolution strategies>

Reviewers: Stefano Maffei, Nujcharee Haswell (Ped)

Discussion notes (Python)

  • Repo is Binder-compatible
  • Repo has a requirements.txt listing the package dependencies and versions required to test / deploy
  • Successfully produced the first diagram; however, block 4 of the notebook took too long to run (10+ mins), so we stopped.

Discussion notes

  • Target platforms - Joanna Leng mentioned that the code appeared to be designed to run on HPC - this information would be very helpful to include.

Group notes


Liverpool

Reproduction Log

Log details of the papers reproduced and reviewers (see template in Agenda). You can add links to any derived materials produced here.