_N8 CIR Northern Tour_ ReproHack Series Notes
===

###### tags: `Templates` `Reprohack` `hackpad`

# Plan of Action

## Welcome and Introduction

- Intro slides

:computer: Form teams
---

Feel free to tackle papers individually or as teams.

:dart: Select papers
---

- Choose a paper from the list of proposed papers.
- Register the selected paper and the participants reproducing it below. You can copy, paste and edit the following template:

```
### **Paper:** <Title of the paper reproduced>
**Reviewers:** Reviewer 1, Reviewer 2 etc.
```

:books: Reproduce
---

- Attempt to reproduce papers from the available materials and documentation.
- Make notes about your experiences, in particular with respect to how easy it is to:
    - :earth_africa: navigate the materials
    - :repeat: reproduce the analysis
    - :recycle: reuse the materials

:memo: Feedback to authors
---

* Fill in the author feedback form, documenting your experiences reproducing your chosen paper.

---

# Event Hackpad

# Newcastle

:::info
- **Location:** _Newcastle University_
- **Date:** _2020-01-21_
- **Participants:**
    - Alison Clarke (Durham University / @alisonrclarke)
    - Jonathan Frawley (Durham University)
    - Marion Weinzierl (Durham University)
    - Juan Ojeda (Newcastle University)
    - Victoria Kurushina (Newcastle University)
    - Helen Clare (Jisc / @helenclare)
:::

## Reproduction Log

Log details of the papers reproduced and reviewers (see template in Plan of Action). You can add links to any derived materials produced here.

### **Paper:** Spectral measure of color variation of black-orange-black (BOB) pattern in small parasitoid wasps (Hymenoptera: Scelionidae), a statistical approach

**Reviewers:** Juan Ojeda

### Discussion notes

Easy to reproduce. I downloaded the GitHub repository and ran the code in RStudio without major problems. The only issue was having to create a folder called "Figures" manually: without it, the figures were not saved. The figures were identical to the ones in the paper.
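A script can create its own output folder rather than relying on the user to do it. The repository reviewed above is written in R; purely as an illustration of the idea, here is a minimal Python sketch in which `ensure_output_dir` is a hypothetical helper and "Figures" is the folder name the scripts expected.

```python
from pathlib import Path


def ensure_output_dir(name="Figures"):
    """Create the output folder (and any missing parents) if it does not exist.

    exist_ok=True means repeated runs do not fail because the folder is
    already there.
    """
    out_dir = Path(name)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling it once at the top of the analysis, before any figure is saved, would let the first run succeed on a fresh download of the repository.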
### **Paper:** Social-evaluative threat: Stress response stages and influences of biological sex and neuroticism

**Reviewers:** Marion Weinzierl

### Discussion notes

* I got two plots that matched those in the paper. An additional plot and lots of .xlsx files were also produced, but I could not easily match them up with tables or plots in the paper.
* The code gave lots of 'object not found' errors, which I could not resolve.
* The code had very good in-code documentation (comments), and a spreadsheet that explained the types of data. However, there was no README or other documentation, and the 'Steps to reproduce' description was extremely sparse.
* The code was one very long script, structured in sections. Better structuring, using functions and splitting the code into several files, might have made it easier to read. More descriptive variable names would also help (rather than assuming that abbreviations are understood by someone else).

### **Paper:** PREPRINT: Using digital epidemiology methods to monitor influenza-like illness in the Netherlands in real-time: the 2017-2018 season

**Reviewers:** Alison Clarke, Helen Clare

#### Discussion notes

* It would be useful to give a link in the README to download RStudio, for new R users.
* It would also be good to say which files to run and in what order, e.g. I tried running the manuscript first, but that didn't work without first running the source code file.
* "CAVE: Please Check - gtrendsR developer version required!" was output partway through running the code.
    - It might be more obvious to mention this in the README, as I missed it the first time I ran the code.
    - It would be better to give a date or commit hash for the development version used, as the current development version will change over time. Describing the problem with the current version would also be useful, as then we could check the release notes to see whether it's fixed in a later version.
### **Paper:** A multiscale Bayesian inference approach to analyzing subdiffusion in particle trajectories (K. Hinsen and G.R. Kneller, J. Chem. Phys. 145, 151101 (2016))

**Reviewers:** Victoria Kurushina, Jonathan Frawley, Juan Ojeda

### Discussion notes

Lots of issues setting this up:

- The instructions are generally very vague: they just point to the "ActivePapers" tool and do not explain how to use it to run the code, etc.
- Requires the "ActivePapers" tool, which does not seem to be well supported: its documentation has dead links.
- Does not specify which version of ActivePapers to install (there are at least three: Python 2, Python 3, JVM). I worked out by process of elimination that the Python 2 version was needed.
- Some dependencies of ActivePapers were missing: h5py, matplotlib, numpy.
- Requires Python 2, which has now reached end of life.
- No Windows installation instructions, and we could not get it to work.
- Even once we ran this and got access to the data and code, the code did not work with the latest version of ActivePapers.

### **Paper:** Evaluation of the ‘Irish Rules’: The Potato Late Blight Forecasting Model and Its Operational Use in the Republic of Ireland, by Mladen Cucak, Adam Sparks, Rafael de Andrade Moral, Stephen Kildea, Keith Lambkin and Rowan Fealy

**Reviewers:** Jonathan Frawley

### Discussion notes

- No documentation on how to run it.
- Installing the dependencies takes a long time (over half an hour!).
- Some errors when running from a clean install needed to be fixed by qualifying functions with their package: `bind_rows -> dplyr::bind_rows`, `rename -> dplyr::rename`, `add_column -> tibble::add_column`.
- Some system-level packages are required (on Fedora 31): `sudo dnf install libxml2 openssl-devel R-curl`
- I was confused by the suffix `(1)` on some file names: does this indicate a newer version of the code?
- But! In the end, I was able to reproduce a lot of the plots in the paper!

#### Feedback on the Feedback form

- The introductory slide content and the feedback form are currently under active development.
  They will form the basis of the Reproducibility Review form in the new Hub and of the Reproducibility report to be published in ReScience C. The plans and current thinking are outlined in this [issue](https://github.com/reprohack/reprohack_site/issues/3).

**How could they be improved? Feel free to add suggestions** :point_down:

- Having more time, e.g. finishing at 17:00.
- Making an exhaustive list of the software (with versions!) needed to reproduce ALL the papers and sending it to participants _in advance_, so that they can install everything before the session and start directly on the reproduction at the workshop.
- Starting the workshop session with an easy paper and reproducing its results together with the instructor, with step-by-step guidance.

#### HPC ReproHack Special Edition

- Try and discuss at CW2020

#### Ten Year Reproducibility Challenge

- https://github.com/ReScience/ten-years
- Workshop in Bordeaux, 22nd June 2020

#### Example discussion topic 1: Remote ReproHacks

- Can we make this happen?
- What technologies would we need?
- What would we lose compared to the in-person experience? Could we make up for it?

#### Have Python / Django skills or just want to get involved?

Check out the open issues in our new hub, currently under development: https://github.com/reprohack/reprohack_site

### Group notes

***

# Leeds

:::info
- **Location:** _University of Leeds_
- **Date:** _2020-02-14_
- **Participants:**
    - Anna Krystalli (University of Sheffield, @annakrystalli)
    - Isabel Birds (University of Leeds, @IzBirds)
    - Haroldas Bagdonas (University of York, @GABRAHREX)
    - Nick Sheppard (University of Leeds, @OpenResLeeds)
    - Joanna Leng (University of Leeds)
    - Alistair Curd (University of Leeds)
    - Nick Rhodes (University of Leeds)
    - Nujcharee Haswell
:::

## Reproduction Log

Log details of the papers reproduced and reviewers (see template in Plan of Action). You can add links to any derived materials produced here.
### **Paper:** Comparisons of Citizen Science Data-Gathering Approaches to Evaluate Urban Butterfly Diversity

**Reviewers:** Nick Sheppard, Graham Blyth

- Data and code clearly available as Supplementary Information (GitHub and Zenodo).
- No description of, or link back to, the paper in Zenodo.
- Succinct README outlining the archive contents (data and scripts).
- First barrier (as a novice, new to R): I can't install R on a UoL machine without admin rights. A colleague suggested Anaconda, which can be installed and run without admin rights and includes RStudio, so I tried that...
- R project created by pointing at the top-level folder.
- Tried to run Species-table: the packages needed were ggplot2, tidyverse, tidyr and tinyTex (two stages).
- Able to reproduce one of the graphs in the paper.

### **Paper:** Quantitative analysis of spectroscopic Low Energy Electron Microscopy data: High-dynamic range imaging, drift correction and cluster analysis

**Reviewers:** Haroldas Bagdonas (**MacOS**), Joanna Leng (**Windows 10**)

### Discussion notes

General:

- In the GitHub README.md, we think the authors should consider providing hardware information, giving as much detail as possible about the hardware configurations that successfully ran the Jupyter notebooks.
    - Also consider stating recommended and minimum hardware requirements: running these computations on consumer-grade hardware may be impractical.

MacOS:

- All the code is well formulated and written in Jupyter notebooks, which is a significant benefit to reproducibility.
- The package requirements list is present. Huge benefit.
- In general, the use of Jupyter notebooks, with everything in one place, is excellent. Perfect for reproducibility.
    - Notebook 0: NetCDF4 fails at runtime: "RuntimeError: NetCDF: HDF error"
    - Notebook 1: Works as intended.
    - Notebook 2: Works as intended. Long wait for the computations to finish on a consumer laptop.
    - Notebook 3: Works as intended.
    - The remaining notebooks were assumed to work as intended.
- The code should organise its output files better. Currently it saves all files in the current directory, alongside the source .ipynb files, which can get difficult to navigate once all the notebooks have finished running.
- In conclusion, and compared within the context, the authors of this paper did a good job of making their analysis relatively reproducible. The use of Jupyter notebooks is the main factor in this.

Windows:

- The **jupytext** dependency was not available on the user's Windows 10 machine, so the Jupyter environment could not be initialized.

### **Paper:** Cell Contractility Facilitates Alignment of Cells and Tissues to Static Uniaxial Stretch

#### (C and MATLAB)

**Reviewers:** Alistair Curd

* The documentation consists of instructions to install C compilation software, and an initial command to type.
* I installed the recommended C compilation software through Miniconda, avoiding problems with administrator privileges.
* This software failed to compile, with errors, on my Windows 10 system.

### **Paper:** Sea level regulated tetrapod diversity dynamics through the Jurassic/Cretaceous interval

**Reviewers:** Isabel Birds (**MacOS**)

* Methods dense and unclear.
* The "raw" data in the supplementary material had been manipulated.
* Could not replicate the actual raw data from PaleoDB: some detail missing?
* Gave up.

### **Paper:** Where should new parkrun events be located? Modelling the potential impact of 200 new events on socio-economic inequalities in access and participation

**Reviewers:** Stefano Maffei, Nujcharee Haswell (Ped)

#### Discussion notes

* The Shiny app is very good and easy to navigate. Successfully ran it locally.
* Suggestion: an alternative to the Red/Yellow/Green color legend. Not sure this is accessible?
* The individual R files include some comments on how to run each script; however, it would be beneficial to have all the instructions in README.md.
* Successfully reproduced the p_imd_dist_runs plot as per the paper.
* No documentation of package dependencies.
* It would be useful for the vignette to give details of the packages and versions with which it last ran successfully.
* src2_data_proc.R: the england_sp object is commented out but referred to elsewhere.
* Line 144: error in srvd_lsoa when merging data frames, as the numbers of rows didn't match.

### **Paper:** Multiscale Bayesian inference approach to analyzing subdiffusion in particle trajectories

**Reviewers:** Alistair Curd

#### (Python)

* In ActivePapers format. The ActivePapers Python tool did not install as the documentation says it should. Attempting to access the README through h5py was also unsuccessful.

### **Paper:** Bayesian determination of the effect of a deep eutectic solvent on the structure of lipid monolayers

**Reviewers:** Haroldas Bagdonas (**MacOS**)

### Discussion notes

- Can't set up the conda environment on MacOS Mojave (**v10.14**). Some dependencies are impossible to compile.
    - I assume the author ran this analysis on Linux, as one of the packages requires gcc to compile. Apple MacOS uses clang to compile C/C++ code.
- Otherwise, I am pretty certain that everything would work on my Linux machine.
    - The README.md and Jupyter notebooks look relatively complete and offer enough information for users to find solutions themselves if specific dependencies fail.
- Didn't manage to go through the reproduction in detail, due to failing to set up the conda environment properly.

### **Paper:** Evaluation of the ‘Irish Rules’: The Potato Late Blight Forecasting Model and Its Operational Use in the Republic of Ireland

**Reviewers:** Isabel Birds (**MacOS**)

### Discussion notes

* Straightforward to run in R Markdown, but not obvious to a new user, e.g. how to access the data.
* A comprehensive list of the R packages used is given, but not their version numbers: this could cause issues in future.
* More clarity is needed about the data. E.g. the source of the historical weather data (Met Éireann synoptic weather station) is given, but no further information, such as a key. Perhaps a clear download link, including metadata etc.?
* Clear description of the quality control justifications.
* Multiple versions of the Rmd files on GitHub: why?
* Overall good, but improvement possible.

### **Paper:** A multiscale Bayesian inference approach to analyzing subdiffusion in particle trajectories

**Reviewers:** Nick Rhodes (**Linux/Debian**)

### Discussion notes

* Code, data and results packaged together using ActivePapers in a single file, made easily accessible on Zenodo.
* No instructions on how to install the software dependencies required to run the provided code.
* No reference to the versions of the software dependencies required.
* By consulting the ActivePapers website, I was able to install ActivePapers and the required libraries via conda. Note that ActivePapers also provides a non-conda install method (for Linux users).
* No README provided within the ActivePapers file, but some clues are given in a report.md file.

### **Paper:** Ecological Network assembly: how the regional meta web influence local food webs

**Reviewers:** Euan McDonnell (**Linux/Debian**)

### Discussion notes

* Very comprehensive methods; provides the R version (3.4.3) and the igraph package version (1.1.2).
* Even provides a comprehensive GitHub repository with source code and data: https://github.com/lsaravia/MetawebsAssembly
* Some scripts don't work properly. The authors wrote their own functions/package, but in https://github.com/lsaravia/MetawebsAssembly/blob/master/MetaWebAssemblyModelAnalysis.Rmd their own parameters/arguments at line 61 (`AA <- metaWebNetAssembly(A,0.05,1,0.2,tf)`) don't work.
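Several reviews above note missing version information (R package versions for the Irish Rules code, dependency versions for the ActivePapers bundle), while the metaweb paper is praised for stating its R and igraph versions. For Python projects, one way to capture this automatically is sketched below. It assumes Python 3.8+ (for `importlib.metadata`), so it would not run under the Python 2 toolchain ActivePapers needed; the function names are hypothetical.

```python
import json
import platform
from importlib import metadata


def environment_report(packages):
    """Collect the interpreter version and installed versions of `packages`.

    Packages that are not installed are recorded as None instead of raising,
    so the report still lists everything that was expected.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def write_environment_report(packages, path="environment.json"):
    """Write the report as JSON so it can be committed next to the results."""
    report = environment_report(packages)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)
    return report
```

For example, `write_environment_report(["numpy", "h5py", "matplotlib"])` (an illustrative package list) run at the end of an analysis would leave a record of the versions that actually produced the results.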
### **Paper:** Algorithm configuration data mining for CMA evolution strategies

**Reviewers:** Stefano Maffei, Nujcharee Haswell (Ped)

### Discussion notes (Python)

* The repository is Binder compatible.
* The repository has a requirements.txt listing the package dependencies and versions required to test / deploy.
* Successfully ran the first diagram; however, the rest of the notebook took too long to run (10+ mins for block 4), so we stopped.

---

### Discussion notes

* Target platforms: Joanna Leng mentioned that the code appeared to be designed to run on HPC. This information would be very helpful to include.

### Group notes

***

# Liverpool

:::info
- **Location:** _University of Liverpool_
- **Date:** _2020-02-25_
- **Participants:**
    - Anna Krystalli (University of Sheffield / @annakrystalli)
    - Fran Biggin (Lancaster Uni - CHICAS / @francesbiggin)
    - [Cai Wingfield](http://caiwingfield.net) ([Lancaster University](http://www.lancaster.ac.uk/) / [Psychology](http://www.lancaster.ac.uk/psychology/) / [Embodied Cognition Lab](https://www.lancaster.ac.uk/staff/connelll/lab/))
    - Matthew Carter (University of Liverpool / mjcarter.co)
    - Kostas Alexandridis (University of Liverpool)
    - Manhui Wang (University of Liverpool)
:::

## Reproduction Log

Log details of the papers reproduced and reviewers (see template in Plan of Action). You can add links to any derived materials produced here.

### **Paper:** Growth Dynamics of Independent Gametophytes of Pleurosoriopsis makinoi (Polypodiaceae)

**Reviewers:** Marie Phelan

### **Paper:** [Don’t Hold My Data Hostage – A Case For Client Protocol Redesign](https://hannes.muehleisen.org/p852-muehleisen.pdf)

**Reviewers:** Cai Wingfield (MacOS Mojave)

- The [code repository](https://github.com/Mytherin/Protocol-Benchmarks) is linked from the paper's PDF, which is very useful. The data deposition is then linked from the code repository.
- The [data](https://zenodo.org/record/1305845) are deposited inside an Ubuntu virtual machine image, available from a location with a DOI, which is nice. This also means dependency management is presumably not an issue.
- Instructions for running the relevant queries, and the locations of the resulting output, are given alongside the image download.
- Requires [VirtualBox](https://www.virtualbox.org/) to be installed, but this is free and open source.
- The combined machine and data image is over 50GB. Waiting for the download to complete...
- Unfortunately, the download keeps failing after 30–40GB, so I haven't been able to access it yet...

### **Paper:** [Supercurrent-induced Majorana bound states in a planar geometry](https://scipost.org/10.21468/SciPostPhys.7.3.039)

**Reviewer:** Cai Wingfield (MacOS Mojave)

- All the files relevant for reproduction are contained in a [single archive](https://zenodo.org/record/2653483) with a DOI, which was nice, but as far as I could see it wasn't referenced or linked in the [paper's PDF](https://scipost.org/SciPostPhys.7.3.039/pdf).
- `README.md` included instructions on how to install the dependencies and which files were relevant to the reproduction.
- `conda` installation following the README instructions worked fine.
- Including a `conda`-readable versioned dependencies file is a *huge* help!
- Using a Jupyter notebook is a really handy way to make reproduction easy and convenient.
- No specific instructions on how to run the Jupyter notebook files.
- `jupyter` wasn't specifically included in the list of dependencies… but I'm using PyCharm, which helps out with this kind of thing, so it was OK.
- Some errors when running the notebook, but they didn't seem to cause any problems with the generated files:
    - "Notebook kernel does not match project kernel" (can be "fixed" in PyCharm).
    - "Notebook is not trusted, JavaScript hasn't been executed".
- Running the notebook produces and saves PDFs of the figures in the paper, but does not reproduce the entire paper itself.
  Some people like the whole paper to be produced by the code, so that a corrected error is automatically incorporated into the generated PDF. (Personally I have no opinion on this.)
- Data are stored in undocumented binary `.p` files, but as this is seemingly a simulation paper, the data files are actually generated by the notebook, which is very neat.
- The first run (loading precomputed data) produced most figures from the paper identically (the figures not produced were just schematics and diagrams).
    - **Yay!**
- The second run (recomputing the data) is taking a long time... but the file does say it requires a cluster to run in a reasonable time, and I'm using a laptop. ;)
    - It didn't finish running in the time we had available, so I couldn't verify whether the included precomputed data files were identical.
    - This also makes including the precomputed files very convenient! :)

### **Paper:** [Growth Dynamics of Independent Gametophytes of Pleurosoriopsis makinoi (Polypodiaceae)](https://www.joelnitta.com/publication/2019-03-27_pleurosoriopsis/)

**Reviewers:** Cai Wingfield (MacOS Mojave)

- I didn't have time to delve into this one too much, but I followed the instructions in the [code repository](https://github.com/joelnitta/pleurosoriopsis) and got a PDF which reproduced the [published one](https://drive.google.com/uc?export=download&id=1N_fgFtD2l279R-ZhCUrT1HTvEt5xZIdM).
    - The code repository was linked from the paper's PDF, which was very nice.
    - The formatting was a little different (the linked PDF had two columns, the generated one had one), but the figures and tables seem identical.
    - **Nice!**
- Really nice to have all the code, data and text in the same repository; this was very straightforward to run once I'd got Docker working, which took a bit of fiddling.
- To follow the instructions in the code repository, [Docker](https://docs.docker.com/install/) was required, which seems like a lot of machinery. However, it worked, even if I didn't understand exactly what I did.
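The Majorana notebook reviewed above ships precomputed data and only regenerates it when asked, which is what made a quick first run possible on a laptop. A minimal Python sketch of that load-or-recompute pattern follows. It assumes the cached results are pickle files (the `.p` extension suggests this, but the files are undocumented), and `load_or_compute` is a hypothetical name, not a function from that repository.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute, recompute=False):
    """Load results from a pickle cache, or compute and cache them.

    compute: a zero-argument callable that performs the expensive work.
    recompute: if True, ignore any existing cache and regenerate it.
    """
    cache_path = Path(cache_path)
    if cache_path.exists() and not recompute:
        # Fast path: reuse the precomputed results shipped with the code.
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    # Slow path: run the full computation, then save it for the next run.
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

A laptop user would get the shipped results immediately, while `recompute=True` on a cluster would regenerate them so the new and shipped files can be compared. Unpickling can execute arbitrary code, so this pattern is only appropriate for files from a source you trust.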
### **Paper:** [Algorithm configuration data mining for CMA evolution strategies](https://dl.acm.org/doi/10.1145/3071178.3071205)

**Reviewers:** Cai Wingfield (MacOS Mojave)

- The paper was not open access, so it could not be downloaded from the specified link... luckily I have institutional access via VPN.
- The paper links to the [code repository](https://github.com/sjvrijn/cma-es-configuration-data-mining), which is very handy!
- I worked out that I needed to use Python 2.7 instead of Python 3.
- The specific `pip install` command given in the instructions produced an error:
  ```
  ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.
  ```
  (this may be me not knowing how to use Python 2 virtual environments properly). Rerunning without `--user` seemed to work OK, with the obvious Python 2.7 warning:
  ```
  DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
  ```
- There is also a [Binder link](http://mybinder.org/repo/energya/cma-es-configuration-data-mining), which is also very handy.
- There were some trivial warnings, e.g.:
  ```
  /home/main/.local/lib/python2.7/site-packages/matplotlib/font_manager.py:1331: UserWarning: findfont: Font family [u'sans-serif'] not found. Falling back to DejaVu Sans
    (prop.get_family(), self.defaultFamily[fontext]))
  ```

### **Paper:** [Bayesian determination of the effect of a deep eutectic solvent on the structure of lipid monolayers](https://pubs.rsc.org/en/content/articlelanding/2019/CP/C9CP00203K#!divAbstract)

**Reviewers:** Matthew Carter (MacOS)

* Data, code and paper are stored in separate repositories, but are easy to find.
* Nice to include all the figures from the report as PDFs.
* The instructions to set up the Conda environment and compile the notebooks were clear.
* Packages installed after fixing an issue with pip.
* Simple to compile and run the code.
* Variables could be labelled more clearly (as someone not familiar with the field, I don't know what 'sp1,2,3' mean).
* Because of the previous point, it wasn't intuitive how to run the Python files manually (i.e. without the makefile).
* Could include one makefile to build the project from start to finish, and another that only compiles the LaTeX code. This would make it easy to update the report when running the code manually.
* Good to include a small dataset as well as the full dataset (note: the 'small' dataset still takes ages to run; is an even smaller one needed?).
* Could not verify whether the results were reproducible, due to the runtime of the code.

### **Paper:** [Communication: A multiscale Bayesian inference approach to analyzing subdiffusion in particle trajectories](https://aip.scitation.org/doi/10.1063/1.4965881)

**Reviewers:** Matthew Carter (MacOS)

* No instructions on how to install the required packages etc. and get the code to run.
* Found 'ActivePapers' (AP) extremely unintuitive to work with; I would have preferred to have the code, data etc. on something like GitHub.
* Attempted to extract the data and code from AP; however, the extracted data files were blank.
* The Python version used should be stated: AP isn't compatible between Python 2 and 3, which makes it hard to reproduce the results.
* Unfortunately, I couldn't get AP to work, so I couldn't verify the reproducibility of the code etc.

### **Paper:** [Bayesian determination of the effect of a deep eutectic solvent on the structure of lipid monolayers](https://pubs.rsc.org/en/content/articlelanding/2019/CP/C9CP00203K#!divAbstract)

**Reviewers:** Manhui Wang (Linux)

- Generally good notes to follow.
- The dependency packages are clearly listed. It would be good to note the main Python version and Anaconda version.
- Installing with pip can be very tricky if you don't have sudo permission (`pip install --user` can install the dependency packages in the user's home directory).
- Running the full analysis may take many hours. It would be good to provide a smaller analysis to reproduce.
- Generating the PDF file on Linux requires some additional dependency packages.

### **Paper:** Where should new parkrun events be located? Modelling the potential impact of 200 new events on geographical and socioeconomic inequalities in access and participation.

**Reviewers:** Fran Biggin (MacOS)

- General comment: really nice Shiny app. If viewing it before reading the paper, it would be nice to have some explanation of the variables.
- Everything needed to obtain the data and run the code is available on GitHub, but it would be nice to have some brief details in the README on how to download and run the code.
- The file structure within the src folder is nice and intuitive.
- The R scripts are well commented, and I particularly like the fact that links to the open-source data are included in the comments.
- Some problems in src2_data_proc.R: line 102 is commented out, so subsequent lines don't run; lines 149 and 150 produce error messages; line 167 produces warnings about missing variables.
- Minor point: rgdal is a required package, but isn't included in the install_n_load function.
- Some of the summary statistics differ from those reported in the paper, e.g. the mean distance from an LSOA to the nearest parkrun is 4.74 compared to 4.66. I think this relates back to the lines that could not be run in src2_data_proc.
- As you progress through the analysis script, it gets harder to relate the code and output to the relevant parts of the paper, so it's difficult to judge whether the reproduced statistics match the paper. A list of variable names, with their descriptions and locations in the paper (e.g. section/table number), might be useful as a comment at the start of each chunk of analysis.
- To create the map figure you need a Google Maps API key.

### **Paper:** Quantitative analysis of spectroscopic Low Energy Electron Microscopy data: High-dynamic range imaging, drift correction and cluster analysis

**Reviewers:** Kostas Alexandridis (Linux)

* Some minor issues installing the packages (e.g. I had to add a channel in Anaconda).
* The data link was not working, so I had to download all the files manually from the website. It was not very straightforward, but it was OK.
* I liked the fact that each section of the paper corresponds to a specific Jupyter notebook, so the user can understand what the code is doing at each step.
* A potential improvement could be a general makefile that compiles the whole project and generates all the results.
* I had to install some extra packages, such as h5py.
* I couldn't run the accuracy_testing notebook in the Opera browser, as it gave me a memory error; when I switched to Chromium it worked, but it was very slow.
* The PCA component graphs were almost identical. I am not an expert to judge, but all the graphs seemed the same as in the paper.
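Discovering missing packages such as h5py one error at a time, as in the review above, can be avoided by checking imports before any long computation starts. A minimal Python sketch follows; `missing_modules` and `require_modules` are hypothetical helpers, and the module list in the usage note is an assumption to be replaced by a project's own.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found here.

    find_spec() locates a module without importing it, so the check is
    cheap for top-level names.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def require_modules(module_names):
    """Stop early with a single message listing every missing module."""
    missing = missing_modules(module_names)
    if missing:
        raise ImportError("Missing required modules: " + ", ".join(missing))
```

Placing `require_modules(["numpy", "h5py", "netCDF4"])` in the first cell of the first notebook would report every missing module at once. Import names can differ from the names used with pip or conda (scikit-learn is imported as `sklearn`, for instance), so the list should use import names.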
### Discussion Notes

### Resources:

#### Binder

- [Sample Binder Repositories](https://mybinder.readthedocs.io/en/latest/sample_repos.html)
- [Boost Your Research Reproducibility with Binder](https://github.com/alan-turing-institute/the-turing-way/tree/master/workshops/boost-research-reproducibility-binder): Turing Way workshop materials

#### `rrtools`

- [`rrtools` package](https://github.com/benmarwick/rrtools)
- [Reproducible Research in R with rrtools](https://annakrystalli.me/rrtools-repro-research/): workshop materials

***

#### Cautionary tale:

- [Characterization of Leptazolines A-D, Polar Oxazolines from the Cyanobacterium Leptolyngbya sp., Reveals a Glitch with the "Willoughby-Hoye" Scripts for Calculating NMR Chemical Shifts.](https://www.ncbi.nlm.nih.gov/pubmed/31591889): see [table 2](https://pubs.acs.org/doi/10.1021/acs.orglett.9b03216#tbl2) for the differences in calculated values on Mac, Windows & Linux

# Sheffield

:::info
- **Location:** _University of Sheffield_
- **Date:** _2020-03-10_
- **Participants:**
    - Florencia D'Andrea (INTA, Argentina / @cantoflor_87)
    - David Wilby (University of Sheffield / @DrDavidWilby)
    - Robert (Bob) Turner (University of Sheffield / @bobatron)
    - Anna Krystalli (University of Sheffield / @annakrystalli)
    - Simon Rolph (University of Sheffield / @simon_rolph)
:::

## Reproduction Log

Log details of the papers reproduced and reviewers (see template in Plan of Action). You can add links to any derived materials produced here.

### **Paper:** Population structure and phenotypic variation of Sclerotinia sclerotiorum from dry bean (Phaseolus vulgaris) in the United States

**Reviewers:** Florencia D'Andrea, Juan Ojeda

Some problems installing Docker and running RStudio Server. The Rmd was not straightforward to use.
There were too many scripts to reproduce the figures from the paper, and most of the scripts produced multiple tables and figures not present in the paper.

### **Paper:** [Explicit (but not implicit) environmentalist identity predicts pro-environmental behavior and policy preferences](https://www.sciencedirect.com/science/article/pii/S0272494418300549?via%3Dihub)

**Reviewers:** Jim Uttley

- Not very clear where to start the reproduction: presumably some parts rely on completion of other parts, e.g. data cleaning. A README giving an overview of how to reproduce the work would be useful.
- Unable to reproduce the SPSS parts, as my SPSS licence had run out.
- Only looked at reproducing the meta-analyses and SEMs (the parts done in R), as I do not use Stata.
- Was it not possible to do all the data cleaning and analysis in one piece of software (e.g. R)?
- Error messages when trying to access the ‘mvmeta’, ‘igraph’ and ‘psych’ packages (not included in the ‘wants’ variable). I installed these three packages manually.
- When reading in the ‘merged’ dataset, got a warning about 37633 parsing failures: not sure whether this is a problem?
- The plots reproduce well.
- Variable names are re-used in the script, e.g. ‘model’. This is fine when running the script all in one go, but running different chunks at different times caused a few confusing issues.
- Not very explicit which bits of code and analysis relate to which statistical results. I am not particularly familiar with the statistical methods used, though, so perhaps this would be more obvious to someone who was.
- The results appear to be reproducible, but I had difficulty identifying which statistics reported in the paper relate to which bits of the analysis and code. For ease and transparency of reproduction, the link between the code and what is reported in the paper could be made more explicit.
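Both the environmentalist-identity review above and the parkrun review earlier ask for a clearer link between the code and the numbers reported in the paper. One lightweight option is to record each reproduced statistic next to the paper's value and its location in the paper. A minimal Python sketch; the function names, the keys and the 1% default tolerance are illustrative assumptions.

```python
import json
import math


def check_against_paper(name, reproduced, reported, paper_location, rel_tol=0.01):
    """Record a reproduced statistic next to the value reported in the paper.

    `matches` is True when the two agree within the relative tolerance.
    """
    return {
        "statistic": name,
        "paper_location": paper_location,
        "reproduced": reproduced,
        "reported": reported,
        "matches": math.isclose(reproduced, reported, rel_tol=rel_tol),
    }


def write_comparison(rows, path):
    """Save all comparisons as JSON and return the rows that do not match."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2)
    return [row for row in rows if not row["matches"]]
```

With this in place, the parkrun discrepancy noted earlier (a mean LSOA-to-parkrun distance of 4.74 reproduced against 4.66 reported) would be flagged at the 1% tolerance, since the difference is about 1.7%, and `paper_location` would tell the reader where in the paper to look.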
### **Paper:** [Spatial modelling of rice yield losses in Tanzania due to bacterial leaf blight and leaf blast in a changing climate](https://link.springer.com/content/pdf/10.1007/s10584-015-1580-2.pdf)

**Reviewers:** David Wilby

##### Access

The GitHub repo was straightforward to access. Updating its README to explain which code to run, and how, would make the work quicker to reproduce. The README also refers to a data directory, which seems to be created by running one of the R scripts.

##### Installation

The code depends on Windows and on ArcGIS, which is proprietary software. It uses `arcpy`, the Python API for ArcGIS. It would be interesting to port it to `QGIS` and `pyqgis` as an open-source alternative. Installing `cropsim` via R-Forge was difficult.

##### Data

The data was easy to obtain once the correct R script was run, but this step could be better documented. The README in the GitHub repo refers to a data directory, but that directory isn't present.

##### Documentation

The repo does not document how to run the code, or in what order, to reproduce the results. For instance, the first R script requires the `cropsim` package. A link to the package on R-Forge is provided, but working out how to install it was a protracted process, because Subversion was required. A walkthrough-style guide to running the code in order would significantly improve the reproducibility of the analysis.

##### Analysis

Table 1 was reproduced exactly.

Running `03_Figures_2_and_3.R` resulted in the error `Error in value[j, ] : incorrect number of dimensions` at line 36.

Running `04_Figures_4_5_6_and_7.R` resulted in the error `Error in loadNamespace(name) : there is no package called ‘mapproj’`.

###### Versions etc.
- Windows 10 running in a virtual machine
- R version 3.6.3 running in RStudio 1.2.5033
- ArcGIS Desktop 10.7.1

### **Paper:** [Comparisons of Citizen Science Data-Gathering Approaches to Evaluate Urban Butterfly Diversity](https://doi.org/10.3390/insects9040186)

**Reviewer:** Simon Rolph

The DOI is attached to v0.9.1 of the code, but v0.9.2 is also available. I meant to work with v0.9.1 but accidentally used the master branch. I am unsure how much has changed between them.

**Access**

The paper is open access. The code is on Zenodo, with a link to the GitHub repo. It can be downloaded directly from Zenodo, but I cloned it from GitHub. The README includes a few sentences on how the repo is structured. The repo also contains PDF documents that describe the analyses and include extra figures not in the original paper.

**Installation**

The required R packages are loaded in the scripts with `library(package)`, with no indication of the versions used for this analysis.

**Data**

The data is clearly separated from the code but not very well documented. It is stored as CSV files, but there is also a bash script for downloading iNaturalist data via a GBIF DOI. This is good because it avoids storing a big file on GitHub, but the README doesn't mention it. To get this file, you must follow the DOI link, download the file into a new folder `data/gbif` and run the bash script. I worked out where to put the file from the error message, so clearer instructions would have been useful.

You need to read the paper to understand the data. The analysis compares three data sources, and each source has several versions of the data, e.g. `BioScanData.csv` and `BioScanDataComplete.csv`. Some column names are not documented, e.g. `opler-wright-min` and `opler-wright-max`.

**Documentation**

It is not clear which script to run first. The repo includes instructions to create an output folder. How do the Rmd files relate to the scripts?
Once you open the Rmd files, it is fairly clear what everything does, which is OK for a project of this size and complexity. For a more complex project, an overview README would be very useful, giving the order in which to run the scripts and what each one does.

**Analysis**

`gbif-additional-processing.R` tries to load a file that doesn't exist. This file has to be downloaded from GBIF and then processed with the included bash script, but nothing indicates that this step is required.

I tried to use `sites-plot.R` to draw the map in the paper, but it gave an error: `Error in scalebar(data = latlongs.scale, dist = 5, location = "bottomleft", : transform should be logical.` I assume this is due to an update to the `ggsn` package. Listing package version numbers would make this more reproducible and help guard against future breaking changes.

Plotting the map without the `scalebar` function gave the error `there is no package called ‘mapproj’`. After I installed that package, it worked fine. Nothing indicates that `mapproj` needs to be installed. Other undeclared packages may also be required that I happened to have installed already.

Two figures in the paper could be reproduced. The R Markdown documents are nicely put together and clear.

**Summary**

The analysis was reproducible: all the stats and figures (and more) could be reproduced. However, the overall workflow is not immediately clear to the user, and a more extensive README would have been useful. Listing package versions and dependencies should prevent some bugs. Generally good! 8/10

### Discussion notes

Rob: https://github.com/lsaravia/MetawebsAssembly

Progress:

Starting from a clean R install, I had to install various packages. `doParallel` had to be installed manually and isn't listed anywhere (that I've found). Chunks have to be run individually, because the document can't be knitted (this is noted at the top of the Rmd).
I think it was failing to open the PNG device because the `Figures` directory did not exist. Each chunk takes ~40 minutes, though, so it will be a while before I can see whether creating that directory has helped.

> Error in png("Figures/RegionalFoodWeb.png", width = 6, height = 6, units = "in", :
> unable to start png() device
> In addition: Warning messages:
> 1: In png("Figures/RegionalFoodWeb.png", width = 6, height = 6, units = "in", :
> unable to open file 'Figures/RegionalFoodWeb.png' for writing
> 2: In png("Figures/RegionalFoodWeb.png", width = 6, height = 6, units = "in", :
> opening device failed

### Group notes

[Rmarkdown: literate-programming](https://github.com/annakrystalli/literate-programming): workshop materials

[Literate programming with rmarkdown](https://github.com/annakrystalli/lit-prog): Sheffield R Users Group talk materials

***

# Manchester

:::info
- **Location:** _University of Manchester_
- **Date:** _2020-03-12_
- **Link to Materials:** https://github.com/annakrystalli/n8cir-reprohacks
- **Participants:**
    - Anna Krystalli (University of Sheffield / @annakrystalli)
    - Daniela Gawehns (Leiden University / @dgawehns)
    - Linda Nab (Leiden University Medical Center / @lindanab1)
:::

## Breakout Rooms and Papers

- **Paper Awesome** Room 1
    - Reviewer one
    - Reviewer two
- **Paper Awesome2** Room 2

## Reproduction Log

Log details of the papers reproduced and reviewers (see template in Agenda). You can add links to any derived materials produced here.

### **Paper:** <Title of the paper reproduced>

**Reviewers:** Reviewer 1, Reviewer 2 etc.

### **Paper:** 6. Resolving the Measurement Uncertainty Paradox in Ecological Management

**Reviewers:** Anna Krystalli

## Discussion notes

### First Talk

### Second Talk

### Third Talk

### Group notes

***
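Two reviews earlier in these notes had trouble saving figures because a `Figures` output folder did not exist. These were the Newcastle reproduction of the BOB-pattern wasp paper and Rob's Sheffield notes on `MetawebsAssembly` (where this was the suspected cause). The base-R sketch below shows one possible fix, assuming figures are written with `png()`: create the file's parent directory before opening the device. The helper `save_png` is hypothetical. The output path mirrors the error message in Rob's notes, and the plot is a placeholder rather than the paper's food-web figure.

```r
# Hypothetical helper (not from any reviewed repository): open a PNG device
# only after making sure the output file's parent directory exists.
save_png <- function(path, plot_fun, width = 6, height = 6,
                     units = "in", res = 150) {
  # recursive = TRUE creates any missing parent folders;
  # showWarnings = FALSE keeps it quiet when the folder already exists
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  png(path, width = width, height = height, units = units, res = res)
  on.exit(dev.off(), add = TRUE)  # close the device even if plotting fails
  plot_fun()
  invisible(path)
}

# Usage: the plot is a stand-in for the real figure code
save_png("Figures/RegionalFoodWeb.png", function() plot(1:10))
```

Registering `dev.off()` with `on.exit()` closes the device even if the plotting code errors. A long-running chunk that fails partway through therefore does not leave an open graphics device behind.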

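Several reviewers suggested that listing package versions would make the analyses more reproducible. Examples include Simon Rolph's summary and the `ggsn` `scalebar` error in his notes. The base-R sketch below records the R version and the installed version of each declared package in a text file, which could be committed alongside the code. The helper `record_versions` and the file name `package-versions.txt` are hypothetical.

```r
# Hypothetical helper (not from any reviewed repository): write the R version
# and the installed version of each listed package to a text file.
record_versions <- function(pkgs, file = "package-versions.txt") {
  # packageVersion() errors if a package is not installed, which also
  # surfaces undeclared or missing dependencies early
  versions <- vapply(pkgs,
                     function(p) as.character(utils::packageVersion(p)),
                     character(1))
  lines <- c(R.version.string, paste0(pkgs, " ", versions))
  writeLines(lines, file)
  invisible(lines)
}

# Usage: list every package the scripts load with library()
record_versions(c("stats", "utils"))
```

Base R's `sessionInfo()` gives a fuller picture, covering the platform and the attached and loaded packages. `writeLines(capture.output(sessionInfo()), "sessionInfo.txt")` saves that output instead.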