nuest (@nuest)

Joined on Aug 29, 2019

  • Paper: private repo: https://github.com/LEMTideman/Spatial_IMS_private; preprint almost out; part of the data is mouse data.
    CODECHECKer notes:
    - look at repo > README
    - pip + requirements.txt does not work out of the box; after talking with the author, install the latest versions of the packages manually, piece by piece
    - can run Rat_braun_demo_notebook.ipynb; ~800 tasks need to complete, might take about half an hour on my machine:
      [..] [Parallel(n_jobs=52)]: Done 809 out of 809 | elapsed: 21.4min finished
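The progress line quoted above is in joblib's output format. Below is a minimal sketch of the kind of call that produces it, assuming the notebook parallelizes its ~800 tasks with joblib.Parallel; the task function is purely hypothetical and stands in for whatever the notebook actually computes.

```python
# Hypothetical sketch: with verbose > 0, joblib prints progress lines such as
# "[Parallel(n_jobs=52)]: Done 809 out of 809 | elapsed: 21.4min finished".
from joblib import Parallel, delayed

def process_task(i):
    # placeholder for one of the notebook's ~800 per-task computations
    return i * i

results = Parallel(n_jobs=52, verbose=5)(
    delayed(process_task)(i) for i in range(809)
)
```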
  • BUM-SCIENCE, BUMS <-- maybe not ideal :-)
    Rationale: We as individuals are really bad at learning from other people's mistakes, and so are scientific communities. It is therefore, unfortunately, warranted that the state of reproducibility and robustness be evaluated for urban mobility research, so that the community understands the implications, needs, and shortcomings of the transparency and reusability of its works. Focusing on reproducibility and robustness means that we do not need to collect new data. A possible way to reproduce and check existing publications is through BSc and MSc theses, which can draw from a shared list of candidate papers to be reproduced or robustness-checked. Based on the results of the individual theses, a manuscript can be prepared that includes recommendations for how to improve the openness of urban mobility research.
  • Introduction
Many scientific articles report on results based on computations, e.g., a statistical analysis implemented in R. Publishing the used source code and data to adhere to open reproducible research (ORR) principles (i.e., public access to code and data underlying the reported results [@stodden2016enhancing]) seems simple. However, several studies concluded that papers rarely link to these materials [@stagge2019assessing; @nust2018reproducible]. Moreover, due to technical challenges, e.g., capturing the original computational environment of the analyst, even accessible materials do not guarantee reproducibility [@chen2019open; @konkol2019computational].

These issues have several implications [@morin2012shining]: It is difficult (often even impossible) to find errors within the analysis, yet publishing erroneous papers can damage an author’s reputation [@herndon2014does] as well as trust in science [@national2019reproducibility]. Also, reviewers cannot verify the results, because they need to understand the analysis just by reading the text [@bailey2016facilitating]. Furthermore, other researchers cannot build upon existing work but have to collect data and implement the analysis from scratch [@powers2019open]. Finally, libraries cannot preserve the materials for future use or education. These issues are also to society’s disadvantage, as it cannot benefit fully from publicly funded research [@piwowar2007sharing].

Fortunately, funding bodies, e.g., Horizon 2020 (https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm, last access for this and the following URLs: 20th Dec 19), increasingly consider data and software management plans as part of grant proposals. Accordingly, more editors add a section on code and data availability to their author guidelines (see, e.g., @nuest2019agile; @hrynaszkiewicz2019publishers), and reviewers consider reproducibility in their decision process [@stark2018before]. Nevertheless, these cultural and systematic developments [@munafo2017manifesto] alone do not solve the plethora of reproducibility issues. Authors often do not know how to fulfill the requirements of funding bodies and journals, such as the TOP guidelines [@Nosek1422]. It is important to consider that the range of researchers’ programming expertise varies from trained research software engineers to self-taught beginners. For these reasons, more and more projects work on solutions to support the publication of executable supplements.

The key contribution of this paper is a review of applications that support the publication of executable computational research for transparent and reproducible research. This review can be used as decision support by publishers who want to comply with reproducibility principles, editors and programme committees planning to adopt reproducibility requirements in their author guidelines and to integrate code evaluation in their review process [@eglen2019], applicants in the process of creating data and software management plans for their funding proposals, and authors searching for tools to disseminate their work in a convincing, sustainable, and effective manner. We also consider aspects related to preservation that are relevant for librarians dealing with long-term accessibility of research materials. Based on the survey, we critically discuss trends and limitations in the area of reproducible research infrastructures.
Scope: This work focuses on applications that support the publication of research results based on executable source code scripts (e.g., R or Python) and the underlying data. Hence, we did not consider workflow systems (e.g., Taverna [@wolstencroft2013taverna]) or online repositories (e.g., Open Science Framework, https://osf.io/). Also, this paper does not discuss how to work reproducibly, since this is already covered in the literature (e.g., @rule2019ten, @sandve2013ten, @greenbaum2017structuring, @markowetz2015five). The review is a snapshot of the highly dynamic area of publishing infrastructures. Hence, some of the collected information might become outdated, e.g., an application might extend its set of functionalities or be discontinued. Still, reviewing the current state of the landscape to reflect on available options is helpful for publishers, editors, reviewers, authors, and librarians. All collected data is available in the supplements (see Data and Software Availability).

The paper is structured as follows: First, we survey fundamental concepts and tools underlying the applications. We then introduce each application and the comparison criteria, followed by the actual comparison. The paper concludes with a discussion of the observations we made, trends, and limitations.

Background

Packaging computational research reproducibly

The traditional research article alone is not sufficient to communicate a complex computational analysis [@donoho2010invitation]. To address this issue, computational reproducibility concerns the publication of code and data underlying a research paper. This form of publishing research allows reviewers to verify the reported results and readers to reuse the materials [@barba2018terminologies]. To achieve that, all materials are needed, including not only the data and code but also the computational environment. A basic concept for such a collection is the research compendium, a “mechanism that combines text, data, and auxiliary software into a distributable and executable unit” [@gentleman2007statistical]. The concept was extended by a description and snapshot of the software environment using containerization, resulting in the executable research compendium [@nust2017opening].

Containerization and virtualization are mechanisms to capture the full software stack of a computational environment, including all software dependencies, in a portable snapshot [@Perkel_2019]. In contrast to containerization, virtualization also includes the operating system kernel. Despite this difference, both approaches have proven to improve transparency and reproducibility [@Boettiger2015; @howe2012]. One containerization technology is Docker, which is based on so-called Dockerfiles, human- and machine-readable recipes to create the image of a virtual environment [@Boettiger2015]. These recipes add an additional layer of documentation, making Docker a popular tool in the area of computational reproducibility [@nust2019containerit].

A research compendium should contain an entry point, i.e., a main file that needs to be executed to run the entire analysis. One option to realize such entry points is the concept of literate programming, an approach for interweaving source code and text in one notebook [@knuth1984literate]. Two popular realizations of such notebooks are Jupyter Notebooks [@kluyver2016jupyter] and R Markdown [@baumer2014r].
Combining source code and text in one document is advantageous over other approaches, such as having code scripts and the article separated, which might result in inconsistencies between the two. A further advantage is the possibility of executing the analysis with a single click, so-called one-click reproduce [@edzer2013]. This form of making computational results available lowers the barrier for others to reproduce the results and thus increases trust in and the transparency of computer-based research.
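As an illustration of the entry-point idea, the following is a minimal sketch of a script that re-executes a notebook from top to bottom in one step, in the spirit of one-click reproduce. The file name analysis.ipynb and the use of nbformat/nbconvert are assumptions made for the example, not a feature of any particular application surveyed here.

```python
# Minimal sketch of a compendium entry point: re-run a notebook end to end
# and save the executed copy. "analysis.ipynb" is a hypothetical file name.
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

nb = nbformat.read("analysis.ipynb", as_version=4)

# Execute all cells in order with a fresh Python kernel; an error in any cell
# aborts the run, making a failed reproduction immediately visible.
ExecutePreprocessor(timeout=3600, kernel_name="python3").preprocess(
    nb, {"metadata": {"path": "."}}
)

nbformat.write(nb, "analysis.executed.ipynb")
```

Packaged next to the data and a description of the computational environment (e.g., a Dockerfile), such a script gives a research compendium a single, documented way to reproduce the reported results.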