# NFDI Basic Service Proposal

## Common Stack for Online Reproducible Data Science & AI

Advances in fields such as Data Science, Machine Learning, and AI are enabling a wealth of sophisticated data analysis methods across many scientific domains. However, these methods are still seldom available at the click of a button. For one, many of them require access to cloud-based computational infrastructure. More importantly, analyses are typically multi-stage processes, beginning with the extraction of relevant features, continuing with the training of models, and concluding with the visualization and interpretation of results. As a result, even for researchers who have the necessary infrastructure, the respective method implementations, and access to the original data, the results of sophisticated modern data analysis pipelines are prohibitively hard to replicate and validate.

The envisioned essential service will enable 1-click online reproducibility for FAIR Digital Objects (FDOs) with complex analysis pipelines on large datasets that may otherwise be difficult to share. The proposed service is of high relevance to all consortia within the NFDI that rely on computational data analysis.

### State of the art for this potential service

With the development of the open-source mybinder.org service, a first step was made toward reproducible data analysis that meets the requirements for FDOs. However, wider adoption of the service within the NFDI is hindered because it 1) only supports small-scale analysis pipelines and 2) lacks the necessary support for the analysis of large datasets, 3) in particular those that are not easily shareable. While offerings such as [GESIS Notebooks](http://notebooks.gesis.org) extend online reproducibility of FDOs to longer-running analysis pipelines and larger datasets, these services are only available to researchers in some domains.
To address these shortcomings, a service for scalable online reproducible Data Science & AI is needed. The service consists of a gallery for FDOs, persistent import into JupyterHub for binder-ready FDOs, and compatible large-scale batch execution of FDOs, including those that may not be easily shareable, to enable scalable 1-click online reproducibility.

### Overall strategy for the possible service with regard to the following stages:

#### 1) Service initialisation strategy:

The proposed service will be initialized in cooperation with all consortia planning a Jupyter- or Binder-based offering and will be closely coordinated with the Jupyter community to ensure full compatibility with binder-ready standards. The need for the service arises in almost all consortia, and the service will be successively improved based on user feedback.

#### 2) Service integration strategy:

The mybinder.org service is already the platform of choice for online reproducibility of small FDOs for a number of consortia within the NFDI. It is equally popular within the larger scientific community, with more than 100k users per week and 30k available FDOs. The 4DS consortium is connected to the service via the membership of GESIS in the MyBinder Federation. The design of a scalable online replication service for FDOs within the NFDI will be carried out in close cooperation with the Jupyter and MyBinder communities and will follow the corresponding standards.

#### Address possible challenges and risks:

The topic of reproducibility and FDOs receives considerable attention (and funding) in many individual subject- and method-oriented consortia within the NFDI. However, there is currently only limited funding available to ensure the integration and coordination of the efforts across the different consortia.
To deliver on the promise of interoperability within the NFDI, as well as integration with European projects such as the EGI and with international initiatives, more emphasis needs to be put on coordination between these efforts. With this proposal, we emphasize the need to integrate these efforts and to fund their coordination beyond individual subject- or method-oriented consortia.