# Reflection on JupyterFAIR

## Agenda

1. Reflect as a team: a simple retrospective of what went well, what could improve, what not to do in a next round, and what recommendations we have for the project's sustainability.
2. Review the backlog we want to tackle for a definite first release.
   2.1. Make a backlog of the things we want to tackle for closing the NWO project (during the next week).
3. Roadmap: things we don't have now but that would be nice to explore (for example async support, JSON schemas for validation, or other ideas each of us has).
4. Dates for the open fund workshop:
   - First week of March
   - Preferred date: consult with the data stewards

## 1. Reflection

![](https://i.imgur.com/ncAce0L.png)

### What went well (keep doing)

Serkan:
- We worked virtually very well; this was not a limitation. +1
- We have a running product, without having a shared history as a team.
- The analysis part was quite useful, because it helped us see the complete picture. +1 +1
- Flexibility to try different things and to rescope based on needs evolving in the project. +1
- Having the flexibility to prototype and be looser in this first phase, instead of immediately complying with stricter standards for PRs, CI, and testing.
- The autonomy of each of us developing different functionality was good.
- Actions to reach potential users and create visibility for the tools.

Manuel:
- Good work as a dev team.

Jose:
- Learning from more experienced developers (Serkan).
- I liked drafting the fairly API package before writing the code.
- Also the design discussions.
- Designing interfaces before focusing on coding.

### What can be improved (do less of, stop doing)

Serkan:
- We should meet more frequently.
- Communication and exchange of knowledge gathered during implementation.
- The timing could have been improved; I wish to have been more responsive, but there was a time limitation.
- Time management of the overall project: all developers spent more time than expected.
- We spent too much time on a survey to reach potential users, which in the end was not used because we lost the community manager in the team.

Manuel:
- Ensure that each member has the opportunity to speak and share thoughts.
- Have more agendas and more structure.

Jose:
- Time spent on each step of the development process: some steps (scoping of solutions) took very long.
- Manage ambitions and expectations better, and look into what is feasible in the time available.
- Too much focus on community investment in the beginning; it took a toll on the time available for development.
- Be less defensive in answering questions at communication events.
- Pay more attention to limiting the number of features to be developed by certain milestones (software releases).

### How we would do things going forward (start doing)

Serkan:
- Document the tools in more detail for users and developers:
  - Technical and user documentation for fairly.
  - User documentation for the CLI and JupyterFAIR.
- Organize coding sprints to boost communication and development.

Jose:
- For example, testing/profiling the performance of download and upload.
- NWO should understand how communities are managed/built around software. You build communities around artifacts and not around …

Manuel:
- Good to look into how communities actually work.

Jose:
- More structure and methodology to guard increments.

## 2. Backlog

https://github.com/orgs/ITC-CRIB/projects/2

> We dropped the idea of setting a backlog for NWO. What we have now is more than we promised.

## Notes on project status (internal DCC)

> JupyterFAIR goals, status and assessment, 17-11-2022, Manuel Garcia, Jose Urra

**What is JupyterFAIR?**

It is a suite of three tools that aims to make the lives of researchers and data managers easier in the context of computational environments such as HPCs, servers, Jupyter environments, and terminals. The open-source suite is composed of a Python package, a command-line application, and a Jupyter extension.
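The one-command deposit workflow described in these notes can be illustrated with a minimal, hypothetical Python sketch. The `RepositoryClient` class and its `deposit` method below are illustrative assumptions for this document only, not the actual fairly API:

```python
# Hypothetical sketch of a repository-client workflow in the spirit of
# the suite described above. RepositoryClient and deposit() are
# illustrative assumptions, NOT the real fairly interface.
import json
import tempfile
from pathlib import Path


class RepositoryClient:
    """Minimal stand-in for a client of a data repository (e.g. Zenodo, Figshare)."""

    def __init__(self, repository: str, token: str):
        self.repository = repository
        self.token = token
        self.deposits = []  # records "uploaded" during this session

    def deposit(self, dataset_dir: str) -> dict:
        # Read locally written metadata and "deposit" the dataset in one call,
        # mirroring the suite's easy-metadata-plus-one-command idea.
        metadata = json.loads(Path(dataset_dir, "metadata.json").read_text())
        record = {"repository": self.repository, "metadata": metadata}
        self.deposits.append(record)
        return record


if __name__ == "__main__":
    # Usage: create a local dataset with metadata, then deposit it.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "metadata.json").write_text(
            json.dumps({"title": "My dataset", "license": "CC-BY-4.0"})
        )
        client = RepositoryClient("zenodo", token="dummy-token")
        record = client.deposit(tmp)
        print(record["metadata"]["title"])  # My dataset
```

The point of the sketch is the design choice the notes describe: metadata lives as a plain local file next to the data, so depositing to any supported repository reduces to a single call against a small, extensible client interface.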
Key features include upload and download, local creation of datasets, easy metadata writing in local environments to deposit with one command, and dataset packaging.

**Why JupyterFAIR?**

To make the creation and archiving of FAIR data more accessible and straightforward for users. We do this by facilitating the creation of metadata and the archiving of datasets, supporting two data repositories for now (Zenodo and Figshare). The software architecture is designed to be easy to extend and maintain.

**Who is the target audience?**

Researchers, data managers, and data stewards; FAIR data advocates and ambassadors.

**Motivation to develop this project within the DCC**

A diversity of opportunities, which includes:
- Generating impact with the scientific community: solving general needs researchers encounter when archiving and reusing scientific data, including (TU Delft) researchers, data stewards, and data managers.
- Team and expertise development: working with peers from other universities with experience in research software that we, as a young team, can learn from.
- Learning what it takes to develop generic open-source tools for researchers: attracting funds that can potentially sustain or increase the DCC's pool of resources.
- Collaborating closely with 4TU: 4TU has been active in creating the DCC and shaping its mission, with a more national interest.
- Agreements with NWO and stakeholders:
  - Original project proposal for NWO (9 months).
  - Contract with the University of Twente (1 February - 31 October 2022).

**Status and assessment of the project**

What has been accomplished?
- Fulfilled agreements with NWO and stakeholders, except for software packaging and reports.
- An extensible design with a Python package at its core; the package is then extended with other interfaces (JupyterFAIR and the CLI).
- An open-source, reproducible codebase with documentation, citation, licensing, and acknowledgements.
- A set of features that allows us to perform essential data-management tasks, such as uploading and downloading.
- Documentation with guides and tutorials on how to use the tools.

**Limitations of the current outcomes (as an open-source tool to be used by many)**

The current version of the tool suite is not yet stable, and the CLI component is still under development.
- We have not yet released an official stable version that users can rely on without guidance and workshops (standalone use).
- Acceptance testing of the tools (extensive user testing) is limited. We have mostly focused on unit testing to release a stable version, but the ultimate tests are acceptance tests, where feedback is collected from users.
- Limited software lifecycle management for collaborative development.
- Limited developer documentation for the JupyterLab extension and the CLI.

**Learnings as RSEs within the DCC**

- What it takes to build an open-source tool from scratch, including the team behind its development. This project and team were built from scratch, including requirements identification, team building, minimum R&D, product backlog building, software architecture design, and software development planning and management. We have learned about all aspects involved in designing and delivering an open-source tool from scratch.
- Open-source development of a tool takes more time than we anticipated. Proper planning of alpha, beta, and stable releases is important to progressively collect feedback. The first part, developing the foundation of the software architecture and the key features of a release, involves many activities beyond purely coding and documenting. Maintenance and follow-up phases are essential parts of the process. Building a team, even among a few people, and establishing an effective way of working is as important as it is time-consuming.
- Stakeholders value the integration of tools. At the Open Science Festival we experienced interest from DANS and SURF; the SURF product owner was interested in collaborating and finding integrations.
- An SMP (Software Management Plan) and methodologies are important. When several stakeholders, including non-technical individuals, join a software development team, a working framework for starting the project and a shared understanding of how software is developed are needed. The lack of a common framework at the beginning of the project also took more time than expected. A clear set of roles and a clear approach are also very valuable.
- Project timing and funding. As a strategy to increase the chances of funding, we decided to condense the work into nine months. In practice, however, it would have been better to ask for the maximum duration of 12 months.

**Recommendations**

- From our side as developers, we think it is worth closing the phase with at least alpha testing and the packaging of a first release, and ideally beta testing to close the full cycle. It feels wrong to stop without delivering a stable version and getting feedback from users. We should aim to complete the release of the tools.
- Share learnings and experiences with different stakeholders, and take the opportunity to learn about the project from the points of view of the different interested parties: developers, funders, supervisors.
- Share feedback with NWO so they better understand how much time and which requirements and setup favor the development of open-source tools for science; this can help them better profile the time and money needed, the phases of open-source projects, etc.
- Strategically assess whether there is interest and relevance for the DCC in dedicating more time to the development of open-source tools, and in learning how open-source projects are developed and maintained.
- Subsequently assess and consider the implications of the previous point for resource management.
- Engage other parties who are clearly interested in maintaining and further developing the tool.
  For example, at the Open Science Day, people from SURFsara and DANS reached out to us and even offered to reuse and improve the tool.