SPEC .. — CI Best Practices for Tutorials

---
title: "SPEC .. — CI Best Practices for Tutorials"
number:
date: 2025-05-11
author:
  - "Brigitta Sipőcz <brigitta.sipocz@gmail.com>"
  - "Dan Allan <daniel.b.allan@gmail.com>"
  - "Melissa Weber Mendonça <melissawm@gmail.com>"
  - "Ross Barnowski <rossbar15@gmail.com>"
endorsed-by:
---

## Description

### Core Project Endorsement

### Ecosystem Adoption

## Implementation

## Notes

- Recommend that CI running against PRs be expected to pass; a failure should signal something important.
- Schedule regular runs of CI against nightlies/pre-releases.
- Generate a badge for the workflow that can be collected into a dashboard. Use badges as a best practice, so users can directly check when CI/the reproducible infrastructure last ran and whether it was passing at that time.
- GitHub Actions security concerns
  - Need to trust action creators
  - Pin to hash vs. version/tag?

### Background

Define flavours of tutorials, and list the words different projects may use for the same or similar content:

- Content for individualized/async learning. It is supposed to run in non-specialized environments on users' computers, at any time they visit.
  - long-form narrative documentation and examples
  - guides
  - how-tos
  - tutorials
  - scientific lecture notes
- Workshops: material for synchronous learning that may not be fully functional off cycle. Often these are also called tutorials (e.g. university course material, scikit-image workshop tutorial).
- Reproducible science use cases, still expected to run on

### Pain points to address

- Some examples have special (heavy) software dependencies that should not be required to run the basics. Some tutorials may require conflicting dependencies.
- Likewise, some require data sets that are large or require authentication.
- Some examples require a long time to run or access to specialized resources.
- The use of plots that may appear in static or interactive modalities is subtle.
  - In the interactive modality, you want one figure to update progressively as you go.
  - In the static modality, you want to display snapshots of the canvas.
- Supporting various modalities with shared code
  - Source (version-controllable, like MyST Markdown)
  - (ipynb notebooks as a build artifact, natively displayable by notebook clients; note that Binder can handle Markdown notebooks)
  - Static HTML
  - Interactive platforms
    - JupyterHub/Binder
    - JupyterLite

### Patterns

- Every "example" has a directory with
  - one or more text-based notebooks
  - supporting files
  - environment specification
- For each example, CI should test against minimum, latest, and development versions.

#### Aim of tutorials

- Define what we mean by tutorials: should we aim to discuss any executable content, or just tutorials as defined by https://diataxis.fr/?
- Narrative documentation. Some projects call it guides, others call it examples, and it may live in galleries. We are not talking about API documentation/docstrings.
  - Treat narrative docs as a library.
  - Be specific about what you support. As in a library, be specific about versions, dependencies, and expected settings. Consider whether versioning your tutorials is worth it (keeping multiple versions of a tutorial that match your library/dependencies).
- Workshop materials. Give some indication of when the material was last tested (potentially by CI).
- Showcase usage of libraries.
  Thus tutorials should support the wide range of operating systems, versions, etc. that users have; in practice they should at least support the versions covered by SPEC 0 rather than just the latest and greatest, and should not be tied to a specific Docker image (users should not be expected to work only within a specific container).
- CI templates
- Tutorial buzzwords/angles to build into this: reproducibility, provenance, sustainability, maintainability, and some term we use to mean "into the future"

DRY: Don't repeat yourself

- Use tried and tested tools whenever possible.
- Use tools in the ecosystem and contribute back instead of reinventing the wheel.

Scaffolding

Reclaim control over content (think of Matplotlib's move to the object-oriented approach, while much of the material out there is still `plt`-based)

### Implementation

- HTML deployment
- Static vs. interactive
- Content testing
- Version matrices
- Options for hosting data and compute hardware
  - Know where your data is coming from (be a good scientific citizen)
  - Highlight a few options, with pros and cons, for a few specific strategies
- Caching
- Warning handling: error on warnings, give tracebacks, and report which cell failed (e.g. nbval style)

### Nice to haves

- Handle different tutorials differently.
  - Some may have extra dependencies, or dependencies that conflict with each other.
  - Some may not be executed frequently because they require large data/resources.
- We ultimately need solutions for tutorials that scale: data that is large or needs authentication, and problems that need scaled computation.

## References

- https://leomurta.github.io/papers/pimentel2019a.pdf, citation: https://ieeexplore.ieee.org/document/8816763
- https://github.com/napari/napari-workshop-template
- https://collections.plos.org/collection/ten-simple-rules/
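
As a rough illustration of the directory layout and version matrix described under Patterns, the following Python sketch scans a tutorials root for example directories and pairs each valid one with the three dependency flavors. The file names (`environment.yml`, `requirements.txt`), the `.md` notebook suffix, and the function names `find_examples` and `build_matrix` are assumptions made for this example, not conventions set by this SPEC.

```python
from itertools import product
from pathlib import Path

# Assumed conventions for this sketch (not specified by the SPEC):
NOTEBOOK_SUFFIXES = {".md"}  # text-based notebooks, e.g. MyST Markdown
ENV_FILES = {"environment.yml", "requirements.txt"}  # environment specification
FLAVORS = ("minimum", "latest", "development")


def find_examples(root):
    """Return sorted example directories holding a notebook and an environment file."""
    examples = []
    for directory in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        names = {child.name for child in directory.iterdir() if child.is_file()}
        # Note: any file with a notebook suffix counts, including a README.md.
        has_notebook = any(Path(name).suffix in NOTEBOOK_SUFFIXES for name in names)
        has_env = bool(names & ENV_FILES)
        if has_notebook and has_env:
            examples.append(directory)
    return examples


def build_matrix(root):
    """Pair every valid example with every dependency flavor."""
    return [
        {"example": example.name, "flavor": flavor}
        for example, flavor in product(find_examples(root), FLAVORS)
    ]
```

A CI job could serialize the result of `build_matrix` to JSON and feed it to the workflow's job matrix; how each flavor is actually installed is left to the project.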