The Rock

Preliminary notes for the dev-summit on Monday, 4 October, 16:00 UTC in the Château.

Idea

The idea of a Software-Underground-Stack or a Subsurface-Stack is probably almost as old as SWUNG itself. Something that unifies the open-source codes, something to give a face to the outside world, which might also attract funding from industry. Something to show that the open-source stack is strong in the geosciences, and is not only some nerds coding away, without stability or guarantee to work. Role models in this respect (with slightly different scopes) are, e.g., NumFOCUS and Jupyter Meets the Earth, or Pangeo.

Swung has already orchestrated many initiatives in this direction: there is the https://softwareunderground.org/stack page, and there is the subsurface Python package to connect various packages in the ecosystem. Nothing has taken off so far, mainly because of a lack of time! We are all busy, many of us are main developers of a code that could potentially be part of The Rock, and often this is not even our paid job and we have to find time to maintain it.

My suggestion is therefore: Let's lower the expectations, but start today! As such I suggest bi-monthly meetings (or monthly, but alternating the times to enable people anywhere on the globe to participate in one or the other). These bigger meetings are just to connect; most things should happen in smaller groups of only one or a few people. So the big meetings are there to assign tasks, keep track of them, and synchronize.

[Note: I started these notes, but subsequently many people contributed to them, either directly or by giving me feedback. They now contain far greater ideas than I could have imagined on my own.]

Actions (next baby-steps)

Different people or sub-groups can tackle these.

  • Elect/nominate a (small) committee from Swung to drive/manage the process.
  • Define the coarse purpose (many inputs from @jokva):
    • Federated Model
      • Projects can apply to get in;
      • They have to meet certain requirements;
      • They will get certain benefits.
      • We could enable tier blessings: different levels depending on how many of the requirements are fulfilled.
    • Unified Model:
      • Projects can submit/donate ownership of the library
      • We deliver a single, uniform, consistent package (this has to be broken up in modules, otherwise it will be too big)
    • Example-driven model:
      • Single metapackage that installs every library
      • Create compatibility tests
      • Produce use cases that use several libraries (examples in Jupyter notebooks or books)
      • Prototype: https://github.com/leouieda/the-stack
    • Basically: Do we endorse projects and guarantee certain things (federated model, more like NumFOCUS), or do we provide a stack (unified model, more like conda-forge)?
    • Either way, we should probably have an AWESOME top-level documentation (basically the website)
    • Depending on what we decide we might really need a Software Engineer!
  • Define a name (see Name)
  • Create a website (see Website)
  • Define submission/review/test process (see Python ecosystem)
  • Define rules (see Idea collection for rules)

Name

I suggest The Rock, and I use The Rock throughout this document. However, I currently see it as a working title and am open to suggestions. Please suggest your ideas! Nevertheless, here are some reasons for it.

  • Rock is intrinsically geo, yet avoids all the discussions we had about sub-surface, above-surface, morphology, planets, etc. I also think it is not used yet in this context.
  • No py in it (e.g., not PyRock), as it should be language agnostic and might grow beyond Python.
  • It sounds cool. Imagine announcing in your channel: «Big news, my awesome package xyz is now in The Rock!»

Regardless of the name, I think we should register a domain; I suggest swungrock.org. Then we should have various pointers to it, e.g. swu.ng/rock, swu.ng/stack, and softwareunderground.org/stack.

Alternative name suggestions:

  • On The Rocks
  • Codeglomerate

Scope

To keep it simple (remember, at the moment it is just about getting started), I think we should focus on the Python ecosystem. Later it can grow to include

  • other ecosystems;
  • options for funding (through Swung or as a separate entity);
  • affiliations (e.g., research consortia).

Whatever we add, the scope of The Rock is purely CODE. It is not another Swung, but a subset of it, and we should have a clear distinction.

The Rock should stand for a standard. Something companies might want to sponsor. Something that people will associate with stable, reproducible code. A place where people go and search for solutions. A place where new users know that they will find help (we should have a #rock-channel on Slack).

Website

This will be of utmost importance. We need a professional-looking, useful website. It will need several things:

  • Scope and Roadmap
  • Contact
  • Funding page (currently a "coming soon")
  • Affiliation page (currently a "coming soon")
  • Per ecosystem sites:
    • Searchable, "filterable" (e.g., show me all packages with the keyword "inversion", or "rock physics")
    • Categorized
    • Each accepted package needs
      • Short description of its capability
      • Links to: source code, documentation, contact info
      • Keywords (so we can filter for it)
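The per-package information and keyword filter listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the PackageEntry class, its field names, the example packages, and the example.org URLs are all hypothetical and not an agreed schema.

```python
from dataclasses import dataclass


@dataclass
class PackageEntry:
    """One accepted package as it might be listed on an ecosystem page (hypothetical schema)."""
    name: str
    description: str
    source_url: str
    docs_url: str
    contact: str
    category: str = "uncategorized"
    keywords: tuple = ()


def filter_by_keyword(entries, keyword):
    """Return the entries tagged with `keyword`, ignoring case and surrounding spaces."""
    wanted = keyword.strip().lower()
    return [e for e in entries if wanted in {k.lower() for k in e.keywords}]


# Made-up entries, purely for illustration.
catalog = [
    PackageEntry(
        name="example-inversion",
        description="Hypothetical inversion toolkit.",
        source_url="https://example.org/example-inversion",
        docs_url="https://example.org/example-inversion/docs",
        contact="maintainer@example.org",
        category="modelling",
        keywords=("inversion", "electromagnetics"),
    ),
    PackageEntry(
        name="example-rockphys",
        description="Hypothetical rock physics helpers.",
        source_url="https://example.org/example-rockphys",
        docs_url="https://example.org/example-rockphys/docs",
        contact="maintainer@example.org",
        category="petrophysics",
        keywords=("rock physics",),
    ),
]

print([e.name for e in filter_by_keyword(catalog, "Inversion")])
```

A static site generator could build the filterable list from such records, for instance one small metadata file per accepted package.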

Python ecosystem

The Python Ecosystem is the first step of The Rock. If we can pull this off then we can see where else we can go.

  • There must be a clear, transparent process detailing how package owners can submit their package to The Rock (I think GitHub issues would be good for that).
  • There should probably be a core committee which can admit new packages.
  • There should be a (semi-automated) annual review process (ensure packages don't fall behind in the stack).
  • Clear set of rules that must be fulfilled. I think we should set a standard, and not shy away from defining one that might be above our own current packages.
  • Make it clear that we want to help/guide maintainers to fulfill these rules.
  • CI (on https://github.com/softwareunderground/TheRock) that checks requirements.
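As a rough idea of what a requirements check in CI might look like, here is a minimal sketch. It assumes that each submission is described by a small metadata record; the rule names and metadata keys are hypothetical placeholders loosely based on the rule ideas collected in these notes, not a decided list.

```python
# Hypothetical rule names; the real set would be whatever the committee agrees on.
RULES = (
    "public_repository",
    "on_conda_forge",
    "has_api_docs",
    "has_tests",
    "open_source_license",
    "has_doi",
)


def unmet_rules(metadata):
    """Return the rules (in RULES order) that a submission does not yet meet."""
    return [rule for rule in RULES if not metadata.get(rule, False)]


# Hypothetical submission, e.g. parsed from a GitHub issue form.
submission = {
    "name": "example-package",
    "public_repository": True,
    "on_conda_forge": True,
    "has_api_docs": True,
    "has_tests": True,
    "open_source_license": True,
    "has_doi": False,
}

print(unmet_rules(submission))
```

Reporting the unmet rules, rather than a plain pass or fail, fits the goal of helping maintainers fulfill the rules, and the same check could be re-run for the annual review.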

Idea collection for rules

  • Must be on a public, versioned platform (GitHub, GitLab, Bitbucket et al.).
  • Must have a logo (so we can show it; logos are important these days)
  • Must be available from conda-forge.
  • Dependencies: NO PINNED version numbers (e.g., numpy==1.6.0).
  • Dependencies: As few as possible limiting version numbers (e.g., scipy<1.6)
  • Dependencies: Aim to standardise versioning across libraries, particularly for large packages (e.g. numpy, scipy, scikit-learn, etc.). But leave smaller packages up to the maintainers. If there is a conflict, have a resolution process for maintainers to agree on a minimum version of a dependency.
  • It must have API documentation and a manual with examples.
  • Must have tests to some degree.
  • Must have an open-source license.
  • Must have DOIs for releases (reproducibility)
  • Should have at least one main reference (a publication) [this is probably too academic, and the DOI above should do]
  • Must have a stable API (clear deprecation cycles with deprecation warnings)
  • Must be geo-related (either dealing directly with geo, or having many geo-examples, e.g. PyVista)
  • Possible versioning of the stack with LTS releases.
  • Use pyOpenSci to review the packages: https://www.pyopensci.org/
  • There is https://projectpythia.org/, which could be a useful role model
  • Minimum product for regularly trying to install everything: https://github.com/leouieda/the-stack/blob/main/.github/workflows/test.yml
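The two dependency rules above (no pinned versions, as few limiting versions as possible) could be checked automatically. The following is a minimal sketch that only handles simple "name + specifier" strings; the function name and the three labels are hypothetical, and a real check would more likely rely on a proper requirement parser than on a regular expression.

```python
import re

# Splits a simple requirement such as "numpy==1.6.0" into name and specifier.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(.*)$")


def classify_dependency(requirement):
    """Classify a requirement string as 'pinned', 'upper-bound', or 'ok'."""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    clauses = [c.strip() for c in match.group(2).split(",") if c.strip()]
    # An exact pin: "==" without a trailing wildcard such as "==1.*".
    if any(c.startswith("==") and not c.endswith("*") for c in clauses):
        return "pinned"
    # "<" and "<=" both limit the newest allowed version.
    if any(c.startswith("<") for c in clauses):
        return "upper-bound"
    return "ok"


for req in ["numpy==1.6.0", "scipy<1.6", "numpy>=1.17", "matplotlib"]:
    print(req, "->", classify_dependency(req))
```

Under these rules a "pinned" result would be a failure, while "upper-bound" would rather be a warning to discuss with the maintainers. Compatible-release specifiers (~=), extras, and environment markers are not handled in this sketch.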

Some follow up:

  • JupyterBook, website
  • Smoke test that can be rolled out across the various projects
    • Document this CI system (this is basically an install?)
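One possible shape for such a smoke test, beyond a plain install, is to try importing every package of the stack and report what fails. This is a sketch under the assumption that the packages are already installed in the CI environment; the function name is hypothetical, and the module names below are placeholders (one standard-library module and one that deliberately does not exist).

```python
import importlib


def smoke_test(module_names):
    """Try to import each module; map each name to None or to an error message."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # also catches errors raised while importing
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


# Placeholder module names; a real run would list the packages in the stack.
report = smoke_test(["json", "this_module_should_not_exist_xyz"])
for name, error in report.items():
    print(name, "OK" if error is None else error)
```

Collecting all failures instead of stopping at the first one gives a single overview per run, which is probably more useful for a monthly or nightly job.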

Meetings

04/10/2021: "First" Meeting

04/10/2021 - 16:00 UTC - Château
Andrea, Santiago, Agustina, Lindsey, Leo, Wesley, Matteo, Steve, Rowan, Joe, Dieter

Thanks everyone for joining! Just to follow up: basically, we are at a similar point as many times before. We would really like to have something, but time is scarce. However, three possible paths crystallized:

  1. Tutorials of cross-package usage
  2. Push subsurface-packages
  3. Federated model

What we agreed on is:

  • Monthly meeting (more on this later)
  • Interested folks should start on any of these three topics and share progress etc. here. Maybe they will all come together, maybe some efforts will stop at some point. But basically we should just get started.

We could have monthly meetings on the first Monday of the month at 16:00 UTC; by coincidence, this time seemed to work fine. However, it is not very inclusive on a global scale. Therefore, please post in the thread in #dev-summit if you are interested but this time does not work for you (also post what times would work for you).