
2i2c Product for Research Communities

J. Colliander
2023-04-17

Context: This document, building on many prior discussions, assembles ideas and insights on how 2i2c serves and delivers value to research communities. Here are the goals for these notes:

  1. Develop an ontology for "digital villages" served by 2i2c
  2. Build a menu of product offerings 2i2c delivers or can deliver to partner communities
  3. Price the product offerings on the menu for sustainability
  4. Specialize the discussion to generate a document for the NASA VEDA opportunity
    These notes could possibly expand into a white paper that extends 2i2c's submission to the NASA RFI.

Related notes

Digital villages

2i2c serves a collection of communities (including CryoCloud, Pangeo, Project VICTOR, and CIROH) that use interactive computing to collaborate on data-intensive research. We've begun to view these communities as "digital villages" gathering around a common "data watering hole", a metaphor that evokes the social and technological components that shape the vibrancy and success of these communities.

The communities consist of people who work together, using compute and data, toward a shared goal: improving human understanding (discovery) in some area of inquiry.

The people in these communities are diverse and approach their community's shared goal with different motivations. Some communities focus on scholarly goals (publish academic papers, train young scholars) and consist of people stratified by academic career stage (students, postdocs, professors). Other communities focus on societal goals (mitigate the climate crisis, improve health outcomes) and include people from different sectors (academia, government labs, not-for-profits, industry, policymaking) working together. While the social structures (governance, incentive systems) and technologies (encryption to protect patient privacy, remote sensing data, on-prem HPC, cloud computing) vary across these communities, 2i2c believes in these unifying principles:

  • there is a right to participate in science
  • technologies for participation in science should be public goods
  • obstacles to participation in science should be lowered

Heartbeats

Digital villages have two kinds of heartbeat: the collaboration cycle and the funded project cycle.

The main activity of these "digital villages" is a collaboration cycle that advances human understanding and moves the community toward its shared goal.

People enter and exit. New data arrives.

The community produces valuable outputs aligned with the goal.

Communities that 2i2c serves collaborate on projects to produce target outputs (deliverables) over time intervals using resources (funding, cloud credits, data streams) gathered through the funded project cycle.

The communities 2i2c serves advance toward their shared goal through the funded project cycle.

Ontology for digital villages

The technologies deployed by 2i2c to serve digital villages can be described in layers. All of it is delivered consistent with the right to replicate. Some of these layers involve community-specific implementations. Others can be deployed to serve multiple digital villages simultaneously ("county-like layers").

  1. cloud vendor base layer (K8s cluster on commercial cloud, Dask, serverless compute)
  2. collaboration scaffold layer (JupyterHub, CILogon, RTC, hardware profile selector)
  3. toolchain layer (software image, software environment, applications, software environment selector)
  4. user interface layer (JupyterLab and image applications, Syncthing, Linux desktop, RStudio, VS Code, QGIS, MATLAB, Fortran)
  5. data and metadata layer (APIs, ARCO tools, Globus, Open Storage Network, data suggestion methods, Lunaris)
  6. security layer ()
  7. training and support layer (documentation, ticketing system, RTC, automated code generation)
  8. knowledge mobilization layer (Binder, JupyterBook, nbgitpuller, git, JupyterLite, data products, dashboards)
  9. on-prem compute integration layer (HPC)
  10. usage and cost monitoring layer (Prometheus, Grafana)
  11. corpus services layer (LLM linked to community research corpus; bibliometric services)
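
To make the ontology easier to discuss, the layers above could be written down as a small data structure. This is only a sketch: the names `Scope`, `Layer`, `LAYERS`, and `layers_by_scope` are hypothetical, and the scope assignment in the example is invented for illustration, since these notes do not yet say which layers are shared ("county-like") and which are community-specific.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SHARED = "shared"        # one deployment serves several villages ("county-like")
    COMMUNITY = "community"  # implemented separately for a single village


@dataclass(frozen=True)
class Layer:
    order: int
    name: str
    components: tuple = ()


# Layer names and components are copied from the list above.
LAYERS = (
    Layer(1, "cloud vendor base", ("K8s cluster on commercial cloud", "Dask", "serverless compute")),
    Layer(2, "collaboration scaffold", ("JupyterHub", "CILogon", "RTC", "hardware profile selector")),
    Layer(3, "toolchain", ("software image", "software environment", "applications",
                           "software environment selector")),
    Layer(4, "user interface", ("JupyterLab and image applications", "Syncthing", "Linux desktop",
                                "RStudio", "VS Code", "QGIS", "MATLAB", "Fortran")),
    Layer(5, "data and metadata", ("APIs", "ARCO tools", "Globus", "Open Storage Network",
                                   "data suggestion methods", "Lunaris")),
    Layer(6, "security"),  # components are not yet listed in the notes
    Layer(7, "training and support", ("documentation", "ticketing system", "RTC",
                                      "automated code generation")),
    Layer(8, "knowledge mobilization", ("Binder", "JupyterBook", "nbgitpuller", "git",
                                        "JupyterLite", "data products", "dashboards")),
    Layer(9, "on-prem compute integration", ("HPC",)),
    Layer(10, "usage and cost monitoring", ("Prometheus", "Grafana")),
    Layer(11, "corpus services", ("LLM linked to community research corpus",
                                  "bibliometric services")),
)


def layers_by_scope(scopes, wanted):
    """Return the names of layers whose assigned scope is `wanted`, in layer order."""
    return [layer.name for layer in LAYERS if scopes.get(layer.order) is wanted]


# Hypothetical scope assignment for an imaginary village; NOT taken from the notes.
example_scopes = {1: Scope.SHARED, 2: Scope.SHARED, 3: Scope.COMMUNITY, 10: Scope.SHARED}

print(layers_by_scope(example_scopes, Scope.SHARED))
```

Keeping the scope assignment separate from the layer definitions means each village can record its own mix of shared and community-specific layers without changing the ontology itself.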