---
tags: oci, catalog, listing, management, distribution
---
<!-- https://hackmd.io/s/how-to-tag-notes -->

# OCI Catalog Listing API - Workgroup

## Workgroup contributors (add your name):
- Mike Brown
- Atlas Kerr
- Steve Lasker
- Adam Dobrawy

## Context
After the catalog API was removed from the distribution spec alpha, it was agreed, without opposition, during the OCI weekly calls of March 6 and 13, 2019, to create a working group to specify a catalog listing API.

## Purpose of Workgroup:
* identify use-cases and scope of a catalog listing API
* draft an alpha spec (and ideally a PoC)

## Scope
- Cover repo and possibly tag listings
- Optionally account for role based access control (RBAC), but do not specify the details of an RBAC implementation.
- Notes on scope:
  - Currently the /v2 API uses a nested, URL-based naming scheme; a verb should be used instead of the URL approach (i.e. .../list)
  - Dropping /v2/_catalog: https://github.com/opencontainers/distribution-spec/pull/45
  - But not dropping tag listing: https://github.com/opencontainers/distribution-spec/blob/master/spec.md#listing-image-tags

## Use Cases
- Tools like vulnerability scanners need a listing of the repos they must scan. These tools would like to keep current with event based notifications, through a subscription model.
- Another useful pattern for the repo and tag listing is to do an initial scan and/or periodic re-scans. For this pattern, time based queries should be supported for retrieving repos/tags added/removed/modified since a given `date:time` to enable synchronization.
- List the existing repos within a registry, enabling the user to choose which they wish to process. For example, the user may configure a scanning tool to scan particular repos, or select the repos for which they want a tag listing.
- A deployment tool enables a user to select the "newest" tag via a list ordered by last push date, or possibly alphanumerically.
- As new artifact types become common, the tag listing will need to identify the artifact type.

## Constraints
- Some registry implementations use a root namespace to isolate tenants. The catalog APIs MUST support a beginning root by which repos are listed, but should not mandate any particular namespace model.
- Repo listings MUST respect the role based access control (RBAC) of the user issuing the query. A user MUST only see repo names they have the equivalent of `read` permissions for.
- As registries have widely differing implementations, there are no backwards compatibility constraints with the `/v2/_catalog` implementation.

## Questions For Discussion
- Should the catalog API discussion incorporate an eventing API to round out the experience?
- What is the expected relationship between the repository list and the tag list? Confirm this catalog API should cover tags.

## Related Issues/PRs
https://github.com/opencontainers/distribution-spec/issues/22
https://github.com/opencontainers/distribution-spec/pull/45

## References
[Slack Catalog Discussion](https://opencontainers.slack.com/messages/CGSAW6X0X/)
[opencontainers.org dev group discussion thread](https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/MFUjpt5fxNc)

## Discussions
### KubeCon 2019 Catalog Meeting May 22, 2019
After 2 hours of conversation on OCI Artifacts and CNAB support ([notes](https://hackmd.io/s/S1X6JNnTN)), and at the end of the day, the group reduced itself to a smaller set of folks. With Quay, Harbor, ACR, Snyk, Twistlock, customer (JP Morgan), and OCI maintainer representation, we felt we had good breadth. Our goal wasn't to design the API here, but to agree on the scenarios and requirements.

### Attendees:
- Daniel Jiang & ___ from VMware, representing Harbor
- Jimmy Zelinskie - Quay, OCI maintainer.
Worked with Antoine for App Registry
- Michael Brown - IBM, OCI maintainer
- Kohel - researcher for registries - speeding up container distribution
- Joey Schor - Quay - co-founder & tech lead
- Kenneth Brooks - JPMorgan Chase
- Gareth Rushgrove - Snyk - formerly worked on CNAB while at Docker
- Maya Even-Shani - Twistlock

Steve provided an overview covering the scenarios captured in https://hackmd.io/s/BJPAUxDvV
- A common API across OCI distribution
  - Scanners shouldn't have to implement different APIs for each registry implementation
  - Vulnerability scanners are the only ones large enough to chase each registry individually. If we had a general API, we could see an ecosystem of tooling.
  - Smaller products wouldn't get support, as a scanner vendor may not be willing to invest
  - If we had a common API, we'd likely have broader tooling, as developers would invest in tools that work consistently across all distribution instances
- The catalog API is just part of the scenario. We need catalog for listing, but eventing for real time.

### Open discussion:
- Joey: Would like to redo the tags API as well, with an eTag for changes since the last request.
- Joey: Would like to deny requests that don't provide eTags
- Steve: What's the use case for eventing new repos?
  - Do repo listing tools and scanners need an event, or can they query for the delta of repos on demand, using the eTag?
- ?: Some workflows may want to act on new repos being created.

### _catalog api
- Steve: If we require `_catalog` as part of 1.0, we'll define a requirement that will simply not be consistently implemented.
- Docker went a different direction with the DockerCon announcement of a search API on DTR and Docker EE, because `_catalog` wasn't consistently implemented.
- Consensus: we need something else
- Mike: We would reserve `_catalog`, but not require implementation

### Scenarios
We had general consensus on the scanning and broader tooling scenarios captured in https://hackmd.io/s/BJPAUxDvV
- Scanning
- Search
  - All repos
  - Search by prefix
  - Exclude
  - Filter by artifact type
  - Order by date, alpha of the image, alpha of the tag
- Return
  - Digests
  - Tags
- Throttling
  - Registries can enforce throttling under their own policy, because OCI has a pub/sub model for larger scale

### Scalable Solution
With the scenarios agreed, we shifted the discussion to an approach that could scale
- Joey: Shared the pub/sub implementation of Clair, which scales to millions of images scanned within Quay. It's documented in v3, but not implemented for others; he would like it to be.
  - When Clair learns of a new vulnerability, something has changed. It communicates with Quay to do the diff on both sides.
- Joey: The new listing API should support an eTag approach
  - Sending a new or null eTag provides a paged result of all repos, images or tags
  - The client maintains the eTag. Subsequent requests present the eTag, and only the delta is returned
- Maya - Do scanners
- Steve: Salesforce & APEX promote a best practices model. Developers could take the easy route, querying the list of states 3 times on a page request. At some point, they'll get throttled, and must implement a caching model. We can use the same pattern for tools that want to query the new listing API. When they get throttled, they may be motivated to implement an eventing model.

### Pub/Sub Approach
- Joey: Even in the pub/sub model, the initialization can be done with an empty eTag
- Defining the scope of notifications:
  - Steve: When registering for pub/sub, subscribers pass in the scope of what they want to receive events for: Artifact Types, Repos, Platform, …
- Event Response:
  - Joey: Each event payload is a single artifact update.
The payload can be the manifest
  - Steve: Results need to respect the RBAC of the credential receiving the payload

### Summary
After 3 hours of discussion, we summarized the outcome as:
- `_catalog` should be set to reserved, with the implementation removed from the spec
- The use case breaks down into two basic scenarios:
  - Tools can get a list of repos, images or tags
  - Tools can subscribe to change events
- We expect many human driven tools will use the listing API. Tools that need to manage the entire contents of the registry would use eTags to get deltas.
- Tools that must act in real time, such as scanners that must scan artifacts as they are pushed and before they are deleted, will use the pub/sub model
- Two new APIs to be added:
  - A listing API that returns repos, images and tags
    - The listing API will support an eTag, enabling the registry to return a delta
  - A pub/sub model
    - When initializing the subscription, a scope is defined
    - Each event produces one result
    - The result likely has the manifest, or a pointer to the manifest

## OCI Call - March 6, 2019
*redacted to show catalog listing related content*

Textual User, irc.freenode.net APP [4:46 PM]
#topic catalog/listing capabilities of registries
lasker, vbatts, atlas discussing potential for a proposal in this area
vbatts: need a proposal for a way forward on issue #22
#link https://github.com/opencontainers/distribution-spec/issues/22
#endmeeting

## OCI Call - Feb 27, 2019
*redacted to show catalog listing related content*

#topic what's needed for distribution-spec v1?
(lasker) search could be one of those things. But could be added later.
process for registering media-types
annotations are okay for _optional_ metadata but this is not suitable for scanners that need required/expected information to determine the artifact type

Vincent Batts, irc.freenode.net APP [4:58 PM]
#link https://github.com/opencontainers/distribution-spec/issues/58
#link https://github.com/opencontainers/image-spec/blob/master/image-index.md
#endmeeting
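
The eTag-based listing described in the KubeCon summary above (an empty eTag returns everything, a later request presenting the stored eTag returns only the delta) could be sketched roughly as follows. This is an illustration under assumptions, not a proposed design: the `Registry` class, its `list_repos` method, and the use of a simple change counter as the eTag are all hypothetical and not part of any OCI spec or existing registry API. Paging and RBAC filtering are left out to keep the sketch short.

```python
class Registry:
    """Hypothetical in-memory registry used only to illustrate eTag deltas."""

    def __init__(self):
        self._version = 0    # bumped on every change; doubles as the eTag
        self._changes = {}   # repo name -> (version, action) of its latest change

    def add_repo(self, repo):
        self._version += 1
        self._changes[repo] = (self._version, "added")

    def remove_repo(self, repo):
        self._version += 1
        self._changes[repo] = (self._version, "removed")

    def list_repos(self, etag=None):
        """Return (entries, new_etag).

        With no eTag (or eTag "0") the full list of existing repos is returned.
        With an eTag, only repos changed since that eTag are returned.
        """
        since = int(etag) if etag else 0
        entries = []
        for repo, (version, action) in sorted(self._changes.items()):
            if version <= since:
                continue  # the client has already seen this change
            if since == 0 and action == "removed":
                continue  # full listing: report only repos that still exist
            entries.append((repo, action))
        return entries, str(self._version)


registry = Registry()
registry.add_repo("library/alpine")
registry.add_repo("library/busybox")

# First request: no eTag, so the client receives the full listing.
full, etag = registry.list_repos()

registry.add_repo("library/nginx")
registry.remove_repo("library/busybox")

# Later request: the client presents its stored eTag and receives the delta.
delta, etag2 = registry.list_repos(etag)

print(full)
print(delta)
```

The client stores only the opaque eTag between requests, so a tool that falls behind (or starts fresh) can resynchronize with the same call it uses for incremental updates.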