# KubeCon 2019 EU Notes: OCI Catalog Listing APIs
After 2 hours of conversation on OCI Artifacts and CNAB support ([notes](https://hackmd.io/s/S1X6JNnTN)), and with the day ending, the group narrowed to a smaller set of folks. With representation from Quay, Harbor, ACR, Snyk, Twistlock, a customer (JPMorgan Chase), and OCI maintainers, we felt we had good breadth. Our goal wasn't to design the API here, but to agree on the scenarios and requirements.
- Daniel Jiang & ___ from VMware, representing Harbor
- Jimmy Zelinskie - Quay, OCI maintainer. Worked with Antoine for App Registry
- Michael Brown - IBM, OCI maintainer
- Kohel - researcher for registries - speeding up container distribution
- Joey Schor - Quay - co-founder & tech lead
- Kenneth Brooks - JPMorgan Chase
- Gareth Rushgrove - Snyk - formerly worked on CNAB while at Docker
- Maya Even-Shani - Twistlock
Steve provided an overview covering the scenarios captured in https://hackmd.io/s/BJPAUxDvV
- A common api across OCI distribution
- Scanners shouldn’t have to implement different APIs across each registry implementation
- Vulnerability scanners are the only ones large enough to chase each registry individually. If we had a general API, we could see an ecosystem of tooling.
- Smaller products wouldn't get support, as a scanner may not be willing to invest
- If we had a common API, we'd likely have broader tooling as developers would invest in tools that work consistently across all distribution instances
- The catalog API is just part of the scenario. We need catalog for listing, but eventing for real time.
## Open discussion:
- Joey: Would like to redo the tags api as well. eTag for changes since last asked.
- Joey: would like to deny requests that don't provide eTags
- Steve: What's the use case for eventing new repos?
  - Do repo listing tools and scanners need an event, or can they query on demand for the delta of repos, using the eTag?
- ?: some workflows may want to act on new repos being created.
## _catalog api
- Steve: If we require _catalog as part of 1.0, we'll define a requirement that will simply not be consistently implemented.
- Docker went a different direction with the DockerCon announcement of a search API on DTR and Docker EE, because `_catalog` wasn't consistently implemented.
- Consensus we need something else
- Mike: we would reserve `_catalog`, but not require implementation
We had general consensus on the scanning and broader tooling scenarios captured in https://hackmd.io/s/BJPAUxDvV
- All repos
- Search by prefix
- Filter by artifact type
- Order by date, alphabetically by image name, or alphabetically by tag
- Registries can enforce throttling according to their own policy, because the pub/sub model covers larger-scale needs
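
The listing scenarios above (all repos, prefix search, artifact-type filter, ordering) can be sketched as a single query function. This is only an illustration of the requirements, not a proposed API: the `Entry` record, the `list_entries` function, and the `order_by` values are all hypothetical names, and the data is held in memory rather than in a registry.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical listing record: one tagged artifact in a repo."""
    repo: str
    tag: str
    artifact_type: str
    pushed: datetime


# Assumed ordering options, mirroring "date, image name, tag" from the notes.
ORDER_KEYS = {
    "date": lambda e: e.pushed,
    "image": lambda e: e.repo,
    "tag": lambda e: e.tag,
}


def list_entries(
    entries: Iterable[Entry],
    prefix: str = "",
    artifact_type: Optional[str] = None,
    order_by: str = "image",
) -> List[Entry]:
    """Return entries matching the prefix and type filters, sorted by order_by."""
    matches = [
        e
        for e in entries
        if e.repo.startswith(prefix)
        and (artifact_type is None or e.artifact_type == artifact_type)
    ]
    return sorted(matches, key=ORDER_KEYS[order_by])
```

With the default arguments the function returns every entry, which corresponds to the "all repos" scenario; throttling and paging are left out of this sketch.
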
## Scalable Solution
With the scenarios agreed, we shifted the discussion to an approach that could scale
- Joey: Shared the pub/sub implementation of Clair, which scales to millions of images scanned within Quay. It's documented in v3 but not implemented for others, though he would like it to be.
- When Clair learns of a new vulnerability, something has changed. It communicates with Quay to do the diff on both sides.
- Joey: The new listing api should support an eTag approach
- Sending a new or null eTag provides a paged result of all repos, images or tags
- The client maintains the eTag. Subsequent requests present the eTag, with the delta being returned
- Maya - Do scanners
- Steve: Salesforce & APEX promote a best practices model. Developers could take the easy route, querying the list of states 3 times on a page request. At some point, they'll get throttled, and must implement a caching model. We can use the same pattern for tools that want to query the new listing API. When they get throttled, they may be motivated to implement an eventing model.
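
The eTag approach Joey described can be sketched as follows. This is a minimal illustration under stated assumptions, not the design the group agreed on: the `ListingRegistry` class and its method names are hypothetical, and the eTag is assumed to be an opaque token that here simply encodes a position in an append-only change log.

```python
from typing import List, Optional, Tuple

Change = Tuple[str, str]  # (repo, tag); a hypothetical change record


class ListingRegistry:
    """In-memory stand-in for a registry that serves eTag-based deltas."""

    def __init__(self) -> None:
        self._log: List[Change] = []  # append-only log of pushes

    def record_push(self, repo: str, tag: str) -> None:
        self._log.append((repo, tag))

    def list_changes(
        self, etag: Optional[str] = None, page_size: int = 100
    ) -> Tuple[List[Change], str]:
        """Return one page of changes after `etag`, plus the next eTag.

        A missing or empty eTag starts from the beginning, so repeated
        calls page through everything. Presenting the last eTag received
        returns only what changed since then.
        """
        start = int(etag) if etag else 0  # assumption: eTag encodes a log offset
        page = self._log[start:start + page_size]
        return page, str(start + len(page))
```

The client stores the returned eTag and presents it on the next call. When nothing has changed, the page is empty and the eTag comes back unchanged, so a polling tool does very little work between pushes.
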
## Pub/Sub Approach
- Joey: Even in the pub/sub model, the initialization can be done with an empty eTag
- Defining the scope of notifications:
- Steve: when registering for pub/sub, pass in the scope of what they want to get events upon. Artifact Types, Repos, Platform, …
- Event Response:
- Joey: Each event payload is a single artifact update. The payload can be the manifest
- Steve: results need to respect the RBAC of the credential receiving the payload
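
The pub/sub points above (a scope supplied at registration, one artifact update per event, and results filtered by the subscriber's RBAC) can be sketched like this. All names here are hypothetical, and RBAC is reduced to a plain set of readable repos for illustration; a real registry would consult its own authorization system.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Event:
    """One artifact update; the payload carries the manifest."""
    repo: str
    artifact_type: str
    manifest: dict


@dataclass
class Subscription:
    readable_repos: Set[str]                  # stand-in for the credential's RBAC
    artifact_types: Optional[Set[str]] = None  # scope: None means all types
    repo_prefix: str = ""                      # scope: "" means all repos
    inbox: List[Event] = field(default_factory=list)


class Broker:
    """Hypothetical in-memory broker that delivers scoped, permitted events."""

    def __init__(self) -> None:
        self._subs: List[Subscription] = []

    def subscribe(self, sub: Subscription) -> Subscription:
        self._subs.append(sub)
        return sub

    def publish(self, event: Event) -> None:
        for sub in self._subs:
            in_scope = event.repo.startswith(sub.repo_prefix) and (
                sub.artifact_types is None
                or event.artifact_type in sub.artifact_types
            )
            permitted = event.repo in sub.readable_repos
            if in_scope and permitted:
                sub.inbox.append(event)  # one event per artifact update
```

Checking permission at delivery time, rather than only at subscription time, means a subscriber with a broad scope still never receives manifests from repos its credential cannot read.
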
After 3 hours of discussion, we summarized the outcome as:
- `_catalog` should be set to reserved, with the implementation removed from the spec
- The use case breaks down into two basic scenarios:
- Tools can get a list of repos, images or tags
- Tools can subscribe for events of changes
- We expect many human-driven tools will use the listing API. Tools that need to manage the entire contents of the registry would use eTags to get deltas.
- Tools that must act in real time, such as scanners that must scan artifacts as they are pushed and before they are deleted, will use the pub/sub model
- Two new apis to be added:
- A listing api that returns repos, images and tags
- The listing api will support an eTag, enabling the registry to return a delta
- A pub/sub model
- When initializing the subscription, a scope is defined
- The results are one result per event
- The result likely has the manifest, or a pointer to the manifest