**Pull Design**

- Admins "register" their clusters with Quay by providing a `ServiceAccount` token with the ability to read/watch all `Pods` on their cluster

**Questions**

- WebSocket endpoint in Quay to get a realtime, updating stream of live images on clusters?

**Getting live images out of the cluster:**

- Fetching all pods and searching their `spec.containers` requires downloading _all the JSON_ (no server-side filtering), which could be unbounded in size (easily multiple MBs)
- Should `quay-bridge-operator` just label `Pods` with the image(s) they are running? (a hedged query sketch using such labels appears at the end of this section)
  - Advantage: no new CRD needed
  - Disadvantages: writing to pods; users could delete labels
  - `kubectl get pods --all-namespaces -l <quay-host>/<namespace>/<repo>/<tag>`
- Should `quay-bridge-operator` watch `Pods` and aggregate into CRs for easier querying by Quay?
  - Could also be read-only using an aggregated apiserver (more work)
  - Disadvantage: just duplicating info that is already in the k8s API (on `Pods`, but not easily queryable due to the large JSON response)
  - Could also use labels for server-side filtering
    - `kubectl get liveimages --all-namespaces -l <quay-host>/<namespace>/<repo>/<tag>`
- Should `quay-bridge-operator` watch `Pods` and expose its own API (behind a `Service`) for easier querying by Quay?
  - Advantage: could be a generic endpoint that other orchestration tools implement to get the same features (do we care?)
  - Disadvantage: setting up a `Service` and `Ingress`, or the k8s API service proxy

**Preventing deletion of live tags**

- Create a decorator on the `DELETE /v1/repository/<apirepopath:repository>/tag/<tag>` endpoint which checks for the given tag on any of the registered clusters using a Kubernetes API call (sketched at the end of this section)
  - Concern: this could significantly slow down this operation, but deletion of tags shouldn't be that latency-sensitive

**Viewing clusters which have a given tag running on them**

- Add a query parameter to `GET /v1/repository/<apirepopath:repository>/tag/<tag>` to include this data in an extra field?
- Add a new endpoint for retrieving this data?
- Ideally this should be responsive (using WebSockets)
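Rough sketch of what the "query by label" option could look like from the Quay side, assuming `quay-bridge-operator` has already labeled the `Pods`. The label scheme here (an illustrative `bridge.quay.io/image` key with a hashed image reference as its value) is an assumption, since a raw `<quay-host>/<namespace>/<repo>/<tag>` string would not fit Kubernetes label key/value limits.

```python
# Minimal sketch: ask a registered cluster which pods run a given image, using
# server-side label filtering. Assumes quay-bridge-operator has already labeled
# the pods; the "bridge.quay.io/image" key and hashed value are illustrative.
import hashlib

from kubernetes import client


def pods_running_image(api_endpoint, token, image_ref):
    """Return (namespace, pod name) pairs for pods labeled with image_ref."""
    configuration = client.Configuration()
    configuration.host = api_endpoint                     # from cluster registration
    configuration.api_key = {"authorization": "Bearer " + token}
    v1 = client.CoreV1Api(client.ApiClient(configuration))

    # Hypothetical label scheme: a short hash of the full image reference,
    # since raw "<host>/<namespace>/<repo>:<tag>" strings exceed label limits.
    image_hash = hashlib.sha256(image_ref.encode("utf-8")).hexdigest()[:16]
    selector = "bridge.quay.io/image=%s" % image_hash

    pods = v1.list_pod_for_all_namespaces(label_selector=selector)
    return [(p.metadata.namespace, p.metadata.name) for p in pods.items]
```

The same selector also works from the CLI for debugging, e.g. `kubectl get pods --all-namespaces -l bridge.quay.io/image=<hash>`.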
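And a hedged sketch of the tag-deletion guard described above. `REGISTRY_HOST`, `get_registered_clusters()`, and `pods_running_image()` are hypothetical placeholders (the latter being the cluster query sketched above), and the wrapped signature is illustrative rather than the actual Quay API resource method.

```python
# Hedged sketch of the proposed guard on tag deletion. The helpers and the
# exception type are placeholders, not existing Quay internals; in Quay this
# would wrap the DELETE /v1/repository/<repository>/tag/<tag> resource method.
from functools import wraps

REGISTRY_HOST = "quay.example.com"  # illustrative


class TagInUseError(Exception):
    """The tag is still running on at least one registered cluster."""


def block_if_tag_is_live(view_func):
    @wraps(view_func)
    def wrapper(self, repository, tag, *args, **kwargs):
        image_ref = "%s/%s:%s" % (REGISTRY_HOST, repository, tag)
        # Synchronous fan-out to every registered cluster; this is the latency
        # concern noted above, tolerable because tag deletion is not hot-path.
        for cluster in get_registered_clusters():          # hypothetical DB lookup
            live = pods_running_image(cluster.api_endpoint, cluster.auth_token,
                                      image_ref)
            if live:
                raise TagInUseError("%s is running on cluster %s (%d pods)"
                                    % (image_ref, cluster.name, len(live)))
        return view_func(self, repository, tag, *args, **kwargs)
    return wrapper
```

Raising instead of silently deleting keeps the behavior explicit; a `?force=true` escape hatch could bypass the check if we decide it is too strict.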
### Registering Clusters

- Start with adding k8s clusters at the entire registry level (Quay instance)
  - Later, individual organizations, users, and teams can add clusters using the same UX
- Create a `KubernetesClusterAccess` database model
  - Joey: We might want to use a slightly different name, since if these are going to have permissions on them, `KubernetesClusterAccessPermission` reads oddly.
  - Contains cluster name (and other metadata), API endpoint, and auth token (later encrypted)
  - Initially, restrict to one registration per cluster using a `UNIQUE` index on `api_endpoint`
    - Question: Performance implications of a migration which _removes_ a `UNIQUE` index?
      - Joey: Probably okay, but unless we require it early on, let's just check it client-side
  - Question: How does access to a cluster relate to individual Quay users?
    - `KubernetesClusterAccess` has a foreign key field to the specific user who created it?
    - New table (`UserKubernetesClusterAccess` or something) which links a `User` to a `KubernetesClusterAccess`
    - Joey: I recommend we have every `KubernetesClusterAccess` have an `owner` field pointing to the namespace that owns the cluster (and `NULL` for registry-wide). Then we define a new table called `KubernetesClusterPermission` with columns `cluster_id`, `team_id`, and `user_id` (similar to how we do for repositories), with the `team_id` or `user_id` indicating permission for that team/user. Cluster-wide ones would have no such entries. (A model sketch follows this section.)
- New internal Quay CRUD API for managing `KubernetesClusterAccess` objects
  - Eventually, a UI for adding clusters
  - Later still, an OAuth flow with refresh tokens
- All Kubernetes calls happen in the browser client app using encrypted token(s)
  - Add a new endpoint to NGINX (like `/k8s/<cluster-endpoint>/<some-k8s-api-call>`)
    - `proxy_pass` to the actual cluster API endpoint
    - Use `auth_request` to decrypt the token (same as the `/_storage_proxy/` endpoint); a sketch of the auth handler follows this section
    - For more security, maybe whitelist only certain k8s resources in the NGINX proxy
      - Joey: :+1:
  - Send along a JWT to the browser containing all the tokens a user has permission to use
    - Super useful NGINX module --> http://nginx.org/en/docs/http/ngx_http_auth_jwt_module.html
    - Low expiration time and whitelisted k8s paths
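A minimal sketch of the two tables as Joey describes them, assuming Quay's existing peewee models (`BaseModel`, `User`, `Team` in `data.database`). Field names and types are illustrative rather than an actual schema, and token encryption is left out.

```python
# Hedged sketch of the proposed tables, roughly following Joey's suggestion.
# Field names/types are assumptions, not the real Quay schema; encryption of
# the stored token is elided here.
from peewee import CharField, ForeignKeyField, TextField

from data.database import BaseModel, Team, User


class KubernetesClusterAccess(BaseModel):
    name = CharField()
    api_endpoint = CharField(unique=True)  # one registration per cluster, for now
    auth_token = TextField()               # ServiceAccount token; encrypt later
    # NULL owner == registered registry-wide; otherwise the owning namespace.
    owner = ForeignKeyField(User, null=True, backref="kubernetes_clusters")


class KubernetesClusterPermission(BaseModel):
    # Mirrors repository permissions: exactly one of team/user is expected to be set;
    # registry-wide clusters simply have no rows here.
    cluster = ForeignKeyField(KubernetesClusterAccess, backref="permissions")
    team = ForeignKeyField(Team, null=True)
    user = ForeignKeyField(User, null=True)
```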
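And a hedged sketch of the `auth_request` subrequest handler for the proposed `/k8s/...` NGINX location, loosely modeled on the `/_storage_proxy/` idea above. The route, header names, whitelist, and `decrypt_cluster_token()` helper are all assumptions, not existing Quay code; NGINX would pick up the returned header via `auth_request_set` and forward it upstream.

```python
# Hedged sketch of the auth_request subrequest handler for a /k8s/... proxy
# location. Route, header names, and the whitelist are assumptions; NGINX would
# forward X-K8s-Authorization to the cluster API via auth_request_set.
from flask import Blueprint, abort, request

k8s_proxy = Blueprint("k8s_proxy", __name__)

# Kubernetes API path prefixes the proxy is willing to forward (illustrative).
ALLOWED_K8S_PATHS = ("/api/v1/pods", "/api/v1/namespaces")


def decrypt_cluster_token(ciphertext):
    """Placeholder: Quay would decrypt with its existing field-encryption keys."""
    raise NotImplementedError


@k8s_proxy.route("/_k8s_proxy/auth")
def k8s_proxy_auth():
    # NGINX passes along the original k8s path so the whitelist can also be
    # enforced here, not only in the NGINX location blocks.
    original_path = request.headers.get("X-Original-K8s-Path", "")
    if not original_path.startswith(ALLOWED_K8S_PATHS):
        abort(403)

    encrypted = request.headers.get("X-Encrypted-Cluster-Token")
    if not encrypted:
        abort(401)

    token = decrypt_cluster_token(encrypted)
    return "", 200, {"X-K8s-Authorization": "Bearer " + token}
```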
# Brainstorming Notes

### 5/7/2020

#### Goals

1. Determine how k8s clusters get registered with Quay
   - Should `KubernetesClusterAccess` have a `user` field?
     - Yes, they need an `owner`
     - No, have a separate table that maps users <-> cluster access
   - Superuser-defined whitelist of clusters (feature flag)
     - Two modes: open (register access at the namespace level) and closed (superuser-defined whitelist)
2. Determine the first feature that takes advantage of cluster access
   - When looking at a tags list, show all clusters running this tag
   - When deleting a tag, warn me if it is running on clusters

#### Action Items

- [x] Close https://github.com/quay/quay/pull/300
- [ ] Start a spike branch off `python3` working on the NGINX proxy to identify roadblocks early

# Thought

What if we make a dedicated section in the OpenShift console for the Quay registry?

### Pros

- Reflected by some CRD instance (like `QuayIntegration`)
- No more need to deal with Quay/k8s auth or a proxy (just use the Quay API from k8s)
- Can use Quay webhook notifications to some cluster `Service` for some things (would need to add more supported events to Quay)
- Integration of OpenShift user management with Quay RBAC
  - Easily add users/teams/orgs to Quay from existing OpenShift users
- "Single pane of glass" (ugh)

### Cons

- Doesn't help with "warn when deleting tag"
- Potentially a lot of hitting the Quay API
- How does RBAC work?
- In multicluster, which UI do you use to interact with a single Quay?
- Could interfere with hopes of building a shiny new Quay UI

# User Stories

## Personas

**Cluster Admin**: In charge of one or more Kubernetes clusters, with root access.

**OpenShift User**: Has access to one or more namespaces on a cluster.

**Quay Admin**: In charge of one Quay registry, with superuser access.

**Quay User**: Member of one or more teams in a Quay registry.

## Stories

**Initiative**: As a cluster admin, I want more *insight*, *control*, and *security* regarding the container images running on my clusters in order to enhance the experience for my developers and ensure the reliability of the workloads on my clusters.

**Story**: As a Quay user, I need to know if deleting a tag will affect any running pods in order to prevent breaking my Kubernetes workloads.

**Story**: As an OpenShift admin, I want to prevent container images with vulnerabilities from running on my clusters in order to ensure the security of my cluster workloads.

**Story**: As an OpenShift user, I want to see if a newer tag with no vulnerabilities exists in order to easily update my pod's image and keep my workloads secure.

**Story**: As an OpenShift user, I want to be able to revert my deployments to use an older version of a tag in order to have more confidence when pulling tags in production.

**Story**: As an OpenShift user, I need to know if the tag my pod is running will expire soon in order to prevent downtime from failing to pull the tag.

**Story**: As an OpenShift admin, I want to know what container images are running on my clusters in order to have a better understanding of the workloads on my clusters.

---

**Epic**: As an OpenShift user, I want the build system to push and pull my container images to a registry with a graphical user interface in order to have a delightful, container-native development experience.

**Story**: As an OpenShift user, I want to know when my teammate pushes a new tag to our repository in order to speed up my development cycle.

**Story**: As an OpenShift user, I want to prevent creating a pod which runs a tag that does not exist, in order to reduce headaches from dumb mistakes.

**Story**: As an OpenShift user, I want to know if the tag my container is running has changed or been deleted in order to know that I need to update it or restore the tag.

**Story**: As an OpenShift user, I want to know if the container image I am about to use has any vulnerabilities _before_ creating a pod, in order to keep SecOps from yelling at me.

**Story**: As an OpenShift user, I want to see exactly which environment variables, entrypoints, ports, and mountable volumes a container image has _before_ creating a pod, in order to avoid misconfiguration.

**Story**: As an OpenShift user, I want to be able to specify that a tag should be seeded into the cluster as soon as it's pushed in order to speed up my development experience.

**Story**: As an OpenShift user, I want a link to the container image repository when looking at a pod in order to understand more about the container image.

---

**Epic**: As an OpenShift admin, I want unified role-based access control between users of my clusters and my container registry in order to easily manage the security of my cluster workloads.

**Story**: As an OpenShift user, I want to authenticate with a Quay registry using my OpenShift credentials in order to not have to remember another password.

**Story**: As an OpenShift user, I want to know there is a pull secret present for the container image I am about to use _before_ creating a pod, in order to reduce headaches from dumb mistakes.

**Story**: As an OpenShift user, I want to be able to choose a robot account to use as a pull secret when creating a pod in order to avoid the extra step of creating one manually.

**Story**: As a Quay user, I need to know if deleting a robot account will affect any of my running pods in order to prevent breaking my Kubernetes workloads.