---
title: Informers Based on RBAC
authors:
  - "@awgreene"
reviewers:
  - TBD
approvers:
  - TBD
creation-date: 2022-07-28
last-updated: 2022-07-28
status: provisional
tags: Descoped Operators
---

# Informers Based on RBAC

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA

## Background

[descoping-plan]: https://hackmd.io/DTVukSyGSkSLwD9WgZsdbw

To learn more about the motivation behind Descoped Operators, please review:

- OLM's [Descoping Plan][descoping-plan].
- The [Introduction to Scoped Operators Document](https://hackmd.io/luRu5dE2R924qojSeOaugA).
- The [Scoped Cache PoC Presentation Document](https://hackmd.io/NoXbbOgOQEe2Q6IWCpSBzg).
- TODO: Merge the presentation document with this document.

## Summary

As part of the [Operator Framework](https://github.com/operator-framework)'s effort to move towards Descoped Operators, we must identify how an operator could be configured to reconcile events in specific namespaces based on available [Role Based Access Controls (RBAC)](https://kubernetes.io/docs/reference/access-authn-authz/rbac/). The Operator Framework ultimately hopes to fulfill the following problem statement:

> As an operator author, I want to develop an operator that can handle changing permissions so that cluster admins can use [Role Based Access Controls (RBAC)](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) to scope the permissions given to my operator.

## Motivation

To support the `descoped operator` model, the Operator Framework will need to identify how a descoped operator can:

- Manage its informers for the resources it cares about.
- Manage its cache as the operator gains or loses RBAC.

### Goals

- Document how an operator is "scoped" today.
- Document the limitations of the existing approaches.
- Propose how we would like operators to be configured to resolve these limitations.
### Non-Goals

- Describe how the proposed solution could be backwards compatible with OLM.
- Capture how Cluster Administrators could generate the operator's RBAC in a specific set of namespaces. Please review [the oria operator documentation](https://hackmd.io/DmUu4EUWQhqmB4mCwLNFIg) if you'd like to learn more about this problem space.

## Proposal

### Background

Operators built with [Kubebuilder](https://github.com/kubernetes-sigs/kubebuilder) and the [Operator SDK](https://github.com/operator-framework/operator-sdk) rely on [Controller Runtime](https://github.com/kubernetes-sigs/controller-runtime) to build the cache used by their operator. Controller Runtime currently supports two forms of caches:

- The [Informer Cache](https://github.com/kubernetes-sigs/controller-runtime/blob/a759a0d516760550d07531c91717831994e42c94/pkg/cache/informer_cache.go#L48-L51)
- The [Multi Namespace Cache](https://github.com/kubernetes-sigs/controller-runtime/blob/a759a0d516760550d07531c91717831994e42c94/pkg/cache/multi_namespace_cache.go#L73-L82)

The cache used by the operator can be set via the [options.NewCacheFunc](https://github.com/kubernetes-sigs/controller-runtime/blob/7a5d60deb3b6a5d6cbec4ba9c56b60549ab9de67/pkg/manager/manager.go#L251-L253) parameter passed into the [New Manager Function](https://github.com/kubernetes-sigs/controller-runtime/blob/7a5d60deb3b6a5d6cbec4ba9c56b60549ab9de67/pkg/manager/manager.go#L335-L436):

- The `Informer Cache` is the default cache and can be configured to watch all namespaces or a single namespace. To configure it to watch a single namespace, set the [options.Namespace](https://github.com/kubernetes-sigs/controller-runtime/blob/7a5d60deb3b6a5d6cbec4ba9c56b60549ab9de67/pkg/manager/manager.go#L207-L213) parameter when creating a new `Manager`.
- The `Multi Namespace Cache` is used when `options.NewCacheFunc` is set to the [cache.MultiNamespacedCacheBuilder](https://github.com/kubernetes-sigs/controller-runtime/blob/7a5d60deb3b6a5d6cbec4ba9c56b60549ab9de67/pkg/cache/multi_namespace_cache.go#L40-L71) function. This cache can be configured to only set watches in a specific set of namespaces.

Caches represent an operator's view of the resources available on the cluster. When an operator attempts to interact with a resource, the cache will:

- Populate itself with a `list` call.
- Establish a `watch` to mirror changes made to the resources by outside entities.

This means that an operator always needs appropriate `list` and `watch` permissions when configuring the cache. "Appropriate" `list` and `watch` permissions are defined by each cache. In the case of the `Informer Cache`, the operator must have:

- `list` and `watch` permissions for the GVK at the cluster level if not constrained to a single namespace.
- `list` and `watch` permissions for the GVK at the namespace level if configured to a single namespace.

In the case of the `Multi Namespace Cache`, the operator must have `list` and `watch` permissions for the GVK of the resource in all namespaces it is configured to watch.

This introduces a number of issues concerning the [Operator Framework](https://github.com/operator-framework/)'s [Descoping Plan][descoping-plan], in which:

- An operator reconciles events for an API it introduces across the cluster.
- An operator is scoped to namespaces based on available RBAC. (Example: My Pod Operator watches its FooPod CRD across the cluster but can only `get`, `list`, `watch`, `create`, `delete`, and `update` pods in the `bar` namespace.)

Let's discuss the key issues below.

### Key Issues with existing Caches

#### Scalability of creating namespaced `watches`

Many customers are uncomfortable with granting operators cluster wide `list` and `watch` permissions.
As discussed earlier, the existing cache implementations rely on certain `list` and `watch` permissions to populate the cache.

The existing `Multi Namespace Cache` does not scale well. For each resource in the cache, a `watch` stream is established in every namespace given to the Manager function at startup. This cache was designed to limit an operator to a small number of namespaces; the more namespaces used with this cache, the less performant it becomes.

POSSIBLE SOLUTION: The number of watches could be limited by creating them only as needed. In this scenario, a `watch` would be established when processing the CR, and the operator could provide context to the cache regarding how to configure the watch. For example, if my CR specified that I needed to configure the cache for `secrets` in a specific namespace, the watch could be established when the CR is processed. Likewise, the watch could be removed when all CRs that rely on it are deleted.

#### Caches for Individual Resources

Neither cache supports watching specific resources within a namespace. Consider the scenario in which a CR allows a user to specify a single secret that the operator must `watch` or interact with. There are many instances where an operator may not need to watch all events related to a resource type in a namespace. For example, the [ingress controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/) only establishes a watch on secrets specified in a CR; it does not receive events for other secrets in the namespace.

#### Scopes of Watched Resources must be consistent

Neither of Controller Runtime's cache implementations supports populating GVKs at different scopes. The caches are populated for each GVK either at the cluster level or within the set of provided namespaces.
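The lazy-watch idea sketched under POSSIBLE SOLUTION above can be modeled with a small reference-counted watch registry: a watch is started the first time a CR needs it and stopped when the last CR that relied on it is deleted. This is a toy sketch, not the Controller Runtime API; `watchRegistry`, `EnsureWatch`, and `ReleaseWatch` are hypothetical names.

```go
package main

import "fmt"

// watchKey identifies a namespaced watch for a given GVK.
type watchKey struct {
	GVK       string
	Namespace string
}

// watchRegistry lazily creates watches and reference-counts the CRs
// that depend on them, so a watch is removed when its last CR goes away.
type watchRegistry struct {
	refs map[watchKey]int
}

func newWatchRegistry() *watchRegistry {
	return &watchRegistry{refs: map[watchKey]int{}}
}

// EnsureWatch is called while reconciling a CR; the watch is only
// established on first use.
func (r *watchRegistry) EnsureWatch(gvk, namespace string) {
	k := watchKey{gvk, namespace}
	if r.refs[k] == 0 {
		fmt.Printf("starting watch for %s in %s\n", gvk, namespace)
	}
	r.refs[k]++
}

// ReleaseWatch is called when a CR that needed the watch is deleted.
func (r *watchRegistry) ReleaseWatch(gvk, namespace string) {
	k := watchKey{gvk, namespace}
	if r.refs[k] == 0 {
		return
	}
	r.refs[k]--
	if r.refs[k] == 0 {
		delete(r.refs, k)
		fmt.Printf("stopping watch for %s in %s\n", gvk, namespace)
	}
}

// Active reports whether a watch is currently established.
func (r *watchRegistry) Active(gvk, namespace string) bool {
	return r.refs[watchKey{gvk, namespace}] > 0
}

func main() {
	reg := newWatchRegistry()
	reg.EnsureWatch("v1/Secret", "namespaceBar") // first CR: watch starts
	reg.EnsureWatch("v1/Secret", "namespaceBar") // second CR reuses it
	reg.ReleaseWatch("v1/Secret", "namespaceBar")
	fmt.Println(reg.Active("v1/Secret", "namespaceBar")) // one CR still relies on it
	reg.ReleaseWatch("v1/Secret", "namespaceBar")
	fmt.Println(reg.Active("v1/Secret", "namespaceBar")) // last CR gone, watch stopped
}
```

Note how the API-server load scales with the number of watches actually in use rather than with the number of namespaces configured at startup.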
### New Cache Proposal

In an effort to address the shortcomings of the existing cache implementations, this document proposes the introduction of a Dynamic Cache. Key features delivered by the new cache include:

- Establishing a `watch` for a GVK at the cluster level.
- Establishing a `watch` for a GVK in specific namespaces.
- Establishing a `watch` for a specific resource.
- Removing watches that are no longer required.

These key features deviate heavily from the existing cache implementations, which assume that watches are set either at the cluster level or at the namespace level. If the new cache is designed to handle multiple ways of establishing an informer, the onus must be placed on the Operator Author to specify how to configure each watch.

## Design Details

### Removal of Watches

Watches need to be removed when they are no longer necessary in order to reduce the load that each operator places on the API server. Consider a cluster with hundreds of operators using the `Multi Namespace Cache`: the strain placed on the API server by fulfilling all of their watch requests is unacceptable.

In instances where a watch is established by the cache while reconciling a CR, there is a need to track when the watch can be discontinued. Today, this is done by placing a finalizer on the CR and having the operator clean it up.

### Establishing Informers while Reconciling a CR

Many of the benefits of establishing a `watch` based on the existence of a CR that requires it are the same whether the watch is established at the cluster level, at the namespace level, or for a specific resource. These benefits include:

- Establishing a `watch` with the API server only if needed.
- The ability to remove a `watch` when it is no longer needed.
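The finalizer-based cleanup described under Removal of Watches can be sketched as a toy model: while the CR is live the reconciler ensures both the finalizer and the watch; once the CR is being deleted, the watch is torn down and the finalizer stripped. The `cr` type, `reconcile` function, and finalizer name below are hypothetical stand-ins; a real operator would use controller-runtime's client and object types.

```go
package main

import "fmt"

// watchFinalizer is a hypothetical finalizer name used to track watch cleanup.
const watchFinalizer = "example.com/watch-cleanup"

// cr is a minimal stand-in for a custom resource.
type cr struct {
	Name       string
	Deleted    bool // models a non-nil deletionTimestamp
	Finalizers []string
}

func hasFinalizer(c *cr) bool {
	for _, f := range c.Finalizers {
		if f == watchFinalizer {
			return true
		}
	}
	return false
}

func removeFinalizer(c *cr) {
	out := c.Finalizers[:0]
	for _, f := range c.Finalizers {
		if f != watchFinalizer {
			out = append(out, f)
		}
	}
	c.Finalizers = out
}

// reconcile models the finalizer-based flow: a live CR gets the finalizer
// and an on-demand watch; a CR being deleted has its watch stopped and the
// finalizer removed so the API server can finish the deletion.
func reconcile(c *cr, watches map[string]bool) {
	if c.Deleted {
		if hasFinalizer(c) {
			delete(watches, c.Name) // stop the watch tied to this CR
			removeFinalizer(c)
			fmt.Println("watch removed and finalizer cleared for", c.Name)
		}
		return
	}
	if !hasFinalizer(c) {
		c.Finalizers = append(c.Finalizers, watchFinalizer)
	}
	watches[c.Name] = true // establish the watch on demand
}

func main() {
	watches := map[string]bool{}
	obj := &cr{Name: "bar"}
	reconcile(obj, watches) // live CR: watch established, finalizer added
	fmt.Println(watches["bar"], hasFinalizer(obj))
	obj.Deleted = true
	reconcile(obj, watches) // deletion: watch removed, finalizer stripped
	fmt.Println(watches["bar"], hasFinalizer(obj))
}
```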
#### Establishing cluster wide informers

TODO

#### Establishing namespaced informers

Consider the following workflow with the Bar CR:

```yaml=
# The Bar CR
apiVersion: operators.io.operator-framework/v1
kind: Bar
metadata:
  name: sample
spec:
  secretsInNamespace: namespaceBar
```

- Operator Foo reconciles the Bar CR.
- Operator Foo attempts to list the secrets that exist in the `namespaceBar` namespace.
- Operator Foo's cache is not populated with secrets in the `namespaceBar` namespace.
- The `Dynamic Cache` establishes a `watch` for all secrets in the `namespaceBar` namespace and populates the cache with a `list`.
- Operator Foo is returned the result of the `get` request for the `namespaceBar/secretBar` secret.

The above scenario would only require `list`, `watch`, and `get` permissions for secrets in the `namespaceBar` namespace, so these permissions would not be required at the cluster level.

The user could specify this in code like:

```go=
<INSERT EXAMPLE CODE HERE
```

::: info
**NOTE**: The current PoC implementation would allow for this, but it only allows informers to be created in one namespace per CR. For example, for a given CR you would only be able to create informers in either namespace `foo` or `bar`, but not both.
:::

#### Establishing an Informer for a Specific Resource

Consider the following workflow with the Bar CR:

```yaml=
# The Bar CR
apiVersion: operators.io.operator-framework/v1
kind: Bar
metadata:
  name: sample
spec:
  secret:
    name: secretBar
    namespace: namespaceBar
```

- Operator Foo reconciles the Bar CR.
- Operator Foo attempts to get the `namespaceBar/secretBar` secret.
- Operator Foo's cache is not populated for the `namespaceBar/secretBar` secret.
- The `Dynamic Cache` establishes a `watch` for just the `namespaceBar/secretBar` secret and populates the cache with a `list`.
- Operator Foo is returned the result of the `get` request for the `namespaceBar/secretBar` secret.

The above scenario would only require `list`, `watch`, and `get` permissions on the `namespaceBar/secretBar` secret, so these permissions would not be required at the cluster or namespace level.

The user could specify this in code like:

```go=
<INSERT EXAMPLE CODE HERE
```

::: info
**NOTE**: The current PoC implementation only has a way to create watches on a specific resource in the same namespace the CR is in.
:::

A major benefit of this approach is that an operator can be granted RBAC on specific resources.

### Scoping List/Watch RBAC for an API provided by the operator

TODO

---

[Bryce Palmer] - Proposed Changes (Add PoC details and demo from presentation document):

### PoC Details as of 08/26/2022

The [scoped-cache-poc](https://github.com/everettraven/scoped-cache-poc) is a Go library with a cache implementation (`ScopedCache`) that satisfies the `controller-runtime` `cache.Cache` interface. The `ScopedCache` provides operator authors with a dynamic caching layer that can be used to handle dynamically changing permissions. As of now, this library is ONLY the caching layer; it is up to the operator author to implement the logic to update the cache and handle permission changes appropriately.

#### How does it work?

The idea behind the `ScopedCache` is to create informers for resources as they are needed. This means:

- Informers are only created when a CR is reconciled.
- Informers are only created for resources that are related to the CR.
- Informers only live as long as the corresponding CR does. If the CR is deleted, the corresponding informers should be stopped.

One assumption is that operators will always need to watch the CRs they reconcile at the cluster level.
In order to accomplish this, the `ScopedCache` is comprised of a couple of different caches:

- A `cache.Cache` that is used for everything that should be cluster scoped.
- A `NamespacedResourceCache`, which is a mapping of `Namespace` ---> `ResourceCache`.
  - A `ResourceCache` is a mapping of `types.UID` ---> `cache.Cache`.
  - The `types.UID` is the unique identifier of a given Kubernetes resource.

To properly use the `ScopedCache`, when reconciling a CR you need to create the corresponding watches. The workflow for creating these watches looks like:

- Create a new `cache.Cache` with options that scope the cache's view to only the objects created or referenced when reconciling the given CR.
- Add that `cache.Cache` to the `ScopedCache`:
  - Internally, the `ScopedCache` will create the correct mapping of `Namespace` ---> `Resource` ---> `cache.Cache` for the given CR and `cache.Cache`.
- Get or create the necessary informers for the CR from the `ScopedCache`.
- Configure the informers to handle changed permissions.
- Start the `cache.Cache` that corresponds to the CR being reconciled.
- Use `controller-runtime` utility functions to create watches with the informers from the `ScopedCache`.

Because adding caches for a CR to the `ScopedCache` is a deliberate process, any request made to the `ScopedCache` before any `ResourceCache`s have been created is assumed to be intended for the cluster scoped `cache.Cache`.

#### Demonstration

This demonstration is an adapted version of the [Operator SDK Memcached Operator Tutorial](https://sdk.operatorframework.io/docs/building-operators/golang/tutorial/). It utilizes the `scoped-cache-poc` library and the `ScopedCache` cache implementation, enabling the Memcached operator to handle dynamically changing permissions.
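Before walking through the demo, the layered lookup described above (a cluster scoped cache plus a `Namespace` ---> `ResourceCache` ---> `cache.Cache` mapping, falling back to the cluster scoped cache when no `ResourceCache`s exist) can be modeled with a small sketch. The types below are illustrative stand-ins, not the actual `scoped-cache-poc` implementation.

```go
package main

import "fmt"

type uid string

// store is a stand-in for cache.Cache: it maps object keys to values.
type store map[string]string

// resourceCache maps the UID of the owning CR to the cache created for it.
type resourceCache map[uid]store

// scopedCache mirrors the layered layout: one cluster scoped cache plus
// per-namespace, per-CR caches.
type scopedCache struct {
	cluster    store
	namespaced map[string]resourceCache // namespace -> CR UID -> cache
}

// AddResourceCache registers a CR-scoped cache for a namespace.
func (s *scopedCache) AddResourceCache(ns string, id uid, c store) {
	if s.namespaced[ns] == nil {
		s.namespaced[ns] = resourceCache{}
	}
	s.namespaced[ns][id] = c
}

// Get checks the CR-scoped caches for the namespace first; if none have
// been created, the request is assumed to target the cluster scoped cache.
func (s *scopedCache) Get(ns, key string) (string, bool) {
	if rc, ok := s.namespaced[ns]; ok {
		for _, c := range rc {
			if v, ok := c[key]; ok {
				return v, true
			}
		}
		return "", false
	}
	v, ok := s.cluster[key]
	return v, ok
}

func main() {
	sc := &scopedCache{
		cluster:    store{"memcached-sample": "cluster-scoped CR"},
		namespaced: map[string]resourceCache{},
	}
	// Before any ResourceCache exists, lookups fall back to the cluster cache.
	v, _ := sc.Get("allowed-one", "memcached-sample")
	fmt.Println(v)
	// After reconciling a CR, objects it owns live in a CR-scoped cache.
	sc.AddResourceCache("allowed-one", "cr-uid", store{"deploy": "memcached-deploy"})
	v, _ = sc.Get("allowed-one", "deploy")
	fmt.Println(v)
}
```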
The source code for this operator can be found on GitHub: https://github.com/everettraven/scoped-operator-poc/tree/test/scoped-cache

::: info
**NOTE**: When using the scoped-operator-poc repository, make sure to use the `test/scoped-cache` branch.
:::

#### Logic Flow of Operator

```mermaid
graph TD
    A[Request] --> B
    B(Get Memcached CR) --> C{Deleted?}
    C -- Yes --> D[Remove Cache for Memcached CR]
    C -- No --> E{Cache for Memcached CR exists?}
    E -- Yes --> F(Get Deployment for Memcached CR)
    E -- No --> G(Create Cache & Informers for Memcached CR)
    G --> F
    F --> H{Error?}
    H -- Yes --> I{NotFound?}
    H -- No --> J(Ensure Deployment Replicas == Memcached Size)
    J --> K(List Pods for Memcached CR)
    K --> L{Error?}
    L -- No --> M(Update Memcached Status.Nodes with Pods)
    I -- Yes --> N(Create Deployment for Memcached)
    N --> O{Forbidden?}
    O -- Yes --> P(Stop Informers && Remove Cache)
    O -- No --> J
    I -- No --> Q{Forbidden?}
    Q -- Yes --> P
    Q -- No --> R(Log Error)
    L -- Yes --> S{Forbidden?}
    S -- Yes --> P
    S -- No --> R
    D --> T[Remove Finalizer]
```

#### Step 1

Run `setup.sh` to:

- Delete the existing KinD cluster.
- Create a new KinD cluster.
- Apply RBAC to give the following cluster level permissions:
  - `create`, `delete`, `list`, `watch`, `patch`, `get`, and `update` permissions for `memcacheds` resources
  - `update` permissions for the `memcacheds/finalizers` sub-resource
  - `get`, `patch`, and `update` permissions for the `memcacheds/status` sub-resource
- Create namespaces `allowed-one`, `allowed-two`, and `denied`.
- Apply RBAC to give `create`, `delete`, `get`, `list`, `patch`, `update`, and `watch` permissions for `deployments` resources in the `allowed-one` and `allowed-two` namespaces.
- Apply RBAC to give `get`, `list`, and `watch` permissions for `pods` resources in the `allowed-one` and `allowed-two` namespaces.

#### Step 2

Run `redeploy.sh` to:

- Remove any existing deployments of the operator from the cluster.
- Build the image for the operator.
- Load the built image into the KinD cluster.
- Deploy the operator on the cluster.
- List the pods in the `scoped-memcached-operator-system` namespace so we can easily copy the pod name for when we take a look at the operator pod logs.

#### Step 3

Get the logs by running:

```
kubectl -n scoped-memcached-operator-system logs <pod-name>
```

We should see that the operator has started successfully:

```
1.6608304693692005e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": "127.0.0.1:8080"}
1.660830469369386e+09 INFO setup starting manager
1.6608304693696425e+09 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6608304693696718e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0818 13:47:49.369663 1 leaderelection.go:248] attempting to acquire leader lease scoped-memcached-operator-system/86f835c3.example.com...
I0818 13:47:49.373329 1 leaderelection.go:258] successfully acquired lease scoped-memcached-operator-system/86f835c3.example.com
1.6608304693733532e+09 DEBUG events Normal {"object": {"kind":"Lease","namespace":"scoped-memcached-operator-system","name":"86f835c3.example.com","uid":"ac82b4f7-a193-4d52-864b-315e1fc80ce1","apiVersion":"coordination.k8s.io/v1","resourceVersion":"564"}, "reason": "LeaderElection", "message": "scoped-memcached-operator-controller-manager-7b4c9bb485-7jlkh_666ab6d0-e194-43c8-8348-724608e1521e became leader"}
1.6608304693735454e+09 INFO Starting EventSource {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "source": "kind source: *v1alpha1.Memcached"}
1.6608304693736055e+09 INFO Starting Controller {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached"}
1.660830469473995e+09 INFO Starting workers {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "worker count": 1}
```

#### Step 4

Create some `Memcached` resources in the `allowed-one` and `allowed-two` namespaces by running:

```
kubectl apply -f config/samples/cache_v1alpha1_memcached.yaml
```

#### Step 5

Get the logs again:

```
kubectl -n scoped-memcached-operator-system logs <pod-name>
```

For each CR we should see logs signifying that:

- A cache has been created for the `Memcached` CR.
- Two event sources (watches) have been started (one for `Deployment`s created by the controller and one for `Pod`s created from the `Deployment`s).
- An attempt was made to get a deployment.
- A deployment was created.

The logs should look similar to:

```
1.6608306977001324e+09 INFO Creating cache for memcached CR {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-allowed-one","namespace":"allowed-one"}, "namespace": "allowed-one", "name": "memcached-sample-allowed-one", "reconcileID": "188d52ee-593c-4716-980f-b4fe500bdb6c", "CR UID:": "b2427753-f092-4e22-a633-4d39bea7a0c4"}
1.660830697700437e+09 INFO Starting EventSource {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "source": "informer source: 0xc0000a4640"}
1.6608306978010592e+09 INFO Starting EventSource {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "source": "informer source: 0xc0000a4780"}
1.6608306978010962e+09 INFO Getting Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-allowed-one","namespace":"allowed-one"}, "namespace": "allowed-one", "name": "memcached-sample-allowed-one", "reconcileID": "188d52ee-593c-4716-980f-b4fe500bdb6c"}
1.660830697801139e+09 INFO Creating a new Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-allowed-one","namespace":"allowed-one"}, "namespace": "allowed-one", "name": "memcached-sample-allowed-one", "reconcileID": "188d52ee-593c-4716-980f-b4fe500bdb6c", "Deployment.Namespace": "allowed-one", "Deployment.Name": "memcached-sample-allowed-one"}
1.6608306978104348e+09 INFO Creating cache for memcached CR {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-allowed-two","namespace":"allowed-two"}, "namespace": "allowed-two", "name": "memcached-sample-allowed-two", "reconcileID": "9bdb2035-db34-412b-bf2f-1496df56c134", "CR UID:": "690a5864-af57-4e6a-a75a-efce861d09cf"}
1.660830697810686e+09 INFO Starting EventSource {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "source": "informer source: 0xc0001b0e60"}
1.6608306978107378e+09 INFO Starting EventSource {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "source": "informer source: 0xc0001b10e0"}
1.6608306978107502e+09 INFO Getting Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-allowed-two","namespace":"allowed-two"}, "namespace": "allowed-two", "name": "memcached-sample-allowed-two", "reconcileID": "9bdb2035-db34-412b-bf2f-1496df56c134"}
1.6608306978107738e+09 INFO Creating a new Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-allowed-two","namespace":"allowed-two"}, "namespace": "allowed-two", "name": "memcached-sample-allowed-two", "reconcileID": "9bdb2035-db34-412b-bf2f-1496df56c134", "Deployment.Namespace": "allowed-two", "Deployment.Name": "memcached-sample-allowed-two"}
```

As the deployments are spun up and reconciled, the deployment may be modified. This operator sets ownership on deployments and will reconcile the parent `Memcached` CR whenever a child deployment is modified.
You may see a chunk of logs similar to this (example truncated to a couple of logs for brevity):

```
1.6608307072214928e+09 INFO Getting Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-allowed-one","namespace":"allowed-one"}, "namespace": "allowed-one", "name": "memcached-sample-allowed-one", "reconcileID": "fa336c14-f699-4f70-89d0-37631770441f"}
1.660830707233768e+09 INFO Getting Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-allowed-two","namespace":"allowed-two"}, "namespace": "allowed-two", "name": "memcached-sample-allowed-two", "reconcileID": "644fd96b-346b-47b8-8c36-784e1741bbbb"}
```

#### Step 6

Check the namespaces to see that the proper deployments are created:

```
kubectl -n allowed-one get deploy
```

The output should look like:

```
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
memcached-sample-allowed-one   2/2     2            2           13m
```

```
kubectl -n allowed-two get deploy
```

The output should look like:

```
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
memcached-sample-allowed-two   3/3     3            3           14m
```

#### Step 7

Let's see what happens when we create a `Memcached` CR in a namespace where the operator does not have the proper permissions.

Create a `Memcached` CR in the namespace `denied` by running:

```
cat << EOF | kubectl apply -f -
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: memcached-sample-denied
  namespace: denied
spec:
  size: 1
EOF
```

Check the logs; we should see:

```
1.6608487955810938e+09 INFO Creating cache for memcached CR {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "865240aa-1eac-48d0-9a64-56c2eec66b88", "CR UID:": "b49142b4-cc50-4465-969c-7257049247b6"}
1.660848795581366e+09 INFO Starting EventSource {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "source": {}}
1.6608487955813868e+09 INFO Starting EventSource {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "source": {}}
1.660848795581394e+09 INFO Getting Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "865240aa-1eac-48d0-9a64-56c2eec66b88"}
1.6608487971761699e+09 INFO Not permitted to get Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "865240aa-1eac-48d0-9a64-56c2eec66b88"}
1.6612814011258633e+09 INFO Removing cache for memcached CR due to invalid permissions {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "ba671951-e2da-4b2b-87d9-9b0667f0c608"}
1.6612814011259036e+09 INFO Removing ResourceCache {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "ba671951-e2da-4b2b-87d9-9b0667f0c608", "CR UID:": "9f60a7b1-dee6-4b6c-a729-420ec651c0dc", "ResourceCache": {"9f60a7b1-dee6-4b6c-a729-420ec651c0dc":{"Scheme":{}}}}
1.6612814011259542e+09 INFO ResourceCache successfully removed {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "ba671951-e2da-4b2b-87d9-9b0667f0c608", "CR UID:": "9f60a7b1-dee6-4b6c-a729-420ec651c0dc", "ResourceCache": {}}
```

We can see that we are also removing any caches that were created for this `Memcached` CR, to prevent unnecessary informers from hanging around.

Checking the `Memcached` CR with `kubectl -n denied describe memcached`, we can see the status:

```
Status:
  State:
    Message:  Not permitted to get Deployment: deployments.apps "memcached-sample-denied" is forbidden: Not permitted based on RBAC
    Status:   Failed
```

#### Step 8

Update the RBAC to give permissions in the `denied` namespace by running:

```
cat << EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: op-rolebinding-default
  namespace: denied
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: scoped-operator-needs
subjects:
- kind: ServiceAccount
  name: scoped-memcached-operator-controller-manager
  namespace: scoped-memcached-operator-system
EOF
```

After a little while we should see in the logs:

```
1.66085439100725e+09 INFO Creating a new Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "8bfc654a-e372-47c8-9be5-2cf89f654c34", "Deployment.Namespace": "denied", "Deployment.Name": "memcached-sample-denied"}
1.66085439102921e+09 INFO Getting Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "0bbc7b20-f392-45dc-a210-0d10ec58ff34"}
1.660854392647686e+09 INFO Getting Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "610f80fc-9d7e-46b1-ac25-5e5286fa97d2"}
```

We can see in the `Memcached` CR status that it has been successfully reconciled:

```
Status:
  Nodes:
    memcached-sample-denied-7685b99f49-tv2b8
  State:
    Message:  Deployment memcached-sample-denied successfully created
    Status:   Succeeded
```

We can also see that the deployment is up and running by running `kubectl -n denied get deploy`:

```
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
memcached-sample-denied   1/1     1            1           71s
```

#### Step 9

Now let's restrict access again by deleting the RBAC we applied to give permissions in the `denied` namespace:

```
kubectl -n denied delete rolebinding op-rolebinding-default
```

This change won't affect the existing `Memcached` CR since it has already been reconciled, but if we edit the existing `Memcached` CR or create a new one in the `denied` namespace, we will see these logs start to pop up again:

```
1.6608546666716454e+09 INFO Getting Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "b88c4f4f-4885-4bd4-a706-dc6b888dbca7"}
1.6608546666883416e+09 INFO Not permitted to get Deployment {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "b88c4f4f-4885-4bd4-a706-dc6b888dbca7"}
```

The `Memcached` CR status will again look like:

```
Status:
  Nodes:
    memcached-sample-denied-7685b99f49-tv2b8
  State:
    Message:  Not permitted to get Deployment: deployments.apps "memcached-sample-denied" is forbidden: Not permitted based on RBAC
    Status:   Failed
```

In this example I edited the existing `Memcached` CR to kick off the reconciliation loop, which is why the `Status.Nodes` field is still populated.

Another thing to note: if there is no reason for the reconciliation loop to run in the `denied` namespace, the existing watches won't be cleaned up. Eventually the watches will attempt to refresh and will encounter a `WatchError` because the permissions have been revoked. If not handled properly, this will cause the operator to enter a blocking loop where it continuously attempts to reconnect the watch.

In this operator, when creating informers we inject our own `WatchErrorHandler` that closes the channel used by the informers to stop them. We then remove the `ResourceCache` that did not have the proper permissions, so that when we reconcile a CR in that namespace again we attempt to recreate the informers in case the RBAC has changed. This handling of the `WatchError` prevents the blocking loop of continuously attempting to reconnect the watch.
In the Operator logs, this process looks like:

```
W0823 19:34:02.520559 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:167: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:scoped-memcached-operator-system:scoped-memcached-operator-controller-manager" cannot list resource "deployments" in API group "apps" in the namespace "denied"
1.661283242520624e+09 INFO Removing resource cache for memcached resource due to invalid permissions {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "967b9ba2-14f8-4a1b-bc14-a2802940d4a4", "memcached": {"apiVersion": "cache.example.com/v1alpha1", "kind": "Memcached", "namespace": "denied", "name": "memcached-sample-denied"}}
```

#### Step 10

Now let's delete the `Memcached` CR from the `denied` namespace entirely by running:

```
kubectl -n denied delete memcached memcached-sample-denied
```

Because the operator utilizes finalizers, our resource will not be deleted until the finalizer is removed. As part of the finalizer logic, we remove the cache for the `Memcached` CR that is being deleted.
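The deletion branch of the reconcile loop described above can be sketched as follows. This is a simplified illustration with invented names (`memcachedFinalizer`, `reconcileDelete`), not the PoC's real types; the real operator works against the Kubernetes API object's `DeletionTimestamp` and `Finalizers` fields.

```go
package main

import "fmt"

// memcachedFinalizer is an illustrative finalizer name.
const memcachedFinalizer = "cache.example.com/finalizer"

// memcached is a pared-down stand-in for the Memcached CR.
type memcached struct {
	uid               string
	finalizers        []string
	deletionRequested bool // stands in for a non-nil DeletionTimestamp
}

func hasFinalizer(m *memcached) bool {
	for _, f := range m.finalizers {
		if f == memcachedFinalizer {
			return true
		}
	}
	return false
}

func removeFinalizer(m *memcached) {
	out := m.finalizers[:0]
	for _, f := range m.finalizers {
		if f != memcachedFinalizer {
			out = append(out, f)
		}
	}
	m.finalizers = out
}

// reconcileDelete sketches the deletion branch of the reconcile loop: tear
// down the per-CR cache first, then drop the finalizer so the API server can
// complete deletion of the CR.
func reconcileDelete(m *memcached, cache map[string]struct{}) {
	if !m.deletionRequested || !hasFinalizer(m) {
		return
	}
	delete(cache, m.uid) // the "Removing ResourceCache" step from the logs
	removeFinalizer(m)
	fmt.Printf("cache entries: %d, finalizers: %d\n", len(cache), len(m.finalizers))
}

func main() {
	cache := map[string]struct{}{"eda9cac4": {}}
	m := &memcached{
		uid:               "eda9cac4",
		finalizers:        []string{memcachedFinalizer},
		deletionRequested: true,
	}
	reconcileDelete(m, cache)
}
```

The ordering matters: the cache is removed before the finalizer, so if cache teardown fails the finalizer keeps the CR around and the next reconcile retries the cleanup.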
We should see in the logs:

```
1.6608559969812284e+09 INFO Memcached is being deleted {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "a186a050-8ad7-4c92-855c-de59d7b371ea"}
1.6608559969812474e+09 INFO Removing ResourceCache {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "a186a050-8ad7-4c92-855c-de59d7b371ea", "CR UID:": "eda9cac4-c3c6-4da1-b920-f374748d40cb", "ResourceCache": {"eda9cac4-c3c6-4da1-b920-f374748d40cb":{"Scheme":{}}}}
1.6608559969813137e+09 INFO ResourceCache successfully removed {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "a186a050-8ad7-4c92-855c-de59d7b371ea", "CR UID:": "eda9cac4-c3c6-4da1-b920-f374748d40cb", "ResourceCache": {}}
1.660855996986491e+09 INFO Memcached resource not found. Ignoring since object must be deleted {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "aa1eef8d-90ac-428c-b434-d76abdcf167b"}
1.6608560039957695e+09 INFO Memcached resource not found. Ignoring since object must be deleted {"controller": "memcached", "controllerGroup": "cache.example.com", "controllerKind": "Memcached", "memcached": {"name":"memcached-sample-denied","namespace":"denied"}, "namespace": "denied", "name": "memcached-sample-denied", "reconcileID": "f51f0c62-0237-468b-8e55-7a6d03a0d400"}
```

---

## Drawbacks

### PoC Limitations as of 08/26/2022

- **Limitation**: `controller-runtime` is not designed in a way that easily enables this functionality. Several workarounds were needed to properly implement this logic.
  **Potential Solution**: Work with `controller-runtime` to implement changes that limit the need for workarounds and improve the user experience of creating scopeable operators.

- **Limitation**: This is only a caching layer. It is up to Operator Authors to implement the more complex logic needed to properly update the cache and handle changing permissions.
  **Potential Solution**: Provide a higher-level library that gives Operator Authors a better experience when developing operators that must handle changing permissions.

- **Limitation**: When using the scoped-cache-poc library, watches are currently recreated multiple times for the same resources due to the way informers are created. Informers returned by the `ScopedCache` are currently an aggregate of all informers available to an operator within the `NamespacedResourceCache`.
  **Potential Solution**: Provide a way to get informers from only a specific `ResourceCache`, enabling watches to be created only for the informers from that `ResourceCache`.
## Alternatives

### Dynamically configuring watches based on RBAC at startup

The Operator Framework team [explored](https://github.com/everettraven/scoped-informer-poc/tree/main/pkg/scoped) a solution in which watches were established for the `Multi Namespace Cache` based on available RBAC, but the approach was abandoned due to [performance issues](https://github.com/everettraven/scoped-operator-poc#performance-evaluation) and guidance from David Eads.

In this approach, an operator would perform a series of [SelfSubjectAccessReview](https://kubernetes.io/docs/reference/kubernetes-api/authorization-resources/self-subject-access-review-v1/) requests to determine:

- Which GVKs the operator could establish watches for at the cluster level
- Which GVKs the operator could establish watches for at the namespace level

The operator would use this information to configure watches for the cache accordingly. Some drawbacks to this approach included:

- On startup, the operator would spam the cluster with SelfSubjectAccessReview requests for each GVK that needed to be watched. This presented a performance hit on large clusters with thousands of namespaces.
- The operator would establish a watch for each GVK in each namespace where it had permissions. This presented the same scalability issue seen with the `Multi Namespace Cache`.
- The operator would need to be restarted in order to re-establish the watches in the cache whenever the operator's RBAC changed.

Ultimately, David Eads concluded that there was no feasible way for the API to support scoped operators at scale, and encouraged us to investigate an approach where watches are:

- Driven by CRs when possible
- Configured "just in time"
- Deleted when no longer needed
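The startup-cost concern can be made concrete with a back-of-the-envelope count. The function and the numbers below are illustrative assumptions, not measurements from the PoC: the model is one cluster-scoped SelfSubjectAccessReview per GVK, plus one namespaced review per GVK per namespace for any GVK lacking cluster-wide permission.

```go
package main

import "fmt"

// ssarRequests gives a rough count of SelfSubjectAccessReview requests the
// abandoned startup approach would issue: one cluster-scoped review per GVK,
// then one namespaced review per remaining GVK per namespace. Illustrative
// model only.
func ssarRequests(gvks, namespaces, clusterScopedGVKs int) int {
	namespacedGVKs := gvks - clusterScopedGVKs
	return gvks + namespacedGVKs*namespaces
}

func main() {
	// 10 watched GVKs, none permitted cluster-wide, on a 5000-namespace cluster.
	fmt.Println(ssarRequests(10, 5000, 0)) // 50010
}
```

Even with modest GVK counts, the namespaced term dominates on large clusters, which matches the drawback noted above: the request volume scales with the number of namespaces, and the whole exercise repeats on every operator restart.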
