implementable
To learn more about the motivation behind Descoping Operators, please review:
As part of the Operator Framework's effort to move towards Descoped Operators we must identify how an operator could be configured to reconcile events in specific namespaces based on available Role Based Access Controls (RBAC). The Operator Framework ultimately hopes to fulfill the following problem statement:
As an operator author I want to develop an operator that can handle changing permissions so that cluster admins can use Role Based Access Controls (RBAC) to scope the permissions given to my operator.
In an effort to support the descoped operator model, the Operator Framework will need to identify how a descoped operator can:
Operators built with Kubebuilder and the Operator SDK rely on Controller Runtime to build the cache used by their operator. Controller Runtime currently supports two forms of caches:
The cache used by the operator can be set by the options.NewCacheFunc parameter passed into the New Manager Function:
- Informer Cache is the default cache and can be configured to watch all namespaces or a single namespace. To configure it to watch a single namespace, set the `options.Namespace` parameter when creating a new Manager.
- Multi Namespace Cache is used when `options.NewCacheFunc` is set to use the `cache.MultiNamespacedCacheBuilder` function. This cache can be configured to set watches only in a specific set of namespaces.

Caches represent an operator's view of the resources available on the cluster. When an operator attempts to interact with a resource, the cache will:

- Populate itself with a `list` call.
- Establish a `watch` to mirror changes to resources in the cache made by outside entities.

This means that an operator always needs appropriate `list` and `watch` permissions when configuring the cache. "Appropriate" `list` and `watch` permissions are defined by each cache.
In the case of the Informer Cache, the operator must have:

- `list` and `watch` permissions for the GVK at the cluster level if not constrained to a single namespace
- `list` and `watch` permissions for the GVK at a namespace level if configured to a single namespace

In the case of the Multi Namespace Cache, the operator must have `list` and `watch` permissions for the GVK of the resource in all namespaces it is configured to.
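The permission requirements above can be summarized in a small sketch. It does not use Controller Runtime; `cacheMode`, `clusterScope`, and `requiredScopes` are hypothetical names introduced only to illustrate where `list` and `watch` must be granted for each cache configuration.

```go
package main

import "fmt"

// cacheMode is a hypothetical label for the cache configurations described above.
type cacheMode int

const (
	informerAllNamespaces   cacheMode = iota // Informer Cache watching every namespace
	informerSingleNamespace                  // Informer Cache with options.Namespace set
	multiNamespace                           // Multi Namespace Cache
)

// clusterScope marks permissions that must be granted at the cluster level.
const clusterScope = "<cluster>"

// requiredScopes returns where list and watch must be granted for a GVK.
// An empty namespace list is treated as "watch everything".
func requiredScopes(mode cacheMode, namespaces ...string) []string {
	switch {
	case mode == informerSingleNamespace && len(namespaces) > 0:
		return []string{namespaces[0]}
	case mode == multiNamespace && len(namespaces) > 0:
		// Copy so callers cannot mutate the input slice through the result.
		return append([]string(nil), namespaces...)
	default:
		return []string{clusterScope}
	}
}

func main() {
	fmt.Println(requiredScopes(informerAllNamespaces))
	fmt.Println(requiredScopes(informerSingleNamespace, "foo"))
	fmt.Println(requiredScopes(multiNamespace, "foo", "bar"))
}
```

In every mode the permissions are fixed when the cache is configured, which is the limitation the rest of this document addresses.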
This introduces a number of issues concerning the Operator Framework's Descoping Plan in which:
`get`, `list`, `watch`, `create`, `delete`, `update` pods in the bar namespace).

Let's discuss the key issues below.
watches
Many customers are uncomfortable with granting operators cluster-wide `list` and `watch` permissions. As discussed earlier, the existing cache implementation relies on certain `list` and `watch` permissions to populate the cache.
The existing Multi Namespace Cache does not scale well. For each resource in the cache, a `watch` stream is established in the namespaces given to the Manager function at startup. This cache was designed to limit an operator to a small number of namespaces. The more namespaces that are used with this cache, the less performant it becomes.
POSSIBLE SOLUTION:
The number of watches that we need to create could be limited by creating the watches as needed. In this scenario, `watches` would be established when processing the CR, and the operator could provide context to the cache regarding how to configure the watch. For example, if my CR specified that I needed to configure the cache for `secrets` in a specific namespace, the watch could be established when the CR is processed. Likewise, the watch could be eliminated when all CRs that rely on the watch are deleted.
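The lifecycle described above amounts to reference counting watches by the CRs that need them. The following is a minimal sketch of that bookkeeping only; `watchKey`, `watchRegistry`, `Require`, and `Release` are hypothetical names, and a real cache would start and stop informers where the comments indicate.

```go
package main

import (
	"fmt"
	"sync"
)

// watchKey identifies a watch by resource kind and namespace (hypothetical type).
type watchKey struct {
	Kind      string
	Namespace string
}

// watchRegistry tracks which CRs depend on each watch.
type watchRegistry struct {
	mu     sync.Mutex
	owners map[watchKey]map[string]struct{} // CRs that need each watch
	stops  map[watchKey]chan struct{}       // closed to stop the watch
}

func newWatchRegistry() *watchRegistry {
	return &watchRegistry{
		owners: map[watchKey]map[string]struct{}{},
		stops:  map[watchKey]chan struct{}{},
	}
}

// Require records that cr needs the watch and reports whether a new watch was started.
func (r *watchRegistry) Require(cr string, key watchKey) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	started := false
	if _, ok := r.owners[key]; !ok {
		r.owners[key] = map[string]struct{}{}
		r.stops[key] = make(chan struct{}) // a real cache would start an informer here
		started = true
	}
	r.owners[key][cr] = struct{}{}
	return started
}

// Release drops cr's claim on every watch and stops the watches no CR needs anymore.
// It returns the number of watches that were stopped.
func (r *watchRegistry) Release(cr string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	stopped := 0
	for key, crs := range r.owners {
		delete(crs, cr)
		if len(crs) == 0 {
			close(r.stops[key]) // signals the informer for this watch to stop
			delete(r.stops, key)
			delete(r.owners, key)
			stopped++
		}
	}
	return stopped
}

// Active returns the number of watches currently established.
func (r *watchRegistry) Active() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.owners)
}

func main() {
	reg := newWatchRegistry()
	secrets := watchKey{Kind: "Secret", Namespace: "foo"}
	fmt.Println(reg.Require("foo/cr-a", secrets)) // first CR starts the watch
	fmt.Println(reg.Require("foo/cr-b", secrets)) // second CR reuses it
	fmt.Println(reg.Release("foo/cr-a"), reg.Active())
	fmt.Println(reg.Release("foo/cr-b"), reg.Active())
}
```

An operator would typically call something like `Release` from its finalizer logic, so the last CR to be deleted tears the watch down.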
Neither cache supports watching specific resources within a namespace. Consider the scenario in which a CR allows a user to specify a single secret that the operator must `watch` or interact with. There are many instances where an operator may not need to watch all events related to a resource type in a namespace. For example, the ingress controller only establishes a watch on secrets specified in a CR; it does not receive events for other secrets in the namespace.
Neither of Controller Runtime's cache implementations supports populating GVKs at different scopes. The caches are populated for each GVK at the cluster level OR within the set of provided namespaces.
In an effort to address the shortcomings surrounding the existing cache implementations, this document proposes the introduction of a Dynamic Cache.
Key features delivered by the new Cache include:
- `watch` for a GVK at the cluster level.
- `watch` for a GVK in specific namespaces.
- `watch` for a specific resource.

These key features deviate heavily from the existing cache implementations, which assume that watches are set either at the cluster level or at the namespace level. If the new cache is designed to handle multiple ways to establish an informer, the onus must be placed on the operator author to specify how to configure the watch.
There is a need to remove `watches` when they are no longer necessary in an effort to reduce the load that each operator places on the API server. Consider the scenario where the API server must fulfill the watch requests on a cluster with hundreds of operators: the strain placed on the API server is unacceptable when using the Multi Namespace Cache.
In instances where a watch is established by the Cache when reconciling a CR, there is a need to track when the watch can be discontinued. Today, this is done by placing a finalizer on the CR and having the operator clean it up.
Many of the benefits of establishing a `watch` based on the existence of a CR that requires it are consistent whether the watch is established at the cluster level, at the namespace level, or for a specific resource. These benefits include:

- Establishing a `watch` with the API server only if needed.
- Removing the `watch` if no longer needed.

TODO
Consider the following workflow with the Bar CR:
1. `namespaceBar` namespace.
2. `namespaceBar` namespace.
3. Dynamic Cache establishes a `watch` for all secrets in the `namespaceBar` namespace and populates the cache with a `list`.
4. `get` request for the `namespaceBar/secretBar` secret

The above scenario would only require `list`, `watch`, and `get` permissions for secrets in the `namespaceBar` namespace, so these permissions would not be required at the cluster level. The user could specify this in code like:
NOTE: The current PoC implementation would allow for this, but it only allows informers to be created in one namespace per CR. For example, for a given CR you would only be able to create informers in either namespace `foo` or `bar`, but not both.
Consider the following workflow with the Bar CR:
1. `namespaceBar/secretBar` secret
2. `namespaceBar/secretBar` secret.
3. Dynamic Cache establishes a `watch` for just the `namespaceBar/secretBar` secret and populates the cache with a `list`.
4. `get` request for the `namespaceBar/secretBar` secret

The above scenario would only require `list`, `watch`, and `get` permissions on the `namespaceBar/secretBar` secret, so these permissions would not be required at the cluster or namespace level. The user could specify this in code like:
NOTE: The current PoC implementation only has a way to create watches on a specific resource in the same namespace a CR is in.
A major benefit to this approach is that an operator can be granted RBAC on specific resources.
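To make the difference between the namespace-level and resource-level scenarios concrete, the sketch below builds the RBAC rule each one would need. The `policyRule` struct is a simplified stand-in that mirrors a few fields of a Kubernetes RBAC rule, and `secretRule` and `fieldSelector` are hypothetical helpers. One relevant Kubernetes behavior: when `list` or `watch` is restricted with `resourceNames`, the request must carry a matching `metadata.name` field selector to be authorized.

```go
package main

import "fmt"

// policyRule is a simplified stand-in for a namespaced RBAC rule.
type policyRule struct {
	Namespace     string   // namespace of the Role that holds the rule
	Verbs         []string // verbs granted by the rule
	Resources     []string // resource types the rule applies to
	ResourceNames []string // empty means every object of the resource type
}

// secretRule builds the rule needed to read secrets in a namespace.
// A non-empty name narrows the rule to that single secret.
func secretRule(namespace, name string) policyRule {
	rule := policyRule{
		Namespace: namespace,
		Verbs:     []string{"get", "list", "watch"},
		Resources: []string{"secrets"},
	}
	if name != "" {
		rule.ResourceNames = []string{name}
	}
	return rule
}

// fieldSelector returns the selector a list or watch request would carry
// when it targets a single named object; it is empty for namespace-wide watches.
func fieldSelector(name string) string {
	if name == "" {
		return ""
	}
	return "metadata.name=" + name
}

func main() {
	// Namespace-level scenario: all secrets in namespaceBar.
	fmt.Printf("%+v\n", secretRule("namespaceBar", ""))

	// Resource-level scenario: only namespaceBar/secretBar.
	fmt.Printf("%+v selector=%q\n", secretRule("namespaceBar", "secretBar"), fieldSelector("secretBar"))
}
```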
TODO
[Bryce Palmer] - Proposed Changes (Add PoC details and demo from presentation document):
The scoped-cache-poc is a Go library that has a cache implementation (`ScopedCache`) that satisfies the `controller-runtime` `cache.Cache` interface. This `ScopedCache` provides operator authors a dynamic caching layer that can be used to handle dynamically changing permissions.
As of now, this library is ONLY the caching layer. It is up to the operator author to implement the logic to update the cache and handle the permission changes appropriately.
The idea behind the `ScopedCache` is to create informers for resources as they are needed. This means:
One assumption is that Operators will always need to watch for CRs they reconcile at the cluster level.
In order to accomplish this, the `ScopedCache` is composed of a couple of different caches:

- `cache.Cache` that is used for everything that should be cluster scoped
- `NamespacedResourceCache` that is a mapping of `Namespace` --> `ResourceCache`
  - `ResourceCache` is a mapping of `types.UID` --> `cache.Cache`
  - `types.UID` is the unique identifier of a given Kubernetes Resource

To properly use the `ScopedCache`, when reconciling a CR you would need to create the corresponding watches. The workflow for creating these watches would look like:
1. `cache.Cache` with options that scope the cache's view to only objects created or referenced when reconciling the given CR
2. `cache.Cache` above to the `ScopedCache`:
   - `ScopedCache` will create the correct mapping of `Namespace` --> `Resource` --> `cache.Cache` for a given CR and `cache.Cache`
3. `ScopedCache`
4. `cache.Cache` that corresponds to the CR being reconciled
5. `controller-runtime` utility functions to create watches with the informers from the `ScopedCache`
Because adding caches for a CR to the `ScopedCache` is a deliberate process, any request made to the `ScopedCache` before any `ResourceCaches` have been created is assumed to be intended for the cluster scoped `cache.Cache`.
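The nested mapping and the fallback rule described above can be sketched with plain Go maps. This is not the scoped-cache-poc code: `objectCache` stands in for the `cache.Cache` interface, `uid` stands in for `types.UID`, and the method names are hypothetical.

```go
package main

import "fmt"

// objectCache is a stand-in for controller-runtime's cache.Cache interface.
type objectCache interface{ Name() string }

// namedCache is a trivial objectCache used only for this illustration.
type namedCache string

func (n namedCache) Name() string { return string(n) }

// uid stands in for types.UID, the unique identifier of the CR that owns a cache.
type uid string

// resourceCache maps a CR's UID to the cache created while reconciling it.
type resourceCache map[uid]objectCache

// namespacedResourceCache maps a namespace to the resourceCache for that namespace.
type namespacedResourceCache map[string]resourceCache

// scopedCache combines one cluster scoped cache with the per-CR caches.
type scopedCache struct {
	clusterCache objectCache
	nsCache      namespacedResourceCache
}

func newScopedCache(cluster objectCache) *scopedCache {
	return &scopedCache{clusterCache: cluster, nsCache: namespacedResourceCache{}}
}

// AddResourceCache registers the cache created for the CR identified by owner.
func (s *scopedCache) AddResourceCache(namespace string, owner uid, c objectCache) {
	if s.nsCache[namespace] == nil {
		s.nsCache[namespace] = resourceCache{}
	}
	s.nsCache[namespace][owner] = c
}

// RemoveResourceCache forgets the cache for a CR, for example from a finalizer.
func (s *scopedCache) RemoveResourceCache(namespace string, owner uid) {
	delete(s.nsCache[namespace], owner)
	if len(s.nsCache[namespace]) == 0 {
		delete(s.nsCache, namespace)
	}
}

// cachesFor returns the caches to consult for a namespace. When no
// ResourceCaches exist at all, it falls back to the cluster scoped cache.
func (s *scopedCache) cachesFor(namespace string) []objectCache {
	if len(s.nsCache) == 0 {
		return []objectCache{s.clusterCache}
	}
	var out []objectCache
	for _, c := range s.nsCache[namespace] {
		out = append(out, c)
	}
	return out
}

func main() {
	sc := newScopedCache(namedCache("cluster"))
	fmt.Println(sc.cachesFor("allowed-one")[0].Name()) // falls back to the cluster cache

	sc.AddResourceCache("allowed-one", uid("cr-1"), namedCache("cr-1-cache"))
	fmt.Println(sc.cachesFor("allowed-one")[0].Name())
	fmt.Println(len(sc.cachesFor("denied"))) // nothing has been created for this namespace
}
```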
This demonstration is an adapted version of the Operator SDK Memcached Operator Tutorial. It utilizes the `scoped-cache-poc` library and the `ScopedCache` cache implementation, enabling the Memcached operator to handle dynamically changing permissions.
The source code for this operator can be found on GitHub here: https://github.com/everettraven/scoped-operator-poc/tree/test/scoped-cache
NOTE: When using the scoped-operator-poc repository, make sure to use the `test/scoped-cache` branch.
Run `setup.sh` to:

- `create`, `delete`, `list`, `watch`, `patch`, `get`, and `update` permissions for `memcacheds` resources
- `update` permissions for `memcacheds/finalizers` sub-resource
- `get`, `patch`, `update` permissions for `memcacheds/status` sub-resource
- `allowed-one`, `allowed-two`, `denied`
- `create`, `delete`, `get`, `list`, `patch`, `update`, and `watch` permissions for `deployments` resources in the `allowed-one` and `allowed-two` namespaces
- `get`, `list`, `watch` permissions for `pods` resources in the `allowed-one` and `allowed-two` namespaces

Run `redeploy.sh` to:

- `scoped-memcached-operator-system` namespace so we can easily copy the pod name for when we take a look at the operator pod logs

Get the logs by running:
We should see that the operator has started successfully:
Create some `Memcached` resources in the `allowed-one` and `allowed-two` namespaces by running:
Get the logs again:
For each CR we should see that there are logs signifying that:
- `Memcached` CR
- `Deployment`s created by the controller and one is for `Pod`s created from the `Deployment`s)

The logs should look similar to:
As the deployments are spun up and reconciled, the deployment may be modified. This operator sets ownership on deployments and will reconcile the parent `Memcached` CR whenever a child deployment is modified. You may see a chunk of logs similar to (example truncated to only a couple logs for brevity):
Check the namespaces to see that the proper deployments are created:
Output should look like:
Output should look like:
Let's see what happens when we create a `Memcached` CR in a namespace where the operator does not have the proper permissions:

Create a `Memcached` CR in the `denied` namespace by running:
Check the logs, we should see:
We can see we are also removing any caches that have been created for this `Memcached` CR to prevent unnecessary informers from hanging around.

Checking the `Memcached` CR with `kubectl -n denied describe memcached`, we can see the status:
Update the RBAC to give permissions to the denied namespace by running:
After a little bit of time we should see in the logs:
We can see in the `Memcached` CR status that it has been successfully reconciled:

We can also see that the deployment is up and running by running `kubectl -n denied get deploy`:
Now let's restrict access again by deleting the RBAC we applied to give permissions in the `denied` namespace:

This change won't affect the existing `Memcached` CR since it has already been reconciled, but if we edit the existing `Memcached` CR or create a new one in the `denied` namespace, we will see these logs start to pop up again:
The `Memcached` CR status will again look like:

In this example I edited the existing `Memcached` CR to kick off the reconciliation loop, which is why the `Status.Nodes` field is still populated.
Another thing to note in this case: if there is no reason for the reconciliation loop to run in the `denied` namespace, the existing watches won't be cleaned up. Eventually the watches will attempt to refresh and they will encounter a `WatchError` due to permissions having been revoked. If not handled properly, this will cause the operator to enter a blocking loop where it continuously attempts to reconnect the watch.
In this operator, when creating informers we inject our own `WatchErrorHandler` that will close the channel used by the informers to stop them. We then remove the `ResourceCache` that did not have the proper permissions so that when we reconcile a CR in that namespace again, we attempt to recreate the informers in the event RBAC has changed. This handling of the `WatchError` prevents the blocking loop of continuously attempting to reconnect the watch.
In the Operator logs, this process looks like:
Now let's delete the `Memcached` CR from the `denied` namespace entirely by running:
Because the operator utilizes finalizers, our resource should not be deleted until the finalizer is removed. As part of the finalizer logic, we remove the cache for the `Memcached` CR that is being deleted. We should see in the logs:
- `controller-runtime` is not designed in a way that easily enables this functionality. Workarounds had to be created to properly implement this logic.
- `controller-runtime` to implement changes that limit workarounds and improve the user experience when it comes to creating scopeable operators.
- `ScopedCache` is currently an aggregate of all informers available to an operator within the `NamespacedResourceCache`.
- `ResourceCache`, enabling watches to be created only for the informers from a specific `ResourceCache`.

The Operator Framework team explored a solution in which watches were established for the Multi Namespace Cache based on available RBAC, but it was abandoned due to performance issues and guidance from David Eads.
In this approach, an operator would perform a series of SelfSubjectAccessReview requests to understand:
- `watches` at the cluster level
- `watches` at the namespace level.

The operator would use this information to configure watches for the cache accordingly. Some drawbacks to this approach included:
- Multi Namespace Cache.

Ultimately, David Eads concluded that there was no feasible way for the API to support scoped operators at scale, and encouraged us to investigate an approach where: