
# Figuring out how kube-apiserver configuration is tweaked (And how we configure how we authenticate requests to it?)

## What I (think I) know

### kube-apiserver config

- kube-apiserver runs on the master node
- Picks up its config from `/etc/kubernetes/manifests/kube-apiserver.yaml`
- Config is passed to the kube-apiserver CLI (the command being run inside the pod) as flags
- kubelet watches this file and makes sure that the kube-apiserver pod is updated if the config changes

### Configuring auth to kube-apiserver

- Looks like all the certificates needed are on the master node
- The volumes are mounted and the paths passed via flags to `kube-apiserver`
- Looks like there are APIs that are only understood by `kube-apiserver`, and they have their corresponding flags, for example we define an `EgressSelectorConfiguration` resource and pass it to `--egress-selector-config-file`

(PS I am playing around with this stuff on minikube with a single node, guessing that if there are multiple master nodes they will all have the certs they need on their file system, not sure if in other cases these certs are "issued" by something outside the node)

### The thing that the KEP is about

- Today we can only pick which authorization modes we wanna use using the `--authorization-mode` flag. That lets us enable only one webhook, whose settings we then pass via the `--authorization-webhook-*` flags. We want to let the user add multiple webhooks and order them among other auth modes using a new API.

## The Fuzzy parts

**Q:** If we pass `--authorization-mode=Webhook,RBAC,Node`, does that mean it will first validate using the Webhook and then RBAC? (the docs do not call it out, or does the order not matter because it will have to be validated by all of them?)
(Guess this will figure itself out when I peek into the code)

**Q:** If we do introduce something like `--authorization-configuration-config-file` which accepts a file adhering to a new API `AuthorizationConfiguration`, that puts the `--authorization-webhook-*` flags in a weird place. Does the file take priority over the other flags stating the same information? (I guess we need to define how we prioritise them and have a way to deprecate the ones that are in a sus place now?)

(^ This one is why we need a KEP I am guessing)

### Not-so-fuzzy fuzzy parts

If someone is using a single webhook today and they want to start using pre-filtering for their webhook, they switch to using `AuthorizationConfiguration`. Which means we won't end up introducing something like `--authorization-webhook-rules-file` for users continuing to stick to the existing flags.

## Definitely stupid questions

Will prolly figure some of this out myself tomorrow

- **Q:** Looks like generally folks add relevant tests while changing stuff up and let the workflow do its job? So if I wanna, let's say, build k8s, create a cluster with local k8s and then take a look at how stuff looks, how would I go about that, or will I never need to do that?
- **Q:** Looks like [this](https://kubernetes.io/docs/reference/access-authn-authz/webhook/) is what we supply to `--authorization-webhook-config-file`, sorta makes sense, but sorta does not. Like, when someone would use this is not super clear, even though I get how.
Probably gonna look up a blog/vlog of someone configuring something like this ¯\_(ツ)_/¯
- **Q:** How does this use case differ from what a `ValidatingAdmissionWebhook` (which is apparently an admission plugin - TIL) solves?

## My to-do list for tomorrow

- Closer look at the Contributors guide
- Gonna go through auth docs to figure out what the different auth modes do because I have absolutely no clue
- Look up where the code for `kube-apiserver` lives

-----------

Nabarun's Notes from Meeting with Mo

- General desire to move away from flags. Versioned flags are needed. Similar to the OIDC proposal. Desire from David as well.
- Use case from Mo and David. In authn, there's an arbitrary field called extra. The entire reason for this is to let the authn stack signal some metadata to the authz stack.
- One might be protecting resources. One might be tied to the authn stack. One might be OPA.
- Need for a priority level amongst modes.
- Control on the version of SubjectAccessReview.
- Open Question: Did anyone ever ask for a dynamic reload? Maybe keep out of scope for now?
- Non-goal: Don't use kubeconfig.
- Goal: A proper config is needed.
- Goal: Behavior on error. Currently it is a bit weird.
- If you have a webhook to protect secrets, all the webhooks would be invoked for this. Is there a way to filter?
- I would also be able to say that a webhook should be triggered only if the user belongs to a particular group. Benefit is network cost saving.
- One way to build a filtering mechanism is to use CEL. This would also instill confidence.
- Metrics: Latency per webhook, number of times they are invoked.
- Goal: inability to reach the webhook currently results in meh behaviour, we would provide a way to define the behaviour.
- Pinniped does authn only. What I want in Tanzu is a tightly coupled authn and authz system. In a commercial product tight coupling is required. We don't want to do patches to this feature as it would be a huge CI cost.
- Q. Q.
feature flag to switch between ways?
Q. can you add this to the KEP Collection sheet?

----

## Scrubbed from the KEP Draft

~~- Allow webhook invocations conditional on metadata in [`Extra` fields](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#authentication-strategies)~~

---

100mik's notes on Authorization

# Multiple authorization webhooks filez

- Define config in staging? (`staging/src/k8s.io/apiserver/pkg/apis/apiserver/types.go`)
  - How the hell does staging work? :/
  - We would define the AuthorizationConfiguration type in `staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go`
- Change how options are picked for Authorizer (`pkg/kubeapiserver/options/authorization.go`)
  - Add new flag to accept config file
  - If file is supplied `ToAuthorizationConfig` prioritises it over supplied flags (NO)
  - This also acts as a feature flag (NO)
  - `AuthorizationConfigFromFile` feature flag will gate the usage of `AuthorizationConfiguration`
- Ordering of auth modes (`pkg/kubeapiserver/authorizer/config.go`)
  - Changes to `Config` struct to add a new key with list of WebhookConfigs (New type?)
  - Looks like all auth modes are already supplied in the order they are supplied to `--authorization-mode`
  - Need to change up the case for a webhook to pick up values for a particular auth mode
  - A new key which has a list of configs for Webhooks

---

## First round of feedback

- Use case: deny access to a system:masters
  - With our current goals, if there is a webhook resolving this case, it will be called for all requests
  - ~~Is this a tag? ns? role? The webhook can handle this with programming logic nonetheless, but just clarifying~~
  - Looks like it is a typical clusterrolebinding in openshift. Not very clear from the comments, but looks like they wanna deny the userrole in certain cases? By adding an authorizer in some cases because the RBAC rules for the role are too privileged?
- Why does RBAC not protect installed CRDs correctly?
- "Can you expand on why a nested webhook could not also have logic to avoid unnecessary calls?"
  - The outer webhook is called for every request irrespective of the scope of the webhooks nested in it, even if it has logic narrowing down the number of nested calls.
  - This results in dependency on availability of a single webhook
- Need to decide whether or not we wanna move webhook config into the file structure
  - Gotta scope out how this impacts code changes
  - Does this mean that it is not something we can do while iterating over alpha?
- Deprecate ABAC mode, no need to include it in file structure
  - Makes sense ¯\\\_(ツ)_/¯
- Integration test: "you need to check the failure mode handling too."
  - Unsure what this means, I am guessing we need tests for both happy and not-happy paths?
- What happens if we reenable the feature if it was previously rolled back? (Unanswered)
  - Works as expected if an appropriate value (a valid file path) is supplied to `--authorization-config-file` (?)
- How can an operator determine if the feature is in use by workloads? ("this would be checking to see if the flag is set, correct?")
  - Sounds about right :/ it is the only way to determine this
- Re: SLOs: This doesn't seem good enough to me. Which metric could someone use to determine this.
  - Do we need metrics of some sort? Difference if any when we toggle the feature? (Unsure about this)
- How do we separate metrics for multiple webhooks?
  - (Still have not looked at the part where the auth chain is executed) I believe the "server" value in kubeconfig could be used to separate it out, but not sure if this information is accessible from that part of the code

## Responses to first round of questions

**Q:** in openshift we found a need to allow another authorizer to deny access to a system:masters if desired. We found a need to do this to support a system:master scope limiting his access. I suggest having this as a goal or user story. _(could use clarity)_

**A:** We could definitely add more relevant user stories.
To clarify, in this case we want to limit the access of role `system:masters` using an additional webhook so that we can have a role `system:master` with a similar level of access instead?

**Changes to KEP:** Eventually a new user story

---

**Q:** It's a long thread and the impact here isn't obvious from the description above. Why is an authorization webhook necessary to prevent access to a resource type? RBAC is allow only, no deny rules. It seems as though the only risk is clusters with overly broad allow rules.

**Context:** https://groups.google.com/g/kubernetes-sig-api-machinery/c/MBa19WTETMQ

_(unresolved)_

---

**Q:** Can you expand on why a nested webhook could not also have logic to avoid unnecessary calls?

**A:** A webhook with nested webhooks could potentially filter on the request. However, that introduces a single point of failure which affects all requests. Having a way to supply multiple authorization webhooks with defined `onError` behaviours would be a more robust way of achieving this natively.

---

**Q:** add a line break on sentences to make commenting on the diff easier.

> Eventually, we can make a decision about moving webhook configuration to the proposed
> authorization configuration. This decision needs to be made before we write the code.

(unresolved, we could add line breaks)

---

**Q:** why would we choose to allow ABAC? If we don't want to encourage it, it seems more straightforward to not support it in this new file.

**A:** It does make sense to not support ABAC in the new format if it is going to be deprecated. We will update the KEP accordingly.

**Changes to KEP:** Remove support for ABAC

---

**Q:** you need to check the failure mode handling too.

**A:** We will take note of cases to be added for the same

**Changes to KEP:** Additional integration test cases with feature flag enabled:

- With a webhook with `onError: Deny`
- With a webhook with `onError: NoOpinion`

---

**Q:** these need to be answered.

**Context:** What happens if we reenable the feature if it was previously rolled back? (Feature enablement and roll back section)

**A:** Yep, we will amend the KEP.

**Changes to KEP:** Answer to the question: The default behaviour is affected as described above if the feature is re-enabled. The behaviour is the same as when the feature is enabled for the first time.

---

**Q:** this would be checking to see if the flag is set, correct?

**Context:**

```
###### How can an operator determine if the feature is in use by workloads?

Not applicable.
```

**A:** That is correct. An operator that can view kube-apiserver configuration will be able to determine this in this fashion. We can amend the KEP to reflect this.

---

**Q:** This doesn't seem good enough to me. Which metric could someone use to determine this.

**Context:**

```
##### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

The authorization chain keeps on working as it is.
```

_(unresolved)_

**A:**

---

**Q:** How will the metrics for multiple authorization webhooks be separated?

_(unresolved)_

**Thoughts:** Not sure what identifies a webhook's config uniquely. Is that something new we have to define? Would the `current-context` value suffice? Do we need to ensure this value is not shared across webhooks? Is this easier if we bring the kubeconfig into the new structure and do it while validating it?
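
**Sketch (toy model, not kube-apiserver code):** to make the thoughts above concrete, here is a rough Python model of an ordered chain of webhooks where each entry carries a unique `name` that doubles as the metrics label. Everything named here (`WebhookAuthorizer`, `run_chain`, the `name` field, the counter keys) is made up for illustration and is not an existing Kubernetes API. It assumes the chain semantics described in the Kubernetes authorization docs (authorizers are checked in sequence, the first allow or deny is returned immediately, and a request nobody has an opinion on is denied), plus the `onError: Deny` / `onError: NoOpinion` behaviours from the integration test cases.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

ALLOW, DENY, NO_OPINION = "Allow", "Deny", "NoOpinion"


@dataclass
class WebhookAuthorizer:
    """Toy stand-in for one webhook entry in a hypothetical AuthorizationConfiguration."""
    name: str      # hypothetical unique name, reused as the metrics label
    on_error: str  # DENY or NO_OPINION, mirroring `onError`
    call: Callable[[Dict[str, str]], str]  # stands in for the network call to the webhook


def run_chain(chain: List[WebhookAuthorizer], request: Dict[str, str], metrics: Counter) -> str:
    """Walk the authorizers in config order; the first Allow or Deny ends the walk."""
    for authorizer in chain:
        try:
            decision = authorizer.call(request)
        except Exception:
            # Webhook unreachable or failing: fall back to its configured onError behaviour.
            decision = authorizer.on_error
            metrics[(authorizer.name, "error")] += 1
        # Count every decision per webhook name so the metrics stay separable.
        metrics[(authorizer.name, decision)] += 1
        if decision in (ALLOW, DENY):
            return decision
    # Nobody had an opinion, so the request is denied.
    return DENY


def unreachable(request: Dict[str, str]) -> str:
    raise ConnectionError("webhook is down")


def allow_reads(request: Dict[str, str]) -> str:
    return ALLOW if request["verb"] == "get" else NO_OPINION


metrics = Counter()
chain = [
    WebhookAuthorizer("secrets-guard", NO_OPINION, unreachable),
    WebhookAuthorizer("read-only", DENY, allow_reads),
]
result = run_chain(chain, {"verb": "get", "resource": "pods"}, metrics)
```

With `onError: NoOpinion` the unreachable `secrets-guard` webhook is skipped and `read-only` gets to decide; flipping it to `Deny` fails closed before the second webhook is ever called. Whether `name` would be a brand new field or something derived from the kubeconfig is exactly the open question above.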