# Webhook response and error handling for mutating/validating webhooks
## Prior art (API server)
The Kubernetes API server does the following when it calls a webhook:

* If the network call fails:
  * A Go error of type `ErrCallingWebhook` is generated and propagated up the chain
* If the server sends back a non-200 response:
  * A Go error of type `ErrCallingWebhook` is generated and propagated up the chain
* If the server sends back a 200 response:
  * The server response can either be `Allowed=true` or `Allowed=false`
  * If `Allowed==true`
    * No Go error is propagated up the chain
  * If `Allowed==false`
    * A Go error of type `ErrWebhookRejection` is generated and propagated up the chain
Ref code: https://github.com/kubernetes/kubernetes/blob/4a9b9028153c6984b9cf69067cc0a1aa12a00e73/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go#L267-L301
**Up the chain the following is done:**
The goal up the chain is to calculate a result that is either `reject` or `accept`.
If we receive an error here, we do one of the following:

* error == `ErrCallingWebhook`
  * If `FailurePolicy==Ignore`
    * result is `accept`
  * else
    * result is `reject`
* error == `ErrWebhookRejection`
  * result is `reject`

If we don't receive an error, the result is `accept`.
Ref code: https://github.com/kubernetes/kubernetes/blob/4a9b9028153c6984b9cf69067cc0a1aa12a00e73/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go#L144-L162
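The decision described above can be sketched as a single function. This is a minimal illustration, not the API server code: the two error types are simplified stand-ins for the real ones, and `decide` and the `FailurePolicy` constants are hypothetical names introduced here. Treating any unrecognized error as `reject` is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// FailurePolicy is a simplified stand-in for the webhook failure policy.
type FailurePolicy string

const (
	Ignore FailurePolicy = "Ignore"
	Fail   FailurePolicy = "Fail"
)

// ErrCallingWebhook is a simplified stand-in: the call itself failed
// (network failure or non-200 response).
type ErrCallingWebhook struct{ Reason string }

func (e *ErrCallingWebhook) Error() string { return "failed calling webhook: " + e.Reason }

// ErrWebhookRejection is a simplified stand-in: the webhook answered 200
// but did not allow the request.
type ErrWebhookRejection struct{ Message string }

func (e *ErrWebhookRejection) Error() string { return "webhook rejected request: " + e.Message }

// decide maps the error from one webhook call to accept (true) or reject (false).
func decide(err error, policy FailurePolicy) bool {
	if err == nil {
		return true // no error: accept
	}
	var callErr *ErrCallingWebhook
	if errors.As(err, &callErr) {
		// Only call errors are subject to the failure policy.
		return policy == Ignore
	}
	// ErrWebhookRejection (and, in this sketch, any other error): reject.
	return false
}

func main() {
	callErr := &ErrCallingWebhook{Reason: "connection refused"}
	fmt.Println(decide(callErr, Ignore))                                 // accept
	fmt.Println(decide(callErr, Fail))                                   // reject
	fmt.Println(decide(&ErrWebhookRejection{Message: "denied"}, Ignore)) // reject
	fmt.Println(decide(nil, Fail))                                       // accept
}
```

Note that a rejection is never softened by `FailurePolicy==Ignore`; the policy only covers failures to obtain an answer.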
## Runtime SDK
### Error Handling
Roughly the same approach as the API server:

- Both call errors and response failures surface as Go errors
- Failure policy is handled up the chain

`Call` (atomic operation)

- "Golang error" --> `ErrCallingExtension`
- Response failure --> `ErrExtensionFailure`
- Response success --> no error

`CallAll` (up the chain)

- Applies the failure policy
- Aggregates the answers

Aggregate error:

- If at least one `ErrCallingExtension`, `ErrCallingExtension`
- If at least one `ErrExtensionFailure` without ignore, `ErrExtensionFailure`
- Otherwise no error

Aggregate status:

- If at least one Failure without ignore, Failure
- Otherwise Success

Aggregate message:

- If Failure without ignore, concatenate `extension name: message`

Aggregate RetryAfter:

- If Failure without ignore, the lowest non-zero retry-after
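The aggregation rules above could look roughly like the following. This is a sketch under stated assumptions: `Result`, `Aggregate`, `aggregate`, and the `RetryAfterSeconds` field are hypothetical names, and the `"; "` separator for messages is an arbitrary choice. It reads the rules literally, so a calling error always surfaces while a failure response surfaces only when its extension is not ignored; whether `Ignore` should also suppress calling errors is left open by these notes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type Status string

const (
	StatusSuccess Status = "Success"
	StatusFailure Status = "Failure"
)

var (
	ErrCallingExtension = errors.New("error calling extension")
	ErrExtensionFailure = errors.New("extension returned failure")
)

// Result is a hypothetical record of the outcome of one Call.
type Result struct {
	Name              string
	Ignore            bool // failure policy for this extension is Ignore
	Err               error
	Status            Status
	Message           string
	RetryAfterSeconds int32
}

// Aggregate is the hypothetical combined answer returned by CallAll.
type Aggregate struct {
	Status            Status
	Message           string
	RetryAfterSeconds int32
}

func aggregate(results []Result) (Aggregate, error) {
	agg := Aggregate{Status: StatusSuccess}
	var calling, failure bool
	var msgs []string
	for _, r := range results {
		if errors.Is(r.Err, ErrCallingExtension) {
			calling = true
		}
		if r.Ignore {
			continue // ignored extensions do not affect failure, status, message, or retry
		}
		if errors.Is(r.Err, ErrExtensionFailure) {
			failure = true
		}
		if r.Status != StatusFailure {
			continue
		}
		agg.Status = StatusFailure
		msgs = append(msgs, r.Name+": "+r.Message)
		// Keep the lowest non-zero retry-after.
		if r.RetryAfterSeconds > 0 &&
			(agg.RetryAfterSeconds == 0 || r.RetryAfterSeconds < agg.RetryAfterSeconds) {
			agg.RetryAfterSeconds = r.RetryAfterSeconds
		}
	}
	agg.Message = strings.Join(msgs, "; ")
	switch {
	case calling:
		return agg, ErrCallingExtension
	case failure:
		return agg, ErrExtensionFailure
	}
	return agg, nil
}

func main() {
	agg, err := aggregate([]Result{
		{Name: "a", Status: StatusSuccess},
		{Name: "b", Err: ErrExtensionFailure, Status: StatusFailure, Message: "busy", RetryAfterSeconds: 10},
	})
	fmt.Println(agg.Status, agg.Message, agg.RetryAfterSeconds, err)
}
```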
### Error Handling V2
Roughly the same approach as the API server:

- Both call errors and response failures surface as Go errors
- Failure policy is handled inside `Call`

`Call` (atomic operation)

- Signature: `Call(request, response) err`; the outcome is also reflected in `response.status`
- "Golang error" --> `ErrCallingExtension`
- Response failure --> `ErrExtensionFailure`
- Ignored error --> no error (captured in logs, metrics, etc.)
  - The message is dropped and the status is set to something different from what we received
- Response success --> no error

`CallAll` (up the chain)

- Signature: `CallAll(request, response) err`
- Each extension call yields its own response and error:
  - `response1`, `err1`
  - `response2`, `err2` (`response.status==Failure`, `err2 == ErrExtensionFailure`)
  - ...
- The aggregate response/err carries:
  - `response.status`
  - `response.message`
  - `response.retryAfter`
- Aggregates the answers

Aggregate error:

- If at least one `ErrCallingExtension`, `ErrCallingExtension`
- If at least one `ErrExtensionFailure` and no `ErrCallingExtension`, `ErrExtensionFailure`
- Otherwise no error

Aggregate status:

- If at least one Failure, Failure
- Otherwise Success

Aggregate message:

- If Failure without ignore, concatenate `extension name: message`

Aggregate RetryAfter:

- If Failure without ignore, the lowest non-zero retry-after
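To make the difference from the first variant concrete, here is a minimal sketch of a `Call` that applies the failure policy itself. The `Transport` function type, the `Response` struct, the policy constants, and `errDemoTimeout` are hypothetical and exist only for illustration. Resetting an ignored response to `Success` with an empty message is one possible reading of "status is set to something different"; the notes do not fix the exact value.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

type Status string

const (
	StatusSuccess Status = "Success"
	StatusFailure Status = "Failure"
)

type FailurePolicy string

const (
	PolicyFail   FailurePolicy = "Fail"
	PolicyIgnore FailurePolicy = "Ignore"
)

var (
	ErrCallingExtension = errors.New("error calling extension")
	ErrExtensionFailure = errors.New("extension returned failure")

	// errDemoTimeout is only used to simulate a transport error.
	errDemoTimeout = errors.New("timeout")
)

type Response struct {
	Status  Status
	Message string
}

// Transport is a hypothetical function that performs the actual request
// and fills in the response.
type Transport func(resp *Response) error

// Call runs one extension and applies its failure policy before returning.
func Call(name string, policy FailurePolicy, do Transport, resp *Response) error {
	var err error
	if tErr := do(resp); tErr != nil {
		err = fmt.Errorf("%w: %v", ErrCallingExtension, tErr)
	} else if resp.Status == StatusFailure {
		err = fmt.Errorf("%w: %s", ErrExtensionFailure, resp.Message)
	}
	if err == nil {
		return nil
	}
	if policy == PolicyIgnore {
		// The error stays visible in logs but is not returned to the caller.
		log.Printf("ignoring error from extension %q: %v", name, err)
		resp.Status = StatusSuccess // assumption: ignored failures read as success
		resp.Message = ""           // the received message is dropped
		return nil
	}
	return err
}

func main() {
	resp := &Response{}
	err := Call("demo", PolicyFail, func(*Response) error { return errDemoTimeout }, resp)
	fmt.Println(errors.Is(err, ErrCallingExtension), err)
}
```

With the policy handled here, `CallAll` no longer needs to know which extensions are ignored: every error it sees is one that counts.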
## Visibility
### Logging
"always": errors
* Including ignored errors

"debug": which extensions are called (level x)
* Figure out a log level where this information should be shown

"trace": which responses and errors are received (level y)
### Metrics
Because we are calling external components that can affect our reconcile loop, it is important to capture metrics:
- calls
- duration
- errors

Metric collection should stay out of the critical path, for example in some kind of non-blocking routine(?)
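One way to keep collection off the critical path is a buffered channel with a non-blocking send, drained by a background goroutine. The sketch below uses only the standard library; `Recorder`, `Sample`, and `Totals` are hypothetical names, and dropping samples when the buffer is full is a design assumption, not something these notes decide. A real implementation would more likely feed an existing metrics library.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Sample is one observation of an extension call.
type Sample struct {
	Extension string
	Duration  time.Duration
	Failed    bool
}

// Totals holds the three metrics listed above for one extension.
type Totals struct {
	Calls    int
	Errors   int
	Duration time.Duration
}

// Recorder aggregates samples in a background goroutine.
type Recorder struct {
	samples chan Sample
	done    chan struct{}
	mu      sync.Mutex
	totals  map[string]Totals
}

func NewRecorder(buffer int) *Recorder {
	r := &Recorder{
		samples: make(chan Sample, buffer),
		done:    make(chan struct{}),
		totals:  map[string]Totals{},
	}
	go r.run()
	return r
}

func (r *Recorder) run() {
	defer close(r.done)
	for s := range r.samples {
		r.mu.Lock()
		t := r.totals[s.Extension]
		t.Calls++
		t.Duration += s.Duration
		if s.Failed {
			t.Errors++
		}
		r.totals[s.Extension] = t
		r.mu.Unlock()
	}
}

// Observe never blocks the caller. It reports false if the buffer was
// full and the sample was dropped.
func (r *Recorder) Observe(s Sample) bool {
	select {
	case r.samples <- s:
		return true
	default:
		return false
	}
}

// Close stops accepting samples and waits until the buffer is drained.
func (r *Recorder) Close() {
	close(r.samples)
	<-r.done
}

// Totals returns a snapshot for one extension.
func (r *Recorder) Totals(extension string) Totals {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.totals[extension]
}

func main() {
	rec := NewRecorder(16)
	rec.Observe(Sample{Extension: "demo", Duration: 20 * time.Millisecond})
	rec.Observe(Sample{Extension: "demo", Duration: 5 * time.Millisecond, Failed: true})
	rec.Close()
	fmt.Printf("%+v\n", rec.Totals("demo"))
}
```

The trade-off is that under load some samples can be lost, which is usually acceptable for metrics but would not be for audit logs.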
## Implementation detail
Options for handling multiple response types:
- Interfaces
- Reflection (even though this happens at runtime, the types are controlled by us, so we can surface any problems with unit tests)
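As an illustration of the interface option, the sketch below lets generic code read the common fields of any response without reflection. All names here (`ResponseObject`, `RetryResponseObject`, `CommonResponse`, `BlockingResponse`, `NotifyResponse`) are hypothetical; the point is only that embedding a shared struct gives every response type the accessor methods for free.

```go
package main

import "fmt"

type Status string

const (
	StatusSuccess Status = "Success"
	StatusFailure Status = "Failure"
)

// ResponseObject is what generic code such as CallAll needs from any response.
type ResponseObject interface {
	GetStatus() Status
	GetMessage() string
}

// RetryResponseObject is implemented only by responses that can ask for a retry.
type RetryResponseObject interface {
	ResponseObject
	GetRetryAfterSeconds() int32
}

// CommonResponse carries the shared fields and implements ResponseObject.
type CommonResponse struct {
	Status  Status
	Message string
}

func (c *CommonResponse) GetStatus() Status  { return c.Status }
func (c *CommonResponse) GetMessage() string { return c.Message }

// BlockingResponse is an example response type that supports retries.
type BlockingResponse struct {
	CommonResponse
	RetryAfterSeconds int32
}

func (b *BlockingResponse) GetRetryAfterSeconds() int32 { return b.RetryAfterSeconds }

// NotifyResponse is an example response type without retry support.
type NotifyResponse struct {
	CommonResponse
}

// describe works on any response type through the interfaces alone.
func describe(r ResponseObject) string {
	out := fmt.Sprintf("status=%s message=%q", r.GetStatus(), r.GetMessage())
	if rr, ok := r.(RetryResponseObject); ok {
		out += fmt.Sprintf(" retryAfter=%ds", rr.GetRetryAfterSeconds())
	}
	return out
}

func main() {
	fmt.Println(describe(&BlockingResponse{
		CommonResponse:    CommonResponse{Status: StatusFailure, Message: "busy"},
		RetryAfterSeconds: 30,
	}))
	fmt.Println(describe(&NotifyResponse{
		CommonResponse: CommonResponse{Status: StatusSuccess},
	}))
}
```

Compared with reflection, a missing method here is a compile-time error rather than something a unit test has to catch.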