# Webhook response and error handling for mutating/validating webhooks

## Prior art (API server)

K8s apiserver does the following:

* If network fails:
  * A golang error of type `ErrCallingWebhook` is generated and propagated up the chain
* If server sends back a non-200 response:
  * A golang error of type `ErrCallingWebhook` is generated and propagated up the chain
* If server sends back a 200 response:
  * The server response can either be `Allow=true` or `Allow=false`
    * If `Allow==true`
      * no golang error is propagated up the chain
    * If `Allow==false`
      * A golang error of type `ErrWebhookRejection` is generated and propagated up the chain

Ref code: https://github.com/kubernetes/kubernetes/blob/4a9b9028153c6984b9cf69067cc0a1aa12a00e73/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go#L267-L301

**Up the chain the following is done:**

The goal up the chain is to calculate a result that is either `reject` or `accept`.

If we receive an error here we do one of the following:

* error == `ErrCallingWebhook`
  * If `FailurePolicy==Ignore`
    * result is `accept`
  * else
    * result is `reject`
* error == `ErrWebhookRejection`
  * result is `reject`

If we don't receive an error the result is `accept`.

Ref code: https://github.com/kubernetes/kubernetes/blob/4a9b9028153c6984b9cf69067cc0a1aa12a00e73/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go#L144-L162

## Runtime SDK

### Error Handling

~ Same approach:
- Both error and response failures surface as go errors
- Failure Policy handled up the chain

Call (Atomic operation)
- "Golang error" --> ErrCallingExtension
- Response failure --> ErrExtensionFailure
- Response success --> No Error

CallAll (Up the chain)
- Applies failure policy
- Aggregate answer

Aggregate error:
- If at least one ErrCalling, ErrCalling
- If at least one ErrExtension without ignore, ErrExtension
- Otherwise no error

Aggregate status:
- If at least one Failure without ignore, Failure
- Otherwise success

Aggregate message:
- If Failure without ignore, concat extension name: message

Aggregate RetryAfter:
- If Failure without ignore, lowest non-zero retry after

### Error Handling V2

~ Same approach:
- Both error and response failures surface as go errors
- Failure Policy handled inside `Call`

Call (Atomic operation)
- Signature: `Call(request, response) err`; the call also sets `response.status`
- "Golang error" --> ErrCallingExtension
- Response failure --> ErrExtensionFailure
- Ignored Error --> No Error (captured in logs, metrics, etc.)
  - The message is dropped and the status is set to something different than what we received
- Response success --> No Error

CallAll (Up the chain)
- Signature: `CallAll(request, response) err`
  - `response1`, `err1`
  - `response2`, `err2` (`response.status==Failure`, `err2 == ErrExtensionFailure`)
  - ...
  - aggregate response/err: `response.status`, `response.message`, `response.retryAfter`
- Aggregate answer

Aggregate error:
- If at least one ErrCalling, ErrCalling
- If at least one ErrExtension and no ErrCalling, ErrExtension
- Otherwise no error

Aggregate status:
- If at least one Failure, Failure
- Otherwise success

Aggregate message:
- If Failure without ignore, concat extension name: message

Aggregate RetryAfter:
- If Failure without ignore, lowest non-zero retry after

## Visibility

### Logging

* "always": errors
  * Including ignored errors
* "debug": what I call (level x)
  * figure out a log level where this information should be shown
* "trace": what response I get, errors (level y)

### Metrics

As we are calling external components that can affect our reconcile loop, it is important to capture metrics information:
- calls
- duration
- errors

The metric collection should be out of the critical path
- some kind of non-blocking routine(?)

## Implementation detail

Handling multiple response types
- Interfaces
- Reflection (even though this is runtime, the types are controlled by us and we can surface any problems with unit tests)
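
To make the "Interfaces" option concrete, here is a minimal Go sketch of how the Error Handling V2 aggregation rules could be written against a shared response interface. It is an illustration under assumptions, not the SDK's implementation: the `Response` interface, `CommonResponse`, `Result`, `Aggregate`, and the `"; "` message separator are hypothetical names and choices, and only the two error names come from the notes above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Sentinel errors named after the notes; the values themselves are assumptions.
var (
	ErrCallingExtension = errors.New("error calling extension")
	ErrExtensionFailure = errors.New("extension returned failure")
)

type Status string

const (
	StatusSuccess Status = "Success"
	StatusFailure Status = "Failure"
)

// Response is a hypothetical interface every response type would satisfy,
// so the aggregation code needs no reflection.
type Response interface {
	GetStatus() Status
	GetMessage() string
	GetRetryAfter() time.Duration
}

// CommonResponse is one possible shared implementation of Response.
type CommonResponse struct {
	Status     Status
	Message    string
	RetryAfter time.Duration
}

func (r CommonResponse) GetStatus() Status            { return r.Status }
func (r CommonResponse) GetMessage() string           { return r.Message }
func (r CommonResponse) GetRetryAfter() time.Duration { return r.RetryAfter }

// Result pairs an extension name with what a single Call produced.
// Ignored failures are assumed to be already cleared inside Call (V2).
type Result struct {
	Name     string
	Response Response // may be nil when the call itself failed
	Err      error
}

// Aggregate applies the V2 rules: ErrCallingExtension wins over
// ErrExtensionFailure, any Failure makes the status Failure, messages are
// concatenated as "name: message", and the lowest non-zero RetryAfter is kept.
func Aggregate(results []Result) (CommonResponse, error) {
	agg := CommonResponse{Status: StatusSuccess}
	var aggErr error
	var msgs []string
	for _, r := range results {
		switch {
		case errors.Is(r.Err, ErrCallingExtension):
			aggErr = ErrCallingExtension
		case errors.Is(r.Err, ErrExtensionFailure) && aggErr == nil:
			aggErr = ErrExtensionFailure
		}
		if r.Response == nil || r.Response.GetStatus() != StatusFailure {
			continue
		}
		agg.Status = StatusFailure
		msgs = append(msgs, r.Name+": "+r.Response.GetMessage())
		if ra := r.Response.GetRetryAfter(); ra > 0 && (agg.RetryAfter == 0 || ra < agg.RetryAfter) {
			agg.RetryAfter = ra
		}
	}
	agg.Message = strings.Join(msgs, "; ")
	return agg, aggErr
}

func main() {
	agg, err := Aggregate([]Result{
		{Name: "ext-a", Response: CommonResponse{Status: StatusSuccess}},
		{Name: "ext-b", Response: CommonResponse{Status: StatusFailure, Message: "quota exceeded", RetryAfter: 30 * time.Second}, Err: ErrExtensionFailure},
		{Name: "ext-c", Response: CommonResponse{Status: StatusFailure, Message: "not ready", RetryAfter: 10 * time.Second}, Err: ErrExtensionFailure},
	})
	fmt.Println(agg.Status, agg.Message, agg.RetryAfter, err)
}
```

In this sketch a call that fails with `ErrCallingExtension` and returns no response leaves the aggregate status untouched, so callers would check the returned error before trusting the status. Whether that is the desired behavior is an open design choice.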