# API Retry in Kube
`client-go` provides three modes of communication:
- `Do` and `DoRaw`: for regular requests, like when we create a ConfigMap or get a `Pod` object.
- `Watch`: for watch requests.
- `Stream`: for streaming APIs, for example when we run `oc logs`.
Previously only `Do` and `DoRaw` had built-in retry logic. Recently we did the following:
- refactor the retry logic to be contained in a single unit, reusable and testable.
- add retry logic to `Watch` and `Stream`.
You can go through the following PRs to get a glimpse of what was done:
- https://github.com/kubernetes/kubernetes/pull/102217
- https://github.com/kubernetes/kubernetes/pull/102606
## Retry Semantics
How does the client know which failed requests to retry? `client-go` uses the following `net/http` call to send a request to the server:
```
var request *http.Request
var client http.Client
response, err := client.Do(request)
```
- `err != nil` implies that the request failed with an error.
- Depending on the `StatusCode` of the HTTP `response` obtained, the client decides whether to retry the request:
  - `Watch`: `StatusCode != 200` is an error, and we can safely retry.
  - `Stream`: `StatusCode >= 200 && StatusCode < 300` is a success; any other `StatusCode` implies we can retry.
### Server
If the server wants the caller to retry the request then it sends the following response to the caller:
```
StatusCode = {429|5xx}
Header:
Retry-After: N
```
- A: response `StatusCode` is either `429` or `5xx`
- B: the response has a `Retry-After` header with a numeric value `N (N >= 0)`
Both `A` and `B` must be present in the response for the request to be retried.
Note that receiving a `429` does not always mean that the request was rejected by Priority and Fairness: the `kube-apiserver` rest/registry layer also returns `429` when it wants the caller to retry the request.
### Client
```
response, err := client.Do(request)
```
After receiving `response` and `err`, `client-go` determines whether the request is retryable:
- Is `err` set, and is it retryable?
- Are both `A` and `B` true?

If `err` is set, then we retry only if:
- the request `verb` is `GET` (write operations are not retried, as they may not be idempotent), and
- the `err` is one of:
  - connection reset
  - EOF
  - unexpected EOF
  - connection reset by peer
  - use of closed network connection
  - http2: server sent GOAWAY and closed the connection
If either of the above checks says the request is retryable, `client-go` goes ahead and attempts a retry, subject to the constraints described in the next section.
### Retry Constraints
- `MaxRetries`: `client-go` allows the caller to set the maximum number of retries; the default is `10`.
- The retry is roughly bound to the `context` of the `http.Request` object: if the `context` expires, the retry operation is aborted.
- The `body` of the request must be an `io.Seeker`, so that we can seek to the beginning of the buffer before the next retry is attempted. If `Seek(0,0)` fails, we usually see the following error:
```
can't Seek() back to beginning of body
```
## Retry Loop
- 1: `client-go` attaches a `backoff` to each request; the default is `no backoff`. Exponential backoff can be enabled by setting two environment variables.
- 2: Calculate how much time to wait before the next attempt.
- 3: Call `Backoff.Sleep`.
- 4: Apply any client-side throttling using the `rateLimiter` associated with the request.
- 5: Send the request.
- 6: Update `Backoff` with the response from the server.
- 7: Determine whether the response is retryable, based on `(response, err)`.
- 8: Seek to the beginning of the request `body` before the next retry; abort on any error.
- 9: Sleep for `N` seconds, where `N` is obtained from the `Retry-After` response header.
- 10: Close the `body` of the http `Response` object to avoid a memory leak.
```sequence
Client->Client: 1: calculate Backoff Wait
Client->Client: 2: Backoff.Sleep
Client->Client: 3: Apply client-side throttling
Client->Server: 4: Send Request: client.Do(request)
Server->Client: 429 ('Retry-After: N')
Client->Client: 5: Update Backoff
Client->Client: 6: Is (response, err) retryable? Yes
Client->Client: 7: Current Attempt < MaxRetries? Yes
Client->Client: 8: Seek to the beginning of request body
Client->Client: 9: Sleep(N seconds)
Client->Client: 10: Close the body of the response
```
How to enable `URL Backoff`:
```
// Environment variables: Note that the duration should be long enough
// that the backoff persists for some reasonable time (i.e. 120 seconds).
// The typical base might be "1".
envBackoffBase = "KUBE_CLIENT_BACKOFF_BASE"
envBackoffDuration = "KUBE_CLIENT_BACKOFF_DURATION"
```