SLO/SLI/SLA
- defining what availability is & defining how available you want to be.
SLI - Service Level Indicator
- Request latency
- Batch throughput - the number of requests processed per second, in the case of a batch system
- Failures per request - failures per total number of requests
ex:
- 99th percentile latency of requests received in the past five minutes is less than 300 milliseconds
- 95th percentile latency of homepage requests over past 5 minutes < 300ms
- the ratio of errors to total requests in the past five minutes is less than 1%
SLIs are service level indicators: metrics measured over time that inform about the health of a service.
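
A minimal sketch of how the example SLIs above could be computed from one five-minute window of requests. The sample data, the function names, the nearest-rank percentile method, and the choice to count HTTP 5xx responses as failures are all assumptions made for illustration, not something these notes prescribe.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples in window")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def error_ratio(status_codes):
    """Failures per total requests. Assumption: a failure is any HTTP 5xx response."""
    if not status_codes:
        raise ValueError("no requests in window")
    failures = sum(1 for code in status_codes if code >= 500)
    return failures / len(status_codes)

# Hypothetical five-minute window: (latency in ms, HTTP status) per request
window = [(120, 200), (180, 200), (250, 200), (90, 200), (310, 500),
          (140, 200), (200, 200), (160, 200), (280, 200), (110, 200)]
latencies = [ms for ms, _ in window]
statuses = [code for _, code in window]

p95 = percentile(latencies, 95)
latency_sli_met = p95 < 300                     # "95th percentile latency < 300ms"
error_sli_met = error_ratio(statuses) < 0.01    # "error ratio less than 1%"

print(f"p95={p95}ms latency_sli_met={latency_sli_met} error_sli_met={error_sli_met}")
```

Each SLI reduces to a yes/no answer per window, which is what an SLO then counts.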
SLO - Service Level Objectives
Binding target for a collection of SLIs
SLOs are service level objectives: agreed-upon bounds for how often those SLIs must be met.
ex:
- 95th percentile homepage SLI will succeed 99.9% of the time over the trailing year.
SLA - Service Level Agreement
Business agreement between a customer and a service provider, typically based on SLOs
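
A rough sketch of checking the SLO example above, assuming the SLI is evaluated once per five-minute window and each window is recorded as met (True) or not met (False). The function names and the two sample years are hypothetical.

```python
SLO_TARGET = 0.999  # 99.9%, taken from the example above

def slo_compliance(window_results):
    """Fraction of evaluation windows in which the SLI was met."""
    if not window_results:
        raise ValueError("no windows to evaluate")
    return sum(1 for met in window_results if met) / len(window_results)

def slo_met(window_results, target=SLO_TARGET):
    """True if the SLI succeeded at least `target` of the time."""
    return slo_compliance(window_results) >= target

# Assumption: one SLI evaluation per five-minute window over a 365-day trailing year
windows_per_year = 365 * 24 * 12

# Two hypothetical years: one with 100 failing windows, one with 120
good_year = [True] * (windows_per_year - 100) + [False] * 100
bad_year = [True] * (windows_per_year - 120) + [False] * 120

print(slo_met(good_year))  # 100 failing windows out of 105120 stays above 99.9%
print(slo_met(bad_year))   # 120 failing windows drops below 99.9%
```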
Note:
- define: what am I going to do if I don't meet the promised level of reliability?
- SLAs describe the set of services and availability promises that a provider is willing to make to a customer, and then the ramifications associated with failing to deliver on those promises.
- Those ramifications might be things like money back or free credits for failing to deliver the service availability.
ex:
- Service credits if the 95th percentile homepage SLI succeeds less than 99.5% of the time over the trailing year.
Relationship
SLIs drive SLOs which inform SLAs
You really want to make sure your SLA is more lenient than your SLO, so you get an early warning before you have to do things like field angry phone calls from customers or pay them lots of money for failing to deliver the services promised.
SLAs are business level agreements, which define the service availability for a customer and the penalties for failing to deliver that availability
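
The gap between the SLO and the SLA can be sketched as a simple classification, using the 99.9% SLO and 99.5% SLA figures from the examples above. The function name and the status labels are assumptions for illustration only.

```python
SLO_TARGET = 0.999     # internal objective (99.9%)
SLA_THRESHOLD = 0.995  # customer-facing promise (99.5%), deliberately more lenient

def classify(success_ratio, slo=SLO_TARGET, sla=SLA_THRESHOLD):
    """Map a measured SLI success ratio to a status label (labels are hypothetical)."""
    if sla >= slo:
        raise ValueError("the SLA should be more lenient (lower) than the SLO")
    if success_ratio >= slo:
        return "ok"            # objective met
    if success_ratio >= sla:
        return "slo_missed"    # early warning: no customer penalties yet
    return "sla_breached"      # penalties such as service credits apply

for ratio in (0.9995, 0.997, 0.99):
    print(ratio, classify(ratio))
```

The middle band, between 99.5% and 99.9%, is the early-warning zone: the team has missed its own objective but has not yet broken the promise made to the customer.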