<style>
.reveal {
font-size: 18px;
font-family: "courier"
}
</style>
## Bring Your Own Observability
---
## Observability
We're talking about the metrics, telemetry, tracing and logs that help us get notified of problems, diagnose them and improve the state of our applications.
---
#### Observability Options on GOV.UK PaaS
---
#### Pazmin + `cf logs`
* Great for getting started! :+1:
* Zero config! :+1:
* No option for custom metrics
* No option for custom dashboards
* No option for alerting
* Brief log retention
---
#### Export to a SaaS offering
* Zero maintenance! :+1:
* e.g. syslog drain -> Logit
* e.g. statsd -> Hosted Graphite
* Burden of procurement
* May require deploying some kind of adapter
* Fragmented from the PaaS
* Recommending products is a grey area for PaaS team
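A syslog drain like this is usually wired up with a user-provided service. The sketch below assumes the cf CLI is installed and logged in; the function name, the default service name `log-drain` and the drain URL in the usage line are placeholders, not real Logit details.

```bash
#!/usr/bin/env bash
# Minimal sketch: ship an app's logs to an external syslog endpoint
# (such as a Logit stack) via a user-provided service with a drain URL.
# Function name, service name and URL are placeholders.

add_syslog_drain() {
  local app="$1" drain_url="$2" service="${3:-log-drain}"
  # -l registers the URL as a syslog drain for any app bound to this service
  cf create-user-provided-service "$service" -l "$drain_url"
  # Binding the app makes the platform forward its logs to the drain
  cf bind-service "$app" "$service"
}
```

Usage: `add_syslog_drain my-app "syslog-tls://logs.example.com:6514"`.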
---
#### The "Observe" Prometheus
* Minimal config for app service discovery! :+1:
* Alerting in-the-box :+1:
* Dashboarding in-the-box :+1:
* Custom metrics collection in-the-box :+1:
* GDS teams only :unamused:
* No solution for logging (punt it to Logit)
* Has some quirks (metrics must be exposed publicly, service discovery can get out of sync, a "user" outside of the team must be authorised)
---
#### DIY (Deploy It Yourself)
* InfluxDB backing service available! :+1:
* Many tools will run on the PaaS backed by InfluxDB, e.g.:
  * Deploy your own Prometheus for metrics :+1:
  * Deploy your own Grafana for dashboarding :+1:
  * Deploy your own Alertmanager for alert routing :+1:
  * Deploy your own Telegraf for log collection :+1:
* Burden of choice and configuration :cry:
* Burden of "Day 2" operations
* May require knowledge of more "advanced" PaaS features such as BoshDNS and NetworkPolicies
* Custom "glue" code or static configuration management
* Duplication of effort
* Blurring of the App vs Service model
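As a rough illustration of those "advanced" features, letting a self-deployed Prometheus scrape an app over container networking needs an internal route plus a network policy. This sketch assumes cf CLI v7 or later (the `add-network-policy` flags differ in v6), the default `apps.internal` internal domain and an app listening on port 8080; the function and app names are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch of the wiring a DIY Prometheus needs: give the target app
# an internal (BOSH DNS) route, then allow container-to-container traffic
# from Prometheus to it. Names and port are placeholders; cf CLI v7+ syntax.

allow_scrape() {
  local prometheus_app="$1" target_app="$2" port="${3:-8080}"
  # Register <target>.apps.internal so Prometheus can resolve each instance
  cf map-route "$target_app" apps.internal --hostname "$target_app"
  # Open the port from the Prometheus app to the target app
  cf add-network-policy "$prometheus_app" "$target_app" --protocol tcp --port "$port"
}
```

Usage: `allow_scrape prometheus my-app`. Every new app needs this repeated, which is part of the "glue" burden listed above.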
---
#### DIY++ Can we do better?
* A "PaaS-native" experience available for ALL
* Solution for both metrics and logging in-the-box
* Reduce burden of configuration (minimal/zero)
* Reduce burden of choice
* Reduce burden of "Day 2" operations
---
#### Kubernetes, Operators & Sidecars
* An "Operator" is an app that manages the configuration and lifecycle of another app
* A "Sidecar" is a supporting process bolted on to the side of an application or injected by an Operator
---
#### The Prometheus Operator
* A Kubernetes application you can deploy to your namespace that:
  * manages the lifecycle of Prometheus (metrics)
  * manages the lifecycle of Alertmanager (alerts)
  * manages the lifecycle of Grafana (dashboarding)
  * manages sidecars to automate platform-specific metric collection
  * manages sidecars to automate configuration / discovery
  * provides Kubernetes-native methods to customise configurations (`kubectl apply -f custom-resource.yml`)
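For comparison, the Kubernetes-native customisation looks roughly like this: a `ServiceMonitor` custom resource tells the operator which Services to scrape. This sketch assumes the operator's CRDs are installed and that the app's Service carries the label `app: <name>` and exposes a port named `metrics`; depending on the Prometheus resource's `serviceMonitorSelector`, extra labels may be needed. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch: ask the Prometheus Operator to scrape an app by applying
# a ServiceMonitor custom resource. The "app" label and the "metrics" port
# name are assumptions about how the app's Service is defined.

apply_service_monitor() {
  local app="$1"
  kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ${app}
spec:
  selector:
    matchLabels:
      app: ${app}
  endpoints:
    - port: metrics
EOF
}
```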
---
#### PaaS, Brokers & Sidecars
* A "Broker" is an app that manages the configuration and lifecycle of another app
* Although usually deployed platform-wide, you can deploy your own via "space scoped brokers"
* A "Sidecar" is a supporting process bolted on to the side of an application or injected by a buildpack
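Registering a broker app you have pushed yourself as space-scoped is a single cf CLI call, which a Space Developer can make without platform admin rights. In this sketch the function name, broker name, credentials and URL are placeholders.

```bash
#!/usr/bin/env bash
# Minimal sketch: register your own broker app as a space-scoped broker,
# so its offerings are visible only in the current space.
# Broker name, credentials and URL are placeholders.

register_space_broker() {
  local name="$1" user="$2" pass="$3" url="$4"
  # --space-scoped means no platform-wide registration (and no admin role) is needed
  cf create-service-broker "$name" "$user" "$pass" "$url" --space-scoped
  # List the offerings now available in this space's marketplace
  cf marketplace
}
```

Usage: `register_space_broker my-broker broker-user "$BROKER_PASSWORD" https://my-broker.cloudapps.digital`.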
---
#### The BYO Observability Broker
* A GOV.UK PaaS application you can deploy to your own space that:
  * manages the lifecycle of Prometheus (metrics)
  * manages the lifecycle of Grafana (dashboarding)
  * manages the lifecycle of Telegraf (logs)
  * manages the lifecycle of InfluxDB (storage)
  * manages sidecars to automate platform-specific metric collection
  * manages sidecars to automate configuration / discovery
  * provides PaaS-native methods to customise configurations (`cf bind-service my-app prometheus`, `cf bind-service my-app `)
  * potentially offers a choice of stacks, e.g. TICK vs TIPG
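Per-binding customisation could ride on the standard arbitrary-parameters mechanism of `cf bind-service -c`. The function name and the parameter names here (`metrics_path`, `metrics_port`) are hypothetical; a real broker would define its own schema.

```bash
#!/usr/bin/env bash
# Minimal sketch: pass per-binding settings to the broker as JSON
# parameters. The parameter names are hypothetical, not a real schema.

bind_with_metrics() {
  local app="$1" service="$2" path="${3:-/metrics}" port="${4:-8080}"
  cf bind-service "$app" "$service" \
    -c "{\"metrics_path\":\"${path}\",\"metrics_port\":${port}}"
}
```

Usage: `bind_with_metrics my-app observability /metrics 9100`.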
---
#### User Experience
The UX we're aiming for is something like:
```bash
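# install the broker (the app name "observability-broker" is illustrative)
cf push observability-broker -b "https://github.com/alphagov/paas-byo-observability"
# create an observability stack: cf create-service OFFERING PLAN INSTANCE
# (the offering name "byo-observability" is illustrative)
cf create-service byo-observability TIPG observability
# configure app for metric collection and log shipping
cf bind-service my-app observability
# check the last operation message for useful URLs
cf service observability
...
grafana: https://some-grafana.cloudapps.digital
prometheus: https://some-prometheus.cloudapps.digital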
...
# open grafana URL and see my-app metrics and logs in one place
```
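If the broker does report its URLs in the last operation message, scripting against them could be as simple as filtering `cf service` output. This assumes the `name: url` line format from the mock-up above, which is itself hypothetical, as is the function name.

```bash
#!/usr/bin/env bash
# Minimal sketch: extract dashboard URLs from `cf service` output,
# assuming the broker prints them as "grafana: <url>" / "prometheus: <url>".

observability_urls() {
  local instance="${1:-observability}"
  cf service "$instance" | grep -E '^(grafana|prometheus): https://'
}
```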
---
#### Great! What do we need?
* A broker implementation that can deploy other apps and services to the PaaS
* PaaS-deployable versions of:
  * Prometheus
  * Grafana
  * Telegraf
  * Jaeger maybe?
  * Alertmanager maybe?
* PaaS-deployable exporters for:
  * Container metric collection
* Sidecar processes for each service that watch the Cloud Foundry API and generate configuration based on bindings and service parameters
* Automation of network policies where required
* An easy way to install the broker (a custom buildpack that self-registers as a broker?)
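To make the config sidecar idea concrete, here is a sketch of the kind of Prometheus configuration it might render: one `dns_sd_configs` job per bound app, resolving each instance through the app's `apps.internal` A records. Discovering the bound apps from the Cloud Foundry API is left out; the function name is hypothetical and the app names and port are passed in directly as an assumption.

```bash
#!/usr/bin/env bash
# Minimal sketch of a Prometheus config sidecar's output: one DNS-SD scrape
# job per bound app. The bound-app list and port are given as arguments
# instead of being read from the Cloud Foundry API.

render_scrape_config() {
  local port="$1"; shift
  echo "scrape_configs:"
  local app
  for app in "$@"; do
    cat <<EOF
  - job_name: ${app}
    dns_sd_configs:
      - names: ['${app}.apps.internal']
        type: A
        port: ${port}
EOF
  done
}
```

Usage: `render_scrape_config 8080 my-app my-worker > prometheus.yml`. Prometheus re-reads its configuration on `SIGHUP`, or on `POST /-/reload` when started with `--web.enable-lifecycle`, so the sidecar would trigger one of those after writing the file.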
---
#### So far...
* [x] broker provision code for InfluxDB-backed Prometheus
* [x] broker provision code for Grafana
* [x] broker provision code for paas-exporter
* [ ] broker provision code for InfluxDB-backed Telegraf
* [ ] broker provision code for Alertmanager
* [x] Grafana config sidecar (bindings -> datasources)
* [x] Prometheus config sidecar (bindings -> DNS-SD scrape config)
* [x] Telegraf config sidecar (bindings -> syslog drain config)
* [ ] Grafana authentication config
* [ ] deprovisioning steps
* [ ] deployable broker
* [ ] ...
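Similarly, the Grafana config sidecar's output could be a datasource provisioning file with one entry per bound Prometheus. Grafana reads these files from its `provisioning/datasources` directory at startup; the function name, internal hostnames and port below are assumptions, and Grafana would also need a network policy to reach each Prometheus.

```bash
#!/usr/bin/env bash
# Minimal sketch of a Grafana config sidecar's output: a datasource
# provisioning file with one Prometheus datasource per bound instance.
# Hostnames (<name>.apps.internal) and port are assumptions.

render_datasources() {
  local port="$1"; shift
  echo "apiVersion: 1"
  echo "datasources:"
  local host
  for host in "$@"; do
    cat <<EOF
  - name: ${host}
    type: prometheus
    access: proxy
    url: http://${host}.apps.internal:${port}
EOF
  done
}
```

Usage: `render_datasources 8080 prometheus > provisioning/datasources/observability.yaml`.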