changed 2 years ago

hachyderm SLOs

to be more targeted in our efforts to keep the site online, it seems reasonable to define SLOs. this will let us focus where focus is needed, and rely on alerts instead of trawling dashboards.

this document will define SLOs per grafana dashboard, for some level of structure.

nginx SLOs

dashboard

mastodon SLOs

dashboard
while it would be great to claim all queues are under some threshold, different queues affect the service differently. in priority order:

* default: the local experience
* push: delivery to other instances
* ingress: incoming ActivityPub processing
* mailers: sending email
* scheduler: cron jobs like refreshing hashtags and cleaning logs
* pull: low priority tasks such as imports, backups, deleting users
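
as a rough sketch of how this ordering could be used when several queues are unhealthy at once, the snippet below sorts queue names by the priority listed above. the names `QUEUE_PRIORITY` and `by_priority` are made up for illustration, and putting unknown queues last is an assumption, not something decided here.

```python
# Queue names in the priority order listed above (highest priority first).
QUEUE_PRIORITY = ["default", "push", "ingress", "mailers", "scheduler", "pull"]


def by_priority(queues):
    """Return the given queue names sorted from highest to lowest priority.

    Assumption: queues not in QUEUE_PRIORITY sort after all known queues.
    """
    rank = {name: i for i, name in enumerate(QUEUE_PRIORITY)}
    return sorted(queues, key=lambda q: rank.get(q, len(QUEUE_PRIORITY)))
```

for example, `by_priority(["pull", "ingress", "default"])` puts `default` first, so a backlog there would be looked at before anything else.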

node SLOs

dashboard

- [ ] disk space used [/mnt/mastodon-storage]: 80%
- sys load 5m: 200%
  - this might be a bit tight as we do run the machines hot
- RAM: 90%
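
a minimal sketch of checking a node sample against the thresholds above. the metric names and helper functions are hypothetical. it also assumes that "sys load 5m" means the 5-minute load average relative to the number of cores, and that a threshold is only breached when the value is strictly above it; neither is confirmed here.

```python
# Thresholds from the list above, as percentages.
# The metric names are hypothetical labels, not real exporter metric names.
NODE_THRESHOLDS = {
    "disk_used_pct": 80.0,  # /mnt/mastodon-storage
    "load5_pct": 200.0,
    "ram_used_pct": 90.0,
}


def load_pct(load5, cpu_count):
    """Assumption: load percentage is the 5m load average divided by core count."""
    return 100.0 * load5 / cpu_count


def breached(sample):
    """Return the sorted names of metrics in `sample` that exceed their threshold.

    Assumption: a value equal to the threshold does not count as a breach.
    Metrics missing from `sample` are skipped.
    """
    return sorted(
        name
        for name, limit in NODE_THRESHOLDS.items()
        if name in sample and sample[name] > limit
    )
```

with a sample such as `{"disk_used_pct": 85.0, "load5_pct": load_pct(6.0, 4), "ram_used_pct": 90.0}`, only the disk threshold is reported, since a load of 6 on 4 cores works out to 150%.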

postgresql SLOs

dashboard

TODO: idle sessions or active sessions could be options
