# hachyderm postmortem: fritz overload 2023-01-03

_please do not change the format or delete sections_

_fill out anything in []_

|  |  |
|---------------|----------------------|
| Author | @dma |
| Collaborators | |
| Status | draft |

## executive summary

|  |  |
|------------|----------------|
| Impact | spikes in response times and "streaming down" alerts in discord |
| Root Cause | too much CPU being used on fritz |

## problem summary

|  |  |
|---------------------|-----------------|
| Duration of problem | ~40m |
| User impact | users experienced very long response times and 500s |
| Detection | alerts fired in discord |
| Resolution | changed mastodon-streaming service config on fritz |

## background

fritz runs mastodon-web and mastodon-streaming, and all other web nodes proxy to fritz. mastodon-web was configured with 16 processes, each with 20 threads. mastodon-streaming was configured with 16 processes.

## root causes and trigger

organic growth in users and traffic, coupled with US users returning from vacation, pushed CPU usage on fritz consistently above 90%. as a result, responses failed to make it back to the upstream web frontends.

## Impact

p90 response times grew from ~400ms to >2s. 502 responses increased to >1000 per minute.

## Lessons Learned

response times are very sensitive to the number of puma threads (reducing from 20 to 16 threads per process doubled GET response times).

the site functions pretty well even with fewer streaming processes.

## Things that went well

we had the core CPU load on the public dashboard.

## Things that went poorly

in an attempt to get things under control, both mastodon-streaming and mastodon-web were changed at the same time. the puma change was then reverted, as we had over-corrected and response times were getting quite bad.

## Where we got lucky

@dma was already keyed in to fritz thanks to an earlier issue where certs hadn't been renewed.

## Action items

| Action item | Type | GitHub Issue |
|-------------|--------------|--------------|
| reduce the number of streaming processes on fritz from 16 to 12 | repair | n/a |
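
the process and thread counts mentioned in this postmortem can be tallied with a short sketch. this is only illustrative arithmetic on the numbers above, not the actual service config on fritz. the `ServiceConfig` class and `percent_change` helper are hypothetical names, and counting each streaming process as a single worker is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Process/thread settings for one service (hypothetical helper type)."""

    name: str
    processes: int
    # assumption: a streaming process is counted as one worker
    threads_per_process: int = 1

    @property
    def workers(self) -> int:
        """Total concurrent workers: processes times threads per process."""
        return self.processes * self.threads_per_process


def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) * 100.0 / before


# numbers taken from the background, lessons learned, and action item sections
web_before = ServiceConfig("mastodon-web", processes=16, threads_per_process=20)
web_tried = ServiceConfig("mastodon-web", processes=16, threads_per_process=16)
streaming_before = ServiceConfig("mastodon-streaming", processes=16)
streaming_after = ServiceConfig("mastodon-streaming", processes=12)

for before, after in [(web_before, web_tried), (streaming_before, streaming_after)]:
    change = percent_change(before.workers, after.workers)
    print(f"{before.name}: {before.workers} -> {after.workers} workers ({change:+.0f}%)")
```

under these assumptions, the puma change that was tried and then reverted removed a fifth of the web threads, while the streaming change that was kept removed a quarter of the streaming processes.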