owned this note
owned this note
Published
Linked with GitHub
# slow responses
## alert
```
Value: [ var='C0' metric='max(nginxlog_http_response_time_seconds{quantile="0.9",status=~"2[0-9]*",app="hachyderm",method="GET"})' labels={} value=11.141 ]
Labels:
- alertname = global 90th %ile response time (GET 200)
- grafana_folder = nginx alerts
Annotations:
- description = global 90th %ile response times (for GET 200s) have been >10s for 3m
- summary = HTTP responses are too slow
Source: https://grafana.hachyderm.io/alerting/grafana/YfwP_DK4z/view?orgId=1
Silence: https://grafana.hachyderm.io/alerting/silence/new?alertmanager=grafana&matcher=alertname%3Dglobal+90th+%25ile+response+time+%28GET+200%29&matcher=grafana_folder%3Dnginx+alerts
Dashboard: https://grafana.hachyderm.io/d/GxDpHJvVz?orgId=1
Panel: https://grafana.hachyderm.io/d/GxDpHJvVz?orgId=1&viewPanel=47
```
## investigation
first, i had to remember that the alert was firing in my local timezone (UTC) but the logs were in server timezone (UTC+1).
once i had the maths right, i could dig into some logs.
### slowest responses
picking the requests with the largest `rt` or `urt` entries in the access logs during the time of the last alert:
```
$ sudo tail -n 8000000 /var/log/nginx/access.log | awk '{print $1,$4,$7,$9,$(NF-1),$NF}' | grep "04/Dec/2022:10:1" | awk '{print $1,$3,$4,$5,$6}' | sort -k4,5 -r | head -n 30
100.79.63.43 /users/intentionaut/inbox 202 rt=9.109 urt=8.948
100.79.63.43 /api/v2/search?type=accounts&q=m 200 rt=7.358 urt=7.358
100.79.63.43 /inbox 202 rt=7.146 urt=6.985
100.79.63.43 /api/v2/search?type=accounts&q=ma 200 rt=4.660 urt=4.659
100.78.152.10 /media_proxy/109321232760470873/small 302 rt=4.235 urt=4.236
100.78.152.10 /media_proxy/109321232057637678/small 302 rt=4.186 urt=4.186
100.78.152.10 /media_proxy/109321232511542375/small 302 rt=4.012 urt=4.012
100.79.63.43 /api/v2/search?type=accounts&q=mast 200 rt=3.124 urt=3.124
100.79.63.43 /api/v2/search?type=accounts&q=mastod 200 rt=2.896 urt=2.896
100.79.63.43 /api/v2/search?type=accounts&q=mastodon 200 rt=2.840 urt=2.841
100.78.152.10 /media_proxy/109325556426753066/small 302 rt=2.477 urt=2.477
100.78.152.10 /media_proxy/109325556166561757/small 302 rt=2.425 urt=2.424
100.78.152.10 /media_proxy/109325556513566180/small 302 rt=2.408 urt=2.408
100.79.63.43 /api/v2/media 200 rt=2.276 urt=1.794
100.95.202.109 /api/v1/directory?order=active&local=false&limit=2000 200 rt=2.227 urt=2.226
100.95.202.109 /api/v1/directory?order=active&local=false&limit=2000 200 rt=2.161 urt=2.161
100.95.202.109 /api/v1/directory?order=active&local=false&limit=2000 200 rt=2.108 urt=2.108
100.78.152.10 /media_proxy/109325556329095718/small 302 rt=2.100 urt=2.100
100.79.63.43 /inbox 202 rt=2.035 urt=1.875
100.95.202.109 /inbox 202 rt=2.027 urt=2.022
100.95.202.109 /api/v2/search?type=&q=S&resolve=true 200 rt=1.978 urt=1.979
100.79.63.43 /api/v2/search?type=accounts&q=b 200 rt=1.880 urt=1.880
100.79.63.43 /inbox 202 rt=1.780 urt=1.619
100.95.202.109 /api/v1/media? 200 rt=1.657 urt=1.486
100.79.63.43 /inbox 202 rt=1.581 urt=1.420
100.95.202.109 /api/v1/media? 200 rt=1.571 urt=1.411
100.95.202.109 /api/v1/media? 200 rt=1.559 urt=1.392
100.79.63.43 /inbox 202 rt=1.526 urt=1.365
100.95.202.109 /users/Mara/statuses/109452191618583143 200 rt=1.523 urt=1.523
100.79.63.43 /inbox 202 rt=1.521 urt=1.360
```
filtering this further to pull out unique url, ip, and status codes:
```
$ sudo tail -n 8000000 /var/log/nginx/access.log | awk '{print $1,$4,$7,$9,$(NF-1),$NF}' | grep "04/Dec/2022:10:1" | awk '{print $1,$3,$4,$5,$6}' | sort -k4,5 -r | head -n 30 | awk '{print $1,$2,$3}' | sort | uniq -c | sort -k1 -rn
6 100.79.63.43 /inbox 202
3 100.95.202.109 /api/v1/media? 200
3 100.95.202.109 /api/v1/directory?order=active&local=false&limit=2000 200
1 100.95.202.109 /users/Mara/statuses/109452191618583143 200
1 100.95.202.109 /inbox 202
1 100.95.202.109 /api/v2/search?type=&q=S&resolve=true 200
1 100.79.63.43 /users/intentionaut/inbox 202
1 100.79.63.43 /api/v2/search?type=accounts&q=mastodon 200
1 100.79.63.43 /api/v2/search?type=accounts&q=mastod 200
1 100.79.63.43 /api/v2/search?type=accounts&q=mast 200
1 100.79.63.43 /api/v2/search?type=accounts&q=ma 200
1 100.79.63.43 /api/v2/search?type=accounts&q=m 200
1 100.79.63.43 /api/v2/search?type=accounts&q=b 200
1 100.79.63.43 /api/v2/media 200
1 100.78.152.10 /media_proxy/109325556513566180/small 302
1 100.78.152.10 /media_proxy/109325556426753066/small 302
1 100.78.152.10 /media_proxy/109325556329095718/small 302
1 100.78.152.10 /media_proxy/109325556166561757/small 302
1 100.78.152.10 /media_proxy/109321232760470873/small 302
1 100.78.152.10 /media_proxy/109321232511542375/small 302
1 100.78.152.10 /media_proxy/109321232057637678/small 302
```
## conclusions?
- search is incremental and some searches are very slow
- upstream fix required to maybe delay searching until 3+ characters are entered or the user hits enter.
- some inboxes are particularly slow
- ``/users/intentionaut/inbox`
- some `/media_proxy` fetches are particularly slow
- these all seem to result in 302