---
tags: public-doc
---

# Flashbots Auction - Service Outage Reports

## Flashbots Relay & Dashboards & Website service degradation 7/31/2021

### Impact

At 2:15pm ET, users reported that the dashboard service was unavailable and that the flashbots.net website was not loading. A DNS change affected clients doing DNS lookups of various hostnames, ultimately making the relay and other services unreachable for approximately an hour for many clients.

### Root Cause

Upon investigation, it was discovered that during a domain name renewal, the primary and secondary DNS NS records had been reset to those of the domain registrar. This change caused domain name lookups for flashbots.net, data.flashbots.net, and relay.flashbots.net to fail for clients as the NS records propagated.

### Fix

DNS records were copied to the registrar DNS immediately upon discovery. This allowed clients to properly resolve the affected hostnames within one hour of service disruption. The DNS NS records were also updated to the correct name servers, so that the primary DNS servers could take over serving flashbots.net hostnames.

### Alerting

Alerting for successful resolution of all primary domains will be added, as well as team-wide alerting for website downtime.

## Flashbots Relay degraded service 5/13/2021

### Impact

At 7:30 pm Eastern on 5/13/2021, an upgrade was made to the Flashbots Relay to resize the server and improve load handling. This upgrade caused AWS to assign a new IP address to the server. Since miners manually whitelist the IP of the server in their firewalls, these miners stopped receiving bundles.

### Root Cause

The Flashbots Relay was using an automatically assigned IP address instead of an Elastic IP.

### Fix

Moved the Flashbots Relay to an Elastic IP, created a backup deployment of the relay, and asked the miners to whitelist the new IPs. Service was re-established within the hour.
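
The whitelist mismatch behind this outage can be checked mechanically before and after a server change. The following is a minimal sketch, not the relay's actual tooling: the function name, the whitelist contents, and the addresses (taken from the RFC 5737 documentation ranges) are all hypothetical.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(relay_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if relay_ip matches any whitelisted address or CIDR range."""
    addr = ipaddress.ip_address(relay_ip)
    # strict=False lets a bare address such as "192.0.2.10" act as a /32 network.
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in whitelist
    )


# Hypothetical miner firewall whitelist (documentation-range addresses only).
MINER_WHITELIST = ["192.0.2.10", "198.51.100.0/24"]

if __name__ == "__main__":
    # Hypothetical relay addresses before and after a server resize.
    for ip in ("192.0.2.10", "203.0.113.7"):
        status = "ok" if is_whitelisted(ip, MINER_WHITELIST) else "NOT whitelisted"
        print(f"{ip}: {status}")
```

Running a check like this against the relay's current public address after any infrastructure change would flag a newly assigned address before miners silently stop receiving bundles.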

### Alerting

The Flashbots Relay throws errors if bundles are not delivered to the miners, which allowed us to quickly respond to the outage.

## Flashbots Relay degraded service 4/16/2021-4/19/2021

### Impact

Starting at approximately 4/16 7pm PST and ending 4/19 10pm PST, the relay database was overwhelmed and unable to send bundles quickly to miners. Approximately 70-80% of bundles were being dropped in the final step of the relay.

![](https://hackmd.io/_uploads/SJDKH5h8_.png)

### Root Cause

The database was being overwhelmed by refreshing the stats view for each user (see https://github.com/flashbots/mev-relay-js/#flashbots_getuserstats). This data has grown exponentially since the view was first made, and it finally grew to the point where the hourly cron job generating it never completed in under an hour. The database was left constantly refreshing this view, causing write latencies to be consistently over 0.1s. This in turn made the final relay step extremely slow, since that step writes out a receipt for each miner that it sends a bundle to.

Write latency spiking in the database:

![](https://hackmd.io/_uploads/rkUjB92Ld.png)

### Fix

Disabled the stats views for now.

### Alerting

This was a tricky situation, since none of our normal alerts triggered. Bundles *were* making it through the relay, albeit fewer of them. We've now added alerts specifically checking for bundles that get stuck in this last step. We've also added tighter alerts monitoring the database performance.
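
As an illustration of the two kinds of alert described above, here is a minimal sketch of a check over one monitoring window. It is not the relay's actual alerting code: the `RelayWindow` type, its field names, and the 5% drop-rate threshold are assumptions made for the example, and the 0.1s latency threshold simply reuses the write latency figure mentioned in the root cause.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RelayWindow:
    """Hypothetical metrics for one monitoring window."""
    bundles_received: int    # bundles that reached the final relay step
    bundles_delivered: int   # bundles actually sent on to miners
    write_latency_s: float   # average database write latency, in seconds


# Assumed thresholds, chosen only for illustration.
MAX_DROP_RATE = 0.05
MAX_WRITE_LATENCY_S = 0.1


def check_window(window: RelayWindow) -> List[str]:
    """Return a list of alert messages for the window (empty if healthy)."""
    alerts = []
    # Skip the drop-rate check when no bundles arrived, to avoid dividing by zero.
    if window.bundles_received > 0:
        drop_rate = 1 - window.bundles_delivered / window.bundles_received
        if drop_rate > MAX_DROP_RATE:
            alerts.append(f"bundle drop rate {drop_rate:.0%} in final relay step")
    if window.write_latency_s > MAX_WRITE_LATENCY_S:
        alerts.append(f"database write latency {window.write_latency_s:.2f}s")
    return alerts


if __name__ == "__main__":
    # A window resembling the outage: most bundles dropped, slow writes.
    for message in check_window(RelayWindow(100, 25, 0.15)):
        print("ALERT:", message)
```

Checking the delivered-to-received ratio, rather than only whether any bundles were delivered, is what lets this kind of alert fire in a partial-degradation case like this one.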