# EPF Update #5
### TLDR / Main Updates
- Postgres DB backend for bid analysis
- API endpoints for analysis stats
### Dev Update for 01/23/2023
Last week's update mentioned WIP work on persisting bid analysis data, and now that the patch is done, this week's update contains some notes on how it works and some useful things coming with it. So, last week I worked on the logic to persist bid analysis data in a flexible way and implemented an initial query layer on top of the DB store to get bid analysis data. I committed the cleaned-up patch [here](https://github.com/ralexstokes/relay-monitor/commit/1ad5e2959a5519c3930316cd97aee603bebcf178).
The reason for this work was two-fold. First, bid analyses have so far been stored in-memory as a mapping of analysis type -> count of those analysis events, so persistence was needed. Second, because the analyses lived only in memory, you couldn't query the results in a flexible way, e.g. by slot range or by category.
#### Bid Analysis
As a quick recap, a bid "analysis" is an event / object that the relay monitor generates when it receives a bid from the relay (builder) via a `getHeader()` request. The relay monitor can run various analyses, ranging from verifying the bid w.r.t. consensus to verifying validator preferences and checking whether the bid / corresponding payload conforms to them. For now, the relay monitor has just the consensus-level analysis / verification, but adding new analyzers will be straightforward. Regardless, with persistence to the DB, the analysis data is stored in a table with the following schema:
```
id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
inserted_at timestamp NOT NULL DEFAULT current_timestamp,
-- bid "context" data
slot bigint NOT NULL,
parent_hash varchar(66) NOT NULL,
relay_pubkey varchar(98) NOT NULL,
proposer_pubkey varchar(98) NOT NULL,
category bigint,
reason text
```
The "context" data helps identify the bid (in a follow-up PR this can be refactored to reduce the duplication). The `category` field identifies the result of the analysis and is currently defined as the following enum:
```
type AnalysisCategory uint

const (
	ValidBidCategory AnalysisCategory = iota
	InvalidBidConsensusCategory
	InvalidBidIgnoredPreferencesCategory
)
```
The `reason` is an optional string that the analyzer can use to provide context.
Given this, the scenarios the relay monitor currently handles are:
1. Analyze bid, bid is valid -- `category` is set to `ValidBidCategory`, `reason` is empty, and the analysis is persisted
2. Analyze bid, bid is faulty (e.g. missing field) -- `category` is set to `InvalidBidConsensusCategory` and `reason` provides an "error string"
3. Analyze bid, bid is valid but validator preference is not honored (e.g. gas limit) -- `category` is set to `InvalidBidIgnoredPreferencesCategory` and `reason` can be used for more context
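The three scenarios above can be sketched as a small classification function. This is a minimal, hypothetical sketch: the `Bid` and `ValidatorPreferences` types and the `classifyBid` helper are simplified stand-ins, not the actual types in relay-monitor.

```go
package main

import "fmt"

// AnalysisCategory mirrors the enum from the patch.
type AnalysisCategory uint

const (
	ValidBidCategory AnalysisCategory = iota
	InvalidBidConsensusCategory
	InvalidBidIgnoredPreferencesCategory
)

// Bid and ValidatorPreferences are hypothetical, simplified stand-ins
// for the real types in relay-monitor.
type Bid struct {
	ParentHash string
	GasLimit   uint64
}

type ValidatorPreferences struct {
	GasLimit uint64
}

// classifyBid maps a bid onto the three scenarios: valid bid,
// consensus fault, and ignored validator preferences.
func classifyBid(bid Bid, prefs ValidatorPreferences) (AnalysisCategory, string) {
	if bid.ParentHash == "" {
		// Scenario 2: faulty bid, e.g. a missing field.
		return InvalidBidConsensusCategory, "missing parent hash"
	}
	if bid.GasLimit != prefs.GasLimit {
		// Scenario 3: valid bid, but a validator preference is not honored.
		return InvalidBidIgnoredPreferencesCategory,
			fmt.Sprintf("gas limit %d does not match preference %d", bid.GasLimit, prefs.GasLimit)
	}
	// Scenario 1: valid bid; reason stays empty.
	return ValidBidCategory, ""
}

func main() {
	prefs := ValidatorPreferences{GasLimit: 30_000_000}
	cat, reason := classifyBid(Bid{ParentHash: "0xabc", GasLimit: 25_000_000}, prefs)
	fmt.Printf("category=%d reason=%q\n", cat, reason)
}
```

Whatever the classification, the resulting `(category, reason)` pair is what gets persisted alongside the bid "context" columns.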
This covers the "save" portion of bid analysis. For "get", there are now a couple of generic "getters" that can be used to retrieve whichever faults (or successful bid verifications) you'd like from the DB:
- `GetCountAnalysisWithinSlots` -- fetch analysis stats using slots as "time"
- `GetCountAnalysisWithinDuration` -- fetch analysis stats using insertion time (used interchangeably with "analysis time" since we assume a save to DB shortly after analysis)
Both functions accept an `AnalysisQueryFilter`
```
type AnalysisQueryFilter struct {
	Category   AnalysisCategory
	Comparator string
}
```
which helps filter results. This allows queries like "count faults of class 'ignored validator preference' that have occurred in the last 10 minutes".
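To make the filter concrete, here is a hedged sketch of how `Category` and `Comparator` could compose into the WHERE clause of such a count query. The `buildCountQuery` helper and the `analysis` table name are assumptions for illustration; a real implementation would also use parameterized queries rather than string formatting.

```go
package main

import "fmt"

type AnalysisCategory uint

const (
	ValidBidCategory AnalysisCategory = iota
	InvalidBidConsensusCategory
	InvalidBidIgnoredPreferencesCategory
)

// AnalysisQueryFilter as in the patch: Comparator lets a caller ask for
// "category = X" (exact match) or e.g. "category <> X".
type AnalysisQueryFilter struct {
	Category   AnalysisCategory
	Comparator string
}

// buildCountQuery is a hypothetical helper showing how the filter could
// compose into a count query over the last `minutes` minutes of inserted
// analyses. Illustration only: use parameterized queries in real code.
func buildCountQuery(filter AnalysisQueryFilter, minutes int) string {
	return fmt.Sprintf(
		"SELECT count(id) FROM analysis WHERE category %s %d AND inserted_at >= now() - interval '%d minutes'",
		filter.Comparator, filter.Category, minutes,
	)
}

func main() {
	// "count faults of class 'ignored validator preference' in the last 10 minutes"
	q := buildCountQuery(AnalysisQueryFilter{
		Category:   InvalidBidIgnoredPreferencesCategory,
		Comparator: "=",
	}, 10)
	fmt.Println(q)
}
```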
#### API
There are a few new API endpoints for getting bid analysis stats. These are mostly proof-of-concept and barebones so far, but they show how the bid analysis numbers can be returned to a client / interested relay.
```
GetAnalysisCount = "/monitor/v1/analysis/count"
GetAnalysisCountBidValid = "/monitor/v1/analysis/count/valid"
GetAnalysisCountBidFault = "/monitor/v1/analysis/count/fault"
```
#### Supported Queries
There are a number of things that can be done now, stats-wise. As an example, here's an item from the previous week's update:
> 2. Support for queries of bid analysis, e.g. "count faults from slot n to n + 10" or just "how many bids have been analyzed / inserted in the last hour"
The query "how many bids have been analyzed / inserted in the last hour" can now be served on the route
```
/monitor/v1/analysis/count?minutes=60
```
With the new endpoints, "count faults from slot n to n + 10" becomes an API request to the relay monitor on the route
```
/monitor/v1/analysis/count/fault?slots=10
```
More query params can be added for further filtering options.
### Work for upcoming week
There are a few smaller-scoped features that will be added on top of this:
- Filtering by relay, since right now all the stats endpoints assume only a single relay
- Generate a per-relay fault report from persisted bid analysis data (currently a counter is still kept in-memory). This is easy to do using the "getters" above
- More stats API endpoints and more query params for filtering, etc.
I started, but didn't finish, the "streaming" part of the bid analysis fault notification, which I'm aiming to complete this week. The idea is to have a nice way to be notified of faults (or successful verifications of valid bids) without having to poll with queries. API endpoints for stats are still useful for generic record-keeping purposes, but for this to be useful to a relay we need a good way to get notified when something doesn't check out with a bid.