EPF Update #5

TLDR / Main Updates

  • Postgres DB backend for bid analysis
  • API endpoints for analysis stats

Dev Update for 01/23/2023

Last week's update mentioned WIP work on persisting bid analysis data; now that the patch is done, this week's update contains some notes on how it works and some of the useful things that come with it. So, last week I worked on the logic to persist bid analysis data in a flexible way and implemented an initial query layer on top of the DB store to retrieve bid analysis data. I committed the cleaned-up patch here.

The reason for this work was two-fold. First, bid analyses have so far been stored in-memory as a mapping of analysis type -> count of those analysis events, so persistence was needed. Second, because analyses were kept in-memory, you couldn't really query for results in a flexible way, e.g. by slot range or by category.

Bid Analysis

As a quick recap, a bid "analysis" is an event / object that the relay monitor generates when it receives a bid from the relay (builder) via a getHeader() request. The relay monitor can perform various analyses, ranging from verifying a bid w.r.t. consensus to verifying validator preferences and whether the bid / corresponding payload conforms to them. For now, the relay monitor has just the consensus-level analysis / verification, but adding new analyzers will be very easy. Regardless, with persistence to the DB, the analysis data is stored in a table with the following schema (i.e. format)

id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
inserted_at timestamp NOT NULL default current_timestamp,

-- bid "context" data
slot                    bigint NOT NULL,
parent_hash             varchar(66) NOT NULL,
relay_pubkey            varchar(98) NOT NULL,
proposer_pubkey         varchar(98) NOT NULL,

category                bigint,
reason                  text

The "context" data helps identify the bid (in a followup PR this can be refactored to not require this much duplication). The category field identifies what the result of the analysis was and right now is defined as the following enum

type AnalysisCategory uint

const (
	ValidBidCategory AnalysisCategory = iota
	InvalidBidConsensusCategory
	InvalidBidIgnoredPreferencesCategory
)

The reason is an optional string that the analyzer can use to provide context.
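Putting the schema and the enum together, a persisted analysis row can be viewed Go-side roughly as the struct below. This is a minimal sketch: the struct name AnalysisRow and its field names are my own illustration of the schema above, not the actual types in the patch.

```go
package main

import (
	"fmt"
	"time"
)

// AnalysisCategory mirrors the enum from the relay monitor.
type AnalysisCategory uint

const (
	ValidBidCategory AnalysisCategory = iota
	InvalidBidConsensusCategory
	InvalidBidIgnoredPreferencesCategory
)

// AnalysisRow is a hypothetical Go-side view of one row in the
// analysis table; fields follow the SQL schema above.
type AnalysisRow struct {
	ID             uint64
	InsertedAt     time.Time
	Slot           uint64
	ParentHash     string // 0x-prefixed hash, varchar(66)
	RelayPubkey    string // BLS pubkey, varchar(98)
	ProposerPubkey string // BLS pubkey, varchar(98)
	Category       AnalysisCategory
	Reason         string // optional context from the analyzer
}

func main() {
	row := AnalysisRow{
		Slot:     5000000,
		Category: InvalidBidConsensusCategory,
		Reason:   "missing field",
	}
	fmt.Println(row.Category, row.Reason)
}
```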

Given this, the scenarios the relay monitor handles currently are

  1. Analyze bid, bid is valid – category is set to ValidBidCategory, reason is empty, and the analysis is persisted
  2. Analyze bid, bid is faulty (e.g. missing field) – category is set to InvalidBidConsensusCategory and reason provides an "error string"
  3. Analyze bid, bid is valid but a validator preference is not honored (e.g. gas limit) – category is set to InvalidBidIgnoredPreferencesCategory and reason can be used for more context

This covers the "save" portion of bid analysis. For "get", there are now a couple of generic "getters" for analysis data from the DB that can be used to retrieve whichever faults (or bid verification "successes") you'd like

  • GetCountAnalysisWithinSlots – fetch analysis stats using slots as "time"
  • GetCountAnalysisWithinDuration – fetch analysis stats using insertion time (used interchangeably with "analysis time" since we assume a save to the DB shortly after analysis)

Both functions accept an AnalysisQueryFilter

type AnalysisQueryFilter struct {
	Category   AnalysisCategory
	Comparator string
}

which helps filter results. This allows queries like "count faults of class 'ignored validator preference' that have occurred in the last 10 minutes".
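The Comparator field is what makes the filter flexible: "=" selects a single category, while ">" matches every fault category above ValidBidCategory at once. A minimal sketch of how the filter could translate into a SQL condition (whereClause is my own illustration, not the query builder from the patch):

```go
package main

import "fmt"

type AnalysisCategory uint

const (
	ValidBidCategory AnalysisCategory = iota
	InvalidBidConsensusCategory
	InvalidBidIgnoredPreferencesCategory
)

type AnalysisQueryFilter struct {
	Category   AnalysisCategory
	Comparator string
}

// whereClause is a hypothetical sketch of turning the filter into a
// SQL condition over the category column of the analysis table.
func whereClause(f AnalysisQueryFilter) string {
	return fmt.Sprintf("category %s %d", f.Comparator, f.Category)
}

func main() {
	// Count every fault: any category greater than ValidBidCategory (0).
	faults := AnalysisQueryFilter{Category: ValidBidCategory, Comparator: ">"}
	fmt.Println(whereClause(faults))

	// Count only ignored-preference faults.
	prefs := AnalysisQueryFilter{Category: InvalidBidIgnoredPreferencesCategory, Comparator: "="}
	fmt.Println(whereClause(prefs))
}
```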

API

There are a few new API endpoints for getting bid analysis stats. These are mostly proof-of-concept and barebones so far, but they show how the bid analysis numbers can be returned to a client / interested relay.

GetAnalysisCount         = "/monitor/v1/analysis/count"
GetAnalysisCountBidValid = "/monitor/v1/analysis/count/valid"
GetAnalysisCountBidFault = "/monitor/v1/analysis/count/fault"

Supported Queries

There are a number of things that can be done now, stats / analysis-count wise. As an example, here's what I wrote in last week's update:

  1. Support for queries of bid analysis, e.g. "count faults from slot n to n + 10" or just "how many bids have been analyzed / inserted in the last hour"

"how many bids have been analyzed / inserted in the last hour" can be queried on route

/monitor/v1/analysis/count?minutes=60

With the new endpoints, "count faults from slot n to n + 10" becomes an API request to the relay monitor on route

/monitor/v1/analysis/count/fault?slots=10

More query params can add more filtering options.

Work for upcoming week

There are a few smaller-scoped features that will be added on top of this:

  • Filtering by relay, since right now all the stats endpoints assume a single relay
  • Generating a per-relay fault report from persisted bid analysis data (the monitor currently still keeps a counter in-memory). This is very easy to do using the "getters" from above
  • More stats API endpoints and more query params for filtering, etc.

I got started on, but didn't finish, the "streaming" part of the bid analysis fault notification, which I'm aiming to complete this week. The idea is to have a nice way to be notified of faults (or successful verifications of valid bids) without having to send queries. API endpoints for stats are still useful for general record-keeping purposes, but for this to be useful to a relay, we need a nice way to be notified when something doesn't check out with a bid.