
Track latest validator performance in Prysm

The following is a design document for allowing a beacon node to track more detailed information about the performance of a set of validators than what can currently be extracted from a BeaconState object.

The problem:

One of the main tools to debug p2p/forkchoice/performance problems is attestation/proposal timing information. With the Altair fork, we lost easy access to the attestation inclusion distance, as the PendingAttestation object is no longer needed. Stakers currently need to rely on a centralized entity (like an explorer) to obtain this information, or they need to parse the beacon blocks themselves.

Currently, to obtain information about validator performance, the validator client needs to ask the beacon node to fetch it from the objects it already has, or to compute it if it can only be obtained by processing these objects. Since performance information may be required on a per-epoch basis, for multiple validators, this places an unnecessary burden on the beacon node.

I believe that, to alleviate this problem, it is unavoidable to break (albeit minimally) the separation of duties between validators and beacon nodes. I propose two extra beacon node CLI flags, paralleling what Lighthouse already supports: --track-validator-auto and --track-validator-indices. The second flag takes a list of validator indices for which to track performance-related information. The first flag tracks this information for every validating key connected via RPC. Through these flags, the beacon node learns which validators have a client interested in their performance parameters.
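As a rough illustration of the second flag, the sketch below parses a list of indices into a deduplicated slice. The comma-separated value format and the helper name parseTrackedIndices are assumptions made for this example; the proposal does not fix either, and this is not Prysm's actual flag handling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTrackedIndices turns a raw flag value such as "1, 2,30" into a list of
// validator indices. Hypothetical helper: the comma-separated format is an
// assumption, not something the proposal specifies.
func parseTrackedIndices(raw string) ([]uint64, error) {
	var indices []uint64
	seen := make(map[uint64]bool)
	for _, field := range strings.Split(raw, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue // tolerate trailing commas and empty input
		}
		idx, err := strconv.ParseUint(field, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid validator index %q: %w", field, err)
		}
		if !seen[idx] {
			seen[idx] = true
			indices = append(indices, idx)
		}
	}
	return indices, nil
}
```

Deduplicating at parse time keeps the tracking list small, so the per-attestation membership check stays a single map lookup.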

A naïve solution: first iteration

A naive solution to this problem, if it were simply about obtaining the inclusion distance, would be to add a log entry when processing an attestation, as part of beacon-block processing, whenever this attestation is from one of our tracked validators. At this stage we can log a message

[INFO] (track validator performance) validator index = xxx attestation slot = yyy inclusion slot zzz...

This approach has the advantage of simplicity, as it would merely require adding a CLI flag and a single check in attestation processing. However, there is a big problem with this approach, and it lies in the separation of duties between the beacon node and the validator clients. The requested information is of interest to the validator, not to the node. And this validator may be far away, with no access to the beacon node besides its RPC port.
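A minimal sketch of that single check, assuming the tracked indices are held in a map. The type names and the helper trackedLogLine are hypothetical and only serve to show how little code the naive approach needs; they are not Prysm's attestation-processing code.

```go
package main

import "fmt"

// ValidatorIndex and Slot are stand-ins for the node's own index and slot types.
type ValidatorIndex uint64
type Slot uint64

// trackedLogLine returns the log message for an attesting validator, or false
// when the validator is not in the tracked set. Hypothetical helper.
func trackedLogLine(tracked map[ValidatorIndex]bool, idx ValidatorIndex, attSlot, inclSlot Slot) (string, bool) {
	if !tracked[idx] {
		return "", false // the single check: untracked validators cost one map lookup
	}
	msg := fmt.Sprintf(
		"[INFO] (track validator performance) validator index = %d attestation slot = %d inclusion slot %d",
		idx, attSlot, inclSlot,
	)
	return msg, true
}
```

The message only ever reaches the beacon node's own log, which is exactly the limitation described above: a remote validator client cannot see it.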

The right solution

In order to maintain the idea that the consumer is the validator client while the provider is the beacon node, the design becomes a bit more involved:

  • The beacon node will keep a cache of just the last participation for each validator in its tracking list.
  • The beacon node will update this cache every time it processes an attestation/participation for the tracked validators.
  • The beacon node will expose an RPC endpoint that a validator client can query to read this cache.
  • The validator client may request this information from the node each epoch in order to log/update metrics, if the user so decides.

This approach has a few advantages. Maintaining the cache has minimal impact, since it requires a single check per object processed and no block/state fetching. And when a validator client requests an update of its performance metrics/logs, the beacon node does not need to perform any computation; it simply returns the last available information.

This does not render custom endpoints like validator/performance obsolete, since the new API endpoint would take neither an epoch nor any timing parameter; it would simply reply with each tracked validator's last participation in the cache.
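The cache described above could look roughly like the following. This is a sketch under stated assumptions: the names PerformanceCache, LastPerformance, RecordAttestation and Last are hypothetical, the struct carries only a few of the proposed fields, and the mutex is my addition on the assumption that block processing and RPC handlers run concurrently.

```go
package main

import "sync"

type ValidatorIndex uint64
type Slot uint64

// LastPerformance is a trimmed, hypothetical stand-in for the proposed
// ValidatorLastPerformance message.
type LastPerformance struct {
	AttestationSlot   Slot
	InclusionDistance uint64
	CorrectSource     bool
	CorrectTarget     bool
	CorrectHead       bool
}

// PerformanceCache keeps only the most recent entry per tracked validator.
type PerformanceCache struct {
	mu      sync.RWMutex
	entries map[ValidatorIndex]*LastPerformance
}

// NewPerformanceCache registers the tracked indices with empty entries.
func NewPerformanceCache(indices []ValidatorIndex) *PerformanceCache {
	c := &PerformanceCache{entries: make(map[ValidatorIndex]*LastPerformance, len(indices))}
	for _, idx := range indices {
		c.entries[idx] = nil
	}
	return c
}

// RecordAttestation overwrites the entry of a tracked validator and reports
// whether anything was stored. Untracked validators cost one map lookup.
func (c *PerformanceCache) RecordAttestation(idx ValidatorIndex, attSlot, inclSlot Slot, source, target, head bool) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, tracked := c.entries[idx]; !tracked || inclSlot < attSlot {
		return false
	}
	c.entries[idx] = &LastPerformance{
		AttestationSlot:   attSlot,
		InclusionDistance: uint64(inclSlot - attSlot),
		CorrectSource:     source,
		CorrectTarget:     target,
		CorrectHead:       head,
	}
	return true
}

// Last returns a copy of the latest entry, without any recomputation.
func (c *PerformanceCache) Last(idx ValidatorIndex) (LastPerformance, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p := c.entries[idx]
	if p == nil {
		return LastPerformance{}, false
	}
	return *p, true
}
```

Because each update replaces the previous entry, memory stays proportional to the number of tracked validators, and a read never touches blocks or states.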

Some technical aspects

Concretely I propose:

  • The addition of the two above-mentioned flags to the beacon node.
  • Repurposing the validator client flags --disable-account-metrics and --disable-rewards-penalty-logging to toggle whether or not the validator client calls the above-mentioned API endpoint every epoch.
  • The addition of a new protobuf structure (types are all wrong) placed in proto/eth/v2
message ValidatorLastPerformance {
    uint64 balance = 1;
    int64 balance_change = 2;          // signed, since a balance can decrease
    bool correct_source = 3;
    bool correct_target = 4;
    bool correct_head = 5;
    uint64 inclusion_distance = 6;
    uint64 attestation_slot = 7;
    repeated uint64 proposed_blocks = 8; // either last proposed block or every block proposed in last epoch
    // Field 9: sync_committee_performance. Type to be specified.
    // Field 10: more_specific_debug_info, timing information about processed blocks/attestations. Type to be specified.
}
  • The addition of a Go map to the blockchain.config service
type config struct {
    ...
    PerformanceCache    map[types.ValidatorIndex]*v2.ValidatorLastPerformance
    ...
}
  • The addition of a new protobuf structure ValidatorLastPerformanceResponse, which just consists of repeated ValidatorLastPerformance messages, one for each tracked index.

The beacon node simply updates the cache, discarding any old information, whenever a new attestation for a tracked validator is processed in the blockchain service. The RPC server just returns the cached information on the API query.
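To make the read path concrete, here is a hedged sketch of how the RPC side might assemble its reply from the map. The names buildResponse and LastPerformanceResponse are hypothetical, and the Index field is an assumption of mine: the message as proposed carries no validator index, so some way of associating entries with validators would still need to be decided.

```go
package main

import "sort"

type ValidatorIndex uint64

// LastPerformance is a trimmed stand-in for ValidatorLastPerformance.
// Index is an assumed extra field, not part of the proposed message.
type LastPerformance struct {
	Index             ValidatorIndex
	AttestationSlot   uint64
	InclusionDistance uint64
}

// LastPerformanceResponse mirrors the proposed response: one entry per
// tracked validator that has cached information.
type LastPerformanceResponse struct {
	Performances []LastPerformance
}

// buildResponse copies the cached entries into a response, sorted by index so
// that the output is deterministic despite Go's random map iteration order.
func buildResponse(cache map[ValidatorIndex]*LastPerformance) LastPerformanceResponse {
	resp := LastPerformanceResponse{Performances: make([]LastPerformance, 0, len(cache))}
	for idx, p := range cache {
		if p == nil {
			continue // tracked, but nothing processed yet
		}
		entry := *p // copy, so the cache itself is never mutated
		entry.Index = idx
		resp.Performances = append(resp.Performances, entry)
	}
	sort.Slice(resp.Performances, func(i, j int) bool {
		return resp.Performances[i].Index < resp.Performances[j].Index
	})
	return resp
}
```

No block or state is read here; the cost of a query is linear in the number of tracked validators.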