Performant routing and latency benchmarking for Ethereum RPC and Relay Service Providers

In the paper "Strategic Latency Reduction in Blockchain Peer-to-Peer Networks", the authors observed reductions of about 11 ms in both direct global latency and direct targeted latency: over half the reduction observed for bloXroute for direct latency, and comparable to that of bloXroute for targeted latency. We improve upon their implementation by extending their abstracted network oracle with a proposed reputation circuit for creating an RLN construct, for the purposes of automating and regulating agent access to service endpoints. This reputation circuit is derived from both attestation and sync subnet results. The calculated score is used as a congestion score for network latency, as the calculated attnet/subnet value is a function of connected peer latency.

Additionally, we provide a testing methodology and benchmarking criteria that cover:

End user metrics (LAAP, )
Network capacity, throughput, latency and liveness monitoring

Network Formation.

Ethereum, like most permissionless blockchains, maintains a P2P network among its nodes. Each node is represented by its enode ID, which encodes the node’s IP address and TCP and UDP ports [42]. Nodes in the network learn about each other via a node discovery protocol based on the Kademlia distributed hash table (DHT) protocol [30]. To bootstrap after a quiescent period or upon first joining the network, a node queries either its previous peers or hard-coded bootstrap peers about other nodes in the network. Specifically, it sends a FINDNODE request using its own enode ID as the DHT query seed. The node’s peers respond with the enode IDs and IP addresses of those nodes in their own peer tables that have IDs closest in distance to the query ID. Nodes use these responses to populate their local peer tables and identify new potential peers.
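The "closest in distance" selection can be sketched as follows. This is a minimal illustration of the Kademlia XOR metric, not the discovery protocol's actual implementation: the 4-byte `NodeID` type and the `closest` helper are hypothetical stand-ins, and real node IDs are much longer.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// NodeID is a simplified 4-byte stand-in for a real node identifier (assumption).
type NodeID [4]byte

// xorDistance returns the bytewise XOR of two IDs, the Kademlia distance metric.
func xorDistance(a, b NodeID) NodeID {
	var d NodeID
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// closest returns up to k entries of table, ordered by XOR distance to target.
func closest(table []NodeID, target NodeID, k int) []NodeID {
	out := append([]NodeID(nil), table...) // copy so the caller's table is untouched
	sort.Slice(out, func(i, j int) bool {
		di, dj := xorDistance(out[i], target), xorDistance(out[j], target)
		return bytes.Compare(di[:], dj[:]) < 0
	})
	if k < len(out) {
		out = out[:k]
	}
	return out
}

func main() {
	table := []NodeID{{0xF0, 0, 0, 0}, {0x01, 0, 0, 0}, {0x03, 0, 0, 0}, {0x80, 0, 0, 0}}
	self := NodeID{0, 0, 0, 1}
	// A peer answering FINDNODE would return something like this slice.
	for _, id := range closest(table, self, 2) {
		fmt.Printf("%x\n", id)
	}
}
```

A node that receives such a response would merge the returned IDs into its own peer table and repeat the query against the newly learned peers.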

Peering quality

We assume there exists an oracle in the network serving as an abstraction of a distributed database of network IDs. Any node may query the oracle, which responds by independently drawing the network ID of a node in the network from some (unknown) probability distribution, where each node may be drawn with a non-zero probability. Specifically, there exists a universal constant $q > 0$ such that the probability of drawing any given node is bounded below by $q$. This is based on the assumption that the number of nodes is upper bounded. We assume the oracle can process a fixed number of queries per unit time, so each query (across all nodes) takes constant time. Upon processing a query, the oracle responds with the network ID of an existing node. This is an abstraction of peer discovery in many blockchains [2].
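To make the oracle assumption concrete, here is a small sketch of an oracle whose distribution assigns every node a strictly positive probability. The `Oracle` type, its weights, and the inverse-CDF `Query` method are illustrative assumptions; the text above only requires that some lower bound $q > 0$ exists.

```go
package main

import (
	"errors"
	"fmt"
)

// Oracle draws node IDs from a fixed, strictly positive distribution (hypothetical type).
type Oracle struct {
	ids   []string
	probs []float64 // probs[i] is the probability of drawing ids[i]
}

// NewOracle normalises weights into probabilities and rejects any
// non-positive weight, so that every node can be drawn.
func NewOracle(ids []string, weights []float64) (*Oracle, error) {
	if len(ids) == 0 || len(ids) != len(weights) {
		return nil, errors.New("ids and weights must be non-empty and of equal length")
	}
	total := 0.0
	for _, w := range weights {
		if w <= 0 {
			return nil, errors.New("every node needs a positive weight")
		}
		total += w
	}
	probs := make([]float64, len(weights))
	for i, w := range weights {
		probs[i] = w / total
	}
	return &Oracle{ids: ids, probs: probs}, nil
}

// LowerBound returns the largest q such that every node is drawn with probability >= q.
func (o *Oracle) LowerBound() float64 {
	q := o.probs[0]
	for _, p := range o.probs[1:] {
		if p < q {
			q = p
		}
	}
	return q
}

// Query maps a uniform sample u in [0, 1) to a node ID by inverse-CDF lookup.
func (o *Oracle) Query(u float64) string {
	acc := 0.0
	for i, p := range o.probs {
		acc += p
		if u < acc {
			return o.ids[i]
		}
	}
	return o.ids[len(o.ids)-1] // guards against floating-point round-off
}

func main() {
	o, err := NewOracle([]string{"a", "b", "c"}, []float64{1, 1, 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(o.LowerBound(), o.Query(0.6))
}
```

Passing the uniform sample in as an argument keeps the sketch deterministic; a real caller would supply a fresh random value per query.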

Taking this assumption and improving upon the paper's network ID for nodes, we can use in-protocol values not only to generate a unique node identifier but also to codify a score for the node based on its SSZ attributes.

Recall that in proof-of-stake, we rely on the stake “weight” of the validators. The fork-choice algorithm currently used in Ethereum is called LMD-GHOST, and it leverages the messages validators use to come to consensus.


Constructing the RLN Circuit

Each of these messages, called an attestation, carries the validator’s current view of the chain and by tallying the amount of stake attesting to a given chain we can infer what the majority of the consensus set thinks is the “canonical” chain, just like in proof-of-work.

Using these data types we can construct the value of the attestation attribute, or attnets attribute: a bitvector showing which subnets the node is currently subscribed to^[1].

Distribution of numbers of subscribed nodes to attnets

attnets is a Bitvector representing the node's persistent attestation subnet subscriptions.

Attestation subnet bitfield
The ENR attnets entry signifies the attestation subnet bitfield with the following form to more easily discover peers participating in particular attestation gossip subnets.

| Key | Value |
| --- | --- |
| attnets | SSZ Bitvector[ATTESTATION_SUBNET_COUNT] |

If a node's MetaData.attnets has any non-zero bit, the ENR MUST include the attnets entry with the same value as MetaData.attnets.

If a node's MetaData.attnets is composed of all zeros, the ENR MAY optionally include the attnets entry or leave it out entirely.
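As an illustration of how the bitfield can be read, the sketch below decodes a serialized attnets value into subnet indices and applies the MUST/MAY rule above. It assumes ATTESTATION_SUBNET_COUNT is 64 (the mainnet value, not stated in this note) and the SSZ bit order in which bit i lives in byte i/8 at position i%8; the `Attnets` type and method names are hypothetical.

```go
package main

import "fmt"

// attestationSubnetCount mirrors ATTESTATION_SUBNET_COUNT; 64 is assumed here
// (the mainnet value), it is not stated in the text above.
const attestationSubnetCount = 64

// Attnets holds the serialized bitvector: bit i is stored in byte i/8 at
// bit position i%8, counting from the least significant bit.
type Attnets [attestationSubnetCount / 8]byte

// Subscribed returns the indices of the subnets whose bit is set.
func (a Attnets) Subscribed() []int {
	var subnets []int
	for i := 0; i < attestationSubnetCount; i++ {
		if (a[i/8]>>uint(i%8))&1 == 1 {
			subnets = append(subnets, i)
		}
	}
	return subnets
}

// MustAdvertise reports whether the ENR is required to carry the attnets
// entry, i.e. whether any bit is non-zero.
func (a Attnets) MustAdvertise() bool {
	return len(a.Subscribed()) > 0
}

func main() {
	a := Attnets{0x03, 0, 0, 0, 0, 0, 0, 0x80}
	fmt.Println(a.Subscribed(), a.MustAdvertise())
}
```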

The attnetDB records a linked list of results

Production usage

The RLN enforces use of the ENR according to the subscribed attestation subnets recorded in the attnetDB.

Congestion detection is based on queuing latency rather than queue length. We can also use the derivative (rate of change) of the queuing latency to help determine congestion levels and an appropriate response. Queuing latency is derived from the connection churn (turnover rate) of connected peers.
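A minimal sketch of latency-based congestion detection is shown below. It classifies a series of queuing-latency samples using both the latest value and its rate of change. The `Classify` function, the three levels, and both thresholds are assumptions made for illustration; how queuing latency is derived from peer churn is not modelled here.

```go
package main

import "fmt"

// Level is a coarse congestion level (hypothetical classification).
type Level int

const (
	Normal Level = iota
	Rising
	Congested
)

// Classify inspects queuing-latency samples (in ms, taken every intervalSec
// seconds). It reports Congested when the latest latency exceeds maxLatencyMs,
// Rising when the latency is below that limit but growing faster than
// maxSlopeMsPerSec, and Normal otherwise.
func Classify(samplesMs []float64, intervalSec, maxLatencyMs, maxSlopeMsPerSec float64) Level {
	if len(samplesMs) == 0 {
		return Normal
	}
	latest := samplesMs[len(samplesMs)-1]
	if latest > maxLatencyMs {
		return Congested
	}
	if len(samplesMs) >= 2 && intervalSec > 0 {
		// Finite-difference estimate of the derivative of queuing latency.
		slope := (latest - samplesMs[len(samplesMs)-2]) / intervalSec
		if slope > maxSlopeMsPerSec {
			return Rising
		}
	}
	return Normal
}

func main() {
	samples := []float64{10, 12, 35}
	fmt.Println(Classify(samples, 1, 100, 5) == Rising)
}
```

Using the slope as an early signal lets a node react before the absolute latency limit is reached, which is the point of preferring latency over queue length.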

# Appendix A: Consensus layer networking latencies

The following is a short overview of network latencies, relevant to this research.

Network wide parameters

Validator roles per slot: attesting, aggregating, sync committee aggregating and block proposing. In each slot a validator can attest (and also be part of an aggregation committee and a sync committee) to a beacon block head (LMD GHOST vote) and the epoch checkpoint block (FFG vote), and/or propose a block. On rare occasions the block-proposing validator will also be assigned as an attester for the same slot (the probability is 1/32).
Maximum committees per slot: 64
Maximum number of validators per committee: 2048
Minimum (target) number of validators per committee: 128
Number of block proposers per slot: 1
Number of aggregators per committee: 16
Number of sync committee validators: 512
Number of sync sub-committees: 4
Number of aggregators per sync sub-committee: 16
Sync committee duration: 256 epochs (~27 hours)
Slot time: 12 seconds
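
The durations in this list follow from simple arithmetic, sketched below. The value of 32 slots per epoch is an assumption taken from mainnet (it is not listed above); the other constants come from the list.

```go
package main

import (
	"fmt"
	"time"
)

const (
	slotTime                  = 12 * time.Second
	slotsPerEpoch             = 32 // assumption: mainnet value, not listed above
	syncCommitteeEpochs       = 256
	maxCommitteesPerSlot      = 64
	maxValidatorsPerCommittee = 2048
	// Upper bound on attesting validators in a single slot.
	maxAttestersPerSlot = maxCommitteesPerSlot * maxValidatorsPerCommittee
)

// epochDuration is the wall-clock length of one epoch.
func epochDuration() time.Duration {
	return slotsPerEpoch * slotTime
}

// syncCommitteeDuration is the wall-clock length of one sync committee period.
func syncCommitteeDuration() time.Duration {
	return syncCommitteeEpochs * epochDuration()
}

func main() {
	fmt.Println(epochDuration(), syncCommitteeDuration(), maxAttestersPerSlot)
}
```

With these values one epoch lasts 6 minutes 24 seconds and a sync committee period lasts a little over 27 hours, matching the approximate figure given above.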

Latencies overview

Note: Only the rewards and penalties that have impact on this research are mentioned. In ETH2 there are rewards for different actions depending on the validator duties. There are also various penalties for misbehaviour.

Rewards in ETH2 are defined differently for different actions. One of the actions that validators are rewarded for is having their consensus messages included as soon as they have been created and propagated. This reward is called delay_reward, and the metric is called inclusion_delay. A rational goal for validators is to maximise their earnings (without breaking the consensus rules), and this includes maximising the delay_reward. Validators maximise their delay_reward if their attestations are included in blocks as soon as possible, and if they propose blocks at the slots for which they are selected as block proposers.

The attestation latency (inclusion_delay) can be expressed as: the number of slots after which the original consensus message is included.

Before altair

Attestation latency

Each attestation votes for the source, target and head of the beacon chain. A validator is only rewarded for an attestation if it is included in a block within a given inclusion_delay of blocks, if it is valid according to the consensus rules, and if it is aligned with the majority of the attestations at that slot (its values for source, target and head are the same as those in the rest of the attestations). Before Altair, the delay_reward was related to the whole attestation, not to its individual parts.

The minimum inclusion_delay for attestations is 1 block. When attesting to block n, the optimal case is that the attestation is included in block n+1. The reward that validators get for having their attestation included as soon as possible (delay_reward) can be expressed as: delay_reward = delay_reward_quantity * 1/inclusion_delay. In other words, validators get the full attestation delay reward if their attestation at slot n is included in slot n+1. Otherwise they get 1/inclusion_delay of the delay_reward_quantity, so for an inclusion_delay of 2 blocks they get 1/2 of the attestation delay reward. If an attestation is not included within 32 blocks, the validators do not get any reward.
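
The pre-Altair formula above can be written as a small function. This is a sketch only: the function name and the use of integer division are assumptions, and `quantity` stands for delay_reward_quantity in whatever unit the caller uses.

```go
package main

import "fmt"

// maxInclusionDelay is the last delay (in blocks) that still earns a reward.
const maxInclusionDelay = 32

// DelayReward returns delay_reward_quantity * 1/inclusion_delay, using
// integer division, and zero when the delay is outside [1, 32].
func DelayReward(quantity, inclusionDelay uint64) uint64 {
	if inclusionDelay < 1 || inclusionDelay > maxInclusionDelay {
		return 0
	}
	return quantity / inclusionDelay
}

func main() {
	for _, d := range []uint64{1, 2, 4, 32, 33} {
		fmt.Printf("delay=%d reward=%d\n", d, DelayReward(1000, d))
	}
}
```

The reward halves when the delay goes from 1 to 2 blocks, so the first missed slot is by far the most expensive one.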

Block proposal latency

A block proposer is rewarded for proposing a block for a slot that they are assigned to, and is not rewarded at all if they don’t propose a block for that slot. The block proposer’s reward is proportional to the number of attestations they include in the block (see the post on attestation packing).

In Altair:

In Altair the attestation reward weights, as well as the block proposal reward weights, are changed. For attestations, the votes for the different components of the attestation (source, target and head) are no longer weighted equally, and the delay_reward is related to each component of the attestation rather than to the whole attestation. In particular, validators are rewarded if the head vote is included within 1 slot, the source vote within 5 slots and the target vote within 32 slots.

Latency is much more important, and is treated much more severely, in Altair. Validators will not get any reward if their attestation is late (not included) for more than 5 slots. As a comparison with the total attestation reward before Altair: if the inclusion_delay for an attestation is 2 and the attestation “votes” are correct, the validator would get 90% of the total attestation reward before Altair but only 48% in Altair. Thus in Altair the inclusion_delay has a greater weight on the overall reward, and late attestations are penalised more severely.

Reference post on the impact on the rewards introduced in Altair.

Sync committee latency

The members of a sync committee are rewarded if they sign the block headers at the slots during which they are part of the sync committee. If they don’t produce a signature for a given slot, or their signature is not included in a block proposal, while they are part of the sync committee, then they will not be rewarded. The sync committee participant signature is checked at the next slot (the signature for slot n is verified at slot n+1). If their signature is included they are rewarded; otherwise they are penalised according to the consensus rules, meaning no inclusion delays are allowed.

Network latency does have an impact on the sync committee rewards; in fact, the sync committee rewards are very sensitive to network latency (i.e. the sync committee signature not being included on time). Sync committee members are rewarded for each slot in which they perform their duty correctly (signing block headers) during the whole sync committee period (256 epochs).

Conclusion:

In Altair it is even more important that no additional latencies are introduced that would have negative impact on the inclusion_delay and the overall rewards that the validators would get for performing their duties.

Reference

Datatypes

Active               bool     // if false, nothing special is done
Targeted             bool
Period               uint64   // Period of peer reselection in seconds
ReplaceRatio         float64  // 0~1, ratio of replaced peers in each epoch
DialRatio            int
MinInbound           int
MaxDelayPenalty      uint64   // Maximum delay of a tx recorded at a neighbor in ms
MaxDeliveryTolerance int64

ObservedTxRatio      int

ShowTxDelivery       bool     // Controls whether the console prints all txs

TargetAccountList    []string
NoPeerIPList         []string
NoDropList           []string

 DialRatio = %d, 
 MinInbound = %d"

| Key | Value |
| --- | --- |
| Active | Enable Backbone |
| Targeted | Global latency if false, targeted latency if true (target accounts in TargetAccountList) |
| ObservedTxRatio | (1 / sampling rate) of global latency; must be int |
| ShowTxDelivery | [DEBUG] Print log of relevant transaction arrivals to command line if true |
| Period | Backbone period length in seconds |
| ReplaceRatio | Proportion of peers to be replaced/cycled |
| DialRatio | (1 / percentage of outbound peers); must be int. 1 for Backbone/Hybrid and 3 for Baseline |
| MaxDelayPenalty | Δ in milliseconds |
| TargetAccountList | List of target accounts |
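
As a sketch of how these parameters fit together, the struct below groups a subset of the fields with basic validation and two derived quantities. The `Config` type name, the validation rules, and the rounding choices are assumptions for illustration; they are not taken from the reference implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Config groups a subset of the parameters in the table (hypothetical type name).
type Config struct {
	Active            bool
	Targeted          bool
	Period            uint64  // seconds between peer reselections
	ReplaceRatio      float64 // 0~1, proportion of peers replaced per period
	DialRatio         int     // 1 / fraction of outbound peers
	ObservedTxRatio   int     // 1 / sampling rate for global latency
	TargetAccountList []string
}

// Validate applies sanity checks that are assumptions of this sketch.
func (c Config) Validate() error {
	if !c.Active {
		return nil // nothing special is done when inactive
	}
	switch {
	case c.Period == 0:
		return errors.New("Period must be positive")
	case c.ReplaceRatio < 0 || c.ReplaceRatio > 1:
		return errors.New("ReplaceRatio must be within [0, 1]")
	case c.DialRatio < 1:
		return errors.New("DialRatio must be at least 1")
	case c.Targeted && len(c.TargetAccountList) == 0:
		return errors.New("targeted mode needs at least one target account")
	case !c.Targeted && c.ObservedTxRatio < 1:
		return errors.New("ObservedTxRatio must be at least 1")
	}
	return nil
}

// PeersToReplace returns how many of the connected peers are cycled per period (rounded down).
func (c Config) PeersToReplace(connected int) int {
	return int(c.ReplaceRatio * float64(connected))
}

// MaxOutbound returns the outbound peer budget implied by DialRatio.
func (c Config) MaxOutbound(maxPeers int) int {
	if c.DialRatio < 1 {
		return 0
	}
	return maxPeers / c.DialRatio
}

func main() {
	c := Config{Active: true, Period: 20, ReplaceRatio: 0.25, DialRatio: 3, ObservedTxRatio: 4}
	fmt.Println(c.Validate(), c.PeersToReplace(20), c.MaxOutbound(50))
}
```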

Errata

Peering Log Example

Example: Participation Rate as a Health Metric?

example from sigma prime blog

Another detail that these results show is the limitations of using Participation Rate as a network health metric.

Participation rate, in its simplest form, is just a count of the voting ETH across an epoch and is often used as a metric for network health.

Participation rate (when calculated this way) is only interested in whether attestations were included at all in a given epoch. This means for any given participation rate, a spectrum of scenarios exists between two extremes:

Attestations are included as soon as possible (delay=1)
In this case, the efficiencies of blocks are unaffected by late attestations.

Attestations are included, but at the latest possible time (delay≤32)
In this case, the efficiency of every block has been impacted by late attestations.

In both cases, the participation rate is the same, since all those attestations are still being included. Yet in the second case the health of the network is suffering due to late attestations.
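
The two extremes can be reproduced with a few lines of code. The sketch below assumes every validator carries equal stake, so participation reduces to a simple count; the `Attestation` type and both helper functions are hypothetical.

```go
package main

import "fmt"

// Attestation records whether a validator's vote was included in the epoch
// and, if so, after how many slots (hypothetical type).
type Attestation struct {
	Included bool
	Delay    int
}

// ParticipationRate is the fraction of attestations that were included at all.
// Equal stake per validator is assumed.
func ParticipationRate(atts []Attestation) float64 {
	if len(atts) == 0 {
		return 0
	}
	included := 0
	for _, a := range atts {
		if a.Included {
			included++
		}
	}
	return float64(included) / float64(len(atts))
}

// MeanInclusionDelay averages the delay over the included attestations only.
func MeanInclusionDelay(atts []Attestation) float64 {
	sum, n := 0, 0
	for _, a := range atts {
		if a.Included {
			sum += a.Delay
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return float64(sum) / float64(n)
}

func main() {
	fast := []Attestation{{true, 1}, {true, 1}, {true, 1}, {false, 0}}
	slow := []Attestation{{true, 32}, {true, 32}, {true, 32}, {false, 0}}
	fmt.Println(ParticipationRate(fast), MeanInclusionDelay(fast))
	fmt.Println(ParticipationRate(slow), MeanInclusionDelay(slow))
}
```

Both sets report the same participation rate while their mean inclusion delays differ by a factor of 32, which is why participation rate alone cannot distinguish the two scenarios.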

Citations and Bibtex

@misc{https://doi.org/10.48550/arxiv.2205.06837,
doi = {10.48550/ARXIV.2205.06837},
url = {https://arxiv.org/abs/2205.06837},
author = {Tang, Weizhao and Kiffer, Lucianna and Fanti, Giulia and Juels, Ari},
keywords = {Cryptography and Security (cs.CR), Computer Science and Game Theory (cs.GT), Networking and Internet Architecture (cs.NI), FOS: Computer and information sciences},
title = {Strategic Latency Reduction in Blockchain Peer-to-Peer Networks},
publisher = {arXiv},
year = {2022}
}

[2]: Wenbo Wang, Dinh Thai Hoang, Peizhao Hu, Zehui Xiong, Dusit Niyato, Ping Wang, Yonggang Wen, and Dong In Kim. 2019. A survey on consensus mechanisms and mining strategy management in blockchain networks. IEEE Access 7 (2019), 22328–22370.

TODO:
cite Stokes, Alex
cite EIP relevants

[1]: See https://github.com/ethereum/consensus-specs/blob/dev/ssz/simple-serialize.md#bitvectorn