
Pending attestation queue design notes

Currently, when Prysm receives an attestation for which it hasn't seen the target block, the attestation is put in a pending attestation queue. This queue is emptied twice per slot, at even intervals.

The purpose of the queue handler is two-fold:

  1. when called to process the pending attestations, it calls to request the blocks that we haven't seen and
  2. it validates the attestations for blocks that we have seen.

In the current design we have

  • For most blocks, we need neither to process pending attestations nor to request pending blocks, as the queue is empty.
   [Figure: "Slot timeline usual conditions". Slots: Slot N, Slot N+1. Tasks in order: Blk, Att. Time axis: 00'' to 20''.]
  • For some blocks (especially in slot zero) we process attestations at 6 seconds into the slot.
   [Figure: "Slot timeline some times slot 0". Slots: Slot 32*N, Slot 32*N+1. Tasks in order: Att, Blk, Pr Att. Time axis: 00'' to 20''.]
  • For even fewer blocks we request the block at 6 seconds, and then we process the attestation in the next slot at second 0.
   [Figure: "Slot timeline exceptional slot 0". Slots: Slot 32*N, Slot 32*N+1. Tasks in order: Att, Req., Blk in Slot 32*N, then Pr Att in Slot 32*N+1. Time axis: 00'' to 20''.]
  • In case the network is highly forked, the above pattern will happen every slot. But all attestations will be processed and all blocks will be requested at the same times, at 0" and 6" into the slot.

We want to move from a fixed-time system to an on-demand system, where we clear the pending attestations as soon as we process a block. This suggests that we need to separate the two functions of the queue, as we may want to request blocks at different times than when we process the attestations.

A reasonable design proposal

I propose splitting the functionality so that it is completely on-demand, that is:

  1. As soon as we see an attestation for which we don't have a block (and we haven't requested it already), we save it and request the block.
  2. As soon as we process a block we process all attestations that reference that block.

This can be easily achieved in the following steps

i. get rid of the processPendingAtts function entirely
ii. Modify savePendingAtt so that, in addition to saving the pending attestation in the queue, it requests the block and records it in a list of requested blocks.

There are only two places where this function is called, both during validation: in validateCommitteeIndexBeaconAttestation and in validateBlockInAttestation. Thus, whenever we validate an attestation, this adds a small overhead for checking whether the block has already been requested, and a bigger overhead for actually making the request.

The check to see if the block has already been requested is as simple as checking whether the root appears in blkRootToPendingAtts: if we have a root in that map, we can safely assume that we have already requested the block.
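Step ii and the map check can be sketched as follows. This is a simplified, self-contained model rather than Prysm's code: the `Attestation` and `Service` types are stand-ins, and `requestBlock` is a hypothetical hook for whatever actually fetches a block by root from peers.

```go
package main

import "sync"

// Attestation is a stand-in for the real attestation type; only the
// beacon block root matters for this sketch.
type Attestation struct {
	BeaconBlockRoot [32]byte
}

// Service is a simplified stand-in for the sync service. requestBlock is
// a hypothetical hook, not an existing Prysm method.
type Service struct {
	mu                   sync.Mutex
	blkRootToPendingAtts map[[32]byte][]*Attestation
	requestBlock         func(root [32]byte)
}

// savePendingAtt queues att and requests its block only the first time
// the root is seen. A root already present in the map means the block
// was requested when its first pending attestation arrived.
func (s *Service) savePendingAtt(att *Attestation) {
	root := att.BeaconBlockRoot

	s.mu.Lock()
	_, alreadyRequested := s.blkRootToPendingAtts[root]
	s.blkRootToPendingAtts[root] = append(s.blkRootToPendingAtts[root], att)
	s.mu.Unlock()

	// Make the (comparatively expensive) request outside the lock.
	if !alreadyRequested {
		s.requestBlock(root)
	}
}
```

Using the map itself as the "already requested" marker avoids a second data structure, at the cost of coupling the two meanings: the entry must be cleared when the block arrives or the attestations are dropped.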

iii. Write a function processPendingAttsForBlock that takes a block root bRoot, processes all the attestations in blkRootToPendingAtts[bRoot], and then clears this entry in the map. Thus the signature of this function would be

func (s *Service) processPendingAttsForBlock(ctx context.Context, bRoot [32]byte) error { ... }

iv. Call this function in the subscriber after receiving every block, so the call sits right after ReceiveBlock and looks like

if err := s.cfg.Chain.ReceiveBlock(ctx, signed, root); err != nil {
	interop.WriteBlockToDisk(signed, true /*failed*/)
	s.setBadBlock(ctx, root)
	return err
}
// The block is now known, so process the attestations that were waiting on it.
if err := s.processPendingAttsForBlock(s.ctx, root); err != nil {
	log.WithError(err).Debug("Could not process pending attestations")
	return err
}

Resource considerations

The approach above affects resource consumption in a couple of different ways.

  • For most blocks we need neither to process pending attestations nor to request pending blocks, just as now.
   [Figure: "Slot timeline usual conditions". Slots: Slot N, Slot N+1. Tasks in order: Blk, Att. Time axis: 00'' to 20''.]
  • For some blocks (especially in slot zero) we will make a request when we see the attestation, and then process the attestation as soon as we see the block. Notice that in this pattern, even in the happy case, we make an extra request compared with the current design, since in the current design we often see the block before we hit the 6-second mark.
   [Figure: "Slot timeline some times slot 0". Slots: Slot 32*N, Slot 32*N+1. Tasks in order: Att, Req, Blk, Pr Att. Time axis: 00'' to 20''.]
  • For even fewer blocks, we will make the request as above and then process the attestations later than 6 seconds into the slot, when the block arrives. This has the same resource consumption as the analogous case in the current design, with the big added improvement that, even in this exceptional scenario, we process the attestation before the slot ends and hopefully before aggregation:
   [Figure: "Slot timeline exceptional slot 0". Slots: Slot 32*N, Slot 32*N+1. Tasks in order: Att, Req, Blk, Pr Att, all within Slot 32*N. Time axis: 00'' to 20''.]

The expensive tasks are signature verifications, for both attestations and blocks. So in principle the possible extra requests from the second scenario above are not so bad.

A hybrid model

A hybrid model would be to add this extra path:

When receiving an attestation for a block we haven't seen:

  • If we are less than 6 seconds into the slot, save the attestation but do not request the block.
  • Request all blocks with pending attestations at 6 seconds into the slot, as is currently done.
  • If we are more than 6 seconds into the slot, request the block immediately.

This avoids the extra request in the non-exceptional slot 0 scenario.

A more radical approach

Do not request any blocks: just save the attestations and never request the corresponding blocks. If we haven't seen the block, we will soon drop the attestation. I favor this approach as it has several performance advantages.

Extra considerations:

  • The queue should have a fixed length, since we can be DDoSed if we keep receiving attestations for blocks we haven't seen and keep requesting them.
  • It would be better to have a load balancer as Lighthouse has: they start processing the attestation immediately after the block is synced, but they postpone if they are under stress.
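The fixed-length requirement could look roughly like the sketch below. The `pendingQueue` type and its methods are hypothetical, the policy of rejecting new attestations when full (rather than evicting old ones) is an assumption, and the type is not safe for concurrent use without an external lock.

```go
package main

// Attestation is a stand-in for the real attestation type.
type Attestation struct {
	BeaconBlockRoot [32]byte
}

// pendingQueue is a hypothetical bounded queue of pending attestations,
// grouped by the root of the block they are waiting for.
type pendingQueue struct {
	limit  int // maximum number of attestations held across all roots
	size   int // current number of attestations held
	byRoot map[[32]byte][]*Attestation
}

func newPendingQueue(limit int) *pendingQueue {
	return &pendingQueue{limit: limit, byRoot: make(map[[32]byte][]*Attestation)}
}

// add stores att and reports true, or reports false and drops att when
// the queue is already at its limit.
func (q *pendingQueue) add(att *Attestation) bool {
	if q.size >= q.limit {
		return false
	}
	q.byRoot[att.BeaconBlockRoot] = append(q.byRoot[att.BeaconBlockRoot], att)
	q.size++
	return true
}

// take removes and returns all attestations waiting on root, freeing
// their slots in the queue.
func (q *pendingQueue) take(root [32]byte) []*Attestation {
	atts := q.byRoot[root]
	delete(q.byRoot, root)
	q.size -= len(atts)
	return atts
}
```

With a cap in place, a flood of attestations for unknown roots can trigger at most `limit` distinct block requests before further attestations are dropped.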