
A Prysm proposal to cache next slot state

Credit to Paul Hauner for the original idea.

Goals

  • Describe the current limitations
  • Describe the proposed solution
  • Describe the implementation plan

Limitation

In the current spec, a validator processes slots (and epochs, at epoch boundaries) before it processes a block. As defined in the spec:

    def state_transition(state: BeaconState, signed_block: SignedBeaconBlock, validate_result: bool=True) -> None:
        block = signed_block.message
        # Process slots (including those with no blocks) since block
        process_slots(state, block.slot)
        # Verify signature
        if validate_result:
            assert verify_block_signature(state, signed_block)
        # Process block
        process_block(state, block)
        # Verify state root
        if validate_result:
            assert block.state_root == hash_tree_root(state)

In the Prysm implementation, state_transition is triggered by a new incoming block. That means a validator processes slots only when there is a new block, and only up to that block's slot number.

An alternative implementation is to process slots separately from blocks. The validator first calculates a "base state" for a slot regardless of whether it has the block or not, then uses this base state to compute an epoch cache, and then, if there is a block, applies the block. We decided against this design early on due to potential complexity down the line.

One major limitation of the former (block-triggered) approach is the time lost before slots can be processed when a block arrives late. This becomes more apparent as the chain grows and more validators join. Blocks have frequently been observed arriving 2 seconds or more into the slot. Instead of processing slots at the beginning of the slot, the validator has to wait those 2 seconds. The limitation is magnified during the first slot of an epoch, because epoch processing time is no longer negligible. We have seen reports that votes for an incorrect head are becoming more frequent during the first slot of the epoch.

Proposed solution

Here we propose “trailing edge” slot processing (i.e. precomputing the base state for the next block). We run the mandatory process_slots call after we have verified a block, rather than before it. The resulting state is cached and used before processing the next block. Keep in mind that a cache miss is expected whenever the next block is not built on the previous block, but that is acceptable: we still get a decent improvement in the common case. We can add metrics to monitor the hit/miss ratio and choose the cache size based on them.
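As a rough illustration of the hit/miss monitoring mentioned above, here is a minimal sketch in Go. The cacheStats type and its methods are hypothetical names made up for this note; they are not part of Prysm, and a real implementation would more likely export these counters through the client's existing metrics system.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheStats is a hypothetical hit/miss tracker for the next slot cache.
// It is an illustration only and not part of Prysm.
type cacheStats struct {
	hits   uint64
	misses uint64
}

// record counts one cache lookup as either a hit or a miss.
func (s *cacheStats) record(hit bool) {
	if hit {
		atomic.AddUint64(&s.hits, 1)
		return
	}
	atomic.AddUint64(&s.misses, 1)
}

// hitRatio returns hits / (hits + misses), or 0 when nothing was recorded.
func (s *cacheStats) hitRatio() float64 {
	h := atomic.LoadUint64(&s.hits)
	m := atomic.LoadUint64(&s.misses)
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

func main() {
	stats := &cacheStats{}
	// Three blocks built on the cached parent, one built on a different parent.
	for _, hit := range []bool{true, true, false, true} {
		stats.record(hit)
	}
	fmt.Printf("hit ratio: %.2f\n", stats.hitRatio()) // prints "hit ratio: 0.75"
}
```

A consistently low ratio would suggest that blocks are often built on a parent other than the cached one, which is the case where a larger cache might help.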

Design detail

For simplicity, we initialize an object that contains one state and one root.

    type nextSlotCache struct {
        sync.RWMutex
        root  []byte
        state *state.BeaconState
    }

To retrieve the state from the object, we check whether the root matches (nsc is the cache instance):

    if !bytes.Equal(root, nsc.root) {
        return nil, nil
    }
    // Important to return a copy.
    return nsc.state.Copy(), nil

To update the state in the object, we advance a copy of the state by one slot and store it together with the root:

    copied := state.Copy()
    copied, err := ProcessSlots(ctx, copied, copied.Slot()+1)
    if err != nil {
        return err
    }
    nextSlotCache.root = root
    nextSlotCache.state = copied
    return nil
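The retrieve and update steps above can be put together in a self-contained sketch. This is an illustration under stated assumptions, not the Prysm code: toyState and processSlots are hypothetical stand-ins for BeaconState and ProcessSlots that track only a slot number, and the get and update method names are invented for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// toyState is a hypothetical stand-in for Prysm's BeaconState.
// It tracks only a slot number.
type toyState struct {
	slot uint64
}

// Copy returns an independent copy so callers cannot mutate the cached value.
func (s *toyState) Copy() *toyState {
	c := *s
	return &c
}

// processSlots is a hypothetical stand-in for ProcessSlots.
// It only moves the slot number forward.
func processSlots(s *toyState, target uint64) (*toyState, error) {
	if target < s.slot {
		return nil, fmt.Errorf("target slot %d is before state slot %d", target, s.slot)
	}
	s.slot = target
	return s, nil
}

// nextSlotCache holds one state, advanced by one slot, keyed by a block root.
type nextSlotCache struct {
	sync.RWMutex
	root  []byte
	state *toyState
}

// get returns a copy of the cached state on a root match, or nil on a miss.
func (c *nextSlotCache) get(root []byte) *toyState {
	c.RLock()
	defer c.RUnlock()
	if c.state == nil || !bytes.Equal(root, c.root) {
		return nil
	}
	return c.state.Copy()
}

// update advances a copy of the post state by one slot and stores it under root.
func (c *nextSlotCache) update(root []byte, post *toyState) error {
	advanced, err := processSlots(post.Copy(), post.slot+1)
	if err != nil {
		return err
	}
	c.Lock()
	defer c.Unlock()
	c.root = append([]byte(nil), root...)
	c.state = advanced
	return nil
}

func main() {
	cache := &nextSlotCache{}
	root := []byte{0xaa}
	if err := cache.update(root, &toyState{slot: 10}); err != nil {
		panic(err)
	}
	fmt.Println(cache.get(root).slot)           // prints 11 (hit)
	fmt.Println(cache.get([]byte{0xbb}) == nil) // prints true (miss)
}
```

The slot processing runs before the write lock is taken, so readers are blocked only for the pointer swap. Whether the real implementation makes the same choice is not stated in this note.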

When do we retrieve the state? During the state transition, before process slots is called.
Note that we still call process slots when there are skipped slots, because the cache advances the state by only one slot.

    func ExecuteStateTransitionNoVerifyAnySig(
        ctx context.Context,
        state *stateTrie.BeaconState,
        signed *ethpb.SignedBeaconBlock,
    ) (*bls.SignatureSet, *stateTrie.BeaconState, error) {
        // Check whether the parent state has been advanced by 1 slot in next slot cache.
        tsState, err := GetNextSlotState(ctx, signed.Block.ParentRoot)
        if err != nil {
            return nil, nil, errors.Wrap(err, "could not get next slot state")
        }
        // If the next slot state is not nil (i.e. cache hit),
        // we replace the parent state with the next slot state.
        if tsState != nil {
            state = tsState
        }
        // Since next slot cache only advances state by 1 slot,
        // we check if there are more slots that need to be processed.
        if signed.Block.Slot > state.Slot() {
            state, err = ProcessSlots(ctx, state, signed.Block.Slot)
            if err != nil {
                return nil, nil, errors.Wrap(err, "could not process slots")
            }
        }
        // (remainder of the function not shown)
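To make the skipped-slot case concrete, the following sketch counts how many slots still have to be processed after a block arrives, with and without a cache hit. It is a toy model under stated assumptions: toyState, advance and preState are hypothetical names for this example, and advance only moves a slot counter instead of running real slot or epoch processing.

```go
package main

import "fmt"

// toyState is a hypothetical stand-in for the beacon state.
// It tracks only a slot number.
type toyState struct {
	slot uint64
}

// advance is a hypothetical stand-in for ProcessSlots. It moves the state
// to target and reports how many slots it had to process.
func advance(s *toyState, target uint64) (int, error) {
	if target < s.slot {
		return 0, fmt.Errorf("target slot %d is before state slot %d", target, s.slot)
	}
	n := int(target - s.slot)
	s.slot = target
	return n, nil
}

// preState picks the cached next-slot state on a hit (cached != nil) and the
// parent state on a miss, then processes any slots that are still missing.
func preState(parent, cached *toyState, blockSlot uint64) (*toyState, int, error) {
	s := parent
	if cached != nil {
		s = cached
	}
	if blockSlot <= s.slot {
		return s, 0, nil
	}
	n, err := advance(s, blockSlot)
	if err != nil {
		return nil, 0, err
	}
	return s, n, nil
}

func main() {
	cases := []struct {
		name      string
		cached    *toyState
		blockSlot uint64
	}{
		{"hit, no skipped slot", &toyState{slot: 11}, 11},
		{"hit, two skipped slots", &toyState{slot: 11}, 13},
		{"miss, two skipped slots", nil, 13},
	}
	for _, c := range cases {
		// The parent block was processed at slot 10 in every case.
		_, n, err := preState(&toyState{slot: 10}, c.cached, c.blockSlot)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d slot(s) processed after the block arrived\n", c.name, n)
	}
}
```

In this model the three cases process 0, 2 and 3 slots respectively after the block arrives. A hit always removes exactly one slot of work from the critical path, which matters most when that slot is the first slot of an epoch.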

When do we update the state? After a block is processed.

    set, postState, err := state.ExecuteStateTransitionNoVerifyAnySig(ctx, preState, signed)
    valid, err := set.Verify()
    if err := state.UpdateNextSlotCache(ctx, blockRoot[:], postState); err != nil {
        return err
    }

Work-in-progress PR

https://github.com/prysmaticlabs/prysm/pull/8357
