# Blockchain Integration Test Harness

## Background

In Prysm, we have two main kinds of tests: e2e and unit tests. For the most part, unit tests have a very complex setup, especially in critical files such as the state transition, forkchoice, and the sync service. Even these complex examples tend to mock a lot of the blockchain's functionality by populating the database themselves or using mock versions of services for simplicity.

Most recently, we had a runtime bug in one of the caches used during block processing [here](https://github.com/prysmaticlabs/prysm/pull/11567). The resolution was simple, but almost impossible to test with our current testing suite. This triggered a discussion about _why_ it was hard to test this code and whether we could do anything about it. It turns out we don't have test harnesses powerful enough to give us a meaningful test aside from our end-to-end suite, which immediately raised some concerns.

This document designs a new test harness for testing chain functionality in Prysm with minimal mocking. It should help us test block processing behaviors that are otherwise hard to set up outside of e2e.

## Existing blockchain test harnesses

When testing blockchain behavior, we care about the following situations:

1. Running state transition functions and checking the results of applying a block
2. Making assertions about the chain state and chain metadata, such as the head block, validator balances, and consistency with the specification
3. Asserting results of forkchoice in complex fork scenarios

Each package in the Prysm repo has different responsibilities.
Let's see how we accomplish these assertions today in our Go unit tests and, from there, figure out what we're missing.

### Anatomy of Prysm Chain Tests

**State Transition Assertions**

The state transition function is pure, so we typically initialize a genesis state and process a series of pre-formed blocks through a pipeline to make assertions about post-states. This is pretty straightforward, but somewhat repetitive in the codebase:

```go
func TestExecuteStateTransition_FullProcess(t *testing.T) {
	// Most error returns are elided to keep the example short.
	// Set up a deterministic genesis state.
	beaconState, privKeys := util.DeterministicGenesisState(t, 100)

	// Update the state slot and retrieve some metadata.
	beaconState.SetSlot(params.BeaconConfig().SlotsPerEpoch - 1)
	oldMix := beaconState.RandaoMixAtIndex(1)
	beaconState.SetSlot(beaconState.Slot() + 1)
	epoch := time.CurrentEpoch(beaconState)
	randaoReveal, err := util.RandaoReveal(beaconState, epoch, privKeys)
	require.NoError(t, err)
	beaconState.SetSlot(beaconState.Slot() - 1)

	// Advance the chain some empty slots.
	nextSlotState := transition.ProcessSlots(beaconState.Copy(), beaconState.Slot()+1)
	parentRoot := nextSlotState.LatestBlockHeader().HashTreeRoot()
	proposerIdx := helpers.BeaconProposerIndex(nextSlotState)

	// Create a new block to advance the chain with.
	block := util.NewBeaconBlock()
	block.Block.ProposerIndex = proposerIdx
	block.Block.Slot = beaconState.Slot() + 1
	block.Block.ParentRoot = parentRoot[:]
	block.Block.Body.RandaoReveal = randaoReveal

	// Compute state root and signature.
	stateRoot := transition.CalculateStateRoot(beaconState, block)
	block.Block.StateRoot = stateRoot[:]
	sig, err := util.BlockSignature(beaconState, block.Block)
	require.NoError(t, err)
	block.Signature = sig.Marshal()

	// Process block through state transition.
	beaconState = transition.ExecuteStateTransition(beaconState, block)

	// Assert results from the post-state, namely RANDAO data.
	assert.Equal(t, params.BeaconConfig().SlotsPerEpoch, beaconState.Slot(), "Unexpected Slot number")
	mix := beaconState.RandaoMixAtIndex(1)
	assert.DeepNotEqual(t, oldMix, mix, "Did not expect new and old randao mix to equal")
}
```

We perform a lot of manual block building, advance the chain artificially, and then make assertions. Although repetitive, this gives us fine-grained control over the system in a test environment.

**Forkchoice Testing**

For forkchoice testing, we tend to create complex block trees using fake block hashes such as `[]byte{1}`, `[]byte{2}`, `[]byte{3}` for easy assertions. We use the forkchoice store itself to do this, then verify values such as weights and the current head. We can also add votes to the store's computation. Here's what this looks like:

```go
func TestFFGUpdates_TwoBranches(t *testing.T) {
	balances := []uint64{1, 1}
	f := setup(0, 0)
	ctx := context.Background()

	r, err := f.Head(context.Background(), balances)
	require.NoError(t, err)
	assert.Equal(t, params.BeaconConfig().ZeroHash, r, "Incorrect head with genesis")

	// Define the following tree:
	//                                0
	//                               / \
	// justified: 0, finalized: 0 -> 1   2 <- justified: 0, finalized: 0
	//                               |   |
	// justified: 1, finalized: 0 -> 3   4 <- justified: 0, finalized: 0
	//                               |   |
	// justified: 1, finalized: 0 -> 5   6 <- justified: 0, finalized: 0
	//                               |   |
	// justified: 1, finalized: 0 -> 7   8 <- justified: 1, finalized: 0
	//                               |   |
	// justified: 2, finalized: 0 -> 9  10 <- justified: 2, finalized: 0

	// Left branch.
	state, blkRoot, err := prepareForkchoiceState(context.Background(), 1, indexToHash(1), params.BeaconConfig().ZeroHash, params.BeaconConfig().ZeroHash, 0, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 2, indexToHash(3), indexToHash(1), params.BeaconConfig().ZeroHash, 1, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 3, indexToHash(5), indexToHash(3), params.BeaconConfig().ZeroHash, 1, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 4, indexToHash(7), indexToHash(5), params.BeaconConfig().ZeroHash, 1, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 4, indexToHash(9), indexToHash(7), params.BeaconConfig().ZeroHash, 2, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))

	// Right branch.
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 1, indexToHash(2), params.BeaconConfig().ZeroHash, params.BeaconConfig().ZeroHash, 0, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 2, indexToHash(4), indexToHash(2), params.BeaconConfig().ZeroHash, 0, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 3, indexToHash(6), indexToHash(4), params.BeaconConfig().ZeroHash, 0, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 4, indexToHash(8), indexToHash(6), params.BeaconConfig().ZeroHash, 1, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))
	state, blkRoot, err = prepareForkchoiceState(context.Background(), 4, indexToHash(10), indexToHash(8), params.BeaconConfig().ZeroHash, 2, 0)
	require.NoError(t, err)
	require.NoError(t, f.InsertNode(ctx, state, blkRoot))

	// With start at 0, the head should be 10:
	//           0  <-- start
	//          / \
	//         1   2
	//         |   |
	//         3   4
	//         |   |
	//         5   6
	//         |   |
	//         7   8
	//         |   |
	//         9  10 <-- head
	r, err = f.Head(context.Background(), balances)
	require.NoError(t, err)
	assert.Equal(t, indexToHash(10), r, "Incorrect head with justified epoch at 0")
}
```

The setup is complex and requires a lot of manual plumbing to create a forked chain, but it gives the developer full control of the system and is self-contained within the forkchoice package.

**Asserting Chain State / Metadata**

When asserting chain state in the blockchain service, we _do_ have the capability to initialize a real blockchain service, a real forkchoice store, and a real database in order to process a pipeline of blocks.

```go
func TestStore_OnBlockBatch_NotifyNewPayload(t *testing.T) {
	ctx := context.Background()

	// Set up a real database.
	beaconDB := testDB.SetupDB(t)

	// Set up a real forkchoice store.
	fc := doublylinkedtree.New()

	// Set up a real blockchain service.
	opts := []Option{
		WithDatabase(beaconDB),
		WithStateGen(stategen.New(beaconDB, fc)),
		WithForkChoiceStore(fc),
	}
	service, err := NewService(ctx, opts...)
	require.NoError(t, err)

	// Initialize genesis data.
	st, keys := util.DeterministicGenesisState(t, 64)
	require.NoError(t, service.saveGenesisData(ctx, st))
	bState := st.Copy()

	// Initialize 4 blocks (slots 1 through 4) and run them through
	// the state transition function.
	var blks []interfaces.SignedBeaconBlock
	var blkRoots [][32]byte
	blkCount := 4
	for i := 1; i <= blkCount; i++ {
		b := util.GenerateFullBlock(bState, keys, util.DefaultBlockGenConfig(), types.Slot(i))
		bState = transition.ExecuteStateTransition(ctx, bState, b)
		root := b.Block.HashTreeRoot()
		service.saveInitSyncBlock(ctx, root, b)
		blks = append(blks, b)
		blkRoots = append(blkRoots, root)
	}
	err = service.onBlockBatch(ctx, blks, blkRoots)
	require.NoError(t, err)
}
```

However, we make few assertions here beyond happy-path checks. We lack coverage in many of these block processing functions because the unhappy paths are too complex to set up, so we rarely assert on the results of block processing and have a hard time building negative tests.

## What our test harnesses are missing

Our current test harnesses do a decent job at **separating concerns**. Namely, forkchoice test setups don't require us to set up a genesis state or real attestations, as forkchoice just operates on an abstract notion of "votes". The state transition tests don't require us to set up a blockchain, because they are pure, and blockchain tests tend to be more of a "black box".

However, our test harnesses fail to capture the _timing_ of events and the synchronous, distributed system nature of a node.
For example, we should be able to:

1. Control a set of validators submitting messages to a node at different intervals in a slot, and control their latency
2. Observe the status of caches and the inner workings of the blockchain service at different points in time

For this, we rely on our end-to-end test suite, which is error-prone, slow to run, and much more of a black box than our unit tests. Can we do better?

Let's think about why the [cache bug](https://github.com/prysmaticlabs/prysm/pull/11567) discovered by Terence was hard to test. The bug relies on skip slots happening in the chain, and on validators submitting messages at different intervals within a slot or at epoch boundaries. It would lead to a cache miss and give the proposer less time to prepare a block: without a cached payload ahead of time, they would have to go through normal processing.

## Brainstorming a better test harness

Thinking out loud, it would be really nice to have a testutil that we can initialize to get granular control over a blockchain with minimal mocking. This is similar to how we call `testutil.SetupDB(t)` and get a real, full-featured BoltDB instance that works without issues.
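
One way to make the timing requirements above deterministic is to replace wall-clock time with a logical clock that the test advances by hand. The following is a minimal sketch, not Prysm code: `SimClock`, `At`, `Tick`, `RunSlots`, and the constant `IntervalsPerSlot` (set to 3 here purely for illustration) are all hypothetical names. It shows how events such as an on-time block, a late block, or a skip slot could be expressed as "slot plus interval" rather than as timestamps.

```go
package main

import "fmt"

// IntervalsPerSlot is an assumed number of logical intervals per slot.
const IntervalsPerSlot = 3

// SimClock is a hypothetical, manually advanced clock. Time is a counter of
// intervals, so no test ever has to sleep for a real slot duration.
type SimClock struct {
	step   uint64              // intervals elapsed since slot 0, interval 0
	events map[uint64][]func() // callbacks keyed by the step they fire at
}

// NewSimClock returns a clock positioned at slot 0, interval 0.
func NewSimClock() *SimClock {
	return &SimClock{events: make(map[uint64][]func())}
}

// Slot returns the current slot.
func (c *SimClock) Slot() uint64 { return c.step / IntervalsPerSlot }

// Interval returns the current interval within the slot.
func (c *SimClock) Interval() uint64 { return c.step % IntervalsPerSlot }

// At schedules fn to run when the clock reaches the given slot and interval.
// Events scheduled for a point the clock has already passed never run.
func (c *SimClock) At(slot, interval uint64, fn func()) {
	key := slot*IntervalsPerSlot + interval
	c.events[key] = append(c.events[key], fn)
}

// Tick runs the events scheduled for the current interval, in the order they
// were registered, then advances the clock by one interval.
func (c *SimClock) Tick() {
	for _, fn := range c.events[c.step] {
		fn()
	}
	delete(c.events, c.step)
	c.step++
}

// RunSlots ticks through n whole slots.
func (c *SimClock) RunSlots(n uint64) {
	for i := uint64(0); i < n*IntervalsPerSlot; i++ {
		c.Tick()
	}
}

func main() {
	clock := NewSimClock()
	var log []string
	record := func(msg string) func() {
		return func() {
			log = append(log, fmt.Sprintf("slot %d interval %d: %s", clock.Slot(), clock.Interval(), msg))
		}
	}

	clock.At(1, 0, record("block proposed on time"))
	clock.At(1, 1, record("attestation received"))
	// Nothing is scheduled in slot 2, so it acts as a skip slot.
	clock.At(3, 2, record("block proposed late"))

	clock.RunSlots(4) // covers slots 0 through 3
	for _, line := range log {
		fmt.Println(line)
	}
}
```

Because events fire in a fixed order at fixed logical points, a test built this way can assert on cache or chain state between ticks without any sleeping. A real harness would still need to decide how such a clock is wired into the services that currently read wall-clock time.
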

Here are some features that would be nice to see in this harness:

- Have a chain running with slots and time advancing in the background
- Ability to feed blocks into this chain at specific times and slot intervals, and to simulate some latency (adding non-determinism is a bad move in unit tests, but we could alleviate this risk by selecting intervals of a slot for submission rather than timestamps)
- Ability to create forked chains with very simple configurations, ideally in a way that is easy to read
- Ability to have fake validator "agents" running in a goroutine and submitting messages to the chain, with easy control over their behavior, latency, and timings
- Ability to run state transitions and advance a chain from blocks
- Ability to trigger skip slots at various intervals or granular points
- Ability to inspect the chain state and metadata and perform assertions at specific times, for example, check caches, check validator balances every N slots, etc.
- Ability to control the execution client attached to the system. This will need to be mocked

A problem with these ideas is that if we really want to tick slots, our tests will take a long time to complete, even if we go down to two seconds per slot. Can we abstract the concept of time and slot intervals? How could we get this to work in a unit test?

## Design Overview

```go
{
	chain := testutil.NewSimulatedChain(t).
		UpToEpoch(5).
		SkipEveryNSlots(2). // Skip every 2nd slot
		NumValidators(64).
		// Add granular control of when blocks are submitted
		// and which proposers submit them late.
		LateBlocks(...).
		// Add the ability to control specific actions of proposers
		// such as what intervals they submit their blocks in, whether
		// or not they should submit slashable offenses, etc.
		ProposerBehaviors(...).
		// Similar to e2e, each slot check X things about the chain,
		// caches, fork choice.
		Evaluators(...).
		Populate()
	defer chain.Cleanup() // Could be done as part of test cleanup
	chain.Start()         // Runs a chain in the background, non-blocking
}

// Evaluator API
func proposerEvaluator(t *testing.T, chain *ChainHarness) {
	proposer := chain.ProposerAt(chain.HeadSlot())
	assert.Equal(t, true, proposer.EffectiveBalance == 32)
}
```

We can take a lot of inspiration from how our e2e suite uses evaluators at different times in the lifecycle of a chain, and create assertions based on this.

Questions: what else should this harness have? How can we really test these timing scenarios effectively in an integration test? How can we find a middle ground between e2e and unit tests? Perhaps these tests can run separately from normal unit tests in CI. Can we set this up with Bazel?

## References

- https://github.com/prysmaticlabs/prysm/pull/11567
