We (TxRx team, ConsenSys) have implemented a Fork Choice compliance test generator and used it to generate Fork Choice compliance test suites.
The overall F/C compliance testing methodology is described here.
In this report we briefly describe the results of the initial implementation phase (i.e. the F/C test generator and F/C test suites). A more detailed description of the work is TBD.
This work was supported by a grant from the Ethereum Foundation.
The initial version of the Fork Choice test generator is implemented and currently available as a draft consensus-specs PR. We have focused on minimizing the effort required for client implementer teams to adopt the generated tests. The only change to the existing FC test format is a small one: the addition of a new check, which is safe to ignore initially.
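As an illustration of what "safe to ignore" can mean for a client test runner, here is a minimal sketch in which checks the runner does not recognise are skipped rather than treated as failures. Everything in it is an assumption made for illustration: the toy `store` dictionary, the handler table, and the placeholder name `new_check` are hypothetical and do not reproduce the actual test format or any client's runner.

```python
from typing import Any, Callable, Dict, List

# Hypothetical table of checks this toy runner understands.
# Each handler compares a value in the (toy) store with the expected value.
KNOWN_CHECKS: Dict[str, Callable[[Dict[str, Any], Any], bool]] = {
    "head": lambda store, expected: store["head"] == expected,
    "time": lambda store, expected: store["time"] == expected,
}


def run_checks(store: Dict[str, Any], checks: Dict[str, Any]) -> List[str]:
    """Run all known checks; return the names of checks that were skipped."""
    skipped: List[str] = []
    for name, expected in checks.items():
        handler = KNOWN_CHECKS.get(name)
        if handler is None:
            # Unknown check: skip it instead of failing the test.
            skipped.append(name)
            continue
        assert handler(store, expected), f"check {name!r} failed"
    return skipped


# Toy usage: "new_check" is a placeholder for a check the runner
# does not implement yet.
store = {"head": "0xabc", "time": 12}
skipped = run_checks(store, {"head": "0xabc", "new_check": [1, 2]})
print("skipped checks:", skipped)
```

Returning the skipped names lets a runner report how much of a test it actually verified, which helps when the new check is implemented later.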
We have developed test generation parameters for three suites at the moment.
Test suite | Size | Purpose | Status | Link |
---|---|---|---|---|
Tiny | 135 tests | Demonstration, smoke testing | Done | link |
Small | 1472 tests | Initial adoption, smoke testing | Done | link |
Standard | 13240 tests | Main testing | Done | link |
Extended | about 100K tests | Extended testing | TBD | |
Note: We are able to generate the Extended test suite. However, generation would take significant time (about a week), so we have postponed it until there is demand.
It should be possible to generate test suites for any fork (Altair, Capella, Deneb) and preset (mainnet or minimal). However, test generation for mainnet is very slow. We have tested minimal/altair and minimal/deneb.
Test generation is currently slow (about 10-15 seconds per test on average). However, a multiprocessing mode is supported (about 2 seconds per test on Apple M1). Generating the Standard test suite takes about 8 hours in multiprocessing mode or about two days in single-process mode.
The reasons for the slow performance are known and will be addressed in the future. Currently, our top priority is to simplify adoption of the new test suites.
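The generation times quoted above follow from the per-test costs. A back-of-the-envelope check, using only the figures from this report (the helper name `estimate_hours` is ours, for illustration):

```python
def estimate_hours(num_tests: int, seconds_per_test: float) -> float:
    """Total generation time in hours for a given per-test cost."""
    return num_tests * seconds_per_test / 3600.0


STANDARD_SUITE_SIZE = 13240  # number of tests in the Standard suite

# Multiprocessing mode: about 2 seconds per test.
multi_hours = estimate_hours(STANDARD_SUITE_SIZE, 2.0)

# Single-process mode: about 10-15 seconds per test.
single_low_days = estimate_hours(STANDARD_SUITE_SIZE, 10.0) / 24.0
single_high_days = estimate_hours(STANDARD_SUITE_SIZE, 15.0) / 24.0

print(f"multiprocessing: {multi_hours:.1f} hours")
print(f"single process:  {single_low_days:.1f} to {single_high_days:.1f} days")
```

The results are roughly consistent with the "about 8 hours" and "two days" figures, given that the per-test costs are averages.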
We have run the generated tests against Teku, using the Teku test runner, and against the official executable Fork Choice spec (minimal/deneb), using a simple Python test runner.
The test generation approach is a mix of model-based and fuzz testing.
Principles:
filter_block_tree)

Tests are generated in four steps:
The models are developed manually.
Solutions to the models are produced with a special generator.
Test instantiation and mutation are performed with test_gen.py.
After the tests are generated, one can validate the produced test steps with the test_run.py script, which executes the steps against the pyspec and performs the prescribed checks.
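The overall shape of this flow (solutions, then instantiation with variations, then mutations) can be illustrated with a toy sketch. It is not the code of test_gen.py: the function names, the representation of a solution as a list of parent indices, and the use of shuffling as the only mutation are all simplifying assumptions made for this example.

```python
import random
from typing import List, Tuple

# A toy test step: (block index, parent index, slot).
Step = Tuple[int, int, int]


def instantiate(parents: List[int], seed: int) -> List[Step]:
    """Turn a toy solution (parent index per block) into concrete steps.

    The seed selects a variation: here it only affects the slot gaps.
    """
    rng = random.Random(seed)
    slots = [0]  # slot of block 0, the root of the tree
    steps: List[Step] = []
    for child, parent in enumerate(parents, start=1):
        slot = slots[parent] + rng.randint(1, 3)
        slots.append(slot)
        steps.append((child, parent, slot))
    return steps


def mutate(steps: List[Step], seed: int) -> List[Step]:
    """Toy mutation: deliver the same steps in a shuffled order."""
    rng = random.Random(seed)
    mutated = list(steps)
    rng.shuffle(mutated)
    return mutated


def generate(solutions: List[List[int]], variations: int, mutations: int) -> List[List[Step]]:
    """Produce one base test plus `mutations` mutated tests per (solution, variation)."""
    tests: List[List[Step]] = []
    for parents in solutions:
        for variation in range(variations):
            base = instantiate(parents, seed=variation)
            tests.append(base)
            for m in range(mutations):
                tests.append(mutate(base, seed=m + 1))
    return tests


# 2 solutions * 2 variations * (1 + 3 mutations) = 16 tests
tests = generate([[0], [0, 0, 1]], variations=2, mutations=3)
print(len(tests))
```

Seeding every random choice keeps generation reproducible, so a failing test can be regenerated from its parameters alone.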
Test group | Size (Standard suite) | Parameters: solutions * variations * (1 + mutations) | Description |
---|---|---|---|
Block tree | 4096 tests | 1024*2*(1+1) | focus on trees of varying shapes |
Block weight | 2048 tests | 8*64*(1+3) | focus on producing block trees with varying weights |
Shuffling | 2048 tests | 8*4*(1+63) | focus on shuffling/mutation operators |
Attester slashing | 1024 tests | 8*16*(1+7) | focus on attester slashing |
Invalid messages | 1024 tests | 8*32*(1+3) | focus on invalid messages |
Block cover | 3000 tests | 60*5*(1+9) | cover various combinations of predicates from the filter_block_tree method |
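The group sizes in the table can be cross-checked against the parameters column. The sketch below assumes that the three factors in each row are solutions, variations and mutations, in that order, with the `1 +` term standing for the unmutated test.

```python
# (solutions, variations, mutations) per test group, copied from the table.
GROUPS = {
    "Block tree": (1024, 2, 1),
    "Block weight": (8, 64, 3),
    "Shuffling": (8, 4, 63),
    "Attester slashing": (8, 16, 7),
    "Invalid messages": (8, 32, 3),
    "Block cover": (60, 5, 9),
}


def group_size(solutions: int, variations: int, mutations: int) -> int:
    """Each (solution, variation) pair yields one base test plus its mutations."""
    return solutions * variations * (1 + mutations)


sizes = {name: group_size(*params) for name, params in GROUPS.items()}
total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size} tests")
print(f"Total: {total} tests")
```

The total equals the 13240 tests listed for the Standard suite.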