# Fork Choice spec compliance testing Grant Application

# Project Abstract

> In 3-5 sentences, what problem are you trying to solve?

Ensuring spec compliance of Fork Choice implementations is a challenging but very important problem. We have identified a rather sophisticated [approach](https://hackmd.io/@ericsson49/fork-choice-implementation-vs-spec-testing) to address it, which requires cooperation between client implementers and R&D teams. As part of the project, we want to answer open questions and implement the core components of the approach that are common and will be used by client implementers: a) a seed corpus of test vectors, b) test mutation operators. Additional components may include: a) a blindly fuzzed extended test corpus, b) a reference implementation of the coverage criterion, and c) a test generator.

# Objectives

> What are you hoping to accomplish with this grant? How do you define and measure success for this project?

The main objective is to establish common ground for client-specific fuzz testing. This consists of a set of components which could, should, or must be used by all client implementers, either to ensure interoperability and common coverage measures or to reduce implementation effort. While there can be many such aspects, we focus on the following areas:

1. common test suites/corpus
   - including the test format, fast test versions of cryptography routines, etc. - to be used either as a seed corpus or to establish common minimal coverage
   - including a test generator and auxiliary artifacts, like the modified spec version used to generate tests, annotations, etc.
2. test vector mutation operators
3. spec coverage criteria/measures
   - detailed procedures to calculate such measures
   - a reference implementation of an adjusted coverage criterion, which takes multiplicative coverage into account, to be used in coverage-guided fuzzers
   - a set of Fork Choice spec features to be covered

Note that implementing the full set of the above aspects is likely beyond the scope of this grant proposal. In particular, we expect to gather new experience and insights that can affect our plans, so we assume the above program will be implemented in an incremental fashion.

# Outcomes

> How does this project benefit the greater Ethereum ecosystem?

- The common seed corpus, together with test vector mutation operators, will allow bootstrapping client-specific fuzz testing of Fork Choice spec implementations.
- A common blindly fuzzed test suite will help ensure that some minimal set of Fork Choice spec features is covered by all implementations.
- A reference implementation of an adjusted coverage measure, together with the spec coverage set, will allow guiding client-specific fuzzers according to our ["multiplicative" coverage criterion](https://hackmd.io/@ericsson49/fork-choice-implementation-vs-spec-testing#Coverage-metriccriterion).
- A test generator and decomposed specs.
- An additional benefit is that the methodology and tools can be partially re-used for other spec compliance testing (e.g. the p2p spec or the confirmation rule).

# Grant Scope

> What are you going to research? What is the expected output?

Open research questions:

- how to calculate coverage measures/metrics exactly
  - statement coverage alone is unlikely to be satisfactory
  - should we use [black box](https://en.wikipedia.org/wiki/Black-box_testing) coverage criteria, e.g. to ensure that diverse kinds of blocks and attestations are tested?
  - should we desugar the Fork Choice spec to obtain finer-grained coverage measures?
  - should we use a path-based or a set-based approach to measure coverage?
- how to efficiently embed such a coverage measure into coverage-guided fuzzers
- BLS and hash procedures are slow and should normally be replaced by faster test routines. Since hash/BLS procedures affect test generation and sorting, we should establish a common standard for the `hash`, `hash_tree_root`, and BLS signature functions
- which test vector format should we use?
  - fuzzing means that the test corpus doesn't need to contain expected results, since a fuzzer constantly mutates test inputs and thus must execute the test oracle repeatedly anyway
- how exactly to decompose the spec in order to aid test generation
- how to apply model checking to generate tests for the Fork Choice spec
  - should we translate the Python code to C?
  - how exactly should the code generation procedure look?
- what are the obstacles to obtaining really good spec coverage?
  - how should they be resolved?

Artifacts:

1. common test suites/corpus
   - including the test format, test hash and cryptography procedures, etc. - to be used either as a seed corpus or to establish common minimal coverage
     **NB** there can be different test suites, e.g. a seed corpus (e.g. a minimal coverage test suite) and an extended test suite obtained by blindly fuzzing this seed corpus
   - including a test generator and auxiliary artifacts, like the modified spec version used to generate tests, annotations, etc.
2. test vector mutation operators
3. spec coverage criteria/measures
   - detailed procedures to calculate such measures
   - a reference implementation of an adjusted coverage criterion, which takes multiplicative coverage into account, to be used in coverage-guided fuzzers
   - a set of Fork Choice spec features to be covered

# Project Team

> How many people are working on this project?
> Please list their names and roles for the project as well as how many hours per month each person will work on this project.

Assuming 22 working days per month and an 8-hour working day:
Alexander Vlasov - principal researcher, 50% load, 88 hours per month
Mikhail Kalinin - principal researcher, 50% load, 88 hours per month

# Methodology

> How do you plan to achieve your research objectives?

The base methodology is described [here](https://hackmd.io/@ericsson49/fork-choice-implementation-vs-spec-testing).

Differential fuzz testing of Fork Choice implementations:

- first, generate minimal acceptable coverage of the Fork Choice spec
  - decompose the spec to aid test generation
  - use model checking to generate test vectors based on the decomposition
  - implement the modifications necessary for the model checker to reach the required coverage of the Fork Choice spec
- second, use the resulting test suite as a seed corpus for client-specific fuzzers
  - we provide smart test mutation operators based on the spec decomposition above
  - the fuzzer's coverage metric should be adjusted to take into account our "multiplicative" coverage criterion
- in addition to the second step, an extended blindly fuzzed suite of test vectors can be provided to establish minimum common Fork Choice spec coverage across all implementations
  - can be generated by various methods
  - can be used as a seed corpus too
- optionally, cross-testing of Fork Choice implementations
  - use test vectors gathered during fuzzing of a particular Fork Choice implementation to test other implementations
  - this can be used to estimate whether we obtained good coverage during prior testing, e.g. if cross-testing discovers only a small number of unseen paths, then we can guess that we have already reached reasonable multiplicative coverage

# Timeline

> Please include a brief explanation of the milestones/roadmap, along with expected deliverables. Also outline how the funds will be used for the research project and/or members of the team.

The total project duration is 6 months. During this period our team of two researchers will work on the following topics:

1. Decompose the Python Fork Choice spec to aid test generation and test vector mutation.
2. Analyze suitable test generation approaches required to reach good spec coverage. In particular, consider CPAchecker, ESBMC, CBMC, or other model checkers/test generators to generate tests for the decomposed Fork Choice spec. This may include manual or mechanical translation of parts of the spec to C.
3. Generate test vectors to achieve reasonably good coverage of the Fork Choice spec assuming an easy criterion - e.g. good coverage of reachable statements. This is a first attempt, to be improved further.
4. Research test corpus format issues, including test BLS and hash procedures. Experiment with Teku.
5. Experiment with Fork Choice spec coverage measures: desugar the spec, use different coverage criteria, use black-box coverage criteria (e.g. [equivalence partitioning](https://en.wikipedia.org/wiki/Equivalence_partitioning)). How do they affect model checking-based test generation?
6. Experiment with coverage-guided fuzzing (e.g. of Teku) in order to figure out which coverage-guidance metric should be used.
7. Design and implement test mutation operators.
8. Final generation of the minimal coverage test suite.
9. Generation of the blindly fuzzed extended test suite.

We roughly estimate that each topic will take on average one calendar month for one researcher working part time (50% load), with an additional reserve of 1.5 man-months for resolving possible issues discovered during research.

# Budget

> Requested grant amount and how this will be used

#### Principal Researchers Costs

2 persons * 6 months * 0.5 load = 6 man-months

month rate = USD 22,000

total costs = 6 * USD 22,000 = USD 132,000