# Pyspecs to Dafny transpiler

A Python to Dafny transpiler intended to process Ethereum-related specifications.

## Project Abstract

The Ethereum-related specifications (like [consensus specs](https://github.com/ethereum/consensus-specs), [execution specs](https://github.com/ethereum/execution-specs), [DV specs](https://github.com/ethereum/distributed-validator-specs)) are not defined formally, while their core parts are expressed in Python. As a consequence, one cannot reason about such specs directly and has to express them in some verification-friendly language. This translation is likely to introduce bugs, which calls into question whether one can really trust the conclusions of such an approach.

We believe that the most practical way to overcome the problem is to a) *define the translation mechanically*, and b) *thoroughly test it*. To reduce testing and implementation efforts, we: a) chose [Dafny](https://dafny.org/) as the target language (since Dafny supports imperative and impure code), b) designed a simple and concise *direct imperative impure* transpiler, c) restricted it to a *subset* of Python.

**NB** Additionally, there is already a number of Ethereum-related F/V projects using Dafny, e.g. [Eth2.0 spec in Dafny](https://github.com/ConsenSys/eth2.0-dafny), [F/V of DVT protocol](https://github.com/ConsenSys/distributed-validator-formal-specs-and-verification), [EVM interpreter in Dafny](https://github.com/ConsenSys/evm-dafny).

*[F/V]: Formal Verification
*[DV]: Distributed Validator
*[DVT]: Distributed Validator Technology

## Objectives

The main objective is to ensure that the generated Dafny code behaves exactly as the original Python code (within the supported subset). While formally proving such a property would be a better guarantee, we assume that it is well beyond our reach, because: a) there is no formal definition of Python and/or a certified Python implementation, b) the efforts to develop them would be enormous. Therefore, one has to rely on testing here.
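As a concrete illustration of why careful testing matters (an illustrative case, not drawn from the specs): even elementary operators can differ between source and target languages. Python's integer division and modulo floor toward negative infinity, while many languages truncate toward zero, so a naive operator-for-operator mapping could silently change behavior on negative operands:

```python
# Python floors integer division toward negative infinity, and the remainder
# takes the sign of the divisor. A language that truncates toward zero would
# give -3 and -1 here instead, so this exact behavior must be pinned by tests.
q, r = -7 // 2, -7 % 2   # q == -4, r == 1

# The division identity that any correct translation must preserve:
for a in (-7, -1, 0, 1, 7):
    for b in (-2, 2):
        assert a == (a // b) * b + a % b
```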
However, we can reduce the testing efforts required to achieve a (very) high quality translation by minimizing the set of properties which must be tested.

We assume the following measures of success for the project:

- the transpiler can handle the subset of Python the specs are written in
- the compatibility test suite validates that the transpiler preserves the Python semantics (within the supported Python subset)
- the transpiler codebase and the test suite are of reasonable size and amenable to audit by human experts (we aim at a few thousand LOC and a few thousand test cases, respectively)

**NB1** Certain Python features might turn out to be too difficult (or even impossible) to implement and/or to test thoroughly within the project time frame. However, we expect that the vast majority of the specs will be supported.

**NB2** Some Python features may have unclear or confusing semantics, so we assume that at times it's better to avoid using such features in specs, instead of trying to model them precisely.

## Outcomes

The main outcome is that one can reason about the Python specs (and other Python programs within the supported subset) using the Dafny infrastructure, relying on a thoroughly tested mechanized translation from Python to Dafny. This unlocks several opportunities:

- the protocol properties can be formally verified using Dafny
- the transpiled code can serve as a *formal* definition of the protocol
- the transpiler can be used to validate equivalence (equivalent behavior) of different implementations of the Python specs. This in turn unlocks further opportunities:
- equivalence of an optimized implementation vs. the reference one
- equivalence of the imperative impure Dafny code vs. a functional pure version. The latter is much easier to reason about formally. At the same time, the semantic gap (between the functional pure code and the imperative impure Python code) is much wider and, thus, the probability of introducing bugs is higher.
- equivalence of the imperative impure Dafny code vs. an abstract declarative specification. The need for an abstract spec which is free of implementation choices is generally recognized.
- equivalence of the original specs vs. a version optimized for model checking or test generation. **NB** Such tools can have problems dealing with particular language features (like complicated loops, recursion, complicated memory access patterns), therefore one can rewrite Python specs or their parts in a way that is more amenable to bounded model checking.
- the Dafny infrastructure can be used to further transpile Dafny code to other supported languages (C#, C++, Java, Go, JavaScript). The code in other languages (e.g. C/C++) allows for the application of a wider selection of tools, like model checkers, concolic testing engines and test generators.
- the transpiler can be a good starting point to generate code in other languages directly (w/o intermediate Dafny code), e.g. C/C++. Since the vast majority of code and tests will be reused, developing such a transpiler will not be difficult, while the produced code can be tailored to particular needs (e.g. so that it's better handled by model checkers or test generators).
- one can implement code generators targeting ITP systems, like Coq, HOL, Lean, etc., too. This is more involved, since such systems do not support imperative and impure features. However, they are more foundational.
- regular type checking of new specification versions. While other kinds of static analysis are possible too, they might not be efficient in practice, since they typically require user-specified annotations. The latter are arguably better developed in the context of model checking and/or formal verification.

*[ITP]: Interactive Theorem Proving

## Project Scope

While we are aiming at a very simple and concise transpiler, achieving high quality in every aspect is highly likely beyond the time frame of the project. We thus assume certain restrictions and priorities.
Limitations:

- support only those Python features which are used to express the Python specs, with primary focus on the core [consensus specs](https://github.com/ethereum/consensus-specs)
  - [DV specs](https://github.com/ethereum/distributed-validator-specs) are easy to support too
  - [execution specs](https://github.com/ethereum/execution-specs) (and perhaps later consensus specs) use the `try/catch` construct (to intercept exceptions), which requires dedicated handling and, thus, will complicate the transpiler. Supporting `try/catch` is therefore low priority.
- currently, lambda functions are supported only as arguments of calls like `map`, `filter`, `max`, `min`, `sorted`, etc. or implicitly in list/set/dict comprehensions. This is how they are used in the actual specs. It's possible to lift the limitation in the future, by converting lambdas to closures; however, it is low priority and not currently planned.
- specs code should be statically typeable. While this is generally so, a small number of spec fragments cause type inference/checking problems in practice. While in theory they can be resolved with more flexible typing rules, it's better to modify the specs (by adding explicit annotations or coercions), so that they are typeable within a more or less standard type system.
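To make the lambda restriction concrete, here is a minimal Python sketch (the names and data are illustrative, not taken from the specs) of what falls inside and outside the supported subset:

```python
balances = [32, 16, 64]
indices = [0, 1, 2]

# Supported: lambdas appearing directly as arguments of calls like
# max/min/sorted/map/filter, or implicitly via comprehensions.
eligible = [i for i in indices if balances[i] >= 32]  # implicit predicate
richest = max(indices, key=lambda i: balances[i])     # lambda as call argument

# Not supported (low priority): a lambda bound to a variable and passed
# around as a first-class value, which would require closure conversion.
pred = lambda i: balances[i] >= 32
```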
Priorities (sorted by decreasing priority):

- support fork choice phase0 code, since there is a project to formally verify it in Dafny
- manual test coverage of the transpiler phases
- [AntlrV4](https://www.antlr.org/) parser/lexer
- simple user interface, error messages may be cryptic
- core phase0, altair and bellatrix phases
- other core specs phases

Lowest priorities:

- native Python parser and lexer
- rich user interface, comprehensive error messages, configurable transpiler
- fuzz and property-based testing of parameterized Python tests
- fuzz and property-based testing of desugaring/simplification rules
- fuzz and property-based testing of semantic assumptions
- fuzz and property-based testing of the Python parser

The main outputs:

- transpiler implementation
- compatibility test suite
- transpiled Python specs
- technical report describing the transpiler and results

## Background

There are a number of projects at ConsenSys R&D related to the Python-to-Dafny transpiler. They can be split into two groups: a) Dafny verification projects, b) transpilers from Python to other languages.

### Ethereum-related Dafny projects at ConsenSys

- [Eth2.0 spec in Dafny](https://github.com/ConsenSys/eth2.0-dafny)
- [F/V of DVT Protocol](https://blog.ethereum.org/2021/03/22/esp-allocation-update-q4-2020)
- [An EVM interpreter in Dafny](https://github.com/ConsenSys/evm-dafny)

### Python transpilers @ TxRx Team

The Tx/Rx team at ConsenSys has already been working on developing a number of research transpilers from Python, focusing on the core consensus specs.
#### Transpilers written in Kotlin

Kotlin [code base](https://github.com/ericsson49/research/tree/experimental5):

- Python to Kotlin transpiler
  - sample generated [code](https://github.com/ericsson49/research/tree/master/beacon_kotlin_generated)
  - multiple bugs in pyspecs found: [1807](https://github.com/ethereum/consensus-specs/pull/1807), [1809](https://github.com/ethereum/consensus-specs/pull/1809), [1811](https://github.com/ethereum/consensus-specs/pull/1811), [1844](https://github.com/ethereum/consensus-specs/pull/1844), [1905](https://github.com/ethereum/consensus-specs/issues/1905), [1908](https://github.com/ethereum/consensus-specs/issues/1908), [1910](https://github.com/ethereum/consensus-specs/pull/1910), [1924](https://github.com/ethereum/consensus-specs/pull/1924), [1946](https://github.com/ethereum/consensus-specs/pull/1946), [1961](https://github.com/ethereum/consensus-specs/pull/1961), [2032](https://github.com/ethereum/consensus-specs/pull/2032), [2078](https://github.com/ethereum/consensus-specs/pull/2078), [2271](https://github.com/ethereum/consensus-specs/pull/2271), [2355](https://github.com/ethereum/consensus-specs/pull/2355), [2356](https://github.com/ethereum/consensus-specs/pull/2356), [2357](https://github.com/ethereum/consensus-specs/issues/2357), [2358](https://github.com/ethereum/consensus-specs/pull/2358), [2359](https://github.com/ethereum/consensus-specs/pull/2359), [2360](https://github.com/ethereum/consensus-specs/issues/2360), [2361](https://github.com/ethereum/consensus-specs/pull/2361), [2420](https://github.com/ethereum/consensus-specs/pull/2420), [2421](https://github.com/ethereum/consensus-specs/pull/2421), [2482](https://github.com/ethereum/consensus-specs/pull/2482), [2483](https://github.com/ethereum/consensus-specs/pull/2483), [2484](https://github.com/ethereum/consensus-specs/pull/2484), [2564](https://github.com/ethereum/consensus-specs/pull/2564), [2565](https://github.com/ethereum/consensus-specs/pull/2565)
  - research
phase1 [implementation](https://github.com/txrx-research/teku/tree/phase1-executable-beacon/phase1/src/main/kotlin/tech/pegasys/teku/phase1)
- Python to Java transpiler
  - sample generated [code](https://github.com/ericsson49/research/tree/master/beacon_java_generated)
  - multiple bugs in pyspecs found (see above)
- Python to Dafny, *intermediate imperative impure* transpiler
  - impure and/or imperative Python is translated to impure/imperative Dafny, where possible, while trying to preserve the original code structure
  - translated [phase0 fork choice code](https://gist.github.com/ericsson49/b562909ebe1973a1de48e281e12cb796#file-phase0_fork_choice_impl-dfy)
- Python to Dafny, *functional pure* transpiler
  - everything is translated to functional pure code:
    - destructive updates are translated to non-destructive ones
    - loops are translated to fold-like or fix-point operators
    - exceptions are translated to explicit outcome propagation
  - translated [phase0 fork choice code](https://gist.github.com/ericsson49/b562909ebe1973a1de48e281e12cb796#file-phase0_fork_choice_spec-dfy)

The transpilers are based on a shared [code base](https://github.com/ericsson49/research/tree/experimental5), which includes a number of static analyses and a type inference engine. The code quality is very "researchy"; however, it has been a useful testbed for experimenting with different targets and implementation approaches.
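To illustrate the kind of rewriting the *functional pure* transpiler performs, here is a rough Python analogy of the loop-to-fold translation (the real transpiler emits Dafny; this function and its names are invented for illustration):

```python
from functools import reduce

# Imperative, impure style, as found in the pyspecs: the accumulator
# is destructively updated on each iteration.
def sum_balances_imperative(balances):
    total = 0
    for b in balances:
        total += b
    return total

# The shape produced by the functional pure translation, sketched in Python:
# the loop becomes a fold and the accumulator is threaded explicitly,
# so no variable is destructively updated.
def sum_balances_pure(balances):
    return reduce(lambda total, b: total + b, balances, 0)
```

Both versions must agree on all inputs; it is exactly this behavioral equivalence that the wider semantic gap of the pure translation makes harder to establish.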
##### Some published materials

- [Towards formal semantics of the beacon chain’s pyspec](https://ethresear.ch/t/towards-formal-semantics-of-the-beacon-chains-pyspec/8181)
- [How to combine beacon chain phase definitions](https://ethresear.ch/t/how-to-combine-beacon-chain-phase-definitions/9953)
- [“Mutable Forest” Memory Model for blockchain specs](https://ethresear.ch/t/mutable-forest-memory-model-for-blockchain-specs/10882)

#### High quality transpiler written in Scala

In order to improve quality and support a wider set of Python specs, a new transpiler [codebase](https://github.com/ericsson49/pyspecs2dafny) has been started recently, using Scala 3 as the implementation language. The goals are:

- improve implementation quality
  - compact and concise code
  - thoroughly tested
- high quality of generated target code
- the transpiler design aims at reducing the testing efforts required to achieve (very) good test coverage
- the transpiler should be usable by others
- reduce efforts to support other Python specs (e.g. execution specs, DV specs, p2p specs)
- reduce efforts to support other target languages (e.g. C, Coq/Lean/HOL, etc.)

Currently, the codebase contains implementations and tests of most of the Python features required for fork choice phase0:

- functions and classes (limited forms)
- `if`, `while`, `for`, assignments, `return`, `assert`, `break`, `continue` statements
- many expression types, including `if` expressions and list/set/dict comprehensions, limited lambda expression support
- concise term rewriting engine
- simplification rules expressed as term rewriting rules (`Stmt -> List[Stmt]`, `Expr -> Expr`, `Expr -> (List[Stmt], Expr)`)
- concise variable declaration inference

## Methodology

The main goal is to reduce the testing efforts required to achieve a trustworthy translation to Dafny. The primary way here is to minimize the set of properties which must be tested.
There are several directions:

- a carefully chosen Python subset - no need to test features which are not used
- validate as many properties as possible in the Dafny world, using symbolic reasoning and theorem proving
- minimize the semantic gap between the supported Python subset and the target Dafny subset. Ideally, Python classes, objects, functions, statements and expressions should be mapped to corresponding Dafny classes, objects, methods, statements and expressions.
- desugar the supported Python subset to a small core language, which consists of a few constructs. The simple desugared code is much easier to map to Dafny; in particular, it avoids certain problems with Dafny restrictions.
- keep the transpiler code as simple as possible. Ideally, it should be written as a primitive recursion on PyAST.

**NB** Since Dafny supports object-oriented, imperative programming and programs allocating and mutating heap objects, the goal is easier to accomplish than with some other verification-friendly languages.

**NB** Dafny doesn't support exceptions, though there is some support for emulating them. Implementing `try/catch` support requires additional code transformations.

**NB** Dafny does not support method lambdas. While it supports function lambdas, there are severe restrictions on what a function method can do (it cannot allocate heap objects, call methods, modify heap objects, or use imperative control flow). Since in the pyspecs lambdas are used only as parameters to `filter`, `map`, `max`/`min`, `sorted` methods (including implicit use as part of list/set/dict comprehensions), we chose to inline such calls, so that lambda invocations can be inlined inside methods. An alternative approach would be to translate a lambda to a closure.
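The inlining strategy can be sketched in Python terms (a hypothetical example; `get_head` and `store.weights` are merely reminiscent of fork choice code, and the real output is Dafny, not Python):

```python
class Store:
    def __init__(self, weights):
        self.weights = weights  # e.g. block root -> attestation weight

# Original spec-style code: a lambda passed to max().
def get_head(store, children):
    return max(children, key=lambda root: store.weights[root])

# After inlining: the max() call and the lambda body are expanded into an
# explicit loop, so no first-class function value remains and the result
# can live inside an ordinary (heap-reading) Dafny method.
def get_head_inlined(store, children):
    best = children[0]
    for root in children[1:]:
        if store.weights[root] > store.weights[best]:
            best = root
    return best

store = Store({1: 10, 2: 30, 3: 30})
```

Note that on ties both versions keep the first maximal element, matching Python's `max` semantics; preserving such details is exactly what the compatibility tests must check.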
Based on that, the transpiler consists of the following phases:

- parsing and lexing
- desugaring and simplifications
- inference (variable declarations, type inference)
- code generation

Each phase can be made relatively simple and straightforward, with limited interaction between phases, which reduces the efforts to review and to test the phase implementations. Each phase can be split into a set of fine-grained rules. Each rule can be tested in isolation - often exhaustively, for practical purposes. Then reasonable combinations of rules can be tested.

**NB** Fuzz or property-based testing can be applied to test random combinations of rule applications and/or PyAST instances.

Additionally:

- we use Scala 3, which is very expressive and well suited to implementing transpilers (GADTs, partial functions, pattern matching, contextual abstractions, etc.)
- desugaring and simplification rules are expressed using term rewriting. We use a very small and concise rewriting engine. The rules are very simple and fine-grained, so that they are easier to review by human experts.
- the transpiler is mostly written in a pure functional style
- Dafny type checking is to be employed to test the inferred type annotations. Therefore, potential bugs in type inference will be revealed during Dafny type checking.

## Future work

A list of ideas which can be implemented in the future:

- support a native Python parser and lexer
  - generate a Scala parser based on the [Python grammar description](https://docs.python.org/3.8/reference/grammar.html) (similar to how the native Python parser is generated)
  - implement differential fuzz testing of the Antlr4 parser vs. the native parser
- fuzz/property-based testing of Python semantic assumptions (expressed as parameterized Python tests)
- implement C code generation (requires type inference)
- implement C++ code generation (since C++ supports the Object-Oriented paradigm)
- better user interface, including better error messages, configuration, etc.
- support other specifications expressed in Python, e.g. Ethereum [execution specs](https://github.com/ethereum/execution-specs)
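As an illustration of the rule-based desugaring described in the Methodology section, the sketch below shows (in Python, over a toy tuple-encoded AST; the actual rules are written in Scala over the real PyAST, and node encodings and helper names here are invented for the example) a fine-grained rewrite of an `if` expression into statements plus a fresh temporary, matching the `Expr -> (List[Stmt], Expr)` rule shape:

```python
_counter = 0

def fresh_var():
    """Generate a fresh temporary variable node."""
    global _counter
    _counter += 1
    return ("Name", f"tmp{_counter}")

def desugar_if_expr(expr):
    """Rewrite ('IfExp', test, body, orelse) into (statements, result_expr)."""
    if expr[0] != "IfExp":
        return ([], expr)  # rule does not apply; leave the expression intact
    _, test, body, orelse = expr
    tmp = fresh_var()
    # `x if c else y` becomes: if c: tmp = x else: tmp = y, with tmp as result.
    stmts = [("If", test,
              [("Assign", tmp, body)],
              [("Assign", tmp, orelse)])]
    return (stmts, tmp)

stmts, result = desugar_if_expr(("IfExp", ("Name", "c"), ("Num", 1), ("Num", 2)))
```

Rules of this size are easy to review in isolation and, as noted above, can be tested exhaustively for practical purposes.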