# Formal engineering of beacon chain spec
This is a follow-up post to [Formal Engineering for Blockchains](https://hackmd.io/REqkGV21T8Kf86Rh_FFN_g), discussing specifics of the [beacon chain protocol](https://github.com/ethereum/eth2.0-specs/tree/dev).
## Intro
There is a growing interest in applying formal methods in industrial software engineering. It is perhaps most apparent in areas where software quality is critical, which is exactly the case for blockchain protocol implementations. The consequences of bugs in such software can be quite costly or even devastating. Even if a bug is not severe, deploying a fix in a distributed BFT setup can be a problem if the fix affects consensus: by their nature, blockchains are designed to be resilient to arbitrary faults, and thus a bug fix can be seen as a form of byzantine fault. For this reason, blockchain software evolution and maintenance would be easier if bugs were wiped out. While this is impossible in general, formal methods can wipe out certain classes of bugs.
In this write-up, we want to discuss how these benefits can be brought to beacon chain protocol development, though the ideas are applicable to other blockchains as well. Our emphasis is on the engineering side: what the problems are, what the possible solutions are, and what can be done to improve the situation. This emphasis also means that we mostly concentrate on coding/engineering aspects, i.e. on bugs and problems arising during protocol specification and implementation.
Though higher-level analysis of the protocol definitely can (and does) benefit from the application of formal methods, we mostly ignore this aspect here. One (minor) reason is that such work is already conducted by other teams. More importantly, high-level analysis often assumes simplified versions of protocols, while industrial-strength specifications and implementations necessarily add lots of low-level details. Software engineers need tools and approaches to manage the increased complexity.
We thus believe that high-level analysis of protocols and industrial-strength implementation are two ends of a spectrum, opposite in some sense, and thus require separate treatment. However, it would definitely be great to trace high-level properties down to low-level bits in some (hopefully not too distant) future.
One example which illustrates the idea: a practical beacon chain implementation will very likely use fixed-width integers, so overflow becomes possible. This is a known obstacle for formal verification, and a high-level analysis will likely use arbitrary-precision integers instead.
Another example: a practical implementation will very likely use imperative features, i.e. destructive updates, especially involving heap memory. This is another well-known "nightmare" for formal verification.
A third example is concurrency, which can also complicate formal verification significantly, and which is difficult to avoid in a practical blockchain implementation.
There can be other facets and examples, like interfacing with the outside world and storage (i.e. effects).
### Beacon chain specs engineering
Hereinafter, we concentrate on specifics of the beacon chain protocol (Eth2, in a more general context).
One serious obstacle to applying formal methods in the beacon chain protocol development process is that the spec, in its current form, is a product of a traditional software engineering approach, which does not extensively use formal methods. The only notable exception is static type checking, which can be seen as a lightweight formal method.
As we noted above, real-world programs exploit features which are difficult for formal methods: fixed-width numeric types, destructive updates, mutable heap, side effects, concurrency, etc. Another problem is that formal verification requires extensive annotation of source code, which hinders readability.
A particular feature of the beacon chain spec is that its executable part is specified in Python, which is definitely not the language of choice from a formal methods practitioner's point of view. However, Spec Developers use a statically-typed subset of Python (with some other restrictions).
So there are contradictions between the traditional software engineering approach and formal method requirements. We, however, believe that in the long run most such contradictions can be resolved. Basically, everyone is interested in a high-quality specification which can be translated to high-quality implementations, and in formally proved properties which justify optimizations.
From a software engineering perspective, one must be very careful when introducing such changes into existing development process(es). We therefore review the main roles and their requirements regarding the specs: Spec Developers, Spec Implementers and Spec Verifiers.
These requirements differ between early and later stages. During early stages, extra formalization may be counter-productive, as it imposes certain overhead. However, at later stages, when the general 'shape' of a protocol is more or less stable, there are lots of little details and contradictions to be settled and resolved. Mechanized static checkers can be very helpful here, and the additional annotations required are less of a burden.
We therefore pursue a gradual introduction of formal methods into the existing development process. There are several possible directions:
- build a better formal ground for specs development
- extended spec annotations
- mechanical spec translations
- (extended) static checking
We somewhat ignore formal verification here, as our goal is to 'engineer specs', which includes building a better ground for full-blown formal verification. We also want to limit the scope (and the size) of the write-up, and so defer the formal verification discussion to follow-up texts.
<!-- ???As we noted above, type annotations already present in the current py-specs. There are also existing efforts to leverage the annotations to find bugs. And to introduce new ones along with corresponding formal method derivations.
We consider three main roles - or kinds of stakeholders. Besides Spec Developers and Spec Implementers, there are Spec Verifiers, which actually refers to formal methods practitioners who work to prove various properties - or found respective counter-examples. We believe that the role becomes increasingly important - while in an idealized world, Spec Developers and Spec Implementers would be enough.
We also explore the roles in dynamic, i.e. how the roles requirements regarding the specs change over time. And perhaps, most importantly, how the specs can be adapted.
-->
# Directions
## Better formal foundation for specs development
Python is not the best language for formal developments. A straightforward solution would be to port the specs to a language which has a better formal foundation and an ecosystem of formal method tools.
However, as the current spec development process is built around Python, we believe that a better formal foundation can be built relatively easily by augmenting the existing executable py-specs with formal semantics.
In general, developing formal semantics for a full-featured imperative programming language with object-oriented features can be quite difficult. There can be even more problems when formally proving properties of programs expressed in such a language.
It would be overly adventurous to develop full Python semantics from scratch. Actually, there exist (partial) Python semantics which can be re-used or adapted. However, such semantics are incomplete, may lack certain features and have been developed with different goals in mind. For example, they can be "heavy", supporting more features than the beacon chain specification needs. As a downside, reasoning with such semantics can be more difficult.
While the above options are worth pursuing, in this write-up we discuss another path: build lightweight semantics, tailored to beacon chain specification needs. In practice, Spec Developers use a relatively small Python subset, so such efforts can be simplified significantly.
A major advantage of the approach is that the semantics can be made lightweight by ignoring features which are not employed in the beacon chain specs. Perhaps even more importantly, reasoning about simpler specs can be easier - e.g. such semantics can employ a simplified and constrained heap model.
There are two additional arguments which make the option more attractive.
First, Spec Developers make efforts to keep the spec clear and unambiguous, e.g. avoiding constructs which are specific to Python, preferring unsigned 64-bit integers, etc. In some cases, pieces of code are re-written to make them clearer and reduce the chance of misinterpretation. Additionally, rarely used constructs which are heavy from a formalization point of view can in theory be re-written too, if it helps to reduce overall effort.
Second, and perhaps more importantly, the efforts to build Python semantics significantly overlap with the efforts to develop a py-specs transpiler, which has turned out to be quite useful for research and bug hunting. In fact, there is an inherent synergy between the two activities: in order to build a high-quality transpiler, one should have a good understanding of the semantics, while such semantics can be expressed and/or defined using the transpilation approach.
We consider the synergetic approach to be formal (specification) engineering, as it provides the possibility to maneuver with existing source code: enriching it with formal semantics, translating it to other languages, transforming code and reaching existing formal-method ecosystems. It assumes certain restrictions on the subset of the source language used; however, this is perhaps inevitable, and the subset can be carefully chosen to nicely match the problem domain.
## Semantics for the subset employed by py-specs
In this section, we explore aspects of the semantics which need to be detailed and formalized.
### Concrete and abstract syntax
A quite useful feature is that the Python implementation includes a Python AST library, which is used by the Python interpreter itself. That means the Python AST can be used as a starting point when defining semantics, ignoring concrete syntax and reducing overall effort.
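For illustration, a minimal sketch of how the standard `ast` module exposes the abstract syntax of a spec function (the `indent` argument to `ast.dump` needs Python 3.9+; annotations parse fine even though the names are not defined here):

```python
import ast

# `get_current_epoch` is quoted from the specs; parsing does not
# resolve names, so the undefined annotations are not a problem.
source = """
def get_current_epoch(state: BeaconState) -> Epoch:
    return compute_epoch_at_slot(state.slot)
"""

tree = ast.parse(source)
print(ast.dump(tree, indent=2))  # Module(body=[FunctionDef(...)])
```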
In theory, there can be "bugs" in the Python parser, but in the case of executable py-specs, it's not clear whether such a discrepancy would indeed be a bug or a feature.
For simplicity, we assume that the AST form is the "true" specification, and should there be any discrepancies between the abstract and concrete syntax, they should be resolved somehow, e.g. by switching to a new Python version or, if that is not possible, re-writing the py-specs to avoid triggering the discrepancy.
In the future, the translation from concrete to abstract syntax can be formally specified.
### Concurrency and determinism
The current py-specs assume single-threaded deterministic execution, so concurrency issues are not a problem at the moment. This significantly simplifies semantics development and formal reasoning.
In general, it's possible that in the future some spec aspects may benefit from concurrency features and/or non-determinism. For example, network-level (sub)protocols may need some form of non-determinism and/or concurrency.
We defer a detailed discussion of the topic until it's actually needed, assuming that the single-threaded deterministic model is enough in the short/mid term.
### Data types
#### Numerics
Of the native Python numeric data types, the py-specs only use `int` and `bool`. Floating-point and complex numbers are avoided.
However, the main workhorse is the unsigned 64-bit integer (`uint64`) data type, which is part of the SSZ specification. Additionally, `uint8`, `boolean` and `bit` from the SSZ spec are occasionally used too.
Over- or underflow should not happen under normal conditions; it indicates bad input data (or a bug in the specs).
The detailed semantics of numeric operations deserve a separate document; we only briefly discuss the principles here.
##### Data type conversions
All the custom numeric data types are subtypes of Python `int`, so they can be used where an `int` is expected.
However, there can be conversions between the numeric subtypes. We assume that when such a conversion cannot result in an overflow or an underflow, i.e. a lower bit-width integer is used where a higher bit-width integer is expected, no explicit conversion is required. Otherwise, an explicit conversion is required, as a signal to Spec Implementers that a conversion error is possible.
Note also that Python treats byte values as `int`s, so when an element of a `bytes` value is accessed, it may be required to convert it to `uint8` explicitly, which may be excessive. A solution may be to allow an implicit conversion to `uint8` in such cases (when an operation's semantics ensures that no overflow is possible).
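A minimal sketch of the intended discipline, with `uint8`/`uint64` as toy stand-ins for the SSZ types (the helper `to_uint8` is hypothetical):

```python
# uint8/uint64 are toy stand-ins for the SSZ types; `to_uint8` is a
# hypothetical explicit (possibly failing) narrowing conversion.
class uint64(int): ...
class uint8(int): ...

def to_uint8(x: int) -> uint8:
    if not 0 <= x < 2**8:
        raise ValueError("value does not fit into uint8")
    return uint8(x)

b = uint8(0xFF)
w = uint64(b)              # widening: no overflow possible, so no
                           # explicit conversion should be needed
first = bytes([1, 2])[0]   # Python yields a plain int here...
n = to_uint8(first)        # ...which currently needs explicit narrowing
```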
##### Conversion to `bool`
There is a tricky case when a `bool` type is expected, e.g. the test of an `if` or `while` construct or a parameter of a boolean operation. In some languages, including Python, non-bools are implicitly converted to a `bool` value using certain rules.
Other languages avoid this and require an explicit conversion to `bool`.
Currently, the py-specs do not use explicit conversions to `bool`. We believe it's not a big deal and assume an implicit conversion is okay in certain cases.
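For illustration, both tests below are accepted by Python; the first relies on the implicit conversion:

```python
indices = []
if not indices:          # implicit: an empty list converts to False
    print("no indices")
if len(indices) == 0:    # the explicit form other languages require
    print("no indices")
```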
##### Numeric operations
There can be a case when a binary (or ternary) operation has arguments of different numeric types, e.g. `uint8 + uint64` or `uint64 - int`.
Assuming the implicit conversion to a wider type is fine, it's rather natural to require that the result should be of the 'widest' of the two (or three) argument types.
Some integer operations, like remainder/modulo, bit shifts and integer division, can be defined differently for certain combinations of arguments. For example, `(-uint8) % uint8` or `uint8 % (-uint8)` can return different results in different languages. Integer division by zero can behave differently too, and bit-shift results may also vary (e.g. `uint64 >> 65`).
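For instance, Python's floored remainder already disagrees with the truncating remainder of C-family languages:

```python
print(-7 % 3)   # 2 in Python (floored); -1 in C, Java or Rust (truncated)
print(7 % -3)   # -2 in Python; 1 in C, Java or Rust
print(-7 // 3)  # -3 in Python; -2 with truncating integer division
# Out-of-range shifts diverge too: Python's unbounded ints happily
# evaluate 1 << 65, while shifting a 64-bit value by 65 is undefined
# behaviour in C and an overflow error (a panic in debug builds) in Rust.
```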
In a multi-language context, when implementations are to be written in different languages, a natural option is perhaps to avoid such cases whenever possible, or to throw a runtime exception.
As we wrote before, we plan to clarify integer operation details in a separate document.
#### Other Python data types
The py-specs use some other native Python data types, like `list`, `set`, `dict`, `tuple` and `bytes`. `str` is not used.
#### Custom and SSZ data types
Classes and object-oriented features are used in a limited way.
The py-specs themselves define a number of classes, which typically have SSZ types as their base classes. Only member fields, and no member methods, are defined.
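A sketch of the typical shape, with a toy `Container` standing in for the real SSZ base class (the `Checkpoint` fields mirror the spec's container; the field types here are simplified stand-ins):

```python
# `Container` is a toy stand-in for the real SSZ base class, which
# lives in the SSZ implementation; field types are simplified too.
class Container:
    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)

class uint64(int): ...
Epoch = uint64
Root = bytes

class Checkpoint(Container):  # fields only, no methods
    epoch: Epoch
    root: Root

cp = Checkpoint(epoch=Epoch(0), root=b"\x00" * 32)
```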
However, the SSZ implementation uses inheritance and OO features more extensively. As the py-specs rely on the SSZ spec, the corresponding semantics should be defined too.
We plan to discuss the proposed semantics of SSZ and custom data types in a separate document.
### Memory model
There are three main memory components: `globals`, `locals` (e.g. stack frames) and the heap.
#### Globals and locals
A Python `Dict[str, Any]` is a natural way to describe global and local variables. We assume that the py-specs can only read `globals` during normal method execution (though their contents can be defined statically, during initialization).
We assume that there is a `locals` context for each function frame on the stack, which can be read and written by the corresponding function. It can also be read by lambdas and inner functions defined in the enclosing function. The `nonlocal` Python keyword is not used at the moment.
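A hypothetical illustration of this model (the helper is not from the actual specs): the lambda below only reads `epoch` from the enclosing frame, with no write access and hence no need for `nonlocal`:

```python
# Hypothetical helper: the lambda reads `epoch` from the enclosing
# frame's locals but never writes to it.
def attestations_for_epoch(attestations, epoch):
    return list(filter(lambda a: a["target"] == epoch, attestations))

print(attestations_for_epoch([{"target": 1}, {"target": 2}], 1))
# [{'target': 1}]
```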
#### Heap model
The heap model is trickier. Reasoning formally about the heap is one of the most difficult problems in general, so it's better to use simplified models where reasoning automation is much easier to implement.
The simplest option would be to limit heap mutation to the allocation of new objects (and garbage-collecting them), i.e. to avoid destructive updates of allocated data structures.
This can facilitate many optimizations, like memoization, incremental hashing and other incremental calculations, and incremental storage updates. It also avoids many problems in a multi-threaded context.
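A toy sketch of the pure-update style, with a frozen dataclass standing in for BeaconState (this is a purified variant of the spec's mutating `increase_balance`, for illustration only):

```python
from dataclasses import dataclass, replace

# `State` is a toy stand-in for BeaconState.
@dataclass(frozen=True)
class State:
    slot: int
    balance: int

def increase_balance(state: State, delta: int) -> State:
    # Instead of mutating in place, return a new value.
    return replace(state, balance=state.balance + delta)

s0 = State(slot=0, balance=32)
s1 = increase_balance(s0, 1)
assert s0.balance == 32   # old versions stay valid: safe to share,
assert s1.balance == 33   # memoize or hash incrementally
```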
The current py-specs use destructive updates relatively infrequently, so they could be re-written in a pure form with relatively low effort.
However, readability would suffer, and it would be more difficult to contribute to py-spec development for programmers who are not familiar enough with pure functional programming. So this option is not considered appropriate right now.
A milder option is to leave only top-level data structures mutable (i.e. BeaconState and Store at the moment, perhaps some others in future) and make all other data structures immutable.
**NB** This applies to the py-specs only; e.g. runtime libraries use mutable caches, which are intended to speed up py-specs execution and should use mutation for that reason.
##### Avoiding problems with aliasing
The main problem when formally reasoning about a mutable heap is aliasing, i.e. there can be two or more pointers to the same heap location. If the location can be mutated via one of the pointers, then subsequent reads via any of them should use the new value, which means this information should be tracked somehow (and that can significantly complicate formulas as well as reasoning about them). From a heap model point of view, Rust's ownership and borrowing type system is perhaps the best match for the current py-specs.
Rust's type system doesn't allow a mutable reference and any other reference (mutable or not) to an object to exist at the same time. Therefore such problematic aliasing cases are excluded, i.e. multiple references to the same location are possible only when none of them can mutate it.
Such a scheme can be partially implemented, using either static or dynamic (e.g. runtime reference counting) approaches.
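A hypothetical sketch of the dynamic variant: a wrapper enforcing "many readers or one writer, but not both" at runtime (all names are made up; this is a sketch in the spirit of Rust's borrowing rules, not a proposed API):

```python
class Borrowed:
    def __init__(self, value):
        self._value = value
        self._readers = 0
        self._writer = False

    def borrow(self):          # shared, read-only access
        if self._writer:
            raise RuntimeError("already mutably borrowed")
        self._readers += 1
        return self._value

    def release(self):
        self._readers -= 1

    def borrow_mut(self):      # exclusive, mutable access
        if self._writer or self._readers > 0:
            raise RuntimeError("already borrowed")
        self._writer = True
        return self._value

    def release_mut(self):
        self._writer = False
```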
The approach is a good match for preventing certain memory issues in the beacon chain protocol. By their nature, the main objects, like BeaconBlock, BeaconState and Attestation, should be immutable (once created and fully initialized). However, in practice the construction of such objects can be defined with destructive updates. It is most apparent with BeaconState, where a new version of the state is created by mutating a copy of a previous state.
BeaconBlock is a simpler example: a partially initialized block is created first and then its `state_root` is updated.
But after a state or a block is fully defined, it shouldn't ever be mutated, i.e. it reaches an immutable state.
In practice, due to programmers' mistakes, such mutation code can occasionally update objects which should be immutable, e.g. a programmer can forget to make a copy of such an object.
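A toy illustration of the bug, with a plain dict standing in for BeaconState and hypothetical names:

```python
import copy

def next_state(state: dict) -> dict:
    new_state = copy.deepcopy(state)  # forgetting this copy would
    new_state["slot"] += 1            # silently mutate the prototype
    return new_state

genesis = {"slot": 0}
s1 = next_state(genesis)
assert genesis["slot"] == 0  # the "immutable" prototype is untouched
```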
The ownership type system of Rust is expressive enough to prevent such problems: a block or a state which has reached the 'immutable' state can be 'marked' as such by obtaining a (non-mutable) reference to it. The Rust type checker will then allow only non-mutable references, and if some code wants to make a new version based on some prototype, it can make a copy of it.
The Rust ownership type system requires user annotations, e.g. reference vs value, mutable vs non-mutable reference, scope annotations in certain cases, Copy traits, etc. However, such annotations can sometimes be inferred. It's an open question whether that's true in the py-specs case; however, the py-specs can be augmented with the necessary annotations when they cannot be inferred.
### Subset of language constructs
In this section we discuss AST components, i.e. statements, expressions, variables, etc.
#### Statements
The py-specs themselves (excluding dependencies and supporting runtime) use a limited set of Python statements: `if`, `for`, `while`, `assign`, `return`, `assert`, `pass`, `break`, `continue`.
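Such a subset is easy to check mechanically. A sketch, assuming the whitelist also admits function definitions and the annotated/augmented assignment and expression-statement forms that `assign` implies:

```python
import ast

# Hypothetical whitelist, named after the `ast` module's classes.
ALLOWED = {
    ast.FunctionDef, ast.If, ast.For, ast.While,
    ast.Assign, ast.AnnAssign, ast.AugAssign,
    ast.Return, ast.Assert, ast.Pass, ast.Break, ast.Continue,
    ast.Expr,  # bare expression statements, e.g. docstrings
}

def outside_subset(source: str) -> list:
    tree = ast.parse(source)
    return [type(n).__name__
            for n in ast.walk(tree)
            if isinstance(n, ast.stmt) and type(n) not in ALLOWED]

print(outside_subset("def f(x: int) -> int:\n    return x + 1"))  # []
print(outside_subset("import os"))  # ['Import']
```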
There are a couple of inner functions (excluding lambdas), but they can easily be converted to top-level ones.
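For example (hypothetical code, not from the specs), lifting an inner function to the top level turns the captured locals into explicit parameters:

```python
# Before: the inner function captures `seed` and `indices`.
def shuffled(indices, seed):
    def pick(i):
        return (i + len(seed)) % len(indices)
    return [indices[pick(i)] for i in range(len(indices))]

# After: lifted to the top level, captured values become parameters.
def pick_lifted(i, indices, seed):
    return (i + len(seed)) % len(indices)

def shuffled_lifted(indices, seed):
    return [indices[pick_lifted(i, indices, seed)]
            for i in range(len(indices))]
```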
There are no inner classes defined.
The py-specs do not currently use the `try` statement (though it was used before), nor `raise`, `del`, `async` constructs, `global` or `nonlocal`.
#### Expressions
The py-specs currently don't use `await`, `yield (from)`, or [assignment expressions](https://www.python.org/dev/peps/pep-0572/).
It would be reasonable to restrict lambdas to be pure functions, i.e. they shouldn't call functions which can modify state.
While state modification in a lambda can be useful in some cases, it can be re-written using normal (named) functions if needed. At the same time, given the way lambdas are typically used (passed to `map`, `filter`, `sorted`, etc.), it would be more difficult to reason about them (both formally and informally) if they could modify state.
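A small illustration of why purity matters here:

```python
balances = [32, 31, 33]
print(sorted(balances, key=lambda b: -b))  # pure: depends only on inputs

seen = []
sorted(balances, key=lambda b: seen.append(b) or b)  # impure!
print(seen)  # [32, 31, 33] - hidden state escaped the lambda, and the
             # result now depends on how often/when `key` is evaluated
```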
<!--
Memory model is a very important aspect of semantics. Current py-specs follow imperative paradigm, including use of destructive updates. It also uses heap memory with GC, datastructures being destructively updated too.
Though, by their nature, blockchain protocols are quite close to persistent data structures, used by (pure) functional languages. So, in general, destructive updates is a construct dictated by the imperative paradigm, which can be avoided in theory, with relatively low efforts. However, we believe that at the moment, destructive updates are important for readability.
At the same time, formal work can benefit a lot from a pure specification, which doesn't use destructive updates (or use them in a very limited way, e.g. at top level). This also quite beneficial for Specs Implementers at later stages. For example, pure code is much easier to memoize (cache heavy calculations), it's much easier to execute in a multi-threaded environment, incremental updates/re-calculation are easier to implement too.
# Beacon chain specs
## Specs (un)ambiguity
One of the most important qualities of the specs is (un)ambiguity. In the case of consensus, specs ambiguity can easily become a source of bugs: i.e. different implementations can treat it in different ways, which can break consensus in certain cases. However, one shouldn't require specs to be unambiguous in every possible way. Actually, ambiguity doesn't necessarily mean violation of important properties. In this respect, it's important to leave place for optimizations on the implementation level.
One very straightforward example could be that specs may specify a particular order of processing for a list of items (e.g. validators). However, such order can be expensive and/or some items can be skipped without violating critical properties.
Thus we do not consider specs ambiguity as a problem by itself.
## Examples???
# Spec User Roles & Their Views on the Spec
Requirements to how beacon chain specification should be shaped vary among different users. We do not try to describe requirements precisely, but rather want to summarize experience, outline problems and tensions, and predict future needs, for core spec user roles. We also believe that diversity between roles could be attributed to concentration on different aspects though beacon chain development lifecycle.
For example, it may be difficult and inconvenient to start from a formal specification, as formal methods are associated with high costs. So, spending time on draft ideas may be counter-productive. On the other hand, when ideas more or less mature, formal method can bring enormous benefits catching problems earlier and "forcing" to formulate ideas in an explicit form. For example, statical type checking can be considered as a lightweight formal method and routinely used to catch certain class of bugs very effectively. While type checking incurs some burden, type checking is performed mechanically in most language implementations.
## Spec Developers
Beacon chain spec development is heavily Python-centered. Large parts of beacon chain protocol and related functionality is specified using Python code (interleaved with prose), so that there is an executable specification (for important parts of beacon chain protocol). And related tools, like spec tests, tests generators, etc.
There are many benefits of the approach, as Python is a very popular language and any decent programmer can understand it or contribute to spec development.
The dark side of the approach is that Python is lacking formal method tools. So, this constitutes a problem in a long term. For example, several bugs have been found in spec, which would be mechanically found by static type checkers of modern (statically typed languages).
Requirements
- spec readability and unambiguity
- it should be easy to contribute to the spec
- executable spec (partial)
Problems:
- static analysis
- pre-/post-condition annotations
- lack of formal grounds
## Spec Implementers
Spec Implementers should translate the spec into different languages, without introducing bugs.
There is a difference between "occasional" reading and building an industrial-strength implementation. So, we distinguish just Spec "Reading" and Spec Implementers.
Basically, both are important, latter on early stages and former on later stages.
### simple
Need "readable" spec. At the same time unambiguous.
Actually, need stay close to specs (reference implementation).
### advanced
But need fast, optimized implementation.
- optimized state transition (e.g. epoch transition)
- optimized fork choice (e.g. protoarray)
- SSZ/hash_tree_root recalculations ???
- state/store differencing or updates
## FM Users
Python is not good for formal stuff.
Actually, EF team uses a subset of Python.
Need to sync formal developments with state changes.
Mutability/Purity.
Pure spec is much easier to reason about formally.
beacon chain is inherently persistent data structure.
Optimizations are required to speed up calculations, e.g. caching.
Easy to introduce a bug, when datastructures are mutable:
- caching/optimizations
- multithreading
- refactoring
Typically, the closer code to spec, the less bugs.
Implementers need to "depart" from pyspec code layout, mutable data structures is a source of bugs here.
# Goals
Formal verification of spec properties (security, safety, maybe liveness, etc)
Formal verification of optimized versions
Test generation (difficult to verify implementations)
Code generation (e.g. reference implementation)
TCB minimization/optimization
# Spec Problems
Python is poor from FM point of view
Mutability
Is rather a reference implementation than a spec
# Ideas to solve spec problems
## Modularization
## Immutabilization
Mechanically transform code to immutable
Can be useful for verification, as it's way easier to work with pure/referentially transparent code.
Two ways:
- transform code to use immutable/persistent data structures, i.e. use/return updated versions of immutable structure. If datastructures are deeply nested, it can be a problem, since code becomes ugly with a direct translation (O(n^2)). One can use lenses to resolve this. Necessary code can be automatically generated.
- transform code to return a set of updates. So there can be destructive updates, but they are lifted to a top-level code (e.g. on_block, on_attestation). Can be useful for some styles of formal work (e.g. dynamic logic with updates) and for incremental storage implementation (or selective cache updates).
## Alternative versions of some functions/methods
FV to assure everything ok
## Mechanized Transpilation
It's relatively easy to work with Python AST, including its type annotations as it's supported by python distribution.
So, it's not very difficult to implement translations of a Python subset of limited size.
It can be implemented on various formality levels.
Initially, a straightforward (dumb) transpilation can be beneficial.
Then semantics can be added, to implement correct expression type inference. And implement static analyses.
Fully mechanized spec can be implemented too, in theory. In particular, one can build certified parsers.
spec kinds
reference impl
declarative spec
FV annotations
How to resolve contradictions above?
Spec is mostly pure.
Heap access patterns are limited.
An idea is that most functions are pure or can be purified relatively easy, without losing much in readability (perhaps even gaining a bit).
However, we anticipate that some functions are difficult or even impossible to re-write in a pure functional style without losing readability (other properties).
So, the idea is to keep most functions pure, while have several versions of impure functions.
It's likely that these funcs are those which need optimization.
Idealized result
Formal semantics for statically-typed python subset
Python parser (either python standard or certified parser)
Data-flow analyses, conversion to SSA (certified)
Variable placement and type/expression inference
Static analyses (memory):
- escape
- point to
- alias
Purification:
- SSA/CPS
- persistent datastructures
- conversion to set-of-updates form
Optimized impls
- same py subset (other variants possible but require formal semantics for other langs and harmonization)
impure parts re-written
formal proofs for re-written parts and optimized versions
Benefits of Formal Methods
reasoning
- properties establishing
- high-level protocol properties (liveness, safety, etc)
- support for reasoning
- can be used to guide optimizations
- formal verification (invariants)
- bugs counter-examples to desired properties
- correspondence between implementations
- optimized spec parts
- reference implementations
- research implementations
- spec implementations
- code generation
- test generation
-->