# Formal engineering of beacon chain spec
This is a follow-up post to [Formal Engineering for Blockchains](https://hackmd.io/REqkGV21T8Kf86Rh_FFN_g), discussing specifics of the [beacon chain protocol](https://github.com/ethereum/eth2.0-specs/tree/dev).
## Intro
There is a growing interest in applying formal methods in industrial software engineering. It is perhaps most apparent in areas where software quality is critical, which is exactly the case for blockchain protocol implementations. The consequences of bugs in such software can be quite costly or even devastating. Even if a bug is not severe, deploying a fix in a distributed BFT setup can be a problem if the fix affects consensus: by their nature, blockchains are designed to be resilient to arbitrary faults, and thus a bug fix can be seen as a form of byzantine fault. For this reason, blockchain software evolution and maintenance would be easier if bugs were wiped out. While this is impossible in general, formal methods can wipe out certain classes of bugs.
In this write-up, we want to discuss how these benefits can be brought to beacon chain protocol development, though the ideas are applicable to other blockchains as well. Our emphasis is on the engineering side: what the problems are, what the possible solutions are, and what can be done to improve the situation. This emphasis also means that we mostly concentrate on coding/engineering aspects, i.e. on bugs and problems arising during protocol specification and implementation.
Though higher-level analysis of the protocol definitely can (and does) benefit from the application of formal methods, we mostly ignore this aspect here. One (minor) reason is that such work is already conducted by other teams. More importantly, high-level analysis often assumes simplified versions of protocols, while industrial-strength specifications and implementations necessarily add lots of low-level details. Software engineers need tools and approaches to manage the increased complexity.
We thus believe that high-level analysis of protocols and industrial-strength implementation are two ends of a spectrum, opposite in some sense, and thus require separate treatment. However, it would definitely be great to trace high-level properties down to low-level bits in some (hopefully not too distant) future.
One example which illustrates the idea: a practical beacon chain implementation will very likely use fixed-width integers, so overflow becomes possible. This is a known obstacle for formal verification, and a high-level analysis will likely use arbitrary-precision integers instead.
Another example: a practical implementation will very likely use imperative features, i.e. destructive updates, especially involving heap memory. This is another well-known "nightmare" for formal verification.
A third example is concurrency, which can also complicate formal verification significantly, and which is difficult to avoid in a practical blockchain implementation.
There can be other facets and examples, like interfacing with the outside world and storage (i.e. effects).
### Beacon chain specs engineering
Hereinafter, we concentrate on specifics of the beacon chain protocol (Eth2, in a more general context).
One serious obstacle to applying formal methods in the beacon chain protocol development process is that the spec, in its current form, is a product of a traditional software engineering approach, which does not extensively use formal methods. The only notable exception is static type checking, which can be seen as a lightweight formal method.
As we noted above, real-world programs exploit features which are difficult for formal methods: fixed-width numeric types, destructive updates, mutable heap, side effects, concurrency, etc. Another problem is that formal verification requires extensive annotation of source code, which hinders readability.
A particular feature of the beacon chain spec is that its executable part is specified in Python, which is definitely not the language of choice from a formal methods practitioner's point of view. However, Spec Developers use a statically-typed subset of Python (with some other restrictions).
So there are contradictions between the traditional software engineering approach and formal method requirements. We, however, believe that in the long run most such contradictions can be resolved. Basically, everyone is interested in a high-quality specification which can be translated to high-quality implementations, and in formally proved properties which justify optimizations.
From a software engineering perspective, one must be very careful when introducing such changes into existing development process(es). We therefore review the main roles and their requirements regarding the specs: Spec Developers, Spec Implementers and Spec Verifiers.
These requirements differ between early and later stages. During early stages, extra formalization may be counter-productive, as it imposes certain overhead. However, at later stages, when the general 'shape' of a protocol is more or less stable, there are lots of little details and contradictions to be settled and resolved. Mechanized static checkers can be very helpful here, and the additional annotations required are less of a burden.
We therefore pursue a gradual introduction of formal methods into the existing development process. There are several possible directions:
- build a better formal ground for specs development
- extended spec annotations
- mechanical spec translations
- (extended) static checking
We somewhat ignore formal verification here, as our goal is to 'engineer specs', which includes building a better ground for full-blown formal verification. We also want to limit the scope (and the size) of the write-up, and so defer the formal verification discussion to follow-up texts.
<!-- ???As we noted above, type annotations already present in the current py-specs. There are also existing efforts to leverage the annotations to find bugs. And to introduce new ones along with corresponding formal method derivations.
We consider three main roles - or kinds of stakeholders. Besides Spec Developers and Spec Implementers, there are Spec Verifiers, which actually refers to formal methods practitioners who work to prove various properties - or found respective counter-examples. We believe that the role becomes increasingly important - while in an idealized world, Spec Developers and Spec Implementers would be enough.
We also explore the roles in dynamic, i.e. how the roles requirements regarding the specs change over time. And perhaps, most importantly, how the specs can be adapted.
-->
# Directions
## Better formal foundation for specs development
Python is not the best language for formal developments. A straightforward solution would be to port the specs to a language which has a better formal foundation and an ecosystem of formal method tools.
However, as the current spec development process is built around Python, we believe that a better formal foundation can be built relatively easily by augmenting the existing executable py-specs with formal semantics.
In general, developing formal semantics for a full-featured imperative programming language with object-oriented features can be quite difficult. There can be even more problems when formally proving properties of programs expressed in such a language.
It would be overly adventurous to develop full Python semantics from scratch. Actually, there exist (partial) Python semantics which can be re-used or adapted. However, such semantics are incomplete, may lack certain features and have been developed with different goals in mind. For example, they can be "heavy", supporting more features than the beacon chain specification needs. As a downside, reasoning with such semantics can be more difficult.
While the above options are worth pursuing, in this write-up we discuss another path: build lightweight semantics, tailored to beacon chain specification needs. In practice, Spec Developers use a relatively small Python subset, so such efforts can be simplified significantly.
A major advantage of the approach is that the semantics can be made lightweight by ignoring features which are not employed in the beacon chain specs. Perhaps even more importantly, reasoning about simpler specs can be easier - e.g. such semantics can employ a simplified and constrained heap model.
There are two additional arguments which make the option more attractive.
First, Spec Developers make efforts to keep the spec clear and unambiguous, e.g. avoiding constructs which are specific to Python, preferring unsigned 64-bit integers, etc. In some cases, pieces of code are re-written to make them clearer and reduce the chance of misinterpretation. Additionally, rarely used constructs which are heavy from a formalization point of view can in theory be re-written too, if it helps to reduce overall effort.
Second, and perhaps more importantly, the efforts to build Python semantics significantly overlap with the efforts to develop a py-specs transpiler, which has turned out to be quite useful for research and bug hunting. In fact, there is an inherent synergy between the two activities: in order to build a high-quality transpiler, one should have a good understanding of the semantics, while such semantics can be expressed and/or defined using the transpilation approach.
We consider the synergetic approach to be formal (specification) engineering, as it provides the possibility to maneuver with existing source code: enriching it with formal semantics, translating it to other languages, transforming code and reaching existing formal-method ecosystems. It assumes certain restrictions on the subset of the source language used; however, this is perhaps inevitable, and the subset can be carefully chosen to nicely match the problem domain.
## Semantics for the subset employed by py-specs
In this section, we explore aspects of the semantics which need to be detailed and formalized.
### Concrete and abstract syntax
A quite useful feature is that the Python implementation includes a Python AST library, which is used by the Python interpreter itself. That means the Python AST can be used as a starting point when defining semantics, ignoring concrete syntax and reducing overall effort.
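For illustration, a minimal sketch of how the standard `ast` module exposes the abstract syntax of a spec function (the `indent` argument to `ast.dump` needs Python 3.9+; annotations parse fine even though the names are not defined here):

```python
import ast

# `get_current_epoch` is quoted from the specs; parsing does not
# resolve names, so the undefined annotations are not a problem.
source = """
def get_current_epoch(state: BeaconState) -> Epoch:
    return compute_epoch_at_slot(state.slot)
"""

tree = ast.parse(source)
print(ast.dump(tree, indent=2))  # Module(body=[FunctionDef(...)])
```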
In theory, there can be "bugs" in the Python parser, but in the case of executable py-specs, it's not clear whether such a discrepancy would indeed be a bug or a feature.
For simplicity, we assume that the AST form is the "true" specification, and should there be any discrepancies between the abstract and concrete syntax, they should be resolved somehow, e.g. by switching to a new Python version or, if that is not possible, re-writing the py-specs to avoid triggering the discrepancy.
In the future, the translation from concrete to abstract syntax can be formally specified.
### Concurrency and determinism
The current py-specs assume single-threaded deterministic execution, so concurrency issues are not a problem at the moment. This significantly simplifies semantics development and formal reasoning.
In general, it's possible that in the future some spec aspects may benefit from concurrency features and/or non-determinism. For example, network-level (sub)protocols may need some form of non-determinism and/or concurrency.
We defer a detailed discussion of the topic until it's actually needed, assuming that the single-threaded deterministic model is enough in the short/mid term.
### Data types
#### Numerics
Of the native Python numeric data types, the py-specs only use `int` and `bool`. Floating-point and complex numbers are avoided.
However, the main workhorse is the unsigned 64-bit integer (`uint64`) data type, which is part of the SSZ specification. Additionally, `uint8`, `boolean` and `bit` from the SSZ spec are occasionally used too.
Over- or underflow should not happen under normal conditions; it indicates bad input data (or a bug in the specs).
The detailed semantics of numeric operations deserve a separate document; we only briefly discuss the principles here.
##### Data type conversions
All the custom numeric data types are subtypes of Python `int`, so they can be used where an `int` is expected.
However, there can be conversions between the numeric subtypes. We assume that when such a conversion cannot result in an overflow or an underflow, i.e. a lower bit-width integer is used where a higher bit-width integer is expected, no explicit conversion is required. Otherwise, an explicit conversion is required, as a signal to Spec Implementers that a conversion error is possible.
Note also that Python treats byte values as `int`s, so when an element of a `bytes` value is accessed, it may be required to convert it to `uint8` explicitly, which may be excessive. A solution may be to allow an implicit conversion to `uint8` in such cases (when an operation's semantics ensures that no overflow is possible).
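A minimal sketch of the intended discipline, with `uint8`/`uint64` as toy stand-ins for the SSZ types (the helper `to_uint8` is hypothetical):

```python
# uint8/uint64 are toy stand-ins for the SSZ types; `to_uint8` is a
# hypothetical explicit (possibly failing) narrowing conversion.
class uint64(int): ...
class uint8(int): ...

def to_uint8(x: int) -> uint8:
    if not 0 <= x < 2**8:
        raise ValueError("value does not fit into uint8")
    return uint8(x)

b = uint8(0xFF)
w = uint64(b)              # widening: no overflow possible, so no
                           # explicit conversion should be needed
first = bytes([1, 2])[0]   # Python yields a plain int here...
n = to_uint8(first)        # ...which currently needs explicit narrowing
```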
##### Conversion to `bool`
There is a tricky case when a `bool` type is expected, e.g. the test of an `if` or `while` construct or a parameter of a boolean operation. In some languages, including Python, non-bools are implicitly converted to a `bool` value using certain rules.
Other languages avoid this and require an explicit conversion to `bool`.
Currently, the py-specs do not use explicit conversions to `bool`. We believe it's not a big deal and assume an implicit conversion is okay in certain cases.
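For illustration, both tests below are accepted by Python; the first relies on the implicit conversion:

```python
indices = []
if not indices:          # implicit: an empty list converts to False
    print("no indices")
if len(indices) == 0:    # the explicit form other languages require
    print("no indices")
```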
##### Numeric operations
There can be a case when a binary (or ternary) operation has arguments of different numeric types, e.g. `uint8 + uint64` or `uint64 - int`.
Assuming the implicit conversion to a wider type is fine, it's rather natural to require that the result should be of the 'widest' of the two (or three) argument types.
Some integer operations, like remainder/modulo, bit shifts and integer division, can be defined differently for certain combinations of arguments. For example, `(-uint8) % uint8` or `uint8 % (-uint8)` can return different results in different languages. Integer division by zero can behave differently too, and bit-shift results may also vary (e.g. `uint64 >> 65`).
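For instance, Python's floored remainder already disagrees with the truncating remainder of C-family languages:

```python
print(-7 % 3)   # 2 in Python (floored); -1 in C, Java or Rust (truncated)
print(7 % -3)   # -2 in Python; 1 in C, Java or Rust
print(-7 // 3)  # -3 in Python; -2 with truncating integer division
# Out-of-range shifts diverge too: Python's unbounded ints happily
# evaluate 1 << 65, while shifting a 64-bit value by 65 is undefined
# behaviour in C and an overflow error (a panic in debug builds) in Rust.
```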
In a multi-language context, when implementations are to be written in different languages, a natural option is perhaps to avoid such cases whenever possible, or to throw a runtime exception.
As we wrote before, we plan to clarify integer operation details in a separate document.
#### Other Python data types
The py-specs use some other native Python data types, like `list`, `set`, `dict`, `tuple` and `bytes`. `str` is not used.
#### Custom and SSZ data types
Classes and object-oriented features are used in a limited way.
The py-specs themselves define a number of classes, which typically have SSZ types as their base classes. Only member fields, and no member methods, are defined.
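A sketch of the typical shape, with a toy `Container` standing in for the real SSZ base class (the `Checkpoint` fields mirror the spec's container; the field types here are simplified stand-ins):

```python
# `Container` is a toy stand-in for the real SSZ base class, which
# lives in the SSZ implementation; field types are simplified too.
class Container:
    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)

class uint64(int): ...
Epoch = uint64
Root = bytes

class Checkpoint(Container):  # fields only, no methods
    epoch: Epoch
    root: Root

cp = Checkpoint(epoch=Epoch(0), root=b"\x00" * 32)
```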
However, the SSZ implementation uses inheritance and OO features more extensively. As the py-specs rely on the SSZ spec, the corresponding semantics should be defined too.
We plan to discuss the proposed semantics of SSZ and custom data types in a separate document.
### Memory model
There are three main memory components: `globals`, `locals` (e.g. stack frames) and the heap.
#### Globals and locals
A Python `Dict[str, Any]` is a natural way to describe global and local variables. We assume that the py-specs can only read `globals` during normal method execution (though their contents can be defined statically, during initialization).
We assume that there is a `locals` context for each function frame on the stack, which can be read and written by the corresponding function. It can also be read by lambdas and inner functions defined in the enclosing function. The `nonlocal` Python keyword is not used at the moment.
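A hypothetical illustration of this model (the helper is not from the actual specs): the lambda below only reads `epoch` from the enclosing frame, with no write access and hence no need for `nonlocal`:

```python
# Hypothetical helper: the lambda reads `epoch` from the enclosing
# frame's locals but never writes to it.
def attestations_for_epoch(attestations, epoch):
    return list(filter(lambda a: a["target"] == epoch, attestations))

print(attestations_for_epoch([{"target": 1}, {"target": 2}], 1))
# [{'target': 1}]
```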
#### Heap model
The heap model is trickier. Reasoning formally about the heap is one of the most difficult problems in general, so it's better to use simplified models where reasoning automation is much easier to implement.
The simplest option would be to limit heap mutation to the allocation of new objects (and garbage-collecting them), i.e. to avoid destructive updates of allocated data structures.
This can facilitate many optimizations, like memoization, incremental hashing and other incremental calculations, and incremental storage updates. It also avoids many problems in a multi-threaded context.
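A toy sketch of the pure-update style, with a frozen dataclass standing in for BeaconState (this is a purified variant of the spec's mutating `increase_balance`, for illustration only):

```python
from dataclasses import dataclass, replace

# `State` is a toy stand-in for BeaconState.
@dataclass(frozen=True)
class State:
    slot: int
    balance: int

def increase_balance(state: State, delta: int) -> State:
    # Instead of mutating in place, return a new value.
    return replace(state, balance=state.balance + delta)

s0 = State(slot=0, balance=32)
s1 = increase_balance(s0, 1)
assert s0.balance == 32   # old versions stay valid: safe to share,
assert s1.balance == 33   # memoize or hash incrementally
```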
The current py-specs use destructive updates relatively infrequently, so they could be re-written in a pure form with relatively low effort.
However, readability would suffer, and it would be more difficult to contribute to py-spec development for programmers who are not familiar enough with pure functional programming. So this option is not considered appropriate right now.
A milder option is to leave only top-level data structures mutable (i.e. BeaconState and Store at the moment, perhaps some others in future) and make all other data structures immutable.
**NB** This applies to the py-specs only; e.g. runtime libraries use mutable caches, which are intended to speed up py-specs execution and should use mutation for that reason.
##### Avoiding problems with aliasing
The main problem when formally reasoning about a mutable heap is aliasing, i.e. there can be two or more pointers to the same heap location. If the location can be mutated via one of the pointers, then subsequent reads via any of them should use the new value, which means this information should be tracked somehow (and that can significantly complicate formulas as well as reasoning about them). From a heap model point of view, Rust's ownership and borrowing type system is perhaps the best match for the current py-specs.
Rust's type system doesn't allow a mutable reference and any other reference (mutable or not) to an object to exist at the same time. Therefore such problematic aliasing cases are excluded, i.e. multiple references to the same location are possible only when none of them can mutate it.
Such a scheme can be partially implemented, using either static or dynamic (e.g. runtime reference counting) approaches.
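A hypothetical sketch of the dynamic variant: a wrapper enforcing "many readers or one writer, but not both" at runtime (all names are made up; this is a sketch in the spirit of Rust's borrowing rules, not a proposed API):

```python
class Borrowed:
    def __init__(self, value):
        self._value = value
        self._readers = 0
        self._writer = False

    def borrow(self):          # shared, read-only access
        if self._writer:
            raise RuntimeError("already mutably borrowed")
        self._readers += 1
        return self._value

    def release(self):
        self._readers -= 1

    def borrow_mut(self):      # exclusive, mutable access
        if self._writer or self._readers > 0:
            raise RuntimeError("already borrowed")
        self._writer = True
        return self._value

    def release_mut(self):
        self._writer = False
```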
The approach is a good match for preventing certain memory issues in the beacon chain protocol. By their nature, the main objects, like BeaconBlock, BeaconState and Attestation, should be immutable (once created and fully initialized). However, in practice the construction of such objects can be defined with destructive updates. It is most apparent with BeaconState, where a new version of the state is created by mutating a copy of a previous state.
BeaconBlock is a simpler example: a partially initialized block is created first and then its `state_root` is updated.
But after a state or a block is fully defined, it shouldn't ever be mutated, i.e. it reaches an immutable state.
In practice, due to programmers' mistakes, such mutation code can occasionally update objects which should be immutable, e.g. a programmer can forget to make a copy of such an object.
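A toy illustration of the bug, with a plain dict standing in for BeaconState and hypothetical names:

```python
import copy

def next_state(state: dict) -> dict:
    new_state = copy.deepcopy(state)  # forgetting this copy would
    new_state["slot"] += 1            # silently mutate the prototype
    return new_state

genesis = {"slot": 0}
s1 = next_state(genesis)
assert genesis["slot"] == 0  # the "immutable" prototype is untouched
```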
The ownership type system of Rust is expressive enough to prevent such problems: a block or a state which has reached the 'immutable' state can be 'marked' as such by obtaining a (non-mutable) reference to it. The Rust type checker will then allow only non-mutable references, and if some code wants to make a new version based on some prototype, it can make a copy of it.
The Rust ownership type system requires user annotations, e.g. reference vs value, mutable vs non-mutable reference, scope annotations in certain cases, Copy traits, etc. However, such annotations can sometimes be inferred. It's an open question whether that's true in the py-specs case; however, the py-specs can be augmented with the necessary annotations when they cannot be inferred.
### Subset of language constructs
In this section we discuss AST components, i.e. statements, expressions, variables, etc.
#### Statements
The py-specs themselves (excluding dependencies and supporting runtime) use a limited set of Python statements: `if`, `for`, `while`, `assign`, `return`, `assert`, `pass`, `break`, `continue`.
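Such a subset is easy to check mechanically. A sketch, assuming the whitelist also admits function definitions and the annotated/augmented assignment and expression-statement forms that `assign` implies:

```python
import ast

# Hypothetical whitelist, named after the `ast` module's classes.
ALLOWED = {
    ast.FunctionDef, ast.If, ast.For, ast.While,
    ast.Assign, ast.AnnAssign, ast.AugAssign,
    ast.Return, ast.Assert, ast.Pass, ast.Break, ast.Continue,
    ast.Expr,  # bare expression statements, e.g. docstrings
}

def outside_subset(source: str) -> list:
    tree = ast.parse(source)
    return [type(n).__name__
            for n in ast.walk(tree)
            if isinstance(n, ast.stmt) and type(n) not in ALLOWED]

print(outside_subset("def f(x: int) -> int:\n    return x + 1"))  # []
print(outside_subset("import os"))  # ['Import']
```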
There are a couple of inner functions (excluding lambdas), but they can easily be converted to top-level ones.
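For example (hypothetical code, not from the specs), lifting an inner function to the top level turns the captured locals into explicit parameters:

```python
# Before: the inner function captures `seed` and `indices`.
def shuffled(indices, seed):
    def pick(i):
        return (i + len(seed)) % len(indices)
    return [indices[pick(i)] for i in range(len(indices))]

# After: lifted to the top level, captured values become parameters.
def pick_lifted(i, indices, seed):
    return (i + len(seed)) % len(indices)

def shuffled_lifted(indices, seed):
    return [indices[pick_lifted(i, indices, seed)]
            for i in range(len(indices))]
```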
There are no inner classes defined.
The py-specs do not currently use the `try` statement (though it was used before), nor `raise`, `del`, `async` constructs, `global` or `nonlocal`.
#### Expressions
The py-specs currently don't use `await`, `yield (from)`, or [assignment expressions](https://www.python.org/dev/peps/pep-0572/).
It would be reasonable to restrict lambdas to be pure functions, i.e. they shouldn't call functions which can modify state.
While state modification in a lambda can be useful in some cases, it can be re-written using normal (named) functions if needed. At the same time, given the way lambdas are typically used (passed to `map`, `filter`, `sorted`, etc.), it would be more difficult to reason about them (both formally and informally) if they could modify state.
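A small illustration of why purity matters here:

```python
balances = [32, 31, 33]
print(sorted(balances, key=lambda b: -b))  # pure: depends only on inputs

seen = []
sorted(balances, key=lambda b: seen.append(b) or b)  # impure!
print(seen)  # [32, 31, 33] - hidden state escaped the lambda, and the
             # result now depends on how often/when `key` is evaluated
```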
<!--
Memory model is a very important aspect of semantics. Current py-specs follow imperative paradigm, including use of destructive updates. It also uses heap memory with GC, datastructures being destructively updated too.
Though, by their nature, blockchain protocols are quite close to persistent data structures, used by (pure) functional languages. So, in general, destructive updates is a construct dictated by the imperative paradigm, which can be avoided in theory, with relatively low efforts. However, we believe that at the moment, destructive updates are important for readability.
At the same time, formal work can benefit a lot from a pure specification, which doesn't use destructive updates (or use them in a very limited way, e.g. at top level). This also quite beneficial for Specs Implementers at later stages. For example, pure code is much easier to memoize (cache heavy calculations), it's much easier to execute in a multi-threaded environment, incremental updates/re-calculation are easier to implement too.
# Beacon chain specs
## Specs (un)ambiguity
One of the most important qualities of the specs is (un)ambiguity. In the case of consensus, specs ambiguity can easily become a source of bugs: i.e. different implementations can treat it in different ways, which can break consensus in certain cases. However, one shouldn't require specs to be unambiguous in every possible way. Actually, ambiguity doesn't necessarily mean violation of important properties. In this respect, it's important to leave place for optimizations on the implementation level.
One very straightforward example could be that specs may specify a particular order of processing for a list of items (e.g. validators). However, such order can be expensive and/or some items can be skipped without violating critical properties.
Thus we do not consider specs ambiguity as a problem by itself.
## Examples???
# Spec User Roles & Their Views on the Spec
Requirements to how beacon chain specification should be shaped vary among different users. We do not try to describe requirements precisely, but rather want to summarize experience, outline problems and tensions, and predict future needs, for core spec user roles. We also believe that diversity between roles could be attributed to concentration on different aspects though beacon chain development lifecycle.
For example, it may be difficult and inconvenient to start from a formal specification, as formal methods are associated with high costs. So, spending time on draft ideas may be counter-productive. On the other hand, when ideas more or less mature, formal method can bring enormous benefits catching problems earlier and "forcing" to formulate ideas in an explicit form. For example, statical type checking can be considered as a lightweight formal method and routinely used to catch certain class of bugs very effectively. While type checking incurs some burden, type checking is performed mechanically in most language implementations.
## Spec Developers
Beacon chain spec development is heavily Python-centered. Large parts of beacon chain protocol and related functionality is specified using Python code (interleaved with prose), so that there is an executable specification (for important parts of beacon chain protocol). And related tools, like spec tests, tests generators, etc.
There are many benefits of the approach, as Python is a very popular language and any decent programmer can understand it or contribute to spec development.
The dark side of the approach is that Python is lacking formal method tools. So, this constitutes a problem in a long term. For example, several bugs have been found in spec, which would be mechanically found by static type checkers of modern (statically typed languages).
Requirements
- spec readability and unambiguity
- it should be easy to contribute to the spec
- executable spec (partial)
Problems:
- static analysis
- pre-/post-condition annotations
- lack of formal grounds
## Spec Implementers
Spec Implementers should translate the spec into different languages, without introducing bugs.
There is a difference between "occasional" reading and building an industrial-strength implementation. So, we distinguish just Spec "Reading" and Spec Implementers.
Basically, both are important, latter on early stages and former on later stages.
### simple
Need "readable" spec. At the same time unambiguous.
Actually, need stay close to specs (reference implementation).
### advanced
But need fast, optimized implementation.
- optimized state transition (e.g. epoch transition)
- optimized fork choice (e.g. protoarray)
- SSZ/hash_tree_root recalculations ???
- state/store differencing or updates
## FM Users
Python is not good for formal stuff.
Actually, EF team uses a subset of Python.
Need to sync formal developments with state changes.
Mutability/Purity.
Pure spec is much easier to reason about formally.
beacon chain is inherently persistent data structure.
Optimizations are required to speed up calculations, e.g. caching.
Easy to introduce a bug, when datastructures are mutable:
- caching/optimizations
- multithreading
- refactoring
Typically, the closer code to spec, the less bugs.
Implementers need to "depart" from pyspec code layout, mutable data structures is a source of bugs here.
# Goals
Formal verification of spec properties (security, safety, maybe liveness, etc)
Formal verification of optimized versions
Test generation (difficult to verify implementations)
Code generation (e.g. reference implementation)
TCB minimization/optimization
# Spec Problems
Python is poor from FM point of view
Mutability
Is rather a reference implementation than a spec
# Ideas to solve spec problems
## Modularization
## Immutabilization
Mechanically transform code to immutable
Can be useful for verification, as it's way easier to work with pure/referentially transparent code.
Two ways:
- transform code to use immutable/persistent data structures, i.e. use/return updated versions of immutable structure. If datastructures are deeply nested, it can be a problem, since code becomes ugly with a direct translation (O(n^2)). One can use lenses to resolve this. Necessary code can be automatically generated.
- transform code to return a set of updates. So there can be destructive updates, but they are lifted to a top-level code (e.g. on_block, on_attestation). Can be useful for some styles of formal work (e.g. dynamic logic with updates) and for incremental storage implementation (or selective cache updates).
## Alternative versions of some functions/methods
FV to assure everything ok
## Mechanized Transpilation
It's relatively easy to work with Python AST, including its type annotations as it's supported by python distribution.
So, it's not very difficult to implement translations of a Python subset of limited size.
It can be implemented on various formality levels.
Initially, a straightforward (dumb) transpilation can be beneficial.
Then semantics can be added, to implement correct expression type inference. And implement static analyses.
Fully mechanized spec can be implemented too, in theory. In particular, one can build certified parsers.
spec kinds
reference impl
declarative spec
FV annotations
How to resolve contradictions above?
Spec is mostly pure.
Heap access patterns are limited.
An idea is that most functions are pure or can be purified relatively easy, without losing much in readability (perhaps even gaining a bit).
However, we anticipate that some functions are difficult or even impossible to re-write in a pure functional style without losing readability (other properties).
So, the idea is to keep most functions pure, while have several versions of impure functions.
It's likely that these funcs are those which need optimization.
Idealized result
Formal semantics for statically-typed python subset
Python parser (either python standard or certified parser)
Data-flow analyses, conversion to SSA (certified)
Variable placement and type/expression inference
Static analyses (memory):
- escape
- point to
- alias
Purification:
- SSA/CPS
- persistent datastructures
- conversion to set-of-updates form
Optimized impls
- same py subset (other variants possible but require formal semantics for other langs and harmonization)
impure parts re-written
formal proofs for re-written parts and optimized versions
Benefits of Formal Methods
reasoning
- properties establishing
- high-level protocol properties (liveness, safety, etc)
- support for reasoning
- can be used to guide optimizations
- formal verification (invariants)
- bugs counter-examples to desired properties
- correspondence between implementations
- optimized spec parts
- reference implementations
- research implementations
- spec implementations
- code generation
- test generation
-->