# Modularization of beacon chain pyspecs

Beacon chain pyspecs are a non-trivial piece of software: in order to comprehend them, one should decompose them into smaller parts. This is an important design principle in software engineering called [separation of concerns](https://en.wikipedia.org/wiki/Separation_of_concerns) (modularization). However, the best way to do such a decomposition may depend on the purpose.

Currently, pyspecs are split into phases, and each phase is split into *md* files corresponding to high-level concerns like `core functionality`, `fork choice`, `validator`, `p2p-interface`, etc. Each file contains a number of Python definitions (constants, classes and methods) along with some prose (a method can be seen as a module too). Such a decomposition may work well for pyspecs developers and implementers; however, it is not enough from a testing or formal methods perspective.

Currently, Formal Methods developments ([Modeling and Verifying Gasper in Coq](https://github.com/runtimeverification/beacon-chain-verification/tree/master/casper/coq), [Formalization of the Beacon Chain Phase 0 Specification in K](https://github.com/runtimeverification/beacon-chain-spec), [Eth2.0 spec in Dafny](https://github.com/ConsenSys/eth2.0-dafny)) do not work with pyspecs directly. There are several reasons. First, there is a lack of formal methods, tools and semantics for Python. Second, Formal Methods are hard and labor-intensive, so Formal Methods developments focus on parts of pyspecs and/or simplified models (of such parts). But even if these problems were overcome, the way pyspecs are currently structured would still pose a problem for the application of formal methods.

As pyspecs are a moving target, such dissociation of pyspecs and formal developments poses a problem from a development process point of view, as it creates a cyclic dependency: bugs can be found during formal verification, which will require pyspecs code changes, which in turn require corresponding updates of the Formal Methods developments. Moreover, some steps here are far from trivial and can take a significant amount of time to perform, resulting in a risk of multiple iterations and unpredictable delays. Software quality is paramount for blockchains: once deployed, software can be very difficult to modify. At the same time, open-source code is available to attackers. So it's hardly possible to avoid and/or ignore such a risk.

# What is a module?

For the purpose of this post, we define a module as an independent, interchangeable piece of software which contains everything necessary to execute an aspect or a group of related aspects of the desired functionality. The definition is adapted from [here](https://en.wikipedia.org/wiki/Modular_programming). We allow a group of related aspects in order to (explicitly) support hierarchical modules.

# Current state

## Existing modular structure

As already noted, pyspecs already possess some structure related to modularization. The specs are split into phases. Each phase is split into a number of files, each corresponding to a particular high-level concern (like core concepts, fork choice, validator, etc).

Additionally, there are supporting libraries, which define aspects like BLS signatures, SSZ (data structures, encoding, decoding) and hashing (`hash`, `hash_tree_root`). Such libraries have clear interfaces and are easy to interchange with other implementations, e.g. to support testing, as the sketch below illustrates.
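Here is a minimal sketch (with hypothetical class names; `Verify` follows the IETF-style BLS interface the specs use) of how a supporting library with a clear interface can be swapped out behind the spec code:

```python
# Minimal sketch (hypothetical names) of a swappable supporting library.
# The spec code depends only on the interface, so a trivial stub can
# replace the real pairing-based implementation in tests.

class BLS:
    """Interface of the BLS signature aspect, as the spec code sees it."""
    def Verify(self, pubkey: bytes, message: bytes, signature: bytes) -> bool:
        raise NotImplementedError

class RealBLS(BLS):
    """Production implementation, delegating to an actual BLS library."""
    def Verify(self, pubkey: bytes, message: bytes, signature: bytes) -> bool:
        raise NotImplementedError("delegate to a pairing-based library, e.g. py_ecc")

class StubBLS(BLS):
    """Test double: accepts every signature, making tests fast and deterministic."""
    def Verify(self, pubkey: bytes, message: bytes, signature: bytes) -> bool:
        return True

bls: BLS = StubBLS()  # swapped in for testing; RealBLS() in production
```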
## List of concerns

Here is a rough list of aspects/concerns which are good candidates to constitute a module, from a Formal Verification perspective:

- Supporting
  - BLS
  - hash, hash_tree_root
  - shuffling, seed
  - SSZ field layout
- Resource consumption
  - execution steps
  - memory consumption
  - caching
- Mid level
  - fork choice
  - state transition
  - validator duties
- High level
  - Casper FFG
  - LMD-GHOST
  - balances, rewards/penalties, leaks

# Problems with the current approach

## Granularity

Formal Methods typically require finer-grained modules. A method in production code often comprises several aspects/concerns intertwined in the method body. From a Formal Methods perspective, this causes several problems:

- the module specification should cover several aspects, which complicates the spec. It's less of a problem in the context of Software Engineering, since the latter is typically less formal and often omits such details
- an aspect can introduce an unwanted implicit dependency, e.g. aliasing
- a formal specification describes pre-/post-conditions at module boundaries; however, if several aspects are fused, it's more tricky to insert such annotations inside a module.

## Implicit dependencies

In traditional Software Engineering, an objective is often to reduce the effort needed to develop code, which favors passing information in an implicit form. One example is destructive updates. A destructive update can create a dependency between modules if there is aliasing, i.e. both modules can point to the same location (and at least one of them updates it). Formal Verification languages and frameworks have means to express such dependencies (e.g. dynamic frames, separation logic), but in regular languages they often stay implicit. In addition, impure code is often more efficient from a resource consumption (CPU, memory) point of view. Other examples are exceptions, locking, etc.

## Leaky abstractions

Without a mechanized proof checker, it's quite easy to break abstraction principles. Often engineers are not aware of the "rules", and at other times it can be non-trivial to prove (either formally or informally) that some rule actually holds. Such problems become hard to hide in the presence of a mechanized proof checker, which is unable to "think" like a human.

A "leaky" abstraction can arise when destructive updates are used. Consider an interface which allows mutation, e.g. `MutableSequence`, and two implementations of that interface: one which makes a copy of an object when it's added to a collection, and one which doesn't. The behavior of the two implementations differs from an aliasing point of view. In the context of Formal Methods, it will be tricky to write a specification which allows both implementations and remains valuable in a Formal Verification context (i.e. excluding trivial options, like an empty specification). The sketch below makes the difference concrete.
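A minimal, self-contained sketch (toy classes, not pyspecs code) of the aliasing leak just described:

```python
# Two MutableSequence-style collections that differ only in whether they
# copy elements on insertion; the difference is observable through aliasing.

from typing import List

class Balances:
    """Toy mutable element."""
    def __init__(self, value: int) -> None:
        self.value = value
    def copy(self) -> "Balances":
        return Balances(self.value)

class AliasingList:
    """Stores the object itself: later mutations of the argument leak inside."""
    def __init__(self) -> None:
        self._items: List[Balances] = []
    def append(self, item: Balances) -> None:
        self._items.append(item)

class CopyingList:
    """Stores a copy: the collection is insulated from later mutations."""
    def __init__(self) -> None:
        self._items: List[Balances] = []
    def append(self, item: Balances) -> None:
        self._items.append(item.copy())

b = Balances(0)
aliased, copied = AliasingList(), CopyingList()
aliased.append(b)
copied.append(b)
b.value = 42
assert aliased._items[0].value == 42  # mutation is visible: aliasing
assert copied._items[0].value == 0    # the copy shields the collection
```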
## Interface inheritance

Pyspecs developers deliberately avoid following the OO paradigm. From a Formal Verification perspective this is actually not bad, since Object-Orientation is known to be a problem in a formal verification context, at least when destructive updates are present. However, the problematic part of the Object-Oriented paradigm is [implementation inheritance](https://en.wikipedia.org/wiki/Composition_over_inheritance), while [interface inheritance](https://en.wikipedia.org/wiki/Subtyping) is generally considered a good concept. This is especially true in a Formal Verification context: with implementation inheritance there is a higher risk of contradicting subtyping. Note, however, that currently pyspecs themselves use inheritance only to define data structures.

However, the way phase definitions are [combined](https://ethresear.ch/t/how-to-combine-beacon-chain-phase-definitions/9953) can be seen as code inheritance too, though it is not explicitly expressed using Object-Oriented constructs. From a Formal Verification point of view, interface inheritance which conforms to the subtyping principle is required. Basically, Software Engineers can afford a departure from strict subtyping, while it can be much more difficult to convince a mechanical proof checker that a method behaves correctly.

# How to overcome the problems

## "Shadow" specs

One way to deal with the problem (that traditional software engineers and formal methods practitioners have different requirements on how specs should be decomposed) is to have two versions of the specs and keep them synchronized. Since we assume that, at least for the moment (and for some time into the future), the traditional software engineering perspective is more important, we'll call the specs developed with Formal Verification in mind "shadow" specs. The main question is how to efficiently synchronize such specs.

## Inlining

We assume that from a Formal Verification point of view specs should be finer-grained, but in general they refine and detail the "Traditional Software" point of view. I.e. the "Traditional Software" spec can in principle be obtained by coarsening the "Formal Verification" specs, e.g. by inlining some calls or definitions.

## Equivalence proofs

After inlining, specs can still differ due to details of the concrete inlining procedure: for example, variable and definition names can differ, auxiliary variables can be introduced, complex expressions can be split into simpler ones, and vice versa. In many cases, such specs can be proved equivalent with relatively simple means, i.e. symbolic execution which performs [global value numbering](https://en.wikipedia.org/wiki/Value_numbering).

## Specs slicing

As mentioned above, some Formal Methods developments deal with simplified models of parts of the specs, e.g. [Gasper](https://github.com/runtimeverification/beacon-chain-verification/tree/master/casper/coq), which combines LMD-GHOST and Casper FFG but ignores balance calculations. In some cases, one can strip pyspecs of details which are irrelevant to a particular formal development (e.g. code which deals with balances). Such a stripped-down version of the spec can then serve as an input to a Formal Methods development. For example, if spec changes do not affect the stripped-down version, then the corresponding F/M development doesn't need to be updated. A minimal sketch of such slicing is given below.
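The following sketch (simplified, hypothetical code, not actual pyspecs functions) shows a function which intertwines justification bookkeeping with balance accounting, and its slice with the balance concern removed:

```python
# Spec slicing sketch: the full function mixes two concerns; the sliced
# variant strips balances, which a Gasper-style formal model may ignore.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class State:
    justified_checkpoints: List[bytes] = field(default_factory=list)
    balances: Dict[int, int] = field(default_factory=dict)

def process_justification(state: State, checkpoint: bytes, attesters: List[int]) -> None:
    """Full spec: records justification AND updates attester balances."""
    state.justified_checkpoints.append(checkpoint)
    for index in attesters:  # balance concern, irrelevant to the formal model
        state.balances[index] = state.balances.get(index, 0) + 1

def process_justification_sliced(state: State, checkpoint: bytes, attesters: List[int]) -> None:
    """Sliced spec: balance code removed, justification logic intact.
    A change to the balance logic above leaves this slice (and any formal
    development built on it) untouched."""
    state.justified_checkpoints.append(checkpoint)
```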