Formal Engineering for Blockchains, part 2

# Formal Engineering for Blockchains, part 2 # Lighter-weight formal methods A widely used idea to simplify introduction of formal methods is start from lighter formal methods, which suffer from less soundness or precision, but are more practical, i.e. requiring less annotation burden and/or less theorem proving. There is a wide spectrum of such methods, starting from static type checking to a full-blown verification. Actually, even when considering verification only, there can variations, depending on the amount of code base to be trusted, which can differ tremendously among approaches to verify software (e.g. see [here](https://ethresear.ch/t/optimizing-trusted-code-base-for-verified-precompiles/7340) some other link??). ## Annotation burden One way to describe a formal method 'weightiness' is the complexity of annotation, required to assist formal reasoner: simple methods can work with simple (often already existing) annotations, while full-blown verification requires not only extensive code annotations (like pre-/post-conditions, assertions, invariants, etc), but writing auxiliary lemmas and developing proof search procedures. The annotation burden can be a serious obstacle, especially during early development stages, where codebase can be quite unstable (it can help finding problems earlier too, though we assume this to be a separate activity, loosely coupled with the main s/w development process, e.g. like prototyping). - lightweight formal methods: static type checking - middleweight formal methods: [Extended Static checking](https://en.wikipedia.org/wiki/Extended_static_checking), [(bounded) Model Checking](https://en.wikipedia.org/wiki/Model_checking) - formal verification ## Weaker soundness/precision Another factor of 'weightiness' is degree of simplifying assumptions. Such assumptions can make formal reasoning faster and/or easier to implement, at the cost of reduced precision or soundness. One can also implement efficient reasoning, tailored to a particular domain (i.e. simplified relative to a general case), however, it's easy to make a mistake in the implementation of the reasoner. In other words, the reasoner should be trusted (that it works correctly). Such trusted code can be introduced less directly - e.g. by implementing a translation to another language or a transformation to another form. The new form can be easier to reason about and/or the transformation can be implemented relatively easy, however the transformation increases TCB. Starting with lighter-weight formal methods thus can bring benefits with relatively minor impact on the existing development process and/or with moderate increase of costs. The benefits will be limited too. However, it brings us to a better position for introducing heavier approaches, including more extensive annotations and reducing TCB by developing corresponding proofs and certificates. # Tailoring formal methods to a domain of interest Another idea is to tailor formal methods to the particular domain of interest, which can reduce efforts significantly, by exploiting the domain specifics. The other important aspect is that there can be a significant amount of efforts already invested in the existing software and tools. Which can make certain formal methods easier or more difficult to apply. Some formal method challenges are related to a form, via which an idea is expressed, e.g. an algorithm can be written in a pure programming language or in an imperative language, arbitrary of fixed precision arithmetic can be used, etc. Some of the forms are easier to formally reason about, however, they often result in a slower performance. It can also be the case, that some features (like destructive updates) are more convenient to s/w engineers. So, there are opportunities to maneuver: one can use forms, which can simplify formal work, however, it likely becomes less convenient for engineers. In order to alleviate the contradiction, we propose to use other principles discussed in the write up, i.e. lighter-weight formal methods and meta-programming. We have already touched the former, and will discuss latter in more details in the following section. ## Solving destructive updates problem A simple example how it can help, considering blockchain protocol development. We have already discussed that destructive updates is a major source of problems, since they introduce hidden dependencies. More precisely, problems may arise when dangerous aliasing happens: some expressions reading a memory location may become invalidated, when someone updates it. While such aliases to the same memory location can be spread though the code. Re-writing code so that such dependencies are explicit, e.g. using non-destructive updates, is possible. But such code is more difficult to read and may be inconvenient to write, for a general audience. Performance can degrade dramatically. We have also discussed, that certain type systems like Rust's [ownership](https://doc.rust-lang.org/1.8.0/book/ownership.html) system, can prevent dangerous aliasing. While it's more restrictive, it allows destructive updates, resulting in fast, but safer software. Due to restrictions, there can be at most one mutator for any location at each point of time, thus it's not difficult to mechanically transform such code to a non-destructive (pure) form, which is much easier to reason about formally. So, the example can be seen as a combinations of three (four) principles, discussed in the write up: - heap access semantics, tailored to the problem domain, which is expressive enough, but prevents dangerous aliasing - static type checking (a lightweight formal method), to enforce the semantics rules - meta-programming to transform the restricted set of programs to a non-destructive form - meta-programming can be also employed to facilitate type checking, i.e. to transpile blockchain specification program to Rust and leverage its typechecker It can be also seen as an example of 'formal spec engineering': one takes an executable specification in some language (e.g. Python in the case of beacon chain specification), but applies a modified formal semantics to it, which (further) restricts the set of admissible programs, in exchange of better reasoning support. # Engineering of formal specification Formal specification is a central concept in formal methods - mechanized reasoning needs formally defined concepts and rules. In the traditional software engineering, specifications are often informal. Moreover, 'formal' specification from a traditional s/w engineer point of view won't be typically considered formal enough by a formal method practitioner: there are likely to be several aspects, which are clear to human beings, but are ambiguous or difficult to mechanized reasoners. For example, an executable specification/reference implementation can be considered as a 'formal specification'. Indeed, it can be executed using some 'mechanism': programming language implementation plus some hardware. However, formal reasoning about is likely to be extremely difficult, as there can be several intermediate layers of concepts. Whether it is a specification or not - can be questioned too, though we consider a specification to be a set of programs (adhering to the spec). Providing a sample program is thus a valid (but perhaps somewhat 'degenerate') way to specify a set. Actually, one can think of a set of programs, which have the same behavior under all possible inputs. Moreover, from an engineering point of view, a reference implementation can use non-deterministic constructs or allow user-defined extensions (e.g. via subclassing, interface implementation, etc). However, both non-determinism and user-defined constructs can hinder formal reasoning. This is especially true for object-oriented programming languages with imperative features. Non-determinism also can significantly reduce usefulness of a reference implementation, as a reference output can vary. Specification 'formality', understood as suitability for application of formal methods, is thus the core problem for introducing the methods into traditional s/w development process. We address the problem by formal specification engineering: we treat specification as a central artifact, which should be carefully 'engineered' to reach necessary goals. For an example, requirements to the specification formality level should be stated too. Another example can to augment a not-so-formal specification with necessary semantics to make it formal-enough (from a formal method practitioner's point of view). One way could be a transpilation (meta-programming), which we consider next. # Meta-programming One more idea is to employ meta-programming (e.g. transpilation) as a tool of formal software engineering. Mechanical transformations of programs is an important area of formal verification. In our 'engineering' approach we can trade some formality for faster progress. In the industrial software engineering one works with multiple implementations of the same concept, which suit different needs. E.g. some of them easier to read, some are more performant (and some are easier to 'read' by a mechanical reader). Discrepancies between such views is an important source of bugs. Since s/w development is a process, discrepancies can arise, when one such view is updated, while another is not or updated in a wrong way. Actually, one can (or should) see a specification as a central concept, while the specification implementations are 'views' that should be kept 'synchronized' to the spec (and between each other). In a perfect world, 'synchronized' would mean equivalent (agree under every admissible input). In reality, 'synchronized' is a weaker concept (e.g. implementations can agree on a finite set of testing/validating inputs). Establishing the 'sync' and maintaining it as the system evolves can be very resource consuming, if done manually. Hopefully, some pairs of 'views' can be 'synchronized' via a mechanized transformation, e.g. compilation, transpilation, template/macro expansion and other forms of meta-programming. In addition to traditional meta-programming, we propose formal method specific ideas here. ## Executable semantics We consider beacon chain protocol as an example here, however, the idea is applicable to other cases. Currently, a core part of the beacon chain protocol is specified with a statically-typed subset of Python, which together with corresponding runtime libraries and tools forms a reference implementation (of the core of the beacon chain protocol). Python language is not the best choice from formal methods point of view, as there is lack of corresponding tools. So, such a specification is not a good starting point for a formal work. There are several ways, how the problem can be resolved. The code can be ported to a language more suitable for a formal work. However, such change can be too dramatic, as it can become less readable and, perhaps more importantly, it will be more difficult to contribute to the spec development, as most s/w engineers are not formal method experts. Python is a very popular and easy to use language, so entry barrier will becomes higher, after such change. A variation can be to port the specs to some other language, but keep the py-specs as the main source. For example, there are specs in k-framework and Dafny. However, keeping twos specs synchronized can be resource consuming. Another solution is to give formal semantics to Python language or rather to the subset actually used, which reduces efforts significantly. Or re-use existing formal semantics. The approach allows to keep the py-specs as the main source of requirements, while formal specification can be obtained more or less mechanically - depending on how close the semantics fit the Python subset used. One drawback of using existing semantics is that it can be too general (or too specific), which means that it can me more difficult to reason about. If the py-specs uses a limited set of Python features and/or rather simple patterns of heap usage, then a lighter-weight semantics can be defined, which also can be easier to reason about (e.g. automatic reasoning tools can better handle it). One example can be object-oriented features, which are notoriously difficult to formally reason about in a general case - py-specs uses such Python features in a very limited way. Another example is heap model, which also can be quite difficult to reason about in a general case. While the py-specs uses destructive updates (which complicates things from a formal point of view), aliasing (which is the main source of problems) is either absent or low, due to blockchain protocol nature. Py-specs uses datatypes in a limited way too, e.g. it doesn't use floating point or complex numbers. Therefore, it makes sense to develop a custom semantics which exploits the limitation to simplify semantics and/or formal reasoning about it. One way to implement the semantics is transpilation, i.e. a transpiler to the Python subset used can be implemented, translating the py-specs code into a language of choice. Actually, since it's a transpilation of a subset, it's not very difficult to implement and several such translations can be performed. ## Generating implementations in different languages A transpiler can be used to translate the py-specs into other languages. While it's not specific to formal methods, there is a specific case for it in formal software engineering. The py-spec is developed to be readable and understandable, rather than high performance (it's a part of specifications). So, an industrial-strength implementation will replace some parts of the py-spec code with faster algorithms, e.g. fork choice, caching/memoization, shuffling, etc. It's easy to introduce a bug, during such optimizations *much easier then when translating the py-specs one-to-one). An additional problem is that there are several implementations in different languages, so there are more opportunities to introduce a bug. Formal verification together with transpilation can be a solution, i.e. an optimized version of py-specs could be written, which can be transpiled to different implementation languages. [Part 3](https://hackmd.io/29oG2wvORLmUekkPnhOjpw)

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.