My background
S/W engineer, mostly Financial S/W
Familiar with Formal Methods for a long time
Mostly around Coq and SMT/SAT solvers
didn't have a (good) chance to apply as S/W Engineer
Joined Eth2 research recently
My work @TXRX
My work @TXRX/PegaSys/ConsenSys:
clock sync, DeFi, oracles
fork choice tests + a bit of bug fixing
pyspec transpilation to Kotlin
pyspec formal engineering
But concentrate on the latter here
(Py)Spec Formal Engineering
Want high quality financial S/W
Even more important for blockchains
Lack of control after deployment
==>
Use Formal Methods
(Py)Spec Formal Engineering
Formal Methods are a mostly academic topic
(Very) difficult to employ in a typical industrial setting
ConsenSys, in fact, the first real opportunity for me
==>
Concentrate on Engineering aspects
(Py)Spec Formal Engineering
Concentrate on beacon chain pyspec, currently
But should be applicable for Distributed Protocols, in general
E.g. network layer (libp2p), other blockchains, oracles
Key problem
Expressive language means it's more difficult to reason about.
==>
S/W engineers and FM practioners chase different goals, as a consequence.
Key problem
Beacon chain perspective
Formal Methods importance is recognized
But the spec development process is centered around traditional S/W engineering
==>
FM practioners needs are largely ignored.
My research tries to answer "What can be done about it?"
Beacon chain problems (FM perspective)
Python is a bad choice from FM perspective
lack of tools, semantics, etc
pyspec is not a spec
not only from FM perspective
pyspec is a moving target
manual implementations in different languages
Opportunities
Good stuff
EF team is open to suggestions (but won't blindly accept them)
FM importance is generaly recognized
Can introduce simplifying assumptions
Transpilation is not crazy difficult
My approach
"Engineer" the spec
Meta-programming
Tailor FM to match problem domain
Light(er) weight formal methods
Engineer the spec
A spec is arguably the most important artifact
Especially true for the beacon chain case
"Engineer" the pyspec to make it suitable for applying FMs (and reach other goals)
most part of the spec is executable - pyspec
Define requirements for the spec
what does "good spec" mean?
Tailor FM to problem domain
Constrain expressivity but gain simpler formal reasoning
Python subset employed by the pyspec is already restricted
type annotations, limited OO features, limited set of types, no concurrency
Simplified memory model
mostly about restricted aliasing
Meta-programming
Transpile pyspec to a "better" form
pure functional - for formal needs
other langs - for more practical stuff
Executable semantics for a subset of Python
Pehaps, other execution models:
Extensions of Datalog
Differential dataflow
Light(er)-weight FMs
Start from light-weight formal methods (static type checking)
easier to introduce
less annotations (easy to adapt to changes)
still useful
Continue with middle-weight stuff
extended static checking
bounded model checking
concolic execution
But aim at full-blown verification in a long term
Transpilation
Semi-automatic translation to Kotlin
Regular pyspec updates (phse0/phase1)
Used as a base for Mikhail Kalinin's work on Eth1/Eth2 simulation
Several pyspec bugs found+fixed
Working on dataflow analyses
to support translations
starting point for semantic work
Working on a translation to Rust
Research stuff
Mostly ready, but not published yet
Started to work on "semantics"
typing rules
memory model
Near term plans
Complete dataflow analyses
Fully automated translation to Kotlin
Complete translation to Rust
Implement executable semantics
typing rules
memory model (based on Rust's one)
Spec purification
Pure functional/logic language target
Static analyses
Switch to datalog engines (Souffle, Formulog)
Longer term plans
Complete Formal Engineering series of posts
Trusted Assumption Base concept
Middle-weight Formal Methods
Executable "semantics"
"Semantics" - a model which allows to tranform the python subset (employed by pyspec) to a "better" form.
pure functional (Coq, Stainless/Scala, WhyML)
other formal frameworks
mainstream langs (Kotlin, Rust, Scala, etc)
Components
start from annotated Python AST
supported features
infer variable declarations
infer types
resolve ambiguities
type libraries (BLS, SSZ, pylib, etc)
typing rules
memory model
Some assumptions
Actor model
Protocol is built around a persistent data structure (e.g. blockchain)
Ignore concurrency (inside an actor)
but should be possible in a safe way (e.g. Rust-style)
==>
Pure functions mostly
But mutation is important for state updates
Formal Engineering perspective
Reasoning about mutation can be way more difficult
Pure code can be way more slow
Want the best of both ways, under reasonable restrictions
Aliasing
Aliasing is the core problem. Several ways to solve:
detect 'no aliasing' at runtime reference - counting + static analysis (Lean, Whiley) -
avoid aliasing - linear/uniqueness types (ATS, Idris, Mercury, Clean, SAC, etc)
avoid "dangerous" aliasing - e.g. Rust's ownership type system with borrowing
"Dangerous" aliasing
Two or more references pointing to the same location
At least one of them is mutable
In a concurrent program, access to the location should be synchronized to avoid a data race .
Rust ownership type + borrow checker prevents such dangerous aliasing. There's either:
one mutable reference
or multiple immutable references
Reasoning and aliasing
Formal reasoning about detructive updates is notoriously difficult for very similar reasons.
Formal properties can be seen as immutable references
Aliasing means more info about mutations to be propgated
Memory model
Typically, some simplifying assumptions are made. Often it's about restricting aliasing: the less aliasing, the less info to propagate.
The Rust memory model is very attractive in this respect:
multiple immutable references constitute no problem, as there is no update possible
any mutable reference is exclusive, i.e. there are no other reference to the location.
the approach works well in practice
Fast and Purious
My hypothesess are:
the Rust's memory model fits nicely the beacon chain pyspec
the same is true about Distributed Protocol specs, in general
'Fast and Purious', both goals are achievable:
fast execution (destructive updates)
translation to a pure code w/o blow-up (mutable references are exclusive)
Resume presentation
(Py)Spec Formal Engineering slides: https://hackmd.io/@ericsson49/pyspec-formal-engineering
{"metaMigratedAt":"2023-06-15T14:06:20.710Z","metaMigratedFrom":"YAML","title":"(Py)Spec Formal Engineering","breaks":true,"description":"View the slide with \"Slide Mode\".","slideOptions":"{\"transition\":\"fade\",\"theme\":\"beige\"}","contributors":"[{\"id\":\"6a1f88c4-0c61-422e-bd1f-8fcbf4e2dad4\",\"add\":16394,\"del\":8533}]"}