# (Py)Spec Formal Engineering slides: https://hackmd.io/@ericsson49/pyspec-formal-engineering --- ## My background - S/W engineer, mostly Financial S/W - Familiar with Formal Methods for a long time - Mostly around Coq and SMT/SAT solvers - didn't have a (good) chance to apply as S/W Engineer - Joined Eth2 research recently - have a great chance now --- ## My work @TXRX My work @TXRX/PegaSys/ConsenSys: - clock sync, DeFi, oracles - fork choice tests + a bit of bug fixing - pyspec transpilation to Kotlin - pyspec formal engineering But concentrate on the latter here --- ## (Py)Spec ***Formal*** Engineering - Want high quality financial S/W - Bugs can be costly - Even more important for blockchains - Lack of control after deployment ==> Use **Formal** Methods --- ## (Py)Spec Formal ***Engineering*** - Formal Methods are a mostly academic topic - But things are changing - (Very) difficult to employ in a typical industrial setting - ConsenSys, in fact, the first real opportunity for me ==> Concentrate on **Engineering** aspects --- ## ***(Py)Spec*** Formal Engineering - Concentrate on beacon chain pyspec, currently - But should be applicable for Distributed Protocols, in general - E.g. network layer (libp2p), other blockchains, oracles --- ## Key problem Expressive language means it's more difficult to reason about. ==> S/W engineers and FM practioners chase different goals, as a consequence. --- ## Key problem Beacon chain perspective - Formal Methods importance is recognized - But the spec development process is centered around traditional S/W engineering ==> FM practioners needs are largely ignored. My research tries to answer "What can be done about it?" --- ## Beacon chain problems (FM perspective) - Python is a bad choice from FM perspective - lack of tools, semantics, etc - pyspec is not a spec - not only from FM perspective - pyspec is a moving target - syncing can be costly - manual implementations in different languages - easy to introduce bugs --- ## Opportunities Good stuff - EF team is open to suggestions (but won't blindly accept them) - FM importance is generaly recognized - Can introduce simplifying assumptions - Transpilation is not crazy difficult --- ## My approach - "Engineer" the spec - Meta-programming - Tailor FM to match problem domain - Light(er) weight formal methods --- ## Engineer the spec - A spec is arguably the most important artifact - Especially true for the beacon chain case - "Engineer" the pyspec to make it suitable for applying FMs (and reach other goals) - most part of the spec is executable - pyspec - Define requirements for the spec - what does "good spec" mean? --- ## Tailor FM to problem domain - Constrain expressivity but gain simpler formal reasoning - Python subset employed by the pyspec is already restricted - type annotations, limited OO features, limited set of types, no concurrency - Simplified memory model - mostly about restricted aliasing --- ## Meta-programming - Transpile pyspec to a "better" form - pure functional - for formal needs - other langs - for more practical stuff - Executable semantics for a subset of Python - Pehaps, other execution models: - Extensions of Datalog - Differential dataflow --- ## Light(er)-weight FMs - Start from light-weight formal methods (static type checking) - easier to introduce - less annotations (easy to adapt to changes) - still useful - Continue with middle-weight stuff - extended static checking - bounded model checking - concolic execution - But aim at full-blown verification in a long term --- # Current state --- ## Transpilation - Semi-automatic translation to Kotlin - Regular pyspec updates (phse0/phase1) - Used as a base for Mikhail Kalinin's work on Eth1/Eth2 simulation - Several pyspec bugs found+fixed - Working on dataflow analyses - to support translations - starting point for semantic work - Working on a translation to Rust --- ## Research stuff Mostly ready, but not published yet - [Formal Engineering for Blockchains, part 1](https://hackmd.io/REqkGV21T8Kf86Rh_FFN_g) - [Formal Engineering for Blockchains, part 2](https://hackmd.io/mq7a6mOPTYacVux7fssL-Q) - [Formal engineering of beacon chain spec](https://hackmd.io/29oG2wvORLmUekkPnhOjpw) Started to work on "semantics" - typing rules - memory model --- # Plans --- ## Near term plans - Complete dataflow analyses - Fully automated translation to Kotlin - Complete translation to Rust - Implement executable semantics - typing rules - memory model (based on Rust's one) - Spec purification - Pure functional/logic language target - Static analyses - Switch to datalog engines (Souffle, Formulog) --- ## Longer term plans - Complete Formal Engineering series of posts - Trusted Assumption Base concept - Middle-weight Formal Methods --- # Semantics Highlits --- ## Executable "semantics" "Semantics" - a model which allows to tranform the python subset (employed by pyspec) to a "better" form. - pure functional (Coq, Stainless/Scala, WhyML) - other formal frameworks - mainstream langs (Kotlin, Rust, Scala, etc) --- ## Components - start from annotated Python AST - supported features - infer variable declarations - infer types - resolve ambiguities - type libraries (BLS, SSZ, pylib, etc) - typing rules - memory model --- # Memory Model --- ## Some assumptions - Actor model - Protocol is built around a persistent data structure (e.g. blockchain) - Ignore concurrency (inside an actor) - but should be possible in a safe way (e.g. Rust-style) ==> - Pure functions mostly - But mutation is important for state updates --- ## Formal Engineering perspective - Reasoning about mutation can be way more difficult - Pure code can be way more slow Want the best of both ways, under reasonable restrictions --- ## Aliasing Aliasing is the core problem. Several ways to solve: - detect 'no aliasing' at runtime reference - counting + static analysis (Lean, Whiley) - - avoid aliasing - linear/uniqueness types (ATS, Idris, Mercury, Clean, SAC, etc) - avoid "dangerous" aliasing - e.g. Rust's ownership type system with borrowing --- ## "Dangerous" aliasing - Two or more references pointing to the same location - At least one of them is mutable In a concurrent program, access to the location should be synchronized to avoid a *data race*. Rust ownership type + borrow checker prevents such dangerous aliasing. There's either: - one mutable reference - or multiple immutable references --- ## Reasoning and aliasing Formal reasoning about detructive updates is notoriously difficult for very similar reasons. - Formal properties can be seen as *immutable references* - Aliasing means more info about mutations to be propgated --- ## Memory model Typically, some simplifying assumptions are made. Often it's about restricting aliasing: the less aliasing, the less info to propagate. The Rust memory model is very attractive in this respect: - multiple immutable references constitute no problem, as there is no update possible - any mutable reference is exclusive, i.e. there are no other reference to the location. - the approach works well in practice --- ## Fast and Purious My hypothesess are: - the Rust's memory model fits nicely the beacon chain pyspec - the same is true about Distributed Protocol specs, in general - 'Fast and Purious', both goals are achievable: - fast execution (destructive updates) - translation to a pure code w/o blow-up (mutable references are exclusive) --- # Q&A ---
{"metaMigratedAt":"2023-06-15T14:06:20.710Z","metaMigratedFrom":"YAML","title":"(Py)Spec Formal Engineering","breaks":true,"description":"View the slide with \"Slide Mode\".","slideOptions":"{\"transition\":\"fade\",\"theme\":\"beige\"}","contributors":"[{\"id\":\"6a1f88c4-0c61-422e-bd1f-8fcbf4e2dad4\",\"add\":16394,\"del\":8533}]"}
    487 views