HackMD - Collaborative Markdown Knowledge Base

## Structural Comparisons of Smart Contracts The contents of this document address instances of contract specification for which current methods, mostly consisting of prose and proposition (i.e. writing properties in sentences), are not able to fully or rigorously express desirable properties. This is a weakness of current methods of contract specification which we address here through *morphisms of smart contracts*. A **morphism of smart contracts is a structural relationship between smart contracts**. As we will show, we are *relating* the possible execution pathways of two distinct smart contracts: sometimes showing that two contracts are equivalent to each other (essentially via a bisimulation); that one contract's functionality is fully present, or contained, within another (through an embedding); or that all the functionality of one contract conforms to patterns set by another contract (a quotient, or compression). The purpose of these morphisms is to let us express desired properties of smart contracts *by means of other smart contracts*, rather than by means of properties which we can articulate by definitions and written descriptions. As we will show, this enriches the language available to us to form specifications. As most attempts to do this consist of adapting the underlying logic of specification to a specific use case, this is a novel attempt to understand, express, and reason about the structure of smart contracts. Mathematicians routinely use functions not dissimilar to the ones we present here to understand complex mathematical objects in relation to each other. Properties that can be discovered and expressed through functions, rather than simply through properties that can be defined and described, tend to be more subtle and complex, and can also be more powerful. It is our thesis that enriching the language of specification with smart contracts will make us substantially more expressive, precise, and rigorous in contract specification and verification. The details of this document are slightly simplified for the sake of clear communication and understanding. ### Notation We first present some notation for the rest of this document: - We will use `C` and `C'` to indicate smart contracts throughout. - Every smart contract has a storage type `State`, a message type (the type of its entrypoints) `Msg`, and a function which takes a message `msg : Msg` and state `st : State` and outputs an updated state, along with a list of transactions to be executed. - We call this function `entrypoint`, and its type is `State * Msg -> State * (list Actions)`. - As a general rule of notation, `Msg` and `State` will be the message and storage types of `C`, and `entrypoint` its entrypoint function; `Msg'`, `State'`, and `entrypoint'` will be the corresponding types and functions of `C'`, and so on. ## Morphism of Contracts As we mentioned before, we call this structural relationship between smart contracts a **morphism of smart contracts**. Morphisms of smart contracts are functions between smart contracts. Recall that a smart contract `C` is characterized by its `Msg` and `State` types, as well as its `entrypoint` function. A morphism of contracts `C -> C'` is, then, a function between functions `entrypoint -> entrypoint`. In more explicit terms, a morphism of contracts `C -> C'` contains three component functions: - `msg_transform : Msg -> Msg'` - `state_transform : State -> State'` - `action_transform : list Action -> list Action` These functions relate the inputs and outputs of the entrypoint functions of both `C` and `C'`. In particular, a morphism of smart contracts requires that calling `entrypoint` with `m : Msg` and then transforming the resulting state `st : State` along `state_transform` and resulting actions along `action_transform` yields the same result as first transforming the message by `msg_transform` and then calling `C'`, which also yields a state. In other words, for all `msg : Msg` and `st : State`: ``` entrypoint ( state_transform st, msg_transform msg ) = let (st', actns) := (entrypoint st msg) in (st', acts) ``` We call a proof of the above statement a proof of **commutativity**, or the **commutativity result**. That's because we can represent this proposition as a square diagram, which we say *commutes*. These morphisms are literally functions between pure functions, so we will write them as such: given the three component functions `msg_transform`, `state_transform`, and `action_transform`, along with a proof of the commutativity statement above, we say that we have a morphism of contracts and write that type `C -> C'`. **These are best thought of as functions between smart contracts**, as they are functions that take into account the structure of smart contracts as pure functions. When considered as a class, these morphisms are too general to be useful in any systematic way. In the remainder of this document we will identify some subclasses of morphisms which each have their own useful properties. These are: 1. **Isomorphisms of contracts** : this relationship indicates a structural equivalence of non-identical contracts. 1. **Embeddings of contracts** : this indicates that the structure and functionality of one contract is fully present in another; this indicates some minimal functionality of a smart contract. 1. **Quotient of contracts** : this indicates that one contract can be compressed to match the pattern of another. In each of these cases, we will argue that (1) useful properties can be proved by using these functions, and (2) a specification which tries to capture these properties is hard to do rigorously with current methods. Current methods rely on propositions which can be written in prose. The examples we will give are precisely examples where the useful property is difficult to express precisely in prose, but straightforward to express by means of a smart contract. ### Some Basic Morphisms Before proceeding with examples of these three morphisms, we will discuss a couple specific morphisms and properties of morphisms which will become useful as we proceed further. These are the **identity morphism** and the **null morphism**. The **identity morphism** is a morphism `C -> C` which, as the name suggests, does nothing. Its components are: - `msg_transform := id` - `state_transform := id` - `action_transform := id` and the proof of commutativity is `reflexivity` (the trivial equality proof). This can be defined on any smart contract, so for a smart contract `C` we write its identity morphism `id_C`. The second is the **null morphism**, which a morphism `C -> Null_C` to the **null contract**. The null contract has message type `Type Null_Msg := Unit`, state type `Record Null_State := { u : unit }`, and its entrypoint function is the trivial (identity) function `entrypoint (st : State) (u : Msg) := st`. Then from any smart contract `C`, the null morphism has component functions: - `null_msg_transform (m : Msg) := tt` - `null_state_transform (st : State) := {| u := tt |}` - `null_action_transform := id` The proof of commutativity is always both possible and trivial, because the resulting state type `Null_State` only has one value and the entrypoint function is (necessarily) the identity. Finally, we note that **morphisms can be composed**, and that composition is associative. ## Isomorphisms (Strong Equivalence) The first class of morphisms we will consider are *isomorphisms of contracts*. These are morphisms which indicate structural equivalence between two smart contracts. This means that, while they are potentially unidentical smart contracts, for all intents and purposes isomorphic contracts are indistinguishable from the point of view of functional correctness (though not, for example, from the point of view of efficiency or human-readability). An isomorphism of smart contracts is a pair of morphisms - `f : C -> C'` - `g : C' -> C` such that composition either way results in the identity morphism. That is, - `compose g f = id_C`, and - `compose f g = id_C'`. An isomorphism indicates that the message and state types can be transformed to imitate each other. In other words, the difference between `C` and `C'` comes down to a relabelling of some kind, whether it be in relabelling entrypoint names, by using equivalent storage or message types, or internal code which is not identical but still functionally equivalent. From a functional perspective---relating inputs to outputs---these two contracts are identical even though their code might not be. Note that every contract `C` is isomorphic to itself via the identity morphism. ### Why would it be useful to know that two contracts are equivalent? The notion of equivalence of contracts lets us think about an *equivalence class* of a contract `C`. That is the set of smart contracts `{ C, C', C'', ... }` which are isomorphic to `C`. The contracts in this class all behave equivalently from a functional perspective, but they may have slightly different properties to them. For example: - Isomorphic contracts `C` and `C'` might differ in some metric of efficiency, including gas efficiency; - Isomorphic contracts `C` and `C'` might differ in human readability, amenability to formal methods and proof, or clarity. In fact, it is often the case that the more human-readable and -intelligible a piece of code is, the less efficient it is. The same can be true for code that is amenable to formal methods, or having some property proved of it. It may be easier to prove a specification true of `C`, and then perform isomorphic transformations on `C` to a more efficient or more conveniently-structured `C'`. Since `C` is isomorphic to `C'`, the properties proved of `C` still hold for `C'`, but now `C'` is more useful to actually be deployed. Equivalence classes are difficult to express in current methods. One could attempt to capture the behavior of `C` entirely in a specification, and then just ensure that the transformation `C'` of `C` matches that specification---however, one risks not having fully captured the behavior of `C` in the specification. One could also require that there be a bisimulation between `C` and `C'`, which is precisely an isomorphism of contracts. No specification is perfect or wholly comprehensive in identifying and preventing all bugs. When an engineer implements a contract `C` according to some specification, they make choices on many details of the contract. They may have a resulting example contract which is well-tested and human-readable that, aside from conforming to an explicit specification, may have additional properties which are subtle and difficult to express, but which the engineer intuitively *intends* to be part of a specification. You can see this in the wild, when specifications come accompanied with example implementations. This helps, in some sense, "fill in the gaps" on what was not expressed in the specification itself. With this in mind, a research engineer could produce code for a smart contract which has certain proven functionality matching a specification, along with some of these more subtle points of contract functionality expressed by the example implementation. Rather than assuming that the specification is comprehensive and developing a smart contract which only matches the specification, someone looking to deploy this contract in the wild might go further and write a contract which is isomorphic to the contract produced by the research engineer, but which differs perhaps in entrypoint names, efficiency, etc. Assuming the engineer implemented the specification faithfully, as well as these more subtle and desirable properties which were difficult to express in the specification, the resulting smart contract would conform to the output of the research in a stronger way than would an implementation that only matches the written specificaiton. ### Example: Equivalences Here we will illustrate the ways in which smart contracts can differ but remain isomorphic with three very simple examples. Fundamentally, we will show contracts which keep a string in storage, take an input string via entrypoints, and update their storage to `"Hello World" ++ input_string`. Contract `C` is given by: ``` Type Msg := string Record State := { stor : string } entrypoint (st : State) (msg : Msg) := {| stor := "Hello World" ++ msg |}, [] ``` Contract `C'` is given by: ``` Type Msg' := string Record State' := { stor' : string } entrypoint' (st : State') (msg : Msg') := {| stor' := "Hello " ++ "World" ++ msg |}, [] ``` Finally, Contract `C''` is given by: ``` Type Msg'' := string * unit Record State'' := { stor'' : string } entrypoint'' (st : State'') (msg : Msg'') := let (s, u) := msg in {| stor'' := "Hello World" ++ s |}, [] ``` Note that `C` differs from `C'` in the entrypoint function, as `C` concatenates `"Hello World"` with the input, but `C'` concatenates `"Hello "` and `"World"` with the input. This is analogous to optimized computation in the entrypoint function: they are different computationally, but functionally they are the same. In contrast, `C` differs from `C''` in the message types. `C` takes a string, while `C''` takes a (string, unit) pair. Of course, the unit type does not add any information, so getting an input of type `string` is equivalent to getting one of type `string * unit`. The isomorphisms, then, are as follows: ``` C -> C' is given by: msg_transform := id state_transform (st : State) := {| stor' := st.(stor) |} action_transform := id ``` That is, when going from `C` to `C'` we essentially change nothing -- the difference between the contracts is not in their message or state types, but rather in the internal workings of the contract. The reverse morphism of type `C' -> C` is analogous. It is in the fact that these morphisms can be defined via identity functions that we know the internal workings of `C` and `C'` are identical. ``` C -> C'' is given by: msg_transform (msg : Msg) := (msg, tt) state_transform (st : State) := {| stor'' := st.(stor) |} action_transform := id ``` while the reverse is: ``` C'' -> C is given by: msg_transform' (msg : Msg'') := let (s,u) := msg in s. state_transform' (st : State'') := {| stor := st.(stor'') |} action_transform := id ``` This isomorphism indicates that we can change the message type, using `msg_transform` and `msg_transform'`, to go between the contracts, and the results will always be the same. Furthermore, the functions `msg_transform` are an isomorphism of the underlying message types. The difference between `C` and `C''`, then, is external (in the entrypoint labeling) rather than internal (their entrypoint functions are identical). This points to the fact that a morphism of contracts `C -> C'` is an isomorphism if the underlying `msg_transform`, `state_transform`, and `action_transform` functions are isomorphisms (invertible functions). Because of associativity of morphism composition, isomorphisms compose to isomorphisms, so we have that each of `C`, `C'`, and `C''` are isomorphic to one another. ## Embeddings (Injections or Monomorphisms) The next kind of contract relationship that we will investigate is called an *embedding*. This is a morphism of contracts, `f : C -> C'`, which indicates that the structure of `C` is fully present in `C'`. You might say that `C` is a "sub contract" of `C'`---`C` does have the totality of the functional properties of `C'`, but it has some subset of them which can be identified via the embedding. Because of this, `C` will likely be simpler than `C'`. ### Definition An embedding of a smart contract `C` into `C'` is a morphism of contracts, where the component functions send distinct values to distinct values. In other words, given our component function - `msg_transform : Msg -> Msg'` For any two `m m' : Msg`, if `m <> m'`, then `msg_transform m <> msg_transform m'`. We require the analogous properties for the other component functions - `state_transform : State -> State` - `action_transform : Action -> Action`. This means that when the morphism of contracts transforms inputs and outputs of `C` into those of `C'`, to mimick the computation of `C` in `C'`, it does so in a faithful and fully structure-preserving way. Thus the embedding preserves the structure of `C` in `C'`, in that everything that can occur in `C` must by necessity also be able to occur in `C'` (up to some relabelling, like in entrypoint names, or different but equivalent types, etc). This kind of a structural relationship is particularly (though not exclusively) good for trying to prove that something is possible in `C'` by way of `C`. ### Example: Counter Embedding Consider the following counter contract `C`, which we give by its state, its message type, and the entrypoint function. ``` Record State := { counter : nat ; } Inductive Msg := | incr (n : nat) | decr (n : nat) Definition entrypoint (st : State) (msg : Msg) : option State := match msg with | incr amt => Some {| counter := st.(counter) + amt |} | decr amt => if st.(counter) - amt < 0 then None else Some {| counter := abs(st.(counter) - amt) |} end. ``` This contract keeps a natural number `counter : nat` in its storage, which can be modified by the `incr` and `decr` entrypoints. As long as the contract initializes with a nonzero value in storage, the `counter` will never go below zero. Counters like this are implicitly present in many smart contracts. One notable example is in a token contract, which keeps track of balances (and which cannot go negative). When counters are present, they are also often important to the integrity of a contract. In the case of the token contract, they are central as they keep track of balances. Because these implicit counters are often critical to the correct functionality of a smart contract, a specification verifying the correctness of a contract will likely include something about the implicit counter. Let us consider a contract `C'` in which there is an embedding of `C`. As before, the embedding consists of three functions - `msg_transform : Msg -> Msg'` - `state_transform : State -> State'` - `action_transform : Action -> Action'`. It also includes the proof of commutativity. This all means that anything that can be done in `C` can be mimicked in `C'` via the message, state, and action transform functions. In particular, `state_transform` will indicate which `nat` in storage we're targeting with this embedding, and `msg_transform` indicates some corresponding functions which result in an increment or decrement of the `nat` in storage. We note two things here: one thing that we can deduce because of this embedding, and another that we cannot. The first is that due to the embedding, we know that the `nat` in storage targetted by the embedding has no upper bound: if the correct entrypoints of `C'` are called, it is possible to make that particular `nat` in storage arbitrarily large. This is an affirmative property asserted by the fact that *all* the functionality of `C` must be present in `C'`. Recalling our example of a token contract, this embedding into a token contract would indicate that there is no upper limit on how many tokens can be minted. The second thing to note is that even though the `nat` in the storage of `C` can never go below zero, through the embedding alone we do not know that the same holds in `C'`. This illustrates that an embedding gives us information about the *minimal* structure of `C'`, *e.g.* asserting that something or some kind of behavior is possible, but does not always tell us the full range of behavior in `C'`. ### Version Control with Contract Embeddings As we mentioned before, the techniques presented here are meant to be useful in cases where a specification is difficult to describe in words. In the counter example, we could include in the specification that a particular `nat` in storage is not bounded above. To fully capture the properties proved by the embedding in words, we would also have to prove that there is a way to increment and decrement the `nat` in storage. All things considered, that particular case is relatively straightforward to do without embeddings. However, there are cases where one might want to assert that some minimal functionality of a smart contract is present, and that minimal functionality is difficult to specify in prose, using current methods. One such example could be when one expands or updates the code of a smart contract to include extra functionality, or to be optimized, and one wants to prove that the resulting code is backwards compatible. Depending on how development goes, the structure of the resulting code may obfuscate the original code, and it may not be obvious that the original functionality in its entirety still persists. To remedy this, one could effectively keep version control by working on a new version of a contract and proving an embedding of the old into the new. This way, one can be certain that the new contract is backwards compatible. Furthermore, the assertion of backwards compatibility is fully rigorous: without enumerating or testing all the possible functionality of the original contract, one can still assert that everything the old contract could do, the new one can do also. The embedding also indicates *how* backwards compatibility is preserved, in that it gives precise semantics on exactly what in the new contract corresponds to any given action from the old. Backwards compatibility is hard to specify holistically and precisely without the use of previous code. To do so, one would have to somehow enumerate in prose all the possible execution pathways of the previous contract. However, using contract embeddings, the statement that a contract `C'`, which is an updated version of `C`, is backwards compatible with `C` is both simple and fully comprehensive: that there exists an embedding of `C` into `C'`. ## Quotients (Compressions/Surjections) Our final example will be of quotients of contracts. In contrast to embeddings, which help us understand and assert minimal contract behavior (a contract must *at least* do `X`), quotients have to do with *maximal* structure of a contract. One way to use them can be to classify the bounds of certain contract behavior. If you have a quotient of `C` onto `C'`, then all possible behaviors of `C` *conform* in some sense to a pattern set by `C'`. In contrast to the embeddings, which classify `C'` by the contract `C`, quotients classify `C` by the structure of `C'` (the reverse). The more structure that is on the "compressed" contract `C'`, the more interesting the morphism (relationship) becomes (read: the stronger the proposition that the relationship proves). For example, every contract can be quotiented onto the null contract, but the null contract has essentially no structure to it so that is not a particularly useful fact to know. That particular quotient morphism only tells us that every contract has entrypoints, storage, and some functionality—something that we can observe, although it's not a totally trivial observation. ### Definition Like in the case of the embedding, the definition of a quotient morphism is in terms of its constituent functions. We say that `f : C -> C'` is a *quotient* morphism if each of - `msg_transform : Msg -> Msg'` - `state_transform : State -> State'` - `action_transform : list Action -> list Action`, are *onto functions*. For example, `msg_transform` is *onto* if for all `m' : Msg'`, there is some `m : Msg` such that `msg_transform m = m'`. Functions which are *onto* cover all of the codomain. In terms of a contract morphism, this indicates that if there is a quotient morphism `f : C -> C'`, all possible functionality of `C'` can be modeled by `C`. In contrast to embeddings, `C'` is likely to be simpler than `C`. As we mentioned before, a good way to think about this is as a "compression" of sorts, where `C'` represents a simpler contract than `C` for which all functionality must hold. As we will show in the following example, this can help us precisely and rigorously identify the limits of a contract's upgradeability features. ### Example: Simple version of "Upgradeability" The primary example we will use is of a contract with internal workings that can be modified by an entrypoint. Example of contracts that do this are often classified as DAOs---they include MakerDAO and other contracts which manage algorithmic stablecoins, each of which updates interest rates internally, and a DAO contract which allows users to change internal functionality explicitly through the entrypoints. Contracts whose internal functionality is modifiable range in the degree to which they can change (or upgrade) over time. Some contracts keep an anonymous function in storage, which can be called by an entrypoint, and which can be updated by another entrypoint, without restriction on what the anonymous function can do. While such contracts cannot modify the essential structure of their storage, the internal functionality can change arbitrarily within the type `Msg * State -> State * (list Actions)`. Consider our example contract which emits no actions: ``` Record State := { anon_fun : nat -> nat ; stor_nat : nat ; stor_string : string ; } Inductive Msg := | update_nat (n : nat) | update_fun (f : nat -> nat) | update_str (s : string). Definition entrypoint (st : State) (msg : Msg) : option State := match msg with | update_nat n => {| st with stor_nat := st.(anon_fun) n |} | update_fun f => {| st with anon_fun := f |} | update_str s => {| st with stor_string := s |} end. ``` This is a simple contract for which the internal functionality can be altered by calling `update_fun`; this then changes how the `update_nat` entrypoint works, so any proof of the behavior of the `update_nat` entrypoint must necessarily involve the `update_fun` entrypoint. We could call this contract "upgradeable" in a simple sense, in that one can upgrade how the `upate_nat` entrypoint functions. On the other hand, it is fairly straightforward to observe that, no matter how a user calls `update_fun`, the contract's functionality that manages the string in storage will never change. This is a bound on upgradeability and mutability of the contract, something which we might want to express more precisely in a higher-stakes and more complicated scenario. To show that the contract's upgrades will never affect the functionality managing the string in storage, we produce a quotient morphism from `C` onto the following contract `C'`: ``` Record State' := { stor_string' : string ; } Inductive Msg' := | null (u : unit) | update_str (s : string) Definition entrypoint' (st : State') (msg : Msg') : option State' := match msg with | update_str s => {| st with stor_string' := s |} | null _ => st end. ``` Note the similarities between `C'` and the null contract which we introduced earlier. The quotient morphism is given as follows: ``` msg_transform (m : Msg) : Msg' := match m with | update_nat _ => null tt | update_fun _ => null tt | update_str s => update_str' s end. ``` ``` state_transform (st : State) : State' := {| stor_string' := st.(stor_string ; |} ``` ``` action_transform := id ``` That our morphism is a quotient is clear: all the string-updating functionality of `C'` can be modeled by the analogous action in `C`. The difference between the contracts is that any action in `C` which updates `anon_fun` or `stor_nat` corresponds to the null action in `C'`. Importantly, this means that (a) the only entrypoint that can update the string is the `update_str` entrypont, and (b) despite the fact that `update_fun` changes the internal functionality of the smart contract, that "upgradeability" is fully limited to `stor_nat`, and not to `stor_string`: Every action that relates to the `update_fun` and `update_nat` functions are null actions from the point of view of the string in storage. Thus we have identified **bounds on the upgradeability** features of `C` by way of its quotient to `C'`. That is the core feature of quotient morphisms: they help us to specify very clearly what the bounds of the behavior and mutability of a contract can be, including what kinds of things one entrypoint *cannot* do within the contract. This is in contrast to contract embeddings, which are good for proving assertions about what *must* exist within a contract. While for this small example, that insight is straightforward to see, in larger and more complicated examples it would not necessarily be so. In practice, one can formulate the desired quotient contract `C'` by starting with `C` and nullifying the functionality of certain entrypoints or functions: for entrypoints this corresponds to routing them through a new `null` entrypoint, and for functions this corresponds to changing them to the identity. ### Specifying Bounds of Fully Upgradeable Contracts Bounds of upgradeability is one good example in which a rigorous specification is difficult to express using prose and propositions, but it is very easy to express by way of a contract morphism. In the example we gave above, we might do so by proving the proposition that any time we call `update_fun` or `update_nat`, the string stays the same before and after storage. It is simple in this case, just as it is simple to see. However, there are more complicated examples for which specifying the bounds of upgradeability with prose would be extremely difficult (perhaps even impossible) to do precisely, in contrast to simply building a skeleton contract which expresses this for us. Our primary example for this is a contract similar to that specified by the [Diamond upgradeability standard](https://eips.ethereum.org/EIPS/eip-2535) on Ethereum. This standard indicates how one can add, remove, and update entrypoints of an Ethereum smart contract. This in many ways could be considered a "fully upgradeable" smart contract. Aside from including mechanisms to change, add, and remove entrypoint functions of the contract, there is a basic and nontrivial structure to the smart contract which is invariant over changes to the entrypoints. This invariant structure is critical to the correct specification and verification of the contract, and so understanding the correct functionality of an upgradeable Diamond contract includes proving that the entrypoint upgrades do not alter this fundamental structure. As shown before, this can be done via quotients. [Full details on this particular example are out of scope of this document but will be included in a later writeup.] ## Conclusion We have identified three types of structural relationships between smart contracts, namely isomoprhisms, embeddings, and quotients. Each of them has their own set of properties which make them expressive in distinct instances of smart contract specification. The target scenario for which these morphisms might be useful is that in which a particular property to be specified is difficult to capture by using words. The examples we gave are: maintaining functional equivalence while optimizing (isomorphism), asserting backwards compatibility with upgraded contracts (embedding), and describing and asserting the bounds of a contract's mutability or upgradeability over time (quotient). Each of these examples are both highly relevant to modern contract specification and difficult to treat rigorously with current methods of contract specification and verification.