When Invariants Aren't: DAI's Certora Surprise

# When Invariants Aren't: DAI's Certora Surprise Authors: [Kurt Barry](https://twitter.com/Kurt_M_Barry) (MakerDAO), [Uri Kirstein](https://twitter.com/KirsteinUri) (Certora) ## Introduction > "It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so." - *Mark Twain* Multi-Collateral DAI, the stablecoin managed by MakerDAO, is one of the largest cryptocurrencies. Launched in November 2019, it has, as of May 2022, a market cap of roughly 6.5B$ backed by about 10B$ of collateral. Ancient by crypto standards, if its smart contracts contained any bugs, they should have been found long ago. Surprisingly, it was discovered that a relationship between system quantities considered to be an invariant, known since 2018 and even thought to be mathematically [provable](https://hackmd.io/lWCjLs9NSiORaEzaWRJdsQ?view), did not hold. This anomaly was found using the Certora Prover. Fortunately, no funds were or are at risk. This finding is an important reminder that one must be very careful to distinguish between things that are likely true and those that are actually proven (keeping in mind that proofs can contain mistakes too!). ## The Certora Prover To keep DAI and its backing collateral safe, the MakerDAO Protocol Engineering Core Unit (PECU) uses a variety of processes and technologies. Code changes undergo extensive internal review and must pass unit and integration tests. Mistakes are searched for via red teaming, fuzzing, and external audits. MakerDAO also has an Immunefi [bug bounty program](https://immunefi.com/bounty/makerdao/). Another essential piece of the PECU's security infrastructure is formal verification. Formal verification tries to prove that a program behaves according to some specification (generally defined separately). The system's code is transformed into a logical formula, and then mathematical methods are used to attempt to prove the properties described in the specification. The great advantage of Formal Verification is its exhaustiveness - all execution paths are checked, so we encounter rare edge cases. Certora offers an automatic formal verification tool called the Certora Prover. The MakerDAO PECU team has been writing specifications and verifying them with the Prover since June 2021. ## The Maker Protocol The Maker protocol consists of the smart contracts that create and sustain the DAI stablecoin. Some changes and updates to the contracts were needed in preparation for deployment on other chains (including layer 2 solutions like rollups). The Certora prover was used to perform formal verification on the modified contracts. In addition to function-level proofs similar to the original verification done with the K Framework, the ability of the Certora Prover to express contract-level properties (i.e. properties that hold for all functions of a contract) was leveraged. The most crucial such property is an invariant called the "Fundamental Equation of DAI". ## The Fundamental Equation of DAI DAI is backed by debt. Debt is either assigned to a Vault, meaning it is associated with a lien against some collateral asset, or it is "unbacked" (also: "bad"), meaning it is the protocol's (i.e., MKR holders') responsibility. These two sources of debt, when added, should equal the sum of all DAI balances. Mathematically, we can write: $$ \sum_u \text{dai}_u = \sum_{i,u}\text{tab}_{i,u} + \text{vice} $$ Some definitions are in order for the above equation to make sense: $\text{dai}_u$ refers to the DAI balance of a user $u$, $\text{tab}_{i,u}$ refers to the debt of the Vault belonging to user $u$ with collateral type $i$, and $\text{vice}$ is the unbacked debt. Sums can be assumed to run over all valid values, with the convention that values of $\text{dai}_u$ and $\text{tab}_{i,u}$ default to zero until modified. In the context of Ethereum, the notion of "user" coincides with "Ethereum address" and may be a smart contract or an externally-owned account. The source of truth for DAI's accounting resides in a contract called the Vat. The Fundamental Equation of DAI (FEoD) can thus be expressed in terms of the Vat's storage variables. Because it is infeasible to loop over all Vaults when collecting borrowing fees, the Vat doesn't precisely store all the values that most naturally express the Fundamental Equation. The simplest way to express the FEoD in terms of Vat storage variables is: $$ \text{Vat.debt} = \text{Vat.vice} + \sum_i \text{Vat.ilks}[i]\text{.Art} \cdot \text{Vat.ilks}[i]\text{.rate} $$ where the sum ranges over all valid collateral types. $\text{Vat.debt}$ is an _accumulator_ value that stores the sum over all DAI balances. $\text{Vat.vice}$ is just the total bad debt (it is also technically an accumulator since bad debt is always "assigned" to some address). $\text{Vat.ilks}[i]\text{.Art}$ is yet another accumulator that is the sum over "normalized" debt for a given collateral type ("ilk"). Normalized debt is a value that when multiplied the appropriate `rate` (which will be defined next) gives the current DAI debt. $\text{Vat.ilks}[i]\text{.rate}$ is a different kind of accumulator, a _rate_ accumulator. It tracks the total compounded fee accrued for a given collateral type since inception. For example, if a collateral has a constant stability fee of 5% annually and has existed for two years, its $\text{rate}$ in the Vat will be $1.05^2 = 1.1025$. If at that point someone drew 110.25 DAI from a Vault of that collateral type, the associated increase in $\text{Art}$ would be $110.25 / 1.1025 = 100$. More information on how fee calculations work in the Maker protocol can be found in the [documentation](https://docs.makerdao.com/smart-contract-modules/rates-module). Note that to truly prove the FEoD, the correctness of all accumulators should be established (i.e., they really equal the sums or products that they should). However, in the interest of brevity, here we'll just focus on the relationship between the accumulators. Now we have all the protocol-specific background that we need to understand the attempt to prove the FEoD using Certora's technology. ## Writing the Specification **Note:** A concise specification highlighting the bug is publicly available in a self-contained [repository](https://github.com/kmbarry1/fund-eq-of-dai-certora) with the code under test at the time of discovery. It contains instructions on how you can try it out yourself. Specifications for the Certora Prover are written in the Certora Verification Language, or CVL. In CVL, there is a construct called an _invariant_, which represents a property that the system should always have, no matter what changes it undergoes. Invariants are proven by [induction](https://en.wikipedia.org/wiki/Mathematical_induction). First, it is proven that an invariant holds right after the call to the constructor; this is the induction basis. Then, for each function in the contract, the invariant is assumed to hold before its invocation and proven to also hold afterward. Verifying each function is the inductive step. The invariant appears at the end of the [Vat.spec](https://github.com/kmbarry1/fund-eq-of-dai-certora/blob/master/Vat.spec) file in the condensed example: ``` invariant fundamental_equation_of_dai() debt() == vice() + sumOfVaultDebtGhost() filtered { f -> !f.isFallback } ``` The first line declares the invariant like a function - it has a name and arguments (none, in this case). The second line is the important part - the logical formula that describes the FEoD per the discussion above. The `debt()` and `vice()` expressions call Vat functions that read corresponding storage slots. Note that these are declared `envfree` (meaning they do not depend on call context like `msg.sender`) at the top of the file: ``` methods { debt() returns (uint256) envfree vice() returns (uint256) envfree } ``` The third line of the invariant tells the prover not to consider the Vat's fallback function, which always reverts and is thus irrelevant (if not done, a "rule sanity" warning would be issued by the prover). The `sumOfVaultDebtGhost()` in the second line of the invariant is a function not present on the contract. It is computed as a _ghost_. Ghosts model contract state that isn't directly available from a function call. No contract function returns the sum over all Vault debt, so we must use a ghost. More detailed information on ghosts can be found in the [Certora documentation](https://docs.certora.com/en/latest/docs/ref-manual/cvl/ghosts.html). Although only one ghost is explicitly used in the invariant, three are defined: ``` ghost sumOfVaultDebtGhost() returns uint256 { init_state axiom sumOfVaultDebtGhost() == 0; } ghost mapping(bytes32 => uint256) rateGhost { init_state axiom forall bytes32 ilk. rateGhost[ilk] == 0; } ghost mapping(bytes32 => uint256) ArtGhost { init_state axiom forall bytes32 ilk. ArtGhost[ilk] == 0; } ``` The other two (`rateGhost` and `ArtGhost`) are needed to construct the debt sum ghost. They represent, respectively, the cumulative compound interest and the total "normalized debt" for a given collateral type. Note that they are _mappings_ from a collateral identifier to its respective quantity, i.e. each ghost tracks a different value for each collateral type. The `init_state axiom ...` expressions define the initial values the ghosts should have right after calling the constructor and are set to zero to match Ethereum's behavior. Recall that the product of the `rate` and `Art` for an `ilk` gives the total DAI debt backed by loans against that `ilk`. Ghosts are updated using _hooks_, which modify ghost variables based on particular program behaviors. There is again [more detailed information](https://docs.certora.com/en/latest/docs/confluence/anatomy/hooks.html#) in the documentation for the curious. Vat.spec uses two types of hooks -`Sload` and `Sstore`. Consider the two `Sload` hooks first: ``` hook Sload uint256 v currentContract.ilks[KEY bytes32 ilk].(offset 0) STORAGE { require ArtGhost[ilk] == v; } ``` ``` hook Sload uint256 v currentContract.ilks[KEY bytes32 ilk].(offset 32) STORAGE { require rateGhost[ilk] == v; } ``` The syntax may look complex, but conceptually these hooks are very simple: any time a `rate` or `Art` value is read from contract storage, the corresponding ghost is forced to be equal to the read value. This ensures the ghost values are consistent with whatever is in storage whenever they are used. The `Sstore` hooks take effect when either a `rate` or `Art` is written to storage, as opposed to read. Both the old and new values stored are available within the hook body. These values are used to update the `ArtGhost[ilk]` or `rateGhost[ilk]` value similarly to the `Sload` hooks, but also to update the `sumOfVaulDebtGhost` value according to the relationship defined earlier between `rate`, `Art`, and the DAI debt of an `ilk`. ``` hook Sstore currentContract.ilks[KEY bytes32 ilk].(offset 0) uint256 newArt (uint256 oldArt) STORAGE { havoc sumOfVaultDebtGhost assuming sumOfVaultDebtGhost@new() == sumOfVaultDebtGhost@old() + (newArt * rateGhost[ilk]) - (oldArt * rateGhost[ilk]); ArtGhost[ilk] = newArt; } ``` ``` hook Sstore currentContract.ilks[KEY bytes32 ilk].(offset 32) uint256 newRate (uint256 oldRate) STORAGE { havoc sumOfVaultDebtGhost assuming sumOfVaultDebtGhost@new() == sumOfVaultDebtGhost@old() + (ArtGhost[ilk] * newRate) - (ArtGhost[ilk] * oldRate); rateGhost[ilk] = newRate; } ``` These `Sstore` hooks make `ArtGhost` and `rateGhost` necessary. It is not possible to call contract functions within hooks, as they are applied during program execution. Functions may not behave as expected since, e.g., the contract's storage might be in an unexpected state. As mentioned, this specification is not _sufficient_ to fully establish the FEoD. A complete proof would also require establishing the correctness of the `debt` and `Art` accumulators, among other things. However, any violation of this invariant **is** sufficient to demonstrate that the FEoD does not hold. Said another way, if the Vat contract is correct according to our expectations, this invariant should hold. Let's see what happens when we try to prove the FEoD! ## It Doesn't Hold!? The first thing we notice upon running the prover is the very end of the command line output: ![](https://i.imgur.com/M8mmzST.png) We can see immediately that there is a violation of the invariant, and it involves the `Vat.init` function. However, it's premature to conclude that the FEoD doesn't hold. The Certora Prover uses _overapproximation_ in its proving techniques, which means it may sometimes report violations which are false positives. However, barring bugs in the Prover itself, false negatives should never occur. Thus we can be confident of successful proofs, but must always check violations. Fortunately, the Prover always produces a concrete counterexample - a combination of inputs, storage, and call context that breaches the property - for any violation it finds. This makes it easy to check whether a violation is legitimate. We can inspect the counterexample by loading up the "Verification report" link in a web browser. We get a view like this: ![](https://i.imgur.com/UluU5eH.png) The next step is to drill down into the "fundamental_equation_of_dai" item under "Results". Doing so until clicking on "init(bytes32)" shows something like: ![](https://i.imgur.com/Sm79PHr.png) The "Call Trace" subwindow contains the information needed to extract the counterexample. Expanding "assume invariant in pre-state" shows that we're starting off with zero values for `debt()`, `vice()`, and `sumOfVaultDebtGhost()`: ![](https://i.imgur.com/OqGZ8cp.png) Looking at the post-state reveals that somehow `sumOfVaultDebtGhost()` became non-zero after the execution of `init`, which of course is a violation. Note that the exact numbers in the counterexample might differ in your own run of the tool. ![](https://i.imgur.com/hOyFFUX.png) But why? We can see the `Sstore` hook being applied during the function's execution: ![](https://i.imgur.com/idN71cZ.png) All the information needed is there, although some work is necessary to realize what happened. First, we need to understand how `init` works: for a new collateral type, `init` sets its `rate` to the fixed-point equivalent of one (in this case, 10^27) so that future compounding calculations work correctly. ``` function init(bytes32 ilk) external auth { require(ilks[ilk].rate == 0, "Vat/ilk-already-init"); ilks[ilk].rate = 10 ** 27; emit Init(ilk); } ``` This update to `rate` triggered the corresponding `Sstore` hook. Examining the equality at the end of the hook, we can deduce that `sumOfVaultDebtGhost()` becomes non-zero because `ArtGhost[ilk] * newRate` is non-zero. Therefore `ArtGhost[ilk]`, and thus the `Art` storage value for `ilk`, must be non-zero. In summary, the FEoD does not hold if the function`Vat.init(ilk)` is called when `Vat.ils[ilk].Rate` is zero, yet `Vat.ilks[ilk].Art` is non-zero. Now that the counterexample is clear, we still need to answer the question "Is this possible in practice?". The Certora Prover only assumes that the initial storage state obeys the invariant in question - such states are not guaranteed to be _reachable_ via execution of the smart contract's functions. One could attempt to prove that the state is unreachable with the Certora Prover (and this attempt would fail). However, we will take a shortcut by describing a way to achieve the counterexample's preconditions, which was quickly recognized based on the PECU's familiarity with the Vat during the actual discovery. The `Vat.fold` function can set a `rate` back to zero after the associated `Art` becomes non-zero, allowing `init` to be called on an `ilk` that has already been activated. An example test case can be found in the [bug](https://github.com/makerdao/xdomain-dss/issues/2) filed after this counter example discovery. Note that `fold` itself respects the FEoD, so it is only the second `init` call that produces a violation. We have high confidence that this is the only way to violate the FEoD because (1) `init` requires `rate` to be zero, (2) debt can only be generated if `rate` is non-zero, and (3) the only Vat function that can decrease a `rate` is `fold`. This was a surprising result. The FEoD is the most famous Multi-Collateral DAI "invariant", known since at least 2018 and even believed to be mathematically provable. Happily, this edge case poses no threat to the system because the `fold` and `init` functions are limited to governance and trusted contracts. If the violating sequence were carried out, there would be more Vault debt than outstanding DAI, potentially pushing the price of DAI above peg and making it difficult for Vaults to repay their debt. Governance can already inflict much worse damage in other ways and is therefore limited by having its actions delayed. This allows users to exit the system prior to a malicious action taking effect. An honest minority of MKR holders can even trigger a shutdown process that unwinds the system gracefully. Future deployments on other chains will contain a fix making the FEoD a true invariant. ## Conclusions This finding supports our use of Formal Verification as part of a holistic approach to smart contract safety. Discovering that a long-held assumption was false underscores the importance of remaining humble and open-minded when developing smart contract systems. Beliefs about safety or correctness should be held only in proportion to the weight of evidence, never as absolutes. Bugs and imperfections can hide in plain sight for a long time. Effective tools are a great help, but only if applied correctly and consistently! If you have any questions, please reach out to either the MakerDAO Protocol Engineering Core Unit (find us in the `protocol-engineering-public` channel in the MakerDAO [Discord](https://discord.gg/5mr8FzWg)) or the Certora team ([Discord](https://t.co/2liLHaKHIK) or [Twitter](https://twitter.com/CertoraInc)).