The bug was discovered by security researcher lucash-dev.
Aztec Connect is a privacy-focused zkRollup system optimized for DeFi use-cases. It batches private transactions within its system using zero knowledge proofs, enabling users to send funds to each other and to Layer 1 smart contracts privately. One of its novel features was DeFi interactions, which allowed users to exchange tokens or invest in protocols.
As an example of how Aztec Connect works, let's say several users want to invest in the same L1 DeFi protocol: their deposits are batched together inside the rollup, a single L1 transaction performs the DeFi interaction for the whole batch, and the resulting output tokens are then split among the depositors in proportion to what each of them put in.
The bug uncovered was in the claim proof and could only be exploited by a sequencer (a role that anyone could take on, for example, through the escape hatch).
The core vulnerability was in the calculation of the final output of a given user's DeFi interaction (i.e. how many tokens they received after a DeFi interaction), which could be spoofed by a malicious sequencer.
If this were a program written in Rust or C, then a simple check that the output had been computed correctly over the integers could be run.
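In ordinary Rust, for instance, the whole computation and check fit in a few lines. The following is a minimal sketch; the names user_input, total_input and total_output are illustrative rather than Aztec's actual identifiers:

```rust
// A user's share of a DeFi interaction's output, over ordinary integers:
// user_output = floor(user_input * total_output / total_input).
fn user_claim_output(user_input: u128, total_input: u128, total_output: u128) -> Option<u128> {
    user_input
        .checked_mul(total_output)? // refuse to overflow
        .checked_div(total_input)   // refuse to divide by zero
}

fn main() {
    // A user who contributed 3 of the 10 input tokens gets 3/10 of the 50 output tokens.
    assert_eq!(user_claim_output(3, 10, 50), Some(15));
}
```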
Unfortunately, zero knowledge circuits don't operate over integers but over finite fields.
Because the fundamental units in zero knowledge circuits are finite field elements, sophisticated tricks have to be implemented to run simple integer operations. If you are familiar with finite field arithmetic and ZK circuits, you can skip the next section and go right to the explanation of the bug.
A mini-tutorial on finite fields modulo a prime number:
ZK circuits consist of connected gates using finite field arithmetic. Aztec Connect used TurboPlonk, which you can think of as an extended instruction set for Plonk. Each gate has the form:

q_m · w_1 · w_2 + q_1 · w_1 + q_2 · w_2 + q_3 · w_3 + q_c = 0
The values q_m, q_1, q_2, q_3 and q_c are selectors. They are chosen by the circuit writer and define the logic of the circuit. w_1, w_2 and w_3 are witness values. You can think of them as the intermediate states of a program being executed.
The circuit consists of many gates like these, where the individual witness wires are connected to each other in various ways. Some are also connected to the inputs and outputs of the circuit.
For example, if we wanted to compute x³ inside a circuit, we could produce the following gates:

Gate 1: w_1 · w_2 - w_3 = 0
Gate 2: w_1 · w_2 - w_3 = 0
We connect w_1 and w_2 at the first gate and w_2 at the second gate together so that they all represent the same value x, and we connect w_3 at the first gate to w_1 at the second gate. Then w_3 at the second gate becomes x³. This is easy to check:

w_3 (second gate) = w_1 (second gate) · w_2 (second gate) = w_3 (first gate) · x = (x · x) · x = x³
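To make the mechanics concrete, here is a toy Rust sketch (not TurboPlonk code, and over a tiny made-up modulus) that evaluates the generic gate equation and verifies the two-gate x³ example above:

```rust
// Toy prime modulus standing in for the circuit's much larger field modulus.
const P: u64 = 101;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }
fn neg(a: u64) -> u64 { (P - a % P) % P }

// True if q_m*w1*w2 + q_1*w1 + q_2*w2 + q_3*w3 + q_c == 0 in the field.
fn gate_holds(q_m: u64, q_1: u64, q_2: u64, q_3: u64, q_c: u64, w: [u64; 3]) -> bool {
    let mut acc = mul(q_m, mul(w[0], w[1]));
    acc = add(acc, mul(q_1, w[0]));
    acc = add(acc, mul(q_2, w[1]));
    acc = add(acc, mul(q_3, w[2]));
    add(acc, q_c) == 0
}

fn main() {
    let x = 7u64;
    // Gate 1: w1*w2 - w3 = 0 with w1 = w2 = x, so w3 = x^2.
    let x2 = mul(x, x);
    assert!(gate_holds(1, 0, 0, neg(1), 0, [x, x, x2]));
    // Gate 2: same selectors, with w1 copied from gate 1's w3 and w2 = x, so w3 = x^3.
    let x3 = mul(x2, x);
    assert!(gate_holds(1, 0, 0, neg(1), 0, [x2, x, x3]));
    println!("x^3 mod {} = {}", P, x3);
}
```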
So this set of gates with the proper wiring computes exactly what we wanted. There are some standard gates that are used quite often:
w_1 · w_2 - w_1 = 0, where w_1 = w_2 through wiring.
Boolean gate, ensuring w_1 is zero or 1.
w_1 + w_2 - w_3 = 0. Standard addition gate.
w_1 · w_2 - w_3 = 0. Standard multiplication gate.
w_1 · w_2 - 1 = 0. Ensures that w_1 is non-zero.
w_1 · w_2 - 1 = 0 at one gate, along with w_1 · w_2 - w_3 = 0 at a second gate, where w_2 at the first gate and w_2 at the second gate are connected to each other.
Represents that w_3 = w_1 (second gate) / w_1 (first gate), i.e. division by a non-zero value.
The last gadget is actually a nice example of the core concept of circuits: you don't have to prove how a value was computed, you just need to provide it as a witness and constrain that it is correct.
This is the case with division in finite fields. Finding the inverse is a heavy operation, but showing that you've found the inverse is trivial: just show that the product of the two values is 1.
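A toy Rust sketch of the same idea (again over a small made-up prime rather than the real field): the expensive inversion happens outside the "circuit", and the circuit-level check is just two multiplications:

```rust
const P: u64 = 101; // toy prime modulus

fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

// The "heavy" part, done by the prover outside the circuit:
// a^-1 mod P via exponentiation (Fermat's little theorem, P prime).
fn invert(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

fn main() {
    let (a, b) = (37u64, 5u64);
    let b_inv = invert(b); // witness supplied by the prover
    let c = mul(a, b_inv); // claimed result of a / b
    // The "cheap" part, enforced inside the circuit: two multiplicative constraints.
    assert_eq!(mul(b, b_inv), 1); // b is non-zero and b_inv really is its inverse
    assert_eq!(mul(b, c), a);     // therefore c = a / b in the field
    println!("a / b mod {} = {}", P, c);
}
```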
So, going back to our problem: we needed to show that user_output = (user_input · total_output) / total_input over the integers (the user's share of the interaction's output is proportional to their share of its input), but we have to do this in a circuit.
Well, one of the problems is that user_output will be a floored value in most cases, since we are using integer division.
So we need to introduce a division remainder term:

user_input · total_output = user_output · total_input + remainder
This enables us to satisfy the constraint, but, unfortunately, now we can change user_output to any value while user_input, total_input and total_output are fixed, simply by varying the remainder. We need to ensure that this relation is correct in integers, not just in field elements. For that purpose we split each of the values into 68-bit limbs and check the correctness of the whole relation the same way one would check the correctness of an equation in elementary school:
Say, for instance, that we want to check a · b = c for two-digit decimal numbers a = 10·a_1 + a_0 and b = 10·b_1 + b_0, with the claimed result written digit by digit as c = 1000·c_3 + 100·c_2 + 10·c_1 + c_0.

First we check the lower digits:

a_0 · b_0 = c_0 + 10 · carry_0

If there is a carry, we add it to the tens:

a_0 · b_1 + a_1 · b_0 + carry_0 = c_1 + 10 · carry_1

And check the hundreds:

a_1 · b_1 + carry_1 = c_2 + 10 · c_3
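As a concrete, made-up instance, take 47 · 86 = 4042, so a_1 a_0 = 47, b_1 b_0 = 86 and the claimed digits are c_3, c_2, c_1, c_0 = 4, 0, 4, 2. The three checks become:

\begin{aligned}
a_0 \cdot b_0 &= 7 \cdot 6 = 42 = 2 + 10 \cdot 4 && (c_0 = 2,\ \mathrm{carry}_0 = 4)\\
a_0 \cdot b_1 + a_1 \cdot b_0 + \mathrm{carry}_0 &= 56 + 24 + 4 = 84 = 4 + 10 \cdot 8 && (c_1 = 4,\ \mathrm{carry}_1 = 8)\\
a_1 \cdot b_1 + \mathrm{carry}_1 &= 32 + 8 = 40 = 0 + 10 \cdot 4 && (c_2 = 0,\ c_3 = 4)
\end{aligned}

Every step only ever multiplies single digits, which is exactly why the same trick lets a circuit verify a very large product without ever forming it in one field element.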
When we do this computation, we only ever have to hold the products of individual digits, so nothing overflows. The way we implemented integer checks for the claim ratio equation is exactly the same, but the limbs (digits) are in base 2^68 instead of base 10, and each number consists of 4 such limbs.
So all we did was split each value into 4 limbs like this:

a = a_0 + 2^68 · a_1 + 2^136 · a_2 + 2^204 · a_3
Each of the limbs was then range constrained to 68 bits to ensure correct decomposition (range constraints can be easily implemented by combining the witnesses from boolean gates).
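The following hypothetical Rust sketch shows the shape of that check. It is ordinary imperative code rather than circuit constraints, and it uses 32-bit limbs so the partial products fit in native integers, whereas the real circuit uses 68-bit limbs of much larger values; the variable names are illustrative:

```rust
// Check user_input * total_output == user_output * total_input + remainder
// over the integers, one limb "column" at a time, never forming the full products.
const LIMB_BITS: u32 = 32;                    // 68 in the real circuit
const LIMB_MASK: u128 = (1 << LIMB_BITS) - 1;

type Limbs = [u128; 4]; // least-significant limb first

// Decompose a value into four base-2^LIMB_BITS limbs.
fn to_limbs(mut x: u128) -> Limbs {
    let mut out = [0u128; 4];
    for limb in out.iter_mut() {
        *limb = x & LIMB_MASK;
        x >>= LIMB_BITS;
    }
    out
}

// The range constraint: every limb must genuinely fit in LIMB_BITS bits.
fn range_check(x: &Limbs) -> bool {
    x.iter().all(|&l| l <= LIMB_MASK)
}

// Column k of the schoolbook product a*b: the sum of limb products a_i * b_j with i + j = k.
fn column(a: &Limbs, b: &Limbs, k: usize) -> u128 {
    (0..4).filter(|&i| k >= i && k - i < 4).map(|i| a[i] * b[k - i]).sum()
}

// Verify a*b == c*d + r over the integers, digit by digit with carries.
fn product_relation_holds(a: &Limbs, b: &Limbs, c: &Limbs, d: &Limbs, r: &Limbs) -> bool {
    if !(range_check(a) && range_check(b) && range_check(c) && range_check(d) && range_check(r)) {
        return false;
    }
    let (mut carry_l, mut carry_r) = (0u128, 0u128);
    for k in 0..8 {
        let lhs = carry_l + column(a, b, k);
        let rhs = carry_r + column(c, d, k) + if k < 4 { r[k] } else { 0 };
        if (lhs & LIMB_MASK) != (rhs & LIMB_MASK) {
            return false; // the two sides disagree at this "digit"
        }
        carry_l = lhs >> LIMB_BITS;
        carry_r = rhs >> LIMB_BITS;
    }
    carry_l == carry_r
}

fn main() {
    let (user_input, total_input, total_output) = (3u128 << 40, 10u128 << 40, (50u128 << 40) + 7);
    let user_output = user_input * total_output / total_input; // floored division
    let remainder = user_input * total_output - user_output * total_input;
    assert!(product_relation_holds(
        &to_limbs(user_input), &to_limbs(total_output),
        &to_limbs(user_output), &to_limbs(total_input),
        &to_limbs(remainder),
    ));
    println!("relation holds");
}
```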
The vulnerability stemmed from two core issues:
If you are range constraining a value that needs to be in the range 0 to p - 1 (where p is the circuit's field modulus) via a decomposition into four 68-bit limbs, you lose the 1-to-1 correspondence that you would have had if the range constraint had been tight: 4 · 68 = 272 bits is more than the modulus needs, so there are now multiple possible limb decompositions (roughly 2^272 / p of them) of each of the original values. More concretely, since all the arithmetic is happening modulo p, you can add this modulus to a decomposition, and when the value is reconstructed that extra p is going to disappear. So instead of the original claim ratio equation in integers we can update it to:

(user_input + k_1 · p) · (total_output + k_2 · p) = (user_output + k_3 · p) · (total_input + k_4 · p) + remainder
For simplicity, let's say k_2 = k_3 = k_4 = 0 and k_1 = 1, i.e. we only smuggle a single extra multiple of the modulus into the decomposition of user_input. The resulting equation is:

(user_input + p) · total_output = user_output · total_input + remainder

Then, we can set:

remainder = (user_input + p) · total_output - user_output · total_input
Now we can vary user_output and k_1 as long as user_output · total_input does not exceed (user_input + k_1 · p) · total_output and the resulting remainder still fits into its four-limb decomposition, and we'll always be able to compute an appropriate remainder, since it is not constrained. Controlling our own user input to the interaction therefore gives us a mechanism to steal TVL.
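Here is a toy demonstration of the core problem (tiny made-up modulus and limb base, illustrative values): two different limb decompositions reconstruct to the same field element even though, as integers, they differ by exactly p:

```rust
const P: u64 = 97;  // toy stand-in for the circuit's field modulus
const B: u64 = 10;  // toy limb base (2^68 in the real circuit)

// Reconstruct an integer from base-B limbs, least-significant first.
fn from_limbs(limbs: &[u64]) -> u64 {
    limbs.iter().rev().fold(0, |acc, &l| acc * B + l)
}

fn main() {
    let honest = [2, 4, 0];    // the value 42
    let malicious = [9, 3, 1]; // the value 42 + 97 = 139
    // In the field the two decompositions are indistinguishable...
    assert_eq!(from_limbs(&honest) % P, from_limbs(&malicious) % P);
    // ...but as integers they differ by exactly p, which shifts the integer-level
    // ratio equation and, together with the unconstrained remainder, lets the
    // sequencer justify an inflated user_output.
    assert_eq!(from_limbs(&malicious) - from_limbs(&honest), P);
    println!("same field element, integers differ by p = {}", P);
}
```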
The first and most important fix is to stop multiples of p being added to various parts of the ratio equation. For that we simply started range constraining the individual limbs more tightly, so that there is a 1-to-1 map between the original and the split values.
Another issue that we noticed was that if we left the remainder unconstrained, but still constrained the limbs, the sequencer could possibly create a proof that would assign the depositor far fewer funds than they were supposed to receive.
If we look at the equation, it becomes obvious that we can increase the remainder in steps equal to total_input to decrease user_output.
So we also added a range constraint ensuring that remainder < total_input.
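In sketch form (ordinary Rust with a toy modulus and illustrative names, not the actual circuit code), the two fixes boil down to the following checks:

```rust
const P: u128 = 97; // toy stand-in for the field modulus

// Fix 1: the decomposition must be canonical, i.e. the integer reconstructed from
// the limbs must be smaller than the modulus, so each field element has exactly
// one valid representation and no extra p can be smuggled in.
fn decomposition_is_canonical(reconstructed: u128) -> bool {
    reconstructed < P
}

// Fix 2: the remainder must be smaller than the divisor, so the division really is
// floored division and user_output cannot be shrunk by inflating the remainder.
fn remainder_is_valid(remainder: u128, total_input: u128) -> bool {
    remainder < total_input
}

fn main() {
    assert!(decomposition_is_canonical(42));
    assert!(!decomposition_is_canonical(42 + P)); // the shifted decomposition is rejected
    assert!(remainder_is_valid(3, 10));
    assert!(!remainder_is_valid(13, 10)); // a remainder padded by total_input is rejected
    println!("constraints hold");
}
```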
Any project as large as Aztec Connect requires a great deal of attention to implement securely, and while the code was extensively audited, this bug got through.
So what can be done to minimize the probability of such bugs in future systems? For our part, Aztec Labs is implementing a few process improvements to prevent future bugs of this type:
Implementing a policy where all ad-hoc implementations of primitives are prohibited. If there is a need to do integer comparisons, then the code must go into the standard library, where it is heavily scrutinised.
Creating automated tooling to detect unconstrained values and help focus auditors' attention specifically on them and derived values.
Minimizing the impact of human error is absolutely crucial to building a robust system, so there is huge value in redundancy and automation.
We credit independent security researcher lucash-dev with discovering the vulnerability. His vigilance, as well as the hard work of the greater zero knowledge community, ensures the safety of open-source crypto systems.