Aztec Connect Claim Proof Bug

# Aztec Connect Claim Proof Bug ## TLDR + On the 12th of September we received a submission to our bug bounty program on Immunefi claiming Aztec Connect had a critical bug in one of its core ZK (zero knowledge) circuits. + On the same day we assessed the validity of the submission and took preventative measures (disabling public access to the workflow that could exploit this issue) + On the 3rd of October we patched the target system + On the 10th of October we paid the bounty of $450,000 USD to the white hat The bug was discovered by security researcher [lucash-dev](https://github.com/lucash-dev) ## Details Aztec Connect is a privacy-focused zkRollup system optimized for DeFi use-cases. It batches private transactions within its system using zero knowledge proofs, enabling users to send funds between each and to Layer 1 smart contracts privately. One of its novel features was DeFi interactions allowing users to exchange tokens or invest in protocols. As an example of how Aztec Connect works, let's say several users want to invest in the same L1 DeFi protocol: 1. Each of the users submits a DeFi interaction deposit proof to the mempool. The proof keeps the identity of the user private, but the target protocol and investment amount are known. 2. The sequencer groups intended interactions with the same protocol together. Once there are enough intents to split L1 costs efficiently, the system computes and submits a rollup proof. 3. Having received the rollup proof, the smart contract acts on behalf of the users and performs the actual investment or exchange, receiving tokens in return. 4. On the next rollup the sequencer completes special claim proofs, splitting the newly received tokens between users. The bug uncovered was in the claim proof and could only be exploited by a sequencer (for example, through the escape hatch). The core vulnerability was the calculation of the final output of a given user's DeFi interaction(i.e. how many tokens they received after a DeFi interaction), which could be spoofed by a malicious sequencer. If this were a program written in rust or C, then a simple check that the $\mathit{result} = \mathit{total\_output}\frac{\mathit{user\_input}}{\mathit{total\_input}}$ in integers could be run. Unfortunately zero knowledge circuits don't operate over integers but finite fields. Because the fundamental units in zero knowledge circuits are finite fields, sophisticated tricks have to be implemented to run simple integer operations. If you are familiar with finite field arithmetic and ZK Circuits you can skip the next session and go right to the explanation of the bug. ## ZK Circuit Background A mini-tutorial on finite fields modulo prime numbers: + The finite field has a modulus $p$ which in our case is a prime number (can only be divided by itself and 1). We can use 5, we can't use 6 + All the elements in the finite field are integers from 0 to $p-1$. For $\mathbb{F}_5$ the elements are $\{0,1,2,3,4\}$ + We can perform addition. If the sum of two elements is more or equal to the modulus, we subtract the modulus from the result until it fits $3+4= 2\ mod\ 5$ + Negating an element is just finding an element that when summed with the original one produces 0 $1 + 4 = 0\ mod\ 5$ + Multiplication is performed in the same way, just multiply and then remove $p$ until the result fits $3\cdot 3 = 4\ mod\ 5$ + The inverse is an element that becomes 1 when multiplied by the original $4\cdot 4 = 1\ mod\ 5$, 0 doesn't have an inverse, every other element has an inverse. + Subtraction is just adding a negated element and division is multiplication by the inverse. ZK circuits consist of connected gates using finite field arithmetic. Aztec Connect used TurboPlonk--which you can think about as an extended instruction set for Plonk. Each gate has the form $q_m\cdot w_l\cdot w_r + q_1\cdot w_l + q_2\cdot w_r + q_3\cdot w_o + q_c = 0\ mod\ p$ The $q_m$,$q_1$, $q_2$,$q_3$, $q_c$ are selectors. They are chosen by the circuit writer and define the logic of the circuit. $w_l$,$w_r$, $w_o$ are witness values. You can think about them as the intermediate states of a program being executed. The circuit consists of many gates like these, where individual $w$ are connected to each other in various ways, like in the image. Some are also connected to inputs and outputs of the circuit. ![circuit](https://hackmd.io/_uploads/ryiwZDEGT.png) For example, if we wanted to show that $y=4x^3 +2$ we could produce the following gates: 1. $w_l\cdot w_r - w_o=0\ mod\ p$ 2. $4\cdot w_l\cdot w_r - w_o + 2 = 0 \ mod\ p$ We connect $w_l$ and $w_r$ at the first gate and $w_l$ at the second gate together so that they represent the same value $x$ and connect the $w_o$ at the first gate to $w_r$ at the second gate. Then $w_o$ at the second gate becomes $y$. This is easy to check: 1. $x\cdot x - x^2=0 \ mod\ p$ 2. $4\cdot x\cdot x^2 - y +2 = 0\ mod\ p$ So this set of gates with the proper wiring computes exactly what we wanted. There are some standard gates that are used quite often : 1. $w_l\cdot w_r - w_r=0\ mod\ p$, where $w_l=w_r$ through wiring. Boolean gate $x\cdot (x-1)=0\ mod\ p$ ensuring $x$ is zero or 1. 2. $w_l+w_r-w_o = 0\ mod\ p$ Standard addition gate. 3. $w_l\cdot w_r - w_o=0\ mod\ p$ Standard multiplication gate. 4. $w_l\cdot w_r - 1=0 \ mod\ p$ Ensures that $w_l$ is non-zero. 5. $w_{l1}\cdot w_{r1} - 1 =0$ along with $w_{l2}\cdot w_{r2} - w_{o2} = 0$ where $w_r$ are connected to each over Represents that $w_{o2}=\frac{w_{l2}}{w_{l1}}$ The last gate is actually a nice example of the core concept of circuits: you don't have to prove the computation, you just need to provide the witness for the correctness. This is the case with division in finite fields. Finding the inverse is a heavy operation, but showing that you've found the inverse is trivial: just show that the product of the two values is 1. ## Integer arithmetic over fields So, going back to our problem: we needed to show that $\mathit{user\_output}=\mathit{total\_output}\cdot\frac{\mathit{user\_input}}{\mathit{total\_input}}$ in integers, but we have to do this in a circuit. Well, one of the problems is that the $user\_output$ will be a floored value in most cases, since we are using integer division. $user\_output\cdot total\_input \le total\_output \cdot user\_input$ So we need to introduce a division remainder term: $$user\_output\cdot total\_input + remainder= total\_output\cdot user\_input$$ This enables us to satisfy the constraint, but, unfortunately, now we can change the value of $user\_output$ to any value while $total\_input$, $total\_output$ and $user\_input$ are set by varying the remainder. We need to ensure that this relation is correct in integers, not in field elements. For that purpose we split each of the values into 68-bit limbs and check the correctness of the whole relation the same way one would check the correctness of an equation in elementary school: $12 \cdot 12 = 11 \cdot 13 + 1$ First we check the lower digits: $2\cdot 2 = 1\cdot 3 + 1$ If there is a carry, we add it to the tens: $2 \cdot 1 + 1\cdot 2 = 1\cdot 3 +1\cdot 1$ And check the hundreds: $1\cdot 1= 1\cdot 1$ When we do this computation, we only have to compute the products of individual digits in memory so that we don't overflow it. The way we implemented integer checks for the claim ratio equation is completely the same, but the limbs (digits) are in $[0, 2^{68}-1]$ instead of $[0,9]$. And each number consists of 4 such limbs. Here we have the piece of code performing this check: ```c++ // takes a [204-208]-bit limb and splits it into a low 136-bit limb and a high 72-bit limb const auto split_out_carry_term = [&composer, &shift_2](const field_ct& limb) { const uint256_t limb_val = limb.get_value(); const uint256_t lo_val = limb_val.slice(0, 68 * 2); const uint256_t hi_val = limb_val.slice(68 * 2, 256); const field_ct lo(witness_ct(&composer, lo_val)); const field_ct hi(witness_ct(&composer, hi_val)); lo.create_range_constraint(68 * 2); hi.create_range_constraint(72); // allow for 4 overflow bits limb.assert_equal(lo + (hi * shift_2)); return std::array<field_ct, 2>{ lo, hi }; }; // Use schoolbook multiplication algorithm to multiply 2 4-limbed values together, then convert result into 4 // 2-limb values (with limbs twice the size) that do not overlap const auto compute_product_limbs = [&split_out_carry_term, &shift_1](const std::array<field_ct, 4>& left, const std::array<field_ct, 4>& right, const std::array<field_ct, 4>& to_add, const bool use_residual = false) { // a = left[0] * right[0]; const field_ct b = left[0].madd(right[1], left[1] * right[0]); const field_ct c = left[0].madd(right[2], left[1].madd(right[1], left[2] * right[0])); const field_ct d = left[0].madd(right[3], left[1].madd(right[2], left[2].madd(right[1], left[3] * right[0]))); const field_ct e = left[1].madd(right[3], left[2].madd(right[2], left[3] * right[1])); const field_ct f = left[2].madd(right[3], left[3] * right[2]); // g = left[3] * right[3]; if (use_residual) { const auto t0 = split_out_carry_term(to_add[0] + left[0].madd(right[0], (b * shift_1) + to_add[1] * shift_1)); const auto r0 = t0[0]; const auto t1 = split_out_carry_term(t0[1].add_two(c + to_add[2], to_add[3] * shift_1 + d * shift_1)); const auto r1 = t1[0]; const auto t2 = split_out_carry_term(t1[1].add_two(e, f * shift_1)); const auto r2 = t2[0]; const auto r3 = left[3].madd(right[3], t2[1]); return std::array<field_ct, 4>{ r0, r1, r2, r3 }; } const auto t0 = split_out_carry_term(left[0].madd(right[0], (b * shift_1))); const auto r0 = t0[0]; const auto t1 = split_out_carry_term(t0[1].add_two(c, d * shift_1)); const auto r1 = t1[0]; const auto t2 = split_out_carry_term(t1[1].add_two(e, f * shift_1)); const auto r2 = t2[0]; const auto r3 = left[3].madd(right[3], t2[1]); return std::array<field_ct, 4>{ r0, r1, r2, r3 }; }; const auto lhs = compute_product_limbs(left_1, right_1, { 0, 0, 0, 0 }, false); const auto rhs = compute_product_limbs(left_2, right_2, residual_limbs, true); bool_ct balanced(&composer, true); for (size_t i = 0; i < 4; ++i) { balanced = balanced && lhs[i] == rhs[i]; } return balanced; ``` So all we did was split each value into 4 limbs like this: $value = value_0 + value_1\cdot 2^{68} + value_2\cdot 2^{136} + value_3\cdot 2^{204}\ mod\ p$ Each of the limbs was then range constrained to 68-bits to ensure correct decomposition (range constraints can be easily implemented by combining the witnesses from boolean gates). The vulnerability stemmed from two core issues: 1. The top limb was constrained to 68 bits 2. The remainder had no range constraints at all ```c++ // Split a field_t element into 4 68-bit limbs const auto split_into_limbs = [&composer, &shift_1, &shift_2, &shift_3](const field_ct& input) { const uint256_t value = input.get_value(); const uint256_t t0 = value.slice(0, 68); const uint256_t t1 = value.slice(68, 136); const uint256_t t2 = value.slice(136, 204); const uint256_t t3 = value.slice(204, 272); std::array<field_ct, 4> limbs{ witness_ct(&composer, t0), witness_ct(&composer, t1), witness_ct(&composer, t2), witness_ct(&composer, t3), }; field_ct limb_sum_1 = limbs[0].add_two(limbs[1] * shift_1, limbs[2] * shift_2); field_ct limb_sum_2 = input - (limbs[3] * shift_3); limb_sum_1.assert_equal(limb_sum_2); limbs[0].create_range_constraint(68); limbs[1].create_range_constraint(68); limbs[2].create_range_constraint(68); limbs[3].create_range_constraint(68); // The offending range constraint return limbs; }; ``` If you are range constraining a value that needs to be in the range $2^{252}$ to $2^{272}$ via decomposition, you lose the 1-to-1 correspondence that would've happened if the range constraint was $2^{252}$. There are now $2^{20}$ possible values of limbs of each of the original values. More concretely, since all the arithmetic is happening modulo $p$, you can add this modulus to the decomposition and when the value is reconstructed that extra $p$ is going to disappear, so instead of the original claim ratio equation in integers we can update it to: $$(user\_output +a\cdot p)\cdot (total\_input +b\cdot p) + remainder + c \cdot p = (total\_output + d \cdot p) \cdot (user\_input + e\cdot p)$$ For simplicity, let's say $user\_input=1$, then we can set $a=0$, $b=0$, $c=0$, $d=1$, $e=0$, and the resulting equation is: $user\_output\cdot total\_input + remainder = total\_output + p$ Then, we can set: $remainder = total\_output + p - user\_output \cdot total\_input$ Now we can vary $user\_output$ and as long as $user\_output\cdot total\_input > total\_output$ and $user\_output \cdot total\_input - total\_output \le p$ we'll be able to compute an appropriate remainder, since it is not constrained and controlling user input gives us a mechanism to steal TVL. # The Patch # The first and most important issue is to stop multiples of $p$ being added to various parts of the ratio equation. For that we simply started range constraining individual limbs so that there is a 1-to-1 map between original and split values. ```c++ // Split a field_t element into 4 68-bit limbs const auto split_into_limbs = [&composer, &shift_1, &shift_2, &shift_3](const field_ct& input, const size_t MAX_INPUT_BITS) { const uint256_t value = input.get_value(); constexpr size_t NUM_BITS_PER_LIMB = 68; ASSERT(MAX_INPUT_BITS <= MAX_NO_WRAP_INTEGER_BIT_LENGTH); ASSERT(MAX_INPUT_BITS > 0); const uint256_t t0 = value.slice(0, NUM_BITS_PER_LIMB); const uint256_t t1 = value.slice(NUM_BITS_PER_LIMB, 2 * NUM_BITS_PER_LIMB); const uint256_t t2 = value.slice(2 * NUM_BITS_PER_LIMB, 3 * NUM_BITS_PER_LIMB); const uint256_t t3 = value.slice(3 * NUM_BITS_PER_LIMB, 4 * NUM_BITS_PER_LIMB); std::array<field_ct, 4> limbs{ witness_ct(&composer, t0), witness_ct(&composer, t1), witness_ct(&composer, t2), witness_ct(&composer, t3), }; field_ct limb_sum_1 = limbs[0].add_two(limbs[1] * shift_1, limbs[2] * shift_2); field_ct limb_sum_2 = input - (limbs[3] * shift_3); limb_sum_1.assert_equal(limb_sum_2); // Since the modulus is a power of two minus one, we can simply range constrain each of the limbs size_t bits_left = MAX_INPUT_BITS; for (size_t i = 0; i < 4; i++) { // If we've run out of bits, enforce zero if (bits_left == 0) { limbs[i].assert_is_zero(); // If there are not enough bits for a full lmb, reduce constraint } else if (bits_left < NUM_BITS_PER_LIMB) { limbs[i].create_range_constraint(bits_left); bits_left = 0; } else { // Just a regular limb limbs[i].create_range_constraint(NUM_BITS_PER_LIMB); bits_left -= NUM_BITS_PER_LIMB; } } return limbs; }; ``` Another issue that we noticed was that if we left the remainder unconstrained, but still constrained the limbs, the sequencer could possibly create a proof that would assign the depositor much less funds than they were supposed to receive. If we look at the equation, it will become obvious that we can increase the remainder in steps equal to $total\_input$ to decrease $user\_output$. $$user\_output\cdot total\_input + remainder= total\_output\cdot user\_input$$ So we also added a range constraint ensuring $remainder \in [0,total\_input)$ ```c++ residual.create_range_constraint(notes::NOTE_VALUE_BIT_LENGTH, "ratio_check range constraint failure: residual"); // We need to assert that residual < a2 // i.e. a2 - residual > 0 => a2 - residual - 1 >= 0 (ratios.a2 - residual - 1) .normalize() .create_range_constraint(notes::NOTE_VALUE_BIT_LENGTH, "ratio_check range constraint failure: residual >= a2"); ``` ## Conclusion Any project as large as Aztec Connect requires a great deal of attention to implement securely and while the code was extensively audited this bug got through. So what can be done to minimize the probability of such bugs in future systems? On our part, Aztec Labs is implementing a few process improvements to prevent future bugs of this type: 1. Implementing a policy where all ad-hoc implementations of primitives are prohibited. If there is a need to do integer comparisons, then the code must go into the standard library, where it is heavily scrutinised. 2. Creating automated tooling to detect unconstrained values and help focus auditors' attention specifically on them and derived values. Minimizing the impact of human error is absolutely crucial to building a robust system, so there is huge value in redundancy and automation. We credit independent security researcher [lucash-dev](https://github.com/lucash-dev) with discovering the vulnerability. His vigilance--as well as the hard work of the greater zero knowledge community--ensure the safety of open-source crypto systems.