Try   HackMD

Aztec Connect Claim Proof Bug

TLDR

  • On the 12th of September we received a submission to our bug bounty program on Immunefi claiming Aztec Connect had a critical bug in one of its core ZK (zero knowledge) circuits.
  • On the same day we assessed the validity of the submission and took preventative measures (disabling public access to the workflow that could exploit this issue)
  • On the 3rd of October we patched the target system
  • On the 10th of October we paid the bounty of $450,000 USD to the white hat

The bug was discovered by security researcher lucash-dev

Details

Aztec Connect is a privacy-focused zkRollup system optimized for DeFi use-cases. It batches private transactions within its system using zero knowledge proofs, enabling users to send funds between each and to Layer 1 smart contracts privately. One of its novel features was DeFi interactions allowing users to exchange tokens or invest in protocols.

As an example of how Aztec Connect works, let's say several users want to invest in the same L1 DeFi protocol:

  1. Each of the users submits a DeFi interaction deposit proof to the mempool. The proof keeps the identity of the user private, but the target protocol and investment amount are known.
  2. The sequencer groups intended interactions with the same protocol together. Once there are enough intents to split L1 costs efficiently, the system computes and submits a rollup proof.
  3. Having received the rollup proof, the smart contract acts on behalf of the users and performs the actual investment or exchange, receiving tokens in return.
  4. On the next rollup the sequencer completes special claim proofs, splitting the newly received tokens between users.

The bug uncovered was in the claim proof and could only be exploited by a sequencer (for example, through the escape hatch).

The core vulnerability was the calculation of the final output of a given user's DeFi interaction(i.e. how many tokens they received after a DeFi interaction), which could be spoofed by a malicious sequencer.

If this were a program written in rust or C, then a simple check that the

result=total_outputuser_inputtotal_input in integers could be run.

Unfortunately zero knowledge circuits don't operate over integers but finite fields.

Because the fundamental units in zero knowledge circuits are finite fields, sophisticated tricks have to be implemented to run simple integer operations. If you are familiar with finite field arithmetic and ZK Circuits you can skip the next session and go right to the explanation of the bug.

ZK Circuit Background

A mini-tutorial on finite fields modulo prime numbers:

  • The finite field has a modulus
    p
    which in our case is a prime number (can only be divided by itself and 1). We can use 5, we can't use 6
  • All the elements in the finite field are integers from 0 to
    p1
    . For
    F5
    the elements are
    {0,1,2,3,4}
  • We can perform addition. If the sum of two elements is more or equal to the modulus, we subtract the modulus from the result until it fits
    3+4=2 mod 5
  • Negating an element is just finding an element that when summed with the original one produces 0
    1+4=0 mod 5
  • Multiplication is performed in the same way, just multiply and then remove
    p
    until the result fits
    33=4 mod 5
  • The inverse is an element that becomes 1 when multiplied by the original
    44=1 mod 5
    , 0 doesn't have an inverse, every other element has an inverse.
  • Subtraction is just adding a negated element and division is multiplication by the inverse.

ZK circuits consist of connected gates using finite field arithmetic. Aztec Connect used TurboPlonkwhich you can think about as an extended instruction set for Plonk. Each gate has the form

qmwlwr+q1wl+q2wr+q3wo+qc=0 mod p

The

qm,
q1
,
q2
,
q3
,
qc
are selectors. They are chosen by the circuit writer and define the logic of the circuit.
wl
,
wr
,
wo
are witness values. You can think about them as the intermediate states of a program being executed.

The circuit consists of many gates like these, where individual

w are connected to each other in various ways, like in the image. Some are also connected to inputs and outputs of the circuit.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

For example, if we wanted to show that

y=4x3+2 we could produce the following gates:

  1. wlwrwo=0 mod p
  2. 4wlwrwo+2=0 mod p

We connect

wl and
wr
at the first gate and
wl
at the second gate together so that they represent the same value
x
and connect the
wo
at the first gate to
wr
at the second gate. Then
wo
at the second gate becomes
y
. This is easy to check:

  1. xxx2=0 mod p
  2. 4xx2y+2=0 mod p

So this set of gates with the proper wiring computes exactly what we wanted. There are some standard gates that are used quite often :

  1. wlwrwr=0 mod p, where
    wl=wr
    through wiring.

    Boolean gate

    x(x1)=0 mod p ensuring
    x
    is zero or 1.

  2. wl+wrwo=0 mod p

    Standard addition gate.

  3. wlwrwo=0 mod p

    Standard multiplication gate.

  4. wlwr1=0 mod p

    Ensures that

    wl is non-zero.

  5. wl1wr11=0 along with
    wl2wr2wo2=0
    where
    wr
    are connected to each over

    Represents that

    wo2=wl2wl1

The last gate is actually a nice example of the core concept of circuits: you don't have to prove the computation, you just need to provide the witness for the correctness.

This is the case with division in finite fields. Finding the inverse is a heavy operation, but showing that you've found the inverse is trivial: just show that the product of the two values is 1.

Integer arithmetic over fields

So, going back to our problem: we needed to show that

user_output=total_outputuser_inputtotal_input in integers, but we have to do this in a circuit.

Well, one of the problems is that the

user_output will be a floored value in most cases, since we are using integer division.
user_outputtotal_inputtotal_outputuser_input

So we need to introduce a division remainder term:

user_outputtotal_input+remainder=total_outputuser_input

This enables us to satisfy the constraint, but, unfortunately, now we can change the value of

user_output to any value while
total_input
,
total_output
and
user_input
are set by varying the remainder. We need to ensure that this relation is correct in integers, not in field elements. For that purpose we split each of the values into 68-bit limbs and check the correctness of the whole relation the same way one would check the correctness of an equation in elementary school:

1212=1113+1

First we check the lower digits:

22=13+1

If there is a carry, we add it to the tens:

21+12=13+11

And check the hundreds:

11=11

When we do this computation, we only have to compute the products of individual digits in memory so that we don't overflow it. The way we implemented integer checks for the claim ratio equation is completely the same, but the limbs (digits) are in

[0,2681] instead of
[0,9]
. And each number consists of 4 such limbs. Here we have the piece of code performing this check:

	// takes a [204-208]-bit limb and splits it into a low 136-bit limb and a high 72-bit limb
    const auto split_out_carry_term = [&composer, &shift_2](const field_ct& limb) {
        const uint256_t limb_val = limb.get_value();

        const uint256_t lo_val = limb_val.slice(0, 68 * 2);
        const uint256_t hi_val = limb_val.slice(68 * 2, 256);

        const field_ct lo(witness_ct(&composer, lo_val));
        const field_ct hi(witness_ct(&composer, hi_val));

        lo.create_range_constraint(68 * 2);
        hi.create_range_constraint(72); // allow for 4 overflow bits

        limb.assert_equal(lo + (hi * shift_2));

        return std::array<field_ct, 2>{ lo, hi };
    };
	// Use schoolbook multiplication algorithm to multiply 2 4-limbed values together, then convert result into 4
    // 2-limb values (with limbs twice the size) that do not overlap
    const auto compute_product_limbs = [&split_out_carry_term, &shift_1](const std::array<field_ct, 4>& left,
                                                                         const std::array<field_ct, 4>& right,
                                                                         const std::array<field_ct, 4>& to_add,
                                                                         const bool use_residual = false) {
        // a = left[0] * right[0];
        const field_ct b = left[0].madd(right[1], left[1] * right[0]);
        const field_ct c = left[0].madd(right[2], left[1].madd(right[1], left[2] * right[0]));
        const field_ct d = left[0].madd(right[3], left[1].madd(right[2], left[2].madd(right[1], left[3] * right[0])));
        const field_ct e = left[1].madd(right[3], left[2].madd(right[2], left[3] * right[1]));
        const field_ct f = left[2].madd(right[3], left[3] * right[2]);
        // g = left[3] * right[3];

        if (use_residual) {
            const auto t0 =
                split_out_carry_term(to_add[0] + left[0].madd(right[0], (b * shift_1) + to_add[1] * shift_1));
            const auto r0 = t0[0];
            const auto t1 = split_out_carry_term(t0[1].add_two(c + to_add[2], to_add[3] * shift_1 + d * shift_1));
            const auto r1 = t1[0];
            const auto t2 = split_out_carry_term(t1[1].add_two(e, f * shift_1));
            const auto r2 = t2[0];
            const auto r3 = left[3].madd(right[3], t2[1]);
            return std::array<field_ct, 4>{ r0, r1, r2, r3 };
        }
        const auto t0 = split_out_carry_term(left[0].madd(right[0], (b * shift_1)));
        const auto r0 = t0[0];
        const auto t1 = split_out_carry_term(t0[1].add_two(c, d * shift_1));
        const auto r1 = t1[0];
        const auto t2 = split_out_carry_term(t1[1].add_two(e, f * shift_1));
        const auto r2 = t2[0];
        const auto r3 = left[3].madd(right[3], t2[1]);
        return std::array<field_ct, 4>{ r0, r1, r2, r3 };
    };

    const auto lhs = compute_product_limbs(left_1, right_1, { 0, 0, 0, 0 }, false);
    const auto rhs = compute_product_limbs(left_2, right_2, residual_limbs, true);

    bool_ct balanced(&composer, true);
    for (size_t i = 0; i < 4; ++i) {
        balanced = balanced && lhs[i] == rhs[i];
    }

    return balanced;

So all we did was split each value into 4 limbs like this:

value=value0+value1268+value22136+value32204 mod p

Each of the limbs was then range constrained to 68-bits to ensure correct decomposition (range constraints can be easily implemented by combining the witnesses from boolean gates).

The vulnerability stemmed from two core issues:

  1. The top limb was constrained to 68 bits
  2. The remainder had no range constraints at all
	// Split a field_t element into 4 68-bit limbs
    const auto split_into_limbs = [&composer, &shift_1, &shift_2, &shift_3](const field_ct& input) {
        const uint256_t value = input.get_value();

        const uint256_t t0 = value.slice(0, 68);
        const uint256_t t1 = value.slice(68, 136);
        const uint256_t t2 = value.slice(136, 204);
        const uint256_t t3 = value.slice(204, 272);

        std::array<field_ct, 4> limbs{
            witness_ct(&composer, t0),
            witness_ct(&composer, t1),
            witness_ct(&composer, t2),
            witness_ct(&composer, t3),
        };

        field_ct limb_sum_1 = limbs[0].add_two(limbs[1] * shift_1, limbs[2] * shift_2);
        field_ct limb_sum_2 = input - (limbs[3] * shift_3);
        limb_sum_1.assert_equal(limb_sum_2);

        limbs[0].create_range_constraint(68);
        limbs[1].create_range_constraint(68);
        limbs[2].create_range_constraint(68);
        limbs[3].create_range_constraint(68); // The offending range constraint

        return limbs;
    };

If you are range constraining a value that needs to be in the range

2252 to
2272
via decomposition, you lose the 1-to-1 correspondence that would've happened if the range constraint was
2252
. There are now
220
possible values of limbs of each of the original values. More concretely, since all the arithmetic is happening modulo
p
, you can add this modulus to the decomposition and when the value is reconstructed that extra
p
is going to disappear, so instead of the original claim ratio equation in integers we can update it to:

(user_output+ap)(total_input+bp)+remainder+cp=(total_output+dp)(user_input+ep)

For simplicity, let's say

user_input=1, then we can set
a=0
,
b=0
,
c=0
,
d=1
,
e=0
, and the resulting equation is:

user_outputtotal_input+remainder=total_output+p

Then, we can set:

remainder=total_output+puser_outputtotal_input

Now we can vary

user_output and as long as
user_outputtotal_input>total_output
and
user_outputtotal_inputtotal_outputp
we'll be able to compute an appropriate remainder, since it is not constrained and controlling user input gives us a mechanism to steal TVL.

The Patch

The first and most important issue is to stop multiples of

p being added to various parts of the ratio equation. For that we simply started range constraining individual limbs so that there is a 1-to-1 map between original and split values.

 // Split a field_t element into 4 68-bit limbs
    const auto split_into_limbs = [&composer, &shift_1, &shift_2, &shift_3](const field_ct& input,
                                                                            const size_t MAX_INPUT_BITS) {
        const uint256_t value = input.get_value();

        constexpr size_t NUM_BITS_PER_LIMB = 68;

        ASSERT(MAX_INPUT_BITS <= MAX_NO_WRAP_INTEGER_BIT_LENGTH);
        ASSERT(MAX_INPUT_BITS > 0);

        const uint256_t t0 = value.slice(0, NUM_BITS_PER_LIMB);
        const uint256_t t1 = value.slice(NUM_BITS_PER_LIMB, 2 * NUM_BITS_PER_LIMB);
        const uint256_t t2 = value.slice(2 * NUM_BITS_PER_LIMB, 3 * NUM_BITS_PER_LIMB);
        const uint256_t t3 = value.slice(3 * NUM_BITS_PER_LIMB, 4 * NUM_BITS_PER_LIMB);

        std::array<field_ct, 4> limbs{
            witness_ct(&composer, t0),
            witness_ct(&composer, t1),
            witness_ct(&composer, t2),
            witness_ct(&composer, t3),
        };

        field_ct limb_sum_1 = limbs[0].add_two(limbs[1] * shift_1, limbs[2] * shift_2);
        field_ct limb_sum_2 = input - (limbs[3] * shift_3);
        limb_sum_1.assert_equal(limb_sum_2);

        // Since the modulus is a power of two minus one, we can simply range constrain each of the limbs
        size_t bits_left = MAX_INPUT_BITS;
        for (size_t i = 0; i < 4; i++) {
            // If we've run out of bits, enforce zero
            if (bits_left == 0) {
                limbs[i].assert_is_zero();
                // If there are not enough bits for a full lmb, reduce constraint
            } else if (bits_left < NUM_BITS_PER_LIMB) {
                limbs[i].create_range_constraint(bits_left);
                bits_left = 0;
            } else {
                // Just a regular limb
                limbs[i].create_range_constraint(NUM_BITS_PER_LIMB);
                bits_left -= NUM_BITS_PER_LIMB;
            }
        }

        return limbs;
    };

Another issue that we noticed was that if we left the remainder unconstrained, but still constrained the limbs, the sequencer could possibly create a proof that would assign the depositor much less funds than they were supposed to receive.

If we look at the equation, it will become obvious that we can increase the remainder in steps equal to

total_input to decrease
user_output
.

user_outputtotal_input+remainder=total_outputuser_input

So we also added a range constraint ensuring

remainder[0,total_input)

residual.create_range_constraint(notes::NOTE_VALUE_BIT_LENGTH, "ratio_check range constraint failure: residual");
    // We need to assert that residual < a2
    // i.e. a2 - residual > 0 => a2 - residual - 1 >= 0
(ratios.a2 - residual - 1)
        .normalize()
        .create_range_constraint(notes::NOTE_VALUE_BIT_LENGTH, "ratio_check range constraint failure: residual >= a2");

Conclusion

Any project as large as Aztec Connect requires a great deal of attention to implement securely and while the code was extensively audited this bug got through.

So what can be done to minimize the probability of such bugs in future systems? On our part, Aztec Labs is implementing a few process improvements to prevent future bugs of this type:

  1. Implementing a policy where all ad-hoc implementations of primitives are prohibited. If there is a need to do integer comparisons, then the code must go into the standard library, where it is heavily scrutinised.

  2. Creating automated tooling to detect unconstrained values and help focus auditors' attention specifically on them and derived values.

Minimizing the impact of human error is absolutely crucial to building a robust system, so there is huge value in redundancy and automation.

We credit independent security researcher lucash-dev with discovering the vulnerability. His vigilanceas well as the hard work of the greater zero knowledge communityensure the safety of open-source crypto systems.