# Hubble BLS walkthrough

[TOC]

## Introduction

An optimistic rollup lets a coordinator submit a batch of L2 transactions, then lets any watcher dispute the batch if the L2 transactions inside it turn out to be invalid. The dispute process is the validity logic of the L2 transactions programmed in the EVM. Let's list some validity logic for a simple value transfer:

- The sender's pre-state must have enough balance to send the amount and pay the fee.
- The sender's pre-state nonce must be equal to the transaction's nonce.
- The receiver's state has the same tokenID as the sender's.
- The tx is signed by the sender.

In Hubble we separate the dispute logic into two parts: the state transition part and the authenticity part. The former checks the validity of the balance updates of the state tree leaves, and it should derive a post state tree root that agrees with the committed one. The latter checks that each tx is signed by its sender. In this article, we dive into the details of the authenticity check.

## Aggregated Signature makes your L2 tx cheaper

To validate a digital signature on-chain, the chain must know the public key, the message signed, and the signature. In Hubble the sender's public key is registered on the AccountRegistry contract, which maintains a 32-level sparse Merkle tree of public keys. The key is publicly available by synchronizing the calldata of the registration. The message is the compressed tx that we discussed in the [previous article](https://hackmd.io/8q5tx1SdQguR1QJGCaaIfQ). Anyone can acquire this data by syncing the calldata of the batch submission function. The signature is also attached when submitting the batch, and is likewise available by syncing the calldata.

> TODO: Move this to somewhere useful

```
batch
|commitmentRoot<32>|batchType<1>|size<1>|committer<20>|finalizeOn<8>|

Commitment
|stateRoot<32>|bodyRoot<32>|

CommitmentBody
|accountRoot<32>|signature<64>|feeReceiver<32>|txs<12*32>|
```

Note that we have 1024 txs in a batch.
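To make the layout concrete, here is a small Python sketch that derives the per-commitment and per-batch counts from the byte layout above. The assumption that a batch holds 32 commitments is inferred from the stated 1024 txs per batch and 32 txs per commitment; it is not spelled out in the layout itself.

```python
# Field sizes in bytes, taken from the CommitmentBody layout above.
COMMITMENT_BODY = {"accountRoot": 32, "signature": 64, "feeReceiver": 32, "txs": 12 * 32}

TX_SIZE = 12  # one compressed transfer tx
TXS_PER_COMMITMENT = COMMITMENT_BODY["txs"] // TX_SIZE
TXS_PER_BATCH = 1024  # stated in the text
# Inferred: number of commitments needed to reach 1024 txs
COMMITMENTS_PER_BATCH = TXS_PER_BATCH // TXS_PER_COMMITMENT

assert TXS_PER_COMMITMENT == 32
assert COMMITMENTS_PER_BATCH == 32
```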
Using ECDSA we need to attach 65 bytes of calldata for each transaction, which costs us `65 * 16` gas since calldata consumes 16 gas per byte. But using a BLS aggregated signature, we can aggregate the signatures in a commitment together. A BLS signature on the BN254 curve takes 2 uint256 values, which is 64 bytes. So we now amortize the calldata cost across the 32 L2 transactions in a commitment; each costs `64 * 16 / 32` gas. We reduce the calldata per transaction from 12 + 65 bytes to 12 + 2 bytes. With X amount of gas, we go from serving `X/((12+65)*16)` L2 transactions to `X/((12+2)*16)`. Dear reader, we are looking at a 5.5x scalability improvement. From another perspective, that's about 1000 gas, or 0.165 USD (at 50 Gwei and 3300 USD/ETH), saved on your L2 transaction.

## Primer on elliptic curves and aggregated signatures

The BLS signature scheme works on elliptic curves. It operates on two types of points, G1 and G2, and it has a pairing function `e(G1, G2)` to manipulate points in G1 and G2. There are some rules of G1 and G2 we have to introduce. We use G1 in the examples, but the same rules apply to G2.

1. Points of the same type can be added together. For example, if P and Q are G1 points, P + Q is a new G1 point.
2. We can multiply an integer with a point to get a new point. For example, if Q is a G1 point, 3 * Q is another G1 point.
3. Let P = 3 * Q. Even if you know the values of P and Q, it is computationally infeasible to recover 3.
4. There's a special point in G1 named the generator. Usually we abuse the notation and denote it as G1 too.

The pairing function takes two arguments: the first is a point in G1, the second a point in G2. It has the bilinear property that enables the signature check. For $a, b \in \Bbb Z_q$, $P \in \Bbb G_1$, and $Q \in \Bbb G_2$, we have:

$$
e( a \times P, b \times Q) = e( ab \times P, Q) = e( P, ab \times Q)
$$

The message we would like to sign can be arbitrary bytes.
We define a function to map it to a G1 point, so that we can manipulate the message in G1.

$$
H_0: \mathcal{M} \to \Bbb G_1
$$

Then we define the secret key as a random integer. We work with integers in a finite field; that's why we have a big modulus $q$. Every arithmetic operation like addition and multiplication outputs the resulting integer modulo $q$.

$$
\alpha \gets \Bbb Z_q
$$

The public key is just the multiplication of the secret key with the G2 generator. Note that by rule 3, people cannot derive your secret key from the public information h and G2.

$$
h \gets \alpha \times G_2 \in \Bbb G_2
$$

To sign a message, we multiply the secret key with the hashed message, which is now a G1 point.

$$
\sigma \gets \alpha \times H_0(m) \in \Bbb G_1
$$

Verification checks that the two pairing function calls output the same result, given the public key, message, and signature.

$$
e(\sigma, G_2)\stackrel{?}{=} e(H_0(m), h)
$$

To see why verification works, we just apply the bilinear rule.

$$
e(\sigma, G_2) \\
= e( \alpha \times H_0(m), G_2) \\
= e( H_0(m), \alpha \times G_2) \\
= e(H_0(m), h)
$$

In practice, the check is rearranged to the following to fit the [EIP 197](https://eips.ethereum.org/EIPS/eip-197) function signature, where neg(G2) is the curve negation of the G2 point. The neg(G2) itself can be defined as a constant in the Solidity code, so no computation is involved.

$$
e(\sigma, neg(G_2)) e(H_0(m), h) \stackrel{?}{=} 1
$$

Cryptographers have defined different types of elliptic curves like BN254 or BLS12-381 with different efficiency and security properties. On Ethereum lives an EIP 197 precompile contract at address 0x8 that can perform the pairing check for the BLS signature. It works on the BN254 curve, and when you call that contract in the EVM, behind the scenes the full node performs the computation with an optimized cryptography library. At the time of writing this article, BN254 is still the only supported curve in production.
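The sign/verify algebra above can be illustrated with a deliberately insecure toy model in Python. Here the "groups" are just integers mod a small prime order q, and the "pairing" is modular exponentiation, so bilinearity holds by construction but the discrete log is trivial. This is only to show why the verification equation balances; real BLS uses BN254 points and a genuine pairing.

```python
import hashlib

# Toy parameters: g = 2 generates a subgroup of order q = 11 mod p = 23.
# INSECURE toy model -- only the algebra mirrors BLS.
p, q, g = 23, 11, 2

def pairing(x, y):
    # e(x, y) = g^(x*y): bilinear by construction,
    # i.e. e(a*x, b*y) == e(x, y)^(a*b)
    return pow(g, (x * y) % q, p)

def hash_to_point(msg: bytes) -> int:
    # stand-in for H0: hash the message into the "group"
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % q

G2 = 1                    # "generator" of the toy G2
alpha = 7                 # secret key
h = (alpha * G2) % q      # public key: h = alpha x G2

m = hash_to_point(b"hello hubble")
sigma = (alpha * m) % q   # signature: sigma = alpha x H0(m)

# verification: e(sigma, G2) == e(H0(m), h)
assert pairing(sigma, G2) == pairing(m, h)
```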
So we'll continue our discussion based on BN254 hereafter. The interface of the EIP 197 precompile looks like this:

```
Input: (a1, b1, a2, b2, ..., ak, bk) from (G_1 x G_2)^k

Output: If the length of the input is incorrect or any of the inputs are not
elements of the respective group or are not encoded correctly, the call fails.
Otherwise, return one if
log_P1(a1) * log_P2(b1) + ... + log_P1(ak) * log_P2(bk) = 0
(in F_q) and zero else.
```

Below is an example Solidity function that verifies a single signature with EIP 197. A signature is a 64-byte $\Bbb G_1$ group element, and its calldata is a length-2 array of uint256. On the other hand, a public key is a 128-byte $\Bbb G_2$ group element, and its calldata is a length-4 array of uint256.

```solidity
function verifySingle(
    uint256[2] memory signature,
    uint256[4] memory pubkey,
    uint256[2] memory message
) internal view returns (bool) {
    uint256[12] memory input = [
        signature[0],
        signature[1],
        nG2x1,
        nG2x0,
        nG2y1,
        nG2y0,
        message[0],
        message[1],
        pubkey[1],
        pubkey[0],
        pubkey[3],
        pubkey[2]
    ];
    uint256[1] memory out;
    bool success;
    assembly {
        success := staticcall(sub(gas(), 2000), 8, input, 384, out, 0x20)
        switch success
        case 0 {
            invalid()
        }
    }
    require(success, "");
    return out[0] != 0;
}
```

### Note on the Gas Consumption

Post [EIP-1108](https://eips.ethereum.org/EIPS/eip-1108), k pairings cost `34 000 * k + 45 000` gas. So in the example above, validating a single signature takes 2 pairings and thus 113 000 gas. Validating n different messages takes n + 1 pairings, which costs `79 000 + 34 000 * n` gas. But don't worry, the gas cost here is only incurred when a dispute happens. It won't affect scalability; we just have to make sure it's not so expensive that fraud becomes undisputable.

## Hash to curve

It turns out mapping the signing message to a curve point is not trivial at all. We took some iterations to get a satisfactory result.
To see the complexity of this problem, we need to dive deeper into the elliptic curve rabbit hole. The BN254 curve is defined by the equation $y^2 = x^3 + 3$, where x and y are finite field elements and arithmetic is done modulo some prime p. You can think of it as a curve on a 2D plane.

![](https://i.imgur.com/pZifjS0.png)
> If we define x and y on the real numbers, the equation looks smooth.

![](https://i.imgur.com/38mDcP3.png)
> If we define x and y on a finite field, the equation looks like scattered dots. But you can still see the symmetry. This is a toy example defined on the field $\Bbb Z_{97}$.

The $\Bbb Z_q$ that BN254 G1 is defined on has a 254-bit prime as the field modulus. G2 is a little crazier: the curve is defined on $\Bbb Z_{q^2}$, where each x and y coordinate has two integers and is analogous to a complex number.

As we said before, we need to map the message to points on the curve. The x and y are field elements modulo a 254-bit prime p. The simplest and most naive approach is what people call "hash and pray". We treat the message as an x coordinate. Unfortunately, not every x coordinate has a corresponding y coordinate. So we plug x into the equation $y^2 = x^3 + 3$, then compute the square root of $y^2$. If such a y exists, then (x, y) is a point on the curve. Otherwise, we bump x by 1 and pray we can find a valid y this time.

The Solidity code looks like this. The sqrt function is just `input^((N+1)/4)`, where N is the modulus. The finite field magic guarantees that when our modulus N is congruent to 3 modulo 4, we can compute a square root of x by computing `x^((N+1)/4)`, if the square root exists at all. The sqrt involves calling another precompile contract, [EIP-198](https://eips.ethereum.org/EIPS/eip-198), to compute modular exponentiation. We attempted to optimize the modular exponentiation operation, which we'll introduce later, but before we pulled that off, the precompile cost around 14k gas per call.
```solidity
function mapToPoint(bytes32 _x) public view returns (uint256[2] memory point) {
    uint256 x = uint256(_x) % N;
    uint256 y;
    bool found = false;
    while (true) {
        y = mulmod(x, x, N);
        y = mulmod(y, x, N);
        y = addmod(y, 3, N);
        (y, found) = sqrt(y);
        if (found) {
            point[0] = x;
            point[1] = y;
            break;
        }
        x = addmod(x, 1, N);
    }
}
```

On average the function takes 30k gas to find the result. But the real problem with the hash-and-pray approach is that an attacker can grind for a message that makes mapToPoint loop many times before finding a square root. At the extreme, it becomes too expensive to check on chain, which would make the message undisputable.

Kobi proposed another [solution](https://gist.github.com/kobigurk/b9142a4755691bb12df59fbe999c2a1f#file-bls_with_help-sol-L129-L154) to save the ~13k gas of computing a square root per loop. Instead of computing the square root on chain, the disputer provides the results of the square roots. The `mapToPointWithHelp` now only needs to check whether the square of the expected root matches $y^2$.

```solidity
function mapToPointWithHelp(bytes32 _x, uint256[] memory expected_roots)
    public
    view
    returns (uint256[2] memory p)
{
    uint256 x = uint256(_x) % N;
    uint8 i = 0;
    uint256 y;
    uint256 m;
    while (true) {
        y = mulmod(x, x, N);
        y = mulmod(y, x, N);
        y = addmod(y, 3, N);
        m = mulmod(expected_roots[i], expected_roots[i], N);
        if (m == y) {
            p[0] = x;
            p[1] = expected_roots[i];
            break;
        } else if (N - m == y) {
            x = addmod(x, 1, N);
            i += 1;
        } else {
            revert("Wrong expected root.");
        }
    }
}
```

To really do the map-to-point right, we need to use the constant-time Fouque-Tibouchi algorithm (FT hereafter). We didn't consider it at first because it is more difficult to implement, but Onur successfully implemented it. The FT function still takes up to 4 sqrt computations.
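Both hash-and-pray and FT lean on the same `x^((N+1)/4)` square-root primitive, which is easy to sanity-check off-chain. Here is a Python sketch of the try-and-increment loop over the BN254 base field (for illustration only; it mirrors the logic of `mapToPoint`, not its exact gas-conscious structure):

```python
# BN254 base field modulus (N % 4 == 3), same constant as in the contracts.
N = 21888242871839275222246405745257275088696311157297823662689037894645226208583

def sqrt_mod(a):
    """Return (root, True) if a is a square mod N, else (r, False)."""
    r = pow(a, (N + 1) // 4, N)  # valid because N == 3 (mod 4)
    return r, (r * r) % N == a

def map_to_point(x):
    """Try-and-increment: bump x until x^3 + 3 is a square."""
    x %= N
    while True:
        y2 = (pow(x, 3, N) + 3) % N
        y, found = sqrt_mod(y2)
        if found:
            return x, y
        x = (x + 1) % N

x, y = map_to_point(12345)
assert (y * y) % N == (pow(x, 3, N) + 3) % N  # (x, y) satisfies y^2 = x^3 + 3
```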
Onur also implemented [Kobi's proposal for FT](https://github.com/kilic/evmbls/blob/54c3e8f269b2bafa02a4e4969d0e976ecb2ce04a/contracts/BLS.sol#L185), so that the disputer can provide the expected square roots as a witness. But since by then we had the optimized modular exponentiation, we decided to stick to plain FT for simplicity.

### The side quest of the modular exponentiation

Modular exponentiation (modexp hereafter) evaluates `x^y % N` for integers x, y, and N. We use it in two cases. The first, discussed previously, is the square root computation `x^((N+1)/4) mod N`. The second is the computation of the modular inverse `x^(N - 2) mod N` using Fermat's little theorem.

The experienced reader will ask: "Wait, what! Why don't you use the extended Euclidean algorithm (EEA hereafter)? Wouldn't that be more efficient?" It turns out the overhead of the looping and branching in EEA makes it more expensive. The optimized modexp version can be cheaper in gas, and more importantly it is constant time: no number-grinding attacks! And yes, we have an [issue](https://github.com/thehubbleproject/hubble-contracts/issues/399) for you to dive into.

So what is modexp? It is just exponentiation with a modulo operation. How do you compute x^5? You multiply x by itself repeatedly, which looks like `x*x*x*x*x` and takes 4 multiplications, and we usually want to raise x to a giant 254-bit exponent. A slightly improved way is to memorize the squares we've computed and reuse them to save multiplications. For example, to compute `x^5` we do `x2 = x*x`, `x4 = x2*x2`, then `x5 = x*x4`. We can do a binary decomposition of the exponent and run "square-and-multiply" in a few lines. If the exponent is t bits, square-and-multiply scans the t bits and finishes in t loops. This reduces the naive 2^t multiplications to roughly 1.5t multiplications on average.
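The square-and-multiply idea fits in a few lines of Python (a sketch; Python's built-in `pow(x, v, n)` does the same job):

```python
def modexp(x, v, n):
    """Square-and-multiply: scan the exponent's bits, least significant first."""
    sq = x % n  # running square: x^(2^i) at iteration i
    acc = 1     # accumulated result
    while v > 0:
        if v & 1:                # "multiply" step when bit i is set
            acc = (acc * sq) % n
        sq = (sq * sq) % n       # "square" step
        v >>= 1
    return acc

# sanity check against Python's built-in pow
assert modexp(12345, 65537, 2**61 - 1) == pow(12345, 65537, 2**61 - 1)
```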
```solidity
// compute x^v % n, where v and n are known constants
function modexp(uint256 x) internal pure returns (uint256) {
    uint256 sq = x;
    uint256 xx = 1;
    uint256 n = 0x30644e72e131a029b85045b68181585d97816a916871ca8d3c208c16d87cfd47;
    uint256 v = 0xc19139cb84c680a6e14116da060561765e05aa45a1c72a34f082305b61f3f52;
    for (uint256 i = 0; i < 252; i++) {
        if ((v >> i) & 1 == 1) {
            xx = mulmod(xx, sq, n);
        }
        sq = mulmod(sq, sq, n);
    }
    return xx;
}
```

But this Solidity code is still more expensive than the precompile (38673 gas vs the precompile's 13750): the branching and looping have overheads. We implemented an assembly version to golf down the gas (30470 gas). Vitalik also suggested trying different loop-unrolling tricks to reduce the overhead, which got us to 21021 gas. The iterations can be tracked [here](https://github.com/ChihChengLiang/modexp/blob/master/contracts/ModExp.sol).

After doing various loop unrolling, we realized that the exponent and the modulus are both constants, and we should exploit that as hard as possible. The result is [this](https://github.com/ChihChengLiang/modexp/blob/master/contracts/Monster.sol) monster code of hardcoded `mulmod`s (7133 gas).

Onur also proposed exploring the addition chain approach. Instead of memorizing just squares, there are more intermediate results we can memorize and reuse later. Some [libraries](https://github.com/kwantam/addchain) can help us find a shortest addition chain, or to be exact, the shortest `mulmod` chain. Kobi implemented a [contract](https://github.com/ChihChengLiang/modexp/blob/master/contracts/AddChainLong.sol) using this technique (6744 gas). By now we've beaten the gas cost of the precompile by almost 50%.

But Protolambda was not going to stop there. Proto studied the monster code at the bytecode level. Solidity doesn't know what we are trying to do, so it clumsily manipulates the stack with dups, swaps, and pushes. If we arrange the stack smartly, we can save those swap overheads.
Proto performed a temporal pincer attack (Tenet hadn't been released back then, actually). The stack is deep, with 1024 slots, but we can only operate on the top 16. In square-and-multiply, we push the input x and the modulus n required by the "multiply" step onto the stack, and we also push the n required by the "square" step. So as shown in the image below, the blue team works the bits in reversed order and pushes the values onto the stack. Then we push the initial value xx, which also acts as the accumulator of the result. Then the red team starts to work in the normal order of the bits, consuming the values on the stack with minimal stack operations. (5075 gas)

![](https://i.imgur.com/EP1wSTQ.png)

We eventually chose the addition chain method in Solidity assembly over the optimized bytecode approach. The reason is that the Solidity functions are pure and internal and thus can be compiled together with other contracts; calling them is just a JUMP operation. The raw bytecode would need to be deployed as an external contract, and calling it requires a static call.

## The issue of interpreting the failing precompile call

We've talked about checking whether a tuple of (x, y) coordinates is on the curve, or put differently, satisfies the equation y^2 = x^3 + 3. Unfortunately, checking that a point is "on the curve" is not enough; we also need to check that it is "in the group". Not all points on the curve are valid inputs for our pairing function. We have the G1 group, which is the set of points that can be "generated" from the point G1 under the special curve operations "add" and "multiply". The same applies to G2. So only some of the points on the curve are valid.

Unfortunately again, validating "in the group" for G2 is prohibitively costly in gas on the EVM. This means we are not able to catch a bad public key at registration. The EIP 197 precompile contract does the group check for us.
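The "on the curve" half of that validation is the cheap part. As a sketch over the BN254 base field (using the generator point (1, 2) as the example):

```python
# BN254 base field modulus
N = 21888242871839275222246405745257275088696311157297823662689037894645226208583

def on_curve(x, y):
    """Check y^2 = x^3 + 3 over the BN254 base field."""
    return (y * y) % N == (pow(x, 3, N) + 3) % N

# The BN254 G1 generator is (1, 2): indeed 2^2 = 1^3 + 3.
assert on_curve(1, 2)
assert not on_curve(1, 3)
```

The expensive part is the G2 subgroup membership check, which is why it gets delegated to the pairing precompile.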
So at least we are protected at the stage of validating the signature. The catch is that the precompile responds with a **failing call** instead of **outputting false**. This is problematic because now we can't distinguish the following cases:

- The coordinator submitted a batch containing a not-in-group public key. The coordinator is guilty and should be slashed.
- The disputer ran the check with insufficient gas. That's the disputer's fault and we shouldn't blame the coordinator.

So instead of calling the precompile contract with as much gas as possible, as shown in the previous code sample, our solution is to call the precompile with the [exact gas cost](https://github.com/thehubbleproject/hubble-contracts/pull/328), so that we can interpret a failing call as the coordinator's fault. Now the function looks like this:

```solidity
function verifySingle(
    uint256[2] memory signature,
    uint256[4] memory pubkey,
    uint256[2] memory message
) internal view returns (bool checkResult, bool callSuccess) {
    ...
    uint256[1] memory out;
    assembly {
        callSuccess := staticcall(
            precompileGasCost,
            8,
            input,
            384,
            out,
            0x20
        )
    }
    if (!callSuccess) return (false, false);
    return (out[0] != 0, true);
}
```

### Future proof for gas cost change

Here we run into a new problem. Suppose we hardcode the `precompileGasCost`. If in the future clients implement a more efficient cryptography library and people propose an EIP to reduce the gas cost, that's fine. But if for some reason people propose to increase the gas cost, then all checks will fail with out-of-gas and thus become slashable. To prevent this scary scenario, we proposed a simple [solution](https://github.com/thehubbleproject/hubble-contracts/issues/400): we can just update the `precompileGasCost` on chain whenever we want! The [BLS cost estimator](https://github.com/thehubbleproject/hubble-contracts/blob/master/contracts/libs/BNPairingPrecompileCostEstimator.sol) calls two dummy pairing checks.
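Because the precompile's cost model is linear in the number of pairings, `cost(k) = base + perPairing * k`, two measurements are enough to pin down both constants. A sketch (the numbers plugged in are just the EIP-1108 schedule, used here as stand-ins for the measured values):

```python
def derive_costs(cost_k1, cost_k2):
    """Given measured gas for a 1-pairing and a 2-pairing check,
    solve cost(k) = base + per_pairing * k for the two constants."""
    per_pairing = cost_k2 - cost_k1
    base = cost_k1 - per_pairing
    return base, per_pairing

# With the EIP-1108 schedule (34 000 * k + 45 000) as dummy measurements:
base, per = derive_costs(79_000, 113_000)
assert (base, per) == (45_000, 34_000)
```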
Using the difference of the gas costs, we can compute the exact base gas cost and the per-pairing gas cost.

## Putting it all together

Let's walk through how to dispute a commitment with an invalid signature. We first enter the disputeSignatureTransfer function in Rollup.sol.

- The first thing the disputer specifies is which batch to dispute; the chain checks whether the batchID "is disputable". This prevents people from disputing a finalized batch or performing a dispute when we are already in the middle of a rollback.
- Then the disputer should prove that the commitment to dispute is in that batch. The chain only remembers the merkle root of the batch's commitments, so the disputer proves this via a merkle proof.
- Then we throw a bunch of structs at a signature validity check, which we'll examine next. If the validity check returns a happy result, nothing happens. Otherwise, the commitment is invalid and we start a rollback immediately.

```solidity
// Rollup.sol
function disputeSignatureTransfer(
    uint256 batchID,
    Types.TransferCommitmentInclusionProof memory target,
    Types.SignatureProof memory signatureProof
) public isDisputable(batchID) {
    require(
        checkInclusion(batches[batchID].commitmentRoot, target),
        "Rollup: Commitment not present in batch"
    );
    Types.AuthCommon memory common = Types.AuthCommon({
        signature: target.commitment.body.signature,
        stateRoot: target.commitment.stateRoot,
        accountRoot: target.commitment.body.accountRoot,
        domain: domainSeparator(),
        txs: target.commitment.body.txs
    });
    Types.Result result = transfer.checkSignature(common, signatureProof);
    if (result != Types.Result.Ok) startRollingBack(batchID, result);
}
```

Before we enter the next function, we have to talk about the AuthCommon struct. Note that the stateRoot is the **post-state-transition** state root. So before disputing the signature, the disputer should dispute the state transition.
Otherwise, the disputer is not able to provide the state leaves for the later merkle proof checks.

The transfer.checkSignature call routes you to Authenticity.sol::verifyTransfer. The big picture here is that we are validating signatures, which means we need public keys, signing messages, and the aggregated signature. Another big picture is that before we point fingers at the coordinator, we have to check that the disputer provided valid data. We can't allow a fraudulent disputer using a fraudulent fraud proof to prove a non-fraudulent coordinator fraudulent.

First we establish that the signature, state root, account root, and transactions are indeed the same data committed by the coordinator.

- Where's the signature? This is an easy one: we already have it from the commitment.
- So, where's the signing message? We have the committed transaction, but we still need the nonce, which can be acquired from the state leaf at state tree index senderIndex.
- Where's the public key? The public key is a leaf of the account tree, where the leaf index is the pubkeyID of the state leaf.

We loop through the L2 transactions to compute the signing messages, then we feed them to BLS.verifyMultiple along with the aggregated signature and the array of public keys.

Note that we loop through the L2 transactions in reversed order and use the `senders` array to compute the correct nonce. If the same sender happens to send two transactions in the same commitment, the transactions should have different nonces. But since we prove the membership of the state leaves against the post-state root, the two proved state leaves carry the same nonce. The astute reader might already see room for optimization, since we don't need to prove the same state leaf twice. We didn't optimize the fraud proof in the first place because fraud proofs rarely happen and it is good enough that they just work. Proposals for optimizations and your contributions are welcome!
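The reverse-loop nonce bookkeeping can be mirrored off-chain with a hypothetical helper (`tx_nonces` is our name for illustration, not part of the contracts): walking from the tail, each tx's nonce is the sender's post-state nonce minus one, minus one more for every later tx from the same sender.

```python
def tx_nonces(from_indices, post_nonces):
    """For each tx (in commitment order), derive the nonce it was signed
    with, given the sender's post-state nonce proved against the
    post-state root. Mirrors the reverse loop in verifyTransfer."""
    size = len(from_indices)
    nonces = [0] * size
    seen = []                     # senders handled so far, tail first
    for j in range(size):
        i = size - 1 - j          # walk from the tail
        nonce = post_nonces[i] - 1
        # one less for every later tx from the same sender
        nonce -= seen.count(from_indices[i])
        seen.append(from_indices[i])
        nonces[i] = nonce
    return nonces

# Sender 7 sends twice (post-state nonce 5 after both), sender 3 once.
assert tx_nonces([7, 3, 7], [5, 9, 5]) == [3, 8, 4]
```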
```solidity
// Authenticity.sol
function verifyTransfer(
    Types.AuthCommon memory common,
    Types.SignatureProof memory proof
) internal view returns (Types.Result) {
    uint256 size = common.txs.transferSize();
    uint256[2][] memory messages = new uint256[2][](size);
    uint256[] memory senders = new uint256[](size);
    // We reverse loop the transactions to compute nonce correctly
    for (uint256 j = 0; j < size; j++) {
        // i is the index counting down from tail
        uint256 i = size - j - 1;
        Tx.Transfer memory _tx = common.txs.transferDecode(i);
        // check state inclusion
        require(
            MerkleTree.verify(
                common.stateRoot,
                keccak256(proof.states[i].encode()),
                _tx.fromIndex,
                proof.stateWitnesses[i]
            ),
            "Authenticity: state inclusion signer"
        );
        require(proof.states[i].nonce > 0, "Authenticity: zero nonce");
        // check pubkey inclusion
        require(
            MerkleTree.verify(
                common.accountRoot,
                keccak256(abi.encodePacked(proof.pubkeys[i])),
                proof.states[i].pubkeyID,
                proof.pubkeyWitnesses[i]
            ),
            "Authenticity: account does not exists"
        );
        uint256 nonce = proof.states[i].nonce - 1;
        // Add a huge value to avoid collision to the array initial value 0
        uint256 safeIndex = _tx.fromIndex + 1000000000000000;
        for (uint256 k = 0; k < j; k++) {
            // Update nonce if the sender is seen before
            if (senders[k] == safeIndex) nonce--;
        }
        senders[j] = safeIndex;
        // construct the message
        bytes memory txMsg = Tx.transferMessageOf(_tx, nonce);
        messages[i] = BLS.hashToPoint(common.domain, txMsg);
    }
    bool callSuccess;
    bool checkSuccess;
    (checkSuccess, callSuccess) = BLS.verifyMultiple(
        common.signature,
        proof.pubkeys,
        messages
    );
    if (!callSuccess) return Types.Result.BadPrecompileCall;
    if (!checkSuccess) return Types.Result.BadSignature;
    return Types.Result.Ok;
}
```

## Conclusion

The Hubble project features scaling L2 transactions using the aggregate signature technique. We explained how the aggregate signature saves gas costs.
We also explored the various subtleties and challenges of using aggregate signatures, and our efforts to solve them.