O(1) Labs THEMIS Proposal

# O(1) Labs THEMIS Proposal In this proposal, we describe a system for a new version of Brave's THEMIS protocol achieving the requirements described in the RFP. We describe a protocol involving three parties. **The user**, **Brave**, and **the roll-up prover**. We imagine that Brave would operate the roll-up prover, but this party receives no secret information, and so it could be anyone, or even a federation of unrelated provers for added decentralization. The protocol described achieves - **Excellent client side performance (compute):** efficient user computation of BBA updates and ZKPs for proving correctness of reward computation, requiring about 1 second on a single core. - **Excellent client side performance (bandwidth):** - BBA updates involve a single message of about 2kB from the user plus 8 bytes per campaign to update, and a response of about 64 bytes. - The user sends an opening proof to the roll-up prover which is roughly 2kB. - **Scalable on chain verification on Mina or Ethereum with a roll-up proof.** Computation of the roll-up proof costs about $0.000446 per reward-payout. *Execution of the payouts on Ethereum seems to exceed the desired cost slightly for reasons mostly unrelated to the proof verification. We request Brave's comment on this aspect of our proposal.* - **No trusted setup, and no pairings.** The security is based on the hardness of discrete log for the [Pasta curves](https://electriccoin.co/blog/the-pasta-curves-for-halo-2-and-beyond/) and the random oracle model. - **All desired privacy and verifiability properties.** Furthermore, when using Mina as the payout L1, even resource-constrained users can fully verify correctness of payouts by checking a single SNARK. The system is based on the existing THEMIS protocol, but with the pairing-based BBA replaced with a ZKP-based BBA, and with a roll-up proof system on top. You can find an implementation of the BBA scheme described [here](https://github.com/imeckler/bba), with a few differences described there. ## The proposed BBA scheme Here is a proposal for replacing the BBA scheme with a Pasta-based ZKP. The motivation for this proposal is that we want to use the Pasta curves for proofs of type (2) and (3) and we don't have access to a pairing-friendly curve inside those fields. ### Preliminaries Let $\mathbb{G}$ be the [Pallas curve](https://electriccoin.co/blog/the-pasta-curves-for-halo-2-and-beyond/). Let $\mathbb{F}$ be the scalar field of $\mathbb{G}$. Fix a hash-to-curve function $W$ mapping into $\mathbb{G}$ (our libraries have such a function implemented). Define group elements $G := W(0)$, $H := W(1)$, and $G_i := W(i + 2)$. Fix an integer $m$ and let $n := 2^m$. $n - 10$ will be an upper bound on the number of active ad campaigns. When discussing concrete efficiency, we assume $m = 10$ giving a bound of $2^{10} - 10 = 1014$ on the number of active campaigns. Let $(L_0, L_1, \dots, L_{2^m - 1})$ denote the commitments to the Lagrange polynomials over the domain of $2^m$ roots of unity against the vector $(G_0, G_1, \dots)$. That is, let $\omega$ be a primitive $2^m$ root of unity and let $\ell_i$ be the polynomial of degree $2^m - 1$ with $\ell_i(\omega^i) = 1$ and $\ell_i(\omega^j) = 0$ for $j \neq i$. Then, letting $c_{i,0}, \dots, c_{i,2^{m-1}}$ be the coefficients of $\ell_i$, $$ L_i := \sum_{j < 2^m} c_{i, j} G_j $$ Fix a signature scheme $(\mathsf{sign}, \mathsf{verifySig})$ based on $\mathbb{G}$ and say $P = k G$ is Brave's public key, with secret key $k$. ### The scheme Given a state $[s_0, \dots, s_{n-3}]$, the corresponding accumulator will consist of - $\mathsf{acc}$, a group element equal to $$r H + c L_0 + \sum_{i = 1}^5 \alpha_i L_{2+i} + \sum_{i < n-10} s_i L_{8 + i}$$ where $c,r, \alpha_1, \dots, \alpha_5$ are random scalars known only to the user. $c$ is akin to the token "ID" mentioned in the Earn scheme. $r$ and the $\alpha_i$ are used for achieving unlinkability and zero-knowledge opening respectively. - a signature on $\mathsf{acc}$ that verifies against Brave's public key $P$. The scheme will work as follows. - **Initialization.** The user samples scalars $r, \alpha_1, \dots, \alpha_5, c$ at random and sends Brave $C := rH + \sum_{i=1}^5 \alpha_i L_{1 + i} + c L_0$ along with a PoK that it knows scalars to make $C$ a linear combination of $(H, L_{n-1}, \dots, L_{n-5}, L_0)$. Brave responds with $\sigma := \mathsf{sign}_k(C)$ and the user sets their initial accumulator to $(C, \sigma)$ . - **Update.** Let $(A, \sigma)$ be the user's current accumultor with corresponding user secret state $(r, c, \alpha_1, \dots, \alpha_5)$. First, the user samples a random scalar $r'$. Let $A' = A + r' H$. Let $\pi$ be a zero-knowledge proof with statement $A'$ proving "I know $(A, r', \sigma)$ such that $\mathsf{verifySig}(A, \sigma)= \top$ and $A' = A + r'H$". The user sends Brave $(A', \pi)$ along with the updates $\delta_0, \dots, \delta_{n-1}$ to their ad state. Brave and the user both compute $A'' = A' + \sum_{i < n} \delta_i L_{8+i}$ and sends back $\sigma' := \mathsf{sign}_k(A'')$. The user's new accumulator is then $(A'', \sigma')$, with corresponding secret state $(r + r', c, \alpha_1, \dots, \alpha_5)$. - **Opening.** The user can open the accumulated value $A$ at any time by exhibiting $(c, \alpha_1, \dots, \alpha_5, r, s_0, \dots, s_{n-1})$ such that $$ A = c L_0 + r H + \sum_{i=1}^5 \alpha_i L_{2+i} + \sum_{i < n - 10} s_i L_{8+i} $$ However, they will not do so directly, but will instead create a ZKP proving they know an opening such that the inner product with the policy vector is a given public value (and no overflow occurred in computing the inner product, and that the sum of the reward-vector entries is bounded by a limit $L$.) ### The BBA update proof system As mentioned, for public input $A'$, users will prove > I know $(A, r', s)$ such that $\mathsf{verifySig}(A, s)= \top$ and $A' = A + r'H$. This statement is very efficient. Roughly, it involves - $\mathsf{verifySig}$ - hashing 3 field elements (the two coordinates of $A$, plus one field element from the signature): 3 + 63 constraints - performing 1 fixed base scalar multiplication: 2*255 constraints - performing 1 endoscaling: 255 constraints - computing the parity of a field element: (17 constraints: unpack into 16 quartets, unpack one quartet) - checking 2 equations of field elements: 2 constraints - computing the scalar multiplication $r' H$: 255 constraints (using endoscaling) In total, the cost is about 3 + 63 + 2*255 + 255 + 17 + 2 + 255 = 1105 constraints. - **Prover efficiency.** Less than 1 second on a single core. - **Proof size** - The proof would consist of - $24 = 2 + 2\lceil{\log_2(1105)}\rceil$ group elements for the inner-product opening - $14$ group elements for the PLONK messages - 38 field elements for polynomial evaluations and the opening proof - In total, requires $76 = 24 + 14 + 38$ field elements and $38$ bits, coming in total to $2437$ bytes. - **Verifier efficiency.** Somewhere in the neighborhood of 3.3ms per proof, plus 5.6ms per batch. ### The reward computation proof system Here we describe the proof system used by users to produce "proofs of correct reward computation" corresponding to their accumulators. Let $L$ be the limit mentioned by Brave on [page 6](https://raw.githubusercontent.com/brave-intl/themis-rfcc/main/rfcc-themis-bbas-v1.0.pdf). For public input - $(A, \sigma)$: the user's accumulator - $\vec{p}$: the policy vector - $v$: the payout amount - $c$: one of the user's secret scalars - $Q$: the user's public key on the L1 that we are paying out on Let $N \leq n - 10$ be the number of active campaigns. the user will prove: > I know $(\alpha_1, \dots, \alpha_5, r, s_0, \dots, s_{N-1})$ such that > > 1. $$ A = c L_0 + r H + \sum_{i=1}^5 \alpha_i L_{2+i} +\sum_{i < n - 10} s_i L_{8+i} > $$ > 2. > $$ <\vec{s}, \vec{p}> = v$$ > where $\vec{s} = (s_0, \dots, s_{n-10 - 1})$ is the user's reward state and there are no overflows in this computation, and > 3. $$\sum_{i < n - 10} s_i < L$$ This statement is also quite efficient. PLONK-type circuits consist of a series of rows with each row being a vector of field elements with some constraints asserted of that vector of field elements and those in the next row. We describe directly a circuit that proves (1) and (2), and explain at a high level how it could be modified to prove (3) as well. The circuit will consist of exactly $n$ rows of 7 field elements. Rows $0, 1$ and $2$ will contain the public input. They will be, respectively, - $(c, 0, 0, 0, 0, 0, 0)$ - $(Q, 0, 0, 0, 0, 0, 0)$ - $(v, 0, 0, 0, 0, 0, 0)$ For $0 \leq i < 5$, the $(3 + i)^{\text{th}}$ row will contain $$ (\alpha_{1+i}, 0, \beta_{i, 0}, \beta_{i, 1}, \beta_{i, 2}, \beta_{i, 3}, \beta_{i, 4}) $$ with no constraints and where each $\beta_{i, j}$ is a random field element. For $i < n - 10$, the $(8 + i)^{th}$ row will contain $$ (s_i, p_i, \mathsf{acc}_i, \ell_{i,0}, \ell_{i,1}, \ell_{i,2}, \ell_{i,3}) $$ where $s_i = \sum_{j < 4} 2^{8j} \ell_{i, j}$ and $0 \leq \ell_{i,j} < 2^8$. In other words, $\ell_{i, j}$ is the $j$th byte of $s_i$. There will be 3 groups of constraints on this row: 1. $s_i = \sum_{j < 4} 2^{8j} \ell_{i, j}$ 2. For each $j < 4$, a lookup constraint checking that $\ell_{i, j} \in \{ 0, \dots, 2^8 - 1 \}$. 3. If $i = 0$, $\mathsf{acc}_i = 0$, and in all cases $\mathsf{acc}_{i + 1} = \mathsf{acc}_i + s_i p_i$. The circuit is designed so that the accumulator value $A$ is in fact a hiding commitment to the first column and so that the second column, containing only the policy vector, can be provided by the verifier and need not be computed by the prover. The PLONK proof corresponding to this circuit will prove that $A$ is a commitment to a vector of 32 bit values, and that the inner product with the policy vector is the claimed value $v$, and that $c$ is the corresponding user secret. The proof will also serve as a signature-of-knowledge on the payout public key $Q$. - **Prover efficiency.** Should be comparable to the above proof, less than 1 second. - **Proof size.** The full payout proof will consist of the PLONK proof described above plus the signature on the accumulator. The signature is quite small compared to the proof (64 bytes), so it will be similar to the BBA update proof at about 2kB. **Remark on maintaining unlinkability.** Note that the above proof system can be used to maintain unlinkability to the user who watched the ads if used as follows. When the user is ready to compute an opening proof, they should update their BBA with updates $\delta_i = 0$ to get a freshly randomized accumulator that is not linkable to any of their ad viewing. ### Discussion of the scheme #### Why Lagrange commitments are used In light of this circuit description, one can see the motivation for the usage of Lagrange polynomial commitments as the bases in the accumulator value. Specifically, in the PLONK-type protocol we use here, the prover commits to the first column (i.e., the vector consisting of the first element of each row). Let $w = (w_0, \dots, w_{n-1})$ be that vector. The prover does this by taking $w$, interpolating to find a polynomial $\widehat{w}$ with $\widehat{w}(\omega^i) = w_i$ for $i < n$, and then commiting to the coefficients of that polynomial as a Pedersen commtiment against $(G_0, G_1, \dots)$. We can write this $\widehat{w}$ as a sum of Lagrange polynomials as $$ \widehat{w} = \sum_{i < n} w_i \ell_i $$ where, recall $\ell_i$ is the $i^{\text{th}}$ Lagrange polynomial. For $f$ a degree $n-1$ polynomial, let $\langle f, G \rangle$ denote the Pedersen commitment to the coefficients $(f_0, \dots, f_{n-1})$ of $f$. I.e., $$ \langle f, G \rangle := \sum_{i < n} f_i G_i $$ It is evident that this commitment is additively homomorphic. As a result, the commitment to $\widehat{w}$ is a linear combination of commitments to the $\ell_i$. Specifically, $$ \langle \widehat{w}, G \rangle = \sum_i w_i \langle \ell_i, G \rangle = \sum_i w_i L_i $$ The commitment scheme also permits making the commitments hiding by adding a multiple of the point $H$. In this case, we will have the prover add $r H$, where $r$ is one of the user's secret value. As a result, because of how the rows in the circuit are laid out, with - $w_0 = c$ - $w_1 = Q$ - $w_2 = v$ - $w_3 = \alpha_1, \dots, w_7 = \alpha_5$ - $w_{8 + i} = s_i$, with randomness $r$, the first column commitment is $$ r H + c L_0 + Q L_1 + v L_2 + \sum_{i=1}^5 \alpha_i L_{2+i} + \sum_{i < n - 10} s_i L_{8 + i} $$ which is equal to $$ A + Q L_1 + v L_2 $$ where $A$ is the user's accumulator group element. As a result, the opening proof can omit the first column commitment, which can then be recomputed from $(A, Q, v)$ and which then proves consistency of the values used in the circuit with the values in the accumulator commitment $A$. #### Why the scheme does not require a trusted setup Our SNARK is implemented using a discrete log-based polynomial commitment scheme based on the inner-product argument used in [Bulletproofs](https://eprint.iacr.org/2017/1066.pdf). The exact version we use is a slight variant of the one described in appendix A2 [here](https://eprint.iacr.org/2020/499.pdf). This inner-product argument does not require any trusted setup and the feature is carried over to our SNARK. #### How double spending of the proof-of-reward is prevented The value $c$ in our scheme serves the same role as the "id" in the Earn scheme. When creating a proof-of-reward, the user reveals their $c$ value as part of the opening proof. The party authorizing the ultimate payouts (i.e., a blockchain or a roll-up proof circuit) would check that $c$ values are never used more than once. #### Batch verification As mentioned, the proof system we use has an efficient "batch verifier". Specifically, the cost of verifying $N$ proofs is roughly 5.6ms plus $N$ times 3.3ms. So there is a fixed cost of 5.6ms which can be amortized across a batch of proofs to verify. As a result, when the update-authority server is verifying update requests, it can batch them to cut the cost of verifying proofs from roughly 3.3 + 5.6 = 8.9ms to roughly 3.3ms when batch size is large enough. ## Roll up proofs Here we describe the design of a roll-up system for aggregating payout proofs. We will describe a system that can either work for Mina, or for Ethereum with a slight modification which we describe. ### Circuits As is typical for such systems, the proof system will have three circuits. 1. A "base" circuit (based on the Vesta curve) proving knowledge of a single payout proof. 2. A "wrap" circuit (based on the Pallas curve) which is used to prove knowledge of a Vesta-based proof. This circuit is used to change the field required for verification. 3. A "merge" circuit which will prove knowledge of two other proofs and combine their sets of payouts into a single payout set. O(1) Labs' *pickles* library automates implementation of the "wrap" circuit and of the verifier circuit fragments inside the "merge", and allows the programer to only specify the application logic, which is what we will describe here. This proof system will have as its public input a pair of hashes, corresponding to hash-chains of the payouts that a given proof authorizes. Let us name this as $(\mathsf{pre}, \mathsf{post})$. #### Base circuit The base circuit will prove knowledge of a payout proof $\pi$ verifying against statement $(Q, v)$ (i.e., proving "public key $Q$ is entitiled to payout $v$") and that $\mathsf{post} = H((Q, v), \mathsf{pre})$ where $H$ is a hash function. If the system is to be deployed on Mina, $H$ will be a hash function that wraps $(Q, v)$ in a Mina transaction payload and then invokes Poseidon. If the system is to be deployed on Ethereum, $H$ should be SHA256 or another hash function that is both reasonable efficient in-circuit and efficient in EVM. #### Merge circuit The merge circuit will prove knowledge of an intermediate hash-chain $h$ and of 2 proofs verifying against the statements $(\mathsf{pre}, h)$ and $(h, \mathsf{post})$. ### Deployment We recommend deploying this roll-up as a cluster of prover nodes communicating with a single orchestrator node that parcels out proving work to the prover nodes in order to build a full tree of proofs. We further recommend deploying in a cloud setting with pre-emptible CPUs for cost efficiency. ### Discussion of efficiency and cost In total, from our experience with the efficiency of cloud CPUs, the CPU time required for aggregating $N$ payouts should be somewhere in the neighborhood of $160N$ seconds. The whole task is completely parallelizable and the wall clock time will be roughly $20 \log_2 N$ seconds (around 8.6 minutes for 50 million payouts). Using Google's [pricing list](https://cloud.google.com/compute/all-pricing), the cost of $160N$ seconds for pre-emptible cloud CPUs is about $N \cdot 0.000446$ dollars. ### L1 interaction: Mina The interaction on Mina would be straightforward. There would be a snapp account containing the verification key corresponding to the roll-up proof system. The roll-up proof would be posted directly as a snapp-transaction along with the list of payout payloads. Currently, pre-launch of Mina it is difficult to estimate the exact cost, but we expect it to track with the computation time for proving away those transactions, which is comparable to the cost of the roll-up. Verification of proofs of the type described above is native for Mina and does not incur a significant cost. ### L1 interaction: Ethereum The Ethereum contract for such a deployment would do the following when called with a list of payouts $P$ and a proof $\pi$ 1. Compute the hash-chain $h$ corresponding to $P$. 2. Verify the aggregated proof $\pi$ against the statement $(\varepsilon, h)$. where $\varepsilon$ is a 32 byte string of zeros. 3. Iterate over the list of payouts, and increase each user's BAT token balance with the payout amount. Thus, for $N$ payouts, the cost of the whole on-chain verification is - $N$ times the cost of a SHA256 invocation (or whichever hash function is used). This should be roughly $30N$ gas. - The cost of a single proof verification, which we estimate to be roughly 11,765,022,720 gas. - The cost of the call data, dominated by the $N$ public-key/payout-amount pairs. This should be roughly $16 * (32 + 8)N = 640N$ gas. - $N$ times the cost of updating a token balance This computation can be split up across multiple transactions and blocks. For $N =$ 50 million, the total gas is roughly 45000000000, which at current gas prices is about $0.25 per pay-out. This is above the requested limit. But given that on-chain verification is dominated by the $640N$ cost of posting the payout data, there does not seem to be much of a way out of it besides perhaps only executing on-chain payouts less frequently. That said, we are not EVM experts so it is possible this price could be significantly reduced. ### Advantages of the Mina L1 choice There are several advantages to the use of Mina as the L1. Foremost is the fact that Brave users can fully verify payouts themselves by running a Mina non-consensus node, which can verify the blockchain's SNARK cheaply without the cost of running a full node. Additionally, for future work, Mina's snapps layer supports efficient confidential assets which would enable users to use BAT with additional privacy. Finally, because the proposed roll-up SNARK has verification natively supported on Mina, we expect the on-chain costs of the roll-up payout to be significantly lower than Ethereum. ### Possibility of avoiding L1 entirely Instead of posting proof to an L1, Brave could serve users the rolled-up proofs on demand for them to verify their own payout sets. ## Tooling and implementation recommendations For the new BBA scheme, we propose using O(1) Labs' existing [Rust libraries](https://github.com/o1-labs/marlin/). These libraries contain optimized code for the Pasta curves, and allow you to construct proof systems of the type described in that section. For the roll-up prover implementation, we recommend using O(1) Labs' [*pickles* library](https://github.com/MinaProtocol/mina/tree/develop/src/lib/pickles). This is an OCaml library, but will soon have a Javascript wrapper to ease access.