Accelerating Elliptic Curve Operations and Finite Field Arithmetic (WASM) (RFC)

Authors: Tom Shen (UCB/Manta), Boyuan Feng (Manta), Shumo Chu (Manta)

Acknowledgement: Thanks for Kobi's comments on pipegener

WASM (WebAssembly) is the de-facto standard for smart contact VM like Polkadot, Definity, Cosmos. And also critical for wallet adoption. However, making ZKP works efficiently on WASM is still a challenge today. In Manta’s vallina benchmark, we can observe 6x - 60x performance penalty on WASM compared with X86 native binary. This WASM ZKP challenge is to bring breakthroughs in compilation to make ZKP on WASM (both prover and verifier).

Single Thread vs Parallelization

We think single thread performance is extremely important, for two reasons:

  1. Many wallet infrastructure, like metamask-snap, only support single threaded execution
  2. Parallel performance will be accelerated by single thread performance improvement as well but not vice versa.

Goal: Test the correctness and performance of WASM binaries on some operations that are common in ZKP systems.

Vallina Results

Please check out benchmark code.

Platform

Intel i7-6560U CPU @ 2.2GHz, 8GB Memory, Linux 16.04 LTS.
Using Arkworks BLS12-381 curve.

FFT Results

Input Vector Length Output Vector Length WASM (ms) Native (ms) Perf. Penalty
2^14 2^15 111 18 6.2
2^16 2^17 496 86 5.8
2^18 2^19 2373 450 5.3
2^20 2^21 9912 2118 4.7

We note that, for the same input vector length, the output vector length also has impact over the latency. Please see below:

Input Vector Length Output Vector Length WASM (ms) Native (ms) Perf. Penalty
2^14 2^15 111 18 6.2
2^14 2^16 190 32 5.9
2^14 2^17 358 62 5.8
2^14 2^18 773 129 6.0

MSM Results

Input Vector Length WASM (ms) Native (ms) Perf. Penalty
2^8 478 14 34.1
2^10 1627 38 42.8
2^12 6191 125 49.5
2^14 24243 393 61.7

Pairing Results

Input Vector Length WASM (ms) Native (ms) Perf. Penalty
2^2 97 7 13.9
2^4 368 30 12.3
2^6 1479 121 12.3
2^8 5916 485 12.2

Setting up the benchmark

Input:

  • \(P_u\): user’s wasm binary
  • \(P_b\): baseline wasm binary compiled from rust toolchain
  • \(d \in [0,1)\): difficulty
  1. We prepare some randomized input.
  2. Run \(P_u\), and put prepared input to stdin.
  3. Repeat the last step multiple times (TODO). Measure the average runtime \(T_u\) and record the output \(S_u\).
  4. Run \(P_b\), and put prepared input to stdin.
  5. Repeat the last step multiple times. Measure the average runtime \(T_b\) and record the output \(S_b\).
  6. Accept if \(S_u = S_b\) and \(T_u \le (1 - d) T_b\)

Test Suite 1: Low-degree extension

This test measures the performance of (I)FFT.

Test Field: Bls12-381-Fr

Input:

  • Radix-2 domain \(D_1\) with \(|D_1| = 2^{19}\), with generator \(g_1\)
  • Radix-2 domain \(D_2\) with \(|D_2| = 2^{20}\) generator \(g_2\)
  • random vector \(\vec{v_1}\) of length \(|D_1|\), interpreted as the evaluation of a polynomial \(p\) on \(D_1\).

Output:

vector \(\vec{v_2}\) of length \(|D_2|\) such that \(\vec{v_2}\) is the evaluation of \(p\) on \(D_2\).

Analysis:

A common wasm binary will need one IFFT to get coefficients of \(p\) and one FFT to evaluate \(p\) on \(D_2\)

Test Suite 2: Product of Pairings

This test measures the performance of bilinear pairing.

Test Curve: Bls12-381 as \((G_1, G_2)\)

Input:

  • \(\vec{v}_1\): Random vector of affine representation of \(G_1\).
  • \(\vec{v}_2\): Random vector of affine representation of \(G_2\).

We sample \(\vec{v}_1\), \(\vec{v}_2\) by taking random exponent on affine generation. Sampling is not included in benchmark time. Exponents will not be given.

Output:

\(x\in G_T\) such that
\[ x = \prod_{(P, Q)\in\mbox{zip}(\vec{v}_1, \vec{v}_2)} e(P,Q) \]
Analysis:

This operation requires computation of product of pairings.

Test Suite 3: Multi-Scalar Multiplication

This test measures the performance of scalar multiplication.

Test Curve: Bls12-381 as \((G_1, G_2, \mathbb{F}_r)\)

Input:

  • \(\vec{v}\): Random vector of \(G_1\).
  • \(\vec{s}\): Random vector of \(\mathbb{F}_r\).

Output:
\[ x = \sum_{(P, s)\in\mbox{zip}(\vec{v}, \vec{s})}sP \]
Analysis:

The naive operation requires computation of \(|\vec{v}|\) scalar multiplication.
Potential improvement ideas:

  1. Pipegener optimization: https://github.com/arkworks-rs/algebra/issues/60
Select a repo