owned this note
owned this note
Published
Linked with GitHub
# Accelerating Elliptic Curve Operations and Finite Field Arithmetic (WASM) (RFC)
Authors: Tom Shen (UCB/Manta), Boyuan Feng (Manta), Shumo Chu (Manta)
Acknowledgement: Thanks for Kobi's comments on pipegener
WASM (WebAssembly) is the de-facto standard for smart contact VM like Polkadot, Definity, Cosmos. And also critical for wallet adoption. However, making ZKP works efficiently on WASM is still a challenge today. In Manta’s vallina benchmark, we can observe 6x - 60x performance penalty on WASM compared with X86 native binary. This WASM ZKP challenge is to bring breakthroughs in compilation to make ZKP on WASM (both prover and verifier).
## Single Thread vs Parallelization
We think single thread performance is extremely important, for two reasons:
1. Many wallet infrastructure, like metamask-snap, only support single threaded execution
2. Parallel performance will be accelerated by single thread performance improvement as well but not vice versa.
**Goal:** Test the correctness and performance of WASM binaries on some operations that are common in ZKP systems.
## Vallina Results
Please check out [benchmark code](https://github.com/Manta-Network/wasm-zkp-chanllenge).
### Platform
Intel i7-6560U CPU @ 2.2GHz, 8GB Memory, Linux 16.04 LTS.
Using Arkworks BLS12-381 curve.
### FFT Results
|Input Vector Length | Output Vector Length | WASM (ms) | Native (ms) | Perf. Penalty |
| --- | --- | --- | --- | --- |
| 2^14 | 2^15 | 111 | 18 | 6.2 |
| 2^16 | 2^17 | 496 | 86 | 5.8 |
| 2^18 | 2^19 | 2373 | 450 | 5.3 |
| 2^20 | 2^21 | 9912 | 2118 | 4.7 |
We note that, for the same input vector length, the output vector length also has impact over the latency. Please see below:
|Input Vector Length | Output Vector Length | WASM (ms) | Native (ms) | Perf. Penalty |
| --- | --- | --- | --- | --- |
| 2^14 | 2^15 | 111 | 18 | 6.2 |
| 2^14 | 2^16 | 190 | 32 | 5.9 |
| 2^14 | 2^17 | 358 | 62 | 5.8 |
| 2^14 | 2^18 | 773 | 129 | 6.0|
### MSM Results
|Input Vector Length | WASM (ms) | Native (ms) | Perf. Penalty |
| --- | --- | --- | --- |
| 2^8 | 478 | 14 | 34.1 |
| 2^10 | 1627 | 38 | 42.8 |
| 2^12 | 6191 | 125 | 49.5 |
| 2^14 | 24243 | 393 | 61.7 |
### Pairing Results
|Input Vector Length | WASM (ms) | Native (ms) | Perf. Penalty |
| --- | --- | --- | --- |
| 2^2 | 97 | 7 | 13.9 |
| 2^4 | 368 | 30 | 12.3 |
| 2^6 | 1479 | 121 | 12.3 |
| 2^8 | 5916 | 485 | 12.2 |
### Setting up the benchmark
**Input:**
- $P_u$: user’s wasm binary
- $P_b$: baseline wasm binary compiled from rust toolchain
- $d \in [0,1)$: difficulty
1. We prepare some randomized input.
2. Run $P_u$, and put prepared input to stdin.
3. Repeat the last step multiple times (TODO). Measure the average runtime $T_u$ and record the output $S_u$.
4. Run $P_b$, and put prepared input to stdin.
5. Repeat the last step multiple times. Measure the average runtime $T_b$ and record the output $S_b$.
6. Accept if $S_u = S_b$ and $T_u \le (1 - d) T_b$
## Test Suite 1: Low-degree extension
This test measures the performance of (I)FFT.
**Test Field: Bls12-381-Fr**
**Input:**
- Radix-2 domain $D_1$ with $|D_1| = 2^{19}$, with generator $g_1$
- Radix-2 domain $D_2$ with $|D_2| = 2^{20}$ generator $g_2$
- random vector $\vec{v_1}$ of length $|D_1|$, interpreted as the evaluation of a polynomial $p$ on $D_1$.
**Output:**
vector $\vec{v_2}$ of length $|D_2|$ such that $\vec{v_2}$ is the evaluation of $p$ on $D_2$.
**Analysis:**
A common wasm binary will need one IFFT to get coefficients of $p$ and one FFT to evaluate $p$ on $D_2$
## Test Suite 2: Product of Pairings
This test measures the performance of bilinear pairing.
**Test Curve: Bls12-381** as $(G_1, G_2)$
**Input:**
- $\vec{v}_1$: Random vector of affine representation of $G_1$.
- $\vec{v}_2$: Random vector of affine representation of $G_2$.
We sample $\vec{v}_1$, $\vec{v}_2$ by taking random exponent on affine generation. Sampling is not included in benchmark time. Exponents will not be given.
**Output:**
$x\in G_T$ such that
$$
x = \prod_{(P, Q)\in\mbox{zip}(\vec{v}_1, \vec{v}_2)} e(P,Q)
$$
**Analysis:**
This operation requires computation of product of pairings.
## Test Suite 3: Multi-Scalar Multiplication
This test measures the performance of scalar multiplication.
**Test Curve: Bls12-381** as $(G_1, G_2, \mathbb{F}_r)$
**Input:**
- $\vec{v}$: Random vector of $G_1$.
- $\vec{s}$: Random vector of $\mathbb{F}_r$.
**Output:**
$$
x = \sum_{(P, s)\in\mbox{zip}(\vec{v}, \vec{s})}sP
$$
**Analysis:**
The naive operation requires computation of $|\vec{v}|$ scalar multiplication.
Potential improvement ideas:
1. Pipegener optimization: https://github.com/arkworks-rs/algebra/issues/60