Accelerating Elliptic Curve Operations and Finite Field Arithmetic (WASM) (RFC)

Authors: Tom Shen (UCB/Manta), Boyuan Feng (Manta), Shumo Chu (Manta)

Acknowledgement: Thanks for Kobi's comments on pipegener

WASM (WebAssembly) is the de-facto standard for smart contact VM like Polkadot, Definity, Cosmos. And also critical for wallet adoption. However, making ZKP works efficiently on WASM is still a challenge today. In Manta’s vallina benchmark, we can observe 6x - 60x performance penalty on WASM compared with X86 native binary. This WASM ZKP challenge is to bring breakthroughs in compilation to make ZKP on WASM (both prover and verifier).

Single Thread vs Parallelization

We think single thread performance is extremely important, for two reasons:

Many wallet infrastructure, like metamask-snap, only support single threaded execution
Parallel performance will be accelerated by single thread performance improvement as well but not vice versa.

Goal: Test the correctness and performance of WASM binaries on some operations that are common in ZKP systems.

Vallina Results

Please check out benchmark code.

Platform

Intel i7-6560U CPU @ 2.2GHz, 8GB Memory, Linux 16.04 LTS.
Using Arkworks BLS12-381 curve.

FFT Results

Input Vector Length	Output Vector Length	WASM (ms)	Native (ms)	Perf. Penalty
2^14	2^15	111	18	6.2
2^16	2^17	496	86	5.8
2^18	2^19	2373	450	5.3
2^20	2^21	9912	2118	4.7

We note that, for the same input vector length, the output vector length also has impact over the latency. Please see below:

Input Vector Length	Output Vector Length	WASM (ms)	Native (ms)	Perf. Penalty
2^14	2^15	111	18	6.2
2^14	2^16	190	32	5.9
2^14	2^17	358	62	5.8
2^14	2^18	773	129	6.0

MSM Results

Input Vector Length	WASM (ms)	Native (ms)	Perf. Penalty
2^8	478	14	34.1
2^10	1627	38	42.8
2^12	6191	125	49.5
2^14	24243	393	61.7

Pairing Results

Input Vector Length	WASM (ms)	Native (ms)	Perf. Penalty
2^2	97	7	13.9
2^4	368	30	12.3
2^6	1479	121	12.3
2^8	5916	485	12.2

Setting up the benchmark

Input:

\(P_u\): user’s wasm binary
\(P_b\): baseline wasm binary compiled from rust toolchain
\(d \in [0,1)\): difficulty

We prepare some randomized input.
Run \(P_u\), and put prepared input to stdin.
Repeat the last step multiple times (TODO). Measure the average runtime \(T_u\) and record the output \(S_u\).
Run \(P_b\), and put prepared input to stdin.
Repeat the last step multiple times. Measure the average runtime \(T_b\) and record the output \(S_b\).
Accept if \(S_u = S_b\) and \(T_u \le (1 - d) T_b\)

Test Suite 1: Low-degree extension

This test measures the performance of (I)FFT.

Test Field: Bls12-381-Fr

Input:

Radix-2 domain \(D_1\) with \(|D_1| = 2^{19}\), with generator \(g_1\)
Radix-2 domain \(D_2\) with \(|D_2| = 2^{20}\) generator \(g_2\)
random vector \(\vec{v_1}\) of length \(|D_1|\), interpreted as the evaluation of a polynomial \(p\) on \(D_1\).

Output:

vector \(\vec{v_2}\) of length \(|D_2|\) such that \(\vec{v_2}\) is the evaluation of \(p\) on \(D_2\).

Analysis:

A common wasm binary will need one IFFT to get coefficients of \(p\) and one FFT to evaluate \(p\) on \(D_2\)

Test Suite 2: Product of Pairings

This test measures the performance of bilinear pairing.

Test Curve: Bls12-381 as \((G_1, G_2)\)

Input:

\(\vec{v}_1\): Random vector of affine representation of \(G_1\).
\(\vec{v}_2\): Random vector of affine representation of \(G_2\).

We sample \(\vec{v}_1\), \(\vec{v}_2\) by taking random exponent on affine generation. Sampling is not included in benchmark time. Exponents will not be given.

Output:

\(x\in G_T\) such that
\[ x = \prod_{(P, Q)\in\mbox{zip}(\vec{v}_1, \vec{v}_2)} e(P,Q) \]
Analysis:

This operation requires computation of product of pairings.

Test Suite 3: Multi-Scalar Multiplication

This test measures the performance of scalar multiplication.

Test Curve: Bls12-381 as \((G_1, G_2, \mathbb{F}_r)\)

Input:

\(\vec{v}\): Random vector of \(G_1\).
\(\vec{s}\): Random vector of \(\mathbb{F}_r\).

Output:
\[ x = \sum_{(P, s)\in\mbox{zip}(\vec{v}, \vec{s})}sP \]
Analysis:

The naive operation requires computation of \(|\vec{v}|\) scalar multiplication.
Potential improvement ideas:

Pipegener optimization: https://github.com/arkworks-rs/algebra/issues/60

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.