# Benchmarking Plonky3 vs. Vortex

**TL;DR**: If you're only interested in the benchmarking results for different polynomial commitment schemes (PCSs), feel free to skip ahead to the [performance results](https://hackmd.io/@YaoGalteland/SJ1WmzgTJg#Benchmark-Results).

This blog compares **Plonky3**'s FRI PCS and **Vortex** using identical parameters to ensure a fair performance evaluation. The source code used in this benchmarking can be found at the following links:

- [Vortex](https://github.com/Consensys/gnark-crypto/blob/master/field/koalabear/vortex/prover_test.go#L232) — from the `gnark-crypto` repository.
- [Plonky3's FRI PCS](https://github.com/YaoJGalteland/Plonky3/tree/main/plonky3-pcs) — forked from the `Plonky3` repository.

## Introduction

Both PCSs use a Reed-Solomon code to encode the input trace. FRI commits to the encoded trace by building a Merkle tree directly over it. Vortex, on the other hand, first computes a SIS hash of the encoded trace and then constructs a Merkle tree over these SIS hashes.

For a deeper theoretical introduction, refer to:

- [FRI PCS](https://dev.risczero.com/proof-system/stark-by-hand#lesson-11-fri-protocol-commit-phase)
- [Vortex](https://hackmd.io/@YaoGalteland/BJtzD2qCkx), the PCS used in Linea

## Benchmark Setup

We generate a random trace and benchmark the commit and open times for the FRI PCS in Plonky3. The results are compared against [Vortex](https://github.com/Consensys/gnark-crypto/blob/master/field/koalabear/vortex/prover_test.go#L235), focusing on commit and open time to evaluate efficiency and scalability.

### Parameter Settings

The following parameters are set to ensure a fair comparison.

#### Trace Dimension

- Plonky3: The trace has $2^{19}$ rows and $2^{11}$ columns, approximately 4 GB of data.
- Vortex: The same size is used, but represented as $2^{11}$ rows and $2^{19}$ columns (Vortex rotates the trace table for implementation purposes).

#### `FriConfig`

- Plonky3:

  ```rust
  let fri_config = FriConfig {
      log_blowup: 1,
      log_final_poly_len: 0,
      num_queries: 256,
      proof_of_work_bits: 16,
      mmcs,
  };
  ```

- Vortex:

  ```go
  invRate = 2
  numSelectedColumns = 256
  ```

#### Extension/Challenge Field

The extension field, from which challenges are sampled, is the degree-4 extension of KoalaBear.

- Plonky3:

  ```rust
  type EF = BinomialExtensionField<KoalaBear, 4>;
  ```

- Vortex:

  ```go
  alpha = randFext(topRng)
  ```

#### Merkle Hashes

- Plonky3: The `MerkleTreeMmcs` is instantiated with the Poseidon2 hash.
- Vortex: Uses Poseidon2 to compute the Merkle tree over the SIS hashes.
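
Putting these parameters together, here is a quick back-of-the-envelope check of the sizes involved. This is a minimal sketch, not taken from either benchmark; it assumes 4-byte storage per KoalaBear element, column-wise low-degree extension in Plonky3, and row-wise Reed-Solomon encoding (with per-column SIS hashing) in Vortex.

```rust
/// Rough sizes implied by the parameters above (sketch only; the storage
/// layout and encoding orientation are assumptions, not code from either repo).
fn main() {
    let log_height = 19u32; // Plonky3 trace height / Vortex row length
    let width = 1u64 << 11; // Plonky3 trace width / Vortex row count
    let height = 1u64 << log_height;
    let bytes_per_elem = 4u64; // KoalaBear is a 31-bit prime field

    // Raw trace size: 2^19 * 2^11 elements * 4 bytes = 4 GiB.
    let trace_bytes = height * width * bytes_per_elem;
    println!("trace size: {} GiB", trace_bytes >> 30);

    // With log_blowup = 1 (invRate = 2), the encoded domain doubles, so both
    // schemes build their Merkle tree over 2^20 leaves: the rows of the
    // extended matrix for Plonky3, the SIS hashes of the encoded columns for
    // Vortex (assuming the orientation described above).
    let log_blowup = 1u32;
    println!(
        "Merkle leaves after encoding: 2^{}",
        log_height + log_blowup
    );
}
```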
### EC2 Instance Type and CPU Capabilities

The benchmarks are run on the following server:

* Instance Type: c7a.8xlarge
* Architecture: x86_64
* Model name: AMD EPYC 9R14
* Core(s) per socket: 32
* AVX-512 Support: ✅

## Running the Benchmarks

**Environment**: AVX-512F enabled, nightly Rust, `parallel` feature enabled.

- For the **FRI PCS** in Plonky3, run:

  ```bash
  cd Plonky3/plonky3-pcs
  rustup default nightly
  RUSTFLAGS="-C target-cpu=native -C opt-level=3 -C target-feature=+avx512f" cargo +nightly bench --features "nightly-features" --features parallel
  ```

- For Vortex, run:

  ```bash
  cd field/koalabear/vortex
  go test -run=NONE -bench=BenchmarkVortexReal -count=5
  ```

:::info
We are continuously updating our library and improving Vortex performance. The numbers below were measured as of April 10th, 2025.
:::

## Benchmark Results

The benchmarks measure:

- Commit time
- Open time

| Libs | Commit to Trace (s) | Open (ms) |
|-------------------|--------|--------|
| Plonky3 (FRI PCS) | 4.3378 | 273.47 |
| Vortex            | 1.176  | 375.9  |

### Acknowledgments

Special thanks to Alexandre Belling, Gautam Botrel, and Daniel Lubarov for their feedback and review.

## References

- [Benchmarking Vortex](https://github.com/Consensys/gnark-crypto/blob/master/field/koalabear/vortex/prover_test.go#L232)
- [Benchmarking Plonky3](https://github.com/YaoJGalteland/Plonky3/tree/main/plonky3-pcs)