# Benchmarking Plonky3 vs. Vortex
**TL;DR**: If you're only interested in the benchmarking results for different polynomial commitment schemes (PCSs), feel free to skip ahead to the [performance results](https://hackmd.io/@YaoGalteland/SJ1WmzgTJg#Benchmark-Results).
This blog compares **Plonky3**'s FRI PCS and **Vortex** using identical parameters to ensure a fair performance evaluation.
The source code used in this benchmarking can be found at the following links:
- [Vortex](https://github.com/Consensys/gnark-crypto/blob/master/field/koalabear/vortex/prover_test.go#L232) — from the `gnark-crypto` repository.
- [Plonky3's FRI PCS](https://github.com/YaoJGalteland/Plonky3/tree/main/plonky3-pcs) — forked from the `Plonky3` repository.
## Introduction
Both PCSs use a Reed-Solomon code to encode the input trace.
FRI commits to the encoded trace by building a Merkle tree directly over it.
Vortex, on the other hand, first computes a SIS hash of the encoded trace, and then constructs a Merkle tree over these SIS hashes.
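To make the difference concrete, below is a minimal conceptual sketch of the two commit paths. It is Rust-flavored pseudocode, not either library's actual API: `rs_encode`, `sis_hash`, and `merkle_root` are hypothetical placeholders for the real primitives.
```rust
// Conceptual sketch only: placeholder types and stub bodies, not the
// Plonky3 or gnark-crypto APIs.
type F = u64;           // stand-in for a KoalaBear field element
type Digest = [u8; 32]; // stand-in for a Poseidon2 Merkle root

fn rs_encode(poly: &[F], blowup: usize) -> Vec<F> {
    // Low-degree extension: re-evaluate the polynomial on a domain
    // `blowup` times larger (real encoding omitted).
    let mut out = poly.to_vec();
    out.resize(poly.len() * blowup, 0);
    out
}

fn sis_hash(evals_at_point: &[F]) -> Vec<F> {
    // Ring-SIS hash: compresses one column of the encoded matrix into a
    // short vector of field elements (real hash omitted).
    evals_at_point.iter().take(8).copied().collect()
}

fn merkle_root(leaves: &[Vec<F>]) -> Digest {
    // Poseidon2 Merkle tree over the leaves (real tree omitted).
    let _ = leaves;
    [0u8; 32]
}

// FRI-style commitment: one Merkle leaf per point of the extended domain,
// holding the encoded evaluations of every polynomial at that point.
fn commit_fri(polys: &[Vec<F>], blowup: usize) -> Digest {
    let encoded: Vec<Vec<F>> = polys.iter().map(|p| rs_encode(p, blowup)).collect();
    let leaves: Vec<Vec<F>> = (0..encoded[0].len())
        .map(|i| encoded.iter().map(|e| e[i]).collect())
        .collect();
    merkle_root(&leaves)
}

// Vortex commitment: the same encoded matrix, but each leaf is first
// compressed with a SIS hash before the Merkle tree is built.
fn commit_vortex(polys: &[Vec<F>], blowup: usize) -> Digest {
    let encoded: Vec<Vec<F>> = polys.iter().map(|p| rs_encode(p, blowup)).collect();
    let sis_digests: Vec<Vec<F>> = (0..encoded[0].len())
        .map(|i| sis_hash(&encoded.iter().map(|e| e[i]).collect::<Vec<F>>()))
        .collect();
    merkle_root(&sis_digests)
}
```
In both schemes, the opening phase later reveals a subset of these leaves together with their Merkle authentication paths.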
For a deeper theoretical introduction, refer to:
- [FRI PCS](https://dev.risczero.com/proof-system/stark-by-hand#lesson-11-fri-protocol-commit-phase)
- [Vortex](https://hackmd.io/@YaoGalteland/BJtzD2qCkx), the PCS used in Linea
## Benchmark Setup
We generate a random trace and benchmark the commit and open times of the FRI PCS in Plonky3. The results are compared against [Vortex](https://github.com/Consensys/gnark-crypto/blob/master/field/koalabear/vortex/prover_test.go#L235), focusing on commit and open times to evaluate efficiency and scalability.
### Parameter Settings
The following parameters are set to ensure a fair comparison.
#### Trace Dimension
- Plonky3: The trace has $2^{19}$ rows and $2^{11}$ columns, approximately 4 GB of data (see the size check below).
- Vortex: The same size is used but represented as $2^{11}$ rows and $2^{19}$ columns (Vortex rotates the trace table for implementation purposes).
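As a quick check on the quoted size (assuming the in-memory representation uses 4 bytes per KoalaBear element):
```rust
fn main() {
    // 2^19 rows x 2^11 columns = 2^30 field elements, 4 bytes each.
    let elements: u64 = 1 << (19 + 11);
    let bytes = elements * 4;
    println!("unencoded trace: {} GiB", bytes >> 30); // prints 4
}
```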
#### `FriConfig`
- Plonky3:
```rust
let fri_config = FriConfig {
    log_blowup: 1,          // blowup factor 2^1 = 2, i.e. code rate 1/2
    log_final_poly_len: 0,  // fold down to a constant final polynomial
    num_queries: 256,       // number of FRI query rounds
    proof_of_work_bits: 16, // grinding bits before query sampling
    mmcs,
};
```
- Vortex:
```go
invRate = 2              // inverse Reed-Solomon code rate (blowup factor 2)
numSelectedColumns = 256 // number of columns opened at query time
```
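The two parameter sets are intended to coincide: `log_blowup = 1` is a blowup factor of $2^1 = 2$, i.e. Vortex's `invRate = 2`, and `num_queries = 256` plays the same role as `numSelectedColumns = 256`. A trivial sanity check of that mapping (a sketch, not code from either benchmark):
```rust
fn main() {
    let log_blowup: u32 = 1;  // Plonky3
    let inv_rate: u32 = 2;    // Vortex
    assert_eq!(1u32 << log_blowup, inv_rate);

    let num_queries: usize = 256;          // Plonky3
    let num_selected_columns: usize = 256; // Vortex
    assert_eq!(num_queries, num_selected_columns);
}
```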
#### Extension/Challenge Field
The extension field, from which challenges are sampled, is the degree-4 extension of KoalaBear.
- Plonky3:
```rust
type EF = BinomialExtensionField<KoalaBear, 4>;
```
- Vortex:
```go
alpha = randFext(topRng) // challenge sampled from the degree-4 extension field
```
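For context (a property of the field, not taken from the benchmark code): KoalaBear is the 31-bit prime field with $p = 2^{31} - 2^{24} + 1$, so sampling challenges from its degree-4 extension gives a challenge space of size
$$
p^4 = \left(2^{31} - 2^{24} + 1\right)^4 \approx 2^{124},
$$
large enough that the soundness loss from challenge sampling stays small for both schemes.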
#### Merkle Hashes
- Plonky3: The `MerkleTreeMmcs` is instantiated with the Poseidon2 hash (see the wiring sketch below).
- Vortex: Uses Poseidon2 to compute the Merkle tree over the SIS hashes.
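The exact instantiation is in the linked benchmark code. As a rough sketch only, a typical Plonky3-style wiring of a Poseidon2 Merkle commitment over KoalaBear looks like the following; the type names and constants follow Plonky3's public examples and are assumptions here, not copied from the fork.
```rust
use p3_field::Field;
use p3_koala_bear::{KoalaBear, Poseidon2KoalaBear};
use p3_merkle_tree::MerkleTreeMmcs;
use p3_symmetric::{PaddingFreeSponge, TruncatedPermutation};

type Val = KoalaBear;
// Width-16 Poseidon2 permutation over KoalaBear.
type Perm = Poseidon2KoalaBear<16>;
// Sponge for hashing Merkle leaves (rate 8, 8-element digests) and a
// 2-to-1 compression function for internal nodes.
type MyHash = PaddingFreeSponge<Perm, 16, 8, 8>;
type MyCompress = TruncatedPermutation<Perm, 2, 8, 16>;
// Merkle-tree commitment scheme (MMCS) over KoalaBear values.
type ValMmcs =
    MerkleTreeMmcs<<Val as Field>::Packing, <Val as Field>::Packing, MyHash, MyCompress, 8>;
```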
### EC2 Instance Type and CPU Capabilities
The benchmarks are run on the following server:
* Instance Type: c7a.8xlarge
* Architecture: x86_64
* Model name: AMD EPYC 9R14
* Core(s) per socket: 32
* AVX-512 Support: ✅
## Running the Benchmarks
**Environment**: AVX512F enabled, nightly Rust, parallel feature enabled
- For the **FRI PCS** in Plonky3, run:
```bash
cd Plonky3/plonky3-pcs
rustup default nightly
RUSTFLAGS="-C target-cpu=native -C opt-level=3 -C target-feature=+avx512f" cargo +nightly bench --features "nightly-features" --features parallel
```
- For Vortex, run:
```bash
cd field/koalabear/vortex
go test -run=NONE -bench=BenchmarkVortexReal -count=5
```
:::info
We continue to update our library and improve Vortex's performance. The results below were measured as of April 10, 2025.
:::
## Benchmark Results
The benchmarks measure:
- Commit time
- Open time
| Library | Commit to Trace (s) | Open (ms) |
|-------------------|---------|--------|
| Plonky3 (FRI PCS) | 4.3378  | 273.47 |
| Vortex            | 1.176   | 375.9  |
### Acknowledgments
Special thanks to Alexandre Belling, Gautam Botrel and Daniel Lubarov for their feedback and review.
## References
- [Benchmarking Vortex](https://github.com/Consensys/gnark-crypto/blob/master/field/koalabear/vortex/prover_test.go#L232)
- [Benchmarking Plonky3](https://github.com/YaoJGalteland/Plonky3/tree/main/plonky3-pcs)