# Benchmark Hash in SNARK
## Use cases
- Aggregation for hash-based signature
- Optimistic requirement: **500k hash/s** [^opt-sig-agg]
- STARKed binary hash tree
- Worst-case requirement: **100k hash/s** [^starked-binary-hash-trees]
[^opt-sig-agg]: 8k signatures * 250 hashes per signature by 4 seconds in optimistic scenario.
[^starked-binary-hash-trees]: https://vitalik.eth.limo/general/2024/10/23/futures4.html#starked-binary-hash-trees <br> https://ethresear.ch/t/proposal-delay-stateroot-reference-to-increase-throughput-and-reduce-latency/20490
## Requirements
- Post-quantum friendly
- Provable soundness
- Not-too-complex to write spec
## Benchmark candidates
- [plonky3][p3] - FRI + AIR
- [Blake3 circuit][p3-blake3-circuit], [Keccak circuit][p3-keccak-circuit] and [Poseidon2 circuit][p3-poseidon2-circuit] are implemented
- [stwo][stwo] - FRI + AIR
- [Blake2s circuit][stwo-blake2s-circuit] and [Poseidon2 circuit][stwo-poseidon2-circuit] are implemented
- [binius][binius]
- [Groestl circuit][binius-groestl-circuit] and [Keccak circuit][binius-keccak-circuit] are implemented
- [hashcaster][hashcaster] - GKR on binary field
- [Keccak circuit][hashcaster-keccak-circuit] is implemented but no PCS connected yet
- [expander][expander] - GKR
- [Keccak circuit][expander-keccak-circuit] and [Poseidon circuit][expander-poseidon-circuit] are implemented
## Benchmarks
- Machine spec
- CPU: i9-13900K
- RAM: 64GB (2x Crucial DDR5 5600 32GB Dual Channel)
- OS: Linux 5.15.0-124-generic x86-64
- Env:
- `RAYON_NUM_THREADS`: `24`
- Code: https://github.com/han0110/bench-hash-in-snark/tree/b482995
- Output
- `perm`: Number of permutations proving (<code>2<sup>10..20</sup></code>)
- `time`: Average proving time of 10 samples
- `throughput`: Average throughput of 10 samples (`= perm / time`)
- `proof_size`: Average proof size of 10 samples
- `peak_memory`: Peak memory usage measured by `/usr/bin/time` the proving command (if the process is killed due to OOM, the row will be filled with `-`)
:::warning
- The implementation of all packages are one round of permutation with fixed initial state, if the actual input length is larger than block length of the hash function, it'd require more constraints on the implementation based on the construction, but the cost should be negligible compared to the permutation.
:::
---
### plonky3
FRI parameter:
- Field size: 124 (degree-4 extension of 31-bits field)
- Rate: <sup>1</sup>/<sub>2</sub>
- Number of query: 256
- Poseidon2 parameter (notation following https://eprint.iacr.org/2023/323.pdf)
- $n = 31 \ \ t = 16 \ \ d = 3 \ \ R_F = 8 \ \ R_P = 20$
- $p = 2^{31} - 2^{24} + 1$ (KoalaBear)
:::warning
- The 124-bits field is too small to reach 128-bits provable soundness, we need roughly 192-bits field but currently there is no available option.
This makes the `throughput` below a bit optimistic.
- The number of query also needs to be increased a bit to reach accurate 128-bits provable soundness, but won't be too far from current one.
This makes the `proof_size` below a bit optimistic.
:::
To reproduce, get into the code repository and run:
```sh
for i in $(seq 10 20); do
RAYON_NUM_THREADS=24 ./bench.sh plonky3 blake3 $i
RAYON_NUM_THREADS=24 ./bench.sh plonky3 keccak $i
RAYON_NUM_THREADS=24 ./bench.sh plonky3 poseidon2 $i
done;
python3 render_table.py
```
| `hash` | `perm` | `time` | `throughput` | `proof_size` | `peak_mem` |
| ----------- | --------------------------- | ----------- | ------------ | ------------ | ----------- |
| `blake3` | <code>2<sup>10</sup></code> | `40.17 ms` | `25.49 K/s` | `9.85 MB` | `112.54 MB` |
| `blake3` | <code>2<sup>11</sup></code> | `78.07 ms` | `26.23 K/s` | `9.96 MB` | `192.92 MB` |
| `blake3` | <code>2<sup>12</sup></code> | `141.73 ms` | `28.90 K/s` | `10.08 MB` | `332.87 MB` |
| `blake3` | <code>2<sup>13</sup></code> | `349.22 ms` | `23.46 K/s` | `10.20 MB` | `627.86 MB` |
| `blake3` | <code>2<sup>14</sup></code> | `755.90 ms` | `21.67 K/s` | `10.33 MB` | `1.20 GB` |
| `blake3` | <code>2<sup>15</sup></code> | `1.77 s` | `18.53 K/s` | `10.47 MB` | `2.33 GB` |
| `blake3` | <code>2<sup>16</sup></code> | `4.15 s` | `15.81 K/s` | `10.62 MB` | `4.60 GB` |
| `blake3` | <code>2<sup>17</sup></code> | `8.95 s` | `14.64 K/s` | `10.77 MB` | `9.12 GB` |
| `blake3` | <code>2<sup>18</sup></code> | `19.37 s` | `13.53 K/s` | `10.93 MB` | `18.15 GB` |
| `blake3` | <code>2<sup>19</sup></code> | `40.80 s` | `12.85 K/s` | `11.10 MB` | `36.21 GB` |
| `blake3` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
| | | | | | |
| `keccak` | <code>2<sup>10</sup></code> | `313.36 ms` | `4.36 K/s` | `3.89 MB` | `706.08 MB` |
| `keccak` | <code>2<sup>11</sup></code> | `674.05 ms` | `4.05 K/s` | `4.04 MB` | `1.35 GB` |
| `keccak` | <code>2<sup>12</sup></code> | `1.71 s` | `3.19 K/s` | `4.19 MB` | `2.68 GB` |
| `keccak` | <code>2<sup>13</sup></code> | `4.20 s` | `2.60 K/s` | `4.35 MB` | `5.33 GB` |
| `keccak` | <code>2<sup>14</sup></code> | `9.87 s` | `2.21 K/s` | `4.52 MB` | `10.62 GB` |
| `keccak` | <code>2<sup>15</sup></code> | `22.93 s` | `1.91 K/s` | `4.70 MB` | `21.21 GB` |
| `keccak` | <code>2<sup>16</sup></code> | `49.11 s` | `1.78 K/s` | `4.88 MB` | `42.41 GB` |
| `keccak` | <code>2<sup>17</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>18</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>19</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
| | | | | | |
| `poseidon2` | <code>2<sup>10</sup></code> | `3.87 ms` | `264.69 K/s` | `1.68 MB` | `46.00 MB` |
| `poseidon2` | <code>2<sup>11</sup></code> | `5.53 ms` | `370.42 K/s` | `1.76 MB` | `45.73 MB` |
| `poseidon2` | <code>2<sup>12</sup></code> | `7.09 ms` | `578.11 K/s` | `1.85 MB` | `45.75 MB` |
| `poseidon2` | <code>2<sup>13</sup></code> | `7.33 ms` | `1.12 M/s` | `1.95 MB` | `45.50 MB` |
| `poseidon2` | <code>2<sup>14</sup></code> | `11.18 ms` | `1.47 M/s` | `2.06 MB` | `45.25 MB` |
| `poseidon2` | <code>2<sup>15</sup></code> | `24.85 ms` | `1.32 M/s` | `2.17 MB` | `56.11 MB` |
| `poseidon2` | <code>2<sup>16</sup></code> | `35.53 ms` | `1.84 M/s` | `2.30 MB` | `100.00 MB` |
| `poseidon2` | <code>2<sup>17</sup></code> | `84.20 ms` | `1.56 M/s` | `2.43 MB` | `188.50 MB` |
| `poseidon2` | <code>2<sup>18</sup></code> | `153.85 ms` | `1.70 M/s` | `2.57 MB` | `363.25 MB` |
| `poseidon2` | <code>2<sup>19</sup></code> | `305.82 ms` | `1.71 M/s` | `2.71 MB` | `713.38 MB` |
| `poseidon2` | <code>2<sup>20</sup></code> | `655.07 ms` | `1.60 M/s` | `2.87 MB` | `1.38 GB` |
:::spoiler Same FRI parameter but with rate = <sup>1</sup>/<sub>4</sub>
To reproduce, get into the code repository and run:
```sh
for i in $(seq 10 20); do
RAYON_NUM_THREADS=24 PCS_LOG_INV_RATE=2 ./bench.sh plonky3 blake3 $i
RAYON_NUM_THREADS=24 PCS_LOG_INV_RATE=2 ./bench.sh plonky3 keccak $i
RAYON_NUM_THREADS=24 PCS_LOG_INV_RATE=2 ./bench.sh plonky3 poseidon2 $i
done;
python3 render_table.py
```
| `hash` | `perm` | `time` | `throughput` | `proof_size` | `peak_mem` |
| ----------- | --------------------------- | ----------- | ------------ | ------------ | ----------- |
| `blake3` | <code>2<sup>10</sup></code> | `58.83 ms` | `17.41 K/s` | `5.10 MB` | `183.54 MB` |
| `blake3` | <code>2<sup>11</sup></code> | `111.66 ms` | `18.34 K/s` | `5.16 MB` | `329.06 MB` |
| `blake3` | <code>2<sup>12</sup></code> | `224.94 ms` | `18.21 K/s` | `5.22 MB` | `624.02 MB` |
| `blake3` | <code>2<sup>13</sup></code> | `533.98 ms` | `15.34 K/s` | `5.29 MB` | `1.18 GB` |
| `blake3` | <code>2<sup>14</sup></code> | `1.26 s` | `13.00 K/s` | `5.36 MB` | `2.32 GB` |
| `blake3` | <code>2<sup>15</sup></code> | `2.88 s` | `11.37 K/s` | `5.43 MB` | `4.59 GB` |
| `blake3` | <code>2<sup>16</sup></code> | `6.85 s` | `9.56 K/s` | `5.51 MB` | `9.10 GB` |
| `blake3` | <code>2<sup>17</sup></code> | `14.84 s` | `8.83 K/s` | `5.59 MB` | `18.14 GB` |
| `blake3` | <code>2<sup>18</sup></code> | `31.68 s` | `8.28 K/s` | `5.67 MB` | `36.21 GB` |
| `blake3` | <code>2<sup>19</sup></code> | `-` | `-` | `-` | `-` |
| `blake3` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
| | | | | | |
| `keccak` | <code>2<sup>10</sup></code> | `513.52 ms` | `2.66 K/s` | `2.04 MB` | `1.35 GB` |
| `keccak` | <code>2<sup>11</sup></code> | `1.09 s` | `2.49 K/s` | `2.12 MB` | `2.67 GB` |
| `keccak` | <code>2<sup>12</sup></code> | `2.81 s` | `1.94 K/s` | `2.20 MB` | `5.32 GB` |
| `keccak` | <code>2<sup>13</sup></code> | `6.92 s` | `1.58 K/s` | `2.28 MB` | `10.62 GB` |
| `keccak` | <code>2<sup>14</sup></code> | `16.21 s` | `1.35 K/s` | `2.37 MB` | `21.20 GB` |
| `keccak` | <code>2<sup>15</sup></code> | `37.62 s` | `1.16 K/s` | `2.46 MB` | `42.37 GB` |
| `keccak` | <code>2<sup>16</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>17</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>18</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>19</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
| | | | | | |
| `poseidon2` | <code>2<sup>10</sup></code> | `4.89 ms` | `209.46 K/s` | `902.76 KB` | `45.75 MB` |
| `poseidon2` | <code>2<sup>11</sup></code> | `5.07 ms` | `403.66 K/s` | `949.79 KB` | `45.75 MB` |
| `poseidon2` | <code>2<sup>12</sup></code> | `6.58 ms` | `622.60 K/s` | `0.98 MB` | `45.50 MB` |
| `poseidon2` | <code>2<sup>13</sup></code> | `8.86 ms` | `924.79 K/s` | `1.03 MB` | `45.48 MB` |
| `poseidon2` | <code>2<sup>14</sup></code> | `18.16 ms` | `902.06 K/s` | `1.09 MB` | `54.40 MB` |
| `poseidon2` | <code>2<sup>15</sup></code> | `29.40 ms` | `1.11 M/s` | `1.15 MB` | `97.00 MB` |
| `poseidon2` | <code>2<sup>16</sup></code> | `64.47 ms` | `1.02 M/s` | `1.22 MB` | `187.00 MB` |
| `poseidon2` | <code>2<sup>17</sup></code> | `114.73 ms` | `1.14 M/s` | `1.29 MB` | `362.08 MB` |
| `poseidon2` | <code>2<sup>18</sup></code> | `243.80 ms` | `1.08 M/s` | `1.36 MB` | `711.50 MB` |
| `poseidon2` | <code>2<sup>19</sup></code> | `480.40 ms` | `1.09 M/s` | `1.44 MB` | `1.38 GB` |
| `poseidon2` | <code>2<sup>20</sup></code> | `1.04 s` | `1.01 M/s` | `1.52 MB` | `2.74 GB` |
:::
---
### stwo
FRI parameter:
- Field size: 124 (degree-4 extension of 31-bits field)
- Rate: <sup>1</sup>/<sub>2</sub>
- Number of query: 256
- Poseidon2 parameter (notation following https://eprint.iacr.org/2023/323.pdf)
- $n = 31 \ \ t = 16 \ \ d = 5 \ \ R_F = 8 \ \ R_P = 14$
- $p = 2^{31} - 1$ (Mersenne31)
:::warning
- The 124-bits field is too small to reach 128-bits provable soundness, we need roughly 192-bits field but currently there is no available option.
This makes the `throughput` below a bit optimistic.
- The number of query also needs to be increased a bit to reach accurate 128-bits provable soundness, but won't be too far from current one.
This makes the `proof_size` below a bit optimistic.
:::
To reproduce, get into the code repository and run:
```sh
for i in $(seq 10 20); do
RAYON_NUM_THREADS=24 ./bench.sh stwo blake2s $i
RAYON_NUM_THREADS=24 ./bench.sh stwo poseidon2 $i
done;
python3 render_table.py
```
| `hash` | `perm` | `time` | `throughput` | `proof_size` | `peak_mem` |
| ----------- | --------------------------- | ----------- | ------------ | ------------ | ----------- |
| `blake2s` | <code>2<sup>10</sup></code> | `815.91 ms` | `1.26 K/s` | `3.26 MB` | `793.15 MB` |
| `blake2s` | <code>2<sup>11</sup></code> | `930.92 ms` | `2.20 K/s` | `3.24 MB` | `877.73 MB` |
| `blake2s` | <code>2<sup>12</sup></code> | `1.14 s` | `3.58 K/s` | `3.28 MB` | `1.02 GB` |
| `blake2s` | <code>2<sup>13</sup></code> | `1.43 s` | `5.71 K/s` | `3.29 MB` | `1.34 GB` |
| `blake2s` | <code>2<sup>14</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>15</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>16</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>17</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>18</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>19</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
| | | | | | |
| `poseidon2` | <code>2<sup>10</sup></code> | `8.13 ms` | `125.99 K/s` | `877.30 KB` | `47.00 MB` |
| `poseidon2` | <code>2<sup>11</sup></code> | `16.03 ms` | `127.78 K/s` | `1.13 MB` | `46.75 MB` |
| `poseidon2` | <code>2<sup>12</sup></code> | `18.55 ms` | `220.80 K/s` | `1.21 MB` | `47.00 MB` |
| `poseidon2` | <code>2<sup>13</sup></code> | `31.58 ms` | `259.44 K/s` | `1.39 MB` | `47.00 MB` |
| `poseidon2` | <code>2<sup>14</sup></code> | `65.68 ms` | `249.44 K/s` | `1.49 MB` | `75.50 MB` |
| `poseidon2` | <code>2<sup>15</sup></code> | `124.54 ms` | `263.12 K/s` | `1.57 MB` | `147.50 MB` |
| `poseidon2` | <code>2<sup>16</sup></code> | `251.82 ms` | `260.25 K/s` | `1.66 MB` | `290.50 MB` |
| `poseidon2` | <code>2<sup>17</sup></code> | `493.03 ms` | `265.85 K/s` | `1.75 MB` | `577.75 MB` |
| `poseidon2` | <code>2<sup>18</sup></code> | `1.04 s` | `252.76 K/s` | `1.85 MB` | `1.13 GB` |
| `poseidon2` | <code>2<sup>19</sup></code> | `2.05 s` | `255.67 K/s` | `1.94 MB` | `2.25 GB` |
| `poseidon2` | <code>2<sup>20</sup></code> | `1.61 s` | `650.45 K/s` | `2.04 MB` | `4.49 GB` |
:::spoiler Same FRI parameter but with rate = <sup>1</sup>/<sub>4</sub>
To reproduce, get into the code repository and run:
```sh
for i in $(seq 10 20); do
RAYON_NUM_THREADS=24 PCS_LOG_INV_RATE=2 ./bench.sh stwo blake2s $i
RAYON_NUM_THREADS=24 PCS_LOG_INV_RATE=2 ./bench.sh stwo poseidon2 $i
done;
python3 render_table.py
```
| `hash` | `perm` | `time` | `throughput` | `proof_size` | `peak_mem` |
| ----------- | --------------------------- | ----------- | ------------ | ------------ | ----------- |
| `blake2s` | <code>2<sup>10</sup></code> | `1.44 s` | `712.91 /s` | `1.80 MB` | `1.55 GB` |
| `blake2s` | <code>2<sup>11</sup></code> | `1.59 s` | `1.29 K/s` | `1.79 MB` | `1.69 GB` |
| `blake2s` | <code>2<sup>12</sup></code> | `1.98 s` | `2.07 K/s` | `1.80 MB` | `1.96 GB` |
| `blake2s` | <code>2<sup>13</sup></code> | `2.47 s` | `3.31 K/s` | `1.78 MB` | `2.49 GB` |
| `blake2s` | <code>2<sup>14</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>15</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>16</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>17</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>18</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>19</sup></code> | `-` | `-` | `-` | `-` |
| `blake2s` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
| | | | | | |
| `poseidon2` | <code>2<sup>10</sup></code> | `7.59 ms` | `134.94 K/s` | `658.34 KB` | `46.75 MB` |
| `poseidon2` | <code>2<sup>11</sup></code> | `12.51 ms` | `163.74 K/s` | `723.80 KB` | `47.00 MB` |
| `poseidon2` | <code>2<sup>12</sup></code> | `13.35 ms` | `306.72 K/s` | `790.38 KB` | `46.50 MB` |
| `poseidon2` | <code>2<sup>13</sup></code> | `28.48 ms` | `287.63 K/s` | `815.78 KB` | `47.25 MB` |
| `poseidon2` | <code>2<sup>14</sup></code> | `46.67 ms` | `351.04 K/s` | `858.51 KB` | `63.50 MB` |
| `poseidon2` | <code>2<sup>15</sup></code> | `94.55 ms` | `346.55 K/s` | `913.75 KB` | `123.00 MB` |
| `poseidon2` | <code>2<sup>16</sup></code> | `182.82 ms` | `358.47 K/s` | `959.79 KB` | `242.86 MB` |
| `poseidon2` | <code>2<sup>17</sup></code> | `370.16 ms` | `354.09 K/s` | `0.99 MB` | `481.25 MB` |
| `poseidon2` | <code>2<sup>18</sup></code> | `765.87 ms` | `342.28 K/s` | `1.05 MB` | `960.00 MB` |
| `poseidon2` | <code>2<sup>19</sup></code> | `1.47 s` | `355.55 K/s` | `1.10 MB` | `1.87 GB` |
| `poseidon2` | <code>2<sup>20</sup></code> | `1.44 s` | `730.55 K/s` | `1.15 MB` | `3.73 GB` |
:::
---
### binius
FRI parameter:
- Field size: 128
- Rate: <sup>1</sup>/<sub>2</sub>
- Number of query: 241
- Provable soundness: 100-bits
:::warning
- The 128-bits field is too small to reach 128-bits provable soundness, we need roughly 150-bits field but currently there is no available option.
This makes the `throughput` below a bit optimistic.
:::
To reproduce, get into the code repository and run:
```sh
for i in $(seq 10 20); do
RAYON_NUM_THREADS=24 ./bench.sh binius groestl $i
RAYON_NUM_THREADS=24 ./bench.sh binius keccak $i
done;
python3 render_table.py
```
| `hash` | `perm` | `time` | `throughput` | `proof_size` | `peak_mem` |
| - | - | - | - | - | - |
| `groestl` | <code>2<sup>10</sup></code> | `165.11 ms` | `6.20 K/s` | `280.17 KB` | `57.30 MB` |
| `groestl` | <code>2<sup>11</sup></code> | `159.30 ms` | `12.86 K/s` | `303.33 KB` | `98.02 MB` |
| `groestl` | <code>2<sup>12</sup></code> | `271.24 ms` | `15.10 K/s` | `334.48 KB` | `164.84 MB` |
| `groestl` | <code>2<sup>13</sup></code> | `365.49 ms` | `22.41 K/s` | `389.92 KB` | `260.71 MB` |
| `groestl` | <code>2<sup>14</sup></code> | `415.81 ms` | `39.40 K/s` | `416.61 KB` | `447.43 MB` |
| `groestl` | <code>2<sup>15</sup></code> | `653.14 ms` | `50.17 K/s` | `447.30 KB` | `830.13 MB` |
| `groestl` | <code>2<sup>16</sup></code> | `967.02 ms` | `67.77 K/s` | `485.98 KB` | `1.54 GB` |
| `groestl` | <code>2<sup>17</sup></code> | `1.61 s` | `81.65 K/s` | `548.95 KB` | `3.01 GB` |
| `groestl` | <code>2<sup>18</sup></code> | `2.92 s` | `89.85 K/s` | `583.17 KB` | `5.93 GB` |
| `groestl` | <code>2<sup>19</sup></code> | `5.56 s` | `94.34 K/s` | `621.39 KB` | `11.78 GB` |
| `groestl` | <code>2<sup>20</sup></code> | `11.07 s` | `94.73 K/s` | `669.73 KB` | `23.49 GB` |
| | | | | | |
| `keccak` | <code>2<sup>10</sup></code> | `180.49 ms` | `5.67 K/s` | `342.63 KB` | `227.57 MB` |
| `keccak` | <code>2<sup>11</sup></code> | `295.05 ms` | `6.94 K/s` | `369.30 KB` | `423.50 MB` |
| `keccak` | <code>2<sup>12</sup></code> | `436.59 ms` | `9.38 K/s` | `399.97 KB` | `813.09 MB` |
| `keccak` | <code>2<sup>13</sup></code> | `785.99 ms` | `10.42 K/s` | `438.64 KB` | `1.57 GB` |
| `keccak` | <code>2<sup>14</sup></code> | `1.52 s` | `10.81 K/s` | `501.60 KB` | `3.04 GB` |
| `keccak` | <code>2<sup>15</sup></code> | `3.02 s` | `10.84 K/s` | `535.80 KB` | `6.03 GB` |
| `keccak` | <code>2<sup>16</sup></code> | `5.93 s` | `11.04 K/s` | `574.00 KB` | `12.01 GB` |
| `keccak` | <code>2<sup>17</sup></code> | `11.84 s` | `11.07 K/s` | `622.33 KB` | `23.94 GB` |
| `keccak` | <code>2<sup>18</sup></code> | `23.61 s` | `11.10 K/s` | `693.19 KB` | `47.80 GB` |
| `keccak` | <code>2<sup>19</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
:::spoiler Same FRI parameter but with rate = <sup>1</sup>/<sub>4</sub>
To reproduce, get into the code repository and run:
```sh
for i in $(seq 10 20); do
RAYON_NUM_THREADS=24 PCS_LOG_INV_RATE=2 ./bench.sh binius groestl $i
RAYON_NUM_THREADS=24 PCS_LOG_INV_RATE=2 ./bench.sh binius keccak $i
done;
python3 render_table.py
```
| `hash` | `perm` | `time` | `throughput` | `proof_size` | `peak_mem` |
| - | - | - | - | - | - |
| `groestl` | <code>2<sup>10</sup></code> | `174.84 ms` | `5.86 K/s` | `233.48 KB` | `59.89 MB` |
| `groestl` | <code>2<sup>11</sup></code> | `200.98 ms` | `10.19 K/s` | `258.83 KB` | `103.04 MB` |
| `groestl` | <code>2<sup>12</sup></code> | `252.03 ms` | `16.25 K/s` | `285.20 KB` | `175.86 MB` |
| `groestl` | <code>2<sup>13</sup></code> | `421.06 ms` | `19.46 K/s` | `303.17 KB` | `281.22 MB` |
| `groestl` | <code>2<sup>14</sup></code> | `549.76 ms` | `29.80 K/s` | `325.14 KB` | `488.73 MB` |
| `groestl` | <code>2<sup>15</sup></code> | `754.96 ms` | `43.40 K/s` | `355.11 KB` | `911.30 MB` |
| `groestl` | <code>2<sup>16</sup></code> | `1.23 s` | `53.11 K/s` | `386.11 KB` | `1.71 GB` |
| `groestl` | <code>2<sup>17</sup></code> | `2.00 s` | `65.65 K/s` | `408.70 KB` | `3.34 GB` |
| `groestl` | <code>2<sup>18</sup></code> | `3.48 s` | `75.43 K/s` | `435.30 KB` | `6.60 GB` |
| `groestl` | <code>2<sup>19</sup></code> | `6.75 s` | `77.65 K/s` | `469.89 KB` | `13.13 GB` |
| `groestl` | <code>2<sup>20</sup></code> | `13.67 s` | `76.69 K/s` | `505.52 KB` | `26.19 GB` |
| | | | | | |
| `keccak` | <code>2<sup>10</sup></code> | `150.44 ms` | `6.81 K/s` | `255.88 KB` | `250.97 MB` |
| `keccak` | <code>2<sup>11</sup></code> | `296.99 ms` | `6.90 K/s` | `277.83 KB` | `465.19 MB` |
| `keccak` | <code>2<sup>12</sup></code> | `502.60 ms` | `8.15 K/s` | `307.79 KB` | `901.79 MB` |
| `keccak` | <code>2<sup>13</sup></code> | `948.37 ms` | `8.64 K/s` | `338.77 KB` | `1.74 GB` |
| `keccak` | <code>2<sup>14</sup></code> | `1.84 s` | `8.88 K/s` | `361.35 KB` | `3.38 GB` |
| `keccak` | <code>2<sup>15</sup></code> | `3.60 s` | `9.11 K/s` | `387.93 KB` | `6.71 GB` |
| `keccak` | <code>2<sup>16</sup></code> | `7.21 s` | `9.09 K/s` | `422.50 KB` | `13.35 GB` |
| `keccak` | <code>2<sup>17</sup></code> | `14.32 s` | `9.15 K/s` | `458.11 KB` | `26.64 GB` |
| `keccak` | <code>2<sup>18</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>19</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
:::
---
### hashcaster
Since hashcaster only implements the PIOP part, the benchmark below uses Binius commitment scheme for it for completeness.
FRI parameter:
- Field size: 128
- Rate: <sup>1</sup>/<sub>2</sub>
- Number of query: 241
- Provable soundness: 100-bits
:::warning
- The 128-bits field is too small to reach 128-bits provable soundness, we need roughly 150-bits field but currently there is no available option.
This makes the `throughput` below a bit optimistic.
- The iota step in keccak-f is not implemented yet, tho it should be negligible because it could be deferred and merged in the linear layer.
This makes the `throughput` below a bit optimistic.
:::
To reproduce, get into the code repository and run:
```sh
for i in $(seq 10 20); do
RAYON_NUM_THREADS=24 ./bench.sh hashcaster keccak $i
done;
python3 render_table.py
```
| `hash` | `perm` | `time` | `throughput` | `proof_size` | `peak_mem` |
| -------- | --------------------------- | ----------- | ------------ | ------------ | ----------- |
| `keccak` | <code>2<sup>10</sup></code> | `121.83 ms` | `12.61 K/s` | `522.81 KB` | `47.89 MB` |
| `keccak` | <code>2<sup>11</sup></code> | `155.35 ms` | `19.78 K/s` | `540.91 KB` | `48.12 MB` |
| `keccak` | <code>2<sup>12</sup></code> | `191.01 ms` | `32.17 K/s` | `622.99 KB` | `83.98 MB` |
| `keccak` | <code>2<sup>13</sup></code> | `296.50 ms` | `41.44 K/s` | `642.52 KB` | `167.60 MB` |
| `keccak` | <code>2<sup>14</sup></code> | `484.90 ms` | `50.68 K/s` | `664.24 KB` | `334.00 MB` |
| `keccak` | <code>2<sup>15</sup></code> | `823.76 ms` | `59.67 K/s` | `689.87 KB` | `665.87 MB` |
| `keccak` | <code>2<sup>16</sup></code> | `1.77 s` | `55.59 K/s` | `779.48 KB` | `1.32 GB` |
| `keccak` | <code>2<sup>17</sup></code> | `3.77 s` | `52.16 K/s` | `806.55 KB` | `2.59 GB` |
| `keccak` | <code>2<sup>18</sup></code> | `7.64 s` | `51.47 K/s` | `835.80 KB` | `5.11 GB` |
| `keccak` | <code>2<sup>19</sup></code> | `15.38 s` | `51.13 K/s` | `868.95 KB` | `10.19 GB` |
| `keccak` | <code>2<sup>20</sup></code> | `30.44 s` | `51.67 K/s` | `966.10 KB` | `20.35 GB` |
---
### expander
- Poseidon parameter (notation following https://eprint.iacr.org/2023/323.pdf)
- $n = 31 \ \ t = 16 \ \ d = 5 \ \ R_F = 8 \ \ R_P = 14$
- $p = 2^{31} - 1$ (Mersenne31)
:::warning
- PCS is not finished yet so the commitment is just raw witness.
This makes the `proof_size` below much bigger than expected.
- The 93-bits field used in GKR is too small to reach 128-bits provable soundness.
This makes the `throughput` below a bit optimistic.
:::
To reproduce, get into the code repository and run:
```sh
cd expander/circuit/
go run gf2_keccak.go
go run m31_poseidon.go
cd ../../
for i in $(seq 10 20); do
RAYON_NUM_THREADS=24 ./bench.sh expander keccak $i
RAYON_NUM_THREADS=24 ./bench.sh expander poseidon $i
done
python3 render_table.py
```
| `hash` | `perm` | `time` | `throughput` | `proof_size` | `peak_mem` |
| ---------- | --------------------------- | ----------- | ------------ | ------------ | ----------- |
| `keccak` | <code>2<sup>10</sup></code> | `433.79 ms` | `3.54 K/s` | `1.75 MB` | `4.93 GB` |
| `keccak` | <code>2<sup>11</sup></code> | `989.40 ms` | `3.10 K/s` | `2.21 MB` | `9.85 GB` |
| `keccak` | <code>2<sup>12</sup></code> | `3.07 s` | `2.00 K/s` | `3.04 MB` | `19.69 GB` |
| `keccak` | <code>2<sup>13</sup></code> | `6.23 s` | `1.97 K/s` | `4.62 MB` | `39.38 GB` |
| `keccak` | <code>2<sup>14</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>15</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>16</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>17</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>18</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>19</sup></code> | `-` | `-` | `-` | `-` |
| `keccak` | <code>2<sup>20</sup></code> | `-` | `-` | `-` | `-` |
| | | | | | |
| `poseidon` | <code>2<sup>10</sup></code> | `5.85 ms` | `262.57 K/s` | `684.57 KB` | `63.87 MB` |
| `poseidon` | <code>2<sup>11</sup></code> | `6.17 ms` | `497.73 K/s` | `914.54 KB` | `78.75 MB` |
| `poseidon` | <code>2<sup>12</sup></code> | `9.34 ms` | `657.90 K/s` | `1.31 MB` | `154.25 MB` |
| `poseidon` | <code>2<sup>13</sup></code> | `15.77 ms` | `779.33 K/s` | `2.09 MB` | `307.25 MB` |
| `poseidon` | <code>2<sup>14</sup></code> | `30.90 ms` | `795.36 K/s` | `3.63 MB` | `607.75 MB` |
| `poseidon` | <code>2<sup>15</sup></code> | `65.07 ms` | `755.39 K/s` | `6.67 MB` | `1.18 GB` |
| `poseidon` | <code>2<sup>16</sup></code> | `146.69 ms` | `670.16 K/s` | `12.70 MB` | `2.35 GB` |
| `poseidon` | <code>2<sup>17</sup></code> | `371.74 ms` | `528.89 K/s` | `24.74 MB` | `4.70 GB` |
| `poseidon` | <code>2<sup>18</sup></code> | `1.02 s` | `386.61 K/s` | `48.78 MB` | `9.39 GB` |
| `poseidon` | <code>2<sup>19</sup></code> | `2.36 s` | `333.39 K/s` | `96.81 MB` | `18.76 GB` |
| `poseidon` | <code>2<sup>20</sup></code> | `5.23 s` | `300.56 K/s` | `192.85 MB` | `37.51 GB` |
## Possible further optimizations
- hashcaster
- Use binius PCS on polyval field directly
<!-- -->
[p3]: https://github.com/plonky3/plonky3
[p3-blake3-circuit]: https://github.com/plonky3/plonky3/tree/main/blake3-air
[p3-keccak-circuit]: https://github.com/plonky3/plonky3/tree/main/keccak-air
[p3-poseidon2-circuit]: https://github.com/plonky3/plonky3/tree/main/poseidon2-air
[stwo]: https://github.com/starkware-libs/stwo
[stwo-blake2s-circuit]: https://github.com/starkware-libs/stwo/tree/dev/crates/prover/src/examples/blake
[stwo-poseidon2-circuit]: https://github.com/starkware-libs/stwo/tree/dev/crates/prover/src/examples/poseidon
[binius]: https://github.com/IrreducibleOSS/binius
[binius-groestl-circuit]: https://github.com/IrreducibleOSS/binius/blob/main/examples/groestl.rs
[binius-keccak-circuit]: https://github.com/IrreducibleOSS/binius/blob/main/examples/keccak.rs
[hashcaster]: https://github.com/morgana-proofs/hashcaster/tree/master
[hashcaster-keccak-circuit]: https://github.com/morgana-proofs/hashcaster/tree/master/src/examples/keccak
[expander]: https://github.com/PolyhedraZK/expander
[expander-keccak-circuit]: https://github.com/PolyhedraZK/ExpanderCompilerCollection/blob/master/ecgo/examples/keccak/main.go
[expander-poseidon-circuit]: https://github.com/PolyhedraZK/ExpanderCompilerCollection/blob/master/ecgo/examples/poseidon_m31/main.go
[^log-perm]: Logarithmic amount of permutations called in circuit.