Try   HackMD

Benchmark Hash in SNARK

Use cases

  • Aggregation for hash-based signature
    • Optimistic requirement: 500k hash/s [1]
  • STARKed binary hash tree
    • Worst-case requirement: 100k hash/s [2]

Requirements

  • Post-quantum friendly
  • Provable soundness
  • Not-too-complex to write spec

Benchmark candidates

Benchmarks

  • Machine spec
    • CPU: i9-13900K
    • RAM: 64GB (2x Crucial DDR5 5600 32GB Dual Channel)
    • OS: Linux 5.15.0-124-generic x86-64
    • Env:
      • RAYON_NUM_THREADS: 4
  • Code: https://github.com/han0110/bench-hash-in-snark/tree/3398a8c
  • Output
    • perm: Number of permutations proving (210..20)
    • time: Average proving time of 10 samples
    • throughput: Average throughput of 10 samples (= perm / time)
    • proof_size: Average proof size of 10 samples
    • peak_memory: Peak memory usage measured by /usr/bin/time the proving command (if the process is killed due to OOM, the row will be filled with -)
  • The implementation of all packages are one round of permutation with fixed initial state, if the actual input length is larger than block length of the hash function, it'd require more constraints on the implementation based on the construction, but the cost should be negligible compared to the permutation.

plonky3

FRI parameter:

  • Field size: 124 (degree-4 extension of 31-bits field)
  • Rate: 1/2
  • Number of query: 256
  • Poseidon2 parameter (notation following https://eprint.iacr.org/2023/323.pdf)
    • n=31  t=16  d=3  RF=8  RP=20
    • p=231224+1
      (KoalaBear)
  • The 124-bits field is too small to reach 128-bits provable soundness, we need roughly 192-bits field but currently there is no available option.
    This makes the throughput below a bit optimistic.
  • The number of query also needs to be increased a bit to reach accurate 128-bits provable soundness, but won't be too far from current one.
    This makes the proof_size below a bit optimistic.

To reproduce, get into the code repository and run:

for i in $(seq 10 20); do
    RAYON_NUM_THREADS=4 ./bench.sh plonky3 blake3 $i
    RAYON_NUM_THREADS=4 ./bench.sh plonky3 keccak $i
    RAYON_NUM_THREADS=4 ./bench.sh plonky3 poseidon2 $i
done;
python3 render_table.py
hash perm time throughput proof_size peak_mem
blake3 210 59.02 ms 17.35 K/s 9.85 MB 100.58 MB
blake3 211 120.67 ms 16.97 K/s 9.96 MB 170.68 MB
blake3 212 242.09 ms 16.92 K/s 10.08 MB 316.95 MB
blake3 213 483.01 ms 16.96 K/s 10.20 MB 605.91 MB
blake3 214 982.55 ms 16.67 K/s 10.33 MB 1.16 GB
blake3 215 2.06 s 15.88 K/s 10.47 MB 2.29 GB
blake3 216 4.35 s 15.06 K/s 10.62 MB 4.54 GB
blake3 217 10.11 s 12.96 K/s 10.77 MB 9.06 GB
blake3 218 23.10 s 11.35 K/s 10.93 MB 18.09 GB
blake3 219 49.49 s 10.59 K/s 11.10 MB 36.15 GB
blake3 220 - - - -
keccak 210 587.01 ms 2.33 K/s 3.89 MB 692.65 MB
keccak 211 1.20 s 2.27 K/s 4.04 MB 1.34 GB
keccak 212 2.50 s 2.19 K/s 4.19 MB 2.66 GB
keccak 213 5.06 s 2.16 K/s 4.35 MB 5.31 GB
keccak 214 10.74 s 2.03 K/s 4.52 MB 10.61 GB
keccak 215 23.03 s 1.90 K/s 4.70 MB 21.20 GB
keccak 216 54.17 s 1.61 K/s 4.89 MB 42.39 GB
keccak 217 - - - -
keccak 218 - - - -
keccak 219 - - - -
keccak 220 - - - -
poseidon2 210 3.61 ms 283.96 K/s 1.68 MB 44.39 MB
poseidon2 211 4.50 ms 455.39 K/s 1.76 MB 44.80 MB
poseidon2 212 6.04 ms 677.63 K/s 1.85 MB 45.20 MB
poseidon2 213 9.86 ms 830.99 K/s 1.95 MB 44.31 MB
poseidon2 214 16.98 ms 965.01 K/s 2.06 MB 44.86 MB
poseidon2 215 34.77 ms 942.38 K/s 2.17 MB 53.09 MB
poseidon2 216 78.66 ms 833.12 K/s 2.30 MB 96.97 MB
poseidon2 217 161.45 ms 811.83 K/s 2.43 MB 184.11 MB
poseidon2 218 327.44 ms 800.58 K/s 2.57 MB 359.19 MB
poseidon2 219 647.94 ms 809.16 K/s 2.71 MB 708.80 MB
poseidon2 220 1.34 s 779.93 K/s 2.87 MB 1.38 GB
Same FRI parameter but with rate = 1/4

To reproduce, get into the code repository and run:

for i in $(seq 10 20); do
    RAYON_NUM_THREADS=4 PCS_LOG_INV_RATE=2 ./bench.sh plonky3 blake3 $i
    RAYON_NUM_THREADS=4 PCS_LOG_INV_RATE=2 ./bench.sh plonky3 keccak $i
    RAYON_NUM_THREADS=4 PCS_LOG_INV_RATE=2 ./bench.sh plonky3 poseidon2 $i
done;
python3 render_table.py
hash perm time throughput proof_size peak_mem
blake3 210 95.24 ms 10.75 K/s 5.10 MB 167.62 MB
blake3 211 184.65 ms 11.09 K/s 5.16 MB 311.15 MB
blake3 212 375.61 ms 10.90 K/s 5.22 MB 600.53 MB
blake3 213 766.45 ms 10.69 K/s 5.29 MB 1.15 GB
blake3 214 1.58 s 10.34 K/s 5.36 MB 2.28 GB
blake3 215 3.30 s 9.94 K/s 5.43 MB 4.54 GB
blake3 216 6.98 s 9.39 K/s 5.51 MB 9.06 GB
blake3 217 16.11 s 8.14 K/s 5.59 MB 18.08 GB
blake3 218 37.18 s 7.05 K/s 5.67 MB 36.14 GB
blake3 219 - - - -
blake3 220 - - - -
keccak 210 942.43 ms 1.45 K/s 2.04 MB 1.34 GB
keccak 211 1.93 s 1.42 K/s 2.12 MB 2.66 GB
keccak 212 3.95 s 1.38 K/s 2.20 MB 5.30 GB
keccak 213 8.07 s 1.35 K/s 2.28 MB 10.60 GB
keccak 214 17.35 s 1.26 K/s 2.37 MB 21.18 GB
keccak 215 37.05 s 1.18 K/s 2.46 MB 42.35 GB
keccak 216 - - - -
keccak 217 - - - -
keccak 218 - - - -
keccak 219 - - - -
keccak 220 - - - -
poseidon2 210 4.03 ms 254.37 K/s 903.14 KB 44.68 MB
poseidon2 211 4.41 ms 464.28 K/s 950.17 KB 44.64 MB
poseidon2 212 7.68 ms 533.45 K/s 0.98 MB 44.70 MB
poseidon2 213 12.51 ms 654.95 K/s 1.03 MB 44.86 MB
poseidon2 214 27.83 ms 588.72 K/s 1.09 MB 52.23 MB
poseidon2 215 59.70 ms 548.83 K/s 1.15 MB 95.39 MB
poseidon2 216 123.37 ms 531.22 K/s 1.22 MB 182.63 MB
poseidon2 217 249.28 ms 525.81 K/s 1.29 MB 357.43 MB
poseidon2 218 500.62 ms 523.64 K/s 1.36 MB 706.47 MB
poseidon2 219 1.02 s 514.07 K/s 1.44 MB 1.37 GB
poseidon2 220 2.09 s 502.81 K/s 1.52 MB 2.74 GB

stwo

FRI parameter:

  • Field size: 124 (degree-4 extension of 31-bits field)
  • Rate: 1/2
  • Number of query: 256
  • Poseidon2 parameter (notation following https://eprint.iacr.org/2023/323.pdf)
    • n=31  t=16  d=5  RF=8  RP=14
    • p=2311
      (Mersenne31)
  • The 124-bits field is too small to reach 128-bits provable soundness, we need roughly 192-bits field but currently there is no available option.
    This makes the throughput below a bit optimistic.
  • The number of query also needs to be increased a bit to reach accurate 128-bits provable soundness, but won't be too far from current one.
    This makes the proof_size below a bit optimistic.

To reproduce, get into the code repository and run:

for i in $(seq 10 20); do
    RAYON_NUM_THREADS=4 ./bench.sh stwo blake2s $i
    RAYON_NUM_THREADS=4 ./bench.sh stwo poseidon2 $i
done;
python3 render_table.py
hash perm time throughput proof_size peak_mem
blake2s 210 807.93 ms 1.27 K/s 3.28 MB 772.36 MB
blake2s 211 898.52 ms 2.28 K/s 3.28 MB 853.77 MB
blake2s 212 1.06 s 3.85 K/s 3.32 MB 1018.40 MB
blake2s 213 1.43 s 5.73 K/s 3.30 MB 1.31 GB
blake2s 214 1.90 s 8.62 K/s 3.43 MB 2.04 GB
blake2s 215 3.10 s 10.58 K/s 3.57 MB 3.49 GB
blake2s 216 5.35 s 12.24 K/s 3.67 MB 6.40 GB
blake2s 217 10.02 s 13.08 K/s 3.79 MB 12.21 GB
blake2s 218 19.57 s 13.40 K/s 3.96 MB 23.83 GB
blake2s 219 - - - -
blake2s 220 - - - -
poseidon2 210 7.28 ms 140.66 K/s 892.29 KB 36.17 MB
poseidon2 211 12.66 ms 161.80 K/s 1.15 MB 36.61 MB
poseidon2 212 19.98 ms 204.98 K/s 1.25 MB 36.62 MB
poseidon2 213 45.35 ms 180.63 K/s 1.40 MB 40.01 MB
poseidon2 214 85.72 ms 191.14 K/s 1.53 MB 75.77 MB
poseidon2 215 137.20 ms 238.84 K/s 1.63 MB 147.96 MB
poseidon2 216 289.18 ms 226.63 K/s 1.70 MB 291.55 MB
poseidon2 217 523.06 ms 250.59 K/s 1.79 MB 579.75 MB
poseidon2 218 1.03 s 253.99 K/s 1.91 MB 1.13 GB
poseidon2 219 1.94 s 269.86 K/s 2.00 MB 2.25 GB
poseidon2 220 2.44 s 429.44 K/s 2.12 MB 4.50 GB
Same FRI parameter but with rate = 1/4

To reproduce, get into the code repository and run:

for i in $(seq 10 20); do
    RAYON_NUM_THREADS=4 PCS_LOG_INV_RATE=2 ./bench.sh stwo blake2s $i
    RAYON_NUM_THREADS=4 PCS_LOG_INV_RATE=2 ./bench.sh stwo poseidon2 $i
done;
python3 render_table.py
hash perm time throughput proof_size peak_mem
blake2s 210 1.35 s 756.10 /s 1.81 MB 1.53 GB
blake2s 211 1.49 s 1.38 K/s 1.81 MB 1.67 GB
blake2s 212 1.74 s 2.36 K/s 1.81 MB 1.93 GB
blake2s 213 2.24 s 3.65 K/s 1.81 MB 2.46 GB
blake2s 214 2.92 s 5.61 K/s 1.88 MB 3.82 GB
blake2s 215 4.70 s 6.97 K/s 1.94 MB 6.67 GB
blake2s 216 7.92 s 8.28 K/s 2.01 MB 12.37 GB
blake2s 217 - - - -
blake2s 218 - - - -
blake2s 219 - - - -
blake2s 220 - - - -
poseidon2 210 7.11 ms 144.09 K/s 721.87 KB 36.41 MB
poseidon2 211 11.88 ms 172.43 K/s 746.97 KB 36.50 MB
poseidon2 212 21.57 ms 189.87 K/s 810.15 KB 36.62 MB
poseidon2 213 32.85 ms 249.40 K/s 857.73 KB 36.67 MB
poseidon2 214 48.68 ms 336.57 K/s 899.84 KB 63.93 MB
poseidon2 215 126.33 ms 259.38 K/s 947.25 KB 123.99 MB
poseidon2 216 216.96 ms 302.07 K/s 987.39 KB 244.27 MB
poseidon2 217 398.75 ms 328.71 K/s 1.02 MB 484.21 MB
poseidon2 218 819.53 ms 319.87 K/s 1.07 MB 964.48 MB
poseidon2 219 1.64 s 319.43 K/s 1.15 MB 1.88 GB
poseidon2 220 2.30 s 455.03 K/s 1.20 MB 3.76 GB

binius

FRI parameter:

  • Field size: 128
  • Rate: 1/2
  • Number of query: 241
  • Provable soundness: 100-bits
  • The 128-bits field is too small to reach 128-bits provable soundness, we need roughly 150-bits field but currently there is no available option.
    This makes the throughput below a bit optimistic.

To reproduce, get into the code repository and run:

for i in $(seq 10 20); do
    RAYON_NUM_THREADS=4 ./bench.sh binius keccak $i
    RAYON_NUM_THREADS=4 ./bench.sh binius vision $i
done;
python3 render_table.py
hash perm time throughput proof_size peak_mem
keccak 210 372.85 ms 2.75 K/s 429.68 KB 153.11 MB
keccak 211 734.93 ms 2.79 K/s 453.10 KB 288.49 MB
keccak 212 1.41 s 2.90 K/s 536.74 KB 551.79 MB
keccak 213 2.86 s 2.86 K/s 561.60 KB 1.06 GB
keccak 214 5.59 s 2.93 K/s 588.65 KB 2.11 GB
keccak 215 11.12 s 2.95 K/s 619.60 KB 4.26 GB
keccak 216 21.96 s 2.98 K/s 710.77 KB 8.50 GB
keccak 217 43.68 s 3.00 K/s 745.79 KB 16.75 GB
keccak 218 87.99 s 2.98 K/s 780.49 KB 34.11 GB
keccak 219 - - - -
keccak 220 - - - -
vision 210 571.78 ms 1.79 K/s 847.87 KB 155.05 MB
vision 211 886.24 ms 2.31 K/s 867.41 KB 235.49 MB
vision 212 1.47 s 2.78 K/s 890.87 KB 391.64 MB
vision 213 2.37 s 3.46 K/s 974.54 KB 705.71 MB
vision 214 4.18 s 3.92 K/s 999.43 KB 1.28 GB
vision 215 8.04 s 4.08 K/s 1.00 MB 2.44 GB
vision 216 15.19 s 4.31 K/s 1.03 MB 4.77 GB
vision 217 29.34 s 4.47 K/s 1.12 MB 9.54 GB
vision 218 57.11 s 4.59 K/s 1.16 MB 19.07 GB
vision 219 113.15 s 4.63 K/s 1.19 MB 37.87 GB
vision 220 - - - -
Same FRI parameter but with rate = 1/4

To reproduce, get into the code repository and run:

for i in $(seq 10 20); do
    RAYON_NUM_THREADS=4 PCS_LOG_INV_RATE=2 ./bench.sh binius keccak $i
    RAYON_NUM_THREADS=4 PCS_LOG_INV_RATE=2 ./bench.sh binius vision $i
done;
python3 render_table.py
hash perm time throughput proof_size peak_mem
keccak 210 440.88 ms 2.32 K/s 316.43 KB 173.86 MB
keccak 211 827.01 ms 2.48 K/s 331.88 KB 330.25 MB
keccak 212 1.62 s 2.53 K/s 384.24 KB 633.39 MB
keccak 213 3.27 s 2.51 K/s 402.57 KB 1.22 GB
keccak 214 6.46 s 2.54 K/s 421.90 KB 2.42 GB
keccak 215 13.06 s 2.51 K/s 441.98 KB 4.82 GB
keccak 216 25.85 s 2.54 K/s 498.96 KB 9.81 GB
keccak 217 51.52 s 2.54 K/s 521.92 KB 19.70 GB
keccak 218 103.61 s 2.53 K/s 545.87 KB 38.87 GB
keccak 219 - - - -
keccak 220 - - - -
vision 210 625.80 ms 1.64 K/s 739.43 KB 164.54 MB
vision 211 951.49 ms 2.15 K/s 754.16 KB 255.32 MB
vision 212 1.52 s 2.69 K/s 769.65 KB 432.86 MB
vision 213 2.57 s 3.19 K/s 822.04 KB 779.41 MB
vision 214 4.57 s 3.58 K/s 840.40 KB 1.42 GB
vision 215 8.79 s 3.73 K/s 859.76 KB 2.76 GB
vision 216 16.91 s 3.88 K/s 879.87 KB 5.42 GB
vision 217 33.11 s 3.96 K/s 936.88 KB 10.92 GB
vision 218 65.25 s 4.02 K/s 959.87 KB 21.57 GB
vision 219 127.21 s 4.12 K/s 983.85 KB 43.49 GB
vision 220 - - - -

hashcaster

Since hashcaster only implements the PIOP part, the benchmark below uses Binius commitment scheme for it for completeness.

FRI parameter:

  • Field size: 128
  • Rate: 1/2
  • Number of query: 241
  • Provable soundness: 100-bits
  • The 128-bits field is too small to reach 128-bits provable soundness, we need roughly 150-bits field but currently there is no available option.
    This makes the throughput below a bit optimistic.
  • The iota step in keccak-f is not implemented yet, tho it should be negligible because it could be deferred and merged in the linear layer.
    This makes the throughput below a bit optimistic.

To reproduce, get into the code repository and run:

for i in $(seq 10 20); do
    RAYON_NUM_THREADS=4 ./bench.sh hashcaster keccak $i
done;
python3 render_table.py
hash perm time throughput proof_size peak_mem
keccak 210 84.53 ms 18.17 K/s 522.81 KB 47.09 MB
keccak 211 126.91 ms 24.21 K/s 540.91 KB 46.80 MB
keccak 212 202.20 ms 30.39 K/s 622.99 KB 86.95 MB
keccak 213 338.92 ms 36.26 K/s 642.52 KB 168.63 MB
keccak 214 676.60 ms 36.32 K/s 664.24 KB 333.86 MB
keccak 215 1.35 s 36.44 K/s 689.87 KB 662.56 MB
keccak 216 2.87 s 34.22 K/s 779.48 KB 1.31 GB
keccak 217 6.15 s 31.99 K/s 806.55 KB 2.59 GB
keccak 218 12.87 s 30.54 K/s 835.80 KB 5.12 GB
keccak 219 25.81 s 30.47 K/s 868.95 KB 10.20 GB
keccak 220 52.34 s 30.05 K/s 966.10 KB 20.39 GB

expander

  • PCS is not finished yet so the commitment is just raw witness.
    This makes the proof_size below much bigger than expected.
  • The 93-bits field used in GKR is too small to reach 128-bits provable soundness.
    This makes the throughput below a bit optimistic.

To reproduce, get into the code repository and run:

cd expander/circuit/
go run gf2_keccak.go
go run m31_poseidon.go
cd ../../
for i in $(seq 10 20); do
    RAYON_NUM_THREADS=4 ./bench.sh expander keccak $i
    RAYON_NUM_THREADS=4 ./bench.sh expander poseidon $i
python3 render_table.py
hash perm time throughput proof_size peak_mem
keccak 210 481.14 ms 2.13 K/s 518.82 KB 3.93 GB
keccak 211 1.10 s 1.87 K/s 788.70 KB 7.86 GB
keccak 212 2.32 s 1.76 K/s 1.28 MB 15.70 GB
keccak 213 4.97 s 1.65 K/s 2.30 MB 31.34 GB
keccak 214 - - - -
keccak 215 - - - -
keccak 216 - - - -
keccak 217 - - - -
keccak 218 - - - -
keccak 219 - - - -
keccak 220 - - - -
poseidon 210 2.40 ms 426.93 K/s 222.76 KB 61.77 MB
poseidon 211 4.99 ms 410.02 K/s 357.09 KB 62.34 MB
poseidon 212 9.68 ms 423.27 K/s 619.41 KB 119.96 MB
poseidon 213 18.74 ms 437.14 K/s 1.11 MB 235.33 MB
poseidon 214 37.95 ms 431.75 K/s 2.12 MB 465.13 MB
poseidon 215 78.41 ms 417.88 K/s 4.12 MB 926.41 MB
poseidon 216 180.71 ms 362.65 K/s 8.13 MB 1.81 GB
poseidon 217 449.54 ms 291.57 K/s 16.14 MB 3.59 GB
poseidon 218 1.09 s 241.31 K/s 32.14 MB 7.18 GB
poseidon 219 2.33 s 225.40 K/s 64.15 MB 14.36 GB
poseidon 220 4.59 s 228.30 K/s 128.15 MB 28.74 GB

Possible further optimizations

  • hashcaster
    • Use binius PCS on polyval field directly

  1. 8k signatures * 250 hashes per signature by 4 seconds in optimistic scenario. ↩︎

  2. https://vitalik.eth.limo/general/2024/10/23/futures4.html#starked-binary-hash-trees
    https://ethresear.ch/t/proposal-delay-stateroot-reference-to-increase-throughput-and-reduce-latency/20490 ↩︎