# Native Nova SHA256 bench

*Context: See https://hackmd.io/0gVClQ9IQiSXHYAK0Up9hg?view= for previous Nova benchmarks, with more of a focus on recursion.*

Hardware: MacBook Pro M1 Max (2021), 64GB memory.

Native Nova SHA256 benchmark with varying preimage size. Code: https://github.com/srinathsetty/Nova/blob/main/benches/sha256.rs

| Size | Constraints | Time  |
| ---- | ----------- | ----- |
| 64B  | 55k         | 53ms  |
| 128B | 82k         | 59ms  |
| 256B | 135k        | 77ms  |
| 512B | 242k        | 106ms |
| 1KB  | 456k        | 182ms |
| 2KB  | 883k        | 320ms |
| 4KB  | 1.7m        | 521ms |
| 8KB  | 3.4m        | 1s    |
| 16KB | 6.8m        | 1.9s  |
| 32KB | 13.7m       | 4.1s  |
| 64KB | 27.4m       | 7.7s  |

## Comment

- Uses native Nova and Bellperson SHA256, not via Nova Scotia
- Size is the SHA256 preimage size, not the proof size
- Constraints are per step (primary circuit); the secondary circuit is a constant ~10k constraints
- A single fold, no recursion

**Comparing with Celer Network Benchmarks:**

- https://github.com/celer-network/zk-benchmark
- Hardware: also a MacBook Pro, but M1 (16GB memory) vs M1 Max
- Starky also takes ~8s for a 64KB preimage (same as Nova)
- Halo2 and Plonky take ~100s for an ~8KB preimage (Nova ~100x faster)

**Comparing with Nova Scotia SHA256:**

- See https://hackmd.io/0gVClQ9IQiSXHYAK0Up9hg?view=
- Multiple SHA256 hashes (p=100), also a single fold (k=1)
- Written in Circom
- 2.9m constraints
- 635ms for the prove step (base case)
- If done recursively (e.g. 10 or 100 folds), each recursive proof step takes ~2s, ~5x the base case; step time also appears to grow from 1.3s to 3.1s over the run - leak?

**Comparing Halo2 SHA256 benchmarks:**

- In the varying-preimage-size case, 8KB takes ~100s
- An 8KB SHA256 preimage takes ~3m R1CS constraints
- In the recursive hashing case, ~3m constraints gives you 100 hashes in one fold
- With 100 hashes using lookup tables, Halo2 takes ~1.6s
- I.e. a ~100x difference, which explains the disparity

**Tldr (tentative):**

- Can reproduce the native Nova benchmarks: ~100x faster than Plonky/Halo2 and on par with Starky
- For the same number of constraints and no recursion, the prove step has the same prover speed for Nova Scotia and native Nova
- Recursion overhead is ~5x the base case for each step
- Lookup tables seem to account for the ~100x difference for Halo2 KZG
- The Circom toolchain takes a long time to get many constraints (>3m) into a circuit
- Doing 100 hashes in a circuit and hashing a single preimage of varying size are different problems

**Things to follow up / answer:**

1) Why and when would we _not_ stick as much as possible into a single fold?
2) Reproduce the preimage-hash bench with Nova Scotia and e.g. the rapidsnark toolchain
3) What would a better recursive benchmark look like, one that can't easily be solved with e.g. lookups?
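
**Appendix: scaling sanity check (sketch):**

A minimal standalone Rust snippet using only the numbers copied from the table above; it is not part of the Nova bench and computes nothing new, just the derived ratios (constraints per preimage byte, prover time per million constraints, and hashing throughput):

```rust
fn main() {
    // (preimage size in bytes, primary-circuit constraints, prover time in ms),
    // copied from the table above.
    let rows: &[(u64, f64, f64)] = &[
        (64, 55e3, 53.0),
        (128, 82e3, 59.0),
        (256, 135e3, 77.0),
        (512, 242e3, 106.0),
        (1 << 10, 456e3, 182.0),
        (1 << 11, 883e3, 320.0),
        (1 << 12, 1.7e6, 521.0),
        (1 << 13, 3.4e6, 1_000.0),
        (1 << 14, 6.8e6, 1_900.0),
        (1 << 15, 13.7e6, 4_100.0),
        (1 << 16, 27.4e6, 7_700.0),
    ];
    println!(
        "{:>8} {:>15} {:>15} {:>8}",
        "size(B)", "constraints/B", "ms per 1M", "KB/s"
    );
    for &(size, constraints, ms) in rows {
        let cpb = constraints / size as f64;          // constraints per preimage byte
        let ms_per_m = ms / (constraints / 1e6);      // prover ms per million constraints
        let kb_per_s = (size as f64 / 1024.0) / (ms / 1000.0); // hashing throughput
        println!("{:>8} {:>15.0} {:>15.0} {:>8.2}", size, cpb, ms_per_m, kb_per_s);
    }
}
```

Derived from the table: constraints per byte drop from ~860 at 64B to ~420 at 64KB, and prover throughput rises from ~1.2KB/s to ~8KB/s, so the per-step fixed cost matters mostly at the small sizes.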