# Native Nova SHA256 bench
*Context: See https://hackmd.io/0gVClQ9IQiSXHYAK0Up9hg?view= for previous Nova benchmarks, with more of a focus on recursion.*
Hardware: Macbook Pro M1 Max (2021), 64GB memory.
Native Nova SHA256 benchmark with varying preimage size.
Code: https://github.com/srinathsetty/Nova/blob/main/benches/sha256.rs
| Preimage size | Constraints | Prove time |
| ------------- | ----------- | ---------- |
| 64B | 55k | 53ms |
| 128B | 82k | 59ms |
| 256B | 135k | 77ms |
| 512B | 242k | 106ms |
| 1KB | 456k | 182ms |
| 2KB | 883k | 320ms |
| 4KB | 1.7m | 521ms |
| 8KB | 3.4m | 1s |
| 16KB | 6.8m | 1.9s |
| 32KB | 13.7m | 4.1s |
| 64KB | 27.4m | 7.7s |
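
As a quick sanity check on scaling, the table rows can be turned into prover throughput (a standalone back-of-the-envelope snippet, not part of the benchmark): throughput levels off around ~3.3-3.6M constraints/s for the larger sizes, i.e. prove time grows roughly linearly in constraint count once fixed per-proof costs are amortized.

```rust
// Back-of-the-envelope: prover throughput per row of the table above.
// (preimage size, constraints, prove time in ms) copied from the results.
fn main() {
    let rows: [(&str, f64, f64); 11] = [
        ("64B", 55_000.0, 53.0),
        ("128B", 82_000.0, 59.0),
        ("256B", 135_000.0, 77.0),
        ("512B", 242_000.0, 106.0),
        ("1KB", 456_000.0, 182.0),
        ("2KB", 883_000.0, 320.0),
        ("4KB", 1_700_000.0, 521.0),
        ("8KB", 3_400_000.0, 1_000.0),
        ("16KB", 6_800_000.0, 1_900.0),
        ("32KB", 13_700_000.0, 4_100.0),
        ("64KB", 27_400_000.0, 7_700.0),
    ];
    for (size, constraints, ms) in rows {
        // Millions of constraints proved per second.
        let mcps = constraints / ms / 1_000.0;
        println!("{size:>4}: {mcps:.1}M constraints/s");
    }
}
```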
## Comment
- Using native Nova and the bellperson SHA256 gadget, not via Nova Scotia
- Size is the SHA256 preimage size, not the proof size
- Constraint counts are per step (primary circuit); the secondary circuit is a constant ~10k constraints
- A single fold, no recursion (rough circuit sketch below)
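
For reference, a rough sketch of what the per-step circuit does, assuming the usual bellperson SHA256 gadget flow: allocate the preimage bytes as boolean witnesses and feed them to the gadget, with Nova folding this single step against its constant-size secondary circuit. Struct and namespace names are illustrative and bit-ordering details are glossed over, so treat this as an approximation of `benches/sha256.rs`, not a copy of it.

```rust
// Rough sketch of the per-step circuit: allocate the preimage as boolean
// witnesses and hash it with bellperson's SHA256 gadget. The gadget accounts
// for the bulk of the constraint counts in the table above.
use bellperson::{
    gadgets::{
        boolean::{AllocatedBit, Boolean},
        sha256::sha256,
    },
    ConstraintSystem, SynthesisError,
};
use ff::PrimeField;

#[derive(Clone)]
struct Sha256Circuit {
    preimage: Vec<u8>, // 64B .. 64KB in the table above
}

impl Sha256Circuit {
    // Synthesize the hash; in the bench this sits inside a Nova StepCircuit
    // implementation, which is omitted here.
    fn hash<Scalar: PrimeField, CS: ConstraintSystem<Scalar>>(
        &self,
        cs: &mut CS,
    ) -> Result<Vec<Boolean>, SynthesisError> {
        // Allocate every preimage bit as a witness (bit order is illustrative).
        let mut bits = Vec::with_capacity(self.preimage.len() * 8);
        for (i, byte) in self.preimage.iter().enumerate() {
            for j in (0..8).rev() {
                let bit = ((byte >> j) & 1) == 1;
                bits.push(Boolean::from(AllocatedBit::alloc(
                    cs.namespace(|| format!("preimage bit {i}.{j}")),
                    Some(bit),
                )?));
            }
        }
        // SHA256 over the allocated bits; returns the 256 digest bits.
        sha256(cs.namespace(|| "sha256"), &bits)
    }
}
```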
**Comparing with Celer Network Benchmarks:**
- https://github.com/celer-network/zk-benchmark
- Hardware: also a MacBook Pro, but M1 with 16GB memory vs the M1 Max used here
- Starky also takes ~8s for a 64KB preimage (about the same as Nova)
- Halo2 and Plonky take ~100s for an ~8KB preimage (i.e. Nova is ~100x faster)
**Comparing with Nova Scotia SHA256:**
- See https://hackmd.io/0gVClQ9IQiSXHYAK0Up9hg?view=
- Multiple SHA256 hashes per step (p=100), also a single fold (k=1)
- Written in Circom
- 2.9m constraints
- 635ms for prove step (base case)
- If done recursively (e.g. 10 or 100 folds), each recursive proof step takes ~2s, roughly 5x the base case
- Step time also appears to grow across folds, from 1.3s to 3.1s; memory leak?
**Comparing with Halo2 SHA256 benchmarks:**
- In the varying-preimage-size case, 8KB takes ~100s
- An 8KB SHA256 preimage takes ~3m constraints in R1CS
- For the recursive hashing case, ~3m constraints gives you 100 hashes in one fold
- With 100 hashes using lookup tables, Halo2 takes ~1.6s
- I.e. a ~100x difference for a comparable constraint budget, which explains the disparity (rough arithmetic below)
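
Putting the numbers above side by side, with ~3m R1CS constraints (or constraint-equivalents) as the common workload, gives a rough picture of where the gap comes from:

$$
\begin{aligned}
\text{Halo2, no lookups (8KB preimage):}\quad & \sim 100\,\text{s} \;/\; \sim 3\text{m constraints} \\
\text{Halo2, with lookups (100 hashes):}\quad & \sim 1.6\,\text{s} \;/\; \sim 3\text{m constraint-equivalents} \\
\text{Nova (8KB row above):}\quad & \sim 1\,\text{s} \;/\; 3.4\text{m constraints}
\end{aligned}
$$

So for a comparable workload, lookup-based Halo2 lands in the same ballpark as Nova, and the ~100x gap in the varying-preimage numbers appears to come down to the non-lookup circuit.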
**Tldr (tentative):**
- Native Nova benchmarks were reproduced: ~100x faster than Plonky/Halo2 and on par with Starky
- For the same number of constraints and no recursion, the prove step runs at about the same prover speed with Nova Scotia as with native Nova
- Recursion overhead: each recursive step costs ~5x the base case
- Lookup tables appear to account for the ~100x difference for Halo2 (KZG)
- Getting many constraints (>3m) into a circuit with the Circom toolchain takes a long time
- Doing 100 hashes in one circuit vs hashing a single preimage of varying size are different problems
**Things to follow up / answer:**
1) Why and when would we _not_ stick as much as possible into a single fold?
2) Reproduce preimage hash bench with Nova Scotia and e.g. rapidsnark toolchain
3) What would a better recursive benchmark look like that can't easily be solved with e.g. lookups?