# NEBRA GPU Prover Hardware Guide
# Specs
## Bare Metal
- Badass (NEBRA's in-house machine):
  - 1 x RTX 4090
  - 128G RAM
  - Intel Core i9-13900KF (up to 5.8 GHz)
    - 8 x performance cores
    - 16 x efficiency cores
    - 32 hardware threads in total
- g4dn.metal (AWS):
  - 8 x Tesla T4 GPU
  - 384G RAM
  - 96 vCPU
- rockaway (original CPUs):
  - 2 x 6-core Xeon CPU (1.9 GHz)
  - RTX 3090
  - 377G RAM
- rockaway (upgraded CPUs):
  - 2 x 18-core Xeon Gold 6240 (2.6 GHz)
  - RTX 3090
  - 377G RAM
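The 32-thread figure for Badass follows from the i9-13900KF's hybrid layout: each performance core exposes two hardware threads (Hyper-Threading) and each efficiency core exposes one.

$$
\underbrace{8 \times 2}_{\text{P-cores}} + \underbrace{16 \times 1}_{\text{E-cores}} = 32 \ \text{threads}
$$

The MSM table counts the 24 physical cores for Badass, not the 32 threads.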
## Virtualized
- g4dn.12xlarge:
  - 4 x Tesla T4 GPU
  - 192G RAM
  - 48 vCPU
- g4dn.8xlarge:
  - 1 x Tesla T4 GPU
  - 128G RAM
  - 32 vCPU
- g5.4xlarge:
  - 1 x A10 GPU (24G)
  - 64G RAM
  - 16 vCPU
- g5.8xlarge:
  - 1 x A10 GPU (24G)
  - 128G RAM
  - 32 vCPU
- g5.16xlarge:
  - 1 x A10 GPU (24G)
  - 256G RAM
  - 64 vCPU
# Benchmarks
## MSM (multi-scalar multiplication) of length 2^23
| Instance | CPU cores (vCPUs on AWS) | GPU, memory | CPU-only time (s) | GPU-enabled time (s) |
| - | - | - | - | - |
| Badass | 24 | RTX 4090, 24 GB | 2.69 | 0.32 |
| Rockaway (2 x 6-core Xeon, 1.9 GHz) | 12 | RTX 3090, 24 GB | 11.47 | 0.57 |
| Rockaway (Intel Xeon Gold 6240, 2.6 GHz) | 36 | RTX 3090, 24 GB | 2.57 | 0.528 |
| Rockaway (AMD EPYC 9374F 32-Core) | 32 | RTX 4090, 24 GB | 1.59 | 0.31 |
| g4dn.8xlarge | 32 | T4, 15 GB | 5.49 | 1.28 |
| g5.16xlarge | 64 | A10, 24 GB | 2.10 | 0.45 |
| g5.8xlarge | 32 | A10, 24 GB | 3.88 | 0.45 |
| g5.4xlarge | 16 | A10, 24 GB | 7.31 | 0.45 |
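A quick way to read the table is the GPU speedup: CPU-only time divided by GPU-enabled time. The sketch below hard-codes the table rows (the short instance names are abbreviations of the table's labels) and prints that ratio per instance.

```rust
/// One row of the "MSM of length 2^23" table (times in seconds).
struct MsmRow {
    instance: &'static str,
    cpu_only_s: f64,
    gpu_enabled_s: f64,
}

/// GPU speedup for one row: CPU-only time divided by GPU-enabled time.
fn speedup(row: &MsmRow) -> f64 {
    row.cpu_only_s / row.gpu_enabled_s
}

/// The table rows as data, copied from the table above.
fn msm_rows() -> Vec<MsmRow> {
    let raw: [(&'static str, f64, f64); 8] = [
        ("Badass", 2.69, 0.32),
        ("Rockaway (original CPUs)", 11.47, 0.57),
        ("Rockaway (Xeon Gold 6240)", 2.57, 0.528),
        ("Rockaway (AMD EPYC 9374F)", 1.59, 0.31),
        ("g4dn.8xlarge", 5.49, 1.28),
        ("g5.16xlarge", 2.10, 0.45),
        ("g5.8xlarge", 3.88, 0.45),
        ("g5.4xlarge", 7.31, 0.45),
    ];
    raw.iter()
        .map(|&(instance, cpu_only_s, gpu_enabled_s)| MsmRow {
            instance,
            cpu_only_s,
            gpu_enabled_s,
        })
        .collect()
}

fn main() {
    for row in msm_rows() {
        println!("{:<28} {:>6.2}x", row.instance, speedup(&row));
    }
}
```

For the g5 rows the GPU-enabled time stays at 0.45 s across 16, 32 and 64 vCPUs while the CPU-only time shrinks, so a larger speedup there mostly reflects a weaker CPU baseline rather than a faster GPU.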
## BV Prover Benchmark
- BV (batch verify) prover, single proof, in seconds; measured with `cargo bench --features gpu --bench batch_verify`:
  - Badass: 19.75
  - g5.16xlarge: 29.7
  - g5.12xlarge: 30.74
  - g5.8xlarge: 32.85
  - g5.4xlarge: 39.97
  - g4dn.metal: 49.38
  - g4dn.12xlarge: 50.92
  - g4dn.8xlarge: 55
  - rockaway (original CPUs): 81.78
  - rockaway (Intel Xeon Gold 6240, 2.6 GHz): 42.98
  - rockaway (AMD EPYC 9374F 32-Core): 22.26
<!--
commented out because I believe these are duplicates of the Xeon Gold measurement
- rockaway (upgraded CPUs): 41.162
- rockaway (upgraded CPUs - 1 GPU): 40.409 -->
- Outer prover, single proof, in seconds:
  - Badass: 32.45
  - g5.16xlarge: 55.31
  - g5.12xlarge: 54.80
  - g5.8xlarge: 59.66
  - g5.4xlarge: 71.038
  - rockaway (Xeon Gold 6240, 2.6 GHz): 76.85
  - rockaway (AMD EPYC 9374F 32-Core): 39.53
[Concurrent request benchmarks ...](https://docs.google.com/spreadsheets/d/1ADRaROIVPL_ekMNZjueHIZ7Ye7vAFNXNK93HAXf76BQ/edit#gid=0)
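Comparing the two lists, the outer prover takes between about 1.6x and 1.9x the BV prover time on every machine measured for both. The sketch below computes that ratio from the numbers above; treat it only as a rough planning heuristic derived from these runs, not a property of the provers.

```rust
/// (instance, BV prover seconds, outer prover seconds), copied from the lists
/// above; only instances measured for both provers are included.
const TIMINGS: [(&str, f64, f64); 7] = [
    ("Badass", 19.75, 32.45),
    ("g5.16xlarge", 29.7, 55.31),
    ("g5.12xlarge", 30.74, 54.80),
    ("g5.8xlarge", 32.85, 59.66),
    ("g5.4xlarge", 39.97, 71.038),
    ("rockaway (Xeon Gold 6240)", 42.98, 76.85),
    ("rockaway (AMD EPYC 9374F)", 22.26, 39.53),
];

/// How many times longer the outer prover takes than the BV prover.
fn outer_to_bv_ratio(bv_s: f64, outer_s: f64) -> f64 {
    outer_s / bv_s
}

fn main() {
    for &(name, bv_s, outer_s) in TIMINGS.iter() {
        println!("{:<28} outer/BV = {:.2}", name, outer_to_bv_ratio(bv_s, outer_s));
    }
}
```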
## BV Prover Benchmark details
### Badass
```
Start: Phase 1: Witness assignment and MSM commitments
End: Phase 1: Witness assignment and MSM commitments ...........................3.427s
Start: Phase 2: Lookup commit permuted
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................370.114ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................36.657ms
··Start: to_vec
··End: to_vec ..................................................................17.050ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................19.164ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................20.952ms
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................362.263ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................35.662ms
··Start: to_vec
··End: to_vec ..................................................................16.347ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................19.302ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................20.358ms
End: Phase 2: Lookup commit permuted ...........................................1.717s
Start: Phase 3a: Commit to permutations
End: Phase 3a: Commit to permutations ..........................................4.187s
Start: Phase 3b: Lookup commit product
End: Phase 3b: Lookup commit product ...........................................704.458ms
Start: Commit to vanishing argument's random poly
End: Commit to vanishing argument's random poly ................................104.197ms
Start: Calculate advice polys (fft)
End: Calculate advice polys (fft) ..............................................275.227ms
Start: Phase 4: Evaluate h(X)
End: Phase 4: Evaluate h(X) ....................................................4.826s
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................404.288ms
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................569.526ms
Start: Phase 5: multiopen
End: Phase 5: multiopen ........................................................2.037s
```
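The timer output looks like ark-std-style `start_timer!`/`end_timer!` tracing, with `··` marking nested timers; that attribution is an assumption. The hypothetical helper below pulls out the top-level `End:` lines and converts their durations to seconds, so phase totals can be compared across machines. For the Badass run the top-level timers sum to about 18.25 s, against 19.75 s reported by the benchmark, the remainder presumably being untimed work.

```rust
/// Convert a timer duration such as "3.427s" or "370.114ms" into seconds.
/// Returns None for any other unit.
fn parse_duration_s(token: &str) -> Option<f64> {
    if let Some(ms) = token.strip_suffix("ms") {
        ms.parse::<f64>().ok().map(|v| v / 1000.0)
    } else if let Some(s) = token.strip_suffix('s') {
        s.parse::<f64>().ok()
    } else {
        None
    }
}

/// Collect (label, seconds) for every top-level "End:" line. Nested timers start
/// with "··", so they are skipped (their time is already inside the parent phase).
fn top_level_phases(log: &str) -> Vec<(String, f64)> {
    log.lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("End: ")?;
            // Label and duration are separated by a run of dots: "label ......3.427s".
            let (label, dotted) = rest.rsplit_once(" .")?;
            let secs = parse_duration_s(dotted.trim_start_matches('.'))?;
            Some((label.to_string(), secs))
        })
        .collect()
}

/// Top-level "End:" lines from the Badass log, plus one nested line that is ignored.
const BADASS_LOG: &str = r#"
End: Phase 1: Witness assignment and MSM commitments ...........................3.427s
··End: permute_par input hashmap (cpu par) .....................................370.114ms
End: Phase 2: Lookup commit permuted ...........................................1.717s
End: Phase 3a: Commit to permutations ..........................................4.187s
End: Phase 3b: Lookup commit product ...........................................704.458ms
End: Commit to vanishing argument's random poly ................................104.197ms
End: Calculate advice polys (fft) ..............................................275.227ms
End: Phase 4: Evaluate h(X) ....................................................4.826s
End: Commit to vanishing argument's h(X) commitments ...........................404.288ms
End: Commit to vanishing argument's h(X) commitments ...........................569.526ms
End: Phase 5: multiopen ........................................................2.037s
"#;

fn main() {
    let phases = top_level_phases(BADASS_LOG);
    for (label, secs) in &phases {
        println!("{:>8.3}s  {}", secs, label);
    }
    let total: f64 = phases.iter().map(|p| p.1).sum();
    println!("{:>8.3}s  total of top-level timers", total);
}
```

The result is a `Vec` rather than a map because the log repeats the "Commit to vanishing argument's h(X) commitments" label, and both entries should count.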
### Rockaway (updated CPUs)
```
Start: Phase 1: Witness assignment and MSM commitments
End: Phase 1: Witness assignment and MSM commitments ...........................8.200s
Start: Phase 2: Lookup commit permuted
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................565.808ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................123.731ms
··Start: to_vec
··End: to_vec ..................................................................38.869ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................52.650ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................46.858ms
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................422.185ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................113.320ms
··Start: to_vec
··End: to_vec ..................................................................37.426ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................50.308ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................44.098ms
End: Phase 2: Lookup commit permuted ...........................................3.686s
Start: Phase 3a: Commit to permutations
End: Phase 3a: Commit to permutations ..........................................9.067s
Start: Phase 3b: Lookup commit product
End: Phase 3b: Lookup commit product ...........................................1.439s
Start: Commit to vanishing argument's random poly
End: Commit to vanishing argument's random poly ................................221.308ms
Start: Calculate advice polys (fft)
End: Calculate advice polys (fft) ..............................................869.569ms
Start: Phase 4: Evaluate h(X)
End: Phase 4: Evaluate h(X) ....................................................9.296s
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................778.317ms
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................1.039s
Start: Phase 5: multiopen
End: Phase 5: multiopen ........................................................2.692s
```
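Putting the numbered phases from the Badass and Rockaway (updated CPUs) logs side by side shows the slowdown is uneven: Phase 1 runs about 2.4x slower on Rockaway, while multiopen is only about 1.3x slower. Both the CPU (2 x Xeon Gold 6240 vs i9-13900KF) and the GPU (RTX 3090 vs RTX 4090) differ, so these ratios cannot attribute the gap to either one. The sketch hard-codes the phase times from the two logs.

```rust
/// (phase, Badass seconds, Rockaway-with-updated-CPUs seconds).
type Phase = (&'static str, f64, f64);

/// Numbered top-level phases, copied from the two logs above.
static PHASES: [Phase; 6] = [
    ("Phase 1: Witness assignment and MSM commitments", 3.427, 8.200),
    ("Phase 2: Lookup commit permuted", 1.717, 3.686),
    ("Phase 3a: Commit to permutations", 4.187, 9.067),
    ("Phase 3b: Lookup commit product", 0.704458, 1.439),
    ("Phase 4: Evaluate h(X)", 4.826, 9.296),
    ("Phase 5: multiopen", 2.037, 2.692),
];

/// How many times slower Rockaway is than Badass for one phase.
fn slowdown(p: Phase) -> f64 {
    p.2 / p.1
}

/// Names of the phases with the largest and smallest slowdown.
fn most_and_least_affected() -> (&'static str, &'static str) {
    let mut most = PHASES[0];
    let mut least = PHASES[0];
    for &p in PHASES.iter() {
        if slowdown(p) > slowdown(most) {
            most = p;
        }
        if slowdown(p) < slowdown(least) {
            least = p;
        }
    }
    (most.0, least.0)
}

fn main() {
    for &p in PHASES.iter() {
        println!("{:<50} {:.2}x", p.0, slowdown(p));
    }
    let (most, least) = most_and_least_affected();
    println!("most affected: {}\nleast affected: {}", most, least);
}
```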
### Rockaway (updated CPUs), restricted to 1 GPU
```
Start: Phase 1: Witness assignment and MSM commitments
End: Phase 1: Witness assignment and MSM commitments ...........................7.809s
Start: Phase 2: Lookup commit permuted
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................582.723ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................123.674ms
··Start: to_vec
··End: to_vec ..................................................................40.498ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................48.569ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................47.056ms
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................387.511ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................112.539ms
··Start: to_vec
··End: to_vec ..................................................................37.878ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................52.644ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................43.820ms
End: Phase 2: Lookup commit permuted ...........................................3.580s
Start: Phase 3a: Commit to permutations
End: Phase 3a: Commit to permutations ..........................................8.977s
Start: Phase 3b: Lookup commit product
End: Phase 3b: Lookup commit product ...........................................1.419s
Start: Commit to vanishing argument's random poly
End: Commit to vanishing argument's random poly ................................215.236ms
Start: Calculate advice polys (fft)
End: Calculate advice polys (fft) ..............................................790.475ms
Start: Phase 4: Evaluate h(X)
End: Phase 4: Evaluate h(X) ....................................................9.160s
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................765.571ms
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................1.118s
Start: Phase 5: multiopen
End: Phase 5: multiopen ........................................................2.556s
```
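The log does not record how the run was limited to one GPU. One common approach, and only an assumption here, is to set the `CUDA_VISIBLE_DEVICES` environment variable so the CUDA runtime enumerates just the listed devices. The sketch builds the benchmark command from this guide with that variable set; `bench_command` is a hypothetical helper, and nothing runs unless you call `.status()` on the result.

```rust
use std::process::Command;

/// Hypothetical helper: build the BV benchmark command from this guide with only
/// the listed CUDA device indices visible to the process.
fn bench_command(visible_gpus: &[u32]) -> Command {
    // CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    let devices = visible_gpus
        .iter()
        .map(|d| d.to_string())
        .collect::<Vec<_>>()
        .join(",");
    let mut cmd = Command::new("cargo");
    cmd.args(["bench", "--features", "gpu", "--bench", "batch_verify"])
        .env("CUDA_VISIBLE_DEVICES", devices);
    cmd
}

fn main() {
    // Expose only GPU 0; call `.status()` on the command to actually run the benchmark.
    let cmd = bench_command(&[0]);
    println!("{:?}", cmd);
}
```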
## References
- Scroll GPU machine: 4 x RTX 4090, 1T RAM (200G in use), 128 threads (AMD EPYC 7702 64-Core Processor, Zen 2)