# NEBRA GPU Prover Hardware Guide

# Specs

## Bare Metal

- Badass (NEBRA's in-house machine):
    - 1 x RTX 4090
    - 128G RAM
    - Intel Core i9-13900KF (5.8 GHz)
        - 8 x performance cores
        - 16 x efficient cores
        - (32 total "threads")
- g4dn.metal (AWS)
    - 8 x Tesla T4 GPU
    - 384G RAM
    - 96 vCPU
- rockaway:
    - 2 x 6 core Xeon CPU (1.9 GHz)
    - RTX 3090
    - 377G RAM
- rockaway (upgraded CPUs):
    - 2 x 18 core Xeon Gold 6240 (2.6 GHz)
    - RTX 3090
    - 377G RAM

## Virtualized

- g4dn.12xlarge
    - 4 x Tesla T4 GPU
    - 192G RAM
    - 48 vCPU
- g4dn.8xlarge
    - 1 x Tesla T4 GPU
    - 128G RAM
    - 32 vCPU
- g5.4xlarge
    - 1 x A10 GPU (24G)
    - 64G RAM
    - 16 vCPU
- g5.8xlarge
    - 1 x A10 GPU (24G)
    - 128G RAM
    - 32 vCPU
- g5.16xlarge
    - 1 x A10 GPU (24G)
    - 256G RAM
    - 64 vCPU

# Benchmarks

## MSM of length 2^23

| Instance | CPU cores | GPU type, memory | CPU-only time (s) | GPU-enabled time (s) |
| - | - | - | - | - |
| Badass | 24 | 4090, 24 GB | 2.69 | 0.32 |
| Rockaway (original 2 x 6 core Xeon, 1.9 GHz) | 12 | 3090, 24 GB | 11.47 | 0.57 |
| Rockaway (Intel Xeon Gold 6240, 2.6 GHz) | 36 | 3090, 24 GB | 2.57 | 0.528 |
| Rockaway (AMD EPYC 9374F 32-Core) | 32 | 4090, 24 GB | 1.59 | 0.31 |
| g4dn.8xlarge | 32 | T4, 15 GB | 5.49 | 1.28 |
| g5.16xlarge | 64 | A10, 24 GB | 2.10 | 0.45 |
| g5.8xlarge | 32 | A10, 24 GB | 3.88 | 0.45 |
| g5.4xlarge | 16 | A10, 24 GB | 7.31 | 0.45 |

## BV Prover Benchmark

- BV prover (single, sec): `cargo bench --features gpu --bench batch_verify`
    - Badass: 19.75
    - g5.16xlarge: 29.7
    - g5.12xlarge: 30.74
    - g5.8xlarge: 32.85
    - g5.4xlarge: 39.97
    - g4dn.metal: 49.38
    - g4dn.12xlarge: 50.92
    - g4dn.8xlarge: 55
    - rockaway: 81.78
    - rockaway (Intel Xeon Gold 6240, 2.6 GHz): 42.98
    - rockaway (AMD EPYC 9374F 32-Core): 22.26
    <!-- commented out because I believe these are duplicates of the Xeon Gold measurement
    - rockaway (upgraded CPUs): 41.162
    - rockaway (upgraded CPUs - 1 GPU): 40.409
    -->
- Outer prover (single, sec)
    - Badass: 32.45
    - g5.16xlarge: 55.31
    - g5.12xlarge: 54.80
    - g5.8xlarge: 59.66
    - g5.4xlarge: 71.038
    - rockaway (Xeon Gold 6240, 2.6 GHz): 76.85
    - rockaway (AMD EPYC 9374F 32-Core): 39.53

[Concurrent request benchmarks ...](https://docs.google.com/spreadsheets/d/1ADRaROIVPL_ekMNZjueHIZ7Ye7vAFNXNK93HAXf76BQ/edit#gid=0)

## BV Prover Benchmark details

### Badass

```
Start: Phase 1: Witness assignment and MSM commitments
End: Phase 1: Witness assignment and MSM commitments ...........................3.427s
Start: Phase 2: Lookup commit permuted
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................370.114ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................36.657ms
··Start: to_vec
··End: to_vec ..................................................................17.050ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................19.164ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................20.952ms
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................362.263ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................35.662ms
··Start: to_vec
··End: to_vec ..................................................................16.347ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................19.302ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................20.358ms
End: Phase 2: Lookup commit permuted ...........................................1.717s
Start: Phase 3a: Commit to permutations
End: Phase 3a: Commit to permutations ..........................................4.187s
Start: Phase 3b: Lookup commit product
End: Phase 3b: Lookup commit product ...........................................704.458ms
Start: Commit to vanishing argument's random poly
End: Commit to vanishing argument's random poly ................................104.197ms
Start: Calculate advice polys (fft)
End: Calculate advice polys (fft) ..............................................275.227ms
Start: Phase 4: Evaluate h(X)
End: Phase 4: Evaluate h(X) ....................................................4.826s
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................404.288ms
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................569.526ms
Start: Phase 5: multiopen
End: Phase 5: multiopen ........................................................2.037s
```

### Rockaway (updated CPUs)

```
Start: Phase 1: Witness assignment and MSM commitments
End: Phase 1: Witness assignment and MSM commitments ...........................8.200s
Start: Phase 2: Lookup commit permuted
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................565.808ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................123.731ms
··Start: to_vec
··End: to_vec ..................................................................38.869ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................52.650ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................46.858ms
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................422.185ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................113.320ms
··Start: to_vec
··End: to_vec ..................................................................37.426ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................50.308ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................44.098ms
End: Phase 2: Lookup commit permuted ...........................................3.686s
Start: Phase 3a: Commit to permutations
End: Phase 3a: Commit to permutations ..........................................9.067s
Start: Phase 3b: Lookup commit product
End: Phase 3b: Lookup commit product ...........................................1.439s
Start: Commit to vanishing argument's random poly
End: Commit to vanishing argument's random poly ................................221.308ms
Start: Calculate advice polys (fft)
End: Calculate advice polys (fft) ..............................................869.569ms
Start: Phase 4: Evaluate h(X)
End: Phase 4: Evaluate h(X) ....................................................9.296s
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................778.317ms
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................1.039s
Start: Phase 5: multiopen
End: Phase 5: multiopen ........................................................2.692s
```

### Rockaway (updated CPUs) - restrict to 1 GPU

```
Start: Phase 1: Witness assignment and MSM commitments
End: Phase 1: Witness assignment and MSM commitments ...........................7.809s
Start: Phase 2: Lookup commit permuted
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................582.723ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................123.674ms
··Start: to_vec
··End: to_vec ..................................................................40.498ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................48.569ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................47.056ms
··Start: permute_par input hashmap (cpu par)
··End: permute_par input hashmap (cpu par) .....................................387.511ms
··Start: permute_par input unique ranges (cpu par)
··End: permute_par input unique ranges (cpu par) ...............................112.539ms
··Start: to_vec
··End: to_vec ..................................................................37.878ms
··Start: permute_par sort table
··End: permute_par sort table ..................................................52.644ms
··Start: leftover table coeffs (cpu par)
··End: leftover table coeffs (cpu par) .........................................43.820ms
End: Phase 2: Lookup commit permuted ...........................................3.580s
Start: Phase 3a: Commit to permutations
End: Phase 3a: Commit to permutations ..........................................8.977s
Start: Phase 3b: Lookup commit product
End: Phase 3b: Lookup commit product ...........................................1.419s
Start: Commit to vanishing argument's random poly
End: Commit to vanishing argument's random poly ................................215.236ms
Start: Calculate advice polys (fft)
End: Calculate advice polys (fft) ..............................................790.475ms
Start: Phase 4: Evaluate h(X)
End: Phase 4: Evaluate h(X) ....................................................9.160s
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................765.571ms
Start: Commit to vanishing argument's h(X) commitments
End: Commit to vanishing argument's h(X) commitments ...........................1.118s
Start: Phase 5: multiopen
End: Phase 5: multiopen ........................................................2.556s
```

## References:

scroll GPU: 4 x 4090, 1T RAM (200G in use), 128 threads (AMD EPYC 7702 64-Core Processor, Zen 2)
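
## Summing the phase logs

The per-phase logs under "BV Prover Benchmark details" are easier to compare across machines once the top-level phases are totalled. The script below is a minimal sketch, not part of the prover codebase. It assumes the log format shown in this guide: `Start:`/`End:` pairs, a `··` prefix on nested steps, and a dotted leader before the elapsed time in `s` or `ms`. The names `END_LINE`, `UNIT_TO_SECONDS`, `top_level_timings`, and `SAMPLE` are made up for illustration. Nested steps are skipped because their time is already counted in the enclosing phase.

```python
import re

# Matches a closing timer line such as
#   "End: Phase 5: multiopen .......2.037s"  or  "··End: to_vec .......17.050ms".
# Assumption: the elapsed time always follows a run of at least two dots.
END_LINE = re.compile(
    r"^(?P<indent>[·\s]*)End:\s+(?P<label>.*?)\s*\.{2,}\s*"
    r"(?P<value>\d+(?:\.\d+)?)(?P<unit>ms|µs|us|ns|s)\s*$"
)

UNIT_TO_SECONDS = {"s": 1.0, "ms": 1e-3, "µs": 1e-6, "us": 1e-6, "ns": 1e-9}


def top_level_timings(log: str) -> list[tuple[str, float]]:
    """Return (label, seconds) for every un-nested "End:" line in the log."""
    timings = []
    for line in log.splitlines():
        match = END_LINE.match(line)
        if match is None:
            continue  # "Start:" lines and blank lines carry no timing
        if "·" in match["indent"]:
            continue  # nested step, already included in its parent phase
        seconds = float(match["value"]) * UNIT_TO_SECONDS[match["unit"]]
        timings.append((match["label"], seconds))
    return timings


# Excerpt of the Badass log from this guide (not the full run).
SAMPLE = """\
Start: Phase 2: Lookup commit permuted
··Start: to_vec
··End: to_vec ..................................................................17.050ms
End: Phase 2: Lookup commit permuted ...........................................1.717s
Start: Phase 3b: Lookup commit product
End: Phase 3b: Lookup commit product ...........................................704.458ms
Start: Phase 5: multiopen
End: Phase 5: multiopen ........................................................2.037s
"""

if __name__ == "__main__":
    rows = top_level_timings(SAMPLE)
    for label, seconds in rows:
        print(f"{label}: {seconds:.3f}s")
    print(f"total: {sum(seconds for _, seconds in rows):.3f}s")
```

To total a complete run, pass the full text of one of the fenced logs in place of `SAMPLE`. The sum of the logged phases need not match the single-proof figures in "BV Prover Benchmark", since those may include work that the phase timers do not cover.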