
NEBRA GPU Prover Hardware Guide

Specs

Bare Metal

  • Badass (NEBRA's in-house machine):
    • 1 x RTX 4090
    • 128G RAM
    • Intel Core i9-13900KF (5.8 GHz max turbo)
      • 8 x performance cores
      • 16 x efficiency cores
      • (32 threads total)
  • g4dn.metal (AWS)
    • 8 x Tesla T4 GPU
    • 384G RAM
    • 96 x vCPU
  • rockaway:
    • 2 x 6-core Xeon CPU (1.9 GHz)
    • 1 x RTX 3090
    • 377G RAM
  • rockaway (upgraded CPUs):
    • 2 x 18-core Xeon Gold 6240 (2.6 GHz)
    • 1 x RTX 3090
    • 377G RAM

Virtualized

  • g4dn.12xlarge
    • 4 x Tesla T4 GPU
    • 192G RAM
    • 48 vCPU
  • g4dn.8xlarge
    • 1 x Tesla T4 GPU
    • 128G RAM
    • 32 vCPU
  • g5.4xlarge
    • 1 x A10G GPU (24G)
    • 64G RAM
    • 16 vCPU
  • g5.8xlarge
    • 1 x A10G GPU (24G)
    • 128G RAM
    • 32 vCPU
  • g5.16xlarge
    • 1 x A10G GPU (24G)
    • 256G RAM
    • 64 vCPU

Benchmarks

MSM (multi-scalar multiplication) of length 2^23

| Instance | CPU cores / vCPUs | GPU, memory | CPU-only time (s) | GPU-enabled time (s) |
| --- | --- | --- | --- | --- |
| Badass | 24 | RTX 4090, 24 GB | 2.69 | 0.32 |
| Rockaway (original 2 x 6-core Xeon, 1.9 GHz) | 12 | RTX 3090, 24 GB | 11.47 | 0.57 |
| Rockaway (Intel Xeon Gold 6240, 2.6 GHz) | 36 | RTX 3090, 24 GB | 2.57 | 0.528 |
| Rockaway (AMD EPYC 9374F 32-Core) | 32 | RTX 4090, 24 GB | 1.59 | 0.31 |
| g4dn.8xlarge | 32 | T4, 15 GB | 5.49 | 1.28 |
| g5.16xlarge | 64 | A10G, 24 GB | 2.10 | 0.45 |
| g5.8xlarge | 32 | A10G, 24 GB | 3.88 | 0.45 |
| g5.4xlarge | 16 | A10G, 24 GB | 7.31 | 0.45 |
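The table is easier to read as a GPU speedup for each row, computed as CPU-only time divided by GPU-enabled time. Below is a minimal sketch in Rust, chosen to match the `cargo bench` tooling used elsewhere in this guide. The `speedup` helper and the shortened row labels are illustrative only and are not part of the prover code.

```rust
// Sketch: GPU speedup for the 2^23 MSM benchmark, i.e. CPU-only time divided
// by GPU-enabled time. Row labels are shortened; values come from the table above.

/// How many times faster the GPU-enabled run is than the CPU-only run.
fn speedup(cpu_secs: f64, gpu_secs: f64) -> f64 {
    cpu_secs / gpu_secs
}

fn main() {
    // (instance, CPU-only seconds, GPU-enabled seconds)
    let rows = [
        ("Badass", 2.69, 0.32),
        ("Rockaway (original Xeon)", 11.47, 0.57),
        ("Rockaway (Xeon Gold 6240)", 2.57, 0.528),
        ("Rockaway (EPYC 9374F)", 1.59, 0.31),
        ("g4dn.8xlarge", 5.49, 1.28),
        ("g5.16xlarge", 2.10, 0.45),
        ("g5.8xlarge", 3.88, 0.45),
        ("g5.4xlarge", 7.31, 0.45),
    ];
    for (instance, cpu, gpu) in rows {
        println!("{instance:<28} {:>6.2}x", speedup(cpu, gpu));
    }
}
```

With these numbers, Badass comes out at roughly 8.4x and g4dn.8xlarge at roughly 4.3x. All three g5 sizes record the same GPU time (0.45 s), so the differences in their speedups come entirely from the CPU-only baseline.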

BV Prover Benchmark

  • BV prover (single proof, seconds), run with: cargo bench --features gpu --bench batch_verify
    • Badass: 19.75
    • g5.16xlarge: 29.7
    • g5.12xlarge: 30.74
    • g5.8xlarge: 32.85
    • g5.4xlarge: 39.97
    • g4dn.metal: 49.38
    • g4dn.12xlarge: 50.92
    • g4dn.8xlarge: 55
    • rockaway (original 2 x 6-core Xeon, 1.9 GHz): 81.78
    • rockaway (Intel Xeon Gold 6240, 2.6 GHz): 42.98
    • rockaway (AMD EPYC 9374F 32-Core): 22.26
  • Outer prover (single proof, seconds)
    • Badass: 32.45
    • g5.16xlarge: 55.31
    • g5.12xlarge: 54.80
    • g5.8xlarge: 59.66
    • g5.4xlarge: 71.038
    • rockaway (Intel Xeon Gold 6240, 2.6 GHz): 76.85
    • rockaway (AMD EPYC 9374F 32-Core): 39.53
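Single-proof latency can also be read as sequential throughput. The hypothetical helpers below convert seconds per proof into proofs per hour, and they express each machine's time as a multiple of Badass's time. This assumes one proof runs at a time, back to back, with no overlap between requests. It does not model concurrent requests.

```rust
// Sketch: turn single-proof latency into sequential throughput and compare
// against Badass. Assumes one proof at a time, back to back (no concurrency).

/// Proofs completed per hour if each proof takes `secs_per_proof` seconds.
fn proofs_per_hour(secs_per_proof: f64) -> f64 {
    3600.0 / secs_per_proof
}

/// How many times longer `secs` is than `baseline_secs`.
fn relative_to(baseline_secs: f64, secs: f64) -> f64 {
    secs / baseline_secs
}

fn main() {
    // (machine, BV prover seconds), a subset of the list above
    let bv = [
        ("Badass", 19.75),
        ("rockaway (AMD EPYC 9374F)", 22.26),
        ("g5.16xlarge", 29.7),
        ("g5.4xlarge", 39.97),
        ("g4dn.8xlarge", 55.0),
        ("rockaway (original Xeon)", 81.78),
    ];
    let baseline = bv[0].1;
    for (machine, secs) in bv {
        println!(
            "{machine:<28} {:>7.1} proofs/h  {:>5.2}x Badass time",
            proofs_per_hour(secs),
            relative_to(baseline, secs)
        );
    }
}
```

For example, Badass at 19.75 s per BV proof works out to about 182 proofs per hour. g5.16xlarge, at 29.7 s, takes about 1.5x as long per proof. The same helpers apply unchanged to the Outer prover numbers.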

Concurrent request benchmarks

BV Prover Benchmark details

Badass

Start:   Phase 1: Witness assignment and MSM commitments
End:     Phase 1: Witness assignment and MSM commitments ...........................3.427s
Start:   Phase 2: Lookup commit permuted
··Start:   permute_par input hashmap (cpu par)
··End:     permute_par input hashmap (cpu par) .....................................370.114ms
··Start:   permute_par input unique ranges (cpu par)
··End:     permute_par input unique ranges (cpu par) ...............................36.657ms
··Start:   to_vec
··End:     to_vec ..................................................................17.050ms
··Start:   permute_par sort table
··End:     permute_par sort table ..................................................19.164ms
··Start:   leftover table coeffs (cpu par)
··End:     leftover table coeffs (cpu par) .........................................20.952ms
··Start:   permute_par input hashmap (cpu par)
··End:     permute_par input hashmap (cpu par) .....................................362.263ms
··Start:   permute_par input unique ranges (cpu par)
··End:     permute_par input unique ranges (cpu par) ...............................35.662ms
··Start:   to_vec
··End:     to_vec ..................................................................16.347ms
··Start:   permute_par sort table
··End:     permute_par sort table ..................................................19.302ms
··Start:   leftover table coeffs (cpu par)
··End:     leftover table coeffs (cpu par) .........................................20.358ms
End:     Phase 2: Lookup commit permuted ...........................................1.717s
Start:   Phase 3a: Commit to permutations
End:     Phase 3a: Commit to permutations ..........................................4.187s
Start:   Phase 3b: Lookup commit product
End:     Phase 3b: Lookup commit product ...........................................704.458ms
Start:   Commit to vanishing argument's random poly
End:     Commit to vanishing argument's random poly ................................104.197ms
Start:   Calculate advice polys (fft)
End:     Calculate advice polys (fft) ..............................................275.227ms
Start:   Phase 4: Evaluate h(X)
End:     Phase 4: Evaluate h(X) ....................................................4.826s
Start:   Commit to vanishing argument's h(X) commitments
End:     Commit to vanishing argument's h(X) commitments ...........................404.288ms
Start:   Commit to vanishing argument's h(X) commitments
End:     Commit to vanishing argument's h(X) commitments ...........................569.526ms
Start:   Phase 5: multiopen
End:     Phase 5: multiopen ........................................................2.037s
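To compare runs phase by phase, the top-level `End:` lines of these logs can be extracted mechanically. The sketch below assumes the exact layout shown above:

  • top-level lines start with `End:`;
  • nested lines start with `··`;
  • every duration is written as `<n>s` or `<n>ms` after a run of padding dots.

The function names are hypothetical, and the sample log in `main` is a shortened excerpt of the Badass run.

```rust
// Sketch: pull top-level phase timings out of the prover's timer log.
// Assumes the layout shown above: top-level lines begin with "End:", nested
// lines begin with "··", and durations look like "3.427s" or "370.114ms".

/// Parse "3.427s" or "370.114ms" into seconds; other units return None.
fn parse_duration_secs(s: &str) -> Option<f64> {
    if let Some(ms) = s.strip_suffix("ms") {
        ms.parse::<f64>().ok().map(|v| v / 1000.0)
    } else if let Some(secs) = s.strip_suffix('s') {
        secs.parse::<f64>().ok()
    } else {
        None
    }
}

/// Return (label, seconds) for a top-level "End:" line, or None for
/// "Start:" lines, nested "··End:" lines, and anything unparseable.
fn parse_top_level_end(line: &str) -> Option<(String, f64)> {
    let rest = line.strip_prefix("End:")?.trim();
    // The label and the dot-padded duration are separated by the last space.
    let (label, padded) = rest.rsplit_once(' ')?;
    let secs = parse_duration_secs(padded.trim_start_matches('.'))?;
    Some((label.trim_end().to_string(), secs))
}

fn main() {
    // Shortened excerpt of the Badass log.
    let log = "\
Start:   Phase 1: Witness assignment and MSM commitments
End:     Phase 1: Witness assignment and MSM commitments ...........................3.427s
··End:     to_vec ..................................................................17.050ms
End:     Phase 3b: Lookup commit product ...........................................704.458ms";

    let phases: Vec<(String, f64)> = log.lines().filter_map(parse_top_level_end).collect();
    for (label, secs) in &phases {
        println!("{label:<50} {secs:>8.3} s");
    }
    let total: f64 = phases.iter().map(|(_, secs)| secs).sum();
    println!("sum of top-level phases: {total:.3} s");
}
```

Applied to the full Badass log, the ten top-level phases sum to about 18.25 s.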

Rockaway (updated CPUs)

Start:   Phase 1: Witness assignment and MSM commitments
End:     Phase 1: Witness assignment and MSM commitments ...........................8.200s
Start:   Phase 2: Lookup commit permuted
··Start:   permute_par input hashmap (cpu par)
··End:     permute_par input hashmap (cpu par) .....................................565.808ms
··Start:   permute_par input unique ranges (cpu par)
··End:     permute_par input unique ranges (cpu par) ...............................123.731ms
··Start:   to_vec
··End:     to_vec ..................................................................38.869ms
··Start:   permute_par sort table
··End:     permute_par sort table ..................................................52.650ms
··Start:   leftover table coeffs (cpu par)
··End:     leftover table coeffs (cpu par) .........................................46.858ms
··Start:   permute_par input hashmap (cpu par)
··End:     permute_par input hashmap (cpu par) .....................................422.185ms
··Start:   permute_par input unique ranges (cpu par)
··End:     permute_par input unique ranges (cpu par) ...............................113.320ms
··Start:   to_vec
··End:     to_vec ..................................................................37.426ms
··Start:   permute_par sort table
··End:     permute_par sort table ..................................................50.308ms
··Start:   leftover table coeffs (cpu par)
··End:     leftover table coeffs (cpu par) .........................................44.098ms
End:     Phase 2: Lookup commit permuted ...........................................3.686s
Start:   Phase 3a: Commit to permutations
End:     Phase 3a: Commit to permutations ..........................................9.067s
Start:   Phase 3b: Lookup commit product
End:     Phase 3b: Lookup commit product ...........................................1.439s
Start:   Commit to vanishing argument's random poly
End:     Commit to vanishing argument's random poly ................................221.308ms
Start:   Calculate advice polys (fft)
End:     Calculate advice polys (fft) ..............................................869.569ms
Start:   Phase 4: Evaluate h(X)
End:     Phase 4: Evaluate h(X) ....................................................9.296s
Start:   Commit to vanishing argument's h(X) commitments
End:     Commit to vanishing argument's h(X) commitments ...........................778.317ms
Start:   Commit to vanishing argument's h(X) commitments
End:     Commit to vanishing argument's h(X) commitments ...........................1.039s
Start:   Phase 5: multiopen
End:     Phase 5: multiopen ........................................................2.692s

Rockaway (updated CPUs), restricted to 1 GPU

Start:   Phase 1: Witness assignment and MSM commitments
End:     Phase 1: Witness assignment and MSM commitments ...........................7.809s
Start:   Phase 2: Lookup commit permuted
··Start:   permute_par input hashmap (cpu par)
··End:     permute_par input hashmap (cpu par) .....................................582.723ms
··Start:   permute_par input unique ranges (cpu par)
··End:     permute_par input unique ranges (cpu par) ...............................123.674ms
··Start:   to_vec
··End:     to_vec ..................................................................40.498ms
··Start:   permute_par sort table
··End:     permute_par sort table ..................................................48.569ms
··Start:   leftover table coeffs (cpu par)
··End:     leftover table coeffs (cpu par) .........................................47.056ms
··Start:   permute_par input hashmap (cpu par)
··End:     permute_par input hashmap (cpu par) .....................................387.511ms
··Start:   permute_par input unique ranges (cpu par)
··End:     permute_par input unique ranges (cpu par) ...............................112.539ms
··Start:   to_vec
··End:     to_vec ..................................................................37.878ms
··Start:   permute_par sort table
··End:     permute_par sort table ..................................................52.644ms
··Start:   leftover table coeffs (cpu par)
··End:     leftover table coeffs (cpu par) .........................................43.820ms
End:     Phase 2: Lookup commit permuted ...........................................3.580s
Start:   Phase 3a: Commit to permutations
End:     Phase 3a: Commit to permutations ..........................................8.977s
Start:   Phase 3b: Lookup commit product
End:     Phase 3b: Lookup commit product ...........................................1.419s
Start:   Commit to vanishing argument's random poly
End:     Commit to vanishing argument's random poly ................................215.236ms
Start:   Calculate advice polys (fft)
End:     Calculate advice polys (fft) ..............................................790.475ms
Start:   Phase 4: Evaluate h(X)
End:     Phase 4: Evaluate h(X) ....................................................9.160s
Start:   Commit to vanishing argument's h(X) commitments
End:     Commit to vanishing argument's h(X) commitments ...........................765.571ms
Start:   Commit to vanishing argument's h(X) commitments
End:     Commit to vanishing argument's h(X) commitments ...........................1.118s
Start:   Phase 5: multiopen
End:     Phase 5: multiopen ........................................................2.556s
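The three logs can also be compared directly. The sketch below takes the top-level phase times from the Badass and Rockaway (updated CPUs) logs and divides one by the other, which shows where the Rockaway run loses the most time. The numbers are transcribed by hand from the logs. The `ratios` helper and the shortened labels for the two h(X) commitment lines are hypothetical.

```rust
// Sketch: per-phase slowdown of Rockaway (updated CPUs) relative to Badass.
// Values (seconds) are transcribed from the top-level "End:" lines above.

/// Element-wise ratio other[i] / base[i]; both slices must be the same length.
fn ratios(base: &[f64], other: &[f64]) -> Vec<f64> {
    assert_eq!(base.len(), other.len(), "phase lists must line up");
    base.iter().zip(other).map(|(b, o)| o / b).collect()
}

fn main() {
    let phases = [
        "Phase 1: Witness assignment and MSM commitments",
        "Phase 2: Lookup commit permuted",
        "Phase 3a: Commit to permutations",
        "Phase 3b: Lookup commit product",
        "Commit to vanishing argument's random poly",
        "Calculate advice polys (fft)",
        "Phase 4: Evaluate h(X)",
        "h(X) commitments (first)",
        "h(X) commitments (second)",
        "Phase 5: multiopen",
    ];
    let badass = [
        3.427, 1.717, 4.187, 0.704458, 0.104197, 0.275227, 4.826, 0.404288, 0.569526, 2.037,
    ];
    let rockaway = [
        8.200, 3.686, 9.067, 1.439, 0.221308, 0.869569, 9.296, 0.778317, 1.039, 2.692,
    ];

    for (phase, r) in phases.iter().zip(ratios(&badass, &rockaway)) {
        println!("{phase:<50} {r:>5.2}x");
    }
}
```

With these values, most phases take between about 1.8x and 2.4x as long on Rockaway. Calculate advice polys (fft) takes about 3.2x as long, and Phase 5 (multiopen) only about 1.3x. The 1-GPU Rockaway run stays within about 10% of the unrestricted run in every phase.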

References:

Scroll GPU machine: 4 x RTX 4090, 1T RAM (200G in use), 128 threads (AMD EPYC 7702 64-Core Processor, Zen 2)