Parallel measurements

Setup

  1. Get a rustc/cargo toolchain with the parallel compiler enabled (Zoxc)
  • In the root of the rust repo, set parallel-compiler = true in the [rust] section of config.toml

  • If you have built the compiler in this checkout before, run ./x.py clean first

  • 6f087ac1c17723a84fd45f445c9887dbff61f8c0 is the try-build commit for enable-parallel-compiler

  • Compare it against the baseline 80e7cde2238e837a9d6a240af9a3253f469bb2cf (bors master)

  2. Flags for compiling with N threads in cargo and rustc (Zoxc)
  • cargo -jN (N parallel jobs)
  • rustc -Zthreads=N (N threads within a single rustc invocation)
  3. Configuration changes to perf to support multi-threaded benchmarks (TODO: @simulacrum)

  4. Possibly, perf support for full crate graph build times (TODO: @simulacrum)
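When building a whole crate graph, the two flags listed above combine: cargo's -j sets how many jobs run at once, while -Zthreads sets the number of threads inside each rustc. A minimal sketch of assembling such an invocation, assuming the parallel toolchain is the active (nightly) toolchain and that -Zthreads reaches every rustc through the RUSTFLAGS environment variable. The notes do not say how the runs were actually driven, and parallel_build_command is a hypothetical helper.

```rust
use std::process::Command;

/// Hypothetical helper: build (but do not run) a `cargo build` invocation
/// with `jobs` parallel jobs (-jM) and `threads` threads inside each rustc
/// (-Zthreads=N, passed via RUSTFLAGS; -Z flags need a nightly/parallel toolchain).
fn parallel_build_command(threads: usize, jobs: usize) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .arg(format!("-j{}", jobs))
        // Replaces any RUSTFLAGS value inherited from the environment.
        .env("RUSTFLAGS", format!("-Zthreads={}", threads));
    cmd
}

fn main() {
    // The "t2-j8" configuration: -Zthreads=2 with cargo -j8.
    let cmd = parallel_build_command(2, 8);
    // Print the invocation instead of running it.
    println!("{:?}", cmd);
}
```

Running the returned Command with .status() would perform the build; printing it first makes it easy to check which tN-jM combination is being measured.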

What to measure (in order of importance):

  • perf runs with 1 thread (sequential overhead)
  • perf runs with 2 threads (expected benefit)
  • full crate graph measurements
    • rustc bootstrap
    • servo
    • cranelift
    • ripgrep
  • Ideally, also perf runs with 4 and 8 threads

Machines to measure on

  • perf.rlo (4 cores, 8 threads: Intel® Core i7-4790 CPU @ 3.60GHz)
  • Niko's "14 core"
  • eddyb's "lotsa cores"

Measurements:

https://docs.google.com/spreadsheets/d/1bNQJSDhbmOKbtb8EKg40C7D_gboayDpDoC-eQO6XbdI/edit#gid=0

From perf.rlo (4 cores, 8 threads):

Wall time, single crate:

From Niko's 14-core machine (28 threads?):

Whole crate graph data from Mark's computer (8 cores, 16 threads):

Summary Sheet

In the summary sheet:
  • 0: the sequential compiler, for reference
  • 1: the parallel compiler with -Zthreads=1, showing the sequential overhead
  • 2: the parallel compiler with -Zthreads=2, showing the benefit of parallelism
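A sketch of how the three summary-sheet entries turn into the quantities being measured, assuming each entry is a wall time in seconds for the same benchmark. The struct and method names are hypothetical, the percentage definitions are one reasonable choice rather than the sheet's own formulas, and the numbers in main are illustrative, not measurements.

```rust
/// Wall times (seconds) for one benchmark, one field per summary-sheet entry.
/// Hypothetical struct; the sheet itself is not reproduced here.
struct SummaryRow {
    seq: f64,      // 0: sequential compiler
    threads1: f64, // 1: parallel compiler with -Zthreads=1
    threads2: f64, // 2: parallel compiler with -Zthreads=2
}

impl SummaryRow {
    /// Sequential overhead: how much slower -Zthreads=1 is than the
    /// sequential compiler, in percent (positive = slower).
    fn seq_overhead_pct(&self) -> f64 {
        (self.threads1 / self.seq - 1.0) * 100.0
    }

    /// Benefit of a second thread: time saved by -Zthreads=2 relative
    /// to -Zthreads=1, in percent (positive = faster).
    fn two_thread_gain_pct(&self) -> f64 {
        (1.0 - self.threads2 / self.threads1) * 100.0
    }

    /// Net effect of -Zthreads=2 versus the sequential compiler,
    /// in percent (negative = faster overall).
    fn net_change_pct(&self) -> f64 {
        (self.threads2 / self.seq - 1.0) * 100.0
    }
}

fn main() {
    // Illustrative numbers only, not measurements.
    let row = SummaryRow { seq: 10.0, threads1: 11.0, threads2: 8.0 };
    println!("sequential overhead: {:+.1}%", row.seq_overhead_pct());
    println!("2-thread gain:       {:+.1}%", row.two_thread_gain_pct());
    println!("net vs sequential:   {:+.1}%", row.net_change_pct());
}
```

Keeping the overhead and the two-thread gain separate shows whether a parallel build that is slower overall is slow because of the single-thread overhead or because the second thread helps too little.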

rustc stage 0 compilation (in the names below, tN means -Zthreads=N and jM means -jM)

Benchmark                               Wall time (s)    Variation
./rustc-stage0-single-threaded          552.759227012    ± 0.13%
./rustc-stage0-multi-threaded-t2-j8     683.030195011    ± 5.02%
./rustc-stage0-multi-threaded-t2-j16    688.029377313    ± 7.82%
./rustc-stage0-multi-threaded-t4-j16    710.423041145    ± 8.60%
./rustc-stage0-multi-threaded-t16-j16   812.783412266    ± 8.37%
./rustc-stage0-multi-threaded-t2-j4     957.243363191    ± 5.62%
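Every multi-threaded configuration in these rustc stage 0 results took longer than the single-threaded build. The sketch below expresses each run as a percentage change relative to rustc-stage0-single-threaded, using only the wall times listed above; relative_change_pct is an illustrative name. The multi-threaded runs vary by ± 5.02% to ± 8.60%, so small differences between them may not be significant.

```rust
/// Percentage change of `secs` relative to `baseline` (positive = slower).
fn relative_change_pct(secs: f64, baseline: f64) -> f64 {
    (secs / baseline - 1.0) * 100.0
}

fn main() {
    // Wall times in seconds, copied from the rustc stage 0 results.
    let baseline = 552.759227012; // rustc-stage0-single-threaded
    let runs = [
        ("t2-j8", 683.030195011),
        ("t2-j16", 688.029377313),
        ("t4-j16", 710.423041145),
        ("t16-j16", 812.783412266),
        ("t2-j4", 957.243363191),
    ];
    for (name, secs) in runs {
        println!(
            "{:>8}: {:7.1} s ({:+.1}% vs single-threaded)",
            name,
            secs,
            relative_change_pct(secs, baseline)
        );
    }
}
```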

Cargo compilation (not all data gathered yet)

Benchmark                               Wall time (s)    Variation
./cargo-multi-threaded-t1-j1            553.290588371    ± 0.05%
./cargo-multi-threaded-t2-j1            553.770275596    ± 0.03%
./cargo-multi-threaded-t3-j1            553.533103358    ± 0.04%
./cargo-multi-threaded-t4-j1            553.758315225    ± 0.03%
./cargo-multi-threaded-t6-j1            553.333730522    ± 0.03%
./cargo-multi-threaded-t8-j1            553.487285875    ± 0.03%
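With -j1, the cargo timings barely move as -Zthreads goes from 1 to 8: the slowest (t2) and fastest (t1) runs differ by less than half a second, so in these runs changing -Zthreads made essentially no difference to wall time. A small sketch that quantifies this spread from the numbers listed above; spread_pct is an illustrative name, and (max - min) / min is just one simple measure of spread.

```rust
/// Spread of a set of wall times: (max - min) / min, in percent.
fn spread_pct(times: &[f64]) -> f64 {
    let min = times.iter().copied().fold(f64::INFINITY, f64::min);
    let max = times.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    (max - min) / min * 100.0
}

fn main() {
    // Cargo compilation with -j1 and -Zthreads = 1, 2, 3, 4, 6, 8 (seconds).
    let times = [
        553.290588371,
        553.770275596,
        553.533103358,
        553.758315225,
        553.333730522,
        553.487285875,
    ];
    println!("spread across thread counts: {:.3}%", spread_pct(&times));
}
```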

Mark's initial thoughts:

  • Do we need data on other platforms, e.g. Windows and macOS? perf cannot collect it currently, but we could gather it manually.