# Parallel measurements
## Setup
1. Get a parallel-feature-enabled rustc/cargo toolchain (Zoxc)
   * In the root of the rust repo, set `parallel-compiler = true` in the `[rust]` section of `config.toml`
   * (Optional) If there is an existing build, run `./x.py clean` first
   * 6f087ac1c17723a84fd45f445c9887dbff61f8c0 is a try-build commit for `--enable-parallel-compiler`
   * compare with 80e7cde2238e837a9d6a240af9a3253f469bb2cf (bors master)
2. The flags for compiling with N threads (Zoxc)
   * `cargo -jN` (N parallel cargo jobs)
   * `rustc -Zthreads=N` (N threads within each rustc invocation)
3. The configuration/changes to perf needed to support multi-threaded benchmarks (TODO - @simulacrum)
4. The (possible) perf support for full-crate-graph build times (TODO - @simulacrum)
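Putting steps 1-2 together, a minimal sketch of a local run (the stage number, job/thread counts, and use of `RUSTFLAGS` to pass `-Zthreads` are assumptions for illustration, not prescribed by the notes above):

```shell
# In the root of the rust repo, config.toml should contain:
#   [rust]
#   parallel-compiler = true
# Clean first if there is a stale build, then build the toolchain:
./x.py clean
./x.py build

# Compile a crate with N cargo jobs and N rustc threads (here N=2);
# passing -Zthreads via RUSTFLAGS is one way to reach rustc through cargo.
RUSTFLAGS="-Zthreads=2" cargo build -j2
```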
## What to measure (in order of importance):
- perf runs with 1 thread (seq. overhead)
- perf runs with 2 threads (expected benefit)
- full crate graph measurements
- rustc bootstrap
- servo
- cranelift
- ripgrep
- ideally: perf runs with 4, 8 threads
## Machines to measure on
- perf.rlo (4 cores, 8 threads: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz)
- Niko's "14 core"
- eddyb's "lotsa cores"
## Measurements:
[Results spreadsheet](https://docs.google.com/spreadsheets/d/1bNQJSDhbmOKbtb8EKg40C7D_gboayDpDoC-eQO6XbdI/edit#gid=0)
### from perf.rlo (4 core, 8 thread CPU)
Wall time, single crate:
* [single vs. -Zthreads=8](https://perf.rust-lang.org/compare.html?start=single-threaded&end=parallel-rustc-8&stat=wall-time) - 40-60% speedup on average; no significant performance hits
* [single vs. -Zthreads=4](https://perf.rust-lang.org/compare.html?start=single-threaded&end=parallel-rustc-4&stat=wall-time) - 35-50% speedup on average; no significant performance hits
* [single vs. -Zthreads=2](https://perf.rust-lang.org/compare.html?start=single-threaded&end=parallel-rustc-2&stat=wall-time) - 20-30% speedup on average; no significant performance hits (most tiny, i.e., <1 second)
* [single vs. -Zthreads=1](https://perf.rust-lang.org/compare.html?start=single-threaded&end=parallel-rustc-1&stat=wall-time) - 10%-20% slowdown (i.e., overhead of parallelism)
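As a sanity check on how to read these percentages, a small sketch (the numbers below are made-up illustrations, not perf.rlo data; "speedup" is taken here to mean the relative wall-time reduction vs. the single-threaded run, which is an assumption about the compare page's semantics):

```python
def relative_change(single: float, parallel: float) -> float:
    """Relative wall-time change of a parallel run vs. the single-threaded run.

    Negative values are speedups, positive values are slowdowns.
    """
    return parallel / single - 1.0

# Illustrative numbers only (not taken from perf.rlo):
print(f"{relative_change(10.0, 5.5):+.0%}")   # a parallel run at 5.5s vs. 10s
print(f"{relative_change(10.0, 11.5):+.0%}")  # a -Zthreads=1 overhead case
```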
### from Niko's 14 core (28 thread?)
* [single vs. -Zthreads=8](https://perf.rust-lang.org/compare.html?start=niko-single-thread&end=niko-parallel-rustc-8&stat=wall-time)
* [single vs. -Zthreads=4](https://perf.rust-lang.org/compare.html?start=niko-single-thread&end=niko-parallel-rustc-4&stat=wall-time)
* [single vs. -Zthreads=2](https://perf.rust-lang.org/compare.html?start=niko-single-thread&end=niko-parallel-rustc-2&stat=wall-time)
* [single vs. -Zthreads=1](https://perf.rust-lang.org/compare.html?start=niko-single-thread&end=niko-parallel-rustc-1&stat=wall-time)
## Whole crate graph data -- from Mark's computer (8 core, 16 thread)
[Summary Sheet](https://docs.google.com/spreadsheets/d/1vadQWQQqTODU1_cAENnUjLyXM6cxms-tiCf2kCiNGGM/edit#gid=0)
> `0` is the sequential compiler, for reference;
> `1` is `-Zthreads=1`, which shows the sequential overhead;
> `2` is `-Zthreads=2`, which shows the benefit from parallelism.
### rustc stage 0 compilation
| Configuration | Time elapsed (s) | Variance |
| --- | --- | --- |
| `./rustc-stage0-single-threaded` | 552.759227012 | ± 0.13% |
| `./rustc-stage0-multi-threaded-t2-j8` | 683.030195011 | ± 5.02% |
| `./rustc-stage0-multi-threaded-t2-j16` | 688.029377313 | ± 7.82% |
| `./rustc-stage0-multi-threaded-t4-j16` | 710.423041145 | ± 8.60% |
| `./rustc-stage0-multi-threaded-t16-j16` | 812.783412266 | ± 8.37% |
| `./rustc-stage0-multi-threaded-t2-j4` | 957.243363191 | ± 5.62% |
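The stage0 numbers above can be compared against the single-threaded baseline with a quick script (timings copied verbatim from the measurements above; on this machine every parallel configuration is slower than the baseline):

```python
# Wall times (seconds) copied from the stage0 measurements above.
baseline = 552.759227012  # rustc-stage0-single-threaded
parallel_runs = {
    "t2-j8":   683.030195011,
    "t2-j16":  688.029377313,
    "t4-j16":  710.423041145,
    "t16-j16": 812.783412266,
    "t2-j4":   957.243363191,
}

for config, seconds in parallel_runs.items():
    # Positive values mean the parallel run was slower than the baseline.
    change = seconds / baseline - 1.0
    print(f"{config:>8}: {change:+.1%} vs. single-threaded")
```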
### Cargo compilation (not all data gathered yet)
| Configuration | Time elapsed (s) | Variance |
| --- | --- | --- |
| `./cargo-multi-threaded-t1-j1` | 553.290588371 | ± 0.05% |
| `./cargo-multi-threaded-t2-j1` | 553.770275596 | ± 0.03% |
| `./cargo-multi-threaded-t3-j1` | 553.533103358 | ± 0.04% |
| `./cargo-multi-threaded-t4-j1` | 553.758315225 | ± 0.03% |
| `./cargo-multi-threaded-t6-j1` | 553.333730522 | ± 0.03% |
| `./cargo-multi-threaded-t8-j1` | 553.487285875 | ± 0.03% |
Mark's initial thoughts:
* Do we need data on other platforms, e.g. Windows and macOS? perf currently can't measure those, but we could collect the data manually.