[DRAFT] parallel rustc shipping strategy

Whether the compiler runs serially or in parallel is dictated by code in a small number of places, such as compiler/rustc_data_structures/src/sync.rs, which defines types like Lock and operations like par_iter.

Currently there are two code paths in these places, and the one to use is selected at rustc build time via cfg(parallel_compiler). For example, Lock is a wrapper around RefCell if !cfg(parallel_compiler), or a wrapper around parking_lot::Mutex if cfg(parallel_compiler).
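
To make the build-time selection concrete, here is a simplified sketch of the shape of this code. It is not the actual sync.rs source (whose details differ), just an illustration of how the same Lock name and a par_iter helper can map to different implementations under cfg(parallel_compiler):

```rust
// Simplified sketch of the cfg(parallel_compiler) selection; the real
// sync.rs is more elaborate, but the shape is the same.

#[cfg(not(parallel_compiler))]
pub struct Lock<T>(std::cell::RefCell<T>);

#[cfg(not(parallel_compiler))]
impl<T> Lock<T> {
    pub fn new(value: T) -> Self {
        Lock(std::cell::RefCell::new(value))
    }
    // No real locking: single-threaded, so a runtime borrow check suffices.
    pub fn lock(&self) -> std::cell::RefMut<'_, T> {
        self.0.borrow_mut()
    }
}

#[cfg(parallel_compiler)]
pub struct Lock<T>(parking_lot::Mutex<T>);

#[cfg(parallel_compiler)]
impl<T> Lock<T> {
    pub fn new(value: T) -> Self {
        Lock(parking_lot::Mutex::new(value))
    }
    // Full-strength mutual exclusion for the multithreaded case.
    pub fn lock(&self) -> parking_lot::MutexGuard<'_, T> {
        self.0.lock()
    }
}

// Iteration is selected the same way: plain iteration when serial,
// rayon parallel iteration when parallel.
#[cfg(not(parallel_compiler))]
pub fn par_iter<T: IntoIterator>(t: T) -> T::IntoIter {
    t.into_iter()
}

#[cfg(parallel_compiler)]
pub fn par_iter<T: rayon::iter::IntoParallelIterator>(t: T) -> T::Iter {
    t.into_par_iter()
}
```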

The end goal is to reduce these places to a single parallel code path, and for the compiler to be multithreaded. (The number of threads to use is an open question, but any default greater than 1 that gives better performance than the serial compiler would constitute success.)

Evolution

The simplest approach would be to remove the serial code paths and ship the parallel compiler with multithreading enabled by default, all in a single step. But that isn't reasonable, given the complexity and risk of the changes. We will need to evolve the serial/parallel/multithreading structure over a number of releases: serial (current), then parallel but single-threaded by default (-Zthreads=1), and finally parallel and multithreaded by default. This lets us pause or even go backwards if major problems occur.

There are two main possible paths.

Shorter path

Here is one possible path. In the following, "||c" is a compile-time choice between code paths, "||r" is a runtime choice between code paths, and square brackets indicate defaults.

  1. [serial] ||c parallel[-Zthreads=1]
    This is the current state. We have serial code and parallel code. The choice between them is made via the parallel-compiler setting in config.toml. When using the parallel code, the default number of threads is 1.

  2. serial ||c [parallel[-Zthreads=1]]
    If the single-threaded parallel paths are not too slow, we can switch to them by default once they are sufficiently reliable. Importantly, at this point we are shipping a parallel compiler! This means users can try out a multithreaded parallel compiler with -Zthreads, which will give us useful data about performance and reliability.

  3. serial ||c parallel[-Zthreads=2+]
    Once performance and reliability are good enough, multithreading can be made the default.

  4. parallel[-Zthreads=2+]
    Finally, the serial code paths can be removed.

Steps 3 and 4 could be switched, or even combined.

Longer path

If the single-threaded parallel paths are too slow, a longer path will be required, taking more work. This is the path that @SparrowLii is currently pursuing.

  1. [serial] ||c parallel[-Zthreads=1]
    This is the current state, as above.

  2. [serial] ||c (parallel-single-threaded ||r parallel-multithreaded)
    This step introduces a new, temporary set of code paths on the parallel side of the build-time choice. The synchronization primitives, etc., used when cfg(parallel_compiler) is enabled are now chosen at runtime, depending on the value of -Zthreads. (XXX: requires #109776, DynSend). For example, the parallel-single-threaded paths would still use RefCell for Lock, making their speed closer to that of the serial code. The runtime selectability does have a non-zero performance cost, but hopefully it will be small. (A sketch of what such a runtime-selectable Lock might look like appears at the end of this section.)

  3. serial ||c ([parallel-single-threaded] ||r parallel-multithreaded)
    Once the parallel-single-threaded code paths are reliable enough, we can switch the default to them. Importantly, at this point we are shipping a parallel compiler!

  4. serial ||c (parallel-single-threaded ||r [parallel-multithreaded])
    Once performance is good enough, the default number of threads can increase to 2 or more, switching to the parallel-multithreaded code paths with full-strength synchronization.

  5. parallel[-Zthreads=2+]
    Finally, the serial and parallel-single-threaded code paths can be removed, leaving only the parallel code paths and a default of more than one thread. (Users can still choose -Zthreads=1, but they will get full-strength synchronization.) Removing the runtime selectability will slightly speed up the multithreaded case.

As with the shorter path, some of the later steps could be reordered or combined.
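
To illustrate step 2 of the longer path, here is a hypothetical sketch of a runtime-selectable Lock. It is not the actual implementation being pursued: the constructor flag (multithreaded) stands in for however the -Zthreads value would actually be plumbed through, and the DynSend/DynSync machinery from #109776 that makes such a type safe to share across threads is omitted.

```rust
// Hypothetical sketch of a runtime-selectable Lock: the choice between the
// cheap single-threaded representation and full-strength synchronization is
// made at construction time, based on -Zthreads. Not the actual rustc code.
use std::cell::{RefCell, RefMut};
use parking_lot::{Mutex, MutexGuard};

enum LockInner<T> {
    SingleThreaded(RefCell<T>), // -Zthreads=1: no atomics, no real locking
    MultiThreaded(Mutex<T>),    // -Zthreads=2+: real mutual exclusion
}

pub struct Lock<T>(LockInner<T>);

pub enum LockGuard<'a, T> {
    SingleThreaded(RefMut<'a, T>),
    MultiThreaded(MutexGuard<'a, T>),
}

impl<T> Lock<T> {
    pub fn new(value: T, multithreaded: bool) -> Self {
        if multithreaded {
            Lock(LockInner::MultiThreaded(Mutex::new(value)))
        } else {
            Lock(LockInner::SingleThreaded(RefCell::new(value)))
        }
    }

    // The branch here is the "non-zero performance cost" of runtime
    // selectability mentioned above.
    pub fn lock(&self) -> LockGuard<'_, T> {
        match &self.0 {
            LockInner::SingleThreaded(cell) => LockGuard::SingleThreaded(cell.borrow_mut()),
            LockInner::MultiThreaded(mutex) => LockGuard::MultiThreaded(mutex.lock()),
        }
    }
}

// Let callers use the guard like a &T regardless of which variant they got.
impl<T> std::ops::Deref for LockGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        match self {
            LockGuard::SingleThreaded(r) => r,
            LockGuard::MultiThreaded(g) => g,
        }
    }
}
```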