We have good benchmarking for Rust compile times, via rustc-perf. We should add benchmarking for Rust runtimes, i.e. the speed of the generated code.
There is a defunct project called lolbench which used to do this. It was only run on each Rust release. It hasn't run for a couple of years, and the website no longer exists, but the code is still available on GitHub.
The (in-progress) implementation is here.
The goals are similar to the goals of the existing compile-time benchmarks: to detect regressions and improvements in the Rust compiler.
There are several types of runtime performance changes that we would like to detect:
An explicit non-goal is to compare Rust speed against the speed of other languages.
This is possibly the most important section and also the one with the most unanswered questions. How do we actually measure the performance of the runtime benchmarks? Here are three groups of metrics that we could use, from the most to the least stable:
The least stable group is wall time: arguably the metric that users of `rustc`-compiled code ultimately care about(?), but also the most difficult one to measure correctly. We could measure wall times by launching the benchmark multiple times (here we have an advantage over the comptime benchmarks, because the runtime ones will probably be much quicker to execute than compiling a crate), but there will probably still be considerable noise. We could try to employ techniques to reduce noise (disable ASLR, hyper-threading, turbo boost and interrupts, pin threads to cores, compile the code with over-aligned functions, etc.), but that won't solve everything.
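As one concrete example of these techniques, pinning the benchmark thread to a fixed core is relatively cheap to do from inside the harness itself. A minimal sketch, assuming the `core_affinity` crate (the crate choice is an assumption of this example, not part of the proposal):

```rust
// Sketch: pin the current (benchmark) thread to a single core to reduce
// scheduling noise. Uses the `core_affinity` crate as one possible approach.
fn pin_to_core() {
    if let Some(core_ids) = core_affinity::get_core_ids() {
        // Pick an arbitrary fixed core; on a dedicated benchmarking machine
        // this could be made configurable.
        if let Some(core) = core_ids.first() {
            core_affinity::set_for_current(*core);
        }
    }
}

fn main() {
    pin_to_core();
    // ... run the actual benchmark here ...
}
```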
A related question is what tool to use to actually execute the benchmarks. Using `cargo bench` or Criterion will probably be quite noisy and only produce wall times. We could measure the other metrics too, but they would include the benchmark tool itself, which seems bad.
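For reference, a wall-time benchmark with Criterion would look roughly like this (the summed vector is just a placeholder workload):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// A wall-time-only benchmark as it would look with Criterion; this would live
// in benches/ with `harness = false` in Cargo.toml.
fn bench_sum(c: &mut Criterion) {
    let data: Vec<u64> = (0..10_000).collect();
    c.bench_function("sum", |b| {
        b.iter(|| black_box(&data).iter().sum::<u64>())
    });
}

criterion_group!(benches, bench_sum);
criterion_main!(benches);
```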
Another option would be to write a small benchmark runner that would e.g. let the user define a block of code to be benchmarked using a macro. We could then use e.g. `perf_event_open` to manually gather metrics only for that specific block of code. This is basically what `iai` does (although it seems unmaintained?).
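A rough sketch of that approach, using the `perf-event` crate as one possible wrapper around `perf_event_open` (the crate choice and the measured event are assumptions of this example, not part of the proposal):

```rust
use perf_event::{events::Hardware, Builder};

// Requires Linux and sufficient perf permissions.
fn main() -> std::io::Result<()> {
    // Count instructions retired only for the block of code between
    // enable() and disable(), ignoring the rest of the harness.
    let mut counter = Builder::new().kind(Hardware::INSTRUCTIONS).build()?;

    let data: Vec<u64> = (0..100_000).collect();
    counter.enable()?;
    let sum: u64 = data.iter().sum();
    counter.disable()?;

    println!("sum = {}, instructions = {}", sum, counter.read()?);
    Ok(())
}
```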
There are many things that we could benchmark. We could roughly divide them into two categories (although the distinction might not always be clear):
Examples include an n-body simulation, searching in text using the `regex`/`aho-corasick` crates, and stress-testing a `hashbrown` table. These two categories roughly correspond to the existing `primary` and `secondary` categories of comptime benchmarks.
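To illustrate the stress-test style of benchmark, a hash-table workload could be as small as this (the workload and sizes are made up for the example):

```rust
use std::collections::HashMap; // backed by hashbrown in the standard library

// Stress-test insertion and lookup in a hash table.
fn bench_hashmap() -> u64 {
    let mut map = HashMap::new();
    for i in 0u64..100_000 {
        map.insert(i, i.wrapping_mul(31));
    }
    // Look everything up again and fold the values so that the work
    // cannot be optimized away entirely.
    (0u64..100_000).map(|i| map[&i]).sum()
}

fn main() {
    println!("{}", bench_hashmap());
}
```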
Example pull request that adds a runtime benchmark: https://github.com/rust-lang/rustc-perf/pull/1459
Other benchmark ideas:
- Take the `slice::sort_unstable` microbenchmarks from stdlib and port them to `rustc-perf`.
- Use `regex` to go through a body of text and find/replace several regexes (see the sketch below).
- Benchmarks inspired by issues with the `I-slow` label from the `rustc` repo. Candidates:
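A minimal sketch of the regex idea above (the patterns and the input corpus are placeholders invented for the example):

```rust
use regex::Regex;

// Find and replace several patterns in a body of text.
fn bench_regex(haystack: &str) -> (usize, usize) {
    let word = Regex::new(r"\b[a-zA-Z]{7}\b").unwrap();
    let number = Regex::new(r"\d+").unwrap();

    let matches = word.find_iter(haystack).count();
    let replaced_len = number.replace_all(haystack, "N").len();
    (matches, replaced_len)
}

fn main() {
    // In the real benchmark the input would be a large, fixed corpus
    // shipped with the benchmark crate.
    let text = "lorem ipsum 1234 dolor sit amet 42 ".repeat(10_000);
    println!("{:?}", bench_regex(&text));
}
```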
We should decide in which repository the benchmarks should live:
- `rustc-perf`: since we will most probably want to use the existing rustc-perf infrastructure for storing and visualising the results, this is an obvious choice.
- `rust`: this would probably make the benchmarks more discoverable and easier for existing rustc developers to add to, but the same could be said about the existing comptime benchmarks.
- A separate repository (like `lolbench`): probably not worth it to spread the benchmarks across yet another repository.

What configurations should we measure? A vanilla `release` build is the obvious starting point. Any others?
Obviously, it will take some CI time to execute the benchmarks. We should decide whether the current perf.rlo infrastructure can handle it.
Since the vast majority of rustc commits shouldn't affect codegen, we can make the runtime benchmarks optional for manual perf runs (e.g. `@rust-timer build runtime=yes`). For automated runs on merge commits, we could run the benchmarks only if specific parts of the compiler are changed (MIR, LLVM). But this was already envisioned for comptime benchmarks and may not work well.
We should also think about how important the runtime benchmarks will be for us. Are they lower or higher priority than comptime benchmarks? Do we want to stop merging a PR because of a runtime regression? Do we want to run the runtime suite on all merged commits?
We could reuse the `rustc-perf` infrastructure. We can use the same DB as is used for comptime benchmarks (e.g. store `profile=opt`, `scenario=runtime` or something like that, and reuse the `pstat_series` table). We could put the runtime benchmarking code under the `collector` crate, because we will need the existing infrastructure to use a specific `rustc` version for actually compiling the runtime benchmarks.
Then we could prepare a benchmarking mini-library that would allow crates to register a set of benchmarks; it would execute them and write the results as e.g. JSON to stdout.
Something like this:
fn main() {
    let mut suite = BenchmarkSuite::new();
    // Register named benchmarks; each closure contains the code to be measured.
    suite.register("bench1", || { /* benchmark body */ });
    suite.run();
}
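A minimal sketch of what the mini-library behind this API could look like; only wall time is measured here and the JSON shape is invented for the example, both would be decided by the real implementation:

```rust
use std::time::Instant;

// Hypothetical benchmarking mini-library matching the API sketched above.
pub struct BenchmarkSuite {
    benches: Vec<(&'static str, Box<dyn Fn()>)>,
}

impl BenchmarkSuite {
    pub fn new() -> Self {
        Self { benches: Vec::new() }
    }

    pub fn register<F: Fn() + 'static>(&mut self, name: &'static str, benchmark: F) {
        self.benches.push((name, Box::new(benchmark)));
    }

    pub fn run(self) {
        let mut results = Vec::new();
        for (name, benchmark) in &self.benches {
            // Only wall time is gathered here; hardware counters (e.g. via
            // perf_event_open) could be added in the same place.
            let start = Instant::now();
            benchmark();
            let nanos = start.elapsed().as_nanos();
            results.push(format!(r#"{{"name":"{name}","wall_time_ns":{nanos}}}"#));
        }
        // Print the results as a JSON array so that the collector can
        // read them from stdout.
        println!("[{}]", results.join(","));
    }
}
```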
Then we could create a new directory for runtime benchmarks in `collector`. We could either put all of the runtime benchmarks into a single crate or (preferably) create several crates (each with different dependencies etc., based on the needs of its benchmarks) that would contain a set of benchmarks. `collector` would then go through all the crates, build them, execute them, read the results from the JSON and store them into the DB. I'm not sure how the interface should look; maybe we could introduce another level to the existing commands, like `bench_next comptime --profiles ...` and `bench_next runtime --benchmarks x,y,z`.
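On the collector side, executing one compiled benchmark binary and collecting its results could then look roughly like this (the binary path, the use of `anyhow`/`serde_json` and the JSON shape are assumptions of this sketch):

```rust
use std::process::Command;

// Run a compiled runtime-benchmark binary and parse the JSON it prints to
// stdout. In the real collector, the binary would have been built with the
// rustc version under test.
fn run_benchmark_binary(path: &str) -> anyhow::Result<Vec<serde_json::Value>> {
    let output = Command::new(path).output()?;
    let stdout = String::from_utf8(output.stdout)?;
    let results: Vec<serde_json::Value> = serde_json::from_str(&stdout)?;
    // Here the results would be stored into the rustc-perf database.
    Ok(results)
}

fn main() -> anyhow::Result<()> {
    // Hypothetical path to one of the runtime benchmark crates' binaries.
    for result in run_benchmark_binary("target/release/my-runtime-benchmarks")? {
        println!("{result}");
    }
    Ok(())
}
```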
A sketch of this implementation can be found here.
Since we already have the perf.RLO dashboard, it probably makes the most sense to reuse it. Even though we will probably reuse the DB structure of the existing comptime benchmarks, the runtime benchmarks might warrant a separate UI page, for these reasons: