Rust Compiler Team
# Compiler performance roadmap for 2022

**Update:** *With 2022 having passed, this roadmap is now a historical document. It was a useful tool and guided a lot of performance work in 2022. Good progress was made on most top-level items, except for "Better UX for perf evaluation" where not much progress was made. Future performance work will be directed elsewhere.*

There are many ideas on how to reduce Rust compile times and avoid regressions. It can be hard to know in advance which ones are likely to work. This document outlines a relatively small number of tractable tasks with a high chance of success that we hope to complete in 2022. It is partly informed by the [analysis](https://hackmd.io/mxdn4U58Su-UQXwzOHpHag?view) of lqd's [large-scale profiling exercise](https://github.com/lqd/rustc-benchmarking-data).

It is best to think of this document as a rough guide, rather than a strict prescription. There are ideas not covered by this roadmap that could help compiler perf, including large projects like the parallel compiler. This document's existence does not preclude people working on those ideas.

As well as a plan, this document will serve as a means of tracking who is doing/has done what work. Task assignments are shown in square brackets, e.g. "\[name\]". Task completion is indicated like so:
- [ ] incomplete
- [x] complete

## Faster single crate compilation

Tasks that will speed up compilation of individual crates.

- [ ] **Fully optimize `rustc` on all tier-1 platforms**. There are various outside-the-compiler optimizations *only* applied to x86_64-unknown-linux-gnu. We should bring these optimizations to other popular tier-1 platforms like Mac and Windows. In some cases this may be as easy as using the relevant scripts on the appropriate CI builders. This could give 15-30% wins for relatively little effort!
  - [x] PGO on Windows \[lqd + Kobzol, done in [#96978](https://github.com/rust-lang/rust/pull/96978), lots of 10-20% wins]
  - [ ] PGO on Mac [CI configuration/capacity is the limiting factor here]
  - [ ] Build LLVM with ThinLTO on Windows.
  - [ ] Build LLVM with ThinLTO on Mac.
  - [x] Use [BOLT](https://research.facebook.com/publications/bolt-a-practical-binary-optimizer-for-data-centers-and-beyond/) on x86-64 Linux (the only Rust tier 1 platform supported by BOLT). Expected 5-10% improvement atop the existing set of PGO/ThinLTO. \[Kobzol, [#94381](https://github.com/rust-lang/rust/pull/94381) did it for the LLVM backend, giving 3.5% bootstrap improvement and 3.6% average max-rss improvement.]
  - [ ] A better allocator on Windows than the system allocator.

  Issue [#103595](https://github.com/rust-lang/rust/issues/103595) is tracking PGO/LTO/BOLT/allocators more comprehensively across all the most popular platforms.
- [x] **Hot code.** Try to optimize the hottest code, as identified by Cachegrind/DHAT/etc in lqd's data set.
  - [x] `parse_tt` and other functions related to macro parsing are easily the hottest, do many allocations, and are likely worth significant effort. \[nnethercote, this [blog post](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) has details\]
  - [x] Some crates trigger uses of huge `BitSet`s, e.g. http-0.2.6. This causes high memory usage, lots of `memcpy`ing, etc. \[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984)\]
  - [x] There is a long tail of opportunities that affect a few crates. It will be worth looking at each of them briefly; there are probably a few easy wins. See the [analysis](https://hackmd.io/mxdn4U58Su-UQXwzOHpHag?view) document for details. [nnethercote + others]
  - [x] Metadata decoding takes roughly constant time for crates like `std` and `core`. For tiny crates this is a high fraction of compilation time, and may be worth some effort to reduce.
    It's also something that is repeated for every crate in a multi-crate project. \[martingms, [#95981](https://github.com/rust-lang/rust/pull/95981), helped a bit\] \[nnethercote, [#95981](https://github.com/rust-lang/rust/pull/95981), helped a lot]
- [x] **FxHasher improvements.** [#93651](https://github.com/rust-lang/rust/pull/93651) refers to a promising change to `rustc-hash`. The performance benefit should be reconfirmed and, if it persists, the improvement merged and imported into rustc. \[lqd, [#96863](https://github.com/rust-lang/rust/pull/96893); results seemed good on AMD, but indifferent on Intel. We decided to give up on this because `FxHasher` is a tar pit and lots of time has been previously wasted on attempts to improve it.]
- [ ] **Linker improvements.**
  - [ ] `lld` is much faster than the default linkers, but isn't used by default. Can it be? [lqd, [MCP 510](https://github.com/rust-lang/compiler-team/issues/510) is the first step, which links to other PRs.]
  - [x] `mold` is a new linker that is reputed to be even faster than `lld`. We should investigate how well it works with Rust, and whether it can be made easier to use, including documentation. Given that it is new and not yet multi-platform, this would be for opt-in use. [nnethercote successfully tried `-Clink-arg=-fuse-ld=lld` on a small program on Linux, found it to be faster than lld, and updated the perf-book accordingly. lqd's work on `lld` will also partly pave the way for `mold` in the future.]

## Faster project compilation

Tasks that will speed up the compilation of multi-crate projects.

- [x] **Improve cargo scheduling**. The choice of which crates to compile first can greatly affect overall time. Use better heuristics to improve scheduling, e.g. based on crate size, or past records of compile times.
  Support for crate priority (a measure of the transitive number of dependencies) in the wait queue has landed in [#11032](https://github.com/rust-lang/cargo/pull/11032) (and there are more benchmark results [here](https://github.com/lqd/rustc-benchmarking-data/tree/main/experiments/cargo-schedules/pending-queue-sorted)): when there are multiple crates waiting to be built, e.g. when there are a lot of units of work or a low core count, the highest-priority crate will be picked next. [lqd. More work can be done here, as outlined in the description, and an experiment where these priorities are computed differently, with popular proc-macros preferred, can also be found [here](https://github.com/lqd/rustc-benchmarking-data/tree/main/experiments/cargo-schedules/pending-queue-prioritized). Similarly, we could also choose different defaults targeting compilation speed, like disabling debuginfo for build dependencies in [#10493](https://github.com/rust-lang/cargo/pull/10493). These topics can be subtle and require interaction with the cargo team, which has little time at the moment. Both directions have potential and measured improvements, but now may not be the best time to pursue them.]
- [ ] **Improve `syn`/`proc-macro2`/`quote`**. These crates incur a lot of compilation costs, being among the most popular crates and quite slow to compile. Also, they block compilation of proc macro crates that are themselves often slow to compile. This leads to long and slow dependency chains like: `proc-macro2`, `quote`, `syn`, `serde_derive`, `serde`, `serde_json`. (And that omits the build scripts.) Can they be improved somehow?
- [ ] **Investigate build script costs.** Build scripts are typically tiny but take a surprisingly long time to compile. What is going on, and can they be improved? Alternatively, can we provide features (e.g. in Cargo) to avoid the need for build scripts that just do simple things like setting conditional compilation flags? [lqd, in-progress.
  Note: build scripts are a cargo concept, so any changes there may need time from t-cargo and the rustup wg, which is currently limited as mentioned above]

## Better benchmarks

[rustc-perf](https://github.com/rust-lang/rustc-perf/) is the primary benchmark suite used for gauging Rust compilation speed, in particular for detecting improvements and regressions. It can be improved in several ways.

- [x] **Split into "primary" and "secondary" benchmarks.** The suite contains a lot of "real-world" crates, like `cargo`, `diesel`, `ripgrep`, and `syn`. It also contains a number of "synthetic" stress tests and microbenchmarks, either extracted from real-world code, or simply constructed out of thin air. When looking at the performance effects of a change on perf.rust-lang.org, changes to the speed of real-world tests are more important than changes to the speed of synthetic tests, but the latter often crowd out the former. It should be simple to split the suite into "primary" and "secondary" benchmarks, and present the primary results on all pages before the secondary results. The division would be roughly "real-world" vs. "synthetic", but the terms "primary" and "secondary" allow for some human judgment. E.g. the `helloworld` benchmark isn't exactly "real-world" but it's a useful indication of the minimum time taken for any program, and might be considered a "primary" benchmark despite its simplicity. \[Kobzol, [#1181](https://github.com/rust-lang/rustc-perf/pull/1181)\]
- [x] **Update the real-world benchmarks.** The versions in the suite are mostly quite old, e.g. 3 or 4 years. For example, the `syn` crate version in the suite is 0.11.11 from April 2017, but the latest version of that crate at the time of writing is 1.0.86. We should update the real-world benchmarks to their latest versions to ensure we are benchmarking widely-used code.
  The previous upgrade of `hyper` (and its renaming within the suite as `hyper-2`) provides precedent for this, though perhaps we should switch to using the crate version number, e.g. `syn-1.0.86`, which would be visually noisier but have a clearer meaning. \[nnethercote, rylev, lqd, Kobzol; tracked [here](https://hackmd.io/d9uE7qgtTWKDLivy0uoVQw)\]
- [x] **Add new benchmarks.** lqd's data set identifies the following.
  - Widely-used crates that are not in the benchmark suite, such as `libc`, `quote`, `proc-macro2`, `cfg-if`, and `log`.
  - Hot macro parsing functions (like `parse_tt`), used in numerous crates, that the existing benchmarks hardly use.
  - Crates that stress the compiler in interesting ways, such as `nalgebra`.

  We should consider adding new benchmarks to represent these. We should also consider adding one or more build scripts (as leaf crates, if possible) because these are common but not represented. \[nnethercote, tracked [here](https://hackmd.io/d9uE7qgtTWKDLivy0uoVQw)\]
- [x] **Remove or combine low-value benchmarks.** So that the suite doesn't grow in size monotonically, we should remove or combine benchmarks that are uninteresting. Highly synthetic benchmarks should be candidates for this; it is generally better to have a real-world benchmark that stresses a particular aspect of the compiler. For example, `tuple-stress` contains a huge literal extracted from a real program, whereas `deep-vector` contains a huge `vec!` literal with thousands of zeroes. And we don't need three different benchmarks testing "deeply nested" behaviour. \[nnethercote, tracked [here](https://hackmd.io/d9uE7qgtTWKDLivy0uoVQw)\]
- [x] **Establish benchmark add/remove/update policies.** Previous additions, removals, and updates have been done on an ad hoc basis. It would be good to establish some basic policies around this. For example, should we update real-world crates every 1 year? Every 2 years?
  \[nnethercote, [#1318](https://github.com/rust-lang/rustc-perf/pull/1318)\]
- [ ] **Consider multi-crate benchmarks.** The suite mostly measures intra-crate compilation speed, specifically that of the final crate compiled in a package. (There is also one multi-crate benchmark, measuring the time taken to compile the compiler itself. This is a moving target because it always compiles the latest version of the compiler.) Project-level improvements, such as pipelining, do not show up much in the suite results. We should consider whether to add new benchmark types to capture the multi-crate improvements/regressions. Adding `cargo --timings` to the profiling tools would also be useful.

## Better UX for perf evaluation

Identify and implement features and/or process improvements to improve drive-by performance evaluation of pull requests, both in likelihood of occurrence and effective decision making.

- [ ] **Define a set of rules for go/no-go decisions**. Enable clear-cut cases to land without manual review by the perf team (as signaled by rust-timer comments), and identify cases that should be referred to the performance team for evaluation. [All comments now have a clear "ACTION NEEDED"/"no action needed" marker, and the perf team is now CC'd on all "ACTION NEEDED" ones and often triages/comments quickly. Not quite what this item was asking for, but both changes make life easier for rustc devs who don't intimately understand the perf suite.]
- [ ] **Survey compiler team on impediments to result triage**. Enable triage of perf.rust-lang.org results and/or local investigation into them. Depending on results of discussions, may involve cutting down time to perf.r-l.o results (try + perf run) or providing more granular per-benchmark results pages (e.g., cachegrind diffs). [Known problems: regressions in rollups; noisy results]
- [ ] **Improve memory usage tracking**.
  Current max-rss statistics are sufficiently noisy that the significance threshold is often high, which prevents small improvements from registering. Plus, tackling memory usage *not* at peak is also important and not tracked at all. This is of particular importance as core counts in machines greatly increase in the next 2-3 years, while available memory remains largely flat, meaning that memory usage in rustc must go down if we are to spread across all cores.
- [ ] **Expose more/all metrics into decision-making**. Currently, the only metric we use is instructions:u for benchmarks, excluding all other metrics. We also don't include bootstrap data in summary reports. These metrics are then essentially not considered when evaluating performance changes. [RSS and cycles are now put in the GitHub comments, but more can be done here]
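
To make the crate priority idea from the "Improve cargo scheduling" item more concrete, here is a minimal sketch of priority-based selection from a pending queue. It is not Cargo's implementation. It assumes that a crate's priority is the number of crates that transitively depend on it, which is only one reading of "transitive number of dependencies". The names `Graph`, `example_graph`, `priorities`, and `pick_next` are hypothetical, and the example graph is loosely modeled on the dependency chain mentioned above rather than taken from real crate metadata.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical dependency graph: crate name -> crates it depends on directly.
type Graph = BTreeMap<&'static str, Vec<&'static str>>;

/// Illustrative graph, loosely modeled on the chain named in this document.
fn example_graph() -> Graph {
    BTreeMap::from([
        ("proc-macro2", vec![]),
        ("quote", vec!["proc-macro2"]),
        ("syn", vec!["proc-macro2", "quote"]),
        ("serde_derive", vec!["proc-macro2", "quote", "syn"]),
        ("serde", vec!["serde_derive"]),
        ("serde_json", vec!["serde"]),
    ])
}

/// Assumed priority: how many crates depend on each crate, directly or transitively.
fn priorities(graph: &Graph) -> BTreeMap<&'static str, usize> {
    let mut dependents: BTreeMap<&'static str, BTreeSet<&'static str>> =
        graph.keys().map(|&name| (name, BTreeSet::new())).collect();
    for &root in graph.keys() {
        // Walk everything reachable from `root`, recording `root` as a dependent.
        let mut stack: Vec<&'static str> = graph[root].clone();
        let mut seen: BTreeSet<&'static str> = BTreeSet::new();
        while let Some(dep) = stack.pop() {
            if seen.insert(dep) {
                dependents.entry(dep).or_default().insert(root);
                if let Some(next) = graph.get(dep) {
                    stack.extend(next.iter().copied());
                }
            }
        }
    }
    dependents
        .into_iter()
        .map(|(name, set)| (name, set.len()))
        .collect()
}

/// Removes and returns the pending crate with the highest priority.
/// Ties are broken by name so that the choice is deterministic.
fn pick_next(
    pending: &mut Vec<&'static str>,
    prio: &BTreeMap<&'static str, usize>,
) -> Option<&'static str> {
    let idx = pending
        .iter()
        .enumerate()
        .max_by_key(|&(_, &name)| (prio.get(name).copied().unwrap_or(0), Reverse(name)))
        .map(|(i, _)| i)?;
    Some(pending.swap_remove(idx))
}
```

With this graph, `proc-macro2` receives the highest priority because every other crate depends on it, so a scheduler with a free slot would start it before any leaf crate. A real scheduler would also need to track which crates have all of their dependencies built; that bookkeeping is omitted here.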
