Small projects build relatively quickly, with a combination of check builds and incremental compilation, but issues compound
at scale, e.g. in larger projects with interdependent crates, where available parallelism often goes untapped.
It's hard to specify precise projects in advance. Most of the initial work would be investigating reasons for any current
slowness and then proposing solutions.
Hopefully this can be a holistic look at how users are experiencing compile times. We don't have to limit investigations to the perf.rlo
benchmarks, and measurements can be gathered on real projects. (The low-hanging fruit that is easily seen in our existing benchmarks
is disappearing, but there is surely more to be found elsewhere.)
Similarly, we can also look at wins by
- aggregation of marginal gains: even small wins compound in number (many small wins become big wins) and over time
- reducing ecosystem-wide TCO: even small wins compound over space, i.e. all users.
Some possible priorities:
- A. Making pipelined compilation as efficient as possible
- B. Improving raw compilation speed
- C. Improving support for persistent, cached and distributed builds (for cases where incremental compilation is not an obvious
win but caching can sometimes be one, e.g. on CI)
(we will refer to these items elsewhere in the document as e.g. "theme-pipelined-compilation")
1. Improving compilation times by making rustc/cargo faster
2. Improving compilation times by preventing rustc/cargo from getting slower
3. Improving compilation times by changing the input code: finding sources of slowness in a project (e.g. unused dependencies, duplicate dependencies, buggy build scripts causing unnecessary recompiles, pipelines stalled by the project's architecture, e.g. some possibly slow-to-build-and-execute proc-macros)
4. Helping with other people's tasks achieving the same goals (e.g. reviews, benchmarks, etc.)
The size/complexity of some shorter tasks can be roughly guesstimated (S/M/L), and we'll refer to them as e.g. "size-s"; other tasks are open-ended ongoing work, harder to measure and to call complete ("size-open-ended").
Only scope 1 above is easily and directly measurable; scopes 2-4 are indirect and more open-ended, although concrete tasks are identifiable as parts of ongoing work towards a goal.
### 1. Gathering benchmarks to detect problems and opportunities (the benchmarking itself is generally size-s)
- benchmark/optimize execution and compile times of key crates in the ecosystem: syn/quote/serde and similar slow-to-compile crates that are heavily depended on, or the list of most popular crates (e.g. the playground refers to the 100/200 most popular ones), and so on. Goal: improvements to all dependent crates.
- look for extreme cases of compilation (pathological compile times?) on crates.io; look for the crates that most exercise the different queries, compilation phases and passes, etc. At some scale, quadratic spots can be hit, e.g. in coherence checking; and separate rustc from LLVM (and evaluate the impact of e.g. polymorphization) with at least:
* timely / differential-dataflow (likely nowadays: materialize): Frank has mentioned before that they generated gigabytes of LLVM IR (from what I imagine was excessive monomorphization and extremely generic APIs)
- with the increasing use of compile-time features, check if the CTFE engine/miri are imposing a big cost in practice (and if so, see how much a better VM / faster interpreter, some JIT, etc, could help). Benchmarking: size-s
- analyze proc-macro usage on all of crates.io to see their impact on the ecosystem, look for common work and patterns that could be moved into the proc-macro server, as well as help move along in-progress work there (and try to find optimization opportunities there if needed).
- try to assess the `parallel-compiler` in-progress work on all of crates.io (I'm not sure whether this cfg is enforced in CI, but it does still compile as of writing). The results could be seen as a minimum threshold, though, since IIUC some queries could be re-architected to make better use of parallelism (e.g. doing some work at the module level in parallel)
- maybe interesting: contrast miniserde vs serde vs serde-erased: their design goals are different; see how they respectively impact compile times and the ecosystem (and a possible link to "replacing monomorphization with dynamic dispatch")
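To make the monomorphization/dynamic-dispatch trade-off mentioned in the last point concrete, here's a minimal sketch (function names are made up for illustration): the generic version gets one copy of IR and machine code per instantiated type, while the `dyn` version is compiled once and pays an indirect call instead.

```rust
use std::fmt::Display;

// A generic function is monomorphized: each concrete `T` it is called with
// produces its own copy of LLVM IR and machine code to optimize.
fn describe_mono<T: Display>(value: T) -> String {
    format!("value = {value}")
}

// The dynamic-dispatch version is compiled exactly once, no matter how many
// types call it; the cost moves to an indirect call at runtime.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {value}")
}

fn main() {
    // Three instantiations of `describe_mono` (i32, f64, &str)...
    assert_eq!(describe_mono(1_i32), "value = 1");
    assert_eq!(describe_mono(2.5_f64), "value = 2.5");
    assert_eq!(describe_mono("hi"), "value = hi");
    // ...versus a single compiled body for `describe_dyn`.
    assert_eq!(describe_dyn(&1_i32), "value = 1");
    assert_eq!(describe_dyn(&"hi"), "value = hi");
}
```

An opt-in lever trading runtime speed for compile speed would essentially move code from the first shape to the second.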
The first step of gathering data is in-progress, and we've tried to focus on:
- differentiating leaf crates from the full compilation schedule
- CPU and memory usage measurements
- metrics of "size" (line counts, amount of LLVM IR or MIR) to have some idea of throughput in looking for outliers
- cargo timings (although available parallelism here needs to be tailored carefully to common realistic cases)
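As a sketch of the "size metrics for throughput" idea, here's a small hypothetical helper (not existing tooling): given a size metric (lines, MIR, or LLVM IR) and compile time per crate, flag crates whose throughput falls far below the median, using the median absolute deviation as a robust spread estimate.

```rust
// Median of a mutable slice (sorts in place; fine for small benchmark sets).
fn median(values: &mut [f64]) -> f64 {
    values.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mid = values.len() / 2;
    if values.len() % 2 == 0 {
        (values[mid - 1] + values[mid]) / 2.0
    } else {
        values[mid]
    }
}

/// Flag crates whose throughput (size units per second of compile time) sits
/// more than `k` MADs below the median: candidates for closer profiling.
fn throughput_outliers(samples: &[(&str, f64, f64)], k: f64) -> Vec<String> {
    let rates: Vec<f64> = samples.iter().map(|(_, size, secs)| size / secs).collect();
    let med = median(&mut rates.clone());
    let mad = median(&mut rates.iter().map(|r| (r - med).abs()).collect::<Vec<_>>());
    let mut flagged = Vec::new();
    for ((name, _, _), rate) in samples.iter().zip(&rates) {
        if *rate < med - k * mad {
            flagged.push(name.to_string());
        }
    }
    flagged
}

fn main() {
    // Hypothetical numbers: (crate, size metric, compile seconds).
    let samples = [
        ("regex", 120_000.0, 30.0),    // 4000 units/s
        ("serde", 200_000.0, 50.0),    // 4000 units/s
        ("rand", 40_000.0, 10.0),      // 4000 units/s
        ("slowpoke", 10_000.0, 100.0), // 100 units/s: the outlier
    ];
    assert_eq!(throughput_outliers(&samples, 3.0), ["slowpoke"]);
}
```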
### 2. Helping others improve perf, and helping people help themselves (size-open-ended, indirect tasks, scopes 2-4)
- help with the weekly perf report done by the performance WG
- collaboration with interested people
- observability: allowing people to understand which parts of their code or dependencies slow compile times down, and how they could improve build times by changing their own project (pulling things into different crates, limiting mono items using dynamic dispatch, outlining and polymorphization opportunities). A mix between:
* showing unused dependencies and duplicates
* cargo llvm-lines and cargo bloat
* displaying heavy items in the monomorphization graph and their actual cost in CGUs, rather than MIR statement counts
* unexpected rebuilds via build.rs: these sometimes cause long compile times, and there's a hard to find environment variable to track the cause of these rebuilds, which can help fix such issues. We should at least document it better, and maybe surface it in the CLI in a more prominent manner.
* maybe being able to monitor builds over time and show possible improvements (to users or rustc devs), e.g. single-core bottlenecks in their cargo timings graphs (not necessarily related to the old telemetry ideas)
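A sketch of what detecting single-core bottlenecks from timing data could look like (a hypothetical helper; cargo's actual timings format is not assumed here): a sweep over per-unit build intervals, totaling the stretches where only one unit was building.

```rust
/// Total time during which exactly one unit of work was building: the
/// stretches where the build is fully serialized and extra cores sit idle.
/// Intervals are (start, end) pairs in seconds.
fn serialized_time(intervals: &[(f64, f64)]) -> f64 {
    // Sweep line: +1 event at each start, -1 at each end.
    let mut events: Vec<(f64, i32)> = Vec::new();
    for &(start, end) in intervals {
        events.push((start, 1));
        events.push((end, -1));
    }
    events.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    let (mut active, mut last_t, mut total) = (0i32, 0.0f64, 0.0f64);
    for (t, delta) in events {
        // Accumulate the span since the previous event if concurrency was 1.
        if active == 1 {
            total += t - last_t;
        }
        active += delta;
        last_t = t;
    }
    total
}

fn main() {
    // Three units: two overlap on [2, 4]; one runs alone on [8, 10].
    let timings = [(0.0, 4.0), (2.0, 6.0), (8.0, 10.0)];
    // Serialized stretches: [0,2] + [4,6] + [8,10] = 6 seconds single-core.
    assert!((serialized_time(&timings) - 6.0).abs() < 1e-9);
}
```

Surfacing such stretches (and which unit occupied them) would point directly at pipeline stalls users could fix by restructuring crates.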
### 3. rustc (generally theme-raw-compilation-speed)
#### 3a. rustc - beginning of the pipeline
- help the frontend work to make it more incremental, queryfied, and parallel. (There are many open issues and PRs, each with a different scope, so more "size-open-ended")
- emit rmetas as early in the pipeline as possible, for both efficiency and latency. Some work in this area has already been tried by njn and it didn't make that much of an improvement. The situation may have changed (higher core counts, deeper/wider build graphs), and it could also be more impactful if combined with the cargo-specific build schedule changes we'll touch on below (theme-pipelined-compilation; and there are some quick things to try here as size-s)
- optimize encoding/decoding of metadata files as well as incremental compilation caches, e.g. LEB128 encoding via SIMD, as well as encoding & decoding of Spans. Note that LEB128 handling has already been heavily optimized, and low-hanging fruit is unlikely here. (Some SIMD LEB128 experiments could be size-s/size-m if SSSE3 can be detected early enough -- of course, the concerns of micro-architecture levels apply here and to all similar topics requiring levels other than 1, be it in libcore/libstd or rustc. Scalar improvements could require rearchitecting things to not encode one int at a time.) (theme-pipelined-compilation as well)
- (probably not super interesting: a few things have been tried before by njn) lexing/parsing: a few years back, some perf tests were done on the list of pre-interned symbols (as well as a few PRs, which I don't remember being merged, to improve cacheability and open up parallelism opportunities here) which seemed interesting to try again, especially as one such change had landed and was categorized as a "rogue optimization" in a t-compiler meeting. Gather a list of the most common identifiers/names on crates.io and libstd/libcore, and use them as pre-interned symbols to speed up lexing/parsing. Or use data from previous compilations to intern the most common symbols in the current crate, or from the whole crate graph to limit symbol lookups when importing metadata.
- look for parallelism opportunities in incremental compilation cache encoding
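For reference on the LEB128 point above, this is the scalar shape of unsigned LEB128 (the variable-length integer encoding used for metadata and incremental caches): one integer at a time, 7 payload bits per byte, which is exactly the loop-carried structure that makes SIMD batching awkward without rearchitecting.

```rust
// Scalar unsigned LEB128: 7 bits of payload per byte, high bit set on all
// but the last byte. Note the data dependency: each output byte depends on
// the running shift of a single input value.
fn write_u32_leb128(out: &mut Vec<u8>, mut value: u32) {
    loop {
        if value < 0x80 {
            out.push(value as u8); // last byte: high bit clear
            return;
        }
        out.push((value as u8 & 0x7f) | 0x80); // continuation byte
        value >>= 7;
    }
}

/// Returns the decoded value and the number of bytes consumed.
fn read_u32_leb128(bytes: &[u8]) -> (u32, usize) {
    let (mut result, mut shift, mut pos) = (0u32, 0u32, 0usize);
    loop {
        let byte = bytes[pos];
        pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return (result, pos);
        }
        shift += 7;
    }
}

fn main() {
    let mut buf = Vec::new();
    write_u32_leb128(&mut buf, 624_485);
    assert_eq!(buf, [0xe5, 0x8e, 0x26]); // the classic DWARF example
    assert_eq!(read_u32_leb128(&buf), (624_485, 3));
}
```

A SIMD variant would have to decode several continuation-bit runs at once (e.g. via a movemask over the high bits), which is why the encoder's one-value-at-a-time call sites are the real obstacle.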
#### 3b. rustc - general
- there are obviously allocations on the happy path that could be avoided (e.g. the most recent example from my own memory: reordering generic parameters), and similar thinking could be applied to avoid gathering diagnostic data when there are ultimately no errors that would make use of it (kind of size-open-ended, as this entails many small tasks). (But those areas are hard to find, and tooling will not easily point at areas of interest)
- look into reducing memory usage by dropping pieces of context earlier (from recent perf tests on the intervals PR, I was surprised at some state being retained in borrowck until codegen and the final context drop). Also look into doing the drops on another thread (if that's not done already, I think it's not) e.g. the "drop AST" tasks. (Although there can be an impact on allocators tailored for highly parallel applications, e.g. it could drain tcmalloc's size class unexpectedly). Generally size-s, size-m. (Can we use the multi-process trick where an allocating process takes the hit of memory deallocation while the initiating process handles the results and can quit earlier?)
- how well does the query system work for both its memoization and incremental compilation use-cases? do they impose significant costs on one another? (e.g. would they individually work better if separated? say, hash stability looks like a concern for cross-session serialization that could impact general use and memoization, when needing to limit the existing structures' capabilities à la removing some `Ord` impls recently)
- related (although long, complicated, and unlikely to be worthwhile): can salsa be evaluated, on the subsets it shares with the query system?
- are there ideas from the polonius work on location insensitivity, separating and avoiding work for loans and subset errors, that could be applied to NLLs?
- Obligations and ObligationForest (although some of this has been worked on, and has been optimized extensively already since it's so hot in profiles):
* a bunch of in-progress work was never (to my knowledge) completed (in rustc and possibly in ena as well), and since this is quite regularly hot, maybe completing it could help here
* obligations are stored in a Vec but require deduplication, and this has some performance impact (and in some cases a big impact on compile times, cf. recent issues). Improvements here could mean using a dedicated set (or understanding the source of these duplicate obligations: the deduping is a workaround, and may be avoidable)
- some parts that are expected to be feature-gated impact the rest of the code even when not active (hard to find in general; maybe sifting through old perf.rlo benchmark results could help locate some of these, but in general tools cannot easily help here). A couple of examples are/were known, and there may be others:
* the closure capture RFC 2229 implementation (it's now enabled by default on edition 2021, and therefore not a concern per se)
* some checks in negative impls appear to impact regular code (because coherence can already be hot at scale), although some work has already been done to mitigate them
- allocators (generally size-s):
* revisit jemalloc for sized deallocations (there has been some activity in this area and it seems like they're an actual loss, which the authors didn't expect). Open issues there so that they can be fixed if possible. Check jemalloc stats for the benchmarked cases. Possibly move to the 5.3 release, which should be out soon.
* if jemalloc's fast-path sized deallocations aren't actually a win (while we expected to be using its slow path), then test alternatives: mimalloc seems to have faster non-sized deallocs (and using it was generally a win in rustc benchmarks at the time, compared to older versions of jemalloc before switching to tikv's most recent rev), or tcmalloc (I don't remember it being tested recently, possibly ever). (mimalloc could also be nice to evaluate on windows, as the rustc allocator for the win32 targets)
* mimalloc: in my limited tests (mostly incremental + debug profiles on the shorter perf.rlo benchmarks), it seemed like a win of 1-5% (but with big regressions in max-rss, supposedly fixed in later beta releases of mimalloc, although those also regressed performance). And apparently since I've written this, someone else has also started evaluating it.
* jemalloc seems to have windows CI and testing under windows: is the situation similar to years ago, when using it on windows was a pessimization, or have these regressions since been fixed?
* snmalloc (a couple of years ago its performance was close to mimalloc's, and some inspiration happened between the two) (but it uses sized deallocations IIRC): snmalloc v1's regular "unsized" deallocations are in my tests slower than jemalloc's (or mimalloc's), but there were up to 4% improvements with sdallocx, with no big issue (apart from a const eval test failing unexpectedly for OOM, which needs investigating: it looks like an allocator panic/abort rather than returning an error that the allocation failed). Less lock/mutex use shows up in the profile compared to jemalloc, and supposedly multi-threaded performance and cross-thread deallocations are a key feature (which ties up nicely with the other "Drop XXX" items in this list). snmalloc v2 is being worked on with more improvements. Promising in any case. (Although I'm not quite 100% confident I managed to override LLVM's allocator at the same time as rustc's)
* amanieu mentioned rpmalloc as another possibility
- evaluate MIR optimizations and their potential:
* what's the maximum performance LLVM could achieve if rustc generated perfect IR (size-s?). Also: the MIR inliner looked promising in recent tests; it would be good to better analyze and understand where it offers gains (intuitively, it should be because inlining allows rustc to optimize the MIR further, greatly reducing the amount of IR given to LLVM)
* monomorphizations are likely the area with the biggest potential for improvement and impact (and the source of many wins in the past): targeted optimizations there (or manual hints?) to limit the size and number of instantiations will help. Polymorphization is intuitively of interest here, and evaluating it globally, finding possible issues and expanding its scope is expected to be worthwhile. Focusing existing MIR optimizations on the most heavily instantiated generic functions could be as well. On the opposite side of the spectrum, there could be an opt-in to trading runtime speed for compilation speed, e.g. by switching from generic function monomorphization to dynamic dispatch.
* iterators seem to generate a ton of IR that LLVM can take a while to handle; are there other such patterns that codegen emits?
* how common is drop-glue bloat due to inlining a bunch of code in leaf crates (IIRC alex crichton had opened an issue about this, encountered in one of the stdlib types)? Manually outlining these would help here (but maybe the LLVM IR outliner & similarity-analyzer passes are worth trying as well?)
* (or: pcwalton's local-GVN tests, although incomplete, seemed interesting as well; maybe look into that.)
* there must still be things that are hashed multiple times à la hashes-of-hashes (as has happened before); check this
* quantify the impact of incremental hash verifications (+ current non-determinism): can it be graphed and tracked on perf.rlo, and most importantly can it be reduced and removed altogether (this is hard and would require completing the work to limit access to data that is not stable across sessions, it's already in-progress and helping there could be interesting)
* can we use e.g. LCGs + Unhash on things to reduce the number of pieces of data hashed very often?
* check if we can actually make some changes in fxhash (unlikely to be super worthwhile): some WIP PRs exist, ahash's creator could be starting a low quality hash, and maybe some ILP improvements could help throughput without increasing collisions (or vectorizing the hashing of bigger non-primitive types even more). A lot of work has already been put into this, so easy pickings are unlikely, as fxhash is frustratingly effective in our benchmarks.
* (.. or in the hashing of the incremental compilation cache? where cryptographic properties are more important than runtime hashing speed per se)
* why is there MD5 hashing in incremental, and why are we sometimes hashing full source code with fxhash (check that it's indeed fast enough and out of the critical path, as we think it is) instead of using some other unique interned value for that piece of code?
* re-analyze the sizes and distributions of values for hashes and keys: lots of small values, lots of empty tables in the end, and see if we can special-case/optimize the most common cases
- random codegen and simd:
* technically rustc is a rust program, and improving codegen improves rust compile times (of course, this is indirect and hard to test in the absence of runtime benchmarks à la lolbench). Some work could be interesting there; a lot of tests have been done in the past, including some apparently showing that the niche layout optimization (on Option-like enums) can be a pessimization in some cases, because of bad codegen of match/discriminant checking (it's not clear whether fixing that would be an interesting compiler-throughput improvement, and there is some in-progress work towards fixing it. It does block more niche optimization work from landing though.)
* some of the derives lower to AST in interesting ways, e.g. PartialEq/Eq on enums extract the discriminant and then match on the values, which also lowers to extracting the discriminant; check whether that AST lowering has an impact downstream, from MIR construction all the way to codegen?
* the built-in derive macros compile slowly compared to a manual implementation of the traits (e.g. rylev's benchmarks), and can compile slowly in general. Can their cost be evaluated on all of crates.io, to see how much using MIR shims would help and defer costs until they're actually used? (Maybe recognizable dummy implementations could also reduce typechecking and other steps, if there's a way to take advantage of their final known structure, or of similarity between different instantiations, to avoid early/middle work?) (Maybe here as well, look at whether polymorphization could help improve the scaling behaviour when there are a lot of derives)
* are there hot loops with bounds checks in them? they are often hurtful in performance-sensitive contexts. (at a recent LLVM dev meeting there was mention of a recent pass trying to use dominators to help track redundant checks, which could be interesting to try)
* use SIMD in libcore/libstd/rustc e.g. for rust_data_structures; the intervalset could use it for compression, searches and intersections, and there are a bunch of Lemire's work that could provide inspiration. Maybe collecting use-cases could help provide incentive to improve/fix the situation.
* changes to libstd, e.g. sorting with SIMD acceleration (à la VxSort) and/or Lomuto/Hoare partitioning (unfortunately, stjepang has deleted all his github repositories, including the pattern-defeating quicksort benchmarks he did when implementing it into the unstable sorting functions in the stdlib)
* check usage of the crc32fast crate, and whether rustc is indeed able to take advantage of SSE4.2 at runtime
* check real-world impact of aborting when a panic occurs during drop, it looked promising on syn in amanieu's WIP PR
- data structures (more exploratory with size-open-ended):
* try using intervals/interval set in more areas of the compiler: e.g. in borrowck, compute reachability at the same time as the SCCs à la Nuutila, or where dataflow is super sparse, or where some metadata is super dense (CTFE allocations)
* IntervalSet: check if datafrog's Leapers optimizations can apply to bulk inserting ranges, and caching binary search insertion points when inserting multiple (sorted) intervals. (Or if van Emde Boas and Eytzinger layouts from e.g. Paul Khuong's papers on binary search layouts could be useful here). (Also: SIMD).
* can some point-queries be turned into data-oriented range-queries (and could specialized structures help here, e.g. implicit in-order forests, various segment/fenwick/etc. trees)?
* (are there heavily used sorted structures where tries, adaptive radix trees, etc. could help?)
* (check cranelift's bforests properties?)
* can probabilistic data structures à la bloom/binary fuse filters speed up the fast type rejection in coherence in cases where there cannot be any overlapping impls (or help lower the cost of seemingly quadratic overlap checks)?
* in general, there's often (bigger) objects with disparate data, partially read in different contexts. Splitting some of these entities in hot/cold sets, making them smaller and densely packed together in memory, in addition to following more data-oriented design principles, should help. (this conjecture would need to be validated in benchmarks but it's so common that it is a likely source of cache misses in all hot pieces of code)
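A minimal sketch of the hot/cold split conjectured in the last point (the type and field names are made up, not rustc's actual structures): the same items laid out as one big record versus split into a dense hot array plus a cold side table.

```rust
// Before: one fat record per item. A pass that only reads `flags` still
// strides over the cold fields, dragging them through the cache.
#[allow(dead_code)]
struct ItemFat {
    flags: u8,             // hot: read in every pass
    name: String,          // cold: only read when emitting diagnostics
    span_snippet: String,  // cold
    docs: Option<String>,  // cold
}

// After: the hot scan walks a dense `Vec<u8>`, touching far fewer cache
// lines; cold data lives apart, at the same index.
struct Items {
    flags: Vec<u8>,                              // hot, indexed by item id
    cold: Vec<(String, String, Option<String>)>, // same index, rarely read
}

impl Items {
    // The hot-path scan: only ever touches the packed flags array.
    fn count_flagged(&self, mask: u8) -> usize {
        self.flags.iter().filter(|&&f| f & mask != 0).count()
    }
}

fn main() {
    let items = Items {
        flags: vec![0b01, 0b10, 0b11, 0b00],
        cold: Vec::new(), // left empty in this sketch
    };
    assert_eq!(items.count_flagged(0b01), 2); // items 0 and 2
}
```

The behavior is identical either way; the win, if any, would show up as fewer cache misses in the hot scans, which is why the conjecture needs validating in benchmarks.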
#### 3c. rustc - end of the pipeline
- split debuginfo can be a big time saver in debug builds for big crates; evaluate it (do we need to check and track the performance of thorin?) to have data for stabilization and inclusion in "fast build" profiles. (Could it also help when building rustc itself with debuginfo?) Look for crates with big debuginfo where linking takes a long time, and see how it helps there. (theme-pipelined-compilation)
- try to evaluate and see how to evolve the linker situation, both in detangling rustc from it, and in checking numbers and issues on LLD, and on mold in non-LTO cases (e.g. test it out on all of crates.io). (Also: split dwarf helps reduce the work done by the linker in debug builds.) (theme-pipelined-compilation)
- codegen: can some parallelism be moved up into codegen_ssa (although that could create tricky requirements on the backends, even if cg_clif doesn't really use cg_ssa that much IIRC) or into CGU partitioning?
- LLVM (there were ideas in theme-raw-compilation-speed to track perf on CI with LLVM master, but the breaking changes there would make that hard, although for this cycle it seems we've tracked these way earlier than we used to):
* LLD master is supposedly faster than 13.0.1, so that could be interesting to verify in the LLVM 14 upgrade
* IIRC clang 14 has a recent new LICM optimization hoisting some loads; check whether that's in clang or LLVM, and whether it could be worth testing with the LLVM 14 upgrade as well
* (other fixes to optimizations causing actual issues in rust (e.g. creating strings from constant data) will be present in LLVM 14 as well)
* while we wouldn't expect to be making perf improvements to LLVM itself (besides WG-llvm members, that is), the opposite could be done: finding actual issues, be they related to the NewPM, or the catastrophic inlining issues that still exist, etc.
- cg_clif & cranelift backend: (can range from size-s to size-m, to unfeasible)
* it seems the work turning the backend into a rustup component, making it available to test for more people, is stalled on some windows tests or work
* cranelift also lacks features present in gcc and llvm, like inline assembly, and possibly some unwinding work as well IIRC: how should graceful errors or degradation work here, so that it can land and be iterated upon?
* update the results of the rust benchmark suite for the backend (it'll be good to check whether all old and new benchmarks look similar to the most recently published numbers)
* test it out on all of crates.io, to see if the couple of missing features are in fact seldom depended on (although inline assembly support recently being stabilized will tend to increase that possible pain point)
* check if the lazy JIT code and plans are still good, or what remains to be done and tested here
* check the existing backend's parallelism (in theory cranelift was designed to allow one function to be compiled per thread)
* cranelift is in the middle of an intertwined transition to ISLE and regalloc2, which should improve performance and unlock other opportunities. Helping there could improve the cg_clif backend sooner rather than later.
- PGO (generally size-s; but not generally easy to test?):
* the current crates used for PGO are a subset of the benchmarked crates, with apparently good results on the other benchmarks. But what is the effect on crates.io?
* can this subset be improved for both perf.rlo and crates.io? are there possibly other, more interesting crates, exercising code paths not currently profiled here?
* can we enable it on more targets than only x64 linux?
* BOLT: evaluate the effect it could have on rustc (although it could require intel-only LBR?) and LLVM
* recently published research looked into machine-learned PGO without a profiling step, replacing LLVM's BPI heuristics with offline-trained model results. It could be interesting to try out, e.g. to speed up bootstrap and our CI.
### 4. Cargo and build systems (mostly theme-pipelined-compilation)
- people don't usually use rustc directly, so see if there are important cargo codepaths that could be sped up / parallelized
- look into tuning cargo's build plan / timings to see if the parallelism level can be improved (are there interesting algorithms in taskflow, or elsewhere in the DAG-scheduling literature, that could be of interest here?). There are a few cases where pipelines can be stalled by long tasks, delaying successors; if that were known to be the case, those tasks could be scheduled earlier in the pipeline. Providing scheduling hints, or using previous builds' timings to prioritize how cargo schedules units of work, could offer that (e.g. prioritizing units of work on the critical path). This and alternative scheduling could be evaluated on all of crates.io. There are already issues about this: https://github.com/rust-lang/cargo/issues/7396, https://github.com/rust-lang/cargo/issues/5125, or https://github.com/rust-lang/cargo/issues/7437
- could we map cargo's build graph to a causal profile (Coz) and evaluate virtual speedups per crate over the complete schedule?
- check if cargo's named profiles have all the settings needed for fast build configurations (IIRC they mostly do) to be able to use the cg_clif backend, share_generics, LLD/mold, etc. (maybe setting the linker via cargo still requires rustflags?)
- evaluate possible faster cargo configs on all of crates.io
- evaluate the -Z binary-dep-depinfo flag (and whether it's currently set by tools and the perf collector and so on; I think it is) and whether we can use it by default (there's a compiler-team MCP about it IIRC) to help track transitive dependencies (and unused ones?), e.g. with a better build plan from the point above
- surely a ton more here (including some tasks about the theme-distributed-builds)
- (switching allocators, if cargo doesn't already use e.g. jemalloc, wouldn't have a lot of benefits?)
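The critical-path prioritization idea above can be sketched as follows (a hypothetical representation; cargo's real build plan is of course not a pair of string-keyed maps): compute each unit's "bottom level" -- its own duration plus the longest chain of dependents after it -- and have the scheduler pick ready units by decreasing bottom level.

```rust
use std::collections::HashMap;

/// Bottom level of each unit: its own build time plus the longest chain of
/// dependents after it. Assumes an acyclic graph (a valid build plan).
fn bottom_levels<'a>(
    duration: &HashMap<&'a str, f64>,
    dependents: &HashMap<&'a str, Vec<&'a str>>,
) -> HashMap<&'a str, f64> {
    fn level<'a>(
        unit: &'a str,
        duration: &HashMap<&'a str, f64>,
        dependents: &HashMap<&'a str, Vec<&'a str>>,
        memo: &mut HashMap<&'a str, f64>,
    ) -> f64 {
        if let Some(&v) = memo.get(unit) {
            return v;
        }
        // Longest remaining chain among this unit's dependents (0 for leaves).
        let longest_tail = dependents
            .get(unit)
            .into_iter()
            .flatten()
            .map(|&d| level(d, duration, dependents, memo))
            .fold(0.0, f64::max);
        let v = duration[unit] + longest_tail;
        memo.insert(unit, v);
        v
    }
    let mut memo = HashMap::new();
    for &unit in duration.keys() {
        level(unit, duration, dependents, &mut memo);
    }
    memo
}

fn main() {
    // A tiny graph: syn -> serde_derive -> bin, and small -> bin.
    let duration =
        HashMap::from([("syn", 10.0), ("serde_derive", 5.0), ("small", 1.0), ("bin", 1.0)]);
    let dependents = HashMap::from([
        ("syn", vec!["serde_derive"]),
        ("serde_derive", vec!["bin"]),
        ("small", vec!["bin"]),
    ]);
    let levels = bottom_levels(&duration, &dependents);
    // syn dominates the critical path (10 + 5 + 1 = 16), so start it first
    // even if `small` happens to be ready at the same time.
    assert_eq!(levels["syn"], 16.0);
    assert!(levels["syn"] > levels["small"]);
}
```

Using durations observed in previous builds as the `duration` inputs is one way the "scheduling hints from past timings" idea could be fed into such a heuristic.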
### 5. Tooling (mostly indirect tasks, in scope 2-4)
- check WIP PRs that are still not merged in sccache, and things to do to improve caching there or in general (related to linking), between projects, or with pre-built artifacts (possibly on crates.io, but that is hard) (theme-distributed-builds)
- sometimes things compile so slowly that we have to kill the process, but measureme and the self-profiler don't support that. It would be nice to support it, to at least get a bit of information in those cases (when not recording event keys)
- can we have automation tooling to help bisect compile-time issues?
* there have been (and there likely still are) a bunch of unseen regressions in both rustc and LLVM (projection caching, nested decorator types, NewPM and inlining, lack of vectorization on AVX2 and vectorization regressions). Can they be found, tracked and fixed?
* track the benchmarks that are most affected by some given queries and their variance/history (this instability can sometimes cause confusion about results, in particular recently this has happened on process_obligations).
* land measureme/rustc/perf.rlo infrastructure to track performance counters in the self-profilers instead of just wall time
* record and display hits and misses stats of the query system
* record and display CPU cache misses and branch mispredictions in the perf collector ?
* record and display the amount of IO done, to track and correlate size changes on metadata and incremental compilation with the amount of data read, and changes in performance (esp. on spinning disks)
* record and display syscalls
* better display the benchmarks broken, and fixed, by PRs, rather than just globally en passant on the status page
* make artifact sizes more prominent; track and graph them over time?
* add a toggle to filter out the rustdoc benchmarks on the compare UI
* per-benchmark and per-query graphs on details page, stats (e.g. change point analysis à la Laurie Tratt), more iterations, record more data as byproduct of benchmarking (e.g. cargo timings)
* could better tracking of variance ranges, and change-point analysis, have helped avoid the hashing issue #1126?
* (why is the codegen-schedule not working?)
- perf bot:
* ~~display the profile as well as the benchmark name. Without that, it can be confusing whether benchmarks in the summary are rustdoc benches~~ My PR doing that landed already.
* ~~ability to launch a perf run on a subset of benchmarks (e.g. only for rustdoc)~~ neither guillaume nor I knew that: this already works
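A toy illustration of the change-point analysis mentioned above (a least-squares sketch, not Tratt's actual algorithm): pick the split of a benchmark's wall-time series that minimizes the summed squared deviation of the two halves, i.e. the index where the mean most plausibly shifted.

```rust
/// Single change-point detector: returns the index i (1 <= i < len) that
/// best splits the series into two constant-mean segments. Returns 0 for
/// series too short to split.
fn change_point(series: &[f64]) -> usize {
    // Sum of squared deviations from the segment mean.
    let sse = |xs: &[f64]| -> f64 {
        let mean = xs.iter().sum::<f64>() / xs.len() as f64;
        xs.iter().map(|x| (x - mean).powi(2)).sum()
    };
    (1..series.len())
        .min_by(|&a, &b| {
            let cost = |i: usize| sse(&series[..i]) + sse(&series[i..]);
            cost(a).partial_cmp(&cost(b)).unwrap()
        })
        .unwrap_or(0)
}

fn main() {
    // A benchmark that regressed between run 3 and run 4.
    let wall_times = [10.1, 10.0, 10.2, 10.1, 12.9, 13.0, 13.1];
    assert_eq!(change_point(&wall_times), 4);
}
```

Ranking PRs by how sharply they shift a benchmark's segment means, rather than by a single before/after delta, is the kind of signal that could have flagged a slow-burn regression earlier.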
### 6. Misc (exploratory, size-open-ended)
- MIR-only rlibs for crates.io (faster to download?), and check other recent use-cases where they've been mentioned, even though they didn't seem to be a win a few years ago
- remove unused crates (and duplicates) from rustc and try to find sources of bloat (generally size-s and size-open-ended, and I've already opened a few PRs for that)
- with such a long history of changes and refactors in the crate structure, there's likely project-wide dead code in rustc. An on-demand lint tracking the overall uses of items could be interesting (and incidentally speed up compile times)