# Analysis of `rustc-benchmarking-data`

lqd gathered a lot of data in the [`rustc-benchmarking-data`](https://github.com/lqd/rustc-benchmarking-data) repository. This document is nnethercote's analysis of it (with a few additional comments from others).

It is long, detailed, and quite dry. It is aimed at Rust compiler developers, and not intended for a general audience. It is also not the highest quality prose, in part because it is likely to become out of date in the not too distant future as performance work addresses things this measurement and analysis has identified. See the [roadmap](https://hackmd.io/YJQSj_nLSZWl2sbI84R1qA) for a higher-level view of rustc perf work for 2022.

As well as an analysis, it will serve as a means of tracking who is doing/has done what work. Task assignments are shown in square brackets, e.g. “[name]”.

## round-1-cachegrind-check

**Executive summary**

- [x] `parse_tt` and other functions related to macro parsing are the hottest, and correlate highly with allocations. \[nnethercote, this [blog post](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) has details\]
- [x] `memcpy` is high in functions using BitSets a lot for dataflow analysis, e.g. in `http-0.2.6`. \[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984)\]
- [x] Metadata decoding/file reading is roughly constant for all crates; this can be a moderately high proportion of time for tiny crates. Likewise the LLVM `SetImpliedBits` function getting target feature information.
- [x] There is a long tail of moderate opportunities for wins on a few crates. It is worth looking at each of them briefly; there are probably a few easy wins.

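To make the `memcpy`/BitSet item above concrete, here is a minimal sketch of a dense bitset. It is not rustc's `BitSet` implementation; `DenseBitSet` and `bytes_per_clone` are hypothetical names used only for illustration. The assumption being illustrated is that a dense representation stores one bit per element of the domain, so every clone copies the whole word array no matter how few bits are set, and with large domains that copying would be expected to show up as `memcpy`.

```rust
/// A simplified dense bitset: one bit per element of a fixed-size domain.
/// Hypothetical illustration only; this is NOT rustc's `BitSet` type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DenseBitSet {
    domain_size: usize,
    words: Vec<u64>,
}

impl DenseBitSet {
    /// Creates an empty set able to hold elements `0..domain_size`.
    pub fn new_empty(domain_size: usize) -> Self {
        let num_words = (domain_size + 63) / 64;
        DenseBitSet { domain_size, words: vec![0; num_words] }
    }

    /// Inserts `elem`; returns true if the set changed.
    pub fn insert(&mut self, elem: usize) -> bool {
        assert!(elem < self.domain_size);
        let mask = 1u64 << (elem % 64);
        let old = self.words[elem / 64];
        self.words[elem / 64] = old | mask;
        old & mask == 0
    }

    /// Returns true if `elem` is in the set.
    pub fn contains(&self, elem: usize) -> bool {
        assert!(elem < self.domain_size);
        self.words[elem / 64] & (1u64 << (elem % 64)) != 0
    }

    /// Word-wise union with `other`; returns true if `self` changed.
    /// The cost is proportional to the domain size, not to the number of set bits.
    pub fn union(&mut self, other: &DenseBitSet) -> bool {
        assert_eq!(self.domain_size, other.domain_size);
        let mut changed = false;
        for (a, b) in self.words.iter_mut().zip(&other.words) {
            let new = *a | *b;
            changed |= new != *a;
            *a = new;
        }
        changed
    }

    /// Bytes of word data copied by each `clone()`, however sparse the set is.
    pub fn bytes_per_clone(&self) -> usize {
        self.words.len() * std::mem::size_of::<u64>()
    }
}
```

With a domain of 100,000 elements, each clone of this sketch copies 1,563 words (12,504 bytes), even if only a handful of bits are set.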
### Hot functions in a single crate

- [x] `deunicode-1.3.1` dominated by `core::ascii::escape_default` \[martingms, [#94776](https://github.com/rust-lang/rust/pull/94776)\]
- [x] `tinyvec-1.5.1` dominated by `<rustc_mir_build::build::Builder>::diverge_cleanup` \[nnethercote: not worth fixing within rustc, but there are several possible fixes within `tinyvec` itself. See [#161](https://github.com/Lokathor/tinyvec/issues/161) for details.\]
- [x] `unicode-normalization-0.1.19` dominated by `try_eval_bits` \[nnethercote, [#97936](https://github.com/rust-lang/rust/pull/97936)\]

### Widely used functions

This shows all functions across all benchmarks, weighted by their `Ir` percentage. This demonstrates breadth of usage. I've excluded malloc, memcpy, and dlopen/elf stuff, which made up lots of slots.

This table is hard to read, but metadata decoding dominates because of its effect on small crates. The next section ("Hot functions in multiple crates") breaks hot functions down more and is probably more useful.

``` 33116.2 counts (weighted fractional, erased) ( 8) 913.6 ( 2.8%, 45.9%): compiler/rustc_serialize/src/opaque.rs:<rustc_span::SourceFile as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode ( 9) 802.4 ( 2.4%, 48.4%): library/alloc/src/vec/mod.rs:<rustc_span::SourceFile as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode ( 10) 757.8 ( 2.3%, 50.7%): compiler/rustc_span/src/lib.rs:<rustc_span::SourceFile as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode ( 13) 466.1 ( 1.4%, 56.3%): ???:SetImpliedBits(llvm::FeatureBitset&, llvm::FeatureBitset const&, llvm::ArrayRef<llvm::SubtargetFeatureKV>) ( 14) 406.1 ( 1.2%, 57.5%): hashbrown-0.12.0/src/raw/mod.rs:<hashbrown::map::RawEntryBuilderMut<rustc_middle::ty::context::Interned<rustc_middle::ty::TyS>, (), core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::from_hash::<hashbrown::map::equivalent<rustc_middle::ty::sty::TyKind, rustc_middle::ty::context::Interned<rustc_middle::ty::TyS>>::{closure#0}> ( 15) 367.0 ( 1.1%, 58.6%): compiler/rustc_serialize/src/leb128.rs:<rustc_metadata::rmeta::decoder::DecodeContext as rustc_serialize::serialize::Decoder>::read_u32 ( 17) 342.1 ( 1.0%, 60.7%): library/core/src/slice/iter/macros.rs:<core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}> as core::iter::traits::iterator::Iterator>::fold::<(),core::iter::traits::iterator::Iterator::for_each::call<rustc_metadata::rmeta::decoder::ImportedSourceFile, <alloc::vec::Vec<rustc_metadata::rmeta::decoder::ImportedSourceFile> as alloc::vec::spec_extend::SpecExtend<rustc_metadata::rmeta::decoder::ImportedSourceFile, 
core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}>>>::spec_extend::{closure#0}>::{closure#0}> ( 18) 340.9 ( 1.0%, 61.7%): library/core/src/slice/iter/macros.rs:<rustc_span::source_map::SourceMap>::new_imported_source_file ( 19) 340.0 ( 1.0%, 62.8%): compiler/rustc_span/src/lib.rs:<rustc_span::source_map::SourceMap>::new_imported_source_file ( 21) 243.4 ( 0.7%, 64.4%): compiler/rustc_metadata/src/rmeta/decoder.rs:<core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}> as core::iter::traits::iterator::Iterator>::fold::<(), core::iter::traits::iterator::Iterator::for_each::call<rustc_metadata::rmeta::decoder::ImportedSourceFile, <alloc::vec::Vec<rustc_metadata::rmeta::decoder::ImportedSourceFile> as alloc::vec::spec_extend::SpecExtend<rustc_metadata::rmeta::decoder::ImportedSourceFile, core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}>>>::spec_extend::{closure#0}>::{closure#0}> ( 22) 239.5 ( 0.7%, 65.2%): library/core/src/num/uint_macros.rs:<rustc_data_structures::sip128::SipHasher128>::short_write_process_buffer::<u64> ( 24) 220.0 ( 0.7%, 66.5%): compiler/rustc_span/src/lib.rs:<core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, 
<rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}> as core::iter::traits::iterator::Iterator>::fold::<(), core::iter::traits::iterator::Iterator::for_each::call<rustc_metadata::rmeta::decoder::ImportedSourceFile, <alloc::vec::Vec<rustc_metadata::rmeta::decoder::ImportedSourceFile> as alloc::vec::spec_extend::SpecExtend<rustc_metadata::rmeta::decoder::ImportedSourceFile, core::iter::adapters::map::Map<core::iter::adapters::map::Map<core::ops::range::Range<usize>, <rustc_metadata::rmeta::Lazy<[rustc_span::SourceFile], usize>>::decode<rustc_metadata::creader::CrateMetadataRef>::{closure#0}>, <rustc_metadata::creader::CrateMetadataRef>::imported_source_files::{closure#3}::{closure#0}>>>::spec_extend::{closure#0}>::{closure#0}> ( 25) 214.2 ( 0.6%, 67.1%): compiler/rustc_serialize/src/leb128.rs:<rustc_serialize::opaque::Decoder as rustc_serialize::serialize::Decoder>::read_usize ( 27) 185.4 ( 0.6%, 68.3%): hashbrown-0.12.0/src/map.rs:<hashbrown::map::RawEntryBuilderMut<rustc_middle::ty::context::Interned<rustc_middle::ty::TyS>, (), core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::from_hash::<hashbrown::map::equivalent<rustc_middle::ty::sty::TyKind, rustc_middle::ty::context::Interned<rustc_middle::ty::TyS>>::{closure#0}> ( 28) 183.1 ( 0.6%, 68.9%): library/std/src/sys/unix/alloc.rs:__rdl_alloc ( 29) 182.1 ( 0.5%, 69.4%): compiler/rustc_middle/src/ty/context.rs:<rustc_middle::ty::context::CtxtInterners>::intern_ty ( 30) 181.7 ( 0.5%, 70.0%): compiler/rustc_middle/src/ty/sty.rs:<rustc_middle::ty::sty::TyKind as core::hash::Hash>::hash::<rustc_hash::FxHasher> ``` ### Hot functions in multiple crates This section lists all the functions that hit 1.5% or higher in one benchmark and appear in more than one benchmark. It's a long list. Related functions (i.e. 
functions that are hot in tandem) are grouped together. ---- \[nnethercote, mostly related to macro parsing, greatly improved, see [here](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) for details] ``` _int_free/_int_malloc/malloc/free/malloc_consolidate, etc., as represented by _int_free 315: 6.67% async-std-1.10.0 375: 5.85% yansi-0.5.0 401: 5.60% time-macros-0.2.3 582: 4.50% inotify-0.10.0 591: 4.47% web-sys-0.3.56 667: 4.19% nix-0.23.1 685: 4.13% vsdb-0.13.10 687: 4.11% cloudabi-0.1.0 692: 4.10% vsdb_derive-0.2.2 706: 4.06% pest_generator-2.1.3 726: 3.99% futures-lite-1.12.0 736: 3.95% scroll_derive-0.11.0 739: 3.92% num-derive-0.3.3 744: 3.91% raw-cpuid-10.2.0 751: 3.89% clap_derive-3.0.12 755: 3.89% prost-derive-0.9.0 760: 3.88% tonic-build-0.6.2 763: 3.86% pyo3-macros-backend-0.15.1 764: 3.86% diesel_derives-1.4.1 765: 3.85% wasm-bindgen-backend-0.2.79 ``` This table undersells the cost of allocations a lot, because it's only showing `_int_free` results. But it also oversells a little in a different way, because jemalloc is more efficient than glibc malloc (which is measured here). We can probably assume allocations in general account for double the percentage in this table. See the DHAT results for more data. Note that these crates correlate highly with the crates where `parse_tt` and related functions are hot. ---- \[nnethercote, greatly improved, see [here](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) for details] These are all the macro parsing functions. 
```
macro_parser::parse_tt
359: 5.99% async-std-1.10.0
462: 5.08% async-std-1.10.0
550: 4.63% time-macros-0.2.3
642: 4.28% yansi-0.5.0
879: 3.65% time-macros-0.2.3
1115: 3.32% yansi-0.5.0
2884: 2.29% num-derive-0.3.3
2939: 2.25% pest_generator-2.1.3
3095: 2.12% ctor-0.1.21
3103: 2.11% scroll_derive-0.11.0
3112: 2.10% tonic-build-0.6.2
3167: 2.06% vsdb_derive-0.2.2
3265: 2.00% stdweb-derive-0.5.3
3500: 1.91% mockall_derive-0.11.0
3512: 1.91% wasm-bindgen-backend-0.2.79
3581: 1.89% futures-macro-0.3.19
3620: 1.88% wayland-scanner-0.30.0-alpha3
3623: 1.88% clap_derive-3.0.12
3806: 1.82% prost-derive-0.9.0
3880: 1.80% diesel_derives-1.4.1
```

```
<rustc_parse::parser::Parser>::{bump,bump_with}
603: 4.43% bump time-macros-0.2.3
791: 3.80% bump async-std-1.10.0
2784: 2.36% bump yansi-0.5.0
3240: 2.01% bump_with time-macros-0.2.3
3673: 1.87% bump web-sys-0.3.56
4078: 1.73% bump_with async-std-1.10.0
4228: 1.67% bump num-derive-0.3.3
4549: 1.57% bump futures-macro-0.3.19
4566: 1.56% bump ctor-0.1.21
5171: 1.39% bump mockall_derive-0.11.0
5288: 1.37% bump pest_generator-2.1.3
5343: 1.35% bump wasm-bindgen-backend-0.2.79
5359: 1.35% bump tonic-build-0.6.2
5453: 1.32% bump vsdb_derive-0.2.2
5634: 1.27% bump wayland-scanner-0.30.0-alpha3
5647: 1.27% bump stdweb-derive-0.5.3
5845: 1.22% bump scroll_derive-0.11.0
5872: 1.22% bump enum-as-inner-0.3.3
5918: 1.20% bump clap_derive-3.0.12
5934: 1.20% bump ref-cast-impl-1.0.6
```

```
<rustc_parse::parser::TokenCursor>::{next,next_desugared}
633: 4.30% next time-macros-0.2.3
844: 3.70% next async-std-1.10.0
2770: 2.37% next yansi-0.5.0
3312: 1.98% next web-sys-0.3.56
3504: 1.91% next_desugared time-macros-0.2.3
4214: 1.68% next num-derive-0.3.3
4490: 1.59% next futures-macro-0.3.19
4569: 1.56% next ctor-0.1.21
5071: 1.42% next mockall_derive-0.11.0
5240: 1.38% next pest_generator-2.1.3
5274: 1.37% next wasm-bindgen-backend-0.2.79
5328: 1.35% next tonic-build-0.6.2
5396: 1.33% next vsdb_derive-0.2.2
5576: 1.28% next wayland-scanner-0.30.0-alpha3
5651: 1.27% next stdweb-derive-0.5.3
5714: 1.25% next enum-as-inner-0.3.3
5844: 1.22% next scroll_derive-0.11.0
5874: 1.21% next clap_derive-3.0.12
5888: 1.21% next_desugared async-std-1.10.0
5999: 1.18% next ref-cast-impl-1.0.6
```

```
<rustc_ast::tokenstream::Cursor>::next_with_spacing
1580: 3.03% time-macros-0.2.3
2386: 2.61% async-std-1.10.0
4437: 1.61% yansi-0.5.0
5322: 1.36% web-sys-0.3.56
6027: 1.17% num-derive-0.3.3
6302: 1.10% futures-macro-0.3.19
6351: 1.09% ctor-0.1.21
6867: 0.97% mockall_derive-0.11.0
6870: 0.97% pest_generator-2.1.3
6939: 0.96% tonic-build-0.6.2
7000: 0.95% wasm-bindgen-backend-0.2.79
7118: 0.93% vsdb_derive-0.2.2
7406: 0.89% stdweb-derive-0.5.3
7409: 0.89% wayland-scanner-0.30.0-alpha3
7619: 0.86% scroll_derive-0.11.0
7735: 0.85% ref-cast-impl-1.0.6
7758: 0.85% enum-as-inner-0.3.3
7771: 0.85% clap_derive-3.0.12
7917: 0.82% pyo3-macros-backend-0.15.1
8071: 0.80% prost-derive-0.9.0
```

```
<rustc_expand::mbe::macro_parser::MatcherPos as core::clone::Clone>::clone
<rustc_expand::mbe::macro_parser::MatcherPosHandle as core::clone::Clone>::clone
4659: 1.54% MatcherPos async-std-1.10.0
6090: 1.15% MatcherPos time-macros-0.2.3
6831: 0.98% MatcherPos yansi-0.5.0
14457: 0.39% MatcherPos inotify-0.10.0
18643: 0.31% MatcherPosHandle async-std-1.10.0
23304: 0.26% MatcherPos funty-2.0.0
24134: 0.25% MatcherPosHandle time-macros-0.2.3
26172: 0.24% MatcherPos rustfix-0.6.0
29551: 0.22% MatcherPos async-std-1.10.0
33695: 0.19% MatcherPosHandle yansi-0.5.0
```

----

\[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984), completed, addresses the biggest of these: keccak, http, vte\]

```
memcpy (also: 9.19% for keccak in rustc-perf)
274: 7.35% http-0.2.6
2466: 2.56% vte-0.10.1
2721: 2.40% js-sys-0.3.56
2801: 2.35% unic-ucd-segment-0.9.0
2940: 2.25% aes-gcm-0.9.4
2953: 2.24% pbkdf2-0.10.0
2976: 2.21% stdweb-derive-0.5.3
2999: 2.20% c2-chacha-0.3.3
3047: 2.15% pest_generator-2.1.3
3072: 2.13% rls-data-0.19.1
3073: 2.13% tonic-build-0.6.2
3076: 2.13% num-derive-0.3.3
3101: 2.11% ctor-0.1.21
3115: 2.10% sentry-types-0.24.2
3116: 2.10% pest-2.1.3
3117: 2.10% lsp-types-0.91.1
3122: 2.10% mockall_derive-0.11.0
3124: 2.09% wasm-bindgen-backend-0.2.79
3128: 2.09% postgres-protocol-0.6.3
3136: 2.09% cargo_metadata-0.14.1
```

The high numbers for keccak and http-0.2.6 are due to large bitsets in borrowck dataflow analysis. Note that keccak-0.1.0 has some significant changes vs. keccak in rustc-benchmarks.

----

[Hard to improve. On x86-64 we query ~50 target feature flags, for things like SSE*, AVX*, etc. This is within `target_features` in `compiler/rustc_codegen_llvm/src/llvm_util.rs`. We check one flag at a time because the LLVM interface makes it hard to do otherwise, and LLVM is moderately slow to check each one. Even though it's a significant fraction of execution time for small programs, the absolute time is low, so it doesn't seem worth any further effort.]

```
???:SetImpliedBits(llvm::FeatureBitset&, llvm::FeatureBitset const&, llvm::ArrayRef<llvm::SubtargetFeatureKV>)
1425: 3.11% opaque-debug-0.3.0
1436: 3.10% new_debug_unreachable-1.0.4
1443: 3.10% tinyvec_macros-0.1.0
1570: 3.03% matches-0.1.9
1692: 2.97% cfg-if-1.0.0
1699: 2.97% pin-utils-0.1.0
1733: 2.95% match_cfg-0.1.0
1775: 2.93% fuchsia-cprng-0.1.1
1918: 2.85% cty-0.2.2
1923: 2.85% unic-ucd-version-0.9.0
1999: 2.82% if_chain-1.0.2
2114: 2.76% assert_matches-1.5.0
2144: 2.74% more-asserts-0.2.2
2206: 2.71% wincolor-1.0.3
2217: 2.71% winapi-util-0.1.5
2219: 2.71% fsevent-sys-4.1.0
2256: 2.68% miow-0.4.0
2292: 2.66% cpufeatures-0.2.1
2309: 2.65% schannel-0.1.19
2354: 2.63% byte-tools-0.3.1
```

This is significant only for very small crates. It's getting some target feature information from LLVM.

---- \[nnethercote, [#97575](https://github.com/rust-lang/rust/pull/97575) fixes it] ``` <rustc_span::SourceFile as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode 404: 5.57% (2793830 Ir) fsevent-sys-4.1.0 406: 5.56% (2793830 Ir) winapi-util-0.1.5 412: 5.55% (2793830 Ir) wincolor-1.0.3 420: 5.49% (2793830 Ir) miow-0.4.0 426: 5.44% (2793830 Ir) schannel-0.1.19 444: 5.22% (2793830 Ir) output_vt100-0.1.2 446: 5.22% (2793830 Ir) precomputed-hash-0.1.1 448: 5.19% (2793830 Ir) typeable-0.1.2 453: 5.12% (2793830 Ir) encoding_index_tests-0.1.4 464: 5.06% (2929437 Ir) crossbeam-0.8.1 471: 5.03% (2793830 Ir) fsevent-2.1.2 476: 5.01% (2843100 Ir) enum_primitive-0.1.1 481: 4.97% (2793830 Ir) block-cipher-0.99.99 482: 4.97% (2793830 Ir) stream-cipher-0.99.99 484: 4.95% (2793830 Ir) string_cache_shared-0.3.0 489: 4.93% (2475668 Ir) winapi-util-0.1.5 490: 4.93% (2475668 Ir) fsevent-sys-4.1.0 493: 4.92% (2475668 Ir) wincolor-1.0.3 496: 4.90% (2793830 Ir) maplit-1.0.2 498: 4.89% (2793830 Ir) mac-0.1.1 ``` ``` ...::imported_source_files::... 
3064: 2.14% (1071721 Ir) fsevent-sys-4.1.0 3075: 2.13% (1071721 Ir) winapi-util-0.1.5 3077: 2.13% (1071721 Ir) wincolor-1.0.3 3098: 2.11% (1071721 Ir) miow-0.4.0 3131: 2.09% (1071721 Ir) schannel-0.1.19 3248: 2.00% (1071721 Ir) output_vt100-0.1.2 3256: 2.00% (1071721 Ir) precomputed-hash-0.1.1 3289: 1.99% (1071721 Ir) typeable-0.1.2 3350: 1.96% (1071721 Ir) encoding_index_tests-0.1.4 3369: 1.95% (1130584 Ir) crossbeam-0.8.1 3430: 1.93% (1093218 Ir) enum_primitive-0.1.1 3446: 1.93% (1071721 Ir) fsevent-2.1.2 3498: 1.91% (1071721 Ir) stream-cipher-0.99.99 3501: 1.91% (1071721 Ir) block-cipher-0.99.99 3547: 1.90% (1071721 Ir) string_cache_shared-0.3.0 3631: 1.88% (1071721 Ir) maplit-1.0.2 3664: 1.87% (1071721 Ir) mac-0.1.1 3746: 1.84% (1071721 Ir) wayland-protocols-0.30.0-alpha3 3834: 1.82% (1059387 Ir) num-0.4.0 3882: 1.80% (1071721 Ir) mio-named-pipes-0.1.7 ``` ``` <rustc_span::source_map::SourceMap>::new_imported_source_file 3074: 2.13% (1067761 Ir) winapi-util-0.1.5 3078: 2.13% (1067761 Ir) fsevent-sys-4.1.0 3089: 2.12% (1067761 Ir) wincolor-1.0.3 3090: 2.12% (1064836 Ir) fsevent-sys-4.1.0 3094: 2.12% (1064836 Ir) winapi-util-0.1.5 3097: 2.12% (1064836 Ir) wincolor-1.0.3 3114: 2.10% (1067761 Ir) miow-0.4.0 3130: 2.09% (1064836 Ir) miow-0.4.0 3148: 2.08% (1067761 Ir) schannel-0.1.19 3153: 2.07% (1064836 Ir) schannel-0.1.19 3271: 2.00% (1067761 Ir) precomputed-hash-0.1.1 3275: 1.99% (1064836 Ir) precomputed-hash-0.1.1 3288: 1.99% (1064836 Ir) output_vt100-0.1.2 3292: 1.99% (1067761 Ir) output_vt100-0.1.2 3300: 1.98% (1064836 Ir) typeable-0.1.2 3322: 1.98% (1067761 Ir) typeable-0.1.2 3357: 1.96% (1067761 Ir) encoding_index_tests-0.1.4 3389: 1.95% (1064836 Ir) encoding_index_tests-0.1.4 3425: 1.94% (1123069 Ir) crossbeam-0.8.1 3426: 1.94% (1126309 Ir) crossbeam-0.8.1 ``` ``` <rustc_metadata::rmeta::decoder::DecodeContext as rustc_serialize::serialize::Decoder>::read_u32 4352: 1.63% pretty_env_logger-0.4.0 6813: 0.98% rand_os-0.2.2 6968: 0.96% async-compression-0.3.12 
6969: 0.96% thread-id-4.0.0 7084: 0.94% strum-0.23.0 7124: 0.93% tokio-buf-0.2.0-alpha.1 7210: 0.92% matchers-0.1.0 7228: 0.92% void-1.0.2 7344: 0.90% crypto-hash-0.3.4 7352: 0.90% gethostname-0.2.2 7391: 0.90% errno-0.2.8 7521: 0.88% inotify-sys-0.1.5 7579: 0.87% atomic-waker-1.0.0 7606: 0.86% malloc_buf-1.0.0 7670: 0.86% terminal_size-0.1.17 7679: 0.86% remove_dir_all-0.7.0 7759: 0.85% hostname-0.3.1 7865: 0.83% ident_case-1.0.1 7906: 0.82% atty-0.2.14 7944: 0.82% clicolors-control-1.0.1 ``` Metadata decoding. High relative number for many, but mostly on very short-running crates, with constant amounts of decoding, presumably for decoding common libs like `std`, `core`. ---- \[nnethercote, [#94316](https://github.com/rust-lang/rust/pull/94316 )] ``` rustc_lexer::unescape::scan_escape 1355: 3.15% pkcs8-0.8.0 4650: 1.54% der-0.6.0-pre.0 5519: 1.30% bitvec-1.0.0 6190: 1.12% snafu-0.7.0 7261: 0.91% unicode_categories-0.1.1 7732: 0.85% web-sys-0.3.56 10163: 0.58% elliptic-curve-0.12.0-pre.1 12835: 0.44% rusoto_s3-0.47.0 13723: 0.41% pkcs8-0.8.0 14869: 0.38% bumpalo-3.9.1 ``` ``` rustc_lexer::unescape::unescape_literal::<<rustc_ast::ast::LitKind>::from_lit_token and friends 2843: 2.32% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0 6105: 1.15% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> der-0.6.0-pre.0 7234: 0.91% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> bitvec-1.0.0 7904: 0.82% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0 8997: 0.69% <rustc_ast::ast::Lit>::from_lit_token lexical-6.0.1 9719: 0.62% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> snafu-0.7.0 9847: 0.61% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0 11220: 0.52% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0 11318: 0.51% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> pkcs8-0.8.0 13177: 0.43% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> elliptic-curve-0.12.0-pre.1 
14008: 0.40% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> der-0.6.0-pre.0 17420: 0.33% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> rusoto_s3-0.47.0 18179: 0.32% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> bitvec-1.0.0 18485: 0.31% <rustc_ast::ast::Lit>::from_lit_token lexical-core-0.8.2 19087: 0.30% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> der-0.6.0-pre.0 19243: 0.30% <rustc_ast::ast::Lit>::from_lit_token lexical-6.0.1 19244: 0.30% <rustc_ast::ast::Lit>::from_lit_token lexical-6.0.1 23059: 0.26% <rustc_ast::ast::Lit>::from_lit_token web-sys-0.3.56 23736: 0.26% <rustc_ast::ast::LitKind>::from_lit_token::{closure#2}> der-0.6.0-pre.0 23924: 0.26% <rustc_ast::ast::Lit>::from_lit_token async-compression-0.3.12 ``` ---- [nnethercote, [#98153](https://github.com/rust-lang/rust/pull/98153)] ``` <rustc_lint::builtin::MissingDoc as rustc_lint::passes::LateLintPass>::enter_lint_attrs 1631: 3.00% structopt-0.3.26 3259: 2.00% structopt-0.3.26 4776: 1.50% mockall-0.11.0 6739: 1.00% mockall-0.11.0 9250: 0.67% derive_builder-0.10.2 10079: 0.59% tracing-0.1.29 12858: 0.44% derive_builder-0.10.2 14079: 0.40% tracing-0.1.29 16334: 0.35% pest_derive-2.1.0 16479: 0.34% bitflags-1.3.2 ``` ---- [lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345)] ``` super_relate_consts::<rustc_infer::infer::equate::Equate> super_relate_consts::<rustc_infer::infer::combine::ConstInferUnifier> 1294: 3.19% equate::Equate> bitmaps-3.1.0 6584: 1.03% equate::Equate> hex-0.4.3 6740: 1.00% equate::Equate> bitmaps-3.1.0 8574: 0.74% combine::ConstInferUnifier> nalgebra-0.30.1 11867: 0.49% equate::Equate> secrecy-0.8.0 15796: 0.36% equate::Equate> nalgebra-0.30.1 18242: 0.32% equate::Equate> hex-0.4.3 27985: 0.23% combine::ConstInferUnifier> nalgebra-0.30.1 28678: 0.22% equate::Equate> bytemuck-1.7.3 38119: 0.17% equate::Equate> bytestring-1.0.0 ``` ``` super_relate_tys::<rustc_infer::infer::equate::Equate> 
super_relate_tys::<rustc_infer::infer::combine::Generalizer> 4144: 1.71% equate::Equate bitmaps-3.1.0 7745: 0.85% combine::Generalizer pbkdf2-0.10.0 9182: 0.67% equate::Equate hex-0.4.3 9981: 0.60% equate::Equate nalgebra-0.30.1 11379: 0.51% combine::Generalizer aes-gcm-0.9.4 15196: 0.37% combine::Generalizer quickcheck-1.0.3 15212: 0.37% combine::Generalizer sha3-0.10.0 17959: 0.32% equate::Equate secrecy-0.8.0 18288: 0.31% equate::Equate vsdb-0.13.10 18587: 0.31% combine::Generalizer jsonrpc-client-transports-18.0.0 ``` ---- [This code was heavily optimised a couple of years ago for rustc-perf benchmarks like `keccak` and `inflate`, and further improvements are difficult. [#97674](https://github.com/rust-lang/rust/pull/97674) has some small improvements.] ``` process_obligations 2037: 2.80% wast-39.0.0 2614: 2.47% wast-39.0.0 2926: 2.26% wast-39.0.0 3012: 2.18% rustc-serialize-0.3.24 3017: 2.18% wasmparser-0.82.0 3854: 1.81% rustc-serialize-0.3.24 4291: 1.65% rustc-serialize-0.3.24 4434: 1.61% wasmparser-0.82.0 4457: 1.60% wast-39.0.0 4482: 1.59% wast-39.0.0 4726: 1.51% wast-39.0.0 4856: 1.48% inflate-0.4.5 4971: 1.45% mime-0.3.16 5144: 1.40% wasmparser-0.82.0 5555: 1.29% mime-0.3.16 5684: 1.26% wast-39.0.0 5924: 1.20% rustc-serialize-0.3.24 6093: 1.15% wast-39.0.0 6141: 1.14% wasmparser-0.82.0 6188: 1.13% inflate-0.4.5 6250: 1.11% rustc-serialize-0.3.24 6284: 1.10% primitive-types-0.10.1 6328: 1.09% rustc-serialize-0.3.24 6343: 1.09% inflate-0.4.5 6357: 1.09% wast-39.0.0 7032: 0.95% keccak-0.1.0 7038: 0.95% primitive-types-0.10.1 7159: 0.93% keccak-0.1.0 7166: 0.93% wasmparser-0.82.0 7168: 0.93% wasmparser-0.82.0 ``` ``` uninlined_get_root_key 4727: 1.51% (211196081 Ir) wast-39.0.0 5969: 1.19% (5880320 Ir) mime-0.3.16 6988: 0.95% (79707031 Ir) redis-0.21.5 10444: 0.57% (34953834 Ir) rustc-serialize-0.3.24 14051: 0.40% (2061753 Ir) keccak-0.1.0 14277: 0.39% (439948359 Ir) nalgebra-0.30.1 16888: 0.34% (22681340 Ir) http-0.2.6 17441: 0.33% (12036120 Ir) vte-0.10.1 
18441: 0.31% (27171051 Ir) procfs-0.12.0 23024: 0.26% (6715340 Ir) rand-0.8.4 ``` A few crates over-represented: `wast-39.0.0`, `rustc-serialize-0.3.24`, `wasmparser-0.82.0`, `inflate-0.4.5`. ---- [This is caused by lots of type folding and interning, very hard to improve.] ``` hashbrown...::from_hash:: 2898: 2.28% cexpr-0.6.0 3311: 1.98% combine-4.6.3 3877: 1.80% diesel-1.4.8 4972: 1.45% pest_meta-2.1.3 5334: 1.35% pbkdf2-0.10.0 5413: 1.33% der-parser-6.0.1 5442: 1.32% arbitrary-1.0.3 5701: 1.26% redis-0.21.5 5772: 1.24% actix-web-4.0.0-beta.21 6016: 1.18% bitvec-1.0.0 6029: 1.17% cookie_store-0.15.1 6044: 1.17% quickcheck-1.0.3 6088: 1.15% tera-1.15.0 6097: 1.15% elliptic-curve-0.12.0-pre.1 6162: 1.13% aes-gcm-0.9.4 6183: 1.13% clap-3.0.13 6200: 1.12% actix-http-3.0.0-beta.19 6379: 1.08% jsonrpc-client-transports-18.0.0 6412: 1.07% convert_case-0.5.0 6430: 1.07% cexpr-0.6.0 ``` ---- \[nnethercote, [#96210](https://github.com/rust-lang/rust/pull/96210) + [#96683](https://github.com/rust-lang/rust/pull/96683 )] ``` <rustc_parse::lexer::StringReader>::next_token 2433: 2.58% web-sys-0.3.56 5945: 1.19% bitflags-1.3.2 8053: 0.81% unicode_categories-0.1.1 8215: 0.78% quick-error-2.0.1 8584: 0.74% pin-project-lite-0.2.8 9592: 0.63% pest-2.1.3 10257: 0.58% mio-named-pipes-0.1.7 10363: 0.57% fixed-hash-0.7.0 10450: 0.57% web-sys-0.3.56 10505: 0.56% tracing-0.1.29 10606: 0.56% uint-0.9.2 10684: 0.55% downcast-rs-1.2.0 11257: 0.52% arrayref-0.3.6 11538: 0.50% web-sys-0.3.56 11961: 0.48% idna-0.2.3 12249: 0.47% jni-sys-0.3.0 12330: 0.46% static_assertions-1.1.0 12674: 0.45% assert_matches-1.5.0 12823: 0.44% parking_lot-0.12.0 13230: 0.43% web-sys-0.3.56 ``` ``` <rustc_parse::lexer::tokentrees::TokenTreesReader>::parse_token_tree 4048: 1.74% web-sys-0.3.56 7244: 0.91% bitflags-1.3.2 9726: 0.62% quick-error-2.0.1 9990: 0.60% pin-project-lite-0.2.8 11603: 0.50% unicode_categories-0.1.1 12702: 0.45% pest-2.1.3 12806: 0.44% fixed-hash-0.7.0 12822: 0.44% mio-named-pipes-0.1.7 
12999: 0.44% tracing-0.1.29
13026: 0.43% downcast-rs-1.2.0
```

```
<rustc_lexer::cursor::Cursor>::advance_token
4211: 1.68% web-sys-0.3.56
7009: 0.95% bitflags-1.3.2
9521: 0.64% pin-project-lite-0.2.8
9932: 0.60% quick-error-2.0.1
11691: 0.49% unicode_categories-0.1.1
12004: 0.48% pest-2.1.3
12528: 0.45% mio-named-pipes-0.1.7
12625: 0.45% static_assertions-1.1.0
12643: 0.45% tracing-0.1.29
13025: 0.43% downcast-rs-1.2.0
```

----

\[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984)\]
```
BitSet<...>::union
2929: 2.26% http-0.2.6
4775: 1.50% vte-0.10.1
6620: 1.02% language-tags-0.3.2
8138: 0.79% vte-0.10.1
10793: 0.54% tinyvec-1.5.1
11031: 0.53% stdweb-derive-0.5.3
11483: 0.50% language-tags-0.3.2
12603: 0.45% keccak-0.1.0
13153: 0.43% wasmparser-0.82.0
15617: 0.36% futures-macro-0.3.19
16358: 0.35% regalloc-0.0.34
16887: 0.34% http-0.2.6
17406: 0.33% json-0.12.4
21027: 0.28% inflate-0.4.5
21621: 0.28% num-derive-0.3.3
22142: 0.27% cranelift-codegen-meta-0.80.0
26357: 0.24% wasm-bindgen-backend-0.2.79
26554: 0.24% vte-0.10.1
27012: 0.23% enumset_derive-0.5.5
29555: 0.22% mockall_derive-0.11.0
```

----

[lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345)]
```
<rustc_trait_selection::traits::select::SelectionContext>::match_impl
3016: 2.18% match_impl bitmaps-3.1.0
6666: 1.01% match_impl nalgebra-0.30.1
7145: 0.93% match_impl bitmaps-3.1.0
7985: 0.81% match_impl hex-0.4.3
12231: 0.47% match_impl scroll-0.11.0
12901: 0.44% match_impl bitmaps-3.1.0
12903: 0.44% match_impl bitmaps-3.1.0
13382: 0.42% match_impl bitmaps-3.1.0
13727: 0.41% match_impl ordered-float-2.10.0
14243: 0.40% match_impl nalgebra-0.30.1
14451: 0.39% match_impl bytestring-1.0.0
14817: 0.38% match_impl::{closure#0}> bitmaps-3.1.0
14829: 0.38% match_impl bitmaps-3.1.0
14956: 0.38% match_impl lzw-0.10.0
15418: 0.36% match_impl strsim-0.10.0
15450: 0.36% match_impl::{closure#0}> bitmaps-3.1.0
16229: 0.35% match_impl num-complex-0.4.0
16832: 0.34% match_impl hex-0.4.3
17045: 0.33% match_impl aes-gcm-0.9.4
17628: 0.32% match_impl subtle-2.4.1
```

----

[lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345)]
```
fast_reject::simplify_type
4642: 1.54% bitmaps-3.1.0
10384: 0.57% nalgebra-0.30.1
11646: 0.50% num-complex-0.4.0
11790: 0.49% bytestring-1.0.0
11832: 0.49% ordered-float-2.10.0
13444: 0.42% hex-0.4.3
14691: 0.38% scroll-0.11.0
14982: 0.38% bigdecimal-0.3.0
18130: 0.32% lzw-0.10.0
19718: 0.30% aes-gcm-0.9.4
```

----

[lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345)]
```
<rustc_infer::infer::InferCtxtInner>::rollback_to
4787: 1.50% (150578940 Ir) bitmaps-3.1.0
7036: 0.95% (95183010 Ir) bitmaps-3.1.0
8789: 0.72% (2176185 Ir) secrecy-0.8.0
10271: 0.58% (41184070 Ir) rustc-rayon-0.3.2
10343: 0.57% (5906612 Ir) hex-0.4.3
10497: 0.56% (4717135 Ir) scroll-0.11.0
10703: 0.55% (628248294 Ir) nalgebra-0.30.1
11310: 0.51% (29431780 Ir) serde_with-1.11.0
11410: 0.51% (174009408 Ir) diesel-1.4.8
11837: 0.49% (5214125 Ir) ordered-float-2.10.0
12520: 0.45% (1460629 Ir) strsim-0.10.0
13314: 0.42% (13746895 Ir) funty-2.0.0
13350: 0.42% (1903593 Ir) aes-gcm-0.9.4
13428: 0.42% (8132966 Ir) num-complex-0.4.0
13740: 0.41% (15534789 Ir) arbitrary-1.0.3
13947: 0.40% (1519390 Ir) bytemuck-1.7.3
14557: 0.39% (1717338 Ir) pbkdf2-0.10.0
14940: 0.38% (8150921 Ir) parity-scale-codec-2.3.1
15389: 0.37% (37069588 Ir) bitmaps-3.1.0
15499: 0.36% (2017806 Ir) smallvec-1.8.0
```

## round-2-llvm-lines-leaf-crate

**Executive summary**
- Very little room for improvement here.

----

The top functions in `std`, `alloc` and `core`, as weighted by "Lines" counts. (The percentages here are more useful as a relative measure than an absolute measure.)
```
13677742 counts (weighted integral, erased)
( 1) 269227 ( 2.0%, 2.0%): <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
( 2) 255665 ( 1.9%, 3.8%): alloc::raw_vec::RawVec<T,A>::grow_amortized
( 3) 228899 ( 1.7%, 5.5%): core::option::Option<T>::map
( 4) 158437 ( 1.2%, 6.7%): alloc::alloc::box_free
( 5) 154300 ( 1.1%, 7.8%): alloc::raw_vec::RawVec<T,A>::allocate_in
( 6) 151742 ( 1.1%, 8.9%): core::iter::traits::iterator::Iterator::try_fold
( 7) 140484 ( 1.0%, 9.9%): alloc::raw_vec::RawVec<T,A>::current_memory
( 8) 136639 ( 1.0%, 10.9%): core::iter::traits::iterator::Iterator::fold
( 9) 136108 ( 1.0%, 11.9%): core::result::Result<T,E>::map_err
( 10) 135024 ( 1.0%, 12.9%): core::mem::replace
( 11) 128457 ( 0.9%, 13.9%): <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
( 12) 114300 ( 0.8%, 14.7%): <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
( 13) 104392 ( 0.8%, 15.5%): core::ptr::read
( 14) 97865 ( 0.7%, 16.2%): <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
( 15) 94520 ( 0.7%, 16.9%): core::alloc::layout::Layout::array
( 16) 83034 ( 0.6%, 17.5%): core::slice::iter::Iter<T>::post_inc_start
( 17) 82289 ( 0.6%, 18.1%): core::ops::function::FnOnce::call_once
( 18) 78632 ( 0.6%, 18.6%): core::iter::adapters::map::map_fold::{{closure}}
( 19) 78011 ( 0.6%, 19.2%): core::result::Result<T,E>::map
( 20) 75512 ( 0.6%, 19.8%): core::slice::iter::Iter<T>::new
( 21) 73783 ( 0.5%, 20.3%): <&T as core::fmt::Debug>::fmt
( 22) 71914 ( 0.5%, 20.8%): <alloc::raw_vec::RawVec<T,A> as core::ops::drop::Drop>::drop
( 23) 70031 ( 0.5%, 21.3%): <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::next
( 24) 69193 ( 0.5%, 21.8%): core::ptr::metadata::from_raw_parts_mut
( 25) 69071 ( 0.5%, 22.4%): <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
( 26) 67503 ( 0.5%, 22.8%): alloc::vec::Vec<T,A>::push
( 27) 64384 ( 0.5%, 23.3%): core::fmt::ArgumentV1::new
( 28) 60300 ( 0.4%, 23.8%): core::mem::maybe_uninit::MaybeUninit<T>::assume_init
( 29) 55834 ( 0.4%, 24.2%): core::char::methods::encode_utf8_raw
( 30) 55700 ( 0.4%, 24.6%): alloc::vec::Vec<T,A>::extend_desugared
```

`grow_amortized` has been heavily optimized in the past, and the other top functions are generally very small and hard to improve upon.

One possibility: `map_fold`. Just inline and remove it? Most affected: actix-router, quote, diesel_derives, bytecount. \[nnethercote, [#94442](https://github.com/rust-lang/rust/pull/94442), didn't help\]

## round-3-dhat

**Executive summary**
- [x] `parse_tt` and related macro parsing functions cause by far the most allocations, and correlate highly with Cachegrind results. [nnethercote, this [blog post](https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html) has details]
- [x] Large BitSets are the next best opportunity, featuring in several crates. \[nnethercote, [#93984](https://github.com/rust-lang/rust/pull/93984)\]
- [x] After that, a handful of areas that would help one or two crates and might be worth some effort to look for easy wins, e.g. `match_impl`, `super_relate_tys`, ena snapshot vecs, `escape_default`, `ModChild`, thir `mirror_expr_inner`, etc. [lcnr + nnethercote, [#97345](https://github.com/rust-lang/rust/pull/97345), deals with `match_impl`/`super_relate_tys`; nnethercote [#98569](https://github.com/rust-lang/rust/pull/98569), deals with `ModChild`; not much other scope for easy improvements]

----

Top 20 malloc users, from Cachegrind, and biggest source of allocations as determined by looking at the DHAT profiles.
```
315: 6.67% async-std-1.10.0 macro parsing
375: 5.85% yansi-0.5.0 macro parsing
401: 5.60% time-macros-0.2.3 macro parsing
582: 4.50% inotify-0.10.0 macro parsing
591: 4.47% web-sys-0.3.56 other parsing/AST stuff
667: 4.19% nix-0.23.1 macro parsing
685: 4.13% vsdb-0.13.10 super_relate_tys
687: 4.11% cloudabi-0.1.0 very spread out, no esp. hot places
692: 4.10% vsdb_derive-0.2.2 macro parsing
706: 4.06% pest_generator-2.1.3 macro parsing
726: 3.99% futures-lite-1.12.0 macro parsing (a little), spread out
736: 3.95% scroll_derive-0.11.0 macro parsing
739: 3.92% num-derive-0.3.3 macro parsing
744: 3.91% raw-cpuid-10.2.0 very spread out
751: 3.89% clap_derive-3.0.12 macro parsing
755: 3.89% prost-derive-0.9.0 macro parsing
760: 3.88% tonic-build-0.6.2 macro parsing
763: 3.86% pyo3-macros-backend-0.15.1 macro parsing
764: 3.86% diesel_derives-1.4.1 macro parsing
765: 3.85% wasm-bindgen-backend-0.2.79 macro parsing
```

----

Hottest program points (PPs) by allocation rate (blocks). This isn't a perfect metric because sometimes multiple distinct PPs are best considered in combination, which requires human understanding of the stack traces. But it's a good start, and while the Cachegrind numbers can be high if there's lots of allocations spread across lots of places, single PPs that allocate a lot are more likely to be optimizable.

Ones marked with `**` are not in the top 20 Cachegrind list above.
```
251.3 / Minstr (5,362,131 blocks) / async-std-1.10.0
251.0 / Minstr (5,355,825 blocks) / async-std-1.10.0
212.0 / Minstr (2,110,433 blocks) / bitmaps-3.1.0 ** match_impl
176.2 / Minstr (3,759,571 blocks) / async-std-1.10.0
176.1 / Minstr (654,394 blocks) / time-macros-0.2.3
176.0 / Minstr (654,019 blocks) / time-macros-0.2.3
175.7 / Minstr (653,182 blocks) / time-macros-0.2.3
175.7 / Minstr (3,750,274 blocks) / async-std-1.10.0
173.8 / Minstr (646,121 blocks) / time-macros-0.2.3
154.2 / Minstr (158,177 blocks) / yansi-0.5.0
150.7 / Minstr (154,606 blocks) / yansi-0.5.0
98.5 / Minstr (280,913 blocks) / num-derive-0.3.3
97.5 / Minstr (164,019 blocks) / vsdb_derive-0.2.2
97.0 / Minstr (190,944 blocks) / pest_generator-2.1.3
90.7 / Minstr (225,904 blocks) / tonic-build-0.6.2
90.7 / Minstr (108,378 blocks) / scroll_derive-0.11.0
90.3 / Minstr (78,899 blocks) / ctor-0.1.21
87.3 / Minstr (89,576 blocks) / yansi-0.5.0
86.7 / Minstr (340,107 blocks) / clap_derive-3.0.12
86.1 / Minstr (88,312 blocks) / yansi-0.5.0
85.9 / Minstr (75,505 blocks) / stdweb-derive-0.5.3 ** macro parsing
81.7 / Minstr (312,670 blocks) / wasm-bindgen-backend-0.2.79
81.3 / Minstr (60,048 blocks) / enumflags2_derive-0.7.3 ** macro parsing
81.3 / Minstr (595,413 blocks) / mockall_derive-0.11.0 ** macro parsing
80.7 / Minstr (284,977 blocks) / wayland-scanner-0.30.0-alpha3 ** macro parsing
79.3 / Minstr (133,508 blocks) / futures-macro-0.3.19
79.0 / Minstr (257,950 blocks) / prost-derive-0.9.0
77.2 / Minstr (185,937 blocks) / diesel_derives-1.4.1
77.2 / Minstr (178,079 blocks) / structopt-derive-0.4.18 ** macro parsing
77.1 / Minstr (77,488 blocks) / hex-0.4.3 ** match_impl
```

Macro parsing dominates here, again. `match_impl` also shows up.

----

Hottest program points (PPs) by allocation rate (bytes). Excludes tiny crates dominated by metadata decoding, which all have a `bytes` value in the range 0.9-2.0MB, mostly around 1.4MB.

The rightmost column indicates the hot allocation causes. Ones marked with `**` are not in the top 20 Cachegrind list above.
```
33,740.63 / Minstr (720,052,608 bytes) / async-std-1.10.0
33,375.40 / Minstr (124,055,232 bytes) / time-macros-0.2.3
28,855.35 / Minstr (8,407,296 bytes) / secrecy-0.8.0 ** ena snapshot vecs
28,620.20 / Minstr (12,557,016 bytes) / aes-gcm-0.9.4 ** pred oblig. hashmaps
27,516.50 / Minstr (11,782,896 bytes) / pbkdf2-0.10.0 ** vtbl_impl
27,398.95 / Minstr (8,388,592 bytes) / deunicode-1.3.1 ** escape_default
24,645.81 / Minstr (3,985,120 bytes) / web-sys-0.3.56 parsing/AST stuff
22,248.21 / Minstr (9,761,328 bytes) / aes-gcm-0.9.4
22,085.03 / Minstr (471,312,600 bytes) / async-std-1.10.0
21,752.66 / Minstr (9,314,748 bytes) / pbkdf2-0.10.0 ** pred obligs
18,237.79 / Minstr (15,495,584 bytes) / unicode_categories-0.1.1 ** thir mirror_expr_inner
17,878.73 / Minstr (85,887,776 bytes) / pest-2.1.3 ** thir mirror_expr_inner
16,525.18 / Minstr (16,955,904 bytes) / yansi-0.5.0
16,447.43 / Minstr (5,035,623 bytes) / deunicode-1.3.1 escape_default
16,080.75 / Minstr (343,176,384 bytes) / async-std-1.10.0
15,484.04 / Minstr (57,553,672 bytes) / time-macros-0.2.3
14,426.90 / Minstr (307,881,792 bytes) / async-std-1.10.0
13,991.95 / Minstr (4,283,840 bytes) / deunicode-1.3.1 escape_default
13,621.47 / Minstr (50,101,664 bytes) / vte-0.10.1 BitSets
13,568.19 / Minstr (135,067,712 bytes) / bitmaps-3.1.0 ** match_impl
13,259.72 / Minstr (13,605,328 bytes) / yansi-0.5.0
13,140.27 / Minstr (79,387,384 bytes) / http-0.2.6 BitSets
13,140.27 / Minstr (79,387,384 bytes) / http-0.2.6 BitSets
12,873.49 / Minstr (6,820,112 bytes) / c2-chacha-0.3.3 ModChild
```

A much wider range of results here.

----

Hottest program points (PPs) by peak memory usage. The rightmost column indicates the hot allocation causes.
```
33.10% (3,985,120 bytes) / web-sys-0.3.56 Vec<TreeAndSpacing> in TokenStreamBuilder
32.29% (79,387,384 bytes) / http-0.2.6 BitSets
22.30% (25,059,520 bytes) / vte-0.10.1 BitSets
21.30% (6,422,572 bytes) / unicode_categories-0.1.1 LitToConstInput
20.95% (4,182,024 bytes) / rand_chacha-0.3.1 NameBinding, NameResolution, BindingKey
20.94% (51,517,192 bytes) / http-0.2.6 BitSets
20.94% (4,182,024 bytes) / c2-chacha-0.3.3 NameBinding, NameResolution, BindingKey
18.95% (17,086,432 bytes) / language-tags-0.3.2 BitSets
18.95% (17,086,168 bytes) / language-tags-0.3.2 BitSets
18.89% (21,230,008 bytes) / vte-0.10.1 BitSets
18.02% (5,035,623 bytes) / deunicode-1.3.1 Symbol, LitKind, encode_metadata_impl
15.76% (3,145,728 bytes) / rand_chacha-0.3.1 NameBinding, NameResolution, BindingKey
14.07% (5,609,280 bytes) / c2-chacha-0.3.3 NameBinding, NameResolution, BindingKey
13.30% (2,097,152 bytes) / keccak-0.1.0 ProjectionElem, PredicateInner
13.28% (8,832,256 bytes) / pest-2.1.3 as_operand, Expr
12.95% (2,042,880 bytes) / keccak-0.1.0 ProjectionElem, PredicateInner
12.89% (2,860,032 bytes) / serde_qs-0.8.5 DroplessArena
12.78% (8,388,600 bytes) / unic-ucd-segment-0.9.0 encode_metadata_impl
12.57% (2,097,152 bytes) / deunicode-1.3.1 Symbol, LitKind, encode_metadata_impl
12.14% (2,451,456 bytes) / actix-tls-3.0.1 DroplessArena
11.99% (41,901,440 bytes) / redis-0.21.5 obligation_forest::Node
11.95% (4,761,040 bytes) / rand_chacha-0.3.1 NameBinding, NameResolution, BindingKey
11.91% (7,922,432 bytes) / pest-2.1.3 as_operand, Expr
11.75% (8,637,520 bytes) / tinyvec-1.5.1 BitSet
11.75% (8,637,312 bytes) / tinyvec-1.5.1 BitSet
10.72% (2,451,456 bytes) / stdweb-derive-0.5.3 DroplessArena
10.50% (2,097,152 bytes) / c2-chacha-0.3.3 NameBinding, NameResolution, BindingKey
10.21% (5,968,560 bytes) / aes-0.7.5 ModChild
```

BitSets are again common. `DroplessArena` ones are difficult to action because that covers many different types. Otherwise, fairly spread out.
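The per-PP figures in the tables above can be turned back into rough whole-run totals, which is a handy sanity check when comparing rows. The following is a minimal sketch, assuming that `/ Minstr` means "per million instructions executed" and that a peak percentage is the PP's share of the bytes live at the global peak. Both are assumptions about how to read the DHAT output rather than something stated in this note, and the function names are hypothetical.

```rust
/// Estimate how many million instructions a run executed, from a rate line
/// such as `251.3 / Minstr (5,362,131 blocks)`.
/// Assumption: rate = count / (instructions / 1_000_000).
fn total_minstr(count: u64, rate_per_minstr: f64) -> f64 {
    count as f64 / rate_per_minstr
}

/// Estimate the whole-program peak in bytes, from a peak line such as
/// `32.29% (79,387,384 bytes)`.
/// Assumption: the percentage is this PP's share of the global peak.
fn total_peak_bytes(pp_bytes: u64, share_percent: f64) -> f64 {
    pp_bytes as f64 / (share_percent / 100.0)
}

fn main() {
    // First row of the "allocation rate (blocks)" table.
    let minstr = total_minstr(5_362_131, 251.3);
    println!("async-std-1.10.0: ~{:.0} Minstr executed", minstr);

    // Second row of the "peak memory usage" table.
    let peak = total_peak_bytes(79_387_384, 32.29);
    println!("http-0.2.6: ~{:.1} MB live at peak", peak / 1e6);
}
```

Under those assumptions the `async-std-1.10.0` row works out to roughly 21,300 Minstr, and the `http-0.2.6` row to roughly 246 MB, which is close to the 245,575,943 bytes listed for `http-0.2.6` among the highest absolute peaks.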
----

Highest peak memory usage, absolute.
```
535,773,520 bytes / vsdb-0.13.10
350,086,000 bytes / lsp-types-0.91.1
312,125,839 bytes / nalgebra-0.30.1
295,757,875 bytes / diesel-1.4.8
245,575,943 bytes / http-0.2.6
224,398,810 bytes / combine-4.6.3
199,890,087 bytes / nix-0.23.1
195,042,951 bytes / gimli-0.26.1
185,666,858 bytes / rusoto_s3-0.47.0
179,543,658 bytes / object-0.28.3
174,479,216 bytes / proptest-1.0.0
169,319,484 bytes / wast-39.0.0
168,982,525 bytes / tendermint-proto-0.24.0-pre.1
161,741,679 bytes / goblin-0.4.3
148,107,835 bytes / h2-0.3.11
```

Other than `http-0.2.6`, which is dominated by BitSets, these are not all that interesting. No particularly hot allocation sites, a pretty similar mix, with higher ones tending to be `DroplessArena`, mir CFG building, metadata encoding, etc.

## round-4-line-counts

Biggest crates.
```
web-sys-0.3.56.txt : 155935 lines of rust
regex-syntax-0.6.25.txt : 45482 lines of rust
tokio-1.16.1.txt : 44900 lines of rust
nalgebra-0.30.1.txt : 43215 lines of rust
gimli-0.26.1.txt : 38441 lines of rust
unicode-normalization-0.1.19.txt : 27132 lines of rust
curve25519-dalek-4.0.0-pre.1.txt : 26478 lines of rust
object-0.28.3.txt : 26403 lines of rust
diesel-1.4.8.txt : 25390 lines of rust
ndarray-0.15.4.txt : 24502 lines of rust
rustls-0.20.2.txt : 23704 lines of rust
nix-0.23.1.txt : 22779 lines of rust
rusoto_s3-0.47.0.txt : 20968 lines of rust
trust-dns-proto-0.21.0-alpha.4.txt : 19930 lines of rust
image-0.23.14.txt : 19866 lines of rust
petgraph-0.6.0.txt : 19283 lines of rust
git2-0.13.25.txt : 18990 lines of rust
vsdbsled-0.34.7-patched.txt : 18900 lines of rust
bitvec-1.0.0.txt : 18437 lines of rust
actix-web-4.0.0-beta.21.txt : 17388 lines of rust
```

Smallest crates.
```
fuchsia-cprng-0.1.1.txt : 40 lines of rust
cranelift-codegen-shared-0.80.0.txt : 39 lines of rust
num-0.4.0.txt : 38 lines of rust
headers-core-0.2.0.txt : 38 lines of rust
waker-fn-1.1.0.txt : 36 lines of rust
thread-id-4.0.0.txt : 36 lines of rust
foreign-types-shared-0.3.0.txt : 36 lines of rust
darling_macro-0.13.1.txt : 34 lines of rust
new_debug_unreachable-1.0.4.txt : 29 lines of rust
opaque-debug-0.3.0.txt : 25 lines of rust
tinyvec_macros-0.1.0.txt : 22 lines of rust
byte-tools-0.3.1.txt : 21 lines of rust
precomputed-hash-0.1.1.txt : 13 lines of rust
winapi-build-0.1.1.txt : 12 lines of rust
string_cache_shared-0.3.0.txt : 10 lines of rust
enum-iterator-0.7.0.txt : 10 lines of rust
typeable-0.1.2.txt : 8 lines of rust
stream-cipher-0.99.99.txt : 3 lines of rust
jsonrpc-core-client-18.0.0.txt : 3 lines of rust
block-cipher-0.99.99.txt : 3 lines of rust
```

Not much to analyze here.

## round-5-cachegrind-debug

This is similar to round-1-cachegrind-check, but with additional LLVM costs, which aren't very interesting to analyze here.

## round-6-llvm-lines-project

This gave very similar results to round-2-llvm-lines-leaf-crate, so I haven't analyzed it.

## round-7-cargo-timing-check-j1

The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.

## round-8-cargo-timing-debug-j1

The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.

## round-9-cargo-timing-opt-j1

The use of `-j1` forces codegen to be non-parallel, which makes these results non-representative. See the `-j8` results instead.

## round-10-cachegrind-opt

This is similar to round-1-cachegrind-check, but with additional LLVM costs, which aren't very interesting to analyze here.

## round-13-cargo-timing-opt-j8

Most expensive crates. The counts are seconds of compile time, e.g. `syn` accounted for 643.5 seconds of compile time, which is 8.1% of the total.
(There is of course overlap in crate compilation, so this doesn't say much about the critical path.)
```
7932.5 counts (weighted fractional, erased)
( 1) 643.5 ( 8.1%, 8.1%): syn v1.0.86
( 2) 235.4 ( 3.0%, 11.1%): serde v1.0.136
( 3) 202.1 ( 2.5%, 13.6%): tokio v1.16.1
( 4) 182.9 ( 2.3%, 15.9%): regex-syntax v0.6.25
( 5) 172.3 ( 2.2%, 18.1%): libc v0.2.116
( 6) 155.9 ( 2.0%, 20.1%): regex v1.5.4
( 7) 151.7 ( 1.9%, 22.0%): proc-macro2 v1.0.36
( 8) 138.8 ( 1.7%, 23.7%): serde_derive v1.0.136
( 9) 130.0 ( 1.6%, 25.4%): memchr v2.4.1
( 10) 99.6 ( 1.3%, 26.6%): libc v0.2.116 build script
( 11) 95.6 ( 1.2%, 27.8%): proc-macro2 v1.0.36 build script
( 12) 92.2 ( 1.2%, 29.0%): quote v1.0.15
( 13) 86.4 ( 1.1%, 30.1%): syn v1.0.86 build script
( 14) 78.3 ( 1.0%, 31.1%): http v0.2.6
( 15) 77.2 ( 1.0%, 32.0%): futures-util v0.3.19
( 16) 69.3 ( 0.9%, 32.9%): aho-corasick v0.7.18
( 17) 63.5 ( 0.8%, 33.7%): serde_json v1.0.78
( 18) 54.3 ( 0.7%, 34.4%): bytes v1.1.0
( 19) 50.7 ( 0.6%, 35.0%): h2 v0.3.11
( 20) 49.4 ( 0.6%, 35.7%): autocfg v1.0.1
( 21) 49.1 ( 0.6%, 36.3%): cc v1.0.72
( 22) 49.1 ( 0.6%, 36.9%): thiserror-impl v1.0.30
( 23) 49.0 ( 0.6%, 37.5%): log v0.4.14
( 24) 48.7 ( 0.6%, 38.1%): hyper v0.14.16
( 25) 48.1 ( 0.6%, 38.7%): num_cpus v1.13.1
( 26) 47.9 ( 0.6%, 39.3%): mio v0.7.14
( 27) 45.1 ( 0.6%, 39.9%): unicode-bidi v0.3.7
( 28) 42.1 ( 0.5%, 40.4%): log v0.4.14 build script
( 29) 42.0 ( 0.5%, 41.0%): unicode-xid v0.2.2
( 30) 40.9 ( 0.5%, 41.5%): num-traits v0.2.14
```

`syn`/`quote`/`proc-macro2` (and their build scripts) are the most frequent.

Very surprising to see so many build scripts in there! Definitely worth investigation.
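For readers who haven't looked inside one, many of the build scripts in this list do little more than probe the compiler and emit a `cfg` flag. Here is a minimal sketch of that pattern. It is not the actual script of any crate listed; the `has_new_api` flag and the 1.56 threshold are hypothetical placeholders.

```rust
// build.rs: a minimal sketch of a version-probing build script.
use std::env;
use std::process::Command;

/// Extract the minor version from `rustc --version` output,
/// e.g. "rustc 1.58.1 (db9d1b20b 2022-01-20)" gives Some(58).
fn rustc_minor_version(version_output: &str) -> Option<u32> {
    let mut pieces = version_output.split('.');
    if pieces.next() != Some("rustc 1") {
        return None;
    }
    pieces.next()?.parse().ok()
}

fn main() {
    // Cargo sets RUSTC for build scripts; fall back to `rustc` on PATH.
    let rustc = env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    let output = match Command::new(rustc).arg("--version").output() {
        Ok(out) => out,
        Err(_) => return, // cannot probe the compiler: emit no flags
    };
    let version = String::from_utf8_lossy(&output.stdout);
    let minor = match rustc_minor_version(&version) {
        Some(minor) => minor,
        None => return,
    };
    // Hypothetical flag and threshold, for illustration only.
    if minor >= 56 {
        println!("cargo:rustc-cfg=has_new_api");
    }
}
```

Even a script this small has to be compiled, linked and run before the crate itself can start building, which is why each one shows up as separate `build script` and `build script (run)` units in the timings.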
Some analysis of build script use-cases (areas where declaratively supporting the feature in cargo would remove the need for the script):
- setting conditional compilation flags depending on the compiler version, handling MSRV:
    * [`syn`](https://github.com/dtolnay/syn/blob/master/build.rs)
    * [`proc-macro2`](https://github.com/dtolnay/proc-macro2/blob/master/build.rs)
    * [`libc`](https://github.com/rust-lang/libc/blob/master/build.rs)
    * [`serde`](https://github.com/serde-rs/serde/blob/master/serde/build.rs)
- setting conditional compilation flags depending on the target:
    * [`log`](https://github.com/rust-lang/log/blob/master/build.rs). This seems more of a convenience than something impossible without a script though: the crate could likely contain `cfg` expressions matching the same targets (doing so would remove this node from 120 crates' dependency graph in the dataset) \[https://github.com/rust-lang/log/issues/489\]
    * [`proc-macro2`](https://github.com/dtolnay/proc-macro2/blob/master/build.rs): e.g. for the wasm target
    * [`libc`](https://github.com/rust-lang/libc/blob/master/build.rs): e.g. for the FreeBSD target versions
    * [`memchr`](https://github.com/BurntSushi/memchr/blob/master/build.rs): e.g. for SIMD
    * [`serde`](https://github.com/serde-rs/serde/blob/master/serde/build.rs): e.g. for wasm/asm.js, and architectures where libstd supports atomics
    * [`futures-core`](https://github.com/rust-lang/futures-rs/blob/master/futures-core/build.rs): e.g. for targets without atomic CAS ops
- parsing and checking other environment variables (although one could see the `target` use-case above as parsing the `TARGET` env var):
    * `proc-macro2` also checks the `DOCS_RS` env var, likely to control and improve rustdoc output on docs.rs
    * `libc` for CI to deny warnings, to check if it's a dependency of libstd, and to access cargo feature flags (which are probably equivalent to using `cfg!` expressions in the build script)
- setting conditional compilation flags derived from other feature flags (e.g. in `proc-macro2`)

TODO: also investigate the build script compile times. Some of these scripts are simple (though they use various parts of libstd) but compile slowly (e.g. `syn`'s build script takes >400ms to compile in 150 crates). We need to look into whether that's because of opt levels or something else; maybe some simple scripts could be interpreted.

----

Most popular crates, i.e. how often they are dependencies for other crates.
```
10657 counts:
( 1) 224 ( 2.1%, 2.1%): libc v0.2.116 build script (run)
( 2) 224 ( 2.1%, 4.2%): libc v0.2.116
( 3) 223 ( 2.1%, 6.3%): cfg-if v1.0.0
( 4) 215 ( 2.0%, 8.3%): libc v0.2.116 build script
( 5) 200 ( 1.9%, 10.2%): unicode-xid v0.2.2
( 6) 199 ( 1.9%, 12.1%): proc-macro2 v1.0.36
( 7) 199 ( 1.9%, 13.9%): quote v1.0.15
( 8) 199 ( 1.9%, 15.8%): proc-macro2 v1.0.36 build script (run)
( 9) 197 ( 1.8%, 17.6%): proc-macro2 v1.0.36 build script
( 10) 193 ( 1.8%, 19.5%): syn v1.0.86 build script (run)
( 11) 193 ( 1.8%, 21.3%): syn v1.0.86
( 12) 191 ( 1.8%, 23.1%): syn v1.0.86 build script
( 13) 122 ( 1.1%, 24.2%): log v0.4.14 build script (run)
( 14) 122 ( 1.1%, 25.3%): log v0.4.14
( 15) 120 ( 1.1%, 26.5%): log v0.4.14 build script
( 16) 103 ( 1.0%, 27.4%): memchr v2.4.1
( 17) 103 ( 1.0%, 28.4%): memchr v2.4.1 build script (run)
( 18) 102 ( 1.0%, 29.4%): lazy_static v1.4.0
( 19) 101 ( 0.9%, 30.3%): memchr v2.4.1 build script
( 20) 88 ( 0.8%, 31.1%): autocfg v1.0.1
( 21) 76 ( 0.7%, 31.8%): serde v1.0.136
( 22) 76 ( 0.7%, 32.6%): serde v1.0.136 build script (run)
( 23) 74 ( 0.7%, 33.3%): serde v1.0.136 build script
( 24) 72 ( 0.7%, 33.9%): version_check v0.9.4
( 25) 63 ( 0.6%, 34.5%): pin-project-lite v0.2.8
( 26) 62 ( 0.6%, 35.1%): futures-core v0.3.19 build script
( 27) 62 ( 0.6%, 35.7%): futures-core v0.3.19
( 28) 62 ( 0.6%, 36.3%): futures-core v0.3.19 build script (run)
( 29) 60 ( 0.6%, 36.8%): once_cell v1.9.0
( 30) 58 ( 0.5%, 37.4%): fnv v1.0.7
```

`libc`, `cfg-if`, and `syn`/`quote`/`proc-macro2`/`unicode-xid` are the most popular.

----

The biggest projects, i.e. most crates compiled.
```
jsonrpc-client-transports-18.0.0: 184
actix-web-4.0.0-beta.21: 178
sentry-0.24.2: 149
rusoto_s3-0.47.0: 144
awc-3.0.0-beta.19: 141
warp-0.3.2: 129
tonic-0.6.2: 112
actix-connect-2.0.0: 110
rusoto_signature-0.47.0: 108
tera-1.15.0: 107
reqwest-0.11.9: 105
actix-http-3.0.0-beta.19: 100
tokio-postgres-0.7.5: 93
vsdb-0.13.10: 90
rusoto_credential-0.47.0: 88
glutin-0.28.0: 87
criterion-0.3.5: 82
jsonrpc-core-client-18.0.0: 81
trust-dns-resolver-0.21.0-alpha.4: 79
hyper-rustls-0.23.0: 79
tokio-tungstenite-0.16.1: 75
rustc-ap-rustc_data_structures-727.0.0: 75
log4rs-1.0.0: 73
ammonia-3.1.3: 73
hyper-tls-0.5.0: 72
jsonrpc-server-utils-18.0.0: 68
jsonrpc-pubsub-18.0.0: 68
trust-dns-proto-0.21.0-alpha.4: 66
tracing-opentelemetry-0.16.0: 65
sentry-backtrace-0.24.2: 63
```

219 out of 777 projects contain a single crate, i.e. zero dependencies.

---

Observations just from looking at some timings graphs.
- The `hyper` crate depends on the `h2` crate, but doesn't start building until `h2` is fully compiled, rather than when `h2`'s metadata is emitted before codegen. Is this necessary? E.g. in [`warp-0.3.2`](https://lqd.github.io/rustc-benchmarking-data/results/round-13-cargo-timing-opt-j8/cargo-timing-warp-0.3.2-opt-j8.html). [lqd, [hyper:#2770](https://github.com/hyperium/hyper/pull/2770), complete]
- Likewise for everything that depends on `syn`, e.g.
in [`actix-connect-2.0.0`](https://lqd.github.io/rustc-benchmarking-data/results/round-13-cargo-timing-opt-j8/cargo-timing-actix-connect-2.0.0-opt-j8.html)
- Some build scripts that compile C code are very slow to run, e.g. `zstd-sys build script (run)` in [`awc-3.0.0-beta.19`](https://lqd.github.io/rustc-benchmarking-data/results/round-13-cargo-timing-opt-j8/cargo-timing-awc-3.0.0-beta.19-opt-j8.html). Can we do better with them? Prioritizing some of them earlier in the pipeline could help, thanks to increased parallelism. The same thing used to happen on Servo, and I've also seen it on crates depending on `openssl`; it is tracked in [this cargo issue](https://github.com/rust-lang/cargo/issues/7437). Note, though, that native library builds can also compete for tokens and build in parallel, and moving them earlier can in turn make them build slower because of higher contention and fewer resources.

## round-11-cargo-timing-check-j8

Most expensive crates, same idea as for round-13.
```
5196.0 counts (weighted fractional, erased)
( 1) 491.5 ( 9.5%, 9.5%): syn v1.0.86
( 2) 179.7 ( 3.5%, 12.9%): serde v1.0.136 lib (check)
( 3) 154.6 ( 3.0%, 15.9%): serde_derive v1.0.136
( 4) 132.4 ( 2.5%, 18.4%): libc v0.2.116 lib (check)
( 5) 110.8 ( 2.1%, 20.6%): libc v0.2.116 build script
( 6) 110.6 ( 2.1%, 22.7%): proc-macro2 v1.0.36
( 7) 109.5 ( 2.1%, 24.8%): proc-macro2 v1.0.36 build script
( 8) 100.0 ( 1.9%, 26.7%): syn v1.0.86 lib (check)
( 9) 98.2 ( 1.9%, 28.6%): syn v1.0.86 build script
( 10) 82.7 ( 1.6%, 30.2%): tokio v1.16.1 lib (check)
( 11) 64.2 ( 1.2%, 31.5%): futures-util v0.3.19 lib (check)
( 12) 63.5 ( 1.2%, 32.7%): quote v1.0.15
( 13) 57.0 ( 1.1%, 33.8%): autocfg v1.0.1
( 14) 52.2 ( 1.0%, 34.8%): thiserror-impl v1.0.30
( 15) 47.8 ( 0.9%, 35.7%): regex-syntax v0.6.25 lib (check)
( 16) 47.3 ( 0.9%, 36.6%): cc v1.0.72
( 17) 43.9 ( 0.8%, 37.4%): log v0.4.14 build script
( 18) 43.0 ( 0.8%, 38.3%): memchr v2.4.1 lib (check)
( 19) 42.5 ( 0.8%, 39.1%): memchr v2.4.1 build script
( 20) 38.9 ( 0.7%, 39.8%): version_check v0.9.4
( 21) 36.8 ( 0.7%, 40.6%): serde v1.0.136 build script
( 22) 36.7 ( 0.7%, 41.3%): typenum v1.15.0 build script
( 23) 36.0 ( 0.7%, 42.0%): http v0.2.6 lib (check)
( 24) 33.9 ( 0.7%, 42.6%): typenum v1.15.0 lib (check)
( 25) 32.3 ( 0.6%, 43.2%): zstd-sys v1.6.2+zstd.1.5.1 build script (run)
( 26) 31.3 ( 0.6%, 43.8%): jemalloc-sys v0.3.2 build script (run)
( 27) 31.3 ( 0.6%, 44.4%): unicode-xid v0.2.2
( 28) 31.1 ( 0.6%, 45.0%): cfg-if v1.0.0 lib (check)
( 29) 28.8 ( 0.6%, 45.6%): num-traits v0.2.14 lib (check)
( 30) 26.7 ( 0.5%, 46.1%): derive_more v0.99.17
```

Reasonably similar results to round-13.

## round-12-cargo-timing-debug-j8

Most expensive crates, same idea as for round-13.
```
6451.3 counts (weighted fractional, erased)
( 1) 663.9 (10.3%, 10.3%): syn v1.0.86
( 2) 222.7 ( 3.5%, 13.7%): serde v1.0.136
( 3) 154.1 ( 2.4%, 16.1%): serde_derive v1.0.136
( 4) 153.5 ( 2.4%, 18.5%): proc-macro2 v1.0.36
( 5) 144.6 ( 2.2%, 20.8%): libc v0.2.116
( 6) 135.1 ( 2.1%, 22.8%): tokio v1.16.1
( 7) 112.6 ( 1.7%, 24.6%): libc v0.2.116 build script
( 8) 108.7 ( 1.7%, 26.3%): proc-macro2 v1.0.36 build script
( 9) 97.6 ( 1.5%, 27.8%): syn v1.0.86 build script
( 10) 91.9 ( 1.4%, 29.2%): regex-syntax v0.6.25
( 11) 89.3 ( 1.4%, 30.6%): quote v1.0.15
( 12) 76.2 ( 1.2%, 31.8%): memchr v2.4.1
( 13) 73.9 ( 1.1%, 32.9%): futures-util v0.3.19
( 14) 62.4 ( 1.0%, 33.9%): regex v1.5.4
( 15) 58.4 ( 0.9%, 34.8%): http v0.2.6
( 16) 58.1 ( 0.9%, 35.7%): autocfg v1.0.1
( 17) 54.7 ( 0.8%, 36.5%): thiserror-impl v1.0.30
( 18) 52.1 ( 0.8%, 37.4%): cc v1.0.72
( 19) 43.6 ( 0.7%, 38.0%): log v0.4.14 build script
( 20) 42.5 ( 0.7%, 38.7%): memchr v2.4.1 build script
( 21) 41.8 ( 0.6%, 39.3%): unicode-xid v0.2.2
( 22) 40.9 ( 0.6%, 40.0%): log v0.4.14
( 23) 39.3 ( 0.6%, 40.6%): bytes v1.1.0
( 24) 38.7 ( 0.6%, 41.2%): serde_json v1.0.78
( 25) 38.6 ( 0.6%, 41.8%): version_check v0.9.4
( 26) 38.0 ( 0.6%, 42.4%): hyper v0.14.16
( 27) 37.9 ( 0.6%, 43.0%): serde v1.0.136 build script
( 28) 36.6 ( 0.6%, 43.5%): typenum v1.15.0 build script
( 29) 35.2 ( 0.5%, 44.1%): zstd-sys v1.6.2+zstd.1.5.1 build script (run)
( 30) 34.7 ( 0.5%, 44.6%): typenum v1.15.0
```

Reasonably similar results to round-13.

## round-14-self-profile-check

The heaviest relative queries seen. (More data [here](https://lqd.github.io/rustc-benchmarking-data/summaries/)). The `expand_crate` ones have some correlation with the hot macro parsing results seen with Cachegrind and DHAT.
```
expand_crate rel 83.73%, abs 44.35ms web-sys-0.3.56
expand_crate rel 70.53%, abs 3.62s async-std-1.10.0
metadata_register_crate rel 60.64%, abs 15.26ms jsonrpc-core-client-18.0.0
metadata_register_crate rel 59.59%, abs 24.52ms impl-codec-0.5.1
expand_crate rel 54.35%, abs 155.29ms yansi-0.5.0
typeck rel 53.87%, abs 1.41s redis-0.21.5
typeck rel 52.41%, abs 94.35ms keccak-0.1.0
specialization_graph_of rel 51.22%, abs 13.41s nalgebra-0.30.1
expand_crate rel 50.40%, abs 6.93ms opaque-debug-0.3.0
expand_crate rel 49.74%, abs 556.79ms time-macros-0.2.3
expand_crate rel 49.62%, abs 59.15ms enum-iterator-derive-0.7.0
expand_crate rel 49.29%, abs 350.36ms num-derive-0.3.3
expand_crate rel 49.10%, abs 9.20ms static_assertions-1.1.0
expand_crate rel 48.98%, abs 1.89s js-sys-0.3.56
expand_crate rel 48.64%, abs 10.33ms mac-0.1.1
expand_crate rel 48.61%, abs 6.49ms matches-0.1.9
expand_crate rel 48.54%, abs 114.46ms ctor-0.1.21
expand_crate rel 48.47%, abs 14.52ms fixed-hash-0.7.0
expand_crate rel 48.40%, abs 6.16ms pin-utils-0.1.0
expand_crate rel 48.34%, abs 6.21ms tinyvec_macros-0.1.0
expand_crate rel 48.23%, abs 10.45ms crossbeam-0.8.1
expand_crate rel 47.68%, abs 7.28ms cpufeatures-0.2.1
expand_crate rel 46.52%, abs 240.79ms pest_generator-2.1.3
expand_crate rel 46.31%, abs 18.75ms term_size-1.0.0-beta1
expand_crate rel 45.75%, abs 8.37ms miow-0.4.0
typeck rel 45.58%, abs 479.33ms vte-0.10.1
expand_crate rel 44.70%, abs 8.46ms wincolor-1.0.3
expand_crate rel 44.51%, abs 65.65ms enum-as-inner-0.3.3
expand_crate rel 44.50%, abs 490.77ms pear-0.2.3
expand_crate rel 44.43%, abs 8.63ms winapi-util-0.1.5
```

Slowest passes overall, weighted by percentages.
```
77399.4 counts (weighted fractional, erased)
( 1) 13800.9 (17.8%, 17.8%): typeck
( 2) 13736.9 (17.7%, 35.6%): expand_crate
( 3) 7133.2 ( 9.2%, 44.8%): mir_borrowck
( 4) 2454.5 ( 3.2%, 48.0%): evaluate_obligation
( 5) 2191.4 ( 2.8%, 50.8%): free_global_ctxt
( 6) 2120.5 ( 2.7%, 53.5%): metadata_register_crate
( 7) 2114.5 ( 2.7%, 56.3%): metadata_decode_entry_impl_trait_ref
( 8) 2072.1 ( 2.7%, 58.9%): hir_lowering
( 9) 1733.5 ( 2.2%, 61.2%): mir_built
( 10) 1692.4 ( 2.2%, 63.4%): specialization_graph_of
( 11) 1549.9 ( 2.0%, 65.4%): late_resolve_crate
( 12) 1473.5 ( 1.9%, 67.3%): parse_crate
( 13) 1214.3 ( 1.6%, 68.8%): type_op_prove_predicate
( 14) 979.1 ( 1.3%, 70.1%): check_impl_item_well_formed
( 15) 905.9 ( 1.2%, 71.3%): check_item_well_formed
( 16) 822.8 ( 1.1%, 72.3%): param_env
( 17) 667.8 ( 0.9%, 73.2%): generate_crate_metadata
( 18) 655.9 ( 0.8%, 74.1%): thir_body
( 19) 615.8 ( 0.8%, 74.9%): check_mod_item_types
( 20) 580.2 ( 0.7%, 75.6%): metadata_decode_entry_type_of
```

## round-15-self-profile-debug

The heaviest relative queries seen.
```
expand_crate rel 73.81%, abs 40.93ms web-sys-0.3.56-Debug-Full.txt
run_linker rel 68.10%, abs 152.87ms block-cipher-0.99.99-Debug-Full.txt
run_linker rel 67.83%, abs 141.60ms stream-cipher-0.99.99-Debug-Full.txt
run_linker rel 63.27%, abs 856.92ms wasm-bindgen-macro-0.2.79-Debug-Full.txt
run_linker rel 58.74%, abs 713.84ms pest_derive-2.1.0-Debug-Full.txt
run_linker rel 55.58%, abs 898.85ms darling_macro-0.13.1-Debug-Full.txt
metadata_register_crate rel 54.79%, abs 23.41ms jsonrpc-core-client-18.0.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 54.76%, abs 423.21ms rpassword-5.0.1-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 54.30%, abs 405.03ms color_quant-1.1.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 53.62%, abs 514.55ms predicates-tree-1.0.5-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 53.24%, abs 497.02ms slog-scope-4.4.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 53.21%, abs 579.14ms dirs-sys-next-0.1.2-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.92%, abs 556.50ms diff-0.1.12-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.65%, abs 603.78ms pem-1.0.2-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.47%, abs 488.49ms log-mdc-0.1.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.39%, abs 375.82ms heck-0.4.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 52.00%, abs 222.41ms shlex-1.1.0-Debug-Full.txt
run_linker rel 51.17%, abs 993.63ms pyo3-macros-0.15.1-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 51.12%, abs 588.33ms jobserver-0.1.24-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.60%, abs 612.43ms strsim-0.10.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.43%, abs 615.94ms convert_case-0.5.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.33%, abs 410.43ms tokio-tcp-0.2.0-alpha.1-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.13%, abs 591.25ms dotenv-0.15.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 50.11%, abs 643.20ms simplelog-0.11.2-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.82%, abs 451.52ms polling-2.2.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.72%, abs 611.30ms threadpool-1.8.1-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.64%, abs 233.63ms shell-escape-0.1.5-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.51%, abs 527.52ms proc-macro-crate-1.1.0-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.48%, abs 645.35ms stringprep-0.1.2-Debug-Full.txt
LLVM_module_codegen_emit_obj rel 49.48%, abs 571.53ms futures-timer-3.0.2-Debug-Full.txt
```

Slowest passes overall, weighted by percentages.
```
77694.1 counts (weighted fractional, erased)
( 1) 21595.5 (27.8%, 27.8%): LLVM_module_codegen_emit_obj
( 2) 8452.2 (10.9%, 38.7%): LLVM_passes
( 3) 5659.6 ( 7.3%, 46.0%): expand_crate
( 4) 4990.5 ( 6.4%, 52.4%): typeck
( 5) 4766.1 ( 6.1%, 58.5%): codegen_module
( 6) 2562.4 ( 3.3%, 61.8%): mir_borrowck
( 7) 1740.9 ( 2.2%, 64.1%): finish_ongoing_codegen
( 8) 1651.1 ( 2.1%, 66.2%): run_linker
( 9) 1407.6 ( 1.8%, 68.0%): LLVM_module_optimize
( 10) 1232.0 ( 1.6%, 69.6%): free_global_ctxt
( 11) 1130.4 ( 1.5%, 71.0%): LLVM_module_codegen
( 12) 1113.2 ( 1.4%, 72.5%): evaluate_obligation
( 13) 1057.2 ( 1.4%, 73.8%): metadata_register_crate
( 14) 894.0 ( 1.2%, 75.0%): metadata_decode_entry_impl_trait_ref
( 15) 870.6 ( 1.1%, 76.1%): hir_lowering
( 16) 816.2 ( 1.1%, 77.1%): mir_drops_elaborated_and_const_checked
( 17) 794.9 ( 1.0%, 78.2%): specialization_graph_of
( 18) 720.7 ( 0.9%, 79.1%): parse_crate
( 19) 676.3 ( 0.9%, 80.0%): optimized_mir
( 20) 643.4 ( 0.8%, 80.8%): mir_built
```

## round-16-self-profile-opt

The heaviest relative queries seen.
```
expand_crate rel 65.50%, abs 43.94ms web-sys-0.3.56-Opt-Full.txt
run_linker rel 62.97%, abs 167.55ms block-cipher-0.99.99-Opt-Full.txt
run_linker rel 56.23%, abs 147.61ms stream-cipher-0.99.99-Opt-Full.txt
specialization_graph_of rel 49.22%, abs 13.32s nalgebra-0.30.1-Opt-Full.txt
LLVM_module_optimize rel 41.94%, abs 994.78ms rustc-demangle-0.1.21-Opt-Full.txt
metadata_register_crate rel 41.40%, abs 21.23ms impl-codec-0.5.1-Opt-Full.txt
metadata_register_crate rel 39.58%, abs 16.02ms jsonrpc-core-client-18.0.0-Opt-Full.txt
LLVM_module_optimize rel 38.36%, abs 459.10ms rpassword-5.0.1-Opt-Full.txt
LLVM_module_optimize rel 38.27%, abs 662.36ms slog-scope-4.4.0-Opt-Full.txt
LLVM_module_optimize rel 38.05%, abs 1.83s async-process-1.3.0-Opt-Full.txt
LLVM_module_optimize rel 37.84%, abs 1.26s textwrap-0.14.2-Opt-Full.txt
LLVM_module_optimize rel 36.96%, abs 614.37ms serial_test-0.5.1-Opt-Full.txt
LLVM_module_optimize rel 36.58%, abs 675.59ms log-mdc-0.1.0-Opt-Full.txt
LLVM_module_optimize rel 36.53%, abs 623.55ms actix-threadpool-0.3.3-Opt-Full.txt
LLVM_module_optimize rel 36.51%, abs 679.71ms futures-executor-0.3.19-Opt-Full.txt
LLVM_module_optimize rel 36.48%, abs 752.50ms blocking-1.1.0-Opt-Full.txt
LLVM_module_optimize rel 36.25%, abs 1.23s version_check-0.9.4-Opt-Full.txt
LLVM_module_optimize rel 36.25%, abs 764.37ms dirs-sys-0.3.6-Opt-Full.txt
expand_crate rel 36.21%, abs 10.49ms crossbeam-0.8.1-Opt-Full.txt
LLVM_module_optimize rel 36.12%, abs 2.41s async-global-executor-2.0.2-Opt-Full.txt
LLVM_module_optimize rel 35.89%, abs 586.79ms tokio-udp-0.2.0-alpha.1-Opt-Full.txt
LLVM_module_optimize rel 35.59%, abs 741.72ms tokio-tcp-0.2.0-alpha.1-Opt-Full.txt
LLVM_module_optimize rel 35.59%, abs 1.11s rusty-fork-0.3.0-Opt-Full.txt
LLVM_module_optimize rel 35.58%, abs 534.42ms wasm-bindgen-futures-0.4.29-Opt-Full.txt
LLVM_module_optimize rel 35.52%, abs 1.08s os_info-3.1.0-Opt-Full.txt
LLVM_module_optimize rel 35.45%, abs 1.25s dotenv-0.15.0-Opt-Full.txt
LLVM_module_optimize rel 35.37%, abs 267.58ms crypto-hash-0.3.4-Opt-Full.txt
LLVM_module_optimize rel 34.98%, abs 447.28ms hyper-rustls-0.23.0-Opt-Full.txt
typeck rel 34.95%, abs 518.49ms vte-0.10.1-Opt-Full.txt
LLVM_module_optimize rel 34.93%, abs 1.57s tokio-signal-0.3.0-alpha.1-Opt-Full.txt
```

Slowest passes overall, weighted by percentages.

```
77768.4 counts (weighted fractional, erased)
(  1)  13851.8 (17.8%, 17.8%): LLVM_module_optimize
(  2)  10520.5 (13.5%, 31.3%): LLVM_passes
(  3)   8220.3 (10.6%, 41.9%): LLVM_module_codegen_emit_obj
(  4)   8131.6 (10.5%, 52.4%): finish_ongoing_codegen
(  5)   7983.4 (10.3%, 62.6%): LLVM_lto_optimize
(  6)   3890.3 ( 5.0%, 67.6%): expand_crate
(  7)   3213.1 ( 4.1%, 71.8%): typeck
(  8)   1628.8 ( 2.1%, 73.9%): mir_borrowck
(  9)   1405.6 ( 1.8%, 75.7%): codegen_module
( 10)   1002.9 ( 1.3%, 77.0%): codegen_module_optimize
( 11)    936.9 ( 1.2%, 78.2%): LLVM_thin_lto_import
( 12)    894.2 ( 1.1%, 79.3%): free_global_ctxt
( 13)    802.3 ( 1.0%, 80.3%): evaluate_obligation
( 14)    768.9 ( 1.0%, 81.3%): metadata_register_crate
( 15)    604.3 ( 0.8%, 82.1%): metadata_decode_entry_impl_trait_ref
( 16)    601.3 ( 0.8%, 82.9%): codegen_module_perform_lto
( 17)    594.2 ( 0.8%, 83.6%): hir_lowering
( 18)    558.6 ( 0.7%, 84.4%): parse_crate
( 19)    544.0 ( 0.7%, 85.1%): specialization_graph_of
( 20)    512.3 ( 0.7%, 85.7%): mir_drops_elaborated_and_const_checked
```

## round-17-time-passes-check

**Executive summary**

- Crate expansion and type checking are the passes that increase memory usage the most.

----

`-Ztime-passes` gives both time and RSS (absolute and change) for each pass. Self-profiling covers time, so I'll just analyze the change in RSS for each stage. I don't entirely trust the RSS numbers produced by `-Ztime-passes`; they sometimes seem wonky, but here goes.

Weighted RSS changes. Note that the totals aren't that meaningful; it's about the percentages.

```
184746.0 counts (weighted fractional, erased)
(  1)  48259.0 ( 26.1%, 26.1%): total
(  2) -36664.0 (-19.8%,  6.3%): free_global_ctxt
(  3)  33765.0 ( 18.3%, 24.6%): configure_and_expand
(  4)  29491.0 ( 16.0%, 40.5%): macro_expand_crate
(  5)  29448.0 ( 15.9%, 56.5%): expand_crate
(  6)  27825.0 ( 15.1%, 71.5%): type_check_crate
(  7)  11533.0 (  6.2%, 77.8%): coherence_checking
(  8)   7046.0 (  3.8%, 81.6%): item_bodies_checking
(  9)   6986.0 (  3.8%, 85.4%): MIR_borrow_checking
( 10)   4205.0 (  2.3%, 87.6%): type_collecting
( 11)   3477.0 (  1.9%, 89.5%): hir_lowering
( 12)   3060.0 (  1.7%, 91.2%): wf_checking
( 13)   2704.0 (  1.5%, 92.6%): resolve_crate
( 14)   2451.0 (  1.3%, 94.0%): late_resolve_crate
( 15)   1770.0 (  1.0%, 94.9%): parse_crate
( 16)   1751.0 (  0.9%, 95.9%): item_types_checking
( 17)   1358.0 (  0.7%, 96.6%): misc_checking_1
( 18)   1329.0 (  0.7%, 97.3%): generate_crate_metadata
( 19)   1141.0 (  0.6%, 97.9%): misc_checking_3
( 20)    830.0 (  0.4%, 98.4%): lint_checking
```

I don't think the `total` number is meaningful. `macro_expand_crate` and `expand_crate` are almost always identical. I'm not sure what to make of that; it seems suspicious.
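
To make the point about percentages concrete: the two bracketed columns appear to be each row's share of the header total and the running sum of those shares. The sketch below recomputes them for the first three rows of the table above. It is only a sanity check in Python; the helper name `with_percentages` is made up for illustration and is not part of any tool used here.

```python
# Recompute the "(share%, cumulative%)" columns from the raw weights.
# Assumption: share = weight / header total, cumulative = running sum / header total.

def with_percentages(rows, total):
    """Return (weight, share_pct, cumulative_pct, label) for each (weight, label) row."""
    out = []
    running = 0.0
    for weight, label in rows:
        running += weight
        share = round(100 * weight / total, 1)
        cumulative = round(100 * running / total, 1)
        out.append((weight, share, cumulative, label))
    return out

# First three rows of the round-17 table, copied verbatim.
TOTAL = 184746.0
ROWS = [
    (48259.0, "total"),
    (-36664.0, "free_global_ctxt"),
    (33765.0, "configure_and_expand"),
]

table = with_percentages(ROWS, TOTAL)
for weight, share, cumulative, label in table:
    print(f"{weight:9.1f} ({share:5.1f}%, {cumulative:5.1f}%): {label}")
```

Because a negative row such as `free_global_ctxt` pulls the cumulative column back down, and because some passes seem to be nested inside others, the header total is not a real sum of memory use, which is another reason to read only the relative sizes.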
## round-18-time-passes-debug

```
274400.0 counts (weighted fractional, erased)
(  1)  75783.0 ( 27.6%, 27.6%): total
(  2) -40315.0 (-14.7%, 12.9%): free_global_ctxt
(  3)  33988.0 ( 12.4%, 25.3%): configure_and_expand
(  4)  29827.0 ( 10.9%, 36.2%): macro_expand_crate
(  5)  29774.0 ( 10.9%, 47.0%): expand_crate
(  6)  28294.0 ( 10.3%, 57.3%): type_check_crate
(  7)  24573.0 (  9.0%, 66.3%): codegen_crate
(  8)  23209.0 (  8.5%, 74.8%): codegen_to_LLVM_IR
(  9)  11883.0 (  4.3%, 79.1%): coherence_checking
( 10)   7118.0 (  2.6%, 81.7%): item_bodies_checking
( 11)   7029.0 (  2.6%, 84.2%): generate_crate_metadata
( 12)   7028.0 (  2.6%, 86.8%): MIR_borrow_checking
( 13)   5132.0 (  1.9%, 88.7%): monomorphization_collector_graph_walk
( 14)   4240.0 (  1.5%, 90.2%): type_collecting
( 15)   3472.0 (  1.3%, 91.5%): hir_lowering
( 16)   3114.0 (  1.1%, 92.6%): wf_checking
( 17)   2732.0 (  1.0%, 93.6%): resolve_crate
( 18)   2506.0 (  0.9%, 94.5%): late_resolve_crate
( 19)   1731.0 (  0.6%, 95.2%): item_types_checking
( 20)   1710.0 (  0.6%, 95.8%): parse_crate
```

Numbers for front-end passes are similar to `round-17`, as expected. Codegen passes add some extra memory use, unsurprisingly.
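
A quick way to check the "similar to `round-17`" claim is to compare the weights for the front-end passes that appear in both tables. The Python sketch below does that with values copied from the two tables; the dictionary and function names are illustrative only.

```python
# Compare front-end pass weights between round-17 (check) and round-18 (debug).
# Values are copied from the two tables; names here are illustrative.

CHECK = {
    "configure_and_expand": 33765.0,
    "macro_expand_crate": 29491.0,
    "expand_crate": 29448.0,
    "type_check_crate": 27825.0,
    "coherence_checking": 11533.0,
    "item_bodies_checking": 7046.0,
    "MIR_borrow_checking": 6986.0,
    "type_collecting": 4205.0,
    "hir_lowering": 3477.0,
}

DEBUG = {
    "configure_and_expand": 33988.0,
    "macro_expand_crate": 29827.0,
    "expand_crate": 29774.0,
    "type_check_crate": 28294.0,
    "coherence_checking": 11883.0,
    "item_bodies_checking": 7118.0,
    "MIR_borrow_checking": 7028.0,
    "type_collecting": 4240.0,
    "hir_lowering": 3472.0,
}

def relative_change(before, after):
    """Fractional change from `before` to `after` (0.01 means +1%)."""
    return (after - before) / before

changes = {name: relative_change(CHECK[name], DEBUG[name]) for name in CHECK}
largest = max(changes, key=lambda name: abs(changes[name]))

for name, change in sorted(changes.items(), key=lambda item: -abs(item[1])):
    print(f"{name:24s} {100 * change:+.2f}%")
```

For these passes the weights stay within a few percent of each other, with `coherence_checking` moving the most.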
## round-19-time-passes-opt

```
34209.0 counts (weighted fractional, erased)
(  1) 111637.0 (25.7%, 25.7%): LLVM_lto_optimize(*-cgu.N)
(  2)  94078.0 (21.7%, 47.4%): total
(  3) -39819.0 (-9.2%, 38.2%): free_global_ctxt
(  4)  33939.0 ( 7.8%, 46.0%): configure_and_expand
(  5)  30996.0 ( 7.1%, 53.2%): codegen_crate
(  6)  29712.0 ( 6.8%, 60.0%): macro_expand_crate
(  7)  29634.0 ( 6.8%, 66.8%): expand_crate
(  8)  28418.0 ( 6.5%, 73.4%): type_check_crate
(  9)  24642.0 ( 5.7%, 79.0%): codegen_to_LLVM_IR
( 10)  17904.0 ( 4.1%, 83.2%): finish_ongoing_codegen
( 11)  15878.0 ( 3.7%, 86.8%): link
( 12)  12012.0 ( 2.8%, 89.6%): coherence_checking
( 13)   7087.0 ( 1.6%, 91.2%): item_bodies_checking
( 14)   6953.0 ( 1.6%, 92.8%): MIR_borrow_checking
( 15)   5477.0 ( 1.3%, 94.1%): monomorphization_collector_graph_walk
( 16)   4179.0 ( 1.0%, 95.1%): type_collecting
( 17)   3479.0 ( 0.8%, 95.9%): hir_lowering
( 18)   3115.0 ( 0.7%, 96.6%): wf_checking
( 19)   2737.0 ( 0.6%, 97.2%): resolve_crate
( 20)   2513.0 ( 0.6%, 97.8%): generate_crate_metadata
```

`LLVM_lto_optimize` is the most memory-hungry pass, in general.
