# wg-binary-size

The Binary Size Working Group aims to reduce the size of compiled Rust binaries.

## Motivation

- Embedded developers/users operating under tight memory and storage constraints.
- Rust-in-Linux.
- General desire for efficiency.
- Parity with C/C++.

## Basics

- [List of members](https://www.rust-lang.org/governance/teams/compiler#Binary%20size%20working%20group)
- [Zulip channel](https://rust-lang.zulipchat.com/#narrow/stream/wg-binary-size)
- [Repository](https://github.com/rust-lang/wg-binary-size)

## Meetings

Meetings are text-based and held on Zulip. Search for topics with "Meeting" in the title on [the Zulip channel](https://rust-lang.zulipchat.com/#narrow/stream/wg-binary-size).

As of late 2023, the plan is to meet at 21:00 UTC every second Thursday, at least until DST changes in 2024 mix things up. The first meeting was on November 16.

This meeting can also be imported into your calendar:

- [ICS file][meeting-ics]
- [Add to Google Calendar][meeting-gcal]

[meeting-ics]: https://rust-lang.zulipchat.com/user_uploads/4715/PX1uF2vsZfVdLt1DF5Xwk_Pr/wg-binary-size-meeting-invite.ics
<!-- The same ICS file, but pre-filled to add to Google Calendar -->
[meeting-gcal]: https://www.google.com/calendar/render?cid=webcal%3A//rust-lang.zulipchat.com/user_uploads/4715/PX1uF2vsZfVdLt1DF5Xwk_Pr/wg-binary-size-meeting-invite.ics

## Scope

Things that are in scope:

1. exe/dylib/cdylib size on disk
   - Before stripping: text, data, symbols, debuginfo
   - After stripping: text, data
   - dylib may include metadata? Need to understand/determine
     - bjorn3: "I had a WIP PR to allow splitting the dylib metadata into a separate file and leaving a stub with the crate hash in its place, but that PR has been closed for inactivity"
2. exe/dylib/cdylib size mapped into a process
   - Text, data, BSS
   - Closely related to the previous item
3. rlib and rmeta size?
   - Rust-in-Linux might ship rmeta files

Things that are out of scope:

1. Total memory usage (RSS/working set) of a process
   - A lot of that usage can be unrelated to binary size
2. Target directory size
   - Includes caches, incremental compilation data, etc.

Configurations of interest

- `cargo build`
- `cargo build --release`
- `-Cdebuginfo=1`
- `-Os`/`-Oz`
  - `-Oz` should be the smallest, but is currently sometimes bigger than `-Os` and/or `-O3`
- PIE/PIC?

Platforms of interest

- Tier 1 platforms
  - x86_64
  - aarch64
  - i686 (probably lower priority?)
- Embedded platforms
  - armv6
  - armv7/thumbv7
  - riscv32
- WebAssembly

Backends: LLVM vs GCC vs Cranelift

- GCC is currently much bigger, e.g. 11MB (7.7MB if the sysroot is compiled in release mode) for helloworld vs 4.4MB for LLVM
- Cranelift smaller? Maybe due to less inlining?

## Tools

- `ls -l` (crude, excludes BSS)
- `size` (includes BSS; excludes debuginfo; `--format` differences)
  - `size -A` prints all sections
- [cargo-bloat](https://crates.io/crates/cargo-bloat/)
- [cargo-llvm-lines](https://crates.io/crates/cargo-llvm-lines)
- [findpanics](https://github.com/philipc/findpanics)
- Gary is "working on a tool that tries to do some analysis on binary/disassembly level, although it's still a very early WIP"

## Benchmarking

rustc-perf

- [binary size graphs](https://perf.rust-lang.org/index.html?kind=raw&stat=size%3Alinked_artifact)
- [binary size past month comparison](https://perf.rust-lang.org/compare.html?stat=size%3Alinked_artifact)
- Most benchmarks are library crates, and not very useful for binary size metrics.
- Binary crates:
  - `helloworld`
  - `helloworld-tiny`
  - `ripgrep-13.0.0`
  - `ripgrep-13.0.0-tiny`
  - `tiny` ones use:
    ```
    opt-level = "z"
    lto = true
    codegen-units = 1
    panic = "abort"
    strip = true
    ```

## Ideas brainstorming

- size of constant data?
- what tooling do we need?
- [I-heavy backlog](https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AI-heavy)

### Platforms

#### "Size Classes"

Platforms tend to be used in a few different ways, which affects their binary size concerns:

- application
  - generally a 64-bit execution environment, with accordingly big pointers
  - can potentially exploit linked-in dynamic library code (mostly libc)
  - has access to vector instructions, which can make things bigger or smaller ("pointless" (subjective) loop unrolling and vectorization is a frequent complaint re: code size)
- embedded
  - usually bare metal 32-bit, sometimes also 16-bit, but Rust supports those less well
  - may have Harvard architectures and limits on (mapped) code size measured in only KB
  - without an OS, they often have to ship the code for even basic things like memcpy
- kernel
  - "bare metal" like embedded, but with concerns about 64-bit environments
  - in practice, has more "tricks" than embedded targets do to minimize code size, because the OS itself constitutes its own software ecosystem
- wasm
  - 32-bit, with similar concerns as embedded, but usually runs on powerful execution environments
  - code is generally JIT compiled, so users "pay twice" for in-memory code: first for the wasm encoding, then for the JIT-compiled binary that is actually executed, though it still only hits the icache once
  - in most wasm use cases, even servers, code must ship over the network (that's not the only point of a VM, but it is a primary one). One can expect ~100 megabytes-per-second transfer speeds in areas with gigabit ethernet, and the code is only part of the total application, alongside data, etc.

## References

The [Build Configuration](https://nnethercote.github.io/perf-book/build-configuration.html) chapter of the Rust Performance Book has details on reducing binary size.

[min-sized-rust](https://github.com/johnthagen/min-sized-rust) has more details, including more aggressive approaches.

[Tighten Rust’s Belt: Shrinking Embedded Rust Binaries](https://dl.acm.org/doi/pdf/10.1145/3519941.3535075) is an LCTES'22 paper about a study comparing embedded Rust binaries with equivalent C binaries. It explores:

- Deeply ingrained monomorphization
- Suboptimal compiler-generated support code
- String formatting (e.g. `derive(Debug)`)
- Drop
- Futures
- Hidden data structures and data
- Panics
- Dynamic dispatch
- Static initializers
- Fewer compiler optimizations
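
To make the monomorphization item concrete, here is a minimal sketch of one commonly used mitigation: keep the generic function as a thin shim and move the real work into a non-generic inner function. This is not taken from the paper. The function name `extension_len` is hypothetical and exists only for illustration.

```rust
use std::path::Path;

/// Hypothetical example function (not from the paper or any real crate).
///
/// The generic outer function is monomorphized once per concrete `P`,
/// but each copy only converts its argument and forwards it.
pub fn extension_len<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic inner function: there is a single definition of it,
    // no matter how many different `P` types call the outer function.
    fn inner(path: &Path) -> usize {
        // Length in bytes of the file extension, or 0 if there is none.
        path.extension().map_or(0, |ext| ext.len())
    }
    inner(path.as_ref())
}
```

Calling `extension_len` with a `&str`, a `String`, and a `PathBuf` produces three shims that all share one `inner`. For a body this small the optimizer may well inline `inner` into every shim, so the pattern mainly pays off when the shared body is large. Tools such as cargo-llvm-lines can help show which generic functions are worth treating this way.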