# wg-binary-size
The Binary Size Working Group aims to reduce the size of compiled Rust binaries.
## Motivation
- Embedded developers/users operating under tight memory and storage constraints.
- Rust-in-Linux.
- General desire for efficiency.
- Parity with C/C++.
## Basics
- [List of members](https://www.rust-lang.org/governance/teams/compiler#Binary%20size%20working%20group)
- [Zulip channel](https://rust-lang.zulipchat.com/#narrow/stream/wg-binary-size)
- [Repository](https://github.com/rust-lang/wg-binary-size)
## Meetings
Meetings are text-based, done via Zulip. Search for topics with "Meeting" in the title on [the Zulip channel](https://rust-lang.zulipchat.com/#narrow/stream/wg-binary-size).
As of late 2023, the plan is to meet at 21:00 UTC every other Thursday, at least until the 2024 DST changes mix things up. The first meeting was on November 16. This meeting can also be imported into your calendar:
- [ICS file][meeting-ics]
- [Add to Google Calendar][meeting-gcal]
[meeting-ics]: https://rust-lang.zulipchat.com/user_uploads/4715/PX1uF2vsZfVdLt1DF5Xwk_Pr/wg-binary-size-meeting-invite.ics
<!-- The same ICS file, but pre-filled to add to Google Calendar -->
[meeting-gcal]: https://www.google.com/calendar/render?cid=webcal%3A//rust-lang.zulipchat.com/user_uploads/4715/PX1uF2vsZfVdLt1DF5Xwk_Pr/wg-binary-size-meeting-invite.ics
## Scope
Things that are in scope:
1. exe/dylib/cdylib size on disk
   - Before stripping: text, data, symbols, debuginfo
   - After stripping: text, data
   - dylibs may include metadata? Needs to be understood/determined.
     - bjorn3: "I had a WIP PR to allow splitting the dylib metadata into a separate file and leaving a stub with the crate hash in its place, but that PR has been closed for inactivity"
2. exe/dylib/cdylib size mapped into a process
   - Text, data, BSS
   - Closely related to the previous item
3. rlib and rmeta size?
   - Rust-in-Linux might ship rmeta files
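
The on-disk versus mapped distinction above comes down to which sections are counted. A simplified model, assuming hypothetical section totals and ignoring file headers, alignment padding and any sections not named above; `SectionSizes` and its methods are illustrative, not an existing API:

```rust
/// Per-section byte counts for one linked artifact (exe/dylib/cdylib).
/// Simplified model: file headers, alignment padding and any other
/// sections are ignored.
#[derive(Debug, Clone, Copy)]
struct SectionSizes {
    text: u64,      // executable code
    data: u64,      // initialized data
    bss: u64,       // zero-initialized data: takes memory, but no file space
    symbols: u64,   // symbol tables, removed by stripping
    debuginfo: u64, // debug sections, removed by stripping
}

impl SectionSizes {
    /// Size on disk before stripping.
    fn on_disk_unstripped(&self) -> u64 {
        self.text + self.data + self.symbols + self.debuginfo
    }

    /// Size on disk after stripping.
    fn on_disk_stripped(&self) -> u64 {
        self.text + self.data
    }

    /// Size mapped into a process. BSS is allocated at load time; this
    /// model treats symbols and debuginfo as unmapped.
    fn mapped(&self) -> u64 {
        self.text + self.data + self.bss
    }
}

fn main() {
    // Made-up numbers for illustration only.
    let s = SectionSizes {
        text: 300_000,
        data: 40_000,
        bss: 8_000,
        symbols: 50_000,
        debuginfo: 900_000,
    };
    println!("on disk, unstripped: {} bytes", s.on_disk_unstripped()); // 1290000
    println!("on disk, stripped:   {} bytes", s.on_disk_stripped()); // 340000
    println!("mapped:              {} bytes", s.mapped()); // 348000
}
```

Stripping only shrinks the first number, while BSS only affects the third, which is one reason `ls -l` and `size` can disagree.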
Things that are out of scope:
1. Total memory usage (RSS/working set) of a process
   - A lot of that usage can be unrelated to binary size
2. Target directory size
   - Includes caches, incremental compilation data, etc.
Configurations of interest:
- `cargo build`
- `cargo build --release`
- `-Cdebuginfo=1`
- `-Copt-level=s`/`-Copt-level=z` (`-Os`/`-Oz`)
  - `-Oz` should produce the smallest binaries, but it currently sometimes produces bigger ones than `-Os` and/or `-O3`
- PIE/PIC?
Platforms of interest:
- Tier 1 platforms
  - x86_64
  - aarch64
  - i686 (probably lower priority?)
- Embedded platforms
  - armv6
  - armv7/thumbv7
  - riscv32
- WebAssembly
Backends: LLVM vs GCC vs Cranelift
- The GCC backend's output is currently much bigger, e.g. 11 MB (7.7 MB if the sysroot is compiled in release mode) for helloworld vs 4.4 MB for LLVM
- Cranelift output smaller? Maybe due to less inlining?
## Tools
- `ls -l` (crude; excludes BSS)
- `size` (includes BSS; excludes debuginfo; output differs between `--format` options)
  - `size -A` prints all sections
- [cargo-bloat](https://crates.io/crates/cargo-bloat/)
- [cargo-llvm-lines](https://crates.io/crates/cargo-llvm-lines)
- [findpanics](https://github.com/philipc/findpanics)
- Gary is "working on a tool that tries to do some analysis on binary/disassembly level, although it's still a very early WIP"
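
For scripted size checks, one option is to parse `size` output. A minimal sketch, assuming GNU `size` in its default Berkeley format with the default decimal radix (columns `text`, `data`, `bss`, `dec`, `hex`, `filename`, with `hex` unprefixed); other `--format` or `--radix` settings lay the output out differently. `parse_berkeley_row` is a hypothetical helper:

```rust
/// One data row of GNU `size` output in its default Berkeley format.
#[derive(Debug, PartialEq)]
struct BerkeleyRow {
    text: u64,
    data: u64,
    bss: u64,
    filename: String,
}

/// Parses a single data row (not the header). Returns `None` if the row is
/// malformed or if `dec`/`hex` do not equal text + data + bss.
fn parse_berkeley_row(line: &str) -> Option<BerkeleyRow> {
    let mut fields = line.split_whitespace();
    let text: u64 = fields.next()?.parse().ok()?;
    let data: u64 = fields.next()?.parse().ok()?;
    let bss: u64 = fields.next()?.parse().ok()?;
    let dec: u64 = fields.next()?.parse().ok()?;
    let hex = u64::from_str_radix(fields.next()?, 16).ok()?;
    // Filenames containing whitespace are re-joined with single spaces.
    let filename = fields.collect::<Vec<_>>().join(" ");
    if dec != text + data + bss || hex != dec || filename.is_empty() {
        return None;
    }
    Some(BerkeleyRow { text, data, bss, filename })
}

fn main() {
    // Illustrative row; real input would come from running `size` on a binary.
    let line = "   1234\t    560\t      8\t   1802\t    70a\thello";
    if let Some(row) = parse_berkeley_row(line) {
        println!("{}: text={} data={} bss={}", row.filename, row.text, row.data, row.bss);
    }
}
```

Cross-checking `dec` and `hex` against the sum is a cheap way to reject the header row or output produced with a different `--radix`.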
## Benchmarking
rustc-perf
- [binary size graphs](https://perf.rust-lang.org/index.html?kind=raw&stat=size%3Alinked_artifact)
- [binary size past month comparison](https://perf.rust-lang.org/compare.html?stat=size%3Alinked_artifact)
- Most benchmarks are library crates and are not very useful for binary size metrics.
- Binary crates:
  - `helloworld`
  - `helloworld-tiny`
  - `ripgrep-13.0.0`
  - `ripgrep-13.0.0-tiny`
- The `tiny` variants use:
```toml
opt-level = "z"
lto = true
codegen-units = 1
panic = "abort"
strip = true
```
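
The comparison links above report size changes between builds. A minimal sketch of that arithmetic, assuming both sizes are in bytes; `size_delta` and `SizeDelta` are hypothetical and not part of rustc-perf:

```rust
/// Change in size between a baseline build and a new build.
#[derive(Debug, PartialEq)]
struct SizeDelta {
    /// Positive when the new build is bigger.
    bytes: i128,
    /// Change relative to the baseline, in percent.
    percent: f64,
}

/// Returns `None` for an empty baseline, where a relative change is undefined.
fn size_delta(baseline: u64, new: u64) -> Option<SizeDelta> {
    if baseline == 0 {
        return None;
    }
    let bytes = i128::from(new) - i128::from(baseline);
    let percent = bytes as f64 / baseline as f64 * 100.0;
    Some(SizeDelta { bytes, percent })
}

fn main() {
    // Illustrative numbers only, not real rustc-perf measurements.
    if let Some(d) = size_delta(400_000, 380_000) {
        println!("{:+} bytes ({:+.2}%)", d.bytes, d.percent);
    }
}
```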
## Ideas
Brainstorming:
- size of constant data?
- what tooling do we need?
- [I-heavy backlog](https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AI-heavy)
### Platforms
#### "Size Classes"
Platforms tend to be used in a few different ways, which affects their binary size concerns:
- application
  - generally a 64-bit execution environment, with correspondingly large pointers
  - can potentially make use of dynamically linked library code (mostly libc)
  - has access to vector instructions, which can make code bigger or smaller (arguably pointless loop unrolling and vectorization is a frequent code-size complaint)
- embedded
  - usually bare-metal 32-bit, sometimes 16-bit, though Rust supports 16-bit targets less well
  - may use Harvard architectures, with (mapped) code size limits measured in mere kilobytes
  - without an OS, often has to ship code for even basic routines like `memcpy`
- kernel
  - "bare metal" like embedded, but with the concerns of 64-bit environments
  - in practice, has more "tricks" available than embedded targets for minimizing code size, because the OS constitutes its own software ecosystem
- wasm
  - 32-bit, with concerns similar to embedded, but usually runs in powerful execution environments
  - code is generally JIT-compiled, so users "pay twice" for in-memory code: first for the wasm encoding, then for the JIT-compiled machine code that actually executes, though only the latter occupies the icache
  - in most wasm use cases, even servers, code must be shipped over the network (not the only point of a VM, but a primary one); one can expect transfer speeds of ~100 MB/s where gigabit Ethernet is available, and code is only part of the total payload alongside data, etc.
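
Pointer width is one reason the same Rust code has a different data footprint across these size classes: references are one pointer wide, and fat pointers (slices, `&str`, trait objects) carry two pointer-sized words. A small sketch that reports these layouts for whatever target it is compiled for; `words` is a hypothetical helper, and this measures data layout, not code size:

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Size of `T` measured in pointer-sized words (`usize` is pointer-sized).
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    println!("pointer width on this target: {} bytes", size_of::<usize>());
    // Thin pointers: one word.
    println!("&u8:         {} word(s)", words::<&u8>());
    println!("Box<u8>:     {} word(s)", words::<Box<u8>>());
    // `Option<&T>` reuses the null niche, so it is still one word.
    println!("Option<&u8>: {} word(s)", words::<Option<&u8>>());
    // Fat pointers: data pointer plus a length or a vtable pointer.
    println!("&str:        {} word(s)", words::<&str>());
    println!("&[u32]:      {} word(s)", words::<&[u32]>());
    println!("&dyn Debug:  {} word(s)", words::<&dyn Debug>());
}
```

Compiled for a 64-bit target such as x86_64, a `&str` takes 16 bytes; for a 32-bit target such as wasm32 or thumbv7, it takes 8.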
## References
The [Build Configuration](https://nnethercote.github.io/perf-book/build-configuration.html) chapter of the Rust Performance Book has details on reducing binary size.
[min-sized-rust](https://github.com/johnthagen/min-sized-rust) has more details, including more aggressive approaches.
[Tighten Rust’s Belt: Shrinking Embedded Rust Binaries](https://dl.acm.org/doi/pdf/10.1145/3519941.3535075) is an LCTES'22 paper about a study comparing embedded Rust binaries with equivalent C binaries. It explores:
- Deeply ingrained monomorphization
- Suboptimal compiler-generated support code
- String formatting (e.g. `derive(Debug)`)
- Drop
- Futures
- Hidden data structures and data
- Panics
- Dynamic dispatch
- Static initializers
- Fewer compiler optimizations
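
Two common mitigations for monomorphization in ordinary Rust code are dynamic dispatch (one copy of the function, at some runtime cost and with vtables in the binary) and outlining the real work into a non-generic inner function behind a thin generic shim. A minimal sketch of both; the function names are hypothetical and not taken from the paper, and the actual size effect depends on inlining and other optimizations, so it should be measured (e.g. with cargo-llvm-lines or cargo-bloat):

```rust
use std::fmt::{Display, Write};

/// Monomorphized: the compiler emits one copy of this function for every
/// distinct `T` it is called with.
fn join_generic<T: Display>(items: &[T], sep: &str) -> String {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        // Writing into a String cannot fail.
        write!(out, "{item}").unwrap();
    }
    out
}

/// Dynamic dispatch: a single copy is emitted; formatting each item goes
/// through the trait object's vtable.
fn join_dyn(items: &[&dyn Display], sep: &str) -> String {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        write!(out, "{item}").unwrap();
    }
    out
}

/// Outlining: only this thin shim is duplicated per `S`; the non-generic
/// `inner` holds the real work and is emitted once.
fn count_words<S: AsRef<str>>(text: S) -> usize {
    fn inner(text: &str) -> usize {
        text.split_whitespace().count()
    }
    inner(text.as_ref())
}

fn main() {
    println!("{}", join_generic(&[1, 2, 3], ", ")); // 1, 2, 3
    println!("{}", join_generic(&["a", "b"], "-")); // a-b
    let mixed: [&dyn Display; 3] = [&1, &"two", &3.5];
    println!("{}", join_dyn(&mixed, " | ")); // 1 | two | 3.5
    println!("{}", count_words("hello binary size")); // 3
    println!("{}", count_words(String::from("one two"))); // 2
}
```

The outlining pattern keeps static dispatch for callers, while `join_dyn` trades some speed and inlining opportunities for fewer copies of the function body.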