As new features get added to Rust, the v0 mangling scheme needs to evolve to accommodate them. Some additions, like the new `f16` and `f128` basic types, could be added in a backwards compatible way, but others, like complex const generics, require extending the grammar.
Symbol names containing new grammar constructs cannot be demangled by existing demanglers. The main goal of this discussion is to decide on a policy on how to deal with this breakage:
Both options are viable. The following sections describe the implications of each to inform a decision.
When a tool encounters a name it cannot demangle, it will usually display the mangled version. E.g. instead of
my_crate::foo::<{my_crate::Foo { x: 4, y: 5 }}>
it would display
_RINvCs38VALFIIAdn_8my_crate3fooKVNtB2_3FooS1xl4_1yl5_EEB
To mitigate the problem, MCP 737 suggests making the compiler emit symbol names in a way that allows demanglers to skip unknown parts. As a result, demanglers that don't know about complex const generics would display:
my_crate::foo::<KVNtB2_3FooS1xl4_1yl5_E>
i.e. the const generic argument is still displayed in its mangled form but the rest of the function name has been demangled successfully. Once demanglers know about this mechanism, they can prettify the skipped parts to make it clearer what's going on:
my_crate::foo::<{parse error: KVNtB2_3FooS1xl4_1yl5_E}>
In theory it would also be possible to update the `rustfilt` tool to repair such partially demangled symbol names.
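As a rough illustration of what "skipping unknown parts" could look like on the demangler side, here is a minimal sketch. It assumes the `C<numbytes>_` marker described under the costs below, and it further assumes that `<numbytes>` counts the bytes following the `_`. The function names `encode_skippable` and `split_skippable` are hypothetical and are not taken from rustc or `rustc-demangle`.

```rust
/// Wraps `payload` in the assumed `C<numbytes>_` framing.
/// Assumption: `<numbytes>` is the byte length of what follows the `_`.
fn encode_skippable(payload: &str) -> String {
    format!("C{}_{}", payload.len(), payload)
}

/// Splits a leading skippable section off `input`.
/// Returns `(payload, rest)`, or `None` if `input` does not start with a
/// well-formed section under the assumed framing.
fn split_skippable(input: &str) -> Option<(&str, &str)> {
    let after_marker = input.strip_prefix('C')?;
    let digits_end = after_marker.find('_')?;
    let digits = &after_marker[..digits_end];
    // Only plain ASCII decimal digits are accepted (no sign, no empty length).
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let len: usize = digits.parse().ok()?;
    let body = &after_marker[digits_end + 1..];
    // Reject lengths that run past the input or split a UTF-8 sequence.
    if len > body.len() || !body.is_char_boundary(len) {
        return None;
    }
    Some(body.split_at(len))
}

fn main() {
    // The const generic argument from the example above.
    let payload = "KVNtB2_3FooS1xl4_1yl5_E";
    let framed = encode_skippable(payload);
    // "EB2_" stands in for whatever follows the section in the full symbol.
    let input = format!("{}EB2_", framed);

    match split_skippable(&input) {
        // A demangler that does not understand the payload can print it
        // verbatim and continue parsing at `rest`.
        Some((skipped, rest)) => println!("skipped: {skipped}, continue at: {rest}"),
        None => println!("not a skippable section"),
    }
    println!("framing overhead: {} bytes", framed.len() - payload.len());
}
```

Under these assumptions the demangler never has to understand the payload grammar. It only needs the length prefix to find where to resume.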
However, providing this mitigation comes at a cost:

- Skippable sections are marked with `C<numbytes>_` where `<numbytes>` is an ASCII decimal number encoding the length of the skippable section. This increases the length of symbol names. An experimental perf run did not show significant file size increases, but perf.rlo is not a good test case because the benchmarked crates don't contain complex const generics. In the extreme, every skippable section adds around 4 bytes, and every leaf generic argument is skippable.
- The mangling code needs additional logic to make sure that all `<numbytes>` sections have the correct number of digits. In practice, however, I (mw) don't think this is a problem. I was not able to construct an example that needed more than 2 steps to converge. Having the additional logic in the mangling code does pose a potential maintenance burden.
- Demanglers need to be updated to support skipping. An implementation for the `rustc-demangle` crate can be found here. Implementing this has medium complexity, and implementing it in plain C (as required for some demanglers) is not going to make it easier. On the other hand, once it is implemented it does not need to change for future grammar additions.

Different tools are affected in different ways. Some tools, like GDB, only rely on symbol name demangling if no debuginfo is available. Others, like LLDB and Valgrind, always use symbol names[1]. The following table shows how grammar additions affect each tool (with and without debuginfo present) and how long it usually takes for a fix to arrive on the user's system:
Tool | With Debuginfo | No Debuginfo | Expected time until fixed |
---|---|---|---|
GDB | OK | breakage | months to several years |
LLDB | breakage | breakage | ~6 months |
Valgrind | breakage | breakage | months to several years |
perf | ? | ?[2] | months to several years |
c++filt | N/A | breakage | months to several years |
llvm-cxxfilt | N/A | breakage | ~6 months |
rustfilt | N/A | OK | immediately |
backtrace-rs | OK | OK | immediately |
WinDbg | OK | ? | N/A (uses debuginfo only) |
Tools that ship with Linux distros and usually stay on the same major version throughout the lifetime of a distro release are affected the worst, because fixes only become available with the next distro release at the earliest. For users who have to stick to LTS versions, that can be several years.
For other tools like LLDB and `llvm-cxxfilt` it is more common to have up-to-date packages for new major versions available, or for the tool to be acquired through other channels.
Anything that is distributed together with the compiler (e.g. `backtrace-rs`) or that is installed via `cargo install` has no problem. The tools can be updated at the same time as the grammar change is made.
The following is an example of an LLDB backtrace of a program with full debuginfo, with and without the mitigation. The first frame either is the mangled symbol name or a partially demangled one.
Without mitigation:
* thread #1, name = 'symbol_mangling', stop reason = breakpoint 1.1
* frame #0: 0x05678ec _RINvCs38VALFIIAdn_25symbol_mangling_consumers3fooKVNtB2_3FooS1xl4_1yl5_EEB2_(x=0x055bbf8) at main.rs:13:22
frame #1: 0x05679ad symbol_mangling_consumers::main at main.rs:17:5
frame #2: 0x056789b <fn() as core::ops::function::FnOnce<()>>::call_once((null)=(symbol_mangling_consumers::main at main.rs:16), (null)=<unavailable>) at function.rs:250:5
With mitigation:
* thread #1, name = 'symbol_mangling', stop reason = breakpoint 1.1
* frame #0: 0x057eb2c symbol_mangling_consumers::foo::<KVNtB2_3FooS1xl4_1yl5_E>(x=0x0562c10) at main.rs:13:22
frame #1: 0x057ebfd symbol_mangling_consumers::main at main.rs:17:5
frame #2: 0x057eadb <fn() as core::ops::function::FnOnce<()>>::call_once((null)=(symbol_mangling_consumers::main at main.rs:16), (null)=<unavailable>) at function.rs:250:5
We have to decide on how we want to deal with these kinds of breaking changes.
The expected breakage for each new addition is limited because only symbol names using new features will be affected. On the other hand, the mitigation seems feasible in terms of maintenance, artifact size, and compile time cost.
Choosing either of the options is better than having no policy 🙂