Policy Discussion: Updating v0 symbol mangling

As new features are added to Rust, the v0 mangling scheme needs to evolve to accommodate them. Some additions, like the new f16 and f128 basic types, could be made in a backwards-compatible way, but others, like complex const generics, require extending the grammar.

Symbol names containing new grammar constructs cannot be demangled by existing demanglers. The main goal of this discussion is to decide on a policy on how to deal with this breakage:

Do nothing: external tools will simply have to catch up with the changes.
Mitigate the problem by implementing more fine-grained graceful degradation as suggested in MCP 737 (at the cost of sometimes longer symbol names and some implementation complexity).

Both options are viable. The following sections describe the implications of each to inform a decision.

The MCP 737 mitigation

When a tool encounters a name it cannot demangle, it will usually display the mangled version. E.g. instead of

my_crate::foo::<{my_crate::Foo { x: 4, y: 5 }}>

it would display

_RINvCs38VALFIIAdn_8my_crate3fooKVNtB2_3FooS1xl4_1yl5_EEB

To mitigate the problem, MCP 737 suggests making the compiler emit symbol names in a way that allows demanglers to skip unknown parts. As a result, demanglers that don't know about complex const generics would display:

my_crate::foo::<KVNtB2_3FooS1xl4_1yl5_E>

i.e. the const generic argument is still displayed in its mangled form, but the rest of the function name has been demangled successfully. Once demanglers know about this mechanism, they can prettify the skipped parts to make it clearer what is going on:

my_crate::foo::<{parse error: KVNtB2_3FooS1xl4_1yl5_E}>
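The fallback rendering described above can be sketched as follows. This is only an illustration: the `GenericArg` type and the `render` function are hypothetical names made up for this example and are not taken from rustc-demangle or any other demangler.

```rust
/// Outcome of trying to demangle one generic argument.
/// Hypothetical type for illustration only.
enum GenericArg<'a> {
    /// The demangler understood the argument and produced readable text.
    Demangled(String),
    /// The demangler skipped the argument; only the raw mangled bytes are known.
    Skipped(&'a str),
}

/// Render a path with its generic arguments. Skipped arguments are shown
/// as a `{parse error: ...}` placeholder instead of failing the whole name.
fn render(path: &str, args: &[GenericArg<'_>]) -> String {
    let rendered: Vec<String> = args
        .iter()
        .map(|arg| match arg {
            GenericArg::Demangled(text) => text.clone(),
            GenericArg::Skipped(raw) => format!("{{parse error: {}}}", raw),
        })
        .collect();
    format!("{}::<{}>", path, rendered.join(", "))
}
```

The point of the sketch is that a single unknown argument degrades only its own slot in the output, while the surrounding path stays readable.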

In theory it would also be possible to update the rustfilt tool to repair such partially demangled symbol names.

However, providing this mitigation comes at a cost:

Every skippable section must be prefixed by C<numbytes>_ where <numbytes> is an ASCII decimal number encoding the length of the skippable section. This increases the length of symbol names. An experimental perf run did not show significant file size increases, but perf.rlo is not a good test case because the benchmarked crates don't contain complex const generics. In the extreme case, every skippable section adds around 4 bytes, and every leaf generic argument is skippable.
On the compiler side, we have to determine whether a section is skippable and then implement the wrapping. An example of the former can be seen here, and the wrapping logic can be implemented in a generic way as seen here. As tmiasko has pointed out, the wrapping logic has exponential complexity because it needs to do a fixed-point iteration until all <numbytes> sections have the correct number of digits. In practice, however, I (mw) don't think this is a problem: I was not able to construct an example that needed more than 2 steps to converge. Having the additional logic in the mangling code does pose a potential maintenance burden.
On the demangler side, the skipping logic has to be implemented once. An implementation for the rustc-demangle crate can be found here. Implementing this has medium complexity, and implementing it in plain C (as required for some demanglers) is not going to make it easier. On the other hand, once it is implemented, it does not need to change for future grammar additions.
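To make the C<numbytes>_ prefix concrete, here is a minimal sketch of wrapping a section and skipping over it again. It is not the implementation linked above. The function names are hypothetical, and the sketch assumes that <numbytes> counts only the bytes of the section itself, not the prefix. It also ignores where in the grammar such a prefix may appear and the fixed-point iteration mentioned above.

```rust
/// Wrap a mangled section in a `C<numbytes>_` prefix.
/// Assumption: <numbytes> is the byte length of `section` only.
fn wrap_skippable(section: &str) -> String {
    format!("C{}_{}", section.len(), section)
}

/// Number of bytes the prefix adds on top of the section itself.
fn prefix_overhead(section: &str) -> usize {
    wrap_skippable(section).len() - section.len()
}

/// Split a skippable section off the front of `input`.
/// Returns (skipped bytes, remaining input), or `None` if `input`
/// does not start with a well-formed `C<numbytes>_` prefix.
fn split_skippable(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix('C')?;
    let digits_end = rest.find('_')?;
    let (digits, after) = rest.split_at(digits_end);
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let len: usize = digits.parse().ok()?;
    let body = &after[1..]; // drop the '_' separator
    if len > body.len() || !body.is_char_boundary(len) {
        return None;
    }
    Some(body.split_at(len))
}
```

For the 23-byte const argument from the example above, `wrap_skippable("KVNtB2_3FooS1xl4_1yl5_E")` yields `C23_KVNtB2_3FooS1xl4_1yl5_E`, an overhead of 4 bytes, which matches the estimate given in the first point.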

How Grammar Changes Affect Tools

Different tools are affected in different ways. Some tools, like GDB, only rely on symbol name demangling if no debuginfo is available. Others, like LLDB and Valgrind, always use symbol names^[1]. The following table shows how grammar additions affect each tool (with and without debuginfo present) and how long it usually takes for a fix to arrive on the user's system:

Tool	With Debuginfo	No Debuginfo	Expected time until fixed
GDB	OK	breakage	months to several years
LLDB	breakage	breakage	~6 months
Valgrind	breakage	breakage	months to several years
perf	?	?^[2]	months to several years
c++filt	N/A	breakage	months to several years
llvm-cxxfilt	N/A	breakage	~6 months
rustfilt	N/A	OK	immediately
backtrace-rs	OK	OK	immediately
WinDbg	OK	?	N/A (uses debuginfo only)

Tools that ship with Linux distros and usually stay on the same major version throughout the lifetime of a distro release are affected the most, because fixes only become available with the next distro release at the earliest. For users who have to stick to LTS versions, that can be several years.

For other tools like LLDB and llvm-cxxfilt it is more common to have up-to-date packages for new major versions available, or for the tool to be acquired through other channels.

Anything that is distributed together with the compiler (e.g. backtrace-rs) or that is installed via cargo install has no problem. The tools can be updated at the same time as the grammar change is made.

Example: LLDB backtrace

The following is an example of an LLDB backtrace of a program with full debuginfo, with and without the mitigation. The first frame is either the mangled symbol name or a partially demangled one.

Without mitigation:

* thread #1, name = 'symbol_mangling', stop reason = breakpoint 1.1
  * frame #0: 0x05678ec _RINvCs38VALFIIAdn_25symbol_mangling_consumers3fooKVNtB2_3FooS1xl4_1yl5_EEB2_(x=0x055bbf8) at main.rs:13:22
    frame #1: 0x05679ad symbol_mangling_consumers::main at main.rs:17:5
    frame #2: 0x056789b <fn() as core::ops::function::FnOnce<()>>::call_once((null)=(symbol_mangling_consumers::main at main.rs:16), (null)=<unavailable>) at function.rs:250:5

With mitigation:

* thread #1, name = 'symbol_mangling', stop reason = breakpoint 1.1
  * frame #0: 0x057eb2c symbol_mangling_consumers::foo::<KVNtB2_3FooS1xl4_1yl5_E>(x=0x0562c10) at main.rs:13:22
    frame #1: 0x057ebfd symbol_mangling_consumers::main at main.rs:17:5
    frame #2: 0x057eadb <fn() as core::ops::function::FnOnce<()>>::call_once((null)=(symbol_mangling_consumers::main at main.rs:16), (null)=<unavailable>) at function.rs:250:5

Conclusion

We have to decide how we want to deal with these kinds of breaking changes:

Accept the breakage until external tools catch up, or
Mitigate the problem as proposed in MCP 737.

The expected breakage for each new addition is limited because only symbol names using new features will be affected. On the other hand, the mitigation seems feasible in terms of maintenance, artifact size, and compile time cost.

Choosing either of the options is better than having no policy 🙂

[1] At least I found no setting for changing that in either tool.
[2] I was not able to get perf to demangle anything or use debuginfo, even when building my own version of it 🤷