Non-local errors in Rust

Feb 22, 2023
Design meeting issue: lang-team#195

Summary

Post-monomorphization errors are part of a larger family of compile-time errors, referred to in this doc as "non-local errors". These errors are occasionally very useful but violate some nice properties of the language. The language team needs to understand the tradeoffs of these errors, then decide its stance on them and, in particular, how they relate to the decision to stabilize inline_const.

Framework and definitions

While this doc is not meant to capture every aspect or decision point regarding non-local errors, we will first attempt to establish a broad definition and framework for thinking about them.

Hopefully this provides a useful context for the specific decisions being made today, in addition to being useful framing for future conversations.

Stages of errors

For the purposes of this conversation, in Rust, we can broadly sort errors that indicate bugs into these stages. They are sorted in ascending order of how much information they incorporate.

  1. Syntactic: Invalid syntax or use of an unknown name. The compiler does not understand what the user is trying to say.
  2. Semantic: Usually, type system or borrow checker errors. The compiler understood the user to mean something, but it went against language rules that protect against undesired states. Pinpointing the cause of the error is relatively easy, because the language rules are designed to enable local reasoning.
  3. Non-local: The focus of this doc. Similar to (2), except pinpointing the cause of the error is relatively hard in the general case. From the compiler's perspective something has gone wrong at the level of an entire call/use chain in the program, rather than a specific line of code or function.
  4. Global: Similar to (3), except something has gone wrong at the level of the entire binary. There are two pieces of code that are inherently incompatible and cannot be linked together.
  5. Runtime panic: Some unanticipated error condition arose, usually signaling a bug in the program, but it was not caught until runtime. Sometimes "at runtime" means "in production". At this point pinpointing the exact cause is usually hard and must be done by a human. Bugs caught in production can be much more expensive than bugs caught at compile time.

Non-local errors

A non-local error is a compile-time error that appears in code only in some cases, contingent on external factors not controlled by the code itself.

Non-local errors are often more difficult for a user to act on than regular errors, because:

  • They violate abstraction boundaries, requiring a user to gain more context than they generally need to use a component.
  • Pinpointing the cause is difficult for tooling to do in an intuitive way.
  • They are reported later in the development process, making fixing their true cause more costly. In some cases the user does not control the offending code.

There are a few overlapping categories of possible non-local errors:

  • Language level
    • Parameter-dependent errors only show up for certain values of generic parameters (types or consts).
    • Use-dependent errors only show up when code is transitively used from a public function or main.
    • Config-dependent errors only show up when building code for certain cfg values.
  • Tool level
    • CLI-dependent errors only show up when passing certain flags or commands, e.g. cargo build rather than cargo check.
    • Implementation-dependent errors depend on implementation details of the compiler frontend, such as how optimizations get applied.
    • Backend-dependent errors only show up when generating code on a particular backend, possibly for a particular target.
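As a small illustration of the config-dependent category, here is a minimal sketch. The function name describe_target is hypothetical, and the sketch assumes it is built for a target whose pointer width is not 16 bits. The first definition contains a type error, but the compiler strips it before type checking, so the error would only appear for someone building on a 16-bit target.

```rust
/// Only compiled on 16-bit targets. The body returns a string where a `u32`
/// is expected, but on any other target this item is removed before type
/// checking, so the mistake stays hidden.
#[cfg(target_pointer_width = "16")]
pub fn describe_target() -> u32 {
    "sixteen"
}

/// Compiled on every other target; this is the only definition the
/// compiler type-checks there.
#[cfg(not(target_pointer_width = "16"))]
pub fn describe_target() -> u32 {
    usize::BITS
}
```

The author of this code can run their full test suite without ever seeing the error; a downstream user with a different configuration is the one who hits it.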

Evaluating a non-local error

There are several axes we can use to judge how problematic a non-local error might be.

  • Axis of context: How much information does the user need to effectively reason about whether an error will occur, or how to fix it?
  • Axis of support: How precisely can tooling pinpoint the cause and suggest a remedy?
  • Axis of actionability: How easily can the user correct the error?

Generally, the closer to its actual cause an error can be reported, the better.

What happened to post-monomorphization errors?

An earlier version of this doc referred to post-monomorphization errors, a term that is tied to a compiler implementation and can mean something different based on who you talk to. It has therefore been replaced with the terms defined in the above section.

Local reasoning in Rust

Rust traits, parametricity, and composition

Rust's trait system is designed to eliminate non-local errors, specifically parameter-dependent errors, by requiring generic code to declare all requirements in its signature.

This in turn enables users to create high-fidelity abstractions that cover over details of their implementation while being reusable in many different applications. These abstractions are, arguably, a key part of what enables the crates.io ecosystem to develop and interoperate in a scalable way.
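A minimal sketch of what "declaring requirements in the signature" means in practice (the function name describe is hypothetical): the body may only use what the bounds promise, so it is checked once, at its definition, for all possible T.

```rust
use std::fmt::Debug;

/// The `T: Debug` bound is the whole contract. Removing it makes the body
/// fail to compile at this definition, no matter which callers exist; keeping
/// it means any error about a missing `Debug` impl is reported at the call
/// site that chose the offending type.
pub fn describe<T: Debug>(value: &T) -> String {
    format!("{:?}", value)
}
```

A caller never needs to read the body of describe to know whether their type is accepted.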

However, as we'll see below, there are times when Rust chooses to pierce the veil of abstraction in favor of more practical concerns.

Comparison to C++ templates

C++ templates are based on text substitution, meaning they don't type check until instantiated with a given type.

This can lead to cryptic errors like "no member function length on value of type int" within the template itself, when the real problem was that the template should only have been instantiated with a container type. Template instantiation errors come with a "backtrace" of instantiations for the user to search through, looking for the actual cause.[1]

The stark difference between Rust and C++ in this regard has led to a common belief that "Rust has no post-monomorphization errors". That isn't really true, but they are much less common.

Uses of non-local errors

Const eval

Const eval is useful when you need to guarantee an invariant that can't be expressed in the type system. This is great for internal consistency checks (static assertions), but it can also be used to enforce non-local interface contracts. These come in several flavors:

  • Type properties: Unsafe code that doesn't work with ZSTs, or requires size/alignment to be a power of 2.[2]
  • Domain restrictions: Generic code that doesn't work with some values of const parameters, like zero.[3]
  • Non-local static assertions: Checking that the size of a type in an external crate doesn't change, or that an anonymous future doesn't exceed a certain threshold.

In each case, reporting an error at compile time is preferred to reporting it at runtime.
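The first two flavors can be sketched on stable Rust with associated consts. All names below (NotZst, NonZeroLen, elements_in, full_chunks) are hypothetical and only illustrate the pattern; they are not taken from any real API.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Type property: `T` must not be zero-sized.
struct NotZst<T>(PhantomData<T>);
impl<T> NotZst<T> {
    const OK: () = assert!(size_of::<T>() != 0, "T must not be zero-sized");
}

/// Domain restriction: the const parameter `N` must not be zero.
struct NonZeroLen<const N: usize>;
impl<const N: usize> NonZeroLen<N> {
    const OK: () = assert!(N != 0, "N must be non-zero");
}

/// How many `T` values fit in `bytes` bytes. Instantiating this with a
/// zero-sized `T` is rejected at compile time instead of dividing by zero.
pub fn elements_in<T>(bytes: usize) -> usize {
    let () = NotZst::<T>::OK;
    bytes / size_of::<T>()
}

/// How many complete chunks of length `N` fit in `len` elements.
/// Instantiating this with `N == 0` is rejected at compile time.
pub fn full_chunks<const N: usize>(len: usize) -> usize {
    let () = NonZeroLen::<N>::OK;
    len / N
}
```

Neither requirement appears in the signatures, which is exactly what makes the resulting errors parameter-dependent: elements_in::<()> or full_chunks::<0> only fails once some caller instantiates it.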

Other mechanisms

Sometimes Rust itself defers errors from local to non-local, global, or runtime.

  • RefCell works around borrow checker restrictions by deferring borrow checking to runtime. (Cell needs no runtime check, since it only moves or copies values in and out.)
  • Opaque types leak auto traits, eliminating the need to write them in a bunch of places, and working around the lack of implication bounds (e.g. impl Future + (Send if T: Send)). This can violate abstraction boundaries and create semver hazards.
  • The global allocator can only be defined once; this eliminates the need to pass an allocator parameter through every type.
  • mem::transmute gives an error if transmuting between two types of different sizes.
  • Any and TypeId allow a user to do dynamic type checking at runtime.
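A few of these mechanisms can be sketched in a handful of lines. The function names are hypothetical, and an impl Iterator return type stands in for the impl Future case mentioned above, since auto traits leak the same way through any opaque type.

```rust
use std::any::Any;
use std::cell::RefCell;

/// RefCell: a conflicting borrow is detected at runtime, not compile time.
pub fn second_borrow_fails() -> bool {
    let cell = RefCell::new(1);
    let _first = cell.borrow_mut();
    // A second mutable borrow while `_first` is alive is an error value here;
    // `borrow_mut` would panic instead.
    let failed = cell.try_borrow_mut().is_err();
    failed
}

/// Any: the type check happens dynamically, when the function runs.
pub fn as_u32(value: &dyn Any) -> Option<u32> {
    value.downcast_ref::<u32>().copied()
}

/// Opaque type: the signature promises only `Iterator`, but whether the
/// returned value is `Send` is decided by the hidden concrete type.
pub fn evens() -> impl Iterator<Item = u32> {
    (0..10).filter(|n| n % 2 == 0)
}

/// Accepts only `Send` values; used to observe the leaked auto trait.
pub fn require_send<T: Send>(_value: &T) {}
```

Calling require_send(&evens()) compiles today, but would stop compiling if the body of evens started capturing something like an Rc, even though its signature did not change. That is the semver hazard.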

Avoiding global errors in Rust

Note that linking errors, which are global, can be a common source of pain in other systems languages. Rust tries hard to avoid these with features like:

  • Orphan rules in the trait system
  • Lack of a global namespace (undone by #[no_mangle])
  • Hashes in mangled symbol names
  • Managing the linker invocation for you
  • The -sys crate pattern

Similarly, dynamic languages run into global conflicts when multiple libraries "monkey patch" the same code in an incompatible way (common in Ruby, for instance). Rust prevents this with a strong static type system that enforces abstraction boundaries.

In situations where global errors can arise (such as the global allocator), they are expected to be easily diagnosed and actionable by the binary crate author.

Non-local errors in APIs

Library authors sometimes have to choose a level at which to encode an invariant.

For instance, a crate author could choose to parameterize their entire crate on some pervasive concern like logging or an async executor. Or they could choose to require that a global context is set and defer the check to runtime. The latter is usually more ergonomic; it can be a good choice when the user is likely to encounter and fix any problems during development.
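The two choices might look roughly like this. This is only a sketch with hypothetical names (Sink, process_with, set_global_prefix, tagged), using std's OnceLock as the global context.

```rust
use std::sync::OnceLock;

// Option 1: the requirement is part of every signature that needs it.
pub trait Sink {
    fn write(&self, msg: &str);
}

pub fn process_with<S: Sink>(sink: &S, n: u32) -> u32 {
    sink.write("processing");
    n * 2
}

// Option 2: the requirement is a global context, checked when the code runs.
static GLOBAL_PREFIX: OnceLock<String> = OnceLock::new();

/// Returns false if the global context was already set.
pub fn set_global_prefix(prefix: &str) -> bool {
    GLOBAL_PREFIX.set(prefix.to_string()).is_ok()
}

/// Fails at runtime if nobody has set the global context yet.
pub fn tagged(msg: &str) -> Result<String, &'static str> {
    let prefix = GLOBAL_PREFIX.get().ok_or("global prefix not set")?;
    Ok(format!("{}: {}", prefix, msg))
}
```

Option 2 keeps signatures clean, at the cost that a forgotten set_global_prefix call is only discovered when tagged actually runs.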

Moving a requirement up in the stage hierarchy is almost always a breaking change. Some requirements (such as where clauses) can be moved farther down, though I don't know if this is common in practice.

Decision: Use- and CLI-dependent errors

Question: Should any errors depend on check vs build, whether code is used, or on how code is optimized?

Status quo: Type system errors are always surfaced, unless they are cfg-dependent. Const eval errors are use- and CLI- dependent; they do not show up in unused code unless -Clink-dead-code is passed, and do not show up at all in cargo check.

Proposed ideal world: Frontend errors are always surfaced. build vs check, codegen flags, and the details of MIR optimizations do not affect which errors are shown. Errors from the backend and linker can still occur when using native libraries, in lower tier targets, or in highly unusual situations unsupported by the backend.

  • Pro: Users expect cargo check to surface all errors.
  • Con: Could impact compiler performance.

Implemented in #107510.

Alternative 1: Accept the ideal world, but make exceptions in cases where compiler performance is severely affected.

Alternative 2: Codify the status quo, or some part of it. Errors are defined to occur in code that is reachable from main for a binary crate or any publicly-reachable function for a library crate.

Decision: inline const

Background

Parameter-dependent errors can already be expressed directly using associated consts. Example:[4]

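use std::marker::PhantomData;

fn require_zst<T>() {
    // Helper type whose associated const carries the assertion for `U`.
    struct Anon<U>(PhantomData<U>);
    impl<U> Anon<U> {
        const A: () = assert!(std::mem::size_of::<U>() == 0);
    }
    // Naming the const forces its evaluation when `require_zst::<T>` is monomorphized.
    Anon::<T>::A
}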

Parameter-dependent errors have been possible since associated consts were stabilized in 1.20,[5] because panicking operations have been available in const since then. size_of was const-stabilized in 1.24, making the general form of this error available.[6] I'm not sure if there was ever discussion within the lang team about the implications of these features on parameter-dependent errors.

const_panic, stabilized in 1.57 (Dec 2021),[7] made implementing one feel like less of a hack and provided for nicer error messages. The justification seems to be that this was already possible with various hacks, so we might as well make it nice.
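For comparison, here is a sketch of both styles side by side. It assumes the "index operator" hack mentioned in the footnotes looks roughly like the first helper; the type and function names are hypothetical.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Pre-const_panic style: index into a one-element array. Index 0 is fine;
/// any other index is out of bounds, which is a const evaluation error.
struct ViaIndex<T>(PhantomData<T>);
impl<T> ViaIndex<T> {
    const OK: () = [()][size_of::<T>()];
}

/// const_panic style: the same check, stated directly and with a message.
struct ViaAssert<T>(PhantomData<T>);
impl<T> ViaAssert<T> {
    const OK: () = assert!(size_of::<T>() == 0, "T must be zero-sized");
}

pub fn require_zst_old<T>() {
    let () = ViaIndex::<T>::OK;
}

pub fn require_zst_new<T>() {
    let () = ViaAssert::<T>::OK;
}
```

Both reject the same instantiations. The difference is in what the user reads: an out-of-bounds index into an anonymous array versus the message written by the library author.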

The inline_const feature makes the above example significantly easier to write. Note that since the const context is no longer embedded in an item, it is free to name the parameters in scope directly.

#![feature(inline_const)]
fn require_zst<T>() {
    const { assert!(std::mem::size_of::<T>() == 0); }
}

Finally, since inline_const also makes const contexts easier to access in general, it could exacerbate problems with use- and CLI-dependent errors identified above.

Questions

Based on the previous decision and the last paragraph, are we concerned with how inline_const could further proliferate use- and CLI-dependent errors? Does that concern affect stabilization?

Are we concerned with how inline_const further expands access to, and endorses, parameter-dependent errors? Does that concern affect stabilization?

Decision: Stance on non-local errors

Should the lang team have an overall stance on non-local errors? Is it important to justify the status quo, and how would we justify it?

This comment by RalfJung suggests:

I view post-monomorphization errors in consts as a fallback plan that library authors can use when expressing their constraints with traits doesn’t work or becomes too unergonomic, or when it concerns conditions that are very unlikely to be violated in practice so it’s not worth burdening all users with this concern.

Future possibilities and interactions

Controlling monomorphization

scottmcm suggests something like if const that would not monomorphize anything in its body if the const expression evaluates to false. This makes non-local const eval errors actionable.
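To see the problem this would address, consider the following sketch (function names are hypothetical). The caller guards against N == 0 with an ordinary if, which works because the helper's check happens at runtime.

```rust
/// Runtime-checked helper: `chunks` panics if `N == 0`, but only when this
/// line actually executes.
fn count_chunks<const N: usize>(data: &[u8]) -> usize {
    data.chunks(N).count()
}

/// Caller that handles `N == 0` itself before delegating.
pub fn count_chunks_or_zero<const N: usize>(data: &[u8]) -> usize {
    if N == 0 {
        0
    } else {
        // For N == 0 this call is never executed, but it is still part of
        // the function body that gets instantiated with N == 0.
        count_chunks::<N>(data)
    }
}
```

If count_chunks instead enforced N != 0 with a const assertion, count_chunks_or_zero::<0> would be expected to fail to build even though the offending call can never run, and the caller would have no way to express "only instantiate this when N != 0". An if const construct is one proposed way to give the caller that control.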

Uplifting parameter-dependent errors

If we added new bounds like T: ZeroSized or N > 0 to the type system, users might need to produce these bounds from code that previously only used static assertions to enforce them.

There are a couple ideas for how to do this.

Ad-hoc specialization

pub fn old_thing_that_requires_zst<T>(x: T) {
    const { assert!(size_of::<T>() == 0); }
    where T: ZeroSized {
        new_thing_that_requires_zst::<T>();
    }
}

As far as I know the only thing that makes this "ad hoc" is the syntax; it's equivalent to full specialization.

Deferring trait bounds to after monomorphization

We could opt to enforce some/all trait bounds at monomorphization time, only making it a lint if you don't have bounds in your environment that guarantee they will always be met.

This would allow existing APIs to add the bounds without a breaking change.

It was noted in the discussion that a version of the above inline where syntax that introduces mono-time assertions that can then be relied on, instead of compiling code conditionally, would be a feasible way of introducing these deferred bounds (and would not rely on specialization).

Future use cases for NLEs

  • C++ interop: Many legacy codebases that fit the use case of Rust are C++ and cannot be rewritten wholesale. Interop with C++ is an area of active research and development, but use of template APIs, which are very common, will always be limited without some mechanism to instantiate templates with type parameters. This requires parameter-dependent errors. The solutions in the above section would also help here.
  • Contracts: Some of these might qualify as NLEs.
  • Maybe others?

Potential discussion questions

How do I add a question?

Ferris: To add a question, create a section (###) like this one. That helps with the markdown formatting and gives something to link to. Then you can type your name (e.g., ferris:) and your question. During the meeting, others might write some quick responses, and when we actually discuss the question, we can take further minutes in this section.

Observation: Mono-time name lookup feels like the most horrific part of C++ templates.

scottmcm: Upon reflection, I think that the main horror with C++ templates is the ad-hoc extension points. Not knowing whether foo(x) is supposed to be an in-scope-at-template-definition function or an overloaded extension point seems like it causes most of the weirdness. In comparison, Rust's principled extension points (MyTrait::blah(x)) avoid that: the MyTrait would have to be resolved at generic definition time, and the corresponding error for the type not implementing that would be quite clear, albeit potentially highly nested and thus non-local. More details on zulip.

scottmcm/niko: Being able to say "T does not implement Trait" is better than "I don't know about this method".

gary: Const eval is similar in that it has a clear error message.

Philosophically, what's the line between trait generics and value generics here?

scottmcm: For example, is there a difference between needing to add where const { N + M < 100 } all the way up the chain and needing to add where T: Debug all the way up the chain?

gary: I think there is value in deferring trait bound errors to mono-time. People seem to be more amenable to const eval errors than this though.

scottmcm: Someone Tyler mentioned in Zulip that if we deferred Send checks on async fn for example, maybe that would be good enough.

nikomatsakis: I think that cuts both ways.. feels very non-local today.

tmandry: Could imagine a compromise where you need to put Send bounds at public crate boundaries, but on private code you don't need them (enforced at mono-time).

nikomatsakis: Enforcing const eval errors at mono time: If you're enforcing a precondition, fine, but other times it seems like you're leaking implementation details.

gary: still better than duck typing and templates.

Do we need strict language rules for what gets monomorphized?

scottmcm: Is a compiler allowed to monomorphize things spuriously, and have compilation fail if one of those "extra" monomorphizations fails? Would that mean that -C link-dead-code makes it "not Rust"?

Similarly, is it a legal optimization to stop monomorphizing foo::<T> if it's seen in if false { foo::<T>() }, or could people be relying on that NLE for a safety property?

What's the plan for doing bounded const generics without NLEs?

scottmcm: Is there a feasible way to prove "early" the kinds of things that people will want to bound in const generics?

  • Will we possibly be stuck with NLEs being the only feasible way to do this anyway, given that we wouldn't want to solve NP-hard problems at compile time?
  • Is there some sufficiently useful subset that we can check feasibly? I think of how we don't try to do exhaustiveness checking for general expressions (x if x > 0) in match, but have a more constrained problem space (x @ 1..) for which it's feasible. What operations are needed in common cases today? How can we bound concat([T; N], [T; M]) -> [T; {N + M}]?
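For reference, here is a sketch of how the concat example can be approximated on stable Rust with an NLE. It is not the bounded signature asked about above: the output length is a separate, hypothetical OUT parameter, and the relationship N + M == OUT is only checked when the function is instantiated. The Copy + Default bounds are there purely to keep the sketch free of unsafe code.

```rust
/// Carries the assertion that ties the three lengths together.
struct SumCheck<const N: usize, const M: usize, const OUT: usize>;
impl<const N: usize, const M: usize, const OUT: usize> SumCheck<N, M, OUT> {
    const OK: () = assert!(N + M == OUT, "OUT must equal N + M");
}

/// Concatenates two arrays. The signature alone does not say that
/// `OUT == N + M`; a mismatch is reported only for the offending
/// instantiation.
pub fn concat<T: Copy + Default, const N: usize, const M: usize, const OUT: usize>(
    a: [T; N],
    b: [T; M],
) -> [T; OUT] {
    let () = SumCheck::<N, M, OUT>::OK;
    let mut out = [T::default(); OUT];
    out[..N].copy_from_slice(&a);
    out[N..].copy_from_slice(&b);
    out
}
```

A caller writes let c: [u8; 5] = concat([1, 2], [3, 4, 5]); and asking for [u8; 4] instead is rejected at build time, though nothing in the signature explains why.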

nikomatsakis: I don't think we have one, requires an SMT solver basically

gary: we have generic_const_exprs but that's very limited.

josh: At some point if it starts becoming complicated to echo the body of the function as five different conditions that need to be met, it stops being worth it. It's still better to have a mono-time error than a runtime error.

nikomatsakis: I think this feels similar to contracts. I would like to have contracts that are enforced dynamically but with option for an add-on tool to check statically.

scottmcm: And those contract tools are already running SMT solvers.

nikomatsakis: And I'd rather keep those out of the core language.

Is there an "in the signature" middle-ground?

Disclaimer: Unbaked idea, might be terrible.

scottmcm: Inspired by the TAITs conversation, maybe there's a way to attack this by making it visible? For example, if the errors had to happen in associated constants of a trait mentioned in the signature, then they'd be more obvious. They would still depend on implementation details of those impls, not just on bounds, so they still wouldn't be as transparent as trait errors; but typenum shows that trait errors aren't necessarily clear or obvious either.

(Though that substantially nerfs inline_const, so I'm not sure I actually like it.)

Interaction of inline-const and use- or cli-dependent errors

nikomatsakis: The doc says

Based on the previous decision and the last paragraph, are we concerned with how inline_const could further proliferate use- and CLI-dependent errors? Does that concern affect stabilization?

but it's not obvious to me how much those use- and CLI-dependent errors are "inherent" to inline const. The previous "decision point" was saying "no, let's not have use- and cli-dependent errors". It seems clear that inline const will result in use-dependent errors, because we can't know if a check fails until we know the value of the const generics. But this doesn't necessarily imply cli-dependent errors, right?

tmandry: To clarify, "use-dependent" doesn't include parameter-dependent.

Performance impact

Doc says:

Proposed ideal world: Frontend errors are always surfaced. build vs check, codegen flags, and the details of MIR optimizations do not affect which errors are shown. Errors from the backend and linker can still occur when using native libraries, in lower tier targets, or in highly unusual situations unsupported by the backend.

  • Pro: Users expect cargo check to surface all errors.
  • Con: Could impact compiler performance.

Implemented in #107510.

Is there a sense of the compiler performance impact of this PR?

[eager] inline const?

The 8472: Would marking specific inline consts for eager evaluation in cargo check provide a middle ground?

Actionable?

nikomatsakis: What does "actionable" mean here?

This makes non-local const eval errors actionable.

I think it means: it is possible to write a const error in dead code without aborting compilation. e.g., in something like this?

// `if const` is the hypothetical syntax suggested above, not valid Rust today.
const fn foo<const C: usize>() -> f32 {
    if const C != 0 { 1.0 / C as f32 } else { 0.0 }
}

NLE vs runtime panic

gary: I think one key point to consider is whether we would prefer NLE to runtime panic. Currently people tend to defer panic to runtime for many functions that don't take ZSTs or zero. In my view NLE is always better than runtime panic for checking bugs.

gary: Also in Rust-for-Linux we try to avoid runtime panic enough that we make code not compile depending on whether optimiser can get rid of checks. See https://rust-for-linux.github.io/docs/kernel/macro.build_assert.html

scottmcm: One huge reason that runtime panic can be better is that it's context-sensitive, which the current NLE errors are not. So if you have a runtime check for the condition, the error in the unreachable monomorphized code is obnoxious.

Breaking changes to move to a better experience

scottmcm: To look at the as_chunks example again, if we stabilize it as const { assert!(N > 0) }, it'd be a breaking change to move to the "better" where const { N > 0 }, should that become possible in future. Does that mean that core still shouldn't use NLEs for things?

gary: Maybe not std, but if it's a crate people would start using it.

nikomatsakis: Could we use editions? In earlier editions it would not be checked, in later ones it would.

gary: Could prevent library from upgrading editions because they would have to break their API.

nikomatsakis: Could assert that something is true, which is a non-local error,

tmandry: Equivalent to specialization? Not if it's just an assertion, rather than conditionals.

gary: Since consts are just values, don't have to worry about lifetime issues in specialization.

Decision: Use- and CLI-dependent errors

nikomatsakis: Remember GCC did stuff like this. Obviously a worse user experience to have CLI-dependent errors, but so is slower compilation time.

scottmcm: If we're going to say this exists we have to define what monomorphizes; does this mean that -C link-dead-code isn't Rust anymore?

nikomatsakis: Is there a way to think about these as "deny-by-default" lints?

gary: Miri people would not like to make another const eval-like lint.

scottmcm: I'm not sure what allowing it would mean.

nikomatsakis: Semantics is if it runs it panics, but it's not going to run.

Would this be desirable? Defining what gets monomorphized isn't that bad.

niko: Interesting example: You could imagine a compiler that starts from your main function and only builds exactly what's needed in all your dependencies. In fact I would like that.

gary: It's not causing an issue if it's not being used..

scottmcm: that breaks a bunch of stuff like the fn _must_support_dyn(_: &dyn Foo){} pattern?

nikomatsakis: There may be limits to what we can cut out. My assumption is that

scottmcm: Obligatory mention of dead-code optimize if const { expr } even in opt-level=0 #85836 (which is also mentioned below).

tmandry: Do we have any consensus?

scottmcm: If we don't have a plan to not have these, don't we need to.. have them?

joshtriplett: It's strictly better than a runtime assertion.

scottmcm: Only if it's reachable; that's where it gets awkward to me. Also, there are examples of "if T is ZST then call this thing" in std; if that was a PME it would be seriously broken. Don't think we can say you should move all your panics on const generics to NLEs.

scottmcm: For example, there are lots of things like if T::IS_ZST { return unsafe { mem::zeroed() } }. That works great, but if zeroed got an NLE for things where it's insta-UB to zero them, it'd blow up horribly (unless we had #85836).

Follow up: Summary of discussion

tmandry:

  • Consensus that this is a useful framework for thinking about the problem
  • Consensus that we prefer compile time errors to runtime
  • No consensus on whether we should eagerly surface errors
  • Consensus that we would probably have to allow PMEs, because there was no way we were going to be able to express everything in the type system?
  • Lingering question about whether we would ever want these in std; maybe not, but useful for ecosystem libraries.
  • Some options available for making it possible to migrate in the future, though we haven't covered these in detail yet. Seems like they would work but can't say for certain.

Follow up: Decisions on inline const

tmandry: There is a summary of the consensus reached on inline_const in this comment.


  1. C++20 introduced concepts and constraints, which are basically named boolean expressions at the level of the type system that are checked at instantiation time to provide better error messages. These are not used to check the template definition, as far as I know, and they can therefore afford to support arbitrary constraints such as A || B disjunctions. C++20 does not appear to be in widespread use at the time of writing. ↩︎

  2. See for example sub_ptr, mentioned by scottmcm in this comment. ↩︎

  3. See for example as_chunks, where a decision on how to enforce N != 0 is currently the blocker for stabilization. ↩︎

  4. Adapted from this comment by RalfJung. ↩︎

  5. See this example from oli-obk. It depends on manually defining associated consts for every type, since size_of was not const stable until 1.24, and the fact that integer underflow panics. ↩︎

  6. Here is another way to implement this assertion without panicking directly in const, using only the index operator. The static_assertions crate was a collection of similar hacks. ↩︎

  7. https://github.com/rust-lang/rust/pull/89508 ↩︎
