Design meeting: consts in patterns

We allow using (some) constants in patterns. However, we cannot allow all of them: some just don't have a way of being compared, such as unions; others get rejected for being "not structural-match", as defined in RFC 1445. The structural-match check had some holes so some constants that we do accept get linted as "this will be an error in the future". These lints were introduced a long time ago and have been warn-by-default since Rust 1.48, close to 3 years ago. Since then the pattern matching implementation in the compiler changed a lot and our ideas of what we do and don't want to do with pattern matching also changed. We also realized there are some gaps in what the RFC discusses, such as raw pointers.

It's time to figure out where we want to go with this: enforce RFC 1445 by making (some of) these lints hard errors, or change our mind and remove the lints.

The main questions to figure out are:

  • Which exact consts do we want to accept in patterns?
  • What should their semantics be, i.e., when are they considered to "match" the place-being-matched-on?

Some smaller questions that also show up are:

  • How should consts interact with exhaustiveness checking, i.e., when will the constant value be taken into account to compute which cases are already covered by match arms?
  • Should we lint against some consts that we do allow, e.g. because they have surprising semantics?

Terminology: "structural match"

Before we dive deeper, we need to define some words that will come up again and again.

We say that a type is structural-match if its PartialEq instance is (syntactically or semantically) equivalent to the one that would be generated by derive(PartialEq). This term only really applies to ADTs. The StructuralPartialEq trait reflects this property. Note that this is non-recursive: it only talks about the PartialEq instance of this type, not about its fields!

We say that a type is recursively structural-match if all ADTs that recursively appear in fields are structural-match.

We say that a value is recursively structural-match if all ADTs that appear in this value are structural-match. Due to enums, it is possible to have non-structural-match types where some values are structural-match, such as the None value of Option<MyNonStructuralMatchType>.
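The three definitions above can be illustrated with a small sketch. The type names `Derived` and `Custom` and the "compare modulo 10" equality are hypothetical, chosen only for illustration; the sketch assumes a compiler that checks the value (not the type) of the constant, which is how current behavior is described in this document.

```rust
// Structural-match: the PartialEq instance comes from the derive.
#[derive(PartialEq, Eq)]
struct Derived(u32);

// Not structural-match: a hand-written PartialEq
// (hypothetical semantics: compare modulo 10).
struct Custom(u32);

impl PartialEq for Custom {
    fn eq(&self, other: &Self) -> bool {
        self.0 % 10 == other.0 % 10
    }
}

// A recursively structural-match value of a structural-match type.
const D: Derived = Derived(7);

// The type Option<Custom> is not recursively structural-match,
// but the value None contains no Custom, so the value is.
const NO_CUSTOM: Option<Custom> = None;

fn is_d(x: Derived) -> bool {
    matches!(x, D)
}

fn is_no_custom(x: Option<Custom>) -> bool {
    // A const with value Some(Custom(3)) would not be accepted here,
    // because that value contains a non-structural-match ADT.
    matches!(x, NO_CUSTOM)
}
```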

Possible design options

Largely, there are two "main points" in the design space. Of course one could also consider a design that sits somewhere in between those two points.

Option 1: Consts desugar to a pattern.
This design considers a constant used in pattern position to be basically syntactic sugar for a pattern that one could have also written otherwise.
For instance, matching on a constant with value (0i32, None) is completely equivalent to writing the pattern (0i32, None).

Option 2: Consts desugar to ==.
This design considers a constant used in pattern position to be basically equivalent to a == guard, except that possibly exhaustiveness checking can take into account the concrete value of C.
For instance, matches!(x, C) aka match x { C => true, _ => false } would desugar to x == C.
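As a rough sketch of the two readings, the three functions below spell out the const pattern, its option 1 desugaring, and its option 2 desugaring. The element type `Option<bool>` and the function names are assumptions made for illustration. For a recursively structural-match value like this one, all three agree.

```rust
const C: (i32, Option<bool>) = (0, None);

// The const used in pattern position.
fn via_const(x: (i32, Option<bool>)) -> bool {
    matches!(x, C)
}

// Option 1 reading: the const is sugar for the pattern one could write by hand.
fn via_pattern(x: (i32, Option<bool>)) -> bool {
    matches!(x, (0, None))
}

// Option 2 reading: the const is compared with ==.
fn via_eq(x: (i32, Option<bool>)) -> bool {
    x == C
}
```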

Design option discussion

Option 1: Desugaring to a pattern

Reading through the historic record, this is likely the originally intended design.
In particular, RFC 1445, which introduced the "structural match" restriction, seems to pre-suppose that we want constants in patterns to behave like a regular pattern desugared from the computed value of the constant.

One consequence of this design is that the value of the const must be visible: we need the exact pattern to build the MIR and to do exhaustiveness checking, so if the const cannot be evaluated (since it depends on some generic parameters), we have to reject it.

With this design we could accept constants as patterns that do not implement PartialEq, let alone Eq. (In fact we currently do accept some such constants, though the examples are contrived and this is likely an accident: it involves unnecessary bounds on the derived PartialEq instance. Non-PartialEq constants in patterns used to be more common when not all function pointer types implemented PartialEq, but that has been fixed.)
Of course we might want to require PartialEq to keep our options open for the future, e.g. to be forward-compatible with option 2.

This design makes match as a language construct completely independent of any user-defined code such as ==.

Pointers in const-patterns

However, to really say that constants desugar to a pattern, we must make sure that the "leaf types" (after traversing all tuples, structs, enums, and arrays) can actually be used as patterns.
And that is not actually the case: we allow matching on constants that involve raw pointers and function pointers, which are not otherwise allowed as patterns.
So, to use this design option, we must pick one of:

  • Consider the Rust language of patterns to include raw pointer and function pointer patterns. These patterns do not have a direct surface syntax, they can only come about through the desugaring from constants. They behave like == on these types (which is a built-in primitive, so no user-defined == is sneaking in here).
  • Deprecate and eventually remove support for matching on consts that involve raw pointers or function pointers. (Currently, we emit a future-incompat lint for function pointers and for raw pointers to unsized types, but not for other kinds of raw pointers.)

Of course this choice can be made independently for function pointers, and raw pointers with different pointee types.
RFC 1445 does not mention raw pointers or function pointers at all, and arguably they are not very "structural" in their equality.
(It does mention floats and calls them out as non-structural.)

If we want to not have function pointer or dyn Trait raw pointer "leaf patterns" because their notion of equality is wonky and definitely not structural, pointer_structural_match (or a subset of it that accepts raw slice pointers) needs to be made a hard error.
If we want to not have any raw pointer "leaf patterns", we need a new lint; however, such patterns are used widely, including in the standard library.

We could consider only allowing matching against raw pointer constants such as 4 as *const i32 but not &42, i.e., only constants with a fixed integer value rather than some dynamic memory location.
We make few to no guarantees about pointer identity for pointer equality on consts, so this can be justified, but it would require a new future-incompat lint since currently we just accept such code.
(The fallout from rejecting such patterns is thus also unknown; a crater run would be required.)
Integer-constant raw pointers arguably are as structural as regular integers, so allowing match on them is in line with the general philosophy of this design option.
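A minimal sketch of the "integer-constant raw pointer" case follows. The sentinel value 4 mirrors the example in the text; the names `SENTINEL` and `is_sentinel` are hypothetical. The sketch assumes the compiler accepts raw pointer constants with a fixed integer value in patterns, which the text describes as current behavior.

```rust
// A raw pointer constant with a fixed integer value, not a memory location.
const SENTINEL: *const i32 = 4 as *const i32;

fn is_sentinel(p: *const i32) -> bool {
    match p {
        // Behaves like == on raw pointers, which is a built-in primitive.
        SENTINEL => true,
        _ => false,
    }
}
```

A constant such as `&42 as *const i32` would instead refer to some memory location whose identity is not guaranteed, which is the case the paragraph above suggests rejecting.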

Floats in const-patterns

Floats are also an interesting question here.
They have been allowed as primitive patterns (even without constants) since Rust 1.0, but RFC 1445 called them out as non-structural and we have had a future-incompat warning against float patterns for a long time now.
However, when the proposal came up to turn this warning into a hard error, that PR was rejected by t-lang.

Floats have strange equality on NaNs (which are never equal to anything) and zeroes (where positive and negative zero compare equal).
We could consider entirely rejecting NaNs in patterns since those arms can never be reached anyway, but accepting other float constants.
That would still allow the use-cases people brought up in the tracking issue.
(For floats we also allow range patterns, but NaN seems to be rejected in range patterns.)
Rejecting zeroes would likely be a lot more surprising, so here we likely have to live with the fact that match will consider both zeroes to be equal (if we want to allow float matching at all).

Another option for matching on NaN would be to make the f32::NAN pattern match any NaN.
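The float behavior described above can be sketched as follows. The names `ZERO`, `matches_zero`, and `float_eq` are hypothetical. The sketch deliberately avoids a NaN constant in pattern position, since whether that is accepted is exactly what is under discussion.

```rust
const ZERO: f32 = 0.0;

fn matches_zero(x: f32) -> bool {
    match x {
        // Float patterns use == semantics, so this arm also matches -0.0.
        ZERO => true,
        _ => false,
    }
}

// Plain == on floats, for comparison with the match above.
fn float_eq(a: f32, b: f32) -> bool {
    a == b
}
```

Both zeroes match even though their bit patterns differ, and a NaN never compares equal to anything, including itself.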

Structural match restriction

Based on this design, the entire "structural match" story was born out of a desire to reject cases where the desugared pattern does not behave like == would. (That's what RFC 1445 was all about; also see the motivation there.)
To achieve this we must reject consts whose value is not recursively structural-match.
This was originally intended (in the RFC) to be a hard error, but arguably could also be made a lint.
(This also explains why the StructuralPartialEq trait is a safe one: it isn't really load-bearing in this design.)

One notable downside of the structural match checks is that it makes switching from the derived PartialEq to a custom one (e.g. one that is more efficient or avoids unnecessary bounds) a breaking change, even if the behavior of == remains unchanged.
If we want a hard guarantee that pattern semantics and == semantics agree to avoid any potential confusion, this cannot really be avoided.
We could make the trait unsafe and off-load the guarantee partially to the user writing unsafe impl StructuralPartialEq.
We could also make the trait safe if we treat the structural match check like a lint.

This restriction, when implemented as a hard error, ensures that option 1 is forward-compatible with option 2.

Changes from today's behavior

To desugar consts into patterns we need to either reject raw pointer values or consider them legal "leaf patterns" (which likely means we need to permit constructing valtrees with raw pointers, since pattern construction goes through valtrees).
At least the pointer_structural_match lint should be made a hard error.

If we want the structural match check to be a hard error, the indirect_structural_match future-compat lint also has to be turned into a hard error.
If we want to be able to analyze whether a value is recursively structural-match without computing it (using logic similar to how we determine whether a const value needs dropping), we need to make nontrivial_structural_match a hard error; but we could alternatively just say that we will compute the value of the constant and check that as basis for the hard error/lint.
(We have to compute that value anyway.)

To follow this design we also have to change the behavior of some existing code such as this one, where currently one can tell that we are not actually desugaring consts to native patterns.
This code already has a future-incompat lint (and has had it for a long time), so we could avoid silently changing semantics by making such code a hard error instead.

Option 2: Desugar to ==

The alternative to the above is to say that constants used as patterns behave like ==, and everything else is linting and quality-of-life improvements.

Why would we explain consts-in-patterns via == rather than desugared patterns?
We already allow using floats in patterns without any constants being involved (this has worked since Rust 1.0, though we started linting against it at some point many years ago), and that uses == semantics rather than "exact bitwise equality"; one can argue about whether this is "structural".
We also allow raw pointers to sized types and don't even lint against that; whether one considers == on *const u8 to be "structural" is probably a matter of opinion. Arguably that type doesn't really have any "structure", and its notion of equality is a very low-level machine detail.
So saying that all consts use == is not a total surprise.
(OTOH, as discussed above, if we restrict this to "integer pointers" then raw pointer equality can be argued to be structural.)

One big advantage of this option is that we will be able to allow matching against generic consts, associated consts of generic types, and other consts whose value we cannot know at MIR building time.
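To make the "opaque const" case concrete, here is a hedged sketch with a hypothetical trait `HasId` and two hypothetical implementors. Today the associated const of a generic parameter cannot be used as a pattern, so one has to write the == comparison by hand; under option 2 the pattern form would simply mean the same thing.

```rust
trait HasId {
    const ID: u32;
}

struct A;
struct B;

impl HasId for A {
    const ID: u32 = 1;
}

impl HasId for B {
    const ID: u32 = 2;
}

fn is_id<T: HasId>(x: u32) -> bool {
    // `match x { T::ID => true, _ => false }` is rejected today, because the
    // value of T::ID is not known when the generic function is built.
    // Under option 2 that match would behave like this comparison.
    x == T::ID
}
```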

One big downside is that if people expect consts in patterns to behave as if they desugar to a pattern, then they are not getting the semantics they are expecting.
Generally people might be expecting stricter equality from match than what == provides.
(However, if that's the deciding argument, we should do something about matching on floats, and possibly raw pointers as well.)
People also expressed the opinion that match behavior should never depend on user-defined code like custom ==.

We can of course still detect and lint against "non-structural-match" cases where if the final value were to be written as a pattern, it would behave differently (at least we can do that for consts whose value we can compute).
This would somewhat preserve the spirit of RFC 1445.
The details of what gets linted would have to be determined. If we do allow opaque consts in patterns, we probably don't want to lint against every use of them, so the lint would necessarily miss some warnings.

This option was not a real possibility during prior discussion in past years, since not all function pointer types implemented PartialEq.
However, that issue has been resolved now, so (except for what looks like accidents), all types we allow matching on currently do have PartialEq.

We could consider requiring Eq and not just PartialEq, but that would rule out matching on floats.

Exhaustiveness checking

As an example of a quality-of-life aspect, if we can determine the value of the constant, we might want to take it into account for exhaustiveness checking, though of course we can only do that if == actually behaves like the desugared pattern, i.e., if the constant value is recursively structural-match.
In those cases we can transparently rewrite the == check to a pattern, knowing it does not change program behavior, and then we can do exhaustiveness checking on that pattern.
(This makes the StructuralPartialEq trait's promise load-bearing for soundness, and the trait should be made unsafe.)

We could say that only constants whose type is recursively structural-match are taken into account for exhaustiveness checking; this would entirely avoid having to run the analysis of whether the concrete value is recursively structural-match.
(However, this would reject some code that we currently accept and don't lint against.)
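As an illustration of consts participating in exhaustiveness checking, the sketch below uses a hypothetical enum `Light` with a derived PartialEq, so both its type and its values are recursively structural-match. The match has no wildcard arm and is still accepted, because the values of the two constants are taken into account.

```rust
#[derive(PartialEq, Eq, Clone, Copy)]
enum Light {
    Red,
    Green,
}

const STOP: Light = Light::Red;
const GO: Light = Light::Green;

fn describe(l: Light) -> &'static str {
    // No `_` arm: the two consts together cover every variant.
    match l {
        STOP => "stop",
        GO => "go",
    }
}
```

If these consts were treated opaquely (pure == semantics with no knowledge of their values), this match would need a wildcard arm.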

Changes from today's behavior

For this option we definitely need to reject all constants in patterns that do not implement PartialEq.
There are already forward-compatibility warnings against basically every possible such case, though one corner case was missed.

Other than that we can remove all the structural-equality forward-compatibility lints.
We might consider turning some of them into general lints about potentially surprising behavior.

This is also a massive breaking change for matching on consts in const fn, which is currently sometimes allowed but would never work under this option since == is not const fn.
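A small sketch of the const fn case mentioned above, using a hypothetical `Point` type. The match is accepted in a const fn because it is lowered to primitive comparisons on the fields rather than to a call to PartialEq::eq; writing `p == ORIGIN` in the same position relies on == being callable in const fn, which the text notes is not the case.

```rust
#[derive(PartialEq, Eq, Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

const ORIGIN: Point = Point { x: 0, y: 0 };

const fn is_origin(p: Point) -> bool {
    // Pattern semantics: compares the integer fields directly,
    // without going through the user-visible PartialEq impl.
    match p {
        ORIGIN => true,
        _ => false,
    }
}

// Evaluated at compile time.
const AT_ORIGIN: bool = is_origin(Point { x: 0, y: 0 });
const OFF_ORIGIN: bool = is_origin(Point { x: 1, y: 0 });
```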

Further options

Of course we don't have to decide to be on either end of this design spectrum.
We could say that some consts behave like desugared to a pattern, while others behave like ==.
This could be decided based on some trait, or the value of the constant, or other things.

This document by @lcnr describes a variant of this. The trait is called StructuralEq there, but StructuralMatch would probably be a more apt name so we will use that here.
The compiler checks that StructuralMatch is only implemented when all fields are StructuralMatch and not implemented for unions (this ensures a pattern can always be constructed for all values of this type), but otherwise the trait is safe and can be arbitrarily implemented by users.
Consts that implement StructuralMatch get pattern behavior and exhaustiveness checking, all other consts get == behavior and no exhaustiveness checking.
(Floats and raw pointers could be considered non-StructuralMatch to avoid having to ever consider them as primitive patterns.)

This is a slight breaking change compared to today: if an enum has some variants that are StructuralPartialEq and others that are not, a constant whose value is a structural variant currently can participate in exhaustiveness checking.
Here's an example.
@lcnr's proposal would treat this constant opaquely and match via ==.
We don't have a future-compat lint for that so we don't know how much breakage this would cause.

One consequence of this design is that when a constant has type (T, U), whether or not the T part is compared using == or by pattern desugaring depends on whether U: StructuralMatch.
That is a potentially concerning semantic discontinuity.
As another example, if we eventually allow matching on const generics, the same constant value might behave differently when it is used as a pattern via a const generic vs a regular const: in the first case the value is unknown at MIR building time so it uses == semantics; in the 2nd case the value is known so it could be turned into a pattern (if the type is StructuralMatch).

Overall this variant is very similar to option 2 with exhaustiveness checks only for StructuralMatch types, except that we don't promise that all consts behave like ==, but instead say that consts of StructuralMatch type whose value is available at MIR building time behave like the desugared pattern.

Summary and comparison

Desugaring to a pattern:

  • Matches the historic intent
  • Can (optionally) ensure that pattern semantics and == semantics are equivalent with a hard error, at the cost of ruling out matching on consts of types with a custom PartialEq even if that PartialEq is equivalent to pattern semantics.
  • Makes match behavior completely independent from potentially user-defined code such as ==.
  • Is not quite a desugaring, since we need to add support for raw pointer "leaf" patterns despite their notion of equality being a lot more subtle and low-level than anything one can write as a native pattern.
  • Cannot support opaque constants or constants that contain a union, even if they have a sensible notion of equality.
  • Holds custom ADTs to a higher standard than float types and raw pointers, which can be matched on despite not being really fully "structural". (Floats are currently being linted against, but t-lang previously rejected turning that lint into a hard error, so I assume we want to keep allowing float patterns with == semantics. Raw pointer patterns with sized pointees do not even have a lint; they are used in the standard library and presumably widely used in the ecosystem to check against sentinel values, so I assume we want to keep allowing those as well.)
    • There are possible sub-options here that could avoid some of these issues, such as allowing matching only on particular values when raw pointers and floats are involved. For raw pointers we could rule out constant pointers that are not integer constants; we guarantee little about their identity anyway. For floats we could rule out NaNs; those match arms are unreachable anyway. It's unclear how surprised people would be about such value-based restrictions, but there is precedent: after all, we already allow matching on None but not Some(MaybeUninit::new(...)) even when those both have the same type. However, for floats, NaNs are not the only values with "strange" equality: there is also the fact that +0.0 == -0.0 despite those having different bit patterns, so if we disallow NaNs the question comes up what we should do about zeroes.
  • Changes behavior of some code we used to accept without warnings where == and pattern semantics disagree. However, all such code has had future-incompat lints since Rust 1.48 (November 2020).

Desugaring to ==:

  • Can support opaque consts and other consts one could not write as a pattern (e.g., a type that involves a custom tagged union with a PartialEq instance).
  • Can only best-effort lint against possible cases of semantic mismatch between the pattern and ==, leading to possibly surprising behavior that the programmer did not expect.
  • Defies expectations of programmers that expect match to have a more strict notion of equality than ==.
  • "Opens the floodgates"; suddenly one will be able to match on tons of things, based on their == semantics. That can be seen as a good thing or a bad thing.
  • Is not quite a desugaring, since we want to take into account some constant values for exhaustiveness checking to avoid unnecessary "non-exhaustive match" errors.

Neither of these options cares much about the Eq trait; only PartialEq is relevant.
Requiring Eq would anyway be inconsistent with allowing matching on floats.

Post-meeting notes

Some of the main arguments:

  1. For option 1: refactoring a binderless pattern into a const should not change behavior. In particular for fieldless enum variants, which are almost identical to consts, this would be really surprising. It also violates the "consts behave as if inlined" principle we've been repeating a lot.
  2. Against unrestricted option 1, for option 2: we shouldn't expose operations on a type that a user didn't decide to expose. If they gave no ==, we shouldn't allow matching on those consts.

Argument 1 rules out option 2.
Argument 2 means we need to restrict option 1. But how?
The current scheme is geared towards allowing matching on a const if the value is recursively structural-match, and furthermore the type must implement PartialEq.
That means if you have no ==, nobody can match on your types, so that's good: we don't expose syntactic capabilities that the user didn't explicitly expose.
And it means if we allow matching then its behavior is the same as that of ==, so we also don't expose semantic capabilities that the user didn't choose to expose.

If we want the refactoring from argument 1 to always result in compiling code (as opposed to just ensuring that if it compiles, it is a semantic NOP), we need to relax this check in a scope-based way, where if we can see all the fields of a type (we are in the same module or they are public), then we allow matching even if there is no PartialEq and the value is not structural match.
For all-pub types this would mean everyone can match any constant no matter which traits are derived or manually implemented!

But it means if you derive(PartialEq) that's a semver promise that your consts can be matched on, so you can't ever have a non-structural PartialEq in the future.
If we want to avoid that we need to decouple derive(PartialEq) from "allow matching on consts of this type".
This can only be changed via an edition transition.

This proposal does not let one define a MyBool type with an unconventional equality and have reasonable match behavior for that type.
But it does let one define such a type and at least be sure users are not circumventing the abstraction with match.
Supporting that would require much more fundamental changes to our match system.
Ideally we'd remain forward-compatible with such changes, but what exactly would be required to ensure this?

Compared to today, this proposal only breaks code that we are already linting against with future-compatibility warnings. Specifically this affects the indirect_structural_match lint (which identifies const values that are not recursively structural-match) and the const_patterns_without_partial_eq lint (which identifies const values of non-PartialEq type).
The latter is very recent (not on stable yet, riding the train for 1.74) but appears in cargo's future compatibility reports; the former is ancient but does not appear in cargo's future compatibility report.
If we want to determine whether a const value is recursively structural-match without evaluating it, by instead analyzing the MIR source that computes the const value, then we'd also need to make the nontrivial_structural_match lint a hard error, but it's unclear what the motivation for that would be.

However this isn't a complete proposal yet, since no answer is given for:

  • floats
  • raw pointers (thin/wide, integer or "actually pointer")
  • function pointers
  • potentially letting non-derive(PartialEq) types opt-in to allowing matching (with structural semantics)