Type-based Match Ergonomics RFC

# Type-based Match Ergonomics RFC ## Summary Today's match ergonomics have a series of edge cases that make it hard to predict and understand. In this RFC I propose a reframe of match ergonomics that (I hypothesize) is a better match for intuition. This is a breaking change that requires an edition. This proposal, as well as today's stable behavior, RFC3627, and several variants, can be experimented with in [a web tool](https://nadrieril.github.io/typing-rust-patterns/) I wrote, inspired by (and checked against) TC's [`match-ergonomics-formality` tool](https://github.com/traviscross/match-ergonomics-formality/). The tool is an interactive web app that allows setting various options, comparing concrete cases, and showing rulesets in the style of this proposal. I'd like to thank everyone who participated in the various meetings about match ergonomics, and particularly TC and Waffle, for taking the time to understand this complex domain, putting forward proposals, having brilliant insights, and working to make rust better. This proposal could not have come to exist without you. ## Motivation Today's match ergonomics operate based on the "in-memory type" i.e. the type of the place that a subpattern matches on. This type is different than the "user-visible type" i.e. the type that would be obtained if the current pattern was replaced with a binding. When thought of in terms of the "user-visible type", match ergonomics exhibit surprising edge cases: - `mut` sometimes dereferences the value; ```rust let (x, mut y) = &(true, false); // x: &bool, y: bool ``` - `ref`/`ref mut` is sometimes a no-op; ```rust let (x, ref y) = &(true, false); // x: &bool, y: &bool let (x, ref y) = (&true, &false); // x: &bool, y: &&bool ``` - when a pattern binds `x: &T`, writing `&x` inside the pattern has context-dependent effects: it can be a type error, work as expected, or dereference the value twice. ```rust let (x, &y) = (&true, &false); // x: &bool, y: bool let (x, &y) = &(&true, &false); // x: &&bool, y: bool let (x, &y) = &(true, false); // x: &bool, y gives a type error let (x, &y) = &(&mut true, &mut false); // x: &&mut bool, y gives a type error let (x, &mut y) = &(&mut true, &mut false); // x: &&mut bool, y: bool ``` [RFC 3627](https://github.com/rust-lang/rfcs/pull/3627) recognizes these issues and seeks to address them, but doesn't solve them all: - `mut x` is disallowed inconsistently: ```rust let (&x, &(mut y)): &(T, T) = ...; // RFC3627: `x: T, y: T` let (&x, &(mut y)): &(&U, &U); // RFC3627: `x: &U, y: type error` ``` - multiple references are sometimes matched in the opposite order than one would expect: ```rust let [&&mut x]: &[&mut T] = ...; // RFC3627: type error let [&mut &x]: &[&mut T] = ...; // RFC3627: `x: T` ``` - mutability can affect the type of patterns in inconsistent ways: ```rust let [&mut (ref x)]: &mut [&mut T] = ...; // RFC3627: `x: &T` let [&mut (ref x)]: &mut [& T] = ...; // RFC3627: `x: &&T` ``` The starting point of this RFC is the claim that these behaviors do not match intuition, and the hypothesis that this comes from the fact that RFCs 2005 and 3627 operate on the in-memory type, with extra hidden state called the "binding mode". This RFC proposes to typecheck patterns based on the user-visible type instead, which mostly requires no such hidden state and solves all these surprises. ## Guide-level explanation The first idea of pattern-matching is that each type can be matched/deconstructed using the same notation used to construct it: - an `Option<T>` can be matched with patterns `Some(p)` and `None`; - a struct can be matched with `Struct { field1: p1, field2: p2, .. }`; - a reference type `&T` can be matched with pattern `&p`; - a reference type `&mut T` can be matched with pattern `&mut p`; - etc. Match ergonomics extend this behavior: a non-reference constructor is allowed to match through arbitrary layers of references. The references "slip into" the fields of the constructors, and nested references merge into a single one. See "reference-level explanation" for a precise formulation of this idea. For example: ```rust! let (x, &y) = &(true, false); // The reference slips into the fields, so this is equivalent to: let (x, &y) = (&true, &false); // Which gives: `x: &bool`, `y: bool` ``` Compared to stable rust: - `mut` simply makes the binding mutable; ```rust let (x, mut y) = &(true, false); // x: &bool, y: &bool ``` - `ref`/`ref mut` always borrows (but we may forbid the case where this creates an invisible temporary, see next section); ```rust let (x, ref y) = &(true, false); // x: &bool, y: &&bool let (x, ref y) = (&true, &false); // x: &bool, y: &&bool ``` - when a pattern binds `x: &T`, writing `&x` inside the pattern always dereferences the value to `x: T`; ```rust let (x, &y) = (&true, &false); // x: &bool, y: bool let (x, &y) = &(&true, &false); // x: &&bool, y: &bool let (x, &y) = &(true, false); // x: &bool, y: bool let (x, &y) = &(&mut true, &mut false); // x: &&mut bool, y: &mut bool (gives a borrowck error) let (x, &mut y) = &(&mut true, &mut false); // x: &&mut bool, y gives a type error ``` - `&p` is allowed on `&mut T`. ```rust! let (x, &y) = (&mut true, &mut false); // x: &mut bool, y: bool ``` ## Reference-level explanation In this section, we describe the typechecking of patterns using typing rules written in the typical notation of type theory papers: each rule is to be read bottom to top, as "for `<bottom>` to hold, `<top>` must (recursively) hold". The top is a comma-separated list of typing predicates. An empty top means that the rule applies unconditionally. If no rule applies to a given case (e.g. `&mut x: &T`), this means that the case counts as a type error. In these rules, `C` denotes a constructor for the type `CT` with fields of types `CT0`, `CT1`, etc; `p`/`p0`/`p1` denote arbitrary patterns, `T` denotes an arbitrary type; `x` denotes an identifier (for a binding). Without match ergonomics, the typechecking of patterns can be described by the following set of rules: ```rust! // Read "a constructor pattern is valid for type `CT` if each subpattern pi corresponding to the fields of the constructor is valid for type `CTi`". p0: CT0, p1: CT1, .. -------------------- "Constructor" C(p0, p1, ..): CT // Read "a reference pattern is valid for type `&T` if the underlying pattern is valid for type `T`". Same for `&mut`. p: T ------ "Deref" &p: &T p: T -------------- "Deref" &mut p: &mut T // These two describe the behavior of `ref` and `ref mut` bindings. x: &T -------- "BindingBorrow" ref x: T x: &mut T ------------ "BindingBorrow" ref mut x: T // These two are the final rules that are reached whenever we succeed in typing a pattern. They mean that a binding pattern is valid for any type. ---- "Binding" x: T -------- "Binding" mut x: T ``` Match ergonomics (as proposed in this RFC) adds the following extra rules: ```rust! // These two rules describe how references "slip" through a constructor pattern onto its fields. p0: &CT0, p1: &CT1, .. --------------------- "ConstructorRef" C(p0, p1, ..): &CT p0: &mut CT0, p1: &mut CT1, .. ----------------------------- "ConstructorRef" C(p0, p1, ..): &mut CT // These four rules describe how multiple levels of references are merged together before slipping into a constructor pattern. C(p0, p1, ..): &T ------------------ "ConstructorMultiRef" C(p0, p1, ..): &&T C(p0, p1, ..): &T ---------------------- "ConstructorMultiRef" C(p0, p1, ..): &&mut T C(p0, p1, ..): &T ---------------------- "ConstructorMultiRef" C(p0, p1, ..): &mut &T C(p0, p1, ..): &mut T -------------------------- "ConstructorMultiRef" C(p0, p1, ..): &mut &mut T // Optional extra rule, for convenience. p: T ---------- "DerefMutWithShared" &p: &mut T ``` Note that today's (RFC2005) match ergonomics cannot be directly described in this way, as there is extra information (the binding mode) to track. Neither can RFC3627. The rules formalism can be extended to track this extra information (as is done in the tool), but isn't as natural a fit for these rulesets. Note also that the following is allowed: ```rust let opt: &Option<T> = ...; if let Some(ref x) = y { // `x: &&T` } ``` which requires `x` to borrow a temporary, in a way equivalent to: ```rust let opt: &Option<T> = ...; if let Some(x0) = y { let x = &x0; // `x: &&T` } ``` As this may be surprising (especially for `ref mut`), we propose to add a warn-by-default lint that suggests removing the `ref`/`ref mut`, or forbid this altogether. Note finally that the following is allowed: ```rust let opt: &Option<T> = ...; if let Some(mut x) = y { // `mut x: &T` } ``` which cannot be desugared into a form without match ergonomics, as this would require something like `mut ref x`. The current proposal purposefully steps away from the "desugaring" framing as it is no longer necessary to explain the behavior of patterns, so this isn't seen as a problem. ## Migration This proposal changes the meaning of many patterns; as such, this is a breaking change. These rules would only take effect on the next edition. The migration lint will modify all patterns to lie in the subset of rules that are common to all editions. At worst this involves fully desugaring match ergonomics, much like the proposed RFC3627 migrations. Previous editions only get the backwards-compatible changes. This means allowing `&p` patterns on `&mut T` types, and allowing `&p`/`&mut p` patterns to consume the binding mode as if it was a reference in places where there would otherwise be a type error. In the language of RFC3627, that's rules 4 and 5 (not rule 3 since it is not part of this proposal). In terms of tool options (described below), this is `allow_ref_pat_on_ref_mut + EatBoth + fallback_to_outer + eat_inherited_ref_alone`. ## Rationale and alternatives This is a counter-proposal to RFC3627. Both proposals aim to improve match ergonomics and require breaking changes over an edition. They mostly agree on what's confusing in today's match ergonomics. The proposals differ in execution. In this section, we will compare in more detail this proposal against RFC3627 and its variants. Note that contrary to RFC3627, the current proposal doesn't really have degrees of freedom; the simplicity of the formalism forces a single option. The degrees of freedom are the extra hard errors we could add as well as RFC3627's rule 3, which we discuss below. These are listed in the "unresolved questions" section. To go more into detail, apart from RFC3627's rule 3 which has its own section below, the differences between the various proposals all lie in two areas: 1. how to handle binding modifiers (i.e. `mut`, `ref` and `ref mut`) in the presence of inherited references (aka non-`move` binding mode); 2. how to handle reference types in the presence of inherited references (aka non-`move` binding mode). The fact that these are the only variables at play was discovered by formalizing all proposals in [the tool](https://github.com/Nadrieril/typing-rust-patterns). Each of these alternatives corresponds to an option that can be set in the tool. The real crux of the difference between the proposals is what "type" they care about. RFC3627 inspects the "in-memory type" i.e. the type of the place that is being matched on. This proposal inspects the "user-visible type" i.e. the type that would be obtained if the current pattern was a binding. #### `&p` on `&mut T` All the proposals agree on this: it is desireable for `&p` to match on `&mut T` as if it was `&T`. This is only mentioned for completeness and future reference. The tool calls this option `allow_ref_pat_on_ref_mut`. #### `mut` binding modifier Take the case of `let [mut x]: &[T] = ...;`. Without the `mut`, all proposals agree that `x: &T`. With the `mut`, proposals differ in how to treat the inherited reference aka non-`move` binding mode. There are only three options: 1. (stable rust) reset the binding mode, i.e. dereference the inherited reference. This means `x: T`; 2. raise an error; 3. simply make the `x: &T` binding mutable. Option 3 seems unanimously desireable. Its drawback is that it prevents the desugaring of such patterns into patterns that don't use match ergonomics, because that would require a new binding modifier (that we could call `mut ref`). This is why RFC3627 chose option 2. Both 2 and 3 are deemed acceptable in the frame of the current proposal; 1 isn't. These three alternatives can be set with `set mut_binding_on_inherited` in the tool. #### `ref`/`ref mut` binding modifiers Take the case of `let [ref x]: &[T] = ...;`. Without the `ref`, all proposals agree that `x: &T`. With the `ref`, proposals differ in how to treat the inherited reference aka non-`move` binding mode. There are only three options: 1. reset the binding mode, i.e. dereference the inherited reference first. This means `x: &T`; 2. raise an error; 3. simply borrow the `&T` value. This means `x: &&T` and requires creating a temporary to store the `&T` value. Stable rust and RFC3627 (including all its variants) agree on option 1. The typing-rules framework disagrees as option 1 requires tracking the binding mode. Option 3 is probably what most rust users would expect at first. The drawback of option 3 is that it requires an invisible temporary. Both 2 and 3 are deemed acceptable in the frame of the current proposal. These three alternatives can be set with `set ref_binding_on_inherited` in the tool. #### Dereferencing inherited references This is the most subtle part. The setup is as follows: a reference pattern `&{mut}p` is matched against a doubly-referenced type `&{mut}&{mut}T` where the outer reference is inherited (i.e. came from an application of one of the `ConstructorRef` rules). In the framework of RFC3627, this is the same as a `&{mut}p` pattern matched against a reference type `&{mut}T` with a `ref` or `ref mut` binding mode. There are two levels of reference to consider: the inherited (aka outer aka binding mode) one, and the "real" (aka inner) one. Proposals differ on which should be considered first. The tool calls this option `inherited_ref_on_ref`. The possible values are: - `EatOuter`: match the pattern against the outer reference only; - `EatInner`: match the pattern against the inner reference first; if they match then remove the inner one and keep the outer one; - `EatBoth`: match the pattern against the inner reference; if they match then remove both. For example: ```rust! let [&mut x]: &mut [&T] = ...; // EatOuter: `x: &T` // EatInner: mutability mismatch // EatBoth: mutability mismatch let [&mut x]: &[&mut T] = ...; // EatOuter: mutability mismatch // EatInner: `x: &T` // EatBoth: `x: T` ``` Separately, in case of a mutability mismatch, some proposals propose to try again in `EatOuter` mode. The tool represents this with a boolean option `fallback_to_outer`. This is not relevant if the base option is already `EatOuter` of course. For example: ```rust! let [&mut x]: &mut [&T] = ...; // EatOuter: `x: &T` // EatInner + !fallback_to_outer: mutability mismatch // EatInner + fallback_to_outer: `x: &T` // EatBoth + !fallback_to_outer: mutability mismatch // EatBoth + fallback_to_outer: `x: &T` let [&mut x]: &[&mut T] = ...; // EatOuter: mutability mismatch // EatInner: `x: &T` (no mutability mismatch so fallback_to_outer is irrelevant) // EatBoth: `x: T` (no mutability mismatch so fallback_to_outer is irrelevant) ``` - Stable rust implements `EatBoth + !fallback_to_outer`; - The current proposal implements `EatOuter`, as the other options require tracking the binding mode; - RFC3627 implements `EatInner + fallback_to_outer` - The subset of RFC3627 made of rules 1 and 2 only implements `EatInner + !fallback_to_outer`; - The behavior proposed by RFC3627 for pre-2024 editions implements `EatBoth + fallback_to_outer`. In principle `EatInner + fallback_to_outer` should allow more cases than `EatOuter` alone, but combined with `allow_ref_pat_on_ref_mut` this is not always the case, and either way this leads to surprising behaviors such as the ones noted in the introduction: - `mut x` is disallowed inconsistently: ```rust! let [&(mut x)]: &[&T] = ...; // EatInner: error because `EatInner` eats the "real" reference and leaves the inherited one, which triggers the `mut_binding_on_inherited` case discussed above. // EatOuter: we get `mut x: &T` with a non-inherited reference, which is valid regardless of `mut_binding_on_inherited`. ``` - multiple references are sometimes matched in the opposite order than one would expect: ```rust! let [&mut &x]: &[&mut T] = ...; // EatInner: the inner one is eaten first, hence this is ok and `x: T` // EatOuter: mutability mismatch let [&&mut x]: &[&mut T] = ...; // EatInner + !fallback_to_outer: mutability mismatch // EatInner + fallback_to_outer + !allow_ref_pat_on_ref_mut: `x: T` // EatInner + fallback_to_outer + allow_ref_pat_on_ref_mut (i.e. RFC3627): mutability mismatch because the `&` pattern eats the inner `&mut` type, which leaves `&mut x: &T`. // EatOuter: `x: T` ``` - mutability can affect the type of patterns in inconsistent ways: ```rust! let [&mut (ref x)]: &mut [&mut T] = ...; // RFC3627: `x: &T` because the `&mut` pattern matches the inner `&mut`, and the outer one is then reset by `ref x`. let [&mut (ref x)]: &mut [& T] = ...; // RFC3627: `x: &&T` because the `&mut` pattern matches the outer `&mut`, and the `ref x` then borrows the `&T`. ``` #### RFC3627's rule 3 RFC 3627's rule 3 says (in our frame): "If we've previously matched against a shared reference in the scrutinee, downgrade any inherited `&mut` to `&`". In the tool, this option is named `downgrade_mut_inside_shared`. This ensures that a `&mut` pattern is only allowed if mutable access to the value is indeed possible (at least as far as match ergonomics are concerned). E.g.: ```rust! let &[[x]]: &[&mut [T]] = ...; // with !downgrade_mut_inside_shared: `x: &mut T`, borrow-check error // with downgrade_mut_inside_shared: `x: &T` ``` This rule is somewhat orthogonal to the current proposal. It is not included because it requires hidden state; it is however compatible and could be added without making the mental model much more complicated. This option is not backwards-compatible either way: `&[&mut x]: &&mut [T]` is allowed without but becomes an error with it, and conversely the example above is allowed with it but is an error without. ## Unresolved questions - Should `Some(mut x): &Option<T>` be banned because it cannot be desugared? - Should `Some(ref x): &Option<T>` be banned because of the invisible temporary? - Should we add RFC3627's rule 3 "If we've previously matched against a shared reference in the scrutinee, downgrade any inherited `&mut` to `&`"?