# Enum Reform
[toc]
## Topic
How does the `arbitrary_enum_discriminant` (abbreviated `a_e_d`) feature fit into the plan for Rust enums going forward?
To answer that question, we should check that we are aligned on what *is* the plan for Rust enums going forward.
## Motivation
The original design of enums presents both semantic footguns and stability hazards:
- discriminants serve two purposes: ergonomic logical integer conversions (`as`-casting), and memory layout
- unfortunately, these are usually (but not always) coupled
- the implicit assignment of discriminants poses a stability hazard (and sometimes safety hazard!) on fieldless enums
- as-casting is error prone
- nothing prevents accidental truncation
```rust=
enum Enum {
Variant = -1,
}
println!("{}", Enum::Variant as usize);
```
- other surprises:
```rust=
enum Fieldless {
Unit,
Tuple(),
Struct{},
}
println!("{}", Fieldless::Unit as u8);
println!("{}", Fieldless::Tuple as u8); // surprise!
println!("{}", Fieldless::Struct{} as u8);
```
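The truncation hazard can be made concrete with a small sketch. The `Wide` enum and the `lossy`/`checked` helpers below are hypothetical names introduced only for illustration; the point is that `as` silently drops high bits, whereas a checked conversion through `TryFrom` reports the problem.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

// Hypothetical enum for illustration; it does not appear in the note above.
#[allow(dead_code)]
enum Wide {
    Small = 1,
    Large = 300,
}

/// Lossy: `as` silently keeps only the low 8 bits of the discriminant.
fn lossy(w: Wide) -> u8 {
    w as u8
}

/// Checked: widen losslessly first, then let `TryFrom` reject values
/// that do not fit in the target type.
fn checked(w: Wide) -> Result<u8, TryFromIntError> {
    u8::try_from(w as i32)
}

fn main() {
    assert_eq!(lossy(Wide::Large), 44); // 300 - 256
    assert!(checked(Wide::Large).is_err());
    assert_eq!(checked(Wide::Small).ok(), Some(1));
}
```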
## Goals
Shiny future:
- decouple ergonomic logical integer conversions from memory layout control
- reduce the stability hazard posed by implicit discriminants
- move away from as-casting enums
Constraints:
- don't break backcompat
We can get to this shiny future incrementally. Initial improvements can be achieved in Rust 2021. Fully achieving the above goals requires an edition.
### Rust 2021 Vision
Abstractly:
- encourage explicit discriminants on enums
- discourage `as`-casting enums in favor of a less error-prone alternative
Concretely:
- permit explicit discriminants on `repr(C/primitive)` enums
- ~~(possibly) permit explicit discriminants on arbitrary default-repr enums~~
- provide a trait, `convert::legacy::As` that:
- looks like `trait As<T> { fn r#as(self) -> T; }`
- is automatically implemented for all enums that are as-castable in 2021
- cannot be implemented by users
- (this is similar to `trait AsRepr` proposed in Rust [PR 81642](https://github.com/rust-lang/rust/pull/81642); but `AsRepr` is only auto-implemented for enums with explicit reprs where all variants are unit-like)
- separately, provide a robust, ergonomic way to declare enum↔integer conversions
- e.g., `derive(convert::IntoTag<usize>)` on type def, `#[tag = 42]` on variants
- **this can be prototyped as a crate first**
- separately, provide a safe way to get the discriminant value from enums that have well-defined layouts
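A rough sketch of how two of the pieces above might fit together. Everything here is an assumption made for illustration: `As` follows the shape given in the list, but in the proposal the compiler would implement it and users could not, so the manual `impl` merely emulates that. `IntoTag` and its `into_tag` method are a guess at what `derive(convert::IntoTag<usize>)` with `#[tag = ..]` attributes could expand to.

```rust
// Sketch only: mirrors `trait As<T> { fn r#as(self) -> T; }` from the list.
trait As<T> {
    fn r#as(self) -> T;
}

// Hypothetical stand-in for the derived, opt-in conversion.
trait IntoTag<T> {
    fn into_tag(self) -> T;
}

#[derive(Clone, Copy)]
enum Color {
    Red,
    Green,
}

// Emulates the automatic impl: same result as the built-in `as` cast.
impl As<u8> for Color {
    fn r#as(self) -> u8 {
        self as u8
    }
}

// Emulates the derive: logical tags are independent of declaration order.
impl IntoTag<usize> for Color {
    fn into_tag(self) -> usize {
        match self {
            Color::Red => 42,  // would be written `#[tag = 42]`
            Color::Green => 7, // would be written `#[tag = 7]`
        }
    }
}

fn main() {
    // Fully qualified call, in the style of a mechanical rewrite of `expr as u8`.
    let legacy = <_ as As<u8>>::r#as(Color::Green);
    assert_eq!(legacy, Color::Green as u8);
    assert_eq!(Color::Red.into_tag(), 42);
}
```

The design point this tries to show is the decoupling: the `As` result tracks whatever `as` does today, while the tag values are chosen explicitly by the author.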
### Rust 2024 Vision
Abstractly:
- finish migrating away from as-casting
- decouple the logical and layout roles of explicit discriminants
Concretely:
- in all editions, forbid `as`-casting enums defined in Rust 2024
- in Rust 2024, forbid `as`-casting all enums
- `cargo fix --edition` will rewrite `expr as ty` to `<_ as convert::legacy::As<ty>>::r#as(expr)`
- forbid explicit discriminants on default-repr enums
- `cargo fix --edition` will add `derive(convert::legacy::As)` and annotate variants with `#[legacy_as = value]`
- forbid implicit discriminants on `repr(C/integer)` enums
- `cargo fix --edition` will make discriminants explicit
## Stabilizing `arbitrary_enum_discriminant`
henceforth: '`a_e_d`', for short.
### General Background
#### What's `as`-castable?
In Rust, fieldless enums have long been as-castable; e.g. ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=261765a57676eda3290dfd2e48525fd2)):
```rust=
enum Fieldless {
Unit,
Tuple(),
Struct{},
}
println!("{}", Fieldless::Unit as u8);
println!("{}", Fieldless::Tuple() as u8);
println!("{}", Fieldless::Struct{} as u8);
```
#### When can you set explicit discriminants?
In Rust, it has long been possible to set explicit discriminants on enums where every variant is unit-like. You can do this for both enums with and without the `#[repr]` attribute ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=737269bd95cf21cc7c1b5ebba7995719)):
```rust=
enum Digits {
One = 1,
Two,
Three,
Zero = 0,
}
println!("{}", Digits::Zero as u8);
println!("{}", Digits::One as u8);
println!("{}", Digits::Two as u8);
println!("{}", Digits::Three as u8);
#[repr(u8)]
enum Units {
Kilo = 'k' as u8,
Mega = 'm' as u8,
Giga = 'g' as u8,
}
println!("{}", Units::Kilo as u8 as char);
println!("{}", Units::Mega as u8 as char);
println!("{}", Units::Giga as u8 as char);
```
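For reference, a variant without an explicit discriminant takes the previous variant's value plus one (or zero if it comes first). A minimal check of the `Digits` example above, with the expected values spelled out as assertions:

```rust
#[allow(dead_code)]
enum Digits {
    One = 1,
    Two,   // implicit: One + 1
    Three, // implicit: Two + 1
    Zero = 0,
}

fn main() {
    assert_eq!(Digits::Zero as u8, 0);
    assert_eq!(Digits::One as u8, 1);
    assert_eq!(Digits::Two as u8, 2);
    assert_eq!(Digits::Three as u8, 3);
}
```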
### What does `a_e_d` permit?
With this feature, we permit setting explicit discriminants on enums *with* a payload (e.g. fields), so long as they have an explicit `repr(C/integer)` attribute (aka "`C` or primitive repr"). For example:
```rust=
#![feature(arbitrary_enum_discriminant)]
#[repr(u8)]
enum WithFields {
Tuple(i32) = 1,
Struct {f: i32} = 3,
Unit = 5,
}
```
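Since `WithFields` is not as-castable, one way to observe these discriminants today is a pointer cast that relies on the layout guaranteed for primitive-repr enums (RFC 2195). The sketch below assumes a toolchain on which the feature is already stable (Rust 1.66 or later), so the feature attribute is omitted; the `discriminant` method is a helper name chosen here, not an existing API.

```rust
#[allow(dead_code)]
#[repr(u8)]
enum WithFields {
    Tuple(i32) = 1,
    Struct { f: i32 } = 3,
    Unit = 5,
}

impl WithFields {
    /// Reads the tag stored at the start of the value.
    fn discriminant(&self) -> u8 {
        // SAFETY: `Self` is `repr(u8)`, so every variant is laid out as a
        // `repr(C)` struct whose first field is the `u8` tag. Reading one
        // byte at offset 0 is therefore reading an initialized `u8`.
        unsafe { *(self as *const Self as *const u8) }
    }
}

fn main() {
    assert_eq!(WithFields::Tuple(7).discriminant(), 1);
    assert_eq!(WithFields::Struct { f: 0 }.discriminant(), 3);
    assert_eq!(WithFields::Unit.discriminant(), 5);
}
```

This only works because of the explicit `repr`; the same cast on a default-repr enum would not be sound.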
### Closed Questions
#### Should we extend `a_e_d` to repr(Rust) enums, too?
Answer: Not yet!
Let us make the question more concrete. Given that:
- there can be good reasons for *not* declaring an explicit `repr`
- implicit discriminants are a footgun
...it seems valuable to extend `a_e_d` to default-repr enums too! E.g., we'd permit this:
```rust=
#![feature(arbitrary_enum_discriminant)]
// no repr!
enum WithFields {
Tuple(i32) = 1,
Struct{f: i32} = 3,
Unit = 5,
}
```
However, the effect of setting such discriminants is currently *not* soundly observable. `WithFields` is *not* as-castable. And since the layout of this type is unspecified, it is not sound to use unsafe code to extract the tag byte: doing so on `repr(Rust)` types is tightly coupled to the compiler's code generation strategy, and that strategy is unspecified.
**Conclusion:** We should not provide a stable capability to set explicit discriminants on such enums until a sound method to observe these values exists.
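To illustrate what *is* observable on a default-repr enum with fields today: `std::mem::discriminant` returns an opaque value that can be compared and hashed, but exposes no integer. The `Shape` enum and `same_variant` helper are hypothetical names used only for this sketch.

```rust
use std::mem::discriminant;

// Default repr, with payloads: not as-castable.
#[allow(dead_code)]
enum Shape {
    Circle(f64),
    Square(f64),
    Empty,
}

/// Compares variants while ignoring payloads. The `Discriminant<Shape>`
/// values are opaque: they support `==`, but not conversion to an integer.
fn same_variant(a: &Shape, b: &Shape) -> bool {
    discriminant(a) == discriminant(b)
}

fn main() {
    assert!(same_variant(&Shape::Circle(1.0), &Shape::Circle(2.0)));
    assert!(!same_variant(&Shape::Circle(1.0), &Shape::Empty));
}
```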
### Open Questions
#### Should we take this opportunity to further restrict `as`-casting?
Consider this enum, whose memory layout is well-defined and whose values are already (pre-`feature(a_e_d)`) as-castable:
```rust=
#[repr(u8)]
enum Fieldless {
Unit,
Tuple(),
Struct{},
}
```
However, as with all enums with implicit discriminants, the result of as-casting `Fieldless` is sensitive to the declaration order and number of variants. This is, potentially, a safety hazard. (I say *safety* hazard, because the primary motivation for setting an explicit repr is for FFI/unsafe code.)
With `feature(a_e_d)`, the programmer can reduce this risk by supplying explicit discriminants:
```rust=
#![feature(arbitrary_enum_discriminant)]
#[repr(u8)]
enum Fieldless {
Unit = 1,
Tuple() = 2,
Struct{} = 3,
}
```
This scenario presents us with an opportunity: We can *remove* the `as`-castability of enums like `Fieldless`: enums that are *fieldless, but non-unit-like, with an explicit C or primitive repr, that have one or more explicit discriminants*.
##### Reasons to do this:
1. As-casting is a footgun, and the less of it the better! E.g. ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=5bfd9c310d814634698029e650686dcc)):
```rust=
println!("{}", Fieldless::Unit as u8);
println!("{}", Fieldless::Tuple as u8);
println!("{}", Fieldless::Struct{} as u8);
```
See the bug? This compiles, but almost certainly does not do what the programmer intended.
- Note that the lang team has discussed elsewhere that we should lint against such casts of a tuple-constructor function to a primitive integer; see e.g. [rust issue 81686](https://github.com/rust-lang/rust/issues/81686).
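A small sketch of why the cast above misbehaves: without parentheses, `Fieldless::Tuple` names the variant's constructor function rather than a value of the enum. Binding it to a function-pointer type makes that visible. The helper name `tuple_discriminant` is introduced here only for illustration.

```rust
#[allow(dead_code)]
#[repr(u8)]
enum Fieldless {
    Unit,
    Tuple(),
    Struct {},
}

fn tuple_discriminant() -> u8 {
    // `Fieldless::Tuple` is a function of type `fn() -> Fieldless`.
    let ctor: fn() -> Fieldless = Fieldless::Tuple;
    // Calling it yields an enum value, which casts to its discriminant.
    ctor() as u8
}

fn main() {
    assert_eq!(tuple_discriminant(), 1);
}
```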
##### Reasons not to do this:
1. Adding an explicit `repr` to fieldless enums becomes a breaking change for library authors.
2. We cannot reverse this decision later without introducing a stability hazard.
3. Documentation complexity increases. Consider how we explain what can be `as`-casted:
- Status quo: "Fieldless enums can be as-casted".
- With restriction: "Enums with the default repr can be as-casted if they are fieldless. Enums with a `C` or primitive repr can be as-casted if all variants are unit-like."
##### jswrenn's recommendation
Don't forbid `as`-casting fieldless enums with explicit reprs; revert [#89234](https://github.com/rust-lang/rust/pull/89234).
> [name=Gary Guo] I think we should deprecate `as`-casting with fieldless enums instead, see [#92700](https://github.com/rust-lang/rust/pull/92700).
#### Should we permit explicit discriminants on non-unit-like enums *without* explicit reprs?
Why shouldn't we permit this:
```rust=
#![feature(arbitrary_enum_discriminant)]
// no repr!
enum Fieldless {
Unit = 1,
Tuple() = 2,
Struct{} = 3,
}
```
The [rationale for disallowing](#Should-we-extend-a_e_d-to-reprRust-enums-too) `a_e_d` on default-repr enums doesn't apply to this case. This enum *can* be as-casted.
##### Reasons to do this:
- explicit discriminants are good! (see footguns mentioned in intro of this doc)
##### Reasons not to do this:
- perhaps not worth the effort? we want to move away from `as`-casting, eventually, anyways
##### jswrenn's recommendation
Allow explicit discriminants on fieldless-but-non-`repr(C)` enums; merge [#88203](https://github.com/rust-lang/rust/pull/88203). Ultimately, it's a very niche case.
<!--
- https://github.com/rust-lang/rust/issues/60553#issuecomment-984290718
-->
## References
* Make specifying repr optional for fieldless enums [PR #88203](https://github.com/rust-lang/rust/pull/88203)
* Tracking issue for RFC 2363, "Allow arbitrary enums to have explicit discriminants" [Issue 60553](https://github.com/rust-lang/rust/issues/60553)
* Automatically implement AsRepr and allow deriving FromRepr for fieldless enums [PR #81642](https://github.com/rust-lang/rust/pull/81642)
## Meeting Notes
### Initial conversation
(content discussion)
What is the overall motivation for `#[repr(u8)]` without e.g. `#[repr(C)]`?
* unsafe code guidelines group covered this; believe outcome was that `#[repr(u8)]` alone does not tell you full ABI layout
Is group aligned that we want to eschew `as` for extracting discriminant?
* Yes
Sem. Ver. Hazards
* conflation of the discriminant/integer conversion and the actual value chosen for the discriminant tag
* e.g. reordering variants when the discriminant value is inferred will cause values to change.
* adding a new variant, in the absence of `#[repr]`, can unexpectedly change the size of the enum.
Example of a semver hazard, this is an edit that may look harmless but isn't:
```rust
// V0
enum Foo { A, B, C }
// V1
enum Foo { A, C, B }
```
Similar to `Ord`.
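A minimal demonstration of the hazard, placing the two versions in modules `v0` and `v1` (module names chosen here for illustration). Both the cast result and the derived ordering change when the variants are reordered:

```rust
#[allow(dead_code)]
mod v0 {
    #[derive(PartialEq, Eq, PartialOrd, Ord)]
    pub enum Foo { A, B, C }
}

#[allow(dead_code)]
mod v1 {
    #[derive(PartialEq, Eq, PartialOrd, Ord)]
    pub enum Foo { A, C, B }
}

fn main() {
    // The implicit discriminant of `B` changes from 1 to 2.
    assert_eq!(v0::Foo::B as u8, 1);
    assert_eq!(v1::Foo::B as u8, 2);

    // Derived `Ord` follows declaration order, so the comparison flips too.
    assert!(v0::Foo::B < v0::Foo::C);
    assert!(v1::Foo::B > v1::Foo::C);
}
```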
Another motivation: We have more than one way to do things: both the numeric conversion traits *and* the as-casting method for enum/integer conversion. Would be nice if there were just one unified way in future.
### nikomatsakis: Do we want to permit discriminants for "default-repr" or not?
Under Rust 2021 recommendations, it says:
> (possibly) permit explicit discriminants on arbitrary default-repr enums
but under Rust 2024 it says:
> forbid explicit discriminants on default-repr enums
It feels like these are in tension: why would we permit something in Rust 2021 only to forbid it in Rust 2024? What am I missing? =)
jswrenn: I would like to push Rust 2021 towards one internally-consistent point, and Rust 2024 towards (another) internally-consistent point.
pnkfelix: that raises an interesting meta-point about goals for each edition
### josh
I'd like to understand the intended use of `convert::legacy::As` better. What cases does that work on that *don't* work with `AsRepr` but that we do want to continue supporting? Could we just *not* have `convert::legacy::As` at all?
* `convert::legacy::As` is for converting 2021 code to 2024; it provides a target abstraction for translating existing `as`-casts.
* we might not need this at all if we can translate as casts to `match`
* e.g. the migration functionality of `convert::legacy::As` could be provided by a 3rd party crate. It doesn't have to be in stdlib.
* the purpose of `AsRepr` is to convert into the integer value provided in the enum definition
* jswrenn: in that case, I'd prefer `AsRepr` to be something you derive
* joshtriplett: that's not unreasonable; there are advantages and disadvantages to opt-in vs opt-out. (But let's not bikeshed that in this meeting.)
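The `match`-based translation mentioned above might look roughly like this. It is a sketch of what a migration could emit for a fieldless enum, not a description of any existing `cargo fix` behaviour; `to_u8` is a hypothetical helper name.

```rust
#[allow(dead_code)]
enum Fieldless {
    Unit,
    Tuple(),
    Struct {},
}

/// Spells out the values that `as u8` yields today, one arm per variant.
fn to_u8(e: Fieldless) -> u8 {
    match e {
        Fieldless::Unit => 0,
        Fieldless::Tuple() => 1,
        Fieldless::Struct {} => 2,
    }
}

fn main() {
    // The explicit match agrees with the cast it would replace.
    assert_eq!(to_u8(Fieldless::Unit), Fieldless::Unit as u8);
    assert_eq!(to_u8(Fieldless::Tuple()), Fieldless::Tuple() as u8);
    assert_eq!(to_u8(Fieldless::Struct {}), Fieldless::Struct {} as u8);
}
```

Unlike the cast, the match keeps its values when variants are reordered, and adding a variant becomes a compile error until an arm is written for it.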
And how do these relate to [`discriminant`](https://doc.rust-lang.org/std/mem/fn.discriminant.html)?
### scottmcm
Could we allow `as` in 2024, but only for enums defined in 2021 or earlier?
(Possible issue: does that mean a major version bump for any library changing editions if they have an enum?)
josh: What Scott said, and we could deprecate `as` in 2024 and require allowing a lint to use it.
### cramertj: Discussion of `MyEnum::TupleVariant as ...`
This cast is mentioned several times in this doc, and IMO it'd be good not to focus the design around this specific issue, as I think this problem is solved naturally by gradual moves away from `as` casts (which here are acting as function pointer conversion rather than enum discriminant extraction).
### cramertj: Use of `#[repr(u8)]` for actual representation guarantees
I've forgotten what exact guarantees we make about the representation of `#[repr(u8)]` enums. Is it guaranteed, e.g. that the discriminant will always be the first field, or that the remainder of the struct (after padding) will have the same representation as the `repr(Rust)` value that follows? e.g. are these guaranteed to have the same repr:
```rust=
#[repr(u8)]
enum MyEnum {
MyVariant(u64),
}
// is the above repr-equivalent to:
#[repr(C)]
struct MyEnumLike {
discriminant: u8,
variant: MyVariantStruct,
}
// is there a `repr(C)` here?
struct MyVariantStruct(u64);
```
If so, are all variant beginnings padded according to the largest variant?
I ask all this because the doc says:
> I say safety hazard, because the primary motivation for setting an explicit repr is for FFI/unsafe code.
scottmcm: ~~IIRC there's not necessarily a stored discriminant with *just* `repr(u8)`. That takes `repr(C, u8)` with <https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html>.~~ I was wrong. See <https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html#guide-level-explanation>.
> For field-less enums, primitive representations set the size and alignment to be the same as the primitive type of the same name. For example, a field-less enum with a u8 representation can only have discriminants between 0 and 255 inclusive.
(https://doc.rust-lang.org/nightly/reference/type-layout.html#primitive-representation-of-field-less-enums)
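As a partial answer to the layout question above, RFC 2195 describes a primitive-repr enum with fields as a union of `repr(C)` structs, each beginning with the tag. The sketch below compares `MyEnum` against a hand-written equivalent of its single variant; `MyEnumTag`, `MyVariantRepr`, `FieldlessU8` and `layouts_match` are names invented for this illustration. It checks only size and alignment, which is suggestive rather than a proof of identical layout.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(u8)]
enum MyEnum {
    MyVariant(u64),
}

// Hand-written counterpart, following the RFC 2195 description:
// a `u8`-sized tag followed by the payload, in a `repr(C)` struct.
#[allow(dead_code)]
#[repr(u8)]
enum MyEnumTag {
    MyVariant,
}

#[allow(dead_code)]
#[repr(C)]
struct MyVariantRepr(MyEnumTag, u64);

// A fieldless primitive-repr enum, as in the quoted reference text.
#[allow(dead_code)]
#[repr(u8)]
enum FieldlessU8 {
    A,
    B,
}

fn layouts_match() -> bool {
    size_of::<MyEnum>() == size_of::<MyVariantRepr>()
        && align_of::<MyEnum>() == align_of::<MyVariantRepr>()
}

fn main() {
    assert!(layouts_match());
    // Fieldless `repr(u8)` enums have the size and alignment of `u8`.
    assert_eq!(size_of::<FieldlessU8>(), 1);
    assert_eq!(align_of::<FieldlessU8>(), 1);
}
```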
---
got about this far in the meeting
---
### cramertj: `repr`-compatibility with discriminated `C` unions
I'm sure this has been discussed before, but it seems like for C compatibility it would be necessary to allow specific placement of the discriminant within a structure.
I don't think we actually need this feature; I think folks who need this can write the tag and the union types separately and explicitly. (maybe generate w/ proc macro?)
### pnkfelix: requiring explicit values
Is it universally agreed that we want to require explicit discriminant values for all repr(C)/repr(primitive) enums? I personally can imagine people who *like* the ability to specify a starting value and let the remaining variants get the successors implicitly.
josh: :+1:, I don't think we should mandate explicit values here.
### pnkfelix: fieldless vs unit-like
When I see code like `enum E { Tup() = 3, Struct {} = 4 }`, my gut hurts a little bit. Were Rust's designers past/present wrong to unify fieldless variants and unit-like variants so much? Maybe they *are* different at some level that is unappreciated in the language currently?
(rather than phrase the above as a mere feeling, I can perhaps make arguments of the form "a unit is signalling there will be no payload, while a fieldless tuple/struct variant is indicating that potentially one will be added in the future" ... though I think in the past when I've argued that, I've hit the wall of "that is what `#[non_exhaustive]` is for!")
### nikomatsakis: wishing for a table that summarizes the options
I think I want a table that shows (a) all the various "options", like "has explicit repr", "fieldless", etc, along with a :heavy_check_mark: for those that can be as-casted.
### nikomatsakis: stability hazards
The intro mentions stability hazards, am I correct in assuming that the concern is that users may re-order the variants (or insert new variants) without realizing that this can affect downstream code?
I do share the concern, but adding explicit discriminants for every variant feels like a heavy-handed way to go about resolving that. I wonder if some kind of "opt-in" (e.g., `#[derive(IntoDiscriminant)]`) would suffice. It seems like a key part of the danger is that the fact that people may rely on the ordering is automatic and not "opt-in".
### scottmcm: any downside to setting reprs?
Is there anything that gets blocked by adding `repr` attributes? For example, does it keep enum layout optimizations from being able to remove the stored discriminant entirely if all but one variant ends up uninhabited?
(I think this is the other side of taylor's earlier question.)
### josh
What's the advantage of allowing explicit discriminants on repr(Rust) enums, rather than just requiring that if you care about discriminant values you must set a discriminant repr? Could we just move towards partitioning enums into explicit-repr and no-visible-discriminant?
### how many traits do we need
jswrenn: three!
joshtriplett: why?
jswrenn: 1. one (automatic, `core::mem`) to expose the value used in the memory layout itself, 2. one (opt-in) to expose the value used for the integer provided in the enum definition, 3. `As` trait (automatic) for backwards compat