# Validity invariants and representations
Present: Mario, Joshua (remote), Jacob, Ralf, Nadri, Lukas-code, saethlin, amanieu, bee, CAD97, Jacob Degen, Jack Wrenn (remote)
*Context (joshlf, jswrenn): [Zerocopy :: Rust All Hands / Opsem](https://hackmd.io/1hlXXK82TMqCPHvXm5Cu5w)*
Open questions:
- ptr to int transmute
- validity of refs (how recursive)
- unions (what exactly is being preserved, bag of bytes/something clever about which bytes are nonnull/initialized/have(n't) provenance)
- ManuallyDrop of Box after it's been dropped, also ManuallyDrop+MaybeDangling
- Jak: `repr(C)` enums? some ppl might want a weaker validity invariant for these?
- (back and forth about what exactly we're talking about)
- Box noalias? (at the end, if time)
- Partially initialized enums in `match` and `std::mem::discriminant`
Jakob: let's order these by how much progress we can make
## ptr to int transmute
Josh: there are two usecases in my head. the broader usecase is that banning it makes our ontology very complicated. we (zerocopy) want our operations to be broadly uniform. having to carve out ptr where the mapping to bytes isn't bidirectional is harder. Allowing it is a footgun for sure, but makes the API easier to express.
Josh: number two, there are niche usecases where we want to look at the bytes, e.g. check if a `Option<&T>` is non-null.
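A minimal sketch of the byte-level null check from Josh's second usecase. The function name `is_some_by_bytes` is hypothetical. The transmute reads a pointer at integer type, so whether it is defined behavior is exactly the question under discussion here; the sketch only relies on the documented guarantee that `Option<&T>` is pointer-sized with `None` represented as null.

```rust
use std::mem::{size_of, transmute};

/// Decides `Some` vs `None` by looking only at the bytes of the option.
/// Hypothetical helper for illustration.
fn is_some_by_bytes(opt: Option<&u32>) -> bool {
    // This loads a pointer value at integer (byte) type: the ptr-to-int
    // transmute being debated.
    let bytes: [u8; size_of::<usize>()] = unsafe { transmute(opt) };
    // `None` is the all-zero (null) pattern; anything else is `Some`.
    bytes.iter().any(|&b| b != 0)
}

fn main() {
    let x = 7u32;
    assert!(is_some_by_bytes(Some(&x)));
    assert!(!is_some_by_bytes(None));
}
```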
Ralf: second usecase is compelling, you're reimplementing what the compiler does for readdiscriminant. for the first point, we can't fix footguns by making them UB.
Ralf: one question is whether LLVM would be mad at us for loading a ptr at integer type, but actually we do that in rustc for readdiscriminant of `Option<&T>` so it better work
Ralf: given that, feels like a no brainer?
Jake: what was the theoretical issue?
Ralf: if we don't do this then we don't have provenance monotonicity everywhere and we also don't need it, and vice-versa.
Jake: There's no reason to believe that provenance monotonicity is not workable in the rest of the language?
Ralf: what this can't do is behave like a ptr to int cast. don't think there was a deeper theoretical issue. also we're committing to provenance monotonicity across all the semantics. and we're not sure what LLVM does
saethlin: I think LLVM would want to do this too
Ralf: there's a C proposal but it doesn't seem sane to implement. every memcpy could expose
Ralf: FCP? we also need lang. volunteer to write up the FCP?
Jakob: what should go in there?
- manual null check
- provenance monotonicity
- memcmp on data with pointers
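The "memcmp on data with pointers" item can be sketched as follows. The helper name `ptr_pairs_equal` is hypothetical, and the sketch assumes that reading pointer bytes at `u8` type is allowed, which is what the FCP would establish.

```rust
use std::mem::size_of;
use std::slice;

/// Byte-wise comparison of two arrays of references, in the spirit of
/// `memcmp(a, b, sizeof *a) == 0`. Hypothetical helper for illustration.
fn ptr_pairs_equal(a: &[&u32; 2], b: &[&u32; 2]) -> bool {
    let n = size_of::<[&u32; 2]>();
    // Arrays of references have no padding, so every byte is initialized.
    // Viewing those bytes as `u8` reads pointers at integer type.
    let a = unsafe { slice::from_raw_parts((a as *const [&u32; 2]).cast::<u8>(), n) };
    let b = unsafe { slice::from_raw_parts((b as *const [&u32; 2]).cast::<u8>(), n) };
    a == b
}

/// Compares a pair with an identical pair and with a swapped pair.
fn demo() -> (bool, bool) {
    let (x, y) = (1u32, 2u32);
    let p = [&x, &y];
    let q = [&x, &y];
    let r = [&y, &x];
    (ptr_pairs_equal(&p, &q), ptr_pairs_equal(&p, &r))
}

fn main() {
    assert_eq!(demo(), (true, false));
}
```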
saethlin: actually why the nullcheck thing? why not load as a ptr and do the null check?
Josh: we don't want to load at `Option<&T>` until we know that's sound, can't remember why other solutions than loading as bytes didn't work. The other option that comes to mind is doing `[u8; N] -> Option<NonNull<T>>` and then using safe code to check for `None` since `Option<NonNull<T>>` and `Option<&T>` have the same representation. I don't remember why that didn't work.
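A rough sketch of the `[u8; N] -> Option<NonNull<T>>` alternative Josh mentions, with `T = u32` and a hypothetical helper name `is_some_via_nonnull`. It goes in the opposite direction (bytes to pointer), so it does not need a ptr-to-int transmute.

```rust
use std::mem::{size_of, transmute};
use std::ptr::NonNull;

const N: usize = size_of::<usize>();

/// Interprets raw bytes as `Option<NonNull<u32>>` and checks for `None`
/// with safe code. Hypothetical helper for illustration.
fn is_some_via_nonnull(bytes: [u8; N]) -> bool {
    // Every bit pattern is valid here: null is `None`, anything else is `Some`.
    // The resulting pointer is only inspected, never dereferenced.
    let opt: Option<NonNull<u32>> = unsafe { transmute(bytes) };
    opt.is_some()
}

fn main() {
    assert!(!is_some_via_nonnull([0; N]));
    let x = 7u32;
    let addr_bytes = (&x as *const u32 as usize).to_ne_bytes();
    assert!(is_some_via_nonnull(addr_bytes));
}
```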
Mario: is it UB if you guess the layout of something?
Jakob: the safety precondition of ReadDiscriminant is an open question
Ralf: in rustc we always load as integer type for readdiscriminant.
Jakob: seems reasonable to support that
Ralf: also memcmp on pointers, want to allow that
saethlin: the Niche struct in rustc ABI has support for ptr/float niches?
Ralf: nah it's always int. there's a virtual field for the discriminant, which has a scalar type, but it actually can only be int
saethlin: ok, how do we explain to lang that we're removing this UB
Ralf: no one seems to want this
jakob: if we do this, we should commit that we prefer provenance monotonicity (it's part of why we want this)
Ralf: memcmp is an argument for why
Josh: I remember the issue now. it was again about API consistency that it was easier to go through bytes than special case the `Option<ptr>` case
## Unions
Ralf: what's the design space?
jakob: are we talking about maybeuninit ABI?
Ralf: no
Ralf: unions currently have padding, e.g. `repr(C)` with a single field which is a struct with padding (we don't have a choice by the C ABI)
jakob: should we maybe talk about `repr(C)` separately?
Ralf: we support `repr(transparent)` on unions but it's unstable and weird
Amanieu: if you have two variants in the union, is there guarantee that one of the variants is valid?
Ralf: no, see classic example of `(u8, bool) | (bool, u8)` that contains `(42, 42)`. can be written in safe code.
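The `(u8, bool) | (bool, u8)` example can be spelled out roughly like this. The type and function names are made up for illustration; the construction itself uses only safe code, because writes to union fields (including sub-field writes) are currently safe.

```rust
#[derive(Clone, Copy)]
#[repr(C)]
struct ByteThenBool(u8, bool);

#[derive(Clone, Copy)]
#[repr(C)]
struct BoolThenByte(bool, u8);

#[repr(C)]
union Either {
    a: ByteThenBool,
    b: BoolThenByte,
}

/// Builds a union whose bytes are `[42, 42]` using only safe code.
fn make_42_42() -> Either {
    let mut u = Either { a: ByteThenBool(42, false) }; // bytes: [42, 0]
    u.b.1 = 42; // safe sub-field write; bytes: [42, 42]
    u
}

/// Reads back only the `u8` halves. Reading either `bool` half would be UB,
/// since 42 is not a valid `bool`.
fn read_u8_halves(u: &Either) -> (u8, u8) {
    unsafe { (u.a.0, u.b.1) }
}

fn main() {
    let u = make_42_42();
    assert_eq!(read_u8_halves(&u), (42, 42));
}
```

Neither `a` nor `b` is a valid value of its type here, which is why "one of the variants is valid" cannot be part of the validity invariant today.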
Amanieu: does that mean that MaybeUninit could have a single field?
Ralf: as per what I have in MiniRust, yes
Josh: we want to ensure that the set of union value constructible from safe code is the set of union values that we can expect to see.
Jakob: that's safety invariants though
Ralf: yep, these two don't have to be the same
Josh: it would be weird if the language and library invariants are different.
Mario: safely constructible union values are quite weird!
Jakob: I don't see the issue in having them separate
Ralf: for structs, safety invariant is easy. union is harder because we can't read from them in safe code. so we have some design space.
Josh: to clarify, there's a doc. short version: if the invariants differ, we basically have to say that there's a default library invariant that holds if you don't specify anything further but you can relax it? nothing is like this in the language I think.
jakob: `str` is like that
Ralf: (other examples), this came up in the `unsafe` fields RFC discussion
Mario: if you want a safety invariant, you add restrictions over the default bag-of-bytes invariant. we could say that the safety invariant is bag-of-bytes (and validity too).
jakob: we likely want the invariants to be the same. that's significantly different from what you can safely construct
Josh: that would be the only case where the language-specified safety invariant is different from the validity invariant. for `str`, it's only in comments. ppl will be confused (?? Nadri: couldn't follow the argument)
CAD97: this is also about what a safe derive can assume just from knowing it's a union. safe derive can't read your docs. e.g. might know that the bytes are always fully initialized in safe code
Ralf: a library can be correct even if the invariant isn't documented. library authors can choose stronger invariants even by accident.
jakob: this entire problem doesn't exist if your union isn't in a public API. it's only public unions with public fields where the safety invariant matters. if you want a safety invariant, make a wrapper struct and an API
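A sketch of the "wrapper struct and an API" approach jakob describes. All names (`int_or_float`, `Raw`, `IntOrFloat`) are hypothetical. The union stays private, and the wrapper's safety invariant is that `is_float` says which field was last written.

```rust
mod int_or_float {
    // Private union: code outside this module cannot touch its fields.
    union Raw {
        int: u32,
        float: f32,
    }

    /// Safe wrapper. Invariant: `is_float` records which field of `raw` is active.
    pub struct IntOrFloat {
        raw: Raw,
        is_float: bool,
    }

    impl IntOrFloat {
        pub fn from_int(v: u32) -> Self {
            Self { raw: Raw { int: v }, is_float: false }
        }

        pub fn from_float(v: f32) -> Self {
            Self { raw: Raw { float: v }, is_float: true }
        }

        pub fn as_int(&self) -> Option<u32> {
            // SAFETY: the invariant says `int` is the active field.
            if self.is_float { None } else { Some(unsafe { self.raw.int }) }
        }

        pub fn as_float(&self) -> Option<f32> {
            // SAFETY: the invariant says `float` is the active field.
            if self.is_float { Some(unsafe { self.raw.float }) } else { None }
        }
    }
}

fn main() {
    use int_or_float::IntOrFloat;
    assert_eq!(IntOrFloat::from_int(3).as_int(), Some(3));
    assert_eq!(IntOrFloat::from_int(3).as_float(), None);
    assert_eq!(IntOrFloat::from_float(1.5).as_float(), Some(1.5));
}
```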
Ralf: there is a crate with a bunch of public unions: libc.
Ralf: what are actual usecases where we want a stronger validity invariant? scott wanted `noundef` on a union. also safe transmute?
Jack: maybe we can use attributes. hard for me to imagine any safe union access with the current design. maybe over an edition. See https://jack.wrenn.fyi/blog/safety-hygiene/#safe-unions
Ralf: my thinking is that if you want safe union access you need an opt-in like an attribute. maybe disallow writing to parts of a field or reject the union entirely. would also inform safety and possibly validity invariants. better than trying to guess invariants for existing unions
Josh: I'd like to hear why we want union validity to be MaybeUninit. if we decide the default lib invariant is maybeuninit, there's an argument that users of your API can deinit private fields of your union
Ralf: no, you can't assume the safety invariant
Jakob: maybe if a union has some pub and some priv fields.
### (Later, in the 2nd afternoon meeting)
Josh: Usecases for derives on unions making assumptions about safety invariant:
`IntoBytes` = it's legal to convert this into a `u8` array, i.e., no padding or similar.
Want: `derive(IntoBytes)` on a union should make the union `IntoBytes` if all fields are, and the union doesn't add any extra padding.
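A hand-written sketch of what such a derive would have to check and assume. `IntoBytesSketch` is a hypothetical stand-in trait, not zerocopy's actual `IntoBytes`, and the final `unsafe impl` is only sound under the assumption that the union's safety invariant is "all bytes initialized", which is the open question in this discussion.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical stand-in for an `IntoBytes`-style marker trait.
///
/// # Safety
/// Implementors promise that every byte of every safely reachable value
/// is initialized.
unsafe trait IntoBytesSketch: Sized {
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: relies on the trait's promise above.
        unsafe { slice::from_raw_parts((self as *const Self).cast::<u8>(), size_of::<Self>()) }
    }
}

unsafe impl IntoBytesSketch for u32 {}
unsafe impl IntoBytesSketch for [u8; 4] {}

#[repr(C)]
union Word {
    int: u32,
    bytes: [u8; 4],
}

// What a derive could check at compile time: every field is as large as
// the union itself, so the union adds no padding around any field.
const _: () = assert!(size_of::<u32>() == size_of::<Word>());
const _: () = assert!(size_of::<[u8; 4]>() == size_of::<Word>());

// Assumption: the safety invariant of `Word` is "all bytes initialized".
unsafe impl IntoBytesSketch for Word {}

fn main() {
    let w = Word { bytes: [1, 2, 3, 4] };
    assert_eq!(w.as_bytes(), [1u8, 2, 3, 4]);
    let v = 0x0102_0304u32;
    assert_eq!(Word { int: v }.as_bytes(), v.to_ne_bytes());
}
```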
RJ: That's a bit like Scott wanting `noundef` on some unions.
Josh: ... except that Scott's case is an opsem question, we just need the safety invariant.
MC: We could say that `derive(...)` declares that the safety invariant is XYZ. That would be unsafe to declare then.
Josh: We're trying to move to `unsafe` being a code smell. We deal with many programmers that are new to Rust.
RJ: The attribute would not have to be unsafe.
RJ: Should we try to move to a world where subfield writes are unsafe?
Discussion: Has to be over an edition.
Josh: That seems worthwhile.
CAD: Agreed. Safe transmute could help recover some of the safety.
MC: We could have "safety invariant selector" attributes -- do not fix a default safety invariant. Communicate with an attribute what the safety invariant for that type is. This would have values including "whatever we do today" and "one variant", and we could change the default over an edition.
CAD: The old semantics could still be expressed by just adding a unit variant.
At least from the opsem perspective.
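CAD's unit-variant point can be illustrated with a small sketch. `MaybeWord` and the helper names are hypothetical. The shape mirrors how the standard library defines `MaybeUninit<T>`, as a union with a `()` field next to the value field: constructing the unit field initializes no bytes, so "one variant is valid" still admits fully uninitialized contents.

```rust
use std::mem::size_of;

#[repr(C)]
union MaybeWord {
    uninit: (),
    word: u32,
}

/// Safe construction: the unit field requires no bytes to be initialized.
fn fresh() -> MaybeWord {
    MaybeWord { uninit: () }
}

/// Writes `v` into a fresh union and reads it back.
fn init_then_read(v: u32) -> u32 {
    let mut m = fresh();
    m.word = v; // safe write to a union field
    // SAFETY: `word` was just initialized.
    unsafe { m.word }
}

fn main() {
    assert_eq!(init_then_read(5), 5);
    // The unit field adds no size.
    assert_eq!(size_of::<MaybeWord>(), size_of::<u32>());
}
```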
MC: I was thinking of the attribute as just for the safety. What do we get out of also using it for validity?
RJ: `noundef`. Niches.