# Unsafe Fields Design Document
#### Helpful Links
- [RFC 3458](https://github.com/rust-lang/rfcs/pull/3458)
- [Zulip Topic](https://rust-lang.zulipchat.com/#narrow/channel/213817-t-lang/topic/unsafe.20fields.20RFC)
- [Experimental Implementation](https://github.com/veluca93/rust/tree/unsafe-fields)
## Overview
This RFC proposes extending Rust's tooling support for safety hygiene to named fields that carry library safety invariants. Consequently, Rust programmers will be able to use the `unsafe` keyword to denote when a named field carries a library safety invariant; e.g.:
```rust
struct UnalignedRef<'a, T> {
/// # Safety
///
/// `ptr` is a shared reference to a valid-but-unaligned instance of `T`.
unsafe ptr: *const T,
_lifetime: PhantomData<&'a T>,
}
```
Rust will enforce that potentially-invalidating uses of `unsafe` fields only occur in the context of an `unsafe` block, and Clippy's [`missing_safety_doc`] lint will check that `unsafe` fields have accompanying safety documentation.
[`missing_safety_doc`]: https://rust-lang.github.io/rust-clippy/master/index.html#missing_safety_doc
## Status Quo and Motivations
Safety hygiene is the practice of denoting and documenting where memory safety obligations arise and where they are discharged. Rust provides some tooling support for this practice. For example, if a function has safety obligations that must be discharged by its callers, that function *should* be marked `unsafe` and documentation about its invariants *should* be provided (this is optionally enforced by Clippy via the [missing_safety_doc](https://rust-lang.github.io/rust-clippy/master/index.html#missing_safety_doc) lint). Consumers, then, *must* use the `unsafe` keyword to call it (this is enforced by rustc), and *should* explain why its safety obligations are discharged (again, optionally enforced by Clippy).
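As a minimal sketch of this status quo in today's Rust, consider the following pair of functions. The names `first_unchecked` and `first_or_zero` are hypothetical and exist only for illustration; the `# Safety` section and `// SAFETY` comments follow the conventions that Clippy can optionally check.

```rust
/// Returns the first byte of `slice` without a bounds check.
///
/// # Safety
///
/// `slice` must not be empty.
pub unsafe fn first_unchecked(slice: &[u8]) -> u8 {
    // SAFETY: The caller promises that `slice` is non-empty, so index 0 is
    // in bounds.
    unsafe { *slice.get_unchecked(0) }
}

/// A safe wrapper that discharges the obligation of `first_unchecked` itself.
pub fn first_or_zero(slice: &[u8]) -> u8 {
    if slice.is_empty() {
        0
    } else {
        // SAFETY: We just checked that `slice` is non-empty.
        unsafe { first_unchecked(slice) }
    }
}
```

The obligation is declared once, at the definition, and discharged at each call site; rustc rejects a call to `first_unchecked` that is not inside an `unsafe` context.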
Functions are often marked `unsafe` because they concern the safety invariants of fields. For example, [`Vec::set_len`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.set_len) is `unsafe` because it directly manipulates its `Vec`'s length field, which carries the invariants that it is at most the capacity of the `Vec` and that all elements in the `Vec<T>` between 0 and `len` are valid `T`. It is critical that these invariants are upheld; if they are violated, invoking many of `Vec`'s other, safe methods induces undefined behavior.
To help ensure these invariants are upheld, programmers may apply safety hygiene techniques to fields, denoting when they carry invariants and documenting why their uses satisfy their invariants. For example, the `zerocopy` crate maintains the policy that fields with safety invariants have `# Safety` documentation, and that uses of those fields occur in the lexical context of an `unsafe` block with a suitable `// SAFETY` comment.
Unfortunately, Rust does not yet provide tooling support for declaring, discharging, or documenting the safety invariants of fields. Since the `unsafe` keyword cannot be applied to field definitions, Rust cannot enforce that potentially-invalidating uses of fields occur in the context of `unsafe` blocks, and thus Clippy cannot enforce that safety comments are present either at definition or use sites. This RFC is motivated by the benefits of closing this tooling gap.
### Motivation: Improving Field Safety Hygiene
The absence of safety tooling support for fields makes the practice of good field safety hygiene entirely a matter of programmer discipline; consequently, that practice is nascent.
Rust's visibility mechanisms can, to some extent, be (ab)used to help enforce good field safety hygiene. For example, zerocopy's [`Ptr` type is defined in a private `def` module](https://docs.rs/zerocopy/0.8.14/src/zerocopy/pointer/ptr.rs.html#13-121), which solely contains the datatype definition and an impl containing `pub(super)` `unsafe` constructors, getters and setters. All other `impl`s of `Ptr` are defined outside of this module and therefore must mediate their access to `Ptr`'s private fields through these unsafe functions. This roundabout approach poses significant linguistic friction and may be untenable when split borrows are required. Consequently, this approach is uncommon in the Rust ecosystem.
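The visibility-based workaround can be sketched roughly as follows. This is not zerocopy's actual code: it reuses the `Alignment` example from this document, and the module names and methods (`new_unchecked`, `pow`, `new`, `bytes`) are assumptions made for illustration.

```rust
mod alignment {
    // The private `def` module contains only the type definition and the
    // minimal accessors that are allowed to touch the field directly.
    mod def {
        pub struct Alignment {
            /// # Safety
            ///
            /// `pow` must be between 0 and 29.
            pow: u8,
        }

        impl Alignment {
            /// # Safety
            ///
            /// `pow` must be between 0 and 29.
            pub(super) const unsafe fn new_unchecked(pow: u8) -> Self {
                Self { pow }
            }

            /// Reading a plain `u8` cannot violate the invariant, so this
            /// getter is safe in this sketch.
            pub(super) const fn pow(&self) -> u8 {
                self.pow
            }
        }
    }

    pub use def::Alignment;

    // Everything out here cannot name the `pow` field, so it must go through
    // the accessors above.
    impl Alignment {
        pub fn new(pow: u8) -> Option<Self> {
            if pow <= 29 {
                // SAFETY: We just checked that `pow` is between 0 and 29.
                Some(unsafe { Self::new_unchecked(pow) })
            } else {
                None
            }
        }

        pub fn bytes(&self) -> u32 {
            // `pow() <= 29` by the field invariant, so the shift cannot overflow.
            1u32 << self.pow()
        }
    }
}
```

Even in this small case, the extra module, re-export, and accessor layer are needed only to force field writes through an `unsafe` function, which gives a sense of the friction involved.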
We hope that less friction and better tooling will make good field safety hygiene more common in the Rust ecosystem.
### Motivation: Improving Function Safety Hygiene
Rust's safety tooling ensures that `unsafe` operations may only occur in the lexical context of an `unsafe` block or function. If the safety obligations of an operation cannot be discharged entirely by an `unsafe` block, then the surrounding function must, itself, be `unsafe`. This tooling cue nudges programmers towards good function hygiene.
But, presently, it has a shortcoming: dangerous field uses are not linted against. The unsafe `Vec::set_len` method, for example, contains entirely safe code. There is no tooling cue that suggests this function should be unsafe — only programmer knowledge. Extending safety tooling to fields will close this gap.
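To illustrate the shortcoming, here is a simplified, hypothetical stand-in for a `Vec`-like type (the name `Buf` and its fixed capacity of 8 are assumptions, not `Vec`'s real layout). The body of `set_len` is a plain field assignment, so nothing in it prompts the author to mark the function `unsafe`.

```rust
pub struct Buf {
    data: [u8; 8],
    /// # Safety
    ///
    /// `len` must be at most `data.len()`.
    len: usize,
}

impl Buf {
    pub fn new() -> Self {
        Buf { data: [0; 8], len: 0 }
    }

    /// # Safety
    ///
    /// `new_len` must be at most 8.
    pub unsafe fn set_len(&mut self, new_len: usize) {
        // A plain assignment: today, the compiler would accept this body in
        // a safe function just as readily.
        self.len = new_len;
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `len <= data.len()` by the invariant on `len`.
        unsafe { self.data.get_unchecked(..self.len) }
    }
}
```

If `set_len` were mistakenly declared safe, `as_slice` could be driven out of bounds from entirely safe code, and no tool would flag the mistake.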
### Motivation: Making Rust Easier to Audit
To evaluate the soundness of `unsafe` code (i.e., code which relies on safety invariants being upheld), it is not enough for reviewers to check the contents of `unsafe` blocks — they must check *all* places (including safe contexts) in which safety invariants might be violated. (See [*The Scope of Unsafe*].) This is, in large part, because safety tooling does not extend to fields. Consequently, safety invariants may be violated at-a-distance in safe code, and safety audits must therefore carefully consider distant safe code.
Crates that practice good field safety hygiene will be easier to review. While reviewers must still ensure that fields which carry safety invariants are actually marked `unsafe`, having done so, they may largely limit their review to `unsafe` code and (in the absence of unsafe local bindings) safe code in the same function.
[*The Scope of Unsafe*]: https://www.ralfj.de/blog/2016/01/09/the-scope-of-unsafe.html
### Other Benefits
- Unsafe fields provide the libs team a knob with which to revisit the (un)safety of traits without breaking users, since traits can be made conditionally safe to implement, depending on whether the target type has unsafe fields. This RFC proposes that `Copy` gets this treatment; `Unpin` and `UnwindSafe` may also be compelling candidates.
## Design Tenets
The design of `unsafe` fields is guided by three tenets:
1. [**Unsafe Fields Denote Safety Invariants**](#1-Unsafe-Fields-Denote-Safety-Invariants)
A field *should* be marked `unsafe` if it carries arbitrary library safety invariants with respect to its enclosing type.
2. [**Unsafe Usage is Always Unsafe**](#2-Unsafe-Usage-is-Always-Unsafe)
Uses of `unsafe` fields which could violate their invariants *must* occur in the scope of an `unsafe` block.
3. [**Safe Usage is Usually Safe**](#3-Safe-Usage-is-Usually-Safe)
Uses of `unsafe` fields which cannot violate their invariants *should not* require an unsafe block.
## Glossary
- **safety invariant**
A *safety invariant* is a boolean statement about the state of the computer at time *t*.
- **language safety invariant**
A *language safety invariant* is a safety invariant guaranteed by the language such that the compiler may reason about it. Language safety invariants must always hold. For example, a `NonZeroU8` must **never** be `0`.
- **library safety invariant**
A *library safety invariant* is a safety invariant assumed to be true by an API. For example, `str` encapsulates valid UTF-8 bytes, and much of its API assumes this to be true. However, this invariant may be temporarily violated, so long as no code that assumes this safety invariant holds is invoked.
- **implicit constructor**
The implicit constructor of a Rust type is the one that's provided by Rust upon defining a type. For example, defining `struct Foo(u8)` implicitly introduces a function named `Foo` which consumes a `u8` and produces a `Foo`.
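A short sketch of the implicit constructor in today's Rust, using the `Foo` type from the definition above; the helper functions `make_all` and `inner` are hypothetical.

```rust
pub struct Foo(u8);

/// Builds one `Foo` per byte by passing the implicit constructor `Foo`
/// to `map` as an ordinary function.
pub fn make_all(bytes: &[u8]) -> Vec<Foo> {
    bytes.iter().copied().map(Foo).collect()
}

/// Reads the wrapped byte back out.
pub fn inner(foo: &Foo) -> u8 {
    foo.0
}
```

Because `Foo` can be used anywhere a `fn(u8) -> Foo` is expected, any rule about constructing types with `unsafe` fields has to account for this function as well as for struct literal syntax.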
## 1. Unsafe Fields Denote Safety Invariants
> A field *should* be marked `unsafe` if it carries library safety invariants with respect to its enclosing type.
### Rationale
This purpose is consistent with the purpose of the `unsafe` keyword in other declaration positions, where it signals to consumers of the `unsafe` item that their consumption is conditional on upholding safety invariants; for example:
- An `unsafe` trait denotes that it carries safety invariants which must be upheld by implementors.
- An `unsafe` function denotes that it carries safety invariants which must be upheld by callers.
#### SHOULD vs MUST
A field carrying safety invariants *should* — not *must* — be marked `unsafe`.
We cannot programmatically enforce that fields which carry safety invariants are marked `unsafe`, just as we cannot enforce that functions with safety invariants are marked unsafe. The use of `unsafe` in declaration position is a social contract.
We also cannot immediately change Rust's social contract, since doing so would mean that code which is currently compliant with Rust's social contract (which does not and cannot require that `unsafe` fields are marked with `unsafe`) would cease to be compliant. At best, we may be able to evolve Rust's social contract over an edition boundary.
### Example: Field with Local Invariant
In the simplest case, a field's safety invariant is a restriction of the invariants imposed by the field type, and concerns only the immediate value of the field; e.g.:
```rust
struct Alignment {
/// SAFETY: `pow` must be between 0 and 29.
pub unsafe pow: u8,
}
```
### Example: Field with Referent Invariant
A field might carry an invariant with respect to its referent; e.g.:
```rust
struct CacheArcCount<T> {
/// SAFETY: This `Arc`'s `ref_count` must equal the
/// value of the `ref_count` field.
unsafe arc: Arc<T>,
/// SAFETY: See [`CacheArcCount::arc`].
unsafe ref_count: usize,
}
```
### Example: Field with External Invariant
A field might carry an invariant with respect to data outside of the Rust abstract machine.
```rust
struct Zeroator {
/// SAFETY: The fd points to a uniquely-owned file,
/// and the bytes from the start of the file to the
/// offset `cursor` (exclusive) are zero.
unsafe fd: OwnedFd,
/// SAFETY: See [`Zeroator::fd`].
unsafe cursor: usize,
}
```
### Example: Field with Suspended Invariant
A field safety invariant might also be a *relaxation* of the safety invariants imposed by the field type. For example, a `str` is bound by both the language safety invariant that it consists of initialized bytes, and by the library safety invariant that it contains valid UTF-8. It is sound to temporarily violate the library invariant of `str`, so long as the invalid `str` is not exposed to code that might assume `str` validity.
Below, `MaybeInvalidStr` encapsulates an initialized-but-potentially-invalid `str` as an unsafe field:
```rust
struct MaybeInvalidStr<'a> {
/// SAFETY: `maybe_invalid` may not contain valid
/// UTF-8. It MUST always contain initialized
/// bytes (per language safety invariant on `str`).
pub unsafe maybe_invalid: &'a str
}
```
### Counter-Example: Unsafety is Orthogonal to Privacy
Field unsafety is orthogonal to field visibility. An `unsafe` field may be `pub`, just as a safe field may be `pub(self)`; e.g.:
```rust
struct NuclearBriefcase {
/// Do not expose carelessly!
pub(self) launch_code: [u8; 32]
}
```
## 2. Unsafe Usage is Always Unsafe
> Uses of `unsafe` fields which could violate their invariants *must* occur in the scope of an `unsafe` block.
### Rationale
This requirement is consistent with the requirements of the `unsafe` keyword when applied to other declarations; for example:
- An `unsafe` trait may only be implemented with an `unsafe impl`.
- An `unsafe` function is only callable in the scope of an `unsafe` block.
These requirements are not negotiable; likewise, the requirement that risky operations on `unsafe` fields require `unsafe` should also be non-negotiable.
### Implications
#### Implicit Constructors are Unsafe
The implicit constructor of a struct or enum variant with an `unsafe` field must require `unsafe`. Writing, referencing or reading an `unsafe` field must require `unsafe`.
#### Reading Unsafe Fields is Unsafe
An unsafe field with a suspended invariant can only be read from its enclosing type if the reader respects that the value might be in an invalid state. This amounts to a safety invariant: if the value is in an invalid state, subsequent (potentially safe) uses must not require that it is in a valid state.
Consequently, reading an unsafe field must require unsafe.
#### Dropping an Unsafe Field is Unsafe
A field with both:
1. a non-trivial drop
2. a suspended invariant
may not be sound to drop, as its `Drop` impl may depend on the value being in a valid state. Consequently, `unsafe` fields must either be `Copy` or `ManuallyDrop`, both of which preclude non-trivial drops.
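The effect of `ManuallyDrop` can be sketched in today's Rust as follows. The types `Noisy` and `Wrapper` and the method `drop_inner` are hypothetical; `Noisy` exists only to make drops observable through a counter.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Increments a shared counter when dropped, so that drops are observable.
pub struct Noisy<'a>(pub &'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

pub struct Wrapper<'a> {
    /// Stand-in for a field with a suspended invariant: wrapping it in
    /// `ManuallyDrop` means it is never dropped implicitly.
    inner: ManuallyDrop<Noisy<'a>>,
}

impl<'a> Wrapper<'a> {
    pub fn new(counter: &'a Cell<u32>) -> Self {
        Wrapper { inner: ManuallyDrop::new(Noisy(counter)) }
    }

    /// # Safety
    ///
    /// `inner` must be in a state that is valid for its `Drop` impl.
    pub unsafe fn drop_inner(mut self) {
        // SAFETY: The caller guarantees validity, and `self` is consumed,
        // so `inner` is never used or dropped again.
        unsafe { ManuallyDrop::drop(&mut self.inner) }
    }
}
```

Dropping a `Wrapper` by itself leaves the counter untouched; the destructor of `inner` only runs through the explicit, `unsafe` call, which is where the author must argue that the value is valid.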
## 3. Safe Usage is Usually Safe
> Uses of `unsafe` fields which cannot violate their invariants *should not* require an unsafe block.
Given that the use of `unsafe` on fields is a social contract, adherence to that social contract will depend on the UX of *using* `unsafe` fields. We should take care to minimize how often users will be prompted to use `unsafe` for field accesses that *clearly* cannot violate the field's safety invariant.
### Variants with Safe Fields
Given an enum whose variants contain a mix of safe and unsafe fields; e.g.:
```rust
enum Example {
Safe(u8, u8, u8),
Unsafe(u8, unsafe u8, u8),
}
```
It should be safe to initialize and destructure `Example::Safe`, but not `Example::Unsafe`.
### Fields with Local, Non-Suspended Invariants
Fields with local, non-suspended invariants are potentially always safe to read. For example, consider reading out the field `pow` from `Alignment`:
```rust
struct Alignment {
/// SAFETY: `pow` must be between 0 and 29.
pub unsafe pow: u8,
}
```
Outside of the context of `Alignment`, `u8` has no special meaning. It has no library safety invariants (and thus no library safety invariants that might be suspended by the field `pow`), and it is not a pointer or handle to another resource.
The set of safe-to-read types, $S$, includes:
- primitive numeric types
- public, compound types with public constructors whose members are in $S$.
A type-directed analysis could make reads of these field types safe.
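One way to picture such an analysis is as a marker trait, though a real implementation would presumably live in the compiler rather than in library code. The trait `SafeToRead` and the helper `is_safe_to_read` below are hypothetical, and treating tuples and arrays as members of $S$ is an assumption of this sketch.

```rust
/// Hypothetical marker for types in the safe-to-read set S.
pub trait SafeToRead {}

macro_rules! impl_safe_to_read {
    ($($t:ty),* $(,)?) => { $(impl SafeToRead for $t {})* };
}

// Base case: primitive numeric types.
impl_safe_to_read!(
    u8, u16, u32, u64, u128, usize,
    i8, i16, i32, i64, i128, isize,
    f32, f64,
);

// Inductive case: compound types whose members are all in S.
impl<A: SafeToRead, B: SafeToRead> SafeToRead for (A, B) {}
impl<T: SafeToRead, const N: usize> SafeToRead for [T; N] {}

/// Compiles only when `T` is in S.
pub const fn is_safe_to_read<T: SafeToRead>() -> bool {
    true
}
```

With these impls, `is_safe_to_read::<(u8, [u16; 4])>()` type-checks, while `is_safe_to_read::<*const u8>()` is rejected at compile time because raw pointers are not in the set.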
## Design: Syntax
The `unsafe` modifier is applicable to the fields of struct-like variants; e.g.:
```rust
struct ExampleStruct {
a: u8,
unsafe b: u8,
}
enum ExampleEnum {
UnitLike,
TupleLike(u8, u8),
StructLike {
a: u8,
unsafe b: u8,
},
}
union ExampleUnion {
a: u8,
unsafe b: u8,
}
```
It is not applicable to tuple-like variants, as this would admit ambiguous parses; e.g.:
```rust
struct ExampleAmbiguous(unsafe fn());
```
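The ambiguity arises because `unsafe fn()` is already a type in today's Rust: the type of an unsafe function pointer. A minimal sketch, with the hypothetical names `Callback`, `answer`, and `run`:

```rust
/// A tuple struct whose single field has the type `unsafe fn() -> u8`.
/// Here `unsafe` is part of the field's type, not a modifier on the field.
pub struct Callback(unsafe fn() -> u8);

/// # Safety
///
/// This function has no preconditions; it is `unsafe` only so that it can
/// be stored in a `Callback`.
unsafe fn answer() -> u8 {
    42
}

pub fn run() -> u8 {
    let cb = Callback(answer);
    // SAFETY: `answer` has no preconditions.
    unsafe { (cb.0)() }
}
```

If `unsafe` were also accepted as a field modifier in tuple position, the parser would have to decide whether the keyword belongs to the field or to the function pointer type.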
## Design: Interaction with `Copy`
The `Copy` trait is, semantically, an `unsafe` trait whose safety contract is that all members must be `Copy`. However, it is not marked `unsafe` since the compiler enforces this condition automatically on all implementations.
The introduction of unsafe fields creates a declaration-site `unsafe` obligation (namely, that reading is unsafe) that would not be discharged by any use-site `unsafe` in a `Copy` impl, which has no methods that would mention the unsafe fields.
To resolve this, we make `Copy` *conditionally* (un)safe: If `Self` contains `unsafe` fields, `Copy` is `unsafe` to implement; otherwise it remains safe to implement.
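A related illustration in today's Rust shows how a `Copy` impl can bear on a field's invariant without any `unsafe` appearing. The type `UniqueId` and its uniqueness invariant are hypothetical, and the invariant is enforced here by convention only.

```rust
// This derive compiles today with no `unsafe` anywhere in sight.
#[derive(Clone, Copy)]
pub struct UniqueId {
    /// # Safety
    ///
    /// No two live `UniqueId`s may share the same `raw` value.
    raw: u32,
}

impl UniqueId {
    /// # Safety
    ///
    /// The caller must ensure no other live `UniqueId` holds `raw`.
    pub unsafe fn new(raw: u32) -> Self {
        Self { raw }
    }

    pub fn raw(&self) -> u32 {
        self.raw
    }
}

/// Entirely safe code that nonetheless duplicates the identifier, because
/// `Copy` implicitly reads and copies every field.
pub fn duplicate(id: UniqueId) -> (UniqueId, UniqueId) {
    (id, id)
}
```

Under the proposal, the `raw` field would be marked `unsafe`, and implementing `Copy` for `UniqueId` would then require an `unsafe impl`, giving the author a place to justify (or reconsider) the impl.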
## Design: Interaction with Unsafe Auto Traits
If a type has unsafe fields, its safety invariants are not simply the conjunction of its field types' safety invariants. It is therefore invalid to reason about the safety properties of these types in a purely structural manner (i.e., the manner in which auto traits are implemented), and auto implementations of unsafe auto traits should not be generated for types with unsafe fields.
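The structural behavior of auto traits can be observed in today's Rust with a small sketch. The type `ThreadBound`, its invariant, and the helper `is_send` are hypothetical.

```rust
pub struct ThreadBound {
    /// # Safety
    ///
    /// `addr` is the address of data that may only be accessed from the
    /// thread that created this value.
    addr: usize,
}

impl ThreadBound {
    /// # Safety
    ///
    /// `addr` must satisfy the invariant documented on the field.
    pub unsafe fn new(addr: usize) -> Self {
        Self { addr }
    }

    pub fn addr(&self) -> usize {
        self.addr
    }
}

/// Compiles only when `T: Send`.
pub fn is_send<T: Send>() -> bool {
    true
}
```

Because `usize` is `Send`, the compiler concludes today that `ThreadBound` is `Send` as well, even though the documented invariant says otherwise. Suppressing auto implementations for types with `unsafe` fields would require the author to opt in with an explicit `unsafe impl`.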