Unsafe Fields Design Document

Overview

This RFC proposes extending Rust's tooling support for safety hygiene to named fields that carry library safety invariants. Consequently, Rust programmers will be able to use the unsafe keyword to denote when a named field carries a library safety invariant; e.g.:

use std::marker::PhantomData;

struct UnalignedRef<'a, T> {
    /// # Safety
    /// 
    /// `ptr` is a shared reference to a valid-but-unaligned instance of `T`.
    unsafe ptr: *const T,
    _lifetime: PhantomData<&'a T>,
}

Rust will enforce that potentially-invalidating uses of unsafe fields only occur in the context of an unsafe block, and Clippy's missing_safety_doc lint will check that unsafe fields have accompanying safety documentation.

Status Quo and Motivations

Safety hygiene is the practice of denoting and documenting where memory safety obligations arise and where they are discharged. Rust provides some tooling support for this practice. For example, if a function has safety obligations that must be discharged by its callers, that function should be marked unsafe and documentation about its invariants should be provided (this is optionally enforced by Clippy via the missing_safety_doc lint). Consumers, then, must use the unsafe keyword to call it (this is enforced by rustc), and should explain why its safety obligations are discharged (again, optionally enforced by Clippy).

Functions are often marked unsafe because they concern the safety invariants of fields. For example, Vec::set_len is unsafe because it directly manipulates its Vec's length field, which carries the invariants that it is less than or equal to the capacity of the Vec and that all elements in the Vec<T> from index 0 up to (but excluding) len are valid T. It is critical that these invariants are upheld; if they are violated, invoking many of Vec's other, safe methods induces undefined behavior.
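For contrast, a caller that upholds these invariants discharges them in an unsafe block with a // SAFETY comment. A minimal sketch, assuming a hypothetical helper `filled` that builds a Vec<u8> of n copies of a byte by initializing spare capacity before calling set_len:

```rust
/// Hypothetical helper: returns a `Vec<u8>` holding `n` copies of `byte`.
fn filled(n: usize, byte: u8) -> Vec<u8> {
    // `with_capacity(n)` guarantees `capacity() >= n`.
    let mut v: Vec<u8> = Vec::with_capacity(n);
    // Initialize the first `n` slots of the (uninitialized) spare capacity.
    for slot in &mut v.spare_capacity_mut()[..n] {
        slot.write(byte);
    }
    // SAFETY: `n <= capacity()`, and elements `0..n` were initialized above,
    // so both of `set_len`'s invariants hold.
    unsafe { v.set_len(n) };
    v
}
```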

To help ensure these invariants are upheld, programmers may apply safety hygiene techniques to fields, denoting when they carry invariants and documenting why their uses satisfy their invariants. For example, the zerocopy crate maintains the policy that fields with safety invariants have # Safety documentation, and that uses of those fields occur in the lexical context of an unsafe block with a suitable // SAFETY comment.

Unfortunately, Rust does not yet provide tooling for this practice of declaring, discharging, and documenting the safety invariants of fields. Since the unsafe keyword cannot be applied to field definitions, Rust cannot enforce that potentially-invalidating uses of fields occur in the context of unsafe blocks, and thus Clippy cannot enforce that safety comments are present at either definition or use sites. This RFC is motivated by the benefits of closing this tooling gap.

Motivation: Improving Field Safety Hygiene

The absence of safety tooling support for fields makes the practice of good field safety hygiene entirely a matter of programmer discipline, and, consequently, that practice is nascent.

Rust's visibility mechanisms can, to some extent, be (ab)used to help enforce good field safety hygiene. For example, zerocopy's Ptr type is defined in a private def module, which solely contains the datatype definition and an impl containing pub(super) unsafe constructors, getters and setters. All other impls of Ptr are defined outside of this module and therefore must mediate their access to Ptr's private fields through these unsafe functions. This roundabout approach poses significant linguistic friction and may be untenable when split borrows are required. Consequently, this approach is uncommon in the Rust ecosystem.
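A minimal sketch of this def-module pattern, using a hypothetical `Cursor` type rather than zerocopy's actual Ptr. Only the `def` module can touch the fields directly; every other impl must go through the pub(super) accessors, and the setter that could break the invariant `idx <= buf.len()` is unsafe. (The getters here are safe because reading these fields cannot violate the invariant.)

```rust
// Private module containing only the definition and its accessors.
mod def {
    pub struct Cursor<'a> {
        /// # Safety
        ///
        /// Together with `idx`: `idx <= buf.len()`.
        buf: &'a [u8],
        /// # Safety
        ///
        /// See `buf`.
        idx: usize,
    }

    impl<'a> Cursor<'a> {
        /// # Safety
        ///
        /// The caller must ensure `idx <= buf.len()`.
        pub(super) unsafe fn new_unchecked(buf: &'a [u8], idx: usize) -> Self {
            Cursor { buf, idx }
        }

        pub(super) fn buf(&self) -> &'a [u8] {
            self.buf
        }

        pub(super) fn idx(&self) -> usize {
            self.idx
        }

        /// # Safety
        ///
        /// The caller must ensure `idx <= self.buf().len()`.
        pub(super) unsafe fn set_idx(&mut self, idx: usize) {
            self.idx = idx;
        }
    }
}

pub use def::Cursor;

// Everything below is outside `def`, so it must use the accessors.
impl<'a> Cursor<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        // SAFETY: `0 <= buf.len()` always holds.
        unsafe { Cursor::new_unchecked(buf, 0) }
    }

    pub fn remaining(&self) -> &'a [u8] {
        // SAFETY: the type invariant guarantees `idx <= buf.len()`.
        unsafe { self.buf().get_unchecked(self.idx()..) }
    }

    pub fn advance(&mut self, n: usize) {
        let idx = self.idx().saturating_add(n).min(self.buf().len());
        // SAFETY: `idx` was clamped to `buf.len()` on the line above.
        unsafe { self.set_idx(idx) };
    }
}
```

Because every access is a method call that borrows the whole Cursor, code outside `def` cannot, for example, hold a mutable borrow of one field while reading the other; this is the split-borrow friction mentioned above.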

We hope that less friction and better tooling will make good field safety hygiene more common in the Rust ecosystem.

Motivation: Improving Function Safety Hygiene

Rust's safety tooling ensures that unsafe operations may only occur in the lexical context of an unsafe block or function. If the safety obligations of an operation cannot be discharged entirely by an unsafe block, then the surrounding function must, itself, be unsafe. This tooling cue nudges programmers towards good function safety hygiene.

But, presently, it has a shortcoming: dangerous field uses are not linted against. The unsafe Vec::set_len method, for example, contains entirely safe code. There is no tooling cue that suggests this function should be unsafe — only programmer knowledge. Extending safety tooling to fields will close this gap.
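The gap can be reproduced today. Below is a minimal sketch with a hypothetical `RawLen` type that mirrors only Vec's length/capacity bookkeeping (there is no actual buffer): the body of its unsafe set_len is ordinary safe code, and a safe method can break the same invariant without any tooling complaint.

```rust
/// Hypothetical stand-in for `Vec`'s length bookkeeping (no actual buffer).
pub struct RawLen {
    /// Safety invariant: `len <= cap`.
    len: usize,
    cap: usize,
}

impl RawLen {
    pub fn with_capacity(cap: usize) -> Self {
        RawLen { len: 0, cap }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn capacity(&self) -> usize {
        self.cap
    }

    /// # Safety
    ///
    /// `new_len` must be less than or equal to `self.capacity()`.
    pub unsafe fn set_len(&mut self, new_len: usize) {
        // Entirely safe code: rustc would accept this body in a safe `fn`
        // too, so nothing but programmer knowledge calls for `unsafe` here.
        self.len = new_len;
    }

    /// A safe method that violates `len <= cap`; rustc accepts it silently.
    pub fn bump(&mut self) {
        self.len += 1;
    }
}
```

If len were an unsafe field, both the assignment in set_len and the increment in bump would require an unsafe block, surfacing the problem in bump.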

Motivation: Making Rust Easier to Audit

To evaluate the soundness of unsafe code (i.e., code which relies on safety invariants being upheld), it is not enough for reviewers to check the contents of unsafe blocks — they must check all places (including safe contexts) in which safety invariants might be violated. (See The Scope of Unsafe.) This is, in large part, because safety tooling does not extend to fields. Consequently, safety invariants may be violated at-a-distance in safe code, and safety audits must therefore carefully consider distant safe code.

Crates that practice good field safety hygiene will be easier to review. While reviewers must still ensure that fields which carry safety invariants are actually marked unsafe, having done so, they may largely limit their review to unsafe code and (in the absence of unsafe local bindings) safe code in the same function.

Other Benefits

  • Unsafe fields give the libs team a knob with which to revisit the (un)safety of traits without breaking users, since a trait can be made conditionally safe to implement, depending on whether the implementing type has unsafe fields. This RFC proposes that Copy gets this treatment; Unpin and UnwindSafe may also be compelling candidates.

Design Tenets

The design of unsafe fields is guided by three tenets:

  1. Unsafe Fields Denote Safety Invariants
    A field should be marked unsafe if it carries arbitrary library safety invariants with respect to its enclosing type.
  2. Unsafe Usage is Always Unsafe
    Uses of unsafe fields which could violate their invariants must occur in the scope of an unsafe block.
  3. Safe Usage is Usually Safe
    Uses of unsafe fields which cannot violate their invariants should not require an unsafe block.

Glossary

  • safety invariant
    A safety invariant is a boolean statement about the state of the computer at time t.
  • language safety invariant
    A language safety invariant is a safety invariant guaranteed by the language such that the compiler may reason about it. Language safety invariants must always hold. For example, a NonZeroU8 must never be 0.
  • library safety invariant
    A library safety invariant is a safety invariant assumed to be true by an API. For example, str encapsulates valid UTF-8 bytes, and much of its API assumes this to be true. However, this invariant may be temporarily violated, so long as no code that assumes this safety invariant holds is invoked.
  • implicit constructor
    The implicit constructor of a Rust type is the one that's provided by Rust upon defining a type. For example, defining struct Foo(u8) implicitly introduces a function named Foo which consumes a u8 and produces a Foo.

1. Unsafe Fields Denote Safety Invariants

A field should be marked unsafe if it carries library safety invariants with respect to its enclosing type.

Rationale

This purpose is consistent with the purpose of the unsafe keyword in other declaration positions, where it signals to consumers of the unsafe item that their consumption is conditional on upholding safety invariants; for example:

  • An unsafe trait denotes that it carries safety invariants which must be upheld by implementors.
  • An unsafe function denotes that it carries safety invariants which must be upheld by callers.

SHOULD vs MUST

A field carrying safety invariants should — not must — be marked unsafe.

We cannot programmatically enforce that fields which carry safety invariants are marked unsafe, just as we cannot enforce that functions with safety invariants are marked unsafe. The use of unsafe in declaration position is a social contract.

We also cannot immediately change Rust's social contract, since doing so would mean that code which is currently compliant with it (which does not, and cannot, require that fields carrying safety invariants be marked unsafe) would cease to be compliant. At best, we may be able to evolve Rust's social contract over an edition boundary.

Example: Field with Local Invariant

In the simplest case, a field's safety invariant is a restriction of the invariants imposed by the field type, and concerns only the immediate value of the field; e.g.:

struct Alignment {
    /// SAFETY: `pow` must be between 0 and 29.
    pub unsafe pow: u8,
}
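Without unsafe fields, the closest approximation today is privacy. A minimal stable-Rust sketch, assuming that "between 0 and 29" is inclusive and using hypothetical new, new_unchecked, pow and get methods: code outside the alignment module can neither write pow nor use the implicit constructor, but code inside the module still can without any unsafe block, which is exactly the gap this RFC targets.

```rust
pub mod alignment {
    pub struct Alignment {
        /// # Safety
        ///
        /// `pow` must be between 0 and 29 (inclusive).
        pow: u8,
    }

    impl Alignment {
        /// Checked constructor: safe, because it validates `pow`.
        pub fn new(pow: u8) -> Option<Self> {
            if pow <= 29 {
                // SAFETY: `pow <= 29` was checked on the line above.
                Some(unsafe { Self::new_unchecked(pow) })
            } else {
                None
            }
        }

        /// # Safety
        ///
        /// The caller must ensure `pow <= 29`.
        pub const unsafe fn new_unchecked(pow: u8) -> Self {
            Alignment { pow }
        }

        /// Reading `pow` cannot violate its invariant, so this getter is safe.
        pub fn pow(&self) -> u8 {
            self.pow
        }

        /// The alignment in bytes, `2^pow`; cannot overflow since `pow <= 29`.
        pub fn get(&self) -> u32 {
            1u32 << self.pow
        }
    }
}
```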

Example: Field with Referent Invariant

A field might carry an invariant with respect to its referent; e.g.:

struct CacheArcCount<T> {
    /// SAFETY: This `Arc`'s `ref_count` must equal the
    /// value of the `ref_count` field.
    unsafe arc: Arc<T>,
    /// SAFETY: See [`CacheArcCount::arc`].
    unsafe ref_count: usize,
}

Example: Field with External Invariant

A field might carry an invariant with respect to data outside of the Rust abstract machine.

struct Zeroator {
    /// SAFETY: The fd points to a uniquely-owned file,
    /// and the bytes from the start of the file to the 
    /// offset `cursor` (exclusive) are zero.
    unsafe fd: OwnedFd,
    /// SAFETY: See [`Zeroator::fd`].
    unsafe cursor: usize,
}

Example: Field with Suspended Invariant

A field safety invariant might also be a relaxation of the safety invariants imposed by the field type. For example, a str is bound both by the language safety invariant that it consists of initialized bytes, and by the library safety invariant that it contains valid UTF-8. It is sound to temporarily violate the library invariant of str, so long as the invalid str is not exposed to code that might assume str validity.

Below, MaybeInvalidStr encapsulates an initialized-but-potentially-invalid str as an unsafe field:

struct MaybeInvalidStr<'a> {
    /// SAFETY: `maybe_invalid` may not contain valid
    /// UTF-8. It MUST always contain initialized
    /// bytes (per language safety invariant on `str`).
    pub unsafe maybe_invalid: &'a str
}

Counter-Example: Unsafety is Orthogonal to Privacy

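Field unsafety is orthogonal to field visibility. An unsafe field may be pub, just as a safe field may be pub(self). Conversely, privacy alone is no reason to mark a field unsafe: below, launch_code is private because it is sensitive, but it carries no safety invariant, so it is not marked unsafe: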

struct NuclearBriefcase {
    /// Do not expose carelessly!
    pub(self) launch_code: [u8; 32]
}

2. Unsafe Usage is Always Unsafe

Uses of unsafe fields which could violate their invariants must occur in the scope of an unsafe block.

Rationale

This requirement is consistent with the requirements of the unsafe keyword when applied to other declarations; for example:

  • An unsafe trait may only be implemented with an unsafe impl.
  • An unsafe function is only callable in the scope of an unsafe block.

These requirements are not negotiable; likewise, the requirement that risky operations on unsafe fields require unsafe should also be non-negotiable.

Implications

Implicit Constructors are Unsafe

The implicit constructor of a struct or enum variant with an unsafe field must require unsafe. Writing, referencing or reading an unsafe field must require unsafe.

Reading Unsafe Fields is Unsafe

An unsafe field with a suspended invariant can only be read from its enclosing type if the reader respects that the value might be in an invalid state. This amounts to a safety invariant: if the value is in an invalid state, subsequent (potentially safe) uses must not require that it is in a valid state.

Consequently, reading an unsafe field must require unsafe.

Dropping an Unsafe Field is Unsafe

A field with both:

  1. a non-trivial drop, and
  2. a suspended invariant

may not be sound to drop, as its Drop impl may depend on the value being in a valid state. Consequently, unsafe fields must either be Copy or ManuallyDrop, both of which preclude non-trivial drops.
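A minimal sketch of the ManuallyDrop option, using hypothetical `Holder` and `Noisy` types: a ManuallyDrop field is never dropped implicitly, so running its destructor becomes an explicit step, and ManuallyDrop::drop is itself an unsafe fn. (The Copy option works because a Copy type cannot implement Drop at all.)

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Hypothetical type whose destructor records that it ran.
pub struct Noisy<'a>(pub &'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Hypothetical holder; `field` stands in for an unsafe field.
pub struct Holder<T> {
    /// # Safety
    ///
    /// `field` may be in an invalid state, so it is never dropped implicitly.
    field: ManuallyDrop<T>,
}

impl<T> Holder<T> {
    pub fn new(value: T) -> Self {
        Holder { field: ManuallyDrop::new(value) }
    }

    /// # Safety
    ///
    /// The caller must ensure `field` is in a valid state, since its `Drop`
    /// impl may rely on that.
    pub unsafe fn drop_field(mut self) {
        // SAFETY: the caller guarantees validity, and `self` is consumed, so
        // the field cannot be used (or dropped again) afterwards.
        unsafe { ManuallyDrop::drop(&mut self.field) };
    }
}
```

Dropping a Holder without calling drop_field simply leaks the field, which is safe; it never runs a destructor on a value that might be invalid.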

3. Safe Usage is Usually Safe

Uses of unsafe fields which cannot violate their invariants should not require an unsafe block.

Given that the use of unsafe on fields is a social contract, adherence to that social contract will depend on the UX of using unsafe fields. We should take care to minimize how often users will be prompted to use unsafe for field accesses that clearly cannot violate the field's safety invariant.

Variants with Safe Fields

Given an enum whose variants contain a mix of safe and unsafe fields; e.g.:

enum Example {
    Safe(u8, u8, u8),
    Unsafe(u8, unsafe u8, u8),
}

It should be safe to initialize and destructure Example::Safe, but not Example::Unsafe.

Fields with Local, Non-Suspended Invariants

Fields with local, non-suspended invariants are potentially always safe to read. For example, consider reading out the field pow from Alignment:

struct Alignment {
    /// SAFETY: `pow` must be between 0 and 29.
    pub unsafe pow: u8,
}

Outside of the context of Alignment, u8 has no special meaning. It has no library safety invariants (and thus no library safety invariants that might be suspended by the field pow), and it is not a pointer or handle to another resource.

The set of safe-to-read types, \(S\), includes:

  • primitive numeric types
  • public, compound types with public constructors whose members are in \(S\).

A type-directed analysis could make reads of these field types safe.
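One hypothetical shape for such an analysis, sketched as a library on stable Rust: an unsafe marker trait SafeToRead (a name assumed here, not part of this proposal) is implemented for the primitive numeric types and, structurally, for tuples and arrays of its members. A wrapper standing in for an unsafe field (also hypothetical) then offers a safe read only when the field type is in \(S\); every other read stays unsafe.

```rust
/// Hypothetical marker for the set S of safe-to-read types. Implementing it
/// promises the type has no library safety invariants and is not a pointer
/// or handle to another resource.
pub unsafe trait SafeToRead: Copy {}

macro_rules! safe_to_read {
    ($($t:ty),*) => { $(unsafe impl SafeToRead for $t {})* };
}

// Primitive numeric types are in S.
safe_to_read!(u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize, f32, f64);

// Public compound types whose members are in S are themselves in S.
unsafe impl<A: SafeToRead, B: SafeToRead> SafeToRead for (A, B) {}
unsafe impl<T: SafeToRead, const N: usize> SafeToRead for [T; N] {}

/// Hypothetical wrapper standing in for an `unsafe` field.
pub struct UnsafeField<T>(T);

impl<T> UnsafeField<T> {
    /// # Safety
    ///
    /// `value` must satisfy the field's safety invariant.
    pub unsafe fn new(value: T) -> Self {
        UnsafeField(value)
    }

    /// # Safety
    ///
    /// The caller must not assume more than the field's (possibly
    /// suspended) invariant about the returned value.
    pub unsafe fn get_ref(&self) -> &T {
        &self.0
    }
}

impl<T: SafeToRead> UnsafeField<T> {
    /// Copying out a safe-to-read value cannot violate the invariant.
    pub fn read(&self) -> T {
        self.0
    }
}
```

Calling read on an UnsafeField<String> does not compile, since String is not in \(S\); such fields can only be accessed through the unsafe get_ref.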

Design: Syntax

The unsafe modifier is applicable to the fields of struct-like variants; e.g.:

struct ExampleStruct {
    a: u8,
    unsafe b: u8,
}

enum ExampleEnum {
    UnitLike,
    TupleLike(u8, u8),
    StructLike {
        a: u8,
        unsafe b: u8,
    },
}

union ExampleUnion {
    a: u8,
    unsafe b: u8,
}

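It is not applicable to the fields of tuple-like variants, as this would admit ambiguous parses; e.g., the following could declare either an unsafe field of type fn() or a safe field of type unsafe fn():

struct ExampleAmbiguous(unsafe fn());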

Design: Interaction with Copy

The Copy trait is, semantically, an unsafe trait whose safety contract is that all members must be Copy. However, it is not marked unsafe since the compiler enforces this condition automatically on all implementations.

The introduction of unsafe fields creates a declaration-site safety obligation (namely, that reading an unsafe field is unsafe) that a Copy impl would not discharge at any use site: copying a value implicitly reads all of its fields, but a Copy impl has no methods that mention those fields, and so no place for an unsafe block.

To resolve this, we make Copy conditionally (un)safe: If Self contains unsafe fields, Copy is unsafe to implement; otherwise it remains safe to implement.

Design: Interaction with Unsafe Auto Traits

If a type has unsafe fields, its safety invariants are not simply the conjunction of its field types' safety invariants. It is therefore invalid to reason about the safety properties of such types in a purely structural manner, which is precisely how auto traits are implemented. Consequently, auto implementations of unsafe auto traits (such as Send and Sync) should not be generated for types with unsafe fields.
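Today, authors must opt out of such structural impls by hand. A minimal sketch, assuming a hypothetical ThreadBound handle whose invariant ties it to the thread that created it: its only data field is a usize, which is Send and Sync, so the compiler would implement both auto traits structurally; a PhantomData<*const ()> marker suppresses them.

```rust
use std::marker::PhantomData;

thread_local! {
    // Hypothetical per-thread value that `ThreadBound` refers to.
    static SLOT: u32 = 0;
}

/// Hypothetical handle that is only meaningful on its creating thread.
pub struct ThreadBound {
    /// Safety invariant: `addr` is the address of the creating thread's
    /// `SLOT`. A `usize` alone is `Send + Sync`, so structural reasoning
    /// would wrongly let this handle cross threads.
    addr: usize,
    /// `*const ()` is neither `Send` nor `Sync`, so this zero-sized marker
    /// stops the compiler from implementing either trait structurally.
    _not_send_sync: PhantomData<*const ()>,
}

impl ThreadBound {
    pub fn new() -> Self {
        let addr = SLOT.with(|s| s as *const u32 as usize);
        ThreadBound { addr, _not_send_sync: PhantomData }
    }

    pub fn addr(&self) -> usize {
        self.addr
    }
}
```

With unsafe fields, marking addr unsafe would suppress the auto impls without the marker.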


unsafe(invalid)
