Field Projection Use Cases

Overview

Field projections are a way to turn a pointer to a struct into a pointer to a field of that struct. The definition of "pointer" is rather broad: it includes any smart pointer type or reference as well as raw pointers. Essentially, the operation is adding the field offset to the pointer and then casting the pointer to the type of the field in the struct.
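
As a rough illustration of "add the field offset, then cast", the following sketch performs the operation by hand on a raw pointer in today's Rust and compares it with what addr_of_mut! produces. The Pair struct and the helpers project_b and project_and_write are made up for this illustration; offset_of! and addr_of_mut! are the existing standard library macros.

```rust
use std::mem::offset_of;
use std::ptr::addr_of_mut;

// Hypothetical struct, used only for this illustration.
pub struct Pair {
    pub a: u8,
    pub b: u32,
}

/// Manual projection `*mut Pair -> *mut u32`: add the offset of `b`, then cast.
///
/// # Safety
/// `ptr` must point into a live allocation that is large enough for a `Pair`.
pub unsafe fn project_b(ptr: *mut Pair) -> *mut u32 {
    unsafe { ptr.cast::<u8>().add(offset_of!(Pair, b)).cast::<u32>() }
}

/// Writes `value` into `pair.b` through the hand-projected pointer and reports
/// whether the manual projection agrees with `addr_of_mut!`.
pub fn project_and_write(pair: &mut Pair, value: u32) -> bool {
    let ptr: *mut Pair = pair;
    // SAFETY: `ptr` is derived from a live, exclusive reference to a `Pair`.
    let by_hand = unsafe { project_b(ptr) };
    // SAFETY: same as above; no reference to the field is created.
    let by_macro = unsafe { addr_of_mut!((*ptr).b) };
    // SAFETY: `by_hand` is valid for writes and aligned for `u32`.
    unsafe { by_hand.write(value) };
    by_hand == by_macro
}
```

With field projections, both spellings would collapse into ptr->b.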

The main motivation for field projections is to make pin projections ergonomic. However, they also make it possible to express other ideas succinctly, such as writing addr_of!((*ptr).field) or &raw const (*ptr).field as ptr->field. In addition, they often come up in Rust for Linux as a way to safely wrap an existing C API.

This document explains all of the main use cases for abstract field projections, with examples outlining what they would look like. The word "abstract" conveys that a specific implementation of field projections might enable additional use cases. An implementation using a compiler-internal type for every field of a struct will be explained in a future document. That approach has other potential use cases, also in the context of Rust for Linux.

This document assumes that a new operator "->" will be introduced solely for field projections. This new operator is crucial in ensuring the ergonomics of this new feature.

While any kind of feedback is of course welcome, this document is only concerned with conveying the use cases for field projections, and thus the motivation for bringing them into the language. Discussions of implementation, edge cases and general bikeshedding should be postponed until the RFC is updated.

Throughout the examples, we will use the following struct definitions:

struct Data {
    cfg: Config,
    items: Vec<i32>,
}

struct Config {
    name: &'static str,
    port: u16,
    stats: StatsConfig,
}

struct StatsConfig {
    level: u8,
}

Simple Use Cases

There is some low-hanging fruit that we get in addition to the more complicated use cases given below. While not the main motivation, these simple cases might help you understand the concept of field projection better. Additionally, the field projection operator -> will make these use cases much more ergonomic than they are at the moment.

Pointers

All pointers listed below behave the same way: the projection operation just returns a pointer to the projected field. So given a variable ptr: P<Data> (where P<T> is one of the pointer types listed below), one can use ptr->cfg to get a P<Config>.

&T
&mut T
*const T
*mut T
NonNull<T>
cell::Ref<'a, T>
cell::RefMut<'a, T>

Users should also be able to define their own pointer types.

Examples

Initialization using Pointers

impl Data {
    unsafe fn raw_init(ptr: NonNull<Self>) {
        unsafe {
            let cfg: NonNull<Config> = ptr->cfg;
            let cfg: *mut Config = cfg.as_ptr();
            Config::raw_init(cfg);
            ptr->items.write(vec![]);
        }
    }
}

impl Config {
    unsafe fn raw_init(ptr: *mut Self) {
        unsafe {
            ptr->port.write(8080);
            ptr->name.write("no name configured");
            StatsConfig::raw_init(ptr->stats);
        }
    }
}

impl StatsConfig {
    unsafe fn raw_init(ptr: *mut Self) {
        unsafe {
            ptr->level.write(0);
        }
    }
}
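
For comparison, the same initialization can be sketched in today's Rust, where every ptr->field has to be spelled with addr_of_mut!. The struct definitions are repeated so that the block is self-contained; the new_data helper is an addition for this illustration and not part of the example above.

```rust
use std::mem::MaybeUninit;
use std::ptr::{addr_of_mut, NonNull};

pub struct Data {
    pub cfg: Config,
    pub items: Vec<i32>,
}

pub struct Config {
    pub name: &'static str,
    pub port: u16,
    pub stats: StatsConfig,
}

pub struct StatsConfig {
    pub level: u8,
}

impl Data {
    /// # Safety
    /// `ptr` must be aligned and valid for writes; the pointee may be uninitialized.
    pub unsafe fn raw_init(ptr: NonNull<Self>) {
        let ptr = ptr.as_ptr();
        unsafe {
            // Today's spelling of `ptr->cfg` and `ptr->items`.
            Config::raw_init(addr_of_mut!((*ptr).cfg));
            addr_of_mut!((*ptr).items).write(Vec::new());
        }
    }
}

impl Config {
    /// # Safety
    /// Same requirements as `Data::raw_init`.
    pub unsafe fn raw_init(ptr: *mut Self) {
        unsafe {
            addr_of_mut!((*ptr).port).write(8080);
            addr_of_mut!((*ptr).name).write("no name configured");
            StatsConfig::raw_init(addr_of_mut!((*ptr).stats));
        }
    }
}

impl StatsConfig {
    /// # Safety
    /// Same requirements as `Data::raw_init`.
    pub unsafe fn raw_init(ptr: *mut Self) {
        unsafe { addr_of_mut!((*ptr).level).write(0) };
    }
}

/// Hypothetical helper: initializes a `Data` in place and returns it.
pub fn new_data() -> Data {
    let mut slot = MaybeUninit::<Data>::uninit();
    // SAFETY: the pointer is non-null, aligned and valid for writes, and
    // `raw_init` writes every field before `assume_init` is called.
    unsafe {
        Data::raw_init(NonNull::new_unchecked(slot.as_mut_ptr()));
        slot.assume_init()
    }
}
```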

`RefCell`

let cell = RefCell::new(Data { /* ... */ });
let cfg = cell.borrow_mut()->cfg;
*cfg->port = 42;
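
In today's Rust, the closest equivalent is RefMut::map, which needs one closure per projection step. The sketch below uses trimmed-down versions of Data and Config (only the fields needed here), and set_port is a name chosen for this illustration.

```rust
use std::cell::{RefCell, RefMut};

// Trimmed-down versions of the structs used throughout this document.
pub struct Config {
    pub port: u16,
}

pub struct Data {
    pub cfg: Config,
}

pub fn set_port(cell: &RefCell<Data>, port: u16) {
    // Stand-in for `cell.borrow_mut()->cfg`.
    let cfg: RefMut<'_, Config> = RefMut::map(cell.borrow_mut(), |data| &mut data.cfg);
    // Stand-in for `cfg->port`.
    let mut port_ref: RefMut<'_, u16> = RefMut::map(cfg, |cfg| &mut cfg.port);
    *port_ref = port;
}
```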

Containers

A "container" is a generic repr(transparent) type that has the same layout as one of its generic parameters. Containers often add or remove "properties" of the wrapped value. Since they affect every field of a struct in the same way, projecting through them is possible. Every container type C<T> can be projected through any of the previously mentioned pointer types: given ptr: P<C<Data>>, we have ptr->cfg: P<C<Config>>. Container types are:

MaybeUninit<T>
UnsafeCell<T>
Cell<T>

Users should also be able to define their own container types.

Examples

fn set_stats_level(data: &Cell<Data>, level: u8) {
    data->cfg->stats->level.set(level);
}

fn safer_init(data: &mut MaybeUninit<Data>) {
    data->cfg->stats->level.write(0);
    data->cfg->port.write(8080);
    data->cfg->name.write("no name configured");
    data->items.write(vec![]);
}

// doesn't fit on the stack!
struct BigData {
    data: [u8; 1024 * 1024 * 1024],
}

fn init_big_data_to_zero(data: &mut MaybeUninit<BigData>) {
    let ptr: *mut [u8; 1024 * 1024 * 1024] = data->data.as_mut_ptr();
    unsafe { ptr.write_bytes(0, 1) };
}
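
To show what the Cell example replaces, here is a sketch of the same projection written by hand in today's Rust. It relies on the documented guarantee that Cell<T> has the same in-memory representation as T. The structs are trimmed-down versions of the ones used in this document, and project_level is a hypothetical helper standing in for data->cfg->stats->level.

```rust
use std::cell::Cell;
use std::ptr::addr_of;

// Trimmed-down versions of the structs used throughout this document.
pub struct StatsConfig {
    pub level: u8,
}

pub struct Config {
    pub port: u16,
    pub stats: StatsConfig,
}

pub struct Data {
    pub cfg: Config,
}

/// Hand-written stand-in for `data->cfg->stats->level`.
pub fn project_level(data: &Cell<Data>) -> &Cell<u8> {
    let ptr: *mut Data = data.as_ptr();
    // SAFETY: `Cell<u8>` has the same in-memory representation as `u8`, the
    // field lives inside the `Cell<Data>` that is borrowed for the returned
    // lifetime, and `Cell` is not `Sync`, so no other thread can access it.
    unsafe { &*(addr_of!((*ptr).cfg.stats.level) as *const Cell<u8>) }
}

pub fn set_stats_level(data: &Cell<Data>, level: u8) {
    project_level(data).set(level);
}
```

The unsafe block has to be repeated and re-justified for every field, which is exactly the boilerplate that a projection through Cell would remove.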

Complicated Use Cases

In addition to the simple use cases above, field projection will make these more complicated operations not only safe, but also ergonomic. These more complex use cases require a way to mark the fields of a struct in a particular way.

`Pin<P>`

Fields of a struct can be structurally pinned. If the field cfg is structurally pinned, then given ptr: Pin<P<Data>>, we obtain via projection ptr->cfg: Pin<P<Config>>. If cfg is not structurally pinned, we get ptr->cfg: P<Config>, so the wrapper type Pin is only preserved when the field is structurally pinned. One way of marking the fields would be by annotating them with #[pin].

This complicates the idea of field projection quite a lot: the return type of the -> operator now depends on a property of the field. With this added complexity, we also gain new expressiveness that Rust for Linux can take advantage of.

Examples

struct FairRaceFuture<F1, F2> {
    #[pin]
    f1: F1,
    #[pin]
    f2: F2,
    fair: bool,
}

impl<F1, F2> Future for FairRaceFuture<F1, F2>
where
    F1: Future,
    F2: Future<Output = F1::Output>,
{
    type Output = F1::Output;

    fn poll(self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<Self::Output> {
        // Since `fair` is not marked with `#[pin]`, we don't get a pinned reference here
        let fair: &mut bool = self->fair;
        *fair ^= true;
        if *fair {
            // For `f1` we do get a pinned reference, since `f1` is annotated with `#[pin]`.
            let f1: Pin<&mut F1> = self->f1;
            match f1.poll(ctx) {
                Poll::Ready(value) => Poll::Ready(value),
                Poll::Pending => self->f2.poll(ctx),
            }
        } else {
            match self->f2.poll(ctx) {
                Poll::Ready(value) => Poll::Ready(value),
                Poll::Pending => self->f1.poll(ctx),
            }
        }
    }
}
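
For comparison, here is a sketch of how the same future can be written in today's Rust, where the projections have to be done by hand with unsafe code (or generated by a macro crate such as pin-project). The constructor new and the helpers NoopWake and poll_once are not part of the example above; they only exist so that the future can be built and polled once.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub struct FairRaceFuture<F1, F2> {
    f1: F1,     // treated as structurally pinned
    f2: F2,     // treated as structurally pinned
    fair: bool, // not structurally pinned
}

impl<F1, F2> FairRaceFuture<F1, F2> {
    pub fn new(f1: F1, f2: F2) -> Self {
        Self { f1, f2, fair: false }
    }
}

impl<F1, F2> Future for FairRaceFuture<F1, F2>
where
    F1: Future,
    F2: Future<Output = F1::Output>,
{
    type Output = F1::Output;

    fn poll(self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<Self::Output> {
        // SAFETY: nothing is moved out of `this`, and `f1` / `f2` are only
        // ever accessed through `Pin` below.
        let this = unsafe { self.get_unchecked_mut() };
        // Stand-in for `self->fair`: a plain `&mut bool`.
        this.fair ^= true;
        // Stand-ins for `self->f1` and `self->f2`.
        // SAFETY: the fields are never moved while the struct is pinned.
        let f1 = unsafe { Pin::new_unchecked(&mut this.f1) };
        let f2 = unsafe { Pin::new_unchecked(&mut this.f2) };
        if this.fair {
            match f1.poll(ctx) {
                Poll::Ready(value) => Poll::Ready(value),
                Poll::Pending => f2.poll(ctx),
            }
        } else {
            match f2.poll(ctx) {
                Poll::Ready(value) => Poll::Ready(value),
                Poll::Pending => f1.poll(ctx),
            }
        }
    }
}

// Hypothetical helpers for trying the future out without an executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut ctx = Context::from_waker(&waker);
    fut.poll(&mut ctx)
}
```

Every unsafe block here carries a proof obligation that the #[pin] annotations together with the -> operator would discharge automatically.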

RCU

RCU stands for read-copy-update. It is a creative locking mechanism that is very efficient for data that is seldom updated but read very often. Below you can find a small summary of how I understand it to work. There are no guarantees that I am 100% correct; if you want to make sure that you have a correct understanding of how RCU works, please read the sources provided in the next section.

It takes quite a lot of explaining before I can express why field projection comes up in this instance. However, in this case (similar to Pin) it is, to my knowledge, impossible to write a safe API without field projections, so they would be invaluable for this use case.

Explaining RCU

For a much more extensive explanation, please see https://docs.kernel.org/RCU/whatisRCU.html. Since the first paragraph of the first section is invaluable in understanding RCU, it is quoted here for the reader's convenience:

The basic idea behind RCU is to split updates into “removal” and “reclamation” phases. The removal phase removes references to data items within a data structure (possibly by replacing them with references to new versions of these data items), and can run concurrently with readers. The reason that it is safe to run the removal phase concurrently with readers is the semantics of modern CPUs guarantee that readers will see either the old or the new version of the data structure rather than a partially updated reference. The reclamation phase does the work of reclaiming (e.g., freeing) the data items removed from the data structure during the removal phase. Because reclaiming data items can disrupt any readers concurrently referencing those data items, the reclamation phase must not start until readers no longer hold references to those data items.

In C, RCU is used like this:

the data protected by RCU sits behind a pointer.
readers must use the rcu_read_lock() and rcu_read_unlock() functions when accessing any data protected by RCU. Within this critical section, blocking is forbidden.
read accesses of the pointer must only be done via rcu_dereference(<pointer>).
write accesses of the pointer must be done via rcu_assign_pointer(<old-pointer>, <new-pointer>).
before a writer frees the old value (i.e. enters the reclamation phase), it must call synchronize_rcu().
multiple writers still require some other kind of locking mechanism.

synchronize_rcu() waits for all existing read-side critical sections to complete. It does not have to wait for new read-side critical sections that are begun after it has been called.
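
To make the phases above more concrete, here is a toy userspace model of the protocol. It is only a sketch under strong simplifications and is not how the kernel implements RCU: readers are tracked with an atomic counter (real RCU avoids exactly that cost), and the grace period waits for a moment with no readers at all rather than only for pre-existing ones. The names ToyRcu, ReadGuard and update are made up for this illustration.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

/// Toy model of an RCU-protected pointer (hypothetical, not the kernel API).
pub struct ToyRcu<T: Send + Sync> {
    ptr: AtomicPtr<T>,
    readers: AtomicUsize,
}

/// Creating a guard models `rcu_read_lock()`, dropping it `rcu_read_unlock()`.
pub struct ReadGuard<'a, T: Send + Sync> {
    rcu: &'a ToyRcu<T>,
}

impl<T: Send + Sync> ToyRcu<T> {
    pub fn new(value: T) -> Self {
        Self {
            ptr: AtomicPtr::new(Box::into_raw(Box::new(value))),
            readers: AtomicUsize::new(0),
        }
    }

    pub fn read_lock(&self) -> ReadGuard<'_, T> {
        self.readers.fetch_add(1, Ordering::SeqCst);
        ReadGuard { rcu: self }
    }

    /// Publishes `new` and returns the old value once no reader can see it.
    /// Must not be called while the calling thread holds a `ReadGuard`,
    /// otherwise it spins forever.
    pub fn update(&self, new: T) -> T {
        // Removal phase (models `rcu_assign_pointer()`): readers that start
        // after this swap only see the new value.
        let old = self.ptr.swap(Box::into_raw(Box::new(new)), Ordering::SeqCst);
        // Grace period (models `synchronize_rcu()`), simplified to waiting
        // until there are no readers at all.
        while self.readers.load(Ordering::SeqCst) != 0 {
            std::hint::spin_loop();
        }
        // Reclamation phase.
        // SAFETY: `old` came from `Box::into_raw`, and no reader can still
        // hold a reference to it after the grace period.
        unsafe { *Box::from_raw(old) }
    }
}

impl<T: Send + Sync> Deref for ReadGuard<'_, T> {
    type Target = T;

    fn deref(&self) -> &T {
        // Models `rcu_dereference()`.
        // SAFETY: the pointer is always valid, and it is not freed while this
        // guard keeps the reader count above zero.
        unsafe { &*self.rcu.ptr.load(Ordering::SeqCst) }
    }
}

impl<T: Send + Sync> Drop for ReadGuard<'_, T> {
    fn drop(&mut self) {
        self.rcu.readers.fetch_sub(1, Ordering::SeqCst);
    }
}

impl<T: Send + Sync> Drop for ToyRcu<T> {
    fn drop(&mut self) {
        // SAFETY: `&mut self` proves that no guard (and thus no reader) exists.
        unsafe { drop(Box::from_raw(*self.ptr.get_mut())) };
    }
}
```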

The big advantage of RCU is that in certain kernel configurations, (un)locking the RCU read lock is achieved with absolutely no instructions.

Wrapping RCU in a Safe Abstraction

In Rust, we will of course use a guard for the RCU read lock, so we have

mod rcu {
    pub struct Guard(/* ... */);

    impl Drop for Guard { /* ... */ }

    pub fn read_lock() -> Guard;
}

The pointers that are protected by RCU must be specially tagged, so we introduce the Rcu type. It exposes the Rust equivalents of rcu_dereference and rcu_assign_pointer:

mod rcu {
    pub struct Rcu<P> {
        inner: UnsafeCell<P>,
        // we require this to opt-out of uniqueness of `&mut`.
        // if `UnsafePinned` were available, we would use that instead.
        _phantom: PhantomPinned,
    }
    
    impl<P: Deref> Rcu<P> {
        pub fn read<'a>(&'a self, _guard: &'a Guard) -> &'a P::Target;
        pub fn set(self: Pin<&mut Self>, new: P) -> Old<P>;
    }

    pub struct Old<P>(/* ... */);
    
    impl<P> Drop for Old<P> {
        fn drop(&mut self) {
            unsafe { bindings::synchronize_rcu() };
        }
    }
}

The Old type is responsible for calling synchronize_rcu before dropping the old value.

Note that set takes a pinned mutable reference to Rcu. This deserves an explanation, since it might not be obvious why pinning is involved here. Firstly, we need to take a mutable reference, since writers still need to be synchronized. Secondly, since there are still concurrent shared references, we must not allow users to use mem::swap, since that would change the value without the required compiler and CPU barriers in place. Pinning prevents this, because a &mut Rcu<P> cannot be obtained safely from a Pin<&mut Rcu<P>>.

Now to the crux of the issue and why field projection comes up here: we have to wrap data that is protected by RCU with a lock. However, locks do not allow access to the inner value without locking it (that's kind of their whole point…). So we need a way to get to the Rcu<P> without locking the lock. Using field projection, we would allow projecting from &Lock<T> to &Rcu<P> for those fields of T that have type Rcu<P>.

This way, readers can use field projection and the Rcu::read function and writers can continue to lock the lock and then use Rcu::set.

Examples

Please read the previous section to understand the RCU API in Rust.

struct BufferConfig {
    flush_sensitivity: u8,
}

struct Buffer {
    // We also require `Rcu` to be pinned, because `&mut Rcu` must not exist (otherwise one could
    // call mem::swap).
    #[pin]
    cfg: Rcu<Box<BufferConfig>>,
    buf: Vec<u8>,
}

struct MyDriver {
    // The `Mutex` in the kernel needs to be pinned
    #[pin]
    buf: Mutex<Buffer>,
}

impl MyDriver {
    fn set_buffer_config(&self, flush_sensitivity: u8) {
        let mut guard: Pin<MutexGuard<'_, Buffer>> = self.buf.lock();
        let buf: Pin<&mut Buffer> = guard.as_mut();
        // We can use pin-projections since we marked `cfg` as `#[pin]`
        let cfg: Pin<&mut Rcu<Box<BufferConfig>>> = buf->cfg;
        cfg.set(Box::new(BufferConfig { flush_sensitivity }));
    }

    fn buffer_config<'a>(&'a self, rcu_guard: &'a rcu::Guard) -> &'a BufferConfig {
        let buf: &Mutex<Buffer> = &self.buf;
        // Here we use the special projections set up for `Mutex` with fields of type `Rcu<T>`
        let cfg: &Rcu<Box<BufferConfig>> = buf->cfg;
        cfg.read(rcu_guard)
    }

    fn read_to_buffer(&self, data: &[u8]) -> Result {
        let mut buf: Pin<Guard<'_, Buffer, MutexBackend>> = self.buf.lock();
        // This method allocates, so it must be fallible.
        // `buf.as_mut()->buf` again uses the field projection for `Pin` to yield a `&mut Vec<u8>`.
        buf.as_mut()->buf.extend_from_slice(data)
    }
}

Rust for Linux Use Cases

Rust for Linux would heavily utilize field projections for:

safe pin projections of pinned structures (note that many common structures, such as mutexes, spinlocks and other locks, need to be pinned).
field projections for RCU protected data.
initialization via &mut MaybeUninit<T> without overflowing the stack.
raw pointer field access without unsafe and without having to use addr_of!((*ptr).field) / &raw const (*ptr).field.
setting individual values using UnsafeCell<T>.
defining our own custom container types such as VolatileMem<T>.
defining our own custom pointer types.
creating more complicated use cases in the future akin to RCU.

Untrusted Data

The untrusted data patch series introduces the Untrusted<T> type. It is used to mark data from userspace or hardware as untrusted. Kernel developers are supposed to validate such data before it is used to drive logic within the kernel.

One use case of untrusted data will be ioctls. They are being discussed in this reply (code slightly adapted):

Example in pseudo-rust:
struct IoctlParams {
  input: u32,
  output: u32,
}
The thing is that ioctls that use the struct approach, like drm does, use the same struct if there are both input and output parameters, and furthermore we are not allowed to overwrite the entire struct because that breaks ioctl restarting. So the flow is roughly
let userptr: UserSlice;
let params: Untrusted<IoctlParams>;

userptr.read(params);

// validate params, do something interesting with params.input

// this is _not_ allowed to overwrite params.input but must leave it
// unchanged

params.write(|x| { x.output = 42; });

userptr.write(params);
Your current write doesn't allow this case, and I think that's not good enough. The one I proposed in private does:
Untrusted<T>::write(&mut self, impl Fn(&mut T))

Importantly, we would like to only overwrite the output field of the IoctlParams struct. This is the exact pattern that field projections can help with. Instead of exposing a mutable reference to the untrusted data via the write function, we can have:

impl<T> Untrusted<T> {
  fn write(&mut self, value: T);
}

This is combined with allowing projections from Untrusted<IoctlParams> to Untrusted<u32>, so that only the output field needs to be written.
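
A sketch of how this could look in today's Rust, with the projection written by hand, is given below. The Untrusted type here is a simplified stand-in and not the type from the patch series; new, into_inner_unchecked, output_mut and handle_ioctl are names assumed purely for this illustration.

```rust
pub struct IoctlParams {
    pub input: u32,
    pub output: u32,
}

/// Simplified, hypothetical stand-in for the kernel's `Untrusted<T>`.
#[repr(transparent)]
pub struct Untrusted<T>(T);

impl<T> Untrusted<T> {
    pub fn new(value: T) -> Self {
        Untrusted(value)
    }

    /// Overwrites the whole value without ever exposing the old one.
    pub fn write(&mut self, value: T) {
        self.0 = value;
    }

    /// Hypothetical escape hatch so that the example can inspect the result.
    pub fn into_inner_unchecked(self) -> T {
        self.0
    }
}

impl Untrusted<IoctlParams> {
    /// Hand-written stand-in for the projection `params->output`.
    pub fn output_mut(&mut self) -> &mut Untrusted<u32> {
        let field: *mut u32 = &mut self.0.output;
        // SAFETY: `Untrusted<u32>` is `repr(transparent)` over `u32`, and
        // `field` is derived from the exclusive borrow of `self`.
        unsafe { &mut *field.cast::<Untrusted<u32>>() }
    }
}

pub fn handle_ioctl(params: &mut Untrusted<IoctlParams>) {
    // Only `output` is overwritten; `input` is left untouched.
    params.output_mut().write(42);
}
```

With field projections, output_mut would not have to be written per field; params->output.write(42) would express the same thing.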

Future Additions

This document does not consider:

slices and arrays
tuples and tuple structs
enums
Cow<'_, T>

They will be considered when writing the RFC (either as part of the feature or as future possibilities).