
Design meeting 2023-03-29: Field Projection

Summary

This meeting originated from the Field Projection RFC. Its current design has several major problems, which are the topic of today's meeting.

The Problem of Projections

Rust often uses wrapper types, for example Pin<P>, NonNull<T>, Cell<T>, UnsafeCell<T>, MaybeUninit<T> and more. These types give the wrapped type additional properties, which often logically extend to its fields as well. For example, if a struct is uninitialized, its fields are also uninitialized.

However, these wrapper types offer no ergonomic way to access the fields of the wrapped type. The problem is worse for Pin, because its projection functions are unsafe even though accessing a field is a natural operation:

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct RaceFutures<F1, F2> {
    fut1: F1,
    fut2: F2,
}

impl<F1, F2> Future for RaceFutures<F1, F2>
where
    F1: Future,
    F2: Future<Output = F1::Output>,
{
    type Output = F1::Output;

    fn poll(mut self: Pin<&mut Self>, ctx: &mut Context) -> Poll<Self::Output> {
        match unsafe { self.as_mut().map_unchecked_mut(|t| &mut t.fut1) }.poll(ctx) {
            Poll::Pending => {
                unsafe { self.map_unchecked_mut(|t| &mut t.fut2) }.poll(ctx)
            }
            rdy => rdy,
        }
    }
}

If one wants to add SAFETY comments, it gets even more tedious.

Proposition

Introduce a new binary operator -> to provide projections. The syntax is $expr->$ident, with the same precedence as normal field access via the . operator. The projection operator -> projects a wrapper type around a struct to one of that struct's fields. struct->field is only valid if the field is accessible.

The above example would become:

impl<F1, F2> Future for RaceFutures<F1, F2>
where
    F1: Future,
    F2: Future<Output = F1::Output>,
{
    type Output = F1::Output;

    fn poll(self: Pin<&mut Self>, ctx: &mut Context) -> Poll<Self::Output> {
        match self->fut1.poll(ctx) {
            Poll::Pending => self->fut2.poll(ctx),
            rdy => rdy,
        }
    }
}

Motivation

The need for this feature arose from the Rust-for-Linux project. In the kernel, almost all types contain self-references and thus need to be pinned.
The RFC design process uncovered that many more types might benefit from projections, notably MaybeUninit<T> and its combination with Pin: Pin<&mut MaybeUninit<T>>. These types are also often used when initializing pinned types in place in the Linux kernel.
This is why pin projections are the minimum that general field projection should support. Additionally, in the kernel we are interested in creating our own custom projections to improve the ergonomics of RCU locks.

Problems

This section lists the design problems. Each subsection first states the problem and its details, followed by the solutions discovered so far, ordered from best to worst in the author's opinion.

Problem 1: Representing Fields in Code

The problem: How to represent fields in code dealing with projections?

The first problem with implementing the -> operator is that it requires some way of referring to fields in code. We want a trait to govern the behavior of ->, and that trait needs a method that performs the projection. That method in turn needs some way to tell the implementer which field is being projected.

Solution 1

Fields are compiler-generated types that implement the Field trait:

pub trait Field {
    /// The type of the struct containing this field.
    type Base;
    /// The type of this field.
    type Type;
    /// The offset of this field from the beginning of the `Base` struct in bytes.
    const OFFSET: usize;
}

This trait cannot be implemented manually. For every field of every struct, the compiler generates a type that implements this trait, with the associated types and constant set accordingly. Users can rely on these being correct. For example, they can write:

struct Foo {
    count: usize,
    info: String,
}

fn get_field<F>(foo: &Foo) -> &F::Type
where
    F: Field<Base = Foo>,
{
    let ptr: *const Foo = foo;
    let ptr: *const u8 = ptr.cast();
    // SAFETY: the pointer is valid and `F::OFFSET` is still within `Foo`.
    let off_ptr: *const u8 = unsafe { ptr.add(F::OFFSET) };
    let field_ptr: *const F::Type = off_ptr.cast();
    // SAFETY: the pointer is valid and at offset `F::OFFSET`,
    // `F::Type` lives inside `Foo`.
    unsafe { &*field_ptr }
}

There will be a macro field_of!(Struct, field) that expands to the compiler-generated type for that field.
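Since the compiler-generated field types do not exist yet, the following sketch approximates Solution 1 in stable Rust. Under these assumptions, one marker type per field (the hypothetical FooCount and FooInfo) is written by hand, OFFSET is computed with std::mem::offset_of! (stable since Rust 1.77), and field_of! is a hypothetical macro_rules! macro hard-coded for Foo. Because nothing stops a hand-written impl from lying, this stand-in trait is declared unsafe, unlike the proposed compiler-only trait.

```rust
use std::mem::offset_of;

/// Hand-written stand-in for the proposed compiler-provided `Field` trait.
/// `unsafe` because implementers must guarantee `OFFSET` and `Type` are correct.
pub unsafe trait Field {
    type Base;
    type Type;
    const OFFSET: usize;
}

pub struct Foo {
    pub count: usize,
    pub info: String,
}

// Hypothetical per-field types that the compiler would generate.
pub struct FooCount;
pub struct FooInfo;

unsafe impl Field for FooCount {
    type Base = Foo;
    type Type = usize;
    const OFFSET: usize = offset_of!(Foo, count);
}

unsafe impl Field for FooInfo {
    type Base = Foo;
    type Type = String;
    const OFFSET: usize = offset_of!(Foo, info);
}

/// Hypothetical `field_of!`, hard-coded for the two fields of `Foo`.
macro_rules! field_of {
    (Foo, count) => { FooCount };
    (Foo, info) => { FooInfo };
}

pub fn get_field<F>(foo: &Foo) -> &F::Type
where
    F: Field<Base = Foo>,
{
    let base: *const u8 = (foo as *const Foo).cast();
    // SAFETY: `F::OFFSET` is the offset of a field inside `Foo`,
    // so the pointer stays within the same allocation.
    let field_ptr: *const F::Type = unsafe { base.add(F::OFFSET) }.cast();
    // SAFETY: the pointer is aligned and points to an initialized
    // `F::Type` that lives as long as `foo`.
    unsafe { &*field_ptr }
}
```

Usage: type Count = field_of!(Foo, count); then get_field::<Count>(&foo) returns &usize.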

Solution 2

A field is an instance of the generic Field struct:

/// This type represents a Field of type `U` inside of the struct `T`.
///
/// `N` is a compiler-internal unique identifier for the field.
/// It might change between invocations of the same compiler and must not
/// be relied upon.
pub struct Field<T, U, const N: usize> {
    offset: usize,
    phantom: PhantomData<fn(T, U) -> (T, U)>,
}

impl<T, U, const N: usize> Field<T, U, N> {
    pub fn offset(&self) -> usize {
        self.offset
    }
}

Users cannot construct this type; the compiler creates values of it automatically when handling ->. Users can rely on the value of offset and on the type U being correct:

struct Foo {
    count: usize,
    info: String,
}

// The type parameter is named `U` so it does not shadow the `Field` struct.
fn get_field<U, const N: usize>(
    foo: &Foo,
    field: Field<Foo, U, N>,
) -> &U {
    let ptr: *const Foo = foo;
    let ptr: *const u8 = ptr.cast();
    // SAFETY: the pointer is valid and `offset` is still within `Foo`.
    let off_ptr: *const u8 = unsafe { ptr.add(field.offset()) };
    let field_ptr: *const U = off_ptr.cast();
    // SAFETY: the pointer is valid and at offset `offset`,
    // `U` lives inside `Foo`.
    unsafe { &*field_ptr }
}
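A comparable stable-Rust sketch of Solution 2 follows. The compiler would normally create the Field values; here a hypothetical unsafe constructor new_unchecked and the hypothetical helpers foo_count and foo_info play that role, and the N values 0 and 1 are picked by hand.

```rust
use std::marker::PhantomData;
use std::mem::offset_of;

/// A field of type `U` inside the struct `T`; `N` distinguishes fields
/// that share the same `T` and `U`.
pub struct Field<T, U, const N: usize> {
    offset: usize,
    phantom: PhantomData<fn(T, U) -> (T, U)>,
}

impl<T, U, const N: usize> Field<T, U, N> {
    /// Hypothetical stand-in for the compiler creating the value.
    ///
    /// # Safety
    /// An initialized, aligned `U` must live at byte offset `offset` in every `T`.
    pub unsafe fn new_unchecked(offset: usize) -> Self {
        Field { offset, phantom: PhantomData }
    }

    pub fn offset(&self) -> usize {
        self.offset
    }
}

pub struct Foo {
    pub count: usize,
    pub info: String,
}

/// Hypothetical: the value the compiler would produce for `foo->count`.
pub fn foo_count() -> Field<Foo, usize, 0> {
    // SAFETY: `offset_of!` yields the offset of the `usize` field `count`.
    unsafe { Field::new_unchecked(offset_of!(Foo, count)) }
}

/// Hypothetical: the value the compiler would produce for `foo->info`.
pub fn foo_info() -> Field<Foo, String, 1> {
    // SAFETY: `offset_of!` yields the offset of the `String` field `info`.
    unsafe { Field::new_unchecked(offset_of!(Foo, info)) }
}

pub fn get_field<U, const N: usize>(foo: &Foo, field: Field<Foo, U, N>) -> &U {
    let base: *const u8 = (foo as *const Foo).cast();
    // SAFETY: the constructor's contract guarantees that an initialized `U`
    // lives at `field.offset()` inside `Foo`.
    unsafe { &*base.add(field.offset()).cast::<U>() }
}
```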

The author greatly prefers Solution 1 and mainly wrote this section to explain the Field trait and to determine potentially undiscovered solutions. The rest of this document will use Solution 1.

Problem 2: Projection Output Behavior

The problem: How does an author of a wrapper type specify the projection output behavior? How will we support the case "let the author of the struct decide the projection kind"?

Another challenge for the -> operator is that projecting Pin<&mut Struct> to field has two possible output types (Pin<&mut Field> and &mut Field), depending on whether field is structurally pinned or not. Currently this information is managed entirely by the programmer, who has to ensure that each field is consistently either always projected structurally or never (and use the unsafe projection functions accordingly). The pin-project crate alleviates this problem by declaring on the struct definition which fields are structurally pinned and giving safe access only to the correct projection. However, for field projection, the compiler-generated field type will have to hold this information.

The projection output of wrapper types varies widely. Some types:

  • only have one projection output for all fields. For *mut Struct, all projections result in *mut Field.
  • depend on the type of the projected field. &mut MaybeUninit<Struct> could project fields that are allowed to be uninitialized to &mut Field and the rest to &mut MaybeUninit<Field>.
  • allow the author of the projected struct to specify the projection output: Pin<P>.

Custom projections might want to have more complex projection output behavior.

Solution 1

A combination of marker traits, proc-macros for implementing said marker traits and negative reasoning for marker traits. The Project trait is as follows:

/// Used for projection operations like `expr->field`.
/// 
/// `F` is the projected Field of the inner type.
pub trait Project<F: Field<Base = Self::Inner>> {
    /// The projected type that is wrapped by this Wrapper.
    type Inner;
    /// The output of the projection.
    type Output;

    /// Projects this wrapper type to the given field.
    fn project(self) -> Self::Output;
}

Here is the example implementation for Pin:

/// The `#[pin]` proc-macro on fields will implement this trait
/// for the compiler-generated field type.
#[marker]
pub trait Pinned {}

impl<'a, T, F: Field<Base = T>> Project<F> for Pin<&'a mut T>
where
    F: Pinned,
    F::Type: 'a,
{
    type Inner = T;
    type Output = Pin<&'a mut F::Type>;

    fn project(self) -> Self::Output {
        let ptr: *mut T = unsafe { self.get_unchecked_mut() };
        let ptr: *mut F::Type = unsafe { ptr.cast::<u8>().add(F::OFFSET).cast() };
        unsafe { Pin::new_unchecked(&mut *ptr) }
    }
}

impl<'a, T, F: Field<Base = T>> Project<F> for Pin<&'a mut T>
where
    // Note the !Pinned, this is only allowed because it is `#[marker]`.
    // Additionally the trait resolution of these two impls works
    // exactly because of this bound.
    F: !Pinned,
    F::Type: 'a,
{
    type Inner = T;
    type Output = &'a mut F::Type;

    fn project(self) -> Self::Output {
        let ptr: *mut T = unsafe { self.get_unchecked_mut() };
        let ptr: *mut F::Type = unsafe { ptr.cast::<u8>().add(F::OFFSET).cast() };
        unsafe { &mut *ptr }
    }
}

The problems with this approach are:

  • Every wrapper type with varying projection kinds will need to define its own marker trait, plus a proc-macro that implements it for the struct's fields if the projection kind should be chosen by the struct's author.
  • While negative reasoning with marker traits should be feasible, it adds overhead to implementing field projection.

The author chose not to mention the earliest proposal, which was based on proc-macros to facilitate projections on wrapper types, as it did not provide a Project trait.
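A wrapper with a single projection output needs no marker traits or negative reasoning. The following sketch implements a simplified Project trait for &mut MaybeUninit<T> that projects every field to &mut MaybeUninit<Field>, the conservative choice, since every field of an uninitialized struct is itself uninitialized. Assumptions: the Inner associated type is dropped and F::Base is tied to the wrapped type in the impl instead; Field is again a hand-written stand-in for the compiler-generated trait; Point, PointX, PointY and init_point are hypothetical. The finer-grained MaybeUninit behavior from Problem 2 (plain &mut Field for fields that may stay uninitialized) is not attempted.

```rust
use std::mem::{offset_of, MaybeUninit};

/// Hand-written stand-in for the proposed compiler-provided `Field` trait.
pub unsafe trait Field {
    type Base;
    type Type;
    const OFFSET: usize;
}

/// Simplified `Project` without the `Inner` associated type.
pub trait Project<F: Field> {
    type Output;
    fn project(self) -> Self::Output;
}

/// One projection output for all fields: `&mut MaybeUninit<Field>`.
impl<'a, T, F> Project<F> for &'a mut MaybeUninit<T>
where
    F: Field<Base = T>,
    F::Type: 'a,
{
    type Output = &'a mut MaybeUninit<F::Type>;

    fn project(self) -> Self::Output {
        let base: *mut u8 = self.as_mut_ptr().cast();
        // SAFETY: `F::OFFSET` stays inside the `T`, the field is aligned
        // because the `T` is, and `MaybeUninit` allows uninitialized memory.
        unsafe { &mut *base.add(F::OFFSET).cast::<MaybeUninit<F::Type>>() }
    }
}

pub struct Point {
    pub x: i32,
    pub y: i32,
}

// Hypothetical field types the compiler would generate for `Point`.
pub struct PointX;
pub struct PointY;

unsafe impl Field for PointX {
    type Base = Point;
    type Type = i32;
    const OFFSET: usize = offset_of!(Point, x);
}

unsafe impl Field for PointY {
    type Base = Point;
    type Type = i32;
    const OFFSET: usize = offset_of!(Point, y);
}

/// Initializes a `Point` in place, one field at a time, through projections.
pub fn init_point(x: i32, y: i32) -> Point {
    let mut slot = MaybeUninit::<Point>::uninit();
    <&mut MaybeUninit<Point> as Project<PointX>>::project(&mut slot).write(x);
    <&mut MaybeUninit<Point> as Project<PointY>>::project(&mut slot).write(y);
    // SAFETY: both fields were written above.
    unsafe { slot.assume_init() }
}
```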

Problem 3: What is a Projection?

The problem: What types of operations are allowed within Project::project? What output types are valid projection outputs?

The author initially thought of projections as only being allowed to do pointer offsetting, since that is true of the original problem of pin projections. However, this restriction prevents possibly useful projections. One suggestion that Josh brought up was to have projections on an Iterator:

struct Struct {
    field: u64,
}

fn use_iter(it: impl Iterator<Item = Struct>) -> u64 {
    it->field.sum()
}

Where it->field is implemented as it.map(|item| item.field).

The author's current opinion is to allow arbitrary code in projections. Since these will always need to be statically dispatched, the programmer will have full knowledge of potential performance caveats.
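To make "arbitrary code in projections" concrete, here is a sketch in today's Rust of what it->field could desugar to for iterators. It assumes a hand-written stand-in for the proposed Field trait (StructField plays the role of field_of!(Struct, field)), uses std::mem::offset_of! (stable since Rust 1.77), and only supports Copy fields, because it copies the field out of each item before the item is dropped. ProjectIter and project_iter are hypothetical names.

```rust
use std::marker::PhantomData;
use std::mem::offset_of;

/// Hand-written stand-in for the proposed compiler-provided `Field` trait.
pub unsafe trait Field {
    type Base;
    type Type;
    const OFFSET: usize;
}

/// Iterator adapter that yields one `Copy` field of each item.
pub struct ProjectIter<I, F> {
    iter: I,
    _field: PhantomData<F>,
}

pub fn project_iter<F, I>(iter: I) -> ProjectIter<I, F>
where
    F: Field,
    I: Iterator<Item = F::Base>,
{
    ProjectIter { iter, _field: PhantomData }
}

impl<I, F> Iterator for ProjectIter<I, F>
where
    F: Field,
    F::Type: Copy,
    I: Iterator<Item = F::Base>,
{
    type Item = F::Type;

    fn next(&mut self) -> Option<F::Type> {
        // Arbitrary code runs here: pull the next item, then copy the field out.
        let item = self.iter.next()?;
        let base: *const u8 = (&item as *const F::Base).cast();
        // SAFETY: `F::OFFSET` locates an initialized, aligned `F::Type` inside
        // `item`; `F::Type: Copy` lets us copy it before `item` is dropped.
        Some(unsafe { *base.add(F::OFFSET).cast::<F::Type>() })
    }
}

pub struct Struct {
    pub field: u64,
}

/// Hypothetical stand-in for `field_of!(Struct, field)`.
pub struct StructField;

unsafe impl Field for StructField {
    type Base = Struct;
    type Type = u64;
    const OFFSET: usize = offset_of!(Struct, field);
}

/// What `it->field.sum()` could desugar to under this sketch.
pub fn use_iter(it: impl Iterator<Item = Struct>) -> u64 {
    project_iter::<StructField, _>(it).sum()
}
```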

Do we want to require that the return type of a projection "has to include the field type in some form"? Otherwise this would be possible:

pub struct Baz {
    foo: Foo,
    bar: Bar,
}

pub struct Foo;
pub struct Bar;

impl Project<field_of!(Baz, foo)> for Baz {
    type Inner = Baz;
    type Output = Bar;
    
    fn project(self) -> Self::Output {
        Bar
    }
}
impl Project<field_of!(Baz, bar)> for Baz {
    type Inner = Baz;
    type Output = Foo;
    
    fn project(self) -> Self::Output {
        Foo
    }
}

And now the user can do:

let baz = Baz { foo: Foo, bar: Bar };
let foo: Bar = baz->foo;
let bar: Foo = baz->bar;

Which is strange.

Problem 4: Automatic Reborrow

The problem: Project::project takes self by value, which means that, e.g., Pin<&mut T> always has to be reborrowed via as_mut(). Can we avoid this?

This problem affects Pin<&mut T> in general, so this might get fixed by some other means of allowing automatic reborrowing for more types than just &T and &mut T.

The author finds this problem annoying since the Pin example from the beginning will probably look like this:

impl<F1, F2> Future for RaceFutures<F1, F2>
where
    F1: Future,
    F2: Future<Output = F1::Output>,
{
    type Output = F1::Output;

    fn poll(mut self: Pin<&mut Self>, ctx: &mut Context) -> Poll<Self::Output> {
        match self.as_mut()->fut1.poll(ctx) {
            Poll::Pending => self->fut2.poll(ctx),
            rdy => rdy,
        }
    }
}

Not Discussed in this Meeting

This section lists the things that will not be discussed in this meeting:

  • enum support
  • the syntax of the operator: ~ vs -> vs overloading .
  • desugaring of ->

Discussion

Straw poll on goals

nikomatsakis: I'm super excited about iter->field as a shorthand for iter.map(|i| i.field), though I would also want iter->method(22). I also like the other use cases. I'm curious to get a sense for how folks on the lang team feel about this. Both are clearly about ergonomics, but the target audiences are fairly distinct. It feels like instead of a "projection operator" this starts to be a "map operator", though (e.g., option->foo?)

tmandry: not sure how obvious it is

y86-dev: how would methods even work?

nikomatsakis: not sure, could imagine that iter->method(x) desugars differently, just as foo.bar() is a method call but foo.bar is a field access

nikomatsakis: arguably we should add _ expressions à la Scala

Big difference in coolness factor, ability to act on a collection in almost the same way as an element of the collection

  • today: iter.map(|x|x.bar())
  • iter.map(_.bar())
  • iter->bar()

I would expect iter->bar() to yield an iterator so you can chain; I feel like "let me act on collections more naturally" is a different feature, ultimately.

tmandry: Sharing syntax for one value vs multiple values is confusing to me.

Building in support for pin? Other pain points?

nikomatsakis: should we consider making pin more usable? other pain points?

y86-dev: mostly this plus reborrow

nikomatsakis: reborrow is a pretty general pain point. We're not very good at letting you make a "fake reference" overall.

pnkfelix: how much benefit do you get from other things? how general is this?

y86-dev: maybe-uninit is very similar, and so is pin combined with maybe-uninit. Also, in the kernel we'd like to create our own custom projection types to deal with the rcu lock in a more elegant way.

nikomatsakis, tmandry: O_O say more

y86-dev: when you have an rcu-guard, basically a token that you locked it, you can access data read-only, and otherwise you are allowed to lock the mutex.

brainstorming: place closures

nikomatsakis: what if we had kind of extra traits

impl<T> Pin<T> {
    fn map<PFM>(&mut self, op: PFM) -> Pin<&mut PFM::Target> where PFM: PlaceFnMut<T> {}
}

x.map(|x| 22) // ERROR; not a place function

x.map(|x| x.foo) // OK

trait PlaceFnMut<T> {
    type Target;
    const IS_STRUCTURALLY_PINNED: bool;
    fn offset(&self) -> usize;
}
impl<T> Pin<T> {
    fn map<PFM>(&mut self, op: PFM) -> Pin<&mut PFM::Target>
    where
        PFM: PlaceFnMut<T>,
        PFM::Target: !Unpin,
    {}

    fn map<PFM>(&mut self, op: PFM) -> Pin<&mut PFM::Target>
    where
        PFM: PlaceFnMut<T>,
        PFM::Target: Unpin,
    {}
}
struct NotPin<T> {
    t: T
}

impl<T> Unpin for NotPin<T> { }

struct Foo<F: Future> {
    not_pinned: F,
    pinned: StructurallyPinned<F>,
}
Pin<&mut Struct> -> Pin<&mut Field>
Pin<&mut Struct> -> &mut Field
struct Foo {
    count: usize,
    info: RCUMutex<String>,
}

when you hold an RCUGuard (a global thing) then you can read info

#[marker]
#[sealed]
pub trait RCUProjected {}

impl<T> RCUProjected for RCUMutex<T> {}
let f: &Foo = ...;
let g: RCUGuard = ...;

f.info->len() // how do I give evidence of the `g`?

here is an example from Gary:

struct Rcu;

#[derive(kernel::projection::Field, RcuField)]
struct Test {
    a: Rcu<Box<u32>>,
    b: i32,
    next: Rcu<Box<Test>>,
}

struct test {
    struct mutex mutex;
    u32 __rcu *a;
    int b;
    struct test __rcu *next;
};
/*
unsafe auto trait NoRcu {}

impl<T> !NoRcu for Rcu<T> {}

impl<T> DerefMut for MutexGuard<T> where T: NoRcu {}*/

fn foo(data: &Rcu<Box<Test>>) {
    
}

fn reader(data: &RcuLock<Mutex<Test>>) {
    /* reader(struct test* data) */
    let a_ptr /*: &Rcu<Box<u32>>*/ = project!(data => a);
    /* a_ptr = &data->a; */
    let next_ptr = project!(data => next);
    /* next_ptr = &data->next; */
    let rcu_guard = rcu::read_lock();
    /* rcu_read_lock(); */
    let a: &u32 = a_ptr.get(&rcu_guard);
    /* a = rcu_dereference(*a_ptr) */
    let next: &Test = project!(data => next).get(&rcu_guard);
    /* next = rcu_dereference(*next_ptr) */
    
    /* rcu_read_unlock(); // automatically called on rcu_guard drop */
}

fn writer(data: &RcuLock<Mutex<Test>>) {
    let mut guard = data.lock();
    // guard does not give `&mut Test`!
    
    // Use projection to get a mutable reference
    *project!(&mut guard => b) = 10;
    let _: &Rcu = project!(&mut guard => a); // This will only give an immutable reference
    let _: &Rcu = project!(&*data => a); // Still able to get an immutable reference while mutex is locked.
    drop(guard);
    // Read can be done directly
    pr_info!("Value: {}\n", data.lock().b);
}
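One conventional way to answer "how do I give evidence of the g?" in today's Rust is to take the guard as a parameter and tie the returned reference's lifetime to it, as a_ptr.get(&rcu_guard) does in Gary's example. The following userspace mock sketches only that idea: RcuGuard, read_lock and Rcu<T> are hypothetical stand-ins that perform no real RCU synchronization and are not the kernel's actual API.

```rust
use std::marker::PhantomData;

/// Hypothetical token proving that an RCU read-side critical section is active.
/// Mock only: no real RCU synchronization happens here.
pub struct RcuGuard {
    // Private field: the guard can only be created through `read_lock`.
    // `*const ()` makes it !Send, so it stays on the thread that created it.
    _not_send: PhantomData<*const ()>,
}

pub fn read_lock() -> RcuGuard {
    // A real implementation would call `rcu_read_lock()` here.
    RcuGuard { _not_send: PhantomData }
}

impl Drop for RcuGuard {
    fn drop(&mut self) {
        // A real implementation would call `rcu_read_unlock()` here.
    }
}

/// Hypothetical RCU-protected pointer.
pub struct Rcu<T> {
    ptr: Box<T>,
}

impl<T> Rcu<T> {
    pub fn new(value: T) -> Self {
        Rcu { ptr: Box::new(value) }
    }

    /// The guard is the evidence: the returned reference borrows it, so the
    /// reference cannot outlive the read-side critical section.
    pub fn get<'a>(&'a self, _guard: &'a RcuGuard) -> &'a T {
        &self.ptr
    }
}
```

With this shape, dropping the guard while a reference obtained from get is still in use is a borrow-check error.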

brainstorming: AST closures

nikomatsakis: or we had something to get out the AST for like "run this code on the GPU"

scottmcm: Prior art: C# expression trees, which give an AST for a closure: https://learn.microsoft.com/en-us/archive/blogs/charlie/expression-tree-basics

nikomatsakis: (image not available)

Constraining output of projections

pnkfelix: Re "Do we want to require that the return type of a projection “has to include the field type in some form”?", I'd say it should be a lint at most. I see no reason currently to make it a stronger error than that, by default.

Automatic Reborrow

pnkfelix: I don't understand the "Automatic Reborrow" issue completely; where are you saying the as_mut always has to happen, and where are you hoping to avoid it?

y86-dev: it is a general problem when using Pin<&mut Self>. &mut Self allows automatic reborrowing:

fn foo(&mut self, count: usize) {
    self.do_baz();
    self.do_baz();
    if count > 0 {
        self.foo(count - 1);
    }
}

do_baz takes &mut Self by value; automatic reborrowing essentially turns each call into (&mut *self).do_baz(). With Pin<&mut Self> this would have to look like this:

fn foo(mut self: Pin<&mut Self>, count: usize) {
    self.as_mut().do_baz();
    self.as_mut().do_baz();
    if count > 0 {
        self.as_mut().foo(count - 1);
    }
}
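A runnable version of the Pin<&mut Self> case above, as a sketch: Baz, its calls counter and do_baz are hypothetical, and PhantomPinned makes Baz !Unpin so that mutable access really has to go through Pin. Every call consumes a Pin<&mut Self>, so each one needs an explicit as_mut() reborrow.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

pub struct Baz {
    calls: usize,
    _pin: PhantomPinned, // makes `Baz: !Unpin`
}

impl Baz {
    pub fn new() -> Self {
        Baz { calls: 0, _pin: PhantomPinned }
    }

    pub fn calls(&self) -> usize {
        self.calls
    }

    fn do_baz(self: Pin<&mut Self>) {
        // SAFETY: `calls` is not structurally pinned and `self` is never moved.
        unsafe { self.get_unchecked_mut().calls += 1 };
    }

    pub fn foo(mut self: Pin<&mut Self>, count: usize) {
        // Without `as_mut()`, the first call would move `self`.
        self.as_mut().do_baz();
        self.as_mut().do_baz();
        if count > 0 {
            self.as_mut().foo(count - 1);
        }
    }
}
```

Calling foo(2) runs do_baz twice per level for three levels, six calls in total.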

Negative where clauses considered harmful

nikomatsakis: I don't think we want negative where clauses, whether for auto traits or other things, because they make trait checking much less tractable and open the door to paradoxes (maybe there are some special cases; but then we'd have to explain why ! only works for certain things etc.) That said, I get the pin use case. Reminds me of how sometimes for (small) copy types it'd be nice to have things that just copy out.