# Subclassing pattern solutions

The Linux kernel (and in particular the DRM subsystem) pervasively uses a traditional OOP-style subclassing pattern. We'll use `dma_fence` as an example.

## `dma_fence` API summary

### Init/free stuff

* `void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops, spinlock_t *lock, u64 context, u64 seqno)`
* `void dma_fence_release(struct kref *kref)`
* `void dma_fence_free(struct dma_fence *fence)`

### Refcounting

* `static inline void dma_fence_put(struct dma_fence *fence)`
* `static inline struct dma_fence *dma_fence_get(struct dma_fence *fence)`

### Callbacks and signaling

* `int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb, dma_fence_func_t func)`
* `bool dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)`
* `int dma_fence_signal(struct dma_fence *fence)`
* `static inline bool dma_fence_is_signaled(struct dma_fence *fence)`
* `static inline void dma_fence_set_error(struct dma_fence *fence, int error)`

### Lockdep annotations

* `bool dma_fence_begin_signalling(void)`
* `void dma_fence_end_signalling(bool cookie)`

### Subclass vtable

```c
struct dma_fence_ops {
    bool use_64bit_seqno;
    const char * (*get_driver_name)(struct dma_fence *fence);
    const char * (*get_timeline_name)(struct dma_fence *fence);
    bool (*enable_signaling)(struct dma_fence *fence);
    bool (*signaled)(struct dma_fence *fence);
    void (*release)(struct dma_fence *fence);
    void (*fence_value_str)(struct dma_fence *fence, char *str, int size);
    void (*timeline_value_str)(struct dma_fence *fence, char *str, int size);
};
```

## Mapping this to Rust

The basic operations map as you'd expect to a struct impl or a trait:

```rust
pub trait RawDmaFence {
    fn signal(&self) -> Result;
    fn set_error(&self, err: Error);
    // etc
}
```

The problem comes with subclassing.
In C, you'd embed the original struct into the subclass:

```c
struct foo_fence {
    struct dma_fence base;
    /* other stuff */
};
```

And then downcast to the `struct foo_fence` from the callbacks using `container_of()`. This doesn't map well to Rust, because there is no way to do `container_of()` safely in a public API...

We can solve this by using generics instead:

```rust
pub struct FenceObject<T: FenceOps> {
    fence: bindings::dma_fence,
    inner: T,
}
```

The driver provides the rest of its object as `T`. But this creates a new problem: `T` by itself is useless, since it is the rest of a fence without the fence... so how do we implement methods on `T`, both callbacks called by the core and arbitrary user-defined methods?

### The straight solution

Since a `FenceObject<T>` is not a `T`, we can't have methods on `T`, so instead we use associated functions.

```rust
pub trait FenceOps: Sized + Send + Sync {
    const USE_64BIT_SEQNO: bool;

    fn get_driver_name<'a>(fence: &'a FenceObject<Self>) -> &'a CStr;
    fn get_timeline_name<'a>(fence: &'a FenceObject<Self>) -> &'a CStr;
    fn enable_signaling(fence: &FenceObject<Self>) -> bool;
    fn signaled(fence: &FenceObject<Self>) -> bool;
    fn fence_value_str(fence: &FenceObject<Self>, output: &mut dyn Write);
    fn timeline_value_str(fence: &FenceObject<Self>, output: &mut dyn Write);
}
```

To get the `T`, we provide an `inner()` or similar method to get a reference to `T` from a `FenceObject<T>` (and then we might also need an `inner_mut()`).

Let's say our driver-specific fence needs two methods:

```rust
impl MyFence {
    fn add_command(fence: &FenceObject<Self>) {
        fence.inner().pending.fetch_add(1, Ordering::Relaxed);
    }

    fn command_complete(fence: &FenceObject<Self>) {
        // fetch_sub() returns the previous value, so 1 means this was the last command.
        let remain = fence.inner().pending.fetch_sub(1, Ordering::Relaxed);
        if remain == 1 {
            fence.signal();
        }
    }
}
```

The `inner()` ends up being quite awkward, especially for more complex objects that might have lots of subclass fields that need to be accessed by these methods.
That means we need to call methods like this:

```rust
let fence: &FenceObject<MyFence> = ...;

MyFence::add_command(fence);
fence.set_error(EIO);
MyFence::command_complete(fence);
```

That ends up verbosely repeating `MyFence` everywhere we call a driver-defined method, and looks awkward given the intention that this is an object with methods... (which it still is for the superclass methods, so now we have two different syntaxes).

### `Deref` abuse

If we impl `Deref` on `FenceObject<T>`, we can get rid of the `inner()`:

```rust
impl MyFence {
    fn add_command(fence: &FenceObject<Self>) {
        fence.pending.fetch_add(1, Ordering::Relaxed);
    }
    // ...
}
```

That starts looking nicer, though it goes against idiomatic Rust guidelines (but those guidelines weren't written for the use case of embedding Rust into an existing complex ecosystem with different approaches...)

### `Receiver` abuse

There is one more trick though. There is a secret internal trait called `core::ops::Receiver` enabled by the secret undocumented `receiver_trait` feature. If we implement it for `FenceObject<T>`...
```rust
impl<T: FenceOps> core::ops::Receiver for FenceObject<T> {}
```

That then magically makes functions taking `FenceObject<T>` as a receiver work, which means we can have real methods:

```rust
pub trait FenceOps: Sized + Send + Sync {
    const USE_64BIT_SEQNO: bool;

    fn get_driver_name<'a>(self: &'a FenceObject<Self>) -> &'a CStr;
    fn get_timeline_name<'a>(self: &'a FenceObject<Self>) -> &'a CStr;
    fn enable_signaling(self: &FenceObject<Self>) -> bool;
    fn signaled(self: &FenceObject<Self>) -> bool;
    fn fence_value_str(self: &FenceObject<Self>, output: &mut dyn Write);
    fn timeline_value_str(self: &FenceObject<Self>, output: &mut dyn Write);
}
```

```rust
impl MyFence {
    fn add_command(self: &FenceObject<Self>) {
        self.pending.fetch_add(1, Ordering::Relaxed);
    }

    fn command_complete(self: &FenceObject<Self>) {
        let remain = self.pending.fetch_sub(1, Ordering::Relaxed);
        if remain == 1 {
            self.signal();
        }
    }
}
```

And to use the methods:

```rust
let fence: &FenceObject<MyFence> = ...;

fence.add_command();
fence.set_error(EIO);
fence.command_complete();
```

That now looks like what you'd expect for a subclassing paradigm, but we've abused two Rust features to get there, including one that is so secret it isn't even documented as an unstable feature.

Rust-for-Linux already uses `Receiver` in the `Box` code (pulled straight from `std`) and in the `Arc` code (which mirrors the API of the `std` version). This trait is virtually unknown: there are practically no hits on Google for it other than recent Rust-for-Linux material and a couple of obscure articles, though it was mentioned when it was implemented [here](https://github.com/rust-lang/rust/pull/56805).

### The slightly less secret solution

Instead of `impl Receiver`, we can just enable the `arbitrary_self_types` nightly feature, which is actually [documented](https://github.com/rust-lang/rust/issues/44874).
This works the same way, it just has different tradeoffs:

* `Receiver` only needs `receiver_trait` enabled in the `kernel` crate, and applies only to specific types which implement it. So it is arguably safer and we can control what types exhibit this behavior.
* `arbitrary_self_types` needs to be enabled for both the kernel and all driver/module crates, and then makes this work for any type that `Deref`s to `Self` automatically.

## Alternatives

### Raw pointer dance

Gary: The most straightforward solution would be to do something like

```rust
unsafe trait Inherit<T> {
    fn upcast(this: *const Self) -> *const T;
    fn downcast(this: *const T) -> *const Self;
}
```

(and I realized this is what Boqun calls `Contains`)

Example: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=93abd4b98bf8920b45d2d4a2573e0889

Then we can use them in the bridging method to downcast `struct dma_fence*` to `&MyFence` and then forward the calls to the trait impl. For convenience, the upcasting part could also be implemented with `AsRef`/`Deref`.

A plus side of this approach is that it's possible to provide bindings to dma fences created on the C side (not sure if useful, though); the approach described by Lina wouldn't work in this scenario.

A downside is that this requires some `unsafe` code, although we could reasonably hide this under a proc macro 😉, e.g.:

```rust
#[derive(Inherit)]
struct MyFence {
    #[inherit]
    fence: DmaFence,
}

// Upcasting part, maybe could be generated by inherit macro
impl Deref for MyFence {
    type Target = DmaFence;

    fn deref(&self) -> &DmaFence {
        &self.fence
    }
}
```

A further downside is that the `drop` impl can still use the embedded `DmaFence`, which has the potential to screw things up.

Another drawback of this design is that when downcasting it's very hard to get provenance correct. For example, the example above is rejected by Stacked Borrows.
This is the fixed version with provenance dance: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=12eb460ca40bfc6bdefc64440ebdc17f (alternative, [using strict provenance feature gate](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f01f4fee5875c79bd79059411850aabd))

## Comments

Gary: Personally I am a big fan of `Deref` abuse. I actually already use the pattern described by Lina here quite extensively in a personal project (and yes, including the `Receiver` part). Although, for my project the `struct dma_fence` equivalent part is at a negative offset (and `T` starts from offset 0), so I convinced myself it does indeed look like a smart pointer :)

Boqun: The thing is the `Deref` pattern doesn't work if we have two `dma_fence`-like types, for example, `timer_list`, `work_struct`, `rcu_head` etc. Or it does work but will look really weird if we `Deref` for one thing but not for the other.

* Gary: I don't think that's a valid analogy. The refcounting aspect of dma fence sounds like all dma fences must have their refcounting delegated to the `struct dma_fence`. It's more like `struct device`.
* Boqun: so your point is that `FenceObject<T>` **is-a** `dma_fence` rather than **has-a** `dma_fence`? But could we categorize the following as **is-a** `dma_fence`?

```c
struct i915_ttm_memcpy_work {
    struct dma_fence fence;
    struct work_struct work;
    spinlock_t lock;
    struct irq_work irq_work;
    struct dma_fence_cb cb;
    struct i915_ttm_memcpy_arg arg;
    struct drm_i915_private *i915;
    struct drm_i915_gem_object *obj;
    bool memcpy_allowed;
};
```

## Discussion

Lina: (Introduction)

Alice: I can think of 3 places where this pattern can be used.

Gary: I have one usage, and convinced myself it's a smart pointer.

Gary: All the usages will use `FenceObject<MyFence>`, I think.

Lina: will I need to upcast for callback

Alice: similar to the workqueue implementation I have

Lina: How do you

Alice: (showing workqueue) https://github.com/Darksonn/linux/blob/alice/workqueue/rust/kernel/workqueue.rs

```rust
#[repr(transparent)]
pub struct Work<T: ?Sized> {
    work: Opaque<bindings::work_struct>,
    _pin: PhantomPinned,
    _adapter: PhantomData<T>,
}
```

Gary: `dma_fence` is different from `work`, **is-a** vs **has-a**.

Alice: In this case, the workqueue also controls the lifetime of the work item.

Gary: for `dma_fence`, you want to erase the inner type.

Alice: It also supports the case where there are two work items

Daniel: Is `Arc` required?

Alice: no, it can be `Box`

Alice: how do you ge

Gary: This sounds similar to a pattern that I use in field-projection https://docs.rs/field-projection

Gary: question to Lina, what do you think of the approach we have for `device`, i.e. `AsRef<Device>`

Lina brings up an issue about drop.

(discussion about drm_file and how it's currently not using the embedding approach)

Daniel talks about using lockdep to help debug `Arc` drop.
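
For reference, here is a minimal sketch of the `Deref` variant from the "`Deref` abuse" section that compiles on stable Rust, which may help when comparing it against the alternatives discussed above. It is not the real kernel abstraction and rests on several assumptions: `MockDmaFence` is a hypothetical stand-in for `bindings::dma_fence`, errors are plain `i32` values rather than the kernel `Error` type, `FenceOps` is cut down to a single callback, and `is_signaled()`, `error()`, `MyFence::new()` and `run_example()` are invented purely for illustration.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicBool, AtomicI32, AtomicUsize, Ordering};

/// Hypothetical stand-in for `bindings::dma_fence` (the real one is a C struct).
#[derive(Default)]
pub struct MockDmaFence {
    signaled: AtomicBool,
    error: AtomicI32,
}

/// Cut-down version of the `FenceOps` trait: associated functions, no `self`.
pub trait FenceOps: Sized + Send + Sync {
    fn get_driver_name(fence: &FenceObject<Self>) -> &str;
}

/// The "superclass" part plus the driver-provided rest of the object.
pub struct FenceObject<T: FenceOps> {
    fence: MockDmaFence,
    inner: T,
}

impl<T: FenceOps> FenceObject<T> {
    pub fn new(inner: T) -> Self {
        Self { fence: MockDmaFence::default(), inner }
    }

    // "Superclass" methods, available on every fence type.
    pub fn signal(&self) {
        self.fence.signaled.store(true, Ordering::Release);
    }
    pub fn is_signaled(&self) -> bool {
        self.fence.signaled.load(Ordering::Acquire)
    }
    pub fn set_error(&self, err: i32) {
        self.fence.error.store(err, Ordering::Relaxed);
    }
    pub fn error(&self) -> i32 {
        self.fence.error.load(Ordering::Relaxed)
    }
}

/// The `Deref` trick: lets `fence.pending` reach into `T` without `inner()`.
impl<T: FenceOps> Deref for FenceObject<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner
    }
}

/// Driver-specific part of the fence.
pub struct MyFence {
    pending: AtomicUsize,
}

impl FenceOps for MyFence {
    fn get_driver_name(_fence: &FenceObject<Self>) -> &str {
        "my_driver"
    }
}

impl MyFence {
    pub fn new() -> Self {
        Self { pending: AtomicUsize::new(0) }
    }

    pub fn add_command(fence: &FenceObject<Self>) {
        fence.pending.fetch_add(1, Ordering::Relaxed);
    }

    pub fn command_complete(fence: &FenceObject<Self>) {
        // fetch_sub() returns the previous value; 1 means this was the last command.
        let remain = fence.pending.fetch_sub(1, Ordering::Relaxed);
        if remain == 1 {
            fence.signal();
        }
    }
}

/// Returns (signaled after first completion, signaled at end, stored error).
pub fn run_example() -> (bool, bool, i32) {
    let fence = FenceObject::new(MyFence::new());
    MyFence::add_command(&fence);
    MyFence::add_command(&fence);
    MyFence::command_complete(&fence);
    let signaled_early = fence.is_signaled();
    fence.set_error(-5); // -5 stands in for EIO in this mock
    MyFence::command_complete(&fence);
    (signaled_early, fence.is_signaled(), fence.error())
}
```

Note that the driver-defined functions still have to be called as `MyFence::add_command(&fence)`; only the `Receiver` or `arbitrary_self_types` variants turn them into `fence.add_command()`, and those need nightly features, so they are left out of this sketch.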