Design meeting 2024-02-28: Arbitrary self types v2

---
title: "Design meeting 2024-02-28: Arbitrary self types v2"
tags: ["T-lang", "design-meeting", "minutes"]
date: 2024-02-28
discussion: https://rust-lang.zulipchat.com/#narrow/stream/410673-t-lang.2Fmeetings/topic/Design.20meeting.202024-02-28
url: https://hackmd.io/T6C4IbfwQwSo-JNpA_qJuA
---

# 2024-02-28: Arbitrary Self Types v2

## The decision to be made

Options:

1. Do not progress arbitrary self types;
2. Do a simple version, with support for new smart pointer types, solving most use-cases;
3. Do a more complex version which can additionally support raw pointers, `NonNull` and `Weak` receivers, solving some additional use-cases.

Compromise option: we agree to do the simple version, so long as we see a path to later support raw pointers, `Weak` and `NonNull`.

## What's "arbitrary self types", anyway?

This is the pattern we want to support:

```rust
struct MyRc<T>(T);

struct Dino;

impl Dino {
    fn roar(self: MyRc<Self>) { } // note receiver type
}

fn main() {
    let dino = MyRc(Dino);
    dino.roar();
}
```

Stable Rust already supports method calls like this:

```rust
struct Dino;

impl Dino {
    fn roar(self: std::rc::Rc<Self>) { }
}

fn main() {
    let dino = std::rc::Rc::new(Dino);
    dino.roar();
}
```

but this is possible only because `Rc` is special (along with `Box`, `Arc` and `Pin`). We can't currently call methods on some custom smart pointer type.
In today's [stable Rust](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=ae1dfe5e3875f4736d98600b016a3bbf), this breaks, saying:

```
error[E0307]: invalid `self` parameter type: MyRc<Dino>
 --> src/main.rs:6:19
  |
6 |     fn roar(self: MyRc<Dino>) { }
  |                   ^^^^^^^^^^
  |
  = note: type of `self` must be `Self` or a type that dereferences to it
  = help: consider changing to `self`, `&self`, `&mut self`, `self: Box<Self>`, `self: Rc<Self>`, `self: Arc<Self>`, or `self: Pin<P>` (where P is one of the previous types except `Self`)
```

There's already a form of arbitrary self type support in nightly, tied to use of `Deref`:

```rust
#![feature(arbitrary_self_types)]

struct MyRc<T>(T);

impl<T> std::ops::Deref for MyRc<T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

struct Dino;

impl Dino {
    fn roar(self: MyRc<Self>) { }
}

fn main() {
    let dino = MyRc(Dino);
    dino.roar();
}
```

The use of `Deref` has some disadvantages:

* some smart pointer types contain a pointer to a `T` which can't safely be turned into a reference that's compliant with Rust's aliasing rules (there are sometimes workarounds here using ZSTs, or `MaybeUninit` and `UnsafeCell` to wrap the `T`)
* some smart pointer types `P<T>` don't always have a `T`, e.g. imagine wrapping a pointer which might be null.
* some smart pointer types want to prevent the vending of normal references, and only vend special references which might (for instance) have `Drop` semantics

In these cases, implementing `Deref::deref` is impossible, so in v2 we propose using a `Receiver` trait instead:

```rust
impl<T> std::ops::Receiver for MyRc<T> {
    type Target = T;
}
// Everything else identical to previous code example
```

We propose a blanket implementation of `Receiver` for all `T: Deref`. This is unusual, but it is believed to be the best choice because:

* We would otherwise be searching along different paths for method candidates, vs. the types to which the actual receiver can be converted as a `self` type in those candidates. This sounds like a recipe for great user confusion.
* Method resolution already searches along the `Deref` chain, even in stable Rust, so we'd have to preserve behavior by searching along _both_ the `Deref` and `Receiver` chains in some way. Extra confusion.
* Existing smart pointer types which implement `Deref` can immediately be used as `self` types.

The net effect of this blanket implementation is that a type has a chain of `Receiver`s to follow, the first few steps of which may also be found by following `Deref` instead:

```mermaid
graph LR
    A --Receiver--> B
    A --Deref--> B
    B --Receiver--> C
    B --Deref--> C
    C --Receiver--> D
    D --Receiver--> E
```

# Rationale for arbitrary self types

* [Rust for Linux](https://github.com/rust-lang/rfcs/pull/3519#discussion_r1492385549) would like their kernel-side equivalents of `Arc`, `Box` etc. to have the same capabilities as `std`'s user-side equivalents. @Darksonn says:
  > The kernel needs a custom Arc for various reasons, but the most important reason is that we need to use the kernel's refcounting logic, instead of the logic used by `alloc::sync::Arc`. This is because the standard library `Arc` will abort the program on overflow, which is entirely unacceptable in the kernel. ... Usually, making `Arc` into a receiver comes up because we have many methods where we want to call `self.clone()` inside a method. ... There are also several other types that we would like to become receivers
* Interop with other languages. In the case of JavaScript, Python and [C++](https://medium.com/@adetaylor/are-we-reference-yet-c-references-in-rust-72c1c6c7015a), we want to represent foreign language references/pointers using a Rust type, and to call methods on such references. (In many cases these foreign language pointers/references can't obey Rust aliasing rules, so Rust references are no good.)
  Example:
  ```rust
  // Call into C++ to get a C++ reference/pointer...
  let cpp_obj_reference: CppRef<ConcreteCppType> = get_cpp_reference();
  // cpp_obj_reference does not obey Rust reference semantics. Other
  // "references" to the same data may exist in the Rust or C++ domain.
  // But it can effectively be used as an opaque token to pass safely
  // through Rust back into C++
  let some_value: u32 = cpp_obj_reference.some_cpp_method();
  ```
* Use-cases where smart references have semantics of their own, for instance custom reference counting types or cases where a UI should be re-laid-out after the last reference disappears.
* We currently can't add methods to `Rc`, `Box` etc. because they might shadow methods on contained types. These smart pointers would be more intuitive if they had methods.
* Accepting raw pointers, `NonNull` and so-on as method receivers in order to allow (for example) field projection methods without any dereferencing or `unsafe`. @Manishearth says:
  > raw pointer receivers are quite important for the future of safe Rust, because stacked borrows makes it illegal to materialize references in many positions, and there are a lot of operations (like going from a raw pointer to a raw pointer to a field) where you don't need to or want to do that.

  and @nikomatsakis says:
  > it enables `*const self` methods -- this is a big win for unsafe code, in my opinion, for all the same reasons. Right now code that wants to take a raw pointer and doesn't want to guarantee reference validity is pretty stuck.

## Adding new methods to smart pointers

We currently cannot add new methods to `Rc`, `Arc`, `Box`, or `Pin`. That's because they may be used as `self` types, and so any method calls must pass through to the contained type. This results in an awkward API for these types, and it would be great to relax this. (We might, for example, want to add new methods for provenance tracking in future).

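To make the "awkward API" point concrete: the standard library exposes most `Rc` functionality as associated functions rather than methods, so that method syntax stays free for the contained type. The sketch below runs on stable Rust; the `strong_count` method on `Dino` and the `counts` helper are made up for illustration.

```rust
use std::rc::Rc;

struct Dino;

impl Dino {
    // A user-defined method that happens to share a name with an `Rc` API.
    fn strong_count(&self) -> usize {
        0
    }
}

// Returns (count reported by `Rc`, value returned by `Dino`'s own method).
fn counts() -> (usize, usize) {
    let dino = Rc::new(Dino);
    let _second = Rc::clone(&dino);

    // `Rc::strong_count` takes no `self`, so it must be called through
    // the type name and cannot be reached with method syntax.
    let from_rc = Rc::strong_count(&dino);

    // Method syntax therefore reaches the method on the contained type.
    let from_dino = dino.strong_count();

    (from_rc, from_dino)
}

fn main() {
    assert_eq!(counts(), (2, 0));
}
```

If `Rc` instead gained a real `strong_count(&self)` method, the second call would silently switch to it, which is the hazard this section describes.
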
We'd also very much like to be able to support `NonNull`, `Weak` and raw pointers as `self` types. Such method calls would look like this:

```rust
use std::ptr::NonNull;
use std::rc::{Rc, Weak};

struct Dino;

impl Dino {
    fn roar(self: NonNull<Self>) { }
    fn eat_toilet(self: *mut Self) { } // *const also OK
    fn trash_visitor_center(self: Weak<Self>) { }
}

fn main() {
    let mut dino = Dino;
    let dino_ptr = &mut dino as *mut Dino;
    dino_ptr.eat_toilet();
    let dino_nonnull = NonNull::new(dino_ptr).unwrap();
    dino_nonnull.roar();
    let dino2 = Rc::new(Dino);
    let dino2_weak = Rc::downgrade(&dino2);
    dino2_weak.trash_visitor_center();
}
```

Supporting these types of calls is seen as highly desirable by many in the Rust lang community, but without special handling this would prevent us adding more methods to `NonNull`, `Weak` or raw pointers, because any such methods would shadow method calls on the inner type. In this case, imagine we add `NonNull::roar()`. It's ambiguous whether `dino_nonnull.roar()` should call `NonNull::roar()` or `Dino::roar()`.

## How we solve this

We propose a new set of rules for disambiguating method calls. The goal is: if we add a new method to `Rc`, `Arc`, `Box`, `NonNull`, `Weak`, raw pointers or any other type implementing `Receiver`, we'd like to continue calling pre-existing inner type methods.

The proposal is to disambiguate to the inner method (for instance `Dino::roar()`), but to show a warning such that the user takes action to disambiguate the call.

Specifically:

* In method probing, we identify if there are multiple method candidates with the same `self` type, but different distances through the `Receiver` chain. (This case will only trigger if the `self` type is identical, because in one case more `Deref`s have been applied than the other - I probably need to word this better, but putting this comment here because it has caused a bit of confusion below)
* If so, we choose the candidate which is furthest along the `Receiver` chain and show a warning.

There's more detail on the implementation of these rules below.

### Why some say this feels counterintuitive

Normally in the Rust world, we probe for methods from the outer to the inner along the `Deref` chain. In this case, we need to disambiguate by choosing the _inner_ method (`Dino::roar` not the outer `NonNull::roar`), because we can be confident that it was added _first_.

An example using a new custom smart pointer type:

```rust
// in crate myrc
struct MyRc<T>(T);

impl<T> std::ops::Receiver for MyRc<T> {
    type Target = T;
}

// in crate jurassic
struct Dino;

impl Dino {
    fn roar(self: MyRc<Self>) { }
}

// somewhere else
fn main() {
    let dino = MyRc(Dino);
    dino.roar();
}
```

If we *later* add `MyRc::roar`, it's important that `main` continues calling `Dino::roar`.

### A note about method resolution order

*(section written by Nadrieril)*

[TC: This may be a distinct counterproposal, though it does have overlap, and it gets into a question of changing the method resolution order more generally.]

Compare:

```rust
impl<T> CustomRc<T> {
    fn frob(&self) { ... }
    // sugar for: fn frob(self: &CustomRc<T>) { ... }
}

impl MyType {
    fn frob(self: &CustomRc<MyType>) { ... }
}
```

This looks like specialization to me. In other words, the impl on `MyType` is more specific than the generic impl, so it makes sense to select it instead. If we ever get specialization on traits, this will feel consistent with that.

This is different from how `Deref` works. If we had `fn frob(self: &MyType)`, this is then less specific than the generic `CustomRc<T>` impl. So it makes sense to select the generic method first.

To sum up, I argue that method resolution should be "most specific first", which means the following order of priority:

1. Inherent method `fn frob(self: &CustomRc<MyType>)` on `MyType`;
1. Inherent method `fn frob(self: &Self)` on `CustomRc<T>`;
1. Trait method `fn frob(self: &Self)` if implemented by `CustomRc<T>`;
1. Inherent method `fn frob(self: &MyType)` on `MyType` (via `Deref`);
1. Trait method `fn frob(self: &Self)` if implemented by `MyType` (via `Deref`).

When mixing several layers of `Receiver` and `Deref`, "most specific first" gives:

1. `fn frob(self: &Arc<Box<Self>>)` on `ConcreteType`;
1. `fn frob(self: &Arc<Self>)` on `Box<T>`;
1. `fn frob(self: &Self)` on `Arc<T>`;
1. `fn frob(self: &Box<Self>)` on `ConcreteType` (via `Deref`);
1. `fn frob(self: &Self)` on `Box<T>` (via `Deref`);
1. `fn frob(self: &Self)` on `ConcreteType` (via two layers of `Deref`).

which is already what was implemented in `arbitrary_self_types` [last I checked](https://gist.github.com/Nadrieril/10d909b02e07ae493db2fe98ce48c715).

EDIT: I (Nadri) had not in fact read the rest of the document. I want to be clear that I do not propose we change anything about `Deref` method resolution. This is only about how `Receiver` method resolution fits into existing method resolution.

### What about if the outer type adds a method first?

Imagine we have:

```rust
impl<T> MyRc<T> {
    fn eat(self) {}
}
```

If the author of `Dino` later adds `Dino::eat(self: MyRc<Self>)` then this will shadow `MyRc::eat`. But! The self-type is `MyRc` and they should already be aware of the methods on `MyRc`. So we don't need to guard against this eventuality.

The same applies if we're talking about adding a trait method `Carnivore::eat(self: MyRc<Self>)` where `Dino: Carnivore`. The creator of `Carnivore` should be aware of the methods on `MyRc`, so there doesn't appear to be any way a compatibility break can occur here except in race conditions. (Thought on this point is appreciated.)

### Why this is OK

Whenever we disambiguate to an inner method like this, we show a warning, which can be easily resolved by fully qualifying the function call.

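Fully qualifying a call already works on stable Rust for the built-in receiver types, so the fix the warning asks for needs no new syntax. A minimal sketch using `Rc<Self>` as the receiver (the return value and the `roar_both_ways` helper are invented for illustration):

```rust
use std::rc::Rc;

struct Dino;

impl Dino {
    // `Rc<Self>` is one of the receiver types stable Rust already accepts.
    fn roar(self: Rc<Self>) -> &'static str {
        "roar"
    }
}

fn roar_both_ways() -> (&'static str, &'static str) {
    let dino = Rc::new(Dino);

    // Method syntax: the compiler searches for a matching candidate.
    let via_method = Rc::clone(&dino).roar();

    // Fully qualified syntax: the callee is named explicitly, so the
    // compiler does not have to choose between candidates.
    let via_path = Dino::roar(dino);

    (via_method, via_path)
}

fn main() {
    assert_eq!(roar_both_ways(), ("roar", "roar"));
}
```
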
**All warning-free Rust code resolves methods from the outside in, in the normal intuitive fashion.** In my opinion this means that this rule, even if some see it as counterintuitive, can cause no significant confusion in the Rust ecosystem.

## How these warnings would be experienced by users

Imagine:

```rust
use std::rc::{Rc, Weak};

struct Orbit;

impl Orbit {
    fn retrograde(self: Weak<Self>) { }
}

fn main() {
    let orbit = Rc::new(Orbit);
    let orbit_weak = Rc::downgrade(&orbit);
    orbit_weak.retrograde();
}
```

then, we add `Weak::retrograde(&self)` to the standard library. The above code would receive a **warning** something like:

```
warning[W0666]: ambiguous function call
   --> src/main.rs:13:4
    |
 13 |     orbit_weak.retrograde();
    |     ^^^^^^^^^^^^
    |
    = note: you may have intended a call to `Orbit::retrograde` or to `Weak::retrograde`
    = note: this method won't be called
   --> src/rc/rc.rs:136:21
    |
136 |     fn retrograde(&self) {
    |     ^^^^^^^^^^^^^^^^^
    |
    = note: because we'll call this method instead
   --> src/space/near_earth.rs:357:68
    |
357 |     fn retrograde(self: Weak<Self>) {
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: call as a function not a method:
    ~ Orbit::retrograde(orbit_weak)
    = help: call as a function not a method:
    ~ Weak::retrograde(orbit_weak)
```

The warning can readily be resolved by adopting either of the suggestions.

(This lint is not yet fully implemented; [work in progress here](https://github.com/rust-lang/rust/compare/master...adetaylor:rust:receiver_trait_with_target#diff-090eb0e88a58a0f9ee4138978029c8d73a05e77306555ab98f076bf24a6f35b9R1289) so some of the details of the lint may not be exactly right. Suggestions much appreciated.)

## More examples of when this lint might appear

* If a user implements a method with a raw pointer receiver, say, `impl WindowPane { fn clear(self: *mut Self) {} }` then we in the standard library implement a new `clear` method on `*mut T`, a warning will be generated for callers of `WindowPane::clear`.
* If a user in crate A implements a smart pointer type, say, `CppRef<T>`. User in crate B implements a method `impl CppParty { fn dance(self: CppRef<Self>) {} }` and then the creators of crate A implement `CppRef::dance()`, such a warning will be produced for anyone calling `CppParty::dance`.
* We have `Box<Door>` and `impl Door { fn open(self: Box<Self>) {} }`. We then add `Box::open(self)` to the standard library. Callers of `Door::open` will start to experience a warning.

## Details of the deshadowing rules

The deshadowing rules apply in two circumstances, which aim to have the same effect:

### Case A: when there are multiple candidates at the same step

For each step of the `Deref` chain, and each type of receiver (by-value, by-reference, by-mut-reference, by-const-pointer) the function `consider_candidates` is called to decide between possible method candidates. Prior to this RFC, if it found multiple possible candidates, it would show *an error*. We downgrade this error to a warning if we believe that the "outer" candidates may be newly added and may be shadowing a pre-existing "inner" candidate.

Specifically the rule is:

* If of all the candidates, exactly one has the greatest "depth" - that is, there's one which is deeper through the `Receiver` chain than all the others, then
    * We emit a warning
    * We discard all the other candidates.

Under all other circumstances, we continue to emit an error as we do now.

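The Case A rule can be modelled as a small function over candidates that each carry a depth. This is a toy model for reading the rule, not the compiler's `consider_candidates`: the `Candidate` and `Outcome` types, the `resolve_same_step` name, and the representation of depth as a plain integer are all assumptions made for illustration.

```rust
/// A method candidate, reduced to what the rule needs (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    /// Where the method is defined, e.g. "Foo::m".
    name: &'static str,
    /// Number of `Receiver` steps from the outermost type to the type
    /// whose impl block defines the method.
    depth: usize,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Nothing matched at this step.
    NoCandidates,
    /// Exactly one candidate: nothing to disambiguate.
    Pick(Candidate),
    /// Several candidates, exactly one deepest: pick it and warn.
    PickWithWarning(Candidate),
    /// Several candidates tie for deepest: keep today's error.
    Error,
}

fn resolve_same_step(candidates: &[Candidate]) -> Outcome {
    if let [only] = candidates {
        return Outcome::Pick(only.clone());
    }
    let Some(max_depth) = candidates.iter().map(|c| c.depth).max() else {
        return Outcome::NoCandidates;
    };
    // Look at the candidates sharing the greatest depth.
    let mut deepest = candidates.iter().filter(|c| c.depth == max_depth);
    match (deepest.next(), deepest.next()) {
        (Some(winner), None) => Outcome::PickWithWarning(winner.clone()),
        _ => Outcome::Error,
    }
}

fn main() {
    let outer = Candidate { name: "SmartPtr::m", depth: 0 };
    let inner = Candidate { name: "Foo::m", depth: 1 };
    let other = Candidate { name: "SomeTrait::m", depth: 1 };

    // The inner method wins, with a warning.
    assert_eq!(
        resolve_same_step(&[outer.clone(), inner.clone()]),
        Outcome::PickWithWarning(inner.clone())
    );
    // Two candidates at the greatest depth remain an error.
    assert_eq!(resolve_same_step(&[outer, inner, other]), Outcome::Error);
}
```
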
This rule applies in cases like this:

```rust
impl<T> SmartPtr<T> {
    fn m(&self) {}
}

impl Foo {
    fn m(self: &SmartPtr<Self>) {}
}

fn main() {
    let f = SmartPtr(Foo);
    f.m();
}
```

### Case B: when we've found picks of different kinds

The picking code currently picks:

* The best possible candidate where `self` is received by value; and if none, it picks:
* The best possible candidate where `self` is received by reference; and if none, it picks:
* The best possible candidate where `self` is received by mutable reference; and if none, it picks:
* The best possible candidate where `self` is received by const ptr.

This serial approach doesn't work in cases like this:

```rust
impl<T> SmartPtr<T> {
    fn m(self) {} // note by value
}

impl Foo {
    fn m(self: &SmartPtr<Self>) {} // note by reference
}

fn main() {
    let f = SmartPtr(Foo);
    f.m();
}
```

Assume that `SmartPtr::m` was added later; it may shadow `Foo::m` which may have been added earlier.

So, in `pick_all_method`, we no longer do these steps in series. We instead work out the best candidate by value, by reference, and by mutable reference. We consider `&mut T` to be the weakest, `&` to be intermediate, and `T` to be the strongest kind of method calls. If a stronger method call might shadow a weaker method call, _and_ the weaker method call is further along the `Receiver` chain, then we instead pick that method and show the same warning.

### Possible enhancements to these rules

As an implementation detail: possibly, the current method resolution logic (`pick_all_method`, `pick_by_value_method`, `pick_autorefd_method`, `consider_candidates`) should be flattened to consider a single list of candidates such that these deshadowing rules can be applied in one place not two.

## Other benefits of these rules

If we employ these rules, they allow us to add methods to `Box`, `Rc` etc. safe in the knowledge that we won't shadow methods in the contained types.

Also, all current workarounds for the lack of "arbitrary self types" pose similar shadowing hazards. For instance:

```rust
// crate: myrc
pub mod myrc {
    pub struct MyRc<T>(pub T);

    impl<T> MyRc<T> {
        // Consider what happens to downstream if we later add:
        // pub fn eat(&self) {}
    }
}

// crate: dino
pub mod dino {
    use super::myrc::MyRc;

    pub mod prelude {
        pub use super::MyRcExt;
    }

    pub struct Dino;

    pub trait MyRcExt {
        fn eat(&self) {}
    }

    impl MyRcExt for MyRc<Dino> {
        fn eat(&self) {
            println!("dino eat");
        }
    }
}

// crate: downstream
use dino::{prelude::*, Dino};
use myrc::MyRc;

fn main() {
    MyRc(Dino).eat();
}
```

The addition of these rules would remove these shadowing hazards.

## Other options

* Decide to allow method calls on `NonNull`, `Weak` and raw pointers, by simply deciding never to add more methods to them again.
* Decide to support arbitrary self types without adding support for `NonNull`, `Weak` and raw pointers. Do not add this rule. Disallow, or at least advise against, implementing `Receiver` for types with methods.
* Radically change a future edition of Rust to resolve all methods from the inside out!

## Non-goals of this proposal

The current proposal covers only method dispatch with a custom `self` type. It does not cover:

* Dynamic upcasting
* Dynamic dispatch. Some say this is more important: for example @nbdd0121 [says](https://hackmd.io/z4n40072Tqy8MhQPZn0N-g):
  > [dynamic dispatch] provides real power to library authors. Currently, smart pointers that allow dynamic dispatch is a privilege that only standard library smart pointers enjoy. It's extremely difficult if not impossible to replicate this feature in 3rd party libraries, and even it's possible, it won't be as ergnomical.

  It's certainly important that we don't constrain our ability to broaden dynamic dispatch in the future.

## Decisions

@Darksonn says:

> One idea that has come up again and again in the Rust for Linux discussions is that, if there is some feature where we aren't able to stabilize it because some details are uncertain, then we can still stabilize the part of the feature that Rust for Linux needs.
>
> This is very relevant for this RFC: The kernel does not really need raw pointers and `Weak` to become receivers. If this RFC ends up being blocked just because we can't find a solution to those things, then that would be really unfortunate. There is a clear path forward on the easy part of the `Arc` problem. Let's solve that so we can move on to the hard part.

## Background reading & references

* The [proposed RFC](https://github.com/rust-lang/rfcs/pull/3519).
* The consensus that we _really_ want to support `NonNull`, `Weak` etc. even if we have to do something a bit weird:
    * [Summary of views](https://github.com/rust-lang/rfcs/pull/3519#issuecomment-1824448078)
    * Results of [a prior T-lang meeting discussing this](https://github.com/rust-lang/rfcs/pull/3519#issuecomment-1853465851)
* The [comment explaining the rules needed to avoid compatibility problems with `NonNull` etc.](https://github.com/rust-lang/rfcs/pull/3519#issuecomment-1858223400).
* [Zulip thread](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Arbitrary.20self.20types.20v2.20RFC)
* [Gary's thoughts on different aspects of smart pointers](https://hackmd.io/z4n40072Tqy8MhQPZn0N-g)

---

# Discussion

## Attendance

- People: TC, tmandry, Adrian Taylor, Nadri, nikomatsakis, Mara, eholk, pnkfelix, Gary Guo, Josh, scottmcm

## Meeting roles

- Minutes, driver: TC

## We can't add a method to Box because we can't change stable Deref method resolution order

Nadri: Having read the proposal, I don't see how you can add a method to `Box`. Today it is legal to define:

```rust!
struct MyBox<T> { ... }

impl<T> MyBox<T> {
    fn foo(&self) { ... }
}

struct MyType;

impl MyType {
    fn foo(&self) { ... }
}
```

Calling `x.foo()` on `x: MyBox<MyType>` will, without error, select the `foo` defined on `MyBox`. Hence method resolution order involving `Deref` is set in stone. Hence no amount of warning can allow us to add methods to a `Deref` smart pointer in a non-breaking way. What am I missing?

Adrian: I am not proposing to change this. The cases where warnings would be shown are those where the self type involves the same number of derefs, because that's where the ambiguity occurs. In this case a call to `MyType::foo` would involve one extra deref so none of the new behavior would occur. I agree that's probably not clear!

Nadri: OK, I was confused by the "Adding new methods to smart pointers" heading and first paragraph.

nikomatsakis: I believe that *if* (in the future) we adopted this "inside-out" resolution rule more uniformly, we could add methods to `Box`, by the same logic. As is, I think we cannot. (But we can for `*mut T`, as it uses the inside-out resolution rule.)

scottmcm: Doesn't inside-out just change *who* can't add new things? If we changed to inside-out then people trying to call the Box methods would have the problem of getting something different.

## Fourth option: support raw pointers and NonNull but not *arbitrary* self types?

Josh: Note: I am in favor of shipping this as proposed, but want to make sure we consider all possibilities. If we don't have consensus to ship this as proposed, I'd rather have limited support than no support.

Josh: It seems like we could support raw pointers and NonNull without committing to supporting *arbitrary* self types. Could we evaluate that independently?

TC: `NonNull` also?

Josh: Yes, fixed.

Adrian: That would not meet the needs of the cross-language interop use-cases, but that doesn't mean we shouldn't do it...

Josh: Agreed, and it also wouldn't meet some Rust-for-Linux needs.

Mara: NonNull already has methods taking `self`, so you'd still have to solve the same problems with that.

NM: Is there anyone who would block an attempt to proceed with the smaller subset that did not address raw pointers and `NonNull`?

JT: I would block if I felt it was a one-way door that prevented us from doing them in the future.

NM: Nobody knows if there is a path?

scottmcm: Let's talk about the next point then...

## Do we really need arbitrary self types for raw pointer and `NonNull`?

Gary: It was mentioned that there's a consensus to pursue allowing raw pointers and `NonNull` for arbitrary self types, with a link to https://github.com/rust-lang/rfcs/pull/3519#issuecomment-1824448078. But I don't read it as a strong consensus to allow arbitrary self types of them. The majority of the problem/complexity of this RFC seems to originate from the need to support those types, and it could be vastly simplified if these types are omitted.

Adrian: FWIW as proposer of this RFC I am personally not attached to providing `NonNull`, pointers, etc. support but I've heard from multiple people that this is the main use-case which makes arbitrary self types appealing for the Rust lang community, which is why I've tried to take it in this direction. We should definitely discuss.

scottmcm: I think the "not something holding a reference" part is undeniably critical, but "on a raw pointer directly" has never felt persuasive to me. Saying that you need to wrap the raw pointer in something else -- `CppRef`, `kernel::Weak`, etc -- seems like it could plausibly be entirely fine. Hardly the first place where the answer is "look, you should be newtyping if you want to do this.".

Gary: I think this is a good point. ~~But could there be workaround for that for dereffing to a middle type provided by the standard library that is fundamental so that users can implement types on. E.g. `MyPtr: Deref<BikeshedSomeThingLikeUnsafeCell<Foo>>` and allow user to implement on `BikeshedSomeThingLikeUnsafeCell<Foo>`. Not a pretty solution, but is a workaround.~~ Actually you can't do inherent impl on foreign types even if it's fundamental.

Adrian: I think it's generally _actively good_ for us to encourage newtype wrappers in cases where we want things to have semantics like this. However, the counterargument I've heard is that `unsafe` Rust gets generally a lot _nicer_ if you can call methods on raw pointers.

scottmcm: I'd love to see elaborated versions of that showing why those things can't reasonably be newtypes.

Adrian: let's pester Niko about it :)

Gary: Personally I think using `TypeName::method_name` isn't too bad especially when you're already going to attach `unsafe` and a safety comment to that method call. I don't think we want to make doing unsafe calls too easy :)

Mara: Agree on not allowing it for raw pointers but pushing for wrapping in a newtype. Raw pointers already have [many self-taking methods](https://doc.rust-lang.org/stable/std/primitive.pointer.html#implementations). You need to wrap a pointer in a newtype anyway for other reasons, e.g. to define its thread safety (Send/Sync).

Gary: Also, for FFI, one observation: for Rust-for-Linux we never need to use raw pointers as receivers. We always define a wrapper struct that is `UnsafeCell<MaybeUninit<CType>>`

---

pnkfelix: It seems important that people can use raw pointers here?

scottmcm: It's critical that it avoids creating references, so that it's not forcing people to meet references rules (tree borrows scope limits, immutability/noalias restrictions, etc). The question is whether `self: *const Foo` *directly* is essential or if `self: MyTransparentPtrWrapper<Foo>` is fine.

scottmcm: The question is really whether people should need to make newtypes for all of these.

Josh: Do we lack a consensus on proceeding with arbitrary self types v2, the full version?

scottmcm: I would at the moment.

NM: If we abandoned supporting raw pointers and said you always had to use a newtype, and then we had the rule that once you implement `Deref` or `Receiver` that you can't add methods to the type, that seems simpler and better to you scottmcm?

scottmcm: Yes.

tmandry: We need to decide, I think, on the inside out rule -- the resolution rule -- or we need to find a way to disallow new methods on types that are arbitrary self types. We can't change our minds later on this.

Nadri: Could we make these all hard errors?

JT: Making them hard errors would introduce the SemVer problem here. We'd probably at least want to add a lint here.

Nadri: Adding a method to a Deref type is already breaking in stable Rust so that's not something new.

TC: There's an example of this in the document above using traits.

Nadri: We don't even need traits, see: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=24a1d92bd6e24a4e167e8a3a94c5dc0a

NM: Let's talk about high level goals...

## Sketching the constraints

NM:

Hard constraints:

* We want to permit adding additional methods to `*mut T`
* We want to permit adding `self: MyRc<T>` methods
* We need code C to continue compiling, with same semantics, in rustc vN and vN+1
* We need `trait Receiver` and `trait Deref: Receiver` because not all smart pointer types can support deref (for good reasons)
* We need a way to define methods that do not form a (safe) Rust reference when called
* We need a way to invoke through `dyn` (not covered in the RFC)

Soft goals:

* We want to support `self: *mut T`...
    * ...but if we add add'l methods to `*mut T` that would be breaking...
    * ...leading to the inside-out resolution.

```rust
impl RayonTask {
    fn call_method(self: *mut Self) {
        RayonTask::call_method(self, ...)
        self.call_method(...)
    }
}
```

Gary: This doesn't seem like a hard goal to me.

NM: This is a matter of ergonomics, but it seems an important one.

pnkfelix: This reminds me of `await!(..)` vs `.await`. In some sense these are equivalent, but people did feel there was a big difference.

Nadri: maybe this isn't just ergonomics, e.g. if the method was in a trait and we want dynamic dispatch on the `*mut dyn Trait`.

Gary: Dynamic dispatch should be separate from receiver and method resolution: https://hackmd.io/z4n40072Tqy8MhQPZn0N-g

NM:

```rust
trait Job {
    fn call(*mut self);
}

let job: *mut dyn Job;
job.call(...);

//
// what rayon did is something like
struct JobPair {
    data: *mut (),
    f: fn(*mut ()),
}
```

Gary: We should not tie together the method call syntax and dynamic dispatch. These are two orthogonal things.

Nadri: Example for Gary's idea:

```rust
trait Job {
    fn call(x: *mut Self);
}

let ptr: *mut dyn Job = ...;
Job::call(ptr) // does dynamic dispatch
```

NM: What scottmcm was suggesting is that people write this:

```rust
struct RayonRef<T> {
    data: *mut T
}

// but we cannot
impl<T> Receiver for RayonRef<T> { }

trait Job {
    fn call(self: RayonRef<Self>);
}

let ptr: RayonRef<dyn Job> = ...;
ptr.call();
```

NM: We'd have to have some way to get at the vtable.

Gary: But then we could not later add methods to `RayonRef`, correct?

NM: Yes.

NM: If we changed the rules such that we follow the `Receiver` chain and did nothing more, that would work, but we'd have to use newtypes. I think that is incompatible with adding support later for `*mut`.

Nadri: so that would be committing to some method resolution order, I see.

---

Mara: Everything that implements `Deref` in the standard library has been designed for that, by not having methods. But that's not the case for other types like `Weak`. To change this, even with the new rules, we'd have to think about how many warnings we'd be creating in the ecosystem when we add a new method. That's something I'd rather not spend time thinking about when designing their API.

Mara: There are other reasons for why you should wrap pointers anyway, like implementing Send/Sync correctly. So maybe we should just push people to do that.

Nadri: My proposal (assuming it makes sense) should not need warnings, which would mean the libs team wouldn't have to worry about adding methods. pnkfelix: Are we talking about adding special types for this? I'm thinking about atomics here, e.g. Mara: I'm not sure. pnkfelix: There are lots of problems here no matter what. NM: There seems an analogy to me about the `itertools` APIs. Something that has downstream consequences has to be considered. --- TC: I'm curious, historically, to what degree were these tradeoffs, e.g. that the method resolution rule implies that types that implement `Deref` shouldn't add new methods (and that adding new methods is breaking anyway because e.g. of how people can use traits), intended versus accidental? NM: It was a bit of both. Some of these calls seem wrong to me now, looking back. Sort of yes and sort of no. ## Inner-first versus most-specific-first tmandry: I want to reserve some time to talk about Nadri's proposal, how it differs from the one in the rest of the doc, and if it's feasible and preferable in some way. Personally I find the rule of "inner first" clearer than "most specific first" (more specific in what way?), but there might be advantages to this rule. Nadri: I don't know what "inner first" is but that doesn't sound compatible with today's `Deref` nikomatsakis: I don't really follow Nadri's proposal, though I get the high-level idea of it. The examples are confusing to me. Nadri: what's confusing? can I do something to clarify? Nadri: my intent is to ensure adding methods on a non-Deref pointer type isn't breaking by always resolving in favor of the downstream crate. it so happens that it looks like specialization to me, hence why I'm arguing it's a decent choice. ## RecvPtr tmandry: Maybe something like this would work... ```rust! struct RecvPtr<T>(NonNull<T>); impl<T> NonNull<T> { fn receive(self) -> RecvPtr<T> { .. } } impl<T> Receiver for RecvPtr<T> { .. } ``` NM: This is a good example of why we'd want this. 
It would be awkward to solve without this.

scottmcm: Is there any possibility of us supporting this on a restricted set of things? E.g. fundamental types where no method resolution conflict could come up.

NM: We probably shouldn't get too distracted on the atomic question.

Gary: gathered some thoughts on today's workaround, i.e. define a `Deref` with `panic!()` or post-mono error impl. Consensus: don't want that.

## Straw poll

NM: I'd like a straw poll about where people stand. I'm feeling a bit convinced by scottmcm's position.

tmandry: Is the proposal to leave a path for doing that in the future or to not solve it ever?

NM: I don't think there is a path except using an edition. We could of course stop adding methods to `*mut`.

scottmcm: I don't think the latter one is realistic. The presence of so many methods indicates that we probably don't have the full desirable set.

NM: It's not a decision we'd be making today. We could tweak the method resolution on an edition boundary.

scottmcm: I do like that point.

NM: For me, the killer is that the design doesn't fully solve it.

tmandry: So this would be to adopt the full set of rules for `Deref` and apply those to `Receiver`?

NM: Yes.

scottmcm: If we can avoid new rules, that seems best.

NM: It seems we do need to solve the atomics problem at some point.

pnkfelix: There's no path that doesn't involve having a reference somewhere. That's a bear trap rather than a footgun.

NM: My other concern is whether we can eventually support `dyn`?

---

TC: Josh, how are you feeling about this given what you expressed earlier?

JT: This would be disappointing, but a solution for some things would be better than a solution for no things. There are a number of people who don't even care about arbitrary self types, they just want us to support it on pointers. And there are people who want the opposite.

NM: The fact that these use cases are distinct maybe suggests we should treat these separately.
TC: I'm curious how we feel about the existing problem here, i.e. what people would do with traits in the absence of arbitrary self types that has the same SemVer problem.

NM: My feeling over the course of this meeting has shifted to feeling like we're not solving that problem either even with this proposal. It seems we should attack that directly and separately.

Mara: My feeling is it doesn't really fit Rust to not be wrapping pointers in your own pointer type. I think of `*const T` more as "an address", rather than "a pointer". Its (safe) methods are about manipulating the address (offset, alignment, etc.). To make a pointer/reference thing, you wrap it into something that has that address as a field.

JT: Would we get any benefit from using `Receiver` for the arbitrary self types case, and using some other specific mechanism for raw pointers?

Nadri: I'd be interested in examples of why doing the forwards compatible thing is much harder than using the `Deref` rules.

NM: Thinking about that.

Nadri: The hard error would be so that we can ship something today while leaving the door open.

NM: That's a viable option. It's plausible. I can see the appeal of leaving some space.

Nadri: We'll need some new errors anyway.

scottmcm: I'd be fine reserving some space, subject of course to it not adding too many of its own new traps.

## Next steps

JT: Has the alternative of using wrapper types been added to the RFC?

Adrian: It was in an earlier one. I can add it back.

NM: Let's write up short versions of the options on the table and have each lang-team member write up their position on each one.

Adrian: I can do that.

(The meeting ended here.)

---

## What's the most restricted version of arbitrary self types we could do

Nadri: If we error in all cases where there would be method resolution ambiguity, is this maximally forward-compatible? Can we do this and figure later what to relax to allow or not `NonNull: Receiver`?
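As a concrete illustration of the kind of ambiguity in question, here is a minimal sketch using today's stable `Deref`-based resolution rather than the proposed `Receiver` trait. `MyRc` and `Dino` follow the earlier examples in this doc; the `eat` methods and their return strings are assumptions made up for this sketch.

```rust
use std::ops::Deref;

// Hypothetical smart pointer, mirroring the `MyRc` used earlier in this doc.
struct MyRc<T>(T);

impl<T> Deref for MyRc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> MyRc<T> {
    // Imagine this method arriving in a later release of the pointer crate.
    fn eat(&self) -> &'static str {
        "MyRc::eat"
    }
}

struct Dino;

impl Dino {
    fn eat(&self) -> &'static str {
        "Dino::eat"
    }
}

fn main() {
    let dino = MyRc(Dino);
    // Both methods are candidates. The outer type is probed first, so this
    // picks `MyRc::eat` without any error or warning.
    println!("{}", dino.eat());
    // The inner method stays reachable through an explicit dereference.
    println!("{}", (*dino).eat());
}
```

Under the existing rules the outer method wins silently; a rule that turned this situation into a hard error would reject the first call instead of choosing a winner.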
Nadri: In other words, is there an easy version of the feature?

Adrian: I am in favor of initially doing a minimal version which is just for newtypes, not for `NonNull`, pointers etc. but I think there's already consensus that the pointer, etc. version is sufficiently valuable that we'd want to understand how we get there, to avoid constraining the solution space.

Nadri: I guess that's my question then: do we have enough of an idea of how to get there?

## Separating hard/soft constraints

nikomatsakis: I was thinking that we should try to be more specific in what we mean when we say (e.g.) backwards compatibility or other things. Specifically, the hard constraint we have is:

* code C compiles with compiler version N and has behavior X
* *implies*
* code C compiles with compiler version N+1 and has behavior X

...but not that "code C' (which compiles in N) has same behavior in N and N+1", though it'd be nice if that were true.

This feels a bit clumsy now that I write it out, but let me get to the point -- given a change where N+1 introduces a method on `*mut`, the point is that the inside-out rule preserves this invariant, because before there was no method, so you can't call it. In contrast, adding a method to some inner type (going from C to C') could have impact on the behavior. i.e., library semver is a bit tricky here, right? But specifically only in race condition situations where C to C' adds a method and N to N+1?

Ok. Thanks for attending the musings in my head.

## Warning-free code

pnkfelix: the text says that "Warning-free code resolves outside in, in the normal intuitive fashion." I want to try to double-check my understanding, when you have crates being developed in parallel here. Is there a scenario where 1. you have three crates (D provides Dino, M provides MyRc, and U uses both of them together), 2. D and M are both not aware of each other, 3.
D and M in their respective development histories each add methods with the same name (but not using each other's types, due to (2)), and 4. U either itself adds a method or calls a method, and doesn't get a warning, but the resulting call is depending on the race-condition of which was pulled in?

Adrian: I am definitely keen to have more brainpower applied here. This is the area I am least confident on.

TC: I've been calling this the race condition case: dino 1.1 depends on myrc 1.1. myrc 1.2 adds `MyRc::eat`. Downstream app depends on myrc and dino and calls `MyRc(Dino).eat()`. The dino author, still depending on myrc 1.1 himself, adds `Dino::eat` without knowledge of `MyRc::eat`. Downstream app runs `cargo update` and gets dino 1.2. Method resolution on the call changes.

Josh: Would this require one of the crates to have written a method using a *generic* arbitrary self type, since it doesn't know about the concrete type it has (because of your constraint (2))?

## libs-api

(from Jitsi chat:)

Mara: for allowing arbitrary self types for std types that are *not* Deref (e.g. Weak or NonNull), libs-api should be consulted before accepting that, since that significantly impacts API design.

Niko Matsakis: interesting.

Josh Triplett: In general, or only if we're recommending against having methods on those types?

Josh Triplett: In any case, that would only be allowed if those libs-api types implemented Receiver, which is up to libs-api.

Mara: libs-api should at least check what effect it has on library maintainers and (std) api design.

Josh Triplett: Agreed.

Josh Triplett: In particular, if we're approving an RFC that has extra complexity in order to support raw pointers and NonNull, we should only do that if libs-api expects to support this on raw pointers and NonNull.
😃

## Dynamic dispatch: not constraining our future selves

Adrian: Gary has pointed out that in future we are likely to want dynamic dispatch too, and we should ensure we're not constraining our future design space there. Should we discuss?

(this was indeed discussed - NM and Gary agreed that this is probably not closing any doors)

## Has the types team reviewed this and approved?

Josh: In order to catch potential issues that we may not have foreseen, I'd want to make sure the types team had reviewed this.