Extern types V2

Introduction

Extern types were accepted into the language as RFC 1861, but that RFC is not implementable because extern types do not have a known alignment. I am proposing RFC 3396, which changes the meaning of ?Sized and introduces a new trait (MetaSized) to fix this. It requires an edition change and has some unresolved design questions.

Summary

Some types do not have a size known at runtime. These fall into 3 categories (that I know about):

FFI types - opaque types not defined by Rust. If these come from a dynamically linked library, the size/alignment can even vary at runtime. The Rustonomicon currently suggests using zero-sized types for this but admits this is suboptimal.
CStr-like types - CStr has a known alignment (1) but its size can only be determined by iterating over the bytes.
Types with an opaque tail - it's a common pattern in C to have an enum like object with a header containing a discriminant and an opaque tail. Unlike Rust enums, the size of the object can vary as long as it stays behind a pointer.
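
As an illustration of the first category, the zero-sized-type workaround described in the Rustonomicon looks roughly like the sketch below. The name `OpaqueFoo` and the helper `claimed_layout` are hypothetical and exist only for this example.

```rust
use core::marker::{PhantomData, PhantomPinned};

/// Stand-in for a foreign type whose layout Rust does not know.
/// `OpaqueFoo` is a hypothetical name used only for this sketch.
#[repr(C)]
pub struct OpaqueFoo {
    // Zero-sized private field: the struct carries no data of its own and
    // cannot be constructed outside this module.
    _data: [u8; 0],
    // Opts out of Send, Sync and Unpin, since nothing is known about the
    // foreign object.
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

/// Returns the (size, alignment) that the compiler assigns to the stand-in.
/// Both values are fictions: the real foreign object may be larger and more
/// strictly aligned, which is why the workaround is suboptimal.
pub fn claimed_layout() -> (usize, usize) {
    (
        core::mem::size_of::<OpaqueFoo>(),
        core::mem::align_of::<OpaqueFoo>(),
    )
}
```

The compiler treats `OpaqueFoo` as a size-0, alignment-1 type, so calls such as `ptr::read` or `mem::size_of_val` compile and silently give meaningless answers.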

In order to more easily discuss these types I will introduce some descriptors for the size and alignment of types:

statically - the size/alignment is known by the Rust compiler at compile time. This is the current Sized trait. Most types in Rust are statically sized and aligned, like u32, String, &[u8].
metadata - the size/alignment can be derived purely from pointer metadata without having to inspect or dereference the pointer.
All remaining types fit in this category and are DSTs. [u8] has a statically known alignment but the size can only be determined from the pointer metadata, dyn Debug's size and alignment are both obtained from the vtable in the pointer metadata.
dynamically - the size/alignment can only be determined at run time. There are no types currently expressible in the language with dynamically known size or alignment.
The most discussed potential type in this category is CStr, which has a statically known alignment but its size can only be determined by iterating over its contents to find the position of the null byte. Note that these types are odd: for example, determining the size of a Mutex<CStr> requires taking a lock on the mutex.
unknown - the size/alignment is not able to be determined at compile time or run time.
This is the category that opaque types fall in (and no other existing types occupy) without any additional domain specific knowledge. Therefore extern types will occupy this category to allow the most flexibility.
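
The first three descriptors can be observed with today's standard library. The sketch below is only an illustration; the function names are made up for this example, and the `CStr` case is an approximation because `&CStr` is currently still a fat pointer.

```rust
use std::ffi::CStr;
use std::fmt::Debug;
use std::mem::{align_of_val, size_of, size_of_val};

/// Statically sized: the size is a compile-time constant.
pub fn static_size() -> usize {
    size_of::<u32>()
}

/// Metadata sized: the length stored in the fat pointer gives the size,
/// without reading the slice contents.
pub fn slice_size(s: &[u16]) -> usize {
    size_of_val(s)
}

/// Metadata sized and metadata aligned: both values come from the vtable
/// that the fat pointer carries.
pub fn dyn_layout(d: &dyn Debug) -> (usize, usize) {
    (size_of_val(d), align_of_val(d))
}

/// Dynamically sized (approximation): a thin `CStr` would have to scan for
/// the nul terminator here. Today this call is a constant-time cast, so it
/// only stands in for that scan.
pub fn c_string_size(s: &CStr) -> usize {
    s.to_bytes_with_nul().len()
}
```

There is no function for the "unknown" category because, by definition, no Rust code can compute such a layout.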

"dynamically aligned" (or "unknown aligned") types cannot be placed as the last field of a struct as their offset cannot be determined without already having a pointer to the field. This is the main issue with the previous RFC. Because generic structs exist, I don't believe we can do something simpler like an explicit error for using these types in a struct (unless we accepted a post-monomorphisation error).

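To see why the alignment of a trailing field matters, the following sketch measures the offset of an unsized tail whose alignment is only available from pointer metadata. `WithTail`, `tail_offset` and `offsets` are hypothetical names introduced for this example.

```rust
use std::fmt::Debug;

/// A header followed by a possibly unsized tail.
#[repr(C)]
pub struct WithTail<T: ?Sized> {
    pub tag: u8,
    pub tail: T,
}

/// Byte offset of `tail` inside `value`. For a `dyn` tail the compiler
/// computes this at run time from the alignment stored in the vtable.
pub fn tail_offset(value: &WithTail<dyn Debug>) -> usize {
    let base = (value as *const WithTail<dyn Debug>).cast::<u8>() as usize;
    let tail = (&value.tail as *const dyn Debug).cast::<u8>() as usize;
    tail - base
}

/// Offsets of the tail for a 1-byte-aligned and a 4-byte-aligned payload.
pub fn offsets() -> (usize, usize) {
    let narrow_sized = WithTail { tag: 0, tail: 1u8 };
    let wide_sized = WithTail { tag: 0, tail: 1u32 };
    // Unsizing coercions erase the concrete tail type.
    let narrow: &WithTail<dyn Debug> = &narrow_sized;
    let wide: &WithTail<dyn Debug> = &wide_sized;
    (tail_offset(narrow), tail_offset(wide))
}
```

The same field lands at offset 1 or offset 4 depending on the tail's alignment. A tail with no alignment available from metadata gives the compiler nothing to compute that offset from.
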
The solution this RFC proposes is to add a MetaSized trait that means a type is metadata sized (and implies it's metadata aligned) and to relax ?Sized to mean a type has unknown size and alignment (rather than metadata size and alignment).

The lack of the Sized and MetaSized traits on a type prevents you from calling ptr::read, mem::size_of_val, etc, which are not meaningful for opaque types.

In the 2021 edition and earlier, these types cannot be used in generic contexts as T: Sized and T: ?Sized both imply that T has a computable size and alignment.

In the 2024 edition and later, T: ?Sized no longer implies any knowledge of the size and alignment so opaque types can be used in generic contexts. If you require your generic type to have a computable size and alignment you can use the bound T: ?Sized + MetaSized, which will enable you to store the type in a struct.

The automated tooling for migrating from the 2021 edition to the 2024 edition will replace ?Sized bounds with ?Sized + MetaSized bounds.
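
A rough way to picture which bounds the migration affects: the sketch below compiles on today's Rust and marks in comments where, under this proposal, `MetaSized` would or would not be needed. The item names are hypothetical, and `MetaSized` itself is deliberately not used so that the code builds on a current compiler.

```rust
use std::mem::size_of_val;

/// Only compares through references, so nothing here needs the pointee's
/// size or alignment. Under the proposal the bound could stay `?Sized`.
pub fn smaller<'a, T: ?Sized + Ord>(x: &'a T, y: &'a T) -> &'a T {
    if y < x { y } else { x }
}

/// Asks for the pointee's size. Under the proposal this would need
/// `T: ?Sized + MetaSized` (assumption based on the text above).
pub fn byte_len<T: ?Sized>(x: &T) -> usize {
    size_of_val(x)
}

/// Stores `T` as a trailing field, so the field offset depends on `T`'s
/// alignment. Under the proposal this would also need `+ MetaSized`.
pub struct Tagged<T: ?Sized> {
    pub tag: u32,
    pub value: T,
}
```

The migration tooling described above would conservatively add the bound to all three; only the last two actually depend on it.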

The RFC proposes adding extern types defined like so:

extern "C" {
    type Foo;
}

Foo is !Sized, !MetaSized, !Send, !Sync, !Freeze and is FFI safe. It cannot be included in a struct unless it is the single non-zero-sized field of a repr(transparent) struct.

I believe sorting out the syntax and precise semantics of extern types is secondary to the MetaSized trait as they can be sorted after the 2024 edition.

Unresolved questions

Does this require implied bounds?

Box and Arc will require MetaSized bounds, which means that you'd have to write traits like this:

pub trait Trait {
    fn foo(self: Box<Self>) where Self: MetaSized;
    fn bar(self: Arc<Self>) where Self: MetaSized;
}
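
For comparison, here is a sketch of the same trait as it can be written today, without the extra `where` clauses. `Unit` and `call_both` are hypothetical names; the comments mark where the proposed bound would go.

```rust
use std::sync::Arc;

pub trait Trait {
    // Under the proposal, each method would carry `where Self: MetaSized`,
    // because `Box` and `Arc` need the layout of `Self` to deallocate.
    fn foo(self: Box<Self>) -> String;
    fn bar(self: Arc<Self>) -> String;
}

pub struct Unit;

impl Trait for Unit {
    fn foo(self: Box<Self>) -> String {
        "boxed".to_string()
    }
    fn bar(self: Arc<Self>) -> String {
        "shared".to_string()
    }
}

/// Calls both methods through trait objects.
pub fn call_both() -> (String, String) {
    let boxed: Box<dyn Trait> = Box::new(Unit);
    let shared: Arc<dyn Trait> = Arc::new(Unit);
    (boxed.foo(), shared.bar())
}
```

Implied bounds would let the `where` clauses be inferred from the receiver types instead of being spelled out on every method.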

Do we need to worry about bounds in std that can't be relaxed because it would be a breaking change?

Should "metadata sized" imply "metadata aligned" or should we be adding the MetaAligned trait rather than MetaSized?

Should MetaSized be a supertrait of Sized?

All Sized things are MetaSized but Sized doesn't semantically require MetaSized.

Should users be able to slap a `#[repr(align(n))]` attribute onto opaque types to give them an alignment?

This would allow us to represent CStr properly but would necessitate splitting MetaSized and MetaAligned as it is only "dynamically sized" but "statically aligned". (We may be able to get away with the Aligned trait)

Notes from meeting

Attendees: Jack Rickard (Skepfyr), tmandry, Josh, scottmcm, TC
Minutes: TC

Actions

Look into how many ?Sized bounds would need + MetaSized.
Can we remove the MetaSized bound on Box so that it doesn't need to be mentioned as much?

Can we work around this?

pnkfelix: why can't the compiler just assume that such types have an alignment of 1? It won't be able to construct the pointers to them itself anyway, right – as in, it has to accept the pointers from foreign code, so it seems to me like assuming the maximally conservative alignment (in terms of not knowing anything about what the foreign code might do) would work here, in terms of making it illegal to e.g. use those low-order bits to store a niche?

pnkfelix: Oh, I just read enough to see the bit about "using the extern type as the last field of a struct" … that … hmm.

Skepfyr: The other thing is that the compiler still needs to know that these types are different so that it can prevent you from putting them in structs.

pnkfelix: And we cannot just "treat extern types as special" because we need to be able to instantiate type parameters as instances of them, okay.

pnkfelix: We could get away with a post-monomorphization time error here though, right? Where we reject uses of all extern types (unless they otherwise indicate their alignment) as a struct field? (Indeed, the doc as written alludes to this…)

(in meeting…)

Skepfyr: These types have to be understood by the compiler to not be allowed in certain places. Post-mono errors are definitely a possibility here.

pnkfelix: I need to better understand the use-cases here to understand how the value compares to adding this complexity to the language.

pnkfelix: This adds a lot of mental overhead for users. Maybe?

Josh: It is almost the case that rather than a post-mono error you could prohibit its use in generic contexts entirely.

scottmcm: I'm not convinced by that. People will want byte-add on this kind of thing, for example.

Josh: Fair.

Josh: Clarifying… under what circumstances do you want the generic type rather than a pointer to it?

scottmcm: https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.byte_add has the generic type as the pointee, for example.
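
A minimal sketch of the point about byte_add: the function below is generic over a `?Sized` pointee and never asks for its size. `skip_bytes` and `second_element` are hypothetical names used only here.

```rust
/// Advances a pointer by `count` bytes without needing the size of `T`.
///
/// # Safety
/// Same contract as `pointer::byte_add`: the result must stay within the
/// allocated object that `ptr` points into.
pub unsafe fn skip_bytes<T: ?Sized>(ptr: *const T, count: usize) -> *const T {
    unsafe { ptr.byte_add(count) }
}

/// Reads the second element of a small array by stepping over the first.
pub fn second_element() -> u32 {
    let values = [10u32, 20, 30];
    let first: *const u32 = values.as_ptr();
    // SAFETY: one `u32` past the first element is still inside `values`.
    unsafe { *skip_bytes(first, std::mem::size_of::<u32>()) }
}
```

The caller supplies the byte count, so the generic code works for any pointee, which is why people would want it for extern types too.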

Skepfyr: …something like Swift.

Josh: If you had a generic type that included the pointer type, that wouldn't be a problem, but since we have types like Rc, we need to allow these in a generic context.

tmandry: I agree about the overhead and the high cost. All contexts where we have ?Sized today we don't want to accept these types. There was previous t-lang guidance to not add a new ?-bound. If this is something you only run into very rarely, maybe it's not so bad.

Skepfyr: The thing that worries me the most is blanket impls. There are probably some of those that people will want to add to these types.

tmandry: So you're saying that you'll see ?MetaSized a lot, e.g. on the blanket impls.

scottmcm: I'm curious how often these will end up in a box rather than being a reference where you aren't looking at the size at all. I don't know that references always care, but Box and Arc do.

Skepfyr: Yeah, I'm not sure. I don't know.

digama0: How do you differentiate between NonNull<T> which can take a ?MetaSized and Rc<T> which can't, without a trait bound? Would a post-mono error even be able to get this right without hard-coding everything about these types?

Skepfyr: That's roughly what we're talking about. You need that or some kind of post-mono error.

scottmcm: I ponder things like &A: PartialEq<&B> https://doc.rust-lang.org/std/primitive.reference.html#impl-PartialEq<%26B>-for-%26A, which doesn't need MetaSized, and don't know if that's more or less common than generic-in-Box.

Skepfyr: There's also the fact that things like PartialEq will be implemented on references to the type since the types themselves don't have a size. So it's likely that the blanket impls will usually apply because people will have implemented on the references.

tmandry: scottmcm, clarification on your question…

scottmcm: If you have && to extern type, you'd want to still forward that to the underlying PartialEq. Once you know that it's a slice, you know it's MetaSized, but the forwarding didn't require that, so it forwarded the generic.

tmandry: We still use the impl, PartialEq for slice of T. And that impl knows how to get the size…

scottmcm: Once you're implementing PartialEq for a slice, you have .len on that slice. When you're implementing it on i32, you have the size.
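
The forwarding being discussed can be sketched as follows. `eq_through_refs` is a hypothetical helper; it relies on the existing standard library impl of `PartialEq<&B> for &A`.

```rust
/// Compares two values through two layers of references.
pub fn eq_through_refs<A, B>(a: &&A, b: &&B) -> bool
where
    A: ?Sized + PartialEq<B>,
    B: ?Sized,
{
    // Each `&` layer is peeled off by the std impl `PartialEq<&B> for &A`,
    // which forwards to `A: PartialEq<B>`. No layer asks for the size of
    // `A` or `B`; only the innermost impl may need it.
    a == b
}
```

With `A = str` or `A = [u8]` the innermost impl uses the length from the pointer metadata, but the forwarding layers themselves would not need a `MetaSized` bound.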

Discuss post-monomorphization errors

TC: The document writes:

Because generic structs exist, I don't believe we can do something simpler like an explicit error for using these types in a struct (unless we accepted a post-monomorphisation error).

How bad is this exactly as compared with the alternative proposed?

scottmcm: Would we feel comfortable saying this is a layout post-mono error? Like we have existing [u8; 1<<49] post-mono errors on x64…
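
For readers unfamiliar with the term, a post-monomorphization error is one that only appears once a generic item is instantiated with a concrete type. The sketch below uses an associated-const assertion as a stand-in; it is not the layout error mentioned above, and the names `RequireNonZeroSize` and `wrap` are hypothetical.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

struct RequireNonZeroSize<T>(PhantomData<T>);

impl<T> RequireNonZeroSize<T> {
    // Evaluated once per concrete `T`, after monomorphization.
    const CHECK: () = assert!(size_of::<T>() != 0, "T must not be zero-sized");
}

/// Type-checks for every `T`, but instantiating it with a zero-sized type
/// (for example `wrap(())`) fails the build at monomorphization time.
pub fn wrap<T>(value: T) -> Vec<T> {
    // Referencing the constant forces its evaluation for this `T`.
    let () = RequireNonZeroSize::<T>::CHECK;
    vec![value]
}
```

The signature of `wrap` gives no hint of the restriction, which is the usability concern raised about handling extern types this way.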

Skepfyr: This is just a marker type. So a post-mono error is definitely available. The only minor caveat: I have wondered whether this type should have methods on it, such as size_of_val and align_of_val. At the moment they are free functions so it would just work. But if you made them methods on the trait, that maybe makes a bit more sense.

tmandry: It definitely feels like a different category of post-mono error than the existing ones. It does feel more like the case we already have for Sized. So there's an argument that it would be inconsistent. But we do need to be pragmatic.

pnkfelix: I can see the value in having a trait that let people who want to perform this reasoning to get that. But I worry about people being forced to do that plumbing when a post-mono error would be fine for most cases. I haven't yet figured out whether the ?-bounds would help.

scottmcm: we'll definitely have a bound that includes it, since T: Thin is already RFC-accepted, but I don't know if that would meet everyone's needs.

tmandry: The set of use-cases for ?MetaSized is the subset of ?Sized, and so I could see a world where we just continue writing ?Sized everywhere… and we wait to decide whether it's MetaSized and we defer figuring that out to post-mono.

Josh: It sounds like a thing that we could do. But how often do we expect this to come up? Our expectation if we go with the proposal here is that… there's an ordering of how often each thing is.

scottmcm: The thing I'd emphasize is impls/functions, we'll have more types that are Sized rather than MetaSized.

For example, fn foo<T: ?Sized>(x: &T) { ... } might usually not actually care about MetaSized.

Josh: Does anyone feel like a huge number of things will not be Sized but will want to go in a struct?

Skepfyr: You do need the MetaSized bound to put it in a Box. You need the layout when dropping it.

Pondering implied bounds here

scottmcm: I guess part of the problem with Box<T> is that it's Drop that needs the bound, so it's not something where we could add some kind of "you don't need to prove it just to mention the type" escape hatch? (Since Drop can't have more requirements than the type itself, IIRC.)

Josh: If it weren't an issue for Drop, then I'd agree, the obvious answer is that Box::new requires MetaSized but the existence of the type doesn't, but since we don't have linear types, we don't have a way to say that.

Skepfyr: You could do something like require that all ways to create a Box require MetaSized, and then not require Box to be MetaSized, and that would fix the issue.

scottmcm: But that would be new magic, a layout that only Box can use?

scottmcm: I guess we could have a version of "get the layout" that panics if the type is not MetaSized. If something like Box or Arc used it, it would always be fine because the panic would always be unreachable.

tmandry: Maybe reasonable?

scottmcm: As a temperature check… if we had a way where this would be required for Box::new or Arc::new, but it wouldn't require the bound to use the Box or Arc, would that make people feel better about this?

tmandry: Not me. It's type-state as a pattern. And it might make the MetaSized trait less ubiquitous.

Josh: I'm honestly hesitant to do this as a post-mono error, because then anyone who does encounter it, it would be unique and special and different. Doing it as a bound is at least normal, even if advanced. We have a big fancy type system but then choose not to use it here? It's a bit weird. It seems like it would be optimizing for minimizing the number of times one writes MetaSized rather than optimizing for users of those interfaces.

pnkfelix: scottmcm was proposing implied bounds?

scottmcm: Josh is skeptical of the post-mono, so writing MetaSized would be good, so then the question is, can we not write this where not needed.

pnkfelix: Three things. The proposal as written here; if you're not ?Sized, you're ?MetaSized. MetaSized is a sort of normal trait. But it's implicit. Option 2: A new ?Sized bound. I could imagine a new world where ?Sized stays the same; it means MetaSized… Third option: Implied bounds of some form.

scottmcm: The option in the middle, ?Sized + MetaSized… if you call size_of_val, the compiler could tell you what to do.

tmandry: Presumably the compiler could tell you that you should have written one rather than the other.

scottmcm: One way around may be harder. If we've taught people ?Sized for "look, I want non-sized things", then the compiler suggesting + MetaSized seems easier than it being able to suggest "you meant ?MetaSized instead".

Josh: If we do a world with one, then that's a world in which extern types can't appear everywhere.

pnkfelix: Is that a bad world? You have to reason about each individual type.

Josh: The default should be that you ask for the things that you need. Most things are going to ask for Sized. How often will this really come up?

tmandry: That is the key question that we need to answer. I had assumed most things would be MetaSized.

Josh: What cases will likely come up in practice? How often will you have a case where you have to add + MetaSized for it to work, other than for smart pointers.

Josh: To be clear, I think it will be less than 10% in user code, but not in the standard library. There are more smart pointers there.

tmandry: The use-cases for this are when you want to allocate and when this needs to be the trailing field in a struct.

Josh: You could generalize it a bit to placement in structs.

Josh: We could do something implicit; if you put it in a field, obviously you require that it's MetaSized.

scottmcm: That would be the first trait bound we did that for; we do that for lifetime bounds.

Josh: We shouldn't do that in this proposal. But we should consider a follow-on proposal that would make this better.

scottmcm: Everything that today uses ?Sized on a type would need ?MetaSized? Unless it's holding a &T.

scottmcm: The first thing I think of for ?Sized is something like

fn foo<T: ?Sized + Ord>(x: &T, y: &T) { ... }

that almost certainly doesn't care about MetaSized.

Skepfyr: What I'm interested in is, what needs to happen to progress this?

Josh: There are some factual questions that could be answered here. E.g. what fraction of bounds that now need ?Sized will need + MetaSized? 10%, 50%, 90%?

Skepfyr: I keep flipping back and forth.

tmandry: Is there something that we could do with crater?

Josh: Could we just categorize the things in the standard library?

Skepfyr: There actually aren't that many ?Sized in the standard library.

scottmcm: it might not be as bad as you think, because

impl<B: ?Sized + ToOwned> Cow<'_, B> {

for example is sort of a smart pointer, but doesn't use MetaSized.

scottmcm: Maybe the top libraries really should be exporting the MetaSized bounds. Maybe that's OK if most code doesn't have to say this most of the time.

tmandry: Going through the exercise is probably what has the most value.

scottmcm: there's just so many forwarding things like impl<T: fmt::Display + ?Sized> ToString for T { that don't care about MetaSized.

Josh: As an example, trait AsRef<T: ?Sized> doesn't need + MetaSized because it just gives you a reference back to T.

Skepfyr: From what I've seen ?Sized just doesn't show up that much.

TC: Maybe the value of going through the exercise is to come up with a set of patterns. Even if we didn't have numbers, we could look at the patterns and build an intuition for how likely they are to come up.

Skepfyr: Happy to go through code and document this.

scottmcm: You said something really interesting; that in your early survey most code never says ?Sized.

Josh: That's why the post-mono error seemed like the wrong path.

scottmcm: The other thing that might be possible is maybe exploring what does a Box look like that maybe can get away without this requirement on the type so it doesn't need to be mentioned everywhere.

Josh: You're proposing that Box could call that, but then if it failed on drop, it would panic?

scottmcm: But it would be unreachable. But the point is that this should be explored as a way forward.

tmandry: Sketching it out as an alternative or future possibility would be good.

Josh: There's one other aspect worth considering. Putting a not-MetaSized thing into a Box with a different allocator might be OK.

scottmcm: Interesting. I was imagining a Box using a malloc/free allocator… It wouldn't be a type restriction that it's MetaSized at all.

Josh: Your allocator could be that deallocating calls a specific FFI function.

scottmcm: I like this as a principled reason for why Box itself does not require MetaSized.

Josh: The majority of allocators would, but this wouldn't.

tmandry: We probably don't need to design the full API…

Skepfyr: You'd get some of the bounds from the other generic.

Josh: Looking at the allocator trait, none of it passes around a size, it passes around a layout. There's something you could do here, where an FFIAllocator would know the Layout that works for that one size and doesn't let you use that Layout to allocate the type.
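
The idea of deallocating through a specific FFI function can be sketched without the unstable allocator API, using a dedicated owning pointer instead. Everything named here (`Widget`, `widget_create`, `widget_destroy`, `WidgetBox`, `DESTROYED`) is hypothetical, and the "foreign library" is simulated in Rust so that the example runs on its own.

```rust
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Nomicon-style stand-in for an extern type.
#[repr(C)]
pub struct Widget {
    _private: [u8; 0],
}

/// Counts calls to `widget_destroy` so the sketch can be checked.
pub static DESTROYED: AtomicUsize = AtomicUsize::new(0);

// These two functions play the role of the foreign library. A real binding
// would declare them in an `extern "C"` block instead.
pub fn widget_create() -> *mut Widget {
    Box::into_raw(Box::new(42u64)).cast::<Widget>()
}

pub unsafe fn widget_destroy(ptr: *mut Widget) {
    // SAFETY: `ptr` came from `widget_create`, which allocated a `u64`.
    drop(unsafe { Box::from_raw(ptr.cast::<u64>()) });
    DESTROYED.fetch_add(1, Ordering::SeqCst);
}

/// Owning pointer whose destructor never needs the pointee's layout.
pub struct WidgetBox(NonNull<Widget>);

impl WidgetBox {
    pub fn new() -> Option<Self> {
        NonNull::new(widget_create()).map(WidgetBox)
    }
}

impl Drop for WidgetBox {
    fn drop(&mut self) {
        // SAFETY: the pointer was produced by `widget_create` and is
        // released exactly once, by the library's own function.
        unsafe { widget_destroy(self.0.as_ptr()) }
    }
}
```

Only the library knows the size of the allocation; the Rust side just hands the pointer back. A `Box` parameterised over such an allocator would have the same property.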

Using the term "opaque types"

TC: We already use the term "opaque types" to mean something very specific in Rust. We probably shouldn't overload that word (@oli also raised this in Zulip).

tmandry: +1

Skepfyr: Right… I'll try to think of a synonym there.

Josh: Maybe we could just call these "extern types".

Skepfyr: That sounds good.

(Meeting ended here.)

When do we need the length of a CStr-like type?

Josh: Is there any circumstance where we need the length of such a type, other than at the user's explicit request for the length, or when moving it, or when dropping it? I want to make sure we never implicitly take the mutex behind the scenes.

Can opaque types be explicitly marked `Send` or `Sync`?

Josh: Is it possible to explicitly mark extern types as being Send or Sync? Or mark pointers to them as being Send?

Skepfyr: Yes, I expect unsafe impl Send etc to be supported like anything else. (This means that they can never be Freeze but can be any of the other auto traits)
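
A sketch of what that looks like with today's zero-sized stand-in for an extern type. `Handle` and the helper functions are hypothetical, and the safety claim in the comment is an assumption about an imaginary foreign library.

```rust
use std::marker::PhantomData;

/// Stand-in for an extern type. The raw-pointer marker makes it
/// `!Send` and `!Sync` by default, as extern types are proposed to be.
#[repr(C)]
pub struct Handle {
    _data: [u8; 0],
    _marker: PhantomData<*mut u8>,
}

// SAFETY (assumed for this sketch): the foreign library documents that its
// handles may be moved between threads and used from several at once.
unsafe impl Send for Handle {}
unsafe impl Sync for Handle {}

/// Compiles only if `T` is both `Send` and `Sync`.
pub fn is_send_and_sync<T: ?Sized + Send + Sync>() -> bool {
    true
}

pub fn handle_is_thread_safe() -> bool {
    is_send_and_sync::<Handle>()
}
```

Removing either `unsafe impl` makes `handle_is_thread_safe` fail to compile, which is the opt-in behaviour described above.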

Can we automatically detect that `MetaSized` is unnecessary?

Josh: The proposed edition migration turns ?Sized into ?Sized + MetaSized. However, in some cases such bounds won't actually require MetaSized. 1) Can rustc detect cases where a MetaSized bound appears unnecessary, and lint? 2) Can we somehow integrate that detection into the edition migration, and not add the bound when unnecessary?

Can we spell out what you can do with an extern type?

Josh: Can you put a pointer to an extern type roughly anywhere? What about a reference, or mutable reference? Presumably you cannot create one unless you manually allocate memory?

scottmcm: I think the core thing you can do is that extern types are why there's a difference between https://doc.rust-lang.org/core/ptr/traitalias.Thin.html and Sized: you get to make *const ()-sized pointers and references to pointees that are not Sized.

Josh: Could we implicitly have a reference be a thin reference, since there's no way to have a non-thin reference?

scottmcm: Can you elaborate on "implicitly" there? It needs to be based on something, right?
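
The thin versus non-thin distinction can be checked on a current compiler. In this sketch `Opaque` is a hypothetical zero-sized stand-in for an extern type, and the fat-pointer widths reflect how current compilers lay out references rather than a language guarantee.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Nomicon-style stand-in for an extern type.
#[repr(C)]
pub struct Opaque {
    _private: [u8; 0],
}

/// Widths in bytes of: a thin raw pointer, a reference to the opaque
/// stand-in, a slice reference, and a trait-object reference.
pub fn pointer_widths() -> [usize; 4] {
    [
        size_of::<*const ()>(),
        size_of::<&Opaque>(),
        size_of::<&[u8]>(),
        size_of::<&dyn Debug>(),
    ]
}
```

A reference to the opaque stand-in is one word, like `*const ()`, while the slice and trait-object references carry a second word of metadata.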

Can we do this without changing the meaning of `Sized`?

tmandry: Could we leave the existing meaning of ?Sized as it is and only allow uses of extern types in places that have + ?MetaSized bounds?

(As I understand it, this proposal changes the meaning of ?Sized so that every existing bound has to be rewritten as ?Sized + MetaSized. Is that right?)

Skepfyr: Yes that's correct. This proposal was written with the previous lang team guidance of not adding any more ?Trait bounds.

scottmcm: see https://github.com/Skepfyr/rfcs/blob/extern-types-v2/text/3396-extern-types-v2.md#opt-out-trait-bound---metasized for that alternative in the proposed RFC.

Clarification on 2021 edition language