maybe_dangling
Summary
Declare that references and `Box` inside a new `MaybeDangling` type do not need to satisfy any memory-dependent validity properties (such as `dereferenceable` and `noalias`).
Motivation
Example 1
Sometimes one has to work with references or boxes that either are already deallocated, or might get deallocated too early. This comes up particularly often with `ManuallyDrop`. For example, the following code is UB at the time of writing this RFC:
It is unsound because we are passing a dangling `ManuallyDrop<Box<i32>>` to `id`. In terms of invariants required by the language ("validity invariants"), `ManuallyDrop` is a regular `struct`, so all its fields have to be valid, but that means the `Box` needs to be valid, so in particular it must point to allocated memory; but when `id` is invoked, the `Box` has already been deallocated. Given that `ManuallyDrop` is specifically designed to allow dropping the `Box` early, this is a big footgun (that people do run into in practice).
Example 2
There exist more complex versions of this problem, relating to a subtle aspect of the (currently poorly documented) aliasing requirements of Rust: when a reference is passed to a function as an argument (including nested in a struct), then that reference must remain live throughout the function. (In LLVM terms: we are annotating that reference with `dereferenceable`, which means "dereferenceable for the entire duration of this function call"). In issue #101983, this leads to a bug in `scoped_thread`. There we have a function that invokes a user-supplied `impl FnOnce` closure, roughly like this:
The closure has a non-`'static` lifetime, meaning clients can capture references to on-stack data. The surrounding code ensures that `'lifetime` lasts at least until `signal_done` is triggered, which ensures that the closure never accesses dangling data.
However, note that `thread` continues to run even after `signal_done`! Now consider what happens if the closure captures a reference of lifetime `'lifetime`:
The type of `closure` is a struct (the implicit unnameable closure type) with a `&'lifetime mut T` field. References passed to a function must be live for the entire duration of the call.
The closure runs, `signal_done` runs. Then, potentially, this thread gets scheduled away and the main thread runs, seeing the signal and returning to the user. Now `'lifetime` ends and the memory the reference points to might be deallocated.
Now we have UB! The reference that was passed to `thread` with the promise of remaining live for the entire duration of the function actually got deallocated while the function still runs. Oops.
Example 3
As a third example, consider a type that wants to store a "pointer together with some data borrowed from that pointer", like the `owning_ref` crate. This will usually boil down to something like this:
Such a type is unsound when `T` is `&mut U` or `Box<U>` because those types are assumed by the compiler to be unique, so any time `OwningRef` is passed around, the compiler can assume that `buffer` is a unique pointer, an assumption that this code breaks because `ref_` points to the same memory!
Goal of this RFC
The goal of this RFC is to
`unsafe` code
(Making the 2nd example UB-free without code changes would incur cost across the ecosystem; see the alternatives discussed below.)
The examples described above are far from artificial; here are some real-world crates that need `MaybeDangling` to ensure their soundness (some currently crudely work around that problem with `MaybeUninit`, but that is really not satisfying):
Yoke and Yoke again (the first needs opting out of `dereferenceable` for the yoke, the latter needs opting out of `noalias` for both yoke and cart)
ouroboros
Guide-level explanation
To handle situations like this, Rust has a special type called `MaybeDangling<P>`: references and boxes in `P` do not have to be dereferenceable or follow aliasing guarantees. This applies inside nested references/boxes inside `P` as well. They still have to be non-null and aligned, and it has to at least be possible that there exists valid data behind that reference (i.e., `MaybeDangling<&!>` is still invalid). Also note that safe code can still generally assume that every `MaybeDangling<P>` it encounters is a valid `P`, but within unsafe code this makes it possible to store data of arbitrary type without making reference guarantees (this is similar to `ManuallyDrop`). In other words, `MaybeDangling<P>` is entirely like `P`, except that the rules that relate to the contents of memory that pointers in `P` point to (dereferenceability and aliasing restrictions) are suspended when the pointers are not being actively used. You can think of the `P` as being "suspended" or "inert".
The `ManuallyDrop<T>` type internally wraps `T` in a `MaybeDangling`.
This means that the first example is actually fine: the dangling `Box` was passed inside a `ManuallyDrop`, so there is no UB.
The 2nd example can be fixed by passing the closure in a `MaybeDangling`:
The 3rd example can be fixed by storing the `buffer` inside a `MaybeDangling`, which disables its aliasing requirements:
As long as the `buffer` field is not used, the pointer stored in `ref_` will remain valid.
Reference-level explanation
The standard library contains a type `MaybeDangling<P>` that is safely convertible with `P` (i.e., the safety invariant is the same), and that has all the same niches as `P`, but that does allow passing around dangling boxes and references within unsafe code. `MaybeDangling<P>` propagates auto traits, drops the `P` when it is dropped, and has (at least) `derive(Copy, Clone, Debug)`.
"Behavior considered undefined" is adjusted as follows:
Note: this diff is based on an updated version of the reference.
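To make the library surface described at the start of this section more tangible, here is a minimal sketch of a wrapper with the same outward shape, written in stable Rust. It is a stand-in under stated assumptions, not the proposed implementation: the name `MaybeDanglingSketch` is hypothetical, `new` and `into_inner` are modeled on `ManuallyDrop`, and the `Deref`/`DerefMut` impls are included only for illustration (whether the real type should have them is an open question in this document). The sketch uses `MaybeUninit<P>` as the inner storage, the crude workaround mentioned above, so unlike the proposed type it does not keep the niches of `P`.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the proposed type (not a std API).
/// `MaybeUninit` is used as storage because it is the workaround the
/// text mentions; it loses the niches of `P`, which the real type keeps.
pub struct MaybeDanglingSketch<P>(MaybeUninit<P>);

impl<P> MaybeDanglingSketch<P> {
    /// Safe constructor: the safety invariant is the same as that of `P`.
    pub fn new(value: P) -> Self {
        Self(MaybeUninit::new(value))
    }

    /// Safe extraction, modeled on `ManuallyDrop::into_inner`.
    pub fn into_inner(self) -> P {
        // Suppress our own `Drop` so the value is not dropped twice.
        let this = ManuallyDrop::new(self);
        // SAFETY: the field is initialized in `new` and read out exactly once here.
        unsafe { this.0.assume_init_read() }
    }
}

impl<P> Deref for MaybeDanglingSketch<P> {
    type Target = P;
    fn deref(&self) -> &P {
        // SAFETY: the field is always initialized while `self` exists.
        unsafe { self.0.assume_init_ref() }
    }
}

impl<P> DerefMut for MaybeDanglingSketch<P> {
    fn deref_mut(&mut self) -> &mut P {
        // SAFETY: as in `deref`.
        unsafe { self.0.assume_init_mut() }
    }
}

impl<P> Drop for MaybeDanglingSketch<P> {
    /// Dropping the wrapper drops the `P`, as the text requires.
    fn drop(&mut self) {
        // SAFETY: still initialized; `into_inner` skips this destructor.
        unsafe { self.0.assume_init_drop() }
    }
}

fn main() {
    let mut wrapped = MaybeDanglingSketch::new(Box::new(41));
    **wrapped += 1;
    assert_eq!(*wrapped.into_inner(), 42);
}
```

Unlike `ManuallyDrop`, this wrapper has no `unsafe` drop method: the contents are dropped together with the wrapper, and unsafe code that lets the contents dangle is responsible for not touching them again.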
Another way to think about this is: most types only have "by-value" requirements for their validity, i.e., they only require that the bit pattern be of a certain shape. References and boxes are the sole exception: they also require some properties of the memory they point to (e.g., they need to be dereferenceable). `MaybeDangling<T>` is a way to "truncate" `T` to its by-value invariant, which changes nothing for most types, but means that references and boxes are allowed as long as their bit patterns are fine (aligned and non-null) and as long as there conceivably could be a state of memory that makes them valid (`T` is inhabited).
codegen is adjusted as follows:
`Newtype<&mut i32>` is marked as `dereferenceable(4) noalias aligned(4)`. When traversing below `MaybeDangling`, no memory-related attributes such as `dereferenceable` or `noalias` are emitted. Other value-related attributes such as `aligned` are still emitted. (Really this happens as part of computing the `ArgAttributes` in the function ABI, and that is the code that needs to be adjusted.)
Miri is adjusted as follows:
`MaybeDangling`. (Note that by default, Miri will not do any such recursion, and only retag bare references. But that is not sound, given that we do emit `noalias` for newtyped references and boxes. The `-Zmiri-retag-fields` flag makes retagging "peer into" compound types to retag all references it can find. This flag needs to become the default to make Miri actually detect all UB in the LLVM IR we generate. This RFC says that that traversal stops at `MaybeDangling`.)
Comparison with some other types that affect aliasing
`UnsafeCell`: disables aliasing (and affects but does not fully disable `dereferenceable`) behind shared refs, i.e. `&UnsafeCell<T>` is special. `UnsafeCell<&T>` (by-val, fully owned) is not special at all and basically like `&T`; `&mut UnsafeCell<T>` is also not special.
`UnsafeAliased`: disables aliasing (and affects but does not fully disable `dereferenceable`) behind mutable refs, i.e. `&mut UnsafeAliased<T>` is special. `UnsafeAliased<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafeAliased<T>` is also not special.
`MaybeDangling`: disables aliasing and `dereferenceable` for all references (and boxes) directly inside it, i.e. `MaybeDangling<&[mut] T>` is special. `&[mut] MaybeDangling<T>` is not special at all and basically like `&[mut] T`.
Drawbacks
For users of `ManuallyDrop` that don't need this exception, we might miss optimizations if we start allowing example 1.
We are accumulating quite a few of these marker types to control various aspects of Rust's validity and aliasing rules: we already have `UnsafeCell` and `MaybeUninit`, and we are likely going to need a "mutable reference version" of `UnsafeCell` to properly treat self-referential types. It's easy to get lost in this sea of types and mix up what exactly they are acting on and how. In particular, it is easy to think that one should do `&mut MaybeDangling<T>` (which is useless, it should be `MaybeDangling<&mut T>`); this type applies in the exact opposite way compared to `UnsafeCell` (where one uses `&UnsafeCell<T>`, and `UnsafeCell<&T>` is useless).
Rationale and alternatives
The most obvious alternative is to declare `ManuallyDrop` to be that magic type with the memory model exception. This has the disadvantage that one risks memory leaks when all one wants to do is pass around data of some `T` without upholding reference liveness. For instance, the third example would have to remember to call `drop` on the `buffer`. This alternative has the advantage that we avoid introducing another type, and it is future-compatible with factoring that aspect of `ManuallyDrop` into a dedicated type in the future.
Another tempting alternative is to attach the special meaning not to a type, but an attribute. We could have a `#[maybe_dangling]` attribute that can be attached to ADTs, such that references and `Box` inside that type are not required to be dereferenceable or non-aliasing as the type gets moved around. This has the advantage that users can attach the attribute to their own type and directly access the fields, so e.g. `MyType` can have a `Box<T>` field and all of the magic of `Box` is still available, but the type can be moved around freely without worrying about aliasing. For the compiler and Miri implementation this would barely make a difference; we would simply stop recursing into fields when encountering any type with that attribute (rather than only stopping when encountering the magic `MaybeDangling` type).
Another alternative is to change the memory model such that the example code is fine as-is. There are several variants of this:
[Make all examples legal] All newtype wrappers behave the way `MaybeDangling` is specified in this RFC. This means it is impossible to do zero-cost newtype-wrapping of references and boxes, which is against the Rust value of zero-cost abstractions. It is also a non-compositional surprise for type semantics to be altered through a newtype wrapper.
[Make examples 1+2 legal] Or we leave newtype wrappers untouched, but rule that boxes (and references) don't actually have to be dereferenceable. This is just listed for completeness' sake; removing all those optimizations is unlikely to make our codegen folks happy. It is also insufficient for example 3, which is about aliasing, not dereferenceability.
[Make only the 2nd example legal] We could remove the part about references always being live for at least as long as the functions they are passed to. This corresponds to replacing the LLVM `dereferenceable` attribute by a (planned but not yet implemented) `dereferenceable-on-entry`, which matches the semantics of references in C++. But that does not solve the problem of the `ManuallyDrop<Box<_>>` footgun, i.e., the first example. (We would have to change the rules for `Box` for that, saying it does not need to be dereferenceable at all.) Nor does it help the 3rd example. Also this loses some very desirable optimizations, such as
Under the adjusted rules, `x` could stop being live in the middle of the execution of `foo`, so it might not be live any more when the `return` is executed. Therefore the compiler is not allowed to insert a new use of `x` there.
We could more directly expose ways to manipulate the underlying LLVM attributes (`dereferenceable`, `noalias`) using by-value wrappers. (When adjusting the pointee type, such as in `&UnsafeCell<T>`, we already provide a bunch of fine-grained control.) However there exist other backends, and LLVM attributes were designed for C/C++/Swift, not Rust. The author would argue that we should first think of the semantics we want, and then find ways to best express them in LLVM, not the other way around. And while situations are conceivable where one wants to disable only `noalias` or only `dereferenceable`, it is unclear whether they are worth the extra complexity. (On the pointee side, Rust used to have a `Unique` type, which still exists internally in the standard library and was intended to provide `noalias` without any form of `dereferenceable`. It was deemed better to not expose this.)
Instead of saying that all fields of all compound types still must abide by the aliasing rules, we could restrict this to fields of `repr(transparent)` types. That would solve the 2nd and 3rd example without any code changes. It would make it impossible to package up multiple references (in a struct with multiple reference-typed fields) in a way that their aliasing guarantees are still in full force. Right now, we actually do emit `noalias` for the 2nd and 3rd example, so codegen of existing types would have to be changed under this alternative. It would not help for the first example.
Finally we could do nothing and declare all examples as intentional UB. The 2nd and 3rd example could use `MaybeUninit` to pass around the closure / the buffer in a UB-free way. That will however require `unsafe` code, and leaves `ManuallyDrop<Box<T>>` with its footgun (1st example).
Prior art
The author cannot think of prior art in other languages; the issue arises because of Rust's unique combination of strong safety guarantees with low-level types such as `ManuallyDrop` that manage memory allocation in a very precise way.
Inside Rust, we do have precedent for wrapper types altering language semantics; most prominently, there are `UnsafeCell` and `MaybeUninit`. Notice that `UnsafeCell` acts "behind references" while `MaybeDangling`, like `MaybeUninit`, acts "around references": `MaybeDangling<&T>` vs `&UnsafeCell<T>`.
Unresolved questions
What should the type be called? `MaybeDangling` is somewhat misleading since the safety invariant still requires everything to be dereferenceable; only the validity requirement of dereferenceability and noalias is relaxed. This is a bit like `ManuallyDrop`, which supports dropping via an `unsafe` function but whose safety invariant says that the data is not dropped (so that it can implement `Deref` and `DerefMut` and a safe `into_inner`). Furthermore, the type also allows maybe-aliasing references, not just maybe-dangling references. Other possible names might be things like `InertPointers` or `SuspendedPointers`.
Should `MaybeDangling` implement `Deref` and `DerefMut` like `ManuallyDrop` does, or should accessing the inner data be more explicit since that is when the aliasing and dereferenceability requirements do come back in full force?
Future possibilities
Design meeting minutes
Attendance: TC, pnkfelix, scottmcm, waffle, yosh, RalfJ, tmandry
Minutes, driver: TC
Not-a-question
scottmcm: I started thinking about changing validity rules, and if the "references don't need valid things behind them" affects this, but that's irrelevant because this is more about things we definitely want like the uniqueness requirements on `&mut`, so I don't think we'd want to solve this by weakening the validity rules.
Question: Pondering the "implement `Vec` with `Box`" problems
scottmcm: `Box`'s aliasing rules have made it disallowed to make `Vec<T>` be a wrapper around `Box<[MaybeUninit<T>]>`. I guess with this RFC it would be possible to do that by making it a wrapper around `MaybeDangling<Box<[MaybeUninit<T>]>>` and doing some careful pointer access to the internals? It still drops, which is one of the important gains from having the field be a `Box` instead of a pointer. But maybe the extra level of wrapping would lose basically all of the advantages of having it being a Box, since the normal Box helpers wouldn't be usable without re-introducing the aliasing issue.
Ralf: Yes that sounds accurate. Almost all of `Vec` only needs `as_ptr`/`as_mut_ptr` which shouldn't be too hard to implement without re-introducing aliasing… I hope.
scottmcm: Vec's implementation would never make a Box directly maybe. If I have a reference to a Box, does that introduce the problems here?
RalfJ: Not under any of the aliasing semantics we currently have. SB and TB are only on the outer pointers. That's for deep reasons in how the models work. We shouldn't have aliasing for deeper references in pointers, but we maybe haven't made an official call there.
scottmcm: It sounds like this would still be elegant.
RalfJ: …
RalfJ: If we put Box into Vec, we still want this.
scottmcm: +1.
Question: The purpose of `ManuallyDrop`? (resolved)
yosh: Wait, is that what `ManuallyDrop` was specifically designed for? I'm a little confused about the motivating example here I guess?
Ralf: I do think that calling `drop` on a `ManuallyDrop<Box<T>>`, and then moving that variable, was meant to be allowed, yes.
yosh: Oh ok, it's to show that merely observing the bytes by another function after dropping is UB. Which should be fine since none of this becomes part of the public API. I had to go through the `ManuallyDrop` docs to fully get that.
Ralf: It's not just "observing in another function", a mere "let x = y;" will trip Miri.
Question: Consuming values as in Example 2
scott: That example is calling a by-`self` method, so if I'm understanding properly the value has already been consumed before the "time passes here" part? Can we do something to say that already-moved-out-of variables never need to meet these rules? (I guess that gets into the whole "what does moving even mean" question, and things like whether pointers to moved-from values can be read from to get bitvalid-but-maybe-not-safe values…)
Ralf: For references we have `dereferenceable noalias` that apply throughout the entire function. If we want to say that a mutable reference that was moved out no longer has such restrictions, we'll have to stop emitting at least `dereferenceable` to LLVM. (`Box` is not a problem for this example.)
scott: Ah, because LLVM doesn't have it for scopes, I see. Thanks.
Question: Is the recursive descent based on pre-monomorphized type or post-monomorphization?
pnkfelix: I'm not sure I understand all the implications of "directly inside it" in the following:
In part because I'm not sure whether I can read the `T` there as an uninstantiated type-parameter, or as a place-holder for a concrete type expression that may have its own substructure.
pnkfelix: For example: what about `MaybeDangling<Box<Vec<&mut i32>>>`? Does the `Vec` there act just like a `Box` since it owns its contents? Or do `Box` and `Vec` differ here in some way that I'm not understanding yet?
scottmcm: Based on this paragraph, I interpret it as the monomorphized type, like when you use `UnsafeCell<T>`:
Ralf: Yes everything in opsem is post-monomorphization. For the example, `T` is universally quantified, i.e. the statement holds for any choice of `T`.
pnkfelix: Is it correct that under this, `Box<T>` is special, but for `Vec<T>`, this specialness does not descend into the elements of the `Vec`?
RalfJ: Yes, that's correct. But this gets into another question, which is what are the aliasing requirements on `Vec`.
pnkfelix: I'd like it if people could predict how deep the recursion goes.
RalfJ: We don't recurse into pointer indirections. Whether we do that is orthogonal to this RFC.
RalfJ: Here, when we hit a `MaybeDangling` we stop recursing.
RalfJ: In this document, "directly" means "not through a pointer indirection".
waffle: One way to think about it is that we flatten the types. We have a list of fields, then we turn those into boxes and references.
RalfJ: Aliasing and dereferenceable don't apply to indirect elements, i.e. things behind pointers.
RalfJ: MaybeUninit is a crude workaround for the lack of MaybeDangling right now because it too stops the recursion, but it does other things.
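The `MaybeUninit` workaround mentioned here can be sketched for the owning-reference pattern of Example 3. The type `OwnedSuffix` and the helper `pass_through` are hypothetical names for this illustration. The box is stored inside `MaybeUninit`, where (per the remark above) the traversal looking for boxes and references stops, and a raw `NonNull` into the same heap allocation sits next to it. This is a sketch under that assumption and has not been checked with Miri.

```rust
use std::mem::MaybeUninit;
use std::ptr::NonNull;

/// Hypothetical owning-reference type (illustration only): a buffer plus
/// a pointer into that same buffer.
struct OwnedSuffix {
    /// Wrapped so that moving `OwnedSuffix` does not treat the box as a
    /// unique pointer (the workaround discussed in the text).
    buffer: MaybeUninit<Box<[u8]>>,
    /// Points into the heap allocation owned by `buffer`.
    ref_: NonNull<[u8]>,
}

impl OwnedSuffix {
    /// Takes ownership of `data` and remembers the part after `skip` bytes.
    /// Panics if `skip` is larger than the buffer.
    fn new(data: Box<[u8]>, skip: usize) -> Self {
        let buffer = MaybeUninit::new(data);
        // SAFETY: `buffer` was initialized on the previous line.
        let whole: &[u8] = unsafe { &**buffer.assume_init_ref() };
        let ref_ = NonNull::from(&whole[skip..]);
        OwnedSuffix { buffer, ref_ }
    }

    fn get(&self) -> &[u8] {
        // SAFETY: the allocation lives until `drop` and is never mutated
        // or accessed through `buffer` in between.
        unsafe { self.ref_.as_ref() }
    }
}

impl Drop for OwnedSuffix {
    fn drop(&mut self) {
        // SAFETY: initialized in `new` and dropped only here. This manual
        // step is part of what makes the workaround crude.
        unsafe { self.buffer.assume_init_drop() }
    }
}

/// Moving the value through a call is the situation Example 3 worries about.
fn pass_through(x: OwnedSuffix) -> OwnedSuffix {
    x
}

fn main() {
    let owned = OwnedSuffix::new(vec![1u8, 2, 3, 4].into_boxed_slice(), 1);
    let moved = pass_through(owned);
    assert_eq!(moved.get(), &[2u8, 3, 4][..]);
}
```

The "other things" `MaybeUninit` does show up in the code: every access needs `unsafe`, and the destructor has to be written by hand, which a dedicated `MaybeDangling` type would avoid.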
Question: Should `MaybeDangling` implement `Deref[Mut]`?
waffle: wouldn't we have the same benefits by just implementing `Deref<Target = T>` & `DerefMut` for `MaybeDangling<T>`? (similarly to how we do with `ManuallyDrop`)
Ralf: No, that would lose some `Box` magic, such as partial moves.
waffle: Is this magic important? Deref and DerefMut provide us most of the benefits that we would get with an attribute.
RalfJ: Porting from Box to MaybeDangling of Box could be tricky because of this magic.
waffle: Can we implement the same magic for MaybeDangling that we do for Box?
RalfJ: There's probably not much desire for more magic like Box.
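The "`Box` magic" at stake can be shown in a few lines. The sketch below contrasts a plain `Box`, whose fields can be moved out one at a time, with a generic wrapper that only offers `Deref`. The names `Pair`, `Wrapper`, `split_boxed` and `split_wrapped` are hypothetical and stand in for user code; `Wrapper` is an ordinary newtype, not the proposed `MaybeDangling`.

```rust
use std::ops::Deref;

struct Pair {
    first: String,
    second: String,
}

/// Hypothetical wrapper that exposes its contents only through `Deref`.
struct Wrapper<T>(T);

impl<T> Wrapper<T> {
    fn into_inner(self) -> T {
        self.0
    }
}

impl<T> Deref for Wrapper<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

/// With a bare `Box`, fields can be moved out individually; this is
/// built into the compiler and not available to user-defined `Deref`.
fn split_boxed(boxed: Box<Pair>) -> (String, String) {
    let first = boxed.first;
    let second = boxed.second;
    (first, second)
}

/// Behind a `Deref`-only wrapper the same code is rejected:
/// `let first = wrapped.first;` fails with E0507 (cannot move out of a
/// dereference). The box has to be unwrapped explicitly first.
fn split_wrapped(wrapped: Wrapper<Box<Pair>>) -> (String, String) {
    let boxed = wrapped.into_inner();
    let Pair { first, second } = *boxed;
    (first, second)
}

fn main() {
    let make = || Pair { first: "a".to_string(), second: "b".to_string() };
    assert_eq!(split_boxed(Box::new(make())), ("a".to_string(), "b".to_string()));
    assert_eq!(split_wrapped(Wrapper(Box::new(make()))), ("a".to_string(), "b".to_string()));
}
```

This is the porting friction RalfJ refers to: code that relied on moving out of a `Box` field by field would need an explicit unwrapping step once the box sits behind a `Deref`-based wrapper.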
Question: MaybeDangling vs #[maybe_dangling]
TC: RalfJ brings up in the document that this may be better as an attribute. This was discussed during a T-opsem meeting and the feeling there was that an attribute had some advantages. What do we think?
TC: The main point in favor of the attribute is that, since we have a number of these marker types, the order in which they are layered becomes somewhat arbitrary. These properties are actually commutative, but of course the type system doesn't know that, and it cares about the order in which these are composed.
RalfJ: Mario suggested this. Those were the arguments.
…
RalfJ: We'd still have the MaybeDangling type, but we'd also have the attribute.
waffle: Would an attribute have the same problems as the types we currently have in terms of being… Currently there are a lot of wrapper types. If we add attributes for all of them, maybe that's the same problem.
RalfJ: If you have two wrapper types, you have to layer them in some order. But with attributes, you just add the attributes.
scottmcm: If we're going to have the type anyway, we can always choose later whether to expose an attribute on a type for it. So I think if we're sure we need the type regardless, then we can punt the attribute question to later after we have stable experience with the type.
pnkfelix: If we did an attribute, we should consider adding such attributes for the others.
TC: If we had the attribute, why would we have the type?
RalfJ: It's for example 2. We wouldn't actually have to have the type, but then you'd have to define it for example 2.
waffle: How would this affect learning?
RalfJ: The documentation tooling is better for types, but maybe we shouldn't design the language around the current documentation tooling.
(Discussion about how the documentation tools could better handle this.)
waffle: Maybe I was asking more about how we teach this?
RalfJ: This is maybe orthogonal to the MaybeDangling semantic question.
Question: What other squares are we missing on this matrix?
TC: As RalfJ brings up in the document, we have a number of these marker types. It'd be nice if we could ensure these are orthogonal. What does the full matrix look like (including the others under consideration), and what squares are we possibly missing?
RalfJ: NoAlias and Dereferenceable are themselves not orthogonal, so that's one thing. Dimensions:
E.g., Box only gets dereferenceable at entry.
TC: We should also write out a matrix of what we have covered with all of `MaybeDangling`, `UnsafeAliased`, `ManuallyDrop`, `MaybeUninit`, `UnsafeCell`, etc.
Question: What does the first unresolved question mean?
waffle: I'm not sure what this means:
The highlighted parts seem to contradict each other.
Ralf: That was indeed not worded properly. The validity requirement is relaxed, the safety requirement is not.
Question: Missed optimizations on `ManuallyDrop`
TC: The document notes:
Do we have any options that let users keep these optimizations when wanted? I.e., is there any way to make this more orthogonal?
RalfJ: If ManuallyDrop were an attribute, that would do it.
Question: What would the API look like?
TC: The document doesn't propose any API for this type. RalfJ, what do you have in mind?
RalfJ: It'd be mostly the same as `ManuallyDrop`, but without the unsafe drop.
Question: How are we feeling?
TC: How are we feeling about this?
scottmcm: It's well motivated. We should do it.
pnkfelix: Agreed. I'm wondering about the attribute alternative.
scottmcm: Maybe the type would be more OK because of Deref.
scottmcm: We can FCP the semantic. We could always rename it or do the attribute.
scottmcm: I'll propose FCP merge.
(The meeting ended here.)