
Unsafe Extern Blocks

Summary

In Edition 2024 it is unsafe to declare an extern function or static, but external functions and statics can be safe to use after the initial declaration.

Motivation

Simply declaring extern items, even without ever using them, can cause Undefined Behavior.
When performing cross-language compilation, attributes on one function declaration can flow to the foreign declaration elsewhere within LLVM and cause a miscompilation.
In Rust we consider all sources of Undefined Behavior to be unsafe, and so we must make declaring extern blocks unsafe.
The upside of this change is that in the new style it will be possible to declare an extern fn that's safe to call after the initial unsafe declaration.

Guide-level explanation

Rust can utilize functions and statics from foreign code that are provided during linking, though it is unsafe to do so.

An extern block can be placed anywhere a function declaration could appear (generally at the top level of a module), and must always be prefixed with the keyword unsafe.

Within the block you can declare the external functions and statics that you want to make visible within the current scope.
Each function declaration gives only the function's signature, similar to how methods for traits are declared.
If calling a foreign function is unsafe then you must declare the function as unsafe fn, otherwise you can declare it as a normal fn.
Each static declaration gives the name and type, but no initial value.

  • If the unsafe_code lint is denied or forbidden at a particular scope it will cause the unsafe extern block to be a compilation error within that scope.
  • Declaring an incorrect external item signature can cause Undefined Behavior during compilation, even if Rust never accesses the item.
unsafe extern {
    // sqrt (from libm) can be called with any `f64`
    pub fn sqrt(x: f64) -> f64;

    // strlen (from libc) requires a valid pointer,
    // so we mark it as being an unsafe fn
    pub unsafe fn strlen(p: *const c_char) -> usize;

    pub static IMPORTANT_BYTES: [u8; 256];

    pub static LINES: SyncUnsafeCell<i32>;
}

Note: other rules for extern blocks, such as optionally including an ABI, are unchanged from previous editions, so those parts of the guide would remain.

Reference-level explanation

This adjusts the grammar of the language to require the unsafe keyword before an extern block declaration (currently it's optional and syntactically allowed but semantically rejected).

Replace the Functions and Statics sections with the following:

Functions

Functions within external blocks are declared in the same way as other Rust functions, with the exception that they must not have a body and are instead terminated by a semicolon. Patterns are not allowed in parameters; only IDENTIFIER or _ may be used. The function qualifiers const, async, and extern are not allowed. If the function is unsafe to call, then the function must use the unsafe qualifier.

If the function signature declared in Rust is incompatible with the function signature as declared in the foreign code it is Undefined Behavior.

Functions within external blocks may be called by Rust code, just like functions defined in Rust. The Rust compiler will automatically use the correct foreign ABI when making the call.

When coerced to a function pointer, a function declared in an extern block has type

extern "abi" for<'l1, ..., 'lm> fn(A1, ..., An) -> R

where 'l1, ..., 'lm are its lifetime parameters, A1, ..., An are the declared types of its parameters, and R is the declared return type.
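As a rough illustration of the coercion rule above, the sketch below declares two foreign functions and coerces each one to a function pointer. It assumes a toolchain that accepts unsafe extern blocks together with the safe item qualifier discussed in the meeting minutes of this note (Rust 1.82 or later), and that sqrt and strlen are provided by the platform's C library at link time. The helper names sqrt_via_pointer and strlen_via_pointer are hypothetical and exist only for this example.

```rust
use std::ffi::{c_char, CStr};

// Assumption: the toolchain accepts `unsafe extern` and `safe fn` (Rust 1.82+).
unsafe extern "C" {
    // Declared safe to call, so its pointer type carries no `unsafe`.
    pub safe fn sqrt(x: f64) -> f64;

    // Declared unsafe to call, so its pointer type is `unsafe extern "C" fn`.
    pub unsafe fn strlen(p: *const c_char) -> usize;
}

/// Hypothetical helper: coerces `sqrt` to a safe function pointer and calls it.
pub fn sqrt_via_pointer(x: f64) -> f64 {
    let f: extern "C" fn(f64) -> f64 = sqrt;
    f(x)
}

/// Hypothetical helper: coerces `strlen` to an unsafe function pointer.
pub fn strlen_via_pointer(s: &CStr) -> usize {
    let f: unsafe extern "C" fn(*const c_char) -> usize = strlen;
    // SAFETY: `CStr` guarantees a valid, NUL-terminated pointer.
    unsafe { f(s.as_ptr()) }
}
```

Neither function has lifetime parameters, so the for<...> binder is empty in both pointer types. The compiler selects the "C" calling convention for the indirect calls, just as it does for direct ones.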

Statics

Statics within external blocks are declared in the same way as statics outside of external blocks, except that they do not have an expression initializing their value. It is unsafe to declare a static item in an extern block, whether or not it's mutable, because there is nothing guaranteeing that the bit pattern at the static's memory is valid for the type it is declared with.

Extern statics can be either immutable or mutable just like statics outside of external blocks. An immutable static must be initialized before any Rust code is executed. It is not enough for the static to be initialized before Rust code reads from it. A mutable extern static is unsafe to access, the same as a Rust mutable static.

Drawbacks

  • It is very unfortunate to have to essentially reverse the status quo.
    • Hopefully, allowing people to safely call some foreign functions will make up for the churn caused by this change.

Rationale and alternatives

Incorrect extern declarations can cause UB in current Rust, but we have no way to automatically check that all declarations are correct, nor is such a thing likely to be developed. Making the declarations unsafe so that programmers are aware of the dangers and can give extern blocks the attention they deserve is the minimum step.

Prior art

None we are aware of.

Unresolved questions

  • Extern declarations are actually always unsafe and able to cause UB regardless of edition. This RFC doesn't have a specific answer on how to improve pre-2024 code.

Future possibilities

None are apparent at this time.


Design meeting minutes

Attendance: TC, tmandry, scottmcm, pnkfelix, eholk, Urgau

Minutes, driver: TC

Question: bit-patterns of statics

scottmcm: the doc mentions that one thing making statics have a proof obligation on declaration (man I want the unsafe-vs-hold_my_beer keywords distinction for these conversations) is that the bit pattern might not match. Would it be worth having an internal CanBeUndef autotrait à la Freeze that would let things happen? Or is that useless because there's still a proof obligation that something else has defined the static, so extern statics are fundamentally in need of a proof obligation anyway?

Question: Is the premise unchangeable?

tmandry: Ralf said in this comment that we could work towards changing the fact that simply declaring an extern fn can cause UB. Is that possible when we need C FFI?

TC: RalfJ:

Simply declaring extern items, even without ever using them, can cause Undefined Behavior.

FWIW this is not a fact of nature, it's an LLVM thing inherited from C. We could also try to work towards fixing that.

tmandry: It'd be nice for the RFC to mention how an extern by itself can cause UB.

scottmcm:

pnkfelix: But Ralf's point is that if you never call the function it shouldn't cause UB.

pnkfelix: My viewpoint is that LLVM does this and we're stuck with it.

scottmcm: If we can fix 4-5 reasons why it's unsafe but we still have 1, it's still unsafe.

(Discussion of what's in the realm of the feasible.)

Comment: cargo fix

TC: In earlier discussions with Lokathor, the belief was that this is trivially cargo fix-able, though that is not yet mentioned in the RFC.

scottmcm: The code for the fix should be simple, but the flow of actually doing the transition needs to be considered carefully.

scottmcm: If just an unsafe gets added to the extern, that would change people's code.

TC: cargo fix would add unsafe to both.

scottmcm: Not everyone will use cargo fix unfortunately. It'd be easy for them to miss.

pnkfelix: Even going forward, most of these will be unsafe. Maybe this should take the same number of characters.

Question: Some kind of opt-in

pnkfelix: The ability to express "this extern function is entirely safe to call" does seem somewhat useful. (I assume the only way to achieve it today is via a safe wrapper marked #[inline(always)].) But the text of this RFC is perhaps making it too easy, given that I assume the vast majority of extern functions are unsafe to call.
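For reference, a minimal sketch of the safe-wrapper workaround pnkfelix mentions might look like the following. The module name ffi and the wrapper names safe_sqrt and c_str_len are hypothetical. The declarations use unsafe extern with explicit unsafe fn so that the sketch compiles under every edition on Rust 1.82 or later; on older toolchains a plain extern "C" block with unqualified fn items has the same meaning.

```rust
use std::ffi::CStr;

// Raw declarations: every item here is unsafe to call.
mod ffi {
    use std::ffi::c_char;

    unsafe extern "C" {
        pub unsafe fn sqrt(x: f64) -> f64;
        pub unsafe fn strlen(p: *const c_char) -> usize;
    }
}

/// Safe wrapper: `sqrt` has no preconditions on its argument.
#[inline(always)]
pub fn safe_sqrt(x: f64) -> f64 {
    // SAFETY: libm's `sqrt` accepts any `f64` value.
    unsafe { ffi::sqrt(x) }
}

/// Safe wrapper: the `&CStr` parameter type enforces `strlen`'s precondition.
#[inline(always)]
pub fn c_str_len(s: &CStr) -> usize {
    // SAFETY: `CStr` guarantees a valid, NUL-terminated pointer.
    unsafe { ffi::strlen(s.as_ptr()) }
}
```

Each wrapper exists only to discharge the unsafe call, which is the boilerplate that a way of marking the extern item itself as safe would remove.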

pnkfelix: Thus, I wonder whether we should still require some kind of opt-in, e.g. an attribute #[safe] that has to appear in front of every fn in an extern block that is not already marked unsafe.

pnkfelix: In the interest of being concrete, this is what I am suggesting would be required. (I'm not clear on how TC's #[allow(safe_extern_fn)] extends into this example.)

unsafe extern {
    // sqrt (from libm) can be called with any `f64`
    #[safe] pub fn sqrt(x: f64) -> f64;

    // strlen (from libc) requires a valid pointer,
    // so we mark it as being an unsafe fn
    pub unsafe fn strlen(p: *const c_char) -> usize;

    pub static IMPORTANT_BYTES: [u8; 256];

    pub static LINES: SyncUnsafeCell<i32>;
}

TC: We could use a real lint for this as we did with refine for RPITIT. So #[allow(safe_extern_fn)] or whatnot. E.g.:

unsafe extern {
    // sqrt (from libm) can be called with any `f64`
    #[allow(safe_extern_fn)]
    pub fn sqrt(x: f64) -> f64;

    // strlen (from libc) requires a valid pointer,
    // so we mark it as being an unsafe fn
    pub unsafe fn strlen(p: *const c_char) -> usize;

    pub static IMPORTANT_BYTES: [u8; 256];

    pub static LINES: SyncUnsafeCell<i32>;
}

pnkfelix: I suppose at that point it is a question of whether we try to enforce this attribute be used on each fn item, or if we accept cases where people apply the allow at higher levels of scope.

scottmcm: We could use a safe contextual keyword. We could allow unsafe on old editions also. This could allow a multistep transition.

tmandry: There's a consistency argument for having a keyword since we already have unsafe.

pnkfelix: Do we have data on how many extern functions would be considered safe?

tmandry: We need a strategy to prevent people from accidentally declaring things safe.

TC: One nice thing about a lint is that it gives us data.

TC: For someone learning Rust they might expect to be able to use safe on any fn.

scottmcm: People have wanted this for expressions. Having a safe { .. } within an unsafe { .. } to opt back into safe mode.

pnkfelix: If we treat safe as switching back into the safe mode, then do we need to write unsafe in front of each function that's unsafe?

scottmcm: I agree we don't need it, but it might be nice to allow it.

TC: rustfmt is going to want to normalize this. So we'd actually just be kicking the question to T-style.

tmandry: This makes the migration easier.

scottmcm: This means we could largely do this independent of the edition.

TC: The proposal on the table is that in the earlier editions, unsafe becomes optional in front of the extern. In 2024, unsafe becomes required before the extern. In all editions, within an unsafe extern, safe becomes legal before all items, and in all editions items within an unsafe extern marked safe are OK to use in safe code. In all editions, within an unsafe extern, unsafe becomes legal but optional before all items.

Question: Accidental safe fn

tmandry: Should we be worried about forgetting unsafe in front of individual fn definitions?

scottmcm: I think this is the normal state of things, so might make most sense in context of the "how do we do the transition" conversation. The idea of having safe fn for a while, as part of making the transition more visible, has come up before.

eholk: I was concerned about this too. I suspect in practice people who are writing unsafe extern {} blocks are doing this in crates that just bind a single non-Rust library. They are probably paying more attention to the safety requirements of the library, since that is the whole point of the crate.

Question: Punning of extern

tmandry: Pre-existing, but it's a little confusing that extern has these two roles, one for saying that we're declaring an item that's defined externally, and one for declaring the ABI of a function.

Question: Assuming contextual safe fn, do we need unsafe fn within unsafe extern { ... }

pnkfelix: (Just trying to capture discussion we had)

Question: What about statics?

scottmcm: does anyone want unsafe static ever? (Distinct from static mut's unsafe)

Proposal summary

(scottmcm's attempt at recording the) Proposal on the table:

  • On all editions, the grammar for declarations inside extern { allows safe & unsafe.
    • This applies to statics and fns
    • These are optional (well, this is a point of contention) in all editions, and things default to unsafe
  • On all editions, you can write unsafe extern { for better clarity.
  • We lint on unannotated extern in ≤2021 editions, suggesting it change to unsafe extern; in 2024 it's required to use unsafe extern.
  • You can only use safe and unsafe inside unsafe extern; it's a semantic error to use them in unannotated extern.
    • MSRV consequences for unsafe extern and safe fn are the same; there's no reason to use unsafe fn and safe fn in an extern block without it being unsafe extern.
    • Ideally there's a nice error saying "change the block to unsafe extern and you can do this" (or similar)
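A minimal sketch of how the proposal above might read in practice, assuming a toolchain that accepts unsafe extern with the safe and unsafe item qualifiers (Rust 1.82 or later) and that sqrt, strlen, and abs come from the platform's C library. The helper names hypotenuse, c_len, and magnitude are hypothetical.

```rust
use std::ffi::{c_char, c_int, CStr};

unsafe extern "C" {
    // `safe`: the author of the block vouches that any call is sound.
    pub safe fn sqrt(x: f64) -> f64;

    // Explicit `unsafe`: the caller must uphold the pointer requirements.
    pub unsafe fn strlen(p: *const c_char) -> usize;

    // Unannotated: defaults to unsafe under the proposal.
    pub fn abs(x: c_int) -> c_int;
}

/// Calls the `safe fn` with no `unsafe` block.
pub fn hypotenuse(a: f64, b: f64) -> f64 {
    sqrt(a * a + b * b)
}

/// Calls the explicitly unsafe item.
pub fn c_len(s: &CStr) -> usize {
    // SAFETY: `CStr` guarantees a valid, NUL-terminated pointer.
    unsafe { strlen(s.as_ptr()) }
}

/// Calls the unannotated item, which still requires `unsafe`.
pub fn magnitude(x: c_int) -> c_int {
    assert!(x != c_int::MIN, "C's abs is undefined for the minimum value");
    // SAFETY: the assertion above rules out the one undefined input.
    unsafe { abs(x) }
}
```

Only sqrt is callable from safe code. Dropping the safe qualifier would turn the call in hypotenuse into a compile error rather than silently changing behavior, which speaks to the worry about accidentally declaring things safe.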

Goals:

  • We agree that it should be possible for people to have safe-to-use items in externs without needing separate wrappers (both fns and statics).
  • Low churn on 2024 edition migration: even if done manually, it's just the one s/extern {/unsafe extern {/ for the whole block.
  • Enabling safe fn/unsafe fn for people who want it, regardless of edition.
    • Even if unsafe is the default, it's good to make the unsafe fn part clear in the declaration so that you don't need to notice the context in which it's found.
  • Avoiding worries about accidentally making things in extern safe to call.
  • Leave space available to be stricter in future editions, should experience suggest that's the best way forward.
    • For example, we might start requiring unsafe fn on everything unsafe-to-call, so that an edition after that could tweak defaults in various ways.

Example:

This compiles:

#[cfg(FALSE)]
extern {
    const fn foo();
}

because it's parsable

even though this is a hard error:

extern {
    const fn foo();
}

So we can do the same thing for safe and unsafe in there.

Example 2:

This parses, but is a semantic error:

trait Foo {
    pub fn foo();
}

This compiles fine:

#[cfg(FALSE)]
trait Foo {
    pub fn foo();
}