Design meeting 2023-10-11: Unsafe Extern Blocks RFC read

---
title: "Design meeting 2023-10-11: Unsafe Extern Blocks RFC read"
date: 2023-10-11
tags: T-lang, design-meeting, minutes
discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Design.20meeting.202023-10-11
url: https://hackmd.io/tQW0eMJkSrma8eN6yTBhJA
---

# Unsafe Extern Blocks

- Feature Name: `unsafe_extern`
- Start Date: 2023-05-23
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

In Edition 2024 it is `unsafe` to declare an `extern` function or static, but external functions and statics *can* be safe to use after the initial declaration.

# Motivation
[motivation]: #motivation

Simply declaring extern items, even without ever using them, can cause Undefined Behavior. When performing cross-language compilation, attributes on one function declaration can flow to the foreign declaration elsewhere within LLVM and cause a miscompilation. In Rust we consider all sources of Undefined Behavior to be `unsafe`, and so we must make declaring extern blocks be `unsafe`. The up-side to this change is that in the new style it will be possible to declare an extern fn that's safe to call after the initial unsafe declaration.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Rust can utilize functions and statics from foreign code that are provided during linking, though it is `unsafe` to do so. An `extern` block can be placed anywhere a function declaration could appear (generally at the top level of a module), and must always be prefixed with the keyword `unsafe`. Within the block you can declare the external functions and statics that you want to make visible within the current scope. Each function declaration gives only the function's signature, similar to how methods for traits are declared.
If calling a foreign function is `unsafe` then you must declare the function as `unsafe fn`, otherwise you can declare it as a normal `fn`. Each static declaration gives the name and type, but no initial value.

* If the `unsafe_code` lint is denied or forbidden at a particular scope it will cause the `unsafe extern` block to be a compilation error within that scope.
* Declaring an incorrect external item signature can cause Undefined Behavior during compilation, even if Rust never accesses the item.

```rust
unsafe extern {
    // sqrt (from libm) can be called with any `f64`
    pub fn sqrt(x: f64) -> f64;

    // strlen (from libc) requires a valid pointer,
    // so we mark it as being an unsafe fn
    pub unsafe fn strlen(p: *const c_char) -> usize;

    pub static IMPORTANT_BYTES: [u8; 256];

    pub static LINES: SyncUnsafeCell<i32>;
}
```

Note: other rules for extern blocks, such as optionally including an ABI, are unchanged from previous editions, so those parts of the guide would remain.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

This adjusts the grammar of the language to *require* the `unsafe` keyword before an `extern` block declaration (currently it's optional and syntactically allowed but semantically rejected).

Replace the *Functions* and *Statics* sections with the following:

### Functions

Functions within external blocks are declared in the same way as other Rust functions, with the exception that they must not have a body and are instead terminated by a semicolon. Patterns are not allowed in parameters, only IDENTIFIER or `_` may be used. The function qualifiers `const`, `async`, and `extern` are not allowed. If the function is unsafe to call, then the function must use the `unsafe` qualifier.

If the function signature declared in Rust is incompatible with the function signature as declared in the foreign code it is Undefined Behavior.

Functions within external blocks may be called by Rust code, just like functions defined in Rust.
The Rust compiler will automatically use the correct foreign ABI when making the call.

When coerced to a function pointer, a function declared in an extern block has type

```rust
extern "abi" for<'l1, ..., 'lm> fn(A1, ..., An) -> R
```

where `'l1`, ... `'lm` are its lifetime parameters, `A1`, ..., `An` are the declared types of its parameters and `R` is the declared return type.

### Statics

Statics within external blocks are declared in the same way as statics outside of external blocks, except that they do not have an expression initializing their value. It is unsafe to declare a static item in an extern block, whether or not it's mutable, because there is nothing guaranteeing that the bit pattern at the static's memory is valid for the type it is declared with.

Extern statics can be either immutable or mutable just like statics outside of external blocks. An immutable static must be initialized before any Rust code is executed. It is not enough for the static to be initialized before Rust code reads from it. A mutable extern static is unsafe to access, the same as a Rust mutable static.

# Drawbacks
[drawbacks]: #drawbacks

* It is very unfortunate to have to essentially reverse the status quo.
* Hopefully, allowing people to safely call some foreign functions will make up for the churn caused by this change.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Incorrect extern declarations can cause UB in current Rust, but we have no way to automatically check that all declarations are correct, nor is such a thing likely to be developed. Making the declarations `unsafe` so that programmers are aware of the dangers and can give extern blocks the attention they deserve is the minimum step.

# Prior art
[prior-art]: #prior-art

None we are aware of.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

* Extern declarations are actually *always* unsafe and able to cause UB regardless of edition.
  This RFC doesn't have a specific answer on how to improve pre-2024 code.

# Future possibilities
[future-possibilities]: #future-possibilities

None are apparent at this time.

---

# Design meeting minutes

Attendance: TC, tmandry, scottmcm, pnkfelix, eholk, Urgau

Minutes, driver: TC

## Question: bit-patterns of statics

scottmcm: the doc mentions that one thing making statics have a proof obligation on declaration (man I want the `unsafe`-vs-`hold_my_beer` keywords distinction for these conversations) is that the bit pattern might not match. Would it be worth having an internal `CanBeUndef` autotrait à la `Freeze` that would let things happen? Or is that useless because there's still a proof obligation that something else has defined the static, so extern statics are fundamentally in need of a proof obligation anyway?

## Question: Is the premise unchangeable?

tmandry: Ralf said in [this comment](https://github.com/rust-lang/lang-team/issues/223#issuecomment-1747396734) that we could work towards changing the fact that simply declaring an extern fn can cause UB. Is that possible when we need C FFI?

TC: RalfJ:

> > Simply declaring extern items, even without ever using them, can cause Undefined Behavior.
>
> FWIW this is not a fact of nature, it's an LLVM thing inherited from C. We could also try to work towards fixing that.

tmandry: It'd be nice for the RFC to mention how an `extern` by itself can cause UB.

scottmcm: ...

pnkfelix: But Ralf's point is that if you never call the function it shouldn't cause UB.

pnkfelix: My viewpoint is that LLVM does this and we're stuck with it.

scottmcm: If we can fix 4-5 reasons why it's unsafe but we still have 1, it's still unsafe.

(Discussion of what's in the realm of the feasible.)

## Comment: `cargo fix`

TC: In earlier discussions with Lokathor, the belief was that this is trivially `cargo fix`-able, though that is not yet mentioned in the RFC.

scottmcm: The code for the fix should be simple, but the flow of actually doing the transition needs to be considered carefully.

scottmcm: If just an `unsafe` gets added to the `extern`, that would change people's code.

TC: `cargo fix` would add `unsafe` to both.

scottmcm: Not everyone will use `cargo fix` unfortunately. It'd be easy for them to miss.

pnkfelix: Even going forward, most of these will be `unsafe`. Maybe this should take the same number of characters.

## Question: Some kind of opt-in

pnkfelix: The ability to *express* "this extern function is entirely safe to call" does seem somewhat useful. (I assume the only way to achieve it today is via a safe wrapper marked `#[inline(always)]`.) But the text of this RFC is perhaps making it too easy, given that I assume the vast majority of extern functions **are** unsafe to call.

pnkfelix: Thus, I wonder whether we should still require some kind of opt-in, e.g. an attribute `#[safe]` that has to appear in front of every `fn` in an `extern` block that is not already marked `unsafe`.

pnkfelix: In the interest of being concrete, this is what I am suggesting would be required. ~~(I'm not clear on how TC's `#[allow(safe_extern_fn)]` extends into this example.)~~

```rust
unsafe extern {
    // sqrt (from libm) can be called with any `f64`
    #[safe]
    pub fn sqrt(x: f64) -> f64;

    // strlen (from libc) requires a valid pointer,
    // so we mark it as being an unsafe fn
    pub unsafe fn strlen(p: *const c_char) -> usize;

    pub static IMPORTANT_BYTES: [u8; 256];

    pub static LINES: SyncUnsafeCell<i32>;
}
```

TC: We could use a real lint for this as we did with `refine` for RPITIT. So `#[allow(safe_extern_fn)]` or whatnot.
E.g.:

```rust
unsafe extern {
    // sqrt (from libm) can be called with any `f64`
    #[allow(safe_extern_fn)]
    pub fn sqrt(x: f64) -> f64;

    // strlen (from libc) requires a valid pointer,
    // so we mark it as being an unsafe fn
    pub unsafe fn strlen(p: *const c_char) -> usize;

    pub static IMPORTANT_BYTES: [u8; 256];

    pub static LINES: SyncUnsafeCell<i32>;
}
```

pnkfelix: I suppose at that point it is a question of whether we try to enforce this attribute be used on each `fn` item, or if we accept cases where people apply the `allow` at higher levels of scope.

scottmcm: We could use a `safe` contextual keyword. We could allow `unsafe` on old editions also. This could allow a multistep transition.

tmandry: There's a consistency argument for having a keyword since we already have `unsafe`.

pnkfelix: Do we have data on how many extern functions would be considered safe?

tmandry: We need a strategy to prevent people from accidentally declaring things safe.

TC: One nice thing about a lint is that it gives us data.

TC: For someone learning Rust they might expect to be able to use `safe` on any fn.

scottmcm: People have wanted this for expressions. Having a `safe { .. }` within an `unsafe { .. }` to opt back into safe mode.

pnkfelix: If we treat `safe` as switching back into the `safe` mode, then do we need to write `unsafe` in front of each function that's unsafe?

scottmcm: I agree we don't need it, but it might be nice to allow it.

TC: rustfmt is going to want to normalize this. So we'd actually just be kicking the question to T-style.

tmandry: This makes the migration easier.

scottmcm: This means we could largely do this independent of the edition.

TC: The proposal on the table is that in the earlier editions, `unsafe` becomes optional in front of the `extern`. In 2024, `unsafe` becomes required before the `extern`.
In all editions, within an `unsafe extern`, `safe` becomes legal before all items, and in all editions items within an `unsafe extern` marked `safe` are OK to use in safe code. In all editions, within an `unsafe extern`, `unsafe` becomes legal but optional before all items.

## Question: Accidental safe fn

tmandry: Should we be worried about forgetting `unsafe` in front of individual fn definitions?

scottmcm: I think this is the normal state of things, so might make most sense in context of the "how do we do the transition" conversation. The idea of having `safe fn` for a while, as part of making the transition more visible, has come up before.

eholk: I was concerned about this too. I suspect in practice people who are writing `unsafe extern {}` blocks are doing this in crates that just bind a single non-Rust library. They are probably paying more attention to the safety requirements of the library, since that is the whole point of the crate.

## Question: Punning of `extern`

tmandry: Pre-existing, but it's a little confusing that `extern` has these two roles, one for saying that we're declaring an item that's defined externally, and one for declaring the ABI of a function.

## Question: Assuming contextual `safe fn`, do we *need* `unsafe fn` within `unsafe extern { ... }`

pnkfelix: (Just trying to capture discussion we had)

## Question: What about statics?

scottmcm: does anyone want `unsafe static` ever? (Distinct from `static mut`'s unsafe)

## Proposal summary

(scottmcm's attempt at recording the) Proposal on the table:

- On all editions, the grammar for declarations inside `extern {` allows `safe` & `unsafe`.
    - This applies to `static`s and `fn`s
    - These are *optional* (well, this is a point of contention) in all editions, and things default to `unsafe`
- On all editions, you can write `unsafe extern {` for better clarity.
    - We lint on unannotated `extern` in ≤2021 editions, suggesting it change to `unsafe extern`; in 2024 it's required to use `unsafe extern`.
- You can only use `safe` and `unsafe` inside `unsafe extern`; it's a semantic error to use them in unannotated `extern`.
    - MSRV consequences for `unsafe extern` and `safe fn` are the same; there's no reason to use `unsafe fn` and `safe fn` in an extern block without it being `unsafe extern`.
    - Ideally there's a nice error saying "change the block to `unsafe extern` and you can do this" (or similar)

Goals:

- We agree that it should be possible for people to have safe-to-use items in `extern`s without needing separate wrappers (both `fn`s and `static`s).
- Low churn on 2024 edition migration, even if done manually -- just the one s/`extern {`/`unsafe extern {`/ for the whole block.
- Enabling `safe fn`/`unsafe fn` for people who want it, regardless of edition.
    - Even if `unsafe` is the default, it's good to make the `unsafe fn` part clear in the declaration so that you don't need to notice the context in which it's found.
- Avoiding worries about accidentally making things in `extern` safe to call.
- Leave space available to be stricter in future editions, should experience suggest that's the best way forward.
    - For example, we might start requiring `unsafe fn` on everything `unsafe`-to-call, so that an edition after that could tweak defaults in various ways.

Example:

This compiles:

```rust
#[cfg(FALSE)]
extern {
    const fn foo();
}
```

...because it's parsable...

...even though this is a hard error:

```rust
extern {
    const fn foo();
}
```

So we can do the same thing for `safe` and `unsafe` in there.

Example 2:

This *parses*, but is a semantic error:

```rust
trait Foo {
    pub fn foo();
}
```

This compiles fine:

```rust
#[cfg(FALSE)]
trait Foo {
    pub fn foo();
}
```
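For illustration only, here is a minimal sketch of what the proposal summarized above might look like in use. It is not part of the RFC text. It assumes a toolchain that accepts `unsafe extern` blocks with the `safe` and `unsafe` item qualifiers (stable Rust 1.82 or later, to the best of our knowledge), a platform C library that provides `sqrt` and `strlen`, and an explicit `"C"` ABI. The helper names `hypotenuse` and `c_len` are hypothetical.

```rust
use std::ffi::{c_char, CStr};

// The block itself is `unsafe`: whoever writes it vouches that the
// signatures below match the foreign definitions.
unsafe extern "C" {
    // sqrt (from libm) can be called with any `f64`, so it is marked `safe`.
    pub safe fn sqrt(x: f64) -> f64;

    // strlen (from libc) requires a valid, NUL-terminated pointer,
    // so it stays `unsafe` to call.
    pub unsafe fn strlen(p: *const c_char) -> usize;
}

/// Hypothetical helper: calls the `safe` foreign function without an
/// `unsafe` block and without a separate wrapper.
pub fn hypotenuse(a: f64, b: f64) -> f64 {
    sqrt(a * a + b * b)
}

/// Hypothetical helper: a safe wrapper around the `unsafe` foreign function.
pub fn c_len(s: &CStr) -> usize {
    // SAFETY: `CStr` guarantees a valid pointer to a NUL-terminated string.
    unsafe { strlen(s.as_ptr()) }
}
```

Under these assumptions, only `strlen` needs an `unsafe` block at the call site, which matches the goal of having safe-to-use items in `extern`s without separate wrappers.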
