Matthew Maurer
# Fixing CFI VTables

Today, most operations that generate vtables will SIGILL with CFI enabled. Calling methods on a `&dyn Foo` works properly, but nearly everything else can cause trouble. This document explores a proposed fix so that trait objects, closures, etc. will work under CFI in Rust.

## Background (What is CFI?)

CFI, or control-flow integrity, is a class of mechanisms for hardening compiled code by enforcing that the runtime trace of the compiled program stays within the control flow graph of the source. It accomplishes this by restricting indirect jumps[^icall] to targets that analysis of the source program can't rule out.

### Kinds of CFI

To do this efficiently and precisely, indirect jumps are separated into different classes. Backwards-edge CFI refers specifically to protections for `ret` or equivalent instructions. Because these follow a stack discipline, there are a wide variety of specialized techniques that work better for this class of indirect jumps ([shadow call stack](https://clang.llvm.org/docs/ShadowCallStack.html)[^scs], [safe stack](https://clang.llvm.org/docs/SafeStack.html)[^safestack], [stack cookies](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fstack-protector)[^stackcookies], etc.).

The other major class is known as forwards-edge CFI. This covers calls through function pointers, `longjmp`, and calls that translate to vtables (e.g. calls through virtual functions in C++, or through a trait object in Rust). There are a variety of approaches here as well, such as [IBT/BTI](https://en.wikipedia.org/wiki/Indirect_branch_tracking)[^ibt], [FineIBT](https://dl.acm.org/doi/pdf/10.1145/3607199.3607219)[^fineibt], [LLVM CFI](https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html), and [KCFI](https://clang.llvm.org/docs/ControlFlowIntegrity.html#fsanitize-kcfi).
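The two forward-edge shapes relevant to Rust can be seen in a minimal sketch (ordinary safe Rust; the names are illustrative). Under a forward-edge scheme like LLVM CFI, both indirect calls below would be preceded by a check that the target belongs to the expected equivalence class:

```rust
// Two indirect-call shapes that forward-edge CFI instruments:
// a call through a function pointer, and a call through a vtable.

trait Greeter {
    fn greet(&self) -> String;
}

struct English;
impl Greeter for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    // Forward edge #1: indirect call through a function pointer.
    let f: fn(&str) -> String = shout;
    assert_eq!(f("hi"), "HI");

    // Forward edge #2: virtual call through a trait object's vtable.
    let g: &dyn Greeter = &English;
    assert_eq!(g.greet(), "hello");
}
```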
#### LLVM CFI

When you hear someone say they "enabled CFI" for something, this is almost always what they mean: LLVM's software-based forward-edge control flow protection, which determines its alias sets based on types. This system gives you [two tools](https://llvm.org/docs/TypeMetadata.html):

* Attach type metadata (an integer and a string) to a function declaration or global declaration.
* Test and branch on whether a given LLVM address belongs to a particular equivalence class.

You can attach multiple equivalence classes to a given address, but functions and globals must not share an equivalence class.

##### C++

In C++, the integer is used as a byte offset into the vtable, and the string as the Itanium-mangled version of the type. Functions and globals use an offset of 0. The offset acts as an additional disambiguator: a method that is not installed at a particular offset can't have been reached by loading from that offset, even if the type signature is otherwise compatible.

Clang offers a number of different hardenings based on these tools:

* `cfi-cast-strict` / `cfi-derived-cast` / `cfi-unrelated-cast` - Dynamically check based on type information whether a cast is legal, at various levels of strictness.
* `cfi-vcall` - Checks the type of the vtable before invoking a method on it to make sure it is a vtable known to have a method available to the provided receiver type.
* `cfi-nvcall` - Uses the type of the vtable to check that direct calls are being called on objects of the correct type.
* `cfi-icall` - Uses the type of a function pointer to check if it has the expected signature before calling.
* `cfi-mfcall` - Like `cfi-icall`, but with support for pointer-to-member-function.

##### Implementation + Optimization

LLVM implements the `llvm.type.test` intrinsic in two different ways depending on whether it is the type signature of a global or a function.
###### Globals

Typesigning globals is primarily used for vtables in C++, though nothing prevents you from using it for something else. LLVM will place all type-signed globals into the same region, and then for each call to `llvm.type.test`, compute a lower bound, upper bound, alignment restriction, and validity bitvector for which globals are legal. As an optimization, it will do its best to lay out globals which have at least one type in common next to each other, preserve alignment, and try to sort them so that bitvectors are all ones as often as possible (and so unnecessary). See the [CFI design document](https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#forward-edge-cfi-for-virtual-calls) for further details.

###### Functions

Functions are implemented similarly to globals, but use a jump table rather than placing all the functions in the same section. This allows alignment to remain constant and small despite the greatly varying size of functions (and possibly even size varying based on the layout itself). If I had three functions, `f`, `g`, and `h`, and all had their addresses taken in ways the compiler couldn't track, they'd end up in the table like this:

```
f:
    jmp f_real
    int3
    int3
    int3
g:
    jmp g_real
    int3
    int3
    int3
h:
    jmp h_real
    int3
    int3
    int3
```

Every time the source language tried to take the "address of `f`", it'd get `f`, not `f_real`. If `f` and `g` have the same type signature, but `h` has a different one, a callsite for the first signature would do a range and alignment check, then jump. If `f` and `h` had the same signature, it would use the bitvector to check. As with globals, the compiler will do its best to lay these out such that bitvectors are not usually necessary, but their flexibility means that functions can be assigned multiple types.
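The equivalence-class behavior of the jump table has a direct surface-language analogue. In this hedged Rust sketch (mirroring the `f`/`g`/`h` example above), `f` and `g` share a signature and thus a CFI equivalence class, so either may flow into the same function pointer; `h` has a different signature and a different class:

```rust
// `f` and `g` share a type signature (one CFI equivalence class);
// `h` has a different signature (a different class).

fn f() -> u32 { 1 }
fn g() -> u32 { 2 }
fn h(x: u32) -> u32 { x + 1 }

fn main() {
    // With LLVM CFI enabled, `p` holds a jump-table entry address, and
    // the callsite range/alignment-checks it before jumping.
    let mut p: fn() -> u32 = f;
    assert_eq!(p(), 1);
    p = g; // legal: same equivalence class as `f`
    assert_eq!(p(), 2);

    // `h` cannot be assigned to `p`: different signature, different
    // class (and Rust rejects the assignment statically anyway).
    let q: fn(u32) -> u32 = h;
    assert_eq!(q(41), 42);
}
```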
##### DSOs

All of the above can only be done efficiently within a single lowering pass, which is why it requires LTO - laying out the global table and the canonical jump table requires global information. Unfortunately, shared libraries are quite popular, and implicitly reject this global knowledge. To address this, there is an experimental mode where, in addition to the inlinable checks described above, each module:

* Exports a function called `__cfi_check` which can be used to validate an arbitrary type the module itself exposes.
* If the inlinable checks fail, locates the module which contains the pointer to be checked, and invokes that module's `__cfi_check` if it is CFI enabled.

### KCFI (Kernel CFI)

LLVM CFI has a few major drawbacks. Notably:

* LTO must be used.
* Calls between modules are slow, experimental, and require matching compilers.
* The address of the same function in two different modules may be different.
* Lock-in to LLVM (e.g. can't use `gcc`).

In the particular case of embedded and kernel code, many of these drawbacks move from unfortunate to untenable. The Linux kernel supported traditional LLVM CFI for a bit, but:

* The kernel is large, so LTO is more of a hit to build performance.
* Kernel modules are frequently built by third parties, which now need to ensure they use identical compilers.
* Addresses of functions are compared for equality within the kernel, which led to bugs.
* The kernel generally wants to be buildable with `gcc`, even if `clang` is a well-supported option these days.

This led to the [introduction](https://reviews.llvm.org/D119296) of KCFI. KCFI makes a few observations:

* C (not C++) lacks vtables to protect or typesign.
* All executable memory should be read-only.
* C doesn't have polymorphism, so every function has a principal type.

From this, they came up with a simplified version of the LLVM type testing system:

* Every function has a prefix before it that encodes a type ID as a valid instruction.
* Before each indirect call, it extracts the tag from the header immediately before the function and checks it against the expected value.

These type IDs are the `xxHash` of the Itanium-mangled representation of the function's type, which means that as long as compilers agree on the instruction to embed them in, this gives you cross-compiler CFI without global knowledge. The primary limitations here are:

* You cannot typesign globals - since you don't jump to them, you don't have an implicit check that they are in read-only memory, so you can't trust any type tag on them.
* You cannot attach multiple types to the same function - even redesigning it to support two types would make every indirect call site require 3 more instructions.

## Rust CFI Usage Scenarios

Incomplete support for CFI may prevent Rust from being used in several environments where we could make real improvements.

### Mixed-language Vulnerabilities

Especially in the embedded world, Rust tends to get mixed into existing C/C++ codebases. Unfortunately, this can turn into a situation where the mixture is [less secure than either](https://dl.acm.org/doi/pdf/10.1145/3418898). This is because while Rust has robust protection against memory safety violations baked into the language, hardened C++ environments have invested in CFI and related hardening features to prevent a memory safety violation from turning into a full instruction pointer hijack. This means that a memory-unsafe C++ program which was compiled with CFI enabled can go from safe to unsafe when it adds a Rust dependency if Rust does not have CFI as well - the memory unsafety of the C++ code can modify data accessed by Rust, which then fails to detect the bad control flow transfer. We want to be positioned to tell people that yes, using a Rust component in your existing stack will improve your security posture.

### Android Kernel

The Android Kernel is generally deployed with KCFI enabled.
We are currently working to deploy a Rust-based rewrite of the binder[^binder] driver after a number of memory safety issues over the years. This component is exposed to literally every process on the system, so any vulnerabilities in it are almost always exploitable. To even consider deploying it, the functions Rust exposes to C must be appropriately tagged with compatible KCFI types. To be *confident* deploying it, we need Rust's internal indirect calls to be protected as well, because [C may corrupt them, even if Rust doesn't](#Mixed-language-Vulnerabilities), and because `unsafe` code in the Rust portion of the kernel may have its own bugs.

# Current State

Currently, regular functions work in a way compatible with C, largely thanks to the [`-fsanitize=cfi-icall-experimental-normalize-integers`](https://clang.llvm.org/docs/ControlFlowIntegrity.html#fsanitize-cfi-icall-experimental-normalize-integers)[^thanksramon] flag. Trait objects with receivers of `&self` or `&mut self` can have their methods called. However, the following will currently result in a CFI mismatch (which will lead to a program abort) or a `bug!` triggering:

1. `<S as Foo>::foo as fn(&S)` where `Foo` is a trait with `fn foo(&self)` - you can't use methods from traits as functions.
2. Calling a method with signature `fn foo(self: Arc<Self>)` on a trait object - you can't use anything but `&self` or `&mut self` as a receiver.
3. `let f: &fn() = &((|| ()) as _);` - you can't convert a closure to a callable function pointer.[^fixinprogress]
4. `let _: Box<dyn Foo> = Box::new(S);` - you can't drop a trait object; the drop entry has the wrong type.
5. If you use `self_cell`, you'll get an infinite loop.

This means that in practice, you can't enable CFI in a real Rust codebase and have it run.
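The first four failure modes above can be collected into one small program. This is a hedged sketch with illustrative names (`Foo`, `S`): it is ordinary safe Rust that compiles and runs fine without sanitizers, but each marked call would currently trip a CFI mismatch when built with `-Zsanitizer=cfi`:

```rust
use std::sync::Arc;

trait Foo {
    fn foo(&self);
    fn by_arc(self: Arc<Self>);
}

struct S;
impl Foo for S {
    fn foo(&self) {}
    fn by_arc(self: Arc<Self>) {}
}

fn main() {
    // Case 1: a trait method reified as a function pointer.
    let m = <S as Foo>::foo as fn(&S);
    m(&S);

    // Case 2: a non-reference receiver called through a trait object.
    let a: Arc<dyn Foo> = Arc::new(S);
    a.by_arc();

    // Case 3: a closure coerced to a function pointer.
    let f: fn() = || ();
    f();

    // Case 4: dropping a trait object invokes the vtable's drop entry.
    let b: Box<dyn Foo> = Box::new(S);
    drop(b);
}
```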
You'll also notice one thing in common here - other than the first and last entry, these are a result of calling vtable entries whose type does not match the `Virtual` call that is made to them, usually because the vtable entry has its fully concretized type, while the `Virtual` type is abstract. The first one is a result of trying to fix this abstraction problem in a specific case by adjusting the type encoding.

# Proposed Solution (CFI Shims)

If the vtable entries have alias sets that are incompatible with virtual call alias sets, what should we do about this?

## Picking an Alias Set

Since object-safe vtable functions are not directly exposed to C (they aren't `extern "C"` ABI, so there's no way to usefully do that), we are largely free to select whatever alias set we want. We explore several options:

### Singleton

All call-sites and definition sites with Rust ABI have the same alias set. This is almost equivalent to disabling CFI on intra-Rust calls. This will work, and be simple, but leaves us open to [mixed language issues](#Mixed-language-Vulnerabilities) and makes incorrect `unsafe` much more likely to be exploitable.

### Rust ABI Compatibility

Use the [rules for ABI compatibility](https://doc.rust-lang.org/nightly/std/primitive.fn.html#abi-compatibility) to normalize types down before encoding them when the ABI is "Rust". This should cover all object-safe methods, which should cover the vtable. We would still need to use traditional, type-based encoding for non-"Rust" ABIs for compatibility with external code we link against. Normalization would proceed on each argument and the return type before encoding as we already do, likely with a different prefix in order to prevent Rust ABI annotations from joining the alias set of `extern "C"` functions. The per-type normalization would work roughly like this, as a visitor:

1. If it's a `*const T`, `*mut T`, `&T`, `&mut T`, `Box<T, Global>`, or `NonNull<T>`, replace it with a `*mut <T as Pointee>::Metadata`, normalizing the type before proceeding.
2. Rewrite `usize` or `isize` to the platform's pointer width.
3. Rewrite `char` to `u32`.
4. Rewrite any `extern "foo" fn(..) -> T` to `extern "foo" fn()`; do not recurse into arguments.
5. Rewrite any align=1 ZST to `()`.
6. Rewrite any `repr(transparent)` struct to its unique field, with a stack to catch cycles. On cycle, don't apply this rule, but apply any other rules that apply.
7. Rewrite `NonZero<T>` to `T`.
8. Strip `Option<T>` to `T` if `T` is subject to [null pointer optimization](https://doc.rust-lang.org/nightly/std/option/index.html#representation).

Finally, any argument whose type is `PassMode::Ignore` in the ABI (the usual example being ZSTs in the "Rust" ABI) is removed. The major remaining caveat is that `Virtual` uses `force_thin` to become ABI compatible. This means that when computing the "ABI signature" for a `Virtual` call, the first argument should always become `*mut ()`, e.g. "a pointer with no metadata".

The primary advantage of this approach is that `miri` is already busy enforcing these rules on indirect call boundaries, so it should be valid already with no real modifications to codegen, just an alternate computation of alias set. This approach sounds enticing, but it leaves alias sets *extremely* weak, not much better than the Singleton strategy. I haven't done measurements, but since all `fn foo(&self)` would be mutually compatible, regardless of what trait we're talking about, CFI would not be enforcing much beyond an IBT-style scheme here - little more than arity restrictions.

### Middle Ground

Restrict further than the ABI compatibility rules require, while maintaining the property that the normalization procedure can assign a principal alias set per `def_id`/`args` combination.
This restriction means we will never need to introduce an additional shim, because the alias set can be determined without casing out on the `InstanceDef`. We maintain this restriction on our choices because if we allow differing alias sets per shim^[We technically ignore `VTableShim` here, because it is only used for `unsized_fn_params` (which is in no danger of stabilization), and `call_once`, which is behind the `fn_traits` feature. This means that without using unstable features that aren't commonly used, we can set all trait methods that take `self` to encode as though they took `*mut Self` without breaking anything.], we gain the implementation complexity and runtime overhead we'd incur with the [type signature approach](#Type-Signatures), and may as well use that as our basis, since it would be more precise.

The only piece that we know actually mismatches in type-based alias sets today is the receiver type. So: for the first argument of any function, use the ABI normalization rules that we just proposed to transform it. All other arguments and the return type must match the type. This still leaves `fn(&self)` of all kinds compatible with each other, but at least `fn(&self, &Foo)` and `fn(&self, &Bar)` are no longer compatible, which improves things somewhat.

#### Limitations

We could do better than this while still maintaining a principal alias set, but we have a lot of limitations without whole-program analysis. We would effectively need to class potential receivers into components based on a transitive closure over an undirected variant of the "implements" and "supertype" relationships. With whole-program analysis, we have some hope of making a few smaller components to class things into, rather than all legal object receivers just becoming `*mut ()`.
Without whole-program analysis (forgoing which is one of the goals for [KCFI](#KCFI-Kernel-CFI)), all public types and traits must be assumed to be connected (as a foreign crate could easily create a link), and so most receiver types contract to the same representation.

### Actual Alias Analysis

We could use LLVM's alias analysis passes on an un-instrumented variant of the program to dynamically generate alias sets. This would have the advantage of being able to use dataflow and LLVM type compatibility, but the disadvantage of requiring LTO without any hope of cross-DSO support. It would also lead to difficult-to-predict changes in alias set when implementation details in the program change. This might be interesting research, or an experiment, but it doesn't satisfy our practical goals.

### Type Signatures

This is what the codebase today attempts to do, and it generates the bugs described above because it never alters shimming, and so no assignment of type-signature-based principal alias sets can be correct. In this approach, every *Instance* has a principal alias set, and this alias set is allowed to depend upon shim information as well. If we need to call a given `def_id` + `args` combination at two different types (say, a thin `*const dyn MyTrait` and `&MyConcrete`), then there will be a way to shim it via `InstanceDef` that will result in each type being present. When creating an instance that may be the recipient of an indirect call (whether because it is going into a vtable or because a function pointer is being created), it will be resolved in a way that applies an appropriate `InstanceDef` to get the receiver type to match, not just be compatible. The rest of the design assumes this is what we're doing.

## Multiple Variants are Required

For LLVM CFI, we could perhaps [attach every possible type](#Shim-less-LLVM-CFI) that a method could have.
However, if we want KCFI to work, we need at least one compilation mode where every address has at most one type. The rest of the design assumes this restriction. It will still solve the crashing problems of LLVM CFI, just possibly not in a way that leverages the type-test system for optimal performance. We cannot just make each `Instance` have the alias set expected by its virtual call, for a few reasons:

* The underlying instance should still be accessible at its concrete type - see bug 1.
* If `trait Parent { fn f(&self); } trait Child: Parent {}`, we expect to have an entry that works for `*const dyn Parent` and one that works for `*const dyn Child`.[^futurereduce]
* `drop_in_place<T>` is inserted into every vtable. Since a given concrete `T` may implement any number of traits, and every one needs a `drop_in_place` entry in its vtable, we can't just adjust the type of `drop_in_place<T>`.

This leaves two approaches open:

### Default fn-ptr alias set

`Instance`s by default match the alias set at which it would be legal to perform an explicit indirect call (e.g. fn-ptr based). Shim adjustments are applied when producing instances for a vtable to abstract them. The primary advantage of this approach is in implementation simplicity and results looking as one might expect. For example, an `Item` of a function `fn foo(&self)` with `Self = Bar` will have the alias set for `fn(&Bar)`, not `fn(*const dyn Foo)` with an *unprinted* caveat that this is a thin-call receiver, not a fat dyn pointer. It also results in fewer changes from the status quo when CFI is not enabled, as implicit argument transmutes will not need to be inserted into the generation of the base instance for methods. This keeps the changes generated by CFI mostly separated from normal compilation. Basically, this assigns the alias set you'd expect if you were not thinking about vtables, and is simpler to implement and test. This approach is what I have implemented in the current PR.
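The distinction this strategy draws can be made concrete with a hedged sketch (illustrative names; the alias-set descriptions in the comments are this document's framing, not compiler output). The reified pointer below is exactly the fn-ptr type the unshimmed instance would encode under this strategy, while the vtable path is the one that receives a shim:

```rust
// Under the "default fn-ptr alias set" strategy, the unshimmed instance
// for `<Bar as Foo>::foo` encodes as its reified type, `fn(&Bar)`; only
// the copy placed in the vtable is shimmed to the abstracted
// `fn(*const dyn Foo)` alias set.

trait Foo {
    fn foo(&self) -> &'static str;
}

struct Bar;
impl Foo for Bar {
    fn foo(&self) -> &'static str {
        "concrete"
    }
}

fn main() {
    // Reified at the concrete type: the alias set "you'd expect if you
    // were not thinking about vtables".
    let direct: fn(&Bar) -> &'static str = <Bar as Foo>::foo;
    assert_eq!(direct(&Bar), "concrete");

    // The same method through the vtable; this call site expects the
    // abstracted (shimmed) alias set instead.
    let obj: &dyn Foo = &Bar;
    assert_eq!(obj.foo(), "concrete");
}
```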
### Default vtable alias set

`Instance`s by default match the alias set at which it would be legal to perform a vtable call (e.g. thin-dyn based). Shim adjustments are applied when producing instances for function pointers. Virtual calls are adjusted from today so that instead of calling using an expected signature of `*const dyn MyTrait`, they will call with `*const dyn MySuperTrait` if `MySuperTrait` provides the method being called. The primary advantage of this approach is that it will likely result in faster and smaller code, as creating a function pointer from a trait function is uncommon relative to vtable dispatch. This means that the common case will avoid a shim.

This adds complexity, because not all `Instance` types can select a default vtable alias set, yet some of those are still used in vtables. `drop_in_place` via `DropGlue` is a key example of this - it cannot determine the trait it is being used in, because it may be used in multiple traits. This means that *some* instances do not need a shim when going into a vtable, and others do, and we need to figure out which they are and shim them. If we make the instance *type* match its alias set, this also means that the logic for determining if an instance needs to be called with a "thin self" will expand from a simple enum check on the `InstanceDef` type^[Today, whether it's `Virtual`; with my patches, whether it's `Virtual` or `CfiShim`.] to a computation involving both the shim kind and the `def_id` in order to determine whether that leading `*const dyn Foo` is a fat or thin pointer. This logic gets even more confusing if we have a trait function which takes a *fat* `&dyn MyTrait` in the first argument, with no receiver, and uses a `where Self: Sized` clause to keep the trait object-safe. We now need to check not only whether an item implements a trait, but whether it *really* would go in a vtable before we can determine whether it is `force_thin` or not.
The other approach dodges this by only shimming things *as* they're put into the vtable, which means they're pre-selected to have an object-safe receiver that can be rewritten to a thin pointer. This approach would also make the [extra arguments](#Extra-InstanceDef-arguments) approach more viable, as `ReifyShim` would likely cover some of what is needed.

## Don't enable CFI shims by default

These shims should be small, but there may be a lot of them, and they introduce indirection through trampolines in many cases. This will lead to unnecessary code bloat, confuse the optimizer, etc.

## Variant Instance Strategy

Regardless of which alias set we assign to unshimmed `Instance`s, we know that we need to create new `Instance`s either for vtables or for function pointers. These instances need a field to determine which abstracted `Self` is in use, since `drop_in_place` demonstrates that we will have an unbounded number of variants for each existing instance. We still need the original `Self` type as well, however, as we need to be able to generate the original shim or call the original instance. We explore several strategies for making this information available:

### Add it to `Instance`

It seems tempting to add the replacement `Self` type to `Instance`, either by packing it into `Args` as an extra field or creating an `Option<Ty>` field for use during CFI. However, MIR generation is done on `InstanceDef`, *not* `Instance`. This presents two problems:

1. The MIR generator needs a concrete type, not just a type variable, to repeatedly unwrap the receiver type to make alternate receivers (e.g. `Arc<Self>`) compatible with `thin_self`.
2. We cannot cheaply tell whether a cast was needed, so we don't know whether any instance we're calling should be thin or not.
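The need stated above - carrying an abstracted `Self` alongside the original instance information, without losing either - can be sketched in miniature. These are hypothetical, heavily simplified stand-ins for rustc's `Instance`/`InstanceDef` machinery, not its real definitions:

```rust
// Hypothetical, simplified stand-ins for rustc's instance machinery,
// sketching how an abstracted `Self` type could ride along with an
// existing instance kind rather than replacing it.

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Concrete(&'static str), // e.g. "Bar"
    DynTrait(&'static str), // e.g. "dyn Foo"
}

#[derive(Debug, Clone)]
enum InstanceKind {
    Item(&'static str),
    DropGlue(Ty),
    // A wrapper: the inner kind plus the abstracted receiver type.
    Wrapped(Box<InstanceKind>, Ty),
}

impl InstanceKind {
    // The alias set can be derived from the wrapper when present, and
    // from the inner kind's own type otherwise.
    fn abstracted_self(&self) -> Option<&Ty> {
        match self {
            InstanceKind::Wrapped(_, ty) => Some(ty),
            _ => None,
        }
    }
}

fn main() {
    let plain = InstanceKind::Item("foo");
    assert_eq!(plain.abstracted_self(), None);

    // `drop_in_place::<Bar>` abstracted for the `dyn Foo` vtable.
    let shimmed = InstanceKind::Wrapped(
        Box::new(InstanceKind::DropGlue(Ty::Concrete("Bar"))),
        Ty::DynTrait("dyn Foo"),
    );
    assert_eq!(shimmed.abstracted_self(), Some(&Ty::DynTrait("dyn Foo")));
}
```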
### Extra `InstanceDef` arguments

Only some `InstanceDef` elements can possibly be inside a vtable today - we could just add an extra argument to all of them, and add a helper method for getting the `InstanceDef`'s abstracted `Self` which we could reference in shim generation. I started going down this road initially, but then remembered that `Fn`-family trait objects cover almost everything else; it's only `Virtual`, `Intrinsic`, and `ThreadLocalShim` that are not going to end up in a vtable. We could still do this if we really want to avoid a wrapping `InstanceDef`, but we'd end up adding it to several different shim types and repeating our logic in several places.

### Wrapping `InstanceDef`: `CfiShim`

This creates a new shim kind that is essentially a modifier for all other shim kinds. We can now transform any `InstanceDef` into a variant that abstracts to a particular type. The primary advantages of this approach are:

1. The compiler-internal representation of non-shimmed code remains identical. No size increases or changes to the shim generation code other than a check for shim enablement.
2. It composes well with new shims - if a new shim is added, it will more than likely Just Work with CFI out of the box.
3. It centralizes logic for CFI concerns - for example, as discussed later, a CFI shim for a function abstracted to a particular trait goes in the crate that performed the `&dyn` cast, not in the defining crate. This decision is written in one place, not 7.

The primary disadvantage of this representation is that wrapping some `InstanceDef`s (closure-likes and trait implementation items) requires additional logic. This is the approach we assume through the rest of the design. Switching to extra `InstanceDef` arguments would be compatible with the rest of the design, but would require adding codepaths to several branches rather than the central one.

## Computing the Alias Set

Given an instance, how do we want to get the alias set?
### Separate Machinery

We could add a separate method to `Instance` which computes the principal alias set. If the ABI is `"Rust"`, we use the logic described in this section. `"rust-call"` likely needs special casing. All other ABIs directly compute the type and then call the type encoder, as they do today. This allows us to avoid modifying the type in a way that would change how Rust expects to interact with things. Rather than being a transmute followed by a direct call, or a transmute followed by an existing shim, generated code can just *be* a direct call or the existing shim.

An `Instance` is available when assigning an alias set to a declaration, but not when making an indirect call - there will be an `Instance` only for direct calls and indirect calls through a vtable (`Virtual`). This means that we still need our call-site alias set to be computable from the function pointer type as well.

Advantages:

* Fewer changes to the flow of the compiler.
* Flexibility to adjust alias sets to contain non-type information for `Virtual` calls.

Disadvantages:

* Complexity in implementation - the type is available early on, whereas the alias set requires normalization and arguments.
* Complexity in maintenance - instances now have an extra piece of metadata computed for them that could easily be brought out of sync when updating any shim generation code.
* Complexity in debugging - it is not clear what the perceived alias set of an instance is as it goes through the compiler unless explicit debug statements are added.

In retrospect, this might be worth trying as a refactor. The current patchset uses the type directly.

### Use the Type Directly

We generate shims such that the type of the shim reflects the generalization we're allowing.
For example, with

```
trait Foo {
    fn foo(&self);
}
struct Bar;
impl Foo for Bar {
    fn foo(&self) {}
}
```

the `Bar` vtable for type `Foo` would have a `foo` entry with explicit type `fn(*const dyn Foo)`, and a hidden^[In an ideal world, thin dyn pointers would be explicit in the type rather than based on logic around what instance is being called.] `force_thin` annotation based on the instance. The `DROPINPLACE` entry would have type `fn(*mut dyn Foo)` as well.

Advantages:

* It's obvious what is misaligned when looking at any stage of the compiler - if you're trying to make an indirect call, the target needs the same type.

Disadvantages:

* We need to insert a transmute in our shims to change the type. This will codegen out, but it is otherwise not needed.

This is the approach currently implemented and assumed for the rest of the design.

## Replacing the Receiver

Our primary goal in each shim is to convert the receiver type of the function to match the abstracted type. We have several options:

### Rewrite the Receiver Directly

The most straightforward approach would be to examine the first argument's type, match on the receiver structures, and replace it. This approach is simple, and works for `&self`, `&mut self`, `Self`, `Box<Self>`, `Rc<Self>`, `Arc<Self>`, and `Pin<P>` with `P` matching the rest of this pattern aside from `Self`. There are two difficulties this produces:

* It does not support arbitrary external receiver types. This is not just a nice-to-have, as the Rust support in the Linux kernel uses multiple custom receivers.
* A non-trivial amount of custom code is required to perform the rewrite.

### Compute type at concrete and `dyn MyTrait`, merge

When producing a shimmed function, rewrite the underlying `def_id`, if it is closure-like or a trait function, to the generic version of it, with appropriate arguments.
When we're only shimming vtables, it's guaranteed that all functions have a receiver, and after this transformation, the receiver is always in the first position. Instantiate the function signature at the concrete type, then instantiate the receiver at the abstract type, and replace the first argument with the rewritten receiver.

This largely works, but could theoretically fail if a receiver type depends on an associated type of `MyTrait`. The main downside is that it involves two type instantiations and creating a new signature, and all returned types are now `FnPtr`, which is not quite accurate. It also slightly weakens CFI, since the associated types are no longer qualifiers.

### Compute type at abstract `Self`, with appropriate associated types

Produce a complete trait object type for the concrete type's implementation of the trait in question at the trait arguments. For example, if we have

```
trait Foo<U> {
    type T1;
}
trait Bar<U>: Foo<U> {
    type T2;
}
struct C;
impl Foo<u32> for C {
    type T1 = u8;
}
impl Bar<u32> for C {
    type T2 = i8;
}
```

then when we try to produce the type for `C` as it implements `Bar<u32>`, we emit `dyn Bar<u32, T1 = u8, T2 = i8>`. These constraints will be present at the call site, because they are required for a valid trait object.

We then replace the `def_id` as in the previous approach, switching closure-likes to the `call` family of functions with appropriate arguments, and trait implementations to the trait method they're *implementing*. This means that the first parameter of everything in the vtable at this point will be `Self`, and we can use the trait object type above. Because we added the associated types, the entire type should instantiate cleanly against the generalized type.

This is what is currently implemented in the patchset.

## Which crate do `CfiShim`s go in?

In this design, these shims should go in the crate where the vtable is created.
The defining crate for the underlying instance cannot know all types the shim will need to be defined at. If we [shimmed function pointers rather than vtables](#Default-vtable-alias-set), shims other than `DropGlue` (since we again don't know all the types it needs to be instantiated at) and similar shims^[Since I haven't implemented this, I don't know that `DropGlue` is the only one that still requires a shim to go into the vtable, but it is an example.] can go in the defining crate.

## MIR Compatibility

This section is only needed if we choose to express the alias set directly through the type. If we were to use a separate mechanism, we would not need to adjust the MIR.

When we rewrite the type on a function through a shim, we need to line up the type of the generated MIR with the type of the function. We generate for the wrapped `InstanceDef`, then we rewrite the body to have a transmute from the new receiver to the one the original instance would have expected.

To support alternate receivers, we unfortunately have to unwrap to the inner `dyn` pointer. For example, if we have `Arc<dyn Foo>`, we unwrap to `NonNull<ArcInner<dyn Foo>>`, then `*mut ArcInner<dyn Foo>` to match what the ABI expects and allow `force_thin` to do its job.

# Patchstack Tour

This is a brief overview of the patchstack, intended to help reviewers find specific sections they're looking for. It refers to the patchstack as it is, not to the design or alternate implementations.

## Prelude

### Introduce trait_obj_ty query

This computes the [trait object type with associated types](#Compute-type-at-abstract-Self-with-appropriate-associated-types) that will later be used to compute the abstract type of an instance. It's in a query both because some of the functions it calls are not available at the intended call-site, and because it is called with the same argument several times.
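As a concrete illustration of the type this query computes, here is a user-level sketch reusing the `Foo`/`Bar` example from the design section. The coercion target spells out every associated type, including the supertrait's, which is the fully-qualified form described above:

```rust
// Same setup as the design section's example.
trait Foo<U> {
    type T1;
}
trait Bar<U>: Foo<U> {
    type T2;
}
struct C;
impl Foo<u32> for C {
    type T1 = u8;
}
impl Bar<u32> for C {
    type T2 = i8;
}

fn main() {
    // A trait object for `Bar<u32>` must pin down all associated
    // types, including the supertrait's `T1` - this mirrors the
    // type computed for `C`'s impl of `Bar<u32>`.
    let obj: Box<dyn Bar<u32, T1 = u8, T2 = i8>> = Box::new(C);
    let _ = &obj;
}
```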
### Refactor visiting instance_def

This splits `InstanceDef` visit code out into a function to make the introduction of CFI shims a little smaller.

### Refactor fmt_instance

Same as above - factors out a function to decrease noise in the CFI shim patch.

### Refactor to create InstanceDef::fn_sig

There are several places throughout the code where `VTableShim` is special-cased to handle its cast from `self` to `*mut self`. Most of these are annotated with a comment by eddyb suggesting it be factored out. Since the shim generator needed to do it one more time, I factored it out first.

## CFI Shims

This creates the `InstanceDef::CfiShim` variant. It wraps another `InstanceDef`, which is expected to be another shim, and carries an abstract type, as computed by the `trait_obj_ty` query. The user is intended to construct a `CfiShim` via the `.cfi_shim` instance method. This method will be a no-op if `cfi_shims()` does not return true on the session, currently controlled by either CFI or KCFI being enabled.

If it's wrapping a closure-like or a trait method implementation, these are replaced with a `ReifyShim` pointing to the abstract method, as described in the design. When generating the shim, it will prepend a transmute from the abstracted receiver to the concrete receiver.

## Generate Shims

This enables usage of these shims with CFI.

### CFI: Apply CFI shims to drops

Attach `.cfi_shim()` to vtable drop entries and to collector visiting. This makes trait object drops start working.

### CFI: Enable vtable shimming

Attach `.cfi_shim()` to vtable method entries and collector visiting. Alternate receivers begin to work. Standard receivers already work at this point because we changed the encoding of any method on a trait to abstract itself if it used `&self` or `&mut self` as a receiver.

## Fixups

These fix individual remaining bugs, though in ways that may depend on CFI shimming being enabled already.
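One concrete pattern in this bucket is converting a non-capturing closure to a function pointer - ordinary Rust that must keep working under CFI. A minimal sketch:

```rust
// A non-capturing closure coerces to a plain `fn` pointer. Under CFI,
// the indirect call through `f` must still pass the check even though
// no closure environment is actually passed at the ABI level.
fn main() {
    let f: fn(u32) -> u32 = |x| x + 1;
    assert_eq!(f(41), 42);
}
```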
### Revert "CFI: Fix SIGILL reached via trait objects"

This removes the encoding modification we previously had that explicitly re-encodes `&self` and `&mut self` on trait methods. It is no longer needed with the rest of this design, and removing it makes function pointers to methods on a trait work again.

### CFI: Skip non-passed arguments

Some conversions to function pointers, most notably the conversion from a closure that captures nothing to a function pointer, depend on the understanding that a `PassMode::Ignore` argument does not alter the ABI. This patch loosens the CFI around these arguments by skipping them when generating the alias set. The alternative would be to introduce another kind of shim to explicitly truncate a non-passed argument from the type.

### CFI: Handle dyn with no principal

In user Rust, `dyn` is not a type. However, it effectively appears in `drop_in_place` when generating a vtable with no principal trait. For example, `let x: Box<dyn Send> = Box::new(MyType) as _;` will construct a drop call for a vtable with no trait. `Send` and other auto-traits are non-principal, so at vtable allocation time, we have no description for the dropped type other than "it's a pointer to an object that was converted into a `dyn ?` at some point". Because we are describing our alias set in the type system, this means that the receiver type uses `dyn` as a self type, which causes hiccups in several places. This patch teaches those places to tolerate a `dyn` with no predicates.

### CFI: Support self_cell-like recursion

This patch, or something like it, can and probably should land even if the rest don't. The type encoder wants to flatten `#[repr(transparent)]` types into their single non-ZST field for compatibility. The existing code attempts to avoid recursion by generalizing pointers, but the use of `PhantomData` or any similar structure defeats this. This pattern is used in `self_cell`, so it's not just an adversarial example.
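A minimal sketch of the recursion pattern, with hypothetical type names (this is not `self_cell`'s actual definition): flattening `Wrapper<Cell>` into its non-ZST field yields a pointer, which pointer generalization would normally terminate, but the `PhantomData` field still names `Cell`, so a naive encoder recurses:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical stand-in for the self_cell-like pattern: a transparent
// wrapper whose only non-ZST field is a pointer, plus a PhantomData
// that still mentions the wrapped type.
#[repr(transparent)]
struct Wrapper<T> {
    ptr: *const T,
    _marker: PhantomData<T>,
}

// The wrapped type mentions the wrapper at its own type, closing the
// cycle the encoder has to break.
struct Cell {
    inner: Wrapper<Cell>,
}

fn main() {
    // repr(transparent) guarantees Wrapper<T> has the layout of its
    // single non-ZST field.
    assert_eq!(size_of::<Wrapper<Cell>>(), size_of::<*const Cell>());
}
```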
### CFI: Generate super vtables explicitly

Super vtables are currently skipped in the collector because the child vtable includes instances for all the super vtable entries. In our case though, the super vtable will be shimmed to a different abstract type than the entries in the child vtable. This means it will have different instances, so we need to generate them.

### CFI: Strip auto traits from Virtual receivers

Auto traits are not present when *generating* vtables, but they may be present on the receiver when calling them. This strips auto traits off the receiver of virtual calls so that they're compatible with the target, since a virtual call cannot *require* those additional bounds because it was an object-safe method.

# Future Work

This stack gets us to "Most Rust code actually builds and runs under CFI", but there are more improvements we can still make in the CFI area.

## Shim-less LLVM-CFI

A lot of this design is determined by the restrictions of KCFI - namely that we can't typesign globals, and every function can have only one type signature. Full LLVM CFI has neither restriction, and we can leverage both of those capabilities to remove shims and make things more efficient. If we do these, we'll want to enable KCFI in userspace to ensure it continues to work, as the implementations will diverge.

### Assign all legal types to each function

1. Make `Virtual` calls go through `dyn DefiningTrait` rather than `dyn MaybeSuperTrait`.
2. When attaching CFI types to a method, attach both its concrete type and the method type instantiated with the generalized object type described earlier, with `force_thin` applied.
3. For `drop_in_place<T>`, we still need a shim, but only one per crate. The implementor of some trait for `T` should provide the shim, and it should contain types for all traits that crate implements for `T` on the same shim.

### Skip signature check on virtual calls

In short, check the vtable, don't check the virtual call itself.
I haven't verified that the checks on loading vtables work correctly, but I have seen the implementation. If you checked the load of the vtable, you've already checked the type of the method, so you don't need to do it again. The vtables reside in a read-only region, so if you did a typechecked load, you're already safe.

## Reduce KCFI Shim Count

We can't get rid of shims entirely, because the alias set expression we're using fundamentally doesn't have a principal alias set per unshimmed instance. Each shim we produce is essentially a witness for membership in an additional alias set. We can avoid these by reducing the number of alias sets a method is in, or by making the unshimmed case match what is common in real-world code.

### Reduce total possible alias sets

If we make `Virtual` use the defining trait as a receiver rather than the current one, we no longer need a shim for every supertrait of the trait the method implements. If deep supertrait hierarchies are used, this could significantly reduce the number of possible shims.

### Make unshimmed the common case

We can [switch to vtable-compatible defaults](#Default-fn-ptr-alias-set). Without the `Virtual` call change, this wouldn't do much - supertraits would bring most of the shims back, while adding complexity. With it, however, this would make shim generation much less likely, as calling a method through a vtable is much more common than converting a trait method to a function pointer. `drop_in_place` would be by far the most common remaining shim. This is potentially significantly more complicated, but would improve performance.

## FineIBT Support

FineIBT is only available on some CPUs, but it provides more flexibility of implementation. The main benefits the Linux experimental implementation claims are:

* Speculation barrier
* No reads, which can allow XOM (execute-only memory)
* No reads, which can improve performance

Today, they are using a hashed C signature as an alias set, the same way KCFI does.
However, the interesting part is that the alias set policy is enforced at the *callee*. This means that if we convert the `Virtual` call to call at the trait definition type, we could use a custom enforcement that checks whether the caller is one of *two* values rather than just one, e.g.

```
fn_ptr_entry_f:
    endbr
    CHECK_FNPTR_BUNDLE
    jmp direct_entry_f
virtual_entry_f:
    endbr
    CHECK_FNPTR_BUNDLE
direct_entry_f:
    // Actual implementation
```

This is experimental though, and not available on all chips, so it wouldn't actually allow us to ship Rust in the Android kernel. I'm mostly including it in case all this discussion of CFI got you excited, because this is probably the direction the version of this without whole-program analysis is going.

[^icall]: Indirect jumps are control transfer instructions which use a computed rather than a constant target.

[^scs]: Shadow Call Stack uses a separate stack in `x18` which is only accessed around call/return to store return addresses. This makes it difficult to overwrite the return address, because the real one is only accessible through a register which isn't used elsewhere in the program.

[^safestack]: Safe Stack partitions the stack into two stacks, one of which holds compiler-controlled values (register spills, return addresses, etc.) and one which holds address-taken and programmer-controlled values. This makes it difficult to overwrite "safe" values from out-of-bounds writes relative to "unsafe" values, because they are no longer adjacent.

[^stackcookies]: Stack cookies put a randomized (the extent of randomization varies by implementation) value onto the stack immediately after the return address. This value is checked before returning, which makes linear overwrites from the stack onto the return address difficult to perform, because the cookie should be unpredictable.

[^ibt]: These are a pair of hardware-accelerated indirect branch controls present on recent Intel and ARM CPUs respectively. They both work on the same principle - a special instruction that would decode to a `nop` on earlier versions of the CPU is placed at every location in the program where it would be legal for an indirect branch to land. When the protection is enabled, indirect control flow transfers that do not end on one of these landing pads will fault.

[^fineibt]: FineIBT is a software extension of IBT/BTI in which callers move the identifier for an alias set into a caller-saved register, and after each landing pad, there is an efficient check of the provided identifier. The paper does not select a scheme for defining alias classes, but the [experimental Clang implementation](https://github.com/lvwr/llvm-project/commits/fineibt/kernel/) uses hashes of type signatures, similar to KCFI. See [FineIBT Support](#FineIBT-Support) for further discussion.

[^binder]: Binder is the name of Android's IPC system. It's in the kernel to allow it to make scheduling decisions, reduce copy count, and manage the lifetime of objects passed between more than just two processes.

[^thanksramon]: Kudos to @rcvalle for designing and pushing that flag through LLVM.

[^fixinprogress]: I believe @rcvalle has an alternate fix in progress for this, focused on adjusting closure encoding.

[^futurereduce]: We might be able to avoid this in the future by [changing virtual calls](#Reduce-KCFI-Shim-Count) when in CFI mode to perform an implicit trait upcast.
