Today, most things you can do that generate vtables will SIGILL with CFI enabled. Calling methods on a `&dyn Foo` works properly, but nearly everything else can cause trouble. This document explores a proposed fix so that trait objects, closures, etc. will work under CFI in Rust.
CFI, or control-flow integrity, is a class of mechanisms for hardening compiled code by enforcing that the runtime trace of the compiled program stays within the control flow graph of the source. It accomplishes this by restricting indirect jumps[1] to targets that analysis of the source program can't rule out.
In order to do this efficiently and precisely, indirect jumps are separated into different classes. Backwards-edge CFI specifically refers to protections for `ret` or equivalent instructions. Because these follow a stack discipline, there is a wide variety of specialized techniques that work better for this class of indirect jumps (shadow call stack[2], safe stack[3], stack cookies[4], etc.). The other major class is known as forward-edge CFI. This covers calls through function pointers, `longjmp`, and calls that translate to vtables (e.g. calls through virtual functions in C++, calls through a trait object in Rust). There are a variety of approaches to this as well, such as IBT/BTI[5], FineIBT[6], LLVM CFI, and KCFI.
When you hear someone say they "enabled CFI" for something, this is almost always what they mean - LLVM's software-based forward-edge control flow protection, which determines its alias sets based on types. The system gives you two tools: you can attach type metadata (an integer offset paired with a string type ID) to a function or global, and you can test an address against a type ID with the `llvm.type.test` intrinsic. You can attach multiple equivalence classes to a given address, but functions and globals must not share an equivalence class.

In C++, the integer is used as a byte offset into the vtable, and the string as an Itanium-mangled version of the type. Functions and globals will use an offset of 0. The offset acts as an additional disambiguator: a method that is not installed at a particular offset can't have been reached by loading from that offset, even if the type signature is otherwise compatible.
Clang offers a number of different hardenings based on these tools:

- `cfi-cast-strict` / `cfi-derived-cast` / `cfi-unrelated-cast` - Dynamically check based on type information whether a cast is legal, at various levels of strictness.
- `cfi-vcall` - Checks the type of the vtable before invoking a method on it to make sure it is a vtable known to have a method available to the provided receiver type.
- `cfi-nvcall` - Uses the type of the vtable to check that direct calls are being made on objects of the correct type.
- `cfi-icall` - Uses the type of a function pointer to check that it has the expected signature before calling.
- `cfi-mfcall` - Like `cfi-icall`, but with support for pointers to member functions.

LLVM implements the `llvm.type.test` intrinsic in two different ways depending on whether it is the type signature of a global or a function.
Typesigning globals is primarily used for vtables in C++, though there's nothing preventing you from using it for something else.
LLVM will place all type-signed globals into the same region, and then for each call to llvm.type.test
, compute a lower bound, upper bound, alignment restriction, and validity bitvector for which globals are legal. As an optimization, it will do its best to lay out globals which have at least one type in common next to each other, preserve alignment, and try to sort them so that bitvectors are all ones as often as possible (and so unnecessary). See the CFI design document for further details.
Functions are implemented similarly to globals, but use a jump table rather than placing all the functions in the same section. This allows alignment to remain constant and small despite the greatly varying size of functions (and possibly even size varying based on the layout itself). If I had three functions, `f`, `g`, and `h`, and all had their addresses taken in ways the compiler couldn't track, they'd end up in the table like this:
f:
jmp f_real
int3
int3
int3
g:
jmp g_real
int3
int3
int3
h:
jmp h_real
int3
int3
int3
Every time the source language tried to take "address of `f`", it'd get `f`, not `f_real`. If `f` and `g` have the same type signature, but `h` has a different one, at a callsite for the first signature it'd do a range and alignment check, then jump. If `f` and `h` had the same signature, it would use the bitvector to check. As with globals, the compiler will do its best to lay these out such that bitvectors are not usually necessary, but their flexibility means that functions can be assigned multiple types.
All of the above can only be done efficiently within a single lowering pass, which is why it requires LTO - laying out the global table and the canonical jump table requires global information. Unfortunately, shared libraries are quite popular, and by their nature preclude this global knowledge.
To address this, there is an experimental cross-DSO mode where, in addition to the inlinable checks described above, each module:

- exports a `__cfi_check` function which can be used to validate an arbitrary type exposed by that module
- routes indirect calls that leave the module through the target module's `__cfi_check`, if it is CFI enabled.

LLVM CFI has a few major drawbacks. Notably, it requires LTO for its global layout, and it is not implemented by other compilers (e.g. `gcc`).

In the particular case of embedded and kernel code, many of these drawbacks move from unfortunate to untenable. The Linux kernel supported traditional LLVM CFI for a bit, but much of the kernel world still builds with `gcc`, even if `clang` is a well-supported option these days.

This led to the introduction of KCFI. This makes a few observations:
From this, they came up with a simplified version of the LLVM type testing system: each indirectly callable function is prefixed with a typeID, and every indirect call site checks the typeID in front of the target before calling it. These typeIDs are the `xxHash` of the Itanium-mangled representation of the function's type, which means that as long as compilers agree on the instruction to embed them in, this gives you cross-compiler CFI without global knowledge.
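The resulting scheme looks roughly like this (a sketch: the constant is a made-up hash value, and real KCFI emits the typeID in a prefix ahead of the function entry with target-specific padding):

```
; definition site: typeID embedded just before the function
    .long 0x52fcd713          ; hypothetical xxHash of the mangled type
f:
    ; actual implementation
    ...

; call site, target in rax
    cmp  dword ptr [rax - 4], 0x52fcd713  ; compare typeID at the target
    jne  .trap                            ; mismatch: CFI violation
    call rax
```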
The primary limitations here are that globals (and therefore vtables) cannot be typesigned, and that every function can have at most one typeID.
Incomplete support for CFI may prevent Rust from being used in several environments that we could make real improvements in.
Especially in the embedded world, Rust tends to get mixed into existing C/C++ codebases. Unfortunately, this can turn into a situation where the mixture is less secure than either language alone. While Rust has robust protection against memory safety violations baked into the language, hardened C++ environments have invested in CFI and related hardening features to prevent a memory safety violation from turning into a full instruction-pointer hijack. This means that a memory-unsafe C++ program compiled with CFI can become less safe when it adds a Rust dependency if the Rust is not built with CFI as well - the memory unsafety of the C++ code can corrupt data accessed by Rust, which then fails to detect the bad control flow transfer.
We want to be positioned to tell people that yes, using a Rust component in your existing stack will improve your security posture.
The Android Kernel is generally deployed with KCFI enabled. We are currently working to deploy a Rust-based rewrite of the binder[7] driver after a number of memory safety issues over the years. This component is exposed to literally every process on the system, so any vulnerabilities in it are almost always exploitable. To even consider deploying it, the functions Rust exposes to C must be appropriately tagged with compatible KCFI types. To be confident deploying it, we need Rust's internal indirect calls to be protected as well, because C may corrupt them even if Rust doesn't, and because `unsafe` code in the Rust portion of the kernel may have its own bugs.
Currently, regular functions work in a way compatible with C, largely thanks to the `-fsanitize=cfi-icall-experimental-normalize-integers`[8] flag. Trait objects with receivers of `&self` or `&mut self` can have their methods called.
However, the following will currently result in a CFI mismatch (which will lead to a program abort) or a `bug!` triggering:

- `<S as Foo>::foo as fn(&S)` where `Foo` is a trait with `fn foo(&self)` - you can't use methods from traits as function pointers
- `fn foo(self: Arc<Self>)` on a trait object - you can't use anything but `&self` or `&mut self` as a receiver
- `let f: &fn() = &((|| ()) as _);` - you can't convert a closure to a callable function pointer[9]
- `let _: Box<dyn Foo> = Box::new(S);` - you can't drop a trait object; the drop entry has the wrong type
- If you use `self_cell`, you'll get an infinite loop

This means that in practice, you can't enable CFI in a real Rust codebase and have it run. You'll also notice one thing in common here - other than the first and last entries, these are a result of calling vtable entries that have a type that does not match the `Virtual` call that is made to them, usually because the vtable entry has its fully concretized type, while the `Virtual` type is abstract. The first one is a result of trying to fix this abstraction problem in a specific case by adjusting the type encoding.
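For reference, the cases above can be collected into one small program. This compiles and runs fine today, but each commented construct becomes a mismatched indirect call when built with `-Zsanitizer=cfi` (`S` and `Foo` mirror the examples above):

```rust
use std::sync::Arc;

trait Foo {
    fn foo(&self);
    fn by_arc(self: Arc<Self>);
}

struct S;
impl Foo for S {
    fn foo(&self) {}
    fn by_arc(self: Arc<Self>) {}
}

fn main() {
    // A method from a trait used as a function pointer.
    let f = <S as Foo>::foo as fn(&S);
    f(&S);

    // A non-reference receiver called through a trait object.
    let a: Arc<dyn Foo> = Arc::new(S);
    a.by_arc();

    // A closure converted to a callable function pointer.
    let g: fn() = || ();
    g();

    // Dropping a trait object invokes the vtable's drop entry.
    let b: Box<dyn Foo> = Box::new(S);
    drop(b);
}
```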
If the vtable entries have alias sets that are incompatible with virtual call alias sets, what should we do about this?
Since object-safe vtable functions are not directly exposed to C (they aren't `extern "C"` ABI, so there's no way to usefully do that), we are largely free to select whatever alias set we want. We explore several options:
All call-sites and definition sites with Rust ABI have the same alias set. This is almost equivalent to disabling CFI on intra-Rust calls. This will work, and be simple, but leaves us open to mixed language issues and makes incorrect `unsafe` much more likely to be exploitable.
Use the rules for ABI compatibility to normalize types down before encoding them when the ABI is "Rust". This should cover all object-safe methods, which should cover the vtable. We would still need to use traditional, type-based encoding for non-"Rust" ABIs for compatibility with external code we link against.
Normalization would proceed on each argument and the return type before encoding as we already do, likely with a different prefix in order to prevent Rust ABI annotations from joining the alias set of `extern "C"` functions.
The per-type normalization would work roughly like this, applied as a visitor:

- For `*const T`, `*mut T`, `&mut T`, `Box<T, Global>`, or `NonNull<T>`, replace it with a `*mut <T as Pointee>::Metadata`, normalizing the type before proceeding.
- Normalize `usize` or `isize` to the platform's pointer-width integer.
- Normalize `char` to `u32`.
- Normalize `extern "foo" fn(..) -> T` to `extern "foo" fn()`; do not recurse into arguments.
- Normalize a `repr(transparent)` struct to its unique field, with a stack to catch cycles. On cycle, don't apply this rule but apply any other rules that apply.
- Normalize `NonZero<T>` to `T`.
- Normalize `Option<T>` to `T` if `T` is subject to null pointer optimization.

Finally, any argument whose type is `PassMode::Ignore` in the ABI (the usual example being ZSTs in the "Rust" ABI) is removed.
The major remaining caveat is that `Virtual` uses `force_thin` to become ABI compatible. This means that when computing the "ABI signature" for a `Virtual` call, the first argument should always become `*mut ()`, e.g. "a pointer with no metadata".
The primary advantage of this approach is that `miri` is already busy enforcing these rules on indirect call boundaries, so it should be valid already with no real modifications to codegen, just an alternate computation of the alias set.
This approach sounds enticing, but it leaves alias sets extremely weak - not much better than the Singleton strategy. I haven't done measurements, but since all `fn foo(&self)` would be mutually compatible, regardless of which trait is involved, CFI would not be enforcing much beyond an IBT-style scheme here - little more than arity restrictions.
Restrict further than the ABI compatibility rules require, but while maintaining the property that the normalization procedure can assign a principal alias set per `def_id`/`args` combination. This restriction means we will never need to introduce an additional shim, because the alias set can be determined without casing on the `InstanceDef`. We maintain this restriction on our choices because if we allow differing alias sets per shim[10], we take on the implementation complexity and runtime overhead we'd incur with the type signature approach, and may as well use type signatures as our basis since they would be more precise.
The only piece that we know actually mismatches in type-based alias sets today is the receiver type. So, for the first argument of any function, use the ABI normalization rules we just proposed to transform it; all other arguments and the return type must match exactly.
This still leaves `fn(&self)` of all kinds compatible with each other, but at least `fn(&self, &Foo)` and `fn(&self, &Bar)` are no longer compatible, which improves things somewhat.
We could do better than this while still maintaining a principal alias set, but we have a lot of limitations without whole-program analysis. We would effectively need to class potential receivers into components based on a transitive closure over an undirected variant of the "implements" and "supertype" relationships. With whole-program analysis, we have some hope of making a few smaller components to class things into, rather than all legal object receivers just becoming `*mut ()`. Without whole-program analysis (avoiding which is one of the goals of KCFI), all public types and traits must be assumed to be connected (as a foreign crate could easily create a link), and so most receiver types contract to the same representation.
We could use LLVM's alias analysis passes on an un-instrumented variant of the program to dynamically generate alias sets. This would have the advantage of being able to use dataflow and LLVM type compatibility, but the disadvantage of requiring LTO without any hope of cross-DSO support. It would also lead to difficult-to-predict changes in alias sets when implementation details of the program change. This might be interesting research, or an experiment, but it doesn't satisfy our practical goals.
This is what the codebase today attempts to do, and it generates the bugs described above because it never alters shimming, so no assignment of type-signature-based principal alias sets can be correct.
In this approach, every `Instance` has a principal alias set, and this alias set is allowed to depend upon shim information as well. If we need to call a given `def_id` + `args` combination at two different types (say, `*thin const dyn MyTrait` and `&MyConcrete`), then there will be a way to shim it via `InstanceDef` that will result in each type being present. When creating an instance that may be the recipient of an indirect call (whether because it is going into a vtable or because a function pointer is being created), it will be resolved in a way that applies an appropriate `InstanceDef` to get the receiver type to match, not just be compatible.
The rest of the design assumes this is what we're doing.
For LLVM CFI, we could perhaps attach every possible type that a method could have. However, if we want KCFI to work, we need at least one compilation mode where every address has at most one type. The rest of the design assumes this restriction. It will still solve the crashing problems of LLVM-CFI, just possibly not in a way that leverages the type-test system for optimal performance.
We cannot just make each `Instance` have the alias set expected by its virtual call for a few reasons:

- Given `trait Parent { fn f(&self); } trait Child: Parent {}`, we expect to have an entry that works for `*const dyn Parent` and one that works for `*const dyn Child`[11]
- `drop_in_place<T>` is inserted into every vtable. Since a given concrete `T` may implement any number of traits, and every one needs a `drop_in_place` entry in its vtable, we can't just adjust the type of `drop_in_place<T>`.
.This leaves two approaches open:
`Instance`s by default match the alias set at which it would be legal to perform an explicit indirect call (e.g. fn-ptr based). Shim adjustments are applied when producing instances for a vtable to abstract them.
The primary advantage of this approach is in implementation simplicity and results looking as one might expect. For example, an `Item` of a function `fn foo(&self)` with `Self = Bar` will have the alias set for `fn(&Bar)`, not `fn(*const dyn Foo)` with an unprinted caveat that this is a thin-call receiver rather than a fat `dyn` pointer. It also results in fewer changes from the status quo when CFI is not enabled, as implicit argument transmutes will not need to be inserted into the generation of the base instance for methods. This keeps the changes generated by CFI mostly separated from normal compilation.
Basically, this assigns the alias set you'd expect if you were not thinking about vtables, and is simpler to implement and test.
This approach is what I have implemented in the current PR.
`Instance`s by default match the alias set at which it would be legal to perform a vtable call (e.g. thin-dyn based). Shim adjustments are applied when producing instances for function pointers. Virtual calls are adjusted from today so that instead of calling with an expected signature of `*const dyn MyTrait`, they will call with `*const dyn MySuperTrait` if `MySuperTrait` provides the method being called.
The primary advantage of this approach is that it will likely result in faster and smaller code, as creating a function pointer from a trait function is uncommon relative to vtable dispatch. This means that the common case will avoid a shim.
This adds complexity, because not all `Instance` types can select a default vtable alias set, but are still used in vtables. `drop_in_place` via `DropGlue` is a key example of this - it cannot determine the trait it is being used in, because it may be used in multiple traits. This means that some instances do not need a shim when going into a vtable, and others do, and we need to figure out which is which and shim them.
If we make the instance type match its alias set, this also means that the logic for determining whether an instance needs to be called with a "thin self" will expand from a simple enum check on the `InstanceDef` type[12] to a computation involving both the shim kind and the `def_id` in order to determine whether that leading `*const dyn Foo` is a fat or thin pointer.
This logic gets even more confusing if we have a trait function which takes a fat `&dyn MyTrait` in the first argument, with no receiver, and uses a `where Self: Sized` clause to keep the trait object-safe. We now need to check not only whether an item implements a trait, but whether it really would go in a vtable, before we can determine whether it is `force_thin` or not.
The other approach dodges this by only shimming things as they're put into the vtable, which means they're pre-selected to have an object-safe receiver that can be rewritten to a thin pointer.
This approach would also make the extra-arguments approach more viable, as `ReifyShim` would likely cover some of what is needed.
These shims should be small, but there may be a lot of them, and they introduce indirection through trampolines in many cases. This will lead to unnecessary code bloat, confuse the optimizer, etc.
Regardless of which alias set we assign to unshimmed `Instance`s, we know that we need to create new `Instance`s either for vtables or for function pointers. These instances need a field to determine which abstracted `Self` is in use, since `drop_in_place` demonstrates that we will have an unbounded number of variants for each existing instance. We still need the original `Self` type as well, however, as we need to be able to generate the original shim or call the original instance. We explore several strategies for making this information available:
Extra field on `Instance`

It seems tempting to add the replacement `Self` type to `Instance`, either by packing it into `Args` as an extra field or creating an `Option<Ty>` field for use during CFI. However, MIR generation is done on `InstanceDef`, not `Instance`. This presents two problems:
- making alternate receivers (e.g. `Arc<Self>`) compatible with `thin_self`.

Extra `InstanceDef` arguments

Only some `InstanceDef` elements can possibly be inside a vtable today - we could just add an extra argument to all of them, and add a helper method for getting the `InstanceDef`'s abstracted `Self` which we could reference in shim generation. I started going down this road initially, but then remembered that `Fn`-family trait objects cover almost everything else; it's only `Virtual`, `Intrinsic`, and `ThreadLocalShim` that are not going to end up in a vtable.
We could still do this if we really want to avoid a wrapping `InstanceDef`, but we'd end up adding it to several different shim types and repeating our logic in several places.
A new `InstanceDef`: `CfiShim`

This creates a new shim kind that is essentially a modifier for all other shim kinds. We can now transform any `InstanceDef` into a variant that abstracts to a particular type.
The primary advantages of this approach are:

- The shim can be introduced at the site of the `&dyn` cast, not in the defining crate. This decision is written in one place, not 7.

The primary disadvantage of this representation is that wrapping some `InstanceDef`s (closure-likes and trait implementation items) requires additional logic.

This is the approach we assume through the rest of the design. Switching to extra `InstanceDef` arguments would be compatible with the rest of the design, but would require adding codepaths to several branches rather than the central one.
Given an instance, how do we want to get the alias set?
We could add a separate method to `Instance` which computes the principal alias set. If the ABI is `"Rust"`, we use the logic described in this section. `"rustcall"` likely needs special casing. All other ABIs directly compute the type and then call the type encoder, as they do today.
This allows us to avoid modifying the type in a way that would change how Rust expects to interact with things. Rather than being a transmute followed by a direct call, or a transmute followed by an existing shim, generated code can just be a direct call or the existing shim.
An `Instance` is available when assigning an alias set to a declaration, but not when making an indirect call - there will be an `Instance` only for direct calls and indirect calls through a vtable (`Virtual`). This means that we still need our call-site alias set to be computable from the function pointer type as well.
Advantages:
Disadvantages:
In retrospect, this might be worth trying as a refactor. The current patchset uses the type directly.
We generate shims such that the type of the shim reflects the generalization we're allowing. For example, with
trait Foo {
fn foo(&self);
}
struct Bar;
impl Foo for Bar {
fn foo(&self) {}
}
the `Bar` vtable for trait `Foo` would have a `foo` entry with explicit type `fn(*const dyn Foo)`, and a hidden[13] `force_thin` annotation based on the instance. The `DROPINPLACE` entry would have type `fn(*mut dyn Foo)` as well.
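Conceptually, the generated shim behaves like this hand-written version (a sketch only: surface Rust cannot spell a thin `dyn` receiver, so `*const ()` stands in for the force_thin'd `*const dyn Foo`, and the shim name is made up):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static HIT: AtomicBool = AtomicBool::new(false);

trait Foo {
    fn foo(&self);
}

struct Bar;
impl Foo for Bar {
    fn foo(&self) {
        HIT.store(true, Ordering::Relaxed);
    }
}

// Hypothetical hand-written equivalent of the CfiShim entry in Bar's
// vtable for Foo: the receiver is abstracted to a thin pointer, and the
// body converts it back to the concrete type before forwarding.
unsafe fn bar_foo_cfi_shim(this: *const ()) {
    let concrete: &Bar = unsafe { &*(this as *const Bar) };
    <Bar as Foo>::foo(concrete)
}

fn main() {
    let b = Bar;
    unsafe { bar_foo_cfi_shim(&b as *const Bar as *const ()) };
    assert!(HIT.load(Ordering::Relaxed));
}
```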
Advantages:
Disadvantages:
This is the approach currently implemented and assumed for the rest of the design.
Our primary goal on each shim is to convert the receiver type of the function to match the abstracted type. We have several options:
The most straightforward approach would be to examine the first argument's type, match on the receiver structures, and replace it. This approach is simple, and works for `&self`, `&mut self`, `Self`, `Box<Self>`, `Rc<Self>`, `Arc<Self>`, and `Pin<P>` with `P` matching the rest of this pattern aside from `Self`. There are two difficulties this produces:
dyn MyTrait
, mergeWhen producing a shimmed function, rewrite the underlying def_id
if it is closure-like or a trait function to the generic version of it, with appropriate arguments. When we're only shimming vtables, it's gauranteed that all functions have a receiver, and after this transformation, the receiver is always in the first position.
Instantiate the function signature at the concrete type, then instantiate the receiver at the abstract type, and replace the first argument with the rewritten receiver.
This largely works, but could theoretically fail if a receiver type depends on an associated type of `MyTrait`.

The main downside here is that it involves two type instantiations, creating a new signature, and all returned types are now `FnPtr`, which is not quite accurate. It also slightly weakens CFI, since the associated types are no longer qualifiers.
Abstract to `Self`, with appropriate associated types

Produce a complete trait object type for the concrete type's implementation of the trait in question at the trait arguments. For example, if we have
trait Foo<U> {
type T1;
}
trait Bar<U>: Foo<U> {
type T2;
}
struct C;
impl Foo<u32> for C {
type T1 = u8;
}
impl Bar<u32> for C {
type T2 = i8;
}
then when we try to produce the type for `C` as it implements `Bar<u32>`, we emit `dyn Bar<u32, T1 = u8, T2 = i8>`. These constraints will be present at the call site, because they are required for a valid trait object.
We then replace the `def_id` as in the previous approach, switching closure-likes to the `call` family of functions with appropriate arguments, and trait implementations to the trait method they're implementing. This means that the first parameter of everything in the vtable at this point will be `Self`, and we can use the trait object type above. Because we added the associated types, the entire type should instantiate cleanly against the generalized type.
This is what is currently implemented in the patchset.
Which crate do `CfiShim`s go in?

In this design, these shims should go in the crate where the vtable is created. The defining crate for the underlying instance cannot know all the types the shim will need to be defined at.
If we shimmed function pointers rather than vtables, shims other than `DropGlue` (since we again don't know all the types it needs to be instantiated at) and similar shims[14] could go in the defining crate.
This section is only needed if we choose to express the alias set directly through the type. If we were to use a separate mechanism, we would not need to adjust the MIR.
When we rewrite the type of a function through a shim, we need to line up the type of the generated MIR with the type of the function. We generate MIR for the wrapped `InstanceDef`, then rewrite the body to have a transmute from the new receiver to the one the original instance would have expected.
To support alternate receivers, we unfortunately have to unwrap to the inner `dyn` pointer. For example, if we have `Arc<dyn Foo>`, we unwrap to `NonNull<ArcInner<dyn Foo>>`, then `*mut ArcInner<dyn Foo>`, to match what the ABI expects and allow `force_thin` to do its job.
This is a brief overview of the patchstack, intended to help reviewers find specific sections they're looking for. This refers to the patchstack as it is, and not to the design or alternate implementations.
This computes the trait object type with associated types that will be later used to compute the abstract type of an instance. It's in a query both because some of the functions it calls are not available at the intended call-site, and because it is called with the same argument several times.
This splits `InstanceDef` visit code out into a function to make the introduction of CFI shims a little smaller.
Same as above - factors out a function to decrease noise in the CFI shim patch
There are several places throughout the code where `VTableShim` is special-cased to handle its cast from `self` to `*mut self`. Most of these are annotated with a comment by eddyb suggesting it be factored out. Since the shim generator needed to do it one more time, I factored it out first.
This creates the `InstanceDef::CfiShim` variant. It wraps another `InstanceDef`, which is expected to be another shim, and carries an abstract type, as computed by the `trait_obj_ty` query. The user is intended to construct a `CfiShim` via the `.cfi_shim` instance method. This method will be a no-op if `cfi_shims()` does not return true on the session, currently controlled by either CFI or KCFI being enabled. If it's wrapping a closure-like or a trait method implementation, these are replaced with a `ReifyShim` pointing to the abstract method, as described in the design. When generating the shim, it will prepend a transmute from the abstracted receiver to the concrete receiver.
This enables usage of these shims with CFI
Attach `.cfi_shim()` to vtable drop and to collector visiting. This makes trait object drops start working.

Attach `.cfi_shim()` to vtable method entries and collector visiting. Alternate receivers begin to work. Standard receivers already work at this point because we changed the encoding of any method on a trait to abstract itself if it used `&self` or `&mut self` as a receiver.
These fix individual remaining bugs, though in ways that may depend on CFI shimming being enabled already.
This removes the encoding modification we previously had that explicitly re-encodes `&self` and `&mut self` on trait methods. It is no longer needed with the rest of this design, and removing it makes function pointers to methods on a trait work again.
Some conversions to function pointers, most notably the conversion from a closure that captures nothing to a function pointer, depend on the understanding that a `PassMode::Ignore` argument does not alter the ABI. This patch loosens the CFI around these arguments by skipping them when generating the alias set. The alternative would be to introduce another kind of shim to explicitly truncate a non-passed argument from the type.
In user Rust, `dyn` is not a type. However, it effectively appears in `drop_in_place` when generating a vtable with no principal trait. For example:
let x: Box<dyn Send> = Box::new(MyType) as _;
will construct a drop call for a vtable with no trait. `Send` and other auto-traits are non-principal, so at vtable allocation time, we have no description for the dropped type other than "it's a pointer to an object that was converted into a `dyn ?` at some point". Because we are describing our alias set in the type system, this means that the receiver type uses `dyn` as a self type, which causes hiccups in several places. This teaches those places to tolerate a `dyn` with no predicates.
This patch, or something like it, can and probably should land even if the rest don't. The type encoder wants to flatten a `#[repr(transparent)]` struct into its single non-ZST field for compatibility. The existing code attempts to avoid recursion by generalizing pointers, but the use of `PhantomData` or any similar structure defeats this. This pattern is used in `self_cell`, so it's not just an adversarial example.
Super vtables are currently skipped in the collector because the child vtable includes instances for all the super vtable entries. In our case though, the super vtable will be shimmed to a different abstract type than the entries in the child vtable. This means it will have different instances, so we need to generate them.
Auto traits are not present when generating vtables, but they may be present on the receiver when calling them. This strips auto traits off the receiver of virtual calls so that they're compatible with the target, since a virtual call cannot require those additional bounds because it was an object-safe method.
This stack gets us to "Most Rust code actually builds and runs under CFI", but there are more improvements we can still make in the CFI area.
A lot of this design is determined by the restrictions of KCFI - namely, that we can't typesign globals and that every function can have only one type signature. Under full LLVM CFI, we can leverage both of those capabilities to remove shims and make things more efficient. If we do this, we'll want to enable KCFI in userspace to ensure it continues to work, as the implementations will diverge.
- Make the `Virtual` call go through `dyn DefiningTrait` rather than `dyn MaybeSuperTrait`.
- Typesign the vtable entries directly, with `force_thin` applied.
- For `drop_in_place<T>`, we still need a shim, but only one per crate. The implementor of some trait for `T` should provide the shim, and it should contain types for all traits that crate implements for it on the same shim.

In short, check the vtable, don't check the virtual call itself.
I haven't verified that the checks on loading vtables work correctly, but I see the implementation. If you checked the load of the vtable, you've already checked the type of the method, so you don't need to do it again. The vtables reside in a read-only region, so if you did a typechecked load, you're already safe.
We can't get rid of shims entirely, because the alias set expression we're using fundamentally doesn't have a principal alias set per unshimmed instance. Each shim we produce is essentially a witness for membership in an additional alias set. We can avoid these by reducing the number of alias sets a method is in, or by making the unshimmed case match what is common in real world code.
If we make Virtual use the defining trait as the receiver rather than the current one, we no longer need a shim for every supertrait of the trait that defines the method. If deep supertrait hierarchies are used, this could significantly reduce the number of possible shims.
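A hypothetical hierarchy shows why (trait names are invented for the example). A method defined on the base trait is reachable through every subtrait's trait object, so encoding the call against the trait that defines the method collapses what would otherwise be one alias set (and shim) per supertrait:

```rust
// base_method is defined on Level0 but callable through dyn Level1 and
// dyn Level2 as well. If the virtual call is encoded against the current
// receiver type, the callee needs a signature (and possibly a shim) for
// each of dyn Level0/Level1/Level2; encoded against the defining trait,
// one signature for dyn Level0 suffices.
trait Level0 {
    fn base_method(&self) -> u32;
}
trait Level1: Level0 {}
trait Level2: Level1 {}

struct S;
impl Level0 for S {
    fn base_method(&self) -> u32 { 0 }
}
impl Level1 for S {}
impl Level2 for S {}

fn main() {
    let v: &dyn Level2 = &S;
    // Under the proposed change, this call would be encoded as if it went
    // through dyn Level0, so no per-supertrait shim is required.
    assert_eq!(v.base_method(), 0);
    println!("ok");
}
```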
We can switch to vtable-compatible defaults. Without the Virtual call change, this wouldn't do much - supertraits would bring most of the shims back while adding complexity. With it, however, shim generation becomes much rarer, as calling a method through a vtable is far more common than converting a trait method to a function pointer. drop_in_place would be by far the most common shim remaining. This is potentially significantly more complicated, but it would improve performance.
FineIBT is only available on some CPUs, but it provides more flexibility of implementation, and the experimental Linux implementation claims several benefits.
Today, it uses a hashed C signature as an alias set, the same way KCFI does. The interesting part, however, is that the alias set policy is enforced at the callee. This means that if we convert the Virtual call to call at the trait-definition type, we could use a custom enforcement sequence that accepts either of two caller values rather than just one, e.g.
```asm
fn_ptr_entry_f:
    endbr
    CHECK_FNPTR_BUNDLE
    jmp direct_entry_f
virtual_entry_f:
    endbr
    CHECK_VIRTUAL_BUNDLE
direct_entry_f:
    // Actual implementation
```
This is experimental though, and not available on all chips, so it wouldn't actually allow us to ship Rust in the Android kernel. I'm mostly including it in case all this discussion of CFI got you excited, because this is probably the direction the version of this without whole-program-analysis is going.
Indirect jumps are control transfer instructions which use a computed rather than a constant target. ↩︎
Shadow Call Stack uses a separate stack, pointed to by x18 on AArch64, which is only accessed around call/return to store return addresses. This makes it difficult to overwrite the return address, because the real one is only accessible through a register which isn't used elsewhere in the program. ↩︎
Safe Stack partitions the stack into two stacks, one of which holds compiler-controlled values (register spills, return addresses, etc.) and one which has address-taken and programmer controlled values. This makes it difficult to overwrite "safe" values from out-of-bounds writes relative to "unsafe" values because they are no longer adjacent. ↩︎
Stack cookies put a randomized (extent of randomization varies by implementation) value onto the stack immediately after the return address. This value is checked before returning, which makes linear overwrites from the stack onto the return address difficult to perform because the cookie should be unpredictable. ↩︎
These are a pair of hardware-accelerated indirect branch controls present on recent Intel and Arm CPUs respectively. They both work on the same principle - a special instruction that decodes to a nop on earlier versions of the CPU is placed at every location in the program that an indirect branch may legally target. When the protection is enabled, indirect control flow transfers that do not land on one of these landing pads will fault. ↩︎
FineIBT is a software extension of IBT/BTI in which callers move the identifier for an alias set into a caller-saved register, and each landing pad is followed by an efficient check of the provided identifier. The paper does not select a scheme for defining alias classes, but the experimental Clang implementation uses hashes of type signatures, similar to KCFI. See FineIBT Support for further discussion. ↩︎
Binder is the name of Android's IPC system. It's in the kernel to allow it to make scheduling decisions, reduce copy count, and manage the lifetime of objects passed between more than just two processes. ↩︎
Kudos to @rcvalle for designing and pushing that flag through LLVM. ↩︎
I believe @rcvalle has an alternate fix in progress for this focused on adjusting closure encoding. ↩︎
We technically ignore VTableShim here, because it is only used for unsized_fn_params (which is in no danger of stabilization) and for call_once, which is behind the fn_traits feature. This means that, as long as these rarely-used unstable features aren't in play, we can set all trait methods that take self to encode as though they took *mut Self without breaking anything. ↩︎
We might be able to avoid this in the future by changing virtual calls when in CFI mode to perform an implicit trait upcast. ↩︎
Today, whether it's Virtual; with my patches, whether it's Virtual or CfiShim. ↩︎
In an ideal world, thin dyn pointers would be explicit in the type rather than based on logic around what instance is being called. ↩︎
Since I haven't implemented this, I don't know that DropGlue
is the only one that still requires a shim to go into the vtable, but it is an example. ↩︎