2022-10-07, PnkFelix shared with RCValle, Notes on RFC PR #3296

RFC PR #3296: Improve C types for cross-language LLVM CFI support

References

Broader "LLVM Control Flow Integrity" Tracking issue https://github.com/rust-lang/rust/issues/89653

Papaevripides and Athanasopoulos (2020): "Exploiting Mixed Binaries" (PDF)

Mergendahl et al (2022): "Cross-Language Attacks" (PDF)

RFC #3296 proposes to "identify" uses of C char and integer types at "time types are encoded". (PnkFelix didn't understand what the quoted terms were meant to mean on their first read, but they believe they understand it now, and can work with the RFC author to revise their text to make this more immediately clear.)

Background

Indirect branches in Rust-compiled code are not validated, while such branches would be validated under analogous C/C++ code compiled with forward-edge control-flow protection. Thus, Rust is allowing potentially desirable control-flow protection to be bypassed.

PnkFelix thinks that part of the goal (maybe all of it?) is to ensure that enough metadata is stored to enable a runtime test of whether a given (function) pointer was assigned a given type by the compiler.

So: you need to encode the type itself, and you need to associate that encoded-type with the pointer.
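
To make that concrete, here is a minimal Rust sketch of what a forward-edge CFI check conceptually does at an indirect call site (PnkFelix's own illustration, not text from the RFC; the `cfi_type_id_of*` helpers named in the comment are hypothetical stand-ins for the `llvm.type.test` machinery LLVM actually emits):

```rust
// Conceptual sketch only: forward-edge CFI attaches an encoded type id
// to every address-taken function, and validates that id at each
// indirect call site before transferring control.
type Callee = extern "C" fn(i32) -> i32;

extern "C" fn double_it(x: i32) -> i32 {
    x * 2
}

fn call_indirect(f: Callee, x: i32) -> i32 {
    // The compiler-emitted check conceptually does something like:
    //   assert!(cfi_type_id_of_value(f) == cfi_type_id_of::<Callee>());
    // (hypothetical helpers; the real check is an LLVM intrinsic)
    f(x)
}

fn main() {
    assert_eq!(call_indirect(double_it, 21), 42);
}
```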

Dealing with this problem at the cross-language level requires cooperation between distinct tools (since you inherently will be combining object code from a Rust compiler and a C/C++ compiler such as Clang). Achieving such cooperation requires either:

  1. One tool (e.g. Rust) reusing an established compatible encoding (e.g. the Itanium C++ ABI mangling used by Clang), or
  2. both tools adopting an entirely new encoding.
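
As a rough illustration of option 1 (hedged: the mangling codes below are PnkFelix's recollection of the Itanium C++ ABI, not something the RFC spells out), Clang-style CFI derives its type ids from a mangling of the C/C++ function type, so distinct C types keep distinct ids even when they share a machine representation; a Rust compiler reusing that encoding would have to emit the same strings for the corresponding `extern "C"` signatures.

```rust
// Illustrative only: Itanium-mangled encodings for a few C function
// types (builtin codes: void = v, char = c, signed char = a,
// unsigned char = h, int = i, long = l, long long = x; a function
// type is wrapped as F...E).
const ID_VOID_OF_CHAR: &str = "FvcE"; // void (*)(char)
const ID_VOID_OF_SCHAR: &str = "FvaE"; // void (*)(signed char)
const ID_VOID_OF_LONG: &str = "FvlE"; // void (*)(long)

fn main() {
    // char and signed char get distinct encodings even on targets where
    // both are 8-bit signed integers; that is exactly the distinction a
    // fully resolved Rust `i8` can no longer express.
    assert_ne!(ID_VOID_OF_CHAR, ID_VOID_OF_SCHAR);
    println!("{ID_VOID_OF_LONG}");
}
```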

Meta-point: "Motivation" section seems very long. (Does it need to go into this level of depth to motivate this change?)

PnkFelix doesn't understand the point the doc is making when it distinguishes between "provide comprehensive protection for C- and C++-compiled code" vs "provide comprehensive protection across the FFI boundary". Is the idea that one might try to hack something up that doesn't commit to an encoding supported on the C/C++ side, and try to ensure all the necessary checks here are entirely handled at the FFI boundary? (To PnkFelix, that sounds inherently broken, in terms of the amount of dynamic validation that would be required during any FFI call.)

Meta-point: It seems like this RFC is perhaps trying to elaborate the whole design space in the motivation section. This is not the way good RFCs are written. Instead: Move the elaboration of the space into the "Alternatives" section towards the end, and have the motivation section focus on the single point (or at least design subspace) that is being recommended here.

The Sub-Problem Of Interest

PnkFelix is guessing/hoping that the section Rust vs C char and integer types is going to start into the real meat of the specific proposal at hand here.

Aha!:

For convenience, some C-like type aliases are provided by libcore and libstd (and also by the libc crate) for use when interoperating with foreign code written in C. For instance, one of these type aliases is c_char, which is a type alias to Rust’s i8.

To be able to encode these correctly, the Rust compiler must be able to identify C char and integer type uses at the time types are encoded, and the C type aliases may be used for disambiguation. However, at the time types are encoded, all type aliases are already resolved to their respective ty::Ty type representations[11] (i.e., their respective Rust aliased types), making it currently not possible to identify C char and integer type uses from their resolved types.

So it seems like the heart of the issue is that we have implemented our C FFI support via definitions of type aliases, and the compiler throws away the fact that the original "intended" types correspond to certain (abstract) C types, rather than to implementation-defined (concrete) integer types of specific sizes.
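
A minimal sketch of the aliasing problem as PnkFelix understands it (their own illustration; the concrete resolutions shown are for a typical x86_64 Linux target):

```rust
#![allow(dead_code)]
// On e.g. x86_64-unknown-linux-gnu, several distinct C types resolve
// to the same Rust type once the libcore/libc aliases are expanded.
use core::ffi::{c_char, c_long, c_longlong};

extern "C" {
    // C: void takes_char(char);         after alias resolution: fn(i8)
    fn takes_char(x: c_char);
    // C: void takes_schar(signed char); also fn(i8)
    fn takes_schar(x: i8);
    // C: void takes_long(long);         after alias resolution: fn(i64)
    fn takes_long(x: c_long);
    // C: void takes_llong(long long);   also fn(i64)
    fn takes_llong(x: c_longlong);
}

// By the time types are encoded, `takes_char`/`takes_schar` and
// `takes_long`/`takes_llong` have identical rustc-level signatures, so
// an encoding computed from the resolved types cannot reproduce the
// distinct ids that Clang derives from char vs signed char, or from
// long vs long long.
fn main() {}
```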

the Rust compiler must be changed to

  • be able to identify C char and integer type uses at the time types are encoded.

  • not assume that C char and integer types and their respective Rust aliased types can be used interchangeably across the FFI boundary when forward-edge control flow protection is enabled.

Okay, the above seems like the heart of what is actually being proposed here. And that is indeed a severe language change. PnkFelix needs to go back and review why this makes sense (versus stating that the implementation-selected concrete type is entirely suitable when it matches up with the given semi-abstract C type).

It sounds like the discussion of a new encoding was perhaps being floated as another way to deal with the above problem, but PnkFelix thinks the problem is that the encoding they chose ends up providing less protection (and perhaps was inherently forced to do so? It's not clear to PnkFelix why, and they don't know if they care to dig into that rabbit hole).
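
As a concrete reading of "less protection" (PnkFelix's own sketch of the generalization idea that also comes up in the meeting notes below, not the RFC's exact scheme): the coarser the encoding, the more distinct signatures share one id, and so the more targets become valid at any given indirect call site.

```rust
// Sketch: if pointer parameter types are generalized to void* when
// computing CFI type ids, these two signatures fall into the same
// group, and a call site expecting one would also accept the other.
use core::ffi::c_void;

extern "C" fn frees_buffer(_p: *mut u8) {}
extern "C" fn closes_handle(_p: *mut c_void) {}

fn main() {
    // Precise ids: fn(*mut u8) and fn(*mut c_void) are distinct groups.
    // Generalized ids: one group. The analogous integer normalization
    // (lowering all integers to their bit width) would likewise merge
    // e.g. c_long and c_longlong on LP64 targets.
    frees_buffer(core::ptr::null_mut());
    closes_handle(core::ptr::null_mut());
}
```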

Notes from 2022-10-07 meeting with rcvalle

  • Q: can you associate more than one fn sig with a given fn-ptr?

    • If you associate a fn-ptr with more than one sig, lookup cost is O(n)
    • e.g. a normal and a generalized signature.
    • LLVM CFI was created and born in clang, i.e. clang-cfi
    • but it has been sufficiently extracted from clang, and is now general LLVM infrastructure
    • anyway: there is a pass that replaces the type used for bitmap membership, replacing *type with *void
    • generalizing a type or function pointer at the level where that pass runs is constrained, because at that time you're working with LLVM types.
    • The type ID is assigned early on
      • the pass that replaces the type-test intrinsic runs later, at link time.
      • the ID is entirely abstract at that time.
  • why would there be a runtime need to look in multiple buckets?

    • if we assigned multiple IDs to a single function
  • "every function pointer should have just one ID"

    • lower type test pass
    • distinguishes pointers by unique IDs
  • why not allow multiple "IDs" for a single fn defn

    • complexity of LowerTypeTest pass
  • linker

    • today, if you link C/C++ code compiled-with-CFI and Rust code compiled-with-CFI, things work as long as you don't pass things across the boundary and end up with cross-language indirect calls
  • Q: a generalize-pointers flag exists. Can that approach be extended to allow other normalizations (e.g. lowering all integers to their bit width)?

    • RCValle spoke with LLVM developers about potential for doing this. they were not terribly receptive.
      • consider integer normalization case
    • PnkFelix wonders if we can potentially support such normalization via a local patch to Rust
      • not every company uses the locally patched LLVM.
  • PnkFelix muses that we could allow a local attribute at the Rust definition of an extern "C" fn, to indicate how the given signature should be remapped for purposes of CFI (see the sketch at the end of these notes).

    • RCValle points out that similar feature could be leveraged to provide even more fine-grained differentiation.
  • potentially related to the above: grsecurity PaX RAP analysis

    • Commercial CFI implementation
    • infers, via code analysis, whether two functions that have the same signature can actually be put into two distinct groups. (Currently implemented via synthesizing a new type that is added onto the signature.)
  • CFI checks (all for indirect calls)

    • Rust code calling Rust code; this we can always make work
    • Rust code calling C code (i.e. C definition passed into Rust); this is easy to make work, because Rust has the information it needs to do the right thing
      • (PnkFelix muses after meeting: does it work out-of-box today? How can it, if aliasing means we don't actually know the actual C fn declaration?)
    • C code calling Rust code: this is the problematic case
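
To make the problematic C-calls-Rust case concrete (PnkFelix's own sketch; the `#[cfi_remap(...)]` attribute shown in the comment is purely hypothetical, just to illustrate the remapping idea floated above, and is not an existing rustc feature):

```rust
// C side, compiled by Clang with CFI for indirect calls:
//
//   typedef void (*cb_t)(char);
//   void register_cb(cb_t cb);   /* later calls cb indirectly; Clang
//      checks that call against the id it derived from void(char) */
//
// Rust side: we hand that C code a pointer to the function below.
// rustc only sees the resolved signature fn(i8), so any id it attaches
// is derived from i8 rather than from C's `char`, and the Clang-side
// check at the indirect call can reject it (or, with a looser shared
// encoding, accept too many other functions as well).
use core::ffi::c_char;

// Hypothetical, illustrative attribute (not real rustc syntax): tell
// the compiler which C signature this definition should be encoded as
// for CFI purposes.
// #[cfi_remap("void (char)")]
pub extern "C" fn my_callback(_x: c_char) {
    // ... callback body ...
}

fn main() {
    // Direct call just to exercise the sketch; the interesting
    // (indirect, cross-language) call happens on the C side.
    my_callback(0);
}
```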