Miri C FFI Extension

This doc describes a proposed design and current work on extending Miri to support the Rust C FFI. The plan involves some changes to underlying data structures that are part of rustc, and the doc also explains what these changes are and the reasoning behind them.

Miri Design

At its core, Miri is an abstract machine. It represents the state of the program it is executing, including an internal model of the memory used by the process. It consists of an Evaluator struct that has fields for all the components of the runtime state.

The Abstract Machine

The Rust compiler provides a Machine trait designed to help instantiate an interpreter for MIR. It provides hooks for different operations involved in program execution. For example, the trait provides a hook function memory_read which is used to add custom functionality to memory reads. The code for this hook stub in the rustc Machine is as follows:

/// Hook for performing extra checks on a memory read access.
///
/// Takes read-only access to the allocation so we can keep all the memory read
/// operations take `&self`. Use a `RefCell` in `AllocExtra` if you
/// need to mutate.
#[inline(always)]
fn memory_read(
    _tcx: TyCtxt<'tcx>,
    _machine: &Self,
    _alloc_extra: &Self::AllocExtra,
    _tag: (AllocId, Self::TagExtra),
    _range: AllocRange,
) -> InterpResult<'tcx> {
    Ok(())
}

In Miri, the Evaluator implements the rustc Machine trait and overrides several of its hooks. One of them is the memory_read hook shown above: Miri uses it to run its data race detection and Stacked Borrows checks whenever memory is read.
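To make the hook mechanism concrete, here is a minimal sketch of the pattern: a trait whose hook has a no-op default, and an implementor that overrides it. The Machine trait, CheckingMachine, and the "forbidden range" check are hypothetical simplifications, not rustc's or Miri's actual definitions. The Cell plays the role that the RefCell comment in the rustc hook describes, since the hook only receives &self.

```rust
use std::cell::Cell;

/// A toy stand-in for rustc's `Machine` trait (hypothetical, heavily simplified):
/// the default hook accepts every read, and an implementor may override it.
trait Machine {
    /// Called before every memory read of `len` bytes at `offset`.
    fn memory_read(&self, _offset: u64, _len: u64) -> Result<(), String> {
        Ok(()) // default: no extra checks
    }
}

/// A plain machine that keeps the default (no-op) hook.
struct PlainMachine;
impl Machine for PlainMachine {}

/// A toy "Miri-like" machine that overrides the hook with its own bookkeeping,
/// standing in for Miri's data race and Stacked Borrows checks.
struct CheckingMachine {
    reads: Cell<u64>,      // reads observed; a Cell because the hook only gets &self
    forbidden: (u64, u64), // a byte range [start, end) that must not be read
}

impl Machine for CheckingMachine {
    fn memory_read(&self, offset: u64, len: u64) -> Result<(), String> {
        self.reads.set(self.reads.get() + 1);
        let (start, end) = self.forbidden;
        // Reject any read that overlaps the forbidden range.
        if offset < end && start < offset + len {
            return Err(format!("read of {len} bytes at {offset} overlaps forbidden range"));
        }
        Ok(())
    }
}
```

An interpreter would call memory_read before every read: PlainMachine accepts everything, while CheckingMachine records the access and can reject it.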

The Evaluation Context

The Machine runs inside an evaluation context: the InterpCx (Interpreter Context) struct, also provided by the rustc interpreter support. Miri has its own version of the InterpCx, the MiriEvalContext, which is just the base InterpCx instantiated with the Miri Evaluator as its machine (and the appropriate lifetime parameters).

/// A rustc InterpCx for Miri.
pub type MiriEvalContext<'mir, 'tcx> = InterpCx<'mir, 'tcx, Evaluator<'mir, 'tcx>>;

Within Miri itself, different parts of the codebase modularize their customizations of the environment through extension traits on the evaluation context. For example, Miri provides some functionality for detecting data races; as part of this, it extends the MiriEvalContext with data race-specific functions.
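The extension-trait pattern can be sketched with toy types. InterpCx, Evaluator, DataRaceEvalContextExt, and record_atomic_op below are hypothetical stand-ins; the sketch only illustrates how a trait with default methods, implemented once for the MiriEvalContext alias, adds module-specific helpers to the context, with each helper obtaining the context through an eval_context_mut accessor (the same `this` idiom that appears in the Miri snippets in this doc).

```rust
/// Toy stand-ins (hypothetical) for rustc's generic `InterpCx` and Miri's `Evaluator`.
struct InterpCx<M> {
    machine: M,
}

struct Evaluator {
    data_race_enabled: bool,
    events: Vec<String>,
}

/// Mirrors the `MiriEvalContext` alias: the generic context specialized to Miri's machine.
type MiriEvalContext = InterpCx<Evaluator>;

/// An extension trait adding a (made-up) data-race helper to the context.
trait DataRaceEvalContextExt {
    fn eval_context_mut(&mut self) -> &mut MiriEvalContext;

    /// Default method: the logic lives in the trait, not in the impl below.
    fn record_atomic_op(&mut self, what: &str) {
        let this = self.eval_context_mut();
        if this.machine.data_race_enabled {
            this.machine.events.push(what.to_string());
        }
    }
}

/// One empty-bodied impl makes every helper available on the context.
impl DataRaceEvalContextExt for MiriEvalContext {
    fn eval_context_mut(&mut self) -> &mut MiriEvalContext {
        self
    }
}
```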

Current FFI Support

Miri does currently have some limited support for foreign function calls via emulation. This is all contained in the foreign_items module.

This support consists of a hardcoded list of manually emulated functions, built to support commonly used foreign functions such as malloc. As it stands, there is a custom extension to the MiriEvalContext (in shims/mod) that implements a custom hook for function calls. This hook calls a function emulate_foreign_item if the function being called is identified as being a “foreign item” (i.e., if its body cannot be found). The relevant call, along with the corresponding comments, is included below to illustrate.

// Try to see if we can do something about foreign items.
if this.tcx.is_foreign_item(instance.def_id()) {
    // An external function call that does not have a MIR body. We either find MIR elsewhere
    // or emulate its effect.
    // This will be Ok(None) if we're emulating the intrinsic entirely within Miri (no need
    // to run extra MIR), and Ok(Some(body)) if we found MIR to run for the
    // foreign function
    // Any needed call to `goto_block` will be performed by `emulate_foreign_item`.
    return this.emulate_foreign_item(instance.def_id(), abi, args, dest, ret, unwind);
}

Since this list of supported foreign functions is hardcoded, it covers only a subset of commonly used library functions (and is not exhaustive even for those). If Miri encounters a foreign item whose name it does not recognize, it reports an unsupported-operation error and stops execution.

Proposed plan

We are not touching the C code being executed. As much as possible, we have constrained the modifications to Miri itself, with some modifications to the rustc compiler. We add hooks around calls to external C functions that handle any tagging or custom allocation of data returned from C calls or passed as arguments to C calls.

System support

Currently we're only supporting this feature on Linux.

Calling C from Miri

In order to call C code from a Rust program executing in Miri, we are extending Miri with the libffi crate. This provides an interface to the host system’s libffi. It allows us to dispatch calls to linked code.

Linking C code

Miri doesn’t currently have a mechanism to link to external C code. We’ve implemented this by adding a new command line argument -Zmiri-extern-so-file that users can use to specify a path to a shared object file.

Dispatching calls to C code

When an external C call is encountered by Miri, the steps it follows to dispatch the call are:

  1. Load the specified linked C shared object file (if applicable), using the libloading crate
  2. Look up the called function by name in the linked library
    • again using libloading
  3. Convert all arguments to the function into values that can be passed into C
  4. Call the C function
  5. Store the return value of the function
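The five steps can be sketched without libloading or libffi by replacing the shared object with an in-process symbol table. This is a hypothetical, self-contained model (load_library, to_c_arg, dispatch, and Value are made up for illustration), restricted to functions of C type int(int, int); the real implementation loads symbols from the .so file and builds the call with libffi.

```rust
use std::collections::HashMap;

/// The only function signature this toy model supports: int f(int, int).
type CFn = extern "C" fn(i32, i32) -> i32;

/// A toy interpreter-side value.
#[derive(Debug, Clone, Copy)]
enum Value {
    I32(i32),
    Bool(bool),
}

/// A C-ABI function we pretend lives in the shared object.
extern "C" fn c_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Step 1: "load" the library (a real implementation would use libloading).
fn load_library() -> HashMap<&'static str, CFn> {
    let mut lib: HashMap<&'static str, CFn> = HashMap::new();
    lib.insert("c_add", c_add as CFn);
    lib
}

/// Step 3: convert an interpreter value into a C-compatible argument.
fn to_c_arg(v: Value) -> Result<i32, String> {
    match v {
        Value::I32(x) => Ok(x),
        other => Err(format!("unsupported argument: {other:?}")),
    }
}

/// Steps 2 to 5 for a two-argument function returning `int`.
fn dispatch(
    lib: &HashMap<&'static str, CFn>,
    name: &str,
    args: [Value; 2],
    dest: &mut [u8; 4],
) -> Result<(), String> {
    // Step 2: look the symbol up by name.
    let f: CFn = *lib.get(name).ok_or_else(|| format!("symbol `{name}` not found"))?;
    // Step 3: convert the arguments.
    let (a, b) = (to_c_arg(args[0])?, to_c_arg(args[1])?);
    // Step 4: call through the function pointer.
    let ret = f(a, b);
    // Step 5: store the return value in (toy) interpreter memory.
    *dest = ret.to_ne_bytes();
    Ok(())
}
```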

Following is a simplified/condensed version of the code we added to call a function that returns an i32 primitive using libffi. Note that we’ve removed the error handling code for simplicity.

unsafe {
    // Get the libloading::Library and look up the function by its link name
    // (error handling elided: `get` returns a Result).
    let lib = this.machine.external_so_lib.as_ref().unwrap();
    let func: libloading::Symbol<unsafe extern "C" fn()> =
        lib.get(link_name.as_str().as_bytes()).unwrap();

    // Get the raw code pointer that libffi expects.
    let ptr = CodePtr(*func.deref() as *mut _);

    // Call the function and get the return value (in this case an i32).
    let x = call::<i32>(ptr, libffi_args.as_slice());

    // Store the value in Miri's internal memory.
    this.write_int(x, dest)?;
}

CodePtr is a code pointer type supplied by libffi, to provide access to the function being called.

A note on types

Part of the simplification of the code above is that it elides the type conversion required to turn values from their Miri representation into their corresponding values that get passed into the C function call. This is required for both the function arguments (to construct the libffi_args vector, we iterate over the arguments to the call in Miri) and for the function return.

To determine a correspondence between the Miri types and the C types, we refer to the available Miri types (TyKinds) and the types that implement the CType trait in libffi. Clearly these do not have a 1:1 correspondence: there are many more complex types with a Miri TyKind representation that are not explicitly supported by CType. For us to support these types, we will need to make use of the “catch-all” CTypes: *const T and *mut T, the pointers.

As a demonstrative example of this conversion code, here is the code for converting a list of arguments that are all i32.

// Get the function arguments, and convert them to `libffi`-compatible form.
let mut libffi_args = Vec::<CArg>::with_capacity(args.len());
for cur_arg in args.iter() {
    libffi_args.push(Self::scalar_to_carg(
        this.read_scalar(cur_arg)?,
        &cur_arg.layout.ty,
        this,
    )?);
}

// Convert them to `libffi::high::Arg` type.
let libffi_args = libffi_args
    .iter()
    .map(|cur_arg| cur_arg.arg_downcast())
    .collect::<Vec<libffi::high::Arg<'_>>>();
// ...

// Inside scalar_to_carg: `k` is the scalar read from the argument,
// and `arg_type` is the argument's type. The arm for i32:
match arg_type.kind() {
    // If the scalar can be converted to a primitive matching the type pattern,
    // create a `CArg` holding that value with the corresponding `CArg` constructor.
    TyKind::Int(IntTy::I32) => {
        return Ok(CArg::Int32(k.to_i32()?));
    }
    // ... arms for the other supported types ...
    _ => {}
}

The code for getting the corresponding return type is similar, just matching over the dest.layout.ty.kind() (the dest is the destination where the return is stored) instead of the arg_type.
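The type-directed conversion can be sketched with toy types. TyKind and CArg below are hypothetical miniatures of the rustc TyKind and of the CArg enum used above, and the scalar is modeled as raw bits in a u128. The point is that the argument's type selects both the width to truncate to and the CArg variant, with raw pointers as the catch-all case.

```rust
/// Toy versions (hypothetical) of a few `TyKind`s and of a `CArg` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TyKind { I8, I16, I32, I64, U32, RawPtr, F64 }

#[derive(Debug, PartialEq)]
enum CArg { Int8(i8), Int16(i16), Int32(i32), Int64(i64), UInt32(u32), RawPtr(*mut u8) }

/// Convert the raw bits of a scalar into a `CArg`, choosing the variant from the type.
fn scalar_to_carg(bits: u128, ty: TyKind) -> Result<CArg, String> {
    Ok(match ty {
        // Truncate to the type's width, then reinterpret the bits as signed where needed.
        TyKind::I8 => CArg::Int8(bits as u8 as i8),
        TyKind::I16 => CArg::Int16(bits as u16 as i16),
        TyKind::I32 => CArg::Int32(bits as u32 as i32),
        TyKind::I64 => CArg::Int64(bits as u64 as i64),
        TyKind::U32 => CArg::UInt32(bits as u32),
        // The pointer "catch-all": pass the address through as a raw pointer.
        TyKind::RawPtr => CArg::RawPtr(bits as usize as *mut u8),
        other => return Err(format!("unsupported FFI argument type: {other:?}")),
    })
}
```

The return direction is the mirror image: the destination's type decides how many bytes of the C return value to keep and how to write them back.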

When and where are we dispatching the C calls?

As discussed above, Miri dispatches calls to its emulated foreign functions from the foreign_items module; the lookup by name happens in the function emulate_foreign_item_by_name. In our implementation, we add the dispatch to linked foreign functions before the match that tries the built-in emulated functions. In foreign_items.rs:

fn emulate_foreign_item_by_name(...) {
    let this = self.eval_context_mut();

    // First deal with any external C functions in linked .so file
    // (if any SO file is specified).
    if this.machine.external_so_lib.as_ref().is_some() {
        // An Ok(false) here means that the function being called was not exported
        // by the specified SO file; we should continue and check if it corresponds to
        // a provided shim.
        if this.call_and_add_external_c_fct_to_context(link_name, dest, args)? {
            return Ok(EmulateByNameResult::NeedsJumping);
        }
    }
    // continue to testing the emulated functions

The effect of this decision is that if Miri encounters a call to a linked foreign function that has the same name as a built-in (emulated) function, the linked implementation is run instead of the emulated version. If there is no matching linked foreign function, execution proceeds as before: Miri checks whether the foreign function is one it emulates, and if not, reports an unsupported error. The reasoning behind this design decision is that if a developer provides a function with the same name as a built-in function, it will take precedence over the built-in function, and we want to model this behavior.
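The resolution order can be modeled as a plain function. This is a hypothetical sketch (resolve, Resolved, and the "abs" shim are made up), showing that a symbol exported by the linked library shadows a shim of the same name, and that only names found in neither place become unsupported errors.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Resolved {
    Linked(i64),
    Shim(i64),
    Unsupported(String),
}

/// Hypothetical resolution order: linked .so symbols first, then built-in shims.
fn resolve(name: &str, linked: Option<&HashMap<&str, fn(i64) -> i64>>, arg: i64) -> Resolved {
    // 1. Only consulted when an .so file was given on the command line.
    if let Some(lib) = linked {
        if let Some(&f) = lib.get(name) {
            return Resolved::Linked(f(arg));
        }
    }
    // 2. Fall back to the hardcoded shims.
    match name {
        "abs" => Resolved::Shim(arg.abs()), // stand-in for a built-in emulated function
        // 3. Neither linked nor emulated.
        _ => Resolved::Unsupported(format!("can't call foreign function `{name}`")),
    }
}
```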

Miri support for the C FFI for functions that take/return primitive values is done, and we're in the process of merging it upstream, with some ongoing discussion and great feedback from the Miri developers (see this PR).

Pointers

As discussed, C FFI support for functions that take/return primitive values is fairly straightforward. The real technical challenge is dealing with memory shared between the languages. This includes C pointers returned from C functions (which then may be used in the Rust program, and/or modified by future C function calls), and Miri pointers passed as arguments to C functions.

There's been some discussion on how to support/represent C pointers in Miri and how to pass Miri pointers to C. This can be found in this Zulip thread, and this GitHub issue. We summarize the main points here.

Passing Miri pointers to external C functions

Here are some points about how the Miri allocator works, brought up in the Zulip discussion, that are relevant to the proposed implementation.

  • Allocations in Miri are not resized or moved after they are created.
  • If a pointer to a Miri allocation is passed as an argument to a foreign function, and this FFI call is valid in the original Rust program, then the data stored in that allocation must already be compatible with the type the foreign function expects.
  • Miri pointers, in addition to the data bytes, also carry metadata such as the provenance.

This means a pointer to the data bytes of Miri memory can be passed directly to a foreign function, but we need to account for the metadata (provenance in particular) and make sure it is updated properly after calls to foreign functions. The effect of foreign function calls on provenance is discussed in the Provenance section below.

Implementation details: modifications to Allocation means modifications to the compiler

The Miri allocator is actually the rustc allocator: Allocations in Miri are created through hooks into the Allocation creation and modification code in rustc.

In its current state, the bytes of an Allocation are not directly accessible to Miri. To be able to pass a pointer to Miri memory to a foreign function call, we need access to the actual machine address of the bytes of an Allocation so that it can be passed to the foreign function.

In Miri, the function alloc_base_addr serves to get an address for a given allocation, specified by an AllocId. Right now, a fake address is used, but if the machine address of the bytes of the specified Allocation were accessible, then we could use it instead (with no change to the behavior of Miri on non-FFI code).

A problem: when the bytes field of an Allocation is created, it is not actually aligned to the requested alignment. This causes alignment mismatches if we pass the machine address of the underlying bytes directly to a foreign function, as the bytes won't necessarily have the alignment that the foreign function expects.

How does this manifest as a bug? The current setup causes issues with double-dereferencing. For example, consider this C function:

int double_deref(const int **p) {
   return **p;
}

And the Rust that calls it:

extern "C" {
   fn double_deref(x: *const *const i32) -> i32;
}

fn main() {
   unsafe {
      let base: i32 = 42;
      let base_p: *const i32 = &base as *const i32; 
      let base_pp: *const *const i32 = &base_p as *const *const i32;  
      assert_eq!(double_deref(base_pp), 42); // seg fault!!
   }
}

Here the C code segfaults: the first dereference, *p, does not yield the actual machine address of base, so the second dereference reads from an invalid address and crashes.

rustc changes required
  • Add getters for the address of the bytes field of an Allocation, so it can be accessed in Miri.
  • Properly align the bytes field of an allocation when it is created or updated.

These changes are currently in a PR.

There is still some work to be done here, as seen in the discussion on the PR. The code for manual alignment of the bytes is unsafe, and the current plan is to remove the unsafe code from rustc by following Ralf's suggestion, which is:

  • Parameterize the Allocation over the type of the bytes, defaulting to Box<[u8]>
  • Define a trait such as AllocationBytes and constrain the type of bytes to this trait; this trait then contains all the operations on the bytes that Allocation needs
  • Use the normal Box<[u8]> implementation in rustc, and implement AllocationBytes with the manual alignment for the Miri Machine. This way all the unsafe code for aligning the bytes is constrained to the Miri codebase.
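Ralf's suggestion can be sketched as follows. The trait name AllocationBytes comes from the text, but its methods, and the toy Allocation with its read/write helpers, are assumptions for illustration. The default type parameter means code that writes plain Allocation keeps using Box<[u8]>, while a machine can pick a different byte type.

```rust
/// Hypothetical trait describing what `Allocation` needs from its byte storage.
trait AllocationBytes {
    fn zeroed(size: usize, align: usize) -> Self;
    fn as_slice(&self) -> &[u8];
    fn as_mut_slice(&mut self) -> &mut [u8];
}

/// Default storage, as in rustc today: a plain boxed slice (alignment is ignored).
impl AllocationBytes for Box<[u8]> {
    fn zeroed(size: usize, _align: usize) -> Self {
        vec![0u8; size].into_boxed_slice()
    }
    fn as_slice(&self) -> &[u8] {
        self
    }
    fn as_mut_slice(&mut self) -> &mut [u8] {
        self
    }
}

/// Toy `Allocation`, generic over its byte storage, defaulting to `Box<[u8]>`.
struct Allocation<Bytes: AllocationBytes = Box<[u8]>> {
    bytes: Bytes,
    align: usize,
}

impl<Bytes: AllocationBytes> Allocation<Bytes> {
    fn new(size: usize, align: usize) -> Self {
        Allocation { bytes: Bytes::zeroed(size, align), align }
    }
    fn write(&mut self, offset: usize, data: &[u8]) {
        self.bytes.as_mut_slice()[offset..offset + data.len()].copy_from_slice(data);
    }
    fn read(&self, offset: usize, len: usize) -> &[u8] {
        &self.bytes.as_slice()[offset..offset + len]
    }
}
```

A Miri-side type with manual alignment would implement the same trait, so all the unsafe alignment code lives in that one impl.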

The current code for manually aligning the bytes also has an issue that needs to be addressed: when the bytes field is deallocated, the size is right, but the alignment in the layout may be smaller than the one the memory was allocated with. For example, we might allocate with alignment 4 and deallocate with alignment 2. This is undefined behaviour, since dealloc requires the same layout that was used to allocate the memory. We're going to work around this by adding a wrapper struct for the Box<[u8]> that stores the alignment it was allocated with, and manually implements deallocation with the right alignment. Specifically, we're going to use this AlignedSlice that @maurer designed.
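A minimal sketch of such a wrapper, assuming only std::alloc; this is our own illustration, not @maurer's AlignedSlice, and the name AlignedBytes is hypothetical. It keeps the exact Layout used for allocation so that Drop passes the same size and alignment to dealloc, and it exposes the machine address that alloc_base_addr could hand to C.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Zero-initialized bytes that remember the exact layout they were allocated with.
struct AlignedBytes {
    ptr: NonNull<u8>,
    len: usize,     // requested size in bytes
    layout: Layout, // layout actually passed to the allocator
}

impl AlignedBytes {
    fn zeroed(len: usize, align: usize) -> Self {
        // Global allocation must not be zero-sized, so round 0 up to 1 byte.
        // `from_size_align` rejects alignments that are not powers of two.
        let layout = Layout::from_size_align(len.max(1), align).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        AlignedBytes { ptr, len, layout }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to at least `len` zero-initialized bytes owned by `self`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }

    /// The machine address of the bytes, which could be passed to C.
    fn addr(&self) -> usize {
        self.ptr.as_ptr() as usize
    }
}

impl Drop for AlignedBytes {
    fn drop(&mut self) {
        // SAFETY: same pointer and same layout (size and alignment) as in `zeroed`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```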

Miri changes required
  • Modify the alloc_base_addr function to use the actual address of the bytes of an Allocation instead of generating a fake address.

These changes are done and in this PR. This PR will be updated with the new changes to Miri when we:

  • Implement the AllocationBytes trait and remove the unsafe code from rustc
  • Move this unsafe code into a custom type in Miri that implements AllocationBytes
  • This custom type for aligned Allocations will use the AlignedSlice for proper deallocation

Miri tests that fail if executed in FFI mode

There are some tests in Miri’s test suite that fail if they’re executed using the real bytes of the Allocation as the address. All of the failures are because of allocation alignment assumptions being violated, and we don’t think they correspond to bugs in our implementation.

Note: this was found during testing; the version of Miri we pushed only uses the real bytes for the address when executing in FFI mode, so none of these tests are affected.

These are listed and explained in this linked doc.

Passing pointers to C memory back to Miri

In order to support foreign functions that return pointers to foreign memory, we need further modifications to the structure of Allocations.

As it stands, the alloc_id_from_addr function in Miri retrieves the Allocation corresponding to a given address. This only works if the address belongs to an existing Allocation. However, if a foreign function returns a pointer to foreign memory, its address will not correspond to any existing Allocation, so we need to create an Allocation for this external memory.

The current structure of Allocations is such that they own their bytes. However, this won't work for bytes in foreign memory, which are not even owned by the Rust program executing. We propose that instead of using Box<[u8]> for the type of the bytes field of an Allocation, we introduce a new enum type:

pub enum AllocBytes {
    /// Owned, boxed slice of [u8].
    Boxed(Box<[u8]>),
    /// Address, size of the type stored, and length of the allocation.
    /// This is used for representing pointers to bytes that belong to a 
    /// foreign process (such as pointers into C memory, passed back to Rust
    /// through an FFI call).
    Addr(AddrAllocBytes),
}

This enum type will implement the AllocationBytes trait discussed above, and we will use AllocBytes as the type that the Miri Machine parameterizes the Allocation bytes with. Here, AllocBytes::Boxed represents the Box<[u8]> with manual alignment. The AllocBytes::Addr variant is used to represent Allocations corresponding to foreign memory. For this we use another new data structure, AddrAllocBytes, which represents a section of memory starting at a particular address and of a specified length.

pub struct AddrAllocBytes {
    /// Address of the beginning of the bytes.
    pub addr: u64,
    /// Size of the type of the data being stored in these bytes.
    pub type_size: usize,
    /// Length of the bytes, in multiples of `type_size`; 
    /// it's in a `RefCell` since it can change dynamically, 
    /// depending on how it's used in the program. UNSAFE
    pub len: std::cell::RefCell<usize>,
}
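To illustrate how the two variants differ, here is a simplified copy of AllocBytes and AddrAllocBytes with two hypothetical helpers, size and get_bytes. The Addr case builds a slice over memory Miri does not own, so it is unsafe and relies on the foreign memory really being that large, which is exactly the unknown-length problem discussed below.

```rust
use std::cell::RefCell;

/// Simplified copies of the structures above.
pub struct AddrAllocBytes {
    pub addr: u64,
    pub type_size: usize,
    pub len: RefCell<usize>, // in multiples of `type_size`
}

pub enum AllocBytes {
    Boxed(Box<[u8]>),
    Addr(AddrAllocBytes),
}

impl AllocBytes {
    /// Hypothetical helper: size of the allocation in bytes.
    fn size(&self) -> usize {
        match self {
            AllocBytes::Boxed(b) => b.len(),
            AllocBytes::Addr(a) => a.type_size * *a.len.borrow(),
        }
    }

    /// Hypothetical helper: view the bytes without copying.
    ///
    /// # Safety
    /// For `Addr`, `addr` must point to at least `size()` readable, initialized
    /// bytes that stay alive and unmodified while the returned slice is used.
    unsafe fn get_bytes(&self) -> &[u8] {
        match self {
            AllocBytes::Boxed(b) => &b[..],
            // Memory Miri does not own: trust the recorded address and length.
            AllocBytes::Addr(a) => unsafe {
                std::slice::from_raw_parts(a.addr as usize as *const u8, self.size())
            },
        }
    }
}
```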

With these changes, we can support foreign pointers in Miri by, in the case of foreign functions that return pointers, creating an allocation of this kind and adding it to memory.

Then, alloc_id_from_addr works as before, since now the foreign address does have a corresponding Allocation.
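The address-to-allocation lookup can be sketched with an ordered map from base addresses to allocations. This is a hypothetical model (AddrMap and its methods are made up, and Miri's real bookkeeping differs): an address belongs to the allocation with the greatest base address not above it, provided it falls before that allocation's end. A C pointer is not found until an allocation has been registered for it.

```rust
use std::collections::BTreeMap;

type AllocId = u64;

/// Hypothetical address map: base address -> (allocation id, size in bytes).
struct AddrMap {
    by_base: BTreeMap<u64, (AllocId, u64)>,
}

impl AddrMap {
    fn new() -> Self {
        AddrMap { by_base: BTreeMap::new() }
    }

    /// Register an allocation, e.g. one created for memory returned by C.
    fn insert(&mut self, base: u64, id: AllocId, size: u64) {
        self.by_base.insert(base, (id, size));
    }

    /// Find the allocation containing `addr`: take the allocation with the greatest
    /// base address <= `addr`, and check that `addr` is before its end.
    fn alloc_id_from_addr(&self, addr: u64) -> Option<AllocId> {
        let (&base, &(id, size)) = self.by_base.range(..=addr).next_back()?;
        (addr < base + size).then_some(id)
    }
}
```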

This part of the implementation is still a work in progress.

Specifically, one thing we will change: in our current implementation, we've added a function allocate_ptr_raw_addr in the rustc Memory to allow Miri to create this new kind of Allocation and store it in memory. However, once we create the AllocationBytes trait and move the AllocBytes enum to Miri, this function will no longer be necessary.

Length of C pointers: current hack solution

The len field of AddrAllocBytes represents the length of the Allocation. It is a RefCell so that it can be modified over the lifetime of the Allocation. Unfortunately, we can't know the exact size of the memory the C pointer refers to without some instrumentation or interception of the C code being executed, which we are not currently doing.

The initial value chosen for len determines the size that Miri considers valid for the pointer. We want to allow C to return (for example) arrays as pointers and for Rust to then access sequential elements in the array. For example, we should be able to run the following program.

// C code
#include <stdlib.h>

int* array_pointer_test() {
  const int COUNT = 3;
  int *arr = malloc(COUNT * sizeof(int));
  for (int i = 0; i < COUNT; ++i)
    arr[i] = i;
  return arr;
}

// Rust code
extern "C" {
    fn array_pointer_test() -> *mut i32;
}

// Return a pointer to an array of i32 from C,
// and read part of the array as a slice.
fn main() {
    unsafe {
        let arr_ptr = array_pointer_test();
        let slice = std::slice::from_raw_parts(arr_ptr as *const i32, 3);
        assert_eq!(slice, [0, 1, 2]);
        assert_eq!(*arr_ptr, 0);
        assert_eq!(*arr_ptr.offset(1), 1);
    }
}

When we create an Allocation for arr_ptr in Miri, it needs a len large enough that creating a slice of length 3 and accessing *arr_ptr.offset(1) are not out of bounds. Our current hack is to give every Allocation corresponding to a C pointer a len of 1000. Since len counts elements of type_size bytes, in this case Miri considers arr_ptr to point to an array of 1000 i32s, i.e., 4000 bytes. The reasoning is that this should be large enough to cover the vast majority of pointers. This doesn't actually allocate any memory, so it does not waste space; it just means that an access that is actually out of bounds, but within this range, will not be caught.
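The arithmetic behind the placeholder length can be checked with a small in-bounds predicate. access_in_bounds and C_PTR_LEN are hypothetical names; the sketch assumes, as the AddrAllocBytes comments say, that len counts elements of type_size bytes, so len 1000 for an i32 pointer covers 4000 bytes.

```rust
/// The placeholder length currently given to every C pointer.
const C_PTR_LEN: u64 = 1000;

/// Hypothetical in-bounds check for an allocation backed by `AddrAllocBytes`:
/// the allocation covers `type_size * len` bytes starting at offset 0.
fn access_in_bounds(type_size: u64, len: u64, offset: u64, access_size: u64) -> bool {
    let alloc_size = type_size * len;
    // `checked_add` guards against overflow for absurdly large offsets.
    match offset.checked_add(access_size) {
        Some(end) => end <= alloc_size,
        None => false,
    }
}
```

For arr_ptr, reading the 3-element slice (bytes 0 to 12) and *arr_ptr.offset(1) (bytes 4 to 8) both pass, but so does reading a fourth element at byte 12, even though the real C array only has three; only accesses reaching past byte 4000 are rejected.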

Provenance

This comment in the related GitHub issue raises some important questions about provenance. In particular:

  • Pointers in Miri have provenance metadata. If a C function returns a Miri pointer, this provenance data will have been stripped. How do we restore the provenance?
  • What about the provenance of pointers to C memory that C returns?

One "C" provenance value for all memory explicitly exposed to C

We already need to create some provenance value for pointers to C memory. One idea would be to have a single provenance value for these, and then give this same C provenance value to any pointers to Miri memory that are exposed to C (i.e., passed as arguments). This would involve recursing through any exposed Miri pointers (basically, building a pointer reachability graph and giving all of these pointers the same provenance). This is the same idea as the retag_fields option in Stacked Borrows, which determines whether retagging (modifying the provenance) should recurse into fields, except that here the recursion would always be enabled. Of course, we can't recurse into C memory: Miri will know that a pointer returned from C points into C memory, but it will not know the underlying structure of that memory, so it can't tell whether that memory contains any pointers. However, anything accessed through a pointer into C memory will have the C provenance.

Note: things might get complicated here if a C object stores a pointer into Miri memory, but we will be able to tell that it is Miri memory by checking that there is a corresponding Miri AllocId for its address.

The C provenance value is a similar idea to the Wildcard provenance that already exists in Miri; we would reuse Wildcard to tag all the memory exposed to or originating from C.

The advantage of this option is that we would not need to track the list of exposed addresses and re-sync the provenance of this entire list after every call to C: there is no need to re-sync, since we already know that everything in the sync list will have the same (C) provenance. This would be much more efficient than the more complex idea proposed below, and simpler to implement.
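A sketch of the reachability walk on a hypothetical toy memory (Alloc, Pointer, Provenance, and expose_to_c are made up; Provenance::C stands in for reusing Wildcard). Starting from the pointers passed to C, it visits every reachable Miri allocation once, gives each pointer stored there the single C provenance, and does not look inside C memory.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type AllocId = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Provenance {
    Tag(u64), // stands in for a Stacked Borrows tag
    C,        // the single "C" provenance
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Pointer {
    alloc: AllocId,
    prov: Provenance,
}

/// A toy allocation: only its stored pointers are modeled (data bytes omitted).
struct Alloc {
    is_c_memory: bool,
    ptrs: Vec<Pointer>,
}

/// Give every pointer stored in Miri memory reachable from `roots` the C provenance.
/// Returns the set of allocations reached (including C ones, which are not entered).
fn expose_to_c(mem: &mut HashMap<AllocId, Alloc>, roots: &[Pointer]) -> HashSet<AllocId> {
    let mut exposed = HashSet::new();
    let mut queue: VecDeque<AllocId> = roots.iter().map(|p| p.alloc).collect();
    while let Some(id) = queue.pop_front() {
        if !exposed.insert(id) {
            continue; // already visited; this also handles pointer cycles
        }
        let Some(alloc) = mem.get_mut(&id) else { continue };
        if alloc.is_c_memory {
            continue; // unknown layout: we cannot look for pointers inside
        }
        for p in alloc.ptrs.iter_mut() {
            p.prov = Provenance::C;
            queue.push_back(p.alloc);
        }
    }
    exposed
}
```

An allocation that merely points into exposed memory, but is not itself reachable from the arguments, keeps its original tags; this is what limits the loss of provenance information to memory C can actually reach.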

This idea would also not result in any loss in provenance data in memory not passed into the FFI. Essentially, it would only affect the provenance data for Miri memory that is pointed to by a Miri pointer that is passed to C.

Design decision: only changing provenance of explicitly exposed memory

We have various options when it comes to what to do with the provenance of memory exposed to C.

  1. Recognize that all memory is exposed to C, and so use only the "C provenance" for all memory in a Rust program if the C FFI is used, regardless of whether or not it is explicitly exposed.
  2. Make an assumption that C will only modify the Rust memory that is exposed to it, and use the "C provenance" for pointers to all of this memory, while leaving the provenance of the rest of the Rust memory untouched (this is our proposed plan).
  3. Implement a strategy for more fine-grained reasoning about the provenance of the pointers exposed to C.

The first option would be the most conservative, and the most sound in the absence of a strategy for observing the specific effect of C code on values in memory. However, it is also of little practical use: all provenance in the entire program would be lost with any use of the C FFI at all. We propose that the second option is a better idea: it lets the checks that rely on provenance continue unchanged for pointers not explicitly exposed to C, and still allows reasoning about and tracking of the memory that is exposed.

The third option is the best, as it is the most precise. We have some ideas about how it might be implemented, but they are future work for now, and we propose option 2 in the meantime.

Strict provenance?

We propose that the FFI support and strict provenance mode not be allowed to be used together in Miri.

Questions

  • What level of provenance tracking is acceptable?
    • Can we use the strategy of having one provenance value for anything exposed to C?
  • What is the plan for Miri's support of non-strict provenance mode in the future?

Ideas for more fine-grained reasoning about provenance

We have a couple of potential ideas for implementing more fine-grained reasoning about the provenance of pointers exposed to C.

Sync list: keeping track of C changes to Miri pointer provenance

We could add a list that tracks all the pointers and their provenance values. This would be in the Miri evaluation context, separate from the memory itself. Then, after every FFI call, we would iterate over the memory and compare the pointer values to their original values (in the newly added list).

With this, we would catch all changes to the Miri memory. We would also be able to say what the changes to provenance should be, as we could identify when Miri pointers have been reassigned and what they have been reassigned to.

This solution still only has one "C provenance" value for all pointers to C memory (and this would be the provenance for Miri pointers reassigned to refer to C memory).
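A sketch of the sync-list idea on a toy memory that maps addresses to stored pointer values. snapshot and diff are hypothetical helpers; a real implementation would also compare provenance and would need to cover all memory reachable from C rather than a fixed list of addresses.

```rust
use std::collections::HashMap;

type Addr = u64;

/// Before the FFI call: record the pointer value stored at each exposed location.
fn snapshot(memory: &HashMap<Addr, u64>, exposed: &[Addr]) -> Vec<(Addr, u64)> {
    exposed
        .iter()
        .filter_map(|&a| memory.get(&a).map(|&v| (a, v)))
        .collect()
}

/// After the FFI call: report (location, old value, new value) for every pointer C changed.
fn diff(memory: &HashMap<Addr, u64>, before: &[(Addr, u64)]) -> Vec<(Addr, u64, u64)> {
    before
        .iter()
        .filter_map(|&(a, old)| {
            let new = *memory.get(&a)?;
            (new != old).then_some((a, old, new))
        })
        .collect()
}
```

Each reported change tells Miri which stored pointer C reassigned and what it now points to, which is the information needed to pick its new provenance. The cost is visible too: every exposed location is copied before, and compared after, every call.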

Pros

  • All the changes would be constrained to Miri itself

Cons

  • Inefficient in both time and space: storing these extra lists and running all these extra checks would add a lot of overhead

ASAN: using a sanitizer to detect memory accesses and modifications

AddressSanitizer (ASAN) is a sanitizer designed to find memory errors, such as use-after-free bugs. Of particular relevance to us, ASAN allows specific memory regions to be "poisoned" (and later unpoisoned), so that any access to poisoned memory is reported as an error.

We could use ASAN to set up "guards" on the Miri memory that C doesn't have explicit access to, by only exposing the data it should be able to see and "poisoning" the rest. This would mean that our assumption about the provenance of the non-directly-C-accessible Miri memory not changing in the presence of C calls would be verifiable and it would let us catch errors if C does access this memory. We might also be able to use it to detect if C doesn't access Miri memory that it technically has access to, in which case that provenance information can also remain unmodified.

ASAN has an interface for C/C++, and can be used with FFI, as long as the C/C++ code it is sanitizing is compiled and linked with ASAN instrumentation. At a high level, it seems like it should work out-of-the-box on the C code being called with the Rust C FFI.

This solution also still has "C provenance" for all memory modified by or originating from C.

Pros

  • More efficient, and only requires the compilation of the linked C/C++ program with ASAN.

Cons

  • Requires ASAN: now the changes are no longer constrained to Miri.
  • Don't immediately see how we could use it to get specific provenance values if C does access the memory, even if it is reassigning Miri pointers to each other (in which case their provenance values should be swapped, e.g.).

Current state of the project

In this last section we summarize the current state of the project, and the work that still needs to get done.

What is working?

At this point, we can call C functions from Miri with the following argument and return types:

  • Primitive (integer) or void/empty arguments and returns
  • Arguments that are pointers to Miri memory
    • the C function can change the value that the pointer points to
    • we do not yet support C functions that write pointers to Miri memory
  • Arguments and returns that are pointers to C memory

Here are some examples of function calls that we support:

// C code
#include <stdio.h>
#include <stdlib.h>

void deref_and_print(int *p) {
  printf("deref in C has value: %d\n", *p);
}
long add_short_to_long(short x, long y) {
  return x + y;
}
int* pointer_test() {
  int *point = malloc(sizeof(int));
  *point = 1;
  return point;
}
int* array_pointer_test() {
  const int COUNT = 3;
  int *arr = malloc(COUNT * sizeof(int));
  for (int i = 0; i < COUNT; ++i)
    arr[i] = i;
  return arr;
}
// double dereference pointers, and swap the values they point to
// note: this only writes non-pointer values to memory
void swap_double_ptrs(short **x, short **y) {
    short temp = **x;
    **x = **y;
    **y = temp;
}
// write a non-pointer value to the memory a pointer refers to
void set(short *x, short val) { *x = val; }

// Rust code
extern "C" {
    fn add_short_to_long(x: i16, y: i64) -> i64;
    fn pointer_test() -> *mut i32;
    fn deref_and_print(x: *mut i32);
    fn array_pointer_test() -> *mut i32;
    fn swap_double_ptrs(x: *mut *mut i16, y: *mut *mut i16);
    fn set(x: *mut i16, v: i16);
}

fn main() {
    unsafe {
        // test function that adds an i16 to an i64
        assert_eq!(add_short_to_long(-1i16, 123456789123i64), 123456789122i64);

        // test return pointer to i32 from C, dereference, modify in Rust,
        // and see changes in C
        let ptr = pointer_test();
        assert_eq!(*ptr, 1);
        *ptr = 5;
        assert_eq!(*ptr, 5);
        deref_and_print(ptr); // prints "deref in C has value: 5"

        // test return pointer to array of i32 from C,
        // and read part of the array as a slice
        let arr_ptr = array_pointer_test();
        let slice = std::slice::from_raw_parts(arr_ptr as *const i32, 3);
        assert_eq!(slice, [0, 1, 2]);
        assert_eq!(*arr_ptr, 0);
        assert_eq!(*arr_ptr.offset(1), 1);

        // write through the raw pointer, then re-create the slice to see the change
        // (mutating memory behind a still-live shared slice would be UB)
        *arr_ptr.offset(1) = 5;
        let slice = std::slice::from_raw_parts(arr_ptr as *const i32, 3);
        assert_eq!(slice, [0, 5, 2]);

        // test passing a Rust pointer to C and reassigning its value
        let mut set_base: i16 = 1;
        let mut set_base_p: *mut i16 = &mut set_base as *mut i16;
        set(set_base_p, 3);
        assert_eq!(set_base, 3);
        assert_eq!(*set_base_p, 3);

        // test passing two double pointers, and swapping the _values_ they point to
        // note: this is _not_ C writing pointers to Miri memory
        let mut new_base: i16 = 2;
        let mut new_base_p: *mut i16 = &mut new_base as *mut i16;
        let new_base_pp: *mut *mut i16 = &mut new_base_p as *mut *mut i16;
        let set_base_pp: *mut *mut i16 = &mut set_base_p as *mut *mut i16;
        assert_eq!(**set_base_pp, 3);
        assert_eq!(**new_base_pp, 2);
        assert_ne!(*new_base_pp, *set_base_pp);
        swap_double_ptrs(set_base_pp, new_base_pp);
        assert_ne!(*new_base_pp, *set_base_pp);
        assert_eq!(**set_base_pp, 2);
        assert_eq!(**new_base_pp, 3);
    }
}

This is a subset of our whole test suite, which can be found:

What is not quite working, and needs to get done?

There are some things that are not quite working yet. In particular, if a C function writes a pointer into Miri memory, Miri later reports undefined behavior (an invalid memory access) when Rust dereferences that pointer.

For example:

// C code
void setptr(short **x, short *val) { *x = val; }

// Rust code
extern "C" {
    fn setptr(p: *mut *mut i16, x: *mut i16);
}
fn main() {
    unsafe {
        let mut set_base: i16 = 3;
        let mut set_base_p: *mut i16 = &mut set_base as *mut i16;
        let set_base_pp: *mut *mut i16 = &mut set_base_p as *mut *mut i16;

        // test passing a double pointer and a single pointer,
        // and writing the single pointer through the double pointer
        // (so that set_base_p now equals new_base_p)
        let mut new_base: i16 = 2;
        let new_base_p: *mut i16 = &mut new_base as *mut i16;
        assert_ne!(new_base_p, set_base_p);
        setptr(set_base_pp, new_base_p);
        assert_eq!(new_base_p, set_base_p);

        // uh oh: the following code breaks
        // let rust_ddref = **set_base_pp;
        // let rust_dref = *set_base_p;
    }
}

This is because the Miri pointers exposed to C need to be updated with Wildcard provenance after calls to C, but this is not done yet. In these cases, the expected provenance is now wrong, since the pointers have been reassigned in C.

There are also some concrete work items that still need to get done:

  • Replace the unsafe code in rustc with the AllocationBytes trait, and then propagate this change into Miri.
  • Change the Box<[u8]> Allocation bytes representation to use the AlignedSlice when it needs to be manually aligned.
  • Keep up with the existing PRs and make the changes/fixes requested.
  • Properly set up propagation of C (Wildcard) provenance to the pointers to Miri memory that C gets access to; this should enable support for C functions that write pointers to Miri memory.

Directions for future work

In addition to the concrete work items listed above, there are various interesting avenues for future work on this project.

  • Incorporate ASAN to enable more fine-grained reasoning about the provenance of pointers once they are passed to C (as described above). This would allow us to:
    • Find bugs where C is accessing memory it should not have access to.
    • Confidently use the provenance information of pointers in Miri that C did not change (since then we will know that C did not modify the memory).
  • Intercept calls to malloc and free to provide Miri with information on the actual size and lifetime of C pointers.
    • We could use something more precise than "every C pointer has len 1000" in our Allocations.
  • Run this on real code bases where C FFI is used, and see what bugs we can find!
  • and other suggestions welcome :)