# C-variadic stabilization report

## Summary

In C, functions can use a variable argument list `...` to accept an arbitrary number of untyped arguments. Rust is already able to call such functions (e.g. `libc::printf`); the `c_variadic` feature adds the ability to define them.

A rust c-variadic function looks like this:

```rust
/// SAFETY: must be called with (at least) 2 i32 arguments.
unsafe extern "C" fn sum(mut args: ...) -> i32 {
    let a = args.arg::<i32>();
    let b = args.arg::<i32>();

    a + b
}

fn foo() -> i32 {
    unsafe { sum(0i32, 2i32) }
}
```

This function accepts a variable argument list `args: ...`, from which it is able to read arguments using the `arg` method.

The main goal of defining c-variadic functions in rust is interaction with C code. Therefore it is a design goal that the rust types map directly to their C counterparts. Additionally, we disallow interaction between c-variadic functions and certain rust features that don't make much sense in an FFI context.

## How variadics work in C

The authoritative source for how variadics (also known as "variable arguments") work in C is the C specification. In this document we'll use [section 7.16 of the final draft of the C23 standard](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf#page=302).

> A function may be called with a variable number of arguments of varying types if its parameter type list ends with an ellipsis

Earlier versions of C furthermore required that the `...` argument is not the first argument of the parameter type list (so at least one other argument was required). Starting in C23 this requirement has been lifted.

## C API surface

- `va_list`: an opaque type that stores the information needed to read variadic arguments, or copy the list of variadic arguments. Typically values of this type get the name `ap`.
- `va_start`: a `va_list` must be initialized with the `va_start` macro before it can be used.
- `va_copy`: the `va_copy` macro copies a `va_list`.
  The copy starts at the position in the argument list of the original (so **not** at the first variadic argument to the function), and both can be moved forward independently. This means the same argument can be read multiple times.
- `va_arg`: reads the next argument from the `va_list`.
- `va_end`: deinitializes a `va_list`.

## Important notes

### Not calling `va_end` is UB

Section 7.16.1:

> Each invocation of the `va_start` and `va_copy` macros shall be matched by a corresponding invocation of the `va_end` macro in the same function.

Section 7.16.1.3:

> The `va_end` macro facilitates a normal return from the function whose variable argument list was referred to by the expansion of the `va_start` macro, or the function containing the expansion of the `va_copy` macro, that initialized the `va_list` ap. The `va_end` macro may modify ap so that it is no longer usable (without being reinitialized by the `va_start` or `va_copy` macro). If there is no corresponding invocation of the `va_start` or `va_copy` macro, or if the `va_end` macro is not invoked before the return, the behavior is undefined.

We believe the requirement is this strict because some early C implementations chose to implement `va_list` like so [(source)](https://softwarepreservation.computerhistory.org/c_plus_plus/cfront/release_3.0.3/source/incl-master/proto-headers/stdarg.sol):

```c
#define va_start(ap, parmN) {\
    va_buf _va;\
    _vastart(ap = (va_list)_va, (char *)&parmN + sizeof parmN)
#define va_end(ap) }
#define va_arg(ap, mode) *((mode *)_vaarg(ap, sizeof (mode)))
```

Here `va_start` opens a brace-delimited block that only `va_end` closes, so the two must be paired within the same function. To our knowledge no remotely-modern implementation actually implements `va_end` as anything but a no-op.

### A `va_list` may be moved

Section 7.16:

> The object `ap` may be passed as an argument to another function; if that function invokes the `va_arg` macro with parameter `ap`, the representation of `ap` in the calling function is indeterminate and shall be passed to the `va_end` macro prior to any further reference to `ap`.

and

> A pointer to a `va_list` can be created and passed to another function, in which case the original function can make further use of the original list after the other function returns

So a `va_list` can be moved into another function, but `va_end` must still run in the frame that initialized the `va_list` (with `va_start` or `va_copy`).

### Representation of `va_list`

The representation of `va_list` is platform-specific. There are three flavors that are relevant for rust:

- `va_list` is an opaque pointer
- `va_list` is a struct
- `va_list` is a single-element array, containing a struct

The opaque pointer approach is the simplest to implement: the pointer just points to an array of arguments on the caller's stack.

The struct and single-element array variants are more complex, but potentially more efficient because the additional state makes it possible to pass variadic arguments via registers.

#### Array-to-pointer decay

If the `va_list` is of the single-element array flavor, it is subject to array-to-pointer decay: in C, arrays are passed not by-value, but as pointers. Hence, from an FFI perspective, these two functions are equivalent on such targets.

```c
#include <stdarg.h>

extern int foo(va_list va) {
    return va_arg(va, int);
}

extern int bar(va_list *va) {
    return va_arg(*va, int);
}
```

Indeed, they generate the same assembly, see https://godbolt.org/z/n8c4aq5hM.

### Other calling conventions

Both `clang` and `gcc` refuse to compile a function that uses variadic arguments and a non-default calling convention. See also https://github.com/rust-lang/rust/issues/141618, in particular https://github.com/rust-lang/rust/issues/141618#issuecomment-2911802411.

Hence a c-variadic function implicitly uses the default C calling convention of the current platform.
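
To make the three `va_list` flavors from the representation notes above more tangible, here is a rough rust model of their layouts. This is only a sketch: the names `PointerVaList`, `StructVaList`, `SysVTag`, `ArrayVaList`, `DecayedParam` and `flavor_sizes` are hypothetical and not part of `core::ffi`, and the field layouts follow the AAPCS64 and x86-64 System V definitions as we understand them.

```rust
use core::ffi::{c_int, c_uint, c_void};
use core::mem::size_of;

/// Flavor 1 (hypothetical name): an opaque pointer into the argument area.
pub type PointerVaList = *mut c_void;

/// Flavor 2 (hypothetical name): a plain struct, modeled on the AAPCS64 layout.
#[repr(C)]
pub struct StructVaList {
    pub stack: *mut c_void,
    pub gr_top: *mut c_void,
    pub vr_top: *mut c_void,
    pub gr_offs: c_int,
    pub vr_offs: c_int,
}

/// Element type of flavor 3, modeled on the x86-64 System V layout.
#[repr(C)]
pub struct SysVTag {
    pub gp_offset: c_uint,
    pub fp_offset: c_uint,
    pub overflow_arg_area: *mut c_void,
    pub reg_save_area: *mut c_void,
}

/// Flavor 3 (hypothetical name): a single-element array containing a struct.
pub type ArrayVaList = [SysVTag; 1];

/// What a C parameter of type `va_list` becomes after array-to-pointer decay
/// on an array-flavor target: a pointer to the element, not the array itself.
pub type DecayedParam = *mut SysVTag;

/// Sizes of (pointer flavor, struct flavor, array flavor, decayed parameter).
pub fn flavor_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<PointerVaList>(),
        size_of::<StructVaList>(),
        size_of::<ArrayVaList>(),
        size_of::<DecayedParam>(),
    )
}
```

On a 64-bit target this model gives a pointer-sized value for the pointer flavor and for the decayed parameter, versus 24 bytes for the System V array and 32 bytes for the AAPCS64 struct. That size difference is the reason a C function taking the array flavor "by value" really receives a pointer.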

### `va_arg` and argument promotion

With some exceptions, the type of a `va_arg` call must match the type of the supplied argument:

Section 7.16.1:

> If type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined

These default argument promotions are specified in section 6.5.3.3:

> The arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter, if present. The integer promotions are performed on each trailing argument, and trailing arguments that have type float are promoted to double. These are called the default argument promotions. No other conversions are performed implicitly

There are a couple of additional conversions that are allowed, such as reading a signed integer as the corresponding unsigned type (or vice versa) when the value is representable in both.

A concrete example of what is not allowed is to use `va_arg` to read (signed or unsigned) `char` or `short`, or `float` arguments: such arguments are always promoted to `int` or `double` before they are passed, so reading them at their original type is UB. See also https://github.com/rust-lang/rust/issues/61275#issuecomment-2193942535.

## How c-variadics work in rust

### API surface

The rust API has similar capabilities to C but uses rust names and concepts.

```rust
struct VaList<'f> {
    /* ... */
    _marker: PhantomCovariantLifetime<'f>,
}
```

The `VaList` struct has a lifetime parameter that ensures that the `VaList` cannot outlive the function that created it. Semantically `VaList` contains mutable references (to the caller's and/or callee's stack), so the lifetime is covariant.

The `#[rustc_pass_indirectly_in_non_rustic_abis]` attribute is applied to the definition if `va_list` is a single-element array on the target platform. This attribute simulates the array-to-pointer decay that the C `va_list` type is subject to on that target.
By using this attribute, `va_list` and `VaList` are FFI-compatible. On targets where `va_list` is just a pointer or a struct the C and rust types are already FFI-compatible, so no special attribute is needed there.

```rust
unsafe trait VaArgSafe: Sealed {}

impl<'f> VaList<'f> {
    pub unsafe fn arg<T: VaArgSafe>(&mut self) -> T { /* ... */ }
}
```

The `VaList::arg` method can be used to read the next argument. The implementation uses `va_arg` (though in most cases we re-implement the logic in rustc itself, see below). The return type is constrained by `VaArgSafe` so that only valid argument types can be read. In particular this mechanism prevents subtle issues around implicit numeric promotion in C. Reading an argument is unsafe because reading more arguments than were supplied, or reading an argument at an incompatible type, is UB.

All current implementers of `VaArgSafe` are scalar primitives (`f64`, 32-bit and 64-bit integers, raw pointers).

C argument types are considered to have the rust type that corresponds to `core::ffi::*`, so a C `char` is mapped to `c_char` and so on. We don't consider the C `_BitInt` or `_Float32` types here; rather, we map `i8` to `char` etc., even though `_BitInt(8)` and `char` are distinct types. `_BitInt` is furthermore special because it does not participate in integer promotion.

```rust
impl<'f> Clone for VaList<'f> { /* ... */ }

impl<'f> Drop for VaList<'f> {
    fn drop(&mut self) { /* no-op */ }
}
```

The `Clone` implementation can be used to duplicate a `VaList`. The copy has the same position as the original, but both can be incremented independently.

The `Drop` implementation is a no-op. This choice is based on the fact that in rust it is safe to not run a destructor, so soundness cannot rely on `drop` calling `va_end`. Additionally, `va_end` is a no-op for all current LLVM targets.

In C, `va_end` must run in the frame where the `va_list` was initialized. Because `VaList` can be moved (like the C `va_list`), the frame in which a `VaList` is dropped may not be the frame in which it was initialized.
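
The promotion rules that `VaArgSafe` encodes on the callee side can already be observed from the caller side on stable rust. The following is a minimal sketch, assuming a hosted target whose C library exports `snprintf`; the declaration is written by hand and the helper name `format_promoted` is hypothetical. It widens an `i8` and an `f32` to `c_int` and `c_double` before passing them through `...`, since to our understanding rustc rejects the narrower types in variadic position.

```rust
use std::ffi::{c_char, c_double, c_int, CStr};

unsafe extern "C" {
    // C: int snprintf(char *restrict buf, size_t n, const char *restrict fmt, ...);
    fn snprintf(buf: *mut c_char, n: usize, fmt: *const c_char, ...) -> c_int;
}

/// Formats `small` and `ratio` with the C format string "%d %f".
/// Hypothetical helper, used only to illustrate explicit promotion.
pub fn format_promoted(small: i8, ratio: f32) -> String {
    let mut buf = [0u8; 64];

    // SAFETY: the format string is NUL-terminated and consumes exactly one
    // `int` and one `double`, which is what we pass; `buf` is writable for
    // `buf.len()` bytes and `snprintf` always NUL-terminates within that size.
    let written = unsafe {
        snprintf(
            buf.as_mut_ptr().cast::<c_char>(),
            buf.len(),
            b"%d %f\0".as_ptr().cast::<c_char>(),
            c_int::from(small),    // i8  -> int    (integer promotion)
            c_double::from(ratio), // f32 -> double (float promotion)
        )
    };
    assert!(written >= 0 && (written as usize) < buf.len());

    CStr::from_bytes_until_nul(&buf)
        .expect("snprintf NUL-terminates its output")
        .to_str()
        .expect("output is ASCII")
        .to_owned()
}
```

With these conversions, `format_promoted(-3, 1.5)` yields `-3 1.500000`. A c-variadic callee written in rust would read the same two arguments as `c_int` and `c_double`, which are among the types the text above lists as `VaArgSafe`.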

### Syntax

In rust, a C-variadic function looks like this:

```rust
unsafe extern "C" fn foo(a: i32, b: i32, args: ...) {
    /* body */
}
```

The special `...` argument stands in for an arbitrary number of arguments that the caller may pass. The `...` argument must be the last argument in the parameter list of a function. Like in C23 and later, `...` may be the only argument. The `...` syntax is already stable in foreign functions; `c_variadic` additionally allows it in function definitions.

In function definitions, the `...` argument must have a pattern. The argument can be ignored by using `_: ...`. In foreign function declarations the pattern can be omitted.

A function with a `...` argument must be an `unsafe` function. Passing an incorrect number of arguments, or arguments of the wrong type, is UB, and hence every call site should have a safety comment. A special case is a function that ignores its `VaList` entirely using `_: ...`: we may decide to allow such functions to be safe. At the time of writing we see insufficient benefits relative to the additional complexity that this entails.

A function with the `...` argument must be an `extern "C"` or `extern "C-unwind"` function. In the future we want to extend the set of accepted ABIs to include all ABIs for which we allow calling a c-variadic function (including e.g. `sysv64` and `win64`).

The `...` argument can occur in definitions of functions, inherent methods, and trait methods. When any method on a trait uses a c-variadic argument, the trait is no longer dyn-compatible. The technical reason is that there is no sound way to generate a `ReifyShim` that passes on the c-variadic arguments.

### Desugaring

In a function like this:

```rust
unsafe extern "C" fn foo(args: ...) {
    // ...
}
```

the `args: ...` parameter is internally desugared into a call to `va_start` that initializes `args` as a `VaList`, and a call to `va_end` on every return path.
The `VaList` gets the lifetime of a local variable on `foo`'s stack, so that the `VaList` cannot outlive the function that created it.

### A note on LLVM `va_arg`

The LLVM `va_arg` intrinsic is known to silently miscompile. This is likely due to a combination of two factors:

- LLVM does not have sufficient layout information to accurately implement the ABI
- Clang provides its own implementation of `va_arg`, so the LLVM implementation is mostly untested

Hence, like clang, `rustc` implements `va_arg` for most commonly-used targets (specifically including all tier-1 targets) in [`va_arg.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_codegen_llvm/src/va_arg.rs).

If no custom implementation is provided, the LLVM implementation is used as a fallback. But again, it may silently miscompile the input program.

## Future extensions

### C-variadics and `const fn`

Support for c-variadic `const fn` (and by extension, support in Miri) is implemented in https://github.com/rust-lang/rust/pull/150601.

### C-variadics and coroutines

An `async fn` or any other type of coroutine cannot be c-variadic. We see no reason to support this.

### Naked variadic functions

Currently only `C` and `C-unwind` are valid ABIs for all c-variadic function definitions. With naked functions it is possible to define e.g. a `win64` c-variadic function in a program where `sysv64` is the default. This feature is tracked as [`c_variadic_naked_functions`](https://github.com/rust-lang/rust/issues/148767).

```rust
#![feature(c_variadic, c_variadic_naked_functions)]

#[unsafe(naked)]
unsafe extern "win64" fn variadic_win64(_: u32, _: ...) -> u32 {
    core::arch::naked_asm!(
        r#"
        push rax
        mov qword ptr [rsp + 40], r9
        mov qword ptr [rsp + 24], rdx
        mov qword ptr [rsp + 32], r8
        lea rax, [rsp + 40]
        mov qword ptr [rsp], rax
        lea eax, [rdx + rcx]
        add eax, r8d
        pop rcx
        ret
        "#,
    )
}
```

Calling such a function requires `unsafe` of some kind (custom assembly, cast to `extern "C"`, pass to FFI).

### Defining safe C-variadic functions

In `extern` blocks, it is valid to mark C-variadic functions as safe, under the assumption that the function completely ignores the variable argument list:

```rust
unsafe extern "C" {
    safe fn foo(...);
}
```

Normally, C-variadic function definitions must be unsafe, because calling the function with unexpected (in type or number) arguments is UB. We could relax this constraint on C-variadic functions that ignore their C variable argument list, e.g.:

```rust
// NOTE: not unsafe
extern "C" fn foo(x: i32, _: ...) -> i32 {
    x
}
```

At the moment we don't have a good reason to add this behavior. It is completely backwards compatible, so if a need arises in the future we can revisit this.

### Accepting more `va_arg` return types

The return type is restricted with the `VaArgSafe` trait. It is only implemented for primitive types that are safe (`f64`, 32-bit and 64-bit integers, and raw pointers).

C allows richer types to be read directly. We could expand the set of accepted types, though it is unclear whether that will be worth the implementation effort. In any case, the current `va_arg` implementations don't all support e.g. types with an alignment of 16 or higher, or wider than a `u64`. Any extension will require extensive testing on all supported platforms.

### Multiple C-variadic ABIs in the same program

See https://github.com/rust-lang/rust/issues/141618.

Both `clang` and `gcc` reject using `...` in functions with a non-default ABI for the target. That makes the layout of `VaList` unambiguous. For now we impose a similar restriction for the rust implementation.

This restriction could be lifted in the future. One approach is to add a phantom type parameter to `VaList` that defaults to the platform's default ABI. Each c-variadic argument would then desugar to use the ABI of the function that specifies it.

Currently LLVM always desugars `va_start` and friends using the target's default ABI.
In order for rust to support defining c-variadic functions with multiple ABIs in the same program, either LLVM must respect the function's calling convention, or rustc must implement these functions itself.

## History

[RFC 2137](https://github.com/rust-lang/rfcs/pull/2137) proposed to "support defining C-compatible variadic functions in rust" in 2017, and it is still the core of the implementation today. The text lays out a basic rust API and highlights potential issues (e.g. some solution is needed to match C's array-to-pointer decay), but does not always provide concrete solutions.

In 2019 https://github.com/rust-lang/rust/pull/59625 introduced a wrapper type to simulate array-to-pointer decay. With this API the C semantics can be matched, but doing so correctly takes a great deal of care. The `VaList` type also has two lifetime arguments in this version, which is inelegant.

Then, little seems to have happened for 6 years, until the recent burst of activity that resulted in the current proposal.

- [#t-compiler > c_variadic API and ABI](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/c_variadic.20API.20and.20ABI/with/527115587)
- https://github.com/rust-lang/rust/issues/141524