Safety Extensions

# Introduction This document details how to write good safety documentation. The goals of such documentation are: - reduce the amount of bugs in `unsafe` code, - help readers know immediately what conditions hold and why they hold, - help programmers write `unsafe` code with confidence, - simplify the work of reviewers, ## General Rules The general theme behind all rules in this document is that everything that cannot be statically checked by the Rust compiler and guaranteed -- modulo any compiler bugs -- need to be either checked at runtime, or have to be documented by the means detailed here. This is a tall task, however there are some ways to prevent having to manually document everything. The most common way is to write a safe abstraction. This means translating every condition into Rust-checkable types and functions. If this cannot be done then there is the option of using runtime checks and `panic!()` when the conditions are violated. If runtime checks lead to unacceptable performance or failure is not an option, then you will probably have to reach for `unsafe` and documentation. Note that there still is one last option to avoid `unsafe` code. Custom static checks implemented by `klint`. Currently `klint` is used to ensure that that mutexes are not locked while in atomic contexts. The rationale for this decision is that using something as fundamental as a mutex must not be `unsafe`. This last resort is something that is not taken lightly and most APIs are better off using `unsafe` code and relying on documentation. To summarize, you should take the following steps: 1. Explore the currently present abstraction and see if they fulfill your needs. 2. Try to write your own abstraction layer -- feel free to reach out to us at [zulip]! 3. Evaluate whether runtime checks could be used in your use-case. 4. Implement an `unsafe` proof of concept API (does not need to be complete) for your purpose as described by this document. 5. If you still think that it would be better to offload the API to `klint`, then contact us. # Documentation Elements In order to make it easy for everyone to understand what is going on this document details certain standardized documentation elements. Some are already established convention in the Rust world, others have been invented for the kernel. First we will describe the two main sections that we use and then we will explain how we write common safety conditions. ## `# Safety` section The safety section must be present on `unsafe` traits and `unsafe` functions. This section consists of a list of safety conditions needed to implement the trait or call the function. There are no restrictions as to what these conditions can be, they can affect anything. If a user upholds every listed safety property then there must not exist an execution path of the program that runs into UB. In other words, users that use the API exactly as documented benefit from having UB-free programs. If the property from the previous paragraph is not guaranteed by the API, we call it *unsound*. It is the purpose of safety documentation to avoid this at all costs and e.g. catch it during review. ## `# Invariants` section The `Invariants` section can be put on traits, structs, enums and unions. For functions use the `# Guarantees` section. Structs, enums and unions requrie for any construction of the type or modification of a field that is used in an invariant requires an `INVARIANTS` comment. This comment is similar to a `SAFETY` comment in that it needs to prove every condition. ## Specifying Safety Conditions In both `Safety` and `Invariants` sections we use bullet point lists of safety conditions. This section lists a comprehensive table of standardized safety conditions. Whenever possible, use these terms instead of natural language. We denote the set of primitive types by `PRIMITIVE = {u8, u16, u32, u64, usize, i8, i16, i32, i64, isize, f32, f64}`. Parameters/Fields refer to function parameters or fields of the item that is being documented. | Syntax | Parameters/Fields | Meta-variables | Semantics | |------------------------------------|-------------------------------|------------------------------------|---------------------------------------------------------------| | `val in i..j` | `val: T` with `T ∈ PRIMITIVE` | `i, j: T` | `i <= val && val < j` | | `val in i..=j` | `val: T` with `T ∈ PRIMITIVE` | `i, j: T` | `i <= val && val <= j` | | `val in i..` | `val: T` with `T ∈ PRIMITIVE` | `i, j: T` | `i <= val && val <= T::MAX` | | `val in ..j` | `val: T` with `T ∈ PRIMITIVE` | `i, j: T` | `T::MIN <= val && val < j` | | `val in ..=j` | `val: T` with `T ∈ PRIMITIVE` | `i, j: T` | `T::MIN <= val && val <= j` | | `ptr` is R-valid | `ptr: *const T` | | Abbreviation for "`ptr` is R-valid[..size_of::<T>()]" | | `ptr` is R-valid[..j] | `ptr: *const T` | `j: usize` | Abbreviation for "`ptr` is R-valid[0..j]" | | `ptr` is R-valid[i..j] | `ptr: *const T` | `i,j: usize` | Abbreviation for "`ptr` is R-valid[i..j] for [`'self`]" | | `ptr` is R-valid[i..j] for `'a` | `ptr: *const T` | `i,j: usize` | `ptr` is dereferenceable from `i` to `j`: all bytes from `ptr.offset(i)` until (excluding) `ptr.offset(j)` have to be part of the same allocated object. No concurrent (another thread) writes to the memory behind `ptr` must occur (reads are still permitted). | | `ptr` is RW-valid | `ptr: *mut T` | | Abbreviation for "`ptr` is RW-valid[..size_of::<T>()]" | | `ptr` is RW-valid[..j] | `ptr: *mut T` | `j: usize` | Abbreviation for "`ptr` is RW-valid[0..j]" | | `ptr` is RW-valid[i..j] | `ptr: *mut T` | `i,j: usize` | Abbreviation for "`ptr` is RW-valid[i..j] for [`'self`]" | | `ptr` is RW-valid[i..j] for `'a` | `ptr: *mut T` | `i,j: usize` | `ptr` is R-valid[i..j] for `'a` and writeable from `i` to `j` (all bytes from `ptr.offset(i)` until (excluding) `ptr.offset(j)`). No concurrent (another thread) reads or writes to the memory behind `ptr` must occur. | | `EX $var: T` where COND | | `var: T`, `COND` a meta condition | There exists a value of type `T` that satisfies COND (any condition; preferable defined using this document). For the rest of the documented item there exists the meta variable `$var` with some value that satisfies the condition. The `$` is part of the name of that variable to distinguish it from normal variables. Also take care that the value is either unique or the item can use any value that satisfies COND | | while `RET` is alive uphold COND | | `COND` a meta condition | For as long as the returned value is alive -- i.e. until it is no longer used, this can be before any associated lifetimes expire -- COND has to be true. Note that dropping a value with a `Drop` implementation aka drop-glue also counts as a use. | | during `'a` uphold COND | `'a` lifetime | `COND` a meta condition | While `'a` has not expired COND has to be true. | | `bytes` is valid UTF-8 | `bytes: &[u8]` | | The sequence of bytes contained in `bytes` forms valid UTF-8. | ### Meta Conditions Some safety conditions require us to write some explicit conditions. We use the following notation for that: | Syntax | Meta-variables | Semantics | |------------------------------------|---------------------------------------------------------|-----------------------------------------------------------------------------------------------------| | `mem` is not mutated | `mem` an expression evaluating to a memory location | The memory location must not be mutated (this should be part of a continuously checked condition) | If the property you require is not present above, then you can try to contact us to see if it would be logical to add it. If not, then you can still specify the property in two ways: 1. If the property can be expressed by Rust code, then just do so, e.g. for a string `s` you could require that `s.len() > 2`. 2. If that is not an option, then write your property in natural language, but prefix it with `TOFORMALIZE:` when the time for review rolls around the reviewers will discuss and try to find a good formalization. You should also add that prefix to `SAFETY` and `INVARIANT` discharge comments. ## Proving Properties The discharging of `Safety` and `Invariants` sections happens in non-doc comments prefixed with `SAFETY` and `INVARIANT` respectively. They should mirror the section on the used item and provide a proof for every safety condition. To make life easier there are standardized terms to prove certain safety conditions. The syntax is: `$cond by $justification` where `$cond` is the safety condition that is required. It is important that the proven property is repeated here, since it lets reviewers discover mismatching discharging comments. It also helps readers to understand what properties are actually required for e.g. a function call. The `$justification` should be one from the list below. | Syntax | Meta-variables | Semantics | |----------------------|-----------------------|---------------------------------------------------------------------------------------------------------------| | FR | | "FR = function requirements" the required safety condition is also required by the sorrounding function | | TI | | Abbreviation for "TI of `Self`" | | TI of `T` | `T`: arbitrary type | "TI = type invariant" the required safety condition is provided by `T`'s invariants | | check above | | A check in the code above (up to 5 lines above the comment counting from `SAFETY`). | | check `name` | `name` check identifier | A check in the code labeled by `name`. Labels should be `//CHECK: name`. | | FG of `func` | `func`: function name | "FG = function guarantee" the function `func` guarantees the safety condition in its `# Guarantees` section | | DEF of `var` | `var`: variable name | "DEF = definition" the safety condition is satisfied by the defintion/construction of `var` | | use of `var` | `var`: variable name | The usage of `var` guarantees the safety condition | When multiple justifications have to be considered together write `by $just1 & $just2`. When using "FR" you are allowed to use the following shorthand: "SAFETY: .. by FR" if the safety conditions of the current function exactly match the safety conditions of the called function. The same can be done for `INVARIANT` comments. We allow this, since the safety conditions are already listed on the function and the reader knows the context. ## General Formatting When there are no safety condition on a function but it is still `unsafe` e.g. because it is an ffi function, then we write `SAFETY: <nothing>,`. If there is exactly one safety condition then you should write it in the same line as `SAFETY`. If there are more than one condition then you should write a bullet point list with the same order as on the definition. A comma should be present at the end of every item in these lists. This also applies to the last item, this makes appending the list easier, since you do not have to change `,` to `.`. ## Examples TODO # `Send` and `Sync` # Type invariants Types may list custom invariants in their documentation. These must only affect fields that are not `pub`. Invariants should be specified similarly to safety sections in this document: - an itemized list of invariants using the concise conventions from this document. ```rust /// # Invariants /// - `count in 10..100`, pub struct Foo { count: usize } ``` Then the module owning `Foo` must uphold that invariant, every time the field is set, an appropriate `INVARIANT` comment must be present that explains why the invariant is upheld. For example: ```rust impl Foo { pub fn set_count(&mut self, count: usize) { if count >= 10 && count < 100 { // INVARIANT: by check above: `count in 10..100`. self.count = count; } } } ``` # Unresolved Questions ## Checking C API When calling FFI functions, how do we check the safety reqs? 1. have a list of function -> reqs somewhere [zulip]: https://rust-for-linux.zulipchat.com/