# Linux Kernel KCSAN

contributed by < [`linD026`](https://github.com/linD026) >

###### tags: `linux2022`, `Linux kernel KCSAN`

---

https://lore.kernel.org/all/20191009201706.GA3755@andrea/t/#u
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt
https://lpc.events/event/7/contributions/647/attachments/549/972/LPC2020-KCSAN.pdf
https://algs4.cs.princeton.edu/42digraph/
https://lore.kernel.org/lkml/1574191653.9585.6.camel@lca.pw/T/
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt
http://cdn.kernel.org/pub/linux/kernel/people/will/slides/elce-2018.pdf
https://bristot.me/wp-content/uploads/2019/09/paper.pdf
https://hackmd.io/@sysprog/formal-verification?type=view
https://lwn.net/Articles/243851/
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922
https://lwn.net/Articles/816850/

## Missing barrier detection

https://lwn.net/Articles/877200/

https://lwn.net/ml/linux-kernel/20211130114433.2580590-1-elver@google.com/

> missing-barrier patches

* Main change

> [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling

:::success
Modeling access reordering. Each memory access for which a watchpoint is set up is also selected for simulated reordering within the scope of its function (at most 1 in-flight access).

**We are limited to modeling the effects of "buffering" (delaying the access), since the runtime cannot "prefetch" accesses. Once an access has been selected for reordering, it is checked along every other access until the end of the function scope.** If an appropriate memory barrier is encountered, the access will no longer be considered for reordering.
When the result of a memory operation should be ordered by a barrier, KCSAN can then detect data races where the conflict only occurs as a result of a missing barrier due to reordering accesses.
:::

```
+	bool "Enable weak memory modeling to detect missing memory barriers"
+	default y
+	depends on KCSAN_STRICT
+	# We can either let objtool nop __tsan_func_{entry,exit}() and builtin
+	# atomics instrumentation in .noinstr.text, or use a compiler that can
+	# implement __no_kcsan to really remove all instrumentation.
+	depends on STACK_VALIDATION || CC_IS_GCC
+	help
+	  Enable support for modeling a subset of weak memory, which allows
+	  detecting a subset of data races due to missing memory barriers.
+
+	  Depends on KCSAN_STRICT, because the options strengthening certain
+	  plain accesses by default (depending on !KCSAN_STRICT) reduce the
+	  ability to detect any data races involving reordered accesses, in
+	  particular reordered writes.
+
+	  Weak memory modeling relies on additional instrumentation and may
+	  affect performance.
```

## KCSAN general details

https://docs.kernel.org/dev-tools/kcsan.html

KCSAN relies on observing that two accesses happen concurrently. Crucially, we want to (a) increase the chances of observing races (especially for races that manifest rarely), and (b) be able to actually observe them. We can accomplish (a) by injecting various delays, and (b) by using address watchpoints (or breakpoints).

If we deliberately stall a memory access, while we have a watchpoint for its address set up, and then observe the watchpoint to fire, two accesses to the same address just raced. Using hardware watchpoints, this is the approach taken in DataCollider. Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead relies on compiler instrumentation and "soft watchpoints".
In KCSAN, watchpoints are implemented using an efficient encoding that stores access type, size, and address in a `long`; the benefits of using "soft watchpoints" are portability and greater flexibility.

KCSAN then relies on the compiler instrumenting plain accesses. For each instrumented plain access:

* Check if a matching watchpoint exists; if yes, and at least one access is a write, then we encountered a racing access.
* Periodically, if no matching watchpoint exists, set up a watchpoint and stall for a small randomized delay. Also check the data value before the delay, and re-check the data value after the delay; if the values mismatch, we infer a race of unknown origin.

To detect data races between plain and marked accesses, KCSAN also annotates marked accesses, but only to check if a watchpoint exists; i.e. KCSAN never sets up a watchpoint on marked accesses. By never setting up watchpoints for marked operations, if all accesses to a variable that is accessed concurrently are properly marked, KCSAN will never trigger a watchpoint and therefore never report the accesses.

### [Finding race conditions with KCSAN](https://lwn.net/Articles/802128/)

KCSAN finds potential problems by monitoring access to memory locations and identifying patterns where:

- multiple threads of execution access the location,
- those accesses are unordered — not protected by a lock, for example, and,
- at least one of those accesses is a write.

```cpp
int a;

__tsan_write4(&a);  /* inserted by the compiler before the plain write */
a = 1;

__tsan_read4(&a);   /* inserted by the compiler before the plain read */
... = a;

/* the KCSAN runtime provides the __tsan_*() hooks, e.g.: */
void __tsan_readN(void *ptr) { ... }
```

The above description is simplified somewhat; there are a couple of exceptions to keep in mind. The first of those is that, before deciding whether to ignore an access, KCSAN looks to see if there is already a watchpoint established for the address in question. If so, and if either the current access or the access that created the watchpoint is a write, then a race condition has been detected and a report will be sent to the system log.
https://lore.kernel.org/lkml/20211130114433.2580590-5-elver@google.com/T/

---

### assert bit checking (ctx->access_mask)

include/linux/kcsan-checks.h

```cpp
/**
 * ASSERT_EXCLUSIVE_BITS - assert no concurrent writes to subset of bits in @var
 *
 * Bit-granular variant of ASSERT_EXCLUSIVE_WRITER().
 *
 * Assert that there are no concurrent writes to a subset of bits in @var;
 * concurrent readers are permitted. This assertion captures more detailed
 * bit-level properties, compared to the other (word granularity) assertions.
 * Only the bits set in @mask are checked for concurrent modifications, while
 * ignoring the remaining bits, i.e. concurrent writes (or reads) to ~mask bits
 * are ignored.
 *
 * Use this for variables, where some bits must not be modified concurrently,
 * yet other bits are expected to be modified concurrently.
 *
 * For example, variables where, after initialization, some bits are read-only,
 * but other bits may still be modified concurrently. A reader may wish to
 * assert that this is true as follows:
 *
 * .. code-block:: c
 *
 *	ASSERT_EXCLUSIVE_BITS(flags, READ_ONLY_MASK);
 *	foo = (READ_ONCE(flags) & READ_ONLY_MASK) >> READ_ONLY_SHIFT;
 *
 * Note: The access that immediately follows ASSERT_EXCLUSIVE_BITS() is assumed
 * to access the masked bits only, and KCSAN optimistically assumes it is
 * therefore safe, even in the presence of data races, and marking it with
 * READ_ONCE() is optional from KCSAN's point-of-view. We caution, however, that
 * it may still be advisable to do so, since we cannot reason about all compiler
 * optimizations when it comes to bit manipulations (on the reader and writer
 * side). If you are sure nothing can go wrong, we can write the above simply
 * as:
 *
 * .. code-block:: c
 *
 *	ASSERT_EXCLUSIVE_BITS(flags, READ_ONLY_MASK);
 *	foo = (flags & READ_ONLY_MASK) >> READ_ONLY_SHIFT;
 *
 * Another example, where this may be used, is when certain bits of @var may
 * only be modified when holding the appropriate lock, but other bits may still
 * be modified concurrently. Writers, where other bits may change concurrently,
 * could use the assertion as follows:
 *
 * .. code-block:: c
 *
 *	spin_lock(&foo_lock);
 *	ASSERT_EXCLUSIVE_BITS(flags, FOO_MASK);
 *	old_flags = flags;
 *	new_flags = (old_flags & ~FOO_MASK) | (new_foo << FOO_SHIFT);
 *	if (cmpxchg(&flags, old_flags, new_flags) != old_flags) { ... }
 *	spin_unlock(&foo_lock);
 *
 * @var: variable to assert on
 * @mask: only check for modifications to bits set in @mask
 */
#define ASSERT_EXCLUSIVE_BITS(var, mask)                                       \
	do {                                                                   \
		kcsan_set_access_mask(mask);                                   \
		__kcsan_check_access(&(var), sizeof(var), KCSAN_ACCESS_ASSERT);\
		kcsan_set_access_mask(0);                                      \
		kcsan_atomic_next(1);                                          \
	} while (0)
```

---

### Compile-time instrumentation

[Dynamic Race Detection with LLVM Compiler: Compile-time instrumentation for ThreadSanitizer](http://supertech.csail.mit.edu/papers/SchardlDeDo17.pdf)

---

### Compiler optimization

https://lwn.net/Articles/799218/

Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/793253/

---

### Clang 15.0.0 ThreadSanitizer

https://clang.llvm.org/docs/ThreadSanitizer.html

---

### Papers

https://www.usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf

---

### KCSAN: data-race in find_next_bit / rcu_report_exp_cpu_mult

https://lwn.net/ml/linux-kernel/000000000000604e8905944f211f@google.com/

---

:::info
https://github.com/google/kernel-sanitizers/blob/master/KTSAN.md
https://github.com/google/kernel-sanitizers/blob/ktsan/Documentation/ktsan.txt
https://github.com/google/kernel-sanitizers/tree/ktsan
[KernelThreadSanitizer (KTSAN) slide](https://docs.google.com/presentation/d/1OsihHNut6E26ACTnT-GplQrdJuByRPNqUmN0HkqurIM/edit#slide=id.gdcd83eab6_0_5)
:::