# TSan PR
This PR introduces a new configure-time flag `--enable-tsan` to enable compilation with [ThreadSanitizer (TSan)](https://clang.llvm.org/docs/ThreadSanitizer.html) instrumentation. This is a joint work with Fabrice Buoro (@fabbing), based on [the work of Anmol Sahoo](https://anmolsahoo25.github.io/blog/thread-sanitizer-ocaml/) (@anmolsahoo25). We are also grateful to @jhjourdan and @maranget for their help on memory models and @gadmm and @art-w for useful feedback.
TSan is an approach developed by Google to locate data races in code bases. It works by instrumenting your executables to keep a history of previous memory accesses (at a certain performance cost), in order to detect data races, even when they have no visible effect on the execution. TSan instrumentation has been implemented in various compilers (gcc, clang, as well as the Go and Swift compilers), and has proved very effective in detecting hundreds of concurrency bugs in large projects.
Executables instrumented with TSan report data races without false positives. However, data races in code paths that are not visited will not be detected.
### Example
A data race consists in two accesses to the same location, at least one of them being a write, without any order being enforced between them:
```ocaml
type t = { mutable x : int }
let v = { x = 0 }
let () =
let t1 = Domain.spawn (fun () -> v.x <- 10; Unix.sleep 1) in
let t2 = Domain.spawn (fun () -> ignore v.x; Unix.sleep 1) in
Domain.join t1;
Domain.join t2
```
Compiling this program with a TSan-enabled compiler is done by the usual command `ocamlopt program.ml`. Running it will output a report looking like this:
```
==================
WARNING: ThreadSanitizer: data race (pid=4170290)
Read of size 8 at 0x7f072bbfe498 by thread T4 (mutexes: write M0):
#0 camlSimpleRace__fun_524 /tmp/simpleRace.ml:7 (simpleRace.exe+0x43dc9d)
#1 camlStdlib__Domain__body_696 /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/stdlib/domain.ml:202 (simpleRace.exe+0x47b5dc)
#2 caml_start_program ??:? (simpleRace.exe+0x4f51c3)
#3 caml_callback_exn /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:168 (simpleRace.exe+0x4c2b93)
#4 caml_callback /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:256 (simpleRace.exe+0x4c36e3)
#5 domain_thread_func /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/domain.c:1093 (simpleRace.exe+0x4c6ad1)
Previous write of size 8 at 0x7f072bbfe498 by thread T1 (mutexes: write M1):
#0 camlSimpleRace__fun_520 /tmp/simpleRace.ml:6 (simpleRace.exe+0x43dc45)
#1 camlStdlib__Domain__body_696 /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/stdlib/domain.ml:202 (simpleRace.exe+0x47b5dc)
#2 caml_start_program ??:? (simpleRace.exe+0x4f51c3)
#3 caml_callback_exn /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:168 (simpleRace.exe+0x4c2b93)
#4 caml_callback /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:256 (simpleRace.exe+0x4c36e3)
#5 domain_thread_func /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/domain.c:1093 (simpleRace.exe+0x4c6ad1)
Mutex M0 (0x000000567ad8) created at:
#0 pthread_mutex_init /home/olivier/other_projects/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1316 (libtsan.so.0+0x3cafb)
[...]
SUMMARY: ThreadSanitizer: data race /tmp/simpleRace.ml:7 in camlSimpleRace__fun_524
==================
ThreadSanitizer: reported 1 warnings
```
Many other examples can be found in the test suite. More information about ThreadSanitizer usage are available in [the readme of the ocaml-tsan repo](https://github.com/ocaml-multicore/ocaml-tsan/blob/tsan/README.adoc), which has been released as a fork based on 5.0.0 (release date December 16, 2022). This approach already allowed to detect a number of data races in OCaml libraries:
- Lockfree: https://github.com/ocaml-multicore/lockfree/pull/40, https://github.com/ocaml-multicore/lockfree/issues/39
- Domainslib: https://github.com/ocaml-multicore/domainslib/issues/72, https://github.com/ocaml-multicore/domainslib/pull/103
The generated executables should not be impacted by this change when `--enable-tsan` is not set. Compilation times are the same as before without `--enable-tsan`.
## Caveats
- We inherit the portability limitations of ThreadSanitizer of not being available on Windows.
- Currently, only instrumentation of x86_64 executables is supported; future work could include ARM64 support.
## How TSan works
Executables are instrumented with calls to the TSan runtime library, which tracks accesses to shared data and reports races.
Internally the runtime library associates with each word of application memory at least 2 "shadow words". Each shadow word contains information about a recent memory access to that word, including a "scalar clock". Those clocks serve to establish a happens-before relation.
This information is maintained as a "shadow state" in a separate memory region, and updated at every (instrumented) memory access. A data race is reported every time two memory accesses are made to overlapping memory regions, and:
- one of them is a write, and
- there is no established happens-before relation between them.
## Instrumentation pass
Instrumentation calls are inserted in several places:
- Memory accesses are instrumented with calls to the TSan runtime, with functions of the form `__tsan_read8`, `__tsan_atomic_store`; those calls are inserted into the `Cmm` representation of code. We exploit mutability information to improve performance: immutable loads are translated to non-instrumented memory reads.
- Function entries and exits are instrumented with calls to `__tsan_func_entry` and `__tsan_func_exit`. This is used by TSan to provide backtraces in data race reports (including backtraces of past memory accesses).
- The C runtime is instrumented as well, using the built-in ThreadSanitizer support of GCC or Clang. This is necessary as
1. many relevant memory accesses are made from the runtime, and
2. the runtime handles thread-synchronizing operations such as locking and unlocking, operations which TSan needs to know about.
Enabling `tsan` also causes C code to be built with `-fno-omit-frame-pointers`. The OCaml compiler's `--enable-frame-pointers` configure flag is however independent of `--enable-tsan`.
Instrumentation of function entries and exits is simple in C or C++ (where raising an exception always unwinds the stack), but challenging in OCaml which has non-local control flow due to exceptions and effect handlers. In order to keep correct backtraces in all circumstances, it is necessary to emit *ad hoc* calls to `__tsan_func_entry` and/or `__tsan_func_exit` when an exception is raised, an effect is performed, or a continuation is resumed, in order to keep TSan’s view of the call stack up-to-date.
Doing this requires unwinding the stack. When possible, we use the already-present mechanism of `frame_descr` descriptors, embedded in the executable, to do it. However, exceptions can be raised across C stack frames, and frame descriptors only exist for OCaml frames. In order to unwind the C parts of the stack, we use the [libunwind](https://github.com/libunwind/libunwind) library. As a consequence, one must install libunwind to build OCaml with `--enable-tsan`.
`__tsan_func_entry` takes as an argument the return address in the current stack frame. This required adding a constructor `Creturn_addr` to `Cmm` expressions and `Ireturn_addr` to the type `Mach.operation`, whose semantics is to fetch the return address from the stack frame.
## Embedding of the OCaml memory model into the C11 one
TSan’s underlying memory model is the C11 model. Therefore, we have to map OCaml memory operations to C11 operations. The relations between the two models have been the subject of many discussions (https://github.com/ocaml/ocaml/pull/10995, https://github.com/ocaml/ocaml/issues/10992, https://github.com/ocaml/ocaml/pull/10972#issuecomment-1035890572), which we took as a basis for our work. Conceptually, the instrumentation translates OCaml memory operations to C11 operations. Race-free OCaml programs are translated to race-free C programs, while OCaml programs containing races (in the OCaml sense) are translated to C programs that are racy (in the C11 sense).
**The resulting runtime analysis does not report false positives, in the sense that all races reported by TSan are indeed data races in the OCaml sense[^1].**
The mapping between OCaml memory operations and C11 operations signaled to TSan can be found [here](https://github.com/ocaml-multicore/ocaml-tsan/pull/22#issuecomment-1377439074). In the same thread, the previous post contains a justification of the absence of false positives. In a meeting, we presented these to @maranget and @jhjourdan, who agreed that it should have the correctness properties we claim it has.
## Performance cost of the instrumentation
First of all, there is no compilation or runtime cost associated with the change unless the compiler is configured with `--enable-tsan`.
When TSan instrumentation is enabled, it incurs a slowdown and increases memory consumption. Preliminary benchmarks show a slowdown in the range 7x-13x and a memory consumption increase in the range 2x-7x. Some pathological cases exist, such as `testsuite/tests/lf_skiplist/test_parallel.ml` for which time and memory usage blow up by 23x and 27x, respectively. (We haven’t investigated further yet.)
The memory consumption increase is better than for C/C++/Go ([this conference talk](https://www.youtube.com/watch?v=5erqWdlhQLA) reports 5x-10x). The slowdown is in range of what is observed in C/C++/Go (5x-15x), but since OCaml programs are expected to perform less mutation than C on average, there is probably room for improvement. There are probably low-hanging fruits in terms of optimization, notably removing instrumentation from the runtime where it is not strictly necessary to correctly detect races in user programs; and investigate the aforementioned pathological case.
## Organization of the changes
This PR is best reviewed commit by commit.
The bulk of the diff for this PR is due to a new set of tests in `testsuite/tsan/`. A small number of tests have to be disabled when `--enable-tsan` is set, due to the fact that a call tree of depth >64k causes the ThreadSanitizer runtime to crash. (This is a [limitation on the ThreadSanitizer side](https://gcc.gnu.org/bugzilla//show_bug.cgi?id=77538#c21).)
In the main commit, big changes are limited to a few files: the `Cmm` instrumentation pass is in `asmcomp/thread_sanitizer.ml[i]`. The new C file `runtime/tsan.c` handles the task of letting TSan know about function entries and exits when raising exceptions or handling effects. Finally, some of the instrumentation calls have to be inserted directly into the assembly routines of `runtime/amd64.S`.
In a separate commit, we disable TSan on a number of functions in the runtime. This is because TSan warns about data races on these functions. We reported those warnings in https://github.com/ocaml/ocaml/issues/11040, and silence them to avoid facing users of TSan with data race reports that are not related to their code. To save performance, we also un-instrument some functions that are executed often, but should not create races with user code.
[^1]: Except maybe in some rare cases involving data dependencies or `volatile`, due to limitations of TSan itself. False positives due to `volatile` can only appear in legacy FFI code, not written with parallelism in mind.