# "Publication safety"
Consider the OCaml program:
```ocaml
let r : int ref ref = ref (ref 0)
let t0 () = (* Thread 0 *)
let v = ref 42
r := v
let t1 () = (* Thread 1 *)
let r1 = !r in
let r2 = !r1 in
print_int r2
```
Let us try to understand its execution at a slightly lower level by rewriting it in C:
```c
/* Let's assume for the sake of simplicity that these three initial writes are immediately propagated to all CPUs */
value **r = alloc_small(1);
*r = alloc_small(1);
**r = Val_int(0);
P0() {
value *v = alloc_small(1);
*v = Val_int(42);
/* The C11 equivalent of `r := v` is the two operations below */
atomic_thread_fence(acquire);
atomic_store_release(r, v);
}
P1() {
value *r1 = atomic_load_relaxed(r);
value r2 = atomic_load_relaxed(r1);
print_int(Int_val(r2));
}
```
Now, suppose that `r1` contains `v`. Will the load from `r1` in P1 see the initializing store to `v`, i.e., `*v = Val_int(42);`? If not, `r2` will contain uninitialised garbage, which is very bad (notably because it breaks type safety). Publication safety consists in assuring that this doesn’t happen. In the program above, it doesn’t.
This program may appear racy, but in fact it isn’t. The reasoning goes as follows:
- The release store in P0 is also a release fence, i.e. it guarantees that all stores before it in program order also happen-before it.
- If r1 = v in P1, then necessarily the store of v in r got propagated to CPU 1 before the first load of P1. If r1 != v, then the second load is not from v and thus there is no race on v.
- There is an address dependency from the first load to the second in P1, so the second cannot be reorder before the first, because the first load must be performed for the CPU to know which location the second load should access.
In other words, the store of 42 to v in P0 is necessarily executed before the store of v to r, which is necessarily executed before the load from r in P1 (assuming it returns v), which is necessarily executed before the load from v in P1. Hence, there is no data race on v, and either this program prints 0 or it prints 42.
Source for statements 1 to 3 in my reasoning above: https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt. Keywords: ppo (preserved program order), rfe (external reads-from), addr (address dependency), hb (happens-before).
The reasoning works in the LKMM memory model. In C11, two things are missing:
- C11 considers that only an acquire load that reads from a release store can establish a hb relation. The LKMM relaxes this by allowing the same reasoning on "relaxed" loads that read from a release store (I’m using quotes because the LKMM does not have a notion of relaxed atomics, they use `READ_ONCE` and `WRITE_ONCE` which are similar)
- C11 does not have the notion of address/data dependencies. Well, formally it does with `memory_order_consume`, but that was never correctly implemented in compilers, preventing us from using it and forcing us to reason outside of C11.