---
tags: algorithm analysis, week 13, java
---

# Atomicity, Sequential Consistency and Weak Memory

(part of my course Algorithm Analysis 2023 at Chapman University)

This lecture shows that multi-threaded computations on modern multi-core hardware violate some of the properties (atomicity, sequential consistency) that we have taken for granted ever since CPSC-101. We exhibit a program showing that the memory model of Java is not sequentially consistent, that is, it may reorder instructions in a way that allows "impossible" or "inconsistent" results. The lecture builds up to this, but you can jump directly to the section [Weak Memory](https://hackmd.io/Gqs5p8gaREqtomu2tOvLhQ?both#Weak-Memory) and run [`memoryModelWithStats`](https://www.online-java.com/xMhXfG6So1).

**Important:** All multi-threaded programming examples in this lecture execute non-deterministically (why?). So it is crucial to run them many times to get a sense of each program's behavior. Moreover, while I put the programming examples in the cloud, it is essential to run them on your local machine as behaviors may differ.

## Introduction

In **Weeks 4 and 7** we learned about distributed databases. In particular, in the discussion of the CAP theorem, in the paper by Susan Davidson et al., we talked about **sequential consistency**. Recall the following example:

![](https://hackmd.io/_uploads/S1kiSLrm3.png)

This week we will look at the same phenomenon from the point of view of multi-threading on multi-core hardware.

In **Week 9** we learned about **mutual exclusion** (which I called the "hello world" of concurrency) and the **Peterson algorithm**. Then we used a model checker to prove the correctness of Peterson's algorithm. Today we will see Peterson's algorithm in Java and discuss Java's (weak) memory model. We conclude with Amdahl's law which, even if rough, allows for quick back-of-the-envelope calculations showing that locks are costly and non-blocking algorithms are important.

## Atomicity

Non-atomicity can be illustrated with a simple example of two threads that increment the same variable (notice that this is the multi-threaded version of the bank account example above):

- [Python](https://colab.research.google.com/drive/1mwDC1No8Xbk6Vej2QoGMcLlHBgdUHmKD?usp=sharing)
- [Java](https://www.online-java.com/PVcKvqgYRu); here is [another version](https://www.online-java.com/lADgjO8w1a).

In class, we worked together on the following exercise. With the help of AI, it only takes a few minutes to solve it. Just paste the code and prompt the AI.

**Exercise:** Can you modify the programs above to give the expected result?

*My solution to the exercise* (yours may look different):

- Python: See [here](https://colab.research.google.com/drive/1dLotbXgbAuEYpqF3Coxkw6bw7T2SrV6Z?usp=sharing)
- Java: We can use the class `AtomicInteger`, see [here](https://www.online-java.com/tPqBfET3ak).
- The same effect can be achieved with `synchronized`, see [here](https://www.online-java.com/f0eO6CZqAg).

## Does Mutual Exclusion Need Atomicity?

In the previous exercise we saw how the use of locks can ensure mutual exclusion. Timing the version with locks against the version without, we also saw that using locks to synchronize threads is expensive. The question we investigate now is whether Peterson's algorithm needs locks in order to synchronize. Peterson's algorithm does implement a lock (the Peterson lock), but, as you can see by inspection of the code, it does not itself make use of locks.
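To make "by inspection" concrete, here is a minimal sketch of the heart of a two-thread Peterson lock. It is only a sketch: TAMP's full version (with its `ThreadID` helper) is reproduced in the appendix, and the linked `peterson` program may be organized differently. Note that `lock()` and `unlock()` consist only of plain reads and writes of the shared variables `flag` and `victim`; no locking primitive appears.

```java
// Minimal sketch of a two-thread Peterson lock; the thread ids 0 and 1
// are passed in explicitly here (TAMP uses a ThreadID helper instead).
// Deliberately without volatile: whether mutual exclusion then still
// holds under Java's memory model is the point of the experiments below.
class PetersonSketch {
    private final boolean[] flag = new boolean[2]; // flag[i]: thread i is interested
    private int victim;                            // the thread that yields

    public void lock(int i) {        // i is this thread's id, 0 or 1
        int j = 1 - i;
        flag[i] = true;              // announce interest
        victim = i;                  // give priority to the other thread
        while (flag[j] && victim == i) {} // spin: plain reads and writes only
    }

    public void unlock(int i) {
        flag[i] = false;             // withdraw interest
    }
}
```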
Let us run some experiments with Peterson's algorithm in Python and Java. In class, we did the following exercises.

**Exercise/Homework:** Have a look at [`peterson.py`](https://colab.research.google.com/drive/126wdCs8F_VJ7VwTcaXUv70PwAVTGzVy_?usp=sharing). What program behavior do you expect? Is your expectation confirmed when you run the program?

**Exercise/Homework:** Analyse the Java program [`peterson`](https://www.online-java.com/G0N5eA9tR6) in the same way as you analysed the Python program in the previous exercise. Make sure to run the Java program on your local machine. What observations do you make?

Again, the following exercises should take only a few minutes with the help of AI:

**Exercise:** Can you modify the code of the Java program `peterson` so that you know for sure that the unexpected behaviour is due to the failure of mutual exclusion? (Remember how we used an assertion to verify mutual exclusion in the class on model checking.)

*My solution:* [`petersonAssert`](https://www.online-java.com/PpcRx4gk56). Run this locally on your own machine, not online. Run with `java -ea Main` to ensure that the assertion is checked at runtime.

We didn't do the following exercise in class, but one solution is to qualify the variables (which variables exactly?) with `volatile`. To understand what is going on, we first need to talk about weak (or relaxed) memory.

**Exercise:** Can you modify the Java program `peterson` so that we always obtain the expected behaviour?

## Weak Memory

To understand why the Java program `peterson` did not work as expected (that is, did not guarantee mutual exclusion), we consider the following example.

**Discussion:** Which results are possible when running the Java program [`memoryModel`](https://www.online-java.com/yFkZ4HqfRG)?

We spent some time in class on this. In particular, we explained the notion of **sequential consistency** with the help of this example.

**Exercise/Homework:** Explain why the outcome $a=0, b=0$ is not sequentially consistent, but the other three outcomes are.

We did the following exercise in class, again with the help of AI.

**Exercise:** Modify the program so that it counts how often each result is obtained. What observations do you make?

*My solution:* [`memoryModelWithStats`](https://www.online-java.com/xMhXfG6So1). I also link [`memoryModelWithAssert`](https://www.online-java.com/HOTElYhKy3).

**Exercise/Homework:** Report the results you get from running [`memoryModelWithStats`](https://www.online-java.com/xMhXfG6So1) on your local machine. Include the specs of your processor, in particular the number of cores. If you can find out something about the caches, add this as well.

**Exercise/Homework:** You can force sequential consistency of [`memoryModelWithStats`](https://www.online-java.com/xMhXfG6So1) by declaring certain variables `volatile` (see below). In general, declaring variables as `volatile` comes at a cost in execution time, so we want to use it sparingly. Which variables must be declared as `volatile` to ensure sequential consistency? Measure and report the effect that `volatile` has on your run time (I use `java Main | gnomon`).

## Sequential Consistency

A multi-threaded computation is **sequentially consistent** if there is a sequential computation that leads to the same result. See the example we discussed in the previous section for illustration and Chapter 3 of [TAMP] for more details and precise definitions.
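To connect the definition to the example: the `memoryModel` program has the shape of the classic store-buffering litmus test. Here is a minimal sketch of that shape (the names $x, y, a, b$ match the exercise above; that the linked program is written exactly this way is an assumption, and the class name and loop are mine). Sequentially, at least one of the two writes must precede both reads, so $a=0, b=0$ should be impossible; on real hardware it can show up anyway.

```java
// Store-buffering litmus test: in every interleaving, at least one of
// the writes x = 1, y = 1 happens before both reads, so the outcome
// a == 0 && b == 0 is impossible under sequential consistency.
// Java's memory model nevertheless allows it.
public class MemoryModelSketch {
    static int x, y, a, b; // deliberately not volatile

    public static void main(String[] args) throws InterruptedException {
        for (int run = 1; run <= 100_000; run++) {
            x = 0; y = 0; a = 0; b = 0;
            Thread t1 = new Thread(() -> { x = 1; b = y; });
            Thread t2 = new Thread(() -> { y = 1; a = x; });
            t1.start(); t2.start();
            t1.join(); t2.join();
            if (a == 0 && b == 0) {
                System.out.println("non-sequentially-consistent outcome in run " + run);
                return;
            }
        }
        System.out.println("no non-SC outcome observed; run again");
    }
}
```

Declaring `x` and `y` volatile rules out the reordering (compare the exercise above), typically at a measurable cost in run time.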
## The keywords `volatile` and `synchronized` in Java

Roughly speaking, `synchronized` enforces atomicity, while `volatile` forces the contents of a cache to be flushed back to main memory. Here are some more details:

- A `volatile` variable is always written directly through to main memory and updated from main memory. With `synchronized`, first a lock has to be acquired, then the same happens with the variables (updated from main memory, written back to main memory), and then an unlock has to happen. That's two more operations and, in particular, locking is costly. It pays off, however, when more complicated operations happen with the variables: inside a synchronized block they need only be updated from main memory at the start and written back to main memory at the end.
- `volatile`: TAMP, Ch 3.8.2: "Although reading and writing a volatile field has the same effect on memory consistency as acquiring and releasing a lock, multiple reads–writes are not atomic. For example, if `x` is a volatile variable, the expression `x++` will not necessarily increment x if concurrent threads can modify `x`. ... One common usage pattern for volatile variables occurs when a field is read by multiple threads, but only written by one ... The `compareAndSet()` and `set()` methods act like volatile writes, and `get()` acts like a volatile read."
- Baeldung: [Volatile vs. Atomic Variables in Java](https://www.baeldung.com/java-volatile-vs-atomic)
- Brian Goetz: [Managing volatility](http://web.archive.org/web/20210221170926/https://www.ibm.com/developerworks/java/library/j-jtp06197/), linked from [stackoverflow](https://stackoverflow.com/questions/106591/what-is-the-volatile-keyword-useful-for).

---

(this is how far we got on Tuesday)

---

## Amdahl's Law

I use Chapter 1.5 of [TAMP] as my reference for this section. To quote (slightly edited):

- Amdahl's Law captures the extent to which the speedup of any complex job is limited by how much of the job must be executed sequentially.
- The speedup $S$ is the *ratio between the time it takes one processor to complete the job versus the time it takes $n$ concurrent processors* to complete the same job.
- $n$ is the number of processors collaborating.
- $p$ is the *fraction of the job that can be executed in parallel*.
- Amdahl's Law characterizes the maximum speedup $S$ that can be achieved given $n$ and $p$.
- Assume, for simplicity, that it takes (normalized) time $1$ for a single processor to complete the job. With $n$ concurrent processors, the parallel part takes time $p/n$ and the sequential part takes time $1-p$. Overall, the parallelized computation takes time $$(1-p) + \frac p n$$
- Amdahl's Law says that the speedup, that is, the ratio between the sequential (single-processor) time and the parallel time, is: $$ S = \frac 1 {1-p + \frac p n} $$

**Example:** Consider a '5-room' house with 4 equally sized rooms and one double sized room. 5 friends want to parallelize painting the house. This means that $n=5$ and $p=5/6$. The formula then gives $S=3$.

**Exercise:** Say, in the example above, one small room takes one person one unit of time. In Amdahl's model, how much time does it take 5 painters to paint the house?

The moral of the story is that, because of blocking (only one painter can paint the big room), increasing the power of the hardware by 5 (in the example of the 5-room house) only gives a speed-up of 3. To summarize, blocking is expensive.
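For a quick check of the arithmetic, here is a small sketch that evaluates Amdahl's formula (the class and method names are mine). For the house it confirms $S=3$, and the last line gives one way to check your answer to the exercise.

```java
// Evaluate Amdahl's law: S = 1 / ((1 - p) + p / n).
public class Amdahl {
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 5.0 / 6.0;  // fraction of the painting job that parallelizes
        int n = 5;             // number of painters
        double s = speedup(p, n);
        System.out.println("speedup S = " + s);  // prints 3.0
        // A single painter needs 6 units for the whole house
        // (4 small rooms plus the double room), so 5 painters need 6 / S.
        System.out.println("time for 5 painters = " + (6.0 / s) + " units");
    }
}
```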
## Acknowledgements

I am grateful to Samuel Balco, Alexander Knapp and Tom Ridge for discussions about the Java memory model and weak memory more generally.

## References

The main reference for this week is

- [TAMP] Herlihy, Shavit: [The Art of Multiprocessor Programming](https://github.com/amilajack/reading/blob/master/Computer_Science/The%20Art%20of%20Multiprocessor%20Programming.pdf), 2008.

If you want to have a look at this material from the point of view of C++:

- Anthony Williams: [C++ Concurrency in Action - Practical Multithreading](https://www.manning.com/books/c-plus-plus-concurrency-in-action), 2012. In particular Chapter 5, "The C++ memory model and operations on atomic types".

A useful [overview of weak memory](https://preshing.com/20120930/weak-vs-strong-memory-models/).

Some theory papers on weak memory I looked at are

- Cenciarelli, Knapp, Sibilio: [The Java Memory Model: Operationally, Denotationally, Axiomatically](https://link.springer.com/chapter/10.1007/978-3-540-71316-6_23), ESOP 2007.
- Tom Ridge: [A Rely-Guarantee Proof System for x86-TSO](http://www.tom-ridge.com/www_resources/doc/ridge10vstte.pdf), VSTTE 2010.
- Sevcik, Vafeiadis, Zappa Nardelli, Jagannathan, Sewell: [Relaxed-Memory Concurrency and Verified Compilation](https://www.cl.cam.ac.uk/~pes20/CompCertTSO/doc/paper.pdf), POPL 2011.
- Simner, Armstrong, Pichon-Pharabod, Pulte, Grisenthwaite, Sewell: [Relaxed virtual memory in Armv8-A](https://link.springer.com/chapter/10.1007/978-3-030-99336-8_6), ESOP 2022. Section 3 has a good discussion of the design issues of relaxed memory.

## Appendix: Chapter 2 of [TAMP]

We quickly compare some of Chapter 2 of [TAMP] with what we have done on mutual exclusion before. Back then, we used Promela and Spin to automatically verify correctness properties. [TAMP] instead uses Java.

Figure 2.4 of [TAMP] (the LockOne algorithm). This is essentially the same as our `mutex.try1.pml`, but now in Java.

```java=
class LockOne implements Lock {
    private volatile boolean[] flag = new boolean[2]; // thread-local index, 0 or 1

    public void lock() {
        int i = ThreadID.get();
        int j = 1 - i;
        flag[i] = true;
        while (flag[j]) {} // wait
    }

    public void unlock() {
        int i = ThreadID.get();
        flag[i] = false;
    }
}
```

Fig. 2.5 of [TAMP] (the LockTwo algorithm). This is what we called `mutex.try2.pml`.

```java=
class LockTwo implements Lock {
    private volatile int victim;

    public void lock() {
        int i = ThreadID.get();
        victim = i;            // let the other go first
        while (victim == i) {} // wait
    }

    public void unlock() {}
}
```

Fig. 2.6 of [TAMP] (the Peterson lock algorithm).

```java=
class Peterson implements Lock {
    // thread-local index, 0 or 1
    private volatile boolean[] flag = new boolean[2];
    private volatile int victim;

    public void lock() {
        int i = ThreadID.get();
        int j = 1 - i;
        flag[i] = true;  // I am interested
        victim = i;      // you go first
        while (flag[j] && victim == i) {} // wait
    }

    public void unlock() {
        int i = ThreadID.get();
        flag[i] = false;
    }
}
```
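The listings above use TAMP's `ThreadID` helper, which is not part of the JDK, so to run them locally you need to supply it yourself. A minimal sketch (my own stand-in, assuming exactly two threads that each call `get()`) could look like this:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal stand-in for TAMP's ThreadID helper: hands out indices
// 0, 1, ... in the order in which threads first call get(), and
// remembers each thread's index in a thread-local variable.
class ThreadID {
    private static final AtomicInteger nextID = new AtomicInteger(0);
    private static final ThreadLocal<Integer> threadID =
            ThreadLocal.withInitial(nextID::getAndIncrement);

    public static int get() {
        return threadID.get();
    }
}
```

One Java subtlety worth keeping in mind when reading the listings: declaring an array `volatile` makes the array *reference* volatile, not its elements, so strictly speaking `flag[i] = true` is not a volatile write. This is one more reason to run such programs and observe their behavior rather than trust the declarations.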