# 9: Concurrency (and Bonus Generics) ###### tags: `Tag(sp22)` ## Logistics * Many of you asked about testing your REPL. If you've got a generic REPL, then unit testing is a lot easier: * write tests that register new commands; * use dependency injection to provide the REPL with a custom input stream that lets you test it without needing to enter input at the keyboard. * Note some changes to Sprint 2 requirements (read the EdStem post)! #### Reading: Effective Java re: Concurrency Chapter 11 of [Effective Java](https://www-oreilly-com.revproxy.brown.edu/library/view/effective-java/9780134686097/?ar) will be useful to you when you start to program with concurrency in Java. Only _Item 78_ (the first in Chapter 11) is required reading. Please do read it; concurrency bugs are some of the most frustrating and time-consuming to encounter. Note especially that there are multiple things that can go wrong in a simple data access: even if only one thread is _writing_ (unlike our example today) synchronization may still be needed to ensure that other threads _read_ the updated value consistently. Continue to think _defensively_ as you use concurrency and parallelism. #### Optional Reference Save these for use later: [The "Gang of Four" patterns book](https://www-oreilly-com.revproxy.brown.edu/library/view/design-patterns-elements/0201633612/?ar) discusses, in detail, many more patterns than we can cover in lecture. [Angelika Langer's Java Generics FAQ](http://www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html) is a wonderfully detailed reference for Java generics. ## Motivating Concurrency Let's go back to the example we used to motivate the proxy pattern: the hours queue dispatcher. You can find a cleaner version of the example [here](https://github.com/tnelson/cs32-livecode/tree/main/src/main/java/edu/brown/cs32/spring22/Mar01). If we run the `main` method in the `Main` class of that example, we'll get: ``` Dispatcher: Welcome to TA hours! Today we're discussing Concurrency Dispatcher: Hi Nim; you'll be seen by Galen Galen says: Hello Nim! Galen says: Goodbye Nim, I hope that helped!! Dispatcher: Hi Alice; you'll be seen by Jenny Jenny says: Hello Alice! Jenny says: Goodbye Alice, I hope that helped!! Dispatcher: Hi Bob; you'll be seen by Galen Galen says: Hello Bob! Galen says: Goodbye Bob, I hope that helped!! Dispatcher: Hi Charli; you'll be seen by Jenny Jenny says: Hello Charli! Jenny says: Goodbye Charli, I hope that helped!! Dispatcher: Hi Boatswain; you'll be seen by Galen Galen says: Hello Boatswain! Galen says: Goodbye Boatswain, I hope that helped!! Dispatcher: Hi Bucky; you'll be seen by Jenny Jenny says: Hello Bucky! Jenny says: Goodbye Bucky, I hope that helped!! Dispatcher: Nobody waiting in queue, will check again in three seconds. ``` By itself, this doesn't look too bad. But what happens if we watch the timing of these events? ``` 11:27:21 Dispatcher: Welcome to TA hours! Today we're discussing Concurrency 11:27:21 Dispatcher: Hi Nim; you'll be seen by Galen 11:27:21 Galen says: Hello Nim! 11:27:22 Galen says: Goodbye Nim, I hope that helped!! 11:27:22 Dispatcher: Hi Alice; you'll be seen by Jenny 11:27:22 Jenny says: Hello Alice! 11:27:24 Jenny says: Goodbye Alice, I hope that helped!! 11:27:24 Dispatcher: Hi Bob; you'll be seen by Galen 11:27:24 Galen says: Hello Bob! 11:27:25 Galen says: Goodbye Bob, I hope that helped!! 11:27:25 Dispatcher: Hi Charli; you'll be seen by Jenny 11:27:25 Jenny says: Hello Charli! 11:27:27 Jenny says: Goodbye Charli, I hope that helped!! 11:27:27 Dispatcher: Hi Boatswain; you'll be seen by Galen 11:27:27 Galen says: Hello Boatswain! 11:27:29 Galen says: Goodbye Boatswain, I hope that helped!! 11:27:29 Dispatcher: Hi Bucky; you'll be seen by Jenny 11:27:29 Jenny says: Hello Bucky! 11:27:30 Jenny says: Goodbye Bucky, I hope that helped!! 11:27:30 Dispatcher: Nobody waiting in queue, will check again in three seconds. 11:27:33 Dispatcher: Nobody waiting in queue, will check again in three seconds. ``` This doesn't look so great. We have some problems to solve. If you look at the code, you might see other problems too. Here are three: * Challenge 1: the dispatcher is waiting for _an individual TA_ to finish helping a student before allocating the next TA. Lots of TAs will be idle, and students will wait a lot longer. * Challenge 2: maybe we'd like to add TAs while the dispatcher is running. * Challenge 3: how can new students join the queue? All of these problems are related to today's topic: _concurrency_: * We'd like TAs to be able to help students without holding up the dispatcher from allocating other students to other TAs. _We need the dispatcher to run concurrently with the TAs._ * We need a way for the `Main` class to call `dispatcher.addTA()` after the dispatcher starts running. But right now, there's no way for the `Main` class to run independently of the dispatcher. _We need the dispatcher to run concurrently with the `main()` method._ * We need a way for new students to be added to the queue. This is the same problem as above! Concerns like these pop up all the time in engineering. The way I've written the dispatcher library is just wildly unrealistic! ## How does concurrency help us? A program is _concurrent_ if it involves multiple simultaneous threads of execution. Note that this doesn't necessarily mean that these multiple threads really are running at the same time, just that they are "executing". We will make a distinction in this class between concurrency and _parallelism_, where threads really are executing at the same time, usually on multi-core hardware or in the cloud. A parallel program is always concurrent, but a concurrent program may or may not actually be parallel. To illustrate this idea, consider what your computer is doing right now. You're probably running more programs than you can count, even before you think about what your operating system is doing. How many CPU cores do you have? Probably not more than a dozen (and likely fewer). So not all of the concurrency that's happening can be parallelized: you'd need hundreds, or thousands of cores for that! Instead, your operating system runs a _scheduler_ program which allocates slices of time to different threads of execution. Concurrency is more common than you might imagine. Because we're working with Java, _every_ program you write is concurrent, even a "Hello World!" program. Why? <details> <summary><B>Think, then click!</B></summary> The garbage collector! </details> ### Using concurrency to get what we want Imagine logically splitting our big program into separate, independent "threads" of execution: one that runs the dispatcher, another that runs when a TA helps a student, and so on. We just need to separate them from each other, and help them communicate. So far we've only had one thread: the one that starts up in our `main` method. How do we get another, and which should it correspond to? <details> <summary><B>Think, then click!</B></summary> There are a few options. But let's start simply, and not try to solve _all_ the challenges at once. We'll have every TA correspond to their own thread, and have those threads woken up by the dispatcher. </details> ### Runnables and Threads Java has an interface called `Runnable`, which requires the implementation of a `run()` method. It's also got a class called `Thread`, which has a constructor that accepts a `Runnable` and a method called `start()`. So, as a first cut, let's make `TA` implement `Runnable`, and whenever we dispatch a student to that TA, we run the thread. ```java public class TA implements Runnable { // ... public void seeStudent(Student student) throws TABusyException { if(helping != null) throw new TABusyException(); helping = student; new Thread(this).start(); // NOT the same as .run() } // ... @Override public void run() { // help the student (simulate by pausing for 1.5 seconds -- we're VERY EFFICIENT) System.out.println(timestamp()+" "+name+ " says: Hello "+helping+"!"); try { Thread.sleep(1500); } catch (InterruptedException e) { System.out.println("Oops -- terminated!"); // not a great message System.exit(1); } System.out.println(timestamp()+" "+name+ " says: Goodbye "+helping+", I hope that helped!!"); helping = null; } } ``` Now, when we run, we'll see: ``` 11:45:32 Dispatcher: Welcome to TA hours! Today we're discussing Concurrency 11:45:32 Dispatcher: Hi Nim; you'll be seen by Galen 11:45:32 Dispatcher: Hi Alice; you'll be seen by Jenny 11:45:32 Galen says: Hello Nim! 11:45:32 Jenny says: Hello Alice! 11:45:33 Jenny says: Goodbye Alice, I hope that helped!! 11:45:33 Galen says: Goodbye Nim, I hope that helped!! 11:45:33 Dispatcher: Hi Bob; you'll be seen by Jenny 11:45:33 Dispatcher: Hi Charli; you'll be seen by Galen 11:45:33 Jenny says: Hello Bob! 11:45:33 Galen says: Hello Charli! 11:45:35 Galen says: Goodbye Charli, I hope that helped!! 11:45:35 Jenny says: Goodbye Bob, I hope that helped!! 11:45:35 Dispatcher: Hi Boatswain; you'll be seen by Galen 11:45:35 Dispatcher: Hi Bucky; you'll be seen by Jenny 11:45:35 Galen says: Hello Boatswain! 11:45:35 Jenny says: Hello Bucky! 11:45:35 Dispatcher: Nobody waiting in queue, will check again in three seconds. 11:45:36 Jenny says: Goodbye Bucky, I hope that helped!! 11:45:36 Galen says: Goodbye Boatswain, I hope that helped!! 11:45:38 Dispatcher: Nobody waiting in queue, will check again in three seconds. 11:45:41 Dispatcher: Nobody waiting in queue, will check again in three seconds. ``` Much better! ### Concurrency vs. Parallelism (Again) The `TA` thread and the main thread really are running separately. It's not clear whether they are truly running _in parallel_, though: the operating system and Java runtime decide that, in part based on how many cores the hardware has available. ## What Could Go Wrong? Concurrency seems really powerful. But are there any risks associate with it? Let's investigate. Suppose we want to record how many students have been seen before the dispatcher terminates. One natural way to do this is by adding a `static` counter to the dispatcher class: ```java static int studentsSeen = 0; ``` Now, every `TA` can increment this counter in its `run` method, when the student has been helped: ```java HoursDispatcher.studentsSeen += 1; ``` If we tell the dispatcher to print this counter out, we'll see output like this at the end of the queue: ``` 11:56:10 Dispatcher: Nobody waiting in queue, will check again in three seconds. So far we helped 6 students. ``` And indeed that's what we see. But let's see how this works _at scale_. Instead of using these names, we'll create a hundred TAs, and a few thousand students who need to be helped! To keep things simple, we'll just have one queue of students. ```java List<Student> oneQueue = new ArrayList<>(); final int numStudents = 30000; final int numTas = 1000; for(int i = 1;i<=numStudents; i++) oneQueue.add(new Student("Student"+i)); HoursDispatcher dispatcher = new HoursDispatcher(oneQueue.iterator(), "Concurrency 2"); for(int i = 1;i<=numTas;i++) dispatcher.addTA(new TA("TA"+i), 120); dispatcher.dispatch(); ``` So that we can simulate helping so many students without waiting, let's reduce the delay time to help a student: students will be helped instantaneously! We'll also remove the printing in the `TA` class, since that slows things down. We might see something like this: ```12:10:52 Dispatcher: Nobody waiting in queue, will check again in three seconds. So far we helped 299993 students.``` Uh oh. What's going on? (By the way, this issue might be less likely to happen if we left the printing in.) <details> <summary>Think, then click!</summary> I've made the classic _thread safety_ mistake. Incrementing that counter isn't _atomic_: two threads might be trying to edit it at once. Suppose they both fetch the current value at once, add 1 to that value, and then write: the counter will only be incremented once! </details> </br> This sort of issue is _pervasive_ in multi-threaded programming. Do the reading---you'll save yourselves a lot of pain. ## How Can We Fix This? The first approach is old-school synchronization: ```java synchronized (HoursDispatcher.class) { HoursDispatcher.studentsSeen += 1; } ``` This will tell Java that only one `TA` can be running that increment operation at a time. (The argument to `synchronized` helps disambiguate between multiple dimensions of synchronization we might have happening). Another approach is to use _thread safe objects_ from Java's standard library. In particular, I could have used an `AtomicInt` for the counter: ```java static AtomicInteger studentsSeen = new AtomicInteger(0); ``` ```java HoursDispatcher.studentsSeen.incrementAndGet(); ``` Both of these approaches fix the problem. ## More Concurrency Tips <A name="tips"></A> There are many, many ways to manage concurrency in Java and other languages. Your lab this week very briefly introduces the _Future/Promise_ approach, made simpler in JavaScript via the `asynch` and `await` keywords. We'll talk more about this next time, when we talk about concurrency in JavaScript. There's another file, [ThreadPoolDemo.java](https://github.com/tnelson/cs32-livecode/blob/main/src/main/java/edu/brown/cs32/fall21/Oct12Prep/ThreadPoolDemo.java), in the livecode repo that demonstrates another way of working with threads in Java that's very well suited to applications where you need to carve up a lot of work (which you hope to parallelize across multiple cores) or don't yet know how much work you have to do. We won't cover it in lecture, but it could be a useful example for you on your projects. ## Generics (Part The Second) ## Making Sense of Generics: the LSP <a name="lsp"></a> Let's start with a puzzle about subtyping. We know that `Object` is a supertype of `Number`, which is a supertype of `Integer`. One thing this buys us is that we can plug in something with `Number` type anywhere an `Object` is expected, or pass an `Integer` to a method that accepts `Numbers`. How does this work with generic types? Here's a method: ```java static Number sumAll(Collection<Number> numbers) { int sum = 0; for(Number num : numbers) { sum += num.intValue(); // "may involve rounding or truncation" } return sum; } ``` ### Which types work to store the result of `sumAll`? The method by itself typechecks just fine. Now let's call `sumAll`. We'll start by giving it an empty collection of numbers, and saving the result to a variable. _What types of variable can we store the return value of `sumAll` in?_ Concretely, which of the 3 assignments below do you expect to produce a type error? ```java Collection<Number> someNums = new HashSet<>(); Number aNumber = sumAll(someNums); // ? Integer anInteger = sumAll(someNums); // ? Object anObject = sumAll(someNums); // ? ``` <details> <summary><B>Think, then click!</B></summary> `Number` and `Object` both work OK, but `Integer` produces a type error. This is perhaps what we expected: after all, that's how subtyping works in languages like Java! The type system doesn't have enough info to know that the return value is always an `Integer` (although it _is_), and so it forces us to use variables of a less-specific type. </details> <BR/> ### Which types work as arguments to `sumAll`? Now suppose we have 3 `Collection` objects we could pass in to `sumAll`. Keeping in mind the result of the last experiment, which of these do you expect to work? ```java Collection<Number> someNums = new HashSet<>(); Collection<Integer> someInts = new HashSet<>(); Collection<Object> someObjects = new HashSet<>(); sumAll(someNums); // ? sumAll(someInts); // ? sumAll(someObjects); // ? ``` <details> <summary><B>Think, then click!</B></summary> Only `someNums`, the `Collection<Number>`, works. Both of the others produce a type error. In spite of the fact that `Integer` _is_ a subtype of `Number`, `Collection<Integer>` is not a subtype of `Collection<Number>`. The same goes for `List`, `Set`, and so on. </details> <BR/> ### Wait, what? Why do you think this is the case? Is Java's type checker just bad? One way to explore whether or not this behavior is reasonable is to experiment. Suppose that Java had let us use a `Collection<Integer>` as a subtype of `Collection<Number>`. Can you write a program that would then produce a run-time type error? (Hint: you don't need more than a 2 or 3 lines; you don't even need to use the `someAll` function.) <details> <summary><B>Think, then click!</B></summary> Here's one: ```java someNums = someInts; // the problematic line someNums.add(Math.PI); // adding a Number to a collection of Numbers for(int i : someInts) { System.out.println(Integer.numberOfLeadingZeros(i)); } ``` </details> <BR/> This is why Java does what it does. But sometimes we really do want to accept "a set of any kind of number" without knowing in advance exactly which type it is. And this is where generics become a little bit more complex. Before we start, I want to cover a rule that can _really_ help clear up confusion about generic types. It's called the Liskov Substitution Principle or LSP. You can look up the full LSP if you want, but here I'm going to put a particular spin on it: > If you're able to safely use an object of type $T$ someplace, you should also be able to safely use an object of type $S$, where $S$ is a subtype of $T$. This is the guiding principle that the above example violates, and it's worth keeping in mind as you work with generics in the future. ## Type variables and Wildcards <a name="wildcards"></a> Let's get more concrete. What if we were trying to write the type for a sorting function? All we'd like to depend on is that elements of the type are comparable to other elements of that type: ```java public static <T extends Comparable<T>> void sortSomeRecords(List<T> lst) ``` Hopefully the above example motivates why we needed the type variable: _any_ kind of `Comparable` will do, so long as it can compare against its own type. Java also allows _wildcard_ type variables, which are written `?`. A wildcard represents a type variable that won't be used elsewhere, so doesn't require a unique name. But we couldn't have used a wildcard here, since we needed `T` both to say what type to compare against, and to label the argument. Java's standard library sorting function, however, uses both a type variable and a wildcard: ```java public static <T extends Comparable<? super T>> void sort(List<T> list) ``` This allows `T` to implement comparisons against any of its supertypes. (By the way, the bit declaring `T` isn't part of the return type; it's just a note to the type checker.) ### What a variable means Here are some things to try: ```java someNums.add(1); // setup (works) someNums.add(1.5); // setup (works) Collection< ? extends Number> someNums2 = new HashSet<>(); // ? someNums2.add(1); // ? someNums2.add(1.5); // ? someNums2.add(null); // ? someNums2 = someNums; // ? ``` What's going on? <details> <summary><B>Think, then click!</B></summary> `? extends Number` does not mean "a list of objects which all extend `Number`". It means "a list of objects of _one single type_, where that type extends number". We can assign `someNums` to `someNums2` because `someNums` is a collection of `Number` (which fits into the wildcard). But we cannot add an `Integer` to `someNums2`, not even if we try to typecast. Why? Because we could have assigned `someInts` to `someNums2` instead, or `someDoubles` or `someFloats`! Java has no guarantee (at least, not without a far more sophisticated and time-consuming code-crawl) that it will be safe to add an integer to whatever `someNums2` references. </details> <BR/>