Lecture 16: Parallelism/Concurrency
---

:::warning
Announcements
- **Please sit in the first 2 rows of the room**.
- Quiz 2 grades likely later this week
- A7: Modularity due Thursday
- Next help hours:
    - Mine 6:35-8:30pm, L037
    - Grace's 1-2pm Wednesday, Zoom
:::

:::success
Agenda for today
- Parallelism/concurrency
    - Goals/non-goals
    - Parallelism vs. concurrency, vocab
    - Summary of landscape
- Promises
- OCaml parallelism with `domainslib`
:::

# Parallelism/Concurrency

For the next topic, we'll do a brief introduction to *parallelism and concurrency* and how they impact programming languages. Almost every modern programming language has support for concurrency. While the exact mechanics differ, understanding the core concepts will help you pick up concurrent programming in new languages in the future!

In CS251, we only have about a week to cover this topic, so we'll be explicit about our goals and non-goals:

## Goals

- Familiarity with high-level ideas (enough that you can recognize key concepts when you see them in other languages).
- Non-sequential thinking (a break from programming languages as "one step after another").
- Some mid-to-high-level mechanisms (writing simple concurrent/parallel programs in OCaml).

## Non-goals

- Detailed performance engineering/measurement (this would require more than a week to cover).
- Deep parallel programming proficiency (again, not enough time!).
- An exhaustive survey of models and mechanisms.
- ... see CS240, CS340, and CS343, as well as courses at MIT and elsewhere!

# Parallelism vs. concurrency

Both "parallelism" and "concurrency" eliminate one big assumption: that evaluation happens as a specific sequence of ordered steps.

There are many use cases where we want a computer to do ==more than one thing at once==! For example:
- When you press Control-C in the terminal to interrupt a program that's already running.

:::success
With your table, discuss other examples of "computer(s) doing multiple things at once".
:::

:::spoiler Think, then click!
- When you are listening to music and writing code.
- When you are taking notes on a tablet, but also want to receive push notifications.
- When you want to scroll on a website or app while it also receives data from the server over the network.
- Multiple users should be able to like an Instagram photo at once.
:::

People use both "concurrency" and "parallelism" to refer to these ideas, but there are subtle differences between the two concepts.

:::info
**Concurrency:** ==Coordinate== access to ==shared resources== across the same period of time.
:::

For example, one [CPU core](https://en.wikipedia.org/wiki/Central_processing_unit) can only perform one instruction at a time: Spotify and VSCode might need to coordinate access to one shared core.

![Screenshot 2025-04-06 at 8.46.03 PM](https://hackmd.io/_uploads/r1LUxjeCke.png)

:::info
**Parallelism:** Use ==more resources== to complete work faster.
:::

For example: high-performance computer chips that parallelize machine learning applications.

![Screenshot 2025-04-06 at 8.45.54 PM](https://hackmd.io/_uploads/S1aBxsgR1x.png)

How are these different? Often, concurrency is implemented via _interleaving_: the operating system rapidly switching between computations. This is likely what's happening when you are scrolling on a webpage while data also loads. There might be only one CPU core actually doing the computation to scroll and to load, but the OS (for example, iOS) switches back and forth so rapidly that you don't notice.
True *parallelism* is when the underlying system is capable of performing multiple actions literally simultaneously (for example, by having multiple CPU cores, as most modern laptops do). Most processors today are _multicore_. In distributed systems, computations are shared across servers that may not even be in the same location.

### Analogy

CS111 idea: a program is like a recipe for a cook. One cook does one thing at a time! (Sequential.)

#### Concurrency:

- Lots of cooks making different things, but only 4 stove burners.
- We want to allow access to all 4 burners, but not cause spills or incorrect burner settings.

#### Parallelism:

- Have lots of veggies to slice?
- Hire more chefs, and hand out potatoes and knives.
- But with too many chefs, you spend all your time coordinating!

:::success
Group exercise: determine whether the following involve primarily concurrency, primarily parallelism, or both.
1. The MapReduce algorithm run on 100 worker computers.
2. A bank processing two transactions to the same account in one day.
3. You having multiple tabs open in your phone browser.
4. Multiple CS251 students submitting A7 to Gradescope at the exact same time.
:::

:::spoiler Think, then click!
1. Primarily parallelism, since we are using more resources (multiple worker computers). There is also some concurrency involved in coordinating the reduction back to a combined answer.
2. Primarily concurrency. The bank's database is the shared resource.
3. Both parallelism and concurrency. Concurrency because the touch screen and phone processor are shared resources. Likely parallelism, too, because most smartphones contain multiple cores.
4. Both parallelism and concurrency. Concurrency because the Gradescope database is a shared resource. Parallelism because each student has their own computer that they are using to submit.
:::

# Common parallel/concurrent abstractions

Computer scientists have many shared abstractions that transcend individual programming languages. We'll start with _processes_ and _threads_.

## Processes

Processes are currently-running program executions. Processes run individual program executables: in our example, Spotify and VSCode must be running in different processes. One program (defined as a specific file or set of files) can be run in multiple processes at the same time.

## Threads

A _thread_ is a single sequential stream of computation. A process can contain multiple threads. For example, the Spotify process may contain multiple threads running at once: one for controlling the user interface, and several for loading data from the Spotify servers. Most desktop applications have multiple concurrent threads.

By default, all threads within a process share memory (their heaps), whereas processes do not share memory by default. We'll leave the details of processes to CS240 and later classes.

Multiple threads can run at once, either via interleaving or on parallel CPU cores. A *scheduler* (often, the operating system) handles scheduling across multiple threads.

Programs have a *main thread*, which is the thread that executes the main function/runs by default. The main thread can create new threads, which (appear to) run simultaneously with the main thread, as in the sketch below.
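To make that concrete, here is a minimal sketch using OCaml's low-level `Threads` library (an illustration, not something we'll use going forward; it assumes the `threads.posix` package is available in utop):

```ocaml
(* Minimal sketch of the low-level Threads library (illustration only).
   Thread.create starts a new thread running the given function on the
   given argument; Thread.join blocks until that thread finishes. *)
#require "threads.posix";;

let t = Thread.create (fun () -> print_endline "hello from a new thread") () in
Thread.join t;
print_endline "main thread continues after join" ;;
```

Managing raw threads like this gets tedious quickly, which is one reason languages layer nicer abstractions on top.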
# Promises

While threads and processes are the core concepts at the operating system level, most modern programming languages introduce higher-level abstractions on top of them.

In both functional programming languages and popular imperative languages such as JavaScript, a popular higher-level abstraction is a **promise** (also called a "future"). A promise/future represents a computation that may not have finished yet: it *promises* to eventually produce a value of a specified type in the *future*.

We can think of a promise as similar to a `ref` cell that can be mutated at most once. When a promise is created, it's like an empty cell that does not yet contain a value. Creating a promise creates the empty cell and kicks off the computation that will eventually fill the cell, but the sequential stream of execution continues even if the computation itself is not done.

Eventually, at some point in the future, the promise can be *fulfilled* when the computation completes and places a value in the cell. Alternatively, if an error happens during fulfillment, the cell can be filled with an exception instead of a normal value (which the OCaml book calls the promise being "rejected"). The book uses "resolved" for when the promise is either fulfilled or rejected. In either case, though, once the promise cell is filled, its contents cannot change (which is why we think of it as being mutated at most once).

What ties promises to parallel/concurrent computation is that a single program can have multiple promises being fulfilled at once. Once a promise resolves, the code may kick off additional sequential or concurrent computations that depend on its value.

This is the same core idea used by the popular `async`/`await` in modern JavaScript! In JavaScript, the `async` keyword causes a function to return a promise, and the `await` keyword waits for a promise to finish computing before proceeding with execution. Some OCaml libraries use these same names for a similar effect.

# Concurrency in OCaml

The ideas above are shared across programming languages, but we can now also dig into the OCaml specifics. OCaml has several different libraries that provide concurrency/parallelism. Prior to OCaml 5, OCaml did not support true multicore parallelism.

- The OCaml standard library provides the [`Threads` library](https://ocaml.org/manual/5.1/libthreads.html), which is quite low-level.
- The [OCaml Programming: Correct + Efficient + Beautiful book](https://cs3110.github.io/textbook/chapters/ds/promises.html) uses the `lwt` library for promises and concurrency.
- We'll use the new [`domainslib` library](https://ocaml.org/manual/5.3/parallelism.html), which is now recommended by the OCaml manual for higher-level parallelism.

:::success
Install `domainslib` with the following:
```
opam install domainslib
```
:::

# Parallelism with Domains

A domain is an OCaml unit of parallelism that maps 1-to-1 to an operating system thread. The OCaml manual recommends that programs not spawn more domains than the number of available cores.

:::info
Next class, we'll go over how to see how many cores a computer has.
:::

We can spawn a new domain, and code called within that domain will be run _in parallel_ with the main domain. To use domains:

```ocaml
#require "domainslib";;
open Domain;;
open Domainslib;;
```

To create a new domain, we use `Domain.spawn`.

```ocaml
# Domain.spawn ;;
- : (unit -> 'a) -> 'a t = <fun>
```

What is type `t`? A promise! A domain of type `'a t` eventually produces either a result of type `'a` (is fulfilled) or an exception (is rejected).
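For example, here's a quick sketch of a utop session (not from lecture; `Domain.join`, which waits for a promise's result, gets a proper introduction below, and your printed types may look slightly different):

```ocaml
# let p = Domain.spawn (fun () -> 21 * 2) ;;  (* kicks off the computation *)
val p : int Domain.t = <abstr>
# Domain.join p ;;   (* wait for the promise to be fulfilled, return its value *)
- : int = 42
```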
Let's show this with some printlines. If we run just:

```ocaml
Domain.spawn (fun () -> print_endline "printed from parallel domain") ;;
```

This spawns a new *domain* (conceptually running in a new thread), and runs the body of the thunk from within that domain. Conceptually:

```
Main domain                     Spawned domain
-----------                     ---------------
time ↓
     ↓
spawn ─────────────────────>    starts running concurrently
     ↓                          ↓
     ↓                          print "printed from parallel domain"
     ↓                          ↓
```

:::success
Why do we need to write `fun () ->` before the print line?
:::

:::spoiler Think, then click!
Because otherwise, OCaml's eager evaluation would evaluate the function argument to a value before ever passing it to the concurrent library call! (Think back to some of our `if` examples from the Racket part of the course.) We need to wrap the print in a *thunk* to delay its evaluation (that is, to let the library call handle executing the print on a separate domain/thread).
:::

When we run with two prints:

```ocaml
let _ = Domain.spawn (fun _ -> print_endline "printed from parallel domain") in
print_endline "printed from main domain" ;;
```

We see that despite the order in the code, the main message sometimes prints first! This is because the main domain/thread by default **does not wait** for the spawned domain to resolve.

The ordering of operations between concurrent domains/threads is **not specified** unless we explicitly force it to be: operations may be interleaved in any order (as long as sequential order **within** each domain/thread is respected). We could have either of these conceptual outcomes:

```
Main domain                     Spawned domain
-----------                     ---------------
time ↓
     ↓
spawn ─────────────────────>    starts running concurrently
     ↓                          ↓
     ↓                          ↓
     ↓                          print "printed from parallel domain"
     ↓                          ↓
print "printed from main domain"
     ↓
     ↓
```

Or:

```
Main domain                     Spawned domain
-----------                     ---------------
time ↓
     ↓
spawn ─────────────────────>    starts running concurrently
     ↓                          ↓
print "printed from main domain" ↓
     ↓                          ↓
     ↓                          ↓
     ↓                          print "printed from parallel domain"
     ↓                          ↓
```

In general, code run from two concurrent domains or threads is *nondeterministic*: any possible order of events *between* domains is possible. We can change this by using other concurrency mechanisms to coordinate between concurrent executions. In this case, we can force a specific ordering by adding a `join`.

```ocaml=
let task = Domain.spawn (fun _ -> print_endline "printed from parallel domain") in
let _ = Domain.join task in
print_endline "printed from main domain" ;;
```

`join` waits for the promise to resolve before it continues with sequential execution. Conceptually:

```
Main domain                     Spawned domain
-----------                     ---------------
time ↓
     ↓
spawn ─────────────────────>    starts running concurrently
     ↓                          ↓
     ↓                          ↓
join: wait until promise fulfilled
     ↓                          ↓
     ↓                          print "printed from parallel domain"
     ↓                          promise fulfilled, domain done
join stops waiting!
     ↓
print "printed from main domain"
```

Here, because the main domain calls `join` on the `task` promise before its second printline, we always get the following order of prints:

```
printed from parallel domain
printed from main domain
```

Because the main domain cannot reach its `print_endline` until the task domain has resolved!

:::success
Does this mean the code is no longer concurrent?
:::

:::spoiler Think, then click!
Not quite. There is still concurrency between the spawned domain code and whatever code happens in the main domain between line 1 and the `join` call being made. In the original example, nothing observable happens there, but there are still briefly two parallel domains running simultaneously. For a variant where the concurrency *is* observable, see the sketch below.
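A sketch (a variant of the example above, not run in class): the main domain does observable work between `spawn` and `join`, so the first two prints run concurrently and may appear in either order, while the final print always comes last.

```ocaml
(* Variant sketch: observable work in the main domain before join.
   The first two prints may appear in either order; the last is always last. *)
let task = Domain.spawn (fun _ -> print_endline "printed from parallel domain") in
print_endline "printed from main domain, before join" ;
Domain.join task ;
print_endline "printed from main domain, after join" ;;
```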
:::

:::success
How does scope work with the spawned body?
:::

:::spoiler Think, then click!
We still have the same lexical scoping rules, with the same set of bindings that can reference the same data on the heap.
:::

For example, the spawned domain can mutate a ref cell created by the main domain! On-the-fly example of this:

```ocaml
utop # let x = ref 5 in
let p = Domain.spawn (fun () -> x := 3) in
print_endline (string_of_int !x);
Domain.join p;
print_endline (string_of_int !x) ;;
5
3
- : unit = ()
```

In this case, the main domain creates a ref cell that initially holds `5`. Concurrently, (1) the spawned domain attempts to update the contents of the ref cell to `3`, and (2) the main domain accesses and prints the cell contents. (1) and (2) could happen in either order. The main domain then calls `join` on the spawned promise, meaning the second access-and-print from the main domain must show the updated value of `3`.

We could also see the following output from the same program:

```ocaml
utop # let x = ref 5 in
let p = Domain.spawn (fun () -> x := 3) in
print_endline (string_of_int !x);
Domain.join p;
print_endline (string_of_int !x) ;;
3
3
- : unit = ()
```

This race between the first print and the spawned write disappears if we `join` before the first read, as sketched below.
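Here's that variant (a sketch, not run in lecture): joining first forces the spawned write to complete before either read, so the output is deterministic.

```ocaml
(* Sketch: join before reading, so both prints must show the updated value. *)
let x = ref 5 in
let p = Domain.spawn (fun () -> x := 3) in
Domain.join p;                      (* wait for the spawned write to finish *)
print_endline (string_of_int !x);   (* always prints 3 *)
print_endline (string_of_int !x) ;; (* always prints 3 *)
```

Next class, we'll see how to use promises to parallelize computations!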