# Assignment 8: Parallelism & Concurrency This is a **pair or solo Gradescope assignment**—you will submit a README and the following OCaml files: - `sort.ml` - `sortfromfile.ml` - `sorttest.ml` - `bank.ml` :::info Recall that assignments, if completed as a pair, must be completed **together** rather than dividing work. See the syllabus for more. ::: # AI Policy For the ==programming== on this assignment, you may use a wide range of resources, including generative AI if you so desire, subject to the following restrictions: 1. You should not submit any code that you do not understand. I may ask to have an in-person grading meeting to discuss your submission. 2. You must describe in detail in your README which parts of your submission used which resources, including AI. 3. For any ==written== questions, text must be entirely in your own words. You should check information you cite against authoritative sources (e.g., the language documentation or tutorials). # Summary This assignment has 3 parts, and an optional extra credit section at the end. - Part 1: Use `domainslib` tasks and `async`/`await` to implement a parallel sorting algorithm. - Part 2: Concurrency and mutual exclusion for a bank model. - Part 3: Compare and contrast related constructs in different languages. - (Optional, extra credit) Read a seminal research paper on language support for threads. ### Change log I will avoid making changes to the assignment after it is posted, but any necessary changes will be added here: # Setup :::success Install Domainslib with: ``` opam install domainslib ``` ::: The [stencil code](https://drive.google.com/file/d/1sSVTRIUis012etpYw3muKWFZbiwBC-94/view?usp=sharing) is set up as a `dune` OCaml project. # Part 1: Parallel sort :::success Your task in this part is to implement and evaluate a parallel sorting implementation on OCaml records. ::: :::warning For this problem, you should not use `List.sort`: you must implement a sorting algorithm yourself. You may otherwise use any standard OCaml library functions, and may look up external sources as long as you cite them. ::: You are provided with a file of records, with 1 record per line. A record includes a name (a string of lowercase letters) and an ID (an integer) separated by a comma. An example line would be `alexa,42`. Your sorting function should use `String.compare` on record names as the primary key for comparing records, with IDs as the secondary key (for tie-breaking). Choose one of the following 2 sorting algorithms to implement (or message me on Zulip to propose an alternative sorting algorithm): 1. [Mergesort](https://en.wikipedia.org/wiki/Merge_sort): you have already implemented in Racket in Assignment 1. 2. [Quicksort](https://en.wikipedia.org/wiki/Quicksort): more difficult to parallelize, but interesting to try if you have never implemented it. In practice, it's typically the fastest sorting algorithm for randomized data, and is the basis for most library sorting implementations. The stencil code is set up to create a `sortfromfile.exe` program executable that reads in a file representing a list of records. The `sortfromfile.exe` program executable takes 3 arguments: 1. The [path](https://en.wikipedia.org/wiki/Path_(computing)) to file of records, where each line of the file takes the form `<string>,<int>` (for example `alexa,42`). 2. The number of domains `k` to use for sorting (an integer). 3. The minimum size of a list to parallelize (an integer). ``` Usage: sortfromfile.exe <file> <k> <min_par_size> ``` :::info Dune provides a single command, `exec`, that builds and then runs an executable. `dune exec` takes the executable name, then the characters `--`, then any command line arguments to the executable. For example, to run the stencil code from the command line with 1 thread and a minimum parallel size of 1, run: ``` $ dune exec ./sortfromfile.exe -- 10_000_records.txt 1 1 ``` Note that with the initial stencil code, this is expected to fail with: ``` Fatal error: exception Failure("line_to_record: not implemented") ``` ::: :::success First, complete the code to read in records from a file (and print results to standard out) in `sortfromfile.ml`. Raise `InvalidInput` when needed (e.g., if a line cannot be parsed as `<string>,<int>`). Then, implement a sequential version of your chosen algorithm in `sort.ml`. The sequential version should ignore the later parameters. Add several additional smaller, manual tests of `sort` in `sorttest.ml`. Test with `dune test`. Once you are passing smaller tests, run your sequential code on the 4 provided `_records.txt` files. ::: Next, parallelize your code! :::success In `sort.ml`: - Use `Domainslib` tasks and `async`/`await` to parallelize across the provided number of threads, `k`. - As with the `fib` example from class, you should not attempt to parallelize subproblems if the size is under some limit, here called `min_par_size`. For initial testing, I suggest setting `min_par_size` to just 2. ::: Finally, do some informal measurements of performance. You can use either the `time` function from the class notes, the Unix [`time` utility](https://en.wikipedia.org/wiki/Time_(Unix)) on MacOS/Linux, or [Measure-Command](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/measure-command?view=powershell-7.5) on Windows. :::warning Note that typically, for rigorous performance measurements, we'd want to run the same computation many times and take an average. This is a good best practice for actual experiments, to account for variability in timing, noise, etc. For the purpose of this assignment, just running a few times and choosing one of them is sufficient to get a rough estimate. ::: For the `time` Mac/Linux utility, you'll want to use the "real" entry, which corresponds to the "wall clock" time of how much time passed. The `time` function from class and `Measure-Command` both report real/wall clock time already. For performance measurement, raise the minimum parallel size (start with 100, then experiment to adjust this). :::info Note: for performance testing, do not include the time to print records in your measurements (you can either temporarily comment it out, or add logic to disable it based on a boolean argument). ::: :::success **Written Q1**: Experiment with different options for `k` for the largest file. List the runtimes for at least 4 different values of `k` (and list the`min_par_size` you used). On your computer, which option for `k` provides the fastest runtime? **Written Q2**: Fix `k` at the time you found in **Q1**. Measure the runtime for each of the provided test files. Do you expect the trend between runtime and input size to be linear? Why or why not? **Written Q3**: Explain how you decided where to insert calls to `await`. Justify your choices. ::: --- # Part 2: Concurrent bank The file `bank.ml` contains a simple model for a bank with accounts. A `bank` record contains a mutable list of accounts. Each `account` contains a `cell`, which implements a mutable balance for that account. The provided code is currently **unsafe**: it does not correctly coordinate access to shared resources across multiple spawned domains. Initially, the bank only contains a single account. The function `stress_test` runs multiple `deposit` and `withdrawal` calls on the single account. Your first task is to demonstrate that the current implementation is broken. The key issue is that the `deposit` and `withdraw` operations are not *atomic*. An operation is *atomic* if no other parallel execution should interleave with its intermediate steps. Clients expect that a `deposit` occurs as a single atomic event that—*all at once*—reads the current balance, adds to it, and updates it. However, the `UnsafeAccount` implements `deposit`/`withdraw` using distinct `cell_get` and `cell_put` operations on the underlying `cell`. If multiple client threads are interacting with the same account, the underlying `cell` operations by one client threads might happen in between the underlying `cell` of another client thread. Modify the arguments to the `stress_test` function to show this issue resulting in an incorrect balance calculation. Due to nondeterminism in the program, you may need to run a few times to find an incorrect result. :::info Run the code from the command line by running: ``` dune exec ./bank.exe ``` ::: :::success **Written Q4**: In your README, describe the arguments you changed and why the current implementation produces an incorrect balance calculation. Provide a diagram of which lines of code could execute over time to cause a problem (with time flowing downward). For example, a start might be: ``` Domain 1 Domain 2 line 31: cell_get line 31: cell_get line 32: + ``` ::: Next, add logic to **safely** allow multiple domains/threads to withdraw or deposit from an account. You may edit the record definitions. :::success Edit the `account` record and the code that uses it such that there is still as much parallel logic as possible, but with no risk of an incorrect balance. ::: Finally, the bank record currently has code to add and delete account, but no concurrency logic. To check if your new logic works, you'll want to add additional stress testing. :::success First, add a new function, `stress_test_accounts`, and call it in addition to the existing stress test. The goal is to show that without adding coordination logic to `bank`, parallel access is again unsafe. To provide an interesting test, `stress_test_accounts` will need to be able to call `delete` on accounts that already exist. The exact strategy for doing this is up to you, but you may want to initialize the bank with a list of known accounts before kicking off parallel tasks. ::: Next, your goal is to allow multiple domains to safely add and remove accounts concurrently (again, minimizing the amount of logic that must happen sequentially). :::success Edit the `bank` record and the code that uses it to allow multiple threads to add and delete accounts. Be sure to allow as much logic as possible to run in parallel. **Written Q5**: Using any online resource(s) of your choosing, look up the difference between "coarse-grained" and "fine-grained" locking. Which is your code using here? ::: --- # Part 3: Language constructs :::info Provide at least a few sentences for each question. Cite every source you referenced below each answer in your README. ::: Using any online resource(s) of your choosing, read about the `async`/`await` constructs in Javascript. :::success **Written Q6**: How are `async`/`await` in Javascript similar to the constructs with the same name in OCaml? **Written Q7**: How are `async`/`await` in Javascript different than the constructs with the same name in OCaml? ::: Using any online resource(s) of your choosing, read about the `synchronized` keyword in Java. :::success **Written Q8**: Explain the Java `synchronized` keyword in your own words. Relate it to the constructs we learned in class. Would you consider it higher-level or lower-level than the OCaml constructs we used? Why? ::: # Extra credit: Threads paper *Up to +5 extra credit*. Read the seminal 2005 research paper [*Threads cannot be implemented as a library*](https://cs.nyu.edu/~mwalfish/classes/14fa/ref/boehm05threads.pdf). - Summarize the main point of this paper (in your own words) in one paragraph. - `domainslib` is a "library" that provides thread-like abstractions for OCaml. Is this a contraction to the research paper? - What constraints must programming language designers keep in mind when considering support for parallelism/concurrency? # Submission Submit the following **5 files** on **Gradescope**: :::success - `sort.ml` - `sortfromfile.ml` - `sorttest.ml` - `bank.ml` - `README` as either a `txt`, `md`, or `pdf` file, with **Written Questions** and - Roughly how long did this assignment take you (in hours)? - Which was the most challenging part of this assignment? - Which resources did you find to be the most useful? :::