# Operating System Concepts, Ch. 4: Threads & Concurrency
# Overview
- **Thread**: A basic unit of CPU utilization, comprising a thread ID, a program counter, a register set, and a stack.
- Threads can be used to parallelize CPU-intensive workloads. Before threads, creating a separate process per task was the common approach, at much higher cost.
- There are several benefits to using threads:
- **Responsiveness**: A lengthy task does not block the process.
- **Resource Sharing**: Threads under the same process share many resources by default.
- **Economy**: In general, threads consume fewer resources (creation overhead, memory, context-switch cost, ...) than process-based methods.
- **Scalability**: Multiple threads can utilize multiple cores.
## Multithreading Models
There are **user threads** and **kernel threads** in most contemporary OSs. Their relationship can be determined in one of the following ways.
- **Many-to-One**: Many user threads mapped to one kernel thread. One blocking call blocks the entire process, and only one thread can access the kernel at a time, so there is no true parallelism. Few systems still use this model.
- **One-to-One**: Can utilize parallelism, but creating a kernel thread per user thread burdens the system, so the number of user threads may be limited.
- **Many-to-Many**: User threads are multiplexed onto a smaller or equal number of kernel threads. More flexible than the previous two, though most contemporary systems use one-to-one instead.
- **Two-Level**: A combination of many-to-many and one-to-one.
## Thread Libraries
- **Pthreads**: A POSIX thread library that may be provided at either user level or kernel level. To be precise, Pthreads is an IEEE specification, not an implementation.
- **Windows**: A kernel-level library.
- **Java**: Typically implemented by the JVM on top of the host system's thread library.
There are two general strategies for thread creation:
- **Asynchronous**: The parent is allowed to resume concurrent execution. Typically, little data are shared.
- **Synchronous**: Parent only resumes control after all its children terminate. Typically involves significant data sharing.
## Implicit Threading
- The job of creating and managing hundreds, if not thousands, of threads can be transferred to compilers and run-time libraries.
- Developers can annotate **tasks** that can be parallelized.
- **Thread Pool**: An application of the pool-of-objects technique, used to alleviate the creation overhead and unbounded thread growth of the one-thread-per-request web server model.
- Servicing a request with an existing thread is faster than creating a new one.
- The max number of threads is limited.
- Task creation is separated from task execution, so a task can, for example, be scheduled after a delay or to repeat at an interval.
- The number of pooled threads can be manually determined, or dynamically adjusted.
- Windows and Java provide thread pool utilities.
- An implicit **Fork-Join** model is also possible: the library decides how many threads to create. Examples include quicksort, mergesort, and array summation.
- **OpenMP** identifies **parallel regions** (blocks of code that can run in parallel) and transforms them into parallel code, guided by compiler directives.
- **Grand Central Dispatch (GCD)** is a macOS and iOS technology. Tasks are scheduled on a dispatch queue and assigned to an available thread.
- **Serial Queue**: Every process has one, and can create extra ones. Tasks in this queue are serially executed.
- **Concurrent Queue**: Tasks can be executed concurrently. These are system-wide queues.
- The GCD thread pool is composed of POSIX threads; GCD actively manages the pool, growing and shrinking it as needed.
- **Intel Threading Building Blocks (TBB)** is a template library in C++. Developers specify tasks that can run in parallel. Its scheduler provides load-balancing and cache-awareness.
## Threading Issues
- **Forking** a multithreaded process may duplicate all threads, or only the calling thread; some systems provide both variants. If `exec` is called immediately after forking, duplicating only the calling thread suffices, since `exec` replaces the entire process anyway.
- **Signals** are used to notify processes of events, either synchronously (delivered to the same process that caused the signal) or asynchronously (generated by an event external to the process).
- They should be handled by either a default signal handler, or a user-defined one.
- Deciding which thread(s) a signal should be delivered to is a delicate problem, though it is trivial in the synchronous case: the signal goes to the thread that caused it.
- Threads can be **cancelled** (e.g., when several threads race to search and one finds the result, or when a user aborts an operation). Cancellation can be asynchronous (performed immediately by another thread) or deferred (the target periodically checks whether it should terminate and exits in an orderly manner).
- When cancelled, the resources a thread holds may or may not be reclaimed by the OS; cleanup handlers can compensate for this in code. Asynchronous cancellation is therefore discouraged in Pthreads.
- Threads may have their own **local storage**, global in scope but private to each thread. `gcc` provides **Thread-Local Storage (TLS)** via the `__thread` storage-class keyword.
- In many-to-many and two-level schemes, the thread library and kernel may need to communicate (e.g., to adjust the number of kernel threads dynamically).
- For that, an intermediate data structure **Lightweight Process (LWP)** is adopted. Each LWP is attached to a kernel thread. Kernel threads are actually scheduled to run on physical devices by the OS.
- To applications, LWPs are virtual processors that users can schedule their threads to run on.
- If any user thread running on an LWP blocks, the other user threads bound to that LWP are affected as well.
- An application needs as many LWPs as the number of blocking operations it must sustain simultaneously.
- The kernel may notify applications about certain events with an **upcall** (e.g. when a thread is about to block). Thread libraries can provide **upcall handlers**.
## Operating System Examples
- **Windows**: Uses one-to-one scheme for user threads.
- **Linux**: Does not explicitly distinguish between processes and threads. The `clone` call can specify how much sharing happens between the parent and the created task (process).