# Operating System Concepts, Ch. 4: Threads & Concurrency

## Overview

- **Thread**: A basic unit of CPU utilization, comprising a thread ID, a program counter, a register set, and a stack.
- Threads can be used to parallelize CPU-intensive workloads. Before threads, creating a separate process per task was the common approach, at much higher overhead.
- There are several benefits to using threads:
  - **Responsiveness**: A lengthy task does not block the whole process.
  - **Resource Sharing**: Threads of a process share its resources (code, data, open files) by default.
  - **Economy**: Threads generally consume fewer resources (creation overhead, memory, context-switch cost) than process-based approaches.
  - **Scalability**: Multiple threads can run in parallel on multiple cores.

## Multithreading Models

Most contemporary OSs distinguish **user threads** from **kernel threads**. Their relationship can be set up in one of the following ways:

- **Many-to-One**: Many user threads are mapped to one kernel thread. One blocking call blocks all threads, and only one thread can access the kernel at a time. Few systems still use this model.
- **One-to-One**: Can exploit parallelism, but every user thread requires a kernel thread, which burdens the system; the number of user threads may therefore be limited.
- **Many-to-Many**: User threads are multiplexed onto a smaller or equal number of kernel threads. More flexible than the previous two, though most contemporary systems use one-to-one instead.
- **Two-Level**: A combination of many-to-many and one-to-one.

## Thread Libraries

- **Pthreads**: A POSIX library that can be implemented at either user level or kernel level. To be precise, Pthreads is an IEEE specification, not an implementation.
- **Windows**: A kernel-level library.
- **Java**: Typically implemented on top of the host system's thread library (by the JVM).

There are two general strategies for thread creation:

- **Asynchronous**: The parent resumes execution concurrently with its children. Typically, little data is shared.
- **Synchronous**: The parent resumes only after all its children terminate. Typically involves significant data sharing.
## Implicit Threading

- The job of creating and managing hundreds, if not thousands, of threads can be transferred to compilers and run-time libraries.
- Developers annotate **tasks** that can be parallelized; the library decides how to run them.
- **Thread Pool**: An instance of the pool-of-objects technique, used to curb both the thread-creation overhead and the unbounded thread count of the naive multithreaded web-server model.
  - Servicing a request with an existing thread is faster than creating a new one.
  - The maximum number of threads is bounded.
  - The task is separated from its execution: for example, one can schedule a task after a delay, or repeat it periodically.
  - The number of pooled threads can be fixed manually or adjusted dynamically.
  - Windows and Java both provide thread-pool utilities.
- An implicit **Fork-Join** model is also possible, where the library decides how many threads to create. Divide-and-conquer workloads such as Quicksort, Mergesort, or array summation fit this model.
- **OpenMP** identifies **parallel regions** (blocks of code that can run in parallel) and transforms them into parallel code. Compiler directives guide the transformation.
- **Grand Central Dispatch (GCD)** is a macOS & iOS technology. Tasks are placed on a dispatch queue and assigned to an available thread.
  - **Serial Queue**: Every process has one, and can create extra ones. Tasks in a serial queue execute one at a time.
  - **Concurrent Queue**: Tasks may execute concurrently. These are system-wide queues.
  - The GCD thread pool is composed of POSIX threads; GCD actively and dynamically manages this pool.
- **Intel Threading Building Blocks (TBB)** is a C++ template library. Developers specify tasks that can run in parallel; the TBB scheduler provides load balancing and cache awareness.

## Threading Issues

- **Forking** a process may duplicate all threads, or only the thread that called `fork`. If `exec` is called immediately afterwards, duplicating all threads is unnecessary, since `exec` replaces the entire process anyway.
- **Signals** notify processes of events, either synchronously (delivered to the process that caused the signal) or asynchronously (otherwise).
  - A signal is handled by either a default signal handler or a user-defined one.
- To which thread(s) a signal should be delivered is a delicate problem, though it is trivial in the synchronous case: the signal goes to the thread that caused it.
- Threads can be **cancelled** (e.g. sibling threads racing for a result, or an operation being superseded). Cancellation is either **asynchronous** (immediate, triggered by another thread) or **deferred**, where the target thread checks for pending cancellation and terminates itself in an orderly manner.
- When a thread is cancelled, the OS may or may not reclaim its resources; cleanup can be arranged programmatically. For this reason, asynchronous cancellation is discouraged in Pthreads.
- Threads may have their own **local storage**: data that is global in scope but private to one thread. `gcc` provides **Thread-Local Storage (TLS)** via the `__thread` storage-class keyword.
- In the many-to-many and two-level schemes, the thread library and the kernel may need to communicate, e.g. to adjust the number of kernel threads dynamically.
  - For that, an intermediate data structure, the **Lightweight Process (LWP)**, is used. Each LWP is attached to a kernel thread; kernel threads are what the OS actually schedules onto physical processors.
  - To the application, an LWP is a virtual processor on which user threads can be scheduled to run.
  - If any user thread running on an LWP blocks, the other threads bound to that LWP are blocked as well.
  - An application needs as many LWPs as it has operations that must be able to block simultaneously without stalling each other.
- The kernel may notify the application of certain events with an **upcall** (e.g. when a thread is about to block), and thread libraries can provide **upcall handlers** to respond.

## Operating System Examples

- **Windows**: Uses the one-to-one scheme for user threads.
- **Linux**: Does not explicitly distinguish between processes and threads; the `clone` system call's flags specify how much the parent and the newly created task share.