# 2 Pthreads Programming ###### tags: `SS2021-IN2147-PP` ## Threading Basics ### What is a Thread? * Own PC * Own Stack #### Hardware threads * `Implementation` of an execution stream in hardware * `Separate Control Unit` #### Software threads * `Programming abstraction` that represents a stream of execution * Seen by programmer #### Hyperthreading / SMT * Useful for **`regular concurrent`** workloads * For parallel programming, this is **`problematic`** * Multiple instances of same program with `same resource requirements` * Heavy impact on speedup * Need to be careful on what to schedule on a HT/SMT * **`Still could be useful`** for background tasks * System daemons * I/O operations ### Software Threading Basics * **`Traditional`** view * OS maintain processes * `Processes` get scheduled to available `hardware threads` * Each process has `one execution stream` * Processes maintain **`isolation`** for protection * **`Separate`** address spaces and files * Coupled with user IDs * Communication only via IPC * Threading was intended to **`make this easier`** * **`Sharing of data`** without protection boundaries * Cooperative **`concurrency`** to support **`asynchronous`** behavior * OS still responsible for scheduling ### The Linux Clone call ```c int clone( int (*fn)(void *), void *child_stack, int flags, void *arg, ... /* pid_t *ptid, void *newtls, pid_t *ctid */ ); ``` * Shared call for process and thread creation * Threads and processes are scheduled by the same scheduler ### User-Level Threads * Using user-level library to implement threads * Maintain own `PC` and `stack` * Switch between threads as needed * `No kernel support` or modifications necessary * Advantages * **Easier to implement** and support * **Lighter weight** * Disadvantages * No scheduling guarantees * Preemption and progress is very hard to guarantee, if not impossible * Often **requires explicit yield()** calls required * Use cases * **Light-weight**, fine-grained task based parallel programming * Closely coordinated activities with **well-define switchpoints** ### Hybrid Models (e.g., Solaris) ![](https://i.imgur.com/egSNtPu.png) ### The Windows View of the World * A process has: * Memory space * File descriptors * Network interfaces * A process contains 1+ threads * Thread has: `Stack` and `PC` * Fibers * **User-level threads** * User scheduled and cannot be preempted * Hierarchical compared to main threads :::info Light weight process (LWP): * 介於 user-level thread 和 kernel-level thread 的資料結構,用來提供一個 virtual process 讓應用程式 thread 做排程,支援多個 User-level Thread 對應到一個 kernel thread,屬於 **kernel-level thread**。 ::: ### Recommended Reading * [OS - Ch4 多執行緒 Multithread Programming | Mr. Opengate](https://mropengate.blogspot.com/2015/01/operating-system-ch4-multithread.html) * [纖程 - 维基百科,自由的百科全书](https://zh.wikipedia.org/wiki/%E7%BA%96%E7%A8%8B) ## The POSIX Thread API ### How to Program with Threads? * **Common APIs** * POSIX Threads * Win32 Threads * Solaris Threads * Java Threads * **`user-level threads`** * **Custom APIs** * Many custom/research packages * Often mapped to native APIs * Often **`user-level threads`** * Lower overhead * for particular tasks * Custom hardware with special properties ### POSIX Threading * Portable Operating System Interface * POSIX threads: IEEE Std 1003.1c-1995 * Programming with pthreads ```c // On some systems compile with „-lpthreads“ #include <pthread.h> ``` ### Fork/Join Model ![](https://i.imgur.com/Y0xvoOR.png =200x) #### POSIX Thread Create ```c // Create a new Pthread // Returns 0 if successful int pthread_create( pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg ); ``` #### POSIX Thread Join ```c // Waits for another thread to terminate // Returns 0 if successful int pthread_join(pthread_t thread, void **retval); // Attributes: // Set of properties defining thread behavior // Examples: bound/unbound, scheduling policy, ... ``` #### POSIX Threads - Details ```c // Get own thread ID pthread_self() // Compare two thread IDs pthread_equal(t1,t2) // Run a particular function once in a process pthread_once(ctrl, fct) ``` ### StackStructure ![](https://i.imgur.com/OJTPZfg.png =400x) ## Thread Synchronization ### Synchronization Between Threads * Goal * Common completion of tasks * Happens-before relationships * Guard updates to common data structures * 2 concepts * Locks / Mutual Exclusion * Condition variables ### Concept 1: Mutual Exclusion #### POSIX Thread Locks ```c // type pthread_mutex_t // Initialization PTHREAD_MUTEX_INITIALIZER pthread_mutex_init/destroy() pthread_mutex_lock(&mutex); pthread_mutex_unlock(&mutex); pthread_mutex_trylock(&mutex); ``` #### Lock Implementations * Spin-Locks * Actively wait by “spinning” on a flag * Advantage: **`fast`** response time * Disadvantage: **`blocks the hardware threads`**, uses resources, costs energy * **Generally bad for concurrent executions** * Yielding Locks * Yield hardware thread * Advantage: **`low resource usage`** * Disadvantage: **`slow`** response time * **Generally bad for HPC** * With Hardware Support * Atomic memory operations * Test and set ```c volatile int *mutex; void lock() { while ( test_and_set(&mutex)==1 ); } ``` * Compare and swap * Without Hardware Support * Peterson’s Algorithm: ```c // Still assumes some atomicity and memory consistency volatile int flag intent_mutex[2] = {0,0}; volatile int turn; intent_mutex[tid]=1; /* thread wants to enter */ turn = 1-tid; /* other thread is next */ /* if lock is taken and not my turn, wait */ while (intent_mutex[1-tid] && turn==1-tid); /* critical section */ intent_mutex[tid]=0; ``` #### Lock Granularity * Coarse grained locking ![](https://i.imgur.com/TsfQfbR.png =300x) * Fine grained locking ![](https://i.imgur.com/7ylZJFU.png =300x) * **Hybrid versions** of Coarse and Fine often useful! #### Deadlocks * Cyclic dependency ![](https://i.imgur.com/UKVrSjA.png) * Strategies for Deadlock Avoidance * Easy option: only hold **one lock at a time** * One lock for whole table? * **Central arbiter** to ask for permission * Need central instance * Complex * **Order among all locks** and only allocate in that order * Not fairness (one philosopher may starve) * Solution * Custom schemes * Requesting forks from neighbors * Preference to starving processes ### Concept 2: Condition Variables #### Condition Variables in the POSIX Thread API ```c // type pthread_cond_t // Initialization PTHREAD_COND_INITIALIZER pthread_cond_init/destroy() // Mutex must be locked before calling pthread_cond_wait(&condition,&mutex); // Must be followed by an unlock pthread_cond_signal(&mutex); pthread_cond_broadcast(&mutex); ``` ### Performance Aspects for Threading #### Overheads * **Thread** creation and destruction: **expensive** * Ensure large parallel regions * Or “park” threads: threads pool #### Lock contention * low overhead: few threads try access the lock * Overhead grows: many threads access the lock #### Thread pinning (固定) * The OS does scheduling * Mapping of SW to HW thread * Determines the location of a thread’s execution * **Large impact on performance** * NUMA properties * Pinning, fixing a SW thread to a (group of) HW thread(s) * Thread attributes, libraries like `libNUMA` (see `man numa`) #### Good use of caches, avoidance of false sharing * False Sharing: * The parallel data is on the same cache line. * Though different threads do not modify the same data, the cache still needs synchronization due to the modification of the the same cache line.