owned this note
owned this note
Published
Linked with GitHub
# 2 Pthreads Programming
###### tags: `SS2021-IN2147-PP`
## Threading Basics
### What is a Thread?
* Own PC
* Own Stack
#### Hardware threads
* `Implementation` of an execution stream in hardware
* `Separate Control Unit`
#### Software threads
* `Programming abstraction` that represents a stream of execution
* Seen by programmer
#### Hyperthreading / SMT
* Useful for **`regular concurrent`** workloads
* For parallel programming, this is **`problematic`**
* Multiple instances of same program with `same resource requirements`
* Heavy impact on speedup
* Need to be careful on what to schedule on a HT/SMT
* **`Still could be useful`** for background tasks
* System daemons
* I/O operations
### Software Threading Basics
* **`Traditional`** view
* OS maintain processes
* `Processes` get scheduled to available `hardware threads`
* Each process has `one execution stream`
* Processes maintain **`isolation`** for protection
* **`Separate`** address spaces and files
* Coupled with user IDs
* Communication only via IPC
* Threading was intended to **`make this easier`**
* **`Sharing of data`** without protection boundaries
* Cooperative **`concurrency`** to support **`asynchronous`** behavior
* OS still responsible for scheduling
### The Linux Clone call
```c
int clone(
int (*fn)(void *),
void *child_stack,
int flags,
void *arg,
... /* pid_t *ptid, void *newtls, pid_t *ctid */
);
```
* Shared call for process and thread creation
* Threads and processes are scheduled by the same scheduler
### User-Level Threads
* Using user-level library to implement threads
* Maintain own `PC` and `stack`
* Switch between threads as needed
* `No kernel support` or modifications necessary
* Advantages
* **Easier to implement** and support
* **Lighter weight**
* Disadvantages
* No scheduling guarantees
* Preemption and progress is very hard to guarantee, if not impossible
* Often **requires explicit yield()** calls required
* Use cases
* **Light-weight**, fine-grained task based parallel programming
* Closely coordinated activities with **well-define switchpoints**
### Hybrid Models (e.g., Solaris)

### The Windows View of the World
* A process has:
* Memory space
* File descriptors
* Network interfaces
* A process contains 1+ threads
* Thread has: `Stack` and `PC`
* Fibers
* **User-level threads**
* User scheduled and cannot be preempted
* Hierarchical compared to main threads
:::info
Light weight process (LWP):
* 介於 user-level thread 和 kernel-level thread 的資料結構,用來提供一個 virtual process 讓應用程式 thread 做排程,支援多個 User-level Thread 對應到一個 kernel thread,屬於 **kernel-level thread**。
:::
### Recommended Reading
* [OS - Ch4 多執行緒 Multithread Programming | Mr. Opengate](https://mropengate.blogspot.com/2015/01/operating-system-ch4-multithread.html)
* [纖程 - 维基百科,自由的百科全书](https://zh.wikipedia.org/wiki/%E7%BA%96%E7%A8%8B)
## The POSIX Thread API
### How to Program with Threads?
* **Common APIs**
* POSIX Threads
* Win32 Threads
* Solaris Threads
* Java Threads
* **`user-level threads`**
* **Custom APIs**
* Many custom/research packages
* Often mapped to native APIs
* Often **`user-level threads`**
* Lower overhead
* for particular tasks
* Custom hardware with special properties
### POSIX Threading
* Portable Operating System Interface
* POSIX threads: IEEE Std 1003.1c-1995
* Programming with pthreads
```c
// On some systems compile with „-lpthreads“
#include <pthread.h>
```
### Fork/Join Model

#### POSIX Thread Create
```c
// Create a new Pthread
// Returns 0 if successful
int pthread_create(
pthread_t *thread,
const pthread_attr_t *attr,
void *(*start_routine) (void *),
void *arg
);
```
#### POSIX Thread Join
```c
// Waits for another thread to terminate
// Returns 0 if successful
int pthread_join(pthread_t thread, void **retval);
// Attributes:
// Set of properties defining thread behavior
// Examples: bound/unbound, scheduling policy, ...
```
#### POSIX Threads - Details
```c
// Get own thread ID
pthread_self()
// Compare two thread IDs
pthread_equal(t1,t2)
// Run a particular function once in a process
pthread_once(ctrl, fct)
```
### StackStructure

## Thread Synchronization
### Synchronization Between Threads
* Goal
* Common completion of tasks
* Happens-before relationships
* Guard updates to common data structures
* 2 concepts
* Locks / Mutual Exclusion
* Condition variables
### Concept 1: Mutual Exclusion
#### POSIX Thread Locks
```c
// type
pthread_mutex_t
// Initialization
PTHREAD_MUTEX_INITIALIZER
pthread_mutex_init/destroy()
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
pthread_mutex_trylock(&mutex);
```
#### Lock Implementations
* Spin-Locks
* Actively wait by “spinning” on a flag
* Advantage: **`fast`** response time
* Disadvantage: **`blocks the hardware threads`**, uses resources, costs energy
* **Generally bad for concurrent executions**
* Yielding Locks
* Yield hardware thread
* Advantage: **`low resource usage`**
* Disadvantage: **`slow`** response time
* **Generally bad for HPC**
* With Hardware Support
* Atomic memory operations
* Test and set
```c
volatile int *mutex;
void lock() {
while ( test_and_set(&mutex)==1 );
}
```
* Compare and swap
* Without Hardware Support
* Peterson’s Algorithm:
```c
// Still assumes some atomicity and memory consistency
volatile int flag intent_mutex[2] = {0,0};
volatile int turn;
intent_mutex[tid]=1; /* thread wants to enter */
turn = 1-tid; /* other thread is next */
/* if lock is taken and not my turn, wait */
while (intent_mutex[1-tid] && turn==1-tid);
/* critical section */
intent_mutex[tid]=0;
```
#### Lock Granularity
* Coarse grained locking

* Fine grained locking

* **Hybrid versions** of Coarse and Fine often useful!
#### Deadlocks
* Cyclic dependency

* Strategies for Deadlock Avoidance
* Easy option: only hold **one lock at a time**
* One lock for whole table?
* **Central arbiter** to ask for permission
* Need central instance
* Complex
* **Order among all locks** and only allocate in that order
* Not fairness (one philosopher may starve)
* Solution
* Custom schemes
* Requesting forks from neighbors
* Preference to starving processes
### Concept 2: Condition Variables
#### Condition Variables in the POSIX Thread API
```c
// type
pthread_cond_t
// Initialization
PTHREAD_COND_INITIALIZER
pthread_cond_init/destroy()
// Mutex must be locked before calling
pthread_cond_wait(&condition,&mutex);
// Must be followed by an unlock
pthread_cond_signal(&mutex);
pthread_cond_broadcast(&mutex);
```
### Performance Aspects for Threading
#### Overheads
* **Thread** creation and destruction: **expensive**
* Ensure large parallel regions
* Or “park” threads: threads pool
#### Lock contention
* low overhead: few threads try access the lock
* Overhead grows: many threads access the lock
#### Thread pinning (固定)
* The OS does scheduling
* Mapping of SW to HW thread
* Determines the location of a thread’s execution
* **Large impact on performance**
* NUMA properties
* Pinning, fixing a SW thread to a (group of) HW thread(s)
* Thread attributes, libraries like `libNUMA` (see `man numa`)
#### Good use of caches, avoidance of false sharing
* False Sharing:
* The parallel data is on the same cache line.
* Though different threads do not modify the same data, the cache still needs synchronization due to the modification of the the same cache line.