# 1 Introduction

###### tags: `SS2021-IN2147-PP`

## Concurrency vs. Parallelism

![](https://i.imgur.com/HYiY3zB.png)

## Goals of Parallel Systems

* Reduction of **execution time**
* Increase in **problem size**
* Increased **extensibility** and **configurability**
* Possibly better **fault tolerance**

## Trend towards Multi-core Chips

![](https://i.imgur.com/opv8PY4.png)

## Performance Goal

* **Speedup**
$$
speedup(p\ processors) = \dfrac{performance(p\ processors)}{performance(1\ processor)}
$$
* **Scientific computing**: performance = work / time
$$
speedup(p\ processors) = \dfrac{time(1\ processor)}{time(p\ processors)}
$$
* **Speedup based on throughput**: performance = throughput = transactions / minute
$$
speedup(p\ processors) = \dfrac{tpm(p\ processors)}{tpm(1\ processor)}
$$
* **Parallel efficiency**
$$
efficiency(p\ processors) = \dfrac{speedup(p\ processors)}{p}
$$
* **Amdahl's Law**, where $T$ is the serial execution time and $f$ the fraction of the work that can be parallelized:
$$
T(p) = (1-f) \cdot T + \dfrac{f \cdot T}{p} \\
SU(p) = \dfrac{T}{T(p)} = \dfrac{T}{(1-f) \cdot T + \frac{f \cdot T}{p}} = \dfrac{1}{1-f+\frac{f}{p}}
$$

## Parallel Programming Schemes

### Multiple Concurrent Executions, a.k.a. Single Program Multiple Data (SPMD)

![](https://i.imgur.com/voF8fqT.png)

* Pros:
    * **limited orchestration overhead**
    * explicit mapping of the problem
* Cons:
    * data must be split explicitly
    * **possibility of load imbalance**

### Master/Worker

![](https://i.imgur.com/bUB3NKA.png)

* Pros:
    * simple approach
    * **easy load balancing**
* Cons:
    * extra orchestration **overhead**
    * memory requirements (at the master)
    * possibly more communication
    * scalability

### Pipelining

![](https://i.imgur.com/18zQ58D.png)

* Pros:
    * **use of specialized units**
* Cons:
    * more communication
    * limited parallelism

### Arbitrary Task Dependencies

![](https://i.imgur.com/x0oXWnG.png)

* Pros:
    * **most parallelism exposed**
* Cons:
    * **overhead**
    * complicated dependencies

### Data Parallelism vs. Functional Parallelism

#### Data parallelism (e.g., SPMD)

* same operations
* tasks are of the same length in most cases

#### Functional parallelism (e.g., pipelining)

* different calculations
* different functions or different code regions
* parallelism is usually modest
* tasks are of different lengths in most cases

## Higher-Level Parallel Programming Models

### Automatic parallelization

* the dream of parallel computing

### Loop abstractions

* **Decouple** the loop body from loop **scheduling**

### Domain-Specific Languages (DSLs)

* abstract data and code structures for **one domain**
* limited applicability
* specific frameworks

## Parallel Architectures

### Flynn's Classification

![](https://i.imgur.com/EY02PKZ.png)

### SIMD Systems

* Vector processing
* GPGPU processing

### MIMD Systems

#### Shared Memory – SM (multiprocessor)

* Shared address space
* Uniform Memory Access – UMA
    * centralized shared memory
    * all processors have the "same" latency
* Non-Uniform Memory Access – NUMA
    * local accesses are much faster
    * COMA (Cache-Only Memory Architecture)
    * NCC-NUMA (non-cache-coherent NUMA)
* Programming models: POSIX threads, OpenMP
    * global address space
    * communication
    * synchronization

#### Distributed Memory – DM (multicomputer)

* Private physical address space
* Messages
* Programming models: MPI, PVM
    * no global address space
    * independent nodes
    * messages via a network
    * synchronization

### Hybrid Programming

* OpenMP + MPI
* **Critical step: mapping**
    * Which task to execute where?
    * Increasing data locality
    * Minimizing communication
    * Resource contention
    * Memory usage
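
## Code Sketches

The short C sketches below illustrate some of the concepts above; the constants, problem sizes, and helper names in them are assumptions chosen for illustration, not part of the lecture material.

### Amdahl's Law

A minimal sketch that evaluates the Amdahl speedup and efficiency formulas from the Performance Goal section for an assumed parallelizable fraction `f = 0.9`:

```c
#include <stdio.h>

/* Amdahl's Law: T(p) = (1-f)*T + f*T/p  =>  SU(p) = 1 / ((1-f) + f/p),
 * where f is the parallelizable fraction of the serial runtime T. */
static double amdahl_speedup(double f, int p) {
    return 1.0 / ((1.0 - f) + f / (double)p);
}

int main(void) {
    const double f = 0.90;                 /* assumed parallelizable fraction */
    for (int p = 1; p <= 64; p *= 2) {
        double su  = amdahl_speedup(f, p);
        double eff = su / (double)p;       /* parallel efficiency = SU(p) / p */
        printf("p = %2d  speedup = %5.2f  efficiency = %4.2f\n", p, su, eff);
    }
    return 0;
}
```

Even with 90% parallel work, the serial 10% caps the speedup at 10, which is why efficiency drops quickly as `p` grows.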
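
### SPMD with MPI

A sketch of the SPMD scheme, assuming MPI as the programming model: every rank runs the same program on its own block of a global index range, so the data split is explicit; the last rank taking the remainder shows where load imbalance can creep in. Problem size and per-element work are assumptions.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Explicit data split: each rank owns one contiguous block of indices. */
    const long n = 1000000;                /* assumed global problem size */
    long chunk = n / size;
    long lo = (long)rank * chunk;
    long hi = (rank == size - 1) ? n : lo + chunk;   /* last rank takes the remainder */

    double local = 0.0, global = 0.0;
    for (long i = lo; i < hi; ++i)
        local += 1.0 / (double)(i + 1);    /* same operation on every rank's data */

    /* Combine the partial results; this is the only communication needed. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f (computed by %d ranks)\n", global, size);

    MPI_Finalize();
    return 0;
}
```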
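
### Master/Worker with MPI

A sketch of the Master/Worker scheme, assuming there are at least as many tasks as workers: rank 0 hands out task indices on demand (which is what makes load balancing easy, at the cost of extra orchestration traffic) while the other ranks loop until they receive a stop tag. `do_task` and the task count are placeholders.

```c
#include <mpi.h>
#include <stdio.h>

#define TAG_WORK 1
#define TAG_STOP 2

static double do_task(int task) {          /* placeholder for the real work */
    return (double)task * task;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int ntasks = 100;                /* assumed number of independent tasks */
    if (rank == 0) {                       /* master */
        int next = 0, active = size - 1;
        /* seed every worker with one task */
        for (int w = 1; w < size; ++w) {
            MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
            ++next;
        }
        /* hand out the remaining tasks as results come back
         * (results are received but not aggregated in this sketch) */
        while (active > 0) {
            double result;
            MPI_Status st;
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            int tag = (next < ntasks) ? TAG_WORK : TAG_STOP;
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, tag, MPI_COMM_WORLD);
            if (tag == TAG_WORK) ++next; else --active;
        }
    } else {                               /* worker */
        for (;;) {
            int task;
            MPI_Status st;
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            double r = do_task(task);
            MPI_Send(&r, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```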
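
### Pipelining with MPI

A sketch of the Pipelining scheme: each MPI rank acts as one specialized stage, receiving items from its predecessor and passing transformed items to its successor, so parallelism is bounded by the number of stages. The per-stage transformation and item count are arbitrary assumptions.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nitems = 8;                          /* assumed number of items */
    for (int i = 0; i < nitems; ++i) {
        double item;
        if (rank == 0)
            item = (double)i;                      /* stage 0 produces the item */
        else
            MPI_Recv(&item, 1, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        item = item * 2.0 + rank;                  /* this stage's transformation */

        if (rank < size - 1)
            MPI_Send(&item, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);
        else
            printf("item %d left the pipeline as %f\n", i, item);
    }
    MPI_Finalize();
    return 0;
}
```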
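
### Loop abstraction with OpenMP

For the loop-abstraction idea (decoupling the loop body from loop scheduling), a minimal OpenMP sketch: the body stays fixed, while the scheduling policy is chosen separately, here deferred to run time via `schedule(runtime)` and the `OMP_SCHEDULE` environment variable.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000;
    double a[1000];

    /* The loop body is untouched; the scheduling policy is selected
     * separately, at run time, through the OMP_SCHEDULE variable. */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n; ++i)
        a[i] = 0.5 * i;

    printf("a[%d] = %f (up to %d threads)\n", n - 1, a[n - 1],
           omp_get_max_threads());
    return 0;
}
```

For example, running with `OMP_SCHEDULE="dynamic,64"` switches to dynamic scheduling with chunk size 64 without recompiling the loop body.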
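
### Shared memory with POSIX threads

For the shared-memory (SM) class, a POSIX threads sketch: all threads communicate through a variable in the common address space and synchronize with a mutex. Thread and iteration counts are arbitrary.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static long counter = 0;                         /* shared data in the global address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&lock);               /* synchronization */
        ++counter;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; ++i)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(t[i], NULL);
    printf("counter = %ld (expected %d)\n", counter, NTHREADS * 100000);
    return 0;
}
```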
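
### Hybrid OpenMP + MPI

A sketch of hybrid programming: MPI maps coarse blocks of the iteration space onto processes (the mapping step that drives data locality and communication volume), and OpenMP parallelizes the loop within each process. Problem size and per-element work are assumptions.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 1 << 20;                     /* assumed global problem size */
    long chunk = n / size;
    long lo = (long)rank * chunk;
    long hi = (rank == size - 1) ? n : lo + chunk;

    double local = 0.0, global = 0.0;

    /* OpenMP covers the shared-memory level inside one MPI process. */
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; ++i)
        local += 1.0 / (double)(i + 1);

    /* MPI covers the distributed-memory level between processes. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f with %d ranks x %d threads\n",
               global, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```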