# General-Purpose Graphics Processor Architectures

###### tags: `GPUs`

![](https://i.imgur.com/Tjzsdze.png =150x200)

![](https://i.imgur.com/fB7m1WH.png)

[TOC]

## Hardware Structure

### Computation Accelerator

- No more free lunch
    - Transistor sizes keep shrinking ↘
    - Clock frequencies improve only slowly as devices become smaller
- Find new ways to accelerate
    - Minimize data movement
    - Introduce complex operations that perform multiple arithmetic operations at once
    - Avoid accesses to large memories
- Accelerator trade-off
    - Flexibility vs. efficiency
    - Balance point: GPU
        - GPUs are Turing complete

### GPU Hardware Basics

- CPU and GPU work together
    - Why
        - There is no good direct I/O path to the GPU
        - Programs rely on the OS running on the CPU
    - An API hides the complexity of transferring data from the CPU to the GPU
    - The CPU initiates computation and passes data to the GPU
- Typical systems
    - Integrated CPU / GPU
        - Share a single DRAM memory
        - Low power
    - System with a discrete GPU
        - Transferring data
            - Over a bus (PCIe)
            - Data is orchestrated on the CPU, then moved to the GPU
            - NVIDIA Unified Memory
        - CPU DRAM memory
            - DDR
            - Optimized for low-latency access
        - GPU device memory
            - GDDR
            - Optimized for high throughput
- GPU architecture
    - A generic modern GPU architecture
    - Core
        - Executes a SIMD program (kernel)
        - Runs thousands of threads
        - Threads
            - Communicate through scratchpad memory
            - Synchronize with fast barrier operations

## Programming Model

## Computing Core Architecture

## Memory System
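The GPU architecture notes say that threads on a core communicate through scratchpad memory and synchronize with fast barrier operations. As a rough illustration of that pattern (a CPU-side simulation, not real GPU code), the Python sketch below has one simulated thread block sum its inputs: a shared list stands in for the on-chip scratchpad, and `threading.Barrier` stands in for the hardware barrier. The function name `block_reduce_sum` and the power-of-two block size are assumptions made for this sketch.

```python
import threading


def block_reduce_sum(values):
    """Simulate one thread block summing `values` via a shared scratchpad.

    Assumption for this sketch: one simulated thread per value, and the
    block size (len(values)) is a power of two.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "block size must be a power of two"

    scratchpad = [0] * n            # stands in for on-chip scratchpad memory
    barrier = threading.Barrier(n)  # stands in for a block-wide barrier

    def thread_body(tid):
        # Each thread writes only its own scratchpad slot.
        scratchpad[tid] = values[tid]
        # Wait until every slot is written before anyone reads a neighbor's.
        barrier.wait()

        # Tree reduction: halve the number of active threads each step.
        stride = n // 2
        while stride > 0:
            if tid < stride:
                scratchpad[tid] += scratchpad[tid + stride]
            # Every thread (active or not) waits, so all threads call
            # barrier.wait() the same number of times and none deadlocks.
            barrier.wait()
            stride //= 2

    threads = [threading.Thread(target=thread_body, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return scratchpad[0]  # thread 0's slot holds the block's total


if __name__ == "__main__":
    print(block_reduce_sum(list(range(8))))  # 0 + 1 + ... + 7
```

Without the barrier calls, a thread could read a neighbor's slot before it had been written or updated. On an actual GPU the same pattern would use on-chip shared memory and a hardware barrier instruction (in CUDA, for example, a `__shared__` array and `__syncthreads()`), which is far cheaper than synchronizing through large off-chip memory.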