# General-Purpose Graphics Processor Architectures
###### tags: `GPUs`


[TOC]
## Hardware Structure
### Computation Accelerator
- No more free lunch
    - Transistor sizes keep shrinking ↘
    - Clock frequencies, however, improve only slowly as devices become smaller
- Find new ways to accelerate computation
    - Minimize data movement
    - Introduce complex operations that perform multiple arithmetic operations per instruction
    - Avoid accesses to large memories
- Accelerator trade-off
    - Flexibility vs. efficiency
    - Balance point: the GPU
        - GPUs are Turing complete: given enough time and memory, they can run any computation
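A rough Python sketch of why minimizing data movement matters. Assumption: memory traffic is counted as element reads plus writes to a large array in main memory, with no caches. Computing `y = a*x + b` in two passes stores and reloads an intermediate array, while a fused pass keeps the intermediate in a register. `unfused` and `fused` are hypothetical helper names.

```python
from typing import List, Tuple

def unfused(x: List[float], a: float, b: float) -> Tuple[List[float], int]:
    """Two passes: t = a*x is written to memory, then read back for t + b."""
    traffic = 0
    t = []
    for v in x:              # pass 1: read x[i], write t[i]
        t.append(a * v)
        traffic += 2
    y = []
    for v in t:              # pass 2: read t[i], write y[i]
        y.append(v + b)
        traffic += 2
    return y, traffic

def fused(x: List[float], a: float, b: float) -> Tuple[List[float], int]:
    """One pass: a*x stays in a register; only x is read and y is written."""
    traffic = 0
    y = []
    for v in x:              # read x[i], write y[i]
        y.append(a * v + b)
        traffic += 2
    return y, traffic

x = [float(i) for i in range(1000)]
y1, t1 = unfused(x, 2.0, 1.0)
y2, t2 = fused(x, 2.0, 1.0)
print(y1 == y2, t1, t2)      # True 4000 2000
```

The fused version does the same arithmetic with half the memory traffic. A fused multiply-add (FMA) instruction goes one step further by performing the multiply and the add as a single operation.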
### GPU Hardware Basics
- CPU and GPU work together
    - Why?
        - GPUs have no good direct access to I/O devices
        - Programs rely on the operating system running on the CPU
    - An API hides the complexity of transferring data from CPU to GPU
    - The CPU initiates the computation and passes data to the GPU
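The host-driven flow (the CPU sets up data, hands it to the GPU, starts the computation, and collects the results) can be modeled with a toy Python class. `ToyDevice`, its methods, and `vec_add` are hypothetical names for illustration, not a real GPU API. In CUDA the corresponding steps are roughly `cudaMalloc`, `cudaMemcpy`, and a `kernel<<<blocks, threads>>>(...)` launch.

```python
from typing import Callable, Dict, List

class ToyDevice:
    """Hypothetical stand-in for a discrete GPU with its own memory.

    Not a real API: it only mimics the host-driven sequence
    allocate -> copy in -> launch kernel -> copy out.
    """

    def __init__(self) -> None:
        self.memory: Dict[str, List[float]] = {}  # separate "device memory"
        self.bytes_copied = 0                      # bytes moved over the "bus"

    def malloc(self, name: str, n: int) -> None:
        self.memory[name] = [0.0] * n

    def copy_to_device(self, name: str, host: List[float]) -> None:
        buf = self.memory[name]
        if len(buf) != len(host):
            raise ValueError("size mismatch")
        buf[:] = host                              # copy values, do not alias the host list
        self.bytes_copied += 8 * len(host)         # assume 8-byte doubles

    def copy_to_host(self, name: str) -> List[float]:
        buf = self.memory[name]
        self.bytes_copied += 8 * len(buf)
        return list(buf)

    def launch(self, kernel: Callable[..., None], num_threads: int, *names: str) -> None:
        # Every thread runs the same kernel body with its own thread index.
        # A real GPU runs them in parallel; this toy runs them one by one.
        bufs = [self.memory[name] for name in names]
        for tid in range(num_threads):
            kernel(tid, *bufs)

def vec_add(tid: int, a: List[float], b: List[float], c: List[float]) -> None:
    """Kernel body: thread `tid` computes one output element."""
    if tid < len(c):                               # extra threads do nothing
        c[tid] = a[tid] + b[tid]

# Host (CPU) code orchestrates every step.
a_host = [1.0, 2.0, 3.0, 4.0]
b_host = [10.0, 20.0, 30.0, 40.0]
n = len(a_host)

gpu = ToyDevice()
for name in ("a", "b", "c"):
    gpu.malloc(name, n)
gpu.copy_to_device("a", a_host)
gpu.copy_to_device("b", b_host)
gpu.launch(vec_add, 8, "a", "b", "c")              # 8 threads for 4 elements
c_host = gpu.copy_to_host("c")
print(c_host, gpu.bytes_copied)                    # [11.0, 22.0, 33.0, 44.0] 96
```

Launching more threads than elements and guarding with `tid < len(c)` mirrors a common pattern, since launches are usually rounded up to a whole number of thread blocks.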
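- Typical systems
    - Integrated CPU / GPU
        - CPU and GPU share a single DRAM memory
        - Low power (e.g., mobile SoCs)
    - System with a discrete GPU
        - Data must be transferred between CPU and GPU memory
            - Over a bus (PCIe)
            - Data is orchestrated on the CPU, then moved to the GPU
            - NVIDIA Unified Memory: a single address space, with data migrated between CPU and GPU automatically
        - CPU DRAM memory
            - DDR
            - Optimized for low-latency access
        - GPU device memory
            - GDDR
            - Optimized for high throughput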
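A back-of-the-envelope estimate of why moving data over the bus matters on a discrete GPU. Assumptions: the bus is PCIe 3.0 x16 at its theoretical peak (8 GT/s per lane, 16 lanes, 128b/130b encoding, about 15.75 GB/s per direction), and the GDDR bandwidth of 448 GB/s is an illustrative value, not a specific product figure. Real transfers reach less than peak.

```python
def transfer_ms(num_bytes: float, bandwidth_gb_s: float) -> float:
    """Idealized time (ms) to move num_bytes at a peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3

# PCIe 3.0 x16, one direction: 8 GT/s/lane * 16 lanes * 128/130 encoding / 8 bits per byte
PCIE3_X16_GB_S = 8e9 * 16 * (128 / 130) / 8 / 1e9   # about 15.75 GB/s
# Assumed GDDR device-memory bandwidth (illustrative; varies widely by card)
GDDR_GB_S = 448.0

data_bytes = 1e9                                     # 1 GB working set
pcie_ms = transfer_ms(data_bytes, PCIE3_X16_GB_S)    # host -> device copy
gddr_ms = transfer_ms(data_bytes, GDDR_GB_S)         # one read pass on the GPU
print(f"PCIe copy {pcie_ms:.1f} ms, GDDR pass {gddr_ms:.1f} ms, ratio {pcie_ms / gddr_ms:.0f}x")
# PCIe copy 63.5 ms, GDDR pass 2.2 ms, ratio 28x
```

Copying the data once costs far more than one pass over it in device memory, which is why data is staged on the CPU and moved in bulk. On an integrated CPU/GPU sharing one DRAM, this copy is unnecessary.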
- GPU architecture
    - A generic modern GPU architecture
    - Cores (streaming multiprocessors, in NVIDIA terminology)
        - Execute a SIMD program (kernel) across thousands of threads
    - Threads on the same core
        - Communicate through a scratchpad memory
        - Synchronize with fast barrier operations
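Threads on one core cooperating through a scratchpad and barriers can be sketched with Python threads standing in for GPU threads. This is an illustrative assumption: Python threads are not SIMD lanes, and `threading.Barrier` models only the semantics of a hardware barrier, not its speed. The example is a tree reduction; `block_sum` is a hypothetical name. In CUDA the scratchpad would be a `__shared__` array and the barrier `__syncthreads()`.

```python
import threading
from typing import List

def block_sum(values: List[float]) -> float:
    """Sum `values` with one "thread block": one Python thread per element.

    Threads share a scratchpad list and meet at a barrier between the steps
    of a tree reduction. len(values) must be a power of two.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "power-of-two block size assumed"
    scratch = list(values)              # shared scratchpad, one slot per thread
    barrier = threading.Barrier(n)      # all n threads must arrive before any proceeds

    def thread_body(tid: int) -> None:
        stride = n // 2
        while stride > 0:
            if tid < stride:
                # Read a partner's partial sum; it was written before the last barrier.
                scratch[tid] += scratch[tid + stride]
            barrier.wait()              # this step's writes are done before the next step reads
            stride //= 2

    threads = [threading.Thread(target=thread_body, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return scratch[0]                   # thread 0's slot holds the total

print(block_sum([float(i) for i in range(16)]))  # 120.0
```

Every thread calls `barrier.wait()` the same number of times, even threads with no work in a given step; skipping the barrier in some threads would deadlock, which matches the rule that all threads of a block must reach `__syncthreads()`.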
## Programming Model
## Computing Core Architecture
## Memory System