DMA (Direct Memory Access) lets hardware copy data between memory and peripherals (or memory↔memory) without the CPU moving each byte. A small engine—the DMA controller—becomes a bus master, reads a descriptor (source, destination, length, options), moves the bytes in bursts, then raises an interrupt when done (or on error).

**Why use DMA?**
* Offloads the CPU: frees cycles for real work or lets it sleep.
* Higher throughput & lower latency jitter: burst transfers at bus speed.
* Energy efficient: fewer interrupts and context switches for large/continuous data.

**How it works (typical flow)**
1. CPU sets up a descriptor: source addr, dest addr, transfer size, increment modes, burst size, peripheral request, etc.
2. CPU starts the DMA channel.
3. Peripheral or timer triggers the DMA (e.g., UART RX not-empty, ADC end-of-conversion) or it runs immediately for mem↔mem.
4. DMA reads/writes the bus directly in bursts, updating addresses and counters.
5. On completion/half-transfer/error, DMA raises an interrupt; the ISR handles bookkeeping (e.g., advance a ring buffer).
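
The flow above can be sketched as a host-side software model. This is only an illustration: the `dma_desc_t` layout, the `dma_status_t` type, and `dma_model_run` are hypothetical names, not any vendor's register map or driver API. A real controller does this work in hardware and signals completion with an interrupt rather than a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor layout; real controllers define their own fields. */
typedef struct {
    const uint8_t *src;   /* source address */
    uint8_t *dst;         /* destination address */
    size_t length;        /* total bytes to move */
    bool src_increment;   /* false = fixed address, e.g. a peripheral data register */
    bool dst_increment;   /* false = fixed address */
    size_t burst_bytes;   /* bytes moved per burst */
} dma_desc_t;

typedef struct {
    size_t bursts;        /* number of bursts issued */
    bool complete;        /* stands in for the transfer-complete interrupt flag */
} dma_status_t;

/* Model of steps 2-5: move the data burst by burst, updating the
 * addresses and the remaining count, then flag completion. */
dma_status_t dma_model_run(const dma_desc_t *d)
{
    assert(d != NULL && d->burst_bytes > 0);

    dma_status_t st = { 0, false };
    const uint8_t *s = d->src;
    uint8_t *t = d->dst;
    size_t remaining = d->length;

    while (remaining > 0) {
        /* The last burst may be shorter than burst_bytes. */
        size_t n = remaining < d->burst_bytes ? remaining : d->burst_bytes;
        for (size_t i = 0; i < n; i++) {
            *t = *s;
            if (d->src_increment) s++;
            if (d->dst_increment) t++;
        }
        remaining -= n;
        st.bursts++;
    }
    st.complete = true;
    return st;
}
```

Setting `src_increment` to false models a peripheral→memory transfer, where every read comes from the same data register while the memory address advances.
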

**Common transfer types**
* Peripheral→Memory: e.g., ADC samples into a RAM buffer; UART RX into ring buffer.
* Memory→Peripheral: e.g., SPI TX from a frame buffer; DAC playback.
* Memory→Memory: fast memcpy/fill for large blocks (supported on some [MCUs](https://www.onzuu.com/category/microcontrollers)/SoCs).

Options you’ll see:
* Increment/Fixed address per side (peripherals often fixed).
* Burst size (beats per burst).
* Circular (ring) mode for continuous capture/playback.
* Scatter–gather (linked descriptors) for noncontiguous buffers—common in NICs, SD/PCIe, AXI DMA.
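
To make the scatter-gather option concrete, here is a minimal sketch of a linked descriptor chain walked in software. The `sg_desc_t` type and the `sg_total_length` and `sg_gather` functions are invented for illustration. Real engines (NICs, AXI DMA) use their own descriptor formats and walk the chain in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical linked descriptor: one contiguous fragment plus a link. */
typedef struct sg_desc {
    const uint8_t *src;          /* start of this fragment */
    size_t length;               /* fragment length in bytes */
    const struct sg_desc *next;  /* NULL terminates the chain */
} sg_desc_t;

/* Sum the fragment lengths along the chain. */
size_t sg_total_length(const sg_desc_t *head)
{
    size_t total = 0;
    for (const sg_desc_t *d = head; d != NULL; d = d->next) {
        total += d->length;
    }
    return total;
}

/* Gather all fragments into one contiguous destination.
 * Returns the number of bytes written, or 0 (writing nothing)
 * if the destination is too small for the whole chain. */
size_t sg_gather(const sg_desc_t *head, uint8_t *dst, size_t dst_size)
{
    assert(dst != NULL);

    size_t total = sg_total_length(head);
    if (total > dst_size) {
        return 0;
    }

    size_t offset = 0;
    for (const sg_desc_t *d = head; d != NULL; d = d->next) {
        memcpy(dst + offset, d->src, d->length);
        offset += d->length;
    }
    return offset;
}
```

A typical use is assembling a frame from a header, a payload, and a trailer that live in separate buffers, without first copying them into one staging area.
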

**Microcontrollers vs. SoCs/FPGA**
* [MCUs](https://www.ampheo.com/c/microcontrollers) (e.g., [STM32](https://www.ampheo.com/search/STM32), [NXP](https://www.ampheo.com/manufacturer/nxp-semiconductors)): a central DMA/DMAMUX services on-chip buses (AHB/APB). You select a channel/stream and a request (UART/ADC/SPI).
* Linux/[SoC](https://www.ampheo.com/c/system-on-chip-soc) world: devices often contain their own DMA engines; drivers program descriptor rings, and the device signals completion with an interrupt (MSI/MSI-X on PCIe).
* [FPGA](https://www.ampheo.com/c/fpgas-field-programmable-gate-array) (AXI systems): use an AXI DMA/VDMA (memory-mapped↔AXI-Stream) with descriptors in DDR; great for high-rate ADC/[DSP](https://www.ampheo.com/c/dsp-digital-signal-processors)/video pipelines.

**Minimal MCU example (UART RX to circular buffer)**

Conceptual steps:
1. Allocate rx_buf[N].
2. Configure DMA: peripheral=UART_DR, memory=rx_buf, dir=Periph→Mem, circular, mem increment, half/full complete IRQs.
3. In DMA ISR: if half or full flag, process that half of the buffer; clear flags.
4. Main loop stays free; no per-byte interrupts.

This pattern scales to ADC continuous sampling, I²S audio, SPI camera streams, etc.
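
The half/full handling in the steps above might look like the following host-side sketch. Everything here is an assumption made for illustration: `model_dma_receive_byte` stands in for the hardware writing into `rx_buf` in circular mode, and `dma_rx_isr` takes its flags as arguments instead of reading a real status register. On a real MCU the ISR would also clear the flags in the controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 8u                 /* N; must be even */
#define RX_HALF     (RX_BUF_SIZE / 2u)

uint8_t rx_buf[RX_BUF_SIZE];  /* step 1: the DMA target buffer */
size_t dma_write_pos;         /* model only: where the "hardware" writes next */

uint8_t app_data[64];         /* where processed bytes end up */
size_t app_len;

/* Consume one half of rx_buf; extra bytes are dropped if app_data is full. */
void process_half(const uint8_t *half, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (app_len < sizeof app_data) {
            app_data[app_len++] = half[i];
        }
    }
}

/* Step 3: ISR body. The flags mirror half-transfer / transfer-complete bits. */
void dma_rx_isr(int half_flag, int full_flag)
{
    if (half_flag) {
        process_half(&rx_buf[0], RX_HALF);        /* DMA is now filling the second half */
    }
    if (full_flag) {
        process_half(&rx_buf[RX_HALF], RX_HALF);  /* DMA has wrapped to the first half */
    }
}

/* Model of the DMA engine in circular mode: store one byte, wrap at the
 * end of the buffer, and raise the matching event at each half boundary. */
void model_dma_receive_byte(uint8_t b)
{
    assert(dma_write_pos < RX_BUF_SIZE);
    rx_buf[dma_write_pos++] = b;
    if (dma_write_pos == RX_HALF) {
        dma_rx_isr(1, 0);
    } else if (dma_write_pos == RX_BUF_SIZE) {
        dma_write_pos = 0;
        dma_rx_isr(0, 1);
    }
}
```

Note that bytes sitting in a partially filled half are not delivered until that half completes. UART designs often pair this pattern with an idle-line or timeout event to flush such a tail.
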

**Performance & correctness tips**
* Choose burst/beat size that matches bus/peripheral FIFO width (e.g., 4- or 8-beat bursts on 32-bit bus).
* Arbitration & priorities: heavy DMA can starve CPU or other DMA; tune priorities.
* Cache coherency ([SoCs](https://www.ampheoelec.de/c/system-on-chip-soc)/CPUs with caches):
  * Device→RAM: invalidate the buffer's cache lines before the CPU reads the received data.
  * RAM→Device: clean (flush) the buffer's cache lines before starting the DMA.
  * Prefer noncacheable/“coherent” buffers if available (e.g., dma_alloc_coherent, ACP/ACE on ARM).
* Alignment: align buffers to the bus width or cache-line size (often 4/8/32/64 bytes).
* Interrupt strategy: use half-transfer for steady processing; avoid tiny fragments.
* Error handling: handle FIFO overruns, bus faults, and transfer-abort paths.
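
The cache and alignment tips interact: cache maintenance operates on whole lines, so a DMA buffer that shares a line with unrelated data can corrupt that data when the line is invalidated. The sketch below shows the address arithmetic only. The 32-byte `CACHE_LINE` is an assumed value, and `cache_range_t`, `cache_range_for`, and `dma_buffer_is_line_exclusive` are hypothetical helpers. The actual clean/invalidate calls are platform-specific and are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* assumed line size; check your core's documentation */

_Static_assert((CACHE_LINE & (CACHE_LINE - 1u)) == 0u,
               "CACHE_LINE must be a power of two");

typedef struct {
    uintptr_t start;  /* line-aligned start address */
    size_t size;      /* size in bytes, a multiple of CACHE_LINE */
} cache_range_t;

/* Expand [addr, addr + len) outward to whole cache lines, which is the
 * range a clean or invalidate over this buffer would actually touch. */
cache_range_t cache_range_for(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end = (addr + len + (CACHE_LINE - 1u)) & mask;
    return (cache_range_t){ start, (size_t)(end - start) };
}

/* True if the buffer starts on a line boundary and covers whole lines,
 * so maintenance on it cannot touch a neighbour's data. */
bool dma_buffer_is_line_exclusive(uintptr_t addr, size_t len)
{
    return (addr % CACHE_LINE == 0u) && (len % CACHE_LINE == 0u);
}

/* A buffer declared this way owns its cache lines entirely (C11). */
_Alignas(CACHE_LINE) uint8_t dma_buf[2 * CACHE_LINE];
```

For example, a 40-byte buffer at address 0x1004 spans two lines, so maintenance on it would touch 64 bytes starting at 0x1000, including bytes the buffer does not own.
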

**When to use (and not)**
* Use DMA for large or continuous transfers (audio, video, [sensors](https://www.ampheo.com/c/sensors), storage, comms).
* Prefer interrupts/polling for tiny, sporadic transfers—DMA setup overhead can dominate.

**Typical pitfalls**
* Forgetting to enable the peripheral’s DMA request or wrong channel mapping.
* Cache bugs (stale/dirty data) on cached CPUs.
* Mis-sized burst/beat causing FIFO underruns/overruns.
* Reading/writing the buffer in the CPU while DMA is modifying it—use double buffers or producer/consumer indices.
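
The producer/consumer-index approach from the last pitfall can be sketched as follows. It assumes the controller exposes a "remaining transfers" down-counter for the circular channel (on STM32 parts, for instance, this is the NDTR or CNDTR register), from which the DMA write position is derived. The `dma_ring_t` type and both functions are illustrative names, and `remaining` is passed in as a plain argument so the code runs on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU-side view of a ring buffer that the DMA fills in circular mode. */
typedef struct {
    const uint8_t *buf;  /* the DMA target buffer */
    size_t size;         /* buffer length in bytes */
    size_t tail;         /* consumer index, written only by the CPU */
} dma_ring_t;

/* Producer index derived from the controller's remaining-count value. */
size_t ring_head(const dma_ring_t *r, size_t remaining)
{
    assert(r != NULL && r->size > 0 && remaining <= r->size);
    return (r->size - remaining) % r->size;
}

/* Copy out everything between tail and head, handling wrap-around.
 * The CPU only reads bytes the DMA has already written.
 * Returns the number of bytes copied. */
size_t ring_read(dma_ring_t *r, size_t remaining, uint8_t *out, size_t out_cap)
{
    size_t head = ring_head(r, remaining);
    size_t n = 0;
    while (r->tail != head && n < out_cap) {
        out[n++] = r->buf[r->tail];
        r->tail = (r->tail + 1) % r->size;
    }
    return n;
}
```

This scheme cannot tell an empty buffer from one where the DMA has lapped the consumer by exactly one full buffer, so the consumer has to run often enough that the producer never overtakes it.
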

In one line: DMA is a bus-master copy engine that moves data between memory and devices autonomously; you configure descriptors, let it stream in the background, and handle completion with lightweight interrupts.