Here’s the fast, practical way to understand (and predict) [FPGA](https://www.ampheo.com/c/fpgas-field-programmable-gate-array) resources from RTL—what maps to LUT/FF/BRAM/[DSP](https://www.ampheo.com/c/dsp-digital-signal-processors), and how to verify it in the tools without doing a full place-and-route.

**1) The only numbers that matter: run synthesis-only**
You don’t need implementation to see resources.
**[AMD](https://www.ampheo.com/manufacturer/amd)/[Xilinx](https://www.vemeko.com/product/#xilinx) (Vivado):**
```
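# Non-project flow: synthesize in memory, then report.
# Tcl glob is not recursive, so list each source directory explicitly.
read_verilog [glob src/*.v]
synth_design -top <top> -part <part>
report_utilization -hierarchical -file util.rpt
# Project flow (instead of the above): run and open synth_1, then report.
launch_runs synth_1
wait_on_run synth_1
open_run synth_1
report_utilization -hierarchical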
```
Useful GUIs: Open Elaborated Design (quick structure view), Open Synthesized Design → Utilization by Hierarchy, Technology Schematic (carry chains, SRLs, LUTRAM, DSP, BRAM).
**[Intel](https://www.ampheo.com/manufacturer/intel) (Quartus):**
* Run Analysis & Synthesis only.
* Open Compilation Report → Resource Utilization by Entity / Flow Summary.
**[Lattice](https://www.ampheo.com/manufacturer/lattice-semiconductor) (Radiant/iCEcube2/Trellis):**
* Run synthesis (LSE or Synplify in Radiant/iCEcube2; Yosys in the open-source Trellis flow). Open the Mapping/Utilization report per module.
Look at the hierarchical breakdown to see which RTL blocks are expensive.
**2) “RTL → resource” mental map (cheatsheet)**
**Combinational logic**
* assign, case, boolean ops → LUTs. Wider equations use more LUTs and may add levels (timing).
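* Wide muxes: a w-bit, N-to-1 mux costs roughly w × (N − 1)/3 LUTs on 6-input LUTs (one LUT6 implements a 4:1 mux), so LUT count grows linearly with N, while the number of LUT levels grows as log_4(N). Dedicated mux primitives (e.g., F7/F8 on Xilinx) reduce both.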
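To make the mux scaling concrete, here is a minimal Python sketch of the estimate. It assumes the mux is built as a plain tree of 4:1 muxes, one per 6-input LUT, and ignores dedicated mux primitives and any optimization the tool performs. The function name `mux_estimate` and its parameters are illustrative, not part of any vendor tool.

```python
def mux_estimate(width: int, n_inputs: int, mux_per_lut: int = 4) -> tuple[int, int]:
    """Return (luts, levels) for a width-bit, n_inputs-to-1 mux built as a tree.

    Assumption: each LUT implements one mux_per_lut-to-1 mux (a 4:1 mux fits a
    6-input LUT: 4 data inputs + 2 select inputs). Dedicated mux primitives
    are ignored, so real results are often somewhat better.
    """
    if n_inputs < 2:
        return 0, 0
    # Each M:1 node removes M-1 signals; going from N signals to 1 needs
    # ceil((N-1)/(M-1)) nodes per bit (integer ceiling via negated floor div).
    luts_per_bit = -(-(n_inputs - 1) // (mux_per_lut - 1))
    # Depth is the smallest L with M**L >= N; an integer loop avoids
    # floating-point log rounding.
    levels, reach = 0, 1
    while reach < n_inputs:
        reach *= mux_per_lut
        levels += 1
    return width * luts_per_bit, levels


if __name__ == "__main__":
    for n in (4, 16, 64):
        luts, levels = mux_estimate(32, n)
        print(f"32-bit {n}:1 mux -> ~{luts} LUTs, {levels} level(s)")
```

Treat the result as a sanity check against the synthesis report, not as a prediction: the report is the number that counts.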
**Registers / pipelines**
`always_ff @(posedge clk)` registers → FFs. Enable and reset use the FF's dedicated control inputs, so they cost no extra LUTs unless the enable or reset condition itself needs logic.
**Add/Sub/Compare**
`+`, `-`, and comparators map to carry chains + LUTs: roughly 1 LUT/bit plus dedicated carry logic.
**Multipliers / MAC**
`*` (and `a*b + c`) → DSP blocks if the operand widths fit the device’s DSP size. Wider operands are split across multiple DSPs; if DSP inference is disabled, the multiplier falls back to LUTs.
**Memories**
An array such as `reg [W-1:0] mem [0:D-1];` maps according to how it is accessed:
* Single/dual-port, synchronous read/write → Block RAM (BRAM/URAM/M20K).
* Asynchronous read, many write enables, tiny depth → often distributed RAM (LUTRAM) or FFs.
ROMs (case tables or initialized arrays) → BRAM if large/regular; else LUTs.
**Shift registers / FIFOs**
Long shift chains with enables → LUT-based shift registers (SRL16E/SRLC32E) on Xilinx; otherwise FF chains or small BRAM FIFOs.
**State machines**
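One-hot encoding uses one FF per state (N FFs for N states); binary uses ⌈log2 N⌉ FFs. LUT count depends on the transition and output logic rather than on the encoding alone. Synthesis usually picks the encoding automatically, favoring timing.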
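A quick way to see the FF side of the trade-off is to count state bits per encoding. The helper below is a small illustrative sketch (the name `fsm_state_ffs` is made up here); it counts only the state register and says nothing about the LUTs for next-state or output logic.

```python
def fsm_state_ffs(n_states: int, encoding: str = "binary") -> int:
    """Return the number of state flip-flops for a given encoding.

    Only the state register is counted; next-state and output logic (LUTs)
    depend on the transition table and are not modeled here.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if encoding == "one_hot":
        return n_states  # one FF per state
    if encoding in ("binary", "gray"):
        # ceil(log2(n_states)) computed with integers; at least 1 bit.
        return max(1, (n_states - 1).bit_length())
    raise ValueError(f"unknown encoding: {encoding}")


if __name__ == "__main__":
    for n in (5, 10, 40):
        print(n, "states:", fsm_state_ffs(n, "one_hot"), "FFs one-hot,",
              fsm_state_ffs(n, "binary"), "FFs binary")
```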
**3) Nudge the mapper (portable hints)**
Prefer dedicated hardware
* Force DSP use for mults/MACs (if your tool supports it):
`(* use_dsp = "yes" *)` (Xilinx), `(* multstyle = "dsp" *)` (Intel), or `syn_multstyle` (Synplify, as used for Lattice). Note that `ramstyle` controls memories, not multipliers.
* Pick RAM type:
Xilinx: `(* ram_style = "block" *)`, with `"distributed"` or `"ultra"` as the alternatives.
Intel: `// synthesis ramstyle = "M20K"` or `(* ramstyle = "M20K" *)`, with `"MLAB"` as the alternative.
Enable SRLs (Xilinx): keep shift patterns simple and avoid a reset on every tap, since SRL primitives have no reset input.
Avoid accidental LUT RAM: a synchronous, single-clock RAM with a registered read and clean write enables maps to BRAM.
**4) Quick estimation rules (sanity checks)**
* Adder/subtractor: ≈ 1 LUT/bit + carry chain (timing-friendly).
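* Multiplier: 1 DSP if the operands fit the native width (e.g., 25×18 on DSP48E1, 27×18 on DSP48E2); larger → multiple DSPs.
* Block RAM need: bits = W × D. Compare to the on-chip block size (e.g., 18 Kb/36 Kb on Xilinx, 20 Kb on Intel M20K) and round up to whole blocks. Leave 10–20% headroom, since awkward width/depth combinations waste bits and the surrounding control logic costs LUTs.
* SRL vs FF: long, simple shifts (no reset per stage) → SRL; else FFs.
* Mux farms / buses: LUT count grows roughly linearly with the number of inputs, and each 4× increase in inputs adds about one LUT level on 6-input LUTs. Pipeline wide mux trees.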
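The rules above reduce to a few lines of arithmetic. The following Python sketch is a rough calculator under stated assumptions: 1 LUT per adder bit, a DSP multiplier of 25×18 by default, and a capacity-only block RAM count with a 36 Kb block. The function names are illustrative, and the defaults should be replaced with your device's actual DSP and block RAM sizes.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def adder_luts(width: int) -> int:
    """Rule of thumb: about 1 LUT per bit; the carry chain is dedicated logic."""
    return width


def dsp_tiles(a_bits: int, b_bits: int, dsp_a: int = 25, dsp_b: int = 18) -> int:
    """Rough DSP count for an a_bits x b_bits multiply.

    Assumption: operands are tiled onto dsp_a x dsp_b multipliers, trying both
    operand orientations. Signed/unsigned details are ignored.
    """
    def tiles(x: int, y: int) -> int:
        return ceil_div(x, dsp_a) * ceil_div(y, dsp_b)
    return min(tiles(a_bits, b_bits), tiles(b_bits, a_bits))


def bram_blocks(width: int, depth: int, block_bits: int = 36 * 1024) -> int:
    """Capacity-only lower bound: ceil(W * D / block size).

    Real mappings can need more blocks when the width/depth combination does
    not match a supported aspect ratio.
    """
    return ceil_div(width * depth, block_bits)


if __name__ == "__main__":
    print("32-bit adder  :", adder_luts(32), "LUTs")
    print("24x24 multiply:", dsp_tiles(24, 24), "DSPs")
    print("4096 x 32 RAM :", bram_blocks(32, 4096), "blocks of 36 Kb")
```

If the synthesis report differs substantially from these numbers, that is usually a sign of a missed inference (multiplier in LUTs, memory in FFs) rather than a bad estimate.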
**5) Verify by hierarchy (what to click)**
* Vivado: after synthesis → Report Utilization → Hierarchical, sort by LUT/BRAM/DSP; double-click a heavy block → Schematic to see if that multiplier became a [DSP](https://www.onzuu.com/category/dsp), if memories are BRAM, etc.
* Quartus: Resource Utilization by Entity and Inferred RAM/ROM summary; open Technology Map Viewer (post-map) to see what was mapped to DSP/BRAM.
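If you prefer to rank blocks from a saved report instead of clicking through the GUI, a small script can do the sorting. The sketch below assumes the report contains a pipe-delimited text table with a header row; the column names in the sample (`Instance`, `Total LUTs`, `FFs`, `RAMB36`) and the sample data are hypothetical, so check them against the headers in your own report.

```python
def parse_util_table(text: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited text table into one dict per data row.

    The first row that starts and ends with '|' is taken as the header;
    border lines such as '+----+' are skipped.
    """
    header = None
    rows = []
    for line in text.splitlines():
        s = line.strip()
        if not (s.startswith("|") and s.endswith("|")):
            continue
        cells = [c.strip() for c in s[1:-1].split("|")]
        if header is None:
            header = cells
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def top_consumers(rows: list[dict[str, str]], column: str, n: int = 5) -> list[dict[str, str]]:
    """Return the n rows with the largest numeric value in the given column."""
    def value(row: dict[str, str]) -> float:
        try:
            return float(row[column])
        except (KeyError, ValueError):
            return 0.0  # missing or non-numeric cells sort last
    return sorted(rows, key=value, reverse=True)[:n]


# Hypothetical sample; real reports have more columns and different names.
SAMPLE = """
+----------+------------+-----+--------+
| Instance | Total LUTs | FFs | RAMB36 |
+----------+------------+-----+--------+
| top      |       1200 | 900 |      4 |
|   u_fir  |        800 | 500 |      0 |
|   u_buf  |        150 | 100 |      4 |
+----------+------------+-----+--------+
"""

if __name__ == "__main__":
    for row in top_consumers(parse_util_table(SAMPLE), "Total LUTs", n=2):
        print(row["Instance"], row["Total LUTs"])
```

Note that parent instances include the totals of their children, so the top-level row will always rank first.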
**6) Common pitfalls that skew resources**
* Async reads/writes or two writes/clock → can break BRAM inference → LUT/FF explosion.
* Resets on every stage of long shifters → prevents SRL mapping.
* Disabled [DSP](https://www.ampheoelec.de/c/dsp-digital-signal-processors) inference (global setting or attribute) → multipliers burn LUTs.
* Tiny memories sprinkled everywhere → many LUTRAMs; consider packing into BRAM with a shared controller.
* Over-wide buses without pipelining → deep LUT levels and timing pressure (and sometimes more duplication).
**7) Minimal experiment pattern**
Create a tiny, isolated module for the structure you care about (e.g., a 64×16 dual-port RAM, or a 24×24 multiplier with pipeline), synthesize out-of-context, check the utilization. Iterate until the mapping is what you want, then drop into the main design.
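To make these experiments repeatable, you can generate the synthesis script for each test module. The Python sketch below writes a Vivado non-project Tcl script that synthesizes one module out-of-context and saves the utilization report. The helper name `ooc_synth_script`, the file paths, and the part number in the usage example are assumptions for illustration; substitute your own.

```python
def ooc_synth_script(top: str, sources: list[str], part: str,
                     report: str = "util.rpt") -> str:
    """Return a Vivado Tcl script for a synthesis-only, out-of-context run.

    Uses read_verilog, synth_design (-mode out_of_context) and
    report_utilization. Paths are wrapped in Tcl braces so that spaces
    in file names do not split arguments.
    """
    lines = [f"read_verilog {{{src}}}" for src in sources]
    lines.append(f"synth_design -top {top} -part {part} -mode out_of_context")
    lines.append(f"report_utilization -file {{{report}}}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only: module name, path and part are placeholders.
    script = ooc_synth_script("mult24", ["rtl/mult24.v"], "xc7a35tcpg236-1")
    with open("synth_ooc.tcl", "w") as f:
        f.write(script)
    print(script)
```

The generated file can then be run with `vivado -mode batch -source synth_ooc.tcl`, and the resulting report compared between iterations of the module.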