# Oral Discussion Report: BitNet 1.58-bit LLM and RVV Hardware Acceleration
**Date:** January 6, 2026
**Topic:** Computational Simplification of LLMs via Ternary Weights and RISC-V Vector Extensions
**Name:** 陳彥伯
**Student ID:** P76141144
---
## 1. Technical Breakthrough: BitNet (1.58-bit LLM)
Traditional Large Language Models (LLMs) rely on high-precision floating-point formats (e.g., FP16 or BF16), which demand significant memory bandwidth and power during inference. **BitNet b1.58** represents a paradigm shift toward extreme quantization by using **ternary weights**.
### Core Principle: Ternary Weights
BitNet restricts model weights to the set $\{-1, 0, 1\}$. This design fundamentally alters the underlying mathematical operations:
```
Standard GEMV:              BitNet Ternary:
┌───────────────────┐       ┌───────────────────┐
│ y += w[i] * x[i]  │       │ if w[i] == +1:    │
│   (multiply)      │   →   │     y += x[i]     │
│                   │       │ if w[i] == -1:    │
│                   │       │     y -= x[i]     │
└───────────────────┘       └───────────────────┘
  HW: Multiplier              HW: Mux + Adder
```
* **Mathematical Transformation**: The standard Matrix-Vector Multiplication ($y = \sum w_i \times x_i$) is simplified into **Accumulate (ACC) operations**.
* **Hardware Efficiency**: During inference, the need for expensive floating-point **Multipliers** is eliminated. Instead, the hardware uses a **Multiplexer (Mux)** to select the sign of the input and an **Adder** to update the accumulator (see the C sketch below).
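
A minimal C sketch of this multiplier-free inner product is shown below (a branchless, RVV-oriented variant appears in Section 3). It assumes the ternary weights have already been decoded into `int8_t` values in $\{-1, 0, 1\}$ and uses 32-bit integer activations for clarity; the function name `ternary_gemv_row` is illustrative, not from any specific library.

```c
#include <stddef.h>
#include <stdint.h>

/* One output element of a ternary matrix-vector product.
 * w_row holds decoded weights in {-1, 0, +1}; x holds the activations.
 * Each weight only selects "add", "subtract", or "skip" -- no multiplier. */
static int32_t ternary_gemv_row(const int8_t *w_row, const int32_t *x, size_t n)
{
    int32_t y = 0;
    for (size_t i = 0; i < n; i++) {
        if (w_row[i] == +1)       y += x[i];   /* +1: add the activation      */
        else if (w_row[i] == -1)  y -= x[i];   /* -1: subtract the activation */
        /* 0: skip, contributes nothing to the accumulator */
    }
    return y;
}
```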
---
## 2. Training Mechanism: Quantization-Aware Training (QAT)
Ternary weights are produced through a specialized training process rather than direct initialization:
* **Shadow Weights**: The system maintains high-precision **FP32 "shadow weights"** during training to accurately accumulate small gradient updates.
* **Quantization Process**: High-precision weights are mapped to $\{-1, 0, 1\}$ using a scaling factor ($\beta$) and a round-and-clip function (see the sketch after this list).
* **Straight-Through Estimator (STE)**: Since quantization is non-differentiable, STE is used during backpropagation to allow gradients to bypass the quantization function and update the shadow weights.
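
A minimal C sketch of the forward quantization step is shown below, assuming the absmean scheme described for BitNet b1.58: $\beta$ is the mean absolute value of the weight tensor, and each weight is divided by $\beta$, rounded, and clipped into $\{-1, 0, 1\}$. The small epsilon and the function name `quantize_ternary` are illustrative choices, not taken from a reference implementation.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Forward quantization of FP32 shadow weights to ternary values.
 * beta = mean(|w|); w_q[i] = clip(round(w[i] / beta), -1, +1).
 * The shadow weights w are left untouched; during backpropagation the
 * Straight-Through Estimator passes gradients through as if this
 * function were the identity. */
static void quantize_ternary(const float *w, int8_t *w_q, size_t n)
{
    float beta = 0.0f;
    for (size_t i = 0; i < n; i++)
        beta += fabsf(w[i]);
    beta = beta / (float)n + 1e-6f;           /* epsilon avoids div-by-zero */

    for (size_t i = 0; i < n; i++) {
        float r = roundf(w[i] / beta);        /* round to nearest integer */
        if (r > 1.0f)  r = 1.0f;              /* clip into {-1, 0, +1} */
        if (r < -1.0f) r = -1.0f;
        w_q[i] = (int8_t)r;
    }
}
```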
---
## 3. Implementation Strategy: RISC-V Vector Extension (RVV)
To maximize the performance of BitNet on RISC-V architectures, **RVV** is used for vectorized execution.
### Numerical Representation (Weight Encoding)
To optimize storage and processing speed, weights are encoded using **2 bits per weight** (a packing sketch follows the table):
| Encoding (Enc) | Value | Hardware Operation |
| -------------- | ----- | ------------------ |
| `00` | 0 | No contribution (Skip) |
| `01` | +1 | Add activation |
| `10` | -1 | Subtract activation |
| `11` | 0 | Reserved (as 0) |
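
A small C sketch of this encoding is shown below, packing four 2-bit codes per byte. The bit ordering (weight $i$ stored in bits $2(i \bmod 4)$ and $2(i \bmod 4)+1$) and the function names are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a ternary weight to its 2-bit code from the table above. */
static inline uint8_t encode_weight(int w)            /* w in {-1, 0, +1} */
{
    return (w == +1) ? 0x1 : (w == -1) ? 0x2 : 0x0;
}

/* Pack n ternary weights into ceil(n/4) bytes, four codes per byte. */
static void pack_weights(const int8_t *w, uint8_t *packed, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % 4 == 0) packed[i / 4] = 0;
        packed[i / 4] |= encode_weight(w[i]) << (2 * (i % 4));
    }
}

/* Decode one 2-bit lane (0..3) of a packed byte back to {-1, 0, +1};
 * the reserved code 11 is treated as 0, as in the table. */
static inline int decode_weight(uint8_t byte, size_t lane)
{
    uint8_t enc = (uint8_t)(byte >> (2 * lane)) & 0x3;
    return (enc == 0x1) ? +1 : (enc == 0x2) ? -1 : 0;
}
```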
### Branchless Vectorization
Traditional `if-else` logic causes branch mispredictions and pipeline stalls in scalar processors. RVV avoids this by enabling a **branchless ternary MAC**:
* **Vector Merge/Mask**: RVV uses mask operations to decide which activations to add or subtract in a single parallel step (see the sketch after this list).
* **VLA (Vector Length Agnostic)**: The code automatically adapts to different hardware vector widths, ensuring scalability and portability across various RISC-V cores.
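
The scalar C sketch below illustrates the branchless idea; it is not RVV intrinsics code. Each comparison yields 0 or 1, and negating it produces an all-ones or all-zeros bitmask, so the update uses neither a data-dependent branch nor a multiplier. On an RVV target, the same pattern corresponds to vector compares that produce mask registers and to masked (or merged) vector add/subtract applied to a whole vector-length chunk of activations at once; `ternary_dot_branchless` and its arguments are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchless ternary dot product (scalar reference for the vectorized idea). */
static int32_t ternary_dot_branchless(const int8_t *w_q, const int32_t *x,
                                      size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t add_mask = -(int32_t)(w_q[i] == +1);  /* all-ones if +1, else 0 */
        int32_t sub_mask = -(int32_t)(w_q[i] == -1);  /* all-ones if -1, else 0 */
        acc += x[i] & add_mask;    /* add x[i] only where the weight is +1      */
        acc -= x[i] & sub_mask;    /* subtract x[i] only where the weight is -1 */
    }
    return acc;
}
```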
---
## 4. Conclusion: Performance and Energy Gains
1. **Memory Savings**: The 2-bit representation reduces the memory footprint and bandwidth requirements by roughly **8x** compared to FP16 (16 bits vs. 2 bits per weight), approaching **10x** at the theoretical 1.58 bits per weight (see the worked ratio below).
2. **Energy Efficiency**: Replacing multipliers with adders significantly lowers the energy consumption per operation, making LLMs viable for battery-powered Edge AI devices.
3. **High Throughput**: By leveraging RVV's parallel processing capabilities, BitNet achieves higher inference throughput on RISC-V platforms than scalar execution.
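
As a quick check on the memory figure, using only the bit widths mentioned above:

$$\frac{16\ \text{bits (FP16)}}{2\ \text{bits (packed ternary)}} = 8\times, \qquad \frac{16\ \text{bits (FP16)}}{1.58\ \text{bits (theoretical)}} \approx 10\times$$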
---
## Appendix: Understanding Vectors vs. Matrices in AI
To understand why BitNet and RVV are so effective, it is essential to distinguish between the two fundamental data structures used in AI:
### 1. Vector (1D Entity)
* **Definition**: A vector is an ordered, one-dimensional list of numbers.
* **In LLMs**: A vector typically represents a single data point or a "hidden state" of a word (token) in the model.
* **Hardware Context**: RVV (RISC-V Vector Extension) is designed to process these long lists of numbers simultaneously, performing the same operation (like addition) on every element in the vector at once.
### 2. Matrix (2D Entity)
* **Definition**: A matrix is a two-dimensional grid of numbers arranged in rows and columns.
* **In LLMs**: The "Weights" of a model are stored as matrices. When an input vector passes through a layer, it is multiplied by these weight matrices.
* **The Challenge**: Standard Matrix-Vector Multiplication (GEMV) is computationally expensive: for an $N \times N$ weight matrix it requires $O(N^2)$ multiplications and additions.
### How BitNet Changes the Interaction
Traditionally, we multiply a **Vector** by a **Matrix** using high-precision floating-point math.
* **BitNet's Approach**: By restricting the **Matrix** elements (weights) to $\{-1, 0, 1\}$, we no longer "multiply."
* **The Result**: The **Matrix** acts as a set of instructions (Add, Subtract, or Skip) for the incoming **Vector**. This allows the hardware to treat complex matrix math as a series of high-speed vector accumulations (ACC); a small worked example follows.
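
A tiny worked example (values chosen purely for illustration): with a $2 \times 3$ ternary weight matrix, each row simply tells the hardware which activations to add, subtract, or skip.

$$
W = \begin{pmatrix} +1 & 0 & -1 \\ 0 & +1 & +1 \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\;\Rightarrow\;
y = Wx = \begin{pmatrix} x_1 - x_3 \\ x_2 + x_3 \end{pmatrix}
$$

No multiplications appear anywhere in $y$; every entry is a pure sum or difference of activations.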