2025/1/20 kevinchou
===
## BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration
### Abstract
- Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable within bit-serial deep learning accelerators. They are orthogonal to and compatible with other deep neural network (DNN) efficiency methods such as quantization and pruning.
- In this work, we enhance the practicality and efficiency of bit-level sparsity.
- On the algorithmic side, we introduce bi-directional bit-level sparsity (BBS). Building on BBS, we also propose two bit-level binary pruning methods that require no retraining and can be seamlessly applied to quantized DNNs.
- On the hardware side, we demonstrate the potential of BBS through BitVert, a bit-serial architecture with an efficient processing element (PE) design, aimed at accelerating DNNs with low overhead by exploiting our proposed binary pruning methods.
---
### Related Works

- Bit-Parallel PE

- Pragmatic

Pragmatic requires a variable shifter after every bit-serial multiplier to synchronize the significance of essential bits.
- Bitlet

Bitlet digests multiple weights and activations, and computes every bit-significance independently. However, since every bit lane can absorb the essential bit from an arbitrary weight, Bitlet requires a large multiplexer.
- Bitwave

---
### Method - BBS: BI-DIRECTIONAL BIT-LEVEL SPARSITY
#### BBS Theorem


- From Eqs. 2 and 3, we can infer that instead of adding the effectual activations indicated by non-zero weight bits, the same result can be obtained by subtracting the activations indicated by zero weight bits from the sum of all activations. For each bit column, the hardware can therefore process whichever bit direction (ones or zeros) contains fewer essential bits.
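This identity is easy to check numerically; a minimal sketch with made-up activation values and one weight bit-column:

```python
# Check the BBS identity for one weight bit-column: summing activations at
# the '1' bit positions equals the sum of all activations minus the
# activations at the '0' bit positions. (Example values are made up.)
activations = [3, -1, 4, 1, 5, -9, 2, 6]
weight_bits = [1, 0, 1, 1, 0, 0, 1, 0]  # one bit-column across 8 weights

sum_ones = sum(a for a, b in zip(activations, weight_bits) if b == 1)
sum_all = sum(activations)
sum_zeros = sum(a for a, b in zip(activations, weight_bits) if b == 0)

assert sum_ones == sum_all - sum_zeros  # add-the-ones == subtract-the-zeros
```

Because either direction gives the same result, a bit-serial accelerator can always pick the side with fewer essential bits.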
#### Bit-level Binary Pruning
- BBS with Rounded Averaging

- Step 1: Identify redundant bit columns that immediately follow the most-significant column and share the same content.
- Step 2: Compute the rounded average of the values represented by the three least-significant bits of the original weights. Essentially, this replaces the three least-significant bits of all weights with a single 3-bit constant while minimizing the mean squared error (MSE).
- Step 3: Compress the original weight group by storing only the remaining four bit columns plus 8-bit encoding metadata.
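The averaging step can be sketched as follows; this is an assumed pseudocode rendering of the idea, not the paper's exact implementation, and it treats weights as unsigned integers:

```python
# Sketch of BBS with rounded averaging: replace the n_low_bits
# least-significant bits of every weight in a group with one shared
# constant (their rounded average), which minimizes the MSE over the group.
def prune_rounded_average(weights, n_low_bits=3):
    """Return (pruned_weights, shared_constant) for one weight group."""
    mask = (1 << n_low_bits) - 1
    low_parts = [w & mask for w in weights]
    # The rounded average of the low parts becomes the shared constant.
    const = round(sum(low_parts) / len(low_parts))
    return [(w & ~mask) | const for w in weights], const

# Example group of three 8-bit weights (values chosen for illustration).
pruned, const = prune_rounded_average([0b0110_1011, 0b0110_0101, 0b0110_1110])
```

After pruning, every weight in the group shares the same three low bits, so only the shared constant and the upper bit columns need to be stored.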
- BBS with Zero-point Shifting

- Step 1: Add a constant zero-point (−14 in this example) to the original weights, which changes the binary content of every number.
- Step 2: When pruning the four least-significant bit columns, each number either zeroes out its four lower bits directly or rounds up to the next bit significance, whichever minimizes the MSE.
- Step 3: Recover the actual values after binary pruning, and store the new zero-point in the encoding metadata.
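A sketch of these three steps (the zero-point of −14 follows the notes; the rounding rule and example values are assumptions):

```python
# Sketch of BBS with zero-point shifting: shift each weight by a constant
# zero-point, then prune the n_low_bits least-significant bit columns by
# either zeroing them out or rounding up to the next multiple of
# 2**n_low_bits, whichever gives the smaller per-weight error.
def prune_zero_point(weights, zero_point=-14, n_low_bits=4):
    step = 1 << n_low_bits
    shifted = [w + zero_point for w in weights]
    pruned = []
    for s in shifted:
        down = s & ~(step - 1)  # zero out the low bits
        up = down + step        # round up to the next bit significance
        pruned.append(down if abs(s - down) <= abs(s - up) else up)
    # Actual values are recovered by subtracting the zero-point, which is
    # stored in the encoding metadata.
    return [p - zero_point for p in pruned]

# Example weights (illustrative values only).
recovered = prune_zero_point([20, 17, 30])
```

After the shift, the pruned values need only the upper bit columns plus the stored zero-point to reconstruct their magnitudes.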

#### Hardware-aware Global Binary Pruning

---
### Method - BITVERT HARDWARE ARCHITECTURE
- BitVert Processing Element

- Step 1 receives 16 activations A0 ~ A15 and selects 8 of them based on sel0 ~ sel7, which indicate the positions of effectual bits in the weight bit-vector.
- Step 2 performs bit-serial multiplication using valid signals val0 ~ val7, in case there are fewer than 8 effectual bits. A subtractor subtracts the adder-tree result from the sum of activations (Eq. 2), followed by a mux that selects the partial sum.
- Step 3 then shifts the partial sum based on the column index col_idx, which specifies the significance of the current weight bits.
- Step 4 multiplies the pruning constant with the sum of activations. Finally, the product and the bit-serial partial sum are accumulated in Step 5.
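The PE datapath above can be modeled functionally; the names sel/val/col_idx follow the notes, while the exact datapath behavior (especially the constant term in Step 4) is an assumption, not the RTL:

```python
# Functional sketch of one BitVert PE cycle for a single weight bit-column.
def pe_cycle(activations, sel, val, col_idx, use_zero_bits, const=0):
    # Steps 1-2: gather the selected activations and sum the valid ones.
    gathered = sum(activations[s] for s, v in zip(sel, val) if v)
    total = sum(activations)
    # BBS mux: if this column encodes its zero bits, the subtractor computes
    # (sum of all activations) - (adder-tree result), per Eq. 2.
    partial = total - gathered if use_zero_bits else gathered
    # Step 3: align the partial sum to the current bit-column significance.
    partial <<= col_idx
    # Steps 4-5: multiply the pruning constant with the activation sum and
    # accumulate it with the bit-serial partial sum.
    return partial + const * total

# Example: 16 activations, 8 effectual bits at even positions, column 1.
psum = pe_cycle(list(range(16)), [0, 2, 4, 6, 8, 10, 12, 14], [1] * 8, 1, False)
```

Modeling the PE this way makes the BBS direction choice explicit: the same hardware serves both the "add ones" and "subtract zeros" paths through a single subtractor and mux.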
- BitVert Scheduler

- BitVert Accelerator

---