# Lab 2 Report: Quantization
## Section 2.3 Analysis
1. Table of `DeiT-S` with group_size=64
| nbit | 32 | 8 | 4 | 3 | 2 |
| ---------------- | ------ | ------ | ------ | ------ | ----- |
| Accuracy (%) | 90.99 | 90.96 | 89.59 | 94.93 | 4.79 |
| Model Size (MiB) | 82.540 | 24.216 | 14.073 | 12.050 | 9.001 |
2. Table of `Llama3.2-1B-Instruct` with group_size=64
| nbit | 16 | 8 | 4 | 2 |
| ------------------- | -------- | -------- | -------- | -------- |
| Perplexity (PPL) | 13.160 | 13.173 | 15.853 | 122273.7 |
| Model Size (MiB) | 2858.129 | 1959.134 | 1495.134 | 1263.134 |
| Throughput (toks/s) | 321.17 | 405.051 | 537.063 | 614.574 |
3. I quantized both models with the basic approach only, i.e. the `BaseQuantizeConfig` provided by the HQQ module.
I also ran a grid search over the `BaseQuantizeConfig` parameters to find the best setting for each model; the results are shown below (a code sketch of the setup and the search loop follows the plots):
#### For DeiT-S

- The z-axis is the score computed by the provided scoring function.
- The parameter set that achieves the highest score for `DeiT-S` is (nbits, group_size, axis) = (4, 48, 1).
#### For Llama3.2-1B

- The z-axis is the score; several parameter combinations reach the highest score (10).

- The z-axis is the perplexity; once the model is quantized below 4 bits, the perplexity rises sharply.

- The z-axis is the speedup; it grows roughly linearly as the bit width decreases.
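
As a rough illustration of the setup described in point 3, here is a minimal sketch of quantizing a model with HQQ's `BaseQuantizeConfig`. The checkpoint name and the `AutoHQQHFModel` entry point are assumptions based on the public HQQ API, not a copy of the lab code.

```python
import torch
from transformers import AutoModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig
from hqq.models.hf.base import AutoHQQHFModel

# Settings matching the tables above: 4-bit weights, group size 64.
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=1)

# Load the FP16 model (checkpoint name is an assumption, not necessarily the lab's).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype=torch.float16
)

# Quantize every linear layer in place with the same config.
AutoHQQHFModel.quantize_model(
    model, quant_config=quant_config, compute_dtype=torch.float16, device="cuda"
)
```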
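The grid search can be sketched as an exhaustive loop over (nbits, group_size, axis). Here `build_model`, `quantize_with`, and `score_fn` are placeholders: `score_fn` stands in for the scoring function provided by the lab, and `quantize_with` for the quantization call shown in the sketch above; the candidate value lists are illustrative.

```python
import itertools
from hqq.core.quantize import BaseQuantizeConfig

def grid_search(build_model, quantize_with, score_fn):
    """Try every (nbits, group_size, axis) combination and keep the best one.

    build_model:   callable returning a fresh, unquantized model.
    quantize_with: placeholder that applies a BaseQuantizeConfig to the model.
    score_fn:      placeholder for the lab's provided scoring function,
                   taking a quantized model and returning a scalar score.
    """
    nbits_list = [8, 4, 3, 2]        # illustrative search space
    group_sizes = [32, 48, 64, 128]
    axes = [0, 1]

    best_score, best_params = float("-inf"), None
    for nbits, group_size, axis in itertools.product(nbits_list, group_sizes, axes):
        cfg = BaseQuantizeConfig(nbits=nbits, group_size=group_size, axis=axis)
        model = build_model()
        quantize_with(model, cfg)
        score = score_fn(model)
        if score > best_score:
            best_score, best_params = score, (nbits, group_size, axis)
    return best_params, best_score
```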
4. In this lab, I found the LLM harder to quantize while maintaining output quality.
One possible reason is that the language model has a more complicated structure and far more parameters than DeiT-S, which is essentially a small ViT variant.
5.
