# Lab 2 Report: Quantization
## Section 2.3 Analysis
1. Table of `DeiT-S` with group_size=64
| nbit | 32 | 8 | 4 | 3 | 2 |
| ---------------- | ------ | ------ | ------ | ------ | ----- |
| Accuracy (%) | 90.99 | 90.96 | 89.59 | 94.93 | 4.79 |
| Model Size (MiB) | 82.540 | 24.216 | 14.073 | 12.050 | 9.001 |
2. Table of `Llama3.2-1B-Instruct` with group_size=64
| nbit | 16 | 8 | 4 | 2 |
| ------------------- | -------- | -------- | -------- | -------- |
| Perplexity (PPL) | 13.160 | 13.173 | 15.853 | 122273.7 |
| Model Size (MiB) | 2858.129 | 1959.134 | 1495.134 | 1263.134 |
| Throughput (toks/s) | 321.17 | 405.051 | 537.063 | 614.574 |
3. I quantized both models with the basic approach only, i.e. the `BaseQuantizeConfig` provided by the HQQ module.
I also ran a grid search over the `BaseQuantizeConfig` parameters to find the best setting for each model; the results are shown below (a code sketch of the setup and the search loop follows the plots):
#### For DeiT-S

- The z-axis is the score computed by the provided scoring function.
- The parameter set that achieves the highest score for `DeiT-S` is (nbits, group_size, axis) = (4, 48, 1).
#### For Llama3.2-1B

- The z-axis is the score; several parameter combinations reach the highest score (10).

- The z-axis is the perplexity; once the model is quantized below 4 bits, the perplexity rises sharply.

- The z-axis is the speedup; it grows roughly linearly as the bit width decreases.
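
As a rough illustration of the setup described in point 3, here is a minimal sketch of quantizing a model with HQQ's `BaseQuantizeConfig`. The checkpoint name and the `AutoHQQHFModel` entry point are assumptions based on the public HQQ API, not a copy of the lab code.

```python
import torch
from transformers import AutoModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig
from hqq.models.hf.base import AutoHQQHFModel

# Settings matching the tables above: 4-bit weights, group size 64.
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=1)

# Load the FP16 model (checkpoint name is an assumption, not necessarily the lab's).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype=torch.float16
)

# Quantize every linear layer in place with the same config.
AutoHQQHFModel.quantize_model(
    model, quant_config=quant_config, compute_dtype=torch.float16, device="cuda"
)
```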
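The grid search can be sketched as an exhaustive loop over (nbits, group_size, axis). Here `build_model`, `quantize_with`, and `score_fn` are placeholders: `score_fn` stands in for the scoring function provided by the lab, and `quantize_with` for the quantization call shown in the sketch above; the candidate value lists are illustrative.

```python
import itertools
from hqq.core.quantize import BaseQuantizeConfig

def grid_search(build_model, quantize_with, score_fn):
    """Try every (nbits, group_size, axis) combination and keep the best one.

    build_model:   callable returning a fresh, unquantized model.
    quantize_with: placeholder that applies a BaseQuantizeConfig to the model.
    score_fn:      placeholder for the lab's provided scoring function,
                   taking a quantized model and returning a scalar score.
    """
    nbits_list = [8, 4, 3, 2]        # illustrative search space
    group_sizes = [32, 48, 64, 128]
    axes = [0, 1]

    best_score, best_params = float("-inf"), None
    for nbits, group_size, axis in itertools.product(nbits_list, group_sizes, axes):
        cfg = BaseQuantizeConfig(nbits=nbits, group_size=group_size, axis=axis)
        model = build_model()
        quantize_with(model, cfg)
        score = score_fn(model)
        if score > best_score:
            best_score, best_params = score, (nbits, group_size, axis)
    return best_params, best_score
```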
4. In this lab, I found the LLM harder to quantize while maintaining output quality.
One possible reason is that the language model has a more complicated structure and far more parameters than DeiT-S, which is essentially a small ViT variant.
5.
