# **Lab 3: Quantize MobileNet** <span style="color:Red;">**Due Date: 5/9 23:55**</span>

## Introduction

This lab aims to quantize the MobileNetV2 model, reducing model size and potentially improving inference speed without a significant loss in accuracy.

* Please download the provided Jupyter Notebook using the link below. Follow the prompts and hints within the notebook to fill in the empty blocks and answer the questions.
> [lab3.ipynb](https://colab.research.google.com/drive/1uO8nSfVe5SbY-DR0M4o3wBuhTwJZr4gG?usp=drive_link)

## Part 1: Linear Quantization Implementation (70%)

In this part, you will learn how linear quantization works and implement quantized versions of the **Fully Connected Layer** (Linear) and the **Convolution Layer** (Conv2d).

Refer to **"Part 1"** in the provided notebook (ipynb).

## Part 2: Quantize MobileNetV2 and Export (30%)

You will quantize MobileNetV2 using the XNNPack library, and you may try to deploy the quantized model onto a Raspberry Pi.

Refer to **"Part 2"** in the provided notebook (ipynb).

* Below is a MobileNetV2 with 96.3% accuracy on CIFAR-10, finetuned by the TAs. You will use this model's weights as the starting point for quantization:
> [mobilenetv2_0.963.pth](https://drive.google.com/file/d/1k89xAqC1FETperw11xvpxSPGcEwMfZJh/view)

You can load the above model with the following snippet:
```python
import torch
from torchvision.models.quantization import mobilenet_v2

model = torch.load('./mobilenetv2_for_quant.pth', map_location="cpu")
```

:::success
We will not calculate your score from the ExecuTorch `.pte` file, but you can deploy it onto a Raspberry Pi to compare the execution time against the model before quantization.
:::

## Hand-In Policy

You will need to hand in:

* Your quantized model ***mobilenet_quantized.pth***
* The completed ***lab3.ipynb***, renamed to ***```<YourID>```.ipynb***

Please organize your submission files into a zip archive structured as follows:
```
YourID.zip
├── model/
│   └── mobilenet_quantized.pth
└── YourID.ipynb
```

## Evaluation Criteria

Upon receiving your zip file:

1. We will use the following code to load your quantized model:
```python
# Load the saved ExportedProgram
loaded_quantized_ep = torch.export.load(pt2e_quantized_model_file_path)
loaded_quantized_model = loaded_quantized_ep.module()
```
2. We will evaluate the accuracy of your quantized model using the following code:
```python
acc = evaluate_model(loaded_quantized_model, test_loader, device)
```

$$
Score = 10 \times \operatorname{step}(Accuracy - 0.88) + 20 \times \dfrac{Accuracy - 0.88}{0.96 - 0.88}
$$

where $\operatorname{step}(x)$ is 1 when $x \ge 0$ and 0 otherwise. The reported accuracy must be higher than *88%* to earn points, and the full score is reached at *96%* accuracy.
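
As background for Part 1, the core of linear (affine) quantization is mapping a real-valued range onto a small integer range via a scale and zero point. Below is a minimal sketch of asymmetric 8-bit quantization in plain Python; the function names and the requirement that zero be exactly representable are illustrative assumptions, not the notebook's required API:

```python
# Illustrative sketch of asymmetric linear quantization (uint8).
# Function names are hypothetical; follow the notebook's API for the lab.

def get_scale_and_zero_point(x_min, x_max, n_bits=8):
    """Map the real range [x_min, x_max] onto integers [0, 2^n_bits - 1]."""
    q_min, q_max = 0, 2 ** n_bits - 1
    # Extend the range to include 0 so zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = round(q_min - x_min / scale)
    zero_point = max(q_min, min(q_max, zero_point))  # clamp to integer range
    return scale, zero_point

def quantize(x, scale, zero_point, n_bits=8):
    """Real value -> clamped integer: q = round(x / scale) + zero_point."""
    q_min, q_max = 0, 2 ** n_bits - 1
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))

def dequantize(q, scale, zero_point):
    """Integer -> approximate real value: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)

# Round-trip a few example "weights" and check the reconstruction error.
weights = [-1.5, -0.3, 0.0, 0.7, 2.1]
scale, zp = get_scale_and_zero_point(min(weights), max(weights))
q = [quantize(w, scale, zp) for w in weights]
recovered = [dequantize(v, scale, zp) for v in q]
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

Because values inside the range only suffer rounding, the reconstruction error is bounded by half a quantization step (`scale / 2`); this is the accuracy/size trade-off the lab exploits when quantizing Linear and Conv2d weights.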