# **Lab 1: Running MobileNet on a CPU**

<span style="color:Red;">**Due Date: 3/21 23:55**</span>

In this lab, you'll use ExecuTorch to build an executor for deploying MobileNet on a Raspberry Pi.

## **What is ExecuTorch**

ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices, including wearables, embedded devices, and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.

In this lab, you will use a series of APIs from ExecuTorch to **convert the model into a binary file with extension .pte**, which can be loaded by a C++ program called the **executor runner** in order to run the model. The basic workflow of converting the model is shown in the following graph:

> ![image](https://hackmd.io/_uploads/BJldjaSa6.png)

You can check [EXPORTING TO EXECUTORCH TUTORIAL](https://pytorch.org/executorch/0.1/tutorials/export-to-executorch-tutorial.html) and [PyTorch Edge: Developer Journey for Deploying AI Models Onto Edge Devices - Mengwei Liu & Angela Yi](https://www.youtube.com/watch?v=Z6tJg2XSWqY) to know more about it.

## **Setting Up ExecuTorch**

### Supported Host Environment

- Linux (x86_64)
  - CentOS 8 or later
  - Ubuntu 20.04.6 LTS or later
  - RHEL 8 or later
- Windows Subsystem for Linux running any of the above
- macOS (x86_64/M1/M2)
  - Big Sur or later

:::success
If you are using a Virtual Machine, it is advisable to allocate a minimum of 40 GB of disk space.
:::

### ExecuTorch Setup Tutorial

**Reminder**: You will need these tools during the tutorial:
- clang / clang++
- gcc / g++
- CMake

Carefully follow the provided instructions to successfully run the sample model ```add.pte```.

> [ExecuTorch Setup Tutorial](https://pytorch.org/executorch/stable/getting-started-setup.html)

:::danger
Please make sure the version of ExecuTorch is **v0.1.0**.
:::

### Building ExecuTorch Runner with CMake Guide

In this section, learn how to use CMake to build a cross-compiled executor runner for Raspberry Pi. Ignore the Cross Compilation section for now.

> [CMake Building Guide](https://pytorch.org/executorch/stable/runtime-build-and-cross-compilation.html)

## **Introduction**

### **Part 1: Export a MobileNetV2 Model (20%)**

Follow the guide below and prepare ```export.py``` to export a MobileNetV2 model using the weights from Lab0. You don't need the ```to_backend``` function or the quantization section for this part. **Please set the batch size of the input to 1**.

> [MobileNetV2 Export Guide](https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html)

You will obtain an ExecuTorch binary file ```mobilenet.pte```, which is for later use.

### **Part 2: Running an ExecuTorch Model in C++ (40%)**

In this part, we are going to run MobileNetV2 on the CIFAR-10 dataset.

#### **2.1 Data Download**

Download and unzip the ```CIFAR-10 binary version``` from the link below. Use only the ```test_batch.bin``` file.

> [CIFAR-10 Download](https://www.cs.toronto.edu/~kriz/cifar.html)

#### **2.2 Implementing executor runner**

This graph shows how the executor runner runs inference on the model:

> ![image](https://hackmd.io/_uploads/SJ0is6HTa.png)

You are going to implement a custom executor runner, which includes:
- Data pre-processing functions
- Model inference

The flow of model inference is as follows:
1. Read images from the file ```test_batch.bin```
2. Resize the images from 3x32x32 to 3x224x224 using [bilinear interpolation](https://medium.com/@sim30217/bilinear-interpolation-e41fc8b63fb4)
3. Divide the pixel values of the image by 255
4. Normalize the image with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
5. Set the images as input data for the model
6. Get the prediction label based on the output tensor
7. Calculate the accuracy

We have prepared code templates for the following files:
* ```executorch/examples/portable/executor_runner/executor_runner.cpp```
* ```executorch/util/util.h```

Please download the code template here:

> [Code Template](https://drive.google.com/drive/folders/1B4b_ltpwYlPUwCKHDZqwA_uYUJNkW8p4?usp=drive_link)

After you have done all of the above, build the program with CMake; you should find the built executable at ```executorch/cmake-out/executor_runner```. Use the command below to run it locally on your system:

```bash
./cmake-out/executor_runner --model_path /path/to/your_model.pte
```

:::success
You can test only 10 images instead of 10k images locally to check whether your functions work.
:::

### **Part 3: Run on Rpi (15%)**

The executable generated in Part 2 is built for your operating system, i.e., the host operating system. To run the program on Raspberry Pi, cross-compilation is needed.

Download the [Toolchain File](https://drive.google.com/file/d/1otWaB2YJcFsnn92Jd_wy25XsjHeyqNG7/view?usp=drive_link) and install the cross compilers by running:

```bash
sudo apt install gcc-aarch64-linux-gnu
sudo apt install g++-aarch64-linux-gnu
```

Then run the command below to cross-compile the program:

```bash
(rm -rf cmake-out && mkdir cmake-out && cd cmake-out && cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/path/to/toolchain-aarch64-gcc.cmake -DBUCK2=/path/to/buck2 ..)
```

After cross-compiling the program, use the ```scp``` command to transfer ```executor_runner```, your generated binary file ```mobilenet.pte```, and the test dataset ```test_batch.bin``` to the Raspberry Pi. Run the executable on the **first 10 images** of the test dataset to check the accuracy.

You should set up your file structure like this:

```
├── data/
│   └── test_batch.bin
│
├── model/
│   └── mobilenet.pte
│
└── executor_runner
```

Ensure all the paths are set correctly, especially the file paths in your C++ program. This is the file structure the TAs will use when grading your work.
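Before transferring anything to the Raspberry Pi, it helps to verify the pre-processing logic from Part 2.2 (steps 1-4) on the host. The sketch below is a minimal, illustrative version: the function names are placeholders, and the resize assumes the half-pixel (align_corners=false) convention, so confirm it matches the resize used when your Lab0 weights were trained. Each record in ```test_batch.bin``` is 3073 bytes: 1 label byte followed by three 1024-byte channel planes (red, green, blue), row-major.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kSide = 32;
constexpr int kPlane = kSide * kSide;         // 1024 bytes per channel plane
constexpr int kRecordBytes = 1 + 3 * kPlane;  // 3073 bytes per CIFAR-10 record

// Step 1: extract the label of record `idx` from the raw file contents.
int record_label(const std::vector<uint8_t>& buf, int idx) {
  return buf[static_cast<size_t>(idx) * kRecordBytes];
}

// Step 1: extract one channel plane (0 = R, 1 = G, 2 = B) as floats.
std::vector<float> record_channel(const std::vector<uint8_t>& buf,
                                  int idx, int channel) {
  const size_t off =
      static_cast<size_t>(idx) * kRecordBytes + 1 + channel * kPlane;
  return std::vector<float>(buf.begin() + off, buf.begin() + off + kPlane);
}

// Step 2: bilinear resize of one channel stored row-major (in_h x in_w),
// using the half-pixel (align_corners=false) convention.
std::vector<float> resize_bilinear(const std::vector<float>& src,
                                   int in_h, int in_w,
                                   int out_h, int out_w) {
  std::vector<float> dst(static_cast<size_t>(out_h) * out_w);
  const float sy = static_cast<float>(in_h) / out_h;
  const float sx = static_cast<float>(in_w) / out_w;
  for (int y = 0; y < out_h; ++y) {
    const float fy = (y + 0.5f) * sy - 0.5f;
    const int y0 = static_cast<int>(std::floor(fy));
    const float wy = fy - y0;
    const int y0c = std::max(0, std::min(y0, in_h - 1));
    const int y1c = std::max(0, std::min(y0 + 1, in_h - 1));
    for (int x = 0; x < out_w; ++x) {
      const float fx = (x + 0.5f) * sx - 0.5f;
      const int x0 = static_cast<int>(std::floor(fx));
      const float wx = fx - x0;
      const int x0c = std::max(0, std::min(x0, in_w - 1));
      const int x1c = std::max(0, std::min(x0 + 1, in_w - 1));
      const float top =
          src[y0c * in_w + x0c] * (1 - wx) + src[y0c * in_w + x1c] * wx;
      const float bot =
          src[y1c * in_w + x0c] * (1 - wx) + src[y1c * in_w + x1c] * wx;
      dst[static_cast<size_t>(y) * out_w + x] = top * (1 - wy) + bot * wy;
    }
  }
  return dst;
}

// Steps 3-4: scale raw bytes to [0, 1], then normalize one channel in place.
void normalize_channel(std::vector<float>& ch, float mean, float stddev) {
  for (float& v : ch) v = (v / 255.0f - mean) / stddev;
}
```

A quick sanity check for the resize: a constant 32x32 image must stay constant after upsampling to 224x224, and resizing to the same size must return the input unchanged.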
:::danger
You should **NOT** git clone the ExecuTorch repo or install any packages on the Raspberry Pi. You only need the executable ```executor_runner```, your generated binary file ```mobilenet.pte```, and the dataset ```test_batch.bin```.
:::

### **Part 4: Apply XNNPACK Backend (10%)**

You should find that the inference speed is slow. This is because the implementation above only ensures portability, not performance. Learn how to use the [XNNPACK Backend](https://pytorch.org/executorch/stable/native-delegates-executorch-xnnpack-delegate.html) for faster inference by following the guide below.

> [XNNPACK Backend Guide](https://pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html)

Prepare a new export file ```xnn_export.py``` to export a MobileNetV2 model using the weights from Lab0. You will obtain the generated binary file ```xnn_mobilenet.pte```, which is for later use.

:::success
In this section, you do not need to implement the ```Lowering a Quantized Model to XNNPACK``` part.
:::

### **Part 5: Run XNNPACK Backend model on Rpi (15%)**

In this part, you need to cross-compile your program with XNNPACK support and transfer the new executable and model to the Raspberry Pi.

Run the command below to cross-compile the program:

```bash
(rm -rf cmake-out && mkdir cmake-out && cd cmake-out && cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/path/to/toolchain-aarch64-gcc.cmake -DEXECUTORCH_BUILD_XNNPACK=ON -DBUCK2=/path/to/buck2 ..)
```

You will find the executable at **```executorch/cmake-out/backends/xnnpack/xnn_executor_runner```**.

Again, use the ```scp``` command to transfer the executable ```xnn_executor_runner``` and your generated binary file ```xnn_mobilenet.pte``` to the Raspberry Pi. Run the executable on the whole dataset (10,000 images) and check the accuracy.
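Steps 6 and 7 of the inference flow in Part 2.2 reduce to an argmax over the 10 class logits plus a running count of correct predictions. A minimal sketch (function names are illustrative; in the real runner the logits come from the model's output tensor):

```cpp
#include <cstddef>
#include <vector>

// Step 6: the predicted label is the index of the largest logit.
int argmax(const std::vector<float>& logits) {
  size_t best = 0;
  for (size_t i = 1; i < logits.size(); ++i)
    if (logits[i] > logits[best]) best = i;
  return static_cast<int>(best);
}

// Step 7: accuracy is the fraction of predictions matching the labels.
float accuracy(const std::vector<int>& predictions,
               const std::vector<int>& labels) {
  size_t correct = 0;
  for (size_t i = 0; i < predictions.size(); ++i)
    if (predictions[i] == labels[i]) ++correct;
  return predictions.empty()
             ? 0.0f
             : static_cast<float>(correct) / predictions.size();
}
```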
## **Hand-In Policy**

Submit your code, both executor runners, and your generated binary files (.pte) in a zip file structured as shown:

```
TeamXX.zip
├── code/
│   ├── executor_runner.cpp
│   ├── util.h
│   ├── export.py
│   └── xnn_export.py
│
├── model/
│   ├── mobilenet.pte (from Part 1)
│   └── xnn_mobilenet.pte (from Part 4)
│
├── executor_runner (from Part 3)
└── xnn_executor_runner (from Part 4)
```

Ensure your program outputs accuracy in the specified format.

## **Evaluation Criteria**

Upon receiving your zip file, we will unzip it and add the dataset folder ```data/test_batch.bin```, so the folder structure becomes:

```
TeamXX
├── data/
│   └── test_batch.bin
├── code/
├── model/
├── executor_runner
└── xnn_executor_runner
```

Then we will run the following commands to test your executor runners:

```bash
./executor_runner --model_path ./model/mobilenet.pte
./xnn_executor_runner --model_path ./model/xnn_mobilenet.pte
```

:::danger
**Important Note:** To receive full credit, your executor runner must correctly read the data and run the model, and the accuracy must be higher than **0.9**. These two requirements are crucial and shouldn't be overlooked.
:::

## **Reference**

- [ExecuTorch Overview](https://pytorch.org/executorch/stable/intro-overview.html)
- [How ExecuTorch works](https://pytorch.org/executorch/stable/intro-how-it-works.html)
- [ExecuTorch Backend and Delegate](https://pytorch.org/executorch/stable/compiler-delegate-and-partitioner.html)
- [ExecuTorch Concepts](https://pytorch.org/executorch/stable/concepts.html)