## Problem
- Design a CFU for the MLPerf™ Tiny image classification benchmark model, targeting decreased latency.
- Your design will be benchmarked by the MLPerf™ Tiny Benchmark Framework. See its [Github page](https://github.com/mlcommons/tiny) for detailed information about MLPerf™ Tiny.

### Selected model
- The [MLPerf™ Tiny Image Classification Benchmark Model](https://github.com/mlcommons/tiny/tree/master/benchmark/training/image_classification) is a tiny version of ResNet.
- It consists of Conv2D, Add, AvgPool2D, FC, and Softmax.
- You don't need to integrate the model yourself; it is already included in CFU-Playground.
    - See `${CFU_ROOT}/common/src/models/mlcommons_tiny_v01/imgc/`
- You can inspect the architecture of the selected model with [Netron](https://netron.app/).
    - Upload the model and you will see a vivid computation graph containing information about the operators, the tensors, and the dependencies between them.
    - It might provide some inspiration for your design.

## Setup
- Clone this [fork of CFU-Playground](https://github.com/liuyy3364/CFU-Playground.git) to get the final project template
    - Final project template path: `${CFU_ROOT}/proj/AAML_final_proj`
- Accuracy and latency are evaluated by the provided evaluation script
    - Script path: `${CFU_ROOT}/proj/AAML_final_proj/eval_script.py`
    - Dependency:
        ```shell=
        pip install pyserial tqdm
        ```

## Requirement
- Files that you can modify (a hedged kernel-side sketch of how these hand work to `cfu.v` follows at the end of this section)
    - Kernel API
        1. `tensorflow/lite/micro/kernels/add.cc`
        2. `tensorflow/lite/micro/kernels/conv.cc`
        3. `tensorflow/lite/micro/kernels/fully_connected.cc`
    - Kernel implementation
        1. `tensorflow/lite/kernels/internal/reference/integer_ops/add.h`
        2. `tensorflow/lite/kernels/internal/reference/integer_ops/conv.h`
        3. `tensorflow/lite/kernels/internal/reference/integer_ops/fully_connected.h`
    - HW design
        1. `cfu.v`

:::info
:warning: No other source code in `${CFU_ROOT}/common/**` and `${CFU_ROOT}/third_party/**` should be overridden unless you ask for permission.
:::

- Your design should pass the golden test
    - After `make prog && make load`, input `11g` to run the golden test of the MLPerf Tiny imgc model
        - Make sure you are running imgc's golden test if multiple models are included
    - The golden test passes if you see this:
        - ![image](https://hackmd.io/_uploads/Hyi4yYGBa.png)
- You can modify the architecture or the parameters of the selected model
    - In that case, the classification accuracy of your design will be evaluated
        - Run `python eval_script.py` in `${CFU_ROOT}/proj/AAML_final_proj`
        - `--port {tty_path:-/dev/ttyUSB1}`: add this argument to select the correct serial port
- Improve the performance of your design to drive its latency as low as possible
    - Accuracy and latency are evaluated by the provided evaluation script
    - Usage:
        - `make prog && make load` > reboot litex > turn off litex-term > run eval script
        - ![image](https://hackmd.io/_uploads/HJ0NHVCST.png)

:::info
:bulb: If you just want to know the latency of your design, it is easier to run a single test input than the whole evaluation process (a cycle-count sketch follows below).
:::
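For the tip just above about timing a single input, here is a minimal sketch. It assumes the `perf_get_mcycle()` cycle-counter helper that CFU-Playground ships in `${CFU_ROOT}/common/src/perf.h` (verify the name in your tree; reading the RISC-V `mcycle` CSR directly is an equivalent fallback), and the callback passed in is whatever runs one imgc inference.

```cpp
// Hedged sketch: cycle-count a single piece of work instead of running
// the whole evaluation. Assumes perf_get_mcycle() from CFU-Playground's
// perf.h; check the helper name in your tree before relying on it.
#include <cstdio>

#include "perf.h"

// Returns the number of cycles `work` took to run.
unsigned TimeCycles(void (*work)()) {
  unsigned start = perf_get_mcycle();
  work();
  return perf_get_mcycle() - start;
}

// Usage (run_one_imgc_input is a hypothetical stand-in for whatever
// invokes a single golden-test inference in your project):
//   printf("%u cycles\n", TimeCycles(run_one_imgc_input));
```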
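Separately, to make the SW/HW boundary between the modifiable files concrete: below is a minimal, hedged sketch of how the inner multiply-accumulate loop in `conv.h` *might* be handed to a CFU. It assumes the stock CFU-Playground `cfu_op0(funct7, in0, in1)` macro from `cfu.h`; the `funct7` encodings (0 = reset, 1 = MAC, 2 = read back) and the helper `ConvMacRow` are hypothetical and must match whatever you actually implement in `cfu.v`. Treat it as one possible design, not the reference implementation.

```cpp
// Hedged sketch: packed int8 multiply-accumulate offloaded to the CFU.
// Assumes cfu_op0(funct7, in0, in1) from CFU-Playground's cfu.h; all
// funct7 encodings below are hypothetical and must match your cfu.v.
#include <cstdint>
#include <cstring>

#include "cfu.h"

// Accumulates `depth` input*filter products. `depth` is assumed to be a
// multiple of 4 here; the real kernel also needs a scalar tail loop.
int32_t ConvMacRow(const int8_t* input, const int8_t* filter, int depth) {
  cfu_op0(0, 0, 0);  // hypothetical funct7=0: clear the CFU accumulator
  for (int d = 0; d < depth; d += 4) {
    uint32_t in_word, filt_word;
    memcpy(&in_word, input + d, 4);     // pack four consecutive inputs
    memcpy(&filt_word, filter + d, 4);  // pack four consecutive weights
    // Hypothetical funct7=1: accumulate four int8*int8 products per call.
    // The +input_offset term is assumed to be added inside cfu.v.
    cfu_op0(1, in_word, filt_word);
  }
  return cfu_op0(2, 0, 0);  // hypothetical funct7=2: read the accumulator
}
```

On the hardware side, the matching `cfu.v` would decode these encodings from `cmd_payload_function_id` and keep the running sum in a register, while requantization (output multiplier, shift, clamping) stays in the C++ kernel.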
## Presentation

:::info
:warning: You will receive 0 points if you don't present your work
:::

- 30%
- You should give a presentation in the last class of this semester
- Each team has at most 5 minutes to present
- Your presentation should contain
    - The introduction of your design
        - SW
        - HW
    - (Optional) The implementation of your design
        - SW
        - HW
    - The evaluation of your design
        - Accuracy (if you modify the selected model)
        - Latency

## Grading Policy
- We will compare the performance of your design with our reference design, which is an implementation of HW2 and will not be released.
- ACC won't be tested if you don't modify the model
- $LAT_{TA} \approx 154M\ cycles \approx 2036000\ \mu s$
- All $ACC_{XX}$ and $LAT_{XX}$ are measured by the provided evaluation script
- The ranking will be released together with everyone's evaluation results after the deadline.

### Grading formula
- Accuracy:

$$ GOLD=\left\{ \begin{array}{l} 1\ if\ golden\ test\ passed, \\ 0\ if\ golden\ test\ failed \end{array} \right. $$

$$ ACC=Min(\frac{ACC_{student}}{ACC_{ori}},\ 100\%) $$

:::info
:warning: Note that better ACC won't give you a better score!!
:::

- Latency:

$$ \left. \begin{array}{l} LAT_{base}=Min(80\times\frac{LAT_{TA}}{LAT_{student}},\ 80)\\ LAT_{rank}=Min(20\times\frac{\#students-Rank_{student}}{\#students},\ 20) \\ \ \ \ \ \ \ \ \ \ where\ Rank_{student} \in [0,\#students-1] \end{array} \right. $$

- Presentation:

$$ Present=\left\{ \begin{array}{l} -30\ if\ you\ submit\ a\ plain\ impl\ of\ lab2\ with\ the\ same\ performance\ as\ TA's, \\ -0\ otherwise \end{array} \right. $$

- Final score (see the worked example at the end of this document):

$$ \left. \begin{array}{c} Score=GOLD\times ACC \times (LAT_{base}+LAT_{rank})\ +\ Present\\ (Highest\ score=1\times 100\%\times(80+20)-0=100) \end{array} \right. $$

## Submission
- Please fork my repo and push your work to it
- **If you use your own model**
    - Put the pretrained model under `${CFU_ROOT}/proj/AAML_final_proj` or somewhere else we can easily find it
    - Send us the link to your training/optimization script (Github repo, GoogleDrive, ...) via email (yyliu.cs11@nycu.edu.tw)
        - Or put them in your final project repo and note where to find them in the `README.md` under your CFU project directory \([this file](https://github.com/liuyy3364/CFU-Playground/blob/main/proj/AAML_final_proj/README.md)\)
- **Put the link to your fork and your presentation slides in this spreadsheet**
    - https://docs.google.com/spreadsheets/d/15Got2YzOi-4sHKineF5v3dHaOsCUczfY464RRLa8qCU/edit#gid=1193696083
- The grading workflow will be:
    1. Clone your fork
    2. Apply your custom model if needed
    3. `make prog && make load`
    4. Run golden test
    5. Run evaluation script
    6. Record measurements
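As a worked numerical check of the grading formula above (all numbers hypothetical): a design that passes the golden test with an unmodified model ($ACC = 100\%$), measures $LAT_{student} = 77M$ cycles, and ranks 5th among 50 students ($Rank_{student} = 4$) would score

$$ LAT_{base}=Min(80\times\frac{154M}{77M},\ 80)=80,\ \ \ LAT_{rank}=20\times\frac{50-4}{50}=18.4 $$

$$ Score=1\times 100\%\times(80+18.4)-0=98.4 $$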