## Problem
- Design a CFU for the MLPerf™ Tiny image classification benchmark model, with the goal of reducing inference latency.
- Your design will be benchmarked with the MLPerf™ Tiny Benchmark Framework. See its [GitHub page](https://github.com/mlcommons/tiny) for detailed information about MLPerf™ Tiny.
### Selected model
- [MLPerf™ Tiny Image Classification Benchmark Model](https://github.com/mlcommons/tiny/tree/master/benchmark/training/image_classification) is a tiny version of ResNet.
- It consists of Conv2D, Add, AvgPool2D, FC, and Softmax.
- You don't need to integrate the model yourself; it is already included in CFU-Playground.
- See `${CFU_ROOT}/common/src/models/mlcommons_tiny_v01/imgc/`
- You can inspect the architecture of the selected model with [Netron](https://netron.app/).
    - Upload the model and you will see a computation graph showing the operators, tensors, and the dependencies between them.
    - It might give you some inspiration for your design.
## Setup
- Clone this [fork of CFU-Playground](https://github.com/liuyy3364/CFU-Playground.git) to get the final project template
    - Final project template path: `${CFU_ROOT}/proj/AAML_final_proj`
- Accuracy and Latency are evaluated by the provided evaluation script
- Script path: `${CFU_ROOT}/proj/AAML_final_proj/eval_script.py`
- Dependency:
```shell=
pip install pyserial tqdm
```
## Requirement
- Files that you can modify
- Kernel API
1. `tensorflow/lite/micro/kernels/add.cc`
2. `tensorflow/lite/micro/kernels/conv.cc`
3. `tensorflow/lite/micro/kernels/fully_connected.cc`
    - Kernel implementation (see the CFU offload sketch after the note below)
1. `tensorflow/lite/kernels/internal/reference/integer_ops/add.h`
2. `tensorflow/lite/kernels/internal/reference/integer_ops/conv.h`
3. `tensorflow/lite/kernels/internal/reference/integer_ops/fully_connected.h`
- HW design
1. `cfu.v`
:::info
:warning: No other source code under `${CFU_ROOT}/common/**` or `${CFU_ROOT}/third_party/**` should be overridden unless you ask for permission first.
:::
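For orientation only, here is a minimal software-side sketch of one common split: the inner multiply-accumulate of the convolution loop in `conv.h` is forwarded to the CFU through the `cfu_op*` macros from CFU-Playground's `cfu.h`. The three sub-operations (0 = reset accumulator, 1 = MAC, 2 = read back), the offset handling, and the helper name `cfu_dot_product` are assumptions of this sketch, not part of the template; the real interface is whatever you decode in your `cfu.v`.

```cpp
// Hypothetical SW-side sketch for conv.h: stream input/filter pairs to the
// CFU and read back the accumulated dot product. Assumes cfu.h's
// cfu_op0/cfu_op1/cfu_op2 macros and a cfu.v that implements
// funct3=0: reset accumulator, funct3=1: MAC, funct3=2: return accumulator.
#include <cstdint>
#include "cfu.h"

static inline int32_t cfu_dot_product(const int8_t* input,   // input patch
                                      const int8_t* filter,  // filter weights
                                      int depth) {
  cfu_op0(0, 0, 0);  // reset the hardware accumulator
  for (int d = 0; d < depth; ++d) {
    // One 8-bit MAC per call; a faster design would pack four int8 values
    // into each 32-bit operand (and add the input offset in hardware).
    cfu_op1(0, input[d], filter[d]);
  }
  return static_cast<int32_t>(cfu_op2(0, 0, 0));  // read the result back
}
```

On the hardware side, `cfu.v` would decode the function id of the CFU handshake to select among these sub-operations; how much of the convolution you move into hardware is entirely your design choice.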
- Your design should pass the golden test
    - After `make prog && make load`, enter `11g` to run the golden test of the MLPerf Tiny imgc model
- Make sure you are running imgc's golden test if multiple models are included
    - The golden test has passed if you see the expected output:
        - *(screenshot of a passing golden-test run)*
- You can modify the architecture or the parameters of the selected model
- The classification accuracy of your design should be evaluated
    - Run `python eval_script.py` in `${CFU_ROOT}/proj/AAML_final_proj`
        - Add `--port {tty_path:-/dev/ttyUSB1}` to select the correct serial port, e.g. `python eval_script.py --port /dev/ttyUSB1`
- Improve the performance of your design to make the latency as low as possible
- Accuracy and Latency are evaluated by the provided evaluation script
- Usage:
        - `make prog && make load` → reboot LiteX → close litex-term → run the evaluation script
:::info
:bulb: If you only want to know the latency of your design, it is easier to run a single test input than the whole evaluation process.
:::
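If you do measure latency manually, one minimal approach (a sketch under assumptions, not the official flow) is to wrap a single inference with cycle-counter reads. This assumes CFU-Playground's `perf.h` exposes `perf_get_mcycle()` (check `${CFU_ROOT}/common/src/perf.h` for the exact API); `run_one_inference()` below is a hypothetical stand-in for invoking the imgc model on one test input.

```cpp
// Rough per-inference latency probe.
// Assumption: perf.h provides perf_get_mcycle() (32-bit cycle-counter read).
#include <cstdio>
#include "perf.h"

extern void run_one_inference();  // hypothetical: run the imgc model once

void measure_one_inference() {
  unsigned start = perf_get_mcycle();  // cycles before inference
  run_one_inference();
  unsigned end = perf_get_mcycle();    // cycles after inference
  printf("latency: %u cycles\n", end - start);  // unsigned diff tolerates one wrap
}
```

The project's test menu typically prints cycle counts after each run as well; a probe like this is mainly useful when you want to time one specific region of a kernel.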
## Presentation
:::info
:warning: You will receive 0 points if you don't present your work
:::
- 30%
- You should give a presentation in the last class of this semester
    - Each team has at most 5 minutes to present
- Your presentation should contain
- The introduction of your design
- SW
- HW
- (Optional) The implementation of your design
- SW
- HW
- The evaluation of your design
- Accuracy (if you modify the selected model)
- Latency
## Grading Policy
- We will compare the performance of your design with our reference design, which is an implementation of HW2 and will not be released.
    - ACC won't be tested if you don't modify the model
- $LAT_{TA} \approx 154M\ cycles \approx 2036000\ \mu s$
- All $ACC_{XX}$ and ${LAT_{XX}}$ are measured by the provided evaluation script
- Ranking will be released with everyone's evaluation result after the deadline.
### Grading formula
- Accuracy:
$$
GOLD=
\begin{cases}
1 & \text{if the golden test passed} \\
0 & \text{if the golden test failed}
\end{cases}
$$
$$
ACC=Min(\frac{ACC_{student}}{ACC_{ori}},\ 100\%)
$$
:::info
:warning: Note that a higher ACC won't give you a better score!
:::
- Latency:
$$
\begin{aligned}
LAT_{base} &= Min\left(80\times\frac{LAT_{TA}}{LAT_{student}},\ 80\right)\\
LAT_{rank} &= Min\left(20\times\frac{\#students-Rank_{student}}{\#students},\ 20\right)\\
&\qquad \text{where } Rank_{student} \in [0,\ \#students-1]
\end{aligned}
$$
- Presentation
$$
Present=
\begin{cases}
-30 & \text{if you submit a plain implementation of Lab 2 with the same performance as the TA's} \\
0 & \text{otherwise}
\end{cases}
$$
- Final score:
$$
\begin{gathered}
Score = GOLD\times ACC \times (LAT_{base}+LAT_{rank}) + Present\\
(\text{Highest score} = 1\times 100\%\times(80+20)+0 = 100)
\end{gathered}
$$
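A hypothetical worked example (all numbers are invented for illustration): suppose the golden test passes, the model is unmodified (so $ACC=100\%$), the measured latency is $77M$ cycles (half of $LAT_{TA}$), and the design ranks 3rd out of 20 students ($Rank_{student}=2$). Then:
$$
\begin{aligned}
LAT_{base} &= Min\left(80\times\tfrac{154M}{77M},\ 80\right) = 80\\
LAT_{rank} &= Min\left(20\times\tfrac{20-2}{20},\ 20\right) = 18\\
Score &= 1\times 100\%\times(80+18)+0 = 98
\end{aligned}
$$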
## Submission
- Please fork my repo and push your work to it
- **If you use your own model**
- Put pretrained model under `${CFU_ROOT}/proj/AAML_final_proj` or somewhere else we can easily find it
- Send us the link to your training/optimization script (Github repo or GoogleDrive ...) via email (yyliu.cs11@nycu.edu.tw)
    - Or you can put them in your final project repo and leave a message about where to find them in the `README.md` under your CFU project directory \([this file](https://github.com/liuyy3364/CFU-Playground/blob/main/proj/AAML_final_proj/README.md)\)
- **Put the link of your fork and your presentation slides to this spreadsheet**
- https://docs.google.com/spreadsheets/d/15Got2YzOi-4sHKineF5v3dHaOsCUczfY464RRLa8qCU/edit#gid=1193696083
- Grading workflow will be:
1. Clone your fork
2. Apply your custom model if needed
    3. `make prog && make load`
4. Run golden test
5. Run evaluation script
6. Record measurements