# MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
###### tags: `TinyML`
###### paper origin: NeurIPS 2021
###### paper: [link](https://mcunet.mit.edu/#mcunetv2)
## Objective
Enable memory-hungry DNN applications, especially computer vision, on memory-constrained devices such as MCUs.
## Prerequisite Knowledge
### Computational Graph in DNN Model
A graph that shows the data dependencies among layers and the execution flow of the model.
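A minimal sketch of how such a graph can be represented and executed: nodes are layers, edges are data dependencies, and a topological sort yields a valid execution order. The layer names are made up for illustration.

```python
# Minimal computational-graph sketch: nodes are layers, edges are data
# dependencies; a topological sort yields a valid execution order.
# Layer names here are made up for illustration.
from graphlib import TopologicalSorter

# node -> set of nodes it depends on
deps = {
    "conv1": set(),
    "conv2_dw": {"conv1"},
    "conv2_pw": {"conv2_dw"},
    "add": {"conv1", "conv2_pw"},   # residual connection
}

execution_order = list(TopologicalSorter(deps).static_order())
print(execution_order)  # e.g. ['conv1', 'conv2_dw', 'conv2_pw', 'add']
```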

### Memory Plan
A static memory allocation plan for DNN inference on an MCU.

- Reasons why we need this:
    1. Implementing dynamic allocation on an MCU is too costly.
    2. Dynamic allocation may cause heap fragmentation.
    3. All memory usage is determined by the model definition.
- So we can perform static analysis to approach, or even reach, the optimal allocation (see the sketch below).
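A minimal sketch of such a static analysis for a straight-line model: with per-layer execution, only a layer's input and output activations need to be resident at once, so peak memory is the maximum over layers of (input bytes + output bytes). The layer shapes below are illustrative, and int8 activations (1 byte per element) are assumed.

```python
# Minimal static memory-plan sketch for a straight-line model.
# Per-layer execution keeps only one layer's input and output alive at a time,
# so peak activation memory = max over layers of (input bytes + output bytes).
# Shapes are illustrative int8 tensors (1 byte per element).

def activation_bytes(shape, bytes_per_elem=1):
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem

# (input shape, output shape) per layer, as H x W x C
layers = [
    ((224, 224, 3), (112, 112, 32)),
    ((112, 112, 32), (56, 56, 16)),
    ((56, 56, 16), (56, 56, 24)),
]

peak = max(activation_bytes(i) + activation_bytes(o) for i, o in layers)
print(f"peak activation memory: {peak / 1024:.0f} kB")  # 539 kB, set by the first layer
```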
### Receptive field of CNN layers
The portion of the input that is needed to compute a given portion of the output.
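For a chain of conv/pool layers, the receptive field follows a standard recurrence over kernel sizes and strides. A quick sketch; the (kernel, stride) list is illustrative, not a specific model.

```python
# Receptive-field sketch for a chain of conv/pool layers, using the standard
# recurrence: r_out = r_in + (k - 1) * j_in,  j_out = j_in * s,
# where r is the receptive field and j the cumulative stride ("jump").

def receptive_field(layers):
    r, j = 1, 1                      # one input pixel, unit jump
    for k, s in layers:              # (kernel size, stride) per layer
        r += (k - 1) * j
        j *= s
    return r

print(receptive_field([(3, 2), (3, 1), (3, 2), (3, 1)]))  # -> 19
```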

## Motivation
- DNN trend
    - The memory consumption of DNN applications keeps growing.
        - Memory consumption of MobileNetV2_1.0_224's first convolution layer:
            - $(224^2 \times 3 + 112^2 \times 32)\ \text{bytes} \approx 539\ \text{kB}$ (assuming int8 activations; the arithmetic is reproduced in the sketch below)
- Resource-constrained platform: MCU
    - SRAM is usually smaller than 512 kB.
- Imbalanced memory usage distribution in CNN inference
    - The first few layers dominate the peak activation memory.
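Reproducing the arithmetic above, assuming int8 activations (1 byte per element) and that the layer's input and output must both be resident:

```python
# Peak activation memory of the first conv (224x224x3 -> 112x112x32),
# assuming int8 activations, i.e. 1 byte per element.
input_bytes  = 224 * 224 * 3
output_bytes = 112 * 112 * 32
print((input_bytes + output_bytes) / 1024)  # ~539 kB, already above a 512 kB SRAM budget
```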
## Design
### Patch-based Inference (Per-Patch inference)

- Split the output tensor of the last layer of this stage into multiple output patches.
- For each output patch, run the convolutions only on its receptive field in the input, then combine all output patches to obtain the full result (see the sketch below).
- Similar to [Fused-Layer CNN](/3ZS0x6CvTFqZ8kEJTcFkYw)
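A toy NumPy sketch of the idea for a two-layer stage of 3×3 "same" convolutions (single channel, no bias). It only illustrates the scheduling; it is not the paper's MCU kernel implementation.

```python
# Toy patch-based inference for a 2-layer stage of 3x3 "same" convolutions:
# each output patch is computed from its own receptive field in the input,
# then the patches are stitched together.
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2D convolution (cross-correlation) for the demo."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))              # toy input feature map
k1, k2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

xp = np.pad(x, 2)                              # zero-pad once for both 3x3 layers
full = conv2d_valid(conv2d_valid(xp, k1), k2)  # per-layer baseline (16x16 output)

P = 8                                          # output patch side -> 2x2 patches
out = np.zeros_like(full)
for r in range(0, 16, P):
    for c in range(0, 16, P):
        crop = xp[r:r + P + 4, c:c + P + 4]    # receptive field: +2 halo per conv per side
        out[r:r + P, c:c + P] = conv2d_valid(conv2d_valid(crop, k1), k2)

assert np.allclose(out, full)                  # same result; each step only holds a small crop
```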
### Use Patch-based Inference to Solve the Memory Imbalance in CNN Inference
Divide a CNN model into two parts: a patch-based part and a normal (per-layer) part.

- Patch-based part
    - Performs per-patch inference.
    - Its high memory consumption is reduced by patch-based inference.
- Normal part
    - Runs normally (per-layer inference).
- Result: the memory bottleneck is broken and the memory constraint is met (a rough comparison follows below).
    - MobileNetV2: 1372 kB → 172 kB
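A rough sketch of why the split helps: in the patch-based stage, only one patch's input/output activations are live at a time, so the peak of the memory-heavy early layers shrinks roughly with the number of patches. Halo overlap between patches is ignored here, so the numbers are slightly optimistic; int8 activations are assumed.

```python
# Rough peak-memory comparison for the memory-heavy first layer:
# per-layer execution vs. patch-based execution with n spatial patches.
# Halo overlap is ignored, so the patch-based numbers are slightly optimistic.

def peak_kb(in_hwc, out_hwc, n_patches=1):
    ih, iw, ic = in_hwc
    oh, ow, oc = out_hwc
    return (ih * iw * ic + oh * ow * oc) / n_patches / 1024  # int8: 1 byte/element

first_conv = ((224, 224, 3), (112, 112, 32))     # the ~539 kB layer from above
for p in (1, 4, 9):                              # 1x1, 2x2, 3x3 spatial patch grids
    print(f"{p} patch(es): {peak_kb(*first_conv, n_patches=p):.0f} kB")
# 1 -> ~539 kB, 4 -> ~135 kB, 9 -> ~60 kB; the later per-layer stage is unchanged.
```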
## Problem of the Design
### Computation Overhead: Recomputation of Overlapping Receptive Fields
The overlapping receptive fields of neighboring output patches are recomputed, which adds computation overhead (a rough estimate is sketched below).
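The overhead can be estimated by comparing the total input area read under patch-based execution (patch plus halo, per patch) against the original input area. The layer configuration and patch grid below are illustrative; boundary clipping is ignored.

```python
# Estimate the recomputation overhead of patch-based execution: each output
# patch needs its patch-sized input region plus a halo from the stage's
# receptive field, so overlapping halos get recomputed.

def stage_receptive_field(layers):          # same recurrence as before
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j

layers = [(3, 2), (3, 1), (3, 2), (3, 1)]   # (kernel, stride) of the patch-based stage
r, j = stage_receptive_field(layers)        # r = 19, total stride j = 4

H = 224                                     # input resolution
out = H // j                                # stage output resolution (56)
p = 4                                       # 4x4 grid of output patches
patch_out = out // p
patch_in = (patch_out - 1) * j + r          # input side needed per output patch (halo included)

overhead = (p * p * patch_in ** 2) / H ** 2
print(f"input pixels read: {overhead:.2f}x the original")  # ~1.61x -> recomputation overhead
```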

### Solution: Redistributing the Receptive Field
- Modify the model architecture:
    - Reduce the receptive field size in the per-patch stage:
        - Shrink the kernel size of the first few layers.
        - Remove some of the first few layers.
    - Make up for the receptive field reduction in the per-layer stage:
        - Insert more layers there (see the sketch below).
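A sketch of the idea using the same receptive-field recurrence as above: shrinking the early kernels reduces the halo each patch needs (less recomputation), and the lost receptive field is recovered by extra layers in the per-layer stage, where nothing is recomputed. The configurations are illustrative, not the paper's searched architecture.

```python
# Receptive-field redistribution sketch: shrink kernels in the patch-based
# stage (smaller halo -> less recomputation), then recover the overall
# receptive field with extra layers in the per-layer stage.

def receptive_field(layers, r=1, j=1):
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j

# before: large kernels early
early_a, late_a = [(3, 2), (3, 1), (3, 2)], [(3, 1), (3, 1)]
# after: a 1x1 in the early stage, one extra 3x3 layer later
early_b, late_b = [(3, 2), (1, 1), (3, 2)], [(3, 1), (3, 1), (3, 1)]

for early, late in [(early_a, late_a), (early_b, late_b)]:
    r_early, j_early = receptive_field(early)
    r_total, _ = receptive_field(late, r_early, j_early)
    print(f"early-stage RF (drives recomputation): {r_early:3d}, total RF: {r_total}")
# early RF drops 11 -> 7 while total RF grows 27 -> 31.
```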

### Result

## Joint Neural Architecture and Inference Scheduling Search
### Description
Use NAS to find a proper redistribution configuration.
### Reason
Redistributing the receptive field can relieve the computation overhead, but how to redistribute varies case by case across models.
Selecting the configuration manually is not efficient, so we need an automated tool to handle it (a simplified sketch of such a search follows below).
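A highly simplified sketch of what such a joint search could look like: enumerate backbone knobs (early kernel size) together with scheduling knobs (patch count, split point), drop configurations that exceed the SRAM budget, and rank the rest by estimated overhead. The knobs, ranges, and cost models here are placeholders, not the paper's search space or its evolutionary search procedure.

```python
# Highly simplified joint-search sketch: enumerate backbone knobs (early kernel
# size) with scheduling knobs (patch grid, split point), drop configurations
# over the SRAM budget, rank the rest by estimated recomputation overhead.
# All knobs, ranges, and cost models are placeholders for illustration.
from itertools import product

SRAM_KB = 256

def peak_memory_kb(kernel, n_patches, split):
    # placeholder analytic model: bigger kernels / fewer patches -> more memory
    return 540 / n_patches + 10 * kernel + 5 * split

def overhead(kernel, n_patches, split):
    # placeholder: more patches, larger early kernels, later split -> more recomputation
    return n_patches * (kernel - 1) * split * 0.01

candidates = []
for kernel, n_patches, split in product([3, 5, 7], [4, 9, 16], [2, 3, 4]):
    if peak_memory_kb(kernel, n_patches, split) <= SRAM_KB:
        candidates.append((overhead(kernel, n_patches, split), kernel, n_patches, split))

for cost, kernel, n_patches, split in sorted(candidates)[:3]:
    print(f"kernel={kernel}, patches={n_patches}, split@block {split}, overhead~{cost:.2f}")
```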
### Backbone Optimization

### Inference Scheduling Optimization

### Joint Search

#### Appendix for Joint search



## Experiment
### Memory profiling
1. Analytic profiling

2. On-device profiling
## Evaluation
### Reducing Peak Memory of Existing Networks


### MCUNetV2 for Tiny Image Classification

### MCUNetV2 for Tiny Object Detection

## Analysis

## Related works
