# MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
###### tags: `TinyML`
###### paper origin: NeurIPS 2021
###### paper: [link](https://mcunet.mit.edu/#mcunetv2)
## Objective
Enable memory-hungry DNN applications, especially computer vision, on memory-constrained devices such as MCUs.
## Prerequisite Knowledge
### Computational Graph in DNN Model
A graph that shows the data dependencies among layers and the execution flow of the model.
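A minimal sketch of how such a graph can be represented and executed: nodes are layers, edges are data dependencies, and a topological sort yields a valid execution order. The layer names are made up for illustration.

```python
# Minimal computational-graph sketch: nodes are layers, edges are data
# dependencies; a topological sort yields a valid execution order.
# Layer names here are made up for illustration.
from graphlib import TopologicalSorter

# node -> set of nodes it depends on
deps = {
    "conv1": set(),
    "conv2_dw": {"conv1"},
    "conv2_pw": {"conv2_dw"},
    "add": {"conv1", "conv2_pw"},   # residual connection
}

execution_order = list(TopologicalSorter(deps).static_order())
print(execution_order)  # e.g. ['conv1', 'conv2_dw', 'conv2_pw', 'add']
```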

### Memory Plan
A static memory allocation plan for DNN inference on an MCU.

- Reasons why we need this:
    1. Implementing dynamic allocation on an MCU is too costly.
    2. Dynamic allocation may cause heap fragmentation.
    3. All memory usage is determined by the model definition.
- So we can perform static analysis to approach, or even reach, the optimal allocation (see the sketch below).
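A minimal sketch of such a static analysis for a straight-line model: with per-layer execution, only a layer's input and output activations need to be resident at once, so peak memory is the maximum over layers of (input bytes + output bytes). The layer shapes below are illustrative, and int8 activations (1 byte per element) are assumed.

```python
# Minimal static memory-plan sketch for a straight-line model.
# Per-layer execution keeps only one layer's input and output alive at a time,
# so peak activation memory = max over layers of (input bytes + output bytes).
# Shapes are illustrative int8 tensors (1 byte per element).

def activation_bytes(shape, bytes_per_elem=1):
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem

# (input shape, output shape) per layer, as H x W x C
layers = [
    ((224, 224, 3), (112, 112, 32)),
    ((112, 112, 32), (56, 56, 16)),
    ((56, 56, 16), (56, 56, 24)),
]

peak = max(activation_bytes(i) + activation_bytes(o) for i, o in layers)
print(f"peak activation memory: {peak / 1024:.0f} kB")  # 539 kB, set by the first layer
```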
### Receptive field of CNN layers
The portion of the input that is needed to compute a given portion of the output.
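For a chain of conv/pool layers, the receptive field follows a standard recurrence over kernel sizes and strides. A quick sketch; the (kernel, stride) list is illustrative, not a specific model.

```python
# Receptive-field sketch for a chain of conv/pool layers, using the standard
# recurrence: r_out = r_in + (k - 1) * j_in,  j_out = j_in * s,
# where r is the receptive field and j the cumulative stride ("jump").

def receptive_field(layers):
    r, j = 1, 1                      # one input pixel, unit jump
    for k, s in layers:              # (kernel size, stride) per layer
        r += (k - 1) * j
        j *= s
    return r

print(receptive_field([(3, 2), (3, 1), (3, 2), (3, 1)]))  # -> 19
```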

## Motivation
- DNN trend
    - The memory consumption of DNN applications keeps growing.
        - Memory consumption of MobileNetV2_1.0_224's first convolution layer:
            - $(224^2 \times 3 + 112^2 \times 32)\ \text{bytes} \approx 539\ \text{kB}$ (assuming int8 activations; the arithmetic is reproduced in the sketch below)
- Resource-constrained platform: MCU
    - SRAM is usually smaller than 512 kB.
- Imbalanced memory usage distribution in CNN inference
    - The first few layers dominate the peak activation memory.
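Reproducing the arithmetic above, assuming int8 activations (1 byte per element) and that the layer's input and output must both be resident:

```python
# Peak activation memory of the first conv (224x224x3 -> 112x112x32),
# assuming int8 activations, i.e. 1 byte per element.
input_bytes  = 224 * 224 * 3
output_bytes = 112 * 112 * 32
print((input_bytes + output_bytes) / 1024)  # ~539 kB, already above a 512 kB SRAM budget
```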
## Design
### Patch-based Inference (Per-Patch inference)

- Split the output tensor of the last layer of this stage into multiple output patches.
- For each output patch, run the convolutions only on its receptive field in the input, then combine all output patches to obtain the full result (see the sketch below).
- Similar to [Fused-Layer CNN](/3ZS0x6CvTFqZ8kEJTcFkYw)
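A toy NumPy sketch of the idea for a two-layer stage of 3×3 "same" convolutions (single channel, no bias). It only illustrates the scheduling; it is not the paper's MCU kernel implementation.

```python
# Toy patch-based inference for a 2-layer stage of 3x3 "same" convolutions:
# each output patch is computed from its own receptive field in the input,
# then the patches are stitched together.
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2D convolution (cross-correlation) for the demo."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))              # toy input feature map
k1, k2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

xp = np.pad(x, 2)                              # zero-pad once for both 3x3 layers
full = conv2d_valid(conv2d_valid(xp, k1), k2)  # per-layer baseline (16x16 output)

P = 8                                          # output patch side -> 2x2 patches
out = np.zeros_like(full)
for r in range(0, 16, P):
    for c in range(0, 16, P):
        crop = xp[r:r + P + 4, c:c + P + 4]    # receptive field: +2 halo per conv per side
        out[r:r + P, c:c + P] = conv2d_valid(conv2d_valid(crop, k1), k2)

assert np.allclose(out, full)                  # same result; each step only holds a small crop
```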
### Use Patch-based Inference to Solve the Memory Imbalance in CNN Inference
Divide a CNN model into two parts: a patch-based part and a normal (per-layer) part.

- Patch-based part
    - Performs per-patch inference.
    - Its high memory consumption is reduced by patch-based inference.
- Normal part
    - Runs normally (per-layer inference).
- Result: the memory bottleneck is broken and the memory constraint is met (a rough comparison follows below).
    - MobileNetV2: 1372 kB → 172 kB
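A rough sketch of why the split helps: in the patch-based stage, only one patch's input/output activations are live at a time, so the peak of the memory-heavy early layers shrinks roughly with the number of patches. Halo overlap between patches is ignored here, so the numbers are slightly optimistic; int8 activations are assumed.

```python
# Rough peak-memory comparison for the memory-heavy first layer:
# per-layer execution vs. patch-based execution with n spatial patches.
# Halo overlap is ignored, so the patch-based numbers are slightly optimistic.

def peak_kb(in_hwc, out_hwc, n_patches=1):
    ih, iw, ic = in_hwc
    oh, ow, oc = out_hwc
    return (ih * iw * ic + oh * ow * oc) / n_patches / 1024  # int8: 1 byte/element

first_conv = ((224, 224, 3), (112, 112, 32))     # the ~539 kB layer from above
for p in (1, 4, 9):                              # 1x1, 2x2, 3x3 spatial patch grids
    print(f"{p} patch(es): {peak_kb(*first_conv, n_patches=p):.0f} kB")
# 1 -> ~539 kB, 4 -> ~135 kB, 9 -> ~60 kB; the later per-layer stage is unchanged.
```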
## Problem of the Design
### Computation Overhead: Recomputation of Overlapping Receptive Fields
The overlapping receptive fields of neighboring output patches are recomputed, which adds computation overhead (a rough estimate is sketched below).
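The overhead can be estimated by comparing the total input area read under patch-based execution (patch plus halo, per patch) against the original input area. The layer configuration and patch grid below are illustrative; boundary clipping is ignored.

```python
# Estimate the recomputation overhead of patch-based execution: each output
# patch needs its patch-sized input region plus a halo from the stage's
# receptive field, so overlapping halos get recomputed.

def stage_receptive_field(layers):          # same recurrence as before
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j

layers = [(3, 2), (3, 1), (3, 2), (3, 1)]   # (kernel, stride) of the patch-based stage
r, j = stage_receptive_field(layers)        # r = 19, total stride j = 4

H = 224                                     # input resolution
out = H // j                                # stage output resolution (56)
p = 4                                       # 4x4 grid of output patches
patch_out = out // p
patch_in = (patch_out - 1) * j + r          # input side needed per output patch (halo included)

overhead = (p * p * patch_in ** 2) / H ** 2
print(f"input pixels read: {overhead:.2f}x the original")  # ~1.61x -> recomputation overhead
```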

### Solution: Redistributing the Receptive Field
- Modify the model architecture:
    - Reduce the receptive field size in the per-patch stage:
        - Shrink the kernel size of the first few layers.
        - Remove some of the first few layers.
    - Make up for the receptive field reduction in the per-layer stage:
        - Insert more layers there (see the sketch below).
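A sketch of the idea using the same receptive-field recurrence as above: shrinking the early kernels reduces the halo each patch needs (less recomputation), and the lost receptive field is recovered by extra layers in the per-layer stage, where nothing is recomputed. The configurations are illustrative, not the paper's searched architecture.

```python
# Receptive-field redistribution sketch: shrink kernels in the patch-based
# stage (smaller halo -> less recomputation), then recover the overall
# receptive field with extra layers in the per-layer stage.

def receptive_field(layers, r=1, j=1):
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j

# before: large kernels early
early_a, late_a = [(3, 2), (3, 1), (3, 2)], [(3, 1), (3, 1)]
# after: a 1x1 in the early stage, one extra 3x3 layer later
early_b, late_b = [(3, 2), (1, 1), (3, 2)], [(3, 1), (3, 1), (3, 1)]

for early, late in [(early_a, late_a), (early_b, late_b)]:
    r_early, j_early = receptive_field(early)
    r_total, _ = receptive_field(late, r_early, j_early)
    print(f"early-stage RF (drives recomputation): {r_early:3d}, total RF: {r_total}")
# early RF drops 11 -> 7 while total RF grows 27 -> 31.
```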

### Result

## Joint Neural Architecture and Inference Scheduling Search
### Description
Use NAS to find a proper redistribution configuration.
### Reason
Redistributing the receptive field can relieve the computation overhead, but how to redistribute varies case by case across models.
Selecting the configuration manually is not efficient, so we need an automated tool to handle it (a simplified sketch of such a search follows below).
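A highly simplified sketch of what such a joint search could look like: enumerate backbone knobs (early kernel size) together with scheduling knobs (patch count, split point), drop configurations that exceed the SRAM budget, and rank the rest by estimated overhead. The knobs, ranges, and cost models here are placeholders, not the paper's search space or its evolutionary search procedure.

```python
# Highly simplified joint-search sketch: enumerate backbone knobs (early kernel
# size) with scheduling knobs (patch grid, split point), drop configurations
# over the SRAM budget, rank the rest by estimated recomputation overhead.
# All knobs, ranges, and cost models are placeholders for illustration.
from itertools import product

SRAM_KB = 256

def peak_memory_kb(kernel, n_patches, split):
    # placeholder analytic model: bigger kernels / fewer patches -> more memory
    return 540 / n_patches + 10 * kernel + 5 * split

def overhead(kernel, n_patches, split):
    # placeholder: more patches, larger early kernels, later split -> more recomputation
    return n_patches * (kernel - 1) * split * 0.01

candidates = []
for kernel, n_patches, split in product([3, 5, 7], [4, 9, 16], [2, 3, 4]):
    if peak_memory_kb(kernel, n_patches, split) <= SRAM_KB:
        candidates.append((overhead(kernel, n_patches, split), kernel, n_patches, split))

for cost, kernel, n_patches, split in sorted(candidates)[:3]:
    print(f"kernel={kernel}, patches={n_patches}, split@block {split}, overhead~{cost:.2f}")
```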
### Backbone Optimization

### Inference Scheduling Optimization

### Joint Search

#### Appendix for Joint search



## Experiment
### Memory profiling
1. Analytic profiling

2. On-device profiling
## Evaluation
### Reducing Peak Memory of Existing Networks


### MCUNetV2 for Tiny Image Classification

### MCUNetV2 for Tiny Object Detection

## Analysis

## Related works
