# Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks
###### tags: `Accelerators` `Multi-Tenant` `Dynamic Architecture Fission`
###### paper origin: MICRO 2020
###### papers: [link](https://ieeexplore.ieee.org/document/9251939)
###### video: [link](https://www.youtube.com/watch?v=PKS6VgxYVg0)
# 1. INTRODUCTION
## Research Problems
* As the demand for INFaaS (INference-as-a-Service) scales, continuously increasing the number of accelerators in the cloud may not be efficient, whereas multi-tenancy has been a primary enabler of cloud computing's success at its current scale.
## Proposed Solutions
* Dynamically fissioning the DNN accelerator at runtime to spatially co-locate multiple DNN inferences on the same hardware.
* Dynamic architecture fission for spatial multi-tenant execution.
* Task scheduling for spatial multi-tenant execution.
# Dynamic Architecture Fission: Concepts And Overview

The objective is to enable multi-tenant execution of DNNs by spatially co-locating multiple DNN tasks on a single accelerator. To do so, the underlying accelerator needs to dynamically fission at runtime into smaller, full-fledged logical accelerators, each of which executes its pertinent DNN.
* Fission microarchitecture
* Microarchitecture that can fission dynamically into smaller full-fledged accelerators to execute multiple DNNs simultaneously.
* Task scheduler
* A task scheduling algorithm that adaptively schedules and assigns the resources to different tasks.
* the scheduler identifies the minimal amount of resources required to execute each DNN while meeting its imposed QoS constraints.
* it uses a scoring mechanism that combines task priority and remaining slack time to distribute the remaining resources on the accelerator and spatially co-locate tasks (a sketch follows below).
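As a rough illustration of this scoring idea, here is a minimal Python sketch. The formula is an assumption for illustration (the paper only describes the mechanism qualitatively at this point); `priority` and `remaining_slack` are hypothetical inputs.

```python
def score(priority: float, remaining_slack: float, eps: float = 1e-9) -> float:
    """Hypothetical scoring: tasks with higher priority and less remaining
    slack before their QoS deadline get a larger score, so the scheduler
    favors them when distributing leftover resources.
    NOTE: assumed formula, not the paper's exact equation."""
    return priority / (remaining_slack + eps)
```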
# Architecture Design For Fission: Challenges And Opportunities

* Fission for Compute and the Need for New Communication Patterns
* The need for flexible and cost-effective fission of compute resources
* mapping a convolution or matrix-multiplication operation to a big monolithic systolic array can lead to underutilization of compute resources when the operation's dimensions do not fill the array (illustrated below)
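To make the underutilization concrete, a back-of-the-envelope model (the dimensions are chosen for illustration and are not taken from the paper):

```python
def systolic_utilization(rows_used: int, cols_used: int,
                         array_rows: int, array_cols: int) -> float:
    """Fraction of PEs doing useful work when a GEMM tile that maps onto
    rows_used x cols_used PEs runs on an array_rows x array_cols systolic
    array (simplified spatial-utilization model)."""
    used = min(rows_used, array_rows) * min(cols_used, array_cols)
    return used / (array_rows * array_cols)

# A layer that only occupies 64 x 32 PEs on a monolithic 128 x 128 array:
print(systolic_utilization(64, 32, 128, 128))  # 0.125 -> 87.5% of PEs idle
# The same layer on a fissioned 64 x 32 logical accelerator:
print(systolic_utilization(64, 32, 64, 32))    # 1.0 -> fully utilized
```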

* Fission granularity
* only a subset of the inter-PE links is made reconfigurable, so fission disconnects a subarray of PEs rather than a single PE; this choice sets the fission granularity while limiting hardware cost.
* The need for new and flexible patterns of communication for richer fission possibilities.
* the inter-PE links are made bi-directional, so data can flow in either direction along each axis

The paper proposes omni-directional systolic arrays that can forward the input activations and partial sums in all directions, as opposed to conventional systolic arrays that always forward the data in just two fixed directions (see the sketch below).
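A toy model of omni-directional forwarding, assuming a subarray holds a 2D grid of values (e.g., partial sums) that move one PE-hop per cycle; this illustrates the dataflow idea only, not the paper's actual microarchitecture:

```python
import numpy as np

def forward(grid: np.ndarray, direction: str) -> np.ndarray:
    """Shift a subarray's values one PE-hop in the given direction,
    zero-filling the vacated edge. A conventional systolic array supports
    only two fixed directions (e.g., 'east' for activations and 'south'
    for partial sums); an omni-directional subarray supports all four."""
    out = np.zeros_like(grid)
    if direction == "east":
        out[:, 1:] = grid[:, :-1]
    elif direction == "west":
        out[:, :-1] = grid[:, 1:]
    elif direction == "south":
        out[1:, :] = grid[:-1, :]
    elif direction == "north":
        out[:-1, :] = grid[1:, :]
    else:
        raise ValueError(f"unknown direction: {direction}")
    return out
```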
* Enabling full-fledged logical accelerators through fission for the SIMD Vector Unit.
* the SIMD Vector Unit (which executes the non-GEMM operations) also needs to be broken into smaller segments, so that each logical accelerator remains full-fledged
* Fission for the On-Chip Memory and the Need for Reorganizing the Entire Design
* Weight buffer fission.
* Activation and output buffer fission.

* Fission without Reorganization Defeats the Purpose

# Microarchitecture For Fission
* Omni-Directional Systolic Array Design

* Reorganizing the Accelerator Microarchitecture through Fission Pod Design
* Objective
* Creating multiple stand-alone and full-fledged logical accelerators to enable spatial co-location.
* Enriching the fission possibilities as much as possible to serve various computational needs of co-located DNNs.
* Maximizing the PE subarray utilization.
* Maximizing the utilization of the on-chip buffers and their bandwidth to the subarrays.
* Constraints
* Imposing minimal power/area overhead on the hardware
* Maintaining the baseline clock frequency

* Memory-compute interweaving in Fission Pod
* Intra Fission Pod data communication.
* Clock frequency consideration.
* Planaria Overall Architecture

* The original monolithic systolic array is broken down into 16 omni-directional systolic subarrays, where each group of four subarrays forms one Fission Pod that contains a Pod Memory (a structural sketch follows).
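As a structural summary of that organization, a minimal Python sketch; the class names, subarray dimensions, and Pod Memory capacity are illustrative placeholders, since the exact sizes are not restated here:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Subarray:
    """One omni-directional systolic subarray (dimensions illustrative)."""
    rows: int
    cols: int

@dataclass
class FissionPod:
    """A group of four subarrays interwoven with a shared Pod Memory."""
    subarrays: List[Subarray]
    pod_memory_kb: int  # placeholder capacity, not specified here

def build_planaria(sub_rows: int, sub_cols: int, pod_mem_kb: int) -> List[FissionPod]:
    """16 subarrays total, grouped four per pod into 4 Fission Pods."""
    return [FissionPod([Subarray(sub_rows, sub_cols) for _ in range(4)],
                       pod_mem_kb)
            for _ in range(4)]
```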
# Spatial Task Scheduling
* Requirements
* The scheduler ideally needs to be aware of the optimal fission configurations for DNN tasks to leverage dynamic fission and co-location.
* The scheduler needs to be QoS-aware and leverage the slack time offered by each task's QoS constraint to maximize co-location and utilization while adhering to the SLA.
* Task re-allocation requires checkpointing the intermediate results, while ensuring that re-allocation and checkpointing do not overuse on-chip memory or incur significant context-switching overhead.

* Estimating the minimal resources needed to meet the QoS requirement.
* Scheduling is triggered whenever a new inference task is dispatched to the datacenter node's task queue or a running inference task finishes.
* Allocating resources to improve QoS.
* The scheduler determines the allocation of subarrays based on their availability and the priority of the inference requests.
* If a task cannot be allocated, its score is increased so that it is favored in subsequent scheduling rounds.
* Tile-based scheduling to minimize re-allocation overheads.
* Scheduling happens at tile granularity.
* Tasks are preempted only when the resource allocation changes (a sketch of the loop follows).
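Putting these steps together, a minimal sketch of the scheduling loop. The scoring formula, the `min_subarrays` estimate, and the leftover-distribution policy are assumptions standing in for the paper's estimator and scoring mechanism; tile-boundary preemption is only noted in the comments.

```python
from dataclasses import dataclass
from typing import List

TOTAL_SUBARRAYS = 16  # Planaria fissions into 16 omni-directional subarrays

@dataclass
class Task:
    name: str
    priority: float
    deadline: float      # absolute QoS deadline (seconds)
    min_subarrays: int   # minimal resources that still meet QoS (assumed given)
    allocated: int = 0
    score: float = 0.0

def schedule_round(tasks: List[Task], now: float) -> None:
    """One scheduling round, triggered when a task arrives or finishes.
    Hypothetical policy: waiting tasks accumulate score (priority over
    remaining slack), each waiting task is granted its minimal QoS
    allocation in score order, and leftover subarrays are spread over
    running tasks. Re-allocation would preempt a task only at a tile
    boundary, after checkpointing the current tile's results."""
    free = TOTAL_SUBARRAYS - sum(t.allocated for t in tasks)
    for t in tasks:
        if t.allocated == 0:
            slack = max(t.deadline - now, 1e-9)
            t.score += t.priority / slack  # assumed scoring formula
    for t in sorted(tasks, key=lambda t: t.score, reverse=True):
        if t.allocated == 0 and t.min_subarrays <= free:
            t.allocated = t.min_subarrays
            free -= t.min_subarrays
    running = sorted((t for t in tasks if t.allocated > 0),
                     key=lambda t: t.score, reverse=True)
    i = 0
    while free > 0 and running:
        running[i % len(running)].allocated += 1  # extra subarrays tighten latency
        free -= 1
        i += 1
```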
# Results
