# VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs
##### paper origin: HPCA 2023
##### paper: [link](https://arxiv.org/pdf/2302.08687.pdf)
[TOC]
## Introduction
* Deep learning (DL) is used in various domains including computer vision, recommendation systems, and natural language processing.
* In some deployment scenarios, CPUs are more suitable than GPUs or dedicated accelerators as the primary processor for running deep neural networks (DNNs).
### Problems
* DNN workloads on CPUs are dominated by matrix multiplications, which drive a rapid growth in compute demand.
### Solutions
* Deploy dense matrix engines alongside the conventional scalar and vector engines to accelerate GEMM (general matrix multiplication), which is at the core of DL models (a minimal reference sketch follows this list).
* The paper proposes VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs, which extends such matrix engines with structured-sparsity support.
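
As a point of reference for the kernel these matrix engines accelerate, here is a minimal dense GEMM sketch in Python (an illustration only; the hardware engines operate on fixed-size tiles):

```python
import numpy as np

def gemm(A, B, C):
    """Naive dense GEMM: C += A @ B, the kernel that matrix engines
    tile and accelerate in hardware. Illustration only."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and C.shape == (M, N)
    for i in range(M):
        for k in range(K):
            C[i, :] += A[i, k] * B[k, :]  # one row of MACs per A element
    return C
```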
## Background
* Sparsity Pattern: VEGETA targets flexible N:M structured sparsity (e.g., the 1:4 pattern used later in this note), where each group of M consecutive elements contains at most N non-zeros; a minimal storage sketch follows.

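The following is a minimal sketch (my own, not the paper's code; the function names are hypothetical) of how an N:M structured-sparse row can be stored as its non-zero values plus small per-group offsets:

```python
import numpy as np

def compress_nm(row, n=1, m=4):
    """Hypothetical helper, not from the paper's artifact.
    Compress a row obeying N:M structured sparsity: per group of m
    consecutive elements, keep up to n non-zero values plus their
    offsets within the group (the metadata)."""
    assert len(row) % m == 0
    values, meta = [], []
    for g in range(0, len(row), m):
        nz = np.flatnonzero(row[g:g + m])
        assert len(nz) <= n, "row violates the N:M structure"
        for k in range(n):                      # pad to exactly n slots
            idx = int(nz[k]) if k < len(nz) else 0
            values.append(row[g + idx] if k < len(nz) else 0.0)
            meta.append(idx)
    return np.asarray(values), np.asarray(meta, dtype=np.int8)

def decompress_nm(values, meta, n=1, m=4):
    """Rebuild the dense row from (values, metadata)."""
    row = np.zeros(len(values) // n * m)
    for s, (v, idx) in enumerate(zip(values, meta)):
        if v != 0.0:                            # skip padded slots
            row[(s // n) * m + idx] = v
    return row
```

With the 1:4 pattern, this stores one value and one 2-bit offset per group of four elements.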
* VEGETA Registers: tile registers that hold the non-zero values of a compressed sparse tile, paired with metadata registers that record each non-zero's position within its group.


* VEGETA Instructions: new tile instructions (e.g., `TILE_SPMM` variants) that perform sparse-dense GEMM on the tile registers; a reference model of their semantics follows.

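As a hedged reference model of what a `TILE_SPMM`-style instruction computes (my sketch of the semantics, not the paper's ISA definition), the following reuses the compressed format from the sketch above; A is held row-wise in compressed N:M form, while B and C are dense tiles:

```python
def tile_spmm(values, meta, B, C, n=1, m=4):
    """Reference model (not the paper's ISA spec) of a TILE_SPMM-style
    operation: C += A @ B, with A stored row-wise in compressed N:M form.
    values, meta: 2D arrays of shape (rows, K // m * n).
    B: dense (K, N_cols) tile; C: dense (rows, N_cols), updated in place."""
    rows, slots = values.shape
    for i in range(rows):
        for s in range(slots):
            a = values[i, s]
            if a == 0.0:
                continue                      # zeros issue no MAC work
            k = (s // n) * m + meta[i, s]     # original column of A
            C[i, :] += a * B[k, :]
    return C
```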
## Architecture
* Processing Unit (PU): a PU is composed of a number of MAC units.
* Processing Element (PE): PUs that share the same eastbound inputs and output buffers are grouped into a PE (a rough structural sketch follows).



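A rough structural sketch of this hierarchy (all parameters below are hypothetical, chosen only to illustrate the grouping, not the paper's configuration):

```python
from dataclasses import dataclass

@dataclass
class PU:
    """Processing Unit: a bundle of MAC units."""
    num_macs: int

@dataclass
class PE:
    """Processing Element: PUs sharing the same eastbound inputs
    and output buffers."""
    pus: list

def build_engine(num_pes=2, pus_per_pe=2, macs_per_pu=16):
    # All three parameters are illustrative, not the paper's numbers.
    return [PE([PU(macs_per_pu) for _ in range(pus_per_pe)])
            for _ in range(num_pes)]
```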
* Cycle-level visualization of a `TILE_SPMM_V` instruction on VEGETA-S-2-2 with 1:4 structured sparsity for matrix A; dimensions A: 16×128 (yellow), B: 128×16 (magenta), C: 16×16 (green). A matching reference computation is sketched below.


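For concreteness, the shapes in that figure map onto the earlier sketches as follows (randomly generated data, illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 128
A = np.zeros((16, K))
for i in range(16):                      # build a 1:4-structured A
    for g in range(0, K, 4):
        A[i, g + rng.integers(4)] = rng.standard_normal()
B = rng.standard_normal((K, 16))
C = np.zeros((16, 16))

pairs = [compress_nm(r, 1, 4) for r in A]
vals = np.stack([p[0] for p in pairs])
meta = np.stack([p[1] for p in pairs])
tile_spmm(vals, meta, B, C, 1, 4)
assert np.allclose(C, A @ B)             # matches the dense result
```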
* Overview of VEGETA in a CPU. The parts containing our contributions are highlighted in red.

## Evaluation
* Row-wise achieves 2.36× and 3.28× speedups at 90% and 95% sparsity, respectively. SIGMA performs better than the other designs at extremely high sparsity degrees (>95%), but it is inefficient at the modest sparsity degrees targeted by this work, indicating that its extra area overhead does not pay off there.
* Area and power (normalized to RASA-SM) and operating frequency are compared across the different VEGETA engine designs.

## Conclusions
* The VEGETA architecture adds flexible N:M structured sparsity support to CPU matrix engines through extensions to the ISA and the engines themselves.
* The authors explore different VEGETA engine design choices to understand the trade-offs between performance and area.