# Reading Note – Compute Caches
###### tags: `paper`
## Introduction
* paper: [here](http://web.eecs.umich.edu/~reetudas/papers/compute_cache.pdf)
* author: Shaizeen Aga, Supreet Jeloka, Arun Subramaniyan, Satish Narayanasamy, David Blaauw, and Reetuparna Das
* publish: 2017 HPCA
* key: bit-line SRAM circuit as computational cache
## overview
* SRAM circuit bit-line computing [2],[3]
> benefits: parallelism and reduced data movement
> trade-off: 8% cache area overhead
* major problems:
    * operand locality: bit-line computing requires that the data operands be stored in rows that share the same set of bit-lines
    * software geometry
    * near-place compute caches: read operands out of the cache sub-arrays and perform the arithmetic close to the cache controller
    * managing parallelism across cache levels
* evaluation & applications
    * text processing, bitmap indexing, copy-on-write checkpointing in the OS, and bit matrix multiplication (BMM), a critical primitive used in cryptography, bioinformatics, and image processing
## background
* compute cache overview on Intel's SandyBridge

* A sub-array in a cache bank is organized into multiple rows of data-storing bit-cells
* SRAM circuit for in-place operation
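
To make the in-place operation concrete: when two word-lines in one sub-array are activated together, sensing the bit-line yields the AND of the two rows and sensing the complementary bit-line yields their NOR. The sketch below is a software model of that behaviour only, not the circuit. The function name `bitline_compute`, the integer-per-row encoding, and the `width` parameter are assumptions made for illustration.

```python
def bitline_compute(row_a: int, row_b: int, width: int = 8) -> dict:
    """Toy model of activating two word-lines that share the same bit-lines.

    Each row is encoded as an integer; bit i stands for the cell on bit-line i.
    (This encoding is an assumption of the sketch, not the paper's notation.)
    """
    mask = (1 << width) - 1
    and_bits = (row_a & row_b) & mask        # value sensed on the bit-line
    nor_bits = ~(row_a | row_b) & mask       # value sensed on the complementary bit-line
    xor_bits = ~(and_bits | nor_bits) & mask # XOR derived by NOR-ing the two sensed values
    return {"and": and_bits, "nor": nor_bits, "xor": xor_bits}


if __name__ == "__main__":
    result = bitline_compute(0b1100, 0b1010, width=4)
    print({name: format(bits, "04b") for name, bits in result.items()})
```

All bit positions are processed at once, which is where the parallelism of bit-line computing comes from.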

## compute cache advantages
* reduces on-chip data movement overhead: both the energy spent on data transfer and the energy spent reading and writing the higher-level caches
## compute cache architecture
* prerequisite: operands are mapped to sub-arrays such that they share the same bit-lines
* a cache geometry that allows a compiler to satisfy operand locality by ensuring that the operands are page-aligned
* Compute Cache (CC) ISA

* operand size: 64 words (512 bytes)
* By feeding the result of the sense-amplifiers back to the bit-lines, one word-line can be copied to another without ever latching the source operand.
* operand locality: the operands need to be physically stored in a sub-array such that they share the same set of bit-lines
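
The copy and operand-locality points above can be sketched as a toy model: a copy stays in place only when source and destination rows live in the same sub-array, because only then do they share bit-lines. The `SubArray` class and `in_place_copy` helper are hypothetical names for this note. The model ignores timing, sense-amplifier details, and the real cache organization.

```python
class SubArray:
    """Toy sub-array: every row is an int whose bits sit on shared bit-lines."""

    def __init__(self, num_rows: int, width: int):
        self.width = width
        self.rows = [0] * num_rows

    def copy(self, src_row: int, dst_row: int) -> None:
        # Models feeding the sensed value back onto the bit-lines so that the
        # destination word-line captures it; the source is never latched elsewhere.
        self.rows[dst_row] = self.rows[src_row]


def in_place_copy(src, dst) -> None:
    """Copy between (sub_array, row) pairs, enforcing operand locality."""
    src_array, src_row = src
    dst_array, dst_row = dst
    if src_array is not dst_array:
        # Rows in different sub-arrays do not share bit-lines, so this toy
        # model refuses the operation instead of computing in place.
        raise ValueError("operands do not share bit-lines (different sub-arrays)")
    src_array.copy(src_row, dst_row)


if __name__ == "__main__":
    sa = SubArray(num_rows=4, width=8)
    sa.rows[0] = 0b10110001
    in_place_copy((sa, 0), (sa, 3))
    print(format(sa.rows[3], "08b"))
```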
* cache organization

* Block Partition (BP): the group of cache blocks in a sub-array that share the same bit-lines
* all the ways in a set are mapped to the same block partition
* software

* software can ensure operand locality as long as operands are page-aligned, i.e., have the same page offset.
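
A minimal sketch of the software-side check, assuming 4 KiB pages (the page size is an assumption, as are the names `PAGE_SIZE` and `same_page_offset`): two operands satisfy the condition above when their addresses have the same page offset.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def same_page_offset(addr_a: int, addr_b: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True when both addresses have the same offset within their pages."""
    return (addr_a % page_size) == (addr_b % page_size)


if __name__ == "__main__":
    print(same_page_offset(0x1040, 0x7040))  # same offset 0x040
    print(same_page_offset(0x1040, 0x7080))  # offsets 0x040 and 0x080 differ
```

Buffers that each start on a page boundary pass this check trivially, since both offsets are zero.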
## simulation:
* [SniperSim](http://snipersim.org/w/The_Sniper_Multi-Core_Simulator)