# System relevant paper notes

###### tags: `paper`

* [ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs](https://hackmd.io/@yaohsiaopid/SJbN6UduH)
* [Compute Cache](https://hackmd.io/@yaohsiaopid/SkMnzfzKH)
* [Composite-ISA](https://hackmd.io/@yaohsiaopid/S17BSeEtS)
* [DREDGE: Dynamic Repartitioning during Dynamic Graph Execution](#DREDGE)
* [Co-Design of Deep Neural Net Accelerators for Embedded Vision Applications; BAIR, DAC 2018](#DAC18_BAIR)

## DREDGE

DREDGE: Dynamic Repartitioning during Dynamic Graph Execution, DAC'19

* Authors: Andrew McCrabb, Eric Winsor, Valeria Bertacco; University of Michigan
* Abstract: a hardware solution for heuristic repartitioning of dynamic graphs.
* Problem: graph algorithms on dynamic graphs are bottlenecked in memory by repartitioning.
    * Scenario: a recommendation system, i.e., a dynamic graph whose vertices and edges are added or deleted over time.
    * Flaw: software repartitioning on a many-core architecture relies on costly cross-vault communication to optimize the layout between algorithm iterations.

![](https://i.imgur.com/0VwSvD6.png)

* Idea: a hardware unit on each vault snoops the messaging and tracks each vertex's need to move to another vault during computation, then migrates vertices during a short repartitioning phase.

![](https://i.imgur.com/BRNGmZo.png)
![](https://i.imgur.com/7JOWsEm.png)

* How: the unit tracks a "pressure()" value per vertex. Whenever an access arrives from a vertex on another vault via a message, the pressure toward that vault increases, while an internal access within the vault decreases the vertex's pressure. Because the unit's memory is limited, not all vertices on a vault are tracked: a rarely accessed vertex whose pressure() falls below zero is removed from the tracking list, and is added back when it is accessed again. (A minimal sketch of this heuristic appears at the end of these notes.)

![](https://i.imgur.com/fBje6dy.png)

* Result: hopcount (the sum of the distances, in network hops, of all edges in the graph) serves as the repartitioning-quality metric; a sketch of the metric also appears at the end of these notes.

![](https://i.imgur.com/yDrHojh.png)

## DAC18_BAIR

Co-Design of Deep Neural Net Accelerators for Embedded Vision Applications; BAIR, DAC 2018

* SqueezeNet as the target network
* "co-design" ~ choose weight-stationary (WS) or output-stationary (OS) dataflow per layer (see the sketch after this list):
    * 1x1 convolution: WS > OS
    * 1st layer: OS > WS
    * depthwise convolution: OS > WS
* processes layers one by one
* OS exploits the sparsity of filters
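As referenced in the list above, a tiny sketch of the per-layer dataflow choice: pick WS or OS from the layer type. The layer descriptors, the `select_dataflow` helper, and the default case are hypothetical illustrations, not the paper's mechanism.

```python
# Hypothetical sketch of per-layer dataflow selection; only the three cases
# covered in the notes above are grounded in the paper.

def select_dataflow(layer):
    if layer["is_first"]:
        return "OS"            # 1st layer: OS beats WS
    if layer["kind"] == "depthwise":
        return "OS"            # depthwise convolution: OS beats WS
    if layer["kernel"] == (1, 1):
        return "WS"            # 1x1 convolution: WS beats OS
    return "WS"                # default: an assumption, not from the notes

layers = [
    {"is_first": True,  "kind": "conv",      "kernel": (7, 7)},
    {"is_first": False, "kind": "conv",      "kernel": (1, 1)},
    {"is_first": False, "kind": "depthwise", "kernel": (3, 3)},
]
for i, layer in enumerate(layers):            # layers are processed one by one
    print(i, select_dataflow(layer))          # -> OS, WS, OS
```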
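Returning to DREDGE's pressure mechanism described above: a minimal software sketch, assuming a per-vault counter table of fixed capacity. All names (`VaultTracker`, `on_access`, `pick_migrations`) and the eviction/threshold details are hypothetical, not the paper's hardware interface.

```python
# Hypothetical model of the per-vault tracking unit from the DREDGE notes:
# remote accesses raise pressure toward the requesting vault, internal
# accesses decay it, and decayed vertices are dropped from the table.

class VaultTracker:
    def __init__(self, vault_id, num_vaults, table_capacity=1024):
        self.vault_id = vault_id
        self.num_vaults = num_vaults
        self.capacity = table_capacity     # limited tracking memory
        self.pressure = {}                 # vertex -> per-vault pressure counters

    def on_access(self, vertex, src_vault):
        """Snoop one access to `vertex`; `src_vault` is where it came from."""
        if vertex not in self.pressure:
            if len(self.pressure) >= self.capacity:
                return                     # table full: vertex stays untracked
            self.pressure[vertex] = [0] * self.num_vaults
        counters = self.pressure[vertex]
        if src_vault == self.vault_id:
            # Internal access: vertex is well placed, decay all remote pressure.
            for v in range(self.num_vaults):
                counters[v] -= 1
        else:
            # Remote access: pressure toward the requesting vault grows.
            counters[src_vault] += 1
        # A vertex whose pressure has decayed below zero is evicted; it
        # re-enters the table the next time it is accessed.
        if all(c < 0 for c in counters):
            del self.pressure[vertex]

    def pick_migrations(self, threshold=4):
        """During the repartitioning phase, propose (vertex, target_vault) moves."""
        moves = []
        for vertex, counters in self.pressure.items():
            target = max(range(self.num_vaults), key=lambda v: counters[v])
            if target != self.vault_id and counters[target] >= threshold:
                moves.append((vertex, target))
        return moves

# Usage: snoop accesses, then harvest migration candidates at a phase boundary.
tracker = VaultTracker(vault_id=0, num_vaults=4)
for _ in range(5):
    tracker.on_access(vertex=42, src_vault=2)  # vertex 42 keeps being pulled to vault 2
print(tracker.pick_migrations())               # -> [(42, 2)]
```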
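And a quick sketch of the hopcount metric from the Result bullet above: the sum over all edges of the network-hop distance between the vaults holding the two endpoints. The 2D-mesh layout and Manhattan-distance hop model are illustrative assumptions, not taken from the paper.

```python
# Hopcount = sum over edges of hop distance between endpoint vaults,
# assuming (hypothetically) vaults laid out on a 2D mesh.

def hopcount(edges, vault_of, mesh_width):
    """edges: iterable of (u, v); vault_of: dict vertex -> vault id."""
    def hops(a, b):
        ax, ay = a % mesh_width, a // mesh_width
        bx, by = b % mesh_width, b // mesh_width
        return abs(ax - bx) + abs(ay - by)    # Manhattan distance on the mesh
    return sum(hops(vault_of[u], vault_of[v]) for u, v in edges)

# Example: 2x2 mesh; an intra-vault edge costs 0, a cross-corner edge costs 2.
print(hopcount([(0, 1), (1, 2)], {0: 0, 1: 0, 2: 3}, mesh_width=2))  # -> 2
```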