# Papers

## ISCA 2018

### GPUs

- [ ] [RegMutex: Inter-Warp GPU Register Time-Sharing](https://drive.google.com/open?id=1hXG25sjvY-7Ldf6Q5GlkNwoksV450PtI)
    - Shares a subset of physical registers between warps during GPU kernel execution.
    > warp: the smallest unit of execution scheduled on each SM in CUDA
    > SM: Streaming Multiprocessor
    > Reference: [CUDA 的 Threading:Block 和 Grid 的設定與 Warp](https://kheresy.wordpress.com/2008/07/09/cuda-%E7%9A%84-threading%EF%BC%9Ablock-%E5%92%8C-grid-%E7%9A%84%E8%A8%AD%E5%AE%9A%E8%88%87-warp/)
- [ ] [Generic System Calls for GPUs](https://drive.google.com/open?id=1AKfOyoqLuMAKL67fWMRX9BGdcLtHrbeB)
    - Implements system calls on Linux that improve GPU performance.
- [ ] [The Locality Descriptor](https://drive.google.com/open?id=1Vn4zh4cui3f_hBaKKmCZA1fukzeKpZSG)
    - Improves the locality and reuse of GPU data.

### Machine learning system

- [ ] [SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks](https://drive.google.com/open?id=1TsHVDOvCJAX29Vse75vYeYEltyZWh8Mh)
    - Cuts down the amount of `convolution` computation. A negative `input` normally becomes 0 after the `activation`, so this work predicts whether the `input` will be negative and, if so, skips the remaining computation (see the sketch at the end of this note).
- [ ] [UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition](https://drive.google.com/open?id=14z5jzHEuAr5UTQkNpkX9iCckzlGk-7Aj)
    - Implements an accelerator that exploits weight repetition, with Eyeriss-style sparsity optimizations (see the sketch at the end of this note).
- [ ] [Energy-efficient Neural Network Accelerator Based on Outlier-aware Low-precision Computation](https://drive.google.com/open?id=1i7f80-QalH6tm29n9DvM24dRR15GKOt_)
    - A hardware accelerator, called the outlier-aware accelerator.
    - It performs dense, low-precision computation for the majority of the data (weights and activations) while efficiently handling a small number of sparse, high-precision outliers (e.g., about 3% of the total data); see the sketch at the end of this note.

## AI accelerator

- [ ] On-Chip Memory Optimization of High Efficiency Accelerator for Deep Convolutional Neural Networks
    - Not much use.
- [ ] High Performance Accelerator for CNN Applications
    - Implements a very simple version of a CNN accelerator on an FPGA.
    - Not much use either.
- [ ] An Energy-Efficient and Flexible Accelerator based on Reconfigurable Computing for Multiple Deep Convolutional Neural Networks
    - Improves the data movement scheme of Eyeriss.

###### tags: `論文`
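
A minimal Python sketch of the early-termination idea noted for SnaPEA above, not the paper's actual hardware mechanism: assuming non-negative input activations (e.g., the output of a previous ReLU layer), the positive weights are processed first; once only negative weights remain and the partial sum is already non-positive, the final ReLU output must be 0, so the remaining multiply-accumulates can be skipped. The function name and example values are made up for illustration.

```python
import numpy as np

def relu_dot_early_exit(weights, activations):
    """Compute relu(dot(weights, activations)) with early termination.

    Sketch of the predictive-early-activation idea: process positive weights
    first; once only negative weights remain, the partial sum can only
    decrease (activations are assumed non-negative, e.g. outputs of a
    previous ReLU), so we stop as soon as it drops to zero or below.
    """
    order = np.argsort(weights)[::-1]      # positive weights first
    w, a = weights[order], activations[order]

    acc = 0.0
    for i in range(len(w)):
        if w[i] < 0 and acc <= 0.0:
            return 0.0                     # the ReLU output is already decided
        acc += w[i] * a[i]
    return max(acc, 0.0)

# Tiny usage example with hypothetical values
w = np.array([0.5, -1.2, 0.3, -0.8])
x = np.array([0.1, 2.0, 0.4, 1.5])         # non-negative activations
print(relu_dot_early_exit(w, x))            # same value as max(np.dot(w, x), 0)
```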
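
The UCNN entry above is about reusing computation across repeated weight values. The sketch below shows only the arithmetic idea (a factorized dot product), not the paper's dataflow or hardware: with heavily quantized filters, many weights share the same value, so activations can be summed per unique weight value and each unique value multiplied only once. The function name and the example filter are assumptions for illustration.

```python
import numpy as np

def factorized_dot(weights, activations):
    """Dot product that reuses computation across repeated weight values.

    With heavily quantized weights many values repeat, so instead of one
    multiply per element we first add up the activations that share each
    unique weight value, then do a single multiply per unique value.
    """
    acc = 0.0
    for w in np.unique(weights):            # one multiply per unique weight value
        acc += w * activations[weights == w].sum()
    return acc

# Hypothetical quantized filter: only 3 unique weight values for 8 inputs
w = np.array([0.5, -0.25, 0.5, 0.0, -0.25, 0.5, 0.0, -0.25])
x = np.arange(8, dtype=float)
print(factorized_dot(w, x), np.dot(w, x))   # both give the same result
```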
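
For the outlier-aware accelerator noted above, the following is a rough software sketch of the data split it relies on, under the assumption of 4-bit quantization for the dense part (the exact precisions and mechanism in the paper may differ): most weights are quantized to low precision, while the few largest-magnitude weights (roughly 3%) are kept at full precision in a sparse list and added back as corrections.

```python
import numpy as np

def split_outliers(weights, outlier_frac=0.03):
    """Split weights into a dense low-precision part plus sparse high-precision outliers.

    The largest-magnitude ~3% of weights are kept at full precision as
    (index, value) pairs; the remaining weights are quantized to 4 bits.
    """
    k = max(1, int(len(weights) * outlier_frac))
    out_idx = np.argsort(np.abs(weights))[-k:]      # indices of the outliers
    dense = weights.copy()
    dense[out_idx] = 0.0                            # outliers are handled separately

    scale = np.abs(dense).max() / 7.0               # signed 4-bit range is [-7, 7]
    dense_q = np.round(dense / scale).astype(np.int8)
    return dense_q, scale, out_idx, weights[out_idx]

def outlier_aware_dot(dense_q, scale, out_idx, out_val, activations):
    # Dense low-precision multiply-accumulate plus a few high-precision corrections.
    dense_part = float(dense_q.astype(np.float32) @ activations) * scale
    outlier_part = float(out_val @ activations[out_idx])
    return dense_part + outlier_part

# Hypothetical usage: the result is close to the full-precision dot product.
rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)
print(outlier_aware_dot(*split_outliers(w), x), float(w @ x))
```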