# P76111246 陳靖雯 AI Chip System Capstone Project Planning
[TOC]
## Preface
::: info
**TODO - Briefly describe your motivation for taking this capstone course**
:::
Understand the design and integration details of AI chips across both software and hardware.
## Team members
::: info
**TODO - Please list your project team members with the following format**
:::
| Name | School ID | Email |
| ---- | --------- | ----- |
| 陳靖雯 | P76111246 | wartytw@gmail.com |
## Project Description and Scope
:::info
**TODO - Describe your project direction and the details you currently understand**
1. Define the problem you want to solve. Why is this problem important?
2. In what direction do you plan to do research to solve this problem?
:::
1. Decide how a model's data and parameters are scheduled and mapped, in order to raise computation speed and lower energy consumption. These two goals address today's massive datasets (e.g., medical MRI images) and edge-computing devices (which demand low power).
2. As a first step, use existing tools to evaluate a model's energy consumption and computation speed: start with the open-source MAESTRO tool recommended by the advisor for a preliminary analysis, then study and modify the tool so that it fits the chip this project plans to build.
:::info
Below are suggestions and modifications from Prof. 偉棻
:::
### How to describe a map file? Do we need to redefine the map file format?
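As a starting point for this question, the sketch below lists the information a map file typically has to capture for one CONV2D/GEMM layer: which loop dimensions are tiled, the tile sizes, whether each tiled loop is unrolled spatially across PEs or iterated temporally, and the loop order. The field names and the `example_mapping` values are illustrative assumptions, not MAESTRO's actual syntax; MAESTRO's own `.m` files express similar ideas with `SpatialMap`/`TemporalMap`/`Cluster` directives.

```python
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class DimMapping:
    """How one loop dimension (e.g. K, C, R, S, X, Y or M, N, K) is handled."""
    dim: str                              # dimension name in the layer's loop nest
    tile_size: int                        # iterations one PE/cluster gets at a time
    kind: Literal["spatial", "temporal"]  # unrolled across PEs vs. iterated in time

@dataclass
class LayerMapping:
    """Minimal, illustrative description of one layer's mapping (not MAESTRO syntax)."""
    layer_name: str
    layer_type: Literal["CONV2D", "GEMM"]
    loop_order: List[DimMapping]          # outermost first; the order is itself part of the mapping
    cluster_size: int                     # number of PEs grouped into one cluster

# A hypothetical weight-stationary-style mapping for a GEMM layer (M, N, K).
example_mapping = LayerMapping(
    layer_name="fc1",
    layer_type="GEMM",
    loop_order=[
        DimMapping("K", tile_size=64, kind="temporal"),
        DimMapping("M", tile_size=1,  kind="spatial"),
        DimMapping("N", tile_size=8,  kind="temporal"),
    ],
    cluster_size=64,
)
```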
### Cutlass code review - helps you understand the possible mapping schemes and optimizations
- Try to run Cutlass on a GPU
- [Efficient GEMM in CUDA](https://github.com/NVIDIA/cutlass/blob/master/media/docs/efficient_gemm.md) (a loop-tiling sketch follows this list)
- [Cutlass GitHub](https://github.com/NVIDIA/cutlass) code review cluster - [name=玠志] [name=光祥] [name=靖雯]
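To make the "Efficient GEMM in CUDA" write-up concrete before reading the Cutlass sources, the sketch below reproduces only the loop structure it describes: the output C is partitioned into tiles (in Cutlass these become threadblock tiles), and each tile accumulates partial products over K-slices of A and B. This is a plain Python/NumPy model of the tiling idea, assuming tile sizes `BM`, `BN`, `BK` that evenly divide the problem; it says nothing about warps, shared memory, or actual GPU execution.

```python
import numpy as np

def tiled_gemm(A, B, BM=64, BN=64, BK=32):
    """C = A @ B computed tile by tile, mirroring the threadblock-tile loop
    structure described in Cutlass's 'Efficient GEMM in CUDA' document."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % BM == 0 and N % BN == 0 and K % BK == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for m0 in range(0, M, BM):          # one (m0, n0) pair ~ one threadblock tile of C
        for n0 in range(0, N, BN):
            acc = np.zeros((BM, BN), dtype=A.dtype)
            for k0 in range(0, K, BK):  # "mainloop": march over K in BK-sized slices
                a_tile = A[m0:m0 + BM, k0:k0 + BK]   # tile staged from global memory
                b_tile = B[k0:k0 + BK, n0:n0 + BN]
                acc += a_tile @ b_tile               # per-tile multiply-accumulate
            C[m0:m0 + BM, n0:n0 + BN] = acc          # epilogue: write the finished tile
    return C

# Quick check against NumPy's reference result.
A = np.random.rand(128, 96).astype(np.float32)
B = np.random.rand(96, 128).astype(np.float32)
assert np.allclose(tiled_gemm(A, B), A @ B, atol=1e-4)
```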
### Survey and propose how to enumerate the design space for exploration
- input: a model layer (CONV2D/GEMM)
- output: many map files with different work partitions (an enumeration sketch follows this list)
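One straightforward way to enumerate this space is a Cartesian product over candidate tile sizes for each loop dimension, filtering out combinations that do not divide the layer's bounds; each surviving combination would then be rendered into one map file. The sketch below only counts and prints the candidate partitions for a GEMM layer; the dimension names and candidate tile lists are assumptions for illustration, and rendering into MAESTRO's actual `.m` syntax is left out.

```python
from itertools import product

# Hypothetical GEMM layer bounds (M, N, K) and candidate tile sizes per dimension.
layer = {"M": 256, "N": 512, "K": 1024}
candidate_tiles = {
    "M": [16, 32, 64],
    "N": [16, 32, 64, 128],
    "K": [32, 64, 128],
}

def enumerate_partitions(layer, candidate_tiles):
    """Yield every tile-size assignment whose tiles evenly divide the layer bounds."""
    dims = list(layer)
    for tiles in product(*(candidate_tiles[d] for d in dims)):
        assignment = dict(zip(dims, tiles))
        if all(layer[d] % assignment[d] == 0 for d in dims):
            yield assignment

partitions = list(enumerate_partitions(layer, candidate_tiles))
print(f"{len(partitions)} candidate work partitions")
for p in partitions[:3]:                 # each of these would become one map file
    print(p)
```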
## Project Planning
::: info
**TODO - Please do a literature search to figure out the details of your target project**
This plan will keep being updated as your project makes progress
:::
1. Fully understand how MAESTRO (the evaluation tool) executes, to the point where I can explain where every output of the run below comes from (a hedged invocation sketch follows this list).
> (Note: --Mapping_file='data/mapping/example.m')

2.
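To ground item 1 above, the sketch below drives MAESTRO from Python and prints back whatever report it produces. The `--Mapping_file` flag comes from the note above; the other flags and the `./maestro` binary and hardware-file paths are assumptions based on MAESTRO's example run script, so they should be checked against the locally built version before relying on them.

```python
import subprocess
from pathlib import Path

# Paths below are assumptions based on MAESTRO's repository layout; adjust to the local checkout.
MAESTRO_BIN = "./maestro"
HW_FILE = "data/hw/accelerator_1.m"
MAPPING_FILE = "data/mapping/example.m"        # the mapping mentioned in the note above

def run_maestro(mapping_file: str) -> None:
    """Run one MAESTRO evaluation for the given mapping file."""
    subprocess.run(
        [
            MAESTRO_BIN,
            f"--HW_file={HW_FILE}",
            f"--Mapping_file={mapping_file}",
            # Flags besides --Mapping_file follow MAESTRO's example run script; verify locally.
            "--print_res=true",
            "--print_res_csv_file=true",
        ],
        check=True,
    )

run_maestro(MAPPING_FILE)
# MAESTRO's CSV report lands in the working directory (name depends on the mapping file);
# print the beginning of whatever it produced.
for csv_path in Path(".").glob("*.csv"):
    print(csv_path, csv_path.read_text()[:200])
```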
### To-Do List
::: info
**TODO - Please list at least the items you plan to do for the coming week. It's better to include all items that you can think of in your project**
:::
> Version 1
> 1. Step 1: use existing tools to evaluate a model's energy consumption and computation speed. Start with the open-source MAESTRO tool recommended by the advisor for a preliminary analysis, then study and modify the tool so that it fits the chip this project plans to build.
> 2. Step 2: swap out hardware components. In MAESTRO the order is PE -> NoC -> Memory, so I first need a deep understanding of basic PE design.
> 3. Step 3: further understand how the NoC (network on chip) works, in order to do further data scheduling and planning.
> 4. Step 4: gain a better understanding of SRAM and cache design and how they relate to each other (including energy consumption, placement, and transfer-rate issues).
> 5. Step 5: after working through the hardware flow, deepen my understanding of parallel computing for partition & scheduling. The preliminary plan is to study three parts: (1) memory management, (2) parallel programming (including CUDA programming), and (3) building on (2), studying the driver source code that NVIDIA has released.
> 6. Step 6: design a preliminary scheduling and mapping algorithm, and run experiments with a simple model on a physical FPGA to validate the theory.
>
> Preliminary summary: my overall research and implementation direction is still quite incomplete. Because of this, I think the first thing to do is to review all of the AIAS assignments at least once, especially lab8, lab9, and [MCVP Tutorial Lab 2 - Modeling a device in QEMU](https://osp.computing.ncku.edu.tw:3001/abQe0WC5TEGxUK5twsk06A#), so that I can make a complete plan for the topic (allocation & mapping). There is also Prof. jserv's [「Linux 核心設計」("Linux Kernel Design") lecture series](https://beta.hackfoldr.org/linux/); its material on memory and scheduling is not all directly related, but I lack this background knowledge, and that could affect my full understanding of the design.
> Version 2
> 1. Run [Cutlass](https://github.com/NVIDIA/cutlass) and write a CUDA kernel to compare the efficiency.
> 2. Understand the structure of Cutlass.
> 3. Understand the mechanism behind the mapping file in MAESTRO.
> 4. Research the optimization algorithms behind Cutlass (GEMM and CONV); a sketch of the conv-as-GEMM (im2col) lowering follows this list.
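For Version 2 item 4: a key optimization idea shared by the GEMM and CONV paths is lowering convolution to matrix multiplication (Cutlass does this implicitly, without materializing the lowered matrix). The sketch below shows the explicit im2col form of that lowering for a stride-1, no-padding CONV2D, purely to make the CONV-to-GEMM relationship concrete; the tensor layout and dimension names are assumptions for illustration.

```python
import numpy as np

def conv2d_as_gemm(x, w):
    """Stride-1, no-padding CONV2D lowered to a single GEMM via explicit im2col.
    x: (C, H, W) input, w: (K, C, R, S) filters -> (K, H-R+1, W-S+1) output."""
    C, H, W = x.shape
    K, C2, R, S = w.shape
    assert C == C2
    OH, OW = H - R + 1, W - S + 1
    # im2col: each output position becomes one column of length C*R*S.
    cols = np.empty((C * R * S, OH * OW), dtype=x.dtype)
    for oh in range(OH):
        for ow in range(OW):
            patch = x[:, oh:oh + R, ow:ow + S]       # receptive field of this output pixel
            cols[:, oh * OW + ow] = patch.reshape(-1)
    # The convolution is now one GEMM: (K, C*R*S) x (C*R*S, OH*OW).
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, OH, OW)

# Check against a direct (naive) convolution.
x = np.random.rand(3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
ref = np.zeros((4, 6, 6), dtype=np.float32)
for k in range(4):
    for oh in range(6):
        for ow in range(6):
            ref[k, oh, ow] = np.sum(x[:, oh:oh + 3, ow:ow + 3] * w[k])
assert np.allclose(conv2d_as_gemm(x, w), ref, atol=1e-4)
```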
## Reference
- [MAESTRO Tutorial - MICRO 2020](https://maestro.ece.gatech.edu/docs/build/html/tutorials/micro2020.html)
- [MAESTRO Website](https://maestro.ece.gatech.edu/)
- [並行程式設計 (Concurrent Programming)](https://hackmd.io/@sysprog/concurrency/https%3A%2F%2Fhackmd.io%2F%40sysprog%2FS1AMIFt0D)
- [Mapping Alexnet to a Target Hardware](https://aionchip.computing.ncku.edu.tw:3001/6s7NSAa2SieopGLMIUyY7Q?view#)
- [Optimizing Data Transfer Using Lossless Compression with NVIDIA nvcomp](https://developer.nvidia.com/blog/optimizing-data-transfer-using-lossless-compression-with-nvcomp/)