# P76111246 陳靖雯 AI Chip System Capstone Project Planning

[TOC]

## Preface
::: info
**TODO - Briefly describe your motivation for taking the capstone course**
:::

To understand the details of hardware/software design and integration in AI chips.

## Team members
::: info
**TODO - Please list your project team members with the following format**
:::

| Name | School ID | Email |
| ---- | --------- | ----- |
|      |           |       |
|      |           |       |
| 陳靖雯 | P76111246 | wartytw@gmail.com |

## Project Description and Scope
:::info
**TODO - Describe your project direction and the details you currently understand**
1. Define the problem you want to solve. Why is this problem important?
2. In what direction do you want to do research to solve this problem?
:::

1. Decide how a model's data and parameters are scheduled and mapped so as to increase computation speed and reduce energy consumption. These two goals are motivated by today's very large datasets (e.g. medical MRI images) and by edge-computing devices, which require low power consumption.
2. First, use existing tools to evaluate the model's energy consumption and computation speed: start with a preliminary analysis using MAESTRO, the open-source tool recommended by the professor, then analyze and modify the tool so that it meets the needs of the chip this project plans to build.

:::info
Below are suggestions and modifications from Prof. 偉棻
:::

### How to describe a map file? Do we need to redefine the map file format?

### Cutlass code review
- Helps you understand the possible mapping schemes and optimizations
- Try to run Cutlass on a GPU
- [Efficient GEMM in CUDA](https://github.com/NVIDIA/cutlass/blob/master/media/docs/efficient_gemm.md)
- [Cutlass GitHub](https://github.com/NVIDIA/cutlass)

Code review cluster
- [name=玠志] [name=光祥] [name=靖雯]

### Survey and propose how to enumerate the design space for exploration
- input: a model layer (CONV2D/GEMM)
- output: many map files with different work partitions

## Project Planning
::: info
**TODO - Please do a literature search to figure out the details of your target project**
This planning will keep being updated as your project makes more progress
:::

1. Fully understand how MAESTRO (the evaluation tool) executes, to the extent that I can explain where each output of the following program comes from.
    > (Note: --Mapping_file='data/mapping/example.m')

    ![](https://osp.computing.ncku.edu.tw:3001/uploads/upload_240db3b493ebad24ed43181249428c00.png)
2.

### To-Do List
::: info
**TODO - Please list at least the items you plan to do for the coming week. It's better to include all items that you can think of in your project**
:::

> Version 1
> 1. Step 1: Use existing tools to evaluate the model's energy consumption and computation speed. Start with a preliminary analysis using MAESTRO, the open-source tool recommended by the professor, then analyze and modify the tool so that it meets the needs of the chip this project plans to build.
> 2. Step 2: Swap out the hardware stages one at a time. Following MAESTRO, the order is PE -> NoC -> Memory, so I first need a deep understanding of basic PE design.
> 3. Step 3: Gain a deeper understanding of how the NoC (network on chip) works, in order to do further data scheduling and planning.
> 4. Step 4: Gain a better understanding of SRAM and cache design and the relationship between them (including energy consumption, placement, and transfer rate).
> 5. Step 5: After studying the hardware part of the flow, partition & scheduling requires a deeper knowledge of parallel computing. The preliminary plan is to first study three areas: (1) memory management; (2) parallel programming (including CUDA programming); (3) building on (2), the driver source code released by NVIDIA.
> 6. Step 6: Design a preliminary scheduling and mapping algorithm, and run experiments with a simple model on a physical FPGA to validate the theory.
>
> Preliminary summary: My plan for the overall research and implementation direction is still far from complete. I therefore think the first thing to do is to review at least all of the AIAS assignments, especially lab8, lab9, and [MCVP Tutorial Lab 2 - Modeling a device in QEMU](https://osp.computing.ncku.edu.tw:3001/abQe0WC5TEGxUK5twsk06A#), so that I can plan the topic (allocation & mapping) thoroughly. I should also go through the memory and scheduling material in Prof. jserv's [「Linux 核心設計」系列講座 (Linux Kernel Design lecture series)](https://beta.hackfoldr.org/linux/); some of it is not directly related, but I lack this background knowledge, which may limit my understanding of the design.

> Version 2
> 1. Run [Cutlass](https://github.com/NVIDIA/cutlass) and write CUDA code to compare the efficiency.
> 2. Understand the structure of Cutlass.
> 3. Understand the mechanism behind the mapping file in MAESTRO.
> 4. Research the optimization algorithms behind Cutlass (GEMM and CONV).

## Reference
- [MAESTRO Tutorial - MICRO 2020](https://maestro.ece.gatech.edu/docs/build/html/tutorials/micro2020.html)
- [MAESTRO Website](https://maestro.ece.gatech.edu/)
- [並行程式設計 (Concurrent Programming)](https://hackmd.io/@sysprog/concurrency/https%3A%2F%2Fhackmd.io%2F%40sysprog%2FS1AMIFt0D)
- [Mapping Alexnet to a Target Hardware](https://aionchip.computing.ncku.edu.tw:3001/6s7NSAa2SieopGLMIUyY7Q?view#)
- [Optimizing Data Transfer Using Lossless Compression with NVIDIA nvcomp](https://developer.nvidia.com/blog/optimizing-data-transfer-using-lossless-compression-with-nvcomp/)
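Returning to the design-space enumeration item in the professor's suggestions (input: one CONV2D/GEMM layer, output: many map files with different work partitions), the following is a minimal Python sketch of what such an enumerator might look like for a GEMM layer. Everything in it is an assumption made for illustration: the function names, the candidate fields (`tm`, `tn`, `tk`, `spatial_dim`), the buffer-capacity filter, the balanced-work rule, and the plain-text rendering are all hypothetical and are not MAESTRO's mapping file format or anything taken from Cutlass.

```python
from itertools import product


def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def enumerate_gemm_partitions(M, N, K, num_pes, buffer_words):
    """Enumerate candidate work partitions for C[M,N] = A[M,K] @ B[K,N].

    Each candidate is a dict holding tile sizes (tm, tn, tk) and the
    dimension ("M" or "N") that is split spatially across the PEs.
    All field names are hypothetical, chosen only for this sketch.
    """
    candidates = []
    for tm, tn, tk in product(divisors(M), divisors(N), divisors(K)):
        # Words needed to hold one A tile, one B tile and one C tile.
        footprint = tm * tk + tk * tn + tm * tn
        if footprint > buffer_words:
            continue  # tile set does not fit in the assumed local buffer
        for spatial_dim, extent, tile in (("M", M, tm), ("N", N, tn)):
            num_tiles = extent // tile
            # Assumed rule: keep only partitions where every PE gets
            # the same number of tiles along the spatial dimension.
            if num_tiles % num_pes == 0:
                candidates.append({
                    "tm": tm,
                    "tn": tn,
                    "tk": tk,
                    "spatial_dim": spatial_dim,
                    "footprint_words": footprint,
                })
    return candidates


def render_candidate(layer_name, c):
    """Render one candidate as text (a made-up format, not MAESTRO's)."""
    return (
        f"layer: {layer_name}\n"
        f"tile: M={c['tm']} N={c['tn']} K={c['tk']}\n"
        f"spatial: {c['spatial_dim']}\n"
    )


if __name__ == "__main__":
    # Tiny example layer so the full candidate list stays readable.
    cands = enumerate_gemm_partitions(4, 4, 2, num_pes=2, buffer_words=12)
    print(len(cands), "candidates")
    print(render_candidate("gemm_example", cands[0]))
```

Each rendered candidate could then be written to its own file and passed to the evaluation tool once the real map file format is settled. Exhaustive enumeration over divisors is only practical for small layers; a real explorer would likely need pruning or sampling.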