# P76111246 陳靖雯 AI Chip System Capstone Project Planning
[TOC]
## Preface
::: info
**TODO - Briefly describe your motivation for taking this capstone course**
:::
Understand the design and integration details of AI chips across both software and hardware.
## Team members
::: info
**TODO - Please list your project team members with the following format**
:::
| Name | School ID | Email |
| ---- | --------- | ----- |
| 陳靖雯 | P76111246 | wartytw@gmail.com |
## Project Description and Scope
:::info
**TODO - Describe your project direction and the details you currently understand**
1. Define the problem you want to solve. Why is this problem important?
2. In what direction do you plan to do research to solve this problem?
:::
1. Decide how a model's data and parameters are scheduled and mapped, in order to raise computation speed and lower energy consumption. These two goals address today's massive datasets (e.g., medical MRI images) and edge-computing devices (which demand low power).
2. As a first step, use existing tools to evaluate a model's energy consumption and computation speed: start with the open-source MAESTRO tool recommended by the advisor for a preliminary analysis, then study and modify the tool so that it fits the chip this project plans to build.
:::info
Below are suggestions and modifications from Prof. 偉棻
:::
### How to describe a map file? Do we need to redefine the map file format?
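As a starting point for this question, the sketch below lists the information a map file typically has to capture for one CONV2D/GEMM layer: which loop dimensions are tiled, the tile sizes, whether each tiled loop is unrolled spatially across PEs or iterated temporally, and the loop order. The field names and the `example_mapping` values are illustrative assumptions, not MAESTRO's actual syntax; MAESTRO's own `.m` files express similar ideas with `SpatialMap`/`TemporalMap`/`Cluster` directives.

```python
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class DimMapping:
    """How one loop dimension (e.g. K, C, R, S, X, Y or M, N, K) is handled."""
    dim: str                              # dimension name in the layer's loop nest
    tile_size: int                        # iterations one PE/cluster gets at a time
    kind: Literal["spatial", "temporal"]  # unrolled across PEs vs. iterated in time

@dataclass
class LayerMapping:
    """Minimal, illustrative description of one layer's mapping (not MAESTRO syntax)."""
    layer_name: str
    layer_type: Literal["CONV2D", "GEMM"]
    loop_order: List[DimMapping]          # outermost first; the order is itself part of the mapping
    cluster_size: int                     # number of PEs grouped into one cluster

# A hypothetical weight-stationary-style mapping for a GEMM layer (M, N, K).
example_mapping = LayerMapping(
    layer_name="fc1",
    layer_type="GEMM",
    loop_order=[
        DimMapping("K", tile_size=64, kind="temporal"),
        DimMapping("M", tile_size=1,  kind="spatial"),
        DimMapping("N", tile_size=8,  kind="temporal"),
    ],
    cluster_size=64,
)
```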
### Cutlass code review - helps you understand the possible mapping schemes and optimizations
- Try to run Cutlass on a GPU
- [Efficient GEMM in CUDA](https://github.com/NVIDIA/cutlass/blob/master/media/docs/efficient_gemm.md) (a loop-tiling sketch follows this list)
- [Cutlass GitHub](https://github.com/NVIDIA/cutlass) code review cluster - [name=玠志] [name=光祥] [name=靖雯]
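To make the "Efficient GEMM in CUDA" write-up concrete before reading the Cutlass sources, the sketch below reproduces only the loop structure it describes: the output C is partitioned into tiles (in Cutlass these become threadblock tiles), and each tile accumulates partial products over K-slices of A and B. This is a plain Python/NumPy model of the tiling idea, assuming tile sizes `BM`, `BN`, `BK` that evenly divide the problem; it says nothing about warps, shared memory, or actual GPU execution.

```python
import numpy as np

def tiled_gemm(A, B, BM=64, BN=64, BK=32):
    """C = A @ B computed tile by tile, mirroring the threadblock-tile loop
    structure described in Cutlass's 'Efficient GEMM in CUDA' document."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % BM == 0 and N % BN == 0 and K % BK == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for m0 in range(0, M, BM):          # one (m0, n0) pair ~ one threadblock tile of C
        for n0 in range(0, N, BN):
            acc = np.zeros((BM, BN), dtype=A.dtype)
            for k0 in range(0, K, BK):  # "mainloop": march over K in BK-sized slices
                a_tile = A[m0:m0 + BM, k0:k0 + BK]   # tile staged from global memory
                b_tile = B[k0:k0 + BK, n0:n0 + BN]
                acc += a_tile @ b_tile               # per-tile multiply-accumulate
            C[m0:m0 + BM, n0:n0 + BN] = acc          # epilogue: write the finished tile
    return C

# Quick check against NumPy's reference result.
A = np.random.rand(128, 96).astype(np.float32)
B = np.random.rand(96, 128).astype(np.float32)
assert np.allclose(tiled_gemm(A, B), A @ B, atol=1e-4)
```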
### Survey and propose how to enumerate the design space for exploration
- input: a model layer (CONV2D/GEMM)
- output: many map files with different work partitions (an enumeration sketch follows this list)
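One straightforward way to enumerate this space is a Cartesian product over candidate tile sizes for each loop dimension, filtering out combinations that do not divide the layer's bounds; each surviving combination would then be rendered into one map file. The sketch below only counts and prints the candidate partitions for a GEMM layer; the dimension names and candidate tile lists are assumptions for illustration, and rendering into MAESTRO's actual `.m` syntax is left out.

```python
from itertools import product

# Hypothetical GEMM layer bounds (M, N, K) and candidate tile sizes per dimension.
layer = {"M": 256, "N": 512, "K": 1024}
candidate_tiles = {
    "M": [16, 32, 64],
    "N": [16, 32, 64, 128],
    "K": [32, 64, 128],
}

def enumerate_partitions(layer, candidate_tiles):
    """Yield every tile-size assignment whose tiles evenly divide the layer bounds."""
    dims = list(layer)
    for tiles in product(*(candidate_tiles[d] for d in dims)):
        assignment = dict(zip(dims, tiles))
        if all(layer[d] % assignment[d] == 0 for d in dims):
            yield assignment

partitions = list(enumerate_partitions(layer, candidate_tiles))
print(f"{len(partitions)} candidate work partitions")
for p in partitions[:3]:                 # each of these would become one map file
    print(p)
```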
## Project Planning
::: info
**TODO - Please do a literature search to figure out the details of your target project**
This plan will keep being updated as your project makes progress
:::
1. Fully understand how MAESTRO (the evaluation tool) executes, to the point where I can explain where every output of the run below comes from (a hedged invocation sketch follows this list).
> (Note: --Mapping_file='data/mapping/example.m')

2.
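To ground item 1 above, the sketch below drives MAESTRO from Python and prints back whatever report it produces. The `--Mapping_file` flag comes from the note above; the other flags and the `./maestro` binary and hardware-file paths are assumptions based on MAESTRO's example run script, so they should be checked against the locally built version before relying on them.

```python
import subprocess
from pathlib import Path

# Paths below are assumptions based on MAESTRO's repository layout; adjust to the local checkout.
MAESTRO_BIN = "./maestro"
HW_FILE = "data/hw/accelerator_1.m"
MAPPING_FILE = "data/mapping/example.m"        # the mapping mentioned in the note above

def run_maestro(mapping_file: str) -> None:
    """Run one MAESTRO evaluation for the given mapping file."""
    subprocess.run(
        [
            MAESTRO_BIN,
            f"--HW_file={HW_FILE}",
            f"--Mapping_file={mapping_file}",
            # Flags besides --Mapping_file follow MAESTRO's example run script; verify locally.
            "--print_res=true",
            "--print_res_csv_file=true",
        ],
        check=True,
    )

run_maestro(MAPPING_FILE)
# MAESTRO's CSV report lands in the working directory (name depends on the mapping file);
# print the beginning of whatever it produced.
for csv_path in Path(".").glob("*.csv"):
    print(csv_path, csv_path.read_text()[:200])
```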
### To-Do List
::: info
**TODO - Please list at least the items you plan to do for the coming week. It's better to include all items that you can think of in your project**
:::
> Version 1
> 1. Step 1: use existing tools to evaluate a model's energy consumption and computation speed. Start with the open-source MAESTRO tool recommended by the advisor for a preliminary analysis, then study and modify the tool so that it fits the chip this project plans to build.
> 2. Step 2: swap out hardware components. In MAESTRO the order is PE -> NoC -> Memory, so I first need a deep understanding of basic PE design.
> 3. Step 3: further understand how the NoC (network on chip) works, in order to do further data scheduling and planning.
> 4. Step 4: gain a better understanding of SRAM and cache design and how they relate to each other (including energy consumption, placement, and transfer-rate issues).
> 5. Step 5: after working through the hardware flow, deepen my understanding of parallel computing for partition & scheduling. The preliminary plan is to study three parts: (1) memory management, (2) parallel programming (including CUDA programming), and (3) building on (2), studying the driver source code that NVIDIA has released.
> 6. Step 6: design a preliminary scheduling and mapping algorithm, and run experiments with a simple model on a physical FPGA to validate the theory.
>
> Preliminary summary: my overall research and implementation direction is still quite incomplete. Because of this, I think the first thing to do is to review all of the AIAS assignments at least once, especially lab8, lab9, and [MCVP Tutorial Lab 2 - Modeling a device in QEMU](https://osp.computing.ncku.edu.tw:3001/abQe0WC5TEGxUK5twsk06A#), so that I can make a complete plan for the topic (allocation & mapping). There is also Prof. jserv's [「Linux 核心設計」("Linux Kernel Design") lecture series](https://beta.hackfoldr.org/linux/); its material on memory and scheduling is not all directly related, but I lack this background knowledge, and that could affect my full understanding of the design.
> Version 2
> 1. Run [Cutlass](https://github.com/NVIDIA/cutlass) and write a CUDA kernel to compare the efficiency.
> 2. Understand the structure of Cutlass.
> 3. Understand the mechanism behind the mapping file in MAESTRO.
> 4. Research the optimization algorithms behind Cutlass (GEMM and CONV); a sketch of the conv-as-GEMM (im2col) lowering follows this list.
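For Version 2 item 4: a key optimization idea shared by the GEMM and CONV paths is lowering convolution to matrix multiplication (Cutlass does this implicitly, without materializing the lowered matrix). The sketch below shows the explicit im2col form of that lowering for a stride-1, no-padding CONV2D, purely to make the CONV-to-GEMM relationship concrete; the tensor layout and dimension names are assumptions for illustration.

```python
import numpy as np

def conv2d_as_gemm(x, w):
    """Stride-1, no-padding CONV2D lowered to a single GEMM via explicit im2col.
    x: (C, H, W) input, w: (K, C, R, S) filters -> (K, H-R+1, W-S+1) output."""
    C, H, W = x.shape
    K, C2, R, S = w.shape
    assert C == C2
    OH, OW = H - R + 1, W - S + 1
    # im2col: each output position becomes one column of length C*R*S.
    cols = np.empty((C * R * S, OH * OW), dtype=x.dtype)
    for oh in range(OH):
        for ow in range(OW):
            patch = x[:, oh:oh + R, ow:ow + S]       # receptive field of this output pixel
            cols[:, oh * OW + ow] = patch.reshape(-1)
    # The convolution is now one GEMM: (K, C*R*S) x (C*R*S, OH*OW).
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, OH, OW)

# Check against a direct (naive) convolution.
x = np.random.rand(3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
ref = np.zeros((4, 6, 6), dtype=np.float32)
for k in range(4):
    for oh in range(6):
        for ow in range(6):
            ref[k, oh, ow] = np.sum(x[:, oh:oh + 3, ow:ow + 3] * w[k])
assert np.allclose(conv2d_as_gemm(x, w), ref, atol=1e-4)
```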
## Reference
- [MAESTRO Tutorial - MICRO 2020](https://maestro.ece.gatech.edu/docs/build/html/tutorials/micro2020.html)
- [MAESTRO Website](https://maestro.ece.gatech.edu/)
- [並行程式設計 (Concurrent Programming)](https://hackmd.io/@sysprog/concurrency/https%3A%2F%2Fhackmd.io%2F%40sysprog%2FS1AMIFt0D)
- [Mapping Alexnet to a Target Hardware](https://aionchip.computing.ncku.edu.tw:3001/6s7NSAa2SieopGLMIUyY7Q?view#)
- [Optimizing Data Transfer Using Lossless Compression with NVIDIA nvcomp](https://developer.nvidia.com/blog/optimizing-data-transfer-using-lossless-compression-with-nvcomp/)