# [2021-10-22] Mr. Peter Chang, Technical Marketing Manager, Skymizer Taiwan, Inc., "The anchor while both hardware and software are changing in AI area"

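“It is the golden age of computer architecture.”

Demand for AI keeps growing, and the supply side seems increasingly unable to keep up with the huge model architectures and computing time that AI requires. Skymizer saw a need here and presented its middleware solutions for speeding up workloads and lowering cost.

In AI model training, the most time-consuming part is often porting, meaning here the transfer of data between hardware components. Up to 90% of the time is spent waiting, and the problem is getting worse now that AI inference devices are extremely small. The usual fix today is to cut as many layers as possible within a limited accuracy sacrifice. But what if the improvement were made in the system software instead, for example by pruning nodes rather than whole layers? Would that minimize the accuracy sacrifice and maximize the memory optimization?

For data movement, the speaker therefore proposed lowering the precision of the data so that up to 4 times as much data can be transferred, which addresses the biggest speed bottleneck. Lower precision can, however, cause a garbage-in/garbage-out situation in the model, so a customized approach needs the hardware and software sides to be designed together: the software side specifies its accuracy tolerance, while the hardware side iterates by trial and error.

In practice, for the compiler optimization of YOLOv5, the question is how to make data movement smoother across heterogeneous components. The speaker proposed splitting data into separate transfer streams, holding data at relay stations along the way, and keeping important weights in the local cache, all to reduce the time data spends pending on the bus.

How to reduce 32-bit floating-point data to 8 bits is also within the company's scope. The usual naive approach is to cut bits directly, but to reduce the loss of information the company designed a method that keeps the important bits, which speeds up transfer with minimal information loss. For memory saving, it also proposed a layer-wise memory split that separates out the unshared layer weights.

Finally, the company proposed an alternative way to save cost: making use of lower-grade SRAM (parts with poorer yield). The compiler places the important operators of the model in high-yield SRAM and the unimportant layers in low-yield SRAM. Because an AI model can tolerate errors that are not too large, this approach maintains accuracy (probability) while lowering hardware cost. It is called SRAM fault mitigation.

## Note

### This note is a summary of the speaker's talk with a few of my own opinions. The citation is given below.

## Citation

### Topic: The anchor while both hardware and software are changing in AI area

### Speaker: Mr. Peter Chang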
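
## Example: reducing FP32 data to 8 bits

The precision-reduction idea in the summary above can be illustrated with a small sketch. This is not the speaker's method (the talk describes keeping the "important bits", and its details were not given); it is ordinary symmetric per-tensor quantization, swapped in only to show why going from 32 bits to 8 bits lets 4 times as many values fit in the same transfer, and how the accuracy cost can be measured. The function names and the sample weights are made up for illustration.

```python
import struct


def quantize_int8(values):
    """Symmetric per-tensor quantization (an assumed, generic scheme):
    map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for the demonstration.
weights = [0.5, -1.0, 0.25, 0.0, 0.75, -0.125]
q, scale = quantize_int8(weights)

# Serialize both versions to compare how many bytes cross the bus.
fp32_bytes = struct.pack(f"<{len(weights)}f", *weights)  # 4 bytes per value
int8_bytes = struct.pack(f"<{len(q)}b", *q)              # 1 byte per value

restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(len(fp32_bytes), len(int8_bytes))  # 24 6
print("max abs error:", max_error, "scale:", scale)
```

The worst-case rounding error of this scheme is about half of `scale`, which is the kind of number a software team could compare against its accuracy tolerance when deciding whether the reduced precision is acceptable.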