# String Matching ###### tags: `Accelerator` ###### tags: @張凱雋 @潘冠蓁 ## Time: 7/5/2021 - 7/12/2021 ### 1. Successes This Week * Build up the riscv-toolchain https://hackmd.io/@nx1bTzFpQvaKD_yxYRmBaQ/Skpx_8tSO (riscv-toolchain) * check string matching AC algorithm https://hackmd.io/XvnX70MESVGPriNpdYPmWA #### 2. QEMU user-mode installed ### 2. Progress and Problems Last Week * **AC algo** * https://github.com/BurntSushi/aho-corasick * https://github.com/morenice/ahocorasick * https://www.evanlin.com/about-aca/ * **RVV ISA** * https://github.com/riscv/rvv-intrinsic-doc &https://github.com/riscv/riscv-v-spec ### 3. Not Resolved Problems * https://github.com/plctlab/rvv-benchmark ### 4. Goals for next week * Goals for next week --- ## Time: 7/12/2021 - 7/19/2021 ### 1. Successes this Week #### string matching * Parallel Failure less AC algorithm https://hackmd.io/Z-U2VC1fT2S2_rpPzlVAQA * PFAC's optimized implementation on GPGPU - Two-Phase PFAC https://hackmd.io/_UOAvR8qSSacNJW2DEnY4w * PFAC-library source code https://github.com/pfac-lib/PFAC ### 2. Progress and Problems This Week ##### 1. cuda版本不支援,此程式更新於2012,很可能不支援現在cuda版本(pfac-lib) ![](https://i.imgur.com/pIiWcP0.png) ### 3. Not Resolved Problems ### 4. Discussion Record This week #### string matching 1. PFAC 在 GPU 上運算的部分,是否得等所有封包都抵達後才能對整個長字串進行同步分析? 能不能依當下狀況有調整空間? ex: 或許設個時間上限,超過後不用等封包全數抵達即可提早開始運作? 怎麼運作勒? 2. shared memory 以外,後面存取在 global/texture memory 內的字典樹可否依據使用頻率對其存放位置進行調整? 用深度學習去分析其存放位置 & 深度學習的架構在這邊是否可用上? 會不會在 accelerator 上的快取記憶體區還是不夠大,需要繼續優化? 3. 字串長短不一,要怎麼貼合類似 GPU 的特性可平均化其長度,方便用矩陣運算實現同步運作,怎麼切段 & 補其長度 的問題 ### 5. Goals for next week #### string matching 1. AC 的 SIMD 運作 2. 看看 paper 內有沒有字典樹太大的相關描述 -> 好像沒有XD,因為兩階段 PFAC 那篇論文主要就是在說在他們對記憶體分配有做階段切割後,記憶體面向的問題就會好很多XD --- ## Time: 7/19/2021 - 7/26/2021 ### 1. Successes this Week #### string matching * AC with SIMD https://hackmd.io/HtM93FleSNSvh0JPRI8uEA?both ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week #### string matching 1. WM, CW 算法具體細節上比不上 AC algo 的原因 -> paper 內有標記其他篇 paper 需要的話可以再去查看 2. WM, CW 算法因為都是承襲 BM 的精神,由後往前掃,那會不會因為這個特性讓 process 查找或跳轉過程 & code 變得更複雜一些? 這兩個算法各自需要的 3 個 table 之於記憶體需求 & 算法步驟相對繁雜的部分,是否真的比 AC 不實際? 3. 到底是自己的 pattern set 大還是攻擊的 pattern 長 & 大量會讓算法們遇到瓶頸? 4. goto & fail 一起轉成 next state function 後是否有一些必要的東西被省略精簡了? 5. ☆ vectorize 的部分,state table 內每個 row 所需存取的資料量不同,會不會有很多空間被閒置了勒?要怎麼解決? 想法 : 1. 每個 row 把 32 bytes 塞滿避免閒置記憶體? (但需要去另外去記各個 state 對應資料的區間,而且實際上的情況應該是 32 bytes 都可以塞滿因為情形複雜八 2. 稀疏矩陣的 map 法 3. RVV 對不同長短資料的處理方式 ![](https://i.imgur.com/1P2Mhxy.png) (☆ 才是重要的問題) ### 5. Goals for next week ## Time: 7/26/2021 - 8/2/2021 ### 1. Successes this Week IDS/IPS 系統設計 : https://hackmd.io/@nctu-cas-lab/H1FamnkyK ### 2. Progress and Problems This Week ### 3. Not Resolved Problems * vtune無法開啟 ### 4. Discussion Record This week 1. 如何將paper中的FPGA轉換成GPU或者是TPU 2. 要如何把程式設計成可以接收網路流量(flow),並且用測資測試paper中流量上限是memory-bound還是computation-bound 3. AC跟PFAC的程式輸入以及輸出比較,並且了解效能差異 ### 5. Goals for next week 1. https://github.com/efficient/gopt 2. https://www.sciencedirect.com/topics/computer-science/network-processor 3. 了解PFAC code是如何運作的 ## Time: 8/2/2021 - 8/7/2021 ### 1. Successes this Week 1. sequence ac (C++) 2. SIMD ac (RUST) 3. PFAC (C++) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week 1. 把gopt的AC跑起來,並且用軟體profile程式執行cycle,記憶體讀取次數等 2. PFAC不用GPU而是使用CPU跑 3. 看有沒有辦法把AC跑在CGRA上 ### 5. Goals for next week 1. 上述第一點完成 ## Time: 8/7/2021 - 8/13/2021 ### 1. Successes this Week 1. ![](https://i.imgur.com/AzTH2MQ.png) 2. [Verilog for string matching](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/ryc3zfMlK) 1. 只有字串比對功能,PE分布還沒有像tree一樣呈現 ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week 1. 當字典樹大於PE會面臨的問題 1. 因為是pipeline比對,所以可能有些比對到一半突然換字典樹導致結果錯誤 2. failure link連接的node在另一半字典樹上 2. 字串比對的實現 1. Baseline:sequential、pipeline比對 2. 平行比對(不用failure link) 3. Aho-Corasick:pipeline比對且有failure link ### 5. Goals for next week 1. 了解AC的branch miss ,memory ld/sw次數,並將程式拆開測量(read data,build trie and failure link,string match),使用perf測量 2. 多字串比對改成pipeline、平行(不用failure link) ## Time: 8/14/2021 - 8/20/2021 ### 1. Successes this Week 1. AC program analyze : https://hackmd.io/@nx1bTzFpQvaKD_yxYRmBaQ/SylCb3Eet 2. AC implementation for FPGA : https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/SJkoYmtlY ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week 1. Parallel FPGA * 缺點 * 會有不必要的比較 * 需要擴充input string * 優點 * 不用failure link且省時 * 相比PFAC使用較少空間 ### 5. Goals for next week 1. AC implementation for FPGA 1. Implement PFAC for FPGA 2. Implement KMP for FPGA 3. Optimize parallel * 參考DNN kernel computation slide 2. Performance of AC 1. Measuring time of each stages of AC 2. Measuring size of pattern and pkt 3. The ratio of access trie and non access trie 4. Other gopt program implement ## Time: 8/21/2021 - 8/27/2021 ### 1. Successes this Week * AC_data : https://docs.google.com/spreadsheets/d/19dQSQtQtPcJzFA_91FFm0p8ccpDhtFB91lIXLHhmhlI/edit#gid=0 * gopt : https://hackmd.io/@nx1bTzFpQvaKD_yxYRmBaQ/HkXLBi7bY * AC implementation for FPGA : [https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/BJ-zfVfZF](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/BJ-zfVfZF) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week 1. AC implementation for FPGA 1. 實作NoC放入parallel:不用擴充input string可以省多少sram 2. 怎麼把短字串map到PE上:短字串長度不一致 3. 字典樹比PE大的解決方式 1. 短字串很短但很多個 2. 短字串較長超過一列PE數量 2. 了解concurrent hash table如何運作以及實作 3. 將snort rule轉換成snort trace並使用 ## Time: 8/28/2021 - 9/3/2021 ### 1. Successes this Week 1. concurrent hash table : https://hackmd.io/@nx1bTzFpQvaKD_yxYRmBaQ/SJPMemoWK 2. AC for FPGA — 字典樹比PE大 & NoC : [https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/HyT06Pn-K](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/HyT06Pn-K) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week 1. DLRM,KSVS資料查找,以及問題發想。 2. AC expand parallelism實作的其他可能情況 1. 短字串大於PE row的解決方式:以短字串長10為例,一列有3個PE,切成3,3,1要花多久比對 2. 短字串小於PE row的解決方式:假設PE是10x10,短字串長度為3,一列塞3組短字串 3. 長字串太小:PE分成上下兩半,處理不同字串 4. pattrrn大於PE總數:把pattern切兩半 5. **短字串長度不一:以長度1,2,3,4為例,總數未超過12,但如果一列放一個會導致最後一列要多做一輪** ## Time: 9/4/2021 - 9/9/2021 ### 1. Successes this Week 1. kvs介紹以及問題: https://hackmd.io/@nx1bTzFpQvaKD_yxYRmBaQ/ry94ZCrGY 2. AC expand parallelism實作的其他可能情況: [https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/Sk7rQKQzY](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/Sk7rQKQzY) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week 1. AC expand parallelism 1. routing table紀錄weight被塞在哪裡 2. controller決定下一個是什麼 3. 盡量利用所有PE 2. lock-free tree 以及如何將hash function放在多個PE上已進行kvs database的加速 ## Time: 9/10/2021 - 9/16/2021 ### 1. Successes this Week 1. Router & Controller : [https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/B1aKW1RMt](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/B1aKW1RMt) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week 1. 做一版短字串>17要分到下一個大cycle的case做為比較 2. 跑data做完整的比較,把結果畫成圖(sequential、pipeline、expand parallel) 1. cycle數:(X:種類、Y:cycle) 2. PE使用率 3. PE變多的差異(group數不同) 3. 把多做比較的PE關掉 ## Time: 9/17/2021 - 9/23/2021 ### 1. Successes this Week 1. string preprocess & modify weight cut module : [https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/SydpMFOXF](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/SydpMFOXF) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week ## Time: 9/17/2021 - 9/30/2021 ### 1. Successes this Week 1. [https://hackmd.io/@i2u9vCl_QXOivx0252xGWQ/HkG2dafVF](https://hackmd.io/@i2u9vCl_QXOivx0252xGWQ/HkG2dafVF) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems 1. 早點結束 2. PE 資料都要大一點 3. cyle utilization圖 4. 多dataset模擬 ### 4. Discussion Record This week ### 5. Goals for next week 1. 圖 ## Time:10/1/2021 - 10/7/2021 ### 1. Successes this Week 1. [https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/HyfELesEt](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/HyfELesEt) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week ## Time:10/8/2021 - 10/14/2021 ### 1. Successes this Week 1. [https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/HyfELesEt](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/HyfELesEt) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week## Time:10/8/2021 - 10/14/2021 ## Time:10/22/2021 - 10/28/2021 ### 1. Successes this Week 1. [AutomataZoo](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/SyhQTV8LY) ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week ## Time:11/5/2021 - 11/11/2021 ### 1. Successes this Week 1. [AutomataZoo可以用的測資](https://drive.google.com/drive/folders/17RGVuzopeTYKzPj7WWk94DexY9pONTre?usp=sharing) * CRISPR * EntityResolution * Hamming * Levenshtein ### 2. Progress and Problems This Week ### 3. Not Resolved Problems ### 4. Discussion Record This week ### 5. Goals for next week