# Networking Accelerator
###### tags: `Accelerator`
###### members: @謝翔丞 @張可晴 @楊卓敏, @kksweet8845
## Topic
A novel design for a generalized network-processing accelerator that trades off reconfigurability against performance.
## Introduction
Author: `@kksweet8845`
Chip technology improves every year, but observers have started to note that Moore's law is coming to an end: the clock frequency can no longer be raised because of power constraints. Today's ASIC designs deliver high performance, but they offer little flexibility for mapping a new algorithm.
FPGAs are widely used in industry because they are reconfigurable. However, it is hard to extract their full performance: developers need to know the details of the FPGA and the corresponding hardware.
Another field has not been explored thoroughly: the CGRA (coarse-grained reconfigurable array). In this section, we discuss what a CGRA is and how one can be built. For an introduction to CGRA, please refer to [CGRA](http://scis.scichina.com/cn/2020/SSI-2020-0130.pdf).

Here are some disadvantages of today's general-purpose CPUs.
- The power per operation (power/op) is very inefficient.
    - In a five-stage pipelined CPU, only one stage actually computes a value or executes something.
    - Memory access latency is high (cache, DDR4, multi-master snoop protocol).
- The low flexibility of ASICs.
    - Although an ASIC achieves low power/op, it is too rigid to be reused for applications in other fields.
    - ASICs are usually used in mature fields such as communication and network processing.
- A reconfigurable computation model
    - A DSA (domain-specific accelerator) is power-efficient. However, programmers are constrained by the DSA: they need to understand the hardware's features and then adapt their algorithm to fit the hardware.
    - Different algorithms need different sets of resources.

- The key points of CGRA
    - Maintain high flexibility while keeping power consumption low.
    - A CGRA is configured by configuration messages: a piece of hardware receives the message and performs the routing and scheduling.
    - A CGRA falls between an ASIC and a GP-CPU.
- The difference between CGRA and FPGA
    - FPGA: long development time, requires understanding the hardware architecture, and takes a long time to synthesize.
    - CGRA:
        - A computation model of multiple mini cores.
        - A PE array in which each PE only needs to perform a simple operation, which increases energy efficiency.
- CGRA applications
    - Information security
    - Information and graph processing
    - Deep learning: CNN, RNN.
- Potential field
    - Network processing:
        - [ClickNP](): focuses on FPGA (programmable hardware)
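
As a rough illustration of the configuration-message idea above, the following Python sketch models a tiny PE array in software. Everything in it is a hypothetical simplification made up for this note (the `PEConfig` fields, the opcode names, and the rule that a PE may only read values produced earlier); it is not the interface of ADRES, Versat, or any other real CGRA.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical opcode table: each PE supports only a few simple operations.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "and": lambda a, b: a & b,
    "pass": lambda a, b: a,
}


@dataclass
class PEConfig:
    """One entry of the (hypothetical) configuration message."""
    op: str      # which simple operation this PE performs
    src_a: int   # index of the value routed to operand a
    src_b: int   # index of the value routed to operand b


def run_array(config: List[PEConfig], inputs: List[int]) -> List[int]:
    """Evaluate the PEs in order and return one output per PE.

    Value indices 0..len(inputs)-1 refer to the external inputs;
    index len(inputs)+k refers to the output of PE k. A PE may only
    read values produced before it, which stands in for routing.
    """
    values = list(inputs)
    for k, pe in enumerate(config):
        limit = len(inputs) + k
        if not (0 <= pe.src_a < limit and 0 <= pe.src_b < limit):
            raise ValueError(f"PE {k} reads a value that is not routed yet")
        values.append(OPS[pe.op](values[pe.src_a], values[pe.src_b]))
    return values[len(inputs):]


# Configuration for (x + y) * (x - y), with x = values[0] and y = values[1].
config = [
    PEConfig("add", 0, 1),  # PE0: x + y, stored as values[2]
    PEConfig("sub", 0, 1),  # PE1: x - y, stored as values[3]
    PEConfig("mul", 2, 3),  # PE2: PE0 output * PE1 output
]
outputs = run_array(config, [7, 3])
print(outputs[-1])  # 40
```

Mapping a different algorithm onto the same array only requires a different `config` list, which is the kind of flexibility the bullets above attribute to CGRA.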

## Goals
Design a CGRA architecture that trades off reconfigurability against performance.
- The problem of [string matching](http://rportal.lib.ntnu.edu.tw/bitstream/20.500.12235/35334/1/ntnulib_tp_E0213_04_003.pdf)
    - Not specific to rule matching
    - Needs to be adapted for rule matching
## Ideas
Find the main ~~throughput~~ performance bottleneck of the AC algorithm, then design a CGRA architecture onto which the algorithm can be mapped.
## Steps Toward the Goals
### 1. Research AC algorithm
- Find the problems of the AC algorithm.
- Find its main bottleneck, and the bottleneck of Pigasus.
- Research the Hyperscan algorithm.
- Keep up with the progress of this [note](https://hackmd.io/Bak1f1bBRIyOMIQOwKfW-A)
- AC string matching for FPGA [note](https://hackmd.io/@akk26fxhRC-OqvycOrP6lw/SJkoYmtlY)
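
For reference while studying the algorithm, here is a minimal software sketch of multi-pattern matching, assuming "AC" in this note refers to Aho-Corasick. The function names `build_automaton` and `search` are chosen for this example only, and the code works on Python strings rather than raw packet bytes. It is a plain textbook version, not the variant used by Pigasus or Hyperscan.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(patterns: List[str]):
    """Build the goto, failure, and output tables for the patterns."""
    goto: List[Dict[str, int]] = [{}]  # goto[state][char] -> next state
    fail: List[int] = [0]              # failure link of each state
    out: List[List[str]] = [[]]        # patterns recognized at each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first pass to fill in the failure links.
    queue = deque(goto[0].values())    # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text: str, automaton) -> List[Tuple[int, str]]:
    """Return (start_index, pattern) for every match in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

In this sketch, every input character triggers at least one table lookup that depends on the state produced by the previous character. That serial dependency is one candidate to examine when looking for the bottleneck, although the real bottleneck still has to be measured.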
### 2. Snort (Data)

- Understand how rules are defined and how string matching is performed on them
- [Snort](https://www.snort.org/documents)
- [Snort Rule TechByte](https://snort-org-site.s3.amazonaws.com/production/document_files/files/000/000/128/original/Snort_Rules_Techbyte_Final.mp4?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAU7AK5ITMGOEV4EFM%2F20220317%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220317T152751Z&X-Amz-Expires=172800&X-Amz-SignedHeaders=host&X-Amz-Signature=f8e9785359ea422e85b629b05069618b9f3cce59c2c9b4369da9140bfc2dc150)
- [Snort Tutorial](https://snort-org-site.s3.amazonaws.com/production/document_files/files/000/000/069/original/Snort-IPS-Tutorial.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAU7AK5ITMGOEV4EFM%2F20220317%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220317T153029Z&X-Amz-Expires=172800&X-Amz-SignedHeaders=host&X-Amz-Signature=c4abd3c9516c215124ca15b9a7573cc6dfe3764937630331defe8cf7e8b80273)
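
To make the link between rule definition and string matching concrete, the sketch below pulls the `content` patterns out of a single Snort-style rule. The rule text is invented for this example, and `parse_rule` and `decode_content` are hypothetical helpers, not part of Snort. The parser is deliberately simplified: it ignores escaped characters, content modifiers such as `nocase`, and every option other than `content`.

```python
import re
from typing import Dict, List

# Field order of a rule header: action, protocol, source, direction, destination.
HEADER_FIELDS = ["action", "proto", "src_ip", "src_port",
                 "direction", "dst_ip", "dst_port"]


def decode_content(raw: str) -> bytes:
    """Turn a content string into bytes; |..| sections hold hex bytes."""
    result = bytearray()
    # Splitting on '|' alternates between text parts and hex parts.
    for idx, part in enumerate(raw.split("|")):
        if idx % 2 == 1:
            result += bytes.fromhex(part)
        else:
            result += part.encode("latin-1")
    return bytes(result)


def parse_rule(rule: str) -> Dict[str, object]:
    """Split a rule into header fields plus a list of content patterns."""
    header, _, options = rule.partition("(")
    parsed: Dict[str, object] = dict(zip(HEADER_FIELDS, header.split()))
    # Collect every content:"..." option; these are the strings to match.
    contents: List[str] = re.findall(r'content:\s*"([^"]*)"', options)
    parsed["contents"] = [decode_content(c) for c in contents]
    return parsed


# A made-up rule used only for illustration.
rule = ('alert tcp any any -> any 80 '
        '(msg:"demo rule"; content:"GET /admin"; content:"|0d 0a|"; '
        'sid:1000001;)')
parsed = parse_rule(rule)
print(parsed["action"], parsed["dst_port"], parsed["contents"])
# alert 80 [b'GET /admin', b'\r\n']
```

The extracted byte patterns are what a multi-pattern string matcher would take as its input set.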
### 3. CGRA Arch
- CGRA architecture design
- [ADRES](https://courses.cs.washington.edu/courses/cse591n/06au/papers/fpl_03_mei.pdf)
- [CCF](https://github.com/MPSLab-ASU/ccf)
- [CGRA Versat Architecture](https://mdpi-res.com/d_attachment/electronics/electronics-10-00669/article_deploy/electronics-10-00669.pdf)
## Related Work
- [ADRES](https://courses.cs.washington.edu/courses/cse591n/06au/papers/fpl_03_mei.pdf)
- [RISC-V/CGRA](https://web.fe.up.pt/~specs/events/wrc2020/files/Jose%20de%20Sousa%20WRC%202020.pdf)
- [Deep Versat](https://fenix.tecnico.ulisboa.pt/downloadFile/844820067126330/Paper_90901.pdf)
- [iob-versat](https://github.com/IObundle/iob-versat)
## Weekly Progress
* [02/18/2022- 03/04/2022](/VFs7gXgBQHOBX-LDAq7qbw)
* [03/04/2022- 03/18/2022](/JjO46mBOSvqxaMwCMRCQWg)
* [03/18/2022- 04/01/2022](/Z55qpQyWTYy7BWWMlFPWJg)
* [04/01/2022- 04/15/2022](/XcEau5CjTSmQwjbg2gs4Zw)
* [04/29/2022- 05/13/2022](/Lyg_9H2bRbW_rniGwtH1wg)
* [05/13/2022- 05/27/2022](/eJLqb66gTNG0srky54ps4A)
* [05/27/2022- 07/01/2022](/KKyjhoBQRD-BvdGmuC5V-g)
* [07/15/2022- 07/29/2022](/PvVY4pY4TbKQbvCvs7Q6Ww)
* [07/30/2022- 08/12/2022](/Avk98AxNSsqJwbyKaWGg3g)
* [08/15/2022- 08/26/2022](/9ab4nGoxTmuz10PFK6VTHw)
* [08/26/2022- 09/12/2022](/I1uJlGo0QRC_UMs0lT3YUg)
* [09/13/2022- 09/26/2022](/AVOSZm4uSNygq4FPaWW3vA)
* [rsy](/JvDwB0frTy6tQSIRbPe-7w)
## TODO List
* [ ] Data
    * [x] [Snort Analysis](https://hackmd.io/CffDauUURAqEp7nSZzdrsA)
    * [ ] [Packet Capture DPDK](https://doc.dpdk.org/dts/test_plans/packet_capture_test_plan.html)
* [ ] CGRA
    * [ ] [CGRA-Flow](https://github.com/tancheng/CGRA-Flow)
    * [ ] [OpenCGRA](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9516636)
    * [ ] [Similarity Aware CGRA](https://mdpi-res.com/d_attachment/electronics/electronics-10-02210/article_deploy/electronics-10-02210-v2.pdf)
* [ ] ==Bottleneck of Pigasus==
    * [ ] Rule
    * [ ] Memory
    * [ ] Computation
* [ ] Pattern Matching
    * [ ] [DFC](https://hackmd.io/Zu0euPOGTaGFdU1HaBH_aw)
* [ ] Real-world components
    * [ ] Ethernet IP core spec (Packet capture)
    * [ ] DMA Engine
    * [ ] TCP Reassembly
        * [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Packet_structure)
        * [Internet Protocol](https://datatracker.ietf.org/doc/html/rfc791)
        * [IP Fragmentation](https://en.wikipedia.org/wiki/IP_fragmentation#cite_ref-RFC791_1-0)
* [ ] [Paper List](https://docs.google.com/spreadsheets/d/1tGhkR8jy8Ep4fAVZozEdckFykA-gjcMYf0x4OJeJ0Vo/edit?usp=sharing)
* [ ] VCS
## References
- [Achieving 100Gbps Intrusion Prevention on a Single Server](https://www.usenix.org/conference/osdi20/presentation/zhao-zhipeng)
- [FlowBlaze: Stateful Packet Processing in Hardware](https://www.usenix.org/system/files/nsdi19-pontarelli.pdf)
- Flow table implementation
- [Hyperscan: A Fast Multi-pattern Regex matcher for Modern CPUs](https://www.usenix.org/system/files/nsdi19-wang-xiang.pdf)
- Hash table lookup
- String matching
- [Robust TCP reassembly](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5575701)
- TCP reassembly basic approach
- [Plasticine: A Reconfigurable Architecture For Parallel Patterns](https://dl.acm.org/doi/pdf/10.1145/3079856.3080256)
- [Stream-Dataflow Acceleration](https://research.cs.wisc.edu/vertical/papers/2017/isca17-stream-dataflow.pdf)
- [Coarse-Grained Reconfigurable Computing with the Versat Architecture](https://www.mdpi.com/2079-9292/10/6/669/htm)
## Resources
- [GOPT](https://github.com/efficient/gopt/tree/cuckoo-48-gpu)
- [A brief introduction to CGRA](http://scis.scichina.com/cn/2020/SSI-2020-0130.pdf)
- [Snort 3.0](https://www.snort.org/)
- [Compiler-Microarchitecture](https://www.public.asu.edu/~ashriva6/cml/research/)
- [iob-soc](https://github.com/IObundle/iob-soc)
- [RISC-V/CGRA-based open source SoC](https://web.fe.up.pt/~specs/events/wrc2020/files/Jose%20de%20Sousa%20WRC%202020.pdf)
- [Fast Mapping Algo for CGRA](https://mpslab-asu.github.io/publications/papers/Balasubramanian2022DATE.pdf)
- [CGRA Template](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1413142)
- [CGRA handbook](https://link.springer.com/content/pdf/10.1007/978-3-319-91734-4.pdf)
- [Performance Analysis](https://www.intel.com/content/dam/develop/external/us/en/documents/performance-analysis-guide-181827.pdf)
- [DFC](https://hackmd.io/Zu0euPOGTaGFdU1HaBH_aw)
<!--
### Archive -->
<!--
* Project Summary
* [home](/23rt7rdmRrSBs2oecJd2tQ)
* [Book mode]
* [Related Works](/YjcrAN9ASBuug3UHD2xeAg)
* [Experimental data tables]
* [Setup Environment](/YFeAEuhVRpCX-jOgWbVs-g)
* [Debug Note](/cNu4NEzpR9-N5SFrsKMd-w)
* Paper list()
-->