GraphRC research progress

# [Index] GraphRC project schedule

###### tags: `research-GraphRC`

[TOC]

## ==Progress for ICCAD 2022==

- Apr 28
    - Paper draft schedule:
        - [Target] Format draft deadline 5/1
        - [俊峰] review date 5/9
        - [Prof. 張] review date 5/9
        - [Prof. 林] review date
        - [ICCAD] Abstract deadline 5/16
        - [ICCAD] Final deadline 5/23
    - Paper draft questions
        - Q: What kind of CPU should be used?
            - A: There is no need to choose the CPU core used in our system, since our work focuses on memory design.
        - Q: What metric will we use to compare performance? Power? Speed?
            - A: We will use speed as the performance metric. We will not compare with other PIM-based graph accelerators because that would be complicated and confusing. We will also compare against our naive implementation with row-column addressing enabled.
        - Q: There are very few papers on dual-addressable memory. What should we put in the previous work section?
            - A: Previous work can be broken into two parts: graph accelerators and dual-addressable memory. We can extend the previous work to cover replacing the capacitor cells in main memory with NVM cells; there are plenty of papers in this field. Refer to this [paper](https://ieeexplore.ieee.org/document/8337123/) for details.
        - Q: What should we put in the system architecture section?
            - A: We can refer to the architecture of NVMain: ![](https://i.imgur.com/PiFCNj5.png) However, we should not include that figure in our paper. Only draw the diagram that emphasizes the work done by us (like RC-NVM).
    - Don't use pseudocode in Section 2 (Background and Motivation). Use diagrams to demonstrate the idea of the vertex-centric programming model.
- Apr 21
    - Make slides about your method
- Apr 7
    - For how to write the paper, refer to "系統論文的新手研究步驟" (research steps for newcomers writing systems papers) at http://johnson.iis.sinica.edu.tw/
    - Make slides explaining the idea of this work, and finish all the experiments
- Apr 1
    - Paper narrative: vertex-centric property (don't mention GraphChi yet; otherwise readers may think our framework is tied to GraphChi) -> read in/out-edges, write in/out-edges -> maps to row/col memory access -> therefore well suited to RC-NVM
    - Baseline: find a simple, straightforward RC solution to compare against, one that also leads naturally to our final solution. This baseline should implement vertex-centric algorithms on RC-NVM in the simplest, most basic way
        - For example, the previously proposed RC-NVM shard placement method could be considered
    - The baseline could perhaps also be compared with GaaS-X
- Mar 31
    - Find a way to reorder memory accesses for performance without ignoring RAW hazards
    - Find and implement more fault-tolerant graph algorithms
    - Build an experiment runner
    - Design experiments for tuning the merging threshold curve
- Mar 24
    - What function curve should be used to adjust the merging threshold parameter? Linear? Constant?
    - Find and implement more fault-tolerant graph algorithms
- Mar 20
    - How do we find the best initial value of the merging threshold, and how should we design the experiment that searches for it? This may be related to the degree distribution of the graph
    - Think about combining RC-NVM with vertex-centric programs, and try to analyze it yourself
- Mar 18
    - Why use RC-NVM hardware for vertex-centric algorithms? What property of RC-NVM makes this hardware necessary? Or can any application with both row and column access patterns make good use of RC-NVM?
- Feb 10
    - Q: How can we come up with a reasonable buffer size? The larger the buffer, the more requests can be merged together, and potentially more row accesses can be replaced with column accesses. However, this comes at the cost of memory access delay. Another problem is the potential RAW hazard: if write requests are delayed, can a RAW hazard occur? There should be a hash-based address matching system capable of detecting RAW hazards and issuing a write request immediately if that request cannot be delayed.
- Jan 28
    - The meaning of the second baseline: this method should be relatively intuitive but different from the first baseline.
    - Matrix multiplication is not the second baseline. Instead, it should be a persuasive example at the beginning of the paper telling readers why RC-NVM is useful.
    - You need a better understanding of this simulator to come up with a genuine solution. What makes RC-NVM unique besides row/column access?
    - If you don't have an idea for the next innovative solution, consider analyzing your dataset. Understanding your dataset is a good way to get good ideas.
    - Try exploring new algorithms; it might help you gain better insights.
    - NVM variation is a plus, but it should not affect the overall logic flow of your paper. It adds another dimension to your solution and should be elegantly integrated; otherwise, it disrupts the flow of your paper.
- Nov 11
    - Draw memory mountain figure
- Nov 4
    - Present paper [link](https://docs.google.com/presentation/d/1GQacC4wsoTi90jYBAoVGdECgj6kpc4J746uQH2D7eig/edit?usp=sharing)
    - Wider range for memory mountain prober
- Oct 28
    - Write a Memory Mountain Prober that generates probing traces automatically [Github](https://github.com/WeiCheng14159/MemoryMountainProber)
- Oct 13
    - Identify the underlying problem
- Oct 6
    - Conduct a memory stride test for RC-NVM [here](https://hackmd.io/@WeiCheng14159/ryfGQkjVK). This stride test probes the memory space and aims to find the switching cost at each level of the hardware architecture.
- Sep 29
    - Clarify the architecture of RC-NVM; the updated slides are [here](https://docs.google.com/presentation/d/1Nod-33jaI3vAFOe47u_qGkhPJmi6bO7PC4ZuV98r6m4/edit?usp=sharing)
    - Came up with new conceptual architectural levels: **logical subarray and physical subarray**
- Sep 24 Meeting with Dr. Wu
    - Discuss the research progress
    - Plan the road map for this paper:
        - Compare our method with one that has only a row address space; however, there will be less innovation in the paper because most researchers, given enough time, would come up with a similar solution.
        - Think of your method as the baseline performance. Add more graph data placement strategies that BEST fit the RC-NVM architecture, such as subarray-level parallelism, how the memory controller schedules requests to DRAM/NVM, etc.
- Sep 22 Meeting with Profs
    - Present my findings on the DRAM protocol, including the number of bits transferred per cycle and the detailed architecture of RC-NVM
    - [ ] Make ppt slides to animate the bit-transferring process of RC-NVM
- Sep 15
    - Very little progress due to vaccine side effects
- Sep 9
    - Day off due to vaccine side effects
- Sep 1
    - Progress
        - [x] Verify the speedup ratio next time
        - [x] Report speedup results of the PageRank algorithm
- Aug 25 (Wed)
    - Day off
- Aug 22
    - Meeting rescheduled to Friday
    - Progress
        - [x] Study and implement the connected component graph algorithm
        - [x] Verify the result of the connected component graph algorithm
        - [x] Test idea: run the graph algorithm for 1 iteration to obtain the trace.
- Aug 11
    - No meeting
- Aug 4
    - Progress
        - [x] Try to implement a multi-threaded graph simulator
        - [x] Generate RC-NVM-compatible memory traces
        - [x] Obtain results from the RC-NVM simulator
        - [x] Idea: run the graph algorithm for 1 iteration to obtain the trace, then multiply by 10 to obtain results for 10 iterations
        - [x] Discuss the multi-threading issue
        - [x] Discuss how to promote our method and what exactly we want to compare against. We want to compare our method with a memory that has only a ROW address space.
    - TODO
        - [ ] Generate the memory trace given a ROW address space only
        - [ ] Implement multi-threading for the graph simulator
        - [ ] Implement a scheduler for the graph simulator
- Jul 28
    - [x] Fix the memory-hungry issue caused by high-degree vertices
    - [x] Fix how we implement the ROW access method
    - [x] Verify simulator results by comparing the generated results with GraphChi
- Jul 21
    - Progress
        - [x] Fix bugs in the trace generator: incorrect trace during column access; out-edges left out some edges in shards
        - [ ] Discuss how to preprocess graph datasets
    - TODO
        - [ ] Implement other graph algorithms
        - [x] How to run real-world graphs? Not enough memory
- Jul 14
    - Progress
        - Verify the simulator is correct (compared to GraphChi)
        - Generate traces from the processing engine
    - Meeting
        - **Add TCAM for searching in a specific field**
        - Discuss whether our assumption about cache blocks is adequate: generate a single trace for all items in a cache block.
        - Discuss whether our assumption about the previous access entry is adequate.
- Jul 8
    - Implement a GraphChi-like graph processing engine
    - [x] Verify the correctness of the simulator
    - [x] Generate traces from the processing engine
    - [ ] Figure out the SALP (Subarray-Level Parallelism) in RC-NVM
- Jul 1
    - Conclude that "we won't use the real trace of GraphChi for our experiment." Instead, we will use a "pseudo" data layout for the graph dataset and simulate the impact of column access
    - Meeting with Dr. Wu: meeting notes [here](https://hackmd.io/@WeiCheng14159/rk90hLn3d)
- Jun 24
    - No meeting
- Jun 17
    - Generate a gem5-compatible trace and feed it into the RC-NVM simulator. This generates the baseline performance (row access only)
- Jun 10
    - Generate memory reference addresses from GraphChi.
    - We can simply ignore the multi-core situation during simulation
- Jun 3
    - Present the encoding scheme of the GraphChi shard
    - Discuss: Minimum access unit of RC-NVM?
        - The minimum accessible unit of RC-NVM is 8 bytes, which means a row address and a column address, each in its own address space, can return the same 8-byte value.
    - Discuss: The memory shard is entirely in memory, while the sliding shard is only **partially** in memory.
- May 27
    - Try 2 ways to print out the shard table for GraphChi
- May 20
    - Study the GraphChi source code; further explain the mechanism of the `run()` function
- May 13
    - Study the source code of GraphChi
    - Identify some important functions in GraphChi, including the `graphchi::graphchi_engine::run(...)` function
- May 6
    - Present paper **GaaS-X: Graph Analytics Accelerator Supporting Sparse Data Representation using Crossbar Architectures** [link](https://docs.google.com/presentation/d/17O9mezPQByB7PenCCE7FgfxFMfGSiQ--gAAz4X29CWM/edit?usp=sharing)
    - Present **Graph Processing Model** [link](https://docs.google.com/presentation/d/1ZRDOfsMkU2R4J2Hhz2iuetaoIXiFEcVjWHzR21mzEb4/edit?usp=sharing)
    - New goal: work on a "pseudo" gem5 simulator that performs graph algorithms and generates gem5-like traces
- Apr 29
    - Study RC-NVM source code
    - Port RC-NVM source code to the latest version
- Apr 22
    - Clarify DRAM architecture
    - Find DRAM timing info in the **Memory Systems** book
- Apr 15
    - Read **Memory Systems** Chap 7 Overview of DRAM [note](https://hackmd.io/@WeiCheng14159/SyW1xGBId)
    - Upgrade **NVMain** to support GCC 9.3 [here](https://github.com/WeiCheng14159/NVmain/commit/7f0b303acd05b8322d15e69d8465a8715fde02d3) and Python3 [here](https://github.com/WeiCheng14159/NVmain/commit/77f50b9cd8cad0c9f5c45d1f9386d967ce7dfb9d)
    - Identify a bug in NVMain: **busWidth** not working
        ```
        ; Bus width in bits. JEDEC standard is 64-bits
        BusWidth 128
        ```
    - How to distinguish a ROW or COL access?
        - Add a new instruction to the ISA?
        - Add one bit in the physical address to differentiate between ROW/COL addresses
- Apr 8
    - **NVMain: An Architectural-Level Main Memory Simulator for Emerging Non-volatile Memories**
- Mar 11
    - **Propose a "shard placement strategy" in RC-NVM**
        - [Bad] Place shards in adjacent subarrays in a **physical** bank
        - [Better] Place shards in subarrays in a **logical** bank
    - **Focus on simpler algorithms** to better understand the RC-NVM architecture
        - Triangle counting algorithm (can be implemented in different ways; choose the one that best fits RC-NVM)
        - Matrix multiplication algorithm on RC-NVM
    - Learn how to use the **gem5, NVMain** simulation tools
    - Next week no meeting
- Mar 4
    - **Memory Access Optimization of a Neural Network Accelerator Based on Memory Controller**
    - Present RC-NVM to 俊峯 on Mar 5 (Fri)
    - Next week focus on "RC-NVM placement policy"
- Feb 24
    - **Propose several potential attempts to "combine the RC-NVM arch and the PSW (Parallel Sliding Window) algorithm"**
        - Map the adjacency matrix to an RC-NVM subarray
        - Map the adjacency matrix to an RC-NVM subarray WITH pointers
        - Map an RC-NVM subarray to a PSW shard (most feasible solution)
    - Think: What will happen when the graph is dynamic?
    - Tidy up the slides
    - Modify GraphChi as a simulator and count the # of Row/Col accesses
- Feb 9
    - **GraphChi: Large-scale graph computation on just a PC**
    - How to combine the RC-NVM arch and the GraphChi algorithm?
- Jan 27
    - Revisit the rest of **RC-NVM: Dual-Addressing Non-Volatile Memory Architecture Supporting Both Row and Column Memory Accesses**
    - Find potential applications for both row and column memory access.
        - Accelerate AI applications with GEMM
        - Accelerate graph processing
        - Add in/near-memory computing to RC-NVM
    - Read the GraphChi paper
- Jan 19
    - **RC-NVM: Dual-Addressing Non-Volatile Memory Architecture Supporting Both Row and Column Memory Accesses**
    - What is a logical bank? What is a physical bank?
    - Read the remaining examples in the RC-NVM paper: the IMDB example and the GEMM example.
    - Which papers cite RC-NVM? Read their abstracts and summarize.
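The Apr 1 narrative (vertex-centric in/out-edge reads map to row/column accesses) can be made concrete with a toy access-count model. This is a minimal sketch under simplifying assumptions that are not taken from RC-NVM or NVMain: the graph is a dense N×N adjacency matrix of 8-byte words (the 8-byte unit noted in the Jun 3 entry), and one burst returns `words_per_burst` consecutive words along whichever direction the memory can address. The function names are hypothetical.

```python
import math


def bursts_for_vector(n, words_per_burst, along_row, dual_addressable):
    """Count bursts needed to read one full row or column of an n x n matrix.

    Toy model (assumption): a burst returns `words_per_burst` consecutive
    words along the direction the memory is addressed in.
    """
    if along_row or dual_addressable:
        # Contiguous in the addressed direction: words pack into bursts.
        return math.ceil(n / words_per_burst)
    # Row-only memory reading a column: every element lives in a
    # different row, so each word costs a separate burst.
    return n


def vertex_update_bursts(n, words_per_burst, dual_addressable):
    """Bursts for one vertex update that reads all out-edges (row v)
    and all in-edges (column v) of the adjacency matrix."""
    out_edges = bursts_for_vector(n, words_per_burst, True, dual_addressable)
    in_edges = bursts_for_vector(n, words_per_burst, False, dual_addressable)
    return out_edges + in_edges


if __name__ == "__main__":
    n, wpb = 1024, 8  # 1024 vertices, 8 words (64 B) per burst
    row_only = vertex_update_bursts(n, wpb, dual_addressable=False)
    rc = vertex_update_bursts(n, wpb, dual_addressable=True)
    print(row_only, rc)  # prints: 1152 256
```

The model deliberately ignores sparsity and row-buffer hits; it only shows why the in-edge (column) side dominates the cost when the memory offers a single address space.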
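The Feb 10 question (delaying writes so they can be merged, while still catching RAW hazards with hash-based address matching) and the merging-threshold parameter from the Mar 20 to Mar 31 entries can be prototyped at the trace level. The sketch below is hypothetical: the class name, the toy column index `addr % n_cols`, and the rule "emit one column write once a column group has at least `merge_threshold` pending writes" are illustrative assumptions, not the project's actual design.

```python
from collections import defaultdict


class DelayedWriteBuffer:
    """Hypothetical trace-level write buffer for a row/column-addressable memory.

    Writes are held back so that writes falling in the same column can be
    merged into one column access. A Python dict (a hash table) provides the
    address matching that detects read-after-write (RAW) hazards.
    """

    def __init__(self, capacity, n_cols, merge_threshold):
        self.capacity = capacity              # max pending writes before a flush
        self.n_cols = n_cols                  # words per row in the toy address map
        self.merge_threshold = merge_threshold
        self.pending = {}                     # address -> value of the delayed write
        self.issued = []                      # commands sent to memory, in order

    def write(self, addr, value):
        # A later write to the same address overwrites the pending one.
        self.pending[addr] = value
        if len(self.pending) >= self.capacity:
            self.flush()

    def read(self, addr):
        if addr in self.pending:
            # RAW hazard: the read depends on this write, so issue it now.
            self.issued.append(("ROW_WRITE", (addr,)))
            del self.pending[addr]
        self.issued.append(("ROW_READ", (addr,)))

    def flush(self):
        # Group pending writes by their column index.
        by_col = defaultdict(list)
        for addr in self.pending:
            by_col[addr % self.n_cols].append(addr)
        for col in sorted(by_col):
            addrs = tuple(sorted(by_col[col]))
            if len(addrs) >= self.merge_threshold:
                self.issued.append(("COL_WRITE", addrs))      # one column access
            else:
                for a in addrs:
                    self.issued.append(("ROW_WRITE", (a,)))   # fall back to row writes
        self.pending.clear()


if __name__ == "__main__":
    buf = DelayedWriteBuffer(capacity=4, n_cols=8, merge_threshold=2)
    buf.write(3, 1)    # row 0, col 3
    buf.write(11, 2)   # row 1, col 3
    buf.write(19, 3)   # row 2, col 3
    buf.read(11)       # RAW hit: the write to 11 is issued before the read
    buf.write(27, 4)   # row 3, col 3
    buf.write(5, 5)    # row 0, col 5; buffer reaches capacity and flushes
    print(buf.issued)
```

In this run, the three remaining writes to column 3 leave as a single `COL_WRITE`, while the lone write to column 5 stays a row write. A constant or linearly changing merging threshold, as asked on Mar 24, could be explored by updating `merge_threshold` between flushes.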
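The stride test from the Oct 6 entry and the Memory Mountain Prober from Oct 28 follow the classic memory-mountain idea: sweep the working-set size and the stride, and record the access pattern for each point. The sketch below only generates the address sequence for each point. The sweep values and the one-line-per-read text format `<cycle> R 0x<address>` are hypothetical; they do not claim to match the NVMain trace format or the linked prober.

```python
def stride_addresses(base, size_bytes, stride_words, word_bytes=8):
    """Addresses touched by one pass of a stride test over a working set."""
    step = stride_words * word_bytes
    return list(range(base, base + size_bytes, step))


def mountain_trace(base, sizes, strides, passes=2, word_bytes=8):
    """Yield (size, stride, lines) for every point of the memory mountain.

    Each point repeats the pass `passes` times so that later passes show
    warmed-up behavior; one line per read, issued one cycle apart.
    """
    for size in sizes:
        for stride in strides:
            addrs = stride_addresses(base, size, stride, word_bytes) * passes
            lines = [f"{cycle} R 0x{addr:x}" for cycle, addr in enumerate(addrs)]
            yield size, stride, lines


if __name__ == "__main__":
    sizes = [1 << 12, 1 << 16, 1 << 20]  # 4 KiB, 64 KiB, 1 MiB working sets
    strides = [1, 2, 4, 8]                 # in 8-byte words
    for size, stride, lines in mountain_trace(0x1000_0000, sizes, strides):
        print(size, stride, len(lines), lines[0])
```

Widening the sweep, as the Nov 4 entry asks, only means extending `sizes` and `strides`; each point's trace would then be fed to the simulator separately.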
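The Apr 15 `BusWidth` bug and the Sep 22 discussion of bits transferred per cycle both come down to simple DDR arithmetic. The sketch below uses the textbook double-data-rate model (two transfers per bus clock) to compute how many bus cycles one cache line takes. It is an assumption-level calculation for sanity-checking simulator output, not a reproduction of NVMain's timing code.

```python
import math


def bus_cycles_per_line(line_bytes, bus_width_bits, transfers_per_cycle=2):
    """Bus clock cycles to move one cache line over a DDR-style data bus."""
    bytes_per_transfer = bus_width_bits // 8
    transfers = math.ceil(line_bytes / bytes_per_transfer)  # i.e. the burst length
    return math.ceil(transfers / transfers_per_cycle)


if __name__ == "__main__":
    for width in (64, 128):
        print(width, bus_cycles_per_line(64, width))
```

For a 64-byte line this gives 4 cycles on the JEDEC-standard 64-bit bus and 2 cycles on a 128-bit bus, so if `BusWidth 128` were honored, the data-transfer time per line should roughly halve.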
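The connected-component algorithm from the Aug 22 entry fits the vertex-centric model that the GraphChi-like engine (Jul 8) executes: each vertex reads its in- and out-edges and keeps the smallest label it sees. The sketch below is a plain in-memory label-propagation version, useful for checking results on small graphs; it is not the project's simulator and omits shards and trace generation. The edge-list input format and the function name are assumptions.

```python
from collections import defaultdict


def connected_components(num_vertices, edges, max_iters=100):
    """Label propagation: every vertex ends with the smallest vertex id in its
    (weakly) connected component. `edges` is a list of (src, dst) pairs."""
    in_nbrs = defaultdict(list)
    out_nbrs = defaultdict(list)
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)

    label = list(range(num_vertices))
    for _ in range(max_iters):
        changed = False
        for v in range(num_vertices):
            # Vertex-centric update: gather labels over in-edges and out-edges.
            # Labels are updated in place, so later vertices in the same
            # iteration already see the new values.
            best = min([label[v]]
                       + [label[u] for u in in_nbrs[v]]
                       + [label[w] for w in out_nbrs[v]])
            if best < label[v]:
                label[v] = best
                changed = True
        if not changed:
            break
    return label


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (2, 1), (3, 4)]))
```

Here vertices 0, 1, 2 share label 0, vertices 3 and 4 share label 3, and the isolated vertex 5 keeps its own id.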
## ==Paper==

### GaaS-X: Graph Analytics Accelerator Supporting Sparse Data Representation using Crossbar Architectures

- This paper presents a PIM (Processing-In-Memory) graph accelerator consisting of a CAM (Content-Addressable Memory) crossbar and a MAC (Multiply-ACcumulate) crossbar. The graph processing model used in this paper is based on SpMV (Sparse Matrix-Vector Multiplication).
- [note](https://docs.google.com/presentation/d/17O9mezPQByB7PenCCE7FgfxFMfGSiQ--gAAz4X29CWM/edit?usp=sharing)
- [paper](https://ieeexplore.ieee.org/abstract/document/9138911)

### NVMain: An Architectural-Level Main Memory Simulator for Emerging Non-volatile Memories

- This paper presents a main memory simulator, written in C++, that extends DRAM-style modeling to NVM.
- [note](https://hackmd.io/@WeiCheng14159/SJYYeAtEd)
- [paper](https://ieeexplore.ieee.org/document/6296505)

### RC-NVM: Dual-Addressing Non-Volatile Memory Architecture Supporting Both Row and Column Memory Accesses

- This paper presents a non-volatile memory crossbar architecture supporting both row and column access. The simulations in this paper are based on NVMain and gem5.
- [note](https://hackmd.io/@WeiCheng14159/BJ8VowTx_)
- [paper](https://ieeexplore.ieee.org/document/8453833)

### GraphChi: Large-scale graph computation on just a PC

- This paper demonstrates the ability to process real-world graphs on a disk-based PC by carefully organizing the data layout on disk. A novel algorithm, PSW (Parallel Sliding Window), is presented to handle the memory access pattern of the vertex-centric programming model.
- [note](https://hackmd.io/ZczwWT2iRk292E7bUAIkKg)
- [paper](https://dl.acm.org/doi/10.5555/2387880.2387884)

### Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters

- Why graph partitioning is hard
- [note (WIP)]()
- [paper](https://arxiv.org/abs/0810.1355)

### Memory Access Optimization of a Neural Network Accelerator Based on Memory Controller

- This paper presents a DRAM controller designed for a CNN accelerator. The controller is implemented on a Xilinx FPGA.
- [note](https://hackmd.io/@WeiCheng14159/BkD3go6GO)
- [paper](https://www.mdpi.com/2079-9292/10/4/438/pdf)

## ==Slides==

### Understanding NVMain source code

- This note briefly analyzes the structure and control flow of the NVMain simulator
- [note](https://hackmd.io/@WeiCheng14159/HyUEMmMru)

### Highlights in RC-NVM source code

- This note briefly goes through the 51 code commits that enable column memory access in the NVMain simulator.
- [note](https://hackmd.io/@WeiCheng14159/S1FLh4DDO)

### RC-NVM architecture

- These slides visualize the architecture of RC-NVM
- [note](https://docs.google.com/presentation/d/1Nod-33jaI3vAFOe47u_qGkhPJmi6bO7PC4ZuV98r6m4/edit#slide=id.p)

### Memory Systems: DRAM study note

- This note briefly summarizes the DRAM-related information in the book **Memory Systems**
- [note](https://hackmd.io/@WeiCheng14159/SyW1xGBId)

### Graph Processing Model

- These slides summarize three types of graph processing models used in recent graph accelerators, namely the **vertex-centric, edge-centric, and SpMV models**
- [note](https://docs.google.com/presentation/d/1ZRDOfsMkU2R4J2Hhz2iuetaoIXiFEcVjWHzR21mzEb4/edit?usp=sharing)