---
tags: CSIE,Research
---

# CCF Manual

## 1. CCF
- The full name is **C**GRA **C**ompilation **F**ramework
> CGRA: Coarse-Grained Reconfigurable Array
- It mainly simulates a heterogeneous architecture: processor core + CGRAs
> heterogeneous: computation uses different ISAs and different architectures

### 1.1. Overview of CCF
- Adding `#pragma CGRA` to C/C++ code is enough to have the CGRA perform the computation
> The CCF compiler extracts the annotated code automatically

### 1.2. Organization of Source Code Directories
- `RAMP`: maps loops onto the CGRA (the CONFIG is here!)
- `InstructionGenerator`: passes along the required machine instructions
- `scripts`: contains the names of the downloaded materials, mainly the CGRA library functions
- `work/benchmarks/MiBench`: example programs (that use CCF)
    - `basicmath`: basic arithmetic and loops
    - `bitcount`: bit-level operations and loops (to be confirmed)

---
## 2. Code Generation and Validation Using CCF
### 2.1. Loop Annotation
To run a loop on the CGRA, annotate that loop with `#pragma CGRA`.
### 2.2. Make
The build originally uses the `gcc` compiler; replace it with `cgracc`.
> cgracc stands for CCF's CGRA Compiler Collection

The CCF compiler reports whether the CGRA is used.
> If a loop contains a system call and therefore does not use the CGRA,
> or if the compiler has vectorized the code,
> no CGRA code is generated,
> and the compiler also explains why not.

> If the compiler does generate CGRA code,
> a `CGRAExec` directory is created,
> containing the information for all loops targeted at the CGRA.

### 2.3. Simulate Heterogeneous Execution
- The simulation platform is built on `gem5`; run `se_hetro.py` in place of `se.py`
> SE - System call emulation mode
> hetro - execution uses more than one kind of processor
- From the CCF directory, enter the following command to run the gem5 model that simulates CGRA+CPU
> n=2 cores, 1 core is specified as a CGRA, and another n-1=1 is the processor core
> `-c`: Run Executable File
```
gem5/build/ARM/gem5.opt gem5/configs/example/se_hetro.py -n 2 --cpu-type atomic -c ./work/benchmarks/MiBench/basicmath-CCF/basicmath12/basicmath_small > output_small.txt
```
- To try it directly from outside, change it as follows
```
#!/bin/sh
#qemu-arm ./bitcnts 75000 > output_small.txt
gem5/build/ARM/gem5.opt gem5/configs/example/se_hetro.py -n 2 --cpu-type atomic -c ./work/benchmarks/MiBench/bitcount-CCF/bitcount9/bitcnts --options="75000" > output_small.txt
```
> Any of the directories under work can be changed this way
- If you are interested in debug mode, add the following flags to see the CGRA's micro-architecture components
```
--debug-flags=CGRA,CGRA_Detailed
```
---
## 3. CCF’s Code Generation Steps
This section explains how the CGRA code is generated.
### 3.1. Extraction of the Annotated Loop(s)
- The CCF compiler searches the C/C++ source code for `#pragma CGRA`
> The CCF compiler can be modified through clang
- It then works on the intermediate representation (IR)
> `temporary.ll`
> If metadata matching an annotated loop is found here, the CCF compiler analyzes the loop and transforms it for the CGRA
- CCF targets code compiled at the highest optimization level
> For example, auto-vectorization enabled
- If the IR meets the CCF compiler's requirements, a DDG is generated
> DDG: data dependency graph

### 3.2. Generation of DDG and Communication of the Live Data
- LLVM pass (DDGGen)
> The DDG can be visualized with the [Graphviz tool (online)](http://www.graphviz.org/)

### 3.3. Mapping of DDG on the Target CGRA
- Mapping the DDG onto the CGRA also generates the prologue, kernel, and epilogue of the mapping
- **The source code of our framework is parameterized and can model different CGRA configurations**
> XML based input for target configuration
- The target has 16 separate PEs arranged as a 4x4 2-D network
> each PE has 4 local registers.
- However, CCF focuses on supporting general-purpose applications

### 3.4. Generation of Machine Instructions
#### 3.4.1 Instruction Formats (32-bit)
- R-Type Instruction Format

| 31:28  | 27  | 26:24 | 23:21 | 20:19 | 18:17 | 16:15 | 14  | 13  | 12  |   11:0    |
|:------:|:---:|:-----:|:-----:|:-----:|:-----:|:-----:|:---:|:---:|:---:|:---------:|
| opcode |  P  | LMUX  | RMUX  |  R1   |  R2   |  RW   | WE  | AB  | DB  | Immediate |

- P-Type Instruction Format

| 31:28  | 27  | 26:24 | 23:21 | 20:19 | 18:17 | 16:15 | 14:12 |   11:0    |
|:------:|:---:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:---------:|
| opcode |  P  | LMUX  | RMUX  |  R1   |  R2   |  RP   | PMUX  | Immediate |

### 3.5. Architectural Simulation
- Built on the gem5 platform; the CGRA is modeled as a separate core next to the ARM Cortex core, and the two together form a heterogeneous platform

### 3.6. Techniques Implemented
> These look like expansions of the abbreviations
- CCF: A CGRA Compilation Framework
- RAMP: Resource-Aware Mapping for CGRAs
- URECA: A Compiler Solution to Manage Unified Register File for CGRAs
- REGIMap: Register-aware Application Mapping on Coarse-grained Reconfigurable Architectures (CGRAs) (for implementation of clique-based mapping heuristic, routing through registers)
- Power-Efficient Predication Techniques for Acceleration of Control Flow Execution on CGRA (partial predication based execution of conditional operations)

## test
```
s108321042@lab304-ubuntu:~$ docker attach dd
root@479fceca14b2:/# cd ~
root@479fceca14b2:~# ls
ccf  ccf-origin  ccf2*2  cmake-3.11.1-Linux-x86_64  cmake-3.11.1-Linux-x86_64.tar.gz  gem5
root@479fceca14b2:~# cd ccf
root@479fceca14b2:~/ccf# ls
CCF Manual.pdf  InstructionGenerator  RAMP  README.md  gem5  install  llvm  plugin-llvm-gold  scripts  work
root@479fceca14b2:~/ccf# cd work/benchmarks/MiBench/basicmath-CCF/
basicmath1/   basicmath11/  basicmath13/  basicmath15/  basicmath2/   basicmath4/   basicmath6/   basicmath8/   org/
basicmath10/  basicmath12/  basicmath14/  basicmath16/  basicmath3/   basicmath5/   basicmath7/   basicmath9/
root@479fceca14b2:~/ccf# cd work/benchmarks/MiBench/basicmath-CCF/basicmath13
root@479fceca14b2:~/ccf/work/benchmarks/MiBench/basicmath-CCF/basicmath13# ls
CGRAExec  LICENSE       Makefile    basicmath_large.c  basicmath_small.c  cubic.c    isqrt.c  newIR.ll          pi.h       round.h         runme_small.sh  sniptype.h  temporary.ll
COMPILE   LoopInfo.txt  TCInfo.txt  basicmath_small    combinedIR.ll      debugfile  m5out    output_small.txt  rad2deg.c  runme_large.sh  snipmath.h      temp.ll     temporaryIR.ll
root@479fceca14b2:~/ccf/work/benchmarks/MiBench/basicmath-CCF/basicmath13# make clean
rm -rf basicmath_small basicmath_large output* *.ll CGRAExec m5out *.s
root@479fceca14b2:~/ccf/work/benchmarks/MiBench/basicmath-CCF/basicmath13# make
cgracc -static -O3 basicmath_small.c rad2deg.c cubic.c isqrt.c -o basicmath_small -lm
Loop has Depth=1 Number of BBs=1 With one exiting branch
Loop has Depth=2 Number of BBs=1 With one exiting branch
Total Loops Compiled for CGRA: 2
root@479fceca14b2:~/ccf/work/benchmarks/MiBench/basicmath-CCF/basicmath13# ./runme_small.sh
```