# Hardware Ray Tracer Datapath
> 傅信豪
> contributed by [Shin-Hao, Fu](https://github.com/HowFunSong)

## Development Objectives of this Project
My goal is to analyze [raytracer](https://github.com/purdue-aalp/raytracer). This project has already finished some of the fundamental work needed to support ray-tracing calculations. A ray tracer relies heavily on dot products, cosine-similarity calculations, and tests that determine whether a ray intersects an object such as a sphere or a cube.

## Prerequisite
### Test top module
```shell
$ sbt "test:testOnly raytracer_datapath.Datapath_test"
```
but it fails with the following error
::: spoiler java.lang.OutOfMemoryError: Java heap space
```
~/CS/raytracer stable sbt test:testOnly raytracer_datapath.Datapath_test 127 err 10:51:56 PM
[info] welcome to sbt 1.8.0 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project raytracer-build from plugins.sbt ...
[info] loading project definition from /Users/shinhaofu/CS/raytracer/project
[info] loading settings for project root from build.sbt ...
[info] set current project to raytracer (in build file:/Users/shinhaofu/CS/raytracer/)
[warn] sbt 0.13 shell syntax is deprecated; use slash syntax instead: Test / testOnly
[info] compiling 28 Scala sources to /Users/shinhaofu/CS/raytracer/target/scala-2.13/classes ...
https://repo1.maven.org/maven2/org/scala-sbt/compiler-bridge_2.13/1.8.0/compiler-bridge_2.13-1.8.0.pom
  100.0% [##########] 2.7 KiB (2.7 KiB / s)
[info] Non-compiled module 'compiler-bridge_2.13' for Scala 2.13.8. Compiling...
[info] Compilation completed in 4.053s.
[warn] 'genBundleElements' is now default behavior, you can remove the scalacOption.
[info] compiling 8 Scala sources to /Users/shinhaofu/CS/raytracer/target/scala-2.13/test-classes ...
[warn] 'genBundleElements' is now default behavior, you can remove the scalacOption.
datapath does not support euclidean!
[info] FNFlipSign_test:
[info] - FNFlipSign correctly changes sign of float32 vlaues
[info] - FNFlipSign handles pos/neg zero correctly
[info] SkidBufferStageTest:
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when both ends are free
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when emit is congested
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when intake is throttled
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when intake is throttled and emit is congested
[info] - skidbuffer stage should pass through squared values from 0 to 1000 when intake is throttled and emit is congested
[info] - chained skidbuffer stages should pass through correct values from 0 to 1000 when intake is throttled and emit is congested
[info] - chained generalized skid buffer stages should function correctly when intake is throttled and emit is congested
[warn] In the last 10 seconds, 5.027 (51.5%) were spent in GC. [Heap: 0.00GB free of 1.00GB, max 1.00GB] Consider increasing the JVM heap using `-Xmx` or try a different collector, e.g. `-XX:+UseG1GC`, for better performance.
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "classloader-cache-cleanup-0"
  | => raytracer_datapath.Datapath_test 32s
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-7864"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "process reaper"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sbt-progress-report-scheduler"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "make -C /Users/shinhaofu/CS/raytracer/cached_verilator_backend/euclideanfalse/verilated -j 8 -f VUnifiedDatapath_wrapper.mk VUnifiedDatapath_wrapper stdout thread"
java.lang.NullPointerException
	at com.swoval.files.ApplePathWatcher$1.accept(ApplePathWatcher.java:267)
	at com.swoval.files.ApplePathWatcher$1.accept(ApplePathWatcher.java:261)
	at com.swoval.files.apple.FileEventMonitorImpl$WrappedConsumer$1.run(FileEventMonitors.java:178)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
[info] Datapath_test:
[info] QuadSortRecFNTest:
[info] raytracer_datapath.Datapath_test *** ABORTED ***
[info] raytracer_datapath.QuadSortRecFNTest *** ABORTED ***
[info]   java.lang.OutOfMemoryError: Java heap space
[info]   ...
[info]   java.lang.OutOfMemoryError: Java heap space
[info]   ...
java.lang.OutOfMemoryError: Java heap space
[error] [launcher] error during sbt launcher: java.lang.OutOfMemoryError: Java heap space
Exception in thread "com.swoval.files.apple.FileEventsMonitor.runloop" java.lang.OutOfMemoryError: Java heap space
```
:::
### Explanation
The JVM ran out of heap space: the 1 GB heap reported in the log (`max 1.00GB`) is not enough to run this test.
### Solution
Raise the JVM heap limit to 4 GB with `-J-Xmx4G`.

Evaluate ***all testbenches***
```shell
$ sbt -J-Xmx4G test
```
```sh=
[info] Run completed in 31 seconds, 188 milliseconds.
[info] Total number of tests run: 20
[info] Suites: completed 5, aborted 0
[info] Tests: succeeded 20, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
Evaluate ***one testbench***
```shell
$ sbt -J-Xmx4G "test:testOnly raytracer_datapath.Datapath_test"
```
```sh=
[info] Run completed in 21 seconds, 177 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
## PartA : Study Document && source code
### review test code
We use `Datapath_test.scala` as an example. In `Datapath_test.scala`,
```scala
class Datapath_test extends AnyFreeSpec with ChiselScalatestTester
```
the class first sets up tests for several functionalities. Each test has an expected answer and a test function such as `testUnifiedIntersection`, and a single test may involve several hardware units to produce its result.

`object RaytracerGold` is written in Scala and provides the software implementation that the testbench uses as its reference:
- Ray-Box Intersection test
- Ray-Boxes Intersection test
- Ray-Triangle Intersection test

### test bench
![image](https://hackmd.io/_uploads/SypIUClq1l.png)

#### 1. FNFlipSign_test.scala
Validates the correctness of the floating-point sign-flip operation.

#### 2. RecFNCCompareSelectTest.scala
Validates the correctness of operations on the 33-bit recoded floating-point format (RecFN). This format is the main one used for calculation throughout the pipeline.

#### 3. QuadSortRecFNTest.scala
Tests quadsort/quadsortRec, which is used in stage 10 of the unified datapath when we want to determine whether a ray intersects boxes.

#### 4. Datapath_test.scala
This is the main test of the whole system; every feature test calls this function.
```scala=
def testUnifiedIntersection(
    extended: Boolean,
    description: String,
    ray_seq: Seq[SW_Ray],
    box_seq_seq: Seq[Seq[SW_Box]],
    triangle_seq: Seq[SW_Triangle],
    op_seq: Seq[SW_Opcode]
){...
    test(gen_baseline_or_extended_datapath(extended))
}
```
Let's look at `gen_baseline_or_extended_datapath(extended)`. The `extended` flag determines which wrapper is instantiated.
```scala=
def gen_baseline_or_extended_datapath(extended: Boolean) = extended match {
    case true  => new UnifiedDatapath_wrapper_16
    case false => new UnifiedDatapath_wrapper
}
```
`UnifiedDatapath` is the top-level module of the ray tracer datapath. There are two types of wrapper, `UnifiedDatapath_wrapper_16` and `UnifiedDatapath_wrapper`.
![image](https://hackmd.io/_uploads/rJfu03CUye.png)

#### 5. SkidBufferStageTest.scala
According to the documentation at https://github.com/purdue-aalp/raytracer/releases/tag/v1
>This allows the Ray Tracer Datapath to be integrated into a larger datapath and propagate back pressure via its valid-ready interfaces. A drawback of this design is increased hardware overhead since each skid buffer contains two registers.

In other words, this design adds hardware overhead through the extra registers, but the valid-ready handshake mechanism keeps the data flow safe under back pressure.

### Modules
![image](https://hackmd.io/_uploads/rJw3AnA8yg.png)

### waveform
After running the test cases, which are written in Scala, we can find the waveform results in `raytracer/cached_verilator_backend`
```bash
$ cd raytracer/cached_verilator_backend/euclideanfalse
```
and open the waveform with
```bash
$ gtkwave UnifiedDatapath_wrapper.vcd
```
To keep the results of different tests separate, I adjusted `outdir` so that each test writes its own UnifiedDatapath_wrapper.vcd.
```
~/CS/raytracer/cached_verilator_backend stable >1 tree -d -L 1 ok 11:31:46 AM
.
├── angular
├── baseline_ray_triangle_random
├── euclidean
├── extended_ray_box_random
├── extended_ray_triangle_random
└── ray_box_random

7 directories
```
With the 32-bit datapath enabled, the tests include Euclidean, angular (a.k.a. cosine similarity), ray-box intersection, and ray-triangle intersection. Each test is fed 100 random test cases sequentially.
![image](https://hackmd.io/_uploads/SJaaRn0Lyx.png)

With only the Euclidean test enabled on the 16-bit datapath:
![image](https://hackmd.io/_uploads/HJdufa0DJx.png)

## PartB : Get familiar with chisel development toolchain
### overview of chisel toolchain
An overview of the toolchain for Chisel development:
![image](https://hackmd.io/_uploads/B1C1ypAIyl.png)
***cited from [Verification of Chisel Hardware Designs with ChiselVerify.](https://www.sciencedirect.com/science/article/pii/S0141933122002666)***

It shows that we can combine `sim_main.cpp` with our hardware implementation: the Chisel design is translated to Verilog, Verilator converts the Verilog into a C++ model, and the model is compiled together with `sim_main.cpp` into an executable file.
![image](https://hackmd.io/_uploads/rkpeJaCIkx.png)

### review [ca2023_lab3](https://hackmd.io/@sysprog/r1mlr3I7p#Development-Objectives-of-this-Project)
Use
```shell
$ make verilator
```
which is defined in the `Makefile` as
```makefile
...
verilator:
	sbt "runMain board.verilator.VerilogGenerator"
	cd verilog/verilator && verilator --trace --exe --cc sim_main.cpp Top.v && make -C obj_dir -f VTop.mk
...
```
then use
```shell
$ ./run-verilator.sh -instruction src/main/resources/hello.asmbin -time 2000 -vcd dump.vcd
```
Let's look at `run-verilator.sh`. It executes the compiled `VTop` binary and passes `hello.asmbin` and the other arguments through to it.
```bash
#!/bin/sh
if [ ! -f verilog/verilator/obj_dir/VTop ]; then
    echo "Failed to generate Verilog"
    exit 1
fi
verilog/verilator/obj_dir/VTop $*
```
Run `$ gtkwave dump.vcd`
![image](https://hackmd.io/_uploads/SyyMJTCLJx.png)

## PartC : Testing Module
### translate hardware module to verilog
:::spoiler code
```scala=
...
// define how to connect the IO to the datapath
...
object VerilogGenerator extends App {
  emitVerilog(
    new Top(false), // Change the argument here to generate the other Top variant
    Array("--target-dir", "verilog/verilator") // Specify the output directory
  )
}
```
:::
but it fails with the message below
```bash=
[error] stack trace is suppressed; run last Compile / runMain for the full output
[error] (Compile / runMain) circt.stage.phases.Exceptions$FirtoolNonZeroExitCode: firtool returned a non-zero exit code. Note that this version of Chisel (5.0.0) was published against firtool version 1.40.0.
```
The message `Chisel (5.0.0) was published against firtool version 1.40.0.` means that this Chisel version depends on firtool 1.40.0.

Following the [chisel page](https://www.chisel-lang.org/docs/appendix/versioning) && the [firtool 1.40.0 release page](https://github.com/llvm/circt/releases?q=1.40.0&expanded=true), download `firrtl-bin-macos-11.tar.gz`
![image](https://hackmd.io/_uploads/rJtezXfwyx.png)

Choose the version that matches your development environment; here I use the macOS version.
```bash=
tar -xvzf firrtl-bin-macos-11.tar.gz
cd firtool-1.40.0
sudo mv ./bin/firtool /usr/local/bin/firtool
firtool --version
```
If the download succeeded and firtool is on your path, you should see
```bash
~/Dep/firtool-1.40.0 firtool --version
LLVM (http://llvm.org/):
  LLVM version 17.0.0git
  Optimized build.
CIRCT firtool-1.40.0
```
Go back to the project and run `make verilator`, which is defined in the Makefile as
```MakeFile
verilator:
	sbt "runMain board.verilator.VerilogGenerator"
	verilator --cc ./verilog/verilator/Top.sv --exe --Mdir ./verilog/verilator/obj_dir
```
Check `verilog/verilator/obj_dir`
![image](https://hackmd.io/_uploads/HJh9I0xcJx.png)

:::danger
Fix the permissions of the uploaded picture.
:::
### write test case with sim_main.cpp
::: spoiler testcode
```cpp=
#include <verilated.h>
#include <verilated_vcd_c.h>

#include <cstdio>

#include "VTop.h"  // Verilated model header

// Global variables for simulation time
vluint64_t main_time = 0;               // Current simulation time
constexpr vluint64_t sim_time = 10000;  // Maximum simulation duration

// Called by Verilator to get the current simulation time
double sc_time_stamp() { return main_time; }

int main(int argc, char **argv) {
    // Initialize the Verilated model
    Verilated::commandArgs(argc, argv);  // Process command-line arguments
    VTop *top = new VTop;                // Instantiate the module

    // Waveform output
    Verilated::traceEverOn(true);  // Enable tracing
    VerilatedVcdC *vcd = new VerilatedVcdC;
    top->trace(vcd, 99);  // Set trace level
    vcd->open("waveform.vcd");

    // Main simulation loop
    while (!Verilated::gotFinish() && main_time < sim_time) {
        // Toggle the clock signal on every time step (rising/falling edges)
        top->clock = (main_time % 2 == 0);

        // Reset signal (active during the first few time steps)
        top->reset = (main_time < 10);

        // Drive input signals with arbitrary 32-bit placeholder patterns
        if (main_time > 20) {
            top->io_in_ray_origin_x = 0x12345678;  // Simulate input ray origin
            top->io_in_ray_origin_y = 0x87654321;
            top->io_in_ray_origin_z = 0x11112222;
            top->io_in_ray_dir_x = 0x33334444;  // Simulate input ray direction
            top->io_in_ray_dir_y = 0x55556666;
            top->io_in_ray_dir_z = 0x77778888;
            top->io_in_opcode = 2;  // Opcode (test a specific function)
        }

        // Evaluate the simulation step
        top->eval();

        // Print output signals (to be compared with golden results)
        if (main_time > 30) {
            if (top->io_out_valid) {  // Check if output is valid
                printf("Time %llu: Output Valid\n", (unsigned long long)main_time);
                printf("Tmin_out_0 = %u\n", top->io_out_bits_tmin_out_0);
                printf("Tmin_out_1 = %u\n", top->io_out_bits_tmin_out_1);
                // Additional output signals can be checked here
            }
        }

        // Output waveform
        vcd->dump(main_time);

        // Update simulation time
        main_time++;
    }

    // End simulation
    top->final();  // Perform the final operation on the module
    vcd->close();
    delete vcd;
    delete top;
    return 0;
}
```
:::
Run `make verilator_compile`
```MakeFile=
verilator_compile:
	sbt "runMain board.verilator.VerilogGenerator"
	verilator --cc --trace ./verilog/verilator/Top.sv --exe ./verilog/verilator/sim_main.cpp --Mdir ./verilog/verilator/obj_dir
	cd verilog/verilator && make -C obj_dir -f VTop.mk
```
then run `make verilator_run`
```MakeFile=
verilator_run:
	cd verilog/verilator/obj_dir && ./VTop && gtkwave waveform.vcd
```
![image](https://hackmd.io/_uploads/HJCuTnQDyg.png)

### visualize the hierarchy of datapath
To visualize the datapath, we first need to [convert](https://javadoc.io/doc/org.chipsalliance/chisel_2.13/5.0.0/circt/stage/ChiselStage$.html) the Top module into FIRRTL (`.fir`) format.
```scala=
...
object FIRRTLGenerator extends App {
  // Elaborate Top and convert it into a FIRRTL circuit
  val firrtlCircuit: Circuit = convert(new Top(false))

  // Write the serialized FIRRTL to firrtl/Top.fir
  val outputPath = Paths.get("firrtl/Top.fir")
  Files.createDirectories(outputPath.getParent)
  Files.write(outputPath, firrtlCircuit.serialize.getBytes(StandardCharsets.UTF_8))

  println(s"FIRRTL saved to: $outputPath")
}
...
```
```shell
$ sbt "runMain board.verilator.FIRRTLGenerator"
```
Use [diagrammer](https://github.com/freechipsproject/diagrammer), an open-source project that helps developers convert the fir format into SVG. It depends on [graphviz](https://www.graphviz.org/).
```bash=
git clone https://github.com/freechipsproject/diagrammer
cd diagrammer
./diagram.sh -i ~/projects/output/yourDesign.fir
```
![image](https://hackmd.io/_uploads/SJbiahQvJx.png)

### test ray triangle intersection(n = 1000)
```bash=
Output for Test 179: Result = 0
Output for Test 180: Result = 0
Output for Test 181: Result = 0
Output for Test 182: Result = 0
Output for Test 183: Result = 0
Output for Test 184: Result = 0
```
![image](https://hackmd.io/_uploads/SkoBfTRwkx.png)

We can see that the inputs are sent into the pipeline, but the result is still wrong: every test reports false (`io_out_bits_output_triangle_hit == false`). I am still trying to find what went wrong. Possible causes:
1. the input signals are not set up properly
2. the clock cycles are not set up properly

## PartD : proposal implementation
A ray tracer usually does not need to compute every ray in the scene, so we need to consider the positions of the `camera`/`ray`/`object`. A cross product operation is therefore needed to determine the camera coordinate frame. Reflection calculations also rely on it, for example to obtain a triangle's surface normal.

$$
u \times v=
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
u_x & u_y & u_z \\
v_x & v_y & v_z
\end{vmatrix}
$$

which can be expressed as

$$
u \times v = \big(u_y v_z - u_z v_y\big)\,\mathbf{i} - \big(u_x v_z - u_z v_x\big)\,\mathbf{j} + \big(u_x v_y - u_y v_x\big)\,\mathbf{k}
$$

This is not much different from the dot product, since both are built from multiplications followed by additions or subtractions, but the project does not seem to handle it yet.

## Reference
1. https://github.com/purdue-aalp/raytracer
2. [吃透Chisel语言.10.Chisel项目构建、运行和测试(二)——Chisel中生成Verilog代码&Chisel开发流程](https://blog.csdn.net/weixin_43681766/article/details/125582801)
3. [Verilator User’s Guide](https://verilator.org/guide/latest/)
4. Berkeley HardFloat
    * GitHub - [berkeley-hardfloat](https://github.com/ucb-bar/berkeley-hardfloat)
    * [Berkeley HardFloat](https://www.jhauser.us/arithmetic/HardFloat.html)
5. [Collision detection (AABB) wiki](https://en.wikipedia.org/wiki/Collision_detection)
6. [Hello Verilator—高品質&開源的 SystemVerilog(Verilog) 模擬器介紹&教學(一)](https://ys-hayashi.me/2020/12/verilator/)
7. https://github.com/freechipsproject/diagrammer
8. https://www.graphviz.org/
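
## Appendix : software reference for the cross product
As a supplement to PartD, here is a minimal C++ sketch of a software reference ("golden") model for the cross product, of the kind a `sim_main.cpp` testbench could compare hardware outputs against. Everything in it is an assumption made for illustration: `Vec3`, `cross`, `dot`, and `float_to_bits` are hypothetical names that do not exist in the raytracer repository, and `float_to_bits` assumes the datapath ports take IEEE-754 binary32 bit patterns, which should be checked against the wrapper's IO definition before use.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical 3-component vector type (x, y, z); not taken from the repository.
using Vec3 = std::array<float, 3>;

// Cross product u x v, written out component by component.
inline Vec3 cross(const Vec3 &u, const Vec3 &v) {
    return Vec3{u[1] * v[2] - u[2] * v[1],   // x component
                u[2] * v[0] - u[0] * v[2],   // y component
                u[0] * v[1] - u[1] * v[0]};  // z component
}

// Dot product, for comparison: three multiplies reduced to a single scalar.
inline float dot(const Vec3 &u, const Vec3 &v) {
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

// Reinterpret a float as its raw 32-bit pattern, e.g. to drive a 32-bit port.
// Assumption: the port expects IEEE-754 binary32 encoding.
inline std::uint32_t float_to_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}
```

The cross product needs six multiplies and three subtractions, compared with three multiplies and two additions for the dot product. A quick sanity check for any implementation is that `cross(u, v)` is orthogonal to both inputs, so `dot(cross(u, v), u)` should be zero up to rounding.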