
Hardware Ray Tracer Datapath

傅信豪

Contributed by Shin-Hao Fu

Development Objectives of this Project

My goal is to analyze the ray tracer. This project has already done the fundamental work of supporting ray tracing calculations in hardware. A ray tracer relies heavily on dot products, cosine similarity, and tests of whether a ray intersects an object such as a sphere or a cube.
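To make these operations concrete, here is a minimal software sketch of the dot product, Euclidean distance, and cosine similarity on float32 vectors. It is written in Scala like the project's golden model, but the object name `VecRef` is hypothetical and this is not the project's code. The hardware may round differently from the JVM's `Float` arithmetic, so comparisons against it should allow a small tolerance.

```scala
// Hypothetical software reference for the vector operations the datapath accelerates.
// Uses Float (IEEE-754 binary32) to mirror a 32-bit datapath.
object VecRef {
  /** Dot product: sum of element-wise products. */
  def dot(a: Seq[Float], b: Seq[Float]): Float = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  /** Euclidean distance: sqrt(sum((a_i - b_i)^2)). */
  def euclidean(a: Seq[Float], b: Seq[Float]): Float = {
    require(a.length == b.length, "vectors must have the same length")
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum.toDouble).toFloat
  }

  /** Cosine similarity (angular measure): dot(a, b) / (|a| * |b|). */
  def cosineSimilarity(a: Seq[Float], b: Seq[Float]): Float = {
    val norms = math.sqrt(dot(a, a).toDouble) * math.sqrt(dot(b, b).toDouble)
    require(norms > 0.0, "cosine similarity is undefined for a zero vector")
    (dot(a, b) / norms).toFloat
  }
}
```

For example, `VecRef.dot(Seq(1f, 2f, 3f), Seq(4f, 5f, 6f))` is `32.0f`.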

Prerequisite

Test top module

$ sbt "test:testOnly raytracer_datapath.Datapath_test"

but it fails with an out-of-memory error:

java.lang.OutOfMemoryError: Java heap space
$ sbt test:testOnly raytracer_datapath.Datapath_test

[info] welcome to sbt 1.8.0 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project raytracer-build from plugins.sbt ...
[info] loading project definition from /Users/shinhaofu/CS/raytracer/project
[info] loading settings for project root from build.sbt ...
[info] set current project to raytracer (in build file:/Users/shinhaofu/CS/raytracer/)
[warn] sbt 0.13 shell syntax is deprecated; use slash syntax instead: Test / testOnly
[info] compiling 28 Scala sources to /Users/shinhaofu/CS/raytracer/target/scala-2.13/classes ...
https://repo1.maven.org/maven2/org/scala-sbt/compiler-bridge_2.13/1.8.0/compiler-bridge_2.13-1.8.0.pom
  100.0% [##########] 2.7 KiB (2.7 KiB / s)
[info] Non-compiled module 'compiler-bridge_2.13' for Scala 2.13.8. Compiling...
[info]   Compilation completed in 4.053s.
[warn] 'genBundleElements' is now default behavior, you can remove the scalacOption.
[info] compiling 8 Scala sources to /Users/shinhaofu/CS/raytracer/target/scala-2.13/test-classes ...
[warn] 'genBundleElements' is now default behavior, you can remove the scalacOption.
datapath does not support euclidean!
[info] FNFlipSign_test:
[info] - FNFlipSign correctly changes sign of float32 vlaues
[info] - FNFlipSign handles pos/neg zero correctly
[info] SkidBufferStageTest:
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when both ends are free
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when emit is congested
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when intake is throttled
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when intake is throttled and emit is congested
[info] - skidbuffer stage should pass through squared values from 0 to 1000 when intake is throttled and emit is congested
[info] - chained skidbuffer stages should pass through correct values from 0 to 1000 when intake is throttled and emit is congested
[info] - chained generalized skid buffer stages should function correctly when intake is throttled and emit is congested
[warn] In the last 10 seconds, 5.027 (51.5%) were spent in GC. [Heap: 0.00GB free of 1.00GB, max 1.00GB] Consider increasing the JVM heap using `-Xmx` or try a different collector, e.g. `-XX:+UseG1GC`, for better performance.

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "classloader-cache-cleanup-0"
  | => raytracer_datapath.Datapath_test 32s
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-7864"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "process reaper"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sbt-progress-report-scheduler"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "make -C /Users/shinhaofu/CS/raytracer/cached_verilator_backend/euclideanfalse/verilated -j 8 -f VUnifiedDatapath_wrapper.mk VUnifiedDatapath_wrapper stdout thread"
java.lang.NullPointerException
        at com.swoval.files.ApplePathWatcher$1.accept(ApplePathWatcher.java:267)
        at com.swoval.files.ApplePathWatcher$1.accept(ApplePathWatcher.java:261)
        at com.swoval.files.apple.FileEventMonitorImpl$WrappedConsumer$1.run(FileEventMonitors.java:178)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
[info] Datapath_test:
[info] QuadSortRecFNTest:
[info] raytracer_datapath.Datapath_test *** ABORTED ***
[info] raytracer_datapath.QuadSortRecFNTest *** ABORTED ***
[info]   java.lang.OutOfMemoryError: Java heap space
[info]   ...
[info]   java.lang.OutOfMemoryError: Java heap space
[info]   ...
java.lang.OutOfMemoryError: Java heap space
[error] [launcher] error during sbt launcher: java.lang.OutOfMemoryError: Java heap space
Exception in thread "com.swoval.files.apple.FileEventsMonitor.runloop" java.lang.OutOfMemoryError: Java heap space

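Explanation

The JVM running sbt and the tests ran out of heap memory. The GC warning above reports a maximum heap of 1 GB, which is not enough to run this test.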

Solution

Give the sbt JVM a larger heap (4 GB here) with -J-Xmx4G. To run all test benches:

$ sbt -J-Xmx4G test
[info] Run completed in 31 seconds, 188 milliseconds.
[info] Total number of tests run: 20
[info] Suites: completed 5, aborted 0
[info] Tests: succeeded 20, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

To run a single test bench:

$ sbt -J-Xmx4G "test:testOnly raytracer_datapath.Datapath_test"
[info] Run completed in 21 seconds, 177 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Part A: Study the documentation and source code

review test code

We use Datapath_test.scala as an example. It declares the test class:

class Datapath_test extends AnyFreeSpec with ChiselScalatestTester

It sets up tests for several features. Each test provides expected answers and a test function such as testUnifiedIntersection, which drives several hardware units to produce a result.

Because some tests involve several hardware units to produce one result, the expected answers come from object RaytracerGold, a software implementation written in Scala that serves as the golden model for the test bench. The tests include:

  • Ray-Box Intersection test
  • Ray-Boxes Intersection test
  • Ray-Triangle Intersection test
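A common way to compute the ray-box test is the slab method (see the AABB collision detection reference): intersect the ray with the two planes bounding the box on each axis and check that the three parameter intervals overlap. The sketch below is a hypothetical golden model in that style; `Vec3`, `Ray`, `Box`, and `RayBoxRef` are illustrative names, not the project's `SW_Ray`/`SW_Box` types or the RaytracerGold API.

```scala
// Hypothetical golden model of a ray vs. axis-aligned box (AABB) test using the slab method.
final case class Vec3(x: Float, y: Float, z: Float)
final case class Ray(origin: Vec3, dir: Vec3)
final case class Box(min: Vec3, max: Vec3)

object RayBoxRef {
  /** Returns Some(tEntry) if the ray hits the box within [0, tMax], else None.
    * tEntry is clamped to 0 when the origin is already inside the box.
    * A zero direction component gives an infinite inverse; the corner case of an
    * origin lying exactly on a slab plane then yields NaN and is reported as a miss. */
  def intersect(ray: Ray, box: Box, tMax: Float = Float.PositiveInfinity): Option[Float] = {
    // Per axis: (origin, inverse direction, slab min, slab max).
    val axes = Seq(
      (ray.origin.x, 1f / ray.dir.x, box.min.x, box.max.x),
      (ray.origin.y, 1f / ray.dir.y, box.min.y, box.max.y),
      (ray.origin.z, 1f / ray.dir.z, box.min.z, box.max.z)
    )
    var tNear = 0f
    var tFar = tMax
    for ((o, inv, lo, hi) <- axes) {
      // Ray parameters at which it crosses the two planes bounding this slab.
      val t0 = (lo - o) * inv
      val t1 = (hi - o) * inv
      tNear = math.max(tNear, math.min(t0, t1)) // latest entry over all slabs
      tFar = math.min(tFar, math.max(t0, t1))   // earliest exit over all slabs
    }
    if (tNear <= tFar) Some(tNear) else None
  }
}
```

A multi-box test could run this check once per box and then order the hits by `tEntry`.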

test bench


1. FNFlipSign_test.scala

Validates that FNFlipSign correctly flips the sign of float32 values, including positive and negative zero.
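Flipping the sign of an IEEE-754 float32 only requires toggling bit 31. The sketch below models that in software, assuming FNFlipSign behaves this way; `FlipSignRef` is a hypothetical name. Working on the raw bits maps +0.0 to -0.0 and back, whereas computing `0.0f - x` would turn +0.0 into +0.0, which is the kind of zero case the "handles pos/neg zero correctly" test targets.

```scala
// Hypothetical software model of a float32 sign flip (not the project's FNFlipSign RTL):
// toggle bit 31 and leave the exponent and mantissa bits untouched.
object FlipSignRef {
  private val SignBit: Int = Int.MinValue // 0x80000000

  def flipSign(x: Float): Float =
    java.lang.Float.intBitsToFloat(java.lang.Float.floatToRawIntBits(x) ^ SignBit)
}
```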

2. RecFNCCompareSelectTest.scala

Validates compare-and-select on 33-bit floating-point values. This format (Berkeley HardFloat's recoded format, recFN, in which a float32 takes 33 bits) is used for calculations throughout the pipeline.

3. QuadSortRecFNTest.scala

Tests quadsort/quadsortRec, which is used in stage 10 of the unified datapath when determining whether a ray intersects boxes.
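Sorting four values maps well to hardware as a fixed network of compare-exchange units. The sketch below uses the standard five-comparator, three-level network for four inputs and sorts (distance, box index) pairs, presumably how box hits would be ordered nearest-first. Whether QuadSortRecFN uses this exact network is an assumption, and `QuadSortRef` is a hypothetical name. A miss can be encoded as `Float.PositiveInfinity` so it sorts last.

```scala
// Hypothetical reference for sorting four hit distances with a fixed comparator network.
object QuadSortRef {
  /** Sorts four (distance, boxIndex) pairs by ascending distance. */
  def sort4(in: Seq[(Float, Int)]): Seq[(Float, Int)] = {
    require(in.length == 4, "sort4 expects exactly four entries")
    val a = in.toArray
    // Compare-exchange: afterwards a(i) holds the smaller distance of the two.
    def cx(i: Int, j: Int): Unit =
      if (a(j)._1 < a(i)._1) { val t = a(i); a(i) = a(j); a(j) = t }
    cx(0, 1); cx(2, 3) // level 1: sort each pair
    cx(0, 2); cx(1, 3) // level 2: global minimum lands in 0, global maximum in 3
    cx(1, 2)           // level 3: order the middle two
    a.toSeq
  }
}
```

Each level's comparators are independent, so in hardware they can run in parallel, giving a three-level critical path.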

4. Datapath_test.scala

This is the main test of the whole system; every feature test calls:

def testUnifiedIntersection(
  extended: Boolean,
  description: String,
  ray_seq: Seq[SW_Ray],
  box_seq_seq: Seq[Seq[SW_Box]],
  triangle_seq: Seq[SW_Triangle],
  op_seq: Seq[SW_Opcode]
) {
  ...
  test(gen_baseline_or_extended_datapath(extended))
}

Let's look at gen_baseline_or_extended_datapath(extended). The extended flag determines which datapath wrapper is instantiated:

def gen_baseline_or_extended_datapath(extended: Boolean) = extended match {
  case true  => new UnifiedDatapath_wrapper_16
  case false => new UnifiedDatapath_wrapper
}

UnifiedDatapath is the top-level module of the ray tracer datapath. There are two types of wrapper: UnifiedDatapath_wrapper_16 and UnifiedDatapath_wrapper.


5. SkidBufferStageTest.scala

According to the documentation at https://github.com/purdue-aalp/raytracer/releases/tag/v1:

This allows the Ray Tracer Datapath to be integrated into a larger datapath and propagate back pressure via its valid-ready interfaces. A drawback of this design is increased hardware overhead since each skid buffer contains two registers.

In other words, the design costs extra registers, but its valid-ready handshake guarantees that data flows safely: nothing is dropped or duplicated when the downstream stage applies back pressure.
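The back-pressure behavior can be checked with a small cycle-level software model. The sketch below is a hypothetical model of a skid buffer with the two registers mentioned in the quote (a main register and a skid register), not the project's RTL. `in_ready` depends only on whether the skid register is empty, so it never waits combinationally on `out_ready`. `SkidBufferModel` and `SkidBufferDemo` are illustrative names; the demo mirrors the spirit of SkidBufferStageTest by pushing the values `0 until 1000` through with random input throttling and output congestion, then checking that they arrive complete and in order.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical cycle-level model of a two-register skid buffer.
final class SkidBufferModel[T] {
  private var mainReg: Option[T] = None // drives out_valid / out_bits
  private var skidReg: Option[T] = None // holds one item while the output is stalled

  // Ready depends only on registered state, never combinationally on outReady.
  def inReady: Boolean = skidReg.isEmpty
  def outValid: Boolean = mainReg.isDefined

  /** One rising clock edge. Returns the item handed downstream this cycle, if any. */
  def step(in: Option[T], outReady: Boolean): Option[T] = {
    val accepted = if (inReady) in else None    // input handshake fires
    val emitted = if (outReady) mainReg else None // output handshake fires
    if (mainReg.isEmpty || outReady) {
      // Main register is free this cycle: refill from skid first, else from the input.
      if (skidReg.isDefined) { mainReg = skidReg; skidReg = None }
      else mainReg = accepted
    } else if (accepted.isDefined) {
      // Output stalled but an input was accepted: park it in the skid register.
      skidReg = accepted
    }
    emitted
  }
}

object SkidBufferDemo {
  /** Pushes `values` through the model with random input throttling and output congestion. */
  def run(values: Seq[Int], seed: Long = 42L): Seq[Int] = {
    val rng = new Random(seed)
    val buf = new SkidBufferModel[Int]
    val received = ArrayBuffer.empty[Int]
    var next = 0
    var cycles = 0
    val maxCycles = 100 * (values.length + 1) // safety bound for the loop
    while (received.length < values.length && cycles < maxCycles) {
      val offer = if (next < values.length && rng.nextBoolean()) Some(values(next)) else None
      val fires = offer.isDefined && buf.inReady // sampled before the clock edge
      buf.step(offer, outReady = rng.nextBoolean()).foreach(received += _)
      if (fires) next += 1
      cycles += 1
    }
    received.toSeq
  }
}
```

The cost the documentation mentions is visible here: two storage registers per stage instead of one.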

Modules


waveform

After running the test cases (written in Scala), the waveform results are stored in raytracer/cached_verilator_backend:

$ cd raytracer/cached_verilator_backend/euclideanfalse

Then inspect the waveform:

$ gtkwave UnifiedDatapath_wrapper.vcd 

To keep the results of different tests apart, I changed the output directory of UnifiedDatapath_wrapper.vcd for each test:

$ tree -d -L 1
.
├── angular
├── baseline_ray_triangle_random
├── euclidean
├── extended_ray_box_random
├── extended_ray_triangle_random
└── ray_box_random

7 directories

With the 32-bit datapath enabled, the tests include euclidean, angular (a.k.a. cosine similarity), ray-box intersection, and ray-triangle intersection. 100 random test cases are fed to each test sequentially.


Only the euclidean test enabled, with the 16-bit datapath.


Part B: Get familiar with the Chisel development toolchain

overview of the Chisel toolchain

An overview of the toolchain for Chisel development:

[Figure: overview of the Chisel development toolchain]

Figure cited from "Verification of Chisel Hardware Designs with ChiselVerify".

It shows that a C++ testbench (sim_main.cpp) can be combined with our hardware implementation: the Chisel design is emitted as Verilog, Verilator translates the Verilog into C++, and the result is compiled together with sim_main.cpp into an executable.


review ca2023_lab3

Run:

$ make verilator

which is defined in the Makefile as:

...

verilator:
	sbt "runMain board.verilator.VerilogGenerator"
	cd verilog/verilator && verilator --trace --exe --cc sim_main.cpp Top.v && make -C obj_dir -f VTop.mk

...

Then run:

$ ./run-verilator.sh -instruction src/main/resources/hello.asmbin -time 2000 -vcd dump.vcd

Let's look at run-verilator.sh. It checks that verilog/verilator/obj_dir/VTop exists, then runs it with all of the given arguments (hello.asmbin and the others):

#!/bin/sh

if [ ! -f verilog/verilator/obj_dir/VTop ]; then
    echo "Failed to generate Verilog"
    exit 1
fi

verilog/verilator/obj_dir/VTop $*

Then open the waveform:

$ gtkwave dump.vcd


Part C: Testing the module

translate the hardware module to Verilog

...
// define how to connect the IO to the datapath
...
object VerilogGenerator extends App {
  emitVerilog(
    new Top(false), // Top's Boolean parameter; change it here to generate the other configuration
    Array("--target-dir", "verilog/verilator") // Specify the output directory
  )
}

but it fails with the error below:

[error] stack trace is suppressed; run last Compile / runMain for the full output
[error] (Compile / runMain) circt.stage.phases.Exceptions$FirtoolNonZeroExitCode: firtool returned a non-zero exit code. Note that this version of Chisel (5.0.0) was published against firtool version 1.40.0.

The message says this version of Chisel (5.0.0) was published against firtool 1.40.0, so the build needs firtool 1.40.0 to be installed.

Following the Chisel page and the firtool 1.40.0 release page, download firrtl-bin-macos-11.tar.gz.


Choose the build that matches your development environment; here I use the macOS version:

$ tar -xvzf firrtl-bin-macos-11.tar.gz
$ cd firtool-1.40.0
$ sudo mv ./bin/firtool /usr/local/bin/firtool
$ firtool --version

If the download succeeded and firtool is on your PATH, it prints:

$ firtool --version
  LLVM (http://llvm.org/):
  LLVM version 17.0.0git
  Optimized build.
  CIRCT firtool-1.40.0

Go back to the project and run make verilator, defined in the Makefile as:

verilator:
	sbt "runMain board.verilator.VerilogGenerator"
	verilator --cc ./verilog/verilator/Top.sv --exe --Mdir ./verilog/verilator/obj_dir

Then check the generated files in verilog/verilator/obj_dir.


Write a test case in sim_main.cpp:

#include <verilated.h>
#include <verilated_vcd_c.h>

#include <cstdio>

#include "VTop.h"  // Verilated model header

// Global variables for simulation time
vluint64_t main_time = 0;               // Current simulation time (one tick = half a clock period)
constexpr vluint64_t sim_time = 10000;  // Maximum simulation duration

// Called by Verilator to obtain the current simulation time
double sc_time_stamp() { return main_time; }

int main(int argc, char **argv) {
  // Initialize the Verilated model
  Verilated::commandArgs(argc, argv);  // Process command-line arguments
  VTop *top = new VTop;                // Instantiate the module

  // Waveform output
  Verilated::traceEverOn(true);  // Enable tracing
  VerilatedVcdC *vcd = new VerilatedVcdC;
  top->trace(vcd, 99);           // Trace up to 99 levels of hierarchy
  vcd->open("waveform.vcd");

  // Main simulation loop
  while (!Verilated::gotFinish() && main_time < sim_time) {
    // Toggle the clock: high on even ticks, low on odd ticks
    top->clock = (main_time % 2 == 0);
    // Hold reset high for the first few ticks
    top->reset = (main_time < 10);

    // Drive input signals
    if (main_time > 20) {
      top->io_in_ray_origin_x = 0x12345678;  // Simulated input ray origin (raw float32 bits)
      top->io_in_ray_origin_y = 0x87654321;
      top->io_in_ray_origin_z = 0x11112222;
      top->io_in_ray_dir_x = 0x33334444;     // Simulated input ray direction (raw float32 bits)
      top->io_in_ray_dir_y = 0x55556666;
      top->io_in_ray_dir_z = 0x77778888;
      top->io_in_opcode = 2;                 // Opcode (selects the function under test)
      // Note: if Top uses a valid/ready interface, its input-valid and output-ready
      // ports must be driven as well; check VTop.h for the exact port names.
    }

    top->eval();  // Evaluate the design for this time step

    // Check output signals (compare with golden results)
    if (main_time > 30 && top->io_out_valid) {
      std::printf("Time %llu: Output Valid\n", (unsigned long long)main_time);
      std::printf("Tmin_out_0 = %llu\n", (unsigned long long)top->io_out_bits_tmin_out_0);
      std::printf("Tmin_out_1 = %llu\n", (unsigned long long)top->io_out_bits_tmin_out_1);
      // Additional output signals can be checked here
    }

    vcd->dump(main_time);  // Record this time step in the waveform
    main_time++;           // Advance simulation time
  }

  // End simulation
  top->final();  // Perform the final operation on the module
  vcd->close();
  delete vcd;
  delete top;
  return 0;
}

Run make verilator_compile, defined as:

verilator_compile:
	sbt "runMain board.verilator.VerilogGenerator"
	verilator --cc --trace ./verilog/verilator/Top.sv --exe ./verilog/verilator/sim_main.cpp --Mdir ./verilog/verilator/obj_dir
	cd verilog/verilator && make -C obj_dir -f VTop.mk

Then run make verilator_run:

verilator_run:
	cd verilog/verilator/obj_dir && ./VTop && gtkwave waveform.vcd


visualize the hierarchy of the datapath

To visualize the datapath, first convert the Top module into FIRRTL (.fir) format.

...
object FIRRTLGenerator extends App {
  // convert: elaborate Top and generate the FIRRTL circuit
  val firrtlCircuit: Circuit = convert(new Top(false))
  // write out the FIRRTL result
  val outputPath = Paths.get("firrtl/Top.fir")
  Files.createDirectories(outputPath.getParent)
  Files.write(outputPath, firrtlCircuit.serialize.getBytes(StandardCharsets.UTF_8))
  println(s"FIRRTL saved to: $outputPath")
}
...
$ sbt "runMain board.verilator.FIRRTLGenerator"

Use diagrammer, an open-source project that converts a .fir file into an SVG diagram. It depends on Graphviz.

$ git clone https://github.com/freechipsproject/diagrammer
$ cd diagrammer
$ ./diagram.sh -i ~/projects/output/yourDesign.fir


test ray-triangle intersection (n = 1000)

Output for Test 179: Result = 0
Output for Test 180: Result = 0
Output for Test 181: Result = 0
Output for Test 182: Result = 0
Output for Test 183: Result = 0
Output for Test 184: Result = 0


We can see that the input is sent into the pipeline, but something is still wrong: every result is false, i.e.

io_out_bits_output_triangle_hit == false

Possible causes to check:

  1. The input signals are not set up properly.
  2. The clock cycles are not set up properly.
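When every result comes back false, it helps to compute the expected answer on the host for the exact inputs being driven. Below is a minimal sketch using the Möller-Trumbore ray-triangle algorithm, which is not necessarily the algorithm the hardware implements; `Vec3` and `RayTriangleRef` are hypothetical names. It also points at a likely cause: the sim_main.cpp testbench only drives the ray origin, ray direction, and opcode. If the triangle vertex inputs are never set, they typically stay at zero in a Verilator simulation (unless randomized initialization is enabled), which describes a degenerate triangle that no ray can hit, so all-false results would be expected. The driven ray values are also arbitrary bit patterns; decoding them with `java.lang.Float.intBitsToFloat` shows what ray they actually describe.

```scala
// Hypothetical host-side reference for ray-triangle hits (Möller-Trumbore algorithm).
final case class Vec3(x: Float, y: Float, z: Float) {
  def -(o: Vec3): Vec3 = Vec3(x - o.x, y - o.y, z - o.z)
  def dot(o: Vec3): Float = x * o.x + y * o.y + z * o.z
  def cross(o: Vec3): Vec3 = Vec3(y * o.z - z * o.y, z * o.x - x * o.z, x * o.y - y * o.x)
}

object RayTriangleRef {
  private val Eps = 1e-7f

  /** Returns Some(t) for a hit at origin + t * dir with t > Eps, else None. */
  def intersect(origin: Vec3, dir: Vec3, v0: Vec3, v1: Vec3, v2: Vec3): Option[Float] = {
    val e1 = v1 - v0
    val e2 = v2 - v0
    val p = dir.cross(e2)
    val det = e1.dot(p)
    if (math.abs(det) < Eps) return None // parallel ray or degenerate (zero-area) triangle
    val invDet = 1f / det
    val s = origin - v0
    val u = s.dot(p) * invDet             // first barycentric coordinate
    if (u < 0f || u > 1f) return None
    val q = s.cross(e1)
    val v = dir.dot(q) * invDet           // second barycentric coordinate
    if (v < 0f || u + v > 1f) return None
    val t = e2.dot(q) * invDet            // distance along the ray
    if (t > Eps) Some(t) else None
  }
}
```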

Part D: Proposed implementation

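In a ray tracer we usually do not need to compute every ray in the scene, so we must take the positions of the camera, rays, and objects into account. Setting up the camera coordinate system therefore needs a cross product. Cross products also show up when computing reflections, for example to build a local frame around the surface normal for sampling reflected directions, although the mirror direction r = d - 2(d·n)n by itself needs only dot products.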

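$$
\mathbf{u} \times \mathbf{v} =
\begin{vmatrix}
\hat{\mathbf{i}} & \hat{\mathbf{j}} & \hat{\mathbf{k}} \\
u_x & u_y & u_z \\
v_x & v_y & v_z
\end{vmatrix}
$$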

which expands to

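$$
\mathbf{u} \times \mathbf{v} = (u_y v_z - u_z v_y)\,\hat{\mathbf{i}} + (u_z v_x - u_x v_z)\,\hat{\mathbf{j}} + (u_x v_y - u_y v_x)\,\hat{\mathbf{k}}
$$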

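The cross product is not much more complex than a dot product (six multiplications and three subtractions, versus three multiplications and two additions), but the project does not appear to implement it.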
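The cross-product formula translates directly into code. The sketch below (hypothetical names `Vec3` and `CameraBasis`, not from the project) implements it and uses it for the camera case described above: building an orthonormal camera basis from a camera position, a target point, and a world up vector.

```scala
// Hypothetical software reference: cross product plus a look-at camera basis built from it.
final case class Vec3(x: Float, y: Float, z: Float) {
  def -(o: Vec3): Vec3 = Vec3(x - o.x, y - o.y, z - o.z)
  def *(s: Float): Vec3 = Vec3(x * s, y * s, z * s)
  def dot(o: Vec3): Float = x * o.x + y * o.y + z * o.z
  // u x v = (u_y v_z - u_z v_y, u_z v_x - u_x v_z, u_x v_y - u_y v_x)
  def cross(o: Vec3): Vec3 = Vec3(y * o.z - z * o.y, z * o.x - x * o.z, x * o.y - y * o.x)
  def normalized: Vec3 = this * (1f / math.sqrt(dot(this).toDouble).toFloat)
}

object CameraBasis {
  /** Returns (right, up, forward) unit vectors for a camera at `eye` looking at `target`. */
  def lookAt(eye: Vec3, target: Vec3, worldUp: Vec3): (Vec3, Vec3, Vec3) = {
    val forward = (target - eye).normalized
    val right = forward.cross(worldUp).normalized // perpendicular to view direction and world up
    val up = right.cross(forward)                 // already unit length: right and forward are orthonormal
    (right, up, forward)
  }
}
```

For a camera at the origin looking down -z with +y as world up, `lookAt` returns right = +x, up = +y, forward = -z.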

Reference

  1. https://github.com/purdue-aalp/raytracer
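  2. 吃透Chisel语言.10.Chisel项目构建、运行和测试(二)——Chisel中生成Verilog代码&Chisel开发流程 (in Chinese; "Mastering Chisel, Part 10: Building, Running, and Testing Chisel Projects (2): Generating Verilog from Chisel and the Chisel Development Flow")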
  3. Verilator User’s Guide
  4. Berkeley HardFloat
  5. Collision detection (AABB) wiki
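  6. Hello Verilator—高品質&開源的 SystemVerilog(Verilog) 模擬器介紹&教學(一) (in Chinese; "Hello Verilator: Introduction and Tutorial for a High-Quality, Open-Source SystemVerilog (Verilog) Simulator, Part 1")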
  7. https://github.com/freechipsproject/diagrammer
  8. https://www.graphviz.org/