傅信豪
Contributed by Shin-Hao Fu
My goal is to analyze the raytracer project, which has already finished some of the fundamental work needed to support ray tracing calculations. A ray tracer relies heavily on dot products, cosine similarity calculations, and tests of whether a ray intersects an object, such as a circle, a sphere, a cub… .
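To make these three operations concrete, here is a minimal software sketch in C++. It is not the project's hardware implementation: the names `Vec3`, `dot`, `cosine_similarity`, and `ray_hits_box` are hypothetical, the box is assumed to be axis-aligned, and the intersection check uses the common slab method, which may differ from what the datapath does internally.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <limits>

using Vec3 = std::array<float, 3>;  // hypothetical helper type: (x, y, z)

// Dot product of two 3-vectors.
float dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Assumes neither vector is zero.
float cosine_similarity(const Vec3& a, const Vec3& b) {
    return dot(a, b) / (std::sqrt(dot(a, a)) * std::sqrt(dot(b, b)));
}

// Slab test: does the ray origin + t * dir (t >= 0) hit the axis-aligned
// box spanned by the corners lo and hi? Assumes every component of dir is
// non-zero; a zero component would need IEEE 754 infinities or a special case.
bool ray_hits_box(const Vec3& origin, const Vec3& dir,
                  const Vec3& lo, const Vec3& hi) {
    float tmin = 0.0f;
    float tmax = std::numeric_limits<float>::infinity();
    for (int i = 0; i < 3; ++i) {
        float t0 = (lo[i] - origin[i]) / dir[i];  // entry into slab i
        float t1 = (hi[i] - origin[i]) / dir[i];  // exit from slab i
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);  // latest entry over all slabs
        tmax = std::min(tmax, t1);  // earliest exit over all slabs
    }
    return tmin <= tmax;  // the three intervals overlap
}
```

The intersection test only needs subtractions, divisions (or multiplications by a precomputed reciprocal), and min/max comparisons, which is why it maps well onto a pipelined datapath.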
I first tried to run the datapath test:
$ sbt "test:testOnly raytracer_datapath.Datapath_test"
but it fails with the following error:
~/CS/raytracer stable sbt test:testOnly raytracer_datapath.Datapath_test 127 err 10:51:56 PM
[info] welcome to sbt 1.8.0 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project raytracer-build from plugins.sbt ...
[info] loading project definition from /Users/shinhaofu/CS/raytracer/project
[info] loading settings for project root from build.sbt ...
[info] set current project to raytracer (in build file:/Users/shinhaofu/CS/raytracer/)
[warn] sbt 0.13 shell syntax is deprecated; use slash syntax instead: Test / testOnly
[info] compiling 28 Scala sources to /Users/shinhaofu/CS/raytracer/target/scala-2.13/classes ...
https://repo1.maven.org/maven2/org/scala-sbt/compiler-bridge_2.13/1.8.0/compiler-bridge_2.13-1.8.0.pom
100.0% [##########] 2.7 KiB (2.7 KiB / s)
[info] Non-compiled module 'compiler-bridge_2.13' for Scala 2.13.8. Compiling...
[info] Compilation completed in 4.053s.
[warn] 'genBundleElements' is now default behavior, you can remove the scalacOption.
[info] compiling 8 Scala sources to /Users/shinhaofu/CS/raytracer/target/scala-2.13/test-classes ...
[warn] 'genBundleElements' is now default behavior, you can remove the scalacOption.
datapath does not support euclidean!
[info] FNFlipSign_test:
[info] - FNFlipSign correctly changes sign of float32 vlaues
[info] - FNFlipSign handles pos/neg zero correctly
[info] SkidBufferStageTest:
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when both ends are free
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when emit is congested
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when intake is throttled
[info] - skidbuffer stage should pass through identical values from 0 to 1000 when intake is throttled and emit is congested
[info] - skidbuffer stage should pass through squared values from 0 to 1000 when intake is throttled and emit is congested
[info] - chained skidbuffer stages should pass through correct values from 0 to 1000 when intake is throttled and emit is congested
[info] - chained generalized skid buffer stages should function correctly when intake is throttled and emit is congested
[warn] In the last 10 seconds, 5.027 (51.5%) were spent in GC. [Heap: 0.00GB free of 1.00GB, max 1.00GB] Consider increasing the JVM heap using `-Xmx` or try a different collector, e.g. `-XX:+UseG1GC`, for better performance.
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "classloader-cache-cleanup-0"
| => raytracer_datapath.Datapath_test 32s
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-7864"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "process reaper"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sbt-progress-report-scheduler"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "make -C /Users/shinhaofu/CS/raytracer/cached_verilator_backend/euclideanfalse/verilated -j 8 -f VUnifiedDatapath_wrapper.mk VUnifiedDatapath_wrapper stdout thread"
java.lang.NullPointerException
at com.swoval.files.ApplePathWatcher$1.accept(ApplePathWatcher.java:267)
at com.swoval.files.ApplePathWatcher$1.accept(ApplePathWatcher.java:261)
at com.swoval.files.apple.FileEventMonitorImpl$WrappedConsumer$1.run(FileEventMonitors.java:178)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
[info] Datapath_test:
[info] QuadSortRecFNTest:
[info] raytracer_datapath.Datapath_test *** ABORTED ***
[info] raytracer_datapath.QuadSortRecFNTest *** ABORTED ***
[info] java.lang.OutOfMemoryError: Java heap space
[info] ...
[info] java.lang.OutOfMemoryError: Java heap space
[info] ...
java.lang.OutOfMemoryError: Java heap space
[error] [launcher] error during sbt launcher: java.lang.OutOfMemoryError: Java heap space
Exception in thread "com.swoval.files.apple.FileEventsMonitor.runloop" java.lang.OutOfMemoryError: Java heap space
The log shows that the JVM heap (1 GB maximum here) is not large enough to run this test, so the heap limit has to be raised with -J-Xmx.
Run all testbenches:
$ sbt -J-Xmx4G test
[info] Run completed in 31 seconds, 188 milliseconds.
[info] Total number of tests run: 20
[info] Suites: completed 5, aborted 0
[info] Tests: succeeded 20, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
Run a single testbench:
$ sbt -J-Xmx4G "test:testOnly raytracer_datapath.Datapath_test"
[info] Run completed in 21 seconds, 177 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
We use Datapath_test.scala as an example. In Datapath_test.scala:
class Datapath_test extends AnyFreeSpec with ChiselScalatestTester
It first sets up tests for several functionalities. Each test module has an expected answer and a test function such as testUnifiedIntersection, which uses several hardware units to achieve its goal.
Because some tests involve several hardware units to produce a result, object RaytracerGold is written in Scala as a software reference implementation for the testbench.
Validate the correctness of the floating-point operations.
Validate the correctness of the 33-bit floating-point operations. This floating-point type is the main one used for calculation throughout the pipeline.
Test quadsort/quadsortRec, which is used in stage 10 of the unified datapath when we want to decide whether a ray intersects the boxes.
This is the main test of the whole system; every feature test calls it:
def testUnifiedIntersection(
extended: Boolean,
description: String,
ray_seq: Seq[SW_Ray],
box_seq_seq: Seq[Seq[SW_Box]],
triangle_seq: Seq[SW_Triangle],
op_seq: Seq[SW_Opcode]
){...
test(gen_baseline_or_extended_datapath(extended))
}
Let's look at gen_baseline_or_extended_datapath(extended). The extended flag determines which wrapper is instantiated:
def gen_baseline_or_extended_datapath(extended: Boolean) = extended match {
case true => new UnifiedDatapath_wrapper_16
case false => new UnifiedDatapath_wrapper
}
UnifiedDatapath is the top-level module of the ray tracer datapath.
There are two types of wrapper, UnifiedDatapath_wrapper_16 and UnifiedDatapath_wrapper.
According to the documentation at https://github.com/purdue-aalp/raytracer/releases/tag/v1:
This allows the Ray Tracer Datapath to be integrated into a larger datapath and propagate back pressure via its valid-ready interfaces. A drawback of this design is increased hardware overhead since each skid buffer contains two registers.
In other words, this design adds overhead through the extra registers, but its valid-ready handshake guarantees that data flows safely even under back pressure.
After running the test cases, which are written in Scala, we can find the waveform results in raytracer/cached_verilator_backend.
$ cd raytracer/cached_verilator_backend/euclideanfalse
We can inspect the waveform:
$ gtkwave UnifiedDatapath_wrapper.vcd
To keep the results of the different tests apart, I changed the output directory of UnifiedDatapath_wrapper.vcd so that each test writes into its own directory.
~/CS/raytracer/cached_verilator_backend stable >1 tree -d -L 1 ok 11:31:46 AM
.
├── angular
├── baseline_ray_triangle_random
├── euclidean
├── extended_ray_box_random
├── extended_ray_triangle_random
└── ray_box_random
7 directories
Enable the 32-bit datapath. The tests include euclidean, angular (a.k.a. cosine similarity), ray-box intersection, and ray-triangle intersection.
Each test is fed 100 random test cases sequentially.
Enable the 16-bit datapath with only the euclidean test.
Overview of the toolchain for Chisel development.
It shows that we can combine sim_main.cpp with our hardware implementation by emitting the hardware design as Verilog, translating it to C++ with Verilator, and then compiling everything into an executable file.
Run
$ make verilator
which is defined in the Makefile:
...
verilator:
	sbt "runMain board.verilator.VerilogGenerator"
	cd verilog/verilator && verilator --trace --exe --cc sim_main.cpp Top.v && make -C obj_dir -f VTop.mk
...
Then run
$ ./run-verilator.sh -instruction src/main/resources/hello.asmbin -time 2000 -vcd dump.vcd
Let's look at run-verilator.sh. It executes the VTop binary and passes hello.asmbin and the other arguments through to it.
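#!/bin/sh
# Abort if the simulator binary has not been built yet
if [ ! -f verilog/verilator/obj_dir/VTop ]; then
    echo "Failed to generate Verilog"
    exit 1
fi
# Forward all arguments unchanged, preserving any that contain spaces
verilog/verilator/obj_dir/VTop "$@"
Then open the waveform:
$ gtkwave dump.vcd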
...
// define how to connect to IO to datapath
...
object VerilogGenerator extends App {
emitVerilog(
new Top(false), // Change the Boolean argument here to generate the other Top variant
Array("--target-dir", "verilog/verilator") // Specify the output directory
)
}
but it fails with the error message below:
[error] stack trace is suppressed; run last Compile / runMain for the full output
[error] (Compile / runMain) circt.stage.phases.Exceptions$FirtoolNonZeroExitCode: firtool returned a non-zero exit code. Note that this version of Chisel (5.0.0) was published against firtool version 1.40.0.
The message says that this version of Chisel (5.0.0) was published against firtool version 1.40.0, so the build depends on firtool 1.40.0.
Following the Chisel page and the firtool 1.40.0 release page, download firrtl-bin-macos-11.tar.gz.
Choose the build that matches your development environment; here I use the macOS version.
tar -xvzf firrtl-bin-macos-11.tar.gz
cd firtool-1.40.0
sudo mv ./bin/firtool /usr/local/bin/firtool
firtool --version
If the download and installation succeeded, firtool reports its version:
~/Dep/firtool-1.40.0 firtool --version
LLVM (http://llvm.org/):
LLVM version 17.0.0git
Optimized build.
CIRCT firtool-1.40.0
Go back to the project and run make verilator, which is defined in the Makefile:
verilator:
	sbt "runMain board.verilator.VerilogGenerator"
	verilator --cc ./verilog/verilator/Top.sv --exe --Mdir ./verilog/verilator/obj_dir
Then check verilog/verilator/obj_dir.
#include <verilated.h>
#include <verilated_vcd_c.h>

#include <algorithm>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "VTop.h"  // Verilated model header

// Global variables for simulation time
vluint64_t main_time = 0;               // Main simulation clock
constexpr vluint64_t sim_time = 10000;  // Maximum simulation duration

// Simulation time callback used by Verilator
double sc_time_stamp() { return main_time; }

int main(int argc, char **argv) {
    // Initialize the Verilated model
    Verilated::commandArgs(argc, argv);  // Process command-line arguments
    VTop *top = new VTop;                // Instantiate the module

    // Waveform output
    Verilated::traceEverOn(true);  // Enable tracing
    VerilatedVcdC *vcd = new VerilatedVcdC;
    top->trace(vcd, 99);  // Trace up to 99 levels of hierarchy
    vcd->open("waveform.vcd");

    // Main simulation loop
    while (!Verilated::gotFinish() && main_time < sim_time) {
        // Toggle the clock: high on even time steps, low on odd ones
        top->clock = (main_time % 2 == 0);
        // Reset signal (active during the first few time steps)
        top->reset = (main_time < 10);

        // Drive input signals
        if (main_time > 20) {
            top->io_in_ray_origin_x = 0x12345678;  // Simulated ray origin
            top->io_in_ray_origin_y = 0x87654321;
            top->io_in_ray_origin_z = 0x11112222;
            top->io_in_ray_dir_x = 0x33334444;  // Simulated ray direction
            top->io_in_ray_dir_y = 0x55556666;
            top->io_in_ray_dir_z = 0x77778888;
            top->io_in_opcode = 2;  // Opcode (test a specific function)
        }

        // Evaluate the simulation step
        top->eval();

        // Check output signals (compare with golden results)
        if (main_time > 30) {
            if (top->io_out_valid) {  // Check if output is valid
                // Cast so that %llu matches on every platform
                printf("Time %llu: Output Valid\n",
                       static_cast<unsigned long long>(main_time));
                printf("Tmin_out_0 = %u\n", top->io_out_bits_tmin_out_0);
                printf("Tmin_out_1 = %u\n", top->io_out_bits_tmin_out_1);
                // Additional output signals can be checked here
            }
        }

        // Dump the waveform for this time step
        vcd->dump(main_time);
        // Update simulation time
        main_time++;
    }

    // End simulation
    top->final();  // Perform the final operation on the module
    vcd->close();
    delete vcd;
    delete top;
    return 0;
}
Run make verilator_compile:
verilator_compile:
	sbt "runMain board.verilator.VerilogGenerator"
	verilator --cc --trace ./verilog/verilator/Top.sv --exe ./verilog/verilator/sim_main.cpp --Mdir ./verilog/verilator/obj_dir
	cd verilog/verilator && make -C obj_dir -f VTop.mk
Then run make verilator_run:
verilator_run:
	cd verilog/verilator/obj_dir && ./VTop && gtkwave waveform.vcd
To visualize the datapath, we first need to convert the Top module into FIRRTL (.fir) format.
...
object FIRRTLGenerator extends App {
// Elaborate Top and convert it into a FIRRTL circuit
val firrtlCircuit: Circuit = convert(new Top(false))
// write out FIRRTL result
val outputPath = Paths.get("firrtl/Top.fir")
Files.createDirectories(outputPath.getParent)
Files.write(outputPath, firrtlCircuit.serialize.getBytes(StandardCharsets.UTF_8))
println(s"FIRRTL saved to: $outputPath")
}
...
$ sbt "runMain board.verilator.FIRRTLGenerator"
Use diagrammer, an open-source project that helps developers convert .fir files into SVG diagrams. It depends on Graphviz.
git clone https://github.com/freechipsproject/diagrammer
cd diagrammer
./diagram.sh -i ~/projects/output/yourDesign.fir
Output for Test 179: Result = 0
Output for Test 180: Result = 0
Output for Test 181: Result = 0
Output for Test 182: Result = 0
Output for Test 183: Result = 0
Output for Test 184: Result = 0
We can see that the inputs are sent into the pipeline, but something is still wrong: every result is false. I am still trying to find where it goes wrong.
io_out_bits_output_triangle_hit == false
In a ray tracer we usually do not need to calculate every ray in the scene, so we have to consider the positions of the camera, the rays, and the objects. Therefore we need a cross product operation to determine the camera coordinate system. Calculating reflections also needs the cross product,
which can be expressed as
$$\mathbf{a} \times \mathbf{b} = (a_y b_z - a_z b_y,\; a_z b_x - a_x b_z,\; a_x b_y - a_y b_x)$$
It is not much different from the dot product, but the project owner does not seem to handle it.
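As a software reference for what such a unit would have to compute, here is a minimal C++ sketch of the cross product. The names `Vec3` and `cross` are hypothetical and are not part of the raytracer project. Each output component is a difference of two products, so the operation needs six multiplications and three subtractions, the same kind of multiply and add/subtract primitives that a dot product uses.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;  // hypothetical helper type: (x, y, z)

// Cross product a x b: a vector perpendicular to both a and b.
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],   // x component
            a[2] * b[0] - a[0] * b[2],   // y component
            a[0] * b[1] - a[1] * b[0]};  // z component
}
```

For example, crossing the x axis (1, 0, 0) with the y axis (0, 1, 0) yields the z axis (0, 0, 1), which is how a right-handed camera basis can be completed from two known directions.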