# GPU Ray Tracing

###### tags: `GPUs`
###### paper: [link](https://www.classes.cs.uchicago.edu/archive/2016/winter/32001-1/papers/p93-parker.pdf)
###### Real-time Raytracing with Nvidia RTX: [link](https://on-demand.gputechconf.com/gtc-eu/2018/pdf/e8527-real-time-ray-tracing-with-nvidia-rtx.pdf)

## Introduction

### Motivation

* Researchers have invented many techniques for improving the performance of ray tracing, but most such techniques muddy the simplicity and conceptual purity that make ray tracing attractive.
* Nor have industry standards emerged to hide these complexities, as Direct3D and OpenGL do for rasterization.

### OptiX

* A general-purpose ray tracing engine.
* A general programming interface enables the implementation of a variety of ray tracing-based algorithms in graphics and non-graphics domains, such as rendering, sound propagation, collision detection, and artificial intelligence.
* OptiX is conceptually simple yet enables high performance on modern GPU architectures.
* OptiX targets highly parallel GPU architectures, but it is applicable to a wide range of special- and general-purpose hardware, including modern CPUs.

### Ray tracing, rasterization, and GPUs

* Computer graphics algorithms for rendering, or image synthesis, take one of two complementary approaches:
    * **ray tracing**: loop over the pixels in the image, computing for each pixel the first object visible at that pixel. It solves the geometric problem of intersecting a ray from the pixel with the objects.
        * data structure: *acceleration structure*
    * **rasterization**: loop over the objects in the scene, computing for each object the pixels covered by that object. The resulting per-object pixels (fragments) are formatted for a raster display.
        * data structure: *depth buffer*

### Contributions and design goals

* A general, low-level ray tracing engine.
* A programmable ray tracing pipeline.
* A simple programming model.
* A domain-specific compiler.

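
The two loop orders described under "Ray tracing, rasterization, and GPUs" can be compared with a toy sketch. This is not OptiX code: the `Rect` type, the tiny image size, and the scene are hypothetical, and a linear scan stands in for a real acceleration structure. Both renderers should produce the same image; they differ only in which loop is outermost and which data structure resolves visibility.

```python
from dataclasses import dataclass

WIDTH, HEIGHT = 8, 4
BACKGROUND = "."


@dataclass(frozen=True)
class Rect:
    """Hypothetical screen-aligned rectangle at a constant depth."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    depth: float
    color: str

    def covers(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def render_ray_traced(scene):
    """Outer loop over pixels: find the nearest object visible at each pixel."""
    image = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    for y in range(HEIGHT):
        for x in range(WIDTH):
            # A linear scan stands in for an acceleration-structure query.
            hits = [r for r in scene if r.covers(x, y)]
            if hits:
                image[y][x] = min(hits, key=lambda r: r.depth).color
    return image


def render_rasterized(scene):
    """Outer loop over objects: a depth buffer keeps the nearest fragment."""
    image = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    depth_buffer = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    for r in scene:
        for y in range(max(r.y0, 0), min(r.y1, HEIGHT)):
            for x in range(max(r.x0, 0), min(r.x1, WIDTH)):
                if r.depth < depth_buffer[y][x]:
                    depth_buffer[y][x] = r.depth
                    image[y][x] = r.color
    return image


# Two overlapping rectangles; "B" is closer to the viewer than "A".
SCENE = [Rect(0, 0, 5, 3, 2.0, "A"), Rect(3, 1, 8, 4, 1.0, "B")]

if __name__ == "__main__":
    for row in render_ray_traced(SCENE):
        print("".join(row))
```

In the overlap region both functions pick "B", because the per-pixel nearest-hit search and the depth test resolve the same visibility question from opposite directions.
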
## Programmable Ray Tracing Pipeline

* Core idea: most ray tracing algorithms can be implemented using combinations of a small set of programmable operations.
* User-provided operations (*programs*) can be combined with a user-defined data structure (*payload*) associated with each ray.

### Programs

![](https://i.imgur.com/5UA6GRd.png)

* ↑ Ray tracing pipeline (call graph)
    * Yellow boxes represent user-specified programs; blue boxes are algorithms internal to OptiX.
* The core operation, ***rtTrace***, alternates between locating an intersection (*Traverse*) and responding to that intersection (*Shade*).
* ***rtContextLaunch*** creates many instantiations of the ray generation program.
* A ***Ray generation*** program creates a ray using a camera model for a single sample within a pixel, starts a trace operation, and stores the resulting color in an output buffer.
* ***Intersection*** programs implement ray-geometry intersection tests. The program determines if and where the ray touches the object and may compute normals, texture coordinates, or other attributes based on the hit position.
* ***Closest-hit*** programs are invoked once traversal has found the nearest intersection of a ray with the scene geometry. They perform computations like shading, potentially casting new rays in the process, and store the resulting data in the ray payload.
* ***Any-hit*** programs are called during traversal for every ray-object intersection that is found. An any-hit program may optionally terminate the ray, stopping all traversal. This is a lightweight exception mechanism that can be used to implement early ray termination for shadow rays.
* ***Miss*** programs are executed when the ray does not intersect any geometry in the interval provided.
* ***Exception*** programs are executed when the system encounters an exceptional condition, for example, when the recursion stack exceeds the amount of memory available for each thread, or when a buffer access index is out of range.
* ***Selector visit*** programs expose programmability for coarse-level node graph traversal.
* A ***bounding box*** program operates on geometry to determine primitive bounds for acceleration structure construction.

### Scene representation

* The OptiX engine employs a simple but flexible structure for representing scene information and associated programmable operations, collected in a container object called the *context*.

![](https://i.imgur.com/KrMLWis.png)

![](https://i.imgur.com/i9uusnv.png)

* The ray generation program implements the camera, while a miss program implements a constant white background.
* Step 1: The ray generation program creates rays and traces them against the geometry group.
* Steps 2 and 3: Traversal executes the parallelogram and triangle-mesh intersection programs until an intersection is found. If the ray intersects geometry, the *closest-hit* program is called, whether the intersection was found on the ground plane or on the triangle mesh.
* Step 4: The material recursively generates shadow rays to determine whether the light source is unobstructed. When any intersection along the shadow ray is found, the *any-hit* program terminates ray traversal and returns to the calling program with shadow occlusion information.
* Step 5: If a ray does not intersect any scene geometry, the *miss* program is invoked.

## Domain-specific Compilation

* The core of the OptiX host runtime is a just-in-time (JIT) compiler that serves several important functions:
    1. The JIT stage combines all of the user-provided shader programs into one or more kernels.
    2. It analyzes the node graph to identify data-dependent optimizations.
    3. The resulting kernel is executed on the GPU using the CUDA driver API.

## Execution Model

* Ray tracing is a highly MIMD operation. Rays rapidly diverge even if they begin together in the camera model, which is a challenge for GPUs that rely on SIMT.
* However, execution divergence is only temporary; a ray that hits a glass material temporarily diverges from one that hits a painted surface, yet they both quickly return to the core operation of tracing rays: a refraction and reflection in the former case and a shadow ray in the latter.
* OptiX links all of the programs into a single monolithic kernel, the *megakernel*.

![](https://i.imgur.com/naXzbL0.png)

* Inside each case, a user program is executed, and the result of this computation is the case, or state, to select on the next iteration.

### Fine-grained Scheduling

* Megakernel execution suffers serialization penalties when the state diverges within a single SIMT unit.
* The OptiX runtime uses a fine-grained scheduling scheme to reclaim divergent threads.
* OptiX explicitly selects a single state for an entire SIMT unit to execute using a scheduling heuristic. Threads within the SIMT unit that do not require that state simply idle for that iteration.

![](https://i.imgur.com/337uYRX.png)

![](https://i.imgur.com/Ux2kM7z.png)

## Application Case Studies

### Whitted-style ray tracing

![](https://i.imgur.com/u1GVl1m.png)

* The ***ray generation*** program implements a basic pinhole camera model, beginning the shading process by shooting a single ray per pixel.
* The material ***closest-hit*** programs are responsible for recursively casting rays and computing a shaded sample color.
* The application defines three separate pairs of intersection and bounding box programs:
    * a parallelogram for the floor
    * a sphere for the metal ball
    * a thin-shell sphere for the hollow glass ball
* For shadow rays, the application attaches a trivial program that immediately terminates the ray to the materials' ***any-hit*** slots.
* For the glass material, the ***any-hit*** program instead attenuates a visibility factor stored in the ray payload. As a result, the glass sphere casts a subtler shadow than the metal sphere.

## Future Work

* Adding support for further high-level input languages, i.e. languages that produce PTX code to be consumed by OptiX.
* Because PTX serves only as an intermediate representation, it is possible to translate and execute compiled megakernels on machines other than NVIDIA GPUs.
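
The megakernel loop and the fine-grained scheduling described under "Execution Model" can be illustrated with a small CPU simulation. This is only a sketch under stated assumptions: the state names, the toy hit/miss rule, and the "most common state wins" heuristic are all hypothetical stand-ins, since the notes say only that OptiX uses a scheduling heuristic. Each iteration picks one state for the whole SIMT unit; threads in that state run the matching program, which returns their next state, and all other threads idle.

```python
from collections import Counter

# Hypothetical state labels; a real megakernel has one case per user program.
RAY_GEN, TRAVERSE, SHADE, MISS, DONE = "ray_gen", "traverse", "shade", "miss", "done"


def ray_gen(thread):
    thread["log"].append(RAY_GEN)
    return TRAVERSE


def traverse(thread):
    thread["log"].append(TRAVERSE)
    # Toy rule (assumption): even thread ids hit geometry, odd ones miss.
    return SHADE if thread["id"] % 2 == 0 else MISS


def shade(thread):
    thread["log"].append(SHADE)
    return DONE


def miss(thread):
    thread["log"].append(MISS)
    return DONE


PROGRAMS = {RAY_GEN: ray_gen, TRAVERSE: traverse, SHADE: shade, MISS: miss}


def pick_state(threads):
    """Assumed heuristic: run the state shared by the most unfinished threads."""
    counts = Counter(t["state"] for t in threads if t["state"] != DONE)
    return counts.most_common(1)[0][0]


def run_simt_unit(num_threads):
    """Simulate one SIMT unit executing the megakernel state machine."""
    threads = [{"id": i, "state": RAY_GEN, "log": []} for i in range(num_threads)]
    schedule = []
    while any(t["state"] != DONE for t in threads):
        state = pick_state(threads)
        schedule.append(state)
        for t in threads:
            if t["state"] == state:
                # The program's return value selects the thread's next state.
                t["state"] = PROGRAMS[state](t)
            # Threads in any other state idle during this iteration.
    return threads, schedule


if __name__ == "__main__":
    _, schedule = run_simt_unit(5)
    print(schedule)
```

With five threads in this toy setup, the unit runs four iterations: ray generation and traversal execute fully converged, while the shade and miss states are serialized. That serialization is the divergence cost the scheduler tries to keep small.
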