# GPU Ray Tracing
###### tags: `GPUs`
###### paper: [link](https://www.classes.cs.uchicago.edu/archive/2016/winter/32001-1/papers/p93-parker.pdf)
###### Real-time Raytracing with Nvidia RTX: [link](https://on-demand.gputechconf.com/gtc-eu/2018/pdf/e8527-real-time-ray-tracing-with-nvidia-rtx.pdf)
## Introduction
### Motivation
* Researchers have invented many techniques for improving the performance of ray tracing, but most such techniques muddy the simplicity and conceptual purity that make ray tracing attractive.
* Nor have industry standards emerged to hide these complexities, as Direct3D and OpenGL do for rasterization.
### OptiX
* A general-purpose ray tracing engine.
* A general programming interface enables the implementation of a variety of ray tracing-based algorithms in graphics and non-graphics domains, such as rendering, sound propagation, collision detection, and artificial intelligence.
* OptiX is conceptually simple, yet it enables high performance on modern GPU architectures.
* OptiX targets highly parallel GPU architectures, but it is applicable to a wide range of special- and general-purpose hardware, including modern CPUs.
### Ray tracing, rasterization, and GPUs
* Computer graphics algorithms for rendering, or image synthesis, take one of two complementary approaches:
* **ray tracing**: loop over the pixels in the image, computing for each pixel the first object visible at that pixel. It solves the geometric problem of intersecting a ray from the pixel with the objects.
    * data structure: *acceleration structure*
* **rasterization**: loop over the objects in the scene, computing for each object the pixels covered by that object. The resulting per-object pixels (fragments) are formatted for a raster display.
    * data structure: *depth buffer*
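To make the two loop orders concrete, here is a minimal Python sketch under simplifying assumptions: objects are hypothetical screen-aligned rectangles at constant depth (an orthographic view), and a linear scan stands in for the acceleration structure. Both functions should produce the same image; they differ only in which loop is outermost and in the per-pixel state they keep.

```python
from dataclasses import dataclass
from typing import List, Optional

Image = List[List[Optional[int]]]

@dataclass
class Rect:
    """Hypothetical screen-aligned rectangle at a constant depth."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    depth: float
    obj_id: int

    def covers(self, x: int, y: int) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

def ray_trace(scene: List[Rect], w: int, h: int) -> Image:
    # Pixel order: for each pixel, find the nearest object its ray hits.
    # The linear scan over objects stands in for an acceleration structure.
    image: Image = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_depth = float("inf")
            for obj in scene:
                if obj.covers(x, y) and obj.depth < best_depth:
                    best_depth = obj.depth
                    image[y][x] = obj.obj_id
    return image

def rasterize(scene: List[Rect], w: int, h: int) -> Image:
    # Object order: for each object, visit the pixels it covers (its fragments)
    # and keep a fragment only if it is nearer than what the depth buffer holds.
    image: Image = [[None] * w for _ in range(h)]
    depth_buffer = [[float("inf")] * w for _ in range(h)]
    for obj in scene:
        for y in range(max(obj.y0, 0), min(obj.y1, h)):
            for x in range(max(obj.x0, 0), min(obj.x1, w)):
                if obj.depth < depth_buffer[y][x]:
                    depth_buffer[y][x] = obj.depth
                    image[y][x] = obj.obj_id
    return image

scene = [Rect(0, 0, 3, 3, 5.0, 1), Rect(1, 1, 4, 4, 2.0, 2)]
print(ray_trace(scene, 4, 4) == rasterize(scene, 4, 4))  # True
```

The depth buffer is what lets rasterization visit objects in any order; ray tracing instead needs fast ray-object queries, which is the job of the acceleration structure.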
### Contributions and design goals
* A general, low-level ray tracing engine.
* A programmable ray tracing pipeline.
* A simple programming model.
* A domain-specific compiler.
## Programmable Ray Tracing Pipeline
* Core idea: most ray tracing algorithms can be implemented using combinations of a small set of programmable operations.
* User-provided operations (*programs*) can be combined with a user-defined data structure (*payload*) associated with each ray.
### Programs

* Figure: ray tracing pipeline (call graph).
* Yellow boxes represent user-specified programs; blue boxes are algorithms internal to OptiX.
* The core operation, ***rtTrace***, alternates between locating an intersection (*Traverse*) and responding to that intersection (*Shade*).
* ***rtContextLaunch*** creates many instantiations of the ray generation program.
* ***Ray generation*** programs create a ray using a camera model for a single sample within a pixel, start a trace operation, and store the resulting color in an output buffer.
* ***Intersection*** programs implement ray-geometry intersection tests. The program determines if and where the ray touches the object and may compute normals, texture coordinates, or other attributes based on the hit position.
* ***Closest-hit*** programs are invoked once traversal has found the nearest intersection of a ray with the scene geometry. They perform computations like shading, potentially casting new rays in the process, and store the resulting data in the ray payload.
* ***Any-hit*** programs are called during traversal for every ray-object intersection that is found. An any-hit program may optionally terminate the ray, stopping all further traversal. This lightweight exception mechanism can be used to implement early ray termination for shadow rays.
* ***Miss*** programs are executed when the ray does not intersect any geometry in the interval provided.
* ***Exception*** programs are executed when the system encounters an exceptional condition, for example, when the recursion stack exceeds the amount of memory available for a thread, or when a buffer access index is out of range.
* ***Selector visit*** programs expose programmability for coarse-level node graph traversal.
* ***Bounding box*** programs operate on geometry to determine primitive bounds for acceleration structure construction.
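As an illustration of what an intersection program and its paired bounding box program compute, the following Python sketch handles a sphere. It is not OptiX code: real OptiX programs are written in CUDA C and report hits through the OptiX API rather than by returning them, and the function names here are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def sphere_intersect(center: Vec3, radius: float, origin: Vec3, direction: Vec3,
                     t_min: float, t_max: float) -> Optional[Tuple[float, Vec3]]:
    """Intersection program: nearest hit (t, normal) within [t_min, t_max], or None."""
    oc = sub(origin, center)
    a = dot(direction, direction)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere entirely
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)):  # nearer root first
        if t_min <= t <= t_max:
            hit = tuple(origin[i] + t * direction[i] for i in range(3))
            # Attribute computed from the hit position: the unit surface normal.
            normal = tuple((hit[i] - center[i]) / radius for i in range(3))
            return t, normal
    return None

def sphere_bounds(center: Vec3, radius: float) -> Tuple[Vec3, Vec3]:
    """Bounding box program: axis-aligned bounds used to build the acceleration structure."""
    lo = tuple(c - radius for c in center)
    hi = tuple(c + radius for c in center)
    return lo, hi

print(sphere_intersect((0.0, 0.0, 0.0), 1.0, (0.0, 0.0, -5.0), (0.0, 0.0, 1.0), 0.0, 100.0))
```

The `[t_min, t_max]` interval is what lets traversal reject hits farther than the closest one found so far.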
### Scene representation
* The OptiX engine employs a simple but flexible structure for representing scene information and associated programmable operations, collected in a container object called the *context*.


* In the paper's example scene (a triangle mesh on a ground plane), the ray generation program implements the camera, while a miss program implements a constant white background.
* Step 1: The ray generation program creates rays and traces them against the geometry group.
* Steps 2 and 3: The parallelogram and triangle-mesh intersection programs are executed until an intersection is found. If the ray intersects geometry, the *closest-hit* program is called, whether the intersection was found on the ground plane or on the triangle mesh.
* Step 4: The material recursively generates shadow rays to determine whether the light source is unobstructed. When any intersection along a shadow ray is found, the *any-hit* program terminates ray traversal and returns to the calling program with shadow occlusion information.
* Step 5: If a ray does not intersect any scene geometry, the *miss* program is invoked.
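The control flow of steps 1 to 5 can be emulated on the CPU. The Python sketch below is a simplified model, not the OptiX API: programs are attached directly to hypothetical `Primitive` objects rather than to materials and geometry instances, a ray is reduced to a distance interval `[t_min, t_max]`, a linear scan replaces acceleration-structure traversal, and termination from an any-hit program is modeled as a Python exception.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

class TerminateRay(Exception):
    """Raised by an any-hit program to stop all further traversal of the ray."""

# Intersection program: given the current interval [t_min, t_max], return t or None.
IntersectFn = Callable[[float, float], Optional[float]]
HitFn = Callable[[dict, str, float], None]

@dataclass
class Primitive:
    name: str
    intersect: IntersectFn
    closest_hit: Optional[HitFn] = None
    any_hit: Optional[HitFn] = None

def at_distance(d: float) -> IntersectFn:
    """Hypothetical intersection program: the object sits at distance d along the ray."""
    return lambda t_min, t_max: d if t_min <= d <= t_max else None

def trace(prims: List[Primitive], payload: dict,
          miss: Optional[Callable[[dict], None]] = None,
          t_min: float = 1e-4, t_max: float = float("inf")) -> dict:
    closest: Optional[Primitive] = None
    try:
        for prim in prims:  # Traverse
            t = prim.intersect(t_min, t_max)
            if t is None:
                continue
            if prim.any_hit is not None:
                prim.any_hit(payload, prim.name, t)  # may raise TerminateRay
            closest, t_max = prim, t  # accept hit; later hits must be nearer
    except TerminateRay:
        return payload
    if closest is not None and closest.closest_hit is not None:
        closest.closest_hit(payload, closest.name, t_max)  # Shade
    elif closest is None and miss is not None:
        miss(payload)
    return payload

def shade_name(payload: dict, name: str, t: float) -> None:
    payload["color"] = name  # stand-in for real shading

def white_background(payload: dict) -> None:
    payload["color"] = "white"

def terminate_shadow(payload: dict, name: str, t: float) -> None:
    payload["occluded"] = True
    raise TerminateRay  # any occluder is enough: stop traversal early

scene = [Primitive("ground", at_distance(5.0), shade_name),
         Primitive("mesh", at_distance(3.0), shade_name)]
radiance = trace(scene, {}, miss=white_background)
print(radiance["color"])  # mesh (the nearer of the two hits)

shadow_scene = [Primitive("mesh", at_distance(3.0), any_hit=terminate_shadow)]
shadow = trace(shadow_scene, {"occluded": False})
```

A closest-hit program could call `trace` again to cast reflection or shadow rays, which is how recursion enters the pipeline.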
## Domain-specific Compilation
* The core of the OptiX host runtime is a just-in-time (JIT) compiler that serves several important functions.
1. The JIT stage combines all of the user-provided shader programs into one or more kernels.
2. It analyzes the node graph to identify data-dependent optimizations.
3. The resulting kernel is executed on the GPU using the CUDA driver API.
## Execution Model
* Ray tracing is a highly MIMD operation. Rays rapidly diverge even if they begin together in the camera model, which is a challenge for GPUs that rely on SIMT execution.
* However, execution divergence is only temporary: a ray that hits a glass material temporarily diverges from one that hits a painted surface, yet both quickly return to the core operation of tracing rays (refraction and reflection rays in the former case, a shadow ray in the latter).
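* OptiX links all of the programs into a single monolithic kernel, the *megakernel*.
* The megakernel is structured as a loop around a switch statement over program states. Inside each case, a user program is executed, and the result of this computation is the case, or state, to select on the next iteration.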
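The loop-around-a-switch structure can be sketched in Python as follows. The states, the program bodies, and the fixed bounce limit are hypothetical; a dictionary lookup plays the role of the `switch`, and each program returns the next state to select, as described above. This shows one thread's view only.

```python
from enum import Enum, auto

class State(Enum):
    RAY_GEN = auto()
    TRAVERSE = auto()
    SHADE = auto()
    MISS = auto()
    DONE = auto()

# Hypothetical programs: each updates the per-thread context
# and returns the state to select on the next iteration.
def ray_gen(ctx: dict) -> State:
    ctx["bounces"] = 0
    return State.TRAVERSE

def traverse(ctx: dict) -> State:
    # Toy rule: the ray hits something until the bounce limit is reached.
    return State.SHADE if ctx["bounces"] < ctx["max_depth"] else State.MISS

def shade(ctx: dict) -> State:
    ctx["bounces"] += 1  # e.g. a material that spawns a reflection ray
    return State.TRAVERSE

def miss(ctx: dict) -> State:
    ctx["background"] = True
    return State.DONE

PROGRAMS = {State.RAY_GEN: ray_gen, State.TRAVERSE: traverse,
            State.SHADE: shade, State.MISS: miss}

def megakernel(ctx: dict) -> list:
    """Loop, dispatching on the current state, until a program returns DONE."""
    state = State.RAY_GEN
    executed = []
    while state is not State.DONE:
        executed.append(state.name)
        state = PROGRAMS[state](ctx)  # plays the role of switch (state) { case ... }
    return executed

print(megakernel({"max_depth": 2}))
# ['RAY_GEN', 'TRAVERSE', 'SHADE', 'TRAVERSE', 'SHADE', 'TRAVERSE', 'MISS']
```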
### Fine-grained Scheduling
* Megakernel execution suffers serialization penalties when the state diverges within a single SIMT unit.
* The OptiX runtime uses a fine-grained scheduling scheme to reclaim divergent threads.
* OptiX explicitly selects a single state for an entire SIMT unit to execute using a scheduling heuristic. Threads within the SIMT unit that do not require the state simply idle that iteration.
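The scheduling idea can be simulated for a single SIMT unit. In this Python sketch each thread has a hypothetical sequence of states to run, and the heuristic (pick the state wanted by the most threads, ties broken by name) is an assumption for illustration, not necessarily the heuristic OptiX uses.

```python
from collections import Counter
from typing import List

def schedule(threads: List[List[str]]) -> List[str]:
    """Simulate one SIMT unit; threads[i] is the state sequence thread i must run.
    Each iteration executes exactly one state for the whole unit."""
    pcs = [0] * len(threads)  # per-thread position in its state sequence
    chosen = []
    while True:
        pending = [seq[pc] for seq, pc in zip(threads, pcs) if pc < len(seq)]
        if not pending:
            return chosen
        counts = Counter(pending)
        # Hypothetical heuristic: most-wanted state, ties broken alphabetically.
        state = min(counts, key=lambda s: (-counts[s], s))
        chosen.append(state)
        for i, seq in enumerate(threads):
            if pcs[i] < len(seq) and seq[pcs[i]] == state:
                pcs[i] += 1  # this thread executes the selected state
            # threads waiting on another state idle this iteration

threads = [
    ["trace", "shade_glass", "trace"],
    ["trace", "shade_paint", "trace"],
    ["trace", "shade_paint", "trace"],
    ["trace", "shade_glass", "trace"],
]
print(schedule(threads))  # ['trace', 'shade_glass', 'shade_paint', 'trace']
```

In the example, two threads shade glass and two shade paint; the unit runs each shading state once with half the threads idle, then all four threads reconverge on `trace`, mirroring the temporary divergence described in the Execution Model section.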


## Application Case Studies
### Whitted-style ray tracing

* The ***ray generation*** program implements a basic pinhole camera model, beginning the shading process by shooting a single ray per pixel.
* The material ***closest-hit*** programs are responsible for recursively casting rays and computing a shaded sample color.
* The application defines three separate pairs of intersection and bounding box programs:
    * a parallelogram for the floor
    * a sphere for the metal ball
    * a thin-shell sphere for the hollow glass ball
* For shadow rays, the application attaches a trivial program that immediately terminates the ray to the materials' ***any-hit*** slots.
* For the glass material, the ***any-hit*** program instead attenuates a visibility factor stored in the ray payload. As a result, the glass sphere casts a subtler shadow than the metal sphere.
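The shadow-ray behavior can be sketched as follows. The payload holds a visibility factor starting at 1.0; the opaque (metal) any-hit program zeroes it and terminates the ray, while the glass any-hit program multiplies it by a transmittance and lets traversal continue. The transmittance of 0.5, the function names, and the use of a Python exception for termination are assumptions for illustration.

```python
class TerminateRay(Exception):
    """Models an any-hit program ending traversal of the shadow ray."""

def opaque_any_hit(payload: dict) -> None:
    # Metal ball: any hit fully blocks the light, so stop traversal immediately.
    payload["visibility"] = 0.0
    raise TerminateRay

def glass_any_hit(payload: dict, transmittance: float = 0.5) -> None:
    # Glass ball: attenuate visibility and let traversal continue,
    # so further occluders along the shadow ray are still found.
    payload["visibility"] *= transmittance

def trace_shadow(occluders: list) -> float:
    """occluders: any-hit programs of the objects between the surface point and the light."""
    payload = {"visibility": 1.0}  # per-ray payload
    try:
        for any_hit in occluders:  # traversal may report hits in any order
            any_hit(payload)
    except TerminateRay:
        pass
    return payload["visibility"]

print(trace_shadow([glass_any_hit]))   # 0.5: lighter shadow under the glass ball
print(trace_shadow([opaque_any_hit]))  # 0.0: full shadow under the metal ball
```

Because attenuation is a product and an opaque hit forces zero, the result does not depend on the order in which hits are reported, which matters because traversal need not report them in distance order.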
## Future Work
* Adding support for additional high-level input languages, i.e., languages that produce PTX code to be consumed by OptiX.
* Since PTX serves only as an intermediate representation, it is possible to translate and execute compiled megakernels on machines other than NVIDIA GPUs.