# PLOCTree

A Fast, High-Quality Hardware BVH Builder

---

## Background

---

### Bounding Volume Hierarchy (BVH)

Acceleration data structure for ray tracing.

---

![](https://i.imgur.com/dnjAbHX.png =600x)

---

![](https://i.imgur.com/uHKtdoK.png =600x)

---

![](https://i.imgur.com/tasDKhh.png =600x)

---

### Surface Area Heuristic (SAH)

Estimates the likelihood of intersection with a given tree node based on the surface area of its bounding box.

![](https://i.imgur.com/P49qPzB.png)

---

### Morton Code

![](https://upload.wikimedia.org/wikipedia/commons/c/cd/Four-level_Z.svg)

---

### Streaming algorithm

From [Wikipedia](https://en.wikipedia.org/wiki/Streaming_algorithm#Data_stream_model): "streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes"

---

### BVH builders

---

#### SAH-based

Top-down construction via "SAH sweeps"

**PRO**: High quality

**CON**: Costly

---

#### LBVH-based

Sorts triangles by their Morton codes, then produces a hierarchy from that order

**PRO**: Linear cost

**CON**: Inaccurate

---

#### Parallel Locality Ordered Clustering (PLOC)

Parallel, bottom-up BVH construction algorithm

Uses Morton code order to compute distances between neighboring clusters efficiently

---

##### The PLOC algorithm steps:

1. Assign each triangle a Morton code based on its centroid
2. Sort triangles in Morton code order
3. Apply multiple "PLOC sweeps"

---

##### PLOC sweeps

![](https://i.imgur.com/TSEUX7o.png)
![](https://i.imgur.com/8rgJUZP.png)

----

##### PLOC sweeps

![](https://i.imgur.com/hhuzRBo.png)
![](https://i.imgur.com/8rgJUZP.png)

----

##### PLOC sweeps

![](https://i.imgur.com/GE9nVsl.png)
![](https://i.imgur.com/8rgJUZP.png)

----

##### PLOC sweeps

![](https://i.imgur.com/YV5IpFE.png)
![](https://i.imgur.com/8rgJUZP.png)

---

##### Nearest Neighbors using Morton Code ordering

![](https://i.imgur.com/HZ4vajy.png =500x)

---

## PLOCTree

---

### Algorithm adaptation

The PLOC algorithm has to be changed to allow a hardware implementation

---

#### Basic, serial PLOC

![](https://i.imgur.com/9MqpLKh.png =700x)

* All the data is required at the start (lines 1-3)
* Runs the kernel multiple times

---

#### Streaming, hardware-oriented PLOC

* Fuses the two loops into a single one
* Exploits having a small execution window

---

### Hardware implementation

* Pipeline for a fixed-radius PLOC sweep
* System made out of several pipelines

---

#### PLOC sweep pipeline

![](https://i.imgur.com/XmzWalB.png)

---

#### System architecture

![](https://i.imgur.com/KiThMjf.png =700x)

---

#### Evaluation

* RTL level in SystemVerilog
* Various scenes
* Performance metrics
* Compared against GPU PLOC on a 1080 Ti

---

## Results

---

* PLOCTree consumes 1.4–1.9 W
* 7% slower than LBVH
* More power/energy consumption than LBVH
* 5x faster, with 3x less bandwidth usage and 5x less silicon, than a "binned SAH builder"
* 4x faster and 8x less bandwidth than a CUDA PLOC implementation

---

![](https://i.imgur.com/3ozffhT.png)

---

![](https://i.imgur.com/AVXgN9x.png)

---

## Shortcomings

* Shaky performance metrics, not accurate: is performance really better than LBVH?
* Not accounting for newer memory or on-chip memory technology
* Limited number of pipelines, small evaluation scope
* Justification for mobile platform

###### tags: `chalmers`
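
---

The three PLOC steps from the background slides (Morton codes, sort, repeated sweeps) can be sketched in software as below. This is a minimal serial illustration, not the PLOCTree hardware design: the names `Node`, `morton3d`, `ploc_sweep` and `build_ploc` are hypothetical, the 30-bit Morton code and the default `radius=2` are assumptions, and the distance between two clusters is taken to be the surface area of their merged bounding box.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Node:
    lo: Vec3                       # AABB minimum corner
    hi: Vec3                       # AABB maximum corner
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None     # triangle index, set for leaves only


def expand_bits(v: int) -> int:
    """Spread the low 10 bits of v so that two zero bits follow each bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3d(x: float, y: float, z: float) -> int:
    """30-bit Morton code for a point with coordinates in [0, 1]."""
    q = [min(max(int(c * 1024), 0), 1023) for c in (x, y, z)]
    return (expand_bits(q[0]) << 2) | (expand_bits(q[1]) << 1) | expand_bits(q[2])


def merged_box(a: Node, b: Node) -> Tuple[Vec3, Vec3]:
    lo = tuple(min(p, q) for p, q in zip(a.lo, b.lo))
    hi = tuple(max(p, q) for p, q in zip(a.hi, b.hi))
    return lo, hi


def area(lo: Vec3, hi: Vec3) -> float:
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def ploc_sweep(clusters: List[Node], radius: int) -> List[Node]:
    """One sweep: merge every pair of mutual nearest neighbors."""
    n = len(clusters)
    nearest = []
    for i in range(n):
        best, best_cost = -1, float("inf")
        # Only look at neighbors within `radius` positions in Morton order.
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            if j == i:
                continue
            cost = area(*merged_box(clusters[i], clusters[j]))
            if cost < best_cost:
                best, best_cost = j, cost
        nearest.append(best)
    out = []
    for i in range(n):
        j = nearest[i]
        if nearest[j] == i:
            if i < j:  # emit each merged pair once, at the lower index
                lo, hi = merged_box(clusters[i], clusters[j])
                out.append(Node(lo, hi, clusters[i], clusters[j]))
        else:
            out.append(clusters[i])  # unmatched cluster survives to next sweep
    return out


def build_ploc(tris: List[Triangle], radius: int = 2) -> Node:
    """Build a BVH over a non-empty triangle list and return its root."""
    leaves, cents = [], []
    for idx, tri in enumerate(tris):
        lo = tuple(min(v[k] for v in tri) for k in range(3))
        hi = tuple(max(v[k] for v in tri) for k in range(3))
        leaves.append(Node(lo, hi, prim=idx))
        cents.append(tuple(sum(v[k] for v in tri) / 3.0 for k in range(3)))
    smin = [min(c[k] for c in cents) for k in range(3)]
    smax = [max(c[k] for c in cents) for k in range(3)]

    def key(i: int) -> int:
        # Normalize the centroid to [0, 1] within the centroid bounds.
        n = [(cents[i][k] - smin[k]) / ((smax[k] - smin[k]) or 1.0) for k in range(3)]
        return morton3d(*n)

    clusters = [leaves[i] for i in sorted(range(len(tris)), key=key)]
    while len(clusters) > 1:
        clusters = ploc_sweep(clusters, radius)
    return clusters[0]
```

Calling `build_ploc(tris)` on a list of vertex triples returns the root `Node`. A larger `radius` searches more candidates per cluster at a higher cost per sweep; the value used here is only a placeholder.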