### Experiment results
#### Maze
Comparison of the number of local search steps (Reviewers 1 and 4)
BoN-i denotes best-of-N sampling with i local search steps. Compute is measured as the total number of denoising steps, including both local and global search. Performance is measured as success rate (%).
|Compute|BoN-0|BoN-1|BoN-2|BoN-6 (our baseline)|BoN-8|TreeG|SVDD|FK|DAS|
|-|-|-|-|-|-|-|-|-|-|
|128|6.875±2.1|3.125±2.72|5.6±2.07|**19.2±2.2**|17.5±2.5|4.375±2.1|5.0±3.1|5.0±3.1|5.0±3.1|
|256|6.25±1.2|7.5±3.1|13.75±2.8|**27.0±3.2**|20.6±1.1|9.375±2.1|8.75±1.25|6.9±2.1|6.875±2.1|
|512|7.5±1.8|10.0±3.1|26.25±2.2|36.7|**38.1±1.1**|10.0±1.77|15.0±3.25|10.0±1.8|12.5±1.8|
|1024|15.6±2.7|20.6±1.1|41.25±2.2|**58.3±3.2**|50.6±3.2|19.375±3.2|17.625±2.1|26.9±3.2|29.4±3.25|
|2048|25±2.5|35±2.5|53.75±2.8|**81.0±2.5**|71.9±1.1|22.5±2.5|20.675±3.2|37.25±2.2|35.0±1.1|
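
The compute accounting described above can be illustrated with a toy sketch. Everything in it is an assumption for illustration: the 1-D "sample", the hypothetical `toy_denoise`, `toy_local_step` and `toy_verifier` stand-ins, and the choice that one local search step costs exactly one denoising step. It only shows how global and local steps can be charged to one shared budget, not how the actual sampler is implemented.

```python
import random
from typing import Callable, Optional, Tuple

# Hypothetical BoN-i sketch; not the actual implementation.
def best_of_n_local_search(
    n: int,
    num_denoise_steps: int,
    num_local_steps: int,
    denoise_step: Callable[[float, random.Random], float],
    local_step: Callable[[float, random.Random], float],
    verifier: Callable[[float], float],
    seed: int = 0,
) -> Tuple[Optional[float], int]:
    """Return (best sample, total denoising steps charged)."""
    rng = random.Random(seed)
    compute = 0
    best_x: Optional[float] = None
    best_score = float("-inf")
    for _ in range(n):
        # Global search: denoise a fresh sample starting from Gaussian noise.
        x = rng.gauss(0.0, 1.0)
        for _ in range(num_denoise_steps):
            x = denoise_step(x, rng)
            compute += 1
        score = verifier(x)
        # Local search (assumed cost: one denoising step per proposal);
        # a proposal is kept only if the verifier score improves.
        for _ in range(num_local_steps):
            candidate = local_step(x, rng)
            compute += 1
            candidate_score = verifier(candidate)
            if candidate_score > score:
                x, score = candidate, candidate_score
        if score > best_score:
            best_x, best_score = x, score
    return best_x, compute

# Hypothetical toy stand-ins: the "clean" target value is 0.7.
TARGET = 0.7

def toy_denoise(x: float, rng: random.Random) -> float:
    return x + 0.5 * (TARGET - x) + rng.gauss(0.0, 0.1)

def toy_local_step(x: float, rng: random.Random) -> float:
    return x + rng.gauss(0.0, 0.05)

def toy_verifier(x: float) -> float:
    return -abs(x - TARGET)

sample, used = best_of_n_local_search(4, 10, 6, toy_denoise, toy_local_step, toy_verifier)
print(used)  # 4 samples * (10 global + 6 local) steps = 64
```

Under this assumed accounting, a fixed budget is traded between more samples (small i) and more refinement per sample (large i); for example, 8 samples with i = 0 and 10 denoising steps would cost 80 steps, while 4 samples with i = 6 cost 64.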

Comparison of our adaptive compute allocation strategy
Even with the same number of local search steps (6, as in our BoN-6 baseline), our adaptive compute allocation improves the Pareto curve of success rate versus inference-time compute.
|Compute|BFS-Pruning|BFS-Resampling|TreeG|SVDD|FK|DAS|
|-|-|-|-|-|-|-|
|192|**32.5±1.1**|20.0±2.5|18.75±2.8|16.25±3.7|17.5±2.5|18.1±1.1|
|384|**48.1±1.1**|41.25±2.7|38.1±1.1|15.0±2.5|32.5±3.1|36.9±1.1|
|768|**71.25±2.2**|57.75±3.2|52.5±1.8|17.5±4.3|45.75±2.8|50±2.5|
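
The difference between pruning-based and resampling-based allocation can be illustrated with a hypothetical sketch: given verifier scores for a set of partially denoised particles, each function decides how many children every particle receives from a fixed budget. The names `prune_and_branch` and `resample` and the `keep` and `temperature` parameters are assumptions for illustration; this is a generic sketch, not the BFS-Pruning or BFS-Resampling implementation.

```python
import math
import random
from typing import List

# Hypothetical sketch; not the actual BFS-Pruning / BFS-Resampling code.
def prune_and_branch(scores: List[float], budget: int, keep: int) -> List[int]:
    """Deterministic reallocation: drop all but the `keep` highest-scoring
    particles and spread `budget` children over the survivors round-robin."""
    if not 0 < keep <= len(scores):
        raise ValueError("keep must be in [1, len(scores)]")
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    children = [0] * len(scores)
    for slot in range(budget):
        children[ranked[slot % keep]] += 1
    return children

def resample(scores: List[float], budget: int, temperature: float,
             rng: random.Random) -> List[int]:
    """Stochastic reallocation (multinomial resampling): each child picks a
    parent with probability proportional to exp(score / temperature)."""
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    children = [0] * len(scores)
    for parent in rng.choices(range(len(scores)), weights=weights, k=budget):
        children[parent] += 1
    return children

scores = [0.1, 0.9, 0.5, 0.3]
print(prune_and_branch(scores, budget=4, keep=2))  # [0, 2, 2, 0]
print(sum(resample(scores, budget=4, temperature=0.1, rng=random.Random(0))))  # 4
```

In this sketch pruning never spends budget on low-scoring particles, while resampling can still occasionally revisit them; a lower `temperature` makes resampling more greedy.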

BFS with different hyperparameter values (the BFS-0.005 column matches the BFS-Pruning column above):
|Compute|BFS-0.005|BFS-0.01|BFS-0.02|TreeG|SVDD|FK|DAS|
|-|-|-|-|-|-|-|-|
|192|**32.5±1.1**|31.2±4.2|27.5±4.3|18.75±2.8|16.25±3.7|17.5±2.5|18.1±1.1|
|384|**48.1±1.1**|45.5±2.3|42.5±5.2|38.1±1.1|15.0±2.5|32.5±3.1|36.9±1.1|
|768|**71.25±2.2**|70.05±1.1|67.6±1.1|52.5±1.8|17.5±4.3|45.75±2.8|50±2.5|
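
The Pareto claim can be checked directly from the mean success rates in the BFS-Pruning / BFS-Resampling comparison (standard deviations ignored). The helper `pareto_frontier` is a hypothetical illustration: a point is kept if no other point reaches at least the same success rate with no more compute.

```python
from typing import List, Tuple

Point = Tuple[int, float, str]  # (compute, mean success rate, method)

def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points (lower compute and higher success are
    better); exact ties in both coordinates keep a single representative."""
    frontier: List[Point] = []
    best_success = float("-inf")
    # Scan by increasing compute; at equal compute, the best success comes first.
    for compute, success, method in sorted(points, key=lambda p: (p[0], -p[1])):
        if success > best_success:
            frontier.append((compute, success, method))
            best_success = success
    return frontier

# Mean success rates copied from the BFS-Pruning / BFS-Resampling table.
results = {
    "BFS-Pruning": [32.5, 48.1, 71.25],
    "BFS-Resampling": [20.0, 41.25, 57.75],
    "TreeG": [18.75, 38.1, 52.5],
    "SVDD": [16.25, 15.0, 17.5],
    "FK": [17.5, 32.5, 45.75],
    "DAS": [18.1, 36.9, 50.0],
}
budgets = [192, 384, 768]
points = [(c, s, m) for m, rates in results.items() for c, s in zip(budgets, rates)]
for point in pareto_frontier(points):
    print(point)
# (192, 32.5, 'BFS-Pruning')
# (384, 48.1, 'BFS-Pruning')
# (768, 71.25, 'BFS-Pruning')
```

Because BFS-Pruning is the best method at every budget and its success rate grows with compute, it forms the whole frontier in this comparison.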
#### ImageNet
Double verifier
|Compute|BoN-Single|FK|DAS|TreeG|SVDD|BoN-Double (our baseline)|
|-|-|-|-|-|-|-|
|200|188.5±0.1/25.6±0.5|181.4±0.2/26.7±0.3|180.2±0.1/27.8±0.6|182.3±0.1/26.2±0.2|190.2±0.2/22±0.2 |**178.4±0.2/29.7±0.3**|
|400|171.5±0.3/31.8±0.3|152.8±0.1/37.5±0.2|156.0±0.1/36.0±0.1|157.2±0.3/34.8±0.5|178.3±0.2/29.5±0.1|**151.2±0.1/37.5±0.2**|
|800|155.7±0.2/35.8±0.3|132.1±0.1/46.5±0.1|133.1±0.2/46.2±0.1|134.2±0.1/45.8±0.2|167.5±0.3/32.8±0.5|**127.8±0.2/49.2±0.3**|

Recurrence
|Compute|BoN-1|BoN-2|FK|DAS|TreeG|SVDD|
|-|-|-|-|-|-|-|
|200|**178.4±0.2/29.7±0.3**|180±0.3/28±0.1|181.4±0.2/26.7±0.3|180.2±0.1/27.8±0.6|182.3±0.1/26.2±0.2|190.2±0.2/22±0.2|
|400|151.2±0.1/37.5±0.2|**146±0.5/43±0.3**|152.8±0.1/37.5±0.2|156.0±0.1/36.0±0.1|157.2±0.3/34.8±0.5|178.3±0.2/29.5±0.1|
|800|127.8±0.2/49.2±0.3|**118±0.1/52.3±0.2**|132.1±0.1/46.5±0.1|133.1±0.2/46.2±0.1|134.2±0.1/45.8±0.2|167.5±0.3/32.8±0.5|

Classifier guidance
|Compute|BoN|BFS|DFS|
|-|-|-|-|
|200|58.6±0.2/80.5±0.4|56.6±0.1/81±0.2|56.2±0.2/90±0.3|
|400|54.5±0.3/88.5±0.5|54.2±0.2/92.2±0.3|-|

OOD
|Compute|BoN-single|BoN-double|
|-|-|-|
|400|0.161|0.164|
|800|0.165|0.184|

Comparison between inference scaling and classifier guidance
We compare against classifier guidance, the state-of-the-art training-based method for guided image generation, as well as against TFG:
|Method|Compute (NFEs)|FID↓|Acc↑|
|-|-|-|-|
|Inference-scaling|1100|97±0.2|**65.5±0.3**|
|Classifier guidance|100 + training compute|**70**|59.5|
|TFG|100|206|22|
#### DFS
|Compute|BoN|DFS-0.5|DFS-0.7|DFS-0.9|FK|DAS|TreeG|SVDD|
|-|-|-|-|-|-|-|-|-|
|100|0.631±0.001|0.682±0.002|**0.693±0.001**|-|0.635±0.001|0.637±0.002|0.633±0.003|0.630±0.003|
|150|0.675±0.002|0.721±0.001|**0.725±0.002**|0.708±0.001|0.678±0.002|0.683±0.001|0.676±0.003|0.653±0.001|
|200|0.704±0.002|-|**0.737±0.003**|0.733±0.002|0.710±0.002|0.711±0.001|0.709±0.002|0.667±0.001|
|250|0.719±0.002|-|-|**0.748±0.001**|0.721±0.002|0.723±0.002|0.722±0.001|0.679±0.001|