# [11-02 to 11-06]

- Single-image throughput was 49 imgs/sec.
- Batch size for optimum throughput:

| Batch size | Time | Throughput (imgs/sec) |
| ---------- | ------ | --------------------- |
| 2 | 20 ms | 100 |
| 4 | 27 ms | 148 |
| 6 | 44 ms | 136 |
| 8 | 58 ms | 138 |
| 10 | 62 ms | 161 |
| 16 | 83 ms | 193 |
| 32* | 100 ms | 320 |
| 64 | 169 ms | 378 |
| 128 | 312 ms | 410 |
| 256 | out of memory | — |

Key GPU performance changes in the code:

- Exported the ONNX model with support for dynamic input dimensions.
- Set the maximum batch size when creating the engine.
- Use `execute_async` instead of `execute` for inference; it improves performance by overlapping work across GPU streams (data transfer with kernel compute).

(Figure: profile of the batch inference code — 384 image samples, batch size 32 — showing GPU streams.)
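The throughput column is consistent with throughput = batch_size / batch_latency. A quick sanity check against a few rows of the table above (latencies converted to seconds):

```python
# Verify that reported throughput matches batch_size / latency
# for a few rows of the benchmark table. Latencies are in seconds.
measurements = {2: 0.020, 4: 0.027, 32: 0.100, 128: 0.312}

for batch_size, latency in measurements.items():
    throughput = batch_size / latency
    print(f"batch {batch_size:>3}: {throughput:.0f} imgs/sec")
# batch   2: 100 imgs/sec
# batch   4: 148 imgs/sec
# batch  32: 320 imgs/sec
# batch 128: 410 imgs/sec
```

Note the diminishing returns: going from batch 32 to 128 quadruples the memory footprint for only ~28% more throughput, and 256 already exhausts GPU memory, which is why 32 (starred above) was picked as the operating point.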