# A100 Benchmark

### host<->device bandwidth

```
Device 0: A100-SXM4-40GB
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(GB/s)
  32000000                 24.6

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(GB/s)
  32000000                 25.9

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(GB/s)
  32000000                 1091.6
```

### inter-GPU bandwidth

```
P2P Connectivity Matrix
   D\D     0      1
     0     1      1
     1     1      1

Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0        1
     0  1276.55    17.38
     1    17.51  1279.69

Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0        1
     0  1278.64   265.74
     1   264.77  1288.13

Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0        1
     0  1288.13    17.78
     1    18.26  1301.54

Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0        1
     0  1302.08   515.24
     1   528.83  1304.80

P2P=Disabled Latency Matrix (us)
   GPU     0        1
     0     2.33    24.14
     1    24.45     2.81

   CPU     0        1
     0     3.03     9.05
     1     9.02     3.00

P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0        1
     0     2.33     3.02
     1     2.98     2.74

   CPU     0        1
     0     3.06     2.38
     1     2.45     3.06
```
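The cross-GPU (off-diagonal) entries are easier to compare as ratios. The sketch below is a minimal, hypothetical helper: the dictionary and function names are illustrative, and every number is copied from the inter-GPU output (which appears to follow the format of the CUDA samples' `p2pBandwidthLatencyTest`). Nothing is re-measured; it only does the arithmetic.

```python
"""Derive ratios from the GPU0 <-> GPU1 figures reported in the inter-GPU output.

All values are copied from the benchmark output (GB/s for bandwidth, us for latency).
The variable names are hypothetical and exist only for this illustration.
"""

# Off-diagonal (cross-GPU) entries, keyed by (source, destination) device index.
UNI_P2P_OFF_GBPS = {(0, 1): 17.38, (1, 0): 17.51}
UNI_P2P_ON_GBPS = {(0, 1): 265.74, (1, 0): 264.77}
BI_P2P_ON_GBPS = {(0, 1): 515.24, (1, 0): 528.83}
LAT_P2P_OFF_US = {(0, 1): 24.14, (1, 0): 24.45}
LAT_P2P_ON_US = {(0, 1): 3.02, (1, 0): 2.98}


def ratio(numer: dict, denom: dict) -> dict:
    """Return numer[k] / denom[k] for every key, rounded to two decimals."""
    return {k: round(numer[k] / denom[k], 2) for k in numer}


# How much faster unidirectional copies are once P2P is enabled.
p2p_bandwidth_speedup = ratio(UNI_P2P_ON_GBPS, UNI_P2P_OFF_GBPS)
# Bidirectional vs. unidirectional bandwidth, both with P2P enabled.
bidir_over_unidir = ratio(BI_P2P_ON_GBPS, UNI_P2P_ON_GBPS)
# How much lower GPU-side latency is once P2P is enabled.
p2p_latency_reduction = ratio(LAT_P2P_OFF_US, LAT_P2P_ON_US)

if __name__ == "__main__":
    print("P2P on/off unidirectional bandwidth:", p2p_bandwidth_speedup)
    print("Bidirectional / unidirectional (P2P on):", bidir_over_unidir)
    print("GPU latency, P2P off / on:", p2p_latency_reduction)
```

With these inputs, enabling P2P raises cross-GPU bandwidth by roughly 15x (15.29 and 15.12) and cuts GPU-side latency by roughly 8x (7.99 and 8.2). The bidirectional figure is about 1.94 to 2.0 times the unidirectional one, which is consistent with the bidirectional test reporting the combined traffic of both directions; treat that interpretation as an assumption rather than something the output states.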