DG2 performance on Windows

---
title: 'DG2 performance on Windows'
disqus: hackmd
---

DG2 performance on Windows
===

## Table of Contents

[TOC]

## BKC

* ADL-S RVP
* FRD4 128EU B1 Arc A380
* Win10 10.0.19044.2006
* DG2
    * FRD1: FRD_DG2_512_C1_ES_136_IFWI_22WW38_02_GS1879_PC9771_1059_SN_V5_14GT_TRC_DS_C8.bin
    * FRD4: FRD4_DG2_128_B0_ES_276_IFWI_22WW39_04_GS1899_PC9775C_OP1059_SN_V5_15.5GT_C8_TR_DS.bin
* Driver: 31.0.101.3276

## DG2 AIC FRD1 512 EU

Default -d GPU.1

```
C:\Users\Win10>openvino\Scripts\activate
(openvino) C:\Users\Win10>benchmark_app -m C:\Users\Win10\Desktop\openvino\openvino_models\public\yolo-v4-tf\FP16-INT8\yolo-v4-tf.xml -d GPU.1
[Step 1/11] Parsing and validating input arguments
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device GPU.1 performance hint will be set to THROUGHPUT.
[ INFO ] OpenVINO:
         API version............. 2022.2.0-7690-940e927a22b-refs/pull/1296/head
[ INFO ] Device info
         GPU
         Intel GPU plugin........ version 2022.2
         Build................... 2022.2.0-7690-940e927a22b-refs/pull/1296/head

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for GPU.1 device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Read model took 127.79 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'image_input' precision u8, dimensions ([N,H,W,C]): 1 608 608 3
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_542:0' precision f32, dimensions ([...]): 1 38 38 255
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_543:0' precision f32, dimensions ([...]): 1 19 19 255
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_544:0' precision f32, dimensions ([...]): 1 76 76 255
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 16760.36 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: GPU.1
[ INFO ] AVAILABLE_DEVICES , ['0', '1']
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS , (1, 2, 1)
[ INFO ] RANGE_FOR_STREAMS , (1, 2)
[ INFO ] OPTIMAL_BATCH_SIZE , 1
[ INFO ] MAX_BATCH_SIZE , 1
[ INFO ] FULL_DEVICE_NAME , Intel(R) Arc(TM) A770 Graphics (dGPU)
[ INFO ] DEVICE_TYPE , Type.DISCRETE
[ INFO ] OPTIMIZATION_CAPABILITIES , ['FP32', 'BIN', 'FP16', 'INT8', 'GPU_HW_MATMUL']
[ INFO ] GPU_UARCH_VERSION , 12.7.1
[ INFO ] GPU_EXECUTION_UNITS_COUNT , 512
[ INFO ] PERF_COUNT , False
[ INFO ] MODEL_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE , Priority.MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING , True
[ INFO ] CACHE_DIR ,
[ INFO ] PERFORMANCE_HINT , PerformanceMode.THROUGHPUT
[ INFO ] COMPILATION_NUM_THREADS , 24
[ INFO ] NUM_STREAMS , 1
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS , 0
[ INFO ] DEVICE_ID , 1
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 64 infer requests took 454.93 ms
[ WARNING ] No input files were given for input 'image_input'!. This input will be filled with random values!
[ INFO ] Fill input 'image_input' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 64 inference requests using 1 streams for GPU.1, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 652.58 ms
[Step 11/11] Dumping statistics report
Count: 16640 iterations
Duration: 60293.32 ms
Latency:
    Median: 231.60 ms
    AVG: 231.50 ms
    MIN: 119.99 ms
    MAX: 237.63 ms
Throughput: 275.98 FPS

(openvino) C:\Users\Win10>
```

> -d GPU.1 -hint none
> Count: 5452 iterations
> Duration: 60027.22 ms
> Latency:
> Median: 21.93 ms
> AVG: 21.93 ms
> MIN: 11.59 ms
> MAX: 23.97 ms
> Throughput: 90.83 FPS

> -d GPU.1 -b 8 -nstream 4 -hint none
> Count: 1136 iterations
> Duration: 60574.46 ms
> Latency:
> Median: 421.04 ms
> AVG: 425.30 ms
> MIN: 122.51 ms
> MAX: 461.08 ms
> Throughput: 150.03 FPS

> -d GPU.1 -b 12 -nstream 4 -hint none
> Count: 736 iterations
> Duration: 61253.30 ms
> Latency:
> Median: 666.41 ms
> AVG: 662.91 ms
> MIN: 213.72 ms
> MAX: 697.80 ms
> Throughput: 144.19 FPS

> -d GPU.1 -b 16 -nstream 4 -hint none
> Count: 32 iterations
> Duration: 94451.72 ms
> Latency:
> Median: 23589.75 ms
> AVG: 21390.65 ms
> MIN: 11717.85 ms
> MAX: 23649.15 ms
> Throughput: 5.42 FPS

> -d GPU.1 -b 8 -nstream 8 -hint none
> Count: 1152 iterations
> Duration: 61502.75 ms
> Latency:
> Median: 855.28 ms
> AVG: 848.88 ms
> MIN: 231.36 ms
> MAX: 906.93 ms
> Throughput: 149.85 FPS

> -d GPU.1 -hint throughput
> Count: 16384 iterations
> Duration: 60663.84 ms
> Latency:
> Median: 472.02 ms
> AVG: 472.80 ms
> MIN: 246.64 ms
> MAX: 516.02 ms
> Throughput: 270.08 FPS

DG2 AIC FRD4 128 EU
---

```
(openvino) C:\Users\Win10>benchmark_app -m C:\Users\Win10\Desktop\openvino\openvino_models\public\yolo-v4-tf\FP16-INT8\yolo-v4-tf.xml -d GPU.1
[Step 1/11] Parsing and validating input arguments
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device GPU.1 performance hint will be set to THROUGHPUT.
[ INFO ] OpenVINO:
         API version............. 2022.2.0-7690-940e927a22b-refs/pull/1296/head
[ INFO ] Device info
         GPU
         Intel GPU plugin........ version 2022.2
         Build................... 2022.2.0-7690-940e927a22b-refs/pull/1296/head

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for GPU.1 device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Read model took 161.76 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'image_input' precision u8, dimensions ([N,H,W,C]): 1 608 608 3
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_542:0' precision f32, dimensions ([...]): 1 38 38 255
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_543:0' precision f32, dimensions ([...]): 1 19 19 255
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_544:0' precision f32, dimensions ([...]): 1 76 76 255
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 17912.19 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: GPU.1
[ INFO ] AVAILABLE_DEVICES , ['0', '1']
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS , (1, 2, 1)
[ INFO ] RANGE_FOR_STREAMS , (1, 2)
[ INFO ] OPTIMAL_BATCH_SIZE , 1
[ INFO ] MAX_BATCH_SIZE , 1
[ INFO ] FULL_DEVICE_NAME , Intel(R) Arc(TM) A380 Graphics (dGPU)
[ INFO ] DEVICE_TYPE , Type.DISCRETE
[ INFO ] OPTIMIZATION_CAPABILITIES , ['FP32', 'BIN', 'FP16', 'INT8', 'GPU_HW_MATMUL']
[ INFO ] GPU_UARCH_VERSION , 12.7.1
[ INFO ] GPU_EXECUTION_UNITS_COUNT , 128
[ INFO ] PERF_COUNT , False
[ INFO ] MODEL_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE , Priority.MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING , True
[ INFO ] CACHE_DIR ,
[ INFO ] PERFORMANCE_HINT , PerformanceMode.THROUGHPUT
[ INFO ] COMPILATION_NUM_THREADS , 24
[ INFO ] NUM_STREAMS , 1
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS , 0
[ INFO ] DEVICE_ID , 1
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 16 infer requests took 143.93 ms
[ WARNING ] No input files were given for input 'image_input'!. This input will be filled with random values!
[ INFO ] Fill input 'image_input' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 16 inference requests using 1 streams for GPU.1, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 912.59 ms
[Step 11/11] Dumping statistics report
Count: 4624 iterations
Duration: 60273.46 ms
Latency:
    Median: 207.98 ms
    AVG: 208.24 ms
    MIN: 103.81 ms
    MAX: 1146.41 ms
Throughput: 76.72 FPS
```

> -d GPU.1 -hint throughput
> Count: 4704 iterations
> Duration: 60499.28 ms
> Latency:
> Median: 411.43 ms
> AVG: 410.37 ms
> MIN: 202.38 ms
> MAX: 412.55 ms
> Throughput: 77.75 FPS

> -d GPU.1 -b 8 -nstream 8 -hint none
> Count: 608 iterations
> Duration: 62618.32 ms
> Latency:
> Median: 1647.54 ms
> AVG: 1627.46 ms
> MIN: 381.35 ms
> MAX: 1651.07 ms
> Throughput: 77.68 FPS

![](https://i.imgur.com/NvNzIWB.png)

###### tags: `DG2` `OPENVINO`
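
The throughput figures reported in the runs above appear to equal `Count × batch / Duration`, which makes it easy to sanity-check a pasted summary or to compare the two cards per execution unit. The following is a minimal sketch of that arithmetic, not part of `benchmark_app` itself: the helper names `parse_summary`, `throughput_fps`, and `fps_per_eu` are hypothetical, and the regular expression assumes the summary layout shown in the logs above.

```python
import re

# Assumed layout: "Count: N iterations ... Duration: X ms ... Throughput: Y FPS",
# as printed in step 11/11 of the logs in this note.
SUMMARY_RE = re.compile(
    r"Count:\s*(?P<count>\d+)\s*iterations\s*"
    r"Duration:\s*(?P<duration_ms>[\d.]+)\s*ms"
    r".*?Throughput:\s*(?P<fps>[\d.]+)\s*FPS",
    re.DOTALL,
)


def parse_summary(text):
    """Extract iteration count, duration (ms) and reported FPS from a summary."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no benchmark summary found in text")
    return {
        "count": int(match["count"]),
        "duration_ms": float(match["duration_ms"]),
        "fps": float(match["fps"]),
    }


def throughput_fps(count, duration_ms, batch=1):
    """Frames per second: each iteration processes `batch` frames."""
    return count * batch / (duration_ms / 1000.0)


def fps_per_eu(fps, execution_units):
    """Throughput normalized by the GPU_EXECUTION_UNITS_COUNT value."""
    return fps / execution_units


if __name__ == "__main__":
    # Values copied from the "-b 8 -nstream 4 -hint none" run on the 512 EU card.
    sample = (
        "Count: 1136 iterations Duration: 60574.46 ms Latency: Median: 421.04 ms "
        "AVG: 425.30 ms MIN: 122.51 ms MAX: 461.08 ms Throughput: 150.03 FPS"
    )
    stats = parse_summary(sample)
    recomputed = throughput_fps(stats["count"], stats["duration_ms"], batch=8)
    print(f"reported {stats['fps']:.2f} FPS, recomputed {recomputed:.2f} FPS")

    # Default "-d GPU.1" runs: 512 EU card vs 128 EU card.
    print(f"512 EU: {fps_per_eu(275.98, 512):.3f} FPS per EU")
    print(f"128 EU: {fps_per_eu(76.72, 128):.3f} FPS per EU")
```

The batch size is passed in by hand because the summary block does not repeat the `-b` value. Normalizing by execution units is only a rough comparison, since the two cards also differ in clocks and memory configuration.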