---
title: 'DG2 performance on Windows'
disqus: hackmd
---
DG2 performance on Windows
===
## Table of Contents
[TOC]
## BKC
* ADL-S RVP
* FRD4 128EU B1 Arc A380
* Win10 10.0.19044.2006
* DG2 IFWI
    * FRD1: `FRD_DG2_512_C1_ES_136_IFWI_22WW38_02_GS1879_PC9771_1059_SN_V5_14GT_TRC_DS_C8.bin`
    * FRD4: `FRD4_DG2_128_B0_ES_276_IFWI_22WW39_04_GS1899_PC9775C_OP1059_SN_V5_15.5GT_C8_TR_DS.bin`
* Driver: 31.0.101.3276
## DG2 AIC FRD1 512 EU
Default run with -d GPU.1 (no -hint given, so benchmark_app sets the performance hint to THROUGHPUT):
```
C:\Users\Win10>openvino\Scripts\activate
(openvino) C:\Users\Win10>benchmark_app -m C:\Users\Win10\Desktop\openvino\openvino_models\public\yolo-v4-tf\FP16-INT8\yolo-v4-tf.xml -d GPU.1
[Step 1/11] Parsing and validating input arguments
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device GPU.1 performance hint will be set to THROUGHPUT.
[ INFO ] OpenVINO:
API version............. 2022.2.0-7690-940e927a22b-refs/pull/1296/head
[ INFO ] Device info
GPU
Intel GPU plugin........ version 2022.2
Build................... 2022.2.0-7690-940e927a22b-refs/pull/1296/head
[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for GPU.1 device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Read model took 127.79 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'image_input' precision u8, dimensions ([N,H,W,C]): 1 608 608 3
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_542:0' precision f32, dimensions ([...]): 1 38 38 255
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_543:0' precision f32, dimensions ([...]): 1 19 19 255
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_544:0' precision f32, dimensions ([...]): 1 76 76 255
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 16760.36 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: GPU.1
[ INFO ] AVAILABLE_DEVICES , ['0', '1']
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS , (1, 2, 1)
[ INFO ] RANGE_FOR_STREAMS , (1, 2)
[ INFO ] OPTIMAL_BATCH_SIZE , 1
[ INFO ] MAX_BATCH_SIZE , 1
[ INFO ] FULL_DEVICE_NAME , Intel(R) Arc(TM) A770 Graphics (dGPU)
[ INFO ] DEVICE_TYPE , Type.DISCRETE
[ INFO ] OPTIMIZATION_CAPABILITIES , ['FP32', 'BIN', 'FP16', 'INT8', 'GPU_HW_MATMUL']
[ INFO ] GPU_UARCH_VERSION , 12.7.1
[ INFO ] GPU_EXECUTION_UNITS_COUNT , 512
[ INFO ] PERF_COUNT , False
[ INFO ] MODEL_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE , Priority.MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING , True
[ INFO ] CACHE_DIR ,
[ INFO ] PERFORMANCE_HINT , PerformanceMode.THROUGHPUT
[ INFO ] COMPILATION_NUM_THREADS , 24
[ INFO ] NUM_STREAMS , 1
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS , 0
[ INFO ] DEVICE_ID , 1
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 64 infer requests took 454.93 ms
[ WARNING ] No input files were given for input 'image_input'!. This input will be filled with random values!
[ INFO ] Fill input 'image_input' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 64 inference requests using 1 streams for GPU.1, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 652.58 ms
[Step 11/11] Dumping statistics report
Count: 16640 iterations
Duration: 60293.32 ms
Latency:
Median: 231.60 ms
AVG: 231.50 ms
MIN: 119.99 ms
MAX: 237.63 ms
Throughput: 275.98 FPS
(openvino) C:\Users\Win10>
```
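
The runs below keep only the statistics block of each log. As a convenience, that block can be turned into numbers with a few lines of Python. This is a minimal sketch, assuming the report keeps the plain `Label: value unit` layout shown above; the dictionary keys and the `parse_report` helper are names invented for this example and are not part of benchmark_app.

```python
import re

# Statistics block copied from the default FRD1 run above.
REPORT = """\
Count: 16640 iterations
Duration: 60293.32 ms
Latency:
Median: 231.60 ms
AVG: 231.50 ms
MIN: 119.99 ms
MAX: 237.63 ms
Throughput: 275.98 FPS
"""

# Report label -> (dictionary key, type). The keys are this sketch's own choice.
FIELDS = {
    "Count": ("count", int),
    "Duration": ("duration_ms", float),
    "Median": ("latency_median_ms", float),
    "AVG": ("latency_avg_ms", float),
    "MIN": ("latency_min_ms", float),
    "MAX": ("latency_max_ms", float),
    "Throughput": ("throughput_fps", float),
}

# A label, a colon, then a number; the bare "Latency:" line does not match.
LINE_RE = re.compile(r"^\s*(\w+):\s*([0-9.]+)")


def parse_report(text):
    """Return the numeric fields of one statistics report as a dict."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(1) in FIELDS:
            key, cast = FIELDS[match.group(1)]
            stats[key] = cast(match.group(2))
    return stats


stats = parse_report(REPORT)
print(stats)
```

Pasting any of the shorter result blocks from this note into `REPORT` should work the same way, since they use the same labels.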
>-d GPU.1 -hint none
Count: 5452 iterations
Duration: 60027.22 ms
Latency:
Median: 21.93 ms
AVG: 21.93 ms
MIN: 11.59 ms
MAX: 23.97 ms
Throughput: 90.83 FPS
>-d GPU.1 -b 8 -nstreams 4 -hint none
Count: 1136 iterations
Duration: 60574.46 ms
Latency:
Median: 421.04 ms
AVG: 425.30 ms
MIN: 122.51 ms
MAX: 461.08 ms
Throughput: 150.03 FPS
>-d GPU.1 -b 12 -nstreams 4 -hint none
Count: 736 iterations
Duration: 61253.30 ms
Latency:
Median: 666.41 ms
AVG: 662.91 ms
MIN: 213.72 ms
MAX: 697.80 ms
Throughput: 144.19 FPS
>-d GPU.1 -b 16 -nstreams 4 -hint none
Count: 32 iterations
Duration: 94451.72 ms
Latency:
Median: 23589.75 ms
AVG: 21390.65 ms
MIN: 11717.85 ms
MAX: 23649.15 ms
Throughput: 5.42 FPS
>-d GPU.1 -b 8 -nstreams 8 -hint none
Count: 1152 iterations
Duration: 61502.75 ms
Latency:
Median: 855.28 ms
AVG: 848.88 ms
MIN: 231.36 ms
MAX: 906.93 ms
Throughput: 149.85 FPS
>-d GPU.1 -hint throughput
Count: 16384 iterations
Duration: 60663.84 ms
Latency:
Median: 472.02 ms
AVG: 472.80 ms
MIN: 246.64 ms
MAX: 516.02 ms
Throughput: 270.08 FPS
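
The FPS figures in the batched runs count frames, not iterations: each reported throughput matches batch size × iterations ÷ duration. The sketch below re-derives the FRD1 numbers as a sanity check. The tuples are copied from the runs above; the `frames_per_second` helper is a name made up for this example.

```python
# (flags, batch size, iterations, duration in ms, reported FPS), copied from the FRD1 runs.
RUNS = [
    ("-hint none", 1, 5452, 60027.22, 90.83),
    ("-b 8 -nstreams 4 -hint none", 8, 1136, 60574.46, 150.03),
    ("-b 12 -nstreams 4 -hint none", 12, 736, 61253.30, 144.19),
    ("-b 16 -nstreams 4 -hint none", 16, 32, 94451.72, 5.42),
    ("-b 8 -nstreams 8 -hint none", 8, 1152, 61502.75, 149.85),
    ("-hint throughput", 1, 16384, 60663.84, 270.08),
]


def frames_per_second(batch, iterations, duration_ms):
    """Frames processed per second: every iteration handles `batch` frames."""
    return batch * iterations * 1000.0 / duration_ms


for flags, batch, iterations, duration_ms, reported in RUNS:
    fps = frames_per_second(batch, iterations, duration_ms)
    print(f"{flags:<30} computed {fps:7.2f} FPS, reported {reported:7.2f} FPS")
```

The computed values agree with the reported ones to within rounding, including the `-b 16` run, where only 32 iterations completed in about 94 seconds.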
## DG2 AIC FRD4 128 EU
```
(openvino) C:\Users\Win10>benchmark_app -m C:\Users\Win10\Desktop\openvino\openvino_models\public\yolo-v4-tf\FP16-INT8\yolo-v4-tf.xml -d GPU.1
[Step 1/11] Parsing and validating input arguments
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device GPU.1 performance hint will be set to THROUGHPUT.
[ INFO ] OpenVINO:
API version............. 2022.2.0-7690-940e927a22b-refs/pull/1296/head
[ INFO ] Device info
GPU
Intel GPU plugin........ version 2022.2
Build................... 2022.2.0-7690-940e927a22b-refs/pull/1296/head
[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for GPU.1 device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Read model took 161.76 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'image_input' precision u8, dimensions ([N,H,W,C]): 1 608 608 3
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_542:0' precision f32, dimensions ([...]): 1 38 38 255
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_543:0' precision f32, dimensions ([...]): 1 19 19 255
[ INFO ] Model output 'Func/StatefulPartitionedCall/output/_544:0' precision f32, dimensions ([...]): 1 76 76 255
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 17912.19 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: GPU.1
[ INFO ] AVAILABLE_DEVICES , ['0', '1']
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS , (1, 2, 1)
[ INFO ] RANGE_FOR_STREAMS , (1, 2)
[ INFO ] OPTIMAL_BATCH_SIZE , 1
[ INFO ] MAX_BATCH_SIZE , 1
[ INFO ] FULL_DEVICE_NAME , Intel(R) Arc(TM) A380 Graphics (dGPU)
[ INFO ] DEVICE_TYPE , Type.DISCRETE
[ INFO ] OPTIMIZATION_CAPABILITIES , ['FP32', 'BIN', 'FP16', 'INT8', 'GPU_HW_MATMUL']
[ INFO ] GPU_UARCH_VERSION , 12.7.1
[ INFO ] GPU_EXECUTION_UNITS_COUNT , 128
[ INFO ] PERF_COUNT , False
[ INFO ] MODEL_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY , Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE , Priority.MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING , True
[ INFO ] CACHE_DIR ,
[ INFO ] PERFORMANCE_HINT , PerformanceMode.THROUGHPUT
[ INFO ] COMPILATION_NUM_THREADS , 24
[ INFO ] NUM_STREAMS , 1
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS , 0
[ INFO ] DEVICE_ID , 1
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 16 infer requests took 143.93 ms
[ WARNING ] No input files were given for input 'image_input'!. This input will be filled with random values!
[ INFO ] Fill input 'image_input' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 16 inference requests using 1 streams for GPU.1, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 912.59 ms
[Step 11/11] Dumping statistics report
Count: 4624 iterations
Duration: 60273.46 ms
Latency:
Median: 207.98 ms
AVG: 208.24 ms
MIN: 103.81 ms
MAX: 1146.41 ms
Throughput: 76.72 FPS
```
> -d GPU.1 -hint throughput
Count: 4704 iterations
Duration: 60499.28 ms
Latency:
Median: 411.43 ms
AVG: 410.37 ms
MIN: 202.38 ms
MAX: 412.55 ms
Throughput: 77.75 FPS
>-d GPU.1 -b 8 -nstreams 8 -hint none
Count: 608 iterations
Duration: 62618.32 ms
Latency:
Median: 1647.54 ms
AVG: 1627.46 ms
MIN: 381.35 ms
MAX: 1651.07 ms
Throughput: 77.68 FPS
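
For a rough comparison of the two cards, the default `-d GPU.1` throughput can be set against the EU count. This is only arithmetic on the numbers in this note. The cards may differ in more than EU count (for example memory and clocks), so the ratio should not be read as a scaling law. The variable names are this sketch's own.

```python
# Default `-d GPU.1` results copied from the two full logs in this note.
FRD1 = {"name": "Arc A770", "eus": 512, "fps": 275.98}
FRD4 = {"name": "Arc A380", "eus": 128, "fps": 76.72}

eu_ratio = FRD1["eus"] / FRD4["eus"]
fps_ratio = FRD1["fps"] / FRD4["fps"]

for card in (FRD1, FRD4):
    per_eu = card["fps"] / card["eus"]
    print(f'{card["name"]}: {card["fps"]:.2f} FPS, {per_eu:.3f} FPS per EU')

print(f"EU ratio: {eu_ratio:.2f}x, throughput ratio: {fps_ratio:.2f}x")
```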

###### tags: `DG2` `OPENVINO`