
[Note] Build OpenVINO

tags: Openvino VPU edge devices Lab research Raspberry Pi

Table of Contents

  1. aarch64 (Raspberry Pi 4)
  2. x86_64 (AMD 4700U)
  3. Benchmark results (CPU and Intel NCS2 MYRIAD)

aarch64 platform

1. Install tools for the build

  1. CMake (version 3.14.4 or newer)
$ sudo apt install build-essential
$ sudo apt install cmake
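  • To check the version that apt installed (older Raspberry Pi OS releases may ship a fairly old CMake), run:
$ cmake --version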

2. Build OpenCV==4.5.5 from source

  • The most difficult part
  • To ease the pain of building OpenCV, please refer to this article (a rough sketch is also shown below)
  • The entire build takes about 1~1.5 hours to complete
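  • A minimal sketch of the OpenCV 4.5.5 build, assuming the default module set; the exact CMake flags are what the article above tunes, so treat these as placeholders:
# fetch and unpack the 4.5.5 sources
wget -O opencv-4.5.5.tar.gz https://github.com/opencv/opencv/archive/refs/tags/4.5.5.tar.gz
tar xf opencv-4.5.5.tar.gz && cd opencv-4.5.5
mkdir build && cd build
# minimal configuration; add extra -D options (NEON, extra modules, ...) as needed
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local ..
# build with all cores and install to /usr/local
make -j"$(nproc)"
sudo make install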

3. Build OpenVINO

1) Download source code

mkdir ~/openvino_build && cd ~/openvino_build
# Get the latest version
git clone https://github.com/openvinotoolkit/openvino.git
# Or get the 2021.4.2 LTS release instead:
# git clone --depth 1 --branch 2021.4.2 https://github.com/openvinotoolkit/openvino.git
# fetch the openvino submodules (run inside the cloned repo)
cd openvino
git submodule update --init --recursive

2) Install dependencies

  • Install the following dependencies
cd ~/openvino_build/openvino
sudo apt-get install libpython3-dev
sudo apt-get install python3-pip
pip install numpy
pip install cython

3) Build from source

  • Enable the custom build options below
  • To invoke OpenVINO via the Python API, first find the path of the Python shared library object file (a quick way to locate it is shown after this list)
    • The Python shared library MUST have the .so extension
    • It can usually be found under /usr/lib/<your-architecture>-linux-gnu/
    • e.g. /usr/lib/python3.9/config-3.9-aarch64-linux-gnu/libpython3.9.so
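  • A simple way to locate the library (the exact path depends on your Python version and distribution):
$ find /usr/lib -name "libpython3*.so"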
# Specify path to the OpenCV library
export OpenCV_DIR=/usr/local/lib
cd ~/openvino_build/openvino
mkdir build && cd build
# Specify options for cmake
cmake -DCMAKE_BUILD_TYPE=Release \
      -DENABLE_MKL_DNN=OFF \
      -DENABLE_CLDNN=ON \
      -DENABLE_GNA=OFF \
      -DENABLE_SSE42=OFF \
      -DENABLE_SAMPLES=ON \
      -DENABLE_PYTHON=ON \
      -DPYTHON_EXECUTABLE=`which python3.9` \
      -DPYTHON_LIBRARY=/usr/lib/aarch64-linux-gnu/libpython3.9.so \
      -DPYTHON_INCLUDE_DIR=/usr/include/python3.9 \
      ..
# Build OpenVINO with all available threads
make --jobs=$(nproc --all)

4) Export environment variables

  • Add the following 2 commands to the bottom of your ~/.bashrc (or the config file of your $SHELL)
# Add the OpenVINO library paths to the Python path and the dynamic linker path
export PYTHONPATH=$PYTHONPATH:~/openvino_build/openvino/bin/aarch64/Release/python_api/python3.9
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/openvino_build/openvino/bin/aarch64/Release
  • Import OpenVINO via the Python API
  • If the build succeeded, the import returns without any error
$ python3 >>> from openvino.runtime import Core

5) (Optional) udev rules for Intel NCS2

sudo usermod -a -G users "$(whoami)"
# install USB rules for the NCS2
cat <<EOF > 97-myriad-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
sudo cp 97-myriad-usbboot.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo ldconfig
rm 97-myriad-usbboot.rules

6) (Optional) ARM Neural Network Library Plugin

  • Enable the ARM CPU plugin (to run models on the ARM CPU); a quick availability check follows the build commands below
# install dependencies
sudo apt-get update
sudo apt-get install -y git cmake scons build-essential
cd ~/openvino_build
# clone the openvino_contrib repo
git clone --recurse-submodules --single-branch --branch=master https://github.com/openvinotoolkit/openvino_contrib.git
# change to the arm_plugin folder
cd openvino_contrib/modules/arm_plugin
mkdir build && cd build
# configure cmake and build
cmake -DIE_EXTRA_MODULES=~/openvino_build/openvino_contrib/modules/arm_plugin ..
make --jobs=$(nproc --all)
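  • Once the built plugin library has been made visible to the runtime (e.g. copied next to the other OpenVINO runtime libraries or registered in plugins.xml; this step is an assumption here, follow the openvino_contrib README for the exact procedure), a quick check that a CPU device is now exposed:
# list the devices OpenVINO can see; "CPU" should appear once the ARM plugin is registered
python3 -c "from openvino.runtime import Core; print(Core().available_devices)"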

4. Test OpenVINO with the built-in benchmark_app

  • The binary is under the build output folder: ~/openvino_build/openvino/bin/aarch64/Release/benchmark_app (example invocation below)
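  • A minimal example run; the model path is a placeholder, so substitute your own IR (.xml) or compiled blob:
cd ~/openvino_build/openvino/bin/aarch64/Release
# -m: model file (hypothetical path), -d: target device (MYRIAD for the NCS2, or CPU once the ARM plugin is set up)
./benchmark_app -m ~/models/model.xml -d MYRIAD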


x86_64

  • Reference
  • Demo platform
    • AMD Ryzen 4700U (x64 arch.)
    • Ubuntu 22.04.1 LTS with 5.15.0-52-generic kernel

CMake options & building OpenVINO with all cores

  • Different Python versions were managed with pyenv
  • To enable the shared library (libpython3.8.so) under the pyenv manager (a quick check is shown below):
    • env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.8.11
    • refer to this article for more details.
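  • Before configuring the build, it is worth confirming the shared library actually exists (the path follows from the pyenv install command above):
$ ls ~/.pyenv/versions/3.8.11/lib/libpython3.8.so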
Using the pyenv-managed Python 3.8:
cmake -DCMAKE_BUILD_TYPE=Release \
      -DENABLE_OV_ONNX_FRONTEND=ON \
      -DENABLE_INTEL_GPU=OFF \
      -DENABLE_PYTHON=ON \
      -DPYTHON_EXECUTABLE=`which python3.8` \
      -DPYTHON_LIBRARY=~/.pyenv/versions/3.8.11/lib/libpython3.8.so \
      -DPYTHON_INCLUDE_DIR=~/.pyenv/versions/3.8.11/include \
      ..
make --jobs=$(nproc --all)
Or, using the system Python 3.8 instead:
cmake -DCMAKE_BUILD_TYPE=Release \
      -DENABLE_OV_ONNX_FRONTEND=ON \
      -DENABLE_INTEL_GPU=OFF \
      -DENABLE_PYTHON=ON \
      -DPYTHON_EXECUTABLE=`which python3.8` \
      -DPYTHON_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.8.so \
      -DPYTHON_INCLUDE_DIR=/usr/include/python3.8 \
      ..

Wrap OpenVINO with the Python API

  • Append the following 2 commands to the bottom of ~/.bashrc or your shell's config file, then verify as shown below.
export PYTHONPATH=$PYTHONPATH:~/openvino_build/openvino/bin/intel64/Release/python_api/python3.8
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/openvino_build/openvino/bin/intel64/Release
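  • To confirm the wrap works, reload the shell config and import the runtime (this assumes the 2022.x Python API, where get_version is available):
source ~/.bashrc
# print the runtime version to confirm the import resolves to the freshly built OpenVINO
python3 -c "from openvino.runtime import get_version; print(get_version())"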

(Optional) udev rules for Intel Myriad NCS2

  1. Add the current Linux user to the users group; you will need to log out and log in for it to take effect:
sudo usermod -a -G users "$(whoami)"
  2. To perform inference on the Intel® Neural Compute Stick 2, install the USB rules as follows (a quick check is shown after the commands):
cat <<EOF > 97-myriad-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
sudo cp 97-myriad-usbboot.rules /etc/udev/rules.d/ 
sudo udevadm control --reload-rules 
sudo udevadm trigger 
sudo ldconfig
rm 97-myriad-usbboot.rules
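  • After re-plugging the stick, it should enumerate under the Intel Movidius vendor ID 03e7 (the product ID differs depending on whether the firmware has booted, hence the two rules above):
lsusb | grep -i 03e7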


Benchmark results

1. levit-128s

  • CPU (AMD Ryzen 4700U, 8C8T)
  • blob, FP16
  • Refer to this article (the SnippetsOpset "unsupported opset" bug)
./benchmark_app -m ~/public/levit-128s/FP16/CPU_MYRIAD_levit-128s.blob -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/levit-128s/FP16/CPU_MYRIAD_levit-128s.blob -d CPU 
[ INFO ] Network is compiled
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Skipping the step for compiled network
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Skipping the step for compiled network
[Step 6/11] Configuring input of the model
[ INFO ] Skipping the step for compiled network
[Step 7/11] Loading the model to the device
[ ERROR ] Check 'false' failed at frontends/common/src/frontend.cpp:53:
Converting input model
Cannot create Subgraph layer Multiply_16115 id:43 from unsupported opset: SnippetsOpset
  • CPU (AMD Ryzen 4700U)
  • xml, FP16
./benchmark_app -m ~/public/levit-128s/FP16/levit-128s.xml -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/levit-128s/FP16/levit-128s.xml -d CPU 
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 73.08 ms
[ INFO ] Original network I/O parameters:
Network inputs:
    image (node: image) : f32 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f32 / [...] / {1,1000}
[Step 5/11] Resizing network to match image sizes and given batch
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
    image (node: image) : u8 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f32 / [...] / {1,1000}
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 1023.34 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 8 }
[ INFO ]   { NUM_STREAMS , 8 }
[ INFO ]   { AFFINITY , CORE }
[ INFO ]   { INFERENCE_NUM_THREADS , 8 }
[ INFO ]   { PERF_COUNT , NO }
[ INFO ]   { INFERENCE_PRECISION_HINT , f32 }
[ INFO ]   { PERFORMANCE_HINT , THROUGHPUT }
[ INFO ]   { PERFORMANCE_HINT_NUM_REQUESTS , 0 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] image  ([N,C,H,W], u8, {1, 3, 224, 224}, static):      random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 27.88 ms

[Step 11/11] Dumping statistics report
[ INFO ] Count:             26432 iterations
[ INFO ] Duration:          60020.27 ms
[ INFO ] Latency:
[ INFO ]        Median:     15.47 ms
[ INFO ]        Average:    18.00 ms
[ INFO ]        Min:        13.21 ms
[ INFO ]        Max:        200.96 ms
[ INFO ] Throughput: 440.38 FPS
  • Intel NCS2 MYRIAD
  • blob, FP16
$ ./benchmark_app -m ~/public/levit-128s/FP16/levit-128s.blob -d MYRIAD
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/levit-128s/FP16/levit-128s.blob -d MYRIAD 
[ INFO ] Network is compiled
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] MYRIAD
[ INFO ] openvino_intel_myriad_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(MYRIAD) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Skipping the step for compiled network
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Skipping the step for compiled network
[Step 6/11] Configuring input of the model
[ INFO ] Skipping the step for compiled network
[Step 7/11] Loading the model to the device
W: [ncTool] [     99521] [benchmark_app] patchSetMemTypeCommand:205     Fail to find call command
W: [ncTool] [     99521] [benchmark_app] bootDevice:233 Fail to patch "Set memory type" command for firmware sc = -2
[ INFO ] Import network took 1954.21 ms
[ INFO ] Original network I/O paramteters:
Network inputs:
    image (node: image) : f16 / [...] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f16 / [...] / {1,1000}
[ WARNING ] image: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: MYRIAD
[ INFO ]   { NETWORK_NAME , __importedExecutableNetworkFromBlobName }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 4 }
[ INFO ]   { DEVICE_THERMAL , 46.4524 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] image  ([N,C,H,W], f16, {1, 3, 224, 224}, static):     random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 106.65 ms

[Step 11/11] Dumping statistics report
[ INFO ] Count:             776 iterations
[ INFO ] Duration:          60328.09 ms
[ INFO ] Latency:
[ INFO ]        Median:     310.87 ms
[ INFO ]        Average:    310.55 ms
[ INFO ]        Min:        176.46 ms
[ INFO ]        Max:        345.88 ms
[ INFO ] Throughput: 12.86 FPS

2. swin-tiny-patch4-window7-224

  • CPU (AMD Ryzen 4700U, 8C8T)
  • xml, FP16
./benchmark_app -m ~/public/swin-tiny-patch4-window7-224/FP16/swin-tiny-patch4-window7-224.xml -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/swin-tiny-patch4-window7-224/FP16/swin-tiny-patch4-window7-224.xml -d CPU 
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 104.10 ms
[ INFO ] Original network I/O parameters:
Network inputs:
    input (node: input) : f32 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f32 / [...] / {1,1000}
[Step 5/11] Resizing network to match image sizes and given batch
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
    input (node: input) : u8 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f32 / [...] / {1,1000}
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 815.24 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 4 }
[ INFO ]   { NUM_STREAMS , 4 }
[ INFO ]   { AFFINITY , CORE }
[ INFO ]   { INFERENCE_NUM_THREADS , 8 }
[ INFO ]   { PERF_COUNT , NO }
[ INFO ]   { INFERENCE_PRECISION_HINT , f32 }
[ INFO ]   { PERFORMANCE_HINT , THROUGHPUT }
[ INFO ]   { PERFORMANCE_HINT_NUM_REQUESTS , 0 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] input  ([N,C,H,W], u8, {1, 3, 224, 224}, static):      random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 130.01 ms

[Step 11/11] Dumping statistics report
[ INFO ] Count:             1520 iterations
[ INFO ] Duration:          60136.50 ms
[ INFO ] Latency:
[ INFO ]        Median:     137.26 ms
[ INFO ]        Average:    158.10 ms
[ INFO ]        Min:        86.66 ms
[ INFO ]        Max:        476.25 ms
[ INFO ] Throughput: 25.28 FPS

3. deit-tiny-distilled-patch16-224

  • CPU
  • xml
    $ ./benchmark_app -m ~/public/deit-tiny-distilled-patch16-224/deit-tiny-distilled-patch16-224.xml -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/deit-tiny-distilled-patch16-224/deit-tiny-distilled-patch16-224.xml -d CPU 
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 33.28 ms
[ INFO ] Original network I/O parameters:
Network inputs:
    x (node: x) : f32 / [...] / {1,3,224,224}
Network outputs:
    1127 (node: 1127) : f32 / [...] / {1,1000}
[Step 5/11] Resizing network to match image sizes and given batch
[ WARNING ] x: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
    x (node: x) : u8 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    1127 (node: 1127) : f32 / [...] / {1,1000}
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 1087.66 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 8 }
[ INFO ]   { NUM_STREAMS , 8 }
[ INFO ]   { AFFINITY , CORE }
[ INFO ]   { INFERENCE_NUM_THREADS , 8 }
[ INFO ]   { PERF_COUNT , NO }
[ INFO ]   { INFERENCE_PRECISION_HINT , f32 }
[ INFO ]   { PERFORMANCE_HINT , THROUGHPUT }
[ INFO ]   { PERFORMANCE_HINT_NUM_REQUESTS , 0 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] x  ([N,C,H,W], u8, {1, 3, 224, 224}, static):  random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 98.19 ms

[Step 11/11] Dumping statistics report
[ INFO ] Count:             8440 iterations
[ INFO ] Duration:          60068.46 ms
[ INFO ] Latency:
[ INFO ]        Median:     47.66 ms
[ INFO ]        Average:    56.86 ms
[ INFO ]        Min:        39.28 ms
[ INFO ]        Max:        541.54 ms
[ INFO ] Throughput: 140.51 FPS