
[Note] Build OpenVINO

tags: Openvino VPU edge devices Lab research Raspberry Pi

Table of Contents

  1. aarch64 (Raspberry Pi 4)
  2. x86_64 (AMD 4700U)
  3. Benchmark results (CPU and Intel NCS2 MYRIAD)

aarch64 platform

1. Install tools for the build

  1. CMake (version 3.14.4 or newer)
$ sudo apt install build-essential
$ sudo apt install cmake
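  • To check the version that apt installed (older Raspberry Pi OS releases may ship a fairly old CMake), run:
$ cmake --version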

2. Build OpenCV==4.5.5 from source

  • The most difficult part
  • To ease the pain of building OpenCV, please refer to this article (a rough sketch is also shown below)
  • The entire build takes about 1~1.5 hours to complete
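  • A minimal sketch of the OpenCV 4.5.5 build, assuming the default module set; the exact CMake flags are what the article above tunes, so treat these as placeholders:
# fetch and unpack the 4.5.5 sources
wget -O opencv-4.5.5.tar.gz https://github.com/opencv/opencv/archive/refs/tags/4.5.5.tar.gz
tar xf opencv-4.5.5.tar.gz && cd opencv-4.5.5
mkdir build && cd build
# minimal configuration; add extra -D options (NEON, extra modules, ...) as needed
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local ..
# build with all cores and install to /usr/local
make -j"$(nproc)"
sudo make install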

3. Build OpenVINO

1) Download source code

mkdir ~/openvino_build && cd ~/openvino_build
# Get the latest version
git clone https://github.com/openvinotoolkit/openvino.git
# Or get the 2021.4.2 LTS release instead:
# git clone --depth 1 --branch 2021.4.2 https://github.com/openvinotoolkit/openvino.git
# fetch the openvino submodules (run inside the cloned repo)
cd openvino
git submodule update --init --recursive

2) Install dependencies

  • Install the following dependencies
cd ~/openvino_build/openvino
sudo apt-get install libpython3-dev
sudo apt-get install python3-pip
pip install numpy
pip install cython

3) Build from source

  • Enable the custom build options below
  • To invoke OpenVINO via the Python API, first find the path of the Python shared library object file (a quick way to locate it is shown after this list)
    • The Python shared library MUST have the .so extension
    • It can usually be found under /usr/lib/<your-architecture>-linux-gnu/
    • e.g. /usr/lib/python3.9/config-3.9-aarch64-linux-gnu/libpython3.9.so
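  • A simple way to locate the library (the exact path depends on your Python version and distribution):
$ find /usr/lib -name "libpython3*.so"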
# Specify path to the OpenCV library
export OpenCV_DIR=/usr/local/lib
cd ~/openvino_build/openvino
mkdir build && cd build
# Specify options for cmake
cmake -DCMAKE_BUILD_TYPE=Release \
      -DENABLE_MKL_DNN=OFF \
      -DENABLE_CLDNN=ON \
      -DENABLE_GNA=OFF \
      -DENABLE_SSE42=OFF \
      -DENABLE_SAMPLES=ON \
      -DENABLE_PYTHON=ON \
      -DPYTHON_EXECUTABLE=`which python3.9` \
      -DPYTHON_LIBRARY=/usr/lib/aarch64-linux-gnu/libpython3.9.so \
      -DPYTHON_INCLUDE_DIR=/usr/include/python3.9 \
      ..
# Build OpenVINO with all available threads
make --jobs=$(nproc --all)

4) Export environment variables

  • Add the following 2 commands to the bottom of your ~/.bashrc (or the config file of your $SHELL)
# Add the OpenVINO library paths to the Python path and the dynamic linker path
export PYTHONPATH=$PYTHONPATH:~/openvino_build/openvino/bin/aarch64/Release/python_api/python3.9
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/openvino_build/openvino/bin/aarch64/Release
  • Import OpenVINO via the Python API
  • If the build succeeded, the import returns without any error
$ python3 >>> from openvino.runtime import Core

5) (Optional) udev rules for Intel NCS2

sudo usermod -a -G users "$(whoami)"
# install USB rules for the NCS2
cat <<EOF > 97-myriad-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
sudo cp 97-myriad-usbboot.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo ldconfig
rm 97-myriad-usbboot.rules

6) (Optional) ARM Neural Network Library Plugin

  • Enable the ARM CPU plugin (to run models on the ARM CPU); a quick availability check follows the build commands below
# install dependencies
sudo apt-get update
sudo apt-get install -y git cmake scons build-essential
cd ~/openvino_build
# clone the openvino_contrib repo
git clone --recurse-submodules --single-branch --branch=master https://github.com/openvinotoolkit/openvino_contrib.git
# change to the arm_plugin folder
cd openvino_contrib/modules/arm_plugin
mkdir build && cd build
# configure cmake and build
cmake -DIE_EXTRA_MODULES=~/openvino_build/openvino_contrib/modules/arm_plugin ..
make --jobs=$(nproc --all)
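  • Once the built plugin library has been made visible to the runtime (e.g. copied next to the other OpenVINO runtime libraries or registered in plugins.xml; this step is an assumption here, follow the openvino_contrib README for the exact procedure), a quick check that a CPU device is now exposed:
# list the devices OpenVINO can see; "CPU" should appear once the ARM plugin is registered
python3 -c "from openvino.runtime import Core; print(Core().available_devices)"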

4. Test OpenVINO with the built-in benchmark_app

  • The binary is under the build output folder: ~/openvino_build/openvino/bin/aarch64/Release/benchmark_app (example invocation below)
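  • A minimal example run; the model path is a placeholder, so substitute your own IR (.xml) or compiled blob:
cd ~/openvino_build/openvino/bin/aarch64/Release
# -m: model file (hypothetical path), -d: target device (MYRIAD for the NCS2, or CPU once the ARM plugin is set up)
./benchmark_app -m ~/models/model.xml -d MYRIAD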


x86_64

  • Reference
  • Demo platform
    • AMD Ryzen 4700U (x64 arch.)
    • Ubuntu 22.04.1 LTS with 5.15.0-52-generic kernel

CMake options & building OpenVINO with all cores

  • Different Python versions were managed with pyenv
  • To enable the shared library (libpython3.8.so) under the pyenv manager (a quick check is shown below):
    • env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.8.11
    • refer to this article for more details.
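  • Before configuring the build, it is worth confirming the shared library actually exists (the path follows from the pyenv install command above):
$ ls ~/.pyenv/versions/3.8.11/lib/libpython3.8.so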
Using the pyenv-managed Python 3.8:
cmake -DCMAKE_BUILD_TYPE=Release \
      -DENABLE_OV_ONNX_FRONTEND=ON \
      -DENABLE_INTEL_GPU=OFF \
      -DENABLE_PYTHON=ON \
      -DPYTHON_EXECUTABLE=`which python3.8` \
      -DPYTHON_LIBRARY=~/.pyenv/versions/3.8.11/lib/libpython3.8.so \
      -DPYTHON_INCLUDE_DIR=~/.pyenv/versions/3.8.11/include \
      ..
make --jobs=$(nproc --all)
Or, using the system Python 3.8 instead:
cmake -DCMAKE_BUILD_TYPE=Release \
      -DENABLE_OV_ONNX_FRONTEND=ON \
      -DENABLE_INTEL_GPU=OFF \
      -DENABLE_PYTHON=ON \
      -DPYTHON_EXECUTABLE=`which python3.8` \
      -DPYTHON_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.8.so \
      -DPYTHON_INCLUDE_DIR=/usr/include/python3.8 \
      ..

Wrap OpenVINO with the Python API

  • Append the following 2 commands to the bottom of ~/.bashrc or your shell's config file, then verify as shown below.
export PYTHONPATH=$PYTHONPATH:~/openvino_build/openvino/bin/intel64/Release/python_api/python3.8
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/openvino_build/openvino/bin/intel64/Release
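  • To confirm the wrap works, reload the shell config and import the runtime (this assumes the 2022.x Python API, where get_version is available):
source ~/.bashrc
# print the runtime version to confirm the import resolves to the freshly built OpenVINO
python3 -c "from openvino.runtime import get_version; print(get_version())"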

(Optional) udev rules for Intel Myriad NCS2

  1. Add the current Linux user to the users group; you will need to log out and log in for it to take effect:
sudo usermod -a -G users "$(whoami)"
  2. To perform inference on the Intel® Neural Compute Stick 2, install the USB rules as follows (a quick check is shown after the commands):
cat <<EOF > 97-myriad-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
sudo cp 97-myriad-usbboot.rules /etc/udev/rules.d/ 
sudo udevadm control --reload-rules 
sudo udevadm trigger 
sudo ldconfig
rm 97-myriad-usbboot.rules
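  • After re-plugging the stick, it should enumerate under the Intel Movidius vendor ID 03e7 (the product ID differs depending on whether the firmware has booted, hence the two rules above):
lsusb | grep -i 03e7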


Benchmark results

1. levit-128s

  • CPU (AMD Ryzen 4700U, 8C8T)
  • blob, FP16
  • Refer to this article (the SnippetsOpset "unsupported opset" bug)
./benchmark_app -m ~/public/levit-128s/FP16/CPU_MYRIAD_levit-128s.blob -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/levit-128s/FP16/CPU_MYRIAD_levit-128s.blob -d CPU 
[ INFO ] Network is compiled
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Skipping the step for compiled network
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Skipping the step for compiled network
[Step 6/11] Configuring input of the model
[ INFO ] Skipping the step for compiled network
[Step 7/11] Loading the model to the device
[ ERROR ] Check 'false' failed at frontends/common/src/frontend.cpp:53:
Converting input model
Cannot create Subgraph layer Multiply_16115 id:43 from unsupported opset: SnippetsOpset
  • CPU (AMD Ryzen 4700U)
  • xml, FP16
./benchmark_app -m ~/public/levit-128s/FP16/levit-128s.xml -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/levit-128s/FP16/levit-128s.xml -d CPU 
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 73.08 ms
[ INFO ] Original network I/O parameters:
Network inputs:
    image (node: image) : f32 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f32 / [...] / {1,1000}
[Step 5/11] Resizing network to match image sizes and given batch
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
    image (node: image) : u8 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f32 / [...] / {1,1000}
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 1023.34 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 8 }
[ INFO ]   { NUM_STREAMS , 8 }
[ INFO ]   { AFFINITY , CORE }
[ INFO ]   { INFERENCE_NUM_THREADS , 8 }
[ INFO ]   { PERF_COUNT , NO }
[ INFO ]   { INFERENCE_PRECISION_HINT , f32 }
[ INFO ]   { PERFORMANCE_HINT , THROUGHPUT }
[ INFO ]   { PERFORMANCE_HINT_NUM_REQUESTS , 0 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] image  ([N,C,H,W], u8, {1, 3, 224, 224}, static):      random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 27.88 ms

[Step 11/11] Dumping statistics report
[ INFO ] Count:             26432 iterations
[ INFO ] Duration:          60020.27 ms
[ INFO ] Latency:
[ INFO ]        Median:     15.47 ms
[ INFO ]        Average:    18.00 ms
[ INFO ]        Min:        13.21 ms
[ INFO ]        Max:        200.96 ms
[ INFO ] Throughput: 440.38 FPS
  • Intel NCS2 MYRIAD
  • blob, FP16
$ ./benchmark_app -m ~/public/levit-128s/FP16/levit-128s.blob -d MYRIAD
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/levit-128s/FP16/levit-128s.blob -d MYRIAD 
[ INFO ] Network is compiled
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] MYRIAD
[ INFO ] openvino_intel_myriad_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(MYRIAD) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Skipping the step for compiled network
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Skipping the step for compiled network
[Step 6/11] Configuring input of the model
[ INFO ] Skipping the step for compiled network
[Step 7/11] Loading the model to the device
W: [ncTool] [     99521] [benchmark_app] patchSetMemTypeCommand:205     Fail to find call command
W: [ncTool] [     99521] [benchmark_app] bootDevice:233 Fail to patch "Set memory type" command for firmware sc = -2
[ INFO ] Import network took 1954.21 ms
[ INFO ] Original network I/O paramteters:
Network inputs:
    image (node: image) : f16 / [...] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f16 / [...] / {1,1000}
[ WARNING ] image: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: MYRIAD
[ INFO ]   { NETWORK_NAME , __importedExecutableNetworkFromBlobName }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 4 }
[ INFO ]   { DEVICE_THERMAL , 46.4524 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] image  ([N,C,H,W], f16, {1, 3, 224, 224}, static):     random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 106.65 ms

[Step 11/11] Dumping statistics report
[ INFO ] Count:             776 iterations
[ INFO ] Duration:          60328.09 ms
[ INFO ] Latency:
[ INFO ]        Median:     310.87 ms
[ INFO ]        Average:    310.55 ms
[ INFO ]        Min:        176.46 ms
[ INFO ]        Max:        345.88 ms
[ INFO ] Throughput: 12.86 FPS

2. swin-tiny-patch4-window7-224

  • CPU (AMD Ryzen 4700U, 8C8T)
  • xml, FP16
./benchmark_app -m ~/public/swin-tiny-patch4-window7-224/FP16/swin-tiny-patch4-window7-224.xml -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/swin-tiny-patch4-window7-224/FP16/swin-tiny-patch4-window7-224.xml -d CPU 
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 104.10 ms
[ INFO ] Original network I/O parameters:
Network inputs:
    input (node: input) : f32 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f32 / [...] / {1,1000}
[Step 5/11] Resizing network to match image sizes and given batch
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
    input (node: input) : u8 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    probs (node: probs) : f32 / [...] / {1,1000}
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 815.24 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 4 }
[ INFO ]   { NUM_STREAMS , 4 }
[ INFO ]   { AFFINITY , CORE }
[ INFO ]   { INFERENCE_NUM_THREADS , 8 }
[ INFO ]   { PERF_COUNT , NO }
[ INFO ]   { INFERENCE_PRECISION_HINT , f32 }
[ INFO ]   { PERFORMANCE_HINT , THROUGHPUT }
[ INFO ]   { PERFORMANCE_HINT_NUM_REQUESTS , 0 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] input  ([N,C,H,W], u8, {1, 3, 224, 224}, static):      random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 130.01 ms

[Step 11/11] Dumping statistics report
[ INFO ] Count:             1520 iterations
[ INFO ] Duration:          60136.50 ms
[ INFO ] Latency:
[ INFO ]        Median:     137.26 ms
[ INFO ]        Average:    158.10 ms
[ INFO ]        Min:        86.66 ms
[ INFO ]        Max:        476.25 ms
[ INFO ] Throughput: 25.28 FPS

3. deit-tiny-distilled-patch16-224

  • CPU
  • xml
    $ ./benchmark_app -m ~/public/deit-tiny-distilled-patch16-224/deit-tiny-distilled-patch16-224.xml -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/heyrward-ubuntu/openvino_build/openvino/bin/intel64/Release/benchmark_app -m /home/heyrward-ubuntu/public/deit-tiny-distilled-patch16-224/deit-tiny-distilled-patch16-224.xml -d CPU 
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8631-cec772f2c09
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 33.28 ms
[ INFO ] Original network I/O parameters:
Network inputs:
    x (node: x) : f32 / [...] / {1,3,224,224}
Network outputs:
    1127 (node: 1127) : f32 / [...] / {1,1000}
[Step 5/11] Resizing network to match image sizes and given batch
[ WARNING ] x: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
    x (node: x) : u8 / [N,C,H,W] / {1,3,224,224}
Network outputs:
    1127 (node: 1127) : f32 / [...] / {1,1000}
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 1087.66 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 8 }
[ INFO ]   { NUM_STREAMS , 8 }
[ INFO ]   { AFFINITY , CORE }
[ INFO ]   { INFERENCE_NUM_THREADS , 8 }
[ INFO ]   { PERF_COUNT , NO }
[ INFO ]   { INFERENCE_PRECISION_HINT , f32 }
[ INFO ]   { PERFORMANCE_HINT , THROUGHPUT }
[ INFO ]   { PERFORMANCE_HINT_NUM_REQUESTS , 0 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] x  ([N,C,H,W], u8, {1, 3, 224, 224}, static):  random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 98.19 ms

[Step 11/11] Dumping statistics report
[ INFO ] Count:             8440 iterations
[ INFO ] Duration:          60068.46 ms
[ INFO ] Latency:
[ INFO ]        Median:     47.66 ms
[ INFO ]        Average:    56.86 ms
[ INFO ]        Min:        39.28 ms
[ INFO ]        Max:        541.54 ms
[ INFO ] Throughput: 140.51 FPS