1.Open the Vitis AI Docker environment and go to the path of this tutorial lab
---
```
./docker_run.sh xilinx/vitis-ai-cpu:latest
```

![image](https://hackmd.io/_uploads/SJjCdWOMZe.png)

2.Activate the conda environment for this lab and install the package we need
---
```
conda activate vitis-ai-pytorch
pip install torchsummary
conda deactivate
```

3.Run the script
---
```
source ./run_all.sh
```

First it starts the training:

![image](https://hackmd.io/_uploads/r1fXgQOfbl.png)
![image](https://hackmd.io/_uploads/BkVDqWuzbl.png)

The trained weights are written to `cnn_float.pt`:

![image](https://hackmd.io/_uploads/S1AmmXdG-g.png)

After training finishes, it prints a summary of the CNN model and then quantizes it. (A hedged sketch of the quantization call is included at the end of this note.)

![image](https://hackmd.io/_uploads/HkdDrQ_f-l.png)

It puts the following structure onto your device before loading your weights:

![image](https://hackmd.io/_uploads/HyqXUw_z-e.png)
![image](https://hackmd.io/_uploads/Hk9hqZdGWe.png)
![image](https://hackmd.io/_uploads/B19ZobdMbx.png)

Then it compiles the model for the hardware DPU:

![image](https://hackmd.io/_uploads/Bk30ZQOG-e.png)
![image](https://hackmd.io/_uploads/r18BoW_Gbl.png)

In this step I use a different compile command to compile the model for the KV260 device:

```
ARCH=/opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json
TARGET=kv260

echo "-----------------------------------------"
echo "COMPILING MODEL FOR KV260"
echo "Using ARCH: $ARCH"
echo "-----------------------------------------"

rm -rf ./build/compiled_model_kv260
mkdir -p ./build/compiled_model_kv260

compile() {
  vai_c_xir \
    --xmodel     ./build/quantized_model/CNN_int.xmodel \
    --arch       $ARCH \
    --net_name   CNN_int_${TARGET} \
    --output_dir ./build/compiled_model_kv260
}

compile 2>&1 | tee ./build/compile_${TARGET}.log

echo "-----------------------------------------"
echo "Compile completed! Output in ./build/compiled_model_kv260"
echo "-----------------------------------------"
```

4.Dataflow graph of this CNN
---
The tanh(x) and sigmoid operations cannot run on the DPU, so they run on the ARM processor on the PS side:

![image](https://hackmd.io/_uploads/BkpmZG_zZl.png)

So the dataflow graph is arranged as follows:

![image](https://hackmd.io/_uploads/rJ_LAZOfWg.png)

5.Host code
---
a. tanh and sigmoid functions

![image](https://hackmd.io/_uploads/BJFCffdGZg.png)

b. Image pre-processing (open the grayscale image, resize)

![image](https://hackmd.io/_uploads/By2k7MuzWx.png)

c. Executing inference

![image](https://hackmd.io/_uploads/BJoi7zdGbe.png)

In `run_dpu()`:

![image](https://hackmd.io/_uploads/HkViizOGbe.png)
![image](https://hackmd.io/_uploads/ByM96zdzZe.png)

I run this lab in the PYNQ environment instead of the original flow (because of a setup issue), and the following is the link to my host code:

(Hedged sketches of the PS-side activation functions, the image pre-processing, and the DPU inference flow are also collected at the end of this note.)

6.Generate the bitstream, xclbin, and hwh to program the PL side
---
In this lab, I use Xilinx's official GitHub repo "DPU-PYNQ" to generate the bitstream and xclbin:

```
git clone https://github.com/Xilinx/DPU-PYNQ.git
```

Then enter the folder and run the Makefile:

```
cd boards
make BOARD=kv260_som
```

It generates the files in the `kv260_som` folder. You can modify the command to generate other boards' bitstreams, e.g. zcu104, zcu102, ...

7.Inference on the KV260 board
---
1. Upload the required files to JupyterLab

![image](https://hackmd.io/_uploads/r1Gg7gw7bg.png)
![image](https://hackmd.io/_uploads/ryRxmgPQ-g.png)

Host code:

![image](https://hackmd.io/_uploads/HJGU4QlvQWg.png)

2. Execute the host code to get the result

![image](https://hackmd.io/_uploads/BJyu7lPXbg.png)
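8.Code sketches for the steps above
---
The quantization step run by `run_all.sh` (step 3) uses the Vitis AI PyTorch quantizer. The following is a minimal sketch of that flow, not the lab's actual script: the `CNN` class, the `calib_loader`, and the 1x28x28 grayscale input shape are assumptions, and the exact arguments in the lab may differ.

```
import torch
from pytorch_nndct.apis import torch_quantizer

# Assumed names: CNN and cnn_float.pt come from the training step.
model = CNN()
model.load_state_dict(torch.load("cnn_float.pt", map_location="cpu"))
model.eval()

dummy_input = torch.randn(1, 1, 28, 28)   # assumed grayscale input shape

# Calibration pass: run a few batches to collect activation statistics
quantizer = torch_quantizer("calib", model, (dummy_input,))
quant_model = quantizer.quant_model
for images, _ in calib_loader:             # calib_loader is assumed to exist
    quant_model(images)
quantizer.export_quant_config()

# Deploy pass: re-run in "test" mode and export the xmodel consumed by vai_c_xir
quantizer = torch_quantizer("test", model, (dummy_input,))
quant_model = quantizer.quant_model
quant_model(dummy_input)
quantizer.export_xmodel(output_dir="./build/quantized_model", deploy_check=False)
```

The exported xmodel (e.g. `CNN_int.xmodel`) is what the `vai_c_xir` command above compiles for the DPU.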
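For the PS-side tanh and sigmoid (steps 4 and 5a), the host simply applies the activations on the CPU after the DPU output is read back. A minimal NumPy sketch, not the lab's exact host code:

```
import numpy as np

# These run on the ARM core (PS side) because the DPU cannot execute them.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)
```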
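For the image pre-processing in step 5b (open the grayscale image, resize), here is a sketch using OpenCV; the 28x28 size, the [0, 1] normalization, and the NHWC layout are assumptions about the lab's model:

```
import cv2
import numpy as np

def preprocess(path, size=(28, 28)):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # open as grayscale
    img = cv2.resize(img, size)                    # resize to the network input size
    img = img.astype(np.float32) / 255.0           # assumed normalization to [0, 1]
    return img.reshape(1, size[1], size[0], 1)     # NHWC layout expected by the DPU runner
```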
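For the inference flow on the KV260 (steps 5c and 7), the host code in the PYNQ environment uses DPU-PYNQ's `DpuOverlay` together with the VART runner. A minimal sketch, assuming the bitstream is named `dpu.bit` and the compiled model is `CNN_int_kv260.xmodel`; the real `run_dpu()` in my host code may differ in details:

```
import numpy as np
from pynq_dpu import DpuOverlay

# Load the DPU bitstream built with DPU-PYNQ, then the compiled xmodel.
overlay = DpuOverlay("dpu.bit")
overlay.load_model("CNN_int_kv260.xmodel")

dpu = overlay.runner
in_dims = tuple(dpu.get_input_tensors()[0].dims)    # e.g. (1, 28, 28, 1)
out_dims = tuple(dpu.get_output_tensors()[0].dims)

def run_dpu(img):
    # img: preprocessed NHWC float32 array matching in_dims
    in_buf = [np.empty(in_dims, dtype=np.float32, order="C")]
    out_buf = [np.empty(out_dims, dtype=np.float32, order="C")]
    in_buf[0][...] = img
    job_id = dpu.execute_async(in_buf, out_buf)
    dpu.wait(job_id)
    return out_buf[0]

# The DPU graph ends before the unsupported ops, so apply sigmoid/tanh on the CPU.
out = run_dpu(preprocess("test.png"))
probs = sigmoid(out)
```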