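Speeding Up Object Detection Inference with the OpenCV DNN Module
I. Introduction
When running object detection inference with OpenCV and a pre-trained YOLO model, we found that inference on the CPU was slower than expected, so we switched to the GPU to speed it up. This article describes how to run the inference on a GPU.
II. Building and Installing OpenCV
The OpenCV package installed from PyPI can only run model inference on the CPU. To run inference on the GPU, OpenCV has to be built and installed from source with GPU support enabled. The steps below describe how to do this.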
1.
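* GPU: NVIDIA GeForce RTX 3060
* Operating system: Ubuntu 20.04
* CUDA: 11.2.2
* cuDNN: 8
To make the inference environment easy to distribute, we install it inside a container, using the CUDA image provided by NVIDIA, `nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04`, as the base image.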
2.
01
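Install Docker 19 or later on the host so that GPUs can be passed through to containers directly. Alternatively, keep the Docker version already installed and add the packages described in NVIDIA's official documentation to enable GPU passthrough. The rest of this article assumes that Docker can already pass GPUs through to containers.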
02
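Install the NVIDIA driver on the host. We use driver version `470.86` here. For the installation procedure, see NVIDIA's official driver installation instructions; it is not repeated here.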
3.
01
# docker pull nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04
02
# docker run -d --name opencv-gpu-image-build --gpus 1 nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04 sleep infinity
03
# docker exec -ti opencv-gpu-image-build bash
04
> Run the following command inside the container
# nvidia-smi
Sun Dec 26 11:17:42 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 30%   28C    P8    11W / 170W |      0MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
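As shown above, the GPU device information is visible inside the container, so the base environment is ready. All subsequent steps are performed inside the container.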
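The same check can also be scripted. The following is a minimal sketch rather than part of the original setup: it assumes `nvidia-smi` is on the `PATH` and relies on its `--query-gpu` CSV output. The helper names `parse_gpu_csv` and `list_gpus` are made up for illustration.

```python
import shutil
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi; the order must match the CSV columns.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Parse `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) == len(QUERY_FIELDS):
            gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def list_gpus() -> List[Dict[str, str]]:
    """Return the GPUs visible here, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu)
```

An empty result inside the container would suggest that the GPU was not passed through, for example because the `--gpus` option was omitted when the container was started.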
4.
# apt update
# apt upgrade -y
# apt install -y build-essential cmake pkg-config unzip yasm git checkinstall
# apt install -y libjpeg-dev libpng-dev libtiff-dev
# apt install -y libavcodec-dev libavformat-dev libswscale-dev libavresample-dev
# apt install -y libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
# apt install -y libxvidcore-dev x264 libx264-dev libfaac-dev libmp3lame-dev libtheora-dev
# apt install -y libfaac-dev libmp3lame-dev libvorbis-dev
# apt install -y libopencore-amrnb-dev libopencore-amrwb-dev
# apt-get install -y libdc1394-22 libdc1394-22-dev libxine2-dev libv4l-dev v4l-utils
# cd /usr/include/linux
# ln -s -f ../libv4l1-videodev.h videodev.h
# cd ~
# apt-get -y install libgtk-3-dev
# apt-get -y install python3-dev python3-pip
# pip3 install -U pip numpy
# apt install -y python3-testresources
# apt-get install -y libtbb-dev
# apt-get install -y libatlas-base-dev gfortran
# apt-get install -y libprotobuf-dev protobuf-compiler
# apt-get install -y libgoogle-glog-dev libgflags-dev
# apt-get install -y libgphoto2-dev libeigen3-dev libhdf5-dev doxygen
5.
# apt install -y wget unzip
# wget -O opencv.zip https://github.com/opencv/opencv/archive/refs/tags/4.5.4.zip
# wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/refs/tags/4.5.4.zip
# unzip opencv.zip
# unzip opencv_contrib.zip
# pip install numpy
6.
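> Note: the build downloads additional packages. These downloads may be blocked by a firewall, in which case you need to resolve the access problem yourself.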
01
> GPU support is enabled with the `-DWITH_CUDA=ON`, `-DWITH_CUDNN=ON`, and `-DOPENCV_DNN_CUDA=ON` options.
# cd opencv-4.5.4
# mkdir build
# cd build
# cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D WITH_TBB=ON \
    -D ENABLE_FAST_MATH=1 \
    -D CUDA_FAST_MATH=1 \
    -D WITH_CUBLAS=1 \
    -D WITH_CUDA=ON \
    -D BUILD_opencv_cudacodec=OFF \
    -D WITH_CUDNN=ON \
    -D OPENCV_DNN_CUDA=ON \
    -D CUDA_ARCH_BIN=8.6 \
    -D WITH_V4L=ON \
    -D WITH_QT=OFF \
    -D WITH_OPENGL=ON \
    -D WITH_GSTREAMER=ON \
    -D OPENCV_GENERATE_PKGCONFIG=ON \
    -D OPENCV_PC_FILE_NAME=opencv.pc \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D PYTHON_EXECUTABLE=/usr/bin/python3 \
    -D OPENCV_EXTRA_MODULES_PATH=/root/opencv_contrib-4.5.4/modules \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D INSTALL_C_EXAMPLES=OFF \
    -D BUILD_EXAMPLES=OFF ..
...
-------------------------------------------------------------------
--
-- Configuring done
-- Generating done
-- Build files have been written to: /root/opencv-4.5.4/build
02
> Run `nproc` to see how many CPUs are available, decide how many of them to use for the build, and pass that number to `make` with the `-j` option.
# nproc
# cd /root/opencv-4.5.4/build
# make -j8
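If the build is driven from a script, the `-j` value can be derived instead of typed in by hand. This is only a sketch under that assumption; `available_cpus` and `make_command` are hypothetical helper names. On Linux it uses `os.sched_getaffinity`, which counts the CPUs the current process is allowed to run on.

```python
import os
from typing import List


def available_cpus() -> int:
    """Number of CPUs this process may use (falls back to os.cpu_count())."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # os.cpu_count() can return None


def make_command(jobs: int = 0) -> List[str]:
    """Build the make command line; jobs <= 0 means use every available CPU."""
    if jobs <= 0:
        jobs = available_cpus()
    return ["make", "-j{}".format(jobs)]


if __name__ == "__main__":
    print(" ".join(make_command()))
```

Calling `make_command(8)` gives the same `make -j8` used above. A lower value may be preferable when memory is tight, since each parallel compile job needs its own memory.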
03
# make install
# /bin/bash -c 'echo "/usr/local/lib" >> /etc/ld.so.conf.d/opencv.conf'
# ldconfig
04
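Start the Python interactive interpreter, import the `cv2` library, and check its version to confirm that OpenCV was installed successfully.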
# python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'4.5.4'
>>>
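The version string alone does not show whether the build picked up CUDA. As a hedged sketch, the build report returned by `cv2.getBuildInformation()` can be scanned for the CUDA and cuDNN entries. The labels `NVIDIA CUDA` and `cuDNN` are an assumption about the report layout, and `cuda_build_flags` is a made-up helper name.

```python
from typing import Dict


def cuda_build_flags(build_info: str) -> Dict[str, bool]:
    """Scan the text from cv2.getBuildInformation() for the CUDA and cuDNN entries."""
    # Assumed labels; a label that never appears simply stays False.
    flags = {"NVIDIA CUDA": False, "cuDNN": False}
    for line in build_info.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in flags:
            flags[key] = value.strip().upper().startswith("YES")
    return flags


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable in this environment")
    else:
        print(cuda_build_flags(cv2.getBuildInformation()))
        print("CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())
```

If either flag is `False`, or the device count is 0, the most likely cause is that the CMake options did not take effect, and the configure output is the place to look.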
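III. Model Inference Comparison Test
With OpenCV installed, download the pre-trained YOLO model and run a comparison test of object detection inference on the CPU and on the GPU.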
1.
* [Pre-trained YOLO model](https://pjreddie.com/media/files/yolov3.weights)
* [Model configuration file](https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg)
* Test image `yolo-test.jpeg` (prepare a street-scene image for testing)
# mkdir model
# cd model
# wget https://pjreddie.com/media/files/yolov3.weights
# wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
# cd ..
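Before running the test scripts, it can help to confirm that the downloads completed. The sketch below assumes the files live at the paths used by the scripts in this article (`/root/model/...` and `/root/yolo-test.jpeg`); `missing_files` is a hypothetical helper name.

```python
import os
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not exist or are empty (e.g. an interrupted download)."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]


if __name__ == "__main__":
    required = [
        "/root/model/yolov3.weights",
        "/root/model/yolov3.cfg",
        "/root/yolo-test.jpeg",
    ]
    problems = missing_files(required)
    print("missing or empty:", problems if problems else "none")
```

This only checks presence and size; it does not validate the file contents.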
2.
01
Save the following code as object-detect-cpu.py
import cv2
import numpy as np
from time import time
## Load the test image
img = cv2.imread("/root/yolo-test.jpeg")
## Load YOLO
yolo_weight = "/root/model/yolov3.weights"
yolo_config = "/root/model/yolov3.cfg"
net = cv2.dnn.readNet(yolo_weight, yolo_config)
## Define the input size expected by the network
fWidth = 320
fHeight = 320
## Find the names of all layers
layer_names = net.getLayerNames()
## Find the names of the three output layers (indices are 1-based)
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
def detect():
    t = time()
    ## Convert the image to a blob
    blob = cv2.dnn.blobFromImage(img, 1/255, (fWidth, fHeight), (0, 0, 0), True, crop=False)
    ## Set the input for YOLO object detection
    net.setInput(blob)
    ## Run the forward pass
    outs = net.forward(output_layers)
    dt = (time() - t) * 1000
    print("Detect time: %s ms" % dt)
for i in range(5):
    detect()
02
Save the following code as object-detect-gpu.py
import cv2
import numpy as np
from time import time
## Load the test image
img = cv2.imread("/root/yolo-test.jpeg")
## Load YOLO
yolo_weight = "/root/model/yolov3.weights"
yolo_config = "/root/model/yolov3.cfg"
net = cv2.dnn.readNet(yolo_weight, yolo_config)
## Run inference on the GPU through the CUDA backend
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
## Define the input size expected by the network
fWidth = 320
fHeight = 320
## Find the names of all layers
layer_names = net.getLayerNames()
## Find the names of the three output layers (indices are 1-based)
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
def detect():
    t = time()
    ## Convert the image to a blob
    blob = cv2.dnn.blobFromImage(img, 1/255, (fWidth, fHeight), (0, 0, 0), True, crop=False)
    ## Set the input for YOLO object detection
    net.setInput(blob)
    ## Run the forward pass
    outs = net.forward(output_layers)
    dt = (time() - t) * 1000
    print("Detect time: %s ms" % dt)
for i in range(5):
    detect()
03
The GPU test code differs from the CPU test code only by the following two lines, which tell OpenCV to use the GPU for inference:
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
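A script that has to run on machines with and without a usable GPU could choose the backend at run time. The sketch below is one possible approach, not the method used in this article: `pick_backend_names` and `configure` are hypothetical helpers, and a non-zero CUDA device count is treated as "GPU usable", which is an assumption (it does not prove that the DNN module itself was built with `OPENCV_DNN_CUDA`).

```python
from typing import Tuple


def pick_backend_names(cuda_devices: int) -> Tuple[str, str]:
    """Choose cv2.dnn constant names: CUDA if a device is visible, else the CPU path."""
    if cuda_devices > 0:
        return "DNN_BACKEND_CUDA", "DNN_TARGET_CUDA"
    return "DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"


def configure(net, cv2_module) -> Tuple[str, str]:
    """Apply the chosen backend/target to a cv2.dnn network and report what was used."""
    backend, target = pick_backend_names(cv2_module.cuda.getCudaEnabledDeviceCount())
    net.setPreferableBackend(getattr(cv2_module.dnn, backend))
    net.setPreferableTarget(getattr(cv2_module.dnn, target))
    return backend, target
```

In the test script this would be called as `configure(net, cv2)` right after `cv2.dnn.readNet(...)`, in place of the two fixed lines.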
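The inference is repeated five times because, on the GPU, the first run includes loading the model into GPU memory and therefore takes much longer. Only the timings from the second run onward reflect the actual inference time.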
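The warm-up effect described above can be handled explicitly when measuring. The following is a minimal sketch of such a timing harness; `time_runs` and `steady_state_mean` are made-up names, and discarding exactly one warm-up run is an assumption that matches the five-run test in this article.

```python
from statistics import mean
from time import perf_counter
from typing import Callable, List


def time_runs(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Call fn `runs` times and return each wall-clock duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = perf_counter()
        fn()
        durations.append((perf_counter() - start) * 1000)
    return durations


def steady_state_mean(durations: List[float], warmup: int = 1) -> float:
    """Average the durations after dropping the first `warmup` runs."""
    kept = durations[warmup:]
    if not kept:
        raise ValueError("need more runs than warm-up iterations")
    return mean(kept)
```

With the test scripts, `steady_state_mean(time_runs(detect))` would report the average of runs two to five.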
3.
01
As shown below, CPU inference takes about 600 ms on average, with individual runs ranging from roughly 345 ms to 867 ms.
# python3 object-detect-cpu.py
Detect time: 593.8053131103516 ms
Detect time: 345.13401985168457 ms
Detect time: 859.9557876586914 ms
Detect time: 866.8849468231201 ms
Detect time: 460.890531539917 ms
02
As shown below, GPU inference takes about 16 ms (the average of the second through fifth runs).
# python3 object-detect-gpu.py
Detect time: 8162.863969802856 ms
Detect time: 21.87061309814453 ms
Detect time: 14.004230499267578 ms
Detect time: 14.102935791015625 ms
Detect time: 14.727354049682617 ms
Comparing the two results, inference on the GPU is more than 35 times faster than inference on the CPU, which meets our expectations.
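The arithmetic behind this comparison can be reproduced from the timings printed above. The sketch averages all five CPU runs and, as in the text, skips the first GPU run because it includes loading the model.

```python
from statistics import mean

# Timings in milliseconds, copied from the two test runs above.
cpu_ms = [593.8053131103516, 345.13401985168457, 859.9557876586914,
          866.8849468231201, 460.890531539917]
gpu_ms = [8162.863969802856, 21.87061309814453, 14.004230499267578,
          14.102935791015625, 14.727354049682617]

cpu_mean = mean(cpu_ms)        # all five CPU runs
gpu_mean = mean(gpu_ms[1:])    # skip the first GPU run (model load / warm-up)
speedup = cpu_mean / gpu_mean

print("CPU mean: %.1f ms" % cpu_mean)
print("GPU mean: %.1f ms" % gpu_mean)
print("Speedup:  %.1fx" % speedup)
```

This works out to roughly 625 ms on the CPU against roughly 16 ms on the GPU, a ratio between 38 and 39, which is consistent with the "more than 35 times" figure.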
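IV. Converting the Container to an Image
Once testing confirms that everything works, the container can be converted to an image so that the environment can be deployed quickly elsewhere.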
# docker commit opencv-gpu-image-build opencv-gpu-image:1.0
# docker save -o opencv-gpu-image_1.0.tar opencv-gpu-image:1.0
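On the target host, the saved archive is imported with `docker load` and a container is started from it. The sketch below only assembles and prints those commands so they can be reviewed before running. The container name `opencv-gpu` is a hypothetical choice, and the `--gpus 1` option mirrors the `docker run` command used earlier in this article.

```python
import shlex
from typing import List


def deploy_commands(tar_path: str, image: str, name: str) -> List[List[str]]:
    """Commands for the target host: import the saved image, then start a GPU container."""
    return [
        ["docker", "load", "-i", tar_path],
        ["docker", "run", "-d", "--name", name, "--gpus", "1", image, "sleep", "infinity"],
    ]


if __name__ == "__main__":
    for cmd in deploy_commands("opencv-gpu-image_1.0.tar", "opencv-gpu-image:1.0", "opencv-gpu"):
        # shlex.quote keeps each argument safe to paste into a shell
        print(" ".join(shlex.quote(part) for part in cmd))
```

The target host needs the same prerequisites as the build host: an NVIDIA driver and a Docker installation that can pass GPUs through to containers.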
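About the Author
Ma Zhe (馬哲) is an R&D engineer at 海雲捷迅 (AWCloud). He graduated from the College of Information Science and Engineering at Shandong Agricultural University. He has 10 years of software development experience, is familiar with open-source technologies such as Linux, Docker, OpenStack, and Kubernetes, and has contributed to open-source communities. He has extensive research and development experience in cloud computing, artificial intelligence, and related fields.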