Speeding Up Object Detection Inference with the OpenCV DNN Module

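1. Introduction

When running object detection inference with OpenCV and a pre-trained YOLO model, inference on the CPU turned out to be slower than expected, so we moved inference to the GPU. This article describes how to run that inference on a GPU.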

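2. Building and Installing OpenCV

The OpenCV package installed from PyPI can only run model inference on the CPU. To run inference on a GPU, OpenCV has to be built from source with GPU inference support enabled. The steps below describe how to build and install OpenCV from source in the following environment.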

* GPU: NVIDIA GeForce RTX 3060
* Operating system: Ubuntu 20.04
* CUDA: 11.2.2
* cuDNN: 8

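To make the inference environment easy to distribute, we install it in a container, using the CUDA image provided by NVIDIA, `nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04`, as the base image.

On the host, install Docker 19.03 or later, which supports passing GPUs through to containers directly. Alternatively, keep the Docker version already installed and add the packages that NVIDIA documents for enabling GPU passthrough. The rest of this article assumes that Docker can already pass GPUs through to containers.

Install the NVIDIA driver on the host. We use driver version `470.86` here. See NVIDIA's official driver installation instructions for the procedure, which is not repeated here.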

# docker pull nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04
# docker run -d --name opencv-gpu-image-build --gpus 1 nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04 sleep infinity
# docker exec -ti opencv-gpu-image-build bash

> Run the following command inside the container.

# nvidia-smi
Sun Dec 26 11:17:42 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 30%   28C    P8    11W / 170W |      0MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

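As shown above, the GPU device is visible inside the container, so the base environment is ready. All subsequent steps are performed inside the container.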

# apt update
# apt upgrade -y
# apt install -y build-essential cmake pkg-config unzip yasm git checkinstall
# apt install -y libjpeg-dev libpng-dev libtiff-dev
# apt install -y libavcodec-dev libavformat-dev libswscale-dev libavresample-dev
# apt install -y libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
# apt install -y libxvidcore-dev x264 libx264-dev libfaac-dev libmp3lame-dev libtheora-dev
# apt install -y libfaac-dev libmp3lame-dev libvorbis-dev
# apt install -y libopencore-amrnb-dev libopencore-amrwb-dev
# apt-get install -y libdc1394-22 libdc1394-22-dev libxine2-dev libv4l-dev v4l-utils
# cd /usr/include/linux
# ln -s -f ../libv4l1-videodev.h videodev.h
# cd ~
# apt-get -y install libgtk-3-dev
# apt-get -y install python3-dev python3-pip
# pip3 install -U pip numpy
# apt install -y python3-testresources
# apt-get install -y libtbb-dev
# apt-get install -y libatlas-base-dev gfortran
# apt-get install -y libprotobuf-dev protobuf-compiler
# apt-get install -y libgoogle-glog-dev libgflags-dev
# apt-get install -y libgphoto2-dev libeigen3-dev libhdf5-dev doxygen
# apt install -y wget unzip
# wget -O opencv.zip https://github.com/opencv/opencv/archive/refs/tags/4.5.4.zip
# wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/refs/tags/4.5.4.zip
# unzip opencv.zip
# unzip opencv_contrib.zip
# pip install numpy

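> Note: the build downloads additional packages. These downloads may fail behind a firewall, in which case you will need to resolve the issue yourself.

> GPU support is enabled with the `-DWITH_CUDA=ON`, `-DWITH_CUDNN=ON`, and `-DOPENCV_DNN_CUDA=ON` options.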

# cd opencv-4.5.4
# mkdir build
# cd build
# cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D WITH_TBB=ON \
    -D ENABLE_FAST_MATH=1 \
    -D CUDA_FAST_MATH=1 \
    -D WITH_CUBLAS=1 \
    -D WITH_CUDA=ON \
    -D BUILD_opencv_cudacodec=OFF \
    -D WITH_CUDNN=ON \
    -D OPENCV_DNN_CUDA=ON \
    -D CUDA_ARCH_BIN=8.6 \
    -D WITH_V4L=ON \
    -D WITH_QT=OFF \
    -D WITH_OPENGL=ON \
    -D WITH_GSTREAMER=ON \
    -D OPENCV_GENERATE_PKGCONFIG=ON \
    -D OPENCV_PC_FILE_NAME=opencv.pc \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D PYTHON_EXECUTABLE=/usr/bin/python3 \
    -D OPENCV_EXTRA_MODULES_PATH=/root/opencv_contrib-4.5.4/modules \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D INSTALL_C_EXAMPLES=OFF \
    -D BUILD_EXAMPLES=OFF ..
...
-------------------------------------------------------------------
-- Configuring done
-- Generating done
-- Build files have been written to: /root/opencv-4.5.4/build
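
Long option lists like the one above are easy to mistype. As one possible way to keep them manageable, the sketch below assembles the same kind of `cmake` invocation from a Python dictionary. The helper name `build_cmake_command` and the reduced option set are illustrative assumptions and were not part of the original build.

```python
import shlex

# A subset of the options used above; extend it as needed.
OPTIONS = {
    "CMAKE_BUILD_TYPE": "RELEASE",
    "CMAKE_INSTALL_PREFIX": "/usr/local",
    "WITH_CUDA": "ON",
    "WITH_CUDNN": "ON",
    "OPENCV_DNN_CUDA": "ON",
    "CUDA_ARCH_BIN": "8.6",
    "OPENCV_EXTRA_MODULES_PATH": "/root/opencv_contrib-4.5.4/modules",
}


def build_cmake_command(options, source_dir=".."):
    """Return the cmake argument list for the given -D options (hypothetical helper)."""
    args = ["cmake"]
    for name, value in options.items():
        args.append(f"-D{name}={value}")
    args.append(source_dir)
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into the build directory.
    print(shlex.join(build_cmake_command(OPTIONS)))
```

Keeping the options in one dictionary makes it simple to toggle a single flag, for example switching `CUDA_ARCH_BIN` when building for a different GPU generation.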

> Use the `nproc` command to check how many CPUs are available and decide how many to use for the build, then pass that number to `make` with the `-j` option.

# nproc
# cd /root/opencv-4.5.4/build
# make -j8
# make install
# /bin/bash -c 'echo "/usr/local/lib" >> /etc/ld.so.conf.d/opencv.conf'
# ldconfig
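
The choice of `-j` value can also be scripted. The following is a minimal sketch, assuming a Linux environment. `build_jobs` is a hypothetical helper that counts the CPUs the current process is allowed to run on, which is also what `nproc` reports.

```python
import os


def build_jobs(reserve=0):
    """Suggest a value for `make -j`, optionally leaving `reserve` CPUs free."""
    try:
        # CPUs this process may run on; this honours CPU affinity limits.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is not available on every platform.
        available = os.cpu_count() or 1
    return max(1, available - reserve)


if __name__ == "__main__":
    print(f"make -j{build_jobs()}")
```

Passing `reserve=1` leaves one CPU free for other work while the build runs.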

Start an interactive Python session, import the `cv2` library, and check its version to confirm that OpenCV was installed successfully.

# python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'4.5.4'
>>>
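
The version string alone does not show whether the build picked up CUDA. As a rough additional check, the sketch below looks for the `NVIDIA CUDA` and `cuDNN` entries in the text returned by `cv2.getBuildInformation()` and prints the number of CUDA devices OpenCV can see. The helper name `cuda_build_flags` is an assumption, and the exact wording of the build information may differ between OpenCV versions.

```python
import re


def cuda_build_flags(build_info):
    """Extract the CUDA and cuDNN entries from OpenCV build information text."""
    flags = {}
    for key in ("NVIDIA CUDA", "cuDNN"):
        pattern = rf"^[ \t]*{re.escape(key)}:[ \t]*(\S.*)$"
        match = re.search(pattern, build_info, re.MULTILINE)
        flags[key] = match.group(1).strip() if match else "not found"
    return flags


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("OpenCV is not installed in this environment")
    else:
        print(cuda_build_flags(cv2.getBuildInformation()))
        # 0 means OpenCV has no usable CUDA device.
        print("CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())
```

On a build configured as above, both entries would be expected to start with `YES`, and the device count should be at least 1.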

3. Inference Comparison Test

With OpenCV installed, download a pre-trained YOLO model and compare inference performance on the CPU and the GPU.

* [Pre-trained YOLO model](https://pjreddie.com/media/files/yolov3.weights)
* [Model configuration file](https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg)
* Test image `yolo-test.jpeg` (prepare a street-scene image for testing)

# mkdir model
# cd model
# wget https://pjreddie.com/media/files/yolov3.weights
# wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
# cd ..

Save the following code as object-detect-cpu.py:

import cv2
import numpy as np
from time import time

# Load the test image
img = cv2.imread("/root/yolo-test.jpeg")

# Load the YOLO network
yolo_weight = "/root/model/yolov3.weights"
yolo_config = "/root/model/yolov3.cfg"
net = cv2.dnn.readNet(yolo_weight, yolo_config)

# Network input size
fWidth = 320
fHeight = 320

# Names of all layers
layer_names = net.getLayerNames()
# Names of the three output layers (layer ids are 1-based)
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]


def detect():
    t = time()
    # Convert the image to a blob
    blob = cv2.dnn.blobFromImage(img, 1 / 255, (fWidth, fHeight), (0, 0, 0), True, crop=False)
    # Set the blob as the network input
    net.setInput(blob)
    # Run the forward pass
    outs = net.forward(output_layers)
    dt = (time() - t) * 1000
    print("Detect time: %s ms" % dt)


for i in range(5):
    detect()
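
The script above only times the forward pass and discards `outs`. For readers who want to see what those outputs contain, here is a minimal sketch of decoding them with NumPy. It assumes the usual YOLOv3 row layout in OpenCV, `[cx, cy, w, h, objectness, class scores...]`, with box coordinates relative to the image size. The function `decode_outputs` and the synthetic data are illustrative and not part of the original test.

```python
import numpy as np


def decode_outputs(outs, img_w, img_h, conf_threshold=0.5):
    """Turn raw YOLOv3 output rows into pixel boxes, scores, and class ids."""
    boxes, scores, class_ids = [], [], []
    scale = np.array([img_w, img_h, img_w, img_h], dtype=np.float64)
    for out in outs:
        for row in out:
            class_scores = row[5:]
            class_id = int(np.argmax(class_scores))
            score = float(class_scores[class_id])
            if score < conf_threshold:
                continue
            # Box centre and size are relative to the image size.
            cx, cy, w, h = row[:4] * scale
            box = (cx - w / 2, cy - h / 2, w, h)  # x, y of top-left corner, width, height
            boxes.append([int(round(float(v))) for v in box])
            scores.append(score)
            class_ids.append(class_id)
    return boxes, scores, class_ids


if __name__ == "__main__":
    # Synthetic output with two classes: one confident row and one weak row.
    fake = np.array(
        [
            [0.5, 0.5, 0.2, 0.4, 0.9, 0.05, 0.8],
            [0.1, 0.1, 0.1, 0.1, 0.1, 0.02, 0.03],
        ],
        dtype=np.float32,
    )
    print(decode_outputs([fake], img_w=640, img_h=480))
```

In a real pipeline the resulting boxes and scores would typically be passed to `cv2.dnn.NMSBoxes` to remove overlapping detections.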

Save the following code as object-detect-gpu.py:

import cv2
import numpy as np
from time import time

# Load the test image
img = cv2.imread("/root/yolo-test.jpeg")

# Load the YOLO network
yolo_weight = "/root/model/yolov3.weights"
yolo_config = "/root/model/yolov3.cfg"
net = cv2.dnn.readNet(yolo_weight, yolo_config)
# Run inference on the GPU through the CUDA backend
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Network input size
fWidth = 320
fHeight = 320

# Names of all layers
layer_names = net.getLayerNames()
# Names of the three output layers (layer ids are 1-based)
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]


def detect():
    t = time()
    # Convert the image to a blob
    blob = cv2.dnn.blobFromImage(img, 1 / 255, (fWidth, fHeight), (0, 0, 0), True, crop=False)
    # Set the blob as the network input
    net.setInput(blob)
    # Run the forward pass
    outs = net.forward(output_layers)
    dt = (time() - t) * 1000
    print("Detect time: %s ms" % dt)


for i in range(5):
    detect()

The GPU test code differs from the CPU test code only by the following two lines, which tell the network to use the GPU for inference:

net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)

net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
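
If the same script has to run on machines with and without a CUDA-enabled OpenCV build, the backend can be chosen at run time. The sketch below is one possible approach and not the method used in the test above. `choose_backend` and `configure` are hypothetical helpers, and the fallback relies on `cv2.cuda.getCudaEnabledDeviceCount()` returning 0 when OpenCV was built without CUDA support.

```python
def choose_backend(cuda_devices):
    """Return the names of the cv2.dnn backend and target constants to use."""
    if cuda_devices > 0:
        return "DNN_BACKEND_CUDA", "DNN_TARGET_CUDA"
    return "DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"


def configure(net):
    """Apply the chosen backend and target to a cv2.dnn network."""
    import cv2  # imported lazily so choose_backend can be used without OpenCV

    backend, target = choose_backend(cv2.cuda.getCudaEnabledDeviceCount())
    net.setPreferableBackend(getattr(cv2.dnn, backend))
    net.setPreferableTarget(getattr(cv2.dnn, target))
    return backend, target
```

Calling `configure(net)` right after `cv2.dnn.readNet(...)` would replace the two hard-coded lines and report which backend was selected.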

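The inference is repeated five times because, on the GPU, the first run takes much longer while the model is loaded into GPU memory. Only the timings from the second run onward reflect the actual inference time.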
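
The warm-up effect described above can be handled by a small timing helper that records every run but averages only the runs after the warm-up. This is a minimal sketch. `benchmark` is a hypothetical helper, and the lambda in the example stands in for the `detect()` function.

```python
from statistics import mean
from time import perf_counter


def benchmark(fn, runs=5, warmup=1):
    """Time fn() `runs` times; return (all timings in ms, mean excluding warm-up runs)."""
    if runs <= warmup:
        raise ValueError("runs must be greater than warmup")
    timings = []
    for _ in range(runs):
        start = perf_counter()
        fn()
        timings.append((perf_counter() - start) * 1000)
    return timings, mean(timings[warmup:])


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with the real inference call.
    timings, steady = benchmark(lambda: sum(range(100_000)))
    print(["%.2f" % t for t in timings], "steady-state mean: %.2f ms" % steady)
```

`perf_counter` is used here instead of `time` because it is intended for measuring short durations.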

As shown below, CPU inference takes about 600 ms on average:

# python3 object-detect-cpu.py
Detect time: 593.8053131103516 ms
Detect time: 345.13401985168457 ms
Detect time: 859.9557876586914 ms
Detect time: 866.8849468231201 ms
Detect time: 460.890531539917 ms

As shown below, GPU inference takes about 16 ms (the average of the second through fifth runs):

# python3 object-detect-gpu.py
Detect time: 8162.863969802856 ms
Detect time: 21.87061309814453 ms
Detect time: 14.004230499267578 ms
Detect time: 14.102935791015625 ms
Detect time: 14.727354049682617 ms

In this comparison, GPU inference is more than 35 times faster than CPU inference, which meets our expectations.
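
The speedup figure can be reproduced from the logged timings. The short calculation below averages all five CPU runs and the second through fifth GPU runs, using the values printed above. The variable names are illustrative.

```python
from statistics import mean

# Timings in milliseconds, copied from the two test runs above.
cpu_ms = [593.8053131103516, 345.13401985168457, 859.9557876586914,
          866.8849468231201, 460.890531539917]
gpu_ms = [8162.863969802856, 21.87061309814453, 14.004230499267578,
          14.102935791015625, 14.727354049682617]

cpu_mean = mean(cpu_ms)      # all five CPU runs
gpu_mean = mean(gpu_ms[1:])  # skip the first GPU run (model loading)
speedup = cpu_mean / gpu_mean

print(f"CPU {cpu_mean:.1f} ms, GPU {gpu_mean:.1f} ms, speedup {speedup:.1f}x")
```

With these numbers the averages come to roughly 625 ms and 16 ms, which is consistent with the "more than 35 times" figure.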

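4. Converting the Container to an Image

Once testing confirms that everything works, the container can be converted to an image so that the environment can be deployed quickly elsewhere.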

# docker commit opencv-gpu-image-build opencv-gpu-image:1.0

# docker save -o opencv-gpu-image_1.0.tar opencv-gpu-image:1.0

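About the Author

Ma Zhe is an R&D engineer at AWcloud (海雲捷迅) and a graduate of the College of Information Science and Engineering at Shandong Agricultural University. Ma Zhe has 10 years of software development experience, is familiar with open-source technologies such as Linux, Docker, OpenStack, and Kubernetes, has contributed to open-source communities, and has extensive research and development experience in cloud computing and artificial intelligence.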
