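This article shows how to perform deep learning-based object detection with OpenCV, first on still images and then on video. The example code is explained step by step and can be used as a reference.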

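Object detection with deep learning and OpenCV

When it comes to deep learning-based object detection, there are three main methods you are likely to encounter: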

Faster R-CNNs (Ren et al., 2015)

You Only Look Once (YOLO) (Redmon et al., 2015)

Single Shot Detectors (SSD) (Liu et al., 2015)

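Faster R-CNNs are probably the most "heard of" method for object detection using deep learning; however, the technique can be difficult to understand (especially for beginners in deep learning), hard to implement, and challenging to train.

Furthermore, even with the "faster" R-CNN implementation (where the "R" refers to region proposals), the algorithm can be quite slow, on the order of 7 FPS.

If pure speed is the goal, we tend to use YOLO, since this algorithm is much faster, capable of processing 40-90 FPS on a Titan X GPU. A super-fast variant of YOLO can even reach 155 FPS.

The problem with YOLO is that its accuracy is comparatively low.

SSDs, originally developed by Google, are a balance between the two. The algorithm is also more straightforward than Faster R-CNNs.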

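MobileNets: efficient (deep) neural networks

When building object detection networks we normally use an existing network architecture, such as VGG or ResNet. These architectures can be very large, on the order of 200-500MB. Because of their sheer size and the resulting number of computations, such architectures are unsuitable for resource-constrained devices. Instead, we can use MobileNets (Howard et al., 2017), another paper by Google researchers. These networks are called "MobileNets" because they are designed for resource-constrained devices such as your smartphone. MobileNets differ from traditional CNNs through the use of depthwise separable convolution. The general idea behind depthwise separable convolution is to split convolution into two stages:

A 3×3 depthwise convolution.
Followed by a 1×1 pointwise convolution.

This allows us to reduce the number of parameters in the network. The trade-off is accuracy: MobileNets are normally not as accurate as their larger counterparts, but they are much more resource efficient.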
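To make the parameter saving concrete, the following minimal sketch counts the weights of a standard convolution and of its depthwise separable counterpart. The helper names and the layer size (3×3 kernel, 256 input and 256 output channels) are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes the channels
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std)  # 589824
print(sep)  # 67840
```

For this layer the separable version needs roughly 8.7 times fewer weights, which is where the resource savings come from.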

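Deep learning-based object detection with OpenCV

The MobileNet SSD was first trained on the COCO dataset (Common Objects in Context) and then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean average precision).

We can therefore detect 20 objects in images (+1 for the background class), including airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, people, potted plants, sheep, sofas, trains, and TV monitors.

In this section we will use the MobileNet SSD together with the deep neural network (dnn) module in OpenCV to build our object detector.

Open a new file, name it object_detection.py, and insert the following code: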

| import numpy as np
| import cv2
| if __name__ == "__main__":
|     image_name = '11.jpg'
|     prototxt = 'MobileNetSSD_deploy.prototxt.txt'
|     model_path = 'MobileNetSSD_deploy.caffemodel'
|     confidence_ta = 0.2
|     # Initialize the list of class labels MobileNet SSD was trained to
|     # detect, then generate a set of bounding box colors for each class
|     CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
|                "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
|                "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
|                "sofa", "train", "tvmonitor"]
|     COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

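Import the required packages.

Define the global parameters:

image_name: path to the input image.
prototxt: path to the Caffe prototxt file.
model_path: path to the pre-trained model.
confidence_ta: minimum probability threshold used to filter out weak detections. The default is 20%.

Next, we initialize the class labels and the bounding box colors.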

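|     # Load our serialized model from disk
|     print("[INFO] loading model...")
|     net = cv2.dnn.readNetFromCaffe(prototxt, model_path)
|     # Load the input image and construct an input blob for it by
|     # resizing to a fixed 300x300 pixels
|     # (note: the SSD model expects a 300x300 input)
|     image = cv2.imread(image_name)
|     # cv2.imread returns None when the file cannot be read
|     if image is None:
|         raise FileNotFoundError(image_name)
|     (h, w) = image.shape[:2]
|     blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843,
|                                  (300, 300), 127.5)
|     # Pass the blob through the network and obtain the detections
|     # and predictions
|     print("[INFO] computing object detections...")
|     net.setInput(blob)
|     detections = net.forward()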

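Load the model from disk.

Read the image.

Extract the height and width, and compute a 300 x 300 pixel blob from the image.

Feed the blob into the neural network.

Compute the forward pass on the input and store the result as detections.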
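The values 0.007843 and 127.5 passed to blobFromImage are the scale factor and the mean: the mean is subtracted from every pixel and the result is multiplied by the scale factor (about 1/127.5), which maps pixel values from [0, 255] to roughly [-1, 1]. The sketch below reproduces that step with plain NumPy on an image that is assumed to be already resized to 300 x 300. The helper to_blob is hypothetical and only approximates what OpenCV does internally.

```python
import numpy as np


def to_blob(image_bgr, scale=0.007843, mean=127.5):
    """Approximate cv2.dnn.blobFromImage for an already-resized image."""
    # Subtract the mean first, then apply the scale factor
    x = (image_bgr.astype(np.float32) - mean) * scale
    # Reorder from (height, width, channels) to (batch, channels, height, width)
    return x.transpose(2, 0, 1)[np.newaxis, ...]


# Synthetic 300x300 image: first channel fully saturated, the others zero
dummy = np.zeros((300, 300, 3), dtype=np.uint8)
dummy[..., 0] = 255
blob = to_blob(dummy)
print(blob.shape)  # (1, 3, 300, 300)
print(float(blob.min()), float(blob.max()))  # close to -1 and 1
```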

|     # Loop over the detections
|     for i in np.arange(0, detections.shape[2]):
|         # Extract the confidence (i.e., probability) associated with
|         # the prediction
|         confidence = detections[0, 0, i, 2]
|         # Filter out weak detections by ensuring the confidence is
|         # greater than the minimum confidence
|         if confidence > confidence_ta:
|             # Extract the index of the class label from `detections`,
|             # then compute the (x, y) coordinates of the bounding box
|             idx = int(detections[0, 0, i, 1])
|             box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
|             (startX, startY, endX, endY) = box.astype("int")
|             # Display the prediction
|             label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
|             print("[INFO] {}".format(label))
|             cv2.rectangle(image, (startX, startY), (endX, endY),
|                           COLORS[idx], 2)
|             # Put the label above the box, or inside it if there is no room
|             y = startY - 15 if startY - 15 > 15 else startY + 15
|             cv2.putText(image, label, (startX, y),
|                         cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
|     # Show the output image and save it to disk
|     cv2.imshow("Output", image)
|     cv2.imwrite("output.jpg", image)
|     cv2.waitKey(0)

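Looping over the detections, we first extract the confidence value.

If the confidence is above our minimum threshold, we extract the class label index and compute the bounding box around the detected object.

We then extract the (x, y) coordinates of the box, which we will use shortly to draw the rectangle and display the text.

Next, we build a text label containing the class name and the confidence.

Using the label, we print it to the terminal and then draw a colored rectangle around the object using the (x, y) coordinates extracted earlier.

In general, we want the label displayed above the rectangle, but if there is no room, we display it just below the top of the rectangle.

Finally, we overlay the colored text onto the image using the y value we just computed.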
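The indexing used in the loop follows from the layout of the network output: an array of shape (1, 1, N, 7) in which each row holds an image id, a class index, a confidence, and four box coordinates normalized to [0, 1]. The sketch below runs the same filtering, scaling, and label placement on a hand-made array so the logic can be checked without the model. The function parse_detections and the sample values are assumptions made for illustration.

```python
import numpy as np


def parse_detections(detections, w, h, min_conf=0.2):
    """Return (class index, confidence, box, label y) for each strong detection."""
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        # Skip weak detections
        if confidence <= min_conf:
            continue
        idx = int(detections[0, 0, i, 1])
        # Scale the normalized box back to pixel coordinates
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        start_x, start_y, end_x, end_y = (int(v) for v in box.astype("int"))
        # Label goes above the box, or inside it if there is no room
        label_y = start_y - 15 if start_y - 15 > 15 else start_y + 15
        results.append((idx, confidence, (start_x, start_y, end_x, end_y), label_y))
    return results


# Two made-up detections: one strong (class 15, "person"), one weak
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 15, 0.9, 0.25, 0.5, 0.75, 1.0]
fake[0, 0, 1] = [0, 8, 0.1, 0.0, 0.0, 0.5, 0.5]
results = parse_detections(fake, w=400, h=200)
print(results)
```

Only the first detection survives the 0.2 threshold, and its box comes out as (100, 100, 300, 200) for a 400 x 200 image.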

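Running the script prints each detection to the terminal, displays the annotated image in a window, and saves it as output.jpg.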

Detecting objects in video with OpenCV

Open a new file, name it video_object_detection.py, and insert the following code:

| import time
| import cv2
| import imutils
| import numpy as np
| video_name = '12.mkv'
| prototxt = 'MobileNetSSD_deploy.prototxt.txt'
| model_path = 'MobileNetSSD_deploy.caffemodel'
| confidence_ta = 0.2
| # Initialize the list of class labels MobileNet SSD was trained to
| # detect, then generate a set of bounding box colors for each class
| CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
|            "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
|            "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
|            "sofa", "train", "tvmonitor"]
| COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
| # Load our serialized model from disk
| print("[INFO] loading model...")
| net = cv2.dnn.readNetFromCaffe(prototxt, model_path)
| # Open the input video and set up the writer for the annotated output
| print('[INFO] starting video stream...')
| vs = cv2.VideoCapture(video_name)
| fps = 30  # FPS of the saved video; adjust as needed
| size = (600, 325)  # (width, height); must match the frames that are written
| fourcc = cv2.VideoWriter_fourcc(*'XVID')
| videowrite = cv2.VideoWriter('output.avi', fourcc, fps, size)
| # Short pause before processing (only really needed for a live camera)
| time.sleep(2.0)

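Define the global parameters:

video_name: path to the input video.
prototxt: path to the Caffe prototxt file.
model_path: path to the pre-trained model.
confidence_ta: minimum probability threshold used to filter out weak detections. The default is 20%.

Next, we initialize the class labels and the bounding box colors.

Load the model.

Initialize the VideoCapture object.

Set up the VideoWriter object and its parameters. The value of size is determined by the code below and must match the size of the frames being written; otherwise the video cannot be saved.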
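One way to avoid a mismatch is to derive size from the source resolution instead of hard-coding it. The sketch below mirrors, as far as I know, how imutils.resize picks the height when only a width is given (the height is scaled by the same ratio and truncated to an integer). The function resized_size and the 1920 x 1080 source resolution are assumptions made for illustration.

```python
def resized_size(orig_w, orig_h, width):
    """(width, height) after an aspect-preserving resize to the given width."""
    ratio = width / float(orig_w)
    return (width, int(orig_h * ratio))


# Hypothetical 1920x1080 source video resized to a width of 600
size = resized_size(1920, 1080, 600)
print(size)  # (600, 337)
```

With a real file, the source width and height could be read from the capture object through cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT before the VideoWriter is created.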

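Next, loop over the frames of the video and pass each one to the detector. The logic of this part is the same as for image detection. The code is as follows: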

| # Loop over the frames from the video file
| while True:
|     ret_val, frame = vs.read()
|     # Stop when no more frames can be read
|     if not ret_val:
|         break
|     # Resize to the width the VideoWriter expects, keeping the aspect ratio
|     frame = imutils.resize(frame, width=size[0])
|     # Frames that do not match `size` are not saved, so force an exact match
|     if (frame.shape[1], frame.shape[0]) != size:
|         frame = cv2.resize(frame, size)
|     # Grab the frame dimensions and convert the frame to a blob
|     (h, w) = frame.shape[:2]
|     blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843,
|                                  (300, 300), 127.5)
|     # Pass the blob through the network and obtain the detections and predictions
|     net.setInput(blob)
|     detections = net.forward()
|     # Loop over the detections
|     for i in np.arange(0, detections.shape[2]):
|         # Extract the confidence (i.e., probability) associated with
|         # the prediction
|         confidence = detections[0, 0, i, 2]
|         # Filter out weak detections by ensuring the `confidence` is
|         # greater than the minimum confidence
|         if confidence > confidence_ta:
|             # Extract the index of the class label from the
|             # `detections`, then compute the (x, y)-coordinates of
|             # the bounding box for the object
|             idx = int(detections[0, 0, i, 1])
|             box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
|             (startX, startY, endX, endY) = box.astype("int")
|             # Draw the prediction on the frame
|             label = "{}: {:.2f}%".format(CLASSES[idx],
|                                          confidence * 100)
|             cv2.rectangle(frame, (startX, startY), (endX, endY),
|                           COLORS[idx], 2)
|             y = startY - 15 if startY - 15 > 15 else startY + 15
|             cv2.putText(frame, label, (startX, y),
|                         cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
|     # Show the output frame and append it to the output video
|     cv2.imshow("Frame", frame)
|     videowrite.write(frame)
|     key = cv2.waitKey(1) & 0xFF
|     # If the `q` key was pressed, break from the loop
|     if key == ord("q"):
|         break
| # Release the writer and the capture, then close the windows
| videowrite.release()
| cv2.destroyAllWindows()
| vs.release()

Result (video on Bilibili): https://www.bilibili.com/video/BV1h8411c7zk/?spm_id_from=333.337.search-card.all.click&vd_source=b2d0f4c8b7d5e055a1426385d0f0fd5b


