1. Object Detection Basics
1.1 What Object Detection Is
Object detection is easiest to understand by contrasting it with image classification:
Image classification:
Only needs to decide whether the input image contains the object of interest, i.e. predict a class label for the whole image.
Object detection:
Besides recognizing the categories of the objects in the image, it must also localize each object precisely and mark it with a bounding rectangle.
1.2 The General Idea of Object Detection
Overall approach: first generate a large number of candidate boxes, then classify each candidate box and fine-tune (regress) its position.

Figure 1: Object detection viewed in comparison with classification
1.3 Bounding Box Formats
In image classification, the annotation is just a class label. In object detection, besides the class label, the annotation must also contain the object's location, i.e. its bounding box.
Two formats are commonly used to represent a bbox: the corner form (x1, y1, x2, y2) and the center form (c_x, c_y, w, h).
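The two forms are interchangeable. Below is a minimal conversion sketch (the function names xyxy_to_cxcywh and cxcywh_to_xyxy are illustrative, not part of the tutorial code):
"""python
bbox format conversion (illustrative)
"""
def xyxy_to_cxcywh(x1, y1, x2, y2):
    # corner form (x1, y1, x2, y2) -> center form (c_x, c_y, w, h)
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h

def cxcywh_to_xyxy(cx, cy, w, h):
    # center form (c_x, c_y, w, h) -> corner form (x1, y1, x2, y2)
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2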
1.4 Intersection over Union (IoU)
IoU (Intersection over Union) is the ratio of the area of the intersection of two boxes to the area of their union.
The concrete computation goes as follows (using the red and green boxes in the figure; a code sketch follows this list):
1. Get the coordinates of the two boxes. Red box: top-left (red_x1, red_y1), bottom-right (red_x2, red_y2); green box: top-left (green_x1, green_y1), bottom-right (green_x2, green_y2).
2. Take the element-wise maximum of the two top-left corners, (max(red_x1, green_x1), max(red_y1, green_y1)), and the element-wise minimum of the two bottom-right corners, (min(red_x2, green_x2), min(red_y2, green_y2)).
3. Use the coordinates from step 2 to compute the area of the intersection (the yellow box): yellow_area. If the boxes do not overlap, the width or height from step 2 is negative and the intersection area should be clamped to 0.
4. Compute the areas of the red and green boxes: red_area and green_area.
5. iou = yellow_area / (red_area + green_area - yellow_area)
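A minimal sketch of the steps above (the function name compute_iou is illustrative; both boxes are assumed to be in (x1, y1, x2, y2) form):
"""python
IoU computation (illustrative)
"""
def compute_iou(box_a, box_b):
    # corners of the intersection: max of the top-left corners, min of the bottom-right corners
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])
    # clamp to 0 when the boxes do not overlap
    inter_area = max(inter_x2 - inter_x1, 0) * max(inter_y2 - inter_y1, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter_area / (area_a + area_b - inter_area)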
2. The VOC Object Detection Dataset
2.1 Introduction to the VOC Dataset
2.1.1 Dataset categories: the VOC dataset covers 20 object categories (the class-list figure is not reproduced here).
2.1.2 Dataset scale
The numbers of images and annotated objects in the VOC dataset are shown in the figure below.
Here, Images is the number of images and Objects is the number of annotated objects.
2.2 Building the Dataloader for the VOC Dataset
2.2.1 Preparing the dataset
- Prepare create_data_lists.py
"""python
create_data_lists
"""
from utils import create_data_lists
if __name__ == '__main__':
# voc07_path,voc12_path為我們訓練測試所需要用到的資料集,output_folder為我們生成建構dataloader所需檔案的路徑
# 參數中涉及的路徑以個人實際路徑為準,建議将資料集放到dataset目錄下,和教程保持一緻
create_data_lists(voc07_path='Desktop/Computer Science/Datawhale/CV/dataset/VOCdevkit/VOC2007',
voc12_path='Desktop/Computer Science/Datawhale/CV/dataset/VOCdevkit/VOC2012',
output_folder='Desktop/Computer Science/Datawhale/CV/dataset/VOCdevkit')
After setting the paths accordingly, we can run this script, for example in a Jupyter notebook.
2.2.2 Building the dataloader
Let's go straight to the code:
"""python
PascalVOCDataset具體實作過程
"""
import torch
from torch.utils.data import Dataset
import json
import os
from PIL import Image
from utils import transform
class PascalVOCDataset(Dataset):
"""
A PyTorch Dataset class to be used in a PyTorch DataLoader to create batches.
"""
#初始化相關變量
#讀取images和objects标注資訊
def __init__(self, data_folder, split, keep_difficult=False):
"""
:param data_folder: folder where data files are stored
:param split: split, one of 'TRAIN' or 'TEST'
:param keep_difficult: keep or discard objects that are considered difficult to detect?
"""
self.split = split.upper() #保證輸入為純大寫字母,便于比對{'TRAIN', 'TEST'}
assert self.split in {'TRAIN', 'TEST'}
self.data_folder = data_folder
self.keep_difficult = keep_difficult
# Read data files
with open(os.path.join(data_folder, self.split + '_images.json'), 'r') as j:
self.images = json.load(j)
with open(os.path.join(data_folder, self.split + '_objects.json'), 'r') as j:
self.objects = json.load(j)
assert len(self.images) == len(self.objects)
#循環讀取image及對應objects
#對讀取的image及objects進行tranform操作(資料增廣)
#傳回PIL格式圖像,标注框,标注框對應的類别索引,對應的difficult标志(True or False)
def __getitem__(self, i):
# Read image
#*需要注意,在pytorch中,圖像的讀取要使用Image.open()讀取成PIL格式,不能使用opencv
#*由于Image.open()讀取的圖檔是四通道的(RGBA),是以需要.convert('RGB')轉換為RGB通道
image = Image.open(self.images[i], mode='r')
image = image.convert('RGB')
# Read objects in this image (bounding boxes, labels, difficulties)
objects = self.objects[i]
boxes = torch.FloatTensor(objects['boxes']) # (n_objects, 4)
labels = torch.LongTensor(objects['labels']) # (n_objects)
difficulties = torch.ByteTensor(objects['difficulties']) # (n_objects)
# Discard difficult objects, if desired
#如果self.keep_difficult為False,即不保留difficult标志為True的目标
#那麼這裡将對應的目标删去
if not self.keep_difficult:
boxes = boxes[1 - difficulties]
labels = labels[1 - difficulties]
difficulties = difficulties[1 - difficulties]
# Apply transformations
#對讀取的圖檔應用transform
image, boxes, labels, difficulties = transform(image, boxes, labels, difficulties, split=self.split)
return image, boxes, labels, difficulties
#擷取圖檔的總數,用于計算batch數
def __len__(self):
return len(self.images)
#我們知道,我們輸入到網絡中訓練的資料通常是一個batch一起輸入,而通過__getitem__我們隻讀取了一張圖檔及其objects資訊
#如何将讀取的一張張圖檔及其object資訊整合成batch的形式呢?
#collate_fn就是做這個事情,
#對于一個batch的images,collate_fn通過torch.stack()将其整合成4維tensor,對應的objects資訊分别用一個list存儲
def collate_fn(self, batch):
"""
Since each image may have a different number of objects, we need a collate function (to be passed to the DataLoader).
This describes how to combine these tensors of different sizes. We use lists.
Note: this need not be defined in this Class, can be standalone.
:param batch: an iterable of N sets from __getitem__()
:return: a tensor of images, lists of varying-size tensors of bounding boxes, labels, and difficulties
"""
images = list()
boxes = list()
labels = list()
difficulties = list()
for b in batch:
images.append(b[0])
boxes.append(b[1])
labels.append(b[2])
difficulties.append(b[3])
#(3,224,224) -> (N,3,224,224)
images = torch.stack(images, dim=0)
return images, boxes, labels, difficulties # tensor (N, 3, 224, 224), 3 lists of N tensors each
Finally, we can pass the dataset to PyTorch and build the DataLoader directly.
"""python
DataLoader
"""
#參數說明:
#在train時一般設定shufle=True打亂資料順序,增強模型的魯棒性
#num_worker表示讀取資料時的線程數,一般根據自己裝置配置确定(如果是windows系統,建議設預設值0,防止出錯)
#pin_memory,在計算機記憶體充足的時候設定為True可以加快記憶體中的tensor轉換到GPU的速度,具體原因可以百度哈~
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
collate_fn=train_dataset.collate_fn, num_workers=workers,
pin_memory=True) # note that we're passing the collate function here
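For reference, a minimal usage sketch, assuming the JSON files produced by create_data_lists.py live in data_folder (data_folder, batch_size, and workers are placeholder values, not fixed by the tutorial):
"""python
DataLoader usage sketch (illustrative)
"""
data_folder = 'dataset/VOCdevkit'   # assumed output_folder of create_data_lists.py
batch_size, workers = 8, 4          # placeholder values

train_dataset = PascalVOCDataset(data_folder, split='TRAIN', keep_difficult=False)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                                           collate_fn=train_dataset.collate_fn, num_workers=workers,
                                           pin_memory=True)

# Each batch yields an (N, 3, H, W) image tensor plus per-image lists of boxes, labels and difficulties
images, boxes, labels, difficulties = next(iter(train_loader))
print(images.shape, len(boxes), len(labels), len(difficulties))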
Reference material: the Datawhale community.