I took some time to dig into this network in detail. Corrections from more experienced readers are welcome.

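The general picture: the SSD network extracts several feature maps. Each feature map can be viewed as a grid, and every grid point is an anchor point. Centered on each anchor point, anchors of different sizes and aspect ratios are generated, and every one of these anchors is a candidate object. An object detection network has two parts, localization and classification. Classification is simple: every anchor at every point of every feature map is classified, and SSD treats the background as a class of its own. Localization brings in bounding-box regression, which first appeared in R-CNN. The idea is that the network predicts a box regression value for every anchor at every point of every feature map, while the true regression value can be computed from where the ground-truth object actually is. The deviation between the predicted value and the true value can then be computed.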
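To make the "regression value" concrete, here is a minimal sketch of the center-size box encoding that R-CNN introduced. The names encode_box and decode_box are hypothetical, the (cx, cy, w, h) box layout is an assumption made for the sketch, and the extra scaling factors that SSD implementations usually apply to these offsets are left out.

```python
import math

def encode_box(anchor, gt):
    """Regression target that maps `anchor` onto the ground-truth box `gt`.

    Both boxes are (cx, cy, w, h) tuples. This layout is an assumption
    made for the sketch, not the representation used in the source.
    """
    a_cx, a_cy, a_w, a_h = anchor
    g_cx, g_cy, g_w, g_h = gt
    tx = (g_cx - a_cx) / a_w   # center shift, in units of the anchor width
    ty = (g_cy - a_cy) / a_h   # center shift, in units of the anchor height
    tw = math.log(g_w / a_w)   # log of the width ratio
    th = math.log(g_h / a_h)   # log of the height ratio
    return tx, ty, tw, th

def decode_box(anchor, t):
    """Inverse of encode_box: apply the offsets `t` to `anchor`."""
    a_cx, a_cy, a_w, a_h = anchor
    tx, ty, tw, th = t
    return (a_cx + tx * a_w, a_cy + ty * a_h,
            a_w * math.exp(tw), a_h * math.exp(th))

anchor = (0.5, 0.5, 0.25, 0.25)
gt = (0.5625, 0.5, 0.5, 0.25)
target = encode_box(anchor, gt)       # what the network should predict
restored = decode_box(anchor, target)  # recovers gt up to rounding
print(target)
print(restored)
```

During training, the network's predicted offsets for an anchor are compared against a target like this one, and the difference between the two is what the localization loss penalizes.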

1. Reading the training data

In the source code, training data reading lives in the datasets folder. The data is read with the following lines:

dataset_dir ='./datasets/train/'
dataset_split_name = 'train'
dataset_name = 'pascalvoc_2012'
dataset = dataset_factory.get_dataset(dataset_name, dataset_split_name, dataset_dir)

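Inside dataset_factory you can see three datasets: cifar, imagenet and pascalvoc. I chose the voc2012 format to build and read my training data. The process of generating tfrecord files for voc in the source is very simple, so I won't describe it here.

You can open pascalvoc2012.py to see the corresponding configuration. Its get_split function mainly uses slim.dataset.Dataset() to read the data. For details, see the post "tensorflow: reading data from disk".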

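2. Data preprocessing

Preprocessing for deep-learning image tasks is mostly image augmentation. Common operations are cropping, random brightness, random contrast, whitening and so on. Because SSD involves annotated object boxes, the box information has to be updated accordingly after the image is cropped.

In the source, data processing lives in the preprocessing folder, where preprocessing_factory.py selects which preprocessing method to use.

While reading the source, I noticed that Python lets a function return another function directly: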

def get_preprocessing(name, is_training=False):
    
    preprocessing_fn_map = {
        'ssd_300_vgg': ssd_vgg_preprocessing,
        'ssd_512_vgg': ssd_vgg_preprocessing,
    }

    if name not in preprocessing_fn_map:
        raise ValueError('Preprocessing name [%s] was not recognized' % name)

    def preprocessing_fn(image, labels, bboxes,
                         out_shape, data_format='NHWC', **kwargs):
        return preprocessing_fn_map[name].preprocess_image(
            image, labels, bboxes, out_shape, data_format=data_format,
            is_training=is_training, **kwargs)
    return preprocessing_fn

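I looked this usage up. Functions in Python are first-class objects, so a function can be returned like any other value. When get_preprocessing is called from outside, what comes back is the inner function preprocessing_fn itself, a closure that remembers name and is_training: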

image_preprocessing_fn = preprocessing_factory.get_preprocessing(
        preprocessing_name, is_training=True)

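Personally I think this approach delays the actual function call and makes it easy to check arguments before execution, but the layers of calls really are confusing when reading the source. For a fuller explanation, see the Zhihu question "Why can a Python function return a function defined inside it?"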
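The pattern is easier to see on a toy version. The sketch below mirrors the shape of get_preprocessing, but every name in it (get_transform, transform_map, transform_fn) is hypothetical and the "transforms" are plain arithmetic.

```python
def get_transform(name, is_training=False):
    """Toy factory that returns a function; all names are hypothetical."""
    transform_map = {
        'double': lambda x: x * 2,
        'negate': lambda x: -x,
    }

    # This check runs immediately, when the factory is called,
    # long before any data is passed through the returned function.
    if name not in transform_map:
        raise ValueError('Transform name [%s] was not recognized' % name)

    def transform_fn(x):
        # `name` and `is_training` are captured from the enclosing call.
        y = transform_map[name](x)
        return y + 1 if is_training else y

    return transform_fn

train_fn = get_transform('double', is_training=True)
eval_fn = get_transform('double')
print(train_fn(10))  # 21
print(eval_fn(10))   # 20
```

The caller holds an ordinary function that already has its configuration baked in, so the rest of the pipeline only needs to pass the data.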

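Now for the actual image preprocessing. The call goes through the function preprocess_image, and you can see that the preprocessing differs between training and validation.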

def preprocess_image(image,
                     labels,
                     bboxes,
                     out_shape,
                     data_format,
                     is_training=False,
                     **kwargs):

    if is_training:
        return preprocess_for_train(image, labels, bboxes,
                                    out_shape=out_shape,
                                    data_format=data_format)
    else:
        return preprocess_for_eval(image, labels, bboxes,
                                   out_shape=out_shape,
                                   data_format=data_format,
                                   **kwargs)

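The training preprocessing pipeline is: 1. crop the image; 2. random left-right flip; 3. color distortion; 4. whitening. The slightly troublesome part is that after cropping and flipping, the bboxes have to be updated accordingly.

First, the image crop: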

def distorted_bounding_box_crop(image,
                                labels,
                                bboxes,
                                min_object_covered=0.3,
                                aspect_ratio_range=(0.9, 1.1),
                                area_range=(0.1, 1.0),
                                max_attempts=200,
                                clip_bboxes=True,
                                scope=None):
    
    with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bboxes]):
        # Each bounding box has shape [1, num_boxes, box coords] and
        # the coordinates are ordered [ymin, xmin, ymax, xmax].
        # Sample the box used to crop the image; it is also the reference for
        # recomputing the bboxes. bbox_begin is the top-left corner.
        bbox_begin, bbox_size, distort_bbox = tf.image.sample_distorted_bounding_box(
                tf.shape(image),
                bounding_boxes=tf.expand_dims(bboxes, 0),
                min_object_covered=min_object_covered,
                aspect_ratio_range=aspect_ratio_range,
                area_range=area_range,
                max_attempts=max_attempts,
                use_image_if_no_bounding_boxes=True)
        # distort_bbox returned above has shape [1, 1, 4], so pull the box out.
        distort_bbox = distort_bbox[0, 0]

        # Crop the image to the specified bounding box.
        cropped_image = tf.slice(image, bbox_begin, bbox_size)
        # Restore the shape since the dynamic slice loses 3rd dimension.
        cropped_image.set_shape([None, None, 3])

        # Update bounding boxes: resize and filter out.
        bboxes = tfe.bboxes_resize(distort_bbox, bboxes)
        labels, bboxes = tfe.bboxes_filter_overlap(labels, bboxes,
                                                   threshold=BBOX_CROP_OVERLAP,
                                                   assign_negative=False)
        return cropped_image, labels, bboxes, distort_bbox

The bbox update lives in tf_extended. First the bbox coordinates are updated:

def bboxes_resize(bbox_ref, bboxes, name=None):
    
    # Bboxes is dictionary.
    if isinstance(bboxes, dict):
        with tf.name_scope(name, 'bboxes_resize_dict'):
            d_bboxes = {}
            for c in bboxes.keys():
                d_bboxes[c] = bboxes_resize(bbox_ref, bboxes[c])
            return d_bboxes

    # Tensors inputs.
    with tf.name_scope(name, 'bboxes_resize'):
        # Translate.
        # Equivalent to moving the origin from [0, 0] to [bbox_ref[0], bbox_ref[1]].
        v = tf.stack([bbox_ref[0], bbox_ref[1], bbox_ref[0], bbox_ref[1]])
        bboxes = bboxes - v
        # Scale.
        # Re-normalize by the height and width of the reference box.
        s = tf.stack([bbox_ref[2] - bbox_ref[0],
                      bbox_ref[3] - bbox_ref[1],
                      bbox_ref[2] - bbox_ref[0],
                      bbox_ref[3] - bbox_ref[1]])
        bboxes = bboxes / s
        return bboxes
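A worked example helps here. The sketch below redoes the translate-and-scale step in plain Python; bboxes_resize_py is a hypothetical name, and the numbers are chosen only for illustration. Boxes are [ymin, xmin, ymax, xmax] in normalized coordinates, as in the source comments.

```python
def bboxes_resize_py(bbox_ref, bboxes):
    """Re-express `bboxes` in the coordinate frame of the crop `bbox_ref`."""
    ref_ymin, ref_xmin, ref_ymax, ref_xmax = bbox_ref
    ref_h = ref_ymax - ref_ymin
    ref_w = ref_xmax - ref_xmin
    resized = []
    for ymin, xmin, ymax, xmax in bboxes:
        resized.append([
            (ymin - ref_ymin) / ref_h,  # translate, then scale
            (xmin - ref_xmin) / ref_w,
            (ymax - ref_ymin) / ref_h,
            (xmax - ref_xmin) / ref_w,
        ])
    return resized

# The crop covers the central half of the image in each direction.
crop = [0.25, 0.25, 0.75, 0.75]
boxes = [
    [0.375, 0.375, 0.625, 0.625],  # fully inside the crop
    [0.5, 0.375, 1.0, 0.625],      # sticks out past the bottom of the crop
]
resized = bboxes_resize_py(crop, boxes)
print(resized)  # [[0.25, 0.25, 0.75, 0.75], [0.5, 0.25, 1.5, 0.75]]
```

The second box ends up with ymax = 1.5, outside the [0, 1] range of the cropped image. That is why a filtering step is still needed after the resize.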

Then it decides whether an object has been cropped too heavily and whether it should be kept:

def bboxes_filter_overlap(labels, bboxes,
                          threshold=0.5, assign_negative=False,
                          scope=None):

    with tf.name_scope(scope, 'bboxes_filter', [labels, bboxes]):
        # Ratio of the part of each bbox kept after the crop to its original area.
        scores = bboxes_intersection(tf.constant([0, 0, 1, 1], bboxes.dtype),
                                     bboxes)
        mask = scores > threshold
        # Keep all labels and boxes; negate the labels with too little overlap.
        if assign_negative:
            labels = tf.where(mask, labels, -labels)
            # bboxes = tf.where(mask, bboxes, bboxes)
        # Remove the labels and boxes with too little overlap.
        else:
            labels = tf.boolean_mask(labels, mask)
            bboxes = tf.boolean_mask(bboxes, mask)
        return labels, bboxes

def bboxes_intersection(bbox_ref, bboxes, name=None):
    with tf.name_scope(name, 'bboxes_intersection'):
        # Should be more efficient to first transpose.
        bboxes = tf.transpose(bboxes)
        bbox_ref = tf.transpose(bbox_ref)
        # Intersection bbox and volume.
        int_ymin = tf.maximum(bboxes[0], bbox_ref[0])
        int_xmin = tf.maximum(bboxes[1], bbox_ref[1])
        int_ymax = tf.minimum(bboxes[2], bbox_ref[2])
        int_xmax = tf.minimum(bboxes[3], bbox_ref[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        inter_vol = h * w
        bboxes_vol = (bboxes[2] - bboxes[0]) * (bboxes[3] - bboxes[1])
        scores = tfe_math.safe_divide(inter_vol, bboxes_vol, 'intersection')
        return scores
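Note that this score is not IoU: the intersection is divided by the area of the bbox alone, so it measures what fraction of each object survived the crop. A plain-Python sketch of the same computation follows. The function names are hypothetical, and the zero-area guard is my assumption about what safe_divide is for.

```python
def intersection_score(bbox_ref, bbox):
    """Fraction of `bbox` that lies inside `bbox_ref`.

    Boxes are [ymin, xmin, ymax, xmax].
    """
    int_ymin = max(bbox[0], bbox_ref[0])
    int_xmin = max(bbox[1], bbox_ref[1])
    int_ymax = min(bbox[2], bbox_ref[2])
    int_xmax = min(bbox[3], bbox_ref[3])
    h = max(int_ymax - int_ymin, 0.0)
    w = max(int_xmax - int_xmin, 0.0)
    inter_area = h * w
    bbox_area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    # Assumed guard: a degenerate box gets score 0 instead of dividing by 0.
    return inter_area / bbox_area if bbox_area > 0 else 0.0

def filter_overlap(labels, bboxes, threshold=0.5):
    """Drop boxes whose surviving fraction is not above `threshold`."""
    unit_box = [0.0, 0.0, 1.0, 1.0]  # the cropped image in its own frame
    kept = [(label, box) for label, box in zip(labels, bboxes)
            if intersection_score(unit_box, box) > threshold]
    return [label for label, _ in kept], [box for _, box in kept]

labels = [7, 12]
boxes = [[0.25, 0.25, 0.75, 0.75], [0.5, 0.25, 1.5, 0.75]]
scores = [intersection_score([0.0, 0.0, 1.0, 1.0], b) for b in boxes]
print(scores)  # [1.0, 0.5]
print(filter_overlap(labels, boxes))  # ([7], [[0.25, 0.25, 0.75, 0.75]])
```

The second box keeps exactly half of its area, which is not strictly greater than the 0.5 threshold used in this sketch, so it is removed together with its label.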

Next, the horizontal flip. It simply maps x to 1 - x in the x direction, which also means xmin and xmax swap roles:

def random_flip_left_right(image, bboxes, seed=None):
    """Random flip left-right of an image and its bounding boxes.
    """
    def flip_bboxes(bboxes):
        """Flip bounding boxes coordinates.
        """
        bboxes = tf.stack([bboxes[:, 0], 1 - bboxes[:, 3],
                           bboxes[:, 2], 1 - bboxes[:, 1]], axis=-1)
        return bboxes

    # Random flip. Tensorflow implementation.
    with tf.name_scope('random_flip_left_right'):
        image = ops.convert_to_tensor(image, name='image')
        _Check3DImage(image, require_static=False)
        # Draw a random number between 0 and 1 and compare it with 0.5.
        uniform_random = random_ops.random_uniform([], 0, 1.0, seed=seed)
        mirror_cond = math_ops.less(uniform_random, .5)
        # Flip image.
        # control_flow_ops.cond acts like an if-else statement.
        result = control_flow_ops.cond(mirror_cond,
                                       lambda: array_ops.reverse_v2(image, [1]),
                                       lambda: image)
        # Flip bboxes.
        bboxes = control_flow_ops.cond(mirror_cond,
                                       lambda: flip_bboxes(bboxes),
                                       lambda: bboxes)
        return fix_image_flip_shape(image, result), bboxes
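The box arithmetic can be checked by hand. The sketch below is a plain-Python stand-in, not the TensorFlow code: the image is assumed to be a list of pixel rows, and random_flip_py and its rng parameter are hypothetical. The point it illustrates is that one random decision drives both the image flip and the box flip, so the two always stay consistent.

```python
import random

def flip_bboxes(bboxes):
    """Mirror [ymin, xmin, ymax, xmax] boxes left-right in normalized coords."""
    # The old xmax becomes the new xmin (and vice versa) after x -> 1 - x.
    return [[ymin, 1 - xmax, ymax, 1 - xmin]
            for ymin, xmin, ymax, xmax in bboxes]

def random_flip_py(image_rows, bboxes, rng=random):
    """Flip image and boxes together with probability 0.5."""
    if rng.random() < 0.5:
        flipped_rows = [row[::-1] for row in image_rows]
        return flipped_rows, flip_bboxes(bboxes)
    return image_rows, bboxes

image = [[1, 2, 3],
         [4, 5, 6]]
boxes = [[0.25, 0.125, 0.75, 0.5]]

print(flip_bboxes(boxes))               # [[0.25, 0.5, 0.75, 0.875]]
print(flip_bboxes(flip_bboxes(boxes)))  # [[0.25, 0.125, 0.75, 0.5]]

out_image, out_boxes = random_flip_py(image, boxes, rng=random.Random(0))
```

Flipping twice returns the original boxes, which is a quick sanity check that ymin and ymax are left alone and only the x coordinates are mirrored.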

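The other two methods are fairly simple, so I won't describe them here. Preprocessing at validation time mainly adds a box the size of the whole image when there is no object. It also relies on cropping, whitening and so on, and I will fill in the details when I go through the validation code. That concludes the image preprocessing part of the ssd network.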

