How YOLOv3 computes anchor cluster centers with k-means: implementation details

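First, a reference: https://blog.csdn.net/zxyhhjs2017/article/details/83012425. Most readers have probably seen it already. Note, however, that the YOLOv3 code does not use the k-means++ algorithm explained there. It uses plain k-means, because the n initial cluster centers are picked at random directly from the boxes and are then updated iteratively.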
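To make the difference concrete, here is a minimal sketch of the two initialization strategies, using 1 - IoU as the distance. The helper names (`wh_iou`, `init_random`, `init_kmeans_pp`) and the sample boxes are made up for illustration. Only `init_random` corresponds to what the script in this post does; `init_kmeans_pp` is a simplified version of k-means++ seeding, shown only for contrast.

```python
import numpy as np


def wh_iou(box, clusters):
    """IoU between one (w, h) box and each (w, h) cluster, all sharing one corner."""
    inter = np.minimum(box[0], clusters[:, 0]) * np.minimum(box[1], clusters[:, 1])
    return inter / (box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter)


def init_random(boxes, k, rng):
    """Plain k-means initialization: k distinct boxes chosen uniformly at random."""
    return boxes[rng.choice(len(boxes), k, replace=False)]


def init_kmeans_pp(boxes, k, rng):
    """k-means++ style seeding (illustrative): each later center is sampled with
    probability proportional to its squared distance to the nearest chosen center."""
    centers = [boxes[rng.integers(len(boxes))]]
    for _ in range(k - 1):
        chosen = np.array(centers)
        # Distance of every box to its nearest already-chosen center.
        d = np.array([np.min(1 - wh_iou(b, chosen)) for b in boxes])
        probs = d ** 2 / np.sum(d ** 2)
        centers.append(boxes[rng.choice(len(boxes), p=probs)])
    return np.array(centers)


# Hypothetical (width, height) pairs, for demonstration only.
boxes = np.array([[10, 10], [12, 11], [50, 60], [55, 58], [100, 120], [110, 115]],
                 dtype=float)
rng = np.random.default_rng(0)
random_centers = init_random(boxes, 3, rng)
pp_centers = init_kmeans_pp(boxes, 3, rng)
```

Both functions return k rows taken from `boxes`. The k-means++ variant tends to spread the initial centers apart, while the random variant can start with several centers close together.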

Next, the code with annotations!

import numpy as np


class YOLO_Kmeans:

    def __init__(self, cluster_number, filename):
        self.cluster_number = cluster_number
        # Use the path passed in by the caller instead of a hardcoded file name
        self.filename = filename

    def iou(self, boxes, clusters):  # 1 box -> k clusters
        n = boxes.shape[0]
        k = self.cluster_number
        # Arrange box_area as an n-by-k matrix (each box area repeated k times)
        box_area = boxes[:, 0] * boxes[:, 1]
        box_area = box_area.repeat(k)
        box_area = np.reshape(box_area, (n, k))
        # Arrange cluster_area as an n-by-k matrix (the k cluster areas tiled n times)
        cluster_area = clusters[:, 0] * clusters[:, 1]
        cluster_area = np.tile(cluster_area, [1, n])
        cluster_area = np.reshape(cluster_area, (n, k))
        # Arrange the box widths and cluster widths as n-by-k matrices and take the
        # element-wise minimum: entry (i, j) is the smaller of box i's width and
        # cluster j's width
        box_w_matrix = np.reshape(boxes[:, 0].repeat(k), (n, k))
        cluster_w_matrix = np.reshape(np.tile(clusters[:, 0], (1, n)), (n, k))
        min_w_matrix = np.minimum(cluster_w_matrix, box_w_matrix)
        # Same for the heights: entry (i, j) is the smaller of box i's height and
        # cluster j's height
        box_h_matrix = np.reshape(boxes[:, 1].repeat(k), (n, k))
        cluster_h_matrix = np.reshape(np.tile(clusters[:, 1], (1, n)), (n, k))
        min_h_matrix = np.minimum(cluster_h_matrix, box_h_matrix)
        # Multiply the smaller width by the smaller height to get the intersection area
        inter_area = np.multiply(min_w_matrix, min_h_matrix)

        result = inter_area / (box_area + cluster_area - inter_area)
        return result

    def avg_iou(self, boxes, clusters):
        accuracy = np.mean([np.max(self.iou(boxes, clusters), axis=1)])
        return accuracy

    def kmeans(self, boxes, k, dist=np.median):
        box_number = boxes.shape[0]
        distances = np.empty((box_number, k))
        last_nearest = np.zeros((box_number,))
        np.random.seed()
        clusters = boxes[np.random.choice(
            box_number, k, replace=False)]  # init k clusters
        while True:

            distances = 1 - self.iou(boxes, clusters)
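            # distances is an n-by-k array with values below 1. Taking the argmin
            # of each row picks the closest cluster center for every box, e.g.
            # box 0 is closest to center 3, box 1 to center 4, and so on,
            # giving something like [3, 4, 5, 0, 1, ..., 4]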
            current_nearest = np.argmin(distances, axis=1)
            print((last_nearest == current_nearest))
            if (last_nearest == current_nearest).all():
                break  # clusters won't change
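            # The tricky part is updating the centers: each center becomes
            # dist (the median by default) of the boxes currently assigned to it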
            for cluster in range(k):
                clusters[cluster] = dist(  # update clusters
                    boxes[current_nearest == cluster], axis=0)

            last_nearest = current_nearest

        return clusters

    def result2txt(self, data):
        f = open("yolo_anchors.txt", 'w')
        row = np.shape(data)[0]
        for i in range(row):
            if i == 0:
                x_y = "%d,%d" % (data[i][0], data[i][1])
            else:
                x_y = ", %d,%d" % (data[i][0], data[i][1])
            f.write(x_y)
        f.close()

    def txt2boxes(self):
        f = open(self.filename, 'r')
        dataSet = []
        for line in f:
            infos = line.split(" ")
            length = len(infos)
            for i in range(1, length):
                width = int(infos[i].split(",")[2]) - \
                    int(infos[i].split(",")[0])
                height = int(infos[i].split(",")[3]) - \
                    int(infos[i].split(",")[1])
                dataSet.append([width, height])
        result = np.array(dataSet)
        f.close()
        return result

    def txt2clusters(self):
        all_boxes = self.txt2boxes()
        result = self.kmeans(all_boxes, k=self.cluster_number)
        result = result[np.lexsort(result.T[0, None])]
        self.result2txt(result)
        print("K anchors:\n {}".format(result))
        print("Accuracy: {:.2f}%".format(
            self.avg_iou(all_boxes, result) * 100))


if __name__ == "__main__":
    cluster_number = 3
    filename = "train.txt"
    kmeans = YOLO_Kmeans(cluster_number, filename)
    kmeans.txt2clusters()
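The `iou` method above builds its n-by-k matrices with `repeat`, `tile`, and `reshape`. As a cross-check, the same computation can be sketched with NumPy broadcasting. The function name `iou_broadcast` and the sample values are made up for illustration and are not part of the original script.

```python
import numpy as np


def iou_broadcast(boxes, clusters):
    """Return an (n, k) IoU matrix for (w, h) boxes against (w, h) clusters."""
    boxes = np.asarray(boxes, dtype=float)
    clusters = np.asarray(clusters, dtype=float)
    # (n, 1, 2) against (1, k, 2) broadcasts to (n, k, 2):
    # the smaller width and the smaller height for every box/cluster pair.
    inter_wh = np.minimum(boxes[:, None, :], clusters[None, :, :])
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    box_area = (boxes[:, 0] * boxes[:, 1])[:, None]            # shape (n, 1)
    cluster_area = (clusters[:, 0] * clusters[:, 1])[None, :]  # shape (1, k)
    return inter / (box_area + cluster_area - inter)


# Hypothetical (width, height) pairs, for demonstration only.
boxes = np.array([[10, 20], [30, 30]])
clusters = np.array([[10, 10], [30, 30], [20, 40]])
iou = iou_broadcast(boxes, clusters)
```

For the first box (10, 20) and the first cluster (10, 10), the intersection is 10 * 10 = 100 and the union is 200 + 100 - 100 = 200, so the IoU is 0.5. A box compared with a cluster of identical size gives 1.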
