kmeans code analysis

2023-08-05 06:37:16

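kmeans drawback: the result is sensitive to the choice of initial centroids; bisecting k-means (bikmeans) can mitigate this. For the complete code, see this blog post: http://blog.csdn.net/zouxy09/article/details/17590137

kmeans algorithm analysis:

1. Initialize the cluster centroids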

from numpy import zeros, random

def initCentroids(dataSet, k):
    numSamples, dim = dataSet.shape
    centroids = zeros((k, dim))
    for i in range(k):
        # pick a random sample as the i-th starting centroid
        index = int(random.uniform(0, numSamples))
        centroids[i, :] = dataSet[index, :]
    return centroids
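Because each index is drawn independently, initCentroids can pick the same sample more than once, which starts two centroids at the same point. A minimal sketch of one alternative follows, assuming it is acceptable to draw k distinct row indices instead; the name init_distinct_centroids and its seed parameter are hypothetical and not part of the referenced code.

```python
import numpy as np

def init_distinct_centroids(data, k, seed=None):
    """Hypothetical helper: start from k different rows of the data set."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # replace=False guarantees k different row indices
    # (requires k <= number of samples).
    index = rng.choice(data.shape[0], size=k, replace=False)
    return data[index].copy()
```

This only guarantees distinct row indices; if the data set itself contains duplicate rows, two centroids can still coincide.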

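Loop: keep iterating while the assignments have not converged, i.e. while clusterChanged = True.

1. Compute the distance from each sample to every cluster centroid, pick the nearest one, and update clusterAssment: the first column stores the label of the cluster the sample belongs to, and the second column stores the squared distance from the sample to that cluster's centroid. Set clusterChanged to decide whether another iteration is needed.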

while clusterChanged:
    clusterChanged = False
    ## for each sample
    for i in range(numSamples):
        minDist = float('inf')
        minIndex = 0
        ## for each centroid
        ## step 2: find the closest centroid
        for j in range(k):
            distance = euclDistance(centroids[j, :], dataSet[i, :])
            if distance < minDist:
                minDist = distance
                minIndex = j  # label of the nearest centroid

        ## step 3: update its cluster
        if clusterAssment[i, 0] != minIndex:
            clusterChanged = True
        # always store the squared distance, even when the label is unchanged
        clusterAssment[i, :] = minIndex, minDist ** 2
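The loop above calls euclDistance, which this excerpt does not define. A minimal sketch is given here, assuming it is the ordinary Euclidean (L2) distance between two points; the body is an assumption, and the full code is in the blog post linked above.

```python
import numpy as np

def euclDistance(vector1, vector2):
    # Assumed definition: straight-line (L2) distance between two points.
    diff = np.asarray(vector1, dtype=float) - np.asarray(vector2, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))
```

For example, euclDistance([0, 0], [3, 4]) evaluates to 5.0. Since the loop stores minDist ** 2, the second column of clusterAssment holds squared distances.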

2. Update the cluster centroids

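for j in range(k):
    # rows of dataSet whose assigned label is j (clusterAssment is a numpy matrix, hence .A)
    pointsInCluster = dataSet[nonzero(clusterAssment[:, 0].A == j)[0]]
    if len(pointsInCluster) > 0:  # skip empty clusters to avoid a NaN mean
        centroids[j, :] = mean(pointsInCluster, axis=0)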
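The two steps can be combined into one loop. The following is a rough, self-contained sketch using plain NumPy arrays and broadcasting rather than the matrix-based code above; the function name kmeans_sketch and the parameters init_centroids and max_iter are hypothetical, introduced only for illustration.

```python
import numpy as np

def kmeans_sketch(data, init_centroids, max_iter=100):
    """Hypothetical helper: alternate assignment and update until labels stop changing."""
    data = np.asarray(data, dtype=float)
    centroids = np.array(init_centroids, dtype=float)  # copy, so the input is not modified
    k = centroids.shape[0]
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Step 1: squared distance from every sample to every centroid,
        # then the index of the nearest centroid for each sample.
        sq_dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no sample changed cluster, so the loop has converged
        labels = new_labels
        # Step 2: move each centroid to the mean of its members.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels
```

Calling it with two starting centroids taken from the same group of points is a quick way to watch the centroids drift apart over a few iterations. The max_iter cap is a safety limit added in this sketch, not something taken from the code above.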
