k-means Code Analysis

2023-08-05 06:37:16

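k-means has one notable drawback: it is sensitive to the choice of initial centroids. Bisecting k-means (bikmean) can reduce this sensitivity. For the complete code, see the blog post: http://blog.csdn.net/zouxy09/article/details/17590137

k-means algorithm analysis:

1. Initialize the cluster centroids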

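from numpy import *  # assumed: zeros, random, nonzero and mean used here come from NumPy

def initCentroids(dataSet, k):
    numSamples, dim = dataSet.shape
    centroids = zeros((k, dim))
    for i in range(k):
        # pick a random sample index in [0, numSamples)
        index = int(random.uniform(0, numSamples))
        centroids[i, :] = dataSet[index, :]
    return centroids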
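
The loop above draws each index independently, so the same sample can be picked twice and two centroids can start out identical. Below is a minimal sketch of one way to avoid that, assuming NumPy's `Generator` API. The function name `init_centroids_distinct` and the `seed` parameter are illustrative and do not come from the original post.

```python
import numpy as np

def init_centroids_distinct(data_set, k, seed=None):
    """Pick k distinct rows of data_set as initial centroids (hypothetical helper)."""
    data = np.asarray(data_set, dtype=float)
    rng = np.random.default_rng(seed)
    # replace=False guarantees k different row indices
    indices = rng.choice(data.shape[0], size=k, replace=False)
    return data[indices].copy()

data = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
centroids = init_centroids_distinct(data, k=2, seed=0)
print(centroids.shape)
```

Distinct starting points do not remove the sensitivity to initialization, they only rule out the degenerate case of duplicated centroids.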

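Loop: keep iterating while the assignments are still changing, i.e. while clusterChanged = True.

1. Compute the distance from each sample to every centroid and pick the nearest one. Update clusterAssment: its first column stores the label of the cluster the sample belongs to, and its second column stores the squared distance from the sample to that cluster's centroid (the code stores minDist ** 2). Set clusterChanged to decide whether another iteration is needed.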

while clusterChanged:
    clusterChanged = False
    ## for each sample
    for i in range(numSamples):
        minDist = float('inf')
        minIndex = 0
        ## for each centroid
        ## step 2: find the closest centroid
        for j in range(k):
            distance = euclDistance(centroids[j, :], dataSet[i, :])
            if distance < minDist:
                minDist = distance
                minIndex = j  # label of the nearest centroid

        ## step 3: update the sample's cluster
        if clusterAssment[i, 0] != minIndex:
            clusterChanged = True
        # refresh label and squared distance even when the label is unchanged,
        # since the centroid may have moved since the last pass
        clusterAssment[i, :] = minIndex, minDist ** 2
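
euclDistance is called above but not shown in this excerpt. Here is a minimal sketch of what such a helper could look like, together with a vectorized version of the same assignment step. The body of `euclDistance` is an assumption (plain Euclidean distance) rather than a copy from the linked post, and `assign_clusters` is a hypothetical name.

```python
import numpy as np

def euclDistance(vector1, vector2):
    # Assumed definition: straight-line (L2) distance between two points
    diff = np.asarray(vector1, dtype=float) - np.asarray(vector2, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def assign_clusters(data, centroids):
    """Return (labels, squared distances) to the nearest centroid of each sample."""
    data = np.asarray(data, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # diffs[i, j, :] = data[i] - centroids[j], built by broadcasting
    diffs = data[:, None, :] - centroids[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)  # shape (numSamples, k)
    labels = np.argmin(sq_dists, axis=1)
    return labels, sq_dists[np.arange(len(data)), labels]

data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
labels, sq = assign_clusters(data, centroids)
print(labels, sq)
```

The two return values correspond to the two columns of clusterAssment: the cluster label and the squared distance.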

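2. Update the cluster centroids. This step runs inside the same while loop, after every sample has been assigned.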

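for j in range(k):
    # rows of dataSet whose assigned label is j (.A turns the matrix column into an ndarray)
    pointsInCluster = dataSet[nonzero(clusterAssment[:, 0].A == j)[0]]
    # the new centroid is the per-dimension mean of the cluster's members
    centroids[j, :] = mean(pointsInCluster, axis=0)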
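
Putting the two steps together, here is a self-contained sketch of the full loop that uses plain NumPy arrays instead of the matrix type implied by `.A`. The function name `kmeans_sketch`, the `max_iter` cap, the `seed` parameter, and the empty-cluster handling are assumptions added for illustration. They are not part of the original code.

```python
import numpy as np

def kmeans_sketch(data, k, max_iter=100, seed=None):
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # start from k distinct samples
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Step 1: assign each sample to its nearest centroid
        sq_dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no assignment changed, so the loop has converged
        labels = new_labels
        # Step 2: move each centroid to the mean of its members
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, labels = kmeans_sketch(data, k=2, seed=0)
print(centroids)
print(labels)
```

Comparing the new labels with the previous ones plays the same role as the clusterChanged flag: the loop stops as soon as a full pass changes no assignment.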
