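Decision-tree algorithms in machine learning rely mainly on information entropy to choose the most informative feature attribute as the node on which to split the dataset; the tree built from these splits is the training result. The function below computes the Shannon entropy of a dataset.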
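For reference, the quantity being computed can be written as follows. This is the standard definition of Shannon entropy, not a formula given by the author; the symbols $D$ (the dataset), $D_k$ (the samples with class label $k$), and $K$ (the number of distinct labels) are introduced here only for illustration.

```latex
H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k, \qquad p_k = \frac{|D_k|}{|D|}
```

The entropy is 0 when every sample carries the same label and reaches its maximum, $\log_2 K$, when all labels are equally frequent.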
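# -*- coding: utf-8 -*-
from math import log
import operator


def calcShanonEnt(dataSet):
    '''
    Compute the Shannon entropy of the given dataset.
    :param dataSet: list of feature vectors; the last element of each vector is its class label
    :return: shanonEnt, the entropy in bits
    '''
    numEntries = len(dataSet)
    # Count how many times each class label occurs.
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts:
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    # Accumulate -p * log2(p) over all labels.
    shanonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shanonEnt -= prob * log(prob, 2)
    return shanonEnt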
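To make the link between entropy and feature selection concrete, here is a minimal, self-contained sketch. It assumes the common information-gain criterion (as used in ID3-style trees), which the text above does not name explicitly. The names `shannon_entropy`, `information_gain`, and `toy_data` are hypothetical and chosen only for this illustration; the entropy helper mirrors the logic of `calcShanonEnt` but is rewritten so the block runs on its own.

```python
from collections import Counter
from math import log2


def shannon_entropy(rows):
    """Entropy (in bits) of the class labels stored in the last column."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, feature_index):
    """Entropy reduction obtained by splitting rows on one feature column."""
    # Group the rows by the value they take for the chosen feature.
    subsets = {}
    for row in rows:
        subsets.setdefault(row[feature_index], []).append(row)
    # Weighted average of the entropy of each group after the split.
    weighted = sum(
        len(subset) / len(rows) * shannon_entropy(subset)
        for subset in subsets.values()
    )
    return shannon_entropy(rows) - weighted


# Hypothetical toy data: two binary features followed by a class label.
toy_data = [
    [1, 1, 'yes'],
    [1, 1, 'yes'],
    [1, 0, 'no'],
    [0, 1, 'no'],
    [0, 1, 'no'],
]

base_entropy = shannon_entropy(toy_data)
gains = [information_gain(toy_data, i) for i in range(2)]
# The feature with the largest gain would be chosen as the split node.
best_feature = max(range(2), key=lambda i: gains[i])
print(base_entropy, gains, best_feature)
```

On this toy data, splitting on the first feature lowers the entropy more than splitting on the second, so under the assumed criterion the first feature would become the root node.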
Author: WangB