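Decision-tree algorithms in machine learning rely mainly on information entropy to choose which feature attribute becomes the node that splits the dataset, and the tree built from those splits is the training result.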
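# -*- coding: utf-8 -*-
from math import log
import operator


def calcShanonEnt(dataSet):
    '''
    Compute the Shannon entropy of the given dataset.
    :param dataSet: list of samples; the last element of each sample is its class label
    :return: shanonEnt, the entropy in bits
    '''
    numEntries = len(dataSet)
    # Count how many samples carry each class label
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts:
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    # H = -sum(p * log2(p)) over the label probabilities
    shanonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shanonEnt -= prob * log(prob, 2)
    return shanonEnt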
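To make the "choose a feature by entropy" step concrete, here is a minimal sketch of how an entropy function can drive the split choice. It assumes the usual information-gain criterion (entropy before the split minus the weighted entropy after it), which the text above does not spell out. The names `shannon_entropy`, `split_rows`, and `best_feature` and the toy data are made up for illustration and are not part of the original code.

```python
from collections import Counter
from math import log2


def shannon_entropy(rows):
    """Entropy (in bits) of the class labels stored in the last column."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def split_rows(rows, axis, value):
    """Rows whose feature at `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]


def best_feature(rows):
    """Index of the feature with the largest information gain (assumed criterion)."""
    base = shannon_entropy(rows)
    best_gain, best_axis = 0.0, -1
    for axis in range(len(rows[0]) - 1):  # last column is the label
        remainder = 0.0
        for value in set(row[axis] for row in rows):
            subset = split_rows(rows, axis, value)
            # Weight each subset's entropy by its share of the samples
            remainder += len(subset) / len(rows) * shannon_entropy(subset)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_axis = gain, axis
    return best_axis


# Hypothetical toy data: two binary features followed by a class label.
toy = [
    [1, 1, "yes"],
    [1, 1, "yes"],
    [1, 0, "no"],
    [0, 1, "no"],
    [0, 1, "no"],
]
print(shannon_entropy(toy))  # about 0.971 bits
print(best_feature(toy))     # 0
```

On this toy data, splitting on feature 0 leaves one pure subset (both rows labelled "no"), so it removes more uncertainty than feature 1 and is the one selected.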
Author: WangB