
Counting with Counter in Python

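Typical uses of counting:

1. Checking how many times a given value occurs in a sample

2. Analyzing how often a given message appears in logs

3. Analyzing how often the same string appears in a file, and so on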

Implementation (the sessions below use Python 2 syntax, such as the print statement):

1. Using dict

some_data = ['a','2',2,4,5,'2','b',4,7,'a',5,'d','a','z']

count_frq = dict()

for item in some_data:

    if item in count_frq:

        count_frq[item] += 1

    else:

        count_frq[item] = 1

print count_frq

{'a': 3, 2: 1, 'b': 1, 4: 2, 5: 2, 7: 1, '2': 2, 'z': 1, 'd': 1}
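
The if/else above can be folded into one line with dict.get(), which returns a default when the key is missing. A minimal sketch, written in Python 3 syntax (print as a function) and reusing the article's sample data:

```python
# Same sample data as in the article.
some_data = ['a', '2', 2, 4, 5, '2', 'b', 4, 7, 'a', 5, 'd', 'a', 'z']

count_frq = {}
for item in some_data:
    # get() returns 0 when the key is missing, so no if/else is needed.
    count_frq[item] = count_frq.get(item, 0) + 1

print(count_frq['a'], count_frq['2'], count_frq[2])  # 3 2 1
```

Note that the string '2' and the integer 2 are different keys, so they are counted separately.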

2. Using defaultdict

>>> from collections import defaultdict

>>> some_data = ['a','2',2,4,5,'2','b',4,7,'a',5,'d','a','z']

>>> count_frq = defaultdict(int)

>>> for item in some_data:

...     count_frq[item] +=1

... 

>>> print count_frq

defaultdict(<type 'int'>, {'a': 3, 2: 1, 'b': 1, 4: 2, 5: 2, 7: 1, '2': 2, 'z': 1, 'd': 1})
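
Why `+= 1` works on a key that does not exist yet: defaultdict(int) calls int() to create the value for a missing key, and int() returns 0. The sketch below (Python 3 syntax; the key names are made up for illustration) also shows a side effect worth knowing about.

```python
from collections import defaultdict

count_frq = defaultdict(int)

# int() with no arguments returns 0; this is the default for missing keys.
print(int())  # 0

# 'x' is created with the value 0 and then incremented.
count_frq['x'] += 1
print(count_frq['x'])  # 1

# Side effect: merely reading a missing key also inserts it.
_ = count_frq['never_seen']
print('never_seen' in count_frq)  # True

# Convert to a plain dict once that behavior is no longer wanted.
plain = dict(count_frq)
print(plain == {'x': 1, 'never_seen': 0})  # True
```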

3. Using set and list

>>> some_data = ['a','2',2,4,5,'2','b',4,7,'a',5,'d','a','z']

>>> count_set = set(some_data)

>>> count_list = []

>>> for item in count_set:

...     count_list.append((item,some_data.count(item)))

... 

>>> print count_list

[('a', 3), (2, 1), ('b', 1), (4, 2), (5, 2), (7, 1), ('2', 2), ('z', 1), ('d', 1)]
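
One thing to keep in mind with this approach: list.count() scans the whole list each time it is called, so the loop does one full scan per distinct value, whereas the dict-based versions make a single pass. The sketch below (Python 3 syntax) only checks that both approaches agree on the counts. The order of count_list follows the set's iteration order, so the results are compared as dicts.

```python
from collections import Counter

some_data = ['a', '2', 2, 4, 5, '2', 'b', 4, 7, 'a', 5, 'd', 'a', 'z']

# set + list.count(): one full scan of some_data per distinct value.
count_list = [(item, some_data.count(item)) for item in set(some_data)]

# Counter: a single pass over some_data.
single_pass = Counter(some_data)

# Same counts either way; only the ordering may differ.
print(dict(count_list) == dict(single_pass))  # True
```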

4. Using collections.Counter

>>> from collections import Counter

>>> some_data = ['a','2',2,4,5,'2','b',4,7,'a',5,'d','a','z']

>>> print Counter(some_data)

Counter({'a': 3, 4: 2, 5: 2, '2': 2, 2: 1, 'b': 1, 7: 1, 'z': 1, 'd': 1})
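
As an illustration of the log-analysis use case, here is a small sketch in Python 3 syntax. The log lines and their layout (date, time, level, message) are hypothetical, made up for this example.

```python
from collections import Counter

# Hypothetical log lines, invented for illustration.
log_lines = [
    "2024-01-01 10:00:01 ERROR disk full",
    "2024-01-01 10:00:02 INFO job started",
    "2024-01-01 10:00:03 ERROR disk full",
    "2024-01-01 10:00:04 WARNING slow response",
    "2024-01-01 10:00:05 INFO job finished",
    "2024-01-01 10:00:06 ERROR timeout",
]

# Assumed layout: the third whitespace-separated field is the level,
# and everything after it is the message.
levels = Counter(line.split()[2] for line in log_lines)
messages = Counter(line.split(None, 3)[3] for line in log_lines)

print(levels['ERROR'])          # 3
print(messages.most_common(1))  # [('disk full', 2)]
```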

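The Counter class is a container and a subclass of dict, used for counting hashable objects. It supports the set-like operators +, -, & and |. The + and - operators add and subtract counts (keeping only positive results), while & and | return, for each element, the minimum and the maximum of its counts in the two Counter objects, respectively.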
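
A short sketch of the four operators, in Python 3 syntax. The two input strings are chosen only for illustration.

```python
from collections import Counter

a = Counter("success")       # s:3, c:2, u:1, e:1
b = Counter("successfully")  # s:3, u:2, c:2, l:2, e:1, f:1, y:1

print((a + b)['s'])  # 6: counts are added
print((a | b)['u'])  # 2: max(1, 2)
print((a & b)['u'])  # 1: min(1, 2)

# Subtraction keeps only elements whose resulting count is positive.
print(b - a == Counter({'u': 1, 'f': 1, 'l': 2, 'y': 1}))  # True
print(a - b == Counter())  # True
```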

Three ways to initialize a Counter:

Counter("success")   # from an iterable

Counter(s=3,c=2,e=1,u=1) # from keyword arguments

Counter({"s":3,"c":2,"u":1,"e":1}) # from a dict
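
The three calls above build the same Counter, which the following sketch (Python 3 syntax; the variable names are illustrative) checks directly. It also shows that looking up a missing element returns 0 instead of raising KeyError.

```python
from collections import Counter

from_iterable = Counter("success")
from_keywords = Counter(s=3, c=2, e=1, u=1)
from_mapping = Counter({"s": 3, "c": 2, "u": 1, "e": 1})

print(from_iterable == from_keywords == from_mapping)  # True

# A missing element has a count of 0 and is not inserted by the lookup.
print(from_iterable['z'])    # 0
print('z' in from_iterable)  # False
```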

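Methods:

1. The elements() method returns an iterator over the elements (the keys) of the Counter, each repeated as many times as its count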

>>> list(Counter(some_data).elements())

['a', 'a', 'a', 2, 'b', 4, 4, 5, 5, 7, '2', '2', 'z', 'd']
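
elements() skips any element whose count is less than one. A minimal sketch in Python 3 syntax, with counts chosen only to show this:

```python
from collections import Counter

c = Counter(a=2, b=1, c=0, d=-1)

# Each element is repeated by its count; 'c' and 'd' are skipped.
print(sorted(c.elements()))  # ['a', 'a', 'b']

# Iterating the Counter itself yields every distinct key, whatever its count.
print(sorted(c))  # ['a', 'b', 'c', 'd']
```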

2. The most_common() method returns the N most frequent elements together with their counts

>>> Counter(some_data).most_common(2)

[('a', 3), (4, 2)]
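
When called without an argument, most_common() returns all elements, ordered from most to least frequent. Slicing that list from the end gives the least frequent ones. A sketch in Python 3 syntax (the order among elements with equal counts is deliberately not relied on here):

```python
from collections import Counter

c = Counter("successfully")  # s:3, u:2, c:2, l:2, e:1, f:1, y:1

print(c.most_common(1))      # [('s', 3)]
print(len(c.most_common()))  # 7: every distinct element

# The two least common elements: walk the full list backwards.
least = c.most_common()[:-3:-1]
print([count for _, count in least])  # [1, 1]
```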

3. The update() method adds the counts of the elements from another iterable or mapping to the existing counts

>>> c = Counter("success")

>>> c.update("successfully")

>>> c

Counter({'s': 6, 'c': 4, 'u': 3, 'e': 2, 'l': 2, 'f': 1, 'y': 1})
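
Unlike dict.update(), which replaces values, Counter.update() adds to the existing counts, and it accepts a mapping as well as an iterable. subtract() does the reverse. A brief sketch in Python 3 syntax; the extra key 'x' is made up for illustration.

```python
from collections import Counter

c = Counter("success")      # s:3, c:2, u:1, e:1

# With a mapping, the given counts are added rather than replacing the old ones.
c.update({"s": 2, "x": 1})
print(c['s'], c['x'])  # 5 1

# subtract() removes counts; results may reach zero or go negative.
c.subtract("sss")
print(c['s'])  # 2

# For contrast, a plain dict's update() overwrites the value.
d = dict(Counter("success"))
d.update({"s": 2})
print(d['s'])  # 2
```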