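Example uses of counting statistics:
1. Counting how many times a given value occurs in a sample
2. Measuring how often a given message appears in a log
3. Measuring how often the same string occurs in a file, and so on
Implementations:
1. dict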
some_data = ['a','2',2,4,5,'2','b',4,7,'a',5,'d','a','z']
count_frq = dict()
for item in some_data:
    if item in count_frq:
        # seen before: increment the existing count
        count_frq[item] += 1
    else:
        # first occurrence: start the count at 1
        count_frq[item] = 1
print count_frq
{'a': 3, 2: 1, 'b': 1, 4: 2, 5: 2, 7: 1, '2': 2, 'z': 1, 'd': 1}
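The if/else branch in the loop can also be collapsed with dict.get, which returns a default value when the key is missing. The following is a minimal sketch, assuming Python 3; the name count_get is only illustrative.

```python
some_data = ['a', '2', 2, 4, 5, '2', 'b', 4, 7, 'a', 5, 'd', 'a', 'z']

count_get = {}  # illustrative name, not from the original text
for item in some_data:
    # get() returns 0 for an item not seen yet, so no membership test is needed
    count_get[item] = count_get.get(item, 0) + 1

print(count_get['a'], count_get['2'], count_get[2])
```

Note that the string '2' and the integer 2 are different keys, so they are counted separately.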
2. defaultdict
>>> from collections import defaultdict
>>> some_data = ['a','2',2,4,5,'2','b',4,7,'a',5,'d','a','z']
>>> count_frq = defaultdict(int)
>>> for item in some_data:
...     count_frq[item] += 1
...
>>> print count_frq
defaultdict(<type 'int'>, {'a': 3, 2: 1, 'b': 1, 4: 2, 5: 2, 7: 1, '2': 2, 'z': 1, 'd': 1})
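defaultdict(int) works for counting because a missing key triggers a call to int(), which returns 0, and that value is stored before += 1 is applied. The sketch below, assuming Python 3, illustrates this behavior along with one side effect worth knowing about.

```python
from collections import defaultdict

count_frq = defaultdict(int)

# int() called with no arguments returns 0; that is the default for missing keys
zero = int()

count_frq['a'] += 1        # 'a' is created with 0, then incremented to 1
missing = count_frq['x']   # merely reading a missing key also inserts it

print(dict(count_frq))
```

Because a plain lookup inserts the key, a membership test such as `'x' in count_frq` or a call to `count_frq.get('x')` is the safer way to check for presence without changing the dictionary.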
3. Using set and list
>>> some_data = ['a','2',2,4,5,'2','b',4,7,'a',5,'d','a','z']
>>> count_set = set(some_data)
>>> count_list = []
>>> for item in count_set:
...     count_list.append((item, some_data.count(item)))
...
>>> print count_list
[('a', 3), (2, 1), ('b', 1), (4, 2), (5, 2), (7, 1), ('2', 2), ('z', 1), ('d', 1)]
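The same set-and-list approach can be written as a list comprehension. This is a sketch assuming Python 3; the name by_count is illustrative. Keep in mind that list.count scans the whole list on every call, so this approach does one full pass per distinct value, and that set iteration order is arbitrary, which is why the result is sorted by count here.

```python
some_data = ['a', '2', 2, 4, 5, '2', 'b', 4, 7, 'a', 5, 'd', 'a', 'z']

# One full scan of some_data for every distinct value in the set
count_list = [(item, some_data.count(item)) for item in set(some_data)]

# Sort by count only (descending); the items mix str and int,
# so they cannot be compared with each other in Python 3
by_count = sorted(count_list, key=lambda pair: pair[1], reverse=True)

print(by_count[0])
```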
4. Using collections.Counter
>>> from collections import Counter
>>> some_data = ['a','2',2,4,5,'2','b',4,7,'a',5,'d','a','z']
>>> print Counter(some_data)
Counter({'a': 3, 4: 2, 5: 2, '2': 2, 2: 1, 'b': 1, 7: 1, 'z': 1, 'd': 1})
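Counter is a container class and a subclass of dict, used to count hashable objects. It supports the set-style operations +, -, & and |; for each element, & returns the minimum and | returns the maximum of its counts in the two Counter objects.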
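The four operators can be sketched on two small Counters as follows. The input strings and variable names are chosen only for illustration, and the snippet assumes Python 3.

```python
from collections import Counter

a = Counter("success")   # s:3, c:2, u:1, e:1
b = Counter("access")    # a:1, c:2, e:1, s:2

added = a + b        # counts are summed per element
subtracted = a - b   # counts are subtracted; results of zero or less are dropped
intersect = a & b    # minimum of the two counts per element
union = a | b        # maximum of the two counts per element

print(added)
print(subtracted)
print(intersect)
print(union)
```

All four operators return a new Counter and keep only elements whose resulting count is positive, which is why 'c' and 'e' disappear from the difference.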
Three ways to initialize a Counter:
Counter("success") # from an iterable
Counter(s=3,c=2,e=1,u=1) # from keyword arguments
Counter({"s":3,"c":2,"u":1,"e":1}) # from a dict
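As a quick check, the three initializations listed above produce equal Counters. The variable names in this sketch are illustrative, and the snippet assumes Python 3.

```python
from collections import Counter

from_iterable = Counter("success")
from_keywords = Counter(s=3, c=2, e=1, u=1)
from_mapping = Counter({"s": 3, "c": 2, "u": 1, "e": 1})

# All three hold the same counts
same = from_iterable == from_keywords == from_mapping
print(same)

# Unlike a plain dict, a missing key returns 0 instead of raising KeyError,
# and the lookup does not insert the key
print(from_iterable["z"])
```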
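Methods:
1. elements() returns an iterator that yields each key repeated as many times as its count (wrapped in list() below to display it)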
>>> list(Counter(some_data).elements())
['a', 'a', 'a', 2, 'b', 4, 4, 5, 5, 7, '2', '2', 'z', 'd']
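One detail that the output does not show is that elements() skips any element whose count is less than one, while iterating over the Counter itself still yields every key. A small sketch with made-up counts, assuming Python 3:

```python
from collections import Counter

# Hypothetical counts, including a zero and a negative one
c = Counter(a=2, b=1, c=0, d=-1)

expanded = sorted(c.elements())   # only elements with a count of at least 1
keys = sorted(c)                  # every key, regardless of its count

print(expanded)
print(keys)
```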
2. most_common() returns the N most frequent elements together with their counts.
>>> Counter(some_data).most_common(2)
[('a', 3), (4, 2)]
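When most_common() is called without an argument, it returns every element ordered from most to least frequent, and slicing that list from the end gives the least frequent ones. The following sketch assumes Python 3; the variable names are illustrative.

```python
from collections import Counter

some_data = ['a', '2', 2, 4, 5, '2', 'b', 4, 7, 'a', 5, 'd', 'a', 'z']
c = Counter(some_data)

top_one = c.most_common(1)              # the single most frequent element
all_sorted = c.most_common()            # every element, most frequent first
least_three = c.most_common()[:-4:-1]   # the three least frequent elements

print(top_one)
print(len(all_sorted))
print(least_three)
```

Elements with equal counts are not ranked among themselves, so which of the tied elements appears in a top-N result can differ between Python versions.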
3. update() adds the counts of the new elements to the existing counts instead of replacing them
>>> c = Counter("success")
>>> c.update("successfully")
>>> c
Counter({'s': 6, 'c': 4, 'u': 3, 'e': 2, 'l': 2, 'f': 1, 'y': 1})
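update() also accepts a mapping, and its counterpart subtract() removes counts. The sketch below, assuming Python 3 and using made-up values, shows both. It also shows that subtract() keeps zero and negative counts, unlike the - operator.

```python
from collections import Counter

c = Counter("success")        # s:3, c:2, u:1, e:1

# With a mapping, the counts are added to the existing ones (dict.update would replace them)
c.update({"s": 2, "x": 1})    # s becomes 5, x becomes 1

c.subtract("sss")             # s becomes 2
c.subtract("xx")              # x becomes -1; subtract() keeps negative counts

print(c["s"], c["x"], c["c"])
```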