Python包括很多标準程式設計資料結構,如
list
,
tuple
,
dict
,
set
,這些屬于内置類型
collections子產品包含多種資料結構的實作,擴充了其他子產品中相應的結構。
Counter是一個容器,可以跟蹤相同的值增加了多少次。這個類可以用來實作其他語言常用包或多集合資料結構來實作的算法。
Deque是一個雙端隊列,允許從任意一端增加或删除元素。
defaultdict是一個字典,如果找不到某個鍵,會相應一個預設值。
OrderedDict會記住增加元素的序列。
nametuple擴充了一般的
tuple
,除了為每個成員元素提供一個數值索引外還提供了一個屬性名。
1.Counter
Counter作為一個容器,可以跟蹤相同的值增加了多少次。這個類可以用來實作其他語言常用包或多集合資料結構來實作的算法。
初始化
Counter支援
3
種形式的初始化。調用Counter的構造函數時可以提供一個元素序列或者一個包含鍵和計數的字典,還可以使用關鍵字參數将字元串名映射到計數。
import
collections
print
collections.Counter([
'a'
,
'b'
,
'c'
,
'a'
,
'b'
,
'b'
])
print
collections.Counter({
'a'
:
2
,
'b'
:
3
,
'c'
:
1
})
print
collections.Counter(a
=
2
, b
=
3
, c
=
1
)
print
collections.Counter('aabbbc'
)
這四種形式的初始化結構都是一樣的。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
Counter({
'b'
:
3
,
'a'
:
2
,
'c'
:
1
})
如果不提供任何參數,可以構造一個空的Counter,然後通過update()方法填充。
import
collections
c
=
collections.Counter()
print
'Initial :'
, c
c.update(
'abcdcaa'
)
print
'Sequencel:'
, c
c.update({
'a'
:
1
,
'd'
:
6
})
print
'Dict :'
, c
計數值将根據新資料增加,替換資料不會改變計數。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
Initial : Counter()
Sequencel: Counter({
'a'
:
3
,
'c'
:
2
,
'b'
:
1
,
'd'
:
1
})
Dict
: Counter({
'd'
:
7
,
'a'
:
4
,
'c'
:
2
,
'b'
:
1
})
通路計數
一旦填充了Counter,可以使用字典API擷取它的值。
import
collections
c
=
collections.Counter(
'abcdccca'
)
for
letter
in
'abcde'
:
print
'%s : %d'
%
(letter, c[letter])
對于未知元素,Counter不會産生KerError,如果沒有找到某個值,其計數為
。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
a :
2
b :
1
c :
4
d :
1
elements()方法傳回一個疊代器,将生産Counter知道的所有元素
import
collections
c
=
collections.Counter(
'abcdccca'
)
c[
'e'
]
=
print
c
print
list
(c.elements())
不能保證元素順序不變,另外計數小于或等于
的元素不包含在内。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
Counter({
'c'
:
4
,
'a'
:
2
,
'b'
:
1
,
'd'
:
1
,
'e'
:
})
[
'a'
,
'a'
,
'c'
,
'c'
,
'c'
,
'c'
,
'b'
,
'd'
]
使用most_common()可以生成一個序列,其中包含n個最常遇到的輸入值及其相應計數。
import
collections
c
=
collections.Counter()
with
open
(r
'd:\check_traffic.sh'
,
'rt'
) as f:
for
line
in
f:
c.update(line.rstrip().lower())
print
'Most common:'
for
letter, count
in
c.most_common(
5
):
print
'%s: %6d'
%
(letter, count)
統計系統所有單詞中出現的字母,生成一個頻率分布,然後列印
5
個最常見的字母。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
Most common:
:
6535
e:
3435
:
3202
t:
3141
i:
3100
算術操作
Counter執行個體支援算術和集合操作來完成結果的聚集。
import
collections
c1
=
collections.Counter([
'a'
,
'a'
,
'c'
,
'b'
,
'a'
])
c2
=
collections.Counter(
'alphabet'
)
print
'c1:'
, c1
print
'c2:'
, c2
print
'\nCombined counts:'
print
c1
+
c2
print
'\nSubtraction:'
print
c1
-
c2
print
'\nIntersection:'
print
c1 & c2
print
'\nUnion:'
print
c1 | c2
每次通過操作生成一個新的Counter時,計數為
或者負的元素都會被删除。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
c1: Counter({
'a'
:
3
,
'c'
:
1
,
'b'
:
1
})
c2: Counter({
'a'
:
2
,
'b'
:
1
,
'e'
:
1
,
'h'
:
1
,
'l'
:
1
,
'p'
:
1
,
't'
:
1
})
Combined counts:
Counter({
'a'
:
5
,
'b'
:
2
,
'c'
:
1
,
'e'
:
1
,
'h'
:
1
,
'l'
:
1
,
'p'
:
1
,
't'
:
1
})
Subtraction:
Counter({
'a'
:
1
,
'c'
:
1
})
Intersection:
Counter({
'a'
:
2
,
'b'
:
1
})
Union:
Counter({
'a'
:
3
,
'c'
:
1
,
'b'
:
1
,
'e'
:
1
,
'h'
:
1
,
'l'
:
1
,
'p'
:
1
,
't'
:
1
})
2
defaultdict
标準字典包括一個方法setdefault()來擷取一個值
,如果值不存在則建立一個預設值。defaultdict初始化容器是會讓調用者提前指定預設值。
import
collections
def
default_factory():
return
'default value'
d
=
collections.defaultdict(default_factory, foo
=
'bar'
) 或者 d = collections.defaultdict(lambda :'333',{'
foo'
=
'bar'}
)
print
'd:'
, d
print
'foo =>'
, d[
'foo'
]
print
'var =>'
, d[
'bar'
]
隻要所有鍵都有相同的預設值,就可以使用這個方法。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
d: defaultdict(<function default_factory at
0x0201FAB0
>, {
'foo'
:
'bar'
})
foo
=
> bar
var
=
> default value
3
deque
deque(兩端隊列)支援從任意一端增加和删除元素。常用的兩種結果,即棧和隊列,就是兩端隊列的退化形式,其輸入和輸出限制在一端。
import
collections
d
=
collections.deque(
'abcdefg'
)
print
'Deque:'
, d
print
'Length:'
,
len
(d)
print
'Left end'
, d[
]
print
'Right end'
, d[
-
1
]
d.remove(
'c'
)
print
'remove(c):'
, d
deque是一種序列容器,支援
list
操作,可以通過比對辨別從序列中間删除元素。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
Deque: deque([
'a'
,
'b'
,
'c'
,
'd'
,
'e'
,
'f'
,
'g'
])
Length:
7
Left end a
Right end g
remove(c): deque([
'a'
,
'b'
,
'd'
,
'e'
,
'f'
,
'g'
])
填充
deque可以從任意一端填充,在python實作稱為“左端”和“右端”。
import
collections
d1
=
collections.deque()
d1.extend(
'abcdefg'
)
print
'extend:'
, d1
d1.append(
'h'
)
print
'append:'
, d1
d2
=
collections.deque()
d2.extendleft(
xrange
(
6
))
print
'extendleft'
, d2
d2.appendleft(
6
)
print
'appendleft'
, d2
extendleft()疊代處理其輸入,對每個元素完成與appendleft()相同的處理。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
extend: deque([
'a'
,
'b'
,
'c'
,
'd'
,
'e'
,
'f'
,
'g'
])
append: deque([
'a'
,
'b'
,
'c'
,
'd'
,
'e'
,
'f'
,
'g'
,
'h'
])
extendleft deque([
5
,
4
,
3
,
2
,
1
,
])
appendleft deque([
6
,
5
,
4
,
3
,
2
,
1
,
])
利用
可以從兩端利用deque元素,取決于應用的算法。
import
collections
print
"From the right:"
d
=
collections.deque(
'abcdefg'
)
while
True
:
try
:
print
d.pop(),
except
IndexError:
break
print
print
"\nFrom the left:"
d
=
collections.deque(
xrange
(
6
))
while
True
:
try
:
print
d.popleft(),
except
IndexError:
break
print
使用pop()可以從deque右端删除一個元素,使用popleft()可以從deque左端删除一個元素。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
From the right:
g f e d c b a
From the left:
1
2
3
4
5
由于雙端隊列是線程安全的,可以在不同的線程中同時從兩端利用隊列的内容。
import
collections
import
threading
import
time
candle
=
collections.deque(
xrange
(
5
))
def
burn(direction, nextSource):
while
True
:
try
:
next
=
nextSource()
except
IndexError:
break
else
:
print
'%8s: %s'
%
(direction,
next
)
time.sleep(
0.1
)
print
'%8s done'
%
direction
return
left
=
threading.Thread(target
=
burn, args
=
(
'Left'
, candle.popleft))
right
=
threading.Thread(target
=
burn, args
=
(
'Right'
, candle.pop))
left.start()
right.start()
left.join()
right.join()
線程交替處理兩端,删除元素,知道這個deque為空。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
Left:
Right:
4
Right:
3
Left:
1
Right:
2
Left done
Right done
旋轉
deque另外一個作用可以按照任意一個方向旋轉,而跳過一些元素。
import
collections
d
=
collections.deque(
xrange
(
10
))
print
'Normal:'
, d
d
=
collections.deque(
xrange
(
10
))
d.rotate(
2
)
print
'Right roration:'
, d
d
=
collections.deque(
xrange
(
10
))
d.rotate(
-
2
)
print
'Left roration:'
, d
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
Normal: deque([
,
1
,
2
,
3
,
4
,
5
,
6
,
7
,
8
,
9
])
Right roration: deque([
8
,
9
,
,
1
,
2
,
3
,
4
,
5
,
6
,
7
])
Left roration: deque([
2
,
3
,
4
,
5
,
6
,
7
,
8
,
9
,
,
1
])
4
namedtuple
标準
tuple
使用數值索引來通路其成員。
nametuple執行個體與正常元祖在記憶體使用方面同樣高效,因為它們沒有各執行個體的字典。各種nametuple都是由自己的類表示,使用nametuple()工廠函數來建立。參數就是一個新類名和一個包含元素名的字元串。
import
collections
Person
=
collections.namedtuple(
'Persion'
,
'name age gender'
)
print
'Type of Person:'
,
type
(Person)
bob
=
Person(name
=
'Bob'
, age
=
30
, gender
=
'male'
)
print
'\nRepresentation:'
, bob
jane
=
Person(name
=
'Jane'
, age
=
28
, gender
=
'female'
)
print
'\nField by name:'
, jane.name
print
'\nField by index:'
for
p
in
[bob, jane]:
print
'%s is a %d year old %s'
%
p
5
OrderedDict
OrderedDict是一個字典子類,可以記住其内容增加的順序。
import
collections
print
'Regular dictionary:'
d
=
{}
d[
'a'
]
=
'A'
d[
'b'
]
=
'B'
d[
'c'
]
=
'C'
for
k, v
in
d.items():
print
k, v
print
'\nOrderDict:'
d
=
collections.OrderedDict()
d[
'a'
]
=
'A'
d[
'b'
]
=
'B'
d[
'c'
]
=
'C'
for
k, v
in
d.items():
print
k, v
正常
dict
并不跟蹤插入順序,疊代處理會根據鍵在散清單中存儲的順序來生成值。在OrderDict中則相反,它會記住元素插入的順序,并在建立疊代器時使用這個順序。
>>>
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
RESTART
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
>>>
Regular dictionary:
a A
c C
b B
OrderDict:
a A
b B
c C
正常
dict
在檢查相等性是會檢視其内容,OrderDict中還會考慮元素增加的順序。