天天看點

Python資料結構——collections

Python包括很多标準程式設計資料結構,如

list

,

tuple

,

dict

,

set

,這些屬于内置類型

collections子產品包含多種資料結構的實作,擴充了其他子產品中相應的結構。

Counter是一個容器,可以跟蹤相同的值增加了多少次。這個類可以用來實作其他語言常用包或多集合資料結構來實作的算法。

Deque是一個雙端隊列,允許從任意一端增加或删除元素。

defaultdict是一個字典,如果找不到某個鍵,會相應一個預設值。

OrderedDict會記住增加元素的序列。

nametuple擴充了一般的

tuple

,除了為每個成員元素提供一個數值索引外還提供了一個屬性名。

1.Counter

Counter作為一個容器,可以跟蹤相同的值增加了多少次。這個類可以用來實作其他語言常用包或多集合資料結構來實作的算法。

初始化

Counter支援

3

種形式的初始化。調用Counter的構造函數時可以提供一個元素序列或者一個包含鍵和計數的字典,還可以使用關鍵字參數将字元串名映射到計數。

import

collections

print

collections.Counter([

'a'

'b'

'c'

'a'

'b'

'b'

])

print

collections.Counter({

'a'

:

2

'b'

:

3

'c'

:

1

})

print

collections.Counter(a

=

2

, b

=

3

, c

=

1

)

print

collections.Counter('aabbbc'

)

這四種形式的初始化結構都是一樣的。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

Counter({

'b'

3

'a'

2

'c'

1

})

如果不提供任何參數,可以構造一個空的Counter,然後通過update()方法填充。

import

collections

=

collections.Counter()

print

'Initial  :'

, c

c.update(

'abcdcaa'

)

print

'Sequencel:'

, c

c.update({

'a'

:

1

'd'

:

6

})

print

'Dict     :'

, c

計數值将根據新資料增加,替換資料不會改變計數。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

Initial  : Counter()

Sequencel: Counter({

'a'

3

'c'

2

'b'

1

'd'

1

})

Dict

: Counter({

'd'

7

'a'

4

'c'

2

'b'

1

})

通路計數

一旦填充了Counter,可以使用字典API擷取它的值。

import

collections

=

collections.Counter(

'abcdccca'

)

for

letter 

in

'abcde'

:

print

'%s : %d'

%

(letter, c[letter])

對于未知元素,Counter不會産生KerError,如果沒有找到某個值,其計數為

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

a : 

2

b : 

1

c : 

4

d : 

1

elements()方法傳回一個疊代器,将生産Counter知道的所有元素

import

collections

=

collections.Counter(

'abcdccca'

)

c[

'e'

=

print

c

print

list

(c.elements())

不能保證元素順序不變,另外計數小于或等于

的元素不包含在内。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

Counter({

'c'

4

'a'

2

'b'

1

'd'

1

'e'

})

[

'a'

'a'

'c'

'c'

'c'

'c'

'b'

'd'

]

使用most_common()可以生成一個序列,其中包含n個最常遇到的輸入值及其相應計數。

import

collections

=

collections.Counter()

with 

open

(r

'd:\check_traffic.sh'

'rt'

) as f:

for

line 

in

f:

c.update(line.rstrip().lower())

print

'Most common:'

for

letter, count 

in

c.most_common(

5

):

print

'%s: %6d'

%

(letter, count)

統計系統所有單詞中出現的字母,生成一個頻率分布,然後列印

5

個最常見的字母。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

Most common:

:   

6535

e:   

3435

:   

3202

t:   

3141

i:   

3100

算術操作

Counter執行個體支援算術和集合操作來完成結果的聚集。

import

collections

c1 

=

collections.Counter([

'a'

'a'

'c'

'b'

,

'a'

])

c2 

=

collections.Counter(

'alphabet'

)

print

'c1:'

, c1

print

'c2:'

, c2

print

'\nCombined counts:'

print

c1 

+

c2

print

'\nSubtraction:'

print

c1 

-

c2

print

'\nIntersection:'

print

c1 & c2

print

'\nUnion:'

print

c1 | c2

每次通過操作生成一個新的Counter時,計數為

或者負的元素都會被删除。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

c1: Counter({

'a'

3

'c'

1

'b'

1

})

c2: Counter({

'a'

2

'b'

1

'e'

1

'h'

1

'l'

1

'p'

1

't'

1

})

Combined counts:

Counter({

'a'

5

'b'

2

'c'

1

'e'

1

'h'

1

'l'

1

'p'

1

't'

1

})

Subtraction:

Counter({

'a'

1

'c'

1

})

Intersection:

Counter({

'a'

2

'b'

1

})

Union:

Counter({

'a'

3

'c'

1

'b'

1

'e'

1

'h'

1

'l'

1

'p'

1

't'

1

})

2

defaultdict

标準字典包括一個方法setdefault()來擷取一個值

,如果值不存在則建立一個預設值。defaultdict初始化容器是會讓調用者提前指定預設值。

import

collections

def

default_factory():

return

'default value'

=

collections.defaultdict(default_factory, foo 

=

'bar'

) 或者 d = collections.defaultdict(lambda :'333',{'

foo' 

=

'bar'}

)

print

'd:'

, d

print

'foo =>'

, d[

'foo'

]

print

'var =>'

, d[

'bar'

]

隻要所有鍵都有相同的預設值,就可以使用這個方法。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

d: defaultdict(<function default_factory at 

0x0201FAB0

>, {

'foo'

'bar'

})

foo 

=

> bar

var 

=

> default value

3

deque

deque(兩端隊列)支援從任意一端增加和删除元素。常用的兩種結果,即棧和隊列,就是兩端隊列的退化形式,其輸入和輸出限制在一端。

import

collections

=

collections.deque(

'abcdefg'

)

print

'Deque:'

, d

print

'Length:'

len

(d)

print

'Left end'

, d[

]

print

'Right end'

, d[

-

1

]

d.remove(

'c'

)

print

'remove(c):'

, d

deque是一種序列容器,支援

list

操作,可以通過比對辨別從序列中間删除元素。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

Deque: deque([

'a'

'b'

'c'

'd'

'e'

'f'

'g'

])

Length: 

7

Left end a

Right end g

remove(c): deque([

'a'

'b'

'd'

'e'

'f'

'g'

])

填充

deque可以從任意一端填充,在python實作稱為“左端”和“右端”。

import

collections

d1 

=

collections.deque()

d1.extend(

'abcdefg'

)

print

'extend:'

, d1

d1.append(

'h'

)

print

'append:'

, d1

d2 

=

collections.deque()

d2.extendleft(

xrange

(

6

))

print

'extendleft'

, d2

d2.appendleft(

6

)

print

'appendleft'

, d2

extendleft()疊代處理其輸入,對每個元素完成與appendleft()相同的處理。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

extend: deque([

'a'

'b'

'c'

'd'

'e'

'f'

'g'

])

append: deque([

'a'

'b'

'c'

'd'

'e'

'f'

'g'

'h'

])

extendleft deque([

5

4

3

2

1

])

appendleft deque([

6

5

4

3

2

1

])

利用

可以從兩端利用deque元素,取決于應用的算法。

import

collections

print

"From the right:"

=

collections.deque(

'abcdefg'

)

while

True

:

try

:

print

d.pop(),

except

IndexError:

break

print

print

"\nFrom the left:"

=

collections.deque(

xrange

(

6

))

while

True

:

try

:

print

d.popleft(),

except

IndexError:

break

print

使用pop()可以從deque右端删除一個元素,使用popleft()可以從deque左端删除一個元素。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

From the right:

g f e d c b a

From the left:

1

2

3

4

5

由于雙端隊列是線程安全的,可以在不同的線程中同時從兩端利用隊列的内容。

import

collections

import

threading

import

time

candle 

=

collections.deque(

xrange

(

5

))

def

burn(direction, nextSource):

while

True

:

try

:

next

=

nextSource()

except

IndexError:

break

else

:

print

'%8s: %s'

%

(direction, 

next

)

time.sleep(

0.1

)

print

'%8s done'

%

direction

return

left 

=

threading.Thread(target

=

burn, args

=

(

'Left'

, candle.popleft))

right 

=

threading.Thread(target

=

burn, args

=

(

'Right'

, candle.pop))

left.start()

right.start()

left.join()

right.join()

線程交替處理兩端,删除元素,知道這個deque為空。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

Left: 

Right: 

4

Right: 

3

Left: 

1

Right: 

2

Left done

Right done

旋轉

deque另外一個作用可以按照任意一個方向旋轉,而跳過一些元素。

import

collections

=

collections.deque(

xrange

(

10

))

print

'Normal:'

, d

d

=

collections.deque(

xrange

(

10

))

d.rotate(

2

)

print

'Right roration:'

, d

=

collections.deque(

xrange

(

10

))

d.rotate(

-

2

)

print

'Left roration:'

, d

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

Normal: deque([

1

2

3

4

5

6

7

8

9

])

Right roration: deque([

8

9

1

2

3

4

5

6

7

])

Left roration: deque([

2

3

4

5

6

7

8

9

1

])

4

namedtuple

标準

tuple

使用數值索引來通路其成員。

nametuple執行個體與正常元祖在記憶體使用方面同樣高效,因為它們沒有各執行個體的字典。各種nametuple都是由自己的類表示,使用nametuple()工廠函數來建立。參數就是一個新類名和一個包含元素名的字元串。

import

collections

Person 

=

collections.namedtuple(

'Persion'

'name age gender'

)

print

'Type of Person:'

type

(Person)

bob 

=

Person(name

=

'Bob'

, age

=

30

, gender

=

'male'

)

print

'\nRepresentation:'

, bob

jane 

=

Person(name

=

'Jane'

, age

=

28

, gender

=

'female'

)

print

'\nField by name:'

, jane.name

print

'\nField by index:'

for

in

[bob, jane]:

print

'%s is a %d year old %s'

%

p

5

OrderedDict

OrderedDict是一個字典子類,可以記住其内容增加的順序。

import

collections

print

'Regular dictionary:'

=

{}

d[

'a'

=

'A'

d[

'b'

=

'B'

d[

'c'

=

'C'

for

k, v 

in

d.items():

print

k, v

print

'\nOrderDict:'

=

collections.OrderedDict()

d[

'a'

=

'A'

d[

'b'

=

'B'

d[

'c'

=

'C'

for

k, v 

in

d.items():

print

k, v

正常

dict

并不跟蹤插入順序,疊代處理會根據鍵在散清單中存儲的順序來生成值。在OrderDict中則相反,它會記住元素插入的順序,并在建立疊代器時使用這個順序。

>>> 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

RESTART 

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

=

>>> 

Regular dictionary:

a A

c C

b B

OrderDict:

a A

b B

c C

正常

dict

在檢查相等性是會檢視其内容,OrderDict中還會考慮元素增加的順序。