
How To Use the collections Module in Python 3

The author selected the COVID-19 Relief Fund to receive a donation as part of the Write for DOnations program.

Introduction

Python 3 has a number of built-in data structures, including tuples, dictionaries, and lists. Data structures provide us with a way to organize and store data. The collections module helps us populate and manipulate data structures efficiently.

In this tutorial, we’ll go through three classes in the collections module to help you work with tuples, dictionaries, and lists. We’ll use namedtuple to create tuples with named fields, defaultdict to concisely group information in dictionaries, and deque to efficiently add elements to either side of a list-like object.

For this tutorial, we’ll be working primarily with an inventory of fish that we need to modify as fish are added to or removed from a fictional aquarium.

Prerequisites

To get the most out of this tutorial, you should be familiar with the tuple, dictionary, and list data types, including their syntax and how to retrieve data from them. You can review these tutorials for the necessary background information:

  • Understanding Tuples in Python 3

  • Understanding Dictionaries in Python 3

  • Understanding Lists in Python 3

Adding Named Fields to Tuples

Python tuples are an immutable, or unchangeable, ordered sequence of elements. Tuples are frequently used to represent columnar data; for example, lines from a CSV file or rows from a SQL database. An aquarium might keep track of its inventory of fish as a series of tuples.

An individual fish tuple:

("Sammy", "shark", "tank-a")
           

This tuple is composed of three string elements.

While useful in some ways, this tuple does not clearly indicate what each of its fields represents. In actuality, element 0 is a name, element 1 is a species, and element 2 is the holding tank.

Explanation of fish tuple fields:

name     species    tank
Sammy    shark      tank-a

This table makes it clear that each of the tuple’s three elements has a clear meaning.

namedtuple from the collections module lets you add explicit names to each element of a tuple to make these meanings clear in your Python program.

Let’s use namedtuple to generate a class that clearly names each element of the fish tuple:

from collections import namedtuple

Fish = namedtuple("Fish", ["name", "species", "tank"])
           

The statement from collections import namedtuple gives your Python program access to the namedtuple factory function. The namedtuple() function call returns a class that is bound to the name Fish. The namedtuple() function has two arguments: the desired name of our new class, "Fish", and a list of named elements, ["name", "species", "tank"].

We can use the Fish class to represent the fish tuple from earlier:

sammy = Fish("Sammy", "shark", "tank-a")

print(sammy)
           

If we run this code, we’ll see the following output:

Output
   Fish(name='Sammy', species='shark', tank='tank-a')
           

sammy is instantiated using the Fish class. sammy is a tuple with three clearly named elements.

sammy’s fields can be accessed by their name or with a traditional tuple index:

print(sammy.species)
print(sammy[1])
           

If we run these two print calls, we’ll see the following output:

Output
   shark
   shark
           

Accessing .species returns the same value as accessing the second element of sammy using [1].

Using namedtuple from the collections module makes your program more readable while maintaining the important properties of a tuple (that they’re immutable and ordered).
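The two tuple properties mentioned above can be checked directly. The following is a minimal sketch that reuses the Fish class from this tutorial; the variable assignment_error is a name introduced here only for illustration.

```python
from collections import namedtuple

Fish = namedtuple("Fish", ["name", "species", "tank"])
sammy = Fish("Sammy", "shark", "tank-a")

# Ordered: a namedtuple unpacks positionally, like any other tuple.
name, species, tank = sammy

# Immutable: assigning to a field raises AttributeError.
try:
    sammy.tank = "tank-b"
except AttributeError as error:
    assignment_error = error

print(name, species, tank)
print(type(assignment_error).__name__)
```

The assignment fails, so sammy still refers to tank-a after the try block finishes.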

In addition, the namedtuple factory function adds several extra methods to instances of Fish.

Use ._asdict() to convert an instance to a dictionary:

print(sammy._asdict())
           

If we run this print call, we’ll see output like the following:

Output
   {'name': 'Sammy', 'species': 'shark', 'tank': 'tank-a'}
           

Calling ._asdict() on sammy returns a dictionary mapping each of the three field names to its corresponding value.

Python versions older than 3.8 might output this line slightly differently. You might, for example, see an OrderedDict instead of the plain dictionary shown here.

Note: In Python, methods with leading underscores are usually considered “private.” Additional methods provided by namedtuple (like ._asdict(), ._make(), ._replace(), etc.), however, are public.
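As a brief illustration of two of the public methods named in the note, the sketch below builds a Fish from a plain list and then creates a modified copy. The row list stands in for a line parsed from a CSV file and is an assumption made for this example.

```python
from collections import namedtuple

Fish = namedtuple("Fish", ["name", "species", "tank"])

# A plain sequence, such as one row parsed from a CSV file (hypothetical data).
row = ["Jamie", "cuttlefish", "tank-b"]

# _make() builds a Fish from any iterable of the right length.
jamie = Fish._make(row)

# _replace() returns a new Fish; the original instance is left unchanged.
moved_jamie = jamie._replace(tank="tank-a")

print(jamie)
print(moved_jamie)
```

Because tuples are immutable, ._replace() does not change jamie in place; it returns a second Fish with the updated field.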

Collecting Data in a Dictionary

It is often useful to collect data in Python dictionaries. defaultdict from the collections module can help us assemble information in dictionaries quickly and concisely.

A defaultdict created with a default factory does not raise a KeyError when we look up a missing key with square brackets. If a key isn’t present, defaultdict just inserts and returns a placeholder value instead:

from collections import defaultdict

my_defaultdict = defaultdict(list)

print(my_defaultdict["missing"])
           

If we run this code, we’ll see output like the following:

Output
   []
           

defaultdict inserts and returns a placeholder value instead of throwing a KeyError. In this case we passed list as the default factory, so the placeholder value is an empty list.

Regular dictionaries, in contrast, will throw a KeyError on missing keys:

my_regular_dict = {}

my_regular_dict["missing"]
           

If we run this code, we’ll see output like the following:

Output
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   KeyError: 'missing'
           

The regular dictionary my_regular_dict raises a KeyError when we try to access a key that is not present.

defaultdict behaves differently than a regular dictionary. Instead of raising a KeyError on a missing key, defaultdict calls its default factory with no arguments to create a new object. In this case, it calls list() to create an empty list.
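Any callable that takes no arguments can serve as the default factory. The sketch below uses int, which returns 0 when called, to count fish by species; the species list and the variable names are made up for this example. It also shows that .get() and the in operator do not trigger the factory.

```python
from collections import defaultdict

# int() returns 0, so every missing species starts at a count of zero.
fish_count_by_species = defaultdict(int)

for species in ["shark", "cuttlefish", "shark"]:
    fish_count_by_species[species] += 1

# .get() and the in operator do not call the default factory,
# so looking for "squid" this way inserts nothing.
missing = fish_count_by_species.get("squid")
has_squid = "squid" in fish_count_by_species

print(dict(fish_count_by_species))
print(missing, has_squid)
```

Only square-bracket lookups insert the placeholder, which is why "squid" is still absent at the end.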

Continuing with our fictional aquarium example, let’s say we have a list of fish tuples representing an aquarium’s inventory:

fish_inventory = [
    ("Sammy", "shark", "tank-a"),
    ("Jamie", "cuttlefish", "tank-b"),
    ("Mary", "squid", "tank-a"),
]
           

Three fish exist in the aquarium. Their names, species, and holding tanks are noted in these three tuples.

Our goal is to organize our inventory by tank: we want to know the list of fish present in each tank. In other words, we want a dictionary that maps "tank-a" to ["Sammy", "Mary"] and "tank-b" to ["Jamie"].

We can use defaultdict to group fish by tank:

from collections import defaultdict

fish_inventory = [
    ("Sammy", "shark", "tank-a"),
    ("Jamie", "cuttlefish", "tank-b"),
    ("Mary", "squid", "tank-a"),
]
fish_names_by_tank = defaultdict(list)
for name, species, tank in fish_inventory:
    fish_names_by_tank[tank].append(name)

print(fish_names_by_tank)
           

Running this code, we’ll see the following output:

Output
   defaultdict(<class 'list'>, {'tank-a': ['Sammy', 'Mary'], 'tank-b': ['Jamie']})
           

fish_names_by_tank is declared as a defaultdict that defaults to inserting list() instead of throwing a KeyError. Since this guarantees that every key in fish_names_by_tank will point to a list, we can freely call .append() to add names to each tank’s list.
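The same grouping also works when the inventory holds Fish namedtuples rather than plain tuples. This is one possible variation, not part of the original example; it combines the Fish class from the previous section with the defaultdict idiom shown here.

```python
from collections import defaultdict, namedtuple

Fish = namedtuple("Fish", ["name", "species", "tank"])

fish_inventory = [
    Fish("Sammy", "shark", "tank-a"),
    Fish("Jamie", "cuttlefish", "tank-b"),
    Fish("Mary", "squid", "tank-a"),
]

fish_names_by_tank = defaultdict(list)
for fish in fish_inventory:
    # Field names replace positional unpacking of (name, species, tank).
    fish_names_by_tank[fish.tank].append(fish.name)

print(dict(fish_names_by_tank))
```

Reading fish.tank and fish.name avoids having to remember which position in the tuple holds which value.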

defaultdict helps you here because it reduces the chance of unexpected KeyErrors. Reducing the unexpected KeyErrors means your program can be written more clearly and with fewer lines. More concretely, the defaultdict idiom lets you avoid manually instantiating an empty list for every tank.

Without defaultdict, the for loop body might have looked more like this:

More Verbose Example Without defaultdict

...

fish_names_by_tank = {}
for name, species, tank in fish_inventory:
    if tank not in fish_names_by_tank:
        fish_names_by_tank[tank] = []
    fish_names_by_tank[tank].append(name)
           

Using just a regular dictionary (instead of a defaultdict) means that the for loop body always has to check for the existence of the given tank in fish_names_by_tank. Only after we’ve verified that tank is already present in fish_names_by_tank, or has just been initialized with a [], can we append the fish name.

defaultdict can help cut down on boilerplate code when filling up dictionaries because a missing key produces a default value rather than a KeyError.

Using deque to Efficiently Add Elements to Either Side of a Collection

Python lists are a mutable, or changeable, ordered sequence of elements. Python can append to lists in constant time (the length of the list has no effect on the time it takes to append), but inserting at the beginning of a list can be slower: the time it takes increases as the list gets bigger.

In terms of Big O notation, appending to a list is a constant time O(1) operation. Inserting at the beginning of a list, in contrast, is slower with O(n) performance.

Note: Software engineers often measure the performance of procedures using something called “Big O” notation. When the size of an input has no effect on the time it takes to perform a procedure, it is said to run in constant time or O(1) (“Big O of 1”). As you learned above, Python can append to lists with constant time performance, otherwise known as O(1).

Sometimes, the size of an input directly affects the amount of time it takes to run a procedure. Inserting at the beginning of a Python list, for example, runs slower the more elements there are in the list. Big O notation uses the letter n to represent the size of the input. This means that adding items to the beginning of a Python list runs in “linear time” or O(n) (“Big O of n”).

In general, for large inputs, O(1) procedures are faster than O(n) procedures.

We can insert at the beginning of a Python list:

favorite_fish_list = ["Sammy", "Jamie", "Mary"]

# O(n) performance
favorite_fish_list.insert(0, "Alice")

print(favorite_fish_list)
           

If we run this code, we will see output like the following:

Output
   ['Alice', 'Sammy', 'Jamie', 'Mary']
           

The .insert(index, object) method on list allows us to insert "Alice" at the beginning of favorite_fish_list. Notably, though, inserting at the beginning of a list has O(n) performance. As the length of favorite_fish_list grows, the time to insert a fish at the beginning of the list will grow proportionally.

deque (pronounced “deck”) from the collections module is a list-like object that allows us to insert items at the beginning or end of a sequence with constant time (O(1)) performance.

Insert an item at the beginning of a deque:

from collections import deque

favorite_fish_deque = deque(["Sammy", "Jamie", "Mary"])

# O(1) performance
favorite_fish_deque.appendleft("Alice")

print(favorite_fish_deque)
           

Running this code, we will see the following output:

Output
   deque(['Alice', 'Sammy', 'Jamie', 'Mary'])
           

We can instantiate a deque using a preexisting collection of elements, in this case a list of three favorite fish names. Calling favorite_fish_deque’s appendleft method allows us to insert an item at the beginning of our collection with O(1) performance. O(1) performance means that the time it takes to add an item to the beginning of favorite_fish_deque will not grow even if favorite_fish_deque has thousands or millions of elements.
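One way to see the difference for yourself is to time both operations on a large collection. The following is a rough sketch using the standard timeit module; the helper time_front_inserts, the collection size, and the insert count are all choices made for this illustration, and the measured times will vary from machine to machine.

```python
from collections import deque
from timeit import timeit


def time_front_inserts(container, insert_front, count=1000):
    """Return the seconds taken to insert `count` items at the front."""
    return timeit(lambda: insert_front(container, "Alice"), number=count)


size = 100_000
fish_list = ["Sammy"] * size
fish_deque = deque(fish_list)

# list.insert(0, ...) shifts every existing element on each call.
list_seconds = time_front_inserts(fish_list, lambda c, item: c.insert(0, item))

# deque.appendleft(...) does not shift the existing elements.
deque_seconds = time_front_inserts(fish_deque, lambda c, item: c.appendleft(item))

print(f"list.insert(0, ...):   {list_seconds:.4f} s")
print(f"deque.appendleft(...): {deque_seconds:.4f} s")
```

On a collection this size, the list timing would typically be noticeably larger than the deque timing, and the gap should widen as size increases.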

Note: Although deque adds entries to the beginning of a sequence more efficiently than a list, deque does not perform all of its operations more efficiently than a list. For example, accessing an item in the middle of a deque has O(n) performance, but accessing any item in a list by index has O(1) performance. Use deque when it is important to insert or remove elements from either side of your collection quickly. A full comparison of time performance is available on Python’s wiki.
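The note mentions removing elements from either side as well as inserting them. A minimal sketch of removal, continuing with the fish names used above, might look like this:

```python
from collections import deque

favorite_fish_deque = deque(["Alice", "Sammy", "Jamie", "Mary"])

# popleft() removes and returns the item at the beginning.
first = favorite_fish_deque.popleft()

# pop() removes and returns the item at the end.
last = favorite_fish_deque.pop()

print(first, last)
print(favorite_fish_deque)
```

Both popleft() and pop() run in constant time, mirroring appendleft() and append().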

Conclusion

The collections module is a powerful part of the Python standard library that lets you work with data concisely and efficiently. This tutorial covered three of the classes provided by the collections module: namedtuple, defaultdict, and deque.

From here, you can use the collections module’s documentation to learn more about other available classes and utilities. To learn more about Python in general, you can read our How To Code in Python 3 tutorial series.

Translated from: https://www.digitalocean.com/community/tutorials/how-to-use-the-collections-module-in-python-3