老衛帶你學---python中heapq源碼剖析

這是一個相當實用的内置子產品，但是很多人竟然不知道他的存在——筆者也是今天偶然看到的，哎……盡管如此，還是改變不了這個子產品好用的事實

heapq 子產品實作了适用于Python清單的最小堆排序算法。

堆是一個樹狀的資料結構，其中的子節點都與父母排序順序關系。因為堆排序中的樹是滿二叉樹，是以可以用清單來表示樹的結構，使得元素 N 的子元素位于 2N + 1 和 2N + 2 的位置（對于從零開始的索引）。

本文内容将分為三個部分，第一個部分簡單介紹 heapq 子產品的使用；第二部分回顧堆排序算法；第三部分分析heapq中的實作。

heapq 的使用

建立堆有兩個基本的方法：heappush() 和 heapify()，取出堆頂元素用 heappop()。

heappush() 是用來向已有的堆中添加元素，一般從空清單開始建構：

import heapq
 
data = [97, 38, 27, 50, 76, 65, 49, 13]
heap = []
 
for n in data:
 heapq.heappush(heap, n)
 
print('pop:', heapq.heappop(heap)) # pop: 13
print(heap) # [27, 50, 38, 97, 76, 65, 49]

如果資料已經在清單中，則使用 heapify() 進行重排：

import heapq
 
data = [97, 38, 27, 50, 76, 65, 49, 13]
 
heapq.heapify(data)
 
print('pop:', heapq.heappop(data)) # pop: 13
print(data) # [27, 38, 49, 50, 76, 65, 97]

回顧堆排序算法

堆排序算法基本思想是：将無序序列建成一個堆，得到關鍵字最小（或最大的記錄；輸出堆頂的最小（大）值後，使剩餘的 n-1 個元素重又建成一個堆，則可得到n個元素的次小值；重複執行，得到一個有序序列，這個就是堆排序的過程。

堆排序需要解決兩個問題：

如何由一個無序序列建立成一個堆？

如何在輸出堆頂元素之後，調整剩餘元素，使之成為一個新的堆？

新添加元素和，如何調整堆？

先來看看第二個問題的解決方法。采用的方法叫“篩選”，當輸出堆頂元素之後，就将堆中最後一個元素代替之；然後将根結點值與左、右子樹的根結點值進行比較，并與其中小者進行交換；重複上述操作，直至葉子結點，将得到新的堆，稱這個從堆頂至葉子的調整過程為“篩選”。

老衛帶你學---python中heapq源碼剖析

如上圖所示，當堆頂 13 輸出後，将堆中末尾的 97 替代為堆頂，然後堆頂與它的子節點 38 和 27 中的小者交換；元素 97 在新的位置上在和它的子節點 65 和 49 中的小者交換；直到元素97成為葉節點，就得到了新的堆。這個過程也叫下沉。

讓堆中位置為 pos 元素進行下沉的如下：

def heapdown(heap, pos):
 endpos = len(heap)
 while pos < endpos:
 lchild = 2 * pos + 1
 rchild = 2 * pos + 2
 if lchild >= endpos: # 如果pos已經是葉節點，退出循環
  break
 childpos = lchild # 假設要交換的節點是左節點
 if rchild < endpos and heap[childpos] > heap[rchild]:
  childpos = rchild
 
 if heap[pos] < heap[childpos]: # 如果節點比子節點都小，退出循環
  break
 heap[pos], heap[childpos] = heap[childpos], heap[pos] # 交換
 pos = childpos

再來看看如何解決第三個問題：新添加元素和，如何調整堆？這個的方法正好與下沉相反，首先将新元素放置清單的最後，然後新元素與其父節點比較，若比父節點小，與父節點交換；重複過程直到比父節點大或到根節點。這個過程使得元素從底部不斷上升，從下至上恢複堆的順序，稱為上浮。

将位置為 pos 進行上浮的代碼為：

def heapup(heap, startpos, pos): # 如果是新增元素，startpos 傳入 0
 while pos > startpos:
 parentpos = (pos - 1) // 2
 if heap[pos] < heap[parentpos]:
  heap[pos], heap[parentpos] = heap[parentpos], heap[pos]
  pos = parentpos
 else:
  break

第一個問題：如何由一個無序序列建立成一個堆？從無序序列的第 n/2 個元素（即此無序序列對應的完全二叉樹的最後一個非終端結點）起，至第一個元素止，依次進行下沉：

老衛帶你學---python中heapq源碼剖析

for i in reversed(range(len(data) // 2)):
 heapdown(data, i)

heapq 源碼分析

添加新元素到堆中的 heappush() 函數：

def heappush(heap, item):
 """Push item onto heap, maintaining the heap invariant."""
 heap.append(item)
 _siftdown(heap, 0, len(heap)-1)

把目标元素放置清單最後，然後進行上浮。盡管它命名叫 down ,但這個過程是上浮的過程，這個命名也讓我困惑，後來我才知道它是因為元素的索引不斷減小，是以命名 down 。下沉的過程它也就命名為 up 了。

def _siftdown(heap, startpos, pos):
 newitem = heap[pos]
 # Follow the path to the root, moving parents down until finding a place
 # newitem fits.
 while pos > startpos:
  parentpos = (pos - 1) >> 1
  parent = heap[parentpos]
  if newitem < parent:
   heap[pos] = parent
   pos = parentpos
   continue
  break
 heap[pos] = newitem

一樣是通過 newitem 不斷與父節點比較。不一樣的是這裡缺少了元素交換的過程，而是計算出新元素最後所在的位置 pos 并進行的指派。顯然這是優化後的代碼，減少了不斷交換元素的備援過程。

再來看看輸出堆頂元素的函數 heappop():

def heappop(heap):
 """Pop the smallest item off the heap, maintaining the heap invariant."""
 lastelt = heap.pop() # raises appropriate IndexError if heap is empty
 if heap:
  returnitem = heap[0]
  heap[0] = lastelt
  _siftup(heap, 0)
  return returnitem
 return lastelt

通過 heap.pop() 獲得清單中的最後一個元素，然後替換為堆頂 heap[0] = lastelt ，再進行下沉：

def _siftup(heap, pos):
 endpos = len(heap)
 startpos = pos
 newitem = heap[pos]
 # Bubble up the smaller child until hitting a leaf.
 childpos = 2*pos + 1 # 左節點，預設替換左節點
 while childpos < endpos:
  # Set childpos to index of smaller child.
  rightpos = childpos + 1 # 右節點
  if rightpos < endpos and not heap[childpos] < heap[rightpos]:
   childpos = rightpos # 當右節點比較小時，應交換的是右節點
  # Move the smaller child up.
  heap[pos] = heap[childpos]
  pos = childpos
  childpos = 2*pos + 1
 # The leaf at pos is empty now. Put newitem there, and bubble it up
 # to its final resting place (by sifting its parents down).
 heap[pos] = newitem
 _siftdown(heap, startpos, pos)

這邊的代碼将準備要下沉的元素視為新元素 newitem ，将其目前的位置 pos 視為空位置，由其子節點中的小者進行取代，反複如此，最後會在葉節點留出一個位置，這個位置放入 newitem ，再讓新元素進行上浮。

再來看看讓無序數列重排成堆的 heapify() 函數：

def heapify(x):
 """Transform list into a heap, in-place, in O(len(x)) time."""
 n = len(x)
 for i in reversed(range(n//2)):
  _siftup(x, i)

這部分就和理論上的一緻，從最後一個非葉節點 (n // 2) 到根節點為止，進行下沉。

總結

堆排序結合圖來了解還是比較好了解的。這種資料結構常用于優先隊列（标準庫Queue的優先隊列用的就是堆）。 heapq 子產品中還有很多其他 heapreplace ，heappushpop 等大體上都很類似。

老衛帶你學---python中heapq源碼剖析

heapq 的使用

heapq 源碼分析

總結

繼續閱讀

來自python的【條件控制/語句循環/break/continue/else/pass】一、條件控制二、語句循環

無法解析的外部符号 wmain，該符号在函數 "void cdecl mainCRTStartupHelper(struct HINSTANCE *,unsigned short con......

TestLink導出用例轉換工具(XML2Excel)

YAML簡介和PyYAML安全操作YAML支援的類型YAML的優點：yaml的基本文法python操作

Small tricks

libsvm for python 安裝

學習軟體測試基礎測試第七天

Zeppelin 配置通路 REST APIApache Zeppelin Configuration REST API

【Torch】最簡潔logging使用指南

27. Remove Element(清單)題目代碼

Cloud Studio初體驗

使用 ctypes 進行 Python 和 C 的混合程式設計

【python】【資料處理】畫多元資料分布圖

【python】netconf協定對接管理裝置

「Python 網絡自動化」NETCONF —— Python 使用 NETCONF 管理配置 H3C 網絡裝置

在python中建立excel并寫入