文章目錄
- Huffman Tree
- 特點
- code
- Huffman Code
Huffman Tree
最優二叉樹 指帶權路徑長度最短的樹
特點
- 不唯一(具有相同帶權結點)且 左右子樹可以互換
- 帶權值的節點都是葉子節點
- 隻有葉子節點和度為2的節點,沒有度為1的節點
- 若有n個葉子節點,則一共有2n-1個結點
code
class HeapNode:
def __init__(self, char=None, weight=None, left=None, right=None):
self.char = char
self.weight = weight
self.left = left
self.right = right
def __lt__(self, other):
return self.weight < other.weight
from itertools import groupby
from heapq import *
class HuffmanTree:
def __init__(self):
self.codes = {}
self.reverse_mapping = {}
def build(self, text): # 使用heapq建構最小堆,生成Haffman樹
minheap = [HeapNode(char, len(list(times))) for char, times in groupby(sorted(text))] # groupby為分類并統計出現次數
heapify(minheap) # heapify之後的為清單,按weight排好
while len(minheap) > 1:
left = heappop(minheap) # 取出最小的
right = heappop(minheap) # 取出最小的
parent = HeapNode(None, left.weight + right.weight) # 将它們并在一起
parent.left = left
parent.right = right
heappush(minheap, parent) # 并在一起的父結點再入堆,重複找兩個最小的
return minheap # 最終清單長度為1,Huffman樹的根節點為minheap[0]
def make_codes(self, code, node):
if node.char: # 隻有葉子節點才有char
if not code: # 對于樹的depth為1
self.codes[node.char] = "0"
else:
self.codes[node.char] = code
self.reverse_mapping[code] = node.char
else:
self.make_codes(code + "0", node.left) # 左子樹編碼為0
self.make_codes(code + "1", node.right) # 右子樹編碼為1
def encoded_text(self, text):
encoded_text = ""
for character in text:
encoded_text += self.codes[character]
return encoded_text
def decode_text(self, encoded_text):
current_code = ""
decoded_text = ""
for bit in encoded_text:
current_code += bit
if current_code in self.reverse_mapping:
character = self.reverse_mapping[current_code]
decoded_text += character
current_code = ""
return decoded_text
def wpl(self,subtree,depth):
if subtree.left is None and subtree.right is None: # 隻有為葉子結點時,才計算權長
return depth*subtree.weight
else:
return self.wpl(subtree.left,depth+1) + self.wpl(subtree.right,depth+1) # 非葉子節點,遞歸的計算它的左右子樹的權長
def test_huffman():
ht = HuffmanTree()
text = 'hhhhhhhhhellllllooo'
httree = ht.build(text) # 根據text中字元出現的頻率來建構一顆Huffman樹
ht.make_codes('', httree[0]) # 對葉子結點的字元編碼
print(ht.codes) # 字元對應的編碼 {'h': '0', 'e': '100', 'o': '101', 'l': '11'}
print(ht.reverse_mapping) # 編碼對應的字元 {'0': 'h', '100': 'e', '101': 'o', '11': 'l'}
encode = ht.encoded_text("hello")
text_ = ht.decode_text(encode)
print('code:',encode) # code: 01001111101
print('text:',text_) # text: hello
print('WPL:',ht.wpl(httree[0],0)) # Huffman樹的權長最小 WPL: 33
Huffman Code
由測試結果已知:
- n 個權值建構出來的哈夫曼樹擁有 n 個葉子節點
- 每個哈夫曼編碼都不是另一個哈夫曼編碼的字首
- 哈夫曼樹是帶權路徑長度最短的樹,權值較大的節點離根節點較近
- 帶權路徑長度:樹中所有的葉子節點的權值乘上其到根節點的路徑長度。與最終的哈夫曼編碼總長度成正比關系。
Reference:小碼哥MJ