###### tags: `Data Compression` # Huffman Coding 霍夫曼編碼是一種可變長度編碼法 (VLC) ==定義== X 是在 pmf p(X) 中的離散隨機變數,C 是一個 X 中的任一前綴碼,$C_{Huff}$是一個霍夫曼編碼 那麼 $l(C_{Huff})<=l(C)$ 而且 $H(X) <= l(C_{Huff}) < H(X)+1$ ## 範例一 以下是每個 Letter 出現的機率,接著使用霍夫曼編碼進行編碼 | x | 1 | 2 | 3 | 4 | 5 | |:-:|:-:|:-:|:-:|:-:|:-:| | $P_i$ | 0.4 | 0.2 | 0.2 | 0.1 | 0.1 | ### Step1 將機率小的合併 ```graphviz graph step1 { // make invisible ranks rank1 [style=invisible]; rank2 [style=invisible]; // make "invisible" (white) link between them rank1 -- rank2 [color=white]; // declare nodes all out of desired order mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; x1 [label="1 | 0.4"]; x2 [label="2 | 0.2"]; x3 [label="3 | 0.2"]; x4 [label=" 4 | 0.1"]; x5 [label=" 5 | 0.1"]; // connect node mix: s -- x4: n [style = "dotted", color = "red", label = "a to b"] mix: s -- x5: n { rank = same; // Here you enforce the desired order with "invisible" edges and arrowheads rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ]; } } ``` ### Step2 ```graphviz graph step2 { // make invisible ranks rank1 [style=invisible]; rank2 [style=invisible]; // make "invisible" (white) link between them rank1 -- rank2 [color=white]; // declare nodes all out of desired order mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; x1 [label="1 | 0.4"]; x2 [label="2 | 0.2"]; x3 [label="3 | 0.2"]; x4 [label=" 4 | 0.1"]; x5 [label=" 5 | 0.1"]; // connect node mix2 -- {x3 mix} mix: sw -- x4 mix: se -- x5 { rank = same; // Here you enforce the desired order with "invisible" edges and arrowheads rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ]; } } ``` ### Step3 ```graphviz graph step3 { // make invisible ranks rank1 [style=invisible]; rank2 [style=invisible]; // make "invisible" (white) link between them rank1 -- rank2 [color=white]; // declare nodes all out of desired order mix3 [label="0.6", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; x1 [label="1 | 0.4"]; x2 [label="2 | 0.2"]; x3 [label="3 | 0.2"]; x4 [label=" 4 | 0.1"]; x5 [label=" 5 | 0.1"]; // connect node mix3 -- {x2 mix2} mix2 -- {x3 mix} mix -- {x4 x5} { rank = same; // Here you enforce the desired order with "invisible" edges and arrowheads rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ]; } } ``` ### Step4 左子樹給 0,右子樹給 1 ```graphviz graph step4 { // make invisible ranks rank1 [style=invisible]; rank2 [style=invisible]; // make "invisible" (white) link between them rank1 -- rank2 [color=white]; // declare nodes all out of desired order mix4 [label="1", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix3 [label="0.6", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; x1 [label="1 | 0.4"]; x2 [label="2 | 0.2"]; x3 [label="3 | 0.2"]; x4 [label=" 4 | 0.1"]; x5 [label=" 5 | 0.1"]; // connect node mix4 -- x1 [label = "0"] mix4 -- mix3 [label = "1"] mix3 -- x2 [label = "0"] mix3 -- mix2 [label = "1"] mix2 -- x3 [label = "0"] mix2 -- mix [label = "1"] mix -- x4 [label = "0"] mix -- x5 [label = "1"] { rank = same; // Here you enforce the desired order with "invisible" edges and arrowheads rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ]; } subgraph a{ node[shape = plaintext]; Code 0 10 110 1110 1111; } x1 -- 0 [style=invisible] x2 -- 10 [style=invisible] x3 -- 110 [style=invisible] x4 -- 1110 [style=invisible] x5 -- 1111 [style=invisible] { rank = same; Code; 0 } } ``` 平均編碼長度 $l(C) = 1 * 0.4 + 2 * 0.2 + 3 * 0.2 + 4 * 0.1 + 4 * 0.1 = 2.2$ bits/symbol 散度 $H(X) = 0.4Log_2\dfrac{1}{0.4} + 0.2Log_2\dfrac{1}{0.2} + 0.2Log_2\dfrac{1}{0.2} + 0.1Log_2\dfrac{1}{0.1} + 0.1Log_2\dfrac{1}{0.1} = 2.12$ bits/symbol ## 範例二 ### Step2 如果我們將上述範例一的步驟二作修改,因為 0.2 有三個,優先挑尚未合併者 ```graphviz graph step2 { // make invisible ranks rank1 [style=invisible]; rank2 [style=invisible]; // make "invisible" (white) link between them rank1 -- rank2 [color=white]; // declare nodes all out of desired order mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; x1 [label="1 | 0.4"]; x2 [label="2 | 0.2"]; x3 [label="3 | 0.2"]; x4 [label=" 4 | 0.1"]; x5 [label=" 5 | 0.1"]; // connect node mix2 -- {x2 x3} mix -- {x4 x5} { rank = same; // Here you enforce the desired order with "invisible" edges and arrowheads rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ]; } } ``` ### Step3 出現兩個 0.4,優先合併還未合併的,為了讓樹好看一點,把要合併的放在一起 ```graphviz graph step2 { // make invisible ranks rank1 [style=invisible]; rank2 [style=invisible]; // make "invisible" (white) link between them rank1 -- rank2 [color=white]; // declare nodes all out of desired order mix3 [label="0.6", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; x1 [label="1 | 0.4"]; x2 [label="2 | 0.2"]; x3 [label="3 | 0.2"]; x4 [label=" 4 | 0.1"]; x5 [label=" 5 | 0.1"]; // connect node mix3 -- {x1 mix} mix2 -- {x2 x3} mix -- {x4 x5} { rank = same; // Here you enforce the desired order with "invisible" edges and arrowheads rank2 -- x2 -- x3 -- x1 -- x4 -- x5 [ style=invis ]; } } ``` ### Step4 較大的機率移動到左邊,最後合併,左子樹給 0,右子樹給 1 ```graphviz graph step2 { // make invisible ranks rank1 [style=invisible]; rank2 [style=invisible]; // make "invisible" (white) link between them rank1 -- rank2 [color=white]; // declare nodes all out of desired order mix4 [label="1.0", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix3 [label="0.6", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16]; x1 [label="1 | 0.4"]; x2 [label="2 | 0.2"]; x3 [label="3 | 0.2"]; x4 [label=" 4 | 0.1"]; x5 [label=" 5 | 0.1"]; // connect node mix4 -- {mix3 mix2} mix3 -- {x1 mix} mix2 -- {x2 x3} mix -- {x4 x5} { rank = same; // Here you enforce the desired order with "invisible" edges and arrowheads rank2 -- x1 -- x4 -- x5 -- x2 -- x3 [ style=invis ]; } subgraph a{ node[shape = plaintext]; Code 00 010 011 10 11; } x1 -- 00 [style=invisible] x4 -- 010 [style=invisible] x5 -- 011 [style=invisible] x2 -- 10 [style=invisible] x3 -- 11 [style=invisible] { rank = same; Code; 00 } } ``` 平均編碼長度 $l(C) = 2 * 0.4 + 3 * 0.1 + 3 * 0.1 + 2 * 0.2 + 2 * 0.2 = 2.2$ bits/symbol :::info 我們可以發現不同的編排步驟,平均編碼長度是一樣的,範例一編排方式有較小的 Code 4 bits 長度的 Codeword 發生機率很低,但記憶體仍然要空出 4-bits 的空間 因此 Codeword 長度變化較少的範例二,會比較好 ::: :::success 根據以下要求,所獲得的霍夫曼代碼在大多數情況下是唯一的 - 合併發生機率低的節點 - 每一步驟都要進行排序,由大到小,由左而右 - 已經合併過的節點必然造成較長的長度 ::: ppt 16 頁