###### tags: `Data Compression`
# Huffman Coding
霍夫曼編碼是一種可變長度編碼法 (VLC)
==定義==
X 是在 pmf p(X) 中的離散隨機變數,C 是一個 X 中的任一前綴碼,$C_{Huff}$是一個霍夫曼編碼
那麼 $l(C_{Huff})<=l(C)$ 而且 $H(X) <= l(C_{Huff}) < H(X)+1$
## 範例一
以下是每個 Letter 出現的機率,接著使用霍夫曼編碼進行編碼
| x | 1 | 2 | 3 | 4 | 5 |
|:-:|:-:|:-:|:-:|:-:|:-:|
| $P_i$ | 0.4 | 0.2 | 0.2 | 0.1 | 0.1 |
### Step1
將機率小的合併
```graphviz
graph step1 {
// make invisible ranks
rank1 [style=invisible];
rank2 [style=invisible];
// make "invisible" (white) link between them
rank1 -- rank2 [color=white];
// declare nodes all out of desired order
mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
x1 [label="1 | 0.4"];
x2 [label="2 | 0.2"];
x3 [label="3 | 0.2"];
x4 [label=" 4 | 0.1"];
x5 [label=" 5 | 0.1"];
// connect node
mix: s -- x4: n [style = "dotted", color = "red", label = "a to b"]
mix: s -- x5: n
{
rank = same;
// Here you enforce the desired order with "invisible" edges and arrowheads
rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ];
}
}
```
### Step2
```graphviz
graph step2 {
// make invisible ranks
rank1 [style=invisible];
rank2 [style=invisible];
// make "invisible" (white) link between them
rank1 -- rank2 [color=white];
// declare nodes all out of desired order
mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
x1 [label="1 | 0.4"];
x2 [label="2 | 0.2"];
x3 [label="3 | 0.2"];
x4 [label=" 4 | 0.1"];
x5 [label=" 5 | 0.1"];
// connect node
mix2 -- {x3 mix}
mix: sw -- x4
mix: se -- x5
{
rank = same;
// Here you enforce the desired order with "invisible" edges and arrowheads
rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ];
}
}
```
### Step3
```graphviz
graph step3 {
// make invisible ranks
rank1 [style=invisible];
rank2 [style=invisible];
// make "invisible" (white) link between them
rank1 -- rank2 [color=white];
// declare nodes all out of desired order
mix3 [label="0.6", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
x1 [label="1 | 0.4"];
x2 [label="2 | 0.2"];
x3 [label="3 | 0.2"];
x4 [label=" 4 | 0.1"];
x5 [label=" 5 | 0.1"];
// connect node
mix3 -- {x2 mix2}
mix2 -- {x3 mix}
mix -- {x4 x5}
{
rank = same;
// Here you enforce the desired order with "invisible" edges and arrowheads
rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ];
}
}
```
### Step4
左子樹給 0,右子樹給 1
```graphviz
graph step4 {
// make invisible ranks
rank1 [style=invisible];
rank2 [style=invisible];
// make "invisible" (white) link between them
rank1 -- rank2 [color=white];
// declare nodes all out of desired order
mix4 [label="1", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix3 [label="0.6", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
x1 [label="1 | 0.4"];
x2 [label="2 | 0.2"];
x3 [label="3 | 0.2"];
x4 [label=" 4 | 0.1"];
x5 [label=" 5 | 0.1"];
// connect node
mix4 -- x1 [label = "0"]
mix4 -- mix3 [label = "1"]
mix3 -- x2 [label = "0"]
mix3 -- mix2 [label = "1"]
mix2 -- x3 [label = "0"]
mix2 -- mix [label = "1"]
mix -- x4 [label = "0"]
mix -- x5 [label = "1"]
{
rank = same;
// Here you enforce the desired order with "invisible" edges and arrowheads
rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ];
}
subgraph a{
node[shape = plaintext];
Code 0 10 110 1110 1111;
}
x1 -- 0 [style=invisible]
x2 -- 10 [style=invisible]
x3 -- 110 [style=invisible]
x4 -- 1110 [style=invisible]
x5 -- 1111 [style=invisible]
{
rank = same; Code; 0
}
}
```
平均編碼長度
$l(C) = 1 * 0.4 + 2 * 0.2 + 3 * 0.2 + 4 * 0.1 + 4 * 0.1 = 2.2$ bits/symbol
散度 $H(X) = 0.4Log_2\dfrac{1}{0.4} + 0.2Log_2\dfrac{1}{0.2} + 0.2Log_2\dfrac{1}{0.2} + 0.1Log_2\dfrac{1}{0.1} + 0.1Log_2\dfrac{1}{0.1} = 2.12$ bits/symbol
## 範例二
### Step2
如果我們將上述範例一的步驟二作修改,因為 0.2 有三個,優先挑尚未合併者
```graphviz
graph step2 {
// make invisible ranks
rank1 [style=invisible];
rank2 [style=invisible];
// make "invisible" (white) link between them
rank1 -- rank2 [color=white];
// declare nodes all out of desired order
mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
x1 [label="1 | 0.4"];
x2 [label="2 | 0.2"];
x3 [label="3 | 0.2"];
x4 [label=" 4 | 0.1"];
x5 [label=" 5 | 0.1"];
// connect node
mix2 -- {x2 x3}
mix -- {x4 x5}
{
rank = same;
// Here you enforce the desired order with "invisible" edges and arrowheads
rank2 -- x1 -- x2 -- x3 -- x4 -- x5 [ style=invis ];
}
}
```
### Step3
出現兩個 0.4,優先合併還未合併的,為了讓樹好看一點,把要合併的放在一起
```graphviz
graph step2 {
// make invisible ranks
rank1 [style=invisible];
rank2 [style=invisible];
// make "invisible" (white) link between them
rank1 -- rank2 [color=white];
// declare nodes all out of desired order
mix3 [label="0.6", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
x1 [label="1 | 0.4"];
x2 [label="2 | 0.2"];
x3 [label="3 | 0.2"];
x4 [label=" 4 | 0.1"];
x5 [label=" 5 | 0.1"];
// connect node
mix3 -- {x1 mix}
mix2 -- {x2 x3}
mix -- {x4 x5}
{
rank = same;
// Here you enforce the desired order with "invisible" edges and arrowheads
rank2 -- x2 -- x3 -- x1 -- x4 -- x5 [ style=invis ];
}
}
```
### Step4
較大的機率移動到左邊,最後合併,左子樹給 0,右子樹給 1
```graphviz
graph step2 {
// make invisible ranks
rank1 [style=invisible];
rank2 [style=invisible];
// make "invisible" (white) link between them
rank1 -- rank2 [color=white];
// declare nodes all out of desired order
mix4 [label="1.0", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix3 [label="0.6", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix2 [label="0.4", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
mix [label="0.2", shape=circle, style="filled", fillcolor="lightsalmon", color="lightsalmon", fontcolor=black, fontsize=16];
x1 [label="1 | 0.4"];
x2 [label="2 | 0.2"];
x3 [label="3 | 0.2"];
x4 [label=" 4 | 0.1"];
x5 [label=" 5 | 0.1"];
// connect node
mix4 -- {mix3 mix2}
mix3 -- {x1 mix}
mix2 -- {x2 x3}
mix -- {x4 x5}
{
rank = same;
// Here you enforce the desired order with "invisible" edges and arrowheads
rank2 -- x1 -- x4 -- x5 -- x2 -- x3 [ style=invis ];
}
subgraph a{
node[shape = plaintext];
Code 00 010 011 10 11;
}
x1 -- 00 [style=invisible]
x4 -- 010 [style=invisible]
x5 -- 011 [style=invisible]
x2 -- 10 [style=invisible]
x3 -- 11 [style=invisible]
{
rank = same; Code; 00
}
}
```
平均編碼長度
$l(C) = 2 * 0.4 + 3 * 0.1 + 3 * 0.1 + 2 * 0.2 + 2 * 0.2 = 2.2$ bits/symbol
:::info
我們可以發現不同的編排步驟,平均編碼長度是一樣的,範例一編排方式有較小的 Code
4 bits 長度的 Codeword 發生機率很低,但記憶體仍然要空出 4-bits 的空間
因此 Codeword 長度變化較少的範例二,會比較好
:::
:::success
根據以下要求,所獲得的霍夫曼代碼在大多數情況下是唯一的
- 合併發生機率低的節點
- 每一步驟都要進行排序,由大到小,由左而右
- 已經合併過的節點必然造成較長的長度
:::
ppt 16 頁