[資料結構] CH9.Red-Black, Splay and Huffman Trees

# [資料結構] CH9.Red-Black, Splay and Huffman Trees ## 紅黑樹 * 紅黑樹也是Binary Search Trees的一種，同時也是屬於**平衡樹**的一種。 * 紅黑樹的性質(很重要)： ``` 1.所有node除了Data, left/right node之外，還會有一個"color"的屬性，這個color只會是紅色或是黑色。 2.root node 必定為黑色。 3.所有的leaf node也必定為黑色。 4.紅色節點的左右子樹必定為黑色。 5.不管從哪一個節點開始當起點，往下到leaf node的每個路徑中，經過的黑色節點數量是一樣的。 6.所有的leaf node都不會儲存Data。 ``` * 一個紅黑樹的例子： ```graphviz digraph RBT{ 1[label="16",style=filled,fillcolor="gray30",fontcolor="white"] 2[label="9",style=filled,fillcolor="crimson",fontcolor="white"] 3[label="27",style=filled,fillcolor="crimson",fontcolor="white"] 4[label="7",style=filled,fillcolor="gray30",fontcolor="white"] 5[label="11",style=filled,fillcolor="gray30",fontcolor="white"] 6[label="21",style=filled,fillcolor="gray30",fontcolor="white"] 7[label="45",style=filled,fillcolor="gray30",fontcolor="white"] 8[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 9[label="11",style=filled,fillcolor="crimson",fontcolor="white"] 10[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 11[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 12[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 13[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 14[label="36",style=filled,fillcolor="crimson",fontcolor="white"] 15[label="63",style=filled,fillcolor="crimson",fontcolor="white"] 18[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 19[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 20[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 21[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 22[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 23[label="NULL",style=filled,fillcolor="gray30",fontcolor="white",shape="box"] 1->2 1->3 2->4 2->5 3->6 3->7 4->8 4->9 5->10 5->11 6->12 6->13 7->14 7->15 14->18 14->19 15->20 15->21 9->22 9->23 } ``` ### 搜尋 * 由於同樣擁有Binary Search Trees的性質(左邊小右邊大)，該部分和之前的一樣，故跳過。 ### 插入 * 為了確保維持所有紅黑樹的特性，我們在插入時需要做**轉色**以及**旋轉**，以下是完整的步驟： ``` 1.搜尋該資料應被放入哪個位置。 2.搜尋階段，如果遇到一個node兩個子樹都是紅色的： a) 執行"換色"。 b) 換色完畢後，重新檢查該路徑中是否有出現"連續兩次的"紅色節點。 I) 如果有，執行"旋轉"。 3.插入該資料，並將它設定為紅色。 4.因為你加入了一個紅色節點，再次檢查該路徑中是否有出現"連續兩次的"紅色節點。 a) 如果有，執行"旋轉"。 5.將root node直接換成黑色(本來就是黑色就不用管)。 ※Color Change → Rotation → Insert → Rotation → Check Root ``` * 看起來似乎有點複雜，實際找一個範例來試試看。 * 將下圖中的紅黑樹加入`35`。(子樹的NULL我不畫出來了) ```graphviz digraph RBT{ 1[label="27",style=filled,fillcolor="gray30",fontcolor="white"] 2[label="15",style=filled,fillcolor="gray30",fontcolor="white"] 3[label="45",style=filled,fillcolor="crimson",fontcolor="white"] 4[label="33",style=filled,fillcolor="gray30",fontcolor="white"] 5[label="57",style=filled,fillcolor="gray30",fontcolor="white"] 6[label="30",style=filled,fillcolor="crimson",fontcolor="white"] 7[label="40",style=filled,fillcolor="crimson",fontcolor="white"] 1->2 1->3 3->4 3->5 4->6 4->7 } ``` 1.檢查應該放在哪裡：27 → 45 → 33，此時發現33的兩個子樹都是紅色的，進行換色。 2.換色：直接將33和其子樹的顏色轉換即可。 ```graphviz digraph RBT{ 1[label="27",style=filled,fillcolor="gray30",fontcolor="white"] 2[label="15",style=filled,fillcolor="gray30",fontcolor="white"] 3[label="45",style=filled,fillcolor="crimson",fontcolor="white"] 4[label="33",style=filled,fillcolor="crimson",fontcolor="white"] 5[label="57",style=filled,fillcolor="gray30",fontcolor="white"] 6[label="30",style=filled,fillcolor="gray30",fontcolor="white"] 7[label="40",style=filled,fillcolor="gray30",fontcolor="white"] 1->2 1->3 3->4 3->5 4->6 4->7 } ``` 3.因為換色了，檢查路徑上是否有連續的紅色；發現45和33都是紅色，根據其方向進行RL旋轉(詳情請看[AVL Tree](https://hackmd.io/s/rJksqh83X)的部分)，旋轉後的中間值為黑色，兩個子樹為紅色。 ```graphviz digraph RBT{ subgraph cluster1 { label="旋轉前"; 1[label="27",style=filled,fillcolor="gray30",fontcolor="white"] 2[label="15",style=filled,fillcolor="gray30",fontcolor="white"] 3[label="45",style=filled,fillcolor="crimson",fontcolor="white"] 4[label="33",style=filled,fillcolor="crimson",fontcolor="white"] 5[label="57",style=filled,fillcolor="gray30",fontcolor="white"] 6[label="30",style=filled,fillcolor="gray30",fontcolor="white"] 7[label="40",style=filled,fillcolor="gray30",fontcolor="white"] } 1->2 1->3[label="R"] 3->4[label="L"] 3->5 4->6 4->7 subgraph cluster2 { label="旋轉後"; a1[label="33",style=filled,fillcolor="gray30",fontcolor="white"] a2[label="27",style=filled,fillcolor="crimson",fontcolor="white"] a3[label="45",style=filled,fillcolor="crimson",fontcolor="white"] a4[label="15",style=filled,fillcolor="gray30",fontcolor="white"] a5[label="30",style=filled,fillcolor="gray30",fontcolor="white"] a6[label="40",style=filled,fillcolor="gray30",fontcolor="white"] a7[label="57",style=filled,fillcolor="gray30",fontcolor="white"] } a1->a2 a1->a3 a2->a4 a2->a5 a3->a6 a3->a7 } ``` 4.完成後將資料`35`放進來，設成紅色。 ```graphviz digraph RBT{ 1[label="33",style=filled,fillcolor="gray30",fontcolor="white"] 2[label="27",style=filled,fillcolor="crimson",fontcolor="white"] 3[label="45",style=filled,fillcolor="crimson",fontcolor="white"] 4[label="15",style=filled,fillcolor="gray30",fontcolor="white"] 5[label="30",style=filled,fillcolor="gray30",fontcolor="white"] 6[label="40",style=filled,fillcolor="gray30",fontcolor="white"] 7[label="57",style=filled,fillcolor="gray30",fontcolor="white"] 8[label="35",style=filled,fillcolor="crimson",fontcolor="white"] 1->2 1->3 2->4 2->5 3->6 3->7 6->8 } ``` 5.再次檢查有沒有連續的紅色。 6.將root改為黑色。 7.完成。 * 練習：試著依序將`1`, `2`, `3`, `4`, `5`和`6`放入一個空的Red-Black Tree. * 最後的樣子應為下圖： ```graphviz digraph RBT{ 1[label="2",style=filled,fillcolor="gray30",fontcolor="white"] 2[label="1",style=filled,fillcolor="gray30",fontcolor="white"] 3[label="4",style=filled,fillcolor="crimson",fontcolor="white"] 4[label="3",style=filled,fillcolor="gray30",fontcolor="white"] 5[label="5",style=filled,fillcolor="gray30",fontcolor="white"] 6[label="6",style=filled,fillcolor="crimson",fontcolor="white"] 1->2 1->3 3->4 3->5 5->6 } ``` ### 刪除 * 欸我發現我們沒有教欸ㄏㄏ * 有興趣的自己去看[wiki](https://zh.wikipedia.org/wiki/%E7%BA%A2%E9%BB%91%E6%A0%91#%E5%88%A0%E9%99%A4)，感覺好麻煩。 ### 與AVL比較 * 這些性質有什麼用？ * 為了維持這些特性，在插入、刪除資料時，我們也會進行一些旋轉，而這讓樹的高度不會差太多。 * 那為什麼我們不使用AVL Trees就好了？ * AVL對於高度的限制太嚴格，導致插入或刪除時吃掉太多資源。 * 那哪一個比較好？ * 根據需求有好有壞，沒有絕對。 | 執行動作 | AVL | Red-Black | | -------- | -------- | -------- | | 搜尋 | 勝 | | |插入||勝| |刪除||勝| --- ## Splay Tree * Splay Tree的概念是，最近搜尋/插入的資料會越接近root。 * 至於為什麼要這樣做，應該蠻好理解的：你最近使用過的資料，短期內很有可能再次被用到，越靠近root搜尋所需的時間就越短。 * 將一筆資料移動root的動作稱為Splay，而Splay中我們會使用的操作又分為`Zig Step`和`Zig-zig Step`。 ### Zig Step * 當你使用的節點N和root node只差一階(也就是說，N是root node的子樹)，我們就使用下圖來交換： * 如果是在右子樹就反過來。 ```graphviz digraph ZigStep{ subgraph cluster1{ label="旋轉前" 1[label="P"] 2[label="N"] 3[label="T3",shape="triangle"] 4[label="T1",shape="triangle"] 5[label="T2",shape="triangle"] } 1->2 1->3 2->1[style="dashed",color="orange"] 2->4 2->5 subgraph cluster2{ label="旋轉後" a2[label="N"] a3[label="T1",shape="triangle"] a1[label="P"] a4[label="T2",shape="triangle"] a5[label="T3",shape="triangle"] a2->a3 a2->a1 a1->a4 a1->a5 } } ``` ### Zig-zig Step * 當你爸不是root，而且你爸和你的路徑呈現RR或是LL時，使用下圖交換： ```graphviz digraph ZigzigStep{ subgraph cluster1{ label="旋轉前" 1[label="G"] 2[label="P"] 3[label="T4",shape="triangle"] 5[label="N"] 4[label="T3",shape="triangle"] 6[label="T1",shape="triangle"] 7[label="T2",shape="triangle"] } 1->2[label="L"] 1->3 2->4 2->5[label="L"] 5->6 5->7 subgraph cluster2{ label="旋轉後" a5[label="N"] a6[label="T1",shape="triangle"] a2[label="P"] a7[label="T2",shape="triangle"] a1[label="G"] a4[label="T3",shape="triangle"] a3[label="T4",shape="triangle"] a5->a6 a5->a2 a2->a1 a2->a7 a1->a4 a1->a3 } } ``` * 而情況是RL、LR時則是： ```graphviz digraph ZigZigStep{ subgraph cluster1{ label="旋轉前" 1[label="G"] 2[label="P"] 3[label="T4",shape="triangle"] 4[label="T1",shape="triangle"] 5[label="N"] 6[label="T2",shape="triangle"] 7[label="T3",shape="triangle"] } 1->2[label="L"] 1->3 2->4 2->5[label="R"] 5->6 5->7 subgraph cluster2{ label="旋轉後" a5[label="N"] a2[label="P"] a1[label="G"] a6[label="T1",shape="triangle"] a7[label="T2",shape="triangle"] a4[label="T3",shape="triangle"] a3[label="T4",shape="triangle"] a5->a2 a5->a1 a2->a6 a2->a7 a1->a4 a1->a3 } } ``` * 當你要將N上提至root時，只要不停的重複這兩個步驟即可。 * 請和上方的紅黑樹旋轉比較(或是AVL旋轉)，此部分的目的是將Node N上提，因此不是將中間值上提，切勿搞混。 ### 插入 * 用一般Binary Search Tree的方法插入`N`。 * Splay `N`，也就是將`N`上提至root。 ### 搜尋 * 用一般Binary Search Tree的方法搜尋`N`。 * 如果成功找到`N`，Splay`N`。 * 如果找不到該數字，Splay你最後一個遍歷到的node。 * 概念上來說就是把最接近的資料上提。 ### 刪除 * ㄏㄏ我們還是沒教。 --- ## Huffman Trees * 霍夫曼樹用於壓縮字串。 * 與紅黑樹相反，霍夫曼樹的資料只會放在葉子的地方，其他節點則是用於計算和連接。 ### Weighted External Path Length * 霍夫曼樹除了高度之外，有另一種計算路徑長度的方法，我們把這個值稱作**權重**： `將所有(葉子的值)*(該葉子到root的距離)給加總起來，即為Weighted External Path Length。` * 舉個例子，下面的樹的長度為`77`。 * $2*3+3*3+5*2+11*3+5*3+2*2=77$ ```graphviz digraph Huffman{ 1[label=""] 2[label=""] 3[label=""] 4[label=""] 5[label="5",shape="box"] 6[label=""] 7[label="2",shape="box"] 8[label="2",shape="box"] 9[label="3",shape="box"] 10[label="11",shape="box"] 11[label="5",shape="box"] 1->2[label="1"] 1->3[label="1"] 2->4[label="2"] 2->5[label="2"] 3->6[label="2"] 3->7[label="2"] 4->8[label="3"] 4->9[label="3"] 6->10[label="3"] 6->11[label="3"] } ``` ### Create Huffman Tree * 霍夫曼樹不是搜索樹，因此不需要小的在左子樹大的在右子樹。它的用途不是搜尋而是壓縮。 * 霍夫曼樹的特性是它可以根據資料，找出Weighted External Path Length**最小**的結構。 * 只要根據字元出現的頻率給予node的number，就可以將字串壓縮成最小編碼。 * 以下是建霍夫曼樹的步驟與例子： * 假設現在要將字串編碼，該字串該字串只會出現A~J，且出現的頻率A<B<...<J。 ``` 1.所有的data都先建成一個獨立的node，並依照頻率排序。 ``` ```graphviz digraph Huffman { node[shape=record] store1 [label="<f0> A 7|<f1> B 9|<f2>C 11|<f3>D 14|<f4>E 18|<f5> F 21|<f6>G 27|<f7>H 29|<f8>I 35 |<f9> J40"]; store1:f0 -> store1:f9[label="頻率低頻率高"] } ``` ``` 2.新建一個node，將兩個權重最小的root node當作左子樹和右子樹，並將該node的權重設為左右子樹的總和。 ``` ```graphviz digraph Huffman { node[shape=record] A[label="A 7",shape=""] B[label="B 9",shape=""] store1 [label="<f0> 16|<f2>C 11|<f3>D 14|<f4>E 18|<f5> F 21|<f6>G 27|<f7>H 29|<f8>I 35 |<f9> J40"]; store1:f0 -> store1:f9[label="頻率低頻率高"] store1:f0 -> A store1:f0 -> B } ``` ``` 3.不停重複步驟2，直到只剩下一個root node。 ``` ```graphviz digraph Huffman { node[shape=record] A[label="A 7",shape=""] B[label="B 9",shape=""] C[label="C 11",shape=""] D[label="D 14",shape=""] store1 [label="<f0> 16|<f2> 25|<f4>E 18|<f5> F 21|<f6>G 27|<f7>H 29|<f8>I 35 |<f9> J40"]; store1:f0 -> store1:f9[label="頻率低頻率高"] store1:f0 -> A store1:f0 -> B store1:f2 -> C store1:f2 -> D } ``` * 跳過中間步驟，最後的樹為： ```graphviz digraph Huffman { node[shape=record] 221 125 86 56 69 G[label="G 27",shape=""] H[label="H 29",shape=""] 34 I[label="I 35",shape=""] 16 E[label="E 18",shape=""] A[label="A 7",shape=""] B[label="B 9",shape=""] 46 J[label="J 40",shape=""] 25 F[label="F 21",shape=""] C[label="C 11",shape=""] D[label="D 14",shape=""] 221->125 221->86 125->56 125->69 56->G 56->H 69->34 69->I 34->16 34->E 16->A 16->B 86->46 86->J 46->25 46->F 25->C 25->D } ``` * 如此一來，我們便完成了一棵霍夫曼樹。 ### Data Coding * 根據出現頻率完成了樹之後，我們到底如何運用這棵樹編碼呢？ * 我們將剛剛的樹往左邊的路寫上`0`，右邊則是`1`。 ```graphviz digraph Huffman { node[shape=record] 221 125 86 56 69 G[label="G 27",shape=""] H[label="H 29",shape=""] 34 I[label="I 35",shape=""] 16 E[label="E 18",shape=""] A[label="A 7",shape=""] B[label="B 9",shape=""] 46 J[label="J 40",shape=""] 25 F[label="F 21",shape=""] C[label="C 11",shape=""] D[label="D 14",shape=""] 221->125[label="0"] 221->86[label="1"] 125->56[label="0"] 125->69[label="1"] 56->G[label="0"] 56->H[label="1"] 69->34[label="0"] 69->I[label="1"] 34->16[label="0"] 34->E[label="1"] 16->A[label="0"] 16->B[label="1"] 86->46[label="0"] 86->J[label="1"] 46->25[label="0"] 46->F[label="1"] 25->C[label="0"] 25->D[label="1"] } ``` * 根據從root到每個data的路徑，我們可以得到這個表： | 字元/資料 | 對應編碼 | | -------- | -------- | | A | 01000 | |B|01001| |C|1000| |D|1001| |E|0101| |F|101| |G|000| |H|001| |I|011| |J|11| * 然後我們就可以利用這個表編碼，例如"AFG"=01000101000。 * 你可以發現，頻率越高的字，編碼越短，因此霍夫曼樹可以提升壓縮字串時的壓縮率。 ###### tags: `DS` `note`