資料結構筆記 - Trie

如果今天想要實作如 google 搜尋列中的自動補完功能，通常會先有一個字典結構存放關鍵字，
Trie 是一種樹狀結構也稱為字典樹，用以在大量字串中快速找尋是否存在該字串。此外，Trie 也有人叫前綴樹，因為它是由字串間相同的前綴文字組成。
一個典型的 Trie 如下：

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

如果我們順著 Root 往下走，每次隨著節點走到 Leaf 時都可以得到一個字串，如上圖我們可以拼出 CAR、CAT、DOT 3 個字串。其中 Trie 的 Root 節點是不存放文字的。

所以今天如果要確認某個字串是否存在於我們的字典裡，只要依序將該字串從頭開始從 Trie 的 Root 出發，只要相同，我們就取出下個文字看看是否和下個子節點相同，一直到該字串沒有文字可以取出來比對就表示存在，如果中途有不同的子節點或者是已經走到 Leaf 但該字串尚未比對完就表示不存在字典中。

比方今天我們用 CA 比對，先從 C 開始，第一步有節點 C 和 C 相同，所以我們接著取出 A 和 C 的子節點比對，而 C 的子節點是 A 也相同，這時我們要比對的字串已經到尾巴了，所以表示 CA 存在於字典中。

如果是用 CARD 比對，一樣從 C 開始，和節點 C 相同，接著 A 和 C 的子節點 A 相同，然後 R 和 A 的子節點 R 相同，但此時 R 沒有子節點已經走到 Leaf，但還有 D 尚未比對，所以我們知道 CARD 不存在字典中。

實作時的 Node 內部以 Java 為例：




class Node {
    Map<Character, Node> children;
    boolean isWord;
}

children 存放子節點文字和其 reference，而 isWord 標示走到這個節點時是否已經組成之前插入過的文字。

Java




















































































































import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class Node {
    Map<Character, Node> children = new HashMap<>();
    boolean isWord = false;
}

public class Trie {
    private Node root;

    public Trie() {
        root = new Node();
    }

    public void insert(String word) {
        if (word == null || word.length() == 0) {
            return;
        }

        Node currentNode = root;
        for (int i = 0; i < word.length(); i++) {
            Character c = word.charAt(i);
            Map<Character, Node> children = currentNode.children;

            // 如果子節點有此字元，則不需新增節點
            if (children.get(c) != null) {
                currentNode = children.get(c);
            } else {
            // 否則就在目前的節點下新增此字元的子節點
                currentNode = new Node();

                children.put(c, currentNode);
            }

        }

        // 遍歷到最後一個字元時，將該節點的 isWord 標示為 true
        // 表示如果從 root 走到這個節點時，目前組合的字串是之前插入過的
        currentNode.isWord = true;
    }

    public boolean search(String word) {
        if (word == null || word.length() == 0) {
            return false;
        }

        if (root.children.size() == 0) {
            return false;
        }

        // 從 root 出發
        Node currentNode = root;
        for (int i = 0; i < word.length(); i++) {
            // 取出這個字元對應的子節點
            currentNode = currentNode.children.get(word.charAt(i));

            // 如果沒有子節點表示 Trie 中沒有這個字串
            if (currentNode == null) {
                return false;
            }
        }

        // 遍歷到最後一個字元時則檢查該節點是否標為 isWord
        // 如果為 true 表示之前有插入過相同的字串
        // 如果為 false 表示只是剛好前綴和欲找尋的字串相同
        return currentNode.isWord;
    }

    public void display() {
        Node currentNode = root;

        display("", currentNode);
    }

    // 使用遞迴遍歷整個 Trie，顯示目前儲存的字串
    private void display(String word, Node currentNode) {
        if (currentNode.isWord) {
            System.out.println("含有: " + word);
        }
        // 如果已經到 leaf, 則回到上一層, 所以組好的字串要減去最後一個字
        if (currentNode.children == null || currentNode.children.size() == 0) {
            word = word.substring(0, word.length() - 1);
            return;
        }

        Set<Character> keySet = currentNode.children.keySet();
        Iterator<Character> iterator = keySet.iterator();
        while (iterator.hasNext()) {
            Character c = iterator.next();
            word += c;
            display(word, currentNode.children.get(c));
            // 執行到這裡表示這個 Node 的子樹已經遍歷完, 所以要換下個 Node
            // 因此目前的字串要減掉這次遍歷到的 Node 包含的字, 也就是最後一個字
            word = word.substring(0, word.length() - 1);
        }
    }

    public static void main(String[] args) {
        Trie trie = new Trie();

        trie.insert("test");
        trie.insert("MyPhone");
        trie.insert("測試中文");
        trie.insert("te");
        trie.display();
        System.out.println("含有 te? " + trie.search("te"));
        System.out.println("含有 test? " + trie.search("test"));
        System.out.println("含有 My? " + trie.search("My"));
        System.out.println("含有 myPhone? " + trie.search("myPhone"));
        System.out.println("含有 測試中文? " + trie.search("測試中文"));
        System.out.println("含有 測試? " + trie.search("測試"));
    }
}

輸出結果

含有: te
含有: test
含有: 測試中文
含有: MyPhone
含有 te? true
含有 test? true
含有 My? false
含有 myPhone? false
含有 測試中文? true
含有 測試? false

參考資料：

https://leetcode.com/articles/implement-trie-prefix-tree/#

資料結構筆記 - Trie

Java

輸出結果

Read more

LeetCode Question Notes

Spring Data Projection 整理

Node.js event queue

LeetCode 11. Container With Most Water