Trie (3) - HackMD

# Trie (3) > 紀錄 NeetCode-150 與 LeetCode 的解題想法、刷題過程與錯誤，以 Python 或 C++ 兩種語言為主 > > 其他 NeetCode-150 或 LeetCode 的主題請見 [Here](https://hackmd.io/-ovTFlzLRWaSgzDGhbSwsA)  ## Introduction <img src="https://media.geeksforgeeks.org/wp-content/uploads/20220828232752/Triedatastructure1.png" width=400> (圖片來源：[GeeksforGeeks](https://www.geeksforgeeks.org/trie-insert-and-search/?ref=gcse_outind)) Trie 是一種類似樹(tree-like)的資料結構，是由 re**trie**val 簡化命名而來，又稱為字典樹，用來動態的儲存字串，能夠在大型資料集中進行有效且快速的查找(retrieval)與儲存(storage)資料，尋找某個字串是否在資料集中。如上圖所示，資料集中儲存了 3 個字串，可以透過 tree traversal 的方式尋找字串有沒有包含在資料集中。因為整個樹的資料是由相同的字首組成的，所以也稱為字首樹或前綴樹(prefix tree)。常見的應用包含： * 字典查詢：尋找該字是否在資料庫中 * 自動補齊功能：藉由已經出現的字首來推斷使用者可能是要輸入哪些字 * 拼寫檢查 ### Representation Tri 中的每個節點會儲存一個字元或是部分的字串，根節點是一個空結合。從根節點到某節點路徑代表資料集中某個字的前綴。 > Python ver. ```python= # define node class TrieNode: def __init__(self): self.ch = [None] * 26 self.isEndOfWord = False ``` > C ver. ```c= // define node typedef struct trieNode *triePointer; struct trieNode { triePointer ch[26]; int isEndOfWord; }; // create a new node triePointer getNode(){ triePointer node = malloc(sizeof(struct trieNode)); // character for (int i = 0; i < 26; i++) node->ch[i] = NULL; node->isEndOfWord = 0; return node; } ``` 簡單來說，tri 中的每個節點都會有 26 個子樹來儲存可能的分支與路徑，是一種以空間換取時間的高效率搜尋法。不過有時也會用 hash map 的方式來代替每個節點儲存的長度為 26 的陣列，只在有資料時才會將字元儲存在 hash table 中，節省空間。此外，每個節點還需要儲存一個結束符號來紀錄這個節點是不是字串的結束位置，在查找的過程中才能停止搜尋。 ### Operation Trie 提供了三種運算：(1) 插入(insertion) (2) 搜尋(search) 與 (3) 刪除(deletion) #### 1. Insertion 以插入 "and" 字串為例： * 從 root 開始往下找： * 因為 root 是空節點，所以不設值 * `isEndOfWord` 設為 0 表示不是結束的地方 * 檢查 root 的 index = 0 的子樹 * 有其他內容，繼續往下 ; 沒有其他內容，在該位址建立一個新節點 * `isEndOfWord` 設為 0 表示不是結束的地方 * 檢查節點 A 的 index = 14 的子樹 * 有其他內容，繼續往下 ; 沒有其他內容，在該位址建立一個新節點 * `isEndOfWord` 設為 0 表示不是結束的地方 * 檢查節點 N 的 index = 20 的子樹 * 有其他內容，繼續往下 ; 沒有其他內容，在該位址建立一個新節點 * `isEndOfWord` 設為 0 表示不是結束的地方 * 檢查節點 T 的內容 * 該字串結束，所以設置提醒 `isEndOfWord = 1;` ```c= // in C, the string is saved in character array and end of '\0' // where '\0' is thought as NULL. // the following function insert a key with lowercase by default void insert(triePointer root, char* key){ triePointer curr = root; while (*key){ int index = *key - 'a'; if (!curr->ch[index]) curr->ch[index] = getNode(); key++; // next character curr = curr->ch[index]; // next level node } curr->isEndOfWord = 1; } ``` #### 2. Search 搜尋的方法類似 tree traversal，依序檢查每個節點的值。當要搜尋的字串結束時，還需要檢查該節點的是不是結束點(`isEndOfWord = 1`)，是的話才能放人。 ```c= int search(triePointer root, char* key){ triePointer curr = root; while (*key){ int index = *key - 'a'; if (!curr->ch[index]) return 0; key++; // next character curr = curr->ch[index]; // next level node } return curr->isEndOfWord; } ``` #### 3. Deletion(暫略) ### Reference * [Trie-wiki](https://zh.wikipedia.org/zh-tw/Trie) * [字典樹](https://hackmd.io/@TienYi/trie) * [資料結構筆記 - Trie](https://hackmd.io/@blackdiz/rJmA9n-PL) * [GeekforGeek](https://www.geeksforgeeks.org/trie-insert-and-search/?ref=gcse_outind) ## Implement Trie Prefix Tree **Medium** > A **prefix tree** (also known as a trie) is a tree data structure used to efficiently store and retrieve keys in a set of strings. Some applications of this data structure include auto-complete and spell checker systems. > > Implement the PrefixTree class: > > * `PrefixTree()` Initializes the prefix tree object. > * `void insert(String word)` Inserts the string `word` into the prefix tree. > * `boolean search(String word)` Returns `true` if the string `word` is in the prefix tree (i.e., was inserted before), and `false` otherwise. > * `boolean startsWith(String prefix)` Returns `true` if there is a previously inserted string `word` that has the prefix `prefix`, and `false` otherwise. ### Example 1: ```java Input: ["Trie", "insert", "dog", "search", "dog", "search", "do", "startsWith", "do", "insert", "do", "search", "do"] Output: [null, null, true, false, true, null, true] Explanation: PrefixTree prefixTree = new PrefixTree(); prefixTree.insert("dog"); prefixTree.search("dog"); // return true prefixTree.search("do"); // return false prefixTree.startsWith("do"); // return true prefixTree.insert("do"); prefixTree.search("do"); // return true ``` ### Constraints * `1 <= word.length, prefix.length <= 1000` * `word` and `prefix` are made up of lowercase English letters. ### Recommended complexity * Time complexity: $O(n)$, where `n` is the length of the given string. * Space complexity: $O(t)$, where `t` is the total number of nodes created in the Trie. ### Solution 最基礎的 trie 與其運算的實作，分為 2 種語言，每個語言 2 種版本(array and hash map) #### 1. Array ver. ```python= # Python class TrieNode: def __init__(self): self.ch = [None] * 26 self.isEndOfWord = False class PrefixTree: def __init__(self): self.root = TrieNode() def insert(self, word: str) -> None: curr = self.root for ch in word: index = ord(ch) - ord('a') if not curr.ch[index]: curr.ch[index] = TrieNode() curr = curr.ch[index] # next level node curr.isEndOfWord = True def search(self, word: str) -> bool: curr = self.root for ch in word: index = ord(ch) - ord('a') if not curr.ch[index]: return False curr = curr.ch[index] # next level node return curr.isEndOfWord def startsWith(self, prefix: str) -> bool: curr = self.root for ch in prefix: index = ord(ch) - ord('a') if not curr.ch[index]: return False curr = curr.ch[index] # next level node return True ``` #### 2. Hash map ver. ```python= # Python class TrieNode: def __init__(self): self.ch = {} self.isEndOfWord = False class PrefixTree: def __init__(self): self.root = TrieNode() def insert(self, word: str) -> None: curr = self.root for ch in word: if ch not in curr.ch: curr.ch[ch]= TrieNode() curr = curr.ch[ch] # next level node curr.isEndOfWord = True def search(self, word: str) -> bool: curr = self.root for ch in word: if ch not in curr.ch: return False curr = curr.ch[ch] # next level node return curr.isEndOfWord def startsWith(self, prefix: str) -> bool: curr = self.root for ch in prefix: if ch not in curr.ch: return False curr = curr.ch[ch] # next level node return True ``` ## Design Add And Search Words Data Structure **Medium** > Design a data structure that supports adding new words and searching for existing words. > > Implement the `WordDictionary` class: > > * `void addWord(word)` Adds `word` to the data structure. > * `bool search(word)` Returns `true` if there is any string in the data structure that matches `word` or `false` otherwise. `word` may contain dots `'.'` where dots can be matched with any letter. ### Example 1: ```java Input: ["WordDictionary", "addWord", "day", "addWord", "bay", "addWord", "may", "search", "say", "search", "day", "search", ".ay", "search", "b.."] Output: [null, null, null, null, false, true, true, true] Explanation: WordDictionary wordDictionary = new WordDictionary(); wordDictionary.addWord("day"); wordDictionary.addWord("bay"); wordDictionary.addWord("may"); wordDictionary.search("say"); // return false wordDictionary.search("day"); // return true wordDictionary.search(".ay"); // return true wordDictionary.search("b.."); // return true ``` ### Constraints * `1 <= word.length <= 20` * `word` in `addWord` consists of lowercase English letters. * `word` in `search` consist of `'.'` or lowercase English letters. ### Recommended complexity * Time complexity: $O(n)$, where `n` is the length of the given string. * Space complexity: $O(t + n)$, where `t` is the total number of nodes created in the Trie. ### Solution 插入(insert)運算與一般的 trie 相同。但搜尋(search)運算有點差異。因為在做搜尋的時候可能會碰到 `'.'` 字元，且這個字元又可以當做任何字元使用，所以搜尋過程可以分為兩種狀況討論 * 遇到 `'.'` 就去迭代該節點下面的所有分支，每個分支都去做遞迴搜索。只要有某一個分支可以匹配 `'.'` 之後的字元，就表示正確。如果不能匹配就是錯誤 * 沒遇到 `'.'` 就做一般的搜索 ```python= class TrieNode: def __init__(self): self.child = {} self.EndOfWord = False class WordDictionary: def __init__(self): self.root = TrieNode() def addWord(self, word: str) -> None: curr = self.root for c in word: if c not in curr.child: curr.child[c] = TrieNode() curr = curr.child[c] curr.EndOfWord = True def search(self, word: str) -> bool: return self.checkPath(self.root, word) def checkPath(self, root, word): wordLen = len(word) for i in range(wordLen): if word[i] == '.': for node in root.child.values(): if self.checkPath(node, word[i+1:]): return True return False else: if word[i] not in root.child: return False root = root.child[word[i]] return root.EndOfWord ``` ## Word Search ll **Hard** > Given a 2-D grid of characters `board` and a list of strings `words`, return all words that are present in the grid. > > For a word to be present it must be possible to form the word with a path in the board with horizontally or vertically neighboring cells. The same cell may not be used more than once in a word. ### Example 1: <img src="https://imagedelivery.net/CLfkmk9Wzy8_9HRyug4EVA/06435c8e-bac3-49f5-5df7-77fd5dd42800/public" width = 300> ```java Input: board = [ ["a","b","c","d"], ["s","a","a","t"], ["a","c","k","e"], ["a","c","d","n"] ], words = ["bat","cat","back","backend","stack"] Output: ["cat","back","backend"] ``` ### Example 2: <img src="https://imagedelivery.net/CLfkmk9Wzy8_9HRyug4EVA/6f244a10-78bf-4a30-0a5f-b8f3e03ce000/public" width = 200> ```java Input: board = [ ["x","o"], ["x","o"] ], words = ["xoxo"] Output: [] ``` ### Constraints * `1 <= board.length, board[i].length <= 10` * `board[i]` consists only of lowercase English letter. * `1 <= words.length <= 100` * `1 <= words[i].length <= 10` * `words[i]` consists only of lowercase English letters. * All strings within `words` are distinct. ### Solution 解題的核心思想： * 使用題目給定的多個字串 `words` 用 trie 建立字典樹 * 把 2-D matrix 中的字串跟字典樹儲存的字串一一比較 * 使用 trie 的 search operation * 但因為不是搜尋單一字串，而是 matrix 中的多個字串，所以是用類似 depth-first search 的方式比對每個 cell 中的字元每個 cell 最多都有 4 個方向可以進行下一步的搜尋，所以 4 條路徑都要走(路徑存在的話)。當某條路徑不能走的時候再回推上一步走其他路徑(backtracking)，因此需要一個變數(`self.visit`)儲存走過的路徑，回溯時再刪除路徑。整體的搜索方式是 depth-first search + backtracking ```python= class TrieNode: def __init__(self): self.child = {} self.EndOfWord = False class Solution: def findWords(self, board: List[List[str]], words: List[str]) -> List[str]: self.root = TrieNode() self.board = board # construct the prefix tree for word in words: self.insert(word) self.rows, self.cols = len(board), len(board[0]) self.visit = set() # record go through path self.res = set() # record existed word for r in range(self.rows): for c in range(self.cols): self.dfs(r, c, self.root, "") return list(self.res) def dfs(self, r, c, node, word): # stop search if (r < 0 or c < 0 or r >= self.rows or c >= self.cols or (r, c) in self.visit or self.board[r][c] not in node.child): return # check character which was saved in board[r][c] word += self.board[r][c] # concatenate previous string and current character self.visit.add((r, c)) # record path node = node.child[self.board[r][c]] # next level node if node.EndOfWord: self.res.add(word) # next 4 direction self.dfs(r+1, c, node, word) self.dfs(r-1, c, node, word) self.dfs(r, c+1, node, word) self.dfs(r, c-1, node, word) # backtracking previous step self.visit.remove((r, c)) def insert(self, word): curr = self.root for ch in word: if ch not in curr.child: curr.child[ch] = TrieNode() curr = curr.child[ch] curr.EndOfWord = True ```