# Trie (3)
> 紀錄 NeetCode-150 與 LeetCode 的解題想法、刷題過程與錯誤,以 Python 或 C++ 兩種語言為主
>
> 其他 NeetCode-150 或 LeetCode 的主題請見 [Here](https://hackmd.io/-ovTFlzLRWaSgzDGhbSwsA)
<!-- 常用色碼紀錄
Easy: <font color="#00ad5f">**Easy**</font>
Medium: <font color="#ffbb00">**Medium**</font>
Hard: <font color="#ee2f56">**Hard**</font>
-->
## Introduction
<img src="https://media.geeksforgeeks.org/wp-content/uploads/20220828232752/Triedatastructure1.png" width=400>
(圖片來源:[GeeksforGeeks](https://www.geeksforgeeks.org/trie-insert-and-search/?ref=gcse_outind))
Trie 是一種類似樹(tree-like)的資料結構,是由 re**trie**val 簡化命名而來,又稱為字典樹,用來動態的儲存字串,能夠在大型資料集中進行有效且快速的查找(retrieval)與儲存(storage)資料,尋找某個字串是否在資料集中。
如上圖所示,資料集中儲存了 3 個字串,可以透過 tree traversal 的方式尋找字串有沒有包含在資料集中。因為整個樹的資料是由相同的字首組成的,所以也稱為字首樹或前綴樹(prefix tree)。
常見的應用包含:
* 字典查詢:尋找該字是否在資料庫中
* 自動補齊功能:藉由已經出現的字首來推斷使用者可能是要輸入哪些字
* 拼寫檢查
### Representation
Tri 中的每個節點會儲存一個字元或是部分的字串,根節點是一個空結合。從根節點到某節點路徑代表資料集中某個字的前綴。
> Python ver.
```python=
# define node
class TrieNode:
def __init__(self):
self.ch = [None] * 26
self.isEndOfWord = False
```
> C ver.
```c=
// define node
typedef struct trieNode *triePointer;
struct trieNode {
triePointer ch[26];
int isEndOfWord;
};
// create a new node
triePointer getNode(){
triePointer node = malloc(sizeof(struct trieNode));
// character
for (int i = 0; i < 26; i++)
node->ch[i] = NULL;
node->isEndOfWord = 0;
return node;
}
```
簡單來說,tri 中的每個節點都會有 26 個子樹來儲存可能的分支與路徑,是一種以空間換取時間的高效率搜尋法。不過有時也會用 hash map 的方式來代替每個節點儲存的長度為 26 的陣列,只在有資料時才會將字元儲存在 hash table 中,節省空間。
此外,每個節點還需要儲存一個結束符號來紀錄這個節點是不是字串的結束位置,在查找的過程中才能停止搜尋。
### Operation
Trie 提供了三種運算:(1) 插入(insertion) (2) 搜尋(search) 與 (3) 刪除(deletion)
#### 1. Insertion
以插入 "and" 字串為例:
* 從 root 開始往下找:
* 因為 root 是空節點,所以不設值
* `isEndOfWord` 設為 0 表示不是結束的地方
* 檢查 root 的 index = 0 的子樹
* 有其他內容,繼續往下 ; 沒有其他內容,在該位址建立一個新節點
* `isEndOfWord` 設為 0 表示不是結束的地方
* 檢查節點 A 的 index = 14 的子樹
* 有其他內容,繼續往下 ; 沒有其他內容,在該位址建立一個新節點
* `isEndOfWord` 設為 0 表示不是結束的地方
* 檢查節點 N 的 index = 20 的子樹
* 有其他內容,繼續往下 ; 沒有其他內容,在該位址建立一個新節點
* `isEndOfWord` 設為 0 表示不是結束的地方
* 檢查節點 T 的內容
* 該字串結束,所以設置提醒 `isEndOfWord = 1;`
```c=
// in C, the string is saved in character array and end of '\0'
// where '\0' is thought as NULL.
// the following function insert a key with lowercase by default
void insert(triePointer root, char* key){
triePointer curr = root;
while (*key){
int index = *key - 'a';
if (!curr->ch[index])
curr->ch[index] = getNode();
key++; // next character
curr = curr->ch[index]; // next level node
}
curr->isEndOfWord = 1;
}
```
#### 2. Search
搜尋的方法類似 tree traversal,依序檢查每個節點的值。當要搜尋的字串結束時,還需要檢查該節點的是不是結束點(`isEndOfWord = 1`),是的話才能放人。
```c=
int search(triePointer root, char* key){
triePointer curr = root;
while (*key){
int index = *key - 'a';
if (!curr->ch[index])
return 0;
key++; // next character
curr = curr->ch[index]; // next level node
}
return curr->isEndOfWord;
}
```
#### 3. Deletion(暫略)
### Reference
* [Trie-wiki](https://zh.wikipedia.org/zh-tw/Trie)
* [字典樹](https://hackmd.io/@TienYi/trie)
* [資料結構筆記 - Trie](https://hackmd.io/@blackdiz/rJmA9n-PL)
* [GeekforGeek](https://www.geeksforgeeks.org/trie-insert-and-search/?ref=gcse_outind)
## Implement Trie Prefix Tree
<font color="#ffbb00">**Medium**</font>
> A **prefix tree** (also known as a trie) is a tree data structure used to efficiently store and retrieve keys in a set of strings. Some applications of this data structure include auto-complete and spell checker systems.
>
> Implement the PrefixTree class:
>
> * `PrefixTree()` Initializes the prefix tree object.
> * `void insert(String word)` Inserts the string `word` into the prefix tree.
> * `boolean search(String word)` Returns `true` if the string `word` is in the prefix tree (i.e., was inserted before), and `false` otherwise.
> * `boolean startsWith(String prefix)` Returns `true` if there is a previously inserted string `word` that has the prefix `prefix`, and `false` otherwise.
### Example 1:
```java
Input:
["Trie", "insert", "dog", "search", "dog", "search", "do", "startsWith", "do", "insert", "do", "search", "do"]
Output:
[null, null, true, false, true, null, true]
Explanation:
PrefixTree prefixTree = new PrefixTree();
prefixTree.insert("dog");
prefixTree.search("dog"); // return true
prefixTree.search("do"); // return false
prefixTree.startsWith("do"); // return true
prefixTree.insert("do");
prefixTree.search("do"); // return true
```
### Constraints
* `1 <= word.length, prefix.length <= 1000`
* `word` and `prefix` are made up of lowercase English letters.
### Recommended complexity
* Time complexity: $O(n)$, where `n` is the length of the given string.
* Space complexity: $O(t)$, where `t` is the total number of nodes created in the Trie.
### Solution
最基礎的 trie 與其運算的實作,分為 2 種語言,每個語言 2 種版本(array and hash map)
#### 1. Array ver.
```python=
# Python
class TrieNode:
def __init__(self):
self.ch = [None] * 26
self.isEndOfWord = False
class PrefixTree:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
curr = self.root
for ch in word:
index = ord(ch) - ord('a')
if not curr.ch[index]:
curr.ch[index] = TrieNode()
curr = curr.ch[index] # next level node
curr.isEndOfWord = True
def search(self, word: str) -> bool:
curr = self.root
for ch in word:
index = ord(ch) - ord('a')
if not curr.ch[index]:
return False
curr = curr.ch[index] # next level node
return curr.isEndOfWord
def startsWith(self, prefix: str) -> bool:
curr = self.root
for ch in prefix:
index = ord(ch) - ord('a')
if not curr.ch[index]:
return False
curr = curr.ch[index] # next level node
return True
```
#### 2. Hash map ver.
```python=
# Python
class TrieNode:
def __init__(self):
self.ch = {}
self.isEndOfWord = False
class PrefixTree:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
curr = self.root
for ch in word:
if ch not in curr.ch:
curr.ch[ch]= TrieNode()
curr = curr.ch[ch] # next level node
curr.isEndOfWord = True
def search(self, word: str) -> bool:
curr = self.root
for ch in word:
if ch not in curr.ch:
return False
curr = curr.ch[ch] # next level node
return curr.isEndOfWord
def startsWith(self, prefix: str) -> bool:
curr = self.root
for ch in prefix:
if ch not in curr.ch:
return False
curr = curr.ch[ch] # next level node
return True
```
## Design Add And Search Words Data Structure
<font color="#ffbb00">**Medium**</font>
> Design a data structure that supports adding new words and searching for existing words.
>
> Implement the `WordDictionary` class:
>
> * `void addWord(word)` Adds `word` to the data structure.
> * `bool search(word)` Returns `true` if there is any string in the data structure that matches `word` or `false` otherwise. `word` may contain dots `'.'` where dots can be matched with any letter.
### Example 1:
```java
Input:
["WordDictionary", "addWord", "day", "addWord", "bay", "addWord", "may", "search", "say", "search", "day", "search", ".ay", "search", "b.."]
Output:
[null, null, null, null, false, true, true, true]
Explanation:
WordDictionary wordDictionary = new WordDictionary();
wordDictionary.addWord("day");
wordDictionary.addWord("bay");
wordDictionary.addWord("may");
wordDictionary.search("say"); // return false
wordDictionary.search("day"); // return true
wordDictionary.search(".ay"); // return true
wordDictionary.search("b.."); // return true
```
### Constraints
* `1 <= word.length <= 20`
* `word` in `addWord` consists of lowercase English letters.
* `word` in `search` consist of `'.'` or lowercase English letters.
### Recommended complexity
* Time complexity: $O(n)$, where `n` is the length of the given string.
* Space complexity: $O(t + n)$, where `t` is the total number of nodes created in the Trie.
### Solution
插入(insert)運算與一般的 trie 相同。但搜尋(search)運算有點差異。因為在做搜尋的時候可能會碰到 `'.'` 字元,且這個字元又可以當做任何字元使用,所以搜尋過程可以分為兩種狀況討論
* 遇到 `'.'` 就去迭代該節點下面的所有分支,每個分支都去做遞迴搜索。只要有某一個分支可以匹配 `'.'` 之後的字元,就表示正確。如果不能匹配就是錯誤
* 沒遇到 `'.'` 就做一般的搜索
```python=
class TrieNode:
def __init__(self):
self.child = {}
self.EndOfWord = False
class WordDictionary:
def __init__(self):
self.root = TrieNode()
def addWord(self, word: str) -> None:
curr = self.root
for c in word:
if c not in curr.child:
curr.child[c] = TrieNode()
curr = curr.child[c]
curr.EndOfWord = True
def search(self, word: str) -> bool:
return self.checkPath(self.root, word)
def checkPath(self, root, word):
wordLen = len(word)
for i in range(wordLen):
if word[i] == '.':
for node in root.child.values():
if self.checkPath(node, word[i+1:]):
return True
return False
else:
if word[i] not in root.child:
return False
root = root.child[word[i]]
return root.EndOfWord
```
## Word Search ll
<font color="#ee2f56">**Hard**</font>
> Given a 2-D grid of characters `board` and a list of strings `words`, return all words that are present in the grid.
>
> For a word to be present it must be possible to form the word with a path in the board with horizontally or vertically neighboring cells. The same cell may not be used more than once in a word.
### Example 1:
<img src="https://imagedelivery.net/CLfkmk9Wzy8_9HRyug4EVA/06435c8e-bac3-49f5-5df7-77fd5dd42800/public" width = 300>
```java
Input:
board = [
["a","b","c","d"],
["s","a","a","t"],
["a","c","k","e"],
["a","c","d","n"]
],
words = ["bat","cat","back","backend","stack"]
Output: ["cat","back","backend"]
```
### Example 2:
<img src="https://imagedelivery.net/CLfkmk9Wzy8_9HRyug4EVA/6f244a10-78bf-4a30-0a5f-b8f3e03ce000/public" width = 200>
```java
Input:
board = [
["x","o"],
["x","o"]
],
words = ["xoxo"]
Output: []
```
### Constraints
* `1 <= board.length, board[i].length <= 10`
* `board[i]` consists only of lowercase English letter.
* `1 <= words.length <= 100`
* `1 <= words[i].length <= 10`
* `words[i]` consists only of lowercase English letters.
* All strings within `words` are distinct.
### Solution
解題的核心思想:
* 使用題目給定的多個字串 `words` 用 trie 建立字典樹
* 把 2-D matrix 中的字串跟字典樹儲存的字串一一比較
* 使用 trie 的 search operation
* 但因為不是搜尋單一字串,而是 matrix 中的多個字串,所以是用類似 depth-first search 的方式比對每個 cell 中的字元
每個 cell 最多都有 4 個方向可以進行下一步的搜尋,所以 4 條路徑都要走(路徑存在的話)。當某條路徑不能走的時候再回推上一步走其他路徑(backtracking),因此需要一個變數(`self.visit`)儲存走過的路徑,回溯時再刪除路徑。
整體的搜索方式是 depth-first search + backtracking
```python=
class TrieNode:
def __init__(self):
self.child = {}
self.EndOfWord = False
class Solution:
def findWords(self, board: List[List[str]], words: List[str]) -> List[str]:
self.root = TrieNode()
self.board = board
# construct the prefix tree
for word in words:
self.insert(word)
self.rows, self.cols = len(board), len(board[0])
self.visit = set() # record go through path
self.res = set() # record existed word
for r in range(self.rows):
for c in range(self.cols):
self.dfs(r, c, self.root, "")
return list(self.res)
def dfs(self, r, c, node, word):
# stop search
if (r < 0 or c < 0 or
r >= self.rows or c >= self.cols or
(r, c) in self.visit or
self.board[r][c] not in node.child):
return
# check character which was saved in board[r][c]
word += self.board[r][c] # concatenate previous string and current character
self.visit.add((r, c)) # record path
node = node.child[self.board[r][c]] # next level node
if node.EndOfWord:
self.res.add(word)
# next 4 direction
self.dfs(r+1, c, node, word)
self.dfs(r-1, c, node, word)
self.dfs(r, c+1, node, word)
self.dfs(r, c-1, node, word)
# backtracking previous step
self.visit.remove((r, c))
def insert(self, word):
curr = self.root
for ch in word:
if ch not in curr.child:
curr.child[ch] = TrieNode()
curr = curr.child[ch]
curr.EndOfWord = True
```