2sum 3sum 4sum discussion

# 2sum 3sum 4sum discussion ###### tags: `Steven` `Leetcode` # [1. Two Sum](https://leetcode.com/problems/two-sum/) The most classic version. ## Binary search Time complexity: $O(nlog(n))$, where n is the number of elements in the array Since we are using binary search, the precondition is that the array must be sorted. ### Search for value x ```c++ class Solution { public: vector<int> twoSum(vector<int>& nums, int target) { vector<int> orig(nums); // bug: the original index must be preserved sort(nums.begin(), nums.end()); // nums[i], target - nums[i] for(int i = 0; i < nums.size(); i++) { int l = 0, r = nums.size(); // [l, r) int cand = target - nums[i]; while(r - l > 1) { int mid = l + (r - l) / 2; if(nums[mid] == cand) { l = mid; break; } else if(nums[mid] < cand) { l = mid; } else { r = mid; } } if(nums[l] == cand && i != l) { int x = -1, y = -1; for(int j = 0; j < orig.size(); j++) { if(x == -1 && nums[i] == orig[j]) x = j; else if(nums[l] == orig[j]) y = j; } return vector<int>{x, y}; } } assert(1 == -1); // bug } }; ``` ### Using lowerbound ```c++ class Solution { public: vector<int> twoSum(vector<int>& nums, int target) { vector<int> orig(nums); // bug: the original index must be preserved sort(nums.begin(), nums.end()); // nums[i], target - nums[i] for(int i = 0; i < nums.size(); i++) { int l = 0, r = nums.size(); // [l, r) int cand = target - nums[i]; // vvvxxxxx // lr while(r - l > 1) { int mid = l + (r - l) / 2; if(nums[mid] <= cand) { l = mid; } else { r = mid; } } if(nums[l] == cand && i != l) { int x = -1, y = -1; for(int j = 0; j < orig.size(); j++) { if(x == -1 && nums[i] == orig[j]) x = j; else if(nums[l] == orig[j]) y = j; } return vector<int>{x, y}; } } assert(1 == -1); // bug } }; ``` ## Hashmap Time complexity: $O(n)$, where n is the number of elements in the array, and there is no significant collision during the hash map operations. Sorting is not required. ```c++ class Solution { public: vector<int> twoSum(vector<int>& nums, int target) { unordered_map<int, int> seen; for(int i = 0; i < nums.size(); i++) { int cand = target - nums[i]; if(seen.count(cand) == 1) { return vector<int>{seen[cand], i}; } seen[nums[i]] = i; } assert(1 == -1); } }; ``` ## 2 pointers It's actually the next problem. # [167. Two Sum II - Input array is sorted](https://leetcode.com/problems/two-sum-ii-input-array-is-sorted/) The code above should both work. Notice that the index has been changed to 1-based. ## Binary search Time complexity: $O(nlog(n))$, where n is the number of elements in the array ```c++ class Solution { public: vector<int> twoSum(vector<int>& nums, int target) { // nums[i], target - nums[i] for(int i = 0; i < nums.size(); i++) { int l = 0, r = nums.size(); // [l, r) int cand = target - nums[i]; while(r - l > 1) { int mid = l + (r - l) / 2; if(nums[mid] == cand) { l = mid; break; } else if(nums[mid] < cand) { l = mid; } else { r = mid; } } if(nums[l] == cand && i != l) { return vector<int>{min(i + 1, l + 1), max(i + 1, l + 1)}; } } assert(1 == -1); } }; ``` ## Hashmap Time complexity: $O(n)$, where n is the number of elements in the array, and there is no significant collision during the hash map operations. Sorting is not required. ```c++ class Solution { public: vector<int> twoSum(vector<int>& nums, int target) { unordered_map<int, int> seen; for(int i = 0; i < nums.size(); i++) { int cand = target - nums[i]; if(seen.count(cand) == 1) { return vector<int>{seen[cand], i + 1}; } seen[nums[i]] = i + 1; } assert(1 == -1); } }; ``` ## 2 pointers The main idea is that if you move the left pointer to the right, the value can only go up; if you move the right pointer to the left, the value can only go down! ```c++ class Solution { public: vector<int> twoSum(vector<int>& nums, int target) { int l = 0, r = nums.size() - 1; while(r - l > 0) { if(nums[l] + nums[r] == target) { return vector<int>{l + 1, r + 1}; } else if(nums[l] + nums[r] < target) { l++; } else { r--; } } assert(1 == -1); } }; ``` # [170. Two Sum III - Data structure design](https://leetcode.com/problems/two-sum-iii-data-structure-design/) # [653. Two Sum IV - Input is a BST](https://leetcode.com/problems/two-sum-iv-input-is-a-bst/) # [1099. Two Sum Less Than K](https://leetcode.com/problems/two-sum-less-than-k/) # [15. 3Sum](https://leetcode.com/problems/3sum/) # [16. 3Sum Closest](https://leetcode.com/problems/3sum-closest/) # [259. 3Sum Smaller](https://leetcode.com/problems/3sum-smaller/) # [18. 4Sum](https://leetcode.com/problems/4sum/) # [454. 4Sum II](https://leetcode.com/problems/4sum-ii/) # Binary search - great article Source(重新排版): [我已經不會寫二分搜](https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A376367007072759%7D&path=%2Fnotes%2Fnote%2F) 在開始之前，必須要強調一下我個人的專長是胡說八道。如果有寫錯的東西，請大家用力吐槽。另外廣告一下這是認捐給競程日記 http://blog.icpc.tw 的文章。二分搜 (Binary search) 幾乎是一個大家都會的演算法，最常見的用途是在一個有單調性的序列中，找某個東西是不是存在，譬如在 $1,2,5,7,9,10$ 這個單調遞增的數列中，搜尋一個特定的值 $x$ 是否存在。而 Binary search 的作法通常是先判斷 $x$ 是否介於第一項$(1)$與最後一項$(10)$之間，如果不是，那肯定沒有。反之，則維護可能的位置下限 $L$ 及上限 $U$，每次選取 $L$ 與 $U$ 之間的一個數字 $M$，通常是 $L$ 與 $U$ 的平均值，檢查一下第 $M$ 項與 $x$ 之間的大小關係。如果相等，那就找到了。如果 $x$ 較小，那麼答案應該在第 $L$ 項與第 $M-1$ 項之間。如果 $x$ 比較大，那麼答案應該在第 $M+1$ 項與第 $U$ 項之間。就反覆做到找到 $x$，或是藉由不相等的結果維護 $L$ 與 $U$，直到 $L \gt U$ 為止：此情況時可以宣布找不到。自從年紀大了以後，我經常沒有辦法寫出一個不會進入無窮迴圈的 Binary search，理由是每次要更新 $L$ 或 $U$ 時，更新錯邊界值了。感覺方法如此簡單，可是我常常會寫錯，我現在甚至不確定上面那樣寫對不對，有錯請大家吐槽一下。我現在使用的程式碼，大部分是 Binary search 的變形，也就是本文要敘述的方法，優點是在絕大多數的場合，只要 $a < b$ 不會判錯，那麼用在二分逼近法時，可以充分的壓榨出 double 的精確度，還有我幾乎沒有把這個方法寫到無窮迴圈過。我們先考慮一下另外一個問題：在一個單調遞增的數列中，找出最後一個 $<x$ 的數字所在的位置。如果數列的第一項就 $\geq x$ 那麼就沒答案。如果數列的最後一項還是 $<x$，那他就是答案。其他的情況，如果能查詢每一個位置，答案的形態將會是: $<x <x <x <x <x <x \geq x \geq x \geq x \geq x \geq x$。在多數情況下，我們可以使用 Binary search 找比 $x$ 略小一些些的 $x-\epsilon$，然後最後的 $U$ 便會是最後一個 $<x$ 的位置，而最後的 $L$ 便會是 $\geq x$ 的位置。當然，我們也能用這個問題的答案來解決平常 Binary search 所要處理的問題。而本文要提的方法框架則是，當數列有 $n$ 項、形態是 $<x <x ... <x <x \geq x \geq x\ ... \geq x \geq x \geq x$ 且 index 是 0-base 時，先選定一個 step size $s$， $\frac{n}{2} \leq s < n$，並設定候補答案 $cand=0$。每一個回合，先判斷 $cand+s$ 是否超過邊界，即 $cand+s < n$ 是否在邊界內。如果在邊界內，則判斷第 $cand+s$ 項與 $x$之間的關係。如果是 $<x$ 則將 $cand$ 加上 $s$。如果是 $\geq x$ 或是超界，那麼將 $s$ 減半 (整數除法)，而當 $s$ 被減到 $0$ 時，第 $cand$ 項就是最後一個 $<x$ 的。寫成 code 大概像是這樣： ```c++ cand = 0; s = n/2; while (s != 0) { if(cand+s < n && a[cand+s] < x) cand = cand+s; else s = s/2; } ``` 這個方法保證 $a[cand]$ 總是 $<x$，且終究會把 $s$ 降到 $0$。而 $s$ 降到 $0$ 之前必須要經過 $1$，而由 $1$ 降到 $0$ 的時間點，必然是 $cand+1 = n$ 或第 $cand+1$ 項 $\geq x$。由這個觀察，我們能確定結束時第 $cand$ 項是最後一個 $<x$ 的，而這個方法所需要的時間，其實也是 $O(log(n))$：頂多三次 if 判定就可以把 $$ 減半，頂多減半 $O(log(n))$ 次就能把 $s$ 歸零。行文到此，其實還沒有把真正要用的方法說出來。大家仔細觀察一下，便可以發現只要 $s$ 初始值介於 $\frac{n}{2}$ 到 $n-1$ 之間就可以得到上述的複雜度與正確性保證，而我們總是可以在 $\frac{n}{2}$ 到 $n-1$ 之中，找到一個 $2$ 的整數次方。假定 $f(n)$ 是 $\frac{n}{2}$ 到 $n-1$ 之中，最大的 $2$ 的整數次方，例如 $f(9)=8$。那我們選定 $f(n)$ 為 $s$ 的初始值，會有一些有趣的事情發生。當我們第一次判定，其實是判定 $a[f(n)]$ 是否 $<x$。如果 $a[f(n)]<x$，那麼第二回合，我們保證 $cand+s$ 會超界，會把 $s$ 設定成 $\frac{f(n)}{2}$。如果 $a[f(n)] \geq x$，我們第一回合馬上就會把 $s$ 設定成 $\frac{f(n)}{2}$，殊途同歸。接下來考慮是 $s$ 減半之後的那個回合(假定是第 $r$ 回合)，如果 `if` 判定成立，再接下來第 $r+1$ 回合，所查詢的位置，必然是第 $r-1$ 回合所查詢的位置，因為過程中所有的 $s$ 都是 $2$ 的整數次方，減半後走兩步就會踩到先前判過的！既然我們早就知道該位置是超界或是 $\geq x$，那第 $r+1$ 回合連比都不用比，直接把 $s$ 減半就好。因此我們能將程式碼改寫為： ```c++ for (cand = 0, s = f(n); s != 0; s = s/2) { if(cand+s < n && a[cand+s] < x) cand = cand+s; } ``` 此時大家再看仔細一點： $cand$ 變化的過程。這個方法在過程中，$cand$ 從 $0$ 變成最終型態的過程，就是每一回合確定一個 bit 是 $0$ 或是 $1$，從高位開始。呃，大家可能會想說這跟浮點精確度有什麼關係。讓我們回顧一下浮點數的結構，大多數場合，浮點數是二進位的科學記號表示法，比如說 $10.0$ 是二進位的 $1.01$ 乘上 $2$ 的 $3$ 次方。當我們在透過二分逼近法找答案時，假定 $2$ 的 $k$ 次方 $<$ 答案，$2$ 的 $k+1$ 次方 $\geq$ 答案，那麼如果將 $cand$ 初始化為 $2$ 的 $k$ 次方，$s$ 初始化為 $2$ 的 $k-1$ 次方，再套用上述方法的框架，每次能夠精準的決定一個 bit，而 double 只有小數點下 $52$ 位，只要做 $52$ 回合便能全面壓榨出 double 的精確度，只要每次跟答案比大小不會錯的話。 (按：自己吐槽自己一下，其實用二分搜的框架，找到正確的 $L$ 與 $U$ 之後，也是做幾次就多幾個 bit 的精確度。不過這個手法還可以用在維護 $\{1,2,...,n\}$ 的子集合並提供 $O(log(n))$ 的下列三種操作：插入、刪除、查詢第 $k$ 大的元素，以及透過 jump pointer 花 $O(log(n))$ 去查樹狀圖的 path aggregation ，這就是後話了，以後有空再寫。)