Heap / Priority Queue (7)

# Heap / Priority Queue (7) > 紀錄 NeetCode-150 與 LeetCode 的解題想法、刷題過程與錯誤，以 Python 或 C++ 兩種語言為主 > > 其他 NeetCode-150 或 LeetCode 的主題請見 [Here](https://hackmd.io/-ovTFlzLRWaSgzDGhbSwsA)  ## Introduction ### Python STL Python STL 中的 heapq 模組有支援 heap 的相關運算，可使用 `import heapq` 匯入。模組中實作的都是最小堆積(minimum heap)為主，且編號也都是 0-index 系統。以下是 heapq 模組支援的相關函式： * `heapq.heappush(heap, item)` : 將 item 加到 heap 之中，且保持 heap 的性質不變 * `heapq.heappop(heap)` : 取出最小的元素並回傳。若只是要存取最小值，可直接使用 `heap[0]` * `heapq.heapify(x)` : 將串列 `x` 轉為具有 min heap 性質的串列(in-place)，原地排序所以不會回傳 * `heapq.nlargest(n, iterable)` : 回傳一個可迭代物件 iterable 中前 n 大的元素所組成的串列 * `heapq.nsmallest(n, iterable)` : 同上其他相關運算可參考[Python 官網](https://docs.python.org/3/library/heapq.html) > Heap / Priority queue 的題目很常可以使用排序來處理，如果想不到怎麼做可以先試試看排序的方式。 ## 1. Kth Largest Element in a Stream **Easy** > Design a class to find the `kth` largest integer in a stream of values, including duplicates. E.g. the `2nd` largest from [1, 2, 3, 3] is `3`. The stream is not necessarily sorted. > > Implement the following methods: > > * `constructor(int k, int[] nums)` Initializes the object given an integer `k` and the stream of integers `nums`. > * `int add(int val)` Adds the integer `val` to the stream and returns the `kth` largest integer in the stream. ### Exmple 1: ```java Input: ["KthLargest", [3, [1, 2, 3, 3]], "add", [3], "add", [5], "add", [6], "add", [7], "add", [8]] Output: [null, 3, 3, 3, 5, 6] Explanation: KthLargest kthLargest = new KthLargest(3, [1, 2, 3, 3]); kthLargest.add(3); // return 3 kthLargest.add(5); // return 3 kthLargest.add(6); // return 3 kthLargest.add(7); // return 5 kthLargest.add(8); // return 6 ``` ### Constraints * `1 <= k <= 1000` * `0 <= nums.length <= 1000` * `-1000 <= nums[i] <= 1000` * `-1000 <= val <= 1000` * There will always be at least `k` integers in the stream when you search for the `kth` integer. ### Recommended complexity * Time complexity: $O(m \cdot \log k)$ * m is the number of times `add()` is called * k represent the rank of the largest number to be tracked * Space comolexity: $O(k)$ ### Solution 以排序的方式來做，關鍵點是每呼叫 `add()` 方法一次，就會改變陣列的順，所以每呼叫一次 `add()` 就會需要做一次排序，因此排序做法的時間複雜度為 $O(m \cdot \log n)$。可以考慮使用 minimum heap 來維持前 k 的最大數，則此堆積的根節點即為所求。每新增一個元素就 push 到 minimum heap 之中，並隨時保持(檢查)長度 = k。 ```python= class KthLargest: def __init__(self, k: int, nums: List[int]): self.k = k self.minHeap = nums heapq.heapify(self.minHeap) while (len(self.minHeap) > self.k): heapq.heappop(self.minHeap) def add(self, val: int) -> int: # push into minHeap heapq.heappush(self.minHeap, val) # keep length k while (len(self.minHeap) > self.k): heapq.heappop(self.minHeap) return self.minHeap[0] ``` ## 2. Last Stone Weight **Easy** > You are given an array of integers `stones` where `stones[i]` represents the weight of the `ith` stone. > > We want to run a simulation on the stones as follows: > > * At each step we choose the **two heaviest stones**, with weight `x` and `y` and smash them togethers > * If `x == y`, both stones are destroyed > * If `x < y`, the stone of weight `x` is destroyed, and the stone of weight `y` has new weight `y - x`. > > Continue the simulation until there is no more than one stone remaining. > > Return the weight of the last remaining stone or return `0` if none remain. ### Example 1: ```java Input: stones = [2,3,6,2,4] Output: 1 ``` Explanation: We smash 6 and 4 and are left with a 2, so the array becomes [2,3,2,2]. We smash 3 and 2 and are left with a 1, so the array becomes [1,2,2]. We smash 2 and 2, so the array becomes [1]. ### Example 2: ```java Input: stones = [1,2] Output: 1 ``` ### Constraints * `1 <= stones.length <= 20` * `1 <= stones[i] <= 100` ### Recommended complexity * Time complexity: as good or better than $O(n \cdot \log n)$ * n is the size of the input array * Space complexity: $O(n)$ ### Solution 可以使用 maximum heap，但有些語言(Ex: python)所支援的 STL 只有提供 minimum heap 的實作。以 minimum heap 模擬 maximum heap 的其中一個方法是把新增的元素加負號，則最小的就會變成最大的。 ```python= class Solution: def lastStoneWeight(self, stones: List[int]) -> int: stones = [-s for s in stones] heapq.heapify(stones) while len(stones) > 1: x = heapq.heappop(stones) # most heaviest y = heapq.heappop(stones) # second heavest # only compare once since (-x) is always greater than or equal to (-y) if y > x: heapq.heappush(stones, x-y) return 0 if len(stones) == 0 else -stones[0] ``` ## 3. K Closest Point to Origin **Medium** > You are given an 2-D array `points` where `points[i] = [xi, yi]` represents the coordinates of a point on an X-Y axis plane. You are also given an integer `k`. > > Return the `k` closest points to the origin `(0, 0)`. > > The distance between two points is defined as the Euclidean distance (`sqrt((x1 - x2)^2 + (y1 - y2)^2)`). > > You may return the answer in **any order**. ### Example 1: ![image](https://imagedelivery.net/CLfkmk9Wzy8_9HRyug4EVA/ffe90895-5c8e-47f1-6719-d5c0f656d700/public) ```java Input: points = [[0,2],[2,2]], k = 1 Output: [[0,2]] ``` Explanation : The distance between `(0, 2)` and the origin `(0, 0)` is `2`. The distance between `(2, 2)` and the origin is `sqrt(2^2 + 2^2) = 2.82842`. So the closest point to the origin is `(0, 2)`. ### Example 2: ```java Input: points = [[0,2],[2,0],[2,2]], k = 2 Output: [[0,2],[2,0]] ``` Explanation: The output `[2,0],[0,2]` would also be accepted. ### Constrints * `1 <= k <= points.length <= 1000` * `-100 <= points[i][0], points[i][1] <= 100` ### Recommended complexity * Time complexity: as good or better than $O(n \cdot \log k)$ * n is the size of the input array * k is the number of point to be returned * Space compolexity: $O(k)$ ### Solution 維護一個長度為 k 的 max heap，當新增元素後長度 > k 就刪除最大元素。(不用 min heap 的原因在於要刪除最大元素比較不方便) 每次新增元素時，除了使用距離當作值以外還需要再加上點座標的資訊，才能在最後回傳時把每個點座標一起輸出。以 Python 舉例，當新增的元素是一個序列時，會自動以序列的第一個值作為比較的元素。每一次新增元素所需的複雜度為 $O(\log k)$，n 個元素的複雜度即為 $O(n \cdot \log k)$。 ```python= class Solution: def kClosest(self, points: List[List[int]], k: int) -> List[List[int]]: maxHeap = [] for point in points: x, y = point[0], point[1] dist = (x**2 + y**2)**0.5 heapq.heappush(maxHeap, (-dist, point)) if len(maxHeap) > k: heapq.heappop(maxHeap) return [point for _, point in maxHeap] ``` ## 4. Kth Largest Element in An Array **Medium** > Given an unsorted array of integers `nums` and an integer `k`, return the `kth` largest element in the array. > > By `kth` largest element, we mean the `kth` largest element in the sorted order, not the `kth` distinct element. > > Follow-up: Can you solve it without sorting? ### Example 1: ```java Input: nums = [2,3,1,5,4], k = 2 Output: 4 ``` ### Example 2: ```java Input: nums = [2,3,1,1,5,5,4], k = 3 Output: 4 ``` ### Constraints: * `1 <= k <= nums.length <= 10000` * `-1000 <= nums[i] <= 1000` ### Recommended complexity * Time complexity: as good or better than $O(n \cdot \log k)$ * n is the size of the nput array * k represents the rank of the largest number * Space complexity: $O(k)$ ### Solution 要回傳排序後第 k 大的元素，而不是第 k 個不同元素。 * 初始化一個 minHeap 並迭代輸入陣列將每個元素 push 到 minHeap 中 * 維護長度為 k 的 minHeap，只要長度 > k 就將根節點的元素取出 * 迭代完後的根節點即為所求因為輸入陣列長度為 n，且每個元素 push 到長度為 k 的 minHeap 的複雜度為 $O(\log k)$，所以總共的時間複雜度為 $O(n \cdot \log k)$。 ```python= class Solution: def findKthLargest(self, nums: List[int], k: int) -> int: minHeap = [] for i in nums: heapq.heappush(minHeap, i) if len(minHeap) > k: heapq.heappop(minHeap) return minHeap[0] ``` ## 5. Task Scheduler **Medium** > You are given an array of CPU tasks `tasks`, where `tasks[i]` is an uppercase english character from `A` to `Z`. You are also given an integer `n`. > > Each CPU cycle allows the completion of a single task, and tasks may be completed in any order. > > The only constraint is that identical tasks must be separated by at least `n` CPU cycles, to cooldown the CPU. > > Return the minimum number of CPU cycles required to complete all tasks. ### Example 1: ```java Input: tasks = ["X","X","Y","Y"], n = 2 Output: 5 ``` Explanation: A possible sequence is: X -> Y -> idle -> X -> Y. ### Example 2: ```java Input: tasks = ["A","A","A","B","C"], n = 3 Output: 9 ``` Explanation: A possible sequence is: A -> B -> C -> Idle -> A -> Idle -> Idle -> Idle -> A. ### Constraints * `1 <= tasks.length <= 1000` * `0 <= n <= 100` ### Recommended complexity * Time complexity: $O(m)$ * m is the size of the imput array * Space complexity: $O(1)$ ### Solution (1) 為了要讓 CPU cycle 次數最少，所以出現次數最多的任務要優先開始才不會有過多的 idle。因此要先計算每個任務的出現頻率。 * 以一個 hash map 或長度為 26 的 array 來儲存各任務的出現頻率 (2) 計算完出現頻率後，需要選擇次數最多的任務開始優先執行，並且在完成後需要將其放置在另外一個資料結構中等待冷卻後的下一次執行。 * 使用 maxHeap 儲存目前能夠執行(冷卻完)的任務，可再 $O(1)$ 的時間內找到頻率最多的開始執行 * 使用 queue 儲存執行完任務並等待下一次的執行(冷卻後) (3) 有了上述兩種儲存的結構後可以開始排程，並使用一個變數 `time` 來追蹤目前經過的時間 * 每一次的迭代中從 maxHeap 取出出現頻率最高的執行 * 執行完成後將出現頻率 - 1 * 如果該任務需要再繼續執行，則搭配他們下次可執行時間一起放入 queue 中排程(`(task, nextAvailableTime)`) * 如果該任務不需要再執行，捨棄 * 當 maxHeap 為空時，表示現在沒有可執行的任務(尚未冷卻完)，此時更新時間 `time` 至 queue 中下一個任務的可執行時間 * 迭代過程中還需要隨時檢查 queue 中的任務是否可執行，如果可執行就 push 到 minHeap 中等待執行 * 當 maxHeap 與 queue 都空時，表示任務都完成，結束 :::info 要以 minHeap 模擬一個動態更新的 maxHeap 時，可將優先性以負號表示(優先性越大，負數越小，minHeap 結構中越上層)。當優先度更新(變小)時等價於 minHeap 中的優先度變大，如下： * maxHeap 中的優先度: 3 $\rightarrow$ 2 $\rightarrow$ 1 $\rightarrow$ 0 * minHeap 中的優先度: -3 $\rightarrow$ -2 $\rightarrow$ -1 $\rightarrow$ 0 上述兩者是等價的，這種方式可以避免 minHeap 取出時忘記加上負號的動作。 ::: #### 1. 未優化，此版本 minHeap 取出後優先度的更新可能會忘記加負號 ```python= class Solution: def leastInterval(self, tasks: List[str], n: int) -> int: # compute frequency: O(m) taskCounting = {} for task in tasks: taskCounting[task] = taskCounting.get(task, 0) + 1 # initialize max heap: O(26 * log26) maxHeap = [] for task in taskCounting: heapq.heappush(maxHeap, (-taskCounting[task], task)) time = 0 scheduler = collections.deque() while maxHeap or scheduler: time += 1 if maxHeap: _, task = heapq.heappop(maxHeap) taskCounting[task] -= 1 if taskCounting[task] > 0: # still valid scheduler.append((task, time+n)) if scheduler and scheduler[0][1] == time: task, _ = scheduler.popleft() heapq.heappush(maxHeap, (-taskCounting[task], task)) # else: # task, time = scheduler.popleft() # heapq.heappush(maxHeap, (-taskCounting[task], task)) return time ``` #### 2. 優化版本 ```python= class Solution: def leastInterval(self, tasks: List[str], n: int) -> int: # computing frequency taskCount = {} for task in tasks: taskCount[task] = taskCount.get(task, 0) - 1 # push to maxHeap and construct a queue maxHeap = [] for count in taskCount.values(): heapq.heappush(maxHeap, count) scheduler = collections.deque() # scheduling timeStamp = 0 while maxHeap or scheduler: timeStamp += 1 if maxHeap: count = heapq.heappop(maxHeap) + 1 if count < 0: # still avaiable scheduler.append((count, timeStamp + n)) # next available time else: timeStamp = scheduler[0][1] if scheduler and scheduler[0][1] == timeStamp: heapq.heappush(maxHeap, scheduler.popleft()[0]) return timeStamp ``` ## 6. Design Twitter **Medium** > Implement a simplified version of Twitter which allows users to post tweets, follow/unfollow each other, and view the 10 most recent tweets within their own news feed. > > Users and tweets are uniquely identified by their IDs (integers). > > Implement the following methods: > > * `Twitter()` Initializes the twitter object. > * `void postTweet(int userId, int tweetId)` Publish a new tweet with ID `tweetId` by the user `userId`. You may assume that each `tweetId` is unique. > * `List<Integer> getNewsFeed(int userId)` Fetches at most the `10` most recent tweet IDs in the user's news feed. Each item must be posted by users who the user is following or by the user themself. Tweets IDs should be **ordered from most recent to least recent.** > * `void follow(int followerId, int followeeId)` The user with ID `followerId` follows the user with ID `followeeId`. > * `void unfollow(int followerId, int followeeId)` The user with ID `followerId` unfollows the user with ID `followeeId`. ### Example 1: ```java Input: ["Twitter", "postTweet", [1, 10], "postTweet", [2, 20], "getNewsFeed", [1], "getNewsFeed", [2], "follow", [1, 2], "getNewsFeed", [1], "getNewsFeed", [2], "unfollow", [1, 2], "getNewsFeed", [1]] Output: [null, null, null, [10], [20], null, [20, 10], [20], null, [10]] Explanation: Twitter twitter = new Twitter(); twitter.postTweet(1, 10); // User 1 posts a new tweet with id = 10. twitter.postTweet(2, 20); // User 2 posts a new tweet with id = 20. twitter.getNewsFeed(1); // User 1's news feed should only contain their own tweets -> [10]. twitter.getNewsFeed(2); // User 2's news feed should only contain their own tweets -> [20]. twitter.follow(1, 2); // User 1 follows user 2. twitter.getNewsFeed(1); // User 1's news feed should contain both tweets from user 1 and user 2 -> [20, 10]. twitter.getNewsFeed(2); // User 2's news feed should still only contain their own tweets -> [20]. twitter.unfollow(1, 2); // User 1 follows user 2. twitter.getNewsFeed(1); // User 1's news feed should only contain their own tweets -> [10]. ``` ### Constrains * `1 <= userId, followerId, followeeId <= 100` * `0 <= tweetId <= 1000` ### Recommendd complexity * Time space: $O(n)$ for `getNewsFeed()` and $O(1)$ for remaining ### Soluion (1) 每個 user 有自己的追隨對象(follow)與推文(tweets)，此外 tweets 有時間上的優先順序 * 使用兩個 hashmap 儲存 user $\leftrightarrow$ follow 與 user $\leftrightarrow$ tweets 兩種對應關係 * user $\leftrightarrow$ follow 的關係中，follow 不會重複，所以可以使用 hash set 來紀錄，並可在 $O(1)$ 時間內新增與刪除 * user $\leftrightarrow$ tweets 的關係中要紀錄推文內容以及時間順序 * 用一個時間戳記的變數 `timeStamp` 紀錄每則推文的優先順序 * 要用 minHeap 模擬 maxHeap，所以 `timeStamp` 同樣要以負號的方式呈現與遞減上述的兩個 hashmap 可以在 $O(1)$ 的時間內完成除 `getNewsFeed()` 外的方法。 (2) `getNewsFeed()` 的處理暴力解是把 user 和追蹤名單內的所有 tweet 整理成一個 array 並做排序。但其實每個 user 的 tweets 已經做了基本的排序了(依照 timeStamp)，我們要做的事情類似把多個排序好的陣列從中挑出前 10 個優先度最高的。 * 把 user 自己也加入追蹤者名單中 * 把所有追蹤者的**最新**推文加到 maxHeap 之中 * 從 maxHeap 中不斷選出推文，直到選出的推文數 = 10 * 取出一則推文後順勢再加入該追隨者的前一則推文到 maxHeap 中(如果還有推文的話) * 因此 push 到 maxHeap 中的推文需要保留該則推文的 index 資訊才能再次找到前一則推文 ```python= class Twitter: def __init__(self): self.timeStamp = 0 self.followMap = defaultdict(set) # userID:(followeeID, ...) self.tweetMap = defaultdict(list) # userID:[(timeStamp, tweetID), ...] def postTweet(self, userId: int, tweetId: int) -> None: self.tweetMap[userId].append((self.timeStamp, tweetId)) self.timeStamp -= 1 def getNewsFeed(self, userId: int) -> List[int]: res = [] maxHeap = [] # Add the user to their own list self.followMap[userId].add(userId) # push most recent tweet from followID to maxHeap for followeeId in self.followMap[userId]: if followeeId in self.tweetMap: index = len(self.tweetMap[followeeId]) - 1 # index of the most recent tweet timeStamp, tweetId = self.tweetMap[followeeId][index] heapq.heappush(maxHeap, [timeStamp, tweetId, followeeId, index - 1]) # previous tweet index # select tweet from maxHeap while maxHeap and len(res) < 10: _, tweetId, followeeId, index = heapq.heappop(maxHeap) res.append(tweetId) if index >= 0: # still have tweet timeStamp, tweetId = self.tweetMap[followeeId][index] heapq.heappush(maxHeap, [timeStamp, tweetId, followeeId, index - 1]) return res def follow(self, followerId: int, followeeId: int) -> None: self.followMap[followerId].add(followeeId) def unfollow(self, followerId: int, followeeId: int) -> None: if followeeId in self.followMap[followerId]: self.followMap[followerId].remove(followeeId) ``` ## 7. Find Median From Data Stream **Hard** > The median is the middle value in a sorted list of integers. For lists of even length, there is no middle value, so the median is the mean of the two middle values. > > For example: > > * For `arr = [1,2,3]`, the median is `2`. > *　For `arr = [1,2]`, the median is `(1 + 2) / 2 = 1.5` > > Implement the MedianFinder class: > > * `MedianFinder()` initializes the MedianFinder object. > * `void addNum(int num)` adds the integer `num` from the data stream to the data structure. < * `double findMedian()` returns the median of all elements so far. ### Example 1: ```java Input: ["MedianFinder", "addNum", "1", "findMedian", "addNum", "3" "findMedian", "addNum", "2", "findMedian"] Output: [null, null, 1.0, null, 2.0, null, 2.0] Explanation: MedianFinder medianFinder = new MedianFinder(); medianFinder.addNum(1); // arr = [1] medianFinder.findMedian(); // return 1.0 medianFinder.addNum(3); // arr = [1, 3] medianFinder.findMedian(); // return 2.0 medianFinder.addNum(2); // arr[1, 2, 3] medianFinder.findMedian(); // return 2.0 ``` ### Constraints * `-100,000 <= num <= 100,000` * `findMedian` will only be called after adding at least one integer to the data structure. ### Recommended complexity * Time complexity: $O(\log n)$ for `addNum()` and $O(1)$ for `findMedian()` * Space complexity: $O(n)$ ### Solution (1) 要找中位數，但每次呼叫 `addNum()` 新增元素時又不能使用排序(否則效能會是$O(n \cdot \log n)$) * 考慮到限制是 $O(\log n)$ 的時間，所以新增元素時應該是 push 到一個 minHeap/maxHeap 之中 (2) 中位數是排序後在中間的數字，但 minHeap/maxHeap 無法直接找到中間位置的元素，且也不具備排序的功能 * 把整個 array 平均分成兩部分 * 前半部份(left)維持一個 maxHeap 用來找到最大值 M * 後半部份(right)維持一個 minHeap 用來找到最小值 m * M 與 m 是位在中間位置的元素，可用來計算中位數 (3) 前後部份的長度需保持一樣，或是兩者長度差 1 * 預設新增元素都放入前半部分 * 調整兩者的大小 * 若第一部分最大值 M > 第二部分最小值 m，則 push M 到 minHeap * 元素新增後需檢查兩個 minHeap/maxHeap 的長度 :::info 前後兩部分的長度不一時，也可以預設前段較長 ::: ```python= class MedianFinder: def __init__(self): self.first = [] # max heap self.second = [] # min heap def addNum(self, num: int) -> None: # push to first heapq.heappush(self.first, -num) # check M and m if self.first and self.second and (-self.first[0]) > self.second[0]: M = heapq.heappop(self.first) heapq.heappush(self.second, -M) # check their length if len(self.first) > len(self.second) + 1: M = heapq.heappop(self.first) heapq.heappush(self.second, -M) if len(self.second) > len(self.first) + 1: m = heapq.heappop(self.second) heapq.heappush(self.first, -m) def findMedian(self) -> float: if len(self.first) > len(self.second): return float(-self.first[0]) if len(self.first) < len(self.second): return float(self.second[0]) return (-self.first[0] + self.second[0]) / 2 ```