Heap - HackMD

# Heap ###### tags: `Data Structure & Algorithm` `面試準備` ## Min Heap Min heap 是以 heap 的形式去呈現 binary tree ，而 tree 保證這個性質: > 為 complete binary tree ，所有的 children 都比 ancestor 大/小因此我們可以推得， root of tree 一定是整個樹的最大值/最小值 ![](https://i.imgur.com/3nlOmQA.jpg) - 從上圖可以發現一個好用的性質，取代了原本 binary tree 每個 node 所需記錄的 pointer ``` leftChildren = heap[nowIdx*2+1] rightChildren = heap[nowIdx*2+2] ancestor = heap[(nowIdx-1)/2] ``` | function | Description | Time Complexity | | -------- | ---------------------------------------- | --------------- | | heapify | turn array into heap | $O(N)$ | | siftUp | swap the node up with smaller ancestor | $O(logN)$ | | siftDown | swap the node down with bigger ancestor | $O(logN)$ | | insert | insert new node | $O(logN)$ | | delete | delete specific node | $O(logN)$ | | peek | return min/max of heap | $O(1)$ | | pop | pop min/max of heap and balance the heap | $O(logN)$ | - 我們討論其中幾個 function ，其餘皆是相同概念類推 ### siftDown ```cpp void siftDown(int idx, int len){ while(idx<len){ //check the nextIdx to compare with curIdx int leftIdx = idx*2+1; int rightIdx = leftIdx+1; if(rightIdx<len && heap[right]<leftIdx) leftIdx = rightIdx; //swap if childern is smaller if(leftIdx<len && heap[idx]>heap[leftIdx]){ swap(heap[idx], leftIdx); idx = leftIdx; } } return; } ``` ### insert - 插入新的 node 時為了保持 complete tree ，我們直接把他 push_back 到 heap ,之後再用 siftUp 將其調整至適當位置 ### pop - 我的作法是 `swap(heap[0],heap[len])` 後透過 siftDown 去 maintain heap 的性質 ```cpp int pop(){ if(heap.size()==0) return -1; swap(heap[0], heap[heap.size()-1]); int re = heap.pop_back(); int curIdx = 0; siftDown(0); return re; } ``` ## Time Complexity 大部分的 function 時間複雜度都很直覺，比較特別的是 **heapify** 的部分，乍看之下會覺得時間複雜度應該是 $O(NlogN)$ ，但其實是 $O(N)$ ，我們試著來分析: ![](https://i.imgur.com/hHactcU.jpg) 當我們做 siftUp 時，我們觀察紅色的點，最多會做2次，黃色的點最多會做4次，藍色的點最多會做8次， general 來看，總數為 $N$ ，共有 $\frac{N}{2^k}$ 個點會做 $k$ 次 operation ，故總合為: $$ \#operation=\displaystyle\sum^{depth}_{i=1} {\frac{N}{2^k}*k} $$ $$ s={\frac{1}{2^1}}+{\frac{2}{2^2}}+{\frac{3}{2^3}}+\cdots+{\frac{d}{2^d}} $$ $$ {\frac{s}{2}}={\frac{1}{2^2}}+{\frac{2}{2^3}}+{\frac{3}{2^4}}+\cdots+{\frac{d}{2^{d+1}}} $$ $$ s-{\frac{s}{2}}={\frac{1}{2^1}}+{\frac{1}{2^2}}+{\frac{1}{2^3}}+\cdots+{\frac{1}{2^d}}-{\frac{d}{2^{d+1}}}={\frac{\frac{1}{2}*(1-({\frac{1}{2}})^{d})}{1-\frac{1}{2}}} = 1-({\frac{1}{2}})^{d} $$ $$ s\leq2,\ \#operation = O(N) $$ ## Compare with QuickSort - 一般來說 QuickSort 的平均表現會優於 HeapSort ，而 HeapSort 的優勢在於 worst case 也只有 $O(NlogN)$ ，而 QuickSort 會到 $O(N^2)$ 。 - 根據參考資料的說法，如果使用情形允許偶發的 overtime，那 QuickSort 平均來說的體驗會比較好，但如果有時間限制，比方說每個任務一定要在某個時限內完成， HeapSort 會是更好、更保險的選擇。參考資料: [Comparing Quick and Heap Sorts](https://web.archive.org/web/20130801175915/http://www.cs.auckland.ac.nz/~jmor159/PLDS210/qsort3.html)