#### Calculating with mean values
For an input of $n$ distinct entries, there are at most $n!$ permutations, of which only 1 is the sorted list. If an algorithm always finishes after $k$ steps (comparisons), it can distinguish at most $2^k$ different permutations, since each comparison has only 2 possible results, each eliminating at most half of the possibilities.
Therefore, the algorithm must satisfy $2^k \ge n!$, or equivalently $k \ge \log_2(n!)$.
Since the number of comparisons is always a whole number, we can take the worst-case lower bound to be $\lceil \log_2(n!) \rceil$, then use Stirling's formula to simplify:
$$\log_2(n!) = n\log_2 n - n\log_2 e + O(\log n) \approx n\log_2 n - 1.443\,n.$$
When $n$ is big enough, we can ignore the $O(\log n)$ error term.
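As a quick sanity check of this bound (my own snippet, not part of the original notes; the choice of $n = 100$ is arbitrary):

```python
import math

def exact_lower_bound(n):
    # ceil(log2(n!)), computed as a sum of logs to avoid a huge factorial
    return math.ceil(sum(math.log2(k) for k in range(2, n + 1)))

def stirling_estimate(n):
    # n*log2(n) - n*log2(e), dropping the O(log n) error term
    return n * math.log2(n) - n * math.log2(math.e)

print(exact_lower_bound(100))              # 525: at least 525 comparisons needed
print(round(stirling_estimate(100), 1))    # ~520.1, close to the exact bound
```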
Merge Sort
Requires close to the optimal $n\log_2 n$ comparisons, but also needs an additional array of length $n$; therefore merge sort is mainly useful for external sorting.
Insertion Sort
Requires only about $n\log_2 n$ comparisons at most (when the insertion position is found by binary search). However, the average number of non-essential operations (e.g., swapping or shifting elements) is bounded by $\Theta(n^2)$, which is much higher than the $O(n\log n)$ of the other sorting algorithms we consider useful.
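To illustrate this claim, here is a small sketch of my own (the notes do not spell out the variant, so this assumes the binary-search flavour of insertion sort):

```python
import bisect

def binary_insertion_sort(a):
    # Finding each insertion point by binary search costs O(log n) comparisons
    # (about n*log2(n) in total), but inserting into a list still shifts
    # elements, so the number of moves stays Theta(n^2).
    out = []
    for x in a:
        pos = bisect.bisect_right(out, x)   # comparisons: O(log n)
        out.insert(pos, x)                  # element moves: O(n)
    return out

print(binary_insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]
```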
ITALIC-CAPITALIZED-UNDERSCORED FONTS: other methods called within methods.
Basic Heap Sort Analysis
First, we define the following procedures to perform a heap sort.
LEFT(i) The left child of element at index i.
RIGHT(i) The right child of element at index i.
HEAPIFY(A, i) Assuming that in tree A, both subtrees whose roots are LEFT(i) and RIGHT(i) already satisfy the properties of a heap, reorder so that the subtree whose root is i also satisfies the properties of a heap. The shift of i is performed at most h times, where h is the height of the subtree rooted at i; before each shift, at most 2 comparisons are required.
BUILD-HEAP(A) Reorder tree A so that its elements satisfy the properties of a heap. That is, call HEAPIFY on the $\lfloor n/2 \rfloor$-th node down to the 1st node.
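A minimal Python sketch of these procedures, assuming a max-heap stored in A[1..n] with A[0] unused (the 1-based indexing and the max-heap choice are assumptions of this sketch, not fixed by the notes):

```python
def LEFT(i):             # left child of the element at index i
    return 2 * i

def RIGHT(i):            # right child of the element at index i
    return 2 * i + 1

def HEAPIFY(A, i, n):
    # Both subtrees rooted at LEFT(i) and RIGHT(i) are assumed to already be
    # heaps; sift A[i] down until the subtree rooted at i is a heap too.
    # Each shift costs at most 2 comparisons, as stated above.
    while True:
        largest = i
        if LEFT(i) <= n and A[LEFT(i)] > A[largest]:
            largest = LEFT(i)
        if RIGHT(i) <= n and A[RIGHT(i)] > A[largest]:
            largest = RIGHT(i)
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def BUILD_HEAP(A, n):
    # Call HEAPIFY on the floor(n/2)-th node down to the 1st node.
    for i in range(n // 2, 0, -1):
        HEAPIFY(A, i, n)
```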
Then, to perform heap sort, we just need to perform the following procedure:
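The notes do not reproduce the procedure itself; a minimal sketch of the standard sorting loop, reusing HEAPIFY and BUILD_HEAP from the sketch above, might look like this:

```python
def HEAP_SORT(A, n):
    # Sorts A[1..n] ascending using a max-heap.
    BUILD_HEAP(A, n)
    for last in range(n, 1, -1):
        A[1], A[last] = A[last], A[1]   # move the current maximum to the end
        HEAPIFY(A, 1, last - 1)         # reheap the remaining prefix

data = [None, 9, 4, 7, 1, 8]            # index 0 unused
HEAP_SORT(data, 5)
print(data[1:])                         # [1, 4, 7, 8, 9]
```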
Recall that when reheaping, on every level of shifting, 2 comparisons are required to determine which element to go on with. Furthermore, since most nodes of a tree are located near the leaves, we can assume that almost all reheaping goes all the way down to the leaf level.
Thus, there would be a significant difference if we could eliminate 1 of the 2 comparisons required per level.
Special Leaf & Special Path
To achieve the above goal, we first find the final location of i (as in REHEAP(i)), and only then modify the heap.
From i, we find the Special Path by comparing the two children and choosing the one that i would be exchanged with, until we reach a leaf, which we call the Special Leaf.
Following it, we search upward from the special leaf to where i is supposed to end up. Considering the special path as b[1]...b[k]...b[h], where b[k] is the final position, the required number of comparisons is h - k + 1.
BOTTOM-UP-SEARCH(A, i, j): (if building a min-heap) while A[i] < A[j] do j = floor(j / 2)
After we have found the position, we must perform the exchange. Assuming the special path excluding the root is b[1]...b[k]...b[h] and the root is x, the procedure INTERCHANGE(A, i, j) performs actions so that: b[1] moves to the root position, each b[m] (2 ≤ m ≤ k) moves to the former position of b[m-1], and x is placed at the former position of b[k] (i.e., at index j).
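Putting the three steps together, a minimal Python sketch (max-heap counterpart of the min-heap pseudocode above, 1-based indices with A[0] unused; the function bodies are assumptions of this sketch):

```python
def LEAF_SEARCH(A, i, n):
    # Follow the special path: descend to the larger child (1 comparison
    # per level) until reaching a leaf, the special leaf.
    j = i
    while 2 * j + 1 <= n:
        j = 2 * j if A[2 * j] > A[2 * j + 1] else 2 * j + 1
    if 2 * j <= n:          # a node with a single (left) child
        j = 2 * j
    return j

def BOTTOM_UP_SEARCH(A, i, j):
    # Climb up from the special leaf j while the sifted element A[i]
    # still belongs below the current node.
    while j > i and A[i] > A[j]:
        j //= 2
    return j

def INTERCHANGE(A, i, j):
    # x = A[i] goes to position j; every node on the special path between
    # i (exclusive) and j (inclusive) moves up one level.
    x = A[i]
    path = []
    k = j
    while k > i:
        path.append(k)
        k //= 2
    for k in reversed(path):   # walking top-down, pull each node to its parent
        A[k // 2] = A[k]
    A[j] = x

def BOTTOM_UP_REHEAP(A, i, n):
    j = LEAF_SEARCH(A, i, n)
    j = BOTTOM_UP_SEARCH(A, i, j)
    INTERCHANGE(A, i, j)
```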
When building a heap, the procedure LEAF-SEARCH() is called for node $\lfloor n/2 \rfloor$ down to node 1.
Worst Case Situation
We first calculate the number of comparisons for the nodes in the fully filled layers (that is, the orange nodes), considering a full tree of height $h$. For each $k$ with $0 \le k \le h-1$, there exist $2^k$ nodes located at level $k$, causing $h-k$ comparisons each. For example, if we substitute $k = h-1$ into the formula, we get that on level $h-1$ there are $2^{h-1}$ nodes that require 1 comparison each to reach the leaf. To sum it up, we try to calculate the result of
$$C(h) = \sum_{k=0}^{h-1} 2^k\,(h-k).$$
Considering $C(h)$ as the result when the height is $h$, we can get the recursive formula
$$C(h) = 2\,C(h-1) + h, \qquad C(1) = 1.$$
By repeatedly substituting $C$ into itself, we can generate
$$C(h) = \sum_{j=1}^{h} j\,2^{h-j} = 2^h \sum_{j=1}^{h} \frac{j}{2^j}.$$
Thus, we finally get the following equation:
$$C(h) = 2^{h+1} - h - 2.$$
To represent it with $n$, we again substitute $h = \log_2(n+1) - 1$ (for a full tree, $n = 2^{h+1} - 1$), getting the number of comparisons $C = n - 1 - h = n - \log_2(n+1)$.
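A quick numeric check of the closed form above (assuming the parametrization used here, where a node at level $k$ of a full tree of height $h$ costs $h-k$ comparisons):

```python
def leaf_search_cost(h):
    # Sum of 2^k * (h - k) over the internal levels of a full tree of height h.
    return sum(2 ** k * (h - k) for k in range(h))

for h in range(1, 10):
    assert leaf_search_cost(h) == 2 ** (h + 1) - h - 2
print("closed form holds")
```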
Next, we try to calculate the last nodes (that is, the blue nodes). Every pair of blue nodes causes every node above them to cost 1 more comparison; summed over the levels, this makes the procedure require an additional number of comparisons. Since the resulting formula contains a ceiling function, we try to obtain an upper bound. First, we separate the integer part from the fractional part (which is what the ceiling function modifies). The first term is easy to calculate: after eliminating the constant term, it can be evaluated directly. For the second term, we use the property that $\lceil x \rceil < x + 1$. Adding the two terms up, and noting that the last term is small enough to be eliminated, we obtain the inequality stated in the paper.
Finally, adding the two values up (recall that for a full tree, $n = 2^{h+1} - 1$), we obtain the upper bound for the number of comparisons.
Best Case Situation
Whatever form the tree takes, every LEAF-SEARCH() runs all the way down to a leaf, so the number of comparisons remains the same as above, identical to the worst case.
However, consider the situation that $n$ is a power of two; that is, the deepest layer contains only 1 single node (only 1 blue node exists). Although it creates an additional level on the path from the root to the leaf, a single child simply lets the calculation fall through, saving 1 comparison for such a root. This happens whenever the root sits on the path to that single node, saving roughly $h$ comparisons in total.
Thus, for the best case, the number of comparisons becomes the worst-case value minus roughly $h$, which is the lower bound for the number of comparisons.
2. Number of Cmp. used by the BOTTOM-UP-SEARCH() calls during the heap creation phase.
In this part, we focus on individual special paths; that is, the nodes from the root being processed down to the special leaf. Since heap creation proceeds bottom-up, the nodes on the path are already well ordered according to the rules of a heap.
The array b[1...h] represents the nodes on the special path, excluding the root, starting from b[1], a child of the root.
b[0] represents the border of the array.
x represents the root node.
h represents the length of the special path, excluding the root.
The bottom-up search is simple: we place x at the position after b[h], then move it up until it sits just below the first node that already satisfies the heap relation with x (for a max-heap, a node $\ge x$).
Symbols for calculation
Compared with HEAPSORT, the above action requires the following numbers of comparisons:
| Method | When … | When … | When … |
| --- | --- | --- | --- |
| HEAPSORT requires 2 comparisons per level | … | … | … |
| BOTTOM-UP-SEARCH() requires 1 comparison per level | … | … | … |
Then, we define the following (for one root):
$l_{\text{BUS}}$ represents the number of comparisons used by BOTTOM-UP-SEARCH().
$l_{\text{HS}}$ represents half of the number of comparisons HEAPSORT uses.
The result of $d$ changes according to the conditions:
Calculations
When
When
When
When
When
When
The comparison number of normal HEAPSORT
TODO: Read the damn paper; the math is way over my capacity.
Since the input array is randomly chosen, we use $L_{\text{BUS}}$, $L_{\text{HS}}$ and $D$ to denote the random variables that are the sums of all $l_{\text{BUS}}$, $l_{\text{HS}}$ and $d$, respectively. (That is to say, for instance, $E(L_{\text{BUS}})$ denotes the expected value of $L_{\text{BUS}}$.)
According to the first part of this section, since the number of comparisons used by LEAF-SEARCH() is exactly the sum of the special path lengths over all root nodes, we know that:
Additionally, according to the above theory, we also know that:
By defining the number of calls that as , the number of calls that would be . Thus we can calculate that:
To obtain the value of , we consider the 2 situations: when , and when .
The first situation means that the root node is the smallest element (if it is a max-heap) within its subtree. If the subtree contains $s$ nodes, the probability of a node being in that case is $1/s$.
Consider a fully filled tree: on the 2nd lowest level, there exist about $n/4$ subtrees, each containing 3 nodes; on the 3rd lowest level there exist about $n/8$ subtrees, each containing 7 nodes. We are able to consider only a full tree, since a small error is allowed.
In general, the $k$-th lowest level contains about $n/2^k$ subtrees, each containing $2^k - 1$ nodes; thus there is a $\frac{1}{2^k - 1}$ chance that this situation happens for each of them. Summing over the levels, we obtain the expected number of cases where the first situation happens:
$$\sum_{k \ge 2} \frac{n}{2^k}\cdot\frac{1}{2^k - 1}.$$
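Numerically, the constant in front of $n$ in the sum above is small (my own check, under the same parametrization as the derivation):

```python
# sum over k >= 2 of (1 / 2^k) * (1 / (2^k - 1))
const = sum(1.0 / 2 ** k / (2 ** k - 1) for k in range(2, 60))
print(const)   # ~0.1067, i.e. roughly 0.107 * n expected cases of this kind
```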
The second situation means that after the reheap process, the root node still ends up with children; we first define the following symbols, for the old REHEAP() procedure:
$c$ represents the number of comparisons used.
$t$ represents the number of interchanges used (i.e., how many levels the element moves down).
$r$ represents the number of children the root node has after the procedure, $r \in \{0, 1, 2\}$.
Using the old REHEAP(), advancing down each level requires 2 comparisons, and some additional comparisons are needed at the final level, comparing with all of its children. Thus, the relationship between the three symbols is $c = 2t + r$.
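A small instrumented check of this relationship (my own sketch: a classic top-down sift-down on a full max-heap, where every internal node has exactly two children, so the single-child corner case discussed next does not occur):

```python
import random

def reheap_counts(A, i, n):
    # Classic sift-down on a max-heap A[1..n] with n = 2^k - 1 (full tree).
    # Returns (comparisons, interchanges, children of the final position).
    c = t = 0
    while 2 * i <= n:
        c += 2                          # pick the larger child, then compare with A[i]
        big = 2 * i if A[2 * i] > A[2 * i + 1] else 2 * i + 1
        if A[big] <= A[i]:
            return c, t, 2              # stopped at a node with 2 children
        A[i], A[big] = A[big], A[i]
        t += 1
        i = big
    return c, t, 0                      # stopped at a leaf

n = 2 ** 6 - 1
A = [None] + random.sample(range(10_000), n)
for i in range(n // 2, 0, -1):          # heap creation, bottom-up
    c, t, r = reheap_counts(A, i, n)
    assert c == 2 * t + r               # comparisons = 2 * interchanges + children
```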
To obtain the correct value, we try to rule out the boundary cases (such as when $r = 1$). Due to the property of a heap, the lowest level's nodes are placed at the smallest indexes, leaving only 1 possible position for a node to have exactly 1 child, namely the $\lfloor n/2 \rfloor$-th node. (See the image for calculating cmp. of LEAF-SEARCH() for an illustration.) Only the roots on the path above that position might place their element there, so the probability of this case is low enough to land within the error range.
Again, for the remaining cases, we use the random variables $C$, $T$ and $R$ to represent the sums of all $c$, $t$ and $r$, respectively. We then know, according to the above property of HEAPSORT, that $C = 2T + R$.
Also, we can use the standard identity for expected values, $E(X) = \sum_i E(X \mid A_i)\,P(A_i)$, where $X \mid A_i$ is the random variable $X$ under the event $A_i$ and $P(A_i)$ is the probability of $A_i$ happening. Thus, we can calculate as follows:
Therefore, we can obtain
And the expected number of such cases becomes:
Thus, combining the equations above, we can finally get the result for the expected number of comparisons used by BOTTOM-UP-SEARCH() during the heap creation phase.
3. The Number of Comparisons Used in the Heap Creation Phase
Simply adding up the results of the above 2 sections, we get
TODO: requires an explanation for the content after Lemma 4.4