
Efficient Sorting

In Chapter 4 we proved that any comparison-based sorting algorithm must make $\Omega(n \log n)$ comparisons in the worst case. The three elementary algorithms we studied — bubble sort, selection sort, and insertion sort — fall short of this bound, requiring $\Theta(n^2)$ time. In this chapter we meet three algorithms that close the gap: merge sort, quicksort, and heapsort. All three achieve $O(n \log n)$ time and are, in different senses, asymptotically optimal. They use the divide-and-conquer strategy from Chapter 3, but apply it in very different ways — merge sort divides trivially and combines carefully, quicksort divides carefully and combines trivially, and heapsort uses a heap data structure to repeatedly extract the maximum. We also study randomized quicksort, which uses random pivot selection to guarantee $O(n \log n)$ expected performance on every input.

Merge sort

Merge sort is the most straightforward application of divide-and-conquer to sorting. The idea is simple: split the array in half, recursively sort each half, and then merge the two sorted halves into a single sorted array.

The algorithm

The recursive (top-down) formulation of merge sort is:

  1. If the array has zero or one elements, it is already sorted. Return.
  2. Divide the array into two halves of roughly equal size.
  3. Recursively sort each half.
  4. Merge the two sorted halves into a single sorted array using an efficient merge procedure.

Notice that the divide step (step 2) does no real work — it simply computes a midpoint. The recursive sort (step 3) keeps splitting until it reaches single-element subarrays, which are trivially sorted. All the real work happens in the merge step (step 4). Since the recursive splitting always ends the same way — individual elements — we can skip it entirely and work bottom-up:

  1. Start with runs of length 1 (each individual element is a trivially sorted run).
  2. Set the run length $w = 1$.
  3. While $w < n$:
    • Merge each adjacent pair of sorted runs of length $w$ into a sorted run of length $2w$ using an efficient merge procedure.
    • Double the run length: $w \leftarrow 2w$.

This bottom-up formulation performs exactly the same merges as the recursive version but avoids the recursion stack. It is the version we will implement.

The key insight shared by both formulations is that merging two sorted arrays of total length $n$ takes $O(n)$ time: we scan both arrays from left to right, always taking the smaller of the two current elements.

The merge procedure

The merge step is the heart of the algorithm. Given an array arr and indices start, middle, and end, we merge the sorted subarrays arr[start..middle) and arr[middle..end) into a single sorted subarray arr[start..end).

export function merge<T>(
  arr: T[],
  start: number,
  middle: number,
  end: number,
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): void {
  const sorted: T[] = [];
  let i = start;
  let j = middle;

  while (i < middle && j < end) {
    if (comparator(arr[i]!, arr[j]!) <= 0) {
      sorted.push(arr[i]!);
      i++;
    } else {
      sorted.push(arr[j]!);
      j++;
    }
  }
  while (i < middle) {
    sorted.push(arr[i]!);
    i++;
  }
  while (j < end) {
    sorted.push(arr[j]!);
    j++;
  }

  i = start;
  while (i < end) {
    arr[i] = sorted[i - start]!;
    i++;
  }
}

The comparison <= 0 (rather than < 0) ensures stability: when two elements are equal, the one from the left subarray comes first, preserving original order.
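
As a quick sanity check, here is a minimal usage sketch (assuming merge is in scope as defined above); it mirrors Example 1 of the trace that follows.

// Usage sketch, assuming `merge` from above is in scope.
// [2, 8, 4, 5] consists of two sorted runs: [2, 8] and [4, 5].
const values = [2, 8, 4, 5];

// Merge arr[0..2) with arr[2..4) in place.
merge(values, 0, 2, 4);

console.log(values); // [2, 4, 5, 8]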

Tracing the merge procedure

To understand how merge works step by step, let us trace through two small examples.

Example 1: merge the sorted subarrays [2, 8] and [4, 5].

We initialize two pointers: i at the start of the left subarray and j at the start of the right subarray. At each step, we compare the elements at i and j, take the smaller one into the auxiliary array sorted, and advance the corresponding pointer. When one subarray is exhausted, we append the remainder of the other.

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|--------|--------|------------|--------|--------|
| 1 | 2 | 4 | 2 ≤ 4? Yes | Take 2 from left, i++ | [2] |
| 2 | 8 | 4 | 8 ≤ 4? No | Take 4 from right, j++ | [2, 4] |
| 3 | 8 | 5 | 8 ≤ 5? No | Take 5 from right, j++ | [2, 4, 5] |
| 4 | 8 | — | Right exhausted | Append remaining from left | [2, 4, 5, 8] |

The right subarray is exhausted after step 3 (both of its elements have been taken). The remaining element from the left subarray (8) is appended. The auxiliary array sorted = [2, 4, 5, 8] is then copied back into the corresponding positions of the original array.

Example 2: merge the sorted subarrays [1, 3, 6] and [2].

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|--------|--------|------------|--------|--------|
| 1 | 1 | 2 | 1 ≤ 2? Yes | Take 1 from left, i++ | [1] |
| 2 | 3 | 2 | 3 ≤ 2? No | Take 2 from right, j++ | [1, 2] |
| 3 | 3 | — | Right exhausted | Append remaining from left | [1, 2, 3, 6] |

The right subarray has only one element. After it is taken in step 2, the right subarray is exhausted and we append the remaining elements from the left subarray (3 and 6) in order. The merge procedure handles subarrays of unequal length naturally — the two "cleanup" while loops in the code append whichever subarray still has remaining elements.

Tracing through an example

Let us sort [38, 27, 43, 3, 9, 82, 10] using the bottom-up approach.

The divide phase is implicit. Had we used the recursive (top-down) formulation, the algorithm would begin by splitting the array in half through recursive calls, producing the following tree of subproblems:

                [38, 27, 43, 3, 9, 82, 10]
               /                          \
        [38, 27, 43, 3]             [9, 82, 10]
        /            \               /        \
    [38, 27]      [43, 3]       [9, 82]     [10]
    /     \       /     \       /     \
  [38]   [27]  [43]    [3]   [9]   [82]

As discussed above, this divide phase performs no useful work — it merely determines which subarrays to merge. Our bottom-up implementation skips it entirely, starting from single-element runs and doubling the run length each iteration.

Iteration 1 (step = 2): merge pairs of 1-element runs into sorted 2-element runs.

Merge of [38] and [27] — merge(arr, 0, 1, 2):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|--------|--------|------------|--------|--------|
| 1 | 38 | 27 | 38 ≤ 27? No | Take 27 from right, j++ | [27] |
| 2 | 38 | — | Right exhausted | Append remaining from left | [27, 38] |

Merge of [43] and [3] — merge(arr, 2, 3, 4):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|--------|--------|------------|--------|--------|
| 1 | 43 | 3 | 43 ≤ 3? No | Take 3 from right, j++ | [3] |
| 2 | 43 | — | Right exhausted | Append remaining from left | [3, 43] |

Merge of [9] and [82] — merge(arr, 4, 5, 6):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|--------|--------|------------|--------|--------|
| 1 | 9 | 82 | 9 ≤ 82? Yes | Take 9 from left, i++ | [9] |
| 2 | — | 82 | Left exhausted | Append remaining from right | [9, 82] |

The element 10 at index 6 has no partner to merge with (the array has odd length), so it remains as a 1-element run.

Array after iteration 1: [27, 38, 3, 43, 9, 82, 10].

Iteration 2 (step = 4): merge pairs of 2-element runs into sorted 4-element runs.

Merge of [27, 38] and [3, 43] — merge(arr, 0, 2, 4):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|--------|--------|------------|--------|--------|
| 1 | 27 | 3 | 27 ≤ 3? No | Take 3 from right, j++ | [3] |
| 2 | 27 | 43 | 27 ≤ 43? Yes | Take 27 from left, i++ | [3, 27] |
| 3 | 38 | 43 | 38 ≤ 43? Yes | Take 38 from left, i++ | [3, 27, 38] |
| 4 | — | 43 | Left exhausted | Append remaining from right | [3, 27, 38, 43] |

Merge of [9, 82] and [10] — merge(arr, 4, 6, 7):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|--------|--------|------------|--------|--------|
| 1 | 9 | 10 | 9 ≤ 10? Yes | Take 9 from left, i++ | [9] |
| 2 | 82 | 10 | 82 ≤ 10? No | Take 10 from right, j++ | [9, 10] |
| 3 | 82 | — | Right exhausted | Append remaining from left | [9, 10, 82] |

Array after iteration 2: [3, 27, 38, 43, 9, 10, 82].

Iteration 3 (step = 8): merge the two remaining runs into a single sorted array.

Merge of [3, 27, 38, 43] and [9, 10, 82] — merge(arr, 0, 4, 7):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|--------|--------|------------|--------|--------|
| 1 | 3 | 9 | 3 ≤ 9? Yes | Take 3 from left, i++ | [3] |
| 2 | 27 | 9 | 27 ≤ 9? No | Take 9 from right, j++ | [3, 9] |
| 3 | 27 | 10 | 27 ≤ 10? No | Take 10 from right, j++ | [3, 9, 10] |
| 4 | 27 | 82 | 27 ≤ 82? Yes | Take 27 from left, i++ | [3, 9, 10, 27] |
| 5 | 38 | 82 | 38 ≤ 82? Yes | Take 38 from left, i++ | [3, 9, 10, 27, 38] |
| 6 | 43 | 82 | 43 ≤ 82? Yes | Take 43 from left, i++ | [3, 9, 10, 27, 38, 43] |
| 7 | — | 82 | Left exhausted | Append remaining from right | [3, 9, 10, 27, 38, 43, 82] |

Array after iteration 3: [3, 9, 10, 27, 38, 43, 82].

Result: [3, 9, 10, 27, 38, 43, 82].

Notice how the bottom-up approach performs exactly the same merges that the recursive version would, but in a simple iterative pattern: each iteration doubles the run length, and the algorithm terminates after $\lceil \log_2 n \rceil$ iterations. The total number of element comparisons across all merges is 14 — far fewer than the 21 comparisons an elementary algorithm such as selection sort would make on a 7-element input. The difference is modest here, but it grows rapidly with input size: merge sort makes $O(n \log n)$ comparisons versus $\Theta(n^2)$, so for large $n$ the savings are enormous.

Bottom-up implementation

Here is the bottom-up formulation described above, which performs the same merges as the recursive version but avoids the recursion stack.

export function mergeSort<T>(
  elements: T[],
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
  let step = 1;

  while (step < elements.length) {
    step = step * 2;
    for (let start = 0; start < elements.length; start = start + step) {
      const middle = Math.min(start + step / 2, elements.length);
      const end = Math.min(start + step, elements.length);

      merge(elements, start, middle, end, comparator);
    }
  }
  return elements;
}

The bottom-up approach has the same time complexity as the recursive version but avoids the recursion stack overhead.
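
To illustrate how the comparator parameter generalizes the sort beyond numbers, here is a brief hypothetical usage sketch (the word list and the length-then-alphabetical comparator are made up for illustration; mergeSort is assumed to be in scope as exported above).

// Sort words by length, breaking ties alphabetically. Because merge sort is
// stable, elements that compare equal keep their original relative order.
const words = ["pear", "fig", "apple", "kiwi", "plum"];

const byLengthThenAlpha = (a: string, b: string): number =>
  a.length - b.length || a.localeCompare(b);

console.log(mergeSort(words, byLengthThenAlpha));
// ["fig", "kiwi", "pear", "plum", "apple"]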

Correctness

Claim. The merge procedure merges two sorted subarrays into a single sorted subarray.

At each step of the main loop, we choose the smaller of the two current front elements. Since both subarrays are sorted, the current front element of each is the smallest remaining element in that subarray. Therefore, the smaller of the two fronts is the smallest remaining element overall. After the main loop, one subarray is exhausted. Every remaining element in the other subarray is greater than or equal to the last element placed into the merged result (otherwise it would have been chosen earlier), and these remaining elements are already sorted among themselves, so appending them preserves the sorted order. The result is a sorted permutation of all elements from both subarrays. The <= 0 comparison ensures that equal elements from the left subarray come first, preserving stability.

Claim. Merge sort correctly sorts the array.

We argue by induction on the run length.

Base case. In the first iteration (step = 2), each merge operates on runs of length 1, which are trivially sorted. By the merge claim above, each merge produces a sorted run of length 2.

Inductive step. Assume that after iteration $k$ every run has length $2^k$ (except possibly the last, which may be shorter) and is sorted. In iteration $k + 1$, the merge procedure combines each adjacent pair of sorted runs of length $2^k$ into a run of length $2^{k+1}$, which is sorted by the merge claim.

After $\lceil \log_2 n \rceil$ iterations, the entire array is a single sorted run.

Complexity analysis

Time. At each level of the merge tree, the total work across all merges is $O(n)$ (each element is compared and copied once). The number of levels is $O(\log n)$. Therefore:

$$T(n) = O(n) \cdot O(\log n) = O(n \log n).$$

This holds in the best case, worst case, and average case — merge sort is not adaptive to the input's presortedness.

The same result follows from the recurrence for the recursive version:

$$T(n) = 2\,T(n/2) + O(n).$$

By the Master Theorem (case 2, with $a = 2$, $b = 2$, $f(n) = \Theta(n) = \Theta(n^{\log_b a})$), we get $T(n) = \Theta(n \log n)$.

Exact worst-case comparison count. While $O(n \log n)$ captures the growth rate, we can pin down the exact number of comparisons. The key observation is that merging two sorted arrays of sizes $m$ and $k$ requires at most $m + k - 1$ comparisons in the worst case — when the elements are fully interleaved, so we must exhaust both arrays before the merge is complete. (In the best case, all elements of one array are smaller than all elements of the other, requiring only $\min(m, k)$ comparisons.)

For $n = 2^k$, every split is perfectly even and the recursion tree has $k = \log_2 n$ levels of merging. At level $\ell$ (counting from the bottom, $\ell = 1, \dots, k$), there are $n / 2^\ell$ merges, each combining two arrays of size $2^{\ell - 1}$ with at most $2^\ell - 1$ comparisons:

$$\text{comparisons at level } \ell \;=\; \frac{n}{2^\ell}\left(2^\ell - 1\right) \;=\; n - \frac{n}{2^\ell}.$$

Summing over all levels gives the worst-case total, which we denote $W(n)$:

$$W(n) \;=\; \sum_{\ell=1}^{k}\left(n - \frac{n}{2^\ell}\right) \;=\; nk - n\left(1 - \frac{1}{2^k}\right) \;=\; nk - n + 1.$$

Since $k = \log_2 n$, this gives $W(n) = n\log_2 n - n + 1$. For general $n$, the exact worst-case count satisfies the recurrence $W(n) = W(\lceil n/2 \rceil) + W(\lfloor n/2 \rfloor) + n - 1$ with $W(1) = 0$. A straightforward strong induction shows that the solution is:

$$W(n) \;=\; n\lceil \log_2 n \rceil - 2^{\lceil \log_2 n \rceil} + 1.$$

The leading term is $n \log_2 n$ with coefficient exactly 1 — not a hidden Big-O constant, but a precise value. We will use this fact in the quicksort section to make an exact comparison between quicksort's average-case and merge sort's worst-case comparison counts.

For completeness: the best-case comparison count (when every merge encounters one subarray entirely smaller than the other, requiring only $\min(m, k)$ comparisons) is roughly half the worst case. For $n = 2^k$, each of the $\log_2 n$ levels contributes $n/2$ comparisons, giving $\frac{n}{2}\log_2 n$. Both best and worst case are $\Theta(n \log n)$, but the leading coefficients differ — $\tfrac{1}{2}$ versus $1$.
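
The closed form can be checked numerically against the recurrence. The sketch below uses hypothetical helper names written only to verify the formula above.

// Worst-case merge sort comparison count: recurrence vs. closed form.
function worstCaseRecurrence(n: number): number {
  if (n <= 1) return 0;
  const half = Math.floor(n / 2);
  // W(n) = W(ceil(n/2)) + W(floor(n/2)) + n - 1
  return worstCaseRecurrence(n - half) + worstCaseRecurrence(half) + n - 1;
}

function worstCaseClosedForm(n: number): number {
  if (n <= 1) return 0;
  let k = 0;
  while (2 ** k < n) k++; // k = ceil(log2 n), computed without floating point
  return n * k - 2 ** k + 1;
}

for (const n of [1, 2, 7, 16, 100]) {
  console.log(n, worstCaseRecurrence(n), worstCaseClosedForm(n));
}
// The two columns agree for every n; for n = 7 both give the 14 comparisons
// counted in the trace above.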

Space. The merge procedure uses an auxiliary array of size up to $n$ to hold merged elements during each merge. The bottom-up version uses no recursion stack; the recursive version would add $O(\log n)$ stack frames. The total auxiliary space is $O(n)$.

Properties

| Property | Merge sort |
|----------|------------|
| Worst-case time | $O(n \log n)$ |
| Best-case time | $O(n \log n)$ |
| Average-case time | $O(n \log n)$ |
| Space | $O(n)$ auxiliary |
| Stable | Yes |
| Adaptive | No |

Quicksort

Quicksort, invented by Tony Hoare in 1959, takes the opposite approach from merge sort. Where merge sort divides trivially (split in half) and combines carefully (merge), quicksort divides carefully (partition) and combines trivially (the subarrays are already in the right place).

The idea: choose a pivot element, rearrange the array so that all elements less than the pivot come before it and all elements greater come after it, then recursively sort the two partitions.

The algorithm

The recursive formulation of quicksort is:

  1. If the array has zero or one elements, it is already sorted. Return.
  2. Choose a pivot element from the array.
  3. Partition the array: rearrange elements so that all elements less than the pivot come before it and all elements greater come after it. The pivot is now in its correct final position.
  4. Recursively sort the subarray of elements before the pivot.
  5. Recursively sort the subarray of elements after the pivot.

Notice that the combine step is trivial — there is nothing to do after the recursive calls, because the partitioning has already placed every element on the correct side of the pivot. All the real work happens in the partition step (step 3), and the quality of the pivot choice (step 2) determines performance. Since the pivot itself is placed in its final position and does not participate in either recursive call, the two subarrays together contain $n - 1$ elements. An ideal pivot splits them into two roughly equal halves, giving $O(n \log n)$ running time; a poor pivot that lands at one extreme produces one subarray of size $n - 1$ and one of size 0, giving $O(n^2)$ running time.

The partition procedure

The partition step rearranges arr[start..end] around a pivot element and returns the pivot's final index. After partitioning:

  • All elements to the left of the pivot are less than the pivot.
  • All elements to the right are greater than or equal to the pivot.
  • The pivot is in its correct final position.

Our implementation uses the Lomuto partition scheme: scan from left, moving elements smaller than the pivot to the front. By default the middle element is chosen as the pivot, but the caller can pass an explicit pivotPos to partition around a specific element — a flexibility we will use in Chapter 6 for the median-of-medians algorithm.

export function partition<T>(
  arr: T[],
  start: number,
  end: number,
  comparator: Comparator<T> = numberComparator as Comparator<T>,
  pivotPos?: number,
): number | undefined {
  if (start > end || end >= arr.length || start < 0 || end < 0) {
    return undefined;
  }

  const pivotIndex = pivotPos ?? Math.floor((start + end) / 2);
  let storeIndex = start;

  // Move pivot to end
  const pivotTemp = arr[pivotIndex]!;
  arr[pivotIndex] = arr[end]!;
  arr[end] = pivotTemp;

  for (let i = start; i < end; i++) {
    if (comparator(arr[i]!, arr[end]!) < 0) {
      const temp = arr[storeIndex]!;
      arr[storeIndex] = arr[i]!;
      arr[i] = temp;
      storeIndex++;
    }
  }

  // Move pivot to its final position
  const temp = arr[storeIndex]!;
  arr[storeIndex] = arr[end]!;
  arr[end] = temp;

  return storeIndex;
}

The pivot is first swapped to the end, then storeIndex tracks the boundary between elements known to be less than the pivot and elements not yet examined. After the scan, the pivot is swapped into storeIndex, its correct position.
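
A quick usage sketch (assuming partition is in scope as exported above): partition an array around its middle element and observe the returned pivot index. The array values here are chosen only for illustration.

// Usage sketch for partition, assuming the export above is in scope.
const data = [9, 4, 7, 3, 8, 2];

// Default pivot: the middle element, index floor((0 + 5) / 2) = 2, value 7.
const pivotIndex = partition(data, 0, data.length - 1);

console.log(pivotIndex); // 3 — the pivot 7 now sits at index 3
console.log(data);       // [4, 2, 3, 7, 8, 9]: everything before index 3 is < 7, after it is >= 7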

The Lomuto partition scheme in detail

The Lomuto partition scheme (named after Nico Lomuto and popularized by Jon Bentley) is an elegant single-pass algorithm that partitions an array around a pivot using two indices: storeIndex and i. The pivot is first moved to the end of the array, and then the scan pointer i advances from left to right, examining each element exactly once.

At every point during the scan, the array is divided into four regions. The two pointers storeIndex and i carve out the boundaries:

  • arr[start..storeIndex-1] — elements already classified as less than the pivot.
  • arr[storeIndex..i-1] — elements already classified as greater than or equal to the pivot. This region may be empty: at the very beginning of the scan storeIndex = i = start, and it remains empty as long as every element examined so far is less than the pivot (because each such element advances both i and storeIndex). The region grows only when the scan encounters an element ≥ pivot — that element stays in place and i advances past it while storeIndex does not.
  • arr[i..end-1] — elements not yet examined.
  • arr[end] — the pivot itself (parked at the end).

On each step, the scan pointer i examines one element:

  • If arr[i] < pivot: swap arr[i] with arr[storeIndex] and advance both i and storeIndex. This grows the "less than" region by one. If the "≥ pivot" region is non-empty, the swap moves its first element into the position just vacated by arr[i], keeping it in the "≥ pivot" region. If the "≥ pivot" region is empty (storeIndex = i), the element is effectively swapped with itself — a no-op — and both pointers advance together.
  • If arr[i] ≥ pivot: advance i only. The element stays where it is, and storeIndex does not move, so the element becomes part of the "≥ pivot" region. This is also the moment when storeIndex and i diverge (if they were still equal).

When the scan is complete (i = end), the "not examined" region is empty. We swap the pivot from arr[end] into arr[storeIndex] — the boundary between the two classified regions — placing it in its correct final position.

Tracing the Lomuto scheme. Let us trace a partition call on a six-element subarray (indices 0–5) with the middle element as pivot. The middle index is $\lfloor (0 + 5) / 2 \rfloor = 2$; in this example the value stored there is 5, so the pivot is 5. The element at index 0 is 8 — the only element greater than or equal to the pivot — and the remaining four elements are all smaller than 5. The first action is to swap the pivot to the end, so that 5 is parked at index 5.

Now scan with storeIndex = 0. At each step, storeIndex marks the boundary of the "less than" region and i marks the boundary of the "greater or equal" region.

Initial state (storeIndex = 0, i = 0): nothing has been classified yet; the pivot 5 sits at index 5.

Step 1 (i = 0): arr[0] = 8. Is 8 < 5? No. Do nothing; 8 becomes the first (and, here, only) element of the "≥ pivot" region.

storeIndex stays at 0.

Step 2 (i = 1): arr[1] is less than 5. Swap arr[1] and arr[0], which moves 8 one position to the right.

storeIndex advances to 1.

Step 3 (i = 2): arr[2] is less than 5. Swap arr[2] and arr[1]; again 8 shifts one position to the right.

storeIndex advances to 2.

Step 4 (i = 3): arr[3] is less than 5. Swap arr[3] and arr[2].

storeIndex advances to 3.

Step 5 (i = 4): arr[4] is less than 5. Swap arr[4] and arr[3].

storeIndex advances to 4.

Place pivot: Swap arr[storeIndex] = arr[4] (which now holds 8) with arr[5] (the pivot 5).

The pivot 5 is now at index 4, its correct sorted position. Every element to its left is less than 5, and every element to its right is greater than or equal to 5.

Notice how the "greater or equal" region (just the element 8 in this example) gets pushed rightward one position each time a smaller element is swapped into the "less than" region. The storeIndex pointer always marks the exact boundary: everything before it is less than the pivot, everything from it onward (up to the scan pointer) is greater or equal.

This detailed trace also serves as an informal correctness argument for the Lomuto scheme. At every step, the four-region invariant is maintained: elements before storeIndex are less than the pivot, elements from storeIndex to i - 1 are greater than or equal, elements from i onward have not yet been examined, and the pivot sits at the end. When the scan completes, the "not examined" region is empty, so swapping the pivot into storeIndex places it at the exact boundary between the "less than" and "greater or equal" regions — its correct final position. Note also that the scan examines each non-pivot element exactly once, performing at most one swap per element, so partitioning a subarray of $n$ elements runs in $O(n)$ time.

Now that we understand how a single partition call rearranges an array around a pivot, we can see how quicksort uses this operation repeatedly to sort an entire array. Each partition places one element — the pivot — in its correct final position and divides the remaining elements into two subproblems. The following example traces the full recursive process, showing how successive partitions progressively sort the array.

Tracing through an example

Let us sort [7, 2, 1, 6, 8, 5, 3, 4] with middle-element pivot selection. Since we have already traced the Lomuto partition scheme step-by-step in the previous section, here we skip the inner details of each partition call and focus on the recursive structure of quicksort — which subarray is partitioned at each step, which pivot is chosen, and how the array evolves toward the sorted order.

The array snapshot after each partition call is shown below. Once a pivot has been placed, it is in its final sorted position and never moves again.

The full recursion tree (each node shows the subarray and the pivot chosen):

                  [7, 2, 1, 6, 8, 5, 3, 4]  pivot 6
                  /                        \
       [2, 1, 4, 5, 3]  pivot 4         [8, 7]  pivot 8
        /            \                   /
   [2, 1, 3]  pivot 1  [5]          [7]
        \
      [3, 2]  pivot 3
       /
     [2]

Partition 1 — full array, indices 0–7.

The middle index is $\lfloor (0 + 7) / 2 \rfloor = 3$, so the pivot is arr[3] = 6. Partition places 6 at index 5:

[2, 1, 4, 5, 3, 6, 8, 7]

The pivot 6 is now in its final position. Two subproblems remain: the left subarray (indices 0–4) and the right subarray (indices 6–7).

Partition 2 — left subarray, indices 0–4: [2, 1, 4, 5, 3].

The middle index is $\lfloor (0 + 4) / 2 \rfloor = 2$, so the pivot is arr[2] = 4. Partition places 4 at index 3:

[2, 1, 3, 4, 5, 6, 8, 7]

The pivot 4 is now in its final position. Left subproblem: [2, 1, 3] (indices 0–2). Right subproblem: [5] (index 4) — a single element, already in place (a one-element array is trivially sorted).

Partition 3 — subarray at indices 0–2: [2, 1, 3].

The middle index is $\lfloor (0 + 2) / 2 \rfloor = 1$, so the pivot is arr[1] = 1. Partition places 1 at index 0:

[1, 3, 2, 4, 5, 6, 8, 7]

The pivot 1 is now in its final position. Left subproblem: empty. Right subproblem: [3, 2] (indices 1–2).

Partition 4 — subarray at indices 1–2: [3, 2].

The middle index is $\lfloor (1 + 2) / 2 \rfloor = 1$, so the pivot is arr[1] = 3. Partition places 3 at index 2:

[1, 2, 3, 4, 5, 6, 8, 7]

The pivot 3 is now in its final position. Left subproblem: [2] (index 1) — a single element, already in place. Right subproblem: empty. The entire left side of the original array is now sorted: [1, 2, 3, 4, 5].

Partition 5 — right subarray, indices 6–7: [8, 7].

The middle index is $\lfloor (6 + 7) / 2 \rfloor = 6$, so the pivot is arr[6] = 8. Partition places 8 at index 7:

[1, 2, 3, 4, 5, 6, 7, 8]

The pivot 8 is now in its final position. Left subproblem: [7] (index 6) — a single element, already in place. Right subproblem: empty.

All subproblems have reached the base case. Result: [1, 2, 3, 4, 5, 6, 7, 8].

Notice that quicksort performed five partitions to sort eight elements, placing one pivot in its final position each time. The three remaining elements (5, 2, and 7) reached their final positions by ending up in base-case subarrays of size 1.

Implementation

function sort<T>(
  arr: T[],
  start: number,
  end: number,
  comparator: Comparator<T>,
): void {
  if (start < end) {
    const partitionIndex = partition(arr, start, end, comparator)!;
    sort(arr, start, partitionIndex - 1, comparator);
    sort(arr, partitionIndex + 1, end, comparator);
  }
}

export function quickSort<T>(
  elements: T[],
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
  sort(elements, 0, elements.length - 1, comparator);
  return elements;
}
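
As with merge sort, the comparator makes quickSort usable for arbitrary element types. A hypothetical usage sketch follows (the Task type and the task list are made up for illustration; quickSort is assumed to be in scope as exported above).

// Usage sketch for quickSort with a custom comparator.
interface Task {
  name: string;
  priority: number;
}

const tasks: Task[] = [
  { name: "deploy", priority: 2 },
  { name: "review", priority: 1 },
  { name: "triage", priority: 3 },
];

// Sort by ascending priority. Unlike merge sort, quicksort is not stable,
// so tasks with equal priority may be reordered relative to each other.
quickSort(tasks, (a, b) => a.priority - b.priority);

console.log(tasks.map((t) => t.name)); // ["review", "deploy", "triage"]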

Correctness

Claim. After partition(arr, start, end), the pivot is in its correct final sorted position.

The partition loop moves all elements less than the pivot to positions before storeIndex, and leaves elements greater than or equal to the pivot after storeIndex. The pivot is then placed at storeIndex. Every element before it is smaller, every element after it is at least as large — this is exactly where the pivot belongs in the sorted output.

Claim. Quicksort correctly sorts the array.

We argue by strong induction on the subarray size.

Base case. A subarray of size 0 or 1 is trivially sorted. Quicksort returns it unchanged, so the claim holds.

Inductive step. Assume that quicksort correctly sorts all subarrays of size less than $k$, for some $k \ge 2$. For a subarray of size $k$, partition places the pivot in its final correct position, with all elements less than the pivot to its left and all elements greater than or equal to the pivot to its right. Both the left and right subarrays have size strictly less than $k$, so by the inductive hypothesis quicksort correctly sorts each of them. Since every element in the left subarray is less than the pivot, which in turn is at most every element in the right subarray, and both subarrays are themselves sorted, the entire array of size $k$ is sorted.

Complexity analysis

The performance of quicksort depends on the quality of the partition — how evenly the pivot divides the array.

Best case. If the pivot always lands in the middle, each partition splits the array into two roughly equal halves. The recurrence is the same as merge sort:

$$T(n) = 2\,T(n/2) + O(n) = O(n \log n).$$

Worst case. If the pivot always lands at one extreme (the smallest or largest element), one partition has $n - 1$ elements and the other has 0. The recurrence becomes:

$$T(n) = T(n - 1) + O(n) = O(n^2).$$
This worst case occurs with our middle-element pivot when the input is specially constructed, and with the first-element or last-element pivot strategies on already-sorted or reverse-sorted input.

Average case. On a random permutation with any fixed pivot strategy, the expected running time is $O(n \log n)$. Intuitively, even moderately unbalanced partitions (say, 1:9 splits) only add a constant factor to the recursion depth: each level still shrinks the larger side by a factor of $10/9$, so the depth is $\log_{10/9} n = O(\log n)$. A careful analysis (presented below) shows that the exact expected number of comparisons is $2(n+1)H_n - 4n$, where $H_n = \sum_{i=1}^{n} \frac{1}{i}$ is the $n$th harmonic number. Recall from the merge sort section that merge sort's exact worst-case comparison count is $n\lceil\log_2 n\rceil - 2^{\lceil\log_2 n\rceil} + 1$, whose leading term is $n \log_2 n$ with coefficient exactly 1. Quicksort's leading term is $2n\ln n = (2\ln 2)\,n\log_2 n$, so the ratio of the two leading terms is $2\ln 2 \approx 1.39$: quicksort's average case uses only about 39% more comparisons than merge sort's worst case — a remarkably small penalty for an algorithm whose constant-factor advantages (in-place operation, cache friendliness) often make it faster in practice.


A note to the reader. Understanding where the exact formula comes from and why it behaves as $\approx 1.39\,n\log_2 n$ is not required for the rest of this book — only the conclusion that quicksort makes $O(n \log n)$ expected comparisons matters. The derivation below is provided for the sake of completeness. If the algebra feels heavy, feel free to skip ahead to the Space analysis and return to this section later, or skip it altogether.

Setting up the recurrence. Suppose we are sorting $n$ elements and each of the $n$ elements is equally likely to end up as the pivot (this is the case for a random permutation with a fixed pivot-selection rule such as "pick the first element" or "pick the middle element"). The partition step compares the pivot to every other element, making exactly $n - 1$ comparisons. After partitioning, the pivot lands in some position $k$ (where $1 \le k \le n$), leaving a left subarray of size $k - 1$ and a right subarray of size $n - k$. Since every position is equally likely, each value of $k$ occurs with probability $1/n$. Let $C_n$ denote the expected number of comparisons to sort $n$ elements, with $C_0 = C_1 = 0$. We get:

$$C_n = (n - 1) + \frac{1}{n}\sum_{k=1}^{n}\bigl(C_{k-1} + C_{n-k}\bigr).$$

The $n - 1$ term counts the comparisons in the partition step. The sum averages over all equally likely pivot positions: when the pivot lands at position $k$, we recursively sort subarrays of sizes $k - 1$ and $n - k$.

Simplifying. Notice that as $k$ ranges from $1$ to $n$, the value $n - k$ ranges from $n - 1$ down to $0$ — the same set of values as $k - 1$, in reverse. Therefore $\sum_{k=1}^{n} C_{n-k} = \sum_{k=1}^{n} C_{k-1}$, and the recurrence becomes:

$$C_n = (n - 1) + \frac{2}{n}\sum_{k=0}^{n-1} C_k.$$

Solving the recurrence. This is a classic recurrence that is solved by the "multiply both sides by $n$" trick to eliminate the fraction:

$$n\,C_n = n(n - 1) + 2\sum_{k=0}^{n-1} C_k.$$

Write the same equation for $n - 1$:

$$(n - 1)\,C_{n-1} = (n - 1)(n - 2) + 2\sum_{k=0}^{n-2} C_k.$$

Subtracting the second from the first cancels the entire sum except its last term:

$$n\,C_n - (n - 1)\,C_{n-1} = n(n-1) - (n-1)(n-2) + 2C_{n-1} = 2(n - 1) + 2C_{n-1}.$$

Collecting $C_{n-1}$ on the right:

$$n\,C_n = (n + 1)\,C_{n-1} + 2(n - 1).$$

Dividing both sides by $n(n + 1)$:

$$\frac{C_n}{n + 1} = \frac{C_{n-1}}{n} + \frac{2(n - 1)}{n(n + 1)}.$$

Now define $D_n = \frac{C_n}{n + 1}$. The recurrence becomes $D_n = D_{n-1} + \frac{2(n - 1)}{n(n + 1)}$, which telescopes — we can unroll it all the way down to the base case $D_1 = C_1 / 2 = 0$:

$$D_n = \sum_{i=2}^{n} \frac{2(i - 1)}{i(i + 1)}.$$

Where the harmonic numbers arise. We decompose the summand using partial fractions:

$$\frac{2(i - 1)}{i(i + 1)} = \frac{4}{i + 1} - \frac{2}{i}.$$

Summing from $i = 2$ to $n$:

$$D_n = 4\sum_{i=2}^{n}\frac{1}{i + 1} - 2\sum_{i=2}^{n}\frac{1}{i} = 4\sum_{j=3}^{n+1}\frac{1}{j} - 2\sum_{i=2}^{n}\frac{1}{i}.$$

Both sums are closely related to the harmonic number $H_n = \sum_{m=1}^{n} \frac{1}{m}$, but neither starts at $1$ — the first runs from $3$ to $n + 1$ and the second from $2$ to $n$. We express each in terms of harmonic numbers by adding and subtracting the missing initial terms.

For the first sum, add and subtract the $\frac{1}{1}$ and $\frac{1}{2}$ terms to complete the harmonic sum up to $n + 1$:

$$\sum_{j=3}^{n+1}\frac{1}{j} = H_{n+1} - 1 - \frac{1}{2} = H_{n+1} - \frac{3}{2}.$$

For the second sum, add and subtract the $\frac{1}{1}$ term:

$$\sum_{i=2}^{n}\frac{1}{i} = H_n - 1.$$

Substituting back:

$$D_n = 4\left(H_{n+1} - \frac{3}{2}\right) - 2\left(H_n - 1\right) = 4H_{n+1} - 2H_n - 4.$$

Finally, use the identity $H_{n+1} = H_n + \frac{1}{n+1}$ to write everything in terms of $H_n$:

$$D_n = 4\left(H_n + \frac{1}{n+1}\right) - 2H_n - 4 = 2H_n + \frac{4}{n+1} - 4.$$

Multiplying back by $n + 1$ (recall $C_n = (n + 1)D_n$):

$$C_n = (n + 1)\left(2H_n + \frac{4}{n+1} - 4\right) = 2(n + 1)H_n - 4n.$$

This is the exact expected number of comparisons. The harmonic number arises naturally because the telescoping recurrence produces a sum of reciprocals — each "level" of the recursion contributes a term proportional to $\frac{1}{i}$, and these terms accumulate into a harmonic sum.

Approximating. It is a well-known result from analysis that the harmonic number satisfies $H_n = \ln n + \gamma + O(1/n)$, where $\gamma \approx 0.5772$ is the Euler–Mascheroni constant. We omit the proof of this fact and simply use the result. Therefore:

$$C_n = 2(n + 1)H_n - 4n \approx 2n\ln n = (2\ln 2)\,n\log_2 n \approx 1.39\,n\log_2 n.$$

The factor $2\ln 2 \approx 1.39$ arises from converting between the natural logarithm and the base-2 logarithm.
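
If you prefer to convince yourself without redoing the algebra, the closed form can be checked numerically against the recurrence. The helper names below are hypothetical, written only to verify the derivation.

// Expected quicksort comparisons: recurrence vs. closed form 2(n+1)H_n - 4n.
function expectedRecurrence(n: number): number {
  const c: number[] = [0, 0]; // C_0 = C_1 = 0
  for (let m = 2; m <= n; m++) {
    const sum = c.reduce((acc, x) => acc + x, 0); // C_0 + ... + C_{m-1}
    c.push(m - 1 + (2 / m) * sum);
  }
  return c[n]!;
}

function expectedClosedForm(n: number): number {
  let harmonic = 0;
  for (let i = 1; i <= n; i++) harmonic += 1 / i;
  return 2 * (n + 1) * harmonic - 4 * n;
}

for (const n of [2, 10, 100]) {
  console.log(n, expectedRecurrence(n).toFixed(3), expectedClosedForm(n).toFixed(3));
}
// The two columns agree (up to floating-point rounding) for every n.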


Space. Quicksort sorts in place. The recursion stack has depth $O(\log n)$ in the best case but $O(n)$ in the worst case. Tail-call optimization or explicit stack management can limit the worst-case stack depth to $O(\log n)$ by always recursing on the smaller partition first.

Properties

| Property | Quicksort |
|----------|-----------|
| Worst-case time | $O(n^2)$ |
| Best-case time | $O(n \log n)$ |
| Average-case time | $O(n \log n)$ |
| Space | $O(\log n)$ stack (in-place) |
| Stable | No |
| Adaptive | No |

Why quicksort is fast in practice

Despite its worst case, quicksort is often the fastest comparison sort in practice. Several factors contribute:

  1. Cache friendliness. Quicksort's partition scan accesses elements sequentially, which is excellent for CPU cache performance. Merge sort accesses two separate subarrays during merge, which can cause more cache misses.

  2. Small constant factor. Quicksort performs fewer data movements than merge sort — partitioning swaps elements in place, while merging copies elements to an auxiliary array and back.

  3. No auxiliary memory. Quicksort needs only $O(\log n)$ stack space, while merge sort needs $O(n)$ auxiliary space. Less memory allocation means less overhead.

  4. Adaptable. In practice, quicksort implementations use several optimizations:

    • Insertion sort for small subarrays. When a subarray shrinks below a threshold (typically 10–20 elements), the algorithm switches to insertion sort, which has lower overhead for small inputs.

    • Median-of-three pivot selection. Instead of picking a single element (e.g., the middle or first) as the pivot, the algorithm examines three elements — typically the first, middle, and last — and chooses their median as the pivot. For example, given first = 8, middle = 5, last = 2, the median is 5. Because the median of three samples is far less likely to be an extreme value than a single arbitrary pick, this strategy produces more balanced partitions and dramatically reduces the probability of hitting the worst case — particularly on already-sorted or reverse-sorted inputs, which are the classic worst cases for naive pivot strategies.

    • Three-way partitioning (Dutch National Flag). Standard Lomuto or Hoare partitioning splits the array into two regions: elements less than the pivot and elements greater than or equal to the pivot. When many elements are equal to the pivot, those duplicates still end up in recursive calls even though they are already in their correct relative position. Three-way partitioning instead splits the array into three regions — elements less than the pivot, elements equal to the pivot, and elements greater than the pivot. The equal-to-pivot region is excluded from both recursive calls, since those elements are already in their final positions. This makes a large difference when the input has many duplicate values: in the extreme case of all-equal elements, a single partition call finishes the entire array in $O(n)$ time, whereas standard two-way partitioning would degrade to $O(n^2)$. (A sketch of three-way partitioning follows this list.)
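
Here is a minimal sketch of Dutch National Flag three-way partitioning — not part of the chapter's library code, just an illustration of the idea. It partitions around a given pivot value and returns the boundaries of the equal-to-pivot region.

// A minimal three-way (Dutch National Flag) partition sketch.
// Afterwards: arr[start..lt) < pivot, arr[lt..gt) equal to pivot,
// arr[gt..end] > pivot (under the comparator). Returns [lt, gt].
function threeWayPartition<T>(
  arr: T[],
  start: number,
  end: number,
  pivot: T,
  comparator: (a: T, b: T) => number,
): [number, number] {
  let lt = start;   // next position for an element < pivot
  let i = start;    // scan pointer
  let gt = end + 1; // left boundary of the > pivot region (grows leftward)

  while (i < gt) {
    const cmp = comparator(arr[i]!, pivot);
    if (cmp < 0) {
      [arr[lt], arr[i]] = [arr[i]!, arr[lt]!];
      lt++;
      i++;
    } else if (cmp > 0) {
      gt--;
      [arr[i], arr[gt]] = [arr[gt]!, arr[i]!];
      // do not advance i: the element swapped in has not been examined yet
    } else {
      i++;
    }
  }
  return [lt, gt];
}

// Example: many duplicates of the pivot value 5.
const xs = [5, 9, 5, 1, 5, 7, 5];
console.log(threeWayPartition(xs, 0, xs.length - 1, 5, (a, b) => a - b));
// [1, 5] — xs is now [1, 5, 5, 5, 5, 7, 9]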

Randomized quicksort

Deterministic quicksort's performance depends on the pivot choice. A fixed strategy — first element, last element, middle element — can always be defeated by a carefully constructed input that forces $O(n^2)$ behavior. Randomized quicksort eliminates this vulnerability by choosing the pivot uniformly at random.

Motivation

Consider a sorting library used by millions of applications. An adversary who knows the pivot-selection strategy can craft inputs that trigger worst-case behavior, leading to denial-of-service attacks. By choosing the pivot randomly, we ensure that no input is consistently bad — the algorithm's expected performance is $O(n \log n)$ for every input, regardless of how it was constructed.

This is a powerful guarantee. It shifts the source of randomness from the input (which an adversary controls) to the algorithm (which the adversary cannot predict).

The algorithm

Randomized quicksort is identical to standard quicksort, except that the partition step selects a random element as the pivot instead of a fixed one. The only change is the pivot index computation — everything else (the Lomuto scan, the swap logic, the recursive structure) remains exactly the same:

function randomizedPartition<T>(
  arr: T[],
  start: number,
  end: number,
  comparator: Comparator<T>,
): number {
  // The only change: pick a random pivot instead of the middle element
  const randomIndex = start + Math.floor(Math.random() * (end - start + 1));

  // ... rest identical to partition
}

The line start + Math.floor(Math.random() * (end - start + 1)) replaces Math.floor((start + end) / 2) — a uniform random index in [start, end] instead of the fixed middle index. The Lomuto scan, the swap into storeIndex, and the recursive sort driver all remain unchanged.
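
For concreteness, here is one way the complete randomized variant might look; it reuses the existing partition function's optional pivotPos parameter rather than duplicating the Lomuto scan. This is a sketch, with partition, Comparator, and numberComparator assumed to be in scope as defined earlier, and randomizedQuickSort as a hypothetical name.

// Sketch of randomized quicksort reusing the existing partition function.
function randomizedSort<T>(
  arr: T[],
  start: number,
  end: number,
  comparator: Comparator<T>,
): void {
  if (start < end) {
    // Pick a pivot position uniformly at random in [start, end].
    const pivotPos = start + Math.floor(Math.random() * (end - start + 1));
    const partitionIndex = partition(arr, start, end, comparator, pivotPos)!;
    randomizedSort(arr, start, partitionIndex - 1, comparator);
    randomizedSort(arr, partitionIndex + 1, end, comparator);
  }
}

export function randomizedQuickSort<T>(
  elements: T[],
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
  randomizedSort(elements, 0, elements.length - 1, comparator);
  return elements;
}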

Expected running time

Theorem 5.1. The expected number of comparisons made by randomized quicksort on any input of size $n$ is at most $2nH_n = O(n \log n)$, where $H_n$ is the $n$th harmonic number.

Proof. Let $z_1 < z_2 < \cdots < z_n$ be the elements of the input in sorted order. Define the indicator random variable $X_{ij}$ as 1 if $z_i$ and $z_j$ are ever compared during the execution, and 0 otherwise.

The total number of comparisons is:

$$X = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} X_{ij}.$$

Why does this sum count all comparisons? The double sum iterates over every pair of distinct elements exactly once: the outer index $i$ ranges from 1 to $n - 1$ and the inner index $j$ ranges from $i + 1$ to $n$, so we visit exactly the pairs $(z_i, z_j)$ with $i < j$. (The constraint $j > i$ avoids counting each pair twice; $i$ stops at $n - 1$ because every pair needs a partner $j \ge i + 1$, and $j$ can be at most $n$.)

For each pair, $X_{ij}$ contributes 1 if those two elements are compared during the execution, and 0 if they are not. The sum therefore counts the total number of pairs that are compared — but is this the same as the total number of comparisons? It is, because each pair is compared at most once. To see why: every comparison in quicksort happens during a partition step, where the pivot is compared against each other element in the current subarray. A comparison between $z_i$ and $z_j$ can therefore only occur when one of them is the pivot. Once an element serves as pivot, it is placed in its final position and excluded from all future recursive calls — so that element is never compared with anything again, and in particular $z_i$ and $z_j$ can never be compared a second time.

Since every comparison corresponds to exactly one pair and every pair is compared at most once, the double sum counts exactly the total number of comparisons.

Since the algorithm is randomized, $X$ is a random variable — it may take different values on different runs of the algorithm (due to different random pivot choices). We want to compute $E[X]$, the expected value (long-run average) of the total number of comparisons. By linearity of expectation — the fact that the expected value of a sum equals the sum of the expected values, regardless of dependencies — we can pull the expectation inside the double sum:

$$E[X] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} E[X_{ij}].$$

Now, $X_{ij}$ is an indicator variable: it equals either 0 or 1. A standard property of indicator variables is that the expected value of an indicator equals the probability that it is 1: $E[X_{ij}] = \Pr[z_i \text{ and } z_j \text{ are compared}]$. This gives us:

$$E[X] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \Pr[z_i \text{ and } z_j \text{ are compared}].$$

It remains to compute these probabilities. Fix a pair $z_i$ and $z_j$ (with $i < j$) and ask: under what circumstances are they compared?

Recall that the only comparisons quicksort makes are between a pivot and the other elements in its subarray. So $z_i$ and $z_j$ can only be compared when one of them is a pivot and both are in the same subarray for which the partition function is called. The question is: do they ever find themselves in this situation, or are they separated into different subarrays before either becomes a pivot?

The answer depends on the set of elements between them in sorted order: $Z_{ij} = \{z_i, z_{i+1}, \dots, z_j\}$. Consider what happens when some element from this set is chosen as pivot for the first time:

  • If the pivot is some $z_k$ with $i < k < j$ (an element strictly between them): the pivot satisfies $z_i < z_k < z_j$, so $z_i$ goes to the left partition and $z_j$ goes to the right partition. They are now in different subarrays and will never be in the same subarray again — so they will never be compared. (Note that during this partition step, both $z_i$ and $z_j$ are compared to the pivot $z_k$, but not to each other.)

  • If the pivot is $z_i$ or $z_j$ (one of the endpoints): that endpoint is the pivot and is compared to every other element in its subarray — including the other endpoint. So $z_i$ and $z_j$ are compared.

What about elements outside this interval — pivots $z_k$ with $k < i$ or $k > j$? These cannot separate $z_i$ and $z_j$: if $k < i$, then $z_k < z_i < z_j$, so both $z_i$ and $z_j$ are greater than the pivot and land in the same (right) partition. If $k > j$, both are less than the pivot and land in the same (left) partition. Either way, $z_i$ and $z_j$ remain together, and the question whether they will be compared is deferred to a later pivot choice and a later call to the partition function.

Therefore, the fate of the pair (whether they will be compared at some point during the sort or not) is determined entirely by which element of $Z_{ij}$ is the first to be chosen as a pivot. If it is $z_i$ or $z_j$, they are compared; if it is any of the elements strictly between them, they are separated without being compared.

Since pivots are chosen uniformly at random, each of the $j - i + 1$ elements in $Z_{ij}$ is equally likely to be the first one selected. Two of these choices (namely $z_i$ and $z_j$) lead to a comparison, so:

$$\Pr[z_i \text{ and } z_j \text{ are compared}] = \frac{2}{j - i + 1}.$$

Therefore:

$$E[X] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{2}{j - i + 1} \;\le\; \sum_{i=1}^{n-1} 2\,(H_n - 1) \;\le\; 2nH_n = O(n \log n),$$

where $H_n = \sum_{m=1}^{n} \frac{1}{m}$ is the $n$th harmonic number.

This expected bound holds for every input — it is not an average over random inputs. Even on an adversarial input, randomized quicksort makes $O(n \log n)$ expected comparisons.

Worst case

The worst case of $O(n^2)$ still exists in theory: if the random choices happen to always pick the smallest or largest element as pivot. However, the probability of this occurring is astronomically small. For even moderately large $n$, the probability of consistently terrible pivots through all recursive calls is effectively zero.

Properties

| Property | Randomized quicksort |
|----------|----------------------|
| Worst-case time | $O(n^2)$ (extremely unlikely) |
| Expected time | $O(n \log n)$ for all inputs |
| Space | $O(\log n)$ expected stack depth |
| Stable | No |

Note: the expected time is $\Theta(n \log n)$, not merely $O(n \log n)$. The upper bound follows from Theorem 5.1 above. The lower bound follows from the comparison-based sorting lower bound proved in Chapter 4: any comparison-based sorting algorithm — including randomized ones — must make $\Omega(n \log n)$ comparisons in expectation, since for any fixed sequence of random choices the algorithm is deterministic and the information-theoretic lower bound applies.

Heapsort

Heapsort uses a binary heap to sort an array in place. A binary heap is an array-based data structure that maintains a partial ordering — not fully sorted, but structured enough to find the maximum (or minimum) in $O(1)$ time and restore order in $O(\log n)$ time after a removal.

The binary heap

A max-heap is a complete binary tree stored in an array where every node's value is greater than or equal to its children's values. For a node at index $i$ (zero-based):

  • Left child: $2i + 1$
  • Right child: $2i + 2$
  • Parent: $\lfloor (i - 1) / 2 \rfloor$

The max-heap property ensures that the root (index 0) holds the largest element.
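
The index arithmetic is worth wrapping in tiny helpers when experimenting. The helper names below are hypothetical, not part of the chapter's library code.

// Hypothetical index helpers for a zero-based array heap.
const leftChild = (i: number): number => 2 * i + 1;
const rightChild = (i: number): number => 2 * i + 2;
const parent = (i: number): number => Math.floor((i - 1) / 2);

// The children of the root live at indices 1 and 2, and the parent of
// index 5 is index 2.
console.log(leftChild(0), rightChild(0), parent(5)); // 1 2 2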

The algorithm

Heapsort works in two phases:

  1. Build a max-heap from the input array. After this step, the array satisfies the max-heap property: every node's value is greater than or equal to its children's values, and in particular the largest element is at the root (index 0).
  2. Extract the maximum repeatedly. For $i = n - 1, n - 2, \dots, 1$:
    • Swap the root (the current maximum) with element $i$, placing the maximum in its final sorted position.
    • Reduce the heap size by 1 — element $i$ is now in its final position and excluded from the heap.
    • Call heapify on the new root to restore the max-heap property among the remaining elements.

Notice that unlike merge sort and quicksort, heapsort does not use divide-and-conquer in the traditional recursive sense. Instead, it uses the heap data structure to efficiently find and remove the maximum element at each step. The build-heap phase (step 1) does the structural work of organizing the array, while the extraction loop (step 2) does the sorting work by repeatedly peeling off the maximum. The sorted elements accumulate at the end of the array while the heap shrinks from the front — the algorithm sorts in place with $O(1)$ auxiliary space.

The sections below describe the heap data structure, the heapify and build-heap operations that support it, and the full heapsort implementation.

Heapify

The heapify operation takes a node whose children are both valid max-heaps but whose own value may violate the heap property, and "sinks" it down to restore the property:

function heapify<T>(
  arr: T[],
  heapSize: number,
  index: number,
  comparator: Comparator<T>,
): void {
  const left = 2 * index + 1;
  const right = 2 * index + 2;
  let indexOfMaximum = index;

  for (const subTreeRootIndex of [left, right]) {
    if (
      subTreeRootIndex < heapSize &&
      comparator(arr[subTreeRootIndex]!, arr[indexOfMaximum]!) > 0
    ) {
      indexOfMaximum = subTreeRootIndex;
    }
  }
  if (indexOfMaximum !== index) {
    const temp = arr[index]!;
    arr[index] = arr[indexOfMaximum]!;
    arr[indexOfMaximum] = temp;
    heapify(arr, heapSize, indexOfMaximum, comparator);
  }
}

The element at the given index is compared with its children. If a child is larger, the element is swapped with the largest child, and the process repeats in that child's subtree. Each step moves down one level, so heapify runs in $O(\log n)$ time (the height of the tree).

Tracing the heapify procedure

To understand how heapify works step by step, let us trace through a small example.

Example: call heapify(arr, 7, 0) on the array [2, 7, 6, 5, 4, 1, 3]. The root (value 2) violates the max-heap property, but both of its subtrees are valid max-heaps:

        2          ← violates heap property
       / \
      7    6
     / \  / \
    5  4 1   3

At each step, we compare the current node with its children, find the largest of the three, and swap if the current node is not the largest. If a swap occurs, we recurse into the affected subtree because the swapped-down value may violate the heap property there.

Step 1 (index 0): Compare arr[0] = 2 with its children arr[1] = 7 and arr[2] = 6. The largest of the three is 7 at index 1. Since $7 > 2$, swap arr[0] and arr[1]:

        7
       / \
      2    6
     / \  / \
    5  4 1   3

Array: [7, 2, 6, 5, 4, 1, 3]. The right subtree (rooted at 6) is unaffected. Recurse into the left subtree at index 1, where the value 2 may now violate the heap property.

Step 2 (index 1): Compare arr[1] = 2 with its children arr[3] = 5 and arr[4] = 4. The largest of the three is 5 at index 3. Since $5 > 2$, swap arr[1] and arr[3]:

        7
       / \
      5    6
     / \  / \
    2  4 1   3

Array: [7, 5, 6, 2, 4, 1, 3]. Recurse into index 3.

Step 3 (index 3): The node at index 3 has no children (its left child index $2 \cdot 3 + 1 = 7$ is not less than heapSize). It is a leaf — stop.

The element 2 has "sunk" from the root to a leaf in two swaps. The result is a valid max-heap: every node is greater than or equal to its children.

Building a heap

We can convert an unordered array into a max-heap by calling heapify on every non-leaf node, bottom-up:

function buildHeap<T>(
  arr: T[],
  heapSize: number,
  comparator: Comparator<T>,
): void {
  const lastNonLeafIndex = Math.floor((heapSize + 1) / 2) - 1;

  for (let i = lastNonLeafIndex; i >= 0; i--) {
    heapify(arr, heapSize, i, comparator);
  }
}

Why Math.floor((heapSize + 1) / 2) - 1? The last non-leaf is the parent of the last element in the array. Since the last element has index $n - 1$, its parent is at index $\lfloor (n - 2) / 2 \rfloor$. Every node after this index has no children — it is a leaf. For example, in a heap of size 6, the last element is at index 5, its parent is at $\lfloor (5 - 1) / 2 \rfloor = 2$, and indices 3, 4, 5 are all leaves.

The code's expression equals $\lfloor (n - 2) / 2 \rfloor$ when $n$ is even. When $n$ is odd, it yields one index higher — a leaf node — but this is harmless: heapify on a leaf finds no children and returns immediately. The more direct formula Math.floor((heapSize - 2) / 2) gives the exact last non-leaf for all $n \ge 2$, but it computes heapSize - 2, which underflows when heapSize is 0 or 1 in languages with unsigned integers. The (heapSize + 1) / 2 - 1 form avoids negative intermediate values, making it portable and safe regardless of the integer type. (When heapSize is 0, the expression evaluates to $-1$, so the loop condition i >= 0 is immediately false and the loop body never executes — correctly treating an empty array as a trivial heap.)

Why bottom-up? The leaves (the bottom half of the array) are trivially valid heaps. By processing nodes from the bottom up, each call to heapify encounters a node whose children are already valid heaps — exactly the precondition heapify requires.

It turns out that buildHeap built this way has time complexity $O(n)$.

Why $O(n)$ and not $O(n \log n)$? A naive analysis says: $O(n)$ calls to heapify, each costing $O(\log n)$, giving $O(n \log n)$. That is a valid upper bound, but it overestimates the cost because it treats every node as if it could sink all the way to the bottom. In reality, most nodes are near the bottom and sink only a few levels. To get the true cost, we group nodes by their height in the tree and sum the work done at each height.

How many nodes are at each height? Define the height of a node as the number of edges on the longest downward path to a leaf. Leaves have height 0, their parents have height 1, and so on up to the root at height $\lfloor \log_2 n \rfloor$. In a complete binary tree with $n$ nodes, the number of nodes at height $h$ is at most $\lceil n / 2^{h+1} \rceil$. Intuitively, each successive level going up has roughly half as many nodes as the level below: about $n/2$ leaves (height 0), $n/4$ nodes at height 1, $n/8$ at height 2, and so on.

How much work does heapify do on a node at height $h$? Each call to heapify sinks a node by at most $h$ levels (one comparison-and-swap per level), so the cost is $O(h)$.

The total cost. Multiplying the number of nodes at height $h$ by the cost per node and summing over all heights gives:

$$\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil \cdot O(h).$$

Dropping the ceiling and pulling the factor $n$ out of the sum, this is at most:

$$O\!\left(n \sum_{h=0}^{\lfloor \log_2 n \rfloor} \frac{h}{2^{h}}\right).$$

Since every term is positive, we can safely extend the upper limit to infinity (which only increases the sum), obtaining:

$$O\!\left(n \sum_{h=0}^{\infty} \frac{h}{2^{h}}\right).$$

Evaluating the series $\sum_{h=0}^{\infty} \frac{h}{2^h}$. It is a well-known result from analysis that this series converges to exactly 2 (it can be derived by differentiating the geometric series $\sum_h x^h$ and substituting $x = \tfrac{1}{2}$; we omit the proof here). So:

$$\sum_{h=0}^{\infty} \frac{h}{2^{h}} = 2.$$

Therefore:

$$O\!\left(n \sum_{h=0}^{\infty} \frac{h}{2^{h}}\right) = O(2n) = O(n).$$

The key insight is that the work pyramid is inverted: the many nodes near the bottom of the tree each sink at most a few levels, while the few nodes near the top can sink many levels. Because the heavy per-node work is concentrated at the top where there are very few nodes, the total work sums to $O(n)$ rather than $O(n \log n)$.

Tracing the buildHeap procedure

Let us trace buildHeap on the array [3, 1, 6, 5, 2, 4]. Since we have already traced the heapify procedure step-by-step in the previous section, here we treat each heapify call as a single step and focus on the overall bottom-up process.

The initial array as a tree:

        3
       / \
      1    6
     / \  /
    5   2 4

The array has $n = 6$ elements. The last non-leaf index is $\lfloor (6 + 1) / 2 \rfloor - 1 = 2$, so we call heapify on indices 2, 1, and 0, in that order.

heapify(arr, 6, 2) — node at index 2 (value 6). Its only child is arr[5] = 4. Since $6 \ge 4$, the heap property is already satisfied. No change.

        3
       / \
      1    6
     / \  /
    5   2 4

Array: [3, 1, 6, 5, 2, 4] (unchanged).

heapify(arr, 6, 1) — node at index 1 (value 1). Children are arr[3] = 5 and arr[4] = 2. The largest child is 5 at index 3. Since $5 > 1$, heapify swaps 1 and 5. The element 1 sinks to index 3, which is a leaf — no further swaps.

        3
       / \
      5    6
     / \  /
    1   2 4

Array: [3, 5, 6, 1, 2, 4].

heapify(arr, 6, 0) — node at index 0 (value 3). Children are arr[1] = 5 and arr[2] = 6. The largest child is 6 at index 2. Since $6 > 3$, heapify swaps 3 and 6. The element 3 sinks to index 2, where its only child is arr[5] = 4. Since $4 > 3$, heapify swaps again. Now 3 is at index 5, a leaf — done.

        6
       / \
      5    4
     / \  /
    1   2 3

Array: [6, 5, 4, 1, 2, 3].

The result is a valid max-heap. Notice the bottom-up order: by the time we process a node, all nodes below it have already been heapified, so both of its subtrees are valid max-heaps — exactly the precondition that heapify requires.

Now that we have the necessary building blocks — heapify and buildHeap — we can proceed to the implementation of the sorting algorithm itself.

Implementation

export function heapSort<T>(
  elements: T[],
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
  let heapSize = elements.length;

  buildHeap(elements, heapSize, comparator);
  // Extract-max loop: repeatedly swap the root (maximum) with the last
  // heap element, shrink the heap, and restore the heap property.
  for (let i = elements.length - 1; i > 0; i--) {
    const temp = elements[0]!;
    elements[0] = elements[i]!;
    elements[i] = temp;
    heapSize--;
    heapify(elements, heapSize, 0, comparator);
  }
  return elements;
}
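
A brief usage sketch (assuming heapSort is in scope as exported above): sorting numbers ascending with the default comparator, and descending by flipping it.

// Usage sketch for heapSort, assuming the export above is in scope.
console.log(heapSort([4, 10, 3, 5, 1])); // [1, 3, 4, 5, 10]

// A reversed comparator effectively builds a min-heap, so the extraction
// loop produces descending order.
console.log(heapSort([4, 10, 3, 5, 1], (a, b) => b - a)); // [10, 5, 4, 3, 1]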

Tracing through an example

Let us sort [4, 10, 3, 5, 1].

Build max-heap:

Starting array (as a tree):

        4
       / \
     10    3
    / \
   5   1

Process non-leaf nodes bottom-up. Node at index 1 (value 10): children are 5, 1. 10 is already larger — no change. Node at index 0 (value 4): children are 10, 3. Swap 4 with 10. Then heapify the subtree: 4 vs children 5, 1 → swap with 5.

        10            10
       / \           / \
      4    3   →    5    3
     / \           / \
    5   1         4   1

After calling buildHeap we have the following max-heap: [10, 5, 3, 4, 1].

Extract-max loop:

| # | Swap | Array after swap | Current heap | Heap after heapify |
|---|------|------------------|--------------|--------------------|
| 1 | 10 ↔ 1 | [1, 5, 3, 4, 10] | [1, 5, 3, 4] | [5, 4, 3, 1] |
| 2 | 5 ↔ 1 | [1, 4, 3, 5, 10] | [1, 4, 3] | [4, 1, 3] |
| 3 | 4 ↔ 3 | [3, 1, 4, 5, 10] | [3, 1] | [3, 1] |
| 4 | 3 ↔ 1 | [1, 3, 4, 5, 10] | [1] | [1] |

Result: [1, 3, 4, 5, 10].

Correctness

Invariant: The extract-max loop variable i starts at $n - 1$ and decreases to 1. At the start of the iteration with loop variable value $i$:

  • elements[0..i] is a max-heap containing the $i + 1$ smallest elements.
  • elements[i+1..n-1] is the sorted suffix: it contains the $n - i - 1$ largest elements, in sorted order (when $i = n - 1$, this range is empty — no elements have been sorted yet).

Initialization ($i = n - 1$). After buildHeap, the entire array is a max-heap and the sorted suffix is empty.

Maintenance. The root elements[0] is the largest element in the heap elements[0..i]. Swapping it with elements[i] places it in the correct position (it is the $(n - i)$th largest overall). Reducing the heap size and calling heapify restores the heap property on elements[0..i-1].

Termination ($i = 0$). When i becomes 0, the loop exits. At this point the invariant tells us that elements[1..n-1] contains the $n - 1$ largest elements in sorted order, and elements[0] is a trivial one-element max-heap holding the minimum. The array is sorted.

Complexity analysis

Time. Building the heap takes $O(n)$. The extract-max loop runs $n - 1$ times, each iteration performing a swap and a heapify costing $O(\log n)$. Total:

$$O(n) + (n - 1) \cdot O(\log n) = O(n \log n).$$

This holds for all inputs — heapsort is not adaptive.

Space. Heapsort sorts in place. The only auxiliary space is $O(1)$ for temporary variables.

Properties

| Property | Heapsort |
|----------|----------|
| Worst-case time | $O(n \log n)$ |
| Best-case time | $O(n \log n)$ |
| Average-case time | $O(n \log n)$ |
| Space | $O(1)$ in-place |
| Stable | No |
| Adaptive | No |

Comparison of efficient sorting algorithms

We have now studied three sorting algorithms. Let us compare them across the dimensions that matter in practice.

Time complexity

| Algorithm | Best case | Average case | Worst case |
|-----------|-----------|--------------|------------|
| Merge sort | $O(n \log n)$ | $O(n \log n)$ | $O(n \log n)$ |
| Quicksort | $O(n \log n)$ | $O(n \log n)$ | $O(n^2)$ |
| Randomized quicksort | $O(n \log n)$ | $O(n \log n)$ expected | $O(n^2)$ (unlikely) |
| Heapsort | $O(n \log n)$ | $O(n \log n)$ | $O(n \log n)$ |

Merge sort and heapsort provide guaranteed $O(n \log n)$ performance. Quicksort has a theoretical $O(n^2)$ worst case, but randomization makes this practically irrelevant. In terms of constant factors, quicksort (including the randomized variant) actually makes somewhat more comparisons on average — about $1.39\,n\log_2 n$ versus merge sort's roughly $n\log_2 n$ — but each comparison carries lower overhead (in-place swaps, better cache behavior), which usually makes it faster in practice.

Space complexity

| Algorithm | Auxiliary space |
|-----------|-----------------|
| Merge sort | $O(n)$ |
| Quicksort | $O(\log n)$ stack |
| Randomized quicksort | $O(\log n)$ expected stack |
| Heapsort | $O(1)$ |

Heapsort is the clear winner for space: it sorts truly in place with $O(1)$ extra memory. Quicksort needs $O(\log n)$ stack space (or $O(n)$ in the worst case without tail-call optimization). Merge sort needs $O(n)$ for the auxiliary merge array.

Stability

| Algorithm | Stable? |
|-----------|---------|
| Merge sort | Yes |
| Quicksort | No |
| Randomized quicksort | No |
| Heapsort | No |

Merge sort is the only stable algorithm among the three. This makes it the default choice when stability is required — for example, in database sorting or when composing sorts on multiple keys.

Cache performance

Quicksort has the best cache performance among the three. Its partition scan accesses elements sequentially, making excellent use of CPU cache lines. Merge sort accesses two separate subarrays during merge, which can cause cache misses when the subarrays are far apart in memory. Heapsort has the worst cache performance: heap navigation accesses elements at indices $i$, $2i + 1$, and $2i + 2$, which jump around the array unpredictably for large arrays.

Practical recommendations

  • General-purpose sorting: Randomized quicksort (or a tuned variant) is the standard choice. Many standard library sort functions are built on quicksort variants (for example, C++'s std::sort uses introsort), although some — such as Python's, Java's, and V8's Array.prototype.sort — use the merge-sort-based Timsort instead.

  • Guaranteed worst-case performance: Use merge sort or heapsort. Merge sort is preferred when stability is needed; heapsort when memory is constrained.

  • Small arrays: Insertion sort (from Chapter 4) outperforms all of the above for small arrays (typically around 20 elements or fewer) due to its minimal overhead. Practical quicksort implementations switch to insertion sort for small subarrays.

  • Hybrid algorithms: The best practical sorts combine multiple algorithms. Timsort (Python, Java) combines merge sort with insertion sort. Introsort (C++ STL) starts with quicksort, switches to heapsort if the recursion depth exceeds a threshold proportional to $\log_2 n$ (to guarantee an $O(n \log n)$ worst case), and uses insertion sort for small subarrays. A rough sketch of this hybrid idea follows below.
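
To make the hybrid idea concrete, here is a rough introsort-style driver assembled from this chapter's pieces. It is a sketch, not a tuned implementation: the depth limit, the cutoff of 16, and the inline insertion sort are illustrative choices, and partition, heapSort, Comparator, and numberComparator are assumed to be in scope as defined earlier.

// A rough introsort-style sketch built from this chapter's building blocks.
const SMALL_CUTOFF = 16; // illustrative threshold, not a tuned value

function insertionSortRange<T>(
  arr: T[],
  start: number,
  end: number,
  comparator: Comparator<T>,
): void {
  for (let i = start + 1; i <= end; i++) {
    const current = arr[i]!;
    let j = i - 1;
    while (j >= start && comparator(arr[j]!, current) > 0) {
      arr[j + 1] = arr[j]!;
      j--;
    }
    arr[j + 1] = current;
  }
}

function introSortRange<T>(
  arr: T[],
  start: number,
  end: number,
  depthLimit: number,
  comparator: Comparator<T>,
): void {
  if (end - start + 1 <= SMALL_CUTOFF) {
    insertionSortRange(arr, start, end, comparator); // cheap for tiny ranges
    return;
  }
  if (depthLimit === 0) {
    // Recursion is getting too deep: fall back to heapsort on this range.
    // (For simplicity, this sketch heap-sorts a copy of the range.)
    const sortedRange = heapSort(arr.slice(start, end + 1), comparator);
    for (let i = 0; i < sortedRange.length; i++) arr[start + i] = sortedRange[i]!;
    return;
  }
  const partitionIndex = partition(arr, start, end, comparator)!;
  introSortRange(arr, start, partitionIndex - 1, depthLimit - 1, comparator);
  introSortRange(arr, partitionIndex + 1, end, depthLimit - 1, comparator);
}

export function introSort<T>(
  elements: T[],
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
  const depthLimit = 2 * Math.floor(Math.log2(Math.max(elements.length, 1)));
  introSortRange(elements, 0, elements.length - 1, depthLimit, comparator);
  return elements;
}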

Summary

In this chapter we studied three efficient comparison-based sorting algorithms:

  • Merge sort divides the array in half, sorts each half recursively, and merges the sorted halves. It runs in $O(n \log n)$ time in all cases but requires $O(n)$ auxiliary space. It is stable.

  • Quicksort partitions the array around a pivot, placing it in its correct position, then recursively sorts the two partitions. It runs in $O(n \log n)$ average time with excellent cache performance, but has $O(n^2)$ worst-case time with a fixed pivot strategy. Randomized quicksort eliminates this vulnerability to adversarial inputs by choosing pivots uniformly at random, achieving $O(n \log n)$ expected time on every input.

  • Heapsort builds a max-heap and repeatedly extracts the maximum to build the sorted array from right to left. It runs in $O(n \log n)$ time in all cases and uses $O(1)$ auxiliary space, but has poor cache performance.

All three algorithms achieve the $\Omega(n \log n)$ lower bound proved in Chapter 4. In the next chapter, we explore a different question: can we sort faster than $O(n \log n)$ by using information beyond pairwise comparisons?

Exercises

Exercise 5.1. Trace through the merge sort algorithm on a small input of your choice (say, eight elements). Show the state of the array after each merge operation in the bottom-up approach.

Exercise 5.2. Merge sort's merge procedure uses $O(n)$ auxiliary space. Can we merge two sorted subarrays in place (using $O(1)$ extra space) while maintaining $O(n)$ time? Explain why this is difficult. (Hint: in-place merge algorithms exist, but they either sacrifice the linear time bound or are extremely complex.)

Exercise 5.3. Consider quicksort with the "first element" pivot strategy. Give an input of size $n$ that causes $O(n^2)$ behavior. Then give a different input that causes $O(n \log n)$ behavior. What input causes the worst case for the "middle element" strategy used in our implementation?

Exercise 5.4. Prove that the expected recursion depth of randomized quicksort is $O(\log n)$. (Hint: at each level, with constant probability the pivot falls in the middle half of the array. How many levels until the subproblem size drops to 1?)

Exercise 5.5. Heapsort is not stable. Give a concrete example of an array with duplicate values where heapsort changes the relative order of equal elements. Why does the "swap root with last element" step destroy stability?