Linear-Time Sorting and Selection

In Chapter 4 we proved a lower bound: every comparison-based sorting algorithm must make Ω(n log n) comparisons in the worst case. The efficient algorithms of Chapter 5 — merge sort, quicksort, heapsort — all meet this bound, and none can beat it. But what if we are willing to go beyond pairwise comparisons? If we know something about the structure of the values — for instance, that they are integers in a bounded range — then it turns out that we can exploit that structure to sort in linear time. In this chapter we study three such algorithms: counting sort, radix sort, and bucket sort. We also turn to a related problem — selection — and present two algorithms that find the k-th smallest element in O(n) time without sorting: randomized quickselect and the deterministic median-of-medians algorithm.

Breaking the comparison lower bound

The lower bound from Chapter 4 applies to comparison-based sorting: algorithms that learn about the input only by comparing pairs of elements. The decision-tree argument shows that any comparison-based algorithm must traverse a binary tree of height at least log₂(n!) = Ω(n log n), because there are n! possible permutations and each leaf of the decision tree corresponds to one permutation.

This lower bound, however, does not apply if we use operations other than comparisons and know more about the values in the underlying array. For example, if the values are integers, we can look at their individual digits. And if the values are bounded, we can use them directly as array indices. These non-comparison operations give us information that comparison-based algorithms cannot access, and this is what allows us to sort faster.

The obvious trade-off we are making here is generality: comparison-based sorting works for any totally ordered type, while the algorithms in this chapter require specific value structure (integers, bounded range, uniform distribution).

Counting sort

Counting sort is the simplest linear-time sorting algorithm. It works for non-negative integer values in a known range and sorts by counting how many times each value appears.

The algorithm

  1. Create an array counts of size max + 1, where max is the largest value in the input, initialized to zeros.
  2. For each element in the input, increment counts[element].
  3. Compute prefix sums: replace each counts[i] with the sum of all counts for values ≤ i. After this step, counts[i] tells us the position just after the last occurrence of value i in the sorted output.
  4. Walk the input array in reverse, placing each element at position counts[element] - 1 and decrementing the count. Walking in reverse ensures stability.

Implementation

export function countingSort(elements: number[]): number[] {
  if (elements.length <= 1) {
    return elements.slice(0);
  }

  const max = Math.max(...elements);
  const counts = new Array<number>(max + 1).fill(0);

  // Count occurrences
  for (const val of elements) {
    counts[val]!++;
  }

  // Compute prefix sums (cumulative counts)
  for (let i = 1; i <= max; i++) {
    counts[i]! += counts[i - 1]!;
  }

  // Build output array in reverse for stability
  const output = new Array<number>(elements.length);
  for (let i = elements.length - 1; i >= 0; i--) {
    const val = elements[i]!;
    counts[val]!--;
    output[counts[val]!] = val;
  }

  return output;
}
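
As a quick sanity check, calling the function on the array traced in the next subsection returns a new sorted array and leaves the input untouched:

const input = [4, 2, 2, 8, 3, 3, 1];
const sorted = countingSort(input);
console.log(sorted); // [1, 2, 2, 3, 3, 4, 8]
console.log(input);  // [4, 2, 2, 8, 3, 3, 1] (unchanged)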

Tracing through an example

Let us sort [4, 2, 2, 8, 3, 3, 1].

Step 1–2: Count occurrences. The maximum value is 8, so we create counts of size 9:

Index   0  1  2  3  4  5  6  7  8
counts  0  1  2  2  1  0  0  0  1

Step 3: Prefix sums. Each entry becomes the cumulative count:

Index   0  1  2  3  4  5  6  7  8
counts  0  1  3  5  6  6  6  6  7

The prefix sums tell us: 0 elements are ≤ 0, 1 element is ≤ 1, 3 elements are ≤ 2, and so on.

Step 4: Place elements (reverse scan).

i  A[i]  counts[A[i]] before  Output position  counts[A[i]] after
6  1     1                    0                0
5  3     5                    4                4
4  3     4                    3                3
3  8     7                    6                6
2  2     3                    2                2
1  2     2                    1                1
0  4     6                    5                5

Result: [1, 2, 2, 3, 3, 4, 8].

Notice that the two 2s and the two 3s appear in the same relative order as in the input — counting sort is stable.

Stability

Counting sort's stability is not an accident; it is a consequence of scanning the input in reverse during the placement step. When we encounter the last occurrence of a value (scanning right to left), we place it at the highest available position for that value. The second-to-last occurrence goes one position earlier, and so on. This preserves the original relative order among elements with equal values.

Stability matters when sorting records by one field while preserving order on another, and it is essential for counting sort's role as a subroutine in radix sort.
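
To make the stability guarantee concrete, here is a small keyed variant (a sketch, not part of the chapter's library code; the Student type and the keyOf accessor are illustrative) that sorts records by a non-negative integer field while preserving the input order of records with equal keys:

interface Student {
  name: string;
  score: number;
}

// Stable counting sort of records by a non-negative integer key.
// Records with equal keys keep their original relative order because
// the placement loop scans the input in reverse, exactly as above.
function countingSortBy(
  records: Student[],
  keyOf: (r: Student) => number,
): Student[] {
  if (records.length <= 1) {
    return records.slice(0);
  }

  const max = Math.max(...records.map(keyOf));
  const counts = new Array<number>(max + 1).fill(0);

  for (const r of records) {
    counts[keyOf(r)] = counts[keyOf(r)]! + 1;
  }
  for (let i = 1; i <= max; i++) {
    counts[i] = counts[i]! + counts[i - 1]!;
  }

  const output = new Array<Student>(records.length);
  for (let i = records.length - 1; i >= 0; i--) {
    const key = keyOf(records[i]!);
    counts[key] = counts[key]! - 1;
    output[counts[key]!] = records[i]!;
  }
  return output;
}

// Alice and Carol both scored 3 and keep their original order:
countingSortBy(
  [
    { name: 'Alice', score: 3 },
    { name: 'Bob', score: 1 },
    { name: 'Carol', score: 3 },
  ],
  (r) => r.score,
);
// → Bob (1), Alice (3), Carol (3)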

Complexity analysis

Time. The algorithm makes four passes:

  1. Finding the maximum: O(n).
  2. Counting occurrences: O(n).
  3. Computing prefix sums: O(k).
  4. Placing elements in the output: O(n).

Total: O(n + k), where k is the maximum value.

Space. The counts array uses O(k) space, and the output array uses O(n) space. Total: O(n + k).

The core trade-off. At this point the reader might wonder: if counting sort runs in linear time, why don't we always use it instead of comparison-based algorithms? The answer is that counting sort trades space for speed, and this trade-off is only possible because we assume the input consists of non-negative integers in a known range [0, k]. The algorithm allocates an auxiliary counts array of size k + 1 and uses the element values directly as array indices — an operation that comparison-based algorithms never perform. It is this extra structural knowledge that lets us bypass the comparison lower bound.

When is counting sort practical? When k = O(n), the space overhead is proportional to the input size, and counting sort runs in O(n) time — excellent. But when k ≫ n, the trade-off breaks down: we pay O(k) space and O(k) time for a mostly empty counts array while gaining nothing. For instance, if the values range up to 10⁹ but the array has only 10³ elements, counting sort performs about 10⁹ operations and allocates a billion-entry counts array occupying roughly 4 GB of memory (at 4 bytes per integer) — while a comparison-based sort finishes in roughly 10³ · log₂ 10³ ≈ 10⁴ operations using O(n) space (a few kilobytes at 4 bytes per integer).

Other limitations. Counting sort also cannot handle negative integers (without shifting), floating-point numbers, or strings — any type whose values cannot serve as array indices. Comparison-based sorting, by contrast, works for any totally ordered type. So counting sort is a more specialized algorithm with more limited applicability: extremely fast under the right conditions, but inapplicable or wasteful otherwise.

Properties

Property    Counting sort
Time        O(n + k)
Space       O(n + k)
Stable      Yes
In-place    No
Value type  Non-negative integers in [0, k]

Radix sort

Radix sort extends counting sort to handle integers with many digits. Instead of sorting on the entire value at once (which would require a counts array as large as the value range), radix sort processes one digit at a time, from least significant to most significant.

The algorithm

  1. Find the maximum element to determine the number of digits d. Number the digit positions from left to right, so that digit 1 is the most significant (leftmost) digit and digit d is the least significant (rightmost, units) digit.
  2. For p = d, d − 1, …, 1 (i.e., from the rightmost digit to the leftmost):
    • Sort the array by digit p using a stable sort (counting sort restricted to digits 0–9).

The key insight is that we must process digits from least significant to most significant, and that sorting by each digit must be stable. After sorting by the units digit, elements with the same units digit are in a consistent order. When we then sort by the tens digit, stability ensures that elements with the same tens digit remain sorted by their units digit — and so on.

Why least significant digit first?

It may at first seem counterintuitive to start with the least significant digit. To understand why this is needed, let us try to sort by the most significant digit first and see where it leads. Suppose the elements' leading (most significant) digits are 3, 4, 6, and 7. Sorting by the most significant digit first gives us groups of elements starting with 3, 4, 6, and 7. But then sorting by the next digit within each group is exactly the original problem, only on smaller arrays — which means that we have made no progress toward a linear-time algorithm.

LSD (Least Significant Digit) radix sort avoids this by exploiting stability. Recall that digit positions are numbered from left to right, and we process them in reverse: d, d − 1, …, 1. After sorting by digit p, the relative order of the elements that agree on digit p is determined by the previous passes on digits p + 1, …, d (i.e., the digits to its right). And when we next sort by digit p − 1 (one position to the left), the stability of the sort preserves the ordering by digits p, …, d among the elements that share the same digit at position p − 1.

Implementation

The digit-level sorting function is a specialized counting sort that operates on a single digit position:

export function countingSortByDigit(
  elements: number[],
  position: number,
): number[] {
  const n = elements.length;
  if (n <= 1) {
    return elements.slice(0);
  }

  const output = new Array<number>(n);
  const counts = new Array<number>(10).fill(0);

  // Count occurrences of each digit at the given position
  for (const val of elements) {
    const digit = Math.floor(val / position) % 10;
    counts[digit]!++;
  }

  // Compute prefix sums
  for (let i = 1; i < 10; i++) {
    counts[i]! += counts[i - 1]!;
  }

  // Build output in reverse for stability
  for (let i = n - 1; i >= 0; i--) {
    const val = elements[i]!;
    const digit = Math.floor(val / position) % 10;
    counts[digit]!--;
    output[counts[digit]!] = val;
  }

  return output;
}

The main radix sort function calls this function for each digit position:

export function radixSort(elements: number[]): number[] {
  if (elements.length <= 1) {
    return elements.slice(0);
  }

  const max = Math.max(...elements);

  let result = elements.slice(0);

  // Process each digit position from least significant to most significant
  for (let position = 1; Math.floor(max / position) > 0; position *= 10) {
    result = countingSortByDigit(result, position);
  }

  return result;
}
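
A quick check with the array from the trace below:

radixSort([170, 45, 75, 90, 802, 24, 2, 66]);
// → [2, 24, 45, 66, 75, 90, 170, 802]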

Tracing through an example

Sort [170, 45, 75, 90, 802, 24, 2, 66].

Pass 1: Sort by units digit (position = 1):

Element  Units digit
170      0
45       5
75       5
90       0
802      2
24       4
2        2
66       6

After stable sort by units digit: [170, 90, 802, 2, 24, 45, 75, 66].

Pass 2: Sort by tens digit (position = 10):

Element  Tens digit
170      7
90       9
802      0
2        0
24       2
45       4
75       7
66       6

After stable sort by tens digit: [802, 2, 24, 45, 66, 170, 75, 90].

Notice that 802 and 2 both have tens digit 0, and they remain in the order established by Pass 1 (802 before 2) thanks to stability.

Pass 3: Sort by hundreds digit (position = 100):

Element  Hundreds digit
802      8
2        0
24       0
45       0
66       0
170      1
75       0
90       0

After stable sort by hundreds digit: [2, 24, 45, 66, 75, 90, 170, 802].

Result: [2, 24, 45, 66, 75, 90, 170, 802] is a sorted array.

Correctness

Claim. After i passes of LSD (Least Significant Digit) radix sort, the array is sorted with respect to the last i digits.

Proof by induction on the number of passes i:

  • Base case (i = 1). The first pass sorts by digit d (the units digit). Since counting sort is correct, the array is sorted with respect to the last 1 digit.

  • Inductive step. Assume that after i − 1 passes the array is sorted by the last i − 1 digits (that is, by digits d − i + 2, …, d). Pass i sorts by digit d − i + 1 (the next digit to the left). Consider two arbitrary elements x and y after pass i:

    • If x and y differ in digit d − i + 1: the sort on digit d − i + 1 places them correctly.
    • If x and y have the same digit at position d − i + 1: since the sort is stable, their relative order is preserved from the previous pass, which by the inductive hypothesis ordered them correctly by their last i − 1 digits.

    In both cases, x and y are correctly ordered by their last i digits.

Complexity analysis

Time. Radix sort makes d passes, where d is the number of digits in the maximum element. Each pass is a counting sort with k = 10 (the radix), which takes O(n + 10) = O(n) time. Total:

O(d · (n + 10)) = O(d · n)

For d = O(1) (bounded number of digits), this is O(n). More generally, if the values are in the range [0, n^c] for some constant c, then d = Θ(log n), and radix sort runs in O(n log n) — no better than comparison sort.

Why focus on the range [0, n^c]? Because it covers the cases that arise most often in practice. When we sort n items, the values are typically bounded by some polynomial in n: sorting people by age gives values in a small fixed range; sorting exam scores out of q questions gives values in [0, q]; sorting the edges of an n-vertex graph by integer weights bounded by a polynomial in n (as in Kruskal's minimum spanning tree algorithm) keeps the values within n^O(1). Fixed-width machine integers (32-bit or 64-bit) are an even tighter case: d is at most 10 or 20 decimal digits regardless of n, so radix sort runs in O(n). Ranges beyond n^c — for instance, values up to 2^n — would give d = Θ(n) digits and an O(n²) running time, but such ranges rarely arise in practice because they require exponentially large numbers that do not fit in standard integer types.

Radix sort achieves true linear time only when d is bounded by a constant independent of n.

Space. Each counting sort pass uses O(n + 10) = O(n) auxiliary space.

Properties

Property    Radix sort
Time        O(d · n), where d = number of digits
Space       O(n)
Stable      Yes
Value type  Non-negative integers

Bucket sort

Bucket sort works well when the input is drawn from a uniform distribution over a known range. It distributes elements into equal-width buckets, sorts each bucket individually (typically with insertion sort), and concatenates the sorted buckets.

The algorithm

  1. Scan the input to find min and max — the smallest and largest values — in O(n) time.
  2. Create b empty buckets spanning the range [min, max] (by default b = n, i.e., as many buckets as elements).
  3. Place each element in its bucket: element x goes to bucket ⌊((x − min) / (max − min)) · (b − 1)⌋.
  4. Sort each bucket using insertion sort (we reuse the implementation from Chapter 4).
  5. Concatenate all buckets.

The expression (x − min) / (max − min) normalizes x to a value in [0, 1], where min maps to 0 and max maps to 1. Multiplying by b − 1 scales this to the range [0, b − 1], and taking the floor gives a valid bucket index. We multiply by b − 1 rather than b so that the maximum element maps to bucket b − 1 (the last bucket) rather than to bucket b (which would be out of bounds): when x = max, the normalized value is exactly 1, and ⌊1 · (b − 1)⌋ = b − 1.

Implementation

The implementation imports insertionSort from Chapter 4 (Elementary Sorting) to sort individual buckets. Since each bucket is expected to contain only a few elements under a uniform distribution, insertion sort's O(m²) cost per bucket of size m is negligible.

import { insertionSort } from
  '../04-elementary-sorting/insertion-sort';

export function bucketSort(
  elements: number[],
  bucketCount?: number,
): number[] {
  const n = elements.length;
  if (n <= 1) {
    return elements.slice(0);
  }

  const max = Math.max(...elements);
  const min = Math.min(...elements);

  // If all elements are the same, return a copy
  if (max === min) {
    return elements.slice(0);
  }

  const numBuckets = bucketCount ?? n;
  const range = max - min;

  // Create empty buckets
  const buckets: number[][] = [];
  for (let i = 0; i < numBuckets; i++) {
    buckets.push([]);
  }

  // Distribute elements into buckets.
  // The formula maps each value to a bucket index
  // in [0, numBuckets - 1].
  // Since val is in [min, max], (val - min) / range
  // is in [0, 1], so
  // Math.floor(... * (numBuckets - 1)) is always
  // in [0, numBuckets - 1].
  for (const val of elements) {
    const index = Math.floor(
      ((val - min) / range) * (numBuckets - 1)
    );
    buckets[index]!.push(val);
  }

  // Sort each bucket using insertion sort
  // and concatenate
  const result: number[] = [];
  for (const bucket of buckets) {
    insertionSort(bucket);
    for (const val of bucket) {
      result.push(val);
    }
  }

  return result;
}
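
A quick check with the data from the trace below (the second argument is optional; here we pass 10 buckets explicitly):

bucketSort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68], 10);
// → [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]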

Tracing through an example

Sort [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68] using 10 buckets.

Step 1: Find min and max. Scan the array: min = 0.12, max = 0.94, so the range is 0.94 − 0.12 = 0.82.

Step 2: Create 10 empty buckets (indices 0 through 9).

Step 3: Distribute elements into buckets. For each element x, compute the bucket index ⌊((x − 0.12) / 0.82) · 9⌋:

Element  (x − min) / range  × 9     Bucket
0.78     0.8049             7.2439  7
0.17     0.0610             0.5488  0
0.39     0.3293             2.9634  2
0.26     0.1707             1.5366  1
0.72     0.7317             6.5854  6
0.94     1.0000             9.0000  9
0.21     0.1098             0.9878  0
0.12     0.0000             0.0000  0
0.23     0.1341             1.2073  1
0.68     0.6829             6.1463  6

State of the buckets after distribution:

Bucket  Elements
0       [0.17, 0.21, 0.12]
1       [0.26, 0.23]
2       [0.39]
3–5     []
6       [0.72, 0.68]
7       [0.78]
8       []
9       [0.94]

Step 4: Sort each bucket using insertion sort.

  • Bucket 0: [0.17, 0.21, 0.12] → sort → [0.12, 0.17, 0.21]
  • Bucket 1: [0.26, 0.23] → sort → [0.23, 0.26]
  • Bucket 2: [0.39] → already sorted
  • Buckets 3–5: empty, nothing to do
  • Bucket 6: [0.72, 0.68] → sort → [0.68, 0.72]
  • Bucket 7: [0.78] → already sorted
  • Bucket 8: empty, nothing to do
  • Bucket 9: [0.94] → already sorted

Step 5: Concatenate all buckets in order: [0.12, 0.17, 0.21] + [0.23, 0.26] + [0.39] + [0.68, 0.72] + [0.78] + [0.94].

Result: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94].

Complexity analysis

Expected time under uniform distribution. Suppose we distribute n elements into b buckets (the typical choice is b = n). If the elements are drawn uniformly from the range, each bucket receives roughly n/b elements on average. Sorting each bucket with insertion sort takes O((n/b)²) expected time per bucket, and there are b buckets, so the total expected sorting cost is O(b · (n/b)²) = O(n²/b). Combined with the O(n) distribution and concatenation steps, the total expected time is O(n + n²/b).

More precisely, the total expected cost of sorting all buckets is proportional to the sum Σᵢ nᵢ², where nᵢ is the number of elements in bucket i. Using probability theory (specifically, the binomial distribution and the variance identity E[X²] = Var[X] + E[X]²), one can show that this sum equals:

E[Σᵢ nᵢ²] = n(1 − 1/b) + n²/b

The total expected cost is O(n) whenever n²/b = O(n), which holds if and only if b = Ω(n) — that is, b is at least proportional to n. In particular:

  • If b is a fixed constant (say b = 10), the dominant term becomes n²/b = Θ(n²), which is no better than a single insertion sort over the whole array.
  • If b grows with n but more slowly — say b = √n — we get n²/b = Θ(n^1.5), which is still worse than comparison-based sorting.
  • If b = cn for any constant c > 0 (i.e., b = Θ(n)), we get n²/b = Θ(n), and the expected time is linear.
  • Using b ≫ n would still give O(n) time but would waste O(b) space on empty buckets with no further benefit.

The choice b = n is therefore the most common in textbooks and implementations: it is the simplest choice, it achieves the linear expected time, and the space for the bucket array is O(n) — the same order as the input itself.

For example, substituting b = n:

E[Σᵢ nᵢ²] = n(1 − 1/n) + n²/n = 2n − 1 = Θ(n)

A note to the reader. The derivation of the formula above requires familiarity with probability theory (expected values, variance, binomial distributions). The proof is provided below for the sake of completeness. If the math feels unfamiliar, feel free to skip ahead to the Worst case paragraph — the key takeaway is simply that bucket sort achieves O(n) expected time when b = Θ(n) and the input is uniformly distributed.

Proof of the expected cost formula (optional).

The cost of sorting all buckets with insertion sort is proportional to Σᵢ nᵢ², where nᵢ is the number of elements in bucket i. By linearity of expectation:

E[Σᵢ nᵢ²] = Σᵢ E[nᵢ²]

If the elements are drawn independently and uniformly from the range, each element lands in bucket i with probability p = 1/b. We can think of each element as an independent Bernoulli trial: it either falls into bucket i (success, probability 1/b) or it does not (failure, probability 1 − 1/b). The count nᵢ is the total number of successes in n independent trials, so nᵢ follows a binomial distribution B(n, 1/b).

The mean and variance of a binomial random variable B(n, p) are np and np(1 − p) respectively. To see why, consider a single Bernoulli trial X that is 1 with probability p and 0 with probability 1 − p. Its mean is E[X] = p and its variance is E[X²] − E[X]² = p − p² = p(1 − p) (since X² = X, we have E[X²] = p). The binomial count nᵢ is the sum of n such independent trials. By linearity of expectation, the means add up to np. Because the trials are independent, their variances also add: Var[nᵢ] = np(1 − p).

Substituting p = 1/b:

E[nᵢ] = n/b and Var[nᵢ] = (n/b)(1 − 1/b)

Here Var[nᵢ] is the variance of nᵢ — a standard measure from probability theory that quantifies how much a random variable deviates from its mean. It is defined as Var[X] = E[(X − E[X])²]. Let us expand the square inside the expectation. Writing μ = E[X] for brevity:

Var[X] = E[(X − μ)²] = E[X² − 2μX + μ²]

By linearity of expectation, and using the fact that μ is a constant:

Var[X] = E[X²] − 2μ · E[X] + μ² = E[X²] − 2μ² + μ² = E[X²] − μ²

Substituting μ = E[X] back and rearranging, we obtain a useful identity that relates the second moment to the variance and the squared mean:

E[X²] = Var[X] + E[X]²

Applying this identity to nᵢ, we get:

E[nᵢ²] = Var[nᵢ] + E[nᵢ]² = (n/b)(1 − 1/b) + (n/b)²

Summing over the b buckets:

E[Σᵢ nᵢ²] = b · ((n/b)(1 − 1/b) + (n/b)²) = n(1 − 1/b) + n²/b
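
As an informal empirical check of this formula (not part of the chapter's code), we can throw n elements uniformly into b = n buckets and average Σᵢ nᵢ² over many trials; the average comes out close to 2n − 1:

// Monte Carlo estimate of E[Σ nᵢ²] for n elements thrown uniformly
// into n buckets (an illustrative experiment, not library code).
function averageSumOfSquares(n: number, trials: number): number {
  let total = 0;
  for (let t = 0; t < trials; t++) {
    const counts = new Array<number>(n).fill(0);
    for (let i = 0; i < n; i++) {
      const bucket = Math.floor(Math.random() * n);
      counts[bucket] = counts[bucket]! + 1;
    }
    total += counts.reduce((acc, c) => acc + c * c, 0);
  }
  return total / trials;
}

averageSumOfSquares(1000, 100); // ≈ 2 · 1000 − 1 = 1999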

Worst case. If all elements happen to fall into the same bucket, the insertion sort on that bucket takes Θ(n²) steps. This can happen if the distribution of the elements is far from uniform. So the more uniformly the elements are spread over the range, the better the running time of bucket sort.

Space. The buckets collectively hold n elements, plus O(b) for the bucket array structure. With b = n, the total is O(n).

Properties

Property         Bucket sort
Expected time    O(n) (uniform distribution)
Worst-case time  O(n²)
Space            O(n)
Stable           Yes (with a stable per-bucket sort)
Value type       Numeric values in a known range

Comparison of linear-time sorts

Algorithm      Time           Space     Stable  Assumptions
Counting sort  O(n + k)       O(n + k)  Yes     Integer values in [0, k]
Radix sort     O(d · n)       O(n)      Yes     Integer values with d digits
Bucket sort    O(n) expected  O(n)      Yes     Uniformly distributed values

All three algorithms achieve linear time under specific conditions. Counting sort is simplest and best when the value range is not much larger than n. Radix sort extends counting sort to larger ranges by processing one digit at a time. Bucket sort is ideal for floating-point data with a known, roughly uniform distribution.

These algorithms do not contradict the comparison lower bound — they simply bypass it by using non-comparison operations (such as indexing into an array by value or extracting digits).

The selection problem

We now turn to a different problem. Given an unsorted array of n elements and an index k (with 0 ≤ k < n), find the k-th smallest element — the element that would be at index k if the array were sorted.

Special cases include:

  • k = 0: the minimum (trivially solvable in O(n)).
  • k = n − 1: the maximum (trivially solvable in O(n)).
  • k = ⌊n/2⌋: the median.

The naive approach is to sort the array (O(n log n)) and return the element at index k. But can we do better? It turns out that the answer is yes: we can solve the selection problem in O(n) time and get by with far fewer comparisons if we are smart about the process.

Quickselect

Quickselect (also known as Hoare's selection algorithm) is the selection analogue of quicksort. Like quicksort, it partitions the array around a pivot. But unlike quicksort, it only recurses into one side — the side that contains the desired element.

The algorithm

  1. Choose a random pivot and partition the array.
  2. If the pivot lands at position k, we are done.
  3. If k < the pivot's position, recurse on the left partition.
  4. If k > the pivot's position, recurse on the right partition.

Implementation

export function quickselect(
  elements: number[],
  k: number,
): number {
  if (elements.length === 0) {
    throw new RangeError('Cannot select from an empty array');
  }
  if (k < 0 || k >= elements.length) {
    throw new RangeError(
      `k=${k} is out of bounds for array of length ${elements.length}`,
    );
  }

  return select(elements, 0, elements.length - 1, k);
}

function select(
  arr: number[],
  left: number,
  right: number,
  k: number,
): number {
  if (left === right) {
    return arr[left]!;
  }

  const pivotIndex = randomizedPartition(arr, left, right);

  if (k === pivotIndex) {
    return arr[pivotIndex]!;
  } else if (k < pivotIndex) {
    return select(arr, left, pivotIndex - 1, k);
  } else {
    return select(arr, pivotIndex + 1, right, k);
  }
}

The randomizedPartition function is identical to the one used in randomized quicksort: choose a random element, swap it to the end, partition using the Lomuto scheme.
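
For reference, here is a minimal standalone sketch of such a partition function, assuming a plain numeric array and the Lomuto scheme (the chapter's actual implementation is the one shared with quicksort in Chapter 5):

function randomizedPartition(
  arr: number[],
  left: number,
  right: number,
): number {
  // Pick a uniformly random index in [left, right] and move that
  // element to the end so it becomes the pivot.
  const r = left + Math.floor(Math.random() * (right - left + 1));
  [arr[r], arr[right]] = [arr[right]!, arr[r]!];

  // Lomuto scheme: grow a prefix of elements smaller than the pivot,
  // then place the pivot just after it. The return value is the
  // pivot's final (sorted) position.
  const pivot = arr[right]!;
  let i = left - 1;
  for (let j = left; j < right; j++) {
    if (arr[j]! < pivot) {
      i++;
      [arr[i], arr[j]] = [arr[j]!, arr[i]!];
    }
  }
  [arr[i + 1], arr[right]] = [arr[right]!, arr[i + 1]!];
  return i + 1;
}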

Tracing through an example

Find the 3rd smallest element (k = 2, zero-indexed).

Suppose the array's sorted order begins 1, 2, 5, 7, …, so the answer is 5.

Iteration 1: Suppose the random pivot is 7 (index 0). After partitioning, the three smaller elements 5, 1, 2 end up to its left and the pivot lands at index 3.

We want k = 2 < 3, so recurse on the left partition (indices 0–2).

Iteration 2: Suppose the random pivot is 1 (index 1 of the subarray). After partitioning, the subarray becomes [1, 2, 5] with the pivot at index 0.

We want k = 2 > 0, so recurse on the right partition (indices 1–2).

Iteration 3: Suppose the random pivot is 5 (index 2). After partitioning, the pivot stays at index 2.

We want k = 2, and the pivot is at index 2. Done! Return arr[2] = 5.

Complexity analysis

Expected time. The analysis is similar to randomized quicksort. With a random pivot, the expected partition splits the array roughly in half. But unlike quicksort, we recurse into only one partition, so the expected work halves at each level:

n + n/2 + n/4 + n/8 + ⋯ ≤ 2n = O(n)

More precisely, the expected number of comparisons is at most 4n (by an analysis similar to the randomized quicksort proof, summing indicator random variables over pairs of elements).

Why does n + n/2 + n/4 + ⋯ ≤ 2n?

This identity can look surprising the first time you see it — how can adding up infinitely many positive numbers give a finite result? Here is a concrete example, followed by the general argument.

Concrete example. Take n = 100. The sum is 100 + 50 + 25 + 12.5 + ⋯. Adding the first few terms: 150, then 175, then 187.5, then 193.75, and so on — each new term gets us closer to 200 but never past it. The sum converges to exactly 200.

What kind of sum is this? This is a geometric series — a sum where each term is a fixed fraction of the previous one (here each term is half the preceding term). The general formula for an infinite geometric series with first term a and common ratio r (where |r| < 1) is:

a + ar + ar² + ar³ + ⋯ = a / (1 − r)

In our case a = n and r = 1/2, so the sum is n / (1 − 1/2) = 2n.

Step-by-step derivation. Where does the formula come from? Let S denote the infinite sum:

S = a + ar + ar² + ar³ + ⋯

Multiply both sides by r:

rS = ar + ar² + ar³ + ar⁴ + ⋯

Now subtract the second equation from the first. On the right-hand side, almost every term cancels — ar cancels with ar, ar² cancels with ar², and so on — leaving only the first term:

S − rS = a

Factoring the left-hand side gives S(1 − r) = a, so S = a / (1 − r). This works whenever |r| < 1, because the terms shrink toward zero and the infinite sum converges to a finite value. (When |r| ≥ 1, the terms do not shrink and the sum diverges.)

Why it matters here. Each recursive call does less and less work — the first call scans n elements, the next scans roughly n/2, then n/4, and so on. Even though there are infinitely many terms in the idealized sum, the terms shrink so fast that the total never exceeds 2n. The first call already accounts for half the total work, the second call for a quarter, and so on — the contributions diminish rapidly and their sum converges to a finite value.

Worst case. If the pivot always lands at one extreme, we have:

n + (n − 1) + (n − 2) + ⋯ + 2 + 1 = Θ(n²)

This is the same worst case as quicksort, but it is extremely unlikely with random pivots.

Why does n + (n − 1) + (n − 2) + ⋯ + 1 = Θ(n²)?

This identity is the counterpart to the geometric series above, and it explains why the worst case is so much worse than the expected case.

Concrete example. Take n = 100. The sum is 100 + 99 + 98 + ⋯ + 1 = 5050. Compare this to the geometric series above, which converges to 200. The arithmetic sum is about 25 times larger, because the terms shrink much more slowly.

What kind of sum is this? If the pivot always lands at one extreme, each recursive call reduces the problem size by only 1 instead of halving it. Expanding the recurrence, we get:

T(n) = n + (n − 1) + (n − 2) + ⋯ + 2 + 1

This is an arithmetic series — a sum where each term decreases by a fixed amount (here, by 1) rather than by a fixed ratio.

Step-by-step derivation. Where does the formula come from? A classic trick attributed to Gauss: write the sum forwards and backwards and add them together:

S = 1 + 2 + 3 + ⋯ + (n − 1) + n
S = n + (n − 1) + (n − 2) + ⋯ + 2 + 1

Adding these two rows term by term, every column sums to n + 1:

2S = (n + 1) + (n + 1) + ⋯ + (n + 1) = n(n + 1)

Therefore S = n(n + 1) / 2. The total is roughly n²/2, which is Θ(n²).

Why it matters here. Contrast this with the expected case: halving the problem size each time gives a geometric series that converges to 2n, while reducing it by just 1 each time gives an arithmetic series that grows to n(n + 1)/2. The difference between "half as much work each step" and "one less unit of work each step" is the difference between linear and quadratic time.

Properties

Property         Quickselect
Expected time    O(n)
Worst-case time  O(n²)
Space            O(log n) expected stack depth
Deterministic    No (randomized)

Median of medians

Can we achieve worst-case O(n) selection? The answer is yes, using a clever pivot-selection strategy called median of medians (also known as BFPRT, after its five inventors: Blum, Floyd, Pratt, Rivest, and Tarjan, 1973).

The idea: instead of choosing a random pivot, choose a pivot that is guaranteed to be near the median, ensuring that each partition eliminates a constant fraction of the elements.

The algorithm

Let's call the overall procedure select(A, k) — it returns the k-th smallest element of array A (zero-indexed). The algorithm works in place: it rearranges A so that position k holds the correct element (with everything to its left ≤ it and everything to its right ≥ it), then returns that element's value.

  1. Divide the n elements into ⌈n/5⌉ groups of 5 (the last group may have fewer).
  2. Find the median of each group by sorting it (sorting 5 elements takes constant time). Collect these medians into a new array M.
  3. Find a good pivot by calling select(M, ⌊|M|/2⌋) — that is, use this same selection procedure on the smaller array M to find its median. The result is the "median of medians" pivot.
  4. Partition the original array around this pivot, placing all smaller elements to its left and all larger elements to its right. The pivot ends up at some position p.
  5. Select from the correct side. If p = k, return the pivot. Otherwise, call select on the left partition (if k < p) or the right partition (if k > p) to continue searching for the k-th element.

Implementation

The implementation reuses insertionSortRange from Chapter 4 to sort small groups in place, and the Lomuto partition from Chapter 5 to partition around the chosen pivot. Each group has at most 5 elements, so insertion sort's O(m²) cost for m ≤ 5 is O(1) per group.

import { insertionSortRange } from
  '../04-elementary-sorting/insertion-sort';
import { partition } from
  '../05-efficient-sorting/quick-sort';

export function medianOfMedians(
  elements: number[],
  k: number,
): number {
  if (elements.length === 0) {
    throw new RangeError('Cannot select from an empty array');
  }
  if (k < 0 || k >= elements.length) {
    throw new RangeError(
      `k=${k} is out of bounds for array of length ${elements.length}`,
    );
  }

  return selectMoM(elements, 0, elements.length - 1, k);
}

The core recursive function:

function selectMoM(
  arr: number[],
  left: number,
  right: number,
  k: number,
): number {
  // Base case: small enough to sort directly
  if (right - left < 5) {
    insertionSortRange(arr, left, right);
    return arr[k]!;
  }

  // Step 1: Divide into groups of 5, find median of each group
  const numGroups = Math.ceil((right - left + 1) / 5);
  for (let i = 0; i < numGroups; i++) {
    const groupLeft = left + i * 5;
    const groupRight = Math.min(groupLeft + 4, right);

    // Sort the group to find its median
    insertionSortRange(arr, groupLeft, groupRight);

    // Move the median of this group to the front of the array
    const medianIndex =
      groupLeft + Math.floor((groupRight - groupLeft) / 2);
    swap(arr, medianIndex, left + i);
  }

  // Step 2: Recursively find the median of the medians
  const medianOfMediansIndex =
    left + Math.floor((numGroups - 1) / 2);
  selectMoM(arr, left, left + numGroups - 1, medianOfMediansIndex);

  // The median of medians is now at medianOfMediansIndex
  // Step 3: Lomuto partition (from Ch. 5) with the chosen pivot
  const pivotIndex = partition(
    arr, left, right, numberComparator,
    medianOfMediansIndex
  )!;

  if (k === pivotIndex) {
    return arr[pivotIndex]!;
  } else if (k < pivotIndex) {
    return selectMoM(arr, left, pivotIndex - 1, k);
  } else {
    return selectMoM(arr, pivotIndex + 1, right, k);
  }
}
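
The two small helpers referenced above, swap and numberComparator, are one-liners; minimal versions are shown here for completeness (a sketch; in the book's repository they would more likely come from a shared utility module):

// Exchange two positions of the array in place
function swap(arr: number[], i: number, j: number): void {
  const tmp = arr[i]!;
  arr[i] = arr[j]!;
  arr[j] = tmp;
}

// Ascending comparator for numbers, in the shape expected by partition
function numberComparator(a: number, b: number): number {
  return a - b;
}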

Tracing through an example

Find the median (k = 7, zero-indexed) in:

[12, 3, 5, 7, 19, 26, 4, 1, 8, 15, 20, 11, 9, 2, 6]

The sorted array is [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 19, 20, 26], so the answer at k = 7 is 8.

Step 1: Divide into groups of 5 and find medians.

Group  Elements           Sorted             Median
1      [12, 3, 5, 7, 19]  [3, 5, 7, 12, 19]  7
2      [26, 4, 1, 8, 15]  [1, 4, 8, 15, 26]  8
3      [20, 11, 9, 2, 6]  [2, 6, 9, 11, 20]  9

Step 2: Median of medians. The medians are [7, 8, 9]. The median of this group is 8.

Step 3: Partition around 8. Using 8 as the pivot, elements < 8 go left and elements > 8 go right:

[3, 5, 7, 4, 1, 2, 6, 8, 12, 19, 26, 15, 20, 11, 9]

The pivot lands at index 7. We want k = 7, and the pivot is at index 7. Done! Return 8.

A note on the partition. The partition above is shown conceptually: it illustrates which elements end up on which side of the pivot, but lists them in their original order for readability. The actual in-place algorithm rearranges the array before partitioning — sorting each group of 5, swapping medians to the front — and the Lomuto partition itself does not preserve relative order. The arrangement of elements within each partition does not matter for correctness or efficiency; all that matters is that the pivot lands at position 7 with all smaller elements to its left and all larger elements to its right.

Complexity analysis

Time. Let T(n) be the worst-case time to select from n elements. The algorithm does the following work at each level of recursion:

  • O(n): the non-recursive work — sorting each of the ⌈n/5⌉ groups of 5 takes O(1) per group, so O(n) total; and partitioning the full array around the pivot is a single linear scan, also O(n).
  • T(n/5): the recursive call to find the median of the group medians. After sorting each group and extracting one median per group (that is the O(n) work above), we have an array of ⌈n/5⌉ medians and need to find their median — that is, select the element at position ⌊⌈n/5⌉/2⌋ among the ⌈n/5⌉ medians (since the median of m elements is the element at position ⌊m/2⌋ in sorted order, and here m = ⌈n/5⌉). This is a selection problem on a smaller input, so we solve it by calling the same median-of-medians algorithm recursively, which costs T(n/5).
  • T(7n/10): the recursive call to select within the partition that contains the target index, since neither partition can have more than roughly 7n/10 elements; let us prove this below.

Why the pivot guarantees a 30/70 worst-case split. We have g = ⌈n/5⌉ groups, and the pivot x is the median of the g group medians. If we lined up all g group medians in sorted order, x would sit in the middle. That means at least ⌈g/2⌉ of the group medians are ≤ x (the ones at or below the middle position), and at least ⌈g/2⌉ are ≥ x. Now consider any group whose median is ≤ x. Within that group of 5, the median is the 3rd-smallest element, so the median and the two elements below it are all ≤ x — that is 3 elements per group. So the total number of elements guaranteed to be ≤ x is at least:

3 · ⌈g/2⌉

However, the last group may have fewer than 5 elements, so its median may sit above fewer than 2 elements — in the worst case (a group of 1 or 2), it contributes only 1 element instead of 3, a shortfall of 2. Every other group, including the group that contains x itself, is a full group of 5 and contributes the full 3 elements (x's group contributes x plus the 2 elements below it). So the total number of elements guaranteed to be ≤ x is at least 3⌈g/2⌉ − 2, which simplifies to at least 3n/10 − 2.

By a symmetric argument, x is also ≥ at least 3n/10 − 2 elements. Therefore, after partitioning around x, neither side can contain more than n − (3n/10 − 2) = 7n/10 + 2 elements. Since the additive constant 2 does not affect the asymptotic argument, this is at most (roughly) 7n/10.

The recurrence. Combining the three terms above gives:

T(n) ≤ T(n/5) + T(7n/10) + cn, for some constant c > 0.

Note that the master theorem does not directly apply here, because the recurrence has two recursive terms of different sizes rather than the standard form T(n) = a · T(n/b) + f(n). The key observation is that n/5 + 7n/10 = 9n/10 < n: the two recursive subproblem sizes add up to a strict fraction of n, and this is what makes the recursion tree converge and makes T(n) behave as O(n).

Why does T(n) ≤ T(n/5) + T(7n/10) + cn behave as O(n)?

Consider the recursion tree. At each level, sum the non-recursive work done by all active subproblems:

Level  Subproblem sizes               Work at this level
0      n                              cn
1      n/5, 7n/10                     (9/10) · cn
2      n/25, 7n/50, 7n/50, 49n/100    (9/10)² · cn
k      (all branches at depth k)      (9/10)^k · cn

Because the fractions sum to 9/10 < 1, the total work at each level shrinks by a factor of 9/10. Summing over all levels gives a convergent geometric series:

cn · (1 + 9/10 + (9/10)² + ⋯) = cn · 1 / (1 − 9/10) = 10 · cn = O(n)

The strict inequality 9/10 < 1 is what makes the series converge and collapses all levels into a single O(n) term.

Why groups of 5? The choice of 5 is not arbitrary — it is the smallest group size that makes the recurrence work out to O(n). Groups of 3 and 4 both fail, and for related reasons.

Groups of 3. We would have ⌈n/3⌉ groups. The median-of-medians step would recurse on n/3 elements to find the pivot. By the same counting argument, the pivot would be guaranteed to be ≤ (and ≥) at least roughly n/3 elements (2 elements per qualifying group instead of 3, since in a group of 3 the median has only 1 element below it, plus itself). So each partition side would have at most roughly 2n/3 elements. The recurrence would be:

T(n) ≤ T(n/3) + T(2n/3) + cn

Since n/3 + 2n/3 = n, the two subproblems add up to the full input size, so this solves to Θ(n log n), not O(n).

Groups of 4. We would have ⌈n/4⌉ groups. The lower median of a group of 4 is the 2nd element, which has only 1 element below it — so each qualifying group contributes just 2 elements, the same as groups of 3. The pivot is guaranteed to be ≤ (and ≥) at least roughly n/4 elements, giving a worst-case partition of size 3n/4. The recurrence would be:

T(n) ≤ T(n/4) + T(3n/4) + cn

Since n/4 + 3n/4 = n, this again solves to Θ(n log n), not O(n).

Why 5 succeeds where 3 and 4 fail. The crucial quantity is how many elements each qualifying group contributes to the "guaranteed ≤ pivot" set: the median plus every element below it. For groups of 3 this is 2; for groups of 4 it is also 2 (the lower median of 4 elements has only 1 element below it, the same as the median of 3); but for groups of 5 it jumps to 3. This extra element per group is what pushes the fraction sum below 1: for groups of 5 we get 1/5 + 7/10 = 9/10 < 1, giving the convergent geometric series that yields O(n).

Note that groups of 6 also give O(n): each qualifying group contributes 3 elements (the same as groups of 5), but there are fewer groups (n/6 instead of n/5), so fewer qualifying groups (n/12 instead of n/10), giving only about n/4 guaranteed elements instead of 3n/10. The worst-case partition is therefore 3n/4, and the fractions sum to 1/6 + 3/4 = 11/12 < 1. But 5 is preferred over 6 because smaller groups are cheaper to sort, so the group-sorting phase does less total work. More generally, odd group sizes are more efficient because their median is the true middle element: in a group of 2t + 1 (odd), t + 1 elements sit at or below the median and the same number sit at or above, so every element "pulls its weight." In an even group, the lower median is biased — one element sits above the median without contributing to the guarantee, wasting a slot. Groups of 4 and 3 end up contributing the same count (2) even though groups of 4 are larger.

In order to have T(n) = O(n) we need the fractions to sum to strictly less than 1.

Why does T(n) ≤ T(n/3) + T(2n/3) + cn solve to Θ(n log n) and not O(n)?

Consider the recursion tree. At each level, sum the non-recursive work done by all active subproblems:

Level  Subproblem sizes             Work at this level
0      n                            cn
1      n/3, 2n/3                    cn
2      n/9, 2n/9, 2n/9, 4n/9        cn
k      (all branches at depth k)    cn

Because the fractions sum to exactly 1, the total input covered at every level is exactly n, so every level costs cn. The deepest branch is the 2n/3 arm, which reaches its base case after log_{3/2} n = Θ(log n) levels. Multiplying gives:

T(n) = Θ(n log n)

This is the same reason mergesort — whose recurrence T(n) = 2T(n/2) + cn also has fractions summing to 1/2 + 1/2 = 1 — is Θ(n log n).

Contrast this with the groups-of-5 recurrence analysed in the box above: there 1/5 + 7/10 = 9/10 < 1, so the per-level work shrinks geometrically and sums to O(n). The difference between O(n) and Θ(n log n) comes down to whether the fractions sum to strictly less than 1 (the geometric series converges) or exactly 1 (every level costs cn and there are Θ(log n) levels).

Space. The recursion has depth O(log n) (each level reduces the problem by a constant factor), so the stack space is O(log n).

Practical considerations

While the median-of-medians algorithm is a beautiful theoretical result — it proved that deterministic linear-time selection is possible — it is rarely used in practice. The constant factor hidden in the O(n) is large (roughly 5–10× slower than randomized quickselect for typical inputs). Randomized quickselect is almost always faster in practice because:

  1. It avoids the overhead of computing medians of groups.
  2. Random pivots are usually good enough.
  3. The probability of quadratic behavior is exponentially small in n — bounded by (1/2)^Θ(n) — making it vanishingly unlikely for any practical input size.

To see why, let's call a pivot "good" if it lands in the middle 50% of the subarray (between the 25th and 75th percentiles), and "bad" otherwise — like a coin flip, each pivot is good with probability 1/2. A good pivot shrinks the subproblem to at most 3/4 of its current size, so just a handful of good pivots are enough to collapse the subarray through geometric shrinkage: after g good pivots at most (3/4)^g · n elements remain. For quadratic behavior, almost all pivots throughout the entire recursion would have to be bad — good pivots must appear too rarely to drive this shrinkage. But that is like flipping a fair coin hundreds of times and getting almost no heads. Each flip is independent, so the probability drops exponentially with n: out of the Θ(n) pivots we expect about half to be good, but quadratic behavior needs almost all of them to be bad. Each additional pivot that "must be bad" multiplies the probability by 1/2, so after the Θ(n) pivots in the "bad" scenario the probability is at most (1/2)^Θ(n). For an input of just n = 100, this is already astronomically small.

The practical value of median of medians is primarily as a fallback: some implementations (e.g., the introselect algorithm in the C++ STL) start with quickselect and switch to median of medians if the recursion depth grows too large, guaranteeing O(n) worst-case time while maintaining fast average-case performance.
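
A sketch of that fallback strategy, built from the two selection routines in this chapter (medianOfMedians, plus the randomizedPartition helper used by quickselect); the depth limit of 2 · ⌈log₂ n⌉ is a common choice, not a value prescribed by the text:

// Introselect-style selection: run quickselect with random pivots, but
// if the recursion gets unusually deep (a long run of unlucky pivots),
// finish deterministically with median of medians.
export function introselect(elements: number[], k: number): number {
  if (elements.length === 0) {
    throw new RangeError('Cannot select from an empty array');
  }
  if (k < 0 || k >= elements.length) {
    throw new RangeError(`k=${k} is out of bounds`);
  }
  const maxDepth = 2 * Math.ceil(Math.log2(elements.length + 1));
  return selectWithLimit(
    elements.slice(0), 0, elements.length - 1, k, maxDepth
  );
}

function selectWithLimit(
  arr: number[],
  left: number,
  right: number,
  k: number,
  depthLeft: number,
): number {
  if (left === right) {
    return arr[left]!;
  }
  if (depthLeft === 0) {
    // Depth budget exhausted: fall back to the O(n) worst-case algorithm
    return medianOfMedians(arr.slice(left, right + 1), k - left);
  }
  const pivotIndex = randomizedPartition(arr, left, right);
  if (k === pivotIndex) {
    return arr[pivotIndex]!;
  } else if (k < pivotIndex) {
    return selectWithLimit(arr, left, pivotIndex - 1, k, depthLeft - 1);
  } else {
    return selectWithLimit(arr, pivotIndex + 1, right, k, depthLeft - 1);
  }
}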

Properties

Property         Median of medians
Worst-case time  O(n)
Space            O(log n) recursion stack depth (each level reduces the problem by a constant fraction)
Deterministic    Yes
Practical        Slower than quickselect due to large constants

Summary

In this chapter we studied algorithms that break the comparison-based sorting barrier and solve the selection problem in linear time:

  • Counting sort sorts non-negative integers in the range [0, k] in O(n + k) time by counting occurrences and computing prefix sums. It is stable and serves as a building block for radix sort.

  • Radix sort extends counting sort to handle integers with multiple digits, sorting digit by digit from least significant to most significant. It runs in O(d · n) time, where d is the number of digits. Correctness depends on the subroutine sort being stable, so that the ordering established by earlier digits is preserved when sorting by later ones.

  • Bucket sort distributes elements into n buckets, sorts each bucket, and concatenates. Under a uniform distribution, the expected time is O(n). Its worst case is O(n²), when all elements land in one bucket.

  • Quickselect finds the k-th smallest element in O(n) expected time by partitioning around a random pivot and recursing into one side. It is the practical algorithm of choice for selection.

  • Median of medians achieves O(n) worst-case selection through a carefully chosen pivot: the median of the group medians. While theoretically optimal, its large constant factor makes it slower than randomized quickselect in practice.

The linear-time sorting algorithms teach an important lesson: algorithmic lower bounds depend on the model of computation. The Ω(n log n) bound is real for comparison-based sorting, but by stepping outside the comparison model — using integers as array indices (counting sort), extracting digits (radix sort) — we can do better. The selection algorithms show that finding a single order statistic is fundamentally easier than fully sorting, requiring only O(n) time regardless of the method.

Exercises

Exercise 6.1. Trace through counting sort on a short input of your choice that contains the value 3 at least twice. Show the counts array after each step (counting, prefix sums, placement). Verify that the sort is stable by tracking the original indices of the elements with value 3.

Exercise 6.2. Radix sort processes digits from least significant to most significant, using a stable sort at each step. What goes wrong if we process digits from most significant to least significant? Give a concrete example where MSD radix sort (without special handling) produces incorrect output.

Exercise 6.3. Counting sort uses O(k) space for the counts array, where k is the maximum value. If we need to sort n integers in the range [0, n² − 1], we could use counting sort directly with k = n², or we could use radix sort with a base-n representation (2 digits). Compare the time and space complexity of both approaches.

Exercise 6.4. Consider a modification of quickselect where, instead of choosing a random pivot, we always choose the first element as the pivot. Describe an input of size n for which this modified quickselect takes Θ(n²) time to find the median. Then describe an input for which it takes O(n) time.