Linear-Time Sorting and Selection
In Chapter 4 we proved a lower bound: every comparison-based sorting algorithm must make $\Omega(n \log n)$ comparisons in the worst case. The efficient algorithms of Chapter 5 — merge sort, quicksort, heapsort — all meet this bound, and none can beat it. But what if we are willing to go beyond pairwise comparisons? If we know something about the structure of the keys — for instance, that they are integers in a bounded range — we can exploit that structure to sort in linear time. In this chapter we study three such algorithms: counting sort, radix sort, and bucket sort. We also turn to a related problem — selection — and present two algorithms that find the $k$th smallest element in $O(n)$ time without sorting: randomized quickselect and the deterministic median-of-medians algorithm.
Breaking the comparison lower bound
The lower bound from Chapter 4 applies to comparison-based sorting: algorithms that learn about the input only by comparing pairs of elements. The decision-tree argument shows that any comparison-based algorithm must traverse a binary tree of height at least $\log_2(n!) = \Omega(n \log n)$, because there are $n!$ possible permutations and each leaf of the decision tree corresponds to one permutation.
This lower bound does not apply if we use operations other than comparisons. If the keys are integers, we can look at individual digits. If the keys are bounded, we can use them as array indices. These non-comparison-based operations give us additional information that comparison-based algorithms cannot access, and this is what allows us to sort faster.
The trade-off is generality: comparison-based sorting works for any totally ordered type, while the algorithms in this chapter require specific key structure (integers, bounded range, uniform distribution).
Counting sort
Counting sort is the simplest linear-time sorting algorithm. It works for non-negative integer keys in a known range and sorts by counting how many times each value appears.
The algorithm
- Create an array `counts` of size $k + 1$, initialized to zeros.
- For each element in the input, increment `counts[element]`.
- Compute prefix sums: replace each `counts[i]` with the sum of all counts for values $\le i$. After this step, `counts[i]` tells us the position after the last occurrence of value $i$ in the sorted output.
- Walk the input array in reverse, placing each element at position `counts[element] - 1` and decrementing the count. Walking in reverse ensures stability.
Implementation
export function countingSort(elements: number[]): number[] {
if (elements.length <= 1) {
return elements.slice(0);
}
const max = Math.max(...elements);
const counts = new Array<number>(max + 1).fill(0);
// Count occurrences
for (const val of elements) {
counts[val]!++;
}
// Compute prefix sums (cumulative counts)
for (let i = 1; i <= max; i++) {
counts[i]! += counts[i - 1]!;
}
// Build output array in reverse for stability
const output = new Array<number>(elements.length);
for (let i = elements.length - 1; i >= 0; i--) {
const val = elements[i]!;
counts[val]!--;
output[counts[val]!] = val;
}
return output;
}
Tracing through an example
Let us sort $A = [4, 2, 2, 8, 3, 3, 1]$.
Step 1–2: Count occurrences. The maximum value is 8, so we create counts of size 9:
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Count | 0 | 1 | 2 | 2 | 1 | 0 | 0 | 0 | 1 |
Step 3: Prefix sums. Each entry becomes the cumulative count:
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Prefix sum | 0 | 1 | 3 | 5 | 6 | 6 | 6 | 6 | 7 |
The prefix sum tells us: 0 elements are $\le 0$, 1 element is $\le 1$, 3 elements are $\le 2$, and so on.
Step 4: Place elements (reverse scan).
| $i$ | $A[i]$ | `counts[A[i]]` before | Output position | `counts[A[i]]` after |
|---|---|---|---|---|
| 6 | 1 | 1 | 0 | 0 |
| 5 | 3 | 5 | 4 | 4 |
| 4 | 3 | 4 | 3 | 3 |
| 3 | 8 | 7 | 6 | 6 |
| 2 | 2 | 3 | 2 | 2 |
| 1 | 2 | 2 | 1 | 1 |
| 0 | 4 | 6 | 5 | 5 |
Result: $[1, 2, 2, 3, 3, 4, 8]$.
Notice that the two 2s and the two 3s appear in the same relative order as in the input — counting sort is stable.
Stability
Counting sort's stability is not an accident; it is a consequence of scanning the input in reverse during the placement step. When we encounter the last occurrence of a value (scanning right to left), we place it at the highest available position for that value. The second-to-last occurrence goes one position earlier, and so on. This preserves the original relative order among elements with equal keys.
Stability matters when sorting records by one key while preserving order on another, and it is essential for counting sort's role as a subroutine in radix sort.
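To see the stability argument concretely, the placement step can be run on records that carry a payload alongside the key. The `Item` type and `countingSortItems` below are illustrative names, not part of the chapter's code:

```typescript
interface Item {
  key: number; // sort key: a small non-negative integer
  tag: string; // payload used to observe the original input order
}

// Stable counting sort on the `key` field, mirroring the placement
// step described above (reverse scan over the input).
export function countingSortItems(items: Item[], maxKey: number): Item[] {
  const counts = new Array<number>(maxKey + 1).fill(0);
  for (const item of items) {
    counts[item.key]!++;
  }
  for (let i = 1; i <= maxKey; i++) {
    counts[i]! += counts[i - 1]!;
  }
  const output = new Array<Item>(items.length);
  for (let i = items.length - 1; i >= 0; i--) {
    const item = items[i]!;
    counts[item.key]!--;
    output[counts[item.key]!] = item;
  }
  return output;
}
```

Sorting `[{key: 2, tag: 'a'}, {key: 1, tag: 'b'}, {key: 2, tag: 'c'}]` yields the tags in order `b, a, c`: the two records with key 2 keep their input order.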
Complexity analysis
Time. The algorithm makes four passes:
- Finding the maximum: $O(n)$.
- Counting occurrences: $O(n)$.
- Computing prefix sums: $O(k)$.
- Placing elements in the output: $O(n)$.
Total: $O(n + k)$, where $k$ is the maximum value.
Space. The counts array uses $O(k)$ space, and the output array uses $O(n)$ space. Total: $O(n + k)$.
When is counting sort practical? When $k = O(n)$, counting sort runs in $O(n)$ time and is excellent. When $k \gg n$ (for example, sorting 10 numbers drawn from a range of size $10^9$), the space and time become prohibitive, and a comparison-based sort would be faster.
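If the keys may be negative, or are large but tightly clustered, the same idea works after shifting by the minimum, with a counts array spanning only $[\min, \max]$. A sketch under those assumptions (the name `countingSortRange` is illustrative, not part of the chapter's code):

```typescript
export function countingSortRange(elements: number[]): number[] {
  if (elements.length <= 1) {
    return elements.slice(0);
  }
  const min = Math.min(...elements);
  const max = Math.max(...elements);
  // The counts array spans the actual range [min, max], not [0, max]
  const counts = new Array<number>(max - min + 1).fill(0);
  for (const val of elements) {
    counts[val - min]!++;
  }
  // Rebuild the output directly from the counts. This variant is not
  // stable, which is fine when the keys are the whole records.
  const output: number[] = [];
  for (let i = 0; i < counts.length; i++) {
    for (let c = 0; c < counts[i]!; c++) {
      output.push(i + min);
    }
  }
  return output;
}
```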
Properties
| Property | Counting sort |
|---|---|
| Time | $O(n + k)$ |
| Space | $O(n + k)$ |
| Stable | Yes |
| In-place | No |
| Key type | Non-negative integers in $[0, k]$ |
Radix sort
Radix sort extends counting sort to handle integers with many digits. Instead of sorting on the entire key at once (which would require a counts array as large as the key range), radix sort processes one digit at a time, from least significant to most significant.
The algorithm
- Find the maximum element to determine the number of digits $d$.
- For each digit position from least significant to most significant:
- Sort the array by that digit using a stable sort (counting sort restricted to digits 0–9).
The key insight is that we must process digits from least significant to most significant, and each digit sort must be stable. After sorting by the units digit, elements with the same units digit are in a consistent order. When we then sort by the tens digit, stability ensures that elements with the same tens digit remain sorted by their units digit — and so on.
Why least significant digit first?
It may seem counterintuitive to start with the least significant digit. Consider sorting an array of three-digit numbers. If we sorted by the most significant digit first, we would get groups of numbers sharing a leading digit (say, groups starting with 3, 4, 6, 7). But then sorting by the next digit within each group is exactly the original problem on smaller arrays — we have made no progress toward a linear-time algorithm.
LSD radix sort avoids this by exploiting stability. After sorting by digit $i$ (counting from the least significant), the relative order of elements that agree on digit $i$ is determined by the previous passes on digits $1, \dots, i - 1$. When we sort by digit $i + 1$, stability preserves this order among elements with the same digit at position $i + 1$.
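The effect of combining stable passes can be demonstrated without a full radix sort. JavaScript's built-in `Array.prototype.sort` is required to be stable since ES2019, so two stable comparator passes over two-digit numbers suffice; `lsdTwoDigit` is an illustrative helper, not the chapter's implementation:

```typescript
// Two-digit LSD radix sort built from any stable sort. We lean on the
// guaranteed stability of Array.prototype.sort (ES2019); a real radix
// sort would use counting sort for each pass instead.
export function lsdTwoDigit(values: number[]): number[] {
  // Pass 1: stable sort by the units digit
  const byUnits = values.slice(0).sort((a, b) => (a % 10) - (b % 10));
  // Pass 2: stable sort by the tens digit; ties keep units-digit order
  return byUnits.sort(
    (a, b) => (Math.floor(a / 10) % 10) - (Math.floor(b / 10) % 10),
  );
}
```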
Implementation
The digit-level sorting subroutine is a specialized counting sort that operates on a single digit position:
export function countingSortByDigit(
elements: number[],
exp: number,
): number[] {
const n = elements.length;
if (n <= 1) {
return elements.slice(0);
}
const output = new Array<number>(n);
const counts = new Array<number>(10).fill(0);
// Count occurrences of each digit at position exp
for (const val of elements) {
const digit = Math.floor(val / exp) % 10;
counts[digit]!++;
}
// Compute prefix sums
for (let i = 1; i < 10; i++) {
counts[i]! += counts[i - 1]!;
}
// Build output in reverse for stability
for (let i = n - 1; i >= 0; i--) {
const val = elements[i]!;
const digit = Math.floor(val / exp) % 10;
counts[digit]!--;
output[counts[digit]!] = val;
}
return output;
}
The main radix sort function calls this subroutine for each digit position:
export function radixSort(elements: number[]): number[] {
if (elements.length <= 1) {
return elements.slice(0);
}
const max = Math.max(...elements);
let result = elements.slice(0);
// Process each digit position from least significant to most significant
for (let exp = 1; Math.floor(max / exp) > 0; exp *= 10) {
result = countingSortByDigit(result, exp);
}
return result;
}
Tracing through an example
Sort $[170, 45, 75, 90, 802, 24, 2, 66]$.
Pass 1: Sort by units digit (`exp = 1`):
| Element | Units digit |
|---|---|
| 170 | 0 |
| 45 | 5 |
| 75 | 5 |
| 90 | 0 |
| 802 | 2 |
| 24 | 4 |
| 2 | 2 |
| 66 | 6 |
After stable sort by units digit: $[170, 90, 802, 2, 24, 45, 75, 66]$.
Pass 2: Sort by tens digit (`exp = 10`):
| Element | Tens digit |
|---|---|
| 170 | 7 |
| 90 | 9 |
| 802 | 0 |
| 2 | 0 |
| 24 | 2 |
| 45 | 4 |
| 75 | 7 |
| 66 | 6 |
After stable sort by tens digit: $[802, 2, 24, 45, 66, 170, 75, 90]$.
Notice that 802 and 2 both have tens digit 0, and they remain in the order established by Pass 1 (802 before 2) thanks to stability.
Pass 3: Sort by hundreds digit (`exp = 100`):
| Element | Hundreds digit |
|---|---|
| 802 | 8 |
| 2 | 0 |
| 24 | 0 |
| 45 | 0 |
| 66 | 0 |
| 170 | 1 |
| 75 | 0 |
| 90 | 0 |
After stable sort by hundreds digit: $[2, 24, 45, 66, 75, 90, 170, 802]$.
Result: $[2, 24, 45, 66, 75, 90, 170, 802]$. Sorted!
Correctness
Claim. After $i$ passes of LSD radix sort, the array is sorted with respect to the last $i$ digits.
Proof by induction. After the first pass, the array is sorted by the units digit (the counting sort is correct). Assume after $i$ passes the array is sorted by the last $i$ digits. Consider two elements $x$ and $y$ after pass $i + 1$:
- If $x$ and $y$ differ in digit $i + 1$: the sort on digit $i + 1$ places them correctly.
- If $x$ and $y$ have the same digit at position $i + 1$: since the sort is stable, their relative order is preserved from the previous pass, which (by hypothesis) ordered them correctly by their last $i$ digits.
In both cases, the elements are correctly ordered by their last $i + 1$ digits.
Complexity analysis
Time. Radix sort makes $d$ passes, where $d$ is the number of digits in the maximum element. Each pass is a counting sort with $k = 10$ (the radix), which takes $O(n + 10) = O(n)$ time. Total:

$$O(d \cdot n).$$

For $d = O(1)$ (a bounded number of digits), this is $O(n)$. More generally, if the values are in the range $[0, n^c)$ for some constant $c$, then $d = \Theta(\log n)$, and radix sort runs in $O(n \log n)$ — no better than comparison sort. Radix sort achieves true linear time only when $d$ is bounded by a constant independent of $n$.
Space. Each counting sort pass uses $O(n + 10) = O(n)$ auxiliary space.
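The radix need not be 10. With base 256 and bitwise digit extraction, non-negative 32-bit keys need only $d = \lceil 32/8 \rceil = 4$ passes, trading a larger counts array per pass for fewer passes. A sketch along the lines of `countingSortByDigit` above (the name `radixSort256` is ours):

```typescript
// Radix sort with base 256: each pass sorts stably on one byte of the
// key, extracted with an unsigned shift and a mask. Assumes the inputs
// are non-negative integers below 2^32.
export function radixSort256(elements: number[]): number[] {
  let result = elements.slice(0);
  for (let shift = 0; shift < 32; shift += 8) {
    const counts = new Array<number>(256).fill(0);
    for (const val of result) {
      counts[(val >>> shift) & 0xff]!++;
    }
    for (let i = 1; i < 256; i++) {
      counts[i]! += counts[i - 1]!;
    }
    const output = new Array<number>(result.length);
    // Reverse scan keeps each pass stable, as in countingSortByDigit
    for (let i = result.length - 1; i >= 0; i--) {
      const digit = (result[i]! >>> shift) & 0xff;
      counts[digit]!--;
      output[counts[digit]!] = result[i]!;
    }
    result = output;
  }
  return result;
}
```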
Properties
| Property | Radix sort |
|---|---|
| Time | $O(d \cdot n)$, where $d$ = number of digits |
| Space | $O(n)$ |
| Stable | Yes |
| Key type | Non-negative integers |
Bucket sort
Bucket sort works well when the input is drawn from a uniform distribution over a known range. It distributes elements into equal-width buckets, sorts each bucket individually (typically with insertion sort), and concatenates the sorted buckets.
The algorithm
- Determine the range $[\min, \max]$ of the input.
- Create $m$ empty buckets spanning the range.
- Place each element in its bucket: element $x$ goes to bucket $\lfloor \frac{x - \min}{\max - \min} \cdot m \rfloor$ (with the maximum clamped into the last bucket).
- Sort each bucket using insertion sort.
- Concatenate all buckets.
Implementation
export function bucketSort(
elements: number[],
bucketCount?: number,
): number[] {
const n = elements.length;
if (n <= 1) {
return elements.slice(0);
}
const max = Math.max(...elements);
const min = Math.min(...elements);
// If all elements are the same, return a copy
if (max === min) {
return elements.slice(0);
}
const numBuckets = bucketCount ?? n;
const range = max - min;
// Create empty buckets
const buckets: number[][] = [];
for (let i = 0; i < numBuckets; i++) {
buckets.push([]);
}
// Distribute elements into buckets
for (const val of elements) {
let index = Math.floor(
((val - min) / range) * numBuckets
);
// The maximum element maps exactly to numBuckets; clamp it into
// the last bucket
if (index >= numBuckets) {
index = numBuckets - 1;
}
buckets[index]!.push(val);
}
// Sort each bucket using insertion sort and concatenate
const result: number[] = [];
for (const bucket of buckets) {
insertionSortInPlace(bucket);
for (const val of bucket) {
result.push(val);
}
}
return result;
}
The subroutine insertionSortInPlace is efficient for the small bucket sizes expected under uniform distribution:
function insertionSortInPlace(arr: number[]): void {
for (let i = 1; i < arr.length; i++) {
const key = arr[i]!;
let j = i - 1;
while (j >= 0 && arr[j]! > key) {
arr[j + 1] = arr[j]!;
j--;
}
arr[j + 1] = key;
}
}
Tracing through an example
Sort $[0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]$ using 10 buckets.
Here the range is $[0.12, 0.94]$ (width $0.82$), so each bucket covers an interval of width roughly $0.082$.
| Bucket | Elements |
|---|---|
| 0 | [0.17, 0.12] |
| 1 | [0.26, 0.21, 0.23] |
| 3 | [0.39] |
| 6 | [0.68] |
| 7 | [0.72] |
| 8 | [0.78] |
| 9 | [0.94] |
After sorting each bucket and concatenating: $[0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]$.
Complexity analysis
Expected time under uniform distribution. If elements are drawn independently and uniformly from $[0, 1)$, then with $n$ buckets each element lands in a uniformly random bucket. The expected number of elements per bucket is 1. By a balls-into-bins argument, the expected total cost of sorting all buckets is:

$$E\left[\sum_{i=0}^{n-1} O(n_i^2)\right] = O\left(\sum_{i=0}^{n-1} E[n_i^2]\right),$$

where $n_i$ is the number of elements in bucket $i$. Since each element independently falls into bucket $i$ with probability $1/n$, we have $n_i \sim \mathrm{Binomial}(n, 1/n)$, so $E[n_i] = 1$ and $E[n_i^2] = \mathrm{Var}(n_i) + E[n_i]^2 = (1 - 1/n) + 1 = 2 - 1/n$. Summing over $n$ buckets:

$$\sum_{i=0}^{n-1} E[n_i^2] = n\left(2 - \frac{1}{n}\right) = O(n).$$

Including the distribution and concatenation steps, the total expected time is $O(n)$.
Worst case. If all elements fall into one bucket, we pay $O(n^2)$ for insertion sort on that bucket. This happens when the distribution is far from uniform.
Space. The buckets collectively hold $n$ elements, plus $O(n)$ for the bucket array structure. Total: $O(n)$.
Properties
| Property | Bucket sort |
|---|---|
| Expected time | $O(n)$ (uniform distribution) |
| Worst-case time | $O(n^2)$ |
| Space | $O(n)$ |
| Stable | Yes (with stable per-bucket sort) |
| Key type | Numeric keys in a known range |
Comparison of linear-time sorts
| Algorithm | Time | Space | Stable | Assumptions |
|---|---|---|---|---|
| Counting sort | $O(n + k)$ | $O(n + k)$ | Yes | Integer keys in $[0, k]$ |
| Radix sort | $O(d \cdot n)$ | $O(n)$ | Yes | Integer keys with $d$ digits |
| Bucket sort | $O(n)$ expected | $O(n)$ | Yes | Uniformly distributed keys |
All three algorithms achieve linear time under specific conditions. Counting sort is simplest and best when the key range is not much larger than $n$. Radix sort extends counting sort to larger ranges by processing one digit at a time. Bucket sort is ideal for floating-point data with a known, roughly uniform distribution.
None of these algorithms contradicts the comparison lower bound — they bypass it by using non-comparison operations (indexing into an array by key value, extracting digits).
The selection problem
We now turn to a different problem. Given an unsorted array of $n$ elements and an integer $k$ (with $0 \le k \le n - 1$), find the $k$th smallest element — the element that would be at index $k$ if the array were sorted.
Special cases include:
- $k = 0$: the minimum (trivially solvable in $O(n)$).
- $k = n - 1$: the maximum.
- $k = \lfloor n/2 \rfloor$: the median.
The naive approach is to sort the array ($O(n \log n)$) and return the element at index $k$. Can we do better? Yes — we can solve the selection problem in $O(n)$ time.
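For reference, the naive approach is a one-liner over a sorted copy. This sketch (`selectBySorting` is an illustrative name) is the $O(n \log n)$ baseline that the rest of the chapter improves on:

```typescript
// Baseline selection: sort a copy, then index into it. O(n log n) time.
export function selectBySorting(elements: number[], k: number): number {
  if (k < 0 || k >= elements.length) {
    throw new RangeError(`k=${k} is out of bounds`);
  }
  const sorted = elements.slice(0).sort((a, b) => a - b);
  return sorted[k]!;
}
```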
Quickselect
Quickselect (also known as Hoare's selection algorithm) is the selection analogue of quicksort. Like quicksort, it partitions the array around a pivot. But unlike quicksort, it only recurses into one side — the side that contains the desired element.
The algorithm
- Choose a random pivot and partition the array.
- If the pivot lands at position $k$, we are done.
- If $k <$ the pivot's position, recurse on the left partition.
- If $k >$ the pivot's position, recurse on the right partition.
Implementation
export function quickselect(
elements: number[],
k: number,
): number {
if (elements.length === 0) {
throw new RangeError('Cannot select from an empty array');
}
if (k < 0 || k >= elements.length) {
throw new RangeError(
`k=${k} is out of bounds for array of length ${elements.length}`,
);
}
const copy = elements.slice(0);
return select(copy, 0, copy.length - 1, k);
}
function select(
arr: number[],
left: number,
right: number,
k: number,
): number {
if (left === right) {
return arr[left]!;
}
const pivotIndex = randomizedPartition(arr, left, right);
if (k === pivotIndex) {
return arr[pivotIndex]!;
} else if (k < pivotIndex) {
return select(arr, left, pivotIndex - 1, k);
} else {
return select(arr, pivotIndex + 1, right, k);
}
}
The randomizedPartition function is identical to the one used in randomized quicksort: choose a random element, swap it to the end, partition using the Lomuto scheme.
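For completeness, here is a sketch of `randomizedPartition` matching that description (a uniformly random pivot swapped to the end, followed by a Lomuto scan); the chapter otherwise assumes this function from the quicksort material rather than defining it:

```typescript
function swap(arr: number[], i: number, j: number): void {
  const tmp = arr[i]!;
  arr[i] = arr[j]!;
  arr[j] = tmp;
}

// Pick a random pivot in [left, right], move it to the end, and apply
// the Lomuto partition scheme. Returns the pivot's final index.
export function randomizedPartition(
  arr: number[],
  left: number,
  right: number,
): number {
  const choice = left + Math.floor(Math.random() * (right - left + 1));
  swap(arr, choice, right);
  const pivot = arr[right]!;
  let i = left - 1;
  for (let j = left; j < right; j++) {
    if (arr[j]! <= pivot) {
      i++;
      swap(arr, i, j);
    }
  }
  swap(arr, i + 1, right);
  return i + 1;
}
```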
Tracing through an example
Find the 3rd smallest element ($k = 2$, zero-indexed) in $[7, 5, 1, 3, 9, 8]$.
The sorted array would be $[1, 3, 5, 7, 8, 9]$, so the answer is 5.
Iteration 1: Suppose the random pivot is 7 (index 0). After partitioning: $[5, 1, 3, 7, 9, 8]$, pivot at index 3.
We want $k = 2 < 3$, so recurse on the left partition (indices 0–2).
Iteration 2: Suppose the random pivot is 1 (index 1 of the subarray). After partitioning: $[1, 3, 5, 7, 9, 8]$, pivot at index 0.
We want $k = 2 > 0$, so recurse on the right partition (indices 1–2).
Iteration 3: Suppose the random pivot is 5 (index 2). After partitioning: $[1, 3, 5, 7, 9, 8]$, pivot at index 2.
We want $k = 2$, and the pivot is at index 2. Done! Return 5.
Complexity analysis
Expected time. The analysis is similar to randomized quicksort. With a random pivot, the expected partition splits the array roughly in half. But unlike quicksort, we recurse into only one partition, so the expected work at each level halves:

$$n + \frac{n}{2} + \frac{n}{4} + \cdots \le 2n = O(n).$$
More precisely, the expected number of comparisons is at most $4n$ (by an analysis similar to the randomized quicksort proof, summing indicator random variables over pairs of elements).
Worst case. If the pivot always lands at one extreme, we have:

$$T(n) = T(n - 1) + O(n) = O(n^2).$$
This is the same worst case as quicksort, but it is extremely unlikely with random pivots.
Properties
| Property | Quickselect |
|---|---|
| Expected time | $O(n)$ |
| Worst-case time | $O(n^2)$ |
| Space | $O(n)$ for copy + $O(\log n)$ expected stack |
| Deterministic | No (randomized) |
Median of medians
Can we achieve worst-case $O(n)$ selection? The answer is yes, using a clever pivot-selection strategy called median of medians (also known as BFPRT, after its five inventors: Blum, Floyd, Pratt, Rivest, and Tarjan, 1973).
The idea: instead of choosing a random pivot, choose a pivot that is guaranteed to be near the median, ensuring that each partition eliminates a constant fraction of the elements.
The algorithm
- Divide the $n$ elements into $\lceil n/5 \rceil$ groups of 5 (the last group may be smaller).
- Find the median of each group by sorting (5 elements can be sorted in constant time).
- Recursively compute the median of these medians.
- Use this "median of medians" as the pivot for partitioning.
- Recurse into the appropriate partition (just like quickselect).
Why groups of 5?
The choice of 5 is not arbitrary. It is the smallest odd group size that makes the recurrence work out to $O(n)$. The median of medians is guaranteed to be larger than at least $3n/10 - O(1)$ elements and smaller than at least $3n/10 - O(1)$ elements. This means each recursive call operates on at most roughly $7n/10$ elements.
Here is why: the median of medians is larger than the medians of half the groups (roughly $n/10$ groups), and each of those medians is larger than 2 elements in its group. Therefore, the pivot is larger than at least $3 \cdot n/10 = 3n/10$ elements (ignoring boundary groups). By symmetry, it is also smaller than at least $3n/10$ elements. The worst-case partition therefore contains at most $7n/10$ elements.
Implementation
export function medianOfMedians(
elements: number[],
k: number,
): number {
if (elements.length === 0) {
throw new RangeError('Cannot select from an empty array');
}
if (k < 0 || k >= elements.length) {
throw new RangeError(
`k=${k} is out of bounds for array of length ${elements.length}`,
);
}
const copy = elements.slice(0);
return selectMoM(copy, 0, copy.length - 1, k);
}
The core recursive function:
function selectMoM(
arr: number[],
left: number,
right: number,
k: number,
): number {
// Base case: small enough to sort directly
if (right - left < 5) {
insertionSortRange(arr, left, right);
return arr[k]!;
}
// Step 1: Divide into groups of 5, find median of each
const numGroups = Math.ceil((right - left + 1) / 5);
for (let i = 0; i < numGroups; i++) {
const groupLeft = left + i * 5;
const groupRight = Math.min(groupLeft + 4, right);
insertionSortRange(arr, groupLeft, groupRight);
const medianIndex =
groupLeft + Math.floor((groupRight - groupLeft) / 2);
swap(arr, medianIndex, left + i);
}
// Step 2: Recursively find the median of the medians
const medianOfMediansIndex =
left + Math.floor((numGroups - 1) / 2);
selectMoM(arr, left, left + numGroups - 1, medianOfMediansIndex);
// Step 3: Partition around the median of medians
const pivotIndex = partitionAroundPivot(
arr, left, right, medianOfMediansIndex
);
if (k === pivotIndex) {
return arr[pivotIndex]!;
} else if (k < pivotIndex) {
return selectMoM(arr, left, pivotIndex - 1, k);
} else {
return selectMoM(arr, pivotIndex + 1, right, k);
}
}
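For the code above to run, the helpers `swap`, `insertionSortRange`, and `partitionAroundPivot` must be defined. The chapter does not list them, so the following are sketches consistent with how they are called:

```typescript
function swap(arr: number[], i: number, j: number): void {
  const tmp = arr[i]!;
  arr[i] = arr[j]!;
  arr[j] = tmp;
}

// Insertion sort restricted to the inclusive range [left, right].
function insertionSortRange(
  arr: number[],
  left: number,
  right: number,
): void {
  for (let i = left + 1; i <= right; i++) {
    const key = arr[i]!;
    let j = i - 1;
    while (j >= left && arr[j]! > key) {
      arr[j + 1] = arr[j]!;
      j--;
    }
    arr[j + 1] = key;
  }
}

// Lomuto partition around the element currently at pivotIdx; returns
// the pivot's final index.
function partitionAroundPivot(
  arr: number[],
  left: number,
  right: number,
  pivotIdx: number,
): number {
  swap(arr, pivotIdx, right);
  const pivot = arr[right]!;
  let i = left - 1;
  for (let j = left; j < right; j++) {
    if (arr[j]! <= pivot) {
      i++;
      swap(arr, i, j);
    }
  }
  swap(arr, i + 1, right);
  return i + 1;
}
```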
Tracing through an example
Find the median ($k = 7$, zero-indexed) in:
$[12, 3, 5, 7, 19, 26, 4, 1, 8, 15, 20, 11, 9, 2, 6]$.
The sorted array is $[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 19, 20, 26]$, so the answer at $k = 7$ is 8.
Step 1: Divide into groups of 5 and find medians.
| Group | Elements | Sorted | Median |
|---|---|---|---|
| 1 | [12, 3, 5, 7, 19] | [3, 5, 7, 12, 19] | 7 |
| 2 | [26, 4, 1, 8, 15] | [1, 4, 8, 15, 26] | 8 |
| 3 | [20, 11, 9, 2, 6] | [2, 6, 9, 11, 20] | 9 |
Step 2: Median of medians. The medians are $[7, 8, 9]$. The median of this group is 8.
Step 3: Partition around 8. Using 8 as the pivot, the 7 elements smaller than 8 go left and the 7 elements larger than 8 go right.
The pivot lands at index 7. We want $k = 7$, and the pivot is at index 7. Done! Return 8.
Complexity analysis
Time. Let $T(n)$ be the worst-case time for selecting from $n$ elements.
The algorithm does the following work:
- Sorting $\lceil n/5 \rceil$ groups of 5: $O(n)$ total.
- Finding the median of medians: $T(\lceil n/5 \rceil)$.
- Partitioning: $O(n)$.
- Recursing into the larger partition: at most $T(7n/10 + 6)$.
This gives the recurrence:

$$T(n) \le T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n).$$

We claim $T(n) = O(n)$. To verify, assume $T(m) \le cm$ for all $m < n$ and some constant $c$. Then:

$$T(n) \le \frac{cn}{5} + \frac{7cn}{10} + an = \frac{9cn}{10} + an \le cn,$$

provided $c \ge 10a$ (ignoring the lower-order ceiling and $+6$ terms). Since $1/5 + 7/10 = 9/10 < 1$, the two recursive calls together operate on a shrinking fraction of the input, and the algorithm runs in $O(n)$ time.
Space. The recursion has depth $O(\log n)$ (each level reduces the problem by a constant factor), so the stack space is $O(\log n)$. Combined with the $O(n)$ copy, total space is $O(n)$.
Practical considerations
While the median-of-medians algorithm is a beautiful theoretical result — it proved that deterministic linear-time selection is possible — it is rarely used in practice. The constant factor hidden in the $O(n)$ is large (roughly 5–10× slower than randomized quickselect for typical inputs). Randomized quickselect is almost always faster in practice because:
- It avoids the overhead of computing medians of groups.
- Random pivots are usually good enough.
- The probability of quadratic behavior is astronomically small.
The practical value of median of medians is primarily as a fallback: some implementations (e.g., introselect, used by typical C++ standard library implementations of `std::nth_element`) start with quickselect and switch to median of medians if the recursion depth grows too large, guaranteeing $O(n)$ worst-case time while maintaining fast average-case performance.
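A sketch of the introselect idea follows. To stay short it falls back to sorting the remaining range (which is $O(n \log n)$, not $O(n)$) rather than calling `selectMoM`; a production version would use median of medians as the fallback. The function name, depth budget, and fallback choice here are ours:

```typescript
// Introselect sketch: iterative quickselect with random pivots, plus a
// depth budget. When the budget is exhausted, fall back to a method
// with a worst-case guarantee (here: sorting the remaining range).
export function introselect(elements: number[], k: number): number {
  const arr = elements.slice(0);
  let left = 0;
  let right = arr.length - 1;
  // Allow about 2·log2(n) partition rounds before falling back
  let budget = 2 * Math.ceil(Math.log2(arr.length + 1)) + 1;
  while (left < right) {
    if (budget-- === 0) {
      const range = arr.slice(left, right + 1).sort((a, b) => a - b);
      return range[k - left]!;
    }
    // Lomuto partition around a random pivot
    const choice = left + Math.floor(Math.random() * (right - left + 1));
    [arr[choice], arr[right]] = [arr[right]!, arr[choice]!];
    const pivot = arr[right]!;
    let i = left - 1;
    for (let j = left; j < right; j++) {
      if (arr[j]! <= pivot) {
        i++;
        [arr[i], arr[j]] = [arr[j]!, arr[i]!];
      }
    }
    [arr[i + 1], arr[right]] = [arr[right]!, arr[i + 1]!];
    const p = i + 1;
    if (k === p) return arr[p]!;
    if (k < p) right = p - 1;
    else left = p + 1;
  }
  return arr[left]!;
}
```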
Properties
| Property | Median of medians |
|---|---|
| Worst-case time | $O(n)$ |
| Space | $O(n)$ |
| Deterministic | Yes |
| Practical | Slower than quickselect due to large constants |
Chapter summary
In this chapter we studied algorithms that break the comparison-based sorting barrier and solve the selection problem in linear time:
- Counting sort sorts non-negative integers in $O(n + k)$ time by counting occurrences and computing prefix sums. It is stable and serves as a building block for radix sort.
- Radix sort extends counting sort to handle integers with multiple digits, sorting digit by digit from least significant to most significant. It runs in $O(d \cdot n)$ time where $d$ is the number of digits. The key requirement is a stable subroutine sort.
- Bucket sort distributes elements into buckets, sorts each bucket, and concatenates. Under a uniform distribution, the expected time is $O(n)$. Its worst case is $O(n^2)$, when all elements land in one bucket.
- Quickselect finds the $k$th smallest element in $O(n)$ expected time by partitioning around a random pivot and recursing into one side. It is the practical algorithm of choice for selection.
- Median of medians achieves $O(n)$ worst-case selection through a carefully chosen pivot: the median of group medians. While theoretically optimal, its large constant factor makes it slower than randomized quickselect in practice.
The linear-time sorting algorithms teach an important lesson: algorithmic lower bounds depend on the model of computation. The $\Omega(n \log n)$ bound is real for comparison-based sorting, but by stepping outside the comparison model — using integers as array indices, extracting digits — we can do better. The selection algorithms show that finding a single order statistic is fundamentally easier than fully sorting, requiring only $O(n)$ time regardless of the method.
Exercises
Exercise 6.1. Trace through counting sort on the input $[4, 3, 0, 3, 2, 1]$. Show the counts array after each step (counting, prefix sums, placement). Verify that the sort is stable by tracking the original indices of elements with value 3.
Exercise 6.2. Radix sort processes digits from least significant to most significant, using a stable sort at each step. What goes wrong if we process digits from most significant to least significant? Give a concrete example where MSD radix sort (without special handling) produces incorrect output.
Exercise 6.3. Counting sort uses $O(k)$ space for the counts array, where $k$ is the maximum value. If we need to sort $n$ integers in the range $[0, n^2)$, we could use counting sort directly with $k = n^2$, or we could use radix sort with a base-$n$ representation (2 digits). Compare the time and space complexity of both approaches.
Exercise 6.4. Consider a modification of quickselect where, instead of choosing a random pivot, we always choose the first element as the pivot. Describe an input of size $n$ for which this modified quickselect takes $\Theta(n^2)$ time to find the median. Then describe an input for which it takes $O(n)$ time.
Exercise 6.5. The median-of-medians algorithm divides elements into groups of 5. What happens if we use groups of 3 instead? Set up the recurrence and show that it does not solve to $O(n)$. What about groups of 7? (Hint: compute the fraction of elements guaranteed to be eliminated at each step for each group size.)