Preface
Algorithms with TypeScript is a free, comprehensive textbook covering algorithms and data structures from first principles to advanced topics. Spanning 22 chapters across six parts, it features idiomatic TypeScript 5 implementations, step-by-step complexity analysis, exercises, and a full test suite using Vitest — bridging the gap between academic Computer Science theory and practical software engineering.
Beta: This book is currently in beta and is still under active review. It may contain errors or incomplete sections. Report errors or issues — contributions are welcome via the GitHub repository.
This book grew out of a simple observation: most software engineers use algorithms and data structures every day, yet many feel uncertain about the fundamentals. They may use a hash map or call a sorting function without fully understanding the guarantees those abstractions provide, or they may struggle when a problem requires designing a new algorithm from scratch. At the same time, Computer Science students often encounter algorithms in a highly theoretical setting that can feel disconnected from the code they write in practice.
Algorithms with TypeScript bridges that gap. It presents the core algorithms and data structures from a typical undergraduate algorithms curriculum - roughly equivalent to MIT's 6.006 and 6.046 - but uses TypeScript as the language of expression. Every algorithm discussed in the text is implemented, tested, and available in the accompanying repository. The implementations are not pseudocode translated into TypeScript; they are idiomatic, type-safe, and tested with a modern toolchain.
Who this book is for
This book is written for two audiences:
- Software engineers who want to solidify their understanding of algorithms and data structures. Perhaps you learned this material years ago and want a refresher, or perhaps you are self-taught and want to fill in the gaps. Either way, seeing algorithms in a language you likely use at work - TypeScript - makes the material immediately applicable.
- Computer Science students who are taking (or preparing to take) an algorithms course. The book follows a standard curricular sequence and includes exercises at the end of every chapter. The TypeScript implementations let you run, modify, and experiment with every algorithm.
Prerequisites
The book assumes you can read and write basic TypeScript or JavaScript. You should be comfortable with functions, loops, conditionals, arrays, and objects. No prior knowledge of algorithms or data structures is required - we build everything from the ground up, starting with the definition of an algorithm in Chapter 1.
Some chapters use mathematical notation, particularly for complexity analysis. Chapter 2 introduces asymptotic notation (O, Ω, Θ), and the Notation section that follows this preface summarizes all conventions used in the book. Comfort with basic algebra and mathematical reasoning is helpful but not strictly required; we explain each concept as it arises.
How to use this book
The book is organized into six parts:
- Part I: Foundations (Chapters 1-3) introduces the notion of an algorithm, the mathematical tools for analyzing algorithms, and recursion with divide-and-conquer.
- Part II: Sorting and Selection (Chapters 4-6) covers the classical sorting algorithms, from elementary methods through comparison sorts to linear-time non-comparison sorts and selection algorithms.
- Part III: Data Structures (Chapters 7-11) presents the fundamental data structures: arrays, linked lists, stacks, queues, hash tables, trees, balanced search trees, heaps, and priority queues.
- Part IV: Graph Algorithms (Chapters 12-15) covers graph representations, traversal, shortest paths, minimum spanning trees, and network flow.
- Part V: Algorithm Design Techniques (Chapters 16-17) explores dynamic programming and greedy algorithms as general problem-solving strategies.
- Part VI: Advanced Topics (Chapters 18-22) covers disjoint sets, tries, string matching, computational complexity, and approximation algorithms.
The parts are designed to be read in order, as later chapters build on concepts and data structures introduced in earlier ones. Within each part, the chapters are largely self-contained - if you are comfortable with the prerequisites, you can often read individual chapters independently.
Each chapter follows a consistent structure: a motivating introduction, formal definitions, detailed algorithm descriptions with step-by-step traces, TypeScript implementations with code snippets, complexity analysis, and exercises. The exercises range from straightforward checks of understanding to more challenging problems that extend the material.
The code
All implementations live in the src/ directory of the repository, organized by chapter. Tests are in the tests/ directory with a parallel structure. To run the full test suite:
```bash
npm install
npm test
```
The code is written in TypeScript 5 with strict mode enabled, uses ES modules, and is tested with Vitest. See the project README for detailed setup instructions.
We encourage you to read the code alongside the text. The implementations are designed to be clear and readable rather than maximally optimized. Where there is a tension between clarity and performance, we choose clarity and discuss the performance implications in the text.
Acknowledgments
This book draws inspiration from several excellent texts, most notably Cormen, Leiserson, Rivest, and Stein's Introduction to Algorithms (CLRS), Sedgewick and Wayne's Algorithms, Niklaus Wirth's Algorithms + Data Structures = Programs, and Kleinberg and Tardos's Algorithm Design. The MIT OpenCourseWare materials for 6.006 and 6.046 were invaluable in shaping the curriculum. Full references are in the Bibliography.
Notation
This section summarizes the mathematical and typographical conventions used throughout the book. It is intended as a reference; each symbol is introduced and explained in context when it first appears.
Asymptotic notation
| Symbol | Meaning |
|---|---|
| O(g(n)) | Asymptotic upper bound: f(n) ≤ c · g(n) for all n ≥ n₀ (Definition 2.2) |
| Ω(g(n)) | Asymptotic lower bound: f(n) ≥ c · g(n) for all n ≥ n₀ (Definition 2.3) |
| Θ(g(n)) | Tight bound: f(n) = O(g(n)) and f(n) = Ω(g(n)) (Definition 2.4) |
| o(g(n)) | Strict upper bound: f(n)/g(n) → 0 as n → ∞ |
| ω(g(n)) | Strict lower bound: f(n)/g(n) → ∞ as n → ∞ |

The asymptotic families correspond loosely to the comparison operators: O to ≤, Ω to ≥, Θ to =, o to <, and ω to >.
Common growth rates
| Growth rate | Name | Example algorithm |
|---|---|---|
| O(1) | Constant | Hash table lookup (expected) |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Finding the maximum |
| O(n log n) | Linearithmic | Merge sort, heap sort |
| O(n²) | Quadratic | Insertion sort (worst case) |
| O(n³) | Cubic | Floyd-Warshall |
| O(2ⁿ) | Exponential | Subset sum (brute force) |
| O(n!) | Factorial | TSP (brute force) |
General mathematical notation
| Symbol | Meaning |
|---|---|
| n | Input size (unless otherwise stated) |
| T(n) | Running time as a function of input size |
| ⌊x⌋ | Floor: largest integer ≤ x |
| ⌈x⌉ | Ceiling: smallest integer ≥ x |
| log n | Logarithm base 2 (unless base is stated explicitly) |
| log_b n | Logarithm base b |
| ln n | Natural logarithm (base e) |
| Hₙ | n-th harmonic number: 1 + 1/2 + 1/3 + ⋯ + 1/n |
| n! | Factorial: n · (n − 1) · ⋯ · 2 · 1 |
| C(n, k) | Binomial coefficient: n! / (k! (n − k)!) |
| a mod b | Remainder when a is divided by b |
| Σᵢ₌₁ⁿ aᵢ | Summation of aᵢ for i from 1 to n |
| Πᵢ₌₁ⁿ aᵢ | Product of aᵢ for i from 1 to n |
| ∞ | Infinity |
| ≈ | Approximately equal |
Logic and quantifiers
| Symbol | Meaning |
|---|---|
| ⇒ | Implies (if ... then) |
| ⇔ | If and only if |
| ∀ | For all |
| ∃ | There exists |
Set notation
| Symbol | Meaning |
|---|---|
| {a, b, c} | Set containing elements a, b, c |
| x ∈ S | x is a member of set S |
| x ∉ S | x is not a member of set S |
| A ⊆ B | A is a subset of B (possibly equal) |
| A ⊂ B | A is a proper subset of B |
| A ∪ B | Union of A and B |
| A ∩ B | Intersection of A and B |
| A \ B | Set difference: elements in A but not in B |
| \|S\| | Cardinality (number of elements) of set S |
| ∅ | Empty set |
| ℝ | Set of real numbers |
| {0, 1}* | Set of all binary strings |
Graph notation
| Symbol | Meaning |
|---|---|
| G = (V, E) | Graph with vertex set V and edge set E |
| \|V\| | Number of vertices |
| \|E\| | Number of edges |
| (u, v) | Edge from vertex u to vertex v |
| w(u, v) | Weight of edge (u, v) |
| w | Weight function mapping edges to real numbers |
| δ(s, v) | Shortest-path weight from s to v |
| d(u, v) | Distance between vertices u and v |
| c(u, v) | Capacity of edge (u, v) (in flow networks) |
| f(u, v) | Flow on edge (u, v) |
| w(T) | Total weight of tree T |
| Adj[u] | Adjacency list of vertex u |
Vertices are typically denoted by lowercase letters: u, v, s (source), t (sink). We use p to denote a path from u to v.
Probability notation
| Symbol | Meaning |
|---|---|
| Pr[A] | Probability of event A |
| E[X] | Expected value of random variable X |
Complexity classes
Complexity classes are set in bold: **P**, **NP**, **NPC**. NP-complete problems are written in small capitals in running text (e.g., SUBSET SUM, SAT, HAMILTONIAN CYCLE).
Algorithm and function names
In mathematical expressions, algorithm names are typeset in roman (upright) text to distinguish them from variables:
- parent(i), left(i), right(i) for heap index calculations
- Relax(u, v, w) for shortest-path edge relaxation
- OPT(I) for the optimal solution value on instance I

Running-time recurrences use T(n). Fibonacci numbers are Fₙ.
Array and indexing conventions
All TypeScript implementations use 0-based indexing: the first element of an array `arr` is `arr[0]`, and an array of n elements has indices 0 through n − 1.

In mathematical discussion, array ranges are written as A[i..j) to denote the subarray from index i (inclusive) to j (exclusive). In heap formulas: parent(i) = ⌊(i − 1)/2⌋, left(i) = 2i + 1, right(i) = 2i + 2 (the standard 0-based conventions).
Formal structures
Formal definitions, theorems, and lemmas are set in blockquotes with a label:
> Definition X.Y --- Title
> Statement of the definition.
Proofs end with the symbol ∎. Examples are labeled Example X.Y and numbered within each chapter.
Code conventions
- All code is TypeScript with strict mode and ES module syntax.
- Generic type parameters (e.g., `T`, `K`, `V`) follow standard TypeScript conventions.
- The shared type `Comparator<T>` is `(a: T, b: T) => number`, returning negative if a < b, zero if a = b, and positive if a > b.
- Code snippets in chapters match the tested implementations in the `src/` directory.
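As a concrete illustration of the comparator contract, here is a minimal sketch. The type alias is restated locally so the snippet runs on its own, and `numberAscending` and `reversed` are hypothetical names, not part of the book's `src/` code:

```typescript
// The shared Comparator<T> contract: negative if a < b,
// zero if a and b are equal, positive if a > b.
type Comparator<T> = (a: T, b: T) => number;

// A comparator for numbers in ascending order.
const numberAscending: Comparator<number> = (a, b) => a - b;

// Comparators compose; for example, reversing an order:
function reversed<T>(cmp: Comparator<T>): Comparator<T> {
  return (a, b) => cmp(b, a);
}

console.log([3, 1, 2].sort(numberAscending));           // [1, 2, 3]
console.log([3, 1, 2].sort(reversed(numberAscending))); // [3, 2, 1]
```

This is the same contract that `Array.prototype.sort` expects, which is why the book's data structures can plug directly into built-in APIs.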
Introduction to Algorithms
In this chapter we discuss what an algorithm is, how algorithms can be expressed, and why studying them matters. We introduce TypeScript as the language used throughout the book, walk through setting up a development environment, and examine our first two algorithms in detail: finding the maximum of an array and the Sieve of Eratosthenes.
What is an algorithm?
Let us start with a discussion of what an algorithm is. Intuitively the notion is more or less clear: we are talking about some formal way to describe a computational procedure. According to the Merriam-Webster dictionary, an algorithm is "a set of steps that are followed in order to solve a mathematical problem or to complete a computer process".
Still, this is probably not formal enough. How do we choose the next step from the set of steps? Should the procedure stop eventually? What is the result of executing an algorithm? Many formal definitions of what constitutes an algorithm can be given; however, we will use the following working definition.
Definition 1.1 - Algorithm
A set of computational steps that specifies a formal computational procedure and has the following properties:
1. After each step is completed, the next step is unambiguously defined, or the algorithm stops its execution if there are no more steps left.
2. It is defined on a set of inputs and for each valid input it stops after a finite number of steps.
3. When it stops it produces a result, which we call its output.
4. Its steps and their order of execution can be formally and unambiguously specified using some language or notation.
These four properties capture the essence of what makes a procedure an algorithm. Let us look at each one briefly:
- Determinism (property 1): at every point during execution, there is exactly one thing to do next, or the algorithm is done. The next step is always uniquely determined by the current state.
- Termination (property 2): for every valid input, the algorithm eventually finishes. It does not run forever.
- Output (property 3): when the algorithm finishes, it produces a well-defined result.
- Formal specification (property 4): the algorithm can be written down precisely enough that it could, in principle, be carried out mechanically.
Expressing algorithms
Algorithms can be expressed in a variety of ways. We can even specify the execution steps using ordinary human language. Let us provide a few simple examples. A trivial first example is multiplying two numbers.
Example 1.1: Integer multiplication.
Steps:
- Given two integer numbers, multiply them and return the result.
All the properties from Definition 1.1 are satisfied. There is only one step; after this step the algorithm stops; the step is formally specified; all pairs of integer numbers are valid inputs; and a valid result will be produced for each of them. If we denote the algorithm for multiplication as mult, then, for example, mult(3, 4) = 12, and we can specify the algorithm more concisely as mult(x, y) = x · y.
So far, while talking about algorithms, we have encountered no TypeScript or any other programming language notation. This is quite intentional: the notion of an algorithm is mathematical and abstract. Of course we can express any algorithm using TypeScript, but that will be just one of the possible formal representations - in this case, one that is also executable by a computer.
A careful reader might be puzzled by our confidence. How can we assert that any algorithm can be expressed using TypeScript? Can this claim be proven, given our definition? Is TypeScript powerful enough to express every possible algorithm? It turns out that it is. This can be proven rigorously: TypeScript (like most general-purpose programming languages) is Turing complete, meaning it can simulate any computation that a Turing machine can perform. Since Turing machines capture the full power of algorithmic computation, any algorithm can be expressed in TypeScript.
Let us look again at Definition 1.1. It states that we should be able to specify the computational procedure formally. It is now clear why we require this property: given a formal language such as TypeScript, we can specify the algorithm of interest and execute the specification on a computer. For the multiplication algorithm we can write:
```typescript
function mult(x: number, y: number): number {
  return x * y;
}
```
The TypeScript specification is more concise and unambiguous than the natural-language version. Throughout the book we will primarily use TypeScript, but keep in mind that the algorithms we discuss can be expressed in other formal notations as well. Many Computer Science textbooks go as far as inventing their own pseudocode to avoid being tied to a particular programming language. We will not go that far and will happily use TypeScript - hence the name of the book, Algorithms with TypeScript.
Computational procedures that are not algorithms
Can we write a computational procedure that is not an algorithm? Yes. Consider the following TypeScript function:
```typescript
function getMaximumNumber(): number {
  let x = 0;
  while (true) {
    x++;
  }
  return x;
}
```
This function never terminates: the while (true) loop runs forever, so the return statement is never reached. Property 2 of Definition 1.1 is violated - the procedure does not stop after a finite number of steps. This is therefore not an algorithm.
Another example of a non-algorithm is a division function defined on all pairs of numbers:
```typescript
function divide(x: number, y: number): number {
  if (y === 0) {
    throw new Error('Cannot divide by zero');
  }
  return x / y;
}
```
This is not an algorithm according to our definition because the result is not defined for all inputs - when y = 0 the procedure throws an error instead of producing an output (property 3 is violated). However, it is easy to fix this:
```typescript
function divide(x: number, y: number): number {
  return y === 0 ? Infinity : x / y;
}
```
In fact, in JavaScript (and TypeScript), dividing by zero returns Infinity by default, so we could simply write:
```typescript
function divide(x: number, y: number): number {
  return x / y;
}
```
This is an algorithm - but only because of JavaScript's particular treatment of division by zero.
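The claim about JavaScript's division semantics is easy to verify directly; a quick standalone sketch:

```typescript
// JavaScript (and therefore TypeScript) number semantics for division by zero:
console.log(1 / 0);  // Infinity
console.log(-1 / 0); // -Infinity
console.log(0 / 0);  // NaN - note that 0 / 0 yields NaN, not Infinity,
                     // a corner the ternary version above handles differently
```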
From these examples we see that not every computational procedure that can be formally expressed is an algorithm. The properties in Definition 1.1 are genuine constraints.
Why study algorithms?
Before we proceed to our first nontrivial examples, let's briefly discuss why studying algorithms is worthwhile.
Correctness. Real-world software often needs to solve well-defined computational problems: sort a list, find the shortest route, compress data, search a database. An algorithm gives us a proven solution to such a problem. Understanding the classic algorithms means you can recognize when a problem you face has already been solved - and solved well.
Efficiency. Two algorithms that solve the same problem can differ enormously in how long they take or how much memory they use. Later in this book we will see sorting algorithms that take time proportional to n² (where n is the number of elements) and others that take time proportional to n log n. For a million elements, that is the difference between a trillion operations and roughly twenty million - a factor of 50,000. Choosing the right algorithm can be the difference between a program that finishes in seconds and one that takes hours.
Foundation for deeper topics. Algorithms and data structures form the backbone of Computer Science. Topics like databases, compilers, operating systems, machine learning, and cryptography all build on the ideas we will develop in this book.
Problem-solving skills. Even when you are not directly implementing a classic algorithm, the techniques you learn - divide and conquer, dynamic programming, greedy strategies, graph modeling - give you a powerful toolkit for approaching new problems.
Introduction to TypeScript
Throughout this book we use TypeScript as our implementation language. TypeScript is a statically typed superset of JavaScript: every valid JavaScript program is also a valid TypeScript program, but TypeScript adds optional type annotations that are checked at compile time.
We chose TypeScript for several reasons:
- Readability. TypeScript syntax is familiar to anyone who has worked with JavaScript, Java, C#, or similar C-family languages. Type annotations make function signatures self-documenting.
- Type safety. Generic types let us write algorithms that work with any element type while the compiler catches type errors before we run the code.
- Ubiquity. TypeScript runs anywhere JavaScript runs: in the browser, on the server (Node.js), and in countless tools. There is no special runtime to install beyond Node.js.
- Modern features. Destructuring, iterators, generator functions, and first-class functions make algorithm implementations concise and expressive.
Here is a small example that illustrates some features we will use frequently:
```typescript
// A generic function that returns the first element of a non-empty array
function first<T>(arr: T[]): T {
  if (arr.length === 0) {
    throw new Error('Array must not be empty');
  }
  return arr[0];
}

const name: string = first(['Alice', 'Bob', 'Charlie']); // 'Alice'
const value: number = first([42, 17, 8]); // 42
```
The <T> syntax introduces a type parameter: the function works with arrays of any element type, and the compiler ensures that the return type matches the array's element type. We will use generics extensively when implementing data structures and sorting algorithms.
Setting up the development environment
To follow along with the code in this book, you will need:
- Node.js (version 18 or later): download from https://nodejs.org or use a version manager such as `nvm`.
- A text editor with TypeScript support. Visual Studio Code works particularly well, but any modern editor will do.
Once Node.js is installed, clone the book's repository and install the dependencies:
```bash
git clone https://github.com/amoilanen/Algorithms-with-Typescript.git
cd Algorithms-with-Typescript
npm install
```
The project uses the following tools, all installed automatically by npm install:
| Tool | Purpose |
|---|---|
| TypeScript | Static type checking and compilation |
| Vitest | Fast test runner with native TypeScript support |
| ESLint | Code quality and consistency checking |
| Prettier | Automatic code formatting |
Useful commands:
```bash
npm test             # Run all tests
npm run test:watch   # Re-run tests on file changes
npm run typecheck    # Check types without emitting files
npm run lint         # Run the linter
```
Every algorithm in this book has a corresponding test suite. We encourage you to run the tests, read them, and experiment by modifying the implementations.
Finding the maximum element
Now that we are finished with definitions and setup, let's look at a few more interesting algorithms. The first problem is simple: given an array of numbers, find the largest one.
The problem
Input: An array A of n numbers.
Output: The maximum value in A, or undefined if A is empty.
A linear scan
The most natural approach is to scan through the array from left to right, keeping track of the largest value seen so far:
- Set result to undefined.
- For each element x in A: if result is undefined or x > result, set result to x.
- Return result.
Here is the TypeScript implementation:
```typescript
export function max(elements: number[]): number | undefined {
  let result: number | undefined;
  for (const element of elements) {
    if (result === undefined || element > result) {
      result = element;
    }
  }
  return result;
}
```
Let us trace through an example. Suppose elements = [2, 1, 4, 2, 3]:
| Step | element | result before | Comparison | result after |
|---|---|---|---|---|
| 1 | 2 | undefined | undefined → update | 2 |
| 2 | 1 | 2 | 1 > 2? No | 2 |
| 3 | 4 | 2 | 4 > 2? Yes | 4 |
| 4 | 2 | 4 | 2 > 4? No | 4 |
| 5 | 3 | 4 | 3 > 4? No | 4 |
The function returns 4, which is indeed the maximum.
Correctness
We can argue correctness using a loop invariant: at the start of each iteration, result holds the maximum of all elements examined so far (or undefined if none have been examined).
- Initialization: Before the first iteration, no elements have been examined and `result` is `undefined`. The invariant holds trivially.
- Maintenance: Suppose the invariant holds at the start of an iteration. If the current `element` is greater than `result` (or `result` is `undefined`), we update `result` to `element`. Otherwise `result` already holds the maximum. In either case, after the iteration `result` is the maximum of all elements seen so far.
- Termination: The loop ends when all elements have been examined. By the invariant, `result` holds the maximum of the entire array.
Complexity analysis
The function performs one comparison per element and visits each element exactly once.
- Time complexity: O(n), where n is the length of the array.
- Space complexity: O(1) - we use only a single variable `result` beyond the input.
Can we do better than O(n)? No. Any algorithm that finds the maximum must examine every element at least once: if it skipped an element, that element could have been the maximum. Therefore O(n) is optimal for this problem.
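The edge cases are worth checking by hand. A standalone sketch (the implementation is repeated here so the snippet runs on its own):

```typescript
// Standalone copy of max for experimentation.
function max(elements: number[]): number | undefined {
  let result: number | undefined;
  for (const element of elements) {
    if (result === undefined || element > result) {
      result = element;
    }
  }
  return result;
}

console.log(max([2, 1, 4, 2, 3])); // 4, matching the trace above
console.log(max([]));              // undefined, the empty-array case
console.log(max([-5, -2, -9]));    // -2, all-negative arrays work too
```

Starting `result` at `undefined` rather than at 0 is what makes the all-negative case come out right.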
Finding prime numbers: the Sieve of Eratosthenes
Our second algorithm is more substantial and has a rich history dating back over two thousand years. The goal is to find all prime numbers up to a given number n.
The problem
Input: A positive integer n.
Output: A list of all prime numbers p such that p ≤ n.
Recall that a prime number is an integer greater than 1 whose only positive divisors are 1 and itself. The first few primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, ...
A naive approach: trial division
The most straightforward method is to test each number from 2 to n for primality by checking whether it has any divisors other than 1 and itself:
```typescript
export function primesUpToSlow(number: number): number[] {
  const primes: number[] = [];
  for (let current = 2; current <= number; current++) {
    if (isPrime(current)) {
      primes.push(current);
    }
  }
  return primes;
}

function isPrime(number: number): boolean {
  for (let i = 2; i < number; i++) {
    if (number % i === 0) {
      return false;
    }
  }
  return true;
}
```
For each candidate number m, the isPrime function tests all potential divisors from 2 up to m − 1. If any of them divides m evenly, m is not prime.

This works, but it is slow. For each of the n − 1 candidates, we may test up to m − 2 divisors. In the worst case (when m is prime), the isPrime check does Θ(m) work. Summing over all candidates gives roughly O(n²) time. (We could improve isPrime by only testing divisors up to √m, which brings the total to approximately O(n√n), but there is a fundamentally better approach.)
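The √m refinement mentioned in passing can be sketched as follows. This is an illustrative variant (`isPrimeFast` is a hypothetical name), not the repository's tested implementation:

```typescript
// Trial division testing divisors only up to sqrt(m): if m = a * b with
// 2 <= a <= b, then a * a <= m, so any composite m has a divisor i with i * i <= m.
function isPrimeFast(m: number): boolean {
  if (m < 2) return false;
  for (let i = 2; i * i <= m; i++) {
    if (m % i === 0) return false;
  }
  return true;
}

console.log(isPrimeFast(97)); // true
console.log(isPrimeFast(91)); // false (91 = 7 * 13)
```

Writing the loop condition as `i * i <= m` avoids computing a square root and any floating-point rounding concerns.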
The Sieve of Eratosthenes
The Sieve of Eratosthenes, attributed to the ancient Greek mathematician Eratosthenes of Cyrene (c. 276--194 BC), takes a different approach. Instead of testing each number individually, it starts by assuming all numbers are prime and then systematically eliminates the ones that are not:
- Create a boolean array `isPrime[2..n]`, initially all `true`.
- For each number p starting from 2: if `isPrime[p]` is `true`, then p is prime. Mark all multiples of p (starting from 2p) as `false`.
- Collect all indices that remain `true`.
Here is the TypeScript implementation:
```typescript
export function primesUpTo(number: number): number[] {
  const isPrimeNumber: boolean[] = [];
  const primes: number[] = [];
  let current = 2;
  for (let i = 2; i <= number; i++) {
    isPrimeNumber[i] = true;
  }
  while (current <= number) {
    if (isPrimeNumber[current]) {
      primes.push(current);
      for (let j = 2 * current; j <= number; j += current) {
        isPrimeNumber[j] = false;
      }
    }
    current++;
  }
  return primes;
}
```
Tracing through an example
Let us trace the sieve for n = 20. We start with all numbers from 2 to 20 marked as potentially prime:

```
2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T
```
p = 2: 2 is prime. Cross out multiples of 2: 4, 6, 8, 10, 12, 14, 16, 18, 20.

```
2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
T  T  F  T  F  T  F  T  F  T  F  T  F  T  F  T  F  T  F
```

p = 3: 3 is still marked true, so it is prime. Cross out multiples of 3: 6, 9, 12, 15, 18 (some are already crossed out).

```
2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
T  T  F  T  F  T  F  F  F  T  F  T  F  F  F  T  F  T  F
```
p = 4: 4 is marked false (not prime). Skip it.

p = 5: 5 is marked true, so it is prime. Cross out multiples of 5: 10, 15, 20 (all already crossed out). The array is unchanged.

p = 6: 6 is marked false (not prime). Skip it.

p = 7: 7 is marked true, so it is prime. Its first multiple is 14, which is already crossed out. The array is unchanged.

p = 8, 9, 10: All marked false. Skip.

p = 11: 11 is marked true, so it is prime. Its first multiple within range would be 22, which exceeds n = 20. No crossings.

p = 12: Marked false. Skip.

p = 13: 13 is marked true, so it is prime. Its first multiple within range would be 26, which exceeds n = 20. No crossings.

p = 14, 15, 16: All marked false. Skip.

p = 17: 17 is marked true, so it is prime. Its first multiple within range would be 34, which exceeds n = 20. No crossings.

p = 18: Marked false. Skip.

p = 19: 19 is marked true, so it is prime. Its first multiple within range would be 38, which exceeds n = 20. No crossings.

p = 20: Marked false. Skip.
The final state of the array is:

```
2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
T  T  F  T  F  T  F  F  F  T  F  T  F  F  F  T  F  T  F
```

The numbers that remain true are 2, 3, 5, 7, 11, 13, 17, and 19. These are exactly the primes up to 20.
Why does the sieve work?
The key insight is: if a number m is composite (not prime), then m = a · b for some integers a, b with 1 < a ≤ b < m. The smallest factor of m greater than 1 is itself prime (otherwise it could be factored further). When the sieve processes that smallest prime factor p, it marks m as composite. Therefore, every composite number gets marked false by the time the sieve finishes.

Conversely, if a number p is prime, no smaller prime divides it, so p is never marked false. The sieve correctly identifies exactly the prime numbers.
Complexity analysis
How much work does the sieve do? For each prime p ≤ n, it crosses out at most n/p multiples. The total work is proportional to:

n/2 + n/3 + n/5 + n/7 + ⋯ = n · (1/2 + 1/3 + 1/5 + 1/7 + ⋯)

where the sum runs over the primes up to n. A classical result in number theory states that the sum of the reciprocals of the primes up to n grows as ln ln n. Therefore:

- Time complexity: O(n log log n).
- Space complexity: O(n) for the boolean array.

Note: since log₂ x and ln x differ only by a constant factor (log₂ x = ln x / ln 2), it does not matter which logarithm base we use inside big-O notation. You may see this complexity written equivalently as O(n ln ln n) in other sources.
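The constant-factor claim is just the change-of-base identity for logarithms:

```latex
\log_2 x \;=\; \frac{\ln x}{\ln 2} \;\approx\; 1.4427\,\ln x,
\qquad\text{hence}\qquad
O(n \log \log n) \;=\; O(n \ln \ln n).
```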
Compare this with the trial-division approach (even with the √m refinement) at roughly O(n√n). For n = 1,000,000:
| Algorithm | Approximate operations |
|---|---|
| Trial division (divisors up to √m) | ≈ 10⁹ |
| Sieve of Eratosthenes | ≈ 3 × 10⁶ |
The sieve is roughly 300 times faster - an enormous difference for large inputs.
Comparing the two approaches
This is our first encounter with a recurring theme in this book: different algorithms for the same problem can have vastly different performance characteristics. The naive approach is simple and easy to understand, but the sieve achieves dramatically better performance by exploiting the structure of the problem.
Throughout the book, we will develop the tools to analyze these differences precisely. In Chapter 2 we formalize the notion of time complexity using asymptotic notation (O, Ω, Θ), which gives us a language for comparing algorithms independently of the specific hardware they run on.
Summary
In this chapter we defined what an algorithm is, introduced TypeScript as our implementation language, and studied two concrete algorithms. We saw that:
- An algorithm is a well-defined computational procedure that terminates on all valid inputs and produces a result.
- Algorithms can be expressed in many notations; we use TypeScript because it combines readability, type safety, and executability.
- Even for simple problems, the choice of algorithm can dramatically affect performance: the Sieve of Eratosthenes outperforms trial division by orders of magnitude.
In the next chapter, we develop the mathematical framework - asymptotic notation and complexity analysis - that lets us reason precisely about algorithm efficiency. These tools will be essential throughout the rest of the book.
Exercises
Exercise 1.1. Write a function min(elements: number[]): number | undefined that returns the minimum element of an array, analogous to the max function. What is its time complexity?
Exercise 1.2. The isPrime function in the trial-division approach tests divisors from 2 all the way up to m − 1. Explain why it suffices to test only divisors up to √m. Modify the function accordingly and analyze the improved time complexity for finding all primes up to n.
Exercise 1.3. The Sieve of Eratosthenes as presented starts crossing out multiples of p from 2p. Show that it is sufficient to start from p² instead. Why does this not change the asymptotic time complexity?
Exercise 1.4. The proper divisors of a positive integer n are all positive divisors of n other than n itself. For example, the proper divisors of 12 are 1, 2, 3, 4, and 6. A perfect number is a positive integer that equals the sum of its proper divisors (e.g., 6 = 1 + 2 + 3). Write a function isPerfect(n: number): boolean and use it to find all perfect numbers up to 10,000. What is the time complexity of your approach?
Exercise 1.5. Consider the following function:
```typescript
function mystery(n: number): number {
  if (n <= 1) return n;
  return mystery(n - 1) + mystery(n - 2);
}
```
Does this function define an algorithm according to Definition 1.1? What does it compute? Try calling it with n = 10, n = 30, and n = 40. What do you observe about the running time? (We will revisit this function in Chapter 16 on dynamic programming.)
Analyzing Algorithms
In Chapter 1 we saw that two algorithms for the same problem — trial division versus the Sieve of Eratosthenes — can differ enormously in performance. In this chapter we develop the mathematical framework for making such comparisons precise. We introduce asymptotic notation, which lets us describe how an algorithm's resource usage grows with input size, and we study several techniques for analyzing running time: best-, worst-, and average-case analysis, amortized analysis, and recurrence relations.
Why analyze algorithms?
Suppose you have two sorting algorithms, $A$ and $B$, and you want to know which one is faster. The most direct approach is to run both on the same input and measure the wall-clock time. This is called benchmarking, and it has an important place in software engineering. However, benchmarking has limitations:
- Hardware dependence. Algorithm $A$ might be faster on your laptop but slower on a different machine with a different CPU, cache hierarchy, or memory bandwidth.
- Input dependence. Algorithm $A$ might be faster on the particular test data you chose, but slower on inputs that arise in practice.
- Implementation effects. A clever implementation of a theoretically slower algorithm can outperform a naive implementation of a theoretically faster one.
What we want is a way to compare algorithms independently of these factors — a way to reason about the inherent efficiency of an algorithm rather than the efficiency of a particular implementation on a particular machine with a particular input. This is what asymptotic analysis provides.
The idea is to count the number of "basic operations" an algorithm performs as a function of the input size $n$, and then focus on how that function grows as $n$ becomes large. We ignore constant factors (which depend on the hardware and implementation) and lower-order terms (which become negligible for large $n$). The result is a concise characterization of an algorithm's scalability.
Measuring input size and running time
Before we can analyze an algorithm, we need to agree on two things: what counts as the input size, and what counts as a basic operation.
Input size is usually the most natural measure of how much data the algorithm must process:
- For an array of numbers, the input size is the number of elements $n$.
- For a graph, the input size is often specified as a pair $(V, E)$ — the number of vertices $V$ and edges $E$.
- For a number-theoretic algorithm like the Sieve of Eratosthenes, the input size is the number $n$ itself.
Basic operations are the elementary steps we count. Common choices include comparisons, arithmetic operations, assignments, or array accesses. The specific choice rarely matters for asymptotic analysis, because changing which operation we count changes the total by at most a constant factor.
Definition 2.1 - Running time
The running time of an algorithm on a given input is the number of basic operations it performs when executed on that input.
We are usually interested in expressing the running time as a function of the input size $n$.
Example 2.1: Running time of max.
Recall the max function from Chapter 1:
export function max(elements: number[]): number | undefined {
let result: number | undefined;
for (const element of elements) {
if (result === undefined || element > result) {
result = element;
}
}
return result;
}
If we count comparisons as our basic operation, the loop performs exactly one comparison per element (the element > result check; the undefined check is bookkeeping). For an array of $n$ elements, the running time is $n$ comparisons, i.e., $\Theta(n)$.
Asymptotic notation
Rather than stating that an algorithm takes exactly $3n^2 + 5n + 2$ operations, we want to capture the growth rate — the fact that the dominant behavior is quadratic. Asymptotic notation gives us a precise way to do this.
Big-O: upper bounds
Definition 2.2 - Big-O notation
Let $f$ and $g$ be functions from the non-negative integers to the non-negative reals. We write
$$f(n) = O(g(n))$$
if there exist constants $c > 0$ and $n_0 \ge 0$ such that
$$f(n) \le c \cdot g(n) \quad \text{for all } n \ge n_0.$$
In words: $f$ grows no faster than $g$, up to a constant factor, for sufficiently large $n$.
Example 2.2. Let $f(n) = 3n^2 + 5n + 2$. We claim $f(n) = O(n^2)$.
Proof. For $n \ge 1$, we have $5n \le 5n^2$ and $2 \le 2n^2$, so
$$f(n) = 3n^2 + 5n + 2 \le 3n^2 + 5n^2 + 2n^2 = 10n^2.$$
Choosing $c = 10$ and $n_0 = 1$ satisfies Definition 2.2.
Note that $f(n) = O(n^3)$ is also technically true — $f$ is bounded above by a constant multiple of $n^3$ — but it is less informative. By convention, we try to state the tightest bound we can prove.
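Constants like these can be spot-checked numerically. The sketch below (not a proof, since it only samples finitely many values) tests whether candidate constants $c$ and $n_0$ witness a bound such as $f(n) = 3n^2 + 5n + 2 = O(n^2)$:

```typescript
// Spot-check a Big-O bound: does f(n) <= c * g(n) hold for all n0 <= n <= upTo?
// Here f(n) = 3n^2 + 5n + 2 and g(n) = n^2.
const f = (n: number): number => 3 * n * n + 5 * n + 2;
const g = (n: number): number => n * n;

function witnessesBigO(c: number, n0: number, upTo: number): boolean {
  for (let n = n0; n <= upTo; n++) {
    if (f(n) > c * g(n)) return false;
  }
  return true;
}

console.log(witnessesBigO(10, 1, 1000)); // true: c = 10, n0 = 1 work
console.log(witnessesBigO(3, 1, 1000)); // false: c = 3 is too small (f(1) = 10 > 3)
```

A passing check is only evidence, not a proof: the inequality must be established for all $n \ge n_0$, as in the algebraic argument above.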
Big-Omega: lower bounds
Definition 2.3 - Big-Omega notation
We write $f(n) = \Omega(g(n))$ if there exist constants $c > 0$ and $n_0 \ge 0$ such that
$$f(n) \ge c \cdot g(n) \quad \text{for all } n \ge n_0.$$
In words: $f$ grows at least as fast as $g$, up to a constant factor.
Example 2.3. $3n^2 + 5n + 2 = \Omega(n^2)$.
Proof. For all $n \ge 0$, $3n^2 + 5n + 2 \ge 3n^2$. Choose $c = 3$ and $n_0 = 0$.
Big-Omega is especially useful for expressing lower bounds on problems: "any algorithm that solves this problem must take at least $\Omega(g(n))$ time."
Big-Theta: tight bounds
Definition 2.4 - Big-Theta notation
We write $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.
Equivalently, there exist constants $c_1, c_2 > 0$ and $n_0 \ge 0$ such that
$$c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n) \quad \text{for all } n \ge n_0.$$
In words: $f$ and $g$ grow at the same rate, up to constant factors.
Example 2.4. From Examples 2.2 and 2.3, we have $3n^2 + 5n + 2 = \Theta(n^2)$.
Big-Theta is the most precise statement: it says the function grows exactly like , within constant factors. When we can determine a Big-Theta bound for an algorithm, we have characterized its running time completely (in the asymptotic sense).
Summary of notation
| Notation | Meaning | Analogy |
|---|---|---|
| $f(n) = O(g(n))$ | $f$ grows no faster than $g$ | $\le$ |
| $f(n) = \Omega(g(n))$ | $f$ grows at least as fast as $g$ | $\ge$ |
| $f(n) = \Theta(g(n))$ | $f$ and $g$ grow at the same rate | $=$ |
The analogy to $\le$, $\ge$, $=$ is informal but helpful for intuition. Formally, all three notations suppress constant factors and describe behavior only for sufficiently large $n$.
Common growth rates
The following table lists growth rates that appear throughout this book, ordered from slowest to fastest:
| Growth rate | Name | Example |
|---|---|---|
| $O(1)$ | Constant | Array index access |
| $O(\log n)$ | Logarithmic | Binary search |
| $O(n)$ | Linear | Finding the maximum |
| $O(n \log n)$ | Linearithmic | Merge sort, heap sort |
| $O(n^2)$ | Quadratic | Insertion sort (worst case) |
| $O(n^3)$ | Cubic | Floyd-Warshall all-pairs shortest paths |
| $O(2^n)$ | Exponential | Brute-force subset enumeration |
| $O(n!)$ | Factorial | Brute-force permutation enumeration |
To appreciate the practical impact, consider an algorithm that performs $f(n)$ operations on a computer executing $10^9$ operations per second:
| $n$ | $n$ | $n \log n$ | $n^2$ | $n^3$ | $2^n$ |
|---|---|---|---|---|---|
| $10$ | 10 ns | 33 ns | 100 ns | 1 μs | 1 μs |
| $100$ | 100 ns | 664 ns | 10 μs | 1 ms | $4 \times 10^{13}$ years |
| $1{,}000$ | 1 μs | 10 μs | 1 ms | 1 s | $10^{284}$ years |
| $10^6$ | 1 ms | 20 ms | 17 min | 31.7 years | astronomical |
| $10^9$ | 1 s | 30 s | 31.7 years | $3 \times 10^{10}$ years | astronomical |
The table makes a powerful point: the gap between $n \log n$ and $n^2$ is large for a million elements, and the jump to $2^n$ is catastrophic even for modest inputs.
Best case, worst case, and average case
The running time of an algorithm usually depends on the specific input, not just its size. Consider insertion sort.
Insertion sort as a running example
The following is an implementation of insertion sort, which we will study fully in Chapter 4. For now, we use it as an analysis example:
export function insertionSort<T>(
elements: T[],
comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
const copy = elements.slice(0);
for (let i = 1; i < copy.length; i++) {
const toInsert = copy[i]!;
let insertIndex = i - 1;
while (insertIndex >= 0 && comparator(toInsert, copy[insertIndex]!) < 0) {
copy[insertIndex + 1] = copy[insertIndex]!;
insertIndex--;
}
insertIndex++;
copy[insertIndex] = toInsert;
}
return copy;
}
The outer loop runs $n - 1$ iterations (for $i = 1, \dots, n - 1$). For each iteration, the inner while loop shifts elements to the right until it finds the correct insertion point. The number of shifts depends on the input.
Worst-case analysis
Definition 2.5 - Worst-case running time
The worst-case running time is the maximum running time over all inputs of size $n$:
$$T_{\text{worst}}(n) = \max_{\text{inputs } x \text{ of size } n} T(x).$$
For insertion sort, the worst case occurs when the array is sorted in reverse order: $a_1 > a_2 > \cdots > a_n$. In this case, every new element must be shifted past all previously sorted elements. The inner loop performs $i$ comparisons in iteration $i$, so the total number of comparisons is:
$$\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = \Theta(n^2).$$
Best-case analysis
Definition 2.6 - Best-case running time
The best-case running time is the minimum running time over all inputs of size $n$:
$$T_{\text{best}}(n) = \min_{\text{inputs } x \text{ of size } n} T(x).$$
For insertion sort, the best case occurs when the array is already sorted. Each new element is already in its correct position, so the inner loop performs zero shifts — just one comparison to discover that no shifting is needed. The total is:
$$\sum_{i=1}^{n-1} 1 = n - 1 = \Theta(n).$$
This is remarkable: insertion sort runs in linear time on already-sorted input, matching the theoretical minimum for any comparison-based algorithm that must verify sortedness.
Average-case analysis
Definition 2.7 - Average-case running time
The average-case running time is the expected running time over some distribution of inputs. For a uniform distribution over all permutations of $n$ elements:
$$T_{\text{avg}}(n) = \frac{1}{n!} \sum_{\text{permutations } \pi} T(\pi).$$
For insertion sort, consider iteration $i$: the element being inserted has an equal probability of belonging at any of the $i + 1$ positions in the sorted prefix. On average, it must be shifted past half of the sorted elements, so the expected number of comparisons in iteration $i$ is roughly $i/2$. The total expected comparisons are:
$$\sum_{i=1}^{n-1} \frac{i}{2} = \frac{n(n-1)}{4} = \Theta(n^2).$$
The average case is still $\Theta(n^2)$ — the same order of growth as the worst case. The constant factor is half as large, but asymptotically the behavior is the same.
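These three cases can also be observed empirically. The sketch below instruments a simplified, number-only insertion sort (our own variant, not the book's generic implementation) to count comparisons:

```typescript
// Insertion sort instrumented to count element comparisons.
function countComparisons(elements: number[]): number {
  const copy = elements.slice(0);
  let comparisons = 0;
  for (let i = 1; i < copy.length; i++) {
    const toInsert = copy[i]!;
    let j = i - 1;
    while (j >= 0) {
      comparisons++; // one comparison per inner-loop probe
      if (toInsert < copy[j]!) {
        copy[j + 1] = copy[j]!; // shift the larger element right
        j--;
      } else {
        break; // insertion point found
      }
    }
    copy[j + 1] = toInsert;
  }
  return comparisons;
}

const n = 100;
const sorted = Array.from({ length: n }, (_, i) => i);
const reversed = sorted.slice().reverse();
console.log(countComparisons(sorted)); // 99 = n - 1 (best case)
console.log(countComparisons(reversed)); // 4950 = n(n-1)/2 (worst case)
```

A shuffled input lands between the two extremes, near $n(n-1)/4$, matching the average-case estimate.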
Which case matters?
In practice, worst-case analysis is the most commonly used, for several reasons:
- Guarantees. The worst case gives an upper bound that holds for every input. This is crucial in real-time systems, web servers, and other contexts where performance must be predictable.
- Average case can be misleading. The "average" depends on the input distribution, which we may not know. If the actual inputs differ from our assumption, the average-case analysis may not apply.
- Worst case is often typical. For many algorithms, the worst case and average case have the same asymptotic growth rate (as we just saw with insertion sort).
We will occasionally discuss best-case and average-case bounds when they provide useful insight, but unless otherwise stated, all complexity bounds in this book refer to the worst case.
Amortized analysis
Sometimes an operation is expensive occasionally but cheap most of the time. Amortized analysis gives us a way to average the cost over a sequence of operations, providing a tighter bound than the worst-case cost per operation.
The dynamic array example
Consider a dynamic array (like JavaScript's Array or std::vector in C++) that supports an append operation. The array maintains an internal buffer of some capacity. When the buffer is full and a new element is appended, the array allocates a new buffer of double the capacity and copies all existing elements over. This resize operation costs $\Theta(n)$, where $n$ is the current number of elements.
At first glance, this seems concerning: a single append can cost $\Theta(n)$. But resizes happen infrequently — only when the size reaches a power of 2. Let us analyze the cost of $n$ consecutive appends starting from an empty array.
The resize operations happen at sizes $1, 2, 4, 8, \dots, 2^k$, where $2^k < n$. The total copying cost across all resizes is:
$$1 + 2 + 4 + \cdots + 2^k = 2^{k+1} - 1 < 2n.$$
Adding the $n$ operations that simply write the new element into the array (cost 1 each), the total cost of $n$ appends is less than $2n + n = 3n$. Therefore the amortized cost per append is:
$$\frac{3n}{n} = 3 = O(1).$$
Each individual append may cost $\Theta(n)$ in the worst case, but averaged over a sequence of operations, the cost is $O(1)$ per operation.
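This argument can be checked by simulation. The sketch below charges 1 unit per element write and $n$ units per resize copy, then reports the total and amortized cost:

```typescript
// Simulate n appends to a doubling dynamic array, tallying the total cost:
// 1 unit per write, plus `size` units each time the buffer is copied on resize.
function totalAppendCost(n: number): number {
  let capacity = 1;
  let size = 0;
  let cost = 0;
  for (let i = 0; i < n; i++) {
    if (size === capacity) {
      cost += size; // copy all existing elements into the doubled buffer
      capacity *= 2;
    }
    cost += 1; // write the new element
    size++;
  }
  return cost;
}

for (const n of [10, 1000, 1_000_000]) {
  console.log(n, totalAppendCost(n), totalAppendCost(n) / n); // amortized cost stays below 3
}
```

For $n = 10$ the total is 25 units (copies at sizes 1, 2, 4, 8 plus ten writes), an amortized cost of 2.5, and the ratio stays below 3 for every $n$.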
Amortized vs. average case
It is important to distinguish amortized analysis from average-case analysis:
- Average case averages over random inputs: we assume a probability distribution on the inputs and compute the expected running time.
- Amortized analysis averages over a sequence of operations on a worst-case input: no probability is involved. The guarantee holds deterministically.
Amortized analysis says: "no matter what sequence of $n$ operations you perform, the total cost is at most $3n$, so the amortized cost per operation is $O(1)$." This is a worst-case guarantee about the total, not a probabilistic statement.
We will see amortized analysis again in Chapter 7 (dynamic arrays), Chapter 11 (binary heaps), and Chapter 18 (union-find).
Recurrence relations
When an algorithm solves a problem by breaking it into smaller instances of the same problem, its running time is naturally expressed as a recurrence relation: a formula that expresses $T(n)$ in terms of $T$ applied to smaller values.
Setting up a recurrence
Example 2.5: Binary search. Binary search (discussed in Chapter 3) repeatedly halves the search space:
- Compare the target with the middle element.
- If they match, return the index.
- Otherwise, recurse on the left or right half.
The running time satisfies the recurrence:
$$T(n) = T(n/2) + \Theta(1).$$
The $T(n/2)$ term accounts for the recursive call on half the array, and the $\Theta(1)$ term accounts for the comparison and index computation.
Example 2.6: Merge sort. Merge sort (discussed in Chapter 5) divides the array in half, recursively sorts both halves, and merges the results:
$$T(n) = 2T(n/2) + \Theta(n).$$
The two recursive calls each process half the array ($2T(n/2)$), and the merge step takes $\Theta(n)$ time.
Solving recurrences by expansion
One way to solve a recurrence is to expand it repeatedly until a pattern emerges.
Example 2.7: Solving the binary search recurrence.
Expanding, with $c$ the constant cost per level:
$$T(n) = T(n/2) + c = T(n/4) + 2c = T(n/8) + 3c = \cdots = T(n/2^k) + kc.$$
The recursion bottoms out when $n/2^k = 1$, i.e., $k = \log_2 n$. Therefore:
$$T(n) = T(1) + c \log_2 n = \Theta(\log n).$$
Example 2.8: Solving the merge sort recurrence.
Expanding:
$$T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = 8T(n/8) + 3cn = \cdots$$
At level $k$: $T(n) = 2^k T(n/2^k) + kcn$. Setting $k = \log_2 n$:
$$T(n) = n \cdot T(1) + cn \log_2 n = \Theta(n \log n).$$
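For powers of two, the expansion can be verified by evaluating the recurrence directly. Taking $c = 1$ and $T(1) = 1$, the closed form becomes $T(n) = n \log_2 n + n$:

```typescript
// Evaluate the merge sort recurrence T(n) = 2 T(n/2) + n with T(1) = 1,
// for powers of two, and compare against the closed form n * log2(n) + n.
function T(n: number): number {
  if (n === 1) return 1;
  return 2 * T(n / 2) + n;
}

for (const n of [2, 8, 1024]) {
  console.log(n, T(n), n * Math.log2(n) + n); // the two values agree
}
```

For example, $T(8) = 2 \cdot 12 + 8 = 32 = 8 \cdot 3 + 8$, and $T(1024) = 1024 \cdot 10 + 1024 = 11264$.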
The recursion tree method
A recursion tree is a visual tool for solving recurrences. Each node represents the cost at one level of recursion, and the total cost is the sum over all nodes.
For merge sort with $T(n) = 2T(n/2) + cn$:
Level 0: cn → cost cn
/ \
Level 1: cn/2 cn/2 → cost cn
/ \ / \
Level 2: cn/4 cn/4 cn/4 cn/4 → cost cn
... ...
Level k: c c c ... c c c → cost cn
(n leaves)
There are $\log_2 n + 1$ levels, each contributing $cn$ work, so the total is $\Theta(n \log n)$.
The Master Theorem
The Master Theorem provides a general solution for recurrences of a common form.
Definition 2.8 - The Master Theorem
Let $a \ge 1$ and $b > 1$ be constants, let $f(n)$ be a function, and let $T(n)$ be defined by the recurrence
$$T(n) = a \, T(n/b) + f(n).$$
Then $T(n)$ can be bounded asymptotically as follows:
If $f(n) = O(n^{\log_b a - \varepsilon})$ for some constant $\varepsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$.
If $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some constant $\varepsilon > 0$, and if $a \, f(n/b) \le c \, f(n)$ for some constant $c < 1$ and sufficiently large $n$, then $T(n) = \Theta(f(n))$.
The three cases correspond to three scenarios, and the intuition comes directly from examining the recursion tree for the general recurrence $T(n) = a \, T(n/b) + f(n)$.
A note to the reader. Understanding why the Master Theorem works is not required for the rest of this book — only knowing how to apply it (which we cover in the examples that follow). The proof sketch below is provided for the sake of completeness. If the math feels a bit daunting, feel free to skip ahead to Example 2.9 and return to this section later.
The general recursion tree
At each level of the recursion, one problem of size $n$ spawns $a$ subproblems of size $n/b$ and performs $f(n)$ non-recursive work. The following table summarizes the tree level by level:
| Level | Nodes | Subproblem size | Work per node | Total work at level |
|---|---|---|---|---|
| $0$ | $1$ | $n$ | $f(n)$ | $f(n)$ |
| $1$ | $a$ | $n/b$ | $f(n/b)$ | $a \, f(n/b)$ |
| $2$ | $a^2$ | $n/b^2$ | $f(n/b^2)$ | $a^2 f(n/b^2)$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| $\log_b n$ | $a^{\log_b n} = n^{\log_b a}$ | $1$ | $\Theta(1)$ | $\Theta(n^{\log_b a})$ |
The tree has $\log_b n + 1$ levels (since we divide by $b$ at each step until we reach size 1). The number of leaves is $a^{\log_b n} = n^{\log_b a}$ — this quantity is central to all three cases.
Why does $a^{\log_b n} = n^{\log_b a}$?
This identity can look like a magic trick the first time you see it. Here is a concrete way to understand it, followed by the general argument.
Concrete example. Take $a = 4$, $b = 2$, $n = 16$. Then $\log_b n = \log_2 16 = 4$, so the left side is $4^4 = 256$. For the right side, $\log_b a = \log_2 4 = 2$, so $n^{\log_b a} = 16^2 = 256$. They match — but why?
Step-by-step derivation. The key idea is to rewrite $a$ as a power of $b$. Since logarithms and exponents are inverses, we can always write $a = b^{\log_b a}$. Now substitute this into $a^{\log_b n}$:
$$a^{\log_b n} = \left(b^{\log_b a}\right)^{\log_b n} = b^{(\log_b a)(\log_b n)}.$$
Meanwhile, we can do the same thing with $n = b^{\log_b n}$ and compute:
$$n^{\log_b a} = \left(b^{\log_b n}\right)^{\log_b a} = b^{(\log_b n)(\log_b a)}.$$
The two exponents are identical — just multiplication in a different order — so $a^{\log_b n} = n^{\log_b a}$. The trick is nothing more than $(x^y)^z = x^{yz}$.
Why it matters here. The left form, $a^{\log_b n}$, is easy to read off the tree: $a$ children per node, $\log_b n$ levels deep, so $a^{\log_b n}$ leaves. The right form, $n^{\log_b a}$, is more useful for asymptotic analysis because it expresses the leaf count as a polynomial in $n$, making it easy to compare with $f(n)$.
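The identity can also be checked numerically (floating-point, so the two sides agree only up to rounding):

```typescript
// Check a^(log_b n) = n^(log_b a) numerically for a few (a, b, n) triples.
const logBase = (b: number, x: number): number => Math.log(x) / Math.log(b);

function bothSides(a: number, b: number, n: number): [number, number] {
  return [Math.pow(a, logBase(b, n)), Math.pow(n, logBase(b, a))];
}

console.log(bothSides(4, 2, 16)); // both sides ≈ 256
console.log(bothSides(7, 2, 64)); // both sides ≈ 7^6 = 117649
```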
The total cost is the sum across all levels:
$$T(n) = \sum_{k=0}^{\log_b n - 1} a^k f(n/b^k) + \Theta(n^{\log_b a}).$$
The Master Theorem's three cases arise from comparing the non-recursive work $f(n)$ to the leaf count $n^{\log_b a}$, which determines the shape of how work is distributed across levels.
Case 1: Leaf-dominated — $f(n) = O(n^{\log_b a - \varepsilon})$
When $f(n)$ is polynomially smaller than $n^{\log_b a}$, the work increases geometrically as you move down the tree. Each level does more work than the one above it, so the leaves dominate:
| Level | Total work at level | |
|---|---|---|
| $0$ | $f(n)$ | ▎ |
| $1$ | $a \, f(n/b)$ | ▍ |
| $2$ | $a^2 f(n/b^2)$ | ▌ |
| $\vdots$ | $\vdots$ | ▊ |
| $\log_b n$ | $\Theta(n^{\log_b a})$ | ████ ← dominates |
Since the bottom level dominates a geometric series, the total is $T(n) = \Theta(n^{\log_b a})$.
Intuition: The function $f$ shrinks so fast that the proliferation of subproblems ($a$ new ones per level) outpaces the reduction in per-node work. The cost is essentially the number of leaves times $\Theta(1)$ per leaf.
Example: Strassen's algorithm, $T(n) = 7T(n/2) + \Theta(n^2)$. Here $n^{\log_2 7} \approx n^{2.81}$, which dominates $n^2$.
Case 2: Evenly distributed — $f(n) = \Theta(n^{\log_b a})$
When $f(n)$ is proportional to $n^{\log_b a}$, a remarkable cancellation occurs: every level of the tree contributes roughly the same amount of work. The increase in the number of nodes at each level is exactly offset by the decrease in per-node work:
| Level | Total work at level | |
|---|---|---|
| $0$ | $\Theta(n^{\log_b a})$ | ████ |
| $1$ | $\Theta(n^{\log_b a})$ | ████ |
| $2$ | $\Theta(n^{\log_b a})$ | ████ |
| $\vdots$ | $\vdots$ | ████ |
| $\log_b n$ | $\Theta(n^{\log_b a})$ | ████ |
There are $\Theta(\log n)$ levels, each contributing $\Theta(n^{\log_b a})$, so $T(n) = \Theta(n^{\log_b a} \log n)$.
Intuition: To see why each level contributes the same work, note that at level $k$, the work is $a^k \, f(n/b^k)$. If $f(n) = \Theta(n^{\log_b a})$, then $f(n/b^k) = \Theta\big((n/b^k)^{\log_b a}\big) = \Theta\big(n^{\log_b a} / a^k\big)$, since $(b^k)^{\log_b a} = a^k$. Multiplying by $a^k$ gives $\Theta(n^{\log_b a})$ — the factors cancel exactly.
Example: Merge sort, $T(n) = 2T(n/2) + \Theta(n)$. Here $n^{\log_2 2} = n$, so every level costs $\Theta(n)$, and the total is $\Theta(n \log n)$.
Case 3: Root-dominated — $f(n) = \Omega(n^{\log_b a + \varepsilon})$
When $f(n)$ is polynomially larger than $n^{\log_b a}$, the work decreases geometrically as you move down the tree. The root does the most work, and each subsequent level does less:
| Level | Total work at level | |
|---|---|---|
| $0$ | $f(n)$ | ████ ← dominates |
| $1$ | $a \, f(n/b)$ | ▊ |
| $2$ | $a^2 f(n/b^2)$ | ▌ |
| $\vdots$ | $\vdots$ | ▍ |
| $\log_b n$ | $\Theta(n^{\log_b a})$ | ▎ |
The total is a decreasing geometric series dominated by its first term, so $T(n) = \Theta(f(n))$.
Intuition: The non-recursive work $f(n)$ is so large relative to the number of subproblems that the recursive calls barely contribute. The extra "regularity condition" $a \, f(n/b) \le c \, f(n)$ for some $c < 1$ guarantees that the per-level costs truly form a decreasing geometric series (i.e., that $f$ doesn't have pathological oscillations that could break the argument).
Example: $T(n) = T(n/2) + \Theta(n)$. Here $n^{\log_2 1} = n^0 = 1$, but $f(n) = \Theta(n)$. The root does $cn$ work, the next level does $cn/2$, then $cn/4$, and so on. The total is $cn(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots) \le 2cn = \Theta(n)$.
Proof intuition: why these are the only three cases
The key insight is that the per-level work at level $k$ is:
$$a^k \cdot f(n/b^k).$$
As $k$ increases from $0$ to $\log_b n$, the factor $a^k$ grows exponentially (more subproblems) while $f(n/b^k)$ shrinks (smaller subproblem sizes). The total cost is $\sum_k a^k f(n/b^k)$, which is a sum of $\Theta(\log n)$ terms. The behavior of this sum depends on whether $a^k f(n/b^k)$ is increasing, constant, or decreasing in $k$:
- If $f$ shrinks faster than $a^k$ grows → $a^k f(n/b^k)$ increases → geometric sum dominated by the last term (leaves) → Case 1.
- If $f$ shrinks at exactly the rate $a^k$ grows → $a^k f(n/b^k)$ is constant → equal terms → Case 2.
- If $f$ shrinks slower than $a^k$ grows → $a^k f(n/b^k)$ decreases → geometric sum dominated by the first term (root) → Case 3.
The critical exponent $\log_b a$ is the dividing line: it is the rate at which the number of subproblems grows across levels. When $f(n)$ grows faster than $n^{\log_b a}$, the root dominates; when it grows slower, the leaves dominate; when it grows at the same rate, all levels are balanced.
Let us apply the Master Theorem to our earlier examples.
Example 2.9: Binary search. $T(n) = T(n/2) + \Theta(1)$.
Here $a = 1$, $b = 2$, $f(n) = \Theta(1)$. We have $n^{\log_b a} = n^{\log_2 1} = n^0 = 1$. Since $f(n) = \Theta(1) = \Theta(n^{\log_b a})$, Case 2 applies:
$$T(n) = \Theta(n^0 \log n) = \Theta(\log n).$$
Example 2.10: Merge sort. $T(n) = 2T(n/2) + \Theta(n)$.
Here $a = 2$, $b = 2$, $f(n) = \Theta(n)$. We have $n^{\log_b a} = n^{\log_2 2} = n$. Since $f(n) = \Theta(n)$, Case 2 applies:
$$T(n) = \Theta(n \log n).$$
Example 2.11: Strassen's matrix multiplication. $T(n) = 7T(n/2) + \Theta(n^2)$.
Here $a = 7$, $b = 2$, $f(n) = \Theta(n^2)$. We have $n^{\log_b a} = n^{\log_2 7} \approx n^{2.807}$. Since $f(n) = O(n^{\log_2 7 - \varepsilon})$ for, say, $\varepsilon = 0.8$, Case 1 applies:
$$T(n) = \Theta(n^{\log_2 7}) \approx \Theta(n^{2.81}).$$
This is better than the naive $\Theta(n^3)$ matrix multiplication.
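When the non-recursive work is a plain polynomial $f(n) = n^d$, applying the theorem reduces to comparing $d$ against $\log_b a$. The helper below is our own sketch of that case analysis (for $f(n) = n^d$, the Case 3 regularity condition holds automatically):

```typescript
// Apply the Master Theorem to T(n) = a T(n/b) + Θ(n^d) by comparing
// the exponent d with the critical exponent log_b(a).
function masterTheorem(a: number, b: number, d: number): string {
  const critical = Math.log(a) / Math.log(b); // log_b(a)
  const pow = (e: number): string => (e === 0 ? "1" : e === 1 ? "n" : `n^${e}`);
  const eps = 1e-9; // tolerance for floating-point comparison
  if (d < critical - eps) return `Θ(n^${critical.toFixed(2)})`; // Case 1: leaves dominate
  if (d > critical + eps) return `Θ(${pow(d)})`; // Case 3: root dominates
  return d === 0 ? "Θ(log n)" : `Θ(${pow(d)} log n)`; // Case 2: balanced
}

console.log(masterTheorem(1, 2, 0)); // binary search → "Θ(log n)"
console.log(masterTheorem(2, 2, 1)); // merge sort → "Θ(n log n)"
console.log(masterTheorem(7, 2, 2)); // Strassen → "Θ(n^2.81)"
```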
Limitations of the Master Theorem
The Master Theorem does not cover all recurrences. It requires that the subproblems be of equal size and that $f(n)$ fall into one of the three cases. Recurrences like $T(n) = T(n/4) + T(3n/4) + \Theta(n)$ (which arises in randomized quicksort analysis) do not fit the template directly. For such cases, the recursion-tree method or the Akra–Bazzi theorem can be used.
Space complexity
So far we have focused on time complexity, but algorithms also consume memory. Space complexity measures the amount of additional memory an algorithm uses beyond the input.
Definition 2.9 - Space complexity
The space complexity of an algorithm is the maximum amount of memory it uses at any point during execution, measured as a function of the input size.
We distinguish between:
- Auxiliary space: the extra memory used beyond the input itself.
- Total space: auxiliary space plus the space for the input.
Unless stated otherwise, when we refer to "space complexity" in this book, we mean auxiliary space.
Example 2.12: Space complexity of max.
The max function from Chapter 1 uses a single variable result. Its auxiliary space is $O(1)$.
Example 2.13: Space complexity of merge sort.
Merge sort requires a temporary array of size $n$ for the merge step, plus $O(\log n)$ space for the recursion stack. Its auxiliary space is $O(n)$.
Example 2.14: Space complexity of insertion sort.
Our insertion sort implementation copies the input array (space $O(n)$). An in-place variant that sorts the array directly would use only $O(1)$ auxiliary space.
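For concreteness, an in-place variant along the lines of Example 2.14 might look like this (a sketch; unlike the book's implementation, it mutates the caller's array instead of copying it):

```typescript
// In-place insertion sort: O(1) auxiliary space (just a few index variables),
// at the cost of mutating the input array.
function insertionSortInPlace(elements: number[]): void {
  for (let i = 1; i < elements.length; i++) {
    const toInsert = elements[i]!;
    let j = i - 1;
    while (j >= 0 && elements[j]! > toInsert) {
      elements[j + 1] = elements[j]!; // shift larger elements right
      j--;
    }
    elements[j + 1] = toInsert;
  }
}

const data = [5, 2, 4, 6, 1, 3];
insertionSortInPlace(data);
console.log(data); // [1, 2, 3, 4, 5, 6]
```

The trade-off is typical: the copying version is safer (the caller's data is untouched) but pays $O(n)$ space; the in-place version is frugal but has a side effect.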
Time–space trade-offs
Often there is a trade-off between time and space. An algorithm can sometimes be made faster by using more memory, or made more memory-efficient at the cost of additional computation. A classic example:
- Hash table lookup (Chapter 8): $O(1)$ average time, $O(n)$ space.
- Linear search through an unsorted array: $O(n)$ time, $O(1)$ space.
Both solve the problem of finding an element in a collection, but they make different trade-offs. Recognizing and navigating these trade-offs is a recurring theme in algorithm design.
Practical considerations
Asymptotic analysis is a powerful framework, but it has limitations that a practicing programmer should keep in mind.
Constant factors matter for moderate
Asymptotic notation hides constant factors. As an example, an algorithm with running time $100n$ is $O(n)$, and an algorithm with running time $n^2$ is $O(n^2)$. For $n < 100$, the "slower" quadratic algorithm is actually faster. In practice, constant factors depend on:
- The number of operations per step.
- Cache behavior — algorithms with good spatial locality are faster in practice.
- Branch prediction — algorithms with predictable control flow benefit from CPU branch predictors.
This is why, for example, insertion sort (which is $O(n^2)$) is often used for small arrays (say, $n \le 16$) even inside asymptotically faster algorithms like merge sort. The constant factor is smaller, and for tiny inputs the quadratic term has not yet become dominant.
Lower-order terms
An algorithm that performs $n^2 + 100n$ operations is $\Theta(n^2)$, but for $n < 100$, the linear term dominates. Asymptotic analysis describes long-term growth; for small inputs, the actual constants and lower-order terms may be more important.
Choosing the right model
Our analysis assumes a simple model where every basic operation takes the same amount of time. Real computers have caches, pipelines, and memory hierarchies that make some access patterns much faster than others. An algorithm that accesses memory sequentially (like insertion sort) can be significantly faster in practice than one that accesses memory randomly (like binary search on a large array), even if the latter has a better asymptotic bound.
Despite these caveats, asymptotic analysis remains the single most useful tool for comparing algorithms. It correctly predicts which algorithm will win for large enough inputs, and "large enough" usually means "the sizes that actually matter in practice."
Summary
In this chapter we have developed the fundamental tools for analyzing algorithms:
- Asymptotic notation ($O$, $\Omega$, $\Theta$) captures growth rates while abstracting away constant factors and hardware details.
- Worst-case analysis gives reliable upper bounds on running time. Best-case and average-case analyses provide additional insight.
- Amortized analysis reveals that operations with occasional expensive steps can still be efficient on average.
- Recurrence relations express the running time of recursive algorithms, and the Master Theorem provides a quick way to solve common recurrences.
- Space complexity measures memory usage and highlights time–space trade-offs.
Armed with these tools, we are ready to analyze every algorithm in this book rigorously. In the next chapter, we explore recursion and the divide-and-conquer strategy — one of the most powerful algorithm design techniques — and apply our analytical framework to algorithms like binary search and the closest pair of points.
Exercises
Exercise 2.1. Rank the following functions by asymptotic growth rate, from slowest to fastest. For each consecutive pair $f$, $g$ in your ranking, state whether $f(n) = O(g(n))$, $f(n) = \Omega(g(n))$, or $f(n) = \Theta(g(n))$.
Exercise 2.2. Prove or disprove: if and , then . (In other words, is Big-O transitive?)
Exercise 2.3. For each of the following recurrences, use the Master Theorem to determine the asymptotic bound, or explain why the Master Theorem does not apply.
(a)
(b)
(c)
(d)
Exercise 2.4. Consider a dynamic array that triples (instead of doubles) its capacity when full. Prove that the amortized cost of an append operation is still $O(1)$. How does the constant factor compare to the doubling strategy?
Exercise 2.5. An algorithm processes an array of $n$ elements as follows: for each element, it performs a binary search over the preceding elements. What is the overall time complexity? Express your answer in Big-Theta notation.
Recursion and Divide-and-Conquer
Recursion is one of the most powerful techniques in algorithm design: solving a problem by breaking it into smaller instances of the same problem. In this chapter we study recursion from the ground up, connect it to mathematical induction, and then develop the divide-and-conquer strategy — splitting a problem into independent subproblems, solving each recursively, and combining the results. We illustrate these ideas with four algorithms: binary search, fast exponentiation, the Euclidean algorithm for greatest common divisors, and the closest pair of points.
Recursion
Some problems have a natural recursive structure: the solution depends on solving one or more smaller instances of the same problem. When we translate this structure into code, we get a recursive function — a function that calls itself. This is not mere circularity: each call works on a smaller instance of the problem, and eventually the instances become small enough to solve directly. Every recursive function has two essential ingredients:
- Base case. One or more input sizes for which the answer is immediate, without further recursion.
- Recursive case. For larger inputs, the function reduces the problem to one or more smaller instances and combines the results.
Consider a simple example: computing the factorial $n! = n \times (n-1) \times \cdots \times 2 \times 1$.
function factorial(n: number): number {
if (n <= 1) return 1; // base case
return n * factorial(n - 1); // recursive case
}
The base case is $n \le 1$, where we return 1. The recursive case multiplies $n$ by the factorial of $n - 1$. Each recursive call reduces the argument by 1, so the chain of calls eventually reaches the base case:
$$\text{factorial}(4) \to \text{factorial}(3) \to \text{factorial}(2) \to \text{factorial}(1).$$
The call stack
To understand how recursion works at runtime, we need to understand how computers handle function calls in general. Whenever any function is called — recursive or not — the runtime needs to remember where to return after the call finishes, and what values the local variables had. It stores this information in a frame: a block of memory holding the function's arguments, local variables, and the return address (the point in the calling code to resume after the call completes).
These frames are organized in a call stack — a stack data structure where each new call pushes a frame on top, and each return pops one off. For ordinary (non-recursive) code, the stack is typically shallow — for example, if main calls f, which calls g, the stack is only three frames deep. But with recursion, the same function can appear many times on the stack simultaneously, each frame representing a different invocation with different argument values.
For factorial(4), the stack grows to depth 4 before the base case is reached:
factorial(4) — waiting for factorial(3)
factorial(3) — waiting for factorial(2)
factorial(2) — waiting for factorial(1)
factorial(1) — returns 1
factorial(2) — returns 2 × 1 = 2
factorial(3) — returns 3 × 2 = 6
factorial(4) — returns 4 × 6 = 24
Each frame occupies memory, so a recursion of depth $d$ uses $\Theta(d)$ stack space. For factorial(n), the depth is $n$, so the space complexity is $\Theta(n)$.
Stack overflow
The call stack has a fixed maximum size, set by the operating system or the language runtime. When a recursion goes too deep, the stack runs out of space, and the program crashes with a stack overflow error. This is not an abstract concern — it happens easily in practice. For example, our factorial function works fine for small inputs, but calling factorial(100000) will likely crash:
factorial(100000); // RangeError: Maximum call stack size exceeded
In JavaScript and TypeScript, the default stack size typically allows around 10,000–15,000 frames (the exact limit depends on the runtime and the size of each frame). Other languages have similar limits.
This is an important practical consideration when choosing between a recursive and an iterative solution. Any recursion can be rewritten as a loop with an explicit stack, trading the elegance of recursion for safety against overflow. But for some problems the clarity and elegance of the recursive solution outweigh the cost. For algorithms where the recursion depth is logarithmic (like binary search, with depth $\log_2 n$), stack overflow is never a concern — even for $n = 10^{18}$, the depth is only about 60. But for algorithms where the depth is linear in the input (like our factorial), an iterative version may be preferable for large inputs.
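As an illustration of the rewrite-as-a-loop idea, here is a sketch of an iterative factorial. It uses constant stack depth, and bigint so that large results do not lose precision (the name factorialIterative is ours, not from the book's repository):

```typescript
// Iterative factorial: the loop keeps stack depth constant, so there is no
// stack overflow risk even for very large n. bigint avoids the precision
// loss that Number would suffer beyond 2^53.
function factorialIterative(n: number): bigint {
  let result = 1n;
  const limit = BigInt(n);
  for (let i = 2n; i <= limit; i++) {
    result *= i;
  }
  return result;
}

console.log(factorialIterative(5)); // 120n
console.log(factorialIterative(10_000).toString().length); // 35660 digits, no crash
```

A call like factorial(10000) on the recursive version risks exceeding the stack limit; the loop handles it without issue.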
Common pitfalls
Two mistakes arise frequently when writing recursive functions:
-
Missing base case. Without a base case, the recursion never terminates:
function infiniteRecursion(n: number): number {
  return n * infiniteRecursion(n - 1); // no base case!
}

This is not an algorithm in the sense of Definition 1.1 — it does not terminate.
-
Subproblems that do not shrink. Even with a base case, the recursion must make progress:
function noProgress(n: number): number {
  if (n <= 1) return 1;
  return n * noProgress(n); // n does not decrease!
}

This function never reaches the base case for $n > 1$.
Recursion and mathematical induction
Mathematical induction is a proof technique for showing that a statement is true for every element of a well-ordered sequence — most commonly the natural numbers, but the idea applies whenever instances can be ranked by size (array lengths, tree depths, and so on). It works in two steps. First, you show the statement is true for the smallest case (usually $n = 0$ or $n = 1$) — this is the base case. Second, you show that if the statement is true for some number $k$, then it must also be true for $k + 1$ — this is the inductive step. Together, these two steps create a chain of reasoning: the base case establishes the first domino, and the inductive step guarantees that each domino knocks over the next, so the statement holds for all natural numbers.
There is a deep connection between this technique and recursion. Induction proves that a property holds for all natural numbers; recursion computes a value for all valid inputs. The structures are parallel:
| Induction | Recursion |
|---|---|
| Base case: prove $P(0)$ (or $P(1)$) | Base case: return a value directly |
| Inductive step: assuming $P(k)$, prove $P(k + 1)$ | Recursive case: assuming the recursive call returns the correct result, compute the current result |
This parallel is not a coincidence — it is the foundation for proving recursive algorithms correct. To prove that a recursive function computes the right answer, we use strong induction (also called complete induction): assume the function works correctly for all inputs smaller than $n$, and show it works correctly for input $n$.
Definition 3.1 - Correctness of a recursive algorithm
A recursive algorithm is correct if:
- It produces the correct answer on all base cases.
- If the algorithm produces the correct answer on every strictly smaller subproblem, then it also produces the correct answer on the current problem.
When we implement a recursive algorithm as a function in code, these two conditions translate directly: the base case corresponds to the if branch that returns a value without recursing, and the recursive case corresponds to the branch that calls the function on a smaller input and combines the result. Condition 2 becomes: if every recursive call on a strictly smaller input returns the correct answer, then the current call also returns the correct answer.
Not every function is an algorithm — a function might not terminate, or might not solve a well-defined problem — but when a recursive function does implement an algorithm, proving it correct means verifying exactly these two conditions.
Example 3.1: Correctness of factorial.
Base case. When $n \le 1$, the function returns 1, and indeed $0! = 1! = 1$.
Inductive step. Assume factorial(k) returns $k!$ for all $k < n$. Then factorial(n) returns $n \cdot \text{factorial}(n - 1) = n \cdot (n - 1)! = n!$.
Divide and conquer
Divide and conquer is a specific recursion pattern that solves a problem by:
- Divide: split the input into two or more smaller subproblems of the same type.
- Conquer: solve each subproblem recursively (or directly if it is small enough).
- Combine: merge the subproblem solutions into a solution for the original problem.
Not every recursive algorithm is divide-and-conquer. The factorial function above reduces the problem by a constant amount (from $n$ to $n - 1$), which is sometimes called decrease and conquer. True divide-and-conquer algorithms typically split the input by a constant fraction (usually in half), leading to logarithmic recursion depth and often dramatically better performance.
The running time of a divide-and-conquer algorithm is typically expressed as a recurrence of the form
$$T(n) = a \, T(n/b) + f(n),$$
where $a$ is the number of subproblems, $n/b$ is their size, and $f(n)$ is the cost of dividing and combining.
Note that $a$ and $b$ need not be equal: $b$ describes how the input is partitioned (a structural choice), while $a$ describes how many of those parts the algorithm actually recurses on (an algorithmic choice). For example, binary search splits the array in two ($b = 2$) but recurses on only one half ($a = 1$); merge sort also splits in two ($b = 2$) but must recurse on both halves ($a = 2$). An algorithm can even have $a > b$: Karatsuba multiplication splits each number into two halves ($b = 2$) but produces three recursive subproblems ($a = 3$) from clever algebraic rearrangement. The relationship between $a$ and $b$ is what ultimately determines the algorithm's growth rate. As we saw in Chapter 2, the Master Theorem often gives us the asymptotic estimate for $T(n)$ directly.
Binary search
Our first divide-and-conquer algorithm is one of the most important ones: binary search. It finds the position of a target value in a sorted array by repeatedly halving the search space.
The problem
Input: A sorted array $A$ of $n$ numbers and a target value $x$.
Output: An index $i$ such that $A[i] = x$, or $-1$ if $x$ is not in $A$.
The algorithm
The idea is simple: compare $x$ with the middle element of the array.
- If they match, return the index.
- If $x$ is smaller, recurse on the left half.
- If $x$ is larger, recurse on the right half.
Each step eliminates half of the remaining elements.
Recursive implementation
Since binary search is a divide-and-conquer algorithm, it is most natural to express it recursively. The function takes an array, a target, and the current search range (low and high). The two base cases are: (1) the range is empty — the element is not present; (2) the middle element matches — we return its index. The recursive cases narrow the range to one half:
export function binarySearchRecursive(
arr: number[],
element: number,
low: number = 0,
high: number = arr.length - 1,
): number {
if (low > high) return -1; // base case: empty range
const mid = Math.floor((low + high) / 2);
const midVal = arr[mid]!;
if (midVal === element) {
return mid; // base case: found
} else if (midVal < element) {
return binarySearchRecursive(arr, element, mid + 1, high); // search right half
} else {
return binarySearchRecursive(arr, element, low, mid - 1); // search left half
}
}
This implementation directly mirrors the divide-and-conquer description: each call either solves the problem immediately (base case) or delegates to a single subproblem of half the size.
Tracing through an example
Let $A = [1, 3, 5, 7, 9, 11, 13]$ and $x = 9$.
| Call | low | high | mid | arr[mid] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 6 | 3 | 7 | $7 < 9$: recurse on right half |
| 2 | 4 | 6 | 5 | 11 | $11 > 9$: recurse on left half |
| 3 | 4 | 4 | 4 | 9 | $9 = 9$: found, return 4 |
After only 3 comparisons (and 3 recursive calls), we have found the element in a 7-element array. A linear scan might have taken up to 7 comparisons.
Correctness
We prove correctness by induction on the size $n = \text{high} - \text{low} + 1$ of the search range.
Base case. When $n = 0$ (empty range, $\text{low} > \text{high}$), the element cannot be present, and the function correctly returns $-1$.
Inductive step. Assume the function returns the correct result for all ranges smaller than $n$. For a range of size $n \ge 1$, the function computes $\text{mid} = \lfloor (\text{low} + \text{high}) / 2 \rfloor$ and compares $x$ with $A[\text{mid}]$:
- If $A[\text{mid}] = x$, we return $\text{mid}$. Correct.
- If $A[\text{mid}] < x$, then since $A$ is sorted, $x$ cannot be in $A[\text{low} \ldots \text{mid}]$. The recursive call searches $[\text{mid} + 1, \text{high}]$, a strictly smaller range, so by the inductive hypothesis it returns the correct answer.
- The case $A[\text{mid}] > x$ is symmetric.
From recursion to iteration
Notice that the recursive binary search is tail-recursive: the recursive call is the very last operation in each branch — the function returns whatever the recursive call returns, without doing any further computation. Tail-recursive functions can always be transformed into a simple loop: we replace the recursive calls with assignments to the parameters and repeat.
Of course, even without this manual transformation, the computer already executes the recursion iteratively at the hardware level — using the call stack we discussed earlier in this chapter. Each recursive call pushes a new frame, and each return pops one off. But that mechanical translation is wasteful: it allocates a stack frame for every call, even though the caller does nothing with the result except pass it through. Transforming a tail-recursive function into a loop eliminates the stack entirely — the parameters are simply updated in place and control jumps back to the top of the function. The reason we can eliminate the stack is precisely that the function is tail-recursive: since nothing remains to be done after the recursive call returns, the caller's frame holds no state that is still needed, so there is nothing to save and no need for a stack frame at all. This is a general property — any tail-recursive function can be mechanically rewritten as a loop without a stack.
Let us apply this transformation to our recursive binary search:
export function binarySearch(arr: number[], element: number): number {
let low = 0;
let high = arr.length - 1;
while (low <= high) {
const mid = Math.floor((low + high) / 2);
const midVal = arr[mid]!;
if (midVal === element) {
return mid;
} else if (midVal < element) {
low = mid + 1;
} else {
high = mid - 1;
}
}
return -1;
}
The variables low and high play exactly the same role as the parameters of the recursive version — they define the current subproblem. Each iteration halves the range, just as each recursive call did. The iterative version has the advantage of using $O(1)$ space instead of $O(\log n)$, because it avoids the overhead of the call stack. In practice, both versions are fine for binary search (the recursion depth is at most about 60 even for $n \approx 10^{18}$), but the iterative form is conventional and marginally faster.
Complexity analysis
Each step halves the search range. Starting from $n$ elements, after $k$ steps we have at most $n/2^k$ elements. When $n/2^k = 1$ there is still one element left to compare, so the search has not yet terminated. The range becomes empty — and the process terminates — when $n/2^k < 1$, i.e., after at most $\lfloor \log_2 n \rfloor + 1$ steps.
- Time complexity: $O(\log n)$ (both versions).
- Space complexity: $O(\log n)$ for the recursive version (call stack depth); $O(1)$ for the iterative version.
Using the Master Theorem on the recurrence $T(n) = T(n/2) + O(1)$: here $a = 1$, $b = 2$, $f(n) = O(1)$. Since $f(n) = \Theta(n^{\log_b a}) = \Theta(n^0) = \Theta(1)$, Case 2 gives $T(n) = \Theta(\log n)$.
Comparison with linear search
For comparison, here is the linear search algorithm:
export function linearSearch<T>(arr: T[], element: T): number {
let position = -1;
let currentIndex = 0;
while (position < 0 && currentIndex < arr.length) {
if (arr[currentIndex] === element) {
position = currentIndex;
} else {
currentIndex++;
}
}
return position;
}
Linear search works on any array (not just sorted ones) but takes $O(n)$ time. Binary search requires a sorted array but is exponentially faster:
| Elements | Linear search | Binary search |
|---|---|---|
| 1,000 | 1,000 comparisons | 10 comparisons |
| 1,000,000 | 1,000,000 comparisons | 20 comparisons |
| 1,000,000,000 | 1,000,000,000 comparisons | 30 comparisons |
This dramatic improvement — from linear to logarithmic — is the hallmark of the divide-and-conquer approach. The key insight is that each comparison does not eliminate just a single element but half of the remaining elements.
Fast exponentiation (exponentiation by squaring)
Our second example addresses the problem of computing $x^n$ efficiently.
The problem
Input: A number $x$ (the base) and a non-negative integer $n$ (the exponent).
Output: The value $x^n$.
Naive approach
The straightforward approach multiplies the result by $x$, $n$ times:
export function powSlow(base: number, power: number): number {
let result = 1;
for (let i = 0; i < power; i++) {
result = result * base;
}
return result;
}
This performs $n$ multiplications, so it runs in $O(n)$ time.
Exponentiation by squaring
We can do much better by observing a simple mathematical identity:
$$x^n = \begin{cases} 1 & \text{if } n = 0, \\ \left(x^{n/2}\right)^2 & \text{if } n \text{ is even}, \\ x \cdot x^{n-1} & \text{if } n \text{ is odd}. \end{cases}$$
When $n$ is even, we compute $x^{n/2}$ once and square the result — a single multiplication instead of $n/2$ multiplications. When $n$ is odd, we reduce to an even exponent by extracting one factor of $x$.
The recurrence above translates directly into a recursive function — just as with binary search. Writing that recursive version is straightforward (see Exercise 3.3). Here, we skip the intermediate steps we took for binary search and jump straight to the optimized iterative version. As before, the recursive version is tail-recursive, so the same transformation applies: we replace recursive calls with assignments to the parameters and loop:
export function pow(base: number, power: number): number {
let result = 1;
while (power > 0) {
if (power % 2 === 0) {
base = base * base;
power = power / 2;
} else {
result = result * base;
power = power - 1;
}
}
return result;
}
Tracing through an example
Let us compute $2^{10}$:
| Step | base | power | result | Action |
|---|---|---|---|---|
| 1 | 2 | 10 | 1 | Even: base ← $2^2 = 4$, power ← 5 |
| 2 | 4 | 5 | 1 | Odd: result ← $1 \cdot 4 = 4$, power ← 4 |
| 3 | 4 | 4 | 4 | Even: base ← $4^2 = 16$, power ← 2 |
| 4 | 16 | 2 | 4 | Even: base ← $16^2 = 256$, power ← 1 |
| 5 | 256 | 1 | 4 | Odd: result ← $4 \cdot 256 = 1024$, power ← 0 |
Result: $2^{10} = 1024$. The naive approach would have used 10 multiplications; fast exponentiation used 5.
Correctness
Invariant: At the start of each iteration, $\text{result} \cdot \text{base}^{\text{power}}$ equals the original $x^n$.
- Initialization. $\text{result} = 1$, $\text{base} = x$, $\text{power} = n$, so $\text{result} \cdot \text{base}^{\text{power}} = x^n$. The invariant holds.
- Maintenance.
  - If power is even: we replace base with $\text{base}^2$ and power with $\text{power}/2$. Then $\text{result} \cdot (\text{base}^2)^{\text{power}/2} = \text{result} \cdot \text{base}^{\text{power}}$. Invariant preserved.
  - If power is odd: we replace result with $\text{result} \cdot \text{base}$ and power with $\text{power} - 1$. Then $(\text{result} \cdot \text{base}) \cdot \text{base}^{\text{power}-1} = \text{result} \cdot \text{base}^{\text{power}}$. Invariant preserved.
- Termination. When power $= 0$, the invariant gives $\text{result} = \text{result} \cdot \text{base}^0 = x^n$.
Complexity analysis
At each "odd" step, the exponent decreases by 1 (making it even). At each "even" step, the exponent halves. After at most two consecutive steps (one odd, one even), the exponent has been at least halved. Therefore the total number of steps is $O(\log n)$.
- Time complexity: $O(\log n)$.
- Space complexity: $O(1)$.
Alternatively, we can obtain a more precise asymptotic estimate directly from the Master Theorem. The recurrence for the recursive view is $T(n) = T(n/2) + O(1)$, the same as binary search, giving $T(n) = \Theta(\log n)$.
The Euclidean algorithm for GCD
The greatest common divisor (GCD) of two positive integers $x$ and $y$ is the largest integer that divides both. The Euclidean algorithm for computing it is one of the oldest algorithms known, recorded by Euclid around 300 BC.
The problem
Input: Two positive integers $x$ and $y$.
Output: $\gcd(x, y)$, the largest positive integer dividing both $x$ and $y$.
Naive approach
The brute-force approach tries every candidate from the smaller number downward. The GCD of $x$ and $y$ cannot exceed $\min(x, y)$, because no number larger than $x$ can divide $x$ (and likewise for $y$). So there is no point starting the search above the smaller of the two inputs:
export function gcdSlow(x: number, y: number): number {
const min = Math.min(x, y);
for (let i = min; i >= 2; i--) {
if (x % i === 0 && y % i === 0) {
return i;
}
}
return 1;
}
This checks up to $\min(x, y)$ candidates, so its time complexity is $O(\min(x, y))$.
The Euclidean algorithm
The Euclidean algorithm is based on a key observation:
$$\gcd(x, y) = \gcd(y, x \bmod y) \quad \text{for } y > 0, \qquad \gcd(x, 0) = x.$$
The modulo operation. The expression $x \bmod y$ (pronounced "x mod y") denotes the remainder when $x$ is divided by $y$: for instance, $17 \bmod 5 = 2$ because $17 = 3 \cdot 5 + 2$, and $12 \bmod 4 = 0$ because $4$ divides $12$ evenly. In TypeScript this is written x % y — we already used it in the naive GCD function above (x % i === 0). Formally, $x \bmod y = x - y \lfloor x / y \rfloor$, and the result always satisfies $0 \le x \bmod y < y$.
This identity holds because the pairs $(x, y)$ and $(y, x \bmod y)$ have exactly the same set of common divisors — any integer that divides both $x$ and $y$ also divides $x \bmod y = x - y \lfloor x / y \rfloor$ (a linear combination of $x$ and $y$), and conversely. Since the two pairs share the same divisors, they share the same greatest common divisor. Since $x \bmod y < y$, the arguments strictly decrease, and the process terminates when the remainder is 0:
$$\gcd(x, y) = \gcd(y, x \bmod y) = \cdots = \gcd(g, 0) = g.$$
Here is the implementation:
export function gcd(x: number, y: number): number {
while (y > 0) {
const r = x % y;
x = y;
y = r;
}
return x;
}
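For comparison, the identity can also be transcribed directly into a recursive function — a sketch under the assumption that the inputs are non-negative integers (the name gcdRecursive is ours, not from the repository):

```typescript
// Recursive GCD: gcd(x, 0) = x; gcd(x, y) = gcd(y, x mod y) for y > 0.
// Assumes non-negative integer inputs.
export function gcdRecursive(x: number, y: number): number {
  if (y === 0) return x; // base case
  return gcdRecursive(y, x % y); // arguments strictly decrease
}
```

Like recursive binary search, this function is tail-recursive, and the iterative loop above is exactly its mechanical transformation.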
Tracing through an example
Let us compute $\gcd(210, 2618)$. Each row shows $x$ and $y$ at the start of the iteration. After computing $r = x \bmod y$, the algorithm sets $x \leftarrow y$ and $y \leftarrow r$, producing the values shown in the next row:
| Step | $x$ | $y$ | $x \bmod y$ |
|---|---|---|---|
| 1 | 210 | 2618 | 210 |
| 2 | 2618 | 210 | 98 |
| 3 | 210 | 98 | 14 |
| 4 | 98 | 14 | 0 |
| 5 | 14 | 0 | loop exits, return 14 |
Result: $\gcd(210, 2618) = 14$.
The naive approach would have tested candidates from $\min(210, 2618) = 210$ down to 14 — nearly 200 iterations. The Euclidean algorithm needed only 5.
Correctness
We prove correctness by induction on the number of iterations.
Base case. When $y = 0$, the loop does not execute, and the algorithm returns $x$. This is correct because $\gcd(x, 0) = x$.
Inductive step. Assume the algorithm correctly computes $\gcd(y, x \bmod y)$, where $x \bmod y < y$. Since $\gcd(x, y) = \gcd(y, x \bmod y)$, the result is correct.
Complexity analysis
The key insight is that each iteration at least halves $x$: whenever $y \le x$, we have $x \bmod y < x/2$. Since the algorithm swaps $x$ and $y$ at each step, after every two consecutive iterations the values have shrunk by at least a factor of 2. Starting from $\max(x, y)$, we can halve at most $\log_2 \max(x, y)$ times before reaching 0, so the total number of iterations is $O(\log \max(x, y))$.
- Time complexity: $O(\log \max(x, y))$.
- Space complexity: $O(1)$.
This is an exponential improvement over the naive approach.
Why $x \bmod y < x/2$ when $y \le x$. There are two cases. Case 1: $y \le x/2$. The remainder is always strictly less than the divisor, so $x \bmod y < y \le x/2$. Case 2: $y > x/2$. Since $y \le x$ but $2y > x$, dividing $x$ by $y$ gives a quotient of exactly 1, so $x \bmod y = x - y < x - x/2 = x/2$. In both cases, $x \bmod y < x/2$.
The closest pair of points
Our final example is the most challenging one, because it requires all three stages of the divide-and-conquer recipe — a nontrivial divide step, two recursive subproblems, and a combine step whose efficiency depends on a subtle geometric argument. The problem itself is easy to state: given a set of points in the plane, find the two that are closest to each other.
The problem
Input: A set $P$ of $n$ points in the plane, where each point $p_i = (x_i, y_i)$ is a pair of coordinates.
Output: A pair of points $p_i, p_j$ ($i \ne j$) that minimize the Euclidean distance $d(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$.
Brute-force approach
The obvious approach checks all $n(n-1)/2$ pairs — the number of ways to choose 2 points from $n$, written in the binomial coefficient notation $\binom{n}{2}$:
function bruteForce(points: readonly Point[]): ClosestPairResult {
let best: ClosestPairResult = {
p1: points[0]!,
p2: points[1]!,
distance: distance(points[0]!, points[1]!),
};
for (let i = 0; i < points.length; i++) {
for (let j = i + 1; j < points.length; j++) {
const d = distance(points[i]!, points[j]!);
if (d < best.distance) {
best = { p1: points[i]!, p2: points[j]!, distance: d };
}
}
}
return best;
}
This runs in $O(n^2)$ time. Can we do better?
The divide-and-conquer idea
The strategy is:
- Divide. Sort the points by $x$-coordinate and split them into a left half $L$ and a right half $R$ at the median $x$-value.
- Conquer. Recursively find the closest pair in $L$ and in $R$. Let $\delta_L$ and $\delta_R$ be these distances, and let $\delta = \min(\delta_L, \delta_R)$.
- Combine. The overall closest pair is either entirely in $L$, entirely in $R$, or split — with one point in $L$ and one in $R$. We have already found the first two cases. For the split case, we need to check if any split pair has distance less than $\delta$.
The crux of the algorithm is the combine step: can we check split pairs efficiently?
The strip optimization
Consider the vertical strip of width $2\delta$ centered on the dividing line (at the median $x$-coordinate). Any split pair with distance less than $\delta$ must have both points in this strip, because otherwise the horizontal distance alone exceeds $\delta$.
Now comes the key geometric insight. Sort the points in the strip by $y$-coordinate. For any point $p$ in the strip, how many other strip points can lie within distance $\delta$ of $p$? Geometrically, such points could be anywhere in a $2\delta \times 2\delta$ box centered on $p$ (width $2\delta$ from the strip, height $2\delta$ because points could be above or below). But we can cut this box in half with a simple algorithmic trick: we will design the inner loop so that it walks through the strip points in ascending $y$-order, and for each point $p$ it only checks points that come after $p$ in that order — that is, points above $p$. There is no need to look below $p$, because any pair $(p, q)$ where $q$ is below $p$ will already have been examined when $q$ was the current point and $p$ was above it. This way, every pair in the strip is considered exactly once.
Because we only look upward from $p$, the relevant region is not the full box but only its upper half: a $2\delta \times \delta$ rectangle extending from $p$'s $y$-coordinate up to $y + \delta$ (see the figure). We can divide each half of this rectangle into four $\tfrac{\delta}{2} \times \tfrac{\delta}{2}$ cells. Each cell has diagonal $\delta/\sqrt{2} < \delta$, so no two points from the same half can share a cell (they are at least $\delta$ apart). That gives at most 4 points per half, 8 total in the rectangle, so at most 7 other points besides $p$.
This bound directly controls the cost of the nested loop in the code. Because at most 7 points above $p$ can lie within $y$-distance $\delta$, the inner loop executes at most 8 times per outer iteration: at most 7 points within range, plus 1 check that triggers the break. Summing over all strip points, the total number of inner-loop iterations is at most $8n = O(n)$. The nested loop is linear, not quadratic, despite its two-loop appearance.
Implementation
We define the Point and ClosestPairResult types:
export interface Point {
x: number;
y: number;
}
export interface ClosestPairResult {
p1: Point;
p2: Point;
distance: number;
}
The distance function:
export function distance(a: Point, b: Point): number {
const dx = a.x - b.x;
const dy = a.y - b.y;
return Math.sqrt(dx * dx + dy * dy);
}
The main function sorts by $x$-coordinate and delegates to the recursive helper:
export function closestPair(points: readonly Point[]): ClosestPairResult {
if (points.length < 2) {
throw new Error('At least 2 points are required');
}
// Tie-break on y for a deterministic order among points with equal x
const sortedByX = [...points].sort(
(a, b) => a.x - b.x || a.y - b.y,
);
return closestPairRec(sortedByX);
}
The recursive function implements the three steps:
function closestPairRec(points: readonly Point[]): ClosestPairResult {
if (points.length <= 3) {
return bruteForce(points);
}
const mid = Math.floor(points.length / 2);
const midPoint = points[mid]!;
const left = points.slice(0, mid);
const right = points.slice(mid);
const leftResult = closestPairRec(left);
const rightResult = closestPairRec(right);
let best =
leftResult.distance <= rightResult.distance
? leftResult
: rightResult;
const delta = best.distance;
// Build the strip
const strip: Point[] = [];
for (const p of points) {
if (Math.abs(p.x - midPoint.x) < delta) {
strip.push(p);
}
}
// Sort strip by y-coordinate
strip.sort((a, b) => a.y - b.y);
// Check each point against at most 7 subsequent points
for (let i = 0; i < strip.length; i++) {
for (let j = i + 1; j < strip.length; j++) {
const dy = strip[j]!.y - strip[i]!.y;
if (dy >= best.distance) {
break;
}
const d = distance(strip[i]!, strip[j]!);
if (d < best.distance) {
best = { p1: strip[i]!, p2: strip[j]!, distance: d };
}
}
}
return best;
}
Tracing through an example
Consider 6 points: $A = (0, 0)$, $B = (3, 4)$, $C = (0, 8)$, $D = (12, 0)$, $E = (12, 20)$, $F = (17, 3)$.
Step 1: Sort by x. $A(0,0),\ C(0,8),\ B(3,4),\ D(12,0),\ E(12,20),\ F(17,3)$.
Step 2: Divide. Left: $\{A, C, B\}$. Right: $\{D, E, F\}$. Dividing line at $x = 12$.
Step 3: Conquer (left). With 3 points, brute force checks all 3 pairs: $d(A, C) = 8$, $d(A, B) = 5$, $d(C, B) = 5$.
Closest in left: $(A, B)$ with $\delta_L = 5$.
Step 3: Conquer (right). Brute force on $\{D, E, F\}$: $d(D, E) = 20$, $d(D, F) = \sqrt{34} \approx 5.83$, $d(E, F) = \sqrt{314} \approx 17.72$.
Closest in right: $(D, F)$ with $\delta_R \approx 5.83$.
Step 4: Combine. $\delta = \min(5, 5.83) = 5$. The strip contains all points within $\delta = 5$ of $x = 12$ — which includes none of the left points (they are at $x = 0, 0, 3$, all more than 5 away from 12) and only $D$ and $E$ on the right. The strip pair distance is 20, which does not improve on $\delta$.
Result: The closest pair is $(A, B)$ with distance $5$.
Correctness
The algorithm correctly finds the closest pair because it considers all three possible cases — closest pair entirely in the left, entirely in the right, or split across the dividing line. The correctness of the strip check follows from the observation that any split pair closer than $\delta$ must lie in the strip (both points are within $\delta$ of the dividing line), and the inner loop's break condition (dy >= best.distance) guarantees that every such pair is examined.
Base case. For 2 or 3 points, brute force checks all pairs. Correct.
Inductive step. Assume the recursive calls return the correct closest pairs in $L$ and $R$. Then $\delta$ is the correct minimum distance within each half. The strip check examines all candidates for a closer split pair: it iterates over all strip points sorted by $y$, and for each point checks subsequent points until the $y$-distance reaches $\delta$. Any split pair with distance less than $\delta$ must have $y$-distance less than $\delta$ as well, so the break condition cannot skip a valid candidate.
Complexity analysis
Let $T(n)$ be the running time. The algorithm:
- Divides the points in half: $O(n)$ (the array is already sorted by $x$).
- Recursively solves two subproblems: $2T(n/2)$.
- Builds and sorts the strip: $O(n \log n)$ in the worst case (the strip could contain all $n$ points).
- Checks strip pairs: $O(n)$. The nested loop looks quadratic, but the inner loop is tightly bounded by the geometry. Because the strip is sorted by $y$-coordinate, the inner loop visits only points above the current point (earlier points were already handled). The packing argument guarantees that at most 7 such points can have $y$-distance less than $\delta$ (they all lie in a $2\delta \times \delta$ rectangle above the current point, which fits at most 8 points total). Once the $y$-distance reaches $\delta$, the break fires. So the inner loop runs at most 8 times per outer iteration (7 valid neighbors + 1 break), giving at most $8n$ inner-loop iterations in total — including all dy comparisons and all distance computations.
The combine step is dominated by the strip sort at $O(n \log n)$. The recurrence is:
$$T(n) = 2T(n/2) + O(n \log n).$$
This does not fall neatly into Case 2 of the Master Theorem (where $f(n) = \Theta(n^{\log_b a}) = \Theta(n)$). Solving by the recursion tree method or the Akra-Bazzi theorem gives $T(n) = O(n \log^2 n)$.
However, the initial sort by $x$-coordinate costs $O(n \log n)$ and is done once. With a more careful implementation (maintaining a pre-sorted-by-$y$ list using a merge step instead of re-sorting the strip), the combine step can be reduced to $O(n)$, giving the optimal recurrence:
$$T(n) = 2T(n/2) + O(n) = O(n \log n).$$
Our implementation uses the simpler approach, which is already a substantial improvement over the brute force. In practice, the strip is typically much smaller than $n$ points, so the extra logarithmic factor is rarely felt.
- Time complexity: $O(n \log^2 n)$ as implemented; $O(n \log n)$ with the merge-based optimization.
- Space complexity: $O(n)$ for the sorted arrays and strip.
Summary of closest pair
| Approach | Time | Space |
|---|---|---|
| Brute force | $O(n^2)$ | $O(1)$ |
| Divide-and-conquer (simple) | $O(n \log^2 n)$ | $O(n)$ |
| Divide-and-conquer (optimal) | $O(n \log n)$ | $O(n)$ |
The closest pair problem beautifully illustrates the power of divide and conquer. The brute-force approach must check all $\binom{n}{2} = \Theta(n^2)$ pairs. By splitting the problem, solving each half, and cleverly bounding the combine step, we achieve near-linear time.
The divide-and-conquer recipe
Looking back at our four algorithms, we can identify a common recipe:
- Identify a way to shrink the problem. Binary search halves the array, exponentiation by squaring halves the exponent, the Euclidean algorithm replaces a number with a remainder, and closest pair splits the point set.
- Solve the smaller instance(s). Sometimes there is one subproblem (binary search, exponentiation, GCD); sometimes there are two (closest pair).
- Combine. Binary search and GCD need no combining — the subproblem answer is the final answer. Exponentiation squares the subresult. Closest pair must check the strip.
- Analyze with recurrences. The running time follows from the recurrence and the Master Theorem (or recursion tree method when the Master Theorem does not apply directly).
This recipe is a powerful tool for designing new algorithms. When you face a problem, ask: can I split it into smaller instances of the same problem? If so, the divide-and-conquer approach may yield an efficient solution.
A note on memoization
There is one more important idea connected to recursion that we should mention here: memoization.
Many recursive algorithms solve the same subproblems repeatedly. Consider computing the Fibonacci numbers recursively: $F_n = F_{n-1} + F_{n-2}$, with $F_0 = 0$ and $F_1 = 1$. A direct recursive implementation calls itself twice at each level, and the subproblems overlap heavily — $F_{n-2}$ is computed many times when computing $F_n$, $F_{n-3}$ is computed many times when computing $F_{n-1}$, and so on. The resulting recursion tree grows exponentially, even though there are only $n$ distinct subproblems.
Memoization is a technique that eliminates this redundancy: when a recursive function is about to compute a subproblem, it first checks whether the result has already been computed and stored (in a cache, hash map, or array). If so, it returns the cached value immediately; if not, it computes the result, stores it, and then returns it. The name comes from "memo" — a note to oneself — and the effect can be dramatic: for Fibonacci, memoization reduces the time from exponential to linear $O(n)$, because each of the $n$ distinct subproblems is solved at most once.
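As a small preview of Chapter 16, here is a sketch of memoized Fibonacci using a Map as the cache — one of several reasonable cache choices, and a signature of our own rather than the repository's:

```typescript
// Memoized Fibonacci: each F(k) is computed at most once and cached.
export function fib(n: number, memo: Map<number, number> = new Map()): number {
  if (n <= 1) return n; // base cases: F(0) = 0, F(1) = 1
  const cached = memo.get(n);
  if (cached !== undefined) return cached; // reuse a previously computed result
  const result = fib(n - 1, memo) + fib(n - 2, memo);
  memo.set(n, result); // store before returning
  return result;
}
```

Without the cache, fib(50) would make billions of calls; with it, the function runs in linear time because each subproblem is solved once.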
Memoization is valuable whenever a recursive decomposition produces overlapping subproblems — the same smaller instance appears in multiple branches of the recursion. The divide-and-conquer algorithms in this chapter (binary search, fast exponentiation, GCD, closest pair) do not have this property: each recursive call works on a distinct portion of the input, so no subproblem is ever solved twice. But many important recursive algorithms do have overlapping subproblems, and for those, memoization is the difference between a practical algorithm and an unusably slow one.
This idea is at the heart of dynamic programming, a powerful algorithm design paradigm that we study in detail in Chapter 16. There we will see memoization in action on problems such as Fibonacci numbers, coin change, longest common subsequence, and the knapsack problem, and we will also explore tabulation — a bottom-up alternative that avoids recursion entirely.
Summary
In this chapter we studied recursion and the divide-and-conquer strategy:
- Recursion solves a problem by reducing it to smaller instances, terminating at base cases. Its correctness is proven by induction.
- Divide-and-conquer is a specific recursion pattern: divide into subproblems, conquer recursively, combine the results.
- Binary search halves the search space at each step, achieving $O(\log n)$ time.
- Exponentiation by squaring computes $x^n$ in $O(\log n)$ multiplications instead of $n$.
- The Euclidean algorithm computes GCD in $O(\log \max(x, y))$ time, an ancient and elegant application of the divide-and-conquer idea.
- The closest pair of points demonstrates a nontrivial combine step, achieving $O(n \log n)$ (or $O(n \log^2 n)$ in the simpler variant) versus $O(n^2)$ brute force.
- Memoization caches the results of recursive calls to avoid redundant computation when subproblems overlap — an idea we will develop fully in Chapter 16 on dynamic programming.
In the next chapter, we turn to the sorting problem. We begin with three elementary sorting algorithms — bubble sort, selection sort, and insertion sort — all of which run in $O(n^2)$ time. In Chapter 5, we study efficient sorting algorithms — merge sort, quicksort, and heapsort — that use divide-and-conquer to achieve $O(n \log n)$ time.
Exercises
Exercise 3.1. In this chapter we presented both a recursive and an iterative version of binary search. Explain why the recursive version is tail-recursive and how that property enables the transformation to the iterative version. Can every recursive function be transformed this way? Give an example of a recursive function that is not tail-recursive and explain what makes the transformation harder.
Exercise 3.2. The Tower of Hanoi puzzle has $n$ disks of decreasing size stacked on one of three pegs. The goal is to move all disks to another peg, moving one disk at a time, never placing a larger disk on a smaller one. Write a recursive function hanoi(n: number, from: string, to: string, via: string): void that prints the moves. What is the time complexity? Prove that $2^n - 1$ moves are both necessary and sufficient.
Exercise 3.3. Implement a recursive version of the pow function (exponentiation by squaring). Analyze its space complexity and compare it with the iterative version.
Exercise 3.4. The maximum subarray problem asks for a contiguous subarray of an array of numbers with the largest sum. Design an $O(n \log n)$ divide-and-conquer algorithm for this problem. (Hint: split the array in half; the maximum subarray is entirely in the left half, entirely in the right half, or crossing the midpoint.)
Exercise 3.5. Karatsuba's algorithm multiplies two $n$-digit numbers using the recurrence $T(n) = 3T(n/2) + O(n)$. Use the Master Theorem to determine its time complexity. How does this compare with the naive multiplication algorithm?
Elementary Sorting
Sorting is one of the most fundamental problems in Computer Science. In this chapter we define the sorting problem precisely, introduce the concepts of stability and in-place sorting, and study three elementary sorting algorithms — bubble sort, selection sort, and insertion sort. All three run in $O(n^2)$ time in the worst case, but they differ in important ways: their behavior on nearly sorted input, their stability properties, and their practical performance. We close the chapter by proving that any comparison-based sorting algorithm must make $\Omega(n \log n)$ comparisons in the worst case — a lower bound that the elementary algorithms do not achieve, motivating the efficient algorithms of Chapter 5.
The sorting problem
Sorting is the problem of rearranging a collection of elements into a specified order. It arises constantly in practice — in database queries, in preparing data for binary search, in eliminating duplicates, in scheduling, and in countless other contexts. Knuth devoted an entire volume of The Art of Computer Programming to sorting and searching, calling sorting "perhaps the most deeply studied problem in Computer Science."
Definition 4.1 - The sorting problem
Input: A sequence of $n$ elements $a_1, a_2, \ldots, a_n$ and a total ordering $\le$ on the elements.
Output: A permutation $a'_1, a'_2, \ldots, a'_n$ of the input such that $a'_1 \le a'_2 \le \cdots \le a'_n$.
The definition requires a total ordering on the elements. A total ordering is a relation $\le$ that satisfies four properties for all elements $a$, $b$, and $c$:
- Reflexivity: $a \le a$.
- Transitivity: if $a \le b$ and $b \le c$, then $a \le c$.
- Antisymmetry: if $a \le b$ and $b \le a$, then $a = b$.
- Totality: for any two elements, either $a \le b$ or $b \le a$ (or both).
The crucial property is totality: every pair of elements is comparable. Numbers compared with $\le$ are the most familiar example — given any two numbers, one is less than or equal to the other — but numbers are not the only things we can sort: for example, strings can be sorted lexicographically (dictionary order), dates can be sorted chronologically, and objects can be sorted by any key that admits a total ordering. Whatever the nature of the elements, any sequence of them can be sorted as long as we can define a total ordering on them.
So far we have seen only total orderings. A natural question is whether other kinds of orderings exist. It turns out that not all orderings are total: a partial ordering satisfies the first three properties but not totality — some pairs of elements may be incomparable. For example, consider sets ordered by the subset relation $\subseteq$. We have $\{1\} \subseteq \{1, 2\}$, but $\{1\}$ and $\{2\}$ are incomparable — neither is a subset of the other. Another example is a task dependency relation: task A must precede task B, and task C must precede task D, but A and C have no ordering relation between them.
Because with a partial ordering we cannot compare every pair of elements, we cannot sort the elements into a single linear sequence (though we can topologically sort them, which is a different problem discussed in Chapter 12). This is why the requirement of a total ordering on the elements being sorted is important and cannot be omitted: the output must be a linear sequence where every adjacent pair of elements satisfies $a'_i \le a'_{i+1}$. If some elements were incomparable, there would be no way to decide which should come first.
In TypeScript, we express the ordering through a comparator function:
export type Comparator<T> = (a: T, b: T) => number;
The comparator returns a negative number if $a < b$, zero if $a = b$, and a positive number if $a > b$. For numbers in ascending order, the comparator is simply:
export const numberComparator: Comparator<number> = (a, b) => a - b;
All three sorting algorithms in this chapter accept an optional comparator, defaulting to numberComparator. This makes them generic: they can sort arrays of any type, provided an appropriate comparator is supplied.
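For instance, a comparator over a small record type might look like this — a sketch in which the Student interface and the byGrade name are illustrative, not from the repository:

```typescript
type Comparator<T> = (a: T, b: T) => number;

interface Student {
  name: string;
  grade: number;
}

// Order students by grade, ascending. This comparator can be passed to
// any sorting function in this chapter, or to Array.prototype.sort.
const byGrade: Comparator<Student> = (a, b) => a.grade - b.grade;

const students: Student[] = [
  { name: 'Alice', grade: 90 },
  { name: 'Bob', grade: 75 },
];
students.sort(byGrade); // Bob (75) now precedes Alice (90)
```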
Stability
When a sequence contains elements that compare as equal, there is a choice: should the algorithm preserve the original relative order of equal elements, or is potentially re-arranging these equal elements acceptable?
Definition 4.2 - Stable sort
A sorting algorithm is stable if, whenever two elements $a_i$ and $a_j$ compare as equal ($a_i = a_j$) and $a_i$ appears before $a_j$ in the input ($i < j$), then $a_i$ appears before $a_j$ in the output.
Stability matters when elements carry additional data beyond the sort key. For example, suppose we sort a list of students by grade, and two students — Alice and Bob — both have a grade of 90. If Alice appeared before Bob in the original list, a stable sort guarantees she still appears before Bob in the sorted output. An unstable sort might swap them.
Stability also enables multi-key sorting by composition: to sort by last name and then by first name, we first sort by first name (using a stable sort). When we then sort by last name (using a stable sort), stability ensures that records sharing the same last name retain the first-name ordering from the first sort — giving us the desired two-level sort.
Of the three algorithms in this chapter, bubble sort and insertion sort are stable, while selection sort is not.
In-place sorting
Definition 4.3 - In-place sort
A sorting algorithm is in-place if it uses $O(1)$ auxiliary space — that is, a constant amount of memory beyond the input array.
All three algorithms in this chapter are in-place: they sort by swapping and shifting elements within the array, using only a constant number of temporary variables. Our TypeScript implementations sort the input array directly — the caller's array is modified.
Bubble sort
Bubble sort's main virtue is pedagogical simplicity: it is the easiest sorting algorithm to understand and implement, which makes it an ideal first example when studying sorting. In practice, however, it is outperformed by insertion sort on nearly every input and is rarely used outside the classroom. The algorithm works by repeatedly scanning the array from left to right, swapping adjacent elements that are out of order. After each complete pass, the largest unsorted element has "bubbled" to its correct position at the end. After $n - 1$ passes, every element is in place.
The algorithm
- For $k = 0, 1, \ldots, n - 2$:
  - For $i = 1, 2, \ldots, n - k - 1$:
    - If $A[i-1] > A[i]$, swap $A[i-1]$ and $A[i]$.

In pass $k$, the last $k$ elements are already in their final positions from earlier passes, so the inner loop only needs to scan positions $1$ through $n - k - 1$.
Implementation
export function bubbleSort<T>(
elements: T[],
comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
for (let k = 0; k < elements.length - 1; k++) {
for (let i = 1; i < elements.length - k; i++) {
if (comparator(elements[i - 1]!, elements[i]!) > 0) {
const temp = elements[i - 1]!;
elements[i - 1] = elements[i]!;
elements[i] = temp;
}
}
}
return elements;
}
Tracing through an example
Let us sort $[5, 3, 8, 4, 2]$.

Pass 1 ($k = 0$, inner loop scans $i = 1, \ldots, 4$):

| $i$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 1 | [5, 3, 8, 4, 2] | $5 > 3$? Yes | Swap | [3, 5, 8, 4, 2] |
| 2 | [3, 5, 8, 4, 2] | $5 > 8$? No | — | [3, 5, 8, 4, 2] |
| 3 | [3, 5, 8, 4, 2] | $8 > 4$? Yes | Swap | [3, 5, 4, 8, 2] |
| 4 | [3, 5, 4, 8, 2] | $8 > 2$? Yes | Swap | [3, 5, 4, 2, 8] |
After pass 1, the largest element (8) is in its final position.
Pass 2 ($k = 1$, inner loop scans $i = 1, \ldots, 3$):

| $i$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 1 | [3, 5, 4, 2, 8] | $3 > 5$? No | — | [3, 5, 4, 2, 8] |
| 2 | [3, 5, 4, 2, 8] | $5 > 4$? Yes | Swap | [3, 4, 5, 2, 8] |
| 3 | [3, 4, 5, 2, 8] | $5 > 2$? Yes | Swap | [3, 4, 2, 5, 8] |
After pass 2, the two largest elements (5, 8) are in place.
Pass 3 ($k = 2$, inner loop scans $i = 1, 2$):

| $i$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 1 | [3, 4, 2, 5, 8] | $3 > 4$? No | — | [3, 4, 2, 5, 8] |
| 2 | [3, 4, 2, 5, 8] | $4 > 2$? Yes | Swap | [3, 2, 4, 5, 8] |
Pass 4 ($k = 3$, inner loop scans $i = 1$):

| $i$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 1 | [3, 2, 4, 5, 8] | $3 > 2$? Yes | Swap | [2, 3, 4, 5, 8] |

Result: $[2, 3, 4, 5, 8]$.
Correctness
We prove correctness using the following loop invariant for the outer loop.
Invariant: After $k$ complete passes, the largest $k$ elements are in their correct final positions at the end of the array, and the algorithm has not changed the relative order of equal elements.
Initialization: Before any passes ($k = 0$), the invariant holds trivially — zero elements are known to be in their final positions.
Maintenance: Consider pass $k + 1$, which runs when the largest $k$ elements are already in place. The inner loop scans positions $1$ through $n - k - 1$ from left to right, swapping adjacent out-of-order pairs. The largest element in the unsorted prefix "bubbles" rightward through every comparison, because it is larger than (or equal to) every element it encounters. By the end of the pass, this element has reached position $n - k - 1$, which is its correct final position. The swap condition uses strict inequality ($>$), so equal elements are never swapped — preserving stability.
Termination: The outer loop runs exactly $n - 1$ times and then terminates. By the invariant, after $n - 1$ passes the $n - 1$ largest elements are in their correct positions. The remaining element — the smallest — is necessarily in position $0$, so the entire array is sorted.
Complexity analysis
Worst case. The outer loop performs $n - 1$ passes. Pass $k$ performs $n - k - 1$ comparisons. The total is:

$$\sum_{k=0}^{n-2} (n - k - 1) = (n - 1) + (n - 2) + \cdots + 1 = \frac{n(n-1)}{2} = \Theta(n^2)$$
Best case. Even when the array is already sorted, the algorithm performs all $n - 1$ passes and all $\frac{n(n-1)}{2}$ comparisons: $\Theta(n^2)$.
Average case. $\Theta(n^2)$ comparisons — the basic algorithm performs the same number of comparisons on every input.
Space complexity. $O(1)$ auxiliary space.
Early termination optimization
The basic algorithm always performs $n - 1$ passes, even if the array becomes sorted early. We can improve the best case by tracking whether any swap occurred during a pass. If a complete pass makes no swaps, the array is already sorted and we can stop:
export function bubbleSortOptimized<T>(
elements: T[],
comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
let n = elements.length;
let wasSwapped = true;
while (wasSwapped) {
wasSwapped = false;
for (let i = 1; i < n; i++) {
if (comparator(elements[i - 1]!, elements[i]!) > 0) {
const temp = elements[i - 1]!;
elements[i - 1] = elements[i]!;
elements[i] = temp;
wasSwapped = true;
}
}
n--;
}
return elements;
}
This optimization does not change the worst-case or average-case complexity, but on already-sorted input only one pass is needed ($n - 1$ comparisons, zero swaps), giving a $\Theta(n)$ best case. The correctness proof from above still applies: by the loop invariant, each pass places at least one more element in its final position, so after at most $n - 1$ passes the array is sorted and wasSwapped remains false, guaranteeing termination.
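The early-termination behavior can be observed directly by counting passes. The function below is a sketch that re-implements the optimized algorithm with an added pass counter; bubbleSortCountingPasses is an illustrative name, not part of the book's library.

```typescript
type Comparator<T> = (a: T, b: T) => number;

// bubbleSortOptimized, instrumented to count how many passes actually run.
function bubbleSortCountingPasses<T>(elements: T[], comparator: Comparator<T>): number {
  let passes = 0;
  let n = elements.length;
  let wasSwapped = true;
  while (wasSwapped) {
    wasSwapped = false;
    passes++;
    for (let i = 1; i < n; i++) {
      if (comparator(elements[i - 1]!, elements[i]!) > 0) {
        [elements[i - 1], elements[i]] = [elements[i]!, elements[i - 1]!];
        wasSwapped = true;
      }
    }
    n--; // the largest remaining element is now in place
  }
  return passes;
}

const cmp: Comparator<number> = (a, b) => a - b;
const sortedPasses = bubbleSortCountingPasses([1, 2, 3, 4, 5], cmp);   // 1 pass
const reversedPasses = bubbleSortCountingPasses([5, 4, 3, 2, 1], cmp); // 5 passes
```

Sorted input terminates after a single pass, while reverse-sorted input still needs a full set of passes — the optimization helps only when the array becomes sorted early.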
Properties
| Property | Bubble sort | Bubble sort (optimized) |
|---|---|---|
| Worst-case time | $\Theta(n^2)$ | $\Theta(n^2)$ |
| Best-case time | $\Theta(n^2)$ | $\Theta(n)$ |
| Average-case time | $\Theta(n^2)$ | $\Theta(n^2)$ |
| Space | $O(1)$, in-place | $O(1)$, in-place |
| Stable | Yes | Yes |
Selection sort
Selection sort takes a different approach: instead of bubbling elements rightward, it repeatedly finds the minimum element from the unsorted portion and places it at the beginning.
The algorithm
- For $i = 0, 1, \ldots, n - 2$:
  - Find the index $m$ of the minimum element in $A[i..n-1]$.
  - Swap $A[i]$ and $A[m]$.

After iteration $i$, the first $i + 1$ positions contain the $i + 1$ smallest elements in sorted order.
Implementation
export function selectionSort<T>(
elements: T[],
comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
for (let i = 0; i < elements.length - 1; i++) {
let remainingMinimum = elements[i]!;
let indexToSwap = -1;
for (let j = i + 1; j < elements.length; j++) {
if (comparator(elements[j]!, remainingMinimum) < 0) {
remainingMinimum = elements[j]!;
indexToSwap = j;
}
}
if (indexToSwap >= 0) {
elements[indexToSwap] = elements[i]!;
elements[i] = remainingMinimum;
}
}
return elements;
}
Tracing through an example
Let us sort $[29, 10, 14, 37, 13]$.

Iteration 1 ($i = 0$): find the minimum in $A[0..4]$ and place it at position 0.

| $j$ | Array | Comparison | Update minimum? | Current minimum |
|---|---|---|---|---|
| — | [29, 10, 14, 37, 13] | — | Initialize | 29 (index 0) |
| 1 | [29, 10, 14, 37, 13] | $10 < 29$? Yes | Yes | 10 (index 1) |
| 2 | [29, 10, 14, 37, 13] | $14 < 10$? No | No | 10 (index 1) |
| 3 | [29, 10, 14, 37, 13] | $37 < 10$? No | No | 10 (index 1) |
| 4 | [29, 10, 14, 37, 13] | $13 < 10$? No | No | 10 (index 1) |

Minimum is 10 at index 1. Swap $A[0]$ and $A[1]$: $[10, 29, 14, 37, 13]$.

Array after iteration 1: $[10, 29, 14, 37, 13]$.
Iteration 2 ($i = 1$): find the minimum in $A[1..4]$ and place it at position 1.

| $j$ | Array | Comparison | Update minimum? | Current minimum |
|---|---|---|---|---|
| — | [10, 29, 14, 37, 13] | — | Initialize | 29 (index 1) |
| 2 | [10, 29, 14, 37, 13] | $14 < 29$? Yes | Yes | 14 (index 2) |
| 3 | [10, 29, 14, 37, 13] | $37 < 14$? No | No | 14 (index 2) |
| 4 | [10, 29, 14, 37, 13] | $13 < 14$? Yes | Yes | 13 (index 4) |

Minimum is 13 at index 4. Swap $A[1]$ and $A[4]$: $[10, 13, 14, 37, 29]$.

Array after iteration 2: $[10, 13, 14, 37, 29]$.
Iteration 3 ($i = 2$): find the minimum in $A[2..4]$ and place it at position 2.

| $j$ | Array | Comparison | Update minimum? | Current minimum |
|---|---|---|---|---|
| — | [10, 13, 14, 37, 29] | — | Initialize | 14 (index 2) |
| 3 | [10, 13, 14, 37, 29] | $37 < 14$? No | No | 14 (index 2) |
| 4 | [10, 13, 14, 37, 29] | $29 < 14$? No | No | 14 (index 2) |

Minimum is 14 at index 2. No swap needed — the minimum is already at position $i = 2$.

Array after iteration 3: $[10, 13, 14, 37, 29]$.
Iteration 4 ($i = 3$): find the minimum in $A[3..4]$ and place it at position 3.

| $j$ | Array | Comparison | Update minimum? | Current minimum |
|---|---|---|---|---|
| — | [10, 13, 14, 37, 29] | — | Initialize | 37 (index 3) |
| 4 | [10, 13, 14, 37, 29] | $29 < 37$? Yes | Yes | 29 (index 4) |

Minimum is 29 at index 4. Swap $A[3]$ and $A[4]$: $[10, 13, 14, 29, 37]$.

Array after iteration 4: $[10, 13, 14, 29, 37]$.

Result: $[10, 13, 14, 29, 37]$.
Correctness
Invariant: After $i$ iterations of the outer loop, the subarray $A[0..i-1]$ contains the $i$ smallest elements of the original array, in sorted order, and the remaining elements in $A[i..n-1]$ are all greater than or equal to every element of $A[0..i-1]$.
Initialization: Before the first iteration ($i = 0$), the sorted prefix is empty. The invariant holds trivially.
Maintenance: In the next iteration, the inner loop scans $A[i..n-1]$ and finds the minimum element. This element is the smallest among all elements not yet in the sorted prefix (and, by the invariant, no smaller than anything already in $A[0..i-1]$). Swapping it into position $i$ extends the sorted prefix by one element, maintaining the invariant.
Termination: After $n - 1$ iterations, positions $0$ through $n - 2$ contain the $n - 1$ smallest elements in order. The remaining element at position $n - 1$ is necessarily the largest, so the entire array is sorted.
Why selection sort is not stable
Consider the array $[2_a, 2_b, 1]$, where $2_a$ and $2_b$ are equal values distinguished by subscripts to track their original positions. In the first iteration, selection sort finds the minimum (1, at index 2) and swaps it with $2_a$: $[1, 2_b, 2_a]$.
Now $2_b$ appears before $2_a$, but in the original array $2_a$ appeared first. The relative order of equal elements has been reversed. This happens because the swap moves $2_a$ past $2_b$ in a single step, without regard for their original order.
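The instability can be demonstrated in code. The sketch below repeats the chapter's selectionSort so it runs standalone, and sorts records whose equal keys carry a tag recording their original position (the Item records and tags are illustrative).

```typescript
type Comparator<T> = (a: T, b: T) => number;

// Selection sort as in this chapter, repeated here for a self-contained demo.
function selectionSort<T>(elements: T[], comparator: Comparator<T>): T[] {
  for (let i = 0; i < elements.length - 1; i++) {
    let remainingMinimum = elements[i]!;
    let indexToSwap = -1;
    for (let j = i + 1; j < elements.length; j++) {
      if (comparator(elements[j]!, remainingMinimum) < 0) {
        remainingMinimum = elements[j]!;
        indexToSwap = j;
      }
    }
    if (indexToSwap >= 0) {
      elements[indexToSwap] = elements[i]!;
      elements[i] = remainingMinimum;
    }
  }
  return elements;
}

interface Item { key: number; tag: string; }
const items: Item[] = [
  { key: 2, tag: "a" }, // 2_a
  { key: 2, tag: "b" }, // 2_b
  { key: 1, tag: "c" },
];
selectionSort(items, (a, b) => a.key - b.key);
// The first swap moves 2_a past 2_b: tags come out c, b, a —
// the two equal keys have traded places.
```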
Complexity analysis
The inner loop in the iteration with index $i$ performs $n - i - 1$ comparisons. The total number of comparisons is:

$$\sum_{i=0}^{n-2} (n - i - 1) = (n - 1) + (n - 2) + \cdots + 1 = \frac{n(n-1)}{2} = \Theta(n^2)$$

This count is the same regardless of the input — selection sort always performs exactly $\frac{n(n-1)}{2}$ comparisons, whether the array is sorted, reverse-sorted, or random.
Swaps. Selection sort performs at most $n - 1$ swaps (one per outer-loop iteration). This is a notable advantage in languages where swaps are expensive — for example, in C or C++, an array of structs stores the structs inline, so swapping two elements copies the entire struct byte by byte, and larger structs mean slower swaps. In TypeScript, however, arrays of objects store references (pointers) rather than the objects themselves, so swapping two elements only exchanges two small references — a constant-time operation regardless of how large the underlying objects are. The low swap count of selection sort therefore matters most in languages with value-type semantics; in TypeScript the benefit is negligible.
Space complexity. $O(1)$ auxiliary space.
Properties
| Property | Selection sort |
|---|---|
| Worst-case time | $\Theta(n^2)$ |
| Best-case time | $\Theta(n^2)$ |
| Average-case time | $\Theta(n^2)$ |
| Space | $O(1)$, in-place |
| Stable | No |
Insertion sort
Insertion sort is the algorithm most people use intuitively when sorting a hand of playing cards. We hold the sorted cards in our left hand and pick up one card at a time from the table with our right hand, inserting it into the correct position among the already-sorted cards.
The algorithm
- For $i = 1, 2, \ldots, n - 1$:
  - Let $x = A[i]$.
  - Insert $x$ into the sorted subarray $A[0..i-1]$ by shifting larger elements one position to the right.
Implementation
export function insertionSort<T>(
elements: T[],
comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
for (let i = 1; i < elements.length; i++) {
const toInsert = elements[i]!;
let insertIndex = i - 1;
while (insertIndex >= 0 && comparator(toInsert, elements[insertIndex]!) < 0) {
elements[insertIndex + 1] = elements[insertIndex]!;
insertIndex--;
}
insertIndex++;
elements[insertIndex] = toInsert;
}
return elements;
}
The inner while loop shifts elements rightward until it finds the correct position for toInsert. The use of strict less-than (< 0) in the comparator check means that equal elements are not shifted past each other, which makes the algorithm stable.
Tracing through an example
Let us sort $[5, 2, 4, 6, 1, 3]$. In the tables below, $j$ denotes insertIndex from the code.

Iteration 1 ($i = 1$, toInsert $= 2$): insert 2 into the sorted prefix $[5]$.

| $j$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 0 | [5, 2, 4, 6, 1, 3] | $2 < 5$? Yes | Shift 5 right | [5, 5, 4, 6, 1, 3] |
| — | [5, 5, 4, 6, 1, 3] | — | Place 2 at position 0 | [2, 5, 4, 6, 1, 3] |

After iteration 1: $[2, 5, 4, 6, 1, 3]$.
Iteration 2 ($i = 2$, toInsert $= 4$): insert 4 into the sorted prefix $[2, 5]$.

| $j$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 1 | [2, 5, 4, 6, 1, 3] | $4 < 5$? Yes | Shift 5 right | [2, 5, 5, 6, 1, 3] |
| 0 | [2, 5, 5, 6, 1, 3] | $4 < 2$? No | Place 4 at position 1 | [2, 4, 5, 6, 1, 3] |

After iteration 2: $[2, 4, 5, 6, 1, 3]$.
Iteration 3 ($i = 3$, toInsert $= 6$): insert 6 into the sorted prefix $[2, 4, 5]$.

| $j$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 2 | [2, 4, 5, 6, 1, 3] | $6 < 5$? No | Place 6 at position 3 | [2, 4, 5, 6, 1, 3] |

After iteration 3: $[2, 4, 5, 6, 1, 3]$. No shifting was needed — 6 is already in the right place.
Iteration 4 ($i = 4$, toInsert $= 1$): insert 1 into the sorted prefix $[2, 4, 5, 6]$.

| $j$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 3 | [2, 4, 5, 6, 1, 3] | $1 < 6$? Yes | Shift 6 right | [2, 4, 5, 6, 6, 3] |
| 2 | [2, 4, 5, 6, 6, 3] | $1 < 5$? Yes | Shift 5 right | [2, 4, 5, 5, 6, 3] |
| 1 | [2, 4, 5, 5, 6, 3] | $1 < 4$? Yes | Shift 4 right | [2, 4, 4, 5, 6, 3] |
| 0 | [2, 4, 4, 5, 6, 3] | $1 < 2$? Yes | Shift 2 right | [2, 2, 4, 5, 6, 3] |
| — | [2, 2, 4, 5, 6, 3] | — | Place 1 at position 0 | [1, 2, 4, 5, 6, 3] |

After iteration 4: $[1, 2, 4, 5, 6, 3]$.
Iteration 5 ($i = 5$, toInsert $= 3$): insert 3 into the sorted prefix $[1, 2, 4, 5, 6]$.

| $j$ | Array before | Comparison | Action | Array after |
|---|---|---|---|---|
| 4 | [1, 2, 4, 5, 6, 3] | $3 < 6$? Yes | Shift 6 right | [1, 2, 4, 5, 6, 6] |
| 3 | [1, 2, 4, 5, 6, 6] | $3 < 5$? Yes | Shift 5 right | [1, 2, 4, 5, 5, 6] |
| 2 | [1, 2, 4, 5, 5, 6] | $3 < 4$? Yes | Shift 4 right | [1, 2, 4, 4, 5, 6] |
| 1 | [1, 2, 4, 4, 5, 6] | $3 < 2$? No | Place 3 at position 2 | [1, 2, 3, 4, 5, 6] |

After iteration 5: $[1, 2, 3, 4, 5, 6]$.

Result: $[1, 2, 3, 4, 5, 6]$.
Notice how each element is inserted into its correct position within the growing sorted prefix on the left. When the element is already in the right place (like 6 in iteration 3), no shifting is needed and the inner loop exits immediately.
Correctness
Invariant: At the start of the iteration with index $i$, the subarray $A[0..i-1]$ is a sorted permutation of the elements originally in those positions.
Initialization: Before the first iteration ($i = 1$), the subarray $A[0..0]$ contains a single element. A single element is trivially sorted.
Maintenance: During the iteration with index $i$, the element $A[i]$ is removed from its position and inserted into the sorted subarray $A[0..i-1]$. The inner loop finds the correct insertion point by scanning leftward from position $i - 1$ and shifting each element larger than $A[i]$ one position to the right. After the insertion, $A[0..i]$ is a sorted permutation of the elements originally in $A[0..i]$.
Termination: When $i$ reaches $n$, the invariant tells us that $A[0..n-1]$ — the entire array — is sorted.
Complexity analysis
The number of comparisons depends on the input.
Worst case. The worst case is a reverse-sorted array. In the iteration with index $i$, the element must be shifted past all $i$ elements in the sorted prefix, requiring $i$ comparisons. The total is:

$$\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = \Theta(n^2)$$
Best case. The best case is an already-sorted array. In each iteration, the inner loop performs one comparison (finding that toInsert is already in place) and zero shifts, for a total of $n - 1$ comparisons: $\Theta(n)$.
This is remarkable: insertion sort runs in linear time on sorted input, matching the theoretical minimum for any algorithm that must verify sortedness.
Average case. On a random permutation, each element is, on average, shifted past half the elements in the sorted prefix:

$$\sum_{i=1}^{n-1} \frac{i}{2} = \frac{n(n-1)}{4} = \Theta(n^2)$$
Nearly sorted input. If each element is at most $k$ positions from its sorted position, the inner loop performs at most $k$ comparisons per element, giving $O(nk)$. When $k$ is a small constant, insertion sort runs in linear time. This makes it an excellent choice for "nearly sorted" data and for finishing off the work of a more sophisticated algorithm (for example, some quicksort implementations switch to insertion sort for small subarrays).
Space complexity. $O(1)$ auxiliary space.
Inversions
The performance of insertion sort is closely tied to the concept of inversions.
Definition 4.4 - Inversion
An inversion in a sequence $A$ is a pair of indices $(i, j)$ with $i < j$ and $A[i] > A[j]$.
Each swap (or shift) in insertion sort eliminates exactly one inversion. Therefore, the number of comparisons insertion sort makes is $\Theta(n + I)$, where $I$ is the number of inversions in the input. A sorted array has $0$ inversions; a reverse-sorted array has $\frac{n(n-1)}{2}$, the maximum possible. On average, a random permutation has $\frac{n(n-1)}{4}$ inversions.
This connection makes insertion sort the natural choice when we know the input has few inversions — it is adaptive to the presortedness of the input.
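The shift-per-inversion correspondence can be checked directly. The sketch below uses an illustrative brute-force countInversions helper ($O(n^2)$, fine for a demo) and re-implements the chapter's insertion sort with an added shift counter; both function names are assumptions for this example.

```typescript
type Comparator<T> = (a: T, b: T) => number;

// Brute force: check every index pair (i, j) with i < j.
function countInversions(a: number[]): number {
  let inversions = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = i + 1; j < a.length; j++) {
      if (a[i]! > a[j]!) inversions++;
    }
  }
  return inversions;
}

// Insertion sort as in this chapter, instrumented to count shifts.
function insertionSortShifts(elements: number[], comparator: Comparator<number>): number {
  let shifts = 0;
  for (let i = 1; i < elements.length; i++) {
    const toInsert = elements[i]!;
    let j = i - 1;
    while (j >= 0 && comparator(toInsert, elements[j]!) < 0) {
      elements[j + 1] = elements[j]!;
      j--;
      shifts++; // each shift removes exactly one inversion
    }
    elements[j + 1] = toInsert;
  }
  return shifts;
}

const input = [5, 2, 4, 6, 1, 3];
const inversions = countInversions(input);                       // 9
const shifts = insertionSortShifts([...input], (a, b) => a - b); // also 9
```

The two counts agree on every input, which is exactly the claim above: insertion sort's work is the number of inversions, plus the linear scan.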
Properties
| Property | Insertion sort |
|---|---|
| Worst-case time | $\Theta(n^2)$ |
| Best-case time | $\Theta(n)$ |
| Average-case time | $\Theta(n^2)$ |
| Space | $O(1)$, in-place |
| Stable | Yes |
| Adaptive | Yes (time depends on inversions) |
Comparison of elementary sorts
Now that we have studied all three algorithms, let us compare them side by side.
| Property | Bubble sort | Selection sort | Insertion sort |
|---|---|---|---|
| Worst-case time | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ |
| Best-case time | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n)$ |
| Average-case time | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ |
| Stable | Yes | No | Yes |
| Adaptive | No | No | Yes |
| Comparisons (worst) | $\frac{n(n-1)}{2}$ | $\frac{n(n-1)}{2}$ | $\frac{n(n-1)}{2}$ |
| Swaps (worst) | $\frac{n(n-1)}{2}$ | $n - 1$ | $\frac{n(n-1)}{2}$ shifts |
The optimized bubble sort variant (with early termination) achieves $\Theta(n)$ best-case time and becomes adaptive, but the comparison above reflects the basic algorithm.
Several observations stand out:
-
Selection sort always does the same amount of work regardless of the input — it is not adaptive. However, it minimizes the number of swaps (at most $n - 1$), which matters when moving elements is expensive.
-
Insertion sort is the best general-purpose choice among the three. It is stable, adaptive, and efficient on small or nearly sorted inputs. In practice, it outperforms both bubble sort and selection sort.
-
Bubble sort in its basic form always performs $\frac{n(n-1)}{2}$ comparisons regardless of input. Even with the early termination optimization, it is slower in practice than insertion sort because elements move only one position per swap, while insertion sort shifts an entire block. Bubble sort's main virtue is pedagogical simplicity.
The comparison-based sorting lower bound
All three elementary sorting algorithms are comparison-based: they access the input elements only through pairwise comparisons. This shared trait raises a natural question — is $\Theta(n^2)$ the best we can do under this model, or can a comparison-based algorithm sort faster? It turns out the answer is yes: merge sort, heapsort, and quicksort all achieve $O(n \log n)$ time, as we will see in Chapter 5. This immediately raises a deeper question: can we do better still — is there a comparison-based algorithm that beats $O(n \log n)$? It turns out that the answer is no, and we prove it below.
Theorem 4.1 - Comparison-based sorting lower bound
Any comparison-based sorting algorithm must make $\Omega(n \log n)$ comparisons in the worst case to sort $n$ elements.
The decision tree argument
To prove this theorem, we model any comparison-based sorting algorithm as a decision tree. Each internal node represents a comparison between two elements (e.g., "is $a_i < a_j$?"), with two children corresponding to the outcomes "yes" and "no." Each leaf represents a specific output permutation.
For the algorithm to be correct, it must be able to produce every permutation of $n$ elements as output — there must be at least $n!$ leaves. The number of comparisons in the worst case equals the height of the decision tree (the longest root-to-leaf path).
A binary tree of height $h$ has at most $2^h$ leaves. We have just established that a correct decision tree must have at least $n!$ leaves. Since the actual number of leaves is simultaneously at most $2^h$ (the binary-tree bound) and at least $n!$ (the correctness requirement), the upper bound must be large enough to accommodate the lower bound:

$$2^h \ge n!$$

Taking logarithms:

$$h \ge \log_2(n!)$$
It remains to show that $\log_2(n!) = \Omega(n \log n)$. One can appeal to Stirling's approximation ($n! \approx \sqrt{2\pi n}\,(n/e)^n$), but a direct argument is just as short and more illuminating.
Claim — $\log_2(n!) \ge \frac{n}{2} \log_2 \frac{n}{2}$
Proof. Write $\log_2(n!)$ as a sum and keep only the upper half of its terms:

$$\log_2(n!) = \sum_{k=1}^{n} \log_2 k \ge \sum_{k=\lceil n/2 \rceil}^{n} \log_2 k$$

Every term in the remaining sum satisfies $k \ge \frac{n}{2}$, so $\log_2 k \ge \log_2 \frac{n}{2}$. There are at least $\frac{n}{2}$ such terms, giving:

$$\log_2(n!) \ge \frac{n}{2} \log_2 \frac{n}{2}$$
For $n \ge 4$ we have $\log_2 \frac{n}{2} = \log_2 n - 1 \ge \frac{1}{2} \log_2 n$, so $\frac{n}{2} \log_2 \frac{n}{2} \ge \frac{n}{4} \log_2 n$, and therefore:

$$\log_2(n!) = \Omega(n \log n)$$
Combining the decision-tree bound with the claim, any comparison-based sorting algorithm requires $\Omega(n \log n)$ comparisons in the worst case.
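The bound is easy to evaluate numerically. The sketch below computes $\lceil \log_2(n!) \rceil$ for a small $n$ — the minimum worst-case comparison count the theorem permits; log2Factorial is an illustrative helper, not from the text.

```typescript
// Sum log2(k) for k = 2..n instead of computing n! directly,
// which avoids overflow for large n.
function log2Factorial(n: number): number {
  let total = 0;
  for (let k = 2; k <= n; k++) total += Math.log2(k);
  return total;
}

// For n = 5: log2(120) ≈ 6.91, so at least 7 comparisons are needed
// in the worst case, while the elementary sorts use up to 5·4/2 = 10.
const minComparisonsFor5 = Math.ceil(log2Factorial(5)); // 7
```

Interestingly, 7 comparisons for 5 elements is actually achievable (by merge-insertion sort), so for this $n$ the information-theoretic bound is tight.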
Implications
This lower bound tells us that algorithms like merge sort and heapsort are asymptotically optimal among comparison-based sorts — they cannot be improved in the worst case.
Moreover, once we prove that a sorting algorithm's worst-case running time is $O(n \log n)$ (an upper bound), the lower bound that we just established applies to it automatically — because it is a comparison-based sort. The two bounds together give us $\Theta(n \log n)$: such an algorithm is not merely "at most $O(n \log n)$" but exactly $\Theta(n \log n)$ in the worst case. There is no comparison-based sorting algorithm whose worst case grows slower, and merge sort and heapsort already match this floor.
It also tells us that our elementary algorithms are a factor of $\Theta(n / \log n)$ away from optimal. For $n = 10^6$, that factor is roughly 50,000 — the same dramatic gap we noted in the growth-rate table of Chapter 2.
However, as is evident from the proof, the lower bound applies only to comparison-based sorting. Algorithms that exploit additional structure in the input (such as knowing that elements are integers in a bounded range) can sort in $O(n)$ time, as we will see in Chapter 6.
Summary
In this chapter we studied the sorting problem and three elementary algorithms for solving it:
- Bubble sort repeatedly swaps adjacent out-of-order elements. It is simple and stable, with $\Theta(n^2)$ time in all cases. An early termination optimization can improve the best case to $\Theta(n)$.
- Selection sort repeatedly selects the minimum from the unsorted portion. It always takes $\Theta(n^2)$ time but minimizes swaps to at most $n - 1$. It is not stable.
- Insertion sort inserts each element into its correct position in a growing sorted prefix. It is stable, adaptive to the number of inversions, and has best-case $\Theta(n)$ time. It is the practical choice among elementary sorts.
- The comparison-based lower bound of $\Omega(n \log n)$ shows that these quadratic algorithms are not optimal.
In Chapter 5, we study three efficient sorting algorithms — merge sort, quicksort, and heapsort — that achieve the $O(n \log n)$ bound. These algorithms use the divide-and-conquer strategy from Chapter 3 to overcome the quadratic barrier.
Exercises
Exercise 4.1. The chapter shows that selection sort is not stable. Describe how selection sort could be modified to become stable (hint: use insertion into a separate output instead of swapping). What is the cost of this modification?
Exercise 4.2. A sentinel version of insertion sort places a minimum element at position $0$ before sorting, eliminating the insertIndex >= 0 bound check in the inner loop. Explain why this is correct and analyze its effect on performance. What are the drawbacks?
Exercise 4.3. Write a Comparator<string> for lexicographic ordering and use it to sort ["banana", "apple", "cherry", "date", "apricot"] with insertion sort. Why can't a string comparator use the subtraction pattern (a, b) => a - b that numberComparator uses?
Exercise 4.4. Given an array of student records [{name: "Charlie", grade: 90}, {name: "Alice", grade: 85}, {name: "Bob", grade: 90}, {name: "Alice", grade: 90}], use the multi-key sorting technique described in the stability section to sort by grade (ascending) as the primary key and by name (alphabetically) as the secondary key. Then write a single composite comparator that achieves the same result in one pass — when would you prefer each approach?
Exercise 4.5. Consider sorting an array of {x: number, y: number} points by their Euclidean distance from the origin. Write the comparator. Can you avoid computing square roots? Does the choice between insertion sort and selection sort matter if multiple points may share the same distance?
Efficient Sorting
In Chapter 4 we proved that any comparison-based sorting algorithm must make $\Omega(n \log n)$ comparisons in the worst case. The three elementary algorithms we studied — bubble sort, selection sort, and insertion sort — fall short of this bound, requiring $\Theta(n^2)$ time. In this chapter we meet three algorithms that close the gap: merge sort, quicksort, and heapsort. All three achieve $O(n \log n)$ time and are, in different senses, asymptotically optimal. They use the divide-and-conquer strategy from Chapter 3, but apply it in very different ways — merge sort divides trivially and combines carefully, quicksort divides carefully and combines trivially, and heapsort uses a heap data structure to repeatedly extract the maximum. We also study randomized quicksort, which uses random pivot selection to guarantee $O(n \log n)$ expected performance on every input.
Merge sort
Merge sort is the most straightforward application of divide-and-conquer to sorting. The idea is simple: split the array in half, recursively sort each half, and then merge the two sorted halves into a single sorted array.
The algorithm
The recursive (top-down) formulation of merge sort is:
- If the array has zero or one elements, it is already sorted. Return.
- Divide the array into two halves of roughly equal size.
- Recursively sort each half.
- Merge the two sorted halves into a single sorted array using an efficient merge procedure.
Notice that the divide step (step 2) does no real work — it simply computes a midpoint. The recursive sort (step 3) keeps splitting until it reaches single-element subarrays, which are trivially sorted. All the real work happens in the merge step (step 4). Since the recursive splitting always ends the same way — individual elements — we can skip it entirely and work bottom-up:
- Start with runs of length 1 (each individual element is a trivially sorted run).
- Set the run length $w = 1$.
- While $w < n$:
  - Merge each adjacent pair of sorted runs of length $w$ into a sorted run of length $2w$ using an efficient merge procedure.
  - Double the run length: $w \leftarrow 2w$.
This bottom-up formulation performs exactly the same merges as the recursive version but avoids the recursion stack. It is the version we will implement.
The key insight shared by both formulations is that merging two sorted arrays of total length $m$ takes $O(m)$ time: we scan both arrays from left to right, always taking the smaller of the two current elements.
The merge procedure
The merge step is the heart of the algorithm. Given an array arr and indices start, middle, and end, we merge the sorted subarrays arr[start..middle) and arr[middle..end) into a single sorted subarray arr[start..end).
export function merge<T>(
arr: T[],
start: number,
middle: number,
end: number,
comparator: Comparator<T> = numberComparator as Comparator<T>,
): void {
const sorted: T[] = [];
let i = start;
let j = middle;
while (i < middle && j < end) {
if (comparator(arr[i]!, arr[j]!) <= 0) {
sorted.push(arr[i]!);
i++;
} else {
sorted.push(arr[j]!);
j++;
}
}
while (i < middle) {
sorted.push(arr[i]!);
i++;
}
while (j < end) {
sorted.push(arr[j]!);
j++;
}
i = start;
while (i < end) {
arr[i] = sorted[i - start]!;
i++;
}
}
The comparison <= 0 (rather than < 0) ensures stability: when two elements are equal, the one from the left subarray comes first, preserving original order.
Tracing the merge procedure
To understand how merge works step by step, let us trace through two small examples.
Example 1: merge the sorted subarrays $[2, 8]$ and $[4, 5]$.

We initialize two pointers: $i$ at the start of the left subarray and $j$ at the start of the right subarray. At each step, we compare the elements at $i$ and $j$, take the smaller one into the auxiliary array sorted, and advance the corresponding pointer. When one subarray is exhausted, we append the remainder of the other.

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|---|---|---|---|---|
| 1 | 2 | 4 | $2 \le 4$? Yes | Take 2 from left, $i$++ | [2] |
| 2 | 8 | 4 | $8 \le 4$? No | Take 4 from right, $j$++ | [2, 4] |
| 3 | 8 | 5 | $8 \le 5$? No | Take 5 from right, $j$++ | [2, 4, 5] |
| 4 | 8 | — | Right exhausted | Append remaining from left | [2, 4, 5, 8] |

The right subarray is exhausted after step 3 (both of its elements have been taken). The remaining element from the left subarray (8) is appended. The auxiliary array sorted = $[2, 4, 5, 8]$ is then copied back into the corresponding positions of the original array.
Example 2: merge the sorted subarrays $[1, 3, 6]$ and $[2]$.

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|---|---|---|---|---|
| 1 | 1 | 2 | $1 \le 2$? Yes | Take 1 from left, $i$++ | [1] |
| 2 | 3 | 2 | $3 \le 2$? No | Take 2 from right, $j$++ | [1, 2] |
| 3 | 3 | — | Right exhausted | Append remaining from left | [1, 2, 3, 6] |

The right subarray has only one element. After it is taken in step 2, the right subarray is exhausted and we append the remaining elements from the left subarray (3 and 6) in order. The merge procedure handles subarrays of unequal length naturally — the two "cleanup" loops in the code append whichever subarray still has remaining elements.
Tracing through an example
Let us sort $[38, 27, 43, 3, 9, 82, 10]$ using the bottom-up approach.
The divide phase is implicit. Had we used the recursive (top-down) formulation, the algorithm would begin by splitting the array in half through recursive calls, producing the following tree of subproblems:
[38, 27, 43, 3, 9, 82, 10]
/ \
[38, 27, 43, 3] [9, 82, 10]
/ \ / \
[38, 27] [43, 3] [9, 82] [10]
/ \ / \ / \
[38] [27] [43] [3] [9] [82]
As discussed above, this divide phase performs no useful work — it merely determines which subarrays to merge. Our bottom-up implementation skips it entirely, starting from single-element runs and doubling the run length each iteration.
Iteration 1 (step = 2): merge pairs of 1-element runs into sorted 2-element runs.
Merge of $[38]$ and $[27]$ — merge(arr, 0, 1, 2):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|---|---|---|---|---|
| 1 | 38 | 27 | $38 \le 27$? No | Take 27 from right, $j$++ | [27] |
| 2 | 38 | — | Right exhausted | Append remaining from left | [27, 38] |

Merge of $[43]$ and $[3]$ — merge(arr, 2, 3, 4):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|---|---|---|---|---|
| 1 | 43 | 3 | $43 \le 3$? No | Take 3 from right, $j$++ | [3] |
| 2 | 43 | — | Right exhausted | Append remaining from left | [3, 43] |

Merge of $[9]$ and $[82]$ — merge(arr, 4, 5, 6):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|---|---|---|---|---|
| 1 | 9 | 82 | $9 \le 82$? Yes | Take 9 from left, $i$++ | [9] |
| 2 | — | 82 | Left exhausted | Append remaining from right | [9, 82] |

The element 10 at index 6 has no partner to merge with (the array has odd length), so it remains as a 1-element run.

Array after iteration 1: $[27, 38, 3, 43, 9, 82, 10]$.
Iteration 2 (step = 4): merge pairs of 2-element runs into sorted 4-element runs.
Merge of $[27, 38]$ and $[3, 43]$ — merge(arr, 0, 2, 4):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|---|---|---|---|---|
| 1 | 27 | 3 | $27 \le 3$? No | Take 3 from right, $j$++ | [3] |
| 2 | 27 | 43 | $27 \le 43$? Yes | Take 27 from left, $i$++ | [3, 27] |
| 3 | 38 | 43 | $38 \le 43$? Yes | Take 38 from left, $i$++ | [3, 27, 38] |
| 4 | — | 43 | Left exhausted | Append remaining from right | [3, 27, 38, 43] |

Merge of $[9, 82]$ and $[10]$ — merge(arr, 4, 6, 7):

| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|---|---|---|---|---|
| 1 | 9 | 10 | $9 \le 10$? Yes | Take 9 from left, $i$++ | [9] |
| 2 | 82 | 10 | $82 \le 10$? No | Take 10 from right, $j$++ | [9, 10] |
| 3 | 82 | — | Right exhausted | Append remaining from left | [9, 10, 82] |

Array after iteration 2: $[3, 27, 38, 43, 9, 10, 82]$.
Iteration 3 (step = 8): merge the two remaining runs into a single sorted array.
Merge of and — merge(arr, 0, 4, 7):
| # | arr[i] | arr[j] | Comparison | Action | sorted |
|---|---|---|---|---|---|
| 1 | 3 | 9 | ? Yes | Take 3 from left, ++ | |
| 2 | 27 | 9 | ? No | Take 9 from right, ++ | |
| 3 | 27 | 10 | ? No | Take 10 from right, ++ | |
| 4 | 27 | 82 | ? Yes | Take 27 from left, ++ | |
| 5 | 38 | 82 | ? Yes | Take 38 from left, ++ | |
| 6 | 43 | 82 | ? Yes | Take 43 from left, ++ | |
| 7 | — | 82 | Left exhausted | Append remaining from right |
Array after iteration 3: [3, 9, 10, 27, 38, 43, 82].
Result: [3, 9, 10, 27, 38, 43, 82].
Notice how the bottom-up approach performs exactly the same merges that the recursive version would, but in a simple iterative pattern: each iteration doubles the run length, and the algorithm terminates after ⌈log₂ n⌉ iterations. The total number of element comparisons across all merges is 14 — fewer than the 21 comparisons that an elementary algorithm such as selection sort would make on the same seven-element input. The difference is modest here, but it grows rapidly with input size: merge sort makes Θ(n log n) comparisons versus the Θ(n²) of the elementary algorithms, so for large n the savings are enormous.
Bottom-up implementation
Here is the bottom-up formulation described above, which performs the same merges as the recursive version but avoids the recursion stack.
export function mergeSort<T>(
  elements: T[],
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
  let step = 1;
  while (step < elements.length) {
    step = step * 2;
    for (let start = 0; start < elements.length; start = start + step) {
      const middle = Math.min(start + step / 2, elements.length);
      const end = Math.min(start + step, elements.length);
      merge(elements, start, middle, end, comparator);
    }
  }
  return elements;
}
The bottom-up approach has the same time complexity as the recursive version but avoids the recursion stack overhead.
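The stability claim discussed below depends on the merge helper taking from the left run on ties. For reference, here is a minimal sketch of such a merge, consistent with the half-open merge(arr, start, middle, end, comparator) calls used in this chapter; the repository's actual helper may differ in details.

```typescript
type Comparator<T> = (a: T, b: T) => number;

// Sketch: merge the sorted runs arr[start..middle-1] and arr[middle..end-1]
// in place, via a temporary buffer. "<= 0" takes from the left run on ties,
// which is what preserves stability.
function merge<T>(
  arr: T[],
  start: number,
  middle: number,
  end: number,
  comparator: Comparator<T>,
): void {
  const sorted: T[] = [];
  let i = start;
  let j = middle;
  while (i < middle && j < end) {
    if (comparator(arr[i]!, arr[j]!) <= 0) {
      sorted.push(arr[i++]!);
    } else {
      sorted.push(arr[j++]!);
    }
  }
  // Exactly one run may have elements remaining; append them in order.
  while (i < middle) sorted.push(arr[i++]!);
  while (j < end) sorted.push(arr[j++]!);
  // Copy the merged result back into arr[start..end-1].
  for (let k = 0; k < sorted.length; k++) {
    arr[start + k] = sorted[k]!;
  }
}

const a = [27, 38, 3, 43];
merge(a, 0, 2, 4, (x, y) => x - y);
// a is now [3, 27, 38, 43]
console.assert(a.join() === "3,27,38,43");
```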
Correctness
Claim. The merge procedure merges two sorted subarrays into a single sorted subarray.
At each step of the main loop, we choose the smaller of the two current front elements. Since both subarrays are sorted, the current front element of each is the smallest remaining element in that subarray. Therefore, the smaller of the two fronts is the smallest remaining element overall. After the main loop, one subarray is exhausted. Every remaining element in the other subarray is greater than or equal to the last element placed into the merged result (otherwise it would have been chosen earlier), and these remaining elements are already sorted among themselves, so appending them preserves the sorted order. The result is a sorted permutation of all elements from both subarrays. The <= 0 comparison ensures that equal elements from the left subarray come first, preserving stability.
Claim. Merge sort correctly sorts the array.
We argue by induction on the run length.
Base case. In the first iteration (step = 2), each merge operates on runs of length 1, which are trivially sorted. By the merge claim above, each merge produces a sorted run of length 2.
Inductive step. Assume that after iteration k every run has length 2^k (except possibly the last run, which may be shorter) and is sorted. In iteration k + 1, the merge procedure combines each pair of adjacent sorted runs of length 2^k into a run of length 2^(k+1), which is sorted by the merge claim.
After ⌈log₂ n⌉ iterations, the entire array is a single sorted run.
Complexity analysis
Time. At each level of the merge tree, the total work across all merges is Θ(n) (each element is compared at most once and copied once). The number of levels is Θ(log n). Therefore:

T(n) = Θ(n log n)
This holds in the best case, worst case, and average case — merge sort is not adaptive to the input's presortedness.
The same result follows from the recurrence for the recursive version:

T(n) = 2T(n/2) + Θ(n)

By the Master Theorem (case 2, with a = 2, b = 2, f(n) = Θ(n) = Θ(n^(log_b a))), we get T(n) = Θ(n log n).
Exact worst-case comparison count. While Θ(n log n) captures the growth rate, we can pin down the exact number of comparisons. The key observation is that merging two sorted arrays of sizes m and n requires at most m + n − 1 comparisons in the worst case — when the elements are fully interleaved, so we must exhaust both arrays before the merge is complete. (In the best case, all elements of one array are smaller than all elements of the other, requiring only min(m, n) comparisons.)
For n = 2^k, every split is perfectly even and the recursion tree has k = log₂ n levels of merging. At level j (counting from the bottom, j = 1, …, k), there are n/2^j merges, each combining two arrays of size 2^(j−1) with at most 2^j − 1 comparisons.
Summing over all levels:

W(n) = Σ_{j=1}^{k} (n/2^j)(2^j − 1) = Σ_{j=1}^{k} (n − n/2^j) = nk − n(1 − 1/2^k)

Since n = 2^k (so k = log₂ n and n/2^k = 1), this gives W(n) = n log₂ n − n + 1. For general n, the exact worst-case count satisfies the recurrence W(n) = W(⌈n/2⌉) + W(⌊n/2⌋) + n − 1 with W(1) = 0. A straightforward strong induction shows that the solution is:

W(n) = n⌈log₂ n⌉ − 2^⌈log₂ n⌉ + 1
The leading term is n log₂ n with coefficient exactly 1 — not a hidden Big-O constant, but a precise value. We will use this fact in the quicksort section to make an exact comparison between quicksort's average-case and merge sort's worst-case comparison counts.
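The closed form can be double-checked numerically against the recurrence it solves. A small sketch (the function names are illustrative):

```typescript
// W(n) = W(ceil(n/2)) + W(floor(n/2)) + n - 1, with W(1) = 0, computed directly.
function recurrenceW(n: number): number {
  if (n <= 1) return 0;
  return recurrenceW(Math.ceil(n / 2)) + recurrenceW(Math.floor(n / 2)) + n - 1;
}

// The claimed closed form: n * ceil(log2 n) - 2^ceil(log2 n) + 1.
function closedFormW(n: number): number {
  const k = Math.ceil(Math.log2(n));
  return n * k - 2 ** k + 1;
}

for (let n = 1; n <= 64; n++) {
  console.assert(recurrenceW(n) === closedFormW(n), `mismatch at n=${n}`);
}
// For n = 7 both give 14, matching the 14 comparisons counted in the
// bottom-up trace above.
console.assert(closedFormW(7) === 14);
```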
For completeness: the best-case comparison count (when every merge encounters one subarray entirely smaller than the other, requiring only min(m, n) comparisons) is roughly half the worst case. For n = 2^k, each of the log₂ n levels contributes n/2 comparisons, giving about (n/2) log₂ n. Both best and worst case are Θ(n log n), but the leading coefficients differ — 1/2 versus 1.
Space. The merge procedure uses an auxiliary array of size up to n to hold merged elements during each merge. The bottom-up version uses no recursion stack; the recursive version would add Θ(log n) stack frames. The total auxiliary space is Θ(n).
Properties
| Property | Merge sort |
|---|---|
| Worst-case time | Θ(n log n) |
| Best-case time | Θ(n log n) |
| Average-case time | Θ(n log n) |
| Space | Θ(n) auxiliary |
| Stable | Yes |
| Adaptive | No |
Quicksort
Quicksort, invented by Tony Hoare in 1959, takes the opposite approach from merge sort. Where merge sort divides trivially (split in half) and combines carefully (merge), quicksort divides carefully (partition) and combines trivially (the subarrays are already in the right place).
The idea: choose a pivot element, rearrange the array so that all elements less than the pivot come before it and all elements greater come after it, then recursively sort the two partitions.
The algorithm
The recursive formulation of quicksort is:
- If the array has zero or one elements, it is already sorted. Return.
- Choose a pivot element from the array.
- Partition the array: rearrange elements so that all elements less than the pivot come before it and all elements greater come after it. The pivot is now in its correct final position.
- Recursively sort the subarray of elements before the pivot.
- Recursively sort the subarray of elements after the pivot.
Notice that the combine step is trivial — there is nothing to do after the recursive calls, because the partitioning has already placed every element on the correct side of the pivot. All the real work happens in the partition step (step 3). The quality of the pivot choice (step 2) determines performance. Since the pivot itself is placed in its final position and does not participate in either recursive call, the two subarrays together contain n − 1 elements. An ideal pivot splits them into two roughly equal halves, which results in Θ(n log n) running time for the algorithm, while a poor pivot that lands at one extreme produces one subarray of size n − 1 and one of size 0, which results in Θ(n²) running time for the algorithm.
The partition procedure
The partition step rearranges arr[start..end] around a pivot element and returns the pivot's final index. After partitioning:
- All elements to the left of the pivot are less than the pivot.
- All elements to the right are greater than or equal to the pivot.
- The pivot is in its correct final position.
Our implementation uses the Lomuto partition scheme: scan from left, moving elements smaller than the pivot to the front. By default the middle element is chosen as the pivot, but the caller can pass an explicit pivotPos to partition around a specific element — a flexibility we will use in Chapter 6 for the median-of-medians algorithm.
export function partition<T>(
  arr: T[],
  start: number,
  end: number,
  comparator: Comparator<T> = numberComparator as Comparator<T>,
  pivotPos?: number,
): number | undefined {
  if (start > end || end >= arr.length || start < 0 || end < 0) {
    return undefined;
  }
  const pivotIndex = pivotPos ?? Math.floor((start + end) / 2);
  let storeIndex = start;
  // Move pivot to end
  const pivotTemp = arr[pivotIndex]!;
  arr[pivotIndex] = arr[end]!;
  arr[end] = pivotTemp;
  for (let i = start; i < end; i++) {
    if (comparator(arr[i]!, arr[end]!) < 0) {
      const temp = arr[storeIndex]!;
      arr[storeIndex] = arr[i]!;
      arr[i] = temp;
      storeIndex++;
    }
  }
  // Move pivot to its final position
  const temp = arr[storeIndex]!;
  arr[storeIndex] = arr[end]!;
  arr[end] = temp;
  return storeIndex;
}
The pivot is first swapped to the end, then storeIndex tracks the boundary between elements known to be less than the pivot and elements not yet examined. After the scan, the pivot is swapped into storeIndex, its correct position.
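As a quick sanity check of this contract, the sketch below (a condensed copy of the partition above, with the bounds checks trimmed) verifies the postcondition on a sample array:

```typescript
type Comparator<T> = (a: T, b: T) => number;

// Condensed Lomuto partition: middle element as pivot, parked at the end.
function partition<T>(arr: T[], start: number, end: number, cmp: Comparator<T>): number {
  const pivotIndex = Math.floor((start + end) / 2);
  [arr[pivotIndex], arr[end]] = [arr[end]!, arr[pivotIndex]!]; // park pivot at end
  let storeIndex = start;
  for (let i = start; i < end; i++) {
    if (cmp(arr[i]!, arr[end]!) < 0) {
      [arr[storeIndex], arr[i]] = [arr[i]!, arr[storeIndex]!];
      storeIndex++;
    }
  }
  [arr[storeIndex], arr[end]] = [arr[end]!, arr[storeIndex]!]; // pivot to final spot
  return storeIndex;
}

const data = [7, 2, 1, 6, 8, 5, 3, 4];
const p = partition(data, 0, data.length - 1, (a, b) => a - b);
// Middle pivot is 6; it lands at its sorted position with smaller elements before it.
console.assert(data[p] === 6);
console.assert(data.slice(0, p).every((x) => x < 6));
console.assert(data.slice(p + 1).every((x) => x >= 6));
```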
The Lomuto partition scheme in detail
The Lomuto partition scheme (named after Nico Lomuto and popularized by Jon Bentley) is an elegant single-pass algorithm that partitions an array around a pivot using two indices: storeIndex and i. The pivot is first moved to the end of the array, and then the scan pointer i advances from left to right, examining each element exactly once.
At every point during the scan, the array is divided into four regions. The two pointers storeIndex and i carve out the boundaries:
- arr[start..storeIndex-1] — elements already classified as less than the pivot.
- arr[storeIndex..i-1] — elements already classified as greater than or equal to the pivot. This region may be empty: at the very beginning of the scan storeIndex = i = start, and it remains empty as long as every element examined so far is less than the pivot (because each such element advances both i and storeIndex). The region grows only when the scan encounters an element ≥ pivot — that element stays in place and i advances past it while storeIndex does not.
- arr[i..end-1] — elements not yet examined.
- arr[end] — the pivot itself (parked at the end).
On each step, the scan pointer i examines one element:
- If arr[i] < pivot: swap arr[i] with arr[storeIndex] and advance both i and storeIndex. This grows the "less than" region by one. If the "≥ pivot" region is non-empty, the swap moves its first element into the position just vacated by arr[i], keeping it in the "≥ pivot" region. If the "≥ pivot" region is empty (storeIndex = i), the element is effectively swapped with itself — a no-op — and both pointers advance together.
- If arr[i] ≥ pivot: advance i only. The element stays where it is, and storeIndex does not move, so the element becomes part of the "≥ pivot" region. This is also the moment when storeIndex and i diverge (if they were still equal).
When the scan is complete (i = end), the "not examined" region is empty. We swap the pivot from arr[end] into arr[storeIndex] — the boundary between the two classified regions — placing it in its correct final position.
Tracing the Lomuto scheme. Let us trace the partition of a six-element array (indices 0–5) with the middle element as pivot. The middle index is ⌊(0 + 5)/2⌋ = 2; in this example the element there is 5, so the pivot is 5. Swap it to the end (index 5).
Now scan with storeIndex = 0. In this trace, arr[0] = 8 and the elements at indices 1–4 are all less than the pivot 5, so the scan proceeds as follows.
Initial state (storeIndex = 0, i = 0): no element has been classified yet; the pivot 5 is parked at index 5.
Step 1 (i = 0): arr[0] = 8. Is 8 < 5? No. Do nothing.
storeIndex stays at 0.
Step 2 (i = 1): arr[1] < 5? Yes. Swap arr[1] and arr[0].
storeIndex advances to 1.
Step 3 (i = 2): arr[2] < 5? Yes. Swap arr[2] and arr[1].
storeIndex advances to 2.
Step 4 (i = 3): arr[3] < 5? Yes. Swap arr[3] and arr[2].
storeIndex advances to 3.
Step 5 (i = 4): arr[4] < 5? Yes. Swap arr[4] and arr[3].
storeIndex advances to 4.
Place pivot: swap arr[storeIndex] (= arr[4]) with the pivot at arr[5].
The pivot 5 is now at index 4, its correct sorted position. Every element to its left is less than 5, and every element to its right is greater than or equal to 5.
Notice how the "greater or equal" region (just the element 8 in this example) gets pushed rightward one position each time a smaller element is swapped into the "less than" region. The storeIndex pointer always marks the exact boundary: everything before it is less than the pivot, everything from it onward (up to the scan pointer) is greater or equal.
This detailed trace also serves as an informal correctness argument for the Lomuto scheme. At every step, the four-region invariant is maintained: elements before storeIndex are less than the pivot, elements from storeIndex to i - 1 are greater than or equal, elements from i onward have not yet been examined, and the pivot sits at the end. When the scan completes, the "not examined" region is empty, so swapping the pivot into storeIndex places it at the exact boundary between the "less than" and "greater or equal" regions — its correct final position. Note also that the scan examines each of the end − start non-pivot elements exactly once, performing at most one swap per element, so the partition procedure runs in Θ(n) time on a subarray of n elements.
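The four-region invariant can also be checked mechanically. The sketch below instruments a bare-bones Lomuto scan over numbers (pivot already parked at the end, as in the trace above) and asserts the invariant before examining each element; the sample values are illustrative:

```typescript
// Sketch: Lomuto scan with the four-region invariant asserted at every step.
function lomutoWithInvariant(arr: number[]): number {
  const end = arr.length - 1; // the pivot has already been parked at the end
  const pivot = arr[end]!;
  let storeIndex = 0;
  for (let i = 0; i < end; i++) {
    // Invariant before examining arr[i]:
    //   arr[0..storeIndex-1] < pivot  and  arr[storeIndex..i-1] >= pivot.
    console.assert(arr.slice(0, storeIndex).every((x) => x < pivot));
    console.assert(arr.slice(storeIndex, i).every((x) => x >= pivot));
    if (arr[i]! < pivot) {
      [arr[storeIndex], arr[i]] = [arr[i]!, arr[storeIndex]!];
      storeIndex++;
    }
  }
  [arr[storeIndex], arr[end]] = [arr[end]!, arr[storeIndex]!]; // place the pivot
  return storeIndex;
}

// Illustrative sample: pivot 5 at the end, 8 in front, smaller elements between.
const sample = [8, 2, 4, 1, 3, 5];
const pos = lomutoWithInvariant(sample);
console.assert(sample[pos] === 5);
console.assert(sample.slice(0, pos).every((x) => x < 5));
```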
Now that we understand how a single partition call rearranges an array around a pivot, we can see how quicksort uses this operation repeatedly to sort an entire array. Each partition places one element — the pivot — in its correct final position and divides the remaining elements into two subproblems. The following example traces the full recursive process, showing how successive partitions progressively sort the array.
Tracing through an example
Let us sort [7, 2, 1, 6, 8, 5, 3, 4] with middle-element pivot selection. Since we have already traced the Lomuto partition scheme step-by-step in the previous section, here we skip the inner details of each partition call and focus on the recursive structure of quicksort — which subarray is partitioned at each step, which pivot is chosen, and how the array evolves toward the sorted order.
In the array snapshots below, elements already in their final sorted positions are shown in bold, and the pivot just placed by the current partition is underlined.
The full recursion tree (each node shows the subarray and the pivot chosen):
[7, 2, 1, 6, 8, 5, 3, 4] pivot 6
          /                  \
[2, 1, 4, 5, 3] pivot 4    [8, 7] pivot 8
     /         \             /
[2, 1, 3] pivot 1   [5]   [7]
        \
     [3, 2] pivot 3
        /
      [2]
Partition 1 — full array, indices 0–7.
The middle index is ⌊(0 + 7)/2⌋ = 3, so the pivot is arr[3] = 6. Partition places 6 at index 5: [2, 1, 4, 5, 3, 6, 8, 7].
The pivot 6 is now in its final position. Two subproblems remain: the left subarray [2, 1, 4, 5, 3] (indices 0–4) and the right subarray [8, 7] (indices 6–7).
Partition 2 — left subarray, indices 0–4: [2, 1, 4, 5, 3].
The middle index is ⌊(0 + 4)/2⌋ = 2, so the pivot is arr[2] = 4. Partition places 4 at index 3: [2, 1, 3, 4, 5, 6, 8, 7].
The pivot 4 is now in its final position. Left subproblem: [2, 1, 3] (indices 0–2). Right subproblem: [5] (index 4) — a single element, already in place and trivially sorted.
Partition 3 — subarray, indices 0–2: [2, 1, 3].
The middle index is ⌊(0 + 2)/2⌋ = 1, so the pivot is arr[1] = 1. Partition places 1 at index 0: [1, 3, 2, 4, 5, 6, 8, 7].
The pivot 1 is now in its final position. Left subproblem: empty. Right subproblem: [3, 2] (indices 1–2).
Partition 4 — subarray, indices 1–2: [3, 2].
The middle index is ⌊(1 + 2)/2⌋ = 1, so the pivot is arr[1] = 3. Partition places 3 at index 2: [1, 2, 3, 4, 5, 6, 8, 7].
The pivot 3 is now in its final position. Left subproblem: [2] (index 1) — a single element, already in place and trivially sorted. Right subproblem: empty. The entire left side of the original array is now sorted: [1, 2, 3, 4, 5, 6].
Partition 5 — right subarray, indices 6–7: [8, 7].
The middle index is ⌊(6 + 7)/2⌋ = 6, so the pivot is arr[6] = 8. Partition places 8 at index 7: [1, 2, 3, 4, 5, 6, 7, 8].
The pivot 8 is now in its final position. Left subproblem: [7] (index 6) — a single element, already in place and trivially sorted. Right subproblem: empty.
All subproblems have reached the base case. Result: [1, 2, 3, 4, 5, 6, 7, 8].
Notice that quicksort performed five partitions to sort eight elements, placing one pivot in its final position each time. The three remaining elements (2, 5, and 7) reached their final positions by ending up in base-case subarrays of size 1.
Implementation
function sort<T>(
  arr: T[],
  start: number,
  end: number,
  comparator: Comparator<T>,
): void {
  if (start < end) {
    const partitionIndex = partition(arr, start, end, comparator)!;
    sort(arr, start, partitionIndex - 1, comparator);
    sort(arr, partitionIndex + 1, end, comparator);
  }
}

export function quickSort<T>(
  elements: T[],
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
  sort(elements, 0, elements.length - 1, comparator);
  return elements;
}
Correctness
Claim. After partition(arr, start, end), the pivot is in its correct final sorted position.
The partition loop moves all elements less than the pivot to positions before storeIndex, and leaves elements greater than or equal to the pivot after storeIndex. The pivot is then placed at storeIndex. Every element before it is smaller, every element after it is at least as large — this is exactly where the pivot belongs in the sorted output.
Claim. Quicksort correctly sorts the array.
We argue by strong induction on the subarray size.
Base case. A subarray of size 0 or 1 is trivially sorted. Quicksort returns it unchanged, so the claim holds.
Inductive step. Assume that quicksort correctly sorts all subarrays of size less than k, for some k ≥ 2. For a subarray of size k, partition places the pivot in its final correct position, with all elements less than the pivot to its left and all elements greater than or equal to the pivot to its right. Both the left and right subarrays have size strictly less than k, so by the inductive hypothesis quicksort correctly sorts each of them. Since every element in the left subarray is less than the pivot, which in turn is at most every element in the right subarray, and both subarrays are themselves sorted, the entire array of size k is sorted.
Complexity analysis
The performance of quicksort depends on the quality of the partition — how evenly the pivot divides the array.
Best case. If the pivot always lands in the middle, each partition splits the array into two roughly equal halves. The recurrence is the same as merge sort's:

T(n) = 2T(n/2) + Θ(n) = Θ(n log n)
Worst case. If the pivot always lands at one extreme (the smallest or largest element), one partition has n − 1 elements and the other has 0. The recurrence becomes:

T(n) = T(n − 1) + Θ(n) = Θ(n²)
This worst case occurs with our middle-element pivot when the input is specially constructed, and with the first-element or last-element pivot strategies on already-sorted or reverse-sorted input.
Average case. On a random permutation with any fixed pivot strategy, the expected running time is O(n log n). Intuitively, even moderately unbalanced partitions (say, 1:9 splits) only add a constant factor to the recursion depth: the shorter side shrinks by a factor of 10, the longer side by a factor of 10/9, and log_{10/9} n = Θ(log n). A careful analysis (presented below) shows that the exact expected number of comparisons is 2(n + 1)H_n − 4n, where H_n = 1 + 1/2 + ⋯ + 1/n is the nth harmonic number; asymptotically this is about 2n ln n ≈ 1.39 n log₂ n. Recall from the merge sort section that merge sort's exact worst-case comparison count is n⌈log₂ n⌉ − 2^⌈log₂ n⌉ + 1, whose leading term is n log₂ n with coefficient exactly 1. The ratio of the two leading terms is 2 ln 2 ≈ 1.39, so quicksort's average case uses only about 39% more comparisons than merge sort's worst case — a remarkably small penalty for an algorithm whose constant-factor advantages (in-place operation, cache friendliness) often make it faster in practice.
A note to the reader. Understanding where the exact formula 2(n + 1)H_n − 4n comes from and why it behaves as roughly 1.39 n log₂ n is not required for the rest of this book — only the conclusion that quicksort makes O(n log n) expected comparisons matters. The derivation below is provided for the sake of completeness. If the algebra feels heavy, feel free to skip ahead to the Space analysis and return to this section later or skip it altogether.
Setting up the recurrence. Suppose we are sorting n elements and each of the n elements is equally likely to end up as the pivot (this is the case for a random permutation with a fixed pivot-selection rule such as "pick the first element" or "pick the middle element"). The partition step compares the pivot to every other element, making exactly n − 1 comparisons. After partitioning, the pivot lands in some position i (where 1 ≤ i ≤ n), leaving a left subarray of size i − 1 and a right subarray of size n − i. Since every position is equally likely, each value of i occurs with probability 1/n. Let C(n) denote the expected number of comparisons to sort n elements. We get:

C(n) = (n − 1) + (1/n) · Σ_{i=1}^{n} [C(i − 1) + C(n − i)]

The term n − 1 counts the comparisons in the partition step. The sum averages over all n equally likely pivot positions: when the pivot lands at position i, we recursively sort subarrays of sizes i − 1 and n − i.
Simplifying. Notice that as i ranges from 1 to n, the value n − i ranges from n − 1 down to 0 — the same set of values as i − 1, in reverse. Therefore Σ_{i=1}^{n} C(n − i) = Σ_{i=1}^{n} C(i − 1), and the recurrence becomes:

C(n) = (n − 1) + (2/n) · Σ_{k=0}^{n−1} C(k)
Solving the recurrence. This is a classic recurrence that is solved by the "multiply both sides by n" trick to eliminate the fraction:

n·C(n) = n(n − 1) + 2 · Σ_{k=0}^{n−1} C(k)

Write the same equation for n − 1:

(n − 1)·C(n − 1) = (n − 1)(n − 2) + 2 · Σ_{k=0}^{n−2} C(k)

Subtracting the second from the first cancels the entire sum except its last term:

n·C(n) − (n − 1)·C(n − 1) = n(n − 1) − (n − 1)(n − 2) + 2·C(n − 1) = 2(n − 1) + 2·C(n − 1)

Collecting C(n − 1) on the right:

n·C(n) = (n + 1)·C(n − 1) + 2(n − 1)

Dividing both sides by n(n + 1):

C(n)/(n + 1) = C(n − 1)/n + 2(n − 1)/(n(n + 1))

Now define D(n) = C(n)/(n + 1). The recurrence becomes D(n) = D(n − 1) + 2(n − 1)/(n(n + 1)), which telescopes — we can unroll it all the way down to the base case D(1) = 0:

D(n) = Σ_{k=2}^{n} 2(k − 1)/(k(k + 1))
Where the harmonic numbers arise. We decompose the summand using partial fractions:

(k − 1)/(k(k + 1)) = 2/(k + 1) − 1/k

Summing from k = 2 to n:

D(n) = Σ_{k=2}^{n} [4/(k + 1) − 2/k] = 4 · Σ_{k=2}^{n} 1/(k + 1) − 2 · Σ_{k=2}^{n} 1/k

Both sums are closely related to the harmonic number H_n = Σ_{k=1}^{n} 1/k, but neither starts at 1 — the first runs from 1/3 to 1/(n + 1) and the second from 1/2 to 1/n. We express each in terms of a harmonic number by adding and subtracting the missing initial terms.

For the first sum, add and subtract the 1 and 1/2 terms to complete the harmonic sum up to n + 1:

Σ_{k=2}^{n} 1/(k + 1) = 1/3 + 1/4 + ⋯ + 1/(n + 1) = H_{n+1} − 1 − 1/2 = H_{n+1} − 3/2

For the second sum, add and subtract the 1 term:

Σ_{k=2}^{n} 1/k = H_n − 1

Substituting back:

D(n) = 4(H_{n+1} − 3/2) − 2(H_n − 1) = 4·H_{n+1} − 2·H_n − 4

Finally, use the identity H_{n+1} = H_n + 1/(n + 1) to write everything in terms of H_n:

D(n) = 2·H_n + 4/(n + 1) − 4

Multiplying back by n + 1 (recall C(n) = (n + 1)·D(n)):

C(n) = 2(n + 1)·H_n − 4n
This is the exact expected number of comparisons. The harmonic number H_n arises naturally because the telescoping recurrence produces a sum of reciprocals — each "level" of the recursion contributes a term proportional to 1/k, and these terms accumulate into a harmonic sum.
Approximating. It is a well-known result from analysis that the harmonic number satisfies H_n = ln n + γ + O(1/n), where γ ≈ 0.577 is the Euler–Mascheroni constant. We omit the proof of this fact and simply use the result. Therefore:

C(n) = 2(n + 1)·H_n − 4n ≈ 2n ln n = (2 ln 2) · n log₂ n ≈ 1.39 · n log₂ n

The factor 2 ln 2 ≈ 1.39 arises from converting between natural logarithm and base-2 logarithm.
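The algebra above can be double-checked numerically: compute C(n) directly from the averaging recurrence and compare it with the closed form 2(n + 1)H_n − 4n. A sketch:

```typescript
// C(n) = (n - 1) + (2/n) * sum_{k=0}^{n-1} C(k), computed bottom-up.
function expectedComparisons(n: number): number {
  const C = [0];
  for (let m = 1; m <= n; m++) {
    let sum = 0;
    for (let k = 0; k < m; k++) sum += C[k]!;
    C.push(m - 1 + (2 / m) * sum);
  }
  return C[n]!;
}

// H_n = 1 + 1/2 + ... + 1/n.
function harmonic(n: number): number {
  let h = 0;
  for (let k = 1; k <= n; k++) h += 1 / k;
  return h;
}

for (const n of [1, 2, 10, 100]) {
  const closed = 2 * (n + 1) * harmonic(n) - 4 * n;
  console.assert(Math.abs(expectedComparisons(n) - closed) < 1e-6, `mismatch at n=${n}`);
}
```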
Space. Quicksort sorts in place. The recursion stack has depth Θ(log n) in the best case but Θ(n) in the worst case. Tail-call optimization or explicit stack management can limit the worst-case stack depth to O(log n) by always recursing on the smaller partition first.
Properties
| Property | Quicksort |
|---|---|
| Worst-case time | Θ(n²) |
| Best-case time | Θ(n log n) |
| Average-case time | Θ(n log n) |
| Space | O(log n) stack on average, O(n) worst case (in-place) |
| Stable | No |
| Adaptive | No |
Why quicksort is fast in practice
Despite its worst case, quicksort is often the fastest comparison sort in practice. Several factors contribute:
- Cache friendliness. Quicksort's partition scan accesses elements sequentially, which is excellent for CPU cache performance. Merge sort accesses two separate subarrays during merge, which can cause more cache misses.
- Small constant factor. Quicksort performs fewer data movements than merge sort — partitioning swaps elements in place, while merging copies elements to an auxiliary array and back.
- No auxiliary memory. Quicksort needs only O(log n) stack space, while merge sort needs Θ(n) auxiliary space. Less memory allocation means less overhead.
- Adaptable. In practice, quicksort implementations use several optimizations:
  - Insertion sort for small subarrays. When a subarray shrinks below a threshold (typically 10–20 elements), the algorithm switches to insertion sort, which has lower overhead for small inputs.
  - Median-of-three pivot selection. Instead of picking a single element (e.g., the middle or first) as the pivot, the algorithm examines three elements — typically the first, middle, and last — and chooses their median as the pivot. For example, given first = 8, middle = 5, last = 2, the median is 5. Because the median of three samples is far less likely to be an extreme value than a single arbitrary pick, this strategy produces more balanced partitions and dramatically reduces the probability of hitting the worst case — particularly on already-sorted or reverse-sorted inputs, which are the classic worst cases for naive pivot strategies.
  - Three-way partitioning (Dutch National Flag). Standard Lomuto or Hoare partitioning splits the array into two regions: elements less than the pivot and elements greater than or equal to the pivot. When many elements are equal to the pivot, those duplicates still end up in recursive calls even though they are already in their correct relative position. Three-way partitioning instead splits the array into three regions — elements less than the pivot, elements equal to the pivot, and elements greater than the pivot. The equal-to-pivot region is excluded from both recursive calls, since those elements are already in their final positions. This makes a large difference when the input has many duplicate values: in the extreme case of all-equal elements, a single partition call finishes the entire array in Θ(n) time, whereas standard two-way partitioning would degrade to Θ(n²).
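The median-of-three rule mentioned above can be sketched as a small helper returning an index suitable for a pivotPos-style parameter; the helper name is illustrative, not from the book's repository:

```typescript
// Hypothetical helper: index of the median of arr[start], arr[middle], arr[end].
function medianOfThreeIndex<T>(
  arr: T[],
  start: number,
  end: number,
  cmp: (a: T, b: T) => number,
): number {
  const middle = Math.floor((start + end) / 2);
  // Order the three candidate indices by the values they point at;
  // the middle one after ordering is the median.
  const candidates = [start, middle, end].sort((i, j) => cmp(arr[i]!, arr[j]!));
  return candidates[1]!;
}

// first = 8, middle = 5, last = 2: the median is 5, at index 1.
console.assert(medianOfThreeIndex([8, 5, 2], 0, 2, (a, b) => a - b) === 1);
```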
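Three-way partitioning itself can be sketched as follows; the function name and the return convention (a pair of boundary indices) are illustrative, not from the book's repository:

```typescript
// Sketch of Dutch National Flag partitioning. Returns [lt, gt] such that
// arr[start..lt-1] < pivot, arr[lt..gt] == pivot, arr[gt+1..end] > pivot.
function threeWayPartition<T>(
  arr: T[],
  start: number,
  end: number,
  cmp: (a: T, b: T) => number,
): [number, number] {
  const pivot = arr[Math.floor((start + end) / 2)]!;
  let lt = start; // next slot for a "< pivot" element
  let gt = end;   // next slot for a "> pivot" element
  let i = start;  // scan pointer
  while (i <= gt) {
    const c = cmp(arr[i]!, pivot);
    if (c < 0) {
      [arr[lt], arr[i]] = [arr[i]!, arr[lt]!];
      lt++;
      i++;
    } else if (c > 0) {
      [arr[i], arr[gt]] = [arr[gt]!, arr[i]!];
      gt--; // do not advance i: the swapped-in element is unexamined
    } else {
      i++; // equal to pivot: leave it in the middle region
    }
  }
  return [lt, gt];
}

const xs = [1, 5, 7, 5, 2, 5, 5]; // middle element 5 is the pivot
const [lt, gt] = threeWayPartition(xs, 0, xs.length - 1, (a, b) => a - b);
console.assert(xs.slice(0, lt).every((x) => x < 5));
console.assert(xs.slice(lt, gt + 1).every((x) => x === 5));
console.assert(xs.slice(gt + 1).every((x) => x > 5));
```

Only the ranges [start, lt − 1] and [gt + 1, end] would be passed to the recursive calls, which is what excludes the duplicates.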
Randomized quicksort
Deterministic quicksort's performance depends on the pivot choice. A fixed strategy — first element, last element, middle element — can always be defeated by a carefully constructed input that forces Θ(n²) behavior. Randomized quicksort eliminates this vulnerability by choosing the pivot uniformly at random.
Motivation
Consider a sorting library used by millions of applications. An adversary who knows the pivot-selection strategy can craft inputs that trigger worst-case behavior, leading to denial-of-service attacks. By choosing the pivot randomly, we ensure that no input is consistently bad — the algorithm's expected performance is O(n log n) for every input, regardless of how it was constructed.
This is a powerful guarantee. It shifts the source of randomness from the input (which an adversary controls) to the algorithm (which the adversary cannot predict).
The algorithm
Randomized quicksort is identical to standard quicksort, except that the partition step selects a random element as the pivot instead of a fixed one. The only change is the pivot index computation — everything else (the Lomuto scan, the swap logic, the recursive structure) remains exactly the same:
function randomizedPartition<T>(
  arr: T[],
  start: number,
  end: number,
  comparator: Comparator<T>,
): number {
  // The only change: pick a random pivot instead of the middle element
  const randomIndex = start + Math.floor(Math.random() * (end - start + 1));
  // ... rest identical to partition
}
The line Math.floor(Math.random() * (end - start + 1)) replaces Math.floor((start + end) / 2) — a uniform random index in [start, end] instead of the fixed middle index. The Lomuto scan, the swap into storeIndex, and the recursive sort driver all remain unchanged.
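Putting the pieces together, here is a self-contained sketch of randomized quicksort (a condensed Lomuto partition with the random pivot choice, plus the usual recursive driver):

```typescript
type Comparator<T> = (a: T, b: T) => number;

function randomizedPartition<T>(arr: T[], start: number, end: number, cmp: Comparator<T>): number {
  // Uniform random pivot index in [start, end], parked at the end.
  const randomIndex = start + Math.floor(Math.random() * (end - start + 1));
  [arr[randomIndex], arr[end]] = [arr[end]!, arr[randomIndex]!];
  let storeIndex = start;
  for (let i = start; i < end; i++) {
    if (cmp(arr[i]!, arr[end]!) < 0) {
      [arr[storeIndex], arr[i]] = [arr[i]!, arr[storeIndex]!];
      storeIndex++;
    }
  }
  [arr[storeIndex], arr[end]] = [arr[end]!, arr[storeIndex]!];
  return storeIndex;
}

function randomizedQuickSort<T>(arr: T[], cmp: Comparator<T>): T[] {
  const sortRange = (start: number, end: number): void => {
    if (start < end) {
      const p = randomizedPartition(arr, start, end, cmp);
      sortRange(start, p - 1);
      sortRange(p + 1, end);
    }
  };
  sortRange(0, arr.length - 1);
  return arr;
}

// Every run sorts correctly, whatever pivots the randomness happens to pick:
console.assert(
  randomizedQuickSort([7, 2, 1, 6, 8, 5, 3, 4], (a, b) => a - b).join() === "1,2,3,4,5,6,7,8",
);
```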
Expected running time
Theorem 5.1. The expected number of comparisons made by randomized quicksort on any input of size n is at most 2n·H_n = O(n log n), where H_n is the nth harmonic number.
Proof. Let z_1 < z_2 < ⋯ < z_n be the elements of the input in sorted order. Define the indicator random variable X_{ij} as 1 if z_i and z_j are ever compared during the execution, and 0 otherwise.
The total number of comparisons is:

X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} X_{ij}
Why does this sum count all comparisons? The double sum iterates over every pair of distinct elements exactly once: the outer index i ranges from 1 to n − 1 and the inner index j ranges from i + 1 to n, so we visit exactly the pairs (i, j) with i < j. (The constraint j > i avoids counting each pair twice; i stops at n − 1 because the smallest valid pair has j = i + 1, which requires i ≤ n − 1.)
For each pair, X_{ij} contributes 1 if those two elements are compared during the execution, and 0 if they are not. The sum therefore counts the total number of pairs that are compared — but is this the same as the total number of comparisons? It is, because each pair is compared at most once. To see why: every comparison in quicksort happens during a partition step, where the pivot is compared against each other element in the current subarray. A comparison between z_i and z_j can therefore only occur when one of them is the pivot. Once an element serves as pivot, it is placed in its final position and excluded from all future recursive calls — so that element is never compared with anything again, and in particular z_i and z_j can never be compared a second time.
Since every comparison corresponds to exactly one pair and every pair is compared at most once, the double sum counts exactly the total number of comparisons.
Since the algorithm is randomized, X is a random variable — it may take different values on different runs of the algorithm (due to different random pivot choices). We want to compute E[X], the expected value (long-run average) of the total number of comparisons. By linearity of expectation — the fact that the expected value of a sum equals the sum of the expected values, regardless of dependencies — we can pull the expectation inside the double sum:

E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[X_{ij}]

Now, X_{ij} is an indicator variable: it equals either 0 or 1. A standard property of indicator variables is that the expected value of an indicator equals the probability that it is 1: E[X_{ij}] = Pr[X_{ij} = 1]. This gives us:

E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Pr[z_i and z_j are compared]
It remains to compute these probabilities. Fix a pair z_i and z_j (with i < j) and ask: under what circumstances are they compared?
Recall that the only comparisons quicksort makes are between a pivot and the other elements in its subarray. So z_i and z_j can only be compared when one of them is a pivot and both are in the same subarray for which the partition function is called. The question is: do they ever find themselves in this situation, or are they separated into different subarrays before either becomes a pivot?
The answer depends on the set of elements between them in sorted order: Z_{ij} = {z_i, z_{i+1}, …, z_j}. Consider what happens when some element z_k from this set is chosen as pivot for the first time:
- If i < k < j (an element strictly between them): the pivot satisfies z_i < z_k < z_j, so z_i goes to the left partition and z_j goes to the right partition. They are now in different subarrays and will never be in the same subarray again — so they will never be compared. (Note that during this partition step, both z_i and z_j are compared to the pivot z_k, but not to each other.)
- If k = i or k = j (one of the endpoints): that endpoint is the pivot and is compared to every other element in its subarray — including the other endpoint. So z_i and z_j are compared.

What about elements outside this interval — pivots z_k with k < i or k > j? These cannot separate z_i and z_j: if k < i, then z_k < z_i < z_j, so both z_i and z_j are greater than the pivot and land in the same (right) partition. If k > j, both are less than the pivot and land in the same (left) partition. Either way, z_i and z_j remain together, and the question whether they will be compared is deferred to a later pivot choice and partition call.
Therefore, the fate of the pair (whether z_i and z_j are ever compared during the sort) is determined entirely by which element of Z_{ij} is the first to be chosen as a pivot. If it is z_i or z_j, they are compared; if it is any of the elements strictly between them, they are separated without being compared.
Since pivots are chosen uniformly at random, each of the j − i + 1 elements in Z_{ij} is equally likely to be the first one selected. Two of these choices (namely z_i and z_j) lead to a comparison, so:

Pr[z_i and z_j are compared] = 2/(j − i + 1)

Therefore:

E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1) = Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 2/k < Σ_{i=1}^{n−1} 2·H_n < 2n·H_n = O(n log n),

where H_n = Σ_{k=1}^{n} 1/k ≈ ln n is the nth harmonic number.
This expected bound holds for every input — it is not an average over random inputs. Even on an adversarial input, randomized quicksort makes O(n log n) expected comparisons.
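The double sum from the proof can be evaluated directly and compared against the 2n·H_n bound; as it happens, its exact value coincides with the 2(n + 1)H_n − 4n formula from the average-case analysis. A sketch:

```typescript
// sum over all pairs i < j of Pr[z_i and z_j are compared] = 2/(j - i + 1).
function expectedComparisonSum(n: number): number {
  let sum = 0;
  for (let i = 1; i < n; i++) {
    for (let j = i + 1; j <= n; j++) {
      sum += 2 / (j - i + 1);
    }
  }
  return sum;
}

function harmonic(n: number): number {
  let h = 0;
  for (let k = 1; k <= n; k++) h += 1 / k;
  return h;
}

for (const n of [2, 10, 100]) {
  const h = harmonic(n);
  // The bound from Theorem 5.1 ...
  console.assert(expectedComparisonSum(n) <= 2 * n * h);
  // ... and agreement with the exact average-case formula 2(n+1)H_n - 4n.
  console.assert(Math.abs(expectedComparisonSum(n) - (2 * (n + 1) * h - 4 * n)) < 1e-6);
}
```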
Worst case
The worst case of Θ(n²) still exists in theory: if the random choices happen to always pick the smallest or largest element as pivot. However, the probability of this occurring is astronomically small. For any realistic input size, the probability of consistently terrible pivots through all recursive calls is effectively zero.
Properties
| Property | Randomized quicksort |
|---|---|
| Worst-case time | Θ(n²) (extremely unlikely) |
| Expected time | Θ(n log n) for all inputs |
| Space | O(log n) expected stack depth |
| Stable | No |
Note: the expected time is Θ(n log n), not merely O(n log n). The upper bound follows from Theorem 5.1 above. The lower bound follows from the comparison-based sorting lower bound proved in Chapter 4: any comparison-based sorting algorithm — including randomized ones — must make Ω(n log n) comparisons in expectation, since for any fixed sequence of random choices the algorithm is deterministic and the information-theoretic lower bound applies.
Heapsort
Heapsort uses a binary heap to sort an array in place. A binary heap is an array-based data structure that maintains a partial ordering — not fully sorted, but structured enough to find the maximum (or minimum) in O(1) time and restore order in O(log n) time after a removal.
The binary heap
A max-heap is a complete binary tree stored in an array where every node's value is greater than or equal to its children's values. For a node at index i (zero-based):
- Left child: 2i + 1
- Right child: 2i + 2
- Parent: ⌊(i − 1) / 2⌋
The max-heap property ensures that the root (index 0) holds the largest element.
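The index arithmetic can be captured in three one-line helpers (a sketch; the book's heap chapter may package these differently):

```typescript
// Zero-based array-heap index arithmetic.
const leftChild = (i: number): number => 2 * i + 1;
const rightChild = (i: number): number => 2 * i + 2;
const parent = (i: number): number => Math.floor((i - 1) / 2);

// In the max-heap [9, 7, 8, 3, 5], node 7 (index 1) has children 3 and 5:
const heap = [9, 7, 8, 3, 5];
console.assert(heap[leftChild(1)] === 3 && heap[rightChild(1)] === 5);
console.assert(parent(3) === 1 && parent(4) === 1);
console.assert(heap[0] === Math.max(...heap)); // the root holds the largest element
```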
The algorithm
Heapsort works in two phases:
- Build a max-heap from the input array. After this step, the array satisfies the max-heap property: every node's value is greater than or equal to its children's values, and in particular the largest element is at the root (index 0).
- Extract the maximum repeatedly. For $i = n - 1$ down to $1$:
- Swap the root (the current maximum) with element $i$, placing the maximum in its final sorted position.
- Reduce the heap size by 1 — element $i$ is now in its final position and excluded from the heap.
- Call heapify on the new root to restore the max-heap property among the remaining elements.
Notice that unlike merge sort and quicksort, heapsort does not use divide-and-conquer in the traditional recursive sense. Instead, it uses the heap data structure to efficiently find and remove the maximum element at each step. The build-heap phase (step 1) does the structural work of organizing the array, while the extraction loop (step 2) does the sorting work by repeatedly peeling off the maximum. The sorted elements accumulate at the end of the array while the heap shrinks from the front — the algorithm sorts in place with $O(1)$ auxiliary space.
The sections below describe the heap data structure, the heapify and build-heap operations that support it, and the full heapsort implementation.
Heapify
The heapify operation takes a node whose children are both valid max-heaps but whose own value may violate the heap property, and "sinks" it down to restore the property:
```typescript
function heapify<T>(
  arr: T[],
  heapSize: number,
  index: number,
  comparator: Comparator<T>,
): void {
  const left = 2 * index + 1;
  const right = 2 * index + 2;
  let indexOfMaximum = index;
  for (const subTreeRootIndex of [left, right]) {
    if (
      subTreeRootIndex < heapSize &&
      comparator(arr[subTreeRootIndex]!, arr[indexOfMaximum]!) > 0
    ) {
      indexOfMaximum = subTreeRootIndex;
    }
  }
  if (indexOfMaximum !== index) {
    const temp = arr[index]!;
    arr[index] = arr[indexOfMaximum]!;
    arr[indexOfMaximum] = temp;
    heapify(arr, heapSize, indexOfMaximum, comparator);
  }
}
```
The element at index $i$ is compared with its children. If a child is larger, the element is swapped with the largest child, and the process repeats in that child's subtree. Each step moves down one level, so heapify runs in $O(\log n)$ time (the height of the tree).
Tracing the heapify procedure
To understand how heapify works step by step, let us trace through a small example.
Example: call heapify(arr, 7, 0) on the array $[2, 7, 6, 5, 4, 1, 3]$. The root (value 2) violates the max-heap property, but both of its subtrees are valid max-heaps:
2 ← violates heap property
/ \
7 6
/ \ / \
5 4 1 3
At each step, we compare the current node with its children, find the largest of the three, and swap if the current node is not the largest. If a swap occurs, we recurse into the affected subtree because the swapped-down value may violate the heap property there.
Step 1 (index 0): Compare $2$ with its children $7$ and $6$. The largest is 7 at index 1. Since $7 > 2$, swap $2$ and $7$:
7
/ \
2 6
/ \ / \
5 4 1 3
Array: $[7, 2, 6, 5, 4, 1, 3]$. The right subtree (rooted at 6) is unaffected. Recurse into the left subtree at index 1, where the value 2 may now violate the heap property.
Step 2 (index 1): Compare $2$ with its children $5$ and $4$. The largest is 5 at index 3. Since $5 > 2$, swap $2$ and $5$:
7
/ \
5 6
/ \ / \
2 4 1 3
Array: $[7, 5, 6, 2, 4, 1, 3]$. Recurse into index 3.
Step 3 (index 3): Node $2$ has no children (left child index $2 \cdot 3 + 1 = 7 \ge$ heapSize). It is a leaf — stop.
The element 2 has "sunk" from the root to a leaf in two swaps. The result is a valid max-heap: every node is greater than or equal to its children.
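The trace can be checked with a self-contained numeric version of heapify (the generic, comparator-based version appears above):

```typescript
// Self-contained numeric max-heapify: sinks the element at `index`
// until both children are smaller or it reaches a leaf.
function heapifyNum(arr: number[], heapSize: number, index: number): void {
  const left = 2 * index + 1;
  const right = 2 * index + 2;
  let largest = index;
  if (left < heapSize && arr[left]! > arr[largest]!) largest = left;
  if (right < heapSize && arr[right]! > arr[largest]!) largest = right;
  if (largest !== index) {
    [arr[index], arr[largest]] = [arr[largest]!, arr[index]!];
    heapifyNum(arr, heapSize, largest);
  }
}

const a = [2, 7, 6, 5, 4, 1, 3];
heapifyNum(a, 7, 0);
console.log(a); // [7, 5, 6, 2, 4, 1, 3]
```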
Building a heap
We can convert an unordered array into a max-heap by calling heapify on every non-leaf node, bottom-up:
```typescript
function buildHeap<T>(
  arr: T[],
  heapSize: number,
  comparator: Comparator<T>,
): void {
  const lastNonLeafIndex = Math.floor((heapSize + 1) / 2) - 1;
  for (let i = lastNonLeafIndex; i >= 0; i--) {
    heapify(arr, heapSize, i, comparator);
  }
}
```
Why Math.floor((heapSize + 1) / 2) - 1? The last non-leaf is the parent of the last element in the array. Since the last element has index $n - 1$, its parent is at index $\lfloor (n - 2) / 2 \rfloor$. Every node after this index has no children — it is a leaf. For example, in a heap of size 6, the last element is at index 5, its parent is at $\lfloor 4 / 2 \rfloor = 2$, and indices 3, 4, 5 are all leaves.
The code's expression $\lfloor (n + 1) / 2 \rfloor - 1$ equals $\lfloor (n - 2) / 2 \rfloor$ when $n$ is even. When $n$ is odd, it yields one index higher — a leaf node — but this is harmless: heapify on a leaf finds no children and returns immediately. The more direct formula Math.floor((heapSize - 2) / 2) gives the exact last non-leaf for all $n \ge 2$, but it computes heapSize - 2, which underflows when heapSize is 0 or 1 in languages with unsigned integers. The (heapSize + 1) / 2 - 1 form avoids negative intermediate values, making it portable and safe regardless of the integer type. (When heapSize is 0, the expression evaluates to $\lfloor 1/2 \rfloor - 1 = -1$, so the loop condition i >= 0 is immediately false and the loop body never executes — correctly treating an empty array as a trivial heap.)
Why bottom-up? The leaves (the bottom half of the array) are trivially valid heaps. By processing nodes from the bottom up, each call to heapify encounters a node whose children are already valid heaps — exactly the precondition heapify requires.
It turns out that buildHeap built this way runs in $O(n)$ time.
Why $O(n)$ and not $O(n \log n)$? A naive analysis says: $n$ calls to heapify, each costing $O(\log n)$, giving $O(n \log n)$. But this bound is loose because it treats every node as if it could sink all the way to the bottom. In reality, most nodes are near the bottom and sink only a few levels. To get the true cost, we group nodes by their height in the tree and sum the work done at each height.
How many nodes are at each height? Define the height of a node as the number of edges on the longest downward path to a leaf. Leaves have height 0, their parents have height 1, and so on up to the root at height $\lfloor \log_2 n \rfloor$. In a complete binary tree with $n$ nodes, the number of nodes at height $h$ is at most $\lceil n / 2^{h+1} \rceil$. Intuitively, each successive level going up has roughly half as many nodes as the level below: about $n/2$ leaves (height 0), $n/4$ nodes at height 1, $n/8$ at height 2, and so on.
How much work does heapify do on a node at height $h$? Each call to heapify sinks a node by at most $h$ levels (one comparison-and-swap per level), so the cost is $O(h)$.
The total cost. Multiplying the number of nodes at height $h$ by the cost per node and summing over all heights gives:

$$\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil \cdot O(h)$$

Dropping the ceiling and pulling out $n$, this is at most:

$$O\!\left( n \sum_{h=0}^{\lfloor \log_2 n \rfloor} \frac{h}{2^{h+1}} \right)$$

Since every term is positive, we can safely extend the upper limit to infinity (which only increases the sum), obtaining:

$$O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h+1}} \right)$$

Evaluating the series $\sum_{h=0}^{\infty} \frac{h}{2^h}$. It is a well-known result from analysis that this series converges to exactly 2 (it can be derived by differentiating the geometric series and substituting $x = \frac{1}{2}$; we omit the proof here). So:

$$\sum_{h=0}^{\infty} \frac{h}{2^{h+1}} = \frac{1}{2} \sum_{h=0}^{\infty} \frac{h}{2^h} = 1$$

Therefore:

$$O(n \cdot 1) = O(n)$$

The key insight is that the work pyramid is inverted: the many nodes near the bottom of the tree each sink at most a few levels, while the few nodes near the top can sink many levels. Because the heavy per-node work is concentrated at the top where there are very few nodes, the total work sums to $O(n)$ rather than to $O(n \log n)$.
Tracing the buildHeap procedure
Let us trace buildHeap on the array $[3, 1, 6, 5, 2, 4]$. Since we have already traced the heapify procedure step-by-step in the previous section, here we treat each heapify call as a single step and focus on the overall bottom-up process.
The initial array as a tree:
3
/ \
1 6
/ \ /
5 2 4
The array has $n = 6$ elements. The last non-leaf index is $\lfloor (6 + 1) / 2 \rfloor - 1 = 2$, so we call heapify on indices 2, 1, and 0, in that order.
heapify(arr, 6, 2) — node at index 2 (value 6). Its only child is $4$ (index 5). Since $6 > 4$, the heap property is already satisfied. No change.
3
/ \
1 6
/ \ /
5 2 4
Array: $[3, 1, 6, 5, 2, 4]$ (unchanged).
heapify(arr, 6, 1) — node at index 1 (value 1). Children are $5$ (index 3) and $2$ (index 4). The largest child is 5 at index 3. Since $5 > 1$, heapify swaps 1 and 5. The element 1 sinks to index 3, which is a leaf — no further swaps.
3
/ \
5 6
/ \ /
1 2 4
Array: $[3, 5, 6, 1, 2, 4]$.
heapify(arr, 6, 0) — node at index 0 (value 3). Children are $5$ (index 1) and $6$ (index 2). The largest child is 6 at index 2. Since $6 > 3$, heapify swaps 3 and 6. The element 3 sinks to index 2, where its only child is $4$ (index 5). Since $4 > 3$, heapify swaps again. Now 3 is at index 5, a leaf — done.
6
/ \
5 4
/ \ /
1 2 3
Array: $[6, 5, 4, 1, 2, 3]$.
The result is a valid max-heap. Notice the bottom-up order: by the time we process a node, all nodes below it have already been heapified, so both of its subtrees are valid max-heaps — exactly the precondition that heapify requires.
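The trace can be reproduced with a self-contained numeric sketch (heapify is repeated from the earlier example so this block runs on its own):

```typescript
// Numeric max-heapify, repeated so this block is self-contained.
function heapifyNum(arr: number[], heapSize: number, index: number): void {
  const left = 2 * index + 1;
  const right = 2 * index + 2;
  let largest = index;
  if (left < heapSize && arr[left]! > arr[largest]!) largest = left;
  if (right < heapSize && arr[right]! > arr[largest]!) largest = right;
  if (largest !== index) {
    [arr[index], arr[largest]] = [arr[largest]!, arr[index]!];
    heapifyNum(arr, heapSize, largest);
  }
}

// buildHeap: heapify every non-leaf node, bottom-up.
function buildHeapNum(arr: number[]): void {
  const lastNonLeaf = Math.floor((arr.length + 1) / 2) - 1;
  for (let i = lastNonLeaf; i >= 0; i--) {
    heapifyNum(arr, arr.length, i);
  }
}

const a = [3, 1, 6, 5, 2, 4];
buildHeapNum(a);
console.log(a); // [6, 5, 4, 1, 2, 3]
```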
Now that we have the necessary building blocks, heapify and buildHeap, we can proceed to the implementation of the sorting algorithm itself.
Implementation
```typescript
export function heapSort<T>(
  elements: T[],
  comparator: Comparator<T> = numberComparator as Comparator<T>,
): T[] {
  let heapSize = elements.length;
  buildHeap(elements, heapSize, comparator);
  // Extract-max loop: repeatedly swap the root (maximum) with the last
  // heap element, shrink the heap, and restore the heap property.
  for (let i = elements.length - 1; i > 0; i--) {
    const temp = elements[0]!;
    elements[0] = elements[i]!;
    elements[i] = temp;
    heapSize--;
    heapify(elements, heapSize, 0, comparator);
  }
  return elements;
}
```
Tracing through an example
Let us sort $[4, 10, 3, 5, 1]$.
Build max-heap:
Starting array (as a tree):
4
/ \
10 3
/ \
5 1
Process non-leaf nodes bottom-up. Node at index 1 (value 10): children are 5, 1. 10 is already larger — no change. Node at index 0 (value 4): children are 10, 3. Swap 4 with 10. Then heapify the subtree: 4 vs children 5, 1 → swap with 5.
10 10
/ \ / \
4 3 → 5 3
/ \ / \
5 1 4 1
After calling buildHeap we have the following max-heap: $[10, 5, 3, 4, 1]$.
Extract-max loop:
| # | Swap | After swap | Heap after heapify |
|---|---|---|---|
| 1 | $10 \leftrightarrow 1$ | $[1, 5, 3, 4, 10]$ | $[5, 4, 3, 1]$ |
| 2 | $5 \leftrightarrow 1$ | $[1, 4, 3, 5, 10]$ | $[4, 1, 3]$ |
| 3 | $4 \leftrightarrow 3$ | $[3, 1, 4, 5, 10]$ | $[3, 1]$ |
| 4 | $3 \leftrightarrow 1$ | $[1, 3, 4, 5, 10]$ | $[1]$ |
Result: $[1, 3, 4, 5, 10]$.
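The whole algorithm can be checked end-to-end with a condensed, numbers-only sketch of heapsort (the generic implementation appears above):

```typescript
// Condensed numeric heapsort: buildHeap followed by the extract-max loop.
function heapSortNum(a: number[]): number[] {
  const heapify = (heapSize: number, i: number): void => {
    const l = 2 * i + 1;
    const r = 2 * i + 2;
    let largest = i;
    if (l < heapSize && a[l]! > a[largest]!) largest = l;
    if (r < heapSize && a[r]! > a[largest]!) largest = r;
    if (largest !== i) {
      [a[i], a[largest]] = [a[largest]!, a[i]!];
      heapify(heapSize, largest);
    }
  };
  // Phase 1: build a max-heap, bottom-up.
  for (let i = Math.floor((a.length + 1) / 2) - 1; i >= 0; i--) {
    heapify(a.length, i);
  }
  // Phase 2: repeatedly move the maximum to the end and shrink the heap.
  for (let i = a.length - 1; i > 0; i--) {
    [a[0], a[i]] = [a[i]!, a[0]!];
    heapify(i, 0);
  }
  return a;
}

console.log(heapSortNum([4, 10, 3, 5, 1])); // [1, 3, 4, 5, 10]
```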
Correctness
Invariant: The extract-max loop variable i starts at $n - 1$ and decreases to $1$. At the start of the iteration with loop variable value $i$:
- $A[0 \ldots i]$ is a max-heap containing the $i + 1$ smallest elements.
- $A[i+1 \ldots n-1]$ is the sorted suffix: it contains the $n - i - 1$ largest elements, in sorted order (when $i = n - 1$, this range is empty — no elements have been sorted yet).
Initialization ($i = n - 1$). After buildHeap, the entire array is a max-heap and the sorted suffix is empty.
Maintenance. The root $A[0]$ is the largest element in the heap $A[0 \ldots i]$. Swapping it with $A[i]$ places it in the correct position (it is the $(n - i)$th largest overall). Reducing the heap size and calling heapify restores the heap property on $A[0 \ldots i - 1]$.
Termination ($i = 0$). When i becomes 0, the loop exits. At this point the invariant tells us that $A[1 \ldots n-1]$ contains the $n - 1$ largest elements in sorted order, and $A[0]$ is a trivial one-element max-heap holding the minimum. The array is sorted.
Complexity analysis
Time. Building the heap takes $O(n)$. The extract-max loop runs $n - 1$ times, each iteration performing a swap ($O(1)$) and a heapify costing $O(\log n)$. Total:

$$O(n) + (n - 1) \cdot O(\log n) = O(n \log n)$$
This holds for all inputs — heapsort is not adaptive.
Space. Heapsort sorts in place. The only auxiliary space is $O(1)$ for temporary variables.
Properties
| Property | Heapsort |
|---|---|
| Worst-case time | $O(n \log n)$ |
| Best-case time | $O(n \log n)$ |
| Average-case time | $O(n \log n)$ |
| Space | $O(1)$ (in-place) |
| Stable | No |
| Adaptive | No |
Comparison of efficient sorting algorithms
We have now studied three sorting algorithms. Let us compare them across the dimensions that matter in practice.
Time complexity
| Algorithm | Best case | Average case | Worst case |
|---|---|---|---|
| Merge sort | |||
| Quicksort | |||
| Randomized quicksort | expected | unlikely | |
| Heapsort |
Merge sort and heapsort provide guaranteed performance. Quicksort has a theoretical worst case, but randomization makes this practically irrelevant. In terms of constant factors, quicksort (including randomized) typically makes the fewest comparisons on average — about versus merge sort's comparisons, but with lower overhead per comparison.
Space complexity
| Algorithm | Auxiliary space |
|---|---|
| Merge sort | $O(n)$ |
| Quicksort | $O(\log n)$ stack |
| Randomized quicksort | $O(\log n)$ expected stack |
| Heapsort | $O(1)$ |
Heapsort is the clear winner for space: it sorts truly in place with $O(1)$ extra memory. Quicksort needs $O(\log n)$ stack space (or $O(n)$ in the worst case without tail-call optimization). Merge sort needs $O(n)$ for the auxiliary merge array.
Stability
| Algorithm | Stable? |
|---|---|
| Merge sort | Yes |
| Quicksort | No |
| Randomized quicksort | No |
| Heapsort | No |
Merge sort is the only stable algorithm among the three. This makes it the default choice when stability is required — for example, in database sorting or when composing sorts on multiple keys.
Cache performance
Quicksort has the best cache performance among the three. Its partition scan accesses elements sequentially, making excellent use of CPU cache lines. Merge sort accesses two separate subarrays during merge, which can cause cache misses when the subarrays are far apart in memory. Heapsort has the worst cache performance: heap navigation accesses elements at indices $i$, $2i + 1$, and $2i + 2$, which jump around the array unpredictably for large heaps.
Practical recommendations
-
General-purpose sorting: Randomized quicksort (or a tuned variant) is the standard choice. Many standard library sort functions are based on quicksort variants, though some have switched to hybrids (V8's Array.prototype.sort, for instance, uses Timsort).
-
Guaranteed worst-case performance: Use merge sort or heapsort. Merge sort is preferred when stability is needed; heapsort when memory is constrained.
-
Small arrays: Insertion sort (from Chapter 4) outperforms all of the above for small arrays (typically a few dozen elements or fewer) due to its minimal overhead. Practical quicksort implementations switch to insertion sort for small subarrays.
-
Hybrid algorithms: The best practical sorts combine multiple algorithms. Timsort (Python, Java) combines merge sort with insertion sort. Introsort (C++ STL) starts with quicksort, switches to heapsort if the recursion depth exceeds $2 \log_2 n$ (to guarantee an $O(n \log n)$ worst case), and uses insertion sort for small subarrays.
Summary
In this chapter we studied three efficient comparison-based sorting algorithms:
-
Merge sort divides the array in half, sorts each half recursively, and merges the sorted halves. It runs in $O(n \log n)$ time in all cases but requires $O(n)$ auxiliary space. It is stable.
-
Quicksort partitions the array around a pivot, placing it in its correct position, then recursively sorts the two partitions. It runs in $O(n \log n)$ average time with excellent cache performance, but has $O(n^2)$ worst-case time with a fixed pivot strategy. Randomized quicksort eliminates this vulnerability to adversarial inputs by choosing pivots uniformly at random, achieving $O(n \log n)$ expected time on every input.
-
Heapsort builds a max-heap and repeatedly extracts the maximum to build the sorted array from right to left. It runs in $O(n \log n)$ time in all cases and uses $O(1)$ auxiliary space, but has poor cache performance.
All three algorithms achieve the $\Omega(n \log n)$ lower bound proved in Chapter 4. In the next chapter, we explore a different question: can we sort faster than $\Theta(n \log n)$ by using information beyond pairwise comparisons?
Exercises
Exercise 5.1. Trace through the merge sort algorithm on the input . Show the state of the array after each merge operation in the bottom-up approach.
Exercise 5.2. Merge sort's merge procedure uses $O(n)$ auxiliary space. Can we merge two sorted subarrays in place (using $O(1)$ extra space) while maintaining $O(n)$ time? Explain why this is difficult. (Hint: in-place merge algorithms exist, but they either sacrifice the linear time bound or are extremely complex.)
Exercise 5.3. Consider quicksort with the "first element" pivot strategy. Give an input of size $n$ that causes $\Theta(n^2)$ behavior. Then give a different input that also causes $\Theta(n^2)$ behavior. What input causes the worst case for the "middle element" strategy used in our implementation?
Exercise 5.4. Prove that the expected recursion depth of randomized quicksort is $O(\log n)$. (Hint: at each level, with constant probability the pivot falls in the middle half of the array. How many levels until the subproblem size drops to 1?)
Exercise 5.5. Heapsort is not stable. Give a concrete example of an array with duplicate values where heapsort changes the relative order of equal elements. Why does the "swap root with last element" step destroy stability?
Linear-Time Sorting and Selection
In Chapter 4 we proved a lower bound: every comparison-based sorting algorithm must make $\Omega(n \log n)$ comparisons in the worst case. The efficient algorithms of Chapter 5 — merge sort, quicksort, heapsort — all meet this bound, and none can beat it. But what if we are willing to go beyond pairwise comparisons? If we know something about the structure of the values — for instance, that they are integers in a bounded range — then it turns out that we can exploit that structure to sort in linear time. In this chapter we study three such algorithms: counting sort, radix sort, and bucket sort. We also turn to a related problem — selection — and present two algorithms that find the $k$th smallest element in $O(n)$ time without sorting: randomized quickselect and the deterministic median-of-medians algorithm.
Breaking the comparison lower bound
The lower bound from Chapter 4 applies to comparison-based sorting: algorithms that learn about the input only by comparing pairs of elements. The decision-tree argument shows that any comparison-based algorithm must traverse a binary tree of height at least $\log_2(n!) = \Omega(n \log n)$, because there are $n!$ possible permutations and each leaf of the decision tree corresponds to one permutation.
This lower bound, however, does not apply if we use operations other than comparisons and know more about the values in the underlying array. For example, if the values are integers, we can look at their individual digits. And if the values are bounded, we can use them as array indices. These non-comparison-based operations give us additional information that comparison-based algorithms cannot access, and this is what allows us to sort faster.
The obvious trade-off we are making here is generality: comparison-based sorting works for any totally ordered type, while the algorithms in this chapter require specific value structure (integers, bounded range, uniform distribution).
Counting sort
Counting sort is the simplest linear-time sorting algorithm. It works for non-negative integer values in a known range and sorts by counting how many times each value appears.
The algorithm
- Create an array `counts` of size $k + 1$, where $k$ is the maximum value, initialized to zeros.
- For each element in the input, increment `counts[element]`.
- Compute prefix sums: replace each `counts[i]` with the sum of all counts for values $\le i$. After this step, `counts[i]` tells us the position after the last occurrence of value $i$ in the sorted output.
- Walk the input array in reverse, placing each element at position `counts[element] - 1` and decrementing the count. Walking in reverse ensures stability.
Implementation
```typescript
export function countingSort(elements: number[]): number[] {
  if (elements.length <= 1) {
    return elements.slice(0);
  }
  const max = Math.max(...elements);
  const counts = new Array<number>(max + 1).fill(0);
  // Count occurrences
  for (const val of elements) {
    counts[val]!++;
  }
  // Compute prefix sums (cumulative counts)
  for (let i = 1; i <= max; i++) {
    counts[i]! += counts[i - 1]!;
  }
  // Build output array in reverse for stability
  const output = new Array<number>(elements.length);
  for (let i = elements.length - 1; i >= 0; i--) {
    const val = elements[i]!;
    counts[val]!--;
    output[counts[val]!] = val;
  }
  return output;
}
```
Tracing through an example
Let us sort $[4, 2, 2, 8, 3, 3, 1]$.
Step 1–2: Count occurrences. The maximum value is 8, so we create counts of size 9:
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| counts | 0 | 1 | 2 | 2 | 1 | 0 | 0 | 0 | 1 |
Step 3: Prefix sums. Each entry becomes the cumulative count:
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| counts | 0 | 1 | 3 | 5 | 6 | 6 | 6 | 6 | 7 |
The prefix sum tells us: 0 elements are $\le 0$, 1 element is $\le 1$, 3 elements are $\le 2$, and so on.
Step 4: Place elements (reverse scan).
| $i$ | $A[i]$ | `counts[A[i]]` before | Output position | `counts[A[i]]` after |
|---|---|---|---|---|
| 6 | 1 | 1 | 0 | 0 |
| 5 | 3 | 5 | 4 | 4 |
| 4 | 3 | 4 | 3 | 3 |
| 3 | 8 | 7 | 6 | 6 |
| 2 | 2 | 3 | 2 | 2 |
| 1 | 2 | 2 | 1 | 1 |
| 0 | 4 | 6 | 5 | 5 |
Result: $[1, 2, 2, 3, 3, 4, 8]$.
Notice that the two 2s and the two 3s appear in the same relative order as in the input — counting sort is stable.
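Running the implementation on the same input confirms the trace (the function is repeated here so the block is self-contained):

```typescript
// The countingSort implementation from above, repeated for self-containment.
function countingSort(elements: number[]): number[] {
  if (elements.length <= 1) return elements.slice(0);
  const max = Math.max(...elements);
  const counts = new Array<number>(max + 1).fill(0);
  for (const val of elements) counts[val]!++; // count occurrences
  for (let i = 1; i <= max; i++) counts[i]! += counts[i - 1]!; // prefix sums
  const output = new Array<number>(elements.length);
  for (let i = elements.length - 1; i >= 0; i--) { // reverse for stability
    const val = elements[i]!;
    counts[val]!--;
    output[counts[val]!] = val;
  }
  return output;
}

console.log(countingSort([4, 2, 2, 8, 3, 3, 1])); // [1, 2, 2, 3, 3, 4, 8]
```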
Stability
Counting sort's stability is not an accident; it is a consequence of scanning the input in reverse during the placement step. When we encounter the last occurrence of a value (scanning right to left), we place it at the highest available position for that value. The second-to-last occurrence goes one position earlier, and so on. This preserves the original relative order among elements with equal values.
Stability matters when sorting records by one field while preserving order on another, and it is essential for the counting sort's role as a subroutine in radix sort.
Complexity analysis
Time. The algorithm makes four passes:
- Finding the maximum: $O(n)$.
- Counting occurrences: $O(n)$.
- Computing prefix sums: $O(k)$.
- Placing elements in the output: $O(n)$.
Total: $O(n + k)$, where $k$ is the maximum value.
Space. The counts array uses $O(k)$ space, and the output array uses $O(n)$ space. Total: $O(n + k)$.
The core trade-off. At this point the reader might wonder: if counting sort runs in linear time, why don't we always use it instead of comparison-based algorithms? The answer is that counting sort trades space for speed, and this trade-off is only possible because we assume the input consists of non-negative integers in a known range $[0, k]$. The algorithm allocates an auxiliary counts array of size $k + 1$ and uses the element values directly as array indices — an operation that comparison-based algorithms never perform. It is this extra structural knowledge that lets us bypass the comparison lower bound.
When is counting sort practical? When $k = O(n)$, the space overhead is proportional to the input size, and counting sort runs in $O(n)$ time — excellent. But when $k \gg n$, the trade-off breaks down: we pay $O(k)$ space and time for a mostly empty counts array while gaining nothing. For instance, if the values range up to $10^9$ but the array has only a few thousand elements, counting sort performs on the order of $10^9$ operations and allocates a billion-entry counts array occupying roughly 4 GB of memory (at 4 bytes per integer) — while a comparison-based sort finishes in well under a million operations using $O(n)$ space (some kilobytes of memory at 4 bytes per integer).
Other limitations. Counting sort also cannot handle negative integers (without shifting), floating-point numbers, or strings — any type that cannot serve as an array index. Comparison-based sorting, by contrast, works for any totally ordered type. So counting sort is a more specialized algorithm with a more limited applicability: extremely fast under the right conditions, but inapplicable or wasteful otherwise.
Properties
| Property | Counting sort |
|---|---|
| Time | $O(n + k)$ |
| Space | $O(n + k)$ |
| Stable | Yes |
| In-place | No |
| Value type | Non-negative integers in $[0, k]$ |
Radix sort
Radix sort extends counting sort to handle integers with many digits. Instead of sorting on the entire value at once (which would require a counts array as large as the value range), radix sort processes one digit at a time, from least significant to most significant.
The algorithm
- Find the maximum element to determine the number of digits $d$. Number the digit positions from left to right, so that digit 1 is the most significant (leftmost) digit and digit $d$ is the least significant (rightmost, units) digit.
- For $i = d, d - 1, \ldots, 1$ (i.e., from the rightmost digit to the leftmost):
- Sort the array by digit $i$ using a stable sort (counting sort restricted to digits 0–9).
The key insight is that we must process digits from least significant to most significant, and that sorting by each digit must be stable. After sorting by the units digit, elements with the same units digit are in a consistent order. When we then sort by the tens digit, stability ensures that elements with the same tens digit remain sorted by their units digit — and so on.
Why least significant digit first?
It may at first seem counterintuitive to start with the least significant digit. To understand why this is needed, let us try to sort by the most significant digit first and see where it leads us. Consider an array of three-digit numbers whose leading digits are 3, 4, 6, and 7. Sorting by the most significant digit first would produce groups of numbers starting with 3, 4, 6, 7. But then sorting by the next digit within each group would be exactly the original problem, only on smaller arrays — which means that we have made no progress toward a linear-time algorithm.
LSD (Least Significant Digit) radix sort avoids this by exploiting stability. Recall that digit positions are numbered from left to right, and we process them in reverse: $d, d - 1, \ldots, 1$. After sorting by digit $i$, the relative order of elements that agree on digit $i$ is determined by the previous passes on digits $i + 1, \ldots, d$ (i.e., the digits to its right). When we next sort by digit $i - 1$ (one position to the left), stability preserves the ordering by digits $i, \ldots, d$ among the elements with the same digit at position $i - 1$.
Implementation
The digit-level sorting function is a specialized counting sort that operates on a single digit position:
```typescript
export function countingSortByDigit(
  elements: number[],
  position: number,
): number[] {
  const n = elements.length;
  if (n <= 1) {
    return elements.slice(0);
  }
  const output = new Array<number>(n);
  const counts = new Array<number>(10).fill(0);
  // Count occurrences of each digit at the given position
  for (const val of elements) {
    const digit = Math.floor(val / position) % 10;
    counts[digit]!++;
  }
  // Compute prefix sums
  for (let i = 1; i < 10; i++) {
    counts[i]! += counts[i - 1]!;
  }
  // Build output in reverse for stability
  for (let i = n - 1; i >= 0; i--) {
    const val = elements[i]!;
    const digit = Math.floor(val / position) % 10;
    counts[digit]!--;
    output[counts[digit]!] = val;
  }
  return output;
}
```
The main radix sort function calls this function for each digit position:
```typescript
export function radixSort(elements: number[]): number[] {
  if (elements.length <= 1) {
    return elements.slice(0);
  }
  const max = Math.max(...elements);
  let result = elements.slice(0);
  // Process each digit position from least significant to most significant
  for (let position = 1; Math.floor(max / position) > 0; position *= 10) {
    result = countingSortByDigit(result, position);
  }
  return result;
}
```
Tracing through an example
Sort $[170, 45, 75, 90, 802, 24, 2, 66]$.
Pass 1: Sort by units digit (position = 1):
| Element | Units digit |
|---|---|
| 170 | 0 |
| 45 | 5 |
| 75 | 5 |
| 90 | 0 |
| 802 | 2 |
| 24 | 4 |
| 2 | 2 |
| 66 | 6 |
After stable sort by units digit: $[170, 90, 802, 2, 24, 45, 75, 66]$.
Pass 2: Sort by tens digit (position = 10):
| Element | Tens digit |
|---|---|
| 170 | 7 |
| 90 | 9 |
| 802 | 0 |
| 2 | 0 |
| 24 | 2 |
| 45 | 4 |
| 75 | 7 |
| 66 | 6 |
After stable sort by tens digit: $[802, 2, 24, 45, 66, 170, 75, 90]$.
Notice that 802 and 2 both have tens digit 0, and they remain in the order established by Pass 1 (802 before 2) thanks to stability.
Pass 3: Sort by hundreds digit (position = 100):
| Element | Hundreds digit |
|---|---|
| 802 | 8 |
| 2 | 0 |
| 24 | 0 |
| 45 | 0 |
| 66 | 0 |
| 170 | 1 |
| 75 | 0 |
| 90 | 0 |
After stable sort by hundreds digit: $[2, 24, 45, 66, 75, 90, 170, 802]$.
Result: $[2, 24, 45, 66, 75, 90, 170, 802]$ — a sorted array.
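The three passes can be verified by running a condensed, standalone version of the implementation on the same input:

```typescript
// LSD radix sort with the per-digit counting sort inlined,
// condensed into one standalone function.
function radixSort(elements: number[]): number[] {
  let result = elements.slice(0);
  if (result.length <= 1) return result;
  const max = Math.max(...result);
  for (let position = 1; Math.floor(max / position) > 0; position *= 10) {
    // Stable counting sort on the digit at `position` (radix 10).
    const counts = new Array<number>(10).fill(0);
    for (const val of result) counts[Math.floor(val / position) % 10]!++;
    for (let i = 1; i < 10; i++) counts[i]! += counts[i - 1]!;
    const output = new Array<number>(result.length);
    for (let i = result.length - 1; i >= 0; i--) { // reverse for stability
      const digit = Math.floor(result[i]! / position) % 10;
      counts[digit]!--;
      output[counts[digit]!] = result[i]!;
    }
    result = output;
  }
  return result;
}

console.log(radixSort([170, 45, 75, 90, 802, 24, 2, 66]));
// [2, 24, 45, 66, 75, 90, 170, 802]
```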
Correctness
Claim. After $i$ passes of LSD (Least Significant Digit) radix sort, the array is sorted with respect to the last $i$ digits.
Proof by induction on the number of passes $i$:
-
Base case ($i = 1$). The first pass sorts by digit $d$ (the units digit). Since counting sort is correct, the array is sorted with respect to the last 1 digit.
-
Inductive step. Assume that after $i - 1$ passes the array is sorted by the last $i - 1$ digits (that is, by digits $d - i + 2, \ldots, d$). Pass $i$ sorts by digit $d - i + 1$ (the next digit to the left). Consider two arbitrary elements $x$ and $y$ after pass $i$:
- If $x$ and $y$ differ in digit $d - i + 1$: the sort on digit $d - i + 1$ places them correctly.
- If $x$ and $y$ have the same digit at position $d - i + 1$: since the sort is stable, their relative order is preserved from the previous pass, which by the inductive hypothesis ordered them correctly by their last $i - 1$ digits.
In both cases, $x$ and $y$ are correctly ordered by their last $i$ digits.
Complexity analysis
Time. Radix sort makes $d$ passes, where $d$ is the number of digits in the maximum element. Each pass is a counting sort with $k = 10$ (the radix), which takes $O(n + 10) = O(n)$ time. Total:

$$O(d \cdot n)$$

For $d = O(1)$ (bounded number of digits), this is $O(n)$. More generally, if the values are in the range $[0, n^c]$ for some constant $c$, then $d = O(\log n)$, and radix sort runs in $O(n \log n)$ — no better than comparison sort.
Why focus on the range $[0, n^c]$? Because it covers the cases that arise most often in practice. When we sort $n$ items, the values are typically bounded by some polynomial in $n$: sorting people by age gives values in a small fixed range, which is $O(1)$; sorting exam scores gives values bounded by the number of questions; sorting the edges of an $n$-vertex graph by small integer weights (as in Kruskal's minimum spanning tree algorithm) gives values polynomial in $n$. Fixed-width machine integers (32-bit or 64-bit) are an even tighter case: $d$ is at most 10 or 20 decimal digits regardless of $n$, so radix sort runs in $O(n)$. Ranges beyond $n^{O(1)}$ — for instance, values up to $2^n$ — would give $d = \Theta(n)$ digits and an $O(n^2)$ running time, but such ranges rarely arise in practice because they require exponentially large numbers that do not fit in standard integer types.
Radix sort achieves true linear time only when $d$ is bounded by a constant independent of $n$.
Space. Each counting sort pass uses $O(n + 10) = O(n)$ auxiliary space.
Properties
| Property | Radix sort |
|---|---|
| Time | $O(d \cdot n)$, where $d$ = number of digits |
| Space | $O(n)$ |
| Stable | Yes |
| Value type | Non-negative integers |
Bucket sort
Bucket sort works well when the input is drawn from a uniform distribution over a known range. It distributes elements into equal-width buckets, sorts each bucket individually (typically with insertion sort), and concatenates the sorted buckets.
The algorithm
- Scan the input to find $min$ and $max$ — the smallest and largest values — in $O(n)$ time.
- Create $k$ empty buckets spanning the range $[min, max]$ (by default $k = n$, i.e., as many buckets as elements).
- Place each element in its bucket: element $x$ goes to bucket $\left\lfloor \frac{x - min}{max - min} \cdot (k - 1) \right\rfloor$.
- Sort each bucket using insertion sort (we reuse the implementation from Chapter 4).
- Concatenate all buckets.
The expression $\frac{x - min}{max - min}$ normalizes $x$ to a value in $[0, 1]$, where $min$ maps to 0 and $max$ maps to 1. Multiplying by $k - 1$ scales this to the range $[0, k - 1]$, and taking the floor gives a valid bucket index. We multiply by $k - 1$ rather than $k$ so that the maximum element maps to bucket $k - 1$ (the last bucket) rather than to bucket $k$ (which would be out of bounds): when $x = max$, the normalized value is exactly 1, and $\lfloor 1 \cdot (k - 1) \rfloor = k - 1$.
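A quick check of the formula's boundary behavior, using hypothetical values $min = 0.12$, $max = 0.94$, and $k = 10$:

```typescript
// Hypothetical example values to exercise the bucket-index formula.
const min = 0.12;
const max = 0.94;
const numBuckets = 10;
const bucketIndex = (val: number): number =>
  Math.floor(((val - min) / (max - min)) * (numBuckets - 1));

// min normalizes to 0 and lands in the first bucket;
// max normalizes to exactly 1 and lands in the last bucket, not out of bounds.
console.log(bucketIndex(min)); // 0
console.log(bucketIndex(max)); // 9
```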
Implementation
The implementation imports insertionSort from Chapter 4 (Elementary Sorting) to sort individual buckets. Since each bucket is expected to contain only a few elements under a uniform distribution, insertion sort's $O(m^2)$ cost on a bucket of size $m$ is negligible.
```typescript
import { insertionSort } from '../04-elementary-sorting/insertion-sort';

export function bucketSort(
  elements: number[],
  bucketCount?: number,
): number[] {
  const n = elements.length;
  if (n <= 1) {
    return elements.slice(0);
  }
  const max = Math.max(...elements);
  const min = Math.min(...elements);
  // If all elements are the same, return a copy
  if (max === min) {
    return elements.slice(0);
  }
  const numBuckets = bucketCount ?? n;
  const range = max - min;
  // Create empty buckets
  const buckets: number[][] = [];
  for (let i = 0; i < numBuckets; i++) {
    buckets.push([]);
  }
  // Distribute elements into buckets. The formula maps each value to a
  // bucket index in [0, numBuckets - 1]. Since val is in [min, max],
  // (val - min) / range is in [0, 1], so
  // Math.floor(... * (numBuckets - 1)) is always in [0, numBuckets - 1].
  for (const val of elements) {
    const index = Math.floor(((val - min) / range) * (numBuckets - 1));
    buckets[index]!.push(val);
  }
  // Sort each bucket using insertion sort and concatenate
  const result: number[] = [];
  for (const bucket of buckets) {
    insertionSort(bucket);
    for (const val of bucket) {
      result.push(val);
    }
  }
  return result;
}
```
Tracing through an example
Sort $[0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]$ using 10 buckets.
Step 1: Find min and max. Scan the array: $min = 0.12$, $max = 0.94$, so the range is $max - min = 0.82$.
Step 2: Create 10 empty buckets (indices 0 through 9).
Step 3: Distribute elements into buckets. For each element $x$, compute the bucket index $\left\lfloor \frac{x - min}{max - min} \cdot 9 \right\rfloor$:
| Element | $\frac{x - min}{0.82}$ | $\times\, 9$ | Floor | Bucket |
|---|---|---|---|---|
| 0.78 | 0.8049 | 7.244 | 7 | 7 |
| 0.17 | 0.0610 | 0.549 | 0 | 0 |
| 0.39 | 0.3293 | 2.963 | 2 | 2 |
| 0.26 | 0.1707 | 1.537 | 1 | 1 |
| 0.72 | 0.7317 | 6.585 | 6 | 6 |
| 0.94 | 1.0000 | 9.000 | 9 | 9 |
| 0.21 | 0.1098 | 0.988 | 0 | 0 |
| 0.12 | 0.0000 | 0.000 | 0 | 0 |
| 0.23 | 0.1341 | 1.207 | 1 | 1 |
| 0.68 | 0.6829 | 6.146 | 6 | 6 |
State of the buckets after distribution:
| Bucket | Elements |
|---|---|
| 0 | [0.17, 0.21, 0.12] |
| 1 | [0.26, 0.23] |
| 2 | [0.39] |
| 3–5 | [] |
| 6 | [0.72, 0.68] |
| 7 | [0.78] |
| 8 | [] |
| 9 | [0.94] |
Step 4: Sort each bucket using insertion sort.
- Bucket 0: [0.17, 0.21, 0.12] → sort → [0.12, 0.17, 0.21]
- Bucket 1: [0.26, 0.23] → sort → [0.23, 0.26]
- Bucket 2: [0.39] → already sorted
- Buckets 3–5: empty, nothing to do
- Bucket 6: [0.72, 0.68] → sort → [0.68, 0.72]
- Bucket 7: [0.78] → already sorted
- Bucket 8: empty, nothing to do
- Bucket 9: [0.94] → already sorted
Step 5: Concatenate all buckets in order:
Result: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94].
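The full run can be reproduced with a self-contained sketch (insertion sort is inlined and the name bucketSortSketch is ours, so the snippet runs without the chapter's imports):

```typescript
// Self-contained bucket sort sketch on the traced input.
function insertionSortInline(arr: number[]): void {
  for (let i = 1; i < arr.length; i++) {
    const key = arr[i]!;
    let j = i - 1;
    while (j >= 0 && arr[j]! > key) {
      arr[j + 1] = arr[j]!;
      j--;
    }
    arr[j + 1] = key;
  }
}

function bucketSortSketch(elements: number[]): number[] {
  const n = elements.length;
  if (n <= 1) return elements.slice();
  const min = Math.min(...elements);
  const max = Math.max(...elements);
  if (max === min) return elements.slice();
  const range = max - min;
  const buckets: number[][] = Array.from({ length: n }, () => []);
  for (const val of elements) {
    // Map val to a bucket index in [0, n - 1].
    const index = Math.floor(((val - min) / range) * (n - 1));
    buckets[index]!.push(val);
  }
  const result: number[] = [];
  for (const bucket of buckets) {
    insertionSortInline(bucket);
    result.push(...bucket);
  }
  return result;
}

const input = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68];
const sorted = bucketSortSketch(input);
console.log(sorted.join(', '));
// → 0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94
```

Because the bucket-index formula is monotone in the value, concatenating the sorted buckets in index order yields a globally sorted array.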
Complexity analysis
Expected time under uniform distribution. Suppose we distribute n elements into k buckets (the typical choice is k = n). If the elements are drawn uniformly from [0, 1), each bucket receives roughly n/k elements on average. Sorting each bucket with insertion sort takes O((n/k)²) expected time per bucket, and there are k buckets, so the total expected sorting cost is O(n²/k). Combined with the distribution and concatenation steps, the total expected time is O(n + k + n²/k).
More precisely, the total expected cost of sorting all buckets is proportional to the sum Σᵢ nᵢ², where nᵢ is the number of elements in bucket i. Using probability theory (specifically, the binomial distribution and the variance identity E[X²] = Var(X) + (E[X])²), one can show that this sum equals:

E[Σᵢ nᵢ²] = n²/k + n(1 − 1/k)
The total expected cost is O(n) whenever n²/k = O(n), which holds if and only if k = Ω(n) — that is, k is at least proportional to n. In particular:
- If k is a fixed constant (say k = 10), the dominant term becomes n²/k = Θ(n²), which is no better than a single insertion sort over the whole array.
- If k grows with n but slower — say k = √n — we get n²/k = Θ(n^1.5), which is still worse than comparison-based sorting.
- If k = cn for any constant c > 0 (i.e., k = Θ(n)), we get n²/k = n/c = O(n), and the expected time is linear.
- Using k = n² would still give O(n) expected sorting time but would waste Θ(n²) space on mostly empty buckets with no further benefit.
The choice k = n is therefore the most common in textbooks and implementations: it is the simplest choice, it achieves the linear expected time, and the space for the bucket array is Θ(n) — the same order as the input itself.
For example, substituting k = n:

E[Σᵢ nᵢ²] = n²/n + n(1 − 1/n) = n + n − 1 = 2n − 1 = Θ(n)
A note to the reader. The derivation of the formula above requires familiarity with probability theory (expected values, variance, binomial distributions). The proof is provided below for the sake of completeness. If the math feels unfamiliar, feel free to skip ahead to the Worst case paragraph — the key takeaway is simply that bucket sort achieves O(n) expected time when k = Θ(n) and the input is uniformly distributed.
Proof of the expected cost formula (optional).
The cost of sorting all buckets with insertion sort is proportional to Σᵢ₌₁ᵏ nᵢ², where nᵢ is the number of elements in bucket i. By linearity of expectation:

E[Σᵢ nᵢ²] = Σᵢ E[nᵢ²]
If the n elements are drawn independently and uniformly from [0, 1), each element lands in bucket i with probability p = 1/k. We can think of each element as an independent Bernoulli trial: it either falls into bucket i (success, probability 1/k) or it does not (failure, probability 1 − 1/k). The count nᵢ is the total number of successes in n independent trials, so nᵢ follows a binomial distribution B(n, 1/k).
The mean and variance of a binomial random variable X ~ B(n, p) are E[X] = np and Var(X) = np(1 − p) respectively. To see why, consider a single Bernoulli trial Xⱼ that is 1 with probability p and 0 with probability 1 − p. Its mean is E[Xⱼ] = p and its variance is p(1 − p) (since E[Xⱼ²] = p and Var(Xⱼ) = E[Xⱼ²] − (E[Xⱼ])² = p − p² = p(1 − p)). The binomial count is the sum X = X₁ + X₂ + ⋯ + Xₙ. By linearity of expectation, E[X] = np. Because the trials are independent, their variances also add: Var(X) = np(1 − p).
Substituting p = 1/k:

E[nᵢ] = n/k and Var(nᵢ) = (n/k)(1 − 1/k)
Here Var(X) is the variance of X — a standard measure from probability theory that quantifies how much a random variable deviates from its mean. It is defined as Var(X) = E[(X − E[X])²]. Let us expand the square inside the expectation. Writing μ = E[X] for brevity:

Var(X) = E[(X − μ)²] = E[X² − 2μX + μ²]
By linearity of expectation, and using the fact that μ is a constant:

Var(X) = E[X²] − 2μ E[X] + μ² = E[X²] − 2μ² + μ² = E[X²] − μ²
Substituting μ = E[X] back and rearranging, we obtain a useful identity that relates the second moment to the variance and the squared mean:

E[X²] = Var(X) + (E[X])²
Applying this identity to nᵢ, we get:

E[nᵢ²] = Var(nᵢ) + (E[nᵢ])² = (n/k)(1 − 1/k) + (n/k)²
Summing over the k buckets:

E[Σᵢ nᵢ²] = k · [(n/k)(1 − 1/k) + (n/k)²] = n(1 − 1/k) + n²/k
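The identity is easy to sanity-check empirically. The sketch below (with arbitrary illustrative choices of n, k, and trial count) distributes n uniform random values into k buckets and compares the average of Σᵢ nᵢ² against the formula:

```typescript
// Monte Carlo check of E[Σ nᵢ²] = n²/k + n(1 − 1/k).
// n, k, and the trial count are arbitrary illustrative choices.
const n = 1000;
const k = 1000;
const trials = 200;
let total = 0;
for (let t = 0; t < trials; t++) {
  const counts = new Array<number>(k).fill(0);
  for (let i = 0; i < n; i++) {
    // A uniform value lands in each of the k buckets with probability 1/k.
    const idx = Math.floor(Math.random() * k);
    counts[idx] = (counts[idx] ?? 0) + 1;
  }
  total += counts.reduce((acc, c) => acc + c * c, 0);
}
const empirical = total / trials;
const predicted = (n * n) / k + n * (1 - 1 / k); // = 2n − 1 when k = n
console.log(empirical.toFixed(1), predicted.toFixed(1));
```

With k = n = 1000 the prediction is 2n − 1 = 1999, and the empirical average lands very close to it.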
Worst case. If all n elements happen to fall into the same bucket, the insertion sort on that bucket takes Θ(n²) steps. This can happen when the distribution of the elements is very far from uniform. So the more uniform the distribution of the elements over the range, the better the running time of bucket sort.
Space. The buckets collectively hold n elements, plus Θ(k) for the bucket array structure. With k = n, the total is Θ(n).
Properties
| Property | Bucket sort |
|---|---|
| Expected time | O(n) (uniform distribution) |
| Worst-case time | Θ(n²) |
| Space | O(n + k) |
| Stable | Yes (with stable per-bucket sort) |
| Value type | Numeric values in a known range |
Comparison of linear-time sorts
| Algorithm | Time | Space | Stable | Assumptions |
|---|---|---|---|---|
| Counting sort | O(n + k) | O(n + k) | Yes | Integer values in [0, k] |
| Radix sort | O(d(n + b)) | O(n + b) | Yes | Integer values with d digits |
| Bucket sort | O(n) expected | O(n) | Yes | Uniformly distributed values |
All three algorithms achieve linear time under specific conditions. Counting sort is simplest and best when the value range k is not much larger than n. Radix sort extends counting sort to larger ranges by processing one digit at a time. Bucket sort is ideal for floating-point data with a known, roughly uniform distribution.
These algorithms do not contradict the Ω(n log n) comparison lower bound — they simply bypass it by using non-comparison operations (such as indexing into an array by value or extracting digits).
The selection problem
We now turn to a different problem. Given an unsorted array of n elements and an index k (with 0 ≤ k ≤ n − 1), find the kth smallest element — the element that would be at index k if the array were sorted.
Special cases include:
- k = 0: the minimum (trivially solvable in O(n)).
- k = n − 1: the maximum (trivially solvable in O(n)).
- k = ⌊n/2⌋: the median.
The naive approach is to sort the array (O(n log n)) and return the element at index k. But can we do better? It turns out that the answer is yes — we can solve the selection problem in O(n) time, getting by with far fewer comparisons if we are smart about the process.
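For reference, the naive sort-based approach is a few lines (selectBySorting is a name of our choosing; note the explicit numeric comparator, since JavaScript's default sort compares elements as strings):

```typescript
// Naive O(n log n) selection: sort a copy and index into it.
function selectBySorting(elements: number[], k: number): number {
  if (k < 0 || k >= elements.length) {
    throw new RangeError(`k=${k} is out of bounds`);
  }
  // Copy so the caller's array is not mutated; the comparator makes
  // sort order numeric rather than lexicographic.
  return [...elements].sort((a, b) => a - b)[k]!;
}

console.log(selectBySorting([7, 1, 9, 5, 3, 8], 2)); // 3rd smallest → 5
```

The selection algorithms below match this function's output while avoiding the full sort.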
Quickselect
Quickselect (also known as Hoare's selection algorithm) is the selection analogue of quicksort. Like quicksort, it partitions the array around a pivot. But unlike quicksort, it only recurses into one side — the side that contains the desired element.
The algorithm
- Choose a random pivot and partition the array.
- If the pivot lands at position k, we are done.
- If k < the pivot's position, recurse on the left partition.
- If k > the pivot's position, recurse on the right partition.
Implementation
```typescript
export function quickselect(
  elements: number[],
  k: number,
): number {
  if (elements.length === 0) {
    throw new RangeError('Cannot select from an empty array');
  }
  if (k < 0 || k >= elements.length) {
    throw new RangeError(
      `k=${k} is out of bounds for array of length ${elements.length}`,
    );
  }
  return select(elements, 0, elements.length - 1, k);
}

function select(
  arr: number[],
  left: number,
  right: number,
  k: number,
): number {
  if (left === right) {
    return arr[left]!;
  }
  const pivotIndex = randomizedPartition(arr, left, right);
  if (k === pivotIndex) {
    return arr[pivotIndex]!;
  } else if (k < pivotIndex) {
    return select(arr, left, pivotIndex - 1, k);
  } else {
    return select(arr, pivotIndex + 1, right, k);
  }
}
```
The randomizedPartition function is identical to the one used in randomized quicksort: choose a random element, swap it to the end, partition using the Lomuto scheme.
Tracing through an example
Find the 3rd smallest element (k = 2, zero-indexed).
The element at index 2 of the sorted array is 5, so the answer is 5.
Iteration 1: Suppose the random pivot is 7 (index 0). After partitioning, the pivot lands at index 3.
We want k = 2 < 3, so recurse on the left partition (indices 0–2).
Iteration 2: Suppose the random pivot is 1 (index 1 of the subarray). After partitioning, the pivot lands at index 0.
We want k = 2 > 0, so recurse on the right partition (indices 1–2).
Iteration 3: Suppose the random pivot is 5 (index 2). After partitioning, the pivot stays at index 2.
We want k = 2, and the pivot is at index 2. Done! Return arr[2] = 5.
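The algorithm can be exercised with a self-contained sketch (quickselectSketch and partitionLomuto are our names; the one-sided recursion is written as a loop, which is equivalent because only one partition is ever explored):

```typescript
// Self-contained quickselect sketch: Lomuto partition with a
// random pivot, narrowing to the side that contains index k.
function partitionLomuto(arr: number[], left: number, right: number): number {
  // Move a random element into the pivot position at the end.
  const r = left + Math.floor(Math.random() * (right - left + 1));
  [arr[r], arr[right]] = [arr[right]!, arr[r]!];
  const pivot = arr[right]!;
  let i = left - 1;
  for (let j = left; j < right; j++) {
    if (arr[j]! < pivot) {
      i++;
      [arr[i], arr[j]] = [arr[j]!, arr[i]!];
    }
  }
  [arr[i + 1], arr[right]] = [arr[right]!, arr[i + 1]!];
  return i + 1;
}

function quickselectSketch(elements: number[], k: number): number {
  let left = 0;
  let right = elements.length - 1;
  for (;;) {
    if (left === right) return elements[left]!;
    const p = partitionLomuto(elements, left, right);
    if (k === p) return elements[p]!;
    if (k < p) right = p - 1;
    else left = p + 1;
  }
}

const data = [7, 1, 9, 5, 3, 8];
console.log(quickselectSketch([...data], 2)); // 3rd smallest → 5
```

The pivots are random, so the sequence of partitions differs from run to run, but the returned value is always the kth smallest.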
Complexity analysis
Expected time. The analysis is similar to randomized quicksort. With a random pivot, the expected partition splits the array roughly in half. But unlike quicksort, we recurse into only one partition, so the expected work at each level halves:

n + n/2 + n/4 + ⋯ ≤ 2n = O(n)
More precisely, the expected number of comparisons is at most a small constant multiple of n (by an analysis similar to the randomized quicksort proof, summing indicator random variables over pairs).
Why does n + n/2 + n/4 + ⋯ = 2n?
This identity can look surprising the first time you see it — how can adding up infinitely many positive numbers give a finite result? Here is a concrete example, followed by the general argument.
Concrete example. Take n = 1. The sum is 1 + 1/2 + 1/4 + 1/8 + ⋯. Adding the first few terms: 1.5, then 1.75, 1.875, 1.9375, and so on — each new term gets us closer to 2 but never past it. The sum converges to exactly 2.
What kind of sum is this? This is a geometric series — a sum where each term is a fixed fraction of the previous one (here each term is half the preceding term). The general formula for an infinite geometric series with first term a and common ratio r (where |r| < 1) is:

a + ar + ar² + ar³ + ⋯ = a / (1 − r)
In our case a = n and r = 1/2, so the sum is n / (1 − 1/2) = 2n.
Step-by-step derivation. Where does the formula come from? Let S denote the infinite sum:

S = a + ar + ar² + ar³ + ⋯
Multiply both sides by r:

rS = ar + ar² + ar³ + ⋯
Now subtract the second equation from the first. On the right-hand side, almost every term cancels — the ar terms cancel, the ar² terms cancel, and so on — leaving only the first term:

S − rS = a
Factoring the left-hand side gives S(1 − r) = a, so S = a / (1 − r). This works whenever |r| < 1, because the terms shrink toward zero and the infinite sum converges to a finite value. (When |r| ≥ 1, the terms do not shrink and the sum diverges.)
Why it matters here. Each recursive call does less and less work — the first call scans n elements, the next scans roughly n/2, then n/4, and so on. Even though there are infinitely many terms in the idealized sum, the terms shrink so fast that the total never exceeds 2n. The first call already accounts for half the total work, the second call for a quarter, and so on — the contributions diminish rapidly and their sum converges to a finite value.
Worst case. If the pivot always lands at one extreme, we have:

n + (n − 1) + (n − 2) + ⋯ + 1 = n(n + 1)/2 = Θ(n²)
This is the same worst case as quicksort, but it is extremely unlikely with random pivots.
Why does n + (n − 1) + ⋯ + 1 = n(n + 1)/2?
This identity is the counterpart to the geometric series above, and it explains why the worst case is so much worse than the expected case.
Concrete example. Take n = 100. The sum is 100 + 99 + 98 + ⋯ + 1 = 5050. Compare this to the geometric series 100 + 50 + 25 + ⋯ < 200. The arithmetic sum is about 25 times larger, because the terms shrink much more slowly.
What kind of sum is this? If the pivot always lands at one extreme, each recursive call reduces the problem size by only 1 instead of halving it. Expanding the recurrence, we get:

T(n) = n + (n − 1) + (n − 2) + ⋯ + 2 + 1
This is an arithmetic series — a sum where each term decreases by a fixed amount (here, by 1) rather than by a fixed ratio.
Step-by-step derivation. Where does the formula come from? A classic trick attributed to Gauss: write the sum S = 1 + 2 + ⋯ + n forwards and backwards and add them together:

S = 1 + 2 + ⋯ + (n − 1) + n
S = n + (n − 1) + ⋯ + 2 + 1
Adding these two rows term by term, every column sums to n + 1:

2S = (n + 1) + (n + 1) + ⋯ + (n + 1) = n(n + 1)
Therefore S = n(n + 1)/2. The total is roughly n²/2, which is Θ(n²).
Why it matters here. Contrast this with the expected case: halving the problem size each time gives a geometric series that converges to 2n, while reducing it by just 1 each time gives an arithmetic series that grows to roughly n²/2. The difference between "half as much work each step" and "one less unit of work each step" is the difference between linear and quadratic time.
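Both closed forms are easy to check numerically; this sketch (with n = 100, matching the concrete example above, and helper names of our choosing) compares the arithmetic sum against Gauss's formula and the geometric sum against its 2n bound:

```typescript
// Compare the arithmetic sum n + (n−1) + ... + 1 against Gauss's
// closed form n(n+1)/2, and the geometric sum n + n/2 + n/4 + ...
// against its bound 2n.
function arithmeticSum(n: number): number {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  return total;
}

function geometricSum(n: number): number {
  let total = 0;
  for (let term = n; term >= 1; term /= 2) total += term;
  return total;
}

const m = 100;
console.log(arithmeticSum(m));        // prints 5050 = 100 · 101 / 2
console.log(geometricSum(m) < 2 * m); // prints true: the sum stays below 2n
```

The arithmetic sum hits the Gauss formula exactly, while the geometric sum stays strictly below 2n no matter how many terms are added.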
Properties
| Property | Quickselect |
|---|---|
| Expected time | O(n) |
| Worst-case time | Θ(n²) |
| Space | O(log n) expected stack |
| Deterministic | No (randomized) |
Median of medians
Can we achieve worst-case O(n) selection? The answer is yes, using a clever pivot-selection strategy called median of medians (also known as BFPRT, after its five inventors: Blum, Floyd, Pratt, Rivest, and Tarjan, 1973).
The idea: instead of choosing a random pivot, choose a pivot that is guaranteed to be near the median, ensuring that each partition eliminates a constant fraction of the elements.
The algorithm
Let's call the overall procedure select(A, k) — it returns the kth smallest element of array A (zero-indexed). The algorithm works in place: it rearranges A so that position k holds the correct element (with everything to its left ≤ it and everything to its right ≥ it), then returns that element's value.
- Divide the elements into groups of 5 (the last group may have fewer).
- Find the median of each group by sorting it (sorting 5 elements takes constant time). Collect these medians into a new array M.
- Find a good pivot by calling select(M, ⌊|M|/2⌋) — that is, use this same selection procedure on the smaller array M to find its median. The result is the "median of medians" pivot.
- Partition the original array around this pivot, placing all smaller elements to its left and all larger elements to its right. The pivot ends up at some position p.
- Select from the correct side. If k = p, return the pivot. Otherwise, call select on the left partition (if k < p) or the right partition (if k > p) to continue searching for the kth element.
Implementation
The implementation reuses insertionSortRange from Chapter 4 to sort small groups in place, and the Lomuto partition from Chapter 5 to partition around the chosen pivot. Each group has at most 5 elements, so insertion sort's Θ(m²) cost for m ≤ 5 is O(1) per group.
```typescript
import { insertionSortRange } from '../04-elementary-sorting/insertion-sort';
import { partition } from '../05-efficient-sorting/quick-sort';

export function medianOfMedians(
  elements: number[],
  k: number,
): number {
  if (elements.length === 0) {
    throw new RangeError('Cannot select from an empty array');
  }
  if (k < 0 || k >= elements.length) {
    throw new RangeError(
      `k=${k} is out of bounds for array of length ${elements.length}`,
    );
  }
  return selectMoM(elements, 0, elements.length - 1, k);
}
```
The core recursive function:
```typescript
// Helpers used below: the numeric comparator passed to partition,
// and an in-place swap of two array slots.
const numberComparator = (a: number, b: number): number => a - b;

function swap(arr: number[], i: number, j: number): void {
  const tmp = arr[i]!;
  arr[i] = arr[j]!;
  arr[j] = tmp;
}

function selectMoM(
  arr: number[],
  left: number,
  right: number,
  k: number,
): number {
  // Base case: small enough to sort directly
  if (right - left < 5) {
    insertionSortRange(arr, left, right);
    return arr[k]!;
  }
  // Step 1: Divide into groups of 5, find median of each group
  const numGroups = Math.ceil((right - left + 1) / 5);
  for (let i = 0; i < numGroups; i++) {
    const groupLeft = left + i * 5;
    const groupRight = Math.min(groupLeft + 4, right);
    // Sort the group to find its median
    insertionSortRange(arr, groupLeft, groupRight);
    // Move the median of this group to the front of the array
    const medianIndex =
      groupLeft + Math.floor((groupRight - groupLeft) / 2);
    swap(arr, medianIndex, left + i);
  }
  // Step 2: Recursively find the median of the medians
  const medianOfMediansIndex =
    left + Math.floor((numGroups - 1) / 2);
  selectMoM(arr, left, left + numGroups - 1, medianOfMediansIndex);
  // The median of medians is now at medianOfMediansIndex
  // Step 3: Lomuto partition (from Ch. 5) with the chosen pivot
  const pivotIndex = partition(
    arr, left, right, numberComparator,
    medianOfMediansIndex
  )!;
  if (k === pivotIndex) {
    return arr[pivotIndex]!;
  } else if (k < pivotIndex) {
    return selectMoM(arr, left, pivotIndex - 1, k);
  } else {
    return selectMoM(arr, pivotIndex + 1, right, k);
  }
}
```
Tracing through an example
Find the median (k = 7, zero-indexed) in:
[12, 3, 5, 7, 19, 26, 4, 1, 8, 15, 20, 11, 9, 2, 6].
The sorted array is [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 19, 20, 26], so the answer at index 7 is 8.
Step 1: Divide into groups of 5 and find medians.
| Group | Elements | Sorted | Median |
|---|---|---|---|
| 1 | [12, 3, 5, 7, 19] | [3, 5, 7, 12, 19] | 7 |
| 2 | [26, 4, 1, 8, 15] | [1, 4, 8, 15, 26] | 8 |
| 3 | [20, 11, 9, 2, 6] | [2, 6, 9, 11, 20] | 9 |
Step 2: Median of medians. The medians are [7, 8, 9]. The median of this group is 8.
Step 3: Partition around 8. Using 8 as the pivot, smaller elements go left, larger elements go right:
[3, 5, 7, 4, 1, 2, 6, 8, 12, 19, 26, 15, 20, 11, 9]
The pivot lands at index 7. We want k = 7, and the pivot is at index 7. Done! Return 8.
A note on the partition. The partition above is shown conceptually: it illustrates which elements end up on which side of the pivot, but lists them in their original order for readability. The actual in-place algorithm rearranges the array before partitioning — sorting each group of 5, swapping medians to the front — and the Lomuto partition itself does not preserve relative order. The arrangement of elements within each partition does not matter for correctness or efficiency; all that matters is that the pivot lands at position with all smaller elements to its left and all larger elements to its right.
Complexity analysis
Time. Let T(n) be the worst-case time to select from n elements. The algorithm does the following work at each level of recursion:
- O(n): the non-recursive work — sorting each of the ⌈n/5⌉ groups of 5 takes O(1) per group, so O(n) total; and partitioning the full array around the pivot is a single linear scan, also O(n).
- T(n/5): the recursive call to find the median of the group medians. After sorting each group and extracting one median per group (that is the O(n) work above), we have an array of ⌈n/5⌉ medians and need to find their median — that is, select the ⌊m/2⌋th smallest element out of the m medians (since the median of m elements is the element at position ⌊m/2⌋ in sorted order, and here m = ⌈n/5⌉). This is a selection problem on a smaller input, so we solve it by calling the same median-of-medians algorithm recursively, which costs T(n/5).
- T(7n/10): the recursive call to select within the partition that contains the target index, since neither partition can have more than roughly 7n/10 elements; let us provide the proof of this below.
Why the pivot guarantees a 7n/10 worst-case split. We have ⌈n/5⌉ groups, and the pivot p is the median of the group medians. Call that count m = ⌈n/5⌉. If we lined up all m group medians in sorted order, p would sit in the middle. That means at least ⌈m/2⌉ of the group medians are ≤ p (the ones at or below the middle position), and at least ⌈m/2⌉ are ≥ p. Now consider any group whose median is ≤ p. Within that group of 5, the median is the 3rd-smallest element, so the median and the two elements below it are all ≤ p — that is 3 elements per group. So the total number of elements guaranteed to be ≤ p is at least:

3 · ⌈m/2⌉ ≈ 3 · (n/5) / 2 = 3n/10
However, the last group may have fewer than 5 elements, so its median may sit above fewer than 2 elements — in the worst case (a group of 1 or 2), it contributes only 1 element instead of 3, a shortfall of 2. Every other group, including the group that contains p itself, is a full group of 5 and contributes the full 3 elements (p's group contributes p plus the 2 elements below it). So the total number of elements guaranteed to be ≤ p is at least 3⌈m/2⌉ − 2, which simplifies to at least 3n/10 − 2.
By a symmetric argument, the number of elements guaranteed to be ≥ p is also at least 3n/10 − 2. Therefore, after partitioning around p, neither side can contain more than n − (3n/10 − 2) = 7n/10 + 2 elements. Since the +2 is a constant that does not affect the asymptotics, this is at most roughly 7n/10.
The recurrence. Combining the three terms above gives:

T(n) ≤ T(n/5) + T(7n/10) + O(n)
Note that the master theorem does not directly apply here, because the recurrence has two recursive terms of different sizes rather than the standard form T(n) = a·T(n/b) + f(n). The key observation is that 1/5 + 7/10 = 9/10 < 1: the two recursive subproblem sizes add up to a strict fraction of n, and this is what makes the recursion tree converge and T(n) behave as O(n).
Why does T(n) = T(n/5) + T(7n/10) + cn behave as O(n)?
Consider the recursion tree. At each level, sum the non-recursive work done by all active subproblems:

| Level | Subproblem sizes | Work at this level |
|---|---|---|
| 0 | n | cn |
| 1 | n/5, 7n/10 | (9/10)cn |
| 2 | n/25, 7n/50, 7n/50, 49n/100 | (9/10)²cn |
| ℓ | (all branches at depth ℓ) | (9/10)^ℓ cn |

Because the fractions sum to 9/10 < 1, the total work at each level shrinks by a factor of 9/10. Summing over all levels gives a convergent geometric series:

T(n) ≤ cn · (1 + 9/10 + (9/10)² + ⋯) = cn · 1 / (1 − 9/10) = 10cn = O(n)

The strict inequality 9/10 < 1 is what makes the series converge and collapses all levels into a single O(n) term.
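The geometric collapse of the recursion tree can also be observed by expanding the recurrence directly (taking c = 1; totalWork is our name, and the cutoff at size 1 is an arbitrary base case):

```typescript
// Expand T(n) = T(n/5) + T(7n/10) + n directly, summing the
// non-recursive work over the whole recursion tree.
function totalWork(n: number): number {
  if (n < 1) return 0; // base case: negligible subproblems
  return n + totalWork(n / 5) + totalWork((7 * n) / 10);
}

const n = 1_000_000;
const ratio = totalWork(n) / n;
// The geometric series bounds total work by 10cn, so the ratio
// stays below 10 and approaches it as n grows.
console.log(ratio.toFixed(2));
```

For large n the ratio settles close to (but strictly below) the series limit of 10, illustrating that the whole tree costs only a constant multiple of the root's work.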
Why groups of 5? The choice of 5 is not arbitrary — it is the smallest group size that makes the recurrence work out to O(n). Groups of 3 and 4 both fail, for related reasons.
Groups of 3. We would have ⌈n/3⌉ groups. The median-of-medians step would recurse on n/3 elements to find the pivot. By the same counting argument, the pivot would be guaranteed to be ≥ (and ≤) at least roughly 2 · (n/3)/2 = n/3 elements (2 elements per group instead of 3, since in a group of 3 the median has only 1 element below it, plus itself). So each partition side would have at most about 2n/3 elements. The recurrence would be:

T(n) ≤ T(n/3) + T(2n/3) + O(n)

Since 1/3 + 2/3 = 1, the two subproblems add up to the full input size (plus lower-order terms), so this solves to Θ(n log n), not O(n).
Groups of 4. We would have ⌈n/4⌉ groups. The lower median of a group of 4 is the 2nd element, which has only 1 element below it — so each qualifying group contributes just 2 elements, the same as groups of 3. The pivot is guaranteed to be ≥ (and ≤) at least roughly 2 · (n/4)/2 = n/4 elements, giving a worst-case partition of size 3n/4. The recurrence would be:

T(n) ≤ T(n/4) + T(3n/4) + O(n)

Since 1/4 + 3/4 = 1, this again solves to Θ(n log n), not O(n).
Why 5 succeeds where 3 and 4 fail. The crucial quantity is how many elements each qualifying group contributes to the "guaranteed ≤ p" set: the median plus every element below it. For groups of 3 this is 2; for groups of 4 it is also 2 (the lower median of 4 elements has only 1 element below it, the same as the median of 3); but for groups of 5 it jumps to 3. This extra element per group is what pushes the fraction sum below 1: for groups of 5 we get 1/5 + 7/10 = 9/10 < 1, giving the convergent geometric series that yields O(n).
Note that groups of 6 also give O(n): each qualifying group contributes 3 elements (the same as groups of 5), but there are fewer groups (n/6 instead of n/5), so fewer qualifying groups (n/12 instead of n/10), giving only 3n/12 = n/4 guaranteed elements instead of 3n/10. The worst-case partition is therefore 3n/4, and the fractions sum to 1/6 + 3/4 = 11/12 < 1. But 5 is preferred over 6 because smaller groups mean less work sorting each group and fewer groups to process. More generally, odd group sizes are more efficient because their median is the true middle element: in a group of odd size g, (g + 1)/2 elements sit at or below the median and the same number at or above, so every element "pulls its weight." In an even group, the lower median is biased — one element sits above the median without contributing to the guarantee, wasting a slot. Groups of 4 and 3 end up contributing the same count (2) even though groups of 4 are larger.
In order to have T(n) = O(n) we need the fractions to sum to strictly less than 1.
Why does T(n) = T(n/3) + T(2n/3) + cn solve to Θ(n log n) and not O(n)?
Consider the recursion tree. At each level, sum the non-recursive work done by all active subproblems:

| Level | Subproblem sizes | Work at this level |
|---|---|---|
| 0 | n | cn |
| 1 | n/3, 2n/3 | cn |
| 2 | n/9, 2n/9, 2n/9, 4n/9 | cn |
| ℓ | (all branches at depth ℓ) | cn |

Because the fractions sum to exactly 1, the total input covered at every level is exactly n, so every level costs cn. The deepest branch is the 2n/3 arm, which reaches its base case after log_{3/2} n = Θ(log n) levels. Multiplying gives:

T(n) = cn · Θ(log n) = Θ(n log n)
This is the same reason mergesort — whose recurrence T(n) = 2T(n/2) + cn also has fractions summing to 1/2 + 1/2 = 1 — is Θ(n log n).
Contrast this with the groups-of-5 recurrence analysed in the box above: there 1/5 + 7/10 = 9/10 < 1, so the per-level work shrinks geometrically and sums to O(n). The difference between O(n) and Θ(n log n) comes down to whether the fractions sum to strictly less than 1 (the geometric series converges) or exactly 1 (every level costs cn and there are Θ(log n) levels).
Space. The recursion has depth O(log n) (each level reduces the problem by a constant factor), so the stack space is O(log n).
Practical considerations
While the median-of-medians algorithm is a beautiful theoretical result — it proved that deterministic linear-time selection is possible — it is rarely used in practice. The constant factor hidden in the O(n) is large (roughly 5–10× slower than randomized quickselect for typical inputs). Randomized quickselect is almost always faster in practice because:
- It avoids the overhead of computing medians of groups.
- Random pivots are usually good enough.
- The probability of quadratic behavior is exponentially small in n — bounded by (1/2)^{cn} for a constant c > 0 — making it vanishingly unlikely for any practical input size.
To see why, let's call a pivot "good" if it lands in the middle 50% of the subarray (between the 25th and 75th percentiles), and "bad" otherwise — like a coin flip, each pivot is good with probability 1/2. A good pivot shrinks the subproblem to at most 3/4 of its current size, so just a handful of good pivots are enough to collapse the subarray through geometric shrinkage: after g good pivots, at most (3/4)^g · n elements remain. For quadratic behavior, almost all pivots throughout the entire recursion would have to be bad — good pivots must appear too rarely to drive this shrinkage. But that is like flipping a fair coin hundreds of times and getting almost no heads. Each flip is independent, so the probability drops exponentially with n: out of the Θ(n) pivots we expect about half to be good, but quadratic behavior needs almost all of them to be bad. Each additional pivot that "must be bad" multiplies the probability by 1/2, so after the Θ(n) pivots in the "bad" scenario the probability is at most (1/2)^{Θ(n)}. For an input of just n = 100, this is already astronomically small.
The practical value of median of medians is primarily as a fallback: some implementations (e.g., introselect-style algorithms in C++ standard libraries) start with quickselect and switch to median of medians if the recursion depth grows too large, guaranteeing worst-case O(n) while maintaining fast average-case performance.
Properties
| Property | Median of medians |
|---|---|
| Worst-case time | O(n) |
| Space | O(log n) — recursion stack depth (each level reduces the problem by a constant fraction) |
| Deterministic | Yes |
| Practical | Slower than quickselect due to large constants |
Summary
In this chapter we studied algorithms that break the comparison-based sorting barrier and solve the selection problem in linear time:
- Counting sort sorts non-negative integers in the range [0, k] in O(n + k) time by counting occurrences and computing prefix sums. It is stable and serves as a building block for radix sort.
- Radix sort extends counting sort to handle integers with multiple digits, sorting digit by digit from least significant to most significant. It runs in O(d(n + b)) time, where d is the number of digits and b the base. Correctness depends on the subroutine sort being stable, so that the ordering established by earlier digits is preserved when sorting by later ones.
- Bucket sort distributes elements into buckets, sorts each bucket, and concatenates. Under a uniform distribution, the expected time is O(n). Its worst case is Θ(n²), when all elements land in one bucket.
- Quickselect finds the kth smallest element in O(n) expected time by partitioning around a random pivot and recursing into one side. It is the practical algorithm of choice for selection.
- Median of medians achieves O(n) worst-case selection through a carefully chosen pivot: the median of the medians of groups of 5. While theoretically optimal, its large constant factor makes it slower than randomized quickselect in practice.
The linear-time sorting algorithms teach an important lesson: algorithmic lower bounds depend on the model of computation. The Ω(n log n) bound is real for comparison-based sorting, but by stepping outside the comparison model — using integers as array indices (counting sort), extracting digits (radix sort) — we can do better. The selection algorithms show that finding a single order statistic is fundamentally easier than fully sorting, requiring only O(n) time regardless of the method.
Exercises
Exercise 6.1. Trace through counting sort on a small input of your choice that contains at least two elements with value 3. Show the counts array after each step (counting, prefix sums, placement). Verify that the sort is stable by tracking the original indices of the elements with value 3.
Exercise 6.2. Radix sort processes digits from least significant to most significant, using a stable sort at each step. What goes wrong if we process digits from most significant to least significant? Give a concrete example where MSD radix sort (without special handling) produces incorrect output.
Exercise 6.3. Counting sort uses O(k) space for the counts array, where k is the maximum value. If we need to sort n integers in the range [0, n² − 1], we could use counting sort directly with k = n², or we could use radix sort with a base-n representation (2 digits). Compare the time and space complexity of both approaches.
Exercise 6.4. Consider a modification of quickselect where, instead of choosing a random pivot, we always choose the first element as the pivot. Describe an input of size n for which this modified quickselect takes Θ(n²) time to find the median. Then describe an input for which it takes O(n) time.
Arrays, Linked Lists, Stacks, and Queues
The algorithms of the preceding chapters operate on arrays: contiguous blocks of memory indexed by integers. Arrays are powerful but they are only one of many ways to organize data. In this chapter we study the fundamental data structures that underpin nearly all of Computer Science: dynamic arrays, linked lists, stacks, queues, and deques. Each offers a different set of trade-offs between time complexity, memory usage, and flexibility. Understanding these structures deeply is essential, because every higher-level data structure, from hash tables to balanced trees to graphs, is built on top of them.
Arrays
An array is the simplest data structure: a contiguous block of memory divided into equal-sized slots, each identified by an integer index. Accessing any element by its index takes O(1) time, because the memory address can be computed directly: if the array starts at address b and each element occupies s bytes, then element i lives at address b + i · s.
This direct addressing makes arrays extremely efficient for random access. However, arrays have a fundamental limitation: their size is fixed at creation time. If we need to store more elements than the array can hold, we must allocate a new, larger array and copy all existing elements, which is an O(n) operation.
Static arrays in TypeScript
TypeScript (and JavaScript) arrays are actually dynamic; they resize automatically behind the scenes. But to understand the foundations, imagine a fixed-size array:
```typescript
const fixed = new Array<number>(10); // 10 slots, all undefined
fixed[0] = 42;
fixed[9] = 99;
// fixed[10] would be out of bounds in a true static array
```
In languages like C or Java, going beyond the allocated size is either a compile-time error or a runtime crash. JavaScript's built-in arrays hide this complexity, but the cost of resizing is still there; it is just managed for us in the background. Let us see how by providing an implementation of dynamic arrays.
A note on faithfulness. The closest analogue to a true fixed-size, contiguous-memory array in TypeScript is a TypedArray such as Int32Array or Float64Array, which is backed by a raw ArrayBuffer of bytes. TypedArrays have a fixed length set at construction, cannot grow, and store elements of a single numeric type in a predictable, cache-friendly layout. The downside is that they only support numeric primitives, so they cannot hold a generic T. Because we want a reusable DynamicArray<T>, the implementation below uses a regular JavaScript array as the backing buffer and treats its slots as fixed by tracking capacity manually. Conceptually, every this.data[i] access should be read as "load the i-th word from a contiguous memory block of size capacity." If you want to see the same idea with a genuinely fixed-size store, swap the backing buffer for a Float64Array and drop the T | undefined typing; the resize logic stays identical.
Dynamic arrays
A dynamic array maintains an internal buffer that is larger than the number of elements currently stored. When the buffer fills up, the array allocates a new buffer of double the size and copies all elements over. This doubling strategy gives us amortized O(1) appends while keeping worst-case access at O(1). We will justify both bounds in the complexity and space analysis section below, once we have seen how the implementation grows the buffer.
The doubling strategy
Suppose our dynamic array has capacity c and currently holds n elements. When we append element n + 1:
- If n < c: store the element in slot n. Cost: O(1).
- If n = c: allocate a new buffer of size 2c, copy all n elements, then store the new element. Cost: Θ(n).
The key insight is that expensive copies happen rarely. After a copy doubles the capacity to 2n, we can perform another n cheap appends before the next copy. This is essential for amortized analysis, which we will make precise after the implementation.
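The "copies are rare" claim can be verified by counting copy operations directly (copyCostOfAppends is our name; the starting capacity of 4 matches the implementation below):

```typescript
// Count the element copies performed by the doubling strategy
// across a sequence of appends, starting from capacity 4.
function copyCostOfAppends(n: number): number {
  let capacity = 4;
  let size = 0;
  let copies = 0;
  for (let i = 0; i < n; i++) {
    if (size === capacity) {
      copies += size; // a resize copies every existing element
      capacity *= 2;
    }
    size++;
  }
  return copies;
}

const appends = 1_000_000;
const copies = copyCostOfAppends(appends);
// Copies from all resizes combined stay below 2n (a geometric sum
// 4 + 8 + 16 + ... < 2n), so the amortized cost per append is O(1).
console.log(copies / appends < 2); // prints true
```

Even across a million appends, the total copy work is barely more than one copy per element, which is exactly what the amortized O(1) bound predicts.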
Implementation
Our DynamicArray<T> uses a plain JavaScript array as the backing buffer, with explicit capacity management. The initial capacity defaults to 4.
```typescript
export class DynamicArray<T> implements Iterable<T> {
  private data: (T | undefined)[];
  private length: number;

  constructor(initialCapacity = 4) {
    this.data = new Array<T | undefined>(initialCapacity);
    this.length = 0;
  }

  get size(): number {
    return this.length;
  }

  get capacity(): number {
    return this.data.length;
  }

  get(index: number): T {
    this.checkBounds(index);
    return this.data[index] as T;
  }

  set(index: number, value: T): void {
    this.checkBounds(index);
    this.data[index] = value;
  }

  append(value: T): void {
    this.growIfFull();
    this.data[this.length] = value;
    this.length++;
  }

  insert(index: number, value: T): void {
    this.checkInsertBounds(index);
    this.growIfFull();
    for (let i = this.length; i > index; i--) {
      this.data[i] = this.data[i - 1];
    }
    this.data[index] = value;
    this.length++;
  }

  remove(index: number): T {
    this.checkBounds(index);
    const value = this.data[index] as T;
    for (let i = index; i < this.length - 1; i++) {
      this.data[i] = this.data[i + 1];
    }
    this.data[this.length - 1] = undefined;
    this.length--;
    this.shrinkIfSparse();
    return value;
  }

  private growIfFull(): void {
    if (this.length === this.data.length) {
      this.resize(this.data.length * 2);
    }
  }

  private shrinkIfSparse(): void {
    if (
      this.length > 0 &&
      this.length <= this.data.length / 4 &&
      this.data.length > 4
    ) {
      this.resize(Math.max(4, Math.floor(this.data.length / 2)));
    }
  }

  private resize(newCapacity: number): void {
    const newData = new Array<T | undefined>(newCapacity);
    for (let i = 0; i < this.length; i++) {
      newData[i] = this.data[i];
    }
    this.data = newData;
  }

  private checkBounds(index: number): void {
    if (index < 0 || index >= this.length) {
      throw new RangeError(
        `Index ${index} out of bounds for size ${this.length}`
      );
    }
  }

  private checkInsertBounds(index: number): void {
    if (index < 0 || index > this.length) {
      throw new RangeError(
        `Index ${index} out of bounds for size ${this.length}`
      );
    }
  }

  // ... iterator, toArray, etc.
}
```
Notice that remove also implements shrinking: when occupancy falls below 25%, the buffer is halved (but never below 4). This prevents a long sequence of removals from wasting memory, and the halving threshold (1/4 rather than 1/2) avoids thrashing, a pathological pattern where alternating appends and removes near the boundary trigger repeated resizes.
Complexity and space analysis
Now that we have seen the implementation, we can return to the claim made at the start of this section: a dynamic array supports append in amortized O(1) time while keeping get and set at O(1) in the worst case.
Indexed read and write stay O(1)
The backing store is a single contiguous JavaScript array, so the slot for element i lives at a fixed offset within it. Computing that offset is constant work, independent of n. Once we have the offset, both reading the slot (get(i)) and writing to it (set(i, v)) take constant time.
Note that neither get nor set ever triggers a resize. They do not change the number of elements stored, so the buffer never needs to grow or shrink as a result of them; only append, insert, and remove change the size and therefore interact with the resize logic. get and set simply address an existing slot.
Resizing does not break this invariant: the new buffer is also contiguous, so after the copy the slot for element i is again at a fixed offset. Therefore get(i) and set(i, v) are both O(1) in the worst case, not just on average.
Amortized analysis of append
A single append can cost O(n) when it triggers a resize, but resizes are rare enough that the average cost per append, taken over a long sequence of operations, is O(1). This kind of averaging is what amortized analysis captures.
We use the aggregate method. Starting from an empty array with initial capacity 1, suppose we perform n appends. Resizes happen when the size reaches 1, 2, 4, 8, ..., up to the largest power of 2 not exceeding n. The total number of element copies across all resizes is therefore 1 + 2 + 4 + ⋯ + 2^⌊log₂ n⌋.
To see why this bound holds, recall the multiply-by-r-and-subtract derivation we used in Chapter 6 (in the box on geometric series). Applied to a finite sum S = 1 + r + r² + ⋯ + r^k with r ≠ 1, the same algebra gives rS − S = r^(k+1) − 1, hence S = (r^(k+1) − 1) / (r − 1).
Specializing to r = 2 and k = ⌊log₂ n⌋, the sum of copies is exactly 2^(⌊log₂ n⌋ + 1) − 1. By definition of the floor, 2^⌊log₂ n⌋ ≤ n, so 2^(⌊log₂ n⌋ + 1) − 1 ≤ 2n − 1 < 2n.
The intuition that "the last term dominates the sum" is precisely the statement that, reading the series backward and factoring out 2^⌊log₂ n⌋, the remaining factor 1 + 1/2 + 1/4 + ⋯ approaches its limit of 2 but never reaches it, so the whole sum is always strictly less than 2 · 2^⌊log₂ n⌋ ≤ 2n.
So the total work for n appends is at most n writes plus 2n copies, giving O(n) in total. The amortized cost per append is O(n) / n = O(1).
Geometric growth is essential to this argument. If we instead grew the buffer by a constant amount c at every resize, every c-th append would trigger an O(n) copy, summing to O(n²) work for n appends and an amortized cost of O(n) per append. Doubling (or any growth factor strictly greater than 1) keeps the geometric sum bounded and the amortized cost constant.
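We can check the two growth strategies empirically. The sketch below counts the element copies performed over n appends under a pluggable growth rule (the function names are illustrative):

```typescript
// Count element copies for n appends under a given capacity-growth rule.
function countCopies(n: number, grow: (cap: number) => number): number {
  let copies = 0;
  let capacity = 1;
  let size = 0;
  for (let i = 0; i < n; i++) {
    if (size === capacity) {
      copies += size; // a resize copies every live element
      capacity = grow(capacity);
    }
    size++;
  }
  return copies;
}

const doubling = countCopies(10_000, (c) => c * 2); // stays below 2n
const constant = countCopies(10_000, (c) => c + 4); // grows quadratically
console.log(doubling < 2 * 10_000); // true
console.log(constant > 100 * doubling); // true — constant growth is far worse
```

For n = 10,000, doubling performs 1 + 2 + ⋯ + 8192 = 16,383 copies, comfortably under 2n, while grow-by-4 performs millions.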
Space overhead
There are two distinct sources of space overhead to account for, both absent from a fixed-size static array.
Steady-state slack. Between resizes, some of the allocated buffer is unused: capacity is at most twice the live size n, and the shrink rule keeps occupancy above 25% (except for the small initial buffer). So at any moment outside of a resize, the unused capacity is at most a constant multiple of n, and the dynamic array's footprint is bounded by a small constant times the footprint of its static-array counterpart.
Transient resize overhead. Resizing is not free of memory cost either. When the buffer becomes full at size n, the implementation allocates a new buffer of capacity 2n and copies every live element into it before the old buffer can be freed. During the copy, both buffers coexist, so the peak memory usage of a single resize is 3n slots, three times the live size, and strictly more than the steady-state bound. After the copy finishes and the old buffer is released, the footprint drops back to 2n. A static array, in contrast, is allocated once at its final size and never has this transient spike.
For long-lived programs this transient is rarely a problem: it is local to the resize and reclaimed immediately. But it is worth being aware of in two situations. First, when memory is tight (embedded systems, large in-memory datasets close to physical limits), a doubling-resize on a buffer of size n requires 2n free slots at the moment of the resize, even though the data structure "uses" only n. Second, the resize allocates a fresh contiguous block, which interacts with allocator fragmentation and (in garbage-collected runtimes like JavaScript) creates collectable garbage proportional to the old buffer's size.
Asymptotically, both sources of overhead are still O(n). The dynamic array uses O(n) space, with a small constant-factor overhead compared to a perfectly sized static array, but the constants differ between the steady state (at most 2n slots) and the brief peak during a resize (3n slots).
Summary
| Operation | Time | Notes |
|---|---|---|
| get(i) / set(i, v) | O(1) | Direct index access |
| append(v) | O(1) amortized | O(n) worst case during resize |
| insert(i, v) | O(n) | Must shift elements right |
| remove(i) | O(n) | Must shift elements left |
| indexOf(v) | O(n) | Linear scan |
Linked lists
A linked list stores elements in nodes that are scattered throughout memory, with each node containing a value and a pointer (reference) to the next node. Unlike arrays, linked lists do not require contiguous memory, and inserting or removing an element at a known position takes O(1) time, with no shifting required.
The trade-off is that random access is lost: to reach the i-th element, we must follow pointers from the head, taking O(i) time.
Singly linked lists
In a singly linked list, each node points to the next node. The list maintains a pointer to the head (first node) and, for efficiency, a pointer to the tail (last node).
head → [10 | •] → [20 | •] → [30 | null]
↑
tail
Implementation
class SinglyNode<T> {
constructor(
public value: T,
public next: SinglyNode<T> | null = null,
) {}
}
export class SinglyLinkedList<T> implements Iterable<T> {
private head: SinglyNode<T> | null = null;
private tail: SinglyNode<T> | null = null;
private length: number = 0;
get size(): number {
return this.length;
}
prepend(value: T): void {
const node = new SinglyNode(value, this.head);
this.head = node;
if (this.tail === null) {
this.tail = node;
}
this.length++;
}
append(value: T): void {
const node = new SinglyNode(value);
if (this.tail !== null) {
this.tail.next = node;
} else {
this.head = node;
}
this.tail = node;
this.length++;
}
removeFirst(): T | undefined {
if (this.head === null) return undefined;
const value = this.head.value;
this.head = this.head.next;
if (this.head === null) {
this.tail = null;
}
this.length--;
return value;
}
delete(value: T): boolean {
if (this.head === null) return false;
if (this.head.value === value) {
this.head = this.head.next;
if (this.head === null) this.tail = null;
this.length--;
return true;
}
let current = this.head;
while (current.next !== null) {
if (current.next.value === value) {
if (current.next === this.tail) this.tail = current;
current.next = current.next.next;
this.length--;
return true;
}
current = current.next;
}
return false;
}
find(value: T): boolean {
let current = this.head;
while (current !== null) {
if (current.value === value) return true;
current = current.next;
}
return false;
}
// ... iterator, toArray, etc.
}
Tracing through an example
Starting with an empty singly linked list, let us perform a sequence of operations:
| Operation | List state | size |
|---|---|---|
| append(10) | [10] | 1 |
| append(20) | [10] → [20] | 2 |
| prepend(5) | [5] → [10] → [20] | 3 |
| removeFirst() → 5 | [10] → [20] | 2 |
| delete(20) → true | [10] | 1 |
| append(30) | [10] → [30] | 2 |
Notice that prepend and removeFirst are both O(1) because they only touch the head pointer. Appending is also O(1) because we maintain a tail pointer. However, delete(value) requires a linear scan and is O(n).
A limitation of singly linked lists
Removing the last element is O(n) in a singly linked list, because we must traverse the entire list to find the node that precedes the tail. The doubly linked list solves this problem.
Doubly linked lists
In a doubly linked list, each node has pointers to both the next and previous nodes. This enables O(1) removal from both ends.
null ← [10 | •] ⇄ [20 | •] ⇄ [30 | •] → null
↑ ↑
head tail
Implementation
class DoublyNode<T> {
constructor(
public value: T,
public prev: DoublyNode<T> | null = null,
public next: DoublyNode<T> | null = null,
) {}
}
export class DoublyLinkedList<T> implements Iterable<T> {
private head: DoublyNode<T> | null = null;
private tail: DoublyNode<T> | null = null;
private length: number = 0;
get size(): number {
return this.length;
}
prepend(value: T): void {
const node = new DoublyNode(value, null, this.head);
if (this.head !== null) {
this.head.prev = node;
} else {
this.tail = node;
}
this.head = node;
this.length++;
}
append(value: T): void {
const node = new DoublyNode(value, this.tail, null);
if (this.tail !== null) {
this.tail.next = node;
} else {
this.head = node;
}
this.tail = node;
this.length++;
}
removeFirst(): T | undefined {
if (this.head === null) return undefined;
const value = this.head.value;
this.head = this.head.next;
if (this.head !== null) {
this.head.prev = null;
} else {
this.tail = null;
}
this.length--;
return value;
}
removeLast(): T | undefined {
if (this.tail === null) return undefined;
const value = this.tail.value;
this.tail = this.tail.prev;
if (this.tail !== null) {
this.tail.next = null;
} else {
this.head = null;
}
this.length--;
return value;
}
private removeNode(node: DoublyNode<T>): void {
if (node.prev !== null) {
node.prev.next = node.next;
} else {
this.head = node.next;
}
if (node.next !== null) {
node.next.prev = node.prev;
} else {
this.tail = node.prev;
}
this.length--;
}
// ... delete, find, iterators, etc.
}
The critical advantage is removeLast: by following the tail's prev pointer, we can unlink the last node in O(1) time without traversing the list. The removeNode helper detaches any node from the list in O(1) once we have a reference to it.
The cost of this flexibility is extra memory: each node stores two pointers instead of one. For large collections of small values, this overhead can be significant.
Comparing arrays and linked lists
| Operation | Dynamic array | Singly linked list | Doubly linked list |
|---|---|---|---|
| Access by index | O(1) | O(n) | O(n) |
| Prepend | O(n) | O(1) | O(1) |
| Append | O(1)* | O(1) | O(1) |
| Remove first | O(n) | O(1) | O(1) |
| Remove last | O(1)* | O(n) | O(1) |
| Insert at known position | O(n) | O(1) | O(1) |
| Search | O(n) | O(n) | O(n) |
| Memory per element | Low (contiguous) | +1 pointer | +2 pointers |
| Cache performance | Excellent | Poor | Poor |

* Amortized
When to use which:
- Dynamic array when you need fast random access or are iterating sequentially (cache-friendly).
- Singly linked list when insertions and deletions at the front dominate.
- Doubly linked list when you need efficient removal from both ends or deletion of arbitrary nodes (given a reference).
In practice, arrays and dynamic arrays dominate due to cache locality; modern CPUs are optimized for accessing contiguous memory. Linked lists shine in scenarios where elements are frequently inserted or removed at the endpoints, or when the data is too large to copy during a resize.
Abstract data types: stacks, queues, and deques
The data structures above (arrays and linked lists) are concrete implementations. Now we turn to abstract data types (ADTs): specifications of behavior that can be implemented in multiple ways. A stack, for instance, defines what operations are available (push, pop, peek) and what they do, without prescribing how to store the elements.
Stacks
A stack is a Last-In, First-Out (LIFO) collection. The most recently added element is the first one to be removed, like a stack of plates.
Interface
interface IStack<T> {
push(value: T): void; // Add to top
pop(): T | undefined; // Remove and return top
peek(): T | undefined; // Return top without removing
readonly size: number;
readonly isEmpty: boolean;
}
Implementation
A stack is naturally implemented as a linked list where both push and pop operate on the head:
interface StackNode<T> {
  value: T;
  next: StackNode<T> | null;
}
export class Stack<T> implements IStack<T>, Iterable<T> {
  private head: StackNode<T> | null = null;
  private length: number = 0;
  get size(): number { return this.length; }
  get isEmpty(): boolean { return this.length === 0; }
  push(value: T): void {
    this.head = { value, next: this.head };
    this.length++;
  }
  pop(): T | undefined {
    if (this.head === null) return undefined;
    const value = this.head.value;
    this.head = this.head.next;
    this.length--;
    return value;
  }
  peek(): T | undefined {
    return this.head?.value;
  }
  // ... iterator, etc.
}
All three operations — push, pop, peek — are O(1).
We could equally implement a stack with a dynamic array (push = append, pop = remove last). The array-based version has better cache locality, while the linked-list version avoids occasional resize costs. For most purposes in TypeScript, the built-in array with push/pop is the pragmatic choice; our implementation here serves pedagogical purposes.
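The built-in-array version mentioned above needs no class at all; a minimal sketch:

```typescript
// A plain array used as a stack: push and pop both operate on the end.
const stack: number[] = [];
stack.push(10);
stack.push(20);
stack.push(30);
console.log(stack[stack.length - 1]); // 30 (peek)
console.log(stack.pop());             // 30
console.log(stack.pop());             // 20
console.log(stack.length);            // 1
```

Both push and pop on the end of a JavaScript array are amortized O(1), mirroring the dynamic-array analysis earlier in this chapter.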
Applications
Stacks appear throughout Computer Science:
- Function call stack. When a function is called, its local variables and return address are pushed onto the call stack. When it returns, they are popped. This is why recursive algorithms can overflow the stack with too many nested calls.
- Parenthesis matching. To check whether brackets are balanced in an expression like ((a + b) * c), push each opening bracket and pop when a matching closing bracket is found.
- Undo/redo. Text editors push each action onto an undo stack. Undoing pops the most recent action.
- Depth-first search. DFS uses a stack (often the call stack via recursion) to track which vertices to visit next.
Tracing through an example
| Operation | Stack (top → bottom) | Returned |
|---|---|---|
| push(10) | 10 | — |
| push(20) | 20, 10 | — |
| push(30) | 30, 20, 10 | — |
| peek() | 30, 20, 10 | 30 |
| pop() | 20, 10 | 30 |
| pop() | 10 | 20 |
| push(40) | 40, 10 | — |
| pop() | 10 | 40 |
| pop() | (empty) | 10 |
Queues
A queue is a First-In, First-Out (FIFO) collection. Elements are added at the back and removed from the front, like a line of people waiting.
Interface
interface IQueue<T> {
enqueue(value: T): void; // Add to back
dequeue(): T | undefined; // Remove and return front
peek(): T | undefined; // Return front without removing
readonly size: number;
readonly isEmpty: boolean;
}
Implementation
A queue maps naturally onto a singly linked list with head and tail pointers: enqueue appends at the tail, dequeue removes from the head.
interface QueueNode<T> {
value: T;
next: QueueNode<T> | null;
}
export class Queue<T> implements IQueue<T>, Iterable<T> {
private head: QueueNode<T> | null = null;
private tail: QueueNode<T> | null = null;
private length: number = 0;
get size(): number { return this.length; }
get isEmpty(): boolean { return this.length === 0; }
enqueue(value: T): void {
const node: QueueNode<T> = { value, next: null };
if (this.tail !== null) {
this.tail.next = node;
} else {
this.head = node;
}
this.tail = node;
this.length++;
}
dequeue(): T | undefined {
if (this.head === null) return undefined;
const value = this.head.value;
this.head = this.head.next;
if (this.head === null) this.tail = null;
this.length--;
return value;
}
peek(): T | undefined {
return this.head?.value;
}
}
All operations are O(1).
An array-based queue is trickier: naively dequeuing from the front of an array is O(n) because every element must shift. A circular buffer solves this by wrapping indices around modulo the capacity, giving amortized O(1) enqueue and dequeue. Our linked-list implementation avoids this complexity altogether.
Applications
- Breadth-first search. BFS uses a queue to explore vertices level by level.
- Task scheduling. Operating systems use queues to schedule processes for CPU time.
- Buffering. Data streams (network packets, keyboard input) are buffered in queues.
- Level-order tree traversal. Visiting tree nodes level by level requires a queue.
Tracing through an example
| Operation | Queue (front → back) | Returned |
|---|---|---|
| enqueue(10) | 10 | — |
| enqueue(20) | 10, 20 | — |
| enqueue(30) | 10, 20, 30 | — |
| peek() | 10, 20, 30 | 10 |
| dequeue() | 20, 30 | 10 |
| dequeue() | 30 | 20 |
| enqueue(40) | 30, 40 | — |
| dequeue() | 40 | 30 |
Deques
A deque (double-ended queue, pronounced "deck") supports insertion and removal at both ends in O(1) time. It generalizes both stacks and queues.
Implementation
A deque maps directly onto a doubly linked list:
interface DequeNode<T> {
value: T;
prev: DequeNode<T> | null;
next: DequeNode<T> | null;
}
export class Deque<T> implements Iterable<T> {
private head: DequeNode<T> | null = null;
private tail: DequeNode<T> | null = null;
private length: number = 0;
get size(): number { return this.length; }
get isEmpty(): boolean { return this.length === 0; }
pushFront(value: T): void {
const node: DequeNode<T> = { value, prev: null, next: this.head };
if (this.head !== null) {
this.head.prev = node;
} else {
this.tail = node;
}
this.head = node;
this.length++;
}
pushBack(value: T): void {
const node: DequeNode<T> = { value, prev: this.tail, next: null };
if (this.tail !== null) {
this.tail.next = node;
} else {
this.head = node;
}
this.tail = node;
this.length++;
}
popFront(): T | undefined {
if (this.head === null) return undefined;
const value = this.head.value;
this.head = this.head.next;
if (this.head !== null) this.head.prev = null;
else this.tail = null;
this.length--;
return value;
}
popBack(): T | undefined {
if (this.tail === null) return undefined;
const value = this.tail.value;
this.tail = this.tail.prev;
if (this.tail !== null) this.tail.next = null;
else this.head = null;
this.length--;
return value;
}
peekFront(): T | undefined { return this.head?.value; }
peekBack(): T | undefined { return this.tail?.value; }
}
All six operations — pushFront, pushBack, popFront, popBack, peekFront, peekBack — are O(1).
Using a deque as a stack or queue
A deque subsumes both stacks and queues:
- As a stack: use pushFront/popFront (or pushBack/popBack).
- As a queue: use pushBack/popFront.
This flexibility makes the deque a useful building block when the access pattern is uncertain, or when both ends are needed.
Applications
- Sliding window maximum. In the classic interview problem "maximum in every window of size k," a deque holds indices of potential maximums. Elements are added at the back and removed from the front (when they fall out of the window) or from the back (when a larger element supersedes them).
- Work-stealing schedulers. Each thread has a deque of tasks. It pops from its own front, while idle threads steal from other deques' backs.
- Palindrome checking. Push characters from both ends; pop from both ends and compare.
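The palindrome idea above can be sketched with a plain array standing in for the deque: shift models popFront and pop models popBack.

```typescript
// Palindrome check with deque semantics: compare front and back until they meet.
function isPalindrome(s: string): boolean {
  const deque = [...s]; // push each character onto the deque
  while (deque.length > 1) {
    if (deque.shift() !== deque.pop()) return false; // popFront vs popBack
  }
  return true;
}

console.log(isPalindrome("racecar")); // true
console.log(isPalindrome("deque"));   // false
```

Note that Array.prototype.shift is O(n), so this sketch trades the real deque's O(1) popFront for brevity; the doubly-linked Deque above performs both pops in O(1).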
Complexity comparison
| | DynamicArray | SinglyLinkedList | DoublyLinkedList | Stack | Queue | Deque |
|---|---|---|---|---|---|---|
| Add front | O(n) | O(1) | O(1) | O(1) | — | O(1) |
| Add back | O(1)* | O(1) | O(1) | — | O(1) | O(1) |
| Remove front | O(n) | O(1) | O(1) | O(1) | O(1) | O(1) |
| Remove back | O(1)* | O(n) | O(1) | — | — | O(1) |
| Access by index | O(1) | O(n) | O(n) | — | — | — |
| Search | O(n) | O(n) | O(n) | — | — | — |

* Amortized
Summary
This chapter introduced the foundational data structures upon which nearly everything else is built:
- Dynamic arrays provide O(1) random access and amortized O(1) append via the doubling strategy. Insert and remove at arbitrary positions cost O(n) due to shifting.
- Singly linked lists offer O(1) insertion and removal at the head, and O(1) append with a tail pointer, but sacrifice random access and efficient removal from the tail.
- Doubly linked lists add back-pointers for O(1) removal at both ends, at the cost of extra memory per node.
- Stacks (LIFO) are the workhorse of recursion, expression evaluation, and depth-first search.
- Queues (FIFO) power breadth-first search, task scheduling, and buffering.
- Deques generalize stacks and queues, supporting O(1) operations at both ends.
The choice between arrays and linked lists comes down to access patterns. If you need random access or sequential iteration (where cache locality matters), use an array. If insertions and deletions at the endpoints dominate, use a linked list. When in doubt, the dynamic array is usually the right default — it is what most languages provide as their standard collection.
In the next chapter, we will use these building blocks to construct hash tables, which achieve expected O(1) lookup by combining arrays with a hash function.
Exercises
Exercise 7.1. Implement a function isBalanced(expression: string): boolean that uses a Stack to determine whether the parentheses (), brackets [], and braces {} in an expression are properly balanced. For example, isBalanced("((a+b)*[c-d)]") should return false (the bracket and parenthesis close in the wrong order), while isBalanced("{a*(b+c)}") should return true.
Exercise 7.2. Implement a circular buffer–based queue. Use a fixed-size array and two indices (front and back) that wrap around using modular arithmetic. Compare its performance characteristics with our linked-list–based Queue.
Exercise 7.3. Implement a MinStack<T> that supports push, pop, peek, and an additional min() operation that returns the minimum element in the stack — all in O(1) time. Hint: maintain a second stack that tracks minimums.
Exercise 7.4. Using only two Stacks, implement a Queue. Analyze the amortized time complexity of enqueue and dequeue. Hint: use one stack for enqueuing and another for dequeuing; transfer elements between them lazily.
Exercise 7.5. Implement a function slidingWindowMax(arr: number[], k: number): number[] that returns the maximum value in each window of size k as the window slides from left to right across the array. Use a Deque to achieve O(n) time complexity.
Hash Tables
The data structures of the previous chapter — arrays, linked lists, stacks, and queues — support searching in O(n) time at best. Binary search trees (which we will study in Chapter 9) reduce this to O(log n), but can we do even better? Hash tables achieve expected O(1) time for insertions, deletions, and lookups by using a hash function to compute the index where each element should be stored. This makes hash tables one of the most important and widely used data structures in software engineering. In this chapter we explore how hash functions work, how to handle collisions when two keys map to the same index, and how to build hash tables that resize dynamically to maintain their performance guarantees.
The dictionary problem
Many problems reduce to maintaining a collection of key-value pairs that supports three operations:
- Insert a new key-value pair (or update the value if the key exists).
- Lookup the value associated with a given key.
- Delete a key-value pair.
This is the dictionary abstract data type (also called a map or associative array). JavaScript's built-in Map and Python's dict are both dictionaries backed by hash tables.
Direct addressing
The simplest approach is direct addressing: use the key itself as an index into an array. If keys are integers in the range [0, m − 1], we allocate an array of size m and store the value for key k at index k. All three operations are O(1).
// Direct-address table for integer keys in [0, m-1]
const table = new Array<string | undefined>(1000);
table[42] = 'Alice'; // insert
const name = table[42]; // lookup — O(1)
table[42] = undefined; // delete
Direct addressing has a fatal flaw: the key space must be small and dense. If keys are strings, or integers drawn from a huge range, allocating an array large enough to hold every possible key is impractical. We need a way to map a large key space into a small array.
Hash functions
A hash function maps keys from a large universe U to indices in a table of size m: h : U → {0, 1, …, m − 1}.
Given a key k, the hash function computes h(k), which is the index (or bucket) where the key should be stored. A good hash function has two properties:
- Determinism. The same key always produces the same hash.
- Uniformity. Different keys should spread as evenly as possible across the buckets, minimizing collisions.
The division method
The simplest hash function for integer keys is the division method: h(k) = k mod m.
This maps any non-negative integer k to {0, …, m − 1}. The choice of m matters: if m is a power of 2, the hash uses only the lowest-order bits of k, which can lead to clustering. Prime values of m tend to distribute keys more uniformly.
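The division method is a one-liner; a sketch (the prime 101 is an arbitrary illustrative table size):

```typescript
// Division method: h(k) = k mod m. Prefer a prime m to avoid clustering.
function divisionHash(key: number, m: number): number {
  return key % m;
}

console.log(divisionHash(1234, 101)); // 22
console.log(divisionHash(1024, 8));   // 0 — a power-of-2 m uses only the low bits
```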
The multiplication method
The multiplication method avoids the sensitivity to m: h(k) = ⌊m · (kA mod 1)⌋,
where A is a constant in the range 0 < A < 1. Knuth suggests A = (√5 − 1)/2 ≈ 0.618. The expression kA mod 1 extracts the fractional part of kA, which is then scaled to [0, m). This method works well regardless of whether m is a power of 2.
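The multiplication method translates directly into code; a sketch using Knuth's suggested constant:

```typescript
// Multiplication method: h(k) = floor(m * frac(k * A)), with A = (sqrt(5) - 1) / 2.
const A = (Math.sqrt(5) - 1) / 2; // ≈ 0.618
function multiplicationHash(key: number, m: number): number {
  const frac = (key * A) % 1; // fractional part of k·A
  return Math.floor(m * frac);
}

// Results always land in [0, m), even when m is a power of 2.
console.log(multiplicationHash(123456, 1024));
```

Because the result depends on floating-point arithmetic, the exact bucket can vary in the last bits across platforms; the distribution properties are what matter.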
Hashing strings
For string keys, we need to convert a sequence of characters into an integer. A standard approach is a polynomial rolling hash: h(s) = (s_0 · b^(k−1) + s_1 · b^(k−2) + ⋯ + s_(k−1)) mod m,
where s_i is the character code at position i, b is a prime base (often 31 or 37), and m is the table size. Variants of this idea include the FNV (Fowler–Noll–Vo) hash, which alternates XOR and multiplication to achieve good distribution with simple operations:
function fnvHash(key: string): number {
let h = 0x811c9dc5; // FNV offset basis
for (let i = 0; i < key.length; i++) {
h ^= key.charCodeAt(i);
h = Math.imul(h, 0x01000193); // FNV prime
}
return h >>> 0; // ensure non-negative 32-bit integer
}
The >>> 0 at the end is a JavaScript idiom that converts a possibly negative 32-bit integer to an unsigned 32-bit integer, ensuring we get a non-negative result suitable for use as an array index.
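For comparison with FNV, the polynomial rolling hash described above can be sketched with Horner's rule, reducing mod m at each step to keep the intermediate value small:

```typescript
// Polynomial rolling hash: h(s) = (s_0*b^(k-1) + ... + s_(k-1)) mod m, base b = 31.
function polynomialHash(key: string, m: number): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) % m; // Horner's rule, reduced each step
  }
  return h;
}

const idx = polynomialHash("hello", 101); // deterministic, always in [0, m)
```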
Universal hashing
No single hash function can avoid collisions for every possible input. An adversary who knows the hash function can deliberately choose keys that all hash to the same bucket, degrading performance to O(n).
Universal hashing defeats this by choosing the hash function randomly from a family of functions at construction time. A family H of hash functions from U to {0, …, m − 1} is universal if, for any two distinct keys k₁ ≠ k₂: Pr[h(k₁) = h(k₂)] ≤ 1/m, where the probability is taken over the random choice of h from H.
When the hash function is chosen randomly, no input distribution can consistently cause collisions, giving us expected O(1) performance regardless of the input.
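A classic universal family has the form h(k) = ((a·k + b) mod p) mod m, with a prime p larger than any key and a, b drawn at random once, at construction time. A sketch for small integer keys (the function name and the prime 2³¹ − 1 are illustrative assumptions; large keys would need overflow-safe arithmetic):

```typescript
// Universal-hashing sketch: fix random a and b once; reuse them for every key.
function makeUniversalHash(m: number, p = 2147483647): (key: number) => number {
  const a = 1 + Math.floor(Math.random() * (p - 1)); // a in [1, p-1]
  const b = Math.floor(Math.random() * p);           // b in [0, p-1]
  return (key: number) => ((a * key + b) % p) % m;
}

const h = makeUniversalHash(16);
console.log(h(42) === h(42)); // true — deterministic once the function is chosen
```

The randomness lives in the choice of a and b, not in individual hash calls, so the two properties from earlier (determinism, uniformity) both hold.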
Collision resolution
Since |U| > m, multiple keys will inevitably hash to the same bucket — a collision. The two primary strategies for handling collisions are separate chaining and open addressing.
Separate chaining
In separate chaining, each bucket stores a linked list (or chain) of all key-value pairs that hash to that index. Insertions prepend to the chain; lookups and deletions walk the chain until the key is found.
How it works
Consider a hash table with m = 4 buckets after inserting five keys with hashes as shown:
Bucket 0: → (key₁, val₁) → (key₅, val₅) → null
Bucket 1: → (key₂, val₂) → null
Bucket 2: → null
Bucket 3: → (key₃, val₃) → (key₄, val₄) → null
Keys 1 and 5 collide at bucket 0; keys 3 and 4 collide at bucket 3. Lookups for key₅ must traverse two nodes in bucket 0.
Load factor
The load factor α = n/m is the average number of elements per bucket, where n is the number of stored entries and m is the number of buckets. Under the simple uniform hashing assumption (each key is equally likely to hash to any bucket), the expected chain length is α.
- If α is kept constant (say, α ≤ 0.75), the expected time for any operation is O(1).
- If we never resize, α grows with n, and operations degrade to O(n).
Implementation
Our HashTableChaining<K, V> maintains an array of bucket heads (each either a chain node or null) and doubles the array when α ≥ 0.75:
class ChainNode<K, V> {
constructor(
public key: K,
public value: V,
public next: ChainNode<K, V> | null = null,
) {}
}
export class HashTableChaining<K, V> implements Iterable<[K, V]> {
private buckets: (ChainNode<K, V> | null)[];
private count = 0;
constructor(initialCapacity = 16) {
const cap = Math.max(1, initialCapacity);
this.buckets = new Array<ChainNode<K, V> | null>(cap).fill(null);
}
get size(): number {
return this.count;
}
get capacity(): number {
return this.buckets.length;
}
get loadFactor(): number {
return this.count / this.buckets.length;
}
The set method searches the chain at the target bucket. If the key is found, its value is updated; otherwise a new node is prepended:
set(key: K, value: V): V | undefined {
if (this.count / this.buckets.length >= 0.75) {
this.resize(this.buckets.length * 2);
}
const idx = this.bucketIndex(key);
let node: ChainNode<K, V> | null = this.buckets[idx]!;
while (node !== null) {
if (Object.is(node.key, key)) {
const old = node.value;
node.value = value;
return old;
}
node = node.next;
}
// Prepend to the bucket chain
this.buckets[idx] = new ChainNode(key, value, this.buckets[idx]!);
this.count++;
return undefined;
}
The get and delete methods follow the same pattern — compute the bucket index, then walk the chain:
get(key: K): V | undefined {
const idx = this.bucketIndex(key);
let node: ChainNode<K, V> | null = this.buckets[idx]!;
while (node !== null) {
if (Object.is(node.key, key)) {
return node.value;
}
node = node.next;
}
return undefined;
}
delete(key: K): boolean {
const idx = this.bucketIndex(key);
let node: ChainNode<K, V> | null = this.buckets[idx]!;
let prev: ChainNode<K, V> | null = null;
while (node !== null) {
if (Object.is(node.key, key)) {
if (prev !== null) {
prev.next = node.next;
} else {
this.buckets[idx] = node.next;
}
this.count--;
return true;
}
prev = node;
node = node.next;
}
return false;
}
We use Object.is for key comparison rather than === because Object.is treats NaN as equal to itself (whereas NaN === NaN is false), so a NaN key can still be found once stored.
Dynamic resizing
When the load factor reaches the threshold, we allocate a new array with double the capacity and rehash every entry:
private resize(newCapacity: number): void {
const oldBuckets = this.buckets;
this.buckets = new Array<ChainNode<K, V> | null>(newCapacity).fill(null);
this.count = 0;
for (let b = 0; b < oldBuckets.length; b++) {
let node: ChainNode<K, V> | null = oldBuckets[b]!;
while (node !== null) {
this.set(node.key, node.value);
node = node.next;
}
}
}
Resizing costs O(n) in the worst case, but by the same amortized argument as dynamic arrays (Chapter 7), the cost per insertion averages O(1) over a sequence of operations.
Tracing through an example
Let us trace insertions into a hash table with 4 buckets. We use the simple hash h(k) = k mod 4 for clarity:
| Operation | Hash | Bucket state | Size |
|---|---|---|---|
| set(5, "a") | 5 mod 4 = 1 | B1: (5,"a") | 1 |
| set(9, "b") | 9 mod 4 = 1 | B1: (9,"b")→(5,"a") | 2 |
| set(3, "c") | 3 mod 4 = 3 | B1: (9,"b")→(5,"a"), B3: (3,"c") | 3 |
| set(5, "d") | 5 mod 4 = 1 | B1: (9,"b")→(5,"d") — value updated | 3 |
| delete(9) | 9 mod 4 = 1 | B1: (5,"d") | 2 |
Keys 5 and 9 collide at bucket 1. Setting key 5 again updates its value without increasing the size. Deleting key 9 removes it from the chain.
Open addressing
In open addressing, all entries are stored directly in the table array — there are no linked lists. When a collision occurs, we probe a sequence of alternative slots until an empty one is found.
The probe sequence for key k is a permutation of the table indices: h(k, 0), h(k, 1), …, h(k, m − 1).
We try slot h(k, 0) first; if it is occupied, we try h(k, 1), and so on.
Linear probing
The simplest probing strategy is linear probing:

$$h(k, i) = (h_1(k) + i) \bmod m$$

where $h_1$ is the primary hash. This means we simply try the next slot, then the one after that, wrapping around the end of the array.
Linear probing is cache-friendly because it accesses consecutive memory locations. However, it suffers from primary clustering: a contiguous block of occupied slots tends to grow, because any key that hashes into the cluster must probe to its end. Long clusters slow down both insertions and lookups.
Double hashing
Double hashing uses a second hash function to compute the probe step:

$$h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$$

where $h_1$ is the primary hash and $h_2$ determines the step size. Different keys that collide at $h_1$ will have different step sizes, breaking up clusters.
For double hashing to work correctly, $h_2(k)$ must be coprime to $m$ so that the probe sequence visits every slot. A common choice is to make $m$ a power of 2 and ensure $h_2(k)$ is always odd.
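To see why an odd step suffices when the capacity is a power of 2, here is a quick sketch. The `oddStep` helper is hypothetical (a stand-in for a real secondary hash, not the book's `secondaryHash`); the point is that any odd step is coprime to $2^k$, so the probe sequence is a full permutation of the slots.

```typescript
// Sketch: with a power-of-2 capacity, any odd step is coprime to the
// capacity, so the probe sequence visits every slot exactly once.
// `oddStep` is a hypothetical stand-in for a real secondary hash.
function oddStep(hash: number, capacity: number): number {
  return (hash % capacity) | 1; // force the low bit on: always odd
}

function probeSequence(start: number, step: number, capacity: number): number[] {
  const seq: number[] = [];
  let idx = start % capacity;
  for (let i = 0; i < capacity; i++) {
    seq.push(idx);
    idx = (idx + step) % capacity;
  }
  return seq;
}

const seq = probeSequence(3, oddStep(42, 8), 8); // step = 3
console.log(seq);                // [3, 6, 1, 4, 7, 2, 5, 0]
console.log(new Set(seq).size);  // 8 — every slot visited
```

With an even step (say 2) and capacity 8, the sequence would cycle through only half the slots, which is exactly the failure mode the coprimality condition rules out.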
Tombstones and lazy deletion
Deleting from an open-addressed table is tricky. Simply clearing a slot would break probe sequences: if key $k_2$ was placed after probing past slot $i$ (which held key $k_1$), clearing slot $i$ would make $k_2$ unreachable.
The solution is lazy deletion with tombstones. When we delete a key, we mark its slot with a special sentinel value (the tombstone). During lookups, tombstones are treated as occupied (we continue probing past them). During insertions, tombstones can be reused.
Slot 0: ── (key₁, val₁)
Slot 1: ── TOMBSTONE ← deleted entry
Slot 2: ── (key₃, val₃) ← still reachable past tombstone
Slot 3: ── empty
Over time, tombstones accumulate and degrade performance. When we resize the table, tombstones are discarded, restoring clean probe sequences.
Load factor for open addressing
Open addressing is more sensitive to load factor than chaining. As the table fills up, probe sequences get longer. At load factor $\alpha$, the expected number of probes for an unsuccessful search under uniform hashing is:

$$\frac{1}{1 - \alpha}$$

At $\alpha = 0.5$, this is 2 probes. At $\alpha = 0.75$, it is 4. At $\alpha = 0.9$, it is 10. For this reason, open-addressed tables typically resize at $\alpha = 0.5$ — more aggressively than chaining tables.
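The growth of $1/(1-\alpha)$ is easy to tabulate with a throwaway sketch:

```typescript
// Expected probes for an unsuccessful search under uniform hashing: 1/(1 - α).
const expectedProbes = (alpha: number): number => 1 / (1 - alpha);

for (const alpha of [0.5, 0.75, 0.9, 0.95]) {
  console.log(alpha, expectedProbes(alpha).toFixed(1));
}
// 0.5 → 2.0, 0.75 → 4.0, 0.9 → 10.0, 0.95 → 20.0
```

The cost stays modest up to half full, then climbs steeply, which is why the resize threshold for open addressing is chosen so conservatively.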
Implementation
Our HashTableOpenAddressing<K, V> supports both linear probing and double hashing:
const TOMBSTONE = Symbol('TOMBSTONE');
interface Slot<K, V> {
key: K;
value: V;
}
type BucketEntry<K, V> = Slot<K, V> | typeof TOMBSTONE | undefined;
export class HashTableOpenAddressing<K, V> implements Iterable<[K, V]> {
private slots: BucketEntry<K, V>[];
private count = 0;
private tombstoneCount = 0;
private readonly strategy: 'linear' | 'double-hashing';
constructor(
initialCapacity = 16,
strategy: 'linear' | 'double-hashing' = 'linear',
) {
this.strategy = strategy;
const cap = nextPowerOf2(Math.max(1, initialCapacity));
this.slots = new Array<BucketEntry<K, V>>(cap);
}
The set method probes for an empty slot or a matching key:
set(key: K, value: V): V | undefined {
if ((this.count + this.tombstoneCount) / this.slots.length >= 0.5) {
this.rebuild(this.slots.length * 2);
}
const cap = this.slots.length;
const h1 = primaryHash(key) % cap;
const step = this.strategy === 'double-hashing'
? secondaryHash(key, cap) : 1;
let firstTombstone = -1;
let idx = h1;
for (let i = 0; i < cap; i++) {
const slot = this.slots[idx];
if (slot === undefined) {
const insertIdx = firstTombstone !== -1 ? firstTombstone : idx;
this.slots[insertIdx] = { key, value };
this.count++;
if (firstTombstone !== -1) this.tombstoneCount--;
return undefined;
}
if (slot === TOMBSTONE) {
if (firstTombstone === -1) firstTombstone = idx;
} else if (Object.is(slot.key, key)) {
const old = slot.value;
slot.value = value;
return old;
}
idx = (idx + step) % cap;
}
// Unreachable: the load-factor check above guarantees an empty slot exists.
return undefined;
}
Notice the firstTombstone optimization: if we pass a tombstone during the probe sequence, we remember its position. If the key is not in the table, we insert at the first tombstone rather than probing all the way to an empty slot. This recycles tombstones and prevents them from accumulating.
The resize check counts both live entries and tombstones against the load threshold. When we rebuild, tombstones are discarded:
private rebuild(newCapacity: number): void {
const cap = nextPowerOf2(Math.max(1, newCapacity));
const oldSlots = this.slots;
this.slots = new Array<BucketEntry<K, V>>(cap);
this.count = 0;
this.tombstoneCount = 0;
for (const slot of oldSlots) {
if (slot !== undefined && slot !== TOMBSTONE) {
this.set(slot.key, slot.value);
}
}
}
Tracing through linear probing
Let us trace insertions into a table of size 8 using linear probing with $h_1(k) = k \bmod 8$:
| Operation | Hash | Probes | Result |
|---|---|---|---|
| set(3, "a") | 3 | 3 | Slot 3 ← (3,"a") |
| set(11, "b") | 3 | 3→4 | Collision at 3, slot 4 ← (11,"b") |
| set(19, "c") | 3 | 3→4→5 | Collision at 3, 4; slot 5 ← (19,"c") |
| delete(11) | 3 | 3→4 | Slot 4 ← TOMBSTONE |
| get(19) | 3 | 3→4→5 | Probes past tombstone at 4, finds at 5 |
| set(27, "d") | 3 | 3→4 | Reuses tombstone at 4 ← (27,"d") |
The tombstone at slot 4 ensures that get(19) does not stop prematurely after passing the deleted slot.
Chaining vs open addressing
| Property | Chaining | Open addressing |
|---|---|---|
| Extra memory | Linked list nodes | None (entries in table) |
| Cache performance | Poor (pointer chasing) | Good (sequential probes) |
| Load factor tolerance | Works well up to $\alpha \approx 1$ | Degrades rapidly above $\alpha \approx 0.7$ |
| Deletion | Simple | Requires tombstones |
| Worst case (all collisions) | $O(n)$ | $O(n)$ |
| Implementation complexity | Simpler | More subtle |
In practice, open addressing with linear probing tends to outperform chaining for moderate load factors thanks to cache locality. Chaining is more forgiving when the load factor varies or when deletions are frequent. Modern high-performance hash maps (like Google's SwissTable or Rust's HashMap) use sophisticated open-addressing schemes with SIMD-accelerated probing.
Applications
Hash tables are ubiquitous. Here are a few classic applications:
Frequency counting
Count how many times each word appears in a text:
function wordFrequency(words: string[]): Map<string, number> {
const freq = new Map<string, number>();
for (const word of words) {
freq.set(word, (freq.get(word) ?? 0) + 1);
}
return freq;
}
This runs in expected $O(n)$ time, where $n$ is the number of words. Without a hash table, we would need $O(n \log n)$ (sorting) or $O(n^2)$ (brute force).
Two-sum problem
Given an array of numbers and a target sum, find two elements that add up to the target:
function twoSum(nums: number[], target: number): [number, number] | null {
const seen = new Map<number, number>(); // value → index
for (let i = 0; i < nums.length; i++) {
const complement = target - nums[i];
const j = seen.get(complement);
if (j !== undefined) return [j, i];
seen.set(nums[i], i);
}
return null;
}
Each element is inserted and looked up once, giving expected $O(n)$ time.
Anagram detection
Two strings are anagrams if they contain the same characters with the same frequencies. We can check this by counting character frequencies in both strings and comparing:
function areAnagrams(a: string, b: string): boolean {
if (a.length !== b.length) return false;
const counts = new Map<string, number>();
for (const ch of a) counts.set(ch, (counts.get(ch) ?? 0) + 1);
for (const ch of b) {
const c = (counts.get(ch) ?? 0) - 1;
if (c < 0) return false;
counts.set(ch, c);
}
return true;
}
This is $O(n)$, where $n$ is the string length, versus $O(n \log n)$ for sorting both strings and comparing.
Deduplication
Remove duplicate elements from an array while preserving order:
function deduplicate<T>(arr: T[]): T[] {
const seen = new Set<T>();
const result: T[] = [];
for (const item of arr) {
if (!seen.has(item)) {
seen.add(item);
result.push(item);
}
}
return result;
}
A Set is essentially a hash table that stores only keys (no values).
Complexity summary
| Operation | Chaining (expected) | Chaining (worst) | Open addressing (expected) | Open addressing (worst) |
|---|---|---|---|---|
| Insert | $O(1)$ | $O(n)$ | $O(1)$ | $O(n)$ |
| Lookup | $O(1)$ | $O(n)$ | $O(1)$ | $O(n)$ |
| Delete | $O(1)$ | $O(n)$ | $O(1)$ | $O(n)$ |
| Space | $O(n)$ | — | $O(n)$ | — |
The expected complexities hold under the assumptions that the hash function distributes keys uniformly and the load factor is bounded by a constant.
Summary
Hash tables achieve expected $O(1)$ time for insert, lookup, and delete by using a hash function to map keys to array indices. The two main collision resolution strategies are:
- Separate chaining stores colliding entries in linked lists at each bucket. It is simple, tolerates high load factors, and handles deletions cleanly. The cost is extra memory for list nodes and poor cache locality.
- Open addressing stores all entries directly in the table array, probing for alternative slots on collision. Linear probing is cache-friendly but susceptible to clustering; double hashing eliminates clustering at the cost of additional hash computations. Deletions require tombstones to preserve probe sequences.
The load factor $\alpha = n/m$ controls performance. Chaining tables typically resize at $\alpha = 0.75$; open-addressed tables at $\alpha = 0.5$. Dynamic resizing (doubling the table and rehashing all entries) maintains the load factor within bounds, giving amortized $O(1)$ insertions.
Hash tables are the backbone of frequency counting, deduplication, two-sum–style problems, caching, and countless other applications. Their expected $O(1)$ operations make them the go-to data structure whenever fast key-based access is needed — though their worst-case behavior means they are not a substitute for balanced search trees when guaranteed performance is required.
In the next chapter, we study trees and binary search trees, which provide $O(h)$ operations (where $h$ is the tree height) and support order-based queries that hash tables cannot efficiently answer.
Exercises
Exercise 8.1. Implement a function groupAnagrams(words: string[]): string[][] that groups an array of words into sub-arrays of anagrams. For example, groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"]) should return [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]] (in any order). Use a hash table where the key is the sorted characters of each word.
Exercise 8.2. Our open-addressing implementation uses a load factor threshold of 0.5 and doubles the table when exceeded. Experiment with different thresholds (0.6, 0.7, 0.8) and measure the average number of probes per lookup on random data. At what point does performance degrade noticeably?
Exercise 8.3. Implement a HashSet<T> class backed by HashTableChaining<T, boolean>. Support add, has, delete, size, and iteration. How does this compare to using TypeScript's built-in Set?
Exercise 8.4. The cuckoo hashing scheme uses two hash functions and two tables. Each key has exactly two possible locations — one in each table. If both are occupied during an insertion, one of the existing keys is "kicked out" and re-inserted using its alternate location. Research cuckoo hashing and explain: (a) why lookup is $O(1)$ worst case, (b) under what conditions insertion might fail, and (c) how to handle insertion failures.
Exercise 8.5. Our hash function uses FNV-1a for strings and a bit-mixing scheme for numbers. Design an experiment to test how uniformly these functions distribute keys. Generate 10,000 random strings (and separately, 10,000 random integers), hash each into a table of 1,000 buckets, and compute the chi-squared statistic. Compare with a theoretically perfect uniform distribution.
Trees and Binary Search Trees
Hash tables give us expected $O(1)$ lookups, but they cannot answer order-based queries: what is the smallest key? What is the next key after $k$? What are all keys in the range $[a, b]$? Trees restore this capability. A binary search tree stores elements in a way that mirrors binary search — at every node, all smaller elements are to the left and all larger elements are to the right. This gives us $O(h)$ search, insert, and delete operations, where $h$ is the height of the tree. In this chapter we develop the fundamental vocabulary of trees, study the four standard traversal orders, and build a complete binary search tree implementation.
Tree terminology
A tree is a connected, acyclic graph. In Computer Science we almost always work with rooted trees, where one node is designated as the root and all other nodes are arranged in a parent-child hierarchy descending from it.
Key definitions:
- Node: an element of the tree, containing a value and links to its children.
- Root: the topmost node; it has no parent.
- Parent: the node directly above a given node.
- Child: a node directly below a given node.
- Leaf: a node with no children (also called an external node).
- Internal node: a node with at least one child.
- Sibling: nodes that share the same parent.
- Subtree: the tree rooted at a given node, consisting of that node and all its descendants.
- Depth of a node: the number of edges from the root to that node. The root has depth 0.
- Height of a node: the number of edges on the longest path from that node down to a leaf. A leaf has height 0.
- Height of the tree: the height of the root. An empty tree has height $-1$ by convention.
- Level $d$: the set of all nodes at depth $d$.
- Degree of a node: the number of children it has.
Binary trees
A binary tree is a tree in which every node has at most two children, called the left child and the right child. Binary trees are the most fundamental tree structure in Computer Science, underpinning search trees, heaps, expression parsers, and many other data structures.
Representations
There are two common ways to represent a binary tree:
Linked representation. Each node is an object with a value and two pointers (left and right). This is the most flexible representation and the one we use throughout this book:
class BinaryTreeNode<T> {
constructor(
public value: T,
public left: BinaryTreeNode<T> | null = null,
public right: BinaryTreeNode<T> | null = null,
) {}
}
Array representation. For a complete binary tree (where every level except possibly the last is fully filled), we can store nodes in an array by level order. The root is at index 0, and for a node at index $i$:
- Left child: index $2i + 1$
- Right child: index $2i + 2$
- Parent: index $\lfloor (i - 1)/2 \rfloor$
This representation avoids pointer overhead and is used for binary heaps (Chapter 11).
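The index arithmetic can be captured in three one-line helpers (a sketch; the names are ours, not from the repository):

```typescript
// Array representation of a complete binary tree, stored in level order.
const leftChild = (i: number): number => 2 * i + 1;
const rightChild = (i: number): number => 2 * i + 2;
const parentOf = (i: number): number => Math.floor((i - 1) / 2);

// A complete tree with values 1..6 in level order.
const tree = [1, 2, 3, 4, 5, 6];
console.log(tree[leftChild(0)]);   // 2
console.log(tree[rightChild(1)]);  // 5
console.log(parentOf(5));          // 2
```

Because children and parents are reachable by pure index arithmetic, no pointers are stored at all — which is exactly why binary heaps use this layout.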
Properties of binary trees
A binary tree of height $h$ has:
- At most $2^{h+1} - 1$ nodes (when every level is full — a perfect binary tree).
- At least $h + 1$ nodes (when every internal node has exactly one child — a degenerate or skewed tree).
- At most $2^h$ leaves.
A binary tree with $n$ nodes has height between $\lfloor \log_2 n \rfloor$ and $n - 1$.
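These bounds are easy to sanity-check numerically (a quick sketch, not repository code):

```typescript
// Node-count bounds for a binary tree of height h.
const maxNodes = (h: number): number => 2 ** (h + 1) - 1; // perfect tree
const minNodesSkewed = (h: number): number => h + 1;      // one node per level
const maxLeaves = (h: number): number => 2 ** h;

console.log(maxNodes(3));       // 15
console.log(minNodesSkewed(3)); // 4
console.log(maxLeaves(3));      // 8

// Height bounds for a tree with n nodes: between floor(log2 n) and n - 1.
const n = 15;
console.log(Math.floor(Math.log2(n)), n - 1); // 3 14
```

A perfect tree of height 3 packs 15 nodes into 4 levels, while a skewed tree needs 15 levels for the same 15 nodes — the gap that balancing (Chapter 10) exists to close.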
Tree traversals
A traversal visits every node in the tree exactly once. The order of visitation defines the traversal type. For a binary tree, there are four standard traversals.
Inorder traversal (left, root, right)
Visit the left subtree, then the root, then the right subtree. For a binary search tree, inorder traversal produces values in sorted order.
1
/ \
2 3
/ \ \
4 5 6
Inorder: 4, 2, 5, 1, 3, 6
inorder(): T[] {
const result: T[] = [];
this.inorderHelper(this.root, result);
return result;
}
private inorderHelper(node: BinaryTreeNode<T> | null, result: T[]): void {
if (node === null) return;
this.inorderHelper(node.left, result);
result.push(node.value);
this.inorderHelper(node.right, result);
}
The recursion mirrors the traversal definition directly: recurse left, process the current node, recurse right.
Preorder traversal (root, left, right)
Visit the root first, then the left subtree, then the right subtree. Preorder traversal is useful for serializing a tree (e.g., to reconstruct it later) because the root always comes before its children.
Preorder: 1, 2, 4, 5, 3, 6
private preorderHelper(node: BinaryTreeNode<T> | null, result: T[]): void {
if (node === null) return;
result.push(node.value);
this.preorderHelper(node.left, result);
this.preorderHelper(node.right, result);
}
Postorder traversal (left, right, root)
Visit the left subtree, then the right subtree, then the root. Postorder traversal processes children before their parent, making it useful for deleting a tree (free children before the parent) or evaluating expression trees (evaluate operands before the operator).
Postorder: 4, 5, 2, 6, 3, 1
private postorderHelper(node: BinaryTreeNode<T> | null, result: T[]): void {
if (node === null) return;
this.postorderHelper(node.left, result);
this.postorderHelper(node.right, result);
result.push(node.value);
}
Level-order traversal (breadth-first)
Visit nodes level by level, from left to right. Unlike the three depth-first traversals above, level-order traversal uses a queue rather than recursion:
Level-order: 1, 2, 3, 4, 5, 6
levelOrder(): T[] {
if (this.root === null) return [];
const result: T[] = [];
const queue: BinaryTreeNode<T>[] = [this.root];
while (queue.length > 0) {
const node = queue.shift()!;
result.push(node.value);
if (node.left !== null) queue.push(node.left);
if (node.right !== null) queue.push(node.right);
}
return result;
}
We enqueue the root, then repeatedly dequeue a node, process it, and enqueue its children. Since every node is enqueued and dequeued exactly once, the traversal is $O(n)$.
Complexity of traversals
All four traversals visit every node exactly once, so they run in $O(n)$ time. The space complexity depends on the traversal:
- Recursive traversals (inorder, preorder, postorder): $O(h)$ stack space, where $h$ is the tree height. For a balanced tree this is $O(\log n)$; for a skewed tree it is $O(n)$.
- Level-order traversal: $O(w)$ space for the queue, where $w$ is the maximum width (number of nodes at any single level). For a complete binary tree, the last level has up to $\lceil n/2 \rceil$ nodes, so the space is $O(n)$.
Computing height and size
The height of a tree is computed recursively: the height of an empty tree is $-1$, and the height of a non-empty tree is one plus the maximum of the heights of its subtrees:
private heightHelper(node: BinaryTreeNode<T> | null): number {
if (node === null) return -1;
return 1 + Math.max(
this.heightHelper(node.left),
this.heightHelper(node.right),
);
}
The size (number of nodes) is similarly recursive:
private sizeHelper(node: BinaryTreeNode<T> | null): number {
if (node === null) return 0;
return 1 + this.sizeHelper(node.left) + this.sizeHelper(node.right);
}
Both run in $O(n)$ time by visiting every node.
Binary search trees
A binary search tree (BST) is a binary tree that satisfies the BST property: for every node $v$,
- all values in $v$'s left subtree are less than $v$'s value, and
- all values in $v$'s right subtree are greater than or equal to $v$'s value.
This property makes the tree a natural implementation of the dictionary abstract data type (Chapter 8), with the added ability to answer order-based queries.
10
/ \
5 15
/ \ / \
3 7 12 20
Every node in the left subtree of 10 (namely 3, 5, 7) is less than 10, and every node in the right subtree (12, 15, 20) is greater.
BST node structure
Our BST nodes carry parent pointers, which simplify the successor and predecessor algorithms:
class BSTNode<T> {
constructor(
public value: T,
public left: BSTNode<T> | null = null,
public right: BSTNode<T> | null = null,
public parent: BSTNode<T> | null = null,
) {}
}
The parent pointer costs one extra reference per node but eliminates the need to maintain an explicit stack when walking up the tree.
Search
To search for a value $x$, start at the root and compare $x$ with the current node's value. If $x$ is smaller, go left; if larger, go right; if equal, the node is found. If we reach a null pointer, the value is not in the tree.
search(value: T): BSTNode<T> | null {
let current = this.root;
while (current !== null) {
const cmp = this.compare(value, current.value);
if (cmp === 0) return current;
current = cmp < 0 ? current.left : current.right;
}
return null;
}
This is exactly binary search applied to a tree structure. At each step we eliminate one subtree, following a single root-to-leaf path. The running time is $O(h)$, where $h$ is the height of the tree.
Insert
To insert a value, we walk the tree as in search until we reach a null position, then place the new node there:
insert(value: T): void {
const newNode = new BSTNode(value);
if (this.root === null) {
this.root = newNode;
return;
}
let current = this.root;
for (;;) {
if (this.compare(value, current.value) < 0) {
if (current.left === null) {
current.left = newNode;
newNode.parent = current;
return;
}
current = current.left;
} else {
if (current.right === null) {
current.right = newNode;
newNode.parent = current;
return;
}
current = current.right;
}
}
}
Insertion always adds a new leaf, so the tree's shape depends on the order of insertions. Inserting values in sorted order creates a degenerate (right-skewed) tree of height $n - 1$, while inserting in random order produces a tree of expected height $O(\log n)$.
Tracing through insertions
Let us trace the insertion of values 10, 5, 15, 3, 7, 12, 20:
| Insert | Tree state |
|---|---|
| 10 | 10 — root |
| 5 | 10 ← 5 goes left (5 < 10) |
| 15 | 10 → 15 goes right (15 ≥ 10) |
| 3 | 5 ← 3 goes left (3 < 5) |
| 7 | 5 → 7 goes right (7 ≥ 5) |
| 12 | 15 ← 12 goes left (12 < 15) |
| 20 | 15 → 20 goes right (20 ≥ 15) |
The result is a balanced tree of height 2:
10
/ \
5 15
/ \ / \
3 7 12 20
If instead we inserted 3, 5, 7, 10, 12, 15, 20 (sorted order), each value would go to the right of the previous one, producing a right-skewed linked list of height 6. This is why balanced BST variants (Chapter 10) are important.
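We can confirm this with a minimal standalone BST, a sketch independent of the book's BSTNode class (no parent pointers, numbers only):

```typescript
// Minimal BST sketch to compare heights under two insertion orders.
interface Node { value: number; left: Node | null; right: Node | null }

function insert(root: Node | null, value: number): Node {
  if (root === null) return { value, left: null, right: null };
  if (value < root.value) root.left = insert(root.left, value);
  else root.right = insert(root.right, value);
  return root;
}

function height(node: Node | null): number {
  if (node === null) return -1;
  return 1 + Math.max(height(node.left), height(node.right));
}

let balanced: Node | null = null;
for (const v of [10, 5, 15, 3, 7, 12, 20]) balanced = insert(balanced, v);

let skewed: Node | null = null;
for (const v of [3, 5, 7, 10, 12, 15, 20]) skewed = insert(skewed, v);

console.log(height(balanced)); // 2
console.log(height(skewed));   // 6
```

Same seven values, two insertion orders: height 2 versus height 6, matching the trace above.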
Minimum and maximum
The minimum value in a BST is the leftmost node; the maximum is the rightmost:
private minNode(node: BSTNode<T> | null): BSTNode<T> | null {
if (node === null) return null;
while (node.left !== null) {
node = node.left;
}
return node;
}
private maxNode(node: BSTNode<T> | null): BSTNode<T> | null {
if (node === null) return null;
while (node.right !== null) {
node = node.right;
}
return node;
}
Both follow a single path from the given node to a leaf, so they run in $O(h)$ time.
Successor and predecessor
The in-order successor of a node $v$ is the node with the smallest value greater than $v$'s value — the next element in sorted order. The predecessor is the node with the largest value smaller than $v$'s.
Finding the successor has two cases:
- If $v$ has a right subtree, the successor is the minimum of that subtree (the leftmost node in the right subtree).
- If $v$ has no right subtree, the successor is the lowest ancestor of $v$ whose left child is also an ancestor of $v$. Intuitively, we walk up the tree until we turn right — the node where we turn is the successor.
private successorNode(node: BSTNode<T>): BSTNode<T> | null {
if (node.right !== null) {
return this.minNode(node.right);
}
let current: BSTNode<T> | null = node;
let parent = current.parent;
while (parent !== null && current === parent.right) {
current = parent;
parent = parent.parent;
}
return parent;
}
The predecessor is symmetric: if $v$ has a left subtree, the predecessor is the maximum of that subtree; otherwise walk up until we turn left.
private predecessorNode(node: BSTNode<T>): BSTNode<T> | null {
if (node.left !== null) {
return this.maxNode(node.left);
}
let current: BSTNode<T> | null = node;
let parent = current.parent;
while (parent !== null && current === parent.left) {
current = parent;
parent = parent.parent;
}
return parent;
}
Both operations follow at most one root-to-leaf path, so they are $O(h)$.
Tracing successor
Consider the tree:
10
/ \
5 15
/ \ / \
3 7 12 20
- Successor of 7: 7 has no right subtree. Walk up: 7 is the right child of 5, so continue. 5 is the left child of 10 — stop. The successor is 10.
- Successor of 10: 10 has a right subtree rooted at 15. The minimum of that subtree is 12. The successor is 12.
- Successor of 20: 20 has no right subtree. Walk up: 20 is the right child of 15, 15 is the right child of 10, 10 has no parent. No successor exists (20 is the maximum).
Delete
Deletion is the most complex BST operation because removing a node must preserve the BST property. There are three cases:
Case 1: The node is a leaf (no children). Simply remove it by setting the parent's pointer to null.
Case 2: The node has one child. Replace the node with its only child. The child takes the node's position in the tree.
Case 3: The node has two children. Find the node's in-order successor (the minimum of the right subtree). Copy the successor's value into the node, then delete the successor. The successor has at most one child (a right child), so its deletion reduces to Case 1 or 2.
The implementation uses a helper called transplant (following CLRS) that replaces one subtree with another:
private transplant(u: BSTNode<T>, v: BSTNode<T> | null): void {
if (u.parent === null) {
this.root = v;
} else if (u === u.parent.left) {
u.parent.left = v;
} else {
u.parent.right = v;
}
if (v !== null) {
v.parent = u.parent;
}
}
transplant(u, v) replaces the subtree rooted at u with the subtree rooted at v. It updates the parent of u to point to v and sets v's parent pointer.
The full deletion procedure:
private deleteNode(node: BSTNode<T>): void {
if (node.left === null) {
// Case 1 or 2a: no left child
this.transplant(node, node.right);
} else if (node.right === null) {
// Case 2b: no right child
this.transplant(node, node.left);
} else {
// Case 3: two children
const successor = this.minNode(node.right)!;
if (successor.parent !== node) {
this.transplant(successor, successor.right);
successor.right = node.right;
successor.right.parent = successor;
}
this.transplant(node, successor);
successor.left = node.left;
successor.left.parent = successor;
}
}
In Case 3, we find the successor (the minimum of the right subtree). If the successor is not the immediate right child of the node being deleted, we first detach the successor from its current position (transplanting its right child into its place), then connect the node's right subtree to the successor. Finally, we transplant the successor into the deleted node's position and connect the left subtree.
Tracing deletion
Starting with:
15
/ \
5 20
/ \
18 25
/ \
16 19
Delete 15 (two children, successor = 16):
- Successor of 15 is 16 (minimum of right subtree).
- 16 is not the immediate right child of 15, so first transplant 16 out: 16 has no right child, so its parent (18) gets null as left child.
- Connect 20's subtree to 16:
16.right = 20,20.parent = 16. - Transplant 16 into 15's position: 16 becomes the root.
- Connect 15's left subtree to 16:
16.left = 5,5.parent = 16.
Result:
16
/ \
5 20
/ \
18 25
\
19
The BST property is preserved: 5 < 16, and all of 18, 19, 20, 25 are greater than 16.
BST performance analysis
Every operation (search, insert, delete, min, max, successor, predecessor) follows at most one root-to-leaf path, so all run in $O(h)$ time, where $h$ is the tree height.
The height depends on the insertion order:
| Scenario | Height | Operation time |
|---|---|---|
| Balanced tree ($n$ nodes) | $O(\log n)$ | $O(\log n)$ |
| Random insertion order (expected) | $O(\log n)$ | $O(\log n)$ |
| Sorted insertion order (worst case) | $n - 1$ | $O(n)$ |
For random insertions, the expected height of a BST with $n$ nodes is approximately $2.99 \log_2 n$ (a result due to Reed, 2003). This means that on average, a plain BST performs well. However, the worst case is $\Theta(n)$, which is no better than a linked list.
To guarantee $O(\log n)$ operations regardless of insertion order, we need balanced binary search trees — trees that automatically restructure themselves to maintain $O(\log n)$ height. AVL trees and red-black trees (Chapter 10) achieve this guarantee with a constant-factor overhead per operation.
BST vs hash table
| Property | BST | Hash table |
|---|---|---|
| Search | $O(h)$ | $O(1)$ expected |
| Insert | $O(h)$ | $O(1)$ expected |
| Delete | $O(h)$ | $O(1)$ expected |
| Min / Max | $O(h)$ | $O(n)$ |
| Successor / Predecessor | $O(h)$ | $O(n)$ |
| Sorted traversal | $O(n)$ | $O(n \log n)$ (sort first) |
| Range query | $O(h + k)$ | $O(n)$ |
Hash tables are faster for pure lookup workloads, but BSTs support order-based operations that hash tables cannot efficiently provide. When you need sorted iteration, range queries, or finding the nearest key, a BST (especially a balanced one) is the right choice.
Complexity summary
| Operation | Time (average) | Time (worst) | Space |
|---|---|---|---|
| Search | $O(\log n)$ | $O(n)$ | $O(1)$ |
| Insert | $O(\log n)$ | $O(n)$ | $O(1)$ |
| Delete | $O(\log n)$ | $O(n)$ | $O(1)$ |
| Min / Max | $O(\log n)$ | $O(n)$ | $O(1)$ |
| Successor / Predecessor | $O(\log n)$ | $O(n)$ | $O(1)$ |
| Inorder traversal | $O(n)$ | $O(n)$ | $O(h)$ |
| Space (tree itself) | — | — | $O(n)$ |
The "average" column assumes random insertion order. The "worst" column covers sorted or adversarial insertion order, which produces a degenerate tree.
Summary
Trees are hierarchical data structures where each node has a value and links to its children. Binary trees restrict each node to at most two children, and support four standard traversals: inorder (left-root-right), preorder (root-left-right), postorder (left-right-root), and level-order (breadth-first). All traversals run in $O(n)$ time.
A binary search tree augments the binary tree with the BST property: left subtree values are less than the node's value, and right subtree values are greater. This enables $O(h)$ search, insert, delete, min, max, successor, and predecessor operations by following a single root-to-leaf path.
The critical limitation of a plain BST is that its height depends on insertion order. Random insertions yield an expected height of $O(\log n)$, but sorted insertions produce a degenerate tree of height $n - 1$, reducing all operations to linear time. In the next chapter, we study balanced search trees — AVL trees and red-black trees — that maintain $O(\log n)$ height through automatic rotations, guaranteeing efficient operations regardless of the input order.
Exercises
Exercise 9.1. Given the preorder traversal [8, 3, 1, 6, 4, 7, 10, 14, 13] of a BST, reconstruct the tree and write out the inorder and postorder traversals. Verify that the inorder traversal is sorted.
Exercise 9.2. Write an iterative (non-recursive) inorder traversal using an explicit stack. Compare its space usage with the recursive version. Under what circumstances might the iterative version be preferable?
Exercise 9.3. Prove that deleting a node from a BST using the successor-replacement method preserves the BST property. Specifically, argue that after replacing a two-children node with its in-order successor, every node in the left subtree is still less than the replacement, and every node in the right subtree is still greater.
Exercise 9.4. Write a function isBST(root) that checks whether a given binary tree satisfies the BST property. Your solution should run in $O(n)$ time. Be careful with the common pitfall of only checking immediate children — for example, the tree with root 10, left child 5, and left child's right child 15 violates the BST property even though each parent-child relationship individually looks correct.
Exercise 9.5. Implement a function rangeQuery(bst, low, high) that returns all values in the BST that fall within $[low, high]$, in sorted order. Your solution should run in $O(h + k)$ time, where $k$ is the number of values in the range, not $O(n)$. (Hint: adapt the inorder traversal to skip subtrees that cannot contain values in the range.)
Balanced Search Trees
In Chapter 9 we built a binary search tree that provides $O(h)$ operations — fast when balanced, but potentially $O(n)$ when degenerate. Inserting keys in sorted order produces a tree that is indistinguishable from a linked list. Balanced search trees solve this problem by restructuring the tree after every insert and delete, guaranteeing that the height remains $O(\log n)$ regardless of the input order. In this chapter we study two classic self-balancing trees: AVL trees, which enforce a strict balance factor constraint, and red-black trees, which use node coloring to maintain a looser but equally effective bound.
The problem with unbalanced BSTs
Recall from Chapter 9 that every BST operation follows a single root-to-leaf path, giving $O(h)$ time. For a balanced tree of $n$ nodes, $h = O(\log n)$, so all operations are logarithmic. But the height depends entirely on the insertion order.
Consider inserting the values 1, 2, 3, 4, 5 in order:
1
\
2
\
3
\
4
\
5
The tree has height 4 (one less than $n = 5$), and every operation degrades to $O(n)$. Even if the average-case height for random insertions is $O(\log n)$, we cannot rely on the input being random — an adversary, a sorted file, or even a partially ordered stream can produce the worst case.
We need a tree that automatically rebalances after modifications. The key tool is the rotation — a local restructuring operation that changes the shape of a subtree without altering the in-order sequence of elements.
Rotations
A rotation rearranges a parent-child pair while preserving the BST property. There are two kinds:
Right rotation around node $y$:
y x
/ \ / \
x C → A y
/ \ / \
A B B C
Node $x$ (the left child of $y$) becomes the new root of the subtree. The subtree $B$, which was $x$'s right child, becomes $y$'s left child. All BST ordering is preserved: $A < x < B < y < C$.
Left rotation around node x:

  x                    y
 / \                  / \
A   y        →       x   C
   / \              / \
  B   C            A   B

This is the mirror image: y (the right child of x) becomes the new root of the subtree.
Both rotations run in O(1) time — they only reassign a constant number of pointers. The critical insight is that rotations change the height of a subtree while keeping the sorted order intact. This is how balanced trees reduce height after an insertion or deletion disturbs the balance.
private rotateRight(y: AVLNode<T>): AVLNode<T> {
  const x = y.left!;
  const B = x.right;
  // Perform rotation
  x.right = y;
  y.left = B;
  // Update parents
  x.parent = y.parent;
  y.parent = x;
  if (B !== null) B.parent = y;
  // Update parent's child pointer
  if (x.parent === null) {
    this.root = x;
  } else if (x.parent.left === y) {
    x.parent.left = x;
  } else {
    x.parent.right = x;
  }
  // Update heights (y first since x is now y's parent)
  this.updateHeight(y);
  this.updateHeight(x);
  return x;
}
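To see why a rotation is safe, here is a standalone sketch (using a plain record type of our own, not the book's AVLNode class) that performs a right rotation without parent pointers and checks that the in-order sequence is unchanged:

```typescript
// Minimal node record for demonstration only (hypothetical, not the book's class).
interface TreeNode {
  value: number;
  left: TreeNode | null;
  right: TreeNode | null;
}

function node(value: number, left: TreeNode | null = null, right: TreeNode | null = null): TreeNode {
  return { value, left, right };
}

// Right rotation around y: y's left child x becomes the subtree root,
// and x's old right subtree B becomes y's new left subtree.
function rotateRight(y: TreeNode): TreeNode {
  const x = y.left!;
  y.left = x.right; // subtree B moves from x to y
  x.right = y;
  return x; // x is the new subtree root
}

function inorder(n: TreeNode | null, out: number[] = []): number[] {
  if (n !== null) {
    inorder(n.left, out);
    out.push(n.value);
    inorder(n.right, out);
  }
  return out;
}

//      y=4                x=2
//     /   \              /   \
//   x=2    5     →      1    y=4
//  /   \                    /   \
// 1     3                  3     5
const before = node(4, node(2, node(1), node(3)), node(5));
const seqBefore = inorder(before);
const after = rotateRight(before);
const seqAfter = inorder(after);
console.log(seqBefore.join(','), seqAfter.join(',')); // both 1,2,3,4,5
```

The in-order sequence is identical before and after, even though the root changed from 4 to 2 — exactly the invariant the balancing algorithms rely on.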
AVL trees
The AVL tree (named after its inventors Adelson-Velsky and Landis, 1962) is the oldest self-balancing BST. It maintains the following invariant:
AVL property: For every node, the heights of its left and right subtrees differ by at most 1.
The balance factor of a node is height(left subtree) − height(right subtree). The AVL property requires that the balance factor of every node is −1, 0, or +1.
Height bound
An AVL tree with n nodes has height at most about 1.44 log₂(n + 2). This bound comes from analyzing the minimum number of nodes in an AVL tree of height h. Let N(h) be this minimum. Then:

N(h) = N(h − 1) + N(h − 2) + 1

The minimum AVL tree of height h has a root, a minimum AVL subtree of height h − 1, and a minimum AVL subtree of height h − 2 (the heights must differ by at most 1). This recurrence is closely related to the Fibonacci sequence, and its solution gives N(h) ≈ φ^h, where φ = (1 + √5)/2 ≈ 1.618 is the golden ratio. Inverting, we get h ≤ 1.44 log₂ n + O(1).
This means an AVL tree is at most about 44% taller than a perfectly balanced tree, guaranteeing O(log n) operations.
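The recurrence behind this bound is easy to check numerically. The sketch below (the helper name minNodes is ours, for illustration) tabulates N(h) and confirms that h never exceeds 1.44 log₂(N(h) + 2):

```typescript
// Minimum number of nodes in an AVL tree of height h,
// via the recurrence N(h) = N(h-1) + N(h-2) + 1.
function minNodes(h: number): number {
  if (h < 0) return 0; // empty tree has height -1
  if (h === 0) return 1;
  const N: number[] = [1, 2]; // N(0) = 1, N(1) = 2
  for (let i = 2; i <= h; i++) N[i] = N[i - 1]! + N[i - 2]! + 1;
  return N[h]!;
}

// Any AVL tree of height h has at least N(h) nodes, so the height bound
// h <= 1.44 * log2(n + 2) should hold even at the sparsest trees.
for (let h = 0; h <= 30; h++) {
  const n = minNodes(h);
  const bound = 1.44 * Math.log2(n + 2);
  console.log(`h=${h}  N(h)=${n}  1.44*log2(N+2)=${bound.toFixed(2)}`);
}
```

The printed table shows N(h) growing like the Fibonacci numbers, with the bound always comfortably above h.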
Node structure
Each AVL node stores its height explicitly, which makes computing balance factors a constant-time operation:
class AVLNode<T> {
  public left: AVLNode<T> | null = null;
  public right: AVLNode<T> | null = null;
  public parent: AVLNode<T> | null = null;
  public height = 0;
  constructor(public value: T) {}
}
Helper functions for height and balance factor:
private h(node: AVLNode<T> | null): number {
  return node === null ? -1 : node.height;
}

private balanceFactor(node: AVLNode<T>): number {
  return this.h(node.left) - this.h(node.right);
}

private updateHeight(node: AVLNode<T>): void {
  node.height = 1 + Math.max(this.h(node.left), this.h(node.right));
}
Insertion
Insertion in an AVL tree starts with a standard BST insert, then walks back up the tree from the new node to the root, checking and fixing the balance factor at each ancestor.
After inserting a new leaf, the balance factor of some ancestors may become +2 or −2. There are four cases, each resolved by one or two rotations:
Case 1: Left-Left (balance factor = +2, left child's balance factor ≥ 0). The left subtree is too tall, and the imbalance is on the left side of the left child. A single right rotation fixes it:

        z (+2)                  y
       /      \               /   \
      y (+1)   D      →      x     z
     /  \                   / \   / \
    x    C                 A   B C   D
   / \
  A   B
Case 2: Right-Right (balance factor = -2, right child's balance factor ≤ 0). The mirror of Case 1. A single left rotation fixes it.
Case 3: Left-Right (balance factor = +2, left child's balance factor = -1). The left subtree is too tall, but the imbalance is on the right side of the left child. A single rotation would not fix it — we need a double rotation: first left-rotate the left child, then right-rotate the node:

      z (+2)              z (+2)                x
     /      \            /      \             /   \
    y (-1)   D    →     x        D    →      y     z
   /  \                / \                  / \   / \
  A    x              y   C                A   B C   D
      / \            / \
     B   C          A   B
Case 4: Right-Left (balance factor = -2, right child's balance factor = +1). The mirror of Case 3: right-rotate the right child, then left-rotate the node.
The rebalance procedure:
private rebalance(node: AVLNode<T>): AVLNode<T> {
  this.updateHeight(node);
  const bf = this.balanceFactor(node);
  if (bf > 1) {
    // Left-heavy
    if (this.balanceFactor(node.left!) < 0) {
      // Left-Right case: rotate left child left first
      this.rotateLeft(node.left!);
    }
    // Left-Left case (or Left-Right reduced to Left-Left)
    return this.rotateRight(node);
  }
  if (bf < -1) {
    // Right-heavy
    if (this.balanceFactor(node.right!) > 0) {
      // Right-Left case: rotate right child right first
      this.rotateRight(node.right!);
    }
    // Right-Right case (or Right-Left reduced to Right-Right)
    return this.rotateLeft(node);
  }
  return node;
}
After insertion, we walk up from the new node's parent to the root, calling rebalance at each ancestor:
private rebalanceUp(node: AVLNode<T> | null): void {
  let current = node;
  while (current !== null) {
    const parent = current.parent;
    this.rebalance(current);
    current = parent;
  }
}
Tracing AVL insertions
Let us insert 1, 2, 3, 4, 5 — the sequence that degenerates a plain BST into a linked list.
Insert 1: Single node, height 0.
1
Insert 2: Standard BST insert to the right. Balance factors are all valid.
1
 \
  2
Insert 3: Insert to the right of 2. Now node 1 has balance factor −2 (Right-Right case). Left-rotate around 1:

1 (-2)              2
 \                 / \
  2        →      1   3
   \
    3
Insert 4: Insert to the right of 3. Balance factors are valid (root 2 has balance factor −1).

  2
 / \
1   3
     \
      4
Insert 5: Insert to the right of 4. Now node 3 has balance factor −2 (Right-Right case). Left-rotate around 3:

  2                      2
 / \                    / \
1   3 (-2)      →      1   4
     \                    / \
      4                  3   5
       \
        5
After 5 insertions, the tree has height 2 — the minimum possible. A plain BST would have height 4.
Deletion
Deletion in an AVL tree uses the same three-case BST deletion algorithm from Chapter 9, followed by a rebalance walk from the lowest modified ancestor up to the root. The key difference from insertion is that deletion may require rotations at multiple ancestors (insertion requires at most one rotation point, but deletion can cascade up to O(log n) rotations):
private deleteNode(node: AVLNode<T>): void {
  let rebalanceStart: AVLNode<T> | null;
  if (node.left === null) {
    rebalanceStart = node.parent;
    this.transplant(node, node.right);
  } else if (node.right === null) {
    rebalanceStart = node.parent;
    this.transplant(node, node.left);
  } else {
    const successor = this.minNode(node.right)!;
    if (successor.parent !== node) {
      rebalanceStart = successor.parent;
      this.transplant(successor, successor.right);
      successor.right = node.right;
      successor.right.parent = successor;
    } else {
      rebalanceStart = successor;
    }
    this.transplant(node, successor);
    successor.left = node.left;
    successor.left.parent = successor;
  }
  this.rebalanceUp(rebalanceStart);
}
AVL complexity
| Operation | Time | Space |
|---|---|---|
| Search | O(log n) | O(1) |
| Insert | O(log n) | O(1) |
| Delete | O(log n) | O(1) |
| Min / Max | O(log n) | O(1) |
| Successor / Predecessor | O(log n) | O(1) |
| Inorder traversal | O(n) | O(n) output |
| Space (tree) | — | O(n) |
Each node stores one extra field (height), so the per-node overhead is small. Search does zero rotations. Insert does at most 2 rotations (one rotation point), but deletion may rotate at O(log n) ancestors in the worst case. All rotations are O(1) each.
Red-black trees
A red-black tree is a BST where each node carries a one-bit color attribute — red or black — and five properties constrain how colors can be arranged. Red-black trees allow a slightly less strict balance than AVL trees: the height can be up to 2 log₂(n + 1) versus AVL's ≈ 1.44 log₂ n. In exchange, they require fewer rotations during insertion and deletion, making them a popular choice in practice (used in std::map in C++, TreeMap in Java, and the Linux kernel's scheduling data structure).
Red-black properties
A valid red-black tree satisfies all five of these properties:
- Every node is either red or black.
- The root is black.
- Every leaf (NIL) is black. We use a sentinel NIL node rather than null pointers, which simplifies the algorithms.
- If a node is red, both its children are black. Equivalently, no path from root to leaf has two consecutive red nodes.
- For each node, all simple paths from that node to descendant leaves contain the same number of black nodes. This count is called the black-height of the node.
These properties together guarantee that no root-to-leaf path is more than twice as long as any other, which gives the O(log n) height bound.
Height bound
The black-height bh of the root is the number of black nodes on any path from the root down to a leaf (the root itself is not counted, following the CLRS convention for bh(x)). Because of Property 4 (no two reds in a row), a path of length h has at least h/2 black nodes — the shortest path is all black, and the longest alternates red and black. Because of Property 5 (all paths have the same black-height), a subtree with black-height bh contains at least 2^bh − 1 internal nodes, so n ≥ 2^(h/2) − 1. Therefore:

h ≤ 2 log₂(n + 1)

This guarantees O(log n) operations.
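A quick numeric sanity check of this bound (the helper name is ours, for illustration): for each black-height, the sparsest possible tree has 2^bh − 1 internal nodes and the tallest possible tree has height 2·bh, and the two sides of the inequality meet exactly at that extreme:

```typescript
// Fewest internal nodes a subtree with black-height bh can contain.
function minInternalNodes(bh: number): number {
  return 2 ** bh - 1;
}

for (let bh = 1; bh <= 20; bh++) {
  const n = minInternalNodes(bh); // n >= 2^bh - 1 by Property 5
  const hMax = 2 * bh;            // h <= 2*bh by Property 4
  const bound = 2 * Math.log2(n + 1);
  // At the sparsest tree, hMax equals the bound exactly; denser trees only help.
  console.log(`bh=${bh}  n>=${n}  h<=${hMax}  2*log2(n+1)=${bound}`);
}
```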
Node structure and sentinel
Red-black tree implementations use a sentinel NIL node to represent all external leaves. This avoids null-checks throughout the rotation and fixup code:
enum Color {
  Red = 'RED',
  Black = 'BLACK',
}

class RBNode<T> {
  public left: RBNode<T>;
  public right: RBNode<T>;
  public parent: RBNode<T>;
  public color: Color;
  constructor(public value: T, nil: RBNode<T>, color: Color = Color.Red) {
    this.left = nil;
    this.right = nil;
    this.parent = nil;
    this.color = color;
  }
}
The sentinel is a single black node that serves as every leaf and as the parent of the root. When we write node.left === this.NIL, we are checking whether the node has no left child.
Insertion
Insertion follows the CLRS RB-INSERT algorithm:
- Insert the new node as a red leaf using standard BST insertion.
- Call insertFixup(z) to restore the red-black properties.
The new node is colored red because inserting a black node would violate Property 5 (black-height would increase on exactly one path). A red node might violate Property 4 (if its parent is also red) or Property 2 (if it becomes the root), but these are easier to fix.
The fixup procedure handles three cases (and their symmetric mirrors when the parent is a right child):
Case 1: Uncle is red. Both the parent and uncle are red. Recolor the parent and uncle black and the grandparent red, then move up to the grandparent and repeat:

      G (black)                   G (red)
     /        \                  /       \
  P (red)    U (red)    →    P (black)  U (black)
   |                            |
  z (red)                     z (red)
This fixes the local violation but may create a new red-red violation at the grandparent (the new z) and its parent. The fix propagates upward.
Case 2: Uncle is black, z is an opposite-side child. If z is a right child but its parent is a left child (or vice versa), rotate z's parent to convert to Case 3:

    G                G
   / \              / \
  P   U     →      z   U
   \              /
    z            P
Case 3: Uncle is black, z is a same-side child. Rotate the grandparent and recolor:

      G (black)                  P (black)
     /        \                 /        \
  P (red)   U (black)   →    z (red)    G (red)
   |                                       \
  z (red)                                U (black)
After Case 3, the subtree root is black with two red children — no further fixing is needed.
The fixup terminates when:
- The parent is black (no violation), or
- We reach the root (color it black to satisfy Property 2).
private insertFixup(z: RBNode<T>): void {
  let node = z;
  while (node.parent.color === Color.Red) {
    if (node.parent === node.parent.parent.left) {
      const uncle = node.parent.parent.right;
      if (uncle.color === Color.Red) {
        // Case 1: uncle is red — recolor
        node.parent.color = Color.Black;
        uncle.color = Color.Black;
        node.parent.parent.color = Color.Red;
        node = node.parent.parent;
      } else {
        if (node === node.parent.right) {
          // Case 2 → rotate to reduce to Case 3
          node = node.parent;
          this.rotateLeft(node);
        }
        // Case 3 — rotate grandparent
        node.parent.color = Color.Black;
        node.parent.parent.color = Color.Red;
        this.rotateRight(node.parent.parent);
      }
    } else {
      // Symmetric cases (parent is right child)
      // ...
    }
  }
  this.root.color = Color.Black;
}
Tracing red-black insertion
Let us insert the same sequence 1, 2, 3, 4, 5 into a red-black tree.
Insert 1: New node is red, but it is the root, so color it black.
1(B)
Insert 2: Insert as right child of 1. Node 2 is red, parent 1 is black — no violation.
1(B)
 \
  2(R)
Insert 3: Insert as right child of 2. Now 2 (red) has a red child 3 — violation of Property 4. Uncle of 3 is NIL (black), and 3 is a right child of a right child — Case 3 (Right-Right). Left-rotate grandparent 1 and recolor:
1(B)                 2(B)
 \                  /    \
  2(R)      →    1(R)    3(R)
   \
    3(R)
Insert 4: Insert as right child of 3. Now 3 (red) has red child 4 — violation. Uncle of 4 is 1 (red) — Case 1. Recolor: 1 and 3 become black, 2 becomes red. But 2 is the root, so immediately color it back to black:
     2(B)
    /    \
 1(B)    3(B)
            \
            4(R)
Insert 5: Insert as right child of 4. Now 4 (red) has red child 5 — violation. Uncle of 5 is NIL (black), and 5 is a right child of a right child — Case 3. Left-rotate grandparent 3 and recolor:
     2(B)                      2(B)
    /    \                    /    \
 1(B)    3(B)        →     1(B)    4(B)
            \                     /    \
            4(R)               3(R)    5(R)
               \
               5(R)
After 5 insertions, the tree has height 2 — well-balanced, with valid red-black properties.
Deletion
Red-black deletion is the most complex operation. The algorithm follows CLRS RB-DELETE:
- Perform standard BST deletion to remove the node. Track the color of the node y that was actually removed or moved (y's original color) and the node x that replaced it.
- If the removed/moved node was black, call deleteFixup(x) to restore the properties.
Removing a black node violates Property 5 (black-height consistency). The fixup pushes an "extra black" up the tree until it can be absorbed. There are four cases (and their mirrors):
Case 1: Sibling w is red. Recolor w black and the parent red, then rotate the parent. This converts to one of Cases 2–4 with a black sibling.
Case 2: Sibling w is black, both of w's children are black. Move the extra black up by coloring w red and moving x to the parent.
Case 3: Sibling w is black, w's far child is black, near child is red. Rotate w and recolor to convert to Case 4.
Case 4: Sibling w is black, w's far child is red. Rotate the parent, transfer colors, and make the far child black. This absorbs the extra black and terminates the fixup.
The details are intricate, but the key guarantee is that at most 3 rotations are performed per deletion — fewer than AVL deletion's potential O(log n) rotations.
Verifying red-black properties
For testing and debugging, it is valuable to have a verification method that checks all five properties:
verify(): boolean {
  // Property 2: root is black
  if (this.root !== this.NIL && this.root.color !== Color.Black)
    return false;
  return this.verifyNode(this.root) >= 0;
}

private verifyNode(node: RBNode<T>): number {
  if (node === this.NIL) return 0;
  // Property 4: red node must have black children
  if (node.color === Color.Red) {
    if (node.left.color === Color.Red || node.right.color === Color.Red)
      return -1;
  }
  const leftBH = this.verifyNode(node.left);
  const rightBH = this.verifyNode(node.right);
  if (leftBH < 0 || rightBH < 0) return -1;
  // Property 5: equal black-height
  if (leftBH !== rightBH) return -1;
  return leftBH + (node.color === Color.Black ? 1 : 0);
}
This recursive procedure returns the black-height of each subtree, verifying Properties 4 and 5 simultaneously in O(n) time.
Red-black complexity
| Operation | Time | Rotations (worst case) |
|---|---|---|
| Search | O(log n) | 0 |
| Insert | O(log n) | 2 |
| Delete | O(log n) | 3 |
| Min / Max | O(log n) | 0 |
| Inorder traversal | O(n) | 0 |
The per-node overhead is 1 bit (color), which is often stored in an otherwise unused alignment bit of a pointer.
B-trees
B-trees are balanced search trees designed for external storage — disks, SSDs, and databases — where the cost of each node access is high. Instead of binary branching, a B-tree of order m allows each node to have up to m children and store up to m − 1 keys. This high branching factor means fewer levels and fewer disk accesses.
A B-tree of order m satisfies:
- Every node has at most m children.
- Every non-root internal node has at least ⌈m/2⌉ children.
- The root has at least 2 children (unless it is a leaf).
- All leaves are at the same depth.
- A node with k children stores k − 1 keys.
For a B-tree of order 1000 storing one billion keys, the height is at most about log₅₀₀(10⁹) ≈ 3.3, meaning any key can be found in at most 4 disk reads. This is why B-trees and their variant B+ trees are the backbone of every major database system and filesystem.
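The arithmetic is worth sketching (the function name and the use of 500 — half the order — as the minimum branching factor are our illustration):

```typescript
// Back-of-envelope B-tree depth: with minimum branching factor t (= m/2 for
// order m), n keys fit in about log_t(n) levels of nodes.
function btreeHeightBound(n: number, minBranch: number): number {
  return Math.ceil(Math.log(n) / Math.log(minBranch));
}

// Order 1000 → every non-root internal node has at least 500 children.
const reads = btreeHeightBound(1e9, 500);
console.log(reads); // log_500(1e9) ≈ 3.34, so at most 4 node reads
```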
We do not implement B-trees in this book because their primary benefit is I/O efficiency, which is difficult to demonstrate in an in-memory setting. The interested reader is referred to CLRS Chapter 18 or Wirth's Algorithms + Data Structures = Programs for detailed treatments.
Comparison of balanced tree variants
| Property | AVL tree | Red-black tree | B-tree |
|---|---|---|---|
| Height bound | ≈ 1.44 log₂ n | ≤ 2 log₂(n + 1) | O(log_m n) |
| Strictness | Tight (BF ∈ {−1, 0, +1}) | Loose (path ratio ≤ 2) | All leaves same depth |
| Search time | O(log n) | O(log n) | O(log n) |
| Insert rotations | ≤ 2 | ≤ 2 | 0 (splits instead) |
| Delete rotations | O(log n) | ≤ 3 | 0 (merges/redistributes) |
| Per-node overhead | Height (integer) | Color (1 bit) | Variable-size key arrays |
| Best use case | Lookup-heavy workloads | Insert/delete-heavy | Disk-based storage |
When to use which:
- AVL trees produce shorter, more tightly balanced trees. If your workload is search-heavy with few modifications, AVL trees will have slightly fewer comparisons per search.
- Red-black trees perform fewer rotations per modification. If your workload involves frequent insertions and deletions, red-black trees offer better amortized restructuring cost. Most language standard libraries choose red-black trees.
- B-trees are the right choice when data lives on disk and minimizing I/O operations is the priority.
Summary
Balanced search trees solve the fundamental problem of unbalanced BSTs by maintaining height invariants through automatic restructuring. AVL trees enforce a strict balance factor constraint (at most 1 difference between subtree heights), achieving a height bound of about 1.44 log₂ n through four rotation cases applied during insertion and deletion. Red-black trees use a coloring scheme with five properties to maintain a height bound of 2 log₂(n + 1), trading slightly taller trees for fewer rotations during modifications — at most 2 per insertion and 3 per deletion.
Both trees guarantee worst-case O(log n) time for search, insert, delete, min, max, successor, and predecessor. AVL trees are preferred for lookup-heavy workloads due to shorter tree heights, while red-black trees are preferred for modification-heavy workloads due to fewer structural changes. B-trees, though not implemented here, extend the balancing concept to high-branching-factor trees optimized for disk access.
The rotations and rebalancing strategies studied in this chapter are fundamental techniques that appear throughout advanced data structures. In the next chapter, we turn to heaps and priority queues — another tree-based structure that maintains a different invariant (the heap property) for efficient extraction of minimum or maximum elements.
Exercises
Exercise 10.1. Insert the values 14, 17, 11, 7, 53, 4, 13, 12, 8 into an initially empty AVL tree. After each insertion, draw the tree and show any rotations that occur. Identify which of the four rotation cases (LL, RR, LR, RL) applies in each case.
Exercise 10.2. Prove that an AVL tree with n nodes has height at most about 1.44 log₂ n. (Hint: define N(h) as the minimum number of nodes in an AVL tree of height h, establish the recurrence N(h) = N(h − 1) + N(h − 2) + 1, and relate it to the Fibonacci sequence.)
Exercise 10.3. A red-black tree with n internal nodes has height at most 2 log₂(n + 1). Prove this. (Hint: show by induction that a subtree rooted at any node x contains at least 2^bh(x) − 1 internal nodes, where bh(x) is the black-height of x. Then use Property 4 to relate height to black-height.)
Exercise 10.4. Consider a red-black tree where you insert the keys 1 through 15 in order. Draw the tree after all insertions. What is the resulting height? How does this compare to the height bound 2 log₂(n + 1)?
Exercise 10.5. AVL trees and red-black trees both guarantee O(log n) operations, but they make different trade-offs. Design an experiment to compare their performance: insert n random integers, then perform n searches, measuring the total number of comparisons for each tree type. Run the experiment for several values of n and report the average number of comparisons per search. Which tree type performs fewer comparisons per search? Which performs fewer rotations per insertion? Discuss when each tree would be preferred.
Heaps and Priority Queues
In the previous two chapters we studied binary search trees and their balanced variants — structures that maintain a total ordering of their elements for efficient search, insertion, and deletion. In this chapter we turn to a different kind of tree-based structure: the binary heap. A heap does not maintain a full sorted order; instead, it maintains a weaker heap property that ensures the minimum (or maximum) element is always at the root. This partial ordering is cheaper to maintain and gives us an efficient implementation of the priority queue abstract data type — a collection where we can always extract the highest-priority element in O(log n) time, insert new elements in O(log n) time, and peek at the top element in O(1) time.
The priority queue abstraction
Many algorithms need a data structure that answers the question: "What is the most urgent item?" Consider these examples:
- Dijkstra's algorithm (Chapter 13) repeatedly extracts the vertex with the smallest tentative distance.
- Prim's algorithm (Chapter 14) repeatedly extracts the lightest edge crossing a cut.
- Huffman coding (Chapter 17) repeatedly extracts the two lowest-frequency symbols.
- Operating system schedulers select the highest-priority process to run next.
- Event-driven simulations process events in chronological order.
In all these cases, the key operation is extract the element with the highest priority. A sorted array could answer this in O(1) time, but insertion would cost O(n). An unsorted array allows O(1) insertion but O(n) extraction. We want O(log n) for both — and that is exactly what a binary heap provides.
A priority queue supports the following operations:
| Operation | Description |
|---|---|
enqueue(value, priority) | Insert a value with a given priority |
dequeue() | Remove and return the highest-priority value |
peek() | Return the highest-priority value without removing it |
changePriority(value, newPriority) | Update the priority of an existing value |
The binary heap is the most common implementation of a priority queue, and the one we study in this chapter.
Binary heaps
A binary heap is a complete binary tree stored in an array. It satisfies two properties:
- Shape property: The tree is a complete binary tree — every level is fully filled except possibly the last, which is filled from left to right. This guarantees the tree has height ⌊log₂ n⌋.
- Heap property: For every node v (other than the root), the value at v's parent is less than or equal to the value at v (for a min-heap) or greater than or equal (for a max-heap).
The shape property means we can represent the tree as a flat array with no pointers. The heap property means the root always holds the minimum (or maximum) element.
Array representation
Because the tree is complete, we can map between tree positions and array indices using simple arithmetic. For a node at index i (using 0-based indexing), its parent is at index ⌊(i − 1)/2⌋, its left child at 2i + 1, and its right child at 2i + 2.
For example, the min-heap containing the values 1, 3, 5, 7, 4, 8, 6 is stored as:
Array: [1, 3, 5, 7, 4, 8, 6]
Index:  0  1  2  3  4  5  6

Tree view:
        1          (index 0)
       / \
      3   5        (indices 1, 2)
     / \ / \
    7  4 8  6      (indices 3, 4, 5, 6)
Node 1 (at index 0) is the root. Its children are at indices 1 and 2. Node 3 (index 1) has children at indices 3 and 4. No pointers are needed — the parent-child relationships are computed from the index.
In TypeScript:
function parentIndex(i: number): number {
  return Math.floor((i - 1) / 2);
}

function leftIndex(i: number): number {
  return 2 * i + 1;
}

function rightIndex(i: number): number {
  return 2 * i + 2;
}
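A quick standalone check of this index arithmetic (repeating the three helpers so the snippet runs on its own): every child index must map back to its parent.

```typescript
function parentIndex(i: number): number {
  return Math.floor((i - 1) / 2);
}
function leftIndex(i: number): number {
  return 2 * i + 1;
}
function rightIndex(i: number): number {
  return 2 * i + 2;
}

// parent(left(i)) = floor(2i / 2) = i and parent(right(i)) = floor((2i + 1) / 2) = i.
for (let i = 0; i < 1000; i++) {
  if (parentIndex(leftIndex(i)) !== i) throw new Error(`left round-trip fails at ${i}`);
  if (parentIndex(rightIndex(i)) !== i) throw new Error(`right round-trip fails at ${i}`);
}
console.log('parent/child index arithmetic is consistent for i < 1000');
```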
The heap class
Our BinaryHeap<T> class stores elements in a flat array and accepts a comparator function to define the ordering. By default it uses ascending numeric comparison, producing a min-heap. Passing (a, b) => b - a produces a max-heap.
export class BinaryHeap<T> {
  private data: T[] = [];
  private readonly compare: Comparator<T>;

  constructor(comparator?: Comparator<T>) {
    this.compare = (comparator ?? numberComparator) as Comparator<T>;
  }

  get size(): number {
    return this.data.length;
  }

  get isEmpty(): boolean {
    return this.data.length === 0;
  }

  peek(): T | undefined {
    return this.data[0];
  }

  // ...
}
The peek operation simply returns the root element at index 0 in O(1) time.
Heap operations
Sift-up (swim)
When we insert a new element at the end of the array, it may violate the heap property by being smaller than its parent (in a min-heap). Sift-up fixes this by repeatedly swapping the element with its parent until the heap property is restored or the element reaches the root.
Insert 2 into [1, 3, 5, 7, 4, 8, 6]:
Step 0: [1, 3, 5, 7, 4, 8, 6, 2]   ← 2 appended at index 7
        parent(7) = 3, value 7 at index 3
        2 < 7, so swap
Step 1: [1, 3, 5, 2, 4, 8, 6, 7]   ← 2 now at index 3
        parent(3) = 1, value 3 at index 1
        2 < 3, so swap
Step 2: [1, 2, 5, 3, 4, 8, 6, 7]   ← 2 now at index 1
        parent(1) = 0, value 1 at index 0
        2 > 1, stop
The implementation:
private siftUp(index: number): void {
  while (index > 0) {
    const parent = parentIndex(index);
    if (this.compare(this.data[index]!, this.data[parent]!) < 0) {
      this.swap(index, parent);
      index = parent;
    } else {
      break;
    }
  }
}
Since the tree has height ⌊log₂ n⌋, sift-up performs at most O(log n) swaps.
Sift-down (sink)
When we remove the root, we replace it with the last element in the array. This element is likely too large for the root position. Sift-down fixes this by repeatedly swapping the element with its smaller child (in a min-heap) until the heap property is restored or the element reaches a leaf.
Extract min from [1, 2, 5, 3, 4, 8, 6, 7]:
Step 0: Remove root (1), move last element (7) to root:
        [7, 2, 5, 3, 4, 8, 6]
Step 1: Compare 7 with children 2 (left) and 5 (right).
        Smallest child is 2 at index 1. 7 > 2, so swap.
        [2, 7, 5, 3, 4, 8, 6]
Step 2: Compare 7 with children 3 (left) and 4 (right).
        Smallest child is 3 at index 3. 7 > 3, so swap.
        [2, 3, 5, 7, 4, 8, 6]
Step 3: Index 3 has no children within bounds. Stop.
The implementation:
private siftDown(index: number): void {
  const n = this.data.length;
  while (true) {
    let best = index;
    const left = leftIndex(index);
    const right = rightIndex(index);
    if (left < n && this.compare(this.data[left]!, this.data[best]!) < 0) {
      best = left;
    }
    if (right < n && this.compare(this.data[right]!, this.data[best]!) < 0) {
      best = right;
    }
    if (best === index) break;
    this.swap(index, best);
    index = best;
  }
}
Like sift-up, sift-down performs at most O(log n) swaps.
Insert
Insertion appends the new element to the end of the array (maintaining the shape property) and then sifts up to restore the heap property:
insert(value: T): void {
  this.data.push(value);
  this.siftUp(this.data.length - 1);
}
Time: O(log n). The push is amortized O(1), and sift-up traverses at most ⌊log₂ n⌋ levels.
Extract
Extraction removes the root (the minimum element in a min-heap), replaces it with the last element, and sifts down:
extract(): T | undefined {
  if (this.data.length === 0) return undefined;
  if (this.data.length === 1) return this.data.pop()!;
  const root = this.data[0]!;
  this.data[0] = this.data.pop()!;
  this.siftDown(0);
  return root;
}
Time: O(log n). Moving the last element to the root is O(1), and sift-down traverses at most ⌊log₂ n⌋ levels.
Decrease-key
The decrease-key operation replaces an element's value with a smaller one (higher priority in a min-heap) and sifts up to restore order. This operation is essential for algorithms like Dijkstra's, where we discover shorter paths and need to update a vertex's tentative distance.
decreaseKey(index: number, newValue: T): void {
  if (index < 0 || index >= this.data.length) {
    throw new RangeError(
      `Index ${index} out of bounds [0, ${this.data.length})`
    );
  }
  if (this.compare(newValue, this.data[index]!) > 0) {
    throw new Error('New value has lower priority than the current value');
  }
  this.data[index] = newValue;
  this.siftUp(index);
}
Time: O(log n), since sift-up traverses at most the height of the tree.
Note that decrease-key requires knowing the index of the element to update. In practice, algorithms that use decrease-key maintain a separate map from elements to their heap indices, updating it during every swap.
Building a heap in O(n)
The naive approach to building a heap from n elements is to insert them one at a time: n insertions at O(log n) each, for O(n log n) total. But we can do better.
Floyd's build-heap algorithm (1964) starts with the elements in arbitrary order and applies sift-down to every non-leaf node, working from the bottom of the tree to the root:
static from<T>(
  elements: T[],
  comparator?: Comparator<T>,
): BinaryHeap<T> {
  const heap = new BinaryHeap<T>(comparator);
  heap.data = elements.slice();
  heap.buildHeap();
  return heap;
}

private buildHeap(): void {
  for (let i = parentIndex(this.data.length - 1); i >= 0; i--) {
    this.siftDown(i);
  }
}
Why is this O(n)?
The key insight is that most nodes are near the bottom of the tree, where sift-down is cheap. In a complete binary tree with n nodes:
- About n/2 nodes are leaves (height 0) — sift-down does 0 swaps
- About n/4 nodes are at height 1 — sift-down does at most 1 swap
- About n/8 nodes are at height 2 — sift-down does at most 2 swaps
- In general, at most ⌈n/2^(h+1)⌉ nodes are at height h, each doing at most h swaps
The total work is:

Σ_{h=0}^{⌊log n⌋} ⌈n/2^(h+1)⌉ · h ≤ n · Σ_{h=0}^{∞} h/2^(h+1)

The series Σ_{h=0}^{∞} h/2^h = 2 (this can be derived by differentiating the geometric series Σ x^h = 1/(1 − x) and setting x = 1/2). Therefore the total work is at most n · 1 = O(n).
This is a remarkable result: building a heap is linear, not O(n log n). The intuition is that the expensive sift-downs (for nodes near the root) apply to very few nodes, while the cheap sift-downs (for nodes near the bottom) apply to many.
Why not sift-up?
If we tried to build a heap by sifting up from the first node to the last (simulating n insertions), the analysis would be:

Σ_{i=1}^{n−1} O(log i) = Θ(n log n)

The problem is that the many leaf nodes would each sift up Θ(log n) levels. Floyd's algorithm avoids this by processing nodes from the bottom of the tree upward with sift-down, so leaves do no work at all.
The priority queue interface
Our PriorityQueue<T> class wraps a BinaryHeap to provide a cleaner interface for the common case where each value has an associated numeric priority:
export interface PQEntry<T> {
  value: T;
  priority: number;
}

export class PriorityQueue<T> {
  private heap: BinaryHeap<PQEntry<T>>;

  constructor() {
    this.heap = new BinaryHeap<PQEntry<T>>(
      (a, b) => a.priority - b.priority
    );
  }

  enqueue(value: T, priority: number): void {
    this.heap.insert({ value, priority });
  }

  dequeue(): T | undefined {
    const entry = this.heap.extract();
    return entry?.value;
  }

  peek(): T | undefined {
    return this.heap.peek()?.value;
  }

  changePriority(value: T, newPriority: number): boolean {
    const arr = this.heap.toArray();
    const idx = arr.findIndex((e) => Object.is(e.value, value));
    if (idx === -1) return false;
    arr[idx] = { value, priority: newPriority };
    this.heap = BinaryHeap.from<PQEntry<T>>(
      arr,
      (a, b) => a.priority - b.priority,
    );
    return true;
  }
}
Lower numeric priority values are dequeued first. To create a max-priority queue, negate the priorities when enqueuing.
The changePriority method finds the entry by value identity (Object.is) and rebuilds the heap. This is O(n) due to the linear scan and rebuild. For Dijkstra's algorithm and similar performance-critical use cases, it is better to use BinaryHeap directly with an auxiliary index map for decrease-key — we will see this in Chapter 13.
Min-heap vs. max-heap
Our implementation uses the comparator pattern to support both min-heaps and max-heaps without separate classes:
// Min-heap (default): smallest element at root
const minHeap = new BinaryHeap<number>();
// Max-heap: largest element at root
const maxHeap = new BinaryHeap<number>((a, b) => b - a);
The only difference is the comparator. When compare(a, b) < 0, element a has higher priority and should be closer to the root. For a min-heap, we want the smallest element at the root, so compare(a, b) = a - b makes smaller values "win." For a max-heap, compare(a, b) = b - a reverses the ordering.
This is the same pattern used by Array.prototype.sort in JavaScript and by the Comparator<T> type used throughout this book.
Applications
Heap sort
We saw heap sort in Chapter 5: build a max-heap from the input, then repeatedly extract the maximum and place it at the end of the array. The BinaryHeap class in this chapter is the data structure that heap sort uses internally. Heap sort achieves O(n log n) worst-case time and O(1) extra space (when done in-place on the array).
Running median
Given a stream of numbers, maintain the median at all times. Use two heaps:
- A max-heap for the lower half of the numbers.
- A min-heap for the upper half.
When a new number arrives, insert it into the appropriate heap and rebalance so the heaps differ in size by at most 1. The median is the root of the larger heap (or the average of both roots if they are equal in size). Each insertion takes O(log n).
Event-driven simulation
Model a system as a series of events, each with a timestamp. Store events in a min-heap ordered by time. At each step, extract the earliest event, process it (which may generate new events), and insert any new events. The heap ensures events are always processed in chronological order.
k smallest / k largest elements
To find the k smallest elements in an unsorted array of n elements:
- Build a min-heap in O(n).
- Extract k times for a total of O(k log n).
If k is much smaller than n, this is much faster than sorting the entire array.
Alternatively, maintain a max-heap of size k. Scan the array; if an element is smaller than the heap's maximum, extract the max and insert the new element. This uses O(k) space and O(n log k) time.
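The second approach can be sketched as follows. The siftUp/siftDown helpers below form a minimal inline max-heap, written here for self-containment rather than taken from the chapter's BinaryHeap:

```typescript
// k smallest elements via a max-heap of size k: O(k) space, O(n log k) time.
function kSmallest(values: number[], k: number): number[] {
  // Max-heap stored as an array; parent of index i is at (i - 1) >> 1.
  const heap: number[] = [];
  const siftUp = (i: number): void => {
    while (i > 0) {
      const p = (i - 1) >> 1;
      if (heap[i]! <= heap[p]!) break;
      [heap[i], heap[p]] = [heap[p]!, heap[i]!];
      i = p;
    }
  };
  const siftDown = (i: number): void => {
    for (;;) {
      const l = 2 * i + 1, r = 2 * i + 2;
      let big = i;
      if (l < heap.length && heap[l]! > heap[big]!) big = l;
      if (r < heap.length && heap[r]! > heap[big]!) big = r;
      if (big === i) break;
      [heap[i], heap[big]] = [heap[big]!, heap[i]!];
      i = big;
    }
  };
  for (const x of values) {
    if (heap.length < k) {
      heap.push(x);
      siftUp(heap.length - 1);
    } else if (heap.length > 0 && x < heap[0]!) {
      // x beats the current k-th smallest: replace the max and restore order.
      heap[0] = x;
      siftDown(0);
    }
  }
  return [...heap].sort((a, b) => a - b);
}

console.log(kSmallest([7, 2, 9, 4, 1, 8, 3], 3)); // [1, 2, 3]
```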
Complexity summary
| Operation | Time | Space |
|---|---|---|
| peek | O(1) | O(1) |
| insert | O(log n) amortized | O(1) |
| extract | O(log n) | O(1) |
| decreaseKey | O(log n) | O(1) |
| buildHeap (from array) | O(n) | O(n) for copy |
| size / isEmpty | O(1) | O(1) |
The space for the entire heap is O(n), since it is stored as a contiguous array.
Compared to balanced BSTs, heaps trade away sorted-order iteration and efficient search (O(n) to find an arbitrary element) in exchange for simpler implementation, better constant factors, and cache-friendly array storage. If you only need insert and extract-min, a heap is the right choice.
Summary
A binary heap is a complete binary tree stored in an array that maintains the heap property: every parent has higher priority than its children. This partial ordering — weaker than a sorted order — is cheaper to maintain and provides O(log n) insertion and extraction of the highest-priority element, with O(1) peek.
The two fundamental operations are sift-up (restore order after insertion at the bottom) and sift-down (restore order after removal from the root). Floyd's build-heap algorithm constructs a heap from an arbitrary array in O(n) time — a result that follows from the observation that most nodes in a complete tree are near the bottom where sift-down is cheap.
The priority queue abstraction — enqueue with a priority, dequeue the highest-priority element — is directly implemented by a binary heap and is central to many graph algorithms (Dijkstra, Prim, Huffman). In the next chapters, we will put priority queues to work: Chapter 12 introduces graphs and graph traversal, and Chapter 13 uses priority queues as the backbone of Dijkstra's shortest-path algorithm.
Exercises
Exercise 11.1. Starting from an empty min-heap, insert the values 15, 10, 20, 8, 25, 12, 5, 18 one at a time. After each insertion, draw the heap as both a tree and an array. Verify the heap property holds at every step.
Exercise 11.2. Use Floyd's build-heap algorithm to construct a min-heap from the array . Show the array after processing each non-leaf node (from right to left). How many total swaps are performed? Compare this with the number of swaps that would result from inserting the elements one by one.
Exercise 11.3. Prove that the number of leaves in a complete binary tree stored in an array of length n is ⌈n/2⌉. (Hint: the last non-leaf node is at index ⌊n/2⌋ − 1.)
Exercise 11.4. Design a data structure that supports findMin and findMax in O(1) time, and insert, extractMin, and extractMax in O(log n) time. (Hint: maintain both a min-heap and a max-heap simultaneously, with cross-references between corresponding entries.)
Exercise 11.5. Implement a running median data structure that supports insert(x) in O(log n) and median() in O(1). Use two heaps: a max-heap for the lower half and a min-heap for the upper half. Write tests that insert a stream of 1000 random numbers and verify the median is correct after each insertion by comparing with a sorted-array baseline.
Graphs and Graph Traversal
In the previous chapters we studied data structures — arrays, linked lists, trees, heaps, hash tables — that organize data in essentially linear or hierarchical ways. Many real-world problems, however, involve relationships that are neither linear nor hierarchical: road networks, social connections, task dependencies, web links, circuit wiring. The natural abstraction for these problems is the graph. In this chapter we define graphs formally, implement two standard representations, and develop two fundamental traversal algorithms — breadth-first search (BFS) and depth-first search (DFS) — that form the basis for nearly every graph algorithm in the chapters that follow. We also study topological sorting and cycle detection, two direct applications of graph traversal.
What is a graph?
A graph G = (V, E) consists of:
- A finite set V of vertices (also called nodes).
- A set E of edges (also called arcs), where each edge connects two vertices.
If every edge has a direction — going from one vertex to another — the graph is directed (a digraph). If edges have no direction, the graph is undirected. A weighted graph assigns a numeric weight to each edge; an unweighted graph treats all edges as having equal cost.
Key terminology:
- Adjacent vertices: Two vertices u and v are adjacent if there is an edge between them.
- Incident edge: An edge is incident to a vertex if the vertex is one of its endpoints.
- Degree: The number of edges incident to a vertex. In a directed graph, we distinguish in-degree (edges entering) and out-degree (edges leaving).
- Path: A sequence of vertices v0, v1, ..., vk where each consecutive pair is connected by an edge. The length of the path is the number of edges, k.
- Simple path: A path with no repeated vertices.
- Cycle: A path v0, v1, ..., vk where v0 = vk and k ≥ 1. A simple cycle has no repeated vertices except v0 = vk.
- Connected graph: An undirected graph where every pair of vertices is connected by some path.
- Connected component: A maximal connected subgraph.
- Strongly connected: In a directed graph, every vertex is reachable from every other vertex.
- DAG: A directed acyclic graph — a directed graph with no cycles.
- Dense graph: A graph where E is close to V² (many edges relative to vertices).
- Sparse graph: A graph where E is much smaller than V² (few edges relative to vertices). Most real-world graphs are sparse.
Graph representations
There are two standard ways to represent a graph in memory: adjacency lists and adjacency matrices. The choice affects the time and space complexity of graph operations.
Adjacency list
An adjacency list stores, for each vertex, a collection of its neighbors. This is the preferred representation for sparse graphs, which includes most graphs encountered in practice.
Graph:            Adjacency list:
1 — 2 — 3           1: [2, 4]
|       |           2: [1, 3]
4 ——————┘           3: [2, 4]
                    4: [1, 3]
Space: O(V + E). For each vertex we store its neighbor list; the total number of entries across all lists is 2E for undirected graphs (each edge appears twice) or E for directed graphs.
Our implementation uses a Map-based adjacency list. Each vertex maps to a Map of its neighbors and the corresponding edge weights:
export class Graph<T> {
private adj: Map<T, Map<T, number>> = new Map();
constructor(public readonly directed: boolean = false) {}
addVertex(v: T): void {
if (!this.adj.has(v)) {
this.adj.set(v, new Map());
}
}
addEdge(u: T, v: T, weight: number = 1): void {
this.addVertex(u);
this.addVertex(v);
this.adj.get(u)!.set(v, weight);
if (!this.directed) {
this.adj.get(v)!.set(u, weight);
}
}
hasEdge(u: T, v: T): boolean {
return this.adj.get(u)?.has(v) ?? false;
}
getNeighbors(v: T): [T, number][] {
const neighbors = this.adj.get(v);
if (!neighbors) return [];
return [...neighbors.entries()];
}
// ...
}
Using Map instead of a plain array gives us O(1) average-case edge lookup and supports arbitrary vertex types — not just integers. The directed flag controls whether addEdge creates edges in both directions.
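A quick usage sketch of the directed flag. The class is re-declared here in trimmed form so the example is self-contained; in the repository you would import the full Graph instead:

```typescript
// Trimmed re-declaration of the chapter's Graph, enough to show usage.
class Graph<T> {
  private adj: Map<T, Map<T, number>> = new Map();
  constructor(public readonly directed: boolean = false) {}
  addVertex(v: T): void {
    if (!this.adj.has(v)) this.adj.set(v, new Map());
  }
  addEdge(u: T, v: T, weight: number = 1): void {
    this.addVertex(u);
    this.addVertex(v);
    this.adj.get(u)!.set(v, weight);
    if (!this.directed) this.adj.get(v)!.set(u, weight);
  }
  hasEdge(u: T, v: T): boolean {
    return this.adj.get(u)?.has(v) ?? false;
  }
  getNeighbors(v: T): [T, number][] {
    return [...(this.adj.get(v)?.entries() ?? [])];
  }
}

// An undirected edge is stored in both adjacency maps...
const g = new Graph<string>();
g.addEdge('a', 'b');
console.log(g.hasEdge('b', 'a')); // true

// ...while a directed edge is stored only once.
const d = new Graph<string>(true);
d.addEdge('a', 'b');
console.log(d.hasEdge('b', 'a')); // false
```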
The complexity of common operations with an adjacency list:
| Operation | Time |
|---|---|
| Add vertex | O(1) |
| Add edge | O(1) |
| Remove edge | O(1) |
| Check edge | O(1) |
| Get neighbors | O(deg(v)) |
| Remove vertex | O(V) |
| Space | O(V + E) |
Adjacency matrix
An adjacency matrix stores the graph as a V × V matrix M where M[i][j] holds the weight of the edge from i to j (or ∞ if no edge exists). Vertices must be identified by integer indices 0, 1, ..., V − 1.
Graph:          Adjacency matrix:
0 — 1               0 1 2
|   |           0 [ ∞ 1 1 ]
2 ——┘           1 [ 1 ∞ 1 ]
                2 [ 1 1 ∞ ]
Space: O(V²), regardless of the number of edges. This makes the adjacency matrix inefficient for sparse graphs but convenient for dense graphs, where the space is similar to an adjacency list.
export class GraphMatrix {
private matrix: number[][];
constructor(
size: number,
public readonly directed: boolean = false,
) {
this.matrix = Array.from({ length: size }, () =>
Array.from({ length: size }, () => Infinity),
);
}
addEdge(u: number, v: number, weight: number = 1): void {
this.matrix[u]![v] = weight;
if (!this.directed) {
this.matrix[v]![u] = weight;
}
}
hasEdge(u: number, v: number): boolean {
return this.matrix[u]![v] !== Infinity;
}
getNeighbors(v: number): [number, number][] {
const result: [number, number][] = [];
for (let i = 0; i < this.matrix.length; i++) {
if (this.matrix[v]![i] !== Infinity) {
result.push([i, this.matrix[v]![i]!]);
}
}
return result;
}
}
The complexity of common operations with an adjacency matrix:
| Operation | Time |
|---|---|
| Add edge | O(1) |
| Remove edge | O(1) |
| Check edge | O(1) |
| Get neighbors | O(V) |
| Space | O(V²) |
When to use which?
| Criterion | Adjacency list | Adjacency matrix |
|---|---|---|
| Space | O(V + E) | O(V²) |
| Edge lookup | O(1) with Map | O(1) |
| Iterate neighbors | O(deg(v)) | O(V) |
| Best for | Sparse graphs | Dense graphs |
| Algorithms | BFS, DFS, Dijkstra, Kruskal | Floyd-Warshall, matrix algorithms |
Most graph algorithms iterate over the neighbors of each vertex, making the adjacency list the better choice for sparse graphs. The adjacency matrix is preferred when E is close to V² or when constant-time edge lookups with integer indices are important (e.g., Floyd-Warshall in Chapter 13).
Throughout this book, we default to the adjacency list representation.
Breadth-first search (BFS)
Breadth-first search explores a graph level by level: it visits all vertices at distance k from the source before any vertex at distance k + 1. This guarantees that BFS finds the shortest path (fewest edges) from the source to every reachable vertex in an unweighted graph.
The algorithm
BFS maintains a queue of vertices to visit. Starting from a source vertex s:
- Enqueue s and mark it as discovered with distance 0.
- While the queue is not empty:
a. Dequeue a vertex u.
b. For each neighbor v of u that has not been discovered:
- Mark v as discovered with distance d(u) + 1.
- Record u as the parent of v.
- Enqueue v.
The queue ensures that vertices are processed in the order they are discovered, which is the order of increasing distance from s.
Trace-through
Consider the following undirected graph, starting BFS from vertex 1:
1 — 2 — 5
|   |
3 — 4
| Step | Queue (front → back) | Process | Discover | Distance |
|---|---|---|---|---|
| 0 | [1] | — | 1 | d(1)=0 |
| 1 | [2, 3] | 1 | 2, 3 | d(2)=1, d(3)=1 |
| 2 | [3, 4, 5] | 2 | 4, 5 | d(4)=2, d(5)=2 |
| 3 | [4, 5] | 3 | — | (4 already discovered) |
| 4 | [5] | 4 | — | — |
| 5 | [] | 5 | — | — |
Every vertex is visited exactly once. The distances are correct: 2 and 3 are 1 edge from 1; 4 and 5 are 2 edges from 1.
Implementation
export interface BFSResult<T> {
parent: Map<T, T | undefined>;
distance: Map<T, number>;
order: T[];
}
export function bfs<T>(graph: Graph<T>, source: T): BFSResult<T> {
const parent = new Map<T, T | undefined>();
const distance = new Map<T, number>();
const order: T[] = [];
parent.set(source, undefined);
distance.set(source, 0);
order.push(source);
const queue: T[] = [source];
let head = 0;
while (head < queue.length) {
const u = queue[head++]!;
const d = distance.get(u)!;
for (const [v] of graph.getNeighbors(u)) {
if (!distance.has(v)) {
distance.set(v, d + 1);
parent.set(v, u);
order.push(v);
queue.push(v);
}
}
}
return { parent, distance, order };
}
We use an array with a head pointer as a simple queue (avoiding the overhead of a linked-list queue for this application). The distance map also serves as our "visited" set — a vertex has been discovered if and only if it has an entry in distance.
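The trace above can be checked mechanically. This sketch bundles a trimmed Graph with a condensed bfs that returns only the distance map (both trimmed here for self-containment):

```typescript
// Trimmed undirected Graph, enough to run BFS on the trace example.
class Graph<T> {
  private adj: Map<T, Map<T, number>> = new Map();
  addVertex(v: T): void {
    if (!this.adj.has(v)) this.adj.set(v, new Map());
  }
  addEdge(u: T, v: T): void {
    this.addVertex(u);
    this.addVertex(v);
    this.adj.get(u)!.set(v, 1);
    this.adj.get(v)!.set(u, 1);
  }
  getNeighbors(v: T): [T, number][] {
    return [...(this.adj.get(v)?.entries() ?? [])];
  }
}

// BFS condensed to distances only; the distance map doubles as the visited set.
function bfs<T>(graph: Graph<T>, source: T): Map<T, number> {
  const distance = new Map<T, number>([[source, 0]]);
  const queue: T[] = [source];
  let head = 0;
  while (head < queue.length) {
    const u = queue[head++]!;
    for (const [v] of graph.getNeighbors(u)) {
      if (!distance.has(v)) {
        distance.set(v, distance.get(u)! + 1);
        queue.push(v);
      }
    }
  }
  return distance;
}

// The trace graph: edges 1-2, 1-3, 2-4, 2-5, 3-4.
const g = new Graph<number>();
g.addEdge(1, 2);
g.addEdge(1, 3);
g.addEdge(2, 4);
g.addEdge(2, 5);
g.addEdge(3, 4);
const dist = bfs(g, 1);
console.log(dist.get(2), dist.get(3), dist.get(4), dist.get(5)); // 1 1 2 2
```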
Path reconstruction
The parent map produced by BFS encodes a shortest-path tree. To reconstruct the shortest path from source to target, follow parent pointers backward from the target:
export function reconstructPath<T>(
parent: Map<T, T | undefined>,
source: T,
target: T,
): T[] | null {
if (!parent.has(target)) return null;
const path: T[] = [];
let current: T | undefined = target;
while (current !== undefined) {
path.push(current);
current = parent.get(current);
}
path.reverse();
if (path[0] !== source) return null;
return path;
}
Complexity
- Time: O(V + E). Every vertex is enqueued and dequeued at most once (O(V)), and every edge is examined at most once for directed graphs or twice for undirected (O(E)).
- Space: O(V) for the queue, parent map, and distance map.
BFS is optimal for finding shortest paths in unweighted graphs. For weighted graphs, we need Dijkstra's algorithm (Chapter 13).
Depth-first search (DFS)
Depth-first search explores a graph by going as deep as possible along each branch before backtracking. Where BFS explores level by level (breadth-first), DFS explores path by path (depth-first).
The algorithm
DFS assigns two timestamps to each vertex:
- Discovery time d[v]: when the vertex is first encountered.
- Finish time f[v]: when all of v's descendants have been fully explored.
Starting from a source vertex, DFS:
- Mark the vertex as discovered (record discovery time).
- For each undiscovered neighbor, recursively visit it.
- Mark the vertex as finished (record finish time).
If the graph is disconnected, DFS restarts from unvisited vertices, producing a DFS forest.
Trace-through
Consider the directed graph:
1 → 2 → 3
↓       ↓
4 → 5   6
    ↑   |
    └───┘
Starting DFS from vertex 1:
| Action | Vertex | Time | Stack (conceptual) |
|---|---|---|---|
| Discover | 1 | 0 | [1] |
| Discover | 2 | 1 | [1, 2] |
| Discover | 3 | 2 | [1, 2, 3] |
| Discover | 6 | 3 | [1, 2, 3, 6] |
| Discover | 5 | 4 | [1, 2, 3, 6, 5] |
| Finish | 5 | 5 | [1, 2, 3, 6] |
| Finish | 6 | 6 | [1, 2, 3] |
| Finish | 3 | 7 | [1, 2] |
| Finish | 2 | 8 | [1] |
| Discover | 4 | 9 | [1, 4] |
| — | (5 already discovered) | — | — |
| Finish | 4 | 10 | [1] |
| Finish | 1 | 11 | [] |
The discovery and finish times satisfy the parenthesis theorem: for any two vertices u and v, either the intervals [d[u], f[u]] and [d[v], f[v]] are entirely disjoint (neither is an ancestor of the other) or one is entirely contained within the other (one is an ancestor of the other).
Edge classification
During DFS on a directed graph, every edge (u, v) falls into one of four categories based on the state of v when the edge is explored:
| Edge type | Condition | Meaning |
|---|---|---|
| Tree edge | v is undiscovered | v is discovered via this edge (part of the DFS tree) |
| Back edge | v is discovered but not finished | v is an ancestor of u — indicates a cycle |
| Forward edge | v is finished and d[u] < d[v] | v is a descendant of u already fully explored via another path |
| Cross edge | v is finished and d[u] > d[v] | v is in a different, already-finished subtree |
For undirected graphs, only tree edges and back edges are possible. Forward and cross edges cannot occur because every edge is traversed in both directions.
Implementation
export type EdgeType = 'tree' | 'back' | 'forward' | 'cross';
export interface ClassifiedEdge<T> {
from: T;
to: T;
type: EdgeType;
}
export interface DFSResult<T> {
discovery: Map<T, number>;
finish: Map<T, number>;
parent: Map<T, T | undefined>;
order: T[];
edges: ClassifiedEdge<T>[];
}
export function dfs<T>(
graph: Graph<T>,
startOrder?: T[],
): DFSResult<T> {
const discovery = new Map<T, number>();
const finish = new Map<T, number>();
const parent = new Map<T, T | undefined>();
const order: T[] = [];
const edges: ClassifiedEdge<T>[] = [];
let time = 0;
const vertices = startOrder ?? graph.getVertices();
function visit(u: T): void {
discovery.set(u, time++);
order.push(u);
for (const [v] of graph.getNeighbors(u)) {
if (!discovery.has(v)) {
edges.push({ from: u, to: v, type: 'tree' });
parent.set(v, u);
visit(v);
} else if (!finish.has(v)) {
if (!graph.directed && parent.get(u) === v) continue;
edges.push({ from: u, to: v, type: 'back' });
} else if (graph.directed) {
if (discovery.get(u)! < discovery.get(v)!) {
edges.push({ from: u, to: v, type: 'forward' });
} else {
edges.push({ from: u, to: v, type: 'cross' });
}
}
}
finish.set(u, time++);
}
for (const v of vertices) {
if (!discovery.has(v)) {
parent.set(v, undefined);
visit(v);
}
}
return { discovery, finish, parent, order, edges };
}
The three-state classification (undiscovered, discovered but not finished, finished) maps directly to the colors used in textbooks: white, gray, black.
For undirected graphs, we skip the edge back to the parent — this is the same undirected edge we just traversed to reach the current vertex, not a true back edge.
Complexity
- Time: O(V + E). Each vertex is visited once (O(V)), and each edge is examined once for directed graphs or twice for undirected (O(E)).
- Space: O(V) for the recursion stack, parent map, and discovery and finish times. In the worst case (a path graph), the recursion depth is V.
Topological sort
A topological sort (or topological ordering) of a DAG is a linear ordering of all its vertices such that for every directed edge (u, v), vertex u appears before v in the ordering. In other words, if there is a path from u to v, then u comes first.
Topological sort is only defined for directed acyclic graphs (DAGs). A directed graph with a cycle has no valid topological ordering — there is no way to place all vertices in a line when some edges point backward.
Applications
- Build systems (Make, Bazel): compile source files in dependency order.
- Task scheduling: schedule jobs so that each job's prerequisites are completed first.
- Course prerequisites: determine a valid order to take courses.
- Spreadsheet evaluation: compute cells in an order that respects formula dependencies.
- Package managers (npm, apt): install dependencies before dependents.
Kahn's algorithm (BFS-based)
Kahn's algorithm (1962) uses the idea that a vertex with no incoming edges can safely go first in the ordering:
- Compute the in-degree of every vertex.
- Add all vertices with in-degree 0 to a queue.
- While the queue is not empty: a. Dequeue a vertex u and add it to the result. b. For each neighbor v of u, decrement v's in-degree. If v's in-degree becomes 0, enqueue v.
- If the result contains all vertices, return it. Otherwise, the graph has a cycle.
export function topologicalSortKahn<T>(graph: Graph<T>): T[] | null {
const vertices = graph.getVertices();
const inDeg = new Map<T, number>();
for (const v of vertices) {
inDeg.set(v, 0);
}
for (const v of vertices) {
for (const [u] of graph.getNeighbors(v)) {
inDeg.set(u, (inDeg.get(u) ?? 0) + 1);
}
}
const queue: T[] = [];
for (const [v, deg] of inDeg) {
if (deg === 0) queue.push(v);
}
const order: T[] = [];
let head = 0;
while (head < queue.length) {
const u = queue[head++]!;
order.push(u);
for (const [v] of graph.getNeighbors(u)) {
const newDeg = inDeg.get(v)! - 1;
inDeg.set(v, newDeg);
if (newDeg === 0) queue.push(v);
}
}
return order.length === vertices.length ? order : null;
}
Cycle detection: If the graph has a cycle, some vertices will never reach in-degree 0 and will never be enqueued. The algorithm detects this by checking whether all vertices were processed.
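Both the ordering and the cycle check can be sketched on a bare adjacency map (vertex to out-neighbor list), used here instead of the chapter's Graph class to keep the example self-contained:

```typescript
// Kahn's algorithm over a bare adjacency map: vertex -> list of out-neighbors.
function topoSort(adj: Map<string, string[]>): string[] | null {
  const inDeg = new Map<string, number>();
  for (const v of adj.keys()) inDeg.set(v, 0);
  for (const outs of adj.values()) {
    for (const u of outs) inDeg.set(u, (inDeg.get(u) ?? 0) + 1);
  }
  const queue: string[] = [];
  for (const [v, deg] of inDeg) if (deg === 0) queue.push(v);
  const order: string[] = [];
  let head = 0;
  while (head < queue.length) {
    const u = queue[head++]!;
    order.push(u);
    for (const v of adj.get(u) ?? []) {
      const d = inDeg.get(v)! - 1;
      inDeg.set(v, d);
      if (d === 0) queue.push(v);
    }
  }
  // If some vertices never reached in-degree 0, the graph has a cycle.
  return order.length === inDeg.size ? order : null;
}

const dag = new Map<string, string[]>([
  ['a', ['b', 'c']],
  ['b', ['d']],
  ['c', ['d']],
  ['d', []],
]);
console.log(topoSort(dag)); // [ 'a', 'b', 'c', 'd' ]

const cyclic = new Map<string, string[]>([['a', ['b']], ['b', ['a']]]);
console.log(topoSort(cyclic)); // null
```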
DFS-based topological sort
An alternative approach uses DFS. A topological ordering is the reverse of the DFS finish-time order: the vertex that finishes last should appear first.
export function topologicalSortDFS<T>(graph: Graph<T>): T[] | null {
const vertices = graph.getVertices();
const enum Color { White, Gray, Black }
const color = new Map<T, Color>();
for (const v of vertices) {
color.set(v, Color.White);
}
const order: T[] = [];
let hasCycle = false;
function visit(u: T): void {
if (hasCycle) return;
color.set(u, Color.Gray);
for (const [v] of graph.getNeighbors(u)) {
const c = color.get(v)!;
if (c === Color.Gray) {
hasCycle = true;
return;
}
if (c === Color.White) {
visit(v);
if (hasCycle) return;
}
}
color.set(u, Color.Black);
order.push(u);
}
for (const v of vertices) {
if (color.get(v) === Color.White) {
visit(v);
if (hasCycle) return null;
}
}
order.reverse();
return order;
}
When we encounter a gray vertex (an ancestor on the current DFS path), we have found a back edge, which means the graph has a cycle.
Trace-through
Consider the "dressing order" DAG:
undershorts → pants → shoes
pants → belt → jacket
shirt → belt
shirt → tie → jacket
socks → shoes
watch (isolated)
Kahn's algorithm would start with vertices that have in-degree 0: undershorts, shirt, socks, watch. Processing them removes their outgoing edges, reducing in-degrees and producing new zero-in-degree vertices. A valid result:
undershorts, shirt, socks, watch, pants, tie, belt, shoes, jacket
DFS-based topological sort would produce a different but equally valid ordering based on which vertices are explored first.
Complexity
Both algorithms run in O(V + E) time and O(V) space.
Cycle detection
Cycle detection determines whether a graph contains a cycle. This is important for:
- Validating that a dependency graph is a DAG (and thus can be topologically sorted).
- Detecting deadlocks in resource allocation graphs.
- Identifying infinite loops in state machines.
Directed cycle detection
A directed graph has a cycle if and only if a DFS discovers a back edge — an edge to a vertex that is currently being explored (gray in the three-color scheme).
export function hasDirectedCycle<T>(graph: Graph<T>): boolean {
const enum Color { White, Gray, Black }
const color = new Map<T, Color>();
for (const v of graph.getVertices()) {
color.set(v, Color.White);
}
function visit(u: T): boolean {
color.set(u, Color.Gray);
for (const [v] of graph.getNeighbors(u)) {
const c = color.get(v)!;
if (c === Color.Gray) return true;
if (c === Color.White && visit(v)) return true;
}
color.set(u, Color.Black);
return false;
}
for (const v of graph.getVertices()) {
if (color.get(v) === Color.White && visit(v)) {
return true;
}
}
return false;
}
The three colors are essential for directed cycle detection. A vertex colored gray is on the current DFS path. If we encounter a gray vertex, we have found a cycle. A black vertex (already finished) is not on the current path — an edge to a black vertex is a cross or forward edge, not evidence of a cycle.
Undirected cycle detection
For undirected graphs, cycle detection is simpler. During DFS, if we encounter a visited vertex that is not the parent of the current vertex, we have found a cycle:
export function hasUndirectedCycle<T>(graph: Graph<T>): boolean {
const visited = new Set<T>();
function visit(u: T, parent: T | undefined): boolean {
visited.add(u);
for (const [v] of graph.getNeighbors(u)) {
if (!visited.has(v)) {
if (visit(v, u)) return true;
} else if (v !== parent) {
return true;
}
}
return false;
}
for (const v of graph.getVertices()) {
if (!visited.has(v)) {
if (visit(v, undefined)) return true;
}
}
return false;
}
We only need two states (visited / not visited) instead of three, because in an undirected graph every non-tree edge to a visited non-parent vertex indicates a cycle. There are no forward or cross edges to worry about.
Complexity
Both directed and undirected cycle detection run in O(V + E) time and O(V) space, since they are based on DFS.
Connected components
A connected component of an undirected graph is a maximal set of vertices such that every pair is connected by a path. BFS or DFS can find all connected components:
components = 0
for each vertex v:
if v is not visited:
BFS(v) or DFS(v) // marks all vertices in v's component
components += 1
Each traversal from an unvisited vertex discovers one component. The total time is O(V + E) since every vertex and edge is examined once across all traversals.
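The pseudocode above translates directly to TypeScript. This sketch counts components with BFS over a bare adjacency map rather than the chapter's Graph class:

```typescript
// Count connected components of an undirected graph given as a bare
// adjacency map (vertex -> neighbors). Each BFS marks one whole component.
function countComponents(adj: Map<string, string[]>): number {
  const visited = new Set<string>();
  let components = 0;
  for (const start of adj.keys()) {
    if (visited.has(start)) continue;
    components++;
    const queue = [start];
    visited.add(start);
    let head = 0;
    while (head < queue.length) {
      const u = queue[head++]!;
      for (const v of adj.get(u) ?? []) {
        if (!visited.has(v)) {
          visited.add(v);
          queue.push(v);
        }
      }
    }
  }
  return components;
}

// Two components: {a, b, c} and {d, e}.
const adj = new Map<string, string[]>([
  ['a', ['b']],
  ['b', ['a', 'c']],
  ['c', ['b']],
  ['d', ['e']],
  ['e', ['d']],
]);
console.log(countComponents(adj)); // 2
```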
For directed graphs, the analogous concept is strongly connected components (SCCs): maximal sets of vertices where every vertex is reachable from every other vertex. Algorithms for finding SCCs (Kosaraju's, Tarjan's) build on DFS and will be discussed in later chapters.
BFS vs. DFS
| Property | BFS | DFS |
|---|---|---|
| Traversal order | Level by level | As deep as possible |
| Data structure | Queue | Stack (recursion or explicit) |
| Shortest paths (unweighted) | Yes | No |
| Edge classification (directed) | Tree, back, cross | Tree, back, forward, cross |
| Topological sort | Yes (Kahn's) | Yes (reverse finish order) |
| Cycle detection | Yes (via Kahn's / BFS topo sort) | Yes (back edge detection) |
| Memory | O(V) — may store an entire level | O(V) — stack depth |
| Best for | Shortest paths, level-order | Cycle detection, topological sort, backtracking |
Both algorithms visit every vertex and edge exactly once (or twice for undirected edges), giving O(V + E) time. The choice between them depends on the problem:
- Use BFS when you need shortest paths in an unweighted graph or want to explore vertices in order of distance.
- Use DFS when you need to detect cycles, classify edges, compute topological orderings, or explore all paths for backtracking algorithms.
Summary
A graph models pairwise relationships between objects. The two standard representations — adjacency list (O(V + E) space, efficient neighbor iteration) and adjacency matrix (O(V²) space, O(1) edge lookup) — offer different trade-offs suited to sparse and dense graphs respectively.
Breadth-first search explores vertices level by level using a queue, computing shortest distances in unweighted graphs in O(V + E) time. Depth-first search explores as deep as possible using recursion, assigning discovery and finish timestamps that enable edge classification into tree, back, forward, and cross edges.
Two important applications of DFS are topological sorting — producing a linear ordering of a DAG's vertices consistent with edge directions — and cycle detection — determining whether a graph contains a cycle by looking for back edges. Both run in O(V + E) time.
These traversal algorithms form the foundation for nearly every graph algorithm in the chapters that follow. In Chapter 13, we will combine BFS ideas with the priority queue from Chapter 11 to solve the single-source shortest-path problem on weighted graphs (Dijkstra's algorithm). In Chapter 14, we will use graph traversal to find minimum spanning trees.
Exercises
Exercise 12.1. Draw the adjacency list and adjacency matrix for the following directed graph. Which representation uses less space?
A → B → C
↓ ↑
D → E → F
Exercise 12.2. Run BFS on the following undirected graph starting from vertex s. Record the discovery order, the distance from s to each vertex, and the BFS tree (parent pointers). Show the state of the queue at each step.
s — a — b
|   |
c — d — e
|
f
Exercise 12.3. Run DFS on the graph from Exercise 12.2 (treating it as directed with edges going both ways). Record discovery and finish times for each vertex. Verify that the parenthesis theorem holds: for every pair of vertices u and v, the intervals [d[u], f[u]] and [d[v], f[v]] are either disjoint or one contains the other.
Exercise 12.4. A bipartite graph is an undirected graph whose vertices can be partitioned into two sets L and R such that every edge connects a vertex in L to a vertex in R. Prove that a graph is bipartite if and only if it contains no odd-length cycle. Then describe an algorithm to determine whether a graph is bipartite, using BFS. (Hint: try to 2-color the graph level by level.)
Exercise 12.5. A tournament is a directed graph where every pair of vertices is connected by exactly one directed edge. Prove that every tournament has a Hamiltonian path (a path that visits every vertex exactly once). Then describe an algorithm to find one. (Hint: use divide-and-conquer.)
Shortest Paths
In Chapter 12 we introduced BFS, which finds shortest paths in unweighted graphs — that is, paths with the fewest edges. Most real-world graphs, however, carry weights on their edges: travel times on a road map, latencies in a network, costs in a supply chain. In this chapter we study algorithms that find shortest paths in weighted graphs, where the length of a path is the sum of its edge weights rather than the number of edges. We present four algorithms, each suited to different settings: Dijkstra's algorithm for graphs with non-negative weights, Bellman-Ford for graphs that may have negative weights, a linear-time algorithm for DAGs, and Floyd-Warshall for computing shortest paths between all pairs of vertices.
The shortest-path problem
Given a weighted directed graph G = (V, E) with edge-weight function w and a source vertex s, the single-source shortest-paths problem asks: for every vertex v, what is the minimum-weight path from s to v?
The weight of a path p = v0 → v1 → ... → vk is the sum of its edge weights: w(p) = w(v0, v1) + w(v1, v2) + ... + w(vk−1, vk).
The shortest-path weight δ(s, v) from s to v is the minimum of w(p) over all paths p from s to v, or ∞ if no such path exists.
A shortest path from s to v is any path p with w(p) = δ(s, v).
Negative weights and negative cycles
When all edge weights are non-negative, shortest paths are well-defined. When negative-weight edges exist, a complication arises: a negative-weight cycle — a cycle whose total weight is negative — can be traversed repeatedly to make path weights arbitrarily negative. If such a cycle is reachable from the source, shortest-path distances are undefined for any vertex reachable from the cycle.
We will carefully note which algorithms handle negative weights and which detect negative cycles.
Relaxation
All single-source shortest-path algorithms share a common operation: relaxation. For each vertex we maintain an estimate d[v] of the shortest-path weight from the source (initially ∞ for all vertices except the source, which starts at 0). Relaxing an edge (u, v) checks whether the path through u offers a shorter route to v:
Relax(u, v, w):
if d[u] + w(u, v) < d[v]:
d[v] = d[u] + w(u, v)
parent[v] = u
The algorithms in this chapter differ in the order and number of times they relax edges.
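In TypeScript, relaxation becomes a small helper over the dist and parent maps. This standalone sketch mirrors the pseudocode above; the boolean return value is an addition for convenience, not part of the classic definition:

```typescript
// Relax edge (u, v) with weight w: adopt the route through u if shorter.
// Returns true when d[v] improved.
function relax<T>(
  dist: Map<T, number>,
  parent: Map<T, T | undefined>,
  u: T,
  v: T,
  w: number,
): boolean {
  const candidate = (dist.get(u) ?? Infinity) + w;
  if (candidate < (dist.get(v) ?? Infinity)) {
    dist.set(v, candidate);
    parent.set(v, u);
    return true;
  }
  return false;
}

const dist = new Map<string, number>([['s', 0]]);
const parent = new Map<string, string | undefined>([['s', undefined]]);
relax(dist, parent, 's', 'a', 4); // d[a] becomes 4
relax(dist, parent, 's', 'a', 7); // no improvement; d[a] stays 4
console.log(dist.get('a')); // 4
```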
Shared result type
Our implementations share a common result type representing shortest-path distances and predecessor pointers:
export interface ShortestPathResult<T> {
dist: Map<T, number>;
parent: Map<T, T | undefined>;
}
The parent map allows us to reconstruct the actual shortest path from source to any target:
export function reconstructPath<T>(
parent: Map<T, T | undefined>,
source: T,
target: T,
): T[] | null {
if (!parent.has(target)) return null;
const path: T[] = [];
let current: T | undefined = target;
while (current !== undefined) {
path.push(current);
current = parent.get(current);
}
path.reverse();
if (path[0] !== source) return null;
return path;
}
This is the same backtracking technique we used for BFS path reconstruction in Chapter 12: we follow parent pointers from the target back to the source, then reverse the result.
Dijkstra's algorithm
Dijkstra's algorithm (1959) solves the single-source shortest-paths problem for graphs with non-negative edge weights. It is the workhorse algorithm for shortest paths in practice — used in GPS navigation, network routing (OSPF), and countless other applications.
Intuition
The key insight is greedy: among all vertices whose shortest-path distance is not yet finalized, the one with the smallest current estimate already has the correct shortest-path distance. Why? Because all edge weights are non-negative, so any other path to that vertex must pass through a vertex with a distance estimate at least as large, making the total at least as long.
This is exactly analogous to BFS, except that instead of a FIFO queue (which processes vertices in order of number of edges), we use a priority queue ordered by distance estimates.
Algorithm
- Initialize d[s] = 0 and d[v] = ∞ for all other vertices.
- Insert the source into a min-priority queue with priority 0.
- While the priority queue is not empty: a. Extract the vertex u with the smallest priority. b. If u has already been visited, skip it. c. Mark u as visited. d. For each neighbor v of u, relax the edge (u, v). If the distance improves, insert v into the priority queue with the new distance.
Implementation
import { Graph } from '../12-graphs-and-traversal/graph.js';
import { PriorityQueue } from '../11-heaps-and-priority-queues/priority-queue.js';
export function dijkstra<T>(
graph: Graph<T>,
source: T,
): ShortestPathResult<T> {
const dist = new Map<T, number>();
const parent = new Map<T, T | undefined>();
const visited = new Set<T>();
for (const v of graph.getVertices()) {
dist.set(v, Infinity);
}
dist.set(source, 0);
parent.set(source, undefined);
const pq = new PriorityQueue<T>();
pq.enqueue(source, 0);
while (!pq.isEmpty) {
const u = pq.dequeue()!;
if (visited.has(u)) continue;
visited.add(u);
for (const [v, weight] of graph.getNeighbors(u)) {
const newDist = dist.get(u)! + weight;
if (newDist < dist.get(v)!) {
dist.set(v, newDist);
parent.set(v, u);
pq.enqueue(v, newDist);
}
}
}
return { dist, parent };
}
Implementation note: Rather than implementing an explicit decrease-key operation, we insert a new entry into the priority queue whenever we find a shorter path. The visited set ensures we process each vertex only once — duplicate entries for already-visited vertices are simply skipped. This is a common practical optimization often called the "lazy Dijkstra" approach, and it does not affect correctness.
Trace-through
Consider the following directed graph with source s:
| Edge | Weight |
|---|---|
| s → t | 10 |
| s → y | 5 |
| t → y | 2 |
| t → x | 1 |
| y → t | 3 |
| y → x | 9 |
| x → z | 4 |
| z → x | 6 |
| z → s | 7 |
Step-by-step execution from source s:
| Step | Extract | dist(s) | dist(t) | dist(y) | dist(x) | dist(z) | Action |
|---|---|---|---|---|---|---|---|
| Init | — | 0 | ∞ | ∞ | ∞ | ∞ | Enqueue s with priority 0 |
| 1 | s | 0 | 10 | 5 | ∞ | ∞ | Relax s → t and s → y |
| 2 | y | 0 | 8 | 5 | 14 | ∞ | Relax y → t (5+3=8 < 10) and y → x (5+9=14) |
| 3 | t | 0 | 8 | 5 | 9 | ∞ | Relax t → x (8+1=9 < 14) |
| 4 | x | 0 | 8 | 5 | 9 | 13 | Relax x → z (9+4=13) |
| 5 | z | 0 | 8 | 5 | 9 | 13 | Done (z → s: 13+7=20 > 0, no update) |
Final shortest-path distances: dist(s) = 0, dist(t) = 8, dist(y) = 5, dist(x) = 9, dist(z) = 13.
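The trace can be reproduced with a compact, self-contained sketch of the lazy approach. For brevity it uses a repeatedly sorted plain array in place of the chapter's binary-heap priority queue, which changes the running time but not the results; the `dijkstraSketch` name and adjacency-list shape are illustrative, not the book's API:

```typescript
// Lazy Dijkstra over an adjacency list; a sorted array stands in
// for extract-min, and stale queue entries are skipped on extraction.
type AdjList = Map<string, [string, number][]>;

function dijkstraSketch(graph: AdjList, source: string): Map<string, number> {
  const dist = new Map<string, number>();
  for (const v of graph.keys()) dist.set(v, Infinity);
  dist.set(source, 0);
  const visited = new Set<string>();
  const pq: [string, number][] = [[source, 0]];
  while (pq.length > 0) {
    pq.sort((a, b) => a[1] - b[1]); // stand-in for extract-min
    const [u] = pq.shift()!;
    if (visited.has(u)) continue; // stale (lazy) entry
    visited.add(u);
    for (const [v, w] of graph.get(u) ?? []) {
      const nd = dist.get(u)! + w;
      if (nd < dist.get(v)!) {
        dist.set(v, nd);
        pq.push([v, nd]);
      }
    }
  }
  return dist;
}

// The graph from the trace above.
const g: AdjList = new Map<string, [string, number][]>([
  ['s', [['t', 10], ['y', 5]]],
  ['t', [['y', 2], ['x', 1]]],
  ['y', [['t', 3], ['x', 9]]],
  ['x', [['z', 4]]],
  ['z', [['x', 6], ['s', 7]]],
]);

const d = dijkstraSketch(g, 's');
console.log([...d.entries()]); // s: 0, t: 8, y: 5, x: 9, z: 13
```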
Complexity
- Time: O((V + E) log V) with a binary heap. Each vertex is extracted at most once (O(V log V) total). Each edge triggers at most one priority queue insertion (O(E log V) total).
- Space: O(V + E) for the graph plus O(E) for the priority queue and O(V) for the distance maps.
With a Fibonacci heap, the time complexity improves to O(E + V log V), but Fibonacci heaps are complex to implement and have high constant factors. For most practical purposes, the binary-heap version is preferred.
Correctness argument
Dijkstra's algorithm is correct when all edge weights are non-negative. The proof relies on the following loop invariant: when a vertex u is extracted from the priority queue, dist[u] = δ(s, u), the true shortest-path distance.
Sketch: Suppose for contradiction that u is the first vertex extracted with dist[u] > δ(s, u). Consider a true shortest path from s to u. Let (x, y) be the first edge on this path where x has already been finalized but y has not. When x was finalized, edge (x, y) was relaxed, so dist[y] ≤ δ(s, x) + w(x, y) = δ(s, y) ≤ δ(s, u) < dist[u]. But then y would have been extracted before u, contradicting our choice of u. (The inequality δ(s, y) ≤ δ(s, u) relies on non-negative weights: each edge on the subpath from y to u contributes a non-negative amount.)
When Dijkstra fails
With negative edge weights, the greedy assumption breaks down. A vertex may be extracted with a distance estimate that is later revealed to be too high, because a path through a later-discovered vertex with a negative edge reaches it more cheaply. For this reason, Dijkstra's algorithm produces incorrect results on graphs with negative edges.
Bellman-Ford algorithm
The Bellman-Ford algorithm (1958) solves the single-source shortest-paths problem for graphs with arbitrary edge weights — including negative weights. It also detects negative-weight cycles reachable from the source.
Algorithm
- Initialize dist[s] = 0 and dist[v] = ∞ for all other vertices.
- Repeat V − 1 times: relax every edge in the graph.
- Check for negative cycles: scan all edges once more. If any edge can still be relaxed, the graph has a negative-weight cycle reachable from the source.
Why V − 1 iterations? A shortest path in a graph with no negative cycles has at most V − 1 edges (it is a simple path). After iteration i, the algorithm has correctly computed all shortest paths that use at most i edges. After V − 1 iterations, all shortest paths (with up to V − 1 edges) are correctly computed.
Implementation
import { Graph } from '../12-graphs-and-traversal/graph.js';
export interface BellmanFordResult<T> extends ShortestPathResult<T> {
hasNegativeCycle: boolean;
}
export function bellmanFord<T>(
graph: Graph<T>,
source: T,
): BellmanFordResult<T> {
const vertices = graph.getVertices();
const dist = new Map<T, number>();
const parent = new Map<T, T | undefined>();
for (const v of vertices) {
dist.set(v, Infinity);
}
dist.set(source, 0);
parent.set(source, undefined);
// Relax all edges V-1 times.
const V = vertices.length;
for (let i = 0; i < V - 1; i++) {
let changed = false;
for (const u of vertices) {
const du = dist.get(u)!;
if (du === Infinity) continue;
for (const [v, weight] of graph.getNeighbors(u)) {
const newDist = du + weight;
if (newDist < dist.get(v)!) {
dist.set(v, newDist);
parent.set(v, u);
changed = true;
}
}
}
if (!changed) break; // Early termination
}
// Check for negative-weight cycles.
let hasNegativeCycle = false;
for (const u of vertices) {
const du = dist.get(u)!;
if (du === Infinity) continue;
for (const [v, weight] of graph.getNeighbors(u)) {
if (du + weight < dist.get(v)!) {
hasNegativeCycle = true;
break;
}
}
if (hasNegativeCycle) break;
}
return { dist, parent, hasNegativeCycle };
}
Early termination: If no distance estimate changes in an entire pass, all distances are final and we can stop early. This optimization does not improve the worst-case complexity but can significantly speed up the algorithm on graphs where shortest paths have few edges.
Trace-through
Consider the CLRS example graph (directed, with negative edges):
| Edge | Weight |
|---|---|
| s → t | 6 |
| s → y | 7 |
| t → x | 5 |
| t → y | 8 |
| t → z | −4 |
| y → x | −3 |
| y → z | 9 |
| x → t | −2 |
| z → s | 2 |
| z → x | 7 |
Running Bellman-Ford from source s, after all passes converge:
| Vertex | dist | Shortest path from s |
|---|---|---|
| s | 0 | — |
| t | 2 | s → y → x → t |
| x | 4 | s → y → x |
| y | 7 | s → y |
| z | −2 | s → y → x → t → z |
The shortest path to z has weight −2, using three negative edges (y → x, x → t, and t → z).
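The converged distances can be checked with a minimal edge-list Bellman-Ford sketch. It is self-contained; the relaxation order within a pass affects intermediate values but not the final result:

```typescript
// Edges of the CLRS example graph above, as (from, to, weight) tuples.
const edges: [string, string, number][] = [
  ['s', 't', 6], ['s', 'y', 7], ['t', 'x', 5], ['t', 'y', 8],
  ['t', 'z', -4], ['y', 'x', -3], ['y', 'z', 9], ['x', 't', -2],
  ['z', 's', 2], ['z', 'x', 7],
];
const vertices = ['s', 't', 'x', 'y', 'z'];
const dist = new Map<string, number>(
  vertices.map((v): [string, number] => [v, Infinity]),
);
dist.set('s', 0);
// Relax every edge V - 1 times.
for (let i = 0; i < vertices.length - 1; i++) {
  for (const [u, v, w] of edges) {
    const du = dist.get(u)!;
    if (du !== Infinity && du + w < dist.get(v)!) dist.set(v, du + w);
  }
}
console.log([...dist.entries()]); // s: 0, t: 2, x: 4, y: 7, z: -2
```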
Complexity
- Time: O(V · E). The outer loop runs at most V − 1 times, and each iteration examines all E edges.
- Space: O(V) for distances and parent pointers.
Negative cycle detection
The check in the final pass is both necessary and sufficient. If a negative cycle is reachable from the source, then after V − 1 relaxation passes, at least one edge on the cycle can still be relaxed — because traversing the cycle one more time would further decrease the distance. Conversely, if no edge can be relaxed, then dist[v] = δ(s, v) for all reachable vertices v and no negative cycle exists.
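Here is a minimal self-contained sketch of the detection pass on a made-up three-vertex graph containing a cycle of total weight −1; the `bellmanFordSketch` name and shapes are illustrative, not the chapter's API:

```typescript
type WEdge = { from: string; to: string; weight: number };

function bellmanFordSketch(
  vertices: string[],
  edges: WEdge[],
  source: string,
): { dist: Map<string, number>; hasNegativeCycle: boolean } {
  const dist = new Map<string, number>();
  for (const v of vertices) dist.set(v, Infinity);
  dist.set(source, 0);
  // Relax every edge V - 1 times.
  for (let i = 0; i < vertices.length - 1; i++) {
    for (const { from, to, weight } of edges) {
      const d = dist.get(from)!;
      if (d !== Infinity && d + weight < dist.get(to)!) {
        dist.set(to, d + weight);
      }
    }
  }
  // One extra pass: any further improvement means a negative cycle.
  const hasNegativeCycle = edges.some(({ from, to, weight }) => {
    const d = dist.get(from)!;
    return d !== Infinity && d + weight < dist.get(to)!;
  });
  return { dist, hasNegativeCycle };
}

const cyclic: WEdge[] = [
  { from: 'a', to: 'b', weight: 1 },
  { from: 'b', to: 'c', weight: -3 },
  { from: 'c', to: 'a', weight: 1 }, // cycle a -> b -> c -> a totals -1
];
const { hasNegativeCycle } = bellmanFordSketch(['a', 'b', 'c'], cyclic, 'a');
console.log(hasNegativeCycle); // true
```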
DAG shortest paths
When the input graph is a directed acyclic graph (DAG), we can find shortest paths in O(V + E) time — even with negative edge weights. The idea is simple: process vertices in topological order.
Algorithm
- Compute a topological ordering of the DAG (using Kahn's algorithm or DFS, as described in Chapter 12).
- Initialize and for all other vertices.
- For each vertex in topological order: relax all outgoing edges of .
Since vertices are processed in topological order, when we relax the edges of , all vertices that could provide a shorter path to have already been processed. Every edge is relaxed exactly once.
Implementation
import { Graph } from '../12-graphs-and-traversal/graph.js';
import { topologicalSortKahn }
from '../12-graphs-and-traversal/topological-sort.js';
export function dagShortestPaths<T>(
graph: Graph<T>,
source: T,
): ShortestPathResult<T> {
const order = topologicalSortKahn(graph);
if (order === null) {
throw new Error(
'Graph contains a cycle; DAG shortest paths requires a DAG',
);
}
const dist = new Map<T, number>();
const parent = new Map<T, T | undefined>();
for (const v of graph.getVertices()) {
dist.set(v, Infinity);
}
dist.set(source, 0);
parent.set(source, undefined);
for (const u of order) {
const du = dist.get(u)!;
if (du === Infinity) continue;
for (const [v, weight] of graph.getNeighbors(u)) {
const newDist = du + weight;
if (newDist < dist.get(v)!) {
dist.set(v, newDist);
parent.set(v, u);
}
}
}
return { dist, parent };
}
Why this works
A topological order guarantees that for every edge (u, v), vertex u is processed before v. When we process u and relax its outgoing edges, dist[u] is already optimal — all predecessors of u in the graph have already been processed. Therefore, each edge is relaxed exactly once, and after processing all vertices, dist[v] = δ(s, v) for every reachable vertex v.
This argument does not require non-negative weights. Even if edge (u, v) has a negative weight, when we process u we have the correct dist[u], so the relaxation computes the correct contribution of this edge.
Applications
DAG shortest paths are useful for:
- Critical path analysis (PERT/CPM): find the longest path in a project task graph to determine the minimum project duration. (Use negated weights to convert longest-path to shortest-path.)
- Dynamic programming on DAGs: many DP problems can be modeled as shortest or longest paths in a DAG.
- Pipeline scheduling: determine minimum latency through a pipeline of processing stages.
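As a small illustration of the critical-path idea, here is a self-contained sketch that computes the longest path in a tiny made-up task DAG by relaxing negated weights in a hand-verified topological order (all names and durations are invented for the example):

```typescript
// Vertices listed in topological order; edge weights are task durations.
const order = ['start', 'a', 'b', 'end'];
const tasks: [string, string, number][] = [
  ['start', 'a', 3],
  ['start', 'b', 2],
  ['a', 'end', 4],
  ['b', 'end', 6],
];

const dist = new Map<string, number>(
  order.map((v): [string, number] => [v, Infinity]),
);
dist.set('start', 0);
// Shortest paths over negated weights = longest paths over the originals.
for (const u of order) {
  const du = dist.get(u)!;
  if (du === Infinity) continue;
  for (const [from, to, w] of tasks) {
    if (from !== u) continue;
    if (du - w < dist.get(to)!) dist.set(to, du - w);
  }
}
const criticalPathLength = -dist.get('end')!;
console.log(criticalPathLength); // 8, via start -> b -> end (2 + 6)
```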
Complexity
- Time: O(V + E) — topological sort takes O(V + E), and relaxing all edges takes O(E).
- Space: O(V).
This is asymptotically optimal: we must examine every vertex and every edge at least once, and there are V vertices and E edges.
Floyd-Warshall algorithm
The previous three algorithms solve the single-source shortest-paths problem: shortest paths from one specific source vertex. The Floyd-Warshall algorithm (1962) solves a different problem: all-pairs shortest paths — the shortest distance between every pair of vertices simultaneously.
Of course, we could run Dijkstra's algorithm V times (once from each vertex) to get all-pairs shortest paths in O(V · E log V) time. But Floyd-Warshall uses a different approach based on dynamic programming that runs in O(V³) time, which is simpler to implement and competitive for dense graphs where E ≈ V².
The dynamic programming formulation
Define dist_k(i, j) as the shortest-path weight from vertex i to vertex j using only vertices {1, 2, …, k} as intermediate vertices. The recurrence is:
dist_k(i, j) = min( dist_{k−1}(i, j), dist_{k−1}(i, k) + dist_{k−1}(k, j) )
In words: the shortest path from i to j through vertices {1, …, k} either avoids vertex k entirely (first term) or goes through k (second term).
Base case: dist_0(i, j) = w(i, j) if edge (i, j) exists, ∞ if not, and 0 if i = j.
Final answer: dist_V(i, j) = δ(i, j) for all pairs (i, j).
Space optimization
The three nested loops can update the matrix in place. When computing round k, the values dist(i, k) and dist(k, j) are not modified by including vertex k as an intermediate (setting i = k or j = k in the recurrence doesn't change the result, since dist(k, k) = 0). Therefore, we need only a single 2D matrix rather than V copies.
Implementation
import { Graph } from '../12-graphs-and-traversal/graph.js';
export interface FloydWarshallResult<T> {
dist: number[][];
next: number[][];
vertices: T[];
}
export function floydWarshall<T>(
graph: Graph<T>,
): FloydWarshallResult<T> {
const vertices = graph.getVertices();
const V = vertices.length;
const indexOf = new Map<T, number>();
for (let i = 0; i < V; i++) {
indexOf.set(vertices[i]!, i);
}
// Initialize distance and next-hop matrices.
const dist: number[][] = Array.from({ length: V }, () =>
Array.from({ length: V }, () => Infinity),
);
const next: number[][] = Array.from({ length: V }, () =>
Array.from({ length: V }, () => -1),
);
for (let i = 0; i < V; i++) {
dist[i]![i] = 0;
next[i]![i] = i;
}
// Seed with direct edges.
for (const v of vertices) {
const u = indexOf.get(v)!;
for (const [neighbor, weight] of graph.getNeighbors(v)) {
const w = indexOf.get(neighbor)!;
if (weight < dist[u]![w]!) {
dist[u]![w] = weight;
next[u]![w] = w;
}
}
}
// DP: consider each vertex k as intermediate.
for (let k = 0; k < V; k++) {
for (let i = 0; i < V; i++) {
for (let j = 0; j < V; j++) {
const throughK = dist[i]![k]! + dist[k]![j]!;
if (throughK < dist[i]![j]!) {
dist[i]![j] = throughK;
next[i]![j] = next[i]![k]!;
}
}
}
}
return { dist, next, vertices };
}
The next matrix tracks the first hop on the shortest path from i to j, enabling path reconstruction:
export function reconstructPathFW(
next: number[][],
i: number,
j: number,
): number[] | null {
if (next[i]![j] === -1) return null;
const path = [i];
let current = i;
while (current !== j) {
current = next[current]![j]!;
if (current === -1) return null;
path.push(current);
}
return path;
}
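A tiny self-contained run ties the pieces together. The 4-vertex graph (indices 0–3) and its edge weights are made up for illustration, and the path-reconstruction helper is restated locally so the snippet runs on its own:

```typescript
const V = 4;
// Distance and next-hop matrices, initialized as in the chapter.
const dist: number[][] = Array.from({ length: V }, () =>
  Array.from({ length: V }, () => Infinity),
);
const next: number[][] = Array.from({ length: V }, () =>
  Array.from({ length: V }, () => -1),
);
for (let i = 0; i < V; i++) {
  dist[i]![i] = 0;
  next[i]![i] = i;
}
// Direct edges: 0->1 (5), 1->2 (3), 2->3 (1), 0->3 (10).
const edgeList: [number, number, number][] = [
  [0, 1, 5], [1, 2, 3], [2, 3, 1], [0, 3, 10],
];
for (const [u, v, w] of edgeList) {
  dist[u]![v] = w;
  next[u]![v] = v;
}
// DP over intermediate vertices k.
for (let k = 0; k < V; k++) {
  for (let i = 0; i < V; i++) {
    for (let j = 0; j < V; j++) {
      if (dist[i]![k]! + dist[k]![j]! < dist[i]![j]!) {
        dist[i]![j] = dist[i]![k]! + dist[k]![j]!;
        next[i]![j] = next[i]![k]!;
      }
    }
  }
}

function reconstructPathFW(next: number[][], i: number, j: number): number[] | null {
  if (next[i]![j] === -1) return null;
  const path = [i];
  let cur = i;
  while (cur !== j) {
    cur = next[cur]![j]!;
    path.push(cur);
  }
  return path;
}

console.log(dist[0]![3]); // 9 (0 -> 1 -> 2 -> 3 beats the direct edge of 10)
console.log(reconstructPathFW(next, 0, 3)); // [ 0, 1, 2, 3 ]
```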
Negative cycle detection
After running Floyd-Warshall, a negative-weight cycle exists if and only if some diagonal entry is negative: dist[v][v] < 0 for some vertex v. This means there is a path from v back to v with negative total weight.
export function hasNegativeCycle(
result: FloydWarshallResult<unknown>,
): boolean {
for (let i = 0; i < result.vertices.length; i++) {
if (result.dist[i]![i]! < 0) return true;
}
return false;
}
Complexity
- Time: O(V³) — three nested loops, each iterating over V vertices.
- Space: O(V²) for the distance and next-hop matrices.
For dense graphs (E ≈ V²), running Dijkstra V times costs O(V · E log V) = O(V³ log V), so Floyd-Warshall is actually faster. For sparse graphs, running Dijkstra from each vertex is preferable.
Choosing the right algorithm
| Algorithm | Weights | Negative cycles | Source | Time | Space |
|---|---|---|---|---|---|
| Dijkstra | Non-negative | N/A | Single | O((V + E) log V) | O(V) |
| Bellman-Ford | Any | Detects | Single | O(V · E) | O(V) |
| DAG shortest paths | Any | N/A (no cycles) | Single | O(V + E) | O(V) |
| Floyd-Warshall | Any | Detects | All pairs | O(V³) | O(V²) |
Decision guide:
- Non-negative weights, single source: Use Dijkstra. It is the fastest single-source algorithm for this common case.
- Negative weights possible, single source: Use Bellman-Ford. It handles negative weights and detects negative cycles.
- DAG with any weights, single source: Use DAG shortest paths. It is the fastest possible, running in linear time.
- All-pairs shortest paths, dense graph: Use Floyd-Warshall. Simple to implement and efficient for dense graphs.
- All-pairs shortest paths, sparse graph: Run Dijkstra from each vertex (O(V · E log V)), or use Johnson's algorithm (which combines Bellman-Ford reweighting with Dijkstra) for graphs that may contain negative weights.
Summary
The shortest-path problem asks for minimum-weight paths in weighted graphs. Four algorithms address different variants of this problem.
Dijkstra's algorithm uses a greedy strategy with a priority queue, extracting vertices in order of increasing distance. It runs in O((V + E) log V) time but requires non-negative edge weights. It is the standard choice for road networks, routing protocols, and other practical applications.
Bellman-Ford relaxes every edge V − 1 times, running in O(V · E) time. It handles negative edge weights and detects negative-weight cycles. It is slower than Dijkstra but more general.
DAG shortest paths exploits the absence of cycles by processing vertices in topological order, achieving optimal O(V + E) time. It handles negative weights and is useful for scheduling and critical-path analysis.
Floyd-Warshall computes all-pairs shortest paths using dynamic programming in O(V³) time and O(V²) space. It handles negative weights and detects negative cycles. It is simple to implement and efficient for dense graphs.
All four algorithms use relaxation as the core operation. They differ in the order of relaxations (greedy by distance, repeated over all edges, topological order, or systematic DP over intermediate vertices) and the resulting time-space trade-offs. In Chapter 14, we will see a related problem — finding minimum spanning trees — that also uses edge relaxation but optimizes a different objective.
Exercises
Exercise 13.1. Run Dijkstra's algorithm on the following undirected graph from source a. Show the state of the priority queue and the distance estimates after each extraction.
a ---3--- b ---1--- c
| | |
7 2 5
| | |
d ---4--- e ---6--- f
Exercise 13.2. Explain why Dijkstra's algorithm produces incorrect results on the following graph with source s:
s --2--> a --(-5)--> b
| ^
+--------1---------->+
Show the incorrect distances Dijkstra computes and the correct distances.
Exercise 13.3. Run Bellman-Ford on the graph from Exercise 13.2 and verify that it produces the correct shortest-path distances. How many relaxation passes are needed before the algorithm converges?
Exercise 13.4. Consider a directed graph representing course prerequisites at a university. Each edge (u, v) has a weight representing the "effort" of completing course v after course u. Give an algorithm to find the minimum-effort path from a starting course to a target course. What property of this graph makes this possible?
Exercise 13.5. The transitive closure of a directed graph G is a graph G* with an edge (u, v) if and only if there is a path from u to v in G. Show how to compute the transitive closure using Floyd-Warshall. What is the time complexity? Can you modify the algorithm to use Boolean operations (AND, OR) instead of arithmetic for a constant-factor speedup?
Minimum Spanning Trees
In Chapter 13 we found shortest paths — the lightest routes between specific pairs of vertices. A different but equally important problem arises when we want to connect all vertices of a graph as cheaply as possible: laying cable between cities, wiring components on a circuit board, or clustering data points. The answer is a minimum spanning tree (MST). In this chapter we define the MST problem, establish the theoretical foundation — the cut property and cycle property — that makes greedy algorithms correct, and present two classic algorithms: Kruskal's algorithm, which sorts edges and uses a Union-Find data structure, and Prim's algorithm, which grows a tree from a single vertex using a priority queue.
The minimum spanning tree problem
Let G = (V, E) be a connected, undirected graph with edge-weight function w : E → ℝ. A spanning tree of G is a subgraph that:
- includes every vertex of ,
- is connected, and
- is acyclic (a tree).
Any spanning tree of a graph with V vertices has exactly V − 1 edges. A minimum spanning tree is a spanning tree T whose total edge weight
w(T) = Σ_{(u,v) ∈ T} w(u, v)
is minimized over all spanning trees of G. An MST is not necessarily unique — a graph can have multiple spanning trees with the same minimum total weight — but the minimum weight itself is unique.
If G is disconnected, no spanning tree exists; instead we can find a minimum spanning forest, a collection of MSTs, one for each connected component.
Where MSTs appear
Minimum spanning trees arise naturally in many settings:
- Network design. Connecting cities with the least total cable, pipe, or road.
- Cluster analysis. Removing the most expensive edges from an MST partitions data into clusters (single-linkage clustering).
- Approximation algorithms. The MST provides a 2-approximation for the metric Travelling Salesman Problem (Chapter 22).
- Image segmentation. Treating pixels as vertices and pixel differences as edge weights, the MST captures the structure of an image.
Theoretical foundation
Both Kruskal's and Prim's algorithms are greedy — they build the MST by making locally optimal edge choices. The cut property and cycle property guarantee that these local choices lead to a globally optimal solution.
Cuts and light edges
A cut (S, V − S) of a graph is a partition of the vertex set into two non-empty subsets. An edge crosses the cut if its endpoints are in different subsets. A cut respects a set A of edges if no edge in A crosses the cut. A light edge of a cut is a crossing edge with minimum weight among all crossing edges.
The cut property
Theorem (Cut Property). Let A be a subset of some MST of G, and let (S, V − S) be any cut that respects A. Let e = (u, v) be a light edge crossing the cut. Then A ∪ {e} is a subset of some MST.
Proof sketch. Let T be an MST containing A. If T already contains e, we are done. Otherwise, adding e to T creates a cycle. This cycle must contain another edge e′ crossing the cut (since e crosses it and the cycle returns to the same side). Because e is a light edge, w(e) ≤ w(e′). The tree T′ = (T − {e′}) ∪ {e} is a spanning tree with w(T′) ≤ w(T), so T′ is also an MST, and it contains A ∪ {e}.
The cycle property
Theorem (Cycle Property). Let C be any cycle in G, and let e be the unique heaviest edge in C (strictly heavier than all other edges in C). Then e does not belong to any MST.
Proof sketch. Suppose for contradiction that some MST T contains e. Removing e from T splits T into two components. Since C is a cycle, there exists another edge e′ in C connecting these two components. We have w(e′) < w(e), so replacing e with e′ yields a spanning tree with smaller weight — contradicting the minimality of T.
The cut property tells us which edges are safe to add; the cycle property tells us which edges are safe to exclude. Both Kruskal's and Prim's algorithms are instantiations of a generic greedy MST strategy that repeatedly applies the cut property.
Union-Find: the key data structure for Kruskal's algorithm
Kruskal's algorithm needs to efficiently determine whether adding an edge (u, v) creates a cycle. This reduces to asking: "Are vertices u and v in the same connected component?" The Union-Find (also called Disjoint Set Union) data structure answers this question in nearly constant time.
Union-Find maintains a collection of disjoint sets and supports three operations:
- makeSet(x) — create a singleton set {x}.
- find(x) — return the representative (root) of the set containing x.
- union(x, y) — merge the sets containing x and y.
Union by rank
Each set is stored as a rooted tree. The rank of a node is an upper bound on its height. When merging two sets, we attach the shorter tree beneath the taller one, keeping the overall tree shallow:
union(x, y):
rootX = find(x)
rootY = find(y)
if rootX == rootY: return // already in same set
if rank[rootX] < rank[rootY]:
parent[rootX] = rootY
else if rank[rootX] > rank[rootY]:
parent[rootY] = rootX
else:
parent[rootY] = rootX
rank[rootX] = rank[rootX] + 1
Without path compression, union by rank alone guarantees O(log n) time per find.
Path compression
During a find operation, we make every node on the path from x to the root point directly to the root. This "flattens" the tree, speeding up subsequent queries:
find(x):
root = x
while parent[root] != root:
root = parent[root]
// Compress: point every node on the path to root
while x != root:
next = parent[x]
parent[x] = root
x = next
return root
Combined complexity
With both path compression and union by rank, any sequence of m operations on n elements runs in O(m α(n)) time, where α is the inverse Ackermann function. This function grows so slowly that α(n) ≤ 4 for any n of remotely practical size — far beyond the number of atoms in the observable universe. For all practical purposes, each operation is O(1).
Implementation
export class UnionFind<T> {
private parent = new Map<T, T>();
private rank = new Map<T, number>();
private _componentCount = 0;
makeSet(x: T): void {
if (this.parent.has(x)) return;
this.parent.set(x, x);
this.rank.set(x, 0);
this._componentCount++;
}
find(x: T): T {
let root = x;
while (this.parent.get(root) !== root) {
root = this.parent.get(root)!;
}
// Path compression.
let current = x;
while (current !== root) {
const next = this.parent.get(current)!;
this.parent.set(current, root);
current = next;
}
return root;
}
union(x: T, y: T): boolean {
const rootX = this.find(x);
const rootY = this.find(y);
if (rootX === rootY) return false;
const rankX = this.rank.get(rootX)!;
const rankY = this.rank.get(rootY)!;
if (rankX < rankY) {
this.parent.set(rootX, rootY);
} else if (rankX > rankY) {
this.parent.set(rootY, rootX);
} else {
this.parent.set(rootY, rootX);
this.rank.set(rootX, rankX + 1);
}
this._componentCount--;
return true;
}
connected(x: T, y: T): boolean {
return this.find(x) === this.find(y);
}
get componentCount(): number {
return this._componentCount;
}
}
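A quick demo of the semantics, using a trimmed map-based variant (path compression only, no ranks) so the snippet stands on its own; the element names are made up:

```typescript
// Minimal union-find: parent pointers in a Map, roots point to themselves.
const parent = new Map<string, string>();
const makeSet = (x: string): void => {
  if (!parent.has(x)) parent.set(x, x);
};
function find(x: string): string {
  const p = parent.get(x)!;
  if (p === x) return x;
  const root = find(p);
  parent.set(x, root); // path compression
  return root;
}
const union = (x: string, y: string): void => {
  const rx = find(x);
  const ry = find(y);
  if (rx !== ry) parent.set(rx, ry);
};

for (const v of ['a', 'b', 'c', 'd']) makeSet(v);
union('a', 'b');
union('c', 'd');
console.log(find('a') === find('b')); // true  — same set
console.log(find('a') === find('c')); // false — different sets
union('b', 'c');
console.log(find('a') === find('d')); // true  — all merged
```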
We will revisit Union-Find in greater depth in Chapter 18, including a more thorough discussion of the amortized analysis and additional applications such as dynamic connectivity.
Kruskal's algorithm
Kruskal's algorithm (1956) builds the MST by processing edges in order of increasing weight. For each edge, it checks whether the edge connects two different components; if so, it adds the edge to the MST and merges the components.
Algorithm
Kruskal(G):
sort edges of G by weight (ascending)
initialize Union-Find with all vertices
MST = {}
for each edge (u, v, w) in sorted order:
if find(u) != find(v): // u and v in different components
MST = MST ∪ {(u, v, w)}
union(u, v)
return MST
Why it works
Each time Kruskal's adds an edge (u, v), the two components containing u and v define a cut: S is the set of vertices in u's component, and v lies on the other side. Edge (u, v) is the lightest crossing edge (since we process edges in sorted order, and every lighter edge has already been processed — either added or rejected because its endpoints were within a single component, meaning it did not cross). By the cut property, adding (u, v) is safe.
Trace through an example
Consider this weighted graph:
A ---4--- B
| \ | \
8 2 6 7
| \ | \
H C ---4--- D
| / |
1 7 2
| / |
G ---6--- F
Sorted edges: (H, G, 1), (A, C, 2), (D, F, 2), (A, B, 4), (C, D, 4), (B, C, 6), (G, F, 6), (B, D, 7), (C, G, 7), (A, H, 8).
| Step | Edge | Weight | Action | Components |
|---|---|---|---|---|
| 1 | (H, G) | 1 | Add | {H, G}, {A}, {B}, {C}, {D}, {F} |
| 2 | (A, C) | 2 | Add | {H, G}, {A, C}, {B}, {D}, {F} |
| 3 | (D, F) | 2 | Add | {H, G}, {A, C}, {B}, {D, F} |
| 4 | (A, B) | 4 | Add | {H, G}, {A, B, C}, {D, F} |
| 5 | (C, D) | 4 | Add | {H, G}, {A, B, C, D, F} |
| 6 | (B, C) | 6 | Reject (B and C in same component) | {H, G}, {A, B, C, D, F} |
| 7 | (G, F) | 6 | Add | {A, B, C, D, F, G, H} |
After adding 6 edges (V − 1 for our 7-vertex graph), the MST is complete with total weight 1 + 2 + 2 + 4 + 4 + 6 = 19.
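The result can be verified mechanically. This self-contained sketch uses the edge list from the diagram and a minimal map-based union-find (path compression only, no ranks — fine at this scale):

```typescript
// Edges of the example graph as (u, v, weight) tuples.
type E = [string, string, number];
const edges: E[] = [
  ['H', 'G', 1], ['A', 'C', 2], ['D', 'F', 2], ['A', 'B', 4],
  ['C', 'D', 4], ['B', 'C', 6], ['G', 'F', 6], ['B', 'D', 7],
  ['C', 'G', 7], ['A', 'H', 8],
];

const parent = new Map<string, string>();
function find(x: string): string {
  const p = parent.get(x) ?? x;
  if (p === x) return x;
  const root = find(p);
  parent.set(x, root); // path compression
  return root;
}

let mstWeight = 0;
let mstEdges = 0;
// Kruskal: process edges by ascending weight, skip intra-component edges.
for (const [u, v, w] of [...edges].sort((a, b) => a[2] - b[2])) {
  const ru = find(u);
  const rv = find(v);
  if (ru !== rv) {
    parent.set(ru, rv); // union
    mstWeight += w;
    mstEdges++;
  }
}
console.log(mstEdges, mstWeight); // 6 edges, total weight 19
```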
Implementation
import type { Edge } from '../types.js';
import { Graph } from '../12-graphs-and-traversal/graph.js';
import { UnionFind } from '../18-disjoint-sets/union-find.js';
export interface MSTResult<T> {
edges: Edge<T>[];
weight: number;
}
export function kruskal<T>(graph: Graph<T>): MSTResult<T> {
const vertices = graph.getVertices();
const edges = graph.getEdges();
// Sort edges by weight (ascending).
edges.sort((a, b) => a.weight - b.weight);
// Initialize Union-Find with all vertices.
const uf = new UnionFind<T>();
for (const v of vertices) {
uf.makeSet(v);
}
const mstEdges: Edge<T>[] = [];
let totalWeight = 0;
for (const edge of edges) {
if (!uf.connected(edge.from, edge.to)) {
uf.union(edge.from, edge.to);
mstEdges.push(edge);
totalWeight += edge.weight;
// An MST of V vertices has exactly V - 1 edges.
if (mstEdges.length === vertices.length - 1) break;
}
}
return { edges: mstEdges, weight: totalWeight };
}
Complexity
- Time: O(E log E) for sorting, plus O(E α(V)) for the union-find operations. Since log E = O(log V) (because E ≤ V²), the total is O(E log V).
- Space: O(V + E) for the edge list and union-find structure.
Kruskal's algorithm is particularly well-suited for sparse graphs, where E is much smaller than V², and for situations where the edges are already available as a sorted list (e.g., from an external data source).
Prim's algorithm
Prim's algorithm (1957, independently discovered by Jarník in 1930) takes a different approach: it grows the MST from a single starting vertex, always adding the lightest edge that connects the tree to a new vertex.
Algorithm
Prim(G, start):
initialize priority queue PQ
visited = {start}
insert all edges from start into PQ
MST = {}
while PQ is not empty and |MST| < |V| - 1:
(u, v, w) = PQ.extractMin() // lightest frontier edge
if v in visited: continue // already in tree
visited = visited ∪ {v}
MST = MST ∪ {(u, v, w)}
for each edge (v, x, w') where x not in visited:
PQ.insert((v, x, w'))
return MST
Why it works
At each step, the set of visited vertices defines one side of a cut, and the unvisited vertices form the other side. The priority queue ensures that we always select a light edge crossing this cut. By the cut property, this edge is safe to add.
Trace through an example
Using the same graph as before, starting from vertex A (stale extractions of already-visited vertices are skipped and omitted from the table):
| Step | Extract | Weight | Add to tree | Frontier edges added |
|---|---|---|---|---|
| 0 | — | — | start at A | (A, B, 4), (A, C, 2), (A, H, 8) |
| 1 | (A, C) | 2 | C | (C, B, 6), (C, D, 4), (C, G, 7) |
| 2 | (A, B) | 4 | B | (B, D, 7) |
| 3 | (C, D) | 4 | D | (D, F, 2) |
| 4 | (D, F) | 2 | F | (F, G, 6) |
| 5 | (F, G) | 6 | G | (G, H, 1) |
| 6 | (G, H) | 1 | H | — |
MST weight: 2 + 4 + 4 + 2 + 6 + 1 = 19 — the same as Kruskal's result.
Notice that the edges may be added in a different order than in Kruskal's algorithm, but the total weight is identical.
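The same total can be confirmed with a self-contained lazy-Prim sketch over the diagram's adjacency list, using a repeatedly sorted array in place of the heap for brevity:

```typescript
// Undirected adjacency list of the example graph (each edge listed twice).
const adj = new Map<string, [string, number][]>([
  ['A', [['B', 4], ['C', 2], ['H', 8]]],
  ['B', [['A', 4], ['C', 6], ['D', 7]]],
  ['C', [['A', 2], ['B', 6], ['D', 4], ['G', 7]]],
  ['D', [['B', 7], ['C', 4], ['F', 2]]],
  ['F', [['D', 2], ['G', 6]]],
  ['G', [['C', 7], ['F', 6], ['H', 1]]],
  ['H', [['A', 8], ['G', 1]]],
]);

const visited = new Set<string>(['A']);
const frontier: [string, number][] = [...adj.get('A')!];
let total = 0;
while (frontier.length > 0 && visited.size < adj.size) {
  frontier.sort((a, b) => a[1] - b[1]); // stand-in for extract-min
  const [v, w] = frontier.shift()!;
  if (visited.has(v)) continue; // stale entry, discard
  visited.add(v);
  total += w;
  for (const [x, wx] of adj.get(v)!) {
    if (!visited.has(x)) frontier.push([x, wx]);
  }
}
console.log(total); // 19, matching Kruskal's result
```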
Implementation
import type { Edge } from '../types.js';
import { Graph } from '../12-graphs-and-traversal/graph.js';
import { BinaryHeap } from '../11-heaps-and-priority-queues/binary-heap.js';
interface HeapEntry<T> {
vertex: T;
weight: number;
from: T;
}
export function prim<T>(graph: Graph<T>, start?: T): MSTResult<T> {
const vertices = graph.getVertices();
const source = start ?? vertices[0]!;
const visited = new Set<T>();
const mstEdges: Edge<T>[] = [];
let totalWeight = 0;
// Min-heap ordered by edge weight.
const heap = new BinaryHeap<HeapEntry<T>>(
(a, b) => a.weight - b.weight,
);
// Seed the heap with all edges from the source.
visited.add(source);
for (const [neighbor, weight] of graph.getNeighbors(source)) {
heap.insert({ vertex: neighbor, weight, from: source });
}
while (!heap.isEmpty && visited.size < vertices.length) {
const entry = heap.extract()!;
if (visited.has(entry.vertex)) continue;
// Add this vertex to the tree.
visited.add(entry.vertex);
mstEdges.push({
from: entry.from,
to: entry.vertex,
weight: entry.weight,
});
totalWeight += entry.weight;
// Add frontier edges from the newly added vertex.
for (const [neighbor, weight] of graph.getNeighbors(entry.vertex)) {
if (!visited.has(neighbor)) {
heap.insert({ vertex: neighbor, weight, from: entry.vertex });
}
}
}
return { edges: mstEdges, weight: totalWeight };
}
Our implementation uses a binary heap directly (rather than the PriorityQueue wrapper) for efficiency. Each edge may be inserted into the heap, and stale entries (edges to already-visited vertices) are simply discarded on extraction.
Complexity
- Time: O(E log V) with a binary heap. Each of the E edges is inserted into the heap at most once, and each insertion/extraction costs O(log E) = O(log V). With a Fibonacci heap, this improves to O(E + V log V), which is better for dense graphs.
- Space: O(V + E) for the visited set and the heap.
Prim's algorithm is well-suited for dense graphs, especially with a Fibonacci heap. For sparse graphs, Kruskal's is often simpler and equally efficient.
Kruskal's vs. Prim's
| Feature | Kruskal's | Prim's |
|---|---|---|
| Strategy | Global edge sorting | Local vertex growing |
| Data structure | Union-Find | Priority queue (heap) |
| Time (binary heap) | O(E log V) | O(E log V) |
| Time (Fibonacci heap) | — | O(E + V log V) |
| Best for | Sparse graphs | Dense graphs |
| Parallelism | Edges can be processed in parallel (with concurrent union-find) | Inherently sequential |
| Disconnected graphs | Produces spanning forest naturally | Spans only one component per call |
| Simplicity | Very simple to implement | Slightly more complex |
Both algorithms produce MSTs of identical total weight. When the graph is sparse (E = O(V)), Kruskal's is often preferred for its simplicity. When the graph is dense (E ≈ V²) and a Fibonacci heap is available, Prim's has a theoretical edge.
Correctness and uniqueness
When is the MST unique?
If every cut of the graph has a unique light edge, the MST is unique. In particular, if all edge weights are distinct, the MST is unique. When edges share weights, there may be multiple MSTs, but they all have the same total weight.
Verifying an MST
Given a claimed MST, we can verify it by checking:
- The tree has exactly V − 1 edges.
- The tree spans all vertices (use union-find or BFS/DFS).
- For every non-tree edge (u, v), the weight of (u, v) is at least as large as the maximum edge weight on the path from u to v in the tree (cycle property).
Summary
A minimum spanning tree of a connected, undirected, weighted graph is a spanning tree with minimum total edge weight. The cut property guarantees that the lightest edge crossing any cut is safe to include, while the cycle property guarantees that the heaviest edge in any cycle is safe to exclude.
Kruskal's algorithm sorts all edges by weight and greedily adds edges that do not create a cycle, using a Union-Find data structure for efficient cycle detection. It runs in O(E log V) time and naturally produces a spanning forest for disconnected graphs.
Prim's algorithm grows the MST from a single vertex, always adding the lightest edge connecting the tree to a new vertex, using a priority queue to select the minimum-weight frontier edge. It also runs in O(E log V) with a binary heap, improving to O(E + V log V) with a Fibonacci heap.
Both algorithms are greedy, both are correct by the cut property, and both produce MSTs of identical total weight. Kruskal's is typically preferred for sparse graphs and for its simplicity; Prim's is preferred for dense graphs, especially when a Fibonacci heap is available. The Union-Find data structure introduced here — with path compression and union by rank — achieves near-constant amortized time per operation and will reappear in Chapter 18 and in the approximation algorithms of Chapter 22.
Exercises
Exercise 14.1. Run Kruskal's algorithm on the following weighted graph. Show the state of the Union-Find structure after each edge addition and the final MST.
1 ---5--- 2 ---3--- 3
| | |
6 2 7
| | |
4 ---4--- 5 ---1--- 6
Exercise 14.2. Run Prim's algorithm on the same graph from Exercise 14.1, starting from vertex 1. Show the contents of the priority queue after each step.
Exercise 14.3. Prove that if all edge weights are distinct, the minimum spanning tree is unique. (Hint: assume two distinct MSTs exist and derive a contradiction using the cycle property.)
Exercise 14.4. A bottleneck spanning tree is a spanning tree that minimizes the weight of its maximum-weight edge. Prove that every MST is a bottleneck spanning tree. Is the converse true?
Exercise 14.5. You are given a connected, weighted, undirected graph G and its MST T. A new edge (u, v) is added to G. Describe an efficient algorithm to update the MST. What is the time complexity? (Hint: adding the new edge to T creates exactly one cycle.)
Network Flow
In Chapters 12–14 we studied graphs from the perspective of connectivity and distance — traversals, shortest paths, and spanning trees. In this chapter we shift focus to a fundamentally different question: how much "stuff" can we push through a network? Imagine oil flowing through a pipeline, data packets traversing a computer network, or goods moving through a supply chain. Each link has a limited capacity, and we want to maximize the total throughput from a designated source to a designated sink. This is the maximum flow problem, one of the most versatile tools in combinatorial optimization. We develop the Ford-Fulkerson method, prove the celebrated max-flow min-cut theorem, and implement the efficient Edmonds-Karp variant that guarantees polynomial running time. We then show how maximum flow solves the maximum bipartite matching problem — assigning jobs to workers, students to schools, or organs to patients.
Flow networks
A flow network G = (V, E) is a directed graph in which each edge (u, v) ∈ E has a non-negative capacity c(u, v) ≥ 0. Two distinguished vertices are the source s and the sink t, where s ≠ t. We assume that every vertex lies on some path from s to t (otherwise it is irrelevant to the flow problem).
If (u, v) ∉ E, we define c(u, v) = 0 for convenience.
Flows
A flow in G is a function f : V × V → ℝ satisfying two constraints:
- Capacity constraint. For all u, v ∈ V: 0 ≤ f(u, v) ≤ c(u, v).
- Flow conservation. For every vertex u ∈ V − {s, t}: Σ_{v ∈ V} f(v, u) = Σ_{v ∈ V} f(u, v).
In words, the flow into any internal vertex equals the flow out of it — flow is neither created nor destroyed except at the source and sink.
The value of a flow is the net flow leaving the source: |f| = Σ_{v ∈ V} f(s, v) − Σ_{v ∈ V} f(v, s).
The maximum flow problem asks: find a flow of maximum value |f|.
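The two constraints are mechanical to check. The following validator is an illustrative sketch (its `u->v` edge-key format is an assumption for the example, not the book's representation):

```typescript
// Checks the capacity constraint and flow conservation for a flow given
// as maps from edge keys "u->v" to numbers. Illustrative helper only.
export function isValidFlow(
  capacity: Map<string, number>,
  flow: Map<string, number>,
  source: string,
  sink: string,
): boolean {
  const vertices = new Set<string>();
  // Capacity constraint: 0 <= f(u, v) <= c(u, v) on every edge.
  for (const [edge, c] of capacity) {
    const f = flow.get(edge) ?? 0;
    if (f < 0 || f > c) return false;
    const [u, v] = edge.split('->') as [string, string];
    vertices.add(u).add(v);
  }
  // Flow conservation: inflow equals outflow at every internal vertex.
  for (const x of vertices) {
    if (x === source || x === sink) continue;
    let inFlow = 0;
    let outFlow = 0;
    for (const [edge, f] of flow) {
      const [u, v] = edge.split('->') as [string, string];
      if (v === x) inFlow += f;
      if (u === x) outFlow += f;
    }
    if (inFlow !== outFlow) return false;
  }
  return true;
}
```

For the tiny network s → a → t, pushing 1 unit along both edges is valid, while pushing 2 units into a and only 1 out violates conservation.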
Where network flow appears
Network flow arises in a remarkable variety of applications:
- Transportation and logistics. Routing goods through a supply chain with capacity-limited links.
- Computer networks. Maximizing data throughput between two hosts.
- Bipartite matching. Assigning workers to jobs, students to projects, or doctors to hospitals (we cover this later in the chapter).
- Image segmentation. Partitioning an image into foreground and background by finding a minimum cut.
- Baseball elimination. Determining whether a team has been mathematically eliminated from contention.
- Project selection. Choosing which projects to fund when some projects depend on others.
The power of network flow lies not just in the max-flow problem itself, but in the large number of combinatorial problems that reduce to it.
The Ford-Fulkerson method
The Ford-Fulkerson method (1956) is a general strategy for computing maximum flow. It repeatedly finds augmenting paths — paths from source to sink along which more flow can be pushed — and increases the flow until no augmenting path remains.
Residual graphs
Given a flow network G and a flow f, the residual graph G_f has the same vertex set as G and contains two types of edges for each original edge (u, v):
- Forward edge (u, v) with residual capacity c_f(u, v) = c(u, v) − f(u, v), representing unused capacity that can still carry more flow.
- Reverse edge (v, u) with residual capacity c_f(v, u) = f(u, v), representing flow that can be "cancelled" — pushed back — to reroute it through a better path.
An edge appears in G_f only if its residual capacity is positive.
Augmenting paths
An augmenting path is a simple path p from s to t in the residual graph G_f. The bottleneck capacity of the path is the minimum residual capacity along its edges: c_f(p) = min { c_f(u, v) : (u, v) ∈ p }.
We can increase the flow by c_f(p) by pushing flow along the augmenting path: for each forward edge, increase the flow; for each reverse edge, decrease the flow on the corresponding original edge.
The Ford-Fulkerson algorithm
FordFulkerson(G, s, t):
Initialize f(u, v) = 0 for all (u, v)
while there exists an augmenting path p in G_f:
c_f(p) = min residual capacity along p
for each edge (u, v) in p:
if (u, v) is a forward edge:
f(u, v) = f(u, v) + c_f(p)
else: // (u, v) is a reverse edge
f(v, u) = f(v, u) - c_f(p)
return f
The method is correct but does not specify how to find the augmenting path. Different choices lead to different running times. With arbitrary path selection and irrational capacities, Ford-Fulkerson may not even terminate. The Edmonds-Karp variant fixes this by using BFS.
The max-flow min-cut theorem
Before presenting Edmonds-Karp, let us establish the theoretical foundation that justifies the Ford-Fulkerson approach.
Cuts
A cut (S, T) of a flow network is a partition of V into two sets S and T = V − S such that s ∈ S and t ∈ T. The capacity of a cut is the sum of capacities of edges crossing from S to T: c(S, T) = Σ_{u ∈ S} Σ_{v ∈ T} c(u, v).
Note that we only count edges from S to T, not from T to S.
The net flow across a cut is: f(S, T) = Σ_{u ∈ S} Σ_{v ∈ T} f(u, v) − Σ_{u ∈ S} Σ_{v ∈ T} f(v, u).
A key lemma: for any flow f and any cut (S, T), the net flow across the cut equals the value of the flow: f(S, T) = |f|. This follows from flow conservation at internal vertices.
Since the flow across any cut cannot exceed the cut's capacity, we get: |f| ≤ c(S, T).
This holds for every cut — so the maximum flow is at most the minimum cut capacity.
The theorem
Theorem (Max-Flow Min-Cut). In any flow network, the following three conditions are equivalent:
- f is a maximum flow.
- The residual graph G_f contains no augmenting path from s to t.
- |f| = c(S, T) for some cut (S, T).
Proof sketch. (1) ⇒ (2): If an augmenting path existed, we could increase the flow, contradicting maximality. (2) ⇒ (3): If no augmenting path exists, define S as the set of vertices reachable from s in G_f. Since t ∉ S, (S, V − S) is a valid cut. Every edge from S to T must be saturated (otherwise its endpoint would be reachable), and every edge from T to S must carry zero flow (otherwise the reverse edge would be in G_f). Therefore |f| = c(S, T). (3) ⇒ (1): Since |f| ≤ c(S, T) for all cuts, equality with some cut implies f is maximum.
This theorem has a profound consequence: the maximum flow through a network equals the minimum capacity of any cut separating source from sink. It also tells us that when the Ford-Fulkerson method terminates (no augmenting path exists), the flow is guaranteed to be maximum. As a bonus, the source-side vertices reachable in the final residual graph give us the minimum cut.
Edmonds-Karp algorithm
The Edmonds-Karp algorithm (1972) is a refinement of Ford-Fulkerson that uses breadth-first search (BFS) to find augmenting paths. By always choosing a shortest augmenting path (fewest edges), it guarantees termination in O(V E) augmenting path iterations, giving a total running time of O(V E²).
Why shortest augmenting paths?
The key insight is that when we always augment along shortest paths, the distances in the residual graph never decrease over successive iterations. More precisely:
Lemma. Let δ_f(s, v) denote the shortest-path distance (number of edges) from s to v in the residual graph G_f. If Edmonds-Karp augments flow f to obtain flow f′, then δ_{f′}(s, v) ≥ δ_f(s, v) for all v ∈ V.
This monotonicity property, combined with the observation that each augmenting path saturates at least one edge (which then temporarily disappears from the residual graph), yields:
Theorem. The Edmonds-Karp algorithm performs at most O(V E) augmenting path iterations.
Since each BFS takes O(V + E) time, the total running time is O(V E²). For dense graphs this is O(V⁵); for sparse graphs it is O(V³).
Pseudocode
EdmondsKarp(G, s, t):
Initialize f(u, v) = 0 for all (u, v)
repeat:
// BFS in residual graph to find shortest augmenting path
parent = BFS(G_f, s, t)
if t is not reachable: break
// Find bottleneck capacity
bottleneck = infinity
v = t
while v != s:
u = parent[v]
bottleneck = min(bottleneck, c_f(u, v))
v = u
// Augment flow along the path
v = t
while v != s:
u = parent[v]
push bottleneck units of flow along (u, v)
v = u
maxFlow = maxFlow + bottleneck
return maxFlow
Trace through an example
Consider the following flow network (based on the classic CLRS example):
| Edge | Capacity |
|---|---|
| s → v1 | 16 |
| s → v2 | 13 |
| v1 → v2 | 4 |
| v1 → v3 | 12 |
| v2 → v1 | 10 |
| v2 → v4 | 14 |
| v3 → v2 | 9 |
| v3 → t | 20 |
| v4 → v3 | 7 |
| v4 → t | 4 |
Iteration 1. BFS finds the shortest path s → v1 → v3 → t (3 edges). Bottleneck = min(16, 12, 20) = 12. Push 12 units. Total flow = 12.
After augmentation, the residual graph has:
- s → v1: residual 4 (was 16, used 12)
- v1 → v3: residual 0 (saturated)
- v3 → v1: residual 12 (reverse edge)
- v3 → t: residual 8 (was 20, used 12)
Iteration 2. BFS finds s → v2 → v4 → t (3 edges). Bottleneck = min(13, 14, 4) = 4. Push 4 units. Total flow = 16.
Iteration 3. BFS finds s → v2 → v4 → v3 → t (4 edges). Bottleneck = min(9, 10, 7, 8) = 7. Push 7 units. Total flow = 23.
After iteration 3, no augmenting path exists in the residual graph. The maximum flow is 23.
The minimum cut is S = {s, v1, v2, v4}, T = {v3, t}. The cut edges and their capacities are:
| Cut edge | Capacity |
|---|---|
| v1 → v3 | 12 |
| v4 → v3 | 7 |
| v4 → t | 4 |
| Total | 23 |
This confirms the max-flow min-cut theorem: the minimum cut capacity equals the maximum flow.
TypeScript implementation
Our implementation uses a self-contained residual graph structure with efficient integer-keyed maps. Vertices of any type are supported — each vertex is assigned a unique integer ID, and edge capacities are stored in a compact map keyed by Cantor-paired vertex IDs.
The result type captures the max flow value, the per-edge flow assignment, and the min-cut:
export interface FlowEdge<T> {
from: T;
to: T;
capacity: number;
flow: number;
}
export interface MaxFlowResult<T> {
maxFlow: number;
flowEdges: FlowEdge<T>[];
minCut: Set<T>;
}
The core algorithm follows the Edmonds-Karp approach — BFS for augmenting paths, bottleneck computation, and flow augmentation:
export function edmondsKarp<T>(
edges: { from: T; to: T; capacity: number }[],
source: T,
sink: T,
): MaxFlowResult<T> {
if (source === sink) {
throw new Error('Source and sink must be different vertices');
}
const residual = new ResidualGraph<T>();
residual.addVertex(source);
residual.addVertex(sink);
for (const { from, to, capacity } of edges) {
residual.addEdge(from, to, capacity);
}
let maxFlow = 0;
while (true) {
const parent = residual.bfs(source, sink);
if (parent === null) break;
// Find the bottleneck capacity along the path.
let bottleneck = Infinity;
let v: T = sink;
while (v !== source) {
const u = parent.get(v) as T;
bottleneck = Math.min(
bottleneck,
residual.getResidualCapacity(u, v),
);
v = u;
}
// Augment flow along the path.
v = sink;
while (v !== source) {
const u = parent.get(v) as T;
residual.pushFlow(u, v, bottleneck);
v = u;
}
maxFlow += bottleneck;
}
// The min-cut is the set of vertices reachable from the source
// in the final residual graph (BFS from source with no path to sink).
const minCut = residual.reachableFrom(source);
const flowEdges = residual.getFlowEdges();
return { maxFlow, flowEdges, minCut };
}
The residual graph internally maps each vertex to a sequential integer ID and uses Cantor pairing to compute a single numeric key for each edge. This ensures correct behavior even when vertices are objects (where String() would not produce unique keys).
After termination, the algorithm computes the minimum cut by running BFS from the source in the final residual graph. The set of reachable vertices forms the source side of the min-cut — exactly as prescribed by the max-flow min-cut theorem.
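Because the ResidualGraph class lives in the accompanying repository, here is a self-contained sketch of the same Edmonds-Karp loop over a plain adjacency matrix (integer vertices 0..n−1; reverse edges are simply the transposed matrix entries):

```typescript
// Edmonds-Karp over an adjacency-matrix residual graph. A simplified
// sketch for illustration; the book's implementation supports arbitrary
// vertex types via a ResidualGraph class.
export function maxFlowMatrix(capacity: number[][], s: number, t: number): number {
  const n = capacity.length;
  // residual[u][v] starts at the capacity and is updated as flow is pushed.
  const residual = capacity.map((row) => [...row]);
  let maxFlow = 0;
  for (;;) {
    // BFS for a shortest augmenting path in the residual graph.
    const parent = new Array<number>(n).fill(-1);
    parent[s] = s;
    const queue = [s];
    while (queue.length > 0 && parent[t] === -1) {
      const u = queue.shift()!;
      for (let v = 0; v < n; v++) {
        if (parent[v] === -1 && residual[u]![v]! > 0) {
          parent[v] = u;
          queue.push(v);
        }
      }
    }
    if (parent[t] === -1) break; // no augmenting path: flow is maximum

    // Bottleneck capacity along the path, then augment.
    let bottleneck = Infinity;
    for (let v = t; v !== s; v = parent[v]!) {
      const u = parent[v]!;
      bottleneck = Math.min(bottleneck, residual[u]![v]!);
    }
    for (let v = t; v !== s; v = parent[v]!) {
      const u = parent[v]!;
      residual[u]![v] -= bottleneck; // forward edge loses capacity
      residual[v]![u] += bottleneck; // reverse edge gains capacity
    }
    maxFlow += bottleneck;
  }
  return maxFlow;
}
```

Running this on the CLRS network from the trace above (s = 0, v1..v4 = 1..4, t = 5) yields the maximum flow of 23.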
Complexity analysis
- Time: O(V E²). Each BFS takes O(V + E). The number of augmenting path iterations is bounded by O(V E) because: (a) distances in the residual graph never decrease; and (b) after at most E augmentations at a given distance, some critical edge is permanently saturated, increasing the distance. Since distances are bounded by V, we get O(V E) iterations total.
- Space: O(V + E) for the residual graph, adjacency lists, and BFS data structures.
Application: maximum bipartite matching
One of the most elegant applications of network flow is solving the maximum bipartite matching problem.
The matching problem
A bipartite graph has two disjoint vertex sets L (left) and R (right), with edges only between L and R. A matching is a subset M ⊆ E such that no vertex appears in more than one edge of M. A maximum matching is a matching of largest possible size.
Bipartite matching models many real-world assignment problems:
- Job assignment. L = workers, R = jobs, edge (w, j) means worker w is qualified for job j. Maximum matching assigns the most workers to jobs.
- Course enrollment. L = students, R = courses. Maximum matching enrolls the most students.
- Organ donation. L = donors, R = recipients. Maximum matching saves the most lives.
Reduction to max flow
We reduce bipartite matching to max flow by constructing a flow network:
- Add a super-source s and a super-sink t.
- For each left vertex u ∈ L, add edge (s, u) with capacity 1.
- For each right vertex v ∈ R, add edge (v, t) with capacity 1.
- For each bipartite edge (u, v), add edge (u, v) with capacity 1.
1 1 1
s ────▶ L1 ────▶ R1 ────▶ t
│ 1 ╲ 1 ▲
├──▶ L2 ──────▶ R2 ──────┤
│ 1 1 ╲ 1 │
└──▶ L3 ────▶ R3 ────────┘
1 1
Why it works. Since all capacities are 1, any integer flow corresponds to a matching:
- Capacity-1 edges from s to L ensure each left vertex sends at most 1 unit of flow — it is matched to at most one right vertex.
- Capacity-1 edges from R to t ensure each right vertex receives at most 1 unit — it is matched to at most one left vertex.
- An edge (u, v) carries flow 1 if and only if u is matched to v.
The integrality theorem for network flow guarantees that when all capacities are integers, there exists a maximum flow that is also integral. Therefore the maximum flow value equals the maximum matching size.
TypeScript implementation
export interface BipartiteMatchingResult<L, R> {
size: number;
matches: [L, R][];
}
export function bipartiteMatching<L, R>(
left: L[],
right: R[],
edges: [L, R][],
): BipartiteMatchingResult<L, R> {
type FlowVertex =
| { kind: 'source' }
| { kind: 'sink' }
| { kind: 'left'; value: L }
| { kind: 'right'; value: R };
const source: FlowVertex = { kind: 'source' };
const sink: FlowVertex = { kind: 'sink' };
const leftVertices = new Map<L, FlowVertex>();
const rightVertices = new Map<R, FlowVertex>();
for (const l of left)
leftVertices.set(l, { kind: 'left', value: l });
for (const r of right)
rightVertices.set(r, { kind: 'right', value: r });
const flowEdges: { from: FlowVertex; to: FlowVertex; capacity: number }[] = [];
for (const lv of leftVertices.values())
flowEdges.push({ from: source, to: lv, capacity: 1 });
for (const rv of rightVertices.values())
flowEdges.push({ from: rv, to: sink, capacity: 1 });
for (const [l, r] of edges) {
const lv = leftVertices.get(l);
const rv = rightVertices.get(r);
if (lv && rv)
flowEdges.push({ from: lv, to: rv, capacity: 1 });
}
const result = edmondsKarp(flowEdges, source, sink);
const matches: [L, R][] = [];
for (const fe of result.flowEdges) {
if (fe.flow === 1
&& fe.from.kind === 'left'
&& fe.to.kind === 'right') {
matches.push([fe.from.value, fe.to.value]);
}
}
return { size: result.maxFlow, matches };
}
The implementation uses tagged vertex objects ({ kind: 'left', value: l }) to prevent name collisions between left vertices, right vertices, the source, and the sink. Since our Edmonds-Karp implementation uses identity-based vertex comparison (via Map), these object vertices are compared by reference — exactly what we need.
Complexity analysis
In the constructed flow network, |V| = |L| + |R| + 2 and |E| = |L| + |R| + E_b, where E_b is the number of bipartite edges. With unit capacities, Edmonds-Karp terminates in at most min(|L|, |R|) augmenting path iterations (since each augmentation increases the flow by 1 and the maximum flow is at most min(|L|, |R|)), giving:
- Time: O(V · E) where V = |L| + |R| and E = E_b is the number of bipartite edges.
- Space: O(V + E) for the flow network.
Trace through an example
Consider assigning workers to jobs:
| Worker | Qualified for |
|---|---|
| Alice | Job 1, Job 2 |
| Bob | Job 1 |
| Carol | Job 2, Job 3 |
The bipartite graph has L = {Alice, Bob, Carol} and R = {Job 1, Job 2, Job 3}.
Iteration 1. BFS finds s → Alice → Job1 → t. Push 1 unit. Flow = 1.
Iteration 2. BFS finds s → Bob → Job1, but Job1 → t is saturated. Through the reverse edge (Job1 → Alice, residual capacity 1), BFS discovers the path: s → Bob → Job1 → Alice → Job2 → t. Push 1 unit. Flow = 2.
This rerouting is the power of augmenting paths in matching: Bob "steals" Job 1 from Alice, and Alice is reassigned to Job 2.
Iteration 3. BFS finds s → Carol → Job3 → t. Push 1 unit. Flow = 3.
Result: Maximum matching of size 3: {Bob → Job 1, Alice → Job 2, Carol → Job 3}.
Notice how the algorithm found a perfect matching even though a greedy approach (match Alice → Job 1 first) would have left Bob unmatched. The augmenting path through reverse edges enabled the rerouting.
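For bipartite matching specifically, the augmenting-path idea can be implemented without building an explicit flow network. The sketch below is Kuhn's algorithm (a specialization of the augmenting-path method to unit capacities, not the book's flow-based implementation); the recursive call performs exactly the "stealing" rerouting described above, in O(V · E) time:

```typescript
// Kuhn's algorithm: for each left vertex, DFS for an augmenting path,
// "stealing" matched right vertices when their current partner can be
// rerouted elsewhere. adj[l] lists the right vertices l may match.
export function kuhnMatching(leftCount: number, adj: number[][]): number {
  const matchOfRight = new Map<number, number>(); // right vertex -> left vertex

  const tryAugment = (l: number, visited: Set<number>): boolean => {
    for (const r of adj[l] ?? []) {
      if (visited.has(r)) continue;
      visited.add(r);
      const current = matchOfRight.get(r);
      // r is free, or its current partner can be rerouted to another job.
      if (current === undefined || tryAugment(current, visited)) {
        matchOfRight.set(r, l);
        return true;
      }
    }
    return false;
  };

  let size = 0;
  for (let l = 0; l < leftCount; l++) {
    if (tryAugment(l, new Set())) size++;
  }
  return size;
}
```

On the Alice/Bob/Carol instance (Alice = 0 qualified for jobs 0 and 1, Bob = 1 for job 0, Carol = 2 for jobs 1 and 2), the algorithm finds the same perfect matching of size 3.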
Beyond Edmonds-Karp
The Edmonds-Karp algorithm is a clean, practical choice for many applications, but faster max-flow algorithms exist:
| Algorithm | Time complexity | Notes |
|---|---|---|
| Ford-Fulkerson (DFS) | O(E · f*) | f* = max flow value; not polynomial |
| Edmonds-Karp (BFS) | O(V E²) | Polynomial; simple to implement |
| Dinic's algorithm | O(V² E) | Uses blocking flows; faster in practice |
| Push-relabel | O(V² E) or O(V³) | No augmenting paths; local operations |
| Orlin's algorithm | O(V E) | Optimal for sparse graphs |
For bipartite matching specifically, Hopcroft-Karp achieves O(E √V) by finding multiple augmenting paths simultaneously.
In practice, Edmonds-Karp and Dinic's are the most commonly implemented. Dinic's algorithm is particularly effective on unit-capacity networks (like bipartite matching), where it achieves O(E √V) — matching Hopcroft-Karp.
Summary
In this chapter we studied network flow — a rich framework for maximizing throughput in capacity-constrained networks.
- A flow network is a directed graph with edge capacities, a source, and a sink. A flow assigns values to edges satisfying capacity and conservation constraints.
- The Ford-Fulkerson method finds maximum flow by iteratively discovering augmenting paths in the residual graph and pushing flow along them.
- The max-flow min-cut theorem proves that the maximum flow equals the minimum cut capacity — a deep duality result that connects optimization (max flow) with combinatorics (min cut).
- Edmonds-Karp uses BFS to find shortest augmenting paths, guaranteeing O(V E²) time. This polynomial bound makes it practical for moderately sized networks.
- Maximum bipartite matching reduces elegantly to max flow: add a super-source and super-sink with unit-capacity edges, and the max flow equals the maximum matching size. The integrality theorem ensures integer solutions.
- The min-cut computed as a by-product of max flow identifies the source-reachable vertices in the final residual graph — useful for applications like image segmentation and network reliability analysis.
Network flow is one of the most versatile tools in algorithm design. Many problems that seem unrelated — assignment, scheduling, connectivity, and partitioning — can be modeled as flow problems and solved efficiently with the algorithms in this chapter.
Exercises
Exercise 15.1. Consider the following flow network with edges: s → A (capacity 5), s → B (capacity 3), A → t (capacity 4), A → C (capacity 2), B → C (capacity 5), C → t (capacity 6).
(a) Find the maximum flow by tracing Edmonds-Karp (BFS-based augmenting paths). (b) Identify the minimum cut and verify that its capacity equals the max flow. (c) What is the flow assignment on each edge?
Exercise 15.2. Prove that in any flow network, the total flow into the sink equals the total flow out of the source. (Hint: sum the flow conservation constraints over all vertices except s and t.)
Exercise 15.3. A company has 4 workers and 4 tasks. The qualification matrix is:
| | Task A | Task B | Task C | Task D |
|---|---|---|---|---|
| Worker 1 | Yes | Yes | | |
| Worker 2 | Yes | Yes | | |
| Worker 3 | Yes | Yes | Yes | |
| Worker 4 | Yes | | | |
(a) Model this as a bipartite matching problem and find the maximum matching. (b) Is a perfect matching possible? If so, find one. If not, explain why.
Exercise 15.4. Modify the Edmonds-Karp algorithm to handle lower bounds on edge flows: each edge (u, v) has both a capacity c(u, v) and a minimum flow requirement l(u, v), so l(u, v) ≤ f(u, v) ≤ c(u, v). Describe how to transform this into a standard max-flow problem. (Hint: introduce excess supply and demand at vertices based on the lower bounds.)
Exercise 15.5. König's theorem states that in a bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover. Using the max-flow min-cut theorem applied to the bipartite matching reduction, prove König's theorem. (Hint: show how the minimum cut in the flow network corresponds to a minimum vertex cover in the bipartite graph.)
Dynamic Programming
Two other powerful algorithm design paradigms bracket dynamic programming: divide-and-conquer (Chapter 3) breaks a problem into independent subproblems, and greedy algorithms (Chapter 17) build solutions by making locally optimal choices. Dynamic programming (DP) occupies the territory between them. Like divide-and-conquer, it solves problems by combining solutions to subproblems. But unlike divide-and-conquer, those subproblems overlap — the same subproblem is needed by many larger subproblems. Instead of recomputing these answers, DP saves them in a table and reuses them, trading space for an often dramatic reduction in time. In this chapter we develop a systematic approach to dynamic programming and apply it to seven classic problems: Fibonacci numbers, coin change, longest common subsequence, edit distance, 0/1 knapsack, matrix chain multiplication, and the longest increasing subsequence.
When does dynamic programming apply?
A problem is amenable to dynamic programming when it exhibits two properties:
- Optimal substructure. An optimal solution to the problem contains optimal solutions to its subproblems. For example, if the shortest path from u to v passes through w, then the sub-path from u to w must itself be a shortest path from u to w.
- Overlapping subproblems. The recursive decomposition of the problem leads to the same subproblems being solved many times. If every subproblem were solved only once, there would be nothing to save — and a straightforward divide-and-conquer approach would suffice.
When both properties hold, we can avoid redundant computation by storing subproblem solutions in a table and looking them up rather than recomputing them.
Memoization vs tabulation
There are two standard ways to implement dynamic programming:
Top-down with memoization
Start from the original problem and recurse. Before computing a subproblem, check whether its solution is already cached. If so, return the cached value; otherwise, compute it, cache it, and return it. This approach is sometimes called memoization (from "memo" — a note to oneself).
Advantages:
- Only solves subproblems that are actually needed.
- The recursive structure mirrors the mathematical recurrence directly.
Disadvantages:
- Recursion overhead (call stack).
- Possible stack overflow on very deep recursions.
Bottom-up with tabulation
Solve subproblems in an order such that when we need a subproblem's solution, it has already been computed. Typically this means solving subproblems from "smallest" to "largest" using iterative loops and storing results in an array or table.
Advantages:
- No recursion overhead.
- Constant per-subproblem overhead.
- Often allows space optimization (keeping only the last row or two of the table).
Disadvantages:
- Must determine a valid computation order in advance.
- May compute subproblems that are not needed for the final answer.
In practice, bottom-up tabulation is more common because it avoids stack overhead and enables space optimizations. We use it for most examples in this chapter.
A systematic approach to DP
For each problem in this chapter, we follow a five-step recipe:
- Define subproblems. Characterize the space of subproblems in terms of one or more indices (or parameters).
- Write the recurrence. Express the solution to a subproblem in terms of solutions to smaller subproblems.
- Identify base cases. Determine the values of the smallest subproblems directly.
- Determine computation order. Choose an order in which to fill the table so that dependencies are satisfied.
- Recover the solution. Extract the answer from the table, and optionally backtrack to find the actual solution (not just its value).
Fibonacci numbers: the introductory example
The Fibonacci sequence is defined by: F(0) = 0, F(1) = 1, and F(n) = F(n − 1) + F(n − 2) for n ≥ 2.
This is the simplest illustration of how DP transforms an exponential algorithm into a linear one.
Naive recursion
Directly translating the recurrence into code:
export function fibNaive(n: number): number {
if (n < 0) throw new RangeError('n must be non-negative');
if (n <= 1) return n;
return fibNaive(n - 1) + fibNaive(n - 2);
}
The recursion tree for F(5) shows massive redundancy:
F(5)
/ \
F(4) F(3)
/ \ / \
F(3) F(2) F(2) F(1)
/ \ / \ / \
F(2) F(1) F(1) F(0) F(1) F(0)
/ \
F(1) F(0)
F(3) is computed twice, F(2) three times, and so on. The total number of calls grows exponentially — Θ(φⁿ), where φ ≈ 1.618 is the golden ratio — because the same subproblems are solved over and over.
Top-down with memoization
Adding a cache eliminates the redundancy:
export function fibMemo(n: number): number {
if (n < 0) throw new RangeError('n must be non-negative');
const memo = new Map<number, number>();
function fib(k: number): number {
if (k <= 1) return k;
const cached = memo.get(k);
if (cached !== undefined) return cached;
const result = fib(k - 1) + fib(k - 2);
memo.set(k, result);
return result;
}
return fib(n);
}
Now each subproblem is computed at most once and then looked up in O(1) time, giving O(n) total time and O(n) space.
Bottom-up with tabulation
We can go further by eliminating the recursion entirely. Since F(i) only depends on F(i − 1) and F(i − 2), we need to store only two values at any time:
export function fibTabulated(n: number): number {
if (n < 0) throw new RangeError('n must be non-negative');
if (n <= 1) return n;
let prev2 = 0;
let prev1 = 1;
for (let i = 2; i <= n; i++) {
const current = prev1 + prev2;
prev2 = prev1;
prev1 = current;
}
return prev1;
}
Complexity. Time O(n), space O(1).
The progression from exponential time to O(n) time with O(1) space is the essence of dynamic programming.
Coin change
The coin change problem has two variants:
- Minimum coins: Given a set of denominations and a target amount n, find the fewest coins that sum to n.
- Count ways: Count the number of distinct combinations of coins that sum to n.
Minimum coins
Sub-problems. Let dp[i] be the minimum number of coins needed to make amount i.
Recurrence. dp[i] = 1 + min { dp[i − c] : c a denomination with c ≤ i }, or ∞ if no denomination fits.
Base case. dp[0] = 0 (zero coins to make amount zero).
Computation order. Fill dp[i] in increasing order of i.
export interface MinCoinsResult {
minCoins: number; // -1 when the amount cannot be made
coins: number[];
}
export function minCoinChange(
denominations: number[],
amount: number,
): MinCoinsResult {
if (amount < 0) throw new RangeError('amount must be non-negative');
if (amount === 0) return { minCoins: 0, coins: [] };
const dp = new Array<number>(amount + 1).fill(Infinity);
const parent = new Array<number>(amount + 1).fill(-1);
dp[0] = 0;
for (let i = 1; i <= amount; i++) {
for (const coin of denominations) {
if (coin <= i && dp[i - coin]! + 1 < dp[i]!) {
dp[i] = dp[i - coin]! + 1;
parent[i] = coin;
}
}
}
if (dp[amount] === Infinity) {
return { minCoins: -1, coins: [] };
}
// Backtrack to recover the coins used.
const coins: number[] = [];
let remaining = amount;
while (remaining > 0) {
coins.push(parent[remaining]!);
remaining -= parent[remaining]!;
}
return { minCoins: dp[amount]!, coins };
}
Complexity. Time O(n · k), where n is the amount and k is the number of denominations. Space O(n).
Example. Denominations {1, 5, 6}, amount 11. A greedy approach would pick 6 + 5 (2 coins), which happens to be optimal. For amount 10, however, greedy picks 6 + 1 + 1 + 1 + 1 (5 coins), while the optimal is 5 + 5 (2 coins). Dynamic programming always finds the minimum.
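The greedy failure is easy to demonstrate with a minimal side-by-side sketch (the denomination set {1, 5, 6} is one example set exhibiting this behavior; these value-only helpers are illustrative, not the repository's implementation):

```typescript
// Greedy: repeatedly take the largest coin that still fits.
export function greedyCoins(denoms: number[], amount: number): number {
  const sorted = [...denoms].sort((a, b) => b - a);
  let count = 0;
  for (const coin of sorted) {
    while (amount >= coin) {
      amount -= coin;
      count++;
    }
  }
  return count;
}

// DP: the minCoinChange recurrence, returning only the coin count.
export function dpMinCoins(denoms: number[], amount: number): number {
  const dp = new Array<number>(amount + 1).fill(Infinity);
  dp[0] = 0;
  for (let i = 1; i <= amount; i++) {
    for (const coin of denoms) {
      if (coin <= i) dp[i] = Math.min(dp[i]!, dp[i - coin]! + 1);
    }
  }
  return dp[amount]!;
}
```

With denominations {1, 5, 6} and amount 10, greedy uses 5 coins (6 + 1 + 1 + 1 + 1) while DP finds 2 (5 + 5).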
Counting the number of ways
To count the number of distinct combinations (not permutations) that sum to , we iterate denominations in the outer loop to avoid counting the same combination multiple times:
export function countCoinChange(denominations: number[], amount: number): number {
if (amount < 0) throw new RangeError('amount must be non-negative');
const dp = new Array<number>(amount + 1).fill(0);
dp[0] = 1; // one way to make 0: use no coins
for (const coin of denominations) {
for (let i = coin; i <= amount; i++) {
dp[i] = dp[i]! + dp[i - coin]!;
}
}
return dp[amount]!;
}
Complexity. Time O(n · k), space O(n).
The key subtlety is the loop order. If we iterated amounts in the outer loop and denominations in the inner loop, we would count permutations (treating 1 + 2 and 2 + 1 as separate), not combinations.
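The difference is easy to demonstrate by writing both loop orders side by side (a sketch; for denominations {1, 2} and amount 3, the combinations are {1+1+1, 1+2}, while the ordered sequences also include 2+1):

```typescript
// Denominations in the outer loop: each combination is counted once,
// because coins are considered in a fixed order.
export function countCombinations(denoms: number[], amount: number): number {
  const dp = new Array<number>(amount + 1).fill(0);
  dp[0] = 1;
  for (const coin of denoms) {
    for (let i = coin; i <= amount; i++) {
      dp[i] = dp[i]! + dp[i - coin]!;
    }
  }
  return dp[amount]!;
}

// Amounts in the outer loop: every ordering of the same coins is
// counted separately, so this counts permutations.
export function countPermutations(denoms: number[], amount: number): number {
  const dp = new Array<number>(amount + 1).fill(0);
  dp[0] = 1;
  for (let i = 1; i <= amount; i++) {
    for (const coin of denoms) {
      if (coin <= i) dp[i] = dp[i]! + dp[i - coin]!;
    }
  }
  return dp[amount]!;
}
```

For denominations {1, 2} and amount 3 the first returns 2 and the second returns 3, confirming the subtlety.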
Longest common subsequence
Given two sequences a and b, a common subsequence is a sequence that appears (in order, but not necessarily contiguously) in both a and b. The longest common subsequence (LCS) problem asks for a common subsequence of maximum length.
Applications. LCS is fundamental in:
- `diff` utilities — computing the minimal set of changes between two files.
- Bioinformatics — comparing DNA, RNA, or protein sequences.
- Version control — finding differences between file versions.
The DP formulation
Sub-problems. Let dp[i][j] be the length of the LCS of the first i elements of a and the first j elements of b.
Recurrence. If a[i − 1] = b[j − 1], then dp[i][j] = dp[i − 1][j − 1] + 1; otherwise dp[i][j] = max(dp[i − 1][j], dp[i][j − 1]).
Base cases. dp[i][0] = dp[0][j] = 0 for all i and j.
Computation order. Fill the table row by row, left to right.
The intuition: if the last characters match, they must be part of an optimal alignment, so we include them and recurse on the remaining prefixes. If they do not match, we try dropping the last character from each sequence and take the better result.
export interface LCSResult<T> {
length: number;
subsequence: T[];
}
export function lcs<T>(a: readonly T[], b: readonly T[]): LCSResult<T> {
const m = a.length;
const n = b.length;
const dp: number[][] = Array.from({ length: m + 1 }, () =>
new Array<number>(n + 1).fill(0),
);
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
if (a[i - 1] === b[j - 1]) {
dp[i]![j] = dp[i - 1]![j - 1]! + 1;
} else {
dp[i]![j] = Math.max(dp[i - 1]![j]!, dp[i]![j - 1]!);
}
}
}
// Backtrack to recover the subsequence.
const subsequence: T[] = [];
let i = m;
let j = n;
while (i > 0 && j > 0) {
if (a[i - 1] === b[j - 1]) {
subsequence.push(a[i - 1]!);
i--;
j--;
} else if (dp[i - 1]![j]! > dp[i]![j - 1]!) {
i--;
} else {
j--;
}
}
subsequence.reverse();
return { length: dp[m]![n]!, subsequence };
}
Complexity. Time O(m · n), space O(m · n).
Example. For a = ABCBDAB and b = BDCABA, the LCS has length 4 — one solution is BCBA.
Space optimization
If we only need the LCS length (not the actual subsequence), we can reduce space to O(n) by keeping only two rows of the table at a time: the previous row and the current row.
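A sketch of the two-row optimization (length only):

```typescript
// Space-optimized LCS length: only the previous and current rows of the
// DP table are kept, for O(n) extra space instead of O(m * n).
export function lcsLength<T>(a: readonly T[], b: readonly T[]): number {
  let prev = new Array<number>(b.length + 1).fill(0);
  let curr = new Array<number>(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      curr[j] =
        a[i - 1] === b[j - 1]
          ? prev[j - 1]! + 1
          : Math.max(prev[j]!, curr[j - 1]!);
    }
    // Reuse the old previous row as the next current row.
    [prev, curr] = [curr, prev];
  }
  return prev[b.length]!;
}
```

On the ABCBDAB / BDCABA example this returns 4, matching the full-table version.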
Edit distance
The edit distance (or Levenshtein distance) between two strings and is the minimum number of single-character operations needed to transform into . The allowed operations are:
- Insert a character into .
- Delete a character from .
- Substitute one character in with another.
Edit distance is closely related to LCS — in fact, the edit distance between two strings of lengths m and n is m + n − 2 · LCS(a, b) when only insertions and deletions are allowed. With substitutions, the relationship is more nuanced.
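The insertion/deletion-only identity can be checked directly: delete the characters of a that lie outside some LCS, then insert the characters of b outside it. A sketch (the helper names are illustrative):

```typescript
// LCS length via the standard full table.
function lcsLen(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i]![j] =
        a[i - 1] === b[j - 1]
          ? dp[i - 1]![j - 1]! + 1
          : Math.max(dp[i - 1]![j]!, dp[i]![j - 1]!);
    }
  }
  return dp[a.length]![b.length]!;
}

// Edit distance when only insertions and deletions are allowed:
// delete the |a| - LCS non-LCS characters of a, insert the
// |b| - LCS non-LCS characters of b.
export function indelDistance(a: string, b: string): number {
  return a.length + b.length - 2 * lcsLen(a, b);
}
```

For a = ABCBDAB (m = 7) and b = BDCABA (n = 6) with LCS length 4, the identity gives 7 + 6 − 8 = 5 insert/delete operations.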
Applications. Edit distance is used in spell checkers, DNA sequence alignment, natural language processing, and fuzzy string matching.
The DP formulation
Sub-problems. Let dp[i][j] be the edit distance between the first i characters of a and the first j characters of b.
Recurrence. If a[i − 1] = b[j − 1], then dp[i][j] = dp[i − 1][j − 1]; otherwise dp[i][j] = 1 + min(dp[i − 1][j], dp[i][j − 1], dp[i − 1][j − 1]).
The three terms in the minimum correspond to:
- dp[i − 1][j]: delete a[i − 1].
- dp[i][j − 1]: insert b[j − 1].
- dp[i − 1][j − 1]: substitute a[i − 1] with b[j − 1].
Base cases. dp[i][0] = i (delete all i characters from a) and dp[0][j] = j (insert all j characters of b).
export function editDistance(a: string, b: string): EditDistanceResult {
const m = a.length;
const n = b.length;
const dp: number[][] = Array.from({ length: m + 1 }, () =>
new Array<number>(n + 1).fill(0),
);
for (let i = 0; i <= m; i++) dp[i]![0] = i;
for (let j = 0; j <= n; j++) dp[0]![j] = j;
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
if (a[i - 1] === b[j - 1]) {
dp[i]![j] = dp[i - 1]![j - 1]!;
} else {
dp[i]![j] =
1 +
Math.min(
dp[i - 1]![j]!, // delete
dp[i]![j - 1]!, // insert
dp[i - 1]![j - 1]!, // substitute
);
}
}
}
// ... backtrack to recover operations ...
return { distance: dp[m]![n]!, operations };
}
Complexity. Time O(m · n), space O(m · n).
Example. kitten → sitting requires 3 operations:
- Substitute `k` → `s` (sitten)
- Substitute `e` → `i` (sittin)
- Insert `g` at the end (sitting)
Recovering the edit script
By backtracking through the DP table from (m, n) to (0, 0), we can recover the actual sequence of edit operations. At each cell, we determine which operation was used (match, substitute, insert, or delete) by comparing the cell's value with its neighbors. Our implementation returns an array of EditStep objects, each recording the operation type and the characters involved.
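A sketch of that backtracking logic is shown below. The `EditStep` shape here is illustrative and may differ from the repository's; `dp` is assumed to be the filled table produced by `editDistance`:

```typescript
// Illustrative step shape; the repository's EditStep may differ.
interface EditStep {
  op: 'match' | 'substitute' | 'insert' | 'delete';
  from?: string;
  to?: string;
}

// Walk from (m, n) back to (0, 0), preferring match/substitute moves.
function backtrackEdits(dp: number[][], a: string, b: string): EditStep[] {
  const steps: EditStep[] = [];
  let i = a.length;
  let j = b.length;
  while (i > 0 || j > 0) {
    if (
      i > 0 && j > 0 &&
      a[i - 1] === b[j - 1] &&
      dp[i]![j]! === dp[i - 1]![j - 1]!
    ) {
      steps.push({ op: 'match', from: a[i - 1] });
      i--; j--;
    } else if (i > 0 && j > 0 && dp[i]![j]! === dp[i - 1]![j - 1]! + 1) {
      steps.push({ op: 'substitute', from: a[i - 1], to: b[j - 1] });
      i--; j--;
    } else if (i > 0 && dp[i]![j]! === dp[i - 1]![j]! + 1) {
      steps.push({ op: 'delete', from: a[i - 1] });
      i--;
    } else {
      steps.push({ op: 'insert', to: b[j - 1] });
      j--;
    }
  }
  return steps.reverse();
}
```

For kitten → sitting this recovers two substitutions and one insertion, in left-to-right order.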
0/1 Knapsack
The 0/1 knapsack problem models a fundamental resource allocation trade-off: given n items, each with a weight wᵢ and a value vᵢ, and a knapsack of capacity W, select a subset of items that maximizes total value without exceeding the capacity.
The "0/1" qualifier means each item is either taken or left — no fractions. This distinguishes it from the fractional knapsack problem (Chapter 17), which has a greedy solution.
The DP formulation
Sub-problems. Let dp[i][w] be the maximum value achievable using the first i items with capacity w.
Recurrence.
dp[i][w] = max(dp[i - 1][w], dp[i - 1][w - wᵢ] + vᵢ), where the second option applies only if wᵢ ≤ w.
For each item, we choose the better of two options: skip it (value stays at dp[i - 1][w]) or take it (add its value to the best we can do with the remaining capacity).
Base cases. dp[0][w] = 0 for all w (no items, no value).
export function knapsack(items: KnapsackItem[], capacity: number): KnapsackResult {
if (capacity < 0) throw new RangeError('capacity must be non-negative');
const n = items.length;
const dp: number[][] = Array.from({ length: n + 1 }, () =>
new Array<number>(capacity + 1).fill(0),
);
for (let i = 1; i <= n; i++) {
const item = items[i - 1]!;
for (let w = 0; w <= capacity; w++) {
dp[i]![w] = dp[i - 1]![w]!;
if (item.weight <= w) {
const withItem = dp[i - 1]![w - item.weight]! + item.value;
if (withItem > dp[i]![w]!) {
dp[i]![w] = withItem;
}
}
}
}
// Backtrack to find which items were selected.
const selectedItems: number[] = [];
let w = capacity;
for (let i = n; i > 0; i--) {
if (dp[i]![w] !== dp[i - 1]![w]) {
selectedItems.push(i - 1);
w -= items[i - 1]!.weight;
}
}
selectedItems.reverse();
const totalWeight = selectedItems.reduce(
(sum, idx) => sum + items[idx]!.weight,
0,
);
return { maxValue: dp[n]![capacity]!, selectedItems, totalWeight };
}
Complexity. Time O(nW), space O(nW).
Important caveat. This is a pseudo-polynomial algorithm. The running time depends on the numeric value of W, not on the number of bits needed to represent it. If W is exponentially large in the input size, the algorithm becomes exponential. This distinction is crucial when discussing NP-completeness (Chapter 21) — the 0/1 knapsack problem is NP-hard, and the pseudo-polynomial algorithm does not contradict this.
Example. Items: (weight 10, value 60), (weight 20, value 100), (weight 30, value 120). Capacity: 50. The optimal selection is items 2 and 3 (weight 50, value 220).
Space optimization
Since row i depends only on row i - 1, we can reduce space to O(W) by using a single 1D array and iterating weights in decreasing order (to avoid using an item twice):
for each item:
for w = W down to item.weight:
dp[w] = max(dp[w], dp[w - item.weight] + item.value)
However, this optimization prevents us from backtracking to recover which items were selected, since the full table is no longer available.
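In TypeScript the pseudocode above might be realized as follows (a sketch; the `KnapsackItem` shape mirrors the `weight` and `value` fields used by `knapsack` above):

```typescript
interface KnapsackItem {
  weight: number;
  value: number;
}

// Space-optimized 0/1 knapsack: O(W) space, maximum value only
// (no recovery of the selected items).
function knapsackValue(items: readonly KnapsackItem[], capacity: number): number {
  if (capacity < 0) throw new RangeError('capacity must be non-negative');
  const dp = new Array<number>(capacity + 1).fill(0);
  for (const item of items) {
    // Iterate weights downward so each item is used at most once.
    for (let w = capacity; w >= item.weight; w--) {
      dp[w] = Math.max(dp[w]!, dp[w - item.weight]! + item.value);
    }
  }
  return dp[capacity]!;
}
```

On the example above — items (10, 60), (20, 100), (30, 120) with capacity 50 — `knapsackValue` returns 220, agreeing with the 2D version.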
Matrix chain multiplication
Given a chain of n matrices A₁A₂⋯Aₙ where matrix Aᵢ has dimensions p[i-1] × p[i], we want to parenthesize the product to minimize the total number of scalar multiplications.
Matrix multiplication is associative, so any parenthesization yields the same result. But the cost varies dramatically. For three matrices A₁, A₂, A₃ with dimensions p₀ × p₁, p₁ × p₂, p₂ × p₃:
- (A₁A₂)A₃: cost p₀p₁p₂ + p₀p₂p₃
- A₁(A₂A₃): cost p₁p₂p₃ + p₀p₁p₃
With dimensions 10 × 20, 20 × 30, and 30 × 40, for instance, the first parenthesization costs 18,000 scalar multiplications and the second costs 32,000 — the first is nearly twice as fast.
The DP formulation
Sub-problems. Let m[i][j] be the minimum number of scalar multiplications needed to compute the product Aᵢ⋯Aⱼ.
Recurrence.
m[i][j] = min over i ≤ k < j of ( m[i][k] + m[k + 1][j] + p[i - 1] · p[k] · p[j] )
The idea: split the chain at position k, compute the two sub-chains optimally, and add the cost of multiplying the resulting two matrices.
Base cases. m[i][i] = 0 (a single matrix requires no multiplication).
Computation order. Solve by increasing chain length l = j - i + 1.
export function matrixChainOrder(dims: number[]): MatrixChainResult {
if (dims.length < 2) {
throw new Error('dims must have at least 2 elements (at least one matrix)');
}
const n = dims.length - 1;
const m: number[][] = Array.from({ length: n + 1 }, () =>
new Array<number>(n + 1).fill(0),
);
const s: number[][] = Array.from({ length: n + 1 }, () =>
new Array<number>(n + 1).fill(0),
);
for (let l = 2; l <= n; l++) {
for (let i = 1; i <= n - l + 1; i++) {
const j = i + l - 1;
m[i]![j] = Infinity;
for (let k = i; k < j; k++) {
const cost =
m[i]![k]! + m[k + 1]![j]! + dims[i - 1]! * dims[k]! * dims[j]!;
if (cost < m[i]![j]!) {
m[i]![j] = cost;
s[i]![j] = k;
}
}
}
}
return {
minCost: m[1]![n]!,
parenthesization: buildParens(s, 1, n),
splits: s,
};
}
Complexity. Time O(n³), space O(n²).
The split table s[i][j] records where the optimal split k occurs for each sub-chain. We use it to reconstruct the optimal parenthesization recursively:
function buildParens(s: number[][], i: number, j: number): string {
if (i === j) return `A${i}`;
return `(${buildParens(s, i, s[i]![j]!)}${buildParens(s, s[i]![j]! + 1, j)})`;
}
Example. The classic CLRS example with dimensions [30, 35, 15, 5, 10, 20, 25] (six matrices) yields an optimal cost of 15,125 scalar multiplications.
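That figure can be reproduced with a compact cost-only version of the algorithm (a standalone sketch; the repository's `matrixChainOrder` additionally builds the split table). The dimension array below is the standard CLRS example [30, 35, 15, 5, 10, 20, 25]:

```typescript
// Minimum scalar multiplications for a chain where matrix A_i is
// dims[i-1] x dims[i]. Cost table only, no split reconstruction.
function matrixChainCost(dims: readonly number[]): number {
  const n = dims.length - 1;
  const m: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(n + 1).fill(0),
  );
  for (let l = 2; l <= n; l++) {        // chain length
    for (let i = 1; i <= n - l + 1; i++) {
      const j = i + l - 1;
      m[i]![j] = Infinity;
      for (let k = i; k < j; k++) {     // split position
        const cost =
          m[i]![k]! + m[k + 1]![j]! + dims[i - 1]! * dims[k]! * dims[j]!;
        if (cost < m[i]![j]!) m[i]![j] = cost;
      }
    }
  }
  return m[1]![n]!;
}

// matrixChainCost([30, 35, 15, 5, 10, 20, 25]) evaluates to 15125.
```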
Longest increasing subsequence
Given a sequence of numbers a₁, a₂, …, aₙ, the longest increasing subsequence (LIS) is the longest subsequence aᵢ₁, aᵢ₂, …, aᵢₖ such that i₁ < i₂ < ⋯ < iₖ and aᵢ₁ < aᵢ₂ < ⋯ < aᵢₖ.
Applications. LIS appears in patience sorting, version tracking, and computational geometry (finding the longest chain of points in which each point dominates the previous one).
O(n²) dynamic programming
Sub-problems. Let dp[i] be the length of the longest increasing subsequence ending at position i.
Recurrence.
dp[i] = 1 + max{ dp[j] : j < i and aⱼ < aᵢ }
(If no such j exists, dp[i] = 1.)
Base cases. dp[i] = 1 for all i (each element is an increasing subsequence of length 1 by itself).
export function lisDP(arr: readonly number[]): LISResult {
const n = arr.length;
if (n === 0) return { length: 0, subsequence: [] };
const dp = new Array<number>(n).fill(1);
const parent = new Array<number>(n).fill(-1);
for (let i = 1; i < n; i++) {
for (let j = 0; j < i; j++) {
if (arr[j]! < arr[i]! && dp[j]! + 1 > dp[i]!) {
dp[i] = dp[j]! + 1;
parent[i] = j;
}
}
}
// Find the index where the LIS ends.
let bestLen = 0;
let bestIdx = 0;
for (let i = 0; i < n; i++) {
if (dp[i]! > bestLen) {
bestLen = dp[i]!;
bestIdx = i;
}
}
// Backtrack to recover the subsequence.
const subsequence: number[] = [];
let idx = bestIdx;
while (idx !== -1) {
subsequence.push(arr[idx]!);
idx = parent[idx]!;
}
subsequence.reverse();
return { length: bestLen, subsequence };
}
Complexity. Time O(n²), space O(n).
O(n log n) patience sorting
We can improve to O(n log n) using a technique inspired by the card game Patience. Maintain an array tails where tails[i] is the smallest tail element of all increasing subsequences of length i + 1 found so far.
For each element val in the input:
- Binary search for the leftmost position pos in `tails` where `tails[pos] >= val`.
- Replace `tails[pos]` with `val` (or extend `tails` if `val` is larger than all current tails).
The key invariant is that tails is always sorted, which is what makes binary search possible.
export function lisBinarySearch(arr: readonly number[]): LISResult {
const n = arr.length;
if (n === 0) return { length: 0, subsequence: [] };
const tails: number[] = [];
const tailIndices: number[] = [];
const parent = new Array<number>(n).fill(-1);
for (let i = 0; i < n; i++) {
const val = arr[i]!;
let lo = 0;
let hi = tails.length;
while (lo < hi) {
const mid = (lo + hi) >>> 1;
if (tails[mid]! < val) {
lo = mid + 1;
} else {
hi = mid;
}
}
tails[lo] = val;
tailIndices[lo] = i;
if (lo > 0) {
parent[i] = tailIndices[lo - 1]!;
}
}
// Backtrack to recover the subsequence.
const length = tails.length;
const subsequence: number[] = [];
let idx = tailIndices[length - 1]!;
for (let k = 0; k < length; k++) {
subsequence.push(arr[idx]!);
idx = parent[idx]!;
}
subsequence.reverse();
return { length, subsequence };
}
Complexity. Time O(n log n), space O(n).
Example. For a 16-element sequence such as 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, the LIS has length 6. One such subsequence is 0, 2, 6, 9, 11, 15.
Summary
Dynamic programming transforms problems with exponential brute-force solutions into efficient polynomial-time algorithms by exploiting optimal substructure and overlapping subproblems. The key insight is simple: do not recompute — remember. Whether through top-down memoization or bottom-up tabulation, DP systematically stores solutions to subproblems and builds toward the final answer.
We saw this principle in action across seven problems: from the elementary Fibonacci sequence (which illustrates the core idea) to sophisticated optimization problems like matrix chain multiplication and the knapsack problem. Each problem followed the same five-step recipe: define subproblems, write the recurrence, identify base cases, determine computation order, and recover the solution.
| Problem | Sub-problem space | Recurrence | Time | Space |
|---|---|---|---|---|
| Fibonacci | F(i) for i ≤ n | F(i) = F(i - 1) + F(i - 2) | O(n) | O(1) |
| Min coin change | dp[a]: min coins for amount a | 1 + min over coins c of dp[a - c] | O(nA) | O(A) |
| LCS | dp[i][j]: LCS of prefixes | match or skip | O(mn) | O(mn) |
| Edit distance | D(i, j): edit dist of prefixes | match, sub, ins, del | O(mn) | O(mn) |
| 0/1 Knapsack | dp[i][w]: best value, items 1..i, cap w | take or skip item i | O(nW) | O(nW) |
| Matrix chain | m[i][j]: min cost for Aᵢ⋯Aⱼ | split at k | O(n³) | O(n²) |
| LIS | dp[i]: LIS ending at i | extend from j < i | O(n²) | O(n) |
In the next chapter, we turn to greedy algorithms — a complementary design paradigm that, when applicable, yields even simpler and more efficient solutions than DP. The key challenge with greedy algorithms is proving that the locally optimal choice at each step leads to a globally optimal solution — a property that holds for some problems but not others. Understanding when to use DP and when to use greedy is one of the most important skills in algorithm design.
Exercises
-
Rod cutting. Given a rod of length n and a price table p where p[i] is the price of a rod of length i, find the maximum revenue obtainable by cutting the rod into pieces. Write the recurrence, implement both top-down and bottom-up solutions, and analyze their complexity.
-
Subset sum. Given a set S of positive integers and a target t, determine whether there exists a subset of S that sums to t. Define the subproblems, write the recurrence, and implement a tabulated solution. What is the relationship between this problem and 0/1 knapsack?
-
Counting LCS. Modify the LCS algorithm to count the number of distinct longest common subsequences (not just find one). What changes are needed in the recurrence and the table?
-
Weighted edit distance. Generalize the edit distance algorithm so that insertions, deletions, and substitutions can have different costs (not all equal to 1). For example, in DNA alignment, a substitution between similar nucleotides might cost less than one between dissimilar nucleotides. Implement this generalization and verify it on a test case.
-
LIS and LCS connection. Prove that the LIS problem can be reduced to LCS by computing the LCS of the original sequence and its sorted version. Is this reduction efficient? When would you prefer the patience-sorting approach over the LCS-based approach?
Greedy Algorithms
Dynamic programming (Chapter 16) achieves optimal solutions by methodically exploring all subproblems and combining their answers. Greedy algorithms take a more aggressive approach: at each step they make the locally optimal choice and never look back. When a greedy strategy works, the result is typically a simpler and faster algorithm — often just a single pass over sorted data. The catch is that the locally optimal choice does not always lead to a globally optimal solution, so correctness requires proof. In this chapter we develop two proof techniques — the "greedy stays ahead" argument and the exchange argument — and apply them to three classic problems: interval scheduling, Huffman coding, and fractional knapsack.
The greedy strategy
A greedy algorithm builds a solution incrementally. At each step it examines the available candidates, selects the one that looks best according to some criterion, and commits to that choice irrevocably. It never reconsiders past decisions or explores alternative combinations.
Contrast this with dynamic programming:
| Dynamic programming | Greedy | |
|---|---|---|
| Decisions | Deferred — explores all combinations via table | Immediate — commits at each step |
| Subproblems | Many, overlapping | Typically none (single pass) |
| Correctness | Optimal substructure + overlapping subproblems | Requires a specific proof (exchange or stays-ahead) |
| Efficiency | Often O(n²) or O(n³) | Often O(n log n) or O(n) |
The greedy strategy works when a problem has:
- Optimal substructure. An optimal solution contains optimal solutions to subproblems.
- The greedy-choice property. A locally optimal choice can always be extended to a globally optimal solution. In other words, we never need to reconsider a greedy choice.
Property 1 is shared with DP. Property 2 is what distinguishes greedy problems: it asserts that committing to the local optimum is safe.
Proving greedy algorithms correct
Because the greedy-choice property is not obvious, we need rigorous proofs. Two standard techniques are widely used.
Greedy stays ahead
Idea. Show that after each step, the greedy solution is at least as good as any other solution at the same step. If the greedy algorithm stays ahead (or tied) at every step, it must be at least as good as the optimum overall.
Structure of the proof:
- Define a measure of progress after k steps.
- Prove by induction that the greedy solution's measure is at least as good as the optimal solution's measure after every step k.
- Conclude that the final greedy solution is optimal.
We will use this technique for interval scheduling below.
Exchange argument
Idea. Start with an arbitrary optimal solution. Show that it can be transformed — step by step, by "exchanging" its choices for greedy choices — into the greedy solution without worsening the objective. If an optimal solution can always be transformed into the greedy solution, the greedy solution must be optimal.
Structure of the proof:
- Consider an optimal solution O that differs from the greedy solution G.
- Identify the first point where O and G differ.
- Show that modifying O to agree with G at that point does not make O worse.
- Repeat until O = G.
We will use this technique for Huffman coding.
Interval scheduling (activity selection)
Problem definition
Given n activities, each with a start time sᵢ and a finish time fᵢ (where sᵢ < fᵢ), select the largest subset of mutually compatible activities. Two activities are compatible if they do not overlap — that is, one finishes before the other starts.
This problem arises in resource allocation: scheduling the maximum number of non-overlapping jobs on a single machine, booking meeting rooms, or allocating time slots.
Greedy approach
The key insight is to sort activities by finish time and greedily select each activity whose start time does not conflict with the previously selected activity.
Why finish time? Consider the alternatives:
- Sort by start time. A long early activity could block many shorter ones.
- Sort by duration. A short activity in the middle could block two non-overlapping ones.
- Sort by fewest conflicts. Counterexamples exist.
- Sort by finish time. By always choosing the activity that finishes earliest, we leave as much room as possible for future activities.
Algorithm
- Sort activities by finish time.
- Select the first activity.
- For each subsequent activity: if its start time is at least the finish time of the last selected activity, select it.
export interface Interval {
start: number;
end: number;
}
export interface IntervalSchedulingResult {
selected: Interval[];
count: number;
}
export function intervalScheduling(
intervals: readonly Interval[],
): IntervalSchedulingResult {
if (intervals.length === 0) {
return { selected: [], count: 0 };
}
// Sort by finish time (break ties by start time).
const sorted = intervals.slice().sort((a, b) => {
if (a.end !== b.end) return a.end - b.end;
return a.start - b.start;
});
const selected: Interval[] = [sorted[0]!];
let lastEnd = sorted[0]!.end;
for (let i = 1; i < sorted.length; i++) {
const interval = sorted[i]!;
if (interval.start >= lastEnd) {
selected.push(interval);
lastEnd = interval.end;
}
}
return { selected, count: selected.length };
}
Correctness proof (greedy stays ahead)
Let g₁, g₂, …, gₖ be the activities selected by the greedy algorithm (in order of finish time), and let o₁, o₂, …, oₘ be an optimal solution (also sorted by finish time). We want to show k = m.
Lemma (greedy stays ahead). For all i ≤ k, we have f(gᵢ) ≤ f(oᵢ) — the i-th greedy activity finishes no later than the i-th optimal activity.
Proof by induction on i:
- Base case (i = 1). The greedy algorithm picks the activity with the earliest finish time, so f(g₁) ≤ f(o₁).
- Inductive step. Assume f(gᵢ₋₁) ≤ f(oᵢ₋₁). Since oᵢ starts after oᵢ₋₁ finishes, we have s(oᵢ) ≥ f(oᵢ₋₁) ≥ f(gᵢ₋₁). Therefore oᵢ is compatible with gᵢ₋₁, and the greedy algorithm considers it (or an activity that finishes even earlier). It follows that f(gᵢ) ≤ f(oᵢ).
Theorem. k = m. If m > k, then by the lemma, f(gₖ) ≤ f(oₖ), so oₖ₊₁ is compatible with gₖ and the greedy algorithm would have selected it — contradicting the fact that greedy stopped at k activities. Therefore k = m, and the greedy solution is optimal.
Complexity
- Time: O(n log n) for sorting, plus O(n) for the single scan. Total: O(n log n).
- Space: O(n) for the sorted copy and result.
Example
Consider these activities sorted by finish time:
| Activity | Start | Finish |
|---|---|---|
| A | 1 | 4 |
| B | 3 | 5 |
| C | 0 | 6 |
| D | 5 | 7 |
| E | 3 | 9 |
| F | 6 | 10 |
| G | 8 | 11 |
The greedy algorithm proceeds:
- Select A [1, 4). Last finish = 4.
- B starts at 3 < 4 — skip.
- C starts at 0 < 4 — skip.
- D starts at 5 ≥ 4 — select. Last finish = 7.
- E starts at 3 < 7 — skip.
- F starts at 6 < 7 — skip.
- G starts at 8 ≥ 7 — select. Last finish = 11.
Result: {A, D, G} — 3 activities. This is optimal.
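The trace can be double-checked with a compact standalone version of the same scan (a sketch restating the logic of `intervalScheduling` above, using the activity table from this example):

```typescript
interface Interval {
  start: number;
  end: number;
}

// Earliest-finish-time greedy: sort by finish time, keep each activity
// whose start does not conflict with the last selection.
function selectActivities(intervals: readonly Interval[]): Interval[] {
  const sorted = [...intervals].sort(
    (a, b) => a.end - b.end || a.start - b.start,
  );
  const selected: Interval[] = [];
  let lastEnd = -Infinity;
  for (const iv of sorted) {
    if (iv.start >= lastEnd) {
      selected.push(iv);
      lastEnd = iv.end;
    }
  }
  return selected;
}

const activities: Interval[] = [
  { start: 1, end: 4 },  // A
  { start: 3, end: 5 },  // B
  { start: 0, end: 6 },  // C
  { start: 5, end: 7 },  // D
  { start: 3, end: 9 },  // E
  { start: 6, end: 10 }, // F
  { start: 8, end: 11 }, // G
];
// selectActivities(activities) returns A, D, G -- three activities.
```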
Huffman coding
Problem definition
Given an alphabet of n characters, each with a known frequency f(c), find a prefix-free binary code that minimizes the total encoding length:
B(T) = Σ f(c) · d(c), summed over all characters c,
where d(c) is the depth of character c in the coding tree T (which equals the length of its binary codeword).
A code is prefix-free if no codeword is a prefix of another. This guarantees that encoded text can be decoded unambiguously without delimiters.
Why variable-length codes?
Fixed-length codes (like ASCII) use ⌈log₂ n⌉ bits per character regardless of frequency. If some characters appear much more often than others, variable-length codes can do better: assign shorter codewords to frequent characters and longer ones to rare characters. This is the principle behind data compression formats like ZIP, gzip, and JPEG.
Huffman's greedy algorithm
David Huffman (1952) discovered that the optimal prefix-free code can be built by a simple greedy procedure:
- Create a leaf node for each character, with its frequency as the key.
- Insert all leaves into a min-priority queue.
- While the queue has more than one node: a. Extract the two nodes x and y with the lowest frequencies. b. Create a new internal node z with f(z) = f(x) + f(y), left child x, and right child y. c. Insert z back into the queue.
- The remaining node is the root of the Huffman tree.
- Assign code `0` to left edges and `1` to right edges. Each character's codeword is the sequence of bits on the path from root to its leaf.
import { BinaryHeap } from '../11-heaps-and-priority-queues/binary-heap.js';
export type HuffmanNode = HuffmanLeaf | HuffmanInternal;
export interface HuffmanLeaf {
kind: 'leaf';
char: string;
freq: number;
}
export interface HuffmanInternal {
kind: 'internal';
freq: number;
left: HuffmanNode;
right: HuffmanNode;
}
export function buildHuffmanTree(
frequencies: ReadonlyMap<string, number>,
): HuffmanNode {
if (frequencies.size === 0) {
throw new RangeError('frequency map must not be empty');
}
// Special case: single character.
if (frequencies.size === 1) {
const [char, freq] = [...frequencies][0]!;
return { kind: 'leaf', char, freq };
}
const heap = new BinaryHeap<HuffmanNode>((a, b) => a.freq - b.freq);
for (const [char, freq] of frequencies) {
heap.insert({ kind: 'leaf', char, freq });
}
while (heap.size > 1) {
const left = heap.extract()!;
const right = heap.extract()!;
const merged: HuffmanInternal = {
kind: 'internal',
freq: left.freq + right.freq,
left,
right,
};
heap.insert(merged);
}
return heap.extract()!;
}
The code table is then extracted by a simple tree traversal:
export function buildCodeTable(root: HuffmanNode): Map<string, string> {
const table = new Map<string, string>();
if (root.kind === 'leaf') {
table.set(root.char, '0');
return table;
}
function walk(node: HuffmanNode, prefix: string): void {
if (node.kind === 'leaf') {
table.set(node.char, prefix);
return;
}
walk(node.left, prefix + '0');
walk(node.right, prefix + '1');
}
walk(root, '');
return table;
}
Encoding and decoding
Encoding replaces each character with its codeword:
export function huffmanEncode(text: string): HuffmanEncodingResult {
if (text.length === 0) {
throw new RangeError('text must be non-empty');
}
const frequencies = new Map<string, number>();
for (const ch of text) {
frequencies.set(ch, (frequencies.get(ch) ?? 0) + 1);
}
const tree = buildHuffmanTree(frequencies);
const codeTable = buildCodeTable(tree);
let encoded = '';
for (const ch of text) {
encoded += codeTable.get(ch)!;
}
return { encoded, codeTable, tree };
}
Decoding walks the tree from root to leaf for each bit:
export function huffmanDecode(
encoded: string,
tree: HuffmanNode,
): string {
if (tree.kind === 'leaf') {
return tree.char.repeat(encoded.length);
}
let result = '';
let node: HuffmanNode = tree;
for (const bit of encoded) {
node = bit === '0'
? (node as HuffmanInternal).left
: (node as HuffmanInternal).right;
if (node.kind === 'leaf') {
result += node.char;
node = tree;
}
}
return result;
}
Correctness proof (exchange argument)
We prove that the Huffman algorithm produces an optimal prefix-free code.
Lemma 1. There exists an optimal tree in which the two lowest-frequency characters are siblings at the maximum depth.
Proof. Let T be an optimal tree. Let x and y be the two characters with the lowest frequencies. If they are not at the maximum depth or not siblings in T, we can swap them with the characters at maximum depth without increasing the cost (because x and y have the lowest frequencies, moving them deeper cannot increase B(T), and moving more frequent characters to shallower positions can only help).
Lemma 2. Let T′ be the tree obtained by replacing the subtree containing siblings x and y with a single leaf z having frequency f(x) + f(y). Then B(T) = B(T′) + f(x) + f(y).
Proof. In T, x and y are one level deeper than z is in T′. Each contributes f(x) (respectively f(y)) extra to B(T) compared to B(T′).
Theorem. The Huffman algorithm produces an optimal prefix-free code.
Proof by induction on the number of characters n:
- Base case (n = 1 or n = 2). Trivially optimal.
- Inductive step. By Lemma 1, there is an optimal tree where the two lowest-frequency characters are siblings at maximum depth. By Lemma 2, replacing them with a merged node gives a subproblem with n - 1 characters. By the inductive hypothesis, Huffman solves the subproblem optimally. Since the merge doesn't affect the relative costs of the remaining characters, the full tree is also optimal.
Complexity
- Time: O(n log n), where n is the number of distinct characters. Each of the n - 1 merge steps involves two heap extractions and one insertion, each O(log n).
- Space: O(n) for the tree and heap.
- Encoding time: O(m), where m is the length of the input text (after the tree is built).
- Decoding time: O(b), where b is the number of bits in the encoded string.
Example
Consider an alphabet with these frequencies:
| Character | f | a | b | c | d | e |
|---|---|---|---|---|---|---|
| Frequency | 5 | 9 | 12 | 13 | 16 | 45 |
Step-by-step tree construction:
- Extract `f` (5) and `a` (9) → merge into node (14).
- Extract `b` (12) and `c` (13) → merge into node (25).
- Extract (14) and `d` (16) → merge into node (30).
- Extract (25) and (30) → merge into node (55).
- Extract `e` (45) and (55) → merge into root (100).
(100)
/ \
e:45 (55)
/ \
(25) (30)
/ \ / \
b:12 c:13 (14) d:16
/ \
f:5 a:9
Resulting codes:
| Character | Code | Length |
|---|---|---|
| e | 0 | 1 |
| b | 100 | 3 |
| c | 101 | 3 |
| f | 1100 | 4 |
| a | 1101 | 4 |
| d | 111 | 3 |
Total encoding length: 45·1 + 12·3 + 13·3 + 5·4 + 9·4 + 16·3 = 224 bits.
A fixed-length code would require ⌈log₂ 6⌉ = 3 bits per character, for a total of 300 bits. Huffman coding saves 76 bits, a 25% reduction.
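The totals can be verified with a few lines of TypeScript, using the frequencies and code lengths from the tables above:

```typescript
// Weighted code length: sum of frequency x codeword length per character.
const codeLengths: Array<[freq: number, bits: number]> = [
  [45, 1], // e: 0
  [12, 3], // b: 100
  [13, 3], // c: 101
  [5, 4],  // f: 1100
  [9, 4],  // a: 1101
  [16, 3], // d: 111
];

const huffmanBits = codeLengths.reduce((sum, [f, len]) => sum + f * len, 0);
const totalChars = codeLengths.reduce((sum, [f]) => sum + f, 0);
const fixedBits = totalChars * 3; // ceil(log2(6)) = 3 bits per character

// huffmanBits is 224, fixedBits is 300: a saving of 76 bits (about 25%).
```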
Fractional knapsack
Problem definition
Given n items, each with a weight wᵢ and a value vᵢ, and a knapsack with capacity W, maximize the total value of items placed in the knapsack. Unlike the 0/1 knapsack (Chapter 16), here we may take fractions of items: for each item i, we choose a fraction xᵢ ∈ [0, 1], subject to:
x₁w₁ + x₂w₂ + ⋯ + xₙwₙ ≤ W
Maximize:
x₁v₁ + x₂v₂ + ⋯ + xₙvₙ
Why greedy works here (but not for 0/1 knapsack)
The fractional knapsack has the greedy-choice property: we should always take as much as possible of the item with the highest value-per-unit-weight ratio vᵢ/wᵢ.
For the 0/1 knapsack, this greedy strategy fails. Consider:
- Item A: weight 10, value 60 (ratio 6)
- Item B: weight 20, value 100 (ratio 5)
- Capacity: 20
Greedy by ratio selects item A (ratio 6), getting value 60. But the optimal solution takes item B for value 100. The constraint that items cannot be split breaks the greedy-choice property, which is why the 0/1 knapsack requires dynamic programming.
In the fractional case, if item A doesn't fill the knapsack, we can take part of item B as well — the "fractional freedom" ensures the greedy choice is always safe.
Algorithm
- Compute the value-to-weight ratio for each item.
- Sort items by ratio in descending order.
- Greedily take as much of each item as possible until the knapsack is full.
export interface FractionalKnapsackItem {
weight: number;
value: number;
}
export interface PackedItem {
index: number;
fraction: number;
weight: number;
value: number;
}
export interface FractionalKnapsackResult {
maxValue: number;
totalWeight: number;
packedItems: PackedItem[];
}
export function fractionalKnapsack(
items: readonly FractionalKnapsackItem[],
capacity: number,
): FractionalKnapsackResult {
if (capacity < 0) {
throw new RangeError('capacity must be non-negative');
}
const indexed = items.map((item, i) => ({
index: i,
weight: item.weight,
value: item.value,
ratio: item.value / item.weight,
}));
indexed.sort((a, b) => b.ratio - a.ratio);
const packedItems: PackedItem[] = [];
let remaining = capacity;
let totalValue = 0;
let totalWeight = 0;
for (const item of indexed) {
if (remaining <= 0) break;
if (item.weight <= remaining) {
packedItems.push({
index: item.index,
fraction: 1,
weight: item.weight,
value: item.value,
});
remaining -= item.weight;
totalValue += item.value;
totalWeight += item.weight;
} else {
const fraction = remaining / item.weight;
const fractionalValue = item.value * fraction;
packedItems.push({
index: item.index,
fraction,
weight: remaining,
value: fractionalValue,
});
totalValue += fractionalValue;
totalWeight += remaining;
remaining = 0;
}
}
return { maxValue: totalValue, totalWeight, packedItems };
}
Correctness proof (exchange argument)
Theorem. Sorting by vᵢ/wᵢ and greedily packing yields an optimal solution.
Proof. Suppose items are sorted so that v₁/w₁ ≥ v₂/w₂ ≥ ⋯ ≥ vₙ/wₙ. Let x be the greedy solution and y be an optimal solution. If x ≠ y, let i be the first index where they differ. By the greedy algorithm, xᵢ is as large as possible (either 1 or filling the remaining capacity), so yᵢ < xᵢ.
We can increase yᵢ and decrease some yⱼ (where j > i, so vⱼ/wⱼ ≤ vᵢ/wᵢ) to compensate. Specifically, shift weight Δ = min((xᵢ - yᵢ)wᵢ, yⱼwⱼ) from item j to item i:
yᵢ ← yᵢ + Δ/wᵢ, yⱼ ← yⱼ - Δ/wⱼ
The objective changes by Δ(vᵢ/wᵢ - vⱼ/wⱼ) ≥ 0, so the value does not decrease. Repeating this exchange process transforms y into x without ever decreasing the total value, so x is optimal.
Complexity
- Time: O(n log n) for sorting, plus O(n) for the greedy scan. Total: O(n log n).
- Space: O(n).
Example
| Item | Weight | Value | Ratio |
|---|---|---|---|
| A | 10 | 60 | 6.0 |
| B | 20 | 100 | 5.0 |
| C | 30 | 120 | 4.0 |
Capacity: 50
Greedy packing (sorted by ratio):
- Take all of A: weight 10, value 60. Remaining capacity: 40.
- Take all of B: weight 20, value 100. Remaining capacity: 20.
- Take 20/30 of C: weight 20, value 120 × (2/3) = 80. Remaining capacity: 0.
Total value: 60 + 100 + 80 = 240.
Compare with the 0/1 knapsack (no fractions allowed): with these items the best subset is B and C, for value 220 — taking A and C gives only 180, and A and B only 160. The ability to take fractions yields a strictly higher value (240 > 220).
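Both numbers can be checked with a small standalone script, using the item table above (brute-force enumeration of subsets is fine for three items):

```typescript
const items = [
  { weight: 10, value: 60 },  // A
  { weight: 20, value: 100 }, // B
  { weight: 30, value: 120 }, // C
];
const capacity = 50;

// Best 0/1 subset by exhaustive enumeration over all 2^3 subsets.
let best01 = 0;
for (let mask = 0; mask < 1 << items.length; mask++) {
  let w = 0;
  let v = 0;
  for (let i = 0; i < items.length; i++) {
    if (mask & (1 << i)) {
      w += items[i]!.weight;
      v += items[i]!.value;
    }
  }
  if (w <= capacity && v > best01) best01 = v;
}

// Fractional greedy: items are already sorted by ratio (6 > 5 > 4).
let remaining = capacity;
let fractional = 0;
for (const item of items) {
  const take = Math.min(item.weight, remaining);
  fractional += item.value * (take / item.weight);
  remaining -= take;
}

// best01 is 220 (items B and C); fractional evaluates to 240
// (up to floating-point rounding).
```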
When greedy fails
Not every optimization problem admits a greedy solution. Here are instructive examples where the greedy approach fails:
-
0/1 Knapsack. As shown above, the greedy-by-ratio strategy is suboptimal. The integer constraint destroys the greedy-choice property.
-
Longest path in a graph. Greedily choosing the longest edge at each step does not yield the longest path. This problem is NP-hard.
-
Optimal BST. Greedily placing the most frequent key at the root does not minimize expected search time. This requires DP (similar to matrix chain multiplication).
The lesson: always prove that the greedy-choice property holds before trusting a greedy algorithm. The proofs in this chapter — "greedy stays ahead" and the exchange argument — are the standard tools for doing so.
Summary
Greedy algorithms solve optimization problems by making locally optimal choices at each step. They are simpler and typically faster than dynamic programming — often requiring just a sort followed by a linear scan — but they require careful proof that the greedy-choice property holds.
We studied two proof techniques. The greedy stays ahead argument shows that the greedy solution maintains an advantage over any optimal solution at every step, and we applied it to interval scheduling. The exchange argument shows that any optimal solution can be transformed into the greedy solution without loss, and we applied it to Huffman coding and fractional knapsack.
The three problems in this chapter illustrate the range of greedy applications:
- Interval scheduling selects the maximum number of non-overlapping activities by always choosing the one that finishes earliest — an O(n log n) algorithm.
- Huffman coding produces optimal prefix-free binary codes by repeatedly merging the two lowest-frequency symbols — also O(n log n).
- Fractional knapsack maximizes value by greedily packing items in order of value-to-weight ratio — O(n log n).
| Problem | Strategy | Time | Space | Proof technique |
|---|---|---|---|---|
| Interval scheduling | Sort by finish time | O(n log n) | O(n) | Greedy stays ahead |
| Huffman coding | Merge lowest-frequency pairs | O(n log n) | O(n) | Exchange argument |
| Fractional knapsack | Sort by value/weight ratio | O(n log n) | O(n) | Exchange argument |
We also contrasted greedy with DP on the knapsack problem: the fractional variant yields to greedy, while the 0/1 variant requires dynamic programming. Recognizing which problems have the greedy-choice property — and which do not — is a fundamental skill in algorithm design.
Exercises
-
Weighted interval scheduling. In the weighted variant, each activity i has a value vᵢ, and the goal is to maximize the total value (not the count) of selected non-overlapping activities. Show that the greedy algorithm (sort by finish time) does not solve this problem optimally. Design a dynamic programming algorithm that runs in O(n log n) time.
-
Job scheduling with deadlines. You have n jobs, each taking unit time, with a deadline dᵢ and a penalty pᵢ incurred if the job is not completed by its deadline. Design a greedy algorithm that minimizes the total penalty. Prove its correctness.
-
Optimal merge pattern. You have n sorted files of sizes s₁, s₂, …, sₙ. Merging two files of sizes x and y costs x + y. Find the merge order that minimizes the total cost. How does this relate to Huffman coding?
-
Huffman vs fixed-width. Prove that Huffman coding never uses more bits than a fixed-width encoding. Under what conditions does it use the same number of bits?
-
Greedy failure. Consider the coin-change problem with denominations {1, 3, 4} and target amount 6. Show that the greedy algorithm (always use the largest denomination that fits) gives a suboptimal solution. What is the optimal solution?
Disjoint Sets
In Chapter 14 we introduced the Union-Find data structure as a tool for Kruskal's minimum spanning tree algorithm. We showed the code and stated that, with path compression and union by rank, each operation runs in amortized near-constant time. In this chapter we give the data structure the thorough treatment it deserves: we motivate the problem, build up from naive solutions, add the two key optimizations one at a time, explain why the combined structure achieves its remarkable amortized bound, and survey the wide range of problems where Union-Find is the right tool.
The disjoint-set problem
Many algorithms need to maintain a collection of disjoint sets — a partition of elements into non-overlapping groups — and answer questions about which group an element belongs to. The disjoint-set (or union-find) abstract data type supports three operations:
- makeSet(x) — create a new set containing only x.
- find(x) — return the representative (canonical element) of the set containing x. Two elements are in the same set if and only if find returns the same representative for both.
- union(x, y) — merge the set containing x and the set containing y into a single set.
A sequence of n makeSet operations followed by a total of m find and union operations is called an intermixed sequence of length m + n. Our goal is a data structure that processes the entire sequence as quickly as possible.
Where disjoint sets arise
The disjoint-set problem appears in a surprising number of settings:
- Kruskal's MST algorithm (Chapter 14): determine whether adding an edge creates a cycle by checking if two vertices are already in the same component, and merge components when an edge is added.
- Dynamic connectivity: given a stream of edge insertions in an undirected graph, answer "Are vertices u and v connected?" after each insertion.
- Image processing: in connected-component labeling, pixels are grouped into regions by unioning adjacent pixels that satisfy a similarity criterion.
- Equivalence classes: in compilers, type unification during type inference is modeled as a union-find problem.
- Percolation: in physics simulations, determining whether a path exists from top to bottom of a grid is equivalent to checking whether top-row and bottom-row elements share a component.
- Least common ancestors (offline): Tarjan's offline LCA algorithm uses union-find to batch-process ancestor queries on a tree.
- Network redundancy: determining the number of connected components in a network, or detecting when a network becomes fully connected.
Naive implementations
Before introducing the optimized structure, let us consider two naive approaches. Each is fast for one operation but slow for the other, and understanding their limitations motivates the optimizations.
Array-based (quick-find)
Store an array id[] where id[x] is the representative of x's set. Two elements are in the same set if and only if they have the same id value.
- find(x) — return id[x]. This is O(1).
- union(x, y) — scan the entire array, changing every entry equal to id[x] to id[y]. This is O(n).
A sequence of n − 1 union operations (enough to merge n singletons into one set) costs Θ(n²) time. For large n, this is too slow.
Tree-based (quick-union, unoptimized)
Represent each set as a rooted tree using a parent[] array. The representative of a set is the root of its tree: the element with parent[r] = r.
- find(x) — follow parent pointers from x to the root. Time is O(d), where d is the depth of x.
- union(x, y) — set parent[find(x)] = find(y). Beyond the two find calls, this is O(1).
The problem is that trees can become arbitrarily deep. If we perform unions in an unlucky order — always attaching the larger tree beneath the smaller one's root — the tree degenerates into a chain of length n − 1, and find costs Θ(n). A sequence of m find operations then costs Θ(mn).
We need two ideas to fix this: union by rank to keep trees shallow, and path compression to flatten them over time.
Union by rank
The first optimization controls tree height by always attaching the shorter tree beneath the taller one during a union.
Each node x has a rank — an upper bound on the height of the subtree rooted at x. Initially, every node has rank 0 (it is a leaf). When we merge two trees:
- If the roots have different ranks, we attach the lower-rank root beneath the higher-rank root. The rank of the new root does not change.
- If the roots have equal rank r, we attach one beneath the other and increment the new root's rank to r + 1.
```
union(x, y):
    rootX = find(x)
    rootY = find(y)
    if rootX == rootY: return            // already same set
    if rank[rootX] < rank[rootY]:
        parent[rootX] = rootY
    else if rank[rootX] > rank[rootY]:
        parent[rootY] = rootX
    else:
        parent[rootY] = rootX
        rank[rootX] = rank[rootX] + 1
```
Why union by rank helps
Lemma. With union by rank (and no path compression), a tree whose root has rank r contains at least 2^r nodes.
Proof. By induction on the number of union operations. Initially, every node has rank 0 and its tree has 1 = 2^0 node. The rank of a root increases from r to r + 1 only when two trees whose roots both have rank r are merged. By the inductive hypothesis, each contains at least 2^r nodes, so the merged tree contains at least 2^r + 2^r = 2^(r+1) nodes.
Corollary. The maximum rank of any node is ⌊log₂ n⌋, where n is the total number of elements.
This means that find(x) follows at most O(log n) parent pointers, so each find costs O(log n). A sequence of m operations costs O(m log n) — already a major improvement over the naive Θ(mn).
Path compression
The second optimization speeds up find by making every node on the find path point directly to the root:
```
find(x):
    root = x
    while parent[root] != root:
        root = parent[root]
    // Path compression: point every node on path directly to root
    while x != root:
        next = parent[x]
        parent[x] = root
        x = next
    return root
```
After find(x) completes, every node that was between x and the root now has the root as its immediate parent. Future find operations on any of these nodes will complete in a single step.
Path compression alone (without union by rank) already achieves O(log n) amortized time per operation. But the real power comes from combining both optimizations.
A variant: path halving
An alternative to full path compression is path halving, where each node on the find path is made to skip its parent and point to its grandparent:
```
find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]    // skip to grandparent
        x = parent[x]
    return x
```
Path halving achieves the same asymptotic amortized bound as full path compression and requires only a single pass through the path (no second loop). In practice, both variants perform similarly.
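As a concrete sketch, path halving translates directly to a Map-based parent table like the one used by the implementation later in this chapter. The standalone helper below (findWithHalving is a hypothetical free function for illustration, not part of the book's class) shows the single-pass loop:

```typescript
// Sketch: path-halving find over a Map-based parent table. findWithHalving is a
// hypothetical free function for illustration, not a method of the book's class.
function findWithHalving<T>(parent: Map<T, T>, x: T): T {
  while (parent.get(x) !== x) {
    // Point x at its grandparent, then step to it, halving the path length.
    const grandparent = parent.get(parent.get(x)!)!;
    parent.set(x, grandparent);
    x = grandparent;
  }
  return x;
}

// A chain 3 → 2 → 1 → 0 (0 is the root).
const parent = new Map<number, number>([[0, 0], [1, 0], [2, 1], [3, 2]]);
console.log(findWithHalving(parent, 3)); // 0
console.log(parent.get(3)); // 1 (3 now skips directly to its former grandparent)
```

Note that the chain is shortened as a side effect of the traversal itself; no second loop over the path is needed.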
Combined complexity: the inverse Ackermann function
With both path compression and union by rank, any sequence of m makeSet, find, and union operations on n elements runs in O(m α(n)) time, where α is the inverse Ackermann function. This remarkable result was proved by Tarjan in 1975 and later tightened by Tarjan and van Leeuwen.
What is the Ackermann function?
The two-argument Ackermann function A(m, n) is defined recursively:

```
A(m, n) = n + 1                     if m = 0
A(m, n) = A(m − 1, 1)               if m > 0 and n = 0
A(m, n) = A(m − 1, A(m, n − 1))     otherwise
```

This function grows extraordinarily fast. A few values of A(n, 1):

| n | A(n, 1) |
|---|---|
| 0 | 2 |
| 1 | 3 |
| 2 | 5 |
| 3 | 13 |
| 4 | 65533 |
| 5 | 2↑↑65536 − 3 (a tower of 65536 twos) |
The value A(5, 1) is so large that it dwarfs the number of atoms in the observable universe (roughly 10^80).
The inverse Ackermann function
The inverse Ackermann function α(n) is defined as:

```
α(n) = min { k ≥ 0 : A(k, 1) ≥ n }
```

Since A grows so fast, α grows inconceivably slowly:
- α(n) = 0 for n ≤ 2
- α(n) = 1 for n = 3
- α(n) = 2 for 4 ≤ n ≤ 5
- α(n) = 3 for 6 ≤ n ≤ 13
- α(n) = 4 for 14 ≤ n ≤ 65533
- α(n) = 5 for 65534 ≤ n ≤ A(5, 1)
For any value of n that could arise in practice — or indeed in any computation on physical hardware — α(n) ≤ 4. This is why we say union-find operations run in "effectively constant" amortized time.
Intuition for the amortized bound
The formal proof uses a sophisticated potential function argument originally due to Tarjan. Here is the intuition:
- Union by rank ensures that tree heights are at most O(log n), so the "starting point" for find costs is logarithmic.
- Path compression does not change ranks, so the rank-based height bound still holds as a worst case. However, after a find operation, the compressed nodes have much shorter paths to the root.
- The key insight is that path compression "pays for itself." A find that traverses a long path is expensive, but it compresses that path, making all subsequent finds along it cheap. The total cost of m finds, amortized, is only O(m α(n)).
To formalize this, Tarjan defines a potential function based on how much "room" each node has for future compression. Each expensive find reduces the potential significantly, ensuring that the amortized cost per operation is bounded by O(α(n)).
Is this optimal?
Yes. Tarjan proved a matching lower bound: in the pointer machine model, any data structure for the disjoint-set problem requires Ω(m α(n)) time for a sequence of m operations on n elements. The union-find structure with path compression and union by rank is asymptotically optimal.
Implementation
Our TypeScript implementation uses Maps for the parent and rank tables, which allows the element type T to be any value usable as a Map key — not just integers.
```typescript
export class UnionFind<T> {
  private parent = new Map<T, T>();
  private rank = new Map<T, number>();
  private _componentCount = 0;

  makeSet(x: T): void {
    if (this.parent.has(x)) return;
    this.parent.set(x, x);
    this.rank.set(x, 0);
    this._componentCount++;
  }

  find(x: T): T {
    let root = x;
    while (this.parent.get(root) !== root) {
      root = this.parent.get(root)!;
    }
    // Path compression: point every node on path directly to root.
    let current = x;
    while (current !== root) {
      const next = this.parent.get(current)!;
      this.parent.set(current, root);
      current = next;
    }
    return root;
  }

  union(x: T, y: T): boolean {
    const rootX = this.find(x);
    const rootY = this.find(y);
    if (rootX === rootY) return false;
    const rankX = this.rank.get(rootX)!;
    const rankY = this.rank.get(rootY)!;
    if (rankX < rankY) {
      this.parent.set(rootX, rootY);
    } else if (rankX > rankY) {
      this.parent.set(rootY, rootX);
    } else {
      this.parent.set(rootY, rootX);
      this.rank.set(rootX, rankX + 1);
    }
    this._componentCount--;
    return true;
  }

  connected(x: T, y: T): boolean {
    return this.find(x) === this.find(y);
  }

  get componentCount(): number {
    return this._componentCount;
  }

  get size(): number {
    return this.parent.size;
  }
}
```
Design decisions
Generic type parameter. The UnionFind<T> class works with any element type — numbers, strings, or objects — as long as elements can be used as Map keys (i.e., identity via ===). This is more flexible than an array-based implementation that requires elements to be integer indices.
Idempotent makeSet. Calling makeSet(x) when x is already in a set is a no-op. This simplifies client code that may process elements from an unknown source.
Return value of union. The method returns true if a merge actually happened and false if the elements were already in the same set. This is useful for Kruskal's algorithm, which needs to know whether an edge was added to the MST.
Component count. The componentCount property tracks the number of disjoint sets, which is useful for dynamic connectivity queries ("How many connected components remain?").
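Putting the pieces together, here is a short usage sketch. The class is restated in condensed form (same API, fewer comments, no size getter) so the example runs standalone; the names "ada", "grace", and so on are arbitrary illustrative data:

```typescript
// Condensed restatement of the chapter's UnionFind so this sketch runs standalone.
class UnionFind<T> {
  private parent = new Map<T, T>();
  private rank = new Map<T, number>();
  private _count = 0;
  makeSet(x: T): void {
    if (this.parent.has(x)) return;
    this.parent.set(x, x);
    this.rank.set(x, 0);
    this._count++;
  }
  find(x: T): T {
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    while (x !== root) { // path compression
      const next = this.parent.get(x)!;
      this.parent.set(x, root);
      x = next;
    }
    return root;
  }
  union(x: T, y: T): boolean {
    const rx = this.find(x), ry = this.find(y);
    if (rx === ry) return false;
    const a = this.rank.get(rx)!, b = this.rank.get(ry)!;
    if (a < b) this.parent.set(rx, ry);
    else if (a > b) this.parent.set(ry, rx);
    else { this.parent.set(ry, rx); this.rank.set(rx, a + 1); }
    this._count--;
    return true;
  }
  connected(x: T, y: T): boolean { return this.find(x) === this.find(y); }
  get componentCount(): number { return this._count; }
}

// Example: grouping collaborators into connected groups.
const uf = new UnionFind<string>();
for (const name of ["ada", "grace", "alan", "edsger"]) uf.makeSet(name);
uf.union("ada", "grace");
uf.union("alan", "edsger");
console.log(uf.connected("ada", "grace")); // true
console.log(uf.connected("ada", "alan"));  // false
console.log(uf.componentCount);            // 2
```

Because the element type is generic, the same class works unchanged for vertex objects, pixel coordinates encoded as strings, or plain integers.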
Complexity summary
| Operation | Amortized Time |
|---|---|
| makeSet | O(1) |
| find | O(α(n)) |
| union | O(α(n)) |
| connected | O(α(n)) |
Space: O(n) for n elements.
Trace through an example
Let us trace through a sequence of operations on the integers 0 through 7. We show the parent array and rank array after each operation. An arrow x→y means parent[x] = y; a self-loop x→x means x is a root.
After makeSet(0) through makeSet(7):
parent: 0→0 1→1 2→2 3→3 4→4 5→5 6→6 7→7
rank: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
components: 8
Every element is its own root with rank 0.
union(0, 1): Roots 0 and 1 both have rank 0, so attach 1 under 0. Increment rank of 0.
parent: 0→0 1→0 2→2 3→3 4→4 5→5 6→6 7→7
rank: 0:1 1:0 2:0 3:0 4:0 5:0 6:0 7:0
components: 7
union(2, 3): Attach 3 under 2. Increment rank of 2.
parent: 0→0 1→0 2→2 3→2 4→4 5→5 6→6 7→7
rank: 0:1 1:0 2:1 3:0 4:0 5:0 6:0 7:0
components: 6
union(4, 5): Attach 5 under 4.
parent: 0→0 1→0 2→2 3→2 4→4 5→4 6→6 7→7
rank: 0:1 1:0 2:1 3:0 4:1 5:0 6:0 7:0
components: 5
union(6, 7): Attach 7 under 6.
parent: 0→0 1→0 2→2 3→2 4→4 5→4 6→6 7→6
rank: 0:1 1:0 2:1 3:0 4:1 5:0 6:1 7:0
components: 4
union(0, 2): Roots 0 and 2 both have rank 1. Attach 2 under 0. Increment rank of 0.
parent: 0→0 1→0 2→0 3→2 4→4 5→4 6→6 7→6
rank: 0:2 1:0 2:1 3:0 4:1 5:0 6:1 7:0
components: 3
union(4, 6): Roots 4 and 6 both have rank 1. Attach 6 under 4.
parent: 0→0 1→0 2→0 3→2 4→4 5→4 6→4 7→6
rank: 0:2 1:0 2:1 3:0 4:2 5:0 6:1 7:0
components: 2
union(0, 4): Roots 0 and 4 both have rank 2. Attach 4 under 0.
parent: 0→0 1→0 2→0 3→2 4→0 5→4 6→4 7→6
rank: 0:3 1:0 2:1 3:0 4:2 5:0 6:1 7:0
components: 1
find(7): Follow the path 7 → 6 → 4 → 0. The root is 0. Path compression sets parent[7] = 0 and parent[6] = 0 (parent[4] was already 0).
parent: 0→0 1→0 2→0 3→2 4→0 5→4 6→0 7→0
rank: (unchanged — path compression does not alter ranks)
After this find, the next call to find(7) completes in a single step.
find(3): Follow 3 → 2 → 0. Path compression sets parent[3] = 0.
parent: 0→0 1→0 2→0 3→0 4→0 5→4 6→0 7→0
Now almost every node points directly to the root. The tree is nearly flat, and future finds will be very fast.
Applications
Kruskal's minimum spanning tree
The most classic application of Union-Find is in Kruskal's algorithm (Chapter 14). The algorithm sorts edges by weight and processes them in order. For each edge (u, v):
- Call find(u) and find(v) to check whether u and v are in the same component.
- If not, call union(u, v) and add the edge to the MST.
Without Union-Find, cycle detection would require a full graph traversal for each edge, costing O(V + E) per edge and O(E(V + E)) overall. With Union-Find, the total cost of all find and union operations is O(E α(V)), which is effectively O(E).
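As an illustrative sketch of how Kruskal's algorithm drives the disjoint-set structure, the snippet below inlines a compact array-based variant (integer vertices 0..n−1, union by rank plus path halving); the five-edge graph at the bottom is a small made-up example:

```typescript
// Compact array-based Union-Find (union by rank + path halving), inlined so the
// sketch runs standalone. Behaviorally equivalent to the chapter's Map-based class.
class DSU {
  private parent: number[];
  private rank: number[];
  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
    this.rank = new Array(n).fill(0);
  }
  find(x: number): number {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }
  union(x: number, y: number): boolean {
    let rx = this.find(x);
    let ry = this.find(y);
    if (rx === ry) return false;
    if (this.rank[rx] < this.rank[ry]) [rx, ry] = [ry, rx];
    this.parent[ry] = rx;
    if (this.rank[rx] === this.rank[ry]) this.rank[rx]++;
    return true;
  }
}

type Edge = [u: number, v: number, weight: number];

function kruskal(n: number, edges: Edge[]): { weight: number; mst: Edge[] } {
  const dsu = new DSU(n);
  const mst: Edge[] = [];
  let weight = 0;
  for (const [u, v, w] of [...edges].sort((a, b) => a[2] - b[2])) {
    // union returns false when u and v are already connected; adding that
    // edge would create a cycle, so it is skipped.
    if (dsu.union(u, v)) {
      mst.push([u, v, w]);
      weight += w;
      if (mst.length === n - 1) break; // spanning tree complete
    }
  }
  return { weight, mst };
}

// A 4-cycle with one diagonal: the MST takes the three cheapest acyclic edges.
const { weight, mst } = kruskal(4, [
  [0, 1, 1], [1, 2, 2], [2, 3, 3], [3, 0, 4], [0, 2, 5],
]);
console.log(weight, mst.length); // 6 3
```

The early exit after n − 1 accepted edges is a common practical optimization; the sort dominates the running time in any case.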
Dynamic connectivity
In the dynamic connectivity problem, we process a stream of edge insertions in an undirected graph and must answer connectivity queries: "Are vertices u and v connected?"
Union-Find handles this directly: when edge (u, v) is inserted, call union(u, v). To answer a connectivity query, call connected(u, v). Each operation runs in O(α(n)) amortized time.
Note that standard Union-Find only supports incremental connectivity — edges can be added but not removed. Supporting deletions requires more sophisticated data structures (such as link-cut trees or the Euler tour tree), which are beyond the scope of this book.
Connected components in an image
In image processing, connected-component labeling groups pixels into regions. Two adjacent pixels are in the same component if they share some property (e.g., similar color).
The algorithm scans the image in raster order (left to right, top to bottom). For each pixel:
- Call makeSet for the pixel.
- Check the pixel above and to the left. If either neighbor has a similar value, call union to merge the current pixel's set with the neighbor's set.
- After scanning the entire image, each connected component corresponds to one disjoint set.
This is the standard "two-pass" connected-component labeling algorithm. Union-Find makes the second pass (resolving label equivalences) nearly linear.
Percolation
In a percolation simulation, we model a grid of cells where each cell is independently "open" with probability p or "blocked" with probability 1 − p. The question is: does an open path exist from the top row to the bottom row?
We model this with Union-Find:
- Create a "virtual top" node connected to all open cells in the top row.
- Create a "virtual bottom" node connected to all open cells in the bottom row.
- For each open cell, union it with its open neighbors.
- The system percolates if connected(virtualTop, virtualBottom) returns true.
This allows efficient simulation of percolation for many values of p, enabling Monte Carlo estimation of the percolation threshold — the critical probability above which percolation almost certainly occurs.
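The virtual-node scheme can be sketched as follows. For brevity the helpers below use plain arrays and omit union by rank (path halving only), which is fine for an illustration though not the fully optimized structure:

```typescript
// Minimal array-based union-find helpers, inlined so the sketch runs standalone.
// Union by rank is omitted for brevity; path halving still keeps paths short.
function find(parent: number[], x: number): number {
  while (parent[x] !== x) {
    parent[x] = parent[parent[x]]; // path halving
    x = parent[x];
  }
  return x;
}

function union(parent: number[], x: number, y: number): void {
  const rx = find(parent, x);
  const ry = find(parent, y);
  if (rx !== ry) parent[ry] = rx;
}

// Does an open path exist from the top row to the bottom row of an n×n grid?
function percolates(open: boolean[][]): boolean {
  const n = open.length;
  const top = n * n;        // virtual node joined to every open top-row cell
  const bottom = n * n + 1; // virtual node joined to every open bottom-row cell
  const parent = Array.from({ length: n * n + 2 }, (_, i) => i);
  const id = (r: number, c: number) => r * n + c;
  for (let r = 0; r < n; r++) {
    for (let c = 0; c < n; c++) {
      if (!open[r][c]) continue;
      if (r === 0) union(parent, top, id(r, c));
      if (r === n - 1) union(parent, bottom, id(r, c));
      // Union with already-scanned open neighbors (above and to the left).
      if (r > 0 && open[r - 1][c]) union(parent, id(r, c), id(r - 1, c));
      if (c > 0 && open[r][c - 1]) union(parent, id(r, c), id(r, c - 1));
    }
  }
  return find(parent, top) === find(parent, bottom);
}

console.log(percolates([[true, false], [true, true]]));  // true
console.log(percolates([[true, false], [false, true]])); // false
```

The two virtual nodes reduce "does any top cell connect to any bottom cell?" to a single connectivity query, avoiding a quadratic number of pairwise checks.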
Union by rank vs. union by size
An alternative to union by rank is union by size, which attaches the tree with fewer nodes beneath the tree with more nodes. Both strategies achieve O(log n) height without path compression and O(α(n)) amortized time per operation with path compression. The choice between them is largely a matter of taste:
- Union by rank is slightly simpler because rank is a single integer that only increases, never decreases, and is never affected by path compression.
- Union by size provides additional information: after the union, the root's size equals the total number of elements in the merged set. This is useful when you need to know component sizes.
Our implementation uses union by rank, following the approach in CLRS.
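For comparison, a union-by-size variant might look like the sketch below (array-based and hypothetical, not the book's class); note how the root's size entry always holds the size of its component:

```typescript
// Sketch: union by size. size[] replaces rank[]; the smaller tree is always
// attached beneath the larger, and the root's size entry tracks the component.
class DSUBySize {
  private parent: number[];
  private size: number[];
  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
    this.size = new Array(n).fill(1);
  }
  find(x: number): number {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }
  union(x: number, y: number): boolean {
    let rx = this.find(x);
    let ry = this.find(y);
    if (rx === ry) return false;
    if (this.size[rx] < this.size[ry]) [rx, ry] = [ry, rx];
    this.parent[ry] = rx;
    this.size[rx] += this.size[ry]; // root now knows the merged component's size
    return true;
  }
  componentSize(x: number): number {
    return this.size[this.find(x)];
  }
}

const s = new DSUBySize(5);
s.union(0, 1);
s.union(1, 2);
console.log(s.componentSize(2)); // 3
console.log(s.componentSize(4)); // 1
```

The componentSize query is the practical payoff of this variant; with union by rank, recovering component sizes would require a separate counter per root.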
Summary
The disjoint-set (Union-Find) data structure maintains a partition of elements into disjoint sets, supporting makeSet, find, and union operations. Naive implementations achieve at best O(log n) per operation (with union by rank alone) or O(n) per operation in the worst case (without any optimizations).
Union by rank keeps trees shallow by always attaching the shorter tree beneath the taller one, guaranteeing a maximum height of O(log n).
Path compression flattens trees during find operations by pointing every traversed node directly at the root, making subsequent finds faster.
Together, union by rank and path compression achieve O(α(n)) amortized time per operation, where α is the inverse Ackermann function — a function so slow-growing that α(n) ≤ 4 for any practically conceivable input size. This bound is optimal: no pointer-based data structure can do better.
Union-Find is a fundamental building block in algorithm design. Its primary application is Kruskal's MST algorithm (Chapter 14), where it provides efficient cycle detection. It also appears in dynamic connectivity, image processing, percolation, type unification in compilers, and many other settings. In Chapter 22, we will see Union-Find used again in approximation algorithms for NP-hard problems.
Exercises
Exercise 18.1. Starting from eight singleton sets {0}, {1}, ..., {7}, perform the following operations using union by rank and path compression. Draw the forest after each operation and show how path compression modifies the tree structure.
union(0, 1), union(2, 3), union(0, 2),
union(4, 5), union(6, 7), union(4, 6),
union(0, 4), find(7), find(3), find(5)
Exercise 18.2. Prove that with union by rank (without path compression), the rank of any root is at most ⌊log₂ n⌋. (Hint: prove that a tree with root rank r has at least 2^r nodes, by induction on the number of union operations.)
Exercise 18.3. Consider implementing union-find with path compression but without union by rank (i.e., always attaching the second root under the first, regardless of tree heights). What is the amortized time complexity per operation? Is it still O(α(n))?
Exercise 18.4. Describe how to use Union-Find to detect whether an undirected graph has a cycle. Process the edges one by one; what condition indicates a cycle? Analyze the time complexity.
Exercise 18.5. A social network has n users. Friendships arrive as a stream of pairs (a, b). You want to determine the exact moment when all n users become connected (directly or transitively). Describe an algorithm using Union-Find and analyze its complexity.
(Hint: maintain a component count and check when it reaches 1.)
Tries and String Data Structures
The data structures we have studied so far — hash tables, balanced search trees, heaps — work well when keys are atomic values that can be compared or hashed in constant time. But many applications deal with string keys: dictionaries, autocomplete systems, IP routing tables, spell checkers, DNA sequence databases. For these, a data structure that exploits the character-by-character structure of keys can be far more efficient. The trie (from retrieval) is such a structure. In this chapter we develop the standard trie, optimize it into a compressed trie (radix tree) that eliminates wasted space, survey applications, and briefly introduce suffix arrays for substring search.
The trie (prefix tree)
Motivation
Consider storing a dictionary of n words, where the total number of characters across all words is N, and answering these queries:
- Lookup: Is a given word in the dictionary?
- Prefix search: Are there any words starting with a given prefix?
- Autocomplete: List all words starting with a given prefix.
A hash table answers lookup in expected O(L) time, where L is the length of the query word (we must hash the entire word). But it cannot answer prefix queries without scanning every stored word. A balanced BST stores words in sorted order and can answer prefix queries via range searches, but each comparison costs O(L), so lookup costs O(L log n).
A trie answers all three queries in O(L) time — proportional to the length of the query, independent of the number of stored words. The key insight is that the trie avoids comparing entire keys; instead, it inspects one character at a time.
Structure
A trie (also called a prefix tree) is a rooted tree where:
- Each edge is labeled with a single character from the alphabet Σ.
- Each node has at most σ = |Σ| children (one per character).
- A node may be marked as an end-of-word node, indicating that the path from the root to that node spells a complete word.
- The root represents the empty prefix.
The crucial property is prefix sharing: words that share a common prefix share the same path from the root. For example, "app", "apple", and "application" all share the path a → p → p.
Operations
Insert(word). Starting from the root, follow (or create) the edge labeled with each character of the word. Mark the final node as an end-of-word.
Search(word). Starting from the root, follow the edge labeled with each character. If at any point the required edge does not exist, the word is not in the trie. If we reach the end of the word, check whether the current node is marked as an end-of-word.
StartsWith(prefix). Like search, but we do not require the final node to be an end-of-word. If we can follow all characters of the prefix, at least one stored word has that prefix.
Delete(word). First verify the word exists. Then unmark the end-of-word flag. If the node has no children and is not an end-of-word for another word, remove it. Propagate this cleanup upward: if a parent becomes childless and is not itself an end-of-word, remove it too. This ensures the trie does not retain unnecessary nodes.
Autocomplete(prefix, limit). Navigate to the node corresponding to the prefix, then collect all words in the subtree (via DFS), stopping after limit results.
Complexity analysis
Let L be the length of the key being operated on, and σ be the alphabet size.
| Operation | Time |
|---|---|
| insert | O(L) |
| search | O(L) |
| startsWith | O(L) |
| delete | O(L) |
| autocomplete | O(L + k), where k is the output size |
Space. In the worst case a trie stores one node per character of every stored word, for O(N) nodes, where N is the total length of all words. Each node stores up to σ child pointers, so the total space is O(Nσ). In practice, prefix sharing reduces the number of nodes significantly, especially when the stored words share many common prefixes.
When σ is small (e.g., the DNA alphabet with 4 characters) or when using a hash map for child storage instead of a fixed-size array, the space is close to O(N).
Implementation
Our implementation uses a Map<string, TrieNode> for each node's children, which supports arbitrary alphabets and avoids wasting space on unused child slots:
```typescript
export class TrieNode {
  readonly children = new Map<string, TrieNode>();
  isEnd = false;
}

export class Trie {
  private readonly root = new TrieNode();
  private _size = 0;

  get size(): number {
    return this._size;
  }

  insert(word: string): void {
    let node = this.root;
    for (const ch of word) {
      let child = node.children.get(ch);
      if (child === undefined) {
        child = new TrieNode();
        node.children.set(ch, child);
      }
      node = child;
    }
    if (!node.isEnd) {
      node.isEnd = true;
      this._size++;
    }
  }

  search(word: string): boolean {
    const node = this.findNode(word);
    return node !== null && node.isEnd;
  }

  startsWith(prefix: string): boolean {
    return this.findNode(prefix) !== null;
  }

  private findNode(key: string): TrieNode | null {
    let node: TrieNode = this.root;
    for (const ch of key) {
      const child = node.children.get(ch);
      if (child === undefined) return null;
      node = child;
    }
    return node;
  }
}
```
Insert iterates character by character, creating child nodes as needed. Each character lookup in the Map is expected O(1) time, so the total is O(L).
Search and startsWith both call findNode, which walks the trie following the key's characters. The difference is that search additionally checks the isEnd flag.
Delete is more involved because we must clean up nodes that are no longer needed:
```typescript
delete(word: string): boolean {
  if (!this.search(word)) return false;
  this.deleteHelper(this.root, word, 0);
  this._size--;
  return true;
}

private deleteHelper(node: TrieNode, word: string, depth: number): boolean {
  if (depth === word.length) {
    node.isEnd = false;
    return node.children.size === 0;
  }
  const ch = word[depth]!;
  const child = node.children.get(ch);
  if (child === undefined) return false;
  const shouldDeleteChild = this.deleteHelper(child, word, depth + 1);
  if (shouldDeleteChild) {
    node.children.delete(ch);
    return node.children.size === 0 && !node.isEnd;
  }
  return false;
}
```
The deleteHelper returns true when a node should be removed (it has no children and is not an end-of-word). This propagates up the recursion, cleaning the path.
Autocomplete navigates to the prefix node and then performs a DFS to collect all words in the subtree:
```typescript
autocomplete(prefix: string, limit = Infinity): string[] {
  const node = this.findNode(prefix);
  if (node === null) return [];
  const results: string[] = [];
  this.collectWords(node, prefix, results, limit);
  return results;
}

private collectWords(
  node: TrieNode,
  prefix: string,
  results: string[],
  limit: number,
): void {
  if (results.length >= limit) return;
  if (node.isEnd) {
    results.push(prefix);
    if (results.length >= limit) return;
  }
  const sortedKeys = [...node.children.keys()].sort();
  for (const ch of sortedKeys) {
    this.collectWords(node.children.get(ch)!, prefix + ch, results, limit);
    if (results.length >= limit) return;
  }
}
```
By iterating children in sorted order, we produce results in lexicographic order.
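A short usage session ties the operations together. The Trie below is a condensed restatement of this chapter's implementation (no delete, no size tracking, no limit parameter) so the example runs standalone:

```typescript
// Condensed Trie (insert / search / startsWith / autocomplete), restating the
// chapter's implementation so the example runs standalone.
class TrieNode {
  readonly children = new Map<string, TrieNode>();
  isEnd = false;
}

class Trie {
  private root = new TrieNode();
  insert(word: string): void {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch)!;
    }
    node.isEnd = true;
  }
  private findNode(key: string): TrieNode | null {
    let node = this.root;
    for (const ch of key) {
      const child = node.children.get(ch);
      if (!child) return null;
      node = child;
    }
    return node;
  }
  search(word: string): boolean {
    const n = this.findNode(word);
    return n !== null && n.isEnd;
  }
  startsWith(prefix: string): boolean {
    return this.findNode(prefix) !== null;
  }
  autocomplete(prefix: string): string[] {
    const start = this.findNode(prefix);
    const out: string[] = [];
    const dfs = (node: TrieNode, word: string): void => {
      if (node.isEnd) out.push(word);
      for (const ch of [...node.children.keys()].sort()) {
        dfs(node.children.get(ch)!, word + ch);
      }
    };
    if (start) dfs(start, prefix);
    return out;
  }
}

const trie = new Trie();
for (const w of ["app", "apple", "apply", "banana"]) trie.insert(w);
console.log(trie.search("app"));       // true
console.log(trie.search("appl"));      // false ("appl" is only a prefix)
console.log(trie.startsWith("appl"));  // true
console.log(trie.autocomplete("app")); // ["app", "apple", "apply"]
```

Note the distinction exercised here: "appl" fails search (no end-of-word mark) but succeeds startsWith, since stored words continue beyond it.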
Trace through an example
Let us insert the words "app", "apple", "apply", and "banana" into an initially empty trie.
After inserting "app":
```
(root)
└─ a
   └─ p
      └─ p*
```
An asterisk (*) marks end-of-word nodes.
After inserting "apple":
```
(root)
└─ a
   └─ p
      └─ p*
         └─ l
            └─ e*
```
The path a → p → p is shared. The new characters l → e extend from the existing "app" node.
After inserting "apply":
```
(root)
└─ a
   └─ p
      └─ p*
         └─ l
            ├─ e*
            └─ y*
```
The node for l now has two children: e (for "apple") and y (for "apply").
After inserting "banana":
```
(root)
├─ a
│  └─ p
│     └─ p*
│        └─ l
│           ├─ e*
│           └─ y*
└─ b
   └─ a
      └─ n
         └─ a
            └─ n
               └─ a*
```
Now autocomplete("app") returns ["app", "apple", "apply"] — the word "app" itself plus all words in its subtree.
Compressed tries (radix trees)
The problem with standard tries
In a standard trie, a chain of nodes with a single child wastes space. Consider storing only the word "internationalization" in a trie: it requires 20 nodes, each with exactly one child, plus the root. This is 21 nodes for a single word.
More generally, if the stored words have long unique suffixes, the trie degenerates into long chains. These chains use space per character but create many nodes, each carrying a child map overhead.
Compressing single-child chains
A compressed trie (also called a radix tree or Patricia tree) eliminates single-child chains by storing an entire substring on each edge rather than a single character. The rule is:
Every internal node (except the root) has at least two children.
If a node has exactly one child and is not an end-of-word, it is merged with that child by concatenating their edge labels.
For example, the standard trie for {"romane", "romanus", "romulus", "rubens", "ruber", "rubicon", "rubicundus"} has many single-child chains. The compressed trie looks like:
```
(root)
└─ "r"
   ├─ "om"
   │  ├─ "an"
   │  │  ├─ "e"*
   │  │  └─ "us"*
   │  └─ "ulus"*
   └─ "ub"
      ├─ "e"
      │  ├─ "ns"*
      │  └─ "r"*
      └─ "ic"
         ├─ "on"*
         └─ "undus"*
```
Instead of one node per character, each edge carries a substring. The total number of nodes is bounded by O(n), where n is the number of stored words (at most n leaves, at most n − 1 internal branching nodes, plus the root).
Operations
The operations are conceptually the same as for a standard trie, but each step may match multiple characters at once:
Insert(word). Navigate the trie, matching the word against edge labels. There are three cases:
- No matching child. Create a new leaf node with the remaining suffix as its label.
- Edge label is a prefix of the remaining word. Recurse into the child with the rest of the word.
- Edge label and remaining word diverge. Split the edge: create a new internal node at the divergence point, move the existing child beneath it with a shortened label, and create a new leaf for the remaining suffix.
Search(word). Navigate the trie, matching edge labels character by character. The word is found only if we arrive at a node boundary (not in the middle of an edge label) and the node is marked as an end-of-word.
StartsWith(prefix). Like search, but the prefix may end in the middle of an edge label — this is acceptable because the label continues with characters that extend the prefix.
Delete(word). Find and unmark the node. If it becomes a leaf, remove it. If its parent now has only one child and is not an end-of-word, merge the parent with its child by concatenating labels. This maintains the compressed trie invariant.
Complexity
| Operation | Time |
|---|---|
| insert | O(L) |
| search | O(L) |
| startsWith | O(L) |
| delete | O(L) |
| autocomplete | O(L + k), where k is the output size |
Space. The number of nodes is O(n), where n is the number of stored words — a major improvement over the standard trie's O(N) nodes. However, each node stores a substring label, and the total length of all labels is O(N). So total space is still O(N) in terms of characters stored, but with far fewer node objects.
Implementation
The key difference from a standard trie is the split operation during insertion. When an edge label and the remaining word diverge at some position, we must create a new branching node:
```typescript
export class CompressedTrieNode {
  readonly children = new Map<string, CompressedTrieNode>();
  label: string;
  isEnd = false;

  constructor(label: string) {
    this.label = label;
  }
}
```
Each child in the map is keyed by the first character of its label. This allows lookup of the correct child for the next character in the key.
The insert helper handles the three cases:
```typescript
private insertHelper(node: CompressedTrieNode, remaining: string): void {
  const firstChar = remaining[0]!;
  const child = node.children.get(firstChar);
  if (child === undefined) {
    // Case 1: no matching child — create a new leaf
    const newNode = new CompressedTrieNode(remaining);
    newNode.isEnd = true;
    node.children.set(firstChar, newNode);
    this._size++;
    return;
  }
  const commonLen = commonPrefixLength(child.label, remaining);
  if (commonLen === child.label.length && commonLen === remaining.length) {
    // Exact match with existing node
    if (!child.isEnd) {
      child.isEnd = true;
      this._size++;
    }
    return;
  }
  if (commonLen === child.label.length) {
    // Case 2: child label is a prefix of remaining — recurse
    this.insertHelper(child, remaining.slice(commonLen));
    return;
  }
  // Case 3: split — labels diverge at position commonLen
  const splitNode = new CompressedTrieNode(child.label.slice(0, commonLen));
  node.children.set(firstChar, splitNode);
  // Move existing child beneath the split node
  child.label = child.label.slice(commonLen);
  splitNode.children.set(child.label[0]!, child);
  if (commonLen === remaining.length) {
    splitNode.isEnd = true;
    this._size++;
  } else {
    const newLeaf = new CompressedTrieNode(remaining.slice(commonLen));
    newLeaf.isEnd = true;
    splitNode.children.set(newLeaf.label[0]!, newLeaf);
    this._size++;
  }
}
```
Search must check that the word ends exactly at a node boundary — not partway through an edge label:
```typescript
private findExactNode(
  node: CompressedTrieNode,
  key: string,
): CompressedTrieNode | null {
  let offset = 0;
  for (;;) {
    if (offset === key.length) return node;
    const child = node.children.get(key[offset]!);
    if (child === undefined) return null;
    const label = child.label;
    const remaining = key.length - offset;
    if (remaining < label.length) {
      // Key ends within this edge's label — not an exact match
      return null;
    }
    if (key.slice(offset, offset + label.length) !== label) {
      return null;
    }
    offset += label.length;
    node = child;
  }
}
```
Delete must maintain the compression invariant by merging nodes when appropriate:
```typescript
private mergeWithChild(
  parent: CompressedTrieNode,
  key: string,
  node: CompressedTrieNode,
): void {
  if (node.children.size !== 1 || node.isEnd) return;
  const entry = [...node.children.entries()][0]!;
  const onlyChild = entry[1];
  onlyChild.label = node.label + onlyChild.label;
  parent.children.set(key, onlyChild);
}
```
When a node loses its end-of-word flag (or a child is deleted) and has exactly one remaining child, we merge the node with that child by concatenating their labels and removing the intermediate node.
Design decisions
Map-based children, keyed by first character. Each child's label starts with a unique character (since we split on divergence), so the first character serves as a unique key. This gives O(1) expected child lookup.
Separate findExactNode and findNodeForPrefix. Search requires an exact match at a node boundary, while startsWith and autocomplete allow partial matches within an edge label. We use two different navigation methods to handle these semantics correctly.
Node count tracking. The nodeCount() method allows testing that the trie is properly compressed — for instance, a single word should result in exactly 2 nodes (root + one leaf), not one node per character.
Standard trie vs. compressed trie
| Property | Standard trie | Compressed trie |
|---|---|---|
| Nodes | O(N) | O(n) |
| Space (total) | O(N) | O(n) nodes holding O(N) label characters |
| Lookup time | O(m) | O(m) |
| Insert time | O(m) | O(m) |
| Implementation | Simpler | More complex (splitting/merging) |
| Best for | Small alphabets, many short words | Long words, shared prefixes |
Where N = total characters across all words, n = number of words, m = query length, and σ = alphabet size (each node's child map holds at most σ entries).
For most practical applications the compressed trie is preferred because it uses O(n) nodes regardless of word length, and its operations have the same O(m) asymptotic time complexity as the standard trie.
Applications
Autocomplete and search suggestions
The most visible application of tries is autocomplete. When a user types a prefix in a search box, the system queries a trie to find all stored strings matching that prefix. The trie's structure makes this natural: navigate to the prefix node in O(m) time (for a prefix of length m), then enumerate the subtree.
In practice, autocomplete systems augment the trie with frequency counts or ranking scores at each end-of-word node, so the most popular completions are returned first.
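As a sketch of this augmentation — the FreqTrie class and topK method are illustrative names, not part of this chapter's implementation — a standard (uncompressed) trie can store a frequency at each end-of-word node and rank completions by collecting the subtree under the prefix:

```typescript
// Illustrative sketch: a minimal trie with per-word frequencies.
// Names (FreqTrie, topK) are ours, not the chapter's API.
class FreqTrieNode {
  children = new Map<string, FreqTrieNode>();
  freq = 0; // > 0 marks the end of a stored word
}

export class FreqTrie {
  private root = new FreqTrieNode();

  insert(word: string, freq: number): void {
    let node = this.root;
    for (const ch of word) {
      let next = node.children.get(ch);
      if (!next) {
        next = new FreqTrieNode();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.freq = freq;
  }

  /** Return up to k completions of `prefix`, most frequent first. */
  topK(prefix: string, k: number): string[] {
    // Navigate to the prefix node in O(|prefix|).
    let node = this.root;
    for (const ch of prefix) {
      const next = node.children.get(ch);
      if (!next) return [];
      node = next;
    }
    // Collect all completions in the subtree, then rank by frequency.
    const found: Array<[string, number]> = [];
    const dfs = (n: FreqTrieNode, word: string): void => {
      if (n.freq > 0) found.push([word, n.freq]);
      for (const [ch, child] of n.children) dfs(child, word + ch);
    };
    dfs(node, prefix);
    return found
      .sort((a, b) => b[1] - a[1])
      .slice(0, k)
      .map(([w]) => w);
  }
}
```

A production system would precompute and cache the top completions at each node instead of sorting the whole subtree on every query.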
Spell checking
A trie can serve as the dictionary for a spell checker. Given a misspelled word, we can:
- Edit-distance search: enumerate all words within edit distance 1 or 2 by performing DFS on the trie while tracking allowed edits (insertions, deletions, substitutions). This is far more efficient than computing edit distance against every dictionary word.
- Prefix validation: as the user types, highlight prefixes that cannot lead to any valid word (the trie returns startsWith(prefix) = false).
IP routing (longest prefix match)
Internet routers must match an incoming IP address against a routing table to determine the next hop. The routing table contains prefixes of various lengths, and the router must find the longest matching prefix. A trie indexed on the bits of the IP address solves this efficiently: navigate the trie bit by bit, keeping track of the last end-of-word node encountered. This is the standard data structure in router implementations.
Compressed tries (specifically, the Patricia tree variant) are particularly well-suited here because IP prefixes tend to be long and share common leading bits.
T9 predictive text
The T9 system for numeric keypads maps each key to several letters (2 → {a, b, c}, 3 → {d, e, f}, etc.). Given a sequence of key presses, T9 must find all dictionary words that match. A trie indexed by the key mappings rather than the letters themselves allows efficient lookup.
Bioinformatics
DNA sequences over the alphabet {A, C, G, T} are naturally stored in tries with branching factor 4. Suffix tries (discussed below) enable fast substring search in genomic databases.
Suffix arrays (conceptual overview)
While tries excel at prefix queries, many applications require substring search: given a text T of length n, preprocess it so that queries "Does pattern P appear in T?" can be answered quickly.
A suffix array is a sorted array of all suffixes of T, represented by their starting positions. For example, for T = "banana":
| Index | Suffix |
|---|---|
| 5 | "a" |
| 3 | "ana" |
| 1 | "anana" |
| 0 | "banana" |
| 4 | "na" |
| 2 | "nana" |
Since the array is sorted, we can binary-search for any pattern in O(m log n) time, where m is the pattern length. With an auxiliary LCP array (longest common prefix between consecutive suffixes), this can be improved to O(m + log n).
Construction. A suffix array can be built in O(n) time using the SA-IS algorithm or in O(n log n) time using simpler prefix-doubling approaches. The space is O(n) — just an array of n integers.
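As an illustration only — efficient construction is left aside here — the following hedged sketch builds the suffix array by directly sorting the suffixes (an O(n² log n) method) and performs the binary search described above. Both function names are ours:

```typescript
// Naive suffix-array sketch: sort suffix starting positions by the
// suffixes they denote. Illustration only — a production build would
// use SA-IS or prefix doubling.
export function buildSuffixArray(text: string): number[] {
  const indices = Array.from({ length: text.length }, (_, i) => i);
  // Distinct suffixes, so the comparator never sees equal strings.
  indices.sort((a, b) => (text.slice(a) < text.slice(b) ? -1 : 1));
  return indices;
}

// Binary search for the leftmost suffix that starts with `pattern`.
export function containsPattern(
  text: string,
  sa: number[],
  pattern: string,
): boolean {
  let lo = 0;
  let hi = sa.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    // Compare only the first |pattern| characters of the suffix.
    const prefix = text.slice(sa[mid]!, sa[mid]! + pattern.length);
    if (prefix < pattern) lo = mid + 1;
    else hi = mid;
  }
  return lo < sa.length && text.startsWith(pattern, sa[lo]!);
}
```

On "banana" this reproduces the table above: the suffix array is [5, 3, 1, 0, 4, 2].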
Relation to suffix trees. A suffix tree is a compressed trie of all suffixes of T. It supports O(m) substring queries (faster than suffix arrays without LCP) but uses significantly more space — typically 10-20 times the size of the text. Suffix arrays are the preferred choice in practice due to their compact representation and cache-friendly access patterns.
We do not implement suffix arrays in this chapter, as their construction algorithms are more specialized. The key takeaway is that the trie concept extends naturally to substring search when applied to suffixes.
Summary
A trie (prefix tree) is a tree-based data structure that stores strings by their character-by-character structure. Each path from the root to an end-of-word node represents a stored string, and strings that share a common prefix share the same initial path. This yields O(m) lookup, insertion, and deletion, where m is the key length — independent of the number of stored strings.
A compressed trie (radix tree) optimizes the standard trie by collapsing chains of single-child nodes into single edges labeled with substrings. This reduces the node count from O(N) to O(n), where N is the total length of all stored strings and n is the number of strings. The time complexity of all operations remains O(m).
Tries are the natural choice for problems involving prefix queries: autocomplete, spell checking, IP routing, and predictive text. For substring queries, the trie concept extends to suffix trees and suffix arrays, which preprocess a text to enable fast pattern matching.
The trie is one of the most elegant examples of a data structure designed around the structure of the data it stores. Rather than treating keys as opaque objects to be compared or hashed, it decomposes keys into their constituent characters and exploits shared structure. This principle — designing data structures that respect the internal structure of their keys — is a powerful idea that appears throughout Computer Science.
Exercises
Exercise 19.1. Insert the words "bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop" into an empty trie. Draw the resulting trie and count the total number of nodes (including the root). Then repeat the exercise with a compressed trie and compare the node counts.
Exercise 19.2. A standard trie over an alphabet of size σ with n stored words has at most N + 1 nodes (where N is the total number of characters). Prove that a compressed trie has at most 2n nodes. (Hint: every internal node except the root has at least two children, and there are exactly n leaves.)
Exercise 19.3. Modify the Trie class to support wildcard search: search("b.ll") should match "ball", "bell", "bill", "bull", etc., where . matches any single character. What is the time complexity of your solution?
Exercise 19.4. You are designing an autocomplete system for a search engine. Each query has an associated frequency count. Describe how to modify the trie to return the top-k most frequent completions of a prefix efficiently. What data would you store at each node? What is the time complexity?
(Hint: consider storing the top-k completions at each node, or augmenting the trie with a priority queue.)
Exercise 19.5. An IP routing table contains the following prefixes (in binary): "0", "01", "011", "1", "10", "100", "1000". Build a compressed trie for these prefixes. Given the IP address "10010110" (in binary), trace the longest-prefix-match lookup and identify which prefix matches.
String Matching
Given a text T of length n and a pattern P of length m, find all positions in T where P occurs. This deceptively simple problem — searching for a word in a document, a DNA motif in a genome, a keyword in a log file — is one of the most fundamental in Computer Science. In this chapter we develop three algorithms of increasing sophistication: the naive brute-force approach, the Rabin-Karp algorithm based on rolling hashes, and the Knuth-Morris-Pratt (KMP) algorithm based on the failure function. Each illustrates a different strategy for avoiding redundant comparisons.
The pattern matching problem
Input. A text string T of length n and a pattern string P of length m, where m ≤ n.
Output. All indices i with 0 ≤ i ≤ n − m such that T[i..i+m−1] = P, i.e., all positions where the pattern occurs in the text.
We call each such i a valid shift. A shift is invalid if T[i..i+m−1] ≠ P.
There are n − m + 1 possible shifts to check (i = 0, 1, ..., n − m). The challenge is to avoid checking each one character by character from scratch. The three algorithms in this chapter differ in how they eliminate invalid shifts:
| Algorithm | Strategy | Time (worst) | Time (expected) | Space |
|---|---|---|---|---|
| Naive | Check every shift from scratch | O(nm) | O(n) | O(1) |
| Rabin-Karp | Use hashing to filter shifts | O(nm) | O(n + m) | O(1) |
| KMP | Use a failure function to skip shifts | O(n + m) | O(n + m) | O(m) |
Naive string matching
The simplest approach: for each possible starting position i in the text, compare the pattern against T[i..i+m−1] character by character. If all m characters match, record i as a valid shift. If any character fails to match, move to position i + 1 and start over.
Algorithm
NAIVE-MATCH(T, P):
n ← length(T)
m ← length(P)
for i ← 0 to n − m:
j ← 0
while j < m and T[i + j] = P[j]:
j ← j + 1
if j = m:
report match at position i
Trace through an example
Consider T = aabaabaac and P = aabac. We have n = 9 and m = 5.
| Shift | Comparison | Result |
|---|---|---|
| 0 | aabaa vs aabac | Mismatch at j = 4 (a ≠ c) |
| 1 | abaab vs aabac | Mismatch at j = 1 (b ≠ a) |
| 2 | baaba vs aabac | Mismatch at j = 0 (b ≠ a) |
| 3 | aabaa vs aabac | Mismatch at j = 4 (a ≠ c) |
| 4 | abaac vs aabac | Mismatch at j = 1 (b ≠ a) |
No match is found. Notice that at shift 0 we successfully matched four characters before failing, yet at shift 1 we start the comparison entirely from scratch — discarding all information gained from the previous attempt. The algorithms that follow exploit this wasted information.
Implementation
export function naiveMatch(text: string, pattern: string): number[] {
const n = text.length;
const m = pattern.length;
const result: number[] = [];
if (m === 0) return result;
if (m > n) return result;
for (let i = 0; i <= n - m; i++) {
let j = 0;
while (j < m && text[i + j] === pattern[j]) {
j++;
}
if (j === m) {
result.push(i);
}
}
return result;
}
Complexity analysis
The outer loop runs n − m + 1 times. In the worst case, the inner loop performs m comparisons before discovering a mismatch (e.g., T = aaa...a and P = aa...ab). The total number of character comparisons is therefore O((n − m + 1) · m) = O(nm).
Best case. If the first character of the pattern rarely appears in the text, most shifts are eliminated after a single comparison, giving O(n) total comparisons in practice.
Average case. For random text over an alphabet of size σ, the expected number of comparisons per shift is O(1) (a geometric series), so the expected total is O(n). But for small alphabets (e.g., binary) or structured text (e.g., DNA), the worst case is more likely.
Space. O(1) beyond the output array. No preprocessing is needed.
Rabin-Karp string matching
The Rabin-Karp algorithm avoids re-examining every character at every shift by using hashing. The idea: compute a hash of the pattern and a hash of each text window of length . If the hashes differ, the window cannot match and we skip it without comparing characters. If the hashes match, we verify character by character to eliminate false positives (hash collisions).
The key insight is that the hash of the next window can be computed from the hash of the current window in O(1) time using a rolling hash. This makes the overall hash computation O(n) rather than O(nm).
Rolling hash
We treat each string of length m as a number in base d (where d is the alphabet size) and take the result modulo a prime q:

hash(T[i..i+m−1]) = (T[i] · d^(m−1) + T[i+1] · d^(m−2) + ... + T[i+m−1]) mod q

When we slide the window one position to the right, the new hash is:

hash(T[i+1..i+m]) = (d · (hash(T[i..i+m−1]) − T[i] · d^(m−1)) + T[i+m]) mod q

This recurrence removes the contribution of the leftmost character T[i] and adds the new rightmost character T[i+m]. The value h = d^(m−1) mod q is a constant that we precompute once.
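A quick numeric check of the recurrence, using base d = 10 so that a window's hash is simply its decimal value (no modulus, to keep the numbers readable; this is an illustration, not the chapter's implementation):

```typescript
// Rolling-hash recurrence demonstrated on decimal digits with d = 10
// and no modulus, so each window's hash equals its numeric value.
const d = 10;
const digits = [3, 1, 4, 1, 5, 9, 2, 6];
const m = 4;
const h = d ** (m - 1); // weight of the leftmost digit: 1000

// Initial window hash for digits[0..3]: 3141
let hash = 0;
for (let j = 0; j < m; j++) hash = hash * d + digits[j]!;

// Roll from window [0..3] to [1..4]: drop the 3, bring in the 5.
// 10 * (3141 - 3 * 1000) + 5 = 1415
hash = d * (hash - digits[0]! * h) + digits[m]!;
console.log(hash); // → 1415
```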
Algorithm
RABIN-KARP(T, P):
n ← length(T)
m ← length(P)
d ← 256 // alphabet size
q ← 1000000007 // large prime
h ← d^(m−1) mod q // precomputed weight
// Initial hashes
patternHash ← 0
windowHash ← 0
for j ← 0 to m − 1:
patternHash ← (patternHash · d + P[j]) mod q
windowHash ← (windowHash · d + T[j]) mod q
// Slide the window
for i ← 0 to n − m:
if windowHash = patternHash:
if T[i..i+m−1] = P: // verify to eliminate collisions
report match at position i
if i < n − m:
windowHash ← (d · (windowHash − T[i] · h) + T[i+m]) mod q
if windowHash < 0:
windowHash ← windowHash + q
Trace through an example
Consider T = 31415926 and P = 1592. Using d = 10 and q = 13 for illustration, the pattern hash is 1592 mod 13 = 6:
| Shift | Window | Hash | Match? |
|---|---|---|---|
| 0 | 3141 | 3141 mod 13 = 8 | No |
| 1 | 1415 | roll → 11 | No |
| 2 | 4159 | roll → 12 | No |
| 3 | 1592 | roll → 6 | Hash match! Verify: 1592 = 1592. Match at i = 3. |
| 4 | 5926 | roll → 11 | No |
Implementation
export function rabinKarp(text: string, pattern: string): number[] {
const n = text.length;
const m = pattern.length;
const result: number[] = [];
if (m === 0) return result;
if (m > n) return result;
const d = 256; // alphabet size (extended ASCII)
const q = 1_000_000_007; // prime modulus
// Precompute d^(m-1) mod q
let h = 1;
for (let i = 0; i < m - 1; i++) {
h = (h * d) % q;
}
// Initial hash values
let patternHash = 0;
let windowHash = 0;
for (let i = 0; i < m; i++) {
patternHash = (patternHash * d + pattern.charCodeAt(i)) % q;
windowHash = (windowHash * d + text.charCodeAt(i)) % q;
}
// Slide the pattern across the text
for (let i = 0; i <= n - m; i++) {
if (windowHash === patternHash) {
let match = true;
for (let j = 0; j < m; j++) {
if (text[i + j] !== pattern[j]) {
match = false;
break;
}
}
if (match) {
result.push(i);
}
}
if (i < n - m) {
windowHash =
((windowHash - text.charCodeAt(i) * h) * d +
text.charCodeAt(i + m)) % q;
if (windowHash < 0) {
windowHash += q;
}
}
}
return result;
}
Complexity analysis
Preprocessing. Computing h = d^(m−1) mod q and the initial hashes takes O(m).
Searching. The rolling hash update at each shift costs O(1). Hash comparisons cost O(1). When hashes match, verification costs O(m).
- Expected case. If the hash function distributes uniformly, the probability of a spurious hit (collision) at any shift is O(1/q). The expected total verification cost is O(nm/q), which is negligible for large q. Combined with O(n) for the rolling hashes, the expected time is O(n + m).
- Worst case. If every window produces a collision (e.g., T and P consist entirely of the same character), every hash match requires O(m) verification, giving O(nm) — no better than naive. Choosing a large random prime q makes this scenario astronomically unlikely in practice.
Space. O(1) beyond the output array.
Why Rabin-Karp matters
Rabin-Karp's main advantage over the other algorithms in this chapter is its easy generalization to multi-pattern search: given k patterns of length m, compute all their hashes and store them in a set, then check each window's hash against the set. This yields O(n + km) expected time for searching k patterns simultaneously — far better than running KMP k times.
Rabin-Karp is also the foundation of plagiarism detection systems: by computing rolling hashes of fixed-length substrings in two documents, matching hashes identify shared passages.
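A hedged sketch of the multi-pattern variant, assuming all patterns share the same length m (patterns of different lengths would be grouped by length, with one pass per group); the function name rabinKarpMulti is illustrative:

```typescript
// Sketch: multi-pattern Rabin-Karp for patterns of equal length m.
// The name rabinKarpMulti is ours, not the book's API.
export function rabinKarpMulti(
  text: string,
  patterns: string[],
): Map<string, number[]> {
  const result = new Map<string, number[]>(
    patterns.map((p): [string, number[]] => [p, []]),
  );
  if (patterns.length === 0) return result;
  const m = patterns[0]!.length;
  const n = text.length;
  if (m === 0 || m > n) return result;
  const d = 256;
  const q = 1_000_003; // prime modulus
  let h = 1; // d^(m-1) mod q
  for (let i = 0; i < m - 1; i++) h = (h * d) % q;

  const hashOf = (s: string, start: number): number => {
    let v = 0;
    for (let j = 0; j < m; j++) v = (v * d + s.charCodeAt(start + j)) % q;
    return v;
  };

  // Group pattern hashes; distinct patterns may collide on a hash.
  const byHash = new Map<number, string[]>();
  for (const p of patterns) {
    const hp = hashOf(p, 0);
    const bucket = byHash.get(hp);
    if (bucket) bucket.push(p);
    else byHash.set(hp, [p]);
  }

  let windowHash = hashOf(text, 0);
  for (let i = 0; i <= n - m; i++) {
    for (const p of byHash.get(windowHash) ?? []) {
      if (text.startsWith(p, i)) result.get(p)!.push(i); // verify the hit
    }
    if (i < n - m) {
      windowHash =
        ((windowHash - text.charCodeAt(i) * h) * d + text.charCodeAt(i + m)) %
        q;
      if (windowHash < 0) windowHash += q;
    }
  }
  return result;
}
```

Each window's hash is checked against the precomputed pattern hashes, and only hash hits are verified character by character, giving O(n + km) expected time.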
Knuth-Morris-Pratt (KMP)
The KMP algorithm achieves O(n + m) time in the worst case, not just in expectation. The key idea: when a mismatch occurs after matching q characters of the pattern, we have already seen those q text characters and know they equal P[0..q−1]. Instead of restarting from scratch at the next shift, we can use this information to determine the longest possible overlap — how far the pattern can be shifted while still maintaining a partial match.
This information is encoded in the failure function (also called the prefix function).
The failure function
For a pattern P[0..m−1], define:

π[i] = max { k : 0 ≤ k ≤ i and P[0..k−1] = P[i−k+1..i] }

In other words, π[i] is the length of the longest string that appears both at the start and the end of P[0..i], excluding the trivial case of the entire string.
Example. For P = ababaca:
| i | P[0..i] | Longest proper prefix = suffix | π[i] |
|---|---|---|---|
| 0 | a | (none) | 0 |
| 1 | ab | (none) | 0 |
| 2 | aba | a | 1 |
| 3 | abab | ab | 2 |
| 4 | ababa | aba | 3 |
| 5 | ababac | (none) | 0 |
| 6 | ababaca | a | 1 |
Computing the failure function
The failure function can be computed in O(m) time by recognizing that computing π is itself a pattern-matching problem: we are matching the pattern against itself.
COMPUTE-FAILURE(P):
m ← length(P)
π[0] ← 0
k ← 0
for i ← 1 to m − 1:
while k > 0 and P[k] ≠ P[i]:
k ← π[k − 1] // fall back
if P[k] = P[i]:
k ← k + 1
π[i] ← k
return π
The variable k tracks the length of the current match between a prefix and a suffix. When a mismatch occurs, we "fall back" to k ← π[k − 1], which gives the next longest prefix that could still match. This cascade of fallbacks is the heart of KMP.
Why is this O(m)? Although the inner while loop can execute multiple times for a single i, each fallback decreases k by at least 1. Since k increases by at most 1 per iteration of the outer loop and can never go below 0, the total number of fallback operations across all iterations is at most m. The total work is therefore O(m).
The KMP search algorithm
With the failure function in hand, the search proceeds as follows. We maintain a variable q that tracks how many characters of the pattern are currently matched against the text. On a mismatch, we fall back to q ← π[q − 1] instead of restarting from 0:
KMP-SEARCH(T, P):
n ← length(T)
m ← length(P)
π ← COMPUTE-FAILURE(P)
q ← 0 // characters matched so far
for i ← 0 to n − 1:
while q > 0 and P[q] ≠ T[i]:
q ← π[q − 1] // fall back
if P[q] = T[i]:
q ← q + 1
if q = m:
report match at position i − m + 1
q ← π[q − 1] // continue for overlapping matches
Step-by-step trace
Let T = abababaababaca and P = ababaca. The failure function is π = [0, 0, 1, 2, 3, 0, 1].
| i | T[i] | q (before) | Action | q (after) |
|---|---|---|---|---|
| 0 | a | 0 | Match, q ← 1 | 1 |
| 1 | b | 1 | Match, q ← 2 | 2 |
| 2 | a | 2 | Match, q ← 3 | 3 |
| 3 | b | 3 | Match, q ← 4 | 4 |
| 4 | a | 4 | Match, q ← 5 | 5 |
| 5 | b | 5 | P[5] = c ≠ b. Fall back: q ← π[4] = 3. P[3] = b = b. Match, q ← 4 | 4 |
| 6 | a | 4 | Match, q ← 5 | 5 |
| 7 | a | 5 | P[5] = c ≠ a. Fall back: q ← π[4] = 3. P[3] = b ≠ a. Fall back: q ← π[2] = 1. P[1] = b ≠ a. Fall back: q ← π[0] = 0. P[0] = a = a. Match, q ← 1 | 1 |
| 8 | b | 1 | Match, q ← 2 | 2 |
| 9 | a | 2 | Match, q ← 3 | 3 |
| 10 | b | 3 | Match, q ← 4 | 4 |
| 11 | a | 4 | Match, q ← 5 | 5 |
| 12 | c | 5 | Match, q ← 6 | 6 |
| 13 | a | 6 | Match, q ← 7 = m. Match at position 13 − 7 + 1 = 7. Fall back: q ← π[6] = 1 | 1 |
The pattern ababaca is found at position 7 in the text.
Notice what happened at i = 5: after matching 5 characters, we discovered a mismatch. Instead of going back to shift 1 and starting over, the failure function told us that the last 3 matched characters (aba) form a prefix of the pattern, so we could continue from q = 3. This is the savings that gives KMP its efficiency.
Implementation
export function computeFailure(pattern: string): number[] {
const m = pattern.length;
const failure = new Array<number>(m).fill(0);
let k = 0;
for (let i = 1; i < m; i++) {
while (k > 0 && pattern[k] !== pattern[i]) {
k = failure[k - 1]!;
}
if (pattern[k] === pattern[i]) {
k++;
}
failure[i] = k;
}
return failure;
}
export function kmpSearch(text: string, pattern: string): number[] {
const n = text.length;
const m = pattern.length;
const result: number[] = [];
if (m === 0) return result;
if (m > n) return result;
const failure = computeFailure(pattern);
let q = 0;
for (let i = 0; i < n; i++) {
while (q > 0 && pattern[q] !== text[i]) {
q = failure[q - 1]!;
}
if (pattern[q] === text[i]) {
q++;
}
if (q === m) {
result.push(i - m + 1);
q = failure[q - 1]!;
}
}
return result;
}
Complexity analysis
Failure function computation. O(m), as argued above.
Search phase. By the same amortized argument: q increases by at most 1 per iteration of the outer loop, and each fallback in the while loop decreases q by at least 1. Since q ≥ 0 always, the total number of fallback operations is at most n. Combined with the n iterations of the outer loop, the search phase takes O(n).
Total. O(n + m) in the worst case. This is optimal — we must read every character of both the text and the pattern at least once.
Space. O(m) for the failure function array.
Why KMP is important
KMP is significant not just for its efficiency, but for the ideas it introduces:
- The failure function captures the self-similarity structure of the pattern. This concept appears in many other string algorithms.
- Amortized analysis with a potential function. The argument that the total number of fallbacks is bounded by the number of increments is a clean example of amortized analysis — the match-length variable q serves as the potential.
- Online processing. KMP processes the text left to right, one character at a time, never looking back. This makes it suitable for streaming data.
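To make the online property concrete, here is a sketch of a streaming matcher that consumes one character at a time; the class name and API are illustrative and not from the book's repository:

```typescript
// Sketch of an online KMP matcher: feed characters one at a time;
// push() returns true when the character completes an occurrence.
// StreamingMatcher and its API are illustrative names.
export class StreamingMatcher {
  private readonly failure: number[];
  private q = 0; // characters of the pattern currently matched

  constructor(private readonly pattern: string) {
    if (pattern.length === 0) throw new Error("pattern must be non-empty");
    // Same failure-function computation as computeFailure above.
    const m = pattern.length;
    this.failure = new Array<number>(m).fill(0);
    let k = 0;
    for (let i = 1; i < m; i++) {
      while (k > 0 && pattern[k] !== pattern[i]) k = this.failure[k - 1]!;
      if (pattern[k] === pattern[i]) k++;
      this.failure[i] = k;
    }
  }

  /** Consume one character; returns true if a match ends here. */
  push(ch: string): boolean {
    while (this.q > 0 && this.pattern[this.q] !== ch) {
      this.q = this.failure[this.q - 1]!;
    }
    if (this.pattern[this.q] === ch) this.q++;
    if (this.q === this.pattern.length) {
      this.q = this.failure[this.q - 1]!; // allow overlapping matches
      return true;
    }
    return false;
  }
}
```

Feeding the text abababa to a matcher for abab reports matches ending at indices 3 and 5 — overlapping occurrences are handled by the same fallback used in kmpSearch, and no text character is ever revisited.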
Comparison and practical considerations
| Criterion | Naive | Rabin-Karp | KMP |
|---|---|---|---|
| Worst-case time | O(nm) | O(nm) | O(n + m) |
| Expected time | O(n)* | O(n + m) | O(n + m) |
| Extra space | O(1) | O(1) | O(m) |
| Preprocessing | None | O(m) | O(m) |
| Multi-pattern | Run k times | Natural extension | Run k times** |
| Implementation complexity | Trivial | Moderate | Moderate |
* Over random text with a large alphabet.
** The Aho-Corasick algorithm extends KMP to multi-pattern matching in O(n + M) time, where M is the total length of all patterns.
In practice:
- For short patterns or one-off searches, the naive algorithm is often the fastest due to its simplicity and cache-friendliness. Most standard library indexOf implementations use optimized variants of the naive approach (with heuristics like Boyer-Moore's bad-character rule).
- Rabin-Karp shines when searching for multiple patterns simultaneously or when the alphabet is small and patterns are long (making hashing effective).
- KMP is the right choice when worst-case guarantees matter (e.g., processing untrusted input where an adversary might craft pathological text/pattern combinations).
Beyond this chapter
The string matching algorithms presented here search for exact occurrences of a fixed pattern. Important extensions include:
- Boyer-Moore and its variants (bad-character and good-suffix heuristics): often the fastest in practice for single-pattern search on natural language text, achieving sublinear average time.
- Aho-Corasick: extends KMP to match multiple patterns simultaneously by building a trie of patterns augmented with failure links.
- Suffix arrays and suffix trees (introduced in Chapter 19): preprocess the text rather than the pattern, enabling O(m log n) or O(m) queries after O(n log n) or O(n) construction.
- Approximate matching: finding occurrences that are within a given edit distance of the pattern, which connects to the dynamic programming techniques of Chapter 16.
Summary
The string matching problem — finding all occurrences of a pattern P of length m in a text T of length n — admits several algorithmic approaches.
The naive algorithm checks each of the n − m + 1 possible shifts by comparing characters one by one, taking O(nm) time in the worst case. It requires no preprocessing and no extra space, making it suitable for short patterns or large alphabets where mismatches occur quickly.
The Rabin-Karp algorithm improves on the naive approach by using a rolling hash to filter out non-matching shifts in O(1) time each. Only when hashes match does it verify character by character. With a good hash function, the expected running time is O(n + m), though the worst case remains O(nm). Its main strength is easy extension to multi-pattern search.
The Knuth-Morris-Pratt algorithm achieves O(n + m) time in the worst case by preprocessing the pattern into a failure function that encodes its self-similarity structure. When a mismatch occurs, the failure function determines exactly how far to shift the pattern without missing any potential matches and without re-examining any text characters. The failure function computation and the search each use an elegant amortized argument: a counter that increases by at most 1 per step and decreases on fallbacks, bounding the total work.
These three algorithms illustrate a progression of ideas — from brute force to hashing to finite automaton-like preprocessing — that recur throughout algorithm design. The choice among them in practice depends on the use case: naive for simplicity, Rabin-Karp for multi-pattern search, and KMP when worst-case guarantees matter.
Exercises
Exercise 20.1. Trace the naive string matching algorithm on T = aabaabaaab and P = aab. Count the total number of character comparisons. Then trace KMP on the same input and count comparisons. By what factor does KMP reduce the work?
Exercise 20.2. Compute the failure function for the pattern aabaabaaa. Show the table and trace through the computation step by step. Verify your answer by checking that each π[i] correctly identifies the longest proper prefix of P[0..i] that is also a suffix.
Exercise 20.3. The Rabin-Karp algorithm uses a prime modulus q to reduce hash collisions. What happens if q is too small? Construct a concrete example where T and P consist of different characters but produce the same hash for every window when the modulus is tiny (say d = 256 and q = 2). How does the algorithm handle this situation?
Exercise 20.4. Modify the KMP algorithm to find only the first occurrence of the pattern and return immediately. Then modify it to find the last occurrence. What are the time complexities of your modified versions?
Exercise 20.5. A circular string is one where the end wraps around to the beginning: the circular string abcd contains the substring dab. Describe how to use any of the string matching algorithms in this chapter to search for a pattern of length m in a circular string of length n. What is the time complexity?
(Hint: consider searching in T + T — the text concatenated with itself — but be careful about reporting duplicate matches.)
Complexity Classes and NP-Completeness
Throughout this book we have analyzed algorithms by their running time as a function of input size: O(n log n) for merge sort, O(V + E) for BFS, O(nW) for knapsack. An implicit assumption has been that every problem we studied has an efficient — polynomial-time — solution. But not all problems do. Some of the most natural and practically important computational problems appear to resist all attempts at efficient solution. In this chapter we develop the theoretical framework of complexity classes — P, NP, and co-NP — that categorizes problems by the computational resources they require. We then introduce the concept of NP-completeness, which identifies a class of problems that are, in a precise sense, the "hardest" problems in NP. Understanding this theory is essential for every computer scientist: it tells us when to stop searching for an efficient algorithm and instead reach for approximation, heuristics, or special-case solutions.
Decision problems and languages
Complexity theory is formalized in terms of decision problems — problems with a yes/no answer. While this may seem restrictive, optimization problems can always be rephrased as decision problems. For example:
- Optimization: Find the shortest Hamiltonian cycle (TSP).
- Decision: Is there a Hamiltonian cycle of length at most k?
If we can solve the decision version efficiently, we can typically solve the optimization version by binary searching on k.
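The decision-to-optimization direction can be sketched generically. Assuming a monotone decision oracle decide(k) — "is there a solution of value at most k?" — binary search recovers the optimum with a logarithmic number of oracle calls (the function name is illustrative):

```typescript
// Sketch: recovering an optimization answer from a decision oracle.
// `decide(k)` answers "is there a solution of value ≤ k?" and is assumed
// monotone: once true, it stays true as k grows. Assumes decide(hi) === true.
export function minimizeViaDecision(
  decide: (k: number) => boolean,
  lo: number,
  hi: number,
): number {
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (decide(mid)) hi = mid; // a solution of value ≤ mid exists
    else lo = mid + 1;
  }
  return lo; // smallest k with decide(k) === true
}
```

For TSP the oracle would itself be an NP-complete question; the point is only that the optimization version is at most a logarithmic factor harder than its decision version.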
Formally, a decision problem corresponds to a language L ⊆ {0, 1}*: the set of all binary strings (encodings of inputs) for which the answer is "yes." An algorithm decides L if, given any input x, it correctly outputs "yes" if x ∈ L and "no" if x ∉ L.
The class P
Definition. P is the class of decision problems solvable by a deterministic Turing machine in time polynomial in the input size n:

P = { L : L is decided by some algorithm in O(n^c) time for some constant c }

In practical terms, a problem is in P if there exists an algorithm that solves every instance of size n in O(n^c) time for some constant c.
Almost every algorithm in this book solves a problem in P:
| Problem | Algorithm | Time |
|---|---|---|
| Sorting | Merge sort | O(n log n) |
| Shortest path | Dijkstra | O((V + E) log V) |
| MST | Kruskal | O(E log E) |
| Maximum flow | Edmonds-Karp | O(VE²) |
| String matching | KMP | O(n + m) |
P captures the intuitive notion of "efficiently solvable." While O(n^100) is technically polynomial, in practice all known polynomial algorithms for natural problems have small exponents.
The class NP
Definition. NP (Nondeterministic Polynomial time) is the class of decision problems for which a "yes" answer can be verified in polynomial time given an appropriate certificate (also called a witness).
More precisely, a language L is in NP if there exists a polynomial-time verifier V and a polynomial p such that:

x ∈ L ⟺ there is a certificate c with |c| ≤ p(|x|) such that V(x, c) = "yes"

The certificate c is a "proof" that x is a yes-instance, and V checks this proof in polynomial time.
Key point: NP does not stand for "not polynomial." It stands for nondeterministic polynomial time. A nondeterministic machine can "guess" the certificate and verify it in polynomial time.
Examples
| Problem | Certificate | Verification |
|---|---|---|
| HAMILTONIAN CYCLE | A permutation of the vertices | Check it forms a valid cycle: O(V) |
| SUBSET SUM | A subset of the numbers | Check the sum equals the target: O(n) |
| SAT | A truth assignment | Evaluate the formula: O(L) for formula length L |
| GRAPH COLORING | A color assignment | Check no adjacent vertices share a color: O(V + E) |
| CLIQUE | A set of k vertices | Check all pairs are adjacent: O(k²) |
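To make the verifier notion concrete, here is a hedged sketch of a polynomial-time verifier for SUBSET SUM, where the certificate is a list of indices into the input array (the function name is ours):

```typescript
// Sketch: a polynomial-time verifier for SUBSET SUM. Given the instance
// (numbers, target) and a certificate (indices of a claimed subset),
// check the claim in O(n) time. The name verifySubsetSum is illustrative.
export function verifySubsetSum(
  numbers: number[],
  target: number,
  certificate: number[], // indices into `numbers`
): boolean {
  const seen = new Set<number>();
  let sum = 0;
  for (const i of certificate) {
    // Reject out-of-range or repeated indices: not a valid subset.
    if (i < 0 || i >= numbers.length || seen.has(i)) return false;
    seen.add(i);
    sum += numbers[i]!;
  }
  return sum === target;
}
```

Checking a certificate takes O(n) time; finding one is the (apparently) hard part.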
P ⊆ NP
Every problem in P is also in NP. If we can solve a problem in polynomial time, we can certainly verify a "yes" answer in polynomial time — we simply ignore the certificate and solve the problem from scratch. The deep open question is whether the converse holds.
The class co-NP
Definition. co-NP is the class of decision problems whose complement is in NP. Equivalently, a problem is in co-NP if "no" answers can be verified in polynomial time.
For example, "Is this formula unsatisfiable?" is in co-NP: if the formula is satisfiable, a satisfying assignment serves as a short certificate for a "no" answer to the unsatisfiability question. But proving unsatisfiability — providing a certificate that no satisfying assignment exists — appears to require exponential-length proofs in general.
It is known that P ⊆ NP ∩ co-NP. Whether NP = co-NP is another major open question in complexity theory.
The P versus NP question
The most famous open problem in theoretical Computer Science — and one of the seven Clay Millennium Prize Problems — asks:
Is P = NP?
If P = NP, then every problem whose solution can be efficiently verified can also be efficiently solved. This would have profound consequences: public-key cryptography would be broken, many optimization problems in logistics, biology, and AI would become tractable, and mathematical proof search would be automatable.
Most researchers believe P ≠ NP, based on decades of failed attempts to find polynomial algorithms for NP-complete problems. But a proof remains elusive.
Polynomial-time reductions
To compare the difficulty of problems, we use polynomial-time reductions.
Definition. A polynomial-time reduction from problem A to problem B (written A ≤p B) is a polynomial-time computable function f such that for all inputs x:

x ∈ A ⟺ f(x) ∈ B

If A ≤p B, then B is "at least as hard as" A:
- If B is in P, then A is in P (we can solve A by reducing to B and solving B).
- If A is not in P, then B is not in P either.
Reductions are transitive: if A ≤p B and B ≤p C, then A ≤p C.
NP-completeness
Definition. A problem B is NP-hard if every problem A in NP satisfies A ≤p B.
Definition. A problem B is NP-complete if:
- B is in NP, and
- B is NP-hard.
NP-complete problems are the "hardest" problems in NP: if any one of them can be solved in polynomial time, then every problem in NP can be solved in polynomial time, and P = NP.
The Cook-Levin theorem
The foundational result in NP-completeness theory is:
Theorem (Cook 1971, Levin 1973). The Boolean satisfiability problem (SAT) is NP-complete.
SAT: Given a Boolean formula φ in conjunctive normal form (CNF), is there a truth assignment to its variables that makes φ true?
The proof (which we omit) shows that any computation of a polynomial-time nondeterministic Turing machine can be encoded as a Boolean formula in polynomial time. This means SAT is universal — every NP problem reduces to it.
Once SAT was shown to be NP-complete, the floodgates opened. Proving that a new problem is NP-complete requires just two steps:
- Show that B is in NP (exhibit a polynomial-time verifier).
- Show that some known NP-complete problem A reduces to B: A ≤p B.
By transitivity, this means every NP problem reduces to B.
Classic NP-complete problems
Thousands of problems have been shown to be NP-complete. Here are some of the most important, organized by domain.
Boolean satisfiability
SAT. Given a CNF formula (conjunction of clauses, each a disjunction of literals), is it satisfiable?
3-SAT. A restriction of SAT where each clause has exactly 3 literals. Despite the restriction, 3-SAT remains NP-complete (SAT reduces to 3-SAT by clause splitting). 3-SAT is the starting point for most NP-completeness reductions because its structure is simple yet expressive.
Note that 2-SAT is in P — it can be solved in linear time using strongly connected components. The jump from 2 to 3 literals per clause is where tractability breaks down.
Graph problems
VERTEX COVER. Given a graph G = (V, E) and an integer k, is there a set S ⊆ V with |S| ≤ k such that every edge has at least one endpoint in S?
INDEPENDENT SET. Given G and k, is there a set S ⊆ V with |S| ≥ k such that no two vertices in S are adjacent? (Complement of vertex cover: S is independent ⟺ V ∖ S is a vertex cover.)
CLIQUE. Given G and k, does G contain a complete subgraph on k vertices?
HAMILTONIAN CYCLE. Given G, does it contain a cycle that visits every vertex exactly once?
GRAPH COLORING. Given G and k, can the vertices be colored with k colors so that no two adjacent vertices share a color? NP-complete for k ≥ 3.
Numeric problems
SUBSET SUM. Given a set $S$ of integers and a target $t$, is there a subset of $S$ that sums to exactly $t$?
PARTITION. Given a multiset of integers, can it be partitioned into two subsets with equal sum? (A special case of subset sum with $t = \frac{1}{2} \sum_{a \in S} a$.)
BIN PACKING. Given items of various sizes, bins of capacity $B$, and an integer $k$, can all items be packed into $k$ bins?
Optimization problems (decision versions)
TRAVELING SALESMAN (TSP). Given a complete weighted graph and a bound $B$, is there a Hamiltonian cycle of total weight at most $B$?
SET COVER. Given a universe $U$, a collection of subsets $S_1, \dots, S_m \subseteq U$, and an integer $k$, is there a sub-collection of at most $k$ sets whose union is $U$?
Proving NP-completeness by reduction: a worked example
We prove that VERTEX COVER is NP-complete by reducing from 3-SAT.
Step 1: VERTEX COVER is in NP
Certificate: A set $C$ of at most $k$ vertices. Verification: Check $|C| \le k$ and that every edge $(u, v)$ has $u \in C$ or $v \in C$. This takes $O(V + E)$ time.
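To make the verifier concrete, here is a sketch in the spirit of the book's implementations (the edge-list representation and function name are illustrative, not the repository's actual API):

```typescript
// Polynomial-time verifier for VERTEX COVER: given a certificate (a set of
// vertices) and a bound k, check the certificate in O(V + E) time.
function verifyVertexCover(
  edges: readonly (readonly [number, number])[],
  certificate: ReadonlySet<number>,
  k: number,
): boolean {
  if (certificate.size > k) return false;
  // Every edge must have at least one endpoint in the certificate.
  return edges.every(([u, v]) => certificate.has(u) || certificate.has(v));
}
```

A correct certificate is accepted and any set that is too large or leaves an edge uncovered is rejected — exactly the asymmetry that defines NP: verifying is easy even when finding a small cover is not.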
Step 2: 3-SAT $\le_p$ VERTEX COVER
Given a 3-SAT formula $\varphi$ with variables $x_1, \dots, x_n$ and clauses $C_1, \dots, C_m$, we construct a graph $G$ and a number $k$ such that $G$ has a vertex cover of size at most $k$ if and only if $\varphi$ is satisfiable.
Construction:
-
Variable gadgets. For each variable $x_i$, create two vertices $x_i$ and $\overline{x_i}$ connected by an edge. Any vertex cover must include at least one of $\{x_i, \overline{x_i}\}$ — this models the truth assignment.
-
Clause gadgets. For each clause $C_j = (\ell_1 \vee \ell_2 \vee \ell_3)$, create a triangle on three new vertices $c_{j1}, c_{j2}, c_{j3}$. Any vertex cover must include at least 2 of these 3 vertices.
-
Connection edges. Connect $c_{j1}$ to the vertex representing literal $\ell_1$ (that is, $x_i$ if $\ell_1 = x_i$, or $\overline{x_i}$ if $\ell_1 = \neg x_i$). Similarly for $c_{j2}$ and $c_{j3}$.
-
Set $k = n + 2m$.
Correctness (sketch):
-
($\Rightarrow$) If $\varphi$ is satisfiable, pick the true literal's vertex from each variable gadget ($n$ vertices), and from each clause triangle pick 2 vertices, leaving out one whose connection edge leads to a true literal (that edge is already covered by the variable-gadget vertices). This gives a vertex cover of size $n + 2m = k$.
-
($\Leftarrow$) If $G$ has a vertex cover of size $k$, then exactly 1 vertex per variable gadget and exactly 2 per clause triangle are chosen (since we need at least $n + 2m$). The triangle vertex not chosen in each clause must have its connection edge covered by the variable-gadget vertex — meaning the corresponding literal is true. So $\varphi$ is satisfiable.
The construction takes polynomial time (the graph has $2n + 3m$ vertices and $n + 6m$ edges), so this is a valid polynomial-time reduction. Since 3-SAT is NP-complete and reduces to VERTEX COVER, and VERTEX COVER is in NP, VERTEX COVER is NP-complete.
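The gadget construction is concrete enough to implement. The sketch below builds the reduction's edge list from a 3-CNF formula; the representation (signed-integer literals, string vertex names, the name `threeSatToVertexCover`) is my own for illustration, not the book's repository code:

```typescript
// Build the VERTEX COVER instance (G, k) from a 3-SAT formula.
// Literals are nonzero integers: +i means x_i, -i means ¬x_i.
type Clause = readonly [number, number, number];

function threeSatToVertexCover(
  numVars: number,
  clauses: readonly Clause[],
): { edges: [string, string][]; k: number } {
  const edges: [string, string][] = [];
  const litVertex = (l: number) => (l > 0 ? `x${l}` : `!x${-l}`);
  // Variable gadgets: one edge x_i — ¬x_i per variable.
  for (let i = 1; i <= numVars; i++) {
    edges.push([`x${i}`, `!x${i}`]);
  }
  clauses.forEach((clause, j) => {
    const tri = [0, 1, 2].map((p) => `c${j}_${p}`);
    // Clause gadget: a triangle on three fresh vertices.
    edges.push([tri[0]!, tri[1]!], [tri[1]!, tri[2]!], [tri[0]!, tri[2]!]);
    // Connection edges: each triangle corner to its literal's vertex.
    clause.forEach((l, p) => edges.push([tri[p]!, litVertex(l)]));
  });
  return { edges, k: numVars + 2 * clauses.length };
}
```

For a formula with $n$ variables and $m$ clauses this produces $n + 6m$ edges and $k = n + 2m$, matching the counts in the correctness argument.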
The reduction landscape
Many NP-completeness proofs follow chains of reductions from SAT or 3-SAT:
SAT
└─→ 3-SAT
├─→ INDEPENDENT SET ──→ CLIQUE
├─→ VERTEX COVER
├─→ HAMILTONIAN CYCLE ──→ TSP
├─→ SUBSET SUM ──→ PARTITION ──→ BIN PACKING
├─→ GRAPH COLORING
└─→ SET COVER
Each arrow represents a polynomial-time reduction. The diversity of these problems — spanning logic, graphs, numbers, and optimization — is what makes NP-completeness so remarkable: all these seemingly unrelated problems are computationally equivalent.
Brute-force illustrations
To make the exponential nature of NP-complete problems concrete, we implement brute-force solvers for two classic problems. These are educational implementations — they work correctly but have exponential running times that make them impractical for large inputs.
Subset sum (brute force)
The brute-force approach enumerates all subsets of the input set and checks whether any of them sums to the target.
Algorithm:
SUBSET-SUM-BRUTE(S, t):
n ← |S|
for mask ← 1 to 2^n − 1:
sum ← 0
subset ← ∅
for i ← 0 to n − 1:
if bit i of mask is set:
sum ← sum + S[i]
add S[i] to subset
if sum = t:
return (true, subset)
return (false, ∅)
Implementation:
export interface SubsetSumResult {
found: boolean;
subset: number[];
}
export function subsetSum(
nums: readonly number[],
target: number,
): SubsetSumResult {
const n = nums.length;
if (n > 30) {
throw new RangeError(
`input size ${n} is too large for brute-force enumeration (max 30)`,
);
}
if (target === 0) {
return { found: true, subset: [] };
}
const total = 1 << n;
for (let mask = 1; mask < total; mask++) {
let sum = 0;
const subset: number[] = [];
for (let i = 0; i < n; i++) {
if (mask & (1 << i)) {
sum += nums[i]!;
subset.push(nums[i]!);
}
}
if (sum === target) {
return { found: true, subset };
}
}
return { found: false, subset: [] };
}
Complexity:
- Time: $O(2^n \cdot n)$. There are $2^n$ subsets, and summing each takes $O(n)$.
- Space: $O(n)$ for the current subset.
Note that the dynamic programming approach from Chapter 16 can solve subset sum in $O(n \cdot t)$ time when $t$ is bounded. However, $O(n \cdot t)$ is pseudo-polynomial — polynomial in the numeric value of $t$, not in the number of bits needed to encode $t$. The subset sum problem remains NP-complete because the target $t$ can be exponentially large relative to the input length.
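For contrast with the exponential enumeration, here is a minimal standalone sketch of the pseudo-polynomial DP (Chapter 16 develops the full version; this `subsetSumDP` is my own compact variant and assumes nonnegative integers and a nonnegative target):

```typescript
// Decide subset sum in O(n · t) time and O(t) space.
function subsetSumDP(nums: readonly number[], target: number): boolean {
  // reachable[s] = true iff some subset of the numbers seen so far sums to s.
  const reachable = new Array<boolean>(target + 1).fill(false);
  reachable[0] = true;
  for (const x of nums) {
    // Iterate downward so each number is used at most once.
    for (let s = target; s >= x; s--) {
      if (reachable[s - x]) reachable[s] = true;
    }
  }
  return reachable[target]!;
}
```

The table has $t + 1$ entries — polynomial in the value of $t$ but exponential in its bit length, which is why this does not contradict NP-completeness.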
Traveling salesman (brute force)
The brute-force TSP solver generates all permutations of cities (fixing the starting city) and evaluates each tour.
Algorithm:
TSP-BRUTE(dist[0..n-1][0..n-1]):
bestDist ← ∞
bestTour ← nil
for each permutation π of {1, 2, ..., n-1}:
cost ← dist[0][π[0]]
for i ← 0 to n − 3:
cost ← cost + dist[π[i]][π[i+1]]
cost ← cost + dist[π[n−2]][0]
if cost < bestDist:
bestDist ← cost
bestTour ← (0, π[0], ..., π[n−2])
return (bestTour, bestDist)
Implementation:
export type DistanceMatrix = readonly (readonly number[])[];
export interface TSPResult {
tour: number[];
distance: number;
}
export function tspBruteForce(dist: DistanceMatrix): TSPResult {
const n = dist.length;
if (n === 0) {
throw new RangeError('distance matrix must not be empty');
}
if (n > 12) {
throw new RangeError(
`input size ${n} is too large for brute-force TSP (max 12)`,
);
}
if (n === 1) return { tour: [0], distance: 0 };
if (n === 2) {
return { tour: [0, 1], distance: dist[0]![1]! + dist[1]![0]! };
}
const remaining = Array.from({ length: n - 1 }, (_, i) => i + 1);
let bestDistance = Infinity;
let bestTour: number[] = [];
function tourCost(perm: number[]): number {
let cost = dist[0]![perm[0]!]!;
for (let i = 0; i < perm.length - 1; i++) {
cost += dist[perm[i]!]![perm[i + 1]!]!;
}
cost += dist[perm[perm.length - 1]!]![0]!;
return cost;
}
function heapPermute(arr: number[], size: number): void {
if (size === 1) {
const cost = tourCost(arr);
if (cost < bestDistance) {
bestDistance = cost;
bestTour = [0, ...arr];
}
return;
}
for (let i = 0; i < size; i++) {
heapPermute(arr, size - 1);
const swapIdx = size % 2 === 0 ? i : 0;
const temp = arr[swapIdx]!;
arr[swapIdx] = arr[size - 1]!;
arr[size - 1] = temp;
}
}
heapPermute(remaining, remaining.length);
return { tour: bestTour, distance: bestDistance };
}
Complexity:
- Time: $O(n!)$. We fix city 0 and generate all $(n-1)!$ permutations of the remaining cities. Each permutation requires $O(n)$ to evaluate, giving $O(n \cdot (n-1)!) = O(n!)$ total.
- Space: $O(n)$ for the recursion stack and current permutation.
The factorial growth makes this approach completely impractical beyond about 12–15 cities:
| $n$ | $(n-1)!$ permutations |
|---|---|
| 5 | 24 |
| 8 | 5,040 |
| 10 | 362,880 |
| 12 | 39,916,800 |
| 15 | 87,178,291,200 |
| 20 | $19! \approx 1.2 \times 10^{17}$ |
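To put the table in perspective, a quick back-of-the-envelope calculation (the $10^9$ tours/second evaluation rate is an assumption for illustration):

```typescript
// Seconds needed to enumerate all (n-1)! tours at a given evaluation rate.
function bruteForceSeconds(n: number, toursPerSecond: number): number {
  let perms = 1;
  for (let i = 2; i <= n - 1; i++) perms *= i;
  return perms / toursPerSecond;
}

// At an assumed 10^9 tours/second, 13 cities take under a second,
// but 20 cities take about 1.2 × 10^8 seconds — roughly 4 years.
```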
For practical TSP instances (hundreds or thousands of cities), we need approximation algorithms (Chapter 22), branch-and-bound, or metaheuristics like simulated annealing and genetic algorithms.
Coping with NP-hardness
When faced with an NP-hard problem, giving up is not the answer. Several strategies can yield useful solutions:
1. Approximation algorithms
Accept a solution that is provably close to optimal. For example:
- Vertex cover: A simple greedy algorithm achieves a 2-approximation — it always finds a cover at most twice the size of the optimum (Chapter 22).
- Metric TSP: An MST-based algorithm achieves a 2-approximation when the triangle inequality holds (Chapter 22).
- Set cover: A greedy algorithm achieves an $H_n \approx \ln n$ approximation (Chapter 22).
The key advantage is a guaranteed approximation ratio — we know how far from optimal the solution can be.
2. Exact algorithms for special cases
Many NP-hard problems become tractable for restricted inputs:
- TSP on planar graphs can be solved in $2^{O(\sqrt{n})}$ time.
- Vertex cover parameterized by $k$ can be solved in $O(2^k \cdot n)$ time (fixed-parameter tractable).
- 2-SAT is solvable in linear time, even though 3-SAT is NP-complete.
- Graphs of bounded tree-width admit polynomial-time algorithms for many NP-hard problems.
3. Pseudo-polynomial algorithms
Problems like subset sum and knapsack have algorithms running in $O(n \cdot t)$ time, where $t$ is a numeric parameter. When $t$ is polynomial in $n$, these algorithms are practical despite the problem's NP-completeness. See the dynamic programming chapter (Chapter 16) for implementations.
4. Heuristics and metaheuristics
When provable guarantees are not needed, heuristic methods often find good solutions quickly:
- Local search: Start with a random solution and iteratively improve it by making small changes (e.g., 2-opt for TSP, which swaps pairs of edges).
- Simulated annealing: Like local search, but occasionally accepts worse solutions to escape local optima, with the probability of acceptance decreasing over time.
- Genetic algorithms: Maintain a population of solutions, combine them via crossover, and apply mutation to explore the search space.
- Branch and bound: Systematically explore the solution space, pruning branches that provably cannot improve on the best solution found so far.
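To make local search concrete, here is a sketch of 2-opt for TSP — a standard heuristic, though the function names and the improvement threshold below are my own choices rather than code from the book's repository:

```typescript
type Matrix = readonly (readonly number[])[];

function tourLength(dist: Matrix, tour: readonly number[]): number {
  let total = 0;
  for (let i = 0; i < tour.length; i++) {
    total += dist[tour[i]!]![tour[(i + 1) % tour.length]!]!;
  }
  return total;
}

// Repeatedly reverse a segment of the tour whenever doing so shortens it;
// stop at a local optimum.
function twoOpt(dist: Matrix, start: readonly number[]): number[] {
  const tour = [...start];
  let improved = true;
  while (improved) {
    improved = false;
    for (let i = 1; i < tour.length - 1; i++) {
      for (let j = i + 1; j < tour.length; j++) {
        const a = tour[i - 1]!, b = tour[i]!;
        const c = tour[j]!, d = tour[(j + 1) % tour.length]!;
        // Swapping edges (a,b) and (c,d) for (a,c) and (b,d)
        // is equivalent to reversing the segment tour[i..j].
        const delta = dist[a]![c]! + dist[b]![d]! - dist[a]![b]! - dist[c]![d]!;
        if (delta < -1e-12) {
          tour.splice(i, j - i + 1, ...tour.slice(i, j + 1).reverse());
          improved = true;
        }
      }
    }
  }
  return tour;
}
```

On four corners of a unit square, starting from a tour that crosses both diagonals, a single 2-opt move uncrosses them and reaches the optimal perimeter tour — 2-opt offers no worst-case guarantee, but uncrossing moves like this are why it works well in practice.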
5. Randomized algorithms
Randomization can sometimes break through worst-case barriers:
- Random sampling can quickly find satisfying assignments for SAT instances that are not too constrained.
- Randomized rounding of linear programming relaxations yields good approximations for many NP-hard problems.
Summary
This chapter introduced the theoretical framework for classifying computational problems by their inherent difficulty.
P contains problems solvable in polynomial time — the "efficiently solvable" problems that have been our focus throughout this book. NP contains problems whose solutions can be verified in polynomial time, even if finding a solution may be hard. The question of whether P = NP — whether efficient verification implies efficient solution — is the most important open problem in Computer Science.
NP-complete problems, identified through polynomial-time reductions, are the hardest problems in NP: solving any one of them efficiently would solve all of them. The Cook-Levin theorem established SAT as the first NP-complete problem, and thousands more have been identified through chains of reductions — from satisfiability to graph problems (vertex cover, clique, Hamiltonian cycle), to numeric problems (subset sum, partition), to optimization problems (TSP, set cover).
| Class | Informal definition | Examples |
|---|---|---|
| P | Efficiently solvable (polynomial time) | Sorting, shortest path, MST, max flow |
| NP | Efficiently verifiable (polynomial-time certificate for "yes") | SAT, TSP, subset sum, clique, coloring |
| co-NP | Efficiently verifiable "no" answers | Tautology, primality (also in P) |
| NP-complete | Hardest problems in NP (every NP problem reduces to them) | 3-SAT, vertex cover, TSP, subset sum |
| NP-hard | At least as hard as NP-complete (but may not be in NP) | Halting problem, optimal chess play |
Relationships: $\mathrm{P} \subseteq \mathrm{NP}$ and $\mathrm{P} \subseteq \text{co-NP}$.
Whether any of these inclusions are strict is unknown (except that NP-hard $\not\subseteq$ NP, since NP-hard includes undecidable problems).
We implemented brute-force solvers for two NP-complete problems to illustrate their exponential nature:
- Subset sum by exhaustive enumeration of all subsets: $O(2^n \cdot n)$ time.
- TSP by exhaustive enumeration of all permutations: $O(n!)$ time.
When facing NP-hard problems in practice, we have several coping strategies: approximation algorithms with provable guarantees (Chapter 22), exact algorithms for special cases (e.g., fixed-parameter tractability, bounded tree-width), pseudo-polynomial algorithms (e.g., DP for knapsack when the target is small), and heuristics (local search, simulated annealing, genetic algorithms). The theory of NP-completeness tells us not that these problems are unsolvable, but that we should not expect a polynomial-time algorithm that works optimally on all instances — and guides us toward the right tool for each situation.
Exercises
-
NP membership. Show that the CLIQUE problem is in NP by describing a certificate and a polynomial-time verifier. What is the running time of your verifier?
-
Reduction practice. Prove that INDEPENDENT SET is NP-complete by reducing from VERTEX COVER. (Hint: $S$ is an independent set in $G$ if and only if $V \setminus S$ is a vertex cover.)
-
Subset sum variants. The PARTITION problem asks whether a multiset of integers can be divided into two subsets of equal sum. Show that PARTITION is NP-complete by reducing from SUBSET SUM. (Hint: given a SUBSET SUM instance $(S, t)$, construct a PARTITION instance by adding appropriate elements.)
-
Pseudo-polynomial vs polynomial. Explain why the $O(n \cdot W)$ dynamic programming algorithm for 0/1 knapsack does not prove P = NP, even though knapsack is NP-complete. What is the relationship between $W$ and the input size?
-
Brute-force analysis. Suppose you have a computer that can evaluate $10^9$ TSP tours per second. How long would it take to solve a 20-city instance by brute force? A 25-city instance? Express your answers in meaningful time units (seconds, years, etc.).
Approximation Algorithms
Throughout this book we have designed algorithms that solve problems exactly and efficiently. But in the previous chapter we saw that many important optimization problems — minimum vertex cover, set cover, traveling salesman — are NP-hard: no polynomial-time algorithm is known, and most researchers believe none exists. Approximation algorithms offer a powerful middle ground: polynomial-time algorithms that produce solutions provably close to optimal. Instead of finding the best solution, we settle for one that is guaranteed to be within a known factor of the best. In this chapter we formalize approximation ratios, then study three classical algorithms: a 2-approximation for vertex cover, a greedy $\ln n$-approximation for set cover, and a 2-approximation for metric TSP via minimum spanning trees.
When exact solutions are infeasible
Chapter 21 demonstrated that brute-force approaches to NP-hard problems are impractical for all but the smallest inputs. A brute-force TSP solver exhausts $(n-1)!$ permutations, which is infeasible beyond about 12–15 cities. A brute-force subset sum examines $2^n$ subsets, limiting us to roughly 30 elements.
For real-world instances — routing delivery trucks through hundreds of stops, selecting facilities to cover a service region, or allocating resources across a network — we need algorithms that:
- Run in polynomial time (ideally $O(n \log n)$ or $O(n^2)$, not $O(2^n)$).
- Provide a quality guarantee — we can bound how far the solution is from optimal.
Approximation algorithms deliver both.
Approximation ratios
Let $A$ be a polynomial-time algorithm for an optimization problem, and let $\mathrm{OPT}(x)$ denote the cost of an optimal solution for instance $x$.
Definition. Algorithm $A$ has approximation ratio $\rho(n)$ if, for every instance $x$ of size $n$:
$$\max\left(\frac{A(x)}{\mathrm{OPT}(x)}, \frac{\mathrm{OPT}(x)}{A(x)}\right) \le \rho(n)$$
The ratio is always $\ge 1$. For minimization problems, $A(x) \le \rho(n) \cdot \mathrm{OPT}(x)$. For maximization problems, $A(x) \ge \mathrm{OPT}(x) / \rho(n)$.
An algorithm with approximation ratio $\rho(n)$ is called a $\rho(n)$-approximation algorithm.
Some important distinctions:
- A constant-factor approximation has $\rho(n) = c$ for some constant $c$ (e.g., the 2-approximation for vertex cover).
- A logarithmic approximation has $\rho(n) = O(\log n)$ (e.g., greedy set cover).
- A polynomial-time approximation scheme (PTAS) achieves ratio $1 + \varepsilon$ for any constant $\varepsilon > 0$, though the running time may grow rapidly as $\varepsilon$ shrinks.
- A fully polynomial-time approximation scheme (FPTAS) is a PTAS whose running time is polynomial in both $n$ and $1/\varepsilon$.
Not all NP-hard problems can be approximated equally well. Under standard complexity assumptions:
| Problem | Best known ratio | Hardness of approximation |
|---|---|---|
| Vertex cover | 2 | Cannot do better than $\approx 1.36$ unless P = NP |
| Set cover | $\ln n$ | Cannot do better than $(1 - \varepsilon) \ln n$ unless P = NP |
| Metric TSP | 1.5 (Christofides) | Cannot do better than $123/122$ unless P = NP |
| General TSP | — | No constant-factor approximation unless P = NP |
| MAX-3SAT | 7/8 | Cannot do better than $7/8 + \varepsilon$ unless P = NP |
| Knapsack | FPTAS | Has a $(1 + \varepsilon)$-approximation for any $\varepsilon > 0$ |
Vertex cover: 2-approximation
Problem definition
Given an undirected graph $G = (V, E)$, a vertex cover is a subset $C \subseteq V$ such that every edge in $E$ has at least one endpoint in $C$. The minimum vertex cover problem asks for a cover of smallest size.
Vertex cover is one of Karp's 21 NP-complete problems (1972) and has a natural relationship to the independent set problem: $S$ is an independent set if and only if $V \setminus S$ is a vertex cover.
The algorithm
The 2-approximation is elegantly simple:
- Start with an empty cover $C$ and the full edge set $E' = E$.
- Pick an arbitrary uncovered edge $(u, v)$ from $E'$.
- Add both endpoints $u$ and $v$ to $C$.
- Remove all edges incident to $u$ or $v$ from $E'$.
- Repeat until $E'$ is empty.
The key insight is that the edges we pick in step 2 form a matching $M$ — a set of edges that share no endpoints. Every vertex cover must include at least one endpoint of each matching edge, so $\mathrm{OPT} \ge |M|$. Our algorithm adds exactly 2 vertices per matching edge, giving $|C| = 2|M| \le 2 \cdot \mathrm{OPT}$.
Pseudocode
APPROX-VERTEX-COVER(G):
C ← ∅
E' ← E
while E' ≠ ∅:
pick any edge (u, v) ∈ E'
C ← C ∪ {u, v}
remove all edges incident to u or v from E'
return C
Proof of the 2-approximation
Claim: $|C| \le 2 \cdot \mathrm{OPT}$.
Proof. Let $M$ be the set of edges selected by the algorithm. By construction:
- No two edges in $M$ share an endpoint (each time we select an edge, we remove all incident edges). So $M$ is a matching.
- The algorithm adds both endpoints of each matching edge: $|C| = 2|M|$.
- Any vertex cover must include at least one endpoint of every edge, including every edge in $M$. Since matching edges are disjoint, the optimal cover needs at least $|M|$ vertices: $\mathrm{OPT} \ge |M|$.
- Therefore $|C| = 2|M| \le 2 \cdot \mathrm{OPT}$.
TypeScript implementation
import { Graph } from '../12-graphs-and-traversal/graph.js';
export interface VertexCoverResult<T> {
cover: Set<T>;
size: number;
}
export function vertexCover<T>(graph: Graph<T>): VertexCoverResult<T> {
if (graph.directed) {
throw new Error('Vertex cover requires an undirected graph');
}
const cover = new Set<T>();
const edges = graph.getEdges();
for (const edge of edges) {
// If neither endpoint is already covered, add both.
if (!cover.has(edge.from) && !cover.has(edge.to)) {
cover.add(edge.from);
cover.add(edge.to);
}
}
return { cover, size: cover.size };
}
Note that the implementation iterates over edges and skips any edge that already has a covered endpoint — this is equivalent to "removing incident edges" in the pseudocode, since we only select an edge when both endpoints are uncovered.
Complexity:
- Time: $O(V + E)$ — we iterate over all edges once.
- Space: $O(V + E)$ — for the edge list and the cover set.
Worked example
Consider this graph:
1 --- 2
| |
3 --- 4 --- 5
Edges: (1,2), (1,3), (2,4), (3,4), (4,5).
Suppose the algorithm processes edges in order:
- Pick (1,2): add 1 and 2 to $C$. Remove (1,2), (1,3), (2,4).
- Remaining edges: (3,4), (4,5). Pick (3,4): add 3 and 4 to $C$. Remove (3,4), (4,5).
- No edges remain. $C = \{1, 2, 3, 4\}$, $|C| = 4$.
The matching was $M = \{(1,2), (3,4)\}$, so $\mathrm{OPT} \ge 2$.
The optimal cover is $\{1, 4\}$, with size 2. Our algorithm returned $|C| = 4$, which is exactly the worst case of the 2-approximation guarantee.
Tightness of the bound
The factor of 2 is tight for this algorithm. Consider the complete bipartite graph $K_{n,n}$ with $n$ vertices on each side. The optimal vertex cover selects one side: $\mathrm{OPT} = n$. A maximal matching has $n$ edges (one from each left vertex to a right vertex), and the algorithm adds both endpoints: $|C| = 2n$.
Whether vertex cover can be approximated with a ratio better than 2 in polynomial time is a major open problem. The best known lower bound (assuming the Unique Games Conjecture) is $2 - \varepsilon$ for any $\varepsilon > 0$.
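The tight example is easy to check empirically. The sketch below re-implements the matching heuristic over a plain edge list (my own standalone variant, so it runs without the book's Graph class) and applies it to $K_{3,3}$:

```typescript
// Matching-based 2-approximation over an explicit edge list.
function coverFromEdges(
  edges: readonly (readonly [number, number])[],
): Set<number> {
  const cover = new Set<number>();
  for (const [u, v] of edges) {
    // Select the edge only if it is still uncovered, then take both endpoints.
    if (!cover.has(u) && !cover.has(v)) {
      cover.add(u);
      cover.add(v);
    }
  }
  return cover;
}

// K_{3,3}: left vertices 0..2, right vertices 3..5, all 9 edges.
const k33: [number, number][] = [];
for (let l = 0; l < 3; l++) {
  for (let r = 3; r < 6; r++) {
    k33.push([l, r]);
  }
}
// The heuristic selects a perfect matching and returns all 6 vertices,
// while the optimal cover is one side of the bipartition, of size 3.
```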
Greedy set cover: $\ln n$-approximation
Problem definition
Given a universe $U$ of $n$ elements and a collection $S = \{S_1, \dots, S_m\}$ of subsets of $U$ whose union is $U$, the set cover problem asks for the smallest sub-collection of $S$ that covers every element of $U$.
Set cover is a fundamental NP-hard problem that generalizes vertex cover (each vertex corresponds to a "set" of its incident edges, and the universe is the edge set).
The greedy algorithm
The greedy strategy is intuitive: at each step, select the subset that covers the most currently-uncovered elements.
GREEDY-SET-COVER(U, S):
C ← ∅ // selected subsets
uncovered ← U
while uncovered ≠ ∅:
select S_i ∈ S maximizing |S_i ∩ uncovered|
C ← C ∪ {S_i}
uncovered ← uncovered \ S_i
return C
Proof of the $\ln n$-approximation
Theorem. The greedy algorithm produces a cover of size at most $H_n \cdot \mathrm{OPT}$, where $H_n = \sum_{i=1}^{n} 1/i \le \ln n + 1$ is the $n$-th harmonic number.
Proof sketch. We use a charging argument. When the greedy algorithm selects a set that covers $c$ new elements, we "charge" each newly covered element a cost of $1/c$; the total charge equals the number of sets selected.
Consider any element that was covered when $r$ elements remained uncovered. The greedy choice covers at least $r / \mathrm{OPT}$ elements (because the optimal solution uses $\mathrm{OPT}$ sets to cover everything, so by pigeonhole, some set covers at least $r / \mathrm{OPT}$ of the remaining elements). So that element's charge is at most $\mathrm{OPT} / r$.
Summing over all elements in the order they were covered:
$$|C| \le \sum_{r=1}^{n} \frac{\mathrm{OPT}}{r} = H_n \cdot \mathrm{OPT}$$
TypeScript implementation
export interface SetCoverResult<T> {
selectedIndices: number[];
selectedSets: ReadonlySet<T>[];
count: number;
}
export function setCover<T>(
universe: ReadonlySet<T>,
subsets: readonly ReadonlySet<T>[],
): SetCoverResult<T> {
if (universe.size === 0) {
return { selectedIndices: [], selectedSets: [], count: 0 };
}
const uncovered = new Set<T>(universe);
const selectedIndices: number[] = [];
const selectedSets: ReadonlySet<T>[] = [];
const used = new Set<number>();
while (uncovered.size > 0) {
let bestIndex = -1;
let bestCount = 0;
for (let i = 0; i < subsets.length; i++) {
if (used.has(i)) continue;
let count = 0;
for (const elem of subsets[i]!) {
if (uncovered.has(elem)) count++;
}
if (count > bestCount) {
bestCount = count;
bestIndex = i;
}
}
if (bestIndex === -1 || bestCount === 0) {
throw new Error(
'Subsets do not cover the entire universe; ' +
`${uncovered.size} element(s) remain uncovered`,
);
}
used.add(bestIndex);
selectedIndices.push(bestIndex);
selectedSets.push(subsets[bestIndex]!);
for (const elem of subsets[bestIndex]!) {
uncovered.delete(elem);
}
}
return { selectedIndices, selectedSets, count: selectedIndices.length };
}
Complexity:
- Time: $O(m^2 \cdot n)$ in the worst case. Each of the at most $m$ iterations scans all $m$ subsets, and each scan examines up to $n$ elements.
- Space: $O(n + m)$.
Worked example
Universe: $U = \{1, 2, 3, 4, 5, 6\}$
Subsets:
| Set | Elements |
|---|---|
| $S_1$ | $\{1, 2, 3\}$ |
| $S_2$ | $\{3, 4\}$ |
| $S_3$ | $\{3, 4, 5\}$ |
| $S_4$ | $\{5, 6\}$ |
Iteration 1: Uncovered = $\{1, 2, 3, 4, 5, 6\}$.
- $S_1$ covers 3 elements, $S_2$ covers 2, $S_3$ covers 3, $S_4$ covers 2.
- Tie between $S_1$ and $S_3$; pick $S_1$.
- Uncovered = $\{4, 5, 6\}$.
Iteration 2: $S_2$ covers 1 ($\{4\}$), $S_3$ covers 2 ($\{4, 5\}$), $S_4$ covers 2 ($\{5, 6\}$). Pick $S_3$.
- Uncovered = $\{6\}$.
Iteration 3: $S_2$ covers 0, $S_4$ covers 1 ($\{6\}$). Pick $S_4$.
- Uncovered = $\emptyset$.
Result: $\{S_1, S_3, S_4\}$, 3 subsets. The optimal solution is also 3 (e.g., $\{S_1, S_3, S_4\}$ itself), so the greedy algorithm found an optimal solution in this case.
Optimality of the greedy bound
The $H_n \approx \ln n$ approximation ratio is essentially the best possible for set cover. Under standard complexity assumptions, no polynomial-time algorithm can achieve a ratio better than $(1 - \varepsilon) \ln n$ for any $\varepsilon > 0$.
Metric TSP: 2-approximation via MST
Problem definition
The Traveling Salesman Problem (TSP) asks for the shortest Hamiltonian cycle (a tour visiting every vertex exactly once and returning to the start) in a complete weighted graph.
General TSP is not only NP-hard but also inapproximable: no polynomial-time algorithm can achieve any constant approximation ratio unless P = NP. (The proof: if we could approximate within any factor $\rho$, we could solve the NP-complete Hamiltonian cycle problem by assigning weight 1 to existing edges and weight $\rho \cdot n + 1$ to missing edges.)
However, many practical TSP instances satisfy the triangle inequality: for all vertices $u, v, w$:
$$d(u, w) \le d(u, v) + d(v, w)$$
This holds for Euclidean distances, shortest-path distances in networks, and most other natural distance metrics. The resulting metric TSP admits constant-factor approximations.
The MST-based algorithm
The algorithm exploits a fundamental relationship between MSTs and optimal tours:
- Compute an MST of the complete graph.
- Perform a DFS preorder traversal of the MST.
- The preorder sequence, with a return edge to the start, forms the tour.
APPROX-METRIC-TSP(G, d):
T ← MST(G) // Prim's or Kruskal's
tour ← DFS-PREORDER(T, starting from vertex 0)
return tour
Why this works: the shortcutting argument
Consider the full walk of the MST: start at the root, and traverse every edge twice (once going down, once returning). This walk visits every vertex but may visit some vertices multiple times. Its total cost is exactly $2 \cdot w(T)$, where $w(T)$ is the MST weight.
The preorder traversal is a shortcut of this full walk: whenever the walk would revisit an already-visited vertex, we skip directly to the next unvisited vertex. By the triangle inequality, skipping vertices can only decrease the total distance: $d(u, w) \le d(u, v) + d(v, w)$.
So the shortcutted tour costs at most $2 \cdot w(T)$.
Proof of the 2-approximation
Claim: The MST-based tour has cost at most $2 \cdot \mathrm{OPT}$.
Proof.
- MST $\le$ OPT: Removing any edge from the optimal tour yields a spanning tree. Since the MST is the minimum-weight spanning tree: $w(T) \le \mathrm{OPT}$.
- Tour $\le 2 \cdot$ MST: The full walk costs $2 \cdot w(T)$, and the shortcutted preorder tour costs at most this (by the triangle inequality).
- Combining: $\text{tour cost} \le 2 \cdot w(T) \le 2 \cdot \mathrm{OPT}$.
TypeScript implementation
import type { DistanceMatrix } from '../21-complexity/tsp-brute-force.js';
import { Graph } from '../12-graphs-and-traversal/graph.js';
import { prim } from '../14-minimum-spanning-trees/prim.js';
export interface MetricTSPResult {
tour: number[];
distance: number;
}
export function metricTSP(dist: DistanceMatrix): MetricTSPResult {
const n = dist.length;
if (n === 0) throw new RangeError('distance matrix must not be empty');
for (let i = 0; i < n; i++) {
if (dist[i]!.length !== n) {
throw new Error(
`distance matrix must be square (row ${i} has ` +
`${dist[i]!.length} columns, expected ${n})`,
);
}
}
if (n === 1) return { tour: [0], distance: 0 };
if (n === 2) {
return { tour: [0, 1], distance: dist[0]![1]! + dist[1]![0]! };
}
// Build a complete undirected graph.
const graph = new Graph<number>(false);
for (let i = 0; i < n; i++) graph.addVertex(i);
for (let i = 0; i < n; i++) {
for (let j = i + 1; j < n; j++) {
graph.addEdge(i, j, dist[i]![j]!);
}
}
// Step 1: Compute MST.
const mst = prim(graph, 0);
// Build MST adjacency list.
const mstAdj = new Map<number, number[]>();
for (let i = 0; i < n; i++) mstAdj.set(i, []);
for (const edge of mst.edges) {
mstAdj.get(edge.from)!.push(edge.to);
mstAdj.get(edge.to)!.push(edge.from);
}
// Step 2: DFS preorder traversal.
const tour: number[] = [];
const visited = new Set<number>();
function dfsPreorder(v: number): void {
visited.add(v);
tour.push(v);
for (const neighbor of mstAdj.get(v)!) {
if (!visited.has(neighbor)) dfsPreorder(neighbor);
}
}
dfsPreorder(0);
// Step 3: Compute tour distance.
let distance = 0;
for (let i = 0; i < tour.length - 1; i++) {
distance += dist[tour[i]!]![tour[i + 1]!]!;
}
distance += dist[tour[tour.length - 1]!]![tour[0]!]!;
return { tour, distance };
}
Complexity:
- Time: $O(n^2 \log n)$ — constructing the complete graph is $O(n^2)$, and Prim's algorithm on a complete graph with a binary heap is $O(E \log V) = O(n^2 \log n)$.
- Space: $O(n^2)$ for the adjacency list of the complete graph.
Worked example
Consider 4 cities at the corners of a unit square:
1 -------- 2
| |
| |
0 -------- 3
Distance matrix (Euclidean):
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 1 | $\sqrt{2}$ | 1 |
| 1 | 1 | 0 | 1 | $\sqrt{2}$ |
| 2 | $\sqrt{2}$ | 1 | 0 | 1 |
| 3 | 1 | $\sqrt{2}$ | 1 | 0 |
Step 1: MST (using Prim's from vertex 0):
- Add edge 0–1 (weight 1)
- Add edge 1–2 (weight 1)
- Add edge 0–3 (weight 1)
MST weight = 3. MST edges: 0–1, 1–2, 0–3.
Step 2: DFS preorder from 0:
Visit 0 → visit 1 → visit 2 → backtrack to 1 → backtrack to 0 → visit 3 → backtrack to 0.
Preorder: [0, 1, 2, 3].
Step 3: Tour cost:
$d(0,1) + d(1,2) + d(2,3) + d(3,0) = 1 + 1 + 1 + 1 = 4$.
The optimal tour is also 4 (the perimeter of the square), so the approximation is exact in this case.
Tour cost $= 4$, MST weight $= 3$, and $4 \le 2 \cdot 3$ — the guarantee holds.
Christofides' algorithm: a better bound
While we implemented the 2-approximation for its simplicity, a better algorithm exists. Christofides' algorithm (1976) achieves a $3/2$-approximation:
- Compute an MST $T$.
- Find the set $O$ of vertices with odd degree in $T$.
- Compute a minimum-weight perfect matching $M$ on the vertices in $O$.
- Combine $T$ and $M$ to get an Eulerian multigraph.
- Find an Eulerian circuit.
- Shortcut to a Hamiltonian cycle.
The key insight is that combining the MST with a minimum perfect matching on odd-degree vertices produces an Eulerian graph (all degrees even), whose Euler tour can be shortcutted. Since the minimum matching costs at most $\mathrm{OPT}/2$ (by a pairing argument on the optimal tour), the total cost is at most $w(T) + \mathrm{OPT}/2 \le \frac{3}{2} \cdot \mathrm{OPT}$.
Christofides' algorithm remained the best known approximation for metric TSP for nearly 50 years, until a very slight improvement was achieved by Karlin, Klein, and Oveis Gharan in 2021.
Comparison of approximation algorithms
| Problem | Algorithm | Ratio | Time | Approach |
|---|---|---|---|---|
| Vertex cover | Matching-based | 2 | $O(V + E)$ | Pick both endpoints of a maximal matching |
| Set cover | Greedy | $H_n \approx \ln n$ | $O(m^2 n)$ | Pick set covering most uncovered elements |
| Metric TSP | MST-based | 2 | $O(n^2 \log n)$ | MST + DFS preorder + shortcutting |
| Metric TSP | Christofides | 1.5 | $O(n^3)$ | MST + minimum matching + Euler tour |
Beyond the algorithms in this chapter
Approximation algorithms form a rich and active area of research. Some important topics we have not covered include:
-
LP relaxation and rounding: Many approximation algorithms work by solving a linear programming relaxation of an integer program and then rounding the fractional solution to an integer one. This technique yields tight results for problems like weighted vertex cover and MAX-SAT.
-
Semidefinite programming: For problems like MAX-CUT, the Goemans-Williamson algorithm uses semidefinite programming to achieve an approximation ratio of approximately 0.878, which is optimal assuming the Unique Games Conjecture.
-
Primal-dual methods: These construct both a feasible solution and a lower bound simultaneously, useful for network design problems.
-
The PCP theorem: The celebrated PCP (Probabilistically Checkable Proofs) theorem provides the theoretical foundation for hardness of approximation results, showing that for many problems, achieving certain approximation ratios is as hard as solving the problem exactly.
Summary
Approximation algorithms provide a principled approach to NP-hard optimization problems: polynomial-time algorithms with provable guarantees on solution quality.
We studied three classical examples:
-
Vertex cover 2-approximation: Pick an arbitrary uncovered edge, add both endpoints. The selected edges form a matching, and any cover needs at least one vertex per matching edge, giving a factor-2 guarantee. Runs in $O(V + E)$ time.
-
Greedy set cover $H_n$-approximation: Repeatedly select the subset covering the most uncovered elements. A charging argument shows the greedy cost is at most $H_n$ times optimal, where $H_n \approx \ln n$ is the $n$-th harmonic number. This ratio is essentially tight: no polynomial-time algorithm can do significantly better unless P = NP.
-
Metric TSP 2-approximation via MST: Compute a minimum spanning tree, perform a DFS preorder traversal, and return the resulting tour. The MST provides a lower bound on OPT, and the triangle inequality ensures the shortcutted tour costs at most twice the MST weight. Christofides' algorithm improves this to a $3/2$-approximation.
The study of approximation algorithms reveals a rich structure within NP-hard problems. Some problems (like knapsack) admit $(1 + \varepsilon)$-approximations for any $\varepsilon > 0$. Others (like vertex cover) admit constant-factor approximations but resist improvements below specific thresholds. Still others (like general TSP) cannot be approximated at all. Understanding where a problem falls in this landscape guides us toward the most effective algorithmic approach.
Exercises
-
Vertex cover on trees. Show that the minimum vertex cover of a tree can be computed exactly in polynomial time using dynamic programming. (Hint: root the tree and compute, for each vertex, the minimum cover of its subtree with and without including that vertex.) Does this contradict the NP-hardness of vertex cover?
-
Weighted set cover. Generalize the greedy set cover algorithm to the weighted case, where each subset has a cost and we want to minimize the total cost of selected subsets. Show that the greedy algorithm (pick the set with the smallest cost per newly covered element) achieves the same approximation ratio.
-
TSP triangle inequality failure. Construct a graph with 4 vertices where the triangle inequality is violated, and show that the MST-based algorithm produces a tour whose cost exceeds 2 · OPT. Explain why the shortcutting argument fails.
-
MAX-SAT approximation. Consider the following simple algorithm for MAX-SAT: independently set each variable to true with probability 1/2. Show that this randomized algorithm satisfies at least m/2 clauses in expectation when each clause has at least one literal, and at least 7m/8 clauses when each clause has exactly 3 literals. (Here m is the number of clauses.) Can you derandomize this algorithm?
-
Tight examples. For each of the three algorithms in this chapter, describe a family of instances where the approximation ratio approaches the proven bound. That is: find graphs where the vertex cover algorithm returns a cover of size approaching 2 · OPT, set cover instances where the greedy algorithm uses Ω(log n) · OPT sets, and metric TSP instances where the MST tour approaches 2 · OPT.
Bibliography
Textbooks
-
Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. Introduction to Algorithms, 4th edition. MIT Press, 2022. The comprehensive reference for algorithm design and analysis, commonly known as CLRS. Our curriculum and many proofs follow its presentation.
-
Kleinberg, J. and Tardos, E. Algorithm Design. Addison-Wesley, 2005. An excellent treatment of algorithm design techniques, particularly dynamic programming, greedy algorithms, and network flow.
-
Sedgewick, R. and Wayne, K. Algorithms, 4th edition. Addison-Wesley, 2011. A practically oriented textbook with Java implementations. Its approach to presenting algorithms alongside working code influenced the style of this book.
-
Skiena, S. The Algorithm Design Manual, 3rd edition. Springer, 2020. A unique combination of algorithm design techniques and a catalogue of algorithmic problems, useful as both a textbook and a reference.
-
Wirth, N. Algorithms + Data Structures = Programs. Prentice Hall, 1976. Also available at https://people.inf.ethz.ch/wirth/AD.pdf. A classic that pioneered the idea of teaching algorithms through a real programming language (Pascal). The title captures a philosophy this book shares.
-
Knuth, D.E. The Art of Computer Programming, Volumes 1--4A. Addison-Wesley, 1997--2011. The definitive, encyclopedic treatment of algorithms and their analysis. An invaluable reference for the mathematically inclined reader.
-
Aho, A.V., Hopcroft, J.E., and Ullman, J.D. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974. A foundational textbook that established many of the standard approaches to algorithm analysis.
-
Dasgupta, S., Papadimitriou, C.H., and Vazirani, U.V. Algorithms. McGraw-Hill, 2006. A concise and elegant textbook that is freely available from the authors. Particularly strong on number theory and NP-completeness.
-
Sipser, M. Introduction to the Theory of Computation, 3rd edition. Cengage Learning, 2012. The standard reference for computational complexity theory, NP-completeness, and the theory of computation.
Online resources
-
MIT OpenCourseWare. 6.006 Introduction to Algorithms. https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/. Lecture videos, notes, and problem sets covering the material in Parts I--IV of this book.
-
MIT OpenCourseWare. 6.046J Design and Analysis of Algorithms. https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/. The follow-on course covering advanced algorithm design techniques, network flow, and computational complexity.
Note on authorship and licensing
A substantial part of this book was created with the assistance of Zenflow, using Claude Code and Claude Opus 4.6.
This book is available under the MIT License and is provided as-is, without any guarantee of correctness or fitness for a particular purpose.
Bugs and errors should be reported at https://github.com/amoilanen/Algorithms-with-Typescript.