Greedy Algorithms
Dynamic programming (Chapter 16) achieves optimal solutions by methodically exploring all subproblems and combining their answers. Greedy algorithms take a more aggressive approach: at each step they make the locally optimal choice and never look back. When a greedy strategy works, the result is typically a simpler and faster algorithm — often just a single pass over sorted data. The catch is that the locally optimal choice does not always lead to a globally optimal solution, so correctness requires proof. In this chapter we develop two proof techniques — the "greedy stays ahead" argument and the exchange argument — and apply them to three classic problems: interval scheduling, Huffman coding, and fractional knapsack.
The greedy strategy
A greedy algorithm builds a solution incrementally. At each step it examines the available candidates, selects the one that looks best according to some criterion, and commits to that choice irrevocably. It never reconsiders past decisions or explores alternative combinations.
Contrast this with dynamic programming:
| | Dynamic programming | Greedy |
|---|---|---|
| Decisions | Deferred — explores all combinations via table | Immediate — commits at each step |
| Subproblems | Many, overlapping | Typically none (single pass) |
| Correctness | Optimal substructure + overlapping subproblems | Requires a specific proof (exchange or stays-ahead) |
| Efficiency | Often $O(n^2)$ or $O(n^3)$ | Often $O(n \log n)$ or $O(n)$ |
The greedy strategy works when a problem has:
- Optimal substructure. An optimal solution contains optimal solutions to subproblems.
- The greedy-choice property. A locally optimal choice can always be extended to a globally optimal solution. In other words, we never need to reconsider a greedy choice.
Property 1 is shared with DP. Property 2 is what distinguishes greedy problems: it asserts that committing to the local optimum is safe.
Proving greedy algorithms correct
Because the greedy-choice property is not obvious, we need rigorous proofs. Two standard techniques are widely used.
Greedy stays ahead
Idea. Show that after each step, the greedy solution is at least as good as any other solution at the same step. If the greedy algorithm stays ahead (or tied) at every step, it must be at least as good as the optimum overall.
Structure of the proof:
- Define a measure of progress after $k$ steps.
- Prove by induction that the greedy solution's measure is at least as good as the optimal solution's measure after every step $k$.
- Conclude that the final greedy solution is optimal.
We will use this technique for interval scheduling below.
Exchange argument
Idea. Start with an arbitrary optimal solution. Show that it can be transformed — step by step, by "exchanging" its choices for greedy choices — into the greedy solution without worsening the objective. If an optimal solution can always be transformed into the greedy solution, the greedy solution must be optimal.
Structure of the proof:
- Consider an optimal solution $O$ that differs from the greedy solution $G$.
- Identify the first point where $O$ and $G$ differ.
- Show that modifying $O$ to agree with $G$ at that point does not make $O$ worse.
- Repeat until $O = G$.
We will use this technique for Huffman coding.
Interval scheduling (activity selection)
Problem definition
Given $n$ activities, each with a start time $s_i$ and a finish time $f_i$ (where $s_i < f_i$), select the largest subset of mutually compatible activities. Two activities are compatible if they do not overlap — that is, one finishes before the other starts.
This problem arises in resource allocation: scheduling the maximum number of non-overlapping jobs on a single machine, booking meeting rooms, or allocating time slots.
Greedy approach
The key insight is to sort activities by finish time and greedily select each activity whose start time does not conflict with the previously selected activity.
Why finish time? Consider the alternatives:
- Sort by start time. A long early activity could block many shorter ones.
- Sort by duration. A short activity in the middle could block two non-overlapping ones.
- Sort by fewest conflicts. Counterexamples exist.
- Sort by finish time. By always choosing the activity that finishes earliest, we leave as much room as possible for future activities.
Algorithm
- Sort activities by finish time.
- Select the first activity.
- For each subsequent activity: if its start time is $\ge$ the finish time of the last selected activity, select it.
```typescript
export interface Interval {
  start: number;
  end: number;
}

export interface IntervalSchedulingResult {
  selected: Interval[];
  count: number;
}

export function intervalScheduling(
  intervals: readonly Interval[],
): IntervalSchedulingResult {
  if (intervals.length === 0) {
    return { selected: [], count: 0 };
  }

  // Sort by finish time (break ties by start time).
  const sorted = intervals.slice().sort((a, b) => {
    if (a.end !== b.end) return a.end - b.end;
    return a.start - b.start;
  });

  const selected: Interval[] = [sorted[0]!];
  let lastEnd = sorted[0]!.end;

  for (let i = 1; i < sorted.length; i++) {
    const interval = sorted[i]!;
    if (interval.start >= lastEnd) {
      selected.push(interval);
      lastEnd = interval.end;
    }
  }

  return { selected, count: selected.length };
}
```
Correctness proof (greedy stays ahead)
Let $g_1, g_2, \ldots, g_k$ be the activities selected by the greedy algorithm (in order of finish time), and let $o_1, o_2, \ldots, o_m$ be an optimal solution (also sorted by finish time). We want to show $k = m$.
Lemma (greedy stays ahead). For all $i \le k$, we have $f(g_i) \le f(o_i)$ — the $i$-th greedy activity finishes no later than the $i$-th optimal activity.
Proof by induction on $i$:
- Base case ($i = 1$). The greedy algorithm picks the activity with the earliest finish time, so $f(g_1) \le f(o_1)$.
- Inductive step. Assume $f(g_{i-1}) \le f(o_{i-1})$. Since $o_i$ starts after $o_{i-1}$ finishes, we have $s(o_i) \ge f(o_{i-1}) \ge f(g_{i-1})$. Therefore $o_i$ is compatible with $g_{i-1}$, and the greedy algorithm considers it (or an activity that finishes even earlier). It follows that $f(g_i) \le f(o_i)$.
Theorem. $k = m$. If $m > k$, then by the lemma, $f(g_k) \le f(o_k)$, so $o_{k+1}$ is compatible with $g_k$ and the greedy algorithm would have selected it (or some other compatible activity) — contradicting the fact that greedy stopped at $k$ activities. Therefore $k = m$, and the greedy solution is optimal.
Complexity
- Time: $O(n \log n)$ for sorting, plus $O(n)$ for the single scan. Total: $O(n \log n)$.
- Space: $O(n)$ for the sorted copy and result.
Example
Consider these activities sorted by finish time:
| Activity | Start | Finish |
|---|---|---|
| A | 1 | 4 |
| B | 3 | 5 |
| C | 0 | 6 |
| D | 5 | 7 |
| E | 3 | 9 |
| F | 6 | 10 |
| G | 8 | 11 |
The greedy algorithm proceeds:
- Select A [1, 4). Last finish = 4.
- B starts at 3 < 4 — skip.
- C starts at 0 < 4 — skip.
- D starts at 5 ≥ 4 — select. Last finish = 7.
- E starts at 3 < 7 — skip.
- F starts at 6 < 7 — skip.
- G starts at 8 ≥ 7 — select. Last finish = 11.
Result: {A, D, G} — 3 activities. This is optimal.
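The trace above can be reproduced with a small, self-contained sketch. Here `selectActivities` is a stripped-down stand-in for the chapter's `intervalScheduling` (no tie-breaking, no result record), and the interval values come straight from the table:

```typescript
interface Interval {
  start: number;
  end: number;
}

function selectActivities(intervals: readonly Interval[]): Interval[] {
  // Sort a copy by finish time, then scan once, keeping each
  // activity that starts at or after the last selected finish.
  const sorted = [...intervals].sort((a, b) => a.end - b.end);
  const selected: Interval[] = [];
  let lastEnd = -Infinity;
  for (const iv of sorted) {
    if (iv.start >= lastEnd) {
      selected.push(iv);
      lastEnd = iv.end;
    }
  }
  return selected;
}

// The activity table above, A through G.
const activities: Interval[] = [
  { start: 1, end: 4 },  // A
  { start: 3, end: 5 },  // B
  { start: 0, end: 6 },  // C
  { start: 5, end: 7 },  // D
  { start: 3, end: 9 },  // E
  { start: 6, end: 10 }, // F
  { start: 8, end: 11 }, // G
];
const picked = selectActivities(activities);
```

As in the trace, `picked` contains the three activities ending at 4, 7, and 11 — that is, {A, D, G}.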
Huffman coding
Problem definition
Given an alphabet of $n$ characters, each with a known frequency $f(c)$, find a prefix-free binary code that minimizes the total encoding length:

$$B(T) = \sum_{c} f(c) \cdot d_T(c)$$

where $d_T(c)$ is the depth of character $c$ in the coding tree $T$ (which equals the length of its binary codeword).
A code is prefix-free if no codeword is a prefix of another. This guarantees that encoded text can be decoded unambiguously without delimiters.
Why variable-length codes?
Fixed-length codes (like ASCII) use $\lceil \log_2 n \rceil$ bits per character regardless of frequency. If some characters appear much more often than others, variable-length codes can do better: assign shorter codewords to frequent characters and longer ones to rare characters. This is the principle behind data compression formats like ZIP, gzip, and JPEG.
Huffman's greedy algorithm
David Huffman (1952) discovered that the optimal prefix-free code can be built by a simple greedy procedure:
- Create a leaf node for each character, with its frequency as the key.
- Insert all leaves into a min-priority queue.
- While the queue has more than one node: a. Extract the two nodes $x$ and $y$ with the lowest frequencies. b. Create a new internal node $z$ with $f(z) = f(x) + f(y)$, left child $x$, and right child $y$. c. Insert $z$ back into the queue.
- The remaining node is the root of the Huffman tree.
- Assign code `0` to left edges and `1` to right edges. Each character's codeword is the sequence of bits on the path from the root to its leaf.
```typescript
import { BinaryHeap } from '../11-heaps-and-priority-queues/binary-heap.js';

export type HuffmanNode = HuffmanLeaf | HuffmanInternal;

export interface HuffmanLeaf {
  kind: 'leaf';
  char: string;
  freq: number;
}

export interface HuffmanInternal {
  kind: 'internal';
  freq: number;
  left: HuffmanNode;
  right: HuffmanNode;
}

export function buildHuffmanTree(
  frequencies: ReadonlyMap<string, number>,
): HuffmanNode {
  if (frequencies.size === 0) {
    throw new RangeError('frequency map must not be empty');
  }

  // Special case: single character.
  if (frequencies.size === 1) {
    const [char, freq] = [...frequencies][0]!;
    return { kind: 'leaf', char, freq };
  }

  const heap = new BinaryHeap<HuffmanNode>((a, b) => a.freq - b.freq);
  for (const [char, freq] of frequencies) {
    heap.insert({ kind: 'leaf', char, freq });
  }

  while (heap.size > 1) {
    const left = heap.extract()!;
    const right = heap.extract()!;
    const merged: HuffmanInternal = {
      kind: 'internal',
      freq: left.freq + right.freq,
      left,
      right,
    };
    heap.insert(merged);
  }

  return heap.extract()!;
}
```
The code table is then extracted by a simple tree traversal:
```typescript
export function buildCodeTable(root: HuffmanNode): Map<string, string> {
  const table = new Map<string, string>();

  // Single-character alphabet: give the lone leaf the code '0'.
  if (root.kind === 'leaf') {
    table.set(root.char, '0');
    return table;
  }

  function walk(node: HuffmanNode, prefix: string): void {
    if (node.kind === 'leaf') {
      table.set(node.char, prefix);
      return;
    }
    walk(node.left, prefix + '0');
    walk(node.right, prefix + '1');
  }

  walk(root, '');
  return table;
}
```
Encoding and decoding
Encoding replaces each character with its codeword:
```typescript
export interface HuffmanEncodingResult {
  encoded: string;
  codeTable: Map<string, string>;
  tree: HuffmanNode;
}

export function huffmanEncode(text: string): HuffmanEncodingResult {
  if (text.length === 0) {
    throw new RangeError('text must be non-empty');
  }

  // Count character frequencies in a single pass.
  const frequencies = new Map<string, number>();
  for (const ch of text) {
    frequencies.set(ch, (frequencies.get(ch) ?? 0) + 1);
  }

  const tree = buildHuffmanTree(frequencies);
  const codeTable = buildCodeTable(tree);

  let encoded = '';
  for (const ch of text) {
    encoded += codeTable.get(ch)!;
  }

  return { encoded, codeTable, tree };
}
```
Decoding walks the tree from root to leaf for each bit:
```typescript
export function huffmanDecode(
  encoded: string,
  tree: HuffmanNode,
): string {
  // Single-character alphabet: every bit decodes to that character.
  if (tree.kind === 'leaf') {
    return tree.char.repeat(encoded.length);
  }

  let result = '';
  let node: HuffmanNode = tree;
  for (const bit of encoded) {
    node = bit === '0'
      ? (node as HuffmanInternal).left
      : (node as HuffmanInternal).right;
    if (node.kind === 'leaf') {
      result += node.char;
      node = tree;
    }
  }
  return result;
}
```
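As a sketch of the full pipeline, the following self-contained round trip builds a tree, a code table, an encoding, and a decoding for a short string. A sorted array stands in for the chapter's `BinaryHeap` so the block runs on its own; tie-breaking may differ from `buildHuffmanTree`, but the total encoded length is the same for any optimal tree.

```typescript
type Node =
  | { kind: 'leaf'; char: string; freq: number }
  | { kind: 'internal'; freq: number; left: Node; right: Node };

function buildTree(freqs: Map<string, number>): Node {
  const queue: Node[] = [...freqs].map(
    ([char, freq]) => ({ kind: 'leaf' as const, char, freq }),
  );
  queue.sort((a, b) => a.freq - b.freq);
  while (queue.length > 1) {
    const left = queue.shift()!;
    const right = queue.shift()!;
    const merged: Node = { kind: 'internal', freq: left.freq + right.freq, left, right };
    // Re-insert, keeping the array sorted by frequency.
    const i = queue.findIndex((n) => n.freq > merged.freq);
    queue.splice(i === -1 ? queue.length : i, 0, merged);
  }
  return queue[0]!;
}

function codes(node: Node, prefix = '', table = new Map<string, string>()): Map<string, string> {
  if (node.kind === 'leaf') {
    table.set(node.char, prefix || '0');
  } else {
    codes(node.left, prefix + '0', table);
    codes(node.right, prefix + '1', table);
  }
  return table;
}

const text = 'abracadabra';
const freqs = new Map<string, number>();
for (const ch of text) freqs.set(ch, (freqs.get(ch) ?? 0) + 1);

const tree = buildTree(freqs);
const table = codes(tree);
const encoded = [...text].map((ch) => table.get(ch)!).join('');

// Decode by walking the tree bit by bit, restarting at the root after each leaf.
let decoded = '';
let cur: Node = tree;
for (const bit of encoded) {
  if (cur.kind === 'internal') cur = bit === '0' ? cur.left : cur.right;
  if (cur.kind === 'leaf') {
    decoded += cur.char;
    cur = tree;
  }
}
```

For `abracadabra` (frequencies a:5, b:2, r:2, c:1, d:1) the optimal total is 23 bits, versus 88 bits at 8 bits per character.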
Correctness proof (exchange argument)
We prove that the Huffman algorithm produces an optimal prefix-free code.
Lemma 1. There exists an optimal tree in which the two lowest-frequency characters are siblings at the maximum depth.
Proof. Let $T$ be an optimal tree, and let $x$ and $y$ be the two characters with the lowest frequencies. If they are not at the maximum depth or not siblings in $T$, we can swap them with the characters at maximum depth without increasing the cost: because $x$ and $y$ have the lowest frequencies, moving them deeper cannot increase $B(T)$, and moving more frequent characters to shallower positions can only help.
Lemma 2. Let $T'$ be the tree obtained from $T$ by replacing the subtree containing siblings $x$ and $y$ with a single leaf $z$ of frequency $f(z) = f(x) + f(y)$. Then $B(T) = B(T') + f(x) + f(y)$.
Proof. In $T$, $x$ and $y$ are one level deeper than $z$ is in $T'$, so each contributes one extra bit per occurrence to $B(T)$ — an extra $f(x) + f(y)$ in total.
Theorem. The Huffman algorithm produces an optimal prefix-free code.
Proof by induction on the number of characters :
- Base case ($n = 1$ or $n = 2$). Trivially optimal.
- Inductive step. By Lemma 1, there is an optimal tree where the two lowest-frequency characters are siblings at maximum depth. By Lemma 2, replacing them with a merged node gives a subproblem with $n - 1$ characters. By the inductive hypothesis, Huffman solves the subproblem optimally. Since the merge doesn't affect the relative costs of the remaining characters, the full tree is also optimal.
Complexity
- Time: $O(n \log n)$, where $n$ is the number of distinct characters. Each of the $n - 1$ merge steps involves two heap extractions and one insertion, each $O(\log n)$.
- Space: $O(n)$ for the tree and heap.
- Encoding time: $O(m)$, where $m$ is the length of the input text (after the tree is built).
- Decoding time: $O(b)$, where $b$ is the number of bits in the encoded string.
Example
Consider an alphabet with these frequencies:
| Character | f | a | b | c | d | e |
|---|---|---|---|---|---|---|
| Frequency | 5 | 9 | 12 | 13 | 16 | 45 |
Step-by-step tree construction:
- Extract `f` (5) and `a` (9) → merge into node (14).
- Extract `b` (12) and `c` (13) → merge into node (25).
- Extract (14) and `d` (16) → merge into node (30).
- Extract (25) and (30) → merge into node (55).
- Extract `e` (45) and (55) → merge into root (100).
```
              (100)
             /     \
          e:45     (55)
                  /    \
               (25)    (30)
              /    \   /    \
           b:12  c:13 (14)  d:16
                      /  \
                   f:5    a:9
```
Resulting codes:
| Character | Code | Length |
|---|---|---|
| e | 0 | 1 |
| b | 100 | 3 |
| c | 101 | 3 |
| f | 1100 | 4 |
| a | 1101 | 4 |
| d | 111 | 3 |
Total encoding length: $5 \cdot 4 + 9 \cdot 4 + 12 \cdot 3 + 13 \cdot 3 + 16 \cdot 3 + 45 \cdot 1 = 224$ bits.
A fixed-length code would require $\lceil \log_2 6 \rceil = 3$ bits per character, for a total of $100 \cdot 3 = 300$ bits. Huffman coding saves $76$ bits, a 25% reduction.
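The totals can be checked mechanically; the frequency and code tables below are copied from the example above:

```typescript
// Total bits = Σ freq(c) · len(code(c)).
const freq: Record<string, number> = { f: 5, a: 9, b: 12, c: 13, d: 16, e: 45 };
const code: Record<string, string> = { e: '0', b: '100', c: '101', f: '1100', a: '1101', d: '111' };

let totalBits = 0;
for (const ch of Object.keys(freq)) {
  totalBits += freq[ch]! * code[ch]!.length;
}

// A fixed-width code for 6 symbols needs ceil(log2 6) = 3 bits per symbol.
const totalFreq = Object.values(freq).reduce((s, x) => s + x, 0);
const fixedBits = 3 * totalFreq;
```

Running this gives `totalBits` = 224 and `fixedBits` = 300, matching the hand computation.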
Fractional knapsack
Problem definition
Given $n$ items, each with a weight $w_i$ and a value $v_i$, and a knapsack with capacity $W$, maximize the total value of items placed in the knapsack. Unlike the 0/1 knapsack (Chapter 16), here we may take fractions of items: for each item $i$, we choose a fraction $x_i \in [0, 1]$, subject to:

$$\sum_{i=1}^{n} x_i w_i \le W$$

Maximize:

$$\sum_{i=1}^{n} x_i v_i$$
Why greedy works here (but not for 0/1 knapsack)
The fractional knapsack has the greedy-choice property: we should always take as much as possible of the item with the highest value-per-unit-weight ratio $v_i / w_i$.
For the 0/1 knapsack, this greedy strategy fails. Consider:
- Item A: weight 10, value 60 (ratio 6)
- Item B: weight 20, value 100 (ratio 5)
- Capacity: 20
Greedy by ratio selects item A (ratio 6), getting value 60. But the optimal solution takes item B for value 100. The constraint that items cannot be split breaks the greedy-choice property, which is why the 0/1 knapsack requires dynamic programming.
In the fractional case, if item A doesn't fill the knapsack, we can take part of item B as well — the "fractional freedom" ensures the greedy choice is always safe.
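The contrast can be made concrete with the two-item instance above. This is a minimal sketch: the 0/1 outcomes are worked out by hand in comments, while the fractional value comes from the greedy loop itself.

```typescript
// Instance from the text: capacity 20,
// A = (weight 10, value 60, ratio 6), B = (weight 20, value 100, ratio 5).
const items = [
  { weight: 10, value: 60 },
  { weight: 20, value: 100 },
];
const capacity = 20;

// 0/1 greedy-by-ratio takes A (it fits), after which B no longer fits: value 60.
const greedy01 = 60;
// The 0/1 optimum takes B alone: value 100.
const best01 = 100;

// Fractional greedy: sort by ratio, take as much of each item as fits.
const sorted = [...items].sort((a, b) => b.value / b.weight - a.value / a.weight);
let remaining = capacity;
let fractionalValue = 0;
for (const it of sorted) {
  const take = Math.min(it.weight, remaining);
  fractionalValue += it.value * (take / it.weight);
  remaining -= take;
}
```

The fractional greedy takes all of A and half of B for a value of 110 — strictly better than either 0/1 outcome, illustrating why the greedy choice is safe only when items can be split.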
Algorithm
- Compute the value-to-weight ratio $v_i / w_i$ for each item.
- Sort items by ratio in descending order.
- Greedily take as much of each item as possible until the knapsack is full.
```typescript
export interface FractionalKnapsackItem {
  weight: number;
  value: number;
}

export interface PackedItem {
  index: number;
  fraction: number;
  weight: number;
  value: number;
}

export interface FractionalKnapsackResult {
  maxValue: number;
  totalWeight: number;
  packedItems: PackedItem[];
}

export function fractionalKnapsack(
  items: readonly FractionalKnapsackItem[],
  capacity: number,
): FractionalKnapsackResult {
  if (capacity < 0) {
    throw new RangeError('capacity must be non-negative');
  }

  // Attach original indices and value-to-weight ratios,
  // then sort by ratio in descending order.
  const indexed = items.map((item, i) => ({
    index: i,
    weight: item.weight,
    value: item.value,
    ratio: item.value / item.weight,
  }));
  indexed.sort((a, b) => b.ratio - a.ratio);

  const packedItems: PackedItem[] = [];
  let remaining = capacity;
  let totalValue = 0;
  let totalWeight = 0;

  for (const item of indexed) {
    if (remaining <= 0) break;

    if (item.weight <= remaining) {
      // The whole item fits.
      packedItems.push({
        index: item.index,
        fraction: 1,
        weight: item.weight,
        value: item.value,
      });
      remaining -= item.weight;
      totalValue += item.value;
      totalWeight += item.weight;
    } else {
      // Take the fraction that fills the remaining capacity.
      const fraction = remaining / item.weight;
      const fractionalValue = item.value * fraction;
      packedItems.push({
        index: item.index,
        fraction,
        weight: remaining,
        value: fractionalValue,
      });
      totalValue += fractionalValue;
      totalWeight += remaining;
      remaining = 0;
    }
  }

  return { maxValue: totalValue, totalWeight, packedItems };
}
```
Correctness proof (exchange argument)
Theorem. Sorting by $v_i / w_i$ and greedily packing yields an optimal solution.
Proof. Suppose the items are sorted so that $v_1/w_1 \ge v_2/w_2 \ge \cdots \ge v_n/w_n$. Let $x = (x_1, \ldots, x_n)$ be the greedy solution and $y = (y_1, \ldots, y_n)$ an optimal solution. If $x \ne y$, let $j$ be the first index where they differ. By the greedy algorithm, $x_j$ is as large as possible (either $1$ or whatever fills the remaining capacity), so $x_j > y_j$.
We can increase $y_j$ and decrease some $y_k$ (where $k > j$, so $v_k/w_k \le v_j/w_j$) to compensate. Specifically, shift an amount of weight $\Delta$ from item $k$ to item $j$:

$$\Delta = \min\bigl((x_j - y_j)\,w_j,\; y_k w_k\bigr)$$

The objective changes by $\Delta \left( \frac{v_j}{w_j} - \frac{v_k}{w_k} \right) \ge 0$, so the value does not decrease. Repeating this exchange process transforms $y$ into $x$ without ever decreasing the total value, so $x$ is optimal.
Complexity
- Time: $O(n \log n)$ for sorting, plus $O(n)$ for the greedy scan. Total: $O(n \log n)$.
- Space: $O(n)$ for the indexed copy and result.
Example
| Item | Weight | Value | Ratio |
|---|---|---|---|
| A | 10 | 60 | 6.0 |
| B | 20 | 100 | 5.0 |
| C | 30 | 120 | 4.0 |
Capacity: 50
Greedy packing (sorted by ratio):
- Take all of A: weight 10, value 60. Remaining capacity: 40.
- Take all of B: weight 20, value 100. Remaining capacity: 20.
- Take 20/30 of C: weight 20, value $120 \cdot \frac{20}{30} = 80$. Remaining capacity: 0.
Total value: $60 + 100 + 80 = 240$.
Compare with the 0/1 knapsack (no fractions allowed), where the optimal choice is to take B and C for value $100 + 120 = 220$. The ability to take fractions yields a strictly higher value ($240 > 220$).
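Running the greedy packing on this instance reproduces the total. The `packGreedily` helper below is a compact stand-in for the chapter's `fractionalKnapsack`, returning only the achieved value:

```typescript
interface Item {
  weight: number;
  value: number;
}

function packGreedily(items: readonly Item[], capacity: number): number {
  // Sort by value-to-weight ratio, descending.
  const byRatio = [...items].sort((a, b) => b.value / b.weight - a.value / a.weight);
  let remaining = capacity;
  let total = 0;
  for (const it of byRatio) {
    if (remaining <= 0) break;
    // Take the whole item if it fits, otherwise the fraction that fills the knapsack.
    const take = Math.min(it.weight, remaining);
    total += it.value * (take / it.weight);
    remaining -= take;
  }
  return total;
}

const total = packGreedily(
  [
    { weight: 10, value: 60 },  // A, ratio 6.0
    { weight: 20, value: 100 }, // B, ratio 5.0
    { weight: 30, value: 120 }, // C, ratio 4.0
  ],
  50,
);
```

`total` comes out to 240, matching the worked example (up to floating-point rounding in the fractional term).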
When greedy fails
Not every optimization problem admits a greedy solution. Here are instructive examples where the greedy approach fails:
- 0/1 Knapsack. As shown above, the greedy-by-ratio strategy is suboptimal. The integer constraint destroys the greedy-choice property.
- Longest path in a graph. Greedily choosing the longest edge at each step does not yield the longest path. This problem is NP-hard.
- Optimal BST. Greedily placing the most frequent key at the root does not minimize expected search time. This requires DP (similar to matrix chain multiplication).
The lesson: always prove that the greedy-choice property holds before trusting a greedy algorithm. The proofs in this chapter — "greedy stays ahead" and the exchange argument — are the standard tools for doing so.
Comparison of algorithms in this chapter
| Problem | Strategy | Time | Space | Proof technique |
|---|---|---|---|---|
| Interval scheduling | Sort by finish time | $O(n \log n)$ | $O(n)$ | Greedy stays ahead |
| Huffman coding | Merge lowest-frequency pairs | $O(n \log n)$ | $O(n)$ | Exchange argument |
| Fractional knapsack | Sort by value/weight ratio | $O(n \log n)$ | $O(n)$ | Exchange argument |
Exercises
1. Weighted interval scheduling. In the weighted variant, each activity $i$ has a value $v_i$, and the goal is to maximize the total value (not the count) of selected non-overlapping activities. Show that the greedy algorithm (sort by finish time) does not solve this problem optimally. Design a dynamic programming algorithm that runs in $O(n \log n)$ time.
2. Job scheduling with deadlines. You have $n$ jobs, each taking unit time, with a deadline $d_i$ and a penalty $p_i$ incurred if the job is not completed by its deadline. Design a greedy algorithm that minimizes the total penalty. Prove its correctness.
3. Optimal merge pattern. You have $n$ sorted files of sizes $s_1, s_2, \ldots, s_n$. Merging two files of sizes $a$ and $b$ costs $a + b$. Find the merge order that minimizes the total cost. How does this relate to Huffman coding?
4. Huffman vs fixed-width. Prove that Huffman coding never uses more bits than a fixed-width encoding. Under what conditions does it use the same number of bits?
5. Greedy failure. Consider the coin-change problem with denominations $\{1, 3, 4\}$ and target amount 6. Show that the greedy algorithm (always use the largest denomination that fits) gives a suboptimal solution. What is the optimal solution?
Chapter summary
Greedy algorithms solve optimization problems by making locally optimal choices at each step. They are simpler and typically faster than dynamic programming — often requiring just a sort followed by a linear scan — but they require careful proof that the greedy-choice property holds.
We studied two proof techniques. The greedy stays ahead argument shows that the greedy solution maintains an advantage over any optimal solution at every step, and we applied it to interval scheduling. The exchange argument shows that any optimal solution can be transformed into the greedy solution without loss, and we applied it to Huffman coding and fractional knapsack.
The three problems in this chapter illustrate the range of greedy applications:
- Interval scheduling selects the maximum number of non-overlapping activities by always choosing the one that finishes earliest — an $O(n \log n)$ algorithm.
- Huffman coding produces optimal prefix-free binary codes by repeatedly merging the two lowest-frequency symbols — also $O(n \log n)$.
- Fractional knapsack maximizes value by greedily packing items in order of value-to-weight ratio — $O(n \log n)$.
We also contrasted greedy with DP on the knapsack problem: the fractional variant yields to greedy, while the 0/1 variant requires dynamic programming. Recognizing which problems have the greedy-choice property — and which do not — is a fundamental skill in algorithm design.