Reducibility
Glossary Contents
← Ch V
Chapter VI Set Cover
Ch VII →
  1. 16 — Cover the universe
  2. 17 — Greedy in R
  3. 18 — Where to advertise
Chapter VI

Set Cover

Cover every element of a universe using as few sets as possible. The greedy ln(n)-approximation is one of the most-used algorithms in industry that nobody calls by its name.

16

Cover the universe

Each set covers some elements. Pick a cheap collection of sets that covers them all.

Set cover takes a universe U and a family F of subsets, and asks for the minimum subcollection whose union equals U:

min∣F′∣subject toF′⊆F,  S∈F′⋃​S=U.(1)

It is the covering dual of set packing's packing. The greedy algorithm — repeatedly pick the set that covers the most uncovered elements — achieves an approximation ratio

H(∣U∣)=1+21​+31​+⋯+∣U∣1​≈ln∣U∣.(2)

Feige (1998) proved that the bound (2) is essentially optimal: no polynomial-time algorithm achieves

(1−ε)ln∣U∣(3)

unless P = NP. So the cheap, obvious algorithm is also, mathematically, the best one.

Figures, in plain terms
(1)

min is the dual of max from earlier: pick a choice that minimizes the expression. Here we minimize |F'| — the count of sets we choose.

The constraints are F' ⊆ F (a subfamily of the family, like before) and a new symbol: the big-union ⋃.

The big-∪ is union (combine all elements) repeated: ⋃ S in F' S reads "take the union of every set S in F'." Same idea as the big-AND from chapter 1, but for set union instead of Boolean AND. Plain union of two sets, A ∪ B, gives every element that appears in either side. Stack it across all chosen sets.

The constraint = U demands that union equals the whole universe — every element of U appears in at least one chosen set.

(2)

H(|U|) is the harmonic number of |U|. Spelled out, it's the running sum 1 + 1/2 + 1/3 + ... + 1/|U| — the reciprocals of the integers 1 through |U|.

That sum grows slowly. By a classical fact, H(n) ≈ ln(n) — natural logarithm — with the gap converging to a small constant (Euler–Mascheroni, ~0.577).

So the greedy heuristic uses at most about ln(|U|) times as many sets as the optimal. For a 1,000-element universe, that's roughly 7×; for a million, about 14×. Slow growth, no matter how big the input.

(3)

Same ln(|U|) factor, scaled by (1 − ε) — a number just barely under 1.

The claim of the figure: no polynomial algorithm can guarantee an approximation factor strictly better than ln(|U|), unless P = NP.

In other words, the greedy ln(|U|) bound from (2) is essentially the best you can hope for. Cheap and obvious, also mathematically optimal.

In plain terms

You have a list of items to cover and a list of bundles, where each bundle covers some of the items. Pick the fewest bundles such that every item is covered by at least one of your picks.

Scan to come back to page 16

← 15 Where to put the cameras
1/3 · 16/63
17 Greedy in R →