Chapter VI Set Cover

Cover the universe

Each set covers some elements. Pick a cheap collection of sets that covers them all.

Set cover takes a universe U and a family F of subsets, and asks for the minimum subcollection whose union equals U:

min ∣ F^{'} ∣ subject to F^{'} \subseteq F, S \in F^{'} ⋃ S = U . (1)

It is the covering dual of set packing's packing. The greedy algorithm — repeatedly pick the set that covers the most uncovered elements — achieves an approximation ratio

H (∣ U ∣) = 1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{∣ U ∣} \approx ln ∣ U ∣. (2)

Feige (1998) proved that the bound (2) is essentially optimal: no polynomial-time algorithm achieves

(1 - ε) ln ∣ U ∣ (3)

unless P = NP. So the cheap, obvious algorithm is also, mathematically, the best one.

Figures, in plain terms

(1)

min is the dual of max from earlier: pick a choice that minimizes the expression. Here we minimize |F'| — the count of sets we choose.

The constraints are F' ⊆ F (a subfamily of the family, like before) and a new symbol: the big-union ⋃.

The big-∪ is union (combine all elements) repeated: ⋃ S in F' S reads "take the union of every set S in F'." Same idea as the big-AND from chapter 1, but for set union instead of Boolean AND. Plain union of two sets, A ∪ B, gives every element that appears in either side. Stack it across all chosen sets.

The constraint = U demands that union equals the whole universe — every element of U appears in at least one chosen set.

(2)

H(|U|) is the harmonic number of |U|. Spelled out, it's the running sum 1 + 1/2 + 1/3 + ... + 1/|U| — the reciprocals of the integers 1 through |U|.

That sum grows slowly. By a classical fact, H(n) ≈ ln(n) — natural logarithm — with the gap converging to a small constant (Euler–Mascheroni, ~0.577).

So the greedy heuristic uses at most about ln(|U|) times as many sets as the optimal. For a 1,000-element universe, that's roughly 7×; for a million, about 14×. Slow growth, no matter how big the input.

(3)

Same ln(|U|) factor, scaled by (1 − ε) — a number just barely under 1.

The claim of the figure: no polynomial algorithm can guarantee an approximation factor strictly better than ln(|U|), unless P = NP.

In other words, the greedy ln(|U|) bound from (2) is essentially the best you can hope for. Cheap and obvious, also mathematically optimal.