Picking the right subset under constraints

Three of the most-cited NP-complete problems are actually the same coin viewed from three angles.

Three classic NP-complete problems show up in business in slightly different forms. Vertex cover: pick the minimum-size set of items that "touches" every relation. Independent set: pick the maximum-size set with no relations among them. Clique: pick the maximum-size set where every pair is related. All three are equivalent — solving one solves the others. Use cases include minimum-staff coverage (vertex cover), fraud-ring detection (clique), conflict-free scheduling (independent set), security policy auditing (vertex cover). All NP-complete. Vertex cover has a fast 2-approximation (greedy edge picking) and is one of the friendliest hard problems when the optimal cover is small (a parameterized algorithm runs in time exponential only in the cover size). Independent set and clique are much harder to approximate. In Rust, model as integer programming via lp" data-term="good_lp">good_lp for exact answers on moderate instances.

Model minimum vertex cover as an ILP — a binary variable per node, a constraint per edge requiring at least one endpoint covered, an objective summing the variables. Dispatch to HiGHS via good_lp. Exact answers on graphs with thousands of nodes in seconds.

use good_lp::{constraint, default_solver, ProblemVariables, SolverModel, Solution, variable};

fn min_vertex_cover(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut vars = ProblemVariables::new();
    let x: Vec<_> = (0..n).map(|_| vars.add(variable().binary())).collect();
    let mut model = vars
        .minimise(x.iter().sum::<good_lp::Expression>())
        .using(default_solver);
    for &(u, v) in edges {
        model = model.with(constraint!(x[u] + x[v] >= 1));
    }
    let sol = model.solve().expect("ILP solved");
    (0..n).filter(|&i| sol.value(x[i]) > 0.5).collect()
}

variable() returns a builder — a fresh variable specification with no constraints attached yet. Builder methods like .binary(), .integer(), .min(x), .max(x) chain on to refine the spec; each method returns the builder back so the chain can continue. When the spec is ready, hand it to vars.add(...) and you get a real variable handle. This fluent builder pattern is a standard Rust idiom for configuring complex objects without dozens of constructor arguments — you set only what you need, defaults handle the rest, and each call documents itself at the call site. .binary() constrains the variable to take only 0 or 1 — exactly what vertex cover needs ("vertex i is in the cover or not").

// variable() — fresh variable spec, no constraints yet.
let v = variable();

// Chain builder methods to refine.  Each method returns the builder
// back so you can keep chaining:
let v = variable().binary();                       // {0, 1}
let v = variable().integer().min(0).max(100);      // integer in [0, 100]
let v = variable().min(0.0);                        // continuous, ≥ 0
let v = variable().bounds(-10.0..=10.0);            // continuous, [-10, 10]
let v = variable().name("bread");                   // optional name for debugging

// Hand the spec to .add() on the variables container to get a real handle:
let mut vars = ProblemVariables::new();
let bread:    Variable = vars.add(variable().min(0.0).name("bread"));
let in_cover: Variable = vars.add(variable().binary());

// Why this pattern instead of constructor arguments:
//   • You set only what you need — defaults handle the rest.
//   • Each method documents itself at the call site.
//   • The library can grow the spec language without breaking constructors.
//
// Other Rust libraries that use the same pattern:
//   clap         command-line argument parsers
//   reqwest      HTTP clients
//   tokio        runtime configuration
//   nalgebra     statically-sized matrix builders

x.iter().sum::<good_lp::Expression>() adds up every variable in x into a single Expression. Two things make this work. First, good_lp overloads the + operator on Variable to produce an Expression — adding two variables does not compute a number (the variables have no values yet), it builds a symbolic sum. Second, Iterator::sum needs to know what type to accumulate into. With Vec<i32> the answer is obviously i32; with Vec<Variable> the result type is Expression (not Variable), and the compiler cannot guess that. The turbofish ::<good_lp::Expression> from page 11 tells sum exactly which target type to use. Annotating the binding instead of using a turbofish would work equally well.

// good_lp overloads + on Variable to produce a symbolic Expression:
//
//   Variable + Variable     →  Expression
//   Variable + Expression   →  Expression
//   2 * Variable            →  Expression
//   Expression + Expression →  Expression
//
// No arithmetic is performed — the variables don't have values yet.
// The result is a tree representing the algebraic sum.

let x: Vec<Variable> = /* ... */;

// Build the objective: minimise sum of x_i (the cover size).
let sum: good_lp::Expression = x.iter().sum::<good_lp::Expression>();
//                                       ─────────────────────────
//                                       turbofish: tell sum() the target type

// Same thing with type annotation instead of turbofish:
let sum: good_lp::Expression = x.iter().sum();

// Hand the expression to .minimise():
let model = vars.minimise(sum).using(default_solver);

// Why the type hint is needed:
//   Iterator::sum<S> requires <S: Sum<Item>>.
//   For an iter of Variable, the impl Sum<&Variable> for Expression exists,
//   but the compiler doesn't pick a target type on its own — there could
//   in principle be other impls.  Turbofish or annotation disambiguates.
//
// Same shape applies to .collect() (page 14) and .parse() (page 11) —
// any iterator/conversion that can produce more than one type.

constraint! is a macro (the ! is the giveaway from page 19) that parses a Rust-like inequality and produces a Constraint object the solver can add to the model. The whole expression — x[u] + x[v] >= 1 — is parsed by the macro as one piece, not as separate + and >= operations. This is the macro's job: regular function arguments cannot contain a >= in the middle. The body of the loop adds one constraint per edge of the input graph. In vertex cover, an edge is "covered" if at least one of its endpoints is in the cover, so the constraint x[u] + x[v] >= 1 translates directly: with binary variables, the sum is 0, 1, or 2, and requiring it to be >= 1 is the same as saying "at least one endpoint is chosen." Combined with the minimise-the-sum objective, the solver hunts for the smallest set of vertices that covers every edge — exactly the minimum vertex cover.

// constraint! is a macro — the trailing ! is the marker (page 19).
// It parses an inequality expression in one piece.

for &(u, v) in edges {
    model = model.with(constraint!(x[u] + x[v] >= 1));
}
//
//              ┌──── u and v are graph indices (the edge's endpoints)
//              ▼
//   constraint!( x[u] + x[v] >= 1 )

// What it means for vertex cover, enumerated:
//
//   x[u] = 0,  x[v] = 0    →  sum = 0    fails  ✗   edge not covered
//   x[u] = 1,  x[v] = 0    →  sum = 1    passes ✓   u in the cover
//   x[u] = 0,  x[v] = 1    →  sum = 1    passes ✓   v in the cover
//   x[u] = 1,  x[v] = 1    →  sum = 2    passes ✓   both endpoints
//
// "At least one endpoint must be in the cover."  Combined with the
// minimise-sum objective, the solver returns the smallest set of
// vertices that covers every edge.

// One edge → one constraint.  A graph with 1000 edges → 1000 constraints.
// Modern MIP solvers (HiGHS) handle thousands of constraints in seconds.

// The same shape encodes many other "at least / at most / exactly k of these":
constraint!(x[a] + x[b] + x[c] >= 2);    // at least 2 of {a, b, c} chosen
constraint!(x[a] + x[b] + x[c] <= 1);    // at most 1 of them chosen
constraint!(x[a] + x[b] + x[c] == 1);    // exactly 1 of them chosen

sol.value(x[i]) reads the solver's final value for variable x[i]. Even though x[i] was declared .binary(), the solver returns the result as an f64 — a 64-bit floating-point number — because the underlying linear-programming and branch-and-bound algorithms operate in continuous arithmetic and only commit to integer values at branch decisions. The returned value should be essentially 0.0 or 1.0, but floating-point precision means the literal value might be 0.99999999998 or 0.00000000002. Comparing against 0.5 is the safe, idiomatic way to read "which side did the variable land on" — anything above 0.5 is the 1 case, anything below is the 0 case. This pattern is standard with every numerical optimization solver — exact float equality (== 1.0) is fragile and skips legitimate values like 0.999999999998.

// sol.value(x[i]) — read the solver's value for x[i] as an f64.
//
//   If x[i] was declared .binary(), the solver guarantees the value is
//   "essentially" 0 or 1.  In practice it might be:
//
//     0.0000000000023      ← effectively 0
//     0.9999999999987      ← effectively 1
//
//   Tiny deviations from exact 0/1 come from floating-point precision
//   in the solver's internal arithmetic, not from a bug in the model.


// The idiomatic comparison is against 0.5:
let cover: Vec<usize> = (0..n)
    .filter(|&i| sol.value(x[i]) > 0.5)
    //                              ───
    //                              "did this variable land on the 1 side?"
    .collect();


// Why 0.5 and not == 1.0?
//
//   value > 0.5      reads cleanly as "value is closer to 1 than to 0."
//                    robust to ±1e-9 numerical drift.
//
//   value == 1.0     exact float comparison, fragile.
//                    0.999999999998 == 1.0 is FALSE, so this would skip
//                    legitimate cover members.


// Same trick for integer variables — if x is supposed to be 7, the
// idiomatic read is .round() or a tolerance check:
let exact = sol.value(x).round() as i64;            // common pattern
let close = (sol.value(x) - 7.0).abs() < 1e-6;       // tolerance check

// Floating-point equality is almost always wrong for solver outputs.
// Pick a tolerance and stick with it.

In plain terms

Vertex cover, independent set, and clique are three NP-complete problems that look different but are the same problem in disguise. Each one is a way of picking the right subset of items from a network of relationships.

Vertex cover: pick the smallest set of items such that every relationship has at least one of its endpoints in the set. Business use: minimum-staff coverage (cover every shift with the fewest people), security policy auditing (cover every required permission with the fewest roles), influencer marketing (reach every community with the fewest seeders).

Independent set: pick the largest set of items with no relationships among them. Business use: conflict-free scheduling (the maximum set of jobs that can run simultaneously without competing for the same resource), team formation (the largest group of people with no interpersonal conflicts).

Clique: pick the largest set of items all mutually related. Business use: fraud-ring detection (the largest group of accounts all transacting with each other), tightly-knit community discovery, sales territory consolidation.

All three are equivalent — a fast algorithm for any one solves all three. Vertex cover and independent set are complements (a set is independent if and only if everything else is a vertex cover). Clique becomes independent set when you flip the graph (replace every edge with non-edge and vice versa). None of them has a known fast algorithm.

The approximability picture is different across the three. Vertex cover has a simple algorithm that gets within a factor of two of optimal — pick any uncovered edge, add both endpoints to the cover, repeat. Twenty lines of code, two-times-optimal guarantee. Independent set and clique are much worse — no constant-factor approximation exists unless P = NP.

Vertex cover is also one of the most parameterized-friendly NP-complete problems. When the optimal cover is small (parameter k), the work grows exponentially only in k, not in the input size. For a graph with millions of nodes and a cover of size twenty, exact solutions are fast.

In Rust, model as integer programming via lp" data-term="good_lp">good_lp and dispatch to HiGHS for exact solutions on moderate instances. For larger or worst-case instances, accept approximation or use a SaaS.

When you see a "pick the best subset" problem, it is probably one of these three. Treat the estimate accordingly.