Perl is at home in the world of "match this pattern in that file." Hitting set on a corpus of regex requirements is a Perl-shaped problem.
01
Read the family from stdin: each line a set, members space-separated. Perl's hash-of-hashes makes the count-by-element pattern a one-liner.
#!/usr/bin/perl
use strict;
use warnings;
my @family;
while (my $line = <STDIN>) {
chomp $line;
my %set = map { $_ => 1 } split /\s+/, $line;
push @family, \%set;
}
02
Greedy step: count, across all uncovered sets, how many each element hits. Pick the element with the highest count. This is the classic ln(n)-approximation, the same one as for set cover.
sub pick_best {
my @open = @_;
my %count;
for my $set (@open) {
$count{$_}++ for keys %$set;
}
my ($best) = sort { $count{$b} <=> $count{$a} } keys %count;
return $best;
}
03
Loop: while uncovered sets remain, pick the best element, add it to the hitting set, drop every set it hits. Print the running cover.
my @open = @family;
my @hit;
while (@open) {
my $e = pick_best(@open);
last unless defined $e;
push @hit, $e;
@open = grep { !exists $_->{$e} } @open;
}
print "hitting set: @hit\n";
04
Drive it on a small example. The output is a set of elements that touches every input set — within an ln(n) factor of the minimum.
$ echo 'a b c
b d
c e
a e' | perl hitting.pl
hitting set: a b c