np hard
1998 United States NP-hard
13

Bunch — Software Module Clustering

Spiros Mancoridis and Brian Mitchell, 1998 — recovering the right modular structure of a codebase reduces to graph partitioning, which is NP-hard. Architecture is search.

Spiros Mancoridis and Brian Mitchell, at Drexel University, with Yih-Farn Chen and Emden Gansner, defined the software module clustering problem and built the Bunch tool to solve it heuristically. Given a module dependency graph — nodes are source files, edges are calls or imports — find a partition of the nodes into clusters that maximizes a modularization quality (MQ) metric: high cohesion within clusters, low coupling between them. This is graph partitioning, NP-hard. Bunch attacks it with hill-climbing and genetic algorithms; later work added simulated annealing and multi-objective Pareto search. Every empirical study confirms the same: optimal modularization is computationally infeasible at industrial scale, and every architectural-recovery tool delivers approximations. The hardness is intrinsic. Choosing where the modules go is a search problem, not an engineering one.

Mancoridis, S., Mitchell, B. S., Rorres, C., Chen, Y., Gansner, E. R. (1998). Using Automatic Clustering to Produce High-Level System Organizations of Source Code. Proceedings of the 6th International Workshop on Program Comprehension, 45–52. Source →