N or P
Contents Glossary
16

Dependency order — when the graph has no cycles

No cycles, no problem. DAG-shaped data unlocks an enormous toolbox of fast algorithms.

Directed acyclic graphs (DAGs) are the data shape of build systems, prerequisite chains, version histories, computation pipelines, and most workflow systems. The defining property is "no cycles" — you can list the items in an order where every dependency comes before what depends on it. Computing that order is linear time (topological sort). And once you have it, an enormous range of problems that are hard on general graphs become trivial: longest paths, shortest paths with negative costs, counting completions, computing expected costs. All linear passes over the topological order. Rust's petgraph has topological sort as one call. When your problem lives on a DAG, you are in the cheap category and your team should ship the feature in days.

  1. 01
    petgraph's toposort returns the build order or an error pointing at a cycle. Once you have the order, every NP-hard graph problem that becomes easy on DAGs — longest path, counting completions, critical path — is one linear scan over the sorted list.
    use petgraph::algo::toposort;
    use petgraph::graph::{DiGraph, NodeIndex};
    
    fn build_order(deps: &DiGraph<&str, ()>) -> Result<Vec<NodeIndex>, NodeIndex> {
        toposort(deps, None).map_err(|cycle| cycle.node_id())
    }
  2. 02
    .map_err(f) is the error-side counterpart of .map(f). Where .map(f) transforms the success value of a Result (Ok(t) → Ok(f(t))), .map_err(f) transforms the error value (Err(e) → Err(f(e))). Ok cases pass through unchanged. Here, toposort returns Result<Vec<NodeIndex>, Cycle> — a rich error type carrying one node from the cycle plus internal bookkeeping. The wrapper simplifies that to just the offending node's index (NodeIndex), which is all most callers actually need. Use map_err whenever you want to swap one error type for another without touching the Ok path — typically when a library's error type is richer than your own API needs to expose.
    // .map_err(f) — transform only the Err side of a Result.
    //
    //   Result<T, E1>            .map_err(|e| f(e))            Result<T, E2>
    //
    //   Ok(t)              ────►  passes through        ────►  Ok(t)
    //   Err(e)             ────►  Err(f(e))             ────►  Err(f(e))
    
    let parsed: Result<i32, std::num::ParseIntError> = "abc".parse();
    
    let with_message: Result<i32, String> =
        parsed.map_err(|e| format!("bad number: {}", e));
    
    // Page 16's use — narrow a rich error type to a simpler one the API exposes:
    fn build_order(deps: &DiGraph<&str, ()>) -> Result<Vec<NodeIndex>, NodeIndex> {
        toposort(deps, None)                            // Result<Vec<NodeIndex>, Cycle>
            .map_err(|cycle| cycle.node_id())            // Result<Vec<NodeIndex>, NodeIndex>
    }
    
    // The Result-transformer family:
    //   .map(f)       →  Ok side: Ok(t)  → Ok(f(t))
    //   .map_err(f)   →  Err side: Err(e) → Err(f(e))
    //   .and_then(f)  →  Ok side: Ok(t)  → run f and return its Result   (chain fallible ops)
    //   .or_else(f)   →  Err side: Err(e) → run f and return its Result   (try a recovery)
  3. 03
    toposort's second argument is an optional pre-allocated workspace (a DfsSpace). Passing None says "I have no workspace — allocate one yourself, throw it away when done." That is the right answer for almost every caller. If you were running toposort in a hot loop where the allocation cost mattered, you would build a DfsSpace once and pass Some(&mut space) to every call — the library reuses the scratch buffers across invocations instead of allocating fresh each time. This "bring your own buffer" pattern is common across performance-sensitive Rust libraries (petgraph, nalgebra, regex). It is the library author's way of giving the caller a knob to turn without forcing every caller to think about it.
    // toposort signature (simplified):
    //
    //   fn toposort<G>(
    //       g: G,
    //       space: Option<&mut DfsSpace<G::NodeId, G::Map>>,
    //   ) -> Result<Vec<G::NodeId>, Cycle<G::NodeId>>;
    //
    //   ──────────  ─────────────────────────────────────
    //      graph    optional pre-allocated workspace
    
    // Casual usage — no workspace, library allocates internally:
    let order = toposort(&deps, None)?;
    
    // Hot-path usage — reuse one workspace across many calls:
    use petgraph::algo::DfsSpace;
    
    let mut space = DfsSpace::new(&deps);
    for graph in many_graphs {
        let order = toposort(graph, Some(&mut space))?;
        //                          ────────────────
        //                          same buffer reused; no fresh alloc per call
    }
    
    // The "bring your own buffer" pattern:
    //   None              →  ergonomic default, library handles allocation
    //   Some(&mut buf)    →  caller controls allocation; reuse across calls
Kahn, A. (1962) Topological Sorting of Large Networks. Source →
In plain terms

A directed acyclic graph — or DAG — is a flow of dependencies with no circles. Build systems are DAGs. Course prerequisites are DAGs. Software dependency graphs are DAGs. Workflow engines, data pipelines, computation graphs in machine learning, version control histories — all DAGs.

The structural property that makes DAGs cheap is the topological order. Because there are no cycles, you can list every item in a sequence such that all of its dependencies appear earlier in the sequence. Computing this ordering takes time proportional to the size of the graph.

The reason this matters for planning is that an enormous number of business problems are easy on DAGs and hard on general graphs. The longest-path problem on a general graph is NP-complete. On a DAG it is linear. The shortest-path problem with negative edges takes the slower Bellman-Ford algorithm on general graphs. On a DAG it is one pass. Counting the number of completions of a workflow, computing the expected duration of a project, finding the critical path through a Gantt chart — all linear on DAGs.

The estimating discipline: when a problem involves dependencies, ask whether the dependencies could ever form a cycle. If the answer is no — and it usually is, for build systems, prerequisite chains, and workflows — the problem is on a DAG and is cheap. If the answer is yes, you are on a general graph and the toolbox shrinks.

In Rust, petgraph's toposort function returns the ordering or an error if a cycle exists. Combined with a hand-written loop over the order, your team can solve almost any DAG question in a day. The work is in defining the graph correctly; the algorithm is free.

Many production systems have hidden DAG structure that is not exploited. When a feature request looks expensive, check whether the underlying data is acyclic. Often it is, and the work moves from a quarter to a sprint.

← 15 Either-or constraints
16/39
17 Searching for text patterns →