N or P
Contents Glossary
06

Sorting — almost never your problem

The first problem any programmer learns, and one of the very few where the standard library is always the right answer.

Sorting a list — by date, by price, by score, by any key — is one of the most studied problems in computing. The standard library in every modern language ships an implementation that is faster than anything your team will write. Rust's standard sort handles millions of items per second per core, is well-tested, and degrades gracefully on adversarial inputs. When sorting becomes a performance question — typically only at the scale of billions of items — the answer is to use a specialized data layout (radix sort for fixed-width keys, external sort for data that does not fit in memory) or a database that does the work for you. Almost no business has a sorting problem that is not already solved. If a ticket on your roadmap is "implement custom sorting," ask whether the real ticket is something else.

  1. 01
    sort_unstable_by_key is the right default — faster than sort for most workloads, allocates nothing, and is what std ships. Reach for rayon::par_sort_unstable_by when the slice is large enough that the parallel split pays for itself.
    use rayon::slice::ParallelSliceMut;
    
    struct Order { id: u64, price_cents: u64, placed_at: u64 }
    
    fn newest_first(orders: &mut [Order]) {
        orders.sort_unstable_by_key(|o| std::cmp::Reverse(o.placed_at));
    }
    
    fn newest_first_parallel(orders: &mut [Order]) {
        orders.par_sort_unstable_by_key(|o| std::cmp::Reverse(o.placed_at));
    }
  2. 02
    "Unstable" here is a property of the sorting algorithm, not Rust's unsafe keyword. A stable sort preserves the original relative order of equal elements; an unstable sort may reorder them. Rust's sort() is stable (a Timsort variant) and allocates scratch space on the heap. sort_unstable() uses pdqsort (a tuned quicksort) — faster, allocates nothing, but two Orders with the same placed_at might come out in either order. For newest-first ranking, nothing in the schema cares which of two same-timestamp orders appears first, so sort_unstable is the right default. Reach for plain sort only when you actually rely on equal-key stability — for example, when an earlier sort already established a secondary order you want to preserve.
    // "unstable" here = the sorting algorithm may reorder equal elements.
    // It has nothing to do with Rust's `unsafe` keyword.
    
    let mut orders = vec![
        Order { id: 1, placed_at: 100, /* .. */ },   // input order: 1 before 2
        Order { id: 2, placed_at: 100, /* .. */ },
        Order { id: 3, placed_at:  50, /* .. */ },
    ];
    
    orders.sort_by_key(|o| o.placed_at);
    // → ids: [3, 1, 2]   stable: id 1 stays before id 2 because they tie
    
    orders.sort_unstable_by_key(|o| o.placed_at);
    // → ids: [3, 1, 2] or [3, 2, 1]   either is allowed — they tie
    
    // Rule of thumb:
    //   sort()           ← stable, allocates, slightly slower
    //   sort_unstable()  ← unstable, no allocation, fastest default
  3. 03
    Separately, safe vs unsafe Rust is a different axis entirely. All of the code on this page is safe Rust — the compiler proves at compile time that there are no use-after-free, no out-of-bounds indexing, no data races. The unsafe keyword exists for the small minority of cases where you need to opt out of those checks (dereferencing raw pointers, calling C functions, building a new data structure whose invariants the compiler cannot see). An unsafe block does not turn off the type system — it adds five specific superpowers, and the burden is on you to prove they are used correctly. Nothing in this book uses unsafe. The whole point of the language is that high-performance code (like sort_unstable or rayon's parallel sort) is achievable without it.
    // Safe Rust — what every example in this book uses.
    // Compiler guarantees: no use-after-free, no data races,
    // no out-of-bounds indexing, no null-pointer dereferences.
    fn ok(orders: &mut [Order]) {
        orders.sort_unstable_by_key(|o| o.placed_at);
    }
    
    // Unsafe Rust — used inside std and a few low-level crates.
    // You write `unsafe` to take responsibility for invariants
    // the compiler cannot check.  Five things become legal:
    //   1. dereference a raw pointer
    //   2. call an unsafe function
    //   3. access a mutable static
    //   4. implement an unsafe trait
    //   5. access fields of a union
    fn raw_example(ptr: *const u32) -> u32 {
        unsafe { *ptr }   // YOU promise ptr is valid and aligned
    }
    
    // "unstable sort" and "unsafe Rust" sound related and are not.
    // sort_unstable is 100% safe.
Knuth, D. (1998) The Art of Computer Programming, Volume 3. Sedgewick, R. (2011) Algorithms. Source →
In plain terms

Sorting is the running example in every introductory programming course. Generations of students have implemented bubble sort, insertion sort, merge sort, and quicksort as homework. The result is that every working engineer thinks they understand sorting and almost none of them should ever write a sorting routine in production code.

The standard library implementations are the product of decades of tuning. They handle the cases your team will not think about — already-sorted inputs, mostly-sorted inputs, inputs with many duplicates, inputs that fit in cache versus inputs that do not, inputs with expensive comparison functions versus cheap ones. Beating a standard library sort on a generic workload is a multi-week project that almost no team has a business reason to attempt.

There is a theoretical limit to comparison-based sorting that you should know exists — the work grows in proportion to the input times the logarithm of the input. This means sorting a list ten times larger takes roughly eleven times as long, not a hundred times. It scales predictably. Your engineers cannot do better than this without exploiting some special structure of the keys.

The exceptions worth knowing. When your keys are bounded-width integers or short strings, sorting can be done in time proportional to just the input — radix sort and counting sort skip the comparison limit entirely. When the data does not fit in memory, you need an external sort algorithm, which databases handle natively. When you need to pick the top K items rather than sort everything, the standard library has a partial-sort function that runs in linear time.

If a ticket on the roadmap says "improve our sorting performance," the questions to ask are: are we using the standard library, are we sorting more than we need to, and is the right answer actually to push the sort into the database. Almost always the answer is one of those three.

You do not have a sorting problem. You have a planning problem.

← 05 What if the line collapsed
6/39
07 Traversing a network →