Searching for patterns in text — substrings, regular expressions, dictionaries of keywords — is one of the most engineered problems in computing. Industrial implementations scan gigabytes of text per second on a single CPU core. The libraries handle every edge case — Unicode, multi-pattern, anchored, case-insensitive, with SIMD acceleration. Rust's aho-corasick library handles multi-matching" data-term="pattern matching">pattern matching (finding any of thousands of keywords in a body of text). Rust's regex library handles regular expressions and is the fastest in widespread use. Rust's memchr library handles single-byte and short-pattern scanning with SIMD. Your team should never write their own pattern matcher. The only flavor that is genuinely hard is regular expressions with backreferences (PCRE-style) — those can be made to run exponentially slow. Rust's regex library deliberately excludes them.
17
Searching for text patterns
Pattern matching in text is a solved category. Reach for the library; the library is faster than anything your team will write.
-
aho-corasickcompiles a list of keywords into a single automaton, then scans a body of text in one pass. Ideal for content moderation, spam filtering, intrusion detection — anywhere you need many needles in one haystack.use aho_corasick::AhoCorasick; fn flag(text: &str, banned: &[&str]) -> Vec<(usize, String)> { let ac = AhoCorasick::new(banned).expect("valid patterns"); ac.find_iter(text) .map(|m| (m.start(), banned[m.pattern()].to_string())) .collect() } -
&[&str]is a slice of string slices, taken by reference. Three layers nested together:&stris one string slice (a view into bytes, page 4),[&str]is a sequence of those, and the outer&borrows the sequence. Reading it inside-out: "a borrowed view of a sequence of borrowed string views." This is the most permissive shape for "a list of read-only strings I want to scan." Callers can pass an array literal directly (&["foo", "bar"]), or borrow aVec<&str>(&patterns). Taking&[&str]instead ofVec<String>means the function does not demand ownership of the patterns and the caller does not have to allocate. It is the page-14 rule — parameters borrow, not own — applied to a list.// &[&str] decoded: // // & [ &str ] // ─ ─ ──── // │ │ │ // │ │ └── &str — one string slice (borrowed view of bytes) // │ └──── [&str] — a sequence of string slices // └────── & — a borrow of that sequence // // "A borrowed view of a sequence of borrowed string views." fn flag(text: &str, banned: &[&str]) -> Vec<(usize, String)> { /* ... */ } // Every one of these calls works: flag(input, &["secret", "password"]); // array literal — most ergonomic flag(input, &vec!["secret", "password"]); // Vec<&str> coerces to &[&str] let patterns: Vec<&str> = lines.iter().map(|l| l.trim()).collect(); flag(input, &patterns); // borrow an existing Vec<&str> // Contrast with a shape that forces every caller to allocate: fn flag_owned(text: &str, banned: Vec<String>) { /* ... */ } flag_owned(input, vec!["secret".to_string(), "password".to_string()]); // .to_string() per literal — wasteful // One subtle catch — &[String] does NOT auto-coerce to &[&str]: let owned: Vec<String> = vec!["secret".into(), "password".into()]; flag(input, &owned); // ✗ type mismatch // // Fix: convert with .iter().map(String::as_str).collect::<Vec<&str>>(). // Or accept the most flexible shape: &[impl AsRef<str>]. // For most flagging APIs, &[&str] is the right default — literals are // the common case. - The return type is
Vec<(usize, String)>— each tuple holds the byte offset where the match started and a freshStringcopy of the banned pattern. WhyStringinstead of&str? Returning&strwould work but only with an explicit lifetime tying the return value to thebannedslice: the result would point into the input, and the caller would have to keepbannedalive for as long as they held the result. ReturningStringallocates a small fresh copy per match (a handful of bytes each, usually cheap), and the result is self-contained — callers can store it, dropbanned, send the result across threads, serialize it to JSON. This is the page-14 rule in action: return new text the caller should own (String); return a view tied to inputs (&strwith a lifetime). When the lifetime would make the API noticeably worse for downstream consumers,Stringis the right tradeoff.// What we have — self-contained result: fn flag(text: &str, banned: &[&str]) -> Vec<(usize, String)> { let ac = AhoCorasick::new(banned).expect("valid patterns"); ac.find_iter(text) .map(|m| (m.start(), banned[m.pattern()].to_string())) // └─ allocates a fresh String per match .collect() } // Zero-allocation alternative — &str views tied to banned's lifetime: fn flag_borrowed<'a>(text: &str, banned: &'a [&'a str]) -> Vec<(usize, &'a str)> { let ac = AhoCorasick::new(banned).expect("valid patterns"); ac.find_iter(text) .map(|m| (m.start(), banned[m.pattern()])) // └─ no allocation; lifetime tied to banned .collect() } // Tradeoffs: // // Vec<(usize, String)> self-contained. caller can drop banned. // JSON-serialisable, thread-shippable, store anywhere. // one small allocation per match. // // Vec<(usize, &'a str)> zero allocation per match. // caller must keep banned alive while result is in use. // every consumer downstream has to think about lifetimes. // // For a flagging API where results get logged, stored, or sent to a worker, // the String variant is the friendlier shape. The cost is a few bytes // copied per hit — usually invisible. - Worth pausing on the word borrow, since it has been doing a lot of work for the last ten pages. It is a deliberate metaphor in Rust, not just casual jargon. Every value in a Rust program has exactly one owner; when the owner goes out of scope, the value is dropped — memory freed, file handles closed, locks released. A borrow is what the English word suggests: temporary access to a value without transferring ownership. You hand someone your book; they read it; they give it back; you still own it. Two flavors: a shared borrow
&Tlets any number of readers see the value at once but lets nobody change it (many people reading the same poster). An exclusive borrow&mut Tgives one writer permission to mutate the value, and while it exists nobody else — not even the original owner — can read or write that value (one person editing a document). The compiler enforces this "shared XOR mutable" rule at compile time, which is what the borrow checker does. "Fighting the borrow checker" is the universal Rust-learner experience of running headlong into the rule. The rules are stricter than other languages. The payoff is that at runtime your program has no data races, no use-after-free, no iterator-invalidation crashes — every one of those bugs is turned into a compile error.// ── Ownership: every value has exactly one owner ────────────── let s = String::from("hello"); let t = s; // ownership MOVES from s to t // println!("{}", s); // error: borrow of moved value `s` // ── Borrow: temporary access, no ownership transfer ─────────── let s = String::from("hello"); let r = &s; // r BORROWS s — s still owns the bytes println!("{} {}", s, r); // both readable; the data has not moved // ── Two flavors ─────────────────────────────────────────────── // 1. Shared borrows (&T) — many readers, no writers. let s = String::from("hello"); let a = &s; let b = &s; // fine — many shared borrows can coexist println!("{} {} {}", s, a, b); // 2. Exclusive borrows (&mut T) — one writer, no other readers. let mut s = String::from("hello"); let m = &mut s; m.push_str(", world"); // can mutate through &mut // println!("{}", s); // error: cannot use s while m exists println!("{}", m); // ok — m is the one allowed reference // ── The rule the borrow checker enforces ───────────────────── // // At any point in the program, for any value, you may have EITHER: // // • any number of shared borrows (&T) OR // • exactly one exclusive borrow (&mut T) // // Never both. Never two &mut at once. // let mut s = String::from("hello"); let a = &s; // shared borrow let b = &mut s; // error: cannot also borrow as mutable println!("{}", a); // ── "Fighting the borrow checker" — and how not to ─────────── // // The fix is never to disable the checker. The fix is to restructure // so the lifetimes are clearer: // // • finish using the shared borrow, THEN take the mutable borrow // • clone the data if two independent owners are actually what you want // • split the data so two halves can be borrowed independently // • thread the data through one owner instead of sharing // // What you buy by accepting the rules — at runtime, Rust programs do not have: // • data races (two threads writing the same value) // • use-after-free (reading freed memory) // • iterator invalidation (modifying a collection while iterating it) // // All three are turned into compile errors. The borrow checker is // the part of the compiler that does that work.