Convex optimization extends linear programming with curved objectives, as long as the curve goes the right way. "Convex" means the objective looks like a bowl — every local minimum is the global minimum. This structure makes the problem solvable in polynomial time, even when the objective is not linear. Most regularized statistical models (ridge regression, lasso, support vector machines, logistic regression) are convex. Portfolio optimization with risk penalties is convex. Many control problems are convex. Rust's clarabel library is a production-grade convex solver. When the problem is convex, the engineering estimate is days. When the problem is non-convex (most deep learning training), the toolkit changes to heuristics with no global guarantees. Knowing whether your objective is convex changes the engineering plan.
20
Optimization with curves instead of lines
Convex optimization is the broadest class of optimization that stays in the cheap category. Most machine learning training fits here.
- Ridge regression is the easiest convex problem to recognize — a quadratic loss plus an L2 penalty. The closed-form solution is one linear-system solve;
nalgebrahandles it. For larger problems the same setup goes toargminorclarabelwithout any change of shape.use nalgebra::{DMatrix, DVector}; fn ridge(x: &DMatrix<f64>, y: &DVector<f64>, lambda: f64) -> DVector<f64> { let n = x.ncols(); let xt = x.transpose(); let a = &xt * x + DMatrix::identity(n, n) * lambda; let b = xt * y; a.lu().solve(&b).expect("ridge is always solvable for lambda > 0") } -
DMatrix<T>andDVector<T>come from the nalgebra crate. The leading "D" stands for dynamic — the dimensions (rows × columns for a matrix, length for a vector) are decided at runtime, and the storage lives on the heap. Both let you write matrix algebra with operator overloading (+,-,*on whole matrices) the way the math reads. nalgebra also ships statically-sized variants —Matrix3<T>,Vector3<T>,SMatrix<T, R, C>— where the dimensions are encoded in the type and the storage lives inline (no allocation, slightly faster). Use the D-variants when the size depends on input data (a regression whose feature count comes from the data); use the static variants when the dimensions are fixed up front (a 3D position, a 4×4 transform, a small kernel).use nalgebra::{DMatrix, DVector}; // DMatrix — dynamically-sized matrix, heap-allocated: let x: DMatrix<f64> = DMatrix::from_row_slice(3, 2, &[ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, ]); // ─ ─ // │ └── 2 columns // └───── 3 rows // // Three samples (rows), two features per sample (columns). // DVector — dynamically-sized column vector, heap-allocated: let y: DVector<f64> = DVector::from_vec(vec![1.0, 2.0, 3.0]); // ───────────── // 3 elements // // One target per sample, three samples total. // Operations look like the math: let xt: DMatrix<f64> = x.transpose(); // X^T let xtx: DMatrix<f64> = &xt * &x; // X^T X (matrix product) let xty: DVector<f64> = &xt * &y; // X^T y (matrix · vector) let scaled: DMatrix<f64> = &x * 2.0; // scalar multiplication // Statically-sized variants — dimensions in the type, no allocation: use nalgebra::{Matrix3, Vector3}; let m: Matrix3<f64> = Matrix3::new( 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, ); let v: Vector3<f64> = Vector3::new(1.0, 2.0, 3.0); // When to use which: // // DMatrix / DVector size depends on input data. // regression, dynamic graphs, runtime-shaped maps. // // Matrix3, Vector3, size fixed at compile time. // SMatrix<T, R, C> 3D geometry, fixed transforms, small kernels. -
DMatrix::identity(n, n)constructs the n×n identity matrix — an n×n matrix with1.0on the diagonal and0.0everywhere else. The name comes from its algebraic role: multiplying any matrix or vector by the identity leaves it unchanged.I × A = A, the same way multiplying a number by 1 leaves it unchanged. The identity is the matrix-algebra equivalent of the number1in ordinary arithmetic. Here it appears in the ridge-regression formula: addingλItoX^T Xis the regularization term. Without it,X^T Xcan be singular (no inverse) and the linear system has no unique solution. Adding a small multiple of the identity bumps every diagonal entry up byλ, which guarantees a clean solve and simultaneously penalizes large coefficient values. Smallλ≈ ordinary least squares; largeλshrinks the solution toward zero.// The 3×3 identity: // // ┌ 1 0 0 ┐ // I_3 = │ 0 1 0 │ // └ 0 0 1 ┘ // // In nalgebra: let i: DMatrix<f64> = DMatrix::identity(3, 3); // The defining property — multiplying by identity changes nothing: let a: DMatrix<f64> = DMatrix::from_row_slice(3, 3, &[ 2.0, 1.0, 3.0, 4.0, 0.0, 1.0, 1.0, 1.0, 1.0, ]); let same = &i * &a; // == a let same = &a * &i; // == a // On page 20, identity is the regularizer for X^T X: // // without regularization: a = X^T X // X^T X can be singular — no inverse exists, // and the linear system has no unique solution. // // ridge regularization: a = X^T X + λI // every diagonal entry bumped up by λ. // guarantees a clean solve, and the λ knob // trades off fit against coefficient size. // The "1" of matrix algebra. Just as x * 1 == x in ordinary arithmetic, // A · I == A in matrix algebra. Identity is the multiplicative identity // element for matrix multiplication.