Improve dense linear solver docs with practical guidance

libeigen/eigen!2395

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
This commit is contained in:
Rasmus Munk Larsen
2026-04-05 21:40:42 -07:00
parent 8eabfb5342
commit bde3a68bae
5 changed files with 128 additions and 73 deletions


@@ -30,10 +30,11 @@ Timings are in \b milliseconds, and factors are relative to the LLT decompositio
<a name="note_ls">\b *: </a> These decompositions do not support direct least-squares solving for over-constrained problems, and the reported timings include the cost to form the symmetric covariance matrix \f$ A^T A \f$.
\b Observations:
+ LLT is always the fastest solver.
+ For largely over-constrained problems, the cost of Cholesky/LU decompositions is dominated by the computation of the symmetric covariance matrix.
+ For large problem sizes, only the decompositions implementing a cache-friendly blocking strategy scale well. Those include LLT, PartialPivLU, HouseholderQR, and BDCSVD. This explains why for a 4k x 4k matrix, HouseholderQR is faster than LDLT.
+ CompleteOrthogonalDecomposition is based on ColPivHouseholderQR, so the two achieve the same level of performance.
+ FullPivLU and FullPivHouseholderQR are dramatically slower for large matrices due to the lack of blocking, and are not shown for the 4k x 4k case.
The above table was originally generated by a benchmark tool. Feel free to write your own benchmark to generate a table matching your hardware, compiler, and favorite problem sizes.


@@ -7,13 +7,33 @@ of equations, say \a Ax = \a b, has no solutions. In this case, it makes sense t
vector \a x which is closest to being a solution, in the sense that the difference \a Ax - \a b is
as small as possible. This \a x is called the least squares solution (if the Euclidean norm is used).
The methods discussed on this page are the complete orthogonal decomposition (COD), the SVD
decomposition, other QR decompositions, and normal equations. For most problems, we recommend
CompleteOrthogonalDecomposition: it robustly computes the minimum-norm least squares solution
(like the SVD) for both over- and under-determined systems, including rank-deficient ones, but at
QR-like speed. The SVD is the most robust but also the slowest; use it when you also need singular
values or vectors. Normal equations are the fastest but least robust.
\eigenAutoToc
\section LeastSquaresCOD Using the complete orthogonal decomposition (recommended)
CompleteOrthogonalDecomposition is the recommended method for least squares problems. It handles the
widest class of problems — overdetermined, underdetermined, and rank-deficient systems — and computes
the minimum-norm solution when the system is rank-deficient or underdetermined, just like the SVD.
It is based on a rank-revealing QR factorization (ColPivHouseholderQR) followed by a post-processing
step, so it is significantly faster than SVD while providing comparable robustness.
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
<tr>
<td>\include LeastSquaresCOD.cpp </td>
<td>\verbinclude LeastSquaresCOD.out </td>
</tr>
</table>
\section LeastSquaresSVD Using the SVD decomposition
The \link BDCSVD::solve() solve() \endlink method in the BDCSVD class can be directly used to
@@ -30,16 +50,19 @@ computing least squares solutions:
</table>
This is an example from the page \link TutorialLinearAlgebra Linear algebra and decompositions \endlink.
The SVD gives you singular values and vectors in addition to the least squares solution, but if you
only need the solution, CompleteOrthogonalDecomposition (above) is faster.
\section LeastSquaresQR Using other QR decompositions
The solve() method in QR decomposition classes also computes the least squares solution. Besides
CompleteOrthogonalDecomposition (above), there are three other QR decomposition classes:
HouseholderQR (no pivoting, so fast but unreliable if your matrix is not full rank),
ColPivHouseholderQR (column pivoting, a bit slower but rank-revealing), and FullPivHouseholderQR
(full pivoting, significantly slower and rarely needed in practice).
Note that only CompleteOrthogonalDecomposition and the SVD-based solvers compute minimum-norm
solutions for rank-deficient or underdetermined problems; the other QR variants do not.
Here is an example with column pivoting:
<table class="example">


@@ -42,10 +42,10 @@ To get an overview of the true relative speed of the different decompositions, c
<tr class="alt">
<td>FullPivLU</td>
<td>-</td>
<td>Slow (no blocking)</td>
<td>Proven</td>
<td>Yes</td>
<td>-</td>
<td>Rank, kernel, image</td>
<td>Yes</td>
<td>Excellent</td>
<td>-</td>
@@ -78,7 +78,7 @@ To get an overview of the true relative speed of the different decompositions, c
<tr>
<td>FullPivHouseholderQR</td>
<td>-</td>
<td>Slow (no blocking)</td>
<td>Proven</td>
<td>Yes</td>
<td>Orthogonalization</td>
@@ -120,7 +120,7 @@ To get an overview of the true relative speed of the different decompositions, c
<td>-</td>
<td>Yes</td>
<td>Excellent</td>
<td>-</td>
</tr>
<tr><th class="inter" colspan="9">\n Singular values and eigenvalues decompositions</th></tr>
@@ -232,7 +232,7 @@ To get an overview of the true relative speed of the different decompositions, c
<td>-</td>
<td>-</td>
<td>Good</td>
<td>-</td>
</tr>
<tr>
@@ -244,7 +244,7 @@ To get an overview of the true relative speed of the different decompositions, c
<td>-</td>
<td>-</td>
<td>Good</td>
<td>-</td>
</tr>
</table>
@@ -253,9 +253,32 @@ To get an overview of the true relative speed of the different decompositions, c
<ul>
<li><a name="note1">\b 1: </a>There exist two variants of the LDLT algorithm. Eigen's variant produces a pure diagonal D matrix, and therefore cannot handle indefinite matrices, unlike Lapack's, which produces a block-diagonal D matrix.</li>
<li><a name="note2">\b 2: </a>Eigenvalues, SVD and Schur decompositions rely on iterative algorithms. Their convergence speed depends on how well the eigenvalues are separated.</li>
<li><a name="note3">\b 3: </a>Our JacobiSVD is two-sided, making for proven and optimal precision for square matrices. For non-square matrices, we have to use a QR preconditioner first. The default choice, ColPivHouseholderQR, is already very reliable, but if you want it to be proven, use FullPivHouseholderQR instead.</li>
</ul>
\section TopicLinAlgPracticalGuidance Practical guidance
The following recommendations apply to the most common use cases:
\li <b>Symmetric positive definite systems:</b> Use \b LLT. It is the fastest solver and has excellent
numerical properties for this class of problems. For semidefinite or nearly singular symmetric systems,
use \b LDLT.
\li <b>General invertible systems:</b> Use \b PartialPivLU. It uses cache-friendly blocking and implicit
multi-threading, making it the fastest general-purpose solver. Partial pivoting is sufficient for
virtually all practical problems.
\li <b>Least squares (over- or under-determined systems):</b> Use \b CompleteOrthogonalDecomposition as
the default. Like the SVD, it robustly computes the minimum-norm solution for rank-deficient and
under-determined problems, but at QR-like speed. Use \b BDCSVD when you also need singular values
or vectors, not just the least squares solution.
\li <b>Full-rank least squares (overdetermined systems):</b> When the matrix is known to be full rank,
\b HouseholderQR is the fastest option. For very tall and skinny well-conditioned matrices,
solving via the normal equations with \b LLT can be faster still.
\li <b>FullPivLU and FullPivHouseholderQR</b> use complete pivoting, which prevents the use of
cache-friendly blocking algorithms and makes them significantly slower than their partial/column
pivoting counterparts. In practice, complete pivoting rarely provides meaningful accuracy benefits.
These decompositions are primarily useful for debugging, pedagogy, or the very rare case
where column pivoting is insufficient.
\section TopicLinAlgTerminology Terminology
<dl>


@@ -43,7 +43,23 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<th>Requirements<br/>on the matrix</th>
<th>Speed<br/> (small-to-medium)</th>
<th>Speed<br/> (large)</th>
<th>Robustness<sup><a href="#note_robust">*</a></sup></th>
</tr>
<tr>
<td>LLT</td>
<td>llt()</td>
<td>Positive definite</td>
<td>+++</td>
<td>+++</td>
<td>+</td>
</tr>
<tr class="alt">
<td>LDLT</td>
<td>ldlt()</td>
<td>Positive or negative<br/> semidefinite</td>
<td>+++</td>
<td>+</td>
<td>++</td>
</tr>
<tr>
<td>PartialPivLU</td>
@@ -54,14 +70,6 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>+</td>
</tr>
<tr>
<td>HouseholderQR</td>
<td>householderQr()</td>
<td>None</td>
@@ -69,7 +77,7 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>++</td>
<td>+</td>
</tr>
<tr>
<td>ColPivHouseholderQR</td>
<td>colPivHouseholderQr()</td>
<td>None</td>
@@ -77,14 +85,6 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>-</td>
<td>+++</td>
</tr>
<tr class="alt">
<td>CompleteOrthogonalDecomposition</td>
<td>completeOrthogonalDecomposition()</td>
@@ -93,23 +93,7 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>-</td>
<td>+++</td>
</tr>
<tr class="alt">
<td>BDCSVD</td>
<td>bdcSvd()</td>
<td>None</td>
@@ -126,15 +110,36 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>+++</td>
</tr>
</table>
<a name="note_robust"><b>*</b></a> The <b>Robustness</b> column indicates how well the decomposition handles
ill-conditioned or rank-deficient matrices. All decompositions give excellent accuracy when their
requirements on the matrix are met and the problem is well-conditioned.
To get an overview of the true relative speed of the different decompositions, check this \link DenseDecompositionBenchmark benchmark \endlink.
All of these decompositions offer a solve() method that works as in the above example.
\b Practical \b recommendations:
\li If your matrix is symmetric positive definite, use \b LLT. It is the fastest and is perfectly accurate
for this class of problems. If your matrix is only positive or negative semidefinite, use \b LDLT.
\li For a general invertible matrix, \b PartialPivLU is the best choice. It is fast (uses cache-friendly
blocking) and reliable for the vast majority of problems.
\li For least squares problems (over- or under-determined systems), \b CompleteOrthogonalDecomposition
is the recommended default. Like the SVD, it robustly computes the minimum-norm solution for
rank-deficient and under-determined problems, but at the cost of a QR decomposition rather than
an SVD. Use \b ColPivHouseholderQR if you only need least squares for full-rank overdetermined
systems and don't need the minimum-norm property.
\li \b SVD decompositions (BDCSVD, JacobiSVD) are the most robust but also the slowest. Use these when
you need singular values/vectors, not just the solution.
\li \b HouseholderQR is the fastest option for full-rank least squares problems, but it does not
reveal rank and cannot compute minimum-norm solutions for rank-deficient problems.
\li FullPivLU and FullPivHouseholderQR use complete pivoting, which is significantly slower due to
lack of blocking. In practice, they rarely provide meaningful benefits over PartialPivLU and
ColPivHouseholderQR, respectively, and are not recommended for general use. They are primarily useful
for debugging or for pedagogical purposes.
Here's an example showing the use of LLT for a symmetric positive definite system, also demonstrating
that using a general matrix (not a vector) as right hand side is possible:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
@@ -151,14 +156,15 @@ supports many other decompositions), see our special page on
\section TutorialLinAlgLeastsquares Least squares solving
The recommended method to solve under- or over-determined linear systems in the least squares sense is
\b CompleteOrthogonalDecomposition. Like the SVD, it robustly computes the minimum-norm least squares
solution, correctly handling rank-deficient and under-determined problems, but it is significantly faster
since it is based on a rank-revealing QR decomposition rather than a full SVD.
If you also need the singular values or vectors themselves (not just the least squares solution), use
\b BDCSVD, which scales well for large problems and automatically falls back to JacobiSVD for smaller ones.
Here is an example using the SVD:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
<tr>
@@ -167,11 +173,9 @@ Here is an example:
</tr>
</table>
If you know more about the problem, faster methods are available.
If your matrix is full rank, HouseholderQR is the fastest method. If your matrix is full rank and
well conditioned, using the Cholesky decomposition (LLT) on the normal equations can be faster still.
Our page on \link LeastSquares least squares solving \endlink has more details.
@@ -267,8 +271,9 @@ singular matrix). On \ref TopicLinearAlgebraDecompositions "this table" you can
whether they are rank-revealing or not.
Rank-revealing decompositions offer at least a rank() method. They can also offer convenience methods such as isInvertible(),
and some also provide methods to compute the kernel (null-space) and image (column-space) of the matrix.
ColPivHouseholderQR, CompleteOrthogonalDecomposition, and FullPivLU all provide these methods. Here is an example using
FullPivLU:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>


@@ -0,0 +1,3 @@
MatrixXf A = MatrixXf::Random(3, 2);
VectorXf b = VectorXf::Random(3);
cout << "The solution using the COD is:\n" << A.completeOrthogonalDecomposition().solve(b) << endl;