Improve dense linear solver docs with practical guidance

libeigen/eigen!2395

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
This commit is contained in:
Rasmus Munk Larsen
2026-04-05 21:40:42 -07:00
parent 8eabfb5342
commit bde3a68bae
5 changed files with 128 additions and 73 deletions


@@ -30,10 +30,11 @@ Timings are in \b milliseconds, and factors are relative to the LLT decompositio
<a name="note_ls">\b *: </a> These decompositions do not support direct least-squares solving for over-constrained problems, and the reported timings include the cost to form the symmetric covariance matrix \f$ A^T A \f$.
\b Observations:
+ LLT is always the fastest solver.
+ For largely over-constrained problems, the cost of Cholesky/LU decompositions is dominated by the computation of the symmetric covariance matrix.
+ For large problem sizes, only the decompositions implementing a cache-friendly blocking strategy scale well. Those include LLT, PartialPivLU, HouseholderQR, and BDCSVD. This explains why for a 4k x 4k matrix, HouseholderQR is faster than LDLT.
+ CompleteOrthogonalDecomposition is based on ColPivHouseholderQR, so the two achieve the same level of performance.
+ FullPivLU and FullPivHouseholderQR are dramatically slower for large matrices due to the lack of blocking, and are not shown for the 4k x 4k case.
The above table was originally generated by a benchmark tool. Feel free to write your own benchmark to generate a table matching your hardware, compiler, and favorite problem sizes.


@@ -7,13 +7,33 @@ of equations, say \a Ax = \a b, has no solutions. In this case, it makes sense t
vector \a x which is closest to being a solution, in the sense that the difference \a Ax - \a b is
as small as possible. This \a x is called the least squares solution (if the Euclidean norm is used).
The methods discussed on this page are the complete orthogonal decomposition (COD), the SVD
decomposition, other QR decompositions, and normal equations. For most problems, we recommend
CompleteOrthogonalDecomposition: it robustly computes the minimum-norm least squares solution
(like the SVD) for both over- and under-determined systems, including rank-deficient ones, but at
QR-like speed. The SVD is the most robust but also the slowest; use it when you also need singular
values or vectors. Normal equations are the fastest but least robust.
\eigenAutoToc
\section LeastSquaresCOD Using the complete orthogonal decomposition (recommended)
CompleteOrthogonalDecomposition is the recommended method for least squares problems. It handles the
widest class of problems — overdetermined, underdetermined, and rank-deficient systems — and computes
the minimum-norm solution when the system is rank-deficient or underdetermined, just like the SVD.
It is based on a rank-revealing QR factorization (ColPivHouseholderQR) followed by a post-processing
step, so it is significantly faster than SVD while providing comparable robustness.
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
<tr>
<td>\include LeastSquaresCOD.cpp </td>
<td>\verbinclude LeastSquaresCOD.out </td>
</tr>
</table>
\section LeastSquaresSVD Using the SVD decomposition
The \link BDCSVD::solve() solve() \endlink method in the BDCSVD class can be directly used to
@@ -30,16 +50,19 @@ computing least squares solutions:
</table>
This is an example from the page \link TutorialLinearAlgebra Linear algebra and decompositions \endlink.
The SVD gives you singular values and vectors in addition to the least squares solution, but if you
only need the solution, CompleteOrthogonalDecomposition (above) is faster.
\section LeastSquaresQR Using other QR decompositions
The solve() method in QR decomposition classes also computes the least squares solution. Besides
CompleteOrthogonalDecomposition (above), there are three other QR decomposition classes:
HouseholderQR (no pivoting, so fast but unreliable if your matrix is not full rank),
ColPivHouseholderQR (column pivoting, a bit slower but rank-revealing), and FullPivHouseholderQR
(full pivoting, significantly slower and rarely needed in practice).
Note that only CompleteOrthogonalDecomposition and the SVD-based solvers compute minimum-norm
solutions for rank-deficient or underdetermined problems; the other QR variants do not.
Here is an example with column pivoting:
<table class="example">


@@ -42,10 +42,10 @@ To get an overview of the true relative speed of the different decompositions, c
<tr class="alt">
<td>FullPivLU</td>
<td>-</td>
<td>Slow (no blocking)</td>
<td>Proven</td>
<td>Yes</td>
<td>-</td>
<td>Rank, kernel, image</td>
<td>Yes</td>
<td>Excellent</td>
<td>-</td>
@@ -78,7 +78,7 @@ To get an overview of the true relative speed of the different decompositions, c
<tr>
<td>FullPivHouseholderQR</td>
<td>-</td>
<td>Slow (no blocking)</td>
<td>Proven</td>
<td>Yes</td>
<td>Orthogonalization</td>
@@ -120,7 +120,7 @@ To get an overview of the true relative speed of the different decompositions, c
<td>-</td>
<td>Yes</td>
<td>Excellent</td>
<td>-</td>
</tr>
<tr><th class="inter" colspan="9">\n Singular values and eigenvalues decompositions</th></tr>
@@ -232,7 +232,7 @@ To get an overview of the true relative speed of the different decompositions, c
<td>-</td>
<td>-</td>
<td>Good</td>
<td>-</td>
</tr>
<tr>
@@ -244,7 +244,7 @@ To get an overview of the true relative speed of the different decompositions, c
<td>-</td>
<td>-</td>
<td>Good</td>
<td>-</td>
</tr>
</table>
@@ -253,9 +253,32 @@ To get an overview of the true relative speed of the different decompositions, c
<ul>
<li><a name="note1">\b 1: </a>There exist two variants of the LDLT algorithm. Eigen's variant produces a pure diagonal D matrix, and therefore cannot handle indefinite matrices, unlike Lapack's, which produces a block-diagonal D matrix.</li>
<li><a name="note2">\b 2: </a>Eigenvalues, SVD and Schur decompositions rely on iterative algorithms. Their convergence speed depends on how well the eigenvalues are separated.</li>
<li><a name="note3">\b 3: </a>Our JacobiSVD is two-sided, making for proven and optimal precision for square matrices. For non-square matrices, we have to use a QR preconditioner first. The default choice, ColPivHouseholderQR, is already very reliable, but if you want it to be proven, use FullPivHouseholderQR instead.</li>
</ul>
\section TopicLinAlgPracticalGuidance Practical guidance
The following recommendations apply to the most common use cases:
\li <b>Symmetric positive definite systems:</b> Use \b LLT. It is the fastest solver and has excellent
numerical properties for this class of problems. For semidefinite or nearly singular symmetric systems,
use \b LDLT.
\li <b>General invertible systems:</b> Use \b PartialPivLU. It uses cache-friendly blocking and implicit
multi-threading, making it the fastest general-purpose solver. Partial pivoting is sufficient for
virtually all practical problems.
\li <b>Least squares (over- or under-determined systems):</b> Use \b CompleteOrthogonalDecomposition as
the default. Like the SVD, it robustly computes the minimum-norm solution for rank-deficient and
under-determined problems, but at QR-like speed. Use \b BDCSVD when you also need singular values
or vectors, not just the least squares solution.
\li <b>Full-rank least squares (overdetermined systems):</b> When the matrix is known to be full rank,
\b HouseholderQR is the fastest option. For very tall and skinny well-conditioned matrices,
solving via the normal equations with \b LLT can be faster still.
\li <b>FullPivLU and FullPivHouseholderQR</b> use complete pivoting, which prevents the use of
cache-friendly blocking algorithms and makes them significantly slower than their partial/column
pivoting counterparts. In practice, complete pivoting rarely provides meaningful accuracy benefits.
These decompositions are primarily useful for debugging, pedagogy, or the very rare case
where column pivoting is insufficient.
\section TopicLinAlgTerminology Terminology
<dl>


@@ -43,7 +43,23 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<th>Requirements<br/>on the matrix</th>
<th>Speed<br/> (small-to-medium)</th>
<th>Speed<br/> (large)</th>
<th>Robustness<sup><a href="#note_robust">*</a></sup></th>
</tr>
<tr>
<td>LLT</td>
<td>llt()</td>
<td>Positive definite</td>
<td>+++</td>
<td>+++</td>
<td>+</td>
</tr>
<tr class="alt">
<td>LDLT</td>
<td>ldlt()</td>
<td>Positive or negative<br/> semidefinite</td>
<td>+++</td>
<td>+</td>
<td>++</td>
</tr>
<tr>
<td>PartialPivLU</td>
@@ -54,14 +70,6 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>+</td>
</tr>
<tr>
<td>HouseholderQR</td>
<td>householderQr()</td>
<td>None</td>
@@ -69,7 +77,7 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>++</td>
<td>+</td>
</tr>
<tr>
<td>ColPivHouseholderQR</td>
<td>colPivHouseholderQr()</td>
<td>None</td>
@@ -77,14 +85,6 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>-</td>
<td>+++</td>
</tr>
<tr class="alt">
<td>CompleteOrthogonalDecomposition</td>
<td>completeOrthogonalDecomposition()</td>
@@ -93,23 +93,7 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>-</td>
<td>+++</td>
</tr>
<tr class="alt">
<td>BDCSVD</td>
<td>bdcSvd()</td>
<td>None</td>
@@ -126,15 +110,36 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
<td>+++</td>
</tr>
</table>
<a name="note_robust"><b>*</b></a> The <b>Robustness</b> column indicates how well the decomposition handles
ill-conditioned or rank-deficient matrices. All decompositions give excellent accuracy when their
requirements on the matrix are met and the problem is well-conditioned.
To get an overview of the true relative speed of the different decompositions, check this \link DenseDecompositionBenchmark benchmark \endlink.
All of these decompositions offer a solve() method that works as in the above example.
\b Practical \b recommendations:
\li If your matrix is symmetric positive definite, use \b LLT. It is the fastest and is perfectly accurate
for this class of problems. If your matrix is only positive or negative semidefinite, use \b LDLT.
\li For a general invertible matrix, \b PartialPivLU is the best choice. It is fast (uses cache-friendly
blocking) and reliable for the vast majority of problems.
\li For least squares problems (over- or under-determined systems), \b CompleteOrthogonalDecomposition
is the recommended default. Like the SVD, it robustly computes the minimum-norm solution for
rank-deficient and under-determined problems, but at the cost of a QR decomposition rather than
an SVD. Use \b ColPivHouseholderQR if you only need least squares for full-rank overdetermined
systems and don't need the minimum-norm property.
\li \b SVD decompositions (BDCSVD, JacobiSVD) are the most robust but also the slowest. Use these when
you need singular values/vectors, not just the solution.
\li \b HouseholderQR is the fastest option for full-rank least squares problems, but it does not
reveal rank and cannot compute minimum-norm solutions for rank-deficient problems.
\li FullPivLU and FullPivHouseholderQR use complete pivoting, which is significantly slower due to
lack of blocking. In practice, they rarely provide meaningful benefits over PartialPivLU and
ColPivHouseholderQR, respectively, and are not recommended for general use. They are primarily useful
for debugging or for pedagogical purposes.
Here's an example showing the use of LLT for a symmetric positive definite system, also demonstrating
that using a general matrix (not a vector) as right hand side is possible:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
@@ -151,14 +156,15 @@ supports many other decompositions), see our special page on
\section TutorialLinAlgLeastsquares Least squares solving
The recommended method to solve under- or over-determined linear systems in the least squares sense is
\b CompleteOrthogonalDecomposition. Like the SVD, it robustly computes the minimum-norm least squares
solution, correctly handling rank-deficient and under-determined problems, but it is significantly faster
since it is based on a rank-revealing QR decomposition rather than a full SVD.
If you also need the singular values or vectors themselves (not just the least squares solution), use
\b BDCSVD, which scales well for large problems and automatically falls back to JacobiSVD for smaller ones.
Here is an example using the SVD:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
<tr>
@@ -167,11 +173,9 @@ Here is an example:
</tr>
</table>
If you know more about the problem, faster methods are available.
If your matrix is full rank, HouseholderQR is the fastest method. If your matrix is full rank and
well conditioned, using the Cholesky decomposition (LLT) on the normal equations can be faster still.
Our page on \link LeastSquares least squares solving \endlink has more details.
@@ -267,8 +271,9 @@ singular matrix). On \ref TopicLinearAlgebraDecompositions "this table" you can
whether they are rank-revealing or not.
Rank-revealing decompositions offer at least a rank() method. They can also offer convenience methods such as isInvertible(),
and some also provide methods to compute the kernel (null-space) and image (column-space) of the matrix.
ColPivHouseholderQR, CompleteOrthogonalDecomposition, and FullPivLU all provide these methods. Here is an example using
FullPivLU:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>


@@ -0,0 +1,3 @@
MatrixXf A = MatrixXf::Random(3, 2);
VectorXf b = VectorXf::Random(3);
cout << "The solution using the COD is:\n" << A.completeOrthogonalDecomposition().solve(b) << endl;