Mirror of https://gitlab.com/libeigen/eigen.git (synced 2026-04-10 11:34:33 +08:00)

Improve dense linear solver docs with practical guidance
libeigen/eigen!2395 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
@@ -30,10 +30,11 @@ Timings are in \b milliseconds, and factors are relative to the LLT decompositio
 <a name="note_ls">\b *: </a> This decomposition does not support direct least-squares solving for over-constrained problems, and the reported timing includes the cost of forming the symmetric covariance matrix \f$ A^T A \f$.

 \b Observations:
-+ LLT is always the fastest solvers.
++ LLT is always the fastest solver.
 + For largely over-constrained problems, the cost of Cholesky/LU decompositions is dominated by the computation of the symmetric covariance matrix.
-+ For large problem sizes, only the decomposition implementing a cache-friendly blocking strategy scale well. Those include LLT, PartialPivLU, HouseholderQR, and BDCSVD. This explain why for a 4k x 4k matrix, HouseholderQR is faster than LDLT. In the future, LDLT and ColPivHouseholderQR will also implement blocking strategies.
++ For large problem sizes, only the decompositions implementing a cache-friendly blocking strategy scale well. Those include LLT, PartialPivLU, HouseholderQR, and BDCSVD. This explains why for a 4k x 4k matrix, HouseholderQR is faster than LDLT.
 + CompleteOrthogonalDecomposition is based on ColPivHouseholderQR and they thus achieve the same level of performance.
++ FullPivLU and FullPivHouseholderQR are dramatically slower for large matrices due to the lack of blocking, and are not shown for the 4k x 4k case.

 The above table was originally generated by a benchmark tool. Feel free to write your own benchmark to generate a table matching your hardware, compiler, and favorite problem sizes.
@@ -7,13 +7,33 @@ of equations, say \a Ax = \a b, has no solutions. In this case, it makes sense t
 vector \a x which is closest to being a solution, in the sense that the difference \a Ax - \a b is
 as small as possible. This \a x is called the least squares solution (if the Euclidean norm is used).

-The three methods discussed on this page are the SVD decomposition, the QR decomposition and normal
-equations. Of these, the SVD decomposition is generally the most accurate but the slowest, normal
-equations is the fastest but least accurate, and the QR decomposition is in between.
+The methods discussed on this page are the complete orthogonal decomposition (COD), the SVD
+decomposition, other QR decompositions, and normal equations. For most problems, we recommend
+CompleteOrthogonalDecomposition: it robustly computes the minimum-norm least squares solution
+(like the SVD) for both over- and under-determined systems, including rank-deficient ones, but at
+QR-like speed. The SVD is the most robust but also the slowest; use it when you also need singular
+values or vectors. Normal equations are the fastest but least robust.

 \eigenAutoToc

+\section LeastSquaresCOD Using the complete orthogonal decomposition (recommended)
+
+CompleteOrthogonalDecomposition is the recommended method for least squares problems. It handles the
+widest class of problems (overdetermined, underdetermined, and rank-deficient systems) and computes
+the minimum-norm solution when the system is rank-deficient or underdetermined, just like the SVD.
+It is based on a rank-revealing QR factorization (ColPivHouseholderQR) followed by a post-processing
+step, so it is significantly faster than the SVD while providing comparable robustness.
+
+<table class="example">
+<tr><th>Example:</th><th>Output:</th></tr>
+<tr>
+<td>\include LeastSquaresCOD.cpp </td>
+<td>\verbinclude LeastSquaresCOD.out </td>
+</tr>
+</table>
+
+
 \section LeastSquaresSVD Using the SVD decomposition

 The \link BDCSVD::solve() solve() \endlink method in the BDCSVD class can be directly used to
@@ -30,16 +50,19 @@ computing least squares solutions:
 </table>

-This is example from the page \link TutorialLinearAlgebra Linear algebra and decompositions \endlink.
-If you just need to solve the least squares problem, but are not interested in the SVD per se, a
-faster alternative method is CompleteOrthogonalDecomposition.
+The SVD gives you singular values and vectors in addition to the least squares solution, but if you
+only need the solution, CompleteOrthogonalDecomposition (above) is faster.


-\section LeastSquaresQR Using the QR decomposition
+\section LeastSquaresQR Using other QR decompositions

-The solve() method in QR decomposition classes also computes the least squares solution. There are
-three QR decomposition classes: HouseholderQR (no pivoting, fast but unstable if your matrix is
-not rull rank), ColPivHouseholderQR (column pivoting, thus a bit slower but more stable) and
-FullPivHouseholderQR (full pivoting, so slowest and slightly more stable than ColPivHouseholderQR).
+The solve() method in QR decomposition classes also computes the least squares solution. Besides
+CompleteOrthogonalDecomposition (above), there are three other QR decomposition classes:
+HouseholderQR (no pivoting, so fast but unreliable if your matrix is not full rank),
+ColPivHouseholderQR (column pivoting, a bit slower but rank-revealing), and FullPivHouseholderQR
+(full pivoting, significantly slower and rarely needed in practice).
+Note that only CompleteOrthogonalDecomposition and the SVD-based solvers compute minimum-norm
+solutions for rank-deficient or underdetermined problems; the other QR variants do not.
 Here is an example with column pivoting:

 <table class="example">
@@ -42,10 +42,10 @@ To get an overview of the true relative speed of the different decompositions, c
 <tr class="alt">
 <td>FullPivLU</td>
 <td>-</td>
-<td>Slow</td>
+<td>Slow (no blocking)</td>
 <td>Proven</td>
 <td>Yes</td>
 <td>-</td>
 <td>Rank, kernel, image</td>
 <td>Yes</td>
 <td>Excellent</td>
 <td>-</td>
@@ -78,7 +78,7 @@ To get an overview of the true relative speed of the different decompositions, c
 <tr>
 <td>FullPivHouseholderQR</td>
 <td>-</td>
-<td>Slow</td>
+<td>Slow (no blocking)</td>
 <td>Proven</td>
 <td>Yes</td>
 <td>Orthogonalization</td>
@@ -120,7 +120,7 @@ To get an overview of the true relative speed of the different decompositions, c
 <td>-</td>
 <td>Yes</td>
 <td>Excellent</td>
-<td><em>Soon: blocking</em></td>
+<td>-</td>
 </tr>

 <tr><th class="inter" colspan="9">\n Singular values and eigenvalues decompositions</th></tr>
@@ -232,7 +232,7 @@ To get an overview of the true relative speed of the different decompositions, c
 <td>-</td>
 <td>-</td>
 <td>Good</td>
-<td><em>Soon: blocking</em></td>
+<td>-</td>
 </tr>

 <tr>
@@ -244,7 +244,7 @@ To get an overview of the true relative speed of the different decompositions, c
 <td>-</td>
 <td>-</td>
 <td>Good</td>
-<td><em>Soon: blocking</em></td>
+<td>-</td>
 </tr>

 </table>
@@ -253,9 +253,32 @@ To get an overview of the true relative speed of the different decompositions, c
 <ul>
 <li><a name="note1">\b 1: </a>There exist two variants of the LDLT algorithm. Eigen's produces a pure diagonal D matrix, and therefore it cannot handle indefinite matrices, unlike Lapack's, which produces a block diagonal D matrix.</li>
 <li><a name="note2">\b 2: </a>Eigenvalues, SVD and Schur decompositions rely on iterative algorithms. Their convergence speed depends on how well the eigenvalues are separated.</li>
-<li><a name="note3">\b 3: </a>Our JacobiSVD is two-sided, making for proven and optimal precision for square matrices. For non-square matrices, we have to use a QR preconditioner first. The default choice, ColPivHouseholderQR, is already very reliable, but if you want it to be proven, use FullPivHouseholderQR instead.
+<li><a name="note3">\b 3: </a>Our JacobiSVD is two-sided, making for proven and optimal precision for square matrices. For non-square matrices, we have to use a QR preconditioner first. The default choice, ColPivHouseholderQR, is already very reliable, but if you want it to be proven, use FullPivHouseholderQR instead.</li>
 </ul>

+\section TopicLinAlgPracticalGuidance Practical guidance
+
+The following recommendations apply to the most common use cases:
+
+\li <b>Symmetric positive definite systems:</b> Use \b LLT. It is the fastest solver and has excellent
+numerical properties for this class of problems. For semidefinite or nearly singular symmetric systems,
+use \b LDLT.
+\li <b>General invertible systems:</b> Use \b PartialPivLU. It uses cache-friendly blocking and implicit
+multi-threading, making it the fastest general-purpose solver. Partial pivoting is sufficient for
+virtually all practical problems.
+\li <b>Least squares (over- or under-determined systems):</b> Use \b CompleteOrthogonalDecomposition as
+the default. Like the SVD, it robustly computes the minimum-norm solution for rank-deficient and
+under-determined problems, but at QR-like speed. Use \b BDCSVD when you also need singular values
+or vectors, not just the least squares solution.
+\li <b>Full-rank least squares (overdetermined systems):</b> When the matrix is known to be full rank,
+\b HouseholderQR is the fastest option. For very tall and skinny well-conditioned matrices,
+solving via the normal equations with \b LLT can be faster still.
+\li <b>FullPivLU and FullPivHouseholderQR</b> use complete pivoting, which prevents the use of
+cache-friendly blocking algorithms and makes them significantly slower than their partial/column
+pivoting counterparts. In practice, complete pivoting rarely provides meaningful accuracy benefits.
+These decompositions are primarily useful for debugging, pedagogy, or the very rare case
+where column pivoting is insufficient.
+
 \section TopicLinAlgTerminology Terminology

 <dl>
@@ -43,7 +43,23 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
 <th>Requirements<br/>on the matrix</th>
 <th>Speed<br/> (small-to-medium)</th>
 <th>Speed<br/> (large)</th>
-<th>Accuracy</th>
+<th>Robustness<sup><a href="#note_robust">*</a></sup></th>
 </tr>
 <tr>
+<td>LLT</td>
+<td>llt()</td>
+<td>Positive definite</td>
+<td>+++</td>
+<td>+++</td>
+<td>+</td>
+</tr>
+<tr class="alt">
+<td>LDLT</td>
+<td>ldlt()</td>
+<td>Positive or negative<br/> semidefinite</td>
+<td>+++</td>
+<td>+</td>
+<td>++</td>
+</tr>
+<tr>
 <td>PartialPivLU</td>
@@ -54,14 +70,6 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
 <td>+</td>
 </tr>
-<tr class="alt">
-<td>FullPivLU</td>
-<td>fullPivLu()</td>
-<td>None</td>
-<td>-</td>
-<td>- -</td>
-<td>+++</td>
-</tr>
 <tr>
 <td>HouseholderQR</td>
 <td>householderQr()</td>
 <td>None</td>
@@ -69,7 +77,7 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
 <td>++</td>
 <td>+</td>
 </tr>
-<tr class="alt">
+<tr>
 <td>ColPivHouseholderQR</td>
 <td>colPivHouseholderQr()</td>
 <td>None</td>
@@ -77,14 +85,6 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
 <td>-</td>
 <td>+++</td>
 </tr>
-<tr>
-<td>FullPivHouseholderQR</td>
-<td>fullPivHouseholderQr()</td>
-<td>None</td>
-<td>-</td>
-<td>- -</td>
-<td>+++</td>
-</tr>
 <tr class="alt">
 <td>CompleteOrthogonalDecomposition</td>
 <td>completeOrthogonalDecomposition()</td>
@@ -93,23 +93,7 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
 <td>-</td>
 <td>+++</td>
 </tr>
-<tr class="alt">
-<td>LLT</td>
-<td>llt()</td>
-<td>Positive definite</td>
-<td>+++</td>
-<td>+++</td>
-<td>+</td>
-</tr>
-<tr>
-<td>LDLT</td>
-<td>ldlt()</td>
-<td>Positive or negative<br/> semidefinite</td>
-<td>+++</td>
-<td>+</td>
-<td>++</td>
-</tr>
 <tr class="alt">
 <td>BDCSVD</td>
 <td>bdcSvd()</td>
 <td>None</td>
@@ -126,15 +110,36 @@ depending on your matrix, the problem you are trying to solve, and the trade-off
 <td>+++</td>
 </tr>
 </table>

+<a name="note_robust"><b>*</b></a> The <b>Robustness</b> column indicates how well the decomposition handles
+ill-conditioned or rank-deficient matrices. All decompositions give excellent accuracy when their
+requirements on the matrix are met and the problem is well-conditioned.
+
 To get an overview of the true relative speed of the different decompositions, check this \link DenseDecompositionBenchmark benchmark \endlink.

 All of these decompositions offer a solve() method that works as in the above example.

-If you know more about the properties of your matrix, you can use the above table to select the best method.
-For example, a good choice for solving linear systems with a non-symmetric matrix of full rank is PartialPivLU.
-If you know that your matrix is also symmetric and positive definite, the above table says that
-a very good choice is the LLT or LDLT decomposition. Here's an example, also demonstrating that using a general
-matrix (not a vector) as right hand side is possible:
+\b Practical \b recommendations:
+\li If your matrix is symmetric positive definite, use \b LLT. It is the fastest and has excellent accuracy
+for this class of problems. If your matrix is only positive or negative semidefinite, use \b LDLT.
+\li For a general invertible matrix, \b PartialPivLU is the best choice. It is fast (uses cache-friendly
+blocking) and reliable for the vast majority of problems.
+\li For least squares problems (over- or under-determined systems), \b CompleteOrthogonalDecomposition
+is the recommended default. Like the SVD, it robustly computes the minimum-norm solution for
+rank-deficient and under-determined problems, but at the cost of a QR decomposition rather than
+an SVD. Use \b ColPivHouseholderQR if you only need least squares for full-rank overdetermined
+systems and don't need the minimum-norm property.
+\li \b SVD decompositions (BDCSVD, JacobiSVD) are the most robust but also the slowest. Use these when
+you need singular values/vectors, not just the solution.
+\li \b HouseholderQR is the fastest option for full-rank least squares problems, but it does not
+reveal rank and cannot compute minimum-norm solutions for rank-deficient problems.
+\li FullPivLU and FullPivHouseholderQR use complete pivoting, which is significantly slower due to
+lack of blocking. In practice, they rarely provide meaningful benefits over PartialPivLU and
+ColPivHouseholderQR, respectively, and are not recommended for general use. They are primarily useful
+for debugging or for pedagogical purposes.
+
+Here's an example showing the use of LLT for a symmetric positive definite system, also demonstrating
+that using a general matrix (not a vector) as right hand side is possible:

 <table class="example">
 <tr><th>Example:</th><th>Output:</th></tr>
@@ -151,14 +156,15 @@ supports many other decompositions), see our special page on

 \section TutorialLinAlgLeastsquares Least squares solving

-The most general and accurate method to solve under- or over-determined linear systems
-in the least squares sense, is the SVD decomposition. Eigen provides two implementations.
-The recommended one is the BDCSVD class, which scales well for large problems
-and automatically falls back to the JacobiSVD class for smaller problems.
-For both classes, their solve() method solved the linear system in the least-squares
-sense.
+The recommended method to solve under- or over-determined linear systems in the least squares sense is
+\b CompleteOrthogonalDecomposition. Like the SVD, it robustly computes the minimum-norm least squares
+solution, correctly handling rank-deficient and under-determined problems, but it is significantly faster
+since it is based on a rank-revealing QR decomposition rather than a full SVD.

-Here is an example:
+If you also need the singular values or vectors themselves (not just the least squares solution), use
+\b BDCSVD, which scales well for large problems and automatically falls back to JacobiSVD for smaller ones.
+
+Here is an example using the SVD:
 <table class="example">
 <tr><th>Example:</th><th>Output:</th></tr>
 <tr>
@@ -167,11 +173,9 @@ Here is an example:
 </tr>
 </table>

-An alternative to the SVD, which is usually faster and about as accurate, is CompleteOrthogonalDecomposition.
-
-Again, if you know more about the problem, the table above contains methods that are potentially faster.
-If your matrix is full rank, HouseHolderQR is the method of choice. If your matrix is full rank and well conditioned,
-using the Cholesky decomposition (LLT) on the matrix of the normal equations can be faster still.
+If you know more about the problem, faster methods are available.
+If your matrix is full rank, HouseholderQR is the fastest method. If your matrix is full rank and
+well conditioned, using the Cholesky decomposition (LLT) on the normal equations can be faster still.
 Our page on \link LeastSquares least squares solving \endlink has more details.

@@ -267,8 +271,9 @@ singular matrix). On \ref TopicLinearAlgebraDecompositions "this table" you can
 whether they are rank-revealing or not.

 Rank-revealing decompositions offer at least a rank() method. They can also offer convenience methods such as isInvertible(),
-and some are also providing methods to compute the kernel (null-space) and image (column-space) of the matrix, as is the
-case with FullPivLU:
+and some also provide methods to compute the kernel (null-space) and image (column-space) of the matrix.
+ColPivHouseholderQR, CompleteOrthogonalDecomposition, and FullPivLU all provide these methods. Here is an example using
+FullPivLU:

 <table class="example">
 <tr><th>Example:</th><th>Output:</th></tr>
doc/snippets/LeastSquaresCOD.cpp (new file)
@@ -0,0 +1,3 @@
+MatrixXf A = MatrixXf::Random(3, 2);
+VectorXf b = VectorXf::Random(3);
+cout << "The solution using the COD is:\n" << A.completeOrthogonalDecomposition().solve(b) << endl;