diff --git a/doc/DenseDecompositionBenchmark.dox b/doc/DenseDecompositionBenchmark.dox index f5bfc9c9c..fcbc74b24 100644 --- a/doc/DenseDecompositionBenchmark.dox +++ b/doc/DenseDecompositionBenchmark.dox @@ -30,10 +30,11 @@ Timings are in \b milliseconds, and factors are relative to the LLT decompositio \b *: This decomposition do not support direct least-square solving for over-constrained problems, and the reported timing include the cost to form the symmetric covariance matrix \f$ A^T A \f$. \b Observations: - + LLT is always the fastest solvers. + + LLT is always the fastest solver. + For largely over-constrained problems, the cost of Cholesky/LU decompositions is dominated by the computation of the symmetric covariance matrix. - + For large problem sizes, only the decomposition implementing a cache-friendly blocking strategy scale well. Those include LLT, PartialPivLU, HouseholderQR, and BDCSVD. This explain why for a 4k x 4k matrix, HouseholderQR is faster than LDLT. In the future, LDLT and ColPivHouseholderQR will also implement blocking strategies. + + For large problem sizes, only the decompositions implementing a cache-friendly blocking strategy scale well. Those include LLT, PartialPivLU, HouseholderQR, and BDCSVD. This explains why for a 4k x 4k matrix, HouseholderQR is faster than LDLT. + CompleteOrthogonalDecomposition is based on ColPivHouseholderQR and they thus achieve the same level of performance. + + FullPivLU and FullPivHouseholderQR are dramatically slower for large matrices due to the lack of blocking, and are not shown for the 4k x 4k case. The above table was originally generated by a benchmark tool. Feel free to write your own benchmark to generate a table matching your hardware, compiler, and favorite problem sizes. diff --git a/doc/LeastSquares.dox b/doc/LeastSquares.dox index ddbf38dec..606f3583c 100644 --- a/doc/LeastSquares.dox +++ b/doc/LeastSquares.dox @@ -7,13 +7,33 @@ of equations, say \a Ax = \a b, has no solutions. 
In this case, it makes sense to search for the vector \a x which is closest to being a solution, in the sense that the difference \a Ax - \a b is as small as possible. This \a x is called the least square solution (if the Euclidean norm is used). -The three methods discussed on this page are the SVD decomposition, the QR decomposition and normal -equations. Of these, the SVD decomposition is generally the most accurate but the slowest, normal -equations is the fastest but least accurate, and the QR decomposition is in between. +The methods discussed on this page are the complete orthogonal decomposition (COD), the SVD +decomposition, other QR decompositions, and normal equations. For most problems, we recommend +CompleteOrthogonalDecomposition: it robustly computes the minimum-norm least squares solution +(like the SVD) for both over- and under-determined systems, including rank-deficient ones, but at +QR-like speed. The SVD is the most robust but also the slowest; use it when you also need singular +values or vectors. Normal equations are the fastest but least robust. \eigenAutoToc +\section LeastSquaresCOD Using the complete orthogonal decomposition (recommended) + +CompleteOrthogonalDecomposition is the recommended method for least squares problems. It handles the +widest class of problems (overdetermined, underdetermined, and rank-deficient systems) and computes +the minimum-norm solution when the system is rank-deficient or underdetermined, just like the SVD. +It is based on a rank-revealing QR factorization (ColPivHouseholderQR) followed by a post-processing +step, so it is significantly faster than SVD while providing comparable robustness. + + + + + + +
Example:Output:
\include LeastSquaresCOD.cpp \verbinclude LeastSquaresCOD.out
+ + \section LeastSquaresSVD Using the SVD decomposition The \link BDCSVD::solve() solve() \endlink method in the BDCSVD class can be directly used to @@ -30,16 +50,19 @@ computing least squares solutions: This is example from the page \link TutorialLinearAlgebra Linear algebra and decompositions \endlink. -If you just need to solve the least squares problem, but are not interested in the SVD per se, a -faster alternative method is CompleteOrthogonalDecomposition. +The SVD gives you singular values and vectors in addition to the least squares solution, but if you +only need the solution, CompleteOrthogonalDecomposition (above) is faster. -\section LeastSquaresQR Using the QR decomposition +\section LeastSquaresQR Using other QR decompositions -The solve() method in QR decomposition classes also computes the least squares solution. There are -three QR decomposition classes: HouseholderQR (no pivoting, fast but unstable if your matrix is -not rull rank), ColPivHouseholderQR (column pivoting, thus a bit slower but more stable) and -FullPivHouseholderQR (full pivoting, so slowest and slightly more stable than ColPivHouseholderQR). +The solve() method in QR decomposition classes also computes the least squares solution. Besides +CompleteOrthogonalDecomposition (above), there are three other QR decomposition classes: +HouseholderQR (no pivoting, so fast but unreliable if your matrix is not full rank), +ColPivHouseholderQR (column pivoting, a bit slower but rank-revealing), and FullPivHouseholderQR +(full pivoting, significantly slower and rarely needed in practice). +Note that only CompleteOrthogonalDecomposition and the SVD-based solvers compute minimum-norm +solutions for rank-deficient or underdetermined problems; the other QR variants do not. 
Here is an example with column pivoting: diff --git a/doc/TopicLinearAlgebraDecompositions.dox b/doc/TopicLinearAlgebraDecompositions.dox index 8598ce65b..190f09d14 100644 --- a/doc/TopicLinearAlgebraDecompositions.dox +++ b/doc/TopicLinearAlgebraDecompositions.dox @@ -42,10 +42,10 @@ To get an overview of the true relative speed of the different decompositions, c - + - + @@ -78,7 +78,7 @@ To get an overview of the true relative speed of the different decompositions, c - + @@ -120,7 +120,7 @@ To get an overview of the true relative speed of the different decompositions, c - + @@ -232,7 +232,7 @@ To get an overview of the true relative speed of the different decompositions, c - + @@ -244,7 +244,7 @@ To get an overview of the true relative speed of the different decompositions, c - +
FullPivLU -SlowSlow (no blocking) Proven Yes-Rank, kernel, image Yes Excellent -
FullPivHouseholderQR -SlowSlow (no blocking) Proven Yes Orthogonalization- Yes ExcellentSoon: blocking-
\n Singular values and eigenvalues decompositions
- - GoodSoon: blocking-
- - GoodSoon: blocking-
@@ -253,9 +253,32 @@ To get an overview of the true relative speed of the different decompositions, c +\section TopicLinAlgPracticalGuidance Practical guidance + +The following recommendations apply to the most common use cases: + +\li Symmetric positive definite systems: Use \b LLT. It is the fastest solver and has excellent + numerical properties for this class of problems. For semidefinite or nearly singular symmetric systems, + use \b LDLT. +\li General invertible systems: Use \b PartialPivLU. It uses cache-friendly blocking and implicit + multi-threading, making it the fastest general-purpose solver. Partial pivoting is sufficient for + virtually all practical problems. +\li Least squares (over- or under-determined systems): Use \b CompleteOrthogonalDecomposition as + the default. Like the SVD, it robustly computes the minimum-norm solution for rank-deficient and + under-determined problems, but at QR-like speed. Use \b BDCSVD when you also need singular values + or vectors, not just the least squares solution. +\li Full-rank least squares (overdetermined systems): When the matrix is known to be full rank, + \b HouseholderQR is the fastest option. For very tall and skinny well-conditioned matrices, + solving via the normal equations with \b LLT can be faster still. +\li FullPivLU and FullPivHouseholderQR use complete pivoting, which prevents the use of + cache-friendly blocking algorithms and makes them significantly slower than their partial/column + pivoting counterparts. In practice, complete pivoting rarely provides meaningful accuracy benefits. + These decompositions are primarily useful for debugging, pedagogy, or the very rare case + where column pivoting is insufficient. + \section TopicLinAlgTerminology Terminology
diff --git a/doc/TutorialLinearAlgebra.dox b/doc/TutorialLinearAlgebra.dox index 8042fcad3..f2dd13e40 100644 --- a/doc/TutorialLinearAlgebra.dox +++ b/doc/TutorialLinearAlgebra.dox @@ -43,7 +43,23 @@ depending on your matrix, the problem you are trying to solve, and the trade-off Requirements
on the matrix Speed
(small-to-medium) Speed
(large) - Accuracy + Robustness* + + + LLT + llt() + Positive definite + +++ + +++ + + + + + LDLT + ldlt() + Positive or negative
semidefinite + +++ + + + ++ PartialPivLU @@ -54,14 +70,6 @@ depending on your matrix, the problem you are trying to solve, and the trade-off + - FullPivLU - fullPivLu() - None - - - - - - +++ - - HouseholderQR householderQr() None @@ -69,7 +77,7 @@ depending on your matrix, the problem you are trying to solve, and the trade-off ++ + - + ColPivHouseholderQR colPivHouseholderQr() None @@ -77,14 +85,6 @@ depending on your matrix, the problem you are trying to solve, and the trade-off - +++ - - FullPivHouseholderQR - fullPivHouseholderQr() - None - - - - - - +++ - CompleteOrthogonalDecomposition completeOrthogonalDecomposition() @@ -93,23 +93,7 @@ depending on your matrix, the problem you are trying to solve, and the trade-off - +++ - - LLT - llt() - Positive definite - +++ - +++ - + - - LDLT - ldlt() - Positive or negative
semidefinite - +++ - + - ++ - - BDCSVD bdcSvd() None @@ -126,15 +110,36 @@ depending on your matrix, the problem you are trying to solve, and the trade-off +++ + +* The Robustness column indicates how well the decomposition handles +ill-conditioned or rank-deficient matrices. All decompositions give excellent accuracy when their +requirements on the matrix are met and the problem is well-conditioned. + To get an overview of the true relative speed of the different decompositions, check this \link DenseDecompositionBenchmark benchmark \endlink. -All of these decompositions offer a solve() method that works as in the above example. +All of these decompositions offer a solve() method that works as in the above example. -If you know more about the properties of your matrix, you can use the above table to select the best method. -For example, a good choice for solving linear systems with a non-symmetric matrix of full rank is PartialPivLU. -If you know that your matrix is also symmetric and positive definite, the above table says that -a very good choice is the LLT or LDLT decomposition. Here's an example, also demonstrating that using a general -matrix (not a vector) as right hand side is possible: +\b Practical \b recommendations: +\li If your matrix is symmetric positive definite, use \b LLT. It is the fastest and is perfectly accurate + for this class of problems. If your matrix is only positive or negative semidefinite, use \b LDLT. +\li For a general invertible matrix, \b PartialPivLU is the best choice. It is fast (uses cache-friendly + blocking) and reliable for the vast majority of problems. +\li For least squares problems (over- or under-determined systems), \b CompleteOrthogonalDecomposition + is the recommended default. Like the SVD, it robustly computes the minimum-norm solution for + rank-deficient and under-determined problems, but at the cost of a QR decomposition rather than + an SVD. 
Use \b ColPivHouseholderQR if you only need least squares for full-rank overdetermined + systems and don't need the minimum-norm property. +\li \b SVD decompositions (BDCSVD, JacobiSVD) are the most robust but also the slowest. Use these when + you need singular values/vectors, not just the solution. +\li \b HouseholderQR is the fastest option for full-rank least squares problems, but it does not + reveal rank and cannot compute minimum-norm solutions for rank-deficient problems. +\li FullPivLU and FullPivHouseholderQR use complete pivoting, which is significantly slower due to + lack of blocking. In practice, they rarely provide meaningful benefits over PartialPivLU and + ColPivHouseholderQR, respectively, and are not recommended for general use. They are primarily useful + for debugging or for pedagogical purposes. + +Here's an example showing the use of LLT for a symmetric positive definite system, also demonstrating +that using a general matrix (not a vector) as right hand side is possible: @@ -151,14 +156,15 @@ supports many other decompositions), see our special page on \section TutorialLinAlgLeastsquares Least squares solving -The most general and accurate method to solve under- or over-determined linear systems -in the least squares sense, is the SVD decomposition. Eigen provides two implementations. -The recommended one is the BDCSVD class, which scales well for large problems -and automatically falls back to the JacobiSVD class for smaller problems. -For both classes, their solve() method solved the linear system in the least-squares -sense. +The recommended method to solve under- or over-determined linear systems in the least squares sense is +\b CompleteOrthogonalDecomposition. Like the SVD, it robustly computes the minimum-norm least squares +solution, correctly handling rank-deficient and under-determined problems, but it is significantly faster +since it is based on a rank-revealing QR decomposition rather than a full SVD. 
-Here is an example: +If you also need the singular values or vectors themselves (not just the least squares solution), use +\b BDCSVD, which scales well for large problems and automatically falls back to JacobiSVD for smaller ones. + +Here is an example using the SVD:
Example:Output:
@@ -167,11 +173,9 @@ Here is an example:
Example:Output:
-An alternative to the SVD, which is usually faster and about as accurate, is CompleteOrthogonalDecomposition. - -Again, if you know more about the problem, the table above contains methods that are potentially faster. -If your matrix is full rank, HouseHolderQR is the method of choice. If your matrix is full rank and well conditioned, -using the Cholesky decomposition (LLT) on the matrix of the normal equations can be faster still. +If you know more about the problem, faster methods are available. +If your matrix is full rank, HouseholderQR is the fastest method. If your matrix is full rank and +well conditioned, using the Cholesky decomposition (LLT) on the normal equations can be faster still. Our page on \link LeastSquares least squares solving \endlink has more details. @@ -267,8 +271,9 @@ singular matrix). On \ref TopicLinearAlgebraDecompositions "this table" you can whether they are rank-revealing or not. Rank-revealing decompositions offer at least a rank() method. They can also offer convenience methods such as isInvertible(), -and some are also providing methods to compute the kernel (null-space) and image (column-space) of the matrix, as is the -case with FullPivLU: +and some also provide methods to compute the kernel (null-space) and image (column-space) of the matrix. +ColPivHouseholderQR, CompleteOrthogonalDecomposition, and FullPivLU are all rank-revealing; kernel() and image() +are provided by FullPivLU. Here is an example using FullPivLU: diff --git a/doc/snippets/LeastSquaresCOD.cpp b/doc/snippets/LeastSquaresCOD.cpp new file mode 100644 index 000000000..b7325bc29 --- /dev/null +++ b/doc/snippets/LeastSquaresCOD.cpp @@ -0,0 +1,3 @@ +MatrixXf A = MatrixXf::Random(3, 2); +VectorXf b = VectorXf::Random(3); +cout << "The solution using the COD is:\n" << A.completeOrthogonalDecomposition().solve(b) << endl;