diff --git a/CHANGELOG.md b/CHANGELOG.md
index 87b92a971..5d724e90f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -345,16 +345,16 @@ Changes since 3.3.5:
 * #1617: Fix triangular solve crashing for empty matrix.
 * #785: Make dense Cholesky decomposition work for empty matrices.
 * #1634: Remove double copy in move-ctor of non movable Matrix/Array.
-* Changeset 588e1eb34eff: Workaround weird MSVC bug.
+* Changeset a2d6c106a450: Workaround weird MSVC bug.
 * #1637 Workaround performance regression in matrix products with gcc>=6 and clang>=6.0.
-* Changeset bf0f100339c1: Fix some implicit 0 to Scalar conversions.
+* Changeset 9ccbaaf3dd4c: Fix some implicit 0 to Scalar conversions.
 * #1605: Workaround ABI issue with vector types (aka `__m128`) versus scalar types (aka float).
-* Changeset d1421c479baa: Fix for gcc<4.6 regarding usage of #pragma GCC diagnostic push/pop.
-* Changeset c20b83b9d736: Fix conjugate-gradient for right-hand-sides with a very small magnitude.
-* Changeset 281a877a3bf7: Fix product of empty arrays (returned 0 instead of 1).
+* Changeset 148e579cc004: Fix for gcc<4.6 regarding usage of #pragma GCC diagnostic push/pop.
+* Changeset bc000deaae45: Fix conjugate-gradient for right-hand-sides with a very small magnitude.
+* Changeset 5be00b0e2964: Fix product of empty arrays (returned 0 instead of 1).
 * #1590: Fix collision with some system headers defining the macro FP32.
 * #1584: Fix possible undefined behavior in random generation.
-* Changeset d632d18db8ca: Fix fallback to BLAS for rankUpdate.
+* Changeset e4127b0f7d3b: Fix fallback to BLAS for rankUpdate.
 * Fixes for NVCC 9.
 * Fix matrix-market IO.
 * Various fixes in the doc.
@@ -365,94 +365,94 @@ Changes since 3.3.5:
 Changes since 3.3.4:
 
 * General bug fixes:
-  * Fix GeneralizedEigenSolver when requesting for eigenvalues only (0d15855abb30)
-  * #1560 fix product with a 1x1 diagonal matrix (90d7654f4a59)
+  * Fix GeneralizedEigenSolver when requesting for eigenvalues only (ab3fa2e12308)
+  * #1560 fix product with a 1x1 diagonal matrix (483beabab9bf)
   * #1543: fix linear indexing in generic block evaluation
-  * Fix compilation of product with inverse transpositions (e.g., `mat * Transpositions().inverse()`) (14a13748d761)
-  * #1509: fix `computeInverseWithCheck` for complexes (8be258ef0b6d)
-  * #1521: avoid signalling `NaN` in hypot and make it std::complex<> friendly (a9c06b854991).
-  * #1517: fix triangular product with unit diagonal and nested scaling factor: `(s*A).triangularView()*B` (a546d43bdd4f)
-  * Fix compilation of stableNorm for some odd expressions as input (499e982b9281)
-  * #1485: fix linking issue of non template functions (ae28c2aaeeda)
-  * Fix overflow issues in BDCSVD (92060f82e1de)
-  * #1468: add missing `std::` to `memcpy` (4565282592ae)
-  * #1453: fix Map with non-default inner-stride but no outer-stride (af00212cf3a4)
-  * Fix mixing types in sparse matrix products (7e5fcd0008bd)
-  * #1544: Generate correct Q matrix in complex case (c0c410b508a1)
-  * #1461: fix compilation of `Map::x()` (69652a06967d)
+  * Fix compilation of product with inverse transpositions (e.g., `mat * Transpositions().inverse()`) (170914dbbcc3)
+  * #1509: fix `computeInverseWithCheck` for complexes (a2a2c3c86507)
+  * #1521: avoid signalling `NaN` in hypot and make it std::complex<> friendly (b18e2d422b09).
+  * #1517: fix triangular product with unit diagonal and nested scaling factor: `(s*A).triangularView()*B` (c24844195d90)
+  * Fix compilation of stableNorm for some odd expressions as input (33b972d8b384)
+  * #1485: fix linking issue of non template functions (d18877f18d8e)
+  * Fix overflow issues in BDCSVD (7a875acfb05f)
+  * #1468: add missing `std::` to `memcpy` (32a6db0f8cd5)
+  * #1453: fix Map with non-default inner-stride but no outer-stride (1ca9072b51d8)
+  * Fix mixing types in sparse matrix products (4ead16cdd6c8)
+  * #1544: Generate correct Q matrix in complex case (39125654ce9e)
+  * #1461: fix compilation of `Map::x()` (9a266e5118cf)
 
 * Backends:
-  * Fix MKL backend for symmetric eigenvalues on row-major matrices (4726d6a24f69)
-  * #1527: fix support for MKL's VML (972424860545)
-  * Fix incorrect ldvt in LAPACKE call from JacobiSVD (88c4604601b9)
-  * Fix support for MKL's BLAS when using `MKL_DIRECT_CALL` (205731b87e19, b88c70c6ced7, 46e2367262e1)
-  * Use MKL's lapacke.h header when using MKL (19bc9df6b726)
+  * Fix MKL backend for symmetric eigenvalues on row-major matrices (eab7afe25273)
+  * #1527: fix support for MKL's VML (86a939451c75)
+  * Fix incorrect ldvt in LAPACKE call from JacobiSVD (bfc66e8b9a3b)
+  * Fix support for MKL's BLAS when using `MKL_DIRECT_CALL` (9df7f3d8e9cd, 3108fbf76708, 292dea7922e7)
+  * Use MKL's lapacke.h header when using MKL (070b5958e0ae)
 
 * Diagnostics:
-  * #1516: add assertion for out-of-range diagonal index in `MatrixBase::diagonal(i)` (783d38b3c78c)
-  * Add static assertion for fixed sizes `Ref<>` (e1203d5ceb8e)
-  * Add static assertion on selfadjoint-view's UpLo parameter. (b84db94c677e, 0ffe8a819801)
-  * #1479: fix failure detection in LDLT (67719139abc3)
+  * #1516: add assertion for out-of-range diagonal index in `MatrixBase::diagonal(i)` (273738ba6f6e)
+  * Add static assertion for fixed sizes `Ref<>` (1724dae8b834)
+  * Add static assertion on selfadjoint-view's UpLo parameter. (74daf12e525e, 190b46dd1f05)
+  * #1479: fix failure detection in LDLT (c20043c8fd64)
 
 * Compiler support:
   * #1555: compilation fix with XLC
-  * Workaround MSVC 2013 ambiguous calls (1c7b59b0b5f4)
-  * Adds missing `EIGEN_STRONG_INLINE` to help MSVC properly inlining small vector calculations (1ba3f10b91f2)
-  * Several minor warning fixes: 3c87fc0f1042, ad6bcf0e8efc, "used uninitialized" (20efc44c5500), Wint-in-bool-context (131da2cbc695, b4f969795d1b)
-  * #1428: make NEON vectorization compilable by MSVC. (* 3d1b3dbe5927, 4e1b7350182a)
-  * Fix compilation and SSE support with PGI compiler (faabf000855d 90d33b09040f)
-  * #1555: compilation fix with XLC (23eb37691f14)
-  * #1520: workaround some `-Wfloat-equal` warnings by calling `std::equal_to` (7d9a9456ed7c)
-  * Make the TensorStorage class compile with clang 3.9 (eff7001e1f0a)
-  * Misc: some old compiler fixes (493691b29be1)
-  * Fix MSVC warning C4290: C++ exception specification ignored except to indicate a function is not `__declspec(nothrow)` (524918622506)
+  * Workaround MSVC 2013 ambiguous calls (c92536d92647)
+  * Adds missing `EIGEN_STRONG_INLINE` to help MSVC properly inlining small vector calculations (01fb6217335b)
+  * Several minor warning fixes: f90d136c8445, 542fb03968c2, "used uninitialized" (7634a44bfe11), Wint-in-bool-context (3d1795da28c2, d1c2d6683c55)
+  * #1428: make NEON vectorization compilable by MSVC. (* 1e2d2693b911, 927d023ceaab)
+  * Fix compilation and SSE support with PGI compiler (bb87f618bfc3 450c5e5d2771)
+  * #1555: compilation fix with XLC (20ca86888e70)
+  * #1520: workaround some `-Wfloat-equal` warnings by calling `std::equal_to` (1c4fdad7bd6f)
+  * Make the TensorStorage class compile with clang 3.9 (a7144f8d6a94)
+  * Misc: some old compiler fixes (b60cbbef3791)
+  * Fix MSVC warning C4290: C++ exception specification ignored except to indicate a function is not `__declspec(nothrow)` (3df78d5afc1e)
 
 * Architecture support:
-  * Several AVX512 fixes for `log`, `sqrt`, `rsqrt`, non `AVX512ER` CPUs, `apply_rotation_in_the_plane` b64275e912ba cab3d626a59e 7ce234652ab9, d89b9a754371.
-  * AltiVec fixes: 9450038e380d
-  * NEON fixes: const-cast (e8a69835ccda), compilation of Jacobi rotations (c06cfd545b15,#1436).
-  * Changeset d0658cc9d4a2: Define `pcast<>` for SSE types even when AVX is enabled. (otherwise float are silently reinterpreted as int instead of being converted)
-  * #1494: makes `pmin`/`pmax` behave on Altivec/VSX as on x86 regarding NaNs (d0af83f82b19)
+  * Several AVX512 fixes for `log`, `sqrt`, `rsqrt`, non `AVX512ER` CPUs, `apply_rotation_in_the_plane` 5c59564bfb92 1939c971a3db c2f9e6cb37e5, 609e425166f6.
+  * AltiVec fixes: 1641a6cdd5a4
+  * NEON fixes: const-cast (877a2b64c9ba), compilation of Jacobi rotations (bc837b797559,#1436).
+  * Changeset 971b32440c74: Define `pcast<>` for SSE types even when AVX is enabled. (otherwise float are silently reinterpreted as int instead of being converted)
+  * #1494: makes `pmin`/`pmax` behave on Altivec/VSX as on x86 regarding NaNs (892c0a79ce93)
 
 * Documentation:
   * Update manual pages regarding BDCSVD (#1538)
-  * Add aliasing in common pitfaffs (2a5a8408fdc5)
-  * Update `aligned_allocator` (21e03aef9f2b)
-  * #1456: add perf recommendation for LLT and storage format (c8c154ebf130, 9aef1e23dbe0)
-  * #1455: Cholesky module depends on Jacobi for rank-updates (2e6e26b851a8)
-  * #1458: fix documentation of LLT and LDLT `info()` method (2a4cf4f473dd)
-  * Warn about constness in `LLT::solveInPlace` (518f97b69bdf)
-  * Fix lazyness of `operator*` with CUDA (c4dbb556bd36)
-  * #336: improve doc for `PlainObjectBase::Map` (13dc446545fe)
+  * Add aliasing in common pitfaffs (656712d48f6b)
+  * Update `aligned_allocator` (6fc0f2be70a4)
+  * #1456: add perf recommendation for LLT and storage format (55fbf4fedd04, 9fd138e2b333)
+  * #1455: Cholesky module depends on Jacobi for rank-updates (b87875abf8dc)
+  * #1458: fix documentation of LLT and LDLT `info()` method (ac2c97edff07)
+  * Warn about constness in `LLT::solveInPlace` (51e1aa153957)
+  * Fix lazyness of `operator*` with CUDA (fa77d713359d)
+  * #336: improve doc for `PlainObjectBase::Map` (18868228adae)
 
 * Other general improvements:
-  * Enable linear indexing in generic block evaluation (31537598bf83, 5967bc3c2cdb, #1543).
-  * Fix packet and alignment propagation logic of `Block` expressions. In particular, `(A+B).col(j)` now preserve vectorisation. (b323cc9c2c7f)
-  * Several fixes regarding custom scalar type support: hypot (f8d6c791791d), boost-multiprec (acb8ef9b2478), literal casts (6bbd97f17534, 39f65d65894f),
-  * LLT: avoid making a copy when decomposing in place (2f7e28920f4e), const the arg to `solveInPlace()` to allow passing `.transpose()`, `.block()`, etc. (c31c0090e998).
-  * Add possibility to overwrite `EIGEN_STRONG_INLINE` (7094bbdf3f4d)
-  * #1528: use `numeric_limits::min()` instead of `1/highest()` that might underflow (dd823c64ade7)
-  * #1532: disable `stl::*_negate` in C++17 (they are deprecated) (88e9452099d5)
-  * Add C++11 `max_digits10` for half (faf74dde8ed1)
-  * Make sparse QR result sizes consistent with dense QR (4638bc4d0f96)
+  * Enable linear indexing in generic block evaluation (15752027ec2f, 80af7d6a47c1, #1543).
+  * Fix packet and alignment propagation logic of `Block` expressions. In particular, `(A+B).col(j)` now preserve vectorisation. (9c9e90f6db7e)
+  * Several fixes regarding custom scalar type support: hypot (385d8b5e42c2), boost-multiprec (5f71579a2d3f), literal casts (e6577f3c3049, fbb0c510c52f),
+  * LLT: avoid making a copy when decomposing in place (9d03711df8bc), const the arg to `solveInPlace()` to allow passing `.transpose()`, `.block()`, etc. (0137ed4f19b6).
+  * Add possibility to overwrite `EIGEN_STRONG_INLINE` (6d6e5fcd4356)
+  * #1528: use `numeric_limits::min()` instead of `1/highest()` that might underflow (9ff315024335)
+  * #1532: disable `stl::*_negate` in C++17 (they are deprecated) (3fb42ff7b278)
+  * Add C++11 `max_digits10` for half (70ac6c923001)
+  * Make sparse QR result sizes consistent with dense QR (2136cfa17e28)
 
 * Unsupported/unit-tests/cmake/unvisible internals/etc.
-  * #1484: restore deleted line for 128 bits long doubles, and improve dispatching logic. (dffc0f957f19)
-  * #1462: remove all occurences of the deprecated `__CUDACC_VER__` macro by introducing `EIGEN_CUDACC_VER` (a201b8438d36)
-  * Changeset 2722aa8eb93f: Fix oversharding bug in parallelFor.
-  * Changeset ea1db80eab46: commit 45e9c9996da790b55ed9c4b0dfeae49492ac5c46 (HEAD -> memory_fix)
-  * Changeset 350957be012c: Fix int versus Index
-  * Changeset 424038431015: fix linking issue
-  * Changeset 3f938790b7e0: Fix short vs long
-  * Changeset ba14974d054a: Fix cmake scripts with no fortran compiler
-  * Changeset 2ac088501976: add cmake-option to enable/disable creation of tests
-  * Changeset 56996c54158b: Use col method for column-major matrix
-  * Changeset 762373ca9793: #1449: fix `redux_3` unit test
-  * Changeset eda96fd2fa30: Fix uninitialized output argument.
-  * Changeset 75a12dff8ca4: Handle min/max/inf/etc issue in `cuda_fp16.h` directly in `test/main.h`
-  * Changeset 568614bf79b8: Add tests for sparseQR results (value and size) covering bugs 1522 and 1544
-  * Changeset 12c9ece47d14: `SelfAdjointView<...,Mode>` causes a static assert since commit c73a77e47db8
-  * Changeset 899fd2ef704f: weird compilation issue in `mapped_matrix.cpp`
+  * #1484: restore deleted line for 128 bits long doubles, and improve dispatching logic. (c8e663fe87ec)
+  * #1462: remove all occurences of the deprecated `__CUDACC_VER__` macro by introducing `EIGEN_CUDACC_VER` (e7c065ec717b)
+  * Changeset fea50d40ea79: Fix oversharding bug in parallelFor.
+  * Changeset 866d222d6065: commit 45e9c9996da790b55ed9c4b0dfeae49492ac5c46 (HEAD -> memory_fix)
+  * Changeset 48048172e5aa: Fix int versus Index
+  * Changeset 906a98fe39c3: fix linking issue
+  * Changeset 352489edbe36: Fix short vs long
+  * Changeset 81e94eea024c: Fix cmake scripts with no fortran compiler
+  * Changeset 8bd392ca0e3f: add cmake-option to enable/disable creation of tests
+  * Changeset 02c0cef97fb5: Use col method for column-major matrix
+  * Changeset a8d2459f8e1f: #1449: fix `redux_3` unit test
+  * Changeset e90a14609a56: Fix uninitialized output argument.
+  * Changeset 5d40715db6a7: Handle min/max/inf/etc issue in `cuda_fp16.h` directly in `test/main.h`
+  * Changeset 2f9de522457b: Add tests for sparseQR results (value and size) covering bugs 1522 and 1544
+  * Changeset 4662c610c13c: `SelfAdjointView<...,Mode>` causes a static assert since commit d820ab9edc0b
+  * Changeset 96134409fc91: weird compilation issue in `mapped_matrix.cpp`
 
 ## [3.3.4] - 2017-06-15
 
@@ -686,7 +686,7 @@ Main changes since 3.3-beta1:
   * #779: in `Map`, allows non aligned buffers for buffers smaller than the requested alignment.
   * Add a complete orthogonal decomposition class: [CompleteOrthogonalDecomposition](http://eigen.tuxfamily.org/dox-devel/classEigen_1_1CompleteOrthogonalDecomposition.html)
   * Improve robustness of JacoviSVD with complexes (underflow, noise amplification in complex to real conversion, compare off-diagonal entries to the current biggest diagonal entry instead of the global biggest, null inputs).
-  * Change Eigen's ColPivHouseholderQR to use a numerically stable norm downdate formula (changeset 9da6c621d055)
+  * Change Eigen's ColPivHouseholderQR to use a numerically stable norm downdate formula (changeset acce4dd0500f)
   * #1214: consider denormals as zero in D&C SVD. This also workaround infinite binary search when compiling with ICC's unsafe optimizations.
   * Add log1p for arrays.
   * #1193: now `lpNorm` supports empty inputs.
@@ -709,7 +709,7 @@ Main changes since 3.3-beta1:
 * Performance improvements:
   * #256: enable vectorization with unaligned loads/stores. This concerns all architectures and all sizes. This new behavior can be disabled by defining `EIGEN_UNALIGNED_VECTORIZE=0`
   * Add support for s390x(zEC13) ZVECTOR instruction set.
-  * Optimize mixing of real with complex matrices by avoiding a conversion from real to complex when the real types do not match exactly. (see bccae23d7018)
+  * Optimize mixing of real with complex matrices by avoiding a conversion from real to complex when the real types do not match exactly. (see 76faf4a9657e)
   * Speedup square roots in performance critical methods such as norm, normalize(d).
   * #1154: use dynamic scheduling for spmv products.
   * #667, #1181: improve perf with MSVC and ICC through `FORCE_INLINE`
@@ -898,7 +898,7 @@ Main changes since 3.3-alpha1:
   * Add temporary-free evaluation of `D.nolias() *= C + A*B`.
   * Add vectorization of round, ceil and floor for SSE4.1/AVX.
   * Optimize assignment into a `Block` by using Ref and avoiding useless updates in non-compressed mode. This make row-by-row filling of a row-major sparse matrix very efficient.
-  * Improve internal cost model leading to faster code in some cases (see changeset 1bcb41187a45).
+  * Improve internal cost model leading to faster code in some cases (see changeset 77ff3386b7d2).
   * #1090: improve redux evaluation logic.
   * Enable unaligned vectorization of small fixed size matrix products.