Commit Graph

  • a885340ba5 Update rocm docker again. Antonio Sánchez 2024-12-06 17:19:31 +00:00
  • 45a8478d09 Update rocm docker image in CI. Antonio Sanchez 2024-12-06 07:14:59 -08:00
  • de4afcf414 Add a deploy phase to the CI that tags the latest nightly pipeline if it passes. Antonio Sánchez 2024-12-05 15:28:18 +00:00
  • 5e8916050b move constructor / move assignment doc strings Charles Schlosser 2024-12-04 17:42:20 +00:00
  • 77a073aaa8 fix checkformat ci stage Charles Schlosser 2024-12-04 02:45:52 +00:00
  • 41e46ed243 fix IOFormat alignment Charles Schlosser 2024-12-04 01:13:48 +00:00
  • a0d32e40d9 fix map fill logic Charles Schlosser 2024-11-30 13:39:02 +00:00
  • d34b100c13 Fix UB in setZero Charles Schlosser 2024-11-27 19:32:14 +00:00
  • f19a6803c8 Refactor special case handling in pow(x,y) and revert to repeated squaring for <float,int> Rasmus Munk Larsen 2024-11-27 00:24:21 +00:00
  • 5064cb7d5e Add test for using pcast on scalars. Rasmus Munk Larsen 2024-11-25 22:27:26 -08:00
  • 1ea61a5d26 Improve pow(x,y): 25% speedup, increase accuracy for integer exponents. Rasmus Munk Larsen 2024-11-26 06:13:48 +00:00
  • 8ad4344ca7 optimize setConstant, setZero Charles Schlosser 2024-11-22 03:39:19 +00:00
  • 5610a13b77 Simplify and speed up pow() by 5-6% Rasmus Munk Larsen 2024-11-20 12:45:00 +00:00
  • 6c6ce9d06b Enable vectorized erf<double>(x) for SSE and AVX, which was accidentally removed in merge request 1750. Rasmus Munk Larsen 2024-11-19 22:14:29 +00:00
  • e7c799b7c9 Prevent premature overflow to infinity in exp(x). The changes also provide a 3-4% speedup. Rasmus Munk Larsen 2024-11-19 13:08:18 -08:00
  • 00af47102d Revert 040180078d Rasmus Munk Larsen 2024-11-19 10:25:16 -08:00
  • 8ee6f8475a Speed up exp(x). Rasmus Munk Larsen 2024-11-19 17:50:34 +00:00
  • 93ec5450cb disable fill_n optimization for msvc Charles Schlosser 2024-11-19 01:38:48 +00:00
  • 0af6ab4b76 Remove unnecessary check for HasBlend trait. Rasmus Munk Larsen 2024-11-18 13:04:59 -08:00
  • d5eec781b7 Get rid of redundant computation for large arguments to erf(x). Rasmus Munk Larsen 2024-11-18 10:51:58 -08:00
  • 2fc63808e4 Fix C++20 constexpr test compilation failures Tyler Veness 2024-11-18 01:56:55 +00:00
  • 5133c836c0 Vectorize erf(x) for double. Rasmus Munk Larsen 2024-11-16 19:05:16 +00:00
  • d6e3b528b2 Update Assign_MKL.h to cast disparate enum type to int, so it can be compared... Conrad Poelman 2024-11-15 20:00:29 +00:00
  • 040180078d Ensure that destructors needed by lldb make it into binary in non-inlined fashion breathe1 2024-11-15 17:15:09 +00:00
  • 0fb2ed140d Make element accessors constexpr Tyler Veness 2024-11-14 01:05:29 +00:00
  • e67c494cba Use old syntax for CMake's separate_arguments() to restore compatibility with old CMake versions. Morris Hafner 2024-11-13 17:01:13 +00:00
  • 8b4efc8ed8 check_size_for_overflow: use numeric limits instead of c99 macro Charles Schlosser 2024-11-13 00:35:35 +00:00
  • 489dbbc651 make fixed_size matrices conform to std::is_standard_layout Charles Schlosser 2024-11-12 23:34:26 +00:00
  • 283d871a3f Add missing EIGEN_DEVICE_FUNCTION decorations. Rasmus Munk Larsen 2024-11-08 14:25:57 -08:00
  • 0d366f6532 Vectorize erfc(x) for double and improve erfc(x) for float. Rasmus Munk Larsen 2024-11-08 17:21:11 +00:00
  • 8adf43640e more avx predux_any Charles Schlosser 2024-11-07 19:58:48 +00:00
  • bc424f617a add missing avx predux_any functions Charles Schlosser 2024-11-07 19:11:29 +00:00
  • e52ac76ca3 use EIGEN_CPLUSPLUS instead of checking cpp version Charles Schlosser 2024-11-06 17:25:22 +00:00
  • 122be167cd Revert "make fixed-size objects trivially move assignable" Rasmus Munk Larsen 2024-11-06 01:09:38 +00:00
  • d49021212b Tensor Roll / Circular Shift / Rotate Tobias Wood 2024-11-05 14:10:19 +00:00
  • 3e7bcf54f7 cherry-pick !1682 Add nvc++ support into 3.4 Morris Hafner 2024-11-04 17:55:47 +00:00
  • bb73be8a2e make fixed-size objects trivially move assignable Charles Schlosser 2024-11-04 17:55:27 +00:00
  • 7fd305ecae Fix GPU builds. Antonio Sánchez 2024-11-01 04:50:03 +00:00
  • c8267654f2 Don't use __builtin_alloca_with_align with nvc++ Morris Hafner 2024-10-30 18:02:08 +00:00
  • 84c446df2c Fix macro redefinition warning in FFTW test Tyler Veness 2024-10-30 17:18:42 +00:00
  • a9584d8e3c Fix clang6 failures. Antonio Sánchez 2024-10-30 14:41:50 +00:00
  • dd4c2805d9 Fix clang6 failures. Antonio Sánchez 2024-10-29 22:18:30 +00:00
  • 9e962d9c54 Fix OOB access in triangular matrix multiplication. Antonio Sánchez 2024-10-29 19:07:07 +00:00
  • 695e49d1bd Fix NVCC builds for CUDA 10+. Antonio Sánchez 2024-10-29 18:38:14 +00:00
  • dae09773fc Don't pass matrices by value. Antonio Sánchez 2024-10-29 18:19:02 +00:00
  • c23ec3420e Add tests for sizeof() with one dynamic dimension. Rasmus Munk Larsen 2024-10-28 13:48:53 -07:00
  • 58b252e5b3 Fix typo in PacketMath.h Rasmus Munk Larsen 2024-10-28 18:19:52 +00:00
  • 6c04d0cd68 Add missing exp2 definition for Altivec. Rasmus Munk Larsen 2024-10-28 18:12:36 +00:00
  • b15ebb1c2d add nextafter for bfloat16 Peter Gavin 2024-10-21 21:23:41 +00:00
  • 53b83cddf9 Include <type_traits> in main.h for std::is_trivial* Rasmus Munk Larsen 2024-10-25 20:55:51 +00:00
  • 37563856c9 Fix stack allocation assert Charles Schlosser 2024-10-25 17:02:43 +00:00
  • 3f067c4850 Add exp2() as a packet op and array method. Rasmus Munk Larsen 2024-10-22 22:09:34 +00:00
  • 4e5136d239 make fixed size matrices and arrays trivially_default_constructible Charles Schlosser 2024-10-21 17:10:15 +00:00
  • b396a6fbb2 Add free-function swap. Antonio Sánchez 2024-10-14 15:51:40 +00:00
  • 820e8a45fb add compile time info to reverse in place Charles Schlosser 2024-10-13 17:55:56 +00:00
  • b55dab7f21 Fix DenseBase::tail for Dynamic template argument Charles Schlosser 2024-10-12 21:03:30 +00:00
  • e0cbc55d92 Update README.md Charles Schlosser 2024-10-10 01:54:30 +00:00
  • 7eea0a9213 Vectorize erfc() for float Rasmus Munk Larsen 2024-10-09 18:38:05 +00:00
  • 78f3c654ee Don't use constexpr with half. Rasmus Munk Larsen 2024-10-08 16:44:40 +00:00
  • 6d7af238fa Adjust array_cwise for 32-bit arm. Antonio Sánchez 2024-10-07 23:15:24 +00:00
  • 74dcfbbd0f Use ppolevl for polynomial evaluation in more places. Rasmus Munk Larsen 2024-10-07 13:27:28 -07:00
  • a097f728fe Avoid producing erf(x) = NaN for large |x|. Rasmus Munk Larsen 2024-10-04 12:15:23 -07:00
  • 44b16f48cb Improve speed and accuracy of erf() Rasmus Munk Larsen 2024-10-03 01:52:16 +00:00
  • 12068cbcdb Fix inverse evaluator for running on CUDA device. Antonio Sánchez 2024-10-01 20:59:54 +00:00
  • 4e8e5e7409 Add max_digits10 in NumTraits for mpreal types. Rasmus Munk Larsen 2024-10-01 11:45:06 -07:00
  • 8e8c319087 Add missing EIGEN_DEVICE_FUNC annotations. Rasmus Munk Larsen 2024-10-01 11:40:58 -07:00
  • 7ad7c1d5c5 fix implicit conversion warning (again) Charles Schlosser 2024-09-24 22:07:00 +00:00
  • d052b7f864 add extra debugging info to float_pow_test_impl, clean up array_cwise tests Charles Schlosser 2024-09-24 21:08:22 +00:00
  • ba5183f98c fix warning in EigenSolver::pseudoEigenvalueMatrix() Charles Schlosser 2024-09-24 17:23:58 +00:00
  • 3ffb4e50df fix implicit conversion in TensorChipping Charles Schlosser 2024-09-24 16:58:49 +00:00
  • b6b8b54e5e Fixed issue #2858: removed unneeded call to _mm_setzero_si128 Sean McBride 2024-09-24 16:29:45 +00:00
  • 2a3465102a Refactor code to use constexpr for data() functions. Frédéric BRIOL 2024-09-23 16:43:53 +00:00
  • 2d4c9b400c make fixed size matrices and arrays trivially_copy_constructible and trivially_move_constructible Charles Schlosser 2024-09-17 17:43:36 +00:00
  • 132f281f50 Fix generic ceil for SSE2. Antonio Sánchez 2024-09-14 01:31:21 +00:00
  • 84282c42fc optimize new dot product Charles Schlosser 2024-09-11 21:40:43 +00:00
  • fb477b8be1 Better dot products Charles Schlosser 2024-09-10 21:02:31 +00:00
  • 134b526d61 Update NonBlockingThreadPool.h plain asserts to use eigen_plain_assert Sophie Chang 2024-09-10 00:18:27 +00:00
  • 072ec9d954 Fix a bug for pcmp_lt_or_nan and Add sqrt support for SVE qile lin 2024-09-04 21:45:39 +00:00
  • 9315389795 Fix bug in bug fix for atanh. Rasmus Munk Larsen 2024-09-04 09:37:59 -07:00
  • f33af052e0 Fix bug for atanh(-1). Rasmus Munk Larsen 2024-09-03 20:54:01 +00:00
  • 66927f7807 Fix out-of-range arguments to _mm_permute_pd. Rasmus Munk Larsen 2024-08-30 10:29:25 -07:00
  • bbdabebf44 Vectorize atanh<double>. Make atanh(x) standard compliant for |x| >= 1. Rasmus Munk Larsen 2024-08-30 17:27:55 +00:00
  • 26e2c4f617 Add nvc++ support Morris Hafner 2024-08-30 12:34:48 +00:00
  • c59332d74a Detect "effectively inner/outer" chipping in TensorChipping Eugene Zhulenev 2024-08-29 17:49:59 +00:00
  • 648bce6cae SSE/AVX Complex FMA Charles Schlosser 2024-08-29 17:37:57 +00:00
  • c21a80be3d BDCSVD: Suppress Wmaybe-uninitialized Charles Schlosser 2024-08-29 02:45:38 +00:00
  • 9d3d37c5b7 Complex Numtraits::HasSign and nmsub test Charles Schlosser 2024-08-28 03:02:47 +00:00
  • c5189ac656 Fix GeneralizedEigenSolver::eigenvectors() not appearing in documentation Valentin Sarthou 2024-08-21 11:17:30 +02:00
  • 3b5a1b4157 sve intrinsics with "_x" suffix will be faster than "_z" suffix qile lin 2024-08-23 12:52:22 +00:00
  • 98f1ac5e65 Fix breakage in GPU build. Rasmus Munk Larsen 2024-08-23 06:08:37 +00:00
  • 231308f690 TensorVolumePatchOp: Suppress Wmaybe-uninitialized caused by unreachable code Charles Schlosser 2024-08-23 01:55:12 +00:00
  • 2bf8fe1489 NEON Complex Intrinsics Tobias Wood 2024-08-22 22:46:16 +00:00
  • f91f8e9ab9 Consolidate float and double implementations of patan(). Rasmus Munk Larsen 2024-08-21 20:44:18 +00:00
  • 87239e058a vectorize squaredNorm() for complex types Charles Schlosser 2024-08-21 10:54:17 +00:00
  • 32d95bb097 Add vectorized implementation of tanh<double> Rasmus Munk Larsen 2024-08-21 02:29:45 +00:00
  • cc240eea2f Speed up and improve accuracy of tanh. Rasmus Munk Larsen 2024-08-16 23:46:28 +00:00
  • 92e373e6f5 Speed up StableNorm for non-trivial sizes and improve consistency between aligned and unaligned inputs. Rasmus Munk Larsen 2024-08-14 21:42:04 +00:00
  • 1dbc7581ec Include <thread> for std::this_thread::yield(). Rasmus Munk Larsen 2024-08-14 17:44:14 +00:00
  • ab310943d6 Add a yield instruction in the two spinloops of the threaded matmul implementation. Rasmus Munk Larsen 2024-08-09 10:48:24 -07:00
  • 99ffad1971 A few cleanups to threaded product code and test. Rasmus Munk Larsen 2024-08-09 09:35:23 -07:00