141 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
75bcd155c4 Vectorize tan(x)
libeigen/eigen!2086

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-12-02 21:53:10 +00:00
Rasmus Munk Larsen
db90c4939c Add a ptanh_float implementation that is accurate to 1 ULP
libeigen/eigen!2082

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-26 00:17:12 +00:00
Rasmus Munk Larsen
8eb6551a8a Add support for complex numbers in the generic clang backend
libeigen/eigen!2078

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-20 00:26:37 +00:00
Rasmus Munk Larsen
ec93a6d098 Add a generic Eigen backend based on clang vector extensions
The goal of this MR is to implement a generic SIMD backend (packet ops) for Eigen that uses clang vector extensions instead of platform-dependent intrinsics. Ideally, this should make it possible to build Eigen and achieve reasonable speed on any platform that has a recent clang compiler, without having to write any inline assembly or intrinsics.

Caveats:

* The current implementation is a proof of concept and supports vectorization for float, double, int32_t, and int64_t using fixed-size 512-bit vectors (a somewhat arbitrary choice). I have not done much to tune this for speed yet.
* For now, there is no way to enable this other than setting -DEIGEN_VECTORIZE_GENERIC on the command line.
* This only compiles with newer versions of clang. I have tested that it compiles and all tests pass with clang 19.1.7.

https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors
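
For readers unfamiliar with the approach, the following is a minimal sketch of packet ops built on compiler vector types rather than platform intrinsics. It is not the code from this MR: the names `Packet16f`, `pset1_sketch`, `pload_sketch`, `padd_sketch`, `pmul_sketch`, and `predux_sketch` are hypothetical. It assumes the GCC-compatible `vector_size` attribute, which clang also accepts; the log does not say which attribute the MR itself uses. The 512-bit width mirrors the fixed size mentioned in the caveats.

```cpp
#include <cassert>

// Hypothetical packet type: 16 floats * 4 bytes = 64 bytes = 512 bits.
// vector_size is accepted by both clang and GCC; no intrinsics are needed.
typedef float Packet16f __attribute__((vector_size(64)));

// Broadcast one scalar to all 16 lanes.
inline Packet16f pset1_sketch(float a) {
  Packet16f p = {};
  for (int i = 0; i < 16; ++i) p[i] = a;
  return p;
}

// Load 16 contiguous floats from memory, lane by lane.
inline Packet16f pload_sketch(const float* src) {
  assert(src != nullptr);
  Packet16f p = {};
  for (int i = 0; i < 16; ++i) p[i] = src[i];
  return p;
}

// Element-wise arithmetic uses the ordinary operators; the compiler
// chooses whatever instructions the target supports.
inline Packet16f padd_sketch(Packet16f a, Packet16f b) { return a + b; }
inline Packet16f pmul_sketch(Packet16f a, Packet16f b) { return a * b; }

// Horizontal sum of all lanes.
inline float predux_sketch(Packet16f a) {
  float s = 0.0f;
  for (int i = 0; i < 16; ++i) s += a[i];
  return s;
}
```

With these helpers, `predux_sketch(padd_sketch(pset1_sketch(1.5f), pset1_sketch(2.0f)))` sums sixteen lanes that each hold 3.5. On targets without 512-bit registers the compiler is expected to split each operation into narrower ones, and GCC may warn about the calling convention for such wide vector arguments.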

Closes #2998 and #2997

See merge request libeigen/eigen!2051

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
Co-authored-by: Antonio Sánchez <cantonios@google.com>
2025-11-06 21:52:19 +00:00
Antonio Sánchez
db8bd5b825 Modify pselect and various masks to use Scalar(1) for true. 2025-06-20 22:40:46 +00:00
Rasmus Munk Larsen
66d8111ac1 Use a more conservative method to detect non-finite inputs to cbrt. 2025-04-21 20:59:46 +00:00
Tyler Veness
d6689a15d7 Replace instances of EIGEN_CONSTEXPR macro 2025-04-18 08:27:52 -07:00
Rasmus Munk Larsen
33f5f59614 Vectorize cbrt for float and double. 2025-04-17 23:31:20 +00:00
Rasmus Munk Larsen
f19a6803c8 Refactor special case handling in pow(x,y) and revert to repeated squaring for <float,int> 2024-11-27 00:24:21 +00:00
Rasmus Munk Larsen
1ea61a5d26 Improve pow(x,y): 25% speedup, increase accuracy for integer exponents. 2024-11-26 06:13:48 +00:00
Rasmus Munk Larsen
5610a13b77 Simplify and speed up pow() by 5-6% 2024-11-20 12:45:00 +00:00
Rasmus Munk Larsen
e7c799b7c9 Prevent premature overflow to infinity in exp(x). The changes also provide a 3-4% speedup. 2024-11-19 13:08:18 -08:00
Rasmus Munk Larsen
8ee6f8475a Speed up exp(x). 2024-11-19 17:50:34 +00:00
Rasmus Munk Larsen
283d871a3f Add missing EIGEN_DEVICE_FUNCTION decorations. 2024-11-08 14:25:57 -08:00
Rasmus Munk Larsen
0d366f6532 Vectorize erfc(x) for double and improve erfc(x) for float. 2024-11-08 17:21:11 +00:00
Rasmus Munk Larsen
3f067c4850 Add exp2() as a packet op and array method. 2024-10-22 22:09:34 +00:00
Rasmus Munk Larsen
74dcfbbd0f Use ppolevl for polynomial evaluation in more places. 2024-10-07 13:27:28 -07:00
Antonio Sánchez
132f281f50 Fix generic ceil for SSE2. 2024-09-14 01:31:21 +00:00
Rasmus Munk Larsen
9315389795 Fix bug in bug fix for atanh. 2024-09-04 09:37:59 -07:00
Rasmus Munk Larsen
f33af052e0 Fix bug for atanh(-1). 2024-09-03 20:54:01 +00:00
Rasmus Munk Larsen
bbdabebf44 Vectorize atanh<double>. Make atanh(x) standard compliant for |x| >= 1. 2024-08-30 17:27:55 +00:00
Rasmus Munk Larsen
f91f8e9ab9 Consolidate float and double implementations of patan(). 2024-08-21 20:44:18 +00:00
Rasmus Munk Larsen
32d95bb097 Add vectorized implementation of tanh<double> 2024-08-21 02:29:45 +00:00
Rasmus Munk Larsen
cc240eea2f Speed up and improve accuracy of tanh. 2024-08-16 23:46:28 +00:00
Frédéric Chapoton
6331da95eb fixing a lot of typos 2024-07-30 22:15:49 +00:00
Rasmus Munk Larsen
9000b37677 Fix new generic nearest integer ops on GPU. 2024-04-30 22:18:25 +00:00
Charles Schlosser
fb95e90f7f Add truncation op 2024-04-29 23:45:49 +00:00
Rasmus Munk Larsen
112ad8b846 Revert part of !1583, which may cause underflow on ARM. 2024-04-22 21:14:38 +00:00
Chip Kerchner
ad452e575d Fix compilation problems with PacketI on PowerPC. 2024-04-18 14:55:15 +00:00
Damiano Franzò
888fca0e2b Simd sincos double 2024-04-15 21:12:32 +00:00
Rasmus Munk Larsen
5226566a14 Speed up pldexp_generic. 2024-04-12 01:32:17 +00:00
Antonio Sánchez
38fcedaf8e Fix pexp complex test edge-cases. 2024-03-04 17:44:38 +00:00
Antonio Sánchez
6b365e74d6 Fix GPU build for ptanh_float. 2024-02-20 16:08:50 +00:00
Damiano Franzò
be06c9ad51 Implement float pexp_complex 2024-02-17 00:26:57 +00:00
Rasmus Munk Larsen
4d419e2209 Rename generic_fast_tanh_float to ptanh_float and move it to... 2024-02-16 21:27:22 +00:00
Damiano Franzò
7fd7a3f946 Implement plog_complex 2024-01-30 19:06:05 +00:00
Antonio Sánchez
a73970a864 Fix arm32 issues. 2024-01-23 22:04:55 +00:00
Tobias Wood
f38e16c193 Apply clang-format 2023-11-29 11:12:48 +00:00
Antonio Sánchez
6e4d5d4832 Add IWYU private pragmas to internal headers. 2023-08-21 16:25:22 +00:00
Charles Schlosser
387175c258 Fix safe_abs in int_pow 2023-06-23 04:12:41 +00:00
Charles Schlosser
b7151ffaab Fix unary pow error handling and test 2023-06-06 18:46:55 +00:00
Charles Schlosser
1d80e23186 Optimize scalar_unary_pow_op error handling 2023-06-02 18:53:06 +00:00
Charles Schlosser
29c8e3c754 fix pow for uint32_t, disable pmul<Packet4ul> 2023-04-21 05:47:56 +00:00
Rasmus Munk Larsen
0488b708b4 Vectorize tensor.isnan() by using typed predicates. 2023-03-16 04:04:22 +00:00
Rasmus Munk Larsen
6bcd941ee3 Use pmsub in twoprod. This speeds up pow() on Skylake by ~1%. 2023-02-21 20:09:29 +00:00
Rasmus Munk Larsen
ce62177b5b Vectorize atanh & add a missing definition and unit test for atan. 2023-02-21 03:14:05 +00:00
Charles Schlosser
71a8e60a7a Tweak pasin_float, fix psqrt_complex 2023-02-15 01:01:14 +00:00
Charles Schlosser
325e3063d9 Optimize psign 2023-02-09 22:15:26 +00:00
Rasmus Munk Larsen
12ad99ce60 Remove unused variables from GenericPacketMathFunctions.h 2023-01-29 18:10:28 +00:00
Charles Schlosser
0471e61b4c Optimize various mathematical packet ops 2023-01-28 01:34:26 +00:00