eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	6bcd941ee3	Use pmsub in twoprod. This speeds up pow() on Skylake by ~1%.	2023-02-21 20:09:29 +00:00
Rasmus Munk Larsen	ce62177b5b	Vectorize atanh & add a missing definition and unit test for atan.	2023-02-21 03:14:05 +00:00
Charles Schlosser	71a8e60a7a	Tweak pasin_float, fix psqrt_complex	2023-02-15 01:01:14 +00:00
Charles Schlosser	325e3063d9	Optimize psign	2023-02-09 22:15:26 +00:00
Rasmus Munk Larsen	12ad99ce60	Remove unused variables from GenericPacketMathFunctions.h	2023-01-29 18:10:28 +00:00
Charles Schlosser	0471e61b4c	Optimize various mathematical packet ops	2023-01-28 01:34:26 +00:00
Rasmus Munk Larsen	462758e8a3	Don't use generic sign function for sign(complex) unless it is vectorizable	2022-10-12 16:03:29 +00:00
Rasmus Munk Larsen	72db3f0fa5	Remove references to M_PI_2 and M_PI_4.	2022-10-11 00:27:16 +00:00
Rasmus Munk Larsen	e95c4a837f	Simpler range reduction strategy for atan<float>().	2022-10-04 18:11:00 +00:00
Rasmus Munk Larsen	c475228b28	Vectorize atan() for double.	2022-10-01 01:49:30 +00:00
Antonio Sanchez	3e44f960ed	Reduce compiler warnings for tests.	2022-09-06 18:20:56 +00:00
Rasmus Munk Larsen	bd393e15c3	Vectorize acos, asin, and atan for float.	2022-08-29 19:49:33 +00:00
Charles Schlosser	e5af9f87f2	Vectorize pow for integer base / exponent types	2022-08-29 19:23:54 +00:00
Rasmus Munk Larsen	7064ed1345	Specialize psign<Packet8i> for AVX2, don't vectorize psign<bool>.	2022-08-26 17:02:37 +00:00
Rasmus Munk Larsen	98e51c9e24	Avoid undefined behavior in array_cwise test due to signed integer overflow	2022-08-26 16:19:03 +00:00
Rasmus Munk Larsen	6aad0f821b	Fix psign for unsigned integer types, such as bool.	2022-08-22 20:19:35 +00:00
Rasmus Munk Larsen	7c67dc67ae	Use proper double word division algorithm for pow<double>. Gives 11-15% speedup.	2022-08-17 18:36:23 +00:00
Charles Schlosser	76a669fb45	add fixed power unary operation	2022-08-16 21:32:36 +00:00
Rasmus Munk Larsen	97e0784dc6	Vectorize the sign operator in Eigen.	2022-08-09 19:54:57 +00:00
Rasmus Munk Larsen	7a87ed1b6a	Fix code and unit test for a few corner cases in vectorized pow()	2022-08-08 18:48:36 +00:00
Tobias Schlüter	cb1e8228e9	Convert bit calculation to constexpr, avoid casts.	2022-03-13 22:38:36 +09:00
Rasmus Munk Larsen	0e6f4e43f1	Fix a few confusing comments in psincos_float.	2022-03-04 20:41:49 +00:00
Sean McBride	f1b9692d63	Removed EIGEN_UNUSED decorations from many functions that are in fact used	2022-03-03 20:19:33 +00:00
Antonio Sánchez	6b60bd6754	Fix 32-bit arm int issue.	2022-02-04 21:59:33 +00:00
Rasmus Munk Larsen	7b5a8b6bc5	Improve plog: 20% speedup for float + handle denormals	2022-01-05 23:40:31 +00:00
Rasmus Munk Larsen	8eab7b6886	Improve exp<float>(): Don't flush denormal results; +4% speedup. 1. Speed up exp(x) by reducing the polynomial approximant from degree 7 to degree 6. With exactly representable coefficients computed by the Sollya tool, this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for arguments where exp(x) is a normalized float. This change results in a speedup of about 4% for AVX2. 2. Extend the range where exp(x) returns a non-zero result from ~[-88;88] to ~[-104;88], i.e. return denormalized values for large negative arguments instead of zero. Compared to exp<double>(x), the denormalized results gradually decrease in accuracy down to 0.033 relative error for arguments around x = -104, where exp(x) is ~std::numeric_limits<float>::denorm_min(). This is expected and acceptable.	2021-12-28 15:00:19 +00:00
Erik Schultheis	dee6428a71	fixed clang warnings about alignment change and floating point precision	2021-12-18 17:18:16 +00:00
Rasmus Munk Larsen	f04fd8b168	Make sure exp(-Inf) is zero for vectorized expressions. This fixes #2385 .	2021-12-08 17:57:23 +00:00
Erik Schultheis	ec2fd0f7ed	Require recent GCC and MSVC and removed `EIGEN_HAS_CXX14` and some other feature test macros	2021-12-01 00:48:34 +00:00
Kolja Brix	afa616bc9e	Fix some typos found	2021-09-23 15:22:00 +00:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Rasmus Munk Larsen	bbfc4d54cd	Use `padd` instead of `+`.	2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen	9312a5bf5c	Implement a generic vectorized version of Smith's algorithms for complex division.	2021-07-01 23:31:12 +00:00
Rasmus Munk Larsen	5aebbe9098	Get rid of redundant `pabs` instruction in complex square root.	2021-06-29 23:26:15 +00:00
Antonio Sanchez	12e8d57108	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps.	2021-06-16 18:41:17 -07:00
Rasmus Munk Larsen	fc87e2cbaa	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled.	2021-06-11 02:35:53 +00:00
Antonio Sanchez	8dfe1029a5	Augment NumTraits with min/max_exponent() again. Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase where possible. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. The previous MR !443 failed for C++03 due to lack of `constexpr`. Because of this, we need to keep around the `std::numeric_limits` version in enum expressions until the switch to C++11. Fixes #2148	2021-03-16 20:12:46 -07:00
David Tellenbach	df4bc2731c	Revert "Augment NumTraits with min/max_exponent()." This reverts commit `75ce9cd2a7`.	2021-03-17 03:06:08 +01:00
Antonio Sanchez	75ce9cd2a7	Augment NumTraits with min/max_exponent(). Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. Fixes #2148	2021-03-17 01:00:41 +00:00
Antonio Sanchez	82d61af3a4	Fix rint SSE/NEON again, using optimization barrier. This is a new version of !423, which failed for MSVC. Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG). Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away. Also fixed an edge case for large inputs that get bumped up by a power of two and end up rounding away more than just the fractional part. If we are over `2^digits`, then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-05 08:54:12 -08:00
Christoph Hertzberg	4fb3459a23	Fix double-promotion warnings (cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)	2021-02-27 18:44:26 +01:00
Rasmus Munk Larsen	88d4c6d4c8	Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that make pow<double> accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current implementation.	2021-02-23 23:11:03 +00:00
Rasmus Munk Larsen	7f09d3487d	Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp.	2021-02-18 20:49:18 +00:00
Rasmus Munk Larsen	be0574e215	New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double.	2021-02-17 02:50:32 +00:00
Antonio Sanchez	7ff0b7a980	Updated pfrexp implementation. The original implementation fails for 0, denormals, inf, and NaN. See #2150	2021-02-17 02:23:24 +00:00
Antonio Sanchez	9fde9cce5d	Adjust bounds for pexp_float/double. The original clamping bounds on `_x` actually produce finite values: `exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38` and `exp(709.437) = 1.27226e+308 < 1.79769e+308`, so with an accurate `ldexp` implementation, `pexp` fails for large inputs, producing finite values instead of `inf`. This adjusts the bounds slightly outside the finite range so that the output will overflow to +/- `inf` as expected.	2021-02-10 22:48:05 +00:00
Antonio Sanchez	4cb563a01e	Fix ldexp implementations. The previous implementations produced garbage values if the exponent did not fit within the exponent bits. See #2131 for a complete discussion, and !375 for other possible implementations. Here we implement the 4-factor version. See `pldexp_impl` in `GenericPacketMathFunctions.h` for a full description. The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>` requires `por`. Left as a "TODO" is to delegate to a faster version if we know the exponent does fit within the exponent bits. Fixes #2131.	2021-02-10 22:45:41 +00:00
Rasmus Munk Larsen	6e3b795f81	Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one.	2021-02-05 16:58:49 -08:00
Antonio Sanchez	f0e46ed5d4	Fix pow and other cwise ops for half/bfloat16. The new `generic_pow` implementation was failing for half/bfloat16 since their construction from int/float is not `constexpr`. Modified in `GenericPacketMathFunctions` to remove `constexpr`. While adding tests for half/bfloat16, found other issues related to implicit conversions. Also needed to implement `numext::arg` for non-integer, non-complex, non-float/double/long double types. These seem to be implicitly converted to `std::complex<T>`, which then fails for half/bfloat16.	2021-01-22 11:10:54 -08:00
Antonio Sanchez	b2126fd6b5	Fix pfrexp/pldexp for half. The recent addition of vectorized pow (!330) relies on `pfrexp` and `pldexp`. This was missing for `Eigen::half` and `Eigen::bfloat16`. Adding tests for these packet ops also exposed an issue with handling negative values in `pfrexp`, returning an incorrect exponent. Added the missing implementations, corrected the exponent in `pfrexp1`, and added `packetmath` tests.	2021-01-21 19:32:28 +00:00

97 Commits
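Commit 6bcd941ee3 computes twoprod with pmsub, a fused multiply-subtract (a*b - c). As a rough scalar illustration of the underlying error-free product transformation (the names two_prod and TwoProdResult are hypothetical, not Eigen's API), one FMA recovers the exact rounding error of a float product, so hi + lo equals a*b exactly when nothing overflows or underflows:

```cpp
#include <cmath>

// Hypothetical result type: the product split into a rounded part and its error.
struct TwoProdResult {
  float hi;  // fl(a * b), the ordinary rounded product
  float lo;  // a * b - hi, the exact rounding error
};

// Scalar sketch of the TwoProd transformation (not Eigen's packet code).
inline TwoProdResult two_prod(float a, float b) {
  TwoProdResult r;
  r.hi = a * b;
  // std::fma evaluates a * b - hi with a single rounding. The exact error of a
  // float product is itself representable as a float, so lo is computed exactly.
  // This is the a*b - c pattern that a fused multiply-subtract provides.
  r.lo = std::fma(a, b, -r.hi);
  return r;
}
```

Without hardware FMA, lo would have to be recovered with Dekker's splitting, which costs several more multiplies and adds; that difference is presumably where a fused instruction saves time.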
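Commit 9312a5bf5c implements a vectorized version of Smith's algorithm for complex division. The scalar sketch below (smith_divide is a hypothetical helper; Eigen's packet version is branch-free and differs in detail) shows the core idea: dividing numerator and denominator by the larger of |c| and |d| avoids forming c*c + d*d, which can overflow even when the quotient is representable:

```cpp
#include <cmath>
#include <complex>

// Scalar sketch of Smith's algorithm for (a + bi) / (c + di).
// Does not special-case a zero divisor (0/0 yields NaN components).
inline std::complex<double> smith_divide(std::complex<double> x,
                                         std::complex<double> y) {
  const double a = x.real(), b = x.imag();
  const double c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    const double r = d / c;        // |r| <= 1
    const double den = c + d * r;  // equals (c^2 + d^2) / c
    return {(a + b * r) / den, (b - a * r) / den};
  } else {
    const double r = c / d;        // |r| < 1
    const double den = c * r + d;  // equals (c^2 + d^2) / d
    return {(a * r + b) / den, (b * r - a) / den};
  }
}
```

For example, (1e300 + 1e300i) / (1e300 + 1e300i) returns exactly 1 + 0i here, whereas the textbook formula overflows when it squares 1e300.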
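Commit 82d61af3a4 describes rint via an add/subtract rounding trick guarded by an optimization barrier. A scalar sketch of that trick, under stated assumptions: rint_by_magic is a hypothetical name, `volatile` stands in for EIGEN_OPTIMIZATION_BARRIER, and doubles are assumed to be evaluated without excess precision in round-to-nearest-even mode. The cutoff used is 2^(digits-1) = 2^52 (2^23 for float); I read the commit's `2^digits` as counting stored mantissa bits:

```cpp
#include <cmath>
#include <limits>

// Scalar sketch of rint using the "add and subtract 2^52" trick.
inline double rint_by_magic(double x) {
  // 2^(digits - 1) = 2^52: every double with magnitude >= this is an integer.
  const double limit = std::ldexp(1.0, std::numeric_limits<double>::digits - 1);
  const double ax = std::fabs(x);
  // Large inputs, infinities and NaN are returned unchanged; adding `limit`
  // to them could round away more than the fractional part.
  if (!(ax < limit)) return x;
  // For 0 <= ax < 2^52, ax + 2^52 lies in [2^52, 2^53), where adjacent
  // doubles are exactly 1 apart, so this addition rounds ax to an integer.
  // `volatile` keeps the compiler from folding (ax + limit) - limit into ax.
  volatile double sum = ax + limit;
  const double r = sum - limit;
  // Restore the sign; copysign also yields -0.0 for inputs such as -0.3.
  return std::copysign(r, x);
}
```

Working on |x| and restoring the sign afterwards means only one magic constant is needed; ties follow the current rounding mode, so 2.5 becomes 2.0 and 3.5 becomes 4.0, matching std::rint.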
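Commit 4cb563a01e implements ldexp with a 4-factor split (see `pldexp_impl` as referenced there): a * 2^e = a * 2^b * 2^b * 2^b * 2^(e - 3b) with b = floor(e/4), so every factor is a normal power of two that can be built directly from exponent bits. A scalar sketch follows; the helper names and the clamp bound of +/-2099 are assumptions for illustration (2^-1074 * 2^2098 already overflows, so clamping there does not change the outcome):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: 2^k for k in [-1022, 1023], built by writing the
// biased exponent (k + 1023) into the exponent field of a double.
inline double pow2_normal(int k) {
  const std::uint64_t bits = static_cast<std::uint64_t>(k + 1023) << 52;
  double out;
  std::memcpy(&out, &bits, sizeof out);
  return out;
}

// Hypothetical scalar sketch of the 4-factor ldexp for double.
inline double ldexp_4factor(double a, int e) {
  // Clamp so that every factor below stays a normal power of two.
  e = std::min(std::max(e, -2099), 2099);
  const int b = static_cast<int>(std::floor(e / 4.0));  // b in [-525, 524]
  const double c = pow2_normal(b);
  double out = a * c * c * c;     // a * 2^(3b), evaluated left to right
  out *= pow2_normal(e - 3 * b);  // e - 3b = b + (e mod 4), in [-525, 527]
  return out;
}
```

This avoids spurious intermediate overflow and passes 0, inf and NaN through. It is not bit-identical to std::ldexp when the value passes through the subnormal range: for a = 3 * 2^-1074 and e = -1, the intermediate halvings round twice and give 0, while std::ldexp gives 2^-1073.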