Commit Graph

97 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
6bcd941ee3 Use pmsub in twoprod. This speeds up pow() on Skylake by ~1%. 2023-02-21 20:09:29 +00:00
Rasmus Munk Larsen
ce62177b5b Vectorize atanh & add a missing definition and unit test for atan. 2023-02-21 03:14:05 +00:00
Charles Schlosser
71a8e60a7a Tweak pasin_float, fix psqrt_complex 2023-02-15 01:01:14 +00:00
Charles Schlosser
325e3063d9 Optimize psign 2023-02-09 22:15:26 +00:00
Rasmus Munk Larsen
12ad99ce60 Remove unused variables from GenericPacketMathFunctions.h 2023-01-29 18:10:28 +00:00
Charles Schlosser
0471e61b4c Optimize various mathematical packet ops 2023-01-28 01:34:26 +00:00
Rasmus Munk Larsen
462758e8a3 Don't use generic sign function for sign(complex) unless it is vectorizable 2022-10-12 16:03:29 +00:00
Rasmus Munk Larsen
72db3f0fa5 Remove references to M_PI_2 and M_PI_4. 2022-10-11 00:27:16 +00:00
Rasmus Munk Larsen
e95c4a837f Simpler range reduction strategy for atan<float>(). 2022-10-04 18:11:00 +00:00
Rasmus Munk Larsen
c475228b28 Vectorize atan() for double. 2022-10-01 01:49:30 +00:00
Antonio Sanchez
3e44f960ed Reduce compiler warnings for tests. 2022-09-06 18:20:56 +00:00
Rasmus Munk Larsen
bd393e15c3 Vectorize acos, asin, and atan for float. 2022-08-29 19:49:33 +00:00
Charles Schlosser
e5af9f87f2 Vectorize pow for integer base / exponent types 2022-08-29 19:23:54 +00:00
Rasmus Munk Larsen
7064ed1345 Specialize psign<Packet8i> for AVX2, don't vectorize psign<bool>. 2022-08-26 17:02:37 +00:00
Rasmus Munk Larsen
98e51c9e24 Avoid undefined behavior in array_cwise test due to signed integer overflow 2022-08-26 16:19:03 +00:00
Rasmus Munk Larsen
6aad0f821b Fix psign for unsigned integer types, such as bool. 2022-08-22 20:19:35 +00:00
Rasmus Munk Larsen
7c67dc67ae Use proper double word division algorithm for pow<double>. Gives 11-15% speedup. 2022-08-17 18:36:23 +00:00
Charles Schlosser
76a669fb45 add fixed power unary operation 2022-08-16 21:32:36 +00:00
Rasmus Munk Larsen
97e0784dc6 Vectorize the sign operator in Eigen. 2022-08-09 19:54:57 +00:00
Rasmus Munk Larsen
7a87ed1b6a Fix code and unit test for a few corner cases in vectorized pow() 2022-08-08 18:48:36 +00:00
Tobias Schlüter
cb1e8228e9 Convert bit calculation to constexpr, avoid casts. 2022-03-13 22:38:36 +09:00
Rasmus Munk Larsen
0e6f4e43f1 Fix a few confusing comments in psincos_float. 2022-03-04 20:41:49 +00:00
Sean McBride
f1b9692d63 Removed EIGEN_UNUSED decorations from many functions that are in fact used 2022-03-03 20:19:33 +00:00
Antonio Sánchez
6b60bd6754 Fix 32-bit arm int issue. 2022-02-04 21:59:33 +00:00
Rasmus Munk Larsen
7b5a8b6bc5 Improve plog: 20% speedup for float + handle denormals 2022-01-05 23:40:31 +00:00
Rasmus Munk Larsen
8eab7b6886 Improve exp<float>(): Don't flush denormal results +4% speedup.
1. Speed up exp(x) by reducing the polynomial approximant from degree 7 to
degree 6. With exactly representable coefficients computed by the Sollya tool,
this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for
arguments where exp(x) is a normalized float. This change results in a speedup
of about 4% for AVX2.


2. Extend the range where exp(x) returns a non-zero result to from ~[-88;88] to
~[-104;88] i.e. return denormalized values for large negative arguments instead
of zero. Compared to exp<double>(x) the denormalized results gradually decrease
in accuracy down to 0.033 relative error for arguments around x = -104 where
exp(x) is ~std::numeric<float>::denorm_min(). This is expected and acceptable.
2021-12-28 15:00:19 +00:00
Erik Schultheis
dee6428a71 fixed clang warnings about alignment change and floating point precision 2021-12-18 17:18:16 +00:00
Rasmus Munk Larsen
f04fd8b168 Make sure exp(-Inf) is zero for vectorized expressions. This fixes #2385. 2021-12-08 17:57:23 +00:00
Erik Schultheis
ec2fd0f7ed Require recent GCC and MSCV and removed EIGEN_HAS_CXX14 and some other feature test macros 2021-12-01 00:48:34 +00:00
Kolja Brix
afa616bc9e Fix some typos found 2021-09-23 15:22:00 +00:00
Rasmus Munk Larsen
d7d0bf832d Issue an error in case of direct inclusion of internal headers. 2021-09-10 19:12:26 +00:00
Rasmus Munk Larsen
bbfc4d54cd Use padd instead of +. 2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen
9312a5bf5c Implement a generic vectorized version of Smith's algorithms for complex division. 2021-07-01 23:31:12 +00:00
Rasmus Munk Larsen
5aebbe9098 Get rid of redundant pabs instruction in complex square root. 2021-06-29 23:26:15 +00:00
Antonio Sanchez
12e8d57108 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.
2021-06-16 18:41:17 -07:00
Rasmus Munk Larsen
fc87e2cbaa Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled. 2021-06-11 02:35:53 +00:00
Antonio Sanchez
8dfe1029a5 Augment NumTraits with min/max_exponent() again.
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase where possible.  Also replaced some other `numeric_limits`
usages in affected tests with the `NumTraits` equivalent.

The previous MR !443 failed for c++03 due to lack of `constexpr`.
Because of this, we need to keep around the `std::numeric_limits`
version in enum expressions until the switch to c++11.

Fixes #2148
2021-03-16 20:12:46 -07:00
David Tellenbach
df4bc2731c Revert "Augment NumTraits with min/max_exponent()."
This reverts commit 75ce9cd2a7.
2021-03-17 03:06:08 +01:00
Antonio Sanchez
75ce9cd2a7 Augment NumTraits with min/max_exponent().
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase.  Also replaced some other `numeric_limits` usages in
affected tests with the `NumTraits` equivalent.

Fixes #2148
2021-03-17 01:00:41 +00:00
Antonio Sanchez
82d61af3a4 Fix rint SSE/NEON again, using optimization barrier.
This is a new version of !423, which failed for MSVC.

Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to
prevent operations involving `X` from crossing that barrier. Should
work on most `GNUC` compatible compilers (MSVC doesn't seem to need
this). This is a modified version adapted from what was used in
`psincos_float` and tested on more platforms
(see #1674, https://godbolt.org/z/73ezTG).

Modified `rint` to use the barrier to prevent the add/subtract rounding
trick from being optimized away.

Also fixed an edge case for large inputs that get bumped up a power of two
and ends up rounding away more than just the fractional part.  If we are
over `2^digits` then just return the input.  This edge case was missed in
the test since the test was comparing approximate equality, which was still
satisfied.  Adding a strict equality option catches it.
2021-03-05 08:54:12 -08:00
Christoph Hertzberg
4fb3459a23 Fix double-promotion warnings
(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
2021-02-27 18:44:26 +01:00
Rasmus Munk Larsen
88d4c6d4c8 Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that
make pow<double> accurate the 1 ULP. Speed for AVX-512 is within 0.5% of the currect
implementation.
2021-02-23 23:11:03 +00:00
Rasmus Munk Larsen
7f09d3487d Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp. 2021-02-18 20:49:18 +00:00
Rasmus Munk Larsen
be0574e215 New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double. 2021-02-17 02:50:32 +00:00
Antonio Sanchez
7ff0b7a980 Updated pfrexp implementation.
The original implementation fails for 0, denormals, inf, and NaN.

See #2150
2021-02-17 02:23:24 +00:00
Antonio Sanchez
9fde9cce5d Adjust bounds for pexp_float/double
The original clamping bounds on `_x` actually produce finite values:
```
  exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38

  exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.

This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
2021-02-10 22:48:05 +00:00
Antonio Sanchez
4cb563a01e Fix ldexp implementations.
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits.  See #2131 for a complete discussion,
and !375 for other possible implementations.

Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.

The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.

Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.

Fixes #2131.
2021-02-10 22:45:41 +00:00
Rasmus Munk Larsen
6e3b795f81 Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one. 2021-02-05 16:58:49 -08:00
Antonio Sanchez
f0e46ed5d4 Fix pow and other cwise ops for half/bfloat16.
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
in `GenericPacketMathFunctions` to remove `constexpr`.

While adding tests for half/bfloat16, found other issues related to
implicit conversions.

Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types.  These seem to be  implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Antonio Sanchez
b2126fd6b5 Fix pfrexp/pldexp for half.
The recent addition of vectorized pow (!330) relies on `pfrexp` and
`pldexp`.  This was missing for `Eigen::half` and `Eigen::bfloat16`.
Adding tests for these packet ops also exposed an issue with handling
negative values in `pfrexp`, returning an incorrect exponent.

Added the missing implementations, corrected the exponent in `pfrexp1`,
and added `packetmath` tests.
2021-01-21 19:32:28 +00:00