Commit Graph

19 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
9aba527405 Revert changes to std_falback::log1p that broke handling of arguments less than -1. Fix packet op accordingly. 2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
a3298b22ec Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)

The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.

Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
Gael Guennebaud
f11364290e ICC does not support -fno-unsafe-math-optimizations 2019-03-22 09:26:24 +01:00
Gael Guennebaud
1c09ee8541 bug #1674: workaround clang fast-math aggressive optimizations 2019-02-22 15:48:53 +01:00
Gael Guennebaud
871e2e5339 bug #1674: disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise) 2019-02-03 08:54:47 +01:00
Gael Guennebaud
4356a55a61 PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and call it instead of falling back to std::sin/std::cos.
This makes both the small and huge argument cases faster because:
- for small inputs this removes the last pselect
- for large inputs only the reduction part follows a scalar path,
the rest use the same SIMD path as the small-argument case.
2019-01-14 13:54:01 +01:00
Gael Guennebaud
9005f0111f Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fix a compilation issue with gcc-4.7. 2019-01-11 17:10:54 +01:00
Gael Guennebaud
3f14e0d19e fix warning 2019-01-09 15:45:21 +01:00
Gael Guennebaud
e6b217b8dd bug #1652: implements a much more accurate version of vectorized sin/cos. This new version achieve same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
  - FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Gael Guennebaud
0f6f75bd8a Implement a faster fix for sin/cos of large entries that also correctly handle INF input. 2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8 Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate) 2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb Fix plog(+INF): it returned ~87 instead of +INF 2018-12-23 15:40:52 +01:00
Gael Guennebaud
b477d60bc6 Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) 2018-11-30 11:26:30 +01:00
Gael Guennebaud
b131a4db24 bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. 2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21 Update pshiftleft to pass the shift as a true compile-time integer. 2018-11-27 22:57:30 +01:00
Gael Guennebaud
fa7fd61eda Unify SSE/AVX psin functions.
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
 - add packet-float to packet-int type traits
 - add packet float<->int reinterpret casts
 - add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
Gael Guennebaud
502f92fa10 Unify SSE and AVX pexp for double. 2018-11-26 23:12:44 +01:00
Gael Guennebaud
cf8b85d5c5 Unify SSE and AVX implementation of pexp 2018-11-26 16:36:19 +01:00
Gael Guennebaud
2c44c40114 First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00