eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	9aba527405	Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.	2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen	a3298b22ec	Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments. Depending on instruction set, significant speedups are observed for the vectorized path: log1p wall time is reduced 60-93% (2.5x - 15x speedup); expm1 wall time is reduced 0-85% (1x - 7x speedup). The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly. Full benchmarks, measured on an Intel(R) Xeon(R) Gold 6154, are here: https://bitbucket.org/snippets/rmlarsen/MXBkpM	2019-08-12 13:53:28 -07:00
Gael Guennebaud	f11364290e	ICC does not support -fno-unsafe-math-optimizations	2019-03-22 09:26:24 +01:00
Gael Guennebaud	1c09ee8541	bug #1674 : workaround clang fast-math aggressive optimizations	2019-02-22 15:48:53 +01:00
Gael Guennebaud	871e2e5339	bug #1674 : disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise)	2019-02-03 08:54:47 +01:00
Gael Guennebaud	4356a55a61	PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and calls it instead of falling back to std::sin/std::cos. This makes both the small and huge argument cases faster because: for small inputs this removes the last pselect; for large inputs only the reduction part follows a scalar path, while the rest uses the same SIMD path as the small-argument case.	2019-01-14 13:54:01 +01:00
Gael Guennebaud	9005f0111f	Replace the compiler's alignas/alignof extensions with the respective C++11 keywords when available. This also fixes a compilation issue with gcc-4.7.	2019-01-11 17:10:54 +01:00
Gael Guennebaud	3f14e0d19e	fix warning	2019-01-09 15:45:21 +01:00
Gael Guennebaud	e6b217b8dd	bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieves the same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows: without FMA, 1 ULP up to 3pi and 2 ULP up to sin(25966) and cos(18838), with fallback to std::sin/cos for larger inputs; with FMA, 1 ULP up to sin(117435.992) and cos(71476.0625), with fallback to std::sin/cos for larger inputs.	2019-01-09 15:25:17 +01:00
Gael Guennebaud	0f6f75bd8a	Implement a faster fix for sin/cos of large entries that also correctly handles INF input.	2018-12-23 17:26:21 +01:00
Gael Guennebaud	38d704def8	Make sure that psin/pcos return a number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)	2018-12-23 16:13:24 +01:00
Gael Guennebaud	5713fb7feb	Fix plog(+INF): it returned ~87 instead of +INF	2018-12-23 15:40:52 +01:00
Gael Guennebaud	b477d60bc6	Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)	2018-11-30 11:26:30 +01:00
Gael Guennebaud	b131a4db24	bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.	2018-11-27 23:45:00 +01:00
Gael Guennebaud	a1a5fbbd21	Update pshiftleft to pass the shift as a true compile-time integer.	2018-11-27 22:57:30 +01:00
Gael Guennebaud	fa7fd61eda	Unify SSE/AVX psin functions. It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: - add packet-float to packet-int type traits - add packet float<->int reinterpret casts - add faster pselect for AVX based on blendv	2018-11-27 22:41:51 +01:00
Gael Guennebaud	502f92fa10	Unify SSE and AVX pexp for double.	2018-11-26 23:12:44 +01:00
Gael Guennebaud	cf8b85d5c5	Unify SSE and AVX implementation of pexp	2018-11-26 16:36:19 +01:00
Gael Guennebaud	2c44c40114	First step toward a unification of packet log implementation, currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.	2018-11-26 14:21:24 +01:00

19 Commits
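
Commit a3298b22ec above refers to Kahan's formulas for log1p and expm1 and to an extra branch for +infinity. The following is a minimal scalar sketch of those formulas in C++, written from the general description only. It is not Eigen's code: the function names kahan_log1p and kahan_expm1 are hypothetical, and the infinity check shown is one possible way to add such a branch.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not an Eigen function): log(1 + x) via Kahan's formula.
double kahan_log1p(double x) {
  const double u = 1.0 + x;
  if (u == 1.0) return x;           // x is too small to change 1.0, so log1p(x) ~ x
  const double log_u = std::log(u);
  if (u == log_u) return u;         // only possible when u is +infinity
  // The factor x / (u - 1) compensates for the rounding error made in 1 + x.
  return x * (log_u / (u - 1.0));
}

// Hypothetical helper (not an Eigen function): exp(x) - 1 via Kahan's formula.
double kahan_expm1(double x) {
  const double u = std::exp(x);
  if (u == 1.0) return x;           // exp(x) rounded to 1.0, so expm1(x) ~ x
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;     // exp(x) underflowed relative to 1.0
  const double log_u = std::log(u);
  if (u == log_u) return u;         // only possible when u is +infinity
  // The factor x / log(u) compensates for the rounding error made in exp(x).
  return um1 * (x / log_u);
}
```

Arguments below -1 make std::log return NaN in kahan_log1p, which is the behavior the later revert (9aba527405) is concerned with. Note that compiling with fast-math style flags may remove the NaN and infinity handling this sketch relies on.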