Commit Graph

209 Commits

Author SHA1 Message Date
Gael Guennebaud
b240080e64 bug #1436: fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing. 2017-06-15 10:16:30 +02:00
Rasmus Munk Larsen
5e144bbaa4 Make NaN propagatation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
2017-01-24 13:32:50 -08:00
Gael Guennebaud
5d00fdf0e8 bug #1363: fix mingw's ABI issue 2016-12-15 11:58:31 +01:00
Gael Guennebaud
178c084856 Disable usage of SSE3 _mm_hadd_ps that is extremely slow. 2016-11-22 21:53:14 +01:00
Gael Guennebaud
f3fb0a1940 Disable usage of SSE3 haddpd that is extremely slow. 2016-11-22 16:58:31 +01:00
Gael Guennebaud
598de8b193 Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX. 2016-11-02 10:38:13 +01:00
Gael Guennebaud
aad72f3c6d Add missing inline keywords 2016-10-25 20:20:09 +02:00
Benoit Steiner
3e194a6a73 Fixed a typo 2016-10-25 08:42:15 -07:00
Gael Guennebaud
13fc18d3a2 Add a pinsertlast function replacing the last entry of a packet by a scalar.
(useful to vectorize LinSpaced)
2016-10-25 16:48:49 +02:00
Rasmus Munk Larsen
765615609d Update comment for fast sqrt. 2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen
3ed67cb0bb Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
                    SSE        AVX
Fast=1              2.529G     4.380G
Fast=0              1.944G     1.898G
Fast=1 fixed        2.214G     3.739G

This table illustrates the worst case in terms speed impact: It was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the different versions almost negligible.
2016-10-04 14:22:56 -07:00
Gael Guennebaud
471eac5399 bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) 2016-09-08 08:36:27 +02:00
Gael Guennebaud
35a8e94577 bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. 2016-08-29 10:59:37 +02:00
Gael Guennebaud
b3151bca40 Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available. 2016-08-23 14:24:08 +02:00
Gael Guennebaud
a4c266f827 Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only. 2016-08-23 14:23:08 +02:00
Benoit Jacob
40a16282c7 Remove now-unused protate PacketMath func 2016-05-24 11:01:18 -04:00
Benoit Steiner
8ce46f9d89 Improved implementation of ptanh for SSE and AVX 2016-02-18 13:24:34 -08:00
Benoit Steiner
6d8b1dce06 Avoid implicit cast from double to float. 2016-02-10 18:07:11 -08:00
Benoit Steiner
bfb3fcd94f Optimized implementation of the tanh function for SSE 2016-02-10 08:52:30 -08:00
Benoit Jacob
e6ee18d6b4 Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC 2016-02-10 11:11:49 -05:00
Benoit Jacob
964a95bf5e Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088 2016-02-10 10:37:22 -05:00
Gael Guennebaud
c2bf2f56ef Remove custom unaligned loads for SSE. They were only useful for core2 CPU. 2016-02-08 14:29:12 +01:00
Gael Guennebaud
7cae8918c0 Fix compilation on old gcc+AVX 2016-01-21 20:30:32 +01:00
Gael Guennebaud
8dca9f97e3 Add numext::sqrt function to enable custom optimized implementation.
This changeset add two specializations for float/double on SSE. Those
are mostly usefull with GCC for which std::sqrt add an extra and costly
check on the result of _mm_sqrt_*. Clang does not add this burden.

In this changeset, only DenseBase::norm() makes use of it.
2016-01-21 20:18:51 +01:00
Gael Guennebaud
70404e07c2 Workaround clang -Wdocumentation warning about "/*<" 2015-12-30 16:46:45 +01:00
Gael Guennebaud
ae87f094eb Fix "," in non SSE4 mode 2015-11-05 12:08:36 +01:00
Alexandre Avenel
d46e2c10a6 Add round, ceil and floor for SSE4.1/AVX (Bug #70) 2015-11-01 10:49:27 +01:00
Gael Guennebaud
6163db814c bug #1085: workaround gcc default ABI issue 2015-10-10 22:38:55 +02:00
Gael Guennebaud
f047ecc36a _mm_hadd_epi32 is for SSSE3 only (and not SSE3) 2015-10-07 15:48:35 +02:00
Gael Guennebaud
2c676ddb40 Handle various TODOs in SSE vectorization (remove splitted storeu, enable SSE3 integer vectorization, plus minor tweaks) 2015-10-06 15:43:27 +02:00
Gael Guennebaud
6245591349 Fix prototype of plset and generalize linspace functor. 2015-08-07 19:27:59 +02:00
Gael Guennebaud
ce57dbd937 Let unpacket_traits<> exposes the required alignment and make use of it everywhere 2015-08-07 10:44:01 +02:00
Benoit Steiner
4fd7f47692 Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined. 2015-03-02 09:38:47 -08:00
Benoit Steiner
05089aba75 Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts 2015-02-27 09:27:30 -08:00
Benoit Steiner
573b377110 Added support for vectorized type casting of tensors 2015-02-27 08:46:04 -08:00
Benoit Steiner
f41b1f1666 Added support for fast reciprocal square root computation. 2015-02-26 09:42:41 -08:00
Benoit Jacob
9bd8a4bab5 bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path
This is substantially faster on ARM, where it's important to minimize the number of loads.

This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.

Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).
2015-02-18 15:03:35 -05:00
Gael Guennebaud
eb563049f7 Remove some dead stores. 2015-02-18 11:26:48 +01:00
Gael Guennebaud
159fb181c2 Disable __m128* wrappers when compiling with AVX and -fabi-version=4 2015-02-17 16:27:20 +01:00
Gael Guennebaud
91ab2489dd Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI) 2015-02-17 16:08:07 +01:00
Gael Guennebaud
45cbb0bbb1 The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index 2015-02-16 15:05:41 +01:00
Gael Guennebaud
0918c51e60 merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper 2015-02-12 21:48:41 +01:00
Gael Guennebaud
fe25f3b8e3 FMA has been wrongly disabled 2015-02-10 23:11:35 +01:00
Benoit Steiner
c739102ef9 Pulled the latest changes from the trunk 2015-02-06 05:25:03 -08:00
Benoit Jacob
0f21613698 bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD 2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14 bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
Gael Guennebaud
ee06f78679 Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively. 2014-11-04 21:58:52 +01:00
Christoph Hertzberg
84aaa03182 Addendum to bug #859: pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN.
psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0
2014-10-20 13:13:43 +02:00
Gael Guennebaud
aa5f79206f Fix bug #859: pexp(NaN) returned Inf instead of NaN 2014-10-20 11:38:51 +02:00
Benoit Steiner
16047c8d4a Pulled in the latest changes from the Eigen trunk 2014-08-13 22:25:29 -07:00