Gael Guennebaud
35a8e94577
bug #1167 : simplify installation of header files using cmake's install(DIRECTORY ...) command.
2016-08-29 10:59:37 +02:00
Gael Guennebaud
b3151bca40
Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available.
2016-08-23 14:24:08 +02:00
Gael Guennebaud
a4c266f827
Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only.
2016-08-23 14:23:08 +02:00
Benoit Jacob
40a16282c7
Remove now-unused protate PacketMath func
2016-05-24 11:01:18 -04:00
Benoit Steiner
8ce46f9d89
Improved implementation of ptanh for SSE and AVX
2016-02-18 13:24:34 -08:00
Benoit Steiner
6d8b1dce06
Avoid implicit cast from double to float.
2016-02-10 18:07:11 -08:00
Benoit Steiner
bfb3fcd94f
Optimized implementation of the tanh function for SSE
2016-02-10 08:52:30 -08:00
Benoit Jacob
e6ee18d6b4
Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC
2016-02-10 11:11:49 -05:00
Benoit Jacob
964a95bf5e
Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088
2016-02-10 10:37:22 -05:00
Gael Guennebaud
c2bf2f56ef
Remove custom unaligned loads for SSE. They were only useful for core2 CPU.
2016-02-08 14:29:12 +01:00
Gael Guennebaud
7cae8918c0
Fix compilation on old gcc+AVX
2016-01-21 20:30:32 +01:00
Gael Guennebaud
8dca9f97e3
Add numext::sqrt function to enable custom optimized implementation.
This changeset adds two specializations for float/double on SSE. Those
are mostly useful with GCC, for which std::sqrt adds an extra and costly
check on the result of _mm_sqrt_*. Clang does not add this burden.
In this changeset, only DenseBase::norm() makes use of it.
2016-01-21 20:18:51 +01:00
Gael Guennebaud
70404e07c2
Workaround clang -Wdocumentation warning about "/*<"
2015-12-30 16:46:45 +01:00
Gael Guennebaud
ae87f094eb
Fix "," in non SSE4 mode
2015-11-05 12:08:36 +01:00
Alexandre Avenel
d46e2c10a6
Add round, ceil and floor for SSE4.1/AVX (Bug #70 )
2015-11-01 10:49:27 +01:00
Gael Guennebaud
6163db814c
bug #1085 : workaround gcc default ABI issue
2015-10-10 22:38:55 +02:00
Gael Guennebaud
f047ecc36a
_mm_hadd_epi32 is for SSSE3 only (and not SSE3)
2015-10-07 15:48:35 +02:00
Gael Guennebaud
2c676ddb40
Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks)
2015-10-06 15:43:27 +02:00
Gael Guennebaud
6245591349
Fix prototype of plset and generalize linspace functor.
2015-08-07 19:27:59 +02:00
Gael Guennebaud
ce57dbd937
Let unpacket_traits<> expose the required alignment and make use of it everywhere
2015-08-07 10:44:01 +02:00
Benoit Steiner
4fd7f47692
Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.
2015-03-02 09:38:47 -08:00
Benoit Steiner
05089aba75
Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts
2015-02-27 09:27:30 -08:00
Benoit Steiner
573b377110
Added support for vectorized type casting of tensors
2015-02-27 08:46:04 -08:00
Benoit Steiner
f41b1f1666
Added support for fast reciprocal square root computation.
2015-02-26 09:42:41 -08:00
Benoit Jacob
9bd8a4bab5
bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path
This is substantially faster on ARM, where it's important to minimize the number of loads.
This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.
Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (it is even about 1% slower).
2015-02-18 15:03:35 -05:00
Gael Guennebaud
eb563049f7
Remove some dead stores.
2015-02-18 11:26:48 +01:00
Gael Guennebaud
159fb181c2
Disable __m128* wrappers when compiling with AVX and -fabi-version=4
2015-02-17 16:27:20 +01:00
Gael Guennebaud
91ab2489dd
Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI)
2015-02-17 16:08:07 +01:00
Gael Guennebaud
45cbb0bbb1
The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index
2015-02-16 15:05:41 +01:00
Gael Guennebaud
0918c51e60
merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper
2015-02-12 21:48:41 +01:00
Gael Guennebaud
fe25f3b8e3
FMA has been wrongly disabled
2015-02-10 23:11:35 +01:00
Benoit Steiner
c739102ef9
Pulled the latest changes from the trunk
2015-02-06 05:25:03 -08:00
Benoit Jacob
0f21613698
bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD
2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14
bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
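The distinction drawn in the commit message above, between a multiply-add issued as a single instruction and a true fused multiply-add with one rounding, can be illustrated in plain C++. This is a minimal sketch and not Eigen code: the function names mul_then_add and fused_mul_add are hypothetical, and std::fma from <cmath> stands in for a hardware fused instruction such as VFMA.

```cpp
#include <cmath>

// Hypothetical helper: multiply, round, then add (two roundings).
// The volatile store forces the product to be rounded to double and keeps
// the compiler from contracting the expression into a fused instruction.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // first rounding happens here
    return product + c;               // second rounding happens here
}

// Hypothetical helper: fused multiply-add. std::fma computes a*b + c as if
// with infinite precision and rounds only once, at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

For example, with a = 1 + 2^-27 and c = -(1 + 2^-26), the exact product a*a is 1 + 2^-26 + 2^-54. The separate multiply rounds away the 2^-54 term, so mul_then_add(a, a, c) yields 0, while fused_mul_add(a, a, c) keeps it and yields 2^-54.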
Gael Guennebaud
ee06f78679
Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.
2014-11-04 21:58:52 +01:00
Christoph Hertzberg
84aaa03182
Addendum to bug #859 : pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN.
psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0
2014-10-20 13:13:43 +02:00
Gael Guennebaud
aa5f79206f
Fix bug #859 : pexp(NaN) returned Inf instead of NaN
2014-10-20 11:38:51 +02:00
Benoit Steiner
16047c8d4a
Pulled in the latest changes from the Eigen trunk
2014-08-13 22:25:29 -07:00
Gael Guennebaud
b47ef1431f
Fix many long to int implicit conversions
2014-07-08 16:47:11 +02:00
Benoit Steiner
29aebf96e6
Created the pblend packet primitive and implemented it using SSE and AVX instructions.
2014-06-06 20:18:44 -07:00
Gael Guennebaud
450d0c3de0
Make sure that calls to broadcast4 are 16 bytes aligned
2014-04-25 22:25:48 +02:00
Gael Guennebaud
3d8d0f6269
Enable vectorization of pack_rhs with a column-major RHS.
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf
Enable fused madd for Altivec
2014-04-24 23:17:18 +02:00
Gael Guennebaud
5c5231ab71
Workaround gcc's default ABI not being able to distinguish between vector types of different sizes.
2014-04-22 16:03:19 +02:00
Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell.
This changeset also introduces new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Benoit Steiner
feaf7c7e6d
Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e. gcc 4.8).
2014-04-14 10:44:17 -07:00
Gael Guennebaud
1c0728043a
Workaround alignment warnings
2014-03-30 22:43:47 +02:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access half-size packet types
2014-03-28 10:18:04 +01:00
Benoit Steiner
8a94cb3edd
Implemented the SSE version of the gather and scatter packet primitives.
2014-03-27 18:29:01 -07:00
Gael Guennebaud
052aedd394
Implement pcplflip, palign, predux and the like from AVX/complexes
2014-03-27 14:47:00 +01:00