Benoit Steiner
05089aba75
Switch to truncated casting when converting floating-point types to integer. This ensures that vectorized casts are consistent with scalar casts.
2015-02-27 09:27:30 -08:00
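To illustrate why commit 05089aba75 matters, here is a minimal sketch in standard C++ (not Eigen's packet code). The helper names `cast_truncate` and `cast_round` are hypothetical. The first is the ordinary scalar cast, which drops the fractional part. The second stands in for a vector conversion that rounds to nearest, which is an assumption about what a non-truncating vectorized cast would do.

```cpp
#include <cmath>

// Scalar C++ conversion: discards the fractional part (rounds toward zero).
inline int cast_truncate(float x) { return static_cast<int>(x); }

// Hypothetical stand-in for a rounding conversion: rounds to the nearest
// integer under the default floating-point rounding mode.
inline int cast_round(float x) { return static_cast<int>(std::lrint(x)); }
```

For an input such as 2.7f the two helpers disagree (2 versus 3), so a vectorized path that rounds would produce different results from the scalar path on the same data.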
Benoit Steiner
573b377110
Added support for vectorized type casting of tensors
2015-02-27 08:46:04 -08:00
Benoit Steiner
f41b1f1666
Added support for fast reciprocal square root computation.
2015-02-26 09:42:41 -08:00
Benoit Steiner
7765039f1c
Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device.
2015-02-19 21:22:51 -08:00
Benoit Jacob
9bd8a4bab5
bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path
This is substantially faster on ARM, where it's important to minimize the number of loads.
This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.
Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (it is even about 1% slower).
2015-02-18 15:03:35 -05:00
Gael Guennebaud
6f4adc9e94
Add missing install directives for arch/CUDA
2015-02-18 11:40:06 +01:00
Gael Guennebaud
eb563049f7
Remove some dead stores.
2015-02-18 11:26:48 +01:00
Gael Guennebaud
159fb181c2
Disable __m128* wrappers when compiling with AVX and -fabi-version=4
2015-02-17 16:27:20 +01:00
Gael Guennebaud
91ab2489dd
Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI)
2015-02-17 16:08:07 +01:00
Gael Guennebaud
98604576d1
Merged in chtz/eigen-indexconversion (pull request PR-92)
bug #877, bug #572: Get rid of Index conversion warnings, summary of changes:
- Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t).
- Eigen::Index is used throughout the API to represent indices, offsets, and sizes.
- Classes storing an array of indices use the type StorageIndex to store them. This is a template parameter of the class. Default is int.
- Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used.
2015-02-16 15:29:00 +01:00
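The Index / StorageIndex split summarized in commit 98604576d1 can be sketched as follows. This is a standalone illustration, not Eigen code: the namespace `sketch` and the class `IndexList` are hypothetical, and only the convention (a `std::ptrdiff_t` Index for the API, an `int`-by-default StorageIndex template parameter for stored indices) is taken from the commit message.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical namespace, not Eigen itself

// API-facing type for indices, offsets, and sizes.
typedef std::ptrdiff_t Index;

// Hypothetical class that stores an array of indices.
template <typename StorageIndex = int>
class IndexList {
 public:
  // Sizes and positions are expressed with Index.
  Index size() const { return static_cast<Index>(m_indices.size()); }

  // Methods that explicitly set or return a stored element use StorageIndex.
  void push_back(StorageIndex value) { m_indices.push_back(value); }
  StorageIndex operator[](Index pos) const {
    return m_indices[static_cast<std::size_t>(pos)];
  }

 private:
  std::vector<StorageIndex> m_indices;  // compact storage, int by default
};

}  // namespace sketch
```

Keeping the stored type narrow saves memory in index arrays, while the wider Index in the public interface avoids conversion warnings when sizes and offsets are combined.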
Gael Guennebaud
45cbb0bbb1
The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index
2015-02-16 15:05:41 +01:00
Benoit Steiner
e2cfddf75f
Pulled latest updates from trunk
2015-02-13 16:21:59 -08:00
Benoit Steiner
0927801a84
Optimized versions of the sin(), exp(), log() and sqrt() functions for AVX
2015-02-13 16:07:08 -08:00
Gael Guennebaud
0918c51e60
merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper
2015-02-12 21:48:41 +01:00
Gael Guennebaud
029d236ceb
merge
2015-02-10 23:12:47 +01:00
Gael Guennebaud
fe25f3b8e3
FMA has been wrongly disabled
2015-02-10 23:11:35 +01:00
Benoit Steiner
cc5d7ff523
Added vectorized implementation of the exponential function for ARM/NEON
2015-02-10 14:02:38 -08:00
Benoit Steiner
c739102ef9
Pulled the latest changes from the trunk
2015-02-06 05:25:03 -08:00
Benoit Jacob
5ef95fabee
bug #936, patch 3/3: Properly detect FMA support on ARM (requires VFPv4)
and use it instead of MLA when available, because it's both more accurate
and faster.
2015-01-30 17:45:03 -05:00
Benoit Jacob
0f21613698
bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, which was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD
2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14
bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
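The definition of "fused" given in commit 340b8afb14 (one rounding at the end, none between the mul and the add) can be observed in standard C++. This is a minimal sketch, not Eigen code; the helper names `mul_then_add` and `fused_mul_add` are hypothetical, and `std::fma` is used as a portable stand-in for a hardware instruction such as VFMA.

```cpp
#include <cmath>

// Not fused: the product is rounded to double before the addition.
// The volatile store keeps the compiler from contracting the expression
// into a fused operation behind our back.
inline double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Fused: std::fma computes a * b + c with a single rounding at the end.
inline double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60, which rounds to 1 in double precision. The non-fused helper therefore returns 0, while the fused one returns -2^-60. Whether the non-fused form takes one instruction or two is a separate question, which is the distinction the renamed macros capture.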
Benoit Jacob
9f99f61e69
bug #936, patch 1/3: some cleanup and renaming for consistency.
2015-01-30 17:43:56 -05:00
Gael Guennebaud
ae4644cc68
bug #907, ARM64: workaround ICE in xcode/clang
2015-01-13 10:03:00 +01:00
Gael Guennebaud
36f7c1337f
bug #907, ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang
2015-01-13 09:57:37 +01:00
Gael Guennebaud
63974bcb88
bug #907: workaround some missing intrinsics in current NDK's gcc version (ARM64)
2015-01-07 09:44:25 +01:00
Gael Guennebaud
79f4a59ed9
bug #907: fix compilation with ARM64
2015-01-07 09:41:56 +01:00
Benoit Steiner
509e4ddc02
Added reduction packet primitives for CUDA
2014-11-19 10:34:11 -08:00
Gael Guennebaud
ee06f78679
Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.
2014-11-04 21:58:52 +01:00
Benoit Steiner
1946cc4478
Added missing packet primitives for CUDA.
2014-10-30 17:52:32 -07:00
Konstantinos Margaritis
fcb3573d17
Merged eigen/eigen into default
2014-10-22 10:42:18 +03:00
Konstantinos Margaritis
fae4fd7a26
Added ARMv8 support
2014-10-22 07:39:49 +00:00
Konstantinos Margaritis
b508619392
working 64-bit support in PacketMath.h, Complex.h needed
2014-10-21 18:10:33 +00:00
Christoph Hertzberg
84aaa03182
Addendum to bug #859: pexp(NaN) for double did not return NaN, and plog(NaN) did not return NaN either.
psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0
2014-10-20 13:13:43 +02:00
Gael Guennebaud
aa5f79206f
Fix bug #859: pexp(NaN) returned Inf instead of NaN
2014-10-20 11:38:51 +02:00
Benoit Steiner
95a430a2ca
Vector primitives for CUDA
2014-10-03 19:45:19 -07:00
Konstantinos Margaritis
9d3c69952b
fixed to make big-endian VSX work as well
2014-10-01 09:43:56 +00:00
Konstantinos Margaritis
de38ff2499
prefetches are no-ops on VSX, so actually disable the prefetch trait
2014-09-21 11:56:07 +00:00
Konstantinos Margaritis
60e093a9dc
Merged eigen/eigen into default
2014-09-21 14:02:51 +03:00
Konstantinos Margaritis
56408504e4
fix compile error on big endian altivec
2014-09-21 13:59:30 +03:00
Konstantinos Margaritis
974fe38ca3
prefetches are no-ops on VSX
2014-09-21 11:24:30 +00:00
Konstantinos Margaritis
c0205ca4af
VSX supports vec_div, implement where appropriate (float/doubles)
2014-09-21 08:12:22 +00:00
Konstantinos Margaritis
10f8aabb61
VSX port passes packetmath_[1-5] tests!
2014-09-20 22:31:31 +00:00
Konstantinos Margaritis
60663a510a
32-bit floats/ints, 64-bit doubles pass packetmath tests, complex 32/64-bit remaining
2014-09-19 21:05:01 +00:00
Benoit Steiner
10a79ca3a3
Merged latest updates from the Eigen trunk.
2014-09-15 09:18:16 -07:00
Konstantinos Margaritis
470aa15c35
First time it compiles, but fails to pass the tests.
2014-09-09 16:58:48 +00:00
Konstantinos Margaritis
7ff266e3ce
Initial VSX commit
2014-08-29 20:03:49 +00:00
Benoit Steiner
16047c8d4a
Pulled in the latest changes from the Eigen trunk
2014-08-13 22:25:29 -07:00
Jitse Niesen
25bceefb4e
Replace asm by __asm__ (bug #873)
2014-09-06 11:47:24 +01:00
Gael Guennebaud
0369db12af
bug #871: fix compilation on ARM/Neon regarding __has_builtin usage
2014-09-01 10:52:58 +02:00
Konstantinos Margaritis
2c625ec9ba
Simplification of some Altivec constants: reuse existing constants and avoid loading from RAM, especially in the case of p16uc_COMPLEX_TRANSPOSE*
2014-07-22 20:46:03 +00:00