Commit Graph

296 Commits

Author SHA1 Message Date
Benoit Jacob
dc04f12967 use unsigned short instead of uint16_t which doesn't exist in c++98 2015-03-17 10:31:45 -04:00
Benoit Jacob
35c3a8bb84 Update Nexus 5 lookup table from combining now 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance. 2015-03-16 11:05:51 -04:00
Benoit Jacob
02babb9c0f Provide a empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now. 2015-03-15 18:13:12 -04:00
Benoit Jacob
ccc1277a42 must also disable complex<double> when disabling double vectorization 2015-03-03 10:17:05 -05:00
Benoit Jacob
f839099512 Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics. 2015-03-03 09:35:22 -05:00
Benoit Jacob
1ec0f4fadf HalfPacket also needed to be disabled for double, on ARMv8. 2015-03-02 16:08:54 -05:00
Benoit Jacob
2fc3b484d7 remove trailing comma 2015-02-27 11:37:45 -05:00
Benoit Jacob
33669348c4 Disable Packet2f/2i halfpacket support in NEON.
I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented,
and code trying to use halfpackets just fails to compile on NEON, as it tries to use the
default implementation of pload/pstore and the types don't match.
2015-02-27 11:35:37 -05:00
Benoit Steiner
7765039f1c Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. 2015-02-19 21:22:51 -08:00
Benoit Jacob
9bd8a4bab5 bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path
This is substantially faster on ARM, where it's important to minimize the number of loads.

This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.

Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).
2015-02-18 15:03:35 -05:00
Gael Guennebaud
6f4adc9e94 Add missing install directives for arch/CUDA 2015-02-18 11:40:06 +01:00
Gael Guennebaud
eb563049f7 Remove some dead stores. 2015-02-18 11:26:48 +01:00
Gael Guennebaud
159fb181c2 Disable __m128* wrappers when compiling with AVX and -fabi-version=4 2015-02-17 16:27:20 +01:00
Gael Guennebaud
91ab2489dd Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI) 2015-02-17 16:08:07 +01:00
Gael Guennebaud
98604576d1 Merged in chtz/eigen-indexconversion (pull request PR-92)
bug #877, bug #572: Get rid of Index conversion warnings, summary of changes:

- Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t).

 - Eigen::Index is used throughout the API to represent indices, offsets, and sizes.

 - Classes storing an array of indices uses the type StorageIndex to store them. This is a template parameter of the class. Default is int.

 - Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used.
2015-02-16 15:29:00 +01:00
Gael Guennebaud
45cbb0bbb1 The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index 2015-02-16 15:05:41 +01:00
Benoit Steiner
e2cfddf75f Pulled latest updates from trunk 2015-02-13 16:21:59 -08:00
Benoit Steiner
0927801a84 Optimized version of the sin(), exp(), log() and sqrt() function for AVX 2015-02-13 16:07:08 -08:00
Gael Guennebaud
0918c51e60 merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper 2015-02-12 21:48:41 +01:00
Gael Guennebaud
029d236ceb merge 2015-02-10 23:12:47 +01:00
Gael Guennebaud
fe25f3b8e3 FMA has been wrongly disabled 2015-02-10 23:11:35 +01:00
Benoit Steiner
cc5d7ff523 Added vectorized implementation of the exponential function for ARM/NEON 2015-02-10 14:02:38 -08:00
Benoit Steiner
c739102ef9 Pulled the latest changes from the trunk 2015-02-06 05:25:03 -08:00
Benoit Jacob
5ef95fabee bug #936, patch 3/3: Properly detect FMA support on ARM (requires VFPv4)
and use it instead of MLA when available, because it's both more accurate,
and faster.
2015-01-30 17:45:03 -05:00
Benoit Jacob
0f21613698 bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD 2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14 bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
Benoit Jacob
9f99f61e69 bug #936, patch 1/3: some cleanup and renaming for consistency. 2015-01-30 17:43:56 -05:00
Gael Guennebaud
ae4644cc68 bug #907, ARM64: workaround ICE in xcode/clang 2015-01-13 10:03:00 +01:00
Gael Guennebaud
36f7c1337f bug #907, ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang 2015-01-13 09:57:37 +01:00
Gael Guennebaud
63974bcb88 Big 907: workaround some missing intrinsics in current NDK's gcc version (ARM64) 2015-01-07 09:44:25 +01:00
Gael Guennebaud
79f4a59ed9 bug #907: fix compilation with ARM64 2015-01-07 09:41:56 +01:00
Benoit Steiner
509e4ddc02 Added reduction packet primitives for CUDA 2014-11-19 10:34:11 -08:00
Gael Guennebaud
ee06f78679 Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively. 2014-11-04 21:58:52 +01:00
Benoit Steiner
1946cc4478 Added missing packet primitives for CUDA. 2014-10-30 17:52:32 -07:00
Konstantinos Margaritis
fcb3573d17 Merged eigen/eigen into default 2014-10-22 10:42:18 +03:00
Konstantinos Margaritis
fae4fd7a26 Added ARMv8 support 2014-10-22 07:39:49 +00:00
Konstantinos Margaritis
b508619392 working 64-bit support in PacketMath.h, Complex.h needed 2014-10-21 18:10:33 +00:00
Christoph Hertzberg
84aaa03182 Addendum to bug #859: pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN.
psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0
2014-10-20 13:13:43 +02:00
Gael Guennebaud
aa5f79206f Fix bug #859: pexp(NaN) returned Inf instead of NaN 2014-10-20 11:38:51 +02:00
Benoit Steiner
95a430a2ca Vector primitives for CUDA 2014-10-03 19:45:19 -07:00
Konstantinos Margaritis
9d3c69952b fixed to make big-endian VSX work as well 2014-10-01 09:43:56 +00:00
Konstantinos Margaritis
de38ff2499 prefetch are noops on VSX, actually disable the prefetch trait 2014-09-21 11:56:07 +00:00
Konstantinos Margaritis
60e093a9dc Merged eigen/eigen into default 2014-09-21 14:02:51 +03:00
Konstantinos Margaritis
56408504e4 fix compile error on big endian altivec 2014-09-21 13:59:30 +03:00
Konstantinos Margaritis
974fe38ca3 prefetch are noops on VSX 2014-09-21 11:24:30 +00:00
Konstantinos Margaritis
c0205ca4af VSX supports vec_div, implement where appropriate (float/doubles) 2014-09-21 08:12:22 +00:00
Konstantinos Margaritis
10f8aabb61 VSX port passes packetmath_[1-5] tests! 2014-09-20 22:31:31 +00:00
Konstantinos Margaritis
60663a510a 32-bit floats/ints, 64-bit doubles pass packetmath tests, complex 32/64-bit remaining 2014-09-19 21:05:01 +00:00
Benoit Steiner
10a79ca3a3 Merged latest updates from the Eigen trunk. 2014-09-15 09:18:16 -07:00
Konstantinos Margaritis
470aa15c35 First time it compiles, but fails to pass the tests. 2014-09-09 16:58:48 +00:00