Benoit Jacob
340b8afb14
bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
...
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
Benoit Jacob
9f99f61e69
bug #936 , patch 1/3: some cleanup and renaming for consistency.
2015-01-30 17:43:56 -05:00
Gael Guennebaud
ae4644cc68
bug #907 , ARM64: workaround ICE in xcode/clang
2015-01-13 10:03:00 +01:00
Gael Guennebaud
36f7c1337f
bug #907 , ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang
2015-01-13 09:57:37 +01:00
Gael Guennebaud
63974bcb88
Big 907: workaround some missing intrinsics in current NDK's gcc version (ARM64)
2015-01-07 09:44:25 +01:00
Gael Guennebaud
79f4a59ed9
bug #907 : fix compilation with ARM64
2015-01-07 09:41:56 +01:00
Gael Guennebaud
ee06f78679
Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.
2014-11-04 21:58:52 +01:00
Konstantinos Margaritis
fcb3573d17
Merged eigen/eigen into default
2014-10-22 10:42:18 +03:00
Konstantinos Margaritis
fae4fd7a26
Added ARMv8 support
2014-10-22 07:39:49 +00:00
Konstantinos Margaritis
b508619392
working 64-bit support in PacketMath.h, Complex.h needed
2014-10-21 18:10:33 +00:00
Christoph Hertzberg
84aaa03182
Addendum to bug #859 : pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN.
...
psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0
2014-10-20 13:13:43 +02:00
Gael Guennebaud
aa5f79206f
Fix bug #859 : pexp(NaN) returned Inf instead of NaN
2014-10-20 11:38:51 +02:00
Konstantinos Margaritis
9d3c69952b
fixed to make big-endian VSX work as well
2014-10-01 09:43:56 +00:00
Konstantinos Margaritis
de38ff2499
prefetch are noops on VSX, actually disable the prefetch trait
2014-09-21 11:56:07 +00:00
Konstantinos Margaritis
60e093a9dc
Merged eigen/eigen into default
2014-09-21 14:02:51 +03:00
Konstantinos Margaritis
56408504e4
fix compile error on big endian altivec
2014-09-21 13:59:30 +03:00
Konstantinos Margaritis
974fe38ca3
prefetch are noops on VSX
2014-09-21 11:24:30 +00:00
Konstantinos Margaritis
c0205ca4af
VSX supports vec_div, implement where appropriate (float/doubles)
2014-09-21 08:12:22 +00:00
Konstantinos Margaritis
10f8aabb61
VSX port passes packetmath_[1-5] tests!
2014-09-20 22:31:31 +00:00
Konstantinos Margaritis
60663a510a
32-bit floats/ints, 64-bit doubles pass packetmath tests, complex 32/64-bit remaining
2014-09-19 21:05:01 +00:00
Konstantinos Margaritis
470aa15c35
First time it compiles, but fails to pass the tests.
2014-09-09 16:58:48 +00:00
Konstantinos Margaritis
7ff266e3ce
Initial VSX commit
2014-08-29 20:03:49 +00:00
Jitse Niesen
25bceefb4e
Replace asm by __asm__ (bug #873 )
2014-09-06 11:47:24 +01:00
Gael Guennebaud
0369db12af
bug #871 : fix compilation on ARM/Neon regarding __has_builtin usage
2014-09-01 10:52:58 +02:00
Konstantinos Margaritis
2c625ec9ba
Simplification of some Altivec constants, reuse existing constants and avoid loading from RAM esp in the case of p16uc_COMPLEX_TRANSPOSE*
2014-07-22 20:46:03 +00:00
Konstantinos Margaritis
0a945687b7
Added HasDiv=1 to Altivec PacketMath.h, now vectorization_logic test passes.
...
Added comments to the constants, indicative of the actual values
2014-07-15 11:02:51 +00:00
Christoph Hertzberg
d1460d9278
stride must be DenseIndex not int
2014-07-10 16:23:20 +02:00
Gael Guennebaud
b47ef1431f
Fix many long to int implicit conversions
2014-07-08 16:47:11 +02:00
Gael Guennebaud
d67aa1549b
Add missing add_subdirectory directive
2014-05-03 10:46:11 +02:00
Gael Guennebaud
450d0c3de0
Make sure that calls to broadcast4 are 16 bytes aligned
2014-04-25 22:25:48 +02:00
Gael Guennebaud
2dbfd83424
Implement pbroadcast4 on altivec
2014-04-25 02:46:57 -07:00
Gael Guennebaud
4def7b1fa5
Fix ptranspose overload prototypes for NEON
2014-04-25 11:15:13 +02:00
Gael Guennebaud
3d8d0f6269
Enable vectorization of pack_rhs with a column-major RHS.
...
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf
Enable fused madd for Altivec
2014-04-24 23:17:18 +02:00
Gael Guennebaud
8d85ce88e1
Implement ptranspose on altivec and fix pgather/pscatter
2014-04-24 05:47:53 -07:00
Benoit Steiner
4eb92e5647
Fixed the NEON implementation of predux_max<Packet4i>.
2014-04-23 18:23:07 -07:00
Benoit Steiner
ccb4dec719
Created a NEON version of the ptranspose packet primitives
2014-04-23 18:22:10 -07:00
Gael Guennebaud
82b09fcb91
Add Altivec implementation of pgather/pscatter (not tested)
2014-04-23 13:09:26 +02:00
Gael Guennebaud
934ce93886
merge with default branch
2014-04-22 17:00:38 +02:00
Gael Guennebaud
5c5231ab71
Workaround gcc's default ABI not being able to distinghish between vector types of different sizes.
2014-04-22 16:03:19 +02:00
Gael Guennebaud
1388f4f9fd
Fix typo (was working with clang\!)
2014-04-18 11:43:13 +02:00
Gael Guennebaud
2c3c95990d
merge
2014-04-17 22:50:49 +02:00
Benoit Steiner
6d6df90c9a
Implemented the pgather/pscatter packet primitives for the arm/NEON architecture
2014-04-17 12:28:01 -07:00
Gael Guennebaud
9746396d1b
Optimize AVX pset1 for complexes and ploaddup
2014-04-17 20:51:04 +02:00
Gael Guennebaud
0fa8290366
Optimize ploaddup for AVX
2014-04-17 16:02:27 +02:00
Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speeup on Haswell.
...
This changeset also introduce new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Benoit Steiner
feaf7c7e6d
Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (ie gcc 4.8).
2014-04-14 10:44:17 -07:00
Benoit Steiner
8044b00a7f
bug #782 : Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code.
2014-04-03 23:41:47 +02:00
Gael Guennebaud
1c0728043a
Workaround alignment warnings
2014-03-30 22:43:47 +02:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access to half-size packet types
2014-03-28 10:18:04 +01:00