Gael Guennebaud
450d0c3de0
Make sure that calls to broadcast4 are 16-byte aligned
2014-04-25 22:25:48 +02:00
Gael Guennebaud
3d8d0f6269
Enable vectorization of pack_rhs with a column-major RHS.
...
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf
Enable fused madd for Altivec
2014-04-24 23:17:18 +02:00
Gael Guennebaud
5c5231ab71
Work around gcc's default ABI not being able to distinguish between vector types of different sizes.
2014-04-22 16:03:19 +02:00
Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell.
...
This changeset also introduces new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Benoit Steiner
feaf7c7e6d
Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e. gcc 4.8).
2014-04-14 10:44:17 -07:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access half-size packet types
2014-03-28 10:18:04 +01:00
Benoit Steiner
8a94cb3edd
Implemented the SSE version of the gather and scatter packet primitives.
2014-03-27 18:29:01 -07:00
Benoit Steiner
a419cea4a0
Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions.
...
Implemented the primitive using SSE instructions.
2014-03-26 19:03:07 -07:00
Benoit Steiner
cc73164aa8
Merged latest updates from the parent branch
2014-03-26 15:23:59 -07:00
Gael Guennebaud
bc401eb6fa
Implement new 1 packet x 8 gebp kernel
2014-03-26 18:53:00 +01:00
Gael Guennebaud
b286a1e75c
add pbroadcast2/4 generic intrinsics
2014-03-26 16:46:36 +01:00
Benoit Steiner
db7d49efbb
Added support for FMA instructions
2014-02-24 13:45:32 -08:00
Benoit Steiner
64a85800bd
Added support for AVX to Eigen.
2014-01-29 11:43:05 -08:00
Gael Guennebaud
a7621809fe
Remove useless register keyword, and optimize predux_min/max for SSE4
2014-01-25 16:54:13 +01:00
Gael Guennebaud
01fd880424
Revert previous change and introduce a new workaround regarding gcc generating a shufps instruction instead of the more efficient pshufd instruction.
...
The trick consists in introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply.
Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering.
2014-03-20 16:03:46 +01:00
Gael Guennebaud
c39a3fa7a1
Make gcc generate a pshufd instruction for pset1
2014-03-20 10:14:26 +01:00
Gael Guennebaud
d4dd6aaed2
Fix bug #642: add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled
2013-08-19 16:02:27 +02:00
Gael Guennebaud
b3adc4face
Add missing pconj specializations
2013-05-17 17:25:29 +02:00
Gael Guennebaud
d63712163c
Add SSE4 min/max for integers
2013-03-20 18:28:40 +01:00
Gael Guennebaud
e8aa1f00c5
add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1
2012-07-27 23:40:04 +02:00
Benoit Jacob
69124cfca2
Automatic relicensing to MPL2 using Keir's script. Manual fixup follows.
2012-07-13 14:42:47 -04:00
Jitse Niesen
3c412183b2
Get rid of include directives inside namespace blocks (bug #339).
2012-04-15 11:06:28 +01:00
Gael Guennebaud
634fedaf68
proper C++ casting
2012-01-31 18:56:25 +01:00
Gael Guennebaud
9c86ee2695
fix static inline versus inline static issues (the former is the correct order)
2012-01-31 12:58:52 +01:00
Gael Guennebaud
c331c092d5
no comment
2011-09-21 14:20:41 +02:00
Gael Guennebaud
7301f4345c
quick workaround for MSVC9's ICE in pset1
2011-09-21 14:18:41 +02:00
Gael Guennebaud
c8e1b679fa
re-enable fast pset1-pstore by introducing a new higher level pstore1 function
2011-03-02 10:55:44 +01:00
Benoit Jacob
eef03525b8
fix bug #203: revert to using _mm_set1_p[sd]
2011-02-28 00:04:05 -05:00
Benoit Jacob
9be2712bf7
remove now-useless comments
2011-02-27 22:35:17 -05:00
Benoit Jacob
0612768c1c
fix bug #201: Clang also has intrinsics bugs preventing us from using custom unaligned loads
2011-02-27 21:59:07 -05:00
Benoit Jacob
b3544ce2ae
bug #195 - fix this once and for all: just never use _mm_load_sd on gcc/i386, it generates redundant x87 ops
2011-02-27 17:26:59 -05:00
Benoit Jacob
5dfae4524b
fix bug #195: fast unaligned load for integers using _mm_load_sd failed when the value was interpreted as a NaN
2011-02-24 10:31:57 -05:00
Hauke Heibel
1a6597b8e4
MSVC does not like using uninitialized SSE variables, so we have to pass all zeros.
2011-02-12 21:29:16 +01:00
Gael Guennebaud
9d2bf35a05
implement optimized ploadu for MSVC10: this also fixes bad code generation in gebp_kernel :)
2011-02-12 16:40:09 +01:00
Benoit Jacob
6a5a13e394
The pfirst hack is also needed on MSVC 2010 as it goes completely nuts, even though it doesn't segfault as MSVC 2008 did
2011-02-09 15:13:23 -05:00
Hauke Heibel
7bc8e3ac09
Initial fixes for bug #85.
...
Renamed meta_{true|false} to {true|false}_type, meta_if to conditional, is_same_type to is_same, un{ref|pointer|const} to remove_{reference|pointer|const} and makeconst to add_const.
Changed boolean type 'ret' member to 'value'.
Changed 'ret' members referring to types to 'type'.
Adapted all code occurrences.
2010-10-25 22:13:49 +02:00
Benoit Jacob
4716040703
bug #86: use internal:: namespace instead of ei_ prefix
2010-10-25 10:15:22 -04:00
Benoit Jacob
b80d9dd42e
fix determination of number of registers on sse:
...
__i386__ was not defined by MSVC 2010.
fixed as (2*sizeof(void*)).
also move that to SSE/ and let the default for unknown architectures be just 8.
2010-08-13 13:55:28 -04:00
Gael Guennebaud
c2ee454df4
* fix compilation of mixed scalar product
...
* optimize mixed scalar products
2010-07-19 16:49:09 +02:00
Gael Guennebaud
f8aae7a908
* _mm_loaddup_pd is slow
...
* optimize SSE ei_ploaddup<Packet4f>
2010-07-19 15:43:27 +02:00
Gael Guennebaud
cd0e5dca9b
wip: extend the gebp kernel to optimize complex and mixed products
2010-07-19 08:50:59 +02:00
Gael Guennebaud
ff96c94043
mixing types in product step 2:
...
* pload* and pset1 are now templated on the packet type
* gemv routines are now embedded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix work fine,
some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67
sync
2010-07-10 22:58:51 +02:00
Benoit Jacob
6dcd373b9d
let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions with a single instruction.
2010-07-09 18:51:17 -04:00
Gael Guennebaud
96f9015807
disable MSVC optimization when the underlying compiler is ICC
2010-07-09 19:33:43 +02:00
Gael Guennebaud
300a226ffa
scalars fitting in a single packet requires more work, step 1
...
* add a, Alignable trait
* update LinearVectorization assignment
2010-07-08 14:27:47 +02:00
Gael Guennebaud
dd18b22f0b
optimize pmul for complex<double>
2010-07-07 15:29:04 +02:00
Gael Guennebaud
b0896382a3
s/IsVectorized/Vectorizable
2010-07-07 11:10:46 +02:00
Gael Guennebaud
bfa606d16f
* add an IsVectorized mechanism (instead of packet-size>1...)
...
* vectorize complex<double>
2010-07-06 23:36:00 +02:00