eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	450d0c3de0	Make sure that calls to broadcast4 are 16 bytes aligned	2014-04-25 22:25:48 +02:00
Gael Guennebaud	3d8d0f6269	Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<> to PacketBlock<,N>.	2014-04-25 10:56:18 +02:00
Gael Guennebaud	b0e19db1cf	Enable fused madd for Altivec	2014-04-24 23:17:18 +02:00
Gael Guennebaud	5c5231ab71	Workaround gcc's default ABI not being able to distinguish between vector types of different sizes.	2014-04-22 16:03:19 +02:00
Gael Guennebaud	d5a795f673	New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4.	2014-04-16 17:05:11 +02:00
Benoit Steiner	feaf7c7e6d	Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (i.e. gcc 4.8).	2014-04-14 10:44:17 -07:00
Gael Guennebaud	10aa14592a	Add a mechanism to recursively access half-size packet types	2014-03-28 10:18:04 +01:00
Benoit Steiner	8a94cb3edd	Implemented the SSE version of the gather and scatter packet primitives.	2014-03-27 18:29:01 -07:00
Benoit Steiner	a419cea4a0	Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions.	2014-03-26 19:03:07 -07:00
Benoit Steiner	cc73164aa8	Merged latest updates from the parent branch	2014-03-26 15:23:59 -07:00
Gael Guennebaud	bc401eb6fa	Implement new 1 packet x 8 gebp kernel	2014-03-26 18:53:00 +01:00
Gael Guennebaud	b286a1e75c	add pbroadcast2/4 generic intrinsics	2014-03-26 16:46:36 +01:00
Benoit Steiner	db7d49efbb	Added support for FMA instructions	2014-02-24 13:45:32 -08:00
Benoit Steiner	64a85800bd	Added support for AVX to Eigen.	2014-01-29 11:43:05 -08:00
Gael Guennebaud	a7621809fe	Remove useless register keyword, and optimize predux_min/max for SSE4	2014-01-25 16:54:13 +01:00
Gael Guennebaud	01fd880424	Revert previous change and introduce a new workaround regarding gcc generating a shufps instruction instead of the more efficient pshufd instruction. The trick consists of introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply. Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering.	2014-03-20 16:03:46 +01:00
Gael Guennebaud	c39a3fa7a1	Make gcc generate a pshufd instruction for pset1	2014-03-20 10:14:26 +01:00
Gael Guennebaud	d4dd6aaed2	Fix bug #642 : add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled	2013-08-19 16:02:27 +02:00
Gael Guennebaud	b3adc4face	Add missing pconj specializations	2013-05-17 17:25:29 +02:00
Gael Guennebaud	d63712163c	Add SSE4 min/max for integers	2013-03-20 18:28:40 +01:00
Gael Guennebaud	e8aa1f00c5	add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1	2012-07-27 23:40:04 +02:00
Benoit Jacob	69124cfca2	Automatic relicensing to MPL2 using Keir's script. Manual fixup follows.	2012-07-13 14:42:47 -04:00
Jitse Niesen	3c412183b2	Get rid of include directives inside namespace blocks (bug #339).	2012-04-15 11:06:28 +01:00
Gael Guennebaud	634fedaf68	proper C++ casting	2012-01-31 18:56:25 +01:00
Gael Guennebaud	9c86ee2695	fix static inline versus inline static issues (the former is the correct order)	2012-01-31 12:58:52 +01:00
Gael Guennebaud	c331c092d5	no comment	2011-09-21 14:20:41 +02:00
Gael Guennebaud	7301f4345c	quick workaround of MSVC9's ICE in pset1	2011-09-21 14:18:41 +02:00
Gael Guennebaud	c8e1b679fa	re-enable fast pset1-pstore by introducing a new higher level pstore1 function	2011-03-02 10:55:44 +01:00
Benoit Jacob	eef03525b8	fix bug #203 : revert to using _mm_set1_p[sd]	2011-02-28 00:04:05 -05:00
Benoit Jacob	9be2712bf7	remove now-useless comments	2011-02-27 22:35:17 -05:00
Benoit Jacob	0612768c1c	fix bug #201 : Clang too has intrinsics bugs preventing us from using custom unaligned loads	2011-02-27 21:59:07 -05:00
Benoit Jacob	b3544ce2ae	bug #195 - fix this once and for all: just never use _mm_load_sd on gcc/i386, it generates redundant x87 ops	2011-02-27 17:26:59 -05:00
Benoit Jacob	5dfae4524b	fix bug #195 : fast unaligned load for integer using _mm_load_sd failed when the value was interpreted as a NaN	2011-02-24 10:31:57 -05:00
Hauke Heibel	1a6597b8e4	MSVC does not like using uninitialized SSE variables, so we have to pass all zeros.	2011-02-12 21:29:16 +01:00
Gael Guennebaud	9d2bf35a05	implement optimized ploadu for MSVC10: this also fix bad code generation in gebp_kernel :)	2011-02-12 16:40:09 +01:00
Benoit Jacob	6a5a13e394	The pfirst hack is needed also on msvc 2010 as it gets completely nuts, even though it doesn't segfault as msvc 2008 did	2011-02-09 15:13:23 -05:00
Hauke Heibel	7bc8e3ac09	Initial fixes for bug #85. Renamed meta_{true|false} to {true|false}_type, meta_if to conditional, is_same_type to is_same, un{ref|pointer|const} to remove_{reference|pointer|const} and makeconst to add_const. Changed boolean type 'ret' member to 'value'. Changed 'ret' members referring to types to 'type'. Adapted all code occurrences.	2010-10-25 22:13:49 +02:00
Benoit Jacob	4716040703	bug #86 : use internal:: namespace instead of ei_ prefix	2010-10-25 10:15:22 -04:00
Benoit Jacob	b80d9dd42e	fix determination of number of registers on sse: __i386__ was not defined by MSVC 2010. fixed as (2*sizeof(void*)). also move that to SSE/ and let the default for unknown archs be just 8.	2010-08-13 13:55:28 -04:00
Gael Guennebaud	c2ee454df4	* fix compilation of mixed scalar product * optimize mixed scalar products	2010-07-19 16:49:09 +02:00
Gael Guennebaud	f8aae7a908	* _mm_loaddup_pd is slow * optimize SSE ei_ploaddup<Packet4f>	2010-07-19 15:43:27 +02:00
Gael Guennebaud	cd0e5dca9b	wip: extend the gebp kernel to optimize complex and mixed products	2010-07-19 08:50:59 +02:00
Gael Guennebaud	ff96c94043	mixing types in product step 2: * pload* and pset1 are now templated on the packet type * gemv routines are now embedded into a structure with a consistent API with respect to gemm * some configurations of vector * matrix and matrix * matrix work fine, some need more work...	2010-07-11 15:48:30 +02:00
Gael Guennebaud	4161b8be67	sync	2010-07-10 22:58:51 +02:00
Benoit Jacob	6dcd373b9d	let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction.	2010-07-09 18:51:17 -04:00
Gael Guennebaud	96f9015807	disable MSVC optimization when the underlying compiler is ICC	2010-07-09 19:33:43 +02:00
Gael Guennebaud	300a226ffa	scalars fitting in a single packet require more work, step 1 * add an Alignable trait * update LinearVectorization assignment	2010-07-08 14:27:47 +02:00
Gael Guennebaud	dd18b22f0b	optimize pmul for complex<double>	2010-07-07 15:29:04 +02:00
Gael Guennebaud	b0896382a3	s/IsVectorized/Vectorizable	2010-07-07 11:10:46 +02:00
Gael Guennebaud	bfa606d16f	* add an IsVectorized mechanism (instead of packet-size>1...) * vectorize complex<double>	2010-07-06 23:36:00 +02:00
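
The ptranspose commit (a419cea4a0) describes a primitive that transposes an array of N packets, where N is the number of words in each packet. As an illustration of that semantics only, here is a minimal portable sketch that assumes a packet can be modeled as a plain list of N scalars and a block as a list of N such packets. The function name ptranspose_sketch is hypothetical; Eigen's actual version operates on SIMD registers through SSE intrinsics, not on lists.

```python
# Hypothetical model, not Eigen's API: a "packet" is a list of N lanes,
# and a block is a list of N packets (cf. PacketBlock<Packet, N>).
def ptranspose_sketch(block):
    """Transpose N packets of N lanes in place: lane j of packet i <-> lane i of packet j."""
    n = len(block)
    if any(len(packet) != n for packet in block):
        raise ValueError("ptranspose needs N packets of exactly N lanes each")
    # Swap each element above the diagonal with its mirror below it.
    for i in range(n):
        for j in range(i + 1, n):
            block[i][j], block[j][i] = block[j][i], block[i][j]
    return block


# Four 4-lane packets, as with SSE floats: packet i starts as row i of a 4x4 tile.
block = [[4 * i + j for j in range(4)] for i in range(4)]
ptranspose_sketch(block)
print(block[0])  # [0, 4, 8, 12]
```

In this model each packet starts as one row of the tile and ends as one column. That reshuffle is presumably what lets packing routines such as gemm_pack_lhs and gemm_pack_rhs, which the commit names as the intended users, stay vectorized instead of copying scalars one at a time.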

105 Commits