eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	35a8e94577	bug #1167 : simplify installation of header files using cmake's install(DIRECTORY ...) command.	2016-08-29 10:59:37 +02:00
Gael Guennebaud	b3151bca40	Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available.	2016-08-23 14:24:08 +02:00
Gael Guennebaud	a4c266f827	Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only.	2016-08-23 14:23:08 +02:00
Benoit Jacob	40a16282c7	Remove now-unused protate PacketMath func	2016-05-24 11:01:18 -04:00
Benoit Steiner	8ce46f9d89	Improved implementation of ptanh for SSE and AVX	2016-02-18 13:24:34 -08:00
Benoit Steiner	6d8b1dce06	Avoid implicit cast from double to float.	2016-02-10 18:07:11 -08:00
Benoit Steiner	bfb3fcd94f	Optimized implementation of the tanh function for SSE	2016-02-10 08:52:30 -08:00
Benoit Jacob	e6ee18d6b4	Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC	2016-02-10 11:11:49 -05:00
Benoit Jacob	964a95bf5e	Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088	2016-02-10 10:37:22 -05:00
Gael Guennebaud	c2bf2f56ef	Remove custom unaligned loads for SSE. They were only useful for core2 CPU.	2016-02-08 14:29:12 +01:00
Gael Guennebaud	7cae8918c0	Fix compilation on old gcc+AVX	2016-01-21 20:30:32 +01:00
Gael Guennebaud	8dca9f97e3	Add numext::sqrt function to enable custom optimized implementation. This changeset adds two specializations for float/double on SSE. Those are mostly useful with GCC, for which std::sqrt adds an extra and costly check on the result of _mm_sqrt_*. Clang does not add this burden. In this changeset, only DenseBase::norm() makes use of it.	2016-01-21 20:18:51 +01:00
Gael Guennebaud	70404e07c2	Workaround clang -Wdocumentation warning about "/*<"	2015-12-30 16:46:45 +01:00
Gael Guennebaud	ae87f094eb	Fix "," in non SSE4 mode	2015-11-05 12:08:36 +01:00
Alexandre Avenel	d46e2c10a6	Add round, ceil and floor for SSE4.1/AVX (Bug #70 )	2015-11-01 10:49:27 +01:00
Gael Guennebaud	6163db814c	bug #1085 : workaround gcc default ABI issue	2015-10-10 22:38:55 +02:00
Gael Guennebaud	f047ecc36a	_mm_hadd_epi32 is for SSSE3 only (and not SSE3)	2015-10-07 15:48:35 +02:00
Gael Guennebaud	2c676ddb40	Handle various TODOs in SSE vectorization (remove splitted storeu, enable SSE3 integer vectorization, plus minor tweaks)	2015-10-06 15:43:27 +02:00
Gael Guennebaud	6245591349	Fix prototype of plset and generalize linspace functor.	2015-08-07 19:27:59 +02:00
Gael Guennebaud	ce57dbd937	Let unpacket_traits<> expose the required alignment and make use of it everywhere	2015-08-07 10:44:01 +02:00
Benoit Steiner	4fd7f47692	Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.	2015-03-02 09:38:47 -08:00
Benoit Steiner	05089aba75	Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts	2015-02-27 09:27:30 -08:00
Benoit Steiner	573b377110	Added support for vectorized type casting of tensors	2015-02-27 08:46:04 -08:00
Benoit Steiner	f41b1f1666	Added support for fast reciprocal square root computation.	2015-02-26 09:42:41 -08:00
Benoit Jacob	9bd8a4bab5	bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path. This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).	2015-02-18 15:03:35 -05:00
Gael Guennebaud	eb563049f7	Remove some dead stores.	2015-02-18 11:26:48 +01:00
Gael Guennebaud	159fb181c2	Disable __m128* wrappers when compiling with AVX and -fabi-version=4	2015-02-17 16:27:20 +01:00
Gael Guennebaud	91ab2489dd	Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI)	2015-02-17 16:08:07 +01:00
Gael Guennebaud	45cbb0bbb1	The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index	2015-02-16 15:05:41 +01:00
Gael Guennebaud	0918c51e60	merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper	2015-02-12 21:48:41 +01:00
Gael Guennebaud	fe25f3b8e3	FMA has been wrongly disabled	2015-02-10 23:11:35 +01:00
Benoit Steiner	c739102ef9	Pulled the latest changes from the trunk	2015-02-06 05:25:03 -08:00
Benoit Jacob	0f21613698	bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD	2015-01-30 17:44:26 -05:00
Benoit Jacob	340b8afb14	bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA.	2015-01-31 14:15:57 -05:00
Gael Guennebaud	ee06f78679	Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.	2014-11-04 21:58:52 +01:00
Christoph Hertzberg	84aaa03182	Addendum to bug #859 : pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN. psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0	2014-10-20 13:13:43 +02:00
Gael Guennebaud	aa5f79206f	Fix bug #859 : pexp(NaN) returned Inf instead of NaN	2014-10-20 11:38:51 +02:00
Benoit Steiner	16047c8d4a	Pulled in the latest changes from the Eigen trunk	2014-08-13 22:25:29 -07:00
Gael Guennebaud	b47ef1431f	Fix many long to int implicit conversions	2014-07-08 16:47:11 +02:00
Benoit Steiner	29aebf96e6	Created the pblend packet primitive and implemented it using SSE and AVX instructions.	2014-06-06 20:18:44 -07:00
Gael Guennebaud	450d0c3de0	Make sure that calls to broadcast4 are 16 bytes aligned	2014-04-25 22:25:48 +02:00
Gael Guennebaud	3d8d0f6269	Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<> to PacketBlock<,N>.	2014-04-25 10:56:18 +02:00
Gael Guennebaud	b0e19db1cf	Enable fused madd for Altivec	2014-04-24 23:17:18 +02:00
Gael Guennebaud	5c5231ab71	Workaround gcc's default ABI not being able to distinguish between vector types of different sizes.	2014-04-22 16:03:19 +02:00
Gael Guennebaud	d5a795f673	New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4.	2014-04-16 17:05:11 +02:00
Benoit Steiner	feaf7c7e6d	Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (ie gcc 4.8).	2014-04-14 10:44:17 -07:00
Gael Guennebaud	1c0728043a	Workaround alignment warnings	2014-03-30 22:43:47 +02:00
Gael Guennebaud	10aa14592a	Add a mechanism to recursively access half-size packet types	2014-03-28 10:18:04 +01:00
Benoit Steiner	8a94cb3edd	Implemented the SSE version of the gather and scatter packet primitives.	2014-03-27 18:29:01 -07:00
Gael Guennebaud	052aedd394	Implement pcplflip, palign, predux and the likes from AVC/complexes	2014-03-27 14:47:00 +01:00

197 Commits