eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	b240080e64	bug #1436 : fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing.	2017-06-15 10:16:30 +02:00
Rasmus Munk Larsen	5e144bbaa4	Make NaN propagation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details.	2017-01-24 13:32:50 -08:00
Gael Guennebaud	5d00fdf0e8	bug #1363 : fix mingw's ABI issue	2016-12-15 11:58:31 +01:00
Gael Guennebaud	178c084856	Disable usage of SSE3 _mm_hadd_ps that is extremely slow.	2016-11-22 21:53:14 +01:00
Gael Guennebaud	f3fb0a1940	Disable usage of SSE3 haddpd that is extremely slow.	2016-11-22 16:58:31 +01:00
Gael Guennebaud	598de8b193	Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.	2016-11-02 10:38:13 +01:00
Gael Guennebaud	aad72f3c6d	Add missing inline keywords	2016-10-25 20:20:09 +02:00
Benoit Steiner	3e194a6a73	Fixed a typo	2016-10-25 08:42:15 -07:00
Gael Guennebaud	13fc18d3a2	Add a pinsertlast function replacing the last entry of a packet by a scalar. (useful to vectorize LinSpaced)	2016-10-25 16:48:49 +02:00
Rasmus Munk Larsen	765615609d	Update comment for fast sqrt.	2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen	3ed67cb0bb	Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments. Benchmark speed in Giga-sqrts/s on Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz: Fast=1: SSE 2.529G, AVX 4.380G; Fast=0: SSE 1.944G, AVX 1.898G; Fast=1 fixed: SSE 2.214G, AVX 3.739G. These figures illustrate the worst case in terms of speed impact: they were measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.	2016-10-04 14:22:56 -07:00
Gael Guennebaud	471eac5399	bug #1195 : move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)	2016-09-08 08:36:27 +02:00
Gael Guennebaud	35a8e94577	bug #1167 : simplify installation of header files using cmake's install(DIRECTORY ...) command.	2016-08-29 10:59:37 +02:00
Gael Guennebaud	b3151bca40	Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available.	2016-08-23 14:24:08 +02:00
Gael Guennebaud	a4c266f827	Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only.	2016-08-23 14:23:08 +02:00
Benoit Jacob	40a16282c7	Remove now-unused protate PacketMath func	2016-05-24 11:01:18 -04:00
Benoit Steiner	8ce46f9d89	Improved implementation of ptanh for SSE and AVX	2016-02-18 13:24:34 -08:00
Benoit Steiner	6d8b1dce06	Avoid implicit cast from double to float.	2016-02-10 18:07:11 -08:00
Benoit Steiner	bfb3fcd94f	Optimized implementation of the tanh function for SSE	2016-02-10 08:52:30 -08:00
Benoit Jacob	e6ee18d6b4	Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC	2016-02-10 11:11:49 -05:00
Benoit Jacob	964a95bf5e	Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088	2016-02-10 10:37:22 -05:00
Gael Guennebaud	c2bf2f56ef	Remove custom unaligned loads for SSE. They were only useful for core2 CPU.	2016-02-08 14:29:12 +01:00
Gael Guennebaud	7cae8918c0	Fix compilation on old gcc+AVX	2016-01-21 20:30:32 +01:00
Gael Guennebaud	8dca9f97e3	Add numext::sqrt function to enable custom optimized implementation. This changeset adds two specializations for float/double on SSE. Those are mostly useful with GCC, for which std::sqrt adds an extra and costly check on the result of _mm_sqrt_*. Clang does not add this burden. In this changeset, only DenseBase::norm() makes use of it.	2016-01-21 20:18:51 +01:00
Gael Guennebaud	70404e07c2	Workaround clang -Wdocumentation warning about "/*<"	2015-12-30 16:46:45 +01:00
Gael Guennebaud	ae87f094eb	Fix "," in non SSE4 mode	2015-11-05 12:08:36 +01:00
Alexandre Avenel	d46e2c10a6	Add round, ceil and floor for SSE4.1/AVX (Bug #70 )	2015-11-01 10:49:27 +01:00
Gael Guennebaud	6163db814c	bug #1085 : workaround gcc default ABI issue	2015-10-10 22:38:55 +02:00
Gael Guennebaud	f047ecc36a	_mm_hadd_epi32 is for SSSE3 only (and not SSE3)	2015-10-07 15:48:35 +02:00
Gael Guennebaud	2c676ddb40	Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks)	2015-10-06 15:43:27 +02:00
Gael Guennebaud	6245591349	Fix prototype of plset and generalize linspace functor.	2015-08-07 19:27:59 +02:00
Gael Guennebaud	ce57dbd937	Let unpacket_traits<> expose the required alignment and make use of it everywhere	2015-08-07 10:44:01 +02:00
Benoit Steiner	4fd7f47692	Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.	2015-03-02 09:38:47 -08:00
Benoit Steiner	05089aba75	Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts	2015-02-27 09:27:30 -08:00
Benoit Steiner	573b377110	Added support for vectorized type casting of tensors	2015-02-27 08:46:04 -08:00
Benoit Steiner	f41b1f1666	Added support for fast reciprocal square root computation.	2015-02-26 09:42:41 -08:00
Benoit Jacob	9bd8a4bab5	bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path. This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).	2015-02-18 15:03:35 -05:00
Gael Guennebaud	eb563049f7	Remove some dead stores.	2015-02-18 11:26:48 +01:00
Gael Guennebaud	159fb181c2	Disable __m128* wrappers when compiling with AVX and -fabi-version=4	2015-02-17 16:27:20 +01:00
Gael Guennebaud	91ab2489dd	Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI)	2015-02-17 16:08:07 +01:00
Gael Guennebaud	45cbb0bbb1	The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index	2015-02-16 15:05:41 +01:00
Gael Guennebaud	0918c51e60	merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper	2015-02-12 21:48:41 +01:00
Gael Guennebaud	fe25f3b8e3	FMA has been wrongly disabled	2015-02-10 23:11:35 +01:00
Benoit Steiner	c739102ef9	Pulled the latest changes from the trunk	2015-02-06 05:25:03 -08:00
Benoit Jacob	0f21613698	bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD	2015-01-30 17:44:26 -05:00
Benoit Jacob	340b8afb14	bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA.	2015-01-31 14:15:57 -05:00
Gael Guennebaud	ee06f78679	Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.	2014-11-04 21:58:52 +01:00
Christoph Hertzberg	84aaa03182	Addendum to bug #859 : pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN. psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0	2014-10-20 13:13:43 +02:00
Gael Guennebaud	aa5f79206f	Fix bug #859 : pexp(NaN) returned Inf instead of NaN	2014-10-20 11:38:51 +02:00
Benoit Steiner	16047c8d4a	Pulled in the latest changes from the Eigen trunk	2014-08-13 22:25:29 -07:00

209 Commits