eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	60d3fe9a89	One more stupid AVX 512 fix (I don't have direct access to AVX512 machines)	2018-12-24 13:05:03 +01:00
Gael Guennebaud	4aa667b510	Add EIGEN_STRONG_INLINE where required	2018-12-24 10:45:01 +01:00
Gael Guennebaud	961ff567e8	Add missing pcmp_lt_or_nan for AVX512	2018-12-23 22:13:29 +01:00
Gael Guennebaud	0f6f75bd8a	Implement a faster fix for sin/cos of large entries that also correctly handles INF input.	2018-12-23 17:26:21 +01:00
Gael Guennebaud	38d704def8	Make sure that psin/pcos return a number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)	2018-12-23 16:13:24 +01:00
Gael Guennebaud	5713fb7feb	Fix plog(+INF): it returned ~87 instead of +INF	2018-12-23 15:40:52 +01:00
Christoph Hertzberg	6dd93f7e3b	Make code compile again for older compilers. See https://stackoverflow.com/questions/7411515/	2018-12-22 13:09:07 +01:00
Gael Guennebaud	efa4c9c40f	bug #1615 : slightly increase the default unrolling limit to compensate for changeset `101ea26f5e` . This solves a performance regression with clang and 3x3 matrix products.	2018-12-13 10:42:39 +01:00
Gael Guennebaud	f582ea3579	Fix compilation with expression template scalar type.	2018-12-12 22:47:00 +01:00
Gael Guennebaud	2de8da70fd	bug #1557 : fix RealSchur and EigenSolver for matrices with only zeros on the diagonal.	2018-12-12 17:30:08 +01:00
Gael Guennebaud	37c91e1836	bug #1644 : fix warning	2018-12-11 22:07:20 +01:00
Gael Guennebaud	f159cf3d75	Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels. With a 6pX4 kernel (not committed yet), this provides a +20% speedup.	2018-12-11 15:36:27 +01:00
Gael Guennebaud	0a7e7af6fd	Properly set the number of registers for AVX512	2018-12-11 15:33:17 +01:00
Gael Guennebaud	7166496f70	bug #1643 : fix compilation issue with gcc and no optimization	2018-12-11 13:24:42 +01:00
Gael Guennebaud	0d90637838	enable spilling workaround on architectures with SSE/AVX	2018-12-10 23:22:44 +01:00
Gael Guennebaud	bff90bf270	workaround "may be used uninitialized" warning	2018-12-08 18:58:28 +01:00
Gael Guennebaud	81c27325ae	bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512	2018-12-08 14:27:48 +01:00
Gael Guennebaud	426bce7529	fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target	2018-12-08 09:44:21 +01:00
Gael Guennebaud	956678a4ef	bug #1515 : disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.	2018-12-07 18:03:36 +01:00
Gael Guennebaud	7b6d0ff1f6	Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error.	2018-12-07 15:14:50 +01:00
Gael Guennebaud	f233c6194d	bug #1637 : workaround register spilling in gebp with clang>=6.0+AVX+FMA	2018-12-07 10:01:09 +01:00
Gael Guennebaud	ae59a7652b	bug #1638 : add a warning if avx512 is enabled without SSE/AVX FMA	2018-12-07 09:23:28 +01:00
Gael Guennebaud	4e7746fe22	bug #1636 : fix gemm performance issue with gcc>=6 and no FMA	2018-12-07 09:15:46 +01:00
Gael Guennebaud	cbf2f4b7a0	AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only	2018-12-06 18:21:56 +01:00
Gael Guennebaud	1d683ae2f5	Fix compilation with avx512f only, i.e., no AVX512DQ	2018-12-06 18:11:07 +01:00
Gael Guennebaud	c53eececb0	Implement AVX512 vectorization of std::complex<float/double>	2018-12-06 15:58:06 +01:00
Gael Guennebaud	3fba59ea59	temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!	2018-12-06 00:13:26 +01:00
Gael Guennebaud	1ac2695ef7	bug #1636 : fix compilation with some ABI versions.	2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen	47d8b741b2	#elif -> #else to fix GPU build.	2018-12-05 13:19:31 -08:00
Christoph Hertzberg	c1d356e8b4	bug #1635 : Use infinity from NumTraits instead of creating it manually.	2018-12-05 15:01:04 +01:00
Rasmus Munk Larsen	b57b31cce9	Merged in ezhulenev/eigen-01 (pull request PR-553) Do not disable alignment with EIGEN_GPUCC Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>	2018-12-04 23:47:19 +00:00
Eugene Zhulenev	0bb15bb6d6	Update checks in ConfigureVectorization.h	2018-12-03 17:10:40 -08:00
Eugene Zhulenev	fd0fbfa9b5	Do not disable alignment with EIGEN_GPUCC	2018-12-03 15:54:10 -08:00
Christoph Hertzberg	919414b9fe	bug #785 : Make Cholesky decomposition work for empty matrices	2018-12-03 16:18:15 +01:00
Gael Guennebaud	0ea7ae7213	Add missing padd for Packet8i (it was implicitly generated by clang and gcc)	2018-11-30 21:52:25 +01:00
Gael Guennebaud	ab4df3e6ff	bug #1634 : remove double copy in move-ctor of non movable Matrix/Array	2018-11-30 21:25:51 +01:00
Gael Guennebaud	c785464430	Add packet sin and cos to Altivec/VSX and NEON	2018-11-30 16:21:33 +01:00
Gael Guennebaud	69ace742be	Several improvements regarding packet-bitwise operations: - add unit tests - optimize their AVX512f implementation - add missing implementations (half, Packet4f, ...)	2018-11-30 15:56:08 +01:00
Gael Guennebaud	fa87f9d876	Add psin/pcos on AVX512 -> almost for free, at last!	2018-11-30 14:33:13 +01:00
Gael Guennebaud	c68bd2fa7a	Cleanup	2018-11-30 14:32:31 +01:00
Gael Guennebaud	f91500d303	Fix pandnot order in AVX512	2018-11-30 14:32:06 +01:00
Gael Guennebaud	b477d60bc6	Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)	2018-11-30 11:26:30 +01:00
Gael Guennebaud	e19ece822d	Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks)	2018-11-28 17:56:24 +01:00
Gael Guennebaud	41052f63b7	same for pmax	2018-11-28 17:17:28 +01:00
Gael Guennebaud	3e95e398b6	pmin/pmax on SSE: make sure to use AVX instructions with AVX enabled, and disable gcc workaround for fixed gcc versions	2018-11-28 17:14:20 +01:00
Gael Guennebaud	aa6097395b	Add missing SSE/AVX type-casting in AVX512 mode	2018-11-28 16:09:08 +01:00
Gael Guennebaud	48fe78c375	bug #1630 : fix linspaced when requesting smaller packet size than default one.	2018-11-28 13:15:06 +01:00
Eugene Zhulenev	80f1651f35	Use explicit packet type in SSE/PacketMath pldexp	2018-11-27 17:25:49 -08:00
Benoit Jacob	a4159dba08	do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4).	2018-11-27 16:53:14 -05:00
Gael Guennebaud	b131a4db24	bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.	2018-11-27 23:45:00 +01:00

5828 Commits