eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Henry Schreiner	243249718b	Adding missing inlines for CUDA and ARCH 6	2017-10-20 13:00:23 +00:00
Gael Guennebaud	70ac6c9230	Add C++11 max_digits10 for half. (grafted from `9c353dd145` )	2017-09-06 10:22:47 +02:00
Gael Guennebaud	e7c065ec71	bug #1462 : remove all occurrences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER	2017-08-24 11:06:47 +02:00
Gael Guennebaud	bc837b7975	bug #1436 : fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing. (grafted from `b240080e64` )	2017-06-15 10:16:30 +02:00
Gael Guennebaud	8880be60fa	fix compilation of Half in C++98 (issue introduced in previous commit) (grafted from `26f552c18d` )	2017-06-09 13:36:58 +02:00
Gael Guennebaud	e41713d52e	Fix compilation with gcc 4.3 and ARM NEON (grafted from `1d59ca2458` )	2017-06-09 13:20:52 +02:00
Gael Guennebaud	2c32368642	Add missing std::numeric_limits specialization for half, and complete NumTraits<half> (grafted from `d588822779` )	2017-06-09 11:51:53 +02:00
Benoit Jacob	07c2244440	ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32.	2017-03-15 06:53:35 -04:00
Gael Guennebaud	9219307e13	remove UTF8 symbols (grafted from `e958c2baac` )	2017-03-07 10:47:40 +01:00
Benoit Steiner	d66586ac90	Avoid unnecessary float to double conversions. (grafted from `34d9fce93b` )	2017-02-27 16:33:33 -08:00
Gael Guennebaud	b4218b8473	Use int32_t instead of int in NEON code. Some platforms with 16-bit int support ARM NEON. (grafted from `cbbf88c4d7` )	2017-02-17 14:39:02 +01:00
Gael Guennebaud	4b2e7f26aa	Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t	2017-01-23 22:02:53 +01:00
Benoit Jacob	26197bb467	Use 32 registers on ARM64	2016-12-19 13:44:46 -05:00
Gael Guennebaud	772e59d475	bug #1360 : fix sign issue with pmull on altivec (grafted from `8c0e701504` )	2016-12-18 22:13:19 +00:00
Gael Guennebaud	e8f83cbb5d	Fix unused warning (grafted from `fc94258e77` )	2016-12-18 22:11:48 +00:00
Gael Guennebaud	dce584d799	bug #1363 : fix mingw's ABI issue (grafted from `5d00fdf0e8` )	2016-12-15 11:58:31 +01:00
Gael Guennebaud	723ed92e0e	Fix compilation with gcc and old ABI version (grafted from `e340866c81` )	2016-11-23 14:04:57 +01:00
Gael Guennebaud	d6b9bc1ccd	Optimize predux<Packet8f> (AVX) (grafted from `74637fa4e3` )	2016-11-22 21:57:52 +01:00
Gael Guennebaud	0eff51e2ed	Disable usage of SSE3 _mm_hadd_ps that is extremely slow. (grafted from `178c084856` )	2016-11-22 21:53:14 +01:00
Gael Guennebaud	1b7dd46d94	Optimize predux<Packet4d> (AVX) (grafted from `7dd894e40e` )	2016-11-22 21:41:30 +01:00
Gael Guennebaud	b2eb1bf3dc	Disable usage of SSE3 haddpd that is extremely slow. (grafted from `f3fb0a1940` )	2016-11-22 16:58:31 +01:00
Konstantinos Margaritis	463176cc44	implement float/std::complex<float> for ZVector as well, minor fixes to ZVector (grafted from `672aa97d4d` )	2016-11-17 13:27:33 -05:00
Benoit Steiner	c80587c92b	Merged eigen/eigen into default	2016-11-03 03:55:11 -07:00
Gael Guennebaud	598de8b193	Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.	2016-11-02 10:38:13 +01:00
Benoit Steiner	7a0e96b80d	Gate the code that refers to cuda fp16 primitives more thoroughly	2016-11-01 12:08:09 -07:00
Gael Guennebaud	aad72f3c6d	Add missing inline keywords	2016-10-25 20:20:09 +02:00
Benoit Steiner	3e194a6a73	Fixed a typo	2016-10-25 08:42:15 -07:00
Gael Guennebaud	13fc18d3a2	Add a pinsertlast function replacing the last entry of a packet by a scalar. (useful to vectorize LinSpaced)	2016-10-25 16:48:49 +02:00
Benoit Steiner	38b6048e14	Deleted redundant implementation of predux	2016-10-12 14:37:56 -07:00
Benoit Steiner	78d2926508	Merged eigen/eigen into default	2016-10-12 13:46:29 -07:00
Benoit Steiner	2e2f48e30e	Take advantage of AVX512 instructions whenever possible to speed up the processing of 16 bit floats.	2016-10-12 13:45:39 -07:00
Gael Guennebaud	5c366fe1d7	Merged in rmlarsen/eigen (pull request PR-230) Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1	2016-10-12 16:30:51 +00:00
Rasmus Munk Larsen	47150af1c8	Fix copy-paste error: Must use _mm256_cmp_ps for AVX.	2016-10-12 08:34:39 -07:00
Gael Guennebaud	89e315152c	bug #1325 : fix compilation on NEON with clang	2016-10-12 16:55:47 +02:00
Benoit Steiner	507b661106	Renamed predux_half into predux_downto4	2016-10-06 17:57:04 -07:00
Benoit Steiner	a498ff7df6	Fixed incorrect comment	2016-10-06 15:27:27 -07:00
Benoit Steiner	a7473d6d5a	Fixed compilation error with gcc >= 5.3	2016-10-06 14:33:22 -07:00
Benoit Steiner	5e64cea896	Silenced a compilation warning	2016-10-06 14:24:17 -07:00
Benoit Steiner	d485d12c51	Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.	2016-10-06 10:41:03 -07:00
Benoit Steiner	4131074818	Deleted unnecessary CMakeLists.txt file	2016-10-05 18:54:35 -07:00
Benoit Steiner	cb5cd69872	Silenced a compilation warning.	2016-10-05 18:50:53 -07:00
Benoit Steiner	78b569f685	Merged latest updates from trunk	2016-10-05 18:48:55 -07:00
Benoit Steiner	9c2b6c049b	Silenced a few compilation warnings	2016-10-05 18:37:31 -07:00
Benoit Steiner	698ff69450	Properly characterize the CUDA packet primitives for fp16 as device only	2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen	7f67e6dfdb	Update comment for fast sqrt.	2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen	765615609d	Update comment for fast sqrt.	2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen	3ed67cb0bb	Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments. Benchmark speed in Giga-sqrts/s on Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz: Fast=1: SSE 2.529G, AVX 4.380G; Fast=0: SSE 1.944G, AVX 1.898G; Fast=1 fixed: SSE 2.214G, AVX 3.739G. These figures illustrate the worst case in terms of speed impact: they were measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.	2016-10-04 14:22:56 -07:00
Benoit Steiner	409e887d78	Added support for constant std::complex numbers on GPU	2016-10-03 11:06:24 -07:00
Benoit Steiner	26f9907542	Added missing typedefs	2016-09-20 12:58:03 -07:00
RJ Ryan	b2c6dc48d9	Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.	2016-09-20 07:18:20 -07:00

576 Commits