eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Benoit Steiner	17b9fbed34	Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa	2016-02-19 06:16:07 +00:00
Benoit Steiner	8ce46f9d89	Improved implementation of ptanh for SSE and AVX	2016-02-18 13:24:34 -08:00
Benoit Steiner	6d8b1dce06	Avoid implicit cast from double to float.	2016-02-10 18:07:11 -08:00
Benoit Steiner	bfb3fcd94f	Optimized implementation of the tanh function for SSE	2016-02-10 08:52:30 -08:00
Benoit Steiner	2d523332b3	Optimized implementation of the hyperbolic tangent function for AVX	2016-02-10 08:48:05 -08:00
Benoit Jacob	e6ee18d6b4	Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC	2016-02-10 11:11:49 -05:00
Benoit Jacob	964a95bf5e	Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088	2016-02-10 10:37:22 -05:00
Gael Guennebaud	c2bf2f56ef	Remove custom unaligned loads for SSE. They were only useful for core2 CPU.	2016-02-08 14:29:12 +01:00
Gael Guennebaud	ddf64babde	merge	2016-01-28 13:21:48 +01:00
Gael Guennebaud	7cae8918c0	Fix compilation on old gcc+AVX	2016-01-21 20:30:32 +01:00
Gael Guennebaud	8dca9f97e3	Add numext::sqrt function to enable custom optimized implementation. This changeset adds two specializations for float/double on SSE. These are mostly useful with GCC, for which std::sqrt adds an extra and costly check on the result of _mm_sqrt_*. Clang does not add this burden. In this changeset, only DenseBase::norm() makes use of it.	2016-01-21 20:18:51 +01:00
Gael Guennebaud	70404e07c2	Workaround clang -Wdocumentation warning about "/*<"	2015-12-30 16:46:45 +01:00
Eugene Brevdo	cef81c9084	Merged eigen/eigen into default	2015-12-24 21:17:33 -08:00
Eugene Brevdo	f7362772e3	Add digamma for CPU + CUDA. Includes tests.	2015-12-24 21:15:38 -08:00
Gael Guennebaud	d2e288ae50	Workaround compilers that do not even define _mm256_set_m128.	2015-12-24 16:53:43 +01:00
Benoit Steiner	a6c243617b	Fixed a typo in previous change.	2015-12-21 09:05:45 -08:00
Benoit Steiner	51be91f15e	Added support for CUDA architectures that don't support 3.5 capabilities	2015-12-21 08:42:58 -08:00
Benoit Steiner	6d777e1bc7	Fixed a typo.	2015-12-18 19:25:50 -08:00
Gael Guennebaud	3abd8470ca	bug #1140 : remove custom definition and use of _mm256_setr_m128	2015-12-18 14:18:59 +01:00
Gael Guennebaud	ca39b1546e	Merged in ebrevdo/eigen (pull request PR-148) Add special functions to eigen: lgamma, erf, erfc.	2015-12-11 11:52:09 +01:00
Gael Guennebaud	7ad1aaec1d	bug #1103 : fix neon vectorization of pmul(Packet1cd,Packet1cd)	2015-12-10 16:06:33 +01:00
Eugene Brevdo	fa4f933c0f	Add special functions to Eigen: lgamma, erf, erfc. Includes CUDA support and unit tests.	2015-12-07 15:24:49 -08:00
Gael Guennebaud	ae87f094eb	Fix "," in non SSE4 mode	2015-11-05 12:08:36 +01:00
Gael Guennebaud	90323f1751	Fix AVX round/ceil/floor, and fix respective unit test	2015-11-04 22:15:57 +01:00
Gael Guennebaud	3dd24bdf99	Merged in aavenel/eigen (pull request PR-142) Add round, ceil and floor for SSE4.1/AVX (Bug #70)	2015-11-04 18:26:38 +01:00
Benoit Steiner	36cd6daaae	Made the CUDA implementation of ploadt_ro compatible with cuda implementations older than 3.5	2015-11-03 16:36:30 -08:00
Alexandre Avenel	d46e2c10a6	Add round, ceil and floor for SSE4.1/AVX (Bug #70 )	2015-11-01 10:49:27 +01:00
Gael Guennebaud	6163db814c	bug #1085 : workaround gcc default ABI issue	2015-10-10 22:38:55 +02:00
Gael Guennebaud	f047ecc36a	_mm_hadd_epi32 is for SSSE3 only (and not SSE3)	2015-10-07 15:48:35 +02:00
Gael Guennebaud	2c676ddb40	Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks)	2015-10-06 15:43:27 +02:00
Gael Guennebaud	75861f6650	bug #1069 : fix AVX support on MSVC (use of non portable C-style cast)	2015-09-28 10:08:26 +02:00
Benoit Steiner	98f8f0db9a	Added support for predux_mul for CUDA devices	2015-09-08 15:37:25 -07:00
Doug Kwan	5c9ee73eb9	Implement plog and pexp for AltiVec.	2015-07-30 11:12:42 -07:00
Gael Guennebaud	6245591349	Fix prototype of plset and generalize linspace functor.	2015-08-07 19:27:59 +02:00
Gael Guennebaud	e68c7b8368	Include SSE packetmath when AVX is enabled, and enable AVX's sine function only in fast-math mode (as SSE)	2015-08-07 17:40:39 +02:00
Gael Guennebaud	ce57dbd937	Let unpacket_traits<> expose the required alignment and make use of it everywhere	2015-08-07 10:44:01 +02:00
Gael Guennebaud	9a2447b0c9	Fix shadow warnings triggered by clang	2015-06-09 09:11:12 +02:00
Benoit Jacob	051d5325cc	Abandon blocking size lookup table approach. Not performing as well in real world as in microbenchmark.	2015-05-19 11:03:59 -04:00
Benoit Jacob	c88e1abaf3	also uninitialized here, see previous cset	2015-05-15 11:34:57 -04:00
Benoit Jacob	807793ec3b	Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change the resulting code	2015-05-15 11:15:53 -04:00
Konstantinos Margaritis	dd698e6680	Merged in doug_kwan/eigen (pull request PR-103) Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of	2015-05-05 20:50:14 +03:00
Benoit Steiner	1dded10cb7	Added a double-precision implementation of the exp() function for AVX.	2015-05-04 10:42:51 -07:00
Benoit Steiner	d3f7915aeb	Pulled latest update from the eigen main codebase	2015-03-24 13:12:14 -07:00
Benoit Steiner	abdbe8562e	Fixed the CUDA packet primitives	2015-03-24 10:45:46 -07:00
Benoit Jacob	dc04f12967	use unsigned short instead of uint16_t which doesn't exist in c++98	2015-03-17 10:31:45 -04:00
Benoit Jacob	35c3a8bb84	Update Nexus 5 lookup table from combining now 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance.	2015-03-16 11:05:51 -04:00
Benoit Jacob	02babb9c0f	Provide an empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now.	2015-03-15 18:13:12 -04:00
Doug Kwan	657407227e	Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of doubles instead of swapping the doubles.	2015-03-11 15:13:37 -07:00
Benoit Steiner	0196141938	Fixed the optimized AVX implementation of the fast rsqrt function	2015-03-02 13:49:39 -08:00
Benoit Steiner	4fd7f47692	Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.	2015-03-02 09:38:47 -08:00


347 Commits