Benoit Steiner
ac5d706a94
Added support for simple coefficient-wise tensor expressions using half floats on CUDA devices
2016-02-19 08:19:12 +00:00
Benoit Steiner
0606a0a39b
FP16 on CUDA is only available starting with CUDA 7.5. Disable it when using an older version of CUDA
2016-02-18 23:15:23 -08:00
Benoit Steiner
17b9fbed34
Added preliminary support for half floats on CUDA GPUs. For now we can simply convert floats into half floats and vice versa
2016-02-19 06:16:07 +00:00
Benoit Steiner
8ce46f9d89
Improved implementation of ptanh for SSE and AVX
2016-02-18 13:24:34 -08:00
Benoit Steiner
6d8b1dce06
Avoid implicit cast from double to float.
2016-02-10 18:07:11 -08:00
Benoit Steiner
bfb3fcd94f
Optimized implementation of the tanh function for SSE
2016-02-10 08:52:30 -08:00
Benoit Steiner
2d523332b3
Optimized implementation of the hyperbolic tangent function for AVX
2016-02-10 08:48:05 -08:00
Benoit Jacob
e6ee18d6b4
Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC
2016-02-10 11:11:49 -05:00
Benoit Jacob
964a95bf5e
Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088
2016-02-10 10:37:22 -05:00
Gael Guennebaud
c2bf2f56ef
Remove custom unaligned loads for SSE. They were only useful for core2 CPU.
2016-02-08 14:29:12 +01:00
Gael Guennebaud
ddf64babde
merge
2016-01-28 13:21:48 +01:00
Gael Guennebaud
7cae8918c0
Fix compilation on old gcc+AVX
2016-01-21 20:30:32 +01:00
Gael Guennebaud
8dca9f97e3
Add numext::sqrt function to enable custom optimized implementation.
...
This changeset adds two specializations for float/double on SSE. Those
are mostly useful with GCC, for which std::sqrt adds an extra and costly
check on the result of _mm_sqrt_*. Clang does not add this burden.
In this changeset, only DenseBase::norm() makes use of it.
2016-01-21 20:18:51 +01:00
Gael Guennebaud
70404e07c2
Workaround clang -Wdocumentation warning about "/*<"
2015-12-30 16:46:45 +01:00
Eugene Brevdo
cef81c9084
Merged eigen/eigen into default
2015-12-24 21:17:33 -08:00
Eugene Brevdo
f7362772e3
Add digamma for CPU + CUDA. Includes tests.
2015-12-24 21:15:38 -08:00
Gael Guennebaud
d2e288ae50
Workaround compilers that do not even define _mm256_set_m128.
2015-12-24 16:53:43 +01:00
Benoit Steiner
a6c243617b
Fixed a typo in previous change.
2015-12-21 09:05:45 -08:00
Benoit Steiner
51be91f15e
Added support for CUDA architectures that don't support 3.5 capabilities
2015-12-21 08:42:58 -08:00
Benoit Steiner
6d777e1bc7
Fixed a typo.
2015-12-18 19:25:50 -08:00
Gael Guennebaud
3abd8470ca
bug #1140 : remove custom definition and use of _mm256_setr_m128
2015-12-18 14:18:59 +01:00
Gael Guennebaud
ca39b1546e
Merged in ebrevdo/eigen (pull request PR-148)
...
Add special functions to eigen: lgamma, erf, erfc.
2015-12-11 11:52:09 +01:00
Gael Guennebaud
7ad1aaec1d
bug #1103 : fix neon vectorization of pmul(Packet1cd,Packet1cd)
2015-12-10 16:06:33 +01:00
Eugene Brevdo
fa4f933c0f
Add special functions to Eigen: lgamma, erf, erfc.
...
Includes CUDA support and unit tests.
2015-12-07 15:24:49 -08:00
Gael Guennebaud
ae87f094eb
Fix "," in non SSE4 mode
2015-11-05 12:08:36 +01:00
Gael Guennebaud
90323f1751
Fix AVX round/ceil/floor, and fix respective unit test
2015-11-04 22:15:57 +01:00
Gael Guennebaud
3dd24bdf99
Merged in aavenel/eigen (pull request PR-142)
...
Add round, ceil and floor for SSE4.1/AVX (Bug #70)
2015-11-04 18:26:38 +01:00
Benoit Steiner
36cd6daaae
Made the CUDA implementation of ploadt_ro compatible with CUDA implementations older than 3.5
2015-11-03 16:36:30 -08:00
Alexandre Avenel
d46e2c10a6
Add round, ceil and floor for SSE4.1/AVX (Bug #70)
2015-11-01 10:49:27 +01:00
Gael Guennebaud
6163db814c
bug #1085 : workaround gcc default ABI issue
2015-10-10 22:38:55 +02:00
Gael Guennebaud
f047ecc36a
_mm_hadd_epi32 is for SSSE3 only (and not SSE3)
2015-10-07 15:48:35 +02:00
Gael Guennebaud
2c676ddb40
Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks)
2015-10-06 15:43:27 +02:00
Gael Guennebaud
75861f6650
bug #1069 : fix AVX support on MSVC (use of non portable C-style cast)
2015-09-28 10:08:26 +02:00
Benoit Steiner
98f8f0db9a
Added support for predux_mul for CUDA devices
2015-09-08 15:37:25 -07:00
Doug Kwan
5c9ee73eb9
Implement plog and pexp for AltiVec.
2015-07-30 11:12:42 -07:00
Gael Guennebaud
6245591349
Fix prototype of plset and generalize linspace functor.
2015-08-07 19:27:59 +02:00
Gael Guennebaud
e68c7b8368
Include SSE packetmath when AVX is enabled, and enable AVX's sine function only in fast-math mode (as SSE)
2015-08-07 17:40:39 +02:00
Gael Guennebaud
ce57dbd937
Let unpacket_traits<> expose the required alignment and make use of it everywhere
2015-08-07 10:44:01 +02:00
Gael Guennebaud
9a2447b0c9
Fix shadow warnings triggered by clang
2015-06-09 09:11:12 +02:00
Benoit Jacob
051d5325cc
Abandon blocking size lookup table approach. Not performing as well in the real world as in microbenchmarks.
2015-05-19 11:03:59 -04:00
Benoit Jacob
c88e1abaf3
also uninitialized here, see previous cset
2015-05-15 11:34:57 -04:00
Benoit Jacob
807793ec3b
Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change resulting code
2015-05-15 11:15:53 -04:00
Konstantinos Margaritis
dd698e6680
Merged in doug_kwan/eigen (pull request PR-103)
...
Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of
2015-05-05 20:50:14 +03:00
Benoit Steiner
1dded10cb7
Added a double-precision implementation of the exp() function for AVX.
2015-05-04 10:42:51 -07:00
Benoit Steiner
d3f7915aeb
Pulled latest update from the eigen main codebase
2015-03-24 13:12:14 -07:00
Benoit Steiner
abdbe8562e
Fixed the CUDA packet primitives
2015-03-24 10:45:46 -07:00
Benoit Jacob
dc04f12967
use unsigned short instead of uint16_t which doesn't exist in c++98
2015-03-17 10:31:45 -04:00
Benoit Jacob
35c3a8bb84
Update Nexus 5 lookup table by combining 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance.
2015-03-16 11:05:51 -04:00
Benoit Jacob
02babb9c0f
Provide an empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on 32-bit ARM for now.
2015-03-15 18:13:12 -04:00
Doug Kwan
657407227e
Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of
...
doubles instead of swapping the doubles.
2015-03-11 15:13:37 -07:00