Gael Guennebaud
|
21633e585b
|
bug #1462: remove all occurrences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER
|
2017-08-24 11:06:47 +02:00 |
|
Gael Guennebaud
|
bbd97b4095
|
Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases
|
2017-07-17 01:02:51 +02:00 |
|
Gael Guennebaud
|
24fe1de9b4
|
merge
|
2017-06-15 10:17:39 +02:00 |
|
Gael Guennebaud
|
b240080e64
|
bug #1436: fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing.
|
2017-06-15 10:16:30 +02:00 |
|
Benoit Steiner
|
3baef62b9a
|
Added missing __device__ qualifier
|
2017-06-13 12:56:55 -07:00 |
|
Benoit Steiner
|
449936828c
|
Added missing __device__ qualifier
|
2017-06-13 12:54:57 -07:00 |
|
Gael Guennebaud
|
26f552c18d
|
fix compilation of Half in C++98 (issue introduced in previous commit)
|
2017-06-09 13:36:58 +02:00 |
|
Gael Guennebaud
|
1d59ca2458
|
Fix compilation with gcc 4.3 and ARM NEON
|
2017-06-09 13:20:52 +02:00 |
|
Gael Guennebaud
|
d588822779
|
Add missing std::numeric_limits specialization for half, and complete NumTraits<half>
|
2017-06-09 11:51:53 +02:00 |
|
Abhijit Kundu
|
9bc0a35731
|
Fixed nested angle brackets >> issue when compiling with CUDA 8
|
2017-04-27 03:09:03 -04:00 |
|
Benoit Jacob
|
61160a21d2
|
ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32.
|
2017-03-15 06:57:25 -04:00 |
|
Gael Guennebaud
|
e958c2baac
|
remove UTF8 symbols
|
2017-03-07 10:47:40 +01:00 |
|
Benoit Steiner
|
7b61944669
|
Made most of the packet math primitives usable within CUDA kernel when compiling with clang
|
2017-02-28 17:05:28 -08:00 |
|
Benoit Steiner
|
34d9fce93b
|
Avoid unnecessary float to double conversions.
|
2017-02-27 16:33:33 -08:00 |
|
Gael Guennebaud
|
cbbf88c4d7
|
Use int32_t instead of int in NEON code. Some platforms with 16-bit int support ARM NEON.
|
2017-02-17 14:39:02 +01:00 |
|
Rasmus Munk Larsen
|
5c9ed4ba0d
|
Reverse arguments for pmin in AVX.
|
2017-01-25 09:21:57 -08:00 |
|
Rasmus Munk Larsen
|
7b6aaa3440
|
Fix NaN propagation for AVX512.
|
2017-01-24 13:37:08 -08:00 |
|
Rasmus Munk Larsen
|
5e144bbaa4
|
Make NaN propagation consistent between pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
|
2017-01-24 13:32:50 -08:00 |
|
Gael Guennebaud
|
ca79c1545a
|
Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t
|
2017-01-23 22:02:53 +01:00 |
|
Benoit Steiner
|
354baa0fb1
|
Avoid using horizontal adds since they're not very efficient.
|
2016-12-21 20:55:07 -08:00 |
|
Benoit Steiner
|
d7825b6707
|
Use native AVX512 types instead of Eigen Packets whenever possible.
|
2016-12-21 20:06:18 -08:00 |
|
Benoit Steiner
|
923acadfac
|
Fixed compilation errors with gcc6 when compiling the AVX512 intrinsics
|
2016-12-19 13:02:27 -08:00 |
|
Benoit Jacob
|
751e097c57
|
Use 32 registers on ARM64
|
2016-12-19 13:44:46 -05:00 |
|
Gael Guennebaud
|
8c0e701504
|
bug #1360: fix sign issue with pmull on altivec
|
2016-12-18 22:13:19 +00:00 |
|
Gael Guennebaud
|
fc94258e77
|
Fix unused warning
|
2016-12-18 22:11:48 +00:00 |
|
Gael Guennebaud
|
5d00fdf0e8
|
bug #1363: fix mingw's ABI issue
|
2016-12-15 11:58:31 +01:00 |
|
Srinivas Vasudevan
|
f7d7c33a28
|
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
|
2016-12-05 12:19:01 -08:00 |
|
Srinivas Vasudevan
|
09ee7f0c80
|
Fix small nit where I changed name of plog1p to pexpm1.
|
2016-12-02 15:30:12 -08:00 |
|
Srinivas Vasudevan
|
218764ee1f
|
Added support for expm1 in Eigen.
|
2016-12-02 14:13:01 -08:00 |
|
Rasmus Munk Larsen
|
a0329f64fb
|
Add a default constructor for the "fake" __half class when not using the
__half class provided by CUDA.
|
2016-11-29 13:18:09 -08:00 |
|
Gael Guennebaud
|
e340866c81
|
Fix compilation with gcc and old ABI version
|
2016-11-23 14:04:57 +01:00 |
|
Gael Guennebaud
|
74637fa4e3
|
Optimize predux<Packet8f> (AVX)
|
2016-11-22 21:57:52 +01:00 |
|
Gael Guennebaud
|
178c084856
|
Disable usage of SSE3 _mm_hadd_ps that is extremely slow.
|
2016-11-22 21:53:14 +01:00 |
|
Gael Guennebaud
|
7dd894e40e
|
Optimize predux<Packet4d> (AVX)
|
2016-11-22 21:41:30 +01:00 |
|
Gael Guennebaud
|
f3fb0a1940
|
Disable usage of SSE3 haddpd that is extremely slow.
|
2016-11-22 16:58:31 +01:00 |
|
Konstantinos Margaritis
|
672aa97d4d
|
implement float/std::complex<float> for ZVector as well, minor fixes to ZVector
|
2016-11-17 13:27:33 -05:00 |
|
Benoit Steiner
|
dff9a049c4
|
Optimized the computation of exp, sqrt, ceil and floor for fp16 on Pascal GPUs
|
2016-11-16 09:01:51 -08:00 |
|
Benoit Steiner
|
c80587c92b
|
Merged eigen/eigen into default
|
2016-11-03 03:55:11 -07:00 |
|
Gael Guennebaud
|
598de8b193
|
Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.
|
2016-11-02 10:38:13 +01:00 |
|
Benoit Steiner
|
7a0e96b80d
|
Gate the code that refers to cuda fp16 primitives more thoroughly
|
2016-11-01 12:08:09 -07:00 |
|
Gael Guennebaud
|
aad72f3c6d
|
Add missing inline keywords
|
2016-10-25 20:20:09 +02:00 |
|
Benoit Steiner
|
3e194a6a73
|
Fixed a typo
|
2016-10-25 08:42:15 -07:00 |
|
Gael Guennebaud
|
13fc18d3a2
|
Add a pinsertlast function replacing the last entry of a packet by a scalar.
(useful to vectorize LinSpaced)
|
2016-10-25 16:48:49 +02:00 |
|
Benoit Steiner
|
38b6048e14
|
Deleted redundant implementation of predux
|
2016-10-12 14:37:56 -07:00 |
|
Benoit Steiner
|
78d2926508
|
Merged eigen/eigen into default
|
2016-10-12 13:46:29 -07:00 |
|
Benoit Steiner
|
2e2f48e30e
|
Take advantage of AVX512 instructions whenever possible to speed up the processing of 16-bit floats.
|
2016-10-12 13:45:39 -07:00 |
|
Gael Guennebaud
|
5c366fe1d7
|
Merged in rmlarsen/eigen (pull request PR-230)
Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1
|
2016-10-12 16:30:51 +00:00 |
|
Rasmus Munk Larsen
|
47150af1c8
|
Fix copy-paste error: Must use _mm256_cmp_ps for AVX.
|
2016-10-12 08:34:39 -07:00 |
|
Gael Guennebaud
|
89e315152c
|
bug #1325: fix compilation on NEON with clang
|
2016-10-12 16:55:47 +02:00 |
|
Benoit Steiner
|
507b661106
|
Renamed predux_half into predux_downto4
|
2016-10-06 17:57:04 -07:00 |
|