Gael Guennebaud
e7c065ec71
bug #1462 : remove all occurrences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER
2017-08-24 11:06:47 +02:00
Gael Guennebaud
bc837b7975
bug #1436 : fix compilation of Jacobi rotations with ARM NEON: some specializations of internal::conj_helper were missing.
...
(grafted from b240080e64)
2017-06-15 10:16:30 +02:00
Gael Guennebaud
8880be60fa
fix compilation of Half in C++98 (issue introduced in previous commit)
...
(grafted from 26f552c18d)
2017-06-09 13:36:58 +02:00
Gael Guennebaud
e41713d52e
Fix compilation with gcc 4.3 and ARM NEON
...
(grafted from 1d59ca2458)
2017-06-09 13:20:52 +02:00
Gael Guennebaud
2c32368642
Add missing std::numeric_limits specialization for half, and complete NumTraits<half>
...
(grafted from d588822779)
2017-06-09 11:51:53 +02:00
Benoit Jacob
07c2244440
ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32.
2017-03-15 06:53:35 -04:00
Gael Guennebaud
9219307e13
remove UTF8 symbols
...
(grafted from e958c2baac)
2017-03-07 10:47:40 +01:00
Benoit Steiner
d66586ac90
Avoid unnecessary float to double conversions.
...
(grafted from 34d9fce93b)
2017-02-27 16:33:33 -08:00
Gael Guennebaud
b4218b8473
Use int32_t instead of int in NEON code. Some platforms with 16-bit int support ARM NEON.
...
(grafted from cbbf88c4d7)
2017-02-17 14:39:02 +01:00
Gael Guennebaud
4b2e7f26aa
Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t
2017-01-23 22:02:53 +01:00
Benoit Jacob
26197bb467
Use 32 registers on ARM64
2016-12-19 13:44:46 -05:00
Gael Guennebaud
772e59d475
bug #1360 : fix sign issue with pmull on altivec
...
(grafted from 8c0e701504)
2016-12-18 22:13:19 +00:00
Gael Guennebaud
e8f83cbb5d
Fix unused warning
...
(grafted from fc94258e77)
2016-12-18 22:11:48 +00:00
Gael Guennebaud
dce584d799
bug #1363 : fix mingw's ABI issue
...
(grafted from 5d00fdf0e8)
2016-12-15 11:58:31 +01:00
Gael Guennebaud
723ed92e0e
Fix compilation with gcc and old ABI version
...
(grafted from e340866c81)
2016-11-23 14:04:57 +01:00
Gael Guennebaud
d6b9bc1ccd
Optimize predux<Packet8f> (AVX)
...
(grafted from 74637fa4e3)
2016-11-22 21:57:52 +01:00
Gael Guennebaud
0eff51e2ed
Disable usage of SSE3 _mm_hadd_ps that is extremely slow.
...
(grafted from 178c084856)
2016-11-22 21:53:14 +01:00
Gael Guennebaud
1b7dd46d94
Optimize predux<Packet4d> (AVX)
...
(grafted from 7dd894e40e)
2016-11-22 21:41:30 +01:00
Gael Guennebaud
b2eb1bf3dc
Disable usage of SSE3 haddpd that is extremely slow.
...
(grafted from f3fb0a1940)
2016-11-22 16:58:31 +01:00
Konstantinos Margaritis
463176cc44
implement float/std::complex<float> for ZVector as well, minor fixes to ZVector
...
(grafted from 672aa97d4d)
2016-11-17 13:27:33 -05:00
Benoit Steiner
c80587c92b
Merged eigen/eigen into default
2016-11-03 03:55:11 -07:00
Gael Guennebaud
598de8b193
Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.
2016-11-02 10:38:13 +01:00
Benoit Steiner
7a0e96b80d
Gate the code that refers to cuda fp16 primitives more thoroughly
2016-11-01 12:08:09 -07:00
Gael Guennebaud
aad72f3c6d
Add missing inline keywords
2016-10-25 20:20:09 +02:00
Benoit Steiner
3e194a6a73
Fixed a typo
2016-10-25 08:42:15 -07:00
Gael Guennebaud
13fc18d3a2
Add a pinsertlast function replacing the last entry of a packet by a scalar.
...
(useful to vectorize LinSpaced)
2016-10-25 16:48:49 +02:00
Benoit Steiner
38b6048e14
Deleted redundant implementation of predux
2016-10-12 14:37:56 -07:00
Benoit Steiner
78d2926508
Merged eigen/eigen into default
2016-10-12 13:46:29 -07:00
Benoit Steiner
2e2f48e30e
Take advantage of AVX512 instructions whenever possible to speed up the processing of 16 bit floats.
2016-10-12 13:45:39 -07:00
Gael Guennebaud
5c366fe1d7
Merged in rmlarsen/eigen (pull request PR-230)
...
Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1
2016-10-12 16:30:51 +00:00
Rasmus Munk Larsen
47150af1c8
Fix copy-paste error: Must use _mm256_cmp_ps for AVX.
2016-10-12 08:34:39 -07:00
Gael Guennebaud
89e315152c
bug #1325 : fix compilation on NEON with clang
2016-10-12 16:55:47 +02:00
Benoit Steiner
507b661106
Renamed predux_half into predux_downto4
2016-10-06 17:57:04 -07:00
Benoit Steiner
a498ff7df6
Fixed incorrect comment
2016-10-06 15:27:27 -07:00
Benoit Steiner
a7473d6d5a
Fixed compilation error with gcc >= 5.3
2016-10-06 14:33:22 -07:00
Benoit Steiner
5e64cea896
Silenced a compilation warning
2016-10-06 14:24:17 -07:00
Benoit Steiner
d485d12c51
Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.
2016-10-06 10:41:03 -07:00
Benoit Steiner
4131074818
Deleted unnecessary CMakeLists.txt file
2016-10-05 18:54:35 -07:00
Benoit Steiner
cb5cd69872
Silenced a compilation warning.
2016-10-05 18:50:53 -07:00
Benoit Steiner
78b569f685
Merged latest updates from trunk
2016-10-05 18:48:55 -07:00
Benoit Steiner
9c2b6c049b
Silenced a few compilation warnings
2016-10-05 18:37:31 -07:00
Benoit Steiner
698ff69450
Properly characterize the CUDA packet primitives for fp16 as device only
2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen
7f67e6dfdb
Update comment for fast sqrt.
2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen
765615609d
Update comment for fast sqrt.
2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen
3ed67cb0bb
Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
...
Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
               SSE      AVX
Fast=1         2.529G   4.380G
Fast=0         1.944G   1.898G
Fast=1 fixed   2.214G   3.739G
This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.
2016-10-04 14:22:56 -07:00
Benoit Steiner
409e887d78
Added support for constant std::complex numbers on GPU
2016-10-03 11:06:24 -07:00
Benoit Steiner
26f9907542
Added missing typedefs
2016-09-20 12:58:03 -07:00
RJ Ryan
b2c6dc48d9
Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.
2016-09-20 07:18:20 -07:00
Gael Guennebaud
471eac5399
bug #1195 : move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)
2016-09-08 08:36:27 +02:00
Gael Guennebaud
8f4b4ad5fb
use ::hlog if available.
2016-08-29 11:05:32 +02:00