Rasmus Larsen
cb3c059fa4
Merged eigen/eigen into default
2019-01-09 15:04:17 -08:00
Gael Guennebaud
3492a1ca74
fix plog(+inf) with AVX512
2019-01-09 16:53:37 +01:00
Gael Guennebaud
47810cf5b7
Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSE
2019-01-09 16:40:42 +01:00
Gael Guennebaud
3f14e0d19e
fix warning
2019-01-09 15:45:21 +01:00
Gael Guennebaud
aeec68f77b
Add missing pcmp_lt and others for AVX512
2019-01-09 15:36:41 +01:00
Gael Guennebaud
e6b217b8dd
bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieve same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
...
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
- FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Rasmus Munk Larsen
055f0b73db
Add support for pcmp_eq and pnot, including for complex types.
2019-01-07 16:53:36 -08:00
Mark D Ryan
bc5dd4cafd
PR560: Fix the AVX512f only builds
...
Commit c53eececb0
introduced AVX512 support for complex numbers but required
avx512dq to build. Commit 1d683ae2f5
fixed some but not, it would seem all,
of the hard avx512dq dependencies. Build failures are still evident on
Eigen and TensorFlow when compiling with just avx512f and no avx512dq
using gcc 7.3. Looking at the code there does indeed seem to be a problem.
Commit c53eececb0
calls avx512dq intrinsics directly, e.g, _mm512_extractf32x8_ps
and _mm512_and_ps. This commit fixes the issue by replacing the direct
intrinsic calls with the various wrapper functions that are safe to use on
avx512f only builds.
2019-01-03 14:33:04 +01:00
Gael Guennebaud
60d3fe9a89
One more stupid AVX 512 fix (I don't have direct access to AVX512 machines)
2018-12-24 13:05:03 +01:00
Gael Guennebaud
4aa667b510
Add EIGEN_STRONG_INLINE where required
2018-12-24 10:45:01 +01:00
Gael Guennebaud
961ff567e8
Add missing pcmp_lt_or_nan for AVX512
2018-12-23 22:13:29 +01:00
Gael Guennebaud
0f6f75bd8a
Implement a faster fix for sin/cos of large entries that also correctly handle INF input.
2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8
Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)
2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb
Fix plog(+INF): it returned ~87 instead of +INF
2018-12-23 15:40:52 +01:00
Gael Guennebaud
efa4c9c40f
bug #1615 : slightly increase the default unrolling limit to compensate for changeset 101ea26f5e
...
.
This solves a performance regression with clang and 3x3 matrix products.
2018-12-13 10:42:39 +01:00
Gael Guennebaud
0a7e7af6fd
Properly set the number of registers for AVX512
2018-12-11 15:33:17 +01:00
Gael Guennebaud
81c27325ae
bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512
2018-12-08 14:27:48 +01:00
Gael Guennebaud
7b6d0ff1f6
Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also has to turn the #warning regarding AVX512-FMA to a #error.
2018-12-07 15:14:50 +01:00
Gael Guennebaud
f233c6194d
bug #1637 : workaround register spilling in gebp with clang>=6.0+AVX+FMA
2018-12-07 10:01:09 +01:00
Gael Guennebaud
cbf2f4b7a0
AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only
2018-12-06 18:21:56 +01:00
Gael Guennebaud
1d683ae2f5
Fix compilation with avx512f only, i.e., no AVX512DQ
2018-12-06 18:11:07 +01:00
Gael Guennebaud
c53eececb0
Implement AVX512 vectorization of std::complex<float/double>
2018-12-06 15:58:06 +01:00
Gael Guennebaud
1ac2695ef7
bug #1636 : fix compilation with some ABI versions.
2018-12-06 00:05:10 +01:00
Gael Guennebaud
0ea7ae7213
Add missing padd for Packet8i (it was implicitly generated by clang and gcc)
2018-11-30 21:52:25 +01:00
Gael Guennebaud
c785464430
Add packet sin and cos to Altivec/VSX and NEON
2018-11-30 16:21:33 +01:00
Gael Guennebaud
69ace742be
Several improvements regarding packet-bitwise operations:
...
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
fa87f9d876
Add psin/pcos on AVX512 -> almost for free, at last!
2018-11-30 14:33:13 +01:00
Gael Guennebaud
c68bd2fa7a
Cleanup
2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303
Fix pandnot order in AVX512
2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6
Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)
2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d
Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks)
2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7
same for pmax
2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6
pmin/pmax o SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions
2018-11-28 17:14:20 +01:00
Eugene Zhulenev
80f1651f35
Use explicit packet type in SSE/PacketMath pldexp
2018-11-27 17:25:49 -08:00
Gael Guennebaud
b131a4db24
bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.
2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21
Update pshiftleft to pass the shift as a true compile-time integer.
2018-11-27 22:57:30 +01:00
Gael Guennebaud
fa7fd61eda
Unify SSE/AVX psin functions.
...
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
- add packet-float to packet-int type traits
- add packet float<->int reinterpret casts
- add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
Gael Guennebaud
b5695a6008
Unify Altivec/VSX pexp(double) with default implementation
2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e
cleanup
2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10
Unify SSE and AVX pexp for double.
2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054
Unify NEON's pexp with generic implementation
2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc
Unify Altivec/VSX's pexp with generic implementation
2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5
Unify SSE and AVX implementation of pexp
2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47
Unify Altivec/VSX's plog with generic implementation, and enable it!
2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8
Unify NEON's plog with generic implementation
2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114
First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
...
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c
Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"
2018-11-26 14:14:07 +01:00
Gael Guennebaud
0836a715d6
bug #1611 : fix plog(0) on NEON
2018-11-26 09:08:38 +01:00
Christian von Schultz
4a40b3785d
Collapsed revision (based on pull request PR-325)
...
* Support compiling without IO streams
Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
Gael Guennebaud
0f780bb0b4
Fix float-to-double warning
2018-10-16 09:19:45 +02:00