Gael Guennebaud
|
60d3fe9a89
|
One more stupid AVX 512 fix (I don't have direct access to AVX512 machines)
|
2018-12-24 13:05:03 +01:00 |
|
Gael Guennebaud
|
4aa667b510
|
Add EIGEN_STRONG_INLINE where required
|
2018-12-24 10:45:01 +01:00 |
|
Gael Guennebaud
|
961ff567e8
|
Add missing pcmp_lt_or_nan for AVX512
|
2018-12-23 22:13:29 +01:00 |
|
Gael Guennebaud
|
0f6f75bd8a
|
Implement a faster fix for sin/cos of large entries that also correctly handles INF input.
|
2018-12-23 17:26:21 +01:00 |
|
Gael Guennebaud
|
38d704def8
|
Make sure that psin/pcos return a number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)
|
2018-12-23 16:13:24 +01:00 |
|
Gael Guennebaud
|
5713fb7feb
|
Fix plog(+INF): it returned ~87 instead of +INF
|
2018-12-23 15:40:52 +01:00 |
|
Christoph Hertzberg
|
6dd93f7e3b
|
Make code compile again for older compilers.
See https://stackoverflow.com/questions/7411515/
|
2018-12-22 13:09:07 +01:00 |
|
Gael Guennebaud
|
efa4c9c40f
|
bug #1615: slightly increase the default unrolling limit to compensate for changeset 101ea26f5e.
This solves a performance regression with clang and 3x3 matrix products.
|
2018-12-13 10:42:39 +01:00 |
|
Gael Guennebaud
|
f582ea3579
|
Fix compilation with expression template scalar type.
|
2018-12-12 22:47:00 +01:00 |
|
Gael Guennebaud
|
2de8da70fd
|
bug #1557: fix RealSchur and EigenSolver for matrices with only zeros on the diagonal.
|
2018-12-12 17:30:08 +01:00 |
|
Gael Guennebaud
|
37c91e1836
|
bug #1644: fix warning
|
2018-12-11 22:07:20 +01:00 |
|
Gael Guennebaud
|
f159cf3d75
|
Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.
With a 6pX4 kernel (not committed yet), this provides a +20% speedup.
|
2018-12-11 15:36:27 +01:00 |
|
Gael Guennebaud
|
0a7e7af6fd
|
Properly set the number of registers for AVX512
|
2018-12-11 15:33:17 +01:00 |
|
Gael Guennebaud
|
7166496f70
|
bug #1643: fix compilation issue with gcc and no optimization
|
2018-12-11 13:24:42 +01:00 |
|
Gael Guennebaud
|
0d90637838
|
enable spilling workaround on architectures with SSE/AVX
|
2018-12-10 23:22:44 +01:00 |
|
Gael Guennebaud
|
bff90bf270
|
workaround "may be used uninitialized" warning
|
2018-12-08 18:58:28 +01:00 |
|
Gael Guennebaud
|
81c27325ae
|
bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512
|
2018-12-08 14:27:48 +01:00 |
|
Gael Guennebaud
|
426bce7529
|
fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non-vectorized types and non-x86/64 targets
|
2018-12-08 09:44:21 +01:00 |
|
Gael Guennebaud
|
956678a4ef
|
bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.
|
2018-12-07 18:03:36 +01:00 |
|
Gael Guennebaud
|
7b6d0ff1f6
|
Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error.
|
2018-12-07 15:14:50 +01:00 |
|
Gael Guennebaud
|
f233c6194d
|
bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA
|
2018-12-07 10:01:09 +01:00 |
|
Gael Guennebaud
|
ae59a7652b
|
bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA
|
2018-12-07 09:23:28 +01:00 |
|
Gael Guennebaud
|
4e7746fe22
|
bug #1636: fix gemm performance issue with gcc>=6 and no FMA
|
2018-12-07 09:15:46 +01:00 |
|
Gael Guennebaud
|
cbf2f4b7a0
|
AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only
|
2018-12-06 18:21:56 +01:00 |
|
Gael Guennebaud
|
1d683ae2f5
|
Fix compilation with avx512f only, i.e., no AVX512DQ
|
2018-12-06 18:11:07 +01:00 |
|
Gael Guennebaud
|
c53eececb0
|
Implement AVX512 vectorization of std::complex<float/double>
|
2018-12-06 15:58:06 +01:00 |
|
Gael Guennebaud
|
3fba59ea59
|
temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!
|
2018-12-06 00:13:26 +01:00 |
|
Gael Guennebaud
|
1ac2695ef7
|
bug #1636: fix compilation with some ABI versions.
|
2018-12-06 00:05:10 +01:00 |
|
Rasmus Munk Larsen
|
47d8b741b2
|
#elif -> #else to fix GPU build.
|
2018-12-05 13:19:31 -08:00 |
|
Christoph Hertzberg
|
c1d356e8b4
|
bug #1635: Use infinity from NumTraits instead of creating it manually.
|
2018-12-05 15:01:04 +01:00 |
|
Rasmus Munk Larsen
|
b57b31cce9
|
Merged in ezhulenev/eigen-01 (pull request PR-553)
Do not disable alignment with EIGEN_GPUCC
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
|
2018-12-04 23:47:19 +00:00 |
|
Eugene Zhulenev
|
0bb15bb6d6
|
Update checks in ConfigureVectorization.h
|
2018-12-03 17:10:40 -08:00 |
|
Eugene Zhulenev
|
fd0fbfa9b5
|
Do not disable alignment with EIGEN_GPUCC
|
2018-12-03 15:54:10 -08:00 |
|
Christoph Hertzberg
|
919414b9fe
|
bug #785: Make Cholesky decomposition work for empty matrices
|
2018-12-03 16:18:15 +01:00 |
|
Gael Guennebaud
|
0ea7ae7213
|
Add missing padd for Packet8i (it was implicitly generated by clang and gcc)
|
2018-11-30 21:52:25 +01:00 |
|
Gael Guennebaud
|
ab4df3e6ff
|
bug #1634: remove double copy in move-ctor of non movable Matrix/Array
|
2018-11-30 21:25:51 +01:00 |
|
Gael Guennebaud
|
c785464430
|
Add packet sin and cos to Altivec/VSX and NEON
|
2018-11-30 16:21:33 +01:00 |
|
Gael Guennebaud
|
69ace742be
|
Several improvements regarding packet-bitwise operations:
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
|
2018-11-30 15:56:08 +01:00 |
|
Gael Guennebaud
|
fa87f9d876
|
Add psin/pcos on AVX512 -> almost for free, at last!
|
2018-11-30 14:33:13 +01:00 |
|
Gael Guennebaud
|
c68bd2fa7a
|
Cleanup
|
2018-11-30 14:32:31 +01:00 |
|
Gael Guennebaud
|
f91500d303
|
Fix pandnot order in AVX512
|
2018-11-30 14:32:06 +01:00 |
|
Gael Guennebaud
|
b477d60bc6
|
Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)
|
2018-11-30 11:26:30 +01:00 |
|
Gael Guennebaud
|
e19ece822d
|
Disable gcc's fma workaround for gcc >= 8 (based on GEMM benchmarks)
|
2018-11-28 17:56:24 +01:00 |
|
Gael Guennebaud
|
41052f63b7
|
same for pmax
|
2018-11-28 17:17:28 +01:00 |
|
Gael Guennebaud
|
3e95e398b6
|
pmin/pmax on SSE: make sure to use AVX instructions with AVX enabled, and disable gcc workaround for fixed gcc versions
|
2018-11-28 17:14:20 +01:00 |
|
Gael Guennebaud
|
aa6097395b
|
Add missing SSE/AVX type-casting in AVX512 mode
|
2018-11-28 16:09:08 +01:00 |
|
Gael Guennebaud
|
48fe78c375
|
bug #1630: fix linspaced when requesting smaller packet size than default one.
|
2018-11-28 13:15:06 +01:00 |
|
Eugene Zhulenev
|
80f1651f35
|
Use explicit packet type in SSE/PacketMath pldexp
|
2018-11-27 17:25:49 -08:00 |
|
Benoit Jacob
|
a4159dba08
|
Do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32, but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if, before this patch, we were technically loading 8, only to use the first 4).
|
2018-11-27 16:53:14 -05:00 |
|
Gael Guennebaud
|
b131a4db24
|
bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.
|
2018-11-27 23:45:00 +01:00 |
|