eigen

devtools/eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Commit Graph

cfc70dc13f Add regression test for bug #1174 Gael Guennebaud 2018-12-12 18:03:31 +01:00
2de8da70fd bug #1557: fix RealSchur and EigenSolver for matrices with only zeros on the diagonal. Gael Guennebaud 2018-12-12 17:30:08 +01:00
72c0bbe2bd Simplify handling of tests that must fail to compile. Each test is now a normal ctest target, and build properties (compiler+flags) are preserved (instead of starting a new build-dir from scratch). Gael Guennebaud 2018-12-12 15:48:36 +01:00
37c91e1836 bug #1644: fix warning Gael Guennebaud 2018-12-11 22:07:20 +01:00
f159cf3d75 Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels. With a 6pX4 kernel (not committed yet), this provides a +20% speedup. Gael Guennebaud 2018-12-11 15:36:27 +01:00
0a7e7af6fd Properly set the number of registers for AVX512 Gael Guennebaud 2018-12-11 15:33:17 +01:00
7166496f70 bug #1643: fix compilation issue with gcc and no optimization Gael Guennebaud 2018-12-11 13:24:42 +01:00
0d90637838 enable spilling workaround on architectures with SSE/AVX Gael Guennebaud 2018-12-10 23:22:44 +01:00
cf697272e1 Remove debug code. Gael Guennebaud 2018-12-09 23:05:46 +01:00
450dc97c6b Various fixes in polynomial solver and its unit tests: clean up noise in imaginary part of real roots; take into account the magnitude of the derivative to check roots; use <= instead of < at appropriate places. Gael Guennebaud 2018-12-09 22:54:39 +01:00
348bb386d1 Enable "old" CMP0026 policy (not perfect, but better than dozens of warning) Gael Guennebaud 2018-12-08 18:59:51 +01:00
bff90bf270 workaround "may be used uninitialized" warning Gael Guennebaud 2018-12-08 18:58:28 +01:00
81c27325ae bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512 Gael Guennebaud 2018-12-08 14:27:48 +01:00
426bce7529 fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target Gael Guennebaud 2018-12-08 09:44:21 +01:00
cd25b538ab Fix noise in sparse_basic_3 (numerical cancellation) Gael Guennebaud 2018-12-08 00:13:37 +01:00
efaf03bf96 Fix noise in lu unit test Gael Guennebaud 2018-12-08 00:05:03 +01:00
956678a4ef bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling. Gael Guennebaud 2018-12-07 18:03:36 +01:00
7b6d0ff1f6 Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into an #error. Gael Guennebaud 2018-12-07 15:14:50 +01:00
f233c6194d bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA Gael Guennebaud 2018-12-07 10:01:09 +01:00
ae59a7652b bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA Gael Guennebaud 2018-12-07 09:23:28 +01:00
4e7746fe22 bug #1636: fix gemm performance issue with gcc>=6 and no FMA Gael Guennebaud 2018-12-07 09:15:46 +01:00
cbf2f4b7a0 AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only Gael Guennebaud 2018-12-06 18:21:56 +01:00
1d683ae2f5 Fix compilation with avx512f only, i.e., no AVX512DQ Gael Guennebaud 2018-12-06 18:11:07 +01:00
aab749b1c3 fix test regarding AVX512 vectorization of complexes. Gael Guennebaud 2018-12-06 16:55:00 +01:00
c53eececb0 Implement AVX512 vectorization of std::complex<float/double> Gael Guennebaud 2018-12-06 15:58:06 +01:00
3fba59ea59 temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! Gael Guennebaud 2018-12-06 00:13:26 +01:00
1ac2695ef7 bug #1636: fix compilation with some ABI versions. Gael Guennebaud 2018-12-06 00:05:10 +01:00
47d8b741b2 #elif -> #else to fix GPU build. Rasmus Munk Larsen 2018-12-05 13:19:31 -08:00
8a02883d58 Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554) Rasmus Munk Larsen 2018-12-05 18:19:32 +00:00
acc3459a49 Add help messages in the quick ref/ascii docs regarding slicing, indexing, and reshaping. Gael Guennebaud 2018-12-05 17:17:23 +01:00
e2e897298a Fix page nesting Gael Guennebaud 2018-12-05 17:13:46 +01:00
c1d356e8b4 bug #1635: Use infinity from NumTraits instead of creating it manually. Christoph Hertzberg 2018-12-05 15:01:04 +01:00
36f8f6d0be Fix evalShardedByInnerDim for AVX512 builds Mark D Ryan 2018-12-05 12:29:03 +01:00
b57b31cce9 Merged in ezhulenev/eigen-01 (pull request PR-553) Rasmus Munk Larsen 2018-12-04 23:47:19 +00:00
0bb15bb6d6 Update checks in ConfigureVectorization.h Eugene Zhulenev 2018-12-03 17:10:40 -08:00
fd0fbfa9b5 Do not disable alignment with EIGEN_GPUCC Eugene Zhulenev 2018-12-03 15:54:10 -08:00
919414b9fe bug #785: Make Cholesky decomposition work for empty matrices Christoph Hertzberg 2018-12-03 16:18:15 +01:00
0ea7ae7213 Add missing padd for Packet8i (it was implicitly generated by clang and gcc) Gael Guennebaud 2018-11-30 21:52:25 +01:00
ab4df3e6ff bug #1634: remove double copy in move-ctor of non movable Matrix/Array Gael Guennebaud 2018-11-30 21:25:51 +01:00
c785464430 Add packet sin and cos to Altivec/VSX and NEON Gael Guennebaud 2018-11-30 16:21:33 +01:00
69ace742be Several improvements regarding packet-bitwise operations: add unit tests; optimize their AVX512f implementation; add missing implementations (half, Packet4f, ...). Gael Guennebaud 2018-11-30 15:56:08 +01:00
fa87f9d876 Add psin/pcos on AVX512 -> almost for free, at last! Gael Guennebaud 2018-11-30 14:33:13 +01:00
c68bd2fa7a Cleanup Gael Guennebaud 2018-11-30 14:32:31 +01:00
f91500d303 Fix pandnot order in AVX512 Gael Guennebaud 2018-11-30 14:32:06 +01:00
b477d60bc6 Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) Gael Guennebaud 2018-11-30 11:26:30 +01:00
e19ece822d Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks) Gael Guennebaud 2018-11-28 17:56:24 +01:00
41052f63b7 same for pmax Gael Guennebaud 2018-11-28 17:17:28 +01:00
3e95e398b6 pmin/pmax on SSE: make sure to use AVX instructions with AVX enabled, and disable gcc workaround for fixed gcc versions Gael Guennebaud 2018-11-28 17:14:20 +01:00
aa6097395b Add missing SSE/AVX type-casting in AVX512 mode Gael Guennebaud 2018-11-28 16:09:08 +01:00
48fe78c375 bug #1630: fix linspaced when requesting smaller packet size than default one. Gael Guennebaud 2018-11-28 13:15:06 +01:00
80f1651f35 Use explicit packet type in SSE/PacketMath pldexp Eugene Zhulenev 2018-11-27 17:25:49 -08:00
a4159dba08 do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4). Benoit Jacob 2018-11-27 16:53:14 -05:00
b131a4db24 bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. Gael Guennebaud 2018-11-27 23:45:00 +01:00
a1a5fbbd21 Update pshiftleft to pass the shift as a true compile-time integer. Gael Guennebaud 2018-11-27 22:57:30 +01:00
fa7fd61eda Unify SSE/AVX psin functions. It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: add packet-float to packet-int type traits; add packet float<->int reinterpret casts; add faster pselect for AVX based on blendv. Gael Guennebaud 2018-11-27 22:41:51 +01:00
08edbc8cfe Merged in bjacob/eigen/fixbuild (pull request PR-549) Rasmus Munk Larsen 2018-11-27 20:14:12 +00:00
7b1cb8a440 fix the build on 64-bit ARM when NEON is disabled Benoit Jacob 2018-11-27 11:11:02 -05:00
b5695a6008 Unify Altivec/VSX pexp(double) with default implementation Gael Guennebaud 2018-11-27 13:53:05 +01:00
7655a8af6e cleanup Gael Guennebaud 2018-11-26 23:21:29 +01:00
502f92fa10 Unify SSE and AVX pexp for double. Gael Guennebaud 2018-11-26 23:12:44 +01:00
4a347a0054 Unify NEON's pexp with generic implementation Gael Guennebaud 2018-11-26 22:15:44 +01:00
5c8406babc Unify Altivec/VSX's pexp with generic implementation Gael Guennebaud 2018-11-26 16:47:13 +01:00
cf8b85d5c5 Unify SSE and AVX implementation of pexp Gael Guennebaud 2018-11-26 16:36:19 +01:00
c2f35b1b47 Unify Altivec/VSX's plog with generic implementation, and enable it! Gael Guennebaud 2018-11-26 15:58:11 +01:00
c24e98e6a8 Unify NEON's plog with generic implementation Gael Guennebaud 2018-11-26 15:02:16 +01:00
2c44c40114 First step toward a unification of packet log implementation, currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions. Gael Guennebaud 2018-11-26 14:21:24 +01:00
5f6045077c Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" Gael Guennebaud 2018-11-26 14:14:07 +01:00
382279eb7f Extend unit test to recursively check half-packet types and non packet types Gael Guennebaud 2018-11-26 14:10:07 +01:00
0836a715d6 bug #1611: fix plog(0) on NEON Gael Guennebaud 2018-11-26 09:08:38 +01:00
95566eeed4 Fix typos Patrik Huber 2018-11-23 22:22:14 +00:00
e3b22a6bd0 merge Gael Guennebaud 2018-11-23 16:06:21 +01:00
ccabdd88c9 Fix reserved usage of double __ in macro names Gael Guennebaud 2018-11-23 16:01:47 +01:00
572d62697d check two ctors Gael Guennebaud 2018-11-23 15:37:09 +01:00
354f14293b Fix double = bool ! Gael Guennebaud 2018-11-23 15:12:06 +01:00
a7842daef2 Fix several uninitialized members from ctor Gael Guennebaud 2018-11-23 15:10:28 +01:00
ea60a172cf Add default constructor to Bar to make test compile again with clang-3.8 Christoph Hertzberg 2018-11-23 14:24:22 +01:00
806352d844 Small typo found by Patrik Huber (pull request PR-547) Christoph Hertzberg 2018-11-23 12:34:27 +00:00
a476054879 bug #1624: improve matrix-matrix product on ARM 64, 20% speedup Gael Guennebaud 2018-11-23 10:25:19 +01:00
c685fe9838 Move regression test to right unit test file Gael Guennebaud 2018-11-21 15:59:47 +01:00
4b2cebade8 Workaround weird MSVC bug Gael Guennebaud 2018-11-21 15:53:37 +01:00
0ec8afde57 Fixed most conversion warnings in MatrixFunctions module Christoph Hertzberg 2018-11-20 16:23:28 +01:00
e7e6809e6b ROCm/HIP specific fixes + updates Deven Desai 2018-11-19 18:13:59 +00:00
6a510fe69c Make MaxPacketSize a true upper bound, even for fixed-size inputs Gael Guennebaud 2018-11-16 11:25:32 +01:00
43c987b1c1 Add explicit regression test for bug #1622 Gael Guennebaud 2018-11-16 11:24:51 +01:00
670d56441c PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals Mark D Ryan 2018-11-13 16:15:08 +01:00
3dc0845046 Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES Nikolaus Demmel 2018-11-14 18:11:30 +01:00
7fddc6a51f typo Gael Guennebaud 2018-11-14 14:43:18 +01:00
449f948b2a help doxygen linking to DenseBase::NullaryExpr Gael Guennebaud 2018-11-14 14:42:59 +01:00
4263f23c28 Improve doc on multi-threading and warn about hyper-threading Gael Guennebaud 2018-11-14 14:42:29 +01:00
db529ae4ec doxygen does not like \addtogroup and \ingroup in the same line Gael Guennebaud 2018-11-14 14:42:06 +01:00
72928a2c8a Merged in rmlarsen/eigen2 (pull request PR-543) Rasmus Munk Larsen 2018-11-13 17:10:30 +00:00
cda479d626 Remove accidental changes. Rasmus Munk Larsen 2018-11-12 18:34:04 -08:00
719d9aee65 Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth. Rasmus Munk Larsen 2018-11-12 17:46:02 -08:00
f67b19a884 [PATCH 1/2] Misc. typos (patch 68d431b4c14ad60a778ee93c1f59ecc4b931950e). Found via codespell -q 3 -I ../eigen-word-whitelist.txt, where the whitelist consists of: als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch. Files changed (lines touched): CMakeLists.txt (26), Eigen/src/Core/GenericPacketMath.h (2), Eigen/src/SparseLU/SparseLU.h (2), bench/bench_norm.cpp (2), doc/HiPerformance.dox (2), doc/QuickStartGuide.dox (2), .../Eigen/CXX11/src/Tensor/TensorChipping.h (6), .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h (2), .../src/Tensor/TensorForwardDeclarations.h (4), .../src/Tensor/TensorGpuHipCudaDefines.h (2), .../Eigen/CXX11/src/Tensor/TensorReduction.h (2), .../CXX11/src/Tensor/TensorReductionGpu.h (2), .../test/cxx11_tensor_concatenation.cpp (2), unsupported/test/cxx11_tensor_executor.cpp (2). 14 files changed, 29 insertions(+), 29 deletions(-). luz.paz 2018-09-18 04:15:01 -04:00
6f5b126e6d Fix tensor contraction for AVX512 machines Mark D Ryan 2018-07-31 09:33:37 +01:00
77b447c24e Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX. Rasmus Munk Larsen 2018-11-12 13:42:24 -08:00
c81bdbdadc Add manual doc on STL-compatible iterators Gael Guennebaud 2018-11-12 22:06:33 +01:00
0105146915 Fix warning in c++03 Gael Guennebaud 2018-11-10 09:11:38 +01:00
93f9988a7e A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) supporting matrix exponential on platforms with 113 bits of mantissa for long doubles. Rasmus Munk Larsen 2018-11-09 14:15:32 -08:00
784a3f13cf bug #1619: fix mixing of const and non-const generic iterators Gael Guennebaud 2018-11-09 21:45:10 +01:00