eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Antonio Sanchez	943ef50a2d	Disable testing of complex compound assignment operators for MSVC. MSVC does not support specializing compound assignments for `std::complex`, since it already specializes them (contrary to the standard). Trying to use one of these on device will currently lead to a duplicate definition error. This is still probably preferable to no error though. If we remove the definitions for MSVC, then it will compile, but the kernel will fail silently. The only proper solution would be to define our own custom `Complex` type. (cherry picked from commit `f0f1d7938b`)	2021-10-11 10:00:29 -07:00
Antonio Sanchez	7ea4adb5f0	Disable another device warning (cherry picked from commit `e9e90892fe`)	2021-10-11 10:00:29 -07:00
Antonio Sanchez	71498b32c9	Disable more NVCC warnings. The 2979 warning is yet another "calling a __host__ function from a __host__ __device__ function". Although we probably should eventually address these, they are flooding the logs. Most of these are harmless since we only call the original from the host. In cases where these are actually called from device, an error is generated instead anyways. The 2977 warning is a bit strange - although the warning suggests the `__device__` annotation is ignored, this doesn't actually seem to be the case. Without the `__device__` declarations, the kernel actually fails to run when attempting to construct such objects. Again, these warnings are flooding the logs, so disabling for now. (cherry picked from commit `86c0decc48`)	2021-10-11 10:00:29 -07:00
Alexander Grund	929bc0e191	Fix alias violation in BFloat16 reinterpret_cast between unrelated types is undefined behavior and leads to misoptimizations on some platforms. Use the safer (and faster) version via bit_cast (cherry picked from commit `b5eaa42695`)	2021-09-20 14:25:58 +00:00
Antonio Sanchez	f046e326d9	Fix strict aliasing bug causing product_small failure. Packet loading is skipped due to aliasing violation, leading to nullopt matrix multiplication. Fixes #2327. (cherry picked from commit `3c724c44cf`)	2021-09-19 18:06:17 +00:00
Antonio Sanchez	f03d3e7072	Missing EIGEN_DEVICE_FUNCs to get `gpu_basic` passing with CUDA 9. CUDA 9 seems to require labelling defaulted constructors as `EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are ignored. Without these labels, the `gpu_basic` test fails to compile, with errors about calling `__host__` functions from `__host__ __device__` functions. (cherry picked from commit `998bab4b04`)	2021-09-02 03:21:43 +00:00
Antonio Sanchez	07cc362238	Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang. Clang doesn't like !621, needs the "g" constraint back. The "g" constraint also works for GCC >= 5. This fixes our gitlab CI. (cherry picked from commit `3a6296d4f1`)	2021-09-01 16:40:08 +00:00
Antonio Sanchez	4ef67cbfb2	GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315). GCC 4.8 doesn't seem to like the `g` register constraint, failing to compile with "error: 'asm' operand requires impossible reload". Tested `r` instead, and that seems to work, even with latest compilers. Also fixed some minor macro issues to eliminate warnings on armv7. Fixes #2315. (cherry picked from commit `ff07a8a639`)	2021-08-31 21:23:28 +00:00
Antonio Sanchez	c2b6df6e60	Disable cuda Eigen::half vectorization on host. All cuda `__half` functions are device-only in CUDA 9, including conversions. Host-side conversions were added in CUDA 10. The existing code doesn't build prior to 10.0. All arithmetic functions are always device-only, so there's no reason to use vectorization on the host at all. Modified the code to disable vectorization for `__half` on host, which required also updating the `TensorReductionGpu` implementation which previously made assumptions about available packets. (cherry picked from commit `cc3573ab44`)	2021-08-31 21:23:11 +00:00
Antonio Sanchez	7aee90b8d3	Fix fix<N> when variable templates are not supported. There were some typos that checked `EIGEN_HAS_CXX14` that should have checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch in some of the `Eigen::fix<N>` assumptions. Also fixed the `symbolic_index` test when `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0. Fixes #2308 (cherry picked from commit `5db9e5c779`)	2021-08-30 16:23:35 +00:00
Rasmus Munk Larsen	3147391d94	Change version to 3.4.0.	2021-08-18 13:41:58 -07:00
Antonio Sanchez	115591b9e3	Workaround VS 2017 arg bug. In VS 2017, `std::arg` for real inputs always returns 0, even for negative inputs. It should return `PI` for negative real values. This seems to be fixed in VS 2019 (MSVC 1920). (cherry picked from commit `2b410ecbef`)	2021-08-18 19:04:50 +00:00
Jakob Struye	1ec173b54e	Clearer doc for squaredNorm (cherry picked from commit `53a29c7e35`)	2021-08-18 15:12:36 +00:00
Antonio Sanchez	f1032255d3	Add missing PPC packet comparisons. This is to fix the packetmath tests on the ppc pipeline. (cherry picked from commit `2cc6ee0d2e`)	2021-08-17 15:33:55 +00:00
Chip-Kerchner	f57dec64ef	Fix unaligned loads in ploadLhs & ploadRhs for P8. (cherry picked from commit `8dcf3e38ba`)	2021-08-17 12:48:36 +00:00
andiwand	cd474d4cd0	minor doc fix in Map.h (cherry picked from commit `5c6b3efead`)	2021-08-16 14:26:39 +00:00
Chip-Kerchner	0b56b62f30	Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8). (cherry picked from commit `e07227c411`)	2021-08-13 18:01:15 +00:00
Chip Kerchner	44cc96e1a1	Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+ (cherry picked from commit `66499f0f17`)	2021-08-12 21:39:17 +00:00
Rasmus Munk Larsen	6d2506040c	Revise the meta_least_common_multiple function template: add a bool variable to check whether A is larger than B. This reduces compile time when A is smaller than B, and avoids a compilation failure when A is small and B is large. Authored by @awoniu. (cherry picked from commit `8ce341caf2`)	2021-08-11 18:11:26 +00:00
ChipKerchner	13d7658c5d	Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl). (cherry picked from commit `413bc491f1`)	2021-08-10 20:40:54 +00:00
Gauri Deshpande	93bff85a42	remove denormal flushing in fp32tobf16 for avx & avx512 (cherry picked from commit `e6a5a594a7`)	2021-08-09 22:15:42 +00:00
Antonio Sanchez	237c59a2aa	Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset. The `memset` function and bitwise manipulation only apply to POD types that do not require initialization, otherwise resulting in UB. We currently violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and bitwise operations are applied byte-by-byte in the generic implementations. This is causing issues for scalar types that do require initialization or that contain non-POD info such as pointers (#2201). We either break them, or force specializations of these functions for custom scalars, even if they are not vectorized. Here we modify these functions for scalars only - instead using only scalar operations: - `pzero`: `Scalar(0)` for all scalars. - `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars. - `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars - `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise. For non-scalar types, the original implementations are used to maintain compatibility and minimize the number of changes. Fixes #2201. (cherry picked from commit `3d98a6ef5c`)	2021-08-03 16:32:59 +00:00
Antonio Sanchez	3dc42eeaec	Enable equality comparisons on GPU. Since `std::equal_to::operator()` is not a device function, it fails on GPU. On my device, I seem to get a silent crash in the kernel (no reported error, but the kernel does not complete). Replacing this with a portable version enables comparisons on device. Addresses #2292 - would need to be cherry-picked. The 3.3 branch also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get fully working. (cherry picked from commit `7880f10526`)	2021-08-03 16:15:44 +00:00
hyunggi-sv	7adc1545b4	fix:typo in dox (has->have) (cherry picked from commit `02a0e79c70`)	2021-08-03 00:54:41 +00:00
Antonio Sanchez	c0c7b695cd	Fix assignment operator issue for latest MSVC+NVCC. Details are scattered across #920, #1000, #1324, #2291. Summary: some MSVC versions have a bug that requires omitting explicit `operator=` definitions (leads to duplicate definition errors), and some MSVC versions require adding explicit `operator=` definitions (otherwise implicitly deleted errors). This mess tries to cover all the cases encountered. Fixes #2291. (cherry picked from commit `9816fe59b4`)	2021-08-03 00:52:21 +00:00
Antonio Sanchez	5d37114fc0	Fix explicit default cache size typo. (cherry picked from commit `297f0f563d`)	2021-07-20 18:42:25 +00:00
Rohit Santhanam	930696fc53	Enable extract et. al. for HIP GPU. (cherry picked from commit `beea14a18f`)	2021-07-09 16:14:19 +00:00
Rasmus Munk Larsen	56966fd2e6	Defer to std::fill_n when filling a dense object with a constant value. (cherry picked from commit `0c361c4899`)	2021-07-09 03:59:56 +00:00
Rasmus Munk Larsen	05bab8139a	Fix breakage of conj_helper in conjunction with custom types introduced in !537. (cherry picked from commit `7b35638ddb`)	2021-07-02 20:59:50 +00:00
Chip Kerchner	eebde572d9	Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow (cherry picked from commit `91e99ec1e0`)	2021-07-01 23:32:38 +00:00
Antonio Sanchez	8190739f12	Fix compile issues for gcc 4.8. - Move constructors can only be defaulted as NOEXCEPT if all members have NOEXCEPT move constructors. - gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter. (cherry picked from commit `6035da5283`)	2021-07-01 23:18:10 +00:00
Dan Miller	1f6b1c1a1f	Fix duplicate definitions on Mac (cherry picked from commit `eb04775903`)	2021-07-01 20:49:05 +00:00
Alexander Karatarakis	517294d6e1	Make DenseStorage<> trivially_copyable (cherry picked from commit `60400334a9`)	2021-07-01 20:48:47 +00:00
大河メタル	94e2250b36	Correct declarations for aarch64-pc-windows-msvc (cherry picked from commit `c81da59a25`)	2021-06-30 04:10:04 +00:00
Rasmus Munk Larsen	380d0e4916	Get rid of redundant `pabs` instruction in complex square root. (cherry picked from commit `5aebbe9098`)	2021-06-29 23:27:09 +00:00
Rohit Santhanam	e83af2cc24	Commit `52a5f982` broke conjhelper functionality for HIP GPUs. This commit addresses this. (cherry picked from commit `2d132d1736`)	2021-06-25 19:56:18 +00:00
Rasmus Munk Larsen	413ff2b531	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code. (cherry picked from commit `bffd267d17`)	2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen	a235ddef39	Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations. (cherry picked from commit `52a5f98212`)	2021-06-24 23:30:42 +00:00
Antonio Sanchez	c2c0f6f64b	Fix fix<> for gcc-4.9.3. There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` replacement. Fixes #2267 (cherry picked from commit `35a367d557`)	2021-06-21 17:26:07 +00:00
Antonio Sanchez	ee4e099aa2	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps. (cherry picked from commit `12e8d57108`)	2021-06-17 17:11:08 +00:00
Chip-Kerchner	9fc93ce31a	EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate. (cherry picked from commit `ef1fd341a8`)	2021-06-16 22:14:17 +00:00
Antonio Sanchez	1374f49f28	Add missing ppc pcmp_lt_or_nan<Packet8bf> (cherry picked from commit `9e94c59570`)	2021-06-15 22:12:22 +00:00
Rasmus Munk Larsen	47722a66f2	Fix more enum arithmetic. (cherry picked from commit `13fb5ab92c`)	2021-06-15 16:40:35 +00:00
Antonio Sanchez	5e75331b9f	Fix checking of version number for mingw. MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC) 10-win32 20210110`, which causes the version extraction to fail. Added support for this with tests. Also added `make_unsigned` for `long long`, since mingw seems to use that for `uint64_t`. Related to #2268. CMake and build passes for me after this. (cherry picked from commit `ad82d20cf6`)	2021-06-12 00:02:26 +00:00
Rasmus Munk Larsen	1cb1ffd5b2	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled. (cherry picked from commit `fc87e2cbaa`)	2021-06-11 02:57:02 +00:00
Rasmus Munk Larsen	4b502a7215	Fix c++20 warnings about using enums in arithmetic expressions. (cherry picked from commit `f64b2954c7`)	2021-06-11 02:35:19 +00:00
Cyril Kaiser	573570b6c9	Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor. (cherry picked from commit `91cd67f057`)	2021-05-26 19:45:25 +00:00
Antonio Sanchez	98cf1e076f	Add missing NEON ptranspose implementations. Unified implementation using only `vzip`. (cherry picked from commit `dba753a986`)	2021-05-25 19:09:50 +00:00
Antonio Sanchez	ee2a8f7139	Modify Unary/Binary/TernaryOp evaluators to work for non-class types. This used to work for non-class types (e.g. raw function pointers) in Eigen 3.3. This was changed in commit `11f55b29` to optimize the evaluator: > `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization. though I cannot reproduce the 16 byte result. Both before the change and after, with multiple compilers/versions, I always get a result of 40 bytes. https://godbolt.org/z/MsjTc1PGe This change modifies the code slightly to allow non-class types. The final generated code is identical, and the expression remains 40 bytes for the `abs2` sample case. Fixes #2251 (cherry picked from commit `ebb300d0b4`)	2021-05-25 18:19:53 +00:00
Steve Bronder	4fbd01cd4b	Adds macro for checking if C++14 variable templates are supported (cherry picked from commit `1720057023`)	2021-05-21 16:43:30 +00:00


4553 Commits