eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Antonio Sanchez	55967f87d1	Fix NEON pmax<PropagateNumbers,Packet4bf>. Simple typo, the max impl called pmin instead of pmax for floats.	2020-12-11 21:50:52 -08:00
Antonio Sanchez	839aa505c3	Fix typo in AVX512 packet math.	2020-12-11 21:35:44 -08:00
David Tellenbach	536c8a79f2	Remove unused macro in Half.h	2020-12-12 00:53:26 +01:00
Antonio Sanchez	8c9976d7f0	Fix more SSE/AVX packet conversions for peven. MSVC doesn't like function-style casts and forces us to use intrinsics.	2020-12-11 15:46:42 -08:00
Antonio Sanchez	c6efc4e0ba	Replace M_LOG2E and M_LN2 with custom macros. For these to exist we would need to define `_USE_MATH_DEFINES` before `cmath` or `math.h` is first included. However, we don't control the include order for projects outside Eigen, so even defining the macro in `Eigen/Core` does not fix the issue for projects that end up including `<cmath>` before Eigen does (explicitly or transitively). To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.	2020-12-11 14:34:31 -08:00
Antonio Sanchez	e82722a4a7	Fix MSVC SSE casts. MSVC doesn't like __m128(__m128i) c-style casts, so packets need to be converted using intrinsic methods.	2020-12-11 08:52:59 -08:00
Deven Desai	f3d2ea48f5	Fix for broken ROCm/HIP Support. The following commit introduced a breakage in ROCm/HIP support for Eigen. `5ec4907434 (1958e65719641efe5483abc4ce0b61806270f6f3_525_517)` ``` Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20: In file included from /home/rocm-user/eigen/test/main.h:356: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:222: /home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'? return half2half2(from); ^~~~~~~~~~ __half2half2 /opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here __half2 __half2half2(__half x) ^ 1 error generated when compiling for gfx900. ``` The cause seems to be a copy-paste error, and the fix is trivial.	2020-12-11 16:14:57 +00:00
David Tellenbach	c7eb3a74cb	Don't guard psqrt for std::complex<float> with EIGEN_ARCH_ARM64	2020-12-11 12:41:52 +01:00
Everton Constantino	bccf055a7c	Add Armv8 guard on PropagateNumbers implementation.	2020-12-10 22:01:55 -03:00
David Tellenbach	00be0a7ff3	Fix vectorization of complex sqrt on NEON	2020-12-10 15:23:23 +00:00
David Tellenbach	8eb461a431	Remove comma at end of enumerator list in NEON PacketMath	2020-12-10 15:22:55 +01:00
Rasmus Munk Larsen	125cc9a5df	Implement vectorized complex square root. Closes #1905 Measured speedup for sqrt of `complex<float>` on Skylake: SSE: ``` name old time/op new time/op delta BM_eigen_sqrt_ctype/1 49.4ns ± 0% 54.3ns ± 0% +10.01% BM_eigen_sqrt_ctype/8 332ns ± 0% 50ns ± 1% -84.97% BM_eigen_sqrt_ctype/64 2.81µs ± 1% 0.38µs ± 0% -86.49% BM_eigen_sqrt_ctype/512 23.8µs ± 0% 3.0µs ± 0% -87.32% BM_eigen_sqrt_ctype/4k 202µs ± 0% 24µs ± 2% -88.03% BM_eigen_sqrt_ctype/32k 1.63ms ± 0% 0.19ms ± 0% -88.18% BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 1.5ms ± 1% -88.20% BM_eigen_sqrt_ctype/1M 52.1ms ± 0% 6.2ms ± 0% -88.18% ``` AVX2: ``` name old cpu/op new cpu/op delta BM_eigen_sqrt_ctype/1 53.6ns ± 0% 55.6ns ± 0% +3.71% BM_eigen_sqrt_ctype/8 334ns ± 0% 27ns ± 0% -91.86% BM_eigen_sqrt_ctype/64 2.79µs ± 0% 0.22µs ± 2% -92.28% BM_eigen_sqrt_ctype/512 23.8µs ± 1% 1.7µs ± 1% -92.81% BM_eigen_sqrt_ctype/4k 201µs ± 0% 14µs ± 1% -93.24% BM_eigen_sqrt_ctype/32k 1.62ms ± 0% 0.11ms ± 1% -93.29% BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 0.9ms ± 1% -93.31% BM_eigen_sqrt_ctype/1M 52.0ms ± 0% 3.5ms ± 1% -93.31% ``` AVX512: ``` name old cpu/op new cpu/op delta BM_eigen_sqrt_ctype/1 53.7ns ± 0% 56.2ns ± 1% +4.75% BM_eigen_sqrt_ctype/8 334ns ± 0% 18ns ± 2% -94.63% BM_eigen_sqrt_ctype/64 2.79µs ± 0% 0.12µs ± 1% -95.54% BM_eigen_sqrt_ctype/512 23.9µs ± 1% 1.0µs ± 1% -95.89% BM_eigen_sqrt_ctype/4k 202µs ± 0% 8µs ± 1% -96.13% BM_eigen_sqrt_ctype/32k 1.63ms ± 0% 0.06ms ± 1% -96.15% BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 0.5ms ± 4% -96.11% BM_eigen_sqrt_ctype/1M 52.1ms ± 0% 2.0ms ± 1% -96.13% ```	2020-12-08 18:13:35 -08:00
Antonio Sanchez	8cfe0db108	Fix host/device calls for __half. The previous code had `__host__ __device__` functions calling `__device__` functions (e.g. `__low2half`) which caused build failures in tensorflow. Also tried to simplify the `#ifdef` guards to make them more clear.	2020-12-08 20:31:02 +00:00
Everton Constantino	baf9d762b7	- Enabling PropagateNaN and PropagateNumbers for NEON. - Adding propagate tests to bfloat16.	2020-12-08 17:05:05 +00:00
Antonio Sanchez	5ec4907434	Clean up `#if`s in GPU PacketMath. Removed redundant checks and redundant code for CUDA/HIP. Note: there are several issues here of calling `__device__` functions from `__host__ __device__` functions, in particular `__low2half`. We do not address that here -- only modifying this file enough to get our current tests to compile. Fixed: #1847	2020-12-04 16:14:03 -08:00
Rasmus Munk Larsen	f9fac1d5b0	Add log2() to Eigen.	2020-12-04 21:45:09 +00:00
Antonio Sanchez	e2f21465fe	Special function implementations for half/bfloat16 packets. Current implementations fail to consider half-float packets, only half-float scalars. Added specializations for packets on AVX, AVX512 and NEON. Added tests to `special_packetmath`. The current `special_functions` tests would fail for half and bfloat16 due to lack of precision. The NEON tests also fail with precision issues and due to different handling of `sqrt(inf)`, so special functions bessel, ndtri have been disabled. Tested with AVX, AVX512.	2020-12-04 10:16:29 -08:00
Antonio Sanchez	9ee9ac81de	Fix shfl* macros for CUDA/HIP. The `shfl*` functions are `__device__` only, and adjusted `#ifdef`s so they are defined whenever the corresponding CUDA/HIP ones are. Also changed the HIP/CUDA<9.0 versions to cast to int instead of doing the conversion `half`<->`float`. Fixes #2083	2020-12-04 17:18:32 +00:00
Rasmus Munk Larsen	f23dc5b971	Revert "Add log2() operator to Eigen" This reverts commit `4d91519a9b`.	2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen	4d91519a9b	Add log2() operator to Eigen	2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen	25d8ae7465	Small cleanup of generic plog implementations: Adding the term e*ln(2) is split into two steps for no obvious reason. This dates back to the original Cephes code from which the algorithm is adapted. It appears that this was done in Cephes to prevent the compiler from reordering the addition of the 3 terms in the approximation log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x) which must be added in reverse order since |x| < (sqrt(2)-1). This allows rewriting the code to just 2 pmadd and 1 padd instructions, which on a Skylake processor speeds up the code by 5-7%.	2020-12-03 19:40:40 +00:00
Antonio Sanchez	70fbcf82ed	Fix typo in `F32MaskToBf16Mask`.	2020-12-02 07:58:34 -08:00
Antonio Sanchez	2627e2f2e6	Fix neon cmp* functions for bf16. The current impl corrupts the comparison masks when converting from float back to bfloat16. The resulting masks are then no longer all zeros or all ones, which breaks when used with `pselect` (e.g. in `pmin<PropagateNumbers>`). This was causing `packetmath_15` to fail on arm. Introducing a simple `F32MaskToBf16Mask` corrects this (takes the lower 16-bits for each float mask).	2020-12-02 01:29:34 +00:00
Antonio Sanchez	ddd48b242c	Implement CUDA __shfl* for Eigen::half. Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu` test were broken, as well as several ops in Tensorflow. The gpu functions `__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float. Here we add the required specializations.	2020-12-01 14:36:52 -08:00
Rasmus Munk Larsen	e57281a741	Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available.	2020-12-01 11:31:47 -08:00
Antonio Sanchez	1992af3de2	Fix #2077, `EIGEN_CONSTEXPR` in `Half`. `bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from `raw_half_as_uint16(...)`. This shouldn't affect anything else, since it is only used in a `bit_cast<uint16_t,half>()` which is not itself `constexpr`. Fixes #2077.	2020-12-01 03:10:21 +00:00
Antonio Sanchez	89f90b585d	AVX512 missing ops. This allows the `packetmath` tests to pass for AVX512 on skylake. Made `half` and `bfloat16` consistent in terms of ops they support. Note the `log` tests are currently disabled for `bfloat16` since they fail due to poor precision (they were previously disabled for `Packet8bf` via test function specialization -- I just removed that specialization and disabled it in the generic test).	2020-11-30 16:28:57 +00:00
Andreas Krebbel	1e74f93d55	Fix some packet-functions in the IBM ZVector packet-math.	2020-11-25 14:11:23 +00:00
Rasmus Munk Larsen	79818216ed	Revert "Fix Half NaN definition and test." This reverts commit `c770746d70`.	2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen	c770746d70	Fix Half NaN definition and test. The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`, the signaling `NaN` is quieted). There was also an inconsistency between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we correct the inconsistency and compare NaNs according to the IEEE 754 definition. Also modified the `bfloat16_float` test to match. Tested with `cortex-a53` and `cortex-a55`.	2020-11-24 20:53:07 +00:00
Antonio Sanchez	22f67b5958	Fix boolean float conversion and product warnings. This fixes some gcc warnings such as: ``` Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool] Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); } ``` Details: - Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`). - Added `scalar_square_op<bool>` and `scalar_cube_op<bool>` specializations (`-Wint-in-bool-context`) - Deprecated above specialized ops for bool. - Modified `cxx11_tensor_block_eval` to specialize generator for booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to avoid deprecated bool ops.	2020-11-24 20:20:36 +00:00
Antonio Sanchez	a3b300f1af	Implement missing AVX half ops. Minimal implementation of AVX `Eigen::half` ops to bring in line with `bfloat16`. Allows `packetmath_13` to pass. Also adjusted `bfloat16` packet traits to match the supported set of ops (e.g. Bessel is not actually implemented).	2020-11-24 16:46:41 +00:00
Antonio Sanchez	38abf2be42	Fix Half NaN definition and test. The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`, the signaling `NaN` is quieted). There was also an inconsistency between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we correct the inconsistency and compare NaNs according to the IEEE 754 definition. Also modified the `bfloat16_float` test to match. Tested with `cortex-a53` and `cortex-a55`.	2020-11-23 14:13:59 -08:00
Antonio Sanchez	4cf01d2cf5	Update AVX half packets, disable test. The AVX half implementation is incomplete, causing the `packetmath_13` test to fail. This disables the test. Also refactored the existing AVX implementation to use `bit_cast` instead of direct access to `.x`.	2020-11-21 09:05:10 -08:00
Antonio Sanchez	fd1dcb6b45	Fixes duplicate symbol when building blas. Missing inline breaks blas, since the symbol is generated in `complex_single.cpp`, `complex_double.cpp`, `single.cpp`, `double.cpp`. Changed rest of inlines to `EIGEN_STRONG_INLINE`.	2020-11-20 09:37:40 -08:00
David Tellenbach	6c9c3f9a1a	Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool. Both Eigen::half and Eigen::bfloat16 are implicitly convertible to float and can hence be converted to bool via the conversion chain Eigen::{half,bfloat16} -> float -> bool. We thus remove the explicit cast operator to bool.	2020-11-19 18:49:09 +01:00
David Tellenbach	11e4056f6b	Re-enable Arm Neon Eigen::half packets of size 8 - Add predux_half_dowto4 - Remove explicit casts in Half.h to match the behaviour of BFloat16.h - Enable more packetmath tests for Eigen::half	2020-11-18 23:02:21 +00:00
Antonio Sanchez	17268b155d	Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom. The existing `TensorRandom.h` implementation makes the assumption that `half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not always true. This currently fails on arm64, where `x` has type `__fp16`. Added `bit_cast` specializations to allow casting to/from `uint16_t` for both `half` and `bfloat16`. Also added tests in `half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch these errors in the future.	2020-11-18 20:32:35 +00:00
Rasmus Munk Larsen	2d63706545	Add missing parens around macro argument.	2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen	6bba58f109	Replace SSE_SHUFFLE_MASK macro with shuffle_mask.	2020-11-17 15:28:37 -08:00
David Tellenbach	e9b55c4db8	Avoid promotion of Arm __fp16 to float in Neon PacketMath. Using overloaded arithmetic operators for Arm __fp16 always causes a promotion to float. We replace operator* by vmulh_f16 to avoid this.	2020-11-17 20:19:44 +01:00
Antonio Sanchez	117a4c0617	Fix missing `EIGEN_CONSTEXPR` pop_macro in `Half`. `EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if `EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.	2020-11-17 08:29:33 -08:00
Guoqiang QI	394f564055	Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath.	2020-11-17 12:27:01 +00:00
Antonio Sanchez	bb69a8db5d	Explicit casts of S -> std::complex<T>. When calling `internal::cast<S, std::complex<T>>(x)`, clang often generates an implicit conversion warning due to an implicit cast from type `S` to `T`. This currently affects the following tests: - `basicstuff` - `bfloat16_float` - `cxx11_tensor_casts` The implicit cast leads to widening/narrowing float conversions. Widening warnings only seem to be generated by clang (`-Wdouble-promotion`). To eliminate the warning, we explicitly cast the real-component first from `S` to `T`. We also adjust tests to use `internal::cast` instead of `static_cast` when a complex type may be involved.	2020-11-14 05:50:42 +00:00
guoqiangqi	8324e5e049	Fix typo in NEON/PacketMath.h	2020-11-13 00:46:41 +00:00
Pedro Caldeira	c29935b323	Add support for dynamic dispatch of MMA instructions for POWER 10	2020-11-12 11:31:15 -03:00
mehdi-goli	e24a1f57e3	[SYCL Function pointer Issue]: SYCL does not support function pointers inside a kernel, because function pointers and memory address spaces are not portable between host and accelerators. To fix the issue, function pointers have been replaced by function objects.	2020-11-12 01:50:28 +00:00
guoqiangqi	82fe059f35	Fix issue #2045, an error caused by the `_mm256_set_m128d` op not being supported by gcc 7.x	2020-11-04 09:21:39 +08:00
Deven Desai	39a038f2e4	Fix for ROCm (and CUDA?) breakage - 201029 The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error `e265f7ed8e` ``` Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20: In file included from /home/rocm-user/eigen/test/main.h:355: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:169: /home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'? return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr))); ^~~~~~ Eigen::numext /home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here namespace numext { ^ 1 error generated when compiling for gfx900. CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message): Error generating file /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1 CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2 CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2 Makefile:5401: recipe for target 'gpu_basic' failed make: *** [gpu_basic] Error 2 ``` The fix in this commit is trivial. Please review and merge.	2020-10-29 15:34:05 +00:00
David Tellenbach	f895755c0e	Remove unused functions in Half.h. The following functions have been removed: Eigen::half fabsh(const Eigen::half&) Eigen::half exph(const Eigen::half&) Eigen::half sqrth(const Eigen::half&) Eigen::half powh(const Eigen::half&, const Eigen::half&) Eigen::half floorh(const Eigen::half&) Eigen::half ceilh(const Eigen::half&)	2020-10-29 07:37:52 +01:00
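
Commit c6efc4e0ba above replaces `M_LOG2E` and `M_LN2` with library-owned macros because `_USE_MATH_DEFINES` only takes effect if it is defined before the first include of `<cmath>`. A minimal sketch of that idea follows. The `EXAMPLE_*` names, the literal spellings, and the two helper functions are hypothetical stand-ins for illustration, not Eigen's actual definitions of `EIGEN_LOG2E` and `EIGEN_LN2`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the EIGEN_LOG2E / EIGEN_LN2 macros named in the
// commit message. Defining the constants ourselves means the code does not
// depend on whether _USE_MATH_DEFINES was set before <cmath> was first included.
#ifndef EXAMPLE_LOG2E
#define EXAMPLE_LOG2E 1.4426950408889634074  // log2(e)
#endif
#ifndef EXAMPLE_LN2
#define EXAMPLE_LN2 0.69314718055994530942  // ln(2)
#endif

// log2(x) computed as ln(x) * log2(e), without touching M_LOG2E.
inline double example_log2(double x) {
  assert(x > 0.0);  // this sketch only handles positive inputs
  return std::log(x) * EXAMPLE_LOG2E;
}

// Converts a base-2 logarithm back to a natural logarithm using ln(2),
// without touching M_LN2.
inline double example_ln_from_log2(double log2_x) {
  return log2_x * EXAMPLE_LN2;
}
```

Because the constants are plain macros with their own prefix, the result is the same regardless of the order in which a downstream project includes `<cmath>` and the library headers.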

957 Commits