Commit Graph

957 Commits

Author SHA1 Message Date
Antonio Sanchez
55967f87d1 Fix NEON pmax<PropagateNumbers,Packet4bf>.
Simple typo, the max impl called pmin instead of pmax for floats.
2020-12-11 21:50:52 -08:00
Antonio Sanchez
839aa505c3 Fix typo in AVX512 packet math. 2020-12-11 21:35:44 -08:00
David Tellenbach
536c8a79f2 Remove unused macro in Half.h 2020-12-12 00:53:26 +01:00
Antonio Sanchez
8c9976d7f0 Fix more SSE/AVX packet conversions for peven.
MSVC doesn't like function-style casts and forces us to use intrinsics.
2020-12-11 15:46:42 -08:00
Antonio Sanchez
c6efc4e0ba Replace M_LOG2E and M_LN2 with custom macros.
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included.  However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).

To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
2020-12-11 14:34:31 -08:00
Antonio Sanchez
e82722a4a7 Fix MSVC SSE casts.
MSVC doesn't like __m128(__m128i) c-style casts, so packets need to be
converted using intrinsic methods.
2020-12-11 08:52:59 -08:00
Deven Desai
f3d2ea48f5 Fix for broken ROCm/HIP Support
The following commit introduced a breakage in ROCm/HIP support for Eigen.

5ec4907434 (1958e65719641efe5483abc4ce0b61806270f6f3_525_517)

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:222:
/home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'?
  return half2half2(from);
         ^~~~~~~~~~
         __half2half2
/opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here
            __half2 __half2half2(__half x)
                    ^
1 error generated when compiling for gfx900.

```

The cause seems to be a copy-paster error, and the fix is trivial
2020-12-11 16:14:57 +00:00
David Tellenbach
c7eb3a74cb Don't guard psqrt for std::complex<float> with EIGEN_ARCH_ARM64 2020-12-11 12:41:52 +01:00
Everton Constantino
bccf055a7c Add Armv8 guard on PropagateNumbers implementation. 2020-12-10 22:01:55 -03:00
David Tellenbach
00be0a7ff3 Fix vectorization of complex sqrt on NEON 2020-12-10 15:23:23 +00:00
David Tellenbach
8eb461a431 Remove comma at end of enumerator list in NEON PacketMath 2020-12-10 15:22:55 +01:00
Rasmus Munk Larsen
125cc9a5df Implement vectorized complex square root.
Closes #1905

Measured speedup for sqrt of `complex<float>` on Skylake:

SSE:
```
name                      old time/op             new time/op  delta
BM_eigen_sqrt_ctype/1     49.4ns ± 0%             54.3ns ± 0%  +10.01%
BM_eigen_sqrt_ctype/8      332ns ± 0%               50ns ± 1%  -84.97%
BM_eigen_sqrt_ctype/64    2.81µs ± 1%             0.38µs ± 0%  -86.49%
BM_eigen_sqrt_ctype/512   23.8µs ± 0%              3.0µs ± 0%  -87.32%
BM_eigen_sqrt_ctype/4k     202µs ± 0%               24µs ± 2%  -88.03%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%             0.19ms ± 0%  -88.18%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%              1.5ms ± 1%  -88.20%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%              6.2ms ± 0%  -88.18%
```

AVX2:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.6ns ± 0%  55.6ns ± 0%   +3.71%
BM_eigen_sqrt_ctype/8      334ns ± 0%    27ns ± 0%  -91.86%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.22µs ± 2%  -92.28%
BM_eigen_sqrt_ctype/512   23.8µs ± 1%   1.7µs ± 1%  -92.81%
BM_eigen_sqrt_ctype/4k     201µs ± 0%    14µs ± 1%  -93.24%
BM_eigen_sqrt_ctype/32k   1.62ms ± 0%  0.11ms ± 1%  -93.29%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.9ms ± 1%  -93.31%
BM_eigen_sqrt_ctype/1M    52.0ms ± 0%   3.5ms ± 1%  -93.31%
```

AVX512:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.7ns ± 0%  56.2ns ± 1%   +4.75%
BM_eigen_sqrt_ctype/8      334ns ± 0%    18ns ± 2%  -94.63%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.12µs ± 1%  -95.54%
BM_eigen_sqrt_ctype/512   23.9µs ± 1%   1.0µs ± 1%  -95.89%
BM_eigen_sqrt_ctype/4k     202µs ± 0%     8µs ± 1%  -96.13%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%  0.06ms ± 1%  -96.15%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.5ms ± 4%  -96.11%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%   2.0ms ± 1%  -96.13%
```
2020-12-08 18:13:35 -08:00
Antonio Sanchez
8cfe0db108 Fix host/device calls for __half.
The previous code had `__host__ __device__` functions calling `__device__`
functions (e.g. `__low2half`) which caused build failures in tensorflow.
Also tried to simplify the `#ifdef` guards to make them more clear.
2020-12-08 20:31:02 +00:00
Everton Constantino
baf9d762b7 - Enabling PropagateNaN and PropagateNumbers for NEON.
- Adding propagate tests to bfloat16.
2020-12-08 17:05:05 +00:00
Antonio Sanchez
5ec4907434 Clean up #ifs in GPU PacketPath.
Removed redundant checks and redundant code for CUDA/HIP.

Note: there are several issues here of calling `__device__` functions
from `__host__ __device__` functions, in particular `__low2half`.
We do not address that here -- only modifying this file enough
to get our current tests to compile.

Fixed: #1847
2020-12-04 16:14:03 -08:00
Rasmus Munk Larsen
f9fac1d5b0 Add log2() to Eigen. 2020-12-04 21:45:09 +00:00
Antonio Sanchez
e2f21465fe Special function implementations for half/bfloat16 packets.
Current implementations fail to consider half-float packets, only
half-float scalars.  Added specializations for packets on AVX, AVX512 and
NEON.  Added tests to `special_packetmath`.

The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.

Tested with AVX, AVX512.
2020-12-04 10:16:29 -08:00
Antonio Sanchez
9ee9ac81de Fix shfl* macros for CUDA/HIP
The `shfl*` functions are `__device__` only, and adjusted `#ifdef`s so
they are defined whenever the corresponding CUDA/HIP ones are.

Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.

Fixes #2083
2020-12-04 17:18:32 +00:00
Rasmus Munk Larsen
f23dc5b971 Revert "Add log2() operator to Eigen"
This reverts commit 4d91519a9b.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b Add log2() operator to Eigen 2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen
25d8ae7465 Small cleanup of generic plog implementations:
Adding the term e*ln(2) is split into two step for no obvious reason.
This dates back to the original Cephes code from which the algorithm is adapted.
It appears that this was done in Cephes to prevent the compiler from reordering
the addition of the 3 terms in the approximation

  log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)

which must be added in reverse order since |x| < (sqrt(2)-1).

This allows rewriting the code to just 2 pmadd and 1 padd instructions,
which on a Skylake processor speeds up the code by 5-7%.
2020-12-03 19:40:40 +00:00
Antonio Sanchez
70fbcf82ed Fix typo in F32MaskToBf16Mask. 2020-12-02 07:58:34 -08:00
Antonio Sanchez
2627e2f2e6 Fix neon cmp* functions for bf16.
The current impl corrupts the comparison masks when converting
from float back to bfloat16.  The resulting masks are then
no longer all zeros or all ones, which breaks when used with
`pselect` (e.g. in `pmin<PropagateNumbers>`).  This was
causing `packetmath_15` to fail on arm.

Introducing a simple `F32MaskToBf16Mask` corrects this (takes
the lower 16-bits for each float mask).
2020-12-02 01:29:34 +00:00
Antonio Sanchez
ddd48b242c Implement CUDA __shfl* for Eigen::half
Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test are broken, as well as several ops in Tensorflow. The gpu functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float.
Here we add the required specializations.
2020-12-01 14:36:52 -08:00
Rasmus Munk Larsen
e57281a741 Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available. 2020-12-01 11:31:47 -08:00
Antonio Sanchez
1992af3de2 Fix #2077, EIGEN_CONSTEXPR in Half.
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`.  This shouldn't affect anything else, since
it is only used in `a bit_cast<uint16_t,half>()` which is not itself
`constexpr`.

Fixes #2077.
2020-12-01 03:10:21 +00:00
Antonio Sanchez
89f90b585d AVX512 missing ops.
This allows the `packetmath` tests to pass for AVX512 on skylake.
Made `half` and `bfloat16` consistent in terms of ops they support.

Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
2020-11-30 16:28:57 +00:00
Andreas Krebbel
1e74f93d55 Fix some packet-functions in the IBM ZVector packet-math. 2020-11-25 14:11:23 +00:00
Rasmus Munk Larsen
79818216ed Revert "Fix Half NaN definition and test."
This reverts commit c770746d70.
2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen
c770746d70 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-24 20:53:07 +00:00
Antonio Sanchez
22f67b5958 Fix boolean float conversion and product warnings.
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
    Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```

Details:

- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).

- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)

- Deprecated above specialized ops for bool.

- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Antonio Sanchez
a3b300f1af Implement missing AVX half ops.
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`.  Allows `packetmath_13` to pass.

Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
2020-11-24 16:46:41 +00:00
Antonio Sanchez
38abf2be42 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-23 14:13:59 -08:00
Antonio Sanchez
4cf01d2cf5 Update AVX half packets, disable test.
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail.  This disables the test.

Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
2020-11-21 09:05:10 -08:00
Antonio Sanchez
fd1dcb6b45 Fixes duplicate symbol when building blas
Missing inline breaks blas, since symbol generated in
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, `double.cpp`

Changed rest of inlines to `EIGEN_STRONG_INLINE`.
2020-11-20 09:37:40 -08:00
David Tellenbach
6c9c3f9a1a Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool
Both, Eigen::half and Eigen::Bfloat16 are implicitly convertible to
float and can hence be converted to bool via the conversion chain

  Eigen::{half,bfloat16} -> float -> bool

We thus remove the explicit cast operator to bool.
2020-11-19 18:49:09 +01:00
David Tellenbach
11e4056f6b Re-enable Arm Neon Eigen::half packets of size 8
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
2020-11-18 23:02:21 +00:00
Antonio Sanchez
17268b155d Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`.  Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
2020-11-18 20:32:35 +00:00
Rasmus Munk Larsen
2d63706545 Add missing parens around macro argument. 2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen
6bba58f109 Replace SSE_SHUFFLE_MASK macro with shuffle_mask. 2020-11-17 15:28:37 -08:00
David Tellenbach
e9b55c4db8 Avoid promotion of Arm __fp16 to float in Neon PacketMath
Using overloaded arithmetic operators for Arm __fp16 always
causes a promotion to float. We replace operator* by vmulh_f16
to avoid this.
2020-11-17 20:19:44 +01:00
Antonio Sanchez
117a4c0617 Fix missing EIGEN_CONSTEXPR pop_macro in Half.
`EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if
`EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
2020-11-17 08:29:33 -08:00
Guoqiang QI
394f564055 Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. 2020-11-17 12:27:01 +00:00
Antonio Sanchez
bb69a8db5d Explicit casts of S -> std::complex<T>
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`.  This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`

The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).

To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`.  We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
2020-11-14 05:50:42 +00:00
guoqiangqi
8324e5e049 Fix typo in NEON/PacketMath.h 2020-11-13 00:46:41 +00:00
Pedro Caldeira
c29935b323 Add support for dynamic dispatch of MMA instructions for POWER 10 2020-11-12 11:31:15 -03:00
mehdi-goli
e24a1f57e3 [SYCL Function pointer Issue]: SYCL does not support function pointer inside the kernel, due to the portability issue of a function pointer and memory address space among host and accelerators. To fix the issue, function pointers have been replaced by function objects. 2020-11-12 01:50:28 +00:00
guoqiangqi
82fe059f35 Fix issue2045 which get a error case _mm256_set_m128d op not supported by gcc 7.x 2020-11-04 09:21:39 +08:00
Deven Desai
39a038f2e4 Fix for ROCm (and CUDA?) breakage - 201029
The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error

e265f7ed8e

```

Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
  return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
                                                                           ^~~~~~
                                                                           Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
          ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```

The fix is in this commit is trivial. Please review and merge
2020-10-29 15:34:05 +00:00
David Tellenbach
f895755c0e Remove unused functions in Half.h.
The following functions have been removed:

  Eigen::half fabsh(const Eigen::half&)
  Eigen::half exph(const Eigen::half&)
  Eigen::half sqrth(const Eigen::half&)
  Eigen::half powh(const Eigen::half&, const Eigen::half&)
  Eigen::half floorh(const Eigen::half&)
  Eigen::half ceilh(const Eigen::half&)
2020-10-29 07:37:52 +01:00