If EIGEN_DONT_VECTORIZE is defined, immintrin.h is not included even if F16C is available. Trying to use F16C intrinsics thus fails.
This fixes issue #2395.
(cherry picked from commit c06c3e52a0)
- Doing computation with uninitialized (zero-ed ? but thanks Linux) matrix, or
worse NaN on other non-linux systems.
- This commit fixes it by initializing to Random().
(cherry picked from commit 4284c68fbb)
& and | short-circuit, && and || don't. When both arguments to those
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.
Here, it is inconsequential, so switch to && and || to suppress the warning.
(cherry picked from commit b17bcddbca)
For vectorized 1-dimensional inputs that do not take the special
blocking path (e.g. `std::complex<...>`), there was an
index-out-of-bounds error causing the broadcast size to be
computed incorrectly. Here we fix this, and make other minor
cleanup changes.
Fixes#2351.
(cherry picked from commit a500da1dc0)
Fixes compiler errors in expressions that look like
Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff()
The error comes from the code that creates the initial value for
vectorized reductions. The fix is to specify the scalar type of the
reduction's initial value.
The cahnge is necessary for Eigen::half because unlike other types,
Eigen::half scalars cannot be implicitly created from integers.
(cherry picked from commit d0e3791b1a)
We currently have plenty of type definitions with the alignment
qualifier coming after the type. The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];
Turn this into:
EIGEN_ALIGN16 int ai[4];
(cherry picked from commit 8faafc3aaa)
For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes
g++-11 to ICE with
```
sorry, unimplemented: unexpected AST of kind nontype_argument_pack
```
It does work with other versions of gcc, and with clang.
I filed a GCC bug
[here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999).
Technically we should never actually run into this case, since you
can't take n > 0 elements from an empty list. Commenting it out
allows our Eigen tests to pass.
(cherry picked from commit 8f8c2ba2fe)
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).
Trying to use one of these on device will currently lead to a
duplicate definition error. This is still probably preferable
to no error though. If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.
The only proper solution would be to define our own custom `Complex`
type.
(cherry picked from commit f0f1d7938b)
The 2979 warning is yet another "calling a __host__ function from a
__host__ device__ function. Although we probably should eventually
address these, they are flooding the logs. Most of these are
harmless since we only call the original from the host.
In cases where these are actually called from device, an error is generated
instead anyways.
The 2977 warning is a bit strange - although the warning suggests the
`__device__` annotation is ignored, this doesn't actually seem to be
the case. Without the `__device__` declarations, the kernel actually
fails to run when attempting to construct such objects. Again,
these warnings are flooding the logs, so disabling for now.
(cherry picked from commit 86c0decc48)
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast
(cherry picked from commit b5eaa42695)
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.
(cherry picked from commit ebd4b17d2f)
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored. Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.
(cherry picked from commit 998bab4b04)
Clang doesn't like !621, needs the "g" constraint back.
The "g" constraint also works for GCC >= 5.
This fixes our gitlab CI.
(cherry picked from commit 3a6296d4f1)
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".
Tested `r` instead, and that seems to work, even with latest compilers.
Also fixed some minor macro issues to eliminate warnings on armv7.
Fixes#2315.
(cherry picked from commit ff07a8a639)
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.
All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.
Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
(cherry picked from commit cc3573ab44)
There were some typos that checked `EIGEN_HAS_CXX14` that should have
checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch
in some of the `Eigen::fix<N>` assumptions.
Also fixed the `symbolic_index` test when
`EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0.
Fixes#2308
(cherry picked from commit 5db9e5c779)
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs. It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).
(cherry picked from commit 2b410ecbef)
Manually constructing an unaligned object declared as aligned
invokes UB, so we cannot technically check for alignment from
within the constructor. Newer versions of clang optimize away
this check.
Removing the affected tests.
(cherry picked from commit 0c4ae56e37)