Making fixed-size Matrix and Array trivially copyable allows using
std::memcpy() on them without undefined behavior.
Only Matrix and Array with trivially copyable DenseStorage are marked as
trivially copyable with an additional type trait.
As described in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0848r3.html
it requires extremely verbose SFINAE to make the special member functions of
fixed-size Matrix and Array trivial, unless C++20 concepts are available to
simplify the selection of trivial special member functions given template
parameters. Therefore only make this feature available to compilers that support
C++20 P0848R3.
Fixes #1855.
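As a minimal sketch of why the trait matters: std::memcpy() is only well defined for trivially copyable types, which can be checked at compile time. `Fixed2x2` and `copy_bytes` below are hypothetical stand-ins, not Eigen types or functions.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a fixed-size matrix with plain array storage.
struct Fixed2x2 {
  float data[4];
};

// Refuse to compile if the type could not legally be copied with memcpy.
static_assert(std::is_trivially_copyable<Fixed2x2>::value,
              "memcpy is only well defined for trivially copyable types");

// Copies the object representation of `src` into `dst`.
inline void copy_bytes(Fixed2x2& dst, const Fixed2x2& src) {
  std::memcpy(&dst, &src, sizeof(Fixed2x2));
}
```

With a P0848R3-capable compiler, the same static_assert would be expected to hold for fixed-size Matrix and Array types as well.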
The following preprocessor macros are added:
- EIGEN_COMP_CPE and EIGEN_COMP_CLANGCPE: version number of the Cray compiler if
  Eigen is compiled with the Cray C++ compiler, 0 otherwise.
- EIGEN_COMP_FCC and EIGEN_COMP_CLANGFCC: version number of the FCC compiler if
  Eigen is compiled with the Fujitsu C++ compiler, 0 otherwise.
- EIGEN_COMP_CLANGICC: version number of the ICX compiler if Eigen is compiled
  with the Intel oneAPI C++ compiler, 0 otherwise.
All three compilers (Cray, Fujitsu, Intel) offer a traditional and a Clang-based
frontend. The CLANG prefix in the macro name marks the Clang-based one.
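A sketch of how client code might branch on these macros. The fallback definitions are an assumption added only so that the snippet compiles without Eigen; with Eigen included, the macros come from its headers. `compiler_family` and `is_clang_based_vendor_compiler` are hypothetical helper names.

```cpp
#include <string>

// Fallbacks for building this sketch without Eigen (assumption, not Eigen code).
#ifndef EIGEN_COMP_CPE
#define EIGEN_COMP_CPE 0
#endif
#ifndef EIGEN_COMP_CLANGCPE
#define EIGEN_COMP_CLANGCPE 0
#endif
#ifndef EIGEN_COMP_FCC
#define EIGEN_COMP_FCC 0
#endif
#ifndef EIGEN_COMP_CLANGFCC
#define EIGEN_COMP_CLANGFCC 0
#endif
#ifndef EIGEN_COMP_CLANGICC
#define EIGEN_COMP_CLANGICC 0
#endif

// Names the vendor toolchain, regardless of which frontend is in use.
inline std::string compiler_family() {
#if EIGEN_COMP_CPE || EIGEN_COMP_CLANGCPE
  return "cray";
#elif EIGEN_COMP_FCC || EIGEN_COMP_CLANGFCC
  return "fujitsu";
#elif EIGEN_COMP_CLANGICC
  return "intel-oneapi";
#else
  return "other";
#endif
}

// True only for the Clang-based frontends of the three vendors.
inline bool is_clang_based_vendor_compiler() {
  return (EIGEN_COMP_CLANGCPE != 0) || (EIGEN_COMP_CLANGFCC != 0) ||
         (EIGEN_COMP_CLANGICC != 0);
}
```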
Some header guards were repeated between the `AltiVec` and `ZVector`
packages. This could cause a problem if (for whatever reason) someone
attempts to include headers for both architectures.
1. Speed up exp(x) by reducing the polynomial approximant from degree 7 to
degree 6. With exactly representable coefficients computed by the Sollya tool,
this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for
arguments where exp(x) is a normalized float. This change results in a speedup
of about 4% for AVX2.
2. Extend the range where exp(x) returns a non-zero result from ~[-88;88] to
~[-104;88], i.e. return denormalized values for large negative arguments instead
of zero. Compared to exp<double>(x), the denormalized results gradually decrease
in accuracy down to 0.033 relative error for arguments around x = -104, where
exp(x) is ~std::numeric_limits<float>::denorm_min(). This is expected and acceptable.
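The two points above can be illustrated with a scalar sketch of the usual scheme: write exp(x) = 2^k * exp(r) with r = x - k*ln(2), approximate exp(r) with a degree-6 polynomial, and scale with std::ldexp() so that tiny results come out as denormals instead of zero. This is not Eigen's implementation: the coefficients below are plain Taylor terms 1/n!, not the Sollya-derived ones, and `exp_sketch` is a hypothetical name.

```cpp
#include <cmath>

// Scalar sketch: range reduction plus a degree-6 polynomial for exp(r).
inline float exp_sketch(float x) {
  const float kInvLn2 = 1.44269504f;     // 1 / ln(2)
  const float kLn2Hi = 0.693359375f;     // high part of ln(2), few mantissa bits
  const float kLn2Lo = -2.12194440e-4f;  // ln(2) - kLn2Hi

  // k = nearest integer to x / ln(2); r = x - k * ln(2), done in two steps
  // so that the subtraction of the high part is exact.
  const float k = std::nearbyint(x * kInvLn2);
  float r = x - k * kLn2Hi;
  r -= k * kLn2Lo;

  // Degree-6 polynomial in Horner form (Taylor coefficients as stand-ins).
  float p = 1.0f / 720.0f;
  p = p * r + 1.0f / 120.0f;
  p = p * r + 1.0f / 24.0f;
  p = p * r + 1.0f / 6.0f;
  p = p * r + 0.5f;
  p = p * r + 1.0f;
  p = p * r + 1.0f;

  // Scale by 2^k; ldexp produces denormals for very negative k.
  return std::ldexp(p, static_cast<int>(k));
}
```

For x around -100 the result is a denormalized float rather than zero, at the reduced accuracy described above.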
If EIGEN_DONT_VECTORIZE is defined, immintrin.h is not included even if F16C is available. Trying to use F16C intrinsics thus fails.
This fixes issue #2395.
This MR fixes a bunch of smaller issues, making the following changes:
* Template parameters in the documentation are documented with `\tparam` instead
of `\param`
* Superfluous semicolon warnings fixed
* Fixed the type of literals used to initialize float variables
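A small sketch of the first and last items, using a hypothetical function `half_of` that is not part of Eigen: the template parameter is documented with `\tparam`, and the float variable is initialized from a float literal rather than a double one.

```cpp
/// \brief Returns half of the given value.
/// \tparam Scalar arithmetic type of the argument and the result.
/// \param x value to scale.
template <typename Scalar>
Scalar half_of(Scalar x) {
  // `0.5f` is a float literal; a bare `0.5` would be a double that is
  // narrowed on initialization.
  const float factor = 0.5f;
  return x * static_cast<Scalar>(factor);
}
```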
Makes e.g. matrix multiplication 2x faster:
name        old cpu/op  new cpu/op  delta
BM_convers  181ms ± 1%  62ms ± 9%   -65.82%  (p=0.016 n=4+5)
Tested on all possible input values (not adding tests, since they
take a long time).
Activates vectorization of the Eigen::half versions of the tanh and
logistic functions when they run on Neon. Both functions convert their
inputs to float before computing the output, and as a result of this
commit, the conversions and the computation in float are vectorized.
We currently have plenty of type definitions with the alignment
qualifier coming after the type. The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];
Turn this into:
EIGEN_ALIGN16 int ai[4];
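A self-contained sketch of the corrected ordering. `SKETCH_ALIGN16` is a hypothetical stand-in defined here as `alignas(16)`; it is only an assumption about what an alignment macro of this kind expands to.

```cpp
#include <cstdint>

// Hypothetical stand-in for an alignment macro such as EIGEN_ALIGN16.
#define SKETCH_ALIGN16 alignas(16)

// Reports whether a pointer sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// The qualifier precedes the type, so it applies to the variable.
inline bool aligned_array_demo() {
  SKETCH_ALIGN16 int ai[4] = {0, 1, 2, 3};
  return is_aligned16(ai);
}
```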
Fixes compiler errors in expressions that look like
Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff()
The error comes from the code that creates the initial value for
vectorized reductions. The fix is to specify the scalar type of the
reduction's initial value.
The change is necessary for Eigen::half because, unlike other types,
Eigen::half scalars cannot be implicitly created from integers.
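The underlying issue can be sketched without Eigen. `HalfLike` is a hypothetical type whose constructors are explicit, mimicking the restriction described above, and `reduce_sum` is a hypothetical reduction; a sum is used instead of maxCoeff() only because its initial value is easy to state.

```cpp
// Hypothetical scalar that cannot be created implicitly from an integer.
struct HalfLike {
  float value;
  explicit HalfLike(int i) : value(static_cast<float>(i)) {}
  explicit HalfLike(float f) : value(f) {}
};

inline HalfLike operator+(HalfLike a, HalfLike b) {
  return HalfLike(a.value + b.value);
}

template <typename Scalar>
Scalar reduce_sum(const Scalar* data, int n) {
  // `Scalar acc = 0;` would not compile for HalfLike because the conversion
  // from int is explicit. Naming the scalar type works for every Scalar.
  Scalar acc = Scalar(0);
  for (int i = 0; i < n; ++i) {
    acc = acc + data[i];
  }
  return acc;
}
```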
To elide the memcpy, we need to first load the `src` value into
registers by making a local copy. This avoids the need to resort
to potential UB by using `reinterpret_cast`.
This change doesn't seem to affect CPU (at least not with gcc/clang).
With optimizations on, the copy is also elided.
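A minimal sketch of the pattern, assuming a generic bit-cast helper; `bit_cast_sketch` is a hypothetical name and not Eigen's function. The source value is first copied into a local, and std::memcpy() then reads from that local, so no `reinterpret_cast` between unrelated types is needed.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

template <typename Dst, typename Src>
Dst bit_cast_sketch(const Src& src) {
  static_assert(sizeof(Dst) == sizeof(Src), "sizes must match");
  static_assert(std::is_trivially_copyable<Dst>::value, "Dst must be trivially copyable");
  static_assert(std::is_trivially_copyable<Src>::value, "Src must be trivially copyable");

  // Local copy: lets the compiler keep the value in registers so that the
  // memcpy below can be elided.
  const Src staged = src;
  Dst dst;
  std::memcpy(&dst, &staged, sizeof(Dst));
  return dst;
}
```

For example, `bit_cast_sketch<std::uint32_t>(1.0f)` yields the IEEE 754 bit pattern of 1.0f on platforms with 32-bit IEEE floats.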