Compare commits

485 Commits

Author SHA1 Message Date
David Tellenbach
831133cc76 Bump to 3.4.0 2021-02-17 23:19:19 +01:00
David Tellenbach
5336ad8591 Define internal::make_unsigned for [unsigned]long long on macOS.
macOS defines int64_t as long long even for C++03 and therefore expects
a template specialization

  internal::make_unsigned<long long>,

for C++03. Since other platforms define int64_t as long for C++03 we
cannot add the specialization for all cases.
2021-02-17 23:03:10 +01:00
Antonio Sanchez
0845df7f77 Fix uninitialized warning on AVX. 2021-02-17 13:13:39 -08:00
Chip Kerchner
9b51dc7972 Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product 2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen
be0574e215 New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double. 2021-02-17 02:50:32 +00:00
Antonio Sanchez
7ff0b7a980 Updated pfrexp implementation.
The original implementation fails for 0, denormals, inf, and NaN.

See #2150
2021-02-17 02:23:24 +00:00
David Tellenbach
9ad4096ccb Document possible inconsistencies when using Matrix<bool, ...> 2021-02-17 00:50:26 +01:00
Ashutosh Sharma
f702792a7c missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel) 2021-02-16 16:33:59 +00:00
Jan van Dijk
db61b8d478 Avoid -Wunused warnings in NDEBUG builds.
In two places in SuperLUSupport.h, a local variable 'size' is
created that is used only inside an eigen_assert. Remove these,
just fetch the required values inside the assert statements.
This avoids annoying -Wunused warnings (and -Werror=unused errors)
in NDEBUG builds.
2021-02-12 18:35:35 +00:00
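The pattern behind db61b8d478 can be shown with plain `assert`, which, like `eigen_assert`, expands to nothing under `NDEBUG`. This is a minimal sketch only; `dot_checked` is a hypothetical function and not code from SuperLUSupport.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before (warns in NDEBUG builds, because 'size' is only read by the assert):
//   const std::size_t size = a.size();
//   assert(size == b.size());

// After: fetch the value inside the assert, so no local is left over when
// the assert is compiled out.
inline double dot_checked(const std::vector<double>& a,
                          const std::vector<double>& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}
```

The expression inside the assert must stay free of side effects, since it is not evaluated at all in `NDEBUG` builds.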
David Tellenbach
622c598944 Don't allow all test jobs to fail but only the currently failing ones. 2021-02-12 14:01:17 +01:00
Antonio Sanchez
90ee821c56 Use vrsqrts for rsqrt Newton iterations.
It's slightly faster and slightly more accurate, allowing our current
packetmath tests to pass for sqrt with a single iteration.
2021-02-11 11:33:51 -08:00
Antonio Sanchez
9fde9cce5d Adjust bounds for pexp_float/double
The original clamping bounds on `_x` actually produce finite values:
```
  exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38

  exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.

This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
2021-02-10 22:48:05 +00:00
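The observation in 9fde9cce5d can be checked against the standard library. The sketch below uses `std::exp` rather than Eigen's `pexp`, and the helper names are hypothetical. Overflow only starts past `log(max)`, which is roughly 88.72 for `float` and 709.78 for `double`.

```cpp
#include <cmath>

// The old clamp limits quoted in the commit message still give finite values.
inline bool old_float_bound_is_finite() {
  return std::isfinite(std::exp(88.3762626647950f));
}
inline bool old_double_bound_is_finite() {
  return std::isfinite(std::exp(709.437));
}

// Inputs just past log(max) overflow to +inf, which is what a clamp placed
// slightly outside the finite range relies on.
inline bool past_float_limit_is_inf() { return std::isinf(std::exp(89.0f)); }
inline bool past_double_limit_is_inf() { return std::isinf(std::exp(710.0)); }
```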
Antonio Sanchez
4cb563a01e Fix ldexp implementations.
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits.  See #2131 for a complete discussion,
and !375 for other possible implementations.

Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.

The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.

Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.

Fixes #2131.
2021-02-10 22:45:41 +00:00
Ashutosh Sharma
7eb07da538 loop less ptranspose 2021-02-10 10:21:37 -08:00
David Tellenbach
36200b7855 Remove vim-specific comments to recognize the correct file type.
As discussed in #2143 we remove editor specific comments.
2021-02-09 09:13:09 +01:00
David Tellenbach
54589635ad Replace nullptr by NULL in SparseLU.h to be C++03 compliant. 2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas
984d010b7b add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves 2021-02-08 22:00:31 +00:00
Nikolaus Demmel
b578930657 Fix documentation typos in LDLT.h 2021-02-08 21:07:29 +00:00
Antonio Sanchez
66841ea070 Enable bdcsvd on host.
Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation
is skipped, leading to a linker error.  This prevents it from running on
the host as well.

Seems it was disabled 6 years ago (5384e891) to match `jacobiSvd`, but
`jacobiSvd` is now enabled on host.  Tested and runs fine on host, but
will not compile/run for device (though it's not labelled as a device
function, so this should be fine).

Fixes #2139
2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen
6e3b795f81 Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one. 2021-02-05 16:58:49 -08:00
Antonio Sanchez
abcde69a79 Disable vectorized pow for half/bfloat16.
We are potentially seeing some accuracy issues with these.  Ideally we
would hand off to `float`, but that's not trivial with the current
setup.

We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
2021-02-05 12:17:34 -08:00
Antonio Sanchez
f85038b7f3 Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.

By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).

This is a replacement of !379.  See there for further discussion.

Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.

Fixes #2138.
2021-02-03 09:01:48 -08:00
Antonio Sanchez
56c8b14d87 Eliminate implicit conversions from float to double. 2021-02-01 15:31:01 -08:00
Antonio Sanchez
fb4548e27b Implement bit_* for device.
Unfortunately `std::bit_and` and the like are host-only functions prior
to C++14 (since they are not `constexpr`).  They also never exist in the
global namespace, so the current implementation always fails to compile via
NVCC, since `EIGEN_USING_STD` tries to import the symbol from the global
namespace on device.

To overcome these limitations, we implement these functionals here.
2021-02-01 13:27:45 -08:00
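A hand-written functor of the kind fb4548e27b describes might look like the following. This is a sketch under assumptions: `SKETCH_DEVICE_FUNC` is a hypothetical stand-in for `EIGEN_DEVICE_FUNC`, and the struct names are not Eigen's.

```cpp
// Expands to __host__ __device__ only when compiled as CUDA, else to nothing.
#if defined(__CUDACC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// Replacements for std::bit_and / std::bit_or / std::bit_xor that do not
// depend on the standard library being callable from device code.
template <typename T>
struct bit_and_sketch {
  SKETCH_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a & b; }
};

template <typename T>
struct bit_or_sketch {
  SKETCH_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a | b; }
};

template <typename T>
struct bit_xor_sketch {
  SKETCH_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a ^ b; }
};
```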
Antonio Sanchez
1615a27993 Fix altivec packetmath.
Allows the altivec packetmath tests to pass.  There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.

Related to !370, #2128

cc: @ChipKerchner @pdrocaldeira

Tested on `_BIG_ENDIAN` running on QEMU with VSX.  Couldn't figure out build
flags to get it to work for little endian.
2021-01-28 18:37:09 +00:00
Chip Kerchner
1414e2212c Fix clang compilation for AltiVec from previous check-in 2021-01-28 18:36:40 +00:00
David Tellenbach
170a504c2f Add the following functions:
  DenseBase::setConstant(NoChange_t, Index, const Scalar&)
  DenseBase::setConstant(Index, NoChange_t, const Scalar&)

to close #663.
2021-01-28 15:13:07 +01:00
David Tellenbach
598e1b6e54 Add the following functions:
  DenseBase::setZero(NoChange_t, Index)
  DenseBase::setZero(Index, NoChange_t)
  DenseBase::setOnes(NoChange_t, Index)
  DenseBase::setOnes(Index, NoChange_t)
  DenseBase::setRandom(NoChange_t, Index)
  DenseBase::setRandom(Index, NoChange_t)

This closes #663.
2021-01-28 01:10:36 +01:00
Gael Guennebaud
0668c68b03 Allow for negative strides.
Note that using a stride of -1 is still not possible because it would
clash with the definition of Eigen::Dynamic.

This fixes #747.
2021-01-27 23:32:12 +01:00
Samir Benmendil
288d456c29 Replace language_support module with builtin CheckLanguage
The workaround_9220 function was introduced a long time ago to
work around a CMake issue with enable_language(OPTIONAL). Since then
CMake has clarified that the OPTIONAL keyword has not been
implemented[0].

A CheckLanguage module is now provided with CMake to check if a language
can be enabled. Use that instead.

[0] https://cmake.org/cmake/help/v3.18/command/enable_language.html
2021-01-27 13:26:40 +00:00
Antonio Sanchez
3f4684f87d Include <cstdint> in one place, remove custom typedefs
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.

To avoid this, we remove the custom `typedef` definitions for win32.  The
Tensor module requires C++11 anyways, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.

Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
2021-01-26 14:23:05 -08:00
Chip Kerchner
0784d9f87b Fix sqrt, ldexp and frexp compilation errors. 2021-01-25 15:22:19 -06:00
Gmc2
a4edb1079c fix test of ExtractVolumePatchesOp 2021-01-25 03:23:46 +00:00
Antonio Sanchez
4c42d5ee41 Eliminate implicit conversion warning in test/array_cwise.cpp 2021-01-23 11:54:00 -08:00
Antonio Sanchez
e0d13ead90 Replace std::isnan with numext::isnan for c++03 2021-01-23 11:02:35 -08:00
Florian Maurin
c35965b381 Remove unused variable in SparseLU.h 2021-01-22 22:24:11 +00:00
Antonio Sanchez
f0e46ed5d4 Fix pow and other cwise ops for half/bfloat16.
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
in `GenericPacketMathFunctions` to remove `constexpr`.

While adding tests for half/bfloat16, found other issues related to
implicit conversions.

Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types.  These seem to be  implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Antonio Sanchez
f19bcffee6 Specialize std::complex operators for use on GPU device.
NVCC and older versions of clang do not fully support `std::complex` on device,
leading to either compile errors (Cannot call `__host__` function) or worse,
runtime errors (Illegal instruction).  For most functions, we can
implement specialized `numext` versions. Here we specialize the standard
operators (with the exception of stream operators and member function operators
with a scalar that are already specialized in `<complex>`) so they can be used
in device code as well.

To import these operators into the current scope, use
`EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into
the `Eigen`, `Eigen::internal`, and `Eigen::numext` namespaces.

This allows us to remove specializations of the
sum/difference/product/quotient ops, and allows us to treat complex
numbers like most other scalars (e.g. in tests).
2021-01-22 18:19:19 +00:00
David Tellenbach
65e2169c45 Add support for Arm SVE
This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently *sizeless*. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is *length agnostic*).

During compilation the flag `-msve-vector-bits=N` has to be set, where `N` is a power of two in the range of `128` to `2048`, indicating the length of an SVE vector.

Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`.

This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements.

This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.
2021-01-21 21:11:57 +00:00
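The packet-size arithmetic from 65e2169c45 (for example `512/32 = 16` for `PacketXf`) can be written as a compile-time helper. `sve_lanes` is a hypothetical name used only for this sketch and is not part of Eigen.

```cpp
#include <cstdint>

// Number of Scalar lanes in an SVE vector fixed to 'vector_bits' bits,
// i.e. the value passed as -msve-vector-bits=N.
template <typename Scalar>
constexpr int sve_lanes(int vector_bits) {
  return vector_bits / static_cast<int>(8 * sizeof(Scalar));
}

static_assert(sve_lanes<float>(512) == 16, "512-bit vector holds 16 floats");
static_assert(sve_lanes<std::int32_t>(128) == 4, "128-bit vector holds 4 int32_t");
```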
Antonio Sanchez
b2126fd6b5 Fix pfrexp/pldexp for half.
The recent addition of vectorized pow (!330) relies on `pfrexp` and
`pldexp`.  This was missing for `Eigen::half` and `Eigen::bfloat16`.
Adding tests for these packet ops also exposed an issue with handling
negative values in `pfrexp`, returning an incorrect exponent.

Added the missing implementations, corrected the exponent in `pfrexp1`,
and added `packetmath` tests.
2021-01-21 19:32:28 +00:00
Antonio Sanchez
25d8498f8b Fix stable_norm_1 test.
Test enters an infinite loop if size is 1x1 when choosing to select
unique indices for adding `inf` and `NaN` to the input. Here we
revert to non-unique indices, and split the `hypotNorm` check into
two cases: one where both `inf` and `NaN` are added, and one where
only `NaN` is added.
2021-01-21 09:44:42 -08:00
David Tellenbach
660c6b857c Remove std::cerr in iterative solver since we don't have iostream.
This fixes #2123
2021-01-21 11:40:05 +01:00
Antonio Sanchez
d5b7981119 Fix signed-unsigned comparison.
Hex literals that do not fit in `int` are interpreted as unsigned, leading to a
comparison between the signed maximum supported function `abcd[0]` (which was
negative) and the unsigned literal `0x80000006`.  This should not change the
result, since the signed value is implicitly converted to unsigned for the
comparison, but it eliminates the warning.
2021-01-20 08:34:00 -08:00
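The conversion rule behind d5b7981119 can be sketched as follows, assuming a 32-bit `int`. The function name and the direction of the comparison are assumptions for illustration; the commit message does not show the actual expression.

```cpp
#include <type_traits>

// With a 32-bit int, 0x80000006 does not fit in int, so the literal is unsigned.
static_assert(sizeof(int) != 4 ||
                  std::is_same<decltype(0x80000006), unsigned int>::value,
              "hex literal that overflows int is unsigned");

// Original shape (warns under -Wsign-compare):
//   if (max_func >= 0x80000006) ...
// Same result without the warning: make the conversion explicit.
inline bool reaches_leaf_0x80000006(int max_func) {
  return static_cast<unsigned int>(max_func) >= 0x80000006u;
}
```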
Ivan Popivanov
e409795d6b Proper CPUID 2021-01-18 17:10:11 +00:00
Rasmus Munk Larsen
cdd8fdc32e Vectorize pow(x, y). This closes https://gitlab.com/libeigen/eigen/-/issues/2085, which also contains a description of the algorithm.
I ran some tests (comparing to `std::pow(double(x), double(y))`) for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]`, and `y` in `{2, sqrt(2), -sqrt(2)}`, and got the following error statistics:

```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```

If I widen the range to all normal floats, I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:

```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```

which seems reasonable, since these results are subnormals with only a couple of significant bits left.
2021-01-18 13:25:16 +00:00
Antonio Sanchez
bde6741641 Improved std::complex sqrt and rsqrt.
Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously
`complex_sqrt` was only used for CUDA and MSVC), and implements
custom `complex_rsqrt`.

Also introduces `numext::rsqrt` to simplify implementation, and modified
`numext::hypot` to adhere to IEEE/IEC 60559 for special cases.

The `complex_sqrt` and `complex_rsqrt` implementations were found to be
significantly faster than `std::sqrt<std::complex<T>>` and
`1/numext::sqrt<std::complex<T>>`.

Benchmark file attached.
```
GCC 10, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           9.21 ns         9.21 ns     73225448
BM_StdSqrt<std::complex<float>>        17.1 ns         17.1 ns     40966545
BM_Sqrt<std::complex<double>>          8.53 ns         8.53 ns     81111062
BM_StdSqrt<std::complex<double>>       21.5 ns         21.5 ns     32757248
BM_Rsqrt<std::complex<float>>          10.3 ns         10.3 ns     68047474
BM_DivSqrt<std::complex<float>>        16.3 ns         16.3 ns     42770127
BM_Rsqrt<std::complex<double>>         11.3 ns         11.3 ns     61322028
BM_DivSqrt<std::complex<double>>       16.5 ns         16.5 ns     42200711

Clang 11, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           7.46 ns         7.45 ns     90742042
BM_StdSqrt<std::complex<float>>        16.6 ns         16.6 ns     42369878
BM_Sqrt<std::complex<double>>          8.49 ns         8.49 ns     81629030
BM_StdSqrt<std::complex<double>>       21.8 ns         21.7 ns     31809588
BM_Rsqrt<std::complex<float>>          8.39 ns         8.39 ns     82933666
BM_DivSqrt<std::complex<float>>        14.4 ns         14.4 ns     48638676
BM_Rsqrt<std::complex<double>>         9.83 ns         9.82 ns     70068956
BM_DivSqrt<std::complex<double>>       15.7 ns         15.7 ns     44487798

Clang 9, Pixel 2, aarch64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           24.2 ns         24.1 ns     28616031
BM_StdSqrt<std::complex<float>>         104 ns          103 ns      6826926
BM_Sqrt<std::complex<double>>          31.8 ns         31.8 ns     22157591
BM_StdSqrt<std::complex<double>>        128 ns          128 ns      5437375
BM_Rsqrt<std::complex<float>>          31.9 ns         31.8 ns     22384383
BM_DivSqrt<std::complex<float>>        99.2 ns         98.9 ns      7250438
BM_Rsqrt<std::complex<double>>         46.0 ns         45.8 ns     15338689
BM_DivSqrt<std::complex<double>>        119 ns          119 ns      5898944
```
2021-01-17 08:50:57 -08:00
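For readers unfamiliar with how a complex square root can be computed from real operations only, here is one textbook formulation. It is a sketch, not necessarily the algorithm used by `complex_sqrt` in bde6741641, and it deliberately omits the `inf`/`NaN` special cases that the commit addresses. The name `complex_sqrt_sketch` is hypothetical.

```cpp
#include <cmath>
#include <complex>

// Principal square root of z = x + iy using one real sqrt and one hypot.
template <typename T>
std::complex<T> complex_sqrt_sketch(const std::complex<T>& z) {
  const T x = z.real();
  const T y = z.imag();
  if (x == T(0) && y == T(0)) return std::complex<T>(T(0), y);
  // w = sqrt((|z| + |x|) / 2) is the larger of the two result components.
  const T w = std::sqrt((std::hypot(x, y) + std::abs(x)) / T(2));
  if (x >= T(0)) return std::complex<T>(w, y / (T(2) * w));
  // For x < 0 the roles swap, and the imaginary part takes the sign of y.
  return std::complex<T>(std::abs(y) / (T(2) * w), std::copysign(w, y));
}
```

Computing the larger component first and deriving the other by division avoids the cancellation that a direct `sqrt((|z| - |x|) / 2)` would suffer when `|y|` is small.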
Maozhou, Ge
21a8a2487c fix paddings of TensorVolumePatchOp 2021-01-15 11:51:49 +08:00
Guoqiang QI
38ae5353ab 1) Provide a better generic paddsub op implementation
2) Make the paddsub op support Packet2cf/Packet4f/Packet2f in NEON
3) Make the paddsub op support Packet2cf/Packet4f in SSE
2021-01-13 22:54:03 +00:00
Antonio Sanchez
352f1422d3 Remove inf local variable.
Apparently `inf` is a macro on iOS for `std::numeric_limits<T>::infinity()`,
causing a compile error here. We don't need the local anyways since it's
only used in one spot.
2021-01-12 10:33:15 -08:00
Antonio Sanchez
2044084979 Remove TODO from Transform::computeScaleRotation()
Upon investigation, `JacobiSVD` is significantly faster than `BDCSVD`
for small matrices (twice as fast for 2x2, 20% faster for 3x3,
1% faster for 10x10).  Since the majority of cases will be small,
let's stick with `JacobiSVD`.  See !361.
2021-01-11 11:30:01 -08:00
Antonio Sanchez
3daf92c7a5 Transform::computeScalingRotation flush determinant to +/- 1.
In the previous code, in attempting to correct for a negative
determinant, we end up multiplying and dividing by a number that
is often very near, but not exactly +/-1.  By flushing to +/-1,
we can replace a division with a multiplication, and results
are more numerically consistent.
2021-01-11 10:13:38 -08:00
Antonio Sanchez
587fd6ab70 Only specialize complex sqrt_impl for CUDA if not MSVC.
We already specialize `sqrt_impl` on windows due to MSVC's mishandling
of `inf` (!355).
2021-01-11 09:15:45 -08:00
Deven Desai
2a6addb4f9 Fix for breakage in ROCm support - 210108
The following commit breaks ROCm support for Eigen
f149e0ebc3

All unit tests fail with the following error

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:19:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:166:
/home/rocm-user/eigen/Eigen/src/Core/MathFunctionsImpl.h:105:35: error: __host__ __device__ function 'complex_sqrt' cannot overload __host__ function 'complex_sqrt'
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
                                  ^
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:342:38: note: previous declaration is here
template<typename T> std::complex<T> complex_sqrt(const std::complex<T>& a_x);
                                     ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16625: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2

```

The error message is accurate, and the fix (provided in this commit) is trivial.
2021-01-08 18:04:40 +00:00
Antonio Sanchez
f149e0ebc3 Fix MSVC complex sqrt and packetmath test.
MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`.
Here we replace it with a custom version (currently used on GPU).

Also fixed the `packetmath` test, which previously skipped several
corner cases since `CHECK_CWISE1` only tests the first `PacketSize`
elements.
2021-01-08 01:17:19 +00:00
Antonio Sanchez
8d9cfba799 Fix rand test for MSVC.
MSVC's uniform random number generator is not quite as uniform as
others, requiring a slightly wider threshold on the histogram test.
After inspecting histograms for several runs, there's no obvious
bias -- just some bins end up having slightly more or fewer elements
(often > 2% but less than 2.5%).
2021-01-07 12:48:40 -08:00
Essex Edwards
e741b43668 Make Transform::computeRotationScaling(0,&S) continuous 2021-01-07 17:45:14 +00:00
David Tellenbach
0bdc0dba20 Add missing #endif directive in Macros.h 2021-01-07 12:32:41 +01:00
shrek1402
cb654b1c45 #define was defined incorrectly because the result_of function was deprecated in c++17 and removed in c++20. Also, EIGEN_COMP_MSVC (which is _MSC_VER) only affects result_of indirectly, which can cause errors. 2021-01-07 10:12:25 +00:00
Antonio Sanchez
52d1dd979a Fix Ref initialization.
Since `eigen_assert` is a macro, the statements can become noops (e.g.
when compiling for GPU), so they may not execute the contained logic -- which
in this case is the entire `Ref` construction.  We need to separate the assert
from statements which have consequences.

Fixes #2113
2021-01-06 13:14:20 -08:00
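The hazard described in 52d1dd979a is general to any assert macro that can compile to nothing. The sketch below models the disabled state with a hypothetical `SKETCH_ASSERT`; `RefSketch` and both functions are illustrative names, not Eigen code.

```cpp
// Hypothetical assert macro in its disabled state: the argument is never evaluated.
#define SKETCH_ASSERT(cond) ((void)0)

struct RefSketch {
  bool constructed;
  RefSketch() : constructed(false) {}
  bool construct() { constructed = true; return true; }
};

// Problematic: the work lives inside the assert and vanishes with it.
inline bool init_inside_assert() {
  RefSketch r;
  SKETCH_ASSERT(r.construct());
  return r.constructed;
}

// Fixed: do the work unconditionally, then assert on the result.
inline bool init_then_assert() {
  RefSketch r;
  const bool ok = r.construct();
  SKETCH_ASSERT(ok);
  (void)ok;  // keeps -Wunused quiet when the assert is compiled out
  return r.constructed;
}
```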
Antonio Sanchez
166fcdecdb Allow CwiseUnaryView to be used on device.
Added `EIGEN_DEVICE_FUNC` to methods.
2021-01-06 09:16:52 -08:00
Antonio Sanchez
bb1de9dbde Fix Ref Stride checks.
The existing `Ref` class failed to consider cases where the Ref's
`Stride` setting *could* match the underlying referred object's stride,
but **didn't** at runtime.  This led to trying to set invalid stride values,
causing runtime failures in some cases, and garbage due to mismatched
strides in others.

Here we add the missing runtime checks.  This involves computing the
strides necessary to align with the referred object's storage, and
verifying we can actually set those strides at runtime.

In the `const` case, if it *may* be possible to refer to the original
storage at compile-time but fails at runtime, then we defer to the
`construct(...)` method that makes a copy.

Added more tests to check these cases.

Fixes #2093.
2021-01-05 10:41:25 -08:00
Christoph Hertzberg
12dda34b15 Eliminate boolean product warnings by factoring out a
`combine_scalar_factors` helper function.
2021-01-05 18:15:30 +00:00
Antonio Sanchez
070d303d56 Add CUDA complex sqrt.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.

Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.

Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
2020-12-22 23:25:23 -08:00
rgreenblatt
fdf2ee62c5 Fix missing EIGEN_DEVICE_FUNC 2020-12-20 23:22:53 -05:00
Rasmus Munk Larsen
05754100fe * Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup.
* Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA.
* Remove iterative psqrt<double> for NEON, because the initial rsqrt approximation is not accurate enough for convergence in 2 Newton-Raphson steps and with 3 steps, just calling the builtin sqrt insn is faster.

The following benchmarks were compiled with clang "-O2 -ffast-math -mfma" and with and without -mavx.

AVX+FMA (float)

name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 1%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.08ns ± 1%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  95.5ns ± 0%    ~
BM_eigen_sqrt_float/4k     776ns ± 0%   763ns ± 0%  -1.67%
BM_eigen_sqrt_float/32k   6.57µs ± 1%  6.13µs ± 0%  -6.69%
BM_eigen_sqrt_float/256k  83.7µs ± 3%  83.3µs ± 2%    ~
BM_eigen_sqrt_float/1M     335µs ± 2%   332µs ± 2%    ~

SSE+FMA (float)
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 0%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.06ns ± 0%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  96.3ns ± 4%    ~
BM_eigen_sqrt_float/4k     774ns ± 0%   763ns ± 0%  -1.50%
BM_eigen_sqrt_float/32k   6.58µs ± 2%  6.11µs ± 0%  -7.06%
BM_eigen_sqrt_float/256k  82.7µs ± 1%  82.6µs ± 1%    ~
BM_eigen_sqrt_float/1M     330µs ± 1%   329µs ± 2%    ~

SSE+FMA (double)
name                        old cpu/op  new cpu/op  delta
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.08ns ± 0%   -6.68%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.65%
BM_eigen_sqrt_double/512     417ns ± 0%   374ns ± 1%  -10.29%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -11.00%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.7µs ± 0%  -11.07%
BM_eigen_sqrt_double/256k    213µs ± 0%   206µs ± 1%   -3.31%
BM_eigen_sqrt_double/1M      862µs ± 0%   870µs ± 2%   +0.96%

AVX+FMA (double)
name                        old cpu/op  new cpu/op  delta
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.06ns ± 0%   -6.95%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.80%
BM_eigen_sqrt_double/512     417ns ± 0%   373ns ± 1%  -10.59%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -10.79%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.8µs ± 0%  -10.94%
BM_eigen_sqrt_double/256k    214µs ± 0%   208µs ± 2%   -2.76%
BM_eigen_sqrt_double/1M      866µs ± 3%   923µs ± 7%     ~
2020-12-16 18:16:11 +00:00
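The iterative scheme in 05754100fe refines an estimate of `1/sqrt(x)` with Newton-Raphson steps written as fused multiply-adds, then multiplies by `x`. Below is a scalar sketch for positive, normal `float` inputs. The bit-trick initial guess is an assumption standing in for the hardware rsqrt estimate, and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Rough 1/sqrt(x) estimate (a few percent error), standing in for a
// hardware rsqrt instruction.
inline float rsqrt_estimate(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits = 0x5f3759dfu - (bits >> 1);
  float y;
  std::memcpy(&y, &bits, sizeof(y));
  return y;
}

// sqrt(x) = x * y with y ~ 1/sqrt(x), refined by y <- y + (y/2) * (1 - x*y*y).
inline float sqrt_newton(float x, int steps) {
  float y = rsqrt_estimate(x);
  for (int i = 0; i < steps; ++i) {
    const float residual = std::fma(-x * y, y, 1.0f);  // 1 - x*y*y
    y = std::fma(0.5f * y, residual, y);               // Newton-Raphson update
  }
  return x * y;
}
```

Each step roughly squares the relative error, so the number of steps needed depends on how accurate the initial estimate is. That is the trade-off the NEON bullet above refers to.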
Turing Eret
3bee9422d6 Merge branch 'lambdaknight/eigen-master' 2020-12-16 09:18:24 -07:00
Turing Eret
19e6496ce0 Replace call to FixedDimensions() with a singleton instance of
FixedDimensions.
2020-12-16 07:34:44 -07:00
Rasmus Munk Larsen
6cee8d347e Add an additional step of Newton-Raphson for psqrt<double> on Arm, which otherwise has an error of ~1000 ulps. 2020-12-15 04:06:41 +00:00
Turing Eret
bc7d1599fb TensorStorage with FixedDimensions now has zero instance memory overhead.
Removed m_dimension as instance member of TensorStorage with
FixedDimensions and instead use the template parameter. This
means that the sizeof a pure fixed-size storage is exactly
equal to the data it is storing.
2020-12-14 07:19:34 -07:00
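The idea in bc7d1599fb, keeping fixed dimensions in the type rather than in each object, can be sketched without the Tensor module. All names below are hypothetical and are not the Tensor module's `TensorStorage` or its dimension types.

```cpp
#include <cstddef>

// Dimensions are template parameters, so they occupy no memory per instance.
template <std::size_t... Dims>
struct FixedDimsSketch;

template <>
struct FixedDimsSketch<> {
  static constexpr std::size_t total() { return 1; }
};

template <std::size_t First, std::size_t... Rest>
struct FixedDimsSketch<First, Rest...> {
  static constexpr std::size_t total() {
    return First * FixedDimsSketch<Rest...>::total();
  }
};

// Storage holds only the data; the dimensions are recovered from the type.
template <typename T, typename Dims>
struct FixedStorageSketch {
  T data[Dims::total()];
  static constexpr std::size_t size() { return Dims::total(); }
};

static_assert(sizeof(FixedStorageSketch<float, FixedDimsSketch<2, 3> >) ==
                  6 * sizeof(float),
              "no per-instance overhead beyond the data");
```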
Alexander Grund
cf0b5b0344 Remove code checking for CMake < 3.5
As the CMake version is at least 3.5, the code checking for earlier versions can be removed.
2020-12-14 09:57:44 +00:00
David Tellenbach
751f18f2c0 Remove comma at the end of enumeration list to silence C++03 warnings 2020-12-13 18:11:02 +01:00
Antonio Sanchez
5dc2fbabee Fix implicit cast to double.
Triggers `-Wimplicit-float-conversion`, causing a bunch of build errors
in Google due to `-Wall`.
2020-12-12 09:26:20 -08:00
Antonio Sanchez
55967f87d1 Fix NEON pmax<PropagateNumbers,Packet4bf>.
Simple typo, the max impl called pmin instead of pmax for floats.
2020-12-11 21:50:52 -08:00
Antonio Sanchez
839aa505c3 Fix typo in AVX512 packet math. 2020-12-11 21:35:44 -08:00
David Tellenbach
536c8a79f2 Remove unused macro in Half.h 2020-12-12 00:53:26 +01:00
Antonio Sanchez
8c9976d7f0 Fix more SSE/AVX packet conversions for peven.
MSVC doesn't like function-style casts and forces us to use intrinsics.
2020-12-11 15:46:42 -08:00
Antonio Sanchez
c6efc4e0ba Replace M_LOG2E and M_LN2 with custom macros.
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included.  However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).

To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
2020-12-11 14:34:31 -08:00
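Library-owned constants of the kind c6efc4e0ba introduces do not depend on `_USE_MATH_DEFINES` or on include order. The macro names below are hypothetical stand-ins for `EIGEN_LOG2E` and `EIGEN_LN2`; the digits are the standard values of log2(e) and ln(2).

```cpp
#include <cmath>

// Defined by the library itself, so they exist whether or not <cmath>
// exposed M_LOG2E / M_LN2 before this header was included.
#define SKETCH_LOG2E 1.442695040888963407359924681001892137  // log2(e)
#define SKETCH_LN2   0.693147180559945309417232121458176568  // ln(2)

// Example use: base-2 logarithm expressed through the natural logarithm.
inline double log2_via_ln(double x) { return std::log(x) * SKETCH_LOG2E; }
```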
Antonio Sanchez
e82722a4a7 Fix MSVC SSE casts.
MSVC doesn't like __m128(__m128i) c-style casts, so packets need to be
converted using intrinsic methods.
2020-12-11 08:52:59 -08:00
Deven Desai
f3d2ea48f5 Fix for broken ROCm/HIP Support
The following commit introduced a breakage in ROCm/HIP support for Eigen.

5ec4907434 (1958e65719641efe5483abc4ce0b61806270f6f3_525_517)

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:222:
/home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'?
  return half2half2(from);
         ^~~~~~~~~~
         __half2half2
/opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here
            __half2 __half2half2(__half x)
                    ^
1 error generated when compiling for gfx900.

```

The cause seems to be a copy-paste error, and the fix is trivial.
2020-12-11 16:14:57 +00:00
David Tellenbach
c7eb3a74cb Don't guard psqrt for std::complex<float> with EIGEN_ARCH_ARM64 2020-12-11 12:41:52 +01:00
Everton Constantino
bccf055a7c Add Armv8 guard on PropagateNumbers implementation. 2020-12-10 22:01:55 -03:00
Antonio Sanchez
82c0c18a83 Remove private access of std::deque::_M_impl.
This no longer works on gcc or clang, so we should just remove the hack.
The default should compile to similar code anyways.
2020-12-10 14:59:34 -08:00
David Tellenbach
00be0a7ff3 Fix vectorization of complex sqrt on NEON 2020-12-10 15:23:23 +00:00
David Tellenbach
8eb461a431 Remove comma at end of enumerator list in NEON PacketMath 2020-12-10 15:22:55 +01:00
David Tellenbach
2e8f850c78 Fix a typo in SparseMatrix documentation.
This fixes issue #2091.
2020-12-09 14:48:24 +01:00
Rasmus Munk Larsen
125cc9a5df Implement vectorized complex square root.
Closes #1905

Measured speedup for sqrt of `complex<float>` on Skylake:

SSE:
```
name                      old time/op             new time/op  delta
BM_eigen_sqrt_ctype/1     49.4ns ± 0%             54.3ns ± 0%  +10.01%
BM_eigen_sqrt_ctype/8      332ns ± 0%               50ns ± 1%  -84.97%
BM_eigen_sqrt_ctype/64    2.81µs ± 1%             0.38µs ± 0%  -86.49%
BM_eigen_sqrt_ctype/512   23.8µs ± 0%              3.0µs ± 0%  -87.32%
BM_eigen_sqrt_ctype/4k     202µs ± 0%               24µs ± 2%  -88.03%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%             0.19ms ± 0%  -88.18%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%              1.5ms ± 1%  -88.20%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%              6.2ms ± 0%  -88.18%
```

AVX2:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.6ns ± 0%  55.6ns ± 0%   +3.71%
BM_eigen_sqrt_ctype/8      334ns ± 0%    27ns ± 0%  -91.86%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.22µs ± 2%  -92.28%
BM_eigen_sqrt_ctype/512   23.8µs ± 1%   1.7µs ± 1%  -92.81%
BM_eigen_sqrt_ctype/4k     201µs ± 0%    14µs ± 1%  -93.24%
BM_eigen_sqrt_ctype/32k   1.62ms ± 0%  0.11ms ± 1%  -93.29%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.9ms ± 1%  -93.31%
BM_eigen_sqrt_ctype/1M    52.0ms ± 0%   3.5ms ± 1%  -93.31%
```

AVX512:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.7ns ± 0%  56.2ns ± 1%   +4.75%
BM_eigen_sqrt_ctype/8      334ns ± 0%    18ns ± 2%  -94.63%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.12µs ± 1%  -95.54%
BM_eigen_sqrt_ctype/512   23.9µs ± 1%   1.0µs ± 1%  -95.89%
BM_eigen_sqrt_ctype/4k     202µs ± 0%     8µs ± 1%  -96.13%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%  0.06ms ± 1%  -96.15%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.5ms ± 4%  -96.11%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%   2.0ms ± 1%  -96.13%
```
2020-12-08 18:13:35 -08:00
Antonio Sanchez
8cfe0db108 Fix host/device calls for __half.
The previous code had `__host__ __device__` functions calling `__device__`
functions (e.g. `__low2half`) which caused build failures in tensorflow.
Also tried to simplify the `#ifdef` guards to make them more clear.
2020-12-08 20:31:02 +00:00
Everton Constantino
baf9d762b7 - Enabling PropagateNaN and PropagateNumbers for NEON.
- Adding propagate tests to bfloat16.
2020-12-08 17:05:05 +00:00
Antonio Sanchez
634bd79b0e Fix unused warning on new dense_assignment_loop impl. 2020-12-07 19:14:21 -08:00
Antonio Sanchez
655c3a4042 Add specialization for compile-time zero-sized dense assignment.
In the current `dense_assignment_loop` implementations, if the
destination's inner or outer size is zero at compile time and if the kernel
involves a product, we currently get a compile error (#2080).  This is
triggered by attempting to multiply a non-existent row by a column (or
vice-versa).

To address this, we add a specialization for zero-sized assignments
(`AllAtOnceTraversal`) which evaluates to a no-op. We also add a static
check to ensure the size is in-fact zero. This now seems to be the only
existing use of `AllAtOnceTraversal`.

Fixes #2080.
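
The idea of a zero-sized specialization that evaluates to a no-op can be sketched in plain C++ as follows. This is only an illustration under simplifying assumptions, not Eigen's `dense_assignment_loop`; the names `assign_loop_sketch` and `assign_sketch` are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a dense assignment kernel: copies N elements.
template <typename T, std::size_t N>
struct assign_loop_sketch {
  static void run(std::array<T, N>& dst, const std::array<T, N>& src) {
    for (std::size_t i = 0; i < N; ++i) dst[i] = src[i];
  }
};

// Partial specialization for a compile-time size of zero: a no-op, so no
// per-element code is generated for the empty case.
template <typename T>
struct assign_loop_sketch<T, 0> {
  static void run(std::array<T, 0>&, const std::array<T, 0>&) {}
};

// Dispatches to the generic loop or to the zero-sized no-op at compile time.
template <typename T, std::size_t N>
void assign_sketch(std::array<T, N>& dst, const std::array<T, N>& src) {
  assign_loop_sketch<T, N>::run(dst, src);
}
```

Selecting the empty case through a specialization means the generic loop body is never instantiated for size zero, which is the property the commit relies on.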
2020-12-07 08:38:43 -08:00
Antonio Sanchez
5ec4907434 Clean up #ifs in GPU PacketMath.
Removed redundant checks and redundant code for CUDA/HIP.

Note: there are several issues here of calling `__device__` functions
from `__host__ __device__` functions, in particular `__low2half`.
We do not address that here -- only modifying this file enough
to get our current tests to compile.

Fixed: #1847
2020-12-04 16:14:03 -08:00
Rasmus Munk Larsen
f9fac1d5b0 Add log2() to Eigen. 2020-12-04 21:45:09 +00:00
Antonio Sanchez
2dbac2f99f Fix bad NEON fp16 check 2020-12-04 13:42:18 -08:00
Antonio Sanchez
e2f21465fe Special function implementations for half/bfloat16 packets.
Current implementations fail to consider half-float packets, only
half-float scalars.  Added specializations for packets on AVX, AVX512 and
NEON.  Added tests to `special_packetmath`.

The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.

Tested with AVX, AVX512.
2020-12-04 10:16:29 -08:00
David Tellenbach
305b8bd277 Remove duplicate #if clause 2020-12-04 18:55:46 +01:00
Antonio Sanchez
9ee9ac81de Fix shfl* macros for CUDA/HIP
The `shfl*` functions are `__device__` only, so the `#ifdef`s were adjusted so that
they are defined whenever the corresponding CUDA/HIP ones are.

Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.

Fixes #2083
2020-12-04 17:18:32 +00:00
shrek1402
a9a2f2bebf The function 'prefetch' did not work correctly on the win64 platform 2020-12-04 17:18:08 +00:00
Rasmus Munk Larsen
f23dc5b971 Revert "Add log2() operator to Eigen"
This reverts commit 4d91519a9b.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b Add log2() operator to Eigen 2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen
25d8ae7465 Small cleanup of generic plog implementations:
Adding the term e*ln(2) is split into two steps for no obvious reason.
This dates back to the original Cephes code from which the algorithm is adapted.
It appears that this was done in Cephes to prevent the compiler from reordering
the addition of the 3 terms in the approximation

  log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)

which must be added in reverse order since |x| < (sqrt(2)-1).

This allows rewriting the code to just 2 pmadd and 1 padd instructions,
which on a Skylake processor speeds up the code by 5-7%.
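
A scalar sketch of the summation order described above, adding the smallest term first. The function name `log1p_small_sketch` is hypothetical, and `r` is a crude Taylor stand-in for the rational term P(x)/Q(x), not the Cephes coefficients.

```cpp
#include <cmath>

// Approximates log(1+x) for small |x| as x - 0.5*x^2 + x^3*r(x).
// r(x) = 1/3 - x/4 is only a Taylor-series placeholder for P(x)/Q(x).
inline float log1p_small_sketch(float x) {
  const float x2 = x * x;
  const float x3 = x2 * x;
  const float r = 1.0f / 3.0f - 0.25f * x;
  // Accumulate smallest-first: x^3*r, then -0.5*x^2, then x.
  const float y = std::fma(-0.5f, x2, x3 * r);
  return y + x;
}
```

Because the largest term `x` is added last, the small correction terms are combined with each other before they are rounded against it.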
2020-12-03 19:40:40 +00:00
Antonio Sanchez
eb4d4ae070 Include chrono in main for c++11.
Hack to fix tensor tests, since min/max are overridden by `main.h`.
2020-12-03 11:27:32 -08:00
Rasmus Munk Larsen
71c85df4c1 Clean up the Tensor header and get rid of the EIGEN_SLEEP macro. 2020-12-02 11:04:04 -08:00
Antonio Sanchez
70fbcf82ed Fix typo in F32MaskToBf16Mask. 2020-12-02 07:58:34 -08:00
Antonio Sanchez
2627e2f2e6 Fix neon cmp* functions for bf16.
The current impl corrupts the comparison masks when converting
from float back to bfloat16.  The resulting masks are then
no longer all zeros or all ones, which breaks when used with
`pselect` (e.g. in `pmin<PropagateNumbers>`).  This was
causing `packetmath_15` to fail on arm.

Introducing a simple `F32MaskToBf16Mask` corrects this (takes
the lower 16-bits for each float mask).
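
A scalar sketch of why narrowing the mask bitwise differs from converting it numerically. Both function names are hypothetical, and the numeric conversion is a simplified stand-in (NaN canonicalization plus truncation), not Eigen's NEON implementation.

```cpp
#include <cstdint>

// Narrow a 32-bit comparison mask (all zeros or all ones) to 16 bits by
// keeping the lower 16 bits, as the commit message describes.
inline std::uint16_t f32_mask_to_bf16_mask_sketch(std::uint32_t mask) {
  return static_cast<std::uint16_t>(mask & 0xFFFFu);
}

// Simplified stand-in for a numeric float->bfloat16 conversion applied to
// the same bits: NaN inputs map to a canonical quiet NaN, everything else
// keeps its upper 16 bits (rounding omitted for brevity).
inline std::uint16_t f32_bits_to_bf16_numeric_sketch(std::uint32_t bits) {
  const bool is_nan = (bits & 0x7FFFFFFFu) > 0x7F800000u;
  return is_nan ? std::uint16_t{0x7FC0} : static_cast<std::uint16_t>(bits >> 16);
}
```

An all-ones mask has the bit pattern of a NaN, so a conversion that canonicalizes NaNs yields `0x7FC0` in this sketch rather than `0xFFFF`, and the result is no longer a valid select mask.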
2020-12-02 01:29:34 +00:00
Antonio Sanchez
ddd48b242c Implement CUDA __shfl* for Eigen::half
Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test are broken, as well as several ops in Tensorflow. The gpu functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float.
Here we add the required specializations.
2020-12-01 14:36:52 -08:00
Rasmus Munk Larsen
e57281a741 Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available. 2020-12-01 11:31:47 -08:00
Antonio Sanchez
1992af3de2 Fix #2077, EIGEN_CONSTEXPR in Half.
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`.  This shouldn't affect anything else, since
it is only used in a `bit_cast<uint16_t,half>()`, which is not itself
`constexpr`.

Fixes #2077.
2020-12-01 03:10:21 +00:00
acxz
7b80609d49 add EIGEN_DEVICE_FUNC to methods 2020-12-01 03:08:47 +00:00
Antonio Sanchez
89f90b585d AVX512 missing ops.
This allows the `packetmath` tests to pass for AVX512 on skylake.
Made `half` and `bfloat16` consistent in terms of ops they support.

Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
2020-11-30 16:28:57 +00:00
Florian Maurin
c5985c46f5 Fix typo in doc 2020-11-30 10:53:29 +00:00
Jim Lersch
68f69414f7 Workaround for doxygen class template titles in which the template
part of the class signature is lost due to a problem with forward
declarations.  The problem is probably caused by doxygen bug #7689.
It is confirmed to be fixed in doxygen >= 1.8.19.
2020-11-27 19:52:16 -07:00
Jim Lersch
a7170f2aca Fix doxygen class blocks that were not associated with the correct classes. 2020-11-27 08:48:11 -07:00
David Tellenbach
550e8f8f57 Include CMakeDependentOption to be able to use cmake_dependent_option 2020-11-27 13:21:49 +01:00
Bowie Owens
9842366bba Make inclusion of doc sub-directory optional by adjusting options.
Allows exclusion of doc and related targets to help when using eigen via add_subdirectory().

Requested by:

https://gitlab.com/libeigen/eigen/-/issues/1842

Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on EIGEN_BUILD_DOC. This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
2020-11-27 08:11:49 +11:00
filippobrizzi
aa56e1d980 check for include dirs set 2020-11-26 10:22:46 +00:00
Andreas Krebbel
1e74f93d55 Fix some packet-functions in the IBM ZVector packet-math. 2020-11-25 14:11:23 +00:00
Rasmus Munk Larsen
79818216ed Revert "Fix Half NaN definition and test."
This reverts commit c770746d70.
2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen
c770746d70 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
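
The difference between comparing NaNs by bit pattern and by classification can be sketched as follows, assuming a 32-bit IEEE 754 `float`. The helper names are hypothetical and are not taken from the `half_float` test.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Builds a float from a raw 32-bit pattern.
inline float float_from_bits_sketch(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Bit-pattern equality: distinguishes NaNs that differ only in payload.
inline bool same_bits_sketch(float a, float b) {
  std::uint32_t ua, ub;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  return ua == ub;
}

// Classification-based check: any two NaNs match, otherwise compare values.
inline bool equal_or_both_nan_sketch(float a, float b) {
  return (std::isnan(a) && std::isnan(b)) || a == b;
}
```

Two quiet NaNs with different payloads fail the bit-pattern comparison but pass the classification-based one, which is why a test that compares raw bits can break when a conversion alters the NaN encoding.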
2020-11-24 20:53:07 +00:00
Antonio Sanchez
22f67b5958 Fix boolean float conversion and product warnings.
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
    Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```

Details:

- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).

- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)

- Deprecated above specialized ops for bool.

- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Antonio Sanchez
a3b300f1af Implement missing AVX half ops.
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`.  Allows `packetmath_13` to pass.

Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
2020-11-24 16:46:41 +00:00
Antonio Sanchez
38abf2be42 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-23 14:13:59 -08:00
Antonio Sanchez
4cf01d2cf5 Update AVX half packets, disable test.
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail.  This disables the test.

Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
2020-11-21 09:05:10 -08:00
Antonio Sanchez
fd1dcb6b45 Fixes duplicate symbol when building blas
A missing `inline` breaks blas, since the symbol is generated in each of
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, `double.cpp`.

Changed rest of inlines to `EIGEN_STRONG_INLINE`.
2020-11-20 09:37:40 -08:00
David Tellenbach
6c9c3f9a1a Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool
Both Eigen::half and Eigen::bfloat16 are implicitly convertible to
float and can hence be converted to bool via the conversion chain

  Eigen::{half,bfloat16} -> float -> bool

We thus remove the explicit cast operator to bool.
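
The conversion chain can be illustrated with a small stand-in type. `half_sketch` is hypothetical and stores a `float` only to keep the example short; it is not `Eigen::half`.

```cpp
// Hypothetical wrapper type; only its conversion behaviour matters here.
struct half_sketch {
  float value;
  operator float() const { return value; }  // implicit conversion to float
  // No operator bool: boolean contexts use half_sketch -> float -> bool.
};

// The condition is contextually converted to bool through operator float.
inline bool is_nonzero_sketch(const half_sketch& h) {
  return h ? true : false;
}
```

The user-defined conversion to `float` followed by the standard `float` to `bool` conversion forms a valid implicit conversion sequence, so no dedicated `operator bool` is needed.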
2020-11-19 18:49:09 +01:00
Antonio Sanchez
a8fdcae55d Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix.
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.

Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.

Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.

We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.

Related to #2048
2020-11-18 23:15:33 +00:00
David Tellenbach
11e4056f6b Re-enable Arm Neon Eigen::half packets of size 8
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
2020-11-18 23:02:21 +00:00
Antonio Sanchez
17268b155d Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`.  Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
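
A minimal sketch of a `memcpy`-based bit cast, which copies the object representation instead of reading a member of an assumed type. `bit_cast_sketch` and `half_storage_sketch` are hypothetical names, not the Eigen specializations added by this commit.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets the bytes of src as a value of type To.
template <typename To, typename From>
To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

// Hypothetical 16-bit storage type standing in for half/bfloat16.
struct half_storage_sketch {
  std::uint16_t bits;
};
```

Callers written against such a helper do not need to know whether the underlying member is a `uint16_t` or a native 16-bit floating-point type.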
2020-11-18 20:32:35 +00:00
Antonio Sanchez
41d5d5334b Initialize primitives to fix -Wuninitialized-const-reference.
The `meta` test generates warnings with the latest version of clang due
to passing uninitialized variables as const reference arguments.
```
test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference]
    VERIFY(( check_is_convertible(a.dot(b), f) ));
```
We don't actually use the variables, but initializing them eliminates the
new warning.

Fixes #2067.
2020-11-18 20:23:20 +00:00
Antonio Sanchez
3669498f5a Fix rule-of-3 for the Tensor module.
Adds copy constructors to Tensor ops, inherits assignment operators from
`TensorBase`.

Addresses #1863
2020-11-18 18:14:53 +00:00
Antonio Sanchez
60218829b7 EOF newline added to InverseSize4.
Causing build breakages due to `-Wnewline-eof -Werror` that seems to be
common across Google.
2020-11-18 07:58:33 -08:00
Rasmus Munk Larsen
2d63706545 Add missing parens around macro argument. 2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen
6bba58f109 Replace SSE_SHUFFLE_MASK macro with shuffle_mask. 2020-11-17 15:28:37 -08:00
David Tellenbach
e9b55c4db8 Avoid promotion of Arm __fp16 to float in Neon PacketMath
Using overloaded arithmetic operators for Arm __fp16 always
causes a promotion to float. We replace operator* by vmulh_f16
to avoid this.
2020-11-17 20:19:44 +01:00
Antonio Sanchez
117a4c0617 Fix missing EIGEN_CONSTEXPR pop_macro in Half.
`EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if
`EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
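
The push/pop pairing can be sketched with a stand-in macro. `SKETCH_CONSTEXPR` is hypothetical, and `#pragma push_macro` / `pop_macro` are compiler extensions supported by GCC, Clang and MSVC.

```cpp
#define SKETCH_CONSTEXPR 1

#pragma push_macro("SKETCH_CONSTEXPR")  // save the current definition
#undef SKETCH_CONSTEXPR
#define SKETCH_CONSTEXPR 2              // temporary override
constexpr int kInsideSketch = SKETCH_CONSTEXPR;
#pragma pop_macro("SKETCH_CONSTEXPR")   // restore the saved definition

constexpr int kAfterSketch = SKETCH_CONSTEXPR;
```

If the `pop_macro` line is skipped on some preprocessor path, the override stays in effect for everything included afterwards.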
2020-11-17 08:29:33 -08:00
Guoqiang QI
394f564055 Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. 2020-11-17 12:27:01 +00:00
Antonio Sanchez
8e9cc5b10a Eliminate double-promotion warnings.
Clang currently complains about implicit conversions, e.g.
```
test/packetmath.cpp:680:59: warning: implicit conversion increases floating-point precision: 'typename Eigen::internal::random_retval<typename Eigen::internal::global_math_functions_filtering_base<double>::type>::type' (aka 'double') to 'long double' [-Wdouble-promotion]
          data1[0] = Scalar((2 * k + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
                                                        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/packetmath.cpp:681:40: warning: implicit conversion increases floating-point precision: 'float' to 'long double' [-Wdouble-promotion]
          data1[1] = Scalar((2 * k + 2 + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
```

Modified to explicitly cast to double.
2020-11-16 10:39:09 -08:00
acxz
9175f50d6f Add EIGEN_DEVICE_FUNC to TranspositionsBase
Fixes #2057.
2020-11-16 15:37:40 +00:00
Martin Vonheim Larsen
280f4f2407 Enable MathJax in Doxygen.in
Note that HTTPS must be used against the MathJax CDN when hosted on `eigen.tuxfamily.org` (which uses HTTPS) in order to avoid `Mixed Content`-errors from browsers. Using HTTPS for MathJax also works if the Eigen docs are hosted on plain HTTP.
2020-11-16 12:59:13 +00:00
Antonio Sanchez
bb69a8db5d Explicit casts of S -> std::complex<T>
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`.  This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`

The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).

To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`.  We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
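
A sketch of the explicit real-component cast, assuming a real-valued source type. `cast_to_complex_sketch` is a hypothetical helper, not `internal::cast`.

```cpp
#include <complex>

// Converts a real scalar of type S to std::complex<T>, casting the real
// component to T explicitly so no implicit S -> T conversion remains.
template <typename S, typename T>
std::complex<T> cast_to_complex_sketch(const S& x) {
  return std::complex<T>(static_cast<T>(x));
}
```

For example, `cast_to_complex_sketch<double, float>(1.5)` narrows to `float` explicitly before constructing the complex value.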
2020-11-14 05:50:42 +00:00
Christoph Hertzberg
90f6d9d23e Suppress ignored-attributes warning (same as in vectorization_logic). Remove redundant include and using namespace. 2020-11-13 16:21:53 +01:00
guoqiangqi
8324e5e049 Fix typo in NEON/PacketMath.h 2020-11-13 00:46:41 +00:00
Antonio Sanchez
852513e7a6 Disable testing of OpenGL by default.
The `OpenGLSupport` module contains mostly deprecated features, and the
test is highly GL context-dependent, relies on deprecated GLUT, and
requires a display.  Until the module is updated to support modern
OpenGL and the test to use newer windowing frameworks (e.g. GLFW)
it's probably best to disable the test by default.

The test can be enabled with `cmake -DEIGEN_TEST_OPENGL=ON`.

See #2053 for more details.
2020-11-12 16:15:40 -08:00
Rasmus Munk Larsen
bec72345d6 Simplify expression for inner product fallback in Gemv product evaluator. 2020-11-12 23:43:15 +00:00
Rasmus Munk Larsen
276db21f26 Remove redundant branch for handling dynamic vector*vector. This will be handled by the equivalent branch in the specialization for GemvProduct. 2020-11-12 21:54:56 +00:00
Rasmus Munk Larsen
cf12474a8b Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime.
This speeds up inner products where one or both arguments are dynamic for small and medium-sized vectors (up to 32k).

name                           old time/op             new time/op   delta
BM_VecVecStatStat<float>/1     1.64ns ± 0%             1.64ns ± 0%     ~
BM_VecVecStatStat<float>/8     2.99ns ± 0%             2.99ns ± 0%     ~
BM_VecVecStatStat<float>/64    7.00ns ± 1%             7.04ns ± 0%   +0.66%
BM_VecVecStatStat<float>/512   61.6ns ± 0%             61.6ns ± 0%     ~
BM_VecVecStatStat<float>/4k     551ns ± 0%              553ns ± 1%   +0.26%
BM_VecVecStatStat<float>/32k   4.45µs ± 0%             4.45µs ± 0%     ~
BM_VecVecStatStat<float>/256k  77.9µs ± 0%             78.1µs ± 1%     ~
BM_VecVecStatStat<float>/1M     312µs ± 0%              312µs ± 1%     ~
BM_VecVecDynStat<float>/1      13.3ns ± 1%              4.6ns ± 0%  -65.35%
BM_VecVecDynStat<float>/8      14.4ns ± 0%              6.2ns ± 0%  -57.00%
BM_VecVecDynStat<float>/64     24.0ns ± 0%             10.2ns ± 3%  -57.57%
BM_VecVecDynStat<float>/512     138ns ± 0%               68ns ± 0%  -50.52%
BM_VecVecDynStat<float>/4k     1.11µs ± 0%             0.56µs ± 0%  -49.72%
BM_VecVecDynStat<float>/32k    8.89µs ± 0%             4.46µs ± 0%  -49.89%
BM_VecVecDynStat<float>/256k   78.2µs ± 0%             78.1µs ± 1%     ~
BM_VecVecDynStat<float>/1M      313µs ± 0%              312µs ± 1%     ~
BM_VecVecDynDyn<float>/1       10.4ns ± 0%             10.5ns ± 0%   +0.91%
BM_VecVecDynDyn<float>/8       12.0ns ± 3%             11.9ns ± 0%     ~
BM_VecVecDynDyn<float>/64      37.4ns ± 0%             19.6ns ± 1%  -47.57%
BM_VecVecDynDyn<float>/512      159ns ± 0%               81ns ± 0%  -49.07%
BM_VecVecDynDyn<float>/4k      1.13µs ± 0%             0.58µs ± 1%  -49.11%
BM_VecVecDynDyn<float>/32k     8.91µs ± 0%             5.06µs ±12%  -43.23%
BM_VecVecDynDyn<float>/256k    78.2µs ± 0%             78.2µs ± 1%     ~
BM_VecVecDynDyn<float>/1M       313µs ± 0%              312µs ± 1%     ~
2020-11-12 18:02:37 +00:00
Pedro Caldeira
c29935b323 Add support for dynamic dispatch of MMA instructions for POWER 10 2020-11-12 11:31:15 -03:00
acxz
b714dd9701 remove annotation for first declaration of default con/destruction 2020-11-12 04:34:12 +00:00
mehdi-goli
e24a1f57e3 [SYCL Function pointer Issue]: SYCL does not support function pointer inside the kernel, due to the portability issue of a function pointer and memory address space among host and accelerators. To fix the issue, function pointers have been replaced by function objects. 2020-11-12 01:50:28 +00:00
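
The general pattern of replacing a function pointer with a function object looks roughly like this in plain C++. This is not SYCL code, and `square_op_sketch` and `apply_sketch` are hypothetical names.

```cpp
// Function object: the call target is part of the type, so no function
// pointer has to be stored or passed into the kernel.
struct square_op_sketch {
  float operator()(float x) const { return x * x; }
};

// Generic "kernel" parameterized on the functor type instead of taking a
// float (*)(float) argument.
template <typename Op>
void apply_sketch(const float* in, float* out, int n, Op op = Op()) {
  for (int i = 0; i < n; ++i) out[i] = op(in[i]);
}
```

A call such as `apply_sketch<square_op_sketch>(in, out, n)` resolves the operation at compile time.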
Antonio Sanchez
6961468915 Address issues with openglsupport test.
The existing test fails on several systems due to GL runtime version mismatches,
the use of deprecated features, and memory errors due to improper use of GLUT.
The test was modified to:

- Run within a display function, allowing proper GLUT cleanup.
- Generate dynamic shaders with a supported GLSL version string and output variables.
- Report shader compilation errors.
- Check GL context version before launching version-specific tests.

Note that most of the existing `OpenGLSupport` module and tests rely on deprecated
features (e.g. fixed-function pipeline). The test was modified to allow it to
pass on various systems. We might want to consider removing the module or re-writing
it entirely to support modern OpenGL.  This is beyond the scope of this patch.

Testing of legacy GL (for platforms that support it) can be enabled by defining
`EIGEN_LEGACY_OPENGL`.  Otherwise, the test will try to create a modern context.

Tested on
- MacBook Air (2019), macOS Catalina 10.15.7 (OpenGL 2.1, 4.1)
- Debian 10.6, NVidia Quadro K1200 (OpenGL 3.1, 3.3)
2020-11-11 15:54:43 -08:00
Everton Constantino
348a48682e Fix erroneous forward declaration of boost nvp. 2020-11-10 13:07:34 -03:00
guoqiangqi
82fe059f35 Fix issue #2045: build error because the _mm256_set_m128d op is not supported by gcc 7.x 2020-11-04 09:21:39 +08:00
Deven Desai
9d11e2c03e CMakefile update for ROCm 4.0
Starting with ROCm 4.0, the `hipconfig --platform` command will return `amd` (prior return value was `hcc`). Updating the CMakeLists.txt files in the test dirs to account for this change.
2020-10-29 18:06:31 +00:00
Deven Desai
39a038f2e4 Fix for ROCm (and CUDA?) breakage - 201029
The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error

e265f7ed8e

```

Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
  return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
                                                                           ^~~~~~
                                                                           Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
          ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```

The fix in this commit is trivial. Please review and merge.
2020-10-29 15:34:05 +00:00
David Tellenbach
f895755c0e Remove unused functions in Half.h.
The following functions have been removed:

  Eigen::half fabsh(const Eigen::half&)
  Eigen::half exph(const Eigen::half&)
  Eigen::half sqrth(const Eigen::half&)
  Eigen::half powh(const Eigen::half&, const Eigen::half&)
  Eigen::half floorh(const Eigen::half&)
  Eigen::half ceilh(const Eigen::half&)
2020-10-29 07:37:52 +01:00
David Tellenbach
09f015852b Replace numext::as_uint with numext::bit_cast<numext::uint32_t> 2020-10-29 07:28:28 +01:00
David Tellenbach
e265f7ed8e Add support for Armv8.2-a __fp16
Armv8.2-a provides a native half-precision floating point (__fp16 aka.
float16_t). This patch introduces

* __fp16 as underlying type of Eigen::half if this type is available
* the packet types Packet4hf and Packet8hf representing float16x4_t and
  float16x8_t respectively
* packet-math for the above packets with corresponding scalar type Eigen::half

The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com>.

This closes #1940.
2020-10-28 20:15:09 +00:00
mehdi-goli
a725a3233c [SYCL clean up the code]: removing extra #pragma unroll in SYCL which was causing issues in embedded systems 2020-10-28 08:34:49 +00:00
mehdi-goli
b9ff791fed [Missing SYCL math op]: Adding the missing LDEXP function for SYCL. 2020-10-28 08:32:57 +00:00
mehdi-goli
61461d682a [Fixing expf issue]: Eigen uses the packet-type operation for scalar type float in the Sigmoid function (https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/functors/UnaryFunctors.h#L990). As a result, the SYCL backend breaks, since it only supports packet operations for the vectorized types float4 and double2. The issue has been fixed by adding scalar type float to the packet operation pexp for the SYCL backend. 2020-10-28 08:30:34 +00:00
Christoph Hertzberg
ecb7bc9514 Bug #2036 make sure find_standard_math_library_test_program actually compiles (and is guaranteed to call math functions) 2020-10-24 15:22:21 +02:00
Susi Lehtola
09f595a269 Make sure compiler does not optimize away calls to math functions 2020-10-24 06:16:50 +00:00
guoqiangqi
28aef8e816 Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f> 2020-10-20 11:37:09 +08:00
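
One common way to expose instruction-level parallelism in polynomial evaluation is to split the polynomial into even and odd parts. The sketch below shows that scheme for a degree-5 polynomial; it is an assumption about the general technique, not the exact scheme used in `pexp_float`, and both function names are hypothetical.

```cpp
// Horner's rule: a single dependent chain of multiply-adds.
inline float poly_horner_sketch(const float c[6], float x) {
  float y = c[5];
  for (int i = 4; i >= 0; --i) y = y * x + c[i];
  return y;
}

// Even/odd split: two independent chains in x^2 that can overlap in the
// pipeline, combined with one final multiply-add.
inline float poly_split_sketch(const float c[6], float x) {
  const float x2 = x * x;
  const float even = (c[4] * x2 + c[2]) * x2 + c[0];
  const float odd = (c[5] * x2 + c[3]) * x2 + c[1];
  return odd * x + even;
}
```

Both functions evaluate the same polynomial up to floating-point rounding; the split version shortens the longest dependency chain at the cost of one extra multiply.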
guoqiangqi
4a77eda1fd Remove unnecessary template specializations of pexp for scalar float/double 2020-10-19 00:51:42 +00:00
Antonio Sanchez
d9f0d9eb76 Fix missing pfirst<Packet16b> for MSVC.
It was only defined under one `#ifdef` case.  This fixes the `packetmath_14`
test for MSVC.
2020-10-16 16:22:00 -07:00
Rasmus Munk Larsen
21edea5edd Fix the specialization of pfrexp for AVX to be faster when AVX2/AVX512DQ is not available, and avoid undefined behavior in C++. Also mask off the sign bit when extracting the exponent. 2020-10-15 18:39:58 -07:00
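
Why the sign bit has to be masked off when extracting the exponent can be seen in a scalar sketch, assuming IEEE 754 binary32. `biased_exponent_sketch` is a hypothetical helper, not the AVX `pfrexp` code.

```cpp
#include <cstdint>
#include <cstring>

// Returns the biased 8-bit exponent field of an IEEE 754 binary32 value.
inline int biased_exponent_sketch(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  // Shifting alone leaves the sign in bit 8 of the result; the mask removes it.
  return static_cast<int>((bits >> 23) & 0xFFu);
}
```

Without the mask, a negative input such as `-8.0f` would report an exponent field that is 256 too large.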
Deven Desai
011e0db31d Fix for ROCm/HIP breakage - 201013
The following commit seems to have introduced regressions in ROCm/HIP support.

183a208212

It causes some unit-tests to fail with the following error

```
...
Eigen/src/Core/GenericPacketMath.h:322:3: error: no member named 'bit_and' in the global namespace; did you mean 'std::bit_and'?
...
Eigen/src/Core/GenericPacketMath.h:329:3: error: no member named 'bit_or' in the global namespace; did you mean 'std::bit_or'?
...
Eigen/src/Core/GenericPacketMath.h:336:3: error: no member named 'bit_xor' in the global namespace; did you mean 'std::bit_xor'?
...
```

The error occurs because, when compiling device code in HIP/CUDA, the compiler picks up some of the std functions (whose calls are prefixed by EIGEN_USING_STD) from the global namespace (i.e. it uses ::bit_xor instead of std::bit_xor). For this to work, those functions must be declared in the global namespace in the HIP/CUDA header files. The `bit_and`, `bit_or` and `bit_xor` routines are not declared in the HIP header file that contains the declarations for the std math functions (`math_functions.h`), and this is the cause of the error above.

It seems that newer HIP compilers do support calling `std::` math routines within device code, and the ideal fix here would have been to change all calls to std math functions in Eigen to use the `std::` namespace (instead of the global namespace) when compiling with the HIP compiler. However, there was a recent commit that removed the EIGEN_USING_STD_MATH macro and collapsed its uses into the EIGEN_USING_STD macro (4091f6b25c).

Replacing all std math calls would essentially require resurrecting the EIGEN_USING_STD_MATH macro, so that option was not chosen.

Also, HIP compilers only support std math calls within device code, not all std functions (specifically not malloc/free, which are prefixed via EIGEN_USING_STD). So modifying the EIGEN_USING_STD implementation to use the std:: namespace for HIP will not work either.

Hence going for the ugly solution of special-casing the three calls that are breaking the HIP compile, to explicitly use the std:: namespace.
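
The two call styles discussed above can be sketched in host-only C++. `SKETCH_USING_STD` is a hypothetical analogue of a using-declaration helper macro, and the function names are made up for illustration; this is not the Eigen source.

```cpp
#include <functional>

// Hypothetical helper macro: brings std::name into scope for unqualified use.
#define SKETCH_USING_STD(name) using std::name;

// Style 1: using-declaration followed by an unqualified use.
template <typename T>
T bit_and_unqualified_sketch(const T& a, const T& b) {
  SKETCH_USING_STD(bit_and)
  return bit_and<T>()(a, b);
}

// Style 2: explicit qualification, i.e. special-casing the call.
template <typename T>
T bit_and_qualified_sketch(const T& a, const T& b) {
  return std::bit_and<T>()(a, b);
}
```

On the host both styles resolve to the same functor; the explicit qualification simply removes the dependence on which namespace the helper macro pulls the name from.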
2020-10-15 12:17:35 +00:00
Rasmus Munk Larsen
6ea8091705 Revert change from 4e4d3f32d1 that broke BFloat16.h build with older compilers. 2020-10-15 01:20:08 +00:00
Guoqiang QI
4700713faf Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops,also unified AVX512 plog<Packet16f> op with generic api 2020-10-15 00:54:45 +00:00
Rasmus Munk Larsen
af6f43d7ff Add specializations for pmin/pmax with prescribed NaN propagation semantics for SSE/AVX/AVX512. 2020-10-14 23:11:24 +00:00
Rasmus Munk Larsen
274ef12b61 Remove leftover debug print statement in cxx11_tensor_expr.cpp 2020-10-14 22:59:51 +00:00
Rasmus Munk Larsen
208b3626d1 Revert generic implementation of predux, since it breaks compilation of predux_any with MSVC. 2020-10-14 21:41:28 +00:00
David Tellenbach
e3e2cf9d24 Add MatrixBase::cwiseArg() 2020-10-14 01:56:42 +00:00
Rasmus Munk Larsen
61fc78bbda Get rid of nested template specialization in TensorReductionGpu.h, which was broken by c6953f799b. 2020-10-13 23:53:11 +00:00
Rasmus Munk Larsen
c6953f799b Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN, and PropagateNumbers semantics. Add (slow) generic implementations for most reductions. 2020-10-13 21:48:31 +00:00
acxz
807e51528d undefine EIGEN_CONSTEXPR before redefinition 2020-10-12 20:28:56 -04:00
Rasmus Munk Larsen
9a4d04c05f Make bitwise_helper a device function to unbreak GPU builds. 2020-10-10 01:45:20 +00:00
Rasmus Munk Larsen
4e4d3f32d1 Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512. 2020-10-09 20:05:49 +00:00
David Tellenbach
7a8d3d5b81 Disable test exceptions when using OpenMP. 2020-10-09 17:49:07 +02:00
David Tellenbach
9022f5aa8a Mention problems when using potentially throwing scalars and OpenMP 2020-10-09 17:04:25 +02:00
Karl Ljungkvist
d199c17b14 Fix typo in Tutorial_BlockOperations_block_assignment.cpp 2020-10-09 07:51:36 +00:00
David Tellenbach
4091f6b25c Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD 2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen
183a208212 Implement generic bitwise logical packet ops that work for all types. 2020-10-08 22:45:20 +00:00
David Tellenbach
8f8d77b516 Add EIGEN prefix for HAS_LGAMMA_R 2020-10-08 18:32:19 +02:00
Eugene Zhulenev
2279f2c62f Use lgamma_r if it is available (update check for glibc 2.19+) 2020-10-08 00:26:45 +00:00
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it varies across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
David Tellenbach
f66f3393e3 Use reinterpret_cast instead of C-style cast in Inverse_NEON.h 2020-10-04 00:35:09 +02:00
Rasmus Munk Larsen
22c971a225 Don't cast away const in Inverse_NEON.h. 2020-10-02 15:06:34 -07:00
Rasmus Munk Larsen
f93841b53e Use EIGEN_USING_STD to fix CUDA compilation error on BFloat16.h. 2020-10-02 14:47:15 -07:00
Rasmus Munk Larsen
ee714f79f7 Fix CUDA build breakage and incorrect result for absdiff on HIP with long double arguments. 2020-10-02 21:05:35 +00:00
janos
f7b185a8b1 Don't use =*, since it might not return a Scalar 2020-10-02 14:36:51 +02:00
Rasmus Munk Larsen
9078f47cd6 Fix build breakage with MSVC 2019, which does not support MMX intrinsics for 64 bit builds, see:
https://stackoverflow.com/questions/60933486/mmx-intrinsics-like-mm-cvtpd-pi32-not-found-with-msvc-2019-for-64bit-targets-c

Instead use the equivalent SSE2 intrinsics.
2020-10-01 12:37:55 -07:00
Rasmus Munk Larsen
3b445d9bf2 Add generic packet ops corresponding to {std}::fmin and {std}::fmax. The nonsensical NaN-propagation rules for std::min and std::max implemented by pmin and pmax in Eigen are a longstanding source of confusion and bug reports. This change is a first step towards addressing it, as discussed in issue #564. 2020-10-01 16:54:31 +00:00
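The difference between the two NaN conventions mentioned in this commit can be seen with the standard library alone. The following is a minimal sketch that does not use Eigen; the wrapper names `min_like_std`, `min_like_fmin` and `quiet_nan` are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// std::min(a, b) is specified as (b < a) ? b : a. Every ordered comparison
// with NaN is false, so the first argument is returned whenever either
// argument is NaN. The result therefore depends on argument order.
inline double min_like_std(double a, double b) { return std::min(a, b); }

// std::fmin treats NaN as missing data: if exactly one argument is NaN,
// the other (numeric) argument is returned, regardless of order.
inline double min_like_fmin(double a, double b) { return std::fmin(a, b); }

// Convenience helper for building test inputs.
inline double quiet_nan() { return std::numeric_limits<double>::quiet_NaN(); }
```

Calling `min_like_std(quiet_nan(), 1.0)` yields NaN while `min_like_std(1.0, quiet_nan())` yields `1.0`; `min_like_fmin` yields `1.0` for both argument orders.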
Rasmus Munk Larsen
44b9d4e412 Specialize pldexp_double and pfdexp_double and get rid of Packet2l definition for SSE. SSE does not support conversion between 64 bit integers and double and the existing implementation of casting between Packet2d and Packet2l results in undefined behavior when casting NaN to int. Since pldexp and pfdexp only manipulate exponent fields that fit in 32 bit, this change provides specializations that use existing instructions _mm_cvtpd_pi32 and _mm_cvtsi32_pd instead. 2020-09-30 13:33:44 -07:00
Antonio Sanchez
d5a0d89491 Fix alignedbox 32-bit precision test failure.
The current `test/geo_alignedbox` tests fail on 32-bit arm due to small floating-point errors.

In particular, the following is not guaranteed to hold:
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
VERIFY(transformedC.contains(c));
```
since `c.transformed(identity)` is ever-so-slightly different from `c`. Instead, we replace this test with one that checks an identity transform is within floating-point precision of `c`.

Also updated the condition on `AlignedBox::transform(...)` to only accept `Affine`, `AffineCompact`, and `Isometry` modes explicitly.  Otherwise, invalid combinations of modes would also incorrectly pass the assertion.
2020-09-30 08:42:03 -07:00
David Tellenbach
30960d485e Fix failure in GEBP kernel when compiling with OpenMP and FMA
Fixes #1995
2020-09-30 01:26:07 +02:00
Rasmus Munk Larsen
f9d1500f74 Revert !182. 2020-09-29 13:56:17 -07:00
Rasmus Munk Larsen
068121ec02 Add missing newline at the end of Inverse_NEON.h 2020-09-29 15:32:52 +00:00
Rasmus Munk Larsen
74ff5719b3 Fix compilation of 64 bit constant arguments to pset1frombits in TypeCasting.h on platforms where uint64_t != unsigned long. 2020-09-28 22:47:11 +00:00
Rasmus Munk Larsen
3a0b23e473 Fix compilation of pset1frombits calls on iOS. 2020-09-28 22:30:36 +00:00
Christoph Hertzberg
6b0c0b587e Provide a more efficient Packet2l->Packet2d cast method 2020-09-28 22:14:02 +00:00
Martin Pecka
6425e875a1 Added AlignedBox::transform(AffineTransform). 2020-09-28 18:06:23 +00:00
Alexander Grund
a967fadb21 Make relative path variables of type STRING
When the type is PATH an absolute path is expected and user-defined
values are converted into absolute paths relative to the current directory.

Fixes #1990
2020-09-28 16:39:48 +00:00
Zhuyie
e4b24e7fb2 Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined 2020-09-25 09:36:43 +00:00
Deven Desai
ce5c59729d Fix for ROCm/HIP breakage - 200921
The following commit causes regressions in the ROCm/HIP support for Eigen
e55182ac09

I suspect the same breakages occur on the CUDA side too.

The above commit puts the EIGEN_CONSTEXPR attribute on `half_base` constructor. `half_base` is derived from `__half_raw`.

When compiling with GPU support, the definition of `__half_raw` gets picked up from the GPU Compiler specific header files (`hip_fp16.h`, `cuda_fp16.h`). Properly supporting the above commit would require adding the `constexpr` attribute to the `__half_raw` constructor (and other `*half*` routines) in those header files. While that is something we can explore in the future, for now we need to undo the above commit when compiling with GPU support, which is what this commit does.

This commit also reverts a small change in the `raw_uint16_to_half` routine made by the above commit. Similar to the case above, that change was leading to compile errors due to the fact that `__half_raw` has a different definition when compiling with GPU support.
2020-09-22 22:26:45 +00:00
David Tellenbach
b8a13f13ca Add CI configuration for ppc64le 2020-09-22 00:26:23 +00:00
Guoqiang QI
821702e771 Fix the #issue1997 and #issue1991 bug triggered by unsupported a[index] (type a: __i28d) ops with the MSVC compiler 2020-09-21 15:49:00 +00:00
David Tellenbach
493a7c773c Remove EIGEN_CONSTEXPR from NumTraits<boost::multiprecision::number<...>> 2020-09-21 12:43:41 +02:00
Павел Мацула
38e4a67394 Fix using FindStandardMathLibrary.cmake with -Wall (-Wunused-value) added to CMAKE_CXX_FLAGS 2020-09-19 16:13:16 +00:00
Rasmus Munk Larsen
c4b99f78c7 Fix breakage in pcast<Packet2l, Packet2d> due to _mm_cvtsi128_si64 not being available on 32 bit x86.
If SSE 4.1 is available use the faster _mm_extract_epi64 intrinsic.
2020-09-18 18:13:20 -07:00
guoqiangqi
9aad16b443 Fix undefined reference to pset1frombits bug on different platforms 2020-09-19 00:53:21 +00:00
David Tellenbach
c4aa8e0db2 Rename variable to avoid shadowing of a previously declared one 2020-09-18 22:53:15 +02:00
Rasmus Munk Larsen
e55182ac09 Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr.
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HAS_CONSTEXPR is 1.
2020-09-18 17:38:58 +00:00
Rasmus Munk Larsen
14022f5eb5 Fix more mildly embarrassing typos in ARM intrinsics in PacketMath.h.
'vmvnq_u64' does not exist for some reason.
2020-09-18 04:14:13 +00:00
Rasmus Munk Larsen
a5b226920f Fix typo in PacketMath.h 2020-09-18 01:22:23 +00:00
Rasmus Munk Larsen
3af744b023 Add missing packet op pcmp_lt_or_nan for Packet2d on ARM. 2020-09-18 01:07:01 +00:00
Rasmus Munk Larsen
31a6b88ff3 Disable double version of compute_inverse_size4 on Inverse_NEON.h if Packet2d is not supported. 2020-09-17 23:51:06 +00:00
Brad King
880fa43b2b Add support for CastXML on ARM aarch64
CastXML simulates the preprocessors of other compilers, but actually
parses the translation unit with an internal Clang compiler.
Use the same `vld1q_u64` workaround that we do for Clang.

Fixes: #1979
2020-09-16 13:40:23 -04:00
daravi
6f0f6f792e Fix compiler error due to c++20 operator== generation rules 2020-09-16 02:06:53 +00:00
Benoit Jacob
cc0c38ace8 Remove old Clang compiler bug work-arounds. The two LLVM bugs referenced in the comments here have long been fixed. The workarounds were now detrimental because (1) they prevented using fused mul-add on Clang/ARM32 and (2) the unnecessary 'volatile' in 'asm volatile' prevented legitimate reordering by the compiler. 2020-09-15 20:54:14 -04:00
Tim Shen
bb56a62582 Make bfloat16(float(-nan)) produce -nan, not nan. 2020-09-15 13:24:23 -07:00
Guoqiang QI
3012e755e9 Add plog ops support packet2d for NEON 2020-09-15 17:10:35 +00:00
Rasmus Munk Larsen
e4fb0ddf78 Add EIGEN_UNUSED_VARIABLE to unused variable in Memory.h 2020-09-15 01:18:55 +00:00
Pedro Caldeira
65e400896b Fix bfloat16 round on gcc 4.8 2020-09-14 10:43:59 -03:00
Rasmus Munk Larsen
5636f80d11 Fix issue #1968. Don't discard return value from "new" in C++17. 2020-09-13 17:38:45 +00:00
Guoqiang QI
7c5d48f313 Unified sse pldexp_double api 2020-09-12 10:56:55 +00:00
Rasmus Munk Larsen
71e08c702b Make blueNorm threadsafe if C++11 atomics are available. 2020-09-12 01:23:29 +00:00
David Tellenbach
adc861cabd New CI infrastructure, including AArch64 runners 2020-09-11 18:11:49 +00:00
Niels Dekker
5328c9be43 Fix half_impl::float_to_half_rtne(float) warning: '<<' causes overflow
Fixed Visual Studio 2019 Code Analysis (C++ Core Guidelines) warning
C26450 from inside `half_impl::float_to_half_rtne(float)`:
> Arithmetic overflow: '<<' operation causes overflow at compile time.
2020-09-10 16:22:28 +02:00
Pedro Caldeira
35d149e34c Add missing functions for Packet8bf in Altivec architecture.
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Guoqiang QI
85428a3440 Add Neon psqrt<Packet2d> and pexp<Packet2d> 2020-09-08 09:04:03 +00:00
Alexander Neumann
5272106826 remove semi triggering -Wextra-semi-stmt 2020-09-07 11:42:30 +02:00
Stephen Zheng
5f25bcf7d6 Add Inverse_NEON.h
Implemented fast size-4 matrix inverse (mimicking Inverse_SSE.h) using NEON intrinsics.

```
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_float                 -0.1285         -0.1275           568           495           572           499
BM_double                -0.2265         -0.2254           638           494           641           496
```
2020-09-04 10:55:47 +00:00
Everton Constantino
6fe88a3c9d MatrixProduct enhancements:
- Changes to Altivec/MatrixProduct
  Adapting code to gcc 10.
  Generic code style and performance enhancements.
  Adding PanelMode support.
  Adding stride/offset support.
  Enabling float64, std::complex<float> and std::complex<double>.
  Fixing lack of symm_pack.
  Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
2020-09-02 18:21:36 -03:00
Everton Constantino
6568856275 Changing u/int8_t to un/signed char because clang does not understand
it.

Implementing pcmp_eq to Packet8 and Packet16.
2020-09-02 17:02:15 -03:00
Gael Guennebaud
27e6648074 fix #1901: warning in Mode==(Upper|Lower) 2020-09-02 15:43:58 +02:00
Hans Johnson
5b9bfc892a BUG: cmake_minimum_required must be the first command
https://cmake.org/cmake/help/v3.5/command/project.html

Note: Call the cmake_minimum_required() command at the beginning of the
top-level CMakeLists.txt file even before calling the project() command.
It is important to establish version and policy settings before invoking
other commands whose behavior they may affect. See also policy CMP0000.
2020-08-28 22:57:16 +00:00
Chip Kerchner
e5886457c8 Change Packet8s and Packet8us to use vector commands on Power for pmadd, pmul and psub. 2020-08-28 19:27:32 +00:00
Gael Guennebaud
25424d91f6 Fix #1974: assertion when reserving an empty sparse matrix 2020-08-26 12:32:20 +02:00
Guoqiang QI
8bb0febaf9 add psqrt ops support packet2f/packet4f for NEON 2020-08-21 03:17:15 +00:00
Georg Jäger
1b1082334b adding attributes to constructors to support hip-clang on ROCm 3.5 2020-08-20 16:48:11 +02:00
Deven Desai
603e213d13 Fixing a CUDA / P100 regression introduced by PR 181
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified.

That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only
2020-08-20 00:29:57 +00:00
David Tellenbach
c060114a25 Fix nightly CI configuration 2020-08-19 20:52:34 +02:00
David Tellenbach
fe8c3ef3cb Add possibility to split test suite build targets and improved CI configuration
- Introduce CMake option `EIGEN_SPLIT_TESTSUITE` that allows to divide the single test build target into several subtargets
- Add CI pipeline for merge request that can be run by GitLab's shared runners
- Add nightly CI pipeline
2020-08-19 18:27:45 +00:00
Rasmus Munk Larsen
d10b27fe37 Add missing inline keyword in Quaternion.h. 2020-08-14 17:51:04 +00:00
David Tellenbach
d4a727d092 Disable min/max NaN propagation in test cxx11_tensor_expr
The current pmin/pmax implementation for Arm Neon propagates NaNs
differently than std::min/std::max.

See issue https://gitlab.com/libeigen/eigen/-/issues/1937
2020-08-14 16:16:27 +00:00
David Tellenbach
d2bb6cf396 Fix compilation error in blasutil test 2020-08-14 18:15:18 +02:00
David Tellenbach
c6820a6316 Replace the call to int64_t in the blasutil test by explicit types
Some platforms define int64_t to be long long even for C++03. If this is
the case we miss the definition of internal::make_unsigned for this
type. If we just define the template we get duplicated definitions
errors for platforms defining int64_t as signed long for C++03.

We need to find a way to distinguish both cases at compile-time.
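
The platform difference described above can be observed at compile time. The sketch below uses C++11 `<type_traits>` for brevity (the commit itself concerns C++03), and `my_make_unsigned`, `int64_is_long` and `int64_is_long_long` are hypothetical names, not Eigen's `internal::make_unsigned`.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical hand-written trait in the style of a C++03 helper.
// long and long long are always distinct types, even when both are
// 64 bits wide, so each needs its own specialization.
template <typename T> struct my_make_unsigned;
template <> struct my_make_unsigned<long> { typedef unsigned long type; };
template <> struct my_make_unsigned<long long> { typedef unsigned long long type; };

// Which specialization std::int64_t selects depends on the platform typedef.
inline bool int64_is_long() {
  return std::is_same<std::int64_t, long>::value;
}
inline bool int64_is_long_long() {
  return std::is_same<std::int64_t, long long>::value;
}
```

On mainstream platforms exactly one of the two functions returns true. A specialization written for `std::int64_t` itself would collide with whichever of the two explicit specializations names the same underlying type.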
2020-08-14 17:24:37 +02:00
David Tellenbach
8ba1b0f41a bfloat16 packetmath for Arm Neon backend 2020-08-13 15:48:40 +00:00
Pedro Caldeira
704798d1df Add support for Bfloat16 to use vector instructions on Altivec
architecture
2020-08-10 13:22:01 -05:00
Deven Desai
46f8a18567 Adding an explicit launch_bounds(1024) attribute for GPU kernels.
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.

This compiler change introduces a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (default value means the value assumed by the compiler as the `__launch_bounds__` attribute value, when it is not explicitly specified by the user)

Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__launch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.

This commit adds an explicit `__launch_bounds__(1024)` attribute to every GPU kernel that currently does not have it explicitly specified (and hence will end up getting the default value of 256 with the change to hip-clang)
2020-08-05 01:46:34 +00:00
Zachary Garrett
21122498ec Temporarily turn off the NEON implementation of pfloor as it does not work for large values.
The NEON implementation mimics the SSE implementation, but didn't mention the caveat that, because of the conversion to signed integers, not all values representable in the original floating-point type are supported.
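
The caveat can be illustrated with a scalar analogue. This is a sketch only, not the NEON or SSE code: `floor_via_int` is a hypothetical function that computes floor through a 32-bit integer conversion and guards the range in which that conversion is valid.

```cpp
#include <cmath>
#include <cstdint>

// Floor computed by truncating through a signed 32-bit integer.
// The conversion is only defined when the value fits in int32_t, so inputs
// outside a safe range must bypass it.
inline float floor_via_int(float x) {
  // Floats with magnitude >= 2^23 have no fractional bits and are already
  // integral. NaN fails the comparison and is also returned unchanged.
  if (!(std::fabs(x) < 8388608.0f)) return x;
  // The cast truncates toward zero.
  const float t = static_cast<float>(static_cast<std::int32_t>(x));
  // For negative non-integers truncation rounds up, so step down by one.
  return (t > x) ? t - 1.0f : t;
}
```

Without the guard, an input such as `3e9f` would overflow the integer conversion, which is undefined behavior in scalar C++ and produces a wrong value in vector code.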
2020-08-04 16:28:23 +00:00
David Tellenbach
23b7f0572b Disable CI buildstage again 2020-08-03 15:41:43 +02:00
Gael Guennebaud
d0f5d4bc50 add a banner to advertise the survey 2020-07-29 19:01:38 +02:00
David Tellenbach
5e484fa11d Fix StlDeque for GCC 10
StlDeque extends std::deque by accessing some of its internal members.
Since GCC 10 these are not accessible anymore.
2020-07-29 12:31:13 +00:00
Teng Lu
3ec4f0b641 Fix undefined behavior of BF16 union in AVX512. 2020-07-29 02:20:21 +00:00
Rasmus Munk Larsen
b92206676c Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned. 2020-07-28 19:19:37 +00:00
David Tellenbach
99da2e1a8d Fix clang-tidy warnings in generic bfloat16 implementation
See !172 for related discussions.
2020-07-27 16:00:24 +02:00
qxxxb
649fd1c2ae Fix CMake install command 2020-07-25 16:35:13 -04:00
David Tellenbach
e48d8e4725 Don't allow failure for CI build stage anymore 2020-07-24 21:12:15 +02:00
David Tellenbach
b8ca93842c Improve CI configuration
- Fix docker Fedora image to Fedora:31
  - Fix gcc version to gcc-9.2.1
  - Use GitLab CI dag
  - Fix usage of build cache
  - Introduce build artifacts
2020-07-24 15:58:44 +00:00
Gael Guennebaud
fb0c6868ad Add missing footer declaration 2020-07-24 10:28:44 +02:00
David Tellenbach
c1ffe452fc Fix bfloat16 casts
If we have explicit conversion operators available (C++11) we define
explicit casts from bfloat16 to other types. If not (C++03), we don't
define conversion operators but rely on implicit conversion chains from
bfloat16 over float to other types.
2020-07-23 20:55:06 +00:00
Gael Guennebaud
2ce2f51989 remove piwik tracker 2020-07-23 13:51:39 +02:00
Rasmus Munk Larsen
1b84f21e32 Revert change that made conversion from bfloat16 to {float, double} implicit.
Add roundtrip tests for casting between bfloat16 and complex types.
2020-07-22 18:09:00 -07:00
David Tellenbach
38b91f256b Fix cast of bfloat16 to std::complex<T>
This fixes https://gitlab.com/libeigen/eigen/-/issues/1951
2020-07-22 19:00:17 +00:00
Rasmus Munk Larsen
bed7fbe854 Make sure we take the little-endian path if __BYTE_ORDER__ is not defined. 2020-07-22 18:54:38 +00:00
Niels Dekker
0e1a33a461 Faster conversion from integer types to bfloat16
Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types.

A reduction of more than 20% of the runtime duration of conversion from int to bfloat16 was observed, using Visual C++ 2019 on Windows 10.
2020-07-22 19:25:49 +02:00
Rasmus Munk Larsen
acab22c205 Avoid division by zero in nonZerosEstimate() for empty blocks. 2020-07-22 01:38:30 +00:00
Rasmus Munk Larsen
ac2eca6b11 Update tensor reduction test to avoid undefined division of bfloat16 by int. 2020-07-22 00:35:51 +00:00
Rasmus Munk Larsen
0aeaf5f451 Make numext::as_uint a device function. 2020-07-22 00:33:41 +00:00
Alexander Turkin
60faa9f897 user-defined copy operations removed in favor of compiler-generated ones 2020-07-20 14:59:35 +03:00
Niels Dekker
b11f817bcf Avoid undefined behavior by union type punning in float_to_bfloat16_rtne
Use `numext::as_uint`, instead of union based type punning, to avoid undefined behavior.
See also C++ Core Guidelines: "Don't use a union for type punning"
https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning

`numext::as_uint` was suggested by David Tellenbach
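
As a rough illustration of the idea (not Eigen's implementation), the sketch below reads the bits of a `float` with `std::memcpy` instead of a union and then rounds to the nearest even bfloat16 value. It assumes a 32-bit IEEE 754 `float`; `float_bits`, `bits_to_float` and `float_to_bf16_rtne` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Well-defined alternative to union type punning: copy the object bytes.
inline std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Inverse helper, used only to build inputs with a known bit pattern.
inline float bits_to_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Round-to-nearest-even conversion to the upper 16 bits of the float.
inline std::uint16_t float_to_bf16_rtne(float f) {
  std::uint32_t u = float_bits(f);
  if ((u & 0x7fffffffu) > 0x7f800000u) {
    // NaN: truncate and set a mantissa bit so the result stays a NaN.
    return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
  }
  // Add 0x7fff plus the lowest kept bit: ties round to the even neighbour.
  const std::uint32_t lsb = (u >> 16) & 1u;
  u += 0x7fffu + lsb;
  return static_cast<std::uint16_t>(u >> 16);
}
```

For example, `float_to_bf16_rtne(1.0f)` gives `0x3F80`, and the exact tie `0x3F808000` also rounds to the even value `0x3F80`.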
2020-07-14 19:55:20 +02:00
Sheng Yang
56b3e3f3f8 AVX path for BF16 2020-07-14 01:34:03 +00:00
Niels Dekker
4ab32e2de2 Allow implicit conversion from bfloat16 to float and double
Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also supports implicit conversion from a smaller to a larger floating point type.

Intel's oneDNN bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp
2020-07-11 13:32:28 +02:00
Rasmus Munk Larsen
dcf7655b3d Guard operator<< test by EIGEN_NO_IO. 2020-07-09 19:54:48 +00:00
Rasmus Munk Larsen
ed00df445d Guard operator<< by EIGEN_NO_IO. 2020-07-09 19:52:44 +00:00
Rasmus Munk Larsen
fb77b7288c Add operator<< to print a quaternion. 2020-07-09 12:49:58 -07:00
David Tellenbach
ee4715ff48 Fix test basic stuff
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
2020-07-09 17:24:00 +00:00
Forrest Voight
8889a2c1c6 Add operator==/operator!= to Quaternion. Fixes #1876. 2020-07-07 20:16:54 +00:00
Rasmus Munk Larsen
6964ae8d52 Change the sign operator in Eigen to return NaN for NaN arguments, not zero. 2020-07-07 01:54:04 +00:00
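The behavior described in this commit can be sketched for a scalar `double` as follows. This is an illustrative stand-in, not Eigen's sign functor; `sign_propagating_nan` is a hypothetical name.

```cpp
#include <cmath>

// Returns -1, 0 or +1 for ordinary inputs and passes NaN through unchanged.
inline double sign_propagating_nan(double x) {
  if (std::isnan(x)) return x;  // a NaN input yields NaN, not zero
  // (x > 0) and (x < 0) are bools that convert to 0 or 1.
  return static_cast<double>((x > 0.0) - (x < 0.0));
}
```

Without the explicit NaN check, both comparisons would be false for a NaN input and the function would silently return zero.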
David Tellenbach
cb63153183 Make test packetmath C++98 compliant 2020-07-01 20:41:59 +02:00
Sheng Yang
116c5235ac BF16 for scalar_cmp_with_cast_op 2020-07-01 18:33:42 +00:00
Kan Chen
8731452b97 Delete duplicate test cases in vectorization_logic.cpp 2020-07-01 00:51:15 +00:00
Antonio Sanchez
9cb8771e9c Fix tensor casts for large packets and casts to/from std::complex
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.

We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing to make it consistent with `Eigen::bfloat16`, and
generalize the overload to work for any complex type.

Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
2020-06-30 18:53:55 +00:00
Antonio Sanchez
145e51516f Fix denormal check pre c++11.
`float_denorm_style` is an old-style `enum`, so the `denorm_present`
symbol only exists in the `std` namespace prior to c++11.
2020-06-30 17:28:30 +00:00
David Tellenbach
689b57070d Report custom C++ flags in CMake testing summary 2020-06-30 17:18:54 +00:00
David Tellenbach
f3b8d441f6 Remove CI tags to enable shared runners 2020-06-29 22:15:41 +02:00
Christoph Grüninger
dc0b81fb1d Pass CMAKE_MAKE_PROGRAM to Fortran language support test
Otherwise the Make (or Ninja) program is used, which is
installed system wide.
2020-06-27 23:52:38 +02:00
David Tellenbach
13d25f5ed8 Add initial CI configuration file.
The initial CI configuration consists of jobs to build and run tests and
to build docs.
2020-06-27 00:03:35 +00:00
Antonio Sanchez
7222f0b6b5 Fix packetmath_1 float tests for arm/aarch64.
Added missing `pmadd<Packet2f>` for NEON. This gives significantly
better precision than the previous `pmul+padd`, which was causing
the `pcos` tests to fail. Also added an approx test with
`std::sin`/`std::cos` since otherwise returning any `a^2+b^2=1` would
pass.
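
The precision gap between a fused multiply-add and a separate multiply followed by an add can be reproduced in scalar code. The sketch below uses `std::fma` from the standard library rather than NEON intrinsics; `residual_unfused` and `residual_fused` are hypothetical names.

```cpp
#include <cmath>

// Computes a*a + c with the product rounded to float before the addition.
// The volatile store forces that intermediate rounding and keeps the
// compiler from contracting the expression into a fused operation.
inline float residual_unfused(float a, float c) {
  volatile float p = a * a;
  return p + c;
}

// Computes a*a + c with a single rounding at the end.
inline float residual_fused(float a, float c) {
  return std::fma(a, a, c);
}
```

With `a = 1 + 2^-12` the exact square is `1 + 2^-11 + 2^-24`, whose last term is lost when the product is rounded to `float`. Taking `c = -(1 + 2^-11)`, the unfused version returns `0` while the fused version returns `2^-24`.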

Modified `log(denorm)` tests.  Denorms are not always supported by all
systems (returns `::min`), are always flushed to zero on 32-bit arm,
and configurably flush to zero on sse/avx/aarch64. This leads to
inconsistent results across different systems (i.e. `-inf` vs `nan`).
Added a check for existence and exclude ARM.

Removed logistic exactness test, since scalar and vectorized versions
follow different code-paths due to differences in `pexp` and `pmadd`,
which result in slightly different values. For example, exactness always
fails on arm, aarch64, and altivec.
2020-06-24 14:03:35 -07:00
Simon Pfreundschuh
14f84978e8 Replaced call to deprecated 'load' function with appropriate call to 'on'. 2020-06-23 11:23:13 +02:00
Antonio Sanchez
ff4e7a0820 Add missing Packet2l/Packet2ul ops for NEON.
The current multiply (`pmul`) and comparison operators (`pcmp_lt`,
`pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and
`Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests
in clang. Here we add and test the missing ops.

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-22 11:24:43 -07:00
Antonio Sanchez
03ebdf6acb Added missing NEON pcasts, update packetmath tests.
The NEON `pcast` operators are all implemented and tested for existing
packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting
between `int64_t` and `int8_t` in `GenericPacketMath.h`.

Removed incorrect `HasHalfPacket`  definition for NEON's
`Packet2l`/`Packet2ul`.

Adjustments were also made to the `packetmath` tests. These include
- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for
  packets that are vectorizable)
- added 8:1 cast tests
- random number generation
  - original had uninteresting 0 to 0 casts for many casts between
    floating-point and integers, and exhibited signed overflow
    undefined behavior

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-21 09:32:31 -07:00
Teng Lu
386d809bde Support BFloat16 in Eigen 2020-06-20 19:16:24 +00:00
Rasmus Munk Larsen
6b9c92fe7e Add Apache 2.0 license text in COPYING.APACHE. 2020-06-18 12:45:27 -07:00
Nicolas Mellado
cf7adf3a5d Update things you can do message using cmake commands
Print cmake commands instead of make commands, which should work for any generator.
2020-06-16 21:04:33 +00:00
Ilya Tokar
231ce21535 Run two independent chains, when reducing tensors.
Running two chains exposes more instruction level parallelism,
by allowing to execute both chains at the same time.

Results are a bit noisy, but for medium length we almost hit
theoretical upper bound of 2x.

BM_fullReduction_16T/3        [using 16 threads]       17.3ns ±11%        17.4ns ± 9%        ~           (p=0.178 n=18+19)
BM_fullReduction_16T/4        [using 16 threads]       17.6ns ±17%        17.0ns ±18%        ~           (p=0.835 n=20+19)
BM_fullReduction_16T/7        [using 16 threads]       18.9ns ±12%        18.2ns ±10%        ~           (p=0.756 n=20+18)
BM_fullReduction_16T/8        [using 16 threads]       19.8ns ±13%        19.4ns ±21%        ~           (p=0.512 n=20+20)
BM_fullReduction_16T/10       [using 16 threads]       23.5ns ±15%        20.8ns ±24%     -11.37%        (p=0.000 n=20+19)
BM_fullReduction_16T/15       [using 16 threads]       35.8ns ±21%        26.9ns ±17%     -24.76%        (p=0.000 n=20+19)
BM_fullReduction_16T/16       [using 16 threads]       38.7ns ±22%        27.7ns ±18%     -28.40%        (p=0.000 n=20+19)
BM_fullReduction_16T/31       [using 16 threads]        146ns ±17%          74ns ±11%     -49.05%        (p=0.000 n=20+18)
BM_fullReduction_16T/32       [using 16 threads]        154ns ±19%          84ns ±30%     -45.79%        (p=0.000 n=20+19)
BM_fullReduction_16T/64       [using 16 threads]        603ns ± 8%         308ns ±12%     -48.94%        (p=0.000 n=17+17)
BM_fullReduction_16T/128      [using 16 threads]       2.44µs ±13%        1.22µs ± 1%     -50.29%        (p=0.000 n=17+17)
BM_fullReduction_16T/256      [using 16 threads]       9.84µs ±14%        5.13µs ±30%     -47.82%        (p=0.000 n=19+19)
BM_fullReduction_16T/512      [using 16 threads]       78.0µs ± 9%        56.1µs ±17%     -28.02%        (p=0.000 n=18+20)
BM_fullReduction_16T/1k       [using 16 threads]        325µs ± 5%         263µs ± 4%     -19.00%        (p=0.000 n=20+16)
BM_fullReduction_16T/2k       [using 16 threads]       1.09ms ± 3%        0.99ms ± 1%      -9.04%        (p=0.000 n=20+20)
BM_fullReduction_16T/4k       [using 16 threads]       7.66ms ± 3%        7.57ms ± 3%      -1.24%        (p=0.017 n=20+20)
BM_fullReduction_16T/10k      [using 16 threads]       65.3ms ± 4%        65.0ms ± 3%        ~           (p=0.718 n=20+20)
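
The two-chain idea described at the top of this commit message can be sketched for a plain sum as follows. This is a simplified scalar illustration, not the tensor reduction code; `sum_two_chains` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Sums a vector using two independent accumulators. Neither accumulator
// depends on the other inside the loop, so the CPU can overlap the two
// dependency chains instead of waiting on a single one.
inline double sum_two_chains(const std::vector<double>& v) {
  double acc0 = 0.0;
  double acc1 = 0.0;
  const std::size_t n = v.size();
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    acc0 += v[i];      // chain 0: even indices
    acc1 += v[i + 1];  // chain 1: odd indices
  }
  if (i < n) acc0 += v[i];  // leftover element when n is odd
  return acc0 + acc1;       // combine the chains once at the end
}
```

Note that for floating-point data the summation order differs from a single-chain loop, so results may differ in the last bits.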
2020-06-16 15:55:11 -04:00
Pedro Caldeira
a475bf14d4 Fix pscatter and pgather for Altivec Complex double 2020-06-16 16:41:02 -03:00
David Tellenbach
c6c84ed961 Fix unused variable warning on Arm 2020-06-15 00:14:58 +02:00
Sebastien Boisvert
6228f27234 Fix #1818: SparseLU: add methods nnzL() and nnzU()
Now this compiles without errors:

$ clang++ -I ../../ test_sparseLU.cpp -std=c++03
2020-06-11 23:49:49 +00:00
Sebastien Boisvert
39cbd6578f Fix #1911: add benchmark for move semantics with fixed-size matrix
$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
        -o bench_move_semantics

$ ./bench_move_semantics
float copy semantics: 1755.97 ms
float move semantics: 55.063 ms
double copy semantics: 2457.65 ms
double move semantics: 55.034 ms
2020-06-11 23:43:25 +00:00
Antonio Sanchez
a7d2552af8 Remove HasCast and fix packetmath cast tests.
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.

Here we remove the unused `HasCast`, and modify the packet cast tests to
better reflect their usage.
2020-06-11 17:26:56 +00:00
Sebastien Boisvert
463ec86648 Fix #1757: remove the word 'suicide' 2020-06-11 00:56:54 +00:00
ShengYang1
b5d66b5e73 Implement scalar_cmp_with_cast_op 2020-06-09 08:12:07 +08:00
Rasmus Munk Larsen
c4059ffcb6 Fix static analyzer warning in SelfadjointProduct.h.
Fix compiler warnings in GeneralBlockPanelKernel.h.
2020-06-08 11:48:44 -07:00
Thales Sabino
1fcaaf460f Update FindComputeCpp.cmake to fix build problems on Windows
- Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows
- Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows
2020-06-05 20:51:20 +00:00
David Tellenbach
3ce18d3c8f Revert ".gitlab-ci.yml: initial commit"
This reverts commit 95177362ed to
disable GitLab CI temporarily.
2020-06-05 22:43:49 +02:00
Rasmus Munk Larsen
c2ab36f47a Fix broken packetmath test for logistic on Arm. 2020-06-04 16:24:47 -07:00
Rasmus Munk Larsen
537e2b322f Fix typo in previous update to generic predux_any. 2020-06-04 22:25:05 +00:00
Rasmus Munk Larsen
fdc1cbdce3 Avoid implicit float equality comparison in generic predux_any, but use numext::not_equal_strict to avoid breaking builds that compile with -Werror=float-equal. 2020-06-04 22:15:56 +00:00
Rasmus Munk Larsen
daf9bbeca2 Fix compilation error in logistic packet op. 2020-06-03 00:57:41 +00:00
n0mend
6d2a9a524b Update run instructions for benchCholesky 2020-06-01 18:31:46 +00:00
Gael Guennebaud
029a76e115 Bug #1777: make the scalar and packet path consistent for the logistic function + respective unit test 2020-05-31 00:53:37 +02:00
Gael Guennebaud
99b7f7cb9c Fix #556: warnings with mingw 2020-05-31 00:39:44 +02:00
Gael Guennebaud
72782d13e0 Bug #1767: increase required cmake version to 3.5.0 2020-05-31 00:31:09 +02:00
Gael Guennebaud
867a756509 Fix #1833: compilation issue of "array!=scalar" with c++20 2020-05-30 23:53:58 +02:00
Gael Guennebaud
ab615e4114 Save one extra temporary when assigning a sparse product to a row-major sparse matrix 2020-05-30 23:15:12 +02:00
Christoph Junghans
95177362ed .gitlab-ci.yml: initial commit 2020-05-29 09:23:25 -06:00
Kan Chen
8d1302f566 Add support for PacketBlock<Packet8s,4> and PacketBlock<Packet16uc,4> ptranspose on NEON 2020-05-29 00:33:45 +00:00
Antonio Sánchez
8719b9c5bc Disable test for 32-bit systems (e.g. ARM, i386)
Both i386 and 32-bit ARM do not define __uint128_t. On most systems, if
__uint128_t is defined, then so is the macro __SIZEOF_INT128__.

https://stackoverflow.com/questions/18531782/how-to-know-if-uint128-t-is-defined1
2020-05-28 17:40:15 +00:00
Yong Tang
8e1df5b082 Fix incorrect usage of if defined(EIGEN_ARCH_PPC) => if EIGEN_ARCH_PPC
This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)`
in `Eigen/Core` header.

In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined
as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true.
This causes issues when building on non PPC platform and `MatrixProduct.h` is not
available.

This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`.
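
The difference between the two preprocessor tests can be shown with a stand-in macro. `SKETCH_ARCH_PPC` below is hypothetical and plays the role of a macro that is always defined, as either 0 or 1.

```cpp
// Hypothetical architecture macro, defined as 0 on a non-PPC target.
#define SKETCH_ARCH_PPC 0

// defined(...) only asks whether the macro exists, so this branch is
// taken even though the value is 0.
#if defined(SKETCH_ARCH_PPC)
static const bool taken_with_defined = true;
#else
static const bool taken_with_defined = false;
#endif

// Testing the value itself selects the branch that matches the target.
#if SKETCH_ARCH_PPC
static const bool taken_with_value = true;
#else
static const bool taken_with_value = false;
#endif
```

Here `taken_with_defined` is `true` and `taken_with_value` is `false`, which mirrors why the `defined(...)` form pulled in PPC-only code on other platforms.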

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2020-05-28 05:53:44 -07:00
Kan Chen
4e7046063b Fix #1874: it works on both MSVC 2017 and other platforms. 2020-05-21 18:42:56 +08:00
Pedro Caldeira
2d67af2d2b Add pscatter for Packet16{u}c (int8) 2020-05-20 17:29:34 -03:00
David Tellenbach
5328cd62b3 Guard usage of decltype since it's a C++11 feature
This fixes https://gitlab.com/libeigen/eigen/-/issues/1897
2020-05-20 16:04:16 +02:00
Rasmus Munk Larsen
cc86a31e20 Add guard around specialization for bool, which is only currently implemented for SSE. 2020-05-19 16:21:56 -07:00
Everton Constantino
8a7f360ec3 - Vectorizing MMA packing.
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Rasmus Munk Larsen
a145e4adf5 Add newline at the end of StlIterators.h. 2020-05-15 20:36:00 +00:00
Gael Guennebaud
8ce9630ddb Fix #1874: workaround MSVC 2017 compilation issue. 2020-05-15 20:47:32 +02:00
Rasmus Munk Larsen
9b411757ab Add missing packet ops for bool, and make it pass the same packet op unit tests as other arithmetic types.
This change also contains a few minor cleanups:
  1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan,
     which can be done in other ways.
  2. Remove the "HasInsert" enum, which is no longer needed since we removed the
     corresponding packet ops.
  3. Add faster pselect op for Packet4i when SSE4.1 is supported.

Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.

Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                        Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------------
BM_TransposeInPlace<float>/4            9.77           9.77    71670320
BM_TransposeInPlace<float>/8           21.9           21.9     31929525
BM_TransposeInPlace<float>/16          66.6           66.6     10000000
BM_TransposeInPlace<float>/32         243            243        2879561
BM_TransposeInPlace<float>/59         844            844         829767
BM_TransposeInPlace<float>/64         933            933         750567
BM_TransposeInPlace<float>/128       3944           3945         177405
BM_TransposeInPlace<float>/256      16853          16853          41457
BM_TransposeInPlace<float>/512     204952         204968           3448
BM_TransposeInPlace<float>/1k     1053889        1053861            664
BM_TransposeInPlace<bool>/4            14.4           14.4     48637301
BM_TransposeInPlace<bool>/8            36.0           36.0     19370222
BM_TransposeInPlace<bool>/16           31.5           31.5     22178902
BM_TransposeInPlace<bool>/32          111            111        6272048
BM_TransposeInPlace<bool>/59          626            626        1000000
BM_TransposeInPlace<bool>/64          428            428        1632689
BM_TransposeInPlace<bool>/128        1677           1677         417377
BM_TransposeInPlace<bool>/256        7126           7126          96264
BM_TransposeInPlace<bool>/512       29021          29024          24165
BM_TransposeInPlace<bool>/1k       116321         116330           6068
2020-05-14 22:39:13 +00:00
Felipe Attanasio
d640276d31 Added support for reverse iterators for Vectorwise operations. 2020-05-14 22:38:20 +00:00
Christopher Moore
fa8fd4b4d5 Indexed view should have RowMajorBit when there is statically a single row 2020-05-14 22:11:19 +00:00
Christopher Moore
a187ffea28 Resolve "IndexedView of a vector should allow linear access" 2020-05-13 19:24:42 +00:00
Mark Eberlein
ba9d18b938 Add KLU support to spbenchsolver 2020-05-11 21:50:27 +00:00
Pedro Caldeira
5fdc179241 Add Altivec template functions to improve code reusability 2020-05-11 21:04:51 +00:00
mehdi-goli
d3e81db6c5 Eigen moved the ScanLauncher function inside the internal namespace.
This commit applies the following changes:
    - Moving the `ScanLauncher` specialization inside internal namespace to fix compiler crash on TensorScan for SYCL backend.
    - Replacing `SYCL/sycl.hpp` with `CL/sycl.hpp` in order to follow SYCL 1.2.1 standard.
    - minor fixes: commenting out an unused variable to avoid compiler warnings.
2020-05-11 16:10:33 +01:00
Rasmus Munk Larsen
c1d944dd91 Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
I cannot measure any performance changes for SSE, AVX, or AVX512.

name                                 old time/op             new time/op             delta
BM_LinSpace<float>/1                 1.63ns ± 0%             1.63ns ± 0%   ~             (p=0.762 n=5+5)
BM_LinSpace<float>/8                 4.92ns ± 3%             4.89ns ± 3%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/64                34.6ns ± 0%             34.6ns ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/512                217ns ± 0%              217ns ± 0%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/4k                1.68µs ± 0%             1.68µs ± 0%   ~             (p=1.000 n=5+5)
BM_LinSpace<float>/32k               13.3µs ± 0%             13.3µs ± 0%   ~             (p=0.905 n=5+4)
BM_LinSpace<float>/256k               107µs ± 0%              107µs ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/1M                 427µs ± 0%              427µs ± 0%   ~             (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
David Tellenbach
5c4e19fbe7 Possibility to specify user-defined default cache sizes for GEBP kernel
Some architectures have no convenient way to determine cache sizes at
runtime. Eigen's GEBP kernel falls back to default cache values in this
case which might not be correct in all situations.

This patch introduces three preprocessor directives

  `EIGEN_DEFAULT_L1_CACHE_SIZE`
  `EIGEN_DEFAULT_L2_CACHE_SIZE`
  `EIGEN_DEFAULT_L3_CACHE_SIZE`

to give users the possibility to set these default values explicitly.
2020-05-08 12:54:36 +02:00
Rasmus Munk Larsen
225ab040e0 Remove unused packet op "palign".
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen
74ec8e6618 Make size odd for transposeInPlace test to make sure we hit the scalar path. 2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen
49f1aeb60d Remove traits declaring NEON vectorized casts that do not actually have packet op implementations. 2020-05-07 09:49:22 -07:00
Rasmus Munk Larsen
2fd8a5a08f Add parallelization of TensorScanOp for types without packet ops.
Clean up the code a bit and do a few micro-optimizations to improve performance for small tensors.

Benchmark numbers for Tensor<uint32_t>:

name                                                       old time/op             new time/op             delta
BM_cumSumRowReduction_1T/8   [using 1 threads]             76.5ns ± 0%             61.3ns ± 4%    -19.80%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/64  [using 1 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.77%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/256 [using 1 threads]             39.8µs ± 0%             39.6µs ± 0%     -0.60%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/4k  [using 1 threads]             13.9ms ± 0%             13.4ms ± 1%     -4.19%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/8   [using 2 threads]             76.8ns ± 0%             59.1ns ± 0%    -23.09%          (p=0.016 n=5+4)
BM_cumSumRowReduction_2T/64  [using 2 threads]             2.47µs ± 1%             2.41µs ± 1%     -2.53%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/256 [using 2 threads]             39.8µs ± 0%             34.7µs ± 6%    -12.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/4k  [using 2 threads]             13.8ms ± 1%              7.2ms ± 6%    -47.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/8   [using 8 threads]             76.4ns ± 0%             61.8ns ± 3%    -19.02%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/64  [using 8 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.84%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/256 [using 8 threads]             39.8µs ± 0%             28.3µs ±11%    -28.75%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/4k  [using 8 threads]             13.8ms ± 0%              2.7ms ± 5%    -80.39%          (p=0.008 n=5+5)
BM_cumSumColReduction_1T/8   [using 1 threads]             59.1ns ± 0%             80.3ns ± 0%    +35.94%          (p=0.029 n=4+4)
BM_cumSumColReduction_1T/64  [using 1 threads]             3.06µs ± 0%             3.08µs ± 1%       ~             (p=0.114 n=4+4)
BM_cumSumColReduction_1T/256 [using 1 threads]              175µs ± 0%              176µs ± 0%       ~             (p=0.190 n=4+5)
BM_cumSumColReduction_1T/4k  [using 1 threads]              824ms ± 1%              844ms ± 1%     +2.37%          (p=0.008 n=5+5)
BM_cumSumColReduction_2T/8   [using 2 threads]             59.0ns ± 0%             90.7ns ± 0%    +53.74%          (p=0.029 n=4+4)
BM_cumSumColReduction_2T/64  [using 2 threads]             3.06µs ± 0%             3.10µs ± 0%     +1.08%          (p=0.016 n=4+5)
BM_cumSumColReduction_2T/256 [using 2 threads]              176µs ± 0%              189µs ±18%       ~             (p=0.151 n=5+5)
BM_cumSumColReduction_2T/4k  [using 2 threads]              836ms ± 2%              611ms ±14%    -26.92%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/8   [using 8 threads]             59.3ns ± 2%             90.6ns ± 0%    +52.79%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/64  [using 8 threads]             3.07µs ± 0%             3.10µs ± 0%     +0.99%          (p=0.016 n=5+4)
BM_cumSumColReduction_8T/256 [using 8 threads]              176µs ± 0%               80µs ±19%    -54.51%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/4k  [using 8 threads]              827ms ± 2%              180ms ±14%    -78.24%          (p=0.008 n=5+5)
2020-05-06 14:48:37 -07:00
Rasmus Munk Larsen
0e59f786e1 Fix accidental copy of loop variable. 2020-05-05 21:35:38 +00:00
Rasmus Munk Larsen
7b76c85daf Vectorize and parallelize TensorScanOp.
TensorScanOp is used in TensorFlow for a number of operations, such as cumulative logexp reduction and cumulative sum and product reductions.

The benchmarks numbers below are for cumulative row- and column reductions of NxN matrices.

name                                                         old time/op             new time/op     delta
BM_cumSumRowReduction_1T/4    [using 1 threads ]             25.1ns ± 1%             35.2ns ± 1%    +40.45%
BM_cumSumRowReduction_1T/8    [using 1 threads ]             73.4ns ± 0%             82.7ns ± 3%    +12.74%
BM_cumSumRowReduction_1T/32   [using 1 threads ]              988ns ± 0%              832ns ± 0%    -15.77%
BM_cumSumRowReduction_1T/64   [using 1 threads ]             4.07µs ± 2%             3.47µs ± 0%    -14.70%
BM_cumSumRowReduction_1T/128  [using 1 threads ]             18.0µs ± 0%             16.8µs ± 0%     -6.58%
BM_cumSumRowReduction_1T/512  [using 1 threads ]              287µs ± 0%              281µs ± 0%     -2.22%
BM_cumSumRowReduction_1T/2k   [using 1 threads ]             4.78ms ± 1%             4.78ms ± 2%       ~
BM_cumSumRowReduction_1T/10k  [using 1 threads ]              117ms ± 1%              117ms ± 1%       ~
BM_cumSumRowReduction_8T/4    [using 8 threads ]             25.0ns ± 0%             35.2ns ± 0%    +40.82%
BM_cumSumRowReduction_8T/8    [using 8 threads ]             77.2ns ±16%             81.3ns ± 0%       ~
BM_cumSumRowReduction_8T/32   [using 8 threads ]              988ns ± 0%              833ns ± 0%    -15.67%
BM_cumSumRowReduction_8T/64   [using 8 threads ]             4.08µs ± 2%             3.47µs ± 0%    -14.95%
BM_cumSumRowReduction_8T/128  [using 8 threads ]             18.0µs ± 0%             17.3µs ±10%       ~
BM_cumSumRowReduction_8T/512  [using 8 threads ]              287µs ± 0%               58µs ± 6%    -79.92%
BM_cumSumRowReduction_8T/2k   [using 8 threads ]             4.79ms ± 1%             0.64ms ± 1%    -86.58%
BM_cumSumRowReduction_8T/10k  [using 8 threads ]              117ms ± 1%               18ms ± 6%    -84.50%

BM_cumSumColReduction_1T/4    [using 1 threads ]             23.9ns ± 0%             33.4ns ± 1%    +39.68%
BM_cumSumColReduction_1T/8    [using 1 threads ]             71.6ns ± 1%             49.1ns ± 3%    -31.40%
BM_cumSumColReduction_1T/32   [using 1 threads ]              973ns ± 0%              165ns ± 2%    -83.10%
BM_cumSumColReduction_1T/64   [using 1 threads ]             4.06µs ± 1%             0.57µs ± 1%    -85.94%
BM_cumSumColReduction_1T/128  [using 1 threads ]             33.4µs ± 1%              4.1µs ± 1%    -87.67%
BM_cumSumColReduction_1T/512  [using 1 threads ]             1.72ms ± 4%             0.21ms ± 5%    -87.91%
BM_cumSumColReduction_1T/2k   [using 1 threads ]              119ms ±53%               11ms ±35%    -90.42%
BM_cumSumColReduction_1T/10k  [using 1 threads ]              1.59s ±67%              0.35s ±49%    -77.96%
BM_cumSumColReduction_8T/4    [using 8 threads ]             23.8ns ± 0%             33.3ns ± 0%    +40.06%
BM_cumSumColReduction_8T/8    [using 8 threads ]             71.6ns ± 1%             49.2ns ± 5%    -31.33%
BM_cumSumColReduction_8T/32   [using 8 threads ]             1.01µs ±12%             0.17µs ± 3%    -82.93%
BM_cumSumColReduction_8T/64   [using 8 threads ]             4.15µs ± 4%             0.58µs ± 1%    -86.09%
BM_cumSumColReduction_8T/128  [using 8 threads ]             33.5µs ± 0%              4.1µs ± 4%    -87.65%
BM_cumSumColReduction_8T/512  [using 8 threads ]             1.71ms ± 3%             0.06ms ±16%    -96.21%
BM_cumSumColReduction_8T/2k   [using 8 threads ]             97.1ms ±14%              3.0ms ±23%    -96.88%
BM_cumSumColReduction_8T/10k  [using 8 threads ]              1.97s ± 8%              0.06s ± 2%    -96.74%
2020-05-05 00:19:43 +00:00
Xiaoxiang Cao
a74a278abd Fix confusing template param name for Stride fwd decl. 2020-04-30 01:43:05 +00:00
Rasmus Munk Larsen
923ee9aba3 Fix the embarrassingly incomplete fix to the embarrassing bug in blocked transpose. 2020-04-29 17:27:36 +00:00
Rasmus Munk Larsen
a32923a439 Fix (embarrassing) bug in blocked transpose. 2020-04-29 17:02:27 +00:00
Rasmus Munk Larsen
1e41406c36 Add missing transpose in cleanup loop. Without it, we trip an assertion in debug mode. 2020-04-29 01:30:51 +00:00
Rasmus Munk Larsen
fbe7916c55 Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile. 2020-04-29 00:58:41 +00:00
Clément Grégoire
82f54ad144 Fix perf monitoring merge function 2020-04-28 17:02:59 +00:00
Rasmus Munk Larsen
ab773c7e91 Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests

This speeds up matmul for boolean matrices by about 10x

name                            old time/op             new time/op             delta
BM_MatMul<bool>/8                267ns ± 0%              479ns ± 0%  +79.25%          (p=0.008 n=5+5)
BM_MatMul<bool>/32              6.42µs ± 0%             0.87µs ± 0%  -86.50%          (p=0.008 n=5+5)
BM_MatMul<bool>/64              43.3µs ± 0%              5.9µs ± 0%  -86.42%          (p=0.008 n=5+5)
BM_MatMul<bool>/128              315µs ± 0%               44µs ± 0%  -85.98%          (p=0.008 n=5+5)
BM_MatMul<bool>/256             2.41ms ± 0%             0.34ms ± 0%  -85.68%          (p=0.008 n=5+5)
BM_MatMul<bool>/512             18.8ms ± 0%              2.7ms ± 0%  -85.53%          (p=0.008 n=5+5)
BM_MatMul<bool>/1k               149ms ± 0%               22ms ± 0%  -85.40%          (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
b47c777993 Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)

name                                       old time/op             new time/op             delta
BM_TransposeInPlace<float>/4               9.84ns ± 0%             6.51ns ± 0%  -33.80%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8               23.6ns ± 1%             17.6ns ± 0%  -25.26%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16              78.8ns ± 0%             60.3ns ± 0%  -23.50%          (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32               302ns ± 0%              229ns ± 0%  -24.40%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59              1.03µs ± 0%             0.84µs ± 1%  -17.87%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64              1.20µs ± 0%             0.89µs ± 1%  -25.81%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128             8.96µs ± 0%             3.82µs ± 2%  -57.33%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256              152µs ± 3%               17µs ± 2%  -89.06%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512              837µs ± 1%              208µs ± 0%  -75.15%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k              4.28ms ± 2%             1.08ms ± 2%  -74.72%          (p=0.008 n=5+5)
2020-04-28 16:08:16 +00:00
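The blocking idea behind b47c777993 can be sketched in scalar code. This is not Eigen's implementation, which transposes tiles with packet operations; the function names and the row-major `float*` layout are assumptions made for illustration, and the sketch assumes C++14.

```cpp
// Scalar sketch of a blocked in-place transpose of an n x n row-major matrix.
// Each tile and its mirror tile are finished before moving on, so the cache
// lines they touch are reused instead of being revisited once per element.
constexpr void transpose_in_place_blocked(float* a, int n, int block) {
  for (int i0 = 0; i0 < n; i0 += block) {
    for (int j0 = i0; j0 < n; j0 += block) {
      const int i_end = i0 + block < n ? i0 + block : n;
      const int j_end = j0 + block < n ? j0 + block : n;
      for (int i = i0; i < i_end; ++i) {
        // In a diagonal tile, visit only entries above the diagonal so that
        // each pair (i, j) is swapped exactly once.
        for (int j = (j0 == i0 ? i + 1 : j0); j < j_end; ++j) {
          const float tmp = a[i * n + j];
          a[i * n + j] = a[j * n + i];
          a[j * n + i] = tmp;
        }
      }
    }
  }
}

// Checks the sketch on a 5x5 matrix; 5 is odd on purpose so that the last
// tile in each direction is partial.
constexpr bool transposes_5x5(int block) {
  float a[25] = {};
  for (int k = 0; k < 25; ++k) a[k] = static_cast<float>(k);
  transpose_in_place_blocked(a, 5, block);
  for (int i = 0; i < 5; ++i)
    for (int j = 0; j < 5; ++j)
      if (a[i * 5 + j] != static_cast<float>(j * 5 + i)) return false;
  return true;
}
```

The tile size is a tuning parameter; any positive value gives the same result, only the memory access pattern changes.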
Pedro Caldeira
29f0917a43 Add support to vector instructions to Packet16uc and Packet16c 2020-04-27 12:48:08 -03:00
Rasmus Munk Larsen
e80ec24357 Remove unused packet op "preduxp". 2020-04-23 18:17:14 +00:00
René Wagner
0aebe19aca BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers.
This enables operator== on Eigen matrices in device code.
2020-04-23 17:25:08 +02:00
Eugene Zhulenev
3c02fefec5 Add async evaluation support to TensorSlicingOp.
Device::memcpy is not async-safe and might lead to deadlocks. Always evaluate slice expression in async mode.
2020-04-22 19:55:01 +00:00
Pedro Caldeira
0c67b855d2 Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec vector operations 2020-04-21 14:52:46 -03:00
Rasmus Munk Larsen
e8f40e4670 Fix bug in ptrue for Packet16b. 2020-04-20 21:45:10 +00:00
Rasmus Munk Larsen
2f6ddaa25c Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical and of two NxN tensors:

name                                               old time/op             new time/op             delta
BM_booleanAnd_1T/3   [using 1 threads]             14.6ns ± 0%             14.4ns ± 0%   -0.96%
BM_booleanAnd_1T/4   [using 1 threads]             20.5ns ±12%              9.0ns ± 0%  -56.07%
BM_booleanAnd_1T/7   [using 1 threads]             41.7ns ± 0%             10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8   [using 1 threads]             52.1ns ± 0%             10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10  [using 1 threads]             76.3ns ± 0%             13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15  [using 1 threads]              167ns ± 0%               16ns ± 0%  -90.45%
BM_booleanAnd_1T/16  [using 1 threads]              188ns ± 0%               16ns ± 0%  -91.57%
BM_booleanAnd_1T/31  [using 1 threads]              667ns ± 0%               34ns ± 0%  -94.83%
BM_booleanAnd_1T/32  [using 1 threads]              710ns ± 0%               35ns ± 0%  -95.01%
BM_booleanAnd_1T/64  [using 1 threads]             2.80µs ± 0%             0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128 [using 1 threads]             11.2µs ± 0%              0.4µs ± 0%  -96.11%
BM_booleanAnd_1T/256 [using 1 threads]             44.6µs ± 0%              2.5µs ± 0%  -94.31%
BM_booleanAnd_1T/512 [using 1 threads]              178µs ± 0%               10µs ± 0%  -94.35%
BM_booleanAnd_1T/1k  [using 1 threads]              717µs ± 0%               78µs ± 1%  -89.07%
BM_booleanAnd_1T/2k  [using 1 threads]             2.87ms ± 0%             0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k  [using 1 threads]             11.7ms ± 0%              1.9ms ± 4%  -83.55%
BM_booleanAnd_1T/10k [using 1 threads]             70.3ms ± 0%             17.2ms ± 4%  -75.48%
2020-04-20 20:16:28 +00:00
dlazenby
00f6340153 Update PreprocessorDirectives.dox - Added line for the new VectorwiseOp plugin directive (and re-alphabetized the plugin section) 2020-04-17 21:43:37 +00:00
Rasmus Munk Larsen
5ab87d8aba Move eigen_packet_wrapper to GenericPacketMath.h and use it for SSE/AVX/AVX512 as it is already used for NEON.
This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i.
Use this mechanism to define packets for half and clean up the packet op implementations.
2020-04-15 18:17:19 +00:00
Rasmus Munk Larsen
4aae8ac693 Fix typo in TypeCasting.h 2020-04-14 02:55:51 +00:00
Rasmus Munk Larsen
1d674003b2 Fix bug in vectorized casting of
{uint8, int8} -> {int16, uint16, int32, uint32, float} 
 {uint16, int16} -> {int32, uint32, int64, uint64, float} 

for NEON. These conversions were advertised as vectorized, but not actually implemented.
2020-04-14 02:11:06 +00:00
Changming Sun
b1aa07a8d3 Fix a bug in TensorIndexList.h 2020-04-13 18:22:03 +00:00
Christoph Hertzberg
d46d726e9d CommaInitializer wrongfully asserted for 0-sized blocks
commainitializer unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.
2020-04-13 16:41:20 +02:00
Antonio Sanchez
c854e189e6 Fixed commainitializer test.
The removed `finished()` call was responsible for enforcing that the
initializer was provided the correct number of values. Putting it back in
to restore previous behavior.
2020-04-10 13:53:26 -07:00
jangsoopark
39142904cc Resolve C4346 when building eigen on windows 2020-04-08 14:55:39 +09:00
Rasmus Munk Larsen
f0577a2bfd Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path.
Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.

Measured improvement with AVX+FMA:

name                           old time/op             new time/op             delta
BM_MatMul_ATB/8                 139ns ± 1%              129ns ± 1%   -7.49%          (p=0.008 n=5+5)
BM_MatMul_ATB/32               1.46µs ± 1%             1.22µs ± 0%  -16.72%          (p=0.008 n=5+5)
BM_MatMul_ATB/64               8.43µs ± 1%             7.41µs ± 0%  -12.04%          (p=0.008 n=5+5)
BM_MatMul_ATB/128              56.8µs ± 1%             52.9µs ± 1%   -6.83%          (p=0.008 n=5+5)
BM_MatMul_ATB/256               407µs ± 1%              395µs ± 3%   -2.94%          (p=0.032 n=5+5)
BM_MatMul_ATB/512              3.27ms ± 3%             3.18ms ± 1%     ~             (p=0.056 n=5+5)


Measured improvement for AVX512:

name                          old time/op             new time/op             delta
BM_MatMul_ATB/8                167ns ± 1%              154ns ± 1%   -7.63%          (p=0.008 n=5+5)
BM_MatMul_ATB/32              1.08µs ± 1%             0.83µs ± 3%  -23.58%          (p=0.008 n=5+5)
BM_MatMul_ATB/64              6.21µs ± 1%             5.06µs ± 1%  -18.47%          (p=0.008 n=5+5)
BM_MatMul_ATB/128             36.1µs ± 2%             31.3µs ± 1%  -13.32%          (p=0.008 n=5+5)
BM_MatMul_ATB/256              263µs ± 2%              242µs ± 2%   -7.92%          (p=0.008 n=5+5)
BM_MatMul_ATB/512             1.95ms ± 2%             1.91ms ± 2%     ~             (p=0.095 n=5+5)
BM_MatMul_ATB/1k              15.4ms ± 4%             14.8ms ± 2%     ~             (p=0.095 n=5+5)
2020-04-07 22:09:51 +00:00
Antonio Sanchez
8e875719b3 Replace norm() with squaredNorm() to address integer overflows
For random matrices with integer coefficients, many of the tests here lead to
integer overflows. When taking the norm() of a row/column, the squaredNorm()
often overflows to a negative value, leading to domain errors when taking the
sqrt(). This leads to a crash on some systems. By replacing the norm() call by
a squaredNorm(), the values still overflow, but at least there is no domain
error.

Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
2020-04-07 19:48:28 +00:00
Antonio Sanchez
9dda5eb7d2 Missing struct definition in NumTraits 2020-04-07 09:01:11 -07:00
Akshay Naresh Modi
bcc0e9e15c Add numeric_limits min and max for bool
This will allow (among other things) computation of argmax and argmin of bool tensors
2020-04-06 23:38:57 +00:00
Bernardo Bahia Monteiro
54a0a9c9dd Bugfix: conjugate_gradient did not compile with lazy-evaluated RealScalar
The error generated by the compiler was:

    no matching function for call to 'maxi'
    RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);

The important part in the following notes was:

    candidate template ignored: deduced conflicting
    types for parameter 'T'"
    ('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
    EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)

I am using CoDiPack to provide the RealScalar type.
This bug was introduced in bc000deaa Fix conjugate-gradient for very small rhs
2020-03-29 19:44:12 -04:00
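The deduction conflict quoted in 54a0a9c9dd can be reproduced without CoDiPack. In this sketch `sketch::maxi` is a simplified stand-in for `numext::maxi`, and `LazyProduct` is a hypothetical expression-template-like type. The log does not say how the fix was made; naming the template argument explicitly is shown here only as one common way to resolve such a conflict.

```cpp
namespace sketch {

// Simplified stand-in for numext::maxi: both arguments must deduce the same T.
template <typename T>
constexpr T maxi(const T& x, const T& y) {
  return x < y ? y : x;
}

// Hypothetical lazily evaluated product; it only becomes a double on conversion.
struct LazyProduct {
  double lhs, rhs;
  constexpr operator double() const { return lhs * rhs; }
};

constexpr double threshold(double tol, double rhs_norm2, double consider_as_zero) {
  // maxi(LazyProduct{...}, consider_as_zero) would fail to compile: T is
  // deduced as LazyProduct from the first argument and double from the second.
  // Naming T explicitly converts the lazy expression before the comparison.
  return maxi<double>(LazyProduct{tol * tol, rhs_norm2}, consider_as_zero);
}

}  // namespace sketch
```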
Rasmus Munk Larsen
4fd5d1477b Fix packetmath test build for AVX. 2020-03-27 17:05:39 +00:00
Rasmus Munk Larsen
393dbd8ee9 Fix bug in 52d54278be 2020-03-27 16:42:18 +00:00
Rasmus Munk Larsen
55c8fe8d0f Fix bug in 52d54278be 2020-03-27 16:41:15 +00:00
Joel Holdsworth
6d2dbfc453 NEON: Fixed MSVC types definitions 2020-03-26 20:19:58 +00:00
Joel Holdsworth
52d54278be Additional NEON packet-math operations 2020-03-26 20:18:19 +00:00
Everton Constantino
deb93ed1bf Adhere to recommended load/store intrinsics for ppc64le 2020-03-23 15:18:15 -03:00
Aaron Franke
5c22c7a7de Make file formatting comply with POSIX and Unix standards
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
Everton Constantino
5afdaa473a Fixing float32's pround halfway criteria to match STL's criteria. 2020-03-21 22:30:54 -05:00
Alessio M
96cd1ff718 Fixed:
- access violation when initializing 0x0 matrices
- exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert is configured to throw
2020-03-21 05:11:21 +00:00
dlazenby
cc954777f2 Update VectorwiseOp.h to allow Plugins similar to MatrixBase.h or ArrayBase.h 2020-03-20 19:30:01 +00:00
Masaki Murooka
55ecd58a3c Bug https://gitlab.com/libeigen/eigen/-/issues/1415: add missing EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base. 2020-03-20 13:37:37 +09:00
Rasmus Munk Larsen
4da2c6b197 Remove reference to non-existent unary_op_base class. 2020-03-19 18:23:06 +00:00
Rasmus Munk Larsen
eda90baf35 Add missing arguments to numext::absdiff(). 2020-03-19 18:16:55 +00:00
Joel Holdsworth
d5c665742b Add absolute_difference coefficient-wise binary Array function 2020-03-19 17:45:20 +00:00
Everton Constantino
6ff5a14091 Reenabling packetmath unsigned tests, adding dummy pabs for relevant unsigned
types.
2020-03-19 17:31:49 +00:00
Joel Holdsworth
232f904082 Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions 2020-03-19 17:24:06 +00:00
Joel Holdsworth
54aa8fa186 Implement integer square-root for NEON 2020-03-19 17:05:13 +00:00
Allan Leal
37ccb86916 Update NullaryFunctors.h 2020-03-16 11:59:02 +00:00
Deven Desai
7158ed4e0e Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as the Eigen::Half packet type 2020-03-12 01:06:24 +00:00
Joel Holdsworth
d53ae40f7b NEON: Added int64_t and uint64_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
4b9ecf2924 NEON: Added int8_t and uint8_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
ceaabd4e16 NEON: Added int16_t and uint16_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
d5d3cf9339 NEON: Added uint32_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
eacf97f727 NEON: Implemented half-size vectors 2020-03-10 22:46:19 +00:00
Joel Holdsworth
5f411b729e NEON: Set packet_traits<double> flags 2020-03-10 22:46:19 +00:00
Joel Holdsworth
88337acae2 test/packetmath: Add tests for all integer types 2020-03-10 22:46:19 +00:00
Joel Holdsworth
9e68977578 test/packetmath: Made negate non-mandatory 2020-03-10 22:46:19 +00:00
Sami Kama
b733b8b680 remove duplicate pset1 for half and add some comments about why we need to expose pmul/add/div/min/max on host 2020-03-10 20:28:43 +00:00
Ram-Z
a45d28256d Don't restrict CMAKE_BUILD_TYPE
This prevents projects that add Eigen using `add_subdirectory` from using their own custom CMAKE_BUILD_TYPE and have Eigen respect the same custom flags.
2020-02-28 20:46:53 +00:00
Cédric Hubert
98bfc5aaa8 Update MarketIO.h 2020-02-28 12:41:51 +00:00
Rasmus Munk Larsen
52a2fbbb00 Revert "avoid selecting half-packets when unnecessary"
This reverts commit 5ca10480b0
2020-02-25 01:07:43 +00:00
Rasmus Munk Larsen
235bcfe08d Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE"
This reverts commit 44df2109c8
2020-02-25 01:07:28 +00:00
Rasmus Munk Larsen
d7a42eade6 Revert "do not pick full-packet if it'd result in more operations"
This reverts commit e9cc0cd353
2020-02-25 01:07:15 +00:00
Rasmus Munk Larsen
6ac37768a9 Revert "add some static checks for packet-picking logic"
This reverts commit 7769600245
2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen
87cfa4862f Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX."
This reverts commit b625adffd8
2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen
b625adffd8 Disable test in test/vectorization_logic.cpp, which is currently failing with AVX. 2020-02-24 23:28:25 +00:00
Tobias Bosch
f0ce88cff7 Include <sstream> explicitly, and don't rely on the implicit include via <complex>.
This implicit dependency no longer exists in a recent LLVM release (sha 78be61871704).
2020-02-24 23:09:36 +00:00
Ilya Tokar
eb6cc29583 Avoid a division in NonBlockingThreadPool::Steal.
Looking at profiles we spend ~10-20% of Steal on simply computing
random % size. We can reduce random 32-bit int into [0, size) range with
a single multiplication and shift. This transformation is described in
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
2020-02-14 16:02:57 -05:00
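The multiply-and-shift reduction referenced in eb6cc29583 can be written in a few lines. This is a generic sketch of the technique from the linked article, not the code in NonBlockingThreadPool; the function name `reduce_to_range` is hypothetical.

```cpp
#include <cstdint>

// Maps a uniformly distributed 32-bit value into [0, size) using one 64-bit
// multiplication and one shift instead of a modulo (division).
constexpr std::uint32_t reduce_to_range(std::uint32_t random, std::uint32_t size) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(random) * size) >> 32);
}
```

The result is generally not the same number that `random % size` would give, but it always lies in `[0, size)`, which is all that picking a victim queue requires.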
Francesco Mazzoli
7769600245 add some static checks for packet-picking logic 2020-02-07 18:16:16 +01:00
Francesco Mazzoli
e9cc0cd353 do not pick full-packet if it'd result in more operations
See comment and
<https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952>.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
44df2109c8 Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE
See comment for details.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
5ca10480b0 avoid selecting half-packets when unnecessary
See
<https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation>
for an explanation of the problem this solves.

In short, for some reason, before this commit the half-packet is
selected when the array / matrix size is not a multiple of
`unpacket_traits<PacketType>::size`, where `PacketType` starts out
being the full Packet.

For example, for some data of 100 `float`s, `Packet4f` will be
selected rather than `Packet8f`, because 100 is not a multiple of 8,
the size of `Packet8f`.

This commit switches to selecting the half-packet if the size is
less than the packet size, which seems to make more sense.

As I stated in the SO post I'm not sure that I'm understanding the
issue correctly, but this fix resolves the issue in my program. Moreover,
`make check` passes, with the exception of lines 614 and 616 in
`test/packetmath.cpp`, which however also fail on master on my machine:

    CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0);
    ...
    CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);
2020-02-07 18:16:16 +01:00
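The change in selection rule described above can be sketched with two small functions. This is a simplified model, assuming a full packet of 8 floats and a half packet of 4; the names are hypothetical and Eigen's real packet-picking logic is template metaprogramming with more conditions.

```cpp
// Hypothetical stand-ins for unpacket_traits<Packet8f>::size and
// unpacket_traits<Packet4f>::size.
constexpr int kFullPacketSize = 8;
constexpr int kHalfPacketSize = 4;

// Rule before the commit, as described in the message: use the half packet
// whenever the size is not a multiple of the full packet size.
constexpr int OldPacketSize(int size) {
  return size % kFullPacketSize == 0 ? kFullPacketSize : kHalfPacketSize;
}

// Rule after the commit: use the half packet only when the data is smaller
// than one full packet.
constexpr int NewPacketSize(int size) {
  return size < kFullPacketSize ? kHalfPacketSize : kFullPacketSize;
}
```

With 100 floats the old rule yields the 4-wide packet and the new rule yields the 8-wide one, matching the example in the commit message.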
Eugene Zhulenev
f584bd9b30 Fail at compile time if default executor tries to use non-default device 2020-02-06 22:43:24 +00:00
Eugene Zhulenev
3fda850c46 Remove dead code from TensorReduction.h 2020-01-29 18:45:31 +00:00
Jeff Daily
b5df8cabd7 fix hip-clang compilation due to new HIP scalar accessor 2020-01-20 21:08:52 +00:00
Deven Desai
6d284bb1b7 Fix for HIP breakage - 200115. Adding a missing EIGEN_DEVICE_FUNC attr 2020-01-16 00:51:43 +00:00
Srinivas Vasudevan
f6c6de5d63 Ensure Igamma does not NaN or Inf for large values. 2020-01-14 21:32:48 +00:00
Rasmus Munk Larsen
6601abce86 Remove rogue include in TypeCasting.h. Meta.h is already included by the top-level header in Eigen/Core. 2020-01-14 21:03:53 +00:00
Eugene Zhulenev
b9362fb8f7 Convert StridedLinearBufferCopy::Kind to enum class 2020-01-13 11:43:24 -08:00
Everton Constantino
5a8b97b401 Switching unpacket_traits<Packet4i> to vectorizable=true. 2020-01-13 16:08:20 -03:00
Everton Constantino
42838c28b8 Adding correct cache sizes for PPC architecture. 2020-01-13 16:58:14 +00:00
Christoph Hertzberg
1d0c45122a Removing executable bit from file mode 2020-01-11 15:02:29 +01:00
Christoph Hertzberg
35219cea68 Bug #1790: Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal). 2020-01-11 14:57:22 +01:00
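As an illustration of why bitwise comparison is the wrong test here, the sketch below builds two NaNs with different payload bits. The helper names are hypothetical and this is not the code from the test suite; it assumes IEEE 754 single precision floats.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Build a float from raw bits (illustration helper only).
inline float FromBits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// True if the two floats have identical bit patterns.
inline bool BitwiseEqual(float a, float b) {
  return std::memcmp(&a, &b, sizeof a) == 0;
}

// Treat any two NaNs as matching, regardless of their payload bits.
inline bool BothNaN(float a, float b) {
  return std::isnan(a) && std::isnan(b);
}
```

For example, `FromBits(0x7fc00000u)` and `FromBits(0x7fc00001u)` are both quiet NaNs, yet their bit patterns differ.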
Srinivas Vasudevan
2e099e8d8f Added special_packetmath test and tweaked bounds on tests.
Refactor shared packetmath code to header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00
Rasmus Munk Larsen
e1ecfc162d Explicitly call ::rint and ::rintf for targets without C++11. Without this, the Windows build breaks when trying to compile numext::rint<double>. 2020-01-10 21:14:08 +00:00
Joel Holdsworth
da5a7afed0 Improvements to the tidiness and completeness of the NEON implementation 2020-01-10 18:31:15 +00:00
Anuj Rawat
452371cead Fix for gcc build error when using Eigen headers with AVX512 2020-01-10 18:05:42 +00:00
mehdi-goli
601f89dfd0 Adding RInt vector support for SYCL. 2020-01-10 18:00:36 +00:00
Matthew Powelson
2ea5a715cf Properly initialize b vector in SplineFitting
InterpolateWithDerivative does not initialize the b vector correctly. This issue is discussed in Stack Overflow question 48382939.
2020-01-09 21:29:04 +00:00
Rasmus Munk Larsen
9254974115 Don't add EIGEN_DEVICE_FUNC to random() since ::rand is not available in Cuda. 2020-01-09 21:23:09 +00:00
Rasmus Munk Larsen
a3ec89b5bd Add missing EIGEN_DEVICE_FUNC annotations in MathFunctions.h. 2020-01-09 21:06:34 +00:00
Christoph Hertzberg
8333e03590 Use data.data() instead of &data (since it is not obvious that Array is trivially copyable) 2020-01-09 11:38:19 +01:00
Rasmus Munk Larsen
e6fcee995b Don't use the rational approximation to the logistic function on GPUs as it appears to be slightly slower. 2020-01-09 00:04:26 +00:00
Rasmus Munk Larsen
4217a9f090 The upper limits for where to use the rational approximation to the logistic function were not set carefully enough in the original commit, and some arguments would cause the function to return values greater than 1. This change sets the limits to the values found by scanning all floating point numbers (using std::nextafterf()). 2020-01-08 22:21:37 +00:00
Christoph Hertzberg
9623c0c4b9 Fix formatting 2020-01-08 13:58:18 +01:00
Ilya Tokar
19876ced76 Bug #1785: Introduce numext::rint.
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e. g. floor (tested SSE/AVX).
2020-01-07 21:22:44 +00:00
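The difference between the two standard rounding functions that this commit refers to can be seen with plain `<cmath>` calls. This is a minimal sketch using only the standard library, not `numext::rint` or `pround`; the wrapper names are hypothetical, and it assumes the default round-to-nearest floating point mode.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// std::rint rounds according to the current rounding mode; in the default
// mode (FE_TONEAREST) halfway cases go to the nearest even integer.
inline double RoundHalfToEven(double x) {
  assert(std::fegetround() == FE_TONEAREST);  // assumption of this sketch
  return std::rint(x);
}

// std::round always rounds halfway cases away from zero.
inline double RoundHalfAway(double x) {
  return std::round(x);
}
```

The two only disagree on exact halfway values such as 0.5 or 2.5.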
mehdi-goli
d0ae052da4 [SYCL Backend]
* Adding missing operations for vector comparison in SYCL. Their absence caused a compiler error for vector comparisons when compiling for SYCL.
* Fixing the compiler error for placement new in TensorForcedEval.h, which occurred when compiling the SYCL backend.
* Reducing SYCL warnings by removing the abort function inside the kernel.
* Adding strong inline to functions inside SYCL interop.
2020-01-07 15:13:37 +00:00
Everton Constantino
eedb7eeacf Protecting integer_types's long long test with a check to see if we have CXX11 support. 2020-01-07 14:35:35 +00:00
Christoph Hertzberg
bcbaad6d87 Bug #1800: Guard against misleading indentation 2020-01-03 13:47:43 +01:00
Janek Kozicki
00de570793 Fix -Werror -Wfloat-conversion warning. 2019-12-23 23:52:44 +01:00
Deven Desai
636e2bb3fa Fix for HIP breakage - 191220
The breakage was introduced by the following commit :

ae07801dd8

After the commit, HIPCC errors out on some tests with the following error

```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_device_1.dir/cxx11_tensor_device_1_generated_cxx11_tensor_device.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_device.cu:17:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:100:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:129:12: error: no matching constructor for initialization of 'Eigen::internal::TensorBlockResourceRequirements'
    return {merge(lhs.shape_type, rhs.shape_type),           // shape_type
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided
struct TensorBlockResourceRequirements {
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 5 arguments, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 3 were provided
...
...
```

The fix is to explicitly declare the (implicitly called) constructor as a device func
2019-12-20 21:28:00 +00:00
Christoph Hertzberg
1e9664b147 Bug #1796: Make matrix squareroot usable for Map and Ref types 2019-12-20 18:10:22 +01:00
Christoph Hertzberg
d86544d654 Reduce code duplication and avoid confusing Doxygen 2019-12-19 19:48:39 +01:00
Christoph Hertzberg
dde279f57d Hide recursive meta templates from Doxygen 2019-12-19 19:47:23 +01:00
Christoph Hertzberg
c21771ac04 Use double-braces initialization (as everywhere else in the test-suite). 2019-12-19 19:20:48 +01:00
Christoph Hertzberg
a3273aeff8 Fix trivial shadow warning 2019-12-19 19:13:11 +01:00
Christoph Hertzberg
870e53c0f2 Bug #1788: Fix rule-of-three violations inside the stable modules.
This fixes deprecated-copy warnings when compiling with GCC>=9
Also protect some additional Base-constructors from getting called by user code (#1587)
2019-12-19 17:30:11 +01:00
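For context, the rule of three says that a class which defines one of copy constructor, copy assignment operator, or destructor should usually define all three. The sketch below shows a hypothetical class that follows the rule; it is not taken from Eigen. Dropping the explicit copy constructor while keeping the user-provided assignment operator is the kind of pattern that newer GCC versions flag as a deprecated implicit copy.

```cpp
#include <cassert>

// Hypothetical example type, written in C++03 style.
struct Handle {
  Handle() : value(0) {}
  // All three special members are declared together.
  Handle(const Handle& other) : value(other.value) {}
  Handle& operator=(const Handle& other) {
    value = other.value;
    return *this;
  }
  ~Handle() {}

  int value;
};
```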
Christoph Hertzberg
6965f6de7f Fix unit-test which I broke in previous fix 2019-12-19 13:42:14 +01:00
Eugene Zhulenev
7a65219a2e Fix TensorPadding bug in squeezed reads from inner dimension 2019-12-19 05:43:57 +00:00
Eugene Zhulenev
73e55525e5 Return const data pointer from TensorRef evaluator.data() 2019-12-18 23:19:36 +00:00
Eugene Zhulenev
ae07801dd8 Tensor block evaluation cost model 2019-12-18 20:07:00 +00:00
Christoph Hertzberg
72166d0e6e Fix some maybe-uninitialized warnings 2019-12-18 18:26:20 +01:00
Christoph Hertzberg
5a3eaf88ac Workaround class-memaccess warnings on newer GCC versions 2019-12-18 16:37:26 +01:00
Jeff Daily
de07c4d1c2 fix compilation due to new HIP scalar accessor 2019-12-17 20:27:30 +00:00
Eugene Zhulenev
788bef6ab5 Reduce block evaluation overhead for small tensor expressions 2019-12-17 19:06:14 +00:00
Rasmus Munk Larsen
7252163335 Add default definition for EIGEN_PREDICT_* 2019-12-16 22:31:59 +00:00
Rasmus Munk Larsen
a566074480 Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
  - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
  - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).
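
A branch-prediction hint of the kind described in the first item can be sketched as below. GCC and Clang expose such hints as `__builtin_expect`; the `SKETCH_PREDICT_*` macros and the `CountBelow` function are hypothetical names for illustration, not Eigen's definitions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macros: hint the expected outcome on GCC/Clang, and fall back
// to the plain condition elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#define SKETCH_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
#else
#define SKETCH_PREDICT_FALSE(x) (x)
#define SKETCH_PREDICT_TRUE(x) (x)
#endif

// Count entries below a threshold, hinting that such entries are rare so the
// compiler can lay out the common path as the fall-through.
inline std::size_t CountBelow(const float* data, std::size_t n, float threshold) {
  assert(data != nullptr || n == 0);
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (SKETCH_PREDICT_FALSE(data[i] < threshold)) ++count;
  }
  return count;
}
```

The hint only affects code layout and never changes the result of the condition.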

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below repeat calls

   u = v.logistic()  (u = v.tanh(), respectively)

where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467           4468         155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347           2347         299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467           1467         476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805            805         858696  model_time: 1463

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589           2590         270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428           1428         489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059           1059         662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673            673        1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2391           2391         292624  model_time: 4242
AVX
BM_eigen_tanh_float        1256           1256         554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float         823            823         866267  model_time: 1609
AVX512
BM_eigen_tanh_float         443            443        1578999  model_time: 805

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2588           2588         273531  model_time: 4242
AVX
BM_eigen_tanh_float        1536           1536         452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float        1007           1007         694681  model_time: 1609
AVX512
BM_eigen_tanh_float         471            471        1472178  model_time: 805
2019-12-16 21:33:42 +00:00
Christoph Hertzberg
8e5da71466 Resolve double-promotion warnings when compiling with clang.
`sin` was calling `sin(double)` instead of `std::sin(float)`
2019-12-13 22:46:40 +01:00
Christoph Hertzberg
9b7a2b43c2 Renamed .hgignore to .gitignore (removing hg-specific "syntax" line) 2019-12-13 19:40:57 +01:00
Ilya Tokar
06e99aaf40 Bug 1785: fix pround on x86 to use the same rounding mode as std::round.
This also adds pset1frombits helper to Packet[24]d.
Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after,
still an order of magnitude faster than scalar version: 33.8µs ± 2%.
2019-12-12 17:38:53 -05:00
Rasmus Munk Larsen
73a8d572f5 Clamp tanh approximation outside [-c, c] where c is the smallest value where the approximation is exactly +/-1. Without FMA, c = 7.90531110763549805, with FMA c = 7.99881172180175781. 2019-12-12 19:34:25 +00:00
Srinivas Vasudevan
88062b7fed Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one. 2019-12-12 01:56:54 +00:00
Eugene Zhulenev
381f8f3139 Initialize non-trivially constructible types when allocating a temp buffer. 2019-12-12 01:31:30 +00:00
Eugene Zhulenev
64272c7f40 Squeeze reads from two inner dimensions in TensorPadding 2019-12-11 16:54:51 -08:00
Eugene Zhulenev
963ba1015b Add back accidentally deleted default constructor to TensorExecutorTilingContext. 2019-12-11 18:47:55 +00:00
Joel Holdsworth
1b6e0395e6 Added io test 2019-12-11 18:22:57 +00:00
Joel Holdsworth
3c0ef9f394 IO: Fixed printing of char and unsigned char matrices 2019-12-11 18:22:57 +00:00
Joel Holdsworth
e87af0ed37 Added Eigen::numext typedefs for uint8_t, int8_t, uint16_t and int16_t 2019-12-11 18:22:57 +00:00
Gael Guennebaud
15b3bcfca0 Bug 1786: fix compilation with MSVC 2019-12-11 16:16:38 +01:00
Eugene Zhulenev
c9220c035f Remove block memory allocation required by removed block evaluation API 2019-12-10 17:15:55 -08:00
Eugene Zhulenev
1c879eb010 Remove V2 suffix from TensorBlock 2019-12-10 15:40:23 -08:00
Eugene Zhulenev
dbca11e880 Remove TensorBlock.h and old TensorBlock/BlockMapper 2019-12-10 14:31:44 -08:00
Deven Desai
c49f0d851a Fix for HIP breakage detected on 191210
The following commit introduces compile errors when running eigen with hipcc

2918f85ba9

hipcc errors out because it requires the device attribute on the methods within the TensorBlockV2ResourceRequirements struct introduced by the commit above. The fix is to add the device attribute to those methods
2019-12-10 22:14:05 +00:00
Eugene Zhulenev
2918f85ba9 Do not use std::vector in getResourceRequirements 2019-12-09 16:19:55 -08:00
Artem Belevich
8056a05b54 Undo the block size change.
.z *is* used by the EigenContractionKernelInternal().
2019-12-09 11:10:29 -08:00
Eugene Zhulenev
dbb703d44e Add async evaluation support to TensorSelectOp 2019-12-09 18:36:13 +00:00
Janek Kozicki
11d6465326 fix AlignedVector3 inconsistent interface with other Vector classes, default constructor and operator- were missing. 2019-12-06 21:07:39 +01:00
Eugene Zhulenev
bb7ccac3af Add recursive work splitting to EvalShardedByInnerDimContext 2019-12-05 14:51:49 -08:00
Artem Belevich
25230d1862 Improve performance of contraction kernels
* Force-inline implementations. They pass around pointers to shared memory
  blocks. Without inlining compiler must operate via generic pointers.
  Inlining allows compiler to detect that we're operating on shared memory
  which allows generation of substantially faster code.

* Fixed a long-standing typo which resulted in launching 8x more kernels
  than we needed (.z dimension of the block is unused by the kernel).
2019-12-05 12:48:34 -08:00
Gael Guennebaud
08eeb648ea update hg to git hashes 2019-12-05 16:33:24 +01:00
Rasmus Munk Larsen
366cf005b0 Add missing initialization in cxx11_tensor_trace.cpp. 2019-12-04 23:56:37 +00:00
Gael Guennebaud
c488b8b32f Replace calls to "hg" by calls to "git" 2019-12-04 11:24:06 +01:00
Gael Guennebaud
8fbe0e4699 Update old links to bitbucket to point to gitlab.com 2019-12-04 10:57:07 +01:00
Gael Guennebaud
114a15c66a Added tag before-git-migration for changeset a7c7d329d8 2019-12-04 10:06:00 +01:00
363 changed files with 35327 additions and 12974 deletions


@@ -1,4 +1,3 @@
syntax: glob
qrc_*cxx
*.orig
*.pyc
@@ -36,3 +35,4 @@ lapack/reference
.*project
.settings
Makefile
!ci/build.gitlab-ci.yml

20
.gitlab-ci.yml Normal file

@@ -0,0 +1,20 @@
# This file is part of Eigen, a lightweight C++ template library
# for linear algebra.
#
# Copyright (C) 2020 Arm Ltd. and Contributors
#
# This Source Code Form is subject to the terms of the Mozilla
# Public License v. 2.0. If a copy of the MPL was not distributed
# with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
stages:
- build
- test
variables:
BUILDDIR: builddir
EIGEN_CI_CMAKE_GENEATOR: "Ninja"
include:
- "/ci/build.gitlab-ci.yml"
- "/ci/test.gitlab-ci.yml"


@@ -1,6 +1,7 @@
project(Eigen3)
# cmake_minimum_require must be the first command of the file
cmake_minimum_required(VERSION 3.5.0)
cmake_minimum_required(VERSION 2.8.11)
project(Eigen3)
# guard against in-source builds
@@ -20,13 +21,6 @@ if (NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Release")
endif()
string(TOLOWER "${CMAKE_BUILD_TYPE}" cmake_build_type_tolower)
if( NOT cmake_build_type_tolower STREQUAL "debug"
AND NOT cmake_build_type_tolower STREQUAL "release"
AND NOT cmake_build_type_tolower STREQUAL "relwithdebinfo")
message(FATAL_ERROR "Unknown build type \"${CMAKE_BUILD_TYPE}\". Allowed values are Debug, Release, RelWithDebInfo (case-insensitive).")
endif()
#############################################################################
# retrieve version information #
@@ -42,29 +36,28 @@ string(REGEX MATCH "define[ \t]+EIGEN_MINOR_VERSION[ \t]+([0-9]+)" _eigen_minor_
set(EIGEN_MINOR_VERSION "${CMAKE_MATCH_1}")
set(EIGEN_VERSION_NUMBER ${EIGEN_WORLD_VERSION}.${EIGEN_MAJOR_VERSION}.${EIGEN_MINOR_VERSION})
# if we are not in a mercurial clone
if(IS_DIRECTORY ${CMAKE_SOURCE_DIR}/.hg)
# if the mercurial program is absent or this will leave the EIGEN_HG_CHANGESET string empty,
# if we are not in a git clone
if(IS_DIRECTORY ${CMAKE_SOURCE_DIR}/.git)
# if the git program is absent or this will leave the EIGEN_GIT_REVNUM string empty,
# but won't stop CMake.
execute_process(COMMAND hg tip -R ${CMAKE_SOURCE_DIR} OUTPUT_VARIABLE EIGEN_HGTIP_OUTPUT)
execute_process(COMMAND hg branch -R ${CMAKE_SOURCE_DIR} OUTPUT_VARIABLE EIGEN_BRANCH_OUTPUT)
execute_process(COMMAND git ls-remote --refs -q ${CMAKE_SOURCE_DIR} HEAD OUTPUT_VARIABLE EIGEN_GIT_OUTPUT)
endif()
# if this is the default (aka development) branch, extract the mercurial changeset number from the hg tip output...
if(EIGEN_BRANCH_OUTPUT MATCHES "default")
string(REGEX MATCH "^changeset: *[0-9]*:([0-9;a-f]+).*" EIGEN_HG_CHANGESET_MATCH "${EIGEN_HGTIP_OUTPUT}")
set(EIGEN_HG_CHANGESET "${CMAKE_MATCH_1}")
# extract the git rev number from the git output...
if(EIGEN_GIT_OUTPUT)
string(REGEX MATCH "^([0-9;a-f]+).*" EIGEN_GIT_CHANGESET_MATCH "${EIGEN_GIT_OUTPUT}")
set(EIGEN_GIT_REVNUM "${CMAKE_MATCH_1}")
endif()
#...and show it next to the version number
if(EIGEN_HG_CHANGESET)
set(EIGEN_VERSION "${EIGEN_VERSION_NUMBER} (mercurial changeset ${EIGEN_HG_CHANGESET})")
if(EIGEN_GIT_REVNUM)
set(EIGEN_VERSION "${EIGEN_VERSION_NUMBER} (git rev ${EIGEN_GIT_REVNUM})")
else()
set(EIGEN_VERSION "${EIGEN_VERSION_NUMBER}")
endif()
include(CheckCXXCompilerFlag)
include(GNUInstallDirs)
include(CMakeDependentOption)
set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)
@@ -248,15 +241,30 @@ if(NOT MSVC)
message(STATUS "Enabling FMA in tests/examples")
endif()
option(EIGEN_TEST_AVX2 "Enable/Disable AVX2 in tests/examples" OFF)
if(EIGEN_TEST_AVX2)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2 -mfma")
message(STATUS "Enabling AVX2 in tests/examples")
endif()
option(EIGEN_TEST_AVX512 "Enable/Disable AVX512 in tests/examples" OFF)
if(EIGEN_TEST_AVX512)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f -mfma -DEIGEN_ENABLE_AVX512")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f -mfma")
if (NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fabi-version=6")
endif()
message(STATUS "Enabling AVX512 in tests/examples")
endif()
option(EIGEN_TEST_AVX512DQ "Enable/Disable AVX512DQ in tests/examples" OFF)
if(EIGEN_TEST_AVX512DQ)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512dq")
if (NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fabi-version=6")
endif()
message(STATUS "Enabling AVX512DQ in tests/examples")
endif()
option(EIGEN_TEST_F16C "Enable/Disable F16C in tests/examples" OFF)
if(EIGEN_TEST_F16C)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mf16c")
@@ -416,22 +424,27 @@ endif()
if(EIGEN_INCLUDE_INSTALL_DIR AND NOT INCLUDE_INSTALL_DIR)
set(INCLUDE_INSTALL_DIR ${EIGEN_INCLUDE_INSTALL_DIR}
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed")
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed")
else()
set(INCLUDE_INSTALL_DIR
"${CMAKE_INSTALL_INCLUDEDIR}/eigen3"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed"
)
endif()
set(CMAKEPACKAGE_INSTALL_DIR
"${CMAKE_INSTALL_DATADIR}/eigen3/cmake"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen3Config.cmake is installed"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen3Config.cmake is installed"
)
set(PKGCONFIG_INSTALL_DIR
"${CMAKE_INSTALL_DATADIR}/pkgconfig"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where eigen3.pc is installed"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where eigen3.pc is installed"
)
foreach(var INCLUDE_INSTALL_DIR CMAKEPACKAGE_INSTALL_DIR PKGCONFIG_INSTALL_DIR)
if(IS_ABSOLUTE "${${var}}")
message(FATAL_ERROR "${var} must be relative to CMAKE_PREFIX_PATH. Got: ${${var}}")
endif()
endforeach()
# similar to set_target_properties but append the property instead of overwriting it
macro(ei_add_target_property target prop value)
@@ -458,7 +471,12 @@ endif()
install(DIRECTORY Eigen DESTINATION ${INCLUDE_INSTALL_DIR} COMPONENT Devel)
add_subdirectory(doc EXCLUDE_FROM_ALL)
option(EIGEN_BUILD_DOC "Enable creation of Eigen documentation" ON)
if(EIGEN_BUILD_DOC)
add_subdirectory(doc EXCLUDE_FROM_ALL)
endif()
option(BUILD_TESTING "Enable creation of Eigen tests." ON)
if(BUILD_TESTING)
@@ -486,6 +504,7 @@ option(EIGEN_TEST_SYCL "Add Sycl support." OFF)
option(EIGEN_SYCL_TRISYCL "Use the triSYCL Sycl implementation (ComputeCPP by default)." OFF)
if(EIGEN_TEST_SYCL)
set (CMAKE_MODULE_PATH "${CMAKE_ROOT}/Modules" "cmake/Modules/" "${CMAKE_MODULE_PATH}")
find_package(Threads REQUIRED)
if(EIGEN_SYCL_TRISYCL)
message(STATUS "Using triSYCL")
include(FindTriSYCL)
@@ -538,32 +557,30 @@ message(STATUS "")
string(TOLOWER "${CMAKE_GENERATOR}" cmake_generator_tolower)
if(cmake_generator_tolower MATCHES "makefile")
message(STATUS "Some things you can do now:")
message(STATUS "--------------+--------------------------------------------------------------")
message(STATUS "Command | Description")
message(STATUS "--------------+--------------------------------------------------------------")
message(STATUS "make install | Install Eigen. Headers will be installed to:")
message(STATUS " | <CMAKE_INSTALL_PREFIX>/<INCLUDE_INSTALL_DIR>")
message(STATUS " | Using the following values:")
message(STATUS " | CMAKE_INSTALL_PREFIX: ${CMAKE_INSTALL_PREFIX}")
message(STATUS " | INCLUDE_INSTALL_DIR: ${INCLUDE_INSTALL_DIR}")
message(STATUS " | Change the install location of Eigen headers using:")
message(STATUS " | cmake . -DCMAKE_INSTALL_PREFIX=yourprefix")
message(STATUS " | Or:")
message(STATUS " | cmake . -DINCLUDE_INSTALL_DIR=yourdir")
message(STATUS "make doc | Generate the API documentation, requires Doxygen & LaTeX")
if(BUILD_TESTING)
message(STATUS "make check | Build and run the unit-tests. Read this page:")
message(STATUS " | http://eigen.tuxfamily.org/index.php?title=Tests")
endif()
message(STATUS "make blas | Build BLAS library (not the same thing as Eigen)")
message(STATUS "make uninstall| Removes files installed by make install")
message(STATUS "--------------+--------------------------------------------------------------")
message(STATUS "Available targets (use: make TARGET):")
else()
message(STATUS "To build/run the unit tests, read this page:")
message(STATUS " http://eigen.tuxfamily.org/index.php?title=Tests")
message(STATUS "Available targets (use: cmake --build . --target TARGET):")
endif()
message(STATUS "---------+--------------------------------------------------------------")
message(STATUS "Target | Description")
message(STATUS "---------+--------------------------------------------------------------")
message(STATUS "install | Install Eigen. Headers will be installed to:")
message(STATUS " | <CMAKE_INSTALL_PREFIX>/<INCLUDE_INSTALL_DIR>")
message(STATUS " | Using the following values:")
message(STATUS " | CMAKE_INSTALL_PREFIX: ${CMAKE_INSTALL_PREFIX}")
message(STATUS " | INCLUDE_INSTALL_DIR: ${INCLUDE_INSTALL_DIR}")
message(STATUS " | Change the install location of Eigen headers using:")
message(STATUS " | cmake . -DCMAKE_INSTALL_PREFIX=yourprefix")
message(STATUS " | Or:")
message(STATUS " | cmake . -DINCLUDE_INSTALL_DIR=yourdir")
message(STATUS "doc | Generate the API documentation, requires Doxygen & LaTeX")
if(BUILD_TESTING)
message(STATUS "check | Build and run the unit-tests. Read this page:")
message(STATUS " | http://eigen.tuxfamily.org/index.php?title=Tests")
endif()
message(STATUS "blas | Build BLAS library (not the same thing as Eigen)")
message(STATUS "uninstall| Remove files installed by the install target")
message(STATUS "---------+--------------------------------------------------------------")
message(STATUS "")
@@ -575,82 +592,48 @@ set ( EIGEN_DEFINITIONS "")
set ( EIGEN_INCLUDE_DIR "${CMAKE_INSTALL_PREFIX}/${INCLUDE_INSTALL_DIR}" )
set ( EIGEN_ROOT_DIR ${CMAKE_INSTALL_PREFIX} )
# Interface libraries require at least CMake 3.0
if (NOT CMAKE_VERSION VERSION_LESS 3.0)
include (CMakePackageConfigHelpers)
include (CMakePackageConfigHelpers)
# Imported target support
add_library (eigen INTERFACE)
add_library (Eigen3::Eigen ALIAS eigen)
target_compile_definitions (eigen INTERFACE ${EIGEN_DEFINITIONS})
target_include_directories (eigen INTERFACE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
$<INSTALL_INTERFACE:${INCLUDE_INSTALL_DIR}>
)
# Imported target support
add_library (eigen INTERFACE)
add_library (Eigen3::Eigen ALIAS eigen)
target_compile_definitions (eigen INTERFACE ${EIGEN_DEFINITIONS})
target_include_directories (eigen INTERFACE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
$<INSTALL_INTERFACE:${INCLUDE_INSTALL_DIR}>
)
# Export as title case Eigen
set_target_properties (eigen PROPERTIES EXPORT_NAME Eigen)
# Export as title case Eigen
set_target_properties (eigen PROPERTIES EXPORT_NAME Eigen)
install (TARGETS eigen EXPORT Eigen3Targets)
install (TARGETS eigen EXPORT Eigen3Targets)
configure_package_config_file (
${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3Config.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
PATH_VARS EIGEN_INCLUDE_DIR EIGEN_ROOT_DIR
INSTALL_DESTINATION ${CMAKEPACKAGE_INSTALL_DIR}
NO_CHECK_REQUIRED_COMPONENTS_MACRO # Eigen does not provide components
)
# Remove CMAKE_SIZEOF_VOID_P from Eigen3ConfigVersion.cmake since Eigen does
# not depend on architecture specific settings or libraries. More
# specifically, an Eigen3Config.cmake generated from a 64 bit target can be
# used for 32 bit targets as well (and vice versa).
set (_Eigen3_CMAKE_SIZEOF_VOID_P ${CMAKE_SIZEOF_VOID_P})
unset (CMAKE_SIZEOF_VOID_P)
write_basic_package_version_file (Eigen3ConfigVersion.cmake
VERSION ${EIGEN_VERSION_NUMBER}
COMPATIBILITY SameMajorVersion)
set (CMAKE_SIZEOF_VOID_P ${_Eigen3_CMAKE_SIZEOF_VOID_P})
configure_package_config_file (
${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3Config.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
PATH_VARS EIGEN_INCLUDE_DIR EIGEN_ROOT_DIR
INSTALL_DESTINATION ${CMAKEPACKAGE_INSTALL_DIR}
NO_CHECK_REQUIRED_COMPONENTS_MACRO # Eigen does not provide components
)
# Remove CMAKE_SIZEOF_VOID_P from Eigen3ConfigVersion.cmake since Eigen does
# not depend on architecture specific settings or libraries. More
# specifically, an Eigen3Config.cmake generated from a 64 bit target can be
# used for 32 bit targets as well (and vice versa).
set (_Eigen3_CMAKE_SIZEOF_VOID_P ${CMAKE_SIZEOF_VOID_P})
unset (CMAKE_SIZEOF_VOID_P)
write_basic_package_version_file (Eigen3ConfigVersion.cmake
VERSION ${EIGEN_VERSION_NUMBER}
COMPATIBILITY SameMajorVersion)
set (CMAKE_SIZEOF_VOID_P ${_Eigen3_CMAKE_SIZEOF_VOID_P})
# The Eigen target will be located in the Eigen3 namespace. Other CMake
# targets can refer to it using Eigen3::Eigen.
export (TARGETS eigen NAMESPACE Eigen3:: FILE Eigen3Targets.cmake)
# Export Eigen3 package to CMake registry such that it can be easily found by
# CMake even if it has not been installed to a standard directory.
export (PACKAGE Eigen3)
# The Eigen target will be located in the Eigen3 namespace. Other CMake
# targets can refer to it using Eigen3::Eigen.
export (TARGETS eigen NAMESPACE Eigen3:: FILE Eigen3Targets.cmake)
# Export Eigen3 package to CMake registry such that it can be easily found by
# CMake even if it has not been installed to a standard directory.
export (PACKAGE Eigen3)
install (EXPORT Eigen3Targets NAMESPACE Eigen3:: DESTINATION ${CMAKEPACKAGE_INSTALL_DIR})
else ()
# Fallback to legacy Eigen3Config.cmake without the imported target
# If CMakePackageConfigHelpers module is available (CMake >= 2.8.8)
# create a relocatable Config file, otherwise leave the hardcoded paths
include(CMakePackageConfigHelpers OPTIONAL RESULT_VARIABLE CPCH_PATH)
if(CPCH_PATH)
configure_package_config_file (
${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3ConfigLegacy.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
PATH_VARS EIGEN_INCLUDE_DIR EIGEN_ROOT_DIR
INSTALL_DESTINATION ${CMAKEPACKAGE_INSTALL_DIR}
NO_CHECK_REQUIRED_COMPONENTS_MACRO # Eigen does not provide components
)
else()
# The PACKAGE_* variables are defined by the configure_package_config_file
# but without it we define them manually to the hardcoded paths
set(PACKAGE_INIT "")
set(PACKAGE_EIGEN_INCLUDE_DIR ${EIGEN_INCLUDE_DIR})
set(PACKAGE_EIGEN_ROOT_DIR ${EIGEN_ROOT_DIR})
configure_file ( ${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3ConfigLegacy.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
@ONLY ESCAPE_QUOTES )
endif()
write_basic_package_version_file( Eigen3ConfigVersion.cmake
VERSION ${EIGEN_VERSION_NUMBER}
COMPATIBILITY SameMajorVersion )
endif ()
install (EXPORT Eigen3Targets NAMESPACE Eigen3:: DESTINATION ${CMAKEPACKAGE_INSTALL_DIR})
install ( FILES ${CMAKE_CURRENT_SOURCE_DIR}/cmake/UseEigen3.cmake
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
@@ -660,3 +643,7 @@ install ( FILES ${CMAKE_CURRENT_SOURCE_DIR}/cmake/UseEigen3.cmake
# Add uninstall target
add_custom_target ( uninstall
COMMAND ${CMAKE_COMMAND} -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/EigenUninstall.cmake)
if (EIGEN_SPLIT_TESTSUITE)
ei_split_testsuite("${EIGEN_SPLIT_TESTSUITE}")
endif()

203
COPYING.APACHE Normal file

@@ -0,0 +1,203 @@
/*
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/


@@ -23,4 +23,4 @@
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
*/


@@ -1,52 +1,51 @@
Minpack Copyright Notice (1999) University of Chicago. All rights reserved
Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the
following conditions are met:
1. Redistributions of source code must retain the above
copyright notice, this list of conditions and the following
disclaimer.
2. Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
3. The end-user documentation included with the
redistribution, if any, must include the following
acknowledgment:
"This product includes software developed by the
University of Chicago, as Operator of Argonne National
Laboratory.
Alternately, this acknowledgment may appear in the software
itself, if and wherever such third-party acknowledgments
normally appear.
4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
BE CORRECTED.
5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
POSSIBILITY OF SUCH LOSS OR DAMAGES.


@@ -43,4 +43,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_CHOLESKY_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -11,7 +11,7 @@
#ifndef EIGEN_CORE_H
#define EIGEN_CORE_H
// first thing Eigen does: stop the compiler from committing suicide
// first thing Eigen does: stop the compiler from reporting useless warnings.
#include "src/Core/util/DisableStupidWarnings.h"
// then include this file where all our macros are defined. It's really important to do it first because
@@ -22,7 +22,7 @@
#include "src/Core/util/ConfigureVectorization.h"
// We need cuda_runtime.h/hip_runtime.h to ensure that
// the EIGEN_USING_STD_MATH macro works properly on the device side
// the EIGEN_USING_STD macro works properly on the device side
#if defined(EIGEN_CUDACC)
#include <cuda_runtime.h>
#elif defined(EIGEN_HIPCC)
@@ -36,7 +36,7 @@
// Disable the ipa-cp-clone optimization flag with MinGW 6.x or newer (enabled by default with -O3)
// See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=556 for details.
#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6)
#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6) && EIGEN_GNUC_AT_MOST(5,5)
#pragma GCC optimize ("-fno-ipa-cp-clone")
#endif
@@ -51,6 +51,10 @@
#define EIGEN_HAS_GPU_FP16
#endif
#if defined(EIGEN_HAS_CUDA_BF16) || defined(EIGEN_HAS_HIP_BF16)
#define EIGEN_HAS_GPU_BF16
#endif
#if (defined _OPENMP) && (!defined EIGEN_DONT_PARALLELIZE)
#define EIGEN_HAS_OPENMP
#endif
@@ -72,6 +76,7 @@
#include <cmath>
#include <cassert>
#include <functional>
#include <sstream>
#ifndef EIGEN_NO_IO
#include <iosfwd>
#endif
@@ -107,7 +112,7 @@
#undef isnan
#undef isinf
#undef isfinite
#include <SYCL/sycl.hpp>
#include <CL/sycl.hpp>
#include <map>
#include <memory>
#include <utility>
@@ -162,6 +167,7 @@ using std::ptrdiff_t;
#include "src/Core/arch/Default/ConjHelper.h"
// Generic half float support
#include "src/Core/arch/Default/Half.h"
#include "src/Core/arch/Default/BFloat16.h"
#include "src/Core/arch/Default/TypeCasting.h"
#include "src/Core/arch/Default/GenericPacketMathFunctionsFwd.h"
@@ -202,6 +208,10 @@ using std::ptrdiff_t;
#include "src/Core/arch/NEON/TypeCasting.h"
#include "src/Core/arch/NEON/MathFunctions.h"
#include "src/Core/arch/NEON/Complex.h"
#elif defined EIGEN_VECTORIZE_SVE
#include "src/Core/arch/SVE/PacketMath.h"
#include "src/Core/arch/SVE/TypeCasting.h"
#include "src/Core/arch/SVE/MathFunctions.h"
#elif defined EIGEN_VECTORIZE_ZVECTOR
#include "src/Core/arch/ZVector/PacketMath.h"
#include "src/Core/arch/ZVector/MathFunctions.h"
@@ -329,6 +339,12 @@ using std::ptrdiff_t;
#include "src/Core/CoreIterators.h"
#include "src/Core/ConditionEstimator.h"
#if defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX)
#include "src/Core/arch/AltiVec/MatrixProduct.h"
#elif defined EIGEN_VECTORIZE_NEON
#include "src/Core/arch/NEON/GeneralBlockPanelKernel.h"
#endif
#include "src/Core/BooleanRedux.h"
#include "src/Core/Select.h"
#include "src/Core/VectorwiseOp.h"


@@ -58,4 +58,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_EIGENVALUES_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -50,11 +50,10 @@
#include "src/Geometry/Umeyama.h"
// Use the SSE optimized version whenever possible.
#if defined EIGEN_VECTORIZE_SSE
#include "src/Geometry/arch/Geometry_SSE.h"
#if (defined EIGEN_VECTORIZE_SSE) || (defined EIGEN_VECTORIZE_NEON)
#include "src/Geometry/arch/Geometry_SIMD.h"
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_GEOMETRY_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -27,4 +27,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_HOUSEHOLDER_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -29,5 +29,4 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_JACOBI_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -40,11 +40,10 @@
// Use the SSE optimized version whenever possible. At the moment the
// SSE version doesn't compile when AVX is enabled
#if defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX
#include "src/LU/arch/Inverse_SSE.h"
#if (defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX) || defined EIGEN_VECTORIZE_NEON
#include "src/LU/arch/InverseSize4.h"
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_LU_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -48,4 +48,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_QR_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -37,4 +37,3 @@ void *qRealloc(void *ptr, std::size_t size)
#endif
#endif // EIGEN_QTMALLOC_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -48,4 +48,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SVD_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */


@@ -45,7 +45,7 @@ namespace internal {
* matrix \f$ A \f$ such that \f$ A = P^TLDL^*P \f$, where P is a permutation matrix, L
* is lower triangular with a unit diagonal and D is a diagonal matrix.
*
* The decomposition uses pivoting to ensure stability, so that L will have
* The decomposition uses pivoting to ensure stability, so that D will have
* zeros in the bottom right rank(A) - n submatrix. Avoiding the square root
* on D also stabilizes the computation.
*
@@ -53,7 +53,7 @@ namespace internal {
* decomposition to determine whether a system of equations has a solution.
*
* This class supports the \link InplaceDecomposition inplace decomposition \endlink mechanism.
*
*
* \sa MatrixBase::ldlt(), SelfAdjointView::ldlt(), class LLT
*/
template<typename _MatrixType, int _UpLo> class LDLT
@@ -200,7 +200,7 @@ template<typename _MatrixType, int _UpLo> class LDLT
* \f$ L^* y_4 = y_3 \f$ and \f$ P x = y_4 \f$ in succession. If the matrix \f$ A \f$ is singular, then
* \f$ D \f$ will also be singular (all the other matrices are invertible). In that case, the
* least-square solution of \f$ D y_3 = y_2 \f$ is computed. This does not mean that this function
* computes the least-square solution of \f$ A x = b \f$ is \f$ A \f$ is singular.
* computes the least-square solution of \f$ A x = b \f$ if \f$ A \f$ is singular.
*
* \sa MatrixBase::ldlt(), SelfAdjointView::ldlt()
*/
@@ -246,8 +246,8 @@ template<typename _MatrixType, int _UpLo> class LDLT
*/
const LDLT& adjoint() const { return *this; };
inline Index rows() const { return m_matrix.rows(); }
inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC inline Index rows() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_matrix.cols(); }
/** \brief Reports whether previous computation was successful.
*

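The LDLT comments in the hunks above describe the factorization A = P^T L D L^* P and a solve that works through the factors in succession. A minimal sketch of that idea follows, assuming a real symmetric matrix, no pivoting (P is the identity) and nonzero pivots. `LdltSketch`, `ldlt_no_pivot` and `ldlt_solve` are hypothetical names used for illustration only; this is not Eigen's API or implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch; not part of Eigen.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

struct LdltSketch {
  Mat L;  // unit lower triangular factor
  Vec D;  // diagonal entries of D
};

// Factor a real symmetric matrix as A = L * D * L^T.
// Assumption: no pivoting, and every pivot D[j] is nonzero.
inline LdltSketch ldlt_no_pivot(const Mat& A) {
  const std::size_t n = A.size();
  LdltSketch f{Mat(n, Vec(n, 0.0)), Vec(n, 0.0)};
  for (std::size_t j = 0; j < n; ++j) {
    double d = A[j][j];
    for (std::size_t k = 0; k < j; ++k) d -= f.L[j][k] * f.L[j][k] * f.D[k];
    f.D[j] = d;
    f.L[j][j] = 1.0;
    for (std::size_t i = j + 1; i < n; ++i) {
      double s = A[i][j];
      for (std::size_t k = 0; k < j; ++k) s -= f.L[i][k] * f.L[j][k] * f.D[k];
      f.L[i][j] = s / d;
    }
  }
  return f;
}

// Solve A x = b in three steps: L y = b, D z = y, L^T x = z.
inline Vec ldlt_solve(const LdltSketch& f, Vec b) {
  const std::size_t n = b.size();
  for (std::size_t i = 0; i < n; ++i)        // forward substitution with L
    for (std::size_t k = 0; k < i; ++k) b[i] -= f.L[i][k] * b[k];
  for (std::size_t i = 0; i < n; ++i)        // diagonal solve with D
    b[i] /= f.D[i];
  for (std::size_t i = n; i-- > 0;)          // back substitution with L^T
    for (std::size_t k = i + 1; k < n; ++k) b[i] -= f.L[k][i] * b[k];
  return b;
}
```

Because the sketch divides by each entry of D, it only covers the nonsingular case; the singular case mentioned in the comment above is outside its scope.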

@@ -153,8 +153,8 @@ template<typename Derived> class ArrayBase
// inline void evalTo(Dest& dst) const { dst = matrix(); }
protected:
EIGEN_DEVICE_FUNC
ArrayBase() : Base() {}
EIGEN_DEFAULT_COPY_CONSTRUCTOR(ArrayBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(ArrayBase)
private:
explicit ArrayBase(Index);


@@ -17,7 +17,7 @@ namespace Eigen {
// This implementation is based on Assign.h
namespace internal {
/***************************************************************************
* Part 1 : the logic deciding a strategy for traversal and unrolling *
***************************************************************************/
@@ -29,12 +29,12 @@ struct copy_using_evaluator_traits
{
typedef typename DstEvaluator::XprType Dst;
typedef typename Dst::Scalar DstScalar;
enum {
DstFlags = DstEvaluator::Flags,
SrcFlags = SrcEvaluator::Flags
};
public:
enum {
DstAlignment = DstEvaluator::Alignment,
@@ -99,7 +99,8 @@ private:
public:
enum {
Traversal = (int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize)) ? int(LinearVectorizedTraversal)
Traversal = int(Dst::SizeAtCompileTime) == 0 ? int(AllAtOnceTraversal) // If compile-size is zero, traversing will fail at compile-time.
: (int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize)) ? int(LinearVectorizedTraversal)
: int(MayInnerVectorize) ? int(InnerVectorizedTraversal)
: int(MayLinearVectorize) ? int(LinearVectorizedTraversal)
: int(MaySliceVectorize) ? int(SliceVectorizedTraversal)
@@ -137,7 +138,7 @@ public:
? int(CompleteUnrolling)
: int(NoUnrolling) )
: int(Traversal) == int(LinearTraversal)
? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling)
? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling)
: int(NoUnrolling) )
#if EIGEN_UNALIGNED_VECTORIZE
: int(Traversal) == int(SliceVectorizedTraversal)
@@ -199,7 +200,7 @@ struct copy_using_evaluator_DefaultTraversal_CompleteUnrolling
// FIXME: this is not very clean, perhaps this information should be provided by the kernel?
typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
typedef typename DstEvaluatorType::XprType DstXprType;
enum {
outer = Index / DstXprType::InnerSizeAtCompileTime,
inner = Index % DstXprType::InnerSizeAtCompileTime
@@ -265,7 +266,7 @@ struct copy_using_evaluator_innervec_CompleteUnrolling
typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
typedef typename DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum {
outer = Index / DstXprType::InnerSizeAtCompileTime,
inner = Index % DstXprType::InnerSizeAtCompileTime,
@@ -316,6 +317,22 @@ template<typename Kernel,
int Unrolling = Kernel::AssignmentTraits::Unrolling>
struct dense_assignment_loop;
/************************
***** Special Cases *****
************************/
// Zero-sized assignment is a no-op.
template<typename Kernel, int Unrolling>
struct dense_assignment_loop<Kernel, AllAtOnceTraversal, Unrolling>
{
EIGEN_DEVICE_FUNC static void EIGEN_STRONG_INLINE run(Kernel& /*kernel*/)
{
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
EIGEN_STATIC_ASSERT(int(DstXprType::SizeAtCompileTime) == 0,
EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT)
}
};
/************************
*** Default traversal ***
************************/
@@ -430,7 +447,7 @@ struct dense_assignment_loop<Kernel, LinearVectorizedTraversal, CompleteUnrollin
{
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum { size = DstXprType::SizeAtCompileTime,
packetSize =unpacket_traits<PacketType>::size,
alignedSize = (size/packetSize)*packetSize };
@@ -572,14 +589,15 @@ struct dense_assignment_loop<Kernel, SliceVectorizedTraversal, InnerUnrolling>
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum { size = DstXprType::InnerSizeAtCompileTime,
enum { innerSize = DstXprType::InnerSizeAtCompileTime,
packetSize =unpacket_traits<PacketType>::size,
vectorizableSize = (size/packetSize)*packetSize };
vectorizableSize = (innerSize/packetSize)*packetSize,
size = DstXprType::SizeAtCompileTime };
for(Index outer = 0; outer < kernel.outerSize(); ++outer)
{
copy_using_evaluator_innervec_InnerUnrolling<Kernel, 0, vectorizableSize, 0, 0>::run(kernel, outer);
copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, vectorizableSize, size>::run(kernel, outer);
copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, vectorizableSize, innerSize>::run(kernel, outer);
}
}
};
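The hunk above computes `vectorizableSize` as the largest multiple of `packetSize` that fits in the inner size, runs the packet path over that prefix, and leaves the remainder to the default (scalar) traversal. A rough sketch of that split on a plain array, assuming a made-up packet width `kPacketSize` and a hypothetical function `copy_inner_sketch` (the inner loop merely stands in for one packet load/store):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical packet width for this sketch.
constexpr std::size_t kPacketSize = 4;

// Copies one inner vector: whole packets first, then a scalar tail.
// Returns the number of coefficients handled by the packet path.
inline std::size_t copy_inner_sketch(double* dst, const double* src,
                                     std::size_t innerSize) {
  const std::size_t vectorizableSize = (innerSize / kPacketSize) * kPacketSize;
  // Packet part: indices [0, vectorizableSize), kPacketSize at a time.
  for (std::size_t i = 0; i < vectorizableSize; i += kPacketSize)
    for (std::size_t k = 0; k < kPacketSize; ++k) dst[i + k] = src[i + k];
  // Scalar tail: indices [vectorizableSize, innerSize).
  for (std::size_t i = vectorizableSize; i < innerSize; ++i) dst[i] = src[i];
  return vectorizableSize;
}
```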
@@ -603,14 +621,14 @@ protected:
typedef typename DstEvaluatorTypeT::XprType DstXprType;
typedef typename SrcEvaluatorTypeT::XprType SrcXprType;
public:
typedef DstEvaluatorTypeT DstEvaluatorType;
typedef SrcEvaluatorTypeT SrcEvaluatorType;
typedef typename DstEvaluatorType::Scalar Scalar;
typedef copy_using_evaluator_traits<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor> AssignmentTraits;
typedef typename AssignmentTraits::PacketType PacketType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
generic_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
: m_dst(dst), m_src(src), m_functor(func), m_dstExpr(dstExpr)
@@ -619,58 +637,58 @@ public:
AssignmentTraits::debug();
#endif
}
EIGEN_DEVICE_FUNC Index size() const { return m_dstExpr.size(); }
EIGEN_DEVICE_FUNC Index innerSize() const { return m_dstExpr.innerSize(); }
EIGEN_DEVICE_FUNC Index outerSize() const { return m_dstExpr.outerSize(); }
EIGEN_DEVICE_FUNC Index rows() const { return m_dstExpr.rows(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_dstExpr.cols(); }
EIGEN_DEVICE_FUNC Index outerStride() const { return m_dstExpr.outerStride(); }
EIGEN_DEVICE_FUNC DstEvaluatorType& dstEvaluator() { return m_dst; }
EIGEN_DEVICE_FUNC const SrcEvaluatorType& srcEvaluator() const { return m_src; }
/// Assign src(row,col) to dst(row,col) through the assignment functor.
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Index row, Index col)
{
m_functor.assignCoeff(m_dst.coeffRef(row,col), m_src.coeff(row,col));
}
/// \sa assignCoeff(Index,Index)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Index index)
{
m_functor.assignCoeff(m_dst.coeffRef(index), m_src.coeff(index));
}
/// \sa assignCoeff(Index,Index)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeffByOuterInner(Index outer, Index inner)
{
Index row = rowIndexByOuterInner(outer, inner);
Index col = colIndexByOuterInner(outer, inner);
assignCoeff(row, col);
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacket(Index row, Index col)
{
m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(row,col), m_src.template packet<LoadMode,PacketType>(row,col));
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacket(Index index)
{
m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(index), m_src.template packet<LoadMode,PacketType>(index));
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacketByOuterInner(Index outer, Index inner)
{
Index row = rowIndexByOuterInner(outer, inner);
Index col = colIndexByOuterInner(outer, inner);
assignPacket<StoreMode,LoadMode,PacketType>(row, col);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index rowIndexByOuterInner(Index outer, Index inner)
{
typedef typename DstEvaluatorType::ExpressionTraits Traits;
@@ -693,7 +711,7 @@ public:
{
return m_dstExpr.data();
}
protected:
DstEvaluatorType& m_dst;
const SrcEvaluatorType& m_src;
@@ -716,13 +734,13 @@ protected:
typedef typename Base::DstXprType DstXprType;
typedef copy_using_evaluator_traits<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor, 4> AssignmentTraits;
typedef typename AssignmentTraits::PacketType PacketType;
EIGEN_DEVICE_FUNC restricted_packet_dense_assignment_kernel(DstEvaluatorTypeT &dst, const SrcEvaluatorTypeT &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{
}
};
/***************************************************************************
* Part 5 : Entry point for dense rectangular assignment
***************************************************************************/
@@ -760,7 +778,7 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType
resize_if_allowed(dst, src, func);
DstEvaluatorType dstEvaluator(dst);
typedef generic_dense_assignment_kernel<DstEvaluatorType,SrcEvaluatorType,Functor> Kernel;
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
@@ -788,7 +806,7 @@ struct EigenBase2EigenBase {};
template<typename,typename> struct AssignmentKind { typedef EigenBase2EigenBase Kind; };
template<> struct AssignmentKind<DenseShape,DenseShape> { typedef Dense2Dense Kind; };
// This is the main assignment class
template< typename DstXprType, typename SrcXprType, typename Functor,
typename Kind = typename AssignmentKind< typename evaluator_traits<DstXprType>::Shape , typename evaluator_traits<SrcXprType>::Shape >::Kind,
@@ -813,7 +831,7 @@ void call_assignment(const Dst& dst, const Src& src)
{
call_assignment(dst, src, internal::assign_op<typename Dst::Scalar,typename Src::Scalar>());
}
// Deal with "assume-aliasing"
template<typename Dst, typename Src, typename Func>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -853,12 +871,12 @@ void call_assignment_no_alias(Dst& dst, const Src& src, const Func& func)
typedef typename internal::conditional<NeedToTranspose, Transpose<Dst>, Dst>::type ActualDstTypeCleaned;
typedef typename internal::conditional<NeedToTranspose, Transpose<Dst>, Dst&>::type ActualDstType;
ActualDstType actualDst(dst);
// TODO check whether this is the right place to perform these checks:
EIGEN_STATIC_ASSERT_LVALUE(Dst)
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(ActualDstTypeCleaned,Src)
EIGEN_CHECK_BINARY_COMPATIBILIY(Func,typename ActualDstTypeCleaned::Scalar,typename Src::Scalar);
Assignment<ActualDstTypeCleaned,Src,Func>::run(actualDst, src, func);
}
@@ -875,7 +893,7 @@ void call_restricted_packet_assignment_no_alias(Dst& dst, const Src& src, const
SrcEvaluatorType srcEvaluator(src);
resize_if_allowed(dst, src, func);
DstEvaluatorType dstEvaluator(dst);
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
@@ -922,7 +940,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, Dense2Dense, Weak>
#ifndef EIGEN_NO_DEBUG
internal::check_for_aliasing(dst, src);
#endif
call_dense_assignment_loop(dst, src, func);
}
};

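The changes above select `AllAtOnceTraversal` when the destination has a compile-time size of zero and give that case a `dense_assignment_loop` specialization whose `run` does nothing. The same compile-time dispatch can be sketched without Eigen as below. All names ending in `Sketch` or `_sketch` are hypothetical, and the return value (number of coefficients copied) exists only so the behaviour can be observed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical traversal tags for this sketch.
enum TraversalSketch { DefaultTraversalSketch = 0, AllAtOnceTraversalSketch = 1 };

template <std::size_t N>
struct FixedVecSketch {
  static constexpr std::size_t SizeAtCompileTime = N;
  double data[N == 0 ? 1 : N];  // one dummy slot when N == 0 (arrays cannot be empty)
};

// Pick the traversal at compile time: zero-sized destinations get their own tag.
template <typename Dst>
struct traversal_sketch {
  static constexpr int value = Dst::SizeAtCompileTime == 0
                                   ? int(AllAtOnceTraversalSketch)
                                   : int(DefaultTraversalSketch);
};

// Default case: copy coefficient by coefficient.
template <typename Dst, int Traversal = traversal_sketch<Dst>::value>
struct assignment_loop_sketch {
  static std::size_t run(Dst& dst, const Dst& src) {
    std::size_t copied = 0;
    for (std::size_t i = 0; i < Dst::SizeAtCompileTime; ++i, ++copied)
      dst.data[i] = src.data[i];
    return copied;
  }
};

// Zero-sized assignment is a no-op; the static_assert guards against misuse.
template <typename Dst>
struct assignment_loop_sketch<Dst, AllAtOnceTraversalSketch> {
  static std::size_t run(Dst& /*dst*/, const Dst& /*src*/) {
    static_assert(Dst::SizeAtCompileTime == 0,
                  "this specialization is only meant for zero-sized destinations");
    return 0;
  }
};
```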

@@ -22,7 +22,7 @@ struct all_unroller
row = (UnrollCount-1) % Rows
};
static inline bool run(const Derived &mat)
EIGEN_DEVICE_FUNC static inline bool run(const Derived &mat)
{
return all_unroller<Derived, UnrollCount-1, Rows>::run(mat) && mat.coeff(row, col);
}
@@ -31,13 +31,13 @@ struct all_unroller
template<typename Derived, int Rows>
struct all_unroller<Derived, 0, Rows>
{
static inline bool run(const Derived &/*mat*/) { return true; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &/*mat*/) { return true; }
};
template<typename Derived, int Rows>
struct all_unroller<Derived, Dynamic, Rows>
{
static inline bool run(const Derived &) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &) { return false; }
};
template<typename Derived, int UnrollCount, int Rows>
@@ -48,7 +48,7 @@ struct any_unroller
row = (UnrollCount-1) % Rows
};
static inline bool run(const Derived &mat)
EIGEN_DEVICE_FUNC static inline bool run(const Derived &mat)
{
return any_unroller<Derived, UnrollCount-1, Rows>::run(mat) || mat.coeff(row, col);
}
@@ -57,13 +57,13 @@ struct any_unroller
template<typename Derived, int Rows>
struct any_unroller<Derived, 0, Rows>
{
static inline bool run(const Derived & /*mat*/) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived & /*mat*/) { return false; }
};
template<typename Derived, int Rows>
struct any_unroller<Derived, Dynamic, Rows>
{
static inline bool run(const Derived &) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &) { return false; }
};
} // end namespace internal


@@ -33,6 +33,8 @@ struct CommaInitializer
inline CommaInitializer(XprType& xpr, const Scalar& s)
: m_xpr(xpr), m_row(0), m_col(1), m_currentBlockRows(1)
{
eigen_assert(m_xpr.rows() > 0 && m_xpr.cols() > 0
&& "Cannot comma-initialize a 0x0 matrix (operator<<)");
m_xpr.coeffRef(0,0) = s;
}
@@ -41,6 +43,8 @@ struct CommaInitializer
inline CommaInitializer(XprType& xpr, const DenseBase<OtherDerived>& other)
: m_xpr(xpr), m_row(0), m_col(other.cols()), m_currentBlockRows(other.rows())
{
eigen_assert(m_xpr.rows() >= other.rows() && m_xpr.cols() >= other.cols()
&& "Cannot comma-initialize a 0x0 matrix (operator<<)");
m_xpr.block(0, 0, other.rows(), other.cols()) = other;
}
@@ -103,7 +107,7 @@ struct CommaInitializer
EIGEN_EXCEPTION_SPEC(Eigen::eigen_assert_exception)
#endif
{
finished();
finished();
}
/** \returns the built matrix once all its coefficients have been set.


@@ -383,6 +383,33 @@ PlainObjectBase<Derived>::setConstant(Index rows, Index cols, const Scalar& val)
return setConstant(val);
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to the given value \a val. For the parameter
* of type NoChange_t, just pass the special value \c NoChange.
*
* \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setConstant(NoChange_t, Index cols, const Scalar& val)
{
return setConstant(rows(), cols, val);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to the given value \a val. For the parameter
* of type NoChange_t, just pass the special value \c NoChange.
*
* \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setConstant(Index rows, NoChange_t, const Scalar& val)
{
return setConstant(rows, cols(), val);
}
/**
* \brief Sets a linearly spaced vector.
*
@@ -556,6 +583,32 @@ PlainObjectBase<Derived>::setZero(Index rows, Index cols)
return setConstant(Scalar(0));
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to zero. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa DenseBase::setZero(), setZero(Index), setZero(Index, Index), setZero(Index, NoChange_t), class CwiseNullaryOp, DenseBase::Zero()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setZero(NoChange_t, Index cols)
{
return setZero(rows(), cols);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to zero. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa DenseBase::setZero(), setZero(Index), setZero(Index, Index), setZero(NoChange_t, Index), class CwiseNullaryOp, DenseBase::Zero()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setZero(Index rows, NoChange_t)
{
return setZero(rows, cols());
}
// ones:
/** \returns an expression of a matrix where all coefficients equal one.
@@ -682,6 +735,32 @@ PlainObjectBase<Derived>::setOnes(Index rows, Index cols)
return setConstant(Scalar(1));
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to one. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa MatrixBase::setOnes(), setOnes(Index), setOnes(Index, Index), setOnes(NoChange_t, Index), class CwiseNullaryOp, MatrixBase::Ones()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setOnes(Index rows, NoChange_t)
{
return setOnes(rows, cols());
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to one. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa MatrixBase::setOnes(), setOnes(Index), setOnes(Index, Index), setOnes(Index, NoChange_t), class CwiseNullaryOp, MatrixBase::Ones()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setOnes(NoChange_t, Index cols)
{
return setOnes(rows(), cols);
}
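The `NoChange` overloads added in this file forward to the two-index version, substituting the current dimension for the tagged argument. Below is a minimal, self-contained sketch of that tag-overload pattern. `TinyMatrix` and `NoChangeTag` are hypothetical stand-ins written for illustration and are not Eigen types; only the forwarding idea is taken from the hunks above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag type, playing the role of a "leave this dimension alone" marker.
enum NoChangeTag { NoChange };

// Hypothetical row-major matrix used only to illustrate the overload set.
class TinyMatrix {
 public:
  TinyMatrix(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }

  double operator()(std::size_t r, std::size_t c) const {
    assert(r < rows_ && c < cols_);
    return data_[r * cols_ + c];
  }

  // Resize to rows x cols and set every coefficient to val.
  TinyMatrix& setConstant(std::size_t rows, std::size_t cols, double val) {
    rows_ = rows;
    cols_ = cols;
    data_.assign(rows * cols, val);
    return *this;
  }

  // Keep the current row count; only the column count changes.
  TinyMatrix& setConstant(NoChangeTag, std::size_t cols, double val) {
    return setConstant(rows_, cols, val);
  }

  // Keep the current column count; only the row count changes.
  TinyMatrix& setConstant(std::size_t rows, NoChangeTag, double val) {
    return setConstant(rows, cols_, val);
  }

 private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<double> data_;
};
```

Because an integer does not convert implicitly to the enum type, a call such as `m.setConstant(NoChange, 5, 1.5)` can only select the overload whose first parameter is the tag, so the three overloads do not collide.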
// Identity:
/** \returns an expression of the identity matrix (not necessarily square).


@@ -64,23 +64,23 @@ class CwiseUnaryView : public CwiseUnaryViewImpl<ViewOp, MatrixType, typename in
typedef typename internal::ref_selector<MatrixType>::non_const_type MatrixTypeNested;
typedef typename internal::remove_all<MatrixType>::type NestedExpression;
explicit inline CwiseUnaryView(MatrixType& mat, const ViewOp& func = ViewOp())
explicit EIGEN_DEVICE_FUNC inline CwiseUnaryView(MatrixType& mat, const ViewOp& func = ViewOp())
: m_matrix(mat), m_functor(func) {}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryView)
EIGEN_STRONG_INLINE Index rows() const { return m_matrix.rows(); }
EIGEN_STRONG_INLINE Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rows() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index cols() const { return m_matrix.cols(); }
/** \returns the functor representing unary operation */
const ViewOp& functor() const { return m_functor; }
EIGEN_DEVICE_FUNC const ViewOp& functor() const { return m_functor; }
/** \returns the nested expression */
const typename internal::remove_all<MatrixTypeNested>::type&
EIGEN_DEVICE_FUNC const typename internal::remove_all<MatrixTypeNested>::type&
nestedExpression() const { return m_matrix; }
/** \returns the nested expression */
typename internal::remove_reference<MatrixTypeNested>::type&
EIGEN_DEVICE_FUNC typename internal::remove_reference<MatrixTypeNested>::type&
nestedExpression() { return m_matrix; }
protected:
@@ -121,6 +121,8 @@ class CwiseUnaryViewImpl<ViewOp,MatrixType,Dense>
{
return derived().nestedExpression().outerStride() * sizeof(typename internal::traits<MatrixType>::Scalar) / sizeof(Scalar);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(CwiseUnaryViewImpl)
};
} // end namespace Eigen


@@ -18,7 +18,7 @@ namespace internal {
// The index type defined by EIGEN_DEFAULT_DENSE_INDEX_TYPE must be a signed type.
// This dummy function simply aims at checking that at compile time.
static inline void check_DenseIndex_is_signed() {
EIGEN_STATIC_ASSERT(NumTraits<DenseIndex>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE);
EIGEN_STATIC_ASSERT(NumTraits<DenseIndex>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE)
}
} // end namespace internal
@@ -530,16 +530,16 @@ template<typename Derived> class DenseBase
static const RandomReturnType Random();
template<typename ThenDerived,typename ElseDerived>
const Select<Derived,ThenDerived,ElseDerived>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived,ElseDerived>
select(const DenseBase<ThenDerived>& thenMatrix,
const DenseBase<ElseDerived>& elseMatrix) const;
template<typename ThenDerived>
inline const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
select(const DenseBase<ThenDerived>& thenMatrix, const typename ThenDerived::Scalar& elseScalar) const;
template<typename ElseDerived>
inline const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
inline EIGEN_DEVICE_FUNC const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
select(const typename ElseDerived::Scalar& thenScalar, const DenseBase<ElseDerived>& elseMatrix) const;
template<int p> RealScalar lpNorm() const;
@@ -636,11 +636,12 @@ template<typename Derived> class DenseBase
}
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(DenseBase)
/** Default constructor. Do nothing. */
EIGEN_DEVICE_FUNC DenseBase()
{
/* Just checks for self-consistency of the flags.
* Only do it when debugging Eigen, as this borders on paranoiac and could slow compilation down
* Only do it when debugging Eigen, as this borders on paranoia and could slow compilation down
*/
#ifdef EIGEN_INTERNAL_DEBUGGING
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, int(IsRowMajor))


@@ -200,6 +200,18 @@ template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseSt
if (this != &other) m_data = other.m_data;
return *this;
}
#if EIGEN_HAS_RVALUE_REFERENCES
EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT
: m_data(std::move(other.m_data))
{
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT
{
if (this != &other)
m_data = std::move(other.m_data);
return *this;
}
#endif
EIGEN_DEVICE_FUNC DenseStorage(Index size, Index rows, Index cols) {
EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({})
eigen_internal_assert(size==rows*cols && rows==_Rows && cols==_Cols);


@@ -207,7 +207,7 @@ struct lpNorm_selector
EIGEN_DEVICE_FUNC
static inline RealScalar run(const MatrixBase<Derived>& m)
{
EIGEN_USING_STD_MATH(pow)
EIGEN_USING_STD(pow)
return pow(m.cwiseAbs().array().pow(p).sum(), RealScalar(1)/p);
}
};


@@ -228,8 +228,7 @@ template<> struct gemv_dense_selector<OnTheRight,ColMajor,true>
ActualLhsType actualLhs = LhsBlasTraits::extract(lhs);
ActualRhsType actualRhs = RhsBlasTraits::extract(rhs);
ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(lhs)
* RhsBlasTraits::extractScalarFactor(rhs);
ResScalar actualAlpha = combine_scalar_factors(alpha, lhs, rhs);
// make sure Dest is a compile-time vector type (bug 1166)
typedef typename conditional<Dest::IsVectorAtCompileTime, Dest, typename Dest::ColXpr>::type ActualDest;
@@ -320,8 +319,7 @@ template<> struct gemv_dense_selector<OnTheRight,RowMajor,true>
typename add_const<ActualLhsType>::type actualLhs = LhsBlasTraits::extract(lhs);
typename add_const<ActualRhsType>::type actualRhs = RhsBlasTraits::extract(rhs);
ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(lhs)
* RhsBlasTraits::extractScalarFactor(rhs);
ResScalar actualAlpha = combine_scalar_factors(alpha, lhs, rhs);
enum {
// FIXME find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1


@@ -44,19 +44,23 @@ struct default_packet_traits
enum {
HasHalfPacket = 0,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasNegate = 1,
HasAbs = 1,
HasArg = 0,
HasAbs2 = 1,
HasMin = 1,
HasMax = 1,
HasConj = 1,
HasAdd = 1,
HasSub = 1,
HasShift = 1,
HasMul = 1,
HasNegate = 1,
HasAbs = 1,
HasArg = 0,
HasAbs2 = 1,
HasAbsDiff = 0,
HasMin = 1,
HasMax = 1,
HasConj = 1,
HasSetLinear = 1,
HasBlend = 0,
HasReduxp = 1,
HasBlend = 0,
// This flag is used to indicate whether packet comparison is supported.
// pcmp_eq, pcmp_lt and pcmp_le should be defined for it to be true.
HasCmp = 0,
HasDiv = 0,
HasSqrt = 0,
@@ -92,9 +96,9 @@ struct default_packet_traits
HasBetaInc = 0,
HasRound = 0,
HasRint = 0,
HasFloor = 0,
HasCeil = 0,
HasSign = 0
};
};
@@ -133,6 +137,22 @@ template <typename Src, typename Tgt> struct type_casting_traits {
};
};
/** \internal Wrapper to ensure that multiple packet types can map to the
same underlying vector type. */
template<typename T, int unique_id = 0>
struct eigen_packet_wrapper
{
EIGEN_ALWAYS_INLINE operator T&() { return m_val; }
EIGEN_ALWAYS_INLINE operator const T&() const { return m_val; }
EIGEN_ALWAYS_INLINE eigen_packet_wrapper() {}
EIGEN_ALWAYS_INLINE eigen_packet_wrapper(const T &v) : m_val(v) {}
EIGEN_ALWAYS_INLINE eigen_packet_wrapper& operator=(const T &v) {
m_val = v;
return *this;
}
T m_val;
};
/** \internal \returns static_cast<TgtType>(a) (coeff-wise) */
template <typename SrcPacket, typename TgtPacket>
@@ -145,12 +165,17 @@ EIGEN_DEVICE_FUNC inline TgtPacket
pcast(const SrcPacket& a, const SrcPacket& /*b*/) {
return static_cast<TgtPacket>(a);
}
template <typename SrcPacket, typename TgtPacket>
EIGEN_DEVICE_FUNC inline TgtPacket
pcast(const SrcPacket& a, const SrcPacket& /*b*/, const SrcPacket& /*c*/, const SrcPacket& /*d*/) {
return static_cast<TgtPacket>(a);
}
template <typename SrcPacket, typename TgtPacket>
EIGEN_DEVICE_FUNC inline TgtPacket
pcast(const SrcPacket& a, const SrcPacket& /*b*/, const SrcPacket& /*c*/, const SrcPacket& /*d*/,
const SrcPacket& /*e*/, const SrcPacket& /*f*/, const SrcPacket& /*g*/, const SrcPacket& /*h*/) {
return static_cast<TgtPacket>(a);
}
/** \internal \returns reinterpret_cast<Target>(a) */
template <typename Target, typename Packet>
@@ -160,6 +185,9 @@ preinterpret(const Packet& a); /* { return reinterpret_cast<const Target&>(a); }
/** \internal \returns a + b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
padd(const Packet& a, const Packet& b) { return a+b; }
// Avoid compiler warning for boolean algebra.
template<> EIGEN_DEVICE_FUNC inline bool
padd(const bool& a, const bool& b) { return a || b; }
/** \internal \returns a - b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -169,128 +197,31 @@ psub(const Packet& a, const Packet& b) { return a-b; }
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pnegate(const Packet& a) { return -a; }
/** \internal \returns conj(a) (coeff-wise) */
template<> EIGEN_DEVICE_FUNC inline bool
pnegate(const bool& a) { return !a; }
/** \internal \returns conj(a) (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pconj(const Packet& a) { return numext::conj(a); }
/** \internal \returns a * b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmul(const Packet& a, const Packet& b) { return a*b; }
// Avoid compiler warning for boolean algebra.
template<> EIGEN_DEVICE_FUNC inline bool
pmul(const bool& a, const bool& b) { return a && b; }
/** \internal \returns a / b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pdiv(const Packet& a, const Packet& b) { return a/b; }
/** \internal \returns the min of \a a and \a b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmin(const Packet& a, const Packet& b) { return numext::mini(a, b); }
/** \internal \returns the max of \a a and \a b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmax(const Packet& a, const Packet& b) { return numext::maxi(a, b); }
/** \internal \returns the absolute value of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pabs(const Packet& a) { using std::abs; return abs(a); }
/** \internal \returns the phase angle of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
parg(const Packet& a) { using numext::arg; return arg(a); }
/** \internal \returns the bitwise and of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pand(const Packet& a, const Packet& b) { return a & b; }
/** \internal \returns the bitwise or of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
por(const Packet& a, const Packet& b) { return a | b; }
/** \internal \returns the bitwise xor of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pxor(const Packet& a, const Packet& b) { return a ^ b; }
/** \internal \returns the bitwise andnot of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pandnot(const Packet& a, const Packet& b) { return a & (~b); }
/** \internal \returns ones */
/** \internal \returns one bits */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
ptrue(const Packet& /*a*/) { Packet b; memset((void*)&b, 0xff, sizeof(b)); return b;}
template <typename RealScalar>
EIGEN_DEVICE_FUNC inline std::complex<RealScalar> ptrue(const std::complex<RealScalar>& /*a*/) {
RealScalar b;
b = ptrue(b);
return std::complex<RealScalar>(b, b);
}
/** \internal \returns the bitwise not of \a a */
template <typename Packet> EIGEN_DEVICE_FUNC inline Packet
pnot(const Packet& a) { return pxor(ptrue(a), a);}
/** \internal \returns \a a shifted by N bits to the right */
template<int N> EIGEN_DEVICE_FUNC inline int
pshiftright(const int& a) { return a >> N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
pshiftright(const long int& a) { return a >> N; }
/** \internal \returns \a a shifted by N bits to the left */
template<int N> EIGEN_DEVICE_FUNC inline int
pshiftleft(const int& a) { return a << N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
pshiftleft(const long int& a) { return a << N; }
/** \internal \returns the significant and exponent of the underlying floating point numbers
* See https://en.cppreference.com/w/cpp/numeric/math/frexp
*/
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet pfrexp(const Packet& a, Packet& exponent) {
int exp;
EIGEN_USING_STD_MATH(frexp);
Packet result = frexp(a, &exp);
exponent = static_cast<Packet>(exp);
return result;
}
/** \internal \returns a * 2^exponent
* See https://en.cppreference.com/w/cpp/numeric/math/ldexp
*/
/** \internal \returns zero bits */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pldexp(const Packet &a, const Packet &exponent) {
EIGEN_USING_STD_MATH(ldexp);
return ldexp(a, static_cast<int>(exponent));
}
/** \internal \returns zeros */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pzero(const Packet& a) { return pxor(a,a); }
template<> EIGEN_DEVICE_FUNC inline float pzero<float>(const float& a) {
EIGEN_UNUSED_VARIABLE(a);
return 0.f;
}
template<> EIGEN_DEVICE_FUNC inline double pzero<double>(const double& a) {
EIGEN_UNUSED_VARIABLE(a);
return 0.;
}
/** \internal \returns bits of \a or \b according to the input bit mask \a mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pselect(const Packet& mask, const Packet& a, const Packet& b) {
return por(pand(a,mask),pandnot(b,mask));
}
template<> EIGEN_DEVICE_FUNC inline float pselect<float>(
const float& mask, const float& a, const float&b) {
return numext::equal_strict(mask,0.f) ? b : a;
}
template<> EIGEN_DEVICE_FUNC inline double pselect<double>(
const double& mask, const double& a, const double& b) {
return numext::equal_strict(mask,0.) ? b : a;
}
pzero(const Packet& /*a*/) { Packet b; memset((void*)&b, 0, sizeof(b)); return b;}
/** \internal \returns a <= b as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -306,7 +237,234 @@ pcmp_eq(const Packet& a, const Packet& b) { return a==b ? ptrue(a) : pzero(a); }
/** \internal \returns a < b or a==NaN or b==NaN as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pcmp_lt_or_nan(const Packet& a, const Packet& b) { return pnot(pcmp_le(b,a)); }
pcmp_lt_or_nan(const Packet& a, const Packet& b) { return a>=b ? pzero(a) : ptrue(a); }
template<> EIGEN_DEVICE_FUNC inline float pzero<float>(const float& a) {
EIGEN_UNUSED_VARIABLE(a)
return 0.f;
}
template<> EIGEN_DEVICE_FUNC inline double pzero<double>(const double& a) {
EIGEN_UNUSED_VARIABLE(a)
return 0.;
}
template <typename RealScalar>
EIGEN_DEVICE_FUNC inline std::complex<RealScalar> ptrue(const std::complex<RealScalar>& /*a*/) {
RealScalar b = ptrue(RealScalar(0));
return std::complex<RealScalar>(b, b);
}
template <typename Packet, typename Op>
EIGEN_DEVICE_FUNC inline Packet bitwise_helper(const Packet& a, const Packet& b, Op op) {
const unsigned char* a_ptr = reinterpret_cast<const unsigned char*>(&a);
const unsigned char* b_ptr = reinterpret_cast<const unsigned char*>(&b);
Packet c;
unsigned char* c_ptr = reinterpret_cast<unsigned char*>(&c);
for (size_t i = 0; i < sizeof(Packet); ++i) {
*c_ptr++ = op(*a_ptr++, *b_ptr++);
}
return c;
}
template<typename T>
struct bit_and {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const {
return a & b;
}
};
template<typename T>
struct bit_or {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const {
return a | b;
}
};
template<typename T>
struct bit_xor {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const {
return a ^ b;
}
};
/** \internal \returns the bitwise and of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pand(const Packet& a, const Packet& b) {
return bitwise_helper(a, b, bit_and<unsigned char>());
}
/** \internal \returns the bitwise or of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
por(const Packet& a, const Packet& b) {
return bitwise_helper(a ,b, bit_or<unsigned char>());
}
/** \internal \returns the bitwise xor of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pxor(const Packet& a, const Packet& b) {
return bitwise_helper(a ,b, bit_xor<unsigned char>());
}
/** \internal \returns the bitwise and of \a a and not \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pandnot(const Packet& a, const Packet& b) { return pand(a, pxor(ptrue(b), b)); }
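The generic `pand`/`por`/`pxor` fallbacks in this hunk no longer rely on `&`, `|` and `^` being defined for the packet type; they combine the object representations one byte at a time. The sketch below reproduces that technique in isolation. All names (`combine_bytes`, `and_bits`, `xor_bits`, `andnot_bits`, `all_ones`, `all_bytes_zero`) are hypothetical and chosen for this example; it assumes a trivially copyable `T`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Combine the object representations of a and b one byte at a time.
template <typename T, typename Op>
T combine_bytes(const T& a, const T& b, Op op) {
  const unsigned char* pa = reinterpret_cast<const unsigned char*>(&a);
  const unsigned char* pb = reinterpret_cast<const unsigned char*>(&b);
  T out;
  unsigned char* po = reinterpret_cast<unsigned char*>(&out);
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    po[i] = op(pa[i], pb[i]);
  }
  return out;
}

struct ByteAnd {
  unsigned char operator()(unsigned char x, unsigned char y) const {
    return static_cast<unsigned char>(x & y);
  }
};

struct ByteXor {
  unsigned char operator()(unsigned char x, unsigned char y) const {
    return static_cast<unsigned char>(x ^ y);
  }
};

// Value whose bytes are all 0xff (the "all bits set" mask).
template <typename T>
T all_ones() {
  T v;
  std::memset(&v, 0xff, sizeof(T));
  return v;
}

template <typename T>
T and_bits(const T& a, const T& b) { return combine_bytes(a, b, ByteAnd()); }

template <typename T>
T xor_bits(const T& a, const T& b) { return combine_bytes(a, b, ByteXor()); }

// a AND (NOT b), where NOT b is built as all_ones XOR b.
template <typename T>
T andnot_bits(const T& a, const T& b) {
  return and_bits(a, xor_bits(all_ones<T>(), b));
}

// True when every byte of v is zero.
template <typename T>
bool all_bytes_zero(const T& v) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    if (p[i] != 0) return false;
  }
  return true;
}
```

Working through `unsigned char` pointers keeps the helper valid for floating-point types, for which the built-in bitwise operators are not defined.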
/** \internal \returns \a a or \a b for each field in packet according to \a mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pselect(const Packet& mask, const Packet& a, const Packet& b) {
return por(pand(a,mask),pandnot(b,mask));
}
template<> EIGEN_DEVICE_FUNC inline float pselect<float>(
const float& cond, const float& a, const float&b) {
return numext::equal_strict(cond,0.f) ? b : a;
}
template<> EIGEN_DEVICE_FUNC inline double pselect<double>(
const double& cond, const double& a, const double& b) {
return numext::equal_strict(cond,0.) ? b : a;
}
template<> EIGEN_DEVICE_FUNC inline bool pselect<bool>(
const bool& cond, const bool& a, const bool& b) {
return cond ? a : b;
}
/** \internal \returns the min or max of \a a and \a b (coeff-wise)
If either \a a or \a b is NaN, the result is implementation defined. */
template<int NaNPropagation>
struct pminmax_impl {
template <typename Packet, typename Op>
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) {
return op(a,b);
}
};
/** \internal \returns the min or max of \a a and \a b (coeff-wise)
If either \a a or \a b are NaN, NaN is returned. */
template<>
struct pminmax_impl<PropagateNaN> {
template <typename Packet, typename Op>
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) {
Packet not_nan_mask_a = pcmp_eq(a, a);
Packet not_nan_mask_b = pcmp_eq(b, b);
return pselect(not_nan_mask_a,
pselect(not_nan_mask_b, op(a, b), b),
a);
}
};
/** \internal \returns the min or max of \a a and \a b (coeff-wise)
If both \a a and \a b are NaN, NaN is returned.
Equivalent to std::fmin(a, b). */
template<>
struct pminmax_impl<PropagateNumbers> {
template <typename Packet, typename Op>
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) {
Packet not_nan_mask_a = pcmp_eq(a, a);
Packet not_nan_mask_b = pcmp_eq(b, b);
return pselect(not_nan_mask_a,
pselect(not_nan_mask_b, op(a, b), a),
b);
}
};
#ifndef SYCL_DEVICE_ONLY
#define EIGEN_BINARY_OP_NAN_PROPAGATION(Type, Func) Func
#else
#define EIGEN_BINARY_OP_NAN_PROPAGATION(Type, Func) \
[](const Type& a, const Type& b) { \
return Func(a, b);}
#endif
/** \internal \returns the min of \a a and \a b (coeff-wise).
If \a a or \a b is NaN, the return value is implementation defined. */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmin(const Packet& a, const Packet& b) { return numext::mini(a,b); }
/** \internal \returns the min of \a a and \a b (coeff-wise).
NaNPropagation determines the NaN propagation semantics. */
template <int NaNPropagation, typename Packet>
EIGEN_DEVICE_FUNC inline Packet pmin(const Packet& a, const Packet& b) {
return pminmax_impl<NaNPropagation>::run(a, b, EIGEN_BINARY_OP_NAN_PROPAGATION(Packet, (pmin<Packet>)));
}
/** \internal \returns the max of \a a and \a b (coeff-wise)
If \a a or \a b is NaN, the return value is implementation defined. */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmax(const Packet& a, const Packet& b) { return numext::maxi(a, b); }
/** \internal \returns the max of \a a and \a b (coeff-wise).
NaNPropagation determines the NaN propagation semantics. */
template <int NaNPropagation, typename Packet>
EIGEN_DEVICE_FUNC inline Packet pmax(const Packet& a, const Packet& b) {
return pminmax_impl<NaNPropagation>::run(a, b, EIGEN_BINARY_OP_NAN_PROPAGATION(Packet,(pmax<Packet>)));
}
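The two `pminmax_impl` specializations above differ only in which operand the nested `pselect` falls back to when a NaN is detected through `x == x`. A scalar sketch of both policies for `min` follows. The function names are hypothetical, and plain ternaries stand in for `pselect`; it assumes IEEE comparison semantics (that is, no fast-math style flags).

```cpp
#include <cassert>
#include <cmath>

// Comparison-based minimum; with a NaN operand the result depends on operand order.
inline float raw_min(float a, float b) { return b < a ? b : a; }

// "Propagate NaN" policy: if either input is NaN, the result is NaN.
inline float min_propagate_nan(float a, float b) {
  const bool a_is_number = (a == a);  // false only when a is NaN
  const bool b_is_number = (b == b);  // false only when b is NaN
  return a_is_number ? (b_is_number ? raw_min(a, b) : b) : a;
}

// "Propagate numbers" policy: a NaN is returned only when both inputs are NaN.
inline float min_propagate_numbers(float a, float b) {
  const bool a_is_number = (a == a);
  const bool b_is_number = (b == b);
  return a_is_number ? (b_is_number ? raw_min(a, b) : a) : b;
}
```

The only difference between the two functions is the swapped fallback operands, which matches the swapped `a`/`b` arguments in the two specializations.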
/** \internal \returns the absolute value of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pabs(const Packet& a) { return numext::abs(a); }
template<> EIGEN_DEVICE_FUNC inline unsigned int
pabs(const unsigned int& a) { return a; }
template<> EIGEN_DEVICE_FUNC inline unsigned long
pabs(const unsigned long& a) { return a; }
template<> EIGEN_DEVICE_FUNC inline unsigned long long
pabs(const unsigned long long& a) { return a; }
/** \internal \returns the addsub value of \a a,b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
paddsub(const Packet& a, const Packet& b) {
return pselect(peven_mask(a), padd(a, b), psub(a, b));
}
/** \internal \returns the phase angle of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
parg(const Packet& a) { using numext::arg; return arg(a); }
/** \internal \returns \a a arithmetically shifted by N bits to the right */
template<int N> EIGEN_DEVICE_FUNC inline int
parithmetic_shift_right(const int& a) { return a >> N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
parithmetic_shift_right(const long int& a) { return a >> N; }
/** \internal \returns \a a logically shifted by N bits to the right */
template<int N> EIGEN_DEVICE_FUNC inline int
plogical_shift_right(const int& a) { return static_cast<int>(static_cast<unsigned int>(a) >> N); }
template<int N> EIGEN_DEVICE_FUNC inline long int
plogical_shift_right(const long int& a) { return static_cast<long>(static_cast<unsigned long>(a) >> N); }
/** \internal \returns \a a shifted by N bits to the left */
template<int N> EIGEN_DEVICE_FUNC inline int
plogical_shift_left(const int& a) { return a << N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
plogical_shift_left(const long int& a) { return a << N; }
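The renamed shift helpers distinguish a right shift that replicates the sign bit from one that fills with zeros; the latter is obtained by shifting in the corresponding unsigned type. A minimal sketch on a fixed-width integer is given below. The function names are hypothetical, and `std::int32_t` is used instead of `int` so the bit patterns are unambiguous.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic right shift: the sign bit is replicated into the vacated bits.
// Note: for negative values this is implementation-defined before C++20;
// mainstream compilers implement it as an arithmetic shift.
template <int N>
std::int32_t arithmetic_shift_right(std::int32_t a) {
  return a >> N;
}

// Logical right shift: vacated bits are filled with zeros.
// The shift is performed on the unsigned representation, then cast back.
template <int N>
std::int32_t logical_shift_right(std::int32_t a) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) >> N);
}
```

For non-negative inputs the two functions agree; they differ only when the sign bit is set.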
/** \internal \returns the significand and exponent of the underlying floating point numbers
* See https://en.cppreference.com/w/cpp/numeric/math/frexp
*/
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet pfrexp(const Packet& a, Packet& exponent) {
int exp;
EIGEN_USING_STD(frexp);
Packet result = static_cast<Packet>(frexp(a, &exp));
exponent = static_cast<Packet>(exp);
return result;
}
/** \internal \returns a * 2^((int)exponent)
* See https://en.cppreference.com/w/cpp/numeric/math/ldexp
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pldexp(const Packet &a, const Packet &exponent) {
EIGEN_USING_STD(ldexp)
return static_cast<Packet>(ldexp(a, static_cast<int>(exponent)));
}
/** \internal \returns the absolute difference of \a a and \a b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pabsdiff(const Packet& a, const Packet& b) { return pselect(pcmp_lt(a, b), psub(b, a), psub(a, b)); }
/** \internal \returns a packet version of \a *from, from must be 16 bytes aligned */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -348,7 +506,7 @@ ploaddup(const typename unpacket_traits<Packet>::type* from) { return *from; }
* For instance, for a packet of 8 elements, 2 scalars will be read from \a *from and
* replicated to form: {from[0],from[0],from[0],from[0],from[1],from[1],from[1],from[1]}
* Currently, this function is only used in matrix products.
* For packet-size smaller or equal to 4, this function is equivalent to pload1
* For packet-size smaller or equal to 4, this function is equivalent to pload1
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
ploadquad(const typename unpacket_traits<Packet>::type* from)
@@ -392,6 +550,20 @@ inline void pbroadcast2(const typename unpacket_traits<Packet>::type *a,
template<typename Packet> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet
plset(const typename unpacket_traits<Packet>::type& a) { return a; }
/** \internal \returns a packet with constant coefficients \a a, e.g.: (x, 0, x, 0),
where x is the value of all 1-bits. */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
peven_mask(const Packet& /*a*/) {
typedef typename unpacket_traits<Packet>::type Scalar;
const size_t n = unpacket_traits<Packet>::size;
Scalar elements[n];
for(size_t i = 0; i < n; ++i) {
memset(elements+i, ((i & 1) == 0 ? 0xff : 0), sizeof(Scalar));
}
return ploadu<Packet>(elements);
}
/** \internal copy the packet \a from to \a *to, \a to must be 16 bytes aligned */
template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline void pstore(Scalar* to, const Packet& from)
{ (*to) = from; }
@@ -421,7 +593,7 @@ template<typename Scalar> EIGEN_DEVICE_FUNC inline void prefetch(const Scalar* a
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// do nothing
#elif defined(EIGEN_CUDA_ARCH)
#if defined(__LP64__)
#if defined(__LP64__) || EIGEN_OS_WIN64
// 64-bit pointer operand constraint for inlined asm
asm(" prefetch.L1 [ %1 ];" : "=l"(addr) : "l"(addr));
#else
@@ -433,38 +605,189 @@ template<typename Scalar> EIGEN_DEVICE_FUNC inline void prefetch(const Scalar* a
#endif
}
/** \internal \returns the first element of a packet */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type pfirst(const Packet& a)
/** \internal \returns the reversed elements of \a a*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet preverse(const Packet& a)
{ return a; }
/** \internal \returns a packet where the element i contains the sum of the packet of \a vec[i] */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
preduxp(const Packet* vecs) { return vecs[0]; }
/** \internal \returns \a a with real and imaginary part flipped (for complex type only) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet pcplxflip(const Packet& a)
{
return Packet(numext::imag(a),numext::real(a));
}
/** \internal \returns the sum of the elements of \a a*/
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux(const Packet& a)
/**************************
* Special math functions
***************************/
/** \internal \returns the sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psin(const Packet& a) { EIGEN_USING_STD(sin); return sin(a); }
/** \internal \returns the cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcos(const Packet& a) { EIGEN_USING_STD(cos); return cos(a); }
/** \internal \returns the tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptan(const Packet& a) { EIGEN_USING_STD(tan); return tan(a); }
/** \internal \returns the arc sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pasin(const Packet& a) { EIGEN_USING_STD(asin); return asin(a); }
/** \internal \returns the arc cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pacos(const Packet& a) { EIGEN_USING_STD(acos); return acos(a); }
/** \internal \returns the arc tangent of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet patan(const Packet& a) { EIGEN_USING_STD(atan); return atan(a); }
/** \internal \returns the hyperbolic sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psinh(const Packet& a) { EIGEN_USING_STD(sinh); return sinh(a); }
/** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcosh(const Packet& a) { EIGEN_USING_STD(cosh); return cosh(a); }
/** \internal \returns the hyperbolic tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptanh(const Packet& a) { EIGEN_USING_STD(tanh); return tanh(a); }
/** \internal \returns the exp of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexp(const Packet& a) { EIGEN_USING_STD(exp); return exp(a); }
/** \internal \returns the expm1 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexpm1(const Packet& a) { return numext::expm1(a); }
/** \internal \returns the log of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog(const Packet& a) { EIGEN_USING_STD(log); return log(a); }
/** \internal \returns the log1p of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog1p(const Packet& a) { return numext::log1p(a); }
/** \internal \returns the log10 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog10(const Packet& a) { EIGEN_USING_STD(log10); return log10(a); }
/** \internal \returns the log2 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog2(const Packet& a) {
typedef typename internal::unpacket_traits<Packet>::type Scalar;
return pmul(pset1<Packet>(Scalar(EIGEN_LOG2E)), plog(a));
}
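Note on the generic plog2 above: it relies on the identity log2(x) = log2(e) * ln(x). A minimal scalar sketch of the same identity follows; the names kLog2E and log2_via_ln are hypothetical and only mirror EIGEN_LOG2E and plog for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed constant: log2(e), truncated to double precision
// (EIGEN_LOG2E in the diff carries more digits as a long double literal).
const double kLog2E = 1.4426950408889634;

// Hypothetical scalar analogue of the generic plog2:
// log2(x) = log2(e) * ln(x).
inline double log2_via_ln(double x) {
  return kLog2E * std::log(x);
}
```

For example, log2_via_ln(8.0) evaluates to 3 up to rounding. The result is generally not bit-identical to std::log2, since the multiplication adds one more rounding step.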
/** \internal \returns the square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
/** \internal \returns the reciprocal square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet prsqrt(const Packet& a) {
typedef typename internal::unpacket_traits<Packet>::type Scalar;
return pdiv(pset1<Packet>(Scalar(1)), psqrt(a));
}
/** \internal \returns the rounded value of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pround(const Packet& a) { using numext::round; return round(a); }
/** \internal \returns the floor of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pfloor(const Packet& a) { using numext::floor; return floor(a); }
/** \internal \returns the rounded value of \a a (coeff-wise) with current
* rounding mode */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet print(const Packet& a) { using numext::rint; return rint(a); }
/** \internal \returns the ceil of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pceil(const Packet& a) { using numext::ceil; return ceil(a); }
/** \internal \returns the first element of a packet */
template<typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type
pfirst(const Packet& a)
{ return a; }
/** \internal \returns the sum of the elements of upper and lower half of \a a if \a a is larger than 4.
* For a packet {a0, a1, a2, a3, a4, a5, a6, a7}, it returns a half packet {a0+a4, a1+a5, a2+a6, a3+a7}
* For packet-size smaller or equal to 4, this boils down to a noop.
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline
typename conditional<(unpacket_traits<Packet>::size%8)==0,typename unpacket_traits<Packet>::half,Packet>::type
template<typename Packet>
EIGEN_DEVICE_FUNC inline typename conditional<(unpacket_traits<Packet>::size%8)==0,typename unpacket_traits<Packet>::half,Packet>::type
predux_half_dowto4(const Packet& a)
{ return a; }
// Slow generic implementation of Packet reduction.
template <typename Packet, typename Op>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type
predux_helper(const Packet& a, Op op) {
typedef typename unpacket_traits<Packet>::type Scalar;
const size_t n = unpacket_traits<Packet>::size;
Scalar elements[n];
pstoreu<Scalar>(elements, a);
for(size_t k = n / 2; k > 0; k /= 2) {
for(size_t i = 0; i < k; ++i) {
elements[i] = op(elements[i], elements[i + k]);
}
}
return elements[0];
}
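Note on predux_helper above: it stores the packet to a scalar buffer and then folds the upper half onto the lower half until one element is left. The sketch below reproduces that loop on a std::array, assuming a power-of-two length; tree_reduce is a hypothetical name and not part of Eigen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical standalone analogue of predux_helper: reduce N values
// pairwise, halving the active range on each pass.
template <typename T, std::size_t N, typename Op>
T tree_reduce(std::array<T, N> elements, Op op) {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  for (std::size_t k = N / 2; k > 0; k /= 2) {
    for (std::size_t i = 0; i < k; ++i) {
      // Combine element i with its partner in the upper half.
      elements[i] = op(elements[i], elements[i + k]);
    }
  }
  return elements[0];
}
```

With four elements {1, 2, 3, 4} and addition, the first pass yields {4, 6, ...} and the second yields 10. The combination order differs from a left-to-right fold, which can matter for floating-point sums.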
/** \internal \returns the sum of the elements of \a a*/
template<typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type
predux(const Packet& a)
{
return a;
}
/** \internal \returns the product of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_mul(const Packet& a)
{ return a; }
template <typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_mul(
const Packet& a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmul<Scalar>)));
}
/** \internal \returns the min of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_min(const Packet& a)
{ return a; }
template <typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_min(
const Packet &a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmin<PropagateFast, Scalar>)));
}
/** \internal \returns the max of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_max(const Packet& a)
{ return a; }
template <int NaNPropagation, typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_min(
const Packet& a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmin<NaNPropagation, Scalar>)));
}
/** \internal \returns the max of the elements of \a a */
template <typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_max(
const Packet &a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmax<PropagateFast, Scalar>)));
}
template <int NaNPropagation, typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_max(
const Packet& a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmax<NaNPropagation, Scalar>)));
}
#undef EIGEN_BINARY_OP_NAN_PROPAGATION
/** \internal \returns true if all coeffs of \a a mean "true"
* It is supposed to be called on values returned by pcmp_*.
@@ -484,101 +807,10 @@ template<typename Packet> EIGEN_DEVICE_FUNC inline bool predux_any(const Packet&
// - bits full of ones (NaN for floats),
// - or first bit equals to 1 (1 for ints, smallest denormal for floats).
// For all these cases, taking the sum is just fine, and this boils down to a no-op for scalars.
return bool(predux(a));
typedef typename unpacket_traits<Packet>::type Scalar;
return numext::not_equal_strict(predux(a), Scalar(0));
}
/** \internal \returns the reversed elements of \a a*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet preverse(const Packet& a)
{ return a; }
/** \internal \returns \a a with real and imaginary part flipped (for complex type only) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet pcplxflip(const Packet& a)
{
return Packet(numext::imag(a),numext::real(a));
}
/**************************
* Special math functions
***************************/
/** \internal \returns the sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psin(const Packet& a) { EIGEN_USING_STD_MATH(sin); return sin(a); }
/** \internal \returns the cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcos(const Packet& a) { EIGEN_USING_STD_MATH(cos); return cos(a); }
/** \internal \returns the tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptan(const Packet& a) { EIGEN_USING_STD_MATH(tan); return tan(a); }
/** \internal \returns the arc sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pasin(const Packet& a) { EIGEN_USING_STD_MATH(asin); return asin(a); }
/** \internal \returns the arc cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pacos(const Packet& a) { EIGEN_USING_STD_MATH(acos); return acos(a); }
/** \internal \returns the arc tangent of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet patan(const Packet& a) { EIGEN_USING_STD_MATH(atan); return atan(a); }
/** \internal \returns the hyperbolic sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psinh(const Packet& a) { EIGEN_USING_STD_MATH(sinh); return sinh(a); }
/** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcosh(const Packet& a) { EIGEN_USING_STD_MATH(cosh); return cosh(a); }
/** \internal \returns the hyperbolic tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptanh(const Packet& a) { EIGEN_USING_STD_MATH(tanh); return tanh(a); }
/** \internal \returns the exp of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexp(const Packet& a) { EIGEN_USING_STD_MATH(exp); return exp(a); }
/** \internal \returns the expm1 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexpm1(const Packet& a) { return numext::expm1(a); }
/** \internal \returns the log of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog(const Packet& a) { EIGEN_USING_STD_MATH(log); return log(a); }
/** \internal \returns the log1p of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog1p(const Packet& a) { return numext::log1p(a); }
/** \internal \returns the log10 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog10(const Packet& a) { EIGEN_USING_STD_MATH(log10); return log10(a); }
/** \internal \returns the square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psqrt(const Packet& a) { EIGEN_USING_STD_MATH(sqrt); return sqrt(a); }
/** \internal \returns the reciprocal square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet prsqrt(const Packet& a) {
return pdiv(pset1<Packet>(1), psqrt(a));
}
/** \internal \returns the rounded value of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pround(const Packet& a) { using numext::round; return round(a); }
/** \internal \returns the floor of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pfloor(const Packet& a) { using numext::floor; return floor(a); }
/** \internal \returns the ceil of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pceil(const Packet& a) { using numext::ceil; return ceil(a); }
/***************************************************************************
* The following functions might not have to be overwritten for vectorized types
***************************************************************************/
@@ -631,35 +863,6 @@ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet ploadt_ro(const typename unpacket_t
return ploadt<Packet, LoadMode>(from);
}
/** \internal default implementation of palign() allowing partial specialization */
template<int Offset,typename PacketType>
struct palign_impl
{
// by default data are aligned, so there is nothing to be done :)
static inline void run(PacketType&, const PacketType&) {}
};
/** \internal update \a first using the concatenation of the packet_size minus \a Offset last elements
* of \a first and \a Offset first elements of \a second.
*
* This function is currently only used to optimize matrix-vector products on unaligned matrices.
* It takes 2 packets that represent a contiguous memory array, and returns a packet starting
* at the position \a Offset. For instance, for packets of 4 elements, we have:
* Input:
* - first = {f0,f1,f2,f3}
* - second = {s0,s1,s2,s3}
* Output:
* - if Offset==0 then {f0,f1,f2,f3}
* - if Offset==1 then {f1,f2,f3,s0}
* - if Offset==2 then {f2,f3,s0,s1}
* - if Offset==3 then {f3,s0,s1,s2}
*/
template<int Offset,typename PacketType>
inline void palign(PacketType& first, const PacketType& second)
{
palign_impl<Offset,PacketType>::run(first,second);
}
/***************************************************************************
* Fast complex products (GCC generates a function call which is very slow)
***************************************************************************/
@@ -702,50 +905,6 @@ pblend(const Selector<unpacket_traits<Packet>::size>& ifPacket, const Packet& th
return ifPacket.select[0] ? thenPacket : elsePacket;
}
/** \internal \returns \a a with the first coefficient replaced by the scalar b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pinsertfirst(const Packet& a, typename unpacket_traits<Packet>::type b)
{
// Default implementation based on pblend.
// It must be specialized for higher performance.
Selector<unpacket_traits<Packet>::size> mask;
mask.select[0] = true;
// This for loop should be optimized away by the compiler.
for(Index i=1; i<unpacket_traits<Packet>::size; ++i)
mask.select[i] = false;
return pblend(mask, pset1<Packet>(b), a);
}
/** \internal \returns \a a with the last coefficient replaced by the scalar b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pinsertlast(const Packet& a, typename unpacket_traits<Packet>::type b)
{
// Default implementation based on pblend.
// It must be specialized for higher performance.
Selector<unpacket_traits<Packet>::size> mask;
// This for loop should be optimized away by the compiler.
for(Index i=0; i<unpacket_traits<Packet>::size-1; ++i)
mask.select[i] = false;
mask.select[unpacket_traits<Packet>::size-1] = true;
return pblend(mask, pset1<Packet>(b), a);
}
/***************************************************************************
* Some generic implementations to be used by implementors
***************************************************************************/
/** Default implementation of pfrexp for float.
* It is expected to be called by implementers of template<> pfrexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE Packet
pfrexp_float(const Packet& a, Packet& exponent);
/** Default implementation of pldexp for float.
* It is expected to be called by implementers of template<> pldexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE Packet
pldexp_float(Packet a, Packet exponent);
} // end namespace internal
} // end namespace Eigen

@@ -81,14 +81,16 @@ namespace Eigen
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(expm1,scalar_expm1_op,exponential of a value minus 1,\sa ArrayBase::expm1)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log,scalar_log_op,natural logarithm,\sa Eigen::log10 DOXCOMMA ArrayBase::log)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log1p,scalar_log1p_op,natural logarithm of 1 plus the value,\sa ArrayBase::log1p)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10,scalar_log10_op,base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10,scalar_log10_op,base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log10)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log2,scalar_log2_op,base 2 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs,scalar_abs_op,absolute value,\sa ArrayBase::abs DOXCOMMA MatrixBase::cwiseAbs)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs2,scalar_abs2_op,squared absolute value,\sa ArrayBase::abs2 DOXCOMMA MatrixBase::cwiseAbs2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(arg,scalar_arg_op,complex argument,\sa ArrayBase::arg)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(arg,scalar_arg_op,complex argument,\sa ArrayBase::arg DOXCOMMA MatrixBase::cwiseArg)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sqrt,scalar_sqrt_op,square root,\sa ArrayBase::sqrt DOXCOMMA MatrixBase::cwiseSqrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rsqrt,scalar_rsqrt_op,reciprocal square root,\sa ArrayBase::rsqrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(square,scalar_square_op,square (power 2),\sa Eigen::abs2 DOXCOMMA Eigen::pow DOXCOMMA ArrayBase::square)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cube,scalar_cube_op,cube (power 3),\sa Eigen::pow DOXCOMMA ArrayBase::cube)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rint,scalar_rint_op,nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(round,scalar_round_op,nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(floor,scalar_floor_op,nearest integer not greater than the given value,\sa Eigen::ceil DOXCOMMA ArrayBase::floor)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(ceil,scalar_ceil_op,nearest integer not less than the given value,\sa Eigen::floor DOXCOMMA ArrayBase::ceil)

@@ -130,6 +130,9 @@ struct significant_decimals_impl
template<typename Derived>
std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt)
{
using internal::is_same;
using internal::conditional;
if(_m.size() == 0)
{
s << fmt.matPrefix << fmt.matSuffix;
@@ -138,6 +141,22 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
typename Derived::Nested m = _m;
typedef typename Derived::Scalar Scalar;
typedef typename
conditional<
is_same<Scalar, char>::value ||
is_same<Scalar, unsigned char>::value ||
is_same<Scalar, numext::int8_t>::value ||
is_same<Scalar, numext::uint8_t>::value,
int,
typename conditional<
is_same<Scalar, std::complex<char> >::value ||
is_same<Scalar, std::complex<unsigned char> >::value ||
is_same<Scalar, std::complex<numext::int8_t> >::value ||
is_same<Scalar, std::complex<numext::uint8_t> >::value,
std::complex<int>,
const Scalar&
>::type
>::type PrintType;
Index width = 0;
@@ -174,7 +193,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
{
std::stringstream sstr;
sstr.copyfmt(s);
sstr << m.coeff(i,j);
sstr << static_cast<PrintType>(m.coeff(i,j));
width = std::max<Index>(width, Index(sstr.str().length()));
}
}
@@ -190,7 +209,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
s.fill(fmt.fill);
s.width(width);
}
s << m.coeff(i, 0);
s << static_cast<PrintType>(m.coeff(i, 0));
for(Index j = 1; j < m.cols(); ++j)
{
s << fmt.coeffSeparator;
@@ -198,7 +217,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
s.fill(fmt.fill);
s.width(width);
}
s << m.coeff(i, j);
s << static_cast<PrintType>(m.coeff(i, j));
}
s << fmt.rowSuffix;
if( i < m.rows() - 1)
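Note on the PrintType change in print_matrix above: streaming a char-sized integer with operator<< writes a character, so the hunks cast such coefficients to int before printing. The sketch below shows that effect in isolation; format_as is a hypothetical helper and not part of Eigen.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: format a value after converting it to PrintType,
// mirroring the static_cast<PrintType>(m.coeff(i, j)) calls in the diff.
template <typename PrintType, typename Scalar>
std::string format_as(const Scalar& value) {
  std::ostringstream os;
  os << static_cast<PrintType>(value);
  return os.str();
}
```

On common platforms, where std::int8_t is an alias for signed char, format_as<std::int8_t>(std::int8_t(65)) produces the character "A", while format_as<int>(std::int8_t(65)) produces the digits "65".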

@@ -54,7 +54,8 @@ struct traits<IndexedView<XprType, RowIndices, ColIndices> >
DirectAccessMask = (int(InnerIncr)!=UndefinedIncr && int(OuterIncr)!=UndefinedIncr && InnerIncr>=0 && OuterIncr>=0) ? DirectAccessBit : 0,
FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0,
FlagsLvalueBit = is_lvalue<XprType>::value ? LvalueBit : 0,
Flags = (traits<XprType>::Flags & (HereditaryBits | DirectAccessMask)) | FlagsLvalueBit | FlagsRowMajorBit
FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1) ? LinearAccessBit : 0,
Flags = (traits<XprType>::Flags & (HereditaryBits | DirectAccessMask )) | FlagsLvalueBit | FlagsRowMajorBit | FlagsLinearAccessBit
};
typedef Block<XprType,RowsAtCompileTime,ColsAtCompileTime,IsInnerPannel> BlockType;
@@ -168,7 +169,11 @@ struct unary_evaluator<IndexedView<ArgType, RowIndices, ColIndices>, IndexBased>
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost /* TODO + cost of row/col index */,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits /*| LinearAccessBit | DirectAccessBit*/)),
FlagsLinearAccessBit = (traits<XprType>::RowsAtCompileTime == 1 || traits<XprType>::ColsAtCompileTime == 1) ? LinearAccessBit : 0,
FlagsRowMajorBit = traits<XprType>::FlagsRowMajorBit,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits & ~RowMajorBit /*| LinearAccessBit | DirectAccessBit*/)) | FlagsLinearAccessBit | FlagsRowMajorBit,
Alignment = 0
};
@@ -193,6 +198,31 @@ struct unary_evaluator<IndexedView<ArgType, RowIndices, ColIndices>, IndexBased>
return m_argImpl.coeffRef(m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index index)
{
EIGEN_STATIC_ASSERT_LVALUE(XprType)
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
return m_argImpl.coeffRef( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar& coeffRef(Index index) const
{
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
return m_argImpl.coeffRef( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const CoeffReturnType coeff(Index index) const
{
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
return m_argImpl.coeff( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
protected:
evaluator<ArgType> m_argImpl;

@@ -182,6 +182,8 @@ template<typename Derived> class MapBase<Derived, ReadOnlyAccessors>
#endif
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase)
template<typename T>
EIGEN_DEVICE_FUNC
@@ -294,6 +296,9 @@ template<typename Derived> class MapBase<Derived, WriteAccessors>
// In theory we could simply refer to Base:Base::operator=, but MSVC does not like Base::Base,
// see bugs 821 and 920.
using ReadOnlyMapBase::Base::operator=;
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase)
};
#undef EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS

@@ -10,9 +10,11 @@
#ifndef EIGEN_MATHFUNCTIONS_H
#define EIGEN_MATHFUNCTIONS_H
// source: http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html
// TODO this should better be moved to NumTraits
#define EIGEN_PI 3.141592653589793238462643383279502884197169399375105820974944592307816406L
// Source: WolframAlpha
#define EIGEN_PI 3.141592653589793238462643383279502884197169399375105820974944592307816406L
#define EIGEN_LOG2E 1.442695040888963407359924681001892137426645954152985934135449406931109219L
#define EIGEN_LN2 0.693147180559945309417232121458176568075500134360255254120680009493393621L
namespace Eigen {
@@ -321,6 +323,65 @@ struct abs2_retval
typedef typename NumTraits<Scalar>::Real type;
};
/****************************************************************************
* Implementation of sqrt/rsqrt *
****************************************************************************/
template<typename Scalar>
struct sqrt_impl
{
EIGEN_DEVICE_FUNC
static EIGEN_ALWAYS_INLINE Scalar run(const Scalar& x)
{
EIGEN_USING_STD(sqrt);
return sqrt(x);
}
};
// Complex sqrt defined in MathFunctionsImpl.h.
template<typename T> EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& a_x);
// Custom implementation is faster than `std::sqrt`, works on
// GPU, and correctly handles special cases (unlike MSVC).
template<typename T>
struct sqrt_impl<std::complex<T> >
{
EIGEN_DEVICE_FUNC
static EIGEN_ALWAYS_INLINE std::complex<T> run(const std::complex<T>& x)
{
return complex_sqrt<T>(x);
}
};
template<typename Scalar>
struct sqrt_retval
{
typedef Scalar type;
};
// Default implementation relies on numext::sqrt, at bottom of file.
template<typename T>
struct rsqrt_impl;
// Complex rsqrt defined in MathFunctionsImpl.h.
template<typename T> EIGEN_DEVICE_FUNC std::complex<T> complex_rsqrt(const std::complex<T>& a_x);
template<typename T>
struct rsqrt_impl<std::complex<T> >
{
EIGEN_DEVICE_FUNC
static EIGEN_ALWAYS_INLINE std::complex<T> run(const std::complex<T>& x)
{
return complex_rsqrt<T>(x);
}
};
template<typename Scalar>
struct rsqrt_retval
{
typedef Scalar type;
};
/****************************************************************************
* Implementation of norm1 *
****************************************************************************/
@@ -335,7 +396,7 @@ struct norm1_default_impl<Scalar,true>
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return abs(x.real()) + abs(x.imag());
}
};
@@ -346,7 +407,7 @@ struct norm1_default_impl<Scalar, false>
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return abs(x);
}
};
@@ -376,7 +437,7 @@ struct hypot_retval
* Implementation of cast *
****************************************************************************/
template<typename OldType, typename NewType>
template<typename OldType, typename NewType, typename EnableIf = void>
struct cast_impl
{
EIGEN_DEVICE_FUNC
@@ -386,6 +447,22 @@ struct cast_impl
}
};
// Casting from S -> Complex<T> leads to an implicit conversion from S to T,
// generating warnings on clang. Here we explicitly cast the real component.
template<typename OldType, typename NewType>
struct cast_impl<OldType, NewType,
typename internal::enable_if<
!NumTraits<OldType>::IsComplex && NumTraits<NewType>::IsComplex
>::type>
{
EIGEN_DEVICE_FUNC
static inline NewType run(const OldType& x)
{
typedef typename NumTraits<NewType>::Real NewReal;
return static_cast<NewType>(static_cast<NewReal>(x));
}
};
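Note on the cast_impl specialization above: it is selected through a defaulted third template parameter and enable_if, and it converts the real input to the target's real type first so that the complex construction involves no implicit narrowing. The sketch below shows the same dispatch pattern with standard-library traits; cast_sketch and is_complex are hypothetical names, and std::enable_if stands in for internal::enable_if.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Hypothetical trait standing in for NumTraits<T>::IsComplex.
template <typename T> struct is_complex : std::false_type {};
template <typename T> struct is_complex<std::complex<T> > : std::true_type {};

// Primary template: plain static_cast.
template <typename NewType, typename OldType, typename EnableIf = void>
struct cast_sketch {
  static NewType run(const OldType& x) { return static_cast<NewType>(x); }
};

// Real -> complex: cast to the target's real type explicitly first.
template <typename NewType, typename OldType>
struct cast_sketch<NewType, OldType,
                   typename std::enable_if<!is_complex<OldType>::value &&
                                           is_complex<NewType>::value>::type> {
  static NewType run(const OldType& x) {
    typedef typename NewType::value_type NewReal;
    return static_cast<NewType>(static_cast<NewReal>(x));
  }
};
```

Here cast_sketch<std::complex<float>, double>::run(1.5) goes through the specialization and yields (1.5f, 0), while a real-to-real cast such as double to int still uses the primary template.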
// here, for once, we're plainly returning NewType: we don't want cast to do weird things.
template<typename OldType, typename NewType>
@@ -402,22 +479,24 @@ inline NewType cast(const OldType& x)
#if EIGEN_HAS_CXX11_MATH
template<typename Scalar>
struct round_impl {
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
EIGEN_USING_STD_MATH(round);
return round(x);
EIGEN_USING_STD(round);
return Scalar(round(x));
}
};
#else
template<typename Scalar>
struct round_impl
{
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
EIGEN_USING_STD_MATH(floor);
EIGEN_USING_STD_MATH(ceil);
EIGEN_USING_STD(floor);
EIGEN_USING_STD(ceil);
return (x > Scalar(0)) ? floor(x + Scalar(0.5)) : ceil(x - Scalar(0.5));
}
};
@@ -429,49 +508,110 @@ struct round_retval
typedef Scalar type;
};
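Note on the pre-C++11 round_impl above: it rounds half away from zero using only floor and ceil. A scalar sketch for double follows; round_half_away is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Round half away from zero using only floor and ceil, following the
// expression in the non-C++11 branch of round_impl.
inline double round_half_away(double x) {
  return (x > 0.0) ? std::floor(x + 0.5) : std::ceil(x - 0.5);
}
```

For example, 2.5 maps to 3 and -2.5 maps to -3. One caveat of this formulation: for inputs just below 0.5 in magnitude, the addition x + 0.5 can itself round up to 1, so the result may differ from std::round in that narrow range.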
/****************************************************************************
* Implementation of rint *
****************************************************************************/
template<typename Scalar>
struct rint_impl {
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
#if EIGEN_HAS_CXX11_MATH
EIGEN_USING_STD(rint);
#endif
return rint(x);
}
};
#if !EIGEN_HAS_CXX11_MATH
template<>
struct rint_impl<double> {
EIGEN_DEVICE_FUNC
static inline double run(const double& x)
{
return ::rint(x);
}
};
template<>
struct rint_impl<float> {
EIGEN_DEVICE_FUNC
static inline float run(const float& x)
{
return ::rintf(x);
}
};
#endif
template<typename Scalar>
struct rint_retval
{
typedef Scalar type;
};
/****************************************************************************
* Implementation of arg *
****************************************************************************/
#if EIGEN_HAS_CXX11_MATH
template<typename Scalar>
struct arg_impl {
static inline Scalar run(const Scalar& x)
{
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// HIP does not seem to have a native device side implementation for the math routine "arg"
using std::arg;
#else
EIGEN_USING_STD_MATH(arg);
#endif
return arg(x);
}
};
// std::arg is only defined for types of std::complex, or integer types or float/double/long double
template<typename Scalar,
bool HasStdImpl = NumTraits<Scalar>::IsComplex || is_integral<Scalar>::value
|| is_same<Scalar, float>::value || is_same<Scalar, double>::value
|| is_same<Scalar, long double>::value >
struct arg_default_impl;
template<typename Scalar>
struct arg_default_impl<Scalar, true> {
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// HIP does not seem to have a native device side implementation for the math routine "arg"
using std::arg;
#else
EIGEN_USING_STD(arg);
#endif
return static_cast<Scalar>(arg(x));
}
};
// Must be non-complex floating-point type (e.g. half/bfloat16).
template<typename Scalar>
struct arg_default_impl<Scalar, false> {
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
return (x < Scalar(0)) ? Scalar(EIGEN_PI) : Scalar(0);
}
};
#else
template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
struct arg_default_impl
template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
struct arg_default_impl
{
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
return (x < Scalar(0)) ? Scalar(EIGEN_PI) : Scalar(0); }
};
return (x < Scalar(0)) ? Scalar(EIGEN_PI) : Scalar(0);
}
};
template<typename Scalar>
struct arg_default_impl<Scalar,true>
template<typename Scalar>
struct arg_default_impl<Scalar,true>
{
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
EIGEN_USING_STD_MATH(arg);
return arg(x);
}
};
template<typename Scalar> struct arg_impl : arg_default_impl<Scalar> {};
EIGEN_USING_STD(arg);
return arg(x);
}
};
#endif
template<typename Scalar> struct arg_impl : arg_default_impl<Scalar> {};
template<typename Scalar>
struct arg_retval
@@ -493,7 +633,7 @@ namespace std_fallback {
EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_USING_STD_MATH(exp);
EIGEN_USING_STD(exp);
Scalar u = exp(x);
if (numext::equal_strict(u, Scalar(1))) {
return x;
@@ -503,7 +643,7 @@ namespace std_fallback {
return RealScalar(-1);
}
EIGEN_USING_STD_MATH(log);
EIGEN_USING_STD(log);
Scalar logu = log(u);
return numext::equal_strict(u, logu) ? u : (u - RealScalar(1)) * x / logu;
}
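Note on std_fallback::expm1 above: the visible lines compute u = exp(x), return x when u rounds to 1, and otherwise correct u - 1 by the factor x / log(u). The sketch below specializes this to double. The condition guarding the -1 return falls between the two hunks and is not shown, so the check used here (u - 1 == -1) is an assumption, as is the name expm1_fallback.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the fallback, specialised to double.
inline double expm1_fallback(double x) {
  const double u = std::exp(x);
  if (u == 1.0) return x;        // exp(x) rounded to 1: expm1(x) is about x
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;  // assumed guard: exp(x) is negligible next to 1
  const double logu = std::log(u);
  // u == log(u) only holds for +inf; otherwise rescale u - 1 by x / log(u)
  // to cancel the rounding error committed when forming u.
  return (u == logu) ? u : um1 * x / logu;
}
```

For small arguments such as 1e-10, the naive exp(x) - 1 loses about half of its significant digits, while this form stays close to std::expm1.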
@@ -523,16 +663,6 @@ struct expm1_impl {
}
};
// Specialization for complex types that are not supported by std::expm1.
template <typename RealScalar>
struct expm1_impl<std::complex<RealScalar> > {
EIGEN_DEVICE_FUNC static inline std::complex<RealScalar> run(
const std::complex<RealScalar>& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(RealScalar)
return std_fallback::expm1(x);
}
};
template<typename Scalar>
struct expm1_retval
{
@@ -550,7 +680,7 @@ namespace std_fallback {
EIGEN_DEVICE_FUNC inline Scalar log1p(const Scalar& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_USING_STD_MATH(log);
EIGEN_USING_STD(log);
Scalar x1p = RealScalar(1) + x;
Scalar log_1p = log(x1p);
const bool is_small = numext::equal_strict(x1p, Scalar(1));
@@ -600,7 +730,7 @@ struct pow_impl
typedef typename ScalarBinaryOpTraits<ScalarX,ScalarY,internal::scalar_pow_op<ScalarX,ScalarY> >::ReturnType result_type;
static EIGEN_DEVICE_FUNC inline result_type run(const ScalarX& x, const ScalarY& y)
{
EIGEN_USING_STD_MATH(pow);
EIGEN_USING_STD(pow);
return pow(x, y);
}
};
@@ -902,12 +1032,12 @@ template<typename T> T generic_fast_tanh_float(const T& a_x);
namespace numext {
#if (!defined(EIGEN_GPUCC) || defined(EIGEN_CONSTEXPR_ARE_DEVICE_FUNC))
#if (!defined(EIGEN_GPUCC) || defined(EIGEN_CONSTEXPR_ARE_DEVICE_FUNC))
template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE T mini(const T& x, const T& y)
{
EIGEN_USING_STD_MATH(min);
EIGEN_USING_STD(min)
return min EIGEN_NOT_A_MACRO (x,y);
}
@@ -915,7 +1045,7 @@ template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
{
EIGEN_USING_STD_MATH(max);
EIGEN_USING_STD(max)
return max EIGEN_NOT_A_MACRO (x,y);
}
#else
@@ -1116,6 +1246,34 @@ inline EIGEN_MATHFUNC_RETVAL(abs2, Scalar) abs2(const Scalar& x)
EIGEN_DEVICE_FUNC
inline bool abs2(bool x) { return x; }
template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE T absdiff(const T& x, const T& y)
{
return x > y ? x - y : y - x;
}
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE float absdiff(const float& x, const float& y)
{
return fabsf(x - y);
}
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE double absdiff(const double& x, const double& y)
{
return fabs(x - y);
}
#if !defined(EIGEN_GPUCC)
// HIP and CUDA do not support long double.
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE long double absdiff(const long double& x, const long double& y) {
return fabsl(x - y);
}
#endif
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(norm1, Scalar) norm1(const Scalar& x)
@@ -1174,6 +1332,13 @@ SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isinf, isinf, bool)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isfinite, isfinite, bool)
#endif
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(rint, Scalar) rint(const Scalar& x)
{
return EIGEN_MATHFUNC_IMPL(rint, Scalar)::run(x);
}
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(round, Scalar) round(const Scalar& x)
@@ -1189,7 +1354,7 @@ template<typename T>
EIGEN_DEVICE_FUNC
T (floor)(const T& x)
{
EIGEN_USING_STD_MATH(floor);
EIGEN_USING_STD(floor)
return floor(x);
}
@@ -1209,7 +1374,7 @@ template<typename T>
EIGEN_DEVICE_FUNC
T (ceil)(const T& x)
{
EIGEN_USING_STD_MATH(ceil);
EIGEN_USING_STD(ceil);
return ceil(x);
}
@@ -1250,23 +1415,35 @@ inline int log2(int x)
*
* Its usage is justified in performance-critical functions, like norm/normalize.
*/
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T sqrt(const T &x)
template<typename Scalar>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE EIGEN_MATHFUNC_RETVAL(sqrt, Scalar) sqrt(const Scalar& x)
{
EIGEN_USING_STD_MATH(sqrt);
return sqrt(x);
return EIGEN_MATHFUNC_IMPL(sqrt, Scalar)::run(x);
}
// Boolean specialization, avoids implicit float to bool conversion (-Wimplicit-conversion-floating-point-to-bool).
template<>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_DEVICE_FUNC
bool sqrt<bool>(const bool &x) { return x; }
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(sqrt, sqrt)
#endif
/** \returns the reciprocal square root of \a x. **/
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T rsqrt(const T& x)
{
return internal::rsqrt_impl<T>::run(x);
}
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T log(const T &x) {
EIGEN_USING_STD_MATH(log);
return log(x);
EIGEN_USING_STD(log);
return static_cast<T>(log(x));
}
#if defined(SYCL_DEVICE_ONLY)
@@ -1286,7 +1463,7 @@ template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
typename internal::enable_if<NumTraits<T>::IsSigned || NumTraits<T>::IsComplex,typename NumTraits<T>::Real>::type
abs(const T &x) {
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return abs(x);
}
@@ -1323,7 +1500,7 @@ double abs(const std::complex<double>& x) {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T exp(const T &x) {
EIGEN_USING_STD_MATH(exp);
EIGEN_USING_STD(exp);
return exp(x);
}
@@ -1377,7 +1554,7 @@ double expm1(const double &x) { return ::expm1(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T cos(const T &x) {
EIGEN_USING_STD_MATH(cos);
EIGEN_USING_STD(cos);
return cos(x);
}
@@ -1396,7 +1573,7 @@ double cos(const double &x) { return ::cos(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T sin(const T &x) {
EIGEN_USING_STD_MATH(sin);
EIGEN_USING_STD(sin);
return sin(x);
}
@@ -1415,7 +1592,7 @@ double sin(const double &x) { return ::sin(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T tan(const T &x) {
EIGEN_USING_STD_MATH(tan);
EIGEN_USING_STD(tan);
return tan(x);
}
@@ -1434,7 +1611,7 @@ double tan(const double &x) { return ::tan(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T acos(const T &x) {
EIGEN_USING_STD_MATH(acos);
EIGEN_USING_STD(acos);
return acos(x);
}
@@ -1442,8 +1619,8 @@ T acos(const T &x) {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T acosh(const T &x) {
EIGEN_USING_STD_MATH(acosh);
return acosh(x);
EIGEN_USING_STD(acosh);
return static_cast<T>(acosh(x));
}
#endif
@@ -1463,7 +1640,7 @@ double acos(const double &x) { return ::acos(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T asin(const T &x) {
EIGEN_USING_STD_MATH(asin);
EIGEN_USING_STD(asin);
return asin(x);
}
@@ -1471,8 +1648,8 @@ T asin(const T &x) {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T asinh(const T &x) {
EIGEN_USING_STD_MATH(asinh);
return asinh(x);
EIGEN_USING_STD(asinh);
return static_cast<T>(asinh(x));
}
#endif
@@ -1492,16 +1669,16 @@ double asin(const double &x) { return ::asin(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T atan(const T &x) {
EIGEN_USING_STD_MATH(atan);
return atan(x);
EIGEN_USING_STD(atan);
return static_cast<T>(atan(x));
}
#if EIGEN_HAS_CXX11_MATH
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T atanh(const T &x) {
EIGEN_USING_STD_MATH(atanh);
return atanh(x);
EIGEN_USING_STD(atanh);
return static_cast<T>(atanh(x));
}
#endif
@@ -1522,8 +1699,8 @@ double atan(const double &x) { return ::atan(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T cosh(const T &x) {
EIGEN_USING_STD_MATH(cosh);
return cosh(x);
EIGEN_USING_STD(cosh);
return static_cast<T>(cosh(x));
}
#if defined(SYCL_DEVICE_ONLY)
@@ -1541,8 +1718,8 @@ double cosh(const double &x) { return ::cosh(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T sinh(const T &x) {
EIGEN_USING_STD_MATH(sinh);
return sinh(x);
EIGEN_USING_STD(sinh);
return static_cast<T>(sinh(x));
}
#if defined(SYCL_DEVICE_ONLY)
@@ -1560,7 +1737,7 @@ double sinh(const double &x) { return ::sinh(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T tanh(const T &x) {
EIGEN_USING_STD_MATH(tanh);
EIGEN_USING_STD(tanh);
return tanh(x);
}
@@ -1584,7 +1761,7 @@ double tanh(const double &x) { return ::tanh(x); }
template <typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T fmod(const T& a, const T& b) {
EIGEN_USING_STD_MATH(fmod);
EIGEN_USING_STD(fmod);
return fmod(a, b);
}
@@ -1746,6 +1923,11 @@ template<> struct random_impl<bool>
{
return random<int>(0,1)==0 ? false : true;
}
static inline bool run(const bool& a, const bool& b)
{
return random<int>(a, b)==0 ? false : true;
}
};
template<> struct scalar_fuzzy_impl<bool>
@@ -1772,6 +1954,45 @@ template<> struct scalar_fuzzy_impl<bool>
};
} // end namespace internal
// Default implementations that rely on other numext implementations
namespace internal {
// Specialization for complex types that are not supported by std::expm1.
template <typename RealScalar>
struct expm1_impl<std::complex<RealScalar> > {
EIGEN_DEVICE_FUNC static inline std::complex<RealScalar> run(
const std::complex<RealScalar>& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(RealScalar)
RealScalar xr = x.real();
RealScalar xi = x.imag();
// expm1(z) = exp(z) - 1
// = exp(x + i * y) - 1
// = exp(x) * (cos(y) + i * sin(y)) - 1
// = exp(x) * cos(y) - 1 + i * exp(x) * sin(y)
// Imag(expm1(z)) = exp(x) * sin(y)
// Real(expm1(z)) = exp(x) * cos(y) - 1
// = exp(x) * cos(y) - 1.
// = expm1(x) + exp(x) * (cos(y) - 1)
// = expm1(x) - exp(x) * (2 * sin(y / 2) ** 2)
RealScalar erm1 = numext::expm1<RealScalar>(xr);
RealScalar er = erm1 + RealScalar(1.);
RealScalar sin2 = numext::sin(xi / RealScalar(2.));
sin2 = sin2 * sin2;
RealScalar s = numext::sin(xi);
RealScalar real_part = erm1 - RealScalar(2.) * er * sin2;
return std::complex<RealScalar>(real_part, er * s);
}
};
template<typename T>
struct rsqrt_impl {
EIGEN_DEVICE_FUNC
static EIGEN_ALWAYS_INLINE T run(const T& x) {
return T(1)/numext::sqrt(x);
}
};
} // end namespace internal
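
The derivation in the complex expm1 specialization above can be checked in isolation. The following is a minimal sketch, assuming only the C++11 standard library; `complex_expm1_sketch` is a hypothetical name used for illustration and is not part of Eigen. It uses the identity cos(y) - 1 = -2 * sin(y/2)^2 so that the real part does not suffer cancellation when both x and y are small.

```cpp
#include <cmath>
#include <complex>

// Hypothetical stand-alone helper (not Eigen's API): complex expm1 built
// from real-valued std:: functions, following the derivation in the diff.
template <typename Real>
std::complex<Real> complex_expm1_sketch(const std::complex<Real>& z) {
  const Real x = z.real();
  const Real y = z.imag();
  const Real erm1 = std::expm1(x);            // exp(x) - 1, accurate for small x
  const Real er = erm1 + Real(1);             // exp(x)
  const Real s_half = std::sin(y / Real(2));  // sin(y / 2)
  // Real part: expm1(x) + exp(x) * (cos(y) - 1), with
  // cos(y) - 1 rewritten as -2 * sin(y / 2)^2 to avoid cancellation.
  const Real real_part = erm1 - Real(2) * er * s_half * s_half;
  // Imaginary part: exp(x) * sin(y).
  return std::complex<Real>(real_part, er * std::sin(y));
}
```

For tiny inputs such as z = 1e-10 + 1e-10i, a naive `std::exp(z) - 1.0` loses most significant digits in the real part, whereas the form above stays close to z itself.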

@@ -17,19 +17,28 @@ namespace internal {
/** \internal \returns the hyperbolic tan of \a a (coeff-wise)
Doesn't do anything fancy, just a 13/6-degree rational interpolant which
is accurate up to a couple of ulp in the range [-9, 9], outside of which
the tanh(x) = +/-1.
is accurate up to a couple of ulps in the (approximate) range [-8, 8],
outside of which tanh(x) = +/-1 in single precision. The input is clamped
to the range [-c, c]. The value c is chosen as the smallest value where
the approximation evaluates to exactly 1. In the range [-0.0004, 0.0004]
the approximation tanh(x) ~= x is used for better accuracy as x tends to zero.
This implementation works on both scalars and packets.
*/
template<typename T>
T generic_fast_tanh_float(const T& a_x)
{
// Clamp the inputs to the range [-9, 9] since anything outside
// this range is +/-1.0f in single-precision.
const T plus_9 = pset1<T>(9.f);
const T minus_9 = pset1<T>(-9.f);
const T x = pmax(pmin(a_x, plus_9), minus_9);
// Clamp the inputs to the range [-c, c]
#ifdef EIGEN_VECTORIZE_FMA
const T plus_clamp = pset1<T>(7.99881172180175781f);
const T minus_clamp = pset1<T>(-7.99881172180175781f);
#else
const T plus_clamp = pset1<T>(7.90531110763549805f);
const T minus_clamp = pset1<T>(-7.90531110763549805f);
#endif
const T tiny = pset1<T>(0.0004f);
const T x = pmax(pmin(a_x, plus_clamp), minus_clamp);
const T tiny_mask = pcmp_lt(pabs(a_x), tiny);
// The monomial coefficients of the numerator polynomial (odd).
const T alpha_1 = pset1<T>(4.89352455891786e-03f);
const T alpha_3 = pset1<T>(6.37261928875436e-04f);
@@ -57,20 +66,26 @@ T generic_fast_tanh_float(const T& a_x)
p = pmadd(x2, p, alpha_1);
p = pmul(x, p);
// Evaluate the denominator polynomial p.
// Evaluate the denominator polynomial q.
T q = pmadd(x2, beta_6, beta_4);
q = pmadd(x2, q, beta_2);
q = pmadd(x2, q, beta_0);
// Divide the numerator by the denominator.
return pdiv(p, q);
return pselect(tiny_mask, x, pdiv(p, q));
}
template<typename RealScalar>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
RealScalar positive_real_hypot(const RealScalar& x, const RealScalar& y)
{
EIGEN_USING_STD_MATH(sqrt);
// IEEE IEC 60559 special cases.
if ((numext::isinf)(x) || (numext::isinf)(y))
return NumTraits<RealScalar>::infinity();
if ((numext::isnan)(x) || (numext::isnan)(y))
return NumTraits<RealScalar>::quiet_NaN();
EIGEN_USING_STD(sqrt);
RealScalar p, qp;
p = numext::maxi(x,y);
if(p==RealScalar(0)) return RealScalar(0);
@@ -85,11 +100,90 @@ struct hypot_impl
static EIGEN_DEVICE_FUNC
inline RealScalar run(const Scalar& x, const Scalar& y)
{
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return positive_real_hypot<RealScalar>(abs(x), abs(y));
}
};
// Generic complex sqrt implementation that correctly handles corner cases
// according to https://en.cppreference.com/w/cpp/numeric/complex/sqrt
template<typename T>
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
// Computes the principal sqrt of the input.
//
// For a complex square root of the number x + i*y, we want to find real
// numbers u and v such that
// (u + i*v)^2 = x + i*y <=>
// u^2 - v^2 + i*2*u*v = x + i*y.
// By equating the real and imaginary parts we get:
// u^2 - v^2 = x
// 2*u*v = y.
//
// For x >= 0, this has the numerically stable solution
// u = sqrt(0.5 * (x + sqrt(x^2 + y^2)))
// v = y / (2 * u)
// and for x < 0,
// v = sign(y) * sqrt(0.5 * (-x + sqrt(x^2 + y^2)))
// u = y / (2 * v)
//
// Letting w = sqrt(0.5 * (|x| + |z|)),
// if x == 0: u = w, v = sign(y) * w
// if x > 0: u = w, v = y / (2 * w)
// if x < 0: u = |y| / (2 * w), v = sign(y) * w
const T x = numext::real(z);
const T y = numext::imag(z);
const T zero = T(0);
const T w = numext::sqrt(T(0.5) * (numext::abs(x) + numext::hypot(x, y)));
return
(numext::isinf)(y) ? std::complex<T>(NumTraits<T>::infinity(), y)
: x == zero ? std::complex<T>(w, y < zero ? -w : w)
: x > zero ? std::complex<T>(w, y / (2 * w))
: std::complex<T>(numext::abs(y) / (2 * w), y < zero ? -w : w );
}
// Generic complex rsqrt implementation.
template<typename T>
EIGEN_DEVICE_FUNC std::complex<T> complex_rsqrt(const std::complex<T>& z) {
// Computes the principal reciprocal sqrt of the input.
//
// For a complex reciprocal square root of the number z = x + i*y, we want to
// find real numbers u and v such that
// (u + i*v)^2 = 1 / (x + i*y) <=>
// u^2 - v^2 + i*2*u*v = x/|z|^2 - i*y/|z|^2.
// By equating the real and imaginary parts we get:
// u^2 - v^2 = x/|z|^2
// 2*u*v = -y/|z|^2.
//
// For x >= 0, this has the numerically stable solution
// u = sqrt(0.5 * (x + |z|)) / |z|
// v = -y / (2 * u * |z|^2)
// and for x < 0,
// v = -sign(y) * sqrt(0.5 * (-x + |z|)) / |z|
// u = -y / (2 * v * |z|^2)
//
// Letting w = sqrt(0.5 * (|x| + |z|)),
// if x == 0: u = w / |z|, v = -sign(y) * w / |z|
// if x > 0: u = w / |z|, v = -y / (2 * w * |z|)
// if x < 0: u = |y| / (2 * w * |z|), v = -sign(y) * w / |z|
const T x = numext::real(z);
const T y = numext::imag(z);
const T zero = T(0);
const T abs_z = numext::hypot(x, y);
const T w = numext::sqrt(T(0.5) * (numext::abs(x) + abs_z));
const T woz = w / abs_z;
// Corner cases consistent with 1/sqrt(z) on gcc/clang.
return
abs_z == zero ? std::complex<T>(NumTraits<T>::infinity(), NumTraits<T>::quiet_NaN())
: ((numext::isinf)(x) || (numext::isinf)(y)) ? std::complex<T>(zero, zero)
: x == zero ? std::complex<T>(woz, y < zero ? woz : -woz)
: x > zero ? std::complex<T>(woz, -y / (2 * w * abs_z))
: std::complex<T>(numext::abs(y) / (2 * w * abs_z), y < zero ? woz : -woz );
}
} // end namespace internal
} // end namespace Eigen
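
The case analysis in the complex_sqrt comment can be tried outside Eigen. The following is a minimal sketch, assuming C++11 and finite inputs only; `complex_sqrt_sketch` is a hypothetical name, and the infinity handling from the diff is deliberately left out. With w = sqrt(0.5 * (|x| + |z|)), the branch on the sign of x picks the formula that never subtracts two nearly equal quantities.

```cpp
#include <cmath>
#include <complex>

// Hypothetical stand-alone version of the three-way case split described in
// the diff (not Eigen's API). Assumes finite inputs; inf/NaN are not handled.
template <typename T>
std::complex<T> complex_sqrt_sketch(const std::complex<T>& z) {
  const T x = z.real();
  const T y = z.imag();
  // w = sqrt(0.5 * (|x| + |z|)); std::hypot computes |z| without overflow
  // in the intermediate squares.
  const T w = std::sqrt(T(0.5) * (std::abs(x) + std::hypot(x, y)));
  if (x == T(0)) {
    // u = w, v = sign(y) * w
    return std::complex<T>(w, y < T(0) ? -w : w);
  }
  if (x > T(0)) {
    // u = w, v = y / (2 * w)
    return std::complex<T>(w, y / (T(2) * w));
  }
  // x < 0: u = |y| / (2 * w), v = sign(y) * w
  return std::complex<T>(std::abs(y) / (T(2) * w), y < T(0) ? -w : w);
}
```

For example, the input 3 + 4i gives w = 2 and takes the x > 0 branch, while -3 + 4i gives the same w and takes the x < 0 branch.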

@@ -481,7 +481,8 @@ template<typename Derived> class MatrixBase
EIGEN_MATRIX_FUNCTION_1(MatrixComplexPowerReturnValue, pow, power to \c p, const std::complex<RealScalar>& p)
protected:
EIGEN_DEVICE_FUNC MatrixBase() : Base() {}
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MatrixBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MatrixBase)
private:
EIGEN_DEVICE_FUNC explicit MatrixBase(int);

@@ -21,14 +21,14 @@ template< typename T,
bool is_integer = NumTraits<T>::IsInteger>
struct default_digits10_impl
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return std::numeric_limits<T>::digits10; }
};
template<typename T>
struct default_digits10_impl<T,false,false> // Floating point
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() {
using std::log10;
using std::ceil;
@@ -40,7 +40,7 @@ struct default_digits10_impl<T,false,false> // Floating point
template<typename T>
struct default_digits10_impl<T,false,true> // Integer
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return 0; }
};
@@ -52,14 +52,14 @@ template< typename T,
bool is_integer = NumTraits<T>::IsInteger>
struct default_digits_impl
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return std::numeric_limits<T>::digits; }
};
template<typename T>
struct default_digits_impl<T,false,false> // Floating point
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() {
using std::log;
using std::ceil;
@@ -71,12 +71,33 @@ struct default_digits_impl<T,false,false> // Floating point
template<typename T>
struct default_digits_impl<T,false,true> // Integer
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return 0; }
};
} // end namespace internal
namespace numext {
/** \internal bit-wise cast without changing the underlying bit representation. */
// TODO: Replace by std::bit_cast (available in C++20)
template <typename Tgt, typename Src>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Tgt bit_cast(const Src& src) {
#if EIGEN_HAS_TYPE_TRAITS
// The behaviour of memcpy is not specified for non-trivially copyable types
EIGEN_STATIC_ASSERT(std::is_trivially_copyable<Src>::value, THIS_TYPE_IS_NOT_SUPPORTED);
EIGEN_STATIC_ASSERT(std::is_trivially_copyable<Tgt>::value && std::is_default_constructible<Tgt>::value,
THIS_TYPE_IS_NOT_SUPPORTED);
#endif
EIGEN_STATIC_ASSERT(sizeof(Src) == sizeof(Tgt), THIS_TYPE_IS_NOT_SUPPORTED);
Tgt tgt;
EIGEN_USING_STD(memcpy)
memcpy(&tgt, &src, sizeof(Tgt));
return tgt;
}
} // namespace numext
/** \class NumTraits
* \ingroup Core_Module
*
@@ -140,25 +161,25 @@ template<typename T> struct GenericNumTraits
typedef T Nested;
typedef T Literal;
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real epsilon()
{
return numext::numeric_limits<T>::epsilon();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits10()
{
return internal::default_digits10_impl<T>::run();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits()
{
return internal::default_digits_impl<T>::run();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real dummy_precision()
{
// make sure to override this for floating-point types
@@ -166,23 +187,23 @@ template<typename T> struct GenericNumTraits
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T highest() {
return (numext::numeric_limits<T>::max)();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T lowest() {
return IsInteger ? (numext::numeric_limits<T>::min)()
: static_cast<T>(-(numext::numeric_limits<T>::max)());
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T infinity() {
return numext::numeric_limits<T>::infinity();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T quiet_NaN() {
return numext::numeric_limits<T>::quiet_NaN();
}
@@ -194,19 +215,20 @@ template<typename T> struct NumTraits : GenericNumTraits<T>
template<> struct NumTraits<float>
: GenericNumTraits<float>
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline float dummy_precision() { return 1e-5f; }
};
template<> struct NumTraits<double> : GenericNumTraits<double>
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline double dummy_precision() { return 1e-12; }
};
template<> struct NumTraits<long double>
: GenericNumTraits<long double>
{
EIGEN_CONSTEXPR
static inline long double dummy_precision() { return 1e-15l; }
};
@@ -223,11 +245,11 @@ template<typename _Real> struct NumTraits<std::complex<_Real> >
MulCost = 4 * NumTraits<Real>::MulCost + 2 * NumTraits<Real>::AddCost
};
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real epsilon() { return NumTraits<Real>::epsilon(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real dummy_precision() { return NumTraits<Real>::dummy_precision(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits10() { return NumTraits<Real>::digits10(); }
};
@@ -252,11 +274,12 @@ struct NumTraits<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
MulCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::MulCost
};
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline RealScalar epsilon() { return NumTraits<RealScalar>::epsilon(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline RealScalar dummy_precision() { return NumTraits<RealScalar>::dummy_precision(); }
EIGEN_CONSTEXPR
static inline int digits10() { return NumTraits<Scalar>::digits10(); }
};
@@ -270,6 +293,7 @@ template<> struct NumTraits<std::string>
MulCost = HugeCost
};
EIGEN_CONSTEXPR
static inline int digits10() { return 0; }
private:
@@ -284,6 +308,8 @@ private:
// Empty specialization for void to allow template specialization based on NumTraits<T>::Real with T==void and SFINAE.
template<> struct NumTraits<void> {};
template<> struct NumTraits<bool> : GenericNumTraits<bool> {};
} // end namespace Eigen
#endif // EIGEN_NUMTRAITS_H
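
The memcpy-based bit_cast added to numext in this file can be illustrated without Eigen's macros. The following is a minimal sketch, assuming C++11 (for `static_assert`) and a platform where `float` is a 32-bit IEEE 754 type; `bit_cast_sketch` is a hypothetical name, not Eigen's function.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-alone analogue of a memcpy-based bit cast (not Eigen's
// API). Copies the object representation of src into a value of type Tgt.
template <typename Tgt, typename Src>
Tgt bit_cast_sketch(const Src& src) {
  // Reinterpreting bits only makes sense when both types have the same size.
  static_assert(sizeof(Tgt) == sizeof(Src), "source and target sizes must match");
  Tgt tgt;
  // memcpy avoids the undefined behaviour of pointer-based type punning.
  std::memcpy(&tgt, &src, sizeof(Tgt));
  return tgt;
}
```

Under the IEEE 754 assumption stated above, `bit_cast_sketch<std::uint32_t>(1.0f)` yields the bit pattern `0x3f800000`, and casting that pattern back yields `1.0f`.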

@@ -717,18 +717,26 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
using Base::setConstant;
EIGEN_DEVICE_FUNC Derived& setConstant(Index size, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, Index cols, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(NoChange_t, Index cols, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, NoChange_t, const Scalar& val);
using Base::setZero;
EIGEN_DEVICE_FUNC Derived& setZero(Index size);
EIGEN_DEVICE_FUNC Derived& setZero(Index rows, Index cols);
EIGEN_DEVICE_FUNC Derived& setZero(NoChange_t, Index cols);
EIGEN_DEVICE_FUNC Derived& setZero(Index rows, NoChange_t);
using Base::setOnes;
EIGEN_DEVICE_FUNC Derived& setOnes(Index size);
EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, Index cols);
EIGEN_DEVICE_FUNC Derived& setOnes(NoChange_t, Index cols);
EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, NoChange_t);
using Base::setRandom;
Derived& setRandom(Index size);
Derived& setRandom(Index rows, Index cols);
Derived& setRandom(NoChange_t, Index cols);
Derived& setRandom(Index rows, NoChange_t);
#ifdef EIGEN_PLAINOBJECTBASE_PLUGIN
#include EIGEN_PLAINOBJECTBASE_PLUGIN

@@ -14,7 +14,7 @@
#define EIGEN_PRODUCTEVALUATORS_H
namespace Eigen {
namespace internal {
/** \internal
@@ -22,19 +22,19 @@ namespace internal {
* Since products require special treatments to handle all possible cases,
* we simply defer the evaluation logic to a product_evaluator class
* which offers more partial specialization possibilities.
*
*
* \sa class product_evaluator
*/
template<typename Lhs, typename Rhs, int Options>
struct evaluator<Product<Lhs, Rhs, Options> >
struct evaluator<Product<Lhs, Rhs, Options> >
: public product_evaluator<Product<Lhs, Rhs, Options> >
{
typedef Product<Lhs, Rhs, Options> XprType;
typedef product_evaluator<XprType> Base;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& xpr) : Base(xpr) {}
};
// Catch "scalar * ( A * B )" and transform it to "(A*scalar) * B"
// TODO we should apply that rule only if that's really helpful
template<typename Lhs, typename Rhs, typename Scalar1, typename Scalar2, typename Plain1>
@@ -62,12 +62,12 @@ struct evaluator<CwiseBinaryOp<internal::scalar_product_op<Scalar1,Scalar2>,
template<typename Lhs, typename Rhs, int DiagIndex>
struct evaluator<Diagonal<const Product<Lhs, Rhs, DefaultProduct>, DiagIndex> >
struct evaluator<Diagonal<const Product<Lhs, Rhs, DefaultProduct>, DiagIndex> >
: public evaluator<Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex> >
{
typedef Diagonal<const Product<Lhs, Rhs, DefaultProduct>, DiagIndex> XprType;
typedef evaluator<Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex> > Base;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& xpr)
: Base(Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex>(
Product<Lhs, Rhs, LazyProduct>(xpr.nestedExpression().lhs(), xpr.nestedExpression().rhs()),
@@ -108,23 +108,23 @@ struct product_evaluator<Product<Lhs, Rhs, Options>, ProductTag, LhsShape, RhsSh
: m_result(xpr.rows(), xpr.cols())
{
::new (static_cast<Base*>(this)) Base(m_result);
// FIXME shall we handle nested_eval here?,
// if so, then we must take care at removing the call to nested_eval in the specializations (e.g., in permutation_matrix_product, transposition_matrix_product, etc.)
// typedef typename internal::nested_eval<Lhs,Rhs::ColsAtCompileTime>::type LhsNested;
// typedef typename internal::nested_eval<Rhs,Lhs::RowsAtCompileTime>::type RhsNested;
// typedef typename internal::remove_all<LhsNested>::type LhsNestedCleaned;
// typedef typename internal::remove_all<RhsNested>::type RhsNestedCleaned;
//
//
// const LhsNested lhs(xpr.lhs());
// const RhsNested rhs(xpr.rhs());
//
//
// generic_product_impl<LhsNestedCleaned, RhsNestedCleaned>::evalTo(m_result, lhs, rhs);
generic_product_impl<Lhs, Rhs, LhsShape, RhsShape, ProductTag>::evalTo(m_result, xpr.lhs(), xpr.rhs());
}
protected:
protected:
PlainObject m_result;
};
@@ -250,13 +250,13 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,InnerProduct>
{
dst.coeffRef(0,0) = (lhs.transpose().cwiseProduct(rhs)).sum();
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
dst.coeffRef(0,0) += (lhs.transpose().cwiseProduct(rhs)).sum();
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ dst.coeffRef(0,0) -= (lhs.transpose().cwiseProduct(rhs)).sum(); }
@@ -298,7 +298,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,OuterProduct>
{
template<typename T> struct is_row_major : internal::conditional<(int(T::Flags)&RowMajorBit), internal::true_type, internal::false_type>::type {};
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
// TODO it would be nice to be able to exploit our *_assign_op functors for that purpose
struct set { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() = src; } };
struct add { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() += src; } };
@@ -310,31 +310,31 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,OuterProduct>
dst.const_cast_derived() += m_scale * src;
}
};
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, set(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, add(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, sub(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
internal::outer_product_selector_run(dst, lhs, rhs, adds(alpha), is_row_major<Dst>());
}
};
@@ -343,7 +343,7 @@ template<typename Lhs, typename Rhs, typename Derived>
struct generic_product_impl_base
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ dst.setZero(); scaleAndAddTo(dst, lhs, rhs, Scalar(1)); }
@@ -355,7 +355,7 @@ struct generic_product_impl_base
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ scaleAndAddTo(dst, lhs, rhs, Scalar(-1)); }
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{ Derived::scaleAndAddTo(dst,lhs,rhs,alpha); }
@@ -375,6 +375,11 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemvProduct>
template<typename Dest>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
// Fallback to inner product if both the lhs and rhs are runtime vectors.
if (lhs.rows() == 1 && rhs.cols() == 1) {
dst.coeffRef(0,0) += alpha * lhs.row(0).conjugate().dot(rhs.col(0));
return;
}
LhsNested actual_lhs(lhs);
RhsNested actual_rhs(rhs);
internal::gemv_dense_selector<Side,
@@ -385,10 +390,10 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemvProduct>
};
template<typename Lhs, typename Rhs>
struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
@@ -403,7 +408,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
// dst.noalias() += lhs.lazyProduct(rhs);
call_assignment_no_alias(dst, lhs.lazyProduct(rhs), internal::add_assign_op<typename Dst::Scalar,Scalar>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
@@ -436,8 +441,8 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
};
// FIXME: in c++11 this should be auto, and extractScalarFactor should also return auto
// this is important for real*complex_mat
Scalar actualAlpha = blas_traits<Lhs>::extractScalarFactor(lhs)
* blas_traits<Rhs>::extractScalarFactor(rhs);
Scalar actualAlpha = combine_scalar_factors<Scalar>(lhs, rhs);
eval_dynamic_impl(dst,
blas_traits<Lhs>::extract(lhs).template conjugateIf<ConjLhs>(),
blas_traits<Rhs>::extract(rhs).template conjugateIf<ConjRhs>(),
@@ -520,7 +525,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
typedef typename internal::nested_eval<Lhs,Rhs::ColsAtCompileTime>::type LhsNested;
typedef typename internal::nested_eval<Rhs,Lhs::RowsAtCompileTime>::type RhsNested;
typedef typename internal::remove_all<LhsNested>::type LhsNestedCleaned;
typedef typename internal::remove_all<RhsNested>::type RhsNestedCleaned;
@@ -539,7 +544,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
typedef typename find_best_packet<Scalar,ColsAtCompileTime>::type RhsVecPacketType;
enum {
LhsCoeffReadCost = LhsEtorType::CoeffReadCost,
RhsCoeffReadCost = RhsEtorType::CoeffReadCost,
CoeffReadCost = InnerSize==0 ? NumTraits<Scalar>::ReadCost
@@ -548,10 +553,10 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
+ (InnerSize - 1) * NumTraits<Scalar>::AddCost,
Unroll = CoeffReadCost <= EIGEN_UNROLLING_LIMIT,
LhsFlags = LhsEtorType::Flags,
RhsFlags = RhsEtorType::Flags,
LhsRowMajor = LhsFlags & RowMajorBit,
RhsRowMajor = RhsFlags & RowMajorBit,
@@ -561,7 +566,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
// Here, we don't care about alignment larger than the usable packet size.
LhsAlignment = EIGEN_PLAIN_ENUM_MIN(LhsEtorType::Alignment,LhsVecPacketSize*int(sizeof(typename LhsNestedCleaned::Scalar))),
RhsAlignment = EIGEN_PLAIN_ENUM_MIN(RhsEtorType::Alignment,RhsVecPacketSize*int(sizeof(typename RhsNestedCleaned::Scalar))),
SameType = is_same<typename LhsNestedCleaned::Scalar,typename RhsNestedCleaned::Scalar>::value,
CanVectorizeRhs = bool(RhsRowMajor) && (RhsFlags & PacketAccessBit) && (ColsAtCompileTime!=1),
@@ -576,7 +581,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
// TODO enable vectorization for mixed types
| (SameType && (CanVectorizeLhs || CanVectorizeRhs) ? PacketAccessBit : 0)
| (XprType::IsVectorAtCompileTime ? LinearAccessBit : 0),
LhsOuterStrideBytes = int(LhsNestedCleaned::OuterStrideAtCompileTime) * int(sizeof(typename LhsNestedCleaned::Scalar)),
RhsOuterStrideBytes = int(RhsNestedCleaned::OuterStrideAtCompileTime) * int(sizeof(typename RhsNestedCleaned::Scalar)),
@@ -595,7 +600,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
&& (LhsFlags & RhsFlags & ActualPacketAccessBit)
&& (InnerSize % packet_traits<Scalar>::size == 0)
};
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index row, Index col) const
{
return (m_lhs.row(row).transpose().cwiseProduct( m_rhs.col(col) )).sum();
@@ -637,7 +642,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
protected:
typename internal::add_const_on_value_type<LhsNested>::type m_lhs;
typename internal::add_const_on_value_type<RhsNested>::type m_rhs;
LhsEtorType m_lhsImpl;
RhsEtorType m_rhsImpl;
@@ -668,7 +673,7 @@ struct product_evaluator<Product<Lhs, Rhs, DefaultProduct>, LazyCoeffBasedProduc
template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
{
etor_product_packet_impl<RowMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, innerDim, res);
res = pmadd(pset1<Packet>(lhs.coeff(row, Index(UnrollingIndex-1))), rhs.template packet<LoadMode,Packet>(Index(UnrollingIndex-1), col), res);
@@ -678,7 +683,7 @@ struct etor_product_packet_impl<RowMajor, UnrollingIndex, Lhs, Rhs, Packet, Load
template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
{
etor_product_packet_impl<ColMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, innerDim, res);
res = pmadd(lhs.template packet<LoadMode,Packet>(row, Index(UnrollingIndex-1)), pset1<Packet>(rhs.coeff(Index(UnrollingIndex-1), col)), res);
@@ -688,7 +693,7 @@ struct etor_product_packet_impl<ColMajor, UnrollingIndex, Lhs, Rhs, Packet, Load
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, 1, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
{
res = pmul(pset1<Packet>(lhs.coeff(row, Index(0))),rhs.template packet<LoadMode,Packet>(Index(0), col));
}
@@ -697,7 +702,7 @@ struct etor_product_packet_impl<RowMajor, 1, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, 1, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
{
res = pmul(lhs.template packet<LoadMode,Packet>(row, Index(0)), pset1<Packet>(rhs.coeff(Index(0), col)));
}
@@ -706,7 +711,7 @@ struct etor_product_packet_impl<ColMajor, 1, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, 0, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
}
@@ -715,7 +720,7 @@ struct etor_product_packet_impl<RowMajor, 0, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, 0, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
}
@@ -724,7 +729,7 @@ struct etor_product_packet_impl<ColMajor, 0, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
for(Index i = 0; i < innerDim; ++i)
@@ -735,7 +740,7 @@ struct etor_product_packet_impl<RowMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
for(Index i = 0; i < innerDim; ++i)
@@ -757,7 +762,7 @@ struct generic_product_impl<Lhs,Rhs,TriangularShape,DenseShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,TriangularShape,DenseShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -771,7 +776,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,TriangularShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,DenseShape,TriangularShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -792,7 +797,7 @@ struct generic_product_impl<Lhs,Rhs,SelfAdjointShape,DenseShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,SelfAdjointShape,DenseShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static EIGEN_DEVICE_FUNC
void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
@@ -806,7 +811,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -818,7 +823,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag>
/***************************************************************************
* Diagonal products
***************************************************************************/
template<typename MatrixType, typename DiagonalType, typename Derived, int ProductOrder>
struct diagonal_product_evaluator_base
: evaluator_base<Derived>
@@ -827,10 +832,10 @@ struct diagonal_product_evaluator_base
public:
enum {
CoeffReadCost = NumTraits<Scalar>::MulCost + evaluator<MatrixType>::CoeffReadCost + evaluator<DiagonalType>::CoeffReadCost,
MatrixFlags = evaluator<MatrixType>::Flags,
DiagFlags = evaluator<DiagonalType>::Flags,
_StorageOrder = (Derived::MaxRowsAtCompileTime==1 && Derived::MaxColsAtCompileTime!=1) ? RowMajor
: (Derived::MaxColsAtCompileTime==1 && Derived::MaxRowsAtCompileTime!=1) ? ColMajor
: MatrixFlags & RowMajorBit ? RowMajor : ColMajor,
@@ -853,14 +858,14 @@ public:
|| (DiagonalType::SizeAtCompileTime==Dynamic && MatrixType::RowsAtCompileTime==1 && ProductOrder==OnTheLeft)
|| (DiagonalType::SizeAtCompileTime==Dynamic && MatrixType::ColsAtCompileTime==1 && ProductOrder==OnTheRight)
};
diagonal_product_evaluator_base(const MatrixType &mat, const DiagonalType &diag)
EIGEN_DEVICE_FUNC diagonal_product_evaluator_base(const MatrixType &mat, const DiagonalType &diag)
: m_diagImpl(diag), m_matImpl(mat)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(NumTraits<Scalar>::MulCost);
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index idx) const
{
if(AsScalarProduct)
@@ -868,7 +873,7 @@ public:
else
return m_diagImpl.coeff(idx) * m_matImpl.coeff(idx);
}
protected:
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet_impl(Index row, Index col, Index id, internal::true_type) const
@@ -876,7 +881,7 @@ protected:
return internal::pmul(m_matImpl.template packet<LoadMode,PacketType>(row, col),
internal::pset1<PacketType>(m_diagImpl.coeff(id)));
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet_impl(Index row, Index col, Index id, internal::false_type) const
{
@@ -887,7 +892,7 @@ protected:
return internal::pmul(m_matImpl.template packet<LoadMode,PacketType>(row, col),
m_diagImpl.template packet<DiagonalPacketLoadMode,PacketType>(id));
}
evaluator<DiagonalType> m_diagImpl;
evaluator<MatrixType> m_matImpl;
};
@@ -902,24 +907,24 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DiagonalSha
using Base::m_matImpl;
using Base::coeff;
typedef typename Base::Scalar Scalar;
typedef Product<Lhs, Rhs, ProductKind> XprType;
typedef typename XprType::PlainObject PlainObject;
typedef typename Lhs::DiagonalVectorType DiagonalType;
enum { StorageOrder = Base::_StorageOrder };
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
: Base(xpr.rhs(), xpr.lhs().diagonal())
{
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index row, Index col) const
{
return m_diagImpl.coeff(row) * m_matImpl.coeff(row, col);
}
#ifndef EIGEN_GPUCC
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const
@@ -929,7 +934,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DiagonalSha
return this->template packet_impl<LoadMode,PacketType>(row,col, row,
typename internal::conditional<int(StorageOrder)==RowMajor, internal::true_type, internal::false_type>::type());
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index idx) const
{
@@ -948,22 +953,22 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DenseShape,
using Base::m_matImpl;
using Base::coeff;
typedef typename Base::Scalar Scalar;
typedef Product<Lhs, Rhs, ProductKind> XprType;
typedef typename XprType::PlainObject PlainObject;
enum { StorageOrder = Base::_StorageOrder };
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
: Base(xpr.lhs(), xpr.rhs().diagonal())
{
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index row, Index col) const
{
return m_matImpl.coeff(row, col) * m_diagImpl.coeff(col);
}
#ifndef EIGEN_GPUCC
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const
@@ -971,7 +976,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DenseShape,
return this->template packet_impl<LoadMode,PacketType>(row,col, col,
typename internal::conditional<int(StorageOrder)==ColMajor, internal::true_type, internal::false_type>::type());
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index idx) const
{
@@ -999,7 +1004,7 @@ struct permutation_matrix_product<ExpressionType, Side, Transposed, DenseShape>
typedef typename remove_all<MatrixType>::type MatrixTypeCleaned;
template<typename Dest, typename PermutationType>
static inline void run(Dest& dst, const PermutationType& perm, const ExpressionType& xpr)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Dest& dst, const PermutationType& perm, const ExpressionType& xpr)
{
MatrixType mat(xpr);
const Index n = Side==OnTheLeft ? mat.rows() : mat.cols();
@@ -1053,7 +1058,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, PermutationShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
permutation_matrix_product<Rhs, OnTheLeft, false, MatrixShape>::run(dst, lhs, rhs);
}
@@ -1063,7 +1068,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, MatrixShape, PermutationShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
permutation_matrix_product<Lhs, OnTheRight, false, MatrixShape>::run(dst, rhs, lhs);
}
@@ -1073,7 +1078,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Inverse<Lhs>, Rhs, PermutationShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Inverse<Lhs>& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Inverse<Lhs>& lhs, const Rhs& rhs)
{
permutation_matrix_product<Rhs, OnTheLeft, true, MatrixShape>::run(dst, lhs.nestedExpression(), rhs);
}
@@ -1083,7 +1088,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Inverse<Rhs>, MatrixShape, PermutationShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Inverse<Rhs>& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Inverse<Rhs>& rhs)
{
permutation_matrix_product<Lhs, OnTheRight, true, MatrixShape>::run(dst, rhs.nestedExpression(), lhs);
}
@@ -1105,9 +1110,9 @@ struct transposition_matrix_product
{
typedef typename nested_eval<ExpressionType, 1>::type MatrixType;
typedef typename remove_all<MatrixType>::type MatrixTypeCleaned;
template<typename Dest, typename TranspositionType>
static inline void run(Dest& dst, const TranspositionType& tr, const ExpressionType& xpr)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Dest& dst, const TranspositionType& tr, const ExpressionType& xpr)
{
MatrixType mat(xpr);
typedef typename TranspositionType::StorageIndex StorageIndex;
@@ -1130,7 +1135,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, TranspositionsShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
transposition_matrix_product<Rhs, OnTheLeft, false, MatrixShape>::run(dst, lhs, rhs);
}
@@ -1140,7 +1145,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, MatrixShape, TranspositionsShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
transposition_matrix_product<Lhs, OnTheRight, false, MatrixShape>::run(dst, rhs, lhs);
}
@@ -1151,7 +1156,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Transpose<Lhs>, Rhs, TranspositionsShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Transpose<Lhs>& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Transpose<Lhs>& lhs, const Rhs& rhs)
{
transposition_matrix_product<Rhs, OnTheLeft, true, MatrixShape>::run(dst, lhs.nestedExpression(), rhs);
}
@@ -1161,7 +1166,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Transpose<Rhs>, MatrixShape, TranspositionsShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Transpose<Rhs>& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Transpose<Rhs>& rhs)
{
transposition_matrix_product<Lhs, OnTheRight, true, MatrixShape>::run(dst, rhs.nestedExpression(), lhs);
}
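
The `etor_product_packet_impl` specializations earlier in this file unroll the inner dimension at compile time: step `UnrollingIndex` first runs step `UnrollingIndex-1` and then accumulates one more term, with dedicated base cases for 1 and 0 and a loop for `Dynamic`. The following is a minimal scalar sketch of that recursion pattern only. The names `unrolled_dot` and `looped_dot` are hypothetical, and plain `double` arithmetic stands in for the packet operations (`pmadd`, `pmul`, `pset1`).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar analogue of the unrolled accumulation: step K first
// runs steps 1..K-1, then adds the term with index K-1.
template <int K>
struct unrolled_dot {
  static double run(const double* lhs, const double* rhs) {
    return unrolled_dot<K - 1>::run(lhs, rhs) + lhs[K - 1] * rhs[K - 1];
  }
};

// Base case for one term: a plain multiply, there is nothing to accumulate into.
template <>
struct unrolled_dot<1> {
  static double run(const double* lhs, const double* rhs) { return lhs[0] * rhs[0]; }
};

// Base case for an empty inner dimension: the result is zero.
template <>
struct unrolled_dot<0> {
  static double run(const double*, const double*) { return 0.0; }
};

// Runtime-sized fallback, analogous in spirit to the Dynamic specialization.
inline double looped_dot(const double* lhs, const double* rhs, std::size_t n) {
  assert(lhs != nullptr && rhs != nullptr);
  double res = 0.0;
  for (std::size_t i = 0; i < n; ++i) res += lhs[i] * rhs[i];
  return res;
}
```

A call such as `unrolled_dot<3>::run(a, b)` expands to three multiply-adds with no loop counter, which is the effect the `Unroll` flag is meant to enable for small fixed inner sizes.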

View File

@@ -177,6 +177,42 @@ PlainObjectBase<Derived>::setRandom(Index rows, Index cols)
return setRandom();
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to random values. For the parameter of type
* NoChange_t, just pass the special value \c NoChange.
*
* Numbers are uniformly spread through their whole definition range for integer types,
* and in the [-1:1] range for floating point scalar types.
*
* \not_reentrant
*
* \sa DenseBase::setRandom(), setRandom(Index), setRandom(Index, NoChange_t), class CwiseNullaryOp, DenseBase::Random()
*/
template<typename Derived>
EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setRandom(NoChange_t, Index cols)
{
return setRandom(rows(), cols);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to random values. For the parameter of type
* NoChange_t, just pass the special value \c NoChange.
*
* Numbers are uniformly spread through their whole definition range for integer types,
* and in the [-1:1] range for floating point scalar types.
*
* \not_reentrant
*
* \sa DenseBase::setRandom(), setRandom(Index), setRandom(NoChange_t, Index), class CwiseNullaryOp, DenseBase::Random()
*/
template<typename Derived>
EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setRandom(Index rows, NoChange_t)
{
return setRandom(rows, cols());
}
} // end namespace Eigen
#endif // EIGEN_RANDOM_H
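
The two new `setRandom` overloads forward to `setRandom(Index, Index)`, substituting the current size for the dimension marked with `NoChange`. The overload pattern can be sketched without Eigen as below. `TinyMatrix`, `NoChangeTag` and `KeepSize` are hypothetical stand-ins, and the fixed-seed `std::mt19937` is an assumption of this sketch, not Eigen's random source.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a "keep this dimension" tag type and its value.
enum NoChangeTag { KeepSize };

// Hypothetical dynamic matrix used only to illustrate the overload pattern.
class TinyMatrix {
 public:
  TinyMatrix(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols) {}

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }
  const std::vector<double>& data() const { return data_; }

  // Resizes to rows x cols and fills with uniform values in [-1, 1).
  TinyMatrix& setRandom(std::size_t rows, std::size_t cols) {
    rows_ = rows;
    cols_ = cols;
    data_.resize(rows * cols);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (double& x : data_) x = dist(engine_);
    assert(data_.size() == rows_ * cols_);
    return *this;
  }

  // Changes only the number of columns; the row count is kept.
  TinyMatrix& setRandom(NoChangeTag, std::size_t cols) { return setRandom(rows(), cols); }

  // Changes only the number of rows; the column count is kept.
  TinyMatrix& setRandom(std::size_t rows, NoChangeTag) { return setRandom(rows, cols()); }

 private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<double> data_;
  std::mt19937 engine_{42};  // fixed seed, an assumption made for reproducibility
};
```

With this shape, `m.setRandom(KeepSize, 5)` keeps the row count and `m.setRandom(4, KeepSize)` keeps the column count. The tag argument selects the overload at compile time, so no runtime flag is needed.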

View File

@@ -93,29 +93,127 @@ protected:
typedef Stride<StrideType::OuterStrideAtCompileTime,StrideType::InnerStrideAtCompileTime> StrideBase;
template<typename Expression>
EIGEN_DEVICE_FUNC void construct(Expression& expr)
{
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(PlainObjectType,Expression);
// Resolves inner stride if default 0.
static EIGEN_DEVICE_FUNC Index resolveInnerStride(Index inner) {
if (inner == 0) {
return 1;
}
return inner;
}
// Resolves outer stride if default 0.
static EIGEN_DEVICE_FUNC Index resolveOuterStride(Index inner, Index outer, Index rows, Index cols, bool isVectorAtCompileTime, bool isRowMajor) {
if (outer == 0) {
if (isVectorAtCompileTime) {
outer = inner * rows * cols;
} else if (isRowMajor) {
outer = inner * cols;
} else {
outer = inner * rows;
}
}
return outer;
}
// Returns true if construction is valid, false if there is a stride mismatch,
// and fails if there is a size mismatch.
template<typename Expression>
EIGEN_DEVICE_FUNC bool construct(Expression& expr)
{
// Check matrix sizes. If this is a compile-time vector, we do allow
// implicitly transposing.
EIGEN_STATIC_ASSERT(
EIGEN_PREDICATE_SAME_MATRIX_SIZE(PlainObjectType, Expression)
// If it is a vector, the transpose sizes might match.
|| ( PlainObjectType::IsVectorAtCompileTime
&& ((int(PlainObjectType::RowsAtCompileTime)==Eigen::Dynamic
|| int(Expression::ColsAtCompileTime)==Eigen::Dynamic
|| int(PlainObjectType::RowsAtCompileTime)==int(Expression::ColsAtCompileTime))
&& (int(PlainObjectType::ColsAtCompileTime)==Eigen::Dynamic
|| int(Expression::RowsAtCompileTime)==Eigen::Dynamic
|| int(PlainObjectType::ColsAtCompileTime)==int(Expression::RowsAtCompileTime)))),
YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES
)
// Determine runtime rows and columns.
Index rows = expr.rows();
Index cols = expr.cols();
if(PlainObjectType::RowsAtCompileTime==1)
{
eigen_assert(expr.rows()==1 || expr.cols()==1);
::new (static_cast<Base*>(this)) Base(expr.data(), 1, expr.size());
rows = 1;
cols = expr.size();
}
else if(PlainObjectType::ColsAtCompileTime==1)
{
eigen_assert(expr.rows()==1 || expr.cols()==1);
::new (static_cast<Base*>(this)) Base(expr.data(), expr.size(), 1);
rows = expr.size();
cols = 1;
}
else
::new (static_cast<Base*>(this)) Base(expr.data(), expr.rows(), expr.cols());
// Verify that the sizes are valid.
eigen_assert(
(PlainObjectType::RowsAtCompileTime == Dynamic) || (PlainObjectType::RowsAtCompileTime == rows));
eigen_assert(
(PlainObjectType::ColsAtCompileTime == Dynamic) || (PlainObjectType::ColsAtCompileTime == cols));
// If this is a vector, we might be transposing, which means that stride should swap.
const bool transpose = PlainObjectType::IsVectorAtCompileTime && (rows != expr.rows());
// If the storage format differs, we also need to swap the stride.
const bool row_major = ((PlainObjectType::Flags)&RowMajorBit) != 0;
const bool expr_row_major = (Expression::Flags&RowMajorBit) != 0;
const bool storage_differs = (row_major != expr_row_major);
const bool swap_stride = (transpose != storage_differs);
if(Expression::IsVectorAtCompileTime && (!PlainObjectType::IsVectorAtCompileTime) && ((Expression::Flags&RowMajorBit)!=(PlainObjectType::Flags&RowMajorBit)))
::new (&m_stride) StrideBase(expr.innerStride(), StrideType::InnerStrideAtCompileTime==0?0:1);
else
::new (&m_stride) StrideBase(StrideType::OuterStrideAtCompileTime==0?0:expr.outerStride(),
StrideType::InnerStrideAtCompileTime==0?0:expr.innerStride());
// Determine expr's actual strides, resolving any defaults if zero.
const Index expr_inner_actual = resolveInnerStride(expr.innerStride());
const Index expr_outer_actual = resolveOuterStride(expr_inner_actual,
expr.outerStride(),
expr.rows(),
expr.cols(),
Expression::IsVectorAtCompileTime != 0,
expr_row_major);
// If this is a column-major row vector or row-major column vector, the inner-stride
// is arbitrary, so set it to either the compile-time inner stride or 1.
const bool row_vector = (rows == 1);
const bool col_vector = (cols == 1);
const Index inner_stride =
( (!row_major && row_vector) || (row_major && col_vector) ) ?
( StrideType::InnerStrideAtCompileTime > 0 ? Index(StrideType::InnerStrideAtCompileTime) : 1)
: swap_stride ? expr_outer_actual : expr_inner_actual;
// If this is a column-major column vector or row-major row vector, the outer-stride
// is arbitrary, so set it to either the compile-time outer stride or vector size.
const Index outer_stride =
( (!row_major && col_vector) || (row_major && row_vector) ) ?
( StrideType::OuterStrideAtCompileTime > 0 ? Index(StrideType::OuterStrideAtCompileTime) : rows * cols * inner_stride)
: swap_stride ? expr_inner_actual : expr_outer_actual;
// Check if given inner/outer strides are compatible with compile-time strides.
const bool inner_valid = (StrideType::InnerStrideAtCompileTime == Dynamic)
|| (resolveInnerStride(Index(StrideType::InnerStrideAtCompileTime)) == inner_stride);
if (!inner_valid) {
return false;
}
const bool outer_valid = (StrideType::OuterStrideAtCompileTime == Dynamic)
|| (resolveOuterStride(
inner_stride,
Index(StrideType::OuterStrideAtCompileTime),
rows, cols, PlainObjectType::IsVectorAtCompileTime != 0,
row_major)
== outer_stride);
if (!outer_valid) {
return false;
}
::new (static_cast<Base*>(this)) Base(expr.data(), rows, cols);
::new (&m_stride) StrideBase(
(StrideType::OuterStrideAtCompileTime == 0) ? 0 : outer_stride,
(StrideType::InnerStrideAtCompileTime == 0) ? 0 : inner_stride );
return true;
}
StrideBase m_stride;
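
The stride handling in `construct` relies on the two helpers `resolveInnerStride` and `resolveOuterStride`, which replace a stride of 0 (meaning "default") with the value implied by the shape and storage order. The sketch below restates them as free functions so the rule can be tried in isolation. The `Index` alias and the non-negativity assertion are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>

using Index = std::ptrdiff_t;  // assumed signed index type for this sketch

// A zero inner stride means "contiguous", i.e. one element apart.
inline Index resolveInnerStride(Index inner) {
  assert(inner >= 0);  // assumption: only non-negative strides are considered
  return inner == 0 ? 1 : inner;
}

// A zero outer stride is derived from the shape:
//   vectors:      the whole vector length times the inner stride,
//   row-major:    one row    = cols elements,
//   column-major: one column = rows elements.
inline Index resolveOuterStride(Index inner, Index outer, Index rows, Index cols,
                                bool isVectorAtCompileTime, bool isRowMajor) {
  if (outer != 0) return outer;  // an explicit outer stride is kept as is
  if (isVectorAtCompileTime) return inner * rows * cols;
  return isRowMajor ? inner * cols : inner * rows;
}
```

For a contiguous 3x4 matrix, the resolved outer stride is 3 in column-major order and 4 in row-major order, which is the value `construct` then compares against the compile-time stride of the `Ref`.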
@@ -212,7 +310,10 @@ template<typename PlainObjectType, int Options, typename StrideType> class Ref
typename internal::enable_if<bool(Traits::template match<Derived>::MatchAtCompileTime),Derived>::type* = 0)
{
EIGEN_STATIC_ASSERT(bool(Traits::template match<Derived>::MatchAtCompileTime), STORAGE_LAYOUT_DOES_NOT_MATCH);
Base::construct(expr.derived());
// Construction must pass since we will not create temporary storage in the non-const case.
const bool success = Base::construct(expr.derived());
EIGEN_UNUSED_VARIABLE(success)
eigen_assert(success);
}
template<typename Derived>
EIGEN_DEVICE_FUNC inline Ref(const DenseBase<Derived>& expr,
@@ -226,7 +327,10 @@ template<typename PlainObjectType, int Options, typename StrideType> class Ref
EIGEN_STATIC_ASSERT(bool(internal::is_lvalue<Derived>::value), THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
EIGEN_STATIC_ASSERT(bool(Traits::template match<Derived>::MatchAtCompileTime), STORAGE_LAYOUT_DOES_NOT_MATCH);
EIGEN_STATIC_ASSERT(!Derived::IsPlainObjectBase,THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
Base::construct(expr.const_cast_derived());
// Construction must pass since we will not create temporary storage in the non-const case.
const bool success = Base::construct(expr.const_cast_derived());
EIGEN_UNUSED_VARIABLE(success)
eigen_assert(success);
}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Ref)
@@ -267,7 +371,10 @@ template<typename TPlainObjectType, int Options, typename StrideType> class Ref<
template<typename Expression>
EIGEN_DEVICE_FUNC void construct(const Expression& expr,internal::true_type)
{
Base::construct(expr);
// Check if we can use the underlying expr's storage directly, otherwise call the copy version.
if (!Base::construct(expr)) {
construct(expr, internal::false_type());
}
}
template<typename Expression>
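
In the const `Ref` specialization, `construct(expr, true_type)` now tries to bind directly to the expression's storage and falls back to the copying overload when `Base::construct` reports a stride mismatch. A minimal sketch of that "bind or copy" decision is shown here, assuming a hypothetical one-dimensional view that requires unit stride. `ContiguousConstView` and its members are illustrative names, not Eigen API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only view that needs unit-stride data. It aliases the
// input when the stride is compatible and copies into owned storage otherwise.
class ContiguousConstView {
 public:
  ContiguousConstView(const double* data, std::size_t size, std::size_t stride) {
    if (!bind(data, size, stride)) {
      copyFrom(data, size, stride);  // fallback path, analogous to the false_type overload
    }
  }

  // Copying would leave data_ pointing into another object's storage.
  ContiguousConstView(const ContiguousConstView&) = delete;
  ContiguousConstView& operator=(const ContiguousConstView&) = delete;

  const double* data() const { return data_; }
  std::size_t size() const { return size_; }
  bool ownsCopy() const { return owns_; }
  double operator[](std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }

 private:
  // Returns true if the input can be used in place, false on a stride mismatch.
  bool bind(const double* data, std::size_t size, std::size_t stride) {
    if (stride != 1) return false;
    data_ = data;
    size_ = size;
    return true;
  }

  // Gathers the strided elements into contiguous owned storage.
  void copyFrom(const double* data, std::size_t size, std::size_t stride) {
    storage_.resize(size);
    for (std::size_t i = 0; i < size; ++i) storage_[i] = data[i * stride];
    data_ = storage_.data();
    size_ = size;
    owns_ = true;
  }

  const double* data_ = nullptr;
  std::size_t size_ = 0;
  bool owns_ = false;
  std::vector<double> storage_;
};
```

The non-const `Ref` constructors cannot take the copy path, because writes through the reference would land in the temporary. That is why they assert on the result of `construct` instead of falling back.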

View File

@@ -12,7 +12,6 @@
#define EIGEN_RESHAPED_H
namespace Eigen {
namespace internal {
/** \class Reshaped
* \ingroup Core_Module
@@ -44,6 +43,8 @@ namespace internal {
* \sa DenseBase::reshaped(NRowsType,NColsType)
*/
namespace internal {
template<typename XprType, int Rows, int Cols, int Order>
struct traits<Reshaped<XprType, Rows, Cols, Order> > : traits<XprType>
{

View File

@@ -120,7 +120,7 @@ class Select : public internal::dense_xpr_base< Select<ConditionMatrixType, Then
*/
template<typename Derived>
template<typename ThenDerived,typename ElseDerived>
inline const Select<Derived,ThenDerived,ElseDerived>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived,ElseDerived>
DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
const DenseBase<ElseDerived>& elseMatrix) const
{
@@ -134,7 +134,7 @@ DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
*/
template<typename Derived>
template<typename ThenDerived>
inline const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
const typename ThenDerived::Scalar& elseScalar) const
{
@@ -149,7 +149,7 @@ DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
*/
template<typename Derived>
template<typename ElseDerived>
inline const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
inline EIGEN_DEVICE_FUNC const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
DenseBase<Derived>::select(const typename ElseDerived::Scalar& thenScalar,
const DenseBase<ElseDerived>& elseMatrix) const
{
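
The three `select` overloads changed above pick, coefficient by coefficient, between a "then" value and an "else" value, where either side may be a matrix or a scalar constant. The semantics can be sketched eagerly on `std::vector` as follows. `selectCoeffs` is a hypothetical name, and unlike Eigen's `Select` expression this sketch evaluates immediately instead of building a lazy expression.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical eager analogue of a coefficient-wise select:
// result[i] = cond[i] ? thenValues[i] : elseValues[i].
inline std::vector<double> selectCoeffs(const std::vector<bool>& cond,
                                        const std::vector<double>& thenValues,
                                        const std::vector<double>& elseValues) {
  assert(cond.size() == thenValues.size() && cond.size() == elseValues.size());
  std::vector<double> result(cond.size());
  for (std::size_t i = 0; i < cond.size(); ++i) {
    result[i] = cond[i] ? thenValues[i] : elseValues[i];
  }
  return result;
}

// Variant with a scalar "else" branch, mirroring the matrix/scalar overload.
inline std::vector<double> selectCoeffs(const std::vector<bool>& cond,
                                        const std::vector<double>& thenValues,
                                        double elseScalar) {
  return selectCoeffs(cond, thenValues, std::vector<double>(cond.size(), elseScalar));
}
```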

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_SOLVETRIANGULAR_H
#define EIGEN_SOLVETRIANGULAR_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -54,7 +54,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,1>
typedef blas_traits<Lhs> LhsProductTraits;
typedef typename LhsProductTraits::ExtractType ActualLhsType;
typedef Map<Matrix<RhsScalar,Dynamic,1>, Aligned> MappedRhs;
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
ActualLhsType actualLhs = LhsProductTraits::extract(lhs);
@@ -64,7 +64,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,1>
ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhs,rhs.size(),
(useRhsDirectly ? rhs.data() : 0));
if(!useRhsDirectly)
MappedRhs(actualRhs,rhs.size()) = rhs;
@@ -85,7 +85,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,Dynamic>
typedef blas_traits<Lhs> LhsProductTraits;
typedef typename LhsProductTraits::DirectLinearAccessType ActualLhsType;
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
typename internal::add_const_on_value_type<ActualLhsType>::type actualLhs = LhsProductTraits::extract(lhs);
@@ -118,7 +118,7 @@ struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,false> {
DiagIndex = IsLower ? LoopIndex : Size - LoopIndex - 1,
StartIndex = IsLower ? 0 : DiagIndex+1
};
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
if (LoopIndex>0)
rhs.coeffRef(DiagIndex) -= lhs.row(DiagIndex).template segment<LoopIndex>(StartIndex).transpose()
@@ -133,22 +133,22 @@ struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,false> {
template<typename Lhs, typename Rhs, int Mode, int LoopIndex, int Size>
struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,true> {
static void run(const Lhs&, Rhs&) {}
static EIGEN_DEVICE_FUNC void run(const Lhs&, Rhs&) {}
};
template<typename Lhs, typename Rhs, int Mode>
struct triangular_solver_selector<Lhs,Rhs,OnTheLeft,Mode,CompleteUnrolling,1> {
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{ triangular_solver_unroller<Lhs,Rhs,Mode,0,Rhs::SizeAtCompileTime>::run(lhs,rhs); }
};
template<typename Lhs, typename Rhs, int Mode>
struct triangular_solver_selector<Lhs,Rhs,OnTheRight,Mode,CompleteUnrolling,1> {
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
Transpose<const Lhs> trLhs(lhs);
Transpose<Rhs> trRhs(rhs);
triangular_solver_unroller<Transpose<const Lhs>,Transpose<Rhs>,
((Mode&Upper)==Upper ? Lower : Upper) | (Mode&UnitDiag),
0,Rhs::SizeAtCompileTime>::run(trLhs,trRhs);
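
`triangular_solver_unroller` walks the rows of a fixed-size triangular system through template recursion on `LoopIndex`, with an empty specialization that stops the recursion once `LoopIndex` reaches `Size`. The sketch below shows the same idea for the lower-triangular, non-unit-diagonal case only, using `std::array`. The names `forward_subst`, `Mat` and `solveLowerInPlace` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t N>
using Mat = std::array<std::array<double, N>, N>;

// Step I of forward substitution for L x = b, solved in place in rhs.
// The Stop flag selects the empty specialization once I reaches N.
template <std::size_t I, std::size_t N, bool Stop = (I == N)>
struct forward_subst {
  static void run(const Mat<N>& lower, std::array<double, N>& rhs) {
    // Subtract the contribution of the I unknowns that are already solved.
    for (std::size_t k = 0; k < I; ++k) rhs[I] -= lower[I][k] * rhs[k];
    assert(lower[I][I] != 0.0);  // a singular diagonal cannot be solved
    rhs[I] /= lower[I][I];
    forward_subst<I + 1, N>::run(lower, rhs);
  }
};

// End of recursion: nothing left to solve.
template <std::size_t I, std::size_t N>
struct forward_subst<I, N, true> {
  static void run(const Mat<N>&, std::array<double, N>&) {}
};

template <std::size_t N>
void solveLowerInPlace(const Mat<N>& lower, std::array<double, N>& rhs) {
  forward_subst<0, N>::run(lower, rhs);
}
```

Because `I` is a compile-time constant in each instantiation, the inner loop has a fixed trip count that a compiler can unroll fully. The upper-triangular and unit-diagonal cases handled by the `Mode` parameter in the source are left out of this sketch.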

View File

@@ -10,6 +10,10 @@
#ifndef EIGEN_STABLENORM_H
#define EIGEN_STABLENORM_H
#if EIGEN_HAS_CXX11_ATOMIC
#include <atomic>
#endif
namespace Eigen {
namespace internal {
@@ -123,41 +127,28 @@ blueNorm_impl(const EigenBase<Derived>& _vec)
using std::pow;
using std::sqrt;
using std::abs;
// This program calculates the machine-dependent constants
// b1, b2, s1m, s2m, relerr, overfl
// from the "basic" machine-dependent numbers
// nbig, ibeta, it, iemin, iemax, rbig.
// The following define the basic machine-dependent constants.
// For portability, the PORT subprograms "i1mach" and "r1mach"
// are used. For any specific computer, each of the assignment
// statements can be replaced
static const int ibeta = std::numeric_limits<RealScalar>::radix; // base for floating-point numbers
static const int it = NumTraits<RealScalar>::digits(); // number of base-beta digits in mantissa
static const int iemin = std::numeric_limits<RealScalar>::min_exponent; // minimum exponent
static const int iemax = std::numeric_limits<RealScalar>::max_exponent; // maximum exponent
static const RealScalar rbig = (std::numeric_limits<RealScalar>::max)(); // largest floating-point number
static const RealScalar b1 = RealScalar(pow(RealScalar(ibeta),RealScalar(-((1-iemin)/2)))); // lower boundary of midrange
static const RealScalar b2 = RealScalar(pow(RealScalar(ibeta),RealScalar((iemax + 1 - it)/2))); // upper boundary of midrange
static const RealScalar s1m = RealScalar(pow(RealScalar(ibeta),RealScalar((2-iemin)/2))); // scaling factor for lower range
static const RealScalar s2m = RealScalar(pow(RealScalar(ibeta),RealScalar(- ((iemax+it)/2)))); // scaling factor for upper range
static const RealScalar eps = RealScalar(pow(double(ibeta), 1-it));
static const RealScalar relerr = sqrt(eps); // tolerance for neglecting asml
const Derived& vec(_vec.derived());
static bool initialized = false;
static RealScalar b1, b2, s1m, s2m, rbig, relerr;
if(!initialized)
{
int ibeta, it, iemin, iemax, iexp;
RealScalar eps;
// This program calculates the machine-dependent constants
// b1, b2, s1m, s2m, relerr, overfl
// from the "basic" machine-dependent numbers
// nbig, ibeta, it, iemin, iemax, rbig.
// The following define the basic machine-dependent constants.
// For portability, the PORT subprograms "i1mach" and "r1mach"
// are used. For any specific computer, each of the assignment
// statements can be replaced
ibeta = std::numeric_limits<RealScalar>::radix; // base for floating-point numbers
it = NumTraits<RealScalar>::digits(); // number of base-beta digits in mantissa
iemin = std::numeric_limits<RealScalar>::min_exponent; // minimum exponent
iemax = std::numeric_limits<RealScalar>::max_exponent; // maximum exponent
rbig = (std::numeric_limits<RealScalar>::max)(); // largest floating-point number
iexp = -((1-iemin)/2);
b1 = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // lower boundary of midrange
iexp = (iemax + 1 - it)/2;
b2 = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // upper boundary of midrange
iexp = (2-iemin)/2;
s1m = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // scaling factor for lower range
iexp = - ((iemax+it)/2);
s2m = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // scaling factor for upper range
eps = RealScalar(pow(double(ibeta), 1-it));
relerr = sqrt(eps); // tolerance for neglecting asml
initialized = true;
}
Index n = vec.size();
RealScalar ab2 = b2 / RealScalar(n);
RealScalar asml = RealScalar(0);
@@ -166,9 +157,9 @@ blueNorm_impl(const EigenBase<Derived>& _vec)
for(Index j=0; j<vec.outerSize(); ++j)
{
for(typename Derived::InnerIterator it(vec, j); it; ++it)
for(typename Derived::InnerIterator iter(vec, j); iter; ++iter)
{
RealScalar ax = abs(it.value());
RealScalar ax = abs(iter.value());
if(ax > ab2) abig += numext::abs2(ax*s2m);
else if(ax < b1) asml += numext::abs2(ax*s1m);
else amed += numext::abs2(ax);
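
The hunk above drops the hand-rolled `static bool initialized` guard and computes the thresholds as `static const` values. A minimal standalone sketch of the same idea follows. `BlueConstants`, `make_blue_constants` and `blue_constants` are hypothetical names, not Eigen code, and `std::scalbn` stands in for `pow(ibeta, e)` because it returns an exact power of the radix. The exponent formulas are copied from the hunk.

```cpp
#include <cmath>
#include <limits>

// Hypothetical container for the thresholds. The field names follow the
// variables in the hunk above, but this struct is not part of Eigen.
template <typename Real>
struct BlueConstants {
  Real b1;      // lower boundary of midrange
  Real b2;      // upper boundary of midrange
  Real s1m;     // scaling factor for lower range
  Real s2m;     // scaling factor for upper range
  Real rbig;    // largest finite floating-point number
  Real relerr;  // tolerance for neglecting asml
};

template <typename Real>
BlueConstants<Real> make_blue_constants() {
  typedef std::numeric_limits<Real> Lim;
  const int it = Lim::digits;           // number of radix digits in the mantissa
  const int iemin = Lim::min_exponent;  // minimum exponent
  const int iemax = Lim::max_exponent;  // maximum exponent
  BlueConstants<Real> c;
  // std::scalbn(1, e) yields radix^e exactly (assumption: used here instead of pow).
  c.b1 = std::scalbn(Real(1), -((1 - iemin) / 2));
  c.b2 = std::scalbn(Real(1), (iemax + 1 - it) / 2);
  c.s1m = std::scalbn(Real(1), (2 - iemin) / 2);
  c.s2m = std::scalbn(Real(1), -((iemax + it) / 2));
  c.rbig = (Lim::max)();
  c.relerr = std::sqrt(std::scalbn(Real(1), 1 - it));
  return c;
}

template <typename Real>
const BlueConstants<Real>& blue_constants() {
  // Since C++11 a function-local static is initialized exactly once, even if
  // several threads reach this line concurrently.
  static const BlueConstants<Real> c = make_blue_constants<Real>();
  return c;
}
```

For IEEE-754 `double` these formulas give `b1 = 2^-511`, `b2 = 2^486`, `s1m = 2^511` and `s2m = 2^-538`, so `b1 * s1m == 1`.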


@@ -93,6 +93,85 @@ protected:
Index m_index;
};
template<typename Derived>
class indexed_based_stl_reverse_iterator_base
{
protected:
typedef indexed_based_stl_iterator_traits<Derived> traits;
typedef typename traits::XprType XprType;
typedef indexed_based_stl_reverse_iterator_base<typename traits::non_const_iterator> non_const_iterator;
typedef indexed_based_stl_reverse_iterator_base<typename traits::const_iterator> const_iterator;
typedef typename internal::conditional<internal::is_const<XprType>::value,non_const_iterator,const_iterator>::type other_iterator;
// NOTE: in C++03 we cannot declare friend classes through typedefs because we need to write friend class:
friend class indexed_based_stl_reverse_iterator_base<typename traits::const_iterator>;
friend class indexed_based_stl_reverse_iterator_base<typename traits::non_const_iterator>;
public:
typedef Index difference_type;
typedef std::random_access_iterator_tag iterator_category;
indexed_based_stl_reverse_iterator_base() : mp_xpr(0), m_index(0) {}
indexed_based_stl_reverse_iterator_base(XprType& xpr, Index index) : mp_xpr(&xpr), m_index(index) {}
indexed_based_stl_reverse_iterator_base(const non_const_iterator& other)
: mp_xpr(other.mp_xpr), m_index(other.m_index)
{}
indexed_based_stl_reverse_iterator_base& operator=(const non_const_iterator& other)
{
mp_xpr = other.mp_xpr;
m_index = other.m_index;
return *this;
}
Derived& operator++() { --m_index; return derived(); }
Derived& operator--() { ++m_index; return derived(); }
Derived operator++(int) { Derived prev(derived()); operator++(); return prev;}
Derived operator--(int) { Derived prev(derived()); operator--(); return prev;}
friend Derived operator+(const indexed_based_stl_reverse_iterator_base& a, Index b) { Derived ret(a.derived()); ret += b; return ret; }
friend Derived operator-(const indexed_based_stl_reverse_iterator_base& a, Index b) { Derived ret(a.derived()); ret -= b; return ret; }
friend Derived operator+(Index a, const indexed_based_stl_reverse_iterator_base& b) { Derived ret(b.derived()); ret += a; return ret; }
friend Derived operator-(Index a, const indexed_based_stl_reverse_iterator_base& b) { Derived ret(b.derived()); ret -= a; return ret; }
Derived& operator+=(Index b) { m_index -= b; return derived(); }
Derived& operator-=(Index b) { m_index += b; return derived(); }
difference_type operator-(const indexed_based_stl_reverse_iterator_base& other) const
{
eigen_assert(mp_xpr == other.mp_xpr);
return other.m_index - m_index;
}
difference_type operator-(const other_iterator& other) const
{
eigen_assert(mp_xpr == other.mp_xpr);
return other.m_index - m_index;
}
bool operator==(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index == other.m_index; }
bool operator!=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index != other.m_index; }
bool operator< (const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index > other.m_index; }
bool operator<=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index >= other.m_index; }
bool operator> (const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index < other.m_index; }
bool operator>=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index <= other.m_index; }
bool operator==(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index == other.m_index; }
bool operator!=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index != other.m_index; }
bool operator< (const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index > other.m_index; }
bool operator<=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index >= other.m_index; }
bool operator> (const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index < other.m_index; }
bool operator>=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index <= other.m_index; }
protected:
Derived& derived() { return static_cast<Derived&>(*this); }
const Derived& derived() const { return static_cast<const Derived&>(*this); }
XprType *mp_xpr;
Index m_index;
};
template<typename XprType>
class pointer_based_stl_iterator
{
@@ -267,6 +346,54 @@ public:
pointer operator->() const { return (*mp_xpr).template subVector<Direction>(m_index); }
};
template<typename _XprType, DirectionType Direction>
struct indexed_based_stl_iterator_traits<subvector_stl_reverse_iterator<_XprType,Direction> >
{
typedef _XprType XprType;
typedef subvector_stl_reverse_iterator<typename internal::remove_const<XprType>::type, Direction> non_const_iterator;
typedef subvector_stl_reverse_iterator<typename internal::add_const<XprType>::type, Direction> const_iterator;
};
template<typename XprType, DirectionType Direction>
class subvector_stl_reverse_iterator : public indexed_based_stl_reverse_iterator_base<subvector_stl_reverse_iterator<XprType,Direction> >
{
protected:
enum { is_lvalue = internal::is_lvalue<XprType>::value };
typedef indexed_based_stl_reverse_iterator_base<subvector_stl_reverse_iterator> Base;
using Base::m_index;
using Base::mp_xpr;
typedef typename internal::conditional<Direction==Vertical,typename XprType::ColXpr,typename XprType::RowXpr>::type SubVectorType;
typedef typename internal::conditional<Direction==Vertical,typename XprType::ConstColXpr,typename XprType::ConstRowXpr>::type ConstSubVectorType;
public:
typedef typename internal::conditional<bool(is_lvalue), SubVectorType, ConstSubVectorType>::type reference;
typedef typename reference::PlainObject value_type;
private:
class subvector_stl_reverse_iterator_ptr
{
public:
subvector_stl_reverse_iterator_ptr(const reference &subvector) : m_subvector(subvector) {}
reference* operator->() { return &m_subvector; }
private:
reference m_subvector;
};
public:
typedef subvector_stl_reverse_iterator_ptr pointer;
subvector_stl_reverse_iterator() : Base() {}
subvector_stl_reverse_iterator(XprType& xpr, Index index) : Base(xpr,index) {}
reference operator*() const { return (*mp_xpr).template subVector<Direction>(m_index); }
reference operator[](Index i) const { return (*mp_xpr).template subVector<Direction>(m_index+i); }
pointer operator->() const { return (*mp_xpr).template subVector<Direction>(m_index); }
};
} // namespace internal
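
The reverse iterator base above keeps a plain index and mirrors every operation: `++` decrements the index, `<` compares with `>`, and the difference of two iterators is taken the other way round. The sketch below reproduces only that index arithmetic over a `std::vector`. `IndexedReverseIterator`, `make_rbegin`, `make_rend`, `reversed_copy` and `reverse_distance` are hypothetical names chosen for illustration, not Eigen's classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an index-based reverse iterator over a
// std::vector. It is not Eigen's class, only the same index arithmetic.
template <typename T>
class IndexedReverseIterator {
 public:
  typedef std::ptrdiff_t Index;

  IndexedReverseIterator(const std::vector<T>& data, Index index)
      : data_(&data), index_(index) {}

  const T& operator*() const { return (*data_)[static_cast<std::size_t>(index_)]; }

  // Moving "forward" in reverse order decreases the stored index.
  IndexedReverseIterator& operator++() { --index_; return *this; }
  IndexedReverseIterator& operator--() { ++index_; return *this; }
  IndexedReverseIterator& operator+=(Index n) { index_ -= n; return *this; }
  IndexedReverseIterator& operator-=(Index n) { index_ += n; return *this; }

  // Distance and ordering are mirrored as well.
  Index operator-(const IndexedReverseIterator& other) const { return other.index_ - index_; }
  bool operator==(const IndexedReverseIterator& other) const { return index_ == other.index_; }
  bool operator!=(const IndexedReverseIterator& other) const { return index_ != other.index_; }
  bool operator<(const IndexedReverseIterator& other) const { return index_ > other.index_; }

 private:
  const std::vector<T>* data_;
  Index index_;
};

// rbegin points at the last element, rend at index -1 (one before the first).
template <typename T>
IndexedReverseIterator<T> make_rbegin(const std::vector<T>& v) {
  return IndexedReverseIterator<T>(v, static_cast<std::ptrdiff_t>(v.size()) - 1);
}

template <typename T>
IndexedReverseIterator<T> make_rend(const std::vector<T>& v) {
  return IndexedReverseIterator<T>(v, -1);
}

// Copies the elements in the order the reverse iterator visits them.
template <typename T>
std::vector<T> reversed_copy(const std::vector<T>& v) {
  std::vector<T> out;
  for (IndexedReverseIterator<T> it = make_rbegin(v); it != make_rend(v); ++it)
    out.push_back(*it);
  return out;
}

// Number of steps from rbegin to rend, which equals v.size().
template <typename T>
std::ptrdiff_t reverse_distance(const std::vector<T>& v) {
  return make_rend(v) - make_rbegin(v);
}
```

Using -1 as the past-the-end index keeps an empty container consistent: `rbegin` and `rend` then compare equal and the loop body never runs.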


@@ -26,7 +26,7 @@ namespace Eigen {
*
* The outer stride is the pointer increment between two consecutive rows of a row-major matrix or
* between two consecutive columns of a column-major matrix.
*
*
* These two values can be passed either at compile-time as template parameters, or at runtime as
* arguments to the constructor.
*
@@ -38,6 +38,10 @@ namespace Eigen {
* \include Map_general_stride.cpp
* Output: \verbinclude Map_general_stride.out
*
* Both strides can be negative. However, a stride of -1 cannot be specified at compile time
* because it is ambiguous with Dynamic, which is defined as -1 (historically, negative strides
* were not allowed).
*
* \sa class InnerStride, class OuterStride, \ref TopicStorageOrders
*/
template<int _OuterStrideAtCompileTime, int _InnerStrideAtCompileTime>
@@ -55,6 +59,8 @@ class Stride
Stride()
: m_outer(OuterStrideAtCompileTime), m_inner(InnerStrideAtCompileTime)
{
// FIXME: for Eigen 4 we should use DynamicIndex instead of Dynamic.
// FIXME: for Eigen 4 we should also unify this API with fix<>
eigen_assert(InnerStrideAtCompileTime != Dynamic && OuterStrideAtCompileTime != Dynamic);
}
@@ -63,7 +69,6 @@ class Stride
Stride(Index outerStride, Index innerStride)
: m_outer(outerStride), m_inner(innerStride)
{
eigen_assert(innerStride>=0 && outerStride>=0);
}
/** Copy constructor */
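
With the `>=0` assertion removed from the runtime constructor, a stride may walk backwards through memory. The following sketch shows what a negative inner stride means at the pointer level, without using Eigen. `gather_strided` and `reversed_view_demo` are hypothetical helpers introduced for illustration only. With Eigen itself the analogous setup would presumably pass a runtime `InnerStride<>(-1)` to a `Map` whose data pointer addresses the last element. That usage is an assumption here, not something shown in the diff.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not an Eigen API: reads `count` elements starting at
// `first`, stepping `stride` elements each time. A negative stride walks
// backwards through the buffer.
template <typename T>
std::vector<T> gather_strided(const T* first, std::ptrdiff_t stride, std::size_t count) {
  std::vector<T> out;
  out.reserve(count);
  for (std::size_t k = 0; k < count; ++k)
    out.push_back(first[static_cast<std::ptrdiff_t>(k) * stride]);
  return out;
}

// Views a 4-element buffer in reverse: start at the last element, stride -1.
inline std::vector<int> reversed_view_demo() {
  static const int data[] = {10, 20, 30, 40};
  return gather_strided(data + 3, -1, 4);
}
```

Because `Dynamic` is itself -1, the value -1 can only be given at run time, as in this sketch. A compile-time -1 would be read as "stride unknown until run time".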


@@ -153,6 +153,8 @@ template<typename MatrixType> class TransposeImpl<MatrixType,Dense>
{
return derived().nestedExpression().coeffRef(index);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TransposeImpl)
};
/** \returns an expression of the transpose of *this.
@@ -241,7 +243,6 @@ struct inplace_transpose_selector<MatrixType,true,false> { // square matrix
}
};
// TODO: vectorized path is currently limited to LargestPacketSize x LargestPacketSize cases only.
template<typename MatrixType>
struct inplace_transpose_selector<MatrixType,true,true> { // PacketSize x PacketSize
static void run(MatrixType& m) {
@@ -258,16 +259,66 @@ struct inplace_transpose_selector<MatrixType,true,true> { // PacketSize x Packet
}
};
template <typename MatrixType, Index Alignment>
void BlockedInPlaceTranspose(MatrixType& m) {
typedef typename MatrixType::Scalar Scalar;
typedef typename internal::packet_traits<typename MatrixType::Scalar>::type Packet;
const Index PacketSize = internal::packet_traits<Scalar>::size;
eigen_assert(m.rows() == m.cols());
int row_start = 0;
for (; row_start + PacketSize <= m.rows(); row_start += PacketSize) {
for (int col_start = row_start; col_start + PacketSize <= m.cols(); col_start += PacketSize) {
PacketBlock<Packet> A;
if (row_start == col_start) {
for (Index i=0; i<PacketSize; ++i)
A.packet[i] = m.template packetByOuterInner<Alignment>(row_start + i,col_start);
internal::ptranspose(A);
for (Index i=0; i<PacketSize; ++i)
m.template writePacket<Alignment>(m.rowIndexByOuterInner(row_start + i, col_start), m.colIndexByOuterInner(row_start + i,col_start), A.packet[i]);
} else {
PacketBlock<Packet> B;
for (Index i=0; i<PacketSize; ++i) {
A.packet[i] = m.template packetByOuterInner<Alignment>(row_start + i,col_start);
B.packet[i] = m.template packetByOuterInner<Alignment>(col_start + i, row_start);
}
internal::ptranspose(A);
internal::ptranspose(B);
for (Index i=0; i<PacketSize; ++i) {
m.template writePacket<Alignment>(m.rowIndexByOuterInner(row_start + i, col_start), m.colIndexByOuterInner(row_start + i,col_start), B.packet[i]);
m.template writePacket<Alignment>(m.rowIndexByOuterInner(col_start + i, row_start), m.colIndexByOuterInner(col_start + i,row_start), A.packet[i]);
}
}
}
}
for (Index row = row_start; row < m.rows(); ++row) {
m.matrix().row(row).head(row).swap(
m.matrix().col(row).head(row).transpose());
}
}
template<typename MatrixType,bool MatchPacketSize>
struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non square matrix
struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non square or dynamic matrix
static void run(MatrixType& m) {
if (m.rows()==m.cols())
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose().template triangularView<StrictlyUpper>());
else
typedef typename MatrixType::Scalar Scalar;
if (m.rows() == m.cols()) {
const Index PacketSize = internal::packet_traits<Scalar>::size;
if (!NumTraits<Scalar>::IsComplex && m.rows() >= PacketSize) {
if ((m.rows() % PacketSize) == 0)
BlockedInPlaceTranspose<MatrixType,internal::evaluator<MatrixType>::Alignment>(m);
else
BlockedInPlaceTranspose<MatrixType,Unaligned>(m);
}
else {
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose().template triangularView<StrictlyUpper>());
}
} else {
m = m.transpose().eval();
}
}
};
} // end namespace internal
/** This is the "in place" version of transpose(): it replaces \c *this by its own transpose.
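
The loop structure of `BlockedInPlaceTranspose` can be followed more easily in a scalar analogue. The sketch below is not Eigen code: `blocked_inplace_transpose` and `check_blocked_transpose` are hypothetical names, the matrix is a row-major `std::vector<double>`, and `block` plays the role of `PacketSize`. Diagonal blocks are transposed in place, each off-diagonal block is exchanged with its mirror block, and the rows that do not fill a whole block are handled by scalar swaps, as in the trailing loop of the hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical scalar analogue of a blocked in-place transpose of an
// n x n row-major matrix. `block` plays the role of PacketSize.
inline void blocked_inplace_transpose(std::vector<double>& m, std::size_t n, std::size_t block) {
  assert(block > 0 && m.size() == n * n);
  std::size_t row_start = 0;
  for (; row_start + block <= n; row_start += block) {
    for (std::size_t col_start = row_start; col_start + block <= n; col_start += block) {
      if (row_start == col_start) {
        // Diagonal block: swap each element with its mirror inside the block.
        for (std::size_t i = 0; i < block; ++i)
          for (std::size_t j = i + 1; j < block; ++j)
            std::swap(m[(row_start + i) * n + col_start + j],
                      m[(row_start + j) * n + col_start + i]);
      } else {
        // Off-diagonal block: element (i, j) trades places with element (j, i)
        // of the mirror block below the diagonal.
        for (std::size_t i = 0; i < block; ++i)
          for (std::size_t j = 0; j < block; ++j)
            std::swap(m[(row_start + i) * n + col_start + j],
                      m[(col_start + j) * n + row_start + i]);
      }
    }
  }
  // Remaining rows (and, by symmetry, columns) that do not fill a full block.
  for (std::size_t row = row_start; row < n; ++row)
    for (std::size_t k = 0; k < row; ++k)
      std::swap(m[row * n + k], m[k * n + row]);
}

// Fills an n x n matrix with distinct values, transposes it, and verifies
// that every entry moved to its mirrored position.
inline bool check_blocked_transpose(std::size_t n, std::size_t block) {
  std::vector<double> m(n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      m[i * n + j] = static_cast<double>(i * n + j);
  blocked_inplace_transpose(m, n, block);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      if (m[i * n + j] != static_cast<double>(j * n + i)) return false;
  return true;
}
```

The remainder loop starts at the first row not covered by a full block, so it also picks up the trailing columns of the earlier rows. When `n` is smaller than `block`, it performs the whole transpose on its own.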


@@ -10,20 +10,22 @@
#ifndef EIGEN_TRANSPOSITIONS_H
#define EIGEN_TRANSPOSITIONS_H
namespace Eigen {
namespace Eigen {
template<typename Derived>
class TranspositionsBase
{
typedef internal::traits<Derived> Traits;
public:
typedef typename Traits::IndicesType IndicesType;
typedef typename IndicesType::Scalar StorageIndex;
typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3
EIGEN_DEVICE_FUNC
Derived& derived() { return *static_cast<Derived*>(this); }
EIGEN_DEVICE_FUNC
const Derived& derived() const { return *static_cast<const Derived*>(this); }
/** Copies the \a other transpositions into \c *this */
@@ -35,13 +37,17 @@ class TranspositionsBase
}
/** \returns the number of transpositions */
EIGEN_DEVICE_FUNC
Index size() const { return indices().size(); }
/** \returns the number of rows of the equivalent permutation matrix */
EIGEN_DEVICE_FUNC
Index rows() const { return indices().size(); }
/** \returns the number of columns of the equivalent permutation matrix */
EIGEN_DEVICE_FUNC
Index cols() const { return indices().size(); }
/** Direct access to the underlying index vector */
EIGEN_DEVICE_FUNC
inline const StorageIndex& coeff(Index i) const { return indices().coeff(i); }
/** Direct access to the underlying index vector */
inline StorageIndex& coeffRef(Index i) { return indices().coeffRef(i); }
@@ -55,8 +61,10 @@ class TranspositionsBase
inline StorageIndex& operator[](Index i) { return indices()(i); }
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return derived().indices(); }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return derived().indices(); }
/** Resizes to given size. */
@@ -178,8 +186,10 @@ class Transpositions : public TranspositionsBase<Transpositions<SizeAtCompileTim
{}
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -237,9 +247,11 @@ class Map<Transpositions<SizeAtCompileTime,MaxSizeAtCompileTime,_StorageIndex>,P
#endif
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -279,9 +291,11 @@ class TranspositionsWrapper
}
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -335,8 +349,11 @@ class Transpose<TranspositionsBase<TranspositionsDerived> >
explicit Transpose(const TranspositionType& t) : m_transpositions(t) {}
EIGEN_DEVICE_FUNC
Index size() const { return m_transpositions.size(); }
EIGEN_DEVICE_FUNC
Index rows() const { return m_transpositions.size(); }
EIGEN_DEVICE_FUNC
Index cols() const { return m_transpositions.size(); }
/** \returns the \a matrix with the inverse transpositions applied to the columns.
@@ -356,7 +373,8 @@ class Transpose<TranspositionsBase<TranspositionsDerived> >
{
return Product<Transpose, OtherDerived, AliasFreeProduct>(*this, matrix.derived());
}
EIGEN_DEVICE_FUNC
const TranspositionType& nestedExpression() const { return m_transpositions; }
protected:


@@ -219,9 +219,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
explicit inline TriangularView(MatrixType& matrix) : m_matrix(matrix)
{}
using Base::operator=;
TriangularView& operator=(const TriangularView &other)
{ return Base::operator=(other); }
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(TriangularView)
/** \copydoc EigenBase::rows() */
EIGEN_DEVICE_FUNC
@@ -557,6 +555,10 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
template<typename ProductType>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE TriangularViewType& _assignProduct(const ProductType& prod, const Scalar& alpha, bool beta);
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(TriangularViewImpl)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TriangularViewImpl)
};
/***************************************************************************


@@ -279,27 +279,47 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
/** This is the const version of iterator (aka read-only) */
random_access_iterator_type const_iterator;
#else
typedef internal::subvector_stl_iterator<ExpressionType, DirectionType(Direction)> iterator;
typedef internal::subvector_stl_iterator<const ExpressionType, DirectionType(Direction)> const_iterator;
typedef internal::subvector_stl_iterator<ExpressionType, DirectionType(Direction)> iterator;
typedef internal::subvector_stl_iterator<const ExpressionType, DirectionType(Direction)> const_iterator;
typedef internal::subvector_stl_reverse_iterator<ExpressionType, DirectionType(Direction)> reverse_iterator;
typedef internal::subvector_stl_reverse_iterator<const ExpressionType, DirectionType(Direction)> const_reverse_iterator;
#endif
/** returns an iterator to the first row (rowwise) or column (colwise) of the nested expression.
* \sa end(), cbegin()
*/
iterator begin() { return iterator (m_matrix, 0); }
iterator begin() { return iterator (m_matrix, 0); }
/** const version of begin() */
const_iterator begin() const { return const_iterator(m_matrix, 0); }
const_iterator begin() const { return const_iterator(m_matrix, 0); }
/** const version of begin() */
const_iterator cbegin() const { return const_iterator(m_matrix, 0); }
const_iterator cbegin() const { return const_iterator(m_matrix, 0); }
/** returns a reverse iterator to the last row (rowwise) or column (colwise) of the nested expression.
* \sa rend(), crbegin()
*/
reverse_iterator rbegin() { return reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** const version of rbegin() */
const_reverse_iterator rbegin() const { return const_reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** const version of rbegin() */
const_reverse_iterator crbegin() const { return const_reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** returns an iterator to the row (resp. column) following the last row (resp. column) of the nested expression
* \sa begin(), cend()
*/
iterator end() { return iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
iterator end() { return iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** const version of end() */
const_iterator end() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
const_iterator end() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** const version of end() */
const_iterator cend() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
const_iterator cend() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** returns a reverse iterator to the row (resp. column) before the first row (resp. column) of the nested expression
* \sa rbegin(), crend()
*/
reverse_iterator rend() { return reverse_iterator (m_matrix, -1); }
/** const version of rend() */
const_reverse_iterator rend() const { return const_reverse_iterator (m_matrix, -1); }
/** const version of rend() */
const_reverse_iterator crend() const { return const_reverse_iterator (m_matrix, -1); }
/** \returns a row or column vector expression of \c *this reduxed by \a func
*
@@ -719,6 +739,10 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
EIGEN_DEVICE_FUNC
const HNormalizedReturnType hnormalized() const;
# ifdef EIGEN_VECTORWISEOP_PLUGIN
# include EIGEN_VECTORWISEOP_PLUGIN
# endif
protected:
Index redux_length() const
{


@@ -38,6 +38,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
@@ -47,7 +48,18 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
};
#endif
template<> struct unpacket_traits<Packet4cf> { typedef std::complex<float> type; enum {size=4, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet4cf> {
typedef std::complex<float> type;
typedef Packet2cf half;
typedef Packet8f as_real;
enum {
size=4,
alignment=Aligned32,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet4cf padd<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf psub<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_sub_ps(a.v,b.v)); }
@@ -76,7 +88,6 @@ EIGEN_STRONG_INLINE Packet4cf pcmp_eq(const Packet4cf& a, const Packet4cf& b) {
}
template<> EIGEN_STRONG_INLINE Packet4cf ptrue<Packet4cf>(const Packet4cf& a) { return Packet4cf(ptrue(Packet8f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cf pnot<Packet4cf>(const Packet4cf& a) { return Packet4cf(pnot(Packet8f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cf pand <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_and_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf por <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_or_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf pxor <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_xor_ps(a.v,b.v)); }
@@ -150,37 +161,12 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet4cf>(const Packe
Packet2cf(_mm256_extractf128_ps(a.v,1))));
}
template<> EIGEN_STRONG_INLINE Packet4cf preduxp<Packet4cf>(const Packet4cf* vecs)
{
Packet8f t0 = _mm256_shuffle_ps(vecs[0].v, vecs[0].v, _MM_SHUFFLE(3, 1, 2 ,0));
Packet8f t1 = _mm256_shuffle_ps(vecs[1].v, vecs[1].v, _MM_SHUFFLE(3, 1, 2 ,0));
t0 = _mm256_hadd_ps(t0,t1);
Packet8f t2 = _mm256_shuffle_ps(vecs[2].v, vecs[2].v, _MM_SHUFFLE(3, 1, 2 ,0));
Packet8f t3 = _mm256_shuffle_ps(vecs[3].v, vecs[3].v, _MM_SHUFFLE(3, 1, 2 ,0));
t2 = _mm256_hadd_ps(t2,t3);
t1 = _mm256_permute2f128_ps(t0,t2, 0 + (2<<4));
t3 = _mm256_permute2f128_ps(t0,t2, 1 + (3<<4));
return Packet4cf(_mm256_add_ps(t1,t3));
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet4cf>(const Packet4cf& a)
{
return predux_mul(pmul(Packet2cf(_mm256_extractf128_ps(a.v, 0)),
Packet2cf(_mm256_extractf128_ps(a.v, 1))));
}
template<int Offset>
struct palign_impl<Offset,Packet4cf>
{
static EIGEN_STRONG_INLINE void run(Packet4cf& first, const Packet4cf& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet8f>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet4cf, Packet4cf, false,true>
{
EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
@@ -254,6 +240,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
@@ -263,7 +250,18 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
};
#endif
template<> struct unpacket_traits<Packet2cd> { typedef std::complex<double> type; enum {size=2, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet2cd> {
typedef std::complex<double> type;
typedef Packet1cd half;
typedef Packet4d as_real;
enum {
size=2,
alignment=Aligned32,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet2cd padd<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_add_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd psub<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_sub_pd(a.v,b.v)); }
@@ -291,7 +289,6 @@ EIGEN_STRONG_INLINE Packet2cd pcmp_eq(const Packet2cd& a, const Packet2cd& b) {
}
template<> EIGEN_STRONG_INLINE Packet2cd ptrue<Packet2cd>(const Packet2cd& a) { return Packet2cd(ptrue(Packet4d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cd pnot<Packet2cd>(const Packet2cd& a) { return Packet2cd(pnot(Packet4d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cd pand <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_and_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd por <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_or_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd pxor <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_xor_pd(a.v,b.v)); }
@@ -347,30 +344,12 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet2cd>(const Pack
Packet1cd(_mm256_extractf128_pd(a.v,1))));
}
template<> EIGEN_STRONG_INLINE Packet2cd preduxp<Packet2cd>(const Packet2cd* vecs)
{
Packet4d t0 = _mm256_permute2f128_pd(vecs[0].v,vecs[1].v, 0 + (2<<4));
Packet4d t1 = _mm256_permute2f128_pd(vecs[0].v,vecs[1].v, 1 + (3<<4));
return Packet2cd(_mm256_add_pd(t0,t1));
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet2cd>(const Packet2cd& a)
{
return predux(pmul(Packet1cd(_mm256_extractf128_pd(a.v,0)),
Packet1cd(_mm256_extractf128_pd(a.v,1))));
}
template<int Offset>
struct palign_impl<Offset,Packet2cd>
{
static EIGEN_STRONG_INLINE void run(Packet2cd& first, const Packet2cd& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet4d>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet2cd, Packet2cd, false,true>
{
EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
@@ -444,24 +423,12 @@ ptranspose(PacketBlock<Packet2cd,2>& kernel) {
kernel.packet[0].v = tmp;
}
template<> EIGEN_STRONG_INLINE Packet4cf pinsertfirst(const Packet4cf& a, std::complex<float> b)
{
return Packet4cf(_mm256_blend_ps(a.v,pset1<Packet4cf>(b).v,1|2));
template<> EIGEN_STRONG_INLINE Packet2cd psqrt<Packet2cd>(const Packet2cd& a) {
return psqrt_complex<Packet2cd>(a);
}
template<> EIGEN_STRONG_INLINE Packet2cd pinsertfirst(const Packet2cd& a, std::complex<double> b)
{
return Packet2cd(_mm256_blend_pd(a.v,pset1<Packet2cd>(b).v,1|2));
}
template<> EIGEN_STRONG_INLINE Packet4cf pinsertlast(const Packet4cf& a, std::complex<float> b)
{
return Packet4cf(_mm256_blend_ps(a.v,pset1<Packet4cf>(b).v,(1<<7)|(1<<6)));
}
template<> EIGEN_STRONG_INLINE Packet2cd pinsertlast(const Packet2cd& a, std::complex<double> b)
{
return Packet2cd(_mm256_blend_pd(a.v,pset1<Packet2cd>(b).v,(1<<3)|(1<<2)));
template<> EIGEN_STRONG_INLINE Packet4cf psqrt<Packet4cf>(const Packet4cf& a) {
return psqrt_complex<Packet4cf>(a);
}
} // end namespace internal


@@ -36,6 +36,24 @@ plog<Packet8f>(const Packet8f& _x) {
return plog_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
plog<Packet4d>(const Packet4d& _x) {
return plog_double(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
plog2<Packet8f>(const Packet8f& _x) {
return plog2_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
plog2<Packet4d>(const Packet4d& _x) {
return plog2_double(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f plog1p<Packet8f>(const Packet8f& _x) {
return generic_plog1p(_x);
@@ -58,15 +76,15 @@ pexp<Packet8f>(const Packet8f& _x) {
// Hyperbolic Tangent function.
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
ptanh<Packet8f>(const Packet8f& x) {
return internal::generic_fast_tanh_float(x);
ptanh<Packet8f>(const Packet8f& _x) {
return internal::generic_fast_tanh_float(_x);
}
// Exponential function for doubles.
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
pexp<Packet4d>(const Packet4d& x) {
return pexp_double(x);
pexp<Packet4d>(const Packet4d& _x) {
return pexp_double(_x);
}
// Functions for sqrt.
@@ -79,33 +97,36 @@ pexp<Packet4d>(const Packet4d& x) {
// For detail see here: http://www.beyond3d.com/content/articles/8/
#if EIGEN_FAST_MATH
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
psqrt<Packet8f>(const Packet8f& _x) {
Packet8f half = pmul(_x, pset1<Packet8f>(.5f));
Packet8f denormal_mask = _mm256_and_ps(
_mm256_cmp_ps(_x, pset1<Packet8f>((std::numeric_limits<float>::min)()),
_CMP_LT_OQ),
_mm256_cmp_ps(_x, _mm256_setzero_ps(), _CMP_GE_OQ));
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& _x) {
Packet8f minus_half_x = pmul(_x, pset1<Packet8f>(-0.5f));
Packet8f denormal_mask = pandnot(
pcmp_lt(_x, pset1<Packet8f>((std::numeric_limits<float>::min)())),
pcmp_lt(_x, pzero(_x)));
// Compute approximate reciprocal sqrt.
Packet8f x = _mm256_rsqrt_ps(_x);
// Do a single step of Newton's iteration.
x = pmul(x, psub(pset1<Packet8f>(1.5f), pmul(half, pmul(x,x))));
x = pmul(x, pmadd(minus_half_x, pmul(x,x), pset1<Packet8f>(1.5f)));
// Flush results for denormals to zero.
return _mm256_andnot_ps(denormal_mask, pmul(_x,x));
return pandnot(pmul(_x,x), denormal_mask);
}
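
The fast `psqrt` path above refines an approximate reciprocal square root with one Newton step, multiplies by the input, and flushes results for inputs in `[0, FLT_MIN)` to zero. A scalar sketch of the same arithmetic, under the assumption that the caller supplies the rough estimate (the packet code obtains it from `_mm256_rsqrt_ps`), could look as follows. `refine_rsqrt` and `fast_sqrt_sketch` are hypothetical names.

```cpp
#include <cmath>
#include <limits>

// One Newton step for y ~ 1/sqrt(a): y' = y * (1.5 - 0.5 * a * y * y).
inline float refine_rsqrt(float a, float y) {
  return y * (1.5f - 0.5f * a * y * y);
}

// Hypothetical scalar analogue of the fast sqrt path. `rsqrt_estimate` is
// assumed to be a rough approximation of 1/sqrt(a) supplied by the caller.
inline float fast_sqrt_sketch(float a, float rsqrt_estimate) {
  // Same mask as the packet code: (a < FLT_MIN) and not (a < 0). This covers
  // zero and positive denormals, whose results are flushed to zero.
  const bool flush = (a < (std::numeric_limits<float>::min)()) && !(a < 0.0f);
  if (flush) return 0.0f;
  // sqrt(a) = a * (1/sqrt(a)), using the refined reciprocal square root.
  return a * refine_rsqrt(a, rsqrt_estimate);
}
```

For `a = 4` and an estimate of `0.49`, the unrefined product `4 * 0.49 = 1.96` is off by 0.04, while the refined result is roughly 1.9988. The flush matters for `a = 0`, where the estimate is infinite and `0 * inf` would otherwise produce NaN.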
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& x) {
return _mm256_sqrt_ps(x);
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d psqrt<Packet4d>(const Packet4d& x) {
return _mm256_sqrt_pd(x);
}
#if EIGEN_FAST_MATH
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& _x) {
return _mm256_sqrt_ps(_x);
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d psqrt<Packet4d>(const Packet4d& _x) {
return _mm256_sqrt_pd(_x);
}
#if EIGEN_FAST_MATH
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inf, 0x7f800000);
@@ -140,18 +161,65 @@ Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f prsqrt<Packet8f>(const Packet8f& x) {
Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
_EIGEN_DECLARE_CONST_Packet8f(one, 1.0f);
return _mm256_div_ps(p8f_one, _mm256_sqrt_ps(x));
return _mm256_div_ps(p8f_one, _mm256_sqrt_ps(_x));
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d prsqrt<Packet4d>(const Packet4d& x) {
Packet4d prsqrt<Packet4d>(const Packet4d& _x) {
_EIGEN_DECLARE_CONST_Packet4d(one, 1.0);
return _mm256_div_pd(p4d_one, _mm256_sqrt_pd(x));
return _mm256_div_pd(p4d_one, _mm256_sqrt_pd(_x));
}
F16_PACKET_FUNCTION(Packet8f, Packet8h, psin)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pcos)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog2)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog1p)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pexpm1)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pexp)
F16_PACKET_FUNCTION(Packet8f, Packet8h, ptanh)
F16_PACKET_FUNCTION(Packet8f, Packet8h, psqrt)
F16_PACKET_FUNCTION(Packet8f, Packet8h, prsqrt)
template <>
EIGEN_STRONG_INLINE Packet8h pfrexp(const Packet8h& a, Packet8h& exponent) {
Packet8f fexponent;
const Packet8h out = float2half(pfrexp<Packet8f>(half2float(a), fexponent));
exponent = float2half(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet8h pldexp(const Packet8h& a, const Packet8h& exponent) {
return float2half(pldexp<Packet8f>(half2float(a), half2float(exponent)));
}
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, psin)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pcos)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog2)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog1p)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pexpm1)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pexp)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, ptanh)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, psqrt)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, prsqrt)
template <>
EIGEN_STRONG_INLINE Packet8bf pfrexp(const Packet8bf& a, Packet8bf& exponent) {
Packet8f fexponent;
const Packet8bf out = F32ToBf16(pfrexp<Packet8f>(Bf16ToF32(a), fexponent));
exponent = F32ToBf16(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet8bf pldexp(const Packet8bf& a, const Packet8bf& exponent) {
return F32ToBf16(pldexp<Packet8f>(Bf16ToF32(a), Bf16ToF32(exponent)));
}
} // end namespace internal
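
The fast-math `psqrt<Packet8f>` in this file refines a hardware reciprocal-square-root estimate with one Newton step, multiplies by the input, and flushes non-negative inputs below the smallest normal float to zero. A scalar sketch of that arithmetic follows. The helper `approx_rsqrt` is hypothetical: it stands in for `_mm256_rsqrt_ps`, and its 0.1% error is invented purely for illustration.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for the hardware estimate: the exact value with a
// deliberate 0.1% relative error, so the effect of the Newton step is visible.
inline float approx_rsqrt(float x) {
  return (1.0f / std::sqrt(x)) * 1.001f;
}

// Scalar model of the fast path: sqrt(x) = x * rsqrt(x), with one Newton step.
inline float fast_sqrt_sketch(float x) {
  // Mirrors the mask (x < FLT_MIN) and not (x < 0): zero and denormals.
  const bool flush = (x < (std::numeric_limits<float>::min)()) && !(x < 0.0f);
  if (flush) return 0.0f;  // avoids 0 * inf = NaN for x == 0
  float y = approx_rsqrt(x);
  // Newton step for f(y) = 1/y^2 - x, written as y * (1.5 - 0.5 * x * y^2).
  y = y * std::fma(-0.5f * x, y * y, 1.5f);
  return x * y;
}
```

One step roughly squares the relative error of the estimate, which is why a single iteration is enough when the starting estimate is already good to about 12 bits. Negative inputs are not flushed and come out as NaN.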

File diff suppressed because it is too large

@@ -35,6 +35,46 @@ struct type_casting_traits<int, float> {
};
#ifndef EIGEN_VECTORIZE_AVX512
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<bfloat16, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<float, bfloat16> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
#endif // EIGEN_VECTORIZE_AVX512
template<> EIGEN_STRONG_INLINE Packet8i pcast<Packet8f, Packet8i>(const Packet8f& a) {
return _mm256_cvttps_epi32(a);
@@ -52,36 +92,22 @@ template<> EIGEN_STRONG_INLINE Packet8f preinterpret<Packet8f,Packet8i>(const Pa
return _mm256_castsi256_ps(a);
}
#ifndef EIGEN_VECTORIZE_AVX512
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8h, Packet8f>(const Packet8h& a) {
return half2float(a);
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
#endif // EIGEN_VECTORIZE_AVX512
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8bf, Packet8f>(const Packet8bf& a) {
return Bf16ToF32(a);
}
template<> EIGEN_STRONG_INLINE Packet8h pcast<Packet8f, Packet8h>(const Packet8f& a) {
return float2half(a);
}
template<> EIGEN_STRONG_INLINE Packet8bf pcast<Packet8f, Packet8bf>(const Packet8f& a) {
return F32ToBf16(a);
}
} // end namespace internal
} // end namespace Eigen
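
The new `pcast` specializations delegate to `Bf16ToF32` and `F32ToBf16`, whose bodies are not part of this diff. As a rough scalar illustration of what such conversions usually involve: bfloat16 is the upper 16 bits of an IEEE float, so widening is a shift and narrowing needs a rounding rule. The function names below are hypothetical, and round-to-nearest-even is an assumption about the narrowing behavior, not something this diff confirms.

```cpp
#include <cstdint>
#include <cstring>

// Widen: place the 16 bfloat16 bits in the upper half of a 32-bit float.
inline float bf16_bits_to_float(std::uint16_t b) {
  const std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Narrow: keep the upper 16 bits, rounding to nearest with ties to even
// (assumed rounding mode for this sketch).
inline std::uint16_t float_to_bf16_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u) {
    // NaN: truncate and force a quiet-NaN mantissa bit so it stays a NaN.
    return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
  }
  const std::uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives
  u += 0x7fffu + lsb;                        // bias for round-to-nearest-even
  return static_cast<std::uint16_t>(u >> 16);
}
```

Widening is exact, while narrowing discards 16 mantissa bits, so a float survives the round trip only if those bits were already zero.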

@@ -37,17 +37,19 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasReduxp = 0
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet8cf> {
typedef std::complex<float> type;
typedef Packet4cf half;
typedef Packet16f as_real;
enum {
size = 8,
alignment=unpacket_traits<Packet16f>::alignment,
@@ -55,11 +57,9 @@ template<> struct unpacket_traits<Packet8cf> {
masked_load_available=false,
masked_store_available=false
};
typedef Packet4cf half;
};
template<> EIGEN_STRONG_INLINE Packet8cf ptrue<Packet8cf>(const Packet8cf& a) { return Packet8cf(ptrue(Packet16f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet8cf pnot<Packet8cf>(const Packet8cf& a) { return Packet8cf(pnot(Packet16f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet8cf padd<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf psub<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_sub_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf pnegate(const Packet8cf& a)
@@ -153,16 +153,6 @@ EIGEN_STRONG_INLINE Packet4cf predux_half_dowto4<Packet8cf>(const Packet8cf& a)
return Packet4cf(res);
}
template<int Offset>
struct palign_impl<Offset,Packet8cf>
{
static EIGEN_STRONG_INLINE void run(Packet8cf& first, const Packet8cf& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet16f>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, false,true>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
@@ -235,17 +225,19 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasReduxp = 0
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet4cd> {
typedef std::complex<double> type;
typedef Packet2cd half;
typedef Packet8d as_real;
enum {
size = 4,
alignment = unpacket_traits<Packet8d>::alignment,
@@ -253,7 +245,6 @@ template<> struct unpacket_traits<Packet4cd> {
masked_load_available=false,
masked_store_available=false
};
typedef Packet2cd half;
};
template<> EIGEN_STRONG_INLINE Packet4cd padd<Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(_mm512_add_pd(a.v,b.v)); }
@@ -277,7 +268,6 @@ template<> EIGEN_STRONG_INLINE Packet4cd pmul<Packet4cd>(const Packet4cd& a, con
}
template<> EIGEN_STRONG_INLINE Packet4cd ptrue<Packet4cd>(const Packet4cd& a) { return Packet4cd(ptrue(Packet8d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cd pnot<Packet4cd>(const Packet4cd& a) { return Packet4cd(pnot(Packet8d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cd pand <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pand(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd por <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(por(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd pxor <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pxor(a.v,b.v)); }
@@ -337,7 +327,7 @@ template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet4cd>(const Pack
}
template<> EIGEN_STRONG_INLINE Packet4cd preverse(const Packet4cd& a) {
return Packet4cd(_mm512_shuffle_f64x2(a.v, a.v, EIGEN_SSE_SHUFFLE_MASK(3,2,1,0)));
return Packet4cd(_mm512_shuffle_f64x2(a.v, a.v, (shuffle_mask<3,2,1,0>::mask)));
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet4cd>(const Packet4cd& a)
@@ -352,16 +342,6 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet4cd>(const
Packet2cd(_mm512_extractf64x4_pd(a.v,1))));
}
template<int Offset>
struct palign_impl<Offset,Packet4cd>
{
static EIGEN_STRONG_INLINE void run(Packet4cd& first, const Packet4cd& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet8d>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet4cd, Packet4cd, false,true>
{
EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const
@@ -450,43 +430,26 @@ ptranspose(PacketBlock<Packet8cf,8>& kernel) {
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<Packet4cd,4>& kernel) {
__m512d T0 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, EIGEN_SSE_SHUFFLE_MASK(0,1,0,1)); // [a0 a1 b0 b1]
__m512d T1 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, EIGEN_SSE_SHUFFLE_MASK(2,3,2,3)); // [a2 a3 b2 b3]
__m512d T2 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, EIGEN_SSE_SHUFFLE_MASK(0,1,0,1)); // [c0 c1 d0 d1]
__m512d T3 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, EIGEN_SSE_SHUFFLE_MASK(2,3,2,3)); // [c2 c3 d2 d3]
__m512d T0 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, (shuffle_mask<0,1,0,1>::mask)); // [a0 a1 b0 b1]
__m512d T1 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, (shuffle_mask<2,3,2,3>::mask)); // [a2 a3 b2 b3]
__m512d T2 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, (shuffle_mask<0,1,0,1>::mask)); // [c0 c1 d0 d1]
__m512d T3 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, (shuffle_mask<2,3,2,3>::mask)); // [c2 c3 d2 d3]
kernel.packet[3] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, EIGEN_SSE_SHUFFLE_MASK(1,3,1,3))); // [a3 b3 c3 d3]
kernel.packet[2] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, EIGEN_SSE_SHUFFLE_MASK(0,2,0,2))); // [a2 b2 c2 d2]
kernel.packet[1] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, EIGEN_SSE_SHUFFLE_MASK(1,3,1,3))); // [a1 b1 c1 d1]
kernel.packet[0] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, EIGEN_SSE_SHUFFLE_MASK(0,2,0,2))); // [a0 b0 c0 d0]
kernel.packet[3] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, (shuffle_mask<1,3,1,3>::mask))); // [a3 b3 c3 d3]
kernel.packet[2] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, (shuffle_mask<0,2,0,2>::mask))); // [a2 b2 c2 d2]
kernel.packet[1] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, (shuffle_mask<1,3,1,3>::mask))); // [a1 b1 c1 d1]
kernel.packet[0] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, (shuffle_mask<0,2,0,2>::mask))); // [a0 b0 c0 d0]
}
template<> EIGEN_STRONG_INLINE Packet8cf pinsertfirst(const Packet8cf& a, std::complex<float> b)
{
Packet2cf tmp = Packet2cf(_mm512_extractf32x4_ps(a.v,0));
tmp = pinsertfirst(tmp, b);
return Packet8cf( _mm512_insertf32x4(a.v, tmp.v, 0) );
template<> EIGEN_STRONG_INLINE Packet4cd psqrt<Packet4cd>(const Packet4cd& a) {
return psqrt_complex<Packet4cd>(a);
}
template<> EIGEN_STRONG_INLINE Packet4cd pinsertfirst(const Packet4cd& a, std::complex<double> b)
{
return Packet4cd(_mm512_castsi512_pd( _mm512_inserti32x4(_mm512_castpd_si512(a.v), _mm_castpd_si128(pset1<Packet1cd>(b).v), 0) ));
}
template<> EIGEN_STRONG_INLINE Packet8cf pinsertlast(const Packet8cf& a, std::complex<float> b)
{
Packet2cf tmp = Packet2cf(_mm512_extractf32x4_ps(a.v,3) );
tmp = pinsertlast(tmp, b);
return Packet8cf( _mm512_insertf32x4(a.v, tmp.v, 3) );
}
template<> EIGEN_STRONG_INLINE Packet4cd pinsertlast(const Packet4cd& a, std::complex<double> b)
{
return Packet4cd(_mm512_castsi512_pd( _mm512_inserti32x4(_mm512_castpd_si512(a.v), _mm_castpd_si128(pset1<Packet1cd>(b).v), 3) ));
template<> EIGEN_STRONG_INLINE Packet8cf psqrt<Packet8cf>(const Packet8cf& a) {
return psqrt_complex<Packet8cf>(a);
}
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_COMPLEX_AVX512_H
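
The `ptranspose(PacketBlock<Packet4cd,4>&)` hunk transposes a 4x4 block in two rounds of 128-bit lane shuffles, and the change only swaps `EIGEN_SSE_SHUFFLE_MASK` for `shuffle_mask<...>::mask`. The sketch below models the lane selection of `_mm512_shuffle_f64x2` on plain integers so the bracketed comments can be checked by hand. `shuffle_mask_sketch`, `shuffle_lanes` and `transpose4x4_sketch` are hypothetical names, and the immediate's bit layout is assumed to match the macro being replaced.

```cpp
#include <array>

// Assumed layout of the immediate: two bits per selected lane, p lowest.
template <int p, int q, int r, int s>
struct shuffle_mask_sketch {
  enum { mask = (s << 6) | (r << 4) | (q << 2) | p };
};

using Row = std::array<int, 4>;  // one entry per 128-bit lane

// Model of the lane shuffle: the two low result lanes come from a,
// the two high result lanes come from b.
inline Row shuffle_lanes(const Row& a, const Row& b, int imm) {
  return Row{{a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]}};
}

inline void transpose4x4_sketch(std::array<Row, 4>& k) {
  const Row T0 = shuffle_lanes(k[0], k[1], shuffle_mask_sketch<0, 1, 0, 1>::mask);  // [a0 a1 b0 b1]
  const Row T1 = shuffle_lanes(k[0], k[1], shuffle_mask_sketch<2, 3, 2, 3>::mask);  // [a2 a3 b2 b3]
  const Row T2 = shuffle_lanes(k[2], k[3], shuffle_mask_sketch<0, 1, 0, 1>::mask);  // [c0 c1 d0 d1]
  const Row T3 = shuffle_lanes(k[2], k[3], shuffle_mask_sketch<2, 3, 2, 3>::mask);  // [c2 c3 d2 d3]
  k[3] = shuffle_lanes(T1, T3, shuffle_mask_sketch<1, 3, 1, 3>::mask);  // [a3 b3 c3 d3]
  k[2] = shuffle_lanes(T1, T3, shuffle_mask_sketch<0, 2, 0, 2>::mask);  // [a2 b2 c2 d2]
  k[1] = shuffle_lanes(T0, T2, shuffle_mask_sketch<1, 3, 1, 3>::mask);  // [a1 b1 c1 d1]
  k[0] = shuffle_lanes(T0, T2, shuffle_mask_sketch<0, 2, 0, 2>::mask);  // [a0 b0 c0 d0]
}
```

Each lane here corresponds to one `std::complex<double>`, which is why a lane-granular shuffle is enough for `Packet4cd`.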

@@ -29,106 +29,41 @@ namespace internal {
#define _EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(NAME, X) \
const Packet8d p8d_##NAME = _mm512_castsi512_pd(_mm512_set1_epi64(X))
// Natural logarithm
// Computes log(x) as log(2^e * m) = C*e + log(m), where the constant C =log(2)
// and m is in the range [sqrt(1/2),sqrt(2)). In this range, the logarithm can
// be easily approximated by a polynomial centered on m=1 for stability.
#if defined(EIGEN_VECTORIZE_AVX512DQ)
#define _EIGEN_DECLARE_CONST_Packet16bf(NAME, X) \
const Packet16bf p16bf_##NAME = pset1<Packet16bf>(X)
#define _EIGEN_DECLARE_CONST_Packet16bf_FROM_INT(NAME, X) \
const Packet16bf p16bf_##NAME = preinterpret<Packet16bf,Packet16i>(pset1<Packet16i>(X))
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
plog<Packet16f>(const Packet16f& _x) {
Packet16f x = _x;
_EIGEN_DECLARE_CONST_Packet16f(1, 1.0f);
_EIGEN_DECLARE_CONST_Packet16f(half, 0.5f);
_EIGEN_DECLARE_CONST_Packet16f(126f, 126.0f);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(inv_mant_mask, ~0x7f800000);
// The smallest non denormalized float number.
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(min_norm_pos, 0x00800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(minus_inf, 0xff800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(pos_inf, 0x7f800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(nan, 0x7fc00000);
// Polynomial coefficients.
_EIGEN_DECLARE_CONST_Packet16f(cephes_SQRTHF, 0.707106781186547524f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p0, 7.0376836292E-2f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p1, -1.1514610310E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p2, 1.1676998740E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p3, -1.2420140846E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p4, +1.4249322787E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p5, -1.6668057665E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p6, +2.0000714765E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p7, -2.4999993993E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p8, +3.3333331174E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_q1, -2.12194440e-4f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_q2, 0.693359375f);
// invalid_mask is set to true when x is NaN
__mmask16 invalid_mask = _mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_NGE_UQ);
__mmask16 iszero_mask = _mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_EQ_OQ);
// Truncate input values to the minimum positive normal.
x = pmax(x, p16f_min_norm_pos);
// Extract the shifted exponents.
Packet16f emm0 = _mm512_cvtepi32_ps(_mm512_srli_epi32(preinterpret<Packet16i,Packet16f>(x), 23));
Packet16f e = _mm512_sub_ps(emm0, p16f_126f);
// Set the exponents to -1, i.e. x are in the range [0.5,1).
x = _mm512_and_ps(x, p16f_inv_mant_mask);
x = _mm512_or_ps(x, p16f_half);
// part2: Shift the inputs from the range [0.5,1) to [sqrt(1/2),sqrt(2))
// and shift by -1. The values are then centered around 0, which improves
// the stability of the polynomial evaluation.
// if( x < SQRTHF ) {
// e -= 1;
// x = x + x - 1.0;
// } else { x = x - 1.0; }
__mmask16 mask = _mm512_cmp_ps_mask(x, p16f_cephes_SQRTHF, _CMP_LT_OQ);
Packet16f tmp = _mm512_mask_blend_ps(mask, _mm512_setzero_ps(), x);
x = psub(x, p16f_1);
e = psub(e, _mm512_mask_blend_ps(mask, _mm512_setzero_ps(), p16f_1));
x = padd(x, tmp);
Packet16f x2 = pmul(x, x);
Packet16f x3 = pmul(x2, x);
// Evaluate the polynomial approximant of degree 8 in three parts, probably
// to improve instruction-level parallelism.
Packet16f y, y1, y2;
y = pmadd(p16f_cephes_log_p0, x, p16f_cephes_log_p1);
y1 = pmadd(p16f_cephes_log_p3, x, p16f_cephes_log_p4);
y2 = pmadd(p16f_cephes_log_p6, x, p16f_cephes_log_p7);
y = pmadd(y, x, p16f_cephes_log_p2);
y1 = pmadd(y1, x, p16f_cephes_log_p5);
y2 = pmadd(y2, x, p16f_cephes_log_p8);
y = pmadd(y, x3, y1);
y = pmadd(y, x3, y2);
y = pmul(y, x3);
// Add the logarithm of the exponent back to the result of the interpolation.
y1 = pmul(e, p16f_cephes_log_q1);
tmp = pmul(x2, p16f_half);
y = padd(y, y1);
x = psub(x, tmp);
y2 = pmul(e, p16f_cephes_log_q2);
x = padd(x, y);
x = padd(x, y2);
__mmask16 pos_inf_mask = _mm512_cmp_ps_mask(_x,p16f_pos_inf,_CMP_EQ_OQ);
// Filter out invalid inputs, i.e.:
// - negative arg will be NAN,
// - 0 will be -INF.
// - +INF will be +INF
return _mm512_mask_blend_ps(iszero_mask,
_mm512_mask_blend_ps(invalid_mask,
_mm512_mask_blend_ps(pos_inf_mask,x,p16f_pos_inf),
p16f_nan),
p16f_minus_inf);
return plog_float(_x);
}
#endif
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
plog<Packet8d>(const Packet8d& _x) {
return plog_double(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog)
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
plog2<Packet16f>(const Packet16f& _x) {
return plog2_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
plog2<Packet8d>(const Packet8d& _x) {
return plog2_double(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog2)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog2)
// Exponential function. Works by writing "x = m*log(2) + r" where
// "m = floor(x/log(2)+1/2)" and "r" is the remainder. The result is then
@@ -164,17 +99,17 @@ pexp<Packet16f>(const Packet16f& _x) {
_EIGEN_DECLARE_CONST_Packet16f(nln2, -0.6931471805599453f);
Packet16f r = _mm512_fmadd_ps(m, p16f_nln2, x);
Packet16f r2 = pmul(r, r);
Packet16f r3 = pmul(r2, r);
// TODO(gonnet): Split into odd/even polynomials and try to exploit
// instruction-level parallelism.
Packet16f y = p16f_cephes_exp_p0;
y = pmadd(y, r, p16f_cephes_exp_p1);
y = pmadd(y, r, p16f_cephes_exp_p2);
y = pmadd(y, r, p16f_cephes_exp_p3);
y = pmadd(y, r, p16f_cephes_exp_p4);
y = pmadd(y, r, p16f_cephes_exp_p5);
y = pmadd(y, r2, r);
y = padd(y, p16f_1);
// Evaluate the polynomial approximant, improved by instruction-level parallelism.
Packet16f y, y1, y2;
y = pmadd(p16f_cephes_exp_p0, r, p16f_cephes_exp_p1);
y1 = pmadd(p16f_cephes_exp_p3, r, p16f_cephes_exp_p4);
y2 = padd(r, p16f_1);
y = pmadd(y, r, p16f_cephes_exp_p2);
y1 = pmadd(y1, r, p16f_cephes_exp_p5);
y = pmadd(y, r3, y1);
y = pmadd(y, r2, y2);
// Build emm0 = 2^m.
Packet16i emm0 = _mm512_cvttps_epi32(padd(m, p16f_127));
@@ -253,6 +188,34 @@ pexp<Packet8d>(const Packet8d& _x) {
return pmax(pmul(x, e), _x);
}*/
F16_PACKET_FUNCTION(Packet16f, Packet16h, pexp)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexp)
template <>
EIGEN_STRONG_INLINE Packet16h pfrexp(const Packet16h& a, Packet16h& exponent) {
Packet16f fexponent;
const Packet16h out = float2half(pfrexp<Packet16f>(half2float(a), fexponent));
exponent = float2half(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet16h pldexp(const Packet16h& a, const Packet16h& exponent) {
return float2half(pldexp<Packet16f>(half2float(a), half2float(exponent)));
}
template <>
EIGEN_STRONG_INLINE Packet16bf pfrexp(const Packet16bf& a, Packet16bf& exponent) {
Packet16f fexponent;
const Packet16bf out = F32ToBf16(pfrexp<Packet16f>(Bf16ToF32(a), fexponent));
exponent = F32ToBf16(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet16bf pldexp(const Packet16bf& a, const Packet16bf& exponent) {
return F32ToBf16(pldexp<Packet16f>(Bf16ToF32(a), Bf16ToF32(exponent)));
}
// Functions for sqrt.
// The EIGEN_FAST_MATH version uses the _mm_rsqrt_ps approximation and one step
@@ -303,12 +266,16 @@ template <>
EIGEN_STRONG_INLINE Packet16f psqrt<Packet16f>(const Packet16f& x) {
return _mm512_sqrt_ps(x);
}
template <>
EIGEN_STRONG_INLINE Packet8d psqrt<Packet8d>(const Packet8d& x) {
return _mm512_sqrt_pd(x);
}
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, psqrt)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, psqrt)
// prsqrt for float.
#if defined(EIGEN_VECTORIZE_AVX512ER)
@@ -316,7 +283,6 @@ template <>
EIGEN_STRONG_INLINE Packet16f prsqrt<Packet16f>(const Packet16f& x) {
return _mm512_rsqrt28_ps(x);
}
#elif EIGEN_FAST_MATH
template <>
@@ -332,7 +298,7 @@ prsqrt<Packet16f>(const Packet16f& _x) {
__mmask16 inf_mask = _mm512_cmp_ps_mask(_x, p16f_inf, _CMP_EQ_OQ);
__mmask16 not_pos_mask = _mm512_cmp_ps_mask(_x, _mm512_setzero_ps(), _CMP_LE_OQ);
__mmask16 not_finite_pos_mask = not_pos_mask | inf_mask;
// Compute an approximate result using the rsqrt intrinsic, forcing +inf
// for denormals for consistency with AVX and SSE implementations.
Packet16f y_approx = _mm512_rsqrt14_ps(_x);
@@ -347,8 +313,7 @@ prsqrt<Packet16f>(const Packet16f& _x) {
// For other arguments, choose the output of the intrinsic. This will
// return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(0) = +inf.
return _mm512_mask_blend_ps(not_finite_pos_mask, y_newton, y_approx);
}
}
#else
template <>
@@ -356,9 +321,11 @@ EIGEN_STRONG_INLINE Packet16f prsqrt<Packet16f>(const Packet16f& x) {
_EIGEN_DECLARE_CONST_Packet16f(one, 1.0f);
return _mm512_div_ps(p16f_one, _mm512_sqrt_ps(x));
}
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, prsqrt)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, prsqrt)
// prsqrt for double.
#if EIGEN_FAST_MATH
template <>
@@ -406,17 +373,21 @@ EIGEN_STRONG_INLINE Packet8d prsqrt<Packet8d>(const Packet8d& x) {
}
#endif
#if defined(EIGEN_VECTORIZE_AVX512DQ)
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16f plog1p<Packet16f>(const Packet16f& _x) {
return generic_plog1p(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog1p)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog1p)
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16f pexpm1<Packet16f>(const Packet16f& _x) {
return generic_expm1(_x);
}
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, pexpm1)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexpm1)
#endif
@@ -439,6 +410,14 @@ ptanh<Packet16f>(const Packet16f& _x) {
return internal::generic_fast_tanh_float(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, psin)
F16_PACKET_FUNCTION(Packet16f, Packet16h, pcos)
F16_PACKET_FUNCTION(Packet16f, Packet16h, ptanh)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, psin)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pcos)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, ptanh)
} // end namespace internal
} // end namespace Eigen
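
The `pexp<Packet16f>` hunk replaces a single Horner chain with three shorter chains that can issue in parallel, then recombines them using `r2` and `r3`. The scalar sketch below checks that both orderings evaluate the same polynomial. The coefficient values are the usual Cephes `expf` constants and are an assumption here, since the diff shows only the names `p16f_cephes_exp_p0` to `p16f_cephes_exp_p5`; `madd` is a hypothetical stand-in for `pmadd`.

```cpp
#include <cmath>

namespace exp_sketch {

// Assumed Cephes expf coefficients (not visible in the diff itself).
const float p0 = 1.9875691500E-4f;
const float p1 = 1.3981999507E-3f;
const float p2 = 8.3334519073E-3f;
const float p3 = 4.1665795894E-2f;
const float p4 = 1.6666665459E-1f;
const float p5 = 5.0000001201E-1f;

inline float madd(float a, float b, float c) { return a * b + c; }

// Old form: one dependent chain of multiply-adds.
inline float exp_poly_horner(float r) {
  const float r2 = r * r;
  float y = p0;
  y = madd(y, r, p1);
  y = madd(y, r, p2);
  y = madd(y, r, p3);
  y = madd(y, r, p4);
  y = madd(y, r, p5);
  y = madd(y, r2, r);
  return y + 1.0f;
}

// New form: y, y1 and y2 are independent until the final two steps.
inline float exp_poly_split(float r) {
  const float r2 = r * r;
  const float r3 = r2 * r;
  float y = madd(p0, r, p1);
  float y1 = madd(p3, r, p4);
  const float y2 = r + 1.0f;
  y = madd(y, r, p2);    // p0*r^2 + p1*r + p2
  y1 = madd(y1, r, p5);  // p3*r^2 + p4*r + p5
  y = madd(y, r3, y1);   // full degree-5 polynomial P(r)
  return madd(y, r2, y2);  // P(r)*r^2 + r + 1
}

}  // namespace exp_sketch
```

Both functions compute `P(r)*r^2 + r + 1`. They differ only in rounding order, so results can disagree in the last bit or two.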

File diff suppressed because it is too large

@@ -14,6 +14,22 @@ namespace Eigen {
namespace internal {
template<> EIGEN_STRONG_INLINE Packet16i pcast<Packet16f, Packet16i>(const Packet16f& a) {
return _mm512_cvttps_epi32(a);
}
template<> EIGEN_STRONG_INLINE Packet16f pcast<Packet16i, Packet16f>(const Packet16i& a) {
return _mm512_cvtepi32_ps(a);
}
template<> EIGEN_STRONG_INLINE Packet16i preinterpret<Packet16i, Packet16f>(const Packet16f& a) {
return _mm512_castps_si512(a);
}
template<> EIGEN_STRONG_INLINE Packet16f preinterpret<Packet16f, Packet16i>(const Packet16i& a) {
return _mm512_castsi512_ps(a);
}
template <>
struct type_casting_traits<half, float> {
enum {
@@ -40,6 +56,32 @@ template<> EIGEN_STRONG_INLINE Packet16h pcast<Packet16f, Packet16h>(const Packe
return float2half(a);
}
template <>
struct type_casting_traits<bfloat16, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16f pcast<Packet16bf, Packet16f>(const Packet16bf& a) {
return Bf16ToF32(a);
}
template <>
struct type_casting_traits<float, bfloat16> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16bf pcast<Packet16f, Packet16bf>(const Packet16f& a) {
return F32ToBf16(a);
}
} // end namespace internal
} // end namespace Eigen

@@ -29,7 +29,7 @@ static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2d_MZERO, (P
//---------- float ----------
struct Packet2cf
{
EIGEN_STRONG_INLINE explicit Packet2cf() : v(p4f_ZERO) {}
EIGEN_STRONG_INLINE explicit Packet2cf() {}
EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {}
Packet4f v;
};
@@ -38,6 +38,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
{
typedef Packet2cf type;
typedef Packet2cf half;
typedef Packet4f as_real;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
@@ -60,7 +61,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
};
};
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; typedef Packet4f as_real; };
template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
{
@@ -149,22 +150,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packe
return pfirst<Packet2cf>(Packet2cf(b));
}
template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
{
Packet4f b1, b2;
#ifdef _BIG_ENDIAN
b1 = vec_sld(vecs[0].v, vecs[1].v, 8);
b2 = vec_sld(vecs[1].v, vecs[0].v, 8);
#else
b1 = vec_sld(vecs[1].v, vecs[0].v, 8);
b2 = vec_sld(vecs[0].v, vecs[1].v, 8);
#endif
b2 = vec_sld(b2, b2, 8);
b2 = padd<Packet4f>(b1, b2);
return Packet2cf(b2);
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
{
Packet4f b;
@@ -175,22 +160,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return pfirst<Packet2cf>(prod);
}
template<int Offset>
struct palign_impl<Offset,Packet2cf>
{
static EIGEN_STRONG_INLINE void run(Packet2cf& first, const Packet2cf& second)
{
if (Offset==1)
{
#ifdef _BIG_ENDIAN
first.v = vec_sld(first.v, second.v, 8);
#else
first.v = vec_sld(second.v, first.v, 8);
#endif
}
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
@@ -259,6 +228,11 @@ template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, con
}
#endif
template<> EIGEN_STRONG_INLINE Packet2cf psqrt<Packet2cf>(const Packet2cf& a)
{
return psqrt_complex<Packet2cf>(a);
}
//---------- double ----------
#ifdef __VSX__
struct Packet1cd
@@ -272,6 +246,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
{
typedef Packet1cd type;
typedef Packet1cd half;
typedef Packet2d as_real;
enum {
Vectorizable = 1,
AlignedOnScalar = 0,
@@ -291,7 +266,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
};
};
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; typedef Packet2d as_real; };
template<> EIGEN_STRONG_INLINE Packet1cd pload <Packet1cd>(const std::complex<double>* from) { return Packet1cd(pload<Packet2d>((const double*)from)); }
template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from) { return Packet1cd(ploadu<Packet2d>((const double*)from)); }
@@ -301,19 +276,13 @@ template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<
template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<double>& from)
{ /* here we really have to use unaligned loads :( */ return ploadu<Packet1cd>(&from); }
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, Index stride)
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, Index)
{
EIGEN_ALIGN16 std::complex<double> af[2];
af[0] = from[0*stride];
af[1] = from[1*stride];
return pload<Packet1cd>(af);
return pload<Packet1cd>(from);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, Index stride)
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, Index)
{
EIGEN_ALIGN16 std::complex<double> af[2];
pstore<std::complex<double> >(af, from);
to[0*stride] = af[0];
to[1*stride] = af[1];
pstore<std::complex<double> >(to, from);
}
template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(a.v + b.v); }
@@ -359,20 +328,9 @@ template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Pac
template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; }
template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs) { return vecs[0]; }
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<int Offset>
struct palign_impl<Offset,Packet1cd>
{
static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
{
// FIXME is it sure we never have to align a Packet1cd?
// Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16 bytes boundary...
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
@@ -439,6 +397,11 @@ template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packe
return Packet1cd(vec_and(eq, eq_swapped));
}
template<> EIGEN_STRONG_INLINE Packet1cd psqrt<Packet1cd>(const Packet1cd& a)
{
return psqrt_complex<Packet1cd>(a);
}
#endif // __VSX__
} // end namespace internal

File diff suppressed because it is too large

@@ -0,0 +1,80 @@
namespace Eigen {
namespace internal {
const static Packet16uc p16uc_SETCOMPLEX32_FIRST = { 0, 1, 2, 3,
16, 17, 18, 19,
4, 5, 6, 7,
20, 21, 22, 23};
const static Packet16uc p16uc_SETCOMPLEX32_SECOND = { 8, 9, 10, 11,
24, 25, 26, 27,
12, 13, 14, 15,
28, 29, 30, 31};
//[a,b],[ai,bi] = [a,ai] - This is equivalent to p16uc_GETREAL64
const static Packet16uc p16uc_SETCOMPLEX64_FIRST = { 0, 1, 2, 3, 4, 5, 6, 7,
16, 17, 18, 19, 20, 21, 22, 23};
//[a,b],[ai,bi] = [b,bi] - This is equivalent to p16uc_GETIMAG64
const static Packet16uc p16uc_SETCOMPLEX64_SECOND = { 8, 9, 10, 11, 12, 13, 14, 15,
24, 25, 26, 27, 28, 29, 30, 31};
// Grab two decoupled real/imaginary PacketBlocks and return two coupled (real/imaginary pairs) PacketBlocks.
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple(PacketBlock<Packet,4>& taccReal, PacketBlock<Packet,4>& taccImag, PacketBlock<Packetc,8>& tRes, PacketBlock<Packetc, 4>& acc1, PacketBlock<Packetc, 4>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_FIRST);
acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX32_FIRST);
acc1.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX32_FIRST);
acc1.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX32_FIRST);
acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_SECOND);
acc2.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX32_SECOND);
acc2.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX32_SECOND);
acc2.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX32_SECOND);
acc1.packet[0] = padd<Packetc>(tRes.packet[0], acc1.packet[0]);
acc1.packet[1] = padd<Packetc>(tRes.packet[1], acc1.packet[1]);
acc1.packet[2] = padd<Packetc>(tRes.packet[2], acc1.packet[2]);
acc1.packet[3] = padd<Packetc>(tRes.packet[3], acc1.packet[3]);
acc2.packet[0] = padd<Packetc>(tRes.packet[4], acc2.packet[0]);
acc2.packet[1] = padd<Packetc>(tRes.packet[5], acc2.packet[1]);
acc2.packet[2] = padd<Packetc>(tRes.packet[6], acc2.packet[2]);
acc2.packet[3] = padd<Packetc>(tRes.packet[7], acc2.packet[3]);
}
template<>
EIGEN_STRONG_INLINE void bcouple<Packet2d, Packet1cd>(PacketBlock<Packet2d,4>& taccReal, PacketBlock<Packet2d,4>& taccImag, PacketBlock<Packet1cd,8>& tRes, PacketBlock<Packet1cd, 4>& acc1, PacketBlock<Packet1cd, 4>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_FIRST);
acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX64_FIRST);
acc1.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX64_FIRST);
acc1.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX64_FIRST);
acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_SECOND);
acc2.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX64_SECOND);
acc2.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX64_SECOND);
acc2.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX64_SECOND);
acc1.packet[0] = padd<Packet1cd>(tRes.packet[0], acc1.packet[0]);
acc1.packet[1] = padd<Packet1cd>(tRes.packet[1], acc1.packet[1]);
acc1.packet[2] = padd<Packet1cd>(tRes.packet[2], acc1.packet[2]);
acc1.packet[3] = padd<Packet1cd>(tRes.packet[3], acc1.packet[3]);
acc2.packet[0] = padd<Packet1cd>(tRes.packet[4], acc2.packet[0]);
acc2.packet[1] = padd<Packet1cd>(tRes.packet[5], acc2.packet[1]);
acc2.packet[2] = padd<Packet1cd>(tRes.packet[6], acc2.packet[2]);
acc2.packet[3] = padd<Packet1cd>(tRes.packet[7], acc2.packet[3]);
}
// This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled.
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE Packet ploadRhs(const Scalar *rhs)
{
return *((Packet *)rhs);
}
} // end namespace internal
} // end namespace Eigen
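
The `bcouple` permutes above can be followed with a portable scalar model. The sketch below is an illustration only, not Eigen code: `perm_bytes`, `couple`, `kSetComplex32First` and `kSetComplex32Second` are hypothetical names, and `perm_bytes` assumes the mask indexes the 32-byte concatenation of the two inputs in memory order, which is how the masks above read. It shows that the FIRST mask yields (re0, im0, re1, im1) and the SECOND mask yields (re2, im2, re3, im3).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<std::uint8_t, 16>;
using Float4 = std::array<float, 4>;

// Hypothetical scalar stand-in for vec_perm: every mask byte selects one byte
// from the 32-byte concatenation of a (indices 0..15) and b (indices 16..31).
static Bytes16 perm_bytes(const Bytes16& a, const Bytes16& b, const Bytes16& mask) {
  Bytes16 out{};
  for (int i = 0; i < 16; ++i) {
    const std::uint8_t idx = mask[i] & 31;
    out[i] = idx < 16 ? a[idx] : b[idx - 16];
  }
  return out;
}

// Same byte patterns as p16uc_SETCOMPLEX32_FIRST / _SECOND in the diff above.
static const Bytes16 kSetComplex32First = {0, 1, 2, 3, 16, 17, 18, 19,
                                           4, 5, 6, 7, 20, 21, 22, 23};
static const Bytes16 kSetComplex32Second = {8, 9, 10, 11, 24, 25, 26, 27,
                                            12, 13, 14, 15, 28, 29, 30, 31};

// Interleave four real parts and four imaginary parts into two complex pairs.
static Float4 couple(const Float4& re, const Float4& im, const Bytes16& mask) {
  Bytes16 a{}, b{};
  std::memcpy(a.data(), re.data(), 16);
  std::memcpy(b.data(), im.data(), 16);
  const Bytes16 r = perm_bytes(a, b, mask);
  Float4 out{};
  std::memcpy(out.data(), r.data(), 16);
  return out;
}

// Small self-check: the two masks split four complex values into two packets.
static void couple_example() {
  const Float4 re = {1.0f, 2.0f, 3.0f, 4.0f};
  const Float4 im = {10.0f, 20.0f, 30.0f, 40.0f};
  assert((couple(re, im, kSetComplex32First) == Float4{1.0f, 10.0f, 2.0f, 20.0f}));
  assert((couple(re, im, kSetComplex32Second) == Float4{3.0f, 30.0f, 4.0f, 40.0f}));
}
```

In the real kernel the coupled packets are then added to the existing result (`tRes`) with `padd`; that accumulation step is left out of this model.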


@@ -0,0 +1,791 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2020 Everton Constantino (everton.constantino@ibm.com)
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H
#define EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H
#pragma GCC target("cpu=power10")
namespace Eigen {
namespace internal {
const static Packet16uc MMA_p16uc_SETCOMPLEX32_FIRST = { 0, 1, 2, 3,
16, 17, 18, 19,
4, 5, 6, 7,
20, 21, 22, 23};
const static Packet16uc MMA_p16uc_SETCOMPLEX32_SECOND = { 8, 9, 10, 11,
24, 25, 26, 27,
12, 13, 14, 15,
28, 29, 30, 31};
//[a,b],[ai,bi] = [a,ai] - This is equivalent to p16uc_GETREAL64
const static Packet16uc MMA_p16uc_SETCOMPLEX64_FIRST = { 0, 1, 2, 3, 4, 5, 6, 7,
16, 17, 18, 19, 20, 21, 22, 23};
//[a,b],[ai,bi] = [b,bi] - This is equivalent to p16uc_GETIMAG64
const static Packet16uc MMA_p16uc_SETCOMPLEX64_SECOND = { 8, 9, 10, 11, 12, 13, 14, 15,
24, 25, 26, 27, 28, 29, 30, 31};
// Take two decoupled real/imaginary PacketBlocks and return two coupled (real/imaginary pairs) PacketBlocks.
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcoupleMMA(PacketBlock<Packet,4>& taccReal, PacketBlock<Packet,4>& taccImag, PacketBlock<Packetc,8>& tRes, PacketBlock<Packetc, 4>& acc1, PacketBlock<Packetc, 4>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], MMA_p16uc_SETCOMPLEX32_FIRST);
acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], MMA_p16uc_SETCOMPLEX32_FIRST);
acc1.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], MMA_p16uc_SETCOMPLEX32_FIRST);
acc1.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], MMA_p16uc_SETCOMPLEX32_FIRST);
acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], MMA_p16uc_SETCOMPLEX32_SECOND);
acc2.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], MMA_p16uc_SETCOMPLEX32_SECOND);
acc2.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], MMA_p16uc_SETCOMPLEX32_SECOND);
acc2.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], MMA_p16uc_SETCOMPLEX32_SECOND);
acc1.packet[0] = padd<Packetc>(tRes.packet[0], acc1.packet[0]);
acc1.packet[1] = padd<Packetc>(tRes.packet[1], acc1.packet[1]);
acc1.packet[2] = padd<Packetc>(tRes.packet[2], acc1.packet[2]);
acc1.packet[3] = padd<Packetc>(tRes.packet[3], acc1.packet[3]);
acc2.packet[0] = padd<Packetc>(tRes.packet[4], acc2.packet[0]);
acc2.packet[1] = padd<Packetc>(tRes.packet[5], acc2.packet[1]);
acc2.packet[2] = padd<Packetc>(tRes.packet[6], acc2.packet[2]);
acc2.packet[3] = padd<Packetc>(tRes.packet[7], acc2.packet[3]);
}
template<>
EIGEN_STRONG_INLINE void bcoupleMMA<Packet2d, Packet1cd>(PacketBlock<Packet2d,4>& taccReal, PacketBlock<Packet2d,4>& taccImag, PacketBlock<Packet1cd,8>& tRes, PacketBlock<Packet1cd, 4>& acc1, PacketBlock<Packet1cd, 4>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], MMA_p16uc_SETCOMPLEX64_FIRST);
acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], MMA_p16uc_SETCOMPLEX64_FIRST);
acc1.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], MMA_p16uc_SETCOMPLEX64_FIRST);
acc1.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], MMA_p16uc_SETCOMPLEX64_FIRST);
acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], MMA_p16uc_SETCOMPLEX64_SECOND);
acc2.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], MMA_p16uc_SETCOMPLEX64_SECOND);
acc2.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], MMA_p16uc_SETCOMPLEX64_SECOND);
acc2.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], MMA_p16uc_SETCOMPLEX64_SECOND);
acc1.packet[0] = padd<Packet1cd>(tRes.packet[0], acc1.packet[0]);
acc1.packet[1] = padd<Packet1cd>(tRes.packet[1], acc1.packet[1]);
acc1.packet[2] = padd<Packet1cd>(tRes.packet[2], acc1.packet[2]);
acc1.packet[3] = padd<Packet1cd>(tRes.packet[3], acc1.packet[3]);
acc2.packet[0] = padd<Packet1cd>(tRes.packet[4], acc2.packet[0]);
acc2.packet[1] = padd<Packet1cd>(tRes.packet[5], acc2.packet[1]);
acc2.packet[2] = padd<Packet1cd>(tRes.packet[6], acc2.packet[2]);
acc2.packet[3] = padd<Packet1cd>(tRes.packet[7], acc2.packet[3]);
}
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE Packet ploadLhsMMA(const Scalar *lhs)
{
return *((Packet *)lhs);
}
template<typename Packet>
EIGEN_STRONG_INLINE PacketBlock<Packet,2> pmul(const PacketBlock<Packet,2>& a, const Packet& b)
{
PacketBlock<Packet,2> pb;
pb.packet[0] = a.packet[0]*b;
pb.packet[1] = a.packet[1]*b;
return pb;
}
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE void bsetzeroMMA(__vector_quad *acc)
{
__builtin_mma_xxsetaccz(acc);
}
template<typename DataMapper, typename Index, typename Packet>
EIGEN_STRONG_INLINE void storeAccumulator(Index i, Index j, const DataMapper& data, const Packet& alpha, __vector_quad *acc)
{
PacketBlock<Packet, 4> result;
__builtin_mma_disassemble_acc(&result.packet, acc);
result.packet[0] = pmadd<Packet>(alpha, result.packet[0], data.template loadPacket<Packet>(i, j + 0));
result.packet[1] = pmadd<Packet>(alpha, result.packet[1], data.template loadPacket<Packet>(i, j + 1));
result.packet[2] = pmadd<Packet>(alpha, result.packet[2], data.template loadPacket<Packet>(i, j + 2));
result.packet[3] = pmadd<Packet>(alpha, result.packet[3], data.template loadPacket<Packet>(i, j + 3));
data.template storePacketBlock<Packet, 4>(i, j, result);
}
template<typename DataMapper, typename Index, typename Packet, typename Packetc, int N>
EIGEN_STRONG_INLINE void storeComplexAccumulator(Index i, Index j, const DataMapper& data, const Packet& alphaReal, const Packet& alphaImag, __vector_quad *accReal, __vector_quad *accImag, const int accColsC)
{
PacketBlock<Packet, 4> resultReal, resultImag;
__builtin_mma_disassemble_acc(&resultReal.packet, accReal);
__builtin_mma_disassemble_acc(&resultImag.packet, accImag);
PacketBlock<Packet,4> taccReal, taccImag;
taccReal.packet[0] = pmul<Packet>(resultReal.packet[0], alphaReal);
taccReal.packet[1] = pmul<Packet>(resultReal.packet[1], alphaReal);
taccReal.packet[2] = pmul<Packet>(resultReal.packet[2], alphaReal);
taccReal.packet[3] = pmul<Packet>(resultReal.packet[3], alphaReal);
taccImag.packet[0] = pmul<Packet>(resultImag.packet[0], alphaReal);
taccImag.packet[1] = pmul<Packet>(resultImag.packet[1], alphaReal);
taccImag.packet[2] = pmul<Packet>(resultImag.packet[2], alphaReal);
taccImag.packet[3] = pmul<Packet>(resultImag.packet[3], alphaReal);
taccReal.packet[0] = psub<Packet>(taccReal.packet[0], pmul<Packet>(resultImag.packet[0], alphaImag));
taccReal.packet[1] = psub<Packet>(taccReal.packet[1], pmul<Packet>(resultImag.packet[1], alphaImag));
taccReal.packet[2] = psub<Packet>(taccReal.packet[2], pmul<Packet>(resultImag.packet[2], alphaImag));
taccReal.packet[3] = psub<Packet>(taccReal.packet[3], pmul<Packet>(resultImag.packet[3], alphaImag));
taccImag.packet[0] = pmadd<Packet>(resultReal.packet[0], alphaImag, taccImag.packet[0]);
taccImag.packet[1] = pmadd<Packet>(resultReal.packet[1], alphaImag, taccImag.packet[1]);
taccImag.packet[2] = pmadd<Packet>(resultReal.packet[2], alphaImag, taccImag.packet[2]);
taccImag.packet[3] = pmadd<Packet>(resultReal.packet[3], alphaImag, taccImag.packet[3]);
PacketBlock<Packetc, 8> tRes;
tRes.packet[0] = data.template loadPacket<Packetc>(i + N*accColsC, j + 0);
tRes.packet[1] = data.template loadPacket<Packetc>(i + N*accColsC, j + 1);
tRes.packet[2] = data.template loadPacket<Packetc>(i + N*accColsC, j + 2);
tRes.packet[3] = data.template loadPacket<Packetc>(i + N*accColsC, j + 3);
tRes.packet[4] = data.template loadPacket<Packetc>(i + (N+1)*accColsC, j + 0);
tRes.packet[5] = data.template loadPacket<Packetc>(i + (N+1)*accColsC, j + 1);
tRes.packet[6] = data.template loadPacket<Packetc>(i + (N+1)*accColsC, j + 2);
tRes.packet[7] = data.template loadPacket<Packetc>(i + (N+1)*accColsC, j + 3);
PacketBlock<Packetc, 4> acc1, acc2;
bcoupleMMA<Packet, Packetc>(taccReal, taccImag, tRes, acc1, acc2);
data.template storePacketBlock<Packetc, 4>(i + N*accColsC, j, acc1);
data.template storePacketBlock<Packetc, 4>(i + (N+1)*accColsC, j, acc2);
}
// Defaults to float32. Since Eigen still supports C++03, we can't use default template arguments.
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad *acc, const RhsPacket& a, const LhsPacket& b)
{
if(NegativeAccumulate)
{
__builtin_mma_xvf32gernp(acc, (__vector unsigned char)a, (__vector unsigned char)b);
} else {
__builtin_mma_xvf32gerpp(acc, (__vector unsigned char)a, (__vector unsigned char)b);
}
}
template<>
EIGEN_STRONG_INLINE void pgerMMA<Packet2d, PacketBlock<Packet2d, 2>, false>(__vector_quad *acc, const PacketBlock<Packet2d,2>& a, const Packet2d& b)
{
__vector_pair *a0 = (__vector_pair *)(&a.packet[0]);
__builtin_mma_xvf64gerpp(acc, *a0, (__vector unsigned char)b);
}
template<>
EIGEN_STRONG_INLINE void pgerMMA<Packet2d, PacketBlock<Packet2d, 2>, true>(__vector_quad *acc, const PacketBlock<Packet2d, 2>& a, const Packet2d& b)
{
__vector_pair *a0 = (__vector_pair *)(&a.packet[0]);
__builtin_mma_xvf64gernp(acc, *a0, (__vector unsigned char)b);
}
template<>
EIGEN_STRONG_INLINE void pgerMMA<Packet2d, __vector_pair, false>(__vector_quad *acc, const __vector_pair& a, const Packet2d& b)
{
__builtin_mma_xvf64gerpp(acc, a, (__vector unsigned char)b);
}
template<>
EIGEN_STRONG_INLINE void pgerMMA<Packet2d, __vector_pair, true>(__vector_quad *acc, const __vector_pair& a, const Packet2d& b)
{
__builtin_mma_xvf64gernp(acc, a, (__vector unsigned char)b);
}
// This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled.
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE void ploadRhsMMA(const Scalar *rhs, Packet &rhsV)
{
rhsV = *((Packet *)rhs);
}
template<>
EIGEN_STRONG_INLINE void ploadRhsMMA<double, PacketBlock<Packet2d, 2> >(const double *rhs, PacketBlock<Packet2d, 2> &rhsV)
{
rhsV.packet[0] = *((Packet2d *)rhs );
rhsV.packet[1] = *(((Packet2d *)rhs) + 1);
}
template<>
EIGEN_STRONG_INLINE void ploadRhsMMA<double, __vector_pair>(const double *rhs, __vector_pair &rhsV)
{
__builtin_mma_assemble_pair(&rhsV, (__vector unsigned char)(*(((Packet2d *)rhs) + 1)), (__vector unsigned char)(*((Packet2d *)rhs)));
}
template<typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accRows>
EIGEN_STRONG_INLINE void gemm_extra_col(
const DataMapper& res,
const Scalar *lhs_base,
const Scalar *rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index row,
Index col,
Index remaining_rows,
Index remaining_cols,
const Packet& pAlpha);
template<typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accRows>
EIGEN_STRONG_INLINE void gemm_extra_row(
const DataMapper& res,
const Scalar *lhs_base,
const Scalar *rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index row,
Index col,
Index cols,
Index remaining_rows,
const Packet& pAlpha,
const Packet& pMask);
template<typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accCols>
EIGEN_STRONG_INLINE void gemm_unrolled_col(
const DataMapper& res,
const Scalar *lhs_base,
const Scalar *rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index& row,
Index rows,
Index col,
Index remaining_cols,
const Packet& pAlpha);
template<typename Packet>
EIGEN_STRONG_INLINE Packet bmask(const int remaining_rows);
#define MICRO_MMA_DST \
__vector_quad *accZero0, __vector_quad *accZero1, __vector_quad *accZero2, \
__vector_quad *accZero3, __vector_quad *accZero4, __vector_quad *accZero5, \
__vector_quad *accZero6, __vector_quad *accZero7
#define MICRO_MMA_SRC \
const Scalar **lhs_ptr0, const Scalar **lhs_ptr1, const Scalar **lhs_ptr2, \
const Scalar **lhs_ptr3, const Scalar **lhs_ptr4, const Scalar **lhs_ptr5, \
const Scalar **lhs_ptr6, const Scalar **lhs_ptr7
#define MICRO_MMA_ONE \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_MMA<unroll_factor, Scalar, Packet, RhsPacket, accRows, accCols>(\
&lhs_ptr0, &lhs_ptr1, &lhs_ptr2, &lhs_ptr3, &lhs_ptr4, &lhs_ptr5, &lhs_ptr6, &lhs_ptr7, \
rhs_ptr, \
&accZero0, &accZero1, &accZero2, &accZero3, &accZero4, &accZero5, &accZero6, &accZero7); \
} else { \
MICRO_MMA<unroll_factor, Scalar, Packet, __vector_pair, accRows, accCols>(\
&lhs_ptr0, &lhs_ptr1, &lhs_ptr2, &lhs_ptr3, &lhs_ptr4, &lhs_ptr5, &lhs_ptr6, &lhs_ptr7, \
rhs_ptr, \
&accZero0, &accZero1, &accZero2, &accZero3, &accZero4, &accZero5, &accZero6, &accZero7); \
}
#define MICRO_MMA_WORK_ONE(iter) \
if (N > iter) { \
Packet lhsV = ploadLhsMMA<Scalar, Packet>(*lhs_ptr##iter); \
pgerMMA<Packet, RhsPacket, false>(accZero##iter, rhsV, lhsV); \
*lhs_ptr##iter += accCols; \
} else { \
EIGEN_UNUSED_VARIABLE(accZero##iter); \
EIGEN_UNUSED_VARIABLE(lhs_ptr##iter); \
}
#define MICRO_MMA_UNROLL(func) \
func(0) func(1) func(2) func(3) func(4) func(5) func(6) func(7)
#define MICRO_MMA_WORK MICRO_MMA_UNROLL(MICRO_MMA_WORK_ONE)
#define MICRO_MMA_DST_PTR_ONE(iter) \
if (unroll_factor > iter){ \
bsetzeroMMA<Scalar, Packet>(&accZero##iter); \
} else { \
EIGEN_UNUSED_VARIABLE(accZero##iter); \
}
#define MICRO_MMA_DST_PTR MICRO_MMA_UNROLL(MICRO_MMA_DST_PTR_ONE)
#define MICRO_MMA_SRC_PTR_ONE(iter) \
if (unroll_factor > iter) { \
lhs_ptr##iter = lhs_base + ( (row/accCols) + iter )*strideA*accCols + accCols*offsetA; \
} else { \
EIGEN_UNUSED_VARIABLE(lhs_ptr##iter); \
}
#define MICRO_MMA_SRC_PTR MICRO_MMA_UNROLL(MICRO_MMA_SRC_PTR_ONE)
#define MICRO_MMA_PREFETCH_ONE(iter) \
if (unroll_factor > iter){ \
prefetch(lhs_ptr##iter); \
}
#define MICRO_MMA_PREFETCH MICRO_MMA_UNROLL(MICRO_MMA_PREFETCH_ONE)
#define MICRO_MMA_STORE_ONE(iter) \
if (unroll_factor > iter){ \
storeAccumulator<DataMapper, Index, Packet>(row + iter*accCols, col, res, pAlpha, &accZero##iter); \
}
#define MICRO_MMA_STORE MICRO_MMA_UNROLL(MICRO_MMA_STORE_ONE)
// PEEL_MMA loop factor.
#define PEEL_MMA 10
template<int N, typename Scalar, typename Packet, typename RhsPacket, const Index accRows, const Index accCols>
EIGEN_STRONG_INLINE void MICRO_MMA(
MICRO_MMA_SRC,
const Scalar* &rhs_ptr,
MICRO_MMA_DST)
{
RhsPacket rhsV;
ploadRhsMMA<Scalar, RhsPacket>(rhs_ptr, rhsV);
MICRO_MMA_WORK
rhs_ptr += accRows;
}
template<int unroll_factor, typename Scalar, typename Packet, typename RhsPacket, typename DataMapper, typename Index, const Index accRows, const Index accCols>
EIGEN_STRONG_INLINE void gemm_unrolled_MMA_iteration(
const DataMapper& res,
const Scalar *lhs_base,
const Scalar *rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index& row,
Index col,
const Packet& pAlpha)
{
const Scalar *rhs_ptr = rhs_base;
const Scalar *lhs_ptr0, *lhs_ptr1, *lhs_ptr2, *lhs_ptr3, *lhs_ptr4, *lhs_ptr5, *lhs_ptr6, *lhs_ptr7;
__vector_quad accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7;
asm("#unrolled MMA start");
MICRO_MMA_SRC_PTR
MICRO_MMA_DST_PTR
Index k = 0;
for(; k + PEEL_MMA <= depth; k+= PEEL_MMA)
{
prefetch(rhs_ptr);
MICRO_MMA_PREFETCH
for (int l = 0; l < PEEL_MMA; l++) {
MICRO_MMA_ONE
}
}
for(; k < depth; k++)
{
MICRO_MMA_ONE
}
MICRO_MMA_STORE
row += unroll_factor*accCols;
asm("#unrolled MMA end");
}
template<typename Scalar, typename Index, typename Packet, typename RhsPacket, typename DataMapper, const Index accRows, const Index accCols>
void gemmMMA(const DataMapper& res, const Scalar* blockA, const Scalar* blockB, Index rows, Index depth, Index cols, Scalar alpha, Index strideA, Index strideB, Index offsetA, Index offsetB)
{
const Index remaining_rows = rows % accCols;
const Index remaining_cols = cols % accRows;
if( strideA == -1 ) strideA = depth;
if( strideB == -1 ) strideB = depth;
const Packet pAlpha = pset1<Packet>(alpha);
const Packet pMask = bmask<Packet>((const int)(remaining_rows));
Index col = 0;
for(; col + accRows <= cols; col += accRows)
{
const Scalar *rhs_base = blockB + col*strideB + accRows*offsetB;
const Scalar *lhs_base = blockA;
Index row = 0;
#define MAX_MMA_UNROLL 7
while(row + MAX_MMA_UNROLL*accCols <= rows){
gemm_unrolled_MMA_iteration<MAX_MMA_UNROLL, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
}
switch( (rows-row)/accCols ){
#if MAX_MMA_UNROLL > 7
case 7:
gemm_unrolled_MMA_iteration<7, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 6
case 6:
gemm_unrolled_MMA_iteration<6, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 5
case 5:
gemm_unrolled_MMA_iteration<5, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 4
case 4:
gemm_unrolled_MMA_iteration<4, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 3
case 3:
gemm_unrolled_MMA_iteration<3, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 2
case 2:
gemm_unrolled_MMA_iteration<2, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 1
case 1:
gemm_unrolled_MMA_iteration<1, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
default:
break;
}
#undef MAX_MMA_UNROLL
if(remaining_rows > 0)
{
gemm_extra_row<Scalar, Packet, DataMapper, Index, accRows>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, cols, remaining_rows, pAlpha, pMask);
}
}
if(remaining_cols > 0)
{
const Scalar *rhs_base = blockB + col*strideB + remaining_cols*offsetB;
const Scalar *lhs_base = blockA;
for(; col < cols; col++)
{
Index row = 0;
gemm_unrolled_col<Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, rows, col, remaining_cols, pAlpha);
if (remaining_rows > 0)
{
gemm_extra_col<Scalar, Packet, DataMapper, Index, accRows>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_rows, remaining_cols, pAlpha);
}
rhs_base++;
}
}
}
template<typename LhsScalar, typename RhsScalar, typename Scalarc, typename Scalar, typename Index, typename Packet, typename Packetc, typename RhsPacket, typename DataMapper, const int accRows, const int accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
void gemm_complexMMA(const DataMapper& res, const LhsScalar* blockAc, const RhsScalar* blockBc,
Index rows, Index depth, Index cols, Scalarc alpha, Index strideA, Index strideB, Index offsetA, Index offsetB)
{
const int remaining_rows = rows % accCols;
const int remaining_cols = cols % accRows;
const int accColsC = accCols / 2;
int advanceCols = 2;
int advanceRows = 2;
if(LhsIsReal) advanceRows = 1;
if(RhsIsReal) advanceCols = 1;
if( strideA == -1 ) strideA = depth;
if( strideB == -1 ) strideB = depth;
const Packet pAlphaReal = pset1<Packet>(alpha.real());
const Packet pAlphaImag = pset1<Packet>(alpha.imag());
const Scalar *blockA = (Scalar *) blockAc;
const Scalar *blockB = (Scalar *) blockBc;
Packet conj = pset1<Packet>((Scalar)-1.0f);
Index col = 0;
for(; col + accRows <= cols; col += accRows)
{
const Scalar *rhs_base = blockB + ( (advanceCols*col)/accRows )*strideB*accRows;
const Scalar *lhs_base = blockA;
Index row = 0;
for(; row + accCols <= rows; row += accCols)
{
const Scalar *rhs_ptr = rhs_base;
const Scalar *rhs_ptr_imag = rhs_ptr + accRows*strideB;
const Scalar *lhs_ptr = lhs_base + ((advanceRows*row)/accCols)*strideA*accCols;
const Scalar *lhs_ptr_imag = lhs_ptr + accCols*strideA;
__vector_quad accReal, accImag;
__builtin_mma_xxsetaccz(&accReal);
__builtin_mma_xxsetaccz(&accImag);
lhs_ptr += accCols*offsetA;
if(!LhsIsReal)
lhs_ptr_imag += accCols*offsetA;
rhs_ptr += accRows*offsetB;
if(!RhsIsReal)
rhs_ptr_imag += accRows*offsetB;
for(Index k = 0; k < depth; k++)
{
Packet lhsV = ploadLhsMMA<Scalar, Packet>(lhs_ptr);
RhsPacket rhsV = ploadRhs<Scalar, RhsPacket>(rhs_ptr);
Packet lhsVi = ploadLhsMMA<Scalar, Packet>(lhs_ptr_imag);
RhsPacket rhsVi = ploadRhs<Scalar, RhsPacket>(rhs_ptr_imag);
if(ConjugateLhs && !LhsIsReal) lhsVi = pmul<Packet>(lhsVi, conj);
if(ConjugateRhs && !RhsIsReal) rhsVi = pmul<Packet>(rhsVi, conj);
if(LhsIsReal)
{
pgerMMA<Packet, RhsPacket, false>(&accReal, rhsV, lhsV);
pgerMMA<Packet, RhsPacket, false>(&accImag, rhsVi, lhsV);
} else if(RhsIsReal) {
pgerMMA<Packet, RhsPacket, false>(&accReal, rhsV, lhsV);
pgerMMA<Packet, RhsPacket, false>(&accImag, rhsV, lhsVi);
} else {
pgerMMA<Packet, RhsPacket, false>(&accReal, rhsV, lhsV);
pgerMMA<Packet, RhsPacket, true>(&accReal, rhsVi, lhsVi);
pgerMMA<Packet, RhsPacket, false>(&accImag, rhsVi, lhsV);
pgerMMA<Packet, RhsPacket, false>(&accImag, rhsV, lhsVi);
}
lhs_ptr += accCols;
rhs_ptr += accRows;
if(!LhsIsReal)
lhs_ptr_imag += accCols;
if(!RhsIsReal)
rhs_ptr_imag += accRows;
}
storeComplexAccumulator<DataMapper, Index, Packet, Packetc, 0>(row, col, res, pAlphaReal, pAlphaImag, &accReal, &accImag, accColsC);
}
if(remaining_rows > 0)
{
const Scalar *rhs_ptr = rhs_base;
const Scalar *rhs_ptr_imag = rhs_ptr + accRows*strideB;
const Scalar *lhs_ptr = lhs_base + ((advanceRows*row)/accCols)*strideA*accCols;
const Scalar *lhs_ptr_imag = lhs_ptr + remaining_rows*strideA;
lhs_ptr += remaining_rows*offsetA;
if(!LhsIsReal)
lhs_ptr_imag += remaining_rows*offsetA;
rhs_ptr += accRows*offsetB;
if(!RhsIsReal)
rhs_ptr_imag += accRows*offsetB;
for(Index k = 0; k < depth; k++)
{
for(Index arow = 0; arow < remaining_rows; arow++)
{
Scalar lhs_real = lhs_ptr[arow];
Scalar lhs_imag;
if(!LhsIsReal) lhs_imag = lhs_ptr_imag[arow];
Scalarc lhsc;
lhsc.real(lhs_real);
if(!LhsIsReal)
{
if(ConjugateLhs)
lhsc.imag(-lhs_imag);
else
lhsc.imag(lhs_imag);
} else {
//Lazy approach for now
lhsc.imag((Scalar)0);
}
for(int acol = 0; acol < accRows; acol++ )
{
Scalar rhs_real = rhs_ptr[acol];
Scalar rhs_imag;
if(!RhsIsReal) rhs_imag = rhs_ptr_imag[acol];
Scalarc rhsc;
rhsc.real(rhs_real);
if(!RhsIsReal)
{
if(ConjugateRhs)
rhsc.imag(-rhs_imag);
else
rhsc.imag(rhs_imag);
} else {
//Lazy approach for now
rhsc.imag((Scalar)0);
}
res(row + arow, col + acol) += alpha*lhsc*rhsc;
}
}
rhs_ptr += accRows;
lhs_ptr += remaining_rows;
if(!LhsIsReal)
lhs_ptr_imag += remaining_rows;
if(!RhsIsReal)
rhs_ptr_imag += accRows;
}
}
}
if(remaining_cols > 0)
{
const Scalar *rhs_base = blockB + ( (advanceCols*col)/accRows )*strideB*accRows;
const Scalar *lhs_base = blockA;
Index row = 0;
for(; row + accCols <= rows; row += accCols)
{
const Scalar *rhs_ptr = rhs_base;
const Scalar *rhs_ptr_imag = rhs_ptr + remaining_cols*strideB;
const Scalar *lhs_ptr = lhs_base + ((advanceRows*row)/accCols)*strideA*accCols;
const Scalar *lhs_ptr_imag = lhs_ptr + accCols*strideA;
lhs_ptr += accCols*offsetA;
if(!LhsIsReal)
lhs_ptr_imag += accCols*offsetA;
rhs_ptr += remaining_cols*offsetB;
if(!RhsIsReal)
rhs_ptr_imag += remaining_cols*offsetB;
Scalarc scalarAcc[4][4];
for(Index arow = 0; arow < 4; arow++ )
{
for(Index acol = 0; acol < 4; acol++ )
{
scalarAcc[arow][acol].real((Scalar)0.0f);
scalarAcc[arow][acol].imag((Scalar)0.0f);
}
}
for(Index k = 0; k < depth; k++)
{
for(Index arow = 0; arow < accCols; arow++)
{
Scalar lhs_real = lhs_ptr[arow];
Scalar lhs_imag;
if(!LhsIsReal)
{
lhs_imag = lhs_ptr_imag[arow];
if(ConjugateLhs)
lhs_imag *= -1;
} else {
lhs_imag = (Scalar)0;
}
for(int acol = 0; acol < remaining_cols; acol++ )
{
Scalar rhs_real = rhs_ptr[acol];
Scalar rhs_imag;
if(!RhsIsReal)
{
rhs_imag = rhs_ptr_imag[acol];
if(ConjugateRhs)
rhs_imag *= -1;
} else {
rhs_imag = (Scalar)0;
}
scalarAcc[arow][acol].real(scalarAcc[arow][acol].real() + lhs_real*rhs_real - lhs_imag*rhs_imag);
scalarAcc[arow][acol].imag(scalarAcc[arow][acol].imag() + lhs_imag*rhs_real + lhs_real*rhs_imag);
}
}
rhs_ptr += remaining_cols;
lhs_ptr += accCols;
if(!RhsIsReal)
rhs_ptr_imag += remaining_cols;
if(!LhsIsReal)
lhs_ptr_imag += accCols;
}
for(int arow = 0; arow < accCols; arow++ )
{
for(int acol = 0; acol < remaining_cols; acol++ )
{
Scalar accR = scalarAcc[arow][acol].real();
Scalar accI = scalarAcc[arow][acol].imag();
Scalar aR = alpha.real();
Scalar aI = alpha.imag();
Scalar resR = res(row + arow, col + acol).real();
Scalar resI = res(row + arow, col + acol).imag();
res(row + arow, col + acol).real(resR + accR*aR - accI*aI);
res(row + arow, col + acol).imag(resI + accR*aI + accI*aR);
}
}
}
if(remaining_rows > 0)
{
const Scalar *rhs_ptr = rhs_base;
const Scalar *rhs_ptr_imag = rhs_ptr + remaining_cols*strideB;
const Scalar *lhs_ptr = lhs_base + ((advanceRows*row)/accCols)*strideA*accCols;
const Scalar *lhs_ptr_imag = lhs_ptr + remaining_rows*strideA;
lhs_ptr += remaining_rows*offsetA;
if(!LhsIsReal)
lhs_ptr_imag += remaining_rows*offsetA;
rhs_ptr += remaining_cols*offsetB;
if(!RhsIsReal)
rhs_ptr_imag += remaining_cols*offsetB;
for(Index k = 0; k < depth; k++)
{
for(Index arow = 0; arow < remaining_rows; arow++)
{
Scalar lhs_real = lhs_ptr[arow];
Scalar lhs_imag;
if(!LhsIsReal) lhs_imag = lhs_ptr_imag[arow];
Scalarc lhsc;
lhsc.real(lhs_real);
if(!LhsIsReal)
{
if(ConjugateLhs)
lhsc.imag(-lhs_imag);
else
lhsc.imag(lhs_imag);
} else {
lhsc.imag((Scalar)0);
}
for(Index acol = 0; acol < remaining_cols; acol++ )
{
Scalar rhs_real = rhs_ptr[acol];
Scalar rhs_imag;
if(!RhsIsReal) rhs_imag = rhs_ptr_imag[acol];
Scalarc rhsc;
rhsc.real(rhs_real);
if(!RhsIsReal)
{
if(ConjugateRhs)
rhsc.imag(-rhs_imag);
else
rhsc.imag(rhs_imag);
} else {
rhsc.imag((Scalar)0);
}
res(row + arow, col + acol) += alpha*lhsc*rhsc;
}
}
rhs_ptr += remaining_cols;
lhs_ptr += remaining_rows;
if(!LhsIsReal)
lhs_ptr_imag += remaining_rows;
if(!RhsIsReal)
rhs_ptr_imag += remaining_cols;
}
}
}
}
#pragma GCC reset_options
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H
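
The complex path in `gemm_complexMMA` and `storeComplexAccumulator` above can be modelled without MMA hardware. The following is a minimal sketch under stated assumptions, not the Eigen implementation: `Acc4x4`, `ger`, `ComplexAcc`, `complex_step` and `scaled` are hypothetical names, the accumulator is modelled as a plain 4x4 float array indexed as `acc[rhs element][lhs element]`, and `ger<true>` stands in for the negative-product, positive-accumulate form (`acc -= a * b`) used for the `i*i` term.

```cpp
#include <array>
#include <cassert>
#include <complex>

using Float4 = std::array<float, 4>;
using Acc4x4 = std::array<Float4, 4>;

// Hypothetical scalar model of a rank-1 (outer product) update:
//   NegativeAccumulate == false: acc[j][i] += rhs[j] * lhs[i]
//   NegativeAccumulate == true:  acc[j][i] -= rhs[j] * lhs[i]
template <bool NegativeAccumulate>
void ger(Acc4x4& acc, const Float4& rhs, const Float4& lhs) {
  for (int j = 0; j < 4; ++j) {
    for (int i = 0; i < 4; ++i) {
      const float prod = rhs[j] * lhs[i];
      acc[j][i] += NegativeAccumulate ? -prod : prod;
    }
  }
}

// Separate real and imaginary accumulators, both zero-initialized.
struct ComplexAcc {
  Acc4x4 re{};
  Acc4x4 im{};
};

// One depth step for complex lhs and complex rhs (no conjugation):
//   re += rhsRe*lhsRe - rhsIm*lhsIm
//   im += rhsIm*lhsRe + rhsRe*lhsIm
void complex_step(ComplexAcc& acc, const Float4& lhsRe, const Float4& lhsIm,
                  const Float4& rhsRe, const Float4& rhsIm) {
  ger<false>(acc.re, rhsRe, lhsRe);
  ger<true>(acc.re, rhsIm, lhsIm);
  ger<false>(acc.im, rhsIm, lhsRe);
  ger<false>(acc.im, rhsRe, lhsIm);
}

// Scale one accumulated entry by alpha, using the same four products as the
// store step: (re*aR - im*aI) + i*(im*aR + re*aI).
std::complex<float> scaled(const ComplexAcc& acc, int j, int i,
                           std::complex<float> alpha) {
  const float re = acc.re[j][i];
  const float im = acc.im[j][i];
  return std::complex<float>(re * alpha.real() - im * alpha.imag(),
                             im * alpha.real() + re * alpha.imag());
}
```

For example, with lhs element 0 set to 1+2i and rhs element 0 set to 3+4i, one `complex_step` leaves -5+10i in entry `[0][0]`, and scaling by alpha = i gives -10-5i. The real kernel additionally adds the scaled value to what is already stored in `res`.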

File diff suppressed because it is too large


@@ -2,6 +2,7 @@
// for linear algebra.
//
// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
// Copyright (C) 2021 C. Antonio Sanchez <cantonios@google.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
@@ -12,92 +13,237 @@
// clang-format off
namespace Eigen {
namespace internal {
#if defined(EIGEN_CUDACC) && defined(EIGEN_USE_GPU)
#if defined(EIGEN_CUDACC) && defined(EIGEN_GPU_COMPILE_PHASE)
// Many std::complex methods such as operator+, operator-, operator* and
// operator/ are not constexpr. Due to this, clang does not treat them as device
// functions and thus Eigen functors making use of these operators fail to
// compile. Here, we manually specialize these functors for complex types when
// building for CUDA to avoid non-constexpr methods.
// operator/ are not constexpr. Due to this, GCC and older versions of clang do
// not treat them as device functions and thus Eigen functors making use of
// these operators fail to compile. Here, we manually specialize these
// operators and functors for complex types when building for CUDA to enable
// their use on-device.
// Sum
template<typename T> struct scalar_sum_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
typedef typename std::complex<T> result_type;
// Import Eigen's internal operator specializations.
#define EIGEN_USING_STD_COMPLEX_OPERATORS \
using Eigen::complex_operator_detail::operator+; \
using Eigen::complex_operator_detail::operator-; \
using Eigen::complex_operator_detail::operator*; \
using Eigen::complex_operator_detail::operator/; \
using Eigen::complex_operator_detail::operator+=; \
using Eigen::complex_operator_detail::operator-=; \
using Eigen::complex_operator_detail::operator*=; \
using Eigen::complex_operator_detail::operator/=; \
using Eigen::complex_operator_detail::operator==; \
using Eigen::complex_operator_detail::operator!=;
EIGEN_EMPTY_STRUCT_CTOR(scalar_sum_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
return std::complex<T>(numext::real(a) + numext::real(b),
numext::imag(a) + numext::imag(b));
}
};
namespace Eigen {
template<typename T> struct scalar_sum_op<std::complex<T>, std::complex<T> > : scalar_sum_op<const std::complex<T>, const std::complex<T> > {};
// Specialized std::complex overloads.
namespace complex_operator_detail {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_multiply(const std::complex<T>& a, const std::complex<T>& b) {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
return std::complex<T>(
a_real * b_real - a_imag * b_imag,
a_imag * b_real + a_real * b_imag);
}
// Difference
template<typename T> struct scalar_difference_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
typedef typename std::complex<T> result_type;
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide_fast(const std::complex<T>& a, const std::complex<T>& b) {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
const T norm = T(1) / (b_real * b_real + b_imag * b_imag);
return std::complex<T>((a_real * b_real + a_imag * b_imag) * norm,
(a_imag * b_real - a_real * b_imag) * norm);
}
EIGEN_EMPTY_STRUCT_CTOR(scalar_difference_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
return std::complex<T>(numext::real(a) - numext::real(b),
numext::imag(a) - numext::imag(b));
}
};
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide_stable(const std::complex<T>& a, const std::complex<T>& b) {
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
// Guard against over/under-flow.
const T scale = T(1) / (numext::abs(b_real) + numext::abs(b_imag));
const T a_real_scaled = numext::real(a) * scale;
const T a_imag_scaled = numext::imag(a) * scale;
const T b_real_scaled = b_real * scale;
const T b_imag_scaled = b_imag * scale;
const T b_norm2_scaled = b_real_scaled * b_real_scaled + b_imag_scaled * b_imag_scaled;
return std::complex<T>(
(a_real_scaled * b_real_scaled + a_imag_scaled * b_imag_scaled) / b_norm2_scaled,
(a_imag_scaled * b_real_scaled - a_real_scaled * b_imag_scaled) / b_norm2_scaled);
}
template<typename T> struct scalar_difference_op<std::complex<T>, std::complex<T> > : scalar_difference_op<const std::complex<T>, const std::complex<T> > {};
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide(const std::complex<T>& a, const std::complex<T>& b) {
#if EIGEN_FAST_MATH
return complex_divide_fast(a, b);
#else
return complex_divide_stable(a, b);
#endif
}
// NOTE: We cannot specialize compound assignment operators with Scalar T,
// (i.e. operator@=(const T&), for @=+,-,*,/)
// since they are already specialized for float/double/long double within
// the standard <complex> header. We also do not specialize the stream
// operators.
#define EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(T) \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a) { return a; } \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a) { \
return std::complex<T>(-numext::real(a), -numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a, const std::complex<T>& b) { \
return std::complex<T>(numext::real(a) + numext::real(b), numext::imag(a) + numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) + b, numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a + numext::real(b), numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a, const std::complex<T>& b) { \
return std::complex<T>(numext::real(a) - numext::real(b), numext::imag(a) - numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) - b, numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a - numext::real(b), -numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const std::complex<T>& a, const std::complex<T>& b) { \
return complex_multiply(a, b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) * b, numext::imag(a) * b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a * numext::real(b), a * numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const std::complex<T>& a, const std::complex<T>& b) { \
return complex_divide(a, b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) / b, numext::imag(a) / b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const T& a, const std::complex<T>& b) { \
return complex_divide(std::complex<T>(a, 0), b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator+=(std::complex<T>& a, const std::complex<T>& b) { \
numext::real_ref(a) += numext::real(b); \
numext::imag_ref(a) += numext::imag(b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator-=(std::complex<T>& a, const std::complex<T>& b) { \
numext::real_ref(a) -= numext::real(b); \
numext::imag_ref(a) -= numext::imag(b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator*=(std::complex<T>& a, const std::complex<T>& b) { \
a = complex_multiply(a, b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator/=(std::complex<T>& a, const std::complex<T>& b) { \
a = complex_divide(a, b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const std::complex<T>& a, const std::complex<T>& b) { \
return numext::real(a) == numext::real(b) && numext::imag(a) == numext::imag(b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const std::complex<T>& a, const T& b) { \
return numext::real(a) == b && numext::imag(a) == 0; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const T& a, const std::complex<T>& b) { \
return a == numext::real(b) && 0 == numext::imag(b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const std::complex<T>& a, const std::complex<T>& b) { \
return !(a == b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const std::complex<T>& a, const T& b) { \
return !(a == b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const T& a, const std::complex<T>& b) { \
return !(a == b); \
}
// Product
template<typename T> struct scalar_product_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
enum {
Vectorizable = packet_traits<std::complex<T> >::HasMul
};
typedef typename std::complex<T> result_type;
// Do not specialize for long double, since that reduces to double on device.
EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(float)
EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(double)
EIGEN_EMPTY_STRUCT_CTOR(scalar_product_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
return std::complex<T>(a_real * b_real - a_imag * b_imag,
a_real * b_imag + a_imag * b_real);
}
};
#undef EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS
template<typename T> struct scalar_product_op<std::complex<T>, std::complex<T> > : scalar_product_op<const std::complex<T>, const std::complex<T> > {};
} // namespace complex_operator_detail
EIGEN_USING_STD_COMPLEX_OPERATORS
// Quotient
template<typename T> struct scalar_quotient_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
enum {
Vectorizable = packet_traits<std::complex<T> >::HasDiv
};
typedef typename std::complex<T> result_type;
namespace numext {
EIGEN_USING_STD_COMPLEX_OPERATORS
} // namespace numext
EIGEN_EMPTY_STRUCT_CTOR(scalar_quotient_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
const T norm = T(1) / (b_real * b_real + b_imag * b_imag);
return std::complex<T>((a_real * b_real + a_imag * b_imag) * norm,
(a_imag * b_real - a_real * b_imag) * norm);
}
};
namespace internal {
EIGEN_USING_STD_COMPLEX_OPERATORS
template<typename T> struct scalar_quotient_op<std::complex<T>, std::complex<T> > : scalar_quotient_op<const std::complex<T>, const std::complex<T> > {};
} // namespace internal
} // namespace Eigen
#endif
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_COMPLEX_CUDA_H
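The difference between the two division paths above (`complex_divide_fast` and `complex_divide_stable`) is easiest to see on a large divisor. The following is a minimal standalone sketch, not the Eigen code itself: the names `divide_fast` and `divide_stable` are hypothetical stand-ins, and plain `std::complex` accessors replace the `numext` helpers.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical stand-in for the "fast" path: a * conj(b) / |b|^2.
// The squared norm of b can overflow even when the quotient is representable.
template <typename T>
std::complex<T> divide_fast(const std::complex<T>& a, const std::complex<T>& b) {
  const T norm = T(1) / (b.real() * b.real() + b.imag() * b.imag());
  return std::complex<T>((a.real() * b.real() + a.imag() * b.imag()) * norm,
                         (a.imag() * b.real() - a.real() * b.imag()) * norm);
}

// Hypothetical stand-in for the "stable" path: pre-scale both operands by
// 1 / (|Re(b)| + |Im(b)|), so the scaled squared norm lies between 1/2 and 1.
template <typename T>
std::complex<T> divide_stable(const std::complex<T>& a, const std::complex<T>& b) {
  // This sketch does not handle a zero divisor.
  assert(b.real() != T(0) || b.imag() != T(0));
  const T scale = T(1) / (std::abs(b.real()) + std::abs(b.imag()));
  const T ar = a.real() * scale, ai = a.imag() * scale;
  const T br = b.real() * scale, bi = b.imag() * scale;
  const T norm2 = br * br + bi * bi;
  return std::complex<T>((ar * br + ai * bi) / norm2,
                         (ai * br - ar * bi) / norm2);
}
```

Assuming `float` expressions are evaluated in single precision, dividing `2^100 * (1 + i)` by itself overflows the squared norm in `divide_fast` and produces NaN, whereas `divide_stable` works on operands scaled to about 0.5 and recovers a quotient of 1. The price is a few extra multiplications and two divisions, which is presumably why the fast form is kept behind `EIGEN_FAST_MATH`.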


@@ -0,0 +1,706 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef EIGEN_BFLOAT16_H
#define EIGEN_BFLOAT16_H
#define BF16_PACKET_FUNCTION(PACKET_F, PACKET_BF16, METHOD) \
template <> \
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED \
PACKET_BF16 METHOD<PACKET_BF16>(const PACKET_BF16& _x) { \
return F32ToBf16(METHOD<PACKET_F>(Bf16ToF32(_x))); \
}
namespace Eigen {
struct bfloat16;
namespace bfloat16_impl {
// Make our own __bfloat16_raw definition.
struct __bfloat16_raw {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw() : value(0) {}
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw(unsigned short raw) : value(raw) {}
unsigned short value;
};
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw raw_uint16_to_bfloat16(unsigned short value);
template <bool AssumeArgumentIsNormalOrInfinityOrZero>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne(float ff);
// Forward declarations of template specializations, to avoid Visual C++ 2019 errors, saying:
// > error C2908: explicit specialization; 'float_to_bfloat16_rtne' has already been instantiated
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<false>(float ff);
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<true>(float ff);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float bfloat16_to_float(__bfloat16_raw h);
struct bfloat16_base : public __bfloat16_raw {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16_base() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16_base(const __bfloat16_raw& h) : __bfloat16_raw(h) {}
};
} // namespace bfloat16_impl
// Class definition.
struct bfloat16 : public bfloat16_impl::bfloat16_base {
typedef bfloat16_impl::__bfloat16_raw __bfloat16_raw;
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const __bfloat16_raw& h) : bfloat16_impl::bfloat16_base(h) {}
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(bool b)
: bfloat16_impl::bfloat16_base(bfloat16_impl::raw_uint16_to_bfloat16(b ? 0x3f80 : 0)) {}
template<class T>
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const T& val)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<internal::is_integral<T>::value>(static_cast<float>(val))) {}
explicit EIGEN_DEVICE_FUNC bfloat16(float f)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<false>(f)) {}
// Following the convention of numpy, converting between complex and
// float will lead to loss of imag value.
template<typename RealScalar>
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const std::complex<RealScalar>& val)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<false>(static_cast<float>(val.real()))) {}
EIGEN_DEVICE_FUNC operator float() const { // NOLINT: Allow implicit conversion to float, because it is lossless.
return bfloat16_impl::bfloat16_to_float(*this);
}
};
} // namespace Eigen
namespace std {
template<>
struct numeric_limits<Eigen::bfloat16> {
static const bool is_specialized = true;
static const bool is_signed = true;
static const bool is_integer = false;
static const bool is_exact = false;
static const bool has_infinity = true;
static const bool has_quiet_NaN = true;
static const bool has_signaling_NaN = true;
static const float_denorm_style has_denorm = std::denorm_absent;
static const bool has_denorm_loss = false;
static const std::float_round_style round_style = numeric_limits<float>::round_style;
static const bool is_iec559 = false;
static const bool is_bounded = true;
static const bool is_modulo = false;
static const int digits = 8;
static const int digits10 = 2;
static const int max_digits10 = 4;
static const int radix = 2;
static const int min_exponent = numeric_limits<float>::min_exponent;
static const int min_exponent10 = numeric_limits<float>::min_exponent10;
static const int max_exponent = numeric_limits<float>::max_exponent;
static const int max_exponent10 = numeric_limits<float>::max_exponent10;
static const bool traps = numeric_limits<float>::traps;
static const bool tinyness_before = numeric_limits<float>::tinyness_before;
static Eigen::bfloat16 (min)() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x0080); }
static Eigen::bfloat16 lowest() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0xff7f); }
static Eigen::bfloat16 (max)() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f7f); }
static Eigen::bfloat16 epsilon() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x3c00); }
static Eigen::bfloat16 round_error() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x3f00); }
static Eigen::bfloat16 infinity() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f80); }
static Eigen::bfloat16 quiet_NaN() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7fc0); }
static Eigen::bfloat16 signaling_NaN() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f81); }
static Eigen::bfloat16 denorm_min() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x0001); }
};
// If std::numeric_limits<T> is specialized, should also specialize
// std::numeric_limits<const T>, std::numeric_limits<volatile T>, and
// std::numeric_limits<const volatile T>
// https://stackoverflow.com/a/16519653/
template<>
struct numeric_limits<const Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
template<>
struct numeric_limits<volatile Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
template<>
struct numeric_limits<const volatile Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
} // namespace std
namespace Eigen {
namespace bfloat16_impl {
// We need to distinguish clang as the CUDA compiler from clang as the host compiler,
// invoked by NVCC (e.g. on MacOS). The former needs to see both host and device implementation
// of the functions, while the latter can only deal with one of them.
#if !defined(EIGEN_HAS_NATIVE_BF16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for bfloat16 floats
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
// We need to provide emulated *host-side* BF16 operators for clang.
#pragma push_macro("EIGEN_DEVICE_FUNC")
#undef EIGEN_DEVICE_FUNC
#if defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_NATIVE_BF16)
#define EIGEN_DEVICE_FUNC __host__
#else // both host and device need emulated ops.
#define EIGEN_DEVICE_FUNC __host__ __device__
#endif
#endif
// Definitions for CPUs, mostly working through conversion
// to/from fp32.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) + float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const bfloat16& a, const int& b) {
return bfloat16(float(a) + static_cast<float>(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const int& a, const bfloat16& b) {
return bfloat16(static_cast<float>(a) + float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator * (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) * float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator - (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) - float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator / (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) / float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator - (const bfloat16& a) {
bfloat16 result;
result.value = a.value ^ 0x8000;
return result;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator += (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) + float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator *= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) * float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator -= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) - float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator /= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) / float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator++(bfloat16& a) {
a += bfloat16(1);
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator--(bfloat16& a) {
a -= bfloat16(1);
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator++(bfloat16& a, int) {
bfloat16 original_value = a;
++a;
return original_value;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator--(bfloat16& a, int) {
bfloat16 original_value = a;
--a;
return original_value;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator == (const bfloat16& a, const bfloat16& b) {
return numext::equal_strict(float(a),float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator != (const bfloat16& a, const bfloat16& b) {
return numext::not_equal_strict(float(a), float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator < (const bfloat16& a, const bfloat16& b) {
return float(a) < float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator <= (const bfloat16& a, const bfloat16& b) {
return float(a) <= float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator > (const bfloat16& a, const bfloat16& b) {
return float(a) > float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const bfloat16& a, const bfloat16& b) {
return float(a) >= float(b);
}
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
#pragma pop_macro("EIGEN_DEVICE_FUNC")
#endif
#endif // Emulate support for bfloat16 floats
// Division by an index. Do it in full float precision to avoid accuracy
// issues in converting the denominator to bfloat16.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator / (const bfloat16& a, Index b) {
return bfloat16(static_cast<float>(a) / static_cast<float>(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw truncate_to_bfloat16(const float v) {
__bfloat16_raw output;
if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(v)) {
output.value = std::signbit(v) ? 0xFFC0: 0x7FC0;
return output;
} else if (std::fabs(v) < std::numeric_limits<float>::min EIGEN_NOT_A_MACRO()) {
// Flush denormal to +/- 0.
output.value = std::signbit(v) ? 0x8000 : 0;
return output;
}
const uint16_t* p = reinterpret_cast<const uint16_t*>(&v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
output.value = p[0];
#else
output.value = p[1];
#endif
return output;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw raw_uint16_to_bfloat16(numext::uint16_t value) {
return __bfloat16_raw(value);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR numext::uint16_t raw_bfloat16_as_uint16(const __bfloat16_raw& bf) {
return bf.value;
}
// float_to_bfloat16_rtne template specialization that does not make any
// assumption about the value of its function argument (ff).
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<false>(float ff) {
#if (defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_HIP_BF16))
// Nothing to do here
#else
__bfloat16_raw output;
if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(ff)) {
// If the value is a NaN, squash it to a qNaN with msb of fraction set,
// this makes sure after truncation we don't end up with an inf.
//
// qNaN magic: All exponent bits set + most significant bit of fraction
// set.
output.value = std::signbit(ff) ? 0xFFC0: 0x7FC0;
} else if (std::fabs(ff) < std::numeric_limits<float>::min EIGEN_NOT_A_MACRO()) {
// Flush denormal to +/- 0.0
output.value = std::signbit(ff) ? 0x8000 : 0;
} else {
// Fast rounding algorithm that rounds a half value to nearest even. This
// reduces expected error when we convert a large number of floats. Here
// is how it works:
//
// Definitions:
// To convert a float 32 to bfloat16, a float 32 can be viewed as 32 bits
// with the following tags:
//
// Sign | Exp (8 bits) | Frac (23 bits)
// S EEEEEEEE FFFFFFLRTTTTTTTTTTTTTTT
//
// S: Sign bit.
// E: Exponent bits.
// F: First 6 bits of fraction.
// L: Least significant bit of resulting bfloat16 if we truncate away the
// rest of the float32. This is also the 7th bit of fraction
// R: Rounding bit, 8th bit of fraction.
// T: Sticky bits, rest of fraction, 15 bits.
//
// To round half to nearest even, there are three cases where we want to
// round down (simply truncate the rest of the bits away, which consist of
// the rounding bit and sticky bits) and two cases where we want to round up
// (truncate then add one to the result).
//
// The fast converting algorithm simply adds lsb (L) to 0x7fff (15 bits of
// 1s) as the rounding bias, adds the rounding bias to the input, then
// truncates the last 16 bits away.
//
// To understand how it works, we can analyze this algorithm case by case:
//
// 1. L = 0, R = 0:
// Expect: round down, this is less than half value.
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding rounding bias to input may create a carry, depending on
// whether there is any value set to 1 in T bits.
// - R may be set to 1 if there is a carry.
// - L remains 0.
// - Note that this case also handles Inf and -Inf, where all fraction
// bits, including L, R and Ts are all 0. The output remains Inf after
// this algorithm.
//
// 2. L = 1, R = 0:
// Expect: round down, this is less than half value.
//
// Algorithm:
// - Rounding bias: 0x7fff + 1 = 0x8000
// - Adding rounding bias to input doesn't change sticky bits but
// adds 1 to rounding bit.
// - L remains 1.
//
// 3. L = 0, R = 1, all of T are 0:
// Expect: round down, this is exactly at half, the result is already
// even (L=0).
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding rounding bias to input sets all sticky bits to 1, but
// doesn't create a carry.
// - R remains 1.
// - L remains 0.
//
// 4. L = 1, R = 1:
// Expect: round up, this is at or above half; when it is exactly at
// half, the result needs to be rounded to the next even number.
//
// Algorithm:
// - Rounding bias: 0x7fff + 1 = 0x8000
// - Adding rounding bias to input doesn't change sticky bits, but
// creates a carry from rounding bit.
// - The carry sets L to 0, creates another carry bit and propagates
// forward to F bits.
// - If all the F bits are 1, a carry then propagates to the exponent
// bits, which then creates the minimum value with the next exponent
// value. Note that we won't have the case where exponents are all 1,
// since that's either a NaN (handled in the other if condition) or inf
// (handled in case 1).
//
// 5. L = 0, R = 1, any of T is 1:
// Expect: round up, this is greater than half.
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding rounding bias to input creates a carry from sticky bits,
// sets rounding bit to 0, then creates another carry.
// - The second carry sets L to 1.
//
// Examples:
//
// Exact half value that is already even:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1000000000000000
//
// This falls into case 3. We truncate the rest of 16 bits and no
// carry is created into F and L:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
//
// Exact half value, round to next even number:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1000000000000000
//
// This falls into case 4. We create a carry from R and T,
// which then propagates into L and F:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
//
//
// Max denormal value rounds to min normal value:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1111111111111111
//
// This falls into case 4. We create a carry from R and T,
// propagate into L and F, which then propagates into exponent
// bits:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
//
// Max normal value rounds to Inf:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1111111111111111
//
// This falls into case 4. We create a carry from R and T,
// propagate into L and F, which then propagates into exponent
// bits:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
// At this point, ff must be either a normal float, or +/-infinity.
output = float_to_bfloat16_rtne<true>(ff);
}
return output;
#endif
}
// float_to_bfloat16_rtne template specialization that assumes that its function
// argument (ff) is either a normal floating point number, or +/-infinity, or
// zero. Used to improve the runtime performance of conversion from an integer
// type to bfloat16.
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<true>(float ff) {
#if (defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_HIP_BF16))
// Nothing to do here
#else
numext::uint32_t input = numext::bit_cast<numext::uint32_t>(ff);
__bfloat16_raw output;
// Least significant bit of resulting bfloat.
numext::uint32_t lsb = (input >> 16) & 1;
numext::uint32_t rounding_bias = 0x7fff + lsb;
input += rounding_bias;
output.value = static_cast<numext::uint16_t>(input >> 16);
return output;
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float bfloat16_to_float(__bfloat16_raw h) {
float result = 0;
unsigned short* q = reinterpret_cast<unsigned short*>(&result);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
q[0] = h.value;
#else
q[1] = h.value;
#endif
return result;
}
// --- standard functions ---
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isinf)(const bfloat16& a) {
EIGEN_USING_STD(isinf);
return (isinf)(float(a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isnan)(const bfloat16& a) {
EIGEN_USING_STD(isnan);
return (isnan)(float(a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isfinite)(const bfloat16& a) {
return !(isinf EIGEN_NOT_A_MACRO (a)) && !(isnan EIGEN_NOT_A_MACRO (a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 abs(const bfloat16& a) {
bfloat16 result;
result.value = a.value & 0x7FFF;
return result;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 exp(const bfloat16& a) {
return bfloat16(::expf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 expm1(const bfloat16& a) {
return bfloat16(numext::expm1(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log(const bfloat16& a) {
return bfloat16(::logf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log1p(const bfloat16& a) {
return bfloat16(numext::log1p(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log10(const bfloat16& a) {
return bfloat16(::log10f(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log2(const bfloat16& a) {
return bfloat16(static_cast<float>(EIGEN_LOG2E) * ::logf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sqrt(const bfloat16& a) {
return bfloat16(::sqrtf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 pow(const bfloat16& a, const bfloat16& b) {
return bfloat16(::powf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sin(const bfloat16& a) {
return bfloat16(::sinf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 cos(const bfloat16& a) {
return bfloat16(::cosf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 tan(const bfloat16& a) {
return bfloat16(::tanf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 asin(const bfloat16& a) {
return bfloat16(::asinf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 acos(const bfloat16& a) {
return bfloat16(::acosf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 atan(const bfloat16& a) {
return bfloat16(::atanf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sinh(const bfloat16& a) {
return bfloat16(::sinhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 cosh(const bfloat16& a) {
return bfloat16(::coshf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 tanh(const bfloat16& a) {
return bfloat16(::tanhf(float(a)));
}
#if EIGEN_HAS_CXX11_MATH
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 asinh(const bfloat16& a) {
return bfloat16(::asinhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 acosh(const bfloat16& a) {
return bfloat16(::acoshf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 atanh(const bfloat16& a) {
return bfloat16(::atanhf(float(a)));
}
#endif
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 floor(const bfloat16& a) {
return bfloat16(::floorf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 rint(const bfloat16& a) {
return bfloat16(::rintf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 ceil(const bfloat16& a) {
return bfloat16(::ceilf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmod(const bfloat16& a, const bfloat16& b) {
return bfloat16(::fmodf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 (min)(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return f2 < f1 ? b : a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 (max)(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return f1 < f2 ? b : a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmin(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return bfloat16(::fminf(f1, f2));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmax(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return bfloat16(::fmaxf(f1, f2));
}
#ifndef EIGEN_NO_IO
EIGEN_ALWAYS_INLINE std::ostream& operator << (std::ostream& os, const bfloat16& v) {
os << static_cast<float>(v);
return os;
}
#endif
} // namespace bfloat16_impl
namespace internal {
template<>
struct random_default_impl<bfloat16, false, false>
{
static inline bfloat16 run(const bfloat16& x, const bfloat16& y)
{
return x + (y-x) * bfloat16(float(std::rand()) / float(RAND_MAX));
}
static inline bfloat16 run()
{
return run(bfloat16(-1.f), bfloat16(1.f));
}
};
template<> struct is_arithmetic<bfloat16> { enum { value = true }; };
} // namespace internal
template<> struct NumTraits<Eigen::bfloat16>
: GenericNumTraits<Eigen::bfloat16>
{
enum {
IsSigned = true,
IsInteger = false,
IsComplex = false,
RequireInitialization = false
};
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 epsilon() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x3c00);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 dummy_precision() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x3D4D); // bfloat16(5e-2f);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 highest() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7F7F);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 lowest() {
return bfloat16_impl::raw_uint16_to_bfloat16(0xFF7F);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 infinity() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7f80);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 quiet_NaN() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7fc0);
}
};
} // namespace Eigen
namespace std {
#if __cplusplus > 199711L
template <>
struct hash<Eigen::bfloat16> {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::size_t operator()(const Eigen::bfloat16& a) const {
return hash<float>()(static_cast<float>(a));
}
};
#endif
} // namespace std
namespace Eigen {
namespace numext {
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isnan)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isnan)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isinf)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isinf)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isfinite)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isfinite)(h);
}
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bit_cast<Eigen::bfloat16, uint16_t>(const uint16_t& src) {
return Eigen::bfloat16(Eigen::bfloat16_impl::raw_uint16_to_bfloat16(src));
}
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC uint16_t bit_cast<uint16_t, Eigen::bfloat16>(const Eigen::bfloat16& src) {
return Eigen::bfloat16_impl::raw_bfloat16_as_uint16(src);
}
} // namespace numext
} // namespace Eigen
#endif // EIGEN_BFLOAT16_H
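The `NumTraits<Eigen::bfloat16>` constants above are given as raw 16-bit patterns. Since bfloat16 shares its sign and exponent layout with the upper half of an IEEE-754 `float`, the patterns can be read by widening them. This is a minimal sketch; `bf16_bits_to_float` and `check_bf16_constants` are hypothetical helpers, not Eigen functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: place the 16 bfloat16 bits in the upper half of a
// 32-bit word and reinterpret that word as a float.
inline float bf16_bits_to_float(std::uint16_t bits) {
  const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}

// Decode the patterns used by the NumTraits specialization above.
inline void check_bf16_constants() {
  assert(bf16_bits_to_float(0x3c00) == 0.0078125f);                   // epsilon(): 2^-7
  assert(std::fabs(bf16_bits_to_float(0x3D4D) - 5e-2f) < 1e-3f);      // dummy_precision()
  assert(bf16_bits_to_float(0xFF7F) == -bf16_bits_to_float(0x7F7F));  // lowest() == -highest()
  assert(std::isinf(bf16_bits_to_float(0x7f80)));                     // infinity()
  assert(std::isnan(bf16_bits_to_float(0x7fc0)));                     // quiet_NaN()
}
```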

File diff suppressed because it is too large

@@ -17,11 +17,31 @@ namespace internal {
// implemented in GenericPacketMathFunctions.h
// This is needed to workaround a circular dependency.
template<typename Packet> EIGEN_STRONG_INLINE Packet
pfrexp_float(const Packet& a, Packet& exponent);
/** \internal \returns a packet with constant coefficients \a a, e.g.: (a[N-1],...,a[0]) */
template<typename Packet, int N> EIGEN_DEVICE_FUNC inline Packet
pset(const typename unpacket_traits<Packet>::type (&a)[N] /* a */);
template<typename Packet> EIGEN_STRONG_INLINE Packet
pldexp_float(Packet a, Packet exponent);
/***************************************************************************
* Some generic implementations to be used by implementors
***************************************************************************/
/** Default implementation of pfrexp.
* It is expected to be called by implementers of template<> pfrexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC
Packet pfrexp_generic(const Packet& a, Packet& exponent);
// Extracts the biased exponent value from Packet p, and casts the results to
// a floating-point Packet type. Used by pfrexp_generic. Override this if
// there is no unpacket_traits<Packet>::integer_packet.
template<typename Packet> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC
Packet pfrexp_generic_get_biased_exponent(const Packet& p);
/** Default implementation of pldexp.
* It is expected to be called by implementers of template<> pldexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC
Packet pldexp_generic(const Packet& a, const Packet& exponent);
/** \internal \returns log(x) for single precision float */
template <typename Packet>
@@ -29,6 +49,24 @@ EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog_float(const Packet _x);
/** \internal \returns log2(x) for single precision float */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog2_float(const Packet _x);
/** \internal \returns log(x) for double precision */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog_double(const Packet _x);
/** \internal \returns log2(x) for double precision */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog2_double(const Packet _x);
/** \internal \returns log(1 + x) */
template<typename Packet>
Packet generic_plog1p(const Packet& x);
@@ -61,8 +99,15 @@ EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet pcos_float(const Packet& x);
/** \internal \returns sqrt(x) for complex types */
template<typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet psqrt_complex(const Packet& a);
template <typename Packet, int N> struct ppolevl;
} // end namespace internal
} // end namespace Eigen
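The comment on `pfrexp_generic_get_biased_exponent` describes extracting the biased exponent field and casting it to a floating-point value. A scalar analogue of that step might look like the sketch below. The names are hypothetical, the packet version is not shown in this diff, and the handling of zero, denormals, inf, and NaN that `pfrexp_generic` is expected to provide is deliberately left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical scalar analogue: return the 8-bit biased exponent field of
// a float, converted to float.
inline float biased_exponent(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  return static_cast<float>((bits >> 23) & 0xffu);
}

// For normal finite inputs, frexp reports a mantissa in [0.5, 1), so its
// exponent equals the biased exponent minus 126 (bias 127, less one).
inline float frexp_exponent_normal(float x) {
  return biased_exponent(x) - 126.0f;
}

// Compare against std::frexp on a few normal values.
inline void check_against_frexp() {
  const float samples[] = {1.0f, 8.0f, 0.75f, -3.5f};
  for (float x : samples) {
    int expected = 0;
    std::frexp(x, &expected);
    assert(frexp_exponent_normal(x) == static_cast<float>(expected));
  }
}
```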


@@ -36,12 +36,26 @@
#ifndef EIGEN_HALF_H
#define EIGEN_HALF_H
#if __cplusplus > 199711L
#define EIGEN_EXPLICIT_CAST(tgt_type) explicit operator tgt_type()
#else
#define EIGEN_EXPLICIT_CAST(tgt_type) operator tgt_type()
#include <sstream>
#if defined(EIGEN_HAS_GPU_FP16) || defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
// When compiling with GPU support, the "__half_raw" base class as well as
// some other routines are defined in the GPU compiler header files
// (cuda_fp16.h, hip_fp16.h), and they are not tagged constexpr
// As a consequence, we get compile failures when compiling Eigen with
// GPU support. Hence the need to disable EIGEN_CONSTEXPR when building
// Eigen with GPU support
#pragma push_macro("EIGEN_CONSTEXPR")
#undef EIGEN_CONSTEXPR
#define EIGEN_CONSTEXPR
#endif
#define F16_PACKET_FUNCTION(PACKET_F, PACKET_F16, METHOD) \
template <> \
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_UNUSED \
PACKET_F16 METHOD<PACKET_F16>(const PACKET_F16& _x) { \
return float2half(METHOD<PACKET_F>(half2float(_x))); \
}
namespace Eigen {
@@ -52,40 +66,45 @@ namespace half_impl {
#if !defined(EIGEN_HAS_GPU_FP16)
// Make our own __half_raw definition that is similar to CUDA's.
struct __half_raw {
EIGEN_DEVICE_FUNC __half_raw() : x(0) {}
explicit EIGEN_DEVICE_FUNC __half_raw(unsigned short raw) : x(raw) {}
unsigned short x;
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw() : x(0) {}
#if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw(numext::uint16_t raw) : x(numext::bit_cast<__fp16>(raw)) {
}
__fp16 x;
#else
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw(numext::uint16_t raw) : x(raw) {}
numext::uint16_t x;
#endif
};
#elif defined(EIGEN_HAS_HIP_FP16)
// Nothing to do here
// HIP fp16 header file has a definition for __half_raw
#elif defined(EIGEN_HAS_CUDA_FP16)
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000
// In CUDA < 9.0, __half is the equivalent of CUDA 9's __half_raw
typedef __half __half_raw;
#endif // defined(EIGEN_HAS_CUDA_FP16)
#if EIGEN_CUDA_SDK_VER < 90000
// In CUDA < 9.0, __half is the equivalent of CUDA 9's __half_raw
typedef __half __half_raw;
#endif // defined(EIGEN_HAS_CUDA_FP16)
#elif defined(SYCL_DEVICE_ONLY)
typedef cl::sycl::half __half_raw;
typedef cl::sycl::half __half_raw;
#endif
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw raw_uint16_to_half(unsigned short x);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw raw_uint16_to_half(numext::uint16_t x);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h);
struct half_base : public __half_raw {
EIGEN_DEVICE_FUNC half_base() {}
EIGEN_DEVICE_FUNC half_base(const __half_raw& h) : __half_raw(h) {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half_raw& h) : __half_raw(h) {}
#if defined(EIGEN_HAS_GPU_FP16)
#if defined(EIGEN_HAS_HIP_FP16)
EIGEN_DEVICE_FUNC half_base(const __half& h) { x = __half_as_ushort(h); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half& h) { x = __half_as_ushort(h); }
#elif defined(EIGEN_HAS_CUDA_FP16)
#if (defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000)
EIGEN_DEVICE_FUNC half_base(const __half& h) : __half_raw(*(__half_raw*)&h) {}
#if EIGEN_CUDA_SDK_VER >= 90000
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half& h) : __half_raw(*(__half_raw*)&h) {}
#endif
#endif
#endif
#endif
};
@@ -110,22 +129,22 @@ struct half : public half_impl::half_base {
#endif
#endif
EIGEN_DEVICE_FUNC half() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half() {}
EIGEN_DEVICE_FUNC half(const __half_raw& h) : half_impl::half_base(h) {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half_raw& h) : half_impl::half_base(h) {}
#if defined(EIGEN_HAS_GPU_FP16)
#if defined(EIGEN_HAS_HIP_FP16)
EIGEN_DEVICE_FUNC half(const __half& h) : half_impl::half_base(h) {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half& h) : half_impl::half_base(h) {}
#elif defined(EIGEN_HAS_CUDA_FP16)
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000
EIGEN_DEVICE_FUNC half(const __half& h) : half_impl::half_base(h) {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half& h) : half_impl::half_base(h) {}
#endif
#endif
#endif
explicit EIGEN_DEVICE_FUNC half(bool b)
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(bool b)
: half_impl::half_base(half_impl::raw_uint16_to_half(b ? 0x3c00 : 0)) {}
template<class T>
explicit EIGEN_DEVICE_FUNC half(const T& val)
@@ -133,46 +152,15 @@ struct half : public half_impl::half_base {
explicit EIGEN_DEVICE_FUNC half(float f)
: half_impl::half_base(half_impl::float_to_half_rtne(f)) {}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(bool) const {
// +0.0 and -0.0 become false, everything else becomes true.
return (x & 0x7fff) != 0;
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(signed char) const {
return static_cast<signed char>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned char) const {
return static_cast<unsigned char>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(short) const {
return static_cast<short>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned short) const {
return static_cast<unsigned short>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(int) const {
return static_cast<int>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned int) const {
return static_cast<unsigned int>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(long) const {
return static_cast<long>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned long) const {
return static_cast<unsigned long>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(long long) const {
return static_cast<long long>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned long long) const {
return static_cast<unsigned long long>(half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(float) const {
// Following the NumPy convention, converting from complex to a real
// type discards the imaginary part.
template<typename RealScalar>
explicit EIGEN_DEVICE_FUNC half(std::complex<RealScalar> c)
: half_impl::half_base(half_impl::float_to_half_rtne(static_cast<float>(c.real()))) {}
EIGEN_DEVICE_FUNC operator float() const { // NOLINT: Allow implicit conversion to float, because it is lossless.
return half_impl::half_to_float(*this);
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(double) const {
return static_cast<double>(half_impl::half_to_float(*this));
}
};
} // end namespace Eigen
@@ -211,7 +199,7 @@ struct numeric_limits<Eigen::half> {
static Eigen::half round_error() { return Eigen::half(0.5); }
static Eigen::half infinity() { return Eigen::half_impl::raw_uint16_to_half(0x7c00); }
static Eigen::half quiet_NaN() { return Eigen::half_impl::raw_uint16_to_half(0x7e00); }
static Eigen::half signaling_NaN() { return Eigen::half_impl::raw_uint16_to_half(0x7e00); }
static Eigen::half signaling_NaN() { return Eigen::half_impl::raw_uint16_to_half(0x7d00); }
static Eigen::half denorm_min() { return Eigen::half_impl::raw_uint16_to_half(0x1); }
};
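The hunk above appears to change `signaling_NaN()` from `0x7e00`, the same pattern as `quiet_NaN()`, to `0x7d00`. The sketch below classifies binary16 bit patterns under the assumption that the most significant mantissa bit (`0x0200`) is the quiet bit, which is the convention on common x86 and Arm hardware. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// NaN in binary16: exponent field all ones (0x7c00) and non-zero mantissa.
inline bool half_bits_is_nan(std::uint16_t bits) {
  return (bits & 0x7fffu) > 0x7c00u;
}

// Assumption: the top mantissa bit (0x0200) marks a quiet NaN.
inline bool half_bits_is_quiet_nan(std::uint16_t bits) {
  return half_bits_is_nan(bits) && (bits & 0x0200u) != 0;
}

inline bool half_bits_is_signaling_nan(std::uint16_t bits) {
  return half_bits_is_nan(bits) && (bits & 0x0200u) == 0;
}

// Classify the patterns that appear in this header.
inline void check_half_nan_patterns() {
  assert(half_bits_is_quiet_nan(0x7e00));      // quiet_NaN()
  assert(half_bits_is_signaling_nan(0x7d00));  // new signaling_NaN()
  assert(!half_bits_is_nan(0x7c00));           // infinity()
}
```

Under the same assumption, the pattern `0x7c01` also classifies as a signaling NaN, since its quiet bit is clear.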
@@ -234,6 +222,9 @@ namespace half_impl {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
// Note: We deliberately do *not* define this to 1 even if we have Arm's native
// fp16 type since GPU halfs are rather different from native CPU halfs.
// TODO: Rename to something like EIGEN_HAS_NATIVE_GPU_FP16
#define EIGEN_HAS_NATIVE_FP16
#endif
@@ -302,13 +293,62 @@ EIGEN_STRONG_INLINE __device__ bool operator > (const half& a, const half& b) {
EIGEN_STRONG_INLINE __device__ bool operator >= (const half& a, const half& b) {
return __hge(a, b);
}
#endif
#if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator + (const half& a, const half& b) {
return half(vaddh_f16(a.x, b.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator * (const half& a, const half& b) {
return half(vmulh_f16(a.x, b.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator - (const half& a, const half& b) {
return half(vsubh_f16(a.x, b.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator / (const half& a, const half& b) {
return half(vdivh_f16(a.x, b.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator - (const half& a) {
return half(vnegh_f16(a.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator += (half& a, const half& b) {
a = half(vaddh_f16(a.x, b.x));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator *= (half& a, const half& b) {
a = half(vmulh_f16(a.x, b.x));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator -= (half& a, const half& b) {
a = half(vsubh_f16(a.x, b.x));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator /= (half& a, const half& b) {
a = half(vdivh_f16(a.x, b.x));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator == (const half& a, const half& b) {
return vceqh_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator != (const half& a, const half& b) {
return !vceqh_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator < (const half& a, const half& b) {
return vclth_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator <= (const half& a, const half& b) {
return vcleh_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator > (const half& a, const half& b) {
return vcgth_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const half& a, const half& b) {
return vcgeh_f16(a.x, b.x);
}
// We need to distinguish clang as the CUDA compiler from clang as the host compiler,
// invoked by NVCC (e.g. on MacOS). The former needs to see both host and device implementation
// of the functions, while the latter can only deal with one of them.
#if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats
#elif !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
// We need to provide emulated *host-side* FP16 operators for clang.
@@ -323,7 +363,6 @@ EIGEN_STRONG_INLINE __device__ bool operator >= (const half& a, const half& b) {
// Definitions for CPUs and older HIP+CUDA, mostly working through conversion
// to/from fp32.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator + (const half& a, const half& b) {
return half(float(a) + float(b));
}
@@ -392,10 +431,33 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator / (const half& a, Index b) {
// these in hardware. If we need more performance on older/other CPUs, they are
// also possible to vectorize directly.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw raw_uint16_to_half(unsigned short x) {
__half_raw h;
h.x = x;
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw raw_uint16_to_half(numext::uint16_t x) {
// We cannot simply do a "return __half_raw(x)" here, because __half_raw is a union type
// in the hip_fp16 header file, and that will trigger a compile error.
// On the other hand, having anything but a return statement also triggers a compile error
// because this is a constexpr function.
// Fortunately, since we need to disable EIGEN_CONSTEXPR for GPU anyway, we can get out
// of this catch-22 by having separate bodies for GPU / non-GPU.
#if defined(EIGEN_HAS_GPU_FP16)
__half_raw h;
h.x = x;
return h;
#else
return __half_raw(x);
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC numext::uint16_t raw_half_as_uint16(const __half_raw& h) {
// HIP/CUDA/Default have a member 'x' of type uint16_t.
// For ARM64 native half, the member 'x' is of type __fp16, so we need to bit-cast.
// For SYCL, cl::sycl::half is _Float16, so cast directly.
#if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
return numext::bit_cast<numext::uint16_t>(h.x);
#elif defined(SYCL_DEVICE_ONLY)
return numext::bit_cast<numext::uint16_t>(h);
#else
return h.x;
#endif
}
union float32_bits {
@@ -414,6 +476,11 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) {
h.x = _cvtss_sh(ff, 0);
return h;
#elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
__half_raw h;
h.x = static_cast<__fp16>(ff);
return h;
#else
float32_bits f; f.f = ff;
@@ -422,7 +489,7 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) {
const float32_bits denorm_magic = { ((127 - 15) + (23 - 10) + 1) << 23 };
unsigned int sign_mask = 0x80000000u;
__half_raw o;
o.x = static_cast<unsigned short>(0x0u);
o.x = static_cast<numext::uint16_t>(0x0u);
unsigned int sign = f.u & sign_mask;
f.u ^= sign;
@@ -442,20 +509,22 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) {
f.f += denorm_magic.f;
// and one integer subtract of the bias later, we have our final float!
o.x = static_cast<unsigned short>(f.u - denorm_magic.u);
o.x = static_cast<numext::uint16_t>(f.u - denorm_magic.u);
} else {
unsigned int mant_odd = (f.u >> 13) & 1; // resulting mantissa is odd
// update exponent, rounding bias part 1
f.u += ((unsigned int)(15 - 127) << 23) + 0xfff;
// Equivalent to `f.u += ((unsigned int)(15 - 127) << 23) + 0xfff`, but
// without arithmetic overflow.
f.u += 0xc8000fffU;
// rounding bias part 2
f.u += mant_odd;
// take the bits!
o.x = static_cast<unsigned short>(f.u >> 13);
o.x = static_cast<numext::uint16_t>(f.u >> 13);
}
}
o.x |= static_cast<unsigned short>(sign >> 16);
o.x |= static_cast<numext::uint16_t>(sign >> 16);
return o;
#endif
}
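The normal-range branch of `float_to_half_rtne` above replaces the expression `((unsigned int)(15 - 127) << 23) + 0xfff` with the literal `0xc8000fffU`. The sketch below recomputes that constant in 64-bit arithmetic and then applies the branch's three steps to raw float bits. It covers only positive inputs that land in the half normal range; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exponent re-bias (15 - 127) moved into the float exponent field, plus the
// 0xfff rounding bias, computed in 64 bits and truncated to 32 bits.
inline std::uint32_t rebias_plus_rounding() {
  const std::uint64_t rebias =
      static_cast<std::uint64_t>(static_cast<std::uint32_t>(15 - 127)) << 23;
  return static_cast<std::uint32_t>(rebias + 0xfffu);
}

// Sketch of the normal-range branch only: `fu` holds the bits of a positive
// float whose value is representable as a normal half.
inline std::uint16_t normal_float_bits_to_half_bits(std::uint32_t fu) {
  const std::uint32_t mant_odd = (fu >> 13) & 1u;  // resulting mantissa is odd
  fu += 0xc8000fffu;                               // re-bias exponent, rounding bias part 1
  fu += mant_odd;                                  // rounding bias part 2 (ties to even)
  return static_cast<std::uint16_t>(fu >> 13);     // keep the top bits
}

inline void check_rebias_constant() {
  assert(rebias_plus_rounding() == 0xc8000fffu);
  assert(normal_float_bits_to_half_bits(0x3f800000u) == 0x3c00);  // 1.0f
}
```

The additions wrap modulo 2^32 by design; writing the constant as a literal keeps the result identical while avoiding the shifted intermediate that the source comment calls an arithmetic overflow.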
@@ -464,10 +533,10 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h) {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __half2float(h);
#elif defined(EIGEN_HAS_FP16_C)
return _cvtsh_ss(h.x);
#elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
return static_cast<float>(h.x);
#else
const float32_bits magic = { 113 << 23 };
const unsigned int shifted_exp = 0x7c00 << 13; // exponent mask after shift
@@ -493,12 +562,18 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h) {
// --- standard functions ---
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isinf)(const half& a) {
#ifdef EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC
return (numext::bit_cast<numext::uint16_t>(a.x) & 0x7fff) == 0x7c00;
#else
return (a.x & 0x7fff) == 0x7c00;
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isnan)(const half& a) {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __hisnan(a);
#elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
return (numext::bit_cast<numext::uint16_t>(a.x) & 0x7fff) > 0x7c00;
#else
return (a.x & 0x7fff) > 0x7c00;
#endif
@@ -508,9 +583,13 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isfinite)(const half& a) {
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half abs(const half& a) {
#if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
return half(vabsh_f16(a.x));
#else
half result;
result.x = a.x & 0x7FFF;
return result;
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half exp(const half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530) || \
@@ -537,6 +616,10 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log1p(const half& a) {
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log10(const half& a) {
return half(::log10f(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log2(const half& a) {
return half(static_cast<float>(EIGEN_LOG2E) * ::logf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half sqrt(const half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
@@ -560,6 +643,12 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half tan(const half& a) {
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half tanh(const half& a) {
return half(::tanhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half asin(const half& a) {
return half(::asinf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half acos(const half& a) {
return half(::acosf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half floor(const half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
@@ -568,6 +657,9 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half floor(const half& a) {
return half(::floorf(float(a)));
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half rint(const half& a) {
return half(::rintf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half ceil(const half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
@@ -639,55 +731,31 @@ template<> struct NumTraits<Eigen::half>
RequireInitialization = false
};
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half epsilon() {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half epsilon() {
return half_impl::raw_uint16_to_half(0x0800);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half dummy_precision() { return Eigen::half(1e-2f); }
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half highest() {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half dummy_precision() {
return half_impl::raw_uint16_to_half(0x211f); // Eigen::half(1e-2f);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half highest() {
return half_impl::raw_uint16_to_half(0x7bff);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half lowest() {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half lowest() {
return half_impl::raw_uint16_to_half(0xfbff);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half infinity() {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half infinity() {
return half_impl::raw_uint16_to_half(0x7c00);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half quiet_NaN() {
return half_impl::raw_uint16_to_half(0x7c01);
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half quiet_NaN() {
return half_impl::raw_uint16_to_half(0x7e00);
}
};
} // end namespace Eigen
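The `NumTraits<Eigen::half>` members above now return fixed bit patterns. To read those patterns, a plain binary16 decoder is enough. This is a minimal sketch built on `std::ldexp`, not the conversion Eigen uses; `half_bits_to_float` and `check_half_constants` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a binary16 pattern: 1 sign bit, 5 exponent bits (bias 15),
// 10 mantissa bits.
inline float half_bits_to_float(std::uint16_t bits) {
  const int sign = (bits >> 15) & 1;
  const int exponent = (bits >> 10) & 0x1f;
  const int mantissa = bits & 0x3ff;
  float magnitude;
  if (exponent == 0x1f) {
    magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
  } else if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
  }
  return sign ? -magnitude : magnitude;
}

// Decode the patterns returned by the NumTraits members above.
inline void check_half_constants() {
  assert(half_bits_to_float(0x7bff) == 65504.0f);                 // highest()
  assert(half_bits_to_float(0xfbff) == -65504.0f);                // lowest()
  assert(std::fabs(half_bits_to_float(0x211f) - 1e-2f) < 1e-4f);  // dummy_precision()
  assert(std::isinf(half_bits_to_float(0x7c00)));                 // infinity()
  assert(std::isnan(half_bits_to_float(0x7e00)));                 // quiet_NaN()
}
```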
// C-like standard mathematical functions and transcendentals.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half fabsh(const Eigen::half& a) {
Eigen::half result;
result.x = a.x & 0x7FFF;
return result;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half exph(const Eigen::half& a) {
return Eigen::half(::expf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half logh(const Eigen::half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
return Eigen::half(::hlog(a));
#else
return Eigen::half(::logf(float(a)));
#if defined(EIGEN_HAS_GPU_FP16) || defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
#pragma pop_macro("EIGEN_CONSTEXPR")
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half sqrth(const Eigen::half& a) {
return Eigen::half(::sqrtf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half powh(const Eigen::half& a, const Eigen::half& b) {
return Eigen::half(::powf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half floorh(const Eigen::half& a) {
return Eigen::half(::floorf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half ceilh(const Eigen::half& a) {
return Eigen::half(::ceilf(float(a)));
}
namespace std {
@@ -702,55 +770,104 @@ struct hash<Eigen::half> {
} // end namespace std
// Add the missing shfl_xor intrinsic
#if (defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor(Eigen::half var, int laneMask, int width=warpSize) {
#if (EIGEN_CUDA_SDK_VER < 90000) || \
defined(EIGEN_HAS_HIP_FP16)
return static_cast<Eigen::half>(__shfl_xor(static_cast<float>(var), laneMask, width));
#else
return static_cast<Eigen::half>(__shfl_xor_sync(0xFFFFFFFF, static_cast<float>(var), laneMask, width));
#endif
}
#endif
// ldg() has an overload for __half_raw, but we also need one for Eigen::half.
#if (defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 350) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half __ldg(const Eigen::half* ptr) {
return Eigen::half_impl::raw_uint16_to_half(
__ldg(reinterpret_cast<const unsigned short*>(ptr)));
}
#endif
#if defined(EIGEN_GPU_COMPILE_PHASE)
namespace Eigen {
namespace numext {
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isnan)(const Eigen::half& h) {
#if defined(EIGEN_GPU_COMPILE_PHASE)
template <>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isnan)(const Eigen::half& h) {
return (half_impl::isnan)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isinf)(const Eigen::half& h) {
template <>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isinf)(const Eigen::half& h) {
return (half_impl::isinf)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isfinite)(const Eigen::half& h) {
template <>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isfinite)(const Eigen::half& h) {
return (half_impl::isfinite)(h);
}
} // namespace Eigen
} // namespace numext
#endif
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bit_cast<Eigen::half, uint16_t>(const uint16_t& src) {
return Eigen::half(Eigen::half_impl::raw_uint16_to_half(src));
}
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC uint16_t bit_cast<uint16_t, Eigen::half>(const Eigen::half& src) {
return Eigen::half_impl::raw_half_as_uint16(src);
}
} // namespace numext
} // namespace Eigen
// Add the missing shfl* intrinsics.
// The __shfl* functions are only valid on HIP or __CUDA_ARCH__ >= 300.
// CUDA defines them for (__CUDA_ARCH__ >= 300 || !defined(__CUDA_ARCH__))
//
// HIP and CUDA prior to SDK 9.0 define
// __shfl, __shfl_up, __shfl_down, __shfl_xor for int and float
// CUDA since 9.0 deprecates those and instead defines
// __shfl_sync, __shfl_up_sync, __shfl_down_sync, __shfl_xor_sync,
// with native support for __half and __nv_bfloat16
//
// Note that the following are __device__-only functions.
#if (defined(EIGEN_CUDACC) && (!defined(EIGEN_CUDA_ARCH) || EIGEN_CUDA_ARCH >= 300)) \
|| defined(EIGEN_HIPCC)
#if defined(EIGEN_HAS_CUDA_FP16) && EIGEN_CUDA_SDK_VER >= 90000
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_sync(unsigned mask, Eigen::half var, int srcLane, int width=warpSize) {
return static_cast<Eigen::half>(__shfl_sync(mask, static_cast<__half>(var), srcLane, width));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_up_sync(unsigned mask, Eigen::half var, unsigned int delta, int width=warpSize) {
return static_cast<Eigen::half>(__shfl_up_sync(mask, static_cast<__half>(var), delta, width));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_down_sync(unsigned mask, Eigen::half var, unsigned int delta, int width=warpSize) {
return static_cast<Eigen::half>(__shfl_down_sync(mask, static_cast<__half>(var), delta, width));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor_sync(unsigned mask, Eigen::half var, int laneMask, int width=warpSize) {
return static_cast<Eigen::half>(__shfl_xor_sync(mask, static_cast<__half>(var), laneMask, width));
}
#else // HIP or CUDA SDK < 9.0
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl(Eigen::half var, int srcLane, int width=warpSize) {
const int ivar = static_cast<int>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(var));
return Eigen::numext::bit_cast<Eigen::half>(static_cast<Eigen::numext::uint16_t>(__shfl(ivar, srcLane, width)));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_up(Eigen::half var, unsigned int delta, int width=warpSize) {
const int ivar = static_cast<int>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(var));
return Eigen::numext::bit_cast<Eigen::half>(static_cast<Eigen::numext::uint16_t>(__shfl_up(ivar, delta, width)));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_down(Eigen::half var, unsigned int delta, int width=warpSize) {
const int ivar = static_cast<int>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(var));
return Eigen::numext::bit_cast<Eigen::half>(static_cast<Eigen::numext::uint16_t>(__shfl_down(ivar, delta, width)));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor(Eigen::half var, int laneMask, int width=warpSize) {
const int ivar = static_cast<int>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(var));
return Eigen::numext::bit_cast<Eigen::half>(static_cast<Eigen::numext::uint16_t>(__shfl_xor(ivar, laneMask, width)));
}
#endif // HIP vs CUDA
#endif // __shfl*
// ldg() has an overload for __half_raw, but we also need one for Eigen::half.
#if (defined(EIGEN_CUDACC) && (!defined(EIGEN_CUDA_ARCH) || EIGEN_CUDA_ARCH >= 350)) \
|| defined(EIGEN_HIPCC)
EIGEN_STRONG_INLINE __device__ Eigen::half __ldg(const Eigen::half* ptr) {
return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const Eigen::numext::uint16_t*>(ptr)));
}
#endif // __ldg
#endif // EIGEN_HALF_H


@@ -71,6 +71,49 @@ template<>
struct functor_traits<scalar_cast_op<Eigen::half, float> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<float, Eigen::bfloat16> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef Eigen::bfloat16 result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::bfloat16 operator() (const float& a) const {
return Eigen::bfloat16(a);
}
};
template<>
struct functor_traits<scalar_cast_op<float, Eigen::bfloat16> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<int, Eigen::bfloat16> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef Eigen::bfloat16 result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::bfloat16 operator() (const int& a) const {
return Eigen::bfloat16(static_cast<float>(a));
}
};
template<>
struct functor_traits<scalar_cast_op<int, Eigen::bfloat16> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<Eigen::bfloat16, float> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef float result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator() (const Eigen::bfloat16& a) const {
return static_cast<float>(a);
}
};
template<>
struct functor_traits<scalar_cast_op<Eigen::bfloat16, float> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
}
}
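The `scalar_cast_op<int, Eigen::bfloat16>` functor above converts through `float` before narrowing. The sketch below illustrates what that path can do to integers that need more than 8 significant bits. It assumes round-to-nearest-even narrowing, which this diff does not confirm for the `bfloat16(float)` constructor, and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumed narrowing: round-to-nearest-even on the upper 16 bits of a float.
inline std::uint16_t float_to_bf16_bits_rtne(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  if (std::isnan(value)) {
    // Keep the sign and force a quiet NaN so rounding cannot produce inf.
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  const std::uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Widen bfloat16 bits back to float (exact).
inline float bf16_bits_to_float(std::uint16_t bits) {
  const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}

// int -> float -> bfloat16, mirroring the shape of the functor above.
inline float int_to_bf16_value(int a) {
  return bf16_bits_to_float(float_to_bf16_bits_rtne(static_cast<float>(a)));
}

inline void check_int_cast() {
  assert(int_to_bf16_value(256) == 256.0f);  // exactly representable
  assert(int_to_bf16_value(257) == 256.0f);  // tie, rounds to even mantissa
  assert(int_to_bf16_value(259) == 260.0f);  // tie, rounds up to even mantissa
}
```

Under that assumption, integers up to 256 survive the cast unchanged, while larger values are rounded to the nearest value with an 8-bit significand.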

File diff suppressed because it is too large

@@ -17,12 +17,13 @@ namespace internal {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 2,
TgtCoeffRatio = 1
SrcCoeffRatio = 1,
TgtCoeffRatio = 2
};
};
@@ -32,15 +33,39 @@ template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcast<half2, float4>(con
return make_float4(r1.x, r1.y, r2.x, r2.y);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pcast<float4, Packet4h2>(const float4& a, const float4& b) {
Packet4h2 r;
half2* r_alias=reinterpret_cast<half2*>(&r);
r_alias[0]=__floats2half2_rn(a.x,a.y);
r_alias[1]=__floats2half2_rn(a.z,a.w);
r_alias[2]=__floats2half2_rn(b.x,b.y);
r_alias[3]=__floats2half2_rn(b.z,b.w);
return r;
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 2
SrcCoeffRatio = 2,
TgtCoeffRatio = 1
};
};
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcast<Packet4h2, float4>(const Packet4h2& a) {
// Simply discard the second half of the input
float4 r;
const half2* a_alias=reinterpret_cast<const half2*>(&a);
float2 r1 = __half22float2(a_alias[0]);
float2 r2 = __half22float2(a_alias[1]);
r.x=static_cast<float>(r1.x);
r.y=static_cast<float>(r1.y);
r.z=static_cast<float>(r2.x);
r.w=static_cast<float>(r2.y);
return r;
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pcast<float4, half2>(const float4& a) {
// Simply discard the second half of the input
return __floats2half2_rn(a.x, a.y);
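
The `SrcCoeffRatio` / `TgtCoeffRatio` pairs in `type_casting_traits` describe how many source packets map to how many target packets. The sketch below is a scalar model of that idea only; the types `Wide4` and `Narrow8` and both functions are hypothetical stand-ins (a 16-bit integer replaces `half` to keep the example portable), and how the library's evaluators actually consume these ratios is not shown in this diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins: a 4-lane wide packet and an 8-lane narrow packet.
using Wide4 = std::array<float, 4>;
using Narrow8 = std::array<std::int16_t, 8>;

// Ratio 2:1 (SrcCoeffRatio = 2, TgtCoeffRatio = 1):
// two source packets are needed to fill one target packet.
inline Narrow8 cast_two_to_one(const Wide4& a, const Wide4& b) {
  Narrow8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = static_cast<std::int16_t>(a[i]);
    r[i + 4] = static_cast<std::int16_t>(b[i]);
  }
  return r;
}

// Ratio 1:2 (SrcCoeffRatio = 1, TgtCoeffRatio = 2):
// one source packet expands into two target packets.
inline std::array<Wide4, 2> cast_one_to_two(const Narrow8& a) {
  std::array<Wide4, 2> r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[0][i] = static_cast<float>(a[i]);
    r[1][i] = static_cast<float>(a[i + 4]);
  }
  return r;
}

inline void ratio_demo() {
  const Narrow8 n = cast_two_to_one(Wide4{1.0f, 2.0f, 3.0f, 4.0f},
                                    Wide4{5.0f, 6.0f, 7.0f, 8.0f});
  assert(n[0] == 1 && n[7] == 8);  // lanes keep their order across the cast
}
```

Read this way, a cast from an 8-lane half packet to 4-lane float packets is a 1:2 case, and the reverse direction is 2:1.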

@@ -297,20 +297,6 @@ EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packet2cf& a) {
return std::complex<float>(value[0], value[1]);
}
template <>
EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs) {
EIGEN_MSA_DEBUG;
Packet4f sum1, sum2, sum;
// Add the first two 64-bit float32x2_t of vecs[0]
sum1 = (Packet4f)__builtin_msa_ilvr_d((v2i64)vecs[1].v, (v2i64)vecs[0].v);
sum2 = (Packet4f)__builtin_msa_ilvl_d((v2i64)vecs[1].v, (v2i64)vecs[0].v);
sum = padd(sum1, sum2);
return Packet2cf(sum);
}
template <>
EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a) {
EIGEN_MSA_DEBUG;
@@ -319,15 +305,6 @@ EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a
(a.v[0] * a.v[3]) + (a.v[1] * a.v[2]));
}
template <int Offset>
struct palign_impl<Offset, Packet2cf> {
EIGEN_STRONG_INLINE static void run(Packet2cf& first, const Packet2cf& second) {
if (Offset == 1) {
first.v = (Packet4f)__builtin_msa_sldi_b((v16i8)second.v, (v16i8)first.v, Offset * 8);
}
}
};
template <>
struct conj_helper<Packet2cf, Packet2cf, false, true> {
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y,
@@ -660,13 +637,6 @@ EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a) {
return pfirst(a);
}
template <>
EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs) {
EIGEN_MSA_DEBUG;
return vecs[0];
}
template <>
EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) {
EIGEN_MSA_DEBUG;
@@ -674,15 +644,6 @@ EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd&
return pfirst(a);
}
template <int Offset>
struct palign_impl<Offset, Packet1cd> {
static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/) {
// FIXME is it sure we never have to align a Packet1cd?
// Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16 bytes
// boundary...
}
};
template <>
struct conj_helper<Packet1cd, Packet1cd, false, true> {
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y,

@@ -575,45 +575,6 @@ EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a) {
return s[0];
}
template <>
EIGEN_STRONG_INLINE Packet4f preduxp<Packet4f>(const Packet4f* vecs) {
EIGEN_MSA_DEBUG;
v4i32 tmp1, tmp2, tmp3, tmp4;
Packet4f sum;
tmp1 = __builtin_msa_ilvr_w((v4i32)vecs[1], (v4i32)vecs[0]);
tmp2 = __builtin_msa_ilvr_w((v4i32)vecs[3], (v4i32)vecs[2]);
tmp3 = __builtin_msa_ilvl_w((v4i32)vecs[1], (v4i32)vecs[0]);
tmp4 = __builtin_msa_ilvl_w((v4i32)vecs[3], (v4i32)vecs[2]);
sum = (Packet4f)__builtin_msa_ilvr_d((v2i64)tmp2, (v2i64)tmp1);
sum = padd(sum, (Packet4f)__builtin_msa_ilvod_d((v2i64)tmp2, (v2i64)tmp1));
sum = padd(sum, (Packet4f)__builtin_msa_ilvr_d((v2i64)tmp4, (v2i64)tmp3));
sum = padd(sum, (Packet4f)__builtin_msa_ilvod_d((v2i64)tmp4, (v2i64)tmp3));
return sum;
}
template <>
EIGEN_STRONG_INLINE Packet4i preduxp<Packet4i>(const Packet4i* vecs) {
EIGEN_MSA_DEBUG;
v4i32 tmp1, tmp2, tmp3, tmp4;
Packet4i sum;
tmp1 = __builtin_msa_ilvr_w((v4i32)vecs[1], (v4i32)vecs[0]);
tmp2 = __builtin_msa_ilvr_w((v4i32)vecs[3], (v4i32)vecs[2]);
tmp3 = __builtin_msa_ilvl_w((v4i32)vecs[1], (v4i32)vecs[0]);
tmp4 = __builtin_msa_ilvl_w((v4i32)vecs[3], (v4i32)vecs[2]);
sum = (Packet4i)__builtin_msa_ilvr_d((v2i64)tmp2, (v2i64)tmp1);
sum = padd(sum, (Packet4i)__builtin_msa_ilvod_d((v2i64)tmp2, (v2i64)tmp1));
sum = padd(sum, (Packet4i)__builtin_msa_ilvr_d((v2i64)tmp4, (v2i64)tmp3));
sum = padd(sum, (Packet4i)__builtin_msa_ilvod_d((v2i64)tmp4, (v2i64)tmp3));
return sum;
}
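
For readers unfamiliar with `preduxp`, the interleave sequences above compute one horizontal sum per input packet: lane `i` of the result holds the sum of all lanes of `vecs[i]`. The following is a scalar sketch of that behavior, not the MSA implementation; `preduxp_model` and `preduxp_demo` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model: out[i] is the horizontal sum of vecs[i].
template <typename T, std::size_t N>
std::array<T, N> preduxp_model(const std::array<std::array<T, N>, N>& vecs) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      out[i] += vecs[i][j];
    }
  }
  return out;
}

// Fixed 4x4 input so the result is easy to check by hand.
inline std::array<int, 4> preduxp_demo() {
  const std::array<std::array<int, 4>, 4> vecs = {{
      {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}}};
  const std::array<int, 4> sums = preduxp_model(vecs);
  assert(sums[0] == 1 + 2 + 3 + 4);
  return sums;
}
```

The vectorized versions reach the same result by transposing the four packets with interleaves and then adding the four transposed rows.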
template <>
EIGEN_STRONG_INLINE int32_t predux<Packet4i>(const Packet4i& a) {
@@ -714,25 +675,6 @@ EIGEN_STRONG_INLINE int32_t predux_max<Packet4i>(const Packet4i& a) {
return m[0];
}
#define PALIGN_MSA(Offset, Type, Command) \
template <> \
struct palign_impl<Offset, Type> { \
EIGEN_STRONG_INLINE static void run(Type& first, const Type& second) { \
if (Offset != 0) first = (Type)(Command((v16i8)second, (v16i8)first, Offset * 4)); \
} \
};
PALIGN_MSA(0, Packet4f, __builtin_msa_sldi_b)
PALIGN_MSA(1, Packet4f, __builtin_msa_sldi_b)
PALIGN_MSA(2, Packet4f, __builtin_msa_sldi_b)
PALIGN_MSA(3, Packet4f, __builtin_msa_sldi_b)
PALIGN_MSA(0, Packet4i, __builtin_msa_sldi_b)
PALIGN_MSA(1, Packet4i, __builtin_msa_sldi_b)
PALIGN_MSA(2, Packet4i, __builtin_msa_sldi_b)
PALIGN_MSA(3, Packet4i, __builtin_msa_sldi_b)
#undef PALIGN_MSA
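
The `PALIGN_MSA` specializations shift `first` left by `Offset` elements and fill the vacated lanes from the start of `second`. A scalar sketch of that behavior, as I understand it, follows; `palign_model` and `palign_demo` are hypothetical names, and the model makes no claim about how the slide instruction orders its operands.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model: first becomes [first[Offset..N-1], second[0..Offset-1]].
template <std::size_t Offset, typename T, std::size_t N>
void palign_model(std::array<T, N>& first, const std::array<T, N>& second) {
  static_assert(Offset < N, "Offset must be smaller than the packet size");
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t k = i + Offset;
    out[i] = (k < N) ? first[k] : second[k - N];
  }
  first = out;
}

// Applies the model to two fixed packets for a given Offset.
template <std::size_t Offset>
std::array<int, 4> palign_demo() {
  std::array<int, 4> first = {0, 1, 2, 3};
  const std::array<int, 4> second = {4, 5, 6, 7};
  palign_model<Offset>(first, second);
  assert(first[0] == static_cast<int>(Offset));  // first lane is old lane Offset
  return first;
}
```

With `Offset == 0` the model leaves `first` untouched, which matches the `if (Offset != 0)` guard in the macro.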
inline std::ostream& operator<<(std::ostream& os, const PacketBlock<Packet4f, 4>& value) {
os << "[ " << value.packet[0] << "," << std::endl
<< " " << value.packet[1] << "," << std::endl
@@ -1148,16 +1090,6 @@ EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a) {
return s[0];
}
template <>
EIGEN_STRONG_INLINE Packet2d preduxp<Packet2d>(const Packet2d* vecs) {
EIGEN_MSA_DEBUG;
Packet2d v0 = (Packet2d)__builtin_msa_ilvev_d((v2i64)vecs[1], (v2i64)vecs[0]);
Packet2d v1 = (Packet2d)__builtin_msa_ilvod_d((v2i64)vecs[1], (v2i64)vecs[0]);
return padd(v0, v1);
}
// Other reduction functions:
// mul
template <>
@@ -1217,19 +1149,6 @@ EIGEN_STRONG_INLINE Packet2d prsqrt(const Packet2d& a) {
#endif
}
#define PALIGN_MSA(Offset, Type, Command) \
template <> \
struct palign_impl<Offset, Type> { \
EIGEN_STRONG_INLINE static void run(Type& first, const Type& second) { \
if (Offset != 0) first = (Type)(Command((v16i8)second, (v16i8)first, Offset * 8)); \
} \
};
PALIGN_MSA(0, Packet2d, __builtin_msa_sldi_b)
PALIGN_MSA(1, Packet2d, __builtin_msa_sldi_b)
#undef PALIGN_MSA
inline std::ostream& operator<<(std::ostream& os, const PacketBlock<Packet2d, 2>& value) {
os << "[ " << value.packet[0] << "," << std::endl << " " << value.packet[1] << " ]";
return os;

@@ -15,9 +15,10 @@ namespace Eigen {
namespace internal {
inline uint32x4_t p4ui_CONJ_XOR() {
inline uint32x4_t p4ui_CONJ_XOR()
{
// See bug 1325, clang fails to call vld1q_u64.
#if EIGEN_COMP_CLANG
#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML
uint32x4_t ret = { 0x00000000, 0x80000000, 0x00000000, 0x80000000 };
return ret;
#else
@@ -26,61 +27,143 @@ inline uint32x4_t p4ui_CONJ_XOR() {
#endif
}
inline uint32x2_t p2ui_CONJ_XOR() {
inline uint32x2_t p2ui_CONJ_XOR()
{
static const uint32_t conj_XOR_DATA[] = { 0x00000000, 0x80000000 };
return vld1_u32( conj_XOR_DATA );
}
//---------- float ----------
struct Packet1cf
{
EIGEN_STRONG_INLINE Packet1cf() {}
EIGEN_STRONG_INLINE explicit Packet1cf(const Packet2f& a) : v(a) {}
Packet2f v;
};
struct Packet2cf
{
EIGEN_STRONG_INLINE Packet2cf() {}
EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {}
Packet4f v;
Packet4f v;
};
template<> struct packet_traits<std::complex<float> > : default_packet_traits
template<> struct packet_traits<std::complex<float> > : default_packet_traits
{
typedef Packet2cf type;
typedef Packet2cf half;
enum {
typedef Packet1cf half;
enum
{
Vectorizable = 1,
AlignedOnScalar = 1,
size = 2,
HasHalfPacket = 0,
HasHalfPacket = 1,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
template<> struct unpacket_traits<Packet1cf>
{
float32x2_t r64;
r64 = vld1_f32((const float *)&from);
typedef std::complex<float> type;
typedef Packet1cf half;
typedef Packet2f as_real;
enum
{
size = 1,
alignment = Aligned16,
vectorizable = true,
masked_load_available = false,
masked_store_available = false
};
};
template<> struct unpacket_traits<Packet2cf>
{
typedef std::complex<float> type;
typedef Packet1cf half;
typedef Packet4f as_real;
enum
{
size = 2,
alignment = Aligned16,
vectorizable = true,
masked_load_available = false,
masked_store_available = false
};
};
template<> EIGEN_STRONG_INLINE Packet1cf pcast<float,Packet1cf>(const float& a)
{ return Packet1cf(vset_lane_f32(a, vdup_n_f32(0.f), 0)); }
template<> EIGEN_STRONG_INLINE Packet2cf pcast<Packet2f,Packet2cf>(const Packet2f& a)
{ return Packet2cf(vreinterpretq_f32_u64(vmovl_u32(vreinterpret_u32_f32(a)))); }
template<> EIGEN_STRONG_INLINE Packet1cf pset1<Packet1cf>(const std::complex<float>& from)
{ return Packet1cf(vld1_f32(reinterpret_cast<const float*>(&from))); }
template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
{
const float32x2_t r64 = vld1_f32(reinterpret_cast<const float*>(&from));
return Packet2cf(vcombine_f32(r64, r64));
}
template<> EIGEN_STRONG_INLINE Packet2cf padd<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(padd<Packet4f>(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(psub<Packet4f>(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cf padd<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{ return Packet1cf(padd<Packet2f>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf padd<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{ return Packet2cf(padd<Packet4f>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cf psub<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{ return Packet1cf(psub<Packet2f>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{ return Packet2cf(psub<Packet4f>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pxor<Packet2cf>(const Packet2cf& a, const Packet2cf& b);
template<> EIGEN_STRONG_INLINE Packet2cf paddsub<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
Packet4f mask = {-0.0f, -0.0f, 0.0f, 0.0f};
return Packet2cf(padd(a.v, pxor(mask, b.v)));
}
template<> EIGEN_STRONG_INLINE Packet1cf pnegate(const Packet1cf& a) { return Packet1cf(pnegate<Packet2f>(a.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate<Packet4f>(a.v)); }
template<> EIGEN_STRONG_INLINE Packet1cf pconj(const Packet1cf& a)
{
const Packet2ui b = vreinterpret_u32_f32(a.v);
return Packet1cf(vreinterpret_f32_u32(veor_u32(b, p2ui_CONJ_XOR())));
}
template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a)
{
Packet4ui b = vreinterpretq_u32_f32(a.v);
const Packet4ui b = vreinterpretq_u32_f32(a.v);
return Packet2cf(vreinterpretq_f32_u32(veorq_u32(b, p4ui_CONJ_XOR())));
}
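
Both `pconj` overloads conjugate by XOR-ing the packet with a mask whose only set bits are the sign bits of the imaginary lanes (`0x80000000` in every odd 32-bit lane). A portable scalar sketch of the same bit trick is shown below; `Cplx2` and `pconj_model` are hypothetical names, and the layout `[re0, im0, re1, im1]` is assumed to match the packet.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Two interleaved complex numbers: [re0, im0, re1, im1].
using Cplx2 = std::array<float, 4>;

// Flip the sign bit of every imaginary lane by XOR-ing with a mask.
inline Cplx2 pconj_model(Cplx2 a) {
  static const std::uint32_t mask[4] = {0x00000000u, 0x80000000u,
                                        0x00000000u, 0x80000000u};
  for (std::size_t i = 0; i < 4; ++i) {
    std::uint32_t bits;
    std::memcpy(&bits, &a[i], sizeof(bits));
    bits ^= mask[i];
    std::memcpy(&a[i], &bits, sizeof(bits));
  }
  return a;
}

inline void pconj_demo() {
  const Cplx2 r = pconj_model(Cplx2{1.5f, -2.0f, 0.25f, 3.0f});
  assert(r[0] == 1.5f && r[1] == 2.0f);  // real kept, imaginary negated
}
```

Because only the sign bit changes, the operation is exact and needs no floating-point arithmetic.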
template<> EIGEN_STRONG_INLINE Packet1cf pmul<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{
Packet2f v1, v2;
// Get the real values of a | a1_re | a1_re |
v1 = vdup_lane_f32(a.v, 0);
// Get the imag values of a | a1_im | a1_im |
v2 = vdup_lane_f32(a.v, 1);
// Multiply the real a with b
v1 = vmul_f32(v1, b.v);
// Multiply the imag a with b
v2 = vmul_f32(v2, b.v);
// Conjugate v2
v2 = vreinterpret_f32_u32(veor_u32(vreinterpret_u32_f32(v2), p2ui_CONJ_XOR()));
// Swap real/imag elements in v2.
v2 = vrev64_f32(v2);
// Add and return the result
return Packet1cf(vadd_f32(v1, v2));
}
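
The comment sequence in `pmul<Packet1cf>` (duplicate real, duplicate imaginary, multiply, conjugate, swap, add) is easier to follow in scalar form. The sketch below replays those steps on a two-element array and is only a model of the arithmetic; `Cplx1` and `pmul_model` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <utility>

// One complex number stored as [re, im].
using Cplx1 = std::array<float, 2>;

inline Cplx1 pmul_model(const Cplx1& a, const Cplx1& b) {
  Cplx1 v1 = {a[0] * b[0], a[0] * b[1]};  // re(a) * b
  Cplx1 v2 = {a[1] * b[0], a[1] * b[1]};  // im(a) * b
  v2[1] = -v2[1];                         // conjugate v2
  std::swap(v2[0], v2[1]);                // swap real/imag: [-im*im, im*re]
  return Cplx1{v1[0] + v2[0], v1[1] + v2[1]};
}

inline void pmul_demo() {
  // (1 + 2i) * (3 + 4i) = (3 - 8) + (4 + 6)i = -5 + 10i
  const Cplx1 r = pmul_model(Cplx1{1.0f, 2.0f}, Cplx1{3.0f, 4.0f});
  assert(r[0] == -5.0f && r[1] == 10.0f);
}
```

After the swap, `v2` holds `[-im(a)*im(b), im(a)*re(b)]`, so the final addition yields the usual complex product.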
template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
Packet4f v1, v2;
@@ -93,7 +176,7 @@ template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, con
v1 = vmulq_f32(v1, b.v);
// Multiply the imag a with b
v2 = vmulq_f32(v2, b.v);
// Conjugate v2
// Conjugate v2
v2 = vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(v2), p4ui_CONJ_XOR()));
// Swap real/imag elements in v2.
v2 = vrev64q_f32(v2);
@@ -101,6 +184,17 @@ template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, con
return Packet2cf(vaddq_f32(v1, v2));
}
template<> EIGEN_STRONG_INLINE Packet1cf pcmp_eq(const Packet1cf& a, const Packet1cf& b)
{
// Compare real and imaginary parts of a and b to get the mask vector:
// [re(a[0])==re(b[0]), im(a[0])==im(b[0])]
Packet2f eq = pcmp_eq<Packet2f>(a.v, b.v);
// Swap real/imag elements in the mask to get:
// [im(a[0])==im(b[0]), re(a[0])==re(b[0])]
Packet2f eq_swapped = vrev64_f32(eq);
// Return re(a)==re(b) && im(a)==im(b) by computing bitwise AND of eq and eq_swapped
return Packet1cf(pand<Packet2f>(eq, eq_swapped));
}
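
The mask logic in `pcmp_eq` can be checked with a small scalar model: each lane of the per-component comparison is all ones or all zeros, and AND-ing the mask with its swapped copy makes both lanes true only when the real and imaginary parts both match. `Mask2` and `pcmp_eq_model` are hypothetical names used for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mask2 = std::array<std::uint32_t, 2>;

inline Mask2 pcmp_eq_model(const std::array<float, 2>& a,
                           const std::array<float, 2>& b) {
  // Per-component comparison: [re(a)==re(b), im(a)==im(b)]
  const Mask2 eq = {a[0] == b[0] ? 0xFFFFFFFFu : 0u,
                    a[1] == b[1] ? 0xFFFFFFFFu : 0u};
  // Swapped copy: [im(a)==im(b), re(a)==re(b)]
  const Mask2 eq_swapped = {eq[1], eq[0]};
  // Both lanes are set only if both components compared equal.
  return Mask2{eq[0] & eq_swapped[0], eq[1] & eq_swapped[1]};
}

inline void pcmp_eq_demo() {
  const Mask2 m = pcmp_eq_model({1.0f, 2.0f}, {1.0f, 3.0f});
  assert(m[0] == 0u && m[1] == 0u);  // imaginary parts differ
}
```

The result therefore acts as a per-complex-number mask rather than a per-float mask.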
template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b)
{
// Compare real and imaginary parts of a and b to get the mask vector:
@@ -113,98 +207,121 @@ template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packe
return Packet2cf(pand<Packet4f>(eq, eq_swapped));
}
template<> EIGEN_STRONG_INLINE Packet2cf pand <Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
return Packet2cf(vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(a.v),vreinterpretq_u32_f32(b.v))));
}
template<> EIGEN_STRONG_INLINE Packet2cf por <Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
return Packet2cf(vreinterpretq_f32_u32(vorrq_u32(vreinterpretq_u32_f32(a.v),vreinterpretq_u32_f32(b.v))));
}
template<> EIGEN_STRONG_INLINE Packet2cf pxor <Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
return Packet2cf(vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(a.v),vreinterpretq_u32_f32(b.v))));
}
template<> EIGEN_STRONG_INLINE Packet1cf pand<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{ return Packet1cf(vreinterpret_f32_u32(vand_u32(vreinterpret_u32_f32(a.v), vreinterpret_u32_f32(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet2cf pand<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{ return Packet2cf(vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(a.v), vreinterpretq_u32_f32(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet1cf por<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{ return Packet1cf(vreinterpret_f32_u32(vorr_u32(vreinterpret_u32_f32(a.v), vreinterpret_u32_f32(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet2cf por<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{ return Packet2cf(vreinterpretq_f32_u32(vorrq_u32(vreinterpretq_u32_f32(a.v), vreinterpretq_u32_f32(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet1cf pxor<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{ return Packet1cf(vreinterpret_f32_u32(veor_u32(vreinterpret_u32_f32(a.v), vreinterpret_u32_f32(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet2cf pxor<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{ return Packet2cf(vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(a.v), vreinterpretq_u32_f32(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet1cf pandnot<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{ return Packet1cf(vreinterpret_f32_u32(vbic_u32(vreinterpret_u32_f32(a.v), vreinterpret_u32_f32(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet2cf pandnot<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{ return Packet2cf(vreinterpretq_f32_u32(vbicq_u32(vreinterpretq_u32_f32(a.v), vreinterpretq_u32_f32(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet1cf pload<Packet1cf>(const std::complex<float>* from)
{ EIGEN_DEBUG_ALIGNED_LOAD return Packet1cf(pload<Packet2f>((const float*)from)); }
template<> EIGEN_STRONG_INLINE Packet2cf pload<Packet2cf>(const std::complex<float>* from)
{ EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload<Packet4f>(reinterpret_cast<const float*>(from))); }
template<> EIGEN_STRONG_INLINE Packet1cf ploadu<Packet1cf>(const std::complex<float>* from)
{ EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cf(ploadu<Packet2f>((const float*)from)); }
template<> EIGEN_STRONG_INLINE Packet2cf ploadu<Packet2cf>(const std::complex<float>* from)
{ EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu<Packet4f>(reinterpret_cast<const float*>(from))); }
template<> EIGEN_STRONG_INLINE Packet1cf ploaddup<Packet1cf>(const std::complex<float>* from)
{ return pset1<Packet1cf>(*from); }
template<> EIGEN_STRONG_INLINE Packet2cf ploaddup<Packet2cf>(const std::complex<float>* from)
{ return pset1<Packet2cf>(*from); }
template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float> *to, const Packet1cf& from)
{ EIGEN_DEBUG_ALIGNED_STORE pstore((float*)to, from.v); }
template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float> *to, const Packet2cf& from)
{ EIGEN_DEBUG_ALIGNED_STORE pstore(reinterpret_cast<float*>(to), from.v); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float> *to, const Packet1cf& from)
{ EIGEN_DEBUG_UNALIGNED_STORE pstoreu((float*)to, from.v); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float> *to, const Packet2cf& from)
{ EIGEN_DEBUG_UNALIGNED_STORE pstoreu(reinterpret_cast<float*>(to), from.v); }
template<> EIGEN_DEVICE_FUNC inline Packet1cf pgather<std::complex<float>, Packet1cf>(
const std::complex<float>* from, Index stride)
{
return Packet2cf(vreinterpretq_f32_u32(vbicq_u32(vreinterpretq_u32_f32(a.v),vreinterpretq_u32_f32(b.v))));
const Packet2f tmp = vdup_n_f32(std::real(from[0*stride]));
return Packet1cf(vset_lane_f32(std::imag(from[0*stride]), tmp, 1));
}
template<> EIGEN_STRONG_INLINE Packet2cf pload<Packet2cf>(const std::complex<float>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload<Packet4f>((const float*)from)); }
template<> EIGEN_STRONG_INLINE Packet2cf ploadu<Packet2cf>(const std::complex<float>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu<Packet4f>((const float*)from)); }
template<> EIGEN_STRONG_INLINE Packet2cf ploaddup<Packet2cf>(const std::complex<float>* from) { return pset1<Packet2cf>(*from); }
template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((float*)to, from.v); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((float*)to, from.v); }
template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather<std::complex<float>, Packet2cf>(const std::complex<float>* from, Index stride)
template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather<std::complex<float>, Packet2cf>(
const std::complex<float>* from, Index stride)
{
Packet4f res = pset1<Packet4f>(0.f);
res = vsetq_lane_f32(std::real(from[0*stride]), res, 0);
Packet4f res = vdupq_n_f32(std::real(from[0*stride]));
res = vsetq_lane_f32(std::imag(from[0*stride]), res, 1);
res = vsetq_lane_f32(std::real(from[1*stride]), res, 2);
res = vsetq_lane_f32(std::imag(from[1*stride]), res, 3);
return Packet2cf(res);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet2cf>(std::complex<float>* to, const Packet2cf& from, Index stride)
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet1cf>(
std::complex<float>* to, const Packet1cf& from, Index stride)
{ to[stride*0] = std::complex<float>(vget_lane_f32(from.v, 0), vget_lane_f32(from.v, 1)); }
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet2cf>(
std::complex<float>* to, const Packet2cf& from, Index stride)
{
to[stride*0] = std::complex<float>(vgetq_lane_f32(from.v, 0), vgetq_lane_f32(from.v, 1));
to[stride*1] = std::complex<float>(vgetq_lane_f32(from.v, 2), vgetq_lane_f32(from.v, 3));
}
template<> EIGEN_STRONG_INLINE void prefetch<std::complex<float> >(const std::complex<float> * addr) { EIGEN_ARM_PREFETCH((const float *)addr); }
template<> EIGEN_STRONG_INLINE void prefetch<std::complex<float> >(const std::complex<float> *addr)
{ EIGEN_ARM_PREFETCH(reinterpret_cast<const float*>(addr)); }
template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet2cf>(const Packet2cf& a)
template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet1cf>(const Packet1cf& a)
{
EIGEN_ALIGN16 std::complex<float> x;
vst1_f32(reinterpret_cast<float*>(&x), a.v);
return x;
}
template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet2cf>(const Packet2cf& a)
{
EIGEN_ALIGN16 std::complex<float> x[2];
vst1q_f32((float *)x, a.v);
vst1q_f32(reinterpret_cast<float*>(x), a.v);
return x[0];
}
template<> EIGEN_STRONG_INLINE Packet1cf preverse(const Packet1cf& a) { return a; }
template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a)
{
float32x2_t a_lo, a_hi;
Packet4f a_r128;
a_lo = vget_low_f32(a.v);
a_hi = vget_high_f32(a.v);
a_r128 = vcombine_f32(a_hi, a_lo);
return Packet2cf(a_r128);
}
{ return Packet2cf(vcombine_f32(vget_high_f32(a.v), vget_low_f32(a.v))); }
template<> EIGEN_STRONG_INLINE Packet1cf pcplxflip<Packet1cf>(const Packet1cf& a)
{ return Packet1cf(vrev64_f32(a.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pcplxflip<Packet2cf>(const Packet2cf& a)
{
return Packet2cf(vrev64q_f32(a.v));
}
{ return Packet2cf(vrev64q_f32(a.v)); }
template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet1cf>(const Packet1cf& a)
{
std::complex<float> s;
vst1_f32((float *)&s, a.v);
return s;
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packet2cf& a)
{
float32x2_t a1, a2;
std::complex<float> s;
a1 = vget_low_f32(a.v);
a2 = vget_high_f32(a.v);
a2 = vadd_f32(a1, a2);
vst1_f32((float *)&s, a2);
vst1_f32(reinterpret_cast<float*>(&s), vadd_f32(vget_low_f32(a.v), vget_high_f32(a.v)));
return s;
}
template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet1cf>(const Packet1cf& a)
{
Packet4f sum1, sum2, sum;
// Add the first two 64-bit float32x2_t of vecs[0]
sum1 = vcombine_f32(vget_low_f32(vecs[0].v), vget_low_f32(vecs[1].v));
sum2 = vcombine_f32(vget_high_f32(vecs[0].v), vget_high_f32(vecs[1].v));
sum = vaddq_f32(sum1, sum2);
return Packet2cf(sum);
std::complex<float> s;
vst1_f32((float *)&s, a.v);
return s;
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
{
float32x2_t a1, a2, v1, v2, prod;
@@ -220,90 +337,121 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
v1 = vmul_f32(v1, a2);
// Multiply the imag a with b
v2 = vmul_f32(v2, a2);
// Conjugate v2
// Conjugate v2
v2 = vreinterpret_f32_u32(veor_u32(vreinterpret_u32_f32(v2), p2ui_CONJ_XOR()));
// Swap real/imag elements in v2.
v2 = vrev64_f32(v2);
// Add v1, v2
prod = vadd_f32(v1, v2);
vst1_f32((float *)&s, prod);
vst1_f32(reinterpret_cast<float*>(&s), prod);
return s;
}
template<int Offset>
struct palign_impl<Offset,Packet2cf>
template<> struct conj_helper<Packet1cf,Packet1cf,false,true>
{
EIGEN_STRONG_INLINE static void run(Packet2cf& first, const Packet2cf& second)
{
if (Offset==1)
{
first.v = vextq_f32(first.v, second.v, 2);
}
}
EIGEN_STRONG_INLINE Packet1cf pmadd(const Packet1cf& x, const Packet1cf& y, const Packet1cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cf pmul(const Packet1cf& a, const Packet1cf& b) const
{ return internal::pmul(a, pconj(b)); }
};
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
template<> struct conj_helper<Packet1cf,Packet1cf,true,false>
{
EIGEN_STRONG_INLINE Packet1cf pmadd(const Packet1cf& x, const Packet1cf& y, const Packet1cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cf pmul(const Packet1cf& a, const Packet1cf& b) const
{ return internal::pmul(pconj(a), b); }
};
template<> struct conj_helper<Packet1cf,Packet1cf,true,true>
{
EIGEN_STRONG_INLINE Packet1cf pmadd(const Packet1cf& x, const Packet1cf& y, const Packet1cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cf pmul(const Packet1cf& a, const Packet1cf& b) const
{ return pconj(internal::pmul(a,b)); }
};
template<> struct conj_helper<Packet2cf,Packet2cf,false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(a, pconj(b));
}
{ return internal::pmul(a, pconj(b)); }
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
template<> struct conj_helper<Packet2cf,Packet2cf,true,false>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(pconj(a), b);
}
{ return internal::pmul(pconj(a), b); }
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
template<> struct conj_helper<Packet2cf,Packet2cf,true,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return pconj(internal::pmul(a, b));
}
{ return pconj(internal::pmul(a,b)); }
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cf,Packet2f)
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f)
template<> EIGEN_STRONG_INLINE Packet1cf pdiv<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{
// TODO optimize it for NEON
Packet1cf res = conj_helper<Packet1cf, Packet1cf, false, true>().pmul(a,b);
Packet2f s, rev_s;
// this computes the norm
s = vmul_f32(b.v, b.v);
rev_s = vrev64_f32(s);
return Packet1cf(pdiv<Packet2f>(res.v, vadd_f32(s, rev_s)));
}
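
`pdiv` relies on the identity a / b = a * conj(b) / |b|^2, where the squared norm is obtained by adding the element-wise square of `b` to its real/imag-swapped copy. The scalar sketch below follows the same steps; `Cplx1` and `pdiv_model` are hypothetical names, and no attempt is made to guard against overflow or a zero divisor.

```cpp
#include <array>
#include <cassert>

// One complex number stored as [re, im].
using Cplx1 = std::array<float, 2>;

inline Cplx1 pdiv_model(const Cplx1& a, const Cplx1& b) {
  // res = a * conj(b)
  const Cplx1 res = {a[0] * b[0] + a[1] * b[1], a[1] * b[0] - a[0] * b[1]};
  // s = [re(b)^2, im(b)^2], rev_s = [im(b)^2, re(b)^2]
  const Cplx1 s = {b[0] * b[0], b[1] * b[1]};
  const Cplx1 rev_s = {s[1], s[0]};
  // Both lanes of the sum hold |b|^2.
  const Cplx1 norm = {s[0] + rev_s[0], s[1] + rev_s[1]};
  return Cplx1{res[0] / norm[0], res[1] / norm[1]};
}

inline void pdiv_demo() {
  // (3 + i) / (1 + i) = (3 + i)(1 - i) / 2 = (4 - 2i) / 2 = 2 - i
  const Cplx1 q = pdiv_model(Cplx1{3.0f, 1.0f}, Cplx1{1.0f, 1.0f});
  assert(q[0] == 2.0f && q[1] == -1.0f);
}
```

The example values are chosen so that every intermediate result is exactly representable in `float`.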
template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
// TODO optimize it for NEON
Packet2cf res = conj_helper<Packet2cf,Packet2cf,false,true>().pmul(a,b);
Packet2cf res = conj_helper<Packet2cf, Packet2cf, false, true>().pmul(a,b);
Packet4f s, rev_s;
// this computes the norm
s = vmulq_f32(b.v, b.v);
rev_s = vrev64q_f32(s);
return Packet2cf(pdiv<Packet4f>(res.v, vaddq_f32(s,rev_s)));
return Packet2cf(pdiv<Packet4f>(res.v, vaddq_f32(s, rev_s)));
}
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<Packet2cf,2>& kernel) {
EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock<Packet1cf, 1>& /*kernel*/) {}
EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock<Packet2cf, 2>& kernel)
{
Packet4f tmp = vcombine_f32(vget_high_f32(kernel.packet[0].v), vget_high_f32(kernel.packet[1].v));
kernel.packet[0].v = vcombine_f32(vget_low_f32(kernel.packet[0].v), vget_low_f32(kernel.packet[1].v));
kernel.packet[1].v = tmp;
}
template<> EIGEN_STRONG_INLINE Packet1cf psqrt<Packet1cf>(const Packet1cf& a) {
return psqrt_complex<Packet1cf>(a);
}
template<> EIGEN_STRONG_INLINE Packet2cf psqrt<Packet2cf>(const Packet2cf& a) {
return psqrt_complex<Packet2cf>(a);
}
//---------- double ----------
#if EIGEN_ARCH_ARM64 && !EIGEN_APPLE_DOUBLE_NEON_BUG
// See bug 1325, clang fails to call vld1q_u64.
#if EIGEN_COMP_CLANG
#if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML
static uint64x2_t p2ul_CONJ_XOR = {0x0, 0x8000000000000000};
#else
const uint64_t p2ul_conj_XOR_DATA[] = { 0x0, 0x8000000000000000 };
@@ -321,7 +469,8 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
{
typedef Packet1cd type;
typedef Packet1cd half;
enum {
enum
{
Vectorizable = 1,
AlignedOnScalar = 0,
size = 1,
@@ -340,24 +489,50 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
};
};
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet1cd>
{
typedef std::complex<double> type;
typedef Packet1cd half;
typedef Packet2d as_real;
enum
{
size=1,
alignment=Aligned16,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet1cd pload<Packet1cd>(const std::complex<double>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload<Packet2d>((const double*)from)); }
template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu<Packet2d>((const double*)from)); }
template<> EIGEN_STRONG_INLINE Packet1cd pload<Packet1cd>(const std::complex<double>* from)
{ EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload<Packet2d>(reinterpret_cast<const double*>(from))); }
template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<double>& from)
{ /* here we really have to use unaligned loads :( */ return ploadu<Packet1cd>(&from); }
template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from)
{ EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu<Packet2d>(reinterpret_cast<const double*>(from))); }
template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(padd<Packet2d>(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd psub<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(psub<Packet2d>(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate<Packet2d>(a.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { return Packet1cd(vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a.v), p2ul_CONJ_XOR))); }
template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<double>& from)
{
/* here we really have to use unaligned loads :( */
return ploadu<Packet1cd>(&from);
}
template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{ return Packet1cd(padd<Packet2d>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd psub<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{ return Packet1cd(psub<Packet2d>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a)
{ return Packet1cd(pnegate<Packet2d>(a.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a)
{ return Packet1cd(vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a.v), p2ul_CONJ_XOR))); }
template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
Packet2d v1, v2;
// Get the real values of a
// Get the real values of a
v1 = vdupq_lane_f64(vget_low_f64(a.v), 0);
// Get the imag values of a
v2 = vdupq_lane_f64(vget_high_f64(a.v), 0);
@@ -365,7 +540,7 @@ template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, con
v1 = vmulq_f64(v1, b.v);
// Multiply the imag a with b
v2 = vmulq_f64(v2, b.v);
// Conjugate v2
// Conjugate v2
v2 = vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(v2), p2ul_CONJ_XOR));
// Swap real/imag elements in v2.
v2 = preverse<Packet2d>(v2);
@@ -385,31 +560,32 @@ template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packe
return Packet1cd(pand<Packet2d>(eq, eq_swapped));
}
template<> EIGEN_STRONG_INLINE Packet1cd pand <Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
return Packet1cd(vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v))));
}
template<> EIGEN_STRONG_INLINE Packet1cd por <Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
return Packet1cd(vreinterpretq_f64_u64(vorrq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v))));
}
template<> EIGEN_STRONG_INLINE Packet1cd pxor <Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
return Packet1cd(vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v))));
}
template<> EIGEN_STRONG_INLINE Packet1cd pand<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{ return Packet1cd(vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet1cd por<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{ return Packet1cd(vreinterpretq_f64_u64(vorrq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet1cd pxor<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{ return Packet1cd(vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet1cd pandnot<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
return Packet1cd(vreinterpretq_f64_u64(vbicq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v))));
}
{ return Packet1cd(vreinterpretq_f64_u64(vbicq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v)))); }
template<> EIGEN_STRONG_INLINE Packet1cd ploaddup<Packet1cd>(const std::complex<double>* from) { return pset1<Packet1cd>(*from); }
template<> EIGEN_STRONG_INLINE Packet1cd ploaddup<Packet1cd>(const std::complex<double>* from)
{ return pset1<Packet1cd>(*from); }
template<> EIGEN_STRONG_INLINE void pstore <std::complex<double> >(std::complex<double> * to, const Packet1cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<double> * to, const Packet1cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); }
template<> EIGEN_STRONG_INLINE void pstore <std::complex<double> >(std::complex<double> *to, const Packet1cd& from)
{ EIGEN_DEBUG_ALIGNED_STORE pstore(reinterpret_cast<double*>(to), from.v); }
template<> EIGEN_STRONG_INLINE void prefetch<std::complex<double> >(const std::complex<double> * addr) { EIGEN_ARM_PREFETCH((const double *)addr); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<double> *to, const Packet1cd& from)
{ EIGEN_DEBUG_UNALIGNED_STORE pstoreu(reinterpret_cast<double*>(to), from.v); }
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, Index stride)
template<> EIGEN_STRONG_INLINE void prefetch<std::complex<double> >(const std::complex<double> *addr)
{ EIGEN_ARM_PREFETCH(reinterpret_cast<const double*>(addr)); }
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(
const std::complex<double>* from, Index stride)
{
Packet2d res = pset1<Packet2d>(0.0);
res = vsetq_lane_f64(std::real(from[0*stride]), res, 0);
@@ -417,17 +593,14 @@ template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Pack
return Packet1cd(res);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, Index stride)
{
to[stride*0] = std::complex<double>(vgetq_lane_f64(from.v, 0), vgetq_lane_f64(from.v, 1));
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(
std::complex<double>* to, const Packet1cd& from, Index stride)
{ to[stride*0] = std::complex<double>(vgetq_lane_f64(from.v, 0), vgetq_lane_f64(from.v, 1)); }
template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Packet1cd& a)
template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Packet1cd& a)
{
EIGEN_ALIGN16 std::complex<double> res;
pstore<std::complex<double> >(&res, a);
return res;
}
@@ -435,29 +608,15 @@ template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a
template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs) { return vecs[0]; }
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<int Offset>
struct palign_impl<Offset,Packet1cd>
{
static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
{
// FIXME: are we sure we never have to align a Packet1cd?
// Even though a std::complex<double> occupies 16 bytes, it is not necessarily aligned on a 16-byte boundary...
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return internal::pmul(a, pconj(b));
}
{ return internal::pmul(a, pconj(b)); }
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
@@ -466,9 +625,7 @@ template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return internal::pmul(pconj(a), b);
}
{ return internal::pmul(pconj(a), b); }
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
@@ -477,9 +634,7 @@ template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return pconj(internal::pmul(a, b));
}
{ return pconj(internal::pmul(a,b)); }
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d)
@@ -495,9 +650,7 @@ template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, con
}
EIGEN_STRONG_INLINE Packet1cd pcplxflip/*<Packet1cd>*/(const Packet1cd& x)
{
return Packet1cd(preverse(Packet2d(x.v)));
}
{ return Packet1cd(preverse(Packet2d(x.v))); }
EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet1cd,2>& kernel)
{
@@ -505,6 +658,11 @@ EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet1cd,2>& kernel)
kernel.packet[0].v = vcombine_f64(vget_low_f64(kernel.packet[0].v), vget_low_f64(kernel.packet[1].v));
kernel.packet[1].v = tmp;
}
template<> EIGEN_STRONG_INLINE Packet1cd psqrt<Packet1cd>(const Packet1cd& a) {
return psqrt_complex<Packet1cd>(a);
}
#endif // EIGEN_ARCH_ARM64
} // end namespace internal
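
The comments in the `pmul<Packet1cd>` hunk above describe a complex multiply built from broadcast, lane-wise multiply, conjugate, and swap steps. Below is a minimal scalar sketch of that sequence, not the Eigen implementation. The name `complex_mul_by_lanes` and the `Lanes2d` alias are hypothetical. It assumes that `p2ul_CONJ_XOR` flips the sign of the imaginary lane, and that the step elided by the hunk boundary is a lane-wise add of `v1` and `v2`.

```cpp
#include <array>
#include <complex>

// Scalar stand-in for a two-lane double packet: lane 0 = real, lane 1 = imag.
using Lanes2d = std::array<double, 2>;

// Hypothetical helper (not an Eigen function) that follows the commented
// steps of the NEON pmul<Packet1cd> hunk with plain arrays.
inline std::complex<double> complex_mul_by_lanes(std::complex<double> a,
                                                 std::complex<double> b) {
  const Lanes2d bv = {b.real(), b.imag()};

  // Broadcast the real part and the imaginary part of a.
  Lanes2d v1 = {a.real(), a.real()};
  Lanes2d v2 = {a.imag(), a.imag()};

  // Multiply each broadcast with b, lane by lane.
  v1 = {v1[0] * bv[0], v1[1] * bv[1]};  // {ar*br, ar*bi}
  v2 = {v2[0] * bv[0], v2[1] * bv[1]};  // {ai*br, ai*bi}

  // Conjugate v2 (assumed: sign flip on the imaginary lane only).
  v2[1] = -v2[1];                       // {ai*br, -ai*bi}

  // Swap the real and imaginary lanes of v2.
  v2 = {v2[1], v2[0]};                  // {-ai*bi, ai*br}

  // Assumed final step: lane-wise add of the two partial products.
  return {v1[0] + v2[0], v1[1] + v2[1]};
}
```

The result has real part `ar*br - ai*bi` and imaginary part `ar*bi + ai*br`, which is the textbook complex product.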

View File

@@ -0,0 +1,183 @@
namespace Eigen {
namespace internal {
#if EIGEN_ARCH_ARM && EIGEN_COMP_CLANG
// Clang seems to excessively spill registers in the GEBP kernel on 32-bit arm.
// Here we specialize gebp_traits to eliminate these register spills.
// See #2138.
template<>
struct gebp_traits <float,float,false,false,Architecture::NEON,GEBPPacketFull>
: gebp_traits<float,float,false,false,Architecture::Generic,GEBPPacketFull>
{
EIGEN_STRONG_INLINE void acc(const AccPacket& c, const ResPacket& alpha, ResPacket& r) const
{
// This volatile inline asm acts as a barrier to prevent reordering
// and also enforces strict register use.
asm volatile(
"vmla.f32 %q[r], %q[c], %q[alpha]"
: [r] "+w" (r)
: [c] "w" (c),
[alpha] "w" (alpha)
: );
}
template <typename LaneIdType>
EIGEN_STRONG_INLINE void madd(const Packet4f& a, const Packet4f& b,
Packet4f& c, Packet4f& tmp,
const LaneIdType&) const {
acc(a, b, c);
}
template <typename LaneIdType>
EIGEN_STRONG_INLINE void madd(const Packet4f& a, const QuadPacket<Packet4f>& b,
Packet4f& c, Packet4f& tmp,
const LaneIdType& lane) const {
madd(a, b.get(lane), c, tmp, lane);
}
};
#endif // EIGEN_ARCH_ARM && EIGEN_COMP_CLANG
#if EIGEN_ARCH_ARM64
template<>
struct gebp_traits <float,float,false,false,Architecture::NEON,GEBPPacketFull>
: gebp_traits<float,float,false,false,Architecture::Generic,GEBPPacketFull>
{
typedef float RhsPacket;
typedef float32x4_t RhsPacketx4;
EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacket& dest) const
{
dest = *b;
}
EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const
{
dest = vld1q_f32(b);
}
EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacket& dest) const
{
dest = *b;
}
EIGEN_STRONG_INLINE void updateRhs(const RhsScalar*, RhsPacketx4&) const
{}
EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const
{
loadRhs(b,dest);
}
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const
{
c = vfmaq_n_f32(c, a, b);
}
// NOTE: Template parameter inference failed when compiled with the Android NDK:
// "candidate template ignored: could not match 'FixedInt<N>' against 'Eigen::internal::FixedInt<0>'".
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const
{ madd_helper<0>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<1>&) const
{ madd_helper<1>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<2>&) const
{ madd_helper<2>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<3>&) const
{ madd_helper<3>(a, b, c); }
private:
template<int LaneID>
EIGEN_STRONG_INLINE void madd_helper(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c) const
{
#if EIGEN_COMP_GNUC_STRICT && !(EIGEN_GNUC_AT_LEAST(9,0))
// workaround gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101
// vfmaq_laneq_f32 is implemented through a costly dup
if(LaneID==0) asm("fmla %0.4s, %1.4s, %2.s[0]\n" : "+w" (c) : "w" (a), "w" (b) : );
else if(LaneID==1) asm("fmla %0.4s, %1.4s, %2.s[1]\n" : "+w" (c) : "w" (a), "w" (b) : );
else if(LaneID==2) asm("fmla %0.4s, %1.4s, %2.s[2]\n" : "+w" (c) : "w" (a), "w" (b) : );
else if(LaneID==3) asm("fmla %0.4s, %1.4s, %2.s[3]\n" : "+w" (c) : "w" (a), "w" (b) : );
#else
c = vfmaq_laneq_f32(c, a, b, LaneID);
#endif
}
};
template<>
struct gebp_traits <double,double,false,false,Architecture::NEON>
: gebp_traits<double,double,false,false,Architecture::Generic>
{
typedef double RhsPacket;
struct RhsPacketx4 {
float64x2_t B_0, B_1;
};
EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacket& dest) const
{
dest = *b;
}
EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const
{
dest.B_0 = vld1q_f64(b);
dest.B_1 = vld1q_f64(b+2);
}
EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacket& dest) const
{
loadRhs(b,dest);
}
EIGEN_STRONG_INLINE void updateRhs(const RhsScalar*, RhsPacketx4&) const
{}
EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const
{
loadRhs(b,dest);
}
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const
{
c = vfmaq_n_f64(c, a, b);
}
// NOTE: Template parameter inference failed when compiled with the Android NDK:
// "candidate template ignored: could not match 'FixedInt<N>' against 'Eigen::internal::FixedInt<0>'".
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const
{ madd_helper<0>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<1>&) const
{ madd_helper<1>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<2>&) const
{ madd_helper<2>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<3>&) const
{ madd_helper<3>(a, b, c); }
private:
template <int LaneID>
EIGEN_STRONG_INLINE void madd_helper(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c) const
{
#if EIGEN_COMP_GNUC_STRICT && !(EIGEN_GNUC_AT_LEAST(9,0))
// workaround gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101
// vfmaq_laneq_f64 is implemented through a costly dup
if(LaneID==0) asm("fmla %0.2d, %1.2d, %2.d[0]\n" : "+w" (c) : "w" (a), "w" (b.B_0) : );
else if(LaneID==1) asm("fmla %0.2d, %1.2d, %2.d[1]\n" : "+w" (c) : "w" (a), "w" (b.B_0) : );
else if(LaneID==2) asm("fmla %0.2d, %1.2d, %2.d[0]\n" : "+w" (c) : "w" (a), "w" (b.B_1) : );
else if(LaneID==3) asm("fmla %0.2d, %1.2d, %2.d[1]\n" : "+w" (c) : "w" (a), "w" (b.B_1) : );
#else
if(LaneID==0) c = vfmaq_laneq_f64(c, a, b.B_0, 0);
else if(LaneID==1) c = vfmaq_laneq_f64(c, a, b.B_0, 1);
else if(LaneID==2) c = vfmaq_laneq_f64(c, a, b.B_1, 0);
else if(LaneID==3) c = vfmaq_laneq_f64(c, a, b.B_1, 1);
#endif
}
};
#endif // EIGEN_ARCH_ARM64
} // namespace internal
} // namespace Eigen
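
The ARM64 `gebp_traits` specializations above load four right-hand-side scalars at once (`RhsPacketx4`) and then pick one of them by lane index in `madd_helper`. The sketch below models that lane-indexed multiply-add with plain arrays so it can be checked on any platform. `madd_lane_f32`, `madd_lane_f64`, `Vec4f`, `Vec2d`, and `Rhs4d` are hypothetical names, not Eigen or NEON APIs, and the loops are scalar stand-ins for `vfmaq_laneq_f32` and `vfmaq_laneq_f64`.

```cpp
#include <array>
#include <cstddef>

using Vec4f = std::array<float, 4>;
using Vec2d = std::array<double, 2>;

// Scalar model of c = vfmaq_laneq_f32(c, a, b, Lane): every element of a is
// multiplied by the single element b[Lane] and accumulated into c.
template <int Lane>
inline void madd_lane_f32(const Vec4f& a, const Vec4f& b, Vec4f& c) {
  static_assert(Lane >= 0 && Lane < 4, "lane index out of range");
  for (std::size_t i = 0; i < c.size(); ++i) c[i] += a[i] * b[Lane];
}

// Four doubles do not fit in one 128-bit register, so the double variant keeps
// them in two halves, mirroring the B_0 / B_1 members of RhsPacketx4.
struct Rhs4d {
  Vec2d B_0, B_1;
};

// Lanes 0 and 1 read from B_0, lanes 2 and 3 read from B_1.
template <int Lane>
inline void madd_lane_f64(const Vec2d& a, const Rhs4d& b, Vec2d& c) {
  static_assert(Lane >= 0 && Lane < 4, "lane index out of range");
  const Vec2d& half = (Lane < 2) ? b.B_0 : b.B_1;
  const double s = half[Lane % 2];
  for (std::size_t i = 0; i < c.size(); ++i) c[i] += a[i] * s;
}
```

Resolving the lane at compile time is what lets the real kernel emit one `fmla` with an immediate lane index instead of a separate broadcast followed by a multiply-add.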

View File

@@ -12,37 +12,62 @@ namespace Eigen {
namespace internal {
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f pexp<Packet4f>(const Packet4f& x)
{
return pexp_float(x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f pexp<Packet2f>(const Packet2f& x)
{ return pexp_float(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pexp<Packet4f>(const Packet4f& x)
{ return pexp_float(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f plog<Packet4f>(const Packet4f& x)
{
return plog_float(x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f plog<Packet2f>(const Packet2f& x)
{ return plog_float(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f plog<Packet4f>(const Packet4f& x)
{ return plog_float(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f psin<Packet4f>(const Packet4f& x)
{
return psin_float(x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f psin<Packet2f>(const Packet2f& x)
{ return psin_float(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psin<Packet4f>(const Packet4f& x)
{ return psin_float(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f pcos<Packet4f>(const Packet4f& x)
{
return pcos_float(x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f pcos<Packet2f>(const Packet2f& x)
{ return pcos_float(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pcos<Packet4f>(const Packet4f& x)
{ return pcos_float(x); }
// Hyperbolic Tangent function.
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f ptanh<Packet2f>(const Packet2f& x)
{ return internal::generic_fast_tanh_float(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f ptanh<Packet4f>(const Packet4f& x)
{ return internal::generic_fast_tanh_float(x); }
BF16_PACKET_FUNCTION(Packet4f, Packet4bf, psin)
BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pcos)
BF16_PACKET_FUNCTION(Packet4f, Packet4bf, plog)
BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pexp)
BF16_PACKET_FUNCTION(Packet4f, Packet4bf, ptanh)
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f
ptanh<Packet4f>(const Packet4f& x) {
return internal::generic_fast_tanh_float(x);
EIGEN_STRONG_INLINE Packet4bf pfrexp(const Packet4bf& a, Packet4bf& exponent) {
Packet4f fexponent;
const Packet4bf out = F32ToBf16(pfrexp<Packet4f>(Bf16ToF32(a), fexponent));
exponent = F32ToBf16(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet4bf pldexp(const Packet4bf& a, const Packet4bf& exponent) {
return F32ToBf16(pldexp<Packet4f>(Bf16ToF32(a), Bf16ToF32(exponent)));
}
//---------- double ----------
#if EIGEN_ARCH_ARM64 && !EIGEN_APPLE_DOUBLE_NEON_BUG
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d pexp<Packet2d>(const Packet2d& x)
{ return pexp_double(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d plog<Packet2d>(const Packet2d& x)
{ return plog_double(x); }
#endif
} // end namespace internal
} // end namespace Eigen

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -40,20 +40,39 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasBlend = 1
HasBlend = 1
};
};
#endif
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet2cf> {
typedef std::complex<float> type;
typedef Packet2cf half;
typedef Packet4f as_real;
enum {
size=2,
alignment=Aligned16,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet2cf padd<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_sub_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pxor<Packet2cf>(const Packet2cf& a, const Packet2cf& b);
template<> EIGEN_STRONG_INLINE Packet2cf paddsub<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
const Packet4f mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x80000000,0x0,0x0));
return Packet2cf(padd(a.v, pxor(mask, b.v)));
}
template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a)
{
const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x80000000,0x80000000,0x80000000));
@@ -83,8 +102,6 @@ template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, con
}
template<> EIGEN_STRONG_INLINE Packet2cf ptrue <Packet2cf>(const Packet2cf& a) { return Packet2cf(ptrue(Packet4f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cf pnot <Packet2cf>(const Packet2cf& a) { return Packet2cf(pnot(Packet4f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cf pand <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_and_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf por <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_or_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pxor <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_xor_ps(a.v,b.v)); }
@@ -155,29 +172,11 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packe
return pfirst(Packet2cf(_mm_add_ps(a.v, _mm_movehl_ps(a.v,a.v))));
}
template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
{
return Packet2cf(_mm_add_ps(_mm_movelh_ps(vecs[0].v,vecs[1].v), _mm_movehl_ps(vecs[1].v,vecs[0].v)));
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
{
return pfirst(pmul(a, Packet2cf(_mm_movehl_ps(a.v,a.v))));
}
template<int Offset>
struct palign_impl<Offset,Packet2cf>
{
static EIGEN_STRONG_INLINE void run(Packet2cf& first, const Packet2cf& second)
{
if (Offset==1)
{
first.v = _mm_movehl_ps(first.v, first.v);
first.v = _mm_movelh_ps(first.v, second.v);
}
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
@@ -274,6 +273,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
@@ -283,7 +283,18 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
};
#endif
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet1cd> {
typedef std::complex<double> type;
typedef Packet1cd half;
typedef Packet2d as_real;
enum {
size=1,
alignment=Aligned16,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_add_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd psub<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_sub_pd(a.v,b.v)); }
@@ -309,7 +320,6 @@ template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, con
}
template<> EIGEN_STRONG_INLINE Packet1cd ptrue <Packet1cd>(const Packet1cd& a) { return Packet1cd(ptrue(Packet2d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet1cd pnot <Packet1cd>(const Packet1cd& a) { return Packet1cd(pnot(Packet2d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet1cd pand <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_and_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd por <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_or_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd pxor <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_xor_pd(a.v,b.v)); }
@@ -345,26 +355,11 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Pack
return pfirst(a);
}
template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs)
{
return vecs[0];
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a)
{
return pfirst(a);
}
template<int Offset>
struct palign_impl<Offset,Packet1cd>
{
static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
{
// FIXME: are we sure we never have to align a Packet1cd?
// Even though a std::complex<double> occupies 16 bytes, it is not necessarily aligned on a 16-byte boundary...
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
@@ -461,28 +456,15 @@ template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, co
return Packet2cf(_mm_castpd_ps(result));
}
template<> EIGEN_STRONG_INLINE Packet2cf pinsertfirst(const Packet2cf& a, std::complex<float> b)
{
return Packet2cf(_mm_loadl_pi(a.v, reinterpret_cast<const __m64*>(&b)));
template<> EIGEN_STRONG_INLINE Packet1cd psqrt<Packet1cd>(const Packet1cd& a) {
return psqrt_complex<Packet1cd>(a);
}
template<> EIGEN_STRONG_INLINE Packet1cd pinsertfirst(const Packet1cd&, std::complex<double> b)
{
return pset1<Packet1cd>(b);
}
template<> EIGEN_STRONG_INLINE Packet2cf pinsertlast(const Packet2cf& a, std::complex<float> b)
{
return Packet2cf(_mm_loadh_pi(a.v, reinterpret_cast<const __m64*>(&b)));
}
template<> EIGEN_STRONG_INLINE Packet1cd pinsertlast(const Packet1cd&, std::complex<double> b)
{
return pset1<Packet1cd>(b);
template<> EIGEN_STRONG_INLINE Packet2cf psqrt<Packet2cf>(const Packet2cf& a) {
return psqrt_complex<Packet2cf>(a);
}
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_COMPLEX_SSE_H
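
The `paddsub<Packet2cf>` specialization added in this file XORs a sign-bit mask into `b` before adding, so the first two float lanes are subtracted and the last two are added. A minimal scalar sketch of that trick follows. `xor_bits` and `paddsub_model` are hypothetical helpers, and the array order is assumed to match the lane order of `_mm_setr_epi32`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec4f = std::array<float, 4>;

// XOR the bit pattern of x with mask (scalar stand-in for pxor on one lane).
inline float xor_bits(float x, std::uint32_t mask) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits ^= mask;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Scalar model of padd(a, pxor(mask, b)) with the mask from the hunk:
// flipping the sign bit of b in lanes 0 and 1 turns the add into a subtract.
inline Vec4f paddsub_model(const Vec4f& a, const Vec4f& b) {
  const std::array<std::uint32_t, 4> mask = {0x80000000u, 0x80000000u, 0u, 0u};
  Vec4f r;
  for (std::size_t i = 0; i < r.size(); ++i) {
    r[i] = a[i] + xor_bits(b[i], mask[i]);
  }
  return r;
}
```

Flipping the sign bit with XOR keeps the whole operation to one bitwise instruction and one add, without a separate subtract or blend.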

View File

@@ -24,6 +24,21 @@ Packet4f plog<Packet4f>(const Packet4f& _x) {
return plog_float(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet2d plog<Packet2d>(const Packet2d& _x) {
return plog_double(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f plog2<Packet4f>(const Packet4f& _x) {
return plog2_float(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet2d plog2<Packet2d>(const Packet2d& _x) {
return plog2_double(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f plog1p<Packet4f>(const Packet4f& _x) {
return generic_plog1p(_x);
@@ -71,17 +86,17 @@ Packet4f pcos<Packet4f>(const Packet4f& _x)
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f psqrt<Packet4f>(const Packet4f& _x)
{
Packet4f half = pmul(_x, pset1<Packet4f>(.5f));
Packet4f denormal_mask = _mm_and_ps(
_mm_cmpge_ps(_x, _mm_setzero_ps()),
_mm_cmplt_ps(_x, pset1<Packet4f>((std::numeric_limits<float>::min)())));
Packet4f minus_half_x = pmul(_x, pset1<Packet4f>(-0.5f));
Packet4f denormal_mask = pandnot(
pcmp_lt(_x, pset1<Packet4f>((std::numeric_limits<float>::min)())),
pcmp_lt(_x, pzero(_x)));
// Compute approximate reciprocal sqrt.
Packet4f x = _mm_rsqrt_ps(_x);
// Do a single step of Newton's iteration.
x = pmul(x, psub(pset1<Packet4f>(1.5f), pmul(half, pmul(x,x))));
x = pmul(x, pmadd(minus_half_x, pmul(x,x), pset1<Packet4f>(1.5f)));
// Flush results for denormals to zero.
return _mm_andnot_ps(denormal_mask, pmul(_x,x));
return pandnot(pmul(_x,x), denormal_mask);
}
#else
@@ -94,6 +109,9 @@ Packet4f psqrt<Packet4f>(const Packet4f& x) { return _mm_sqrt_ps(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet2d psqrt<Packet2d>(const Packet2d& x) { return _mm_sqrt_pd(x); }
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16b psqrt<Packet16b>(const Packet16b& x) { return x; }
#if EIGEN_FAST_MATH
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
@@ -132,7 +150,7 @@ Packet4f prsqrt<Packet4f>(const Packet4f& _x) {
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f prsqrt<Packet4f>(const Packet4f& x) {
// Unfortunately we can't use the much faster mm_rqsrt_ps since it only provides an approximation.
// Unfortunately we can't use the much faster mm_rsqrt_ps since it only provides an approximation.
return _mm_div_ps(pset1<Packet4f>(1.0f), _mm_sqrt_ps(x));
}
@@ -140,7 +158,6 @@ Packet4f prsqrt<Packet4f>(const Packet4f& x) {
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet2d prsqrt<Packet2d>(const Packet2d& x) {
// Unfortunately we can't use the much faster mm_rqsrt_pd since it only provides an approximation.
return _mm_div_pd(pset1<Packet2d>(1.0), _mm_sqrt_pd(x));
}
@@ -168,7 +185,7 @@ double sqrt(const double &x)
{
#if EIGEN_COMP_GNUC_STRICT
// This works around a GCC bug generating poor code for _mm_sqrt_pd
// See https://bitbucket.org/eigen/eigen/commits/14f468dba4d350d7c19c9b93072e19f7b3df563b
// See https://gitlab.com/libeigen/eigen/commit/8dca9f97e38970
return internal::pfirst(internal::Packet2d(__builtin_ia32_sqrtsd(_mm_set_sd(x))));
#else
return internal::pfirst(internal::Packet2d(_mm_sqrt_pd(_mm_set_sd(x))));

File diff suppressed because it is too large

View File

@@ -77,6 +77,13 @@ template<> EIGEN_STRONG_INLINE Packet4f preinterpret<Packet4f,Packet4i>(const Pa
return _mm_castsi128_ps(a);
}
template<> EIGEN_STRONG_INLINE Packet2d preinterpret<Packet2d,Packet4i>(const Packet4i& a) {
return _mm_castsi128_pd(a);
}
template<> EIGEN_STRONG_INLINE Packet4i preinterpret<Packet4i,Packet2d>(const Packet2d& a) {
return _mm_castpd_si128(a);
}
// Disable the following code since it's broken on too many platforms / compilers.
//#elif defined(EIGEN_VECTORIZE_SSE) && (!EIGEN_ARCH_x86_64) && (!EIGEN_COMP_MSVC)

View File

@@ -0,0 +1,44 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2020, Arm Limited and Contributors
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_MATH_FUNCTIONS_SVE_H
#define EIGEN_MATH_FUNCTIONS_SVE_H
namespace Eigen {
namespace internal {
template <>
EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf pexp<PacketXf>(const PacketXf& x) {
return pexp_float(x);
}
template <>
EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf plog<PacketXf>(const PacketXf& x) {
return plog_float(x);
}
template <>
EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf psin<PacketXf>(const PacketXf& x) {
return psin_float(x);
}
template <>
EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf pcos<PacketXf>(const PacketXf& x) {
return pcos_float(x);
}
// Hyperbolic Tangent function.
template <>
EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf ptanh<PacketXf>(const PacketXf& x) {
return internal::generic_fast_tanh_float(x);
}
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_MATH_FUNCTIONS_SVE_H

View File

@@ -0,0 +1,756 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2020, Arm Limited and Contributors
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_PACKET_MATH_SVE_H
#define EIGEN_PACKET_MATH_SVE_H
namespace Eigen
{
namespace internal
{
#ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD
#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8
#endif
#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#endif
#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#define EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#endif
#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32
template <typename Scalar, int SVEVectorLength>
struct sve_packet_size_selector {
enum { size = SVEVectorLength / (sizeof(Scalar) * CHAR_BIT) };
};
/********************************* int32 **************************************/
typedef svint32_t PacketXi __attribute__((arm_sve_vector_bits(EIGEN_ARM64_SVE_VL)));
template <>
struct packet_traits<numext::int32_t> : default_packet_traits {
typedef PacketXi type;
typedef PacketXi half; // Half not implemented yet
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
size = sve_packet_size_selector<numext::int32_t, EIGEN_ARM64_SVE_VL>::size,
HasHalfPacket = 0,
HasAdd = 1,
HasSub = 1,
HasShift = 1,
HasMul = 1,
HasNegate = 1,
HasAbs = 1,
HasArg = 0,
HasAbs2 = 1,
HasMin = 1,
HasMax = 1,
HasConj = 1,
HasSetLinear = 0,
HasBlend = 0,
HasReduxp = 0 // Not implemented in SVE
};
};
template <>
struct unpacket_traits<PacketXi> {
typedef numext::int32_t type;
typedef PacketXi half; // Half not yet implemented
enum {
size = sve_packet_size_selector<numext::int32_t, EIGEN_ARM64_SVE_VL>::size,
alignment = Aligned64,
vectorizable = true,
masked_load_available = false,
masked_store_available = false
};
};
template <>
EIGEN_STRONG_INLINE void prefetch<numext::int32_t>(const numext::int32_t* addr)
{
svprfw(svptrue_b32(), addr, SV_PLDL1KEEP);
}
template <>
EIGEN_STRONG_INLINE PacketXi pset1<PacketXi>(const numext::int32_t& from)
{
return svdup_n_s32(from);
}
template <>
EIGEN_STRONG_INLINE PacketXi plset<PacketXi>(const numext::int32_t& a)
{
numext::int32_t c[packet_traits<numext::int32_t>::size];
for (int i = 0; i < packet_traits<numext::int32_t>::size; i++) c[i] = i;
return svadd_s32_z(svptrue_b32(), pset1<PacketXi>(a), svld1_s32(svptrue_b32(), c));
}
template <>
EIGEN_STRONG_INLINE PacketXi padd<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svadd_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi psub<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svsub_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi pnegate(const PacketXi& a)
{
return svneg_s32_z(svptrue_b32(), a);
}
template <>
EIGEN_STRONG_INLINE PacketXi pconj(const PacketXi& a)
{
return a;
}
template <>
EIGEN_STRONG_INLINE PacketXi pmul<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svmul_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi pdiv<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svdiv_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi pmadd(const PacketXi& a, const PacketXi& b, const PacketXi& c)
{
return svmla_s32_z(svptrue_b32(), c, a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi pmin<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svmin_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi pmax<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svmax_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi pcmp_le<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svdup_n_s32_z(svcmple_s32(svptrue_b32(), a, b), 0xffffffffu);
}
template <>
EIGEN_STRONG_INLINE PacketXi pcmp_lt<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svdup_n_s32_z(svcmplt_s32(svptrue_b32(), a, b), 0xffffffffu);
}
template <>
EIGEN_STRONG_INLINE PacketXi pcmp_eq<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svdup_n_s32_z(svcmpeq_s32(svptrue_b32(), a, b), 0xffffffffu);
}
template <>
EIGEN_STRONG_INLINE PacketXi ptrue<PacketXi>(const PacketXi& /*a*/)
{
return svdup_n_s32_z(svptrue_b32(), 0xffffffffu);
}
template <>
EIGEN_STRONG_INLINE PacketXi pzero<PacketXi>(const PacketXi& /*a*/)
{
return svdup_n_s32_z(svptrue_b32(), 0);
}
template <>
EIGEN_STRONG_INLINE PacketXi pand<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svand_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi por<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svorr_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi pxor<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return sveor_s32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXi pandnot<PacketXi>(const PacketXi& a, const PacketXi& b)
{
return svbic_s32_z(svptrue_b32(), a, b);
}
template <int N>
EIGEN_STRONG_INLINE PacketXi parithmetic_shift_right(PacketXi a)
{
return svasrd_n_s32_z(svptrue_b32(), a, N);
}
template <int N>
EIGEN_STRONG_INLINE PacketXi plogical_shift_right(PacketXi a)
{
return svreinterpret_s32_u32(svlsr_u32_z(svptrue_b32(), svreinterpret_u32_s32(a), svdup_n_u32_z(svptrue_b32(), N)));
}
template <int N>
EIGEN_STRONG_INLINE PacketXi plogical_shift_left(PacketXi a)
{
return svlsl_s32_z(svptrue_b32(), a, svdup_n_u32_z(svptrue_b32(), N));
}
template <>
EIGEN_STRONG_INLINE PacketXi pload<PacketXi>(const numext::int32_t* from)
{
EIGEN_DEBUG_ALIGNED_LOAD return svld1_s32(svptrue_b32(), from);
}
template <>
EIGEN_STRONG_INLINE PacketXi ploadu<PacketXi>(const numext::int32_t* from)
{
EIGEN_DEBUG_UNALIGNED_LOAD return svld1_s32(svptrue_b32(), from);
}
template <>
EIGEN_STRONG_INLINE PacketXi ploaddup<PacketXi>(const numext::int32_t* from)
{
svuint32_t indices = svindex_u32(0, 1); // index {base=0, base+step=1, base+step*2, ...}
indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a1, a1, a2, a2, ...}
return svld1_gather_u32index_s32(svptrue_b32(), from, indices);
}
template <>
EIGEN_STRONG_INLINE PacketXi ploadquad<PacketXi>(const numext::int32_t* from)
{
svuint32_t indices = svindex_u32(0, 1); // index {base=0, base+step=1, base+step*2, ...}
indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a1, a1, a2, a2, ...}
indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a0, a0, a1, a1, a1, a1, ...}
return svld1_gather_u32index_s32(svptrue_b32(), from, indices);
}
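The index construction used by ploaddup and ploadquad above can be modeled in portable C++. This is a minimal sketch, not SVE code: `zip1_model`, `dup_indices` and `quad_indices` are hypothetical names, `zip1_model` is a scalar stand-in for `svzip1_u32` (interleaving the low halves of its inputs), and a `std::vector` with an even lane count stands in for the vector register.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical scalar model of svzip1_u32: interleave the low halves of a and b.
std::vector<std::uint32_t> zip1_model(const std::vector<std::uint32_t>& a,
                                      const std::vector<std::uint32_t>& b) {
  std::vector<std::uint32_t> out(a.size());
  for (std::size_t i = 0; i < a.size() / 2; ++i) {
    out[2 * i] = a[i];
    out[2 * i + 1] = b[i];
  }
  return out;
}

// Gather indices for a "dup" load: {0, 0, 1, 1, 2, 2, ...}.
std::vector<std::uint32_t> dup_indices(std::size_t lanes) {
  std::vector<std::uint32_t> idx(lanes);
  std::iota(idx.begin(), idx.end(), 0u);  // models svindex_u32(0, 1)
  return zip1_model(idx, idx);
}

// Gather indices for a "quad" load: zip twice to get {0, 0, 0, 0, 1, 1, 1, 1, ...}.
std::vector<std::uint32_t> quad_indices(std::size_t lanes) {
  const std::vector<std::uint32_t> idx = dup_indices(lanes);
  return zip1_model(idx, idx);
}
```

With 8 lanes, `dup_indices(8)` yields 0,0,1,1,2,2,3,3 and `quad_indices(8)` yields 0,0,0,0,1,1,1,1; the gather load then reads `from[index]` for each lane.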
template <>
EIGEN_STRONG_INLINE void pstore<numext::int32_t>(numext::int32_t* to, const PacketXi& from)
{
EIGEN_DEBUG_ALIGNED_STORE svst1_s32(svptrue_b32(), to, from);
}
template <>
EIGEN_STRONG_INLINE void pstoreu<numext::int32_t>(numext::int32_t* to, const PacketXi& from)
{
EIGEN_DEBUG_UNALIGNED_STORE svst1_s32(svptrue_b32(), to, from);
}
template <>
EIGEN_DEVICE_FUNC inline PacketXi pgather<numext::int32_t, PacketXi>(const numext::int32_t* from, Index stride)
{
// Index format: {base=0, base+stride, base+stride*2, base+stride*3, ...}
svint32_t indices = svindex_s32(0, stride);
return svld1_gather_s32index_s32(svptrue_b32(), from, indices);
}
template <>
EIGEN_DEVICE_FUNC inline void pscatter<numext::int32_t, PacketXi>(numext::int32_t* to, const PacketXi& from, Index stride)
{
// Index format: {base=0, base+stride, base+stride*2, base+stride*3, ...}
svint32_t indices = svindex_s32(0, stride);
svst1_scatter_s32index_s32(svptrue_b32(), to, indices, from);
}
template <>
EIGEN_STRONG_INLINE numext::int32_t pfirst<PacketXi>(const PacketXi& a)
{
// svlasta returns the first element if all predicate bits are 0
return svlasta_s32(svpfalse_b(), a);
}
template <>
EIGEN_STRONG_INLINE PacketXi preverse(const PacketXi& a)
{
return svrev_s32(a);
}
template <>
EIGEN_STRONG_INLINE PacketXi pabs(const PacketXi& a)
{
return svabs_s32_z(svptrue_b32(), a);
}
template <>
EIGEN_STRONG_INLINE numext::int32_t predux<PacketXi>(const PacketXi& a)
{
return static_cast<numext::int32_t>(svaddv_s32(svptrue_b32(), a));
}
template <>
EIGEN_STRONG_INLINE numext::int32_t predux_mul<PacketXi>(const PacketXi& a)
{
EIGEN_STATIC_ASSERT((EIGEN_ARM64_SVE_VL % 128 == 0),
EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT);
// Multiply the vector by its reverse
svint32_t prod = svmul_s32_z(svptrue_b32(), a, svrev_s32(a));
svint32_t half_prod;
// Extract the high half of the vector. Depending on the VL, more reduction steps are needed.
if (EIGEN_ARM64_SVE_VL >= 2048) {
half_prod = svtbl_s32(prod, svindex_u32(32, 1));
prod = svmul_s32_z(svptrue_b32(), prod, half_prod);
}
if (EIGEN_ARM64_SVE_VL >= 1024) {
half_prod = svtbl_s32(prod, svindex_u32(16, 1));
prod = svmul_s32_z(svptrue_b32(), prod, half_prod);
}
if (EIGEN_ARM64_SVE_VL >= 512) {
half_prod = svtbl_s32(prod, svindex_u32(8, 1));
prod = svmul_s32_z(svptrue_b32(), prod, half_prod);
}
if (EIGEN_ARM64_SVE_VL >= 256) {
half_prod = svtbl_s32(prod, svindex_u32(4, 1));
prod = svmul_s32_z(svptrue_b32(), prod, half_prod);
}
// Last reduction
half_prod = svtbl_s32(prod, svindex_u32(2, 1));
prod = svmul_s32_z(svptrue_b32(), prod, half_prod);
// The reduction is done to the first element.
return pfirst<PacketXi>(prod);
}
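The reduction pattern in predux_mul above (multiply by the reverse, then repeatedly fold the upper part onto the lower part) can be checked with a scalar model. This is a sketch under stated assumptions: `shift_down` and `predux_mul_model` are hypothetical names, `shift_down` imitates the table lookup by returning 0 for out-of-range indices, and the lane count is assumed to be a power of two of at least 4.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar stand-in for the table lookup with indices
// {start, start+1, ...}: lanes whose index is out of range become 0.
std::vector<std::int64_t> shift_down(const std::vector<std::int64_t>& v,
                                     std::size_t start) {
  std::vector<std::int64_t> out(v.size(), 0);
  for (std::size_t i = 0; i + start < v.size(); ++i) out[i] = v[i + start];
  return out;
}

// Scalar model of the reduction; a.size() is assumed to be a power of two >= 4.
std::int64_t predux_mul_model(const std::vector<std::int64_t>& a) {
  const std::size_t n = a.size();
  std::vector<std::int64_t> prod(n);
  // Multiply the vector by its reverse.
  for (std::size_t i = 0; i < n; ++i) prod[i] = a[i] * a[n - 1 - i];
  // Fold with start offsets n/2, n/4, ..., 2 (for 4 lanes only the last step runs).
  for (std::size_t start = n / 2; start >= 2; start /= 2) {
    const std::vector<std::int64_t> half = shift_down(prod, start);
    for (std::size_t i = 0; i < n; ++i) prod[i] *= half[i];
  }
  // The full product ends up in lane 0.
  return prod[0];
}
```

Only lane 0 is meaningful at the end; the upper lanes are zeroed by the out-of-range lookups, which is harmless because the result is read with pfirst.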
template <>
EIGEN_STRONG_INLINE numext::int32_t predux_min<PacketXi>(const PacketXi& a)
{
return svminv_s32(svptrue_b32(), a);
}
template <>
EIGEN_STRONG_INLINE numext::int32_t predux_max<PacketXi>(const PacketXi& a)
{
return svmaxv_s32(svptrue_b32(), a);
}
template <int N>
EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock<PacketXi, N>& kernel) {
int buffer[packet_traits<numext::int32_t>::size * N] = {0};
int i = 0;
PacketXi stride_index = svindex_s32(0, N);
for (i = 0; i < N; i++) {
svst1_scatter_s32index_s32(svptrue_b32(), buffer + i, stride_index, kernel.packet[i]);
}
for (i = 0; i < N; i++) {
kernel.packet[i] = svld1_s32(svptrue_b32(), buffer + i * packet_traits<numext::int32_t>::size);
}
}
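The scatter-then-load transpose above can be illustrated without SVE. In this minimal sketch, `ptranspose_model` is a hypothetical name and `std::array` stands in for the packets; the strided scatter writes lane j of packet i to `buffer[i + j * N]`, and the contiguous load reads packet i back from `buffer + i * Lanes`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of the buffer-based transpose.
template <std::size_t Lanes, std::size_t N>
void ptranspose_model(std::array<std::array<int, Lanes>, N>& kernel) {
  std::array<int, Lanes * N> buffer{};
  // Strided scatter: lane j of packet i lands at buffer[i + j * N].
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < Lanes; ++j)
      buffer[i + j * N] = kernel[i][j];
  // Contiguous load: packet i is refilled from buffer[i * Lanes ...].
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < Lanes; ++j)
      kernel[i][j] = buffer[i * Lanes + j];
}
```

When N equals the lane count, the new `kernel[i][j]` is the old `kernel[j][i]`, which is an ordinary square transpose.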
/********************************* float32 ************************************/
typedef svfloat32_t PacketXf __attribute__((arm_sve_vector_bits(EIGEN_ARM64_SVE_VL)));
template <>
struct packet_traits<float> : default_packet_traits {
typedef PacketXf type;
typedef PacketXf half;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
size = sve_packet_size_selector<float, EIGEN_ARM64_SVE_VL>::size,
HasHalfPacket = 0,
HasAdd = 1,
HasSub = 1,
HasShift = 1,
HasMul = 1,
HasNegate = 1,
HasAbs = 1,
HasArg = 0,
HasAbs2 = 1,
HasMin = 1,
HasMax = 1,
HasConj = 1,
HasSetLinear = 0,
HasBlend = 0,
HasReduxp = 0, // Not implemented in SVE
HasDiv = 1,
HasFloor = 1,
HasSin = EIGEN_FAST_MATH,
HasCos = EIGEN_FAST_MATH,
HasLog = 1,
HasExp = 1,
HasSqrt = 0,
HasTanh = EIGEN_FAST_MATH,
HasErf = EIGEN_FAST_MATH
};
};
template <>
struct unpacket_traits<PacketXf> {
typedef float type;
typedef PacketXf half; // Half not yet implemented
typedef PacketXi integer_packet;
enum {
size = sve_packet_size_selector<float, EIGEN_ARM64_SVE_VL>::size,
alignment = Aligned64,
vectorizable = true,
masked_load_available = false,
masked_store_available = false
};
};
template <>
EIGEN_STRONG_INLINE PacketXf pset1<PacketXf>(const float& from)
{
return svdup_n_f32(from);
}
template <>
EIGEN_STRONG_INLINE PacketXf pset1frombits<PacketXf>(numext::uint32_t from)
{
return svreinterpret_f32_u32(svdup_n_u32_z(svptrue_b32(), from));
}
template <>
EIGEN_STRONG_INLINE PacketXf plset<PacketXf>(const float& a)
{
float c[packet_traits<float>::size];
for (int i = 0; i < packet_traits<float>::size; i++) c[i] = i;
return svadd_f32_z(svptrue_b32(), pset1<PacketXf>(a), svld1_f32(svptrue_b32(), c));
}
template <>
EIGEN_STRONG_INLINE PacketXf padd<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svadd_f32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf psub<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svsub_f32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pnegate(const PacketXf& a)
{
return svneg_f32_z(svptrue_b32(), a);
}
template <>
EIGEN_STRONG_INLINE PacketXf pconj(const PacketXf& a)
{
return a;
}
template <>
EIGEN_STRONG_INLINE PacketXf pmul<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svmul_f32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pdiv<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svdiv_f32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pmadd(const PacketXf& a, const PacketXf& b, const PacketXf& c)
{
return svmla_f32_z(svptrue_b32(), c, a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pmin<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svmin_f32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pmin<PropagateNaN, PacketXf>(const PacketXf& a, const PacketXf& b)
{
return pmin<PacketXf>(a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pmin<PropagateNumbers, PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svminnm_f32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pmax<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svmax_f32_z(svptrue_b32(), a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pmax<PropagateNaN, PacketXf>(const PacketXf& a, const PacketXf& b)
{
return pmax<PacketXf>(a, b);
}
template <>
EIGEN_STRONG_INLINE PacketXf pmax<PropagateNumbers, PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svmaxnm_f32_z(svptrue_b32(), a, b);
}
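The two NaN policies selected by the PropagateNaN and PropagateNumbers specializations above differ only when exactly one operand is NaN. A scalar sketch of the intended semantics follows; the function names are hypothetical, and `std::fmin` is used here as a portable stand-in for the "number wins" behaviour rather than as a claim about how the intrinsics are implemented.

```cpp
#include <cmath>
#include <limits>

// "PropagateNumbers": if exactly one operand is NaN, return the other one.
float min_propagate_numbers(float a, float b) { return std::fmin(a, b); }

// "PropagateNaN": if either operand is NaN, the result is NaN.
float min_propagate_nan(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  return a < b ? a : b;
}
```

For ordinary inputs both functions agree; for example, both return 1.0f for the pair (2.0f, 1.0f).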
// Float comparisons in SVE return an svbool predicate. Use svdup to set active
// lanes to all ones (0xffffffffu) and inactive lanes to 0.
template <>
EIGEN_STRONG_INLINE PacketXf pcmp_le<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svreinterpret_f32_u32(svdup_n_u32_z(svcmple_f32(svptrue_b32(), a, b), 0xffffffffu));
}
template <>
EIGEN_STRONG_INLINE PacketXf pcmp_lt<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svreinterpret_f32_u32(svdup_n_u32_z(svcmplt_f32(svptrue_b32(), a, b), 0xffffffffu));
}
template <>
EIGEN_STRONG_INLINE PacketXf pcmp_eq<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svreinterpret_f32_u32(svdup_n_u32_z(svcmpeq_f32(svptrue_b32(), a, b), 0xffffffffu));
}
// Invert (svnot_b_z) the predicate resulting from the greater-or-equal
// comparison (svcmpge_f32), then fill the active lanes of a float vector
// with all ones.
template <>
EIGEN_STRONG_INLINE PacketXf pcmp_lt_or_nan<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svreinterpret_f32_u32(svdup_n_u32_z(svnot_b_z(svptrue_b32(), svcmpge_f32(svptrue_b32(), a, b)), 0xffffffffu));
}
template <>
EIGEN_STRONG_INLINE PacketXf pfloor<PacketXf>(const PacketXf& a)
{
return svrintm_f32_z(svptrue_b32(), a);
}
template <>
EIGEN_STRONG_INLINE PacketXf ptrue<PacketXf>(const PacketXf& /*a*/)
{
return svreinterpret_f32_u32(svdup_n_u32_z(svptrue_b32(), 0xffffffffu));
}
// Bitwise logical operations are not supported for float vectors, so
// reinterpret the operands as unsigned integers and cast the result back.
template <>
EIGEN_STRONG_INLINE PacketXf pand<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svreinterpret_f32_u32(svand_u32_z(svptrue_b32(), svreinterpret_u32_f32(a), svreinterpret_u32_f32(b)));
}
template <>
EIGEN_STRONG_INLINE PacketXf por<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svreinterpret_f32_u32(svorr_u32_z(svptrue_b32(), svreinterpret_u32_f32(a), svreinterpret_u32_f32(b)));
}
template <>
EIGEN_STRONG_INLINE PacketXf pxor<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svreinterpret_f32_u32(sveor_u32_z(svptrue_b32(), svreinterpret_u32_f32(a), svreinterpret_u32_f32(b)));
}
template <>
EIGEN_STRONG_INLINE PacketXf pandnot<PacketXf>(const PacketXf& a, const PacketXf& b)
{
return svreinterpret_f32_u32(svbic_u32_z(svptrue_b32(), svreinterpret_u32_f32(a), svreinterpret_u32_f32(b)));
}
template <>
EIGEN_STRONG_INLINE PacketXf pload<PacketXf>(const float* from)
{
EIGEN_DEBUG_ALIGNED_LOAD return svld1_f32(svptrue_b32(), from);
}
template <>
EIGEN_STRONG_INLINE PacketXf ploadu<PacketXf>(const float* from)
{
EIGEN_DEBUG_UNALIGNED_LOAD return svld1_f32(svptrue_b32(), from);
}
template <>
EIGEN_STRONG_INLINE PacketXf ploaddup<PacketXf>(const float* from)
{
svuint32_t indices = svindex_u32(0, 1); // index {base=0, base+step=1, base+step*2, ...}
indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a1, a1, a2, a2, ...}
return svld1_gather_u32index_f32(svptrue_b32(), from, indices);
}
template <>
EIGEN_STRONG_INLINE PacketXf ploadquad<PacketXf>(const float* from)
{
svuint32_t indices = svindex_u32(0, 1); // index {base=0, base+step=1, base+step*2, ...}
indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a1, a1, a2, a2, ...}
indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a0, a0, a1, a1, a1, a1, ...}
return svld1_gather_u32index_f32(svptrue_b32(), from, indices);
}
template <>
EIGEN_STRONG_INLINE void pstore<float>(float* to, const PacketXf& from)
{
EIGEN_DEBUG_ALIGNED_STORE svst1_f32(svptrue_b32(), to, from);
}
template <>
EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const PacketXf& from)
{
EIGEN_DEBUG_UNALIGNED_STORE svst1_f32(svptrue_b32(), to, from);
}
template <>
EIGEN_DEVICE_FUNC inline PacketXf pgather<float, PacketXf>(const float* from, Index stride)
{
// Index format: {base=0, base+stride, base+stride*2, base+stride*3, ...}
svint32_t indices = svindex_s32(0, stride);
return svld1_gather_s32index_f32(svptrue_b32(), from, indices);
}
template <>
EIGEN_DEVICE_FUNC inline void pscatter<float, PacketXf>(float* to, const PacketXf& from, Index stride)
{
// Index format: {base=0, base+stride, base+stride*2, base+stride*3, ...}
svint32_t indices = svindex_s32(0, stride);
svst1_scatter_s32index_f32(svptrue_b32(), to, indices, from);
}
template <>
EIGEN_STRONG_INLINE float pfirst<PacketXf>(const PacketXf& a)
{
// svlasta returns the first element if all predicate bits are 0
return svlasta_f32(svpfalse_b(), a);
}
template <>
EIGEN_STRONG_INLINE PacketXf preverse(const PacketXf& a)
{
return svrev_f32(a);
}
template <>
EIGEN_STRONG_INLINE PacketXf pabs(const PacketXf& a)
{
return svabs_f32_z(svptrue_b32(), a);
}
// TODO(tellenbach): Should this go into MathFunctions.h? If so, change for
// all vector extensions and the generic version.
template <>
EIGEN_STRONG_INLINE PacketXf pfrexp<PacketXf>(const PacketXf& a, PacketXf& exponent)
{
return pfrexp_generic(a, exponent);
}
template <>
EIGEN_STRONG_INLINE float predux<PacketXf>(const PacketXf& a)
{
return svaddv_f32(svptrue_b32(), a);
}
// Other reduction functions:
// mul
// Only works for SVE VLs that are multiples of 128
template <>
EIGEN_STRONG_INLINE float predux_mul<PacketXf>(const PacketXf& a)
{
EIGEN_STATIC_ASSERT((EIGEN_ARM64_SVE_VL % 128 == 0),
EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT);
// Multiply the vector by its reverse
svfloat32_t prod = svmul_f32_z(svptrue_b32(), a, svrev_f32(a));
svfloat32_t half_prod;
// Extract the high half of the vector. Depending on the VL, more reduction steps are needed.
if (EIGEN_ARM64_SVE_VL >= 2048) {
half_prod = svtbl_f32(prod, svindex_u32(32, 1));
prod = svmul_f32_z(svptrue_b32(), prod, half_prod);
}
if (EIGEN_ARM64_SVE_VL >= 1024) {
half_prod = svtbl_f32(prod, svindex_u32(16, 1));
prod = svmul_f32_z(svptrue_b32(), prod, half_prod);
}
if (EIGEN_ARM64_SVE_VL >= 512) {
half_prod = svtbl_f32(prod, svindex_u32(8, 1));
prod = svmul_f32_z(svptrue_b32(), prod, half_prod);
}
if (EIGEN_ARM64_SVE_VL >= 256) {
half_prod = svtbl_f32(prod, svindex_u32(4, 1));
prod = svmul_f32_z(svptrue_b32(), prod, half_prod);
}
// Last reduction
half_prod = svtbl_f32(prod, svindex_u32(2, 1));
prod = svmul_f32_z(svptrue_b32(), prod, half_prod);
// The reduction is done to the first element.
return pfirst<PacketXf>(prod);
}
template <>
EIGEN_STRONG_INLINE float predux_min<PacketXf>(const PacketXf& a)
{
return svminv_f32(svptrue_b32(), a);
}
template <>
EIGEN_STRONG_INLINE float predux_max<PacketXf>(const PacketXf& a)
{
return svmaxv_f32(svptrue_b32(), a);
}
template<int N>
EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock<PacketXf, N>& kernel)
{
float buffer[packet_traits<float>::size * N] = {0};
int i = 0;
PacketXi stride_index = svindex_s32(0, N);
for (i = 0; i < N; i++) {
svst1_scatter_s32index_f32(svptrue_b32(), buffer + i, stride_index, kernel.packet[i]);
}
for (i = 0; i < N; i++) {
kernel.packet[i] = svld1_f32(svptrue_b32(), buffer + i * packet_traits<float>::size);
}
}
template<>
EIGEN_STRONG_INLINE PacketXf pldexp<PacketXf>(const PacketXf& a, const PacketXf& exponent)
{
return pldexp_generic(a, exponent);
}
} // namespace internal
} // namespace Eigen
#endif // EIGEN_PACKET_MATH_SVE_H

@@ -0,0 +1,49 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2020, Arm Limited and Contributors
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_TYPE_CASTING_SVE_H
#define EIGEN_TYPE_CASTING_SVE_H
namespace Eigen {
namespace internal {
template <>
struct type_casting_traits<float, numext::int32_t> {
enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 };
};
template <>
struct type_casting_traits<numext::int32_t, float> {
enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 };
};
template <>
EIGEN_STRONG_INLINE PacketXf pcast<PacketXi, PacketXf>(const PacketXi& a) {
return svcvt_f32_s32_z(svptrue_b32(), a);
}
template <>
EIGEN_STRONG_INLINE PacketXi pcast<PacketXf, PacketXi>(const PacketXf& a) {
return svcvt_s32_f32_z(svptrue_b32(), a);
}
template <>
EIGEN_STRONG_INLINE PacketXf preinterpret<PacketXf, PacketXi>(const PacketXi& a) {
return svreinterpret_f32_s32(a);
}
template <>
EIGEN_STRONG_INLINE PacketXi preinterpret<PacketXi, PacketXf>(const PacketXf& a) {
return svreinterpret_s32_f32(a);
}
} // namespace internal
} // namespace Eigen
#endif // EIGEN_TYPE_CASTING_SVE_H
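The file above provides two different float/int32 conversions: pcast converts values, while preinterpret keeps the bit pattern. A per-lane sketch in portable C++ follows. The function names are hypothetical, IEEE-754 binary32 floats are assumed, and `static_cast` (which truncates toward zero) is used as the model of value conversion; no claim is made here about how the intrinsic treats out-of-range inputs.

```cpp
#include <cstdint>
#include <cstring>

// Value conversion: 2.7f becomes 2 (truncation toward zero).
std::int32_t cast_lane(float x) { return static_cast<std::int32_t>(x); }

// Bit reinterpretation: the 32 bits of the float are returned unchanged.
std::int32_t reinterpret_lane(float x) {
  static_assert(sizeof(float) == sizeof(std::int32_t), "float must be 32 bits");
  std::int32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}
```

For 1.0f the two functions return 1 and 0x3f800000 respectively, which is why the two operations cannot be used interchangeably.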

@@ -59,12 +59,16 @@ struct sycl_packet_traits : default_packet_traits {
HasIGammac = 0,
HasBetaInc = 0,
HasBlend = has_blend,
// This flag is used to indicate whether packet comparison is supported.
// pcmp_eq, pcmp_lt and pcmp_le should be defined for it to be true.
HasCmp = 1,
HasMax = 1,
HasMin = 1,
HasMul = 1,
HasAdd = 1,
HasFloor = 1,
HasRound = 1,
HasRint = 1,
HasLog1p = 1,
HasExpm1 = 1,
HasCeil = 1,
@@ -147,7 +151,7 @@ struct PacketWrapper<PacketReturnType, 4> {
typedef typename ::Eigen::internal::unpacket_traits<PacketReturnType>::type
Scalar;
template <typename Index>
EIGEN_DEVICE_FUNC static Scalar scalarize(Index index, PacketReturnType &in) {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static Scalar scalarize(Index index, PacketReturnType &in) {
switch (index) {
case 0:
return in.x();
@@ -158,17 +162,18 @@ struct PacketWrapper<PacketReturnType, 4> {
case 3:
return in.w();
default:
eigen_assert(false && "INDEX MUST BE BETWEEN 0 and 3");
abort();
// INDEX MUST BE BETWEEN 0 and 3. There is no abort function in SYCL kernels, so we cannot use abort here.
// The code will never reach here.
__builtin_unreachable();
}
__builtin_unreachable();
}
EIGEN_DEVICE_FUNC static PacketReturnType convert_to_packet_type(
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static PacketReturnType convert_to_packet_type(
Scalar in, Scalar other) {
return PacketReturnType(in, other, other, other);
}
EIGEN_DEVICE_FUNC static void set_packet(PacketReturnType &lhs, Scalar *rhs) {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void set_packet(PacketReturnType &lhs, Scalar *rhs) {
lhs = PacketReturnType(rhs[0], rhs[1], rhs[2], rhs[3]);
}
};
@@ -178,14 +183,14 @@ struct PacketWrapper<PacketReturnType, 1> {
typedef typename ::Eigen::internal::unpacket_traits<PacketReturnType>::type
Scalar;
template <typename Index>
EIGEN_DEVICE_FUNC static Scalar scalarize(Index, PacketReturnType &in) {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static Scalar scalarize(Index, PacketReturnType &in) {
return in;
}
EIGEN_DEVICE_FUNC static PacketReturnType convert_to_packet_type(Scalar in,
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static PacketReturnType convert_to_packet_type(Scalar in,
Scalar) {
return PacketReturnType(in);
}
EIGEN_DEVICE_FUNC static void set_packet(PacketReturnType &lhs, Scalar *rhs) {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void set_packet(PacketReturnType &lhs, Scalar *rhs) {
lhs = rhs[0];
}
};
@@ -195,24 +200,25 @@ struct PacketWrapper<PacketReturnType, 2> {
typedef typename ::Eigen::internal::unpacket_traits<PacketReturnType>::type
Scalar;
template <typename Index>
EIGEN_DEVICE_FUNC static Scalar scalarize(Index index, PacketReturnType &in) {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static Scalar scalarize(Index index, PacketReturnType &in) {
switch (index) {
case 0:
return in.x();
case 1:
return in.y();
default:
eigen_assert(false && "INDEX MUST BE BETWEEN 0 and 1");
abort();
// INDEX MUST BE BETWEEN 0 and 1. There is no abort function in SYCL kernels, so we cannot use abort here.
// The code will never reach here.
__builtin_unreachable();
}
__builtin_unreachable();
}
EIGEN_DEVICE_FUNC static PacketReturnType convert_to_packet_type(
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static PacketReturnType convert_to_packet_type(
Scalar in, Scalar other) {
return PacketReturnType(in, other);
}
EIGEN_DEVICE_FUNC static void set_packet(PacketReturnType &lhs, Scalar *rhs) {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void set_packet(PacketReturnType &lhs, Scalar *rhs) {
lhs = PacketReturnType(rhs[0], rhs[1]);
}
};

@@ -20,7 +20,6 @@
#ifndef EIGEN_MATH_FUNCTIONS_SYCL_H
#define EIGEN_MATH_FUNCTIONS_SYCL_H
namespace Eigen {
namespace internal {
@@ -70,6 +69,7 @@ SYCL_PLOG10(cl::sycl::cl_double2)
}
SYCL_PEXP(cl::sycl::cl_float4)
SYCL_PEXP(cl::sycl::cl_float)
SYCL_PEXP(cl::sycl::cl_double2)
#undef SYCL_PEXP
@@ -236,6 +236,17 @@ SYCL_PROUND(cl::sycl::cl_float4)
SYCL_PROUND(cl::sycl::cl_double2)
#undef SYCL_PROUND
#define SYCL_PRINT(packet_type) \
template <> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type print<packet_type>( \
const packet_type& a) { \
return cl::sycl::rint(a); \
}
SYCL_PRINT(cl::sycl::cl_float4)
SYCL_PRINT(cl::sycl::cl_double2)
#undef SYCL_PRINT
#define SYCL_FLOOR(packet_type) \
template <> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pfloor<packet_type>( \
@@ -269,8 +280,20 @@ SYCL_PMAX(cl::sycl::cl_float4, cl::sycl::fmax(a, b))
SYCL_PMAX(cl::sycl::cl_double2, cl::sycl::fmax(a, b))
#undef SYCL_PMAX
#endif
#define SYCL_PLDEXP(packet_type) \
template <> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pldexp( \
const packet_type& a, const packet_type& exponent) { \
return cl::sycl::ldexp( \
a, exponent.template convert<cl::sycl::cl_int, \
cl::sycl::rounding_mode::automatic>()); \
}
SYCL_PLDEXP(cl::sycl::cl_float4)
SYCL_PLDEXP(cl::sycl::cl_double2)
#undef SYCL_PLDEXP
#endif
} // end namespace internal
} // end namespace Eigen
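The SYCL_PLDEXP macro above receives the exponent as a floating-point packet and converts it to integers before calling ldexp. A one-lane sketch of that idea in standard C++ (the name `pldexp_lane` is hypothetical, and the exponent is assumed to hold an integral value that fits in an int):

```cpp
#include <cmath>

// Computes a * 2^exponent, where the exponent arrives as a float
// and is converted to int before the call to std::ldexp.
float pldexp_lane(float a, float exponent) {
  return std::ldexp(a, static_cast<int>(exponent));
}
```

For example, `pldexp_lane(0.75f, 4.0f)` scales 0.75 by 2^4.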

@@ -472,6 +472,115 @@ pabs<cl::sycl::cl_double2>(const cl::sycl::cl_double2& a) {
return cl::sycl::cl_double2(cl::sycl::fabs(a.x()), cl::sycl::fabs(a.y()));
}
template <typename Packet>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet sycl_pcmp_le(const Packet &a,
const Packet &b) {
return ((a <= b)
.template convert<typename unpacket_traits<Packet>::type,
cl::sycl::rounding_mode::automatic>());
}
template <typename Packet>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet sycl_pcmp_lt(const Packet &a,
const Packet &b) {
return ((a < b)
.template convert<typename unpacket_traits<Packet>::type,
cl::sycl::rounding_mode::automatic>());
}
template <typename Packet>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet sycl_pcmp_eq(const Packet &a,
const Packet &b) {
return ((a == b)
.template convert<typename unpacket_traits<Packet>::type,
cl::sycl::rounding_mode::automatic>());
}
#define SYCL_PCMP(OP, TYPE) \
template <> \
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE TYPE pcmp_##OP<TYPE>(const TYPE &a, \
const TYPE &b) { \
return sycl_pcmp_##OP<TYPE>(a, b); \
}
SYCL_PCMP(le, cl::sycl::cl_float4)
SYCL_PCMP(lt, cl::sycl::cl_float4)
SYCL_PCMP(eq, cl::sycl::cl_float4)
SYCL_PCMP(le, cl::sycl::cl_double2)
SYCL_PCMP(lt, cl::sycl::cl_double2)
SYCL_PCMP(eq, cl::sycl::cl_double2)
#undef SYCL_PCMP
template <typename T> struct convert_to_integer;
template <> struct convert_to_integer<float> {
using type = std::int32_t;
using packet_type = cl::sycl::cl_int4;
};
template <> struct convert_to_integer<double> {
using type = std::int64_t;
using packet_type = cl::sycl::cl_long2;
};
template <typename PacketIn>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename convert_to_integer<
typename unpacket_traits<PacketIn>::type>::packet_type
vector_as_int(const PacketIn &p) {
return (
p.template convert<typename convert_to_integer<
typename unpacket_traits<PacketIn>::type>::type,
cl::sycl::rounding_mode::automatic>());
}
template <typename packetOut, typename PacketIn>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packetOut
convert_vector(const PacketIn &p) {
return (p.template convert<typename unpacket_traits<packetOut>::type,
cl::sycl::rounding_mode::automatic>());
}
#define SYCL_PAND(TYPE) \
template <> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TYPE pand<TYPE>(const TYPE &a, \
const TYPE &b) { \
return convert_vector<TYPE>(vector_as_int(a) & vector_as_int(b)); \
}
SYCL_PAND(cl::sycl::cl_float4)
SYCL_PAND(cl::sycl::cl_double2)
#undef SYCL_PAND
#define SYCL_POR(TYPE) \
template <> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TYPE por<TYPE>(const TYPE &a, \
const TYPE &b) { \
return convert_vector<TYPE>(vector_as_int(a) | vector_as_int(b)); \
}
SYCL_POR(cl::sycl::cl_float4)
SYCL_POR(cl::sycl::cl_double2)
#undef SYCL_POR
#define SYCL_PXOR(TYPE) \
template <> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TYPE pxor<TYPE>(const TYPE &a, \
const TYPE &b) { \
return convert_vector<TYPE>(vector_as_int(a) ^ vector_as_int(b)); \
}
SYCL_PXOR(cl::sycl::cl_float4)
SYCL_PXOR(cl::sycl::cl_double2)
#undef SYCL_PXOR
#define SYCL_PANDNOT(TYPE) \
template <> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TYPE pandnot<TYPE>(const TYPE &a, \
const TYPE &b) { \
return convert_vector<TYPE>(vector_as_int(a) & (~vector_as_int(b))); \
}
SYCL_PANDNOT(cl::sycl::cl_float4)
SYCL_PANDNOT(cl::sycl::cl_double2)
#undef SYCL_PANDNOT
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void ptranspose(
PacketBlock<cl::sycl::cl_float4, 4>& kernel) {
float tmp = kernel.packet[0].y();

@@ -140,6 +140,11 @@ template<> EIGEN_STRONG_INLINE Packet1cd por <Packet1cd>(const Packet1cd& a,
template<> EIGEN_STRONG_INLINE Packet1cd pxor <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_xor(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd pandnot <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_and(a.v, vec_nor(b.v,b.v))); }
template<> EIGEN_STRONG_INLINE Packet1cd ploaddup<Packet1cd>(const std::complex<double>* from) { return pset1<Packet1cd>(*from); }
template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packet1cd& b) {
Packet2d eq = vec_cmpeq (a.v, b.v);
Packet2d tmp = { eq[1], eq[0] };
return (Packet1cd)pand<Packet2d>(eq, tmp);
}
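The swap-and-AND step in the complex pcmp_eq above ensures that a complex lane compares equal only when both its real and imaginary parts match. A scalar sketch of that logic follows; `complex_eq_mask` is a hypothetical name and a two-element `std::array` of 64-bit masks stands in for the vector register.

```cpp
#include <array>
#include <complex>
#include <cstdint>

// Returns {mask, mask}: all ones in both halves only if a == b in both components.
std::array<std::uint64_t, 2> complex_eq_mask(const std::complex<double>& a,
                                             const std::complex<double>& b) {
  const std::uint64_t ones = ~std::uint64_t{0};
  const std::uint64_t zero = 0;
  // Component-wise comparison: {real == real, imag == imag}.
  const std::array<std::uint64_t, 2> eq = {a.real() == b.real() ? ones : zero,
                                           a.imag() == b.imag() ? ones : zero};
  // Swap the two halves and AND, so each half sees both comparison results.
  const std::array<std::uint64_t, 2> swapped = {eq[1], eq[0]};
  return {eq[0] & swapped[0], eq[1] & swapped[1]};
}
```

If only the real parts match, the component-wise result is {ones, 0}, and the AND with its swapped copy clears both halves.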
template<> EIGEN_STRONG_INLINE void prefetch<std::complex<double> >(const std::complex<double> * addr) { EIGEN_ZVECTOR_PREFETCH(addr); }
@@ -156,24 +161,10 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Pack
{
return pfirst(a);
}
template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs)
{
return vecs[0];
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a)
{
return pfirst(a);
}
template<int Offset>
struct palign_impl<Offset,Packet1cd>
{
static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
{
// FIXME is it sure we never have to align a Packet1cd?
// Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16 bytes boundary...
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
@@ -295,6 +286,17 @@ template<> EIGEN_STRONG_INLINE void prefetch<std::complex<float> >(const std::co
#if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ < 12)
template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b) {
Packet4f eq = pcmp_eq<Packet4f> (a.v, b.v);
Packet2cf res;
Packet2d tmp1 = { eq.v4f[0][1], eq.v4f[0][0] };
Packet2d tmp2 = { eq.v4f[1][1], eq.v4f[1][0] };
res.v.v4f[0] = pand<Packet2d>(eq.v4f[0], tmp1);
res.v.v4f[1] = pand<Packet2d>(eq.v4f[1], tmp2);
return res;
}
template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a)
{
Packet2cf res;
@@ -327,16 +329,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packe
return res;
}
template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
{
PacketBlock<Packet2cf,2> transpose;
transpose.packet[0] = vecs[0];
transpose.packet[1] = vecs[1];
ptranspose(transpose);
return padd<Packet2cf>(transpose.packet[0], transpose.packet[1]);
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
{
std::complex<float> res;
@@ -345,18 +337,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return res;
}
template<int Offset>
struct palign_impl<Offset,Packet2cf>
{
static EIGEN_STRONG_INLINE void run(Packet2cf& first, const Packet2cf& second)
{
if (Offset == 1) {
first.cd[0] = first.cd[1];
first.cd[1] = second.cd[0];
}
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
@@ -423,6 +403,11 @@ template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, con
return result;
}
#else
template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b) {
Packet4f eq = vec_cmpeq (a.v, b.v);
Packet4f tmp = { eq[1], eq[0], eq[3], eq[2] };
return (Packet2cf)pand<Packet4f>(eq, tmp);
}
template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { return Packet2cf(pxor<Packet4f>(a.v, reinterpret_cast<Packet4f>(p4ui_CONJ_XOR))); }
template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
@@ -461,17 +446,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packe
return pfirst<Packet2cf>(Packet2cf(b));
}
template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
{
Packet4f b1, b2;
b1 = vec_sld(vecs[0].v, vecs[1].v, 8);
b2 = vec_sld(vecs[1].v, vecs[0].v, 8);
b2 = vec_sld(b2, b2, 8);
b2 = padd<Packet4f>(b1, b2);
return Packet2cf(b2);
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
{
Packet4f b;
@@ -482,18 +456,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return pfirst<Packet2cf>(prod);
}
template<int Offset>
struct palign_impl<Offset,Packet2cf>
{
static EIGEN_STRONG_INLINE void run(Packet2cf& first, const Packet2cf& second)
{
if (Offset==1)
{
first.v = vec_sld(first.v, second.v, 8);
}
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const

@@ -140,7 +140,6 @@ template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f pexp<Packet4f>(const Packet4f& _x)
{
#if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12)
/*
Packet4f x = _x;
Packet4f tmp, fx;
@@ -171,16 +170,11 @@ Packet4f pexp<Packet4f>(const Packet4f& _x)
y = padd(y, p4f_1);
// build 2^n
emm0 = vec_cts(fx, 0);
emm0 = (Packet4i){ (int)fx[0], (int)fx[1], (int)fx[2], (int)fx[3] };
emm0 = emm0 + p4i_0x7f;
emm0 = emm0 << reinterpret_cast<Packet4i>(p4i_23);
// Altivec's max & min operators just drop silent NaNs. Check NaNs in
// inputs and return them unmodified.
Packet4ui isnumber_mask = reinterpret_cast<Packet4ui>(vec_cmpeq(_x, _x));
return vec_sel(_x, pmax(pmul(y, reinterpret_cast<Packet4f>(emm0)), _x),
isnumber_mask);*/
return _x;
return pmax(pmul(y, reinterpret_cast<Packet4f>(emm0)), _x);
#else
Packet4f res;
res.v4f[0] = pexp<Packet2d>(_x.v4f[0]);

@@ -10,8 +10,6 @@
#ifndef EIGEN_PACKET_MATH_ZVECTOR_H
#define EIGEN_PACKET_MATH_ZVECTOR_H
#include <stdint.h>
namespace Eigen {
namespace internal {
@@ -51,10 +49,10 @@ typedef struct {
#endif
typedef union {
int32_t i[4];
uint32_t ui[4];
int64_t l[2];
uint64_t ul[2];
numext::int32_t i[4];
numext::uint32_t ui[4];
numext::int64_t l[2];
numext::uint64_t ul[2];
double d[2];
float f[4];
Packet4i v4i;
@@ -193,11 +191,7 @@ struct packet_traits<float> : default_packet_traits {
HasSin = 0,
HasCos = 0,
HasLog = 0,
#if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12)
HasExp = 0,
#else
HasExp = 1,
#endif
HasSqrt = 1,
HasRsqrt = 1,
HasTanh = 1,
@@ -298,33 +292,6 @@ inline std::ostream & operator <<(std::ostream & s, const Packet4f & v)
}
#endif
template<int Offset>
struct palign_impl<Offset,Packet4i>
{
static EIGEN_STRONG_INLINE void run(Packet4i& first, const Packet4i& second)
{
switch (Offset % 4) {
case 1:
first = vec_sld(first, second, 4); break;
case 2:
first = vec_sld(first, second, 8); break;
case 3:
first = vec_sld(first, second, 12); break;
}
}
};
template<int Offset>
struct palign_impl<Offset,Packet2d>
{
static EIGEN_STRONG_INLINE void run(Packet2d& first, const Packet2d& second)
{
if (Offset == 1)
first = reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4i>(first), reinterpret_cast<Packet4i>(second), 8));
}
};
template<> EIGEN_STRONG_INLINE Packet4i pload<Packet4i>(const int* from)
{
// FIXME: No intrinsic yet
@@ -530,45 +497,6 @@ template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a)
return pfirst(sum);
}
template<> EIGEN_STRONG_INLINE Packet4i preduxp<Packet4i>(const Packet4i* vecs)
{
Packet4i v[4], sum[4];
// It's easier and faster to transpose then add as columns
// Check: http://www.freevec.org/function/matrix_4x4_transpose_floats for explanation
// Do the transpose, first set of moves
v[0] = vec_mergeh(vecs[0], vecs[2]);
v[1] = vec_mergel(vecs[0], vecs[2]);
v[2] = vec_mergeh(vecs[1], vecs[3]);
v[3] = vec_mergel(vecs[1], vecs[3]);
// Get the resulting vectors
sum[0] = vec_mergeh(v[0], v[2]);
sum[1] = vec_mergel(v[0], v[2]);
sum[2] = vec_mergeh(v[1], v[3]);
sum[3] = vec_mergel(v[1], v[3]);
// Now do the summation:
// Lines 0+1
sum[0] = padd<Packet4i>(sum[0], sum[1]);
// Lines 2+3
sum[1] = padd<Packet4i>(sum[2], sum[3]);
// Add the results
sum[0] = padd<Packet4i>(sum[0], sum[1]);
return sum[0];
}
template<> EIGEN_STRONG_INLINE Packet2d preduxp<Packet2d>(const Packet2d* vecs)
{
Packet2d v[2], sum;
v[0] = padd<Packet2d>(vecs[0], reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4ui>(vecs[0]), reinterpret_cast<Packet4ui>(vecs[0]), 8)));
v[1] = padd<Packet2d>(vecs[1], reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4ui>(vecs[1]), reinterpret_cast<Packet4ui>(vecs[1]), 8)));
sum = reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4ui>(v[0]), reinterpret_cast<Packet4ui>(v[1]), 8));
return sum;
}
// Other reduction functions:
// mul
template<> EIGEN_STRONG_INLINE int predux_mul<Packet4i>(const Packet4i& a)
@@ -675,30 +603,6 @@ template<int element> EIGEN_STRONG_INLINE Packet4f vec_splat_packet4f(const Pack
return splat;
}
/* This is a tricky one, we have to translate float alignment to vector elements of sizeof double
*/
template<int Offset>
struct palign_impl<Offset,Packet4f>
{
static EIGEN_STRONG_INLINE void run(Packet4f& first, const Packet4f& second)
{
switch (Offset % 4) {
case 1:
first.v4f[0] = vec_sld(first.v4f[0], first.v4f[1], 8);
first.v4f[1] = vec_sld(first.v4f[1], second.v4f[0], 8);
break;
case 2:
first.v4f[0] = first.v4f[1];
first.v4f[1] = second.v4f[0];
break;
case 3:
first.v4f[0] = vec_sld(first.v4f[1], second.v4f[0], 8);
first.v4f[1] = vec_sld(second.v4f[0], second.v4f[1], 8);
break;
}
}
};
template<> EIGEN_STRONG_INLINE Packet4f pload<Packet4f>(const float* from)
{
// FIXME: No intrinsic yet
@@ -831,16 +735,16 @@ template<> EIGEN_STRONG_INLINE Packet4f pand<Packet4f>(const Packet4f& a, const
template<> EIGEN_STRONG_INLINE Packet4f por<Packet4f>(const Packet4f& a, const Packet4f& b)
{
Packet4f res;
res.v4f[0] = pand(a.v4f[0], b.v4f[0]);
res.v4f[1] = pand(a.v4f[1], b.v4f[1]);
res.v4f[0] = por(a.v4f[0], b.v4f[0]);
res.v4f[1] = por(a.v4f[1], b.v4f[1]);
return res;
}
template<> EIGEN_STRONG_INLINE Packet4f pxor<Packet4f>(const Packet4f& a, const Packet4f& b)
{
Packet4f res;
res.v4f[0] = pand(a.v4f[0], b.v4f[0]);
res.v4f[1] = pand(a.v4f[1], b.v4f[1]);
res.v4f[0] = pxor(a.v4f[0], b.v4f[0]);
res.v4f[1] = pxor(a.v4f[1], b.v4f[1]);
return res;
}
@@ -910,21 +814,6 @@ template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
return static_cast<float>(first);
}
template<> EIGEN_STRONG_INLINE Packet4f preduxp<Packet4f>(const Packet4f* vecs)
{
PacketBlock<Packet4f,4> transpose;
transpose.packet[0] = vecs[0];
transpose.packet[1] = vecs[1];
transpose.packet[2] = vecs[2];
transpose.packet[3] = vecs[3];
ptranspose(transpose);
Packet4f sum = padd(transpose.packet[0], transpose.packet[1]);
sum = padd(sum, transpose.packet[2]);
sum = padd(sum, transpose.packet[3]);
return sum;
}
template<> EIGEN_STRONG_INLINE float predux_mul<Packet4f>(const Packet4f& a)
{
// Return predux_mul<Packet2d> of the subvectors product
@@ -995,23 +884,32 @@ template<> EIGEN_STRONG_INLINE Packet4f pblend(const Selector<4>& ifPacket, cons
result.v4f[1] = vec_sel(elsePacket.v4f[1], thenPacket.v4f[1], mask_lo);
return result;
}
#else
template<int Offset>
struct palign_impl<Offset,Packet4f>
{
static EIGEN_STRONG_INLINE void run(Packet4f& first, const Packet4f& second)
{
switch (Offset % 4) {
case 1:
first = vec_sld(first, second, 4); break;
case 2:
first = vec_sld(first, second, 8); break;
case 3:
first = vec_sld(first, second, 12); break;
}
}
};
template<> Packet4f EIGEN_STRONG_INLINE pcmp_le<Packet4f>(const Packet4f& a, const Packet4f& b)
{
Packet4f res;
res.v4f[0] = pcmp_le(a.v4f[0], b.v4f[0]);
res.v4f[1] = pcmp_le(a.v4f[1], b.v4f[1]);
return res;
}
template<> Packet4f EIGEN_STRONG_INLINE pcmp_lt<Packet4f>(const Packet4f& a, const Packet4f& b)
{
Packet4f res;
res.v4f[0] = pcmp_lt(a.v4f[0], b.v4f[0]);
res.v4f[1] = pcmp_lt(a.v4f[1], b.v4f[1]);
return res;
}
template<> Packet4f EIGEN_STRONG_INLINE pcmp_eq<Packet4f>(const Packet4f& a, const Packet4f& b)
{
Packet4f res;
res.v4f[0] = pcmp_eq(a.v4f[0], b.v4f[0]);
res.v4f[1] = pcmp_eq(a.v4f[1], b.v4f[1]);
return res;
}
#else
template<> EIGEN_STRONG_INLINE Packet4f pload<Packet4f>(const float* from)
{
// FIXME: No intrinsic yet
@@ -1106,34 +1004,6 @@ template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
return pfirst(sum);
}
template<> EIGEN_STRONG_INLINE Packet4f preduxp<Packet4f>(const Packet4f* vecs)
{
Packet4f v[4], sum[4];
// It's easier and faster to transpose then add as columns
// Check: http://www.freevec.org/function/matrix_4x4_transpose_floats for explanation
// Do the transpose, first set of moves
v[0] = vec_mergeh(vecs[0], vecs[2]);
v[1] = vec_mergel(vecs[0], vecs[2]);
v[2] = vec_mergeh(vecs[1], vecs[3]);
v[3] = vec_mergel(vecs[1], vecs[3]);
// Get the resulting vectors
sum[0] = vec_mergeh(v[0], v[2]);
sum[1] = vec_mergel(v[0], v[2]);
sum[2] = vec_mergeh(v[1], v[3]);
sum[3] = vec_mergel(v[1], v[3]);
// Now do the summation:
// Lines 0+1
sum[0] = padd<Packet4f>(sum[0], sum[1]);
// Lines 2+3
sum[1] = padd<Packet4f>(sum[2], sum[3]);
// Add the results
sum[0] = padd<Packet4f>(sum[0], sum[1]);
return sum[0];
}
// Other reduction functions:
// mul
template<> EIGEN_STRONG_INLINE float predux_mul<Packet4f>(const Packet4f& a)


@@ -39,12 +39,12 @@ struct scalar_sum_op : binary_op_base<LhsScalar,RhsScalar>
EIGEN_SCALAR_BINARY_OP_PLUGIN
}
#endif
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a + b; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a + b; }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::padd(a,b); }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type predux(const Packet& a) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type predux(const Packet& a) const
{ return internal::predux(a); }
};
template<typename LhsScalar,typename RhsScalar>
@@ -56,15 +56,9 @@ struct functor_traits<scalar_sum_op<LhsScalar,RhsScalar> > {
};
};
/** \internal
* \brief Template specialization to deprecate the summation of boolean expressions.
* This is required to solve Bug 426.
* \sa DenseBase::count(), DenseBase::any(), ArrayBase::cast(), MatrixBase::cast()
*/
template<> struct scalar_sum_op<bool,bool> : scalar_sum_op<int,int> {
EIGEN_DEPRECATED
scalar_sum_op() {}
};
template<>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool scalar_sum_op<bool,bool>::operator() (const bool& a, const bool& b) const { return a || b; }
/** \internal
@@ -83,12 +77,12 @@ struct scalar_product_op : binary_op_base<LhsScalar,RhsScalar>
EIGEN_SCALAR_BINARY_OP_PLUGIN
}
#endif
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a * b; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a * b; }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pmul(a,b); }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type predux(const Packet& a) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type predux(const Packet& a) const
{ return internal::predux_mul(a); }
};
template<typename LhsScalar,typename RhsScalar>
@@ -100,6 +94,10 @@ struct functor_traits<scalar_product_op<LhsScalar,RhsScalar> > {
};
};
template<>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool scalar_product_op<bool,bool>::operator() (const bool& a, const bool& b) const { return a && b; }
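The two `bool` specializations above make the scalar sum behave as logical OR and the scalar product as logical AND. A minimal sketch of what that implies for a reduction, assuming a plain left fold; `BoolSum`, `BoolProduct` and `fold` are hypothetical names for illustration and are not part of Eigen:

```cpp
#include <cstddef>

// Hypothetical functors; they copy only the scalar bodies of the
// bool specializations shown in the diff.
struct BoolSum     { bool operator()(bool a, bool b) const { return a || b; } };
struct BoolProduct { bool operator()(bool a, bool b) const { return a && b; } };

// Left fold of a non-empty array with a binary functor.
template <typename Op>
bool fold(const bool* v, std::size_t n, Op op) {
  bool acc = v[0];
  for (std::size_t i = 1; i < n; ++i) acc = op(acc, v[i]);
  return acc;
}
```

Under this model a "sum" over booleans answers "is any element true" and a "product" answers "are all elements true", instead of promoting to `int` and overflowing or warning.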
/** \internal
* \brief Template functor to compute the conjugate product of two scalars
*
@@ -116,11 +114,11 @@ struct scalar_conj_product_op : binary_op_base<LhsScalar,RhsScalar>
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_conj_product_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_conj_product_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const
{ return conj_helper<LhsScalar,RhsScalar,Conj,false>().pmul(a,b); }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const
{ return conj_helper<Packet,Packet,Conj,false>().pmul(a,b); }
};
template<typename LhsScalar,typename RhsScalar>
@@ -136,21 +134,28 @@ struct functor_traits<scalar_conj_product_op<LhsScalar,RhsScalar> > {
*
* \sa class CwiseBinaryOp, MatrixBase::cwiseMin, class VectorwiseOp, MatrixBase::minCoeff()
*/
template<typename LhsScalar,typename RhsScalar>
template<typename LhsScalar,typename RhsScalar, int NaNPropagation>
struct scalar_min_op : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_min_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_min_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return numext::mini(a, b); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
return internal::pmin<NaNPropagation>(a, b);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pmin(a,b); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const
{
return internal::pmin<NaNPropagation>(a,b);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type predux(const Packet& a) const
{ return internal::predux_min(a); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type predux(const Packet& a) const
{
return internal::predux_min<NaNPropagation>(a);
}
};
template<typename LhsScalar,typename RhsScalar>
struct functor_traits<scalar_min_op<LhsScalar,RhsScalar> > {
template<typename LhsScalar,typename RhsScalar, int NaNPropagation>
struct functor_traits<scalar_min_op<LhsScalar,RhsScalar, NaNPropagation> > {
enum {
Cost = (NumTraits<LhsScalar>::AddCost+NumTraits<RhsScalar>::AddCost)/2,
PacketAccess = internal::is_same<LhsScalar, RhsScalar>::value && packet_traits<LhsScalar>::HasMin
@@ -162,21 +167,28 @@ struct functor_traits<scalar_min_op<LhsScalar,RhsScalar> > {
*
* \sa class CwiseBinaryOp, MatrixBase::cwiseMax, class VectorwiseOp, MatrixBase::maxCoeff()
*/
template<typename LhsScalar,typename RhsScalar>
struct scalar_max_op : binary_op_base<LhsScalar,RhsScalar>
template<typename LhsScalar,typename RhsScalar, int NaNPropagation>
struct scalar_max_op : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_max_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_max_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return numext::maxi(a, b); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
return internal::pmax<NaNPropagation>(a,b);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pmax(a,b); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const
{
return internal::pmax<NaNPropagation>(a,b);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type predux(const Packet& a) const
{ return internal::predux_max(a); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type predux(const Packet& a) const
{
return internal::predux_max<NaNPropagation>(a);
}
};
template<typename LhsScalar,typename RhsScalar>
struct functor_traits<scalar_max_op<LhsScalar,RhsScalar> > {
template<typename LhsScalar,typename RhsScalar, int NaNPropagation>
struct functor_traits<scalar_max_op<LhsScalar,RhsScalar, NaNPropagation> > {
enum {
Cost = (NumTraits<LhsScalar>::AddCost+NumTraits<RhsScalar>::AddCost)/2,
PacketAccess = internal::is_same<LhsScalar, RhsScalar>::value && packet_traits<LhsScalar>::HasMax
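The new `NaNPropagation` template parameter on `scalar_min_op` and `scalar_max_op` selects how a NaN operand is treated. The sketch below models the three policies for a scalar `min`. The enum and function names are hypothetical, and the "fast" branch stands in for whatever the underlying comparison happens to produce; it is not a statement about Eigen's actual result in that mode.

```cpp
#include <cmath>

// Hypothetical policy tags for this sketch only.
enum NaNPolicy { kFast = 0, kPropagateNaN = 1, kPropagateNumbers = 2 };

template <int Policy>
float min_with_policy(float a, float b) {
  if (Policy == kPropagateNaN) {
    // Any NaN operand wins.
    if (std::isnan(a)) return a;
    if (std::isnan(b)) return b;
  }
  if (Policy == kPropagateNumbers) {
    // A NaN operand loses to a number.
    if (std::isnan(a)) return b;
    if (std::isnan(b)) return a;
  }
  // Plain comparison; with kFast the result for NaN inputs depends on
  // operand order, which is why that mode makes no guarantee.
  return (b < a) ? b : a;
}
```

Making the policy a compile-time parameter lets the scalar path, the packet path and the reduction all agree on one behavior without a runtime branch.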
@@ -253,6 +265,110 @@ struct scalar_cmp_op<LhsScalar,RhsScalar, cmp_NEQ> : binary_op_base<LhsScalar,Rh
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(const LhsScalar& a, const RhsScalar& b) const {return a!=b;}
};
/** \internal
* \brief Template functors for comparison of two scalars and cast the output from boolean to input datatype
*/
template<typename LhsScalar, typename RhsScalar, ComparisonName cmp> struct scalar_cmp_with_cast_op;
template<typename LhsScalar, typename RhsScalar, ComparisonName cmp>
struct functor_traits<scalar_cmp_with_cast_op<LhsScalar,RhsScalar, cmp> > {
enum {
Cost = (NumTraits<LhsScalar>::AddCost+NumTraits<RhsScalar>::AddCost)/2,
PacketAccess = internal::is_same<LhsScalar, RhsScalar>::value && packet_traits<LhsScalar>::HasCmp
};
};
template<typename LhsScalar, typename RhsScalar>
struct scalar_cmp_with_cast_op<LhsScalar, RhsScalar, cmp_EQ> : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_cmp_with_cast_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_with_cast_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
if(a==b) return static_cast<result_type>(1);
else return static_cast<result_type>(0);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pselect(internal::pcmp_eq(a,b), internal::pset1<Packet>(static_cast<result_type>(1)), internal::pzero(a)); }
};
template<typename LhsScalar, typename RhsScalar>
struct scalar_cmp_with_cast_op<LhsScalar, RhsScalar, cmp_LT> : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_cmp_with_cast_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_with_cast_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
if(a<b) return static_cast<result_type>(1);
else return static_cast<result_type>(0);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pselect(internal::pcmp_lt(a,b), internal::pset1<Packet>(static_cast<result_type>(1)), internal::pzero(a)); }
};
template<typename LhsScalar, typename RhsScalar>
struct scalar_cmp_with_cast_op<LhsScalar, RhsScalar, cmp_LE> : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_cmp_with_cast_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_with_cast_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
if(a<=b) return static_cast<result_type>(1);
else return static_cast<result_type>(0);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pselect(internal::pcmp_le(a,b), internal::pset1<Packet>(static_cast<result_type>(1)), internal::pzero(a)); }
};
template<typename LhsScalar, typename RhsScalar>
struct scalar_cmp_with_cast_op<LhsScalar, RhsScalar, cmp_GT> : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_cmp_with_cast_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_with_cast_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
if(a>b) return static_cast<result_type>(1);
else return static_cast<result_type>(0);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pselect(internal::pcmp_le(a,b), internal::pzero(a), internal::pset1<Packet>(static_cast<result_type>(1))); }
};
template<typename LhsScalar, typename RhsScalar>
struct scalar_cmp_with_cast_op<LhsScalar, RhsScalar, cmp_GE> : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_cmp_with_cast_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_with_cast_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
if(a>=b) return static_cast<result_type>(1);
else return static_cast<result_type>(0);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pselect(internal::pcmp_lt(a,b), internal::pzero(a), internal::pset1<Packet>(static_cast<result_type>(1))); }
};
template<typename LhsScalar, typename RhsScalar>
struct scalar_cmp_with_cast_op<LhsScalar, RhsScalar, cmp_UNORD> : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_cmp_with_cast_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_with_cast_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
if(a<=b || b<=a) return static_cast<result_type>(0);
else return static_cast<result_type>(1);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pselect(por(internal::pcmp_le(a,b), internal::pcmp_le(b,a)), internal::pzero(a), internal::pset1<Packet>(static_cast<result_type>(1))); }
};
template<typename LhsScalar, typename RhsScalar>
struct scalar_cmp_with_cast_op<LhsScalar, RhsScalar, cmp_NEQ> : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_cmp_with_cast_op>::ReturnType result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_with_cast_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const {
if(a!=b) return static_cast<result_type>(1);
else return static_cast<result_type>(0);
}
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pselect(internal::pcmp_eq(a,b), internal::pzero(a), internal::pset1<Packet>(static_cast<result_type>(1))); }
};
/** \internal
* \brief Template functor to compute the hypot of two \b positive \b and \b real scalars
@@ -287,6 +403,7 @@ struct functor_traits<scalar_hypot_op<Scalar,Scalar> > {
/** \internal
* \brief Template functor to compute the pow of two scalars
* See the specification of pow in https://en.cppreference.com/w/cpp/numeric/math/pow
*/
template<typename Scalar, typename Exponent>
struct scalar_pow_op : binary_op_base<Scalar,Exponent>
@@ -301,16 +418,31 @@ struct scalar_pow_op : binary_op_base<Scalar,Exponent>
EIGEN_SCALAR_BINARY_OP_PLUGIN
}
#endif
EIGEN_DEVICE_FUNC
inline result_type operator() (const Scalar& a, const Exponent& b) const { return numext::pow(a, b); }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{
return generic_pow(a,b);
}
};
template<typename Scalar, typename Exponent>
struct functor_traits<scalar_pow_op<Scalar,Exponent> > {
enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = false };
enum {
Cost = 5 * NumTraits<Scalar>::MulCost,
PacketAccess = (!NumTraits<Scalar>::IsComplex && !NumTraits<Scalar>::IsInteger &&
packet_traits<Scalar>::HasExp && packet_traits<Scalar>::HasLog &&
packet_traits<Scalar>::HasRound && packet_traits<Scalar>::HasCmp &&
// Temporarly disable packet access for half/bfloat16 until

// accuracy is improved.
!is_same<Scalar, half>::value && !is_same<Scalar, bfloat16>::value
)
};
};
//---------- non associative binary functors ----------
/** \internal
@@ -382,11 +514,14 @@ struct functor_traits<scalar_quotient_op<LhsScalar,RhsScalar> > {
struct scalar_boolean_and_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_and_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a && b; }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pand(a,b); }
};
template<> struct functor_traits<scalar_boolean_and_op> {
enum {
Cost = NumTraits<bool>::AddCost,
PacketAccess = false
PacketAccess = true
};
};
@@ -398,11 +533,14 @@ template<> struct functor_traits<scalar_boolean_and_op> {
struct scalar_boolean_or_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_or_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a || b; }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::por(a,b); }
};
template<> struct functor_traits<scalar_boolean_or_op> {
enum {
Cost = NumTraits<bool>::AddCost,
PacketAccess = false
PacketAccess = true
};
};
@@ -414,11 +552,44 @@ template<> struct functor_traits<scalar_boolean_or_op> {
struct scalar_boolean_xor_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_xor_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a ^ b; }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pxor(a,b); }
};
template<> struct functor_traits<scalar_boolean_xor_op> {
enum {
Cost = NumTraits<bool>::AddCost,
PacketAccess = false
PacketAccess = true
};
};
/** \internal
* \brief Template functor to compute the absolute difference of two scalars
*
* \sa class CwiseBinaryOp, MatrixBase::absolute_difference
*/
template<typename LhsScalar,typename RhsScalar>
struct scalar_absolute_difference_op : binary_op_base<LhsScalar,RhsScalar>
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar,scalar_absolute_difference_op>::ReturnType result_type;
#ifndef EIGEN_SCALAR_BINARY_OP_PLUGIN
EIGEN_EMPTY_STRUCT_CTOR(scalar_absolute_difference_op)
#else
scalar_absolute_difference_op() {
EIGEN_SCALAR_BINARY_OP_PLUGIN
}
#endif
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const
{ return numext::absdiff(a,b); }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
{ return internal::pabsdiff(a,b); }
};
template<typename LhsScalar,typename RhsScalar>
struct functor_traits<scalar_absolute_difference_op<LhsScalar,RhsScalar> > {
enum {
Cost = (NumTraits<LhsScalar>::AddCost+NumTraits<RhsScalar>::AddCost)/2,
PacketAccess = is_same<LhsScalar,RhsScalar>::value && packet_traits<LhsScalar>::HasAbsDiff
};
};


@@ -44,17 +44,17 @@ struct linspaced_op_impl<Scalar,/*IsInteger*/false>
{
typedef typename NumTraits<Scalar>::Real RealScalar;
linspaced_op_impl(const Scalar& low, const Scalar& high, Index num_steps) :
m_low(low), m_high(high), m_size1(num_steps==1 ? 1 : num_steps-1), m_step(num_steps==1 ? Scalar() : (high-low)/RealScalar(num_steps-1)),
EIGEN_DEVICE_FUNC linspaced_op_impl(const Scalar& low, const Scalar& high, Index num_steps) :
m_low(low), m_high(high), m_size1(num_steps==1 ? 1 : num_steps-1), m_step(num_steps==1 ? Scalar() : Scalar((high-low)/RealScalar(num_steps-1))),
m_flip(numext::abs(high)<numext::abs(low))
{}
template<typename IndexType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (IndexType i) const {
if(m_flip)
return (i==0)? m_low : (m_high - RealScalar(m_size1-i)*m_step);
return (i==0)? m_low : Scalar(m_high - RealScalar(m_size1-i)*m_step);
else
return (i==m_size1)? m_high : (m_low + RealScalar(i)*m_step);
return (i==m_size1)? m_high : Scalar(m_low + RealScalar(i)*m_step);
}
template<typename Packet, typename IndexType>
@@ -66,17 +66,17 @@ struct linspaced_op_impl<Scalar,/*IsInteger*/false>
{
Packet pi = plset<Packet>(Scalar(i-m_size1));
Packet res = padd(pset1<Packet>(m_high), pmul(pset1<Packet>(m_step), pi));
if(i==0)
res = pinsertfirst(res, m_low);
return res;
if (EIGEN_PREDICT_TRUE(i != 0)) return res;
Packet mask = pcmp_lt(pset1<Packet>(0), plset<Packet>(0));
return pselect<Packet>(mask, res, pset1<Packet>(m_low));
}
else
{
Packet pi = plset<Packet>(Scalar(i));
Packet res = padd(pset1<Packet>(m_low), pmul(pset1<Packet>(m_step), pi));
if(i==m_size1-unpacket_traits<Packet>::size+1)
res = pinsertlast(res, m_high);
return res;
if(EIGEN_PREDICT_TRUE(i != m_size1-unpacket_traits<Packet>::size+1)) return res;
Packet mask = pcmp_lt(plset<Packet>(0), pset1<Packet>(unpacket_traits<Packet>::size-1));
return pselect<Packet>(mask, res, pset1<Packet>(m_high));
}
}
@@ -90,7 +90,7 @@ struct linspaced_op_impl<Scalar,/*IsInteger*/false>
template <typename Scalar>
struct linspaced_op_impl<Scalar,/*IsInteger*/true>
{
linspaced_op_impl(const Scalar& low, const Scalar& high, Index num_steps) :
EIGEN_DEVICE_FUNC linspaced_op_impl(const Scalar& low, const Scalar& high, Index num_steps) :
m_low(low),
m_multiplier((high-low)/convert_index<Scalar>(num_steps<=1 ? 1 : num_steps-1)),
m_divisor(convert_index<Scalar>((high>=low?num_steps:-num_steps)+(high-low))/((numext::abs(high-low)+1)==0?1:(numext::abs(high-low)+1))),
@@ -129,7 +129,7 @@ template <typename Scalar> struct functor_traits< linspaced_op<Scalar> >
};
template <typename Scalar> struct linspaced_op
{
linspaced_op(const Scalar& low, const Scalar& high, Index num_steps)
EIGEN_DEVICE_FUNC linspaced_op(const Scalar& low, const Scalar& high, Index num_steps)
: impl((num_steps==1 ? high : low),high,num_steps)
{}


@@ -166,6 +166,44 @@ template<typename Scalar, typename NewType>
struct functor_traits<scalar_cast_op<Scalar,NewType> >
{ enum { Cost = is_same<Scalar, NewType>::value ? 0 : NumTraits<NewType>::AddCost, PacketAccess = false }; };
/** \internal
* \brief Template functor to arithmetically shift a scalar right by a number of bits
*
* \sa class CwiseUnaryOp, MatrixBase::shift_right()
*/
template<typename Scalar, int N>
struct scalar_shift_right_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_shift_right_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const
{ return a >> N; }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
{ return internal::parithmetic_shift_right<N>(a); }
};
template<typename Scalar, int N>
struct functor_traits<scalar_shift_right_op<Scalar,N> >
{ enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = packet_traits<Scalar>::HasShift }; };
/** \internal
* \brief Template functor to logically shift a scalar left by a number of bits
*
* \sa class CwiseUnaryOp, MatrixBase::shift_left()
*/
template<typename Scalar, int N>
struct scalar_shift_left_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_shift_left_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const
{ return a << N; }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
{ return internal::plogical_shift_left<N>(a); }
};
template<typename Scalar, int N>
struct functor_traits<scalar_shift_left_op<Scalar,N> >
{ enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = packet_traits<Scalar>::HasShift }; };
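The scalar paths of `scalar_shift_right_op` and `scalar_shift_left_op` are the built-in `>>` and `<<` operators with a compile-time shift count. A minimal sketch with hypothetical free functions:

```cpp
// Hypothetical helpers mirroring the scalar bodies of the shift functors.
// N is the compile-time shift count, as in the functors' template parameter.
template <int N, typename T>
T shift_right(T a) { return a >> N; }

template <int N, typename T>
T shift_left(T a) { return a << N; }
```

For example, `shift_right<2>(16)` yields 4 and `shift_left<3>(1)` yields 8. For negative signed values the result of `>>` is implementation-defined before C++20, so the "arithmetic" behavior named in the doc comment is an assumption about the target compiler, not a language guarantee in older standards.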
/** \internal
* \brief Template functor to extract the real part of a complex
*
@@ -349,7 +387,7 @@ struct functor_traits<scalar_log1p_op<Scalar> > {
*/
template<typename Scalar> struct scalar_log10_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_log10_op)
EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { EIGEN_USING_STD_MATH(log10) return log10(a); }
EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { EIGEN_USING_STD(log10) return log10(a); }
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::plog10(a); }
};
@@ -357,6 +395,22 @@ template<typename Scalar>
struct functor_traits<scalar_log10_op<Scalar> >
{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasLog10 }; };
/** \internal
*
* \brief Template functor to compute the base-2 logarithm of a scalar
*
* \sa class CwiseUnaryOp, Cwise::log2()
*/
template<typename Scalar> struct scalar_log2_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_log2_op)
EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return Scalar(EIGEN_LOG2E) * numext::log(a); }
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::plog2(a); }
};
template<typename Scalar>
struct functor_traits<scalar_log2_op<Scalar> >
{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasLog }; };
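The scalar path of `scalar_log2_op` relies on the identity log2(a) = log2(e) * ln(a). A small sketch of that identity; `kLog2E` and `log2_via_ln` are hypothetical names, with `kLog2E` standing in for the `EIGEN_LOG2E` macro used in the diff:

```cpp
#include <cmath>

// log2(e), written out as a literal for this sketch.
constexpr double kLog2E = 1.442695040888963407359924681001892137;

// Base-2 logarithm computed from the natural logarithm.
inline double log2_via_ln(double a) { return kLog2E * std::log(a); }
```

The result can differ from `std::log2` in the last bits because of the extra multiplication, which is presumably acceptable for a generic scalar fallback.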
/** \internal
* \brief Template functor to compute the square root of a scalar
* \sa class CwiseUnaryOp, Cwise::sqrt()
@@ -384,13 +438,25 @@ struct functor_traits<scalar_sqrt_op<Scalar> > {
};
};
// Boolean specialization to eliminate -Wimplicit-conversion-floating-point-to-bool warnings.
template<> struct scalar_sqrt_op<bool> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_sqrt_op)
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline bool operator() (const bool& a) const { return a; }
template <typename Packet>
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return a; }
};
template <>
struct functor_traits<scalar_sqrt_op<bool> > {
enum { Cost = 1, PacketAccess = packet_traits<bool>::Vectorizable };
};
/** \internal
* \brief Template functor to compute the reciprocal square root of a scalar
* \sa class CwiseUnaryOp, Cwise::rsqrt()
*/
template<typename Scalar> struct scalar_rsqrt_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_rsqrt_op)
EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return Scalar(1)/numext::sqrt(a); }
EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::rsqrt(a); }
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::prsqrt(a); }
};
@@ -681,6 +747,19 @@ template<typename Scalar>
struct functor_traits<scalar_square_op<Scalar> >
{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasMul }; };
// Boolean specialization to avoid -Wint-in-bool-context warnings on GCC.
template<>
struct scalar_square_op<bool> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_square_op)
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline bool operator() (const bool& a) const { return a; }
template<typename Packet>
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const
{ return a; }
};
template<>
struct functor_traits<scalar_square_op<bool> >
{ enum { Cost = 0, PacketAccess = packet_traits<bool>::Vectorizable }; };
/** \internal
* \brief Template functor to compute the cube of a scalar
* \sa class CwiseUnaryOp, Cwise::cube()
@@ -697,6 +776,19 @@ template<typename Scalar>
struct functor_traits<scalar_cube_op<Scalar> >
{ enum { Cost = 2*NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasMul }; };
// Boolean specialization to avoid -Wint-in-bool-context warnings on GCC.
template<>
struct scalar_cube_op<bool> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cube_op)
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline bool operator() (const bool& a) const { return a; }
template<typename Packet>
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const
{ return a; }
};
template<>
struct functor_traits<scalar_cube_op<bool> >
{ enum { Cost = 0, PacketAccess = packet_traits<bool>::Vectorizable }; };
/** \internal
* \brief Template functor to compute the rounded value of a scalar
* \sa class CwiseUnaryOp, ArrayBase::round()
@@ -735,6 +827,25 @@ struct functor_traits<scalar_floor_op<Scalar> >
};
};
/** \internal
* \brief Template functor to compute the rounded (with current rounding mode) value of a scalar
* \sa class CwiseUnaryOp, ArrayBase::rint()
*/
template<typename Scalar> struct scalar_rint_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_rint_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return numext::rint(a); }
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::print(a); }
};
template<typename Scalar>
struct functor_traits<scalar_rint_op<Scalar> >
{
enum {
Cost = NumTraits<Scalar>::MulCost,
PacketAccess = packet_traits<Scalar>::HasRint
};
};
/** \internal
* \brief Template functor to compute the ceil of a scalar
* \sa class CwiseUnaryOp, ArrayBase::ceil()
@@ -847,9 +958,9 @@ struct functor_traits<scalar_boolean_not_op<Scalar> > {
* \brief Template functor to compute the signum of a scalar
* \sa class CwiseUnaryOp, Cwise::sign()
*/
template<typename Scalar,bool iscpx=(NumTraits<Scalar>::IsComplex!=0) > struct scalar_sign_op;
template<typename Scalar,bool is_complex=(NumTraits<Scalar>::IsComplex!=0), bool is_integer=(NumTraits<Scalar>::IsInteger!=0) > struct scalar_sign_op;
template<typename Scalar>
struct scalar_sign_op<Scalar,false> {
struct scalar_sign_op<Scalar, false, true> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_sign_op)
EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const
{
@@ -859,8 +970,21 @@ struct scalar_sign_op<Scalar,false> {
//template <typename Packet>
//EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::psign(a); }
};
template<typename Scalar>
struct scalar_sign_op<Scalar,true> {
struct scalar_sign_op<Scalar, false, false> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_sign_op)
EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const
{
return (numext::isnan)(a) ? a : Scalar( (a>Scalar(0)) - (a<Scalar(0)) );
}
//TODO
//template <typename Packet>
//EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::psign(a); }
};
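The non-integer, non-complex specialization returns NaN unchanged and otherwise builds the sign from two comparisons. A small sketch of that expression outside Eigen, using `std::isnan` in place of `numext::isnan`; the name `sign_fp` is assumed for illustration only.

```cpp
#include <cmath>

// Sign of a real floating-point value: -1, 0 or +1, with NaN passed through.
// (a > 0) and (a < 0) each evaluate to 0 or 1, so their difference is the sign.
template <typename T>
inline T sign_fp(const T& a) {
  return std::isnan(a) ? a : T((a > T(0)) - (a < T(0)));
}
```

Without the NaN check both comparisons would be false and the result would be 0, silently hiding the NaN.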
template<typename Scalar, bool is_integer>
struct scalar_sign_op<Scalar,true, is_integer> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_sign_op)
EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const
{
@@ -894,8 +1018,7 @@ template <typename T>
struct scalar_logistic_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_logistic_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(const T& x) const {
const T one = T(1);
return one / (one + numext::exp(-x));
return packetOp(x);
}
template <typename Packet> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -905,14 +1028,99 @@ struct scalar_logistic_op {
}
};
#ifndef EIGEN_GPU_COMPILE_PHASE
/** \internal
* \brief Template specialization of the logistic function for float.
*
* Uses just a 9/10-degree rational interpolant which
* interpolates 1/(1+exp(-x)) - 0.5 up to a couple of ulps in the range
* [-9, 18]. Below -9 we use the more accurate approximation
* 1/(1+exp(-x)) ~= exp(x), and above 18 the logistic function is 1 within
* one ulp. The shifted logistic is interpolated because it was easier to
* make the fit converge.
*
*/
template <>
struct scalar_logistic_op<float> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_logistic_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator()(const float& x) const {
return packetOp(x);
}
template <typename Packet> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Packet packetOp(const Packet& _x) const {
const Packet cutoff_lower = pset1<Packet>(-9.f);
const Packet lt_mask = pcmp_lt<Packet>(_x, cutoff_lower);
const bool any_small = predux_any(lt_mask);
// The upper cut-off is the smallest x for which the rational approximation evaluates to 1.
// Choosing this value saves us a few instructions clamping the results at the end.
#ifdef EIGEN_VECTORIZE_FMA
const Packet cutoff_upper = pset1<Packet>(15.7243833541870117f);
#else
const Packet cutoff_upper = pset1<Packet>(15.6437711715698242f);
#endif
const Packet x = pmin(_x, cutoff_upper);
// The monomial coefficients of the numerator polynomial (odd).
const Packet alpha_1 = pset1<Packet>(2.48287947061529e-01f);
const Packet alpha_3 = pset1<Packet>(8.51377133304701e-03f);
const Packet alpha_5 = pset1<Packet>(6.08574864600143e-05f);
const Packet alpha_7 = pset1<Packet>(1.15627324459942e-07f);
const Packet alpha_9 = pset1<Packet>(4.37031012579801e-11f);
// The monomial coefficients of the denominator polynomial (even).
const Packet beta_0 = pset1<Packet>(9.93151921023180e-01f);
const Packet beta_2 = pset1<Packet>(1.16817656904453e-01f);
const Packet beta_4 = pset1<Packet>(1.70198817374094e-03f);
const Packet beta_6 = pset1<Packet>(6.29106785017040e-06f);
const Packet beta_8 = pset1<Packet>(5.76102136993427e-09f);
const Packet beta_10 = pset1<Packet>(6.10247389755681e-13f);
// Since the polynomials are odd/even, we need x^2.
const Packet x2 = pmul(x, x);
// Evaluate the numerator polynomial p.
Packet p = pmadd(x2, alpha_9, alpha_7);
p = pmadd(x2, p, alpha_5);
p = pmadd(x2, p, alpha_3);
p = pmadd(x2, p, alpha_1);
p = pmul(x, p);
// Evaluate the denominator polynomial q.
Packet q = pmadd(x2, beta_10, beta_8);
q = pmadd(x2, q, beta_6);
q = pmadd(x2, q, beta_4);
q = pmadd(x2, q, beta_2);
q = pmadd(x2, q, beta_0);
// Divide the numerator by the denominator and shift it up.
const Packet logistic = padd(pdiv(p, q), pset1<Packet>(0.5f));
if (EIGEN_PREDICT_FALSE(any_small)) {
const Packet exponential = pexp(_x);
return pselect(lt_mask, exponential, logistic);
} else {
return logistic;
}
}
};
#endif // #ifndef EIGEN_GPU_COMPILE_PHASE
template <typename T>
struct functor_traits<scalar_logistic_op<T> > {
enum {
// The cost estimate for float here is for the common(?) case where
// all arguments are greater than -9.
Cost = scalar_div_cost<T, packet_traits<T>::HasDiv>::value +
NumTraits<T>::AddCost * 2 + functor_traits<scalar_exp_op<T> >::Cost,
(internal::is_same<T, float>::value
? NumTraits<T>::AddCost * 15 + NumTraits<T>::MulCost * 11
: NumTraits<T>::AddCost * 2 +
functor_traits<scalar_exp_op<T> >::Cost),
PacketAccess =
packet_traits<T>::HasAdd && packet_traits<T>::HasDiv &&
packet_traits<T>::HasNegate && packet_traits<T>::HasExp
(internal::is_same<T, float>::value
? packet_traits<T>::HasMul && packet_traits<T>::HasMax &&
packet_traits<T>::HasMin
: packet_traits<T>::HasNegate && packet_traits<T>::HasExp)
};
};
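The float specialization can be followed more easily in scalar form. The sketch below copies the coefficients and the non-FMA upper cutoff from the hunk above and evaluates the same odd/even rational function with plain float arithmetic instead of packet operations; `logistic_approx` is an illustrative name, and the early return replaces the `pselect` on the lower cutoff.

```cpp
#include <algorithm>
#include <cmath>

// Scalar sketch of the float logistic approximation (coefficients from the diff).
inline float logistic_approx(float x_in) {
  // Below -9 the logistic function is approximated by exp(x).
  if (x_in < -9.0f) return std::exp(x_in);
  // Clamp from above so the rational function saturates at 1.
  const float x = std::min(x_in, 15.6437711715698242f);
  const float x2 = x * x;
  // Odd numerator polynomial p(x), evaluated with Horner's scheme in x^2.
  float p = x2 * 4.37031012579801e-11f + 1.15627324459942e-07f;
  p = x2 * p + 6.08574864600143e-05f;
  p = x2 * p + 8.51377133304701e-03f;
  p = x2 * p + 2.48287947061529e-01f;
  p = x * p;
  // Even denominator polynomial q(x).
  float q = x2 * 6.10247389755681e-13f + 5.76102136993427e-09f;
  q = x2 * q + 6.29106785017040e-06f;
  q = x2 * q + 1.70198817374094e-03f;
  q = x2 * q + 1.16817656904453e-01f;
  q = x2 * q + 9.93151921023180e-01f;
  // p/q approximates logistic(x) - 0.5, so shift the result back up.
  return p / q + 0.5f;
}
```

Because the numerator is odd and the denominator even, negative inputs reuse the same coefficients and only x^2 has to be formed once.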

View File

@@ -31,16 +31,42 @@ inline std::ptrdiff_t manage_caching_sizes_helper(std::ptrdiff_t a, std::ptrdiff
return a<=0 ? b : a;
}
#if EIGEN_ARCH_i386_OR_x86_64
const std::ptrdiff_t defaultL1CacheSize = 32*1024;
const std::ptrdiff_t defaultL2CacheSize = 256*1024;
const std::ptrdiff_t defaultL3CacheSize = 2*1024*1024;
#if defined(EIGEN_DEFAULT_L1_CACHE_SIZE)
#define EIGEN_SET_DEFAULT_L1_CACHE_SIZE(val) EIGEN_DEFAULT_L1_CACHE_SIZE
#else
const std::ptrdiff_t defaultL1CacheSize = 16*1024;
const std::ptrdiff_t defaultL2CacheSize = 512*1024;
const std::ptrdiff_t defaultL3CacheSize = 512*1024;
#define EIGEN_SET_DEFAULT_L1_CACHE_SIZE(val) val
#endif // defined(EIGEN_DEFAULT_L1_CACHE_SIZE)
#if defined(EIGEN_DEFAULT_L2_CACHE_SIZE)
#define EIGEN_SET_DEFAULT_L2_CACHE_SIZE(val) EIGEN_DEFAULT_L2_CACHE_SIZE
#else
#define EIGEN_SET_DEFAULT_L2_CACHE_SIZE(val) val
#endif // defined(EIGEN_DEFAULT_L2_CACHE_SIZE)
#if defined(EIGEN_DEFAULT_L3_CACHE_SIZE)
#define EIGEN_SET_DEFAULT_L3_CACHE_SIZE(val) EIGEN_DEFAULT_L3_CACHE_SIZE
#else
#define EIGEN_SET_DEFAULT_L3_CACHE_SIZE(val) val
#endif // defined(EIGEN_DEFAULT_L3_CACHE_SIZE)
#if EIGEN_ARCH_i386_OR_x86_64
const std::ptrdiff_t defaultL1CacheSize = EIGEN_SET_DEFAULT_L1_CACHE_SIZE(32*1024);
const std::ptrdiff_t defaultL2CacheSize = EIGEN_SET_DEFAULT_L2_CACHE_SIZE(256*1024);
const std::ptrdiff_t defaultL3CacheSize = EIGEN_SET_DEFAULT_L3_CACHE_SIZE(2*1024*1024);
#elif EIGEN_ARCH_PPC
const std::ptrdiff_t defaultL1CacheSize = EIGEN_SET_DEFAULT_L1_CACHE_SIZE(64*1024);
const std::ptrdiff_t defaultL2CacheSize = EIGEN_SET_DEFAULT_L2_CACHE_SIZE(512*1024);
const std::ptrdiff_t defaultL3CacheSize = EIGEN_SET_DEFAULT_L3_CACHE_SIZE(4*1024*1024);
#else
const std::ptrdiff_t defaultL1CacheSize = EIGEN_SET_DEFAULT_L1_CACHE_SIZE(16*1024);
const std::ptrdiff_t defaultL2CacheSize = EIGEN_SET_DEFAULT_L2_CACHE_SIZE(512*1024);
const std::ptrdiff_t defaultL3CacheSize = EIGEN_SET_DEFAULT_L3_CACHE_SIZE(512*1024);
#endif
#undef EIGEN_SET_DEFAULT_L1_CACHE_SIZE
#undef EIGEN_SET_DEFAULT_L2_CACHE_SIZE
#undef EIGEN_SET_DEFAULT_L3_CACHE_SIZE
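The override mechanism in this hunk is a function-like macro that either forwards its argument or ignores it in favour of a user-supplied definition. A reduced sketch of the pattern, using hypothetical `DEMO_*` names rather than the Eigen macros, and assuming the user defined only the L1 override before this point:

```cpp
#include <cstddef>

// Assumption for this sketch: the user overrides L1 only.
#define DEMO_DEFAULT_L1_CACHE_SIZE (48*1024)

#if defined(DEMO_DEFAULT_L1_CACHE_SIZE)
#define DEMO_SET_DEFAULT_L1_CACHE_SIZE(val) DEMO_DEFAULT_L1_CACHE_SIZE
#else
#define DEMO_SET_DEFAULT_L1_CACHE_SIZE(val) val
#endif

#if defined(DEMO_DEFAULT_L2_CACHE_SIZE)
#define DEMO_SET_DEFAULT_L2_CACHE_SIZE(val) DEMO_DEFAULT_L2_CACHE_SIZE
#else
#define DEMO_SET_DEFAULT_L2_CACHE_SIZE(val) val
#endif

// The per-architecture default is used only when no override is defined.
const std::ptrdiff_t demoL1CacheSize = DEMO_SET_DEFAULT_L1_CACHE_SIZE(32*1024);
const std::ptrdiff_t demoL2CacheSize = DEMO_SET_DEFAULT_L2_CACHE_SIZE(256*1024);

#undef DEMO_SET_DEFAULT_L1_CACHE_SIZE
#undef DEMO_SET_DEFAULT_L2_CACHE_SIZE
```

Keeping the architecture-specific value as the macro argument means each `#if`/`#elif` branch stays a one-liner, and the helper macros can be undefined once the constants exist.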
/** \internal */
struct CacheSizes {
CacheSizes(): m_l1(-1),m_l2(-1),m_l3(-1) {
@@ -56,7 +82,6 @@ struct CacheSizes {
std::ptrdiff_t m_l3;
};
/** \internal */
inline void manage_caching_sizes(Action action, std::ptrdiff_t* l1, std::ptrdiff_t* l2, std::ptrdiff_t* l3)
{
@@ -131,7 +156,8 @@ void evaluateProductBlockingSizesHeuristic(Index& k, Index& m, Index& n, Index n
// registers. However once the latency is hidden there is no point in
// increasing the value of k, so we'll cap it at 320 (value determined
// experimentally).
const Index k_cache = (numext::mini<Index>)((l1-ksub)/kdiv, 320);
// To prevent k from vanishing, make k_cache at least as large as kr.
const Index k_cache = numext::maxi<Index>(kr, (numext::mini<Index>)((l1-ksub)/kdiv, 320));
if (k_cache < k) {
k = k_cache - (k_cache % kr);
eigen_internal_assert(k > 0);
@@ -1050,148 +1076,6 @@ protected:
};
#if EIGEN_ARCH_ARM64 && defined EIGEN_VECTORIZE_NEON
template<>
struct gebp_traits <float, float, false, false,Architecture::NEON,GEBPPacketFull>
: gebp_traits<float,float,false,false,Architecture::Generic,GEBPPacketFull>
{
typedef float RhsPacket;
typedef float32x4_t RhsPacketx4;
EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacket& dest) const
{
dest = *b;
}
EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const
{
dest = vld1q_f32(b);
}
EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacket& dest) const
{
dest = *b;
}
EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacketx4& dest) const
{}
EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const
{
loadRhs(b,dest);
}
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const
{
c = vfmaq_n_f32(c, a, b);
}
// NOTE: Template parameter inference failed when compiled with Android NDK:
// "candidate template ignored: could not match 'FixedInt<N>' against 'Eigen::internal::FixedInt<0>".
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const
{ madd_helper<0>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<1>&) const
{ madd_helper<1>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<2>&) const
{ madd_helper<2>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<3>&) const
{ madd_helper<3>(a, b, c); }
private:
template<int LaneID>
EIGEN_STRONG_INLINE void madd_helper(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c) const
{
#if EIGEN_COMP_GNUC_STRICT && !(EIGEN_GNUC_AT_LEAST(9,0))
// workaround gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101
// vfmaq_laneq_f32 is implemented through a costly dup
if(LaneID==0) asm("fmla %0.4s, %1.4s, %2.s[0]\n" : "+w" (c) : "w" (a), "w" (b) : );
else if(LaneID==1) asm("fmla %0.4s, %1.4s, %2.s[1]\n" : "+w" (c) : "w" (a), "w" (b) : );
else if(LaneID==2) asm("fmla %0.4s, %1.4s, %2.s[2]\n" : "+w" (c) : "w" (a), "w" (b) : );
else if(LaneID==3) asm("fmla %0.4s, %1.4s, %2.s[3]\n" : "+w" (c) : "w" (a), "w" (b) : );
#else
c = vfmaq_laneq_f32(c, a, b, LaneID);
#endif
}
};
template<>
struct gebp_traits <double, double, false, false,Architecture::NEON>
: gebp_traits<double,double,false,false,Architecture::Generic>
{
typedef double RhsPacket;
struct RhsPacketx4 {
float64x2_t B_0, B_1;
};
EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacket& dest) const
{
dest = *b;
}
EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const
{
dest.B_0 = vld1q_f64(b);
dest.B_1 = vld1q_f64(b+2);
}
EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacket& dest) const
{
loadRhs(b,dest);
}
EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacketx4& dest) const
{}
EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const
{
loadRhs(b,dest);
}
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const
{
c = vfmaq_n_f64(c, a, b);
}
// NOTE: Template parameter inference failed when compiled with Android NDK:
// "candidate template ignored: could not match 'FixedInt<N>' against 'Eigen::internal::FixedInt<0>".
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const
{ madd_helper<0>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<1>&) const
{ madd_helper<1>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<2>&) const
{ madd_helper<2>(a, b, c); }
EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<3>&) const
{ madd_helper<3>(a, b, c); }
private:
template <int LaneID>
EIGEN_STRONG_INLINE void madd_helper(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c) const
{
#if EIGEN_COMP_GNUC_STRICT && !(EIGEN_GNUC_AT_LEAST(9,0))
// workaround gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101
// vfmaq_laneq_f64 is implemented through a costly dup
if(LaneID==0) asm("fmla %0.2d, %1.2d, %2.d[0]\n" : "+w" (c) : "w" (a), "w" (b.B_0) : );
else if(LaneID==1) asm("fmla %0.2d, %1.2d, %2.d[1]\n" : "+w" (c) : "w" (a), "w" (b.B_0) : );
else if(LaneID==2) asm("fmla %0.2d, %1.2d, %2.d[0]\n" : "+w" (c) : "w" (a), "w" (b.B_1) : );
else if(LaneID==3) asm("fmla %0.2d, %1.2d, %2.d[1]\n" : "+w" (c) : "w" (a), "w" (b.B_1) : );
#else
if(LaneID==0) c = vfmaq_laneq_f64(c, a, b.B_0, 0);
else if(LaneID==1) c = vfmaq_laneq_f64(c, a, b.B_0, 1);
else if(LaneID==2) c = vfmaq_laneq_f64(c, a, b.B_1, 0);
else if(LaneID==3) c = vfmaq_laneq_f64(c, a, b.B_1, 1);
#endif
}
};
#endif
/* optimized General packed Block * packed Panel product kernel
*
* Mixing type logic: C += A * B
@@ -2649,94 +2533,104 @@ template<typename Scalar, typename Index, typename DataMapper, int nr, bool Conj
struct gemm_pack_rhs<Scalar, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode>
{
typedef typename packet_traits<Scalar>::type Packet;
typedef typename unpacket_traits<Packet>::half HalfPacket;
typedef typename unpacket_traits<typename unpacket_traits<Packet>::half>::half QuarterPacket;
typedef typename DataMapper::LinearMapper LinearMapper;
enum { PacketSize = packet_traits<Scalar>::size };
EIGEN_DONT_INLINE void operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0);
};
template<typename Scalar, typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
EIGEN_DONT_INLINE void gemm_pack_rhs<Scalar, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode>
::operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset)
{
EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK RHS ROWMAJOR");
EIGEN_UNUSED_VARIABLE(stride);
EIGEN_UNUSED_VARIABLE(offset);
eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride));
conj_if<NumTraits<Scalar>::IsComplex && Conjugate> cj;
Index packet_cols8 = nr>=8 ? (cols/8) * 8 : 0;
Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0;
Index count = 0;
// if(nr>=8)
// {
// for(Index j2=0; j2<packet_cols8; j2+=8)
// {
// // skip what we have before
// if(PanelMode) count += 8 * offset;
// for(Index k=0; k<depth; k++)
// {
// if (PacketSize==8) {
// Packet A = ploadu<Packet>(&rhs[k*rhsStride + j2]);
// pstoreu(blockB+count, cj.pconj(A));
// } else if (PacketSize==4) {
// Packet A = ploadu<Packet>(&rhs[k*rhsStride + j2]);
// Packet B = ploadu<Packet>(&rhs[k*rhsStride + j2 + PacketSize]);
// pstoreu(blockB+count, cj.pconj(A));
// pstoreu(blockB+count+PacketSize, cj.pconj(B));
// } else {
// const Scalar* b0 = &rhs[k*rhsStride + j2];
// blockB[count+0] = cj(b0[0]);
// blockB[count+1] = cj(b0[1]);
// blockB[count+2] = cj(b0[2]);
// blockB[count+3] = cj(b0[3]);
// blockB[count+4] = cj(b0[4]);
// blockB[count+5] = cj(b0[5]);
// blockB[count+6] = cj(b0[6]);
// blockB[count+7] = cj(b0[7]);
// }
// count += 8;
// }
// // skip what we have after
// if(PanelMode) count += 8 * (stride-offset-depth);
// }
// }
if(nr>=4)
enum { PacketSize = packet_traits<Scalar>::size,
HalfPacketSize = unpacket_traits<HalfPacket>::size,
QuarterPacketSize = unpacket_traits<QuarterPacket>::size};
EIGEN_DONT_INLINE void operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0)
{
for(Index j2=packet_cols8; j2<packet_cols4; j2+=4)
EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK RHS ROWMAJOR");
EIGEN_UNUSED_VARIABLE(stride);
EIGEN_UNUSED_VARIABLE(offset);
eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride));
const bool HasHalf = (int)HalfPacketSize < (int)PacketSize;
const bool HasQuarter = (int)QuarterPacketSize < (int)HalfPacketSize;
conj_if<NumTraits<Scalar>::IsComplex && Conjugate> cj;
Index packet_cols8 = nr>=8 ? (cols/8) * 8 : 0;
Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0;
Index count = 0;
// if(nr>=8)
// {
// for(Index j2=0; j2<packet_cols8; j2+=8)
// {
// // skip what we have before
// if(PanelMode) count += 8 * offset;
// for(Index k=0; k<depth; k++)
// {
// if (PacketSize==8) {
// Packet A = ploadu<Packet>(&rhs[k*rhsStride + j2]);
// pstoreu(blockB+count, cj.pconj(A));
// } else if (PacketSize==4) {
// Packet A = ploadu<Packet>(&rhs[k*rhsStride + j2]);
// Packet B = ploadu<Packet>(&rhs[k*rhsStride + j2 + PacketSize]);
// pstoreu(blockB+count, cj.pconj(A));
// pstoreu(blockB+count+PacketSize, cj.pconj(B));
// } else {
// const Scalar* b0 = &rhs[k*rhsStride + j2];
// blockB[count+0] = cj(b0[0]);
// blockB[count+1] = cj(b0[1]);
// blockB[count+2] = cj(b0[2]);
// blockB[count+3] = cj(b0[3]);
// blockB[count+4] = cj(b0[4]);
// blockB[count+5] = cj(b0[5]);
// blockB[count+6] = cj(b0[6]);
// blockB[count+7] = cj(b0[7]);
// }
// count += 8;
// }
// // skip what we have after
// if(PanelMode) count += 8 * (stride-offset-depth);
// }
// }
if(nr>=4)
{
// skip what we have before
if(PanelMode) count += 4 * offset;
for(Index j2=packet_cols8; j2<packet_cols4; j2+=4)
{
// skip what we have before
if(PanelMode) count += 4 * offset;
for(Index k=0; k<depth; k++)
{
if (PacketSize==4) {
Packet A = rhs.template loadPacket<Packet>(k, j2);
pstoreu(blockB+count, cj.pconj(A));
count += PacketSize;
} else if (HasHalf && HalfPacketSize==4) {
HalfPacket A = rhs.template loadPacket<HalfPacket>(k, j2);
pstoreu(blockB+count, cj.pconj(A));
count += HalfPacketSize;
} else if (HasQuarter && QuarterPacketSize==4) {
QuarterPacket A = rhs.template loadPacket<QuarterPacket>(k, j2);
pstoreu(blockB+count, cj.pconj(A));
count += QuarterPacketSize;
} else {
const LinearMapper dm0 = rhs.getLinearMapper(k, j2);
blockB[count+0] = cj(dm0(0));
blockB[count+1] = cj(dm0(1));
blockB[count+2] = cj(dm0(2));
blockB[count+3] = cj(dm0(3));
count += 4;
}
}
// skip what we have after
if(PanelMode) count += 4 * (stride-offset-depth);
}
}
// copy the remaining columns one at a time (nr==1)
for(Index j2=packet_cols4; j2<cols; ++j2)
{
if(PanelMode) count += offset;
for(Index k=0; k<depth; k++)
{
if (PacketSize==4) {
Packet A = rhs.template loadPacket<Packet>(k, j2);
pstoreu(blockB+count, cj.pconj(A));
count += PacketSize;
} else {
const LinearMapper dm0 = rhs.getLinearMapper(k, j2);
blockB[count+0] = cj(dm0(0));
blockB[count+1] = cj(dm0(1));
blockB[count+2] = cj(dm0(2));
blockB[count+3] = cj(dm0(3));
count += 4;
}
blockB[count] = cj(rhs(k, j2));
count += 1;
}
// skip what we have after
if(PanelMode) count += 4 * (stride-offset-depth);
if(PanelMode) count += stride-offset-depth;
}
}
// copy the remaining columns one at a time (nr==1)
for(Index j2=packet_cols4; j2<cols; ++j2)
{
if(PanelMode) count += offset;
for(Index k=0; k<depth; k++)
{
blockB[count] = cj(rhs(k, j2));
count += 1;
}
if(PanelMode) count += stride-offset-depth;
}
}
};
} // end namespace internal
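The row-major RHS packing above stores full groups of four columns depth by depth, then copies the leftover columns one at a time. A simplified sketch of that layout, assuming nr == 4, no PanelMode offsets, no conjugation and scalar copies instead of packet loads; `pack_rhs_rowmajor` is an illustrative helper, not the Eigen function.

```cpp
#include <cstddef>
#include <vector>

// Pack a row-major depth x cols matrix: full groups of 4 columns are stored
// with 4 contiguous values per depth index k, leftover columns one at a time.
inline std::vector<double> pack_rhs_rowmajor(const std::vector<double>& rhs,
                                             std::size_t depth,
                                             std::size_t cols) {
  std::vector<double> blockB;
  blockB.reserve(depth * cols);
  const std::size_t packet_cols4 = (cols / 4) * 4;
  // Panels of 4 columns.
  for (std::size_t j2 = 0; j2 < packet_cols4; j2 += 4)
    for (std::size_t k = 0; k < depth; ++k)
      for (std::size_t c = 0; c < 4; ++c)
        blockB.push_back(rhs[k * cols + j2 + c]);
  // Remaining columns, one at a time.
  for (std::size_t j2 = packet_cols4; j2 < cols; ++j2)
    for (std::size_t k = 0; k < depth; ++k)
      blockB.push_back(rhs[k * cols + j2]);
  return blockB;
}
```

For a 2 x 5 input holding 0..9 in row-major order, the first eight packed values come from columns 0 to 3 (row 0, then row 1) and the last two from column 4.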

View File

@@ -471,15 +471,16 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemmProduct>
if(a_lhs.cols()==0 || a_lhs.rows()==0 || a_rhs.cols()==0)
return;
// Fallback to GEMV if either the lhs or rhs is a runtime vector
if (dst.cols() == 1)
{
// Fallback to GEMV if either the lhs or rhs is a runtime vector
typename Dest::ColXpr dst_vec(dst.col(0));
return internal::generic_product_impl<Lhs,typename Rhs::ConstColXpr,DenseShape,DenseShape,GemvProduct>
::scaleAndAddTo(dst_vec, a_lhs, a_rhs.col(0), alpha);
}
else if (dst.rows() == 1)
{
// Fallback to GEMV if either the lhs or rhs is a runtime vector
typename Dest::RowXpr dst_vec(dst.row(0));
return internal::generic_product_impl<typename Lhs::ConstRowXpr,Rhs,DenseShape,DenseShape,GemvProduct>
::scaleAndAddTo(dst_vec, a_lhs.row(0), a_rhs, alpha);
@@ -488,8 +489,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemmProduct>
typename internal::add_const_on_value_type<ActualLhsType>::type lhs = LhsBlasTraits::extract(a_lhs);
typename internal::add_const_on_value_type<ActualRhsType>::type rhs = RhsBlasTraits::extract(a_rhs);
Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(a_lhs)
* RhsBlasTraits::extractScalarFactor(a_rhs);
Scalar actualAlpha = combine_scalar_factors(alpha, a_lhs, a_rhs);
typedef internal::gemm_blocking_space<(Dest::Flags&RowMajorBit) ? RowMajor : ColMajor,LhsScalar,RhsScalar,
Dest::MaxRowsAtCompileTime,Dest::MaxColsAtCompileTime,MaxDepthAtCompileTime> BlockingType;

View File

@@ -22,7 +22,7 @@ namespace internal {
inline void manage_multi_threading(Action action, int* v)
{
static int m_maxThreads = -1;
EIGEN_UNUSED_VARIABLE(m_maxThreads);
EIGEN_UNUSED_VARIABLE(m_maxThreads)
if(action==SetAction)
{
@@ -129,7 +129,7 @@ void parallelize_gemm(const Functor& func, Index rows, Index cols, Index depth,
double work = static_cast<double>(rows) * static_cast<double>(cols) *
static_cast<double>(depth);
double kMinTaskSize = 50000; // FIXME improve this heuristic.
pb_max_threads = std::max<Index>(1, std::min<Index>(pb_max_threads, work / kMinTaskSize));
pb_max_threads = std::max<Index>(1, std::min<Index>(pb_max_threads, static_cast<Index>( work / kMinTaskSize ) ));
// compute the number of threads we are going to use
Index threads = std::min<Index>(nbThreads(), pb_max_threads);
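The thread cap in this hunk can be read in isolation: the amount of work is estimated as rows * cols * depth, divided by a minimum task size, and the result is clamped to at least one thread. A standalone sketch follows, where `cap_threads` and the `Index` typedef are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>

typedef std::ptrdiff_t Index;

// Limit the thread count so that each thread gets at least kMinTaskSize
// multiply-adds, but never go below one thread.
inline Index cap_threads(Index pb_max_threads, Index rows, Index cols, Index depth) {
  const double work = static_cast<double>(rows) * static_cast<double>(cols) *
                      static_cast<double>(depth);
  const double kMinTaskSize = 50000;
  // The explicit cast avoids an implicit double-to-Index conversion.
  return std::max<Index>(
      1, std::min<Index>(pb_max_threads, static_cast<Index>(work / kMinTaskSize)));
}
```

For a 10 x 10 x 10 product the quotient truncates to 0, so the outer `max` is what keeps the result at one thread.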

View File

@@ -111,7 +111,7 @@ struct selfadjoint_product_selector<MatrixType,OtherType,UpLo,false>
Scalar, OtherIsRowMajor ? ColMajor : RowMajor, (!OtherBlasTraits::NeedToConjugate) && NumTraits<Scalar>::IsComplex,
IsRowMajor ? RowMajor : ColMajor, MatrixType::InnerStrideAtCompileTime, UpLo>
::run(size, depth,
&actualOther.coeffRef(0,0), actualOther.outerStride(), &actualOther.coeffRef(0,0), actualOther.outerStride(),
actualOther.data(), actualOther.outerStride(), actualOther.data(), actualOther.outerStride(),
mat.data(), mat.innerStride(), mat.outerStride(), actualAlpha, blocking);
}
};

View File

@@ -136,7 +136,9 @@ EIGEN_DONT_INLINE void triangular_solve_matrix<Scalar,Index,OnTheLeft,Mode,Conju
}
else
{
Scalar b = (other(i,j) *= a);
Scalar& otherij = other(i,j);
otherij *= a;
Scalar b = otherij;
typename OtherMapper::LinearMapper r = other.getLinearMapper(s,j);
typename TriMapper::LinearMapper l = tri.getLinearMapper(s,i);
for (Index i3=0;i3<rs;++i3)

View File

@@ -195,6 +195,55 @@ protected:
template<typename Scalar, typename Index, int StorageOrder, int AlignmentType = Unaligned, int Incr = 1>
class blas_data_mapper;
// TMP to help PacketBlock store implementation.
// There's currently no known use case for PacketBlock load.
// The default implementation assumes ColMajor order.
// It always stores each packet sequentially one `stride` apart.
template<typename Index, typename Scalar, typename Packet, int n, int idx, int StorageOrder>
struct PacketBlockManagement
{
PacketBlockManagement<Index, Scalar, Packet, n, idx - 1, StorageOrder> pbm;
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(Scalar *to, const Index stride, Index i, Index j, const PacketBlock<Packet, n> &block) const {
pbm.store(to, stride, i, j, block);
pstoreu<Scalar>(to + i + (j + idx)*stride, block.packet[idx]);
}
};
// PacketBlockManagement specialization to take care of RowMajor order without ifs.
template<typename Index, typename Scalar, typename Packet, int n, int idx>
struct PacketBlockManagement<Index, Scalar, Packet, n, idx, RowMajor>
{
PacketBlockManagement<Index, Scalar, Packet, n, idx - 1, RowMajor> pbm;
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(Scalar *to, const Index stride, Index i, Index j, const PacketBlock<Packet, n> &block) const {
pbm.store(to, stride, i, j, block);
pstoreu<Scalar>(to + j + (i + idx)*stride, block.packet[idx]);
}
};
template<typename Index, typename Scalar, typename Packet, int n, int StorageOrder>
struct PacketBlockManagement<Index, Scalar, Packet, n, -1, StorageOrder>
{
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(Scalar *to, const Index stride, Index i, Index j, const PacketBlock<Packet, n> &block) const {
EIGEN_UNUSED_VARIABLE(to);
EIGEN_UNUSED_VARIABLE(stride);
EIGEN_UNUSED_VARIABLE(i);
EIGEN_UNUSED_VARIABLE(j);
EIGEN_UNUSED_VARIABLE(block);
}
};
template<typename Index, typename Scalar, typename Packet, int n>
struct PacketBlockManagement<Index, Scalar, Packet, n, -1, RowMajor>
{
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(Scalar *to, const Index stride, Index i, Index j, const PacketBlock<Packet, n> &block) const {
EIGEN_UNUSED_VARIABLE(to);
EIGEN_UNUSED_VARIABLE(stride);
EIGEN_UNUSED_VARIABLE(i);
EIGEN_UNUSED_VARIABLE(j);
EIGEN_UNUSED_VARIABLE(block);
}
};
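PacketBlockManagement unrolls the per-packet stores at compile time by recursing on `idx` down to a terminating -1 specialization. The sketch below shows the same recursion with `std::array` standing in for SIMD packets. `Block` and `BlockStore` are hypothetical names, and unlike the Eigen code, which uses a separate RowMajor specialization, the storage order here is selected by a compile-time boolean inside one template.

```cpp
#include <array>
#include <cstddef>

// A block of N "packets", each holding P floats (stand-in for PacketBlock).
template <std::size_t P, std::size_t N>
struct Block {
  std::array<std::array<float, P>, N> packet;
};

// Stores packets 0..idx; recursion handles the lower indices first.
template <std::size_t P, std::size_t N, int idx, bool RowMajor>
struct BlockStore {
  static void store(float* to, std::ptrdiff_t stride, std::ptrdiff_t i,
                    std::ptrdiff_t j, const Block<P, N>& block) {
    BlockStore<P, N, idx - 1, RowMajor>::store(to, stride, i, j, block);
    // Consecutive packets are one `stride` apart in the outer dimension.
    float* dst = RowMajor ? to + j + (i + idx) * stride
                          : to + i + (j + idx) * stride;
    for (std::size_t l = 0; l < P; ++l) dst[l] = block.packet[idx][l];
  }
};

// Terminating case: nothing left to store.
template <std::size_t P, std::size_t N, bool RowMajor>
struct BlockStore<P, N, -1, RowMajor> {
  static void store(float*, std::ptrdiff_t, std::ptrdiff_t, std::ptrdiff_t,
                    const Block<P, N>&) {}
};
```

A caller starts the recursion at `idx = N - 1`, mirroring how storePacketBlock instantiates the helper with `n-1`.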
template<typename Scalar, typename Index, int StorageOrder, int AlignmentType>
class blas_data_mapper<Scalar,Index,StorageOrder,AlignmentType,1>
{
@@ -258,6 +307,11 @@ public:
return internal::first_default_aligned(m_data, size);
}
template<typename SubPacket, int n>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void storePacketBlock(Index i, Index j, const PacketBlock<SubPacket, n> &block) const {
PacketBlockManagement<Index, Scalar, SubPacket, n, n-1, StorageOrder> pbm;
pbm.store(m_data, m_stride, i, j, block);
}
protected:
Scalar* EIGEN_RESTRICT m_data;
const Index m_stride;
@@ -337,6 +391,77 @@ public:
return pgather<Scalar, SubPacket>(&operator()(i, j), m_stride);
}
// storePacketBlock_helper defines a way to access values inside the PacketBlock; this is required mainly by the complex types.
template<typename SubPacket, typename ScalarT, int n, int idx>
struct storePacketBlock_helper
{
storePacketBlock_helper<SubPacket, ScalarT, n, idx-1> spbh;
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper<Scalar, Index, StorageOrder, AlignmentType, Incr>* sup, Index i, Index j, const PacketBlock<SubPacket, n>& block) const {
spbh.store(sup, i,j,block);
for(int l = 0; l < unpacket_traits<SubPacket>::size; l++)
{
ScalarT *v = &sup->operator()(i+l, j+idx);
*v = block.packet[idx][l];
}
}
};
template<typename SubPacket, int n, int idx>
struct storePacketBlock_helper<SubPacket, std::complex<float>, n, idx>
{
storePacketBlock_helper<SubPacket, std::complex<float>, n, idx-1> spbh;
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper<Scalar, Index, StorageOrder, AlignmentType, Incr>* sup, Index i, Index j, const PacketBlock<SubPacket, n>& block) const {
spbh.store(sup,i,j,block);
for(int l = 0; l < unpacket_traits<SubPacket>::size; l++)
{
std::complex<float> *v = &sup->operator()(i+l, j+idx);
v->real(block.packet[idx].v[2*l+0]);
v->imag(block.packet[idx].v[2*l+1]);
}
}
};
template<typename SubPacket, int n, int idx>
struct storePacketBlock_helper<SubPacket, std::complex<double>, n, idx>
{
storePacketBlock_helper<SubPacket, std::complex<double>, n, idx-1> spbh;
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper<Scalar, Index, StorageOrder, AlignmentType, Incr>* sup, Index i, Index j, const PacketBlock<SubPacket, n>& block) const {
spbh.store(sup,i,j,block);
for(int l = 0; l < unpacket_traits<SubPacket>::size; l++)
{
std::complex<double> *v = &sup->operator()(i+l, j+idx);
v->real(block.packet[idx].v[2*l+0]);
v->imag(block.packet[idx].v[2*l+1]);
}
}
};
template<typename SubPacket, typename ScalarT, int n>
struct storePacketBlock_helper<SubPacket, ScalarT, n, -1>
{
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper<Scalar, Index, StorageOrder, AlignmentType, Incr>*, Index, Index, const PacketBlock<SubPacket, n>& ) const {
}
};
template<typename SubPacket, int n>
struct storePacketBlock_helper<SubPacket, std::complex<float>, n, -1>
{
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper<Scalar, Index, StorageOrder, AlignmentType, Incr>*, Index, Index, const PacketBlock<SubPacket, n>& ) const {
}
};
template<typename SubPacket, int n>
struct storePacketBlock_helper<SubPacket, std::complex<double>, n, -1>
{
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper<Scalar, Index, StorageOrder, AlignmentType, Incr>*, Index, Index, const PacketBlock<SubPacket, n>& ) const {
}
};
// This function stores a PacketBlock into m_data. This approach is quite slow compared to Incr=1 and should be avoided when possible.
template<typename SubPacket, int n>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void storePacketBlock(Index i, Index j, const PacketBlock<SubPacket, n>&block) const {
storePacketBlock_helper<SubPacket, Scalar, n, n-1> spb;
spb.store(this, i,j,block);
}
protected:
Scalar* EIGEN_RESTRICT m_data;
const Index m_stride;
@@ -493,6 +618,47 @@ template<typename T> const typename T::Scalar* extract_data(const T& m)
return extract_data_selector<T>::run(m);
}
/**
* \c combine_scalar_factors extracts and multiplies factors from GEMM and GEMV products.
* There is a specialization for booleans.
*/
template<typename ResScalar, typename Lhs, typename Rhs>
struct combine_scalar_factors_impl
{
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static ResScalar run(const Lhs& lhs, const Rhs& rhs)
{
return blas_traits<Lhs>::extractScalarFactor(lhs) * blas_traits<Rhs>::extractScalarFactor(rhs);
}
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static ResScalar run(const ResScalar& alpha, const Lhs& lhs, const Rhs& rhs)
{
return alpha * blas_traits<Lhs>::extractScalarFactor(lhs) * blas_traits<Rhs>::extractScalarFactor(rhs);
}
};
template<typename Lhs, typename Rhs>
struct combine_scalar_factors_impl<bool, Lhs, Rhs>
{
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static bool run(const Lhs& lhs, const Rhs& rhs)
{
return blas_traits<Lhs>::extractScalarFactor(lhs) && blas_traits<Rhs>::extractScalarFactor(rhs);
}
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static bool run(const bool& alpha, const Lhs& lhs, const Rhs& rhs)
{
return alpha && blas_traits<Lhs>::extractScalarFactor(lhs) && blas_traits<Rhs>::extractScalarFactor(rhs);
}
};
template<typename ResScalar, typename Lhs, typename Rhs>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE ResScalar combine_scalar_factors(const ResScalar& alpha, const Lhs& lhs, const Rhs& rhs)
{
return combine_scalar_factors_impl<ResScalar,Lhs,Rhs>::run(alpha, lhs, rhs);
}
template<typename ResScalar, typename Lhs, typename Rhs>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE ResScalar combine_scalar_factors(const Lhs& lhs, const Rhs& rhs)
{
return combine_scalar_factors_impl<ResScalar,Lhs,Rhs>::run(lhs, rhs);
}
} // end namespace internal
} // end namespace Eigen
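The boolean specialization of combine_scalar_factors_impl replaces `*` with `&&`, which gives the same result for bool operands without multiplying booleans. A reduced sketch of the dispatch, with plain scalars in place of the `blas_traits` factor extraction; `combine_factors_sketch` and `combine_factors` are illustrative names only.

```cpp
// Generic case: multiply alpha by both scalar factors.
template <typename Scalar>
struct combine_factors_sketch {
  static Scalar run(const Scalar& alpha, const Scalar& lhs, const Scalar& rhs) {
    return alpha * lhs * rhs;
  }
};

// Boolean case: logical AND instead of integer multiplication.
template <>
struct combine_factors_sketch<bool> {
  static bool run(const bool& alpha, const bool& lhs, const bool& rhs) {
    return alpha && lhs && rhs;
  }
};

// Free-function front end that picks the specialization from the scalar type.
template <typename Scalar>
inline Scalar combine_factors(const Scalar& alpha, const Scalar& lhs, const Scalar& rhs) {
  return combine_factors_sketch<Scalar>::run(alpha, lhs, rhs);
}
```

Routing every product kernel through one helper keeps the bool special case in a single place instead of repeating it at each `actualAlpha` computation.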

Some files were not shown because too many files have changed in this diff