Compare commits


600 Commits

Author SHA1 Message Date
David Tellenbach
1f4c0311cd Bump to 3.3.91 (3.4-rc1) 2021-04-18 23:43:12 +02:00
David Tellenbach
3e819d83bf Before 3.4 branch 2021-04-18 23:36:14 +02:00
Antonio Sanchez
69adf26aa3 Modify googlehash use to account for namespace issues.
The namespace declaration for googlehash is a configurable macro that
can be disabled.  In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.

Here we play a bit of gymnastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace pollution.  Symbols within
the `::google` namespace are imported into `Eigen::google`.

We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
2021-04-12 19:00:39 -07:00
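As an aside, a minimal sketch of the namespace-alias approach this message describes (illustrative only, not the commit's exact code):

```cpp
// Ensure ::google exists even when googlehash's configurable namespace
// macro is disabled, then alias it inside Eigen so symbols in ::google
// are reachable as Eigen::google::* without polluting Eigen itself.
namespace google {}

namespace Eigen {
namespace google = ::google;  // e.g. Eigen::google::dense_hash_map
}
```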
Christoph Hertzberg
9357feedc7 Avoid using uninitialized inputs and, if available, use the slightly more efficient movsd instruction for pset1<Packet2cf>. 2021-04-13 01:36:59 +02:00
Rasmus Munk Larsen
a2c0542010 Fix typo in TensorDimensions.h 2021-04-12 18:59:56 +00:00
Rohit Santhanam
dfd6720d82 Fix for float16 GPU unit test. 2021-04-12 10:19:06 +00:00
Christoph Hertzberg
1e1c8a735c Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for std::result_of and std::invoke_result.
Fixes #2209
2021-04-12 01:26:15 +00:00
Jens Wehner
f6fc66aa75 fixed doxygen for unsupported iterative solver module 2021-04-11 16:26:14 +00:00
Christoph Hertzberg
d58678069c Make iterators default constructible and assignable, by making... 2021-04-09 17:03:28 +00:00
Rohit Santhanam
2859db0220 This fixes an issue where the compiler was not choosing the GPU specific specialization of ScanLauncher.
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.

The segmentation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to a computation that it presumed would execute on the GPU.

But because of the issue, the computation was scheduled to execute on the CPU, so the CPU ended up attempting to access a GPU memory location.

The fix expands the GPU-specific ScanLauncher specialization to handle cases where vectorization is enabled.

Previously, the GPU specialization was chosen only if vectorization was not used.
2021-04-08 15:14:48 +00:00
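A hedged sketch of the dispatch bug described above, with hypothetical names (not Eigen's actual `ScanLauncher` code): a partial specialization that keeps the vectorization flag as a template parameter matches both cases.

```cpp
#include <iostream>

// Generic (CPU) launcher: chosen whenever no specialization matches.
template <typename Device, bool Vectorize>
struct ScanLauncher {
  static void run() { std::cout << "CPU path\n"; }
};

struct GpuDevice {};

// Fix sketch: leave Vectorize as a parameter so the GPU path is selected
// whether or not vectorization is enabled. (Previously only the
// Vectorize == false case was specialized.)
template <bool Vectorize>
struct ScanLauncher<GpuDevice, Vectorize> {
  static void run() { std::cout << "GPU path\n"; }
};

int main() {
  ScanLauncher<GpuDevice, true>::run();   // GPU path; formerly fell back to CPU
  ScanLauncher<GpuDevice, false>::run();  // GPU path
}
```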
Antonio Sanchez
fcb5106c6e Scaled epsilon the wrong way.
Should have been 0.5 to widen the bounds, since this is inverse
precision.  Setting to 0.5, however, leads to many more failing
tests at Google, so reverting to 1 for now.
2021-04-07 15:08:39 -07:00
Christoph Hertzberg
6197ce1a35 Replace -2147483648 by -0.0f or -0.0 constants (this should fix #2189).
Also, remove unnecessary `pgather` operations.
2021-04-07 11:25:27 +00:00
Rasmus Munk Larsen
22edb46823 Align local arrays to Packet boundary. 2021-04-06 16:22:36 +00:00
Antonio Sanchez
ace7f132ed Fix clang tidy warnings in AnnoyingScalar.
Clang-tidy complains that full specializations in headers can cause ODR
violations.  Marked these as `inline` to fix.

It also complains about renaming arguments in specializations.  Set the
argument names to match.
2021-04-05 12:49:38 -07:00
Antonio Sanchez
90187a33e1 Fix SelfAdjointEigenSolver (#2191)
Adjust the relaxation step to use the condition
```
abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))
```
for setting the subdiagonal entry to zero.

Also adjust Wilkinson shift for small `e = subdiag[end-1]` -
I couldn't find a reference for the original, and it was not
consistent with the Wilkinson definition.

Fixes #2191.
2021-04-05 11:19:09 -07:00
Rasmus Munk Larsen
3ddc0974ce Fix two bugs in commit 2021-04-02 22:06:27 +00:00
Chip Kerchner
c24bee6120 Fix address of temporary object errors in clang11.
This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
2021-04-02 16:27:08 +00:00
David Tellenbach
e4233b6e3d Add CI infrastructure for pre-merge smoke tests.
This patch adds pre-merge smoke tests for x86 Linux using gcc-10 and clang-10.

Closes #2188.
2021-04-01 00:08:37 +00:00
David Tellenbach
ae95b74af9 Add CMake infrastructure for smoke testing
Necessary CMake changes to implement pre-merge smoke tests
running via CI.
2021-03-31 22:09:00 +00:00
Rasmus Munk Larsen
5bbc9cea93 Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input.
Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.
2021-03-31 21:09:19 +00:00
Guoqiang QI
b5a926a0f6 Add GitLab templates for issues and merge requests
This patch adds GitLab templates for bug reports, feature and merge requests.

This closes #2117.
2021-03-31 16:01:12 +00:00
Antonio Sanchez
78ee3d6261 Fix CUDA constexpr issues for numeric_limits.
Some CUDA/HIP constants fail on device with `constexpr` since they
internally rely on non-constexpr functions, e.g.
```
#define CUDART_INF_F            __int_as_float(0x7f800000)
```
This fails for cuda-clang (though passes with nvcc). These constants are
currently used by `device::numeric_limits`.  For portability, we
need to remove `constexpr` from the affected functions.

For C++11 or higher, we should be able to rely on the `std::numeric_limits`
versions anyways, since the methods themselves are now `constexpr`, so
should be supported on device (clang/hipcc natively, nvcc with
`--expt-relaxed-constexpr`).
2021-03-30 18:01:27 +00:00
Antonio Sanchez
af1247fbc1 Use Index type in loop over coefficients.
Previously was `int`.  Brought up by Kyle Snow (Polaris Geospatial
Services) on the mailing list.
2021-03-29 17:40:55 +00:00
Antonio Sanchez
87729ea39f Eliminate round_impl double-promotion warnings for c++03. 2021-03-25 16:52:19 +00:00
Deven Desai
748489ef9c Un-defining EIGEN_HAS_CONSTEXPR on the HIP platform
The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit

e7b8643d70

```
In file included from /home/rocm-user/eigen/test/main.h:360:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:162:
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
  static float (max)() {
                ^
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression
    return HIPRT_MAX_NORMAL_F;
           ^
/home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F'
#define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff)
                           ^
/opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here
__device__ static inline float __int_as_float(int x) {
                               ^
```

The problem seems to be that some of the constants defined in the HIP `math_constants.h` have a call to the `__int_as_float` routine, which is not declared `constexpr` in the HIP runtime header file.

Working around this issue for now by skipping the constexpr support (enabled via the above commit) on HIP
2021-03-25 13:45:52 +00:00
Chip Kerchner
d59ef212e1 Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3). 2021-03-25 11:08:19 +00:00
Steve Bronder
e7b8643d70 Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()""
This reverts commit 5f0b4a4010.
2021-03-24 18:14:56 +00:00
Antonio Sanchez
5521c65afb Eliminate mixingtypes_7 warning.
`g_called` is not used in subtest 7, so it was generating a
`-Wunneeded-internal-declaration` warning.  Here we silence
it by initializing the static variable.
2021-03-24 11:05:41 -07:00
Christoph Hertzberg
69a4f70956 Revert "Uses _mm512_abs_pd for Packet8d pabs"
This reverts commit f019b97aca
2021-03-23 18:52:19 +00:00
David Tellenbach
824272cde8 Re-enable CI for Power 2021-03-22 19:28:25 +01:00
David Tellenbach
4811e81966 Remove yet another comma at end of enum 2021-03-18 23:30:00 +01:00
Steve Bronder
f019b97aca Uses _mm512_abs_pd for Packet8d pabs 2021-03-18 15:47:52 +00:00
David Tellenbach
0cc9b5eb40 Split test commainitializer into two subtests 2021-03-18 13:28:51 +01:00
Antonio Sanchez
c3fbc6cec7 Use singleton pattern for static registered tests.
The original fails with nvcc+msvc - there's a static order of initialization
issue leading to registered tests being cleared.  The test then fails on
```
VERIFY(EigenTest::all().size()>0);
```
since `EigenTest` no longer contains any tests.  The singleton pattern
fixes this.
2021-03-18 00:56:31 +00:00
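A minimal sketch of the construct-on-first-use fix described here (illustrative names, not the exact test-harness code):

```cpp
#include <string>
#include <vector>

// Function-local static: initialized on first use, so registration can
// never observe (or be wiped out by) a not-yet-constructed container.
struct EigenTest {
  static std::vector<std::string>& all() {
    static std::vector<std::string> tests;
    return tests;
  }
};

struct Register {
  explicit Register(const char* name) { EigenTest::all().push_back(name); }
};

// Safe regardless of translation-unit initialization order.
static Register reg_example("example_test");

int main() { return EigenTest::all().size() > 0 ? 0 : 1; }
```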
Niek Bouman
ed964ba3f1 Proposed fix for issue #2187 2021-03-18 00:55:36 +00:00
Antonio Sanchez
8dfe1029a5 Augment NumTraits with min/max_exponent() again.
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase where possible.  Also replaced some other `numeric_limits`
usages in affected tests with the `NumTraits` equivalent.

The previous MR !443 failed for c++03 due to lack of `constexpr`.
Because of this, we need to keep around the `std::numeric_limits`
version in enum expressions until the switch to c++11.

Fixes #2148
2021-03-16 20:12:46 -07:00
David Tellenbach
eb71e5db98 Fix another warning on missing commas 2021-03-17 03:07:04 +01:00
David Tellenbach
df4bc2731c Revert "Augment NumTraits with min/max_exponent()."
This reverts commit 75ce9cd2a7.
2021-03-17 03:06:08 +01:00
Antonio Sanchez
75ce9cd2a7 Augment NumTraits with min/max_exponent().
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase.  Also replaced some other `numeric_limits` usages in
affected tests with the `NumTraits` equivalent.

Fixes #2148
2021-03-17 01:00:41 +00:00
David Tellenbach
9fb7062440 Silence warning on comma at end of enumerator list 2021-03-17 01:46:52 +01:00
Theo Fletcher
b8502a9dd6 Updated SelfAdjointEigenSolver documentation to include that the eigenvectors matrix is unitary. 2021-03-16 18:48:02 +00:00
Rasmus Munk Larsen
2e83cbbba9 Add NaN propagation options to minCoeff/maxCoeff visitors. 2021-03-16 17:02:50 +00:00
Jens Wehner
c0a889890f Fixed output of complex matrices 2021-03-15 21:51:55 +00:00
Antonio Sanchez
f612df2736 Add fmod(half, half).
This is to support TensorFlow's `tf.math.floormod` for half.
2021-03-15 13:32:24 -07:00
Antonio Sanchez
14b7ebea11 Fix numext::round pre c++11 for large inputs.
This is to resolve an issue for large inputs when +0.5 can
actually lead to +1 if the input doesn't have enough precision
to resolve the addition - leading to an off-by-one error.

See discussion on 9a663973.
2021-03-15 19:08:04 +00:00
Chip Kerchner
c9d4367fa4 Fix pround and add print 2021-03-15 19:07:43 +00:00
Antonio Sanchez
d24f9f9b55 Fix NVCC+ICC issues.
NVCC does not understand `__forceinline`, so we need to use `inline`
when compiling for GPU.

ICC specializes `std::complex` operators for `float` and `double`
by default, which cannot be used on device and conflict with Eigen's
workaround in CUDA/Complex.h.  This can be prevented by defining
`_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
Added this define to the tests and to `Eigen/Core`, but this will
not work if the user includes `<complex>` before `<Eigen/Core>`.

ICC also seems to generate a duplicate `Map` symbol in
`PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
  static ConstMapType Map(const Scalar *data)

```
I tracked this down to `friend class Eigen::Map`.  Putting the `friend`
statements at the bottom of the class seems to resolve this issue.

Fixes #2180
2021-03-15 18:42:04 +00:00
Antonio Sanchez
14487ed14e Add increment/decrement operators to Eigen::half.
This is for consistency with bfloat16, and to support initialization
with `std::iota`.
2021-03-15 10:52:23 -07:00
Antonio Sanchez
b271110788 Bump up rand histogram threshold.
The previous one sometimes fails for MSVC which has a poor random
number generator.

Fixes #2182
2021-03-10 22:17:03 -08:00
Antonio Sanchez
d098c4d64c Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang.
Doesn't seem to correctly select the register type, and most types
lead to compiler crashes.
2021-03-10 16:05:01 -08:00
Antonio Sanchez
543e34ab9d Re-implement move assignments.
The original swap approach leads to potential undefined behavior (reading
uninitialized memory) and results in unnecessary copying of data for static
storage.

Here we pass down the move assignment to the underlying storage.  Static
storage does a one-way copy, dynamic storage does a swap.

Modified the tests to no longer read from the moved-from matrix/tensor,
since that can lead to UB. Added a test to ensure we do not access
uninitialized memory in a move.

Fixes: #2119
2021-03-10 16:55:20 +00:00
Ben Niu
b8d1857f0d [MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both native ARM64 (_M_ARM64 is defined) and ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can be emulated by ARM64 instructions, it is still a good idea to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons. 2021-03-10 10:21:31 +00:00
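A sketch of the detection logic this message describes (the macro names follow the message; the exact form in Eigen's Macros.h may differ):

```cpp
// Native x64 only: _M_X64 is also defined under ARM64EC, so exclude it.
#if defined(_M_X64) && !defined(_M_ARM64EC)
#define EIGEN_ARCH_x86_64 1
#else
#define EIGEN_ARCH_x86_64 0
#endif

// Both native ARM64 and ARM64EC take the ARM64 code paths.
#if defined(_M_ARM64) || defined(_M_ARM64EC)
#define EIGEN_ARCH_ARM64 1
#else
#define EIGEN_ARCH_ARM64 0
#endif
```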
Antonio Sanchez
853a5c4b84 Fix ambiguous call to CUDA __half constructor. 2021-03-08 21:06:28 -08:00
Antonio Sanchez
94327dbfba Fix typo: DEVICE -> GPU 2021-03-08 11:21:00 -08:00
Antonio Sanchez
1296abdf82 Fix non-trivial Half constructor for CUDA.
Both CUDA and HIP require trivial default constructors for types used
in shared memory. Otherwise failing with
```
error: initialization is not supported for __shared__ variables.
```
2021-03-08 07:32:54 -08:00
Antonio Sanchez
6045243141 Revert stack allocation limit change that crept in.
This was accidentally introduced when copying changes between repos.
2021-03-05 14:29:37 -08:00
Deven Desai
1a96d49afe Changing the Eigen::half implementation for HIP
Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build.

In the ROCm TensorFlow build,
* files that do not contain any GPU code get compiled via gcc, and
* files that contain GPU code get compiled via hipcc.

In certain cases, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half as a "pass-by-value" argument, its value was getting corrupted when received by the function.

The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data store, while for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc and hipcc (which is essentially the latest clang), which results in the Eigen::half value (which is correct at the call site) getting randomly corrupted when passed to the function.

Changing the Eigen::half argument to be "pass by reference" seems to work around the error.

In order to fix it such that we do not run into it again in TF, this commit changes the Eigen::half implementation to use the same `__half_raw` implementation as the non-GPU compile during the host compile phase of the hipcc compile.
2021-03-05 19:27:13 +00:00
Antonio Sanchez
2468253c9a Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.

Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise.  This simplifies checks for supported
features.

Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.

Fixes: #2170
2021-03-05 18:33:18 +00:00
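A simplified sketch of such a macro, assuming only what the message states (MSVC reports the standard in `_MSVC_LANG`); Eigen's actual definition may differ in detail:

```cpp
#if defined(_MSVC_LANG)
#define EIGEN_CPLUSPLUS _MSVC_LANG   // reliable even without /Zc:__cplusplus
#elif defined(__cplusplus)
#define EIGEN_CPLUSPLUS __cplusplus
#else
#define EIGEN_CPLUSPLUS 0
#endif

// Example check, usable uniformly across compilers:
#if EIGEN_CPLUSPLUS >= 201402L
// C++14 or later
#endif
```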
Antonio Sanchez
82d61af3a4 Fix rint SSE/NEON again, using optimization barrier.
This is a new version of !423, which failed for MSVC.

Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to
prevent operations involving `X` from crossing that barrier. Should
work on most `GNUC` compatible compilers (MSVC doesn't seem to need
this). This is a modified version adapted from what was used in
`psincos_float` and tested on more platforms
(see #1674, https://godbolt.org/z/73ezTG).

Modified `rint` to use the barrier to prevent the add/subtract rounding
trick from being optimized away.

Also fixed an edge case for large inputs that get bumped up a power of two
and end up rounding away more than just the fractional part.  If we are
over `2^digits` then just return the input.  This edge case was missed in
the test since the test was comparing approximate equality, which was still
satisfied.  Adding a strict equality option catches it.
2021-03-05 08:54:12 -08:00
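A hedged sketch of a GNUC-style barrier and the rounding trick it protects (the macro name here is a stand-in for `EIGEN_OPTIMIZATION_BARRIER`; this simplified version handles neither negatives nor the large-input edge case discussed above):

```cpp
// Empty inline asm that claims to read and write X, preventing the
// compiler from folding (a + c) - c back into a. GNUC-style only.
#define OPT_BARRIER(X) __asm__ volatile("" : "+g"(X))

float rint_via_addsub(float a) {
  const float magic = 8388608.0f;  // 2^23: forces rounding of the low bits
  float x = a + magic;
  OPT_BARRIER(x);                  // keep the add/subtract pair intact
  return x - magic;
}
```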
David Tellenbach
5f0b4a4010 Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"
This reverts commit 6cbb3038ac because it
breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
2021-03-05 13:16:43 +01:00
Steve Bronder
6cbb3038ac Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size() 2021-03-04 18:58:08 +00:00
David Tellenbach
5bfc67f9e7 Deactive CI for Power due to problems with GitLab runner 2021-03-04 17:33:40 +01:00
Eugene Zhulenev
a6601070f2 Add log2 operation to TensorBase 2021-03-04 00:13:36 +00:00
Antonio Sánchez
9a663973b4 Revert "Fix rint for SSE/NEON."
This reverts commit e72dfeb8b9
2021-03-03 18:51:51 +00:00
Antonio Sanchez
e72dfeb8b9 Fix rint for SSE/NEON.
It seems *sometimes* with aggressive optimizations the combination
`psub(padd(a, b), b)` trick to force rounding is compiled away. Here
we replace with inline assembly to prevent this (I tried `volatile`,
but that leads to additional loads from memory).

Also fixed an edge case for large inputs `a` where adding `b` bumps
the value up a power of two and ends up rounding away more than
just the fractional part.  If we are over `2^digits` then just return
the input.  This edge case was missed in the test since the test was
comparing approximate equality, which was still satisfied.  Adding
a strict equality option catches it.
2021-03-03 09:41:46 -08:00
Christoph Hertzberg
199c5f2b47 geo_alignedbox_5 was failing with AVX enabled, due to storing Vector4d in a std::vector without using an aligned allocator.
Got rid of using `std::vector` and simplified the code.
Avoid leading `_`
2021-03-01 03:59:21 +01:00
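For reference, the standard alternative fix for this failure mode (the commit instead dropped `std::vector` entirely): fixed-size vectorizable Eigen types stored in `std::vector` need `Eigen::aligned_allocator`.

```cpp
#include <vector>
#include <Eigen/StdVector>

// Guarantees sufficient alignment for Vector4d under AVX.
std::vector<Eigen::Vector4d, Eigen::aligned_allocator<Eigen::Vector4d>> pts;
```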
Antonio Sanchez
1e0c7d4f49 Add print for SSE/NEON, use NEON rounding intrinsics if available.
In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the
current rounding mode.

For NEON, we use the provided intrinsics for rint/floor/ceil if
available (armv8).

Related to #1969.
2021-02-27 22:42:07 +00:00
David Tellenbach
976ae0ca6f Document that using raw function pointers doesn't work with unaryExpr. 2021-02-27 22:58:42 +01:00
Antonio Sanchez
c65c2b31d4 Make half/bfloat16 constructor take inputs by value, fix powerpc test.
Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.

Changing `half` to take inputs by value fixes this.  Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down-the-road.

Also eliminated some compile warnings for powerpc.
2021-02-27 21:32:06 +00:00
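A small illustration of the linker pitfall described above, with hypothetical helper names: binding a static in-class constant to a `const&` parameter ODR-uses it and may require an out-of-line definition, while `int(...)` or pass-by-value avoids that.

```cpp
#include <limits>

inline void by_ref(const int& x) { (void)x; }
inline void by_val(int x) { (void)x; }

void demo() {
  // May fail to link pre-C++17: ODR-uses the static member.
  // by_ref(std::numeric_limits<float>::max_exponent);
  by_ref(int(std::numeric_limits<float>::max_exponent));  // addressable temporary
  by_val(std::numeric_limits<float>::max_exponent);       // pass by value
}
```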
Christoph Hertzberg
39a590dfb6 Remove unused include 2021-02-27 19:02:33 +01:00
Christoph Hertzberg
8f686ac4ec clang 10 aggressively warns about precision loss when converting int to float (or long to double)
(cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
2660d01fa7 Inherit from no_assignment_operator to avoid implicit copy constructor warnings
(cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
a3521d743c Fix some enum-enum conversion warnings
(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
ca528593f4 Fixed/masked more implicit copy constructor warnings
(cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
81b5fe2f0a ReturnByValue is already non-copyable
(cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
4fb3459a23 Fix double-promotion warnings
(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
2021-02-27 18:44:26 +01:00
Jens Wehner
4bfcee47b9 Idrs iterative linear solver 2021-02-27 12:09:33 +00:00
Antonio Sanchez
29ebd84cb7 Fix NEON sqrt for 32-bit, add prsqrt.
With !406, we accidentally broke arm 32-bit NEON builds, since
`vsqrt_f32` is only available for 64-bit.

Here we add back the `rsqrt` implementation for 32-bit, relying
on a `prsqrt` implementation with better handling of edge cases.

Note that several of the 32-bit NEON packet tests are currently
failing - either due to denormal handling (NEON versions flush
to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
2021-02-26 14:08:40 -08:00
Rasmus Munk Larsen
fe19714f80 Merge branch 'rmlarsen1/eigen-nan_prop' 2021-02-26 09:21:24 -08:00
Rasmus Munk Larsen
e67672024d Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop 2021-02-26 09:12:44 -08:00
Rasmus Munk Larsen
5e7d4c33d6 Add TODO. 2021-02-26 09:08:45 -08:00
Rasmus Munk Larsen
fb5b59641a Defer default for minCoeff/maxCoeff to templated variant. 2021-02-26 09:07:00 -08:00
Antonio Sanchez
e19829c3b0 Fix floor/ceil for NEON fp16.
Forgot to test this.  Fixes bug introduced in !416.
2021-02-25 20:39:56 -08:00
Antonio Sanchez
5529db7524 Fix SSE/NEON pfloor/pceil for saturated values.
The original will saturate if the input does not fit into an integer
type.  Here we fix this, returning the input if it doesn't have
enough precision to have a fractional part.

Also added `pceil` for NEON.

Fixes #1969.
2021-02-25 14:39:26 -08:00
Rasmus Munk Larsen
51eba8c3e2 Fix indentation. 2021-02-25 18:21:21 +00:00
Rasmus Munk Larsen
5297b7162a Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-25 18:21:21 +00:00
Antonio Sanchez
ecb7b19dfa Disable new/delete test for HIP 2021-02-25 08:04:05 -08:00
Chip-Kerchner
6eebe97bab Fix clang compile when no MMA flags are set. Simplify MMA compiler detection. 2021-02-24 20:43:23 -06:00
Rasmus Munk Larsen
f284c8592b Don't crash when attempting to slice an empty tensor. 2021-02-24 18:12:51 -08:00
Rasmus Munk Larsen
4cb0592af7 Fix indentation. 2021-02-24 17:59:36 -08:00
Rasmus Munk Larsen
6b34568c74 Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop 2021-02-24 17:54:58 -08:00
Rasmus Munk Larsen
0065f9d322 Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-25 01:54:36 +00:00
Rasmus Munk Larsen
841c8986f8 Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-24 17:49:20 -08:00
Rasmus Munk Larsen
113e61f364 Remove unused function scalar_cmp_with_cast. 2021-02-24 23:59:35 +00:00
Rasmus Munk Larsen
98ca58b02c Cast anonymous enums to int when used in expressions. 2021-02-24 23:50:06 +00:00
Chip-Kerchner
c31ead8a15 Having forward template function declarations in a P10 file causes bad code in certain situations. 2021-02-24 23:43:30 +00:00
Guoqiang QI
f44197fabd Some improvements for kissfft from Martin Reinecke (pocketfft author):
1. Only compute about half of the twiddle factors and use complex conjugate symmetry for the rest, instead of computing all of them, to save time.
2. Calculate all twiddles in double precision, because that gives the maximum achievable precision when doing float transforms.
3. Reduce all angles to the range 0<angle<pi/4, which gives even more precision.
2021-02-24 21:36:47 +00:00
Antonio Sanchez
a31effc3bc Add invoke_result and eliminate result_of warnings for C++17+.
The `std::result_of` meta struct is deprecated in C++17 and removed
in C++20. It was still slipping through due to a faulty definition of
`EIGEN_HAS_STD_RESULT_OF`.

Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and
`Eigen::internal::invoke_result` implementation with fallback for
pre C++17.

Replaces the `result_of` definition with one based on `std::invoke_result`
for C++17 and higher.

For completeness, added nullary op support for c++03.

Fixes #1850.
2021-02-24 21:36:14 +00:00
Chip-Kerchner
8523d447a1 Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins. 2021-02-24 20:49:15 +00:00
Antonio Sanchez
5908aeeaba Fix CUDA device new and delete, and add test.
HIP does not support new/delete on device, so test is skipped.
2021-02-24 11:31:41 -08:00
Antonio Sanchez
119763cf38 Eliminate CMake FindPackageHandleStandardArgs warnings.
CMake complains that the package name does not match when the case
differs, e.g.:
```
CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
  The package name passed to `find_package_handle_standard_args` (UMFPACK)
  does not match the name of the calling package (Umfpack).  This can lead to
  problems in calling code that expects `find_package` result variables
  (e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args)
  bench/spbench/CMakeLists.txt:24 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.
```
Here we rename the libraries to match their true cases.
2021-02-24 09:52:05 +00:00
Antonio Sanchez
6cf0ab5e99 Disable fast psqrt for NEON.
Accuracy is too poor - requires at least two Newton iterations, but then
it is no longer significantly faster than `vsqrt`.

Fixes #2094.
2021-02-23 19:52:55 -08:00
Antonio Sanchez
aba3998278 Fix check if GPU compile phase for std::hash 2021-02-23 19:52:08 -08:00
Antonio Sanchez
db5691ff2b Fix some CUDA warnings.
Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on GPU.

`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`.  Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.

Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability,
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.

Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang.  Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen
88d4c6d4c8 Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that
make pow<double> accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current
implementation. 2021-02-23 23:11:03 +00:00
2021-02-23 23:11:03 +00:00
Adam Shapiro
2ac0b78739 Fixed sparse conservativeResize() when both num cols and rows decreased.
The previous implementation caused a buffer overflow trying to calculate
non-zero counts for columns that no longer exist.
2021-02-23 21:32:39 +00:00
Chip-Kerchner
10c77b0ff4 Fix compilation errors with later versions of GCC and use of MMA. 2021-02-22 15:01:47 -06:00
Christoph Hertzberg
73922b0174 Fixes Bug #1925. Packets should be passed by const reference, even to inline functions. 2021-02-20 18:56:42 +01:00
Antonio Sanchez
5f9cfb2529 Add missing adolc isinf/isnan.
Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match install layout.

Fixed: #2157
2021-02-19 22:26:56 +00:00
Christoph Hertzberg
ce4af0b38f Missing change regarding #1910 2021-02-19 20:51:35 +01:00
Christoph Hertzberg
a7749c09bc Bug #1910: Make SparseCholesky work for RowMajor matrices 2021-02-19 19:36:18 +01:00
Antonio Sánchez
128eebf05e Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)."
This reverts commit 12fd3dd655
2021-02-19 17:09:16 +00:00
frgossen
33e0af0130 Return nan at poles of polygamma, digamma, and zeta if limit is not defined 2021-02-19 16:35:11 +00:00
Rasmus Munk Larsen
7f09d3487d Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp. 2021-02-18 20:49:18 +00:00
Masaki Murooka
12fd3dd655 add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC). 2021-02-17 22:55:47 +00:00
David Tellenbach
aa8b22e776 Bump to 3.4.99 2021-02-17 23:23:17 +01:00
David Tellenbach
5336ad8591 Define internal::make_unsigned for [unsigned]long long on macOS.
macOS defines int64_t as long long even for C++03 and therefore expects
a template specialization

  internal::make_unsigned<long long>,

for C++03. Since other platforms define int64_t as long for C++03 we
cannot add the specialization for all cases.
2021-02-17 23:03:10 +01:00
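A simplified sketch of the trait and the extra specialization the message describes (Eigen's internal version has more cases):

```cpp
// Primary template left undefined: unsupported types fail to compile.
template <typename T> struct make_unsigned;

template <> struct make_unsigned<int>       { typedef unsigned int type; };
template <> struct make_unsigned<long>      { typedef unsigned long type; };
// Needed on macOS in C++03, where int64_t is long long:
template <> struct make_unsigned<long long> { typedef unsigned long long type; };
```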
Antonio Sanchez
0845df7f77 Fix uninitialized warning on AVX. 2021-02-17 13:13:39 -08:00
Chip Kerchner
9b51dc7972 Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product 2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen
be0574e215 New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double. 2021-02-17 02:50:32 +00:00
Antonio Sanchez
7ff0b7a980 Updated pfrexp implementation.
The original implementation fails for 0, denormals, inf, and NaN.

See #2150
2021-02-17 02:23:24 +00:00
David Tellenbach
9ad4096ccb Document possible inconsistencies when using Matrix<bool, ...> 2021-02-17 00:50:26 +01:00
Ashutosh Sharma
f702792a7c missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel) 2021-02-16 16:33:59 +00:00
Jan van Dijk
db61b8d478 Avoid -Wunused warnings in NDEBUG builds.
In two places in SuperLUSupport.h, a local variable 'size' is
created that is used only inside an eigen_assert. Remove these,
just fetch the required values inside the assert statements.
This avoids annoying -Wunused warnings (and -Werror=unused errors)
in NDEBUG builds.
2021-02-12 18:35:35 +00:00
David Tellenbach
622c598944 Don't allow all test jobs to fail but only the currently failing ones. 2021-02-12 14:01:17 +01:00
Antonio Sanchez
90ee821c56 Use vrsqrts for rsqrt Newton iterations.
It's slightly faster and slightly more accurate, allowing our current
packetmath tests to pass for sqrt with a single iteration.
2021-02-11 11:33:51 -08:00
Antonio Sanchez
9fde9cce5d Adjust bounds for pexp_float/double
The original clamping bounds on `_x` actually produce finite values:
```
  exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38

  exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.

This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
2021-02-10 22:48:05 +00:00
Antonio Sanchez
4cb563a01e Fix ldexp implementations.
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits.  See #2131 for a complete discussion,
and !375 for other possible implementations.

Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.

The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.

Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.

Fixes #2131.
2021-02-10 22:45:41 +00:00
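A scalar sketch of the multi-factor idea (not the packet code in `pldexp_impl`, and the exact factor split there may differ): applying the scale as several smaller powers of two keeps each partial factor representable, so overflow and gradual underflow behave correctly even when the exponent does not fit in one factor.

```cpp
#include <cmath>

// Illustrative only: builds each factor with std::ldexp for clarity.
double ldexp_four_factor(double x, int e) {
  int q = e / 4;      // three equal parts...
  int r = e - 3 * q;  // ...plus the remainder
  x *= std::ldexp(1.0, q);
  x *= std::ldexp(1.0, q);
  x *= std::ldexp(1.0, q);
  x *= std::ldexp(1.0, r);
  return x;
}
```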
Ashutosh Sharma
7eb07da538 loop less ptranspose 2021-02-10 10:21:37 -08:00
David Tellenbach
36200b7855 Remove vim-specific comments to recognize correct file-type.
As discussed in #2143 we remove editor specific comments.
2021-02-09 09:13:09 +01:00
David Tellenbach
54589635ad Replace nullptr by NULL in SparseLU.h to be C++03 compliant. 2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas
984d010b7b add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves 2021-02-08 22:00:31 +00:00
Nikolaus Demmel
b578930657 Fix documentation typos in LDLT.h 2021-02-08 21:07:29 +00:00
Antonio Sanchez
66841ea070 Enable bdcsvd on host.
Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation
is skipped, leading to a linker error.  This prevents it from running on
the host as well.

Seems it was disabled 6 years ago (5384e891) to match `jacobiSvd`, but
`jacobiSvd` is now enabled on host.  Tested and runs fine on host, but
will not compile/run for device (though it's not labelled as a device
function, so this should be fine).

Fixes #2139
2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen
6e3b795f81 Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one. 2021-02-05 16:58:49 -08:00
Antonio Sanchez
abcde69a79 Disable vectorized pow for half/bfloat16.
We are potentially seeing some accuracy issues with these.  Ideally we
would hand off to `float`, but that's not trivial with the current
setup.

We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
2021-02-05 12:17:34 -08:00
Antonio Sanchez
f85038b7f3 Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.

By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).

This is a replacement of !379.  See there for further discussion.

Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.

Fixes #2138.
2021-02-03 09:01:48 -08:00
Antonio Sanchez
56c8b14d87 Eliminate implicit conversions from float to double. 2021-02-01 15:31:01 -08:00
Antonio Sanchez
fb4548e27b Implement bit_* for device.
Unfortunately `std::bit_and` and the like are host-only functions prior
to c++14 (since they are not `constexpr`).  They also never exist in the
global namespace, so the current implementation  always fails to compile via
NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global
namespace on device.

To overcome these limitations, we implement these functionals here.
2021-02-01 13:27:45 -08:00
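A sketch of such device-compatible functionals (here `EIGEN_DEVICE_FUNC` is stubbed so the snippet stands alone; under CUDA/HIP it expands to `__host__ __device__`):

```cpp
#ifndef EIGEN_DEVICE_FUNC
#define EIGEN_DEVICE_FUNC
#endif

template <typename T>
struct bit_and {
  EIGEN_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a & b; }
};

template <typename T>
struct bit_or {
  EIGEN_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a | b; }
};

template <typename T>
struct bit_xor {
  EIGEN_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a ^ b; }
};
```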
Antonio Sanchez
1615a27993 Fix altivec packetmath.
Allows the altivec packetmath tests to pass.  There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.

Related to !370, #2128

cc: @ChipKerchner @pdrocaldeira

Tested on `_BIG_ENDIAN` running on QEMU with VSX.  Couldn't figure out build
flags to get it to work for little endian.
2021-01-28 18:37:09 +00:00
Chip Kerchner
1414e2212c Fix clang compilation for AltiVec from previous check-in 2021-01-28 18:36:40 +00:00
David Tellenbach
170a504c2f Add the following functions:
  DenseBase::setConstant(NoChange_t, Index, const Scalar&)
  DenseBase::setConstant(Index, NoChange_t, const Scalar&)

to close #663.
2021-01-28 15:13:07 +01:00
David Tellenbach
598e1b6e54 Add the following functions:
  DenseBase::setZero(NoChange_t, Index)
  DenseBase::setZero(Index, NoChange_t)
  DenseBase::setOnes(NoChange_t, Index)
  DenseBase::setOnes(Index, NoChange_t)
  DenseBase::setRandom(NoChange_t, Index)
  DenseBase::setRandom(Index, NoChange_t)

This closes #663.
2021-01-28 01:10:36 +01:00
Gael Guennebaud
0668c68b03 Allow for negative strides.
Note that using a stride of -1 is still not possible because it would
clash with the definition of Eigen::Dynamic.

This fixes #747.
2021-01-27 23:32:12 +01:00
Samir Benmendil
288d456c29 Replace language_support module with builtin CheckLanguage
The workaround_9220 function was introduced a long time ago to
workaround a CMake issue with enable_language(OPTIONAL). Since then
CMake has clarified that the OPTIONAL keywords has not been
implemented[0].

A CheckLanguage module is now provided with CMake to check if a language
can be enabled. Use that instead.

[0] https://cmake.org/cmake/help/v3.18/command/enable_language.html
2021-01-27 13:26:40 +00:00
Antonio Sanchez
3f4684f87d Include <cstdint> in one place, remove custom typedefs
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.

To avoid this, we remove the custom `typedef` definitions for win32.  The
Tensor module requires C++11 anyways, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.

Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
2021-01-26 14:23:05 -08:00
Chip Kerchner
0784d9f87b Fix sqrt, ldexp and frexp compilation errors. 2021-01-25 15:22:19 -06:00
Gmc2
a4edb1079c fix test of ExtractVolumePatchesOp 2021-01-25 03:23:46 +00:00
Antonio Sanchez
4c42d5ee41 Eliminate implicit conversion warning in test/array_cwise.cpp 2021-01-23 11:54:00 -08:00
Antonio Sanchez
e0d13ead90 Replace std::isnan with numext::isnan for c++03 2021-01-23 11:02:35 -08:00
Florian Maurin
c35965b381 Remove unused variable in SparseLU.h 2021-01-22 22:24:11 +00:00
Antonio Sanchez
f0e46ed5d4 Fix pow and other cwise ops for half/bfloat16.
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
in `GenericPacketMathFunctions` to remove `constexpr`.

While adding tests for half/bfloat16, found other issues related to
implicit conversions.

Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types.  These seem to be  implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Antonio Sanchez
f19bcffee6 Specialize std::complex operators for use on GPU device.
NVCC and older versions of clang do not fully support `std::complex` on device,
leading to either compile errors (Cannot call `__host__` function) or worse,
runtime errors (Illegal instruction).  For most functions, we can
implement specialized `numext` versions. Here we specialize the standard
operators (with the exception of stream operators and member function operators
with a scalar that are already specialized in `<complex>`) so they can be used
in device code as well.

To import these operators into the current scope, use
`EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into
the `Eigen`, `Eigen::internal`, and `Eigen::numext` namespaces.

This allow us to remove specializations of the
sum/difference/product/quotient ops, and allow us to treat complex
numbers like most other scalars (e.g. in tests).
2021-01-22 18:19:19 +00:00
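A hedged sketch of one such operator (simplified; the real commit covers the full operator set and the namespace plumbing):

```cpp
#include <complex>

#ifndef EIGEN_DEVICE_FUNC
#define EIGEN_DEVICE_FUNC  // __host__ __device__ under CUDA/HIP
#endif

namespace sketch {
// Product written out so no __host__-only std operator is called on device.
template <typename T>
EIGEN_DEVICE_FUNC std::complex<T> operator*(const std::complex<T>& a,
                                            const std::complex<T>& b) {
  return std::complex<T>(a.real() * b.real() - a.imag() * b.imag(),
                         a.real() * b.imag() + a.imag() * b.real());
}
}  // namespace sketch
```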
David Tellenbach
65e2169c45 Add support for Arm SVE
This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently *sizeless*. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is *length agnostic*).

During compilation the flag `-msve-vector-bits=N` has to be set where `N` is a power of two in the range of `128` to `2048`, indicating the length of an SVE vector.

Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`.

This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements.

This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.
2021-01-21 21:11:57 +00:00
Antonio Sanchez
b2126fd6b5 Fix pfrexp/pldexp for half.
The recent addition of vectorized pow (!330) relies on `pfrexp` and
`pldexp`.  This was missing for `Eigen::half` and `Eigen::bfloat16`.
Adding tests for these packet ops also exposed an issue with handling
negative values in `pfrexp`, returning an incorrect exponent.

Added the missing implementations, corrected the exponent in `pfrexp1`,
and added `packetmath` tests.
2021-01-21 19:32:28 +00:00
Antonio Sanchez
25d8498f8b Fix stable_norm_1 test.
Test enters an infinite loop if size is 1x1 when choosing to select
unique indices for adding `inf` and `NaN` to the input. Here we
revert to non-unique indices, and split the `hypotNorm` check into
two cases: one where both `inf` and `NaN` are added, and one where
only `NaN` is added.
2021-01-21 09:44:42 -08:00
David Tellenbach
660c6b857c Remove std::cerr in iterative solver since we don't have iostream.
This fixes #2123
2021-01-21 11:40:05 +01:00
Antonio Sanchez
d5b7981119 Fix signed-unsigned comparison.
Hex literals are interpreted as unsigned, leading to a comparison between
the signed max-supported-function value `abcd[0]` (which was negative) and the
unsigned literal `0x80000006`.  This should not change the result, since signed
is implicitly converted to unsigned for the comparison, but it eliminates the
warning.
2021-01-20 08:34:00 -08:00
Ivan Popivanov
e409795d6b Proper CPUID 2021-01-18 17:10:11 +00:00
Rasmus Munk Larsen
cdd8fdc32e Vectorize pow(x, y). This closes https://gitlab.com/libeigen/eigen/-/issues/2085, which also contains a description of the algorithm.
I ran some testing (comparing to `std::pow(double(x), double(y))`) for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]`, and `y` in `{2, sqrt(2), -sqrt(2)}`, I get the following error statistics:

```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```

If I widen the range to all normal float I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:

```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```

which seems reasonable, since these results are subnormals with only a couple of significant bits left.
2021-01-18 13:25:16 +00:00
Antonio Sanchez
bde6741641 Improved std::complex sqrt and rsqrt.
Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously
`complex_sqrt` was only used for CUDA and MSVC), and implements
custom `complex_rsqrt`.

Also introduces `numext::rsqrt` to simplify implementation, and modified
`numext::hypot` to adhere to IEEE/IEC 60559 for special cases.

The `complex_sqrt` and `complex_rsqrt` implementations were found to be
significantly faster than `std::sqrt<std::complex<T>>` and
`1/numext::sqrt<std::complex<T>>`.

Benchmark file attached.
```
GCC 10, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           9.21 ns         9.21 ns     73225448
BM_StdSqrt<std::complex<float>>        17.1 ns         17.1 ns     40966545
BM_Sqrt<std::complex<double>>          8.53 ns         8.53 ns     81111062
BM_StdSqrt<std::complex<double>>       21.5 ns         21.5 ns     32757248
BM_Rsqrt<std::complex<float>>          10.3 ns         10.3 ns     68047474
BM_DivSqrt<std::complex<float>>        16.3 ns         16.3 ns     42770127
BM_Rsqrt<std::complex<double>>         11.3 ns         11.3 ns     61322028
BM_DivSqrt<std::complex<double>>       16.5 ns         16.5 ns     42200711

Clang 11, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           7.46 ns         7.45 ns     90742042
BM_StdSqrt<std::complex<float>>        16.6 ns         16.6 ns     42369878
BM_Sqrt<std::complex<double>>          8.49 ns         8.49 ns     81629030
BM_StdSqrt<std::complex<double>>       21.8 ns         21.7 ns     31809588
BM_Rsqrt<std::complex<float>>          8.39 ns         8.39 ns     82933666
BM_DivSqrt<std::complex<float>>        14.4 ns         14.4 ns     48638676
BM_Rsqrt<std::complex<double>>         9.83 ns         9.82 ns     70068956
BM_DivSqrt<std::complex<double>>       15.7 ns         15.7 ns     44487798

Clang 9, Pixel 2, aarch64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           24.2 ns         24.1 ns     28616031
BM_StdSqrt<std::complex<float>>         104 ns          103 ns      6826926
BM_Sqrt<std::complex<double>>          31.8 ns         31.8 ns     22157591
BM_StdSqrt<std::complex<double>>        128 ns          128 ns      5437375
BM_Rsqrt<std::complex<float>>          31.9 ns         31.8 ns     22384383
BM_DivSqrt<std::complex<float>>        99.2 ns         98.9 ns      7250438
BM_Rsqrt<std::complex<double>>         46.0 ns         45.8 ns     15338689
BM_DivSqrt<std::complex<double>>        119 ns          119 ns      5898944
```
2021-01-17 08:50:57 -08:00
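A scalar sketch of a stable principal complex square root in this spirit (special cases for inf/NaN, which the real `complex_sqrt` handles, are omitted here):

```cpp
#include <cmath>
#include <complex>

template <typename T>
std::complex<T> complex_sqrt_sketch(const std::complex<T>& z) {
  const T x = z.real(), y = z.imag();
  // w^2 = (|z| + |x|) / 2 avoids cancellation near the real axis.
  const T w = std::sqrt(T(0.5) * (std::hypot(x, y) + std::abs(x)));
  return x >= T(0)
             ? std::complex<T>(w, y / (2 * w))
             : std::complex<T>(std::abs(y) / (2 * w), std::copysign(w, y));
}
```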
Maozhou, Ge
21a8a2487c fix paddings of TensorVolumePatchOp 2021-01-15 11:51:49 +08:00
Guoqiang QI
38ae5353ab 1) Provide a better generic paddsub op implementation.
2) Make the paddsub op support Packet2cf/Packet4f/Packet2f in NEON.
3) Make the paddsub op support Packet2cf/Packet4f in SSE.
2021-01-13 22:54:03 +00:00
Antonio Sanchez
352f1422d3 Remove inf local variable.
Apparently `inf` is a macro on iOS for `std::numeric_limits<T>::infinity()`,
causing a compile error here. We don't need the local anyways since it's
only used in one spot.
2021-01-12 10:33:15 -08:00
Antonio Sanchez
2044084979 Remove TODO from Transform::computeScaleRotation()
Upon investigation, `JacobiSVD` is significantly faster than `BDCSVD`
for small matrices (twice as fast for 2x2, 20% faster for 3x3,
1% faster for 10x10).  Since the majority of cases will be small,
let's stick with `JacobiSVD`.  See !361.
2021-01-11 11:30:01 -08:00
Antonio Sanchez
3daf92c7a5 Transform::computeScalingRotation flush determinant to +/- 1.
In the previous code, in attempting to correct for a negative
determinant, we end up multiplying and dividing by a number that
is often very near, but not exactly +/-1.  By flushing to +/-1,
we can replace a division with a multiplication, and results
are more numerically consistent.
2021-01-11 10:13:38 -08:00
Antonio Sanchez
587fd6ab70 Only specialize complex sqrt_impl for CUDA if not MSVC.
We already specialize `sqrt_impl` on windows due to MSVC's mishandling
of `inf` (!355).
2021-01-11 09:15:45 -08:00
Deven Desai
2a6addb4f9 Fix for breakage in ROCm support - 210108
The following commit breaks ROCm support for Eigen
f149e0ebc3

All unit tests fail with the following error

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:19:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:166:
/home/rocm-user/eigen/Eigen/src/Core/MathFunctionsImpl.h:105:35: error: __host__ __device__ function 'complex_sqrt' cannot overload __host__ function 'complex_sqrt'
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
                                  ^
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:342:38: note: previous declaration is here
template<typename T> std::complex<T> complex_sqrt(const std::complex<T>& a_x);
                                     ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16625: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2

```

The error message is accurate, and the fix (provided in this commit) is trivial.
2021-01-08 18:04:40 +00:00
Antonio Sanchez
f149e0ebc3 Fix MSVC complex sqrt and packetmath test.
MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`.
Here we replace it with a custom version (currently used on GPU).

Also fixed the `packetmath` test, which previously skipped several
corner cases since `CHECK_CWISE1` only tests the first `PacketSize`
elements.
2021-01-08 01:17:19 +00:00
Antonio Sanchez
8d9cfba799 Fix rand test for MSVC.
MSVC's uniform random number generator is not quite as uniform as
others, requiring a slightly wider threshold on the histogram test.
After inspecting histograms for several runs, there's no obvious
bias -- just some bins end up having slightly more or fewer elements
(often > 2% but less than 2.5%).
2021-01-07 12:48:40 -08:00
Essex Edwards
e741b43668 Make Transform::computeRotationScaling(0,&S) continuous 2021-01-07 17:45:14 +00:00
David Tellenbach
0bdc0dba20 Add missing #endif directive in Macros.h 2021-01-07 12:32:41 +01:00
shrek1402
cb654b1c45 The #define was defined incorrectly because the result_of function was deprecated in C++17 and removed in C++20. Also, EIGEN_COMP_MSVC (which is _MSC_VER) only affects result_of indirectly, which can cause errors. 2021-01-07 10:12:25 +00:00
Antonio Sanchez
52d1dd979a Fix Ref initialization.
Since `eigen_assert` is a macro, the statements can become noops (e.g.
when compiling for GPU), so they may not execute the contained logic -- which
in this case is the entire `Ref` construction.  We need to separate the assert
from statements which have consequences.

Fixes #2113
2021-01-06 13:14:20 -08:00
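The pattern at issue, sketched with a hypothetical helper: work with side effects must live outside the assert, since `eigen_assert` may compile to a no-op.

```cpp
#include <cassert>

inline bool construct_ref() { return true; }  // stand-in for the real work

void fixed() {
  // BAD: assert(construct_ref()); -- the construction vanishes with NDEBUG.
  bool success = construct_ref();  // always executes
  assert(success);                 // only the check compiles away
  (void)success;                   // avoid unused-variable warning in release
}
```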
Antonio Sanchez
166fcdecdb Allow CwiseUnaryView to be used on device.
Added `EIGEN_DEVICE_FUNC` to methods.
2021-01-06 09:16:52 -08:00
Antonio Sanchez
bb1de9dbde Fix Ref Stride checks.
The existing `Ref` class failed to consider cases where the Ref's
`Stride` setting *could* match the underlying referred object's stride,
but **didn't** at runtime.  This led to trying to set invalid stride values,
causing runtime failures in some cases, and garbage due to mismatched
strides in others.

Here we add the missing runtime checks.  This involves computing the
strides necessary to align with the referred object's storage, and
verifying we can actually set those strides at runtime.

In the `const` case, if it *may* be possible to refer to the original
storage at compile-time but fails at runtime, then we defer to the
`construct(...)` method that makes a copy.

Added more tests to check these cases.

Fixes #2093.
2021-01-05 10:41:25 -08:00
Christoph Hertzberg
12dda34b15 Eliminate boolean product warnings by factoring out a
`combine_scalar_factors` helper function.
2021-01-05 18:15:30 +00:00
Antonio Sanchez
070d303d56 Add CUDA complex sqrt.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.

Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.

Also modified the CMake file to add the `--expt-relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
2020-12-22 23:25:23 -08:00
rgreenblatt
fdf2ee62c5 Fix missing EIGEN_DEVICE_FUNC 2020-12-20 23:22:53 -05:00
Rasmus Munk Larsen
05754100fe * Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup.
* Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA.
* Remove iterative psqrt<double> for NEON, because the initial rsqrt approximation is not accurate enough for convergence in 2 Newton-Raphson steps and with 3 steps, just calling the builtin sqrt insn is faster.

The following benchmarks were compiled with clang "-O2 -fast-math -mfma" and with and without -mavx.

AVX+FMA (float)

name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 1%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.08ns ± 1%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  95.5ns ± 0%    ~
BM_eigen_sqrt_float/4k     776ns ± 0%   763ns ± 0%  -1.67%
BM_eigen_sqrt_float/32k   6.57µs ± 1%  6.13µs ± 0%  -6.69%
BM_eigen_sqrt_float/256k  83.7µs ± 3%  83.3µs ± 2%    ~
BM_eigen_sqrt_float/1M     335µs ± 2%   332µs ± 2%    ~

SSE+FMA (float)
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 0%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.06ns ± 0%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  96.3ns ± 4%    ~
BM_eigen_sqrt_float/4k     774ns ± 0%   763ns ± 0%  -1.50%
BM_eigen_sqrt_float/32k   6.58µs ± 2%  6.11µs ± 0%  -7.06%
BM_eigen_sqrt_float/256k  82.7µs ± 1%  82.6µs ± 1%    ~
BM_eigen_sqrt_float/1M     330µs ± 1%   329µs ± 2%    ~

SSE+FMA (double)
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.08ns ± 0%   -6.68%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.65%
BM_eigen_sqrt_double/512     417ns ± 0%   374ns ± 1%  -10.29%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -11.00%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.7µs ± 0%  -11.07%
BM_eigen_sqrt_double/256k    213µs ± 0%   206µs ± 1%   -3.31%
BM_eigen_sqrt_double/1M      862µs ± 0%   870µs ± 2%   +0.96%

AVX+FMA (double)
name                        old cpu/op  new cpu/op  delta
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.06ns ± 0%   -6.95%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.80%
BM_eigen_sqrt_double/512     417ns ± 0%   373ns ± 1%  -10.59%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -10.79%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.8µs ± 0%  -10.94%
BM_eigen_sqrt_double/256k    214µs ± 0%   208µs ± 2%   -2.76%
BM_eigen_sqrt_double/1M      866µs ± 3%   923µs ± 7%     ~
2020-12-16 18:16:11 +00:00
Turing Eret
3bee9422d6 Merge branch 'lambdaknight/eigen-master' 2020-12-16 09:18:24 -07:00
Turing Eret
19e6496ce0 Replace call to FixedDimensions() with a singleton instance of
FixedDimensions.
2020-12-16 07:34:44 -07:00
Rasmus Munk Larsen
6cee8d347e Add an additional step of Newton-Raphson for psqrt<double> on Arm, which otherwise has an error of ~1000 ulps. 2020-12-15 04:06:41 +00:00
Turing Eret
bc7d1599fb TensorStorage with FixedDimensions now has zero instance memory overhead.
Removed m_dimension as an instance member of TensorStorage with
FixedDimensions and instead use the template parameter. This
means that the size of a pure fixed-size storage is exactly
equal to that of the data it stores.
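A minimal sketch (using C++17 for brevity; not the actual Eigen code) of the idea of moving the dimensions into the type:

```
#include <cstddef>

// Dimensions live in the type, so the storage needs no m_dimensions
// member and sizeof(storage) equals sizeof(the stored data).
template <std::size_t... Ds>
struct FixedDims {
  static constexpr std::size_t total = (std::size_t(1) * ... * Ds);
};

template <typename T, typename Dims>
struct FixedStorage {
  T data[Dims::total];                                   // only member
  static constexpr Dims dimensions() { return Dims{}; }  // recreated on demand
};

static_assert(sizeof(FixedStorage<float, FixedDims<2, 3>>) == 6 * sizeof(float),
              "zero instance memory overhead");
```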
2020-12-14 07:19:34 -07:00
Alexander Grund
cf0b5b0344 Remove code checking for CMake < 3.5
As the CMake version is at least 3.5, the code checking for earlier versions can be removed.
2020-12-14 09:57:44 +00:00
David Tellenbach
751f18f2c0 Remove comma at the end of enumeration list to silence C++03 warnings 2020-12-13 18:11:02 +01:00
Antonio Sanchez
5dc2fbabee Fix implicit cast to double.
Triggers `-Wimplicit-float-conversion`, causing a bunch of build errors
in Google due to `-Wall`.
2020-12-12 09:26:20 -08:00
Antonio Sanchez
55967f87d1 Fix NEON pmax<PropagateNumbers,Packet4bf>.
Simple typo, the max impl called pmin instead of pmax for floats.
2020-12-11 21:50:52 -08:00
Antonio Sanchez
839aa505c3 Fix typo in AVX512 packet math. 2020-12-11 21:35:44 -08:00
David Tellenbach
536c8a79f2 Remove unused macro in Half.h 2020-12-12 00:53:26 +01:00
Antonio Sanchez
8c9976d7f0 Fix more SSE/AVX packet conversions for peven.
MSVC doesn't like function-style casts and forces us to use intrinsics.
2020-12-11 15:46:42 -08:00
Antonio Sanchez
c6efc4e0ba Replace M_LOG2E and M_LN2 with custom macros.
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included.  However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).

To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
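A sketch of what the self-defined constants look like (values truncated here for illustration):

```
// No _USE_MATH_DEFINES needed: define the constants directly instead of
// relying on M_LOG2E / M_LN2 from <cmath>.
#ifndef EIGEN_LOG2E
#define EIGEN_LOG2E 1.442695040888963407359924681001892137  // log2(e)
#endif
#ifndef EIGEN_LN2
#define EIGEN_LN2 0.693147180559945309417232121458176568    // ln(2)
#endif
```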
2020-12-11 14:34:31 -08:00
Antonio Sanchez
e82722a4a7 Fix MSVC SSE casts.
MSVC doesn't like __m128(__m128i) c-style casts, so packets need to be
converted using intrinsic methods.
2020-12-11 08:52:59 -08:00
Deven Desai
f3d2ea48f5 Fix for broken ROCm/HIP Support
The following commit introduced a breakage in ROCm/HIP support for Eigen.

5ec4907434

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:222:
/home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'?
  return half2half2(from);
         ^~~~~~~~~~
         __half2half2
/opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here
            __half2 __half2half2(__half x)
                    ^
1 error generated when compiling for gfx900.

```

The cause seems to be a copy-paste error, and the fix is trivial
2020-12-11 16:14:57 +00:00
David Tellenbach
c7eb3a74cb Don't guard psqrt for std::complex<float> with EIGEN_ARCH_ARM64 2020-12-11 12:41:52 +01:00
Everton Constantino
bccf055a7c Add Armv8 guard on PropagateNumbers implementation. 2020-12-10 22:01:55 -03:00
Antonio Sanchez
82c0c18a83 Remove private access of std::deque::_M_impl.
This no longer works on gcc or clang, so we should just remove the hack.
The default should compile to similar code anyways.
2020-12-10 14:59:34 -08:00
David Tellenbach
00be0a7ff3 Fix vectorization of complex sqrt on NEON 2020-12-10 15:23:23 +00:00
David Tellenbach
8eb461a431 Remove comma at end of enumerator list in NEON PacketMath 2020-12-10 15:22:55 +01:00
David Tellenbach
2e8f850c78 Fix a typo in SparseMatrix documentation.
This fixes issue #2091.
2020-12-09 14:48:24 +01:00
Rasmus Munk Larsen
125cc9a5df Implement vectorized complex square root.
Closes #1905

Measured speedup for sqrt of `complex<float>` on Skylake:

SSE:
```
name                      old time/op             new time/op  delta
BM_eigen_sqrt_ctype/1     49.4ns ± 0%             54.3ns ± 0%  +10.01%
BM_eigen_sqrt_ctype/8      332ns ± 0%               50ns ± 1%  -84.97%
BM_eigen_sqrt_ctype/64    2.81µs ± 1%             0.38µs ± 0%  -86.49%
BM_eigen_sqrt_ctype/512   23.8µs ± 0%              3.0µs ± 0%  -87.32%
BM_eigen_sqrt_ctype/4k     202µs ± 0%               24µs ± 2%  -88.03%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%             0.19ms ± 0%  -88.18%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%              1.5ms ± 1%  -88.20%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%              6.2ms ± 0%  -88.18%
```

AVX2:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.6ns ± 0%  55.6ns ± 0%   +3.71%
BM_eigen_sqrt_ctype/8      334ns ± 0%    27ns ± 0%  -91.86%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.22µs ± 2%  -92.28%
BM_eigen_sqrt_ctype/512   23.8µs ± 1%   1.7µs ± 1%  -92.81%
BM_eigen_sqrt_ctype/4k     201µs ± 0%    14µs ± 1%  -93.24%
BM_eigen_sqrt_ctype/32k   1.62ms ± 0%  0.11ms ± 1%  -93.29%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.9ms ± 1%  -93.31%
BM_eigen_sqrt_ctype/1M    52.0ms ± 0%   3.5ms ± 1%  -93.31%
```

AVX512:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.7ns ± 0%  56.2ns ± 1%   +4.75%
BM_eigen_sqrt_ctype/8      334ns ± 0%    18ns ± 2%  -94.63%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.12µs ± 1%  -95.54%
BM_eigen_sqrt_ctype/512   23.9µs ± 1%   1.0µs ± 1%  -95.89%
BM_eigen_sqrt_ctype/4k     202µs ± 0%     8µs ± 1%  -96.13%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%  0.06ms ± 1%  -96.15%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.5ms ± 4%  -96.11%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%   2.0ms ± 1%  -96.13%
```
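For reference, a scalar sketch of the textbook branch-based algorithm (the vectorized kernel replaces the branches with packet selects and additionally handles zero/inf/NaN inputs; this is not necessarily the exact Eigen kernel):

```
#include <cmath>
#include <complex>

std::complex<float> csqrt_sketch(std::complex<float> z) {
  float x = z.real(), y = z.imag();
  // w = sqrt((|x| + |z|) / 2); edge cases (z == 0, inf, nan) omitted.
  float w = std::sqrt(0.5f * (std::abs(x) + std::hypot(x, y)));
  if (x >= 0.0f) return {w, y / (2.0f * w)};
  return {std::abs(y) / (2.0f * w), std::copysign(w, y)};
}
```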
2020-12-08 18:13:35 -08:00
Antonio Sanchez
8cfe0db108 Fix host/device calls for __half.
The previous code had `__host__ __device__` functions calling `__device__`
functions (e.g. `__low2half`) which caused build failures in tensorflow.
Also tried to simplify the `#ifdef` guards to make them more clear.
2020-12-08 20:31:02 +00:00
Everton Constantino
baf9d762b7 - Enabling PropagateNaN and PropagateNumbers for NEON.
- Adding propagate tests to bfloat16.
2020-12-08 17:05:05 +00:00
Antonio Sanchez
634bd79b0e Fix unused warning on new dense_assignment_loop impl. 2020-12-07 19:14:21 -08:00
Antonio Sanchez
655c3a4042 Add specialization for compile-time zero-sized dense assignment.
In the current `dense_assignment_loop` implementations, if the
destination's inner or outer size is zero at compile time and if the kernel
involves a product, we currently get a compile error (#2080).  This is
triggered by attempting to multiply a non-existent row by a column (or
vice-versa).

To address this, we add a specialization for zero-sized assignments
(`AllAtOnceTraversal`) which evaluates to a no-op. We also add a static
check to ensure the size is in-fact zero. This now seems to be the only
existing use of `AllAtOnceTraversal`.

Fixes #2080.
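A hypothetical sketch of the shape of such a specialization (simplified names, not Eigen's actual internals):

```
// No-op assignment loop for destinations whose size is zero at compile
// time, with a static check guarding against misuse.
template <typename Dst, typename Src>
struct zero_sized_assignment_loop {
  static void run(Dst&, const Src&) {
    static_assert(Dst::SizeAtCompileTime == 0,
                  "only valid for compile-time zero-sized expressions");
    // nothing to assign
  }
};
```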
2020-12-07 08:38:43 -08:00
Antonio Sanchez
5ec4907434 Clean up #ifs in GPU PacketMath.
Removed redundant checks and redundant code for CUDA/HIP.

Note: there are several issues here of calling `__device__` functions
from `__host__ __device__` functions, in particular `__low2half`.
We do not address that here -- only modifying this file enough
to get our current tests to compile.

Fixed: #1847
2020-12-04 16:14:03 -08:00
Rasmus Munk Larsen
f9fac1d5b0 Add log2() to Eigen. 2020-12-04 21:45:09 +00:00
Antonio Sanchez
2dbac2f99f Fix bad NEON fp16 check 2020-12-04 13:42:18 -08:00
Antonio Sanchez
e2f21465fe Special function implementations for half/bfloat16 packets.
Current implementations fail to consider half-float packets, only
half-float scalars.  Added specializations for packets on AVX, AVX512 and
NEON.  Added tests to `special_packetmath`.

The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.

Tested with AVX, AVX512.
2020-12-04 10:16:29 -08:00
David Tellenbach
305b8bd277 Remove duplicate #if clause 2020-12-04 18:55:46 +01:00
Antonio Sanchez
9ee9ac81de Fix shfl* macros for CUDA/HIP
The `shfl*` functions are `__device__` only; the `#ifdef`s are adjusted so
they are defined whenever the corresponding CUDA/HIP ones are.

Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.

Fixes #2083
2020-12-04 17:18:32 +00:00
shrek1402
a9a2f2bebf The function 'prefetch' did not work correctly on the win64 platform 2020-12-04 17:18:08 +00:00
Rasmus Munk Larsen
f23dc5b971 Revert "Add log2() operator to Eigen"
This reverts commit 4d91519a9b.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b Add log2() operator to Eigen 2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen
25d8ae7465 Small cleanup of generic plog implementations:
Adding the term e*ln(2) was split into two steps for no obvious reason.
This dates back to the original Cephes code from which the algorithm is adapted.
It appears that this was done in Cephes to prevent the compiler from reordering
the addition of the 3 terms in the approximation

  log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)

which must be added in reverse order since |x| < (sqrt(2)-1).

This allows rewriting the code to just 2 pmadd and 1 padd instructions,
which on a Skylake processor speeds up the code by 5-7%.
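A scalar model of the rewritten tail (std::fma standing in for pmadd; the helper name is illustrative):

```
#include <cmath>

// Given the reduced argument x, exponent e, x2 = x*x and
// y = x^3 * P(x)/Q(x), form the result with 2 pmadd and 1 padd.
float plog_tail(float x, float e, float x2, float y) {
  const float ln2 = 0.693147180559945f;
  y = std::fma(-0.5f, x2, y);  // y - 0.5*x^2      (pmadd)
  x = x + y;                   // x + y - 0.5*x^2  (padd)
  return std::fma(e, ln2, x);  // ... + e*ln(2)    (pmadd)
}
```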
2020-12-03 19:40:40 +00:00
Antonio Sanchez
eb4d4ae070 Include chrono in main for c++11.
Hack to fix tensor tests, since min/max are overridden by `main.h`.
2020-12-03 11:27:32 -08:00
Rasmus Munk Larsen
71c85df4c1 Clean up the Tensor header and get rid of the EIGEN_SLEEP macro. 2020-12-02 11:04:04 -08:00
Antonio Sanchez
70fbcf82ed Fix typo in F32MaskToBf16Mask. 2020-12-02 07:58:34 -08:00
Antonio Sanchez
2627e2f2e6 Fix neon cmp* functions for bf16.
The current impl corrupts the comparison masks when converting
from float back to bfloat16.  The resulting masks are then
no longer all zeros or all ones, which breaks when used with
`pselect` (e.g. in `pmin<PropagateNumbers>`).  This was
causing `packetmath_15` to fail on arm.

Introducing a simple `F32MaskToBf16Mask` corrects this (takes
the lower 16-bits for each float mask).
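A scalar model of why truncation works where a rounding conversion does not (the function name is illustrative):

```
#include <cstdint>

// A float comparison mask lane is either 0xFFFFFFFF or 0x00000000, so
// keeping the low 16 bits yields a valid bfloat16 mask lane; a rounding
// float->bfloat16 conversion would corrupt the all-ones (NaN) pattern.
std::uint16_t f32_mask_to_bf16_mask(std::uint32_t lane) {
  return static_cast<std::uint16_t>(lane & 0xFFFFu);
}
```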
2020-12-02 01:29:34 +00:00
Antonio Sanchez
ddd48b242c Implement CUDA __shfl* for Eigen::half
Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test are broken, as well as several ops in Tensorflow. The gpu functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float.
Here we add the required specializations.
2020-12-01 14:36:52 -08:00
Rasmus Munk Larsen
e57281a741 Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available. 2020-12-01 11:31:47 -08:00
Antonio Sanchez
1992af3de2 Fix #2077, EIGEN_CONSTEXPR in Half.
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`.  This shouldn't affect anything else, since
it is only used in a `bit_cast<uint16_t,half>()`, which is not itself
`constexpr`.

Fixes #2077.
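For context, a sketch of why `bit_cast` cannot be `constexpr` before C++20: the portable implementation goes through `memcpy`, which is not usable in constant expressions.

```
#include <cstring>

template <typename Tgt, typename Src>
Tgt bit_cast_sketch(const Src& src) {  // illustrative, not Eigen's version
  static_assert(sizeof(Tgt) == sizeof(Src), "sizes must match");
  Tgt tgt;
  std::memcpy(&tgt, &src, sizeof(Tgt));  // not a constant expression
  return tgt;
}
```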
2020-12-01 03:10:21 +00:00
acxz
7b80609d49 add EIGEN_DEVICE_FUNC to methods 2020-12-01 03:08:47 +00:00
Antonio Sanchez
89f90b585d AVX512 missing ops.
This allows the `packetmath` tests to pass for AVX512 on skylake.
Made `half` and `bfloat16` consistent in terms of ops they support.

Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
2020-11-30 16:28:57 +00:00
Florian Maurin
c5985c46f5 Fix typo in doc 2020-11-30 10:53:29 +00:00
Jim Lersch
68f69414f7 Workaround for doxygen class template titles in which the template
part of the class signature is lost due to a problem with forward
declarations.  The problem is probably caused by doxygen bug #7689.
It is confirmed to be fixed in doxygen >= 1.8.19.
2020-11-27 19:52:16 -07:00
Jim Lersch
a7170f2aca Fix doxygen class blocks that were not associated with the correct classes. 2020-11-27 08:48:11 -07:00
David Tellenbach
550e8f8f57 Include CMakeDependentOption to be able to use cmake_dependent_option 2020-11-27 13:21:49 +01:00
Bowie Owens
9842366bba Make inclusion of doc sub-directory optional by adjusting options.
Allows exclusion of doc and related targets to help when using eigen via add_subdirectory().

Requested by:

https://gitlab.com/libeigen/eigen/-/issues/1842

Also required making EIGEN_TEST_BUILD_DOCUMENTATION an option dependent on EIGEN_BUILD_DOC. This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
2020-11-27 08:11:49 +11:00
filippobrizzi
aa56e1d980 check for include dirs set 2020-11-26 10:22:46 +00:00
Andreas Krebbel
1e74f93d55 Fix some packet-functions in the IBM ZVector packet-math. 2020-11-25 14:11:23 +00:00
Rasmus Munk Larsen
79818216ed Revert "Fix Half NaN definition and test."
This reverts commit c770746d70.
2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen
c770746d70 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-24 20:53:07 +00:00
Antonio Sanchez
22f67b5958 Fix boolean float conversion and product warnings.
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
    Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```

Details:

- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).

- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)

- Deprecated above specialized ops for bool.

- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Antonio Sanchez
a3b300f1af Implement missing AVX half ops.
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`.  Allows `packetmath_13` to pass.

Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
2020-11-24 16:46:41 +00:00
Antonio Sanchez
38abf2be42 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-23 14:13:59 -08:00
Antonio Sanchez
4cf01d2cf5 Update AVX half packets, disable test.
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail.  This disables the test.

Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
2020-11-21 09:05:10 -08:00
Antonio Sanchez
fd1dcb6b45 Fixes duplicate symbol when building blas
A missing inline breaks blas, since the symbol is generated in
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, and `double.cpp`

Changed rest of inlines to `EIGEN_STRONG_INLINE`.
2020-11-20 09:37:40 -08:00
David Tellenbach
6c9c3f9a1a Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool
Both Eigen::half and Eigen::bfloat16 are implicitly convertible to
float and can hence be converted to bool via the conversion chain

  Eigen::{half,bfloat16} -> float -> bool

We thus remove the explicit cast operator to bool.
2020-11-19 18:49:09 +01:00
Antonio Sanchez
a8fdcae55d Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix.
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.

Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.

Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.

We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.

Related to #2048
2020-11-18 23:15:33 +00:00
David Tellenbach
11e4056f6b Re-enable Arm Neon Eigen::half packets of size 8
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
2020-11-18 23:02:21 +00:00
Antonio Sanchez
17268b155d Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`.  Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
2020-11-18 20:32:35 +00:00
Antonio Sanchez
41d5d5334b Initialize primitives to fix -Wuninitialized-const-reference.
The `meta` test generates warnings with the latest version of clang due
to passing uninitialized variables as const reference arguments.
```
test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference]
    VERIFY(( check_is_convertible(a.dot(b), f) ));
```
We don't actually use the variables, but initializing them eliminates the
new warning.

Fixes #2067.
2020-11-18 20:23:20 +00:00
Antonio Sanchez
3669498f5a Fix rule-of-3 for the Tensor module.
Adds copy constructors to Tensor ops, inherits assignment operators from
`TensorBase`.

Addresses #1863
2020-11-18 18:14:53 +00:00
Antonio Sanchez
60218829b7 EOF newline added to InverseSize4.
The missing newline was causing build breakages due to `-Wnewline-eof -Werror`,
which seems to be common across Google.
2020-11-18 07:58:33 -08:00
Rasmus Munk Larsen
2d63706545 Add missing parens around macro argument. 2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen
6bba58f109 Replace SSE_SHUFFLE_MASK macro with shuffle_mask. 2020-11-17 15:28:37 -08:00
David Tellenbach
e9b55c4db8 Avoid promotion of Arm __fp16 to float in Neon PacketMath
Using overloaded arithmetic operators for Arm __fp16 always
causes a promotion to float. We replace operator* by vmulh_f16
to avoid this.
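A sketch of the difference (requires an Armv8.2-a+fp16 target; `vmulh_f16` is the scalar half-precision multiply intrinsic):

```
#include <arm_fp16.h>

__fp16 mul_promoting(__fp16 a, __fp16 b) { return a * b; }        // via float
__fp16 mul_native(__fp16 a, __fp16 b) { return vmulh_f16(a, b); } // stays fp16
```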
2020-11-17 20:19:44 +01:00
Antonio Sanchez
117a4c0617 Fix missing EIGEN_CONSTEXPR pop_macro in Half.
`EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if
`EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
2020-11-17 08:29:33 -08:00
Guoqiang QI
394f564055 Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. 2020-11-17 12:27:01 +00:00
Antonio Sanchez
8e9cc5b10a Eliminate double-promotion warnings.
Clang currently complains about implicit conversions, e.g.
```
test/packetmath.cpp:680:59: warning: implicit conversion increases floating-point precision: 'typename Eigen::internal::random_retval<typename Eigen::internal::global_math_functions_filtering_base<double>::type>::type' (aka 'double') to 'long double' [-Wdouble-promotion]
          data1[0] = Scalar((2 * k + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
                                                        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/packetmath.cpp:681:40: warning: implicit conversion increases floating-point precision: 'float' to 'long double' [-Wdouble-promotion]
          data1[1] = Scalar((2 * k + 2 + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
```

Modified to explicitly cast to double.
2020-11-16 10:39:09 -08:00
acxz
9175f50d6f Add EIGEN_DEVICE_FUNC to TranspositionsBase
Fixes #2057.
2020-11-16 15:37:40 +00:00
Martin Vonheim Larsen
280f4f2407 Enable MathJax in Doxygen.in
Note that HTTPS must be used against the MathJax CDN when hosted on `eigen.tuxfamily.org` (which uses HTTPS) in order to avoid `Mixed Content` errors from browsers. Using HTTPS for MathJax also works if the Eigen docs are hosted on plain HTTP.
2020-11-16 12:59:13 +00:00
Antonio Sanchez
bb69a8db5d Explicit casts of S -> std::complex<T>
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`.  This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`

The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).

To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`.  We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
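An illustrative helper showing the pattern (hypothetical name, not Eigen's `internal::cast` itself):

```
#include <complex>

// Cast the real component explicitly from S to T before constructing
// the complex value, so no implicit widening/narrowing conversion
// remains for the compiler to warn about.
template <typename T, typename S>
std::complex<T> to_complex(const S& x) {
  return std::complex<T>(static_cast<T>(x), T(0));
}
```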
2020-11-14 05:50:42 +00:00
Christoph Hertzberg
90f6d9d23e Suppress ignored-attributes warning (same as in vectorization_logic). Remove redundant include and using namespace. 2020-11-13 16:21:53 +01:00
guoqiangqi
8324e5e049 Fix typo in NEON/PacketMath.h 2020-11-13 00:46:41 +00:00
Antonio Sanchez
852513e7a6 Disable testing of OpenGL by default.
The `OpenGLSupport` module contains mostly deprecated features, and the
test is highly GL context-dependent, relies on deprecated GLUT, and
requires a display.  Until the module is updated to support modern
OpenGL and the test to use newer windowing frameworks (e.g. GLFW)
it's probably best to disable the test by default.

The test can be enabled with `cmake -DEIGEN_TEST_OPENGL=ON`.

See #2053 for more details.
2020-11-12 16:15:40 -08:00
Rasmus Munk Larsen
bec72345d6 Simplify expression for inner product fallback in Gemv product evaluator. 2020-11-12 23:43:15 +00:00
Rasmus Munk Larsen
276db21f26 Remove redundant branch for handling dynamic vector*vector. This will be handled by the equivalent branch in the specialization for GemvProduct. 2020-11-12 21:54:56 +00:00
Rasmus Munk Larsen
cf12474a8b Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime.
This speeds up inner products where one or both arguments are dynamic, for small and medium-sized vectors (up to 32k).

name                           old time/op             new time/op   delta
BM_VecVecStatStat<float>/1     1.64ns ± 0%             1.64ns ± 0%     ~
BM_VecVecStatStat<float>/8     2.99ns ± 0%             2.99ns ± 0%     ~
BM_VecVecStatStat<float>/64    7.00ns ± 1%             7.04ns ± 0%   +0.66%
BM_VecVecStatStat<float>/512   61.6ns ± 0%             61.6ns ± 0%     ~
BM_VecVecStatStat<float>/4k     551ns ± 0%              553ns ± 1%   +0.26%
BM_VecVecStatStat<float>/32k   4.45µs ± 0%             4.45µs ± 0%     ~
BM_VecVecStatStat<float>/256k  77.9µs ± 0%             78.1µs ± 1%     ~
BM_VecVecStatStat<float>/1M     312µs ± 0%              312µs ± 1%     ~
BM_VecVecDynStat<float>/1      13.3ns ± 1%              4.6ns ± 0%  -65.35%
BM_VecVecDynStat<float>/8      14.4ns ± 0%              6.2ns ± 0%  -57.00%
BM_VecVecDynStat<float>/64     24.0ns ± 0%             10.2ns ± 3%  -57.57%
BM_VecVecDynStat<float>/512     138ns ± 0%               68ns ± 0%  -50.52%
BM_VecVecDynStat<float>/4k     1.11µs ± 0%             0.56µs ± 0%  -49.72%
BM_VecVecDynStat<float>/32k    8.89µs ± 0%             4.46µs ± 0%  -49.89%
BM_VecVecDynStat<float>/256k   78.2µs ± 0%             78.1µs ± 1%     ~
BM_VecVecDynStat<float>/1M      313µs ± 0%              312µs ± 1%     ~
BM_VecVecDynDyn<float>/1       10.4ns ± 0%             10.5ns ± 0%   +0.91%
BM_VecVecDynDyn<float>/8       12.0ns ± 3%             11.9ns ± 0%     ~
BM_VecVecDynDyn<float>/64      37.4ns ± 0%             19.6ns ± 1%  -47.57%
BM_VecVecDynDyn<float>/512      159ns ± 0%               81ns ± 0%  -49.07%
BM_VecVecDynDyn<float>/4k      1.13µs ± 0%             0.58µs ± 1%  -49.11%
BM_VecVecDynDyn<float>/32k     8.91µs ± 0%             5.06µs ±12%  -43.23%
BM_VecVecDynDyn<float>/256k    78.2µs ± 0%             78.2µs ± 1%     ~
BM_VecVecDynDyn<float>/1M       313µs ± 0%              312µs ± 1%     ~
2020-11-12 18:02:37 +00:00
Pedro Caldeira
c29935b323 Add support for dynamic dispatch of MMA instructions for POWER 10 2020-11-12 11:31:15 -03:00
acxz
b714dd9701 remove annotation for first declaration of default con/destruction 2020-11-12 04:34:12 +00:00
mehdi-goli
e24a1f57e3 [SYCL Function pointer Issue]: SYCL does not support function pointers inside kernels, due to portability issues with function pointers and memory address spaces between host and accelerators. To fix the issue, function pointers have been replaced by function objects. 2020-11-12 01:50:28 +00:00
Antonio Sanchez
6961468915 Address issues with openglsupport test.
The existing test fails on several systems due to GL runtime version mismatches,
the use of deprecated features, and memory errors due to improper use of GLUT.
The test was modified to:

- Run within a display function, allowing proper GLUT cleanup.
- Generate dynamic shaders with a supported GLSL version string and output variables.
- Report shader compilation errors.
- Check GL context version before launching version-specific tests.

Note that most of the existing `OpenGLSupport` module and tests rely on deprecated
features (e.g. fixed-function pipeline). The test was modified to allow it to
pass on various systems. We might want to consider removing the module or re-writing
it entirely to support modern OpenGL.  This is beyond the scope of this patch.

Testing of legacy GL (for platforms that support it) can be enabled by defining
`EIGEN_LEGACY_OPENGL`.  Otherwise, the test will try to create a modern context.

Tested on
- MacBook Air (2019), macOS Catalina 10.15.7 (OpenGL 2.1, 4.1)
- Debian 10.6, NVidia Quadro K1200 (OpenGL 3.1, 3.3)
2020-11-11 15:54:43 -08:00
Everton Constantino
348a48682e Fix erroneous forward declaration of boost nvp. 2020-11-10 13:07:34 -03:00
guoqiangqi
82fe059f35 Fix issue #2045: the _mm256_set_m128d op is not supported by gcc 7.x 2020-11-04 09:21:39 +08:00
Deven Desai
9d11e2c03e CMakefile update for ROCm 4.0
Starting with ROCm 4.0, the `hipconfig --platform` command will return `amd` (prior return value was `hcc`). Updating the CMakeLists.txt files in the test dirs to account for this change.
2020-10-29 18:06:31 +00:00
Deven Desai
39a038f2e4 Fix for ROCm (and CUDA?) breakage - 201029
The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error

e265f7ed8e

```

Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
  return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
                                                                           ^~~~~~
                                                                           Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
          ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```

The fix in this commit is trivial. Please review and merge
2020-10-29 15:34:05 +00:00
David Tellenbach
f895755c0e Remove unused functions in Half.h.
The following functions have been removed:

  Eigen::half fabsh(const Eigen::half&)
  Eigen::half exph(const Eigen::half&)
  Eigen::half sqrth(const Eigen::half&)
  Eigen::half powh(const Eigen::half&, const Eigen::half&)
  Eigen::half floorh(const Eigen::half&)
  Eigen::half ceilh(const Eigen::half&)
2020-10-29 07:37:52 +01:00
David Tellenbach
09f015852b Replace numext::as_uint with numext::bit_cast<numext::uint32_t> 2020-10-29 07:28:28 +01:00
David Tellenbach
e265f7ed8e Add support for Armv8.2-a __fp16
Armv8.2-a provides a native half-precision floating point (__fp16 aka.
float16_t). This patch introduces

* __fp16 as underlying type of Eigen::half if this type is available
* the packet types Packet4hf and Packet8hf representing float16x4_t and
  float16x8_t respectively
* packet-math for the above packets with corresponding scalar type Eigen::half

The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com>.

This closes #1940.
2020-10-28 20:15:09 +00:00
mehdi-goli
a725a3233c [SYCL clean up the code]: removing extra #pragma unroll in SYCL which was causing issues in embedded systems 2020-10-28 08:34:49 +00:00
mehdi-goli
b9ff791fed [Missing SYCL math op]: Adding the missing LDEXP function for SYCL. 2020-10-28 08:32:57 +00:00
mehdi-goli
61461d682a [Fixing expf issue]: Eigen uses the packet-type operation for scalar type float in the Sigmoid function (https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/functors/UnaryFunctors.h#L990). As a result the SYCL backend breaks, since it only supports packet operations for the vectorized types float4 and double2. The issue has been fixed by adding scalar type float to the packet operation pexp for the SYCL backend. 2020-10-28 08:30:34 +00:00
Christoph Hertzberg
ecb7bc9514 Bug #2036 make sure find_standard_math_library_test_program actually compiles (and is guaranteed to call math functions) 2020-10-24 15:22:21 +02:00
Susi Lehtola
09f595a269 Make sure compiler does not optimize away calls to math functions 2020-10-24 06:16:50 +00:00
guoqiangqi
28aef8e816 Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f> 2020-10-20 11:37:09 +08:00
guoqiangqi
4a77eda1fd remove unnecessary specialized templates of pexp for scalar float/double 2020-10-19 00:51:42 +00:00
Antonio Sanchez
d9f0d9eb76 Fix missing pfirst<Packet16b> for MSVC.
It was only defined under one `#ifdef` case.  This fixes the `packetmath_14`
test for MSVC.
2020-10-16 16:22:00 -07:00
Rasmus Munk Larsen
21edea5edd Fix the specialization of pfrexp for AVX to be faster when AVX2/AVX512DQ is not available, and avoid undefined behavior in C++. Also mask off the sign bit when extracting the exponent. 2020-10-15 18:39:58 -07:00
Deven Desai
011e0db31d Fix for ROCm/HIP breakage - 201013
The following commit seems to have introduced regressions in ROCm/HIP support.

183a208212

It causes some unit-tests to fail with the following error

```
...
Eigen/src/Core/GenericPacketMath.h:322:3: error: no member named 'bit_and' in the global namespace; did you mean 'std::bit_and'?
...
Eigen/src/Core/GenericPacketMath.h:329:3: error: no member named 'bit_or' in the global namespace; did you mean 'std::bit_or'?
...
Eigen/src/Core/GenericPacketMath.h:336:3: error: no member named 'bit_xor' in the global namespace; did you mean 'std::bit_xor'?
...
```

The error occurs because, when compiling the device code in HIP/CUDA, the compiler will pick up some of the std functions (whose calls are prefixed by EIGEN_USING_STD) from the global namespace (i.e. use ::bit_xor instead of std::bit_xor). For this to work, those functions must be declared in the global namespace in the HIP/CUDA header files. The `bit_and`, `bit_or` and `bit_xor` routines are not declared in the HIP header file that contains the decls for the std math functions ( `math_functions.h` ), and this is the cause of the error above.

It seems that the newer HIP compilers do support calling `std::` math routines within device code, and the ideal fix here would have been to change all calls to std math functions in Eigen to use the `std::` namespace (instead of the global namespace) when compiling with the HIP compiler. However it seems there was a recent commit to remove the EIGEN_USING_STD_MATH macro and collapse its uses into the EIGEN_USING_STD macro ( 4091f6b25c ).

Replacing all std math calls would essentially require resurrecting the EIGEN_USING_STD_MATH macro, so that option was not chosen.

Also, HIP compilers only support std math calls within device code, not all std functions (specifically not malloc/free, which are prefixed via EIGEN_USING_STD). So modifying the EIGEN_USING_STD implementation to use the std:: namespace for HIP will not work either.

Hence going for the ugly solution of special-casing the three calls that break the HIP compile, to explicitly use the std:: namespace
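A sketch of the special-casing (illustrative wrapper; the point is the explicit `std::` qualification):

```
#include <functional>

template <typename T>
T apply_bit_and(const T& a, const T& b) {
  // Qualify explicitly instead of relying on a using-declaration that
  // resolves to ::bit_and in device code.
  return std::bit_and<T>()(a, b);
}
```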
2020-10-15 12:17:35 +00:00
Rasmus Munk Larsen
6ea8091705 Revert change from 4e4d3f32d1 that broke BFloat16.h build with older compilers. 2020-10-15 01:20:08 +00:00
Guoqiang QI
4700713faf Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops,also unified AVX512 plog<Packet16f> op with generic api 2020-10-15 00:54:45 +00:00
Rasmus Munk Larsen
af6f43d7ff Add specializations for pmin/pmax with prescribed NaN propagation semantics for SSE/AVX/AVX512. 2020-10-14 23:11:24 +00:00
Rasmus Munk Larsen
274ef12b61 Remove leftover debug print statement in cxx11_tensor_expr.cpp 2020-10-14 22:59:51 +00:00
Rasmus Munk Larsen
208b3626d1 Revert generic implementation of predux, since it break compilation of predux_any with MSVC. 2020-10-14 21:41:28 +00:00
David Tellenbach
e3e2cf9d24 Add MatrixBase::cwiseArg() 2020-10-14 01:56:42 +00:00
Rasmus Munk Larsen
61fc78bbda Get rid of nested template specialization in TensorReductionGpu.h, which was broken by c6953f799b. 2020-10-13 23:53:11 +00:00
Rasmus Munk Larsen
c6953f799b Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN, and PropagateNumbers semantics. Add (slow) generic implementations for most reductions. 2020-10-13 21:48:31 +00:00
acxz
807e51528d undefine EIGEN_CONSTEXPR before redefinition 2020-10-12 20:28:56 -04:00
Rasmus Munk Larsen
9a4d04c05f Make bitwise_helper a device function to unbreak GPU builds. 2020-10-10 01:45:20 +00:00
Rasmus Munk Larsen
4e4d3f32d1 Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512. 2020-10-09 20:05:49 +00:00
David Tellenbach
7a8d3d5b81 Disable test exceptions when using OpenMP. 2020-10-09 17:49:07 +02:00
David Tellenbach
9022f5aa8a Mention problems when using potentially throwing scalars and OpenMP 2020-10-09 17:04:25 +02:00
Karl Ljungkvist
d199c17b14 Fix typo in Tutorial_BlockOperations_block_assignment.cpp 2020-10-09 07:51:36 +00:00
David Tellenbach
4091f6b25c Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD 2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen
183a208212 Implement generic bitwise logical packet ops that work for all types. 2020-10-08 22:45:20 +00:00
David Tellenbach
8f8d77b516 Add EIGEN prefix for HAS_LGAMMA_R 2020-10-08 18:32:19 +02:00
Eugene Zhulenev
2279f2c62f Use lgamma_r if it is available (update check for glibc 2.19+) 2020-10-08 00:26:45 +00:00
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it varies across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
David Tellenbach
f66f3393e3 Use reinterpret_cast instead of C-style cast in Inverse_NEON.h 2020-10-04 00:35:09 +02:00
Rasmus Munk Larsen
22c971a225 Don't cast away const in Inverse_NEON.h. 2020-10-02 15:06:34 -07:00
Rasmus Munk Larsen
f93841b53e Use EIGEN_USING_STD to fix CUDA compilation error on BFloat16.h. 2020-10-02 14:47:15 -07:00
Rasmus Munk Larsen
ee714f79f7 Fix CUDA build breakage and incorrect result for absdiff on HIP with long double arguments. 2020-10-02 21:05:35 +00:00
janos
f7b185a8b1 dont use =* might not return a Scalar 2020-10-02 14:36:51 +02:00
Rasmus Munk Larsen
9078f47cd6 Fix build breakage with MSVC 2019, which does not support MMX intrinsics for 64 bit builds, see:
https://stackoverflow.com/questions/60933486/mmx-intrinsics-like-mm-cvtpd-pi32-not-found-with-msvc-2019-for-64bit-targets-c

Instead use the equivalent SSE2 intrinsics.
2020-10-01 12:37:55 -07:00
Rasmus Munk Larsen
3b445d9bf2 Add generic packet ops corresponding to std::fmin and std::fmax. The nonsensical NaN-propagation rules for std::min and std::max implemented by pmin and pmax in Eigen are a longstanding source of confusion and bug reports. This change is a first step towards addressing that, as discussed in issue #564. 2020-10-01 16:54:31 +00:00
Rasmus Munk Larsen
44b9d4e412 Specialize pldexp_double and pfrexp_double and get rid of the Packet2l definition for SSE. SSE does not support conversion between 64 bit integers and double, and the existing implementation of casting between Packet2d and Packet2l results in undefined behavior when casting NaN to int. Since pldexp and pfrexp only manipulate exponent fields that fit in 32 bit, this change provides specializations that use the existing instructions _mm_cvtpd_pi32 and _mm_cvtsi32_pd instead. 2020-09-30 13:33:44 -07:00
Antonio Sanchez
d5a0d89491 Fix alignedbox 32-bit precision test failure.
The current `test/geo_alignedbox` tests fail on 32-bit arm due to small floating-point errors.

In particular, the following is not guaranteed to hold:
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
VERIFY(transformedC.contains(c));
```
since `c.transformed(identity)` is ever-so-slightly different from `c`. Instead, we replace this test with one that checks an identity transform is within floating-point precision of `c`.

Also updated the condition on `AlignedBox::transform(...)` to only accept `Affine`, `AffineCompact`, and `Isometry` modes explicitly.  Otherwise, invalid combinations of modes would also incorrectly pass the assertion.
2020-09-30 08:42:03 -07:00
David Tellenbach
30960d485e Fix failure in GEBP kernel when compiling with OpenMP and FMA
Fixes #1995
2020-09-30 01:26:07 +02:00
Rasmus Munk Larsen
f9d1500f74 Revert !182. 2020-09-29 13:56:17 -07:00
Rasmus Munk Larsen
068121ec02 Add missing newline at the end of Inverse_NEON.h 2020-09-29 15:32:52 +00:00
Rasmus Munk Larsen
74ff5719b3 Fix compilation of 64 bit constant arguments to pset1frombits in TypeCasting.h on platforms where uint64_t != unsigned long. 2020-09-28 22:47:11 +00:00
Rasmus Munk Larsen
3a0b23e473 Fix compilation of pset1frombits calls on iOS. 2020-09-28 22:30:36 +00:00
Christoph Hertzberg
6b0c0b587e Provide a more efficient Packet2l->Packet2d cast method 2020-09-28 22:14:02 +00:00
Martin Pecka
6425e875a1 Added AlignedBox::transform(AffineTransform). 2020-09-28 18:06:23 +00:00
Alexander Grund
a967fadb21 Make relative path variables of type STRING
When the type is PATH an absolute path is expected and user-defined
values are converted into absolute paths relative to the current directory.

Fixes #1990
2020-09-28 16:39:48 +00:00
Zhuyie
e4b24e7fb2 Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined 2020-09-25 09:36:43 +00:00
Deven Desai
ce5c59729d Fix for ROCm/HIP breakage - 200921
The following commit causes regressions in the ROCm/HIP support for Eigen
e55182ac09

I suspect the same breakages occur on the CUDA side too.

The above commit puts the EIGEN_CONSTEXPR attribute on `half_base` constructor. `half_base` is derived from `__half_raw`.

When compiling with GPU support, the definition of `__half_raw` gets picked up from the GPU Compiler specific header files (`hip_fp16.h`, `cuda_fp16.h`). Properly supporting the above commit would require adding the `constexpr` attribute to the `__half_raw` constructor (and other `*half*` routines) in those header files. While that is something we can explore in the future, for now we need to undo the above commit when compiling with GPU support, which is what this commit does.

This commit also reverts a small change in the `raw_uint16_to_half` routine made by the above commit. Similar to the case above, that change was leading to compile errors due to the fact that `__half_raw` has a different definition when compiling with GPU support.
2020-09-22 22:26:45 +00:00
David Tellenbach
b8a13f13ca Add CI configuration for ppc64le 2020-09-22 00:26:23 +00:00
Guoqiang QI
821702e771 Fix issues #1997 and #1991, triggered by unsupported a[index] ops (with a of type __m128d) under the MSVC compiler 2020-09-21 15:49:00 +00:00
David Tellenbach
493a7c773c Remove EIGEN_CONSTEXPR from NumTraits<boost::multiprecision::number<...>> 2020-09-21 12:43:41 +02:00
Павел Мацула
38e4a67394 Fix using FindStandardMathLibrary.cmake with -Wall (-Wunused-value) added to CMAKE_CXX_FLAG 2020-09-19 16:13:16 +00:00
Rasmus Munk Larsen
c4b99f78c7 Fix breakage in pcast<Packet2l, Packet2d> due to _mm_cvtsi128_si64 not being available on 32 bit x86.
If SSE 4.1 is available use the faster _mm_extract_epi64 intrinsic.
2020-09-18 18:13:20 -07:00
guoqiangqi
9aad16b443 Fix undefined reference to pset1frombits bug on different platforms 2020-09-19 00:53:21 +00:00
David Tellenbach
c4aa8e0db2 Rename variable to avoid shadowing of a previously declared one 2020-09-18 22:53:15 +02:00
Rasmus Munk Larsen
e55182ac09 Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr.
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HAS_CONSTEXPR is 1.
2020-09-18 17:38:58 +00:00
Rasmus Munk Larsen
14022f5eb5 Fix more mildly embarrassing typos in ARM intrinsics in PacketMath.h.
'vmvnq_u64' does not exist for some reason.
2020-09-18 04:14:13 +00:00
Rasmus Munk Larsen
a5b226920f Fix typo in PacketMath.h 2020-09-18 01:22:23 +00:00
Rasmus Munk Larsen
3af744b023 Add missing packet op pcmp_lt_or_nan for Packet2d on ARM. 2020-09-18 01:07:01 +00:00
Rasmus Munk Larsen
31a6b88ff3 Disable double version of compute_inverse_size4 on Inverse_NEON.h if Packet2d is not supported. 2020-09-17 23:51:06 +00:00
Brad King
880fa43b2b Add support for CastXML on ARM aarch64
CastXML simulates the preprocessors of other compilers, but actually
parses the translation unit with an internal Clang compiler.
Use the same `vld1q_u64` workaround that we do for Clang.

Fixes: #1979
2020-09-16 13:40:23 -04:00
daravi
6f0f6f792e Fix compiler error due to c++20 operator== generation rules 2020-09-16 02:06:53 +00:00
Benoit Jacob
cc0c38ace8 Remove old Clang compiler bug work-arounds. The two LLVM bugs referenced in the comments here have long been fixed. The workarounds were now detrimental because (1) they prevented using fused mul-add on Clang/ARM32 and (2) the unnecessary 'volatile' in 'asm volatile' prevented legitimate reordering by the compiler. 2020-09-15 20:54:14 -04:00
Tim Shen
bb56a62582 Make bfloat16(float(-nan)) produce -nan, not nan. 2020-09-15 13:24:23 -07:00
Guoqiang QI
3012e755e9 Add plog ops support for Packet2d on NEON 2020-09-15 17:10:35 +00:00
Rasmus Munk Larsen
e4fb0ddf78 Add EIGEN_UNUSED_VARIABLE to unused variable in Memory.h 2020-09-15 01:18:55 +00:00
Pedro Caldeira
65e400896b Fix bfloat16 round on gcc 4.8 2020-09-14 10:43:59 -03:00
Rasmus Munk Larsen
5636f80d11 Fix issue #1968. Don't discard return value from "new" in C++17. 2020-09-13 17:38:45 +00:00
Guoqiang QI
7c5d48f313 Unified sse pldexp_double api 2020-09-12 10:56:55 +00:00
Rasmus Munk Larsen
71e08c702b Make blueNorm threadsafe if C++11 atomics are available. 2020-09-12 01:23:29 +00:00
David Tellenbach
adc861cabd New CI infrastructure, including AArch64 runners 2020-09-11 18:11:49 +00:00
Niels Dekker
5328c9be43 Fix half_impl::float_to_half_rtne(float) warning: '<<' causes overflow
Fixed Visual Studio 2019 Code Analysis (C++ Core Guidelines) warning
C26450 from inside `half_impl::float_to_half_rtne(float)`:
> Arithmetic overflow: '<<' operation causes overflow at compile time.
2020-09-10 16:22:28 +02:00
Pedro Caldeira
35d149e34c Add missing functions for Packet8bf in Altivec architecture.
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Guoqiang QI
85428a3440 Add Neon psqrt<Packet2d> and pexp<Packet2d> 2020-09-08 09:04:03 +00:00
Alexander Neumann
5272106826 remove semi triggering -Wextra-semi-stmt 2020-09-07 11:42:30 +02:00
Stephen Zheng
5f25bcf7d6 Add Inverse_NEON.h
Implemented fast size-4 matrix inverse (mimicking Inverse_SSE.h) using NEON intrinsics.

```
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_float                 -0.1285         -0.1275           568           495           572           499
BM_double                -0.2265         -0.2254           638           494           641           496
```
2020-09-04 10:55:47 +00:00
Everton Constantino
6fe88a3c9d MatrixProduct enhancements:
- Changes to Altivec/MatrixProduct
  Adapting code to gcc 10.
  Generic code style and performance enhancements.
  Adding PanelMode support.
  Adding stride/offset support.
  Enabling float64, std::complex<float> and std::complex<double>.
  Fixing lack of symm_pack.
  Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
2020-09-02 18:21:36 -03:00
Everton Constantino
6568856275 Changing u/int8_t to un/signed char because clang does not understand
it.

Implementing pcmp_eq for Packet8 and Packet16.
2020-09-02 17:02:15 -03:00
Gael Guennebaud
27e6648074 fix #1901: warning in Mode==(Upper|Lower) 2020-09-02 15:43:58 +02:00
Hans Johnson
5b9bfc892a BUG: cmake_minimum_required must be the first command
https://cmake.org/cmake/help/v3.5/command/project.html

Note: Call the cmake_minimum_required() command at the beginning of the
top-level CMakeLists.txt file even before calling the project() command.
It is important to establish version and policy settings before invoking
other commands whose behavior they may affect. See also policy CMP0000.
2020-08-28 22:57:16 +00:00
Chip Kerchner
e5886457c8 Change Packet8s and Packet8us to use vector commands on Power for pmadd, pmul and psub. 2020-08-28 19:27:32 +00:00
Gael Guennebaud
25424d91f6 Fix #1974: assertion when reserving an empty sparse matrix 2020-08-26 12:32:20 +02:00
Guoqiang QI
8bb0febaf9 add psqrt ops support for Packet2f/Packet4f on NEON 2020-08-21 03:17:15 +00:00
Georg Jäger
1b1082334b adding attributes to constructors to support hip-clang on ROCm 3.5 2020-08-20 16:48:11 +02:00
Deven Desai
603e213d13 Fixing a CUDA / P100 regression introduced by PR 181
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified.

That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181 applicable to HIP only
2020-08-20 00:29:57 +00:00
David Tellenbach
c060114a25 Fix nightly CI configuration 2020-08-19 20:52:34 +02:00
David Tellenbach
fe8c3ef3cb Add possibility to split test suite build targets and improved CI configuration
- Introduce CMake option `EIGEN_SPLIT_TESTSUITE` that allows to divide the single test build target into several subtargets
- Add CI pipeline for merge request that can be run by GitLab's shared runners
- Add nightly CI pipeline
2020-08-19 18:27:45 +00:00
Rasmus Munk Larsen
d10b27fe37 Add missing inline keyword in Quaternion.h. 2020-08-14 17:51:04 +00:00
David Tellenbach
d4a727d092 Disable min/max NaN propagation in test cxx11_tensor_expr
The current pmin/pmax implementation for Arm Neon propagates NaNs
differently from std::min/std::max.

See issue https://gitlab.com/libeigen/eigen/-/issues/1937
2020-08-14 16:16:27 +00:00
David Tellenbach
d2bb6cf396 Fix compilation error in blasutil test 2020-08-14 18:15:18 +02:00
David Tellenbach
c6820a6316 Replace the call to int64_t in the blasutil test by explicit types
Some platforms define int64_t to be long long even for C++03. If this is
the case we miss the definition of internal::make_unsigned for this
type. If we just define the template we get duplicate definition
errors for platforms defining int64_t as signed long for C++03.

We need to find a way to distinguish both cases at compile-time.
2020-08-14 17:24:37 +02:00
David Tellenbach
8ba1b0f41a bfloat16 packetmath for Arm Neon backend 2020-08-13 15:48:40 +00:00
Pedro Caldeira
704798d1df Add support for Bfloat16 to use vector instructions on Altivec
architecture
2020-08-10 13:22:01 -05:00
Deven Desai
46f8a18567 Adding an explicit launch_bounds(1024) attribute for GPU kernels.
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.

This compiler change introduces a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (default value means the value assumed by the compiler as the `__launch_bounds__` attribute value, when it is not explicitly specified by the user)

Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. the hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__launch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.

This commit adds an explicit `__launch_bounds__(1024)` attribute to every GPU kernel that currently does not have it explicitly specified (and hence will end up getting the default value of 256 with the change to hip-clang)
2020-08-05 01:46:34 +00:00
Zachary Garrett
21122498ec Temporarily turn off the NEON implementation of pfloor as it does not work for large values.
The NEON implementation mimics the SSE implementation, but didn't mention the caveat that, due to the unsigned/signed integer conversions, not all values in the original floating point representation are supported.
2020-08-04 16:28:23 +00:00
David Tellenbach
23b7f0572b Disable CI buildstage again 2020-08-03 15:41:43 +02:00
Gael Guennebaud
d0f5d4bc50 add a banner to advertise the survey 2020-07-29 19:01:38 +02:00
David Tellenbach
5e484fa11d Fix StlDeque for GCC 10
StlDeque extends std::deque by accessing some of its internal members.
Since GCC 10 these are not accessible anymore.
2020-07-29 12:31:13 +00:00
Teng Lu
3ec4f0b641 Fix undefine BF16 union behavior in AVX512. 2020-07-29 02:20:21 +00:00
Rasmus Munk Larsen
b92206676c Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned. 2020-07-28 19:19:37 +00:00
David Tellenbach
99da2e1a8d Fix clang-tidy warnings in generic bfloat16 implementation
See !172 for related discussions.
2020-07-27 16:00:24 +02:00
qxxxb
649fd1c2ae Fix CMake install command 2020-07-25 16:35:13 -04:00
David Tellenbach
e48d8e4725 Don't allow failure for CI build stage anymore 2020-07-24 21:12:15 +02:00
David Tellenbach
b8ca93842c Improve CI configuration
- Fix docker Fedora image to Fedora:31
  - Fix gcc version to gcc-9.2.1
  - Use GitLab CI DAG
  - Fix usage of build cache
  - Introduce build artifacts
2020-07-24 15:58:44 +00:00
Gael Guennebaud
fb0c6868ad Add missing footer declaration 2020-07-24 10:28:44 +02:00
David Tellenbach
c1ffe452fc Fix bfloat16 casts
If we have explicit conversion operators available (C++11) we define
explicit casts from bfloat16 to other types. If not (C++03), we don't
define conversion operators but rely on implicit conversion chains from
bfloat16 via float to other types.
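A minimal sketch, assuming a bfloat16-like wrapper, of that approach:

```
#include <cstring>

struct bf16_sketch {
  unsigned short value;  // upper 16 bits of the binary32 representation
  operator float() const {  // lossless, always implicit
    unsigned int bits = static_cast<unsigned int>(value) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
  }
#if __cplusplus >= 201103L
  explicit operator int() const { return static_cast<int>(float(*this)); }
#endif
  // C++03: rely on the implicit chain bf16 -> float -> int instead.
};
```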
2020-07-23 20:55:06 +00:00
Gael Guennebaud
2ce2f51989 remove piwik tracker 2020-07-23 13:51:39 +02:00
Rasmus Munk Larsen
1b84f21e32 Revert change that made conversion from bfloat16 to {float, double} implicit.
Add roundtrip tests for casting between bfloat16 and complex types.
2020-07-22 18:09:00 -07:00
David Tellenbach
38b91f256b Fix cast of bfloat16 to std::complex<T>
This fixes https://gitlab.com/libeigen/eigen/-/issues/1951
2020-07-22 19:00:17 +00:00
Rasmus Munk Larsen
bed7fbe854 Make sure we take the little-endian path if __BYTE_ORDER__ is not defined. 2020-07-22 18:54:38 +00:00
Niels Dekker
0e1a33a461 Faster conversion from integer types to bfloat16
Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types.

A reduction of more than 20% of the runtime duration of conversion from int to bfloat16 was observed, using Visual C++ 2019 on Windows 10.
2020-07-22 19:25:49 +02:00
Rasmus Munk Larsen
acab22c205 Avoid division by zero in nonZerosEstimate() for empty blocks. 2020-07-22 01:38:30 +00:00
Rasmus Munk Larsen
ac2eca6b11 Update tensor reduction test to avoid undefined division of bfloat16 by int. 2020-07-22 00:35:51 +00:00
Rasmus Munk Larsen
0aeaf5f451 Make numext::as_uint a device function. 2020-07-22 00:33:41 +00:00
Alexander Turkin
60faa9f897 user-defined copy operations removed in favor of compiler-generated ones 2020-07-20 14:59:35 +03:00
Niels Dekker
b11f817bcf Avoid undefined behavior by union type punning in float_to_bfloat16_rtne
Use `numext::as_uint`, instead of union based type punning, to avoid undefined behavior.
See also C++ Core Guidelines: "Don't use a union for type punning"
https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning

`numext::as_uint` was suggested by David Tellenbach
2020-07-14 19:55:20 +02:00
Sheng Yang
56b3e3f3f8 AVX path for BF16 2020-07-14 01:34:03 +00:00
Niels Dekker
4ab32e2de2 Allow implicit conversion from bfloat16 to float and double
Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also supports implicit conversion from a smaller to a larger floating point type.

Intel's oneDNN bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp
2020-07-11 13:32:28 +02:00
Rasmus Munk Larsen
dcf7655b3d Guard operator<< test by EIGEN_NO_IO. 2020-07-09 19:54:48 +00:00
Rasmus Munk Larsen
ed00df445d Guard operator<< by EIGEN_NO_IO. 2020-07-09 19:52:44 +00:00
Rasmus Munk Larsen
fb77b7288c Add operator<< to print a quaternion. 2020-07-09 12:49:58 -07:00
David Tellenbach
ee4715ff48 Fix test basic stuff
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
2020-07-09 17:24:00 +00:00
Forrest Voight
8889a2c1c6 Add operator==/operator!= to Quaternion. Fixes #1876. 2020-07-07 20:16:54 +00:00
Rasmus Munk Larsen
6964ae8d52 Change the sign operator in Eigen to return NaN for NaN arguments, not zero. 2020-07-07 01:54:04 +00:00
David Tellenbach
cb63153183 Make test packetmath C++98 compliant 2020-07-01 20:41:59 +02:00
Sheng Yang
116c5235ac BF16 for scalar_cmp_with_cast_op 2020-07-01 18:33:42 +00:00
Kan Chen
8731452b97 Delete duplicate test cases in vectorization_logic.cpp 2020-07-01 00:51:15 +00:00
Antonio Sanchez
9cb8771e9c Fix tensor casts for large packets and casts to/from std::complex
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.

We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing, to make it consistent with `Eigen::bfloat16`, and
generalize the overload to work for any complex type.

Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
2020-06-30 18:53:55 +00:00
Antonio Sanchez
145e51516f Fix denormal check pre c++11.
`float_denorm_style` is an old-style `enum`, so the `denorm_present`
symbol only exists in the `std` namespace prior to c++11.
2020-06-30 17:28:30 +00:00
David Tellenbach
689b57070d Report custom C++ flags in CMake testing summary 2020-06-30 17:18:54 +00:00
David Tellenbach
f3b8d441f6 Remove CI tags to enable shared runners 2020-06-29 22:15:41 +02:00
Christoph Grüninger
dc0b81fb1d Pass CMAKE_MAKE_PROGRAM to Fortran language support test
Otherwise the system-wide installed Make (or Ninja) program
is used.
2020-06-27 23:52:38 +02:00
David Tellenbach
13d25f5ed8 Add initial CI configuration file.
The initial CI configuration consists of jobs to build and run tests and
to build docs.
2020-06-27 00:03:35 +00:00
Antonio Sanchez
7222f0b6b5 Fix packetmath_1 float tests for arm/aarch64.
Added missing `pmadd<Packet2f>` for NEON. This gives significantly
better precision than the previous `pmul+padd`, which was causing
the `pcos` tests to fail. Also added an approximation test against
`std::sin`/`std::cos`, since otherwise any result satisfying
`a^2+b^2=1` would pass.

Modified `log(denorm)` tests. Denormals are not supported on all
systems (some return `::min`), are always flushed to zero on 32-bit ARM,
and configurably flush to zero on SSE/AVX/AArch64. This leads to
inconsistent results across different systems (i.e. `-inf` vs `nan`).
Added a check for denormal existence and excluded ARM.

Removed logistic exactness test, since scalar and vectorized versions
follow different code-paths due to differences in `pexp` and `pmadd`,
which result in slightly different values. For example, exactness always
fails on arm, aarch64, and altivec.
2020-06-24 14:03:35 -07:00
Simon Pfreundschuh
14f84978e8 Replaced call to deprecated 'load' function with appropriate call to 'on'. 2020-06-23 11:23:13 +02:00
Antonio Sanchez
ff4e7a0820 Add missing Packet2l/Packet2ul ops for NEON.
The current multiply (`pmul`) and comparison operators (`pcmp_lt`,
`pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and
`Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests
in clang. Here we add and test the missing ops.

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-22 11:24:43 -07:00
Antonio Sanchez
03ebdf6acb Added missing NEON pcasts, update packetmath tests.
The NEON `pcast` operators are all implemented and tested for existing
packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting
between `int64_t` and `int8_t` in `GenericPacketMath.h`.

Removed incorrect `HasHalfPacket`  definition for NEON's
`Packet2l`/`Packet2ul`.

Adjustments were also made to the `packetmath` tests. These include
- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for
  packets that are vectorizable)
- added 8:1 cast tests
- random number generation
  - original had uninteresting 0 to 0 casts for many casts between
    floating-point and integers, and exhibited signed overflow
    undefined behavior

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-21 09:32:31 -07:00
Teng Lu
386d809bde Support BFloat16 in Eigen 2020-06-20 19:16:24 +00:00
Rasmus Munk Larsen
6b9c92fe7e Add Apache 2.0 license text in COPYING.APACHE. 2020-06-18 12:45:27 -07:00
Nicolas Mellado
cf7adf3a5d Update "things you can do" message to use cmake commands
Print cmake commands instead of make commands, which should work for any generator.
2020-06-16 21:04:33 +00:00
Ilya Tokar
231ce21535 Run two independent chains when reducing tensors.
Running two chains exposes more instruction-level parallelism
by allowing both chains to execute at the same time.

Results are a bit noisy, but for medium length we almost hit
theoretical upper bound of 2x.

BM_fullReduction_16T/3        [using 16 threads]       17.3ns ±11%        17.4ns ± 9%        ~           (p=0.178 n=18+19)
BM_fullReduction_16T/4        [using 16 threads]       17.6ns ±17%        17.0ns ±18%        ~           (p=0.835 n=20+19)
BM_fullReduction_16T/7        [using 16 threads]       18.9ns ±12%        18.2ns ±10%        ~           (p=0.756 n=20+18)
BM_fullReduction_16T/8        [using 16 threads]       19.8ns ±13%        19.4ns ±21%        ~           (p=0.512 n=20+20)
BM_fullReduction_16T/10       [using 16 threads]       23.5ns ±15%        20.8ns ±24%     -11.37%        (p=0.000 n=20+19)
BM_fullReduction_16T/15       [using 16 threads]       35.8ns ±21%        26.9ns ±17%     -24.76%        (p=0.000 n=20+19)
BM_fullReduction_16T/16       [using 16 threads]       38.7ns ±22%        27.7ns ±18%     -28.40%        (p=0.000 n=20+19)
BM_fullReduction_16T/31       [using 16 threads]        146ns ±17%          74ns ±11%     -49.05%        (p=0.000 n=20+18)
BM_fullReduction_16T/32       [using 16 threads]        154ns ±19%          84ns ±30%     -45.79%        (p=0.000 n=20+19)
BM_fullReduction_16T/64       [using 16 threads]        603ns ± 8%         308ns ±12%     -48.94%        (p=0.000 n=17+17)
BM_fullReduction_16T/128      [using 16 threads]       2.44µs ±13%        1.22µs ± 1%     -50.29%        (p=0.000 n=17+17)
BM_fullReduction_16T/256      [using 16 threads]       9.84µs ±14%        5.13µs ±30%     -47.82%        (p=0.000 n=19+19)
BM_fullReduction_16T/512      [using 16 threads]       78.0µs ± 9%        56.1µs ±17%     -28.02%        (p=0.000 n=18+20)
BM_fullReduction_16T/1k       [using 16 threads]        325µs ± 5%         263µs ± 4%     -19.00%        (p=0.000 n=20+16)
BM_fullReduction_16T/2k       [using 16 threads]       1.09ms ± 3%        0.99ms ± 1%      -9.04%        (p=0.000 n=20+20)
BM_fullReduction_16T/4k       [using 16 threads]       7.66ms ± 3%        7.57ms ± 3%      -1.24%        (p=0.017 n=20+20)
BM_fullReduction_16T/10k      [using 16 threads]       65.3ms ± 4%        65.0ms ± 3%        ~           (p=0.718 n=20+20)
2020-06-16 15:55:11 -04:00
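The idea behind the two chains, reduced to a scalar sketch (the actual change operates on Eigen's packet-level reduction code):
```
// Two independent accumulators break the serial dependency on a single
// accumulator, so the CPU can retire both additions per iteration.
float reduce_two_chains(const float* x, int n) {
  float a = 0.f, b = 0.f;
  int i = 0;
  for (; i + 1 < n; i += 2) {
    a += x[i];      // chain 1
    b += x[i + 1];  // chain 2
  }
  if (i < n) a += x[i];  // odd-length tail
  return a + b;
}
```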
Pedro Caldeira
a475bf14d4 Fix pscatter and pgather for Altivec Complex double 2020-06-16 16:41:02 -03:00
David Tellenbach
c6c84ed961 Fix unused variable warning on Arm 2020-06-15 00:14:58 +02:00
Sebastien Boisvert
6228f27234 Fix #1818: SparseLU: add methods nnzL() and nnzU()
Now this compiles without errors:

$ clang++ -I ../../ test_sparseLU.cpp -std=c++03
2020-06-11 23:49:49 +00:00
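A hedged usage sketch of the new accessors; the matrix `A` and its contents are assumed:
```
#include <Eigen/SparseLU>
#include <iostream>

void report_fill_in(const Eigen::SparseMatrix<double>& A) {
  Eigen::SparseLU<Eigen::SparseMatrix<double> > lu(A);  // factorize A
  // nnzL()/nnzU(): number of nonzeros in the L and U factors.
  std::cout << "nnz(L) = " << lu.nnzL()
            << ", nnz(U) = " << lu.nnzU() << std::endl;
}
```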
Sebastien Boisvert
39cbd6578f Fix #1911: add benchmark for move semantics with fixed-size matrix
$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
        -o bench_move_semantics

$ ./bench_move_semantics
float copy semantics: 1755.97 ms
float move semantics: 55.063 ms
double copy semantics: 2457.65 ms
double move semantics: 55.034 ms
2020-06-11 23:43:25 +00:00
Antonio Sanchez
a7d2552af8 Remove HasCast and fix packetmath cast tests.
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.

Here we remove the unused `HasCast`, and modify the packet cast tests to
better reflect their usage.
2020-06-11 17:26:56 +00:00
Sebastien Boisvert
463ec86648 Fix #1757: remove the word 'suicide' 2020-06-11 00:56:54 +00:00
ShengYang1
b5d66b5e73 Implement scalar_cmp_with_cast_op 2020-06-09 08:12:07 +08:00
Rasmus Munk Larsen
c4059ffcb6 Fix static analyzer warning in SelfadjointProduct.h.
Fix compiler warnings in GeneralBlockPanelKernel.h.
2020-06-08 11:48:44 -07:00
Thales Sabino
1fcaaf460f Update FindComputeCpp.cmake to fix build problems on Windows
- Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows
- Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows
2020-06-05 20:51:20 +00:00
David Tellenbach
3ce18d3c8f Revert ".gitlab-ci.yml: initial commit"
This reverts commit 95177362ed to
disable GitLab CI temporarily.
2020-06-05 22:43:49 +02:00
Rasmus Munk Larsen
c2ab36f47a Fix broken packetmath test for logistic on Arm. 2020-06-04 16:24:47 -07:00
Rasmus Munk Larsen
537e2b322f Fix typo in previous update to generic predux_any. 2020-06-04 22:25:05 +00:00
Rasmus Munk Larsen
fdc1cbdce3 Avoid implicit float equality comparison in generic predux_any, but use numext::not_equal_strict to avoid breaking builds that compile with -Werror=float-equal. 2020-06-04 22:15:56 +00:00
Rasmus Munk Larsen
daf9bbeca2 Fix compilation error in logistic packet op. 2020-06-03 00:57:41 +00:00
n0mend
6d2a9a524b Update run instructions for benchCholesky 2020-06-01 18:31:46 +00:00
Gael Guennebaud
029a76e115 Bug #1777: make the scalar and packet path consistent for the logistic function + respective unit test 2020-05-31 00:53:37 +02:00
Gael Guennebaud
99b7f7cb9c Fix #556: warnings with mingw 2020-05-31 00:39:44 +02:00
Gael Guennebaud
72782d13e0 Bug #1767: increase required cmake version to 3.5.0 2020-05-31 00:31:09 +02:00
Gael Guennebaud
867a756509 Fix #1833: compilation issue of "array!=scalar" with c++20 2020-05-30 23:53:58 +02:00
Gael Guennebaud
ab615e4114 Save one extra temporary when assigning a sparse product to a row-major sparse matrix 2020-05-30 23:15:12 +02:00
Christoph Junghans
95177362ed .gitlab-ci.yml: initial commit 2020-05-29 09:23:25 -06:00
Kan Chen
8d1302f566 Add support for PacketBlock<Packet8s,4> and PacketBlock<Packet16uc,4> ptranspose on NEON 2020-05-29 00:33:45 +00:00
Antonio Sánchez
8719b9c5bc Disable test for 32-bit systems (e.g. ARM, i386)
Both i386 and 32-bit ARM do not define __uint128_t. On most systems, if
__uint128_t is defined, then so is the macro __SIZEOF_INT128__.

https://stackoverflow.com/questions/18531782/how-to-know-if-uint128-t-is-defined1
2020-05-28 17:40:15 +00:00
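The guard pattern this implies, as a sketch:
```
// __uint128_t is unavailable on 32-bit targets such as i386 and 32-bit ARM;
// where it exists, compilers also define __SIZEOF_INT128__.
#if defined(__SIZEOF_INT128__)
typedef __uint128_t uint128_sketch;  // safe to use the native 128-bit type
#else
// skip or stub out the 128-bit test path on this platform
#endif
```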
Yong Tang
8e1df5b082 Fix incorrect usage of if defined(EIGEN_ARCH_PPC) => if EIGEN_ARCH_PPC
This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)`
in `Eigen/Core` header.

In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined
as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true.
This causes issues when building on non PPC platform and `MatrixProduct.h` is not
available.

This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2020-05-28 05:53:44 -07:00
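The distinction, for reference:
```
// EIGEN_ARCH_PPC is always defined (to 0 or 1), so only a value test works:
#if defined(EIGEN_ARCH_PPC)  // wrong: true on every platform
#endif
#if EIGEN_ARCH_PPC           // right: true only when it expands to non-zero
#endif
```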
Kan Chen
4e7046063b Fix #1874: it works on both MSVC 2017 and other platforms. 2020-05-21 18:42:56 +08:00
Pedro Caldeira
2d67af2d2b Add pscatter for Packet16{u}c (int8) 2020-05-20 17:29:34 -03:00
David Tellenbach
5328cd62b3 Guard usage of decltype since it's a C++11 feature
This fixes https://gitlab.com/libeigen/eigen/-/issues/1897
2020-05-20 16:04:16 +02:00
Rasmus Munk Larsen
cc86a31e20 Add guard around specialization for bool, which is only currently implemented for SSE. 2020-05-19 16:21:56 -07:00
Everton Constantino
8a7f360ec3 - Vectorizing MMA packing.
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Rasmus Munk Larsen
a145e4adf5 Add newline at the end of StlIterators.h. 2020-05-15 20:36:00 +00:00
Gael Guennebaud
8ce9630ddb Fix #1874: workaround MSVC 2017 compilation issue. 2020-05-15 20:47:32 +02:00
Rasmus Munk Larsen
9b411757ab Add missing packet ops for bool, and make it pass the same packet op unit tests as other arithmetic types.
This change also contains a few minor cleanups:
  1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan,
     which can be done in other ways.
  2. Remove the "HasInsert" enum, which is no longer needed since we removed the
     corresponding packet ops.
  3. Add faster pselect op for Packet4i when SSE4.1 is supported.

Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.

Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                        Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------------
BM_TransposeInPlace<float>/4            9.77           9.77    71670320
BM_TransposeInPlace<float>/8           21.9           21.9     31929525
BM_TransposeInPlace<float>/16          66.6           66.6     10000000
BM_TransposeInPlace<float>/32         243            243        2879561
BM_TransposeInPlace<float>/59         844            844         829767
BM_TransposeInPlace<float>/64         933            933         750567
BM_TransposeInPlace<float>/128       3944           3945         177405
BM_TransposeInPlace<float>/256      16853          16853          41457
BM_TransposeInPlace<float>/512     204952         204968           3448
BM_TransposeInPlace<float>/1k     1053889        1053861            664
BM_TransposeInPlace<bool>/4            14.4           14.4     48637301
BM_TransposeInPlace<bool>/8            36.0           36.0     19370222
BM_TransposeInPlace<bool>/16           31.5           31.5     22178902
BM_TransposeInPlace<bool>/32          111            111        6272048
BM_TransposeInPlace<bool>/59          626            626        1000000
BM_TransposeInPlace<bool>/64          428            428        1632689
BM_TransposeInPlace<bool>/128        1677           1677         417377
BM_TransposeInPlace<bool>/256        7126           7126          96264
BM_TransposeInPlace<bool>/512       29021          29024          24165
BM_TransposeInPlace<bool>/1k       116321         116330           6068
2020-05-14 22:39:13 +00:00
Felipe Attanasio
d640276d31 Added support for reverse iterators for Vectorwise operations. 2020-05-14 22:38:20 +00:00
Christopher Moore
fa8fd4b4d5 Indexed view should have RowMajorBit when there is statically a single row 2020-05-14 22:11:19 +00:00
Christopher Moore
a187ffea28 Resolve "IndexedView of a vector should allow linear access" 2020-05-13 19:24:42 +00:00
Mark Eberlein
ba9d18b938 Add KLU support to spbenchsolver 2020-05-11 21:50:27 +00:00
Pedro Caldeira
5fdc179241 Altivec template functions to better code reusability 2020-05-11 21:04:51 +00:00
mehdi-goli
d3e81db6c5 Eigen moved the scanLauncher function inside the internal namespace.
This commit applies the following changes:
    - Moving the `scanLauncher` specialization inside the internal namespace to fix a compiler crash on TensorScan for the SYCL backend.
    - Replacing `SYCL/sycl.hpp` with `CL/sycl.hpp` in order to follow the SYCL 1.2.1 standard.
    - Minor fixes: commenting out an unused variable to avoid compiler warnings.
2020-05-11 16:10:33 +01:00
Rasmus Munk Larsen
c1d944dd91 Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
I cannot measure any performance changes for SSE, AVX, or AVX512.

name                                 old time/op             new time/op             delta
BM_LinSpace<float>/1                 1.63ns ± 0%             1.63ns ± 0%   ~             (p=0.762 n=5+5)
BM_LinSpace<float>/8                 4.92ns ± 3%             4.89ns ± 3%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/64                34.6ns ± 0%             34.6ns ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/512                217ns ± 0%              217ns ± 0%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/4k                1.68µs ± 0%             1.68µs ± 0%   ~             (p=1.000 n=5+5)
BM_LinSpace<float>/32k               13.3µs ± 0%             13.3µs ± 0%   ~             (p=0.905 n=5+4)
BM_LinSpace<float>/256k               107µs ± 0%              107µs ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/1M                 427µs ± 0%              427µs ± 0%   ~             (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
David Tellenbach
5c4e19fbe7 Possibility to specify user-defined default cache sizes for GEBP kernel
Some architectures have no convenient way to determine cache sizes at
runtime. Eigen's GEBP kernel falls back to default cache values in this
case, which might not be correct in all situations.

This patch introduces three preprocessor directives

  `EIGEN_DEFAULT_L1_CACHE_SIZE`
  `EIGEN_DEFAULT_L2_CACHE_SIZE`
  `EIGEN_DEFAULT_L3_CACHE_SIZE`

to give users the possibility to set these default values explicitly.
2020-05-08 12:54:36 +02:00
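Hypothetical usage; the byte values below are placeholders, not recommendations:
```
// Override GEBP's fallback cache sizes before including any Eigen header.
#define EIGEN_DEFAULT_L1_CACHE_SIZE (64 * 1024)
#define EIGEN_DEFAULT_L2_CACHE_SIZE (512 * 1024)
#define EIGEN_DEFAULT_L3_CACHE_SIZE (8 * 1024 * 1024)
#include <Eigen/Core>
```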
Rasmus Munk Larsen
225ab040e0 Remove unused packet op "palign".
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen
74ec8e6618 Make size odd for transposeInPlace test to make sure we hit the scalar path. 2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen
49f1aeb60d Remove traits declaring NEON vectorized casts that do not actually have packet op implementations. 2020-05-07 09:49:22 -07:00
Rasmus Munk Larsen
2fd8a5a08f Add parallelization of TensorScanOp for types without packet ops.
Clean up the code a bit and do a few micro-optimizations to improve performance for small tensors.

Benchmark numbers for Tensor<uint32_t>:

name                                                       old time/op             new time/op             delta
BM_cumSumRowReduction_1T/8   [using 1 threads]             76.5ns ± 0%             61.3ns ± 4%    -19.80%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/64  [using 1 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.77%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/256 [using 1 threads]             39.8µs ± 0%             39.6µs ± 0%     -0.60%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/4k  [using 1 threads]             13.9ms ± 0%             13.4ms ± 1%     -4.19%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/8   [using 2 threads]             76.8ns ± 0%             59.1ns ± 0%    -23.09%          (p=0.016 n=5+4)
BM_cumSumRowReduction_2T/64  [using 2 threads]             2.47µs ± 1%             2.41µs ± 1%     -2.53%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/256 [using 2 threads]             39.8µs ± 0%             34.7µs ± 6%    -12.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/4k  [using 2 threads]             13.8ms ± 1%              7.2ms ± 6%    -47.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/8   [using 8 threads]             76.4ns ± 0%             61.8ns ± 3%    -19.02%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/64  [using 8 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.84%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/256 [using 8 threads]             39.8µs ± 0%             28.3µs ±11%    -28.75%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/4k  [using 8 threads]             13.8ms ± 0%              2.7ms ± 5%    -80.39%          (p=0.008 n=5+5)
BM_cumSumColReduction_1T/8   [using 1 threads]             59.1ns ± 0%             80.3ns ± 0%    +35.94%          (p=0.029 n=4+4)
BM_cumSumColReduction_1T/64  [using 1 threads]             3.06µs ± 0%             3.08µs ± 1%       ~             (p=0.114 n=4+4)
BM_cumSumColReduction_1T/256 [using 1 threads]              175µs ± 0%              176µs ± 0%       ~             (p=0.190 n=4+5)
BM_cumSumColReduction_1T/4k  [using 1 threads]              824ms ± 1%              844ms ± 1%     +2.37%          (p=0.008 n=5+5)
BM_cumSumColReduction_2T/8   [using 2 threads]             59.0ns ± 0%             90.7ns ± 0%    +53.74%          (p=0.029 n=4+4)
BM_cumSumColReduction_2T/64  [using 2 threads]             3.06µs ± 0%             3.10µs ± 0%     +1.08%          (p=0.016 n=4+5)
BM_cumSumColReduction_2T/256 [using 2 threads]              176µs ± 0%              189µs ±18%       ~             (p=0.151 n=5+5)
BM_cumSumColReduction_2T/4k  [using 2 threads]              836ms ± 2%              611ms ±14%    -26.92%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/8   [using 8 threads]             59.3ns ± 2%             90.6ns ± 0%    +52.79%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/64  [using 8 threads]             3.07µs ± 0%             3.10µs ± 0%     +0.99%          (p=0.016 n=5+4)
BM_cumSumColReduction_8T/256 [using 8 threads]              176µs ± 0%               80µs ±19%    -54.51%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/4k  [using 8 threads]              827ms ± 2%              180ms ±14%    -78.24%          (p=0.008 n=5+5)
2020-05-06 14:48:37 -07:00
Rasmus Munk Larsen
0e59f786e1 Fix accidental copy of loop variable. 2020-05-05 21:35:38 +00:00
Rasmus Munk Larsen
7b76c85daf Vectorize and parallelize TensorScanOp.
TensorScanOp is used in TensorFlow for a number of operations, such as cumulative logexp reduction and cumulative sum and product reductions.

The benchmarks numbers below are for cumulative row- and column reductions of NxN matrices.

name                                                         old time/op             new time/op     delta
BM_cumSumRowReduction_1T/4    [using 1 threads ]             25.1ns ± 1%             35.2ns ± 1%    +40.45%
BM_cumSumRowReduction_1T/8    [using 1 threads ]             73.4ns ± 0%             82.7ns ± 3%    +12.74%
BM_cumSumRowReduction_1T/32   [using 1 threads ]              988ns ± 0%              832ns ± 0%    -15.77%
BM_cumSumRowReduction_1T/64   [using 1 threads ]             4.07µs ± 2%             3.47µs ± 0%    -14.70%
BM_cumSumRowReduction_1T/128  [using 1 threads ]             18.0µs ± 0%             16.8µs ± 0%     -6.58%
BM_cumSumRowReduction_1T/512  [using 1 threads ]              287µs ± 0%              281µs ± 0%     -2.22%
BM_cumSumRowReduction_1T/2k   [using 1 threads ]             4.78ms ± 1%             4.78ms ± 2%       ~
BM_cumSumRowReduction_1T/10k  [using 1 threads ]              117ms ± 1%              117ms ± 1%       ~
BM_cumSumRowReduction_8T/4    [using 8 threads ]             25.0ns ± 0%             35.2ns ± 0%    +40.82%
BM_cumSumRowReduction_8T/8    [using 8 threads ]             77.2ns ±16%             81.3ns ± 0%       ~
BM_cumSumRowReduction_8T/32   [using 8 threads ]              988ns ± 0%              833ns ± 0%    -15.67%
BM_cumSumRowReduction_8T/64   [using 8 threads ]             4.08µs ± 2%             3.47µs ± 0%    -14.95%
BM_cumSumRowReduction_8T/128  [using 8 threads ]             18.0µs ± 0%             17.3µs ±10%       ~
BM_cumSumRowReduction_8T/512  [using 8 threads ]              287µs ± 0%               58µs ± 6%    -79.92%
BM_cumSumRowReduction_8T/2k   [using 8 threads ]             4.79ms ± 1%             0.64ms ± 1%    -86.58%
BM_cumSumRowReduction_8T/10k  [using 8 threads ]              117ms ± 1%               18ms ± 6%    -84.50%

BM_cumSumColReduction_1T/4    [using 1 threads ]             23.9ns ± 0%             33.4ns ± 1%    +39.68%
BM_cumSumColReduction_1T/8    [using 1 threads ]             71.6ns ± 1%             49.1ns ± 3%    -31.40%
BM_cumSumColReduction_1T/32   [using 1 threads ]              973ns ± 0%              165ns ± 2%    -83.10%
BM_cumSumColReduction_1T/64   [using 1 threads ]             4.06µs ± 1%             0.57µs ± 1%    -85.94%
BM_cumSumColReduction_1T/128  [using 1 threads ]             33.4µs ± 1%              4.1µs ± 1%    -87.67%
BM_cumSumColReduction_1T/512  [using 1 threads ]             1.72ms ± 4%             0.21ms ± 5%    -87.91%
BM_cumSumColReduction_1T/2k   [using 1 threads ]              119ms ±53%               11ms ±35%    -90.42%
BM_cumSumColReduction_1T/10k  [using 1 threads ]              1.59s ±67%              0.35s ±49%    -77.96%
BM_cumSumColReduction_8T/4    [using 8 threads ]             23.8ns ± 0%             33.3ns ± 0%    +40.06%
BM_cumSumColReduction_8T/8    [using 8 threads ]             71.6ns ± 1%             49.2ns ± 5%    -31.33%
BM_cumSumColReduction_8T/32   [using 8 threads ]             1.01µs ±12%             0.17µs ± 3%    -82.93%
BM_cumSumColReduction_8T/64   [using 8 threads ]             4.15µs ± 4%             0.58µs ± 1%    -86.09%
BM_cumSumColReduction_8T/128  [using 8 threads ]             33.5µs ± 0%              4.1µs ± 4%    -87.65%
BM_cumSumColReduction_8T/512  [using 8 threads ]             1.71ms ± 3%             0.06ms ±16%    -96.21%
BM_cumSumColReduction_8T/2k   [using 8 threads ]             97.1ms ±14%              3.0ms ±23%    -96.88%
BM_cumSumColReduction_8T/10k  [using 8 threads ]              1.97s ± 8%              0.06s ± 2%    -96.74%
2020-05-05 00:19:43 +00:00
Xiaoxiang Cao
a74a278abd Fix confusing template param name for Stride fwd decl. 2020-04-30 01:43:05 +00:00
Rasmus Munk Larsen
923ee9aba3 Fix the embarrassingly incomplete fix to the embarrassing bug in blocked transpose. 2020-04-29 17:27:36 +00:00
Rasmus Munk Larsen
a32923a439 Fix (embarrassing) bug in blocked transpose. 2020-04-29 17:02:27 +00:00
Rasmus Munk Larsen
1e41406c36 Add missing transpose in cleanup loop. Without it, we trip an assertion in debug mode. 2020-04-29 01:30:51 +00:00
Rasmus Munk Larsen
fbe7916c55 Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile. 2020-04-29 00:58:41 +00:00
Clément Grégoire
82f54ad144 Fix perf monitoring merge function 2020-04-28 17:02:59 +00:00
Rasmus Munk Larsen
ab773c7e91 Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests

This speeds up matmul for boolean matrices by about 10x

name                            old time/op             new time/op             delta
BM_MatMul<bool>/8                267ns ± 0%              479ns ± 0%  +79.25%          (p=0.008 n=5+5)
BM_MatMul<bool>/32              6.42µs ± 0%             0.87µs ± 0%  -86.50%          (p=0.008 n=5+5)
BM_MatMul<bool>/64              43.3µs ± 0%              5.9µs ± 0%  -86.42%          (p=0.008 n=5+5)
BM_MatMul<bool>/128              315µs ± 0%               44µs ± 0%  -85.98%          (p=0.008 n=5+5)
BM_MatMul<bool>/256             2.41ms ± 0%             0.34ms ± 0%  -85.68%          (p=0.008 n=5+5)
BM_MatMul<bool>/512             18.8ms ± 0%              2.7ms ± 0%  -85.53%          (p=0.008 n=5+5)
BM_MatMul<bool>/1k               149ms ± 0%               22ms ± 0%  -85.40%          (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
b47c777993 Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
 10 / 10 [====================================================================================================================================================================================================================] 100.00% 2m50s
(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)

name                                       old time/op             new time/op             delta
BM_TransposeInPlace<float>/4               9.84ns ± 0%             6.51ns ± 0%  -33.80%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8               23.6ns ± 1%             17.6ns ± 0%  -25.26%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16              78.8ns ± 0%             60.3ns ± 0%  -23.50%          (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32               302ns ± 0%              229ns ± 0%  -24.40%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59              1.03µs ± 0%             0.84µs ± 1%  -17.87%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64              1.20µs ± 0%             0.89µs ± 1%  -25.81%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128             8.96µs ± 0%             3.82µs ± 2%  -57.33%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256              152µs ± 3%               17µs ± 2%  -89.06%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512              837µs ± 1%              208µs ± 0%  -75.15%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k              4.28ms ± 2%             1.08ms ± 2%  -74.72%          (p=0.008 n=5+5)
2020-04-28 16:08:16 +00:00
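A scalar sketch of the blocking idea for a square row-major matrix; Eigen's version transposes packet tiles in registers:
```
#include <algorithm>
#include <utility>

// Swap B x B tiles so each destination cache line is touched once per tile
// instead of once per element.
void transpose_in_place_sketch(float* m, int n) {
  const int B = 8;  // illustrative tile size
  for (int i = 0; i < n; i += B)
    for (int j = i; j < n; j += B)
      for (int bi = i; bi < std::min(i + B, n); ++bi)
        for (int bj = (i == j ? bi + 1 : j); bj < std::min(j + B, n); ++bj)
          std::swap(m[bi * n + bj], m[bj * n + bi]);
}
```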
Pedro Caldeira
29f0917a43 Add support to vector instructions to Packet16uc and Packet16c 2020-04-27 12:48:08 -03:00
Rasmus Munk Larsen
e80ec24357 Remove unused packet op "preduxp". 2020-04-23 18:17:14 +00:00
René Wagner
0aebe19aca BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers.
This enables operator== on Eigen matrices in device code.
2020-04-23 17:25:08 +02:00
Eugene Zhulenev
3c02fefec5 Add async evaluation support to TensorSlicingOp.
Device::memcpy is not async-safe and might lead to deadlocks. Always evaluate slice expression in async mode.
2020-04-22 19:55:01 +00:00
Pedro Caldeira
0c67b855d2 Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec vector operations 2020-04-21 14:52:46 -03:00
Rasmus Munk Larsen
e8f40e4670 Fix bug in ptrue for Packet16b. 2020-04-20 21:45:10 +00:00
Rasmus Munk Larsen
2f6ddaa25c Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical and of two NxN tensors:

name                                               old time/op             new time/op             delta
BM_booleanAnd_1T/3   [using 1 threads]             14.6ns ± 0%             14.4ns ± 0%   -0.96%
BM_booleanAnd_1T/4   [using 1 threads]             20.5ns ±12%              9.0ns ± 0%  -56.07%
BM_booleanAnd_1T/7   [using 1 threads]             41.7ns ± 0%             10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8   [using 1 threads]             52.1ns ± 0%             10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10  [using 1 threads]             76.3ns ± 0%             13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15  [using 1 threads]              167ns ± 0%               16ns ± 0%  -90.45%
BM_booleanAnd_1T/16  [using 1 threads]              188ns ± 0%               16ns ± 0%  -91.57%
BM_booleanAnd_1T/31  [using 1 threads]              667ns ± 0%               34ns ± 0%  -94.83%
BM_booleanAnd_1T/32  [using 1 threads]              710ns ± 0%               35ns ± 0%  -95.01%
BM_booleanAnd_1T/64  [using 1 threads]             2.80µs ± 0%             0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128 [using 1 threads]             11.2µs ± 0%              0.4µs ± 0%  -96.11%
BM_booleanAnd_1T/256 [using 1 threads]             44.6µs ± 0%              2.5µs ± 0%  -94.31%
BM_booleanAnd_1T/512 [using 1 threads]              178µs ± 0%               10µs ± 0%  -94.35%
BM_booleanAnd_1T/1k  [using 1 threads]              717µs ± 0%               78µs ± 1%  -89.07%
BM_booleanAnd_1T/2k  [using 1 threads]             2.87ms ± 0%             0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k  [using 1 threads]             11.7ms ± 0%              1.9ms ± 4%  -83.55%
BM_booleanAnd_1T/10k [using 1 threads]             70.3ms ± 0%             17.2ms ± 4%  -75.48%
2020-04-20 20:16:28 +00:00
dlazenby
00f6340153 Update PreprocessorDirectives.dox - Added line for the new VectorwiseOp plugin directive (and re-alphabatized the plugin section) 2020-04-17 21:43:37 +00:00
Rasmus Munk Larsen
5ab87d8aba Move eigen_packet_wrapper to GenericPacketMath.h and use it for SSE/AVX/AVX512 as it is already used for NEON.
This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i.
Use this mechanism to define packets for half and clean up the packet op implementations.
2020-04-15 18:17:19 +00:00
Rasmus Munk Larsen
4aae8ac693 Fix typo in TypeCasting.h 2020-04-14 02:55:51 +00:00
Rasmus Munk Larsen
1d674003b2 Fix bug in vectorized casting of
{uint8, int8} -> {int16, uint16, int32, uint32, float} 
 {uint16, int16} -> {int32, uint32, int64, uint64, float} 

for NEON. These conversions were advertised as vectorized, but not actually implemented.
2020-04-14 02:11:06 +00:00
Changming Sun
b1aa07a8d3 Fix a bug in TensorIndexList.h 2020-04-13 18:22:03 +00:00
Christoph Hertzberg
d46d726e9d CommaInitializer wrongfully asserted for 0-sized blocks
The commainitializer unit test never actually called `test_block_recursion`, which was also not correctly implemented and would have caused too-deep template recursion.
2020-04-13 16:41:20 +02:00
Antonio Sanchez
c854e189e6 Fixed commainitializer test.
The removed `finished()` call was responsible for enforcing that the
initializer was provided the correct number of values. Putting it back in
to restore previous behavior.
2020-04-10 13:53:26 -07:00
jangsoopark
39142904cc Resolve C4346 when building eigen on windows 2020-04-08 14:55:39 +09:00
Rasmus Munk Larsen
f0577a2bfd Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path.
Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.

Measured improvement with AVX+FMA:

name                           old time/op             new time/op             delta
BM_MatMul_ATB/8                 139ns ± 1%              129ns ± 1%   -7.49%          (p=0.008 n=5+5)
BM_MatMul_ATB/32               1.46µs ± 1%             1.22µs ± 0%  -16.72%          (p=0.008 n=5+5)
BM_MatMul_ATB/64               8.43µs ± 1%             7.41µs ± 0%  -12.04%          (p=0.008 n=5+5)
BM_MatMul_ATB/128              56.8µs ± 1%             52.9µs ± 1%   -6.83%          (p=0.008 n=5+5)
BM_MatMul_ATB/256               407µs ± 1%              395µs ± 3%   -2.94%          (p=0.032 n=5+5)
BM_MatMul_ATB/512              3.27ms ± 3%             3.18ms ± 1%     ~             (p=0.056 n=5+5)


Measured improvement for AVX512:

name                          old time/op             new time/op             delta
BM_MatMul_ATB/8                167ns ± 1%              154ns ± 1%   -7.63%          (p=0.008 n=5+5)
BM_MatMul_ATB/32              1.08µs ± 1%             0.83µs ± 3%  -23.58%          (p=0.008 n=5+5)
BM_MatMul_ATB/64              6.21µs ± 1%             5.06µs ± 1%  -18.47%          (p=0.008 n=5+5)
BM_MatMul_ATB/128             36.1µs ± 2%             31.3µs ± 1%  -13.32%          (p=0.008 n=5+5)
BM_MatMul_ATB/256              263µs ± 2%              242µs ± 2%   -7.92%          (p=0.008 n=5+5)
BM_MatMul_ATB/512             1.95ms ± 2%             1.91ms ± 2%     ~             (p=0.095 n=5+5)
BM_MatMul_ATB/1k              15.4ms ± 4%             14.8ms ± 2%     ~             (p=0.095 n=5+5)
2020-04-07 22:09:51 +00:00
Antonio Sanchez
8e875719b3 Replace norm() with squaredNorm() to address integer overflows
For random matrices with integer coefficients, many of the tests here lead to
integer overflows. When taking the norm() of a row/column, the squaredNorm()
often overflows to a negative value, leading to domain errors when taking the
sqrt(). This leads to a crash on some systems. By replacing the norm() call by
a squaredNorm(), the values still overflow, but at least there is no domain
error.

Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
2020-04-07 19:48:28 +00:00
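A minimal illustration of the failure mode, assuming coefficients large enough to overflow:
```
#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Vector3i v(50000, 50000, 50000);
  // The sum of squares exceeds INT_MAX and wraps negative; norm() would
  // then take the sqrt of a negative value (a domain error), while
  // squaredNorm() merely returns the overflowed value.
  std::cout << v.squaredNorm() << std::endl;
}
```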
Antonio Sanchez
9dda5eb7d2 Missing struct definition in NumTraits 2020-04-07 09:01:11 -07:00
Akshay Naresh Modi
bcc0e9e15c Add numeric_limits min and max for bool
This will allow (among other things) computation of argmax and argmin of bool tensors
2020-04-06 23:38:57 +00:00
Bernardo Bahia Monteiro
54a0a9c9dd Bugfix: conjugate_gradient did not compile with lazy-evaluated RealScalar
The error generated by the compiler was:

    no matching function for call to 'maxi'
    RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);

The important part in the following notes was:

    candidate template ignored: deduced conflicting
    types for parameter 'T'"
    ('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
    EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)

I am using CoDiPack to provide the RealScalar type.
This bug was introduced in bc000deaa Fix conjugate-gradient for very small rhs
2020-03-29 19:44:12 -04:00
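A minimal reproduction of the deduction conflict with hypothetical stand-in types; the explicit-conversion fix shown is one plausible remedy, not necessarily the commit's exact change:
```
// maxi deduces a single T from both arguments.
template <typename T>
T maxi(const T& x, const T& y) { return x < y ? y : x; }

struct Active {  // stand-in for codi::ActiveReal<...>
  bool operator<(const Active&) const { return false; }
};
struct Mul {     // stand-in for the lazy codi::Multiply11<...> expression
  operator Active() const { return Active(); }
};

void demo(Mul lazy_product, Active consider_as_zero) {
  // maxi(lazy_product, consider_as_zero);       // error: T = Mul vs Active
  maxi(Active(lazy_product), consider_as_zero);  // ok: evaluate to Active first
}
```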
Rasmus Munk Larsen
4fd5d1477b Fix packetmath test build for AVX. 2020-03-27 17:05:39 +00:00
Rasmus Munk Larsen
393dbd8ee9 Fix bug in 52d54278be 2020-03-27 16:42:18 +00:00
Rasmus Munk Larsen
55c8fe8d0f Fix bug in 52d54278be 2020-03-27 16:41:15 +00:00
Joel Holdsworth
6d2dbfc453 NEON: Fixed MSVC types definitions 2020-03-26 20:19:58 +00:00
Joel Holdsworth
52d54278be Additional NEON packet-math operations 2020-03-26 20:18:19 +00:00
Everton Constantino
deb93ed1bf Adhere to recommended load/store intrinsics for pp64le 2020-03-23 15:18:15 -03:00
Aaron Franke
5c22c7a7de Make file formatting comply with POSIX and Unix standards
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
Everton Constantino
5afdaa473a Fixing float32's pround halfway criteria to match STL's criteria. 2020-03-21 22:30:54 -05:00
Alessio M
96cd1ff718 Fixed:
- access violation when initializing 0x0 matrices
- exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert is configured to throw
2020-03-21 05:11:21 +00:00
dlazenby
cc954777f2 Update VectorwiseOp.h to allow Plugins similar to MatrixBase.h or ArrayBase.h 2020-03-20 19:30:01 +00:00
Masaki Murooka
55ecd58a3c Bug https://gitlab.com/libeigen/eigen/-/issues/1415: add missing EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base. 2020-03-20 13:37:37 +09:00
Rasmus Munk Larsen
4da2c6b197 Remove reference to non-existent unary_op_base class. 2020-03-19 18:23:06 +00:00
Rasmus Munk Larsen
eda90baf35 Add missing arguments to numext::absdiff(). 2020-03-19 18:16:55 +00:00
Joel Holdsworth
d5c665742b Add absolute_difference coefficient-wise binary Array function 2020-03-19 17:45:20 +00:00
Everton Constantino
6ff5a14091 Reenabling packetmath unsigned tests, adding dummy pabs for relevant unsigned
types.
2020-03-19 17:31:49 +00:00
Joel Holdsworth
232f904082 Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions 2020-03-19 17:24:06 +00:00
Joel Holdsworth
54aa8fa186 Implement integer square-root for NEON 2020-03-19 17:05:13 +00:00
Allan Leal
37ccb86916 Update NullaryFunctors.h 2020-03-16 11:59:02 +00:00
Deven Desai
7158ed4e0e Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as the Eigen::Half packet type 2020-03-12 01:06:24 +00:00
Joel Holdsworth
d53ae40f7b NEON: Added int64_t and uint64_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
4b9ecf2924 NEON: Added int8_t and uint8_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
ceaabd4e16 NEON: Added int16_t and uint16_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
d5d3cf9339 NEON: Added uint32_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
eacf97f727 NEON: Implemented half-size vectors 2020-03-10 22:46:19 +00:00
Joel Holdsworth
5f411b729e NEON: Set packet_traits<double> flags 2020-03-10 22:46:19 +00:00
Joel Holdsworth
88337acae2 test/packetmath: Add tests for all integer types 2020-03-10 22:46:19 +00:00
Joel Holdsworth
9e68977578 test/packetmath: Made negate non-mandatory 2020-03-10 22:46:19 +00:00
Sami Kama
b733b8b680 remove duplicate pset1 for half and add some comments about why we need to expose pmul/add/div/min/max on host 2020-03-10 20:28:43 +00:00
Ram-Z
a45d28256d Don't restrict CMAKE_BUILD_TYPE
Restricting it prevented projects that add Eigen using `add_subdirectory` from using their own custom CMAKE_BUILD_TYPE and having Eigen respect the same custom flags.
2020-02-28 20:46:53 +00:00
Cédric Hubert
98bfc5aaa8 Update MarketIO.h 2020-02-28 12:41:51 +00:00
Rasmus Munk Larsen
52a2fbbb00 Revert "avoid selecting half-packets when unnecessary"
This reverts commit 5ca10480b0
2020-02-25 01:07:43 +00:00
Rasmus Munk Larsen
235bcfe08d Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE"
This reverts commit 44df2109c8
2020-02-25 01:07:28 +00:00
Rasmus Munk Larsen
d7a42eade6 Revert "do not pick full-packet if it'd result in more operations"
This reverts commit e9cc0cd353
2020-02-25 01:07:15 +00:00
Rasmus Munk Larsen
6ac37768a9 Revert "add some static checks for packet-picking logic"
This reverts commit 7769600245
2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen
87cfa4862f Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX."
This reverts commit b625adffd8
2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen
b625adffd8 Disable test in test/vectorization_logic.cpp, which is currently failing with AVX. 2020-02-24 23:28:25 +00:00
Tobias Bosch
f0ce88cff7 Include <sstream> explicitly, and don't rely on the implicit include via <complex>.
This implicit dependency no longer exists in a recent LLVM release (sha 78be61871704).
2020-02-24 23:09:36 +00:00
Ilya Tokar
eb6cc29583 Avoid a division in NonBlockingThreadPool::Steal.
Looking at profiles we spend ~10-20% of Steal on simply computing
random % size. We can reduce random 32-bit int into [0, size) range with
a single multiplication and shift. This transformation is described in
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
2020-02-14 16:02:57 -05:00
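The transformation in code, following the linked post:
```
#include <cstdint>

// Map a uniform 32-bit random value into [0, size) with one multiply and
// one shift, avoiding the division implied by `random % size`.
inline std::uint32_t fast_range(std::uint32_t random_value, std::uint32_t size) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(random_value) * size) >> 32);
}
```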
Francesco Mazzoli
7769600245 add some static checks for packet-picking logic 2020-02-07 18:16:16 +01:00
Francesco Mazzoli
e9cc0cd353 do not pick full-packet if it'd result in more operations
See comment and
<https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952>.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
44df2109c8 Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE
See comment for details.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
5ca10480b0 avoid selecting half-packets when unnecessary
See
<https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation>
for an explanation of the problem this solves.

In short, for some reason, before this commit the half-packet was
selected when the array / matrix size is not a multiple of
`unpacket_traits<PacketType>::size`, where `PacketType` starts out
being the full Packet.

For example, for some data of 100 `float`s, `Packet4f` will be
selected rather than `Packet8f`, because 100 is not a multiple of 8,
the size of `Packet8f`.

This commit switches to selecting the half-packet if the size is
less than the packet size, which seems to make more sense.

As I stated in the SO post I'm not sure that I'm understanding the
issue correctly, but this fix resolves the issue in my program. Moreover,
`make check` passes, with the exception of lines 614 and 616 in
`test/packetmath.cpp`, which however also fail on master on my machine:

    CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0);
    ...
    CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);
2020-02-07 18:16:16 +01:00
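The described selection behavior, reduced to a toy model for AVX floats (a sketch, not Eigen's actual meta-program):
```
// Before: fall back to the half packet whenever the size is not a multiple
// of the full packet size. After: use the full packet whenever size >= 8.
constexpr int packet_size_before(int n) { return n % 8 == 0 ? 8 : 4; }
constexpr int packet_size_after(int n)  { return n >= 8 ? 8 : 4; }

static_assert(packet_size_before(100) == 4, "100 floats fell to Packet4f");
static_assert(packet_size_after(100)  == 8, "100 floats now use Packet8f");
```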
Eugene Zhulenev
f584bd9b30 Fail at compile time if default executor tries to use non-default device 2020-02-06 22:43:24 +00:00
Eugene Zhulenev
3fda850c46 Remove dead code from TensorReduction.h 2020-01-29 18:45:31 +00:00
Jeff Daily
b5df8cabd7 fix hip-clang compilation due to new HIP scalar accessor 2020-01-20 21:08:52 +00:00
Deven Desai
6d284bb1b7 Fix for HIP breakage - 200115. Adding a missing EIGEN_DEVICE_FUNC attr 2020-01-16 00:51:43 +00:00
Srinivas Vasudevan
f6c6de5d63 Ensure Igamma does not NaN or Inf for large values. 2020-01-14 21:32:48 +00:00
Rasmus Munk Larsen
6601abce86 Remove rogue include in TypeCasting.h. Meta.h is already included by the top-level header in Eigen/Core. 2020-01-14 21:03:53 +00:00
Eugene Zhulenev
b9362fb8f7 Convert StridedLinearBufferCopy::Kind to enum class 2020-01-13 11:43:24 -08:00
Everton Constantino
5a8b97b401 Switching unpacket_traits<Packet4i> to vectorizable=true. 2020-01-13 16:08:20 -03:00
Everton Constantino
42838c28b8 Adding correct cache sizes for PPC architecture. 2020-01-13 16:58:14 +00:00
Christoph Hertzberg
1d0c45122a Removing executable bit from file mode 2020-01-11 15:02:29 +01:00
Christoph Hertzberg
35219cea68 Bug #1790: Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal). 2020-01-11 14:57:22 +01:00
Srinivas Vasudevan
2e099e8d8f Added special_packetmath test and tweaked bounds on tests.
Refactor shared packetmath code to header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00
Rasmus Munk Larsen
e1ecfc162d Explicitly call ::rint and ::rintf for targets without c++11. Without this, the Windows build breaks when trying to compile numext::rint<double>. 2020-01-10 21:14:08 +00:00
Joel Holdsworth
da5a7afed0 Improvements to the tidiness and completeness of the NEON implementation 2020-01-10 18:31:15 +00:00
Anuj Rawat
452371cead Fix for gcc build error when using Eigen headers with AVX512 2020-01-10 18:05:42 +00:00
mehdi-goli
601f89dfd0 Adding RInt vector support for SYCL. 2020-01-10 18:00:36 +00:00
Matthew Powelson
2ea5a715cf Properly initialize b vector in SplineFitting
InterpolateWithDerivative does not initialize the b vector correctly. This issue is discussed in Stack Overflow question 48382939.
2020-01-09 21:29:04 +00:00
Rasmus Munk Larsen
9254974115 Don't add EIGEN_DEVICE_FUNC to random() since ::rand is not available in Cuda. 2020-01-09 21:23:09 +00:00
Rasmus Munk Larsen
a3ec89b5bd Add missing EIGEN_DEVICE_FUNC annotations in MathFunctions.h. 2020-01-09 21:06:34 +00:00
Christoph Hertzberg
8333e03590 Use data.data() instead of &data (since it is not obvious that Array is trivially copyable) 2020-01-09 11:38:19 +01:00
Rasmus Munk Larsen
e6fcee995b Don't use the rational approximation to the logistic function on GPUs as it appears to be slightly slower. 2020-01-09 00:04:26 +00:00
Rasmus Munk Larsen
4217a9f090 The upper limits for where to use the rational approximation to the logistic function were not set carefully enough in the original commit, and some arguments would cause the function to return values greater than 1. This change sets the limits to the values found by scanning all floating point numbers (using std::nextafterf()). 2020-01-08 22:21:37 +00:00
Christoph Hertzberg
9623c0c4b9 Fix formatting 2020-01-08 13:58:18 +01:00
Ilya Tokar
19876ced76 Bug #1785: Introduce numext::rint.
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e. g. floor (tested SSE/AVX).
2020-01-07 21:22:44 +00:00
mehdi-goli
d0ae052da4 [SYCL Backend]
* Adding missing operations for vector comparison in SYCL; their absence caused a compiler error for vector comparisons when compiling SYCL.
 * Fixing the compiler error for placement new in TensorForcedEval.h that occurred when compiling the SYCL backend.
 * Reducing SYCL warnings by removing the abort function inside the kernel.
 * Adding strong inline to functions inside SYCL interop.
2020-01-07 15:13:37 +00:00
Everton Constantino
eedb7eeacf Protecting integer_types's long long test with a check to see if we have CXX11 support. 2020-01-07 14:35:35 +00:00
Christoph Hertzberg
bcbaad6d87 Bug #1800: Guard against misleading indentation 2020-01-03 13:47:43 +01:00
Janek Kozicki
00de570793 Fix -Werror -Wfloat-conversion warning. 2019-12-23 23:52:44 +01:00
Deven Desai
636e2bb3fa Fix for HIP breakage - 191220
The breakage was introduced by the following commit :

ae07801dd8

After the commit, HIPCC errors out on some tests with the following error

```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_device_1.dir/cxx11_tensor_device_1_generated_cxx11_tensor_device.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_device.cu:17:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:100:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:129:12: error: no matching constructor for initialization of 'Eigen::internal::TensorBlockResourceRequirements'
    return {merge(lhs.shape_type, rhs.shape_type),           // shape_type
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided
struct TensorBlockResourceRequirements {
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 5 arguments, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 3 were provided
...
...
```

The fix is to explicitly declare the (implicitly called) constructor as a device func
2019-12-20 21:28:00 +00:00
Christoph Hertzberg
1e9664b147 Bug #1796: Make matrix squareroot usable for Map and Ref types 2019-12-20 18:10:22 +01:00
Christoph Hertzberg
d86544d654 Reduce code duplication and avoid confusing Doxygen 2019-12-19 19:48:39 +01:00
Christoph Hertzberg
dde279f57d Hide recursive meta templates from Doxygen 2019-12-19 19:47:23 +01:00
Christoph Hertzberg
c21771ac04 Use double-braces initialization (as everywhere else in the test-suite). 2019-12-19 19:20:48 +01:00
Christoph Hertzberg
a3273aeff8 Fix trivial shadow warning 2019-12-19 19:13:11 +01:00
Christoph Hertzberg
870e53c0f2 Bug #1788: Fix rule-of-three violations inside the stable modules.
This fixes deprecated-copy warnings when compiling with GCC>=9
Also protect some additional Base-constructors from getting called by user code (#1587)
2019-12-19 17:30:11 +01:00
Christoph Hertzberg
6965f6de7f Fix unit-test which I broke in previous fix 2019-12-19 13:42:14 +01:00
Eugene Zhulenev
7a65219a2e Fix TensorPadding bug in squeezed reads from inner dimension 2019-12-19 05:43:57 +00:00
Eugene Zhulenev
73e55525e5 Return const data pointer from TensorRef evaluator.data() 2019-12-18 23:19:36 +00:00
Eugene Zhulenev
ae07801dd8 Tensor block evaluation cost model 2019-12-18 20:07:00 +00:00
Christoph Hertzberg
72166d0e6e Fix some maybe-unitialized warnings 2019-12-18 18:26:20 +01:00
Christoph Hertzberg
5a3eaf88ac Workaround class-memaccess warnings on newer GCC versions 2019-12-18 16:37:26 +01:00
Jeff Daily
de07c4d1c2 fix compilation due to new HIP scalar accessor 2019-12-17 20:27:30 +00:00
Eugene Zhulenev
788bef6ab5 Reduce block evaluation overhead for small tensor expressions 2019-12-17 19:06:14 +00:00
Rasmus Munk Larsen
7252163335 Add default definition for EIGEN_PREDICT_* 2019-12-16 22:31:59 +00:00
Rasmus Munk Larsen
a566074480 Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) for x below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
  - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
  - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below repeated calls

   u = v.logistic()  (u = v.tanh(), respectively)

where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467           4468         155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347           2347         299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467           1467         476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805            805         858696  model_time: 1463

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589           2590         270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428           1428         489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059           1059         662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673            673        1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2391           2391         292624  model_time: 4242
AVX
BM_eigen_tanh_float        1256           1256         554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float         823            823         866267  model_time: 1609
AVX512
BM_eigen_tanh_float         443            443        1578999  model_time: 805

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2588           2588         273531  model_time: 4242
AVX
BM_eigen_tanh_float        1536           1536         452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float        1007           1007         694681  model_time: 1609
AVX512
BM_eigen_tanh_float         471            471        1472178  model_time: 805
2019-12-16 21:33:42 +00:00
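A sketch of what such predict macros typically look like on GCC/Clang; Eigen's actual definitions also provide fallbacks for compilers without `__builtin_expect`:
```
// Branch-prediction hints in the spirit of EIGEN_PREDICT_{TRUE,FALSE}.
#define PREDICT_TRUE_SKETCH(x)  (__builtin_expect(!!(x), 1))
#define PREDICT_FALSE_SKETCH(x) (__builtin_expect(!!(x), 0))

// Example: the vectorized logistic only computes exp(x) when some SIMD lane
// falls below the cutoff, which is hinted as the unlikely path:
//   if (PREDICT_FALSE_SKETCH(any_lane_below_cutoff)) { /* exp path */ }
```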
Christoph Hertzberg
8e5da71466 Resolve double-promotion warnings when compiling with clang.
`sin` was calling `sin(double)` instead of `std::sin(float)`
2019-12-13 22:46:40 +01:00
Christoph Hertzberg
9b7a2b43c2 Renamed .hgignore to .gitignore (removing hg-specific "syntax" line) 2019-12-13 19:40:57 +01:00
Ilya Tokar
06e99aaf40 Bug 1785: fix pround on x86 to use the same rounding mode as std::round.
This also adds pset1frombits helper to Packet[24]d.
Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after,
still an order of magnitude faster than the scalar version: 33.8µs ± 2%.
2019-12-12 17:38:53 -05:00
Rasmus Munk Larsen
73a8d572f5 Clamp tanh approximation outside [-c, c] where c is the smallest value where the approximation is exactly +/-1. Without FMA, c = 7.90531110763549805, with FMA c = 7.99881172180175781. 2019-12-12 19:34:25 +00:00
Srinivas Vasudevan
88062b7fed Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one. 2019-12-12 01:56:54 +00:00
Eugene Zhulenev
381f8f3139 Initialize non-trivially constructible types when allocating a temp buffer. 2019-12-12 01:31:30 +00:00
Eugene Zhulenev
64272c7f40 Squeeze reads from two inner dimensions in TensorPadding 2019-12-11 16:54:51 -08:00
Eugene Zhulenev
963ba1015b Add back accidentally deleted default constructor to TensorExecutorTilingContext. 2019-12-11 18:47:55 +00:00
Joel Holdsworth
1b6e0395e6 Added io test 2019-12-11 18:22:57 +00:00
Joel Holdsworth
3c0ef9f394 IO: Fixed printing of char and unsigned char matrices 2019-12-11 18:22:57 +00:00
Joel Holdsworth
e87af0ed37 Added Eigen::numext typedefs for uint8_t, int8_t, uint16_t and int16_t 2019-12-11 18:22:57 +00:00
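For reference, a short usage sketch of the new typedefs:
```cpp
#include <Eigen/Core>

Eigen::numext::uint8_t  a = 255;
Eigen::numext::int8_t   b = -128;
Eigen::numext::uint16_t c = 65535;
Eigen::numext::int16_t  d = -42;
```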
Gael Guennebaud
15b3bcfca0 Bug 1786: fix compilation with MSVC 2019-12-11 16:16:38 +01:00
Eugene Zhulenev
c9220c035f Remove block memory allocation required by removed block evaluation API 2019-12-10 17:15:55 -08:00
Eugene Zhulenev
1c879eb010 Remove V2 suffix from TensorBlock 2019-12-10 15:40:23 -08:00
Eugene Zhulenev
dbca11e880 Remove TensorBlock.h and old TensorBlock/BlockMapper 2019-12-10 14:31:44 -08:00
Deven Desai
c49f0d851a Fix for HIP breakage detected on 191210
The following commit introduces compile errors when compiling Eigen with hipcc:

2918f85ba9

hipcc errors out because it requires the device attribute on the methods within the TensorBlockV2ResourceRequirements struct introduced by the commit above. The fix is to add the device attribute to those methods.
2019-12-10 22:14:05 +00:00
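The shape of the fix (illustrative only; the member shown is hypothetical): annotate the struct's methods with EIGEN_DEVICE_FUNC, which under hipcc/nvcc expands to `__host__ __device__`.
```cpp
// Roughly how Eigen defines the annotation macro:
#if defined(__HIPCC__) || defined(__CUDACC__)
#define EIGEN_DEVICE_FUNC __host__ __device__
#else
#define EIGEN_DEVICE_FUNC
#endif

struct TensorBlockV2ResourceRequirements {
  // Hypothetical member: the point is the annotation, which lets hipcc
  // compile the method for (and call it from) device code.
  EIGEN_DEVICE_FUNC static int mergeSize(int a, int b) { return a > b ? a : b; }
};
```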
Eugene Zhulenev
2918f85ba9 Do not use std::vector in getResourceRequirements 2019-12-09 16:19:55 -08:00
Artem Belevich
8056a05b54 Undo the block size change.
.z *is* used by EigenContractionKernelInternal().
2019-12-09 11:10:29 -08:00
Eugene Zhulenev
dbb703d44e Add async evaluation support to TensorSelectOp 2019-12-09 18:36:13 +00:00
Janek Kozicki
11d6465326 Fix AlignedVector3's inconsistent interface with other vector classes: the default constructor and operator- were missing. 2019-12-06 21:07:39 +01:00
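A usage sketch of the two additions (assuming the unsupported AlignedVector3 module):
```cpp
#include <unsupported/Eigen/AlignedVector3>

int main() {
  Eigen::AlignedVector3<float> v;                   // default constructor, previously missing
  v = Eigen::AlignedVector3<float>(1.f, 2.f, 3.f);
  Eigen::AlignedVector3<float> w = -v;              // unary operator-, previously missing
  return w[0] < 0.f ? 0 : 1;
}
```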
Eugene Zhulenev
bb7ccac3af Add recursive work splitting to EvalShardedByInnerDimContext 2019-12-05 14:51:49 -08:00
Artem Belevich
25230d1862 Improve performance of contraction kernels
* Force-inline implementations. They pass around pointers to shared memory
  blocks. Without inlining, the compiler must operate via generic pointers.
  Inlining allows the compiler to detect that we're operating on shared
  memory, which allows generation of substantially faster code.

* Fixed a long-standing typo which resulted in launching 8x more kernels
  than we needed (.z dimension of the block is unused by the kernel).
2019-12-05 12:48:34 -08:00
Gael Guennebaud
08eeb648ea update hg to git hashes 2019-12-05 16:33:24 +01:00
Rasmus Munk Larsen
366cf005b0 Add missing initialization in cxx11_tensor_trace.cpp. 2019-12-04 23:56:37 +00:00
Gael Guennebaud
c488b8b32f Replace calls to "hg" by calls to "git" 2019-12-04 11:24:06 +01:00
Gael Guennebaud
8fbe0e4699 Update old links to bitbucket to point to gitlab.com 2019-12-04 10:57:07 +01:00
Gael Guennebaud
114a15c66a Added tag before-git-migration for changeset a7c7d329d8 2019-12-04 10:06:00 +01:00
442 changed files with 38789 additions and 14274 deletions

View File

@@ -1,4 +1,3 @@
syntax: glob
qrc_*cxx
*.orig
*.pyc
@@ -36,3 +35,4 @@ lapack/reference
.*project
.settings
Makefile
!ci/build.gitlab-ci.yml

23
.gitlab-ci.yml Normal file
View File

@@ -0,0 +1,23 @@
# This file is part of Eigen, a lightweight C++ template library
# for linear algebra.
#
# Copyright (C) 2020 Arm Ltd. and Contributors
#
# This Source Code Form is subject to the terms of the Mozilla
# Public License v. 2.0. If a copy of the MPL was not distributed
# with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
stages:
- buildsmoketests
- smoketests
- build
- test
variables:
BUILDDIR: builddir
EIGEN_CI_CMAKE_GENEATOR: "Ninja"
include:
- "/ci/smoketests.gitlab-ci.yml"
- "/ci/build.gitlab-ci.yml"
- "/ci/test.gitlab-ci.yml"

View File

@@ -0,0 +1,69 @@
<!--
Please read this!
Before opening a new issue, make sure to search for keywords in the issues
filtered by "bug::confirmed" or "bug::unconfirmed" and "bugzilla" label:
- https://gitlab.com/libeigen/eigen/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=bug%3A%3Aconfirmed
- https://gitlab.com/libeigen/eigen/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=bug%3A%3Aunconfirmed
- https://gitlab.com/libeigen/eigen/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=bugzilla
and verify the issue you're about to submit isn't a duplicate. -->
### Summary
<!-- Summarize the bug encountered concisely. -->
### Environment
<!-- Please provide your development environment here -->
- **Operating System** : Windows/Linux
- **Architecture** : x64/Arm64/PowerPC ...
- **Eigen Version** : 3.3.9
- **Compiler Version** : Gcc7.0
- **Compile Flags** : -O3 -march=native
- **Vector Extension** : SSE/AVX/NEON ...
### Minimal Example
<!-- If possible, please create a minimal example here that exhibits the problematic behavior.
You can also link to [godbolt](https://godbolt.org). Please note that you need to use the "Share" button
in the top right-hand corner of the godbolt page to get a shareable link,
rather than copying the URL from your browser address bar.
You can read [the guidelines on stackoverflow](https://stackoverflow.com/help/minimal-reproducible-example)
on how to create a good minimal example. -->
```cpp
//show your code here
```
### Steps to reproduce
<!-- Describe how one can reproduce the issue - this is very important. Please use an ordered list. -->
1. first step
2. second step
3. ...
### What is the current *bug* behavior?
<!-- Describe what actually happens. -->
### What is the expected *correct* behavior?
<!-- Describe what you should see instead. -->
### Relevant logs
<!-- Add relevant code snippets or program output within blocks marked by " ``` " -->
<!-- OPTIONAL: remove this section if you are not reporting a compilation warning issue.-->
### Warning Messages
<!-- Show us the warning messages you got! -->
<!-- OPTIONAL: remove this section if you are not reporting a performance issue. -->
### Benchmark scripts and results
<!-- Please share any benchmark scripts - either standalone, or using [Google Benchmark](https://github.com/google/benchmark). -->
### Anything else that might help
<!-- Providing more information helps us narrow down the cause,
including but not limited to the following:
- lines of code that might help us diagnose the problem.
- potential ways to address the issue.
- last known working/first broken version (release number or commit hash). -->
- [ ] Have a plan to fix this issue.

View File

@@ -0,0 +1,7 @@
### Describe the feature you would like to be implemented.
### Would such a feature be useful for other users? Why?
### Any hints on how to implement the requested feature?
### Additional resources

View File

@@ -0,0 +1,26 @@
<!--
Thanks for contributing a merge request! Please name and fully describe your MR as you would for a commit message.
If the MR fixes an issue, please include "Fixes #issue" in the commit message and the MR description.
In addition, we recommend that first-time contributors read our [contribution guidelines](https://eigen.tuxfamily.org/index.php?title=Contributing_to_Eigen) and [git page](https://eigen.tuxfamily.org/index.php?title=Git), which will help you submit a more standardized MR.
Before submitting the MR, you also need to complete the following checks:
- Make one PR per feature/bugfix (don't mix multiple changes into one PR). Avoid committing unrelated changes.
- Rebase before committing
- For code changes, run the test suite (at least the tests that are likely affected by the change).
See our [test guidelines](https://eigen.tuxfamily.org/index.php?title=Tests).
- If possible, add a test (both for bug-fixes as well as new features)
- Make sure new features are documented
Note that we are a team of volunteers; we appreciate your patience during the review process.
Again, thanks for contributing! -->
### Reference issue
<!-- You can link to a specific issue using the gitlab syntax #<issue number> -->
### What does this implement/fix?
<!--Please explain your changes.-->
### Additional information
<!--Any additional information you think is important.-->

View File

@@ -1,6 +1,7 @@
project(Eigen3)
# cmake_minimum_require must be the first command of the file
cmake_minimum_required(VERSION 3.5.0)
cmake_minimum_required(VERSION 2.8.11)
project(Eigen3)
# guard against in-source builds
@@ -20,13 +21,6 @@ if (NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Release")
endif()
string(TOLOWER "${CMAKE_BUILD_TYPE}" cmake_build_type_tolower)
if( NOT cmake_build_type_tolower STREQUAL "debug"
AND NOT cmake_build_type_tolower STREQUAL "release"
AND NOT cmake_build_type_tolower STREQUAL "relwithdebinfo")
message(FATAL_ERROR "Unknown build type \"${CMAKE_BUILD_TYPE}\". Allowed values are Debug, Release, RelWithDebInfo (case-insensitive).")
endif()
#############################################################################
# retrieve version information #
@@ -42,29 +36,28 @@ string(REGEX MATCH "define[ \t]+EIGEN_MINOR_VERSION[ \t]+([0-9]+)" _eigen_minor_
set(EIGEN_MINOR_VERSION "${CMAKE_MATCH_1}")
set(EIGEN_VERSION_NUMBER ${EIGEN_WORLD_VERSION}.${EIGEN_MAJOR_VERSION}.${EIGEN_MINOR_VERSION})
# if we are not in a mercurial clone
if(IS_DIRECTORY ${CMAKE_SOURCE_DIR}/.hg)
# if the mercurial program is absent or this will leave the EIGEN_HG_CHANGESET string empty,
# if we are not in a git clone
if(IS_DIRECTORY ${CMAKE_SOURCE_DIR}/.git)
# if the git program is absent or this will leave the EIGEN_GIT_REVNUM string empty,
# but won't stop CMake.
execute_process(COMMAND hg tip -R ${CMAKE_SOURCE_DIR} OUTPUT_VARIABLE EIGEN_HGTIP_OUTPUT)
execute_process(COMMAND hg branch -R ${CMAKE_SOURCE_DIR} OUTPUT_VARIABLE EIGEN_BRANCH_OUTPUT)
execute_process(COMMAND git ls-remote --refs -q ${CMAKE_SOURCE_DIR} HEAD OUTPUT_VARIABLE EIGEN_GIT_OUTPUT)
endif()
# if this is the default (aka development) branch, extract the mercurial changeset number from the hg tip output...
if(EIGEN_BRANCH_OUTPUT MATCHES "default")
string(REGEX MATCH "^changeset: *[0-9]*:([0-9;a-f]+).*" EIGEN_HG_CHANGESET_MATCH "${EIGEN_HGTIP_OUTPUT}")
set(EIGEN_HG_CHANGESET "${CMAKE_MATCH_1}")
# extract the git rev number from the git output...
if(EIGEN_GIT_OUTPUT)
string(REGEX MATCH "^([0-9;a-f]+).*" EIGEN_GIT_CHANGESET_MATCH "${EIGEN_GIT_OUTPUT}")
set(EIGEN_GIT_REVNUM "${CMAKE_MATCH_1}")
endif()
#...and show it next to the version number
if(EIGEN_HG_CHANGESET)
set(EIGEN_VERSION "${EIGEN_VERSION_NUMBER} (mercurial changeset ${EIGEN_HG_CHANGESET})")
if(EIGEN_GIT_REVNUM)
set(EIGEN_VERSION "${EIGEN_VERSION_NUMBER} (git rev ${EIGEN_GIT_REVNUM})")
else()
set(EIGEN_VERSION "${EIGEN_VERSION_NUMBER}")
endif()
include(CheckCXXCompilerFlag)
include(GNUInstallDirs)
include(CMakeDependentOption)
set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)
@@ -248,15 +241,30 @@ if(NOT MSVC)
message(STATUS "Enabling FMA in tests/examples")
endif()
option(EIGEN_TEST_AVX2 "Enable/Disable AVX2 in tests/examples" OFF)
if(EIGEN_TEST_AVX2)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2 -mfma")
message(STATUS "Enabling AVX2 in tests/examples")
endif()
option(EIGEN_TEST_AVX512 "Enable/Disable AVX512 in tests/examples" OFF)
if(EIGEN_TEST_AVX512)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f -mfma -DEIGEN_ENABLE_AVX512")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f -mfma")
if (NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fabi-version=6")
endif()
message(STATUS "Enabling AVX512 in tests/examples")
endif()
option(EIGEN_TEST_AVX512DQ "Enable/Disable AVX512DQ in tests/examples" OFF)
if(EIGEN_TEST_AVX512DQ)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512dq")
if (NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fabi-version=6")
endif()
message(STATUS "Enabling AVX512DQ in tests/examples")
endif()
option(EIGEN_TEST_F16C "Enable/Disable F16C in tests/examples" OFF)
if(EIGEN_TEST_F16C)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mf16c")
@@ -416,22 +424,27 @@ endif()
if(EIGEN_INCLUDE_INSTALL_DIR AND NOT INCLUDE_INSTALL_DIR)
set(INCLUDE_INSTALL_DIR ${EIGEN_INCLUDE_INSTALL_DIR}
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed")
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed")
else()
set(INCLUDE_INSTALL_DIR
"${CMAKE_INSTALL_INCLUDEDIR}/eigen3"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed"
)
endif()
set(CMAKEPACKAGE_INSTALL_DIR
"${CMAKE_INSTALL_DATADIR}/eigen3/cmake"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen3Config.cmake is installed"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen3Config.cmake is installed"
)
set(PKGCONFIG_INSTALL_DIR
"${CMAKE_INSTALL_DATADIR}/pkgconfig"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where eigen3.pc is installed"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where eigen3.pc is installed"
)
foreach(var INCLUDE_INSTALL_DIR CMAKEPACKAGE_INSTALL_DIR PKGCONFIG_INSTALL_DIR)
if(IS_ABSOLUTE "${${var}}")
message(FATAL_ERROR "${var} must be relative to CMAKE_PREFIX_PATH. Got: ${${var}}")
endif()
endforeach()
# similar to set_target_properties but append the property instead of overwriting it
macro(ei_add_target_property target prop value)
@@ -458,7 +471,12 @@ endif()
install(DIRECTORY Eigen DESTINATION ${INCLUDE_INSTALL_DIR} COMPONENT Devel)
add_subdirectory(doc EXCLUDE_FROM_ALL)
option(EIGEN_BUILD_DOC "Enable creation of Eigen documentation" ON)
if(EIGEN_BUILD_DOC)
add_subdirectory(doc EXCLUDE_FROM_ALL)
endif()
option(BUILD_TESTING "Enable creation of Eigen tests." ON)
if(BUILD_TESTING)
@@ -486,6 +504,7 @@ option(EIGEN_TEST_SYCL "Add Sycl support." OFF)
option(EIGEN_SYCL_TRISYCL "Use the triSYCL Sycl implementation (ComputeCPP by default)." OFF)
if(EIGEN_TEST_SYCL)
set (CMAKE_MODULE_PATH "${CMAKE_ROOT}/Modules" "cmake/Modules/" "${CMAKE_MODULE_PATH}")
find_package(Threads REQUIRED)
if(EIGEN_SYCL_TRISYCL)
message(STATUS "Using triSYCL")
include(FindTriSYCL)
@@ -538,32 +557,30 @@ message(STATUS "")
string(TOLOWER "${CMAKE_GENERATOR}" cmake_generator_tolower)
if(cmake_generator_tolower MATCHES "makefile")
message(STATUS "Some things you can do now:")
message(STATUS "--------------+--------------------------------------------------------------")
message(STATUS "Command | Description")
message(STATUS "--------------+--------------------------------------------------------------")
message(STATUS "make install | Install Eigen. Headers will be installed to:")
message(STATUS " | <CMAKE_INSTALL_PREFIX>/<INCLUDE_INSTALL_DIR>")
message(STATUS " | Using the following values:")
message(STATUS " | CMAKE_INSTALL_PREFIX: ${CMAKE_INSTALL_PREFIX}")
message(STATUS " | INCLUDE_INSTALL_DIR: ${INCLUDE_INSTALL_DIR}")
message(STATUS " | Change the install location of Eigen headers using:")
message(STATUS " | cmake . -DCMAKE_INSTALL_PREFIX=yourprefix")
message(STATUS " | Or:")
message(STATUS " | cmake . -DINCLUDE_INSTALL_DIR=yourdir")
message(STATUS "make doc | Generate the API documentation, requires Doxygen & LaTeX")
if(BUILD_TESTING)
message(STATUS "make check | Build and run the unit-tests. Read this page:")
message(STATUS " | http://eigen.tuxfamily.org/index.php?title=Tests")
endif()
message(STATUS "make blas | Build BLAS library (not the same thing as Eigen)")
message(STATUS "make uninstall| Removes files installed by make install")
message(STATUS "--------------+--------------------------------------------------------------")
message(STATUS "Available targets (use: make TARGET):")
else()
message(STATUS "To build/run the unit tests, read this page:")
message(STATUS " http://eigen.tuxfamily.org/index.php?title=Tests")
message(STATUS "Available targets (use: cmake --build . --target TARGET):")
endif()
message(STATUS "---------+--------------------------------------------------------------")
message(STATUS "Target | Description")
message(STATUS "---------+--------------------------------------------------------------")
message(STATUS "install | Install Eigen. Headers will be installed to:")
message(STATUS " | <CMAKE_INSTALL_PREFIX>/<INCLUDE_INSTALL_DIR>")
message(STATUS " | Using the following values:")
message(STATUS " | CMAKE_INSTALL_PREFIX: ${CMAKE_INSTALL_PREFIX}")
message(STATUS " | INCLUDE_INSTALL_DIR: ${INCLUDE_INSTALL_DIR}")
message(STATUS " | Change the install location of Eigen headers using:")
message(STATUS " | cmake . -DCMAKE_INSTALL_PREFIX=yourprefix")
message(STATUS " | Or:")
message(STATUS " | cmake . -DINCLUDE_INSTALL_DIR=yourdir")
message(STATUS "doc | Generate the API documentation, requires Doxygen & LaTeX")
if(BUILD_TESTING)
message(STATUS "check | Build and run the unit-tests. Read this page:")
message(STATUS " | http://eigen.tuxfamily.org/index.php?title=Tests")
endif()
message(STATUS "blas | Build BLAS library (not the same thing as Eigen)")
message(STATUS "uninstall| Remove files installed by the install target")
message(STATUS "---------+--------------------------------------------------------------")
message(STATUS "")
@@ -575,82 +592,48 @@ set ( EIGEN_DEFINITIONS "")
set ( EIGEN_INCLUDE_DIR "${CMAKE_INSTALL_PREFIX}/${INCLUDE_INSTALL_DIR}" )
set ( EIGEN_ROOT_DIR ${CMAKE_INSTALL_PREFIX} )
# Interface libraries require at least CMake 3.0
if (NOT CMAKE_VERSION VERSION_LESS 3.0)
include (CMakePackageConfigHelpers)
include (CMakePackageConfigHelpers)
# Imported target support
add_library (eigen INTERFACE)
add_library (Eigen3::Eigen ALIAS eigen)
target_compile_definitions (eigen INTERFACE ${EIGEN_DEFINITIONS})
target_include_directories (eigen INTERFACE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
$<INSTALL_INTERFACE:${INCLUDE_INSTALL_DIR}>
)
# Imported target support
add_library (eigen INTERFACE)
add_library (Eigen3::Eigen ALIAS eigen)
target_compile_definitions (eigen INTERFACE ${EIGEN_DEFINITIONS})
target_include_directories (eigen INTERFACE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
$<INSTALL_INTERFACE:${INCLUDE_INSTALL_DIR}>
)
# Export as title case Eigen
set_target_properties (eigen PROPERTIES EXPORT_NAME Eigen)
# Export as title case Eigen
set_target_properties (eigen PROPERTIES EXPORT_NAME Eigen)
install (TARGETS eigen EXPORT Eigen3Targets)
install (TARGETS eigen EXPORT Eigen3Targets)
configure_package_config_file (
${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3Config.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
PATH_VARS EIGEN_INCLUDE_DIR EIGEN_ROOT_DIR
INSTALL_DESTINATION ${CMAKEPACKAGE_INSTALL_DIR}
NO_CHECK_REQUIRED_COMPONENTS_MACRO # Eigen does not provide components
)
# Remove CMAKE_SIZEOF_VOID_P from Eigen3ConfigVersion.cmake since Eigen does
# not depend on architecture specific settings or libraries. More
# specifically, an Eigen3Config.cmake generated from a 64 bit target can be
# used for 32 bit targets as well (and vice versa).
set (_Eigen3_CMAKE_SIZEOF_VOID_P ${CMAKE_SIZEOF_VOID_P})
unset (CMAKE_SIZEOF_VOID_P)
write_basic_package_version_file (Eigen3ConfigVersion.cmake
VERSION ${EIGEN_VERSION_NUMBER}
COMPATIBILITY SameMajorVersion)
set (CMAKE_SIZEOF_VOID_P ${_Eigen3_CMAKE_SIZEOF_VOID_P})
configure_package_config_file (
${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3Config.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
PATH_VARS EIGEN_INCLUDE_DIR EIGEN_ROOT_DIR
INSTALL_DESTINATION ${CMAKEPACKAGE_INSTALL_DIR}
NO_CHECK_REQUIRED_COMPONENTS_MACRO # Eigen does not provide components
)
# Remove CMAKE_SIZEOF_VOID_P from Eigen3ConfigVersion.cmake since Eigen does
# not depend on architecture specific settings or libraries. More
# specifically, an Eigen3Config.cmake generated from a 64 bit target can be
# used for 32 bit targets as well (and vice versa).
set (_Eigen3_CMAKE_SIZEOF_VOID_P ${CMAKE_SIZEOF_VOID_P})
unset (CMAKE_SIZEOF_VOID_P)
write_basic_package_version_file (Eigen3ConfigVersion.cmake
VERSION ${EIGEN_VERSION_NUMBER}
COMPATIBILITY SameMajorVersion)
set (CMAKE_SIZEOF_VOID_P ${_Eigen3_CMAKE_SIZEOF_VOID_P})
# The Eigen target will be located in the Eigen3 namespace. Other CMake
# targets can refer to it using Eigen3::Eigen.
export (TARGETS eigen NAMESPACE Eigen3:: FILE Eigen3Targets.cmake)
# Export Eigen3 package to CMake registry such that it can be easily found by
# CMake even if it has not been installed to a standard directory.
export (PACKAGE Eigen3)
# The Eigen target will be located in the Eigen3 namespace. Other CMake
# targets can refer to it using Eigen3::Eigen.
export (TARGETS eigen NAMESPACE Eigen3:: FILE Eigen3Targets.cmake)
# Export Eigen3 package to CMake registry such that it can be easily found by
# CMake even if it has not been installed to a standard directory.
export (PACKAGE Eigen3)
install (EXPORT Eigen3Targets NAMESPACE Eigen3:: DESTINATION ${CMAKEPACKAGE_INSTALL_DIR})
else ()
# Fallback to legacy Eigen3Config.cmake without the imported target
# If CMakePackageConfigHelpers module is available (CMake >= 2.8.8)
# create a relocatable Config file, otherwise leave the hardcoded paths
include(CMakePackageConfigHelpers OPTIONAL RESULT_VARIABLE CPCH_PATH)
if(CPCH_PATH)
configure_package_config_file (
${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3ConfigLegacy.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
PATH_VARS EIGEN_INCLUDE_DIR EIGEN_ROOT_DIR
INSTALL_DESTINATION ${CMAKEPACKAGE_INSTALL_DIR}
NO_CHECK_REQUIRED_COMPONENTS_MACRO # Eigen does not provide components
)
else()
# The PACKAGE_* variables are defined by the configure_package_config_file
# but without it we define them manually to the hardcoded paths
set(PACKAGE_INIT "")
set(PACKAGE_EIGEN_INCLUDE_DIR ${EIGEN_INCLUDE_DIR})
set(PACKAGE_EIGEN_ROOT_DIR ${EIGEN_ROOT_DIR})
configure_file ( ${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3ConfigLegacy.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
@ONLY ESCAPE_QUOTES )
endif()
write_basic_package_version_file( Eigen3ConfigVersion.cmake
VERSION ${EIGEN_VERSION_NUMBER}
COMPATIBILITY SameMajorVersion )
endif ()
install (EXPORT Eigen3Targets NAMESPACE Eigen3:: DESTINATION ${CMAKEPACKAGE_INSTALL_DIR})
install ( FILES ${CMAKE_CURRENT_SOURCE_DIR}/cmake/UseEigen3.cmake
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake
@@ -660,3 +643,7 @@ install ( FILES ${CMAKE_CURRENT_SOURCE_DIR}/cmake/UseEigen3.cmake
# Add uninstall target
add_custom_target ( uninstall
COMMAND ${CMAKE_COMMAND} -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/EigenUninstall.cmake)
if (EIGEN_SPLIT_TESTSUITE)
ei_split_testsuite("${EIGEN_SPLIT_TESTSUITE}")
endif()

203
COPYING.APACHE Normal file
View File

@@ -0,0 +1,203 @@
/*
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

View File

@@ -23,4 +23,4 @@
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
*/

View File

@@ -1,52 +1,51 @@
Minpack Copyright Notice (1999) University of Chicago. All rights reserved
Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the
following conditions are met:
1. Redistributions of source code must retain the above
copyright notice, this list of conditions and the following
disclaimer.
2. Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
3. The end-user documentation included with the
redistribution, if any, must include the following
acknowledgment:
"This product includes software developed by the
University of Chicago, as Operator of Argonne National
Laboratory.
Alternately, this acknowledgment may appear in the software
itself, if and wherever such third-party acknowledgments
normally appear.
4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
BE CORRECTED.
5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
POSSIBILITY OF SUCH LOSS OR DAMAGES.
Minpack Copyright Notice (1999) University of Chicago. All rights reserved
Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the
following conditions are met:
1. Redistributions of source code must retain the above
copyright notice, this list of conditions and the following
disclaimer.
2. Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
3. The end-user documentation included with the
redistribution, if any, must include the following
acknowledgment:
"This product includes software developed by the
University of Chicago, as Operator of Argonne National
Laboratory.
Alternately, this acknowledgment may appear in the software
itself, if and wherever such third-party acknowledgments
normally appear.
4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
BE CORRECTED.
5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
POSSIBILITY OF SUCH LOSS OR DAMAGES.

View File

@@ -43,4 +43,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_CHOLESKY_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_CORE_H
#define EIGEN_CORE_H
// first thing Eigen does: stop the compiler from committing suicide
// first thing Eigen does: stop the compiler from reporting useless warnings.
#include "src/Core/util/DisableStupidWarnings.h"
// then include this file where all our macros are defined. It's really important to do it first because
@@ -22,7 +22,7 @@
#include "src/Core/util/ConfigureVectorization.h"
// We need cuda_runtime.h/hip_runtime.h to ensure that
// the EIGEN_USING_STD_MATH macro works properly on the device side
// the EIGEN_USING_STD macro works properly on the device side
#if defined(EIGEN_CUDACC)
#include <cuda_runtime.h>
#elif defined(EIGEN_HIPCC)
@@ -36,10 +36,17 @@
// Disable the ipa-cp-clone optimization flag with MinGW 6.x or newer (enabled by default with -O3)
// See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=556 for details.
#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6)
#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6) && EIGEN_GNUC_AT_MOST(5,5)
#pragma GCC optimize ("-fno-ipa-cp-clone")
#endif
// Prevent ICC from specializing std::complex operators that silently fail
// on device. This allows us to use our own device-compatible specializations
// instead.
#if defined(EIGEN_COMP_ICC) && defined(EIGEN_GPU_COMPILE_PHASE) \
&& !defined(_OVERRIDE_COMPLEX_SPECIALIZATION_)
#define _OVERRIDE_COMPLEX_SPECIALIZATION_ 1
#endif
#include <complex>
// this include file manages BLAS and MKL related macros
@@ -51,6 +58,10 @@
#define EIGEN_HAS_GPU_FP16
#endif
#if defined(EIGEN_HAS_CUDA_BF16) || defined(EIGEN_HAS_HIP_BF16)
#define EIGEN_HAS_GPU_BF16
#endif
#if (defined _OPENMP) && (!defined EIGEN_DONT_PARALLELIZE)
#define EIGEN_HAS_OPENMP
#endif
@@ -72,6 +83,7 @@
#include <cmath>
#include <cassert>
#include <functional>
#include <sstream>
#ifndef EIGEN_NO_IO
#include <iosfwd>
#endif
@@ -107,7 +119,7 @@
#undef isnan
#undef isinf
#undef isfinite
#include <SYCL/sycl.hpp>
#include <CL/sycl.hpp>
#include <map>
#include <memory>
#include <utility>
@@ -162,6 +174,7 @@ using std::ptrdiff_t;
#include "src/Core/arch/Default/ConjHelper.h"
// Generic half float support
#include "src/Core/arch/Default/Half.h"
#include "src/Core/arch/Default/BFloat16.h"
#include "src/Core/arch/Default/TypeCasting.h"
#include "src/Core/arch/Default/GenericPacketMathFunctionsFwd.h"
@@ -202,6 +215,10 @@ using std::ptrdiff_t;
#include "src/Core/arch/NEON/TypeCasting.h"
#include "src/Core/arch/NEON/MathFunctions.h"
#include "src/Core/arch/NEON/Complex.h"
#elif defined EIGEN_VECTORIZE_SVE
#include "src/Core/arch/SVE/PacketMath.h"
#include "src/Core/arch/SVE/TypeCasting.h"
#include "src/Core/arch/SVE/MathFunctions.h"
#elif defined EIGEN_VECTORIZE_ZVECTOR
#include "src/Core/arch/ZVector/PacketMath.h"
#include "src/Core/arch/ZVector/MathFunctions.h"
@@ -329,6 +346,12 @@ using std::ptrdiff_t;
#include "src/Core/CoreIterators.h"
#include "src/Core/ConditionEstimator.h"
#if defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX)
#include "src/Core/arch/AltiVec/MatrixProduct.h"
#elif defined EIGEN_VECTORIZE_NEON
#include "src/Core/arch/NEON/GeneralBlockPanelKernel.h"
#endif
#include "src/Core/BooleanRedux.h"
#include "src/Core/Select.h"
#include "src/Core/VectorwiseOp.h"

View File

@@ -58,4 +58,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_EIGENVALUES_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -50,11 +50,10 @@
#include "src/Geometry/Umeyama.h"
// Use the SSE optimized version whenever possible.
#if defined EIGEN_VECTORIZE_SSE
#include "src/Geometry/arch/Geometry_SSE.h"
#if (defined EIGEN_VECTORIZE_SSE) || (defined EIGEN_VECTORIZE_NEON)
#include "src/Geometry/arch/Geometry_SIMD.h"
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_GEOMETRY_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -27,4 +27,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_HOUSEHOLDER_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -29,5 +29,4 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_JACOBI_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -40,11 +40,10 @@
// Use the SSE optimized version whenever possible. At the moment the
// SSE version doesn't compile when AVX is enabled
#if defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX
#include "src/LU/arch/Inverse_SSE.h"
#if (defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX) || defined EIGEN_VECTORIZE_NEON
#include "src/LU/arch/InverseSize4.h"
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_LU_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -48,4 +48,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_QR_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -37,4 +37,3 @@ void *qRealloc(void *ptr, std::size_t size)
#endif
#endif // EIGEN_QTMALLOC_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -48,4 +48,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SVD_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -45,7 +45,7 @@ namespace internal {
* matrix \f$ A \f$ such that \f$ A = P^TLDL^*P \f$, where P is a permutation matrix, L
* is lower triangular with a unit diagonal and D is a diagonal matrix.
*
* The decomposition uses pivoting to ensure stability, so that L will have
* The decomposition uses pivoting to ensure stability, so that D will have
* zeros in the bottom right rank(A) - n submatrix. Avoiding the square root
* on D also stabilizes the computation.
*
@@ -53,7 +53,7 @@ namespace internal {
* decomposition to determine whether a system of equations has a solution.
*
* This class supports the \link InplaceDecomposition inplace decomposition \endlink mechanism.
*
*
* \sa MatrixBase::ldlt(), SelfAdjointView::ldlt(), class LLT
*/
template<typename _MatrixType, int _UpLo> class LDLT
@@ -200,7 +200,7 @@ template<typename _MatrixType, int _UpLo> class LDLT
* \f$ L^* y_4 = y_3 \f$ and \f$ P x = y_4 \f$ in succession. If the matrix \f$ A \f$ is singular, then
* \f$ D \f$ will also be singular (all the other matrices are invertible). In that case, the
* least-square solution of \f$ D y_3 = y_2 \f$ is computed. This does not mean that this function
* computes the least-square solution of \f$ A x = b \f$ is \f$ A \f$ is singular.
* computes the least-square solution of \f$ A x = b \f$ if \f$ A \f$ is singular.
*
* \sa MatrixBase::ldlt(), SelfAdjointView::ldlt()
*/
@@ -246,8 +246,8 @@ template<typename _MatrixType, int _UpLo> class LDLT
*/
const LDLT& adjoint() const { return *this; };
inline Index rows() const { return m_matrix.rows(); }
inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC inline EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC inline EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
/** \brief Reports whether previous computation was successful.
*

View File

@@ -199,10 +199,10 @@ template<typename _MatrixType, int _UpLo> class LLT
* This method is provided for compatibility with other matrix decompositions, thus enabling generic code such as:
* \code x = decomposition.adjoint().solve(b) \endcode
*/
const LLT& adjoint() const { return *this; };
const LLT& adjoint() const EIGEN_NOEXCEPT { return *this; };
inline Index rows() const { return m_matrix.rows(); }
inline Index cols() const { return m_matrix.cols(); }
inline EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
inline EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
template<typename VectorType>
LLT & rankUpdate(const VectorType& vec, const RealScalar& sigma = 1);

View File

@@ -117,7 +117,7 @@ class Array
{
return Base::_set(other);
}
/** Default constructor.
*
* For fixed-size matrices, does nothing.
@@ -157,7 +157,7 @@ class Array
EIGEN_DEVICE_FUNC
Array& operator=(Array&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_assignable<Scalar>::value)
{
other.swap(*this);
Base::operator=(std::move(other));
return *this;
}
#endif
@@ -177,24 +177,24 @@ class Array
: Base(a0, a1, a2, a3, args...) {}
/** \brief Constructs an array and initializes it from the coefficients given as initializer-lists grouped by row. \cpp11
*
*
* In the general case, the constructor takes a list of rows, each row being represented as a list of coefficients:
*
*
* Example: \include Array_initializer_list_23_cxx11.cpp
* Output: \verbinclude Array_initializer_list_23_cxx11.out
*
*
* Each of the inner initializer lists must contain the exact same number of elements, otherwise an assertion is triggered.
*
*
* In the case of a compile-time column 1D array, implicit transposition from a single row is allowed.
* Therefore <code> Array<int,Dynamic,1>{{1,2,3,4,5}}</code> is legal and the more verbose syntax
* <code>Array<int,Dynamic,1>{{1},{2},{3},{4},{5}}</code> can be avoided:
*
*
* Example: \include Array_initializer_list_vector_cxx11.cpp
* Output: \verbinclude Array_initializer_list_vector_cxx11.out
*
*
* In the case of fixed-sized arrays, the initializer list sizes must exactly match the array sizes,
* and implicit transposition is allowed for compile-time 1D arrays only.
*
*
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC
@@ -241,7 +241,7 @@ class Array
/** constructs an initialized 2D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args) */
Array(const Scalar& val0, const Scalar& val1);
#endif // end EIGEN_PARSED_BY_DOXYGEN
#endif // end EIGEN_PARSED_BY_DOXYGEN
/** constructs an initialized 3D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
@@ -288,8 +288,10 @@ class Array
: Base(other.derived())
{ }
EIGEN_DEVICE_FUNC inline Index innerStride() const { return 1; }
EIGEN_DEVICE_FUNC inline Index outerStride() const { return this->innerSize(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT{ return 1; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return this->innerSize(); }
#ifdef EIGEN_ARRAY_PLUGIN
#include EIGEN_ARRAY_PLUGIN
@@ -322,7 +324,7 @@ class Array
* template parameter, i.e.:
* - `ArrayRowsCols<Type>` where `Rows` and `Cols` can be \c 2,\c 3,\c 4, or \c X for fixed or dynamic size.
* - `ArraySize<Type>` where `Size` can be \c 2,\c 3,\c 4 or \c X for fixed or dynamic size 1D arrays.
*
*
* \sa class Array
*/
@@ -367,7 +369,7 @@ using Array##SizeSuffix##SizeSuffix = Array<Type, Size, Size>; \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##SizeSuffix = Array<Type, Size, 1>;
using Array##SizeSuffix = Array<Type, Size, 1>;
#define EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Size) \
/** \ingroup arraytypedefs */ \
@@ -391,7 +393,7 @@ EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(4)
#undef EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS
#endif // EIGEN_HAS_CXX11
#define EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, SizeSuffix) \
using Eigen::Matrix##SizeSuffix##TypeSuffix; \
using Eigen::Vector##SizeSuffix##TypeSuffix; \

View File

@@ -153,8 +153,8 @@ template<typename Derived> class ArrayBase
// inline void evalTo(Dest& dst) const { dst = matrix(); }
protected:
EIGEN_DEVICE_FUNC
ArrayBase() : Base() {}
EIGEN_DEFAULT_COPY_CONSTRUCTOR(ArrayBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(ArrayBase)
private:
explicit ArrayBase(Index);

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_ARRAYWRAPPER_H
#define EIGEN_ARRAYWRAPPER_H
namespace Eigen {
namespace Eigen {
/** \class ArrayWrapper
* \ingroup Core_Module
@@ -60,14 +60,14 @@ class ArrayWrapper : public ArrayBase<ArrayWrapper<ExpressionType> >
EIGEN_DEVICE_FUNC
explicit EIGEN_STRONG_INLINE ArrayWrapper(ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC
inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC
inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC
inline Index outerStride() const { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC
inline Index innerStride() const { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC
inline ScalarWithConstIfNotLvalue* data() { return m_expression.data(); }
@@ -91,8 +91,8 @@ class ArrayWrapper : public ArrayBase<ArrayWrapper<ExpressionType> >
inline void evalTo(Dest& dst) const { dst = m_expression; }
EIGEN_DEVICE_FUNC
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
{
return m_expression;
}
@@ -158,14 +158,14 @@ class MatrixWrapper : public MatrixBase<MatrixWrapper<ExpressionType> >
EIGEN_DEVICE_FUNC
explicit inline MatrixWrapper(ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC
inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC
inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC
inline Index outerStride() const { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC
inline Index innerStride() const { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC
inline ScalarWithConstIfNotLvalue* data() { return m_expression.data(); }
@@ -185,8 +185,8 @@ class MatrixWrapper : public MatrixBase<MatrixWrapper<ExpressionType> >
}
EIGEN_DEVICE_FUNC
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
{
return m_expression;
}

View File

@@ -17,7 +17,7 @@ namespace Eigen {
// This implementation is based on Assign.h
namespace internal {
/***************************************************************************
* Part 1 : the logic deciding a strategy for traversal and unrolling *
***************************************************************************/
@@ -29,12 +29,12 @@ struct copy_using_evaluator_traits
{
typedef typename DstEvaluator::XprType Dst;
typedef typename Dst::Scalar DstScalar;
enum {
DstFlags = DstEvaluator::Flags,
SrcFlags = SrcEvaluator::Flags
};
public:
enum {
DstAlignment = DstEvaluator::Alignment,
@@ -99,7 +99,8 @@ private:
public:
enum {
Traversal = (int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize)) ? int(LinearVectorizedTraversal)
Traversal = int(Dst::SizeAtCompileTime) == 0 ? int(AllAtOnceTraversal) // If compile-size is zero, traversing will fail at compile-time.
: (int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize)) ? int(LinearVectorizedTraversal)
: int(MayInnerVectorize) ? int(InnerVectorizedTraversal)
: int(MayLinearVectorize) ? int(LinearVectorizedTraversal)
: int(MaySliceVectorize) ? int(SliceVectorizedTraversal)
@@ -137,7 +138,7 @@ public:
? int(CompleteUnrolling)
: int(NoUnrolling) )
: int(Traversal) == int(LinearTraversal)
? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling)
? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling)
: int(NoUnrolling) )
#if EIGEN_UNALIGNED_VECTORIZE
: int(Traversal) == int(SliceVectorizedTraversal)
@@ -199,7 +200,7 @@ struct copy_using_evaluator_DefaultTraversal_CompleteUnrolling
// FIXME: this is not very clean, perhaps this information should be provided by the kernel?
typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
typedef typename DstEvaluatorType::XprType DstXprType;
enum {
outer = Index / DstXprType::InnerSizeAtCompileTime,
inner = Index % DstXprType::InnerSizeAtCompileTime
@@ -265,7 +266,7 @@ struct copy_using_evaluator_innervec_CompleteUnrolling
typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
typedef typename DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum {
outer = Index / DstXprType::InnerSizeAtCompileTime,
inner = Index % DstXprType::InnerSizeAtCompileTime,
@@ -316,6 +317,22 @@ template<typename Kernel,
int Unrolling = Kernel::AssignmentTraits::Unrolling>
struct dense_assignment_loop;
/************************
***** Special Cases *****
************************/
// Zero-sized assignment is a no-op.
template<typename Kernel, int Unrolling>
struct dense_assignment_loop<Kernel, AllAtOnceTraversal, Unrolling>
{
EIGEN_DEVICE_FUNC static void EIGEN_STRONG_INLINE run(Kernel& /*kernel*/)
{
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
EIGEN_STATIC_ASSERT(int(DstXprType::SizeAtCompileTime) == 0,
EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT)
}
};
/************************
*** Default traversal ***
************************/
@@ -430,10 +447,10 @@ struct dense_assignment_loop<Kernel, LinearVectorizedTraversal, CompleteUnrollin
{
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum { size = DstXprType::SizeAtCompileTime,
packetSize =unpacket_traits<PacketType>::size,
alignedSize = (size/packetSize)*packetSize };
alignedSize = (int(size)/packetSize)*packetSize };
copy_using_evaluator_innervec_CompleteUnrolling<Kernel, 0, alignedSize>::run(kernel);
copy_using_evaluator_DefaultTraversal_CompleteUnrolling<Kernel, alignedSize, size>::run(kernel);
@@ -572,14 +589,15 @@ struct dense_assignment_loop<Kernel, SliceVectorizedTraversal, InnerUnrolling>
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum { size = DstXprType::InnerSizeAtCompileTime,
enum { innerSize = DstXprType::InnerSizeAtCompileTime,
packetSize =unpacket_traits<PacketType>::size,
vectorizableSize = (size/packetSize)*packetSize };
vectorizableSize = (innerSize/packetSize)*packetSize,
size = DstXprType::SizeAtCompileTime };
for(Index outer = 0; outer < kernel.outerSize(); ++outer)
{
copy_using_evaluator_innervec_InnerUnrolling<Kernel, 0, vectorizableSize, 0, 0>::run(kernel, outer);
copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, vectorizableSize, size>::run(kernel, outer);
copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, vectorizableSize, innerSize>::run(kernel, outer);
}
}
};
@@ -603,14 +621,14 @@ protected:
typedef typename DstEvaluatorTypeT::XprType DstXprType;
typedef typename SrcEvaluatorTypeT::XprType SrcXprType;
public:
typedef DstEvaluatorTypeT DstEvaluatorType;
typedef SrcEvaluatorTypeT SrcEvaluatorType;
typedef typename DstEvaluatorType::Scalar Scalar;
typedef copy_using_evaluator_traits<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor> AssignmentTraits;
typedef typename AssignmentTraits::PacketType PacketType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
generic_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
: m_dst(dst), m_src(src), m_functor(func), m_dstExpr(dstExpr)
@@ -619,58 +637,58 @@ public:
AssignmentTraits::debug();
#endif
}
EIGEN_DEVICE_FUNC Index size() const { return m_dstExpr.size(); }
EIGEN_DEVICE_FUNC Index innerSize() const { return m_dstExpr.innerSize(); }
EIGEN_DEVICE_FUNC Index outerSize() const { return m_dstExpr.outerSize(); }
EIGEN_DEVICE_FUNC Index rows() const { return m_dstExpr.rows(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_dstExpr.cols(); }
EIGEN_DEVICE_FUNC Index outerStride() const { return m_dstExpr.outerStride(); }
EIGEN_DEVICE_FUNC DstEvaluatorType& dstEvaluator() { return m_dst; }
EIGEN_DEVICE_FUNC const SrcEvaluatorType& srcEvaluator() const { return m_src; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index size() const EIGEN_NOEXCEPT { return m_dstExpr.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index innerSize() const EIGEN_NOEXCEPT { return m_dstExpr.innerSize(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index outerSize() const EIGEN_NOEXCEPT { return m_dstExpr.outerSize(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_dstExpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_dstExpr.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index outerStride() const EIGEN_NOEXCEPT { return m_dstExpr.outerStride(); }
EIGEN_DEVICE_FUNC DstEvaluatorType& dstEvaluator() EIGEN_NOEXCEPT { return m_dst; }
EIGEN_DEVICE_FUNC const SrcEvaluatorType& srcEvaluator() const EIGEN_NOEXCEPT { return m_src; }
/// Assign src(row,col) to dst(row,col) through the assignment functor.
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Index row, Index col)
{
m_functor.assignCoeff(m_dst.coeffRef(row,col), m_src.coeff(row,col));
}
/// \sa assignCoeff(Index,Index)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Index index)
{
m_functor.assignCoeff(m_dst.coeffRef(index), m_src.coeff(index));
}
/// \sa assignCoeff(Index,Index)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeffByOuterInner(Index outer, Index inner)
{
Index row = rowIndexByOuterInner(outer, inner);
Index col = colIndexByOuterInner(outer, inner);
Index row = rowIndexByOuterInner(outer, inner);
Index col = colIndexByOuterInner(outer, inner);
assignCoeff(row, col);
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacket(Index row, Index col)
{
m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(row,col), m_src.template packet<LoadMode,PacketType>(row,col));
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacket(Index index)
{
m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(index), m_src.template packet<LoadMode,PacketType>(index));
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacketByOuterInner(Index outer, Index inner)
{
Index row = rowIndexByOuterInner(outer, inner);
Index row = rowIndexByOuterInner(outer, inner);
Index col = colIndexByOuterInner(outer, inner);
assignPacket<StoreMode,LoadMode,PacketType>(row, col);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index rowIndexByOuterInner(Index outer, Index inner)
{
typedef typename DstEvaluatorType::ExpressionTraits Traits;
@@ -693,7 +711,7 @@ public:
{
return m_dstExpr.data();
}
protected:
DstEvaluatorType& m_dst;
const SrcEvaluatorType& m_src;
@@ -716,13 +734,13 @@ protected:
typedef typename Base::DstXprType DstXprType;
typedef copy_using_evaluator_traits<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor, 4> AssignmentTraits;
typedef typename AssignmentTraits::PacketType PacketType;
EIGEN_DEVICE_FUNC restricted_packet_dense_assignment_kernel(DstEvaluatorTypeT &dst, const SrcEvaluatorTypeT &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{
}
};
/***************************************************************************
* Part 5 : Entry point for dense rectangular assignment
***************************************************************************/
@@ -760,7 +778,7 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType
resize_if_allowed(dst, src, func);
DstEvaluatorType dstEvaluator(dst);
typedef generic_dense_assignment_kernel<DstEvaluatorType,SrcEvaluatorType,Functor> Kernel;
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
@@ -788,7 +806,7 @@ struct EigenBase2EigenBase {};
template<typename,typename> struct AssignmentKind { typedef EigenBase2EigenBase Kind; };
template<> struct AssignmentKind<DenseShape,DenseShape> { typedef Dense2Dense Kind; };
// This is the main assignment class
template< typename DstXprType, typename SrcXprType, typename Functor,
typename Kind = typename AssignmentKind< typename evaluator_traits<DstXprType>::Shape , typename evaluator_traits<SrcXprType>::Shape >::Kind,
@@ -813,7 +831,7 @@ void call_assignment(const Dst& dst, const Src& src)
{
call_assignment(dst, src, internal::assign_op<typename Dst::Scalar,typename Src::Scalar>());
}
// Deal with "assume-aliasing"
template<typename Dst, typename Src, typename Func>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -853,12 +871,12 @@ void call_assignment_no_alias(Dst& dst, const Src& src, const Func& func)
typedef typename internal::conditional<NeedToTranspose, Transpose<Dst>, Dst>::type ActualDstTypeCleaned;
typedef typename internal::conditional<NeedToTranspose, Transpose<Dst>, Dst&>::type ActualDstType;
ActualDstType actualDst(dst);
// TODO check whether this is the right place to perform these checks:
EIGEN_STATIC_ASSERT_LVALUE(Dst)
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(ActualDstTypeCleaned,Src)
EIGEN_CHECK_BINARY_COMPATIBILIY(Func,typename ActualDstTypeCleaned::Scalar,typename Src::Scalar);
Assignment<ActualDstTypeCleaned,Src,Func>::run(actualDst, src, func);
}
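// Sketch of what the checks above reject at compile time (hypothetical snippets):
//   Eigen::Matrix3f m;
//   (m + m) = m;               // EIGEN_STATIC_ASSERT_LVALUE: a sum is not a writable lvalue
//   Eigen::Vector2f v; m = v;  // EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE: fixed 3x3 vs 2x1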
@@ -875,7 +893,7 @@ void call_restricted_packet_assignment_no_alias(Dst& dst, const Src& src, const
SrcEvaluatorType srcEvaluator(src);
resize_if_allowed(dst, src, func);
DstEvaluatorType dstEvaluator(dst);
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
@@ -922,7 +940,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, Dense2Dense, Weak>
#ifndef EIGEN_NO_DEBUG
internal::check_for_aliasing(dst, src);
#endif
call_dense_assignment_loop(dst, src, func);
}
};


@@ -10,7 +10,7 @@
#ifndef EIGEN_BANDMATRIX_H
#define EIGEN_BANDMATRIX_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -45,7 +45,7 @@ class BandMatrixBase : public EigenBase<Derived>
};
public:
using Base::derived;
using Base::rows;
using Base::cols;
@@ -55,10 +55,10 @@ class BandMatrixBase : public EigenBase<Derived>
/** \returns the number of sub diagonals */
inline Index subs() const { return derived().subs(); }
/** \returns an expression of the underlying coefficient matrix */
inline const CoefficientsType& coeffs() const { return derived().coeffs(); }
/** \returns an expression of the underlying coefficient matrix */
inline CoefficientsType& coeffs() { return derived().coeffs(); }
@@ -130,7 +130,7 @@ class BandMatrixBase : public EigenBase<Derived>
eigen_assert((i<0 && -i<=subs()) || (i>=0 && i<=supers()));
return Block<const CoefficientsType,1,Dynamic>(coeffs(), supers()-i, std::max<Index>(0,i), 1, diagonalLength(i));
}
template<typename Dest> inline void evalTo(Dest& dst) const
{
dst.resize(rows(),cols());
@@ -211,16 +211,16 @@ class BandMatrix : public BandMatrixBase<BandMatrix<_Scalar,Rows,Cols,Supers,Sub
}
/** \returns the number of rows */
inline Index rows() const { return m_rows.value(); }
inline EIGEN_CONSTEXPR Index rows() const { return m_rows.value(); }
/** \returns the number of columns */
inline Index cols() const { return m_coeffs.cols(); }
inline EIGEN_CONSTEXPR Index cols() const { return m_coeffs.cols(); }
/** \returns the number of super diagonals */
inline Index supers() const { return m_supers.value(); }
inline EIGEN_CONSTEXPR Index supers() const { return m_supers.value(); }
/** \returns the number of sub diagonals */
inline Index subs() const { return m_subs.value(); }
inline EIGEN_CONSTEXPR Index subs() const { return m_subs.value(); }
inline const CoefficientsType& coeffs() const { return m_coeffs; }
inline CoefficientsType& coeffs() { return m_coeffs; }
@@ -275,16 +275,16 @@ class BandMatrixWrapper : public BandMatrixBase<BandMatrixWrapper<_CoefficientsT
}
/** \returns the number of rows */
inline Index rows() const { return m_rows.value(); }
inline EIGEN_CONSTEXPR Index rows() const { return m_rows.value(); }
/** \returns the number of columns */
inline Index cols() const { return m_coeffs.cols(); }
inline EIGEN_CONSTEXPR Index cols() const { return m_coeffs.cols(); }
/** \returns the number of super diagonals */
inline Index supers() const { return m_supers.value(); }
inline EIGEN_CONSTEXPR Index supers() const { return m_supers.value(); }
/** \returns the number of sub diagonals */
inline Index subs() const { return m_subs.value(); }
inline EIGEN_CONSTEXPR Index subs() const { return m_subs.value(); }
inline const CoefficientsType& coeffs() const { return m_coeffs; }
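// Usage sketch of the API shown above (a hypothetical example; BandMatrix
// lives in the internal namespace and its exact surface may differ):
//   Eigen::internal::BandMatrix<float, 6, 6, 2, 1> bm(6, 6, 2, 1);
//   bm.coeffs().setZero();
//   bm.diagonal().setConstant(2.f);    // main diagonal
//   bm.diagonal(1).setConstant(-1.f);  // first super-diagonal (i <= supers())
//   Eigen::MatrixXf dense;
//   bm.evalTo(dense);                  // expand the banded storage to a dense 6x6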


@@ -11,7 +11,7 @@
#ifndef EIGEN_BLOCK_H
#define EIGEN_BLOCK_H
namespace Eigen {
namespace Eigen {
namespace internal {
template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
@@ -52,7 +52,7 @@ struct traits<Block<XprType, BlockRows, BlockCols, InnerPanel> > : traits<XprTyp
FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0,
Flags = (traits<XprType>::Flags & (DirectAccessBit | (InnerPanel?CompressedAccessBit:0))) | FlagsLvalueBit | FlagsRowMajorBit,
// FIXME DirectAccessBit should not be handled by expressions
//
//
// Alignment is needed by MapBase's assertions
// We can safely set it to false here. Internal alignment errors will be detected by an eigen_internal_assert in the respective evaluator
Alignment = 0
@@ -61,7 +61,7 @@ struct traits<Block<XprType, BlockRows, BlockCols, InnerPanel> > : traits<XprTyp
template<typename XprType, int BlockRows=Dynamic, int BlockCols=Dynamic, bool InnerPanel = false,
bool HasDirectAccess = internal::has_direct_access<XprType>::ret> class BlockImpl_dense;
} // end namespace internal
template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, typename StorageKind> class BlockImpl;
@@ -109,9 +109,9 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel> class
typedef Impl Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(Block)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Block)
typedef typename internal::remove_all<XprType>::type NestedExpression;
/** Column or Row constructor
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -147,7 +147,7 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel> class
&& startCol >= 0 && blockCols >= 0 && startCol <= xpr.cols() - blockCols);
}
};
// The generic default implementation for dense blocks simply forwards to the internal::BlockImpl_dense
// that must be specialized for direct and non-direct access...
template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
@@ -296,23 +296,23 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, bool H
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const
{
return m_xpr;
{
return m_xpr;
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
XprType& nestedExpression() { return m_xpr; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startRow() const
{
return m_startRow.value();
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
StorageIndex startRow() const EIGEN_NOEXCEPT
{
return m_startRow.value();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startCol() const
{
return m_startCol.value();
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
StorageIndex startCol() const EIGEN_NOEXCEPT
{
return m_startCol.value();
}
protected:
@@ -344,7 +344,7 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
BlockImpl_dense(XprType& xpr, Index i)
: Base(xpr.data() + i * ( ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && (!XprTypeIsRowMajor))
: Base(xpr.data() + i * ( ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && (!XprTypeIsRowMajor))
|| ((BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) && ( XprTypeIsRowMajor)) ? xpr.innerStride() : xpr.outerStride()),
BlockRows==1 ? 1 : xpr.rows(),
BlockCols==1 ? 1 : xpr.cols()),
@@ -378,17 +378,17 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const
{
return m_xpr;
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const EIGEN_NOEXCEPT
{
return m_xpr;
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
XprType& nestedExpression() { return m_xpr; }
/** \sa MapBase::innerStride() */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index innerStride() const EIGEN_NOEXCEPT
{
return internal::traits<BlockType>::HasSameStorageOrderAsXprType
? m_xpr.innerStride()
@@ -396,23 +396,19 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
}
/** \sa MapBase::outerStride() */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index outerStride() const EIGEN_NOEXCEPT
{
return m_outerStride;
return internal::traits<BlockType>::HasSameStorageOrderAsXprType
? m_xpr.outerStride()
: m_xpr.innerStride();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startRow() const
{
return m_startRow.value();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
StorageIndex startRow() const EIGEN_NOEXCEPT { return m_startRow.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startCol() const
{
return m_startCol.value();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
StorageIndex startCol() const EIGEN_NOEXCEPT { return m_startCol.value(); }
#ifndef __SUNPRO_CC
// FIXME sunstudio is not friendly with the above friend...
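// Usage sketch (hypothetical): the block offset accessors are now constexpr
// and noexcept, and outerStride() is recomputed from the nested expression
// instead of returning the cached m_outerStride.
//   Eigen::MatrixXd m(8, 8);
//   auto b = m.block(2, 3, 4, 4);
//   // b.startRow() == 2, b.startCol() == 3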


@@ -22,7 +22,7 @@ struct all_unroller
row = (UnrollCount-1) % Rows
};
static inline bool run(const Derived &mat)
EIGEN_DEVICE_FUNC static inline bool run(const Derived &mat)
{
return all_unroller<Derived, UnrollCount-1, Rows>::run(mat) && mat.coeff(row, col);
}
@@ -31,13 +31,13 @@ struct all_unroller
template<typename Derived, int Rows>
struct all_unroller<Derived, 0, Rows>
{
static inline bool run(const Derived &/*mat*/) { return true; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &/*mat*/) { return true; }
};
template<typename Derived, int Rows>
struct all_unroller<Derived, Dynamic, Rows>
{
static inline bool run(const Derived &) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &) { return false; }
};
template<typename Derived, int UnrollCount, int Rows>
@@ -48,7 +48,7 @@ struct any_unroller
row = (UnrollCount-1) % Rows
};
static inline bool run(const Derived &mat)
EIGEN_DEVICE_FUNC static inline bool run(const Derived &mat)
{
return any_unroller<Derived, UnrollCount-1, Rows>::run(mat) || mat.coeff(row, col);
}
@@ -57,13 +57,13 @@ struct any_unroller
template<typename Derived, int Rows>
struct any_unroller<Derived, 0, Rows>
{
static inline bool run(const Derived & /*mat*/) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived & /*mat*/) { return false; }
};
template<typename Derived, int Rows>
struct any_unroller<Derived, Dynamic, Rows>
{
static inline bool run(const Derived &) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &) { return false; }
};
} // end namespace internal
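// With EIGEN_DEVICE_FUNC added, these unrollers are callable from device code;
// host-side semantics are unchanged. Usage sketch (hypothetical):
//   Eigen::Array3f a(1.f, 2.f, 3.f);
//   bool allPositive = (a > 0.f).all();  // fixed size: unrolled via all_unroller
//   bool anyLarge    = (a > 2.5f).any(); // fixed size: unrolled via any_unroller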


@@ -33,6 +33,8 @@ struct CommaInitializer
inline CommaInitializer(XprType& xpr, const Scalar& s)
: m_xpr(xpr), m_row(0), m_col(1), m_currentBlockRows(1)
{
eigen_assert(m_xpr.rows() > 0 && m_xpr.cols() > 0
&& "Cannot comma-initialize a 0x0 matrix (operator<<)");
m_xpr.coeffRef(0,0) = s;
}
@@ -41,6 +43,8 @@ struct CommaInitializer
inline CommaInitializer(XprType& xpr, const DenseBase<OtherDerived>& other)
: m_xpr(xpr), m_row(0), m_col(other.cols()), m_currentBlockRows(other.rows())
{
eigen_assert(m_xpr.rows() >= other.rows() && m_xpr.cols() >= other.cols()
&& "Cannot comma-initialize a 0x0 matrix (operator<<)");
m_xpr.block(0, 0, other.rows(), other.cols()) = other;
}
@@ -103,7 +107,7 @@ struct CommaInitializer
EIGEN_EXCEPTION_SPEC(Eigen::eigen_assert_exception)
#endif
{
finished();
finished();
}
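// Sketch of what the new assertions catch (hypothetical usage):
//   Eigen::Matrix3f m;
//   m << 1, 2, 3,
//        4, 5, 6,
//        7, 8, 9;       // fine: m is 3x3
//   Eigen::MatrixXf e;  // default-constructed, 0x0
//   e << 1.f;           // now fails the eigen_assert instead of writing out of bounds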
/** \returns the built matrix once all its coefficients have been set.


@@ -14,7 +14,7 @@
#define EIGEN_COREEVALUATORS_H
namespace Eigen {
namespace internal {
// This class returns the evaluator kind from the expression storage kind.
@@ -63,8 +63,8 @@ template< typename T,
template< typename T,
typename Kind = typename evaluator_traits<typename T::NestedExpression>::Kind,
typename Scalar = typename T::Scalar> struct unary_evaluator;
// evaluator_traits<T> contains traits for evaluator<T>
// evaluator_traits<T> contains traits for evaluator<T>
template<typename T>
struct evaluator_traits_base
@@ -111,7 +111,7 @@ struct evaluator_base
{
// TODO that's not very nice to have to propagate all these traits. They are currently only needed to handle outer,inner indices.
typedef traits<ExpressionType> ExpressionTraits;
enum {
Alignment = 0
};
@@ -143,8 +143,8 @@ public:
#endif
eigen_internal_assert(outerStride==OuterStride);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index outerStride() const { return OuterStride; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index outerStride() const EIGEN_NOEXCEPT { return OuterStride; }
const Scalar *data;
};
@@ -172,7 +172,7 @@ struct evaluator<PlainObjectBase<Derived> >
IsVectorAtCompileTime = PlainObjectType::IsVectorAtCompileTime,
RowsAtCompileTime = PlainObjectType::RowsAtCompileTime,
ColsAtCompileTime = PlainObjectType::ColsAtCompileTime,
CoeffReadCost = NumTraits<Scalar>::ReadCost,
Flags = traits<Derived>::EvaluatorFlags,
Alignment = traits<Derived>::Alignment
@@ -274,13 +274,13 @@ struct evaluator<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
: evaluator<PlainObjectBase<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> > >
{
typedef Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> XprType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
evaluator() {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& m)
: evaluator<PlainObjectBase<XprType> >(m)
: evaluator<PlainObjectBase<XprType> >(m)
{ }
};
@@ -292,10 +292,10 @@ struct evaluator<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
evaluator() {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& m)
: evaluator<PlainObjectBase<XprType> >(m)
: evaluator<PlainObjectBase<XprType> >(m)
{ }
};
@@ -306,9 +306,9 @@ struct unary_evaluator<Transpose<ArgType>, IndexBased>
: evaluator_base<Transpose<ArgType> >
{
typedef Transpose<ArgType> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
Flags = evaluator<ArgType>::Flags ^ RowMajorBit,
Alignment = evaluator<ArgType>::Alignment
};
@@ -499,10 +499,10 @@ struct evaluator<CwiseNullaryOp<NullaryOp,PlainObjectType> >
{
typedef CwiseNullaryOp<NullaryOp,PlainObjectType> XprType;
typedef typename internal::remove_all<PlainObjectType>::type PlainObjectTypeCleaned;
enum {
CoeffReadCost = internal::functor_traits<NullaryOp>::Cost,
Flags = (evaluator<PlainObjectTypeCleaned>::Flags
& ( HereditaryBits
| (functor_has_linear_access<NullaryOp>::ret ? LinearAccessBit : 0)
@@ -559,10 +559,10 @@ struct unary_evaluator<CwiseUnaryOp<UnaryOp, ArgType>, IndexBased >
: evaluator_base<CwiseUnaryOp<UnaryOp, ArgType> >
{
typedef CwiseUnaryOp<UnaryOp, ArgType> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost + functor_traits<UnaryOp>::Cost,
Flags = evaluator<ArgType>::Flags
& (HereditaryBits | LinearAccessBit | (functor_traits<UnaryOp>::PacketAccess ? PacketAccessBit : 0)),
Alignment = evaluator<ArgType>::Alignment
@@ -628,7 +628,7 @@ struct evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> >
{
typedef CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> XprType;
typedef ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> > Base;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr) : Base(xpr) {}
};
@@ -637,10 +637,10 @@ struct ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>, IndexBased
: evaluator_base<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> >
{
typedef CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> XprType;
enum {
CoeffReadCost = evaluator<Arg1>::CoeffReadCost + evaluator<Arg2>::CoeffReadCost + evaluator<Arg3>::CoeffReadCost + functor_traits<TernaryOp>::Cost,
Arg1Flags = evaluator<Arg1>::Flags,
Arg2Flags = evaluator<Arg2>::Flags,
Arg3Flags = evaluator<Arg3>::Flags,
@@ -723,7 +723,7 @@ struct evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
{
typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> XprType;
typedef binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs> > Base;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& xpr) : Base(xpr) {}
};
@@ -733,10 +733,10 @@ struct binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs>, IndexBased, IndexBase
: evaluator_base<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
{
typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> XprType;
enum {
CoeffReadCost = evaluator<Lhs>::CoeffReadCost + evaluator<Rhs>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
LhsFlags = evaluator<Lhs>::Flags,
RhsFlags = evaluator<Rhs>::Flags,
SameType = is_same<typename Lhs::Scalar,typename Rhs::Scalar>::value,
@@ -813,12 +813,12 @@ struct unary_evaluator<CwiseUnaryView<UnaryOp, ArgType>, IndexBased>
: evaluator_base<CwiseUnaryView<UnaryOp, ArgType> >
{
typedef CwiseUnaryView<UnaryOp, ArgType> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost + functor_traits<UnaryOp>::Cost,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits | LinearAccessBit | DirectAccessBit)),
Alignment = 0 // FIXME it is not very clear why alignment is necessarily lost...
};
@@ -884,7 +884,7 @@ struct mapbase_evaluator : evaluator_base<Derived>
typedef typename XprType::PointerType PointerType;
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
enum {
IsRowMajor = XprType::RowsAtCompileTime,
ColsAtCompileTime = XprType::ColsAtCompileTime,
@@ -956,17 +956,21 @@ struct mapbase_evaluator : evaluator_base<Derived>
internal::pstoret<Scalar, PacketType, StoreMode>(m_data + index * m_innerStride.value(), x);
}
protected:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rowStride() const { return XprType::IsRowMajor ? m_outerStride.value() : m_innerStride.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index colStride() const { return XprType::IsRowMajor ? m_innerStride.value() : m_outerStride.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rowStride() const EIGEN_NOEXCEPT {
return XprType::IsRowMajor ? m_outerStride.value() : m_innerStride.value();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index colStride() const EIGEN_NOEXCEPT {
return XprType::IsRowMajor ? m_innerStride.value() : m_outerStride.value();
}
PointerType m_data;
const internal::variable_if_dynamic<Index, XprType::InnerStrideAtCompileTime> m_innerStride;
const internal::variable_if_dynamic<Index, XprType::OuterStrideAtCompileTime> m_outerStride;
};
template<typename PlainObjectType, int MapOptions, typename StrideType>
template<typename PlainObjectType, int MapOptions, typename StrideType>
struct evaluator<Map<PlainObjectType, MapOptions, StrideType> >
: public mapbase_evaluator<Map<PlainObjectType, MapOptions, StrideType>, PlainObjectType>
{
@@ -974,7 +978,7 @@ struct evaluator<Map<PlainObjectType, MapOptions, StrideType> >
typedef typename XprType::Scalar Scalar;
// TODO: should check for smaller packet types once we can handle multi-sized packet types
typedef typename packet_traits<Scalar>::type PacketScalar;
enum {
InnerStrideAtCompileTime = StrideType::InnerStrideAtCompileTime == 0
? int(PlainObjectType::InnerStrideAtCompileTime)
@@ -986,27 +990,27 @@ struct evaluator<Map<PlainObjectType, MapOptions, StrideType> >
HasNoOuterStride = StrideType::OuterStrideAtCompileTime == 0,
HasNoStride = HasNoInnerStride && HasNoOuterStride,
IsDynamicSize = PlainObjectType::SizeAtCompileTime==Dynamic,
PacketAccessMask = bool(HasNoInnerStride) ? ~int(0) : ~int(PacketAccessBit),
LinearAccessMask = bool(HasNoStride) || bool(PlainObjectType::IsVectorAtCompileTime) ? ~int(0) : ~int(LinearAccessBit),
Flags = int( evaluator<PlainObjectType>::Flags) & (LinearAccessMask&PacketAccessMask),
Alignment = int(MapOptions)&int(AlignedMask)
};
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& map)
: mapbase_evaluator<XprType, PlainObjectType>(map)
: mapbase_evaluator<XprType, PlainObjectType>(map)
{ }
};
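// Sketch of how the masks above play out (assumed types):
//   float data[16] = {};
//   Eigen::Map<Eigen::Vector4f> packed(data);                             // HasNoInnerStride: keeps PacketAccessBit
//   Eigen::Map<Eigen::Vector4f, 0, Eigen::InnerStride<2> > strided(data); // inner stride masks PacketAccessBit out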
// -------------------- Ref --------------------
template<typename PlainObjectType, int RefOptions, typename StrideType>
template<typename PlainObjectType, int RefOptions, typename StrideType>
struct evaluator<Ref<PlainObjectType, RefOptions, StrideType> >
: public mapbase_evaluator<Ref<PlainObjectType, RefOptions, StrideType>, PlainObjectType>
{
typedef Ref<PlainObjectType, RefOptions, StrideType> XprType;
enum {
Flags = evaluator<Map<PlainObjectType, RefOptions, StrideType> >::Flags,
Alignment = evaluator<Map<PlainObjectType, RefOptions, StrideType> >::Alignment
@@ -1014,7 +1018,7 @@ struct evaluator<Ref<PlainObjectType, RefOptions, StrideType> >
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& ref)
: mapbase_evaluator<XprType, PlainObjectType>(ref)
: mapbase_evaluator<XprType, PlainObjectType>(ref)
{ }
};
@@ -1022,8 +1026,8 @@ struct evaluator<Ref<PlainObjectType, RefOptions, StrideType> >
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel,
bool HasDirectAccess = internal::has_direct_access<ArgType>::ret> struct block_evaluator;
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
struct evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
: block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel>
{
@@ -1031,15 +1035,15 @@ struct evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
typedef typename XprType::Scalar Scalar;
// TODO: should check for smaller packet types once we can handle multi-sized packet types
typedef typename packet_traits<Scalar>::type PacketScalar;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
RowsAtCompileTime = traits<XprType>::RowsAtCompileTime,
ColsAtCompileTime = traits<XprType>::ColsAtCompileTime,
MaxRowsAtCompileTime = traits<XprType>::MaxRowsAtCompileTime,
MaxColsAtCompileTime = traits<XprType>::MaxColsAtCompileTime,
ArgTypeIsRowMajor = (int(evaluator<ArgType>::Flags)&RowMajorBit) != 0,
IsRowMajor = (MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1) ? 1
: (MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1) ? 0
@@ -1053,14 +1057,14 @@ struct evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
? int(outer_stride_at_compile_time<ArgType>::ret)
: int(inner_stride_at_compile_time<ArgType>::ret),
MaskPacketAccessBit = (InnerStrideAtCompileTime == 1 || HasSameStorageOrderAsArgType) ? PacketAccessBit : 0,
FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1 || (InnerPanel && (evaluator<ArgType>::Flags&LinearAccessBit))) ? LinearAccessBit : 0,
FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1 || (InnerPanel && (evaluator<ArgType>::Flags&LinearAccessBit))) ? LinearAccessBit : 0,
FlagsRowMajorBit = XprType::Flags&RowMajorBit,
Flags0 = evaluator<ArgType>::Flags & ( (HereditaryBits & ~RowMajorBit) |
DirectAccessBit |
MaskPacketAccessBit),
Flags = Flags0 | FlagsLinearAccessBit | FlagsRowMajorBit,
PacketAlignment = unpacket_traits<PacketScalar>::alignment,
Alignment0 = (InnerPanel && (OuterStrideAtCompileTime!=Dynamic)
&& (OuterStrideAtCompileTime!=0)
@@ -1084,7 +1088,7 @@ struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /*HasDirectAcc
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit block_evaluator(const XprType& block)
: unary_evaluator<XprType>(block)
: unary_evaluator<XprType>(block)
{}
};
@@ -1096,12 +1100,12 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& block)
: m_argImpl(block.nestedExpression()),
m_startRow(block.startRow()),
: m_argImpl(block.nestedExpression()),
m_startRow(block.startRow()),
m_startCol(block.startCol()),
m_linear_offset(ForwardLinearAccess?(ArgType::IsRowMajor ? block.startRow()*block.nestedExpression().cols() + block.startCol() : block.startCol()*block.nestedExpression().rows() + block.startRow()):0)
{ }
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
@@ -1109,13 +1113,13 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
RowsAtCompileTime = XprType::RowsAtCompileTime,
ForwardLinearAccess = (InnerPanel || int(XprType::IsRowMajor)==int(ArgType::IsRowMajor)) && bool(evaluator<ArgType>::Flags&LinearAccessBit)
};
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
return m_argImpl.coeff(m_startRow.value() + row, m_startCol.value() + col);
{
return m_argImpl.coeff(m_startRow.value() + row, m_startCol.value() + col);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
@@ -1124,44 +1128,44 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index row, Index col)
{
return m_argImpl.coeffRef(m_startRow.value() + row, m_startCol.value() + col);
{
return m_argImpl.coeffRef(m_startRow.value() + row, m_startCol.value() + col);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index index)
{
return linear_coeffRef_impl(index, bool_constant<ForwardLinearAccess>());
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index row, Index col) const
{
return m_argImpl.template packet<LoadMode,PacketType>(m_startRow.value() + row, m_startCol.value() + col);
PacketType packet(Index row, Index col) const
{
return m_argImpl.template packet<LoadMode,PacketType>(m_startRow.value() + row, m_startCol.value() + col);
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index index) const
{
PacketType packet(Index index) const
{
if (ForwardLinearAccess)
return m_argImpl.template packet<LoadMode,PacketType>(m_linear_offset.value() + index);
else
return packet<LoadMode,PacketType>(RowsAtCompileTime == 1 ? 0 : index,
RowsAtCompileTime == 1 ? index : 0);
}
template<int StoreMode, typename PacketType>
EIGEN_STRONG_INLINE
void writePacket(Index row, Index col, const PacketType& x)
void writePacket(Index row, Index col, const PacketType& x)
{
return m_argImpl.template writePacket<StoreMode,PacketType>(m_startRow.value() + row, m_startCol.value() + col, x);
return m_argImpl.template writePacket<StoreMode,PacketType>(m_startRow.value() + row, m_startCol.value() + col, x);
}
template<int StoreMode, typename PacketType>
EIGEN_STRONG_INLINE
void writePacket(Index index, const PacketType& x)
void writePacket(Index index, const PacketType& x)
{
if (ForwardLinearAccess)
return m_argImpl.template writePacket<StoreMode,PacketType>(m_linear_offset.value() + index, x);
@@ -1170,12 +1174,12 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
RowsAtCompileTime == 1 ? index : 0,
x);
}
protected:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType linear_coeff_impl(Index index, internal::true_type /* ForwardLinearAccess */) const
{
return m_argImpl.coeff(m_linear_offset.value() + index);
return m_argImpl.coeff(m_linear_offset.value() + index);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType linear_coeff_impl(Index index, internal::false_type /* not ForwardLinearAccess */) const
@@ -1186,7 +1190,7 @@ protected:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& linear_coeffRef_impl(Index index, internal::true_type /* ForwardLinearAccess */)
{
return m_argImpl.coeffRef(m_linear_offset.value() + index);
return m_argImpl.coeffRef(m_linear_offset.value() + index);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& linear_coeffRef_impl(Index index, internal::false_type /* not ForwardLinearAccess */)
@@ -1200,10 +1204,10 @@ protected:
const variable_if_dynamic<Index, ForwardLinearAccess ? Dynamic : 0> m_linear_offset;
};
// TODO: This evaluator does not actually use the child evaluator;
// TODO: This evaluator does not actually use the child evaluator;
// all action is via the data() as returned by the Block expression.
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /* HasDirectAccess */ true>
: mapbase_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>,
typename Block<ArgType, BlockRows, BlockCols, InnerPanel>::PlainObject>
@@ -1213,7 +1217,7 @@ struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /* HasDirectAc
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit block_evaluator(const XprType& block)
: mapbase_evaluator<XprType, typename XprType::PlainObject>(block)
: mapbase_evaluator<XprType, typename XprType::PlainObject>(block)
{
// TODO: for the 3.3 release, this should be turned to an internal assertion, but let's keep it as is for the beta lifetime
eigen_assert(((internal::UIntPtr(block.data()) % EIGEN_PLAIN_ENUM_MAX(1,evaluator<XprType>::Alignment)) == 0) && "data is not aligned");
@@ -1236,7 +1240,7 @@ struct evaluator<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
evaluator<ElseMatrixType>::CoeffReadCost),
Flags = (unsigned int)evaluator<ThenMatrixType>::Flags & evaluator<ElseMatrixType>::Flags & HereditaryBits,
Alignment = EIGEN_PLAIN_ENUM_MIN(evaluator<ThenMatrixType>::Alignment, evaluator<ElseMatrixType>::Alignment)
};
@@ -1248,7 +1252,7 @@ struct evaluator<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
{
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
typedef typename XprType::CoeffReturnType CoeffReturnType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -1268,7 +1272,7 @@ struct evaluator<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
else
return m_elseImpl.coeff(index);
}
protected:
evaluator<ConditionMatrixType> m_conditionImpl;
evaluator<ThenMatrixType> m_thenImpl;
@@ -1278,7 +1282,7 @@ protected:
// -------------------- Replicate --------------------
template<typename ArgType, int RowFactor, int ColFactor>
template<typename ArgType, int RowFactor, int ColFactor>
struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
: evaluator_base<Replicate<ArgType, RowFactor, ColFactor> >
{
@@ -1289,12 +1293,12 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
};
typedef typename internal::nested_eval<ArgType,Factor>::type ArgTypeNested;
typedef typename internal::remove_all<ArgTypeNested>::type ArgTypeNestedCleaned;
enum {
CoeffReadCost = evaluator<ArgTypeNestedCleaned>::CoeffReadCost,
LinearAccessMask = XprType::IsVectorAtCompileTime ? LinearAccessBit : 0,
Flags = (evaluator<ArgTypeNestedCleaned>::Flags & (HereditaryBits|LinearAccessMask) & ~RowMajorBit) | (traits<XprType>::Flags & RowMajorBit),
Alignment = evaluator<ArgTypeNestedCleaned>::Alignment
};
@@ -1305,7 +1309,7 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
m_rows(replicate.nestedExpression().rows()),
m_cols(replicate.nestedExpression().cols())
{}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
@@ -1316,10 +1320,10 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
const Index actual_col = internal::traits<XprType>::ColsAtCompileTime==1 ? 0
: ColFactor==1 ? col
: col % m_cols.value();
return m_argImpl.coeff(actual_row, actual_col);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
@@ -1327,7 +1331,7 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
const Index actual_index = internal::traits<XprType>::RowsAtCompileTime==1
? (ColFactor==1 ? index : index%m_cols.value())
: (RowFactor==1 ? index : index%m_rows.value());
return m_argImpl.coeff(actual_index);
}
@@ -1344,7 +1348,7 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
return m_argImpl.template packet<LoadMode,PacketType>(actual_row, actual_col);
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index index) const
@@ -1355,7 +1359,7 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
return m_argImpl.template packet<LoadMode,PacketType>(actual_index);
}
protected:
const ArgTypeNested m_arg;
evaluator<ArgTypeNestedCleaned> m_argImpl;
@@ -1487,9 +1491,9 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
ReversePacket = (Direction == BothDirections)
|| ((Direction == Vertical) && IsColMajor)
|| ((Direction == Horizontal) && IsRowMajor),
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
// let's enable LinearAccess only with vectorization because of the product overhead
// FIXME enable DirectAccess with negative strides?
Flags0 = evaluator<ArgType>::Flags,
@@ -1498,7 +1502,7 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
? LinearAccessBit : 0,
Flags = int(Flags0) & (HereditaryBits | PacketAccessBit | LinearAccess),
Alignment = 0 // FIXME in some rare cases, Alignment could be preserved, like a Vector4f.
};
@@ -1508,7 +1512,7 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
m_rows(ReverseRow ? reverse.nestedExpression().rows() : 1),
m_cols(ReverseCol ? reverse.nestedExpression().cols() : 1)
{ }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
@@ -1583,7 +1587,7 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
m_argImpl.template writePacket<LoadMode>
(m_rows.value() * m_cols.value() - index - PacketSize, preverse(x));
}
protected:
evaluator<ArgType> m_argImpl;
@@ -1601,12 +1605,12 @@ struct evaluator<Diagonal<ArgType, DiagIndex> >
: evaluator_base<Diagonal<ArgType, DiagIndex> >
{
typedef Diagonal<ArgType, DiagIndex> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
Flags = (unsigned int)(evaluator<ArgType>::Flags & (HereditaryBits | DirectAccessBit) & ~RowMajorBit) | LinearAccessBit,
Alignment = 0
};
@@ -1615,7 +1619,7 @@ struct evaluator<Diagonal<ArgType, DiagIndex> >
: m_argImpl(diagonal.nestedExpression()),
m_index(diagonal.index())
{ }
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
@@ -1648,8 +1652,10 @@ protected:
const internal::variable_if_dynamicindex<Index, XprType::DiagIndex> m_index;
private:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rowOffset() const { return m_index.value() > 0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index colOffset() const { return m_index.value() > 0 ? m_index.value() : 0; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rowOffset() const { return m_index.value() > 0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index colOffset() const { return m_index.value() > 0 ? m_index.value() : 0; }
};
@@ -1673,25 +1679,25 @@ class EvalToTemp
: public dense_xpr_base<EvalToTemp<ArgType> >::type
{
public:
typedef typename dense_xpr_base<EvalToTemp>::type Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(EvalToTemp)
explicit EvalToTemp(const ArgType& arg)
: m_arg(arg)
{ }
const ArgType& arg() const
{
return m_arg;
}
Index rows() const
EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT
{
return m_arg.rows();
}
Index cols() const
EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT
{
return m_arg.cols();
}
@@ -1699,7 +1705,7 @@ class EvalToTemp
private:
const ArgType& m_arg;
};
template<typename ArgType>
struct evaluator<EvalToTemp<ArgType> >
: public evaluator<typename ArgType::PlainObject>
@@ -1707,7 +1713,7 @@ struct evaluator<EvalToTemp<ArgType> >
typedef EvalToTemp<ArgType> XprType;
typedef typename ArgType::PlainObject PlainObject;
typedef evaluator<PlainObject> Base;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr)
: m_result(xpr.arg())
{


@@ -74,7 +74,7 @@ class CwiseBinaryOpImpl;
* \sa MatrixBase::binaryExpr(const MatrixBase<OtherDerived> &,const CustomBinaryOp &) const, class CwiseUnaryOp, class CwiseNullaryOp
*/
template<typename BinaryOp, typename LhsType, typename RhsType>
class CwiseBinaryOp :
class CwiseBinaryOp :
public CwiseBinaryOpImpl<
BinaryOp, LhsType, RhsType,
typename internal::cwise_promote_storage_type<typename internal::traits<LhsType>::StorageKind,
@@ -83,7 +83,7 @@ class CwiseBinaryOp :
internal::no_assignment_operator
{
public:
typedef typename internal::remove_all<BinaryOp>::type Functor;
typedef typename internal::remove_all<LhsType>::type Lhs;
typedef typename internal::remove_all<RhsType>::type Rhs;
@@ -116,21 +116,15 @@ class CwiseBinaryOp :
eigen_assert(aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols());
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT {
// return the fixed size type if available to enable compile time optimizations
if (internal::traits<typename internal::remove_all<LhsNested>::type>::RowsAtCompileTime==Dynamic)
return m_rhs.rows();
else
return m_lhs.rows();
return internal::traits<typename internal::remove_all<LhsNested>::type>::RowsAtCompileTime==Dynamic ? m_rhs.rows() : m_lhs.rows();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT {
// return the fixed size type if available to enable compile time optimizations
if (internal::traits<typename internal::remove_all<LhsNested>::type>::ColsAtCompileTime==Dynamic)
return m_rhs.cols();
else
return m_lhs.cols();
return internal::traits<typename internal::remove_all<LhsNested>::type>::ColsAtCompileTime==Dynamic ? m_rhs.cols() : m_lhs.cols();
}
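// Compile-time effect, sketched (hypothetical usage): with a fixed-size lhs,
// the dimensions of the expression are known at compile time even though the
// rhs is dynamic.
//   Eigen::Matrix3f fixed;
//   Eigen::MatrixXf dyn(3, 3);
//   auto sum = fixed + dyn;  // rows()/cols() resolve through the fixed-size lhs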
/** \returns the left hand side nested expression */


@@ -74,10 +74,10 @@ class CwiseNullaryOp : public internal::dense_xpr_base< CwiseNullaryOp<NullaryOp
&& (ColsAtCompileTime == Dynamic || ColsAtCompileTime == cols));
}
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index rows() const { return m_rows.value(); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index cols() const { return m_cols.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const { return m_rows.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const { return m_cols.value(); }
/** \returns the functor representing the nullary operation */
EIGEN_DEVICE_FUNC
@@ -131,7 +131,7 @@ DenseBase<Derived>::NullaryExpr(Index rows, Index cols, const CustomNullaryOp& f
*
* Here is an example with C++11 random generators: \include random_cpp11.cpp
* Output: \verbinclude random_cpp11.out
*
*
* \sa class CwiseNullaryOp
*/
template<typename Derived>
@@ -383,6 +383,33 @@ PlainObjectBase<Derived>::setConstant(Index rows, Index cols, const Scalar& val)
return setConstant(val);
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to the given value \a val. For the parameter
* of type NoChange_t, just pass the special value \c NoChange.
*
* \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setConstant(NoChange_t, Index cols, const Scalar& val)
{
return setConstant(rows(), cols, val);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to the given value \a val. For the parameter
* of type NoChange_t, just pass the special value \c NoChange.
*
* \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setConstant(Index rows, NoChange_t, const Scalar& val)
{
return setConstant(rows, cols(), val);
}
/**
* \brief Sets a linearly spaced vector.
*
@@ -556,6 +583,32 @@ PlainObjectBase<Derived>::setZero(Index rows, Index cols)
return setConstant(Scalar(0));
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to zero. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa DenseBase::setZero(), setZero(Index), setZero(Index, Index), setZero(Index, NoChange_t), class CwiseNullaryOp, DenseBase::Zero()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setZero(NoChange_t, Index cols)
{
return setZero(rows(), cols);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to zero. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa DenseBase::setZero(), setZero(Index), setZero(Index, Index), setZero(NoChange_t, Index), class CwiseNullaryOp, DenseBase::Zero()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setZero(Index rows, NoChange_t)
{
return setZero(rows, cols());
}
// ones:
/** \returns an expression of a matrix where all coefficients equal one.
@@ -682,6 +735,32 @@ PlainObjectBase<Derived>::setOnes(Index rows, Index cols)
return setConstant(Scalar(1));
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to one. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa MatrixBase::setOnes(), setOnes(Index), setOnes(Index, Index), setOnes(NoChange_t, Index), class CwiseNullaryOp, MatrixBase::Ones()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setOnes(Index rows, NoChange_t)
{
return setOnes(rows, cols());
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to one. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa MatrixBase::setOnes(), setOnes(Index), setOnes(Index, Index), setOnes(Index, NoChange_t), class CwiseNullaryOp, MatrixBase::Ones()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setOnes(NoChange_t, Index cols)
{
return setOnes(rows(), cols);
}
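// Usage sketch of the NoChange_t overloads added in this file:
//   Eigen::MatrixXf m(2, 5);
//   m.setZero(4, Eigen::NoChange);           // resize to 4x5, all zeros
//   m.setOnes(Eigen::NoChange, 3);           // resize to 4x3, all ones
//   m.setConstant(2, Eigen::NoChange, 7.f);  // resize to 2x3, filled with 7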
// Identity:
/** \returns an expression of the identity matrix (not necessarily square).


@@ -11,7 +11,7 @@
#ifndef EIGEN_CWISE_UNARY_OP_H
#define EIGEN_CWISE_UNARY_OP_H
namespace Eigen {
namespace Eigen {
namespace internal {
template<typename UnaryOp, typename XprType>
@@ -24,7 +24,7 @@ struct traits<CwiseUnaryOp<UnaryOp, XprType> >
typedef typename XprType::Nested XprTypeNested;
typedef typename remove_reference<XprTypeNested>::type _XprTypeNested;
enum {
Flags = _XprTypeNested::Flags & RowMajorBit
Flags = _XprTypeNested::Flags & RowMajorBit
};
};
}
@@ -65,10 +65,10 @@ class CwiseUnaryOp : public CwiseUnaryOpImpl<UnaryOp, XprType, typename internal
explicit CwiseUnaryOp(const XprType& xpr, const UnaryOp& func = UnaryOp())
: m_xpr(xpr), m_functor(func) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_xpr.cols(); }
/** \returns the functor representing the unary operation */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE


@@ -64,23 +64,25 @@ class CwiseUnaryView : public CwiseUnaryViewImpl<ViewOp, MatrixType, typename in
typedef typename internal::ref_selector<MatrixType>::non_const_type MatrixTypeNested;
typedef typename internal::remove_all<MatrixType>::type NestedExpression;
explicit inline CwiseUnaryView(MatrixType& mat, const ViewOp& func = ViewOp())
explicit EIGEN_DEVICE_FUNC inline CwiseUnaryView(MatrixType& mat, const ViewOp& func = ViewOp())
: m_matrix(mat), m_functor(func) {}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryView)
EIGEN_STRONG_INLINE Index rows() const { return m_matrix.rows(); }
EIGEN_STRONG_INLINE Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
/** \returns the functor representing unary operation */
const ViewOp& functor() const { return m_functor; }
EIGEN_DEVICE_FUNC const ViewOp& functor() const { return m_functor; }
/** \returns the nested expression */
const typename internal::remove_all<MatrixTypeNested>::type&
EIGEN_DEVICE_FUNC const typename internal::remove_all<MatrixTypeNested>::type&
nestedExpression() const { return m_matrix; }
/** \returns the nested expression */
typename internal::remove_reference<MatrixTypeNested>::type&
EIGEN_DEVICE_FUNC typename internal::remove_reference<MatrixTypeNested>::type&
nestedExpression() { return m_matrix; }
protected:
@@ -108,19 +110,21 @@ class CwiseUnaryViewImpl<ViewOp,MatrixType,Dense>
EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryViewImpl)
EIGEN_DEVICE_FUNC inline Scalar* data() { return &(this->coeffRef(0)); }
EIGEN_DEVICE_FUNC inline const Scalar* data() const { return &(this->coeff(0)); }
EIGEN_DEVICE_FUNC inline Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const
{
return derived().nestedExpression().innerStride() * sizeof(typename internal::traits<MatrixType>::Scalar) / sizeof(Scalar);
}
EIGEN_DEVICE_FUNC inline Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const
{
return derived().nestedExpression().outerStride() * sizeof(typename internal::traits<MatrixType>::Scalar) / sizeof(Scalar);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(CwiseUnaryViewImpl)
};
} // end namespace Eigen
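// The sizeof ratio above is what makes scalar-reinterpreting views line up.
// Sketch (assuming the real() view of a complex matrix):
//   Eigen::MatrixXcf c(3, 3);
//   auto r = c.real();  // CwiseUnaryView over the real parts
//   // r.innerStride() == 1 * sizeof(std::complex<float>) / sizeof(float) == 2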


@@ -14,15 +14,15 @@
namespace Eigen {
namespace internal {
// The index type defined by EIGEN_DEFAULT_DENSE_INDEX_TYPE must be a signed type.
// This dummy function simply aims at checking that at compile time.
static inline void check_DenseIndex_is_signed() {
EIGEN_STATIC_ASSERT(NumTraits<DenseIndex>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE);
EIGEN_STATIC_ASSERT(NumTraits<DenseIndex>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE)
}
} // end namespace internal
/** \class DenseBase
* \ingroup Core_Module
*
@@ -64,12 +64,12 @@ template<typename Derived> class DenseBase
/** The numeric type of the expression's coefficients, e.g. float, double, int or std::complex<float>, etc. */
typedef typename internal::traits<Derived>::Scalar Scalar;
/** The numeric type of the expression's coefficients, e.g. float, double, int or std::complex<float>, etc.
*
* It is an alias for the Scalar type */
typedef Scalar value_type;
typedef typename NumTraits<Scalar>::Real RealScalar;
typedef DenseCoeffsBase<Derived, internal::accessors_level<Derived>::value> Base;
@@ -158,7 +158,7 @@ template<typename Derived> class DenseBase
* a row-vector (if there is only one row). */
NumDimensions = int(MaxSizeAtCompileTime) == 1 ? 0 : bool(IsVectorAtCompileTime) ? 1 : 2,
/**< This value is equal to Tensor::NumDimensions, i.e. 0 for scalars, 1 for vectors,
/**< This value is equal to Tensor::NumDimensions, i.e. 0 for scalars, 1 for vectors,
* and 2 for matrices.
*/
@@ -175,11 +175,11 @@ template<typename Derived> class DenseBase
InnerStrideAtCompileTime = internal::inner_stride_at_compile_time<Derived>::ret,
OuterStrideAtCompileTime = internal::outer_stride_at_compile_time<Derived>::ret
};
typedef typename internal::find_best_packet<Scalar,SizeAtCompileTime>::type PacketScalar;
enum { IsPlainObjectBase = 0 };
/** The plain matrix type corresponding to this expression.
* \sa PlainObject */
typedef Matrix<typename internal::traits<Derived>::Scalar,
@@ -189,7 +189,7 @@ template<typename Derived> class DenseBase
internal::traits<Derived>::MaxRowsAtCompileTime,
internal::traits<Derived>::MaxColsAtCompileTime
> PlainMatrix;
/** The plain array type corresponding to this expression.
* \sa PlainObject */
typedef Array<typename internal::traits<Derived>::Scalar,
@@ -211,7 +211,7 @@ template<typename Derived> class DenseBase
/** \returns the number of nonzero coefficients which is in practice the number
* of stored coefficients. */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index nonZeros() const { return size(); }
/** \returns the outer size.
@@ -219,7 +219,7 @@ template<typename Derived> class DenseBase
* \note For a vector, this returns just 1. For a matrix (non-vector), this is the major dimension
* with respect to the \ref TopicStorageOrders "storage order", i.e., the number of columns for a
* column-major matrix, and the number of rows for a row-major matrix. */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index outerSize() const
{
return IsVectorAtCompileTime ? 1
@@ -229,9 +229,9 @@ template<typename Derived> class DenseBase
/** \returns the inner size.
*
* \note For a vector, this is just the size. For a matrix (non-vector), this is the minor dimension
* with respect to the \ref TopicStorageOrders "storage order", i.e., the number of rows for a
* with respect to the \ref TopicStorageOrders "storage order", i.e., the number of rows for a
* column-major matrix, and the number of columns for a row-major matrix. */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index innerSize() const
{
return IsVectorAtCompileTime ? this->size()
@@ -411,7 +411,7 @@ template<typename Derived> class DenseBase
// size types on MSVC.
return typename internal::eval<Derived>::type(derived());
}
/** swaps *this with the expression \a other.
*
*/
@@ -449,18 +449,58 @@ template<typename Derived> class DenseBase
EIGEN_DEVICE_FUNC Scalar prod() const;
template<int NaNPropagation>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar minCoeff() const;
template<int NaNPropagation>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar maxCoeff() const;
template<typename IndexType> EIGEN_DEVICE_FUNC
// By default, the fastest version with undefined NaN propagation semantics is
// used.
// TODO(rmlarsen): Replace with default template argument when we move to
// c++11 or beyond.
EIGEN_DEVICE_FUNC inline typename internal::traits<Derived>::Scalar minCoeff() const {
return minCoeff<PropagateFast>();
}
EIGEN_DEVICE_FUNC inline typename internal::traits<Derived>::Scalar maxCoeff() const {
return maxCoeff<PropagateFast>();
}
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar minCoeff(IndexType* row, IndexType* col) const;
template<typename IndexType> EIGEN_DEVICE_FUNC
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar maxCoeff(IndexType* row, IndexType* col) const;
template<typename IndexType> EIGEN_DEVICE_FUNC
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar minCoeff(IndexType* index) const;
template<typename IndexType> EIGEN_DEVICE_FUNC
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar maxCoeff(IndexType* index) const;
// TODO(rmlarsen): Replace these methods with a default template argument.
template<typename IndexType>
EIGEN_DEVICE_FUNC inline
typename internal::traits<Derived>::Scalar minCoeff(IndexType* row, IndexType* col) const {
return minCoeff<PropagateFast>(row, col);
}
template<typename IndexType>
EIGEN_DEVICE_FUNC inline
typename internal::traits<Derived>::Scalar maxCoeff(IndexType* row, IndexType* col) const {
return maxCoeff<PropagateFast>(row, col);
}
template<typename IndexType>
EIGEN_DEVICE_FUNC inline
typename internal::traits<Derived>::Scalar minCoeff(IndexType* index) const {
return minCoeff<PropagateFast>(index);
}
template<typename IndexType>
EIGEN_DEVICE_FUNC inline
typename internal::traits<Derived>::Scalar maxCoeff(IndexType* index) const {
return maxCoeff<PropagateFast>(index);
}
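// Sketch of the three propagation modes (assuming the NaNPropagation
// enumerators referenced above):
//   Eigen::Array3f a(1.f, std::numeric_limits<float>::quiet_NaN(), 3.f);
//   float fast = a.maxCoeff();                           // PropagateFast: NaN semantics unspecified
//   float num  = a.maxCoeff<Eigen::PropagateNumbers>();  // skips NaNs -> 3
//   float nan  = a.maxCoeff<Eigen::PropagateNaN>();      // propagates NaN -> NaN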
template<typename BinaryOp>
EIGEN_DEVICE_FUNC
Scalar redux(const BinaryOp& func) const;
@@ -530,16 +570,16 @@ template<typename Derived> class DenseBase
static const RandomReturnType Random();
template<typename ThenDerived,typename ElseDerived>
const Select<Derived,ThenDerived,ElseDerived>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived,ElseDerived>
select(const DenseBase<ThenDerived>& thenMatrix,
const DenseBase<ElseDerived>& elseMatrix) const;
template<typename ThenDerived>
inline const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
select(const DenseBase<ThenDerived>& thenMatrix, const typename ThenDerived::Scalar& elseScalar) const;
template<typename ElseDerived>
inline const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
inline EIGEN_DEVICE_FUNC const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
select(const typename ElseDerived::Scalar& thenScalar, const DenseBase<ElseDerived>& elseMatrix) const;
template<int p> RealScalar lpNorm() const;
@@ -636,11 +676,12 @@ template<typename Derived> class DenseBase
}
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(DenseBase)
/** Default constructor. Do nothing. */
EIGEN_DEVICE_FUNC DenseBase()
{
/* Just checks for self-consistency of the flags.
* Only do it when debugging Eigen, as this borders on paranoiac and could slow compilation down
* Only do it when debugging Eigen, as this borders on paranoia and could slow compilation down
*/
#ifdef EIGEN_INTERNAL_DEBUGGING
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, int(IsRowMajor))


@@ -27,7 +27,7 @@ template<typename T> struct add_const_on_value_type_if_arithmetic
*
* This class defines the \c operator() \c const function and friends, which can be used to read specific
* entries of a matrix or array.
*
* \sa DenseCoeffsBase<Derived, WriteAccessors>, DenseCoeffsBase<Derived, DirectAccessors>,
* \ref TopicClassHierarchy
*/
@@ -295,7 +295,7 @@ class DenseCoeffsBase<Derived,ReadOnlyAccessors> : public EigenBase<Derived>
* This class defines the non-const \c operator() function and friends, which can be used to write specific
* entries of a matrix or array. This class inherits DenseCoeffsBase<Derived, ReadOnlyAccessors> which
* defines the const variant for reading specific entries.
*
* \sa DenseCoeffsBase<Derived, DirectAccessors>, \ref TopicClassHierarchy
*/
template<typename Derived>
@@ -495,7 +495,7 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
*
* \sa outerStride(), rowStride(), colStride()
*/
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const
{
return derived().innerStride();
@@ -506,14 +506,14 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
*
* \sa innerStride(), rowStride(), colStride()
*/
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const
{
return derived().outerStride();
}
// FIXME shall we remove it ?
inline Index stride() const
EIGEN_CONSTEXPR inline Index stride() const
{
return Derived::IsVectorAtCompileTime ? innerStride() : outerStride();
}
@@ -522,7 +522,7 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
*
* \sa innerStride(), outerStride(), colStride()
*/
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rowStride() const
{
return Derived::IsRowMajor ? outerStride() : innerStride();
@@ -532,7 +532,7 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
*
* \sa innerStride(), outerStride(), rowStride()
*/
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index colStride() const
{
return Derived::IsRowMajor ? innerStride() : outerStride();
@@ -570,8 +570,8 @@ class DenseCoeffsBase<Derived, DirectWriteAccessors>
*
* \sa outerStride(), rowStride(), colStride()
*/
EIGEN_DEVICE_FUNC
inline Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT
{
return derived().innerStride();
}
@@ -581,14 +581,14 @@ class DenseCoeffsBase<Derived, DirectWriteAccessors>
*
* \sa innerStride(), rowStride(), colStride()
*/
EIGEN_DEVICE_FUNC
inline Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT
{
return derived().outerStride();
}
// FIXME shall we remove it ?
inline Index stride() const
EIGEN_CONSTEXPR inline Index stride() const EIGEN_NOEXCEPT
{
return Derived::IsVectorAtCompileTime ? innerStride() : outerStride();
}
@@ -597,8 +597,8 @@ class DenseCoeffsBase<Derived, DirectWriteAccessors>
*
* \sa innerStride(), outerStride(), colStride()
*/
EIGEN_DEVICE_FUNC
inline Index rowStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rowStride() const EIGEN_NOEXCEPT
{
return Derived::IsRowMajor ? outerStride() : innerStride();
}
@@ -607,8 +607,8 @@ class DenseCoeffsBase<Derived, DirectWriteAccessors>
*
* \sa innerStride(), outerStride(), rowStride()
*/
EIGEN_DEVICE_FUNC
inline Index colStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index colStride() const EIGEN_NOEXCEPT
{
return Derived::IsRowMajor ? innerStride() : outerStride();
}
@@ -619,7 +619,7 @@ namespace internal {
template<int Alignment, typename Derived, bool JustReturnZero>
struct first_aligned_impl
{
static inline Index run(const Derived&)
static EIGEN_CONSTEXPR inline Index run(const Derived&) EIGEN_NOEXCEPT
{ return 0; }
};


@@ -47,20 +47,20 @@ struct plain_array
EIGEN_DEVICE_FUNC
plain_array()
{
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
#if defined(EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT)
#define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask)
#elif EIGEN_GNUC_AT_LEAST(4,7)
// GCC 4.7 is too aggressive in its optimizations and removes the alignment test based on the fact that the array is declared to be aligned.
// See this bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53900
// Hiding the origin of the array pointer behind a function argument seems to do the trick even if the function is inlined:
@@ -85,15 +85,15 @@ struct plain_array<T, Size, MatrixOrArrayOptions, 8>
EIGEN_ALIGN_TO_BOUNDARY(8) T array[Size];
EIGEN_DEVICE_FUNC
plain_array()
{
EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(7);
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
@@ -104,15 +104,15 @@ struct plain_array<T, Size, MatrixOrArrayOptions, 16>
EIGEN_ALIGN_TO_BOUNDARY(16) T array[Size];
EIGEN_DEVICE_FUNC
plain_array()
{
EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(15);
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
@@ -123,15 +123,15 @@ struct plain_array<T, Size, MatrixOrArrayOptions, 32>
EIGEN_ALIGN_TO_BOUNDARY(32) T array[Size];
EIGEN_DEVICE_FUNC
plain_array()
{
EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(31);
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
@@ -142,15 +142,15 @@ struct plain_array<T, Size, MatrixOrArrayOptions, 64>
EIGEN_ALIGN_TO_BOUNDARY(64) T array[Size];
EIGEN_DEVICE_FUNC
plain_array()
{
EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(63);
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
@@ -190,16 +190,28 @@ template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseSt
EIGEN_DEVICE_FUNC
explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()) {}
EIGEN_DEVICE_FUNC
DenseStorage(const DenseStorage& other) : m_data(other.m_data) {
EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(Index size = Size)
}
EIGEN_DEVICE_FUNC
DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other) m_data = other.m_data;
return *this;
}
#if EIGEN_HAS_RVALUE_REFERENCES
EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT
: m_data(std::move(other.m_data))
{
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT
{
if (this != &other)
m_data = std::move(other.m_data);
return *this;
}
#endif
EIGEN_DEVICE_FUNC DenseStorage(Index size, Index rows, Index cols) {
EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({})
eigen_internal_assert(size==rows*cols && rows==_Rows && cols==_Cols);
@@ -210,8 +222,8 @@ template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseSt
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) {
numext::swap(m_data, other.m_data);
}
EIGEN_DEVICE_FUNC static Index rows(void) {return _Rows;}
EIGEN_DEVICE_FUNC static Index cols(void) {return _Cols;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) EIGEN_NOEXCEPT {return _Cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index,Index,Index) {}
EIGEN_DEVICE_FUNC void resize(Index,Index,Index) {}
EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
@@ -228,8 +240,8 @@ template<typename T, int _Rows, int _Cols, int _Options> class DenseStorage<T, 0
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage&) { return *this; }
EIGEN_DEVICE_FUNC DenseStorage(Index,Index,Index) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& ) {}
EIGEN_DEVICE_FUNC static Index rows(void) {return _Rows;}
EIGEN_DEVICE_FUNC static Index cols(void) {return _Cols;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) EIGEN_NOEXCEPT {return _Cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index,Index,Index) {}
EIGEN_DEVICE_FUNC void resize(Index,Index,Index) {}
EIGEN_DEVICE_FUNC const T *data() const { return 0; }
@@ -257,15 +269,15 @@ template<typename T, int Size, int _Options> class DenseStorage<T, Size, Dynamic
EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0), m_cols(0) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_rows(other.m_rows), m_cols(other.m_cols) {}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other)
{
m_data = other.m_data;
m_rows = other.m_rows;
m_cols = other.m_cols;
}
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index cols) : m_rows(rows), m_cols(cols) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
@@ -292,14 +304,14 @@ template<typename T, int Size, int _Cols, int _Options> class DenseStorage<T, Si
EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_rows(other.m_rows) {}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other)
{
m_data = other.m_data;
m_rows = other.m_rows;
}
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index) : m_rows(rows) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
@@ -307,8 +319,8 @@ template<typename T, int Size, int _Cols, int _Options> class DenseStorage<T, Si
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return _Cols;}
EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols(void) const EIGEN_NOEXCEPT {return _Cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index, Index rows, Index) { m_rows = rows; }
EIGEN_DEVICE_FUNC void resize(Index, Index rows, Index) { m_rows = rows; }
EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
@@ -339,8 +351,8 @@ template<typename T, int Size, int _Rows, int _Options> class DenseStorage<T, Si
numext::swap(m_data,other.m_data);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows(void) const EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index, Index, Index cols) { m_cols = cols; }
EIGEN_DEVICE_FUNC void resize(Index, Index, Index cols) { m_cols = cols; }
EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
@@ -407,8 +419,8 @@ template<typename T, int _Options> class DenseStorage<T, Dynamic, Dynamic, Dynam
numext::swap(m_rows,other.m_rows);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;}
EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;}
void conservativeResize(Index size, Index rows, Index cols)
{
m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, m_rows*m_cols);
@@ -462,7 +474,7 @@ template<typename T, int _Rows, int _Options> class DenseStorage<T, Dynamic, _Ro
this->swap(tmp);
}
return *this;
}
#if EIGEN_HAS_RVALUE_REFERENCES
EIGEN_DEVICE_FUNC
DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT
@@ -485,8 +497,8 @@ template<typename T, int _Rows, int _Options> class DenseStorage<T, Dynamic, _Ro
numext::swap(m_data,other.m_data);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC static Index rows(void) {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index size, Index, Index cols)
{
m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, _Rows*m_cols);
@@ -538,7 +550,7 @@ template<typename T, int _Cols, int _Options> class DenseStorage<T, Dynamic, Dyn
this->swap(tmp);
}
return *this;
}
#if EIGEN_HAS_RVALUE_REFERENCES
EIGEN_DEVICE_FUNC
DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT
@@ -561,8 +573,8 @@ template<typename T, int _Cols, int _Options> class DenseStorage<T, Dynamic, Dyn
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC static Index cols(void) {return _Cols;}
EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) {return _Cols;}
void conservativeResize(Index size, Index rows, Index)
{
m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, m_rows*_Cols);


@@ -11,7 +11,7 @@
#ifndef EIGEN_DIAGONAL_H
#define EIGEN_DIAGONAL_H
namespace Eigen {
/** \class Diagonal
* \ingroup Core_Module
@@ -84,20 +84,16 @@ template<typename MatrixType, int _DiagIndex> class Diagonal
: numext::mini<Index>(m_matrix.rows(),m_matrix.cols()-m_index.value());
}
EIGEN_DEVICE_FUNC
inline Index cols() const { return 1; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return 1; }
EIGEN_DEVICE_FUNC
inline Index innerStride() const
{
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT {
return m_matrix.outerStride() + 1;
}
EIGEN_DEVICE_FUNC
inline Index outerStride() const
{
return 0;
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return 0; }
typedef typename internal::conditional<
internal::is_lvalue<MatrixType>::value,
@@ -149,8 +145,8 @@ template<typename MatrixType, int _DiagIndex> class Diagonal
}
EIGEN_DEVICE_FUNC
inline const typename internal::remove_all<typename MatrixType::Nested>::type&
nestedExpression() const
{
return m_matrix;
}
@@ -167,12 +163,12 @@ template<typename MatrixType, int _DiagIndex> class Diagonal
private:
// some compilers may fail to optimize std::max etc in case of compile-time constants...
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index absDiagIndex() const { return m_index.value()>0 ? m_index.value() : -m_index.value(); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index rowOffset() const { return m_index.value()>0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index colOffset() const { return m_index.value()>0 ? m_index.value() : 0; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index absDiagIndex() const EIGEN_NOEXCEPT { return m_index.value()>0 ? m_index.value() : -m_index.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rowOffset() const EIGEN_NOEXCEPT { return m_index.value()>0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index colOffset() const EIGEN_NOEXCEPT { return m_index.value()>0 ? m_index.value() : 0; }
// trigger a compile-time error if someone try to call packet
template<int LoadMode> typename MatrixType::PacketReturnType packet(Index) const;
template<int LoadMode> typename MatrixType::PacketReturnType packet(Index,Index) const;


@@ -207,7 +207,7 @@ struct lpNorm_selector
EIGEN_DEVICE_FUNC
static inline RealScalar run(const MatrixBase<Derived>& m)
{
EIGEN_USING_STD_MATH(pow)
EIGEN_USING_STD(pow)
return pow(m.cwiseAbs().array().pow(p).sum(), RealScalar(1)/p);
}
};


@@ -15,7 +15,7 @@ namespace Eigen {
/** \class EigenBase
* \ingroup Core_Module
*
* Common base class for all classes T such that MatrixBase has an operator=(T) and a constructor MatrixBase(T).
*
* In other words, an EigenBase object is an object that can be copied into a MatrixBase.
@@ -29,7 +29,7 @@ namespace Eigen {
template<typename Derived> struct EigenBase
{
// typedef typename internal::plain_matrix_type<Derived>::type PlainObject;
/** \brief The interface type of indices
* \details To change this, \c \#define the preprocessor symbol \c EIGEN_DEFAULT_DENSE_INDEX_TYPE.
* \sa StorageIndex, \ref TopicPreprocessorDirectives.
@@ -56,15 +56,15 @@ template<typename Derived> struct EigenBase
{ return *static_cast<const Derived*>(this); }
/** \returns the number of rows. \sa cols(), RowsAtCompileTime */
EIGEN_DEVICE_FUNC
inline Index rows() const { return derived().rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return derived().rows(); }
/** \returns the number of columns. \sa rows(), ColsAtCompileTime*/
EIGEN_DEVICE_FUNC
inline Index cols() const { return derived().cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return derived().cols(); }
/** \returns the number of coefficients, which is rows()*cols().
* \sa rows(), cols(), SizeAtCompileTime. */
EIGEN_DEVICE_FUNC
inline Index size() const { return rows() * cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index size() const EIGEN_NOEXCEPT { return rows() * cols(); }
/** \internal Don't use it, but do the equivalent: \code dst = *this; \endcode */
template<typename Dest>


@@ -41,10 +41,14 @@ template<typename ExpressionType> class ForceAlignedAccess
EIGEN_DEVICE_FUNC explicit inline ForceAlignedAccess(const ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC inline Index outerStride() const { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC inline Index innerStride() const { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index row, Index col) const
{


@@ -228,8 +228,7 @@ template<> struct gemv_dense_selector<OnTheRight,ColMajor,true>
ActualLhsType actualLhs = LhsBlasTraits::extract(lhs);
ActualRhsType actualRhs = RhsBlasTraits::extract(rhs);
ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(lhs)
* RhsBlasTraits::extractScalarFactor(rhs);
ResScalar actualAlpha = combine_scalar_factors(alpha, lhs, rhs);
// make sure Dest is a compile-time vector type (bug 1166)
typedef typename conditional<Dest::IsVectorAtCompileTime, Dest, typename Dest::ColXpr>::type ActualDest;
@@ -320,8 +319,7 @@ template<> struct gemv_dense_selector<OnTheRight,RowMajor,true>
typename add_const<ActualLhsType>::type actualLhs = LhsBlasTraits::extract(lhs);
typename add_const<ActualRhsType>::type actualRhs = RhsBlasTraits::extract(rhs);
ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(lhs)
* RhsBlasTraits::extractScalarFactor(rhs);
ResScalar actualAlpha = combine_scalar_factors(alpha, lhs, rhs);
enum {
// FIXME find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1
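Presumed semantics of the new helper, inferred from the two lines it replaces in both gemv selectors above (combine_scalar_factors itself is not shown in this diff):

// Assumed equivalent of the replaced expression:
//   combine_scalar_factors(alpha, lhs, rhs) ==
//       alpha * LhsBlasTraits::extractScalarFactor(lhs)
//             * RhsBlasTraits::extractScalarFactor(rhs)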


@@ -44,19 +44,23 @@ struct default_packet_traits
enum {
HasHalfPacket = 0,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasNegate = 1,
HasAbs = 1,
HasArg = 0,
HasAbs2 = 1,
HasMin = 1,
HasMax = 1,
HasConj = 1,
HasAdd = 1,
HasSub = 1,
HasShift = 1,
HasMul = 1,
HasNegate = 1,
HasAbs = 1,
HasArg = 0,
HasAbs2 = 1,
HasAbsDiff = 0,
HasMin = 1,
HasMax = 1,
HasConj = 1,
HasSetLinear = 1,
HasBlend = 0,
HasReduxp = 1,
HasBlend = 0,
// This flag is used to indicate whether packet comparison is supported.
// pcmp_eq, pcmp_lt and pcmp_le should be defined for it to be true.
HasCmp = 0,
HasDiv = 0,
HasSqrt = 0,
@@ -92,9 +96,9 @@ struct default_packet_traits
HasBetaInc = 0,
HasRound = 0,
HasRint = 0,
HasFloor = 0,
HasCeil = 0,
HasSign = 0
};
};
@@ -133,6 +137,22 @@ template <typename Src, typename Tgt> struct type_casting_traits {
};
};
/** \internal Wrapper to ensure that multiple packet types can map to the same
underlying vector type. */
template<typename T, int unique_id = 0>
struct eigen_packet_wrapper
{
EIGEN_ALWAYS_INLINE operator T&() { return m_val; }
EIGEN_ALWAYS_INLINE operator const T&() const { return m_val; }
EIGEN_ALWAYS_INLINE eigen_packet_wrapper() {}
EIGEN_ALWAYS_INLINE eigen_packet_wrapper(const T &v) : m_val(v) {}
EIGEN_ALWAYS_INLINE eigen_packet_wrapper& operator=(const T &v) {
m_val = v;
return *this;
}
T m_val;
};
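A sketch of why the wrapper is needed; the packet names are hypothetical, and the snippet assumes an x86 target where several logical packet types share one register type:

#include <xmmintrin.h>
// Both wrappers hold an __m128, but distinct unique_id values make them
// distinct C++ types, so per-packet overloads (preverse, pfirst, ...) can
// still be resolved even though the raw vector type is identical.
typedef eigen_packet_wrapper<__m128, 0> PacketA;  // hypothetical name
typedef eigen_packet_wrapper<__m128, 1> PacketB;  // hypothetical name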
/** \internal \returns static_cast<TgtType>(a) (coeff-wise) */
template <typename SrcPacket, typename TgtPacket>
@@ -145,12 +165,17 @@ EIGEN_DEVICE_FUNC inline TgtPacket
pcast(const SrcPacket& a, const SrcPacket& /*b*/) {
return static_cast<TgtPacket>(a);
}
template <typename SrcPacket, typename TgtPacket>
EIGEN_DEVICE_FUNC inline TgtPacket
pcast(const SrcPacket& a, const SrcPacket& /*b*/, const SrcPacket& /*c*/, const SrcPacket& /*d*/) {
return static_cast<TgtPacket>(a);
}
template <typename SrcPacket, typename TgtPacket>
EIGEN_DEVICE_FUNC inline TgtPacket
pcast(const SrcPacket& a, const SrcPacket& /*b*/, const SrcPacket& /*c*/, const SrcPacket& /*d*/,
const SrcPacket& /*e*/, const SrcPacket& /*f*/, const SrcPacket& /*g*/, const SrcPacket& /*h*/) {
return static_cast<TgtPacket>(a);
}
/** \internal \returns reinterpret_cast<Target>(a) */
template <typename Target, typename Packet>
@@ -160,6 +185,9 @@ preinterpret(const Packet& a); /* { return reinterpret_cast<const Target&>(a); }
/** \internal \returns a + b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
padd(const Packet& a, const Packet& b) { return a+b; }
// Avoid compiler warning for boolean algebra.
template<> EIGEN_DEVICE_FUNC inline bool
padd(const bool& a, const bool& b) { return a || b; }
/** \internal \returns a - b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -169,128 +197,31 @@ psub(const Packet& a, const Packet& b) { return a-b; }
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pnegate(const Packet& a) { return -a; }
template<> EIGEN_DEVICE_FUNC inline bool
pnegate(const bool& a) { return !a; }
/** \internal \returns conj(a) (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pconj(const Packet& a) { return numext::conj(a); }
/** \internal \returns a * b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmul(const Packet& a, const Packet& b) { return a*b; }
// Avoid compiler warning for boolean algebra.
template<> EIGEN_DEVICE_FUNC inline bool
pmul(const bool& a, const bool& b) { return a && b; }
/** \internal \returns a / b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pdiv(const Packet& a, const Packet& b) { return a/b; }
/** \internal \returns the min of \a a and \a b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmin(const Packet& a, const Packet& b) { return numext::mini(a, b); }
/** \internal \returns the max of \a a and \a b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmax(const Packet& a, const Packet& b) { return numext::maxi(a, b); }
/** \internal \returns the absolute value of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pabs(const Packet& a) { using std::abs; return abs(a); }
/** \internal \returns the phase angle of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
parg(const Packet& a) { using numext::arg; return arg(a); }
/** \internal \returns the bitwise and of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pand(const Packet& a, const Packet& b) { return a & b; }
/** \internal \returns the bitwise or of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
por(const Packet& a, const Packet& b) { return a | b; }
/** \internal \returns the bitwise xor of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pxor(const Packet& a, const Packet& b) { return a ^ b; }
/** \internal \returns the bitwise andnot of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pandnot(const Packet& a, const Packet& b) { return a & (~b); }
/** \internal \returns ones */
/** \internal \returns one bits */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
ptrue(const Packet& /*a*/) { Packet b; memset((void*)&b, 0xff, sizeof(b)); return b;}
template <typename RealScalar>
EIGEN_DEVICE_FUNC inline std::complex<RealScalar> ptrue(const std::complex<RealScalar>& /*a*/) {
RealScalar b;
b = ptrue(b);
return std::complex<RealScalar>(b, b);
}
/** \internal \returns the bitwise not of \a a */
template <typename Packet> EIGEN_DEVICE_FUNC inline Packet
pnot(const Packet& a) { return pxor(ptrue(a), a);}
/** \internal \returns \a a shifted by N bits to the right */
template<int N> EIGEN_DEVICE_FUNC inline int
pshiftright(const int& a) { return a >> N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
pshiftright(const long int& a) { return a >> N; }
/** \internal \returns \a a shifted by N bits to the left */
template<int N> EIGEN_DEVICE_FUNC inline int
pshiftleft(const int& a) { return a << N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
pshiftleft(const long int& a) { return a << N; }
/** \internal \returns the significand and exponent of the underlying floating point numbers
* See https://en.cppreference.com/w/cpp/numeric/math/frexp
*/
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet pfrexp(const Packet& a, Packet& exponent) {
int exp;
EIGEN_USING_STD_MATH(frexp);
Packet result = frexp(a, &exp);
exponent = static_cast<Packet>(exp);
return result;
}
/** \internal \returns a * 2^exponent
* See https://en.cppreference.com/w/cpp/numeric/math/ldexp
*/
/** \internal \returns zero bits */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pldexp(const Packet &a, const Packet &exponent) {
EIGEN_USING_STD_MATH(ldexp);
return ldexp(a, static_cast<int>(exponent));
}
/** \internal \returns zeros */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pzero(const Packet& a) { return pxor(a,a); }
template<> EIGEN_DEVICE_FUNC inline float pzero<float>(const float& a) {
EIGEN_UNUSED_VARIABLE(a);
return 0.f;
}
template<> EIGEN_DEVICE_FUNC inline double pzero<double>(const double& a) {
EIGEN_UNUSED_VARIABLE(a);
return 0.;
}
/** \internal \returns bits of \a a or \a b according to the input bit mask \a mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pselect(const Packet& mask, const Packet& a, const Packet& b) {
return por(pand(a,mask),pandnot(b,mask));
}
template<> EIGEN_DEVICE_FUNC inline float pselect<float>(
const float& mask, const float& a, const float&b) {
return numext::equal_strict(mask,0.f) ? b : a;
}
template<> EIGEN_DEVICE_FUNC inline double pselect<double>(
const double& mask, const double& a, const double& b) {
return numext::equal_strict(mask,0.) ? b : a;
}
pzero(const Packet& /*a*/) { Packet b; memset((void*)&b, 0, sizeof(b)); return b;}
/** \internal \returns a <= b as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -306,7 +237,234 @@ pcmp_eq(const Packet& a, const Packet& b) { return a==b ? ptrue(a) : pzero(a); }
/** \internal \returns a < b or a==NaN or b==NaN as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pcmp_lt_or_nan(const Packet& a, const Packet& b) { return pnot(pcmp_le(b,a)); }
pcmp_lt_or_nan(const Packet& a, const Packet& b) { return a>=b ? pzero(a) : ptrue(a); }
template<> EIGEN_DEVICE_FUNC inline float pzero<float>(const float& a) {
EIGEN_UNUSED_VARIABLE(a)
return 0.f;
}
template<> EIGEN_DEVICE_FUNC inline double pzero<double>(const double& a) {
EIGEN_UNUSED_VARIABLE(a)
return 0.;
}
template <typename RealScalar>
EIGEN_DEVICE_FUNC inline std::complex<RealScalar> ptrue(const std::complex<RealScalar>& /*a*/) {
RealScalar b = ptrue(RealScalar(0));
return std::complex<RealScalar>(b, b);
}
template <typename Packet, typename Op>
EIGEN_DEVICE_FUNC inline Packet bitwise_helper(const Packet& a, const Packet& b, Op op) {
const unsigned char* a_ptr = reinterpret_cast<const unsigned char*>(&a);
const unsigned char* b_ptr = reinterpret_cast<const unsigned char*>(&b);
Packet c;
unsigned char* c_ptr = reinterpret_cast<unsigned char*>(&c);
for (size_t i = 0; i < sizeof(Packet); ++i) {
*c_ptr++ = op(*a_ptr++, *b_ptr++);
}
return c;
}
template<typename T>
struct bit_and {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const {
return a & b;
}
};
template<typename T>
struct bit_or {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const {
return a | b;
}
};
template<typename T>
struct bit_xor {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const {
return a ^ b;
}
};
/** \internal \returns the bitwise and of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pand(const Packet& a, const Packet& b) {
return bitwise_helper(a, b, bit_and<unsigned char>());
}
/** \internal \returns the bitwise or of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
por(const Packet& a, const Packet& b) {
return bitwise_helper(a, b, bit_or<unsigned char>());
}
/** \internal \returns the bitwise xor of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pxor(const Packet& a, const Packet& b) {
return bitwise_helper(a, b, bit_xor<unsigned char>());
}
/** \internal \returns the bitwise and of \a a and not \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pandnot(const Packet& a, const Packet& b) { return pand(a, pxor(ptrue(b), b)); }
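A self-contained scalar sketch of the byte-wise technique above (a restatement, not Eigen code): routing the operation through unsigned char lets one helper apply bitwise operators to float or double payloads without invalid casts.

#include <cstring>
#include <cstddef>

template <typename T>
T bitwise_and_sketch(const T& a, const T& b) {
  unsigned char ab[sizeof(T)], bb[sizeof(T)], cb[sizeof(T)];
  std::memcpy(ab, &a, sizeof(T));           // read the object representations
  std::memcpy(bb, &b, sizeof(T));
  for (std::size_t i = 0; i < sizeof(T); ++i)
    cb[i] = ab[i] & bb[i];                  // combine one byte at a time
  T c;
  std::memcpy(&c, cb, sizeof(T));           // write the result back
  return c;
}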
/** \internal \returns \a a or \a b for each field in the packet according to \a mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pselect(const Packet& mask, const Packet& a, const Packet& b) {
return por(pand(a,mask),pandnot(b,mask));
}
template<> EIGEN_DEVICE_FUNC inline float pselect<float>(
const float& cond, const float& a, const float&b) {
return numext::equal_strict(cond,0.f) ? b : a;
}
template<> EIGEN_DEVICE_FUNC inline double pselect<double>(
const double& cond, const double& a, const double& b) {
return numext::equal_strict(cond,0.) ? b : a;
}
template<> EIGEN_DEVICE_FUNC inline bool pselect<bool>(
const bool& cond, const bool& a, const bool& b) {
return cond ? a : b;
}
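A scalar model of the generic bit-blend used above (not Eigen code): each set bit of the mask selects the corresponding bit of a, each clear bit selects b, which is exactly por(pand(a, mask), pandnot(b, mask)).

unsigned blend_bits(unsigned mask, unsigned a, unsigned b) {
  return (a & mask) | (b & ~mask);
}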
/** \internal \returns the min or max of \a a and \a b (coeff-wise).
If either \a a or \a b is NaN, the result is implementation defined. */
template<int NaNPropagation>
struct pminmax_impl {
template <typename Packet, typename Op>
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) {
return op(a,b);
}
};
/** \internal \returns the min or max of \a a and \a b (coeff-wise)
If either \a a or \a b is NaN, NaN is returned. */
template<>
struct pminmax_impl<PropagateNaN> {
template <typename Packet, typename Op>
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) {
Packet not_nan_mask_a = pcmp_eq(a, a);
Packet not_nan_mask_b = pcmp_eq(b, b);
return pselect(not_nan_mask_a,
pselect(not_nan_mask_b, op(a, b), b),
a);
}
};
/** \internal \returns the min or max of \a a and \a b (coeff-wise)
If both \a a and \a b are NaN, NaN is returned.
Equivalent to std::fmin(a, b) / std::fmax(a, b). */
template<>
struct pminmax_impl<PropagateNumbers> {
template <typename Packet, typename Op>
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) {
Packet not_nan_mask_a = pcmp_eq(a, a);
Packet not_nan_mask_b = pcmp_eq(b, b);
return pselect(not_nan_mask_a,
pselect(not_nan_mask_b, op(a, b), a),
b);
}
};
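A scalar sketch of the mask trick shared by both specializations: pcmp_eq(x, x) yields all-ones exactly when x is not NaN, so the nested pselect substitutes the other operand precisely when one side is NaN. For the PropagateNumbers flavor:

float min_propagate_numbers(float a, float b) {
  bool a_not_nan = (a == a);   // false iff a is NaN
  bool b_not_nan = (b == b);   // false iff b is NaN
  if (!a_not_nan) return b;    // outer pselect: a is NaN -> take b
  if (!b_not_nan) return a;    // inner pselect: b is NaN -> take a
  return a < b ? a : b;        // both are numbers: op(a, b)
}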
#ifndef SYCL_DEVICE_ONLY
#define EIGEN_BINARY_OP_NAN_PROPAGATION(Type, Func) Func
#else
#define EIGEN_BINARY_OP_NAN_PROPAGATION(Type, Func) \
[](const Type& a, const Type& b) { \
return Func(a, b);}
#endif
/** \internal \returns the min of \a a and \a b (coeff-wise).
If \a a or \a b is NaN, the return value is implementation defined. */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmin(const Packet& a, const Packet& b) { return numext::mini(a,b); }
/** \internal \returns the min of \a a and \a b (coeff-wise).
NaNPropagation determines the NaN propagation semantics. */
template <int NaNPropagation, typename Packet>
EIGEN_DEVICE_FUNC inline Packet pmin(const Packet& a, const Packet& b) {
return pminmax_impl<NaNPropagation>::run(a, b, EIGEN_BINARY_OP_NAN_PROPAGATION(Packet, (pmin<Packet>)));
}
/** \internal \returns the max of \a a and \a b (coeff-wise)
If \a a or \a b is NaN, the return value is implementation defined. */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmax(const Packet& a, const Packet& b) { return numext::maxi(a, b); }
/** \internal \returns the max of \a a and \a b (coeff-wise).
NaNPropagation determines the NaN propagation semantics. */
template <int NaNPropagation, typename Packet>
EIGEN_DEVICE_FUNC inline Packet pmax(const Packet& a, const Packet& b) {
return pminmax_impl<NaNPropagation>::run(a, b, EIGEN_BINARY_OP_NAN_PROPAGATION(Packet,(pmax<Packet>)));
}
/** \internal \returns the absolute value of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pabs(const Packet& a) { return numext::abs(a); }
template<> EIGEN_DEVICE_FUNC inline unsigned int
pabs(const unsigned int& a) { return a; }
template<> EIGEN_DEVICE_FUNC inline unsigned long
pabs(const unsigned long& a) { return a; }
template<> EIGEN_DEVICE_FUNC inline unsigned long long
pabs(const unsigned long long& a) { return a; }
/** \internal \returns the addsub of \a a and \a b: a+b in even-index coefficients, a-b in odd-index coefficients */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
paddsub(const Packet& a, const Packet& b) {
return pselect(peven_mask(a), padd(a, b), psub(a, b));
}
/** \internal \returns the phase angle of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
parg(const Packet& a) { using numext::arg; return arg(a); }
/** \internal \returns \a a arithmetically shifted by N bits to the right */
template<int N> EIGEN_DEVICE_FUNC inline int
parithmetic_shift_right(const int& a) { return a >> N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
parithmetic_shift_right(const long int& a) { return a >> N; }
/** \internal \returns \a a logically shifted by N bits to the right */
template<int N> EIGEN_DEVICE_FUNC inline int
plogical_shift_right(const int& a) { return static_cast<int>(static_cast<unsigned int>(a) >> N); }
template<int N> EIGEN_DEVICE_FUNC inline long int
plogical_shift_right(const long int& a) { return static_cast<long>(static_cast<unsigned long>(a) >> N); }
/** \internal \returns \a a shifted by N bits to the left */
template<int N> EIGEN_DEVICE_FUNC inline int
plogical_shift_left(const int& a) { return a << N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
plogical_shift_left(const long int& a) { return a << N; }
/** \internal \returns the significand and exponent of the underlying floating point numbers
* See https://en.cppreference.com/w/cpp/numeric/math/frexp
*/
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet pfrexp(const Packet& a, Packet& exponent) {
int exp;
EIGEN_USING_STD(frexp);
Packet result = static_cast<Packet>(frexp(a, &exp));
exponent = static_cast<Packet>(exp);
return result;
}
/** \internal \returns a * 2^((int)exponent)
* See https://en.cppreference.com/w/cpp/numeric/math/ldexp
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pldexp(const Packet &a, const Packet &exponent) {
EIGEN_USING_STD(ldexp)
return static_cast<Packet>(ldexp(a, static_cast<int>(exponent)));
}
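A worked scalar example of the frexp/ldexp pair these wrappers defer to:

#include <cmath>
#include <cassert>

int main() {
  int e;
  double m = std::frexp(8.0, &e);   // 8 = 0.5 * 2^4, so m == 0.5 and e == 4
  assert(m == 0.5 && e == 4);
  assert(std::ldexp(m, e) == 8.0);  // pldexp undoes pfrexp
}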
/** \internal \returns the absolute difference of \a a and \a b (coeff-wise), i.e. |a - b| */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pabsdiff(const Packet& a, const Packet& b) { return pselect(pcmp_lt(a, b), psub(b, a), psub(a, b)); }
/** \internal \returns a packet version of \a *from, from must be 16 bytes aligned */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -348,7 +506,7 @@ ploaddup(const typename unpacket_traits<Packet>::type* from) { return *from; }
* For instance, for a packet of 8 elements, 2 scalars will be read from \a *from and
* replicated to form: {from[0],from[0],from[0],from[0],from[1],from[1],from[1],from[1]}
* Currently, this function is only used in matrix products.
* For packet-size smaller or equal to 4, this function is equivalent to pload1
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
ploadquad(const typename unpacket_traits<Packet>::type* from)
@@ -392,6 +550,20 @@ inline void pbroadcast2(const typename unpacket_traits<Packet>::type *a,
template<typename Packet> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet
plset(const typename unpacket_traits<Packet>::type& a) { return a; }
/** \internal \returns a packet with constant coefficients \a a, e.g.: (x, 0, x, 0),
where x is the value of all 1-bits. */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
peven_mask(const Packet& /*a*/) {
typedef typename unpacket_traits<Packet>::type Scalar;
const size_t n = unpacket_traits<Packet>::size;
EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) Scalar elements[n];
for(size_t i = 0; i < n; ++i) {
memset(elements+i, ((i & 1) == 0 ? 0xff : 0), sizeof(Scalar));
}
return ploadu<Packet>(elements);
}
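For a 4-lane float packet the loop above yields the pattern {all-ones, 0, all-ones, 0}. A scalar restatement of what that mask enables in paddsub further up (my paraphrase of the code, not Eigen's API):

// Lane i of paddsub(a, b): even lanes come from padd, odd lanes from psub.
float paddsub_lane(float a, float b, int lane) {
  return (lane % 2 == 0) ? a + b : a - b;
}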
/** \internal copy the packet \a from to \a *to, \a to must be 16 bytes aligned */
template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline void pstore(Scalar* to, const Packet& from)
{ (*to) = from; }
@@ -421,7 +593,7 @@ template<typename Scalar> EIGEN_DEVICE_FUNC inline void prefetch(const Scalar* a
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// do nothing
#elif defined(EIGEN_CUDA_ARCH)
#if defined(__LP64__)
#if defined(__LP64__) || EIGEN_OS_WIN64
// 64-bit pointer operand constraint for inlined asm
asm(" prefetch.L1 [ %1 ];" : "=l"(addr) : "l"(addr));
#else
@@ -433,38 +605,189 @@ template<typename Scalar> EIGEN_DEVICE_FUNC inline void prefetch(const Scalar* a
#endif
}
/** \internal \returns the first element of a packet */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type pfirst(const Packet& a)
/** \internal \returns the reversed elements of \a a*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet preverse(const Packet& a)
{ return a; }
/** \internal \returns a packet where the element i contains the sum of the packet of \a vec[i] */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
preduxp(const Packet* vecs) { return vecs[0]; }
/** \internal \returns \a a with real and imaginary part flipped (for complex type only) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet pcplxflip(const Packet& a)
{
return Packet(numext::imag(a),numext::real(a));
}
/** \internal \returns the sum of the elements of \a a*/
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux(const Packet& a)
/**************************
* Special math functions
***************************/
/** \internal \returns the sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psin(const Packet& a) { EIGEN_USING_STD(sin); return sin(a); }
/** \internal \returns the cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcos(const Packet& a) { EIGEN_USING_STD(cos); return cos(a); }
/** \internal \returns the tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptan(const Packet& a) { EIGEN_USING_STD(tan); return tan(a); }
/** \internal \returns the arc sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pasin(const Packet& a) { EIGEN_USING_STD(asin); return asin(a); }
/** \internal \returns the arc cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pacos(const Packet& a) { EIGEN_USING_STD(acos); return acos(a); }
/** \internal \returns the arc tangent of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet patan(const Packet& a) { EIGEN_USING_STD(atan); return atan(a); }
/** \internal \returns the hyperbolic sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psinh(const Packet& a) { EIGEN_USING_STD(sinh); return sinh(a); }
/** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcosh(const Packet& a) { EIGEN_USING_STD(cosh); return cosh(a); }
/** \internal \returns the hyperbolic tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptanh(const Packet& a) { EIGEN_USING_STD(tanh); return tanh(a); }
/** \internal \returns the exp of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexp(const Packet& a) { EIGEN_USING_STD(exp); return exp(a); }
/** \internal \returns the expm1 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexpm1(const Packet& a) { return numext::expm1(a); }
/** \internal \returns the log of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog(const Packet& a) { EIGEN_USING_STD(log); return log(a); }
/** \internal \returns the log1p of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog1p(const Packet& a) { return numext::log1p(a); }
/** \internal \returns the log10 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog10(const Packet& a) { EIGEN_USING_STD(log10); return log10(a); }
/** \internal \returns the log2 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog2(const Packet& a) {
typedef typename internal::unpacket_traits<Packet>::type Scalar;
return pmul(pset1<Packet>(Scalar(EIGEN_LOG2E)), plog(a));
}
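plog2 relies on the identity log2(x) = log2(e) * ln(x), with EIGEN_LOG2E presumably defined as log2(e) ≈ 1.4427. A quick scalar check:

#include <cmath>
#include <cassert>

int main() {
  double x = 10.0;
  double log2e = std::log2(std::exp(1.0));  // ~1.442695
  assert(std::fabs(std::log2(x) - log2e * std::log(x)) < 1e-12);
}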
/** \internal \returns the square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psqrt(const Packet& a) { return numext::sqrt(a); }
/** \internal \returns the reciprocal square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet prsqrt(const Packet& a) {
typedef typename internal::unpacket_traits<Packet>::type Scalar;
return pdiv(pset1<Packet>(Scalar(1)), psqrt(a));
}
/** \internal \returns the rounded value of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pround(const Packet& a) { using numext::round; return round(a); }
/** \internal \returns the floor of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pfloor(const Packet& a) { using numext::floor; return floor(a); }
/** \internal \returns the rounded value of \a a (coeff-wise) with current
* rounding mode */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet print(const Packet& a) { using numext::rint; return rint(a); }
/** \internal \returns the ceil of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pceil(const Packet& a) { using numext::ceil; return ceil(a); }
/** \internal \returns the first element of a packet */
template<typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type
pfirst(const Packet& a)
{ return a; }
/** \internal \returns the sum of the elements of the upper and lower half of \a a if the packet size is larger than 4.
* For a packet {a0, a1, a2, a3, a4, a5, a6, a7}, it returns a half packet {a0+a4, a1+a5, a2+a6, a3+a7}
* For packet-size smaller or equal to 4, this boils down to a noop.
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline
typename conditional<(unpacket_traits<Packet>::size%8)==0,typename unpacket_traits<Packet>::half,Packet>::type
template<typename Packet>
EIGEN_DEVICE_FUNC inline typename conditional<(unpacket_traits<Packet>::size%8)==0,typename unpacket_traits<Packet>::half,Packet>::type
predux_half_dowto4(const Packet& a)
{ return a; }
// Slow generic implementation of Packet reduction.
template <typename Packet, typename Op>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type
predux_helper(const Packet& a, Op op) {
typedef typename unpacket_traits<Packet>::type Scalar;
const size_t n = unpacket_traits<Packet>::size;
EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) Scalar elements[n];
pstoreu<Scalar>(elements, a);
for(size_t k = n / 2; k > 0; k /= 2) {
for(size_t i = 0; i < k; ++i) {
elements[i] = op(elements[i], elements[i + k]);
}
}
return elements[0];
}
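A standalone restatement of the halving loop (valid because packet sizes are powers of two). For op = + on {3,1,4,1,5,9,2,6}: k=4 gives {8,10,6,7}, k=2 gives {14,17}, and k=1 gives {31}, the total sum, in log2(n) passes.

#include <cstddef>

template <typename Scalar, typename Op>
Scalar predux_sketch(Scalar* elements, std::size_t n, Op op) {
  for (std::size_t k = n / 2; k > 0; k /= 2)   // halve the active prefix
    for (std::size_t i = 0; i < k; ++i)
      elements[i] = op(elements[i], elements[i + k]);
  return elements[0];
}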
/** \internal \returns the sum of the elements of \a a*/
template<typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type
predux(const Packet& a)
{
return a;
}
/** \internal \returns the product of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_mul(const Packet& a)
{ return a; }
template <typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_mul(
const Packet& a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmul<Scalar>)));
}
/** \internal \returns the min of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_min(const Packet& a)
{ return a; }
template <typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_min(
const Packet &a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmin<PropagateFast, Scalar>)));
}
/** \internal \returns the max of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_max(const Packet& a)
{ return a; }
template <int NaNPropagation, typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_min(
const Packet& a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmin<NaNPropagation, Scalar>)));
}
/** \internal \returns the max of the elements of \a a */
template <typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_max(
const Packet &a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmax<PropagateFast, Scalar>)));
}
template <int NaNPropagation, typename Packet>
EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_max(
const Packet& a) {
typedef typename unpacket_traits<Packet>::type Scalar;
return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmax<NaNPropagation, Scalar>)));
}
#undef EIGEN_BINARY_OP_NAN_PROPAGATION
/** \internal \returns true if all coeffs of \a a mean "true"
* It is supposed to be called on values returned by pcmp_*.
@@ -484,101 +807,10 @@ template<typename Packet> EIGEN_DEVICE_FUNC inline bool predux_any(const Packet&
// - bits full of ones (NaN for floats),
// - or first bit equals to 1 (1 for ints, smallest denormal for floats).
// For all these cases, taking the sum is just fine, and this boils down to a no-op for scalars.
return bool(predux(a));
typedef typename unpacket_traits<Packet>::type Scalar;
return numext::not_equal_strict(predux(a), Scalar(0));
}
/** \internal \returns the reversed elements of \a a*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet preverse(const Packet& a)
{ return a; }
/** \internal \returns \a a with real and imaginary part flipped (for complex type only) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet pcplxflip(const Packet& a)
{
return Packet(numext::imag(a),numext::real(a));
}
/**************************
* Special math functions
***************************/
/** \internal \returns the sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psin(const Packet& a) { EIGEN_USING_STD_MATH(sin); return sin(a); }
/** \internal \returns the cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcos(const Packet& a) { EIGEN_USING_STD_MATH(cos); return cos(a); }
/** \internal \returns the tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptan(const Packet& a) { EIGEN_USING_STD_MATH(tan); return tan(a); }
/** \internal \returns the arc sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pasin(const Packet& a) { EIGEN_USING_STD_MATH(asin); return asin(a); }
/** \internal \returns the arc cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pacos(const Packet& a) { EIGEN_USING_STD_MATH(acos); return acos(a); }
/** \internal \returns the arc tangent of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet patan(const Packet& a) { EIGEN_USING_STD_MATH(atan); return atan(a); }
/** \internal \returns the hyperbolic sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psinh(const Packet& a) { EIGEN_USING_STD_MATH(sinh); return sinh(a); }
/** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcosh(const Packet& a) { EIGEN_USING_STD_MATH(cosh); return cosh(a); }
/** \internal \returns the hyperbolic tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptanh(const Packet& a) { EIGEN_USING_STD_MATH(tanh); return tanh(a); }
/** \internal \returns the exp of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexp(const Packet& a) { EIGEN_USING_STD_MATH(exp); return exp(a); }
/** \internal \returns the expm1 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexpm1(const Packet& a) { return numext::expm1(a); }
/** \internal \returns the log of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog(const Packet& a) { EIGEN_USING_STD_MATH(log); return log(a); }
/** \internal \returns the log1p of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog1p(const Packet& a) { return numext::log1p(a); }
/** \internal \returns the log10 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog10(const Packet& a) { EIGEN_USING_STD_MATH(log10); return log10(a); }
/** \internal \returns the square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psqrt(const Packet& a) { EIGEN_USING_STD_MATH(sqrt); return sqrt(a); }
/** \internal \returns the reciprocal square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet prsqrt(const Packet& a) {
return pdiv(pset1<Packet>(1), psqrt(a));
}
/** \internal \returns the rounded value of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pround(const Packet& a) { using numext::round; return round(a); }
/** \internal \returns the floor of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pfloor(const Packet& a) { using numext::floor; return floor(a); }
/** \internal \returns the ceil of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pceil(const Packet& a) { using numext::ceil; return ceil(a); }
/***************************************************************************
* The following functions might not have to be overwritten for vectorized types
***************************************************************************/
@@ -631,35 +863,6 @@ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet ploadt_ro(const typename unpacket_t
return ploadt<Packet, LoadMode>(from);
}
/** \internal default implementation of palign() allowing partial specialization */
template<int Offset,typename PacketType>
struct palign_impl
{
// by default data are aligned, so there is nothing to be done :)
static inline void run(PacketType&, const PacketType&) {}
};
/** \internal update \a first using the concatenation of the packet_size minus \a Offset last elements
* of \a first and \a Offset first elements of \a second.
*
* This function is currently only used to optimize matrix-vector products on unaligned matrices.
* It takes 2 packets that represent a contiguous memory array, and returns a packet starting
* at the position \a Offset. For instance, for packets of 4 elements, we have:
* Input:
* - first = {f0,f1,f2,f3}
* - second = {s0,s1,s2,s3}
* Output:
* - if Offset==0 then {f0,f1,f2,f3}
* - if Offset==1 then {f1,f2,f3,s0}
* - if Offset==2 then {f2,f3,s0,s1}
* - if Offset==3 then {f3,s0,s1,s2}
*/
template<int Offset,typename PacketType>
inline void palign(PacketType& first, const PacketType& second)
{
palign_impl<Offset,PacketType>::run(first,second);
}
/***************************************************************************
* Fast complex products (GCC generates a function call which is very slow)
***************************************************************************/
@@ -702,50 +905,6 @@ pblend(const Selector<unpacket_traits<Packet>::size>& ifPacket, const Packet& th
return ifPacket.select[0] ? thenPacket : elsePacket;
}
/** \internal \returns \a a with the first coefficient replaced by the scalar b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pinsertfirst(const Packet& a, typename unpacket_traits<Packet>::type b)
{
// Default implementation based on pblend.
// It must be specialized for higher performance.
Selector<unpacket_traits<Packet>::size> mask;
mask.select[0] = true;
// This for loop should be optimized away by the compiler.
for(Index i=1; i<unpacket_traits<Packet>::size; ++i)
mask.select[i] = false;
return pblend(mask, pset1<Packet>(b), a);
}
/** \internal \returns \a a with the last coefficient replaced by the scalar b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pinsertlast(const Packet& a, typename unpacket_traits<Packet>::type b)
{
// Default implementation based on pblend.
// It must be specialized for higher performance.
Selector<unpacket_traits<Packet>::size> mask;
// This for loop should be optimized away by the compiler.
for(Index i=0; i<unpacket_traits<Packet>::size-1; ++i)
mask.select[i] = false;
mask.select[unpacket_traits<Packet>::size-1] = true;
return pblend(mask, pset1<Packet>(b), a);
}
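As a rough scalar model of the pblend-based defaults above (not Eigen's real packet types), the per-lane select looks like this; pinsertfirst is then just a blend against a broadcast of b with only lane 0 selected.

#include <array>
#include <cstddef>
#include <iostream>

// Per-lane select: a scalar stand-in for pblend.
template<std::size_t N>
std::array<float, N> blend(const std::array<bool, N>& mask,
                           const std::array<float, N>& thenP,
                           const std::array<float, N>& elseP)
{
  std::array<float, N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = mask[i] ? thenP[i] : elseP[i];
  return out;
}

int main()
{
  std::array<float, 4> a{1, 2, 3, 4};
  std::array<float, 4> b{9, 9, 9, 9};                   // pset1(9): broadcast of the scalar
  std::array<bool, 4> mask{true, false, false, false};  // select only the first lane
  auto r = blend(mask, b, a);                           // {9, 2, 3, 4}, like pinsertfirst(a, 9)
  for (float v : r) std::cout << v << ' ';
}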
/***************************************************************************
* Some generic implementations to be used by implementors
***************************************************************************/
/** Default implementation of pfrexp for float.
* It is expected to be called by implementers of template<> pfrexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE Packet
pfrexp_float(const Packet& a, Packet& exponent);
/** Default implementation of pldexp for float.
* It is expected to be called by implementers of template<> pldexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE Packet
pldexp_float(Packet a, Packet exponent);
} // end namespace internal
} // end namespace Eigen

View File

@@ -81,14 +81,16 @@ namespace Eigen
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(expm1,scalar_expm1_op,exponential of a value minus 1,\sa ArrayBase::expm1)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log,scalar_log_op,natural logarithm,\sa Eigen::log10 DOXCOMMA ArrayBase::log)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log1p,scalar_log1p_op,natural logarithm of 1 plus the value,\sa ArrayBase::log1p)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10,scalar_log10_op,base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10,scalar_log10_op,base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log10)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log2,scalar_log2_op,base 2 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs,scalar_abs_op,absolute value,\sa ArrayBase::abs DOXCOMMA MatrixBase::cwiseAbs)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs2,scalar_abs2_op,squared absolute value,\sa ArrayBase::abs2 DOXCOMMA MatrixBase::cwiseAbs2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(arg,scalar_arg_op,complex argument,\sa ArrayBase::arg)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(arg,scalar_arg_op,complex argument,\sa ArrayBase::arg DOXCOMMA MatrixBase::cwiseArg)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sqrt,scalar_sqrt_op,square root,\sa ArrayBase::sqrt DOXCOMMA MatrixBase::cwiseSqrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rsqrt,scalar_rsqrt_op,reciprocal square root,\sa ArrayBase::rsqrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(square,scalar_square_op,square (power 2),\sa Eigen::abs2 DOXCOMMA Eigen::pow DOXCOMMA ArrayBase::square)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cube,scalar_cube_op,cube (power 3),\sa Eigen::pow DOXCOMMA ArrayBase::cube)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rint,scalar_rint_op,nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(round,scalar_round_op,nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(floor,scalar_floor_op,nearest integer not greater than the given value,\sa Eigen::ceil DOXCOMMA ArrayBase::floor)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(ceil,scalar_ceil_op,nearest integer not less than the given value,\sa Eigen::floor DOXCOMMA ArrayBase::ceil)

View File

@@ -130,6 +130,9 @@ struct significant_decimals_impl
template<typename Derived>
std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt)
{
using internal::is_same;
using internal::conditional;
if(_m.size() == 0)
{
s << fmt.matPrefix << fmt.matSuffix;
@@ -138,6 +141,22 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
typename Derived::Nested m = _m;
typedef typename Derived::Scalar Scalar;
typedef typename
conditional<
is_same<Scalar, char>::value ||
is_same<Scalar, unsigned char>::value ||
is_same<Scalar, numext::int8_t>::value ||
is_same<Scalar, numext::uint8_t>::value,
int,
typename conditional<
is_same<Scalar, std::complex<char> >::value ||
is_same<Scalar, std::complex<unsigned char> >::value ||
is_same<Scalar, std::complex<numext::int8_t> >::value ||
is_same<Scalar, std::complex<numext::uint8_t> >::value,
std::complex<int>,
const Scalar&
>::type
>::type PrintType;
Index width = 0;
@@ -174,7 +193,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
{
std::stringstream sstr;
sstr.copyfmt(s);
sstr << m.coeff(i,j);
sstr << static_cast<PrintType>(m.coeff(i,j));
width = std::max<Index>(width, Index(sstr.str().length()));
}
}
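The PrintType machinery exists because streaming char-like scalars prints glyphs rather than numbers; a two-line illustration:

#include <iostream>

int main()
{
  signed char c = 65;
  std::cout << c << '\n';                   // prints 'A': char-like types stream as glyphs
  std::cout << static_cast<int>(c) << '\n'; // prints 65, the promotion print_matrix now applies
}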
@@ -190,7 +209,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
s.fill(fmt.fill);
s.width(width);
}
s << m.coeff(i, 0);
s << static_cast<PrintType>(m.coeff(i, 0));
for(Index j = 1; j < m.cols(); ++j)
{
s << fmt.coeffSeparator;
@@ -198,7 +217,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
s.fill(fmt.fill);
s.width(width);
}
s << m.coeff(i, j);
s << static_cast<PrintType>(m.coeff(i, j));
}
s << fmt.rowSuffix;
if( i < m.rows() - 1)

View File

@@ -54,7 +54,8 @@ struct traits<IndexedView<XprType, RowIndices, ColIndices> >
DirectAccessMask = (int(InnerIncr)!=UndefinedIncr && int(OuterIncr)!=UndefinedIncr && InnerIncr>=0 && OuterIncr>=0) ? DirectAccessBit : 0,
FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0,
FlagsLvalueBit = is_lvalue<XprType>::value ? LvalueBit : 0,
Flags = (traits<XprType>::Flags & (HereditaryBits | DirectAccessMask)) | FlagsLvalueBit | FlagsRowMajorBit
FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1) ? LinearAccessBit : 0,
Flags = (traits<XprType>::Flags & (HereditaryBits | DirectAccessMask )) | FlagsLvalueBit | FlagsRowMajorBit | FlagsLinearAccessBit
};
typedef Block<XprType,RowsAtCompileTime,ColsAtCompileTime,IsInnerPannel> BlockType;
@@ -168,7 +169,11 @@ struct unary_evaluator<IndexedView<ArgType, RowIndices, ColIndices>, IndexBased>
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost /* TODO + cost of row/col index */,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits /*| LinearAccessBit | DirectAccessBit*/)),
FlagsLinearAccessBit = (traits<XprType>::RowsAtCompileTime == 1 || traits<XprType>::ColsAtCompileTime == 1) ? LinearAccessBit : 0,
FlagsRowMajorBit = traits<XprType>::FlagsRowMajorBit,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits & ~RowMajorBit /*| LinearAccessBit | DirectAccessBit*/)) | FlagsLinearAccessBit | FlagsRowMajorBit,
Alignment = 0
};
@@ -193,6 +198,31 @@ struct unary_evaluator<IndexedView<ArgType, RowIndices, ColIndices>, IndexBased>
return m_argImpl.coeffRef(m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index index)
{
EIGEN_STATIC_ASSERT_LVALUE(XprType)
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
return m_argImpl.coeffRef( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar& coeffRef(Index index) const
{
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
return m_argImpl.coeffRef( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const CoeffReturnType coeff(Index index) const
{
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
return m_argImpl.coeff( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
protected:
evaluator<ArgType> m_argImpl;

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_INVERSE_H
#define EIGEN_INVERSE_H
namespace Eigen {
template<typename XprType,typename StorageKind> class InverseImpl;
@@ -49,13 +49,13 @@ public:
typedef typename internal::remove_all<XprTypeNested>::type XprTypeNestedCleaned;
typedef typename internal::ref_selector<Inverse>::type Nested;
typedef typename internal::remove_all<XprType>::type NestedExpression;
explicit EIGEN_DEVICE_FUNC Inverse(const XprType &xpr)
: m_xpr(xpr)
{}
EIGEN_DEVICE_FUNC Index rows() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC const XprTypeNestedCleaned& nestedExpression() const { return m_xpr; }
@@ -81,7 +81,7 @@ namespace internal {
/** \internal
* \brief Default evaluator for Inverse expression.
*
* This default evaluator for Inverse expression simply evaluates the inverse into a temporary
* by a call to internal::call_assignment_no_alias.
* Therefore, inverse implementers only have to specialize Assignment<Dst,Inverse<...>, ...> for
@@ -96,7 +96,7 @@ struct unary_evaluator<Inverse<ArgType> >
typedef Inverse<ArgType> InverseType;
typedef typename InverseType::PlainObject PlainObject;
typedef evaluator<PlainObject> Base;
enum { Flags = Base::Flags | EvalBeforeNestingBit };
unary_evaluator(const InverseType& inv_xpr)
@@ -105,11 +105,11 @@ struct unary_evaluator<Inverse<ArgType> >
::new (static_cast<Base*>(this)) Base(m_result);
internal::call_assignment_no_alias(m_result, inv_xpr);
}
protected:
PlainObject m_result;
};
} // end namespace internal
} // end namespace Eigen

View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_MAP_H
#define EIGEN_MAP_H
namespace Eigen {
namespace internal {
template<typename PlainObjectType, int MapOptions, typename StrideType>
@@ -104,13 +104,13 @@ template<typename PlainObjectType, int MapOptions, typename StrideType> class Ma
EIGEN_DEVICE_FUNC
inline PointerType cast_to_pointer_type(PointerArgType ptr) { return ptr; }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const
{
return StrideType::InnerStrideAtCompileTime != 0 ? m_stride.inner() : 1;
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const
{
return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer()

View File

@@ -15,7 +15,7 @@
EIGEN_STATIC_ASSERT((int(internal::evaluator<Derived>::Flags) & LinearAccessBit) || Derived::IsVectorAtCompileTime, \
YOU_ARE_TRYING_TO_USE_AN_INDEX_BASED_ACCESSOR_ON_AN_EXPRESSION_THAT_DOES_NOT_SUPPORT_THAT)
namespace Eigen {
/** \ingroup Core_Module
*
@@ -87,9 +87,11 @@ template<typename Derived> class MapBase<Derived, ReadOnlyAccessors>
typedef typename Base::CoeffReturnType CoeffReturnType;
/** \copydoc DenseBase::rows() */
EIGEN_DEVICE_FUNC inline Index rows() const { return m_rows.value(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_rows.value(); }
/** \copydoc DenseBase::cols() */
EIGEN_DEVICE_FUNC inline Index cols() const { return m_cols.value(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_cols.value(); }
/** Returns a pointer to the first coefficient of the matrix or vector.
*
@@ -182,6 +184,8 @@ template<typename Derived> class MapBase<Derived, ReadOnlyAccessors>
#endif
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase)
template<typename T>
EIGEN_DEVICE_FUNC
@@ -294,6 +298,9 @@ template<typename Derived> class MapBase<Derived, WriteAccessors>
// In theory we could simply refer to Base:Base::operator=, but MSVC does not like Base::Base,
// see bugs 821 and 920.
using ReadOnlyMapBase::Base::operator=;
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase)
};
#undef EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS

View File

@@ -10,9 +10,11 @@
#ifndef EIGEN_MATHFUNCTIONS_H
#define EIGEN_MATHFUNCTIONS_H
// source: http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html
// TODO this should better be moved to NumTraits
// Source: WolframAlpha
#define EIGEN_PI 3.141592653589793238462643383279502884197169399375105820974944592307816406L
#define EIGEN_LOG2E 1.442695040888963407359924681001892137426645954152985934135449406931109219L
#define EIGEN_LN2 0.693147180559945309417232121458176568075500134360255254120680009493393621L
namespace Eigen {
@@ -212,12 +214,12 @@ struct imag_ref_default_impl
template<typename Scalar>
struct imag_ref_default_impl<Scalar, false>
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Scalar run(Scalar&)
{
return Scalar(0);
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline const Scalar run(const Scalar&)
{
return Scalar(0);
@@ -321,6 +323,65 @@ struct abs2_retval
typedef typename NumTraits<Scalar>::Real type;
};
/****************************************************************************
* Implementation of sqrt/rsqrt *
****************************************************************************/
template<typename Scalar>
struct sqrt_impl
{
EIGEN_DEVICE_FUNC
static EIGEN_ALWAYS_INLINE Scalar run(const Scalar& x)
{
EIGEN_USING_STD(sqrt);
return sqrt(x);
}
};
// Complex sqrt defined in MathFunctionsImpl.h.
template<typename T> EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& a_x);
// Custom implementation is faster than `std::sqrt`, works on
// GPU, and correctly handles special cases (unlike MSVC).
template<typename T>
struct sqrt_impl<std::complex<T> >
{
EIGEN_DEVICE_FUNC
static EIGEN_ALWAYS_INLINE std::complex<T> run(const std::complex<T>& x)
{
return complex_sqrt<T>(x);
}
};
template<typename Scalar>
struct sqrt_retval
{
typedef Scalar type;
};
// Default implementation relies on numext::sqrt, at bottom of file.
template<typename T>
struct rsqrt_impl;
// Complex rsqrt defined in MathFunctionsImpl.h.
template<typename T> EIGEN_DEVICE_FUNC std::complex<T> complex_rsqrt(const std::complex<T>& a_x);
template<typename T>
struct rsqrt_impl<std::complex<T> >
{
EIGEN_DEVICE_FUNC
static EIGEN_ALWAYS_INLINE std::complex<T> run(const std::complex<T>& x)
{
return complex_rsqrt<T>(x);
}
};
template<typename Scalar>
struct rsqrt_retval
{
typedef Scalar type;
};
/****************************************************************************
* Implementation of norm1 *
****************************************************************************/
@@ -335,7 +396,7 @@ struct norm1_default_impl<Scalar,true>
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return abs(x.real()) + abs(x.imag());
}
};
@@ -346,7 +407,7 @@ struct norm1_default_impl<Scalar, false>
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return abs(x);
}
};
@@ -376,7 +437,7 @@ struct hypot_retval
* Implementation of cast *
****************************************************************************/
template<typename OldType, typename NewType>
template<typename OldType, typename NewType, typename EnableIf = void>
struct cast_impl
{
EIGEN_DEVICE_FUNC
@@ -386,6 +447,22 @@ struct cast_impl
}
};
// Casting from S -> Complex<T> leads to an implicit conversion from S to T,
// generating warnings on clang. Here we explicitly cast the real component.
template<typename OldType, typename NewType>
struct cast_impl<OldType, NewType,
typename internal::enable_if<
!NumTraits<OldType>::IsComplex && NumTraits<NewType>::IsComplex
>::type>
{
EIGEN_DEVICE_FUNC
static inline NewType run(const OldType& x)
{
typedef typename NumTraits<NewType>::Real NewReal;
return static_cast<NewType>(static_cast<NewReal>(x));
}
};
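A standalone sketch of that two-step cast (the helper name cast_to_complex is made up for illustration): casting the real argument to the complex type's real component first silences the implicit-conversion warning.

#include <complex>

// Hypothetical helper mirroring the specialization above.
template<typename NewType, typename OldType>
NewType cast_to_complex(const OldType& x)
{
  typedef typename NewType::value_type NewReal; // assumes NewType is std::complex<T>
  return static_cast<NewType>(static_cast<NewReal>(x));
}

int main()
{
  std::complex<float> z = cast_to_complex<std::complex<float> >(3); // (3,0), no clang warning
  (void)z;
}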
// here, for once, we're plainly returning NewType: we don't want cast to do weird things.
template<typename OldType, typename NewType>
@@ -399,29 +476,59 @@ inline NewType cast(const OldType& x)
* Implementation of round *
****************************************************************************/
#if EIGEN_HAS_CXX11_MATH
template<typename Scalar>
struct round_impl {
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
EIGEN_USING_STD_MATH(round);
return round(x);
}
};
#else
template<typename Scalar>
struct round_impl
{
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
EIGEN_USING_STD_MATH(floor);
EIGEN_USING_STD_MATH(ceil);
return (x > Scalar(0)) ? floor(x + Scalar(0.5)) : ceil(x - Scalar(0.5));
}
};
template<typename Scalar>
struct round_impl
{
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
#if EIGEN_HAS_CXX11_MATH
EIGEN_USING_STD(round);
#endif
return Scalar(round(x));
}
};
#if !EIGEN_HAS_CXX11_MATH
#if EIGEN_HAS_C99_MATH
// Use ::roundf for float.
template<>
struct round_impl<float> {
EIGEN_DEVICE_FUNC
static inline float run(const float& x)
{
return ::roundf(x);
}
};
#else
template<typename Scalar>
struct round_using_floor_ceil_impl
{
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
// Without C99 round/roundf, resort to floor/ceil.
EIGEN_USING_STD(floor);
EIGEN_USING_STD(ceil);
// If there is not enough precision to resolve a fractional part at this magnitude,
// return the input unchanged: adding 0.5 below could otherwise trigger a spurious increment by 1.
const Scalar limit = Scalar(1ull << (NumTraits<Scalar>::digits() - 1));
if (x >= limit || x <= -limit) {
return x;
}
return (x > Scalar(0)) ? Scalar(floor(x + Scalar(0.5))) : Scalar(ceil(x - Scalar(0.5)));
}
};
template<>
struct round_impl<float> : round_using_floor_ceil_impl<float> {};
template<>
struct round_impl<double> : round_using_floor_ceil_impl<double> {};
#endif // EIGEN_HAS_C99_MATH
#endif // !EIGEN_HAS_CXX11_MATH
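A scalar sketch of the guarded fallback, assuming float; the limit is 2^(digits-1), at or above which every representable float is already an integer, so adding 0.5 could only corrupt the result.

#include <cmath>
#include <cstdio>
#include <limits>

float round_fallback(float x)
{
  // 2^(digits-1) = 2^23 for float: at or beyond this magnitude there is no fractional part.
  const float limit = float(1ull << (std::numeric_limits<float>::digits - 1));
  if (x >= limit || x <= -limit) return x;
  return (x > 0.f) ? std::floor(x + 0.5f) : std::ceil(x - 0.5f);
}

int main()
{
  std::printf("%.0f\n", round_fallback(2.5f));       // 3
  std::printf("%.0f\n", round_fallback(8388609.f));  // 8388609: the guard avoids drifting to 8388610
}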
template<typename Scalar>
struct round_retval
@@ -429,49 +536,110 @@ struct round_retval
typedef Scalar type;
};
/****************************************************************************
* Implementation of rint *
****************************************************************************/
template<typename Scalar>
struct rint_impl {
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
#if EIGEN_HAS_CXX11_MATH
EIGEN_USING_STD(rint);
#endif
return rint(x);
}
};
#if !EIGEN_HAS_CXX11_MATH
template<>
struct rint_impl<double> {
EIGEN_DEVICE_FUNC
static inline double run(const double& x)
{
return ::rint(x);
}
};
template<>
struct rint_impl<float> {
EIGEN_DEVICE_FUNC
static inline float run(const float& x)
{
return ::rintf(x);
}
};
#endif
template<typename Scalar>
struct rint_retval
{
typedef Scalar type;
};
/****************************************************************************
* Implementation of arg *
****************************************************************************/
#if EIGEN_HAS_CXX11_MATH
template<typename Scalar>
struct arg_impl {
static inline Scalar run(const Scalar& x)
{
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// HIP does not seem to have a native device side implementation for the math routine "arg"
using std::arg;
#else
EIGEN_USING_STD_MATH(arg);
#endif
return arg(x);
}
};
// std::arg is only defined for std::complex types, integer types, and float/double/long double
template<typename Scalar,
bool HasStdImpl = NumTraits<Scalar>::IsComplex || is_integral<Scalar>::value
|| is_same<Scalar, float>::value || is_same<Scalar, double>::value
|| is_same<Scalar, long double>::value >
struct arg_default_impl;
template<typename Scalar>
struct arg_default_impl<Scalar, true> {
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
{
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// HIP does not seem to have a native device side implementation for the math routine "arg"
using std::arg;
#else
EIGEN_USING_STD(arg);
#endif
return static_cast<Scalar>(arg(x));
}
};
// Must be non-complex floating-point type (e.g. half/bfloat16).
template<typename Scalar>
struct arg_default_impl<Scalar, false> {
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
return (x < Scalar(0)) ? Scalar(EIGEN_PI) : Scalar(0);
}
};
#else
template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
struct arg_default_impl
{
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
return (x < Scalar(0)) ? Scalar(EIGEN_PI) : Scalar(0);
}
};
template<typename Scalar>
struct arg_default_impl<Scalar,true>
{
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
EIGEN_USING_STD_MATH(arg);
EIGEN_USING_STD(arg);
return arg(x);
}
};
#endif
template<typename Scalar> struct arg_impl : arg_default_impl<Scalar> {};
template<typename Scalar>
struct arg_retval
@@ -493,7 +661,7 @@ namespace std_fallback {
EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_USING_STD_MATH(exp);
EIGEN_USING_STD(exp);
Scalar u = exp(x);
if (numext::equal_strict(u, Scalar(1))) {
return x;
@@ -503,7 +671,7 @@ namespace std_fallback {
return RealScalar(-1);
}
EIGEN_USING_STD_MATH(log);
EIGEN_USING_STD(log);
Scalar logu = log(u);
return numext::equal_strict(u, logu) ? u : (u - RealScalar(1)) * x / logu;
}
@@ -523,16 +691,6 @@ struct expm1_impl {
}
};
// Specialization for complex types that are not supported by std::expm1.
template <typename RealScalar>
struct expm1_impl<std::complex<RealScalar> > {
EIGEN_DEVICE_FUNC static inline std::complex<RealScalar> run(
const std::complex<RealScalar>& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(RealScalar)
return std_fallback::expm1(x);
}
};
template<typename Scalar>
struct expm1_retval
{
@@ -550,7 +708,7 @@ namespace std_fallback {
EIGEN_DEVICE_FUNC inline Scalar log1p(const Scalar& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_USING_STD_MATH(log);
EIGEN_USING_STD(log);
Scalar x1p = RealScalar(1) + x;
Scalar log_1p = log(x1p);
const bool is_small = numext::equal_strict(x1p, Scalar(1));
@@ -600,7 +758,7 @@ struct pow_impl
typedef typename ScalarBinaryOpTraits<ScalarX,ScalarY,internal::scalar_pow_op<ScalarX,ScalarY> >::ReturnType result_type;
static EIGEN_DEVICE_FUNC inline result_type run(const ScalarX& x, const ScalarY& y)
{
EIGEN_USING_STD_MATH(pow);
EIGEN_USING_STD(pow);
return pow(x, y);
}
};
@@ -902,12 +1060,12 @@ template<typename T> T generic_fast_tanh_float(const T& a_x);
namespace numext {
#if (!defined(EIGEN_GPUCC) || defined(EIGEN_CONSTEXPR_ARE_DEVICE_FUNC))
template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE T mini(const T& x, const T& y)
{
EIGEN_USING_STD_MATH(min);
EIGEN_USING_STD(min)
return min EIGEN_NOT_A_MACRO (x,y);
}
@@ -915,7 +1073,7 @@ template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
{
EIGEN_USING_STD_MATH(max);
EIGEN_USING_STD(max)
return max EIGEN_NOT_A_MACRO (x,y);
}
#else
@@ -1116,6 +1274,34 @@ inline EIGEN_MATHFUNC_RETVAL(abs2, Scalar) abs2(const Scalar& x)
EIGEN_DEVICE_FUNC
inline bool abs2(bool x) { return x; }
template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE T absdiff(const T& x, const T& y)
{
return x > y ? x - y : y - x;
}
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE float absdiff(const float& x, const float& y)
{
return fabsf(x - y);
}
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE double absdiff(const double& x, const double& y)
{
return fabs(x - y);
}
#if !defined(EIGEN_GPUCC)
// HIP and CUDA do not support long double.
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE long double absdiff(const long double& x, const long double& y) {
return fabsl(x - y);
}
#endif
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(norm1, Scalar) norm1(const Scalar& x)
@@ -1174,6 +1360,13 @@ SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isinf, isinf, bool)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isfinite, isfinite, bool)
#endif
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(rint, Scalar) rint(const Scalar& x)
{
return EIGEN_MATHFUNC_IMPL(rint, Scalar)::run(x);
}
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(round, Scalar) round(const Scalar& x)
@@ -1189,7 +1382,7 @@ template<typename T>
EIGEN_DEVICE_FUNC
T (floor)(const T& x)
{
EIGEN_USING_STD_MATH(floor);
EIGEN_USING_STD(floor)
return floor(x);
}
@@ -1209,7 +1402,7 @@ template<typename T>
EIGEN_DEVICE_FUNC
T (ceil)(const T& x)
{
EIGEN_USING_STD_MATH(ceil);
EIGEN_USING_STD(ceil);
return ceil(x);
}
@@ -1250,23 +1443,35 @@ inline int log2(int x)
*
* Its use is justified in performance-critical functions, like norm/normalize.
*/
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T sqrt(const T &x)
template<typename Scalar>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE EIGEN_MATHFUNC_RETVAL(sqrt, Scalar) sqrt(const Scalar& x)
{
EIGEN_USING_STD_MATH(sqrt);
return sqrt(x);
return EIGEN_MATHFUNC_IMPL(sqrt, Scalar)::run(x);
}
// Boolean specialization, avoids implicit float to bool conversion (-Wimplicit-conversion-floating-point-to-bool).
template<>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_DEVICE_FUNC
bool sqrt<bool>(const bool &x) { return x; }
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(sqrt, sqrt)
#endif
/** \returns the reciprocal square root of \a x. **/
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T rsqrt(const T& x)
{
return internal::rsqrt_impl<T>::run(x);
}
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T log(const T &x) {
EIGEN_USING_STD_MATH(log);
return log(x);
EIGEN_USING_STD(log);
return static_cast<T>(log(x));
}
#if defined(SYCL_DEVICE_ONLY)
@@ -1286,7 +1491,7 @@ template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
typename internal::enable_if<NumTraits<T>::IsSigned || NumTraits<T>::IsComplex,typename NumTraits<T>::Real>::type
abs(const T &x) {
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return abs(x);
}
@@ -1323,7 +1528,7 @@ double abs(const std::complex<double>& x) {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T exp(const T &x) {
EIGEN_USING_STD_MATH(exp);
EIGEN_USING_STD(exp);
return exp(x);
}
@@ -1377,7 +1582,7 @@ double expm1(const double &x) { return ::expm1(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T cos(const T &x) {
EIGEN_USING_STD_MATH(cos);
EIGEN_USING_STD(cos);
return cos(x);
}
@@ -1396,7 +1601,7 @@ double cos(const double &x) { return ::cos(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T sin(const T &x) {
EIGEN_USING_STD_MATH(sin);
EIGEN_USING_STD(sin);
return sin(x);
}
@@ -1415,7 +1620,7 @@ double sin(const double &x) { return ::sin(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T tan(const T &x) {
EIGEN_USING_STD_MATH(tan);
EIGEN_USING_STD(tan);
return tan(x);
}
@@ -1434,7 +1639,7 @@ double tan(const double &x) { return ::tan(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T acos(const T &x) {
EIGEN_USING_STD_MATH(acos);
EIGEN_USING_STD(acos);
return acos(x);
}
@@ -1442,8 +1647,8 @@ T acos(const T &x) {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T acosh(const T &x) {
EIGEN_USING_STD_MATH(acosh);
return acosh(x);
EIGEN_USING_STD(acosh);
return static_cast<T>(acosh(x));
}
#endif
@@ -1463,7 +1668,7 @@ double acos(const double &x) { return ::acos(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T asin(const T &x) {
EIGEN_USING_STD_MATH(asin);
EIGEN_USING_STD(asin);
return asin(x);
}
@@ -1471,8 +1676,8 @@ T asin(const T &x) {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T asinh(const T &x) {
EIGEN_USING_STD_MATH(asinh);
return asinh(x);
EIGEN_USING_STD(asinh);
return static_cast<T>(asinh(x));
}
#endif
@@ -1492,16 +1697,16 @@ double asin(const double &x) { return ::asin(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T atan(const T &x) {
EIGEN_USING_STD_MATH(atan);
return atan(x);
EIGEN_USING_STD(atan);
return static_cast<T>(atan(x));
}
#if EIGEN_HAS_CXX11_MATH
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T atanh(const T &x) {
EIGEN_USING_STD_MATH(atanh);
return atanh(x);
EIGEN_USING_STD(atanh);
return static_cast<T>(atanh(x));
}
#endif
@@ -1522,8 +1727,8 @@ double atan(const double &x) { return ::atan(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T cosh(const T &x) {
EIGEN_USING_STD_MATH(cosh);
return cosh(x);
EIGEN_USING_STD(cosh);
return static_cast<T>(cosh(x));
}
#if defined(SYCL_DEVICE_ONLY)
@@ -1541,8 +1746,8 @@ double cosh(const double &x) { return ::cosh(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T sinh(const T &x) {
EIGEN_USING_STD_MATH(sinh);
return sinh(x);
EIGEN_USING_STD(sinh);
return static_cast<T>(sinh(x));
}
#if defined(SYCL_DEVICE_ONLY)
@@ -1560,7 +1765,7 @@ double sinh(const double &x) { return ::sinh(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T tanh(const T &x) {
EIGEN_USING_STD_MATH(tanh);
EIGEN_USING_STD(tanh);
return tanh(x);
}
@@ -1584,7 +1789,7 @@ double tanh(const double &x) { return ::tanh(x); }
template <typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T fmod(const T& a, const T& b) {
EIGEN_USING_STD_MATH(fmod);
EIGEN_USING_STD(fmod);
return fmod(a, b);
}
@@ -1746,6 +1951,11 @@ template<> struct random_impl<bool>
{
return random<int>(0,1)==0 ? false : true;
}
static inline bool run(const bool& a, const bool& b)
{
return random<int>(a, b)==0 ? false : true;
}
};
template<> struct scalar_fuzzy_impl<bool>
@@ -1772,6 +1982,45 @@ template<> struct scalar_fuzzy_impl<bool>
};
} // end namespace internal
// Default implementations that rely on other numext implementations
namespace internal {
// Specialization for complex types that are not supported by std::expm1.
template <typename RealScalar>
struct expm1_impl<std::complex<RealScalar> > {
EIGEN_DEVICE_FUNC static inline std::complex<RealScalar> run(
const std::complex<RealScalar>& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(RealScalar)
RealScalar xr = x.real();
RealScalar xi = x.imag();
// expm1(z) = exp(z) - 1
// = exp(x + i * y) - 1
// = exp(x) * (cos(y) + i * sin(y)) - 1
// = exp(x) * cos(y) - 1 + i * exp(x) * sin(y)
// Imag(expm1(z)) = exp(x) * sin(y)
// Real(expm1(z)) = exp(x) * cos(y) - 1
// = expm1(x) + exp(x) * (cos(y) - 1)
// = expm1(x) + exp(x) * (2 * sin(y / 2) ** 2)
RealScalar erm1 = numext::expm1<RealScalar>(xr);
RealScalar er = erm1 + RealScalar(1.);
RealScalar sin2 = numext::sin(xi / RealScalar(2.));
sin2 = sin2 * sin2;
RealScalar s = numext::sin(xi);
RealScalar real_part = erm1 - RealScalar(2.) * er * sin2;
return std::complex<RealScalar>(real_part, er * s);
}
};
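A quick numeric sanity check of that identity (plain std:: calls, not Eigen's numext): for tiny x and y the formula keeps digits that the naive exp(z) - 1 loses.

#include <cmath>
#include <complex>
#include <iostream>

int main()
{
  double x = 1e-9, y = 1e-9;
  double s2 = std::sin(y / 2); s2 *= s2;
  double erm1 = std::expm1(x);
  double er = erm1 + 1.0;
  // Real/Imag parts per the derivation above.
  std::complex<double> stable(erm1 - 2.0 * er * s2, er * std::sin(y));
  std::complex<double> naive = std::exp(std::complex<double>(x, y)) - 1.0;
  std::cout << stable << " vs " << naive << '\n'; // stable keeps precision near zero
}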
template<typename T>
struct rsqrt_impl {
EIGEN_DEVICE_FUNC
static EIGEN_ALWAYS_INLINE T run(const T& x) {
return T(1)/numext::sqrt(x);
}
};
} // end namespace internal

View File

@@ -17,19 +17,28 @@ namespace internal {
/** \internal \returns the hyperbolic tan of \a a (coeff-wise)
Doesn't do anything fancy, just a 13/6-degree rational interpolant which
is accurate up to a couple of ulp in the range [-9, 9], outside of which
the tanh(x) = +/-1.
is accurate up to a couple of ulps in the (approximate) range [-8, 8],
outside of which tanh(x) = +/-1 in single precision. The input is clamped
to the range [-c, c]. The value c is chosen as the smallest value where
the approximation evaluates to exactly 1. In the range [-0.0004, 0.0004]
the approximation tanh(x) ~= x is used for better accuracy as x tends to zero.
This implementation works on both scalars and packets.
*/
template<typename T>
T generic_fast_tanh_float(const T& a_x)
{
// Clamp the inputs to the range [-9, 9] since anything outside
// this range is +/-1.0f in single-precision.
const T plus_9 = pset1<T>(9.f);
const T minus_9 = pset1<T>(-9.f);
const T x = pmax(pmin(a_x, plus_9), minus_9);
// Clamp the inputs to the range [-c, c]
#ifdef EIGEN_VECTORIZE_FMA
const T plus_clamp = pset1<T>(7.99881172180175781f);
const T minus_clamp = pset1<T>(-7.99881172180175781f);
#else
const T plus_clamp = pset1<T>(7.90531110763549805f);
const T minus_clamp = pset1<T>(-7.90531110763549805f);
#endif
const T tiny = pset1<T>(0.0004f);
const T x = pmax(pmin(a_x, plus_clamp), minus_clamp);
const T tiny_mask = pcmp_lt(pabs(a_x), tiny);
// The monomial coefficients of the numerator polynomial (odd).
const T alpha_1 = pset1<T>(4.89352455891786e-03f);
const T alpha_3 = pset1<T>(6.37261928875436e-04f);
@@ -57,20 +66,26 @@ T generic_fast_tanh_float(const T& a_x)
p = pmadd(x2, p, alpha_1);
p = pmul(x, p);
// Evaluate the denominator polynomial p.
// Evaluate the denominator polynomial q.
T q = pmadd(x2, beta_6, beta_4);
q = pmadd(x2, q, beta_2);
q = pmadd(x2, q, beta_0);
// Divide the numerator by the denominator.
return pdiv(p, q);
return pselect(tiny_mask, x, pdiv(p, q));
}
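For orientation, here is a scalar shell of the same control flow, with the rational core stubbed out by std::tanh since the remaining polynomial coefficients are not shown in this hunk; only the clamp and tiny-input branches are transcribed.

#include <cmath>
#include <cstdio>

float tanh_shell(float x)
{
  const float clamp = 7.90531110763549805f; // non-FMA clamp value from above
  const float tiny = 0.0004f;
  float xc = std::fmax(std::fmin(x, clamp), -clamp);
  if (std::fabs(x) < tiny) return xc;       // tanh(x) ~= x to float accuracy here
  return std::tanh(xc);                     // stand-in for the pdiv(p, q) rational core
}

int main() { std::printf("%g %g\n", tanh_shell(1e-5f), tanh_shell(100.f)); }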
template<typename RealScalar>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
RealScalar positive_real_hypot(const RealScalar& x, const RealScalar& y)
{
EIGEN_USING_STD_MATH(sqrt);
// IEC 60559 (IEEE 754) special cases.
if ((numext::isinf)(x) || (numext::isinf)(y))
return NumTraits<RealScalar>::infinity();
if ((numext::isnan)(x) || (numext::isnan)(y))
return NumTraits<RealScalar>::quiet_NaN();
EIGEN_USING_STD(sqrt);
RealScalar p, qp;
p = numext::maxi(x,y);
if(p==RealScalar(0)) return RealScalar(0);
@@ -85,11 +100,90 @@ struct hypot_impl
static EIGEN_DEVICE_FUNC
inline RealScalar run(const Scalar& x, const Scalar& y)
{
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return positive_real_hypot<RealScalar>(abs(x), abs(y));
}
};
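The body of positive_real_hypot is truncated in this hunk; assuming the standard scaling formulation p * sqrt(1 + (min/max)^2), a sketch:

#include <cmath>
#include <cstdio>

// Factor out the larger magnitude so squaring the ratio cannot overflow or underflow.
double hypot_scaled(double x, double y)
{
  x = std::fabs(x); y = std::fabs(y);
  double p = std::fmax(x, y);
  if (p == 0.0) return 0.0;
  double qp = std::fmin(x, y) / p;      // in [0, 1]
  return p * std::sqrt(1.0 + qp * qp);
}

int main() { std::printf("%g\n", hypot_scaled(3e200, 4e200)); } // 5e200 without overflow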
// Generic complex sqrt implementation that correctly handles corner cases
// according to https://en.cppreference.com/w/cpp/numeric/complex/sqrt
template<typename T>
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
// Computes the principal sqrt of the input.
//
// For the complex square root of the number x + i*y, we want to find real
// numbers u and v such that
//    (u + i*v)^2 = x + i*y <=>
//    u^2 - v^2 + i*2*u*v = x + i*y.
// By equating the real and imaginary parts we get:
// u^2 - v^2 = x
// 2*u*v = y.
//
// For x >= 0, this has the numerically stable solution
// u = sqrt(0.5 * (x + sqrt(x^2 + y^2)))
// v = y / (2 * u)
// and for x < 0,
// v = sign(y) * sqrt(0.5 * (-x + sqrt(x^2 + y^2)))
// u = y / (2 * v)
//
// Letting w = sqrt(0.5 * (|x| + |z|)),
// if x == 0: u = w, v = sign(y) * w
// if x > 0: u = w, v = y / (2 * w)
// if x < 0: u = |y| / (2 * w), v = sign(y) * w
const T x = numext::real(z);
const T y = numext::imag(z);
const T zero = T(0);
const T w = numext::sqrt(T(0.5) * (numext::abs(x) + numext::hypot(x, y)));
return
(numext::isinf)(y) ? std::complex<T>(NumTraits<T>::infinity(), y)
: x == zero ? std::complex<T>(w, y < zero ? -w : w)
: x > zero ? std::complex<T>(w, y / (2 * w))
: std::complex<T>(numext::abs(y) / (2 * w), y < zero ? -w : w );
}
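A spot check of the branch table above against std::sqrt, specialized here to a negative-real input (the x < 0 row):

#include <cmath>
#include <complex>
#include <cstdio>

int main()
{
  const double x = -4.0, y = 0.0;
  const double w = std::sqrt(0.5 * (std::fabs(x) + std::hypot(x, y)));
  // x < 0 row of the table: u = |y| / (2w), v = sign(y) * w.
  std::complex<double> mine(std::fabs(y) / (2 * w), std::signbit(y) ? -w : w);
  std::complex<double> ref = std::sqrt(std::complex<double>(x, y));
  std::printf("(%g,%g) vs (%g,%g)\n", mine.real(), mine.imag(), ref.real(), ref.imag()); // (0,2) both
}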
// Generic complex rsqrt implementation.
template<typename T>
EIGEN_DEVICE_FUNC std::complex<T> complex_rsqrt(const std::complex<T>& z) {
// Computes the principal reciprocal sqrt of the input.
//
// For the complex reciprocal square root of the number z = x + i*y, we want to
// find real numbers u and v such that
//    (u + i*v)^2 = 1 / (x + i*y) <=>
//    u^2 - v^2 + i*2*u*v = x/|z|^2 - i*y/|z|^2.
// By equating the real and imaginary parts we get:
//    u^2 - v^2 = x/|z|^2
//    2*u*v = -y/|z|^2.
//
// For x >= 0, this has the numerically stable solution
// u = sqrt(0.5 * (x + |z|)) / |z|
// v = -y / (2 * u * |z|)
// and for x < 0,
// v = -sign(y) * sqrt(0.5 * (-x + |z|)) / |z|
// u = -y / (2 * v * |z|)
//
// Letting w = sqrt(0.5 * (|x| + |z|)),
// if x == 0: u = w / |z|, v = -sign(y) * w / |z|
// if x > 0: u = w / |z|, v = -y / (2 * w * |z|)
// if x < 0: u = |y| / (2 * w * |z|), v = -sign(y) * w / |z|
const T x = numext::real(z);
const T y = numext::imag(z);
const T zero = T(0);
const T abs_z = numext::hypot(x, y);
const T w = numext::sqrt(T(0.5) * (numext::abs(x) + abs_z));
const T woz = w / abs_z;
// Corner cases consistent with 1/sqrt(z) on gcc/clang.
return
abs_z == zero ? std::complex<T>(NumTraits<T>::infinity(), NumTraits<T>::quiet_NaN())
: ((numext::isinf)(x) || (numext::isinf)(y)) ? std::complex<T>(zero, zero)
: x == zero ? std::complex<T>(woz, y < zero ? woz : -woz)
: x > zero ? std::complex<T>(woz, -y / (2 * w * abs_z))
: std::complex<T>(numext::abs(y) / (2 * w * abs_z), y < zero ? woz : -woz );
}
} // end namespace internal
} // end namespace Eigen

View File

@@ -29,7 +29,7 @@ private:
required_alignment = unpacket_traits<PacketScalar>::alignment,
packet_access_bit = (packet_traits<_Scalar>::Vectorizable && (EIGEN_UNALIGNED_VECTORIZE || (actual_alignment>=required_alignment))) ? PacketAccessBit : 0
};
public:
typedef _Scalar Scalar;
typedef Dense StorageKind;
@@ -44,7 +44,7 @@ public:
Options = _Options,
InnerStrideAtCompileTime = 1,
OuterStrideAtCompileTime = (Options&RowMajor) ? ColsAtCompileTime : RowsAtCompileTime,
// FIXME, the following flag in only used to define NeedsToAlign in PlainObjectBase
EvaluatorFlags = LinearAccessBit | DirectAccessBit | packet_access_bit | row_major_bit,
Alignment = actual_alignment
@@ -278,7 +278,7 @@ class Matrix
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Matrix& operator=(Matrix&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_assignable<Scalar>::value)
{
other.swap(*this);
Base::operator=(std::move(other));
return *this;
}
#endif
@@ -297,24 +297,24 @@ class Matrix
: Base(a0, a1, a2, a3, args...) {}
/** \brief Constructs a Matrix and initializes it from the coefficients given as initializer-lists grouped by row. \cpp11
*
* In the general case, the constructor takes a list of rows, each row being represented as a list of coefficients:
*
* Example: \include Matrix_initializer_list_23_cxx11.cpp
* Output: \verbinclude Matrix_initializer_list_23_cxx11.out
*
* Each of the inner initializer lists must contain the exact same number of elements, otherwise an assertion is triggered.
*
* In the case of a compile-time column vector, implicit transposition from a single row is allowed.
* Therefore <code>VectorXd{{1,2,3,4,5}}</code> is legal and the more verbose syntax
* <code>RowVectorXd{{1},{2},{3},{4},{5}}</code> can be avoided:
*
* Example: \include Matrix_initializer_list_vector_cxx11.cpp
* Output: \verbinclude Matrix_initializer_list_vector_cxx11.out
*
* In the case of fixed-sized matrices, the initializer list sizes must exactly match the matrix sizes,
* and implicit transposition is allowed for compile-time vectors only.
*
* \sa Matrix(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC
@@ -351,7 +351,7 @@ class Matrix
* This is useful for dynamic-size vectors. For fixed-size vectors,
* it is redundant to pass these parameters, so one should use the default constructor
* Matrix() instead.
*
* \warning This constructor is disabled for fixed-size \c 1x1 matrices. For instance,
* calling Matrix<double,1,1>(1) will call the initialization constructor: Matrix(const Scalar&).
* For fixed-size \c 1x1 matrices it is therefore recommended to use the default
@@ -367,7 +367,7 @@ class Matrix
* This is useful for dynamic-size matrices. For fixed-size matrices,
* it is redundant to pass these parameters, so one should use the default constructor
* Matrix() instead.
*
* \warning This constructor is disabled for fixed-size \c 1x2 and \c 2x1 vectors. For instance,
* calling Matrix2f(2,1) will call the initialization constructor: Matrix(const Scalar& x, const Scalar& y).
* For fixed-size \c 1x2 or \c 2x1 vectors it is therefore recommended to use the default
@@ -376,7 +376,7 @@ class Matrix
*/
EIGEN_DEVICE_FUNC
Matrix(Index rows, Index cols);
/** \brief Constructs an initialized 2D vector with given coefficients
* \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...) */
Matrix(const Scalar& x, const Scalar& y);
@@ -423,8 +423,10 @@ class Matrix
: Base(other.derived())
{ }
EIGEN_DEVICE_FUNC inline Index innerStride() const { return 1; }
EIGEN_DEVICE_FUNC inline Index outerStride() const { return this->innerSize(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return 1; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return this->innerSize(); }
/////////// Geometry module ///////////
@@ -463,14 +465,14 @@ class Matrix
*
* There are also \c VectorSizeType and \c RowVectorSizeType which are self-explanatory. For example, \c Vector4cf is
* a fixed-size vector of 4 complex floats.
*
* With \cpp11, template alias are also defined for common sizes.
* They follow the same pattern as above except that the scalar type suffix is replaced by a
* template parameter, i.e.:
* - `MatrixSize<Type>` where `Size` can be \c 2,\c 3,\c 4 for fixed size square matrices or \c X for dynamic size.
* - `MatrixXSize<Type>` and `MatrixSizeX<Type>` where `Size` can be \c 2,\c 3,\c 4 for hybrid dynamic/fixed matrices.
* - `VectorSize<Type>` and `RowVectorSize<Type>` for column and row vectors.
*
* With \cpp11, you can also use fully generic column and row vector types: `Vector<Type,Size>` and `RowVector<Type,Size>`.
*
* \sa class Matrix

View File

@@ -481,7 +481,8 @@ template<typename Derived> class MatrixBase
EIGEN_MATRIX_FUNCTION_1(MatrixComplexPowerReturnValue, pow, power to \c p, const std::complex<RealScalar>& p)
protected:
EIGEN_DEVICE_FUNC MatrixBase() : Base() {}
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MatrixBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MatrixBase)
private:
EIGEN_DEVICE_FUNC explicit MatrixBase(int);

View File

@@ -45,8 +45,8 @@ template<typename ExpressionType> class NestByValue
EIGEN_DEVICE_FUNC explicit inline NestByValue(const ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); }
EIGEN_DEVICE_FUNC operator const ExpressionType&() const { return m_expression; }

View File

@@ -21,14 +21,14 @@ template< typename T,
bool is_integer = NumTraits<T>::IsInteger>
struct default_digits10_impl
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return std::numeric_limits<T>::digits10; }
};
template<typename T>
struct default_digits10_impl<T,false,false> // Floating point
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() {
using std::log10;
using std::ceil;
@@ -40,7 +40,7 @@ struct default_digits10_impl<T,false,false> // Floating point
template<typename T>
struct default_digits10_impl<T,false,true> // Integer
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return 0; }
};
@@ -52,14 +52,14 @@ template< typename T,
bool is_integer = NumTraits<T>::IsInteger>
struct default_digits_impl
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return std::numeric_limits<T>::digits; }
};
template<typename T>
struct default_digits_impl<T,false,false> // Floating point
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() {
using std::log;
using std::ceil;
@@ -71,12 +71,33 @@ struct default_digits_impl<T,false,false> // Floating point
template<typename T>
struct default_digits_impl<T,false,true> // Integer
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return 0; }
};
} // end namespace internal
namespace numext {
/** \internal bit-wise cast without changing the underlying bit representation. */
// TODO: Replace by std::bit_cast (available in C++20)
template <typename Tgt, typename Src>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Tgt bit_cast(const Src& src) {
#if EIGEN_HAS_TYPE_TRAITS
// The behaviour of memcpy is not specified for non-trivially copyable types
EIGEN_STATIC_ASSERT(std::is_trivially_copyable<Src>::value, THIS_TYPE_IS_NOT_SUPPORTED);
EIGEN_STATIC_ASSERT(std::is_trivially_copyable<Tgt>::value && std::is_default_constructible<Tgt>::value,
THIS_TYPE_IS_NOT_SUPPORTED);
#endif
EIGEN_STATIC_ASSERT(sizeof(Src) == sizeof(Tgt), THIS_TYPE_IS_NOT_SUPPORTED);
Tgt tgt;
EIGEN_USING_STD(memcpy)
memcpy(&tgt, &src, sizeof(Tgt));
return tgt;
}
} // namespace numext
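Usage sketch of the numext::bit_cast semantics with plain std::memcpy (the standalone bit_cast below is a local re-implementation, not the Eigen symbol):

#include <cstdint>
#include <cstdio>
#include <cstring>

template<typename Tgt, typename Src>
Tgt bit_cast(const Src& src)
{
  static_assert(sizeof(Src) == sizeof(Tgt), "sizes must match");
  Tgt tgt;
  std::memcpy(&tgt, &src, sizeof(Tgt)); // defined behavior for trivially copyable types
  return tgt;
}

int main()
{
  std::uint32_t bits = bit_cast<std::uint32_t>(1.0f);
  std::printf("0x%08x\n", bits); // 0x3f800000: sign 0, biased exponent 127, mantissa 0
}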
/** \class NumTraits
* \ingroup Core_Module
*
@@ -114,9 +135,18 @@ struct default_digits_impl<T,false,true> // Integer
* \li A dummy_precision() function returning a weak epsilon value. It is mainly used as a default
* value by the fuzzy comparison operators.
* \li highest() and lowest() functions returning the highest and lowest possible values respectively.
* \li digits() function returning the number of radix digits (non-sign digits for integers, mantissa for floating-point). This is
* the analogue of <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/digits">std::numeric_limits<T>::digits</a>
* which is used as the default implementation if specialized.
* \li digits10() function returning the number of decimal digits that can be represented without change. This is
* the analogue of <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/digits10">std::numeric_limits<T>::digits10</a>
* which is used as the default implementation if specialized.
* \li min_exponent() and max_exponent() functions returning the highest and lowest possible values, respectively,
* such that the radix raised to the power exponent-1 is a normalized floating-point number. These are equivalent to
* <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/min_exponent">std::numeric_limits<T>::min_exponent</a>/
* <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/max_exponent">std::numeric_limits<T>::max_exponent</a>.
* \li infinity() function returning a representation of positive infinity, if available.
* \li quiet_NaN function returning a non-signaling "not-a-number", if available.
*/
template<typename T> struct GenericNumTraits
@@ -140,49 +170,60 @@ template<typename T> struct GenericNumTraits
typedef T Nested;
typedef T Literal;
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real epsilon()
{
return numext::numeric_limits<T>::epsilon();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits10()
{
return internal::default_digits10_impl<T>::run();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits()
{
return internal::default_digits_impl<T>::run();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int min_exponent()
{
return numext::numeric_limits<T>::min_exponent;
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int max_exponent()
{
return numext::numeric_limits<T>::max_exponent;
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real dummy_precision()
{
// make sure to override this for floating-point types
return Real(0);
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T highest() {
return (numext::numeric_limits<T>::max)();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T lowest() {
return IsInteger ? (numext::numeric_limits<T>::min)()
: static_cast<T>(-(numext::numeric_limits<T>::max)());
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T infinity() {
return numext::numeric_limits<T>::infinity();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T quiet_NaN() {
return numext::numeric_limits<T>::quiet_NaN();
}
@@ -194,19 +235,20 @@ template<typename T> struct NumTraits : GenericNumTraits<T>
template<> struct NumTraits<float>
: GenericNumTraits<float>
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline float dummy_precision() { return 1e-5f; }
};
template<> struct NumTraits<double> : GenericNumTraits<double>
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline double dummy_precision() { return 1e-12; }
};
template<> struct NumTraits<long double>
: GenericNumTraits<long double>
{
EIGEN_CONSTEXPR
static inline long double dummy_precision() { return 1e-15l; }
};
@@ -223,11 +265,11 @@ template<typename _Real> struct NumTraits<std::complex<_Real> >
MulCost = 4 * NumTraits<Real>::MulCost + 2 * NumTraits<Real>::AddCost
};
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real epsilon() { return NumTraits<Real>::epsilon(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real dummy_precision() { return NumTraits<Real>::dummy_precision(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits10() { return NumTraits<Real>::digits10(); }
};
@@ -252,11 +294,12 @@ struct NumTraits<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
MulCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::MulCost
};
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline RealScalar epsilon() { return NumTraits<RealScalar>::epsilon(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline RealScalar dummy_precision() { return NumTraits<RealScalar>::dummy_precision(); }
EIGEN_CONSTEXPR
static inline int digits10() { return NumTraits<Scalar>::digits10(); }
};
@@ -270,6 +313,7 @@ template<> struct NumTraits<std::string>
MulCost = HugeCost
};
EIGEN_CONSTEXPR
static inline int digits10() { return 0; }
private:
@@ -284,6 +328,8 @@ private:
// Empty specialization for void to allow template specialization based on NumTraits<T>::Real with T==void and SFINAE.
template<> struct NumTraits<void> {};
template<> struct NumTraits<bool> : GenericNumTraits<bool> {};
} // end namespace Eigen
#endif // EIGEN_NUMTRAITS_H

View File

@@ -13,10 +13,10 @@
#if defined(EIGEN_INITIALIZE_MATRICES_BY_ZERO)
# define EIGEN_INITIALIZE_COEFFS
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(int i=0;i<base().size();++i) coeffRef(i)=Scalar(0);
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(Index i=0;i<base().size();++i) coeffRef(i)=Scalar(0);
#elif defined(EIGEN_INITIALIZE_MATRICES_BY_NAN)
# define EIGEN_INITIALIZE_COEFFS
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(int i=0;i<base().size();++i) coeffRef(i)=std::numeric_limits<Scalar>::quiet_NaN();
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(Index i=0;i<base().size();++i) coeffRef(i)=std::numeric_limits<Scalar>::quiet_NaN();
#else
# undef EIGEN_INITIALIZE_COEFFS
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
@@ -118,16 +118,8 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
using Base::IsVectorAtCompileTime;
using Base::Flags;
template<typename PlainObjectType, int MapOptions, typename StrideType> friend class Eigen::Map;
friend class Eigen::Map<Derived, Unaligned>;
typedef Eigen::Map<Derived, Unaligned> MapType;
friend class Eigen::Map<const Derived, Unaligned>;
typedef const Eigen::Map<const Derived, Unaligned> ConstMapType;
#if EIGEN_MAX_ALIGN_BYTES>0
// for EIGEN_MAX_ALIGN_BYTES==0, AlignedMax==Unaligned, and many compilers generate warnings for friend-ing a class twice.
friend class Eigen::Map<Derived, AlignedMax>;
friend class Eigen::Map<const Derived, AlignedMax>;
#endif
typedef Eigen::Map<Derived, AlignedMax> AlignedMapType;
typedef const Eigen::Map<const Derived, AlignedMax> ConstAlignedMapType;
template<typename StrideType> struct StridedMapType { typedef Eigen::Map<Derived, Unaligned, StrideType> type; };
@@ -147,10 +139,10 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
EIGEN_DEVICE_FUNC
const Base& base() const { return *static_cast<const Base*>(this); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index rows() const { return m_storage.rows(); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index cols() const { return m_storage.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_storage.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_storage.cols(); }
/** This is an overloaded version of DenseCoeffsBase<Derived,ReadOnlyAccessors>::coeff(Index,Index) const
* provided to by-pass the creation of an evaluator of the expression, thus saving compilation efforts.
@@ -508,8 +500,8 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
EIGEN_DEVICE_FUNC
PlainObjectBase& operator=(PlainObjectBase&& other) EIGEN_NOEXCEPT
{
using std::swap;
swap(m_storage, other.m_storage);
_check_template_params();
m_storage = std::move(other.m_storage);
return *this;
}
#endif
@@ -530,11 +522,11 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
/** \brief Construct a row or column vector with fixed size from an arbitrary number of coefficients. \cpp11
*
* \only_for_vectors
*
* This constructor is for 1D arrays or vectors with more than 4 coefficients.
* There exist C++98 analogue constructors for fixed-size arrays/vectors having 1, 2, 3, or 4 coefficients.
*
* \warning To construct a column (resp. row) vector of fixed length, the number of values passed to this
* constructor must match the fixed number of rows (resp. columns) of \c *this.
*/
template <typename... ArgTypes>
@@ -548,7 +540,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
m_storage.data()[1] = a1;
m_storage.data()[2] = a2;
m_storage.data()[3] = a3;
int i = 4;
Index i = 4;
auto x = {(m_storage.data()[i++] = args, 0)...};
static_cast<void>(x);
}
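The braced list in that constructor is the standard pack-expansion trick for assigning variadic arguments to consecutive slots; in isolation (with a hypothetical to_array helper):

#include <array>
#include <cstdio>

template<typename... Args>
std::array<double, sizeof...(Args)> to_array(Args... args)
{
  std::array<double, sizeof...(Args)> a{};
  int i = 0;
  auto x = {(a[i++] = args, 0)...}; // braced init lists evaluate left-to-right
  static_cast<void>(x);             // silence unused-variable warnings
  return a;
}

int main()
{
  auto a = to_array(1.0, 2.0, 3.0);
  std::printf("%g %g %g\n", a[0], a[1], a[2]);
}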
@@ -576,7 +568,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
eigen_assert(list.size() == static_cast<size_t>(RowsAtCompileTime) || RowsAtCompileTime == Dynamic);
eigen_assert(list_size == static_cast<size_t>(ColsAtCompileTime) || ColsAtCompileTime == Dynamic);
resize(list.size(), list_size);
Index row_index = 0;
for (const std::initializer_list<Scalar>& row : list) {
eigen_assert(list_size == row.size());
@@ -717,18 +709,26 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
using Base::setConstant;
EIGEN_DEVICE_FUNC Derived& setConstant(Index size, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, Index cols, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(NoChange_t, Index cols, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, NoChange_t, const Scalar& val);
using Base::setZero;
EIGEN_DEVICE_FUNC Derived& setZero(Index size);
EIGEN_DEVICE_FUNC Derived& setZero(Index rows, Index cols);
EIGEN_DEVICE_FUNC Derived& setZero(NoChange_t, Index cols);
EIGEN_DEVICE_FUNC Derived& setZero(Index rows, NoChange_t);
using Base::setOnes;
EIGEN_DEVICE_FUNC Derived& setOnes(Index size);
EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, Index cols);
EIGEN_DEVICE_FUNC Derived& setOnes(NoChange_t, Index cols);
EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, NoChange_t);
using Base::setRandom;
Derived& setRandom(Index size);
Derived& setRandom(Index rows, Index cols);
Derived& setRandom(NoChange_t, Index cols);
Derived& setRandom(Index rows, NoChange_t);
#ifdef EIGEN_PLAINOBJECTBASE_PLUGIN
#include EIGEN_PLAINOBJECTBASE_PLUGIN
@@ -967,8 +967,8 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE void _check_template_params()
{
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, (Options&RowMajor)==RowMajor)
&& EIGEN_IMPLIES(MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1, (Options&RowMajor)==0)
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, (int(Options)&RowMajor)==RowMajor)
&& EIGEN_IMPLIES(MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1, (int(Options)&RowMajor)==0)
&& ((RowsAtCompileTime == Dynamic) || (RowsAtCompileTime >= 0))
&& ((ColsAtCompileTime == Dynamic) || (ColsAtCompileTime >= 0))
&& ((MaxRowsAtCompileTime == Dynamic) || (MaxRowsAtCompileTime >= 0))
@@ -980,6 +980,17 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
}
enum { IsPlainObjectBase = 1 };
#endif
public:
// These apparently need to be down here for nvcc+icc to prevent duplicate
// Map symbol.
template<typename PlainObjectType, int MapOptions, typename StrideType> friend class Eigen::Map;
friend class Eigen::Map<Derived, Unaligned>;
friend class Eigen::Map<const Derived, Unaligned>;
#if EIGEN_MAX_ALIGN_BYTES>0
// for EIGEN_MAX_ALIGN_BYTES==0, AlignedMax==Unaligned, and many compilers generate warnings for friend-ing a class twice.
friend class Eigen::Map<Derived, AlignedMax>;
friend class Eigen::Map<const Derived, AlignedMax>;
#endif
};

View File

@@ -23,25 +23,25 @@ struct traits<Product<Lhs, Rhs, Option> >
typedef typename remove_all<Rhs>::type RhsCleaned;
typedef traits<LhsCleaned> LhsTraits;
typedef traits<RhsCleaned> RhsTraits;
typedef MatrixXpr XprKind;
typedef typename ScalarBinaryOpTraits<typename traits<LhsCleaned>::Scalar, typename traits<RhsCleaned>::Scalar>::ReturnType Scalar;
typedef typename product_promote_storage_type<typename LhsTraits::StorageKind,
typename RhsTraits::StorageKind,
internal::product_type<Lhs,Rhs>::ret>::ret StorageKind;
typedef typename promote_index_type<typename LhsTraits::StorageIndex,
typename RhsTraits::StorageIndex>::type StorageIndex;
enum {
RowsAtCompileTime = LhsTraits::RowsAtCompileTime,
ColsAtCompileTime = RhsTraits::ColsAtCompileTime,
MaxRowsAtCompileTime = LhsTraits::MaxRowsAtCompileTime,
MaxColsAtCompileTime = RhsTraits::MaxColsAtCompileTime,
// FIXME: only needed by GeneralMatrixMatrixTriangular
InnerSize = EIGEN_SIZE_MIN_PREFER_FIXED(LhsTraits::ColsAtCompileTime, RhsTraits::RowsAtCompileTime),
// The storage order is somewhat arbitrary here. The correct one will be determined through the evaluator.
Flags = (MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1) ? RowMajorBit
: (MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1) ? 0
@@ -74,10 +74,10 @@ class Product : public ProductImpl<_Lhs,_Rhs,Option,
internal::product_type<_Lhs,_Rhs>::ret>::ret>
{
public:
typedef _Lhs Lhs;
typedef _Rhs Rhs;
typedef typename ProductImpl<
Lhs, Rhs, Option,
typename internal::product_promote_storage_type<typename internal::traits<Lhs>::StorageKind,
@@ -98,10 +98,10 @@ class Product : public ProductImpl<_Lhs,_Rhs,Option,
&& "if you wanted a coeff-wise or a dot product use the respective explicit functions");
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const { return m_lhs.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_lhs.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const LhsNestedCleaned& lhs() const { return m_lhs; }
@@ -115,7 +115,7 @@ class Product : public ProductImpl<_Lhs,_Rhs,Option,
};
namespace internal {
template<typename Lhs, typename Rhs, int Option, int ProductTag = internal::product_type<Lhs,Rhs>::ret>
class dense_product_base
: public internal::dense_xpr_base<Product<Lhs,Rhs,Option> >::type
@@ -131,7 +131,7 @@ class dense_product_base<Lhs, Rhs, Option, InnerProduct>
public:
using Base::derived;
typedef typename Base::Scalar Scalar;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE operator const Scalar() const
{
return internal::evaluator<ProductXpr>(derived()).coeff(0,0);
@@ -153,25 +153,25 @@ class ProductImpl<Lhs,Rhs,Option,Dense>
: public internal::dense_product_base<Lhs,Rhs,Option>
{
typedef Product<Lhs, Rhs, Option> Derived;
public:
typedef typename internal::dense_product_base<Lhs, Rhs, Option> Base;
EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
protected:
enum {
IsOneByOne = (RowsAtCompileTime == 1 || RowsAtCompileTime == Dynamic) &&
(ColsAtCompileTime == 1 || ColsAtCompileTime == Dynamic),
EnableCoeff = IsOneByOne || Option==LazyProduct
};
public:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar coeff(Index row, Index col) const
{
EIGEN_STATIC_ASSERT(EnableCoeff, THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS);
eigen_assert( (Option==LazyProduct) || (this->rows() == 1 && this->cols() == 1) );
return internal::evaluator<Derived>(derived()).coeff(row,col);
}
@@ -179,11 +179,11 @@ class ProductImpl<Lhs,Rhs,Option,Dense>
{
EIGEN_STATIC_ASSERT(EnableCoeff, THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS);
eigen_assert( (Option==LazyProduct) || (this->rows() == 1 && this->cols() == 1) );
return internal::evaluator<Derived>(derived()).coeff(i);
}
};
} // end namespace Eigen
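For context, the 1x1 coeff() accessors and the implicit Scalar conversion above are what let an inner product be consumed as a plain scalar; a sketch:
```
#include <Eigen/Dense>

int main() {
  Eigen::VectorXd u = Eigen::VectorXd::Random(4);
  Eigen::VectorXd v = Eigen::VectorXd::Random(4);
  // u.transpose() * v is a 1x1 Product expression; the conversion
  // operator evaluates it and hands back the scalar.
  double s = u.transpose() * v;
  (void)s;
  return 0;
}
```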

View File

@@ -14,7 +14,7 @@
#define EIGEN_PRODUCTEVALUATORS_H
namespace Eigen {
namespace internal {
/** \internal
@@ -22,19 +22,19 @@ namespace internal {
* Since products require special treatments to handle all possible cases,
* we simply defer the evaluation logic to a product_evaluator class
* which offers more partial specialization possibilities.
*
* \sa class product_evaluator
*/
template<typename Lhs, typename Rhs, int Options>
struct evaluator<Product<Lhs, Rhs, Options> >
: public product_evaluator<Product<Lhs, Rhs, Options> >
{
typedef Product<Lhs, Rhs, Options> XprType;
typedef product_evaluator<XprType> Base;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& xpr) : Base(xpr) {}
};
// Catch "scalar * ( A * B )" and transform it to "(A*scalar) * B"
// TODO we should apply that rule only if that's really helpful
template<typename Lhs, typename Rhs, typename Scalar1, typename Scalar2, typename Plain1>
@@ -62,12 +62,12 @@ struct evaluator<CwiseBinaryOp<internal::scalar_product_op<Scalar1,Scalar2>,
template<typename Lhs, typename Rhs, int DiagIndex>
struct evaluator<Diagonal<const Product<Lhs, Rhs, DefaultProduct>, DiagIndex> >
: public evaluator<Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex> >
{
typedef Diagonal<const Product<Lhs, Rhs, DefaultProduct>, DiagIndex> XprType;
typedef evaluator<Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex> > Base;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& xpr)
: Base(Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex>(
Product<Lhs, Rhs, LazyProduct>(xpr.nestedExpression().lhs(), xpr.nestedExpression().rhs()),
@@ -108,23 +108,23 @@ struct product_evaluator<Product<Lhs, Rhs, Options>, ProductTag, LhsShape, RhsSh
: m_result(xpr.rows(), xpr.cols())
{
::new (static_cast<Base*>(this)) Base(m_result);
// FIXME shall we handle nested_eval here? If so, we must take care to remove
// the call to nested_eval in the specializations (e.g., in permutation_matrix_product, transposition_matrix_product, etc.)
// typedef typename internal::nested_eval<Lhs,Rhs::ColsAtCompileTime>::type LhsNested;
// typedef typename internal::nested_eval<Rhs,Lhs::RowsAtCompileTime>::type RhsNested;
// typedef typename internal::remove_all<LhsNested>::type LhsNestedCleaned;
// typedef typename internal::remove_all<RhsNested>::type RhsNestedCleaned;
//
// const LhsNested lhs(xpr.lhs());
// const RhsNested rhs(xpr.rhs());
//
// generic_product_impl<LhsNestedCleaned, RhsNestedCleaned>::evalTo(m_result, lhs, rhs);
generic_product_impl<Lhs, Rhs, LhsShape, RhsShape, ProductTag>::evalTo(m_result, xpr.lhs(), xpr.rhs());
}
protected:
PlainObject m_result;
};
@@ -250,13 +250,13 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,InnerProduct>
{
dst.coeffRef(0,0) = (lhs.transpose().cwiseProduct(rhs)).sum();
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
dst.coeffRef(0,0) += (lhs.transpose().cwiseProduct(rhs)).sum();
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ dst.coeffRef(0,0) -= (lhs.transpose().cwiseProduct(rhs)).sum(); }
@@ -298,7 +298,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,OuterProduct>
{
template<typename T> struct is_row_major : internal::conditional<(int(T::Flags)&RowMajorBit), internal::true_type, internal::false_type>::type {};
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
// TODO it would be nice to be able to exploit our *_assign_op functors for that purpose
struct set { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() = src; } };
struct add { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() += src; } };
@@ -310,31 +310,31 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,OuterProduct>
dst.const_cast_derived() += m_scale * src;
}
};
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, set(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, add(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, sub(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
internal::outer_product_selector_run(dst, lhs, rhs, adds(alpha), is_row_major<Dst>());
}
};
@@ -343,7 +343,7 @@ template<typename Lhs, typename Rhs, typename Derived>
struct generic_product_impl_base
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ dst.setZero(); scaleAndAddTo(dst, lhs, rhs, Scalar(1)); }
@@ -355,7 +355,7 @@ struct generic_product_impl_base
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ scaleAndAddTo(dst, lhs, rhs, Scalar(-1)); }
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{ Derived::scaleAndAddTo(dst,lhs,rhs,alpha); }
@@ -375,6 +375,11 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemvProduct>
template<typename Dest>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
// Fall back to an inner product if both the lhs and the rhs are runtime vectors.
if (lhs.rows() == 1 && rhs.cols() == 1) {
dst.coeffRef(0,0) += alpha * lhs.row(0).conjugate().dot(rhs.col(0));
return;
}
LhsNested actual_lhs(lhs);
RhsNested actual_rhs(rhs);
internal::gemv_dense_selector<Side,
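The new branch above guards the corner case where a GEMV-classified product is in fact (1 x N) * (N x 1); a sketch of shapes that can reach it (the exact dispatch depends on product_type, so this is illustrative only):
```
#include <Eigen/Dense>

int main() {
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(1, 5); // runtime row vector
  Eigen::VectorXd b = Eigen::VectorXd::Random(5);
  // 1x1 result: handled by the dot-product fallback instead of a GEMV kernel.
  Eigen::VectorXd c = A * b;
  return 0;
}
```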
@@ -385,10 +390,10 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemvProduct>
};
template<typename Lhs, typename Rhs>
struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
@@ -403,7 +408,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
// dst.noalias() += lhs.lazyProduct(rhs);
call_assignment_no_alias(dst, lhs.lazyProduct(rhs), internal::add_assign_op<typename Dst::Scalar,Scalar>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
@@ -436,8 +441,8 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
};
// FIXME: in c++11 this should be auto, and extractScalarFactor should also return auto
// this is important for real*complex_mat
Scalar actualAlpha = blas_traits<Lhs>::extractScalarFactor(lhs)
* blas_traits<Rhs>::extractScalarFactor(rhs);
Scalar actualAlpha = combine_scalar_factors<Scalar>(lhs, rhs);
eval_dynamic_impl(dst,
blas_traits<Lhs>::extract(lhs).template conjugateIf<ConjLhs>(),
blas_traits<Rhs>::extract(rhs).template conjugateIf<ConjRhs>(),
@@ -520,7 +525,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
typedef typename internal::nested_eval<Lhs,Rhs::ColsAtCompileTime>::type LhsNested;
typedef typename internal::nested_eval<Rhs,Lhs::RowsAtCompileTime>::type RhsNested;
typedef typename internal::remove_all<LhsNested>::type LhsNestedCleaned;
typedef typename internal::remove_all<RhsNested>::type RhsNestedCleaned;
@@ -539,19 +544,19 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
typedef typename find_best_packet<Scalar,ColsAtCompileTime>::type RhsVecPacketType;
enum {
LhsCoeffReadCost = LhsEtorType::CoeffReadCost,
RhsCoeffReadCost = RhsEtorType::CoeffReadCost,
CoeffReadCost = InnerSize==0 ? NumTraits<Scalar>::ReadCost
: InnerSize == Dynamic ? HugeCost
: InnerSize * (NumTraits<Scalar>::MulCost + LhsCoeffReadCost + RhsCoeffReadCost)
: InnerSize * (NumTraits<Scalar>::MulCost + int(LhsCoeffReadCost) + int(RhsCoeffReadCost))
+ (InnerSize - 1) * NumTraits<Scalar>::AddCost,
Unroll = CoeffReadCost <= EIGEN_UNROLLING_LIMIT,
LhsFlags = LhsEtorType::Flags,
RhsFlags = RhsEtorType::Flags,
LhsRowMajor = LhsFlags & RowMajorBit,
RhsRowMajor = RhsFlags & RowMajorBit,
@@ -561,7 +566,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
// Here, we don't care about alignment larger than the usable packet size.
LhsAlignment = EIGEN_PLAIN_ENUM_MIN(LhsEtorType::Alignment,LhsVecPacketSize*int(sizeof(typename LhsNestedCleaned::Scalar))),
RhsAlignment = EIGEN_PLAIN_ENUM_MIN(RhsEtorType::Alignment,RhsVecPacketSize*int(sizeof(typename RhsNestedCleaned::Scalar))),
SameType = is_same<typename LhsNestedCleaned::Scalar,typename RhsNestedCleaned::Scalar>::value,
CanVectorizeRhs = bool(RhsRowMajor) && (RhsFlags & PacketAccessBit) && (ColsAtCompileTime!=1),
@@ -571,12 +576,12 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
: (MaxColsAtCompileTime==1&&MaxRowsAtCompileTime!=1) ? 0
: (bool(RhsRowMajor) && !CanVectorizeLhs),
Flags = ((unsigned int)(LhsFlags | RhsFlags) & HereditaryBits & ~RowMajorBit)
Flags = ((int(LhsFlags) | int(RhsFlags)) & HereditaryBits & ~RowMajorBit)
| (EvalToRowMajor ? RowMajorBit : 0)
// TODO enable vectorization for mixed types
| (SameType && (CanVectorizeLhs || CanVectorizeRhs) ? PacketAccessBit : 0)
| (XprType::IsVectorAtCompileTime ? LinearAccessBit : 0),
LhsOuterStrideBytes = int(LhsNestedCleaned::OuterStrideAtCompileTime) * int(sizeof(typename LhsNestedCleaned::Scalar)),
RhsOuterStrideBytes = int(RhsNestedCleaned::OuterStrideAtCompileTime) * int(sizeof(typename RhsNestedCleaned::Scalar)),
@@ -592,10 +597,10 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
CanVectorizeInner = SameType
&& LhsRowMajor
&& (!RhsRowMajor)
&& (LhsFlags & RhsFlags & ActualPacketAccessBit)
&& (InnerSize % packet_traits<Scalar>::size == 0)
&& (int(LhsFlags) & int(RhsFlags) & ActualPacketAccessBit)
&& (int(InnerSize) % packet_traits<Scalar>::size == 0)
};
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index row, Index col) const
{
return (m_lhs.row(row).transpose().cwiseProduct( m_rhs.col(col) )).sum();
@@ -637,7 +642,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
protected:
typename internal::add_const_on_value_type<LhsNested>::type m_lhs;
typename internal::add_const_on_value_type<RhsNested>::type m_rhs;
LhsEtorType m_lhsImpl;
RhsEtorType m_rhsImpl;
@@ -668,7 +673,7 @@ struct product_evaluator<Product<Lhs, Rhs, DefaultProduct>, LazyCoeffBasedProduc
template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
{
etor_product_packet_impl<RowMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, innerDim, res);
res = pmadd(pset1<Packet>(lhs.coeff(row, Index(UnrollingIndex-1))), rhs.template packet<LoadMode,Packet>(Index(UnrollingIndex-1), col), res);
@@ -678,7 +683,7 @@ struct etor_product_packet_impl<RowMajor, UnrollingIndex, Lhs, Rhs, Packet, Load
template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
{
etor_product_packet_impl<ColMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, innerDim, res);
res = pmadd(lhs.template packet<LoadMode,Packet>(row, Index(UnrollingIndex-1)), pset1<Packet>(rhs.coeff(Index(UnrollingIndex-1), col)), res);
@@ -688,7 +693,7 @@ struct etor_product_packet_impl<ColMajor, UnrollingIndex, Lhs, Rhs, Packet, Load
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, 1, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
{
res = pmul(pset1<Packet>(lhs.coeff(row, Index(0))),rhs.template packet<LoadMode,Packet>(Index(0), col));
}
@@ -697,7 +702,7 @@ struct etor_product_packet_impl<RowMajor, 1, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, 1, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
{
res = pmul(lhs.template packet<LoadMode,Packet>(row, Index(0)), pset1<Packet>(rhs.coeff(Index(0), col)));
}
@@ -706,7 +711,7 @@ struct etor_product_packet_impl<ColMajor, 1, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, 0, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
}
@@ -715,7 +720,7 @@ struct etor_product_packet_impl<RowMajor, 0, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, 0, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
}
@@ -724,7 +729,7 @@ struct etor_product_packet_impl<ColMajor, 0, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
for(Index i = 0; i < innerDim; ++i)
@@ -735,7 +740,7 @@ struct etor_product_packet_impl<RowMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
for(Index i = 0; i < innerDim; ++i)
@@ -757,7 +762,7 @@ struct generic_product_impl<Lhs,Rhs,TriangularShape,DenseShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,TriangularShape,DenseShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -771,7 +776,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,TriangularShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,DenseShape,TriangularShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -792,7 +797,7 @@ struct generic_product_impl<Lhs,Rhs,SelfAdjointShape,DenseShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,SelfAdjointShape,DenseShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static EIGEN_DEVICE_FUNC
void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
@@ -806,7 +811,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -818,7 +823,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag>
/***************************************************************************
* Diagonal products
***************************************************************************/
template<typename MatrixType, typename DiagonalType, typename Derived, int ProductOrder>
struct diagonal_product_evaluator_base
: evaluator_base<Derived>
@@ -827,10 +832,10 @@ struct diagonal_product_evaluator_base
public:
enum {
CoeffReadCost = NumTraits<Scalar>::MulCost + evaluator<MatrixType>::CoeffReadCost + evaluator<DiagonalType>::CoeffReadCost,
MatrixFlags = evaluator<MatrixType>::Flags,
DiagFlags = evaluator<DiagonalType>::Flags,
_StorageOrder = (Derived::MaxRowsAtCompileTime==1 && Derived::MaxColsAtCompileTime!=1) ? RowMajor
: (Derived::MaxColsAtCompileTime==1 && Derived::MaxRowsAtCompileTime!=1) ? ColMajor
: MatrixFlags & RowMajorBit ? RowMajor : ColMajor,
@@ -853,14 +858,14 @@ public:
|| (DiagonalType::SizeAtCompileTime==Dynamic && MatrixType::RowsAtCompileTime==1 && ProductOrder==OnTheLeft)
|| (DiagonalType::SizeAtCompileTime==Dynamic && MatrixType::ColsAtCompileTime==1 && ProductOrder==OnTheRight)
};
diagonal_product_evaluator_base(const MatrixType &mat, const DiagonalType &diag)
EIGEN_DEVICE_FUNC diagonal_product_evaluator_base(const MatrixType &mat, const DiagonalType &diag)
: m_diagImpl(diag), m_matImpl(mat)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(NumTraits<Scalar>::MulCost);
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index idx) const
{
if(AsScalarProduct)
@@ -868,7 +873,7 @@ public:
else
return m_diagImpl.coeff(idx) * m_matImpl.coeff(idx);
}
protected:
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet_impl(Index row, Index col, Index id, internal::true_type) const
@@ -876,7 +881,7 @@ protected:
return internal::pmul(m_matImpl.template packet<LoadMode,PacketType>(row, col),
internal::pset1<PacketType>(m_diagImpl.coeff(id)));
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet_impl(Index row, Index col, Index id, internal::false_type) const
{
@@ -887,7 +892,7 @@ protected:
return internal::pmul(m_matImpl.template packet<LoadMode,PacketType>(row, col),
m_diagImpl.template packet<DiagonalPacketLoadMode,PacketType>(id));
}
evaluator<DiagonalType> m_diagImpl;
evaluator<MatrixType> m_matImpl;
};
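These evaluators apply the diagonal coefficient-wise rather than materializing a dense diagonal matrix; a usage sketch:
```
#include <Eigen/Dense>

int main() {
  Eigen::Vector3d d(1.0, 2.0, 3.0);
  Eigen::Matrix3d M = Eigen::Matrix3d::Random();
  Eigen::Matrix3d L = d.asDiagonal() * M; // L(i,j) == d(i) * M(i,j)
  Eigen::Matrix3d R = M * d.asDiagonal(); // R(i,j) == M(i,j) * d(j)
  return 0;
}
```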
@@ -902,24 +907,24 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DiagonalSha
using Base::m_matImpl;
using Base::coeff;
typedef typename Base::Scalar Scalar;
typedef Product<Lhs, Rhs, ProductKind> XprType;
typedef typename XprType::PlainObject PlainObject;
typedef typename Lhs::DiagonalVectorType DiagonalType;
enum { StorageOrder = Base::_StorageOrder };
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
: Base(xpr.rhs(), xpr.lhs().diagonal())
{
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index row, Index col) const
{
return m_diagImpl.coeff(row) * m_matImpl.coeff(row, col);
}
#ifndef EIGEN_GPUCC
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const
@@ -929,7 +934,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DiagonalSha
return this->template packet_impl<LoadMode,PacketType>(row,col, row,
typename internal::conditional<int(StorageOrder)==RowMajor, internal::true_type, internal::false_type>::type());
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index idx) const
{
@@ -948,22 +953,22 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DenseShape,
using Base::m_matImpl;
using Base::coeff;
typedef typename Base::Scalar Scalar;
typedef Product<Lhs, Rhs, ProductKind> XprType;
typedef typename XprType::PlainObject PlainObject;
enum { StorageOrder = Base::_StorageOrder };
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
: Base(xpr.lhs(), xpr.rhs().diagonal())
{
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index row, Index col) const
{
return m_matImpl.coeff(row, col) * m_diagImpl.coeff(col);
}
#ifndef EIGEN_GPUCC
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const
@@ -971,7 +976,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DenseShape,
return this->template packet_impl<LoadMode,PacketType>(row,col, col,
typename internal::conditional<int(StorageOrder)==ColMajor, internal::true_type, internal::false_type>::type());
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index idx) const
{
@@ -999,7 +1004,7 @@ struct permutation_matrix_product<ExpressionType, Side, Transposed, DenseShape>
typedef typename remove_all<MatrixType>::type MatrixTypeCleaned;
template<typename Dest, typename PermutationType>
static inline void run(Dest& dst, const PermutationType& perm, const ExpressionType& xpr)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Dest& dst, const PermutationType& perm, const ExpressionType& xpr)
{
MatrixType mat(xpr);
const Index n = Side==OnTheLeft ? mat.rows() : mat.cols();
@@ -1053,7 +1058,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, PermutationShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
permutation_matrix_product<Rhs, OnTheLeft, false, MatrixShape>::run(dst, lhs, rhs);
}
@@ -1063,7 +1068,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, MatrixShape, PermutationShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
permutation_matrix_product<Lhs, OnTheRight, false, MatrixShape>::run(dst, rhs, lhs);
}
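A sketch of the permutation products these specializations route, including the Inverse case (illustrative only):
```
#include <Eigen/Dense>

int main() {
  Eigen::PermutationMatrix<Eigen::Dynamic> P(3);
  P.setIdentity();
  P.applyTranspositionOnTheRight(0, 2);
  Eigen::Matrix3d M = Eigen::Matrix3d::Random();
  Eigen::Matrix3d pr = P * M;              // permutes rows
  Eigen::Matrix3d pc = M * P;              // permutes columns
  Eigen::Matrix3d back = P.inverse() * pr; // Inverse<Lhs> * Rhs path above
  return 0;
}
```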
@@ -1073,7 +1078,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Inverse<Lhs>, Rhs, PermutationShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Inverse<Lhs>& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Inverse<Lhs>& lhs, const Rhs& rhs)
{
permutation_matrix_product<Rhs, OnTheLeft, true, MatrixShape>::run(dst, lhs.nestedExpression(), rhs);
}
@@ -1083,7 +1088,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Inverse<Rhs>, MatrixShape, PermutationShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Inverse<Rhs>& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Inverse<Rhs>& rhs)
{
permutation_matrix_product<Lhs, OnTheRight, true, MatrixShape>::run(dst, rhs.nestedExpression(), lhs);
}
@@ -1105,9 +1110,9 @@ struct transposition_matrix_product
{
typedef typename nested_eval<ExpressionType, 1>::type MatrixType;
typedef typename remove_all<MatrixType>::type MatrixTypeCleaned;
template<typename Dest, typename TranspositionType>
static inline void run(Dest& dst, const TranspositionType& tr, const ExpressionType& xpr)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Dest& dst, const TranspositionType& tr, const ExpressionType& xpr)
{
MatrixType mat(xpr);
typedef typename TranspositionType::StorageIndex StorageIndex;
@@ -1130,7 +1135,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, TranspositionsShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
transposition_matrix_product<Rhs, OnTheLeft, false, MatrixShape>::run(dst, lhs, rhs);
}
@@ -1140,7 +1145,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, MatrixShape, TranspositionsShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
transposition_matrix_product<Lhs, OnTheRight, false, MatrixShape>::run(dst, rhs, lhs);
}
@@ -1151,7 +1156,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Transpose<Lhs>, Rhs, TranspositionsShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Transpose<Lhs>& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Transpose<Lhs>& lhs, const Rhs& rhs)
{
transposition_matrix_product<Rhs, OnTheLeft, true, MatrixShape>::run(dst, lhs.nestedExpression(), rhs);
}
@@ -1161,7 +1166,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Transpose<Rhs>, MatrixShape, TranspositionsShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Transpose<Rhs>& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Transpose<Rhs>& rhs)
{
transposition_matrix_product<Lhs, OnTheRight, true, MatrixShape>::run(dst, rhs.nestedExpression(), lhs);
}

View File

@@ -177,6 +177,42 @@ PlainObjectBase<Derived>::setRandom(Index rows, Index cols)
return setRandom();
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to random values. For the parameter of type
* NoChange_t, just pass the special value \c NoChange.
*
* Numbers are uniformly spread through their whole definition range for integer types,
* and in the [-1:1] range for floating point scalar types.
*
* \not_reentrant
*
* \sa DenseBase::setRandom(), setRandom(Index), setRandom(Index, NoChange_t), class CwiseNullaryOp, DenseBase::Random()
*/
template<typename Derived>
EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setRandom(NoChange_t, Index cols)
{
return setRandom(rows(), cols);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to random values. For the parameter of type
* NoChange_t, just pass the special value \c NoChange.
*
* Numbers are uniformly spread through their whole definition range for integer types,
* and in the [-1:1] range for floating point scalar types.
*
* \not_reentrant
*
* \sa DenseBase::setRandom(), setRandom(Index), setRandom(NoChange_t, Index), class CwiseNullaryOp, DenseBase::Random()
*/
template<typename Derived>
EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setRandom(Index rows, NoChange_t)
{
return setRandom(rows, cols());
}
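A brief sketch of the two overloads defined above (illustrative sizes), showing the value ranges the documentation promises:
```
#include <Eigen/Dense>

int main() {
  Eigen::MatrixXf mf(2, 2);
  mf.setRandom(Eigen::NoChange, 3); // now 2x3; entries uniform in [-1, 1]
  Eigen::MatrixXi mi(2, 2);
  mi.setRandom(4, Eigen::NoChange); // now 4x2; entries span the whole int range
  return 0;
}
```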
} // end namespace Eigen
#endif // EIGEN_RANDOM_H

View File

@@ -419,25 +419,33 @@ DenseBase<Derived>::redux(const Func& func) const
}
/** \returns the minimum of all coefficients of \c *this.
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is minimum of elements that are not NaN
* \warning the matrix must not be empty, otherwise an assertion is triggered.
* \warning the result is undefined if \c *this contains NaN.
*/
template<typename Derived>
template<int NaNPropagation>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff() const
{
return derived().redux(Eigen::internal::scalar_min_op<Scalar,Scalar>());
return derived().redux(Eigen::internal::scalar_min_op<Scalar,Scalar, NaNPropagation>());
}
/** \returns the maximum of all coefficients of \c *this.
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must not be empty, otherwise an assertion is triggered.
* \warning the result is undefined if \c *this contains NaN.
*/
template<typename Derived>
template<int NaNPropagation>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff() const
{
return derived().redux(Eigen::internal::scalar_max_op<Scalar,Scalar>());
return derived().redux(Eigen::internal::scalar_max_op<Scalar,Scalar, NaNPropagation>());
}
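A sketch of the new NaNPropagation parameter in use (assumes a float matrix containing a NaN):
```
#include <Eigen/Dense>
#include <limits>

int main() {
  Eigen::Matrix2f m;
  m << 1.f, 2.f,
       std::numeric_limits<float>::quiet_NaN(), 4.f;
  float a = m.maxCoeff<Eigen::PropagateNumbers>(); // 4.f: NaNs are skipped
  float b = m.maxCoeff<Eigen::PropagateNaN>();     // NaN: NaNs win
  float c = m.maxCoeff();                          // PropagateFast: undefined here
  (void)a; (void)b; (void)c;
  return 0;
}
```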
/** \returns the sum of all coefficients of \c *this

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_REF_H
#define EIGEN_REF_H
namespace Eigen {
namespace internal {
@@ -48,7 +48,7 @@ struct traits<Ref<_PlainObjectType, _Options, _StrideType> >
};
typedef typename internal::conditional<MatchAtCompileTime,internal::true_type,internal::false_type>::type type;
};
};
template<typename Derived>
@@ -67,12 +67,12 @@ public:
typedef MapBase<Derived> Base;
EIGEN_DENSE_PUBLIC_INTERFACE(RefBase)
EIGEN_DEVICE_FUNC inline Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const
{
return StrideType::InnerStrideAtCompileTime != 0 ? m_stride.inner() : 1;
}
EIGEN_DEVICE_FUNC inline Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const
{
return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer()
: IsVectorAtCompileTime ? this->size()
@@ -86,36 +86,122 @@ public:
m_stride(StrideType::OuterStrideAtCompileTime==Dynamic?0:StrideType::OuterStrideAtCompileTime,
StrideType::InnerStrideAtCompileTime==Dynamic?0:StrideType::InnerStrideAtCompileTime)
{}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(RefBase)
protected:
typedef Stride<StrideType::OuterStrideAtCompileTime,StrideType::InnerStrideAtCompileTime> StrideBase;
template<typename Expression>
EIGEN_DEVICE_FUNC void construct(Expression& expr)
{
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(PlainObjectType,Expression);
// Resolves inner stride if default 0.
static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index resolveInnerStride(Index inner) {
return inner == 0 ? 1 : inner;
}
// Resolves outer stride if default 0.
static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index resolveOuterStride(Index inner, Index outer, Index rows, Index cols, bool isVectorAtCompileTime, bool isRowMajor) {
return outer == 0 ? isVectorAtCompileTime ? inner * rows * cols : isRowMajor ? inner * cols : inner * rows : outer;
}
// Returns true if construction is valid, false if there is a stride mismatch,
// and raises an assertion if there is a size mismatch.
template<typename Expression>
EIGEN_DEVICE_FUNC bool construct(Expression& expr)
{
// Check matrix sizes. If this is a compile-time vector, we allow
// implicit transposition.
EIGEN_STATIC_ASSERT(
EIGEN_PREDICATE_SAME_MATRIX_SIZE(PlainObjectType, Expression)
// If it is a vector, the transpose sizes might match.
|| ( PlainObjectType::IsVectorAtCompileTime
&& ((int(PlainObjectType::RowsAtCompileTime)==Eigen::Dynamic
|| int(Expression::ColsAtCompileTime)==Eigen::Dynamic
|| int(PlainObjectType::RowsAtCompileTime)==int(Expression::ColsAtCompileTime))
&& (int(PlainObjectType::ColsAtCompileTime)==Eigen::Dynamic
|| int(Expression::RowsAtCompileTime)==Eigen::Dynamic
|| int(PlainObjectType::ColsAtCompileTime)==int(Expression::RowsAtCompileTime)))),
YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES
)
// Determine runtime rows and columns.
Index rows = expr.rows();
Index cols = expr.cols();
if(PlainObjectType::RowsAtCompileTime==1)
{
eigen_assert(expr.rows()==1 || expr.cols()==1);
::new (static_cast<Base*>(this)) Base(expr.data(), 1, expr.size());
rows = 1;
cols = expr.size();
}
else if(PlainObjectType::ColsAtCompileTime==1)
{
eigen_assert(expr.rows()==1 || expr.cols()==1);
::new (static_cast<Base*>(this)) Base(expr.data(), expr.size(), 1);
rows = expr.size();
cols = 1;
}
else
::new (static_cast<Base*>(this)) Base(expr.data(), expr.rows(), expr.cols());
if(Expression::IsVectorAtCompileTime && (!PlainObjectType::IsVectorAtCompileTime) && ((Expression::Flags&RowMajorBit)!=(PlainObjectType::Flags&RowMajorBit)))
::new (&m_stride) StrideBase(expr.innerStride(), StrideType::InnerStrideAtCompileTime==0?0:1);
else
::new (&m_stride) StrideBase(StrideType::OuterStrideAtCompileTime==0?0:expr.outerStride(),
StrideType::InnerStrideAtCompileTime==0?0:expr.innerStride());
// Verify that the sizes are valid.
eigen_assert(
(PlainObjectType::RowsAtCompileTime == Dynamic) || (PlainObjectType::RowsAtCompileTime == rows));
eigen_assert(
(PlainObjectType::ColsAtCompileTime == Dynamic) || (PlainObjectType::ColsAtCompileTime == cols));
// If this is a vector, we might be transposing, which means that the strides should swap.
const bool transpose = PlainObjectType::IsVectorAtCompileTime && (rows != expr.rows());
// If the storage format differs, we also need to swap the stride.
const bool row_major = ((PlainObjectType::Flags)&RowMajorBit) != 0;
const bool expr_row_major = (Expression::Flags&RowMajorBit) != 0;
const bool storage_differs = (row_major != expr_row_major);
const bool swap_stride = (transpose != storage_differs);
// Determine expr's actual strides, resolving any defaults if zero.
const Index expr_inner_actual = resolveInnerStride(expr.innerStride());
const Index expr_outer_actual = resolveOuterStride(expr_inner_actual,
expr.outerStride(),
expr.rows(),
expr.cols(),
Expression::IsVectorAtCompileTime != 0,
expr_row_major);
// If this is a column-major row vector or row-major column vector, the inner-stride
// is arbitrary, so set it to either the compile-time inner stride or 1.
const bool row_vector = (rows == 1);
const bool col_vector = (cols == 1);
const Index inner_stride =
( (!row_major && row_vector) || (row_major && col_vector) ) ?
( StrideType::InnerStrideAtCompileTime > 0 ? Index(StrideType::InnerStrideAtCompileTime) : 1)
: swap_stride ? expr_outer_actual : expr_inner_actual;
// If this is a column-major column vector or row-major row vector, the outer-stride
// is arbitrary, so set it to either the compile-time outer stride or vector size.
const Index outer_stride =
( (!row_major && col_vector) || (row_major && row_vector) ) ?
( StrideType::OuterStrideAtCompileTime > 0 ? Index(StrideType::OuterStrideAtCompileTime) : rows * cols * inner_stride)
: swap_stride ? expr_inner_actual : expr_outer_actual;
// Check if given inner/outer strides are compatible with compile-time strides.
const bool inner_valid = (StrideType::InnerStrideAtCompileTime == Dynamic)
|| (resolveInnerStride(Index(StrideType::InnerStrideAtCompileTime)) == inner_stride);
if (!inner_valid) {
return false;
}
const bool outer_valid = (StrideType::OuterStrideAtCompileTime == Dynamic)
|| (resolveOuterStride(
inner_stride,
Index(StrideType::OuterStrideAtCompileTime),
rows, cols, PlainObjectType::IsVectorAtCompileTime != 0,
row_major)
== outer_stride);
if (!outer_valid) {
return false;
}
::new (static_cast<Base*>(this)) Base(expr.data(), rows, cols);
::new (&m_stride) StrideBase(
(StrideType::OuterStrideAtCompileTime == 0) ? 0 : outer_stride,
(StrideType::InnerStrideAtCompileTime == 0) ? 0 : inner_stride );
return true;
}
StrideBase m_stride;
@@ -212,7 +298,10 @@ template<typename PlainObjectType, int Options, typename StrideType> class Ref
typename internal::enable_if<bool(Traits::template match<Derived>::MatchAtCompileTime),Derived>::type* = 0)
{
EIGEN_STATIC_ASSERT(bool(Traits::template match<Derived>::MatchAtCompileTime), STORAGE_LAYOUT_DOES_NOT_MATCH);
Base::construct(expr.derived());
// Construction must pass since we will not create temporary storage in the non-const case.
const bool success = Base::construct(expr.derived());
EIGEN_UNUSED_VARIABLE(success)
eigen_assert(success);
}
template<typename Derived>
EIGEN_DEVICE_FUNC inline Ref(const DenseBase<Derived>& expr,
@@ -226,7 +315,10 @@ template<typename PlainObjectType, int Options, typename StrideType> class Ref
EIGEN_STATIC_ASSERT(bool(internal::is_lvalue<Derived>::value), THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
EIGEN_STATIC_ASSERT(bool(Traits::template match<Derived>::MatchAtCompileTime), STORAGE_LAYOUT_DOES_NOT_MATCH);
EIGEN_STATIC_ASSERT(!Derived::IsPlainObjectBase,THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
Base::construct(expr.const_cast_derived());
// Construction must pass since we will not create temporary storage in the non-const case.
const bool success = Base::construct(expr.const_cast_derived());
EIGEN_UNUSED_VARIABLE(success)
eigen_assert(success);
}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Ref)
@@ -267,7 +359,10 @@ template<typename TPlainObjectType, int Options, typename StrideType> class Ref<
template<typename Expression>
EIGEN_DEVICE_FUNC void construct(const Expression& expr,internal::true_type)
{
Base::construct(expr);
// Check if we can use the underlying expr's storage directly, otherwise call the copy version.
if (!Base::construct(expr)) {
construct(expr, internal::false_type());
}
}
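With construct() now returning bool, a const Ref degrades to an internal copy instead of asserting when the runtime strides cannot match the requested layout; a sketch:
```
#include <Eigen/Dense>

int main() {
  Eigen::MatrixXf A = Eigen::MatrixXf::Random(4, 4);
  // Compatible strides: the Ref maps the block's storage directly.
  Eigen::Ref<const Eigen::MatrixXf> view(A.topLeftCorner(2, 2));
  // Transposing swaps the strides, so a unit inner stride cannot be honored;
  // construct() returns false and the const Ref evaluates into a temporary.
  Eigen::Ref<const Eigen::MatrixXf> copy(A.transpose());
  return 0;
}
```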
template<typename Expression>

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_REPLICATE_H
#define EIGEN_REPLICATE_H
namespace Eigen {
namespace internal {
template<typename MatrixType,int RowFactor,int ColFactor>
@@ -35,7 +35,7 @@ struct traits<Replicate<MatrixType,RowFactor,ColFactor> >
IsRowMajor = MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1 ? 1
: MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1 ? 0
: (MatrixType::Flags & RowMajorBit) ? 1 : 0,
// FIXME enable DirectAccess with negative strides?
Flags = IsRowMajor ? RowMajorBit : 0
};
@@ -88,15 +88,15 @@ template<typename MatrixType,int RowFactor,int ColFactor> class Replicate
THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE)
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const { return m_matrix.rows() * m_rowFactor.value(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const { return m_matrix.cols() * m_colFactor.value(); }
EIGEN_DEVICE_FUNC
const _MatrixTypeNested& nestedExpression() const
{
return m_matrix;
}
protected:
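The rows()/cols() formulas above are simply the nested sizes scaled by the replication factors; a sketch:
```
#include <Eigen/Dense>

int main() {
  Eigen::RowVector2i v(1, 2);
  // 1 row * 3 and 2 columns * 2: the result is a 3x4 matrix.
  Eigen::MatrixXi r = v.replicate(3, 2);
  return 0;
}
```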

View File

@@ -12,7 +12,6 @@
#define EIGEN_RESHAPED_H
namespace Eigen {
namespace internal {
/** \class Reshaped
* \ingroup Core_Module
@@ -44,6 +43,8 @@ namespace internal {
* \sa DenseBase::reshaped(NRowsType,NColsType)
*/
namespace internal {
template<typename XprType, int Rows, int Cols, int Order>
struct traits<Reshaped<XprType, Rows, Cols, Order> > : traits<XprType>
{
@@ -239,14 +240,14 @@ class ReshapedImpl_dense<XprType, Rows, Cols, Order, true>
XprType& nestedExpression() { return m_xpr; }
/** \sa MapBase::innerStride() */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const
{
return m_xpr.innerStride();
}
/** \sa MapBase::outerStride() */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const
{
return ((Flags&RowMajorBit)==RowMajorBit) ? this->cols() : this->rows();

View File

@@ -60,8 +60,10 @@ template<typename Derived> class ReturnByValue
EIGEN_DEVICE_FUNC
inline void evalTo(Dest& dst) const
{ static_cast<const Derived*>(this)->evalTo(dst); }
EIGEN_DEVICE_FUNC inline Index rows() const { return static_cast<const Derived*>(this)->rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return static_cast<const Derived*>(this)->cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return static_cast<const Derived*>(this)->rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return static_cast<const Derived*>(this)->cols(); }
#ifndef EIGEN_PARSED_BY_DOXYGEN
#define Unusable YOU_ARE_TRYING_TO_ACCESS_A_SINGLE_COEFFICIENT_IN_A_SPECIAL_EXPRESSION_WHERE_THAT_IS_NOT_ALLOWED_BECAUSE_THAT_WOULD_BE_INEFFICIENT
@@ -90,7 +92,7 @@ namespace internal {
// Expression is evaluated in a temporary; default implementation of Assignment is bypassed so that
// when a ReturnByValue expression is assigned, the evaluator is not constructed.
// TODO: Finalize port to new regime; ReturnByValue should not exist in the expression world
template<typename Derived>
struct evaluator<ReturnByValue<Derived> >
: public evaluator<typename internal::traits<Derived>::ReturnType>
@@ -98,7 +100,7 @@ struct evaluator<ReturnByValue<Derived> >
typedef ReturnByValue<Derived> XprType;
typedef typename internal::traits<Derived>::ReturnType PlainObject;
typedef evaluator<PlainObject> Base;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr)
: m_result(xpr.rows(), xpr.cols())
{

View File

@@ -12,7 +12,7 @@
#ifndef EIGEN_REVERSE_H
#define EIGEN_REVERSE_H
namespace Eigen {
namespace internal {
@@ -44,7 +44,7 @@ template<typename PacketType> struct reverse_packet_cond<PacketType,false>
static inline PacketType run(const PacketType& x) { return x; }
};
} // end namespace internal
/** \class Reverse
* \ingroup Core_Module
@@ -89,8 +89,10 @@ template<typename MatrixType, int Direction> class Reverse
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Reverse)
EIGEN_DEVICE_FUNC inline Index rows() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC inline Index innerStride() const
{
@@ -98,7 +100,7 @@ template<typename MatrixType, int Direction> class Reverse
}
EIGEN_DEVICE_FUNC const typename internal::remove_all<typename MatrixType::Nested>::type&
nestedExpression() const
{
return m_matrix;
}
@@ -161,7 +163,7 @@ EIGEN_DEVICE_FUNC inline void DenseBase<Derived>::reverseInPlace()
}
namespace internal {
template<int Direction>
struct vectorwise_reverse_inplace_impl;

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_SELECT_H
#define EIGEN_SELECT_H
namespace Eigen {
/** \class Select
* \ingroup Core_Module
@@ -67,8 +67,10 @@ class Select : public internal::dense_xpr_base< Select<ConditionMatrixType, Then
eigen_assert(m_condition.cols() == m_then.cols() && m_condition.cols() == m_else.cols());
}
inline EIGEN_DEVICE_FUNC Index rows() const { return m_condition.rows(); }
inline EIGEN_DEVICE_FUNC Index cols() const { return m_condition.cols(); }
inline EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_condition.rows(); }
inline EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_condition.cols(); }
inline EIGEN_DEVICE_FUNC
const Scalar coeff(Index i, Index j) const
@@ -120,7 +122,7 @@ class Select : public internal::dense_xpr_base< Select<ConditionMatrixType, Then
*/
template<typename Derived>
template<typename ThenDerived,typename ElseDerived>
inline const Select<Derived,ThenDerived,ElseDerived>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived,ElseDerived>
DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
const DenseBase<ElseDerived>& elseMatrix) const
{
@@ -134,7 +136,7 @@ DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
*/
template<typename Derived>
template<typename ThenDerived>
inline const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
const typename ThenDerived::Scalar& elseScalar) const
{
@@ -149,7 +151,7 @@ DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
*/
template<typename Derived>
template<typename ElseDerived>
inline const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
inline EIGEN_DEVICE_FUNC const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
DenseBase<Derived>::select(const typename ElseDerived::Scalar& thenScalar,
const DenseBase<ElseDerived>& elseMatrix) const
{
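The three select() overloads above build a coefficient-wise ternary from a condition, a "then", and an "else"; a common sketch:
```
#include <Eigen/Dense>

int main() {
  Eigen::MatrixXf m = Eigen::MatrixXf::Random(3, 3);
  // Keep positive entries, replace the rest by the scalar 0 (then/elseScalar form).
  Eigen::MatrixXf clamped = (m.array() > 0.f).select(m, 0.f);
  return 0;
}
```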

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_SELFADJOINTMATRIX_H
#define EIGEN_SELFADJOINTMATRIX_H
namespace Eigen {
/** \class SelfAdjointView
* \ingroup Core_Module
@@ -58,7 +58,7 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
typedef MatrixTypeNestedCleaned NestedExpression;
/** \brief The type of coefficients in this matrix */
typedef typename internal::traits<SelfAdjointView>::Scalar Scalar;
typedef typename MatrixType::StorageIndex StorageIndex;
typedef typename internal::remove_all<typename MatrixType::ConjugateReturnType>::type MatrixConjugateReturnType;
typedef SelfAdjointView<typename internal::add_const<MatrixType>::type, UpLo> ConstSelfAdjointView;
@@ -76,14 +76,14 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
EIGEN_STATIC_ASSERT(UpLo==Lower || UpLo==Upper,SELFADJOINTVIEW_ACCEPTS_UPPER_AND_LOWER_MODE_ONLY);
}
EIGEN_DEVICE_FUNC
inline Index rows() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC
inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC
inline Index outerStride() const { return m_matrix.outerStride(); }
EIGEN_DEVICE_FUNC
inline Index innerStride() const { return m_matrix.innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return m_matrix.outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return m_matrix.innerStride(); }
/** \sa MatrixBase::coeff()
* \warning the coordinates must fit into the referenced triangular part
@@ -132,7 +132,7 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
{
return Product<OtherDerived,SelfAdjointView>(lhs.derived(),rhs);
}
friend EIGEN_DEVICE_FUNC
const SelfAdjointView<const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(Scalar,MatrixType,product),UpLo>
operator*(const Scalar& s, const SelfAdjointView& mat)
@@ -300,17 +300,17 @@ protected:
using Base::m_src;
using Base::m_functor;
public:
typedef typename Base::DstEvaluatorType DstEvaluatorType;
typedef typename Base::SrcEvaluatorType SrcEvaluatorType;
typedef typename Base::Scalar Scalar;
typedef typename Base::AssignmentTraits AssignmentTraits;
EIGEN_DEVICE_FUNC triangular_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{}
EIGEN_DEVICE_FUNC void assignCoeff(Index row, Index col)
{
eigen_internal_assert(row!=col);
@@ -318,12 +318,12 @@ public:
m_functor.assignCoeff(m_dst.coeffRef(row,col), tmp);
m_functor.assignCoeff(m_dst.coeffRef(col,row), numext::conj(tmp));
}
EIGEN_DEVICE_FUNC void assignDiagonalCoeff(Index id)
{
Base::assignCoeff(id,id);
}
EIGEN_DEVICE_FUNC void assignOppositeCoeff(Index, Index)
{ eigen_internal_assert(false && "should never be called"); }
};
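A usage sketch of SelfAdjointView: only the triangle named by UpLo is read, the opposite one being inferred by conjugation as in the assignment kernel above:
```
#include <Eigen/Dense>

int main() {
  Eigen::Matrix3d M = Eigen::Matrix3d::Random();
  Eigen::Vector3d x = Eigen::Vector3d::Random();
  // Treat the lower triangle of M as a symmetric matrix.
  Eigen::Vector3d y = M.selfadjointView<Eigen::Lower>() * x;
  return 0;
}
```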

View File

@@ -13,7 +13,7 @@
namespace Eigen {
template<typename Decomposition, typename RhsType, typename StorageKind> class SolveImpl;
/** \class Solve
* \ingroup Core_Module
*
@@ -64,13 +64,13 @@ class Solve : public SolveImpl<Decomposition,RhsType,typename internal::traits<R
public:
typedef typename internal::traits<Solve>::PlainObject PlainObject;
typedef typename internal::traits<Solve>::StorageIndex StorageIndex;
Solve(const Decomposition &dec, const RhsType &rhs)
: m_dec(dec), m_rhs(rhs)
{}
EIGEN_DEVICE_FUNC Index rows() const { return m_dec.cols(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_dec.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC const Decomposition& dec() const { return m_dec; }
EIGEN_DEVICE_FUNC const RhsType& rhs() const { return m_rhs; }
@@ -87,14 +87,14 @@ class SolveImpl<Decomposition,RhsType,Dense>
: public MatrixBase<Solve<Decomposition,RhsType> >
{
typedef Solve<Decomposition,RhsType> Derived;
public:
typedef MatrixBase<Solve<Decomposition,RhsType> > Base;
EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
private:
Scalar coeff(Index row, Index col) const;
Scalar coeff(Index i) const;
};
@@ -119,15 +119,15 @@ struct evaluator<Solve<Decomposition,RhsType> >
typedef evaluator<PlainObject> Base;
enum { Flags = Base::Flags | EvalBeforeNestingBit };
EIGEN_DEVICE_FUNC explicit evaluator(const SolveType& solve)
: m_result(solve.rows(), solve.cols())
{
::new (static_cast<Base*>(this)) Base(m_result);
solve.dec()._solve_impl(solve.rhs(), m_result);
}
protected:
PlainObject m_result;
};
@@ -176,7 +176,7 @@ struct Assignment<DstXprType, Solve<CwiseUnaryOp<internal::scalar_conjugate_op<t
Index dstCols = src.cols();
if((dst.rows()!=dstRows) || (dst.cols()!=dstCols))
dst.resize(dstRows, dstCols);
src.dec().nestedExpression().nestedExpression().template _solve_impl_transposed<true>(src.rhs(), dst);
}
};
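A sketch of how a Solve expression is typically created and evaluated (the evaluator above, flagged EvalBeforeNestingBit, runs _solve_impl on assignment):
```
#include <Eigen/Dense>

int main() {
  Eigen::Matrix3d A = Eigen::Matrix3d::Random();
  Eigen::Vector3d b = Eigen::Vector3d::Random();
  // solve() returns a Solve<> expression; x is filled when it is assigned.
  Eigen::Vector3d x = A.partialPivLu().solve(b);
  return 0;
}
```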

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_SOLVETRIANGULAR_H
#define EIGEN_SOLVETRIANGULAR_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -54,7 +54,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,1>
typedef blas_traits<Lhs> LhsProductTraits;
typedef typename LhsProductTraits::ExtractType ActualLhsType;
typedef Map<Matrix<RhsScalar,Dynamic,1>, Aligned> MappedRhs;
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
ActualLhsType actualLhs = LhsProductTraits::extract(lhs);
@@ -64,7 +64,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,1>
ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhs,rhs.size(),
(useRhsDirectly ? rhs.data() : 0));
if(!useRhsDirectly)
MappedRhs(actualRhs,rhs.size()) = rhs;
@@ -85,7 +85,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,Dynamic>
typedef blas_traits<Lhs> LhsProductTraits;
typedef typename LhsProductTraits::DirectLinearAccessType ActualLhsType;
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
typename internal::add_const_on_value_type<ActualLhsType>::type actualLhs = LhsProductTraits::extract(lhs);
@@ -118,7 +118,7 @@ struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,false> {
DiagIndex = IsLower ? LoopIndex : Size - LoopIndex - 1,
StartIndex = IsLower ? 0 : DiagIndex+1
};
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
if (LoopIndex>0)
rhs.coeffRef(DiagIndex) -= lhs.row(DiagIndex).template segment<LoopIndex>(StartIndex).transpose()
@@ -133,22 +133,22 @@ struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,false> {
template<typename Lhs, typename Rhs, int Mode, int LoopIndex, int Size>
struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,true> {
static void run(const Lhs&, Rhs&) {}
static EIGEN_DEVICE_FUNC void run(const Lhs&, Rhs&) {}
};
template<typename Lhs, typename Rhs, int Mode>
struct triangular_solver_selector<Lhs,Rhs,OnTheLeft,Mode,CompleteUnrolling,1> {
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{ triangular_solver_unroller<Lhs,Rhs,Mode,0,Rhs::SizeAtCompileTime>::run(lhs,rhs); }
};
template<typename Lhs, typename Rhs, int Mode>
struct triangular_solver_selector<Lhs,Rhs,OnTheRight,Mode,CompleteUnrolling,1> {
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
Transpose<const Lhs> trLhs(lhs);
Transpose<Rhs> trRhs(rhs);
triangular_solver_unroller<Transpose<const Lhs>,Transpose<Rhs>,
((Mode&Upper)==Upper ? Lower : Upper) | (Mode&UnitDiag),
0,Rhs::SizeAtCompileTime>::run(trLhs,trRhs);
@@ -213,8 +213,8 @@ template<int Side, typename TriangularType, typename Rhs> struct triangular_solv
: m_triangularMatrix(tri), m_rhs(rhs)
{}
inline Index rows() const { return m_rhs.rows(); }
inline Index cols() const { return m_rhs.cols(); }
inline EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_rhs.rows(); }
inline EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); }
template<typename Dest> inline void evalTo(Dest& dst) const
{

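For context, a small sketch (hypothetical example, assuming a well-conditioned lower-triangular system): triangularView(...).solveInPlace() dispatches through triangular_solver_selector, and for small fixed sizes the unrolled specializations above, now marked EIGEN_DEVICE_FUNC, can also run in device code.

#include <Eigen/Dense>
using namespace Eigen;

int main() {
  Matrix3d L = Matrix3d::Random();
  Vector3d b = Vector3d::Random();
  // Solves L * x = b in place; a fixed-size 3x1 rhs can take the
  // CompleteUnrolling path via triangular_solver_unroller above.
  L.triangularView<Lower>().solveInPlace(b);
  return 0;
}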
View File

@@ -123,41 +123,28 @@ blueNorm_impl(const EigenBase<Derived>& _vec)
using std::pow;
using std::sqrt;
using std::abs;
// This program calculates the machine-dependent constants
// b1, b2, s1m, s2m, relerr, overfl
// from the "basic" machine-dependent numbers
// nbig, ibeta, it, iemin, iemax, rbig.
// The following define the basic machine-dependent constants.
// For portability, the PORT subprograms "i1mach" and "r1mach"
// are used. For any specific computer, each of the assignment
// statements can be replaced.
static const int ibeta = std::numeric_limits<RealScalar>::radix; // base for floating-point numbers
static const int it = NumTraits<RealScalar>::digits(); // number of base-beta digits in mantissa
static const int iemin = NumTraits<RealScalar>::min_exponent(); // minimum exponent
static const int iemax = NumTraits<RealScalar>::max_exponent(); // maximum exponent
static const RealScalar rbig = NumTraits<RealScalar>::highest(); // largest floating-point number
static const RealScalar b1 = RealScalar(pow(RealScalar(ibeta),RealScalar(-((1-iemin)/2)))); // lower boundary of midrange
static const RealScalar b2 = RealScalar(pow(RealScalar(ibeta),RealScalar((iemax + 1 - it)/2))); // upper boundary of midrange
static const RealScalar s1m = RealScalar(pow(RealScalar(ibeta),RealScalar((2-iemin)/2))); // scaling factor for lower range
static const RealScalar s2m = RealScalar(pow(RealScalar(ibeta),RealScalar(- ((iemax+it)/2)))); // scaling factor for upper range
static const RealScalar eps = RealScalar(pow(double(ibeta), 1-it));
static const RealScalar relerr = sqrt(eps); // tolerance for neglecting asml
const Derived& vec(_vec.derived());
static bool initialized = false;
static RealScalar b1, b2, s1m, s2m, rbig, relerr;
if(!initialized)
{
int ibeta, it, iemin, iemax, iexp;
RealScalar eps;
// This program calculates the machine-dependent constants
// b1, b2, s1m, s2m, relerr, overfl
// from the "basic" machine-dependent numbers
// nbig, ibeta, it, iemin, iemax, rbig.
// The following define the basic machine-dependent constants.
// For portability, the PORT subprograms "i1mach" and "r1mach"
// are used. For any specific computer, each of the assignment
// statements can be replaced.
ibeta = std::numeric_limits<RealScalar>::radix; // base for floating-point numbers
it = NumTraits<RealScalar>::digits(); // number of base-beta digits in mantissa
iemin = std::numeric_limits<RealScalar>::min_exponent; // minimum exponent
iemax = std::numeric_limits<RealScalar>::max_exponent; // maximum exponent
rbig = (std::numeric_limits<RealScalar>::max)(); // largest floating-point number
iexp = -((1-iemin)/2);
b1 = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // lower boundary of midrange
iexp = (iemax + 1 - it)/2;
b2 = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // upper boundary of midrange
iexp = (2-iemin)/2;
s1m = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // scaling factor for lower range
iexp = - ((iemax+it)/2);
s2m = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // scaling factor for upper range
eps = RealScalar(pow(double(ibeta), 1-it));
relerr = sqrt(eps); // tolerance for neglecting asml
initialized = true;
}
Index n = vec.size();
RealScalar ab2 = b2 / RealScalar(n);
RealScalar asml = RealScalar(0);
@@ -166,9 +153,9 @@ blueNorm_impl(const EigenBase<Derived>& _vec)
for(Index j=0; j<vec.outerSize(); ++j)
{
for(typename Derived::InnerIterator it(vec, j); it; ++it)
for(typename Derived::InnerIterator iter(vec, j); iter; ++iter)
{
RealScalar ax = abs(it.value());
RealScalar ax = abs(iter.value());
if(ax > ab2) abig += numext::abs2(ax*s2m);
else if(ax < b1) asml += numext::abs2(ax*s1m);
else amed += numext::abs2(ax);

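As a quick sanity check of the formulas above (my arithmetic, not from the patch, assuming IEEE single precision: radix 2, 24 mantissa digits, min_exponent -125, max_exponent 128):

// For float: ibeta = 2, it = 24, iemin = -125, iemax = 128, so
//   b1  = 2^(-((1 - iemin)/2))   = 2^-63  // lower boundary of midrange
//   b2  = 2^((iemax + 1 - it)/2) = 2^52   // upper boundary of midrange
//   s1m = 2^((2 - iemin)/2)      = 2^63   // scaling for the lower range
//   s2m = 2^(-((iemax + it)/2))  = 2^-76  // scaling for the upper range
//   eps = 2^(1 - it) = 2^-23, relerr = sqrt(eps) ~= 3.45e-4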
View File

@@ -7,6 +7,9 @@
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_STLITERATORS_H
#define EIGEN_STLITERATORS_H
namespace Eigen {
namespace internal {
@@ -30,10 +33,10 @@ public:
typedef Index difference_type;
typedef std::random_access_iterator_tag iterator_category;
indexed_based_stl_iterator_base() : mp_xpr(0), m_index(0) {}
indexed_based_stl_iterator_base(XprType& xpr, Index index) : mp_xpr(&xpr), m_index(index) {}
indexed_based_stl_iterator_base() EIGEN_NO_THROW : mp_xpr(0), m_index(0) {}
indexed_based_stl_iterator_base(XprType& xpr, Index index) EIGEN_NO_THROW : mp_xpr(&xpr), m_index(index) {}
indexed_based_stl_iterator_base(const non_const_iterator& other)
indexed_based_stl_iterator_base(const non_const_iterator& other) EIGEN_NO_THROW
: mp_xpr(other.mp_xpr), m_index(other.m_index)
{}
@@ -93,6 +96,85 @@ protected:
Index m_index;
};
template<typename Derived>
class indexed_based_stl_reverse_iterator_base
{
protected:
typedef indexed_based_stl_iterator_traits<Derived> traits;
typedef typename traits::XprType XprType;
typedef indexed_based_stl_reverse_iterator_base<typename traits::non_const_iterator> non_const_iterator;
typedef indexed_based_stl_reverse_iterator_base<typename traits::const_iterator> const_iterator;
typedef typename internal::conditional<internal::is_const<XprType>::value,non_const_iterator,const_iterator>::type other_iterator;
// NOTE: in C++03 we cannot declare friend classes through typedefs because we need to write friend class:
friend class indexed_based_stl_reverse_iterator_base<typename traits::const_iterator>;
friend class indexed_based_stl_reverse_iterator_base<typename traits::non_const_iterator>;
public:
typedef Index difference_type;
typedef std::random_access_iterator_tag iterator_category;
indexed_based_stl_reverse_iterator_base() : mp_xpr(0), m_index(0) {}
indexed_based_stl_reverse_iterator_base(XprType& xpr, Index index) : mp_xpr(&xpr), m_index(index) {}
indexed_based_stl_reverse_iterator_base(const non_const_iterator& other)
: mp_xpr(other.mp_xpr), m_index(other.m_index)
{}
indexed_based_stl_reverse_iterator_base& operator=(const non_const_iterator& other)
{
mp_xpr = other.mp_xpr;
m_index = other.m_index;
return *this;
}
Derived& operator++() { --m_index; return derived(); }
Derived& operator--() { ++m_index; return derived(); }
Derived operator++(int) { Derived prev(derived()); operator++(); return prev;}
Derived operator--(int) { Derived prev(derived()); operator--(); return prev;}
friend Derived operator+(const indexed_based_stl_reverse_iterator_base& a, Index b) { Derived ret(a.derived()); ret += b; return ret; }
friend Derived operator-(const indexed_based_stl_reverse_iterator_base& a, Index b) { Derived ret(a.derived()); ret -= b; return ret; }
friend Derived operator+(Index a, const indexed_based_stl_reverse_iterator_base& b) { Derived ret(b.derived()); ret += a; return ret; }
friend Derived operator-(Index a, const indexed_based_stl_reverse_iterator_base& b) { Derived ret(b.derived()); ret -= a; return ret; }
Derived& operator+=(Index b) { m_index -= b; return derived(); }
Derived& operator-=(Index b) { m_index += b; return derived(); }
difference_type operator-(const indexed_based_stl_reverse_iterator_base& other) const
{
eigen_assert(mp_xpr == other.mp_xpr);
return other.m_index - m_index;
}
difference_type operator-(const other_iterator& other) const
{
eigen_assert(mp_xpr == other.mp_xpr);
return other.m_index - m_index;
}
bool operator==(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index == other.m_index; }
bool operator!=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index != other.m_index; }
bool operator< (const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index > other.m_index; }
bool operator<=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index >= other.m_index; }
bool operator> (const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index < other.m_index; }
bool operator>=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index <= other.m_index; }
bool operator==(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index == other.m_index; }
bool operator!=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index != other.m_index; }
bool operator< (const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index > other.m_index; }
bool operator<=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index >= other.m_index; }
bool operator> (const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index < other.m_index; }
bool operator>=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index <= other.m_index; }
protected:
Derived& derived() { return static_cast<Derived&>(*this); }
const Derived& derived() const { return static_cast<const Derived&>(*this); }
XprType *mp_xpr;
Index m_index;
};
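Note how the arithmetic is inverted relative to the forward iterator: operator++ decrements m_index and the ordering comparisons are flipped. A usage sketch (hypothetical example, relying on the rowwise()/colwise() iterators defined later in this file):

#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;

int main() {
  MatrixXd m(3,2);
  m << 1, 2,
       3, 4,
       5, 6;
  // ++it decrements the internal index, so rows are visited bottom-up.
  for (auto it = m.rowwise().rbegin(); it != m.rowwise().rend(); ++it)
    std::cout << *it << "\n";   // prints rows 2, 1, 0
  return 0;
}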
template<typename XprType>
class pointer_based_stl_iterator
{
@@ -111,17 +193,17 @@ public:
typedef typename internal::conditional<bool(is_lvalue), value_type&, const value_type&>::type reference;
pointer_based_stl_iterator() : m_ptr(0) {}
pointer_based_stl_iterator(XprType& xpr, Index index) : m_incr(xpr.innerStride())
pointer_based_stl_iterator() EIGEN_NO_THROW : m_ptr(0) {}
pointer_based_stl_iterator(XprType& xpr, Index index) EIGEN_NO_THROW : m_incr(xpr.innerStride())
{
m_ptr = xpr.data() + index * m_incr.value();
}
pointer_based_stl_iterator(const non_const_iterator& other)
pointer_based_stl_iterator(const non_const_iterator& other) EIGEN_NO_THROW
: m_ptr(other.m_ptr), m_incr(other.m_incr)
{}
pointer_based_stl_iterator& operator=(const non_const_iterator& other)
pointer_based_stl_iterator& operator=(const non_const_iterator& other) EIGEN_NO_THROW
{
m_ptr = other.m_ptr;
m_incr.setValue(other.m_incr);
@@ -267,6 +349,54 @@ public:
pointer operator->() const { return (*mp_xpr).template subVector<Direction>(m_index); }
};
template<typename _XprType, DirectionType Direction>
struct indexed_based_stl_iterator_traits<subvector_stl_reverse_iterator<_XprType,Direction> >
{
typedef _XprType XprType;
typedef subvector_stl_reverse_iterator<typename internal::remove_const<XprType>::type, Direction> non_const_iterator;
typedef subvector_stl_reverse_iterator<typename internal::add_const<XprType>::type, Direction> const_iterator;
};
template<typename XprType, DirectionType Direction>
class subvector_stl_reverse_iterator : public indexed_based_stl_reverse_iterator_base<subvector_stl_reverse_iterator<XprType,Direction> >
{
protected:
enum { is_lvalue = internal::is_lvalue<XprType>::value };
typedef indexed_based_stl_reverse_iterator_base<subvector_stl_reverse_iterator> Base;
using Base::m_index;
using Base::mp_xpr;
typedef typename internal::conditional<Direction==Vertical,typename XprType::ColXpr,typename XprType::RowXpr>::type SubVectorType;
typedef typename internal::conditional<Direction==Vertical,typename XprType::ConstColXpr,typename XprType::ConstRowXpr>::type ConstSubVectorType;
public:
typedef typename internal::conditional<bool(is_lvalue), SubVectorType, ConstSubVectorType>::type reference;
typedef typename reference::PlainObject value_type;
private:
class subvector_stl_reverse_iterator_ptr
{
public:
subvector_stl_reverse_iterator_ptr(const reference &subvector) : m_subvector(subvector) {}
reference* operator->() { return &m_subvector; }
private:
reference m_subvector;
};
public:
typedef subvector_stl_reverse_iterator_ptr pointer;
subvector_stl_reverse_iterator() : Base() {}
subvector_stl_reverse_iterator(XprType& xpr, Index index) : Base(xpr,index) {}
reference operator*() const { return (*mp_xpr).template subVector<Direction>(m_index); }
reference operator[](Index i) const { return (*mp_xpr).template subVector<Direction>(m_index+i); }
pointer operator->() const { return (*mp_xpr).template subVector<Direction>(m_index); }
};
} // namespace internal
@@ -329,3 +459,5 @@ inline typename DenseBase<Derived>::const_iterator DenseBase<Derived>::cend() co
}
} // namespace Eigen
#endif // EIGEN_STLITERATORS_H

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_STRIDE_H
#define EIGEN_STRIDE_H
namespace Eigen {
/** \class Stride
* \ingroup Core_Module
@@ -38,6 +38,10 @@ namespace Eigen {
* \include Map_general_stride.cpp
* Output: \verbinclude Map_general_stride.out
*
* Both strides can be negative. However, a negative stride of -1 cannot be specified at compile time
* because of the ambiguity with Dynamic, which is defined to -1 (historically, negative strides were
* not allowed).
*
* \sa class InnerStride, class OuterStride, \ref TopicStorageOrders
*/
template<int _OuterStrideAtCompileTime, int _InnerStrideAtCompileTime>
@@ -55,6 +59,8 @@ class Stride
Stride()
: m_outer(OuterStrideAtCompileTime), m_inner(InnerStrideAtCompileTime)
{
// FIXME: for Eigen 4 we should use DynamicIndex instead of Dynamic.
// FIXME: for Eigen 4 we should also unify this API with fix<>
eigen_assert(InnerStrideAtCompileTime != Dynamic && OuterStrideAtCompileTime != Dynamic);
}
@@ -63,7 +69,6 @@ class Stride
Stride(Index outerStride, Index innerStride)
: m_outer(outerStride), m_inner(innerStride)
{
eigen_assert(innerStride>=0 && outerStride>=0);
}
/** Copy constructor */
@@ -73,10 +78,10 @@ class Stride
{}
/** \returns the outer stride */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outer() const { return m_outer.value(); }
/** \returns the inner stride */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index inner() const { return m_inner.value(); }
protected:

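A sketch of the negative-stride rule documented above (hypothetical example): a stride of -1 must be passed at run time, since the compile-time value -1 would be read as Dynamic.

#include <Eigen/Dense>
using namespace Eigen;

int main() {
  double data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
  // Compile-time -1 would collide with Dynamic, so pass it at run time:
  Map<VectorXd, 0, InnerStride<> > rev(data + 7, 8, InnerStride<>(-1));
  // rev now reads 7, 6, 5, ..., 0.
  return 0;
}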
View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_TRANSPOSE_H
#define EIGEN_TRANSPOSE_H
namespace Eigen {
namespace internal {
template<typename MatrixType>
@@ -65,10 +65,10 @@ template<typename MatrixType> class Transpose
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Transpose)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
/** \returns the nested expression */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -153,6 +153,8 @@ template<typename MatrixType> class TransposeImpl<MatrixType,Dense>
{
return derived().nestedExpression().coeffRef(index);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TransposeImpl)
};
/** \returns an expression of the transpose of *this.
@@ -241,7 +243,6 @@ struct inplace_transpose_selector<MatrixType,true,false> { // square matrix
}
};
// TODO: vectorized path is currently limited to LargestPacketSize x LargestPacketSize cases only.
template<typename MatrixType>
struct inplace_transpose_selector<MatrixType,true,true> { // PacketSize x PacketSize
static void run(MatrixType& m) {
@@ -258,16 +259,66 @@ struct inplace_transpose_selector<MatrixType,true,true> { // PacketSize x Packet
}
};
template <typename MatrixType, Index Alignment>
void BlockedInPlaceTranspose(MatrixType& m) {
typedef typename MatrixType::Scalar Scalar;
typedef typename internal::packet_traits<typename MatrixType::Scalar>::type Packet;
const Index PacketSize = internal::packet_traits<Scalar>::size;
eigen_assert(m.rows() == m.cols());
int row_start = 0;
for (; row_start + PacketSize <= m.rows(); row_start += PacketSize) {
for (int col_start = row_start; col_start + PacketSize <= m.cols(); col_start += PacketSize) {
PacketBlock<Packet> A;
if (row_start == col_start) {
for (Index i=0; i<PacketSize; ++i)
A.packet[i] = m.template packetByOuterInner<Alignment>(row_start + i,col_start);
internal::ptranspose(A);
for (Index i=0; i<PacketSize; ++i)
m.template writePacket<Alignment>(m.rowIndexByOuterInner(row_start + i, col_start), m.colIndexByOuterInner(row_start + i,col_start), A.packet[i]);
} else {
PacketBlock<Packet> B;
for (Index i=0; i<PacketSize; ++i) {
A.packet[i] = m.template packetByOuterInner<Alignment>(row_start + i,col_start);
B.packet[i] = m.template packetByOuterInner<Alignment>(col_start + i, row_start);
}
internal::ptranspose(A);
internal::ptranspose(B);
for (Index i=0; i<PacketSize; ++i) {
m.template writePacket<Alignment>(m.rowIndexByOuterInner(row_start + i, col_start), m.colIndexByOuterInner(row_start + i,col_start), B.packet[i]);
m.template writePacket<Alignment>(m.rowIndexByOuterInner(col_start + i, row_start), m.colIndexByOuterInner(col_start + i,row_start), A.packet[i]);
}
}
}
}
for (Index row = row_start; row < m.rows(); ++row) {
m.matrix().row(row).head(row).swap(
m.matrix().col(row).head(row).transpose());
}
}
template<typename MatrixType,bool MatchPacketSize>
struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non square matrix
struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non square or dynamic matrix
static void run(MatrixType& m) {
if (m.rows()==m.cols())
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose().template triangularView<StrictlyUpper>());
else
typedef typename MatrixType::Scalar Scalar;
if (m.rows() == m.cols()) {
const Index PacketSize = internal::packet_traits<Scalar>::size;
if (!NumTraits<Scalar>::IsComplex && m.rows() >= PacketSize) {
if ((m.rows() % PacketSize) == 0)
BlockedInPlaceTranspose<MatrixType,internal::evaluator<MatrixType>::Alignment>(m);
else
BlockedInPlaceTranspose<MatrixType,Unaligned>(m);
}
else {
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose().template triangularView<StrictlyUpper>());
}
} else {
m = m.transpose().eval();
}
}
};
} // end namespace internal
/** This is the "in place" version of transpose(): it replaces \c *this by its own transpose.
@@ -285,7 +336,7 @@ struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non squ
* Notice however that this method is only useful if you want to replace a matrix by its own transpose.
* If you just need the transpose of a matrix, use transpose().
*
* \note if the matrix is not square, then \c *this must be a resizable matrix.
* This excludes (non-square) fixed-size matrices, block-expressions and maps.
*
* \sa transpose(), adjoint(), adjointInPlace() */
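A usage sketch (not part of the patch, assuming a real scalar type): for a square float matrix whose size is a multiple of the packet size, the blocked packet path (BlockedInPlaceTranspose above) can be taken; non-square inputs still require a resizable matrix.

#include <Eigen/Dense>
using namespace Eigen;

int main() {
  MatrixXf m = MatrixXf::Random(8,8);
  m.transposeInPlace();   // square + real: eligible for the blocked packet path

  MatrixXf r = MatrixXf::Random(2,3);
  r.transposeInPlace();   // non-square: falls back to r = r.transpose().eval()
  return 0;
}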

View File

@@ -10,20 +10,22 @@
#ifndef EIGEN_TRANSPOSITIONS_H
#define EIGEN_TRANSPOSITIONS_H
namespace Eigen {
template<typename Derived>
class TranspositionsBase
{
typedef internal::traits<Derived> Traits;
public:
typedef typename Traits::IndicesType IndicesType;
typedef typename IndicesType::Scalar StorageIndex;
typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3
EIGEN_DEVICE_FUNC
Derived& derived() { return *static_cast<Derived*>(this); }
EIGEN_DEVICE_FUNC
const Derived& derived() const { return *static_cast<const Derived*>(this); }
/** Copies the \a other transpositions into \c *this */
@@ -35,13 +37,17 @@ class TranspositionsBase
}
/** \returns the number of transpositions */
EIGEN_DEVICE_FUNC
Index size() const { return indices().size(); }
/** \returns the number of rows of the equivalent permutation matrix */
EIGEN_DEVICE_FUNC
Index rows() const { return indices().size(); }
/** \returns the number of columns of the equivalent permutation matrix */
EIGEN_DEVICE_FUNC
Index cols() const { return indices().size(); }
/** Direct access to the underlying index vector */
EIGEN_DEVICE_FUNC
inline const StorageIndex& coeff(Index i) const { return indices().coeff(i); }
/** Direct access to the underlying index vector */
inline StorageIndex& coeffRef(Index i) { return indices().coeffRef(i); }
@@ -55,8 +61,10 @@ class TranspositionsBase
inline StorageIndex& operator[](Index i) { return indices()(i); }
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return derived().indices(); }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return derived().indices(); }
/** Resizes to given size. */
@@ -178,8 +186,10 @@ class Transpositions : public TranspositionsBase<Transpositions<SizeAtCompileTim
{}
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -237,9 +247,11 @@ class Map<Transpositions<SizeAtCompileTime,MaxSizeAtCompileTime,_StorageIndex>,P
#endif
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -279,9 +291,11 @@ class TranspositionsWrapper
}
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -335,9 +349,12 @@ class Transpose<TranspositionsBase<TranspositionsDerived> >
explicit Transpose(const TranspositionType& t) : m_transpositions(t) {}
Index size() const { return m_transpositions.size(); }
Index rows() const { return m_transpositions.size(); }
Index cols() const { return m_transpositions.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index size() const EIGEN_NOEXCEPT { return m_transpositions.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_transpositions.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_transpositions.size(); }
/** \returns the \a matrix with the inverse transpositions applied to the columns.
*/
@@ -356,7 +373,8 @@ class Transpose<TranspositionsBase<TranspositionsDerived> >
{
return Product<Transpose, OtherDerived, AliasFreeProduct>(*this, matrix.derived());
}
EIGEN_DEVICE_FUNC
const TranspositionType& nestedExpression() const { return m_transpositions; }
protected:

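For orientation, a small sketch of the semantics (hypothetical values, assuming the documented behavior above): a Transpositions object encodes size() successive swaps applied in order, and T.transpose() applies the inverse sequence.

#include <Eigen/Dense>
using namespace Eigen;

int main() {
  Transpositions<3> T;
  T.indices() << 1, 2, 2;         // swap(0,1), then swap(1,2), then swap(2,2) (no-op)
  Matrix3d A = Matrix3d::Identity();
  Matrix3d B = T * A;             // transpositions applied to the rows of A
  Matrix3d C = A * T.transpose(); // inverse transpositions applied to the columns
  return 0;
}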
View File

@@ -11,12 +11,12 @@
#ifndef EIGEN_TRIANGULARMATRIX_H
#define EIGEN_TRIANGULARMATRIX_H
namespace Eigen {
namespace internal {
template<int Side, typename TriangularType, typename Rhs> struct triangular_solve_retval;
}
/** \class TriangularBase
@@ -34,16 +34,16 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
MaxRowsAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime,
MaxColsAtCompileTime = internal::traits<Derived>::MaxColsAtCompileTime,
SizeAtCompileTime = (internal::size_at_compile_time<internal::traits<Derived>::RowsAtCompileTime,
internal::traits<Derived>::ColsAtCompileTime>::ret),
/**< This is equal to the number of coefficients, i.e. the number of
* rows times the number of columns, or to \a Dynamic if this is not
* known at compile-time. \sa RowsAtCompileTime, ColsAtCompileTime */
MaxSizeAtCompileTime = (internal::size_at_compile_time<internal::traits<Derived>::MaxRowsAtCompileTime,
internal::traits<Derived>::MaxColsAtCompileTime>::ret)
};
typedef typename internal::traits<Derived>::Scalar Scalar;
typedef typename internal::traits<Derived>::StorageKind StorageKind;
@@ -55,15 +55,15 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
EIGEN_DEVICE_FUNC
inline TriangularBase() { eigen_assert(!((Mode&UnitDiag) && (Mode&ZeroDiag))); }
EIGEN_DEVICE_FUNC
inline Index rows() const { return derived().rows(); }
EIGEN_DEVICE_FUNC
inline Index cols() const { return derived().cols(); }
EIGEN_DEVICE_FUNC
inline Index outerStride() const { return derived().outerStride(); }
EIGEN_DEVICE_FUNC
inline Index innerStride() const { return derived().innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return derived().rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return derived().cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return derived().outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return derived().innerStride(); }
// dummy resize function
EIGEN_DEVICE_FUNC
void resize(Index rows, Index cols)
@@ -156,7 +156,7 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
* \param MatrixType the type of the object in which we are taking the triangular part
* \param Mode the kind of triangular matrix expression to construct. Can be #Upper,
* #Lower, #UnitUpper, #UnitLower, #StrictlyUpper, or #StrictlyLower.
* This is in fact a bit field; it must have either #Upper or #Lower,
* and additionally it may have #UnitDiag or #ZeroDiag or neither.
*
* This class represents a triangular part of a matrix, not necessarily square. Strictly speaking, for rectangular
@@ -199,7 +199,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
typedef typename internal::remove_all<typename MatrixType::ConjugateReturnType>::type MatrixConjugateReturnType;
typedef TriangularView<typename internal::add_const<MatrixType>::type, _Mode> ConstTriangularView;
public:
typedef typename internal::traits<TriangularView>::StorageKind StorageKind;
@@ -218,17 +218,15 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
EIGEN_DEVICE_FUNC
explicit inline TriangularView(MatrixType& matrix) : m_matrix(matrix)
{}
using Base::operator=;
TriangularView& operator=(const TriangularView &other)
{ return Base::operator=(other); }
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(TriangularView)
/** \copydoc EigenBase::rows() */
EIGEN_DEVICE_FUNC
inline Index rows() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
/** \copydoc EigenBase::cols() */
EIGEN_DEVICE_FUNC
inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
/** \returns a const reference to the nested expression */
EIGEN_DEVICE_FUNC
@@ -237,7 +235,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
/** \returns a reference to the nested expression */
EIGEN_DEVICE_FUNC
NestedExpression& nestedExpression() { return m_matrix; }
typedef TriangularView<const MatrixConjugateReturnType,Mode> ConjugateReturnType;
/** \sa MatrixBase::conjugate() const */
EIGEN_DEVICE_FUNC
@@ -271,7 +269,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
typename MatrixType::TransposeReturnType tmp(m_matrix);
return TransposeReturnType(tmp);
}
typedef TriangularView<const typename MatrixType::ConstTransposeReturnType,TransposeMode> ConstTransposeReturnType;
/** \sa MatrixBase::transpose() const */
EIGEN_DEVICE_FUNC
@@ -282,10 +280,10 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
template<typename Other>
EIGEN_DEVICE_FUNC
inline const Solve<TriangularView, Other>
solve(const MatrixBase<Other>& other) const
{ return Solve<TriangularView, Other>(*this, other.derived()); }
// workaround MSVC ICE
#if EIGEN_COMP_MSVC
template<int Side, typename Other>
@@ -329,7 +327,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
else
return m_matrix.diagonal().prod();
}
protected:
MatrixTypeNested m_matrix;
@@ -391,7 +389,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
internal::call_assignment_no_alias(derived(), other.derived(), internal::sub_assign_op<Scalar,typename Other::Scalar>());
return derived();
}
/** \sa MatrixBase::operator*=() */
EIGEN_DEVICE_FUNC
TriangularViewType& operator*=(const typename internal::traits<MatrixType>::Scalar& other) { return *this = derived().nestedExpression() * other; }
@@ -557,6 +555,10 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
template<typename ProductType>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE TriangularViewType& _assignProduct(const ProductType& prod, const Scalar& alpha, bool beta);
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(TriangularViewImpl)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TriangularViewImpl)
};
/***************************************************************************
@@ -713,7 +715,7 @@ bool MatrixBase<Derived>::isLowerTriangular(const RealScalar& prec) const
namespace internal {
// TODO currently a triangular expression has the form TriangularView<.,.>
// in the future triangular-ness should be defined by the expression traits
// such that Transpose<TriangularView<.,.> > is valid. (currently TriangularBase::transpose() is overloaded to make it work)
@@ -742,7 +744,7 @@ struct Dense2Triangular {};
template<typename Kernel, unsigned int Mode, int UnrollCount, bool ClearOpposite> struct triangular_assignment_loop;
/** \internal Specialization of the dense assignment kernel for triangular matrices.
* The main difference is that the triangular, diagonal, and opposite parts are processed through three different functions.
* \tparam UpLo must be either Lower or Upper
@@ -759,17 +761,17 @@ protected:
using Base::m_src;
using Base::m_functor;
public:
typedef typename Base::DstEvaluatorType DstEvaluatorType;
typedef typename Base::SrcEvaluatorType SrcEvaluatorType;
typedef typename Base::Scalar Scalar;
typedef typename Base::AssignmentTraits AssignmentTraits;
EIGEN_DEVICE_FUNC triangular_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{}
#ifdef EIGEN_INTERNAL_DEBUGGING
EIGEN_DEVICE_FUNC void assignCoeff(Index row, Index col)
{
@@ -779,16 +781,16 @@ public:
#else
using Base::assignCoeff;
#endif
EIGEN_DEVICE_FUNC void assignDiagonalCoeff(Index id)
{
if(Mode==UnitDiag && SetOpposite) m_functor.assignCoeff(m_dst.coeffRef(id,id), Scalar(1));
else if(Mode==ZeroDiag && SetOpposite) m_functor.assignCoeff(m_dst.coeffRef(id,id), Scalar(0));
else if(Mode==0) Base::assignCoeff(id,id);
}
EIGEN_DEVICE_FUNC void assignOppositeCoeff(Index row, Index col)
{
eigen_internal_assert(row!=col);
if(SetOpposite)
m_functor.assignCoeff(m_dst.coeffRef(row,col), Scalar(0));
@@ -809,17 +811,17 @@ void call_triangular_assignment_loop(DstXprType& dst, const SrcXprType& src, con
if((dst.rows()!=dstRows) || (dst.cols()!=dstCols))
dst.resize(dstRows, dstCols);
DstEvaluatorType dstEvaluator(dst);
typedef triangular_dense_assignment_kernel< Mode&(Lower|Upper),Mode&(UnitDiag|ZeroDiag|SelfAdjoint),SetOpposite,
DstEvaluatorType,SrcEvaluatorType,Functor> Kernel;
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
enum {
unroll = DstXprType::SizeAtCompileTime != Dynamic
&& SrcEvaluatorType::CoeffReadCost < HugeCost
&& DstXprType::SizeAtCompileTime * (DstEvaluatorType::CoeffReadCost+SrcEvaluatorType::CoeffReadCost) / 2 <= EIGEN_UNROLLING_LIMIT
};
triangular_assignment_loop<Kernel, Mode, unroll ? int(DstXprType::SizeAtCompileTime) : Dynamic, SetOpposite>::run(kernel);
}
@@ -841,8 +843,8 @@ struct Assignment<DstXprType, SrcXprType, Functor, Triangular2Triangular>
EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func)
{
eigen_assert(int(DstXprType::Mode) == int(SrcXprType::Mode));
call_triangular_assignment_loop<DstXprType::Mode, false>(dst, src, func);
}
};
@@ -851,7 +853,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, Triangular2Dense>
{
EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func)
{
call_triangular_assignment_loop<SrcXprType::Mode, (SrcXprType::Mode&SelfAdjoint)==0>(dst, src, func);
}
};
@@ -860,7 +862,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, Dense2Triangular>
{
EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func)
{
call_triangular_assignment_loop<DstXprType::Mode, false>(dst, src, func);
}
};
@@ -871,19 +873,19 @@ struct triangular_assignment_loop
// FIXME: this is not very clean, perhaps this information should be provided by the kernel?
typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
typedef typename DstEvaluatorType::XprType DstXprType;
enum {
col = (UnrollCount-1) / DstXprType::RowsAtCompileTime,
row = (UnrollCount-1) % DstXprType::RowsAtCompileTime
};
typedef typename Kernel::Scalar Scalar;
EIGEN_DEVICE_FUNC
static inline void run(Kernel &kernel)
{
triangular_assignment_loop<Kernel, Mode, UnrollCount-1, SetOpposite>::run(kernel);
if(row==col)
kernel.assignDiagonalCoeff(row);
else if( ((Mode&Lower) && row>col) || ((Mode&Upper) && row<col) )
@@ -926,10 +928,10 @@ struct triangular_assignment_loop<Kernel, Mode, Dynamic, SetOpposite>
}
else
i = maxi;
if(i<kernel.rows()) // then i==j
kernel.assignDiagonalCoeff(i++);
if (((Mode&Upper) && SetOpposite) || (Mode&Lower))
{
for(; i < kernel.rows(); ++i)
@@ -953,7 +955,7 @@ EIGEN_DEVICE_FUNC void TriangularBase<Derived>::evalToLazy(MatrixBase<DenseDeriv
}
namespace internal {
// Triangular = Product
template< typename DstXprType, typename Lhs, typename Rhs, typename Scalar>
struct Assignment<DstXprType, Product<Lhs,Rhs,DefaultProduct>, internal::assign_op<Scalar,typename Product<Lhs,Rhs,DefaultProduct>::Scalar>, Dense2Triangular>
@@ -966,7 +968,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,DefaultProduct>, internal::assign_
if((dst.rows()!=dstRows) || (dst.cols()!=dstCols))
dst.resize(dstRows, dstCols);
dst._assignProduct(src, 1, 0);
dst._assignProduct(src, Scalar(1), false);
}
};
@@ -977,7 +979,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,DefaultProduct>, internal::add_ass
typedef Product<Lhs,Rhs,DefaultProduct> SrcXprType;
static void run(DstXprType &dst, const SrcXprType &src, const internal::add_assign_op<Scalar,typename SrcXprType::Scalar> &)
{
dst._assignProduct(src, 1, 1);
dst._assignProduct(src, Scalar(1), true);
}
};
@@ -988,7 +990,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,DefaultProduct>, internal::sub_ass
typedef Product<Lhs,Rhs,DefaultProduct> SrcXprType;
static void run(DstXprType &dst, const SrcXprType &src, const internal::sub_assign_op<Scalar,typename SrcXprType::Scalar> &)
{
dst._assignProduct(src, -1, 1);
dst._assignProduct(src, Scalar(-1), true);
}
};

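The three specializations above map the assignment operators onto the (alpha, beta) arguments of _assignProduct, now passed as a typed Scalar and a bool. A sketch (hypothetical values):

#include <Eigen/Dense>
using namespace Eigen;

int main() {
  MatrixXd A = MatrixXd::Random(4,4), B = MatrixXd::Random(4,4);
  MatrixXd C = MatrixXd::Zero(4,4);
  C.triangularView<Lower>()  = A * B;  // _assignProduct(src, Scalar(1),  false)
  C.triangularView<Lower>() += A * B;  // _assignProduct(src, Scalar(1),  true)
  C.triangularView<Lower>() -= A * B;  // _assignProduct(src, Scalar(-1), true)
  return 0;
}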
View File

@@ -65,10 +65,10 @@ class PartialReduxExpr : public internal::dense_xpr_base< PartialReduxExpr<Matri
explicit PartialReduxExpr(const MatrixType& mat, const MemberOp& func = MemberOp())
: m_matrix(mat), m_functor(func) {}
EIGEN_DEVICE_FUNC
Index rows() const { return (Direction==Vertical ? 1 : m_matrix.rows()); }
EIGEN_DEVICE_FUNC
Index cols() const { return (Direction==Horizontal ? 1 : m_matrix.cols()); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return (Direction==Vertical ? 1 : m_matrix.rows()); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return (Direction==Horizontal ? 1 : m_matrix.cols()); }
EIGEN_DEVICE_FUNC
typename MatrixType::Nested nestedExpression() const { return m_matrix; }
@@ -134,7 +134,7 @@ struct member_redux {
typedef typename result_of<
BinaryOp(const Scalar&,const Scalar&)
>::type result_type;
enum { Vectorizable = functor_traits<BinaryOp>::PacketAccess };
template<int Size> struct Cost { enum { value = (Size-1) * functor_traits<BinaryOp>::Cost }; };
EIGEN_DEVICE_FUNC explicit member_redux(const BinaryOp func) : m_functor(func) {}
@@ -162,17 +162,17 @@ struct member_redux {
* where `foo` is any method of `VectorwiseOp`. This expression is equivalent to applying `foo()` to each
* column of `A` and then re-assemble the outputs in a matrix expression:
* \code [A.col(0).foo(), A.col(1).foo(), ..., A.col(A.cols()-1).foo()] \endcode
*
* Example: \include MatrixBase_colwise.cpp
* Output: \verbinclude MatrixBase_colwise.out
*
* The begin() and end() methods are obviously exceptions to the previous rule as they
* return STL-compatible begin/end iterators to the rows or columns of the nested expression.
* Typical use cases include for-range-loop and calls to STL algorithms:
*
* Example: \include MatrixBase_colwise_iterator_cxx11.cpp
* Output: \verbinclude MatrixBase_colwise_iterator_cxx11.out
*
* For a partial reduction on an empty input, some rules apply.
* For the sake of clarity, let's consider a vertical reduction:
* - If the number of columns is zero, then a 1x0 row-major vector expression is returned.
@@ -180,7 +180,7 @@ struct member_redux {
* - a row vector of zeros is returned for sum-like reductions (sum, squaredNorm, norm, etc.)
* - a row vector of ones is returned for a product reduction (e.g., <code>MatrixXd(n,0).colwise().prod()</code>)
* - an assert is triggered for all other reductions (minCoeff,maxCoeff,redux(bin_op))
*
* \sa DenseBase::colwise(), DenseBase::rowwise(), class PartialReduxExpr
*/
template<typename ExpressionType, int Direction> class VectorwiseOp
@@ -216,7 +216,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
};
protected:
template<typename OtherDerived> struct ExtendedType {
typedef Replicate<OtherDerived,
isVertical ? 1 : ExpressionType::RowsAtCompileTime,
@@ -279,27 +279,47 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
/** This is the const version of iterator (aka read-only) */
random_access_iterator_type const_iterator;
#else
typedef internal::subvector_stl_iterator<ExpressionType, DirectionType(Direction)> iterator;
typedef internal::subvector_stl_iterator<const ExpressionType, DirectionType(Direction)> const_iterator;
typedef internal::subvector_stl_reverse_iterator<ExpressionType, DirectionType(Direction)> reverse_iterator;
typedef internal::subvector_stl_reverse_iterator<const ExpressionType, DirectionType(Direction)> const_reverse_iterator;
#endif
/** returns an iterator to the first row (rowwise) or column (colwise) of the nested expression.
* \sa end(), cbegin()
*/
iterator begin() { return iterator (m_matrix, 0); }
/** const version of begin() */
const_iterator begin() const { return const_iterator(m_matrix, 0); }
/** const version of begin() */
const_iterator cbegin() const { return const_iterator(m_matrix, 0); }
/** returns a reverse iterator to the last row (rowwise) or column (colwise) of the nested expression.
* \sa rend(), crbegin()
*/
reverse_iterator rbegin() { return reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** const version of rbegin() */
const_reverse_iterator rbegin() const { return const_reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** const version of rbegin() */
const_reverse_iterator crbegin() const { return const_reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** returns an iterator to the row (resp. column) following the last row (resp. column) of the nested expression
* \sa begin(), cend()
*/
iterator end() { return iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** const version of end() */
const_iterator end() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** const version of end() */
const_iterator cend() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** returns a reverse iterator to the row (resp. column) before the first row (resp. column) of the nested expression
* \sa begin(), cend()
*/
reverse_iterator rend() { return reverse_iterator (m_matrix, -1); }
/** const version of rend() */
const_reverse_iterator rend() const { return const_reverse_iterator (m_matrix, -1); }
/** const version of rend() */
const_reverse_iterator crend() const { return const_reverse_iterator (m_matrix, -1); }
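A usage sketch for the new reverse iterators (hypothetical example, assuming the 3.4 iterator API shown above):

#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;

int main() {
  MatrixXd m(2,3);
  m << 1, 2, 3,
       4, 5, 6;
  // rbegin()/rend() visit columns from last to first.
  for (auto it = m.colwise().rbegin(); it != m.colwise().rend(); ++it)
    std::cout << (*it).sum() << ' ';   // prints 9 7 5
  return 0;
}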
/** \returns a row or column vector expression of \c *this reduxed by \a func
*
@@ -308,7 +328,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
* \sa class VectorwiseOp, DenseBase::colwise(), DenseBase::rowwise()
*/
template<typename BinaryOp>
@@ -345,7 +365,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* Example: \include PartialRedux_minCoeff.cpp
@@ -364,7 +384,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* Example: \include PartialRedux_maxCoeff.cpp
@@ -719,6 +739,10 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
EIGEN_DEVICE_FUNC
const HNormalizedReturnType hnormalized() const;
# ifdef EIGEN_VECTORWISEOP_PLUGIN
# include EIGEN_VECTORWISEOP_PLUGIN
# endif
protected:
Index redux_length() const
{

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_VISITOR_H
#define EIGEN_VISITOR_H
namespace Eigen {
namespace internal {
@@ -70,22 +70,22 @@ class visitor_evaluator
public:
EIGEN_DEVICE_FUNC
explicit visitor_evaluator(const XprType &xpr) : m_evaluator(xpr), m_xpr(xpr) {}
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
enum {
RowsAtCompileTime = XprType::RowsAtCompileTime,
CoeffReadCost = internal::evaluator<XprType>::CoeffReadCost
};
EIGEN_DEVICE_FUNC Index rows() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC Index size() const { return m_xpr.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index size() const EIGEN_NOEXCEPT { return m_xpr.size(); }
EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index row, Index col) const
{ return m_evaluator.coeff(row, col); }
protected:
internal::evaluator<XprType> m_evaluator;
const XprType &m_xpr;
@@ -106,7 +106,7 @@ protected:
*
* \note compared to one or two \em for \em loops, visitors offer automatic
* unrolling for small fixed size matrix.
*
* \note if the matrix is empty, then the visitor is left unchanged.
*
* \sa minCoeff(Index*,Index*), maxCoeff(Index*,Index*), DenseBase::redux()
@@ -118,10 +118,10 @@ void DenseBase<Derived>::visit(Visitor& visitor) const
{
if(size()==0)
return;
typedef typename internal::visitor_evaluator<Derived> ThisEvaluator;
ThisEvaluator thisEval(derived());
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * ThisEvaluator::CoeffReadCost + (SizeAtCompileTime-1) * internal::functor_traits<Visitor>::Cost <= EIGEN_UNROLLING_LIMIT
@@ -157,7 +157,7 @@ struct coeff_visitor
*
* \sa DenseBase::minCoeff(Index*, Index*)
*/
template <typename Derived>
template <typename Derived, int NaNPropagation>
struct min_coeff_visitor : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
@@ -173,8 +173,40 @@ struct min_coeff_visitor : coeff_visitor<Derived>
}
};
template<typename Scalar>
struct functor_traits<min_coeff_visitor<Scalar> > {
template <typename Derived>
struct min_coeff_visitor<Derived, PropagateNumbers> : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
if((numext::isnan)(this->res) || (!(numext::isnan)(value) && value < this->res))
{
this->res = value;
this->row = i;
this->col = j;
}
}
};
template <typename Derived>
struct min_coeff_visitor<Derived, PropagateNaN> : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
if((numext::isnan)(value) || value < this->res)
{
this->res = value;
this->row = i;
this->col = j;
}
}
};
template<typename Scalar, int NaNPropagation>
struct functor_traits<min_coeff_visitor<Scalar, NaNPropagation> > {
enum {
Cost = NumTraits<Scalar>::AddCost
};
@@ -185,10 +217,10 @@ struct functor_traits<min_coeff_visitor<Scalar> > {
*
* \sa DenseBase::maxCoeff(Index*, Index*)
*/
template <typename Derived>
template <typename Derived, int NaNPropagation>
struct max_coeff_visitor : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
@@ -201,8 +233,40 @@ struct max_coeff_visitor : coeff_visitor<Derived>
}
};
template<typename Scalar>
struct functor_traits<max_coeff_visitor<Scalar> > {
template <typename Derived>
struct max_coeff_visitor<Derived, PropagateNumbers> : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
if((numext::isnan)(this->res) || (!(numext::isnan)(value) && value > this->res))
{
this->res = value;
this->row = i;
this->col = j;
}
}
};
template <typename Derived>
struct max_coeff_visitor<Derived, PropagateNaN> : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
if((numext::isnan)(value) || value > this->res)
{
this->res = value;
this->row = i;
this->col = j;
}
}
};
template<typename Scalar, int NaNPropagation>
struct functor_traits<max_coeff_visitor<Scalar, NaNPropagation> > {
enum {
Cost = NumTraits<Scalar>::AddCost
};
@@ -212,22 +276,24 @@ struct functor_traits<max_coeff_visitor<Scalar> > {
/** \fn DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
* \returns the minimum of all coefficients of *this and puts in *row and *col its location.
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is minimum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(Index*), DenseBase::maxCoeff(Index*,Index*), DenseBase::visit(), DenseBase::minCoeff()
*/
template<typename Derived>
template<typename IndexType>
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
internal::min_coeff_visitor<Derived> minVisitor;
internal::min_coeff_visitor<Derived, NaNPropagation> minVisitor;
this->visit(minVisitor);
*rowId = minVisitor.row;
if (colId) *colId = minVisitor.col;
@@ -235,15 +301,17 @@ DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
}
/** \returns the minimum of all coefficients of *this and puts in *index its location.
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is minimum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::visit(), DenseBase::minCoeff()
*/
template<typename Derived>
template<typename IndexType>
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff(IndexType* index) const
@@ -251,7 +319,7 @@ DenseBase<Derived>::minCoeff(IndexType* index) const
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
internal::min_coeff_visitor<Derived> minVisitor;
internal::min_coeff_visitor<Derived, NaNPropagation> minVisitor;
this->visit(minVisitor);
*index = IndexType((RowsAtCompileTime==1) ? minVisitor.col : minVisitor.row);
return minVisitor.res;
@@ -259,22 +327,24 @@ DenseBase<Derived>::minCoeff(IndexType* index) const
/** \fn DenseBase<Derived>::maxCoeff(IndexType* rowId, IndexType* colId) const
* \returns the maximum of all coefficients of *this and puts in *row and *col its location.
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visit(), DenseBase::maxCoeff()
*/
template<typename Derived>
template<typename IndexType>
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff(IndexType* rowPtr, IndexType* colPtr) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
internal::max_coeff_visitor<Derived> maxVisitor;
internal::max_coeff_visitor<Derived, NaNPropagation> maxVisitor;
this->visit(maxVisitor);
*rowPtr = maxVisitor.row;
if (colPtr) *colPtr = maxVisitor.col;
@@ -282,15 +352,17 @@ DenseBase<Derived>::maxCoeff(IndexType* rowPtr, IndexType* colPtr) const
}
/** \returns the maximum of all coefficients of *this and puts in *index its location.
*
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \sa DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visit(), DenseBase::maxCoeff()
*/
template<typename Derived>
template<typename IndexType>
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff(IndexType* index) const
@@ -298,7 +370,7 @@ DenseBase<Derived>::maxCoeff(IndexType* index) const
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
internal::max_coeff_visitor<Derived> maxVisitor;
internal::max_coeff_visitor<Derived, NaNPropagation> maxVisitor;
this->visit(maxVisitor);
*index = (RowsAtCompileTime==1) ? maxVisitor.col : maxVisitor.row;
return maxVisitor.res;

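A usage sketch of the new NaNPropagation template parameter (hypothetical data, per the documentation above):

#include <Eigen/Dense>
#include <limits>
using namespace Eigen;

int main() {
  VectorXd v(3);
  v << 1.0, std::numeric_limits<double>::quiet_NaN(), 3.0;
  Index i;
  double mn = v.minCoeff<PropagateNumbers>(&i);  // 1.0: the NaN is skipped, i == 0
  double mx = v.maxCoeff<PropagateNaN>(&i);      // NaN: the NaN wins
  (void)mn; (void)mx;
  return 0;
}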
View File

@@ -38,6 +38,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
@@ -47,7 +48,18 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
};
#endif
template<> struct unpacket_traits<Packet4cf> { typedef std::complex<float> type; enum {size=4, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet4cf> {
typedef std::complex<float> type;
typedef Packet2cf half;
typedef Packet8f as_real;
enum {
size=4,
alignment=Aligned32,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet4cf padd<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf psub<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_sub_ps(a.v,b.v)); }
@@ -76,7 +88,6 @@ EIGEN_STRONG_INLINE Packet4cf pcmp_eq(const Packet4cf& a, const Packet4cf& b) {
}
template<> EIGEN_STRONG_INLINE Packet4cf ptrue<Packet4cf>(const Packet4cf& a) { return Packet4cf(ptrue(Packet8f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cf pnot<Packet4cf>(const Packet4cf& a) { return Packet4cf(pnot(Packet8f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cf pand <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_and_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf por <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_or_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf pxor <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_xor_ps(a.v,b.v)); }
@@ -150,37 +161,12 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet4cf>(const Packe
Packet2cf(_mm256_extractf128_ps(a.v,1))));
}
template<> EIGEN_STRONG_INLINE Packet4cf preduxp<Packet4cf>(const Packet4cf* vecs)
{
Packet8f t0 = _mm256_shuffle_ps(vecs[0].v, vecs[0].v, _MM_SHUFFLE(3, 1, 2 ,0));
Packet8f t1 = _mm256_shuffle_ps(vecs[1].v, vecs[1].v, _MM_SHUFFLE(3, 1, 2 ,0));
t0 = _mm256_hadd_ps(t0,t1);
Packet8f t2 = _mm256_shuffle_ps(vecs[2].v, vecs[2].v, _MM_SHUFFLE(3, 1, 2 ,0));
Packet8f t3 = _mm256_shuffle_ps(vecs[3].v, vecs[3].v, _MM_SHUFFLE(3, 1, 2 ,0));
t2 = _mm256_hadd_ps(t2,t3);
t1 = _mm256_permute2f128_ps(t0,t2, 0 + (2<<4));
t3 = _mm256_permute2f128_ps(t0,t2, 1 + (3<<4));
return Packet4cf(_mm256_add_ps(t1,t3));
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet4cf>(const Packet4cf& a)
{
return predux_mul(pmul(Packet2cf(_mm256_extractf128_ps(a.v, 0)),
Packet2cf(_mm256_extractf128_ps(a.v, 1))));
}
template<int Offset>
struct palign_impl<Offset,Packet4cf>
{
static EIGEN_STRONG_INLINE void run(Packet4cf& first, const Packet4cf& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet8f>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet4cf, Packet4cf, false,true>
{
EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
@@ -254,6 +240,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
@@ -263,7 +250,18 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
};
#endif
template<> struct unpacket_traits<Packet2cd> { typedef std::complex<double> type; enum {size=2, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet2cd> {
typedef std::complex<double> type;
typedef Packet1cd half;
typedef Packet4d as_real;
enum {
size=2,
alignment=Aligned32,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet2cd padd<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_add_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd psub<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_sub_pd(a.v,b.v)); }
@@ -291,7 +289,6 @@ EIGEN_STRONG_INLINE Packet2cd pcmp_eq(const Packet2cd& a, const Packet2cd& b) {
}
template<> EIGEN_STRONG_INLINE Packet2cd ptrue<Packet2cd>(const Packet2cd& a) { return Packet2cd(ptrue(Packet4d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cd pnot<Packet2cd>(const Packet2cd& a) { return Packet2cd(pnot(Packet4d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cd pand <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_and_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd por <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_or_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd pxor <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_xor_pd(a.v,b.v)); }
@@ -347,30 +344,12 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet2cd>(const Pack
Packet1cd(_mm256_extractf128_pd(a.v,1))));
}
template<> EIGEN_STRONG_INLINE Packet2cd preduxp<Packet2cd>(const Packet2cd* vecs)
{
Packet4d t0 = _mm256_permute2f128_pd(vecs[0].v,vecs[1].v, 0 + (2<<4));
Packet4d t1 = _mm256_permute2f128_pd(vecs[0].v,vecs[1].v, 1 + (3<<4));
return Packet2cd(_mm256_add_pd(t0,t1));
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet2cd>(const Packet2cd& a)
{
return predux(pmul(Packet1cd(_mm256_extractf128_pd(a.v,0)),
Packet1cd(_mm256_extractf128_pd(a.v,1))));
}
template<int Offset>
struct palign_impl<Offset,Packet2cd>
{
static EIGEN_STRONG_INLINE void run(Packet2cd& first, const Packet2cd& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet4d>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet2cd, Packet2cd, false,true>
{
EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
@@ -444,24 +423,12 @@ ptranspose(PacketBlock<Packet2cd,2>& kernel) {
kernel.packet[0].v = tmp;
}
template<> EIGEN_STRONG_INLINE Packet4cf pinsertfirst(const Packet4cf& a, std::complex<float> b)
{
return Packet4cf(_mm256_blend_ps(a.v,pset1<Packet4cf>(b).v,1|2));
template<> EIGEN_STRONG_INLINE Packet2cd psqrt<Packet2cd>(const Packet2cd& a) {
return psqrt_complex<Packet2cd>(a);
}
template<> EIGEN_STRONG_INLINE Packet2cd pinsertfirst(const Packet2cd& a, std::complex<double> b)
{
return Packet2cd(_mm256_blend_pd(a.v,pset1<Packet2cd>(b).v,1|2));
}
template<> EIGEN_STRONG_INLINE Packet4cf pinsertlast(const Packet4cf& a, std::complex<float> b)
{
return Packet4cf(_mm256_blend_ps(a.v,pset1<Packet4cf>(b).v,(1<<7)|(1<<6)));
}
template<> EIGEN_STRONG_INLINE Packet2cd pinsertlast(const Packet2cd& a, std::complex<double> b)
{
return Packet2cd(_mm256_blend_pd(a.v,pset1<Packet2cd>(b).v,(1<<3)|(1<<2)));
template<> EIGEN_STRONG_INLINE Packet4cf psqrt<Packet4cf>(const Packet4cf& a) {
return psqrt_complex<Packet4cf>(a);
}
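psqrt_complex is a generic kernel shared by all the complex packet types touched in this series. As a reference for what it computes, here is a scalar version of the standard numerically stable complex square root; this is an assumption about the algorithm, not a copy of the vectorized kernel:

#include <cmath>
#include <complex>

// Principal square root of z = x + i*y via the stable half-angle recipe.
// Zero and non-finite inputs need extra handling, omitted here.
std::complex<float> sqrt_ref(const std::complex<float>& z) {
  float x = z.real(), y = z.imag();
  float w = std::sqrt(0.5f * (std::fabs(x) + std::hypot(x, y)));
  if (x >= 0.0f)
    return std::complex<float>(w, y / (2.0f * w));
  return std::complex<float>(std::fabs(y) / (2.0f * w), std::copysign(w, y));
}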
} // end namespace internal


@@ -36,6 +36,24 @@ plog<Packet8f>(const Packet8f& _x) {
return plog_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
plog<Packet4d>(const Packet4d& _x) {
return plog_double(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
plog2<Packet8f>(const Packet8f& _x) {
return plog2_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
plog2<Packet4d>(const Packet4d& _x) {
return plog2_double(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f plog1p<Packet8f>(const Packet8f& _x) {
return generic_plog1p(_x);
@@ -58,15 +76,15 @@ pexp<Packet8f>(const Packet8f& _x) {
// Hyperbolic Tangent function.
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
ptanh<Packet8f>(const Packet8f& x) {
return internal::generic_fast_tanh_float(x);
ptanh<Packet8f>(const Packet8f& _x) {
return internal::generic_fast_tanh_float(_x);
}
// Exponential function for doubles.
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
pexp<Packet4d>(const Packet4d& x) {
return pexp_double(x);
pexp<Packet4d>(const Packet4d& _x) {
return pexp_double(_x);
}
// Functions for sqrt.
@@ -79,33 +97,36 @@ pexp<Packet4d>(const Packet4d& x) {
// For detail see here: http://www.beyond3d.com/content/articles/8/
#if EIGEN_FAST_MATH
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
psqrt<Packet8f>(const Packet8f& _x) {
Packet8f half = pmul(_x, pset1<Packet8f>(.5f));
Packet8f denormal_mask = _mm256_and_ps(
_mm256_cmp_ps(_x, pset1<Packet8f>((std::numeric_limits<float>::min)()),
_CMP_LT_OQ),
_mm256_cmp_ps(_x, _mm256_setzero_ps(), _CMP_GE_OQ));
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& _x) {
Packet8f minus_half_x = pmul(_x, pset1<Packet8f>(-0.5f));
Packet8f denormal_mask = pandnot(
pcmp_lt(_x, pset1<Packet8f>((std::numeric_limits<float>::min)())),
pcmp_lt(_x, pzero(_x)));
// Compute approximate reciprocal sqrt.
Packet8f x = _mm256_rsqrt_ps(_x);
// Do a single step of Newton's iteration.
x = pmul(x, psub(pset1<Packet8f>(1.5f), pmul(half, pmul(x,x))));
x = pmul(x, pmadd(minus_half_x, pmul(x,x), pset1<Packet8f>(1.5f)));
// Flush results for denormals to zero.
return _mm256_andnot_ps(denormal_mask, pmul(_x,x));
return pandnot(pmul(_x,x), denormal_mask);
}
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& x) {
return _mm256_sqrt_ps(x);
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d psqrt<Packet4d>(const Packet4d& x) {
return _mm256_sqrt_pd(x);
}
#if EIGEN_FAST_MATH
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& _x) {
return _mm256_sqrt_ps(_x);
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d psqrt<Packet4d>(const Packet4d& _x) {
return _mm256_sqrt_pd(_x);
}
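The fast-math path above computes sqrt(x) as x * rsqrt(x), refining the hardware reciprocal-sqrt estimate with one Newton-Raphson step. A scalar model of the same arithmetic (a sketch; the real seed comes from _mm256_rsqrt_ps, approximated here by its mathematical value):

#include <cmath>

// One Newton step for y ~ 1/sqrt(x): y1 = y0 * (1.5 - 0.5*x*y0*y0),
// written as y0 * ((-0.5*x)*(y0*y0) + 1.5) to match the pmadd form above.
float fast_sqrt_model(float x) {
  float y0 = 1.0f / std::sqrt(x);  // stand-in for the _mm256_rsqrt_ps estimate
  float minus_half_x = -0.5f * x;
  float y1 = y0 * (minus_half_x * (y0 * y0) + 1.5f);
  // The vector code additionally flushes denormal inputs to zero via the mask.
  return x * y1;  // sqrt(x) ~= x * refined rsqrt(x)
}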
#if EIGEN_FAST_MATH
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inf, 0x7f800000);
@@ -140,18 +161,65 @@ Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f prsqrt<Packet8f>(const Packet8f& x) {
Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
_EIGEN_DECLARE_CONST_Packet8f(one, 1.0f);
return _mm256_div_ps(p8f_one, _mm256_sqrt_ps(x));
return _mm256_div_ps(p8f_one, _mm256_sqrt_ps(_x));
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d prsqrt<Packet4d>(const Packet4d& x) {
Packet4d prsqrt<Packet4d>(const Packet4d& _x) {
_EIGEN_DECLARE_CONST_Packet4d(one, 1.0);
return _mm256_div_pd(p4d_one, _mm256_sqrt_pd(x));
return _mm256_div_pd(p4d_one, _mm256_sqrt_pd(_x));
}
F16_PACKET_FUNCTION(Packet8f, Packet8h, psin)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pcos)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog2)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog1p)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pexpm1)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pexp)
F16_PACKET_FUNCTION(Packet8f, Packet8h, ptanh)
F16_PACKET_FUNCTION(Packet8f, Packet8h, psqrt)
F16_PACKET_FUNCTION(Packet8f, Packet8h, prsqrt)
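These one-liners route the Packet8h math functions through the float kernels. Assuming the usual shape of F16_PACKET_FUNCTION, each line presumably expands to a specialization of roughly this form (a sketch of the macro, shown for orientation):

template <>
EIGEN_STRONG_INLINE Packet8h psqrt<Packet8h>(const Packet8h& a) {
  return float2half(psqrt<Packet8f>(half2float(a)));  // widen, compute, narrow
}

pfrexp and pldexp cannot use the macro because they carry a second packet (the exponent), hence the explicit overloads that follow.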
template <>
EIGEN_STRONG_INLINE Packet8h pfrexp(const Packet8h& a, Packet8h& exponent) {
Packet8f fexponent;
const Packet8h out = float2half(pfrexp<Packet8f>(half2float(a), fexponent));
exponent = float2half(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet8h pldexp(const Packet8h& a, const Packet8h& exponent) {
return float2half(pldexp<Packet8f>(half2float(a), half2float(exponent)));
}
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, psin)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pcos)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog2)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog1p)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pexpm1)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pexp)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, ptanh)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, psqrt)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, prsqrt)
template <>
EIGEN_STRONG_INLINE Packet8bf pfrexp(const Packet8bf& a, Packet8bf& exponent) {
Packet8f fexponent;
const Packet8bf out = F32ToBf16(pfrexp<Packet8f>(Bf16ToF32(a), fexponent));
exponent = F32ToBf16(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet8bf pldexp(const Packet8bf& a, const Packet8bf& exponent) {
return F32ToBf16(pldexp<Packet8f>(Bf16ToF32(a), Bf16ToF32(exponent)));
}
} // end namespace internal

File diff suppressed because it is too large.


@@ -35,6 +35,46 @@ struct type_casting_traits<int, float> {
};
#ifndef EIGEN_VECTORIZE_AVX512
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<bfloat16, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<float, bfloat16> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
#endif // EIGEN_VECTORIZE_AVX512
template<> EIGEN_STRONG_INLINE Packet8i pcast<Packet8f, Packet8i>(const Packet8f& a) {
return _mm256_cvttps_epi32(a);
@@ -52,36 +92,22 @@ template<> EIGEN_STRONG_INLINE Packet8f preinterpret<Packet8f,Packet8i>(const Pa
return _mm256_castsi256_ps(a);
}
#ifndef EIGEN_VECTORIZE_AVX512
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8h, Packet8f>(const Packet8h& a) {
return half2float(a);
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
#endif // EIGEN_VECTORIZE_AVX512
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8bf, Packet8f>(const Packet8bf& a) {
return Bf16ToF32(a);
}
template<> EIGEN_STRONG_INLINE Packet8h pcast<Packet8f, Packet8h>(const Packet8f& a) {
return float2half(a);
}
template<> EIGEN_STRONG_INLINE Packet8bf pcast<Packet8f, Packet8bf>(const Packet8f& a) {
return F32ToBf16(a);
}
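These are the entry points Eigen's vectorized evaluators use when casting arrays of half/bfloat16 to float and back. A minimal direct use of the internal API (internal names, subject to change) might look like:

using namespace Eigen::internal;

// Eight bfloat16 values widened to eight floats and narrowed back; the 1:1
// coefficient ratio is what type_casting_traits<bfloat16, float> declares above.
Packet8bf b = pset1<Packet8bf>(Eigen::bfloat16(1.5f));
Packet8f  f = pcast<Packet8bf, Packet8f>(b);   // Bf16ToF32
Packet8bf r = pcast<Packet8f, Packet8bf>(f);   // F32ToBf16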
} // end namespace internal
} // end namespace Eigen


@@ -37,17 +37,19 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasReduxp = 0
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet8cf> {
typedef std::complex<float> type;
typedef Packet4cf half;
typedef Packet16f as_real;
enum {
size = 8,
alignment=unpacket_traits<Packet16f>::alignment,
@@ -55,11 +57,9 @@ template<> struct unpacket_traits<Packet8cf> {
masked_load_available=false,
masked_store_available=false
};
typedef Packet4cf half;
};
template<> EIGEN_STRONG_INLINE Packet8cf ptrue<Packet8cf>(const Packet8cf& a) { return Packet8cf(ptrue(Packet16f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet8cf pnot<Packet8cf>(const Packet8cf& a) { return Packet8cf(pnot(Packet16f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet8cf padd<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf psub<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_sub_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf pnegate(const Packet8cf& a)
@@ -153,16 +153,6 @@ EIGEN_STRONG_INLINE Packet4cf predux_half_dowto4<Packet8cf>(const Packet8cf& a)
return Packet4cf(res);
}
template<int Offset>
struct palign_impl<Offset,Packet8cf>
{
static EIGEN_STRONG_INLINE void run(Packet8cf& first, const Packet8cf& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet16f>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, false,true>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
@@ -235,17 +225,19 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasReduxp = 0
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet4cd> {
typedef std::complex<double> type;
typedef Packet2cd half;
typedef Packet8d as_real;
enum {
size = 4,
alignment = unpacket_traits<Packet8d>::alignment,
@@ -253,7 +245,6 @@ template<> struct unpacket_traits<Packet4cd> {
masked_load_available=false,
masked_store_available=false
};
typedef Packet2cd half;
};
template<> EIGEN_STRONG_INLINE Packet4cd padd<Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(_mm512_add_pd(a.v,b.v)); }
@@ -277,7 +268,6 @@ template<> EIGEN_STRONG_INLINE Packet4cd pmul<Packet4cd>(const Packet4cd& a, con
}
template<> EIGEN_STRONG_INLINE Packet4cd ptrue<Packet4cd>(const Packet4cd& a) { return Packet4cd(ptrue(Packet8d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cd pnot<Packet4cd>(const Packet4cd& a) { return Packet4cd(pnot(Packet8d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cd pand <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pand(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd por <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(por(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd pxor <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pxor(a.v,b.v)); }
@@ -337,7 +327,7 @@ template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet4cd>(const Pack
}
template<> EIGEN_STRONG_INLINE Packet4cd preverse(const Packet4cd& a) {
return Packet4cd(_mm512_shuffle_f64x2(a.v, a.v, EIGEN_SSE_SHUFFLE_MASK(3,2,1,0)));
return Packet4cd(_mm512_shuffle_f64x2(a.v, a.v, (shuffle_mask<3,2,1,0>::mask)));
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet4cd>(const Packet4cd& a)
@@ -352,16 +342,6 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet4cd>(const
Packet2cd(_mm512_extractf64x4_pd(a.v,1))));
}
template<int Offset>
struct palign_impl<Offset,Packet4cd>
{
static EIGEN_STRONG_INLINE void run(Packet4cd& first, const Packet4cd& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet8d>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet4cd, Packet4cd, false,true>
{
EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const
@@ -450,43 +430,26 @@ ptranspose(PacketBlock<Packet8cf,8>& kernel) {
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<Packet4cd,4>& kernel) {
__m512d T0 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, EIGEN_SSE_SHUFFLE_MASK(0,1,0,1)); // [a0 a1 b0 b1]
__m512d T1 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, EIGEN_SSE_SHUFFLE_MASK(2,3,2,3)); // [a2 a3 b2 b3]
__m512d T2 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, EIGEN_SSE_SHUFFLE_MASK(0,1,0,1)); // [c0 c1 d0 d1]
__m512d T3 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, EIGEN_SSE_SHUFFLE_MASK(2,3,2,3)); // [c2 c3 d2 d3]
__m512d T0 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, (shuffle_mask<0,1,0,1>::mask)); // [a0 a1 b0 b1]
__m512d T1 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, (shuffle_mask<2,3,2,3>::mask)); // [a2 a3 b2 b3]
__m512d T2 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, (shuffle_mask<0,1,0,1>::mask)); // [c0 c1 d0 d1]
__m512d T3 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, (shuffle_mask<2,3,2,3>::mask)); // [c2 c3 d2 d3]
kernel.packet[3] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, EIGEN_SSE_SHUFFLE_MASK(1,3,1,3))); // [a3 b3 c3 d3]
kernel.packet[2] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, EIGEN_SSE_SHUFFLE_MASK(0,2,0,2))); // [a2 b2 c2 d2]
kernel.packet[1] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, EIGEN_SSE_SHUFFLE_MASK(1,3,1,3))); // [a1 b1 c1 d1]
kernel.packet[0] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, EIGEN_SSE_SHUFFLE_MASK(0,2,0,2))); // [a0 b0 c0 d0]
kernel.packet[3] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, (shuffle_mask<1,3,1,3>::mask))); // [a3 b3 c3 d3]
kernel.packet[2] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, (shuffle_mask<0,2,0,2>::mask))); // [a2 b2 c2 d2]
kernel.packet[1] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, (shuffle_mask<1,3,1,3>::mask))); // [a1 b1 c1 d1]
kernel.packet[0] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, (shuffle_mask<0,2,0,2>::mask))); // [a0 b0 c0 d0]
}
template<> EIGEN_STRONG_INLINE Packet8cf pinsertfirst(const Packet8cf& a, std::complex<float> b)
{
Packet2cf tmp = Packet2cf(_mm512_extractf32x4_ps(a.v,0));
tmp = pinsertfirst(tmp, b);
return Packet8cf( _mm512_insertf32x4(a.v, tmp.v, 0) );
template<> EIGEN_STRONG_INLINE Packet4cd psqrt<Packet4cd>(const Packet4cd& a) {
return psqrt_complex<Packet4cd>(a);
}
template<> EIGEN_STRONG_INLINE Packet4cd pinsertfirst(const Packet4cd& a, std::complex<double> b)
{
return Packet4cd(_mm512_castsi512_pd( _mm512_inserti32x4(_mm512_castpd_si512(a.v), _mm_castpd_si128(pset1<Packet1cd>(b).v), 0) ));
}
template<> EIGEN_STRONG_INLINE Packet8cf pinsertlast(const Packet8cf& a, std::complex<float> b)
{
Packet2cf tmp = Packet2cf(_mm512_extractf32x4_ps(a.v,3) );
tmp = pinsertlast(tmp, b);
return Packet8cf( _mm512_insertf32x4(a.v, tmp.v, 3) );
}
template<> EIGEN_STRONG_INLINE Packet4cd pinsertlast(const Packet4cd& a, std::complex<double> b)
{
return Packet4cd(_mm512_castsi512_pd( _mm512_inserti32x4(_mm512_castpd_si512(a.v), _mm_castpd_si128(pset1<Packet1cd>(b).v), 3) ));
template<> EIGEN_STRONG_INLINE Packet8cf psqrt<Packet8cf>(const Packet8cf& a) {
return psqrt_complex<Packet8cf>(a);
}
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_COMPLEX_AVX512_H


@@ -29,106 +29,41 @@ namespace internal {
#define _EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(NAME, X) \
const Packet8d p8d_##NAME = _mm512_castsi512_pd(_mm512_set1_epi64(X))
// Natural logarithm
// Computes log(x) as log(2^e * m) = C*e + log(m), where the constant C = log(2)
// and m is in the range [sqrt(1/2),sqrt(2)). In this range, the logarithm can
// be easily approximated by a polynomial centered on m=1 for stability.
#if defined(EIGEN_VECTORIZE_AVX512DQ)
#define _EIGEN_DECLARE_CONST_Packet16bf(NAME, X) \
const Packet16bf p16bf_##NAME = pset1<Packet16bf>(X)
#define _EIGEN_DECLARE_CONST_Packet16bf_FROM_INT(NAME, X) \
const Packet16bf p16bf_##NAME = preinterpret<Packet16bf,Packet16i>(pset1<Packet16i>(X))
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
plog<Packet16f>(const Packet16f& _x) {
Packet16f x = _x;
_EIGEN_DECLARE_CONST_Packet16f(1, 1.0f);
_EIGEN_DECLARE_CONST_Packet16f(half, 0.5f);
_EIGEN_DECLARE_CONST_Packet16f(126f, 126.0f);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(inv_mant_mask, ~0x7f800000);
// The smallest non-denormalized float number.
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(min_norm_pos, 0x00800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(minus_inf, 0xff800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(pos_inf, 0x7f800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(nan, 0x7fc00000);
// Polynomial coefficients.
_EIGEN_DECLARE_CONST_Packet16f(cephes_SQRTHF, 0.707106781186547524f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p0, 7.0376836292E-2f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p1, -1.1514610310E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p2, 1.1676998740E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p3, -1.2420140846E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p4, +1.4249322787E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p5, -1.6668057665E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p6, +2.0000714765E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p7, -2.4999993993E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p8, +3.3333331174E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_q1, -2.12194440e-4f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_q2, 0.693359375f);
// invalid_mask is set to true when x is NaN
__mmask16 invalid_mask = _mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_NGE_UQ);
__mmask16 iszero_mask = _mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_EQ_OQ);
// Truncate input values to the minimum positive normal.
x = pmax(x, p16f_min_norm_pos);
// Extract the shifted exponents.
Packet16f emm0 = _mm512_cvtepi32_ps(_mm512_srli_epi32(preinterpret<Packet16i,Packet16f>(x), 23));
Packet16f e = _mm512_sub_ps(emm0, p16f_126f);
// Set the exponents to -1, i.e. x is in the range [0.5,1).
x = _mm512_and_ps(x, p16f_inv_mant_mask);
x = _mm512_or_ps(x, p16f_half);
// part2: Shift the inputs from the range [0.5,1) to [sqrt(1/2),sqrt(2))
// and shift by -1. The values are then centered around 0, which improves
// the stability of the polynomial evaluation.
// if( x < SQRTHF ) {
// e -= 1;
// x = x + x - 1.0;
// } else { x = x - 1.0; }
__mmask16 mask = _mm512_cmp_ps_mask(x, p16f_cephes_SQRTHF, _CMP_LT_OQ);
Packet16f tmp = _mm512_mask_blend_ps(mask, _mm512_setzero_ps(), x);
x = psub(x, p16f_1);
e = psub(e, _mm512_mask_blend_ps(mask, _mm512_setzero_ps(), p16f_1));
x = padd(x, tmp);
Packet16f x2 = pmul(x, x);
Packet16f x3 = pmul(x2, x);
// Evaluate the polynomial approximant of degree 8 in three parts, probably
// to improve instruction-level parallelism.
Packet16f y, y1, y2;
y = pmadd(p16f_cephes_log_p0, x, p16f_cephes_log_p1);
y1 = pmadd(p16f_cephes_log_p3, x, p16f_cephes_log_p4);
y2 = pmadd(p16f_cephes_log_p6, x, p16f_cephes_log_p7);
y = pmadd(y, x, p16f_cephes_log_p2);
y1 = pmadd(y1, x, p16f_cephes_log_p5);
y2 = pmadd(y2, x, p16f_cephes_log_p8);
y = pmadd(y, x3, y1);
y = pmadd(y, x3, y2);
y = pmul(y, x3);
// Add the logarithm of the exponent back to the result of the interpolation.
y1 = pmul(e, p16f_cephes_log_q1);
tmp = pmul(x2, p16f_half);
y = padd(y, y1);
x = psub(x, tmp);
y2 = pmul(e, p16f_cephes_log_q2);
x = padd(x, y);
x = padd(x, y2);
__mmask16 pos_inf_mask = _mm512_cmp_ps_mask(_x,p16f_pos_inf,_CMP_EQ_OQ);
// Filter out invalid inputs, i.e.:
// - a negative arg will be NaN,
// - 0 will be -INF,
// - +INF will be +INF.
return _mm512_mask_blend_ps(iszero_mask,
_mm512_mask_blend_ps(invalid_mask,
_mm512_mask_blend_ps(pos_inf_mask,x,p16f_pos_inf),
p16f_nan),
p16f_minus_inf);
return plog_float(_x);
}
#endif
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
plog<Packet8d>(const Packet8d& _x) {
return plog_double(_x);
}
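plog_float/plog_double implement the range reduction described in the comment above: write x = m * 2^e with m in [sqrt(1/2), sqrt(2)), so log(x) = e*log(2) + log(m), and approximate log(m) by a polynomial in (m - 1). A scalar sketch of that reduction, using std::log1p where the vector kernels substitute the polynomial:

#include <cmath>

double log_model(double x) {
  const double kLn2      = 0.6931471805599453;
  const double kSqrtHalf = 0.7071067811865476;
  int e;
  double m = std::frexp(x, &e);          // x = m * 2^e, m in [0.5, 1)
  if (m < kSqrtHalf) { m += m; --e; }    // shift m into [sqrt(1/2), sqrt(2))
  // The vector code evaluates a degree-8 polynomial in (m - 1) here.
  return e * kLn2 + std::log1p(m - 1.0);
}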
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog)
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
plog2<Packet16f>(const Packet16f& _x) {
return plog2_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
plog2<Packet8d>(const Packet8d& _x) {
return plog2_double(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog2)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog2)
// Exponential function. Works by writing "x = m*log(2) + r" where
// "m = floor(x/log(2)+1/2)" and "r" is the remainder. The result is then
@@ -164,17 +99,17 @@ pexp<Packet16f>(const Packet16f& _x) {
_EIGEN_DECLARE_CONST_Packet16f(nln2, -0.6931471805599453f);
Packet16f r = _mm512_fmadd_ps(m, p16f_nln2, x);
Packet16f r2 = pmul(r, r);
Packet16f r3 = pmul(r2, r);
// TODO(gonnet): Split into odd/even polynomials and try to exploit
// instruction-level parallelism.
Packet16f y = p16f_cephes_exp_p0;
y = pmadd(y, r, p16f_cephes_exp_p1);
y = pmadd(y, r, p16f_cephes_exp_p2);
y = pmadd(y, r, p16f_cephes_exp_p3);
y = pmadd(y, r, p16f_cephes_exp_p4);
y = pmadd(y, r, p16f_cephes_exp_p5);
y = pmadd(y, r2, r);
y = padd(y, p16f_1);
// Evaluate the polynomial approximant, split to improve instruction-level parallelism.
Packet16f y, y1, y2;
y = pmadd(p16f_cephes_exp_p0, r, p16f_cephes_exp_p1);
y1 = pmadd(p16f_cephes_exp_p3, r, p16f_cephes_exp_p4);
y2 = padd(r, p16f_1);
y = pmadd(y, r, p16f_cephes_exp_p2);
y1 = pmadd(y1, r, p16f_cephes_exp_p5);
y = pmadd(y, r3, y1);
y = pmadd(y, r2, y2);
// Build emm0 = 2^m.
Packet16i emm0 = _mm512_cvttps_epi32(padd(m, p16f_127));
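Restated in scalar form, the reduction in the comment above ("x = m*log(2) + r") is the classic exp range reduction; a sketch using the standard library where the vector code uses a degree-5 polynomial for exp(r):

#include <cmath>

float exp_model(float x) {
  const float kLn2 = 0.69314718056f;
  float m = std::floor(x / kLn2 + 0.5f);   // nearest multiple of log(2)
  float r = x - m * kLn2;                  // remainder, |r| <= log(2)/2
  return std::ldexp(std::exp(r), (int)m);  // exp(x) = exp(r) * 2^m
}

Splitting the polynomial into the y/y1/y2 chains above shortens the dependency chain, which is the instruction-level-parallelism gain the rewrite is after.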
@@ -253,6 +188,34 @@ pexp<Packet8d>(const Packet8d& _x) {
return pmax(pmul(x, e), _x);
}*/
F16_PACKET_FUNCTION(Packet16f, Packet16h, pexp)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexp)
template <>
EIGEN_STRONG_INLINE Packet16h pfrexp(const Packet16h& a, Packet16h& exponent) {
Packet16f fexponent;
const Packet16h out = float2half(pfrexp<Packet16f>(half2float(a), fexponent));
exponent = float2half(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet16h pldexp(const Packet16h& a, const Packet16h& exponent) {
return float2half(pldexp<Packet16f>(half2float(a), half2float(exponent)));
}
template <>
EIGEN_STRONG_INLINE Packet16bf pfrexp(const Packet16bf& a, Packet16bf& exponent) {
Packet16f fexponent;
const Packet16bf out = F32ToBf16(pfrexp<Packet16f>(Bf16ToF32(a), fexponent));
exponent = F32ToBf16(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet16bf pldexp(const Packet16bf& a, const Packet16bf& exponent) {
return F32ToBf16(pldexp<Packet16f>(Bf16ToF32(a), Bf16ToF32(exponent)));
}
// Functions for sqrt.
// The EIGEN_FAST_MATH version uses the _mm_rsqrt_ps approximation and one step
@@ -303,12 +266,16 @@ template <>
EIGEN_STRONG_INLINE Packet16f psqrt<Packet16f>(const Packet16f& x) {
return _mm512_sqrt_ps(x);
}
template <>
EIGEN_STRONG_INLINE Packet8d psqrt<Packet8d>(const Packet8d& x) {
return _mm512_sqrt_pd(x);
}
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, psqrt)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, psqrt)
// prsqrt for float.
#if defined(EIGEN_VECTORIZE_AVX512ER)
@@ -316,7 +283,6 @@ template <>
EIGEN_STRONG_INLINE Packet16f prsqrt<Packet16f>(const Packet16f& x) {
return _mm512_rsqrt28_ps(x);
}
#elif EIGEN_FAST_MATH
template <>
@@ -332,7 +298,7 @@ prsqrt<Packet16f>(const Packet16f& _x) {
__mmask16 inf_mask = _mm512_cmp_ps_mask(_x, p16f_inf, _CMP_EQ_OQ);
__mmask16 not_pos_mask = _mm512_cmp_ps_mask(_x, _mm512_setzero_ps(), _CMP_LE_OQ);
__mmask16 not_finite_pos_mask = not_pos_mask | inf_mask;
// Compute an approximate result using the rsqrt intrinsic, forcing +inf
// for denormals for consistency with AVX and SSE implementations.
Packet16f y_approx = _mm512_rsqrt14_ps(_x);
@@ -347,8 +313,7 @@ prsqrt<Packet16f>(const Packet16f& _x) {
// For other arguments, choose the output of the intrinsic. This will
// return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(0) = +inf.
return _mm512_mask_blend_ps(not_finite_pos_mask, y_newton, y_approx);
}
}
#else
template <>
@@ -356,9 +321,11 @@ EIGEN_STRONG_INLINE Packet16f prsqrt<Packet16f>(const Packet16f& x) {
_EIGEN_DECLARE_CONST_Packet16f(one, 1.0f);
return _mm512_div_ps(p16f_one, _mm512_sqrt_ps(x));
}
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, prsqrt)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, prsqrt)
// prsqrt for double.
#if EIGEN_FAST_MATH
template <>
@@ -406,17 +373,21 @@ EIGEN_STRONG_INLINE Packet8d prsqrt<Packet8d>(const Packet8d& x) {
}
#endif
#if defined(EIGEN_VECTORIZE_AVX512DQ)
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16f plog1p<Packet16f>(const Packet16f& _x) {
return generic_plog1p(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog1p)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog1p)
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16f pexpm1<Packet16f>(const Packet16f& _x) {
return generic_expm1(_x);
}
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, pexpm1)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexpm1)
#endif
@@ -439,6 +410,14 @@ ptanh<Packet16f>(const Packet16f& _x) {
return internal::generic_fast_tanh_float(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, psin)
F16_PACKET_FUNCTION(Packet16f, Packet16h, pcos)
F16_PACKET_FUNCTION(Packet16f, Packet16h, ptanh)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, psin)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pcos)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, ptanh)
} // end namespace internal
} // end namespace Eigen

File diff suppressed because it is too large.


@@ -14,6 +14,22 @@ namespace Eigen {
namespace internal {
template<> EIGEN_STRONG_INLINE Packet16i pcast<Packet16f, Packet16i>(const Packet16f& a) {
return _mm512_cvttps_epi32(a);
}
template<> EIGEN_STRONG_INLINE Packet16f pcast<Packet16i, Packet16f>(const Packet16i& a) {
return _mm512_cvtepi32_ps(a);
}
template<> EIGEN_STRONG_INLINE Packet16i preinterpret<Packet16i, Packet16f>(const Packet16f& a) {
return _mm512_castps_si512(a);
}
template<> EIGEN_STRONG_INLINE Packet16f preinterpret<Packet16f, Packet16i>(const Packet16i& a) {
return _mm512_castsi512_ps(a);
}
template <>
struct type_casting_traits<half, float> {
enum {
@@ -40,6 +56,32 @@ template<> EIGEN_STRONG_INLINE Packet16h pcast<Packet16f, Packet16h>(const Packe
return float2half(a);
}
template <>
struct type_casting_traits<bfloat16, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16f pcast<Packet16bf, Packet16f>(const Packet16bf& a) {
return Bf16ToF32(a);
}
template <>
struct type_casting_traits<float, bfloat16> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16bf pcast<Packet16f, Packet16bf>(const Packet16f& a) {
return F32ToBf16(a);
}
} // end namespace internal
} // end namespace Eigen


@@ -29,8 +29,54 @@ static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2d_MZERO, (P
//---------- float ----------
struct Packet2cf
{
EIGEN_STRONG_INLINE explicit Packet2cf() : v(p4f_ZERO) {}
EIGEN_STRONG_INLINE explicit Packet2cf() {}
EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {}
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b)
{
Packet4f v1, v2;
// Permute and multiply the real parts of a and b
v1 = vec_perm(a.v, a.v, p16uc_PSET32_WODD);
// Get the imaginary parts of a
v2 = vec_perm(a.v, a.v, p16uc_PSET32_WEVEN);
// multiply a_re * b
v1 = vec_madd(v1, b.v, p4f_ZERO);
// multiply a_im * b and get the conjugate result
v2 = vec_madd(v2, b.v, p4f_ZERO);
v2 = reinterpret_cast<Packet4f>(pxor(v2, reinterpret_cast<Packet4f>(p4ui_CONJ_XOR)));
// permute back to a proper order
v2 = vec_perm(v2, v2, p16uc_COMPLEX32_REV);
return Packet2cf(padd<Packet4f>(v1, v2));
}
EIGEN_STRONG_INLINE Packet2cf& operator*=(const Packet2cf& b) {
v = pmul(Packet2cf(*this), b).v;
return *this;
}
EIGEN_STRONG_INLINE Packet2cf operator*(const Packet2cf& b) const {
return Packet2cf(*this) *= b;
}
EIGEN_STRONG_INLINE Packet2cf& operator+=(const Packet2cf& b) {
v = padd(v, b.v);
return *this;
}
EIGEN_STRONG_INLINE Packet2cf operator+(const Packet2cf& b) const {
return Packet2cf(*this) += b;
}
EIGEN_STRONG_INLINE Packet2cf& operator-=(const Packet2cf& b) {
v = psub(v, b.v);
return *this;
}
EIGEN_STRONG_INLINE Packet2cf operator-(const Packet2cf& b) const {
return Packet2cf(*this) -= b;
}
EIGEN_STRONG_INLINE Packet2cf operator-(void) const {
return Packet2cf(vec_neg(v));
}
Packet4f v;
};
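For reference, the permute/madd sequence in pmul computes the textbook complex product per lane: with a = ar + i*ai and b = br + i*bi, v1 = (ar*br, ar*bi) and, after the CONJ_XOR sign flip and the COMPLEX32_REV swap, v2 = (-ai*bi, ai*br), so v1 + v2 = (ar*br - ai*bi, ar*bi + ai*br) = a*b.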
@@ -38,6 +84,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
{
typedef Packet2cf type;
typedef Packet2cf half;
typedef Packet4f as_real;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
@@ -60,7 +107,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
};
};
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; typedef Packet4f as_real; };
template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
{
@@ -80,6 +127,25 @@ template<> EIGEN_STRONG_INLINE Packet2cf ploaddup<Packet2cf>(const std::complex<
template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { pstore((float*)to, from.v); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { pstoreu((float*)to, from.v); }
EIGEN_STRONG_INLINE Packet2cf pload2(const std::complex<float>* from0, const std::complex<float>* from1)
{
Packet4f res0, res1;
#ifdef __VSX__
__asm__ ("lxsdx %x0,%y1" : "=wa" (res0) : "Z" (*from0));
__asm__ ("lxsdx %x0,%y1" : "=wa" (res1) : "Z" (*from1));
#ifdef _BIG_ENDIAN
__asm__ ("xxpermdi %x0, %x1, %x2, 0" : "=wa" (res0) : "wa" (res0), "wa" (res1));
#else
__asm__ ("xxpermdi %x0, %x2, %x1, 0" : "=wa" (res0) : "wa" (res0), "wa" (res1));
#endif
#else
*reinterpret_cast<std::complex<float> *>(&res0) = *from0;
*reinterpret_cast<std::complex<float> *>(&res1) = *from1;
res0 = vec_perm(res0, res1, p16uc_TRANSPOSE64_HI);
#endif
return Packet2cf(res0);
}
template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather<std::complex<float>, Packet2cf>(const std::complex<float>* from, Index stride)
{
EIGEN_ALIGN16 std::complex<float> af[2];
@@ -100,25 +166,6 @@ template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, con
template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate(a.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { return Packet2cf(pxor<Packet4f>(a.v, reinterpret_cast<Packet4f>(p4ui_CONJ_XOR))); }
template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
Packet4f v1, v2;
// Permute and multiply the real parts of a and b
v1 = vec_perm(a.v, a.v, p16uc_PSET32_WODD);
// Get the imaginary parts of a
v2 = vec_perm(a.v, a.v, p16uc_PSET32_WEVEN);
// multiply a_re * b
v1 = vec_madd(v1, b.v, p4f_ZERO);
// multiply a_im * b and get the conjugate result
v2 = vec_madd(v2, b.v, p4f_ZERO);
v2 = reinterpret_cast<Packet4f>(pxor(v2, reinterpret_cast<Packet4f>(p4ui_CONJ_XOR)));
// permute back to a proper order
v2 = vec_perm(v2, v2, p16uc_COMPLEX32_REV);
return Packet2cf(padd<Packet4f>(v1, v2));
}
template<> EIGEN_STRONG_INLINE Packet2cf pand <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pand<Packet4f>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf por <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(por<Packet4f>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pxor <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pxor<Packet4f>(a.v, b.v)); }
@@ -149,22 +196,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packe
return pfirst<Packet2cf>(Packet2cf(b));
}
template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
{
Packet4f b1, b2;
#ifdef _BIG_ENDIAN
b1 = vec_sld(vecs[0].v, vecs[1].v, 8);
b2 = vec_sld(vecs[1].v, vecs[0].v, 8);
#else
b1 = vec_sld(vecs[1].v, vecs[0].v, 8);
b2 = vec_sld(vecs[0].v, vecs[1].v, 8);
#endif
b2 = vec_sld(b2, b2, 8);
b2 = padd<Packet4f>(b1, b2);
return Packet2cf(b2);
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
{
Packet4f b;
@@ -175,22 +206,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return pfirst<Packet2cf>(prod);
}
template<int Offset>
struct palign_impl<Offset,Packet2cf>
{
static EIGEN_STRONG_INLINE void run(Packet2cf& first, const Packet2cf& second)
{
if (Offset==1)
{
#ifdef _BIG_ENDIAN
first.v = vec_sld(first.v, second.v, 8);
#else
first.v = vec_sld(second.v, first.v, 8);
#endif
}
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
@@ -259,12 +274,62 @@ template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, con
}
#endif
template<> EIGEN_STRONG_INLINE Packet2cf psqrt<Packet2cf>(const Packet2cf& a)
{
return psqrt_complex<Packet2cf>(a);
}
//---------- double ----------
#ifdef __VSX__
struct Packet1cd
{
EIGEN_STRONG_INLINE Packet1cd() {}
EIGEN_STRONG_INLINE explicit Packet1cd(const Packet2d& a) : v(a) {}
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b)
{
Packet2d a_re, a_im, v1, v2;
// Permute and multiply the real parts of a and b
a_re = vec_perm(a.v, a.v, p16uc_PSET64_HI);
// Get the imaginary parts of a
a_im = vec_perm(a.v, a.v, p16uc_PSET64_LO);
// multiply a_re * b
v1 = vec_madd(a_re, b.v, p2d_ZERO);
// multiply a_im * b and get the conjugate result
v2 = vec_madd(a_im, b.v, p2d_ZERO);
v2 = reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4ui>(v2), reinterpret_cast<Packet4ui>(v2), 8));
v2 = pxor(v2, reinterpret_cast<Packet2d>(p2ul_CONJ_XOR1));
return Packet1cd(padd<Packet2d>(v1, v2));
}
EIGEN_STRONG_INLINE Packet1cd& operator*=(const Packet1cd& b) {
v = pmul(Packet1cd(*this), b).v;
return *this;
}
EIGEN_STRONG_INLINE Packet1cd operator*(const Packet1cd& b) const {
return Packet1cd(*this) *= b;
}
EIGEN_STRONG_INLINE Packet1cd& operator+=(const Packet1cd& b) {
v = padd(v, b.v);
return *this;
}
EIGEN_STRONG_INLINE Packet1cd operator+(const Packet1cd& b) const {
return Packet1cd(*this) += b;
}
EIGEN_STRONG_INLINE Packet1cd& operator-=(const Packet1cd& b) {
v = psub(v, b.v);
return *this;
}
EIGEN_STRONG_INLINE Packet1cd operator-(const Packet1cd& b) const {
return Packet1cd(*this) -= b;
}
EIGEN_STRONG_INLINE Packet1cd operator-(void) const {
return Packet1cd(vec_neg(v));
}
Packet2d v;
};
@@ -272,6 +337,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
{
typedef Packet1cd type;
typedef Packet1cd half;
typedef Packet2d as_real;
enum {
Vectorizable = 1,
AlignedOnScalar = 0,
@@ -291,7 +357,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
};
};
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; typedef Packet2d as_real; };
template<> EIGEN_STRONG_INLINE Packet1cd pload <Packet1cd>(const std::complex<double>* from) { return Packet1cd(pload<Packet2d>((const double*)from)); }
template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from) { return Packet1cd(ploadu<Packet2d>((const double*)from)); }
@@ -301,19 +367,13 @@ template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<
template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<double>& from)
{ /* here we really have to use unaligned loads :( */ return ploadu<Packet1cd>(&from); }
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, Index stride)
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, Index)
{
EIGEN_ALIGN16 std::complex<double> af[2];
af[0] = from[0*stride];
af[1] = from[1*stride];
return pload<Packet1cd>(af);
return pload<Packet1cd>(from);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, Index stride)
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, Index)
{
EIGEN_ALIGN16 std::complex<double> af[2];
pstore<std::complex<double> >(af, from);
to[0*stride] = af[0];
to[1*stride] = af[1];
pstore<std::complex<double> >(to, from);
}
template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(a.v + b.v); }
@@ -321,24 +381,6 @@ template<> EIGEN_STRONG_INLINE Packet1cd psub<Packet1cd>(const Packet1cd& a, con
template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate(Packet2d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { return Packet1cd(pxor(a.v, reinterpret_cast<Packet2d>(p2ul_CONJ_XOR2))); }
template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
Packet2d a_re, a_im, v1, v2;
// Permute and multiply the real parts of a and b
a_re = vec_perm(a.v, a.v, p16uc_PSET64_HI);
// Get the imaginary parts of a
a_im = vec_perm(a.v, a.v, p16uc_PSET64_LO);
// multiply a_re * b
v1 = vec_madd(a_re, b.v, p2d_ZERO);
// multiply a_im * b and get the conjugate result
v2 = vec_madd(a_im, b.v, p2d_ZERO);
v2 = reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4ui>(v2), reinterpret_cast<Packet4ui>(v2), 8));
v2 = pxor(v2, reinterpret_cast<Packet2d>(p2ul_CONJ_XOR1));
return Packet1cd(padd<Packet2d>(v1, v2));
}
template<> EIGEN_STRONG_INLINE Packet1cd pand <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(pand(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd por <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(por(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd pxor <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(pxor(a.v,b.v)); }
@@ -359,20 +401,9 @@ template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Pac
template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; }
template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs) { return vecs[0]; }
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<int Offset>
struct palign_impl<Offset,Packet1cd>
{
static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
{
// FIXME is it sure we never have to align a Packet1cd?
// Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16 bytes boundary...
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
@@ -439,6 +470,11 @@ template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packe
return Packet1cd(vec_and(eq, eq_swapped));
}
template<> EIGEN_STRONG_INLINE Packet1cd psqrt<Packet1cd>(const Packet1cd& a)
{
return psqrt_complex<Packet1cd>(a);
}
#endif // __VSX__
} // end namespace internal

File diff suppressed because it is too large.


@@ -0,0 +1,221 @@
//#define EIGEN_POWER_USE_PREFETCH // Use prefetching in gemm routines
#ifdef EIGEN_POWER_USE_PREFETCH
#define EIGEN_POWER_PREFETCH(p) prefetch(p)
#else
#define EIGEN_POWER_PREFETCH(p)
#endif
namespace Eigen {
namespace internal {
template<typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accRows>
EIGEN_STRONG_INLINE void gemm_extra_col(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index row,
Index col,
Index remaining_rows,
Index remaining_cols,
const Packet& pAlpha);
template<typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accRows, const Index accCols>
EIGEN_STRONG_INLINE void gemm_extra_row(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index row,
Index col,
Index rows,
Index cols,
Index remaining_rows,
const Packet& pAlpha,
const Packet& pMask);
template<typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accCols>
EIGEN_STRONG_INLINE void gemm_unrolled_col(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index& row,
Index rows,
Index col,
Index remaining_cols,
const Packet& pAlpha);
template<typename Packet>
EIGEN_STRONG_INLINE Packet bmask(const int remaining_rows);
template<typename Scalar, typename Packet, typename Packetc, typename DataMapper, typename Index, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void gemm_complex_extra_col(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index row,
Index col,
Index remaining_rows,
Index remaining_cols,
const Packet& pAlphaReal,
const Packet& pAlphaImag);
template<typename Scalar, typename Packet, typename Packetc, typename DataMapper, typename Index, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void gemm_complex_extra_row(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index row,
Index col,
Index rows,
Index cols,
Index remaining_rows,
const Packet& pAlphaReal,
const Packet& pAlphaImag,
const Packet& pMask);
template<typename Scalar, typename Packet, typename Packetc, typename DataMapper, typename Index, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void gemm_complex_unrolled_col(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index& row,
Index rows,
Index col,
Index remaining_cols,
const Packet& pAlphaReal,
const Packet& pAlphaImag);
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE Packet ploadLhs(const Scalar* lhs);
template<typename DataMapper, typename Packet, typename Index, const Index accCols, int N, int StorageOrder>
EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,4>& acc, const DataMapper& res, Index row, Index col);
template<typename DataMapper, typename Packet, typename Index, const Index accCols, int N, int StorageOrder>
EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,8>& acc, const DataMapper& res, Index row, Index col);
template<typename Packet>
EIGEN_STRONG_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha);
template<typename Packet, int N>
EIGEN_STRONG_INLINE void bscalec(PacketBlock<Packet,N>& aReal, PacketBlock<Packet,N>& aImag, const Packet& bReal, const Packet& bImag, PacketBlock<Packet,N>& cReal, PacketBlock<Packet,N>& cImag);
const static Packet16uc p16uc_SETCOMPLEX32_FIRST = { 0, 1, 2, 3,
16, 17, 18, 19,
4, 5, 6, 7,
20, 21, 22, 23};
const static Packet16uc p16uc_SETCOMPLEX32_SECOND = { 8, 9, 10, 11,
24, 25, 26, 27,
12, 13, 14, 15,
28, 29, 30, 31};
//[a,b],[ai,bi] = [a,ai] - This is equivalent to p16uc_GETREAL64
const static Packet16uc p16uc_SETCOMPLEX64_FIRST = { 0, 1, 2, 3, 4, 5, 6, 7,
16, 17, 18, 19, 20, 21, 22, 23};
//[a,b],[ai,bi] = [b,bi] - This is equivalent to p16uc_GETIMAG64
const static Packet16uc p16uc_SETCOMPLEX64_SECOND = { 8, 9, 10, 11, 12, 13, 14, 15,
24, 25, 26, 27, 28, 29, 30, 31};
// Grab two decoupled real/imaginary PacketBlocks and return two coupled (real/imaginary pairs) PacketBlocks.
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple_common(PacketBlock<Packet,4>& taccReal, PacketBlock<Packet,4>& taccImag, PacketBlock<Packetc, 4>& acc1, PacketBlock<Packetc, 4>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_FIRST);
acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX32_FIRST);
acc1.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX32_FIRST);
acc1.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX32_FIRST);
acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_SECOND);
acc2.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX32_SECOND);
acc2.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX32_SECOND);
acc2.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX32_SECOND);
}
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple(PacketBlock<Packet,4>& taccReal, PacketBlock<Packet,4>& taccImag, PacketBlock<Packetc,8>& tRes, PacketBlock<Packetc, 4>& acc1, PacketBlock<Packetc, 4>& acc2)
{
bcouple_common<Packet, Packetc>(taccReal, taccImag, acc1, acc2);
acc1.packet[0] = padd<Packetc>(tRes.packet[0], acc1.packet[0]);
acc1.packet[1] = padd<Packetc>(tRes.packet[1], acc1.packet[1]);
acc1.packet[2] = padd<Packetc>(tRes.packet[2], acc1.packet[2]);
acc1.packet[3] = padd<Packetc>(tRes.packet[3], acc1.packet[3]);
acc2.packet[0] = padd<Packetc>(tRes.packet[4], acc2.packet[0]);
acc2.packet[1] = padd<Packetc>(tRes.packet[5], acc2.packet[1]);
acc2.packet[2] = padd<Packetc>(tRes.packet[6], acc2.packet[2]);
acc2.packet[3] = padd<Packetc>(tRes.packet[7], acc2.packet[3]);
}
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple_common(PacketBlock<Packet,1>& taccReal, PacketBlock<Packet,1>& taccImag, PacketBlock<Packetc, 1>& acc1, PacketBlock<Packetc, 1>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_FIRST);
acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_SECOND);
}
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple(PacketBlock<Packet,1>& taccReal, PacketBlock<Packet,1>& taccImag, PacketBlock<Packetc,2>& tRes, PacketBlock<Packetc, 1>& acc1, PacketBlock<Packetc, 1>& acc2)
{
bcouple_common<Packet, Packetc>(taccReal, taccImag, acc1, acc2);
acc1.packet[0] = padd<Packetc>(tRes.packet[0], acc1.packet[0]);
acc2.packet[0] = padd<Packetc>(tRes.packet[1], acc2.packet[0]);
}
template<>
EIGEN_STRONG_INLINE void bcouple_common<Packet2d, Packet1cd>(PacketBlock<Packet2d,4>& taccReal, PacketBlock<Packet2d,4>& taccImag, PacketBlock<Packet1cd, 4>& acc1, PacketBlock<Packet1cd, 4>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_FIRST);
acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX64_FIRST);
acc1.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX64_FIRST);
acc1.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX64_FIRST);
acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_SECOND);
acc2.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX64_SECOND);
acc2.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX64_SECOND);
acc2.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX64_SECOND);
}
template<>
EIGEN_STRONG_INLINE void bcouple_common<Packet2d, Packet1cd>(PacketBlock<Packet2d,1>& taccReal, PacketBlock<Packet2d,1>& taccImag, PacketBlock<Packet1cd, 1>& acc1, PacketBlock<Packet1cd, 1>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_FIRST);
acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_SECOND);
}
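Concretely, for the double case: with taccReal = [r0, r1] and taccImag = [i0, i1], p16uc_SETCOMPLEX64_FIRST selects bytes 0-7 of the first operand and 16-23 of the second, while _SECOND selects bytes 8-15 and 24-31, so

  vec_perm([r0, r1], [i0, i1], p16uc_SETCOMPLEX64_FIRST)  == [r0, i0]  // first complex value
  vec_perm([r0, r1], [i0, i1], p16uc_SETCOMPLEX64_SECOND) == [r1, i1]  // second complex value

which is how two separate real/imaginary accumulators become interleaved complex results.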
// This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled.
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE Packet ploadRhs(const Scalar* rhs)
{
return *reinterpret_cast<Packet *>(const_cast<Scalar *>(rhs));
}
} // end namespace internal
} // end namespace Eigen


@@ -0,0 +1,637 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2020 Everton Constantino (everton.constantino@ibm.com)
// Copyright (C) 2021 Chip Kerchner (chip.kerchner@ibm.com)
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H
#define EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H
#pragma GCC target("cpu=power10")
#ifdef __has_builtin
#if !__has_builtin(__builtin_vsx_assemble_pair)
#define __builtin_vsx_assemble_pair __builtin_mma_assemble_pair
#endif
#endif
namespace Eigen {
namespace internal {
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE void bsetzeroMMA(__vector_quad* acc)
{
__builtin_mma_xxsetaccz(acc);
}
template<typename DataMapper, typename Index, typename Packet, const Index accCols>
EIGEN_STRONG_INLINE void storeAccumulator(Index i, Index j, const DataMapper& data, const Packet& alpha, __vector_quad* acc)
{
PacketBlock<Packet, 4> result;
__builtin_mma_disassemble_acc(&result.packet, acc);
PacketBlock<Packet, 4> tRes;
bload<DataMapper, Packet, Index, accCols, 0, ColMajor>(tRes, data, i, j);
bscale<Packet>(tRes, result, alpha);
data.template storePacketBlock<Packet, 4>(i, j, tRes);
}
template<typename DataMapper, typename Index, typename Packet, typename Packetc, const Index accColsC, int N>
EIGEN_STRONG_INLINE void storeComplexAccumulator(Index i, Index j, const DataMapper& data, const Packet& alphaReal, const Packet& alphaImag, __vector_quad* accReal, __vector_quad* accImag)
{
PacketBlock<Packet, 4> resultReal, resultImag;
__builtin_mma_disassemble_acc(&resultReal.packet, accReal);
__builtin_mma_disassemble_acc(&resultImag.packet, accImag);
PacketBlock<Packetc, 8> tRes;
bload<DataMapper, Packetc, Index, accColsC, N, ColMajor>(tRes, data, i, j);
PacketBlock<Packet,4> taccReal, taccImag;
bscalec<Packet,4>(resultReal, resultImag, alphaReal, alphaImag, taccReal, taccImag);
PacketBlock<Packetc, 4> acc1, acc2;
bcouple<Packet, Packetc>(taccReal, taccImag, tRes, acc1, acc2);
data.template storePacketBlock<Packetc, 4>(i + N*accColsC, j, acc1);
data.template storePacketBlock<Packetc, 4>(i + (N+1)*accColsC, j, acc2);
}
// Defaults to float32; since Eigen still supports C++03 we can't use default template arguments
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const RhsPacket& a, const LhsPacket& b)
{
if(NegativeAccumulate)
{
__builtin_mma_xvf32gernp(acc, (__vector unsigned char)a, (__vector unsigned char)b);
} else {
__builtin_mma_xvf32gerpp(acc, (__vector unsigned char)a, (__vector unsigned char)b);
}
}
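For readers unfamiliar with the MMA builtins: on my reading of the ISA mnemonics (ger = rank-1 update, pp = positive multiply/positive accumulate, np = negative multiply/positive accumulate), the two intrinsics update the 4x4 accumulator as

  __builtin_mma_xvf32gerpp: acc[i][j] +=  a[i] * b[j]
  __builtin_mma_xvf32gernp: acc[i][j] += -a[i] * b[j]

so the NegativeAccumulate flag selects the sign of the outer product that is accumulated.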
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const PacketBlock<Packet2d,2>& a, const Packet2d& b)
{
__vector_pair* a0 = (__vector_pair *)(&a.packet[0]);
if(NegativeAccumulate)
{
__builtin_mma_xvf64gernp(acc, *a0, (__vector unsigned char)b);
} else {
__builtin_mma_xvf64gerpp(acc, *a0, (__vector unsigned char)b);
}
}
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const __vector_pair& a, const Packet2d& b)
{
if(NegativeAccumulate)
{
__builtin_mma_xvf64gernp(acc, (__vector_pair)a, (__vector unsigned char)b);
} else {
__builtin_mma_xvf64gerpp(acc, (__vector_pair)a, (__vector unsigned char)b);
}
}
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const __vector_pair& a, const Packet4f& b)
{
EIGEN_UNUSED_VARIABLE(acc); // Just for compilation
EIGEN_UNUSED_VARIABLE(a);
EIGEN_UNUSED_VARIABLE(b);
}
template<typename Scalar, typename Packet, typename RhsPacket, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void pgercMMA(__vector_quad* accReal, __vector_quad* accImag, const Packet& lhsV, const Packet& lhsVi, const RhsPacket& rhsV, const RhsPacket& rhsVi)
{
pgerMMA<Packet, RhsPacket, false>(accReal, rhsV, lhsV);
if(LhsIsReal) {
pgerMMA<Packet, RhsPacket, ConjugateRhs>(accImag, rhsVi, lhsV);
} else {
if(!RhsIsReal) {
pgerMMA<Packet, RhsPacket, ConjugateLhs == ConjugateRhs>(accReal, rhsVi, lhsVi);
pgerMMA<Packet, RhsPacket, ConjugateRhs>(accImag, rhsVi, lhsV);
} else {
EIGEN_UNUSED_VARIABLE(rhsVi);
}
pgerMMA<Packet, RhsPacket, ConjugateLhs>(accImag, rhsV, lhsVi);
}
}
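// For reference, the complex algebra behind pgercMMA, with the signs chosen
// by the Conjugate*/IsReal flags (a sketch of the general, non-conjugated case):
//   (lr + i*li) * (rr + i*ri):
//     accReal += lr*rr; accReal -= li*ri;   // real part
//     accImag += lr*ri; accImag += li*rr;   // imaginary part
// Conjugating the lhs or rhs flips the sign of the li or ri terms, and when
// LhsIsReal/RhsIsReal the corresponding products are skipped entirely.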
// This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled.
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE void ploadRhsMMA(const Scalar* rhs, Packet& rhsV)
{
rhsV = ploadRhs<Scalar, Packet>((const Scalar*)(rhs));
}
template<>
EIGEN_STRONG_INLINE void ploadRhsMMA<double, PacketBlock<Packet2d, 2> >(const double* rhs, PacketBlock<Packet2d, 2>& rhsV)
{
rhsV.packet[0] = ploadRhs<double, Packet2d>((const double *)((Packet2d *)rhs ));
rhsV.packet[1] = ploadRhs<double, Packet2d>((const double *)(((Packet2d *)rhs) + 1));
}
template<>
EIGEN_STRONG_INLINE void ploadRhsMMA<double, __vector_pair>(const double* rhs, __vector_pair& rhsV)
{
#if EIGEN_COMP_LLVM
__builtin_vsx_assemble_pair(&rhsV,
(__vector unsigned char)(ploadRhs<double, Packet2d>((const double *)(((Packet2d *)rhs) + 1))),
(__vector unsigned char)(ploadRhs<double, Packet2d>((const double *)((Packet2d *)rhs ))));
#else
__asm__ ("lxvp %x0,%1" : "=wa" (rhsV) : "Y" (*rhs));
#endif
}
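// Note on the two paths above (an observation, not normative): a
// __vector_pair holds two adjacent Packet2d halves, i.e. four doubles of the
// rhs panel. GCC fetches both halves with a single lxvp instruction, while
// the LLVM path builds the pair from two ordinary 16-byte loads; the half at
// rhs + 2 is passed as the first argument to match the builtin's expected
// operand order.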
template<>
EIGEN_STRONG_INLINE void ploadRhsMMA(const float* rhs, __vector_pair& rhsV)
{
// Just for compilation
EIGEN_UNUSED_VARIABLE(rhs);
EIGEN_UNUSED_VARIABLE(rhsV);
}
// PEEL_MMA loop factor.
#define PEEL_MMA 7
#define MICRO_MMA_UNROLL(func) \
func(0) func(1) func(2) func(3) func(4) func(5) func(6) func(7)
#define MICRO_MMA_LOAD_ONE(iter) \
if (unroll_factor > iter) { \
lhsV##iter = ploadLhs<Scalar, Packet>(lhs_ptr##iter); \
lhs_ptr##iter += accCols; \
} else { \
EIGEN_UNUSED_VARIABLE(lhsV##iter); \
}
#define MICRO_MMA_WORK_ONE(iter, type, peel) \
if (unroll_factor > iter) { \
pgerMMA<Packet, type, false>(&accZero##iter, rhsV##peel, lhsV##iter); \
}
#define MICRO_MMA_TYPE_PEEL(func, func2, type, peel) \
if (PEEL_MMA > peel) { \
Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4, lhsV5, lhsV6, lhsV7; \
ploadRhsMMA<Scalar, type>(rhs_ptr + (accRows * peel), rhsV##peel); \
MICRO_MMA_UNROLL(func2); \
func(0,type,peel) func(1,type,peel) func(2,type,peel) func(3,type,peel) \
func(4,type,peel) func(5,type,peel) func(6,type,peel) func(7,type,peel) \
} else { \
EIGEN_UNUSED_VARIABLE(rhsV##peel); \
}
#define MICRO_MMA_UNROLL_TYPE_PEEL(func, func2, type) \
type rhsV0, rhsV1, rhsV2, rhsV3, rhsV4, rhsV5, rhsV6, rhsV7, rhsV8, rhsV9; \
MICRO_MMA_TYPE_PEEL(func,func2,type,0); MICRO_MMA_TYPE_PEEL(func,func2,type,1); \
MICRO_MMA_TYPE_PEEL(func,func2,type,2); MICRO_MMA_TYPE_PEEL(func,func2,type,3); \
MICRO_MMA_TYPE_PEEL(func,func2,type,4); MICRO_MMA_TYPE_PEEL(func,func2,type,5); \
MICRO_MMA_TYPE_PEEL(func,func2,type,6); MICRO_MMA_TYPE_PEEL(func,func2,type,7); \
MICRO_MMA_TYPE_PEEL(func,func2,type,8); MICRO_MMA_TYPE_PEEL(func,func2,type,9);
#define MICRO_MMA_UNROLL_TYPE_ONE(func, func2, type) \
type rhsV0; \
MICRO_MMA_TYPE_PEEL(func,func2,type,0);
#define MICRO_MMA_ONE_PEEL \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_MMA_UNROLL_TYPE_PEEL(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, RhsPacket); \
} else { \
MICRO_MMA_UNROLL_TYPE_PEEL(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, __vector_pair); \
} \
rhs_ptr += (accRows * PEEL_MMA);
#define MICRO_MMA_ONE \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_MMA_UNROLL_TYPE_ONE(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, RhsPacket); \
} else { \
MICRO_MMA_UNROLL_TYPE_ONE(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, __vector_pair); \
} \
rhs_ptr += accRows;
#define MICRO_MMA_DST_PTR_ONE(iter) \
if (unroll_factor > iter) { \
bsetzeroMMA<Scalar, Packet>(&accZero##iter); \
} else { \
EIGEN_UNUSED_VARIABLE(accZero##iter); \
}
#define MICRO_MMA_DST_PTR MICRO_MMA_UNROLL(MICRO_MMA_DST_PTR_ONE)
#define MICRO_MMA_SRC_PTR_ONE(iter) \
if (unroll_factor > iter) { \
lhs_ptr##iter = lhs_base + ( (row/accCols) + iter )*strideA*accCols + accCols*offsetA; \
} else { \
EIGEN_UNUSED_VARIABLE(lhs_ptr##iter); \
}
#define MICRO_MMA_SRC_PTR MICRO_MMA_UNROLL(MICRO_MMA_SRC_PTR_ONE)
#define MICRO_MMA_PREFETCH_ONE(iter) \
if (unroll_factor > iter) { \
EIGEN_POWER_PREFETCH(lhs_ptr##iter); \
}
#define MICRO_MMA_PREFETCH MICRO_MMA_UNROLL(MICRO_MMA_PREFETCH_ONE)
#define MICRO_MMA_STORE_ONE(iter) \
if (unroll_factor > iter) { \
storeAccumulator<DataMapper, Index, Packet, accCols>(row + iter*accCols, col, res, pAlpha, &accZero##iter); \
}
#define MICRO_MMA_STORE MICRO_MMA_UNROLL(MICRO_MMA_STORE_ONE)
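// To make the macro machinery concrete, MICRO_MMA_SRC_PTR_ONE(1) expands
// (modulo whitespace) to:
//   if (unroll_factor > 1) {
//     lhs_ptr1 = lhs_base + ((row/accCols) + 1)*strideA*accCols + accCols*offsetA;
//   } else {
//     EIGEN_UNUSED_VARIABLE(lhs_ptr1);
//   }
// Because unroll_factor is a compile-time template parameter, each guard is
// resolved statically and the unused iterations are eliminated by the compiler.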
template<int unroll_factor, typename Scalar, typename Packet, typename RhsPacket, typename DataMapper, typename Index, const Index accRows, const Index accCols>
EIGEN_STRONG_INLINE void gemm_unrolled_MMA_iteration(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index& row,
Index col,
const Packet& pAlpha)
{
asm("#gemm_MMA begin");
const Scalar* rhs_ptr = rhs_base;
const Scalar* lhs_ptr0, * lhs_ptr1, * lhs_ptr2, * lhs_ptr3, * lhs_ptr4, * lhs_ptr5, * lhs_ptr6, * lhs_ptr7;
__vector_quad accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7;
MICRO_MMA_SRC_PTR
MICRO_MMA_DST_PTR
Index k = 0;
for(; k + PEEL_MMA <= depth; k+= PEEL_MMA)
{
EIGEN_POWER_PREFETCH(rhs_ptr);
MICRO_MMA_PREFETCH
MICRO_MMA_ONE_PEEL
}
for(; k < depth; k++)
{
MICRO_MMA_ONE
}
MICRO_MMA_STORE
row += unroll_factor*accCols;
asm("#gemm_MMA end");
}
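// Loop structure in numbers (illustrative): with PEEL_MMA == 7 and, say,
// depth == 20, the peeled loop runs twice (k = 0 and k = 7, covering 14
// depth steps and prefetching the rhs/lhs streams once per pass), and the
// tail loop handles the remaining 6 steps one at a time. The partial
// products stay in the MMA accumulators throughout; results reach memory
// only in MICRO_MMA_STORE.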
template<typename Scalar, typename Index, typename Packet, typename RhsPacket, typename DataMapper, const Index accRows, const Index accCols>
void gemmMMA(const DataMapper& res, const Scalar* blockA, const Scalar* blockB, Index rows, Index depth, Index cols, Scalar alpha, Index strideA, Index strideB, Index offsetA, Index offsetB)
{
const Index remaining_rows = rows % accCols;
const Index remaining_cols = cols % accRows;
if( strideA == -1 ) strideA = depth;
if( strideB == -1 ) strideB = depth;
const Packet pAlpha = pset1<Packet>(alpha);
const Packet pMask = bmask<Packet>((const int)(remaining_rows));
Index col = 0;
for(; col + accRows <= cols; col += accRows)
{
const Scalar* rhs_base = blockB + col*strideB + accRows*offsetB;
const Scalar* lhs_base = blockA;
Index row = 0;
#define MAX_MMA_UNROLL 7
while(row + MAX_MMA_UNROLL*accCols <= rows) {
gemm_unrolled_MMA_iteration<MAX_MMA_UNROLL, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
}
switch( (rows-row)/accCols ) {
#if MAX_MMA_UNROLL > 7
case 7:
gemm_unrolled_MMA_iteration<7, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 6
case 6:
gemm_unrolled_MMA_iteration<6, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 5
case 5:
gemm_unrolled_MMA_iteration<5, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 4
case 4:
gemm_unrolled_MMA_iteration<4, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 3
case 3:
gemm_unrolled_MMA_iteration<3, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 2
case 2:
gemm_unrolled_MMA_iteration<2, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 1
case 1:
gemm_unrolled_MMA_iteration<1, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha);
break;
#endif
default:
break;
}
#undef MAX_MMA_UNROLL
if(remaining_rows > 0)
{
gemm_extra_row<Scalar, Packet, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, rows, cols, remaining_rows, pAlpha, pMask);
}
}
if(remaining_cols > 0)
{
const Scalar* rhs_base = blockB + col*strideB + remaining_cols*offsetB;
const Scalar* lhs_base = blockA;
for(; col < cols; col++)
{
Index row = 0;
gemm_unrolled_col<Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, rows, col, remaining_cols, pAlpha);
if (remaining_rows > 0)
{
gemm_extra_col<Scalar, Packet, DataMapper, Index, accRows>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_rows, remaining_cols, pAlpha);
}
rhs_base++;
}
}
}
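// Worked example of the row dispatch above (illustrative numbers): with
// accCols == 4, MAX_MMA_UNROLL == 7 and rows == 37, the while loop consumes
// 28 rows (7*4), the switch sees (37-28)/4 == 2 and handles 8 more rows via
// the unroll_factor == 2 instantiation, and the last remaining_rows == 1
// row falls through to gemm_extra_row.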
#define accColsC (accCols / 2)
#define advanceRows ((LhsIsReal) ? 1 : 2)
#define advanceCols ((RhsIsReal) ? 1 : 2)
// PEEL_COMPLEX_MMA loop factor.
#define PEEL_COMPLEX_MMA 7
#define MICRO_COMPLEX_MMA_UNROLL(func) \
func(0) func(1) func(2) func(3) func(4)
#define MICRO_COMPLEX_MMA_LOAD_ONE(iter) \
if (unroll_factor > iter) { \
lhsV##iter = ploadLhs<Scalar, Packet>(lhs_ptr_real##iter); \
lhs_ptr_real##iter += accCols; \
if(!LhsIsReal) { \
lhsVi##iter = ploadLhs<Scalar, Packet>(lhs_ptr_imag##iter); \
lhs_ptr_imag##iter += accCols; \
} else { \
EIGEN_UNUSED_VARIABLE(lhsVi##iter); \
} \
} else { \
EIGEN_UNUSED_VARIABLE(lhsV##iter); \
EIGEN_UNUSED_VARIABLE(lhsVi##iter); \
}
#define MICRO_COMPLEX_MMA_WORK_ONE(iter, type, peel) \
if (unroll_factor > iter) { \
pgercMMA<Scalar, Packet, type, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(&accReal##iter, &accImag##iter, lhsV##iter, lhsVi##iter, rhsV##peel, rhsVi##peel); \
}
#define MICRO_COMPLEX_MMA_TYPE_PEEL(func, func2, type, peel) \
if (PEEL_COMPLEX_MMA > peel) { \
Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4; \
Packet lhsVi0, lhsVi1, lhsVi2, lhsVi3, lhsVi4; \
ploadRhsMMA<Scalar, type>(rhs_ptr_real + (accRows * peel), rhsV##peel); \
if(!RhsIsReal) { \
ploadRhsMMA<Scalar, type>(rhs_ptr_imag + (accRows * peel), rhsVi##peel); \
} else { \
EIGEN_UNUSED_VARIABLE(rhsVi##peel); \
} \
MICRO_COMPLEX_MMA_UNROLL(func2); \
func(0,type,peel) func(1,type,peel) func(2,type,peel) func(3,type,peel) func(4,type,peel) \
} else { \
EIGEN_UNUSED_VARIABLE(rhsV##peel); \
EIGEN_UNUSED_VARIABLE(rhsVi##peel); \
}
#define MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(func, func2, type) \
type rhsV0, rhsV1, rhsV2, rhsV3, rhsV4, rhsV5, rhsV6, rhsV7, rhsV8, rhsV9; \
type rhsVi0, rhsVi1, rhsVi2, rhsVi3, rhsVi4, rhsVi5, rhsVi6, rhsVi7, rhsVi8, rhsVi9; \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,0); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,1); \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,2); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,3); \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,4); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,5); \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,6); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,7); \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,8); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,9);
#define MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(func, func2, type) \
type rhsV0, rhsVi0; \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,0);
#define MICRO_COMPLEX_MMA_ONE_PEEL \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, RhsPacket); \
} else { \
MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, __vector_pair); \
} \
rhs_ptr_real += (accRows * PEEL_COMPLEX_MMA); \
if(!RhsIsReal) rhs_ptr_imag += (accRows * PEEL_COMPLEX_MMA);
#define MICRO_COMPLEX_MMA_ONE \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, RhsPacket); \
} else { \
MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, __vector_pair); \
} \
rhs_ptr_real += accRows; \
if(!RhsIsReal) rhs_ptr_imag += accRows;
#define MICRO_COMPLEX_MMA_DST_PTR_ONE(iter) \
if (unroll_factor > iter) { \
bsetzeroMMA<Scalar, Packet>(&accReal##iter); \
bsetzeroMMA<Scalar, Packet>(&accImag##iter); \
} else { \
EIGEN_UNUSED_VARIABLE(accReal##iter); \
EIGEN_UNUSED_VARIABLE(accImag##iter); \
}
#define MICRO_COMPLEX_MMA_DST_PTR MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_DST_PTR_ONE)
#define MICRO_COMPLEX_MMA_SRC_PTR_ONE(iter) \
if (unroll_factor > iter) { \
lhs_ptr_real##iter = lhs_base + ( ((advanceRows*row)/accCols) + iter*advanceRows )*strideA*accCols + accCols*offsetA; \
if(!LhsIsReal) { \
lhs_ptr_imag##iter = lhs_ptr_real##iter + accCols*strideA; \
} else { \
EIGEN_UNUSED_VARIABLE(lhs_ptr_imag##iter); \
} \
} else { \
EIGEN_UNUSED_VARIABLE(lhs_ptr_real##iter); \
EIGEN_UNUSED_VARIABLE(lhs_ptr_imag##iter); \
}
#define MICRO_COMPLEX_MMA_SRC_PTR MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_SRC_PTR_ONE)
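// Packing assumption made explicit: for a complex lhs, the packed block
// stores a panel of real parts followed, accCols*strideA scalars later, by
// the matching panel of imaginary parts, which is why
// lhs_ptr_imag = lhs_ptr_real + accCols*strideA above.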
#define MICRO_COMPLEX_MMA_PREFETCH_ONE(iter) \
if (unroll_factor > iter) { \
EIGEN_POWER_PREFETCH(lhs_ptr_real##iter); \
if(!LhsIsReal) { \
EIGEN_POWER_PREFETCH(lhs_ptr_imag##iter); \
} \
}
#define MICRO_COMPLEX_MMA_PREFETCH MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_PREFETCH_ONE)
#define MICRO_COMPLEX_MMA_STORE_ONE(iter) \
if (unroll_factor > iter) { \
storeComplexAccumulator<DataMapper, Index, Packet, Packetc, accColsC, 0>(row + iter*accCols, col, res, pAlphaReal, pAlphaImag, &accReal##iter, &accImag##iter); \
}
#define MICRO_COMPLEX_MMA_STORE MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_STORE_ONE)
template<int unroll_factor, typename Scalar, typename Packet, typename Packetc, typename RhsPacket, typename DataMapper, typename Index, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void gemm_complex_unrolled_MMA_iteration(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index& row,
Index col,
const Packet& pAlphaReal,
const Packet& pAlphaImag)
{
asm("#gemm_complex_MMA begin");
const Scalar* rhs_ptr_real = rhs_base;
const Scalar* rhs_ptr_imag;
if(!RhsIsReal) {
rhs_ptr_imag = rhs_base + accRows*strideB;
} else {
EIGEN_UNUSED_VARIABLE(rhs_ptr_imag);
}
const Scalar* lhs_ptr_real0, * lhs_ptr_imag0, * lhs_ptr_real1, * lhs_ptr_imag1;
const Scalar* lhs_ptr_real2, * lhs_ptr_imag2, * lhs_ptr_real3, * lhs_ptr_imag3;
const Scalar* lhs_ptr_real4, * lhs_ptr_imag4;
__vector_quad accReal0, accImag0, accReal1, accImag1, accReal2, accImag2, accReal3, accImag3, accReal4, accImag4;
MICRO_COMPLEX_MMA_SRC_PTR
MICRO_COMPLEX_MMA_DST_PTR
Index k = 0;
for(; k + PEEL_COMPLEX_MMA <= depth; k+= PEEL_COMPLEX_MMA)
{
EIGEN_POWER_PREFETCH(rhs_ptr_real);
if(!RhsIsReal) {
EIGEN_POWER_PREFETCH(rhs_ptr_imag);
}
MICRO_COMPLEX_MMA_PREFETCH
MICRO_COMPLEX_MMA_ONE_PEEL
}
for(; k < depth; k++)
{
MICRO_COMPLEX_MMA_ONE
}
MICRO_COMPLEX_MMA_STORE
row += unroll_factor*accCols;
asm("#gemm_complex_MMA end");
}
template<typename LhsScalar, typename RhsScalar, typename Scalarc, typename Scalar, typename Index, typename Packet, typename Packetc, typename RhsPacket, typename DataMapper, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
void gemm_complexMMA(const DataMapper& res, const LhsScalar* blockAc, const RhsScalar* blockBc, Index rows, Index depth, Index cols, Scalarc alpha, Index strideA, Index strideB, Index offsetA, Index offsetB)
{
const Index remaining_rows = rows % accCols;
const Index remaining_cols = cols % accRows;
if( strideA == -1 ) strideA = depth;
if( strideB == -1 ) strideB = depth;
const Packet pAlphaReal = pset1<Packet>(alpha.real());
const Packet pAlphaImag = pset1<Packet>(alpha.imag());
const Packet pMask = bmask<Packet>((const int)(remaining_rows));
const Scalar* blockA = (Scalar *) blockAc;
const Scalar* blockB = (Scalar *) blockBc;
Index col = 0;
for(; col + accRows <= cols; col += accRows)
{
const Scalar* rhs_base = blockB + advanceCols*col*strideB + accRows*offsetB;
const Scalar* lhs_base = blockA;
Index row = 0;
#define MAX_COMPLEX_MMA_UNROLL 4
while(row + MAX_COMPLEX_MMA_UNROLL*accCols <= rows) {
gemm_complex_unrolled_MMA_iteration<MAX_COMPLEX_MMA_UNROLL, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag);
}
switch( (rows-row)/accCols ) {
#if MAX_COMPLEX_MMA_UNROLL > 4
case 4:
gemm_complex_unrolled_MMA_iteration<4, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag);
break;
#endif
#if MAX_COMPLEX_MMA_UNROLL > 3
case 3:
gemm_complex_unrolled_MMA_iteration<3, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag);
break;
#endif
#if MAX_COMPLEX_MMA_UNROLL > 2
case 2:
gemm_complex_unrolled_MMA_iteration<2, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag);
break;
#endif
#if MAX_COMPLEX_MMA_UNROLL > 1
case 1:
gemm_complex_unrolled_MMA_iteration<1, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag);
break;
#endif
default:
break;
}
#undef MAX_COMPLEX_MMA_UNROLL
if(remaining_rows > 0)
{
gemm_complex_extra_row<Scalar, Packet, Packetc, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, rows, cols, remaining_rows, pAlphaReal, pAlphaImag, pMask);
}
}
if(remaining_cols > 0)
{
const Scalar* rhs_base = blockB + advanceCols*col*strideB + remaining_cols*offsetB;
const Scalar* lhs_base = blockA;
for(; col < cols; col++)
{
Index row = 0;
gemm_complex_unrolled_col<Scalar, Packet, Packetc, DataMapper, Index, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, rows, col, remaining_cols, pAlphaReal, pAlphaImag);
if (remaining_rows > 0)
{
gemm_complex_extra_col<Scalar, Packet, Packetc, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, remaining_rows, remaining_cols, pAlphaReal, pAlphaImag);
}
rhs_base++;
}
}
}
#undef accColsC
#undef advanceRows
#undef advanceCols
#pragma GCC reset_options
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H

File diff suppressed because it is too large.

@@ -2,6 +2,7 @@
// for linear algebra.
//
// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
// Copyright (C) 2021 C. Antonio Sanchez <cantonios@google.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
@@ -11,93 +12,248 @@
#define EIGEN_COMPLEX_CUDA_H
// clang-format off
// Many std::complex methods such as operator+, operator-, operator* and
// operator/ are not constexpr. Due to this, GCC and older versions of clang do
// not treat them as device functions and thus Eigen functors making use of
// these operators fail to compile. Here, we manually specialize these
// operators and functors for complex types when building for CUDA to enable
// their use on-device.
#if defined(EIGEN_CUDACC) && defined(EIGEN_GPU_COMPILE_PHASE)
// ICC already specializes std::complex<float> and std::complex<double>
// operators, preventing us from making them device functions here.
// This will lead to silent runtime errors if the operators are used on device.
//
// To allow std::complex operator use on device, define _OVERRIDE_COMPLEX_SPECIALIZATION_
// prior to first inclusion of <complex>. This prevents ICC from adding
// its own specializations, so our custom ones below can be used instead.
#if !(defined(EIGEN_COMP_ICC) && defined(_USE_COMPLEX_SPECIALIZATION_))
// Import Eigen's internal operator specializations.
#define EIGEN_USING_STD_COMPLEX_OPERATORS \
using Eigen::complex_operator_detail::operator+; \
using Eigen::complex_operator_detail::operator-; \
using Eigen::complex_operator_detail::operator*; \
using Eigen::complex_operator_detail::operator/; \
using Eigen::complex_operator_detail::operator+=; \
using Eigen::complex_operator_detail::operator-=; \
using Eigen::complex_operator_detail::operator*=; \
using Eigen::complex_operator_detail::operator/=; \
using Eigen::complex_operator_detail::operator==; \
using Eigen::complex_operator_detail::operator!=;
namespace Eigen {
namespace internal {
// Specialized std::complex overloads.
namespace complex_operator_detail {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_multiply(const std::complex<T>& a, const std::complex<T>& b) {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
return std::complex<T>(
a_real * b_real - a_imag * b_imag,
a_imag * b_real + a_real * b_imag);
}
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide_fast(const std::complex<T>& a, const std::complex<T>& b) {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
const T norm = T(1) / (b_real * b_real + b_imag * b_imag);
return std::complex<T>((a_real * b_real + a_imag * b_imag) * norm,
(a_imag * b_real - a_real * b_imag) * norm);
}
// Sum
template<typename T> struct scalar_sum_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
typedef typename std::complex<T> result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_sum_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
return std::complex<T>(numext::real(a) + numext::real(b),
numext::imag(a) + numext::imag(b));
}
};
template<typename T> struct scalar_sum_op<std::complex<T>, std::complex<T> > : scalar_sum_op<const std::complex<T>, const std::complex<T> > {};
// Difference
template<typename T> struct scalar_difference_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
typedef typename std::complex<T> result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_difference_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
return std::complex<T>(numext::real(a) - numext::real(b),
numext::imag(a) - numext::imag(b));
}
};
template<typename T> struct scalar_difference_op<std::complex<T>, std::complex<T> > : scalar_difference_op<const std::complex<T>, const std::complex<T> > {};
// Product
template<typename T> struct scalar_product_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
enum {
Vectorizable = packet_traits<std::complex<T> >::HasMul
};
typedef typename std::complex<T> result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_product_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
return std::complex<T>(a_real * b_real - a_imag * b_imag,
a_real * b_imag + a_imag * b_real);
}
};
template<typename T> struct scalar_product_op<std::complex<T>, std::complex<T> > : scalar_product_op<const std::complex<T>, const std::complex<T> > {};
// Quotient
template<typename T> struct scalar_quotient_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
enum {
Vectorizable = packet_traits<std::complex<T> >::HasDiv
};
typedef typename std::complex<T> result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_quotient_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
const T norm = T(1) / (b_real * b_real + b_imag * b_imag);
return std::complex<T>((a_real * b_real + a_imag * b_imag) * norm,
(a_imag * b_real - a_real * b_imag) * norm);
}
};
template<typename T> struct scalar_quotient_op<std::complex<T>, std::complex<T> > : scalar_quotient_op<const std::complex<T>, const std::complex<T> > {};
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide_stable(const std::complex<T>& a, const std::complex<T>& b) {
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
// Guard against over/under-flow.
const T scale = T(1) / (numext::abs(b_real) + numext::abs(b_imag));
const T a_real_scaled = numext::real(a) * scale;
const T a_imag_scaled = numext::imag(a) * scale;
const T b_real_scaled = b_real * scale;
const T b_imag_scaled = b_imag * scale;
const T b_norm2_scaled = b_real_scaled * b_real_scaled + b_imag_scaled * b_imag_scaled;
return std::complex<T>(
(a_real_scaled * b_real_scaled + a_imag_scaled * b_imag_scaled) / b_norm2_scaled,
(a_imag_scaled * b_real_scaled - a_real_scaled * b_imag_scaled) / b_norm2_scaled);
}
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide(const std::complex<T>& a, const std::complex<T>& b) {
#if EIGEN_FAST_MATH
return complex_divide_fast(a, b);
#else
return complex_divide_stable(a, b);
#endif
}
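// Why the stable variant matters (illustrative, for T == float): with
// b = 1e30 + 1e30i, complex_divide_fast computes b_real*b_real == 1e60,
// which overflows to inf (FLT_MAX is about 3.4e38) and poisons the result.
// complex_divide_stable instead scales both operands by
// scale = 1/(|b_real| + |b_imag|) = 5e-31 first, so the scaled denominator
// norm is 0.5 and the division stays comfortably in range.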
} // end namespace internal
// NOTE: We cannot specialize compound assignment operators with Scalar T,
// (i.e. operator@=(const T&), for @=+,-,*,/)
// since they are already specialized for float/double/long double within
// the standard <complex> header. We also do not specialize the stream
// operators.
#define EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(T) \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a) { return a; } \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a) { \
return std::complex<T>(-numext::real(a), -numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a, const std::complex<T>& b) { \
return std::complex<T>(numext::real(a) + numext::real(b), numext::imag(a) + numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) + b, numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a + numext::real(b), numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a, const std::complex<T>& b) { \
return std::complex<T>(numext::real(a) - numext::real(b), numext::imag(a) - numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) - b, numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a - numext::real(b), -numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const std::complex<T>& a, const std::complex<T>& b) { \
return complex_multiply(a, b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) * b, numext::imag(a) * b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a * numext::real(b), a * numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const std::complex<T>& a, const std::complex<T>& b) { \
return complex_divide(a, b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) / b, numext::imag(a) / b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const T& a, const std::complex<T>& b) { \
return complex_divide(std::complex<T>(a, 0), b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator+=(std::complex<T>& a, const std::complex<T>& b) { \
numext::real_ref(a) += numext::real(b); \
numext::imag_ref(a) += numext::imag(b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator-=(std::complex<T>& a, const std::complex<T>& b) { \
numext::real_ref(a) -= numext::real(b); \
numext::imag_ref(a) -= numext::imag(b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator*=(std::complex<T>& a, const std::complex<T>& b) { \
a = complex_multiply(a, b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator/=(std::complex<T>& a, const std::complex<T>& b) { \
a = complex_divide(a, b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const std::complex<T>& a, const std::complex<T>& b) { \
return numext::real(a) == numext::real(b) && numext::imag(a) == numext::imag(b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const std::complex<T>& a, const T& b) { \
return numext::real(a) == b && numext::imag(a) == 0; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const T& a, const std::complex<T>& b) { \
return a == numext::real(b) && 0 == numext::imag(b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const std::complex<T>& a, const std::complex<T>& b) { \
return !(a == b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const std::complex<T>& a, const T& b) { \
return !(a == b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const T& a, const std::complex<T>& b) { \
return !(a == b); \
}
// Do not specialize for long double, since that reduces to double on device.
EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(float)
EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(double)
#undef EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS
} // namespace complex_operator_detail
EIGEN_USING_STD_COMPLEX_OPERATORS
namespace numext {
EIGEN_USING_STD_COMPLEX_OPERATORS
} // namespace numext
namespace internal {
EIGEN_USING_STD_COMPLEX_OPERATORS
} // namespace internal
} // namespace Eigen
#endif // !(EIGEN_COMP_ICC && _USE_COMPLEX_SPECIALIZATION_)
#endif // EIGEN_CUDACC && EIGEN_GPU_COMPILE_PHASE
#endif // EIGEN_COMPLEX_CUDA_H


@@ -0,0 +1,707 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef EIGEN_BFLOAT16_H
#define EIGEN_BFLOAT16_H
#define BF16_PACKET_FUNCTION(PACKET_F, PACKET_BF16, METHOD) \
template <> \
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED \
PACKET_BF16 METHOD<PACKET_BF16>(const PACKET_BF16& _x) { \
return F32ToBf16(METHOD<PACKET_F>(Bf16ToF32(_x))); \
}
namespace Eigen {
struct bfloat16;
namespace bfloat16_impl {
// Make our own __bfloat16_raw definition.
struct __bfloat16_raw {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw() : value(0) {}
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw(unsigned short raw) : value(raw) {}
unsigned short value;
};
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw raw_uint16_to_bfloat16(unsigned short value);
template <bool AssumeArgumentIsNormalOrInfinityOrZero>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne(float ff);
// Forward declarations of template specializations, to avoid Visual C++ 2019 errors, saying:
// > error C2908: explicit specialization; 'float_to_bfloat16_rtne' has already been instantiated
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<false>(float ff);
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<true>(float ff);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float bfloat16_to_float(__bfloat16_raw h);
struct bfloat16_base : public __bfloat16_raw {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16_base() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16_base(const __bfloat16_raw& h) : __bfloat16_raw(h) {}
};
} // namespace bfloat16_impl
// Class definition.
struct bfloat16 : public bfloat16_impl::bfloat16_base {
typedef bfloat16_impl::__bfloat16_raw __bfloat16_raw;
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const __bfloat16_raw& h) : bfloat16_impl::bfloat16_base(h) {}
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(bool b)
: bfloat16_impl::bfloat16_base(bfloat16_impl::raw_uint16_to_bfloat16(b ? 0x3f80 : 0)) {}
template<class T>
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(T val)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<internal::is_integral<T>::value>(static_cast<float>(val))) {}
explicit EIGEN_DEVICE_FUNC bfloat16(float f)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<false>(f)) {}
// Following the convention of numpy, conversion from a complex type to
// float discards the imaginary part.
template<typename RealScalar>
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const std::complex<RealScalar>& val)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<false>(static_cast<float>(val.real()))) {}
EIGEN_DEVICE_FUNC operator float() const { // NOLINT: Allow implicit conversion to float, because it is lossless.
return bfloat16_impl::bfloat16_to_float(*this);
}
};
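// Layout note: bfloat16 is the upper 16 bits of an IEEE-754 binary32 value
// (1 sign bit, 8 exponent bits, 7 fraction bits). For example, 1.0f has the
// bit pattern 0x3F800000, so its bfloat16 encoding is 0x3F80; widening back
// to float (see bfloat16_to_float below) just places those 16 bits in the
// high half of a zeroed float.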
} // namespace Eigen
namespace std {
template<>
struct numeric_limits<Eigen::bfloat16> {
static const bool is_specialized = true;
static const bool is_signed = true;
static const bool is_integer = false;
static const bool is_exact = false;
static const bool has_infinity = true;
static const bool has_quiet_NaN = true;
static const bool has_signaling_NaN = true;
static const float_denorm_style has_denorm = std::denorm_absent;
static const bool has_denorm_loss = false;
static const std::float_round_style round_style = numeric_limits<float>::round_style;
static const bool is_iec559 = false;
static const bool is_bounded = true;
static const bool is_modulo = false;
static const int digits = 8;
static const int digits10 = 2;
static const int max_digits10 = 4;
static const int radix = 2;
static const int min_exponent = numeric_limits<float>::min_exponent;
static const int min_exponent10 = numeric_limits<float>::min_exponent10;
static const int max_exponent = numeric_limits<float>::max_exponent;
static const int max_exponent10 = numeric_limits<float>::max_exponent10;
static const bool traps = numeric_limits<float>::traps;
static const bool tinyness_before = numeric_limits<float>::tinyness_before;
static Eigen::bfloat16 (min)() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x0080); }
static Eigen::bfloat16 lowest() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0xff7f); }
static Eigen::bfloat16 (max)() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f7f); }
static Eigen::bfloat16 epsilon() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x3c00); }
static Eigen::bfloat16 round_error() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x3f00); } // 0.5
static Eigen::bfloat16 infinity() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f80); }
static Eigen::bfloat16 quiet_NaN() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7fc0); }
static Eigen::bfloat16 signaling_NaN() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f81); }
static Eigen::bfloat16 denorm_min() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x0001); }
};
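// Sanity check on the constants above: with digits == 8 (7 stored fraction
// bits plus the implicit leading one), epsilon is 2^(1-8) == 2^-7 ==
// 0.0078125, whose bfloat16 bit pattern is exactly the 0x3c00 returned by
// epsilon().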
// If std::numeric_limits<T> is specialized, should also specialize
// std::numeric_limits<const T>, std::numeric_limits<volatile T>, and
// std::numeric_limits<const volatile T>
// https://stackoverflow.com/a/16519653/
template<>
struct numeric_limits<const Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
template<>
struct numeric_limits<volatile Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
template<>
struct numeric_limits<const volatile Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
} // namespace std
namespace Eigen {
namespace bfloat16_impl {
// We need to distinguish clang as the CUDA compiler from clang as the host compiler,
// invoked by NVCC (e.g. on MacOS). The former needs to see both host and device implementation
// of the functions, while the latter can only deal with one of them.
#if !defined(EIGEN_HAS_NATIVE_BF16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for bfloat16 floats
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
// We need to provide emulated *host-side* BF16 operators for clang.
#pragma push_macro("EIGEN_DEVICE_FUNC")
#undef EIGEN_DEVICE_FUNC
#if defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_NATIVE_BF16)
#define EIGEN_DEVICE_FUNC __host__
#else // both host and device need emulated ops.
#define EIGEN_DEVICE_FUNC __host__ __device__
#endif
#endif
// Definitions for CPUs, mostly working through conversion
// to/from fp32.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) + float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const bfloat16& a, const int& b) {
return bfloat16(float(a) + static_cast<float>(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const int& a, const bfloat16& b) {
return bfloat16(static_cast<float>(a) + float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator * (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) * float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator - (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) - float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator / (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) / float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator - (const bfloat16& a) {
bfloat16 result;
result.value = a.value ^ 0x8000;
return result;
}
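// Note the asymmetry above: binary +, -, * and / round-trip through float32,
// while unary negation just flips the sign bit in place; for example
// 0x3F80 (1.0) ^ 0x8000 == 0xBF80 (-1.0), with no rounding involved.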
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator += (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) + float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator *= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) * float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator -= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) - float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator /= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) / float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator++(bfloat16& a) {
a += bfloat16(1);
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator--(bfloat16& a) {
a -= bfloat16(1);
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator++(bfloat16& a, int) {
bfloat16 original_value = a;
++a;
return original_value;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator--(bfloat16& a, int) {
bfloat16 original_value = a;
--a;
return original_value;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator == (const bfloat16& a, const bfloat16& b) {
return numext::equal_strict(float(a),float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator != (const bfloat16& a, const bfloat16& b) {
return numext::not_equal_strict(float(a), float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator < (const bfloat16& a, const bfloat16& b) {
return float(a) < float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator <= (const bfloat16& a, const bfloat16& b) {
return float(a) <= float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator > (const bfloat16& a, const bfloat16& b) {
return float(a) > float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const bfloat16& a, const bfloat16& b) {
return float(a) >= float(b);
}
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
#pragma pop_macro("EIGEN_DEVICE_FUNC")
#endif
#endif // Emulate support for bfloat16 floats
// Division by an index. Do it in full float precision to avoid accuracy
// issues in converting the denominator to bfloat16.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator / (const bfloat16& a, Index b) {
return bfloat16(static_cast<float>(a) / static_cast<float>(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw truncate_to_bfloat16(const float v) {
__bfloat16_raw output;
if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(v)) {
output.value = std::signbit(v) ? 0xFFC0: 0x7FC0;
return output;
} else if (std::fabs(v) < std::numeric_limits<float>::min EIGEN_NOT_A_MACRO()) {
// Flush denormal to +/- 0.
output.value = std::signbit(v) ? 0x8000 : 0;
return output;
}
const uint16_t* p = reinterpret_cast<const uint16_t*>(&v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
output.value = p[0];
#else
output.value = p[1];
#endif
return output;
}
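// Truncation vs. round-to-nearest-even, in numbers: the float 0x3F818000
// (1.01171875) is an exact tie with L == 1. truncate_to_bfloat16 keeps
// 0x3F81 (1.0078125), whereas float_to_bfloat16_rtne below rounds up to the
// even pattern 0x3F82 (1.015625).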
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw raw_uint16_to_bfloat16(numext::uint16_t value) {
return __bfloat16_raw(value);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR numext::uint16_t raw_bfloat16_as_uint16(const __bfloat16_raw& bf) {
return bf.value;
}
// float_to_bfloat16_rtne template specialization that does not make any
// assumption about the value of its function argument (ff).
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<false>(float ff) {
#if (defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_HIP_BF16))
// Nothing to do here
#else
__bfloat16_raw output;
if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(ff)) {
// If the value is a NaN, squash it to a qNaN with the msb of the fraction
// set; this makes sure that after truncation we don't end up with an inf.
//
// qNaN magic: All exponent bits set + most significant bit of fraction
// set.
output.value = std::signbit(ff) ? 0xFFC0: 0x7FC0;
} else if (std::fabs(ff) < std::numeric_limits<float>::min EIGEN_NOT_A_MACRO()) {
// Flush denormal to +/- 0.0
output.value = std::signbit(ff) ? 0x8000 : 0;
} else {
// Fast rounding algorithm that rounds a half value to nearest even. This
// reduces expected error when we convert a large number of floats. Here
// is how it works:
//
// Definitions:
// To convert a float 32 to bfloat16, a float 32 can be viewed as 32 bits
// with the following tags:
//
// Sign | Exp (8 bits) | Frac (23 bits)
// S EEEEEEEE FFFFFFLRTTTTTTTTTTTTTTT
//
// S: Sign bit.
// E: Exponent bits.
// F: First 6 bits of fraction.
// L: Least significant bit of resulting bfloat16 if we truncate away the
// rest of the float32. This is also the 7th bit of fraction
// R: Rounding bit, 8th bit of fraction.
// T: Sticky bits, rest of fraction, 15 bits.
//
// To round half to nearest even, there are three cases where we want to
// round down (simply truncate away the rounding bit and sticky bits) and
// two cases where we want to round up (truncate, then add one to the
// result).
//
// The fast converting algorithm simply adds lsb (L) to 0x7fff (15 bits of
// 1s) as the rounding bias, adds the rounding bias to the input, then
// truncates the last 16 bits away.
//
// To understand how it works, we can analyze this algorithm case by case:
//
// 1. L = 0, R = 0:
// Expect: round down, this is less than half value.
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding the rounding bias to the input may or may not create a carry,
// depending on whether any of the T bits is set to 1.
// - R may be set to 1 if there is such a carry.
// - L remains 0.
// - Note that this case also handles Inf and -Inf, where all fraction
// bits, including L, R and Ts are all 0. The output remains Inf after
// this algorithm.
//
// 2. L = 1, R = 0:
// Expect: round down, this is less than half value.
//
// Algorithm:
// - Rounding bias: 0x7fff + 1 = 0x8000
// - Adding rounding bias to input doesn't change sticky bits but
// adds 1 to rounding bit.
// - L remains 1.
//
// 3. L = 0, R = 1, all of T are 0:
// Expect: round down, this is exactly at half, the result is already
// even (L=0).
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding rounding bias to input sets all sticky bits to 1, but
// doesn't create a carry.
// - R remains 1.
// - L remains 0.
//
// 4. L = 1, R = 1:
// Expect: round up, this is exactly at half, the result needs to be
// round to the next even number.
//
// Algorithm:
// - Rounding bias: 0x7fff + 1 = 0x8000
// - Adding rounding bias to input doesn't change sticky bits, but
// creates a carry from rounding bit.
// - The carry sets L to 0, creates another carry bit and propagate
// forward to F bits.
// - If all the F bits are 1, a carry then propagates to the exponent
// bits, which then creates the minimum value with the next exponent
// value. Note that we won't have the case where exponents are all 1,
// since that's either a NaN (handled in the other if condition) or inf
// (handled in case 1).
//
// 5. L = 0, R = 1, any of T is 1:
// Expect: round up, this is greater than half.
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding rounding bias to input creates a carry from sticky bits,
// sets rounding bit to 0, then create another carry.
// - The second carry sets L to 1.
//
// Examples:
//
// Exact half value that is already even:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1000000000000000
//
// This falls into case 3. We truncate the rest of 16 bits and no
// carry is created into F and L:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
//
// Exact half value, round to next even number:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1000000000000000
//
// This falls into case 4. We create a carry from R and T,
// which then propagates into L and F:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
//
//
// Max denormal value round to min normal value:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1111111111111111
//
// This falls into case 4. We create a carry from R and T,
// propagate into L and F, which then propagates into exponent
// bits:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
//
// Max normal value round to Inf:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1111111111111111
//
// This falls into case 4. We create a carry from R and T,
// propagate into L and F, which then propagates into exponent
// bits:
//
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
// At this point, ff must be either a normal float, or +/-infinity.
output = float_to_bfloat16_rtne<true>(ff);
}
return output;
#endif
}
// float_to_bfloat16_rtne template specialization that assumes that its function
// argument (ff) is either a normal floating point number, or +/-infinity, or
// zero. Used to improve the runtime performance of conversion from an integer
// type to bfloat16.
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<true>(float ff) {
#if (defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_HIP_BF16))
// Nothing to do here
#else
numext::uint32_t input = numext::bit_cast<numext::uint32_t>(ff);
__bfloat16_raw output;
// Least significant bit of resulting bfloat.
numext::uint32_t lsb = (input >> 16) & 1;
numext::uint32_t rounding_bias = 0x7fff + lsb;
input += rounding_bias;
output.value = static_cast<numext::uint16_t>(input >> 16);
return output;
#endif
}
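// The bias trick in numbers: for input 0x3F818000, lsb == 1, so
// rounding_bias == 0x8000 and input + bias == 0x3F820000; the top 16 bits
// give the rounded-up result 0x3F82. For the already-even tie 0x3F808000,
// lsb == 0, bias == 0x7fff, and (0x3F808000 + 0x7fff) >> 16 == 0x3F80, so
// the value correctly stays at the even pattern.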
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float bfloat16_to_float(__bfloat16_raw h) {
float result = 0;
unsigned short* q = reinterpret_cast<unsigned short*>(&result);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
q[0] = h.value;
#else
q[1] = h.value;
#endif
return result;
}
// --- standard functions ---
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isinf)(const bfloat16& a) {
EIGEN_USING_STD(isinf);
return (isinf)(float(a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isnan)(const bfloat16& a) {
EIGEN_USING_STD(isnan);
return (isnan)(float(a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isfinite)(const bfloat16& a) {
return !(isinf EIGEN_NOT_A_MACRO (a)) && !(isnan EIGEN_NOT_A_MACRO (a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 abs(const bfloat16& a) {
bfloat16 result;
result.value = a.value & 0x7FFF;
return result;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 exp(const bfloat16& a) {
return bfloat16(::expf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 expm1(const bfloat16& a) {
return bfloat16(numext::expm1(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log(const bfloat16& a) {
return bfloat16(::logf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log1p(const bfloat16& a) {
return bfloat16(numext::log1p(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log10(const bfloat16& a) {
return bfloat16(::log10f(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log2(const bfloat16& a) {
return bfloat16(static_cast<float>(EIGEN_LOG2E) * ::logf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sqrt(const bfloat16& a) {
return bfloat16(::sqrtf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 pow(const bfloat16& a, const bfloat16& b) {
return bfloat16(::powf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sin(const bfloat16& a) {
return bfloat16(::sinf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 cos(const bfloat16& a) {
return bfloat16(::cosf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 tan(const bfloat16& a) {
return bfloat16(::tanf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 asin(const bfloat16& a) {
return bfloat16(::asinf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 acos(const bfloat16& a) {
return bfloat16(::acosf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 atan(const bfloat16& a) {
return bfloat16(::atanf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sinh(const bfloat16& a) {
return bfloat16(::sinhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 cosh(const bfloat16& a) {
return bfloat16(::coshf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 tanh(const bfloat16& a) {
return bfloat16(::tanhf(float(a)));
}
#if EIGEN_HAS_CXX11_MATH
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 asinh(const bfloat16& a) {
return bfloat16(::asinhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 acosh(const bfloat16& a) {
return bfloat16(::acoshf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 atanh(const bfloat16& a) {
return bfloat16(::atanhf(float(a)));
}
#endif
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 floor(const bfloat16& a) {
return bfloat16(::floorf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 ceil(const bfloat16& a) {
return bfloat16(::ceilf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 rint(const bfloat16& a) {
return bfloat16(::rintf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 round(const bfloat16& a) {
return bfloat16(::roundf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmod(const bfloat16& a, const bfloat16& b) {
return bfloat16(::fmodf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 (min)(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return f2 < f1 ? b : a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 (max)(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return f1 < f2 ? b : a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmin(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return bfloat16(::fminf(f1, f2));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmax(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return bfloat16(::fmaxf(f1, f2));
}
#ifndef EIGEN_NO_IO
EIGEN_ALWAYS_INLINE std::ostream& operator << (std::ostream& os, const bfloat16& v) {
os << static_cast<float>(v);
return os;
}
#endif
} // namespace bfloat16_impl
namespace internal {
template<>
struct random_default_impl<bfloat16, false, false>
{
static inline bfloat16 run(const bfloat16& x, const bfloat16& y)
{
return x + (y-x) * bfloat16(float(std::rand()) / float(RAND_MAX));
}
static inline bfloat16 run()
{
return run(bfloat16(-1.f), bfloat16(1.f));
}
};
template<> struct is_arithmetic<bfloat16> { enum { value = true }; };
} // namespace internal
template<> struct NumTraits<Eigen::bfloat16>
: GenericNumTraits<Eigen::bfloat16>
{
enum {
IsSigned = true,
IsInteger = false,
IsComplex = false,
RequireInitialization = false
};
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 epsilon() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x3c00);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 dummy_precision() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x3D4D); // bfloat16(5e-2f);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 highest() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7F7F);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 lowest() {
return bfloat16_impl::raw_uint16_to_bfloat16(0xFF7F);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 infinity() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7f80);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 quiet_NaN() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7fc0);
}
};
} // namespace Eigen
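// A minimal host-side sanity check of the raw bit patterns above (a sketch,
// not part of the diff; assumes <Eigen/Core> is on the include path). For
// example, epsilon 0x3c00 decodes to 2^-7, matching bfloat16's 7-bit mantissa.
#include <Eigen/Core>
#include <cassert>
int main() {
  typedef Eigen::NumTraits<Eigen::bfloat16> T;
  assert(static_cast<float>(T::epsilon()) == 0.0078125f);  // 2^-7
  assert((Eigen::numext::isinf)(T::infinity()));
  assert((Eigen::numext::isnan)(T::quiet_NaN()));
  assert(static_cast<float>(T::lowest()) == -static_cast<float>(T::highest()));
  return 0;
}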
namespace Eigen {
namespace numext {
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isnan)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isnan)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isinf)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isinf)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isfinite)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isfinite)(h);
}
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bit_cast<Eigen::bfloat16, uint16_t>(const uint16_t& src) {
return Eigen::bfloat16(Eigen::bfloat16_impl::raw_uint16_to_bfloat16(src));
}
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC uint16_t bit_cast<uint16_t, Eigen::bfloat16>(const Eigen::bfloat16& src) {
return Eigen::bfloat16_impl::raw_bfloat16_as_uint16(src);
}
} // namespace numext
} // namespace Eigen
#if EIGEN_HAS_STD_HASH
namespace std {
template <>
struct hash<Eigen::bfloat16> {
EIGEN_STRONG_INLINE std::size_t operator()(const Eigen::bfloat16& a) const {
return static_cast<std::size_t>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(a));
}
};
} // namespace std
#endif
#endif // EIGEN_BFLOAT16_H
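// Usage sketch for the bit_cast and hash specializations above (assumes C++11
// and <Eigen/Core>): a bfloat16 round-trips losslessly through its uint16_t
// storage, which is also the value std::hash<Eigen::bfloat16> hashes.
#include <Eigen/Core>
#include <cassert>
#include <functional>
int main() {
  Eigen::bfloat16 x(1.5f);  // 1.5f is exactly representable: bits 0x3FC0
  Eigen::numext::uint16_t bits = Eigen::numext::bit_cast<Eigen::numext::uint16_t>(x);
  assert(bits == 0x3FC0);
  assert(static_cast<float>(Eigen::numext::bit_cast<Eigen::bfloat16>(bits)) == 1.5f);
  assert(std::hash<Eigen::bfloat16>()(x) == static_cast<std::size_t>(bits));
  return 0;
}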

File diff suppressed because it is too large.

View File

@@ -17,11 +17,31 @@ namespace internal {
// implemented in GenericPacketMathFunctions.h
// This is needed to workaround a circular dependency.
template<typename Packet> EIGEN_STRONG_INLINE Packet
pfrexp_float(const Packet& a, Packet& exponent);
/** \internal \returns a packet with constant coefficients \a a, e.g.: (a[N-1],...,a[0]) */
template<typename Packet, int N> EIGEN_DEVICE_FUNC inline Packet
pset(const typename unpacket_traits<Packet>::type (&a)[N] /* a */);
template<typename Packet> EIGEN_STRONG_INLINE Packet
pldexp_float(Packet a, Packet exponent);
/***************************************************************************
* Some generic implementations to be used by implementors
***************************************************************************/
/** Default implementation of pfrexp.
* It is expected to be called by implementers of template<> pfrexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC
Packet pfrexp_generic(const Packet& a, Packet& exponent);
// Extracts the biased exponent value from Packet p, and casts the results to
// a floating-point Packet type. Used by pfrexp_generic. Override this if
// there is no unpacket_traits<Packet>::integer_packet.
template<typename Packet> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC
Packet pfrexp_generic_get_biased_exponent(const Packet& p);
/** Default implementation of pldexp.
* It is expected to be called by implementers of template<> pldexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC
Packet pldexp_generic(const Packet& a, const Packet& exponent);
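// Illustrative scalar analogue (standard C++, not Eigen code): pfrexp/pldexp
// follow the std::frexp/std::ldexp contract, applied lane-wise to packets.
#include <cmath>
#include <cassert>
int main() {
  int e = 0;
  float m = std::frexp(24.0f, &e);    // 24 = 0.75 * 2^5, mantissa in [0.5, 1)
  assert(m == 0.75f && e == 5);
  assert(std::ldexp(m, e) == 24.0f);  // pldexp-style reconstruction
  return 0;
}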
/** \internal \returns log(x) for single precision float */
template <typename Packet>
@@ -29,6 +49,24 @@ EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog_float(const Packet _x);
/** \internal \returns log2(x) for single precision float */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog2_float(const Packet _x);
/** \internal \returns log(x) for double precision float */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog_double(const Packet _x);
/** \internal \returns log2(x) for double precision float */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog2_double(const Packet _x);
/** \internal \returns log(1 + x) */
template<typename Packet>
Packet generic_plog1p(const Packet& x);
@@ -61,8 +99,15 @@ EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet pcos_float(const Packet& x);
/** \internal \returns sqrt(x) for complex types */
template<typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet psqrt_complex(const Packet& a);
template <typename Packet, int N> struct ppolevl;
} // end namespace internal
} // end namespace Eigen

View File

@@ -36,12 +36,26 @@
#ifndef EIGEN_HALF_H
#define EIGEN_HALF_H
#if __cplusplus > 199711L
#define EIGEN_EXPLICIT_CAST(tgt_type) explicit operator tgt_type()
#else
#define EIGEN_EXPLICIT_CAST(tgt_type) operator tgt_type()
#include <sstream>
#if defined(EIGEN_HAS_GPU_FP16) || defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
// When compiling with GPU support, the "__half_raw" base class as well as
// some other routines are defined in the GPU compiler header files
// (cuda_fp16.h, hip_fp16.h), and they are not tagged constexpr.
// As a consequence, we get compile failures when compiling Eigen with
// GPU support. Hence the need to disable EIGEN_CONSTEXPR when building
// Eigen with GPU support
#pragma push_macro("EIGEN_CONSTEXPR")
#undef EIGEN_CONSTEXPR
#define EIGEN_CONSTEXPR
#endif
#define F16_PACKET_FUNCTION(PACKET_F, PACKET_F16, METHOD) \
template <> \
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_UNUSED \
PACKET_F16 METHOD<PACKET_F16>(const PACKET_F16& _x) { \
return float2half(METHOD<PACKET_F>(half2float(_x))); \
}
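// For illustration only (Packet4h is a hypothetical packet name):
// F16_PACKET_FUNCTION(Packet4f, Packet4h, psin) would define psin<Packet4h>
// by widening to float, applying psin<Packet4f>, and narrowing back to half.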
namespace Eigen {
@@ -49,43 +63,76 @@ struct half;
namespace half_impl {
#if !defined(EIGEN_HAS_GPU_FP16)
// We want to use the __half_raw struct from the HIP header file only during the device compile phase.
// This is required because of a quirk in the way TensorFlow GPU builds are done.
// When compiling TensorFlow source code with GPU support, files that
// * contain GPU kernels (i.e. *.cu.cc files) are compiled via hipcc
// * do not contain GPU kernels ( i.e. *.cc files) are compiled via gcc (typically)
//
// TensorFlow uses the Eigen::half type as its FP16 type, and there are functions that
// * are defined in a file that gets compiled via hipcc AND
// * have Eigen::half as a pass-by-value argument AND
// * are called in a file that gets compiled via gcc
//
// In the scenario described above the caller and callee will see different versions
// of the Eigen::half base class __half_raw, and they will be compiled by different compilers
//
// There appears to be an ABI mismatch between gcc and clang (which is called by hipcc) that results in
// the callee getting corrupted values for the Eigen::half argument.
//
// Making the host-side compile phase of hipcc use the same Eigen::half impl as the
// gcc compile resolves this error, and hence the following convoluted #if condition
#if !defined(EIGEN_HAS_GPU_FP16) || !defined(EIGEN_GPU_COMPILE_PHASE)
// Make our own __half_raw definition that is similar to CUDA's.
struct __half_raw {
EIGEN_DEVICE_FUNC __half_raw() : x(0) {}
explicit EIGEN_DEVICE_FUNC __half_raw(unsigned short raw) : x(raw) {}
unsigned short x;
#if (defined(EIGEN_HAS_GPU_FP16) && !defined(EIGEN_GPU_COMPILE_PHASE))
// Eigen::half can be used as the datatype for shared memory declarations (in Eigen and TF)
// The element type for shared memory cannot have non-trivial constructors
// and hence the following special casing (which skips the zero-initialization).
// Note that this check gets done even in the host compilation phase, and
// hence the need for this special case.
EIGEN_DEVICE_FUNC __half_raw() {}
#else
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw() : x(0) {}
#endif
#if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw(numext::uint16_t raw) : x(numext::bit_cast<__fp16>(raw)) {
}
__fp16 x;
#else
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw(numext::uint16_t raw) : x(raw) {}
numext::uint16_t x;
#endif
};
#elif defined(EIGEN_HAS_HIP_FP16)
// Nothing to do here
// HIP fp16 header file has a definition for __half_raw
#elif defined(EIGEN_HAS_CUDA_FP16)
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000
// In CUDA < 9.0, __half is the equivalent of CUDA 9's __half_raw
typedef __half __half_raw;
#endif // defined(EIGEN_HAS_CUDA_FP16)
#if EIGEN_CUDA_SDK_VER < 90000
// In CUDA < 9.0, __half is the equivalent of CUDA 9's __half_raw
typedef __half __half_raw;
#endif // defined(EIGEN_HAS_CUDA_FP16)
#elif defined(SYCL_DEVICE_ONLY)
typedef cl::sycl::half __half_raw;
typedef cl::sycl::half __half_raw;
#endif
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw raw_uint16_to_half(unsigned short x);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw raw_uint16_to_half(numext::uint16_t x);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h);
struct half_base : public __half_raw {
EIGEN_DEVICE_FUNC half_base() {}
EIGEN_DEVICE_FUNC half_base(const __half_raw& h) : __half_raw(h) {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half_raw& h) : __half_raw(h) {}
#if defined(EIGEN_HAS_GPU_FP16)
#if defined(EIGEN_HAS_HIP_FP16)
EIGEN_DEVICE_FUNC half_base(const __half& h) { x = __half_as_ushort(h); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half& h) { x = __half_as_ushort(h); }
#elif defined(EIGEN_HAS_CUDA_FP16)
#if (defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000)
EIGEN_DEVICE_FUNC half_base(const __half& h) : __half_raw(*(__half_raw*)&h) {}
#if EIGEN_CUDA_SDK_VER >= 90000
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half& h) : __half_raw(*(__half_raw*)&h) {}
#endif
#endif
#endif
#endif
};
@@ -96,7 +143,10 @@ struct half : public half_impl::half_base {
// Writing this out as separate #if-else blocks to make the code easier to follow
// The same applies to most #if-else blocks in this file
#if !defined(EIGEN_HAS_GPU_FP16)
#if !defined(EIGEN_HAS_GPU_FP16) || !defined(EIGEN_GPU_COMPILE_PHASE)
// Use the same base class for the following two scenarios
// * when compiling without GPU support enabled
// * during host compile phase when compiling with GPU support enabled
typedef half_impl::__half_raw __half_raw;
#elif defined(EIGEN_HAS_HIP_FP16)
// Nothing to do here
@@ -110,69 +160,46 @@ struct half : public half_impl::half_base {
#endif
#endif
EIGEN_DEVICE_FUNC half() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half() {}
EIGEN_DEVICE_FUNC half(const __half_raw& h) : half_impl::half_base(h) {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half_raw& h) : half_impl::half_base(h) {}
#if defined(EIGEN_HAS_GPU_FP16)
#if defined(EIGEN_HAS_HIP_FP16)
EIGEN_DEVICE_FUNC half(const __half& h) : half_impl::half_base(h) {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half& h) : half_impl::half_base(h) {}
#elif defined(EIGEN_HAS_CUDA_FP16)
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000
EIGEN_DEVICE_FUNC half(const __half& h) : half_impl::half_base(h) {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half& h) : half_impl::half_base(h) {}
#endif
#endif
#endif
explicit EIGEN_DEVICE_FUNC half(bool b)
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(bool b)
: half_impl::half_base(half_impl::raw_uint16_to_half(b ? 0x3c00 : 0)) {}
template<class T>
explicit EIGEN_DEVICE_FUNC half(const T& val)
explicit EIGEN_DEVICE_FUNC half(T val)
: half_impl::half_base(half_impl::float_to_half_rtne(static_cast<float>(val))) {}
explicit EIGEN_DEVICE_FUNC half(float f)
: half_impl::half_base(half_impl::float_to_half_rtne(f)) {}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(bool) const {
// +0.0 and -0.0 become false, everything else becomes true.
return (x & 0x7fff) != 0;
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(signed char) const {
return static_cast<signed char>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned char) const {
return static_cast<unsigned char>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(short) const {
return static_cast<short>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned short) const {
return static_cast<unsigned short>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(int) const {
return static_cast<int>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned int) const {
return static_cast<unsigned int>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(long) const {
return static_cast<long>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned long) const {
return static_cast<unsigned long>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(long long) const {
return static_cast<long long>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(unsigned long long) const {
return static_cast<unsigned long long>(half_to_float(*this));
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(float) const {
// Following the convention of numpy, converting between complex and
// float will drop the imaginary part.
template<typename RealScalar>
explicit EIGEN_DEVICE_FUNC half(std::complex<RealScalar> c)
: half_impl::half_base(half_impl::float_to_half_rtne(static_cast<float>(c.real()))) {}
EIGEN_DEVICE_FUNC operator float() const { // NOLINT: Allow implicit conversion to float, because it is lossless.
return half_impl::half_to_float(*this);
}
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(double) const {
return static_cast<double>(half_impl::half_to_float(*this));
#if defined(EIGEN_HAS_GPU_FP16) && !defined(EIGEN_GPU_COMPILE_PHASE)
EIGEN_DEVICE_FUNC operator __half() const {
::__half_raw hr;
hr.x = x;
return __half(hr);
}
#endif
};
} // end namespace Eigen
@@ -211,7 +238,7 @@ struct numeric_limits<Eigen::half> {
static Eigen::half round_error() { return Eigen::half(0.5); }
static Eigen::half infinity() { return Eigen::half_impl::raw_uint16_to_half(0x7c00); }
static Eigen::half quiet_NaN() { return Eigen::half_impl::raw_uint16_to_half(0x7e00); }
static Eigen::half signaling_NaN() { return Eigen::half_impl::raw_uint16_to_half(0x7e00); }
static Eigen::half signaling_NaN() { return Eigen::half_impl::raw_uint16_to_half(0x7d00); }
static Eigen::half denorm_min() { return Eigen::half_impl::raw_uint16_to_half(0x1); }
};
@@ -234,6 +261,9 @@ namespace half_impl {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
// Note: We deliberately do *not* define this to 1 even if we have Arm's native
// fp16 type since GPU halfs are rather different from native CPU halfs.
// TODO: Rename to something like EIGEN_HAS_NATIVE_GPU_FP16
#define EIGEN_HAS_NATIVE_FP16
#endif
@@ -302,13 +332,62 @@ EIGEN_STRONG_INLINE __device__ bool operator > (const half& a, const half& b) {
EIGEN_STRONG_INLINE __device__ bool operator >= (const half& a, const half& b) {
return __hge(a, b);
}
#endif
#if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator + (const half& a, const half& b) {
return half(vaddh_f16(a.x, b.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator * (const half& a, const half& b) {
return half(vmulh_f16(a.x, b.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator - (const half& a, const half& b) {
return half(vsubh_f16(a.x, b.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator / (const half& a, const half& b) {
return half(vdivh_f16(a.x, b.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator - (const half& a) {
return half(vnegh_f16(a.x));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator += (half& a, const half& b) {
a = half(vaddh_f16(a.x, b.x));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator *= (half& a, const half& b) {
a = half(vmulh_f16(a.x, b.x));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator -= (half& a, const half& b) {
a = half(vsubh_f16(a.x, b.x));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator /= (half& a, const half& b) {
a = half(vdivh_f16(a.x, b.x));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator == (const half& a, const half& b) {
return vceqh_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator != (const half& a, const half& b) {
return !vceqh_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator < (const half& a, const half& b) {
return vclth_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator <= (const half& a, const half& b) {
return vcleh_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator > (const half& a, const half& b) {
return vcgth_f16(a.x, b.x);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const half& a, const half& b) {
return vcgeh_f16(a.x, b.x);
}
// We need to distinguish clang as the CUDA compiler from clang as the host compiler,
// invoked by NVCC (e.g. on MacOS). The former needs to see both host and device implementation
// of the functions, while the latter can only deal with one of them.
#if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats
#elif !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
// We need to provide emulated *host-side* FP16 operators for clang.
@@ -323,7 +402,6 @@ EIGEN_STRONG_INLINE __device__ bool operator >= (const half& a, const half& b) {
// Definitions for CPUs and older HIP+CUDA, mostly working through conversion
// to/from fp32.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator + (const half& a, const half& b) {
return half(float(a) + float(b));
}
@@ -387,15 +465,60 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator / (const half& a, Index b) {
return half(static_cast<float>(a) / static_cast<float>(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator++(half& a) {
a += half(1);
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator--(half& a) {
a -= half(1);
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator++(half& a, int) {
half original_value = a;
++a;
return original_value;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator--(half& a, int) {
half original_value = a;
--a;
return original_value;
}
// Conversion routines, including fallbacks for the host or older CUDA.
// Note that newer Intel CPUs (Haswell or newer) have vectorized versions of
// these in hardware. If we need more performance on older/other CPUs, it is
// also possible to vectorize them directly.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw raw_uint16_to_half(unsigned short x) {
__half_raw h;
h.x = x;
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw raw_uint16_to_half(numext::uint16_t x) {
// We cannot simply do a "return __half_raw(x)" here, because __half_raw is a union
// type in the hip_fp16 header file, and that would trigger a compile error.
// On the other hand, having anything but a return statement also triggers a compile
// error because this is a constexpr function.
// Fortunately, since we need to disable EIGEN_CONSTEXPR for GPU anyway, we can get out
// of this catch-22 by having separate bodies for GPU / non-GPU.
#if defined(EIGEN_HAS_GPU_FP16)
__half_raw h;
h.x = x;
return h;
#else
return __half_raw(x);
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC numext::uint16_t raw_half_as_uint16(const __half_raw& h) {
// HIP/CUDA/Default have a member 'x' of type uint16_t.
// For ARM64 native half, the member 'x' is of type __fp16, so we need to bit-cast.
// For SYCL, cl::sycl::half is _Float16, so cast directly.
#if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
return numext::bit_cast<numext::uint16_t>(h.x);
#elif defined(SYCL_DEVICE_ONLY)
return numext::bit_cast<numext::uint16_t>(h);
#else
return h.x;
#endif
}
union float32_bits {
@@ -414,6 +537,11 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) {
h.x = _cvtss_sh(ff, 0);
return h;
#elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
__half_raw h;
h.x = static_cast<__fp16>(ff);
return h;
#else
float32_bits f; f.f = ff;
@@ -422,7 +550,7 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) {
const float32_bits denorm_magic = { ((127 - 15) + (23 - 10) + 1) << 23 };
unsigned int sign_mask = 0x80000000u;
__half_raw o;
o.x = static_cast<unsigned short>(0x0u);
o.x = static_cast<numext::uint16_t>(0x0u);
unsigned int sign = f.u & sign_mask;
f.u ^= sign;
@@ -442,20 +570,22 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) {
f.f += denorm_magic.f;
// and one integer subtract of the bias later, we have our final float!
o.x = static_cast<unsigned short>(f.u - denorm_magic.u);
o.x = static_cast<numext::uint16_t>(f.u - denorm_magic.u);
} else {
unsigned int mant_odd = (f.u >> 13) & 1; // resulting mantissa is odd
// update exponent, rounding bias part 1
f.u += ((unsigned int)(15 - 127) << 23) + 0xfff;
// Equivalent to `f.u += ((unsigned int)(15 - 127) << 23) + 0xfff`, but
// without arithmetic overflow.
f.u += 0xc8000fffU;
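// Check: (unsigned int)(15 - 127) is 0xFFFFFF90, which shifted left by 23
// wraps to 0xC8000000; adding 0xfff yields exactly 0xC8000FFF.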
// rounding bias part 2
f.u += mant_odd;
// take the bits!
o.x = static_cast<unsigned short>(f.u >> 13);
o.x = static_cast<numext::uint16_t>(f.u >> 13);
}
}
o.x |= static_cast<unsigned short>(sign >> 16);
o.x |= static_cast<numext::uint16_t>(sign >> 16);
return o;
#endif
}
@@ -464,10 +594,10 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h) {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __half2float(h);
#elif defined(EIGEN_HAS_FP16_C)
return _cvtsh_ss(h.x);
#elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
return static_cast<float>(h.x);
#else
const float32_bits magic = { 113 << 23 };
const unsigned int shifted_exp = 0x7c00 << 13; // exponent mask after shift
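// The half bits are shifted left by 13 so the 5-bit exponent lines up with the
// float exponent field (hence shifted_exp); the magic constant (2^-14 as a
// float) is used in the remainder of this function, elided by the diff, to
// renormalize zero and subnormal inputs.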
@@ -493,12 +623,18 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h) {
// --- standard functions ---
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isinf)(const half& a) {
#ifdef EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC
return (numext::bit_cast<numext::uint16_t>(a.x) & 0x7fff) == 0x7c00;
#else
return (a.x & 0x7fff) == 0x7c00;
#endif
}
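// 0x7fff clears the sign bit; an all-ones exponent (0x7c00) with a zero
// mantissa encodes +/-infinity, while a nonzero mantissa encodes NaN.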
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isnan)(const half& a) {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __hisnan(a);
#elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
return (numext::bit_cast<numext::uint16_t>(a.x) & 0x7fff) > 0x7c00;
#else
return (a.x & 0x7fff) > 0x7c00;
#endif
@@ -508,9 +644,13 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isfinite)(const half& a) {
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half abs(const half& a) {
#if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
return half(vabsh_f16(a.x));
#else
half result;
result.x = a.x & 0x7FFF;
return result;
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half exp(const half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530) || \
@@ -537,6 +677,10 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log1p(const half& a) {
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log10(const half& a) {
return half(::log10f(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log2(const half& a) {
return half(static_cast<float>(EIGEN_LOG2E) * ::logf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half sqrt(const half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
@@ -560,6 +704,12 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half tan(const half& a) {
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half tanh(const half& a) {
return half(::tanhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half asin(const half& a) {
return half(::asinf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half acos(const half& a) {
return half(::acosf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half floor(const half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
@@ -576,6 +726,15 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half ceil(const half& a) {
return half(::ceilf(float(a)));
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half rint(const half& a) {
return half(::rintf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half round(const half& a) {
return half(::roundf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half fmod(const half& a, const half& b) {
return half(::fmodf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half (min)(const half& a, const half& b) {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
@@ -639,118 +798,145 @@ template<> struct NumTraits<Eigen::half>
RequireInitialization = false
};
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half epsilon() {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half epsilon() {
return half_impl::raw_uint16_to_half(0x0800);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half dummy_precision() { return Eigen::half(1e-2f); }
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half highest() {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half dummy_precision() {
return half_impl::raw_uint16_to_half(0x211f); // Eigen::half(1e-2f);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half highest() {
return half_impl::raw_uint16_to_half(0x7bff);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half lowest() {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half lowest() {
return half_impl::raw_uint16_to_half(0xfbff);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half infinity() {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half infinity() {
return half_impl::raw_uint16_to_half(0x7c00);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Eigen::half quiet_NaN() {
return half_impl::raw_uint16_to_half(0x7c01);
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half quiet_NaN() {
return half_impl::raw_uint16_to_half(0x7e00);
}
};
} // end namespace Eigen
// C-like standard mathematical functions and transcendentals.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half fabsh(const Eigen::half& a) {
Eigen::half result;
result.x = a.x & 0x7FFF;
return result;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half exph(const Eigen::half& a) {
return Eigen::half(::expf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half logh(const Eigen::half& a) {
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
return Eigen::half(::hlog(a));
#else
return Eigen::half(::logf(float(a)));
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half sqrth(const Eigen::half& a) {
return Eigen::half(::sqrtf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half powh(const Eigen::half& a, const Eigen::half& b) {
return Eigen::half(::powf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half floorh(const Eigen::half& a) {
return Eigen::half(::floorf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half ceilh(const Eigen::half& a) {
return Eigen::half(::ceilf(float(a)));
}
namespace std {
#if __cplusplus > 199711L
template <>
struct hash<Eigen::half> {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::size_t operator()(const Eigen::half& a) const {
return static_cast<std::size_t>(a.x);
}
};
#if defined(EIGEN_HAS_GPU_FP16) || defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC)
#pragma pop_macro("EIGEN_CONSTEXPR")
#endif
} // end namespace std
// Add the missing shfl_xor intrinsic
#if (defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor(Eigen::half var, int laneMask, int width=warpSize) {
#if (EIGEN_CUDA_SDK_VER < 90000) || \
defined(EIGEN_HAS_HIP_FP16)
return static_cast<Eigen::half>(__shfl_xor(static_cast<float>(var), laneMask, width));
#else
return static_cast<Eigen::half>(__shfl_xor_sync(0xFFFFFFFF, static_cast<float>(var), laneMask, width));
#endif
}
#endif
// ldg() has an overload for __half_raw, but we also need one for Eigen::half.
#if (defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 350) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half __ldg(const Eigen::half* ptr) {
return Eigen::half_impl::raw_uint16_to_half(
__ldg(reinterpret_cast<const unsigned short*>(ptr)));
}
#endif
#if defined(EIGEN_GPU_COMPILE_PHASE)
namespace Eigen {
namespace numext {
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isnan)(const Eigen::half& h) {
#if defined(EIGEN_GPU_COMPILE_PHASE)
template <>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isnan)(const Eigen::half& h) {
return (half_impl::isnan)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isinf)(const Eigen::half& h) {
template <>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isinf)(const Eigen::half& h) {
return (half_impl::isinf)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isfinite)(const Eigen::half& h) {
template <>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isfinite)(const Eigen::half& h) {
return (half_impl::isfinite)(h);
}
} // namespace Eigen
#endif
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bit_cast<Eigen::half, uint16_t>(const uint16_t& src) {
return Eigen::half(Eigen::half_impl::raw_uint16_to_half(src));
}
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC uint16_t bit_cast<uint16_t, Eigen::half>(const Eigen::half& src) {
return Eigen::half_impl::raw_half_as_uint16(src);
}
} // namespace numext
} // namespace Eigen
// Add the missing shfl* intrinsics.
// The __shfl* functions are only valid on HIP or __CUDA_ARCH__ >= 300.
// CUDA defines them for (__CUDA_ARCH__ >= 300 || !defined(__CUDA_ARCH__))
//
// HIP and CUDA prior to SDK 9.0 define
// __shfl, __shfl_up, __shfl_down, __shfl_xor for int and float
// CUDA since 9.0 deprecates those and instead defines
// __shfl_sync, __shfl_up_sync, __shfl_down_sync, __shfl_xor_sync,
// with native support for __half and __nv_bfloat16
//
// Note that the following are __device__-only functions.
#if (defined(EIGEN_CUDACC) && (!defined(EIGEN_CUDA_ARCH) || EIGEN_CUDA_ARCH >= 300)) \
|| defined(EIGEN_HIPCC)
#if defined(EIGEN_HAS_CUDA_FP16) && EIGEN_CUDA_SDK_VER >= 90000
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_sync(unsigned mask, Eigen::half var, int srcLane, int width=warpSize) {
const __half h = var;
return static_cast<Eigen::half>(__shfl_sync(mask, h, srcLane, width));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_up_sync(unsigned mask, Eigen::half var, unsigned int delta, int width=warpSize) {
const __half h = var;
return static_cast<Eigen::half>(__shfl_up_sync(mask, h, delta, width));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_down_sync(unsigned mask, Eigen::half var, unsigned int delta, int width=warpSize) {
const __half h = var;
return static_cast<Eigen::half>(__shfl_down_sync(mask, h, delta, width));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor_sync(unsigned mask, Eigen::half var, int laneMask, int width=warpSize) {
const __half h = var;
return static_cast<Eigen::half>(__shfl_xor_sync(mask, h, laneMask, width));
}
#else // HIP or CUDA SDK < 9.0
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl(Eigen::half var, int srcLane, int width=warpSize) {
const int ivar = static_cast<int>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(var));
return Eigen::numext::bit_cast<Eigen::half>(static_cast<Eigen::numext::uint16_t>(__shfl(ivar, srcLane, width)));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_up(Eigen::half var, unsigned int delta, int width=warpSize) {
const int ivar = static_cast<int>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(var));
return Eigen::numext::bit_cast<Eigen::half>(static_cast<Eigen::numext::uint16_t>(__shfl_up(ivar, delta, width)));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_down(Eigen::half var, unsigned int delta, int width=warpSize) {
const int ivar = static_cast<int>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(var));
return Eigen::numext::bit_cast<Eigen::half>(static_cast<Eigen::numext::uint16_t>(__shfl_down(ivar, delta, width)));
}
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor(Eigen::half var, int laneMask, int width=warpSize) {
const int ivar = static_cast<int>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(var));
return Eigen::numext::bit_cast<Eigen::half>(static_cast<Eigen::numext::uint16_t>(__shfl_xor(ivar, laneMask, width)));
}
#endif // HIP vs CUDA
#endif // __shfl*
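// Device-side usage sketch (hypothetical kernel, assuming CUDA SDK >= 9.0 and
// a fully active warp): the __shfl_xor_sync overload above allows a butterfly
// warp reduction directly on Eigen::half values.
__global__ void warp_sum_halves(const Eigen::half* in, Eigen::half* out) {
  Eigen::half v = in[threadIdx.x];
  // After log2(warpSize) butterfly steps, every lane holds the full sum.
  for (int mask = warpSize / 2; mask > 0; mask /= 2) {
    v = v + __shfl_xor_sync(0xFFFFFFFFu, v, mask, warpSize);
  }
  if (threadIdx.x == 0) out[0] = v;
}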
// ldg() has an overload for __half_raw, but we also need one for Eigen::half.
#if (defined(EIGEN_CUDACC) && (!defined(EIGEN_CUDA_ARCH) || EIGEN_CUDA_ARCH >= 350)) \
|| defined(EIGEN_HIPCC)
EIGEN_STRONG_INLINE __device__ Eigen::half __ldg(const Eigen::half* ptr) {
return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const Eigen::numext::uint16_t*>(ptr)));
}
#endif // __ldg
#if EIGEN_HAS_STD_HASH
namespace std {
template <>
struct hash<Eigen::half> {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::size_t operator()(const Eigen::half& a) const {
return static_cast<std::size_t>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(a));
}
};
} // end namespace std
#endif
#endif // EIGEN_HALF_H

View File

@@ -71,6 +71,49 @@ template<>
struct functor_traits<scalar_cast_op<Eigen::half, float> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<float, Eigen::bfloat16> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef Eigen::bfloat16 result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::bfloat16 operator() (const float& a) const {
return Eigen::bfloat16(a);
}
};
template<>
struct functor_traits<scalar_cast_op<float, Eigen::bfloat16> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<int, Eigen::bfloat16> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef Eigen::bfloat16 result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::bfloat16 operator() (const int& a) const {
return Eigen::bfloat16(static_cast<float>(a));
}
};
template<>
struct functor_traits<scalar_cast_op<int, Eigen::bfloat16> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<Eigen::bfloat16, float> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef float result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator() (const Eigen::bfloat16& a) const {
return static_cast<float>(a);
}
};
template<>
struct functor_traits<scalar_cast_op<Eigen::bfloat16, float> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
}
}
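// Usage sketch (not part of the diff; assumes <Eigen/Core>): .cast<>() between
// float and bfloat16 dispatches to the scalar_cast_op specializations above.
#include <Eigen/Core>
int main() {
  Eigen::Vector3f f(1.5f, 2.5f, 3.5f);  // all exactly representable in bfloat16
  Eigen::Matrix<Eigen::bfloat16, 3, 1> b = f.cast<Eigen::bfloat16>();
  Eigen::Vector3f back = b.cast<float>();
  return back == f ? 0 : 1;  // round-trip is exact for these values
}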

File diff suppressed because it is too large.

View File

@@ -17,12 +17,13 @@ namespace internal {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 2,
TgtCoeffRatio = 1
SrcCoeffRatio = 1,
TgtCoeffRatio = 2
};
};
@@ -32,15 +33,39 @@ template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcast<half2, float4>(con
return make_float4(r1.x, r1.y, r2.x, r2.y);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pcast<float4, Packet4h2>(const float4& a, const float4& b) {
Packet4h2 r;
half2* r_alias=reinterpret_cast<half2*>(&r);
r_alias[0]=__floats2half2_rn(a.x,a.y);
r_alias[1]=__floats2half2_rn(a.z,a.w);
r_alias[2]=__floats2half2_rn(b.x,b.y);
r_alias[3]=__floats2half2_rn(b.z,b.w);
return r;
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 2
SrcCoeffRatio = 2,
TgtCoeffRatio = 1
};
};
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcast<Packet4h2, float4>(const Packet4h2& a) {
// Simply discard the second half of the input
float4 r;
const half2* a_alias=reinterpret_cast<const half2*>(&a);
float2 r1 = __half22float2(a_alias[0]);
float2 r2 = __half22float2(a_alias[1]);
r.x=static_cast<float>(r1.x);
r.y=static_cast<float>(r1.y);
r.z=static_cast<float>(r2.x);
r.w=static_cast<float>(r2.y);
return r;
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pcast<float4, half2>(const float4& a) {
// Simply discard the second half of the input
return __floats2half2_rn(a.x, a.y);

View File

@@ -297,20 +297,6 @@ EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packet2cf& a) {
return std::complex<float>(value[0], value[1]);
}
template <>
EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs) {
EIGEN_MSA_DEBUG;
Packet4f sum1, sum2, sum;
// Add the first two 64-bit float32x2_t of vecs[0]
sum1 = (Packet4f)__builtin_msa_ilvr_d((v2i64)vecs[1].v, (v2i64)vecs[0].v);
sum2 = (Packet4f)__builtin_msa_ilvl_d((v2i64)vecs[1].v, (v2i64)vecs[0].v);
sum = padd(sum1, sum2);
return Packet2cf(sum);
}
template <>
EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a) {
EIGEN_MSA_DEBUG;
@@ -319,15 +305,6 @@ EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a
(a.v[0] * a.v[3]) + (a.v[1] * a.v[2]));
}
template <int Offset>
struct palign_impl<Offset, Packet2cf> {
EIGEN_STRONG_INLINE static void run(Packet2cf& first, const Packet2cf& second) {
if (Offset == 1) {
first.v = (Packet4f)__builtin_msa_sldi_b((v16i8)second.v, (v16i8)first.v, Offset * 8);
}
}
};
template <>
struct conj_helper<Packet2cf, Packet2cf, false, true> {
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y,
@@ -660,13 +637,6 @@ EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a) {
return pfirst(a);
}
template <>
EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs) {
EIGEN_MSA_DEBUG;
return vecs[0];
}
template <>
EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) {
EIGEN_MSA_DEBUG;
@@ -674,15 +644,6 @@ EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd&
return pfirst(a);
}
template <int Offset>
struct palign_impl<Offset, Packet1cd> {
static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/) {
// FIXME is it sure we never have to align a Packet1cd?
// Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16 bytes
// boundary...
}
};
template <>
struct conj_helper<Packet1cd, Packet1cd, false, true> {
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y,

View File

@@ -575,45 +575,6 @@ EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a) {
return s[0];
}
template <>
EIGEN_STRONG_INLINE Packet4f preduxp<Packet4f>(const Packet4f* vecs) {
EIGEN_MSA_DEBUG;
v4i32 tmp1, tmp2, tmp3, tmp4;
Packet4f sum;
tmp1 = __builtin_msa_ilvr_w((v4i32)vecs[1], (v4i32)vecs[0]);
tmp2 = __builtin_msa_ilvr_w((v4i32)vecs[3], (v4i32)vecs[2]);
tmp3 = __builtin_msa_ilvl_w((v4i32)vecs[1], (v4i32)vecs[0]);
tmp4 = __builtin_msa_ilvl_w((v4i32)vecs[3], (v4i32)vecs[2]);
sum = (Packet4f)__builtin_msa_ilvr_d((v2i64)tmp2, (v2i64)tmp1);
sum = padd(sum, (Packet4f)__builtin_msa_ilvod_d((v2i64)tmp2, (v2i64)tmp1));
sum = padd(sum, (Packet4f)__builtin_msa_ilvr_d((v2i64)tmp4, (v2i64)tmp3));
sum = padd(sum, (Packet4f)__builtin_msa_ilvod_d((v2i64)tmp4, (v2i64)tmp3));
return sum;
}
template <>
EIGEN_STRONG_INLINE Packet4i preduxp<Packet4i>(const Packet4i* vecs) {
EIGEN_MSA_DEBUG;
v4i32 tmp1, tmp2, tmp3, tmp4;
Packet4i sum;
tmp1 = __builtin_msa_ilvr_w((v4i32)vecs[1], (v4i32)vecs[0]);
tmp2 = __builtin_msa_ilvr_w((v4i32)vecs[3], (v4i32)vecs[2]);
tmp3 = __builtin_msa_ilvl_w((v4i32)vecs[1], (v4i32)vecs[0]);
tmp4 = __builtin_msa_ilvl_w((v4i32)vecs[3], (v4i32)vecs[2]);
sum = (Packet4i)__builtin_msa_ilvr_d((v2i64)tmp2, (v2i64)tmp1);
sum = padd(sum, (Packet4i)__builtin_msa_ilvod_d((v2i64)tmp2, (v2i64)tmp1));
sum = padd(sum, (Packet4i)__builtin_msa_ilvr_d((v2i64)tmp4, (v2i64)tmp3));
sum = padd(sum, (Packet4i)__builtin_msa_ilvod_d((v2i64)tmp4, (v2i64)tmp3));
return sum;
}
template <>
EIGEN_STRONG_INLINE int32_t predux<Packet4i>(const Packet4i& a) {
@@ -714,25 +675,6 @@ EIGEN_STRONG_INLINE int32_t predux_max<Packet4i>(const Packet4i& a) {
return m[0];
}
#define PALIGN_MSA(Offset, Type, Command) \
template <> \
struct palign_impl<Offset, Type> { \
EIGEN_STRONG_INLINE static void run(Type& first, const Type& second) { \
if (Offset != 0) first = (Type)(Command((v16i8)second, (v16i8)first, Offset * 4)); \
} \
};
PALIGN_MSA(0, Packet4f, __builtin_msa_sldi_b)
PALIGN_MSA(1, Packet4f, __builtin_msa_sldi_b)
PALIGN_MSA(2, Packet4f, __builtin_msa_sldi_b)
PALIGN_MSA(3, Packet4f, __builtin_msa_sldi_b)
PALIGN_MSA(0, Packet4i, __builtin_msa_sldi_b)
PALIGN_MSA(1, Packet4i, __builtin_msa_sldi_b)
PALIGN_MSA(2, Packet4i, __builtin_msa_sldi_b)
PALIGN_MSA(3, Packet4i, __builtin_msa_sldi_b)
#undef PALIGN_MSA
inline std::ostream& operator<<(std::ostream& os, const PacketBlock<Packet4f, 4>& value) {
os << "[ " << value.packet[0] << "," << std::endl
<< " " << value.packet[1] << "," << std::endl
@@ -1148,16 +1090,6 @@ EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a) {
return s[0];
}
template <>
EIGEN_STRONG_INLINE Packet2d preduxp<Packet2d>(const Packet2d* vecs) {
EIGEN_MSA_DEBUG;
Packet2d v0 = (Packet2d)__builtin_msa_ilvev_d((v2i64)vecs[1], (v2i64)vecs[0]);
Packet2d v1 = (Packet2d)__builtin_msa_ilvod_d((v2i64)vecs[1], (v2i64)vecs[0]);
return padd(v0, v1);
}
// Other reduction functions:
// mul
template <>
@@ -1217,19 +1149,6 @@ EIGEN_STRONG_INLINE Packet2d prsqrt(const Packet2d& a) {
#endif
}
#define PALIGN_MSA(Offset, Type, Command) \
template <> \
struct palign_impl<Offset, Type> { \
EIGEN_STRONG_INLINE static void run(Type& first, const Type& second) { \
if (Offset != 0) first = (Type)(Command((v16i8)second, (v16i8)first, Offset * 8)); \
} \
};
PALIGN_MSA(0, Packet2d, __builtin_msa_sldi_b)
PALIGN_MSA(1, Packet2d, __builtin_msa_sldi_b)
#undef PALIGN_MSA
inline std::ostream& operator<<(std::ostream& os, const PacketBlock<Packet2d, 2>& value) {
os << "[ " << value.packet[0] << "," << std::endl << " " << value.packet[1] << " ]";
return os;

Some files were not shown because too many files have changed in this diff.