Compare commits

100 Commits
5.0.0 ... 3.4.0

Author SHA1 Message Date
Rasmus Munk Larsen
3147391d94 Change version to 3.4.0. 2021-08-18 13:41:58 -07:00
Antonio Sanchez
115591b9e3 Workaround VS 2017 arg bug.
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs.  It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).


(cherry picked from commit 2b410ecbef)
2021-08-18 19:04:50 +00:00
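The behavior that commit 115591b9e3 expects from `std::arg` on real inputs can be sketched without calling `std::arg` at all. This is a minimal illustration, not the code from the commit: the helper name `real_arg_sketch` is hypothetical, and `-0.0` is treated as non-negative for simplicity.

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical helper (not Eigen's actual code): phase angle of a real number.
// Returns pi for negative inputs and 0 for non-negative ones; NaN is passed
// through. Treating -0.0 as non-negative is a simplification of this sketch.
template <typename Real>
Real real_arg_sketch(Real x) {
  static_assert(std::is_floating_point<Real>::value,
                "sketch only covers floating-point types");
  if (std::isnan(x)) return x;
  const Real pi = std::acos(Real(-1));
  return x < Real(0) ? pi : Real(0);
}
```

A workaround along these lines would presumably be selected only for the affected compiler versions, leaving `std::arg` in place elsewhere.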
Antonio Sanchez
fd100138dd Remove unaligned assert tests.
Manually constructing an unaligned object declared as aligned
invokes UB, so we cannot technically check for alignment from
within the constructor.  Newer versions of clang optimize away
this check.

Removing the affected tests.


(cherry picked from commit 0c4ae56e37)
2021-08-18 18:39:04 +00:00
Jakob Struye
1ec173b54e Clearer doc for squaredNorm
(cherry picked from commit 53a29c7e35)
2021-08-18 15:12:36 +00:00
Antonio Sanchez
aef926abf6 Renamed shift_left/shift_right to shiftLeft/shiftRight.
For naming consistency.  Also moved to ArrayCwiseUnaryOps, and added
test.


(cherry picked from commit fc9d352432)
2021-08-18 14:44:31 +00:00
Antonio Sanchez
f1032255d3 Add missing PPC packet comparisons.
This is to fix the packetmath tests on the ppc pipeline.


(cherry picked from commit 2cc6ee0d2e)
2021-08-17 15:33:55 +00:00
Chip-Kerchner
f57dec64ef Fix unaligned loads in ploadLhs & ploadRhs for P8.
(cherry picked from commit 8dcf3e38ba)
2021-08-17 12:48:36 +00:00
Rasmus Munk Larsen
926e1a8226 Update documentation for matrix decompositions and least squares solvers.
(cherry picked from commit 7e6f94961c)
2021-08-16 22:11:38 +00:00
andiwand
cd474d4cd0 minor doc fix in Map.h
(cherry picked from commit 5c6b3efead)
2021-08-16 14:26:39 +00:00
Chip-Kerchner
0b56b62f30 Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).
(cherry picked from commit e07227c411)
2021-08-13 18:01:15 +00:00
Chip Kerchner
44cc96e1a1 Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+
(cherry picked from commit 66499f0f17)
2021-08-12 21:39:17 +00:00
Rasmus Munk Larsen
576e451b10 Add CompleteOrthogonalDecomposition to the table of linear algebra decompositions.
(cherry picked from commit 96e3b4fc95)
2021-08-12 16:49:40 +00:00
Antonio Sanchez
0d89012708 Update code snippet for tridiagonalize_inplace.
(cherry picked from commit fb1718ad14)
2021-08-12 15:37:32 +00:00
Rasmus Munk Larsen
6d2506040c * Revise the meta_least_common_multiple function template: add a bool parameter that checks whether A is larger than B.
* This reduces compile time when A is smaller than B, and avoids a compilation failure when A is small and B is large.

Authored by @awoniu.

(cherry picked from commit 8ce341caf2)
2021-08-11 18:11:26 +00:00
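The idea described in 6d2506040c can be sketched as a compile-time least common multiple that swaps its arguments when the first is the smaller one. The names below (`lcm_sketch`, `Done`, `Big`) are illustrative assumptions and are not taken from the Eigen sources; both arguments are assumed to be positive.

```cpp
// Hypothetical re-creation of the idea; names are illustrative, not Eigen's.
// Done: A*K is already a multiple of B.  Big: A >= B.
template <int A, int B, int K = 1, bool Done = ((A * K) % B) == 0, bool Big = (A >= B)>
struct lcm_sketch {
  // A >= B and A*K is not yet a multiple of B: try the next multiple of A.
  enum { value = lcm_sketch<A, B, K + 1>::value };
};

// A < B: swap the arguments so the recursion walks multiples of the larger value.
template <int A, int B, int K, bool Done>
struct lcm_sketch<A, B, K, Done, false> {
  enum { value = lcm_sketch<B, A, K>::value };
};

// A >= B and A*K is a multiple of B: the recursion ends here.
template <int A, int B, int K>
struct lcm_sketch<A, B, K, true, true> {
  enum { value = A * K };
};
```

With the swap, `lcm_sketch<3, 1000>` needs three recursion steps instead of a thousand, which is why a small A paired with a large B no longer risks hitting the compiler's template instantiation depth limit.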
Nikolay Tverdokhleb
cb44a003de Do not set AnnoyingScalar::dont_throw if not defined EIGEN_TEST_ANNOYING_SCALAR_DONT_THROW.
- Because that member is not declared if the macro is defined.


(cherry picked from commit f1b899eef7)
2021-08-11 16:39:44 +00:00
ChipKerchner
13d7658c5d Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl).
(cherry picked from commit 413bc491f1)
2021-08-10 20:40:54 +00:00
jenswehner
338924602d added includes for unordered_map
(cherry picked from commit e3e74001f7)
2021-08-10 16:10:03 +00:00
Gauri Deshpande
93bff85a42 remove denormal flushing in fp32tobf16 for avx & avx512
(cherry picked from commit e6a5a594a7)
2021-08-09 22:15:42 +00:00
Rasmus Munk Larsen
4e0357c6dd Avoid memory allocation in tridiagonalization_inplace_selector::run.
(cherry picked from commit a5a7faeb45)
2021-08-06 21:48:00 +00:00
Daniel N. Miller (APD)
1e9f623f3e Do not build shared libs if not supported
(cherry picked from commit 09d7122468)
2021-08-06 21:47:37 +00:00
Jens Wehner
4240b480e0 updated documentation for middleCol and middleRow
(cherry picked from commit 4d870c49b7)
2021-08-05 17:53:36 +00:00
Antonio Sanchez
5b83d3c4bc Make inverse 3x3 faster and avoid gcc bug.
There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory.  I'm *pretty* sure it's
a false positive, but it's hard to trigger.  The same warning
does not trigger with clang or later compiler versions.

In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.

```
name                                            old cpu/op  new cpu/op  delta
BM_Inverse3x3<DynamicMatrix3T<float>>            423ns ± 2%   433ns ± 3%   +2.32%    (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>>           425ns ± 2%   427ns ± 3%   +0.48%    (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>>            7.10ns ± 2%  0.80ns ± 1%  -88.67%  (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>>           7.45ns ± 2%  1.34ns ± 1%  -82.01%  (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>>     409ns ± 3%   419ns ± 3%   +2.40%   (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>>    414ns ± 3%   413ns ± 2%     ~       (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>>     7.57ns ± 1%  0.80ns ± 1%  -89.37%  (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>>    9.09ns ± 1%  2.58ns ±41%  -71.60%  (p=0.000 n=113+116)
```


(cherry picked from commit 5ad8b9bfe2)
2021-08-04 22:06:52 +00:00
Antonio Sanchez
46ecdcd745 Fix MPReal detection and support.
The latest version of `mpreal` has a bug that breaks `min`/`max`.
It also breaks with the latest dev version of `mpfr`. Here we
add `FindMPREAL.cmake` which searches for the library and tests if
compilation works.

Removed our internal copy of `mpreal.h` under `unsupported/test`, as
it is out-of-sync with the latest, and similarly breaks with
the latest `mpfr`.  It would be best to use the installed version
of `mpreal` anyways, since that's what we actually want to test.

Fixes #2282.


(cherry picked from commit 31f796ebef)
2021-08-03 18:13:12 +00:00
Antonio Sanchez
9a1691a14e Fix cmake warnings, FindPASTIX/FindPTSCOTCH.
We were getting a lot of warnings due to nested `find_package` calls
within `Find***.cmake` files.  The recommended approach is to use
[`find_dependency`](https://cmake.org/cmake/help/latest/module/CMakeFindDependencyMacro.html)
in package configuration files. I made this change for all instances.

Case mismatches between `Find<Package>.cmake` and calling
`find_package(<PACKAGE>)` also lead to warnings. Fixed for
`FindPASTIX.cmake` and `FindSCOTCH.cmake`.

`FindBLASEXT.cmake` was broken due to calling `find_package_handle_standard_args(BLAS ...)`.
The package name must match, otherwise the `find_package(BLASEXT)` falsely thinks
the package wasn't found.  I changed to `BLASEXT`, but then also copied that value
to `BLAS_FOUND` for compatibility.

`FindPastix.cmake` had a typo that incorrectly added `PTSCOTCH` when looking for
the `SCOTCH` component.

`FindPTSCOTCH` incorrectly added `***-NOTFOUND` to include/library lists,
corrupting them.  This led to cmake errors down-the-line.

Fixes #2288.


(cherry picked from commit 1cdec38653)
2021-08-03 17:48:20 +00:00
Antonio Sanchez
bb33880e57 Fix TriSycl CMake files.
This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was
broken by commit 00f32752, which modified `add_sycl_to_target` for ComputeCPP.
This makes the corresponding modifications for trisycl to make them consistent.

Also, trisycl now requires c++17.


(cherry picked from commit 8cf6cb27ba)
2021-08-03 17:25:17 +00:00
Antonio Sanchez
237c59a2aa Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset.
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.

This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.

Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.

For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.

Fixes #2201.


(cherry picked from commit 3d98a6ef5c)
2021-08-03 16:32:59 +00:00
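The scalar behavior listed in 237c59a2aa can be sketched roughly as follows. This is an illustration under stated assumptions rather than Eigen's implementation: the `*_sketch` names, the use of `std::is_trivially_copyable` as the "trivial scalar" test, and the `CountedScalar` type are all hypothetical.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical scalar-only sketches; names and the triviality test are assumptions.
template <typename Scalar>
Scalar pzero_sketch() { return Scalar(0); }

// Trivial scalars: set every bit to one.
template <typename Scalar>
Scalar ptrue_impl(std::true_type) {
  Scalar s;
  std::memset(&s, 0xff, sizeof(Scalar));
  return s;
}

// Non-trivial scalars: avoid memset and fall back to Scalar(1).
template <typename Scalar>
Scalar ptrue_impl(std::false_type) { return Scalar(1); }

template <typename Scalar>
Scalar ptrue_sketch() {
  return ptrue_impl<Scalar>(std::is_trivially_copyable<Scalar>());
}

// Ternary select: a zero mask picks b, any other mask picks a.
template <typename Scalar>
Scalar pselect_sketch(const Scalar& mask, const Scalar& a, const Scalar& b) {
  return mask == Scalar(0) ? b : a;
}

// Example of a scalar that must not be memset: it has a user-provided copy constructor.
struct CountedScalar {
  double v;
  CountedScalar(double x = 0.0) : v(x) {}
  CountedScalar(const CountedScalar& o) : v(o.v) {}
  bool operator==(const CountedScalar& o) const { return v == o.v; }
};
```

For a custom type such as `CountedScalar`, every operation above goes through the type's own constructors and operators, so no specialization is needed just to keep the object in a valid state.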
Antonio Sanchez
3dc42eeaec Enable equality comparisons on GPU.
Since `std::equal_to::operator()` is not a device function, it
fails on GPU.  On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).

Replacing this with a portable version enables comparisons on device.

Addresses #2292 - would need to be cherry-picked.  The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.


(cherry picked from commit 7880f10526)
2021-08-03 16:15:44 +00:00
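A portable replacement of the kind 3dc42eeaec describes might look like the sketch below. The macro `SKETCH_DEVICE_FUNC` and the functor name `equal_to_sketch` are hypothetical stand-ins; the real commit uses Eigen's own annotations and helpers.

```cpp
// Hypothetical host/device annotation; expands to nothing for plain C++ builds.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// Portable stand-in for std::equal_to that can also be compiled for the device.
template <typename T>
struct equal_to_sketch {
  SKETCH_DEVICE_FUNC bool operator()(const T& a, const T& b) const { return a == b; }
};
```

Because the functor only relies on `operator==` of `T`, it works on the device for any type whose comparison is itself device-callable.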
hyunggi-sv
7adc1545b4 fix:typo in dox (has->have)
(cherry picked from commit 02a0e79c70)
2021-08-03 00:54:41 +00:00
Antonio Sanchez
c0c7b695cd Fix assignment operator issue for latest MSVC+NVCC.
Details are scattered across #920, #1000, #1324, #2291.

Summary: some MSVC versions have a bug that requires omitting explicit
`operator=` definitions (leads to duplicate definition errors), and
some MSVC versions require adding explicit `operator=` definitions
(otherwise implicitly deleted errors).  This mess tries to cover
all the cases encountered.

Fixes #2291.


(cherry picked from commit 9816fe59b4)
2021-08-03 00:52:21 +00:00
Alexander Karatarakis
c334eece44 _DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier
(cherry picked from commit f357283d31)
2021-07-29 18:18:47 +00:00
Jonas Harsch
5ccb72b2e4 Fixed typo in TutorialSparse.dox
(cherry picked from commit 5b81764c0f)
2021-07-26 14:33:10 +00:00
arthurfeeney
9c90d5d832 Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector.
(cherry picked from commit a77638387d)
2021-07-22 18:01:55 +00:00
Antonio Sanchez
5d37114fc0 Fix explicit default cache size typo.
(cherry picked from commit 297f0f563d)
2021-07-20 18:42:25 +00:00
Rohit Santhanam
930696fc53 Enable extract et. al. for HIP GPU.
(cherry picked from commit beea14a18f)
2021-07-09 16:14:19 +00:00
Rasmus Munk Larsen
56966fd2e6 Defer to std::fill_n when filling a dense object with a constant value.
(cherry picked from commit 0c361c4899)
2021-07-09 03:59:56 +00:00
Jonas Harsch
5a3c9eddb4 Removed superfluous boolean degenerate in TensorMorphing.h.
(cherry picked from commit e9c9a3130b)
2021-07-08 18:34:10 +00:00
Guoqiang QI
69ec4907da Make a copy of the input matrix when computing the inverse in place. This fixes #2285.
(cherry picked from commit 4bcd42c271)
2021-07-08 17:07:54 +00:00
Antonio Sanchez
7571704a43 Fix CMake directory issues.
Allows absolute and relative paths for
- `INCLUDE_INSTALL_DIR`
- `CMAKEPACKAGE_INSTALL_DIR`
- `PKGCONFIG_INSTALL_DIR`

Type should be `PATH` not `STRING`.  Contrary to !211, these don't
seem to be made absolute if user-defined - according to the doc any
directories should use `PATH` type, which allows a file dialog
to be used via the GUI.  It also better handles file separators.

If user provides an absolute path, it will be made relative to
`CMAKE_INSTALL_PREFIX` so that `configure_package_config_file` will
work.

Fixes #2155 and #2269.


(cherry picked from commit f44f05532d)
2021-07-07 17:44:00 +00:00
Antonio Sanchez
84955d109f Fix Tensor documentation page.
The extra [TOC] tag is generating a huge floating duplicated
table-of-contents, which obscures the majority of the page
(see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html).
Remove it.

Also, headers do not support markup (see
[doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so
backticks like
```
```
end up generating titles that look like
```
Constructor <tt>Tensor<double,2></tt>
```
Removing backticks for now.  To generate proper formatted headers, we
must directly use html instead of markdown, i.e.
```
<h2>Constructor <code>Tensor&lt;double,2&gt;</code></h2>
```
which is ugly.

Fixes #2254.


(cherry picked from commit f5a9873bbb)
2021-07-07 17:18:20 +00:00
Jonas Harsch
601814b575 Don't crash when attempting to shuffle an empty tensor.
(cherry picked from commit aab747021b)
2021-07-02 21:08:38 +00:00
Rasmus Munk Larsen
05bab8139a Fix breakage of conj_helper in conjunction with custom types introduced in !537.
(cherry picked from commit 7b35638ddb)
2021-07-02 20:59:50 +00:00
Chip Kerchner
eebde572d9 Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow
(cherry picked from commit 91e99ec1e0)
2021-07-01 23:32:38 +00:00
Antonio Sanchez
8190739f12 Fix compile issues for gcc 4.8.
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.


(cherry picked from commit 6035da5283)
2021-07-01 23:18:10 +00:00
Antonio Sanchez
b6db013435 Fix inverse nullptr/asan errors for LU.
For empty or single-column matrices, `PartialPivLU` currently
dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.


(cherry picked from commit 154f00e9ea)
2021-07-01 22:57:25 +00:00
Dan Miller
1f6b1c1a1f Fix duplicate definitions on Mac
(cherry picked from commit eb04775903)
2021-07-01 20:49:05 +00:00
Alexander Karatarakis
517294d6e1 Make DenseStorage<> trivially_copyable
(cherry picked from commit 60400334a9)
2021-07-01 20:48:47 +00:00
大河メタル
94e2250b36 Correct declarations for aarch64-pc-windows-msvc
(cherry picked from commit c81da59a25)
2021-06-30 04:10:04 +00:00
Antonio Sanchez
d82d915047 Modify tensor argmin/argmax to always return first occurrence.
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable.  Here we modify the functors
to always keep the first occurrence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).

This is otherwise causing unpredictable results in some TF tests.


(cherry picked from commit 3a087ccb99)
2021-06-29 23:28:37 +00:00
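The tie-breaking rule from d82d915047 can be illustrated with a small reducer over (index, value) pairs. This is a sketch with hypothetical names (`IndexedValue`, `argmax_combine`, `argmax_first`), not the tensor functors themselves, and NaN inputs are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical (index, value) pair carried through the reduction.
struct IndexedValue {
  std::size_t index;
  float value;
};

// Combine two partial results. On equal values the smaller index wins, so the
// outcome does not depend on the order in which partial results are merged.
inline IndexedValue argmax_combine(const IndexedValue& a, const IndexedValue& b) {
  if (a.value > b.value) return a;
  if (b.value > a.value) return b;
  return a.index < b.index ? a : b;
}

// Reduce front-to-back or back-to-front; both orders give the same index.
inline std::size_t argmax_first(const std::vector<float>& data, bool reverse) {
  assert(!data.empty());
  const std::size_t n = data.size();
  IndexedValue best = reverse ? IndexedValue{n - 1, data[n - 1]} : IndexedValue{0, data[0]};
  for (std::size_t step = 1; step < n; ++step) {
    const std::size_t i = reverse ? n - 1 - step : step;
    best = argmax_combine(best, IndexedValue{i, data[i]});
  }
  return best.index;
}
```

Traversing in reverse stands in for the arbitrary merge order of a multithreaded or GPU reduction: without the index comparison, the reverse pass would report the last maximum rather than the first.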
Rasmus Munk Larsen
380d0e4916 Get rid of redundant pabs instruction in complex square root.
(cherry picked from commit 5aebbe9098)
2021-06-29 23:27:09 +00:00
Rohit Santhanam
e83af2cc24 Commit 52a5f982 broke conjhelper functionality for HIP GPUs.
This commit addresses this.


(cherry picked from commit 2d132d1736)
2021-06-25 19:56:18 +00:00
Rasmus Munk Larsen
413ff2b531 Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.
(cherry picked from commit bffd267d17)
2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen
a235ddef39 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
(cherry picked from commit 52a5f98212)
2021-06-24 23:30:42 +00:00
Rasmus Munk Larsen
4780d8dfb2 Fix typo in SelfAdjointEigenSolver_eigenvectors.cpp
(cherry picked from commit c8a2b4d20a)
2021-06-21 19:07:17 +00:00
Rasmus Munk Larsen
fd5d23fdf3 Update ComplexEigenSolver_eigenvectors.cpp
(cherry picked from commit ea62c937ed)
2021-06-21 19:06:54 +00:00
Antonio Sanchez
a2040ef796 Rewrite balancer to avoid overflows.
The previous balancer overflowed for large row/column norms.
Modified to prevent that.

Fixes #2273.


(cherry picked from commit e9ab4278b7)
2021-06-21 18:14:53 +00:00
Antonio Sanchez
c2c0f6f64b Fix fix<> for gcc-4.9.3.
There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`
replacement.

Fixes #2267.


(cherry picked from commit 35a367d557)
2021-06-21 17:26:07 +00:00
Antonio Sanchez
ee4e099aa2 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.


(cherry picked from commit 12e8d57108)
2021-06-17 17:11:08 +00:00
Chip-Kerchner
9fc93ce31a EIGEN_STRONG_INLINE was NOT inlining in some critical areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate.
(cherry picked from commit ef1fd341a8)
2021-06-16 22:14:17 +00:00
Antonio Sanchez
1374f49f28 Add missing ppc pcmp_lt_or_nan<Packet8bf>
(cherry picked from commit 9e94c59570)
2021-06-15 22:12:22 +00:00
Antonio Sanchez
2d6eaaf687 Fix placement of permanent GPU defines.
(cherry picked from commit 954879183b)
2021-06-15 19:18:20 +00:00
Rasmus Munk Larsen
47722a66f2 Fix more enum arithmetic.
(cherry picked from commit 13fb5ab92c)
2021-06-15 16:40:35 +00:00
Antonio Sanchez
5e75331b9f Fix checking of version number for mingw.
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.

Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.

Related to #2268.  CMake and build passes for me after this.


(cherry picked from commit ad82d20cf6)
2021-06-12 00:02:26 +00:00
Antonio Sanchez
b5fc69bdd8 Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.


(cherry picked from commit 514977f31b)
2021-06-11 17:48:37 +00:00
Antonio Sanchez
4b683b65df Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.


(cherry picked from commit 6aec83263d)
2021-06-11 17:19:29 +00:00
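The `#ifndef` guard pattern mentioned in 4b683b65df works along these lines. The macro name, its signature, and the helper functions are hypothetical; the actual `TENSOR_CONTRACTION_DISPATCH` macro takes different arguments.

```cpp
// Hypothetical header fragment: the macro name and signature are illustrative.
// A client defines SKETCH_CONTRACTION_DISPATCH before including the header to
// override the default; otherwise the fallback below is used.
#ifndef SKETCH_CONTRACTION_DISPATCH
#define SKETCH_CONTRACTION_DISPATCH(METHOD, ARG) METHOD(ARG)
#endif

inline int default_kernel(int x) { return 2 * x; }

// Call sites go through the macro, so they pick up whichever definition is active.
inline int run_contraction(int x) {
  return SKETCH_CONTRACTION_DISPATCH(default_kernel, x);
}
```

The guard keeps the default behavior unchanged for everyone else while giving a downstream project a supported override point instead of a patched header.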
Rasmus Munk Larsen
1cb1ffd5b2 Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled.
(cherry picked from commit fc87e2cbaa)
2021-06-11 02:57:02 +00:00
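Building negative zero from its bit pattern, as 1cb1ffd5b2 describes, can be sketched with `std::memcpy` standing in for a bit-cast helper. The function name is hypothetical and the sketch assumes `float` is an IEEE-754 binary32 value.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Build -0.0f from its bit pattern instead of writing the literal -0.0f.
// Assumes float is IEEE-754 binary32; std::memcpy stands in for a bit_cast helper.
inline float negative_zero_sketch() {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
  const std::uint32_t sign_bit_only = 0x80000000u;
  float result;
  std::memcpy(&result, &sign_bit_only, sizeof(result));
  return result;
}
```

The point of going through the integer representation is that the sign bit is set explicitly, rather than depending on how the compiler treats a signed-zero literal when relaxed floating-point rules are enabled.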
Rasmus Munk Larsen
4b502a7215 Fix c++20 warnings about using enums in arithmetic expressions.
(cherry picked from commit f64b2954c7)
2021-06-11 02:35:19 +00:00
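The fix pattern behind 4b502a7215 is to cast each enumerator to `int` before combining values that come from different enumeration types, since C++20 deprecates arithmetic between distinct enum types. The struct and enumerator names below are hypothetical.

```cpp
// Two unrelated unnamed enums, as is common for compile-time constants.
struct ExprTraits   { enum { InnerSize = 10 }; };
struct PacketTraits { enum { Size = 4 }; };

// Casting each enumerator to int first keeps the arithmetic on plain ints,
// which avoids the C++20 deprecation warning for mixing enumeration types.
struct LoopTraits {
  enum {
    VectorizableSize = (int(ExprTraits::InnerSize) / int(PacketTraits::Size)) * int(PacketTraits::Size)
  };
};
```

The computed value is unchanged by the casts (here the largest multiple of 4 that fits in 10); only the operand types differ.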
Nicolas Cornu
85868564df Fix parsing of version for nvhpc
As the first line of the version is empty it crashes,
so delete first line if it is empty


(cherry picked from commit 001a57519a)
2021-06-10 18:50:22 +00:00
Rohit Santhanam
cbb6ae6296 Removed dead code from GPU float16 unit test.
(cherry picked from commit c8d40a7bf1)
2021-06-10 17:16:47 +00:00
Cyril Kaiser
573570b6c9 Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.
(cherry picked from commit 91cd67f057)
2021-05-26 19:45:25 +00:00
Antonio Sanchez
98cf1e076f Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.


(cherry picked from commit dba753a986)
2021-05-25 19:09:50 +00:00
Antonio Sanchez
ee2a8f7139 Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3.  This was changed in commit 11f55b29 to optimize the
evaluator:

> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.

though I cannot reproduce the 16 byte result.  Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.

https://godbolt.org/z/MsjTc1PGe

This change modifies the code slightly to allow non-class types.  The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.

Fixes #2251


(cherry picked from commit ebb300d0b4)
2021-05-25 18:19:53 +00:00
Jakub Lichman
3835046309 predux_half_dowto4 test extended to all applicable packets
(cherry picked from commit 12471fcb5d)
2021-05-21 16:58:16 +00:00
Steve Bronder
4fbd01cd4b Adds macro for checking if C++14 variable templates are supported
(cherry picked from commit 1720057023)
2021-05-21 16:43:30 +00:00
Niall Murphy
a883a8797c Use derived object type in conservative_resize_like_impl
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.


(cherry picked from commit 391094c507)
2021-05-20 23:43:57 +00:00
Jakub Lichman
0bd9e9bc45 ptranspose test for non-square kernels added
(cherry picked from commit 8877f8d9b2)
2021-05-20 19:27:20 +00:00
Guoqiang QI
77c66e368c Ensure all generated matrices for inverse_4x4 tests are invertible. This fixes #2248.
(cherry picked from commit 3e006bfd31)
2021-05-13 15:03:47 +00:00
guoqiangqi
2f908f8255 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242.
(cherry picked from commit 3d9051ea84)
2021-05-12 17:02:19 +00:00
Nathan Luehr
82f13830e6 Fix calls to device functions from host code
(cherry picked from commit 972cf0c28a)
2021-05-12 17:01:45 +00:00
Nathan Luehr
d1825cbb68 Device implementation of log for std::complex types.
(cherry picked from commit 7e6a1c129c)
2021-05-11 22:31:53 +00:00
Nathan Luehr
d9288f078d Fix ambiguity due to argument dependent lookup.
(cherry picked from commit 6753f0f197)
2021-05-11 22:00:36 +00:00
Rohit Santhanam
85ebd6aff8 Fix for issue where numext::imag and numext::real are used before they are defined.
(cherry picked from commit 39ec31c0ad)
2021-05-10 20:14:10 +00:00
Antonio Sanchez
2947c0cc84 Restore ABI compatibility for conj with 3.3, fix conflict with boost.
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version.  This change
restores the second parameter.  This also restores ABI compatibility.

The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.

We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.

Fixes #2112.


(cherry picked from commit c0eb5f89a4)
2021-05-07 18:38:23 +00:00
Antonio Sanchez
25424f4cf1 Clean up gpu device properties.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.

Related to !481, which already changed the API to address a static
linkage issue.


(cherry picked from commit 0eba8a1fe3)
2021-05-07 18:13:40 +00:00
Antonio Sanchez
42acbd5700 Fix numext::arg return type.
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.

Related to !477, which uncovered the issue.


(cherry picked from commit 90e9a33e1c)
2021-05-07 17:52:07 +00:00
Christoph Hertzberg
9e0dc8f09b Revert addition of unused paddsub<Packet2cf>. This fixes #2242
(cherry picked from commit 722ca0b665)
2021-05-07 16:23:03 +00:00
Antonio Sanchez
da19f7a910 Simplify TensorRandom and remove time-dependence.
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.

Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be.  This is leading to
multiple build breakages.

The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).

Fixes #1602.


(cherry picked from commit e3b7f59659)
2021-05-05 23:37:48 +00:00
Antonio Sanchez
fc2cc10842 Better CUDA complex division.
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.


(cherry picked from commit 1c013be2cc)
2021-04-29 17:58:45 +00:00
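Smith's algorithm, which fc2cc10842 switches to, divides by scaling with the larger component of the divisor so that the squared magnitude is never formed. The sketch below is a generic host-side version with a hypothetical name; it is not the CUDA code from the commit and gives inf/NaN inputs no special handling.

```cpp
#include <cmath>
#include <complex>

// Smith's algorithm for (a + bi) / (c + di): scale by the larger component of
// the divisor so that c*c + d*d is never computed. Sketch only; inf/NaN inputs
// are not given any special handling here.
template <typename T>
std::complex<T> smith_divide(const std::complex<T>& x, const std::complex<T>& y) {
  const T a = x.real(), b = x.imag();
  const T c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    const T r = d / c;
    const T den = c + d * r;
    return std::complex<T>((a + b * r) / den, (b - a * r) / den);
  }
  const T r = c / d;
  const T den = c * r + d;
  return std::complex<T>((a * r + b) / den, (b * r - a) / den);
}
```

With a subnormal divisor, a textbook formula that divides by `c*c + d*d` underflows the denominator to zero and yields `0/0`; here the denominator stays at the magnitude of the divisor itself, so `0/b` evaluates to zero.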
Antonio Sanchez
a33855f6ee Add missing pcmp_lt_or_nan for NEON Packet4bf.
(cherry picked from commit 172db7bfc3)
2021-04-27 21:15:08 +00:00
Theo Fletcher
83df5df61b Added complex matrix unit tests for SelfAdjointEigenSolver
(cherry picked from commit 2ced0cc233)
2021-04-26 19:18:53 +00:00
Jakub Lichman
ac3c5aad31 Tests added and AVX512 bug fixed for pcmp_lt_or_nan
(cherry picked from commit d87648a6be)
2021-04-26 18:07:55 +00:00
Jakub Lichman
63abb10000 Tests for pcmp_lt and pcmp_le added
(cherry picked from commit 1115f5462e)
2021-04-23 19:52:23 +00:00
Turing Eret
baf601a0e3 Fix for issue with static global variables in TensorDeviceGpu.h
m_deviceProperties and m_devicePropInitialized are defined as global
statics, so each translation unit gets its own copy. This can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which define those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.

This fixes issue #1475.


(cherry picked from commit 3804ca0d90)
2021-04-23 19:06:16 +00:00
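The static-local approach from baf601a0e3 can be illustrated as follows. The struct and function names are hypothetical simplifications; the real code stores GPU device properties rather than a simple count.

```cpp
// Hypothetical stand-in for the cached device properties.
struct DevicePropsSketch {
  bool initialized = false;
  int device_count = 0;
};

// A function-local static inside an inline function refers to one object
// program-wide, unlike a namespace-scope `static` variable in a header, which
// yields a separate copy per translation unit.
inline DevicePropsSketch& device_props_sketch() {
  static DevicePropsSketch props;
  return props;
}

inline void initialize_device_props_sketch(int count) {
  DevicePropsSketch& props = device_props_sketch();
  props.device_count = count;
  props.initialized = true;
}
```

Any translation unit that calls `device_props_sketch()` after initialization sees the same populated object, which is the property the original global statics lacked.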
Antonio Sanchez
587a691516 Check existence of BSD random before use.
`TensorRandom` currently relies on BSD `random()`, which is not always
available.  The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
    || /* Glibc since 2.19: */ _DEFAULT_SOURCE
    || /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.


(cherry picked from commit 045c0609b5)
2021-04-23 00:35:05 +00:00
Antonio Sanchez
8830d66c02 DenseStorage safely copy/swap.
Fixes #2229.

For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set.  Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.


(cherry picked from commit d213a0bcea)
2021-04-22 21:05:50 +00:00
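The copy rule described in 8830d66c02 can be sketched with a fixed-capacity buffer that tracks how many elements are in use. `BoundedStorageSketch` is a hypothetical type for illustration and is much simpler than `DenseStorage`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity buffer with a runtime element count.
template <typename T, std::size_t Capacity>
struct BoundedStorageSketch {
  T data[Capacity];
  std::size_t size;

  BoundedStorageSketch() : size(0) {}

  // Copy only the `size` elements that have been set; the tail of `data`
  // may be uninitialized and is never read.
  BoundedStorageSketch(const BoundedStorageSketch& other) : size(other.size) {
    std::copy(other.data, other.data + other.size, data);
  }

  BoundedStorageSketch& operator=(const BoundedStorageSketch& other) {
    if (this != &other) {
      size = other.size;
      std::copy(other.data, other.data + other.size, data);
    }
    return *this;
  }

  void push_back(const T& v) {
    assert(size < Capacity);
    data[size++] = v;
  }
};
```

A compiler-generated copy would read all `Capacity` elements, including ones that were never written; limiting the copy to `size` elements avoids both the extra work and the read of indeterminate values.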
Rasmus Munk Larsen
54425a39b2 Make vectorized compute_inverse_size4 compile with AVX.
(cherry picked from commit 85a76a16ea)
2021-04-22 17:25:25 +00:00
Jakub Lichman
34d0be9ec1 Compilation of basicbenchmark fixed
(cherry picked from commit d72c794ccd)
2021-04-21 12:09:42 +02:00
Jakub Lichman
42a8bdd4d7 HasExp added for AVX512 Packet8d
(cherry picked from commit 2b1dfd1ba0)
2021-04-21 12:09:21 +02:00
Chip-Kerchner
28564957ac Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).
(cherry picked from commit 06c2760bd1)
2021-04-21 01:05:21 +00:00
Antonio Sanchez
ab7fe215f9 Fix ldexp for AVX512 (#2215)
Wrong shuffle was used.  Need to interleave low/high halves with a
`permute` instruction.

Fixes #2215.


(cherry picked from commit 1d79c68ba0)
2021-04-20 20:52:26 +00:00
David Tellenbach
1f4c0311cd Bump to 3.3.91 (3.4-rc1) 2021-04-18 23:43:12 +02:00
163 changed files with 2358 additions and 5815 deletions

View File

@@ -88,6 +88,9 @@ else()
  ei_add_cxx_compiler_flag("-std=c++03")
  endif()
+ # Determine if we should build shared libraries on this platform.
+ get_cmake_property(EIGEN_BUILD_SHARED_LIBS TARGET_SUPPORTS_SHARED_LIBS)
  #############################################################################
  # find how to link to the standard libraries #
  #############################################################################
@@ -424,25 +427,26 @@ endif()
  if(EIGEN_INCLUDE_INSTALL_DIR AND NOT INCLUDE_INSTALL_DIR)
  set(INCLUDE_INSTALL_DIR ${EIGEN_INCLUDE_INSTALL_DIR}
- CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed")
+ CACHE PATH "The directory relative to CMAKE_INSTALL_PREFIX where Eigen header files are installed")
  else()
  set(INCLUDE_INSTALL_DIR
  "${CMAKE_INSTALL_INCLUDEDIR}/eigen3"
- CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed"
+ CACHE PATH "The directory relative to CMAKE_INSTALL_PREFIX where Eigen header files are installed"
  )
  endif()
  set(CMAKEPACKAGE_INSTALL_DIR
  "${CMAKE_INSTALL_DATADIR}/eigen3/cmake"
- CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen3Config.cmake is installed"
+ CACHE PATH "The directory relative to CMAKE_INSTALL_PREFIX where Eigen3Config.cmake is installed"
  )
  set(PKGCONFIG_INSTALL_DIR
  "${CMAKE_INSTALL_DATADIR}/pkgconfig"
- CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where eigen3.pc is installed"
+ CACHE PATH "The directory relative to CMAKE_INSTALL_PREFIX where eigen3.pc is installed"
  )
  foreach(var INCLUDE_INSTALL_DIR CMAKEPACKAGE_INSTALL_DIR PKGCONFIG_INSTALL_DIR)
+ # If an absolute path is specified, make it relative to "{CMAKE_INSTALL_PREFIX}".
  if(IS_ABSOLUTE "${${var}}")
- message(FATAL_ERROR "${var} must be relative to CMAKE_PREFIX_PATH. Got: ${${var}}")
+ file(RELATIVE_PATH "${var}" "${CMAKE_INSTALL_PREFIX}" "${${var}}")
  endif()
  endforeach()

View File

@@ -38,9 +38,7 @@
  #include "src/LU/Determinant.h"
  #include "src/LU/InverseImpl.h"
- // Use the SSE optimized version whenever possible. At the moment the
- // SSE version doesn't compile when AVX is enabled
- #if (defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX) || defined EIGEN_VECTORIZE_NEON
+ #if defined EIGEN_VECTORIZE_SSE || defined EIGEN_VECTORIZE_NEON
  #include "src/LU/arch/InverseSize4.h"
  #endif

View File

@@ -591,7 +591,7 @@ struct dense_assignment_loop<Kernel, SliceVectorizedTraversal, InnerUnrolling>
  enum { innerSize = DstXprType::InnerSizeAtCompileTime,
  packetSize =unpacket_traits<PacketType>::size,
- vectorizableSize = (innerSize/packetSize)*packetSize,
+ vectorizableSize = (int(innerSize) / int(packetSize)) * int(packetSize),
  size = DstXprType::SizeAtCompileTime };
  for(Index outer = 0; outer < kernel.outerSize(); ++outer)
@@ -785,6 +785,16 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType
  dense_assignment_loop<Kernel>::run(kernel);
  }
+ // Specialization for filling the destination with a constant value.
+ #ifndef EIGEN_GPU_COMPILE_PHASE
+ template<typename DstXprType>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType& dst, const Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<typename DstXprType::Scalar>, DstXprType>& src, const internal::assign_op<typename DstXprType::Scalar,typename DstXprType::Scalar>& func)
+ {
+ resize_if_allowed(dst, src, func);
+ std::fill_n(dst.data(), dst.size(), src.functor()());
+ }
+ #endif
  template<typename DstXprType, typename SrcXprType>
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType& dst, const SrcXprType& src)
  {

View File

@@ -67,7 +67,7 @@ class BandMatrixBase : public EigenBase<Derived>
* \warning the internal storage must be column major. */
inline Block<CoefficientsType,Dynamic,1> col(Index i)
{
EIGEN_STATIC_ASSERT((Options&RowMajor)==0,THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
EIGEN_STATIC_ASSERT((int(Options) & int(RowMajor)) == 0, THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
Index start = 0;
Index len = coeffs().rows();
if (i<=supers())
@@ -90,7 +90,7 @@ class BandMatrixBase : public EigenBase<Derived>
template<int Index> struct DiagonalIntReturnType {
enum {
ReturnOpposite = (Options&SelfAdjoint) && (((Index)>0 && Supers==0) || ((Index)<0 && Subs==0)),
ReturnOpposite = (int(Options) & int(SelfAdjoint)) && (((Index) > 0 && Supers == 0) || ((Index) < 0 && Subs == 0)),
Conjugate = ReturnOpposite && NumTraits<Scalar>::IsComplex,
ActualIndex = ReturnOpposite ? -Index : Index,
DiagonalSize = (RowsAtCompileTime==Dynamic || ColsAtCompileTime==Dynamic)
@@ -192,7 +192,7 @@ struct traits<BandMatrix<_Scalar,_Rows,_Cols,_Supers,_Subs,_Options> >
Options = _Options,
DataRowsAtCompileTime = ((Supers!=Dynamic) && (Subs!=Dynamic)) ? 1 + Supers + Subs : Dynamic
};
typedef Matrix<Scalar,DataRowsAtCompileTime,ColsAtCompileTime,Options&RowMajor?RowMajor:ColMajor> CoefficientsType;
typedef Matrix<Scalar, DataRowsAtCompileTime, ColsAtCompileTime, int(Options) & int(RowMajor) ? RowMajor : ColMajor> CoefficientsType;
};
template<typename _Scalar, int Rows, int Cols, int Supers, int Subs, int Options>
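Several hunks in this comparison wrap enumerators in `int(...)` before combining them. The diff does not state the motivation; one plausible reason is that C++20 deprecates arithmetic and bitwise operations between different enumeration types. The sketch below shows the pattern with two hypothetical enums whose names and values are made up for illustration.

```cpp
// Illustrative flag enums; hypothetical names, not Eigen's definitions.
enum StorageOptions { ColMajorOpt = 0, RowMajorOpt = 0x1 };
enum MatrixOptions { PlainOpts = 0, RowMajorStored = 0x1, AutoAlignOff = 0x2 };

// Casting both operands to int turns the test into an ordinary int
// operation instead of an operation between two different enum types.
constexpr bool is_row_major(MatrixOptions opts) {
  return (int(opts) & int(RowMajorOpt)) != 0;
}

static_assert(is_row_major(RowMajorStored), "row-major bit is set");
static_assert(!is_row_major(AutoAlignOff), "row-major bit is clear");
```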


@@ -81,7 +81,7 @@ EIGEN_DEVICE_FUNC inline bool DenseBase<Derived>::all() const
typedef internal::evaluator<Derived> Evaluator;
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * (Evaluator::CoeffReadCost + NumTraits<Scalar>::AddCost) <= EIGEN_UNROLLING_LIMIT
&& SizeAtCompileTime * (int(Evaluator::CoeffReadCost) + int(NumTraits<Scalar>::AddCost)) <= EIGEN_UNROLLING_LIMIT
};
Evaluator evaluator(derived());
if(unroll)
@@ -105,7 +105,7 @@ EIGEN_DEVICE_FUNC inline bool DenseBase<Derived>::any() const
typedef internal::evaluator<Derived> Evaluator;
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * (Evaluator::CoeffReadCost + NumTraits<Scalar>::AddCost) <= EIGEN_UNROLLING_LIMIT
&& SizeAtCompileTime * (int(Evaluator::CoeffReadCost) + int(NumTraits<Scalar>::AddCost)) <= EIGEN_UNROLLING_LIMIT
};
Evaluator evaluator(derived());
if(unroll)


@@ -561,7 +561,7 @@ struct unary_evaluator<CwiseUnaryOp<UnaryOp, ArgType>, IndexBased >
typedef CwiseUnaryOp<UnaryOp, ArgType> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost + functor_traits<UnaryOp>::Cost,
CoeffReadCost = int(evaluator<ArgType>::CoeffReadCost) + int(functor_traits<UnaryOp>::Cost),
Flags = evaluator<ArgType>::Flags
& (HereditaryBits | LinearAccessBit | (functor_traits<UnaryOp>::PacketAccess ? PacketAccessBit : 0)),
@@ -606,13 +606,13 @@ struct unary_evaluator<CwiseUnaryOp<UnaryOp, ArgType>, IndexBased >
protected:
// this helper permits to completely eliminate the functor if it is empty
class Data : private UnaryOp
struct Data
{
public:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : UnaryOp(xpr.functor()), argImpl(xpr.nestedExpression()) {}
Data(const XprType& xpr) : op(xpr.functor()), argImpl(xpr.nestedExpression()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const UnaryOp& func() const { return static_cast<const UnaryOp&>(*this); }
const UnaryOp& func() const { return op; }
UnaryOp op;
evaluator<ArgType> argImpl;
};
@@ -639,7 +639,7 @@ struct ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>, IndexBased
typedef CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> XprType;
enum {
CoeffReadCost = evaluator<Arg1>::CoeffReadCost + evaluator<Arg2>::CoeffReadCost + evaluator<Arg3>::CoeffReadCost + functor_traits<TernaryOp>::Cost,
CoeffReadCost = int(evaluator<Arg1>::CoeffReadCost) + int(evaluator<Arg2>::CoeffReadCost) + int(evaluator<Arg3>::CoeffReadCost) + int(functor_traits<TernaryOp>::Cost),
Arg1Flags = evaluator<Arg1>::Flags,
Arg2Flags = evaluator<Arg2>::Flags,
@@ -700,12 +700,13 @@ struct ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>, IndexBased
protected:
// this helper permits to completely eliminate the functor if it is empty
struct Data : private TernaryOp
struct Data
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : TernaryOp(xpr.functor()), arg1Impl(xpr.arg1()), arg2Impl(xpr.arg2()), arg3Impl(xpr.arg3()) {}
Data(const XprType& xpr) : op(xpr.functor()), arg1Impl(xpr.arg1()), arg2Impl(xpr.arg2()), arg3Impl(xpr.arg3()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const TernaryOp& func() const { return static_cast<const TernaryOp&>(*this); }
const TernaryOp& func() const { return op; }
TernaryOp op;
evaluator<Arg1> arg1Impl;
evaluator<Arg2> arg2Impl;
evaluator<Arg3> arg3Impl;
@@ -735,7 +736,7 @@ struct binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs>, IndexBased, IndexBase
typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> XprType;
enum {
CoeffReadCost = evaluator<Lhs>::CoeffReadCost + evaluator<Rhs>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
CoeffReadCost = int(evaluator<Lhs>::CoeffReadCost) + int(evaluator<Rhs>::CoeffReadCost) + int(functor_traits<BinaryOp>::Cost),
LhsFlags = evaluator<Lhs>::Flags,
RhsFlags = evaluator<Rhs>::Flags,
@@ -793,12 +794,13 @@ struct binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs>, IndexBased, IndexBase
protected:
// this helper permits to completely eliminate the functor if it is empty
struct Data : private BinaryOp
struct Data
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : BinaryOp(xpr.functor()), lhsImpl(xpr.lhs()), rhsImpl(xpr.rhs()) {}
Data(const XprType& xpr) : op(xpr.functor()), lhsImpl(xpr.lhs()), rhsImpl(xpr.rhs()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const BinaryOp& func() const { return static_cast<const BinaryOp&>(*this); }
const BinaryOp& func() const { return op; }
BinaryOp op;
evaluator<Lhs> lhsImpl;
evaluator<Rhs> rhsImpl;
};
@@ -815,7 +817,7 @@ struct unary_evaluator<CwiseUnaryView<UnaryOp, ArgType>, IndexBased>
typedef CwiseUnaryView<UnaryOp, ArgType> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost + functor_traits<UnaryOp>::Cost,
CoeffReadCost = int(evaluator<ArgType>::CoeffReadCost) + int(functor_traits<UnaryOp>::Cost),
Flags = (evaluator<ArgType>::Flags & (HereditaryBits | LinearAccessBit | DirectAccessBit)),
@@ -858,12 +860,13 @@ struct unary_evaluator<CwiseUnaryView<UnaryOp, ArgType>, IndexBased>
protected:
// this helper permits to completely eliminate the functor if it is empty
struct Data : private UnaryOp
struct Data
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : UnaryOp(xpr.functor()), argImpl(xpr.nestedExpression()) {}
Data(const XprType& xpr) : op(xpr.functor()), argImpl(xpr.nestedExpression()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const UnaryOp& func() const { return static_cast<const UnaryOp&>(*this); }
const UnaryOp& func() const { return op; }
UnaryOp op;
evaluator<ArgType> argImpl;
};
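The `Data` helpers above appear in two shapes: the functor as a private base class, and the functor as a named `op` member. The sketch below contrasts the two with a hypothetical stateless functor. With a private base, the empty base optimization can make an empty functor occupy no storage; with a member it always occupies at least one byte. All names here are illustrative only.

```cpp
#include <cstddef>

// Hypothetical stateless functor.
struct EmptyFunctor {
  int operator()(int x) const { return -x; }
};

// Variant 1: functor as a private base, so an empty functor can take no space.
struct DataViaBase : private EmptyFunctor {
  DataViaBase(const EmptyFunctor& f, int arg) : EmptyFunctor(f), argImpl(arg) {}
  const EmptyFunctor& func() const { return static_cast<const EmptyFunctor&>(*this); }
  int argImpl;
};

// Variant 2: functor as a named member; simpler access, but the member
// has its own address and therefore a non-zero size.
struct DataViaMember {
  DataViaMember(const EmptyFunctor& f, int arg) : op(f), argImpl(arg) {}
  const EmptyFunctor& func() const { return op; }
  EmptyFunctor op;
  int argImpl;
};
```

Both variants expose the same `func()` accessor, so callers are unaffected by which layout is used.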


@@ -102,7 +102,7 @@ class CwiseBinaryOp :
#if EIGEN_COMP_MSVC && EIGEN_HAS_CXX11
//Required for Visual Studio or the Copy constructor will probably not get inlined!
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
EIGEN_STRONG_INLINE
CwiseBinaryOp(const CwiseBinaryOp<BinaryOp,LhsType,RhsType>&) = default;
#endif


@@ -163,6 +163,30 @@ struct plain_array<T, 0, MatrixOrArrayOptions, Alignment>
EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) {}
};
struct plain_array_helper {
template<typename T, int Size, int MatrixOrArrayOptions, int Alignment>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
static void copy(const plain_array<T, Size, MatrixOrArrayOptions, Alignment>& src, const Eigen::Index size,
plain_array<T, Size, MatrixOrArrayOptions, Alignment>& dst) {
smart_copy(src.array, src.array + size, dst.array);
}
template<typename T, int Size, int MatrixOrArrayOptions, int Alignment>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
static void swap(plain_array<T, Size, MatrixOrArrayOptions, Alignment>& a, const Eigen::Index a_size,
plain_array<T, Size, MatrixOrArrayOptions, Alignment>& b, const Eigen::Index b_size) {
if (a_size < b_size) {
std::swap_ranges(b.array, b.array + a_size, a.array);
smart_move(b.array + a_size, b.array + b_size, a.array + a_size);
} else if (a_size > b_size) {
std::swap_ranges(a.array, a.array + b_size, b.array);
smart_move(a.array + b_size, a.array + a_size, b.array + b_size);
} else {
std::swap_ranges(a.array, a.array + a_size, b.array);
}
}
};
} // end namespace internal
/** \internal
@@ -190,17 +214,26 @@ template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseSt
EIGEN_DEVICE_FUNC
explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()) {}
#if !EIGEN_HAS_CXX11 || defined(EIGEN_DENSE_STORAGE_CTOR_PLUGIN)
EIGEN_DEVICE_FUNC
DenseStorage(const DenseStorage& other) : m_data(other.m_data) {
EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(Index size = Size)
}
#else
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage&) = default;
#endif
#if !EIGEN_HAS_CXX11
EIGEN_DEVICE_FUNC
DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other) m_data = other.m_data;
return *this;
}
#else
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage&) = default;
#endif
#if EIGEN_HAS_RVALUE_REFERENCES
#if !EIGEN_HAS_CXX11
EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT
: m_data(std::move(other.m_data))
{
@@ -211,6 +244,10 @@ template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseSt
m_data = std::move(other.m_data);
return *this;
}
#else
EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&&) = default;
EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&&) = default;
#endif
#endif
EIGEN_DEVICE_FUNC DenseStorage(Index size, Index rows, Index cols) {
EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({})
@@ -268,21 +305,25 @@ template<typename T, int Size, int _Options> class DenseStorage<T, Size, Dynamic
EIGEN_DEVICE_FUNC DenseStorage() : m_rows(0), m_cols(0) {}
EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0), m_cols(0) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_rows(other.m_rows), m_cols(other.m_cols) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(other.m_rows), m_cols(other.m_cols)
{
internal::plain_array_helper::copy(other.m_data, m_rows * m_cols, m_data);
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other)
{
m_data = other.m_data;
m_rows = other.m_rows;
m_cols = other.m_cols;
internal::plain_array_helper::copy(other.m_data, m_rows * m_cols, m_data);
}
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index cols) : m_rows(rows), m_cols(cols) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
{
numext::swap(m_data,other.m_data);
internal::plain_array_helper::swap(m_data, m_rows * m_cols, other.m_data, other.m_rows * other.m_cols);
numext::swap(m_rows,other.m_rows);
numext::swap(m_cols,other.m_cols);
}
@@ -303,21 +344,26 @@ template<typename T, int Size, int _Cols, int _Options> class DenseStorage<T, Si
EIGEN_DEVICE_FUNC DenseStorage() : m_rows(0) {}
EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_rows(other.m_rows) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(other.m_rows)
{
internal::plain_array_helper::copy(other.m_data, m_rows * _Cols, m_data);
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other)
{
m_data = other.m_data;
m_rows = other.m_rows;
internal::plain_array_helper::copy(other.m_data, m_rows * _Cols, m_data);
}
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index) : m_rows(rows) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
{
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
{
internal::plain_array_helper::swap(m_data, m_rows * _Cols, other.m_data, other.m_rows * _Cols);
numext::swap(m_rows, other.m_rows);
}
EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols(void) const EIGEN_NOEXCEPT {return _Cols;}
@@ -336,20 +382,24 @@ template<typename T, int Size, int _Rows, int _Options> class DenseStorage<T, Si
EIGEN_DEVICE_FUNC DenseStorage() : m_cols(0) {}
EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()), m_cols(0) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_cols(other.m_cols) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other)
: m_data(internal::constructor_without_unaligned_array_assert()), m_cols(other.m_cols)
{
internal::plain_array_helper::copy(other.m_data, _Rows * m_cols, m_data);
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other)
{
m_data = other.m_data;
m_cols = other.m_cols;
internal::plain_array_helper::copy(other.m_data, _Rows * m_cols, m_data);
}
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index, Index cols) : m_cols(cols) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) {
numext::swap(m_data,other.m_data);
numext::swap(m_cols,other.m_cols);
internal::plain_array_helper::swap(m_data, _Rows * m_cols, other.m_data, _Rows * other.m_cols);
numext::swap(m_cols, other.m_cols);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows(void) const EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;}
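The `plain_array_helper::swap` calls above exchange only the active part of two fixed-capacity buffers whose runtime sizes may differ. A minimal sketch of the same idea, assuming `std::array` as the storage and a hypothetical function name `swap_active`:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Swap the active prefixes of two fixed-capacity buffers. Only the first
// a_size elements of `a` and the first b_size elements of `b` hold live data.
template <typename T, std::size_t N>
void swap_active(std::array<T, N>& a, std::size_t a_size,
                 std::array<T, N>& b, std::size_t b_size) {
  const std::size_t common = std::min(a_size, b_size);
  // Exchange the part that both buffers have in common.
  std::swap_ranges(a.begin(), a.begin() + common, b.begin());
  // Move the remaining tail of the longer buffer into the shorter one.
  if (a_size > b_size) {
    std::move(a.begin() + b_size, a.begin() + a_size, b.begin() + b_size);
  } else if (b_size > a_size) {
    std::move(b.begin() + a_size, b.begin() + b_size, a.begin() + a_size);
  }
}
```

The caller is still responsible for swapping the size fields afterwards, as the `numext::swap(m_rows, ...)` and `numext::swap(m_cols, ...)` lines do in the diff.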


@@ -86,7 +86,7 @@ MatrixBase<Derived>::dot(const MatrixBase<OtherDerived>& other) const
//---------- implementation of L2 norm and related functions ----------
/** \returns, for vectors, the squared \em l2 norm of \c *this, and for matrices the Frobenius norm.
/** \returns, for vectors, the squared \em l2 norm of \c *this, and for matrices the squared Frobenius norm.
* In both cases, it consists of the sum of the squares of all the matrix entries.
* For vectors, this is also equal to the dot product of \c *this with itself.
*
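The documented identity (sum of squared entries, equal to the dot product of a vector with itself) can be checked with a small standalone sketch. It handles real entries only and uses hypothetical free functions rather than Eigen's `squaredNorm()` and `dot()`.

```cpp
#include <cstddef>
#include <vector>

// Sum of the squares of all entries: the squared l2 norm for a vector, or the
// squared Frobenius norm when the entries are those of a matrix.
inline double squared_norm(const std::vector<double>& x) {
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * x[i];
  return sum;
}

// Dot product of two real vectors; assumes both have the same length.
inline double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}
```

For the vector (3, 4) both functions give 25, whereas the (non-squared) norm is 5.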


@@ -129,6 +129,22 @@ template<typename T> struct packet_traits : default_packet_traits
template<typename T> struct packet_traits<const T> : packet_traits<T> { };
template<typename T> struct unpacket_traits
{
typedef T type;
typedef T half;
enum
{
size = 1,
alignment = 1,
vectorizable = false,
masked_load_available=false,
masked_store_available=false
};
};
template<typename T> struct unpacket_traits<const T> : unpacket_traits<T> { };
template <typename Src, typename Tgt> struct type_casting_traits {
enum {
VectorizedCast = 0,
@@ -154,6 +170,18 @@ struct eigen_packet_wrapper
T m_val;
};
/** \internal A convenience utility for determining if the type is a scalar.
* This is used to enable some generic packet implementations.
*/
template<typename Packet>
struct is_scalar {
typedef typename unpacket_traits<Packet>::type Scalar;
enum {
value = internal::is_same<Packet, Scalar>::value
};
};
/** \internal \returns static_cast<TgtType>(a) (coeff-wise) */
template <typename SrcPacket, typename TgtPacket>
EIGEN_DEVICE_FUNC inline TgtPacket
@@ -215,13 +243,59 @@ pmul(const bool& a, const bool& b) { return a && b; }
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pdiv(const Packet& a, const Packet& b) { return a/b; }
/** \internal \returns one bits */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
ptrue(const Packet& /*a*/) { Packet b; memset((void*)&b, 0xff, sizeof(b)); return b;}
// In the generic case, memset to all one bits.
template<typename Packet, typename EnableIf = void>
struct ptrue_impl {
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& /*a*/){
Packet b;
memset(static_cast<void*>(&b), 0xff, sizeof(Packet));
return b;
}
};
/** \internal \returns zero bits */
// For non-trivial scalars, set to Scalar(1) (i.e. a non-zero value).
// Although this is technically not a valid bitmask, the scalar path for pselect
// uses a comparison to zero, so this should still work in most cases. We don't
// have another option, since the scalar type requires initialization.
template<typename T>
struct ptrue_impl<T,
typename internal::enable_if<is_scalar<T>::value && NumTraits<T>::RequireInitialization>::type > {
static EIGEN_DEVICE_FUNC inline T run(const T& /*a*/){
return T(1);
}
};
/** \internal \returns one bits. */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pzero(const Packet& /*a*/) { Packet b; memset((void*)&b, 0, sizeof(b)); return b;}
ptrue(const Packet& a) {
return ptrue_impl<Packet>::run(a);
}
// In the general case, memset to zero.
template<typename Packet, typename EnableIf = void>
struct pzero_impl {
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& /*a*/) {
Packet b;
memset(static_cast<void*>(&b), 0x00, sizeof(Packet));
return b;
}
};
// For scalars, explicitly set to Scalar(0), since the underlying representation
// for zero may not consist of all-zero bits.
template<typename T>
struct pzero_impl<T,
typename internal::enable_if<is_scalar<T>::value>::type> {
static EIGEN_DEVICE_FUNC inline T run(const T& /*a*/) {
return T(0);
}
};
/** \internal \returns packet of zeros */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pzero(const Packet& a) {
return pzero_impl<Packet>::run(a);
}
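The `pzero_impl` pair above picks an implementation by type: a `memset` for general packets and an explicit `T(0)` for scalars. The sketch below reproduces that dispatch with the standard library only. `std::is_arithmetic` stands in for Eigen's `is_scalar`, and `Packet4` is a hypothetical packet type; both are assumptions of this example.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical packet type: four floats treated as one unit.
struct Packet4 { float v[4]; };

// General case: zero every byte of the packet.
template <typename Packet, typename Enable = void>
struct zero_impl {
  static Packet run() {
    Packet b;
    std::memset(static_cast<void*>(&b), 0x00, sizeof(Packet));
    return b;
  }
};

// Scalar case: construct T(0) explicitly, since zero need not be
// represented by all-zero bits for every scalar type.
template <typename T>
struct zero_impl<T, typename std::enable_if<std::is_arithmetic<T>::value>::type> {
  static T run() { return T(0); }
};

// Thin front end, analogous in spirit to the pzero wrapper.
template <typename Packet>
Packet make_zero() { return zero_impl<Packet>::run(); }
```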
/** \internal \returns a <= b as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -238,33 +312,6 @@ pcmp_eq(const Packet& a, const Packet& b) { return a==b ? ptrue(a) : pzero(a); }
/** \internal \returns a < b or a==NaN or b==NaN as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pcmp_lt_or_nan(const Packet& a, const Packet& b) { return a>=b ? pzero(a) : ptrue(a); }
template<> EIGEN_DEVICE_FUNC inline float pzero<float>(const float& a) {
EIGEN_UNUSED_VARIABLE(a)
return 0.f;
}
template<> EIGEN_DEVICE_FUNC inline double pzero<double>(const double& a) {
EIGEN_UNUSED_VARIABLE(a)
return 0.;
}
template <typename RealScalar>
EIGEN_DEVICE_FUNC inline std::complex<RealScalar> ptrue(const std::complex<RealScalar>& /*a*/) {
RealScalar b = ptrue(RealScalar(0));
return std::complex<RealScalar>(b, b);
}
template <typename Packet, typename Op>
EIGEN_DEVICE_FUNC inline Packet bitwise_helper(const Packet& a, const Packet& b, Op op) {
const unsigned char* a_ptr = reinterpret_cast<const unsigned char*>(&a);
const unsigned char* b_ptr = reinterpret_cast<const unsigned char*>(&b);
Packet c;
unsigned char* c_ptr = reinterpret_cast<unsigned char*>(&c);
for (size_t i = 0; i < sizeof(Packet); ++i) {
*c_ptr++ = op(*a_ptr++, *b_ptr++);
}
return c;
}
template<typename T>
struct bit_and {
@@ -287,42 +334,123 @@ struct bit_xor {
}
};
template<typename T>
struct bit_not {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a) const {
return ~a;
}
};
// Use operators &, |, ^, ~.
template<typename T>
struct operator_bitwise_helper {
EIGEN_DEVICE_FUNC static inline T bitwise_and(const T& a, const T& b) { return bit_and<T>()(a, b); }
EIGEN_DEVICE_FUNC static inline T bitwise_or(const T& a, const T& b) { return bit_or<T>()(a, b); }
EIGEN_DEVICE_FUNC static inline T bitwise_xor(const T& a, const T& b) { return bit_xor<T>()(a, b); }
EIGEN_DEVICE_FUNC static inline T bitwise_not(const T& a) { return bit_not<T>()(a); }
};
// Apply binary operations byte-by-byte
template<typename T>
struct bytewise_bitwise_helper {
EIGEN_DEVICE_FUNC static inline T bitwise_and(const T& a, const T& b) {
return binary(a, b, bit_and<unsigned char>());
}
EIGEN_DEVICE_FUNC static inline T bitwise_or(const T& a, const T& b) {
return binary(a, b, bit_or<unsigned char>());
}
EIGEN_DEVICE_FUNC static inline T bitwise_xor(const T& a, const T& b) {
return binary(a, b, bit_xor<unsigned char>());
}
EIGEN_DEVICE_FUNC static inline T bitwise_not(const T& a) {
return unary(a,bit_not<unsigned char>());
}
private:
template<typename Op>
EIGEN_DEVICE_FUNC static inline T unary(const T& a, Op op) {
const unsigned char* a_ptr = reinterpret_cast<const unsigned char*>(&a);
T c;
unsigned char* c_ptr = reinterpret_cast<unsigned char*>(&c);
for (size_t i = 0; i < sizeof(T); ++i) {
*c_ptr++ = op(*a_ptr++);
}
return c;
}
template<typename Op>
EIGEN_DEVICE_FUNC static inline T binary(const T& a, const T& b, Op op) {
const unsigned char* a_ptr = reinterpret_cast<const unsigned char*>(&a);
const unsigned char* b_ptr = reinterpret_cast<const unsigned char*>(&b);
T c;
unsigned char* c_ptr = reinterpret_cast<unsigned char*>(&c);
for (size_t i = 0; i < sizeof(T); ++i) {
*c_ptr++ = op(*a_ptr++, *b_ptr++);
}
return c;
}
};
// In the general case, use byte-by-byte manipulation.
template<typename T, typename EnableIf = void>
struct bitwise_helper : public bytewise_bitwise_helper<T> {};
// For integers or non-trivial scalars, use binary operators.
template<typename T>
struct bitwise_helper<T,
typename internal::enable_if<
is_scalar<T>::value && (NumTraits<T>::IsInteger || NumTraits<T>::RequireInitialization)>::type
> : public operator_bitwise_helper<T> {};
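The byte-by-byte fallback above applies a bit operation to a type that has no bitwise operators of its own, such as `float`. The sketch below shows the same idea for trivially copyable types. It copies through `std::memcpy` rather than walking `reinterpret_cast` pointers as the diff does; that substitution, and the names `bytewise_binary`, `ByteAnd` and `ByteXor`, are choices of this example.

```cpp
#include <cstddef>
#include <cstring>

// Apply a byte-level operation to two trivially copyable values of type T.
template <typename T, typename Op>
T bytewise_binary(const T& a, const T& b, Op op) {
  unsigned char a_bytes[sizeof(T)];
  unsigned char b_bytes[sizeof(T)];
  unsigned char c_bytes[sizeof(T)];
  std::memcpy(a_bytes, &a, sizeof(T));
  std::memcpy(b_bytes, &b, sizeof(T));
  for (std::size_t i = 0; i < sizeof(T); ++i) c_bytes[i] = op(a_bytes[i], b_bytes[i]);
  T c;
  std::memcpy(&c, c_bytes, sizeof(T));
  return c;
}

struct ByteAnd {
  unsigned char operator()(unsigned char x, unsigned char y) const {
    return static_cast<unsigned char>(x & y);
  }
};

struct ByteXor {
  unsigned char operator()(unsigned char x, unsigned char y) const {
    return static_cast<unsigned char>(x ^ y);
  }
};
```

AND-ing a value with an all-ones mask returns the value unchanged, and XOR-ing a value with itself yields all-zero bytes.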
/** \internal \returns the bitwise and of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pand(const Packet& a, const Packet& b) {
return bitwise_helper(a, b, bit_and<unsigned char>());
return bitwise_helper<Packet>::bitwise_and(a, b);
}
/** \internal \returns the bitwise or of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
por(const Packet& a, const Packet& b) {
return bitwise_helper(a ,b, bit_or<unsigned char>());
return bitwise_helper<Packet>::bitwise_or(a, b);
}
/** \internal \returns the bitwise xor of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pxor(const Packet& a, const Packet& b) {
return bitwise_helper(a ,b, bit_xor<unsigned char>());
return bitwise_helper<Packet>::bitwise_xor(a, b);
}
/** \internal \returns the bitwise not of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pnot(const Packet& a) {
return bitwise_helper<Packet>::bitwise_not(a);
}
/** \internal \returns the bitwise and of \a a and not \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pandnot(const Packet& a, const Packet& b) { return pand(a, pxor(ptrue(b), b)); }
pandnot(const Packet& a, const Packet& b) { return pand(a, pnot(b)); }
// In the general case, use bitwise select.
template<typename Packet, typename EnableIf = void>
struct pselect_impl {
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& mask, const Packet& a, const Packet& b) {
return por(pand(a,mask),pandnot(b,mask));
}
};
// For scalars, use ternary select.
template<typename Packet>
struct pselect_impl<Packet,
typename internal::enable_if<is_scalar<Packet>::value>::type > {
static EIGEN_DEVICE_FUNC inline Packet run(const Packet& mask, const Packet& a, const Packet& b) {
return numext::equal_strict(mask, Packet(0)) ? b : a;
}
};
/** \internal \returns \a a or \a b for each field in packet according to \a mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pselect(const Packet& mask, const Packet& a, const Packet& b) {
return por(pand(a,mask),pandnot(b,mask));
}
template<> EIGEN_DEVICE_FUNC inline float pselect<float>(
const float& cond, const float& a, const float&b) {
return numext::equal_strict(cond,0.f) ? b : a;
}
template<> EIGEN_DEVICE_FUNC inline double pselect<double>(
const double& cond, const double& a, const double& b) {
return numext::equal_strict(cond,0.) ? b : a;
return pselect_impl<Packet>::run(mask, a, b);
}
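The two `pselect_impl` variants above differ in how they read the mask: bit by bit for packets, and as a comparison against zero for scalars. A minimal sketch of both rules, using `std::uint32_t` as a stand-in for a packet and hypothetical function names:

```cpp
#include <cstdint>

// Bitwise select: take bits of `a` where the mask bits are 1, bits of `b` elsewhere.
inline std::uint32_t select_bits(std::uint32_t mask, std::uint32_t a, std::uint32_t b) {
  return (a & mask) | (b & ~mask);
}

// Scalar select: any non-zero mask picks `a`, a zero mask picks `b`.
template <typename T>
T select_scalar(const T& mask, const T& a, const T& b) {
  return mask == T(0) ? b : a;
}
```

The scalar rule is why a "true" value of `T(1)` is sufficient for non-trivial scalars, as the comment on `ptrue_impl` notes: only the comparison with zero matters, not the bit pattern.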
template<> EIGEN_DEVICE_FUNC inline bool pselect<bool>(


@@ -47,7 +47,7 @@ private:
* \brief A matrix or vector expression mapping an existing array of data.
*
* \tparam PlainObjectType the equivalent matrix type of the mapped data
* \tparam MapOptions specifies the pointer alignment in bytes. It can be: \c #Aligned128, , \c #Aligned64, \c #Aligned32, \c #Aligned16, \c #Aligned8 or \c #Unaligned.
* \tparam MapOptions specifies the pointer alignment in bytes. It can be: \c #Aligned128, \c #Aligned64, \c #Aligned32, \c #Aligned16, \c #Aligned8 or \c #Unaligned.
* The default is \c #Unaligned.
* \tparam StrideType optionally specifies strides. By default, Map assumes the memory layout
* of an ordinary, contiguous array. This can be overridden by specifying strides.


@@ -2,6 +2,7 @@
// for linear algebra.
//
// Copyright (C) 2006-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
// Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
@@ -260,19 +261,8 @@ struct conj_default_impl<Scalar,true>
}
};
template<typename Scalar> struct conj_impl : conj_default_impl<Scalar> {};
#if defined(EIGEN_GPU_COMPILE_PHASE)
template<typename T>
struct conj_impl<std::complex<T> >
{
EIGEN_DEVICE_FUNC
static inline std::complex<T> run(const std::complex<T>& x)
{
return std::complex<T>(x.real(), -x.imag());
}
};
#endif
template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
struct conj_impl : conj_default_impl<Scalar, IsComplex> {};
template<typename Scalar>
struct conj_retval
@@ -582,7 +572,9 @@ struct rint_retval
* Implementation of arg *
****************************************************************************/
#if EIGEN_HAS_CXX11_MATH
// Visual Studio 2017 has a bug where arg(float) returns 0 for negative inputs.
// This seems to be fixed in VS 2019.
#if EIGEN_HAS_CXX11_MATH && (!EIGEN_COMP_MSVC || EIGEN_COMP_MSVC >= 1920)
// std::arg is only defined for types of std::complex, or integer types or float/double/long double
template<typename Scalar,
bool HasStdImpl = NumTraits<Scalar>::IsComplex || is_integral<Scalar>::value
@@ -592,8 +584,9 @@ struct arg_default_impl;
template<typename Scalar>
struct arg_default_impl<Scalar, true> {
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
static inline RealScalar run(const Scalar& x)
{
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// HIP does not seem to have a native device side implementation for the math routine "arg"
@@ -601,7 +594,7 @@ struct arg_default_impl<Scalar, true> {
#else
EIGEN_USING_STD(arg);
#endif
return static_cast<Scalar>(arg(x));
return static_cast<RealScalar>(arg(x));
}
};
@@ -612,7 +605,7 @@ struct arg_default_impl<Scalar, false> {
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
return (x < Scalar(0)) ? Scalar(EIGEN_PI) : Scalar(0);
return (x < Scalar(0)) ? RealScalar(EIGEN_PI) : RealScalar(0);
}
};
#else
@@ -623,7 +616,7 @@ struct arg_default_impl
EIGEN_DEVICE_FUNC
static inline RealScalar run(const Scalar& x)
{
return (x < Scalar(0)) ? Scalar(EIGEN_PI) : Scalar(0);
return (x < RealScalar(0)) ? RealScalar(EIGEN_PI) : RealScalar(0);
}
};
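The fallback above treats a real number as a point on the real axis: its argument is pi for negative values and 0 otherwise. A standalone sketch of that rule, with a hypothetical name `real_arg` and pi computed from `std::acos` instead of `EIGEN_PI`:

```cpp
#include <cmath>
#include <complex>

// Argument (phase angle) of a real number: pi for negative inputs, 0 otherwise.
template <typename Real>
Real real_arg(const Real& x) {
  const Real pi = std::acos(Real(-1));
  return (x < Real(0)) ? pi : Real(0);
}
```

The result should agree with `std::arg` applied to the same value promoted to `std::complex`, which is the behaviour the VS 2017 workaround is meant to restore.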
@@ -697,6 +690,30 @@ struct expm1_retval
typedef Scalar type;
};
/****************************************************************************
* Implementation of log *
****************************************************************************/
// Complex log defined in MathFunctionsImpl.h.
template<typename T> EIGEN_DEVICE_FUNC std::complex<T> complex_log(const std::complex<T>& z);
template<typename Scalar>
struct log_impl {
EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x)
{
EIGEN_USING_STD(log);
return static_cast<Scalar>(log(x));
}
};
template<typename Scalar>
struct log_impl<std::complex<Scalar> > {
EIGEN_DEVICE_FUNC static inline std::complex<Scalar> run(const std::complex<Scalar>& z)
{
return complex_log(z);
}
};
/****************************************************************************
* Implementation of log1p *
****************************************************************************/
@@ -710,7 +727,7 @@ namespace std_fallback {
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_USING_STD(log);
Scalar x1p = RealScalar(1) + x;
Scalar log_1p = log(x1p);
Scalar log_1p = log_impl<Scalar>::run(x1p);
const bool is_small = numext::equal_strict(x1p, Scalar(1));
const bool is_inf = numext::equal_strict(x1p, log_1p);
return (is_small || is_inf) ? x : x * (log_1p / (x1p - RealScalar(1)));
@@ -1470,8 +1487,7 @@ T rsqrt(const T& x)
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T log(const T &x) {
EIGEN_USING_STD(log);
return static_cast<T>(log(x));
return internal::log_impl<T>::run(x);
}
#if defined(SYCL_DEVICE_ONLY)
@@ -2022,6 +2038,18 @@ struct rsqrt_impl {
}
};
#if defined(EIGEN_GPU_COMPILE_PHASE)
template<typename T>
struct conj_impl<std::complex<T>, true>
{
EIGEN_DEVICE_FUNC
static inline std::complex<T> run(const std::complex<T>& x)
{
return std::complex<T>(numext::real(x), -numext::imag(x));
}
};
#endif
} // end namespace internal
} // end namespace Eigen
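The `log1p` fallback touched earlier in this file computes `log(1 + x)` and then rescales it by `x / ((1 + x) - 1)`, which cancels most of the rounding error introduced by forming `1 + x`. A standalone sketch for real floating-point types, using `std::log` directly instead of `log_impl` and a hypothetical function name:

```cpp
#include <cmath>

// log(1 + x) with compensation for the rounding error made when forming 1 + x.
template <typename T>
T log1p_fallback(const T& x) {
  const T x1p = T(1) + x;
  const T log_1p = std::log(x1p);
  const bool is_small = (x1p == T(1));   // x vanished when added to 1
  const bool is_inf = (x1p == log_1p);   // both sides are +infinity
  return (is_small || is_inf) ? x : x * (log_1p / (x1p - T(1)));
}
```

For tiny inputs such as `1e-20` the function returns `x` itself, and for moderately small inputs it should stay close to `std::log1p`. This assumes strict IEEE evaluation, that is, no fast-math style optimizations.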


@@ -184,6 +184,15 @@ EIGEN_DEVICE_FUNC std::complex<T> complex_rsqrt(const std::complex<T>& z) {
: std::complex<T>(numext::abs(y) / (2 * w * abs_z), y < zero ? woz : -woz );
}
template<typename T>
EIGEN_DEVICE_FUNC std::complex<T> complex_log(const std::complex<T>& z) {
// Computes complex log.
T a = numext::abs(z);
EIGEN_USING_STD(atan2);
T b = atan2(z.imag(), z.real());
return std::complex<T>(numext::log(a), b);
}
} // end namespace internal
} // end namespace Eigen
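`complex_log` above uses the polar form: the real part of the result is the logarithm of the modulus and the imaginary part is the phase angle from `atan2`. The same computation using only the standard library, under a hypothetical name so it is not mistaken for Eigen's function:

```cpp
#include <cmath>
#include <complex>

// log(z) = log|z| + i * atan2(Im z, Re z)
template <typename T>
std::complex<T> complex_log_sketch(const std::complex<T>& z) {
  const T modulus = std::abs(z);
  const T phase = std::atan2(z.imag(), z.real());
  return std::complex<T>(std::log(modulus), phase);
}
```

For z = -1 this gives a real part of 0 and an imaginary part of pi, and for general inputs it should agree with `std::log` on `std::complex` up to rounding.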


@@ -289,9 +289,9 @@ struct NumTraits<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
IsInteger = NumTraits<Scalar>::IsInteger,
IsSigned = NumTraits<Scalar>::IsSigned,
RequireInitialization = 1,
ReadCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::ReadCost,
AddCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::AddCost,
MulCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::MulCost
ReadCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits<Scalar>::ReadCost),
AddCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits<Scalar>::AddCost),
MulCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits<Scalar>::MulCost)
};
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR


@@ -145,7 +145,7 @@ struct evaluator<PartialReduxExpr<ArgType, MemberOp, Direction> >
enum {
CoeffReadCost = TraversalSize==Dynamic ? HugeCost
: TraversalSize==0 ? 1
: TraversalSize * evaluator<ArgType>::CoeffReadCost + int(CostOpType::value),
: int(TraversalSize) * int(evaluator<ArgType>::CoeffReadCost) + int(CostOpType::value),
_ArgFlags = evaluator<ArgType>::Flags,


@@ -1019,7 +1019,7 @@ struct conservative_resize_like_impl
else
{
// The storage order does not allow us to use reallocation.
typename Derived::PlainObject tmp(rows,cols);
Derived tmp(rows,cols);
const Index common_rows = numext::mini(rows, _this.rows());
const Index common_cols = numext::mini(cols, _this.cols());
tmp.block(0,0,common_rows,common_cols) = _this.block(0,0,common_rows,common_cols);
@@ -1054,7 +1054,7 @@ struct conservative_resize_like_impl
else
{
// The storage order does not allow us to use reallocation.
typename Derived::PlainObject tmp(other);
Derived tmp(other);
const Index common_rows = numext::mini(tmp.rows(), _this.rows());
const Index common_cols = numext::mini(tmp.cols(), _this.cols());
tmp.block(0,0,common_rows,common_cols) = _this.block(0,0,common_rows,common_cols);
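When reallocation in place is not possible, the code above builds a temporary of the new size and copies the overlapping top-left block into it. The sketch below shows that strategy on a hypothetical row-major matrix type. The final swap and the zero-initialization of new entries are assumptions of this example; the hunk is cut off before that point, and Eigen leaves new coefficients uninitialized.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical row-major matrix, used only for this sketch.
struct RowMajorMatrix {
  std::size_t rows, cols;
  std::vector<double> data;
  RowMajorMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// Resize while keeping the coefficients in the overlapping top-left block.
inline void conservative_resize(RowMajorMatrix& m, std::size_t rows, std::size_t cols) {
  RowMajorMatrix tmp(rows, cols);
  const std::size_t common_rows = std::min(rows, m.rows);
  const std::size_t common_cols = std::min(cols, m.cols);
  for (std::size_t i = 0; i < common_rows; ++i)
    for (std::size_t j = 0; j < common_cols; ++j)
      tmp(i, j) = m(i, j);
  std::swap(m, tmp);  // adopt the new storage; the old one is released with tmp
}
```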


@@ -831,7 +831,7 @@ struct diagonal_product_evaluator_base
typedef typename ScalarBinaryOpTraits<typename MatrixType::Scalar, typename DiagonalType::Scalar>::ReturnType Scalar;
public:
enum {
CoeffReadCost = NumTraits<Scalar>::MulCost + evaluator<MatrixType>::CoeffReadCost + evaluator<DiagonalType>::CoeffReadCost,
CoeffReadCost = int(NumTraits<Scalar>::MulCost) + int(evaluator<MatrixType>::CoeffReadCost) + int(evaluator<DiagonalType>::CoeffReadCost),
MatrixFlags = evaluator<MatrixType>::Flags,
DiagFlags = evaluator<DiagonalType>::Flags,


@@ -58,7 +58,7 @@ public:
public:
enum {
Cost = Evaluator::SizeAtCompileTime == Dynamic ? HugeCost
: Evaluator::SizeAtCompileTime * Evaluator::CoeffReadCost + (Evaluator::SizeAtCompileTime-1) * functor_traits<Func>::Cost,
: int(Evaluator::SizeAtCompileTime) * int(Evaluator::CoeffReadCost) + (Evaluator::SizeAtCompileTime-1) * functor_traits<Func>::Cost,
UnrollingLimit = EIGEN_UNROLLING_LIMIT * (int(Traversal) == int(DefaultTraversal) ? 1 : int(PacketSize))
};
@@ -331,7 +331,7 @@ struct redux_impl<Func, Evaluator, LinearVectorizedTraversal, CompleteUnrolling>
enum {
PacketSize = redux_traits<Func, Evaluator>::PacketSize,
Size = Evaluator::SizeAtCompileTime,
VectorizedSize = (Size / PacketSize) * PacketSize
VectorizedSize = (int(Size) / int(PacketSize)) * int(PacketSize)
};
template<typename XprType>


@@ -66,7 +66,7 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
enum {
Mode = internal::traits<SelfAdjointView>::Mode,
Flags = internal::traits<SelfAdjointView>::Flags,
TransposeMode = ((Mode & Upper) ? Lower : 0) | ((Mode & Lower) ? Upper : 0)
TransposeMode = ((int(Mode) & int(Upper)) ? Lower : 0) | ((int(Mode) & int(Lower)) ? Upper : 0)
};
typedef typename MatrixType::PlainObject PlainObject;


@@ -168,7 +168,7 @@ EIGEN_DEVICE_FUNC void TriangularViewImpl<MatrixType,Mode,Dense>::solveInPlace(c
{
OtherDerived& other = _other.const_cast_derived();
eigen_assert( derived().cols() == derived().rows() && ((Side==OnTheLeft && derived().cols() == other.rows()) || (Side==OnTheRight && derived().cols() == other.cols())) );
eigen_assert((!(Mode & ZeroDiag)) && bool(Mode & (Upper|Lower)));
eigen_assert((!(int(Mode) & int(ZeroDiag))) && bool(int(Mode) & (int(Upper) | int(Lower))));
// If solving for a 0x0 matrix, nothing to do, simply return.
if (derived().cols() == 0)
return;


@@ -53,7 +53,7 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
typedef Derived const& Nested;
EIGEN_DEVICE_FUNC
inline TriangularBase() { eigen_assert(!((Mode&UnitDiag) && (Mode&ZeroDiag))); }
inline TriangularBase() { eigen_assert(!((int(Mode) & int(UnitDiag)) && (int(Mode) & int(ZeroDiag)))); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return derived().rows(); }
@@ -819,7 +819,7 @@ void call_triangular_assignment_loop(DstXprType& dst, const SrcXprType& src, con
enum {
unroll = DstXprType::SizeAtCompileTime != Dynamic
&& SrcEvaluatorType::CoeffReadCost < HugeCost
&& DstXprType::SizeAtCompileTime * (DstEvaluatorType::CoeffReadCost+SrcEvaluatorType::CoeffReadCost) / 2 <= EIGEN_UNROLLING_LIMIT
&& DstXprType::SizeAtCompileTime * (int(DstEvaluatorType::CoeffReadCost) + int(SrcEvaluatorType::CoeffReadCost)) / 2 <= EIGEN_UNROLLING_LIMIT
};
triangular_assignment_loop<Kernel, Mode, unroll ? int(DstXprType::SizeAtCompileTime) : Dynamic, SetOpposite>::run(kernel);
@@ -853,7 +853,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, Triangular2Dense>
{
EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func)
{
call_triangular_assignment_loop<SrcXprType::Mode, (SrcXprType::Mode&SelfAdjoint)==0>(dst, src, func);
call_triangular_assignment_loop<SrcXprType::Mode, (int(SrcXprType::Mode) & int(SelfAdjoint)) == 0>(dst, src, func);
}
};
@@ -951,7 +951,7 @@ template<typename DenseDerived>
EIGEN_DEVICE_FUNC void TriangularBase<Derived>::evalToLazy(MatrixBase<DenseDerived> &other) const
{
other.derived().resize(this->rows(), this->cols());
internal::call_triangular_assignment_loop<Derived::Mode,(Derived::Mode&SelfAdjoint)==0 /* SetOpposite */>(other.derived(), derived().nestedExpression());
internal::call_triangular_assignment_loop<Derived::Mode, (int(Derived::Mode) & int(SelfAdjoint)) == 0 /* SetOpposite */>(other.derived(), derived().nestedExpression());
}
namespace internal {


@@ -124,7 +124,7 @@ void DenseBase<Derived>::visit(Visitor& visitor) const
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * ThisEvaluator::CoeffReadCost + (SizeAtCompileTime-1) * internal::functor_traits<Visitor>::Cost <= EIGEN_UNROLLING_LIMIT
&& SizeAtCompileTime * int(ThisEvaluator::CoeffReadCost) + (SizeAtCompileTime-1) * int(internal::functor_traits<Visitor>::Cost) <= EIGEN_UNROLLING_LIMIT
};
return internal::visitor_impl<Visitor, ThisEvaluator, unroll ? int(SizeAtCompileTime) : Dynamic>::run(thisEval, visitor);
}
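The `unroll` flag in this hunk compares an estimated cost against `EIGEN_UNROLLING_LIMIT`. A small sketch of that test, written as a plain function, may make the arithmetic easier to follow. `should_unroll`, `kDynamic` and the limit passed in are hypothetical names and values chosen for illustration.

```cpp
#include <cassert>

constexpr int kDynamic = -1;  // stand-in for Eigen's Dynamic marker

// Unroll only when the size is known at compile time and the estimated cost
// (size coefficient reads plus size-1 visitor calls) fits within the limit.
constexpr bool should_unroll(int size, int coeff_read_cost, int functor_cost, int limit) {
  return size != kDynamic &&
         size * coeff_read_cost + (size - 1) * functor_cost <= limit;
}

static_assert(should_unroll(4, 1, 1, 110), "4 + 3 = 7 fits");
static_assert(!should_unroll(64, 1, 1, 110), "64 + 63 = 127 exceeds the limit");
static_assert(!should_unroll(kDynamic, 1, 1, 110), "dynamic sizes are never unrolled");
```

All operands are plain `int` here, which is what the `int(...)` casts in the hunk also arrange for the enum-based version.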


@@ -167,39 +167,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet4cf>(const P
Packet2cf(_mm256_extractf128_ps(a.v, 1))));
}
template<> struct conj_helper<Packet4cf, Packet4cf, false,true>
{
EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet4cf, Packet4cf, true,false>
{
EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet4cf, Packet4cf, true,true>
{
EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet4cf,Packet8f)
template<> EIGEN_STRONG_INLINE Packet4cf pdiv<Packet4cf>(const Packet4cf& a, const Packet4cf& b)
@@ -350,39 +317,6 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet2cd>(const
Packet1cd(_mm256_extractf128_pd(a.v,1))));
}
template<> struct conj_helper<Packet2cd, Packet2cd, false,true>
{
EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet2cd, Packet2cd, true,false>
{
EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet2cd, Packet2cd, true,true>
{
EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cd,Packet4d)
template<> EIGEN_STRONG_INLINE Packet2cd pdiv<Packet2cd>(const Packet2cd& a, const Packet2cd& b)


@@ -1274,12 +1274,7 @@ EIGEN_STRONG_INLINE Packet8f Bf16ToF32(const Packet8bf& a) {
EIGEN_STRONG_INLINE Packet8bf F32ToBf16(const Packet8f& a) {
Packet8bf r;
// Flush input denormals value to zero with hardware capability.
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
__m256 flush = _mm256_and_ps(a, a);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
__m256i input = _mm256_castps_si256(flush);
__m256i input = _mm256_castps_si256(a);
#ifdef EIGEN_VECTORIZE_AVX2
// uint32_t lsb = (input >> 16);
@@ -1293,7 +1288,7 @@ EIGEN_STRONG_INLINE Packet8bf F32ToBf16(const Packet8f& a) {
// input = input >> 16;
t = _mm256_srli_epi32(t, 16);
// Check NaN before converting back to bf16
__m256 mask = _mm256_cmp_ps(flush, flush, _CMP_ORD_Q);
__m256 mask = _mm256_cmp_ps(a, a, _CMP_ORD_Q);
__m256i nan = _mm256_set1_epi32(0x7fc0);
t = _mm256_blendv_epi8(nan, t, _mm256_castps_si256(mask));
// output = numext::bit_cast<uint16_t>(input);
@@ -1316,7 +1311,7 @@ EIGEN_STRONG_INLINE Packet8bf F32ToBf16(const Packet8f& a) {
lo = _mm_srli_epi32(lo, 16);
hi = _mm_srli_epi32(hi, 16);
// Check NaN before converting back to bf16
__m256 mask = _mm256_cmp_ps(flush, flush, _CMP_ORD_Q);
__m256 mask = _mm256_cmp_ps(a, a, _CMP_ORD_Q);
__m128i nan = _mm_set1_epi32(0x7fc0);
lo = _mm_blendv_epi8(nan, lo, _mm_castps_si128(_mm256_castps256_ps128(mask)));
hi = _mm_blendv_epi8(nan, hi, _mm_castps_si128(_mm256_extractf128_ps(mask, 1)));
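The comments in `F32ToBf16` describe a per-lane conversion: take the least significant kept bit, round, shift right by 16, and substitute `0x7fc0` for NaN inputs. A scalar sketch of that sequence is given below. The rounding bias `0x7fff + lsb` is the usual round-to-nearest-even trick and is an assumption here, since the hunk elides those lines. `float_to_bf16_rtne` and `bits_to_float` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (helper for the example only).
inline float bits_to_float(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Scalar float -> bfloat16 bit pattern, rounding to nearest even.
inline std::uint16_t float_to_bf16_rtne(float f) {
  if (f != f) return 0x7fc0;  // NaN check, mirrors the _CMP_ORD_Q mask and blend
  std::uint32_t input;
  std::memcpy(&input, &f, sizeof(input));
  const std::uint32_t lsb = (input >> 16) & 1u;  // lowest bit that survives
  input += 0x7fffu + lsb;                        // assumed rounding bias
  return static_cast<std::uint16_t>(input >> 16);
}

inline void example_bf16() {
  assert(float_to_bf16_rtne(1.0f) == 0x3f80);
  // Exact ties go to the even neighbour.
  assert(float_to_bf16_rtne(bits_to_float(0x3f808000u)) == 0x3f80);
  assert(float_to_bf16_rtne(bits_to_float(0x3f818000u)) == 0x3f82);
}
```

Note that the sketch passes the input bits straight through, as the `_mm256_castps_si256(a)` variant in the hunk does, without any separate denormal flush.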


@@ -153,39 +153,6 @@ EIGEN_STRONG_INLINE Packet4cf predux_half_dowto4<Packet8cf>(const Packet8cf& a)
return Packet4cf(res);
}
template<> struct conj_helper<Packet8cf, Packet8cf, false,true>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, true,false>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, true,true>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet8cf,Packet16f)
template<> EIGEN_STRONG_INLINE Packet8cf pdiv<Packet8cf>(const Packet8cf& a, const Packet8cf& b)


@@ -119,74 +119,11 @@ pexp<Packet16f>(const Packet16f& _x) {
return pmax(pmul(y, _mm512_castsi512_ps(emm0)), _x);
}
/*template <>
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
pexp<Packet8d>(const Packet8d& _x) {
Packet8d x = _x;
_EIGEN_DECLARE_CONST_Packet8d(1, 1.0);
_EIGEN_DECLARE_CONST_Packet8d(2, 2.0);
_EIGEN_DECLARE_CONST_Packet8d(exp_hi, 709.437);
_EIGEN_DECLARE_CONST_Packet8d(exp_lo, -709.436139303);
_EIGEN_DECLARE_CONST_Packet8d(cephes_LOG2EF, 1.4426950408889634073599);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_p0, 1.26177193074810590878e-4);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_p1, 3.02994407707441961300e-2);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_p2, 9.99999999999999999910e-1);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_q0, 3.00198505138664455042e-6);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_q1, 2.52448340349684104192e-3);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_q2, 2.27265548208155028766e-1);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_q3, 2.00000000000000000009e0);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_C1, 0.693145751953125);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_C2, 1.42860682030941723212e-6);
// clamp x
x = pmax(pmin(x, p8d_exp_hi), p8d_exp_lo);
// Express exp(x) as exp(g + n*log(2)).
const Packet8d n =
_mm512_mul_round_pd(p8d_cephes_LOG2EF, x, _MM_FROUND_TO_NEAREST_INT);
// Get the remainder modulo log(2), i.e. the "g" described above. Subtract
// n*log(2) out in two steps, i.e. n*C1 + n*C2, C1+C2=log2 to get the last
// digits right.
const Packet8d nC1 = pmul(n, p8d_cephes_exp_C1);
const Packet8d nC2 = pmul(n, p8d_cephes_exp_C2);
x = psub(x, nC1);
x = psub(x, nC2);
const Packet8d x2 = pmul(x, x);
// Evaluate the numerator polynomial of the rational interpolant.
Packet8d px = p8d_cephes_exp_p0;
px = pmadd(px, x2, p8d_cephes_exp_p1);
px = pmadd(px, x2, p8d_cephes_exp_p2);
px = pmul(px, x);
// Evaluate the denominator polynomial of the rational interpolant.
Packet8d qx = p8d_cephes_exp_q0;
qx = pmadd(qx, x2, p8d_cephes_exp_q1);
qx = pmadd(qx, x2, p8d_cephes_exp_q2);
qx = pmadd(qx, x2, p8d_cephes_exp_q3);
// I don't really get this bit, copied from the SSE2 routines, so...
// TODO(gonnet): Figure out what is going on here, perhaps find a better
// rational interpolant?
x = _mm512_div_pd(px, psub(qx, px));
x = pmadd(p8d_2, x, p8d_1);
// Build e=2^n.
const Packet8d e = _mm512_castsi512_pd(_mm512_slli_epi64(
_mm512_add_epi64(_mm512_cvtpd_epi64(n), _mm512_set1_epi64(1023)), 52));
// Construct the result 2^n * exp(g) = e * x. The max is used to catch
// non-finite values in the input.
return pmax(pmul(x, e), _x);
}*/
return pexp_double(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, pexp)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexp)


@@ -140,6 +140,7 @@ template<> struct packet_traits<double> : default_packet_traits
HasHalfPacket = 1,
#if EIGEN_GNUC_AT_LEAST(5, 3) || (!EIGEN_COMP_GNUC_STRICT)
HasLog = 1,
HasExp = 1,
HasSqrt = EIGEN_FAST_MATH,
HasRsqrt = EIGEN_FAST_MATH,
#endif
@@ -486,7 +487,7 @@ template<> EIGEN_STRONG_INLINE Packet16f pcmp_lt(const Packet16f& a, const Packe
}
template<> EIGEN_STRONG_INLINE Packet16f pcmp_lt_or_nan(const Packet16f& a, const Packet16f& b) {
__mmask16 mask = _mm512_cmp_ps_mask(a, b, _CMP_NGT_UQ);
__mmask16 mask = _mm512_cmp_ps_mask(a, b, _CMP_NGE_UQ);
return _mm512_castsi512_ps(
_mm512_mask_set1_epi32(_mm512_set1_epi32(0), mask, 0xffffffffu));
}
@@ -517,7 +518,7 @@ EIGEN_STRONG_INLINE Packet8d pcmp_lt(const Packet8d& a, const Packet8d& b) {
}
template <>
EIGEN_STRONG_INLINE Packet8d pcmp_lt_or_nan(const Packet8d& a, const Packet8d& b) {
__mmask8 mask = _mm512_cmp_pd_mask(a, b, _CMP_NGT_UQ);
__mmask8 mask = _mm512_cmp_pd_mask(a, b, _CMP_NGE_UQ);
return _mm512_castsi512_pd(
_mm512_mask_set1_epi64(_mm512_set1_epi64(0), mask, 0xffffffffffffffffu));
}
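The two comparison predicates in the `pcmp_lt_or_nan` hunks differ only when the operands are equal. `_CMP_NGE_UQ` means "not greater-or-equal, true if unordered", which is exactly "less than, or NaN"; `_CMP_NGT_UQ` means "not greater-than" and is also true for `a == b`. A scalar sketch of both predicates, with hypothetical function names:

```cpp
#include <cassert>
#include <limits>

// Scalar analogue of _CMP_NGE_UQ: true when a < b or either operand is NaN.
constexpr bool lt_or_nan(double a, double b) { return !(a >= b); }

// Scalar analogue of _CMP_NGT_UQ: also true when a == b.
constexpr bool not_greater(double a, double b) { return !(a > b); }

inline void example_lt_or_nan() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(lt_or_nan(1.0, 2.0));
  assert(!lt_or_nan(2.0, 2.0));   // equal operands are not "less than"
  assert(not_greater(2.0, 2.0));  // the NGT predicate accepts equality
  assert(lt_or_nan(nan, 2.0) && lt_or_nan(2.0, nan));
}
```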
@@ -929,7 +930,8 @@ template<> EIGEN_STRONG_INLINE Packet8d pldexp<Packet8d>(const Packet8d& a, cons
Packet8i b = parithmetic_shift_right<2>(e); // floor(e/4)
// 2^b
Packet8i hi = _mm256_shuffle_epi32(padd(b, bias), _MM_SHUFFLE(3, 1, 2, 0));
const Packet8i permute_idx = _mm256_setr_epi32(0, 4, 1, 5, 2, 6, 3, 7);
Packet8i hi = _mm256_permutevar8x32_epi32(padd(b, bias), permute_idx);
Packet8i lo = _mm256_slli_epi64(hi, 52);
hi = _mm256_slli_epi64(_mm256_srli_epi64(hi, 32), 52);
Packet8d c = _mm512_castsi512_pd(_mm512_inserti64x4(_mm512_castsi256_si512(lo), hi, 1));
@@ -937,7 +939,7 @@ template<> EIGEN_STRONG_INLINE Packet8d pldexp<Packet8d>(const Packet8d& a, cons
// 2^(e - 3b)
b = psub(psub(psub(e, b), b), b); // e - 3b
hi = _mm256_shuffle_epi32(padd(b, bias), _MM_SHUFFLE(3, 1, 2, 0));
hi = _mm256_permutevar8x32_epi32(padd(b, bias), permute_idx);
lo = _mm256_slli_epi64(hi, 52);
hi = _mm256_slli_epi64(_mm256_srli_epi64(hi, 32), 52);
c = _mm512_castsi512_pd(_mm512_inserti64x4(_mm512_castsi256_si512(lo), hi, 1));
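The comments in `pldexp` (`floor(e/4)`, `2^b`, `2^(e - 3b)`) point to a scheme where the exponent is split so that each power of two stays representable as a normal double, and the result is built by successive multiplications. A scalar sketch of that decomposition follows; `ldexp_split` is a hypothetical name, and `std::ldexp` is used here only to form the individual powers of two.

```cpp
#include <cassert>
#include <cmath>

// Compute a * 2^e as a * 2^b * 2^b * 2^b * 2^(e - 3b) with b = floor(e / 4).
inline double ldexp_split(double a, int e) {
  const int b = static_cast<int>(std::floor(e / 4.0));  // floor(e/4), also for negative e
  const double c = std::ldexp(1.0, b);                  // 2^b
  const double d = std::ldexp(1.0, e - 3 * b);          // 2^(e - 3b)
  return a * c * c * c * d;
}

inline void example_ldexp_split() {
  assert(ldexp_split(1.5, 10) == 1536.0);
  assert(ldexp_split(3.0, -4) == 0.1875);
  // An exponent far outside the normal range still works, because each
  // individual factor is a normal number.
  assert(ldexp_split(1.0, -1074) == std::ldexp(1.0, -1074));
}
```

In the vector code the factors are formed by adding the bias and shifting the integer into the exponent field, which is why each partial exponent has to stay within the normal range.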
@@ -1943,23 +1945,15 @@ EIGEN_STRONG_INLINE Packet16f Bf16ToF32(const Packet16bf& a) {
EIGEN_STRONG_INLINE Packet16bf F32ToBf16(const Packet16f& a) {
Packet16bf r;
// Flush input denormals value to zero with hardware capability.
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
#if defined(EIGEN_VECTORIZE_AVX512DQ)
__m512 flush = _mm512_and_ps(a, a);
#else
__m512 flush = _mm512_max_ps(a, a);
#endif // EIGEN_VECTORIZE_AVX512DQ
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
#if defined(EIGEN_VECTORIZE_AVX512BF16) && EIGEN_GNUC_AT_LEAST(10, 1)
// Since GCC 10.1 supports avx512bf16 and C style explicit cast
// (C++ static_cast is not supported yet), do conversion via intrinsic
// and register path for performance.
r = (__m256i)(_mm512_cvtneps_pbh(flush));
r = (__m256i)(_mm512_cvtneps_pbh(a));
#else
__m512i t;
__m512i input = _mm512_castps_si512(flush);
__m512i input = _mm512_castps_si512(a);
__m512i nan = _mm512_set1_epi32(0x7fc0);
// uint32_t lsb = (input >> 16) & 1;
@@ -1972,9 +1966,9 @@ EIGEN_STRONG_INLINE Packet16bf F32ToBf16(const Packet16f& a) {
t = _mm512_srli_epi32(t, 16);
// Check NaN before converting back to bf16
__mmask16 mask = _mm512_cmp_ps_mask(flush, flush, _CMP_ORD_Q);
t = _mm512_mask_blend_epi32(mask, nan, t);
__mmask16 mask = _mm512_cmp_ps_mask(a, a, _CMP_ORD_Q);
t = _mm512_mask_blend_epi32(mask, nan, t);
// output.value = static_cast<uint16_t>(input);
r = _mm512_cvtepi32_epi16(t);
#endif // EIGEN_VECTORIZE_AVX512BF16


@@ -74,7 +74,7 @@ struct Packet2cf
return Packet2cf(*this) -= b;
}
EIGEN_STRONG_INLINE Packet2cf operator-(void) const {
return Packet2cf(vec_neg(v));
return Packet2cf(-v);
}
Packet4f v;
@@ -206,45 +206,12 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return pfirst<Packet2cf>(prod);
}
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f)
template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
// TODO optimize it for AltiVec
Packet2cf res = conj_helper<Packet2cf,Packet2cf,false,true>().pmul(a, b);
Packet2cf res = pmul(a, pconj(b));
Packet4f s = pmul<Packet4f>(b.v, b.v);
return Packet2cf(pdiv(res.v, padd<Packet4f>(s, vec_perm(s, s, p16uc_COMPLEX32_REV))));
}
@@ -327,7 +294,7 @@ struct Packet1cd
return Packet1cd(*this) -= b;
}
EIGEN_STRONG_INLINE Packet1cd operator-(void) const {
return Packet1cd(vec_neg(v));
return Packet1cd(-v);
}
Packet2d v;
@@ -404,45 +371,12 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Pack
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d)
template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
// TODO optimize it for AltiVec
Packet1cd res = conj_helper<Packet1cd,Packet1cd,false,true>().pmul(a,b);
Packet1cd res = pmul(a,pconj(b));
Packet2d s = pmul<Packet2d>(b.v, b.v);
return Packet1cd(pdiv(res.v, padd<Packet2d>(s, vec_perm(s, s, p16uc_REVERSE64))));
}
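Both `pdiv` bodies in this file follow the textbook identity a / b = a * conj(b) / |b|^2: the numerator is the product with the conjugate, and the denominator is the sum of the squared real and imaginary parts of `b`. A scalar sketch with `std::complex` is shown below; `div_via_conj` is a hypothetical name and the sketch ignores overflow and underflow concerns.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Complex division via the conjugate: a / b = a * conj(b) / (re(b)^2 + im(b)^2).
template <typename T>
std::complex<T> div_via_conj(const std::complex<T>& a, const std::complex<T>& b) {
  const std::complex<T> num = a * std::conj(b);                    // pmul(a, pconj(b))
  const T denom = b.real() * b.real() + b.imag() * b.imag();       // s + swapped(s)
  return std::complex<T>(num.real() / denom, num.imag() / denom);  // pdiv by |b|^2
}

inline void example_div_via_conj() {
  const std::complex<double> a(1.0, 2.0), b(3.0, 4.0);
  const std::complex<double> q = div_via_conj(a, b);  // (11 + 2i) / 25
  assert(std::abs(q.real() - 0.44) < 1e-12);
  assert(std::abs(q.imag() - 0.08) < 1e-12);
}
```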


@@ -11,6 +11,10 @@
#ifndef EIGEN_MATRIX_PRODUCT_ALTIVEC_H
#define EIGEN_MATRIX_PRODUCT_ALTIVEC_H
#ifndef EIGEN_ALTIVEC_USE_CUSTOM_PACK
#define EIGEN_ALTIVEC_USE_CUSTOM_PACK 1
#endif
#include "MatrixProductCommon.h"
// Since LLVM doesn't support dynamic dispatching, force either always MMA or VSX
@@ -113,7 +117,7 @@ const static Packet16uc p16uc_GETIMAG64 = { 8, 9, 10, 11, 12, 13, 14, 15,
* float32/64 and complex float32/64 version.
**/
template<typename Scalar, typename Index, int StorageOrder>
EIGEN_STRONG_INLINE std::complex<Scalar> getAdjointVal(Index i, Index j, const_blas_data_mapper<std::complex<Scalar>, Index, StorageOrder>& dt)
EIGEN_ALWAYS_INLINE std::complex<Scalar> getAdjointVal(Index i, Index j, const_blas_data_mapper<std::complex<Scalar>, Index, StorageOrder>& dt)
{
std::complex<Scalar> v;
if(i < j)
@@ -403,7 +407,7 @@ struct symm_pack_lhs<double, Index, Pack1, Pack2_dummy, StorageOrder>
**/
template<typename Scalar, typename Packet, typename Index>
EIGEN_STRONG_INLINE void storeBlock(Scalar* to, PacketBlock<Packet,4>& block)
EIGEN_ALWAYS_INLINE void storeBlock(Scalar* to, PacketBlock<Packet,4>& block)
{
const Index size = 16 / sizeof(Scalar);
pstore<Scalar>(to + (0 * size), block.packet[0]);
@@ -413,7 +417,7 @@ EIGEN_STRONG_INLINE void storeBlock(Scalar* to, PacketBlock<Packet,4>& block)
}
template<typename Scalar, typename Packet, typename Index>
EIGEN_STRONG_INLINE void storeBlock(Scalar* to, PacketBlock<Packet,2>& block)
EIGEN_ALWAYS_INLINE void storeBlock(Scalar* to, PacketBlock<Packet,2>& block)
{
const Index size = 16 / sizeof(Scalar);
pstore<Scalar>(to + (0 * size), block.packet[0]);
@@ -493,21 +497,21 @@ struct dhs_cpack {
cblock.packet[1] = lhs.template loadPacket<PacketC>(i, j + 2);
}
} else {
const std::complex<Scalar> *lhs0, *lhs1;
std::complex<Scalar> lhs0, lhs1;
if (UseLhs) {
lhs0 = &lhs(j + 0, i);
lhs1 = &lhs(j + 1, i);
cblock.packet[0] = pload2(lhs0, lhs1);
lhs0 = &lhs(j + 2, i);
lhs1 = &lhs(j + 3, i);
cblock.packet[1] = pload2(lhs0, lhs1);
lhs0 = lhs(j + 0, i);
lhs1 = lhs(j + 1, i);
cblock.packet[0] = pload2(&lhs0, &lhs1);
lhs0 = lhs(j + 2, i);
lhs1 = lhs(j + 3, i);
cblock.packet[1] = pload2(&lhs0, &lhs1);
} else {
lhs0 = &lhs(i, j + 0);
lhs1 = &lhs(i, j + 1);
cblock.packet[0] = pload2(lhs0, lhs1);
lhs0 = &lhs(i, j + 2);
lhs1 = &lhs(i, j + 3);
cblock.packet[1] = pload2(lhs0, lhs1);
lhs0 = lhs(i, j + 0);
lhs1 = lhs(i, j + 1);
cblock.packet[0] = pload2(&lhs0, &lhs1);
lhs0 = lhs(i, j + 2);
lhs1 = lhs(i, j + 3);
cblock.packet[1] = pload2(&lhs0, &lhs1);
}
}
@@ -992,7 +996,7 @@ struct dhs_cpack<double, Index, DataMapper, Packet, PacketC, StorageOrder, Conju
// 512-bit rank-1 update of acc. It can accumulate either positively or negatively (useful for complex gemm).
template<typename Packet, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pger_common(PacketBlock<Packet,4>* acc, const Packet& lhsV, const Packet* rhsV)
EIGEN_ALWAYS_INLINE void pger_common(PacketBlock<Packet,4>* acc, const Packet& lhsV, const Packet* rhsV)
{
if(NegativeAccumulate)
{
@@ -1009,7 +1013,7 @@ EIGEN_STRONG_INLINE void pger_common(PacketBlock<Packet,4>* acc, const Packet& l
}
template<typename Packet, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pger_common(PacketBlock<Packet,1>* acc, const Packet& lhsV, const Packet* rhsV)
EIGEN_ALWAYS_INLINE void pger_common(PacketBlock<Packet,1>* acc, const Packet& lhsV, const Packet* rhsV)
{
if(NegativeAccumulate)
{
@@ -1020,7 +1024,7 @@ EIGEN_STRONG_INLINE void pger_common(PacketBlock<Packet,1>* acc, const Packet& l
}
template<int N, typename Scalar, typename Packet, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pger(PacketBlock<Packet,N>* acc, const Scalar* lhs, const Packet* rhsV)
EIGEN_ALWAYS_INLINE void pger(PacketBlock<Packet,N>* acc, const Scalar* lhs, const Packet* rhsV)
{
Packet lhsV = pload<Packet>(lhs);
@@ -1028,7 +1032,7 @@ EIGEN_STRONG_INLINE void pger(PacketBlock<Packet,N>* acc, const Scalar* lhs, con
}
template<typename Scalar, typename Packet, typename Index>
EIGEN_STRONG_INLINE void loadPacketRemaining(const Scalar* lhs, Packet &lhsV, Index remaining_rows)
EIGEN_ALWAYS_INLINE void loadPacketRemaining(const Scalar* lhs, Packet &lhsV, Index remaining_rows)
{
#ifdef _ARCH_PWR9
lhsV = vec_xl_len((Scalar *)lhs, remaining_rows * sizeof(Scalar));
@@ -1041,7 +1045,7 @@ EIGEN_STRONG_INLINE void loadPacketRemaining(const Scalar* lhs, Packet &lhsV, In
}
template<int N, typename Scalar, typename Packet, typename Index, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pger(PacketBlock<Packet,N>* acc, const Scalar* lhs, const Packet* rhsV, Index remaining_rows)
EIGEN_ALWAYS_INLINE void pger(PacketBlock<Packet,N>* acc, const Scalar* lhs, const Packet* rhsV, Index remaining_rows)
{
Packet lhsV;
loadPacketRemaining<Scalar, Packet, Index>(lhs, lhsV, remaining_rows);
@@ -1051,7 +1055,7 @@ EIGEN_STRONG_INLINE void pger(PacketBlock<Packet,N>* acc, const Scalar* lhs, con
// 512-bit rank-1 update of complex acc. It takes decoupled accumulators as entries. It also takes care of mixed types real * complex and complex * real.
template<int N, typename Packet, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void pgerc_common(PacketBlock<Packet,N>* accReal, PacketBlock<Packet,N>* accImag, const Packet &lhsV, const Packet &lhsVi, const Packet* rhsV, const Packet* rhsVi)
EIGEN_ALWAYS_INLINE void pgerc_common(PacketBlock<Packet,N>* accReal, PacketBlock<Packet,N>* accImag, const Packet &lhsV, const Packet &lhsVi, const Packet* rhsV, const Packet* rhsVi)
{
pger_common<Packet, false>(accReal, lhsV, rhsV);
if(LhsIsReal)
@@ -1070,7 +1074,7 @@ EIGEN_STRONG_INLINE void pgerc_common(PacketBlock<Packet,N>* accReal, PacketBloc
}
template<int N, typename Scalar, typename Packet, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void pgerc(PacketBlock<Packet,N>* accReal, PacketBlock<Packet,N>* accImag, const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, const Packet* rhsV, const Packet* rhsVi)
EIGEN_ALWAYS_INLINE void pgerc(PacketBlock<Packet,N>* accReal, PacketBlock<Packet,N>* accImag, const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, const Packet* rhsV, const Packet* rhsVi)
{
Packet lhsV = ploadLhs<Scalar, Packet>(lhs_ptr);
Packet lhsVi;
@@ -1081,7 +1085,7 @@ EIGEN_STRONG_INLINE void pgerc(PacketBlock<Packet,N>* accReal, PacketBlock<Packe
}
template<typename Scalar, typename Packet, typename Index, bool LhsIsReal>
EIGEN_STRONG_INLINE void loadPacketRemaining(const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, Packet &lhsV, Packet &lhsVi, Index remaining_rows)
EIGEN_ALWAYS_INLINE void loadPacketRemaining(const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, Packet &lhsV, Packet &lhsVi, Index remaining_rows)
{
#ifdef _ARCH_PWR9
lhsV = vec_xl_len((Scalar *)lhs_ptr, remaining_rows * sizeof(Scalar));
@@ -1098,7 +1102,7 @@ EIGEN_STRONG_INLINE void loadPacketRemaining(const Scalar* lhs_ptr, const Scalar
}
template<int N, typename Scalar, typename Packet, typename Index, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void pgerc(PacketBlock<Packet,N>* accReal, PacketBlock<Packet,N>* accImag, const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, const Packet* rhsV, const Packet* rhsVi, Index remaining_rows)
EIGEN_ALWAYS_INLINE void pgerc(PacketBlock<Packet,N>* accReal, PacketBlock<Packet,N>* accImag, const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, const Packet* rhsV, const Packet* rhsVi, Index remaining_rows)
{
Packet lhsV, lhsVi;
loadPacketRemaining<Scalar, Packet, Index, LhsIsReal>(lhs_ptr, lhs_ptr_imag, lhsV, lhsVi, remaining_rows);
@@ -1107,14 +1111,14 @@ EIGEN_STRONG_INLINE void pgerc(PacketBlock<Packet,N>* accReal, PacketBlock<Packe
}
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE Packet ploadLhs(const Scalar* lhs)
EIGEN_ALWAYS_INLINE Packet ploadLhs(const Scalar* lhs)
{
return *reinterpret_cast<Packet *>(const_cast<Scalar *>(lhs));
return ploadu<Packet>(lhs);
}
// Zero the accumulator on PacketBlock.
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE void bsetzero(PacketBlock<Packet,4>& acc)
EIGEN_ALWAYS_INLINE void bsetzero(PacketBlock<Packet,4>& acc)
{
acc.packet[0] = pset1<Packet>((Scalar)0);
acc.packet[1] = pset1<Packet>((Scalar)0);
@@ -1123,14 +1127,14 @@ EIGEN_STRONG_INLINE void bsetzero(PacketBlock<Packet,4>& acc)
}
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE void bsetzero(PacketBlock<Packet,1>& acc)
EIGEN_ALWAYS_INLINE void bsetzero(PacketBlock<Packet,1>& acc)
{
acc.packet[0] = pset1<Packet>((Scalar)0);
}
// Scale the PacketBlock vectors by alpha.
template<typename Packet>
EIGEN_STRONG_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha)
EIGEN_ALWAYS_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha)
{
acc.packet[0] = pmadd(pAlpha, accZ.packet[0], acc.packet[0]);
acc.packet[1] = pmadd(pAlpha, accZ.packet[1], acc.packet[1]);
@@ -1139,13 +1143,13 @@ EIGEN_STRONG_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4
}
template<typename Packet>
EIGEN_STRONG_INLINE void bscale(PacketBlock<Packet,1>& acc, PacketBlock<Packet,1>& accZ, const Packet& pAlpha)
EIGEN_ALWAYS_INLINE void bscale(PacketBlock<Packet,1>& acc, PacketBlock<Packet,1>& accZ, const Packet& pAlpha)
{
acc.packet[0] = pmadd(pAlpha, accZ.packet[0], acc.packet[0]);
}
template<typename Packet>
EIGEN_STRONG_INLINE void bscalec_common(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha)
EIGEN_ALWAYS_INLINE void bscalec_common(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha)
{
acc.packet[0] = pmul<Packet>(accZ.packet[0], pAlpha);
acc.packet[1] = pmul<Packet>(accZ.packet[1], pAlpha);
@@ -1154,14 +1158,14 @@ EIGEN_STRONG_INLINE void bscalec_common(PacketBlock<Packet,4>& acc, PacketBlock<
}
template<typename Packet>
EIGEN_STRONG_INLINE void bscalec_common(PacketBlock<Packet,1>& acc, PacketBlock<Packet,1>& accZ, const Packet& pAlpha)
EIGEN_ALWAYS_INLINE void bscalec_common(PacketBlock<Packet,1>& acc, PacketBlock<Packet,1>& accZ, const Packet& pAlpha)
{
acc.packet[0] = pmul<Packet>(accZ.packet[0], pAlpha);
}
// Complex version of PacketBlock scaling.
template<typename Packet, int N>
EIGEN_STRONG_INLINE void bscalec(PacketBlock<Packet,N>& aReal, PacketBlock<Packet,N>& aImag, const Packet& bReal, const Packet& bImag, PacketBlock<Packet,N>& cReal, PacketBlock<Packet,N>& cImag)
EIGEN_ALWAYS_INLINE void bscalec(PacketBlock<Packet,N>& aReal, PacketBlock<Packet,N>& aImag, const Packet& bReal, const Packet& bImag, PacketBlock<Packet,N>& cReal, PacketBlock<Packet,N>& cImag)
{
bscalec_common<Packet>(cReal, aReal, bReal);
@@ -1173,7 +1177,7 @@ EIGEN_STRONG_INLINE void bscalec(PacketBlock<Packet,N>& aReal, PacketBlock<Packe
}
template<typename Packet>
EIGEN_STRONG_INLINE void band(PacketBlock<Packet,4>& acc, const Packet& pMask)
EIGEN_ALWAYS_INLINE void band(PacketBlock<Packet,4>& acc, const Packet& pMask)
{
acc.packet[0] = pand(acc.packet[0], pMask);
acc.packet[1] = pand(acc.packet[1], pMask);
@@ -1182,7 +1186,7 @@ EIGEN_STRONG_INLINE void band(PacketBlock<Packet,4>& acc, const Packet& pMask)
}
template<typename Packet>
EIGEN_STRONG_INLINE void bscalec(PacketBlock<Packet,4>& aReal, PacketBlock<Packet,4>& aImag, const Packet& bReal, const Packet& bImag, PacketBlock<Packet,4>& cReal, PacketBlock<Packet,4>& cImag, const Packet& pMask)
EIGEN_ALWAYS_INLINE void bscalec(PacketBlock<Packet,4>& aReal, PacketBlock<Packet,4>& aImag, const Packet& bReal, const Packet& bImag, PacketBlock<Packet,4>& cReal, PacketBlock<Packet,4>& cImag, const Packet& pMask)
{
band<Packet>(aReal, pMask);
band<Packet>(aImag, pMask);
@@ -1192,7 +1196,7 @@ EIGEN_STRONG_INLINE void bscalec(PacketBlock<Packet,4>& aReal, PacketBlock<Packe
// Load a PacketBlock; the N parameter makes tuning gemm easier so we can add more accumulators as needed.
template<typename DataMapper, typename Packet, typename Index, const Index accCols, int N, int StorageOrder>
EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,4>& acc, const DataMapper& res, Index row, Index col)
EIGEN_ALWAYS_INLINE void bload(PacketBlock<Packet,4>& acc, const DataMapper& res, Index row, Index col)
{
if (StorageOrder == RowMajor) {
acc.packet[0] = res.template loadPacket<Packet>(row + 0, col + N*accCols);
@@ -1209,7 +1213,7 @@ EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,4>& acc, const DataMapper& res
// An overload of bload when you have a PacketBlock with 8 vectors.
template<typename DataMapper, typename Packet, typename Index, const Index accCols, int N, int StorageOrder>
EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,8>& acc, const DataMapper& res, Index row, Index col)
EIGEN_ALWAYS_INLINE void bload(PacketBlock<Packet,8>& acc, const DataMapper& res, Index row, Index col)
{
if (StorageOrder == RowMajor) {
acc.packet[0] = res.template loadPacket<Packet>(row + 0, col + N*accCols);
@@ -1233,7 +1237,7 @@ EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,8>& acc, const DataMapper& res
}
template<typename DataMapper, typename Packet, typename Index, const Index accCols, int N, int StorageOrder>
EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,2>& acc, const DataMapper& res, Index row, Index col)
EIGEN_ALWAYS_INLINE void bload(PacketBlock<Packet,2>& acc, const DataMapper& res, Index row, Index col)
{
acc.packet[0] = res.template loadPacket<Packet>(row + N*accCols, col + 0);
acc.packet[1] = res.template loadPacket<Packet>(row + (N+1)*accCols, col + 0);
@@ -1246,7 +1250,7 @@ const static Packet4i mask43 = { -1, -1, -1, 0 };
const static Packet2l mask21 = { -1, 0 };
template<typename Packet>
EIGEN_STRONG_INLINE Packet bmask(const int remaining_rows)
EIGEN_ALWAYS_INLINE Packet bmask(const int remaining_rows)
{
if (remaining_rows == 0) {
return pset1<Packet>(float(0.0)); // Not used
@@ -1260,7 +1264,7 @@ EIGEN_STRONG_INLINE Packet bmask(const int remaining_rows)
}
template<>
EIGEN_STRONG_INLINE Packet2d bmask<Packet2d>(const int remaining_rows)
EIGEN_ALWAYS_INLINE Packet2d bmask<Packet2d>(const int remaining_rows)
{
if (remaining_rows == 0) {
return pset1<Packet2d>(double(0.0)); // Not used
@@ -1270,7 +1274,7 @@ EIGEN_STRONG_INLINE Packet2d bmask<Packet2d>(const int remaining_rows)
}
template<typename Packet>
EIGEN_STRONG_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha, const Packet& pMask)
EIGEN_ALWAYS_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha, const Packet& pMask)
{
band<Packet>(accZ, pMask);
@@ -1278,13 +1282,13 @@ EIGEN_STRONG_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4
}
template<typename Packet>
EIGEN_STRONG_INLINE void pbroadcast4_old(const __UNPACK_TYPE__(Packet)* a, Packet& a0, Packet& a1, Packet& a2, Packet& a3)
EIGEN_ALWAYS_INLINE void pbroadcast4_old(const __UNPACK_TYPE__(Packet)* a, Packet& a0, Packet& a1, Packet& a2, Packet& a3)
{
pbroadcast4<Packet>(a, a0, a1, a2, a3);
}
template<>
EIGEN_STRONG_INLINE void pbroadcast4_old<Packet2d>(const double* a, Packet2d& a0, Packet2d& a1, Packet2d& a2, Packet2d& a3)
EIGEN_ALWAYS_INLINE void pbroadcast4_old<Packet2d>(const double* a, Packet2d& a0, Packet2d& a1, Packet2d& a2, Packet2d& a3)
{
a1 = pload<Packet2d>(a);
a3 = pload<Packet2d>(a + 2);
@@ -1298,7 +1302,7 @@ EIGEN_STRONG_INLINE void pbroadcast4_old<Packet2d>(const double* a, Packet2d& a0
#define PEEL 7
template<typename Scalar, typename Packet, typename Index>
EIGEN_STRONG_INLINE void MICRO_EXTRA_COL(
EIGEN_ALWAYS_INLINE void MICRO_EXTRA_COL(
const Scalar* &lhs_ptr,
const Scalar* &rhs_ptr,
PacketBlock<Packet,1> &accZero,
@@ -1362,7 +1366,7 @@ EIGEN_STRONG_INLINE void gemm_extra_col(
}
template<typename Scalar, typename Packet, typename Index, const Index accRows>
EIGEN_STRONG_INLINE void MICRO_EXTRA_ROW(
EIGEN_ALWAYS_INLINE void MICRO_EXTRA_ROW(
const Scalar* &lhs_ptr,
const Scalar* &rhs_ptr,
PacketBlock<Packet,4> &accZero,
@@ -1565,9 +1569,8 @@ EIGEN_STRONG_INLINE void gemm_unrolled_iteration(
Index col,
const Packet& pAlpha)
{
asm("#gemm begin");
const Scalar* rhs_ptr = rhs_base;
const Scalar* lhs_ptr0, * lhs_ptr1, * lhs_ptr2, * lhs_ptr3, * lhs_ptr4, * lhs_ptr5, * lhs_ptr6, * lhs_ptr7;
const Scalar* lhs_ptr0 = NULL, * lhs_ptr1 = NULL, * lhs_ptr2 = NULL, * lhs_ptr3 = NULL, * lhs_ptr4 = NULL, * lhs_ptr5 = NULL, * lhs_ptr6 = NULL, * lhs_ptr7 = NULL;
PacketBlock<Packet,4> accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7;
PacketBlock<Packet,4> acc;
@@ -1588,7 +1591,6 @@ asm("#gemm begin");
MICRO_STORE
row += unroll_factor*accCols;
asm("#gemm end");
}
template<int unroll_factor, typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accCols>
@@ -1605,7 +1607,7 @@ EIGEN_STRONG_INLINE void gemm_unrolled_col_iteration(
const Packet& pAlpha)
{
const Scalar* rhs_ptr = rhs_base;
const Scalar* lhs_ptr0, * lhs_ptr1, * lhs_ptr2, * lhs_ptr3, * lhs_ptr4, * lhs_ptr5, * lhs_ptr6, *lhs_ptr7;
const Scalar* lhs_ptr0 = NULL, * lhs_ptr1 = NULL, * lhs_ptr2 = NULL, * lhs_ptr3 = NULL, * lhs_ptr4 = NULL, * lhs_ptr5 = NULL, * lhs_ptr6 = NULL, *lhs_ptr7 = NULL;
PacketBlock<Packet,1> accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7;
PacketBlock<Packet,1> acc;
@@ -1789,7 +1791,7 @@ EIGEN_STRONG_INLINE void gemm(const DataMapper& res, const Scalar* blockA, const
#define PEEL_COMPLEX 3
template<typename Scalar, typename Packet, typename Index, const Index accRows, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void MICRO_COMPLEX_EXTRA_COL(
EIGEN_ALWAYS_INLINE void MICRO_COMPLEX_EXTRA_COL(
const Scalar* &lhs_ptr_real, const Scalar* &lhs_ptr_imag,
const Scalar* &rhs_ptr_real, const Scalar* &rhs_ptr_imag,
PacketBlock<Packet,1> &accReal, PacketBlock<Packet,1> &accImag,
@@ -1888,7 +1890,7 @@ EIGEN_STRONG_INLINE void gemm_complex_extra_col(
}
template<typename Scalar, typename Packet, typename Index, const Index accRows, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void MICRO_COMPLEX_EXTRA_ROW(
EIGEN_ALWAYS_INLINE void MICRO_COMPLEX_EXTRA_ROW(
const Scalar* &lhs_ptr_real, const Scalar* &lhs_ptr_imag,
const Scalar* &rhs_ptr_real, const Scalar* &rhs_ptr_imag,
PacketBlock<Packet,4> &accReal, PacketBlock<Packet,4> &accImag,
@@ -1924,7 +1926,6 @@ EIGEN_STRONG_INLINE void gemm_complex_extra_row(
const Packet& pAlphaImag,
const Packet& pMask)
{
asm("#gemm_complex begin");
const Scalar* rhs_ptr_real = rhs_base;
const Scalar* rhs_ptr_imag;
if(!RhsIsReal) rhs_ptr_imag = rhs_base + accRows*strideB;
@@ -2001,7 +2002,6 @@ asm("#gemm_complex begin");
}
}
}
asm("#gemm_complex end");
}
#define MICRO_COMPLEX_UNROLL(func) \
@@ -2173,7 +2173,6 @@ EIGEN_STRONG_INLINE void gemm_complex_unrolled_iteration(
const Packet& pAlphaReal,
const Packet& pAlphaImag)
{
asm("#gemm_complex_unrolled begin");
const Scalar* rhs_ptr_real = rhs_base;
const Scalar* rhs_ptr_imag;
if(!RhsIsReal) {
@@ -2181,9 +2180,9 @@ asm("#gemm_complex_unrolled begin");
} else {
EIGEN_UNUSED_VARIABLE(rhs_ptr_imag);
}
const Scalar* lhs_ptr_real0, * lhs_ptr_imag0, * lhs_ptr_real1, * lhs_ptr_imag1;
const Scalar* lhs_ptr_real2, * lhs_ptr_imag2, * lhs_ptr_real3, * lhs_ptr_imag3;
const Scalar* lhs_ptr_real4, * lhs_ptr_imag4;
const Scalar* lhs_ptr_real0 = NULL, * lhs_ptr_imag0 = NULL, * lhs_ptr_real1 = NULL, * lhs_ptr_imag1 = NULL;
const Scalar* lhs_ptr_real2 = NULL, * lhs_ptr_imag2 = NULL, * lhs_ptr_real3 = NULL, * lhs_ptr_imag3 = NULL;
const Scalar* lhs_ptr_real4 = NULL, * lhs_ptr_imag4 = NULL;
PacketBlock<Packet,4> accReal0, accImag0, accReal1, accImag1;
PacketBlock<Packet,4> accReal2, accImag2, accReal3, accImag3;
PacketBlock<Packet,4> accReal4, accImag4;
@@ -2211,7 +2210,6 @@ asm("#gemm_complex_unrolled begin");
MICRO_COMPLEX_STORE
row += unroll_factor*accCols;
asm("#gemm_complex_unrolled end");
}
template<int unroll_factor, typename Scalar, typename Packet, typename Packetc, typename DataMapper, typename Index, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
@@ -2236,9 +2234,9 @@ EIGEN_STRONG_INLINE void gemm_complex_unrolled_col_iteration(
} else {
EIGEN_UNUSED_VARIABLE(rhs_ptr_imag);
}
const Scalar* lhs_ptr_real0, * lhs_ptr_imag0, * lhs_ptr_real1, * lhs_ptr_imag1;
const Scalar* lhs_ptr_real2, * lhs_ptr_imag2, * lhs_ptr_real3, * lhs_ptr_imag3;
const Scalar* lhs_ptr_real4, * lhs_ptr_imag4;
const Scalar* lhs_ptr_real0 = NULL, * lhs_ptr_imag0 = NULL, * lhs_ptr_real1 = NULL, * lhs_ptr_imag1 = NULL;
const Scalar* lhs_ptr_real2 = NULL, * lhs_ptr_imag2 = NULL, * lhs_ptr_real3 = NULL, * lhs_ptr_imag3 = NULL;
const Scalar* lhs_ptr_real4 = NULL, * lhs_ptr_imag4 = NULL;
PacketBlock<Packet,1> accReal0, accImag0, accReal1, accImag1;
PacketBlock<Packet,1> accReal2, accImag2, accReal3, accImag3;
PacketBlock<Packet,1> accReal4, accImag4;
@@ -2429,6 +2427,7 @@ void gemm_pack_lhs<double, Index, DataMapper, Pack1, Pack2, Packet, RowMajor, Co
pack(blockA, lhs, depth, rows, stride, offset);
}
#if EIGEN_ALTIVEC_USE_CUSTOM_PACK
template<typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
struct gemm_pack_rhs<double, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode>
{
@@ -2456,6 +2455,7 @@ void gemm_pack_rhs<double, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode
dhs_pack<double, Index, DataMapper, Packet2d, RowMajor, PanelMode, false> pack;
pack(blockB, rhs, depth, cols, stride, offset);
}
#endif
template<typename Index, typename DataMapper, int Pack1, int Pack2, typename Packet, bool Conjugate, bool PanelMode>
struct gemm_pack_lhs<float, Index, DataMapper, Pack1, Pack2, Packet, RowMajor, Conjugate, PanelMode>
@@ -2484,6 +2484,7 @@ void gemm_pack_lhs<float, Index, DataMapper, Pack1, Pack2, Packet, ColMajor, Con
dhs_pack<float, Index, DataMapper, Packet4f, ColMajor, PanelMode, true> pack;
pack(blockA, lhs, depth, rows, stride, offset);
}
template<typename Index, typename DataMapper, int Pack1, int Pack2, typename Packet, bool Conjugate, bool PanelMode>
struct gemm_pack_lhs<std::complex<float>, Index, DataMapper, Pack1, Pack2, Packet, RowMajor, Conjugate, PanelMode>
{
@@ -2512,6 +2513,7 @@ void gemm_pack_lhs<std::complex<float>, Index, DataMapper, Pack1, Pack2, Packet,
pack(blockA, lhs, depth, rows, stride, offset);
}
#if EIGEN_ALTIVEC_USE_CUSTOM_PACK
template<typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
struct gemm_pack_rhs<float, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode>
{
@@ -2539,6 +2541,7 @@ void gemm_pack_rhs<float, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode>
dhs_pack<float, Index, DataMapper, Packet4f, RowMajor, PanelMode, false> pack;
pack(blockB, rhs, depth, cols, stride, offset);
}
#endif
template<typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
struct gemm_pack_rhs<std::complex<float>, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode>

@@ -54,7 +54,7 @@ EIGEN_STRONG_INLINE void gemm_unrolled_col(
const Packet& pAlpha);
template<typename Packet>
EIGEN_STRONG_INLINE Packet bmask(const int remaining_rows);
EIGEN_ALWAYS_INLINE Packet bmask(const int remaining_rows);
template<typename Scalar, typename Packet, typename Packetc, typename DataMapper, typename Index, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void gemm_complex_extra_col(
@@ -107,19 +107,19 @@ EIGEN_STRONG_INLINE void gemm_complex_unrolled_col(
const Packet& pAlphaImag);
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE Packet ploadLhs(const Scalar* lhs);
EIGEN_ALWAYS_INLINE Packet ploadLhs(const Scalar* lhs);
template<typename DataMapper, typename Packet, typename Index, const Index accCols, int N, int StorageOrder>
EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,4>& acc, const DataMapper& res, Index row, Index col);
EIGEN_ALWAYS_INLINE void bload(PacketBlock<Packet,4>& acc, const DataMapper& res, Index row, Index col);
template<typename DataMapper, typename Packet, typename Index, const Index accCols, int N, int StorageOrder>
EIGEN_STRONG_INLINE void bload(PacketBlock<Packet,8>& acc, const DataMapper& res, Index row, Index col);
EIGEN_ALWAYS_INLINE void bload(PacketBlock<Packet,8>& acc, const DataMapper& res, Index row, Index col);
template<typename Packet>
EIGEN_STRONG_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha);
EIGEN_ALWAYS_INLINE void bscale(PacketBlock<Packet,4>& acc, PacketBlock<Packet,4>& accZ, const Packet& pAlpha);
template<typename Packet, int N>
EIGEN_STRONG_INLINE void bscalec(PacketBlock<Packet,N>& aReal, PacketBlock<Packet,N>& aImag, const Packet& bReal, const Packet& bImag, PacketBlock<Packet,N>& cReal, PacketBlock<Packet,N>& cImag);
EIGEN_ALWAYS_INLINE void bscalec(PacketBlock<Packet,N>& aReal, PacketBlock<Packet,N>& aImag, const Packet& bReal, const Packet& bImag, PacketBlock<Packet,N>& cReal, PacketBlock<Packet,N>& cImag);
const static Packet16uc p16uc_SETCOMPLEX32_FIRST = { 0, 1, 2, 3,
16, 17, 18, 19,
@@ -141,7 +141,7 @@ const static Packet16uc p16uc_SETCOMPLEX64_SECOND = { 8, 9, 10, 11, 12, 13, 14
// Grab two decoupled real/imaginary PacketBlocks and return two coupled (real/imaginary pairs) PacketBlocks.
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple_common(PacketBlock<Packet,4>& taccReal, PacketBlock<Packet,4>& taccImag, PacketBlock<Packetc, 4>& acc1, PacketBlock<Packetc, 4>& acc2)
EIGEN_ALWAYS_INLINE void bcouple_common(PacketBlock<Packet,4>& taccReal, PacketBlock<Packet,4>& taccImag, PacketBlock<Packetc, 4>& acc1, PacketBlock<Packetc, 4>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_FIRST);
acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX32_FIRST);
@@ -155,7 +155,7 @@ EIGEN_STRONG_INLINE void bcouple_common(PacketBlock<Packet,4>& taccReal, PacketB
}
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple(PacketBlock<Packet,4>& taccReal, PacketBlock<Packet,4>& taccImag, PacketBlock<Packetc,8>& tRes, PacketBlock<Packetc, 4>& acc1, PacketBlock<Packetc, 4>& acc2)
EIGEN_ALWAYS_INLINE void bcouple(PacketBlock<Packet,4>& taccReal, PacketBlock<Packet,4>& taccImag, PacketBlock<Packetc,8>& tRes, PacketBlock<Packetc, 4>& acc1, PacketBlock<Packetc, 4>& acc2)
{
bcouple_common<Packet, Packetc>(taccReal, taccImag, acc1, acc2);
@@ -171,7 +171,7 @@ EIGEN_STRONG_INLINE void bcouple(PacketBlock<Packet,4>& taccReal, PacketBlock<Pa
}
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple_common(PacketBlock<Packet,1>& taccReal, PacketBlock<Packet,1>& taccImag, PacketBlock<Packetc, 1>& acc1, PacketBlock<Packetc, 1>& acc2)
EIGEN_ALWAYS_INLINE void bcouple_common(PacketBlock<Packet,1>& taccReal, PacketBlock<Packet,1>& taccImag, PacketBlock<Packetc, 1>& acc1, PacketBlock<Packetc, 1>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_FIRST);
@@ -179,7 +179,7 @@ EIGEN_STRONG_INLINE void bcouple_common(PacketBlock<Packet,1>& taccReal, PacketB
}
template<typename Packet, typename Packetc>
EIGEN_STRONG_INLINE void bcouple(PacketBlock<Packet,1>& taccReal, PacketBlock<Packet,1>& taccImag, PacketBlock<Packetc,2>& tRes, PacketBlock<Packetc, 1>& acc1, PacketBlock<Packetc, 1>& acc2)
EIGEN_ALWAYS_INLINE void bcouple(PacketBlock<Packet,1>& taccReal, PacketBlock<Packet,1>& taccImag, PacketBlock<Packetc,2>& tRes, PacketBlock<Packetc, 1>& acc1, PacketBlock<Packetc, 1>& acc2)
{
bcouple_common<Packet, Packetc>(taccReal, taccImag, acc1, acc2);
@@ -189,7 +189,7 @@ EIGEN_STRONG_INLINE void bcouple(PacketBlock<Packet,1>& taccReal, PacketBlock<Pa
}
template<>
EIGEN_STRONG_INLINE void bcouple_common<Packet2d, Packet1cd>(PacketBlock<Packet2d,4>& taccReal, PacketBlock<Packet2d,4>& taccImag, PacketBlock<Packet1cd, 4>& acc1, PacketBlock<Packet1cd, 4>& acc2)
EIGEN_ALWAYS_INLINE void bcouple_common<Packet2d, Packet1cd>(PacketBlock<Packet2d,4>& taccReal, PacketBlock<Packet2d,4>& taccImag, PacketBlock<Packet1cd, 4>& acc1, PacketBlock<Packet1cd, 4>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_FIRST);
acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX64_FIRST);
@@ -203,7 +203,7 @@ EIGEN_STRONG_INLINE void bcouple_common<Packet2d, Packet1cd>(PacketBlock<Packet2
}
template<>
EIGEN_STRONG_INLINE void bcouple_common<Packet2d, Packet1cd>(PacketBlock<Packet2d,1>& taccReal, PacketBlock<Packet2d,1>& taccImag, PacketBlock<Packet1cd, 1>& acc1, PacketBlock<Packet1cd, 1>& acc2)
EIGEN_ALWAYS_INLINE void bcouple_common<Packet2d, Packet1cd>(PacketBlock<Packet2d,1>& taccReal, PacketBlock<Packet2d,1>& taccImag, PacketBlock<Packet1cd, 1>& acc1, PacketBlock<Packet1cd, 1>& acc2)
{
acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_FIRST);
@@ -212,9 +212,9 @@ EIGEN_STRONG_INLINE void bcouple_common<Packet2d, Packet1cd>(PacketBlock<Packet2
// This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled.
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE Packet ploadRhs(const Scalar* rhs)
EIGEN_ALWAYS_INLINE Packet ploadRhs(const Scalar* rhs)
{
return *reinterpret_cast<Packet *>(const_cast<Scalar *>(rhs));
return ploadu<Packet>(rhs);
}
} // end namespace internal
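
The `ploadRhs` hunk above replaces a dereference of a `reinterpret_cast` pointer with `ploadu<Packet>(rhs)`, an unaligned load. The sketch below illustrates the same idea in portable C++ and is not Eigen code: `Vec4f`, `load_unaligned` and `lane_sum` are hypothetical names invented for this example, and `std::memcpy` stands in for the platform's unaligned vector load.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a 16-byte SIMD packet; not an Eigen type.
struct alignas(16) Vec4f {
  float v[4];
};

// Copies four floats starting at `src`, which may be only 4-byte aligned.
// Dereferencing reinterpret_cast<const Vec4f*>(src) instead would be
// undefined behaviour whenever `src` is not 16-byte aligned.
inline Vec4f load_unaligned(const float* src) {
  assert(src != nullptr);
  Vec4f out;
  std::memcpy(out.v, src, sizeof(out.v));
  return out;
}

// Sums the four lanes; only here to make the loaded values easy to check.
inline float lane_sum(const Vec4f& p) {
  return p.v[0] + p.v[1] + p.v[2] + p.v[3];
}
```

Calling `load_unaligned(buf + 1)` on a 16-byte aligned `float buf[8]` reads elements 1 to 4, an address that is deliberately off the 16-byte boundary.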

@@ -24,13 +24,13 @@ namespace Eigen {
namespace internal {
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE void bsetzeroMMA(__vector_quad* acc)
EIGEN_ALWAYS_INLINE void bsetzeroMMA(__vector_quad* acc)
{
__builtin_mma_xxsetaccz(acc);
}
template<typename DataMapper, typename Index, typename Packet, const Index accCols>
EIGEN_STRONG_INLINE void storeAccumulator(Index i, Index j, const DataMapper& data, const Packet& alpha, __vector_quad* acc)
EIGEN_ALWAYS_INLINE void storeAccumulator(Index i, Index j, const DataMapper& data, const Packet& alpha, __vector_quad* acc)
{
PacketBlock<Packet, 4> result;
__builtin_mma_disassemble_acc(&result.packet, acc);
@@ -44,7 +44,7 @@ EIGEN_STRONG_INLINE void storeAccumulator(Index i, Index j, const DataMapper& da
}
template<typename DataMapper, typename Index, typename Packet, typename Packetc, const Index accColsC, int N>
EIGEN_STRONG_INLINE void storeComplexAccumulator(Index i, Index j, const DataMapper& data, const Packet& alphaReal, const Packet& alphaImag, __vector_quad* accReal, __vector_quad* accImag)
EIGEN_ALWAYS_INLINE void storeComplexAccumulator(Index i, Index j, const DataMapper& data, const Packet& alphaReal, const Packet& alphaImag, __vector_quad* accReal, __vector_quad* accImag)
{
PacketBlock<Packet, 4> resultReal, resultImag;
__builtin_mma_disassemble_acc(&resultReal.packet, accReal);
@@ -65,7 +65,7 @@ EIGEN_STRONG_INLINE void storeComplexAccumulator(Index i, Index j, const DataMap
// Defaults to float32, since Eigen still supports C++03 we can't use default template arguments
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const RhsPacket& a, const LhsPacket& b)
EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const RhsPacket& a, const LhsPacket& b)
{
if(NegativeAccumulate)
{
@@ -76,7 +76,7 @@ EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const RhsPacket& a, const L
}
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const PacketBlock<Packet2d,2>& a, const Packet2d& b)
EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const PacketBlock<Packet2d,2>& a, const Packet2d& b)
{
__vector_pair* a0 = (__vector_pair *)(&a.packet[0]);
if(NegativeAccumulate)
@@ -88,7 +88,7 @@ EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const PacketBlock<Packet2d,
}
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const __vector_pair& a, const Packet2d& b)
EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const __vector_pair& a, const Packet2d& b)
{
if(NegativeAccumulate)
{
@@ -99,15 +99,13 @@ EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const __vector_pair& a, con
}
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_STRONG_INLINE void pgerMMA(__vector_quad* acc, const __vector_pair& a, const Packet4f& b)
EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad*, const __vector_pair&, const Packet4f&)
{
EIGEN_UNUSED_VARIABLE(acc); // Just for compilation
EIGEN_UNUSED_VARIABLE(a);
EIGEN_UNUSED_VARIABLE(b);
// Just for compilation
}
template<typename Scalar, typename Packet, typename RhsPacket, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void pgercMMA(__vector_quad* accReal, __vector_quad* accImag, const Packet& lhsV, const Packet& lhsVi, const RhsPacket& rhsV, const RhsPacket& rhsVi)
EIGEN_ALWAYS_INLINE void pgercMMA(__vector_quad* accReal, __vector_quad* accImag, const Packet& lhsV, const Packet& lhsVi, const RhsPacket& rhsV, const RhsPacket& rhsVi)
{
pgerMMA<Packet, RhsPacket, false>(accReal, rhsV, lhsV);
if(LhsIsReal) {
@@ -125,20 +123,20 @@ EIGEN_STRONG_INLINE void pgercMMA(__vector_quad* accReal, __vector_quad* accImag
// This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled.
template<typename Scalar, typename Packet>
EIGEN_STRONG_INLINE void ploadRhsMMA(const Scalar* rhs, Packet& rhsV)
EIGEN_ALWAYS_INLINE void ploadRhsMMA(const Scalar* rhs, Packet& rhsV)
{
rhsV = ploadRhs<Scalar, Packet>((const Scalar*)(rhs));
}
template<>
EIGEN_STRONG_INLINE void ploadRhsMMA<double, PacketBlock<Packet2d, 2> >(const double* rhs, PacketBlock<Packet2d, 2>& rhsV)
EIGEN_ALWAYS_INLINE void ploadRhsMMA<double, PacketBlock<Packet2d, 2> >(const double* rhs, PacketBlock<Packet2d, 2>& rhsV)
{
rhsV.packet[0] = ploadRhs<double, Packet2d>((const double *)((Packet2d *)rhs ));
rhsV.packet[1] = ploadRhs<double, Packet2d>((const double *)(((Packet2d *)rhs) + 1));
}
template<>
EIGEN_STRONG_INLINE void ploadRhsMMA<double, __vector_pair>(const double* rhs, __vector_pair& rhsV)
EIGEN_ALWAYS_INLINE void ploadRhsMMA<double, __vector_pair>(const double* rhs, __vector_pair& rhsV)
{
#if EIGEN_COMP_LLVM
__builtin_vsx_assemble_pair(&rhsV,
@@ -150,11 +148,9 @@ EIGEN_STRONG_INLINE void ploadRhsMMA<double, __vector_pair>(const double* rhs, _
}
template<>
EIGEN_STRONG_INLINE void ploadRhsMMA(const float* rhs, __vector_pair& rhsV)
EIGEN_ALWAYS_INLINE void ploadRhsMMA(const float*, __vector_pair&)
{
// Just for compilation
EIGEN_UNUSED_VARIABLE(rhs);
EIGEN_UNUSED_VARIABLE(rhsV);
}
// PEEL_MMA loop factor.
@@ -259,9 +255,8 @@ EIGEN_STRONG_INLINE void gemm_unrolled_MMA_iteration(
Index col,
const Packet& pAlpha)
{
asm("#gemm_MMA begin");
const Scalar* rhs_ptr = rhs_base;
const Scalar* lhs_ptr0, * lhs_ptr1, * lhs_ptr2, * lhs_ptr3, * lhs_ptr4, * lhs_ptr5, * lhs_ptr6, * lhs_ptr7;
const Scalar* lhs_ptr0 = NULL, * lhs_ptr1 = NULL, * lhs_ptr2 = NULL, * lhs_ptr3 = NULL, * lhs_ptr4 = NULL, * lhs_ptr5 = NULL, * lhs_ptr6 = NULL, * lhs_ptr7 = NULL;
__vector_quad accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7;
MICRO_MMA_SRC_PTR
@@ -281,7 +276,6 @@ asm("#gemm_MMA begin");
MICRO_MMA_STORE
row += unroll_factor*accCols;
asm("#gemm_MMA end");
}
template<typename Scalar, typename Index, typename Packet, typename RhsPacket, typename DataMapper, const Index accRows, const Index accCols>
@@ -509,7 +503,6 @@ EIGEN_STRONG_INLINE void gemm_complex_unrolled_MMA_iteration(
const Packet& pAlphaReal,
const Packet& pAlphaImag)
{
asm("#gemm_complex_MMA begin");
const Scalar* rhs_ptr_real = rhs_base;
const Scalar* rhs_ptr_imag;
if(!RhsIsReal) {
@@ -517,9 +510,9 @@ asm("#gemm_complex_MMA begin");
} else {
EIGEN_UNUSED_VARIABLE(rhs_ptr_imag);
}
const Scalar* lhs_ptr_real0, * lhs_ptr_imag0, * lhs_ptr_real1, * lhs_ptr_imag1;
const Scalar* lhs_ptr_real2, * lhs_ptr_imag2, * lhs_ptr_real3, * lhs_ptr_imag3;
const Scalar* lhs_ptr_real4, * lhs_ptr_imag4;
const Scalar* lhs_ptr_real0 = NULL, * lhs_ptr_imag0 = NULL, * lhs_ptr_real1 = NULL, * lhs_ptr_imag1 = NULL;
const Scalar* lhs_ptr_real2 = NULL, * lhs_ptr_imag2 = NULL, * lhs_ptr_real3 = NULL, * lhs_ptr_imag3 = NULL;
const Scalar* lhs_ptr_real4 = NULL, * lhs_ptr_imag4 = NULL;
__vector_quad accReal0, accImag0, accReal1, accImag1, accReal2, accImag2, accReal3, accImag3, accReal4, accImag4;
MICRO_COMPLEX_MMA_SRC_PTR
@@ -542,7 +535,6 @@ asm("#gemm_complex_MMA begin");
MICRO_COMPLEX_MMA_STORE
row += unroll_factor*accCols;
asm("#gemm_complex_MMA end");
}
template<typename LhsScalar, typename RhsScalar, typename Scalarc, typename Scalar, typename Index, typename Packet, typename Packetc, typename RhsPacket, typename DataMapper, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>

@@ -22,10 +22,6 @@ namespace internal {
#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#endif
#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#define EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#endif
// NOTE Altivec has 32 registers, but Eigen only accepts a value of 8 or 16
#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32
@@ -437,7 +433,7 @@ EIGEN_STRONG_INLINE Packet pload_common(const __UNPACK_TYPE__(Packet)* from)
EIGEN_UNUSED_VARIABLE(from);
EIGEN_DEBUG_ALIGNED_LOAD
#ifdef __VSX__
return vec_xl(0, from);
return vec_xl(0, const_cast<__UNPACK_TYPE__(Packet)*>(from));
#else
return vec_ld(0, from);
#endif
@@ -871,17 +867,26 @@ template<> EIGEN_STRONG_INLINE Packet16uc pmax<Packet16uc>(const Packet16uc& a,
template<> EIGEN_STRONG_INLINE Packet4f pcmp_le(const Packet4f& a, const Packet4f& b) { return reinterpret_cast<Packet4f>(vec_cmple(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt(const Packet4f& a, const Packet4f& b) { return reinterpret_cast<Packet4f>(vec_cmplt(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4f pcmp_eq(const Packet4f& a, const Packet4f& b) { return reinterpret_cast<Packet4f>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet16c pcmp_eq(const Packet16c& a, const Packet16c& b) { return reinterpret_cast<Packet16c>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet16uc pcmp_eq(const Packet16uc& a, const Packet16uc& b) { return reinterpret_cast<Packet16uc>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet8s pcmp_eq(const Packet8s& a, const Packet8s& b) { return reinterpret_cast<Packet8s>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet8us pcmp_eq(const Packet8us& a, const Packet8us& b) { return reinterpret_cast<Packet8us>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt_or_nan(const Packet4f& a, const Packet4f& b) {
Packet4f c = reinterpret_cast<Packet4f>(vec_cmpge(a,b));
return vec_nor(c,c);
}
template<> EIGEN_STRONG_INLINE Packet4i pcmp_le(const Packet4i& a, const Packet4i& b) { return reinterpret_cast<Packet4i>(vec_cmple(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4i pcmp_lt(const Packet4i& a, const Packet4i& b) { return reinterpret_cast<Packet4i>(vec_cmplt(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4i pcmp_eq(const Packet4i& a, const Packet4i& b) { return reinterpret_cast<Packet4i>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet8s pcmp_le(const Packet8s& a, const Packet8s& b) { return reinterpret_cast<Packet8s>(vec_cmple(a,b)); }
template<> EIGEN_STRONG_INLINE Packet8s pcmp_lt(const Packet8s& a, const Packet8s& b) { return reinterpret_cast<Packet8s>(vec_cmplt(a,b)); }
template<> EIGEN_STRONG_INLINE Packet8s pcmp_eq(const Packet8s& a, const Packet8s& b) { return reinterpret_cast<Packet8s>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet8us pcmp_le(const Packet8us& a, const Packet8us& b) { return reinterpret_cast<Packet8us>(vec_cmple(a,b)); }
template<> EIGEN_STRONG_INLINE Packet8us pcmp_lt(const Packet8us& a, const Packet8us& b) { return reinterpret_cast<Packet8us>(vec_cmplt(a,b)); }
template<> EIGEN_STRONG_INLINE Packet8us pcmp_eq(const Packet8us& a, const Packet8us& b) { return reinterpret_cast<Packet8us>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet16c pcmp_le(const Packet16c& a, const Packet16c& b) { return reinterpret_cast<Packet16c>(vec_cmple(a,b)); }
template<> EIGEN_STRONG_INLINE Packet16c pcmp_lt(const Packet16c& a, const Packet16c& b) { return reinterpret_cast<Packet16c>(vec_cmplt(a,b)); }
template<> EIGEN_STRONG_INLINE Packet16c pcmp_eq(const Packet16c& a, const Packet16c& b) { return reinterpret_cast<Packet16c>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet16uc pcmp_le(const Packet16uc& a, const Packet16uc& b) { return reinterpret_cast<Packet16uc>(vec_cmple(a,b)); }
template<> EIGEN_STRONG_INLINE Packet16uc pcmp_lt(const Packet16uc& a, const Packet16uc& b) { return reinterpret_cast<Packet16uc>(vec_cmplt(a,b)); }
template<> EIGEN_STRONG_INLINE Packet16uc pcmp_eq(const Packet16uc& a, const Packet16uc& b) { return reinterpret_cast<Packet16uc>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4f pand<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_and(a, b); }
template<> EIGEN_STRONG_INLINE Packet4i pand<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_and(a, b); }
@@ -906,8 +911,8 @@ template<> EIGEN_STRONG_INLINE Packet8bf pxor<Packet8bf>(const Packet8bf& a, con
return pxor<Packet8us>(a, b);
}
template<> EIGEN_STRONG_INLINE Packet4f pandnot<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_and(a, vec_nor(b, b)); }
template<> EIGEN_STRONG_INLINE Packet4i pandnot<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_and(a, vec_nor(b, b)); }
template<> EIGEN_STRONG_INLINE Packet4f pandnot<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_andc(a, b); }
template<> EIGEN_STRONG_INLINE Packet4i pandnot<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_andc(a, b); }
template<> EIGEN_STRONG_INLINE Packet4f pselect(const Packet4f& mask, const Packet4f& a, const Packet4f& b) {
return vec_sel(b, a, reinterpret_cast<Packet4ui>(mask));
@@ -956,7 +961,7 @@ template<typename Packet> EIGEN_STRONG_INLINE Packet ploadu_common(const __UNPAC
return (Packet) vec_perm(MSQ, LSQ, mask); // align the data
#else
EIGEN_DEBUG_UNALIGNED_LOAD
return vec_xl(0, from);
return vec_xl(0, const_cast<__UNPACK_TYPE__(Packet)*>(from));
#endif
}
@@ -1264,15 +1269,15 @@ EIGEN_STRONG_INLINE Packet8bf F32ToBf16(Packet4f p4f){
Packet4bi is_max_exp = vec_cmpeq(exp, p4ui_max_exp);
Packet4bi is_zero_exp = vec_cmpeq(exp, reinterpret_cast<Packet4ui>(p4i_ZERO));
Packet4bi is_mant_not_zero = vec_cmpne(mantissa, reinterpret_cast<Packet4ui>(p4i_ZERO));
Packet4ui nan_selector = pand<Packet4ui>(
Packet4bi is_mant_zero = vec_cmpeq(mantissa, reinterpret_cast<Packet4ui>(p4i_ZERO));
Packet4ui nan_selector = pandnot<Packet4ui>(
reinterpret_cast<Packet4ui>(is_max_exp),
reinterpret_cast<Packet4ui>(is_mant_not_zero)
reinterpret_cast<Packet4ui>(is_mant_zero)
);
Packet4ui subnormal_selector = pand<Packet4ui>(
Packet4ui subnormal_selector = pandnot<Packet4ui>(
reinterpret_cast<Packet4ui>(is_zero_exp),
reinterpret_cast<Packet4ui>(is_mant_not_zero)
reinterpret_cast<Packet4ui>(is_mant_zero)
);
const _EIGEN_DECLARE_CONST_FAST_Packet4ui(nan, 0x7FC00000);
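
The `F32ToBf16` hunk above swaps `pand(x, mantissa != 0)` for `pandnot(x, mantissa == 0)`, so that only an equality compare is needed. The scalar sketch below checks that the two forms select the same lanes. It is an illustration under stated assumptions rather than the Eigen implementation: the helper names are hypothetical, one `uint32_t` plays the role of a single vector lane, and the field masks are the standard IEEE-754 binary32 values rather than constants taken from the diff.

```cpp
#include <cassert>
#include <cstdint>

// IEEE-754 binary32 field masks (standard values, not taken from the diff).
const std::uint32_t kExpMask = 0x7F800000u;
const std::uint32_t kMantMask = 0x007FFFFFu;

// Per-lane compare results: all ones for true, all zeros for false.
inline std::uint32_t mask_eq(std::uint32_t a, std::uint32_t b) { return a == b ? 0xFFFFFFFFu : 0u; }
inline std::uint32_t mask_ne(std::uint32_t a, std::uint32_t b) { return a != b ? 0xFFFFFFFFu : 0u; }

// Scalar analogue of an and-with-complement: a & ~b.
inline std::uint32_t and_not(std::uint32_t a, std::uint32_t b) { return a & ~b; }

// Old form: (exponent == max) AND (mantissa != 0).
inline std::uint32_t nan_selector_ne(std::uint32_t bits) {
  return mask_eq(bits & kExpMask, kExpMask) & mask_ne(bits & kMantMask, 0u);
}

// New form: (exponent == max) AND NOT (mantissa == 0); no "not equal" compare needed.
inline std::uint32_t nan_selector_eq(std::uint32_t bits) {
  return and_not(mask_eq(bits & kExpMask, kExpMask), mask_eq(bits & kMantMask, 0u));
}
```

For example, `assert(nan_selector_ne(0x7FC00000u) == nan_selector_eq(0x7FC00000u));` holds for a quiet NaN, and both selectors return zero for infinity (`0x7F800000u`).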
@@ -1411,6 +1416,9 @@ template<> EIGEN_STRONG_INLINE Packet8bf pmax<Packet8bf>(const Packet8bf& a, con
template<> EIGEN_STRONG_INLINE Packet8bf pcmp_lt(const Packet8bf& a, const Packet8bf& b) {
BF16_TO_F32_BINARY_OP_WRAPPER_BOOL(pcmp_lt<Packet4f>, a, b);
}
template<> EIGEN_STRONG_INLINE Packet8bf pcmp_lt_or_nan(const Packet8bf& a, const Packet8bf& b) {
BF16_TO_F32_BINARY_OP_WRAPPER_BOOL(pcmp_lt_or_nan<Packet4f>, a, b);
}
template<> EIGEN_STRONG_INLINE Packet8bf pcmp_le(const Packet8bf& a, const Packet8bf& b) {
BF16_TO_F32_BINARY_OP_WRAPPER_BOOL(pcmp_le<Packet4f>, a, b);
}
@@ -2260,7 +2268,8 @@ static Packet2ul p2ul_SIGN = { 0x8000000000000000ull, 0x8000000000000000ull };
static Packet2ul p2ul_PREV0DOT5 = { 0x3FDFFFFFFFFFFFFFull, 0x3FDFFFFFFFFFFFFFull };
static Packet2d p2d_ONE = { 1.0, 1.0 };
static Packet2d p2d_ZERO = reinterpret_cast<Packet2d>(p4f_ZERO);
static Packet2d p2d_MZERO = { -0.0, -0.0 };
static Packet2d p2d_MZERO = { numext::bit_cast<double>(0x8000000000000000ull),
numext::bit_cast<double>(0x8000000000000000ull) };
#ifdef _BIG_ENDIAN
static Packet2d p2d_COUNTDOWN = reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4f>(p2d_ZERO), reinterpret_cast<Packet4f>(p2d_ONE), 8));
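
The `p2d_MZERO` hunk above builds negative zero from its bit pattern via `numext::bit_cast<double>` instead of the literal `-0.0`, presumably so the sign bit does not depend on how the compiler handles that literal. The sketch below shows the same construction in portable C++. `double_from_bits` and `bits_from_double` are hypothetical helpers written for this example, and the code assumes `double` is an IEEE-754 binary64 value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterprets 64 bits as a double (assumes IEEE-754 binary64).
inline double double_from_bits(std::uint64_t bits) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits wide");
  double d;
  std::memcpy(&d, &bits, sizeof(d));
  return d;
}

// Inverse of double_from_bits: returns the raw bit pattern of a double.
inline std::uint64_t bits_from_double(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));
  return bits;
}
```

With these helpers, `double_from_bits(0x8000000000000000ull)` compares equal to `0.0` while `std::signbit` reports its sign bit as set.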
@@ -2453,7 +2462,7 @@ template<> EIGEN_STRONG_INLINE Packet2d print<Packet2d>(const Packet2d& a)
template<> EIGEN_STRONG_INLINE Packet2d ploadu<Packet2d>(const double* from)
{
EIGEN_DEBUG_UNALIGNED_LOAD
return vec_xl(0, from);
return vec_xl(0, const_cast<double*>(from));
}
template<> EIGEN_STRONG_INLINE Packet2d ploaddup<Packet2d>(const double* from)

@@ -67,27 +67,26 @@ std::complex<T> complex_divide_fast(const std::complex<T>& a, const std::complex
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
const T norm = T(1) / (b_real * b_real + b_imag * b_imag);
return std::complex<T>((a_real * b_real + a_imag * b_imag) * norm,
(a_imag * b_real - a_real * b_imag) * norm);
const T norm = (b_real * b_real + b_imag * b_imag);
return std::complex<T>((a_real * b_real + a_imag * b_imag) / norm,
(a_imag * b_real - a_real * b_imag) / norm);
}
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide_stable(const std::complex<T>& a, const std::complex<T>& b) {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
// Guard against over/under-flow.
const T scale = T(1) / (numext::abs(b_real) + numext::abs(b_imag));
const T a_real_scaled = numext::real(a) * scale;
const T a_imag_scaled = numext::imag(a) * scale;
const T b_real_scaled = b_real * scale;
const T b_imag_scaled = b_imag * scale;
const T b_norm2_scaled = b_real_scaled * b_real_scaled + b_imag_scaled * b_imag_scaled;
return std::complex<T>(
(a_real_scaled * b_real_scaled + a_imag_scaled * b_imag_scaled) / b_norm2_scaled,
(a_imag_scaled * b_real_scaled - a_real_scaled * b_imag_scaled) / b_norm2_scaled);
// Smith's complex division (https://arxiv.org/pdf/1210.4539.pdf),
// guards against over/under-flow.
const bool scale_imag = numext::abs(b_imag) <= numext::abs(b_real);
const T rscale = scale_imag ? T(1) : b_real / b_imag;
const T iscale = scale_imag ? b_imag / b_real : T(1);
const T denominator = b_real * rscale + b_imag * iscale;
return std::complex<T>((a_real * rscale + a_imag * iscale) / denominator,
(a_imag * rscale - a_real * iscale) / denominator);
}
template<typename T>
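Note on the hunk above: the new `complex_divide_stable` body follows Smith's scheme, dividing through by the larger component of the divisor so that no component is ever squared. The following is a minimal standalone sketch of that idea using `std::complex` rather than Eigen's `numext` helpers; the names `divide_naive` and `divide_smith` are hypothetical and are not part of Eigen.

```cpp
#include <cmath>
#include <complex>

// Textbook formula: squares the divisor's components, so the intermediate
// norm can overflow even when the true quotient is small.
template <typename T>
std::complex<T> divide_naive(const std::complex<T>& a, const std::complex<T>& b) {
  const T norm = b.real() * b.real() + b.imag() * b.imag();
  return std::complex<T>((a.real() * b.real() + a.imag() * b.imag()) / norm,
                         (a.imag() * b.real() - a.real() * b.imag()) / norm);
}

// Smith-style division: scale by the ratio of the smaller to the larger
// component of b, mirroring the rscale/iscale logic in the hunk above.
template <typename T>
std::complex<T> divide_smith(const std::complex<T>& a, const std::complex<T>& b) {
  const bool scale_imag = std::abs(b.imag()) <= std::abs(b.real());
  const T rscale = scale_imag ? T(1) : b.real() / b.imag();
  const T iscale = scale_imag ? b.imag() / b.real() : T(1);
  const T denominator = b.real() * rscale + b.imag() * iscale;
  return std::complex<T>((a.real() * rscale + a.imag() * iscale) / denominator,
                         (a.imag() * rscale - a.real() * iscale) / denominator);
}
```

With IEEE doubles and default floating-point settings, dividing `(1e300, 1e300)` by itself should give exactly `(1, 0)` with `divide_smith`, whereas `divide_naive` overflows in the squared norm and produces NaN.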

@@ -250,10 +250,6 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw truncate_to_bfloat16(const
if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(v)) {
output.value = std::signbit(v) ? 0xFFC0: 0x7FC0;
return output;
} else if (std::fabs(v) < std::numeric_limits<float>::min EIGEN_NOT_A_MACRO()) {
// Flush denormal to +/- 0.
output.value = std::signbit(v) ? 0x8000 : 0;
return output;
}
const uint16_t* p = reinterpret_cast<const uint16_t*>(&v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
@@ -288,9 +284,6 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<fals
// qNaN magic: All exponent bits set + most significant bit of fraction
// set.
output.value = std::signbit(ff) ? 0xFFC0: 0x7FC0;
} else if (std::fabs(ff) < std::numeric_limits<float>::min EIGEN_NOT_A_MACRO()) {
// Flush denormal to +/- 0.0
output.value = std::signbit(ff) ? 0x8000 : 0;
} else {
// Fast rounding algorithm that rounds a half value to nearest even. This
// reduces expected error when we convert a large number of floats. Here

@@ -11,19 +11,107 @@
#ifndef EIGEN_ARCH_CONJ_HELPER_H
#define EIGEN_ARCH_CONJ_HELPER_H
#define EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(PACKET_CPLX, PACKET_REAL) \
template<> struct conj_helper<PACKET_REAL, PACKET_CPLX, false,false> { \
EIGEN_STRONG_INLINE PACKET_CPLX pmadd(const PACKET_REAL& x, const PACKET_CPLX& y, const PACKET_CPLX& c) const \
{ return padd(c, pmul(x,y)); } \
EIGEN_STRONG_INLINE PACKET_CPLX pmul(const PACKET_REAL& x, const PACKET_CPLX& y) const \
{ return PACKET_CPLX(Eigen::internal::pmul<PACKET_REAL>(x, y.v)); } \
}; \
\
template<> struct conj_helper<PACKET_CPLX, PACKET_REAL, false,false> { \
EIGEN_STRONG_INLINE PACKET_CPLX pmadd(const PACKET_CPLX& x, const PACKET_REAL& y, const PACKET_CPLX& c) const \
{ return padd(c, pmul(x,y)); } \
EIGEN_STRONG_INLINE PACKET_CPLX pmul(const PACKET_CPLX& x, const PACKET_REAL& y) const \
{ return PACKET_CPLX(Eigen::internal::pmul<PACKET_REAL>(x.v, y)); } \
#define EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(PACKET_CPLX, PACKET_REAL) \
template <> \
struct conj_helper<PACKET_REAL, PACKET_CPLX, false, false> { \
EIGEN_STRONG_INLINE PACKET_CPLX pmadd(const PACKET_REAL& x, \
const PACKET_CPLX& y, \
const PACKET_CPLX& c) const { \
return padd(c, this->pmul(x, y)); \
} \
EIGEN_STRONG_INLINE PACKET_CPLX pmul(const PACKET_REAL& x, \
const PACKET_CPLX& y) const { \
return PACKET_CPLX(Eigen::internal::pmul<PACKET_REAL>(x, y.v)); \
} \
}; \
\
template <> \
struct conj_helper<PACKET_CPLX, PACKET_REAL, false, false> { \
EIGEN_STRONG_INLINE PACKET_CPLX pmadd(const PACKET_CPLX& x, \
const PACKET_REAL& y, \
const PACKET_CPLX& c) const { \
return padd(c, this->pmul(x, y)); \
} \
EIGEN_STRONG_INLINE PACKET_CPLX pmul(const PACKET_CPLX& x, \
const PACKET_REAL& y) const { \
return PACKET_CPLX(Eigen::internal::pmul<PACKET_REAL>(x.v, y)); \
} \
};
#endif // EIGEN_ARCH_CONJ_HELPER_H
namespace Eigen {
namespace internal {
template<bool Conjugate> struct conj_if;
template<> struct conj_if<true> {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(const T& x) const { return numext::conj(x); }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T pconj(const T& x) const { return internal::pconj(x); }
};
template<> struct conj_if<false> {
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& operator()(const T& x) const { return x; }
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& pconj(const T& x) const { return x; }
};
// Generic Implementation, assume scalars since the packet-version is
// specialized below.
template<typename LhsType, typename RhsType, bool ConjLhs, bool ConjRhs>
struct conj_helper {
typedef typename ScalarBinaryOpTraits<LhsType, RhsType>::ReturnType ResultType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType
pmadd(const LhsType& x, const RhsType& y, const ResultType& c) const
{ return this->pmul(x, y) + c; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType
pmul(const LhsType& x, const RhsType& y) const
{ return conj_if<ConjLhs>()(x) * conj_if<ConjRhs>()(y); }
};
template<typename LhsScalar, typename RhsScalar>
struct conj_helper<LhsScalar, RhsScalar, true, true> {
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar>::ReturnType ResultType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType
pmadd(const LhsScalar& x, const RhsScalar& y, const ResultType& c) const
{ return this->pmul(x, y) + c; }
// We save a conjugation by using the identity conj(a)*conj(b) = conj(a*b).
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType
pmul(const LhsScalar& x, const RhsScalar& y) const
{ return numext::conj(x * y); }
};
// Implementation with equal type, use packet operations.
template<typename Packet, bool ConjLhs, bool ConjRhs>
struct conj_helper<Packet, Packet, ConjLhs, ConjRhs>
{
typedef Packet ResultType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pmadd(const Packet& x, const Packet& y, const Packet& c) const
{ return Eigen::internal::pmadd(conj_if<ConjLhs>().pconj(x), conj_if<ConjRhs>().pconj(y), c); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pmul(const Packet& x, const Packet& y) const
{ return Eigen::internal::pmul(conj_if<ConjLhs>().pconj(x), conj_if<ConjRhs>().pconj(y)); }
};
template<typename Packet>
struct conj_helper<Packet, Packet, true, true>
{
typedef Packet ResultType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pmadd(const Packet& x, const Packet& y, const Packet& c) const
{ return Eigen::internal::pmadd(pconj(x), pconj(y), c); }
// We save a conjugation by using the identity conj(a)*conj(b) = conj(a*b).
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pmul(const Packet& x, const Packet& y) const
{ return pconj(Eigen::internal::pmul(x, y)); }
};
} // namespace internal
} // namespace Eigen
#endif // EIGEN_ARCH_CONJ_HELPER_H
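Note on the `<true, true>` specializations above: they rely on the identity conj(a)*conj(b) = conj(a*b) to replace two conjugations with one. A small scalar sketch of that identity, written against `std::complex` instead of Eigen packets; the function names are hypothetical and chosen only for illustration.

```cpp
#include <complex>

// Conjugate both operands, then multiply: two conjugations.
template <typename T>
std::complex<T> mul_conj_both_direct(const std::complex<T>& a, const std::complex<T>& b) {
  return std::conj(a) * std::conj(b);
}

// Multiply first, then conjugate the product: one conjugation.
template <typename T>
std::complex<T> mul_conj_both_fused(const std::complex<T>& a, const std::complex<T>& b) {
  return std::conj(a * b);
}
```

For a = 1 + 2i and b = 3 - 4i both functions return 11 - 2i. The packet version in the hunk applies the same rewrite with `pconj` and `pmul`.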

@@ -19,12 +19,6 @@
namespace Eigen {
namespace internal {
template<typename Packet, int N> EIGEN_DEVICE_FUNC inline Packet
pset(const typename unpacket_traits<Packet>::type (&a)[N] /* a */) {
EIGEN_STATIC_ASSERT(unpacket_traits<Packet>::size == N, THE_ARRAY_SIZE_SHOULD_EQUAL_WITH_PACKET_SIZE);
return pload<Packet>(a);
}
// Creates a Scalar integer type with same bit-width.
template<typename T> struct make_integer;
template<> struct make_integer<float> { typedef numext::int32_t type; };
@@ -808,9 +802,8 @@ Packet psqrt_complex(const Packet& a) {
// l0 = (min0 == 0 ? max0 : max0 * sqrt(1 + (min0/max0)**2)),
// where max0 = max(|x0|, |y0|), min0 = min(|x0|, |y0|), and similarly for l1.
Packet a_flip = pcplxflip(a);
RealPacket a_abs = pabs(a.v); // [|x0|, |y0|, |x1|, |y1|]
RealPacket a_abs_flip = pabs(a_flip.v); // [|y0|, |x0|, |y1|, |x1|]
RealPacket a_abs_flip = pcplxflip(Packet(a_abs)).v; // [|y0|, |x0|, |y1|, |x1|]
RealPacket a_max = pmax(a_abs, a_abs_flip);
RealPacket a_min = pmin(a_abs, a_abs_flip);
RealPacket a_min_zero_mask = pcmp_eq(a_min, pzero(a_min));
@@ -839,7 +832,8 @@ Packet psqrt_complex(const Packet& a) {
// Step 4. Compute solution for inputs with negative real part:
// [|eta0|, sign(y0)*rho0, |eta1|, sign(y1)*rho1]
const RealPacket cst_imag_sign_mask = pset1<Packet>(Scalar(RealScalar(0.0), RealScalar(-0.0))).v;
const RealScalar neg_zero = RealScalar(numext::bit_cast<float>(0x80000000u));
const RealPacket cst_imag_sign_mask = pset1<Packet>(Scalar(RealScalar(0.0), neg_zero)).v;
RealPacket imag_signs = pand(a.v, cst_imag_sign_mask);
Packet negative_real_result;
// Notice that rho is positive, so taking its absolute value is a no-op.
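Note on the `neg_zero` change above: several hunks in this comparison replace `-0.0` literals with `numext::bit_cast` of the sign-bit pattern. The diff does not state the motivation; one plausible reading (an assumption, not confirmed here) is that building the value from its bit pattern avoids depending on how a compiler treats the `-0.0` literal. The sketch below shows the general technique with a `memcpy`-based stand-in; `bit_cast_sketch`, `negative_zero`, and `copy_sign_bit` are hypothetical names, not Eigen APIs.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// memcpy-based stand-in for a bit cast between same-sized trivially
// copyable types (hypothetical helper, not Eigen's numext::bit_cast).
template <typename To, typename From>
To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

// Builds -0.0f from its IEEE-754 bit pattern: only the sign bit is set.
inline float negative_zero() {
  return bit_cast_sketch<float>(std::uint32_t(0x80000000u));
}

// Uses the bits of -0.0f as a sign mask, the way the packet code uses
// cst_imag_sign_mask with pand: take the sign of sign_source and the
// magnitude of value.
inline float copy_sign_bit(float value, float sign_source) {
  const std::uint32_t sign_mask = bit_cast_sketch<std::uint32_t>(negative_zero());
  const std::uint32_t v = bit_cast_sketch<std::uint32_t>(value);
  const std::uint32_t s = bit_cast_sketch<std::uint32_t>(sign_source);
  return bit_cast_sketch<float>((v & ~sign_mask) | (s & sign_mask));
}
```

The value returned by `negative_zero()` compares equal to `0.0f` but has its sign bit set, which is what makes it usable as a mask.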

@@ -17,10 +17,6 @@ namespace internal {
// implemented in GenericPacketMathFunctions.h
// This is needed to workaround a circular dependency.
/** \internal \returns a packet with constant coefficients \a a, e.g.: (a[N-1],...,a[0]) */
template<typename Packet, int N> EIGEN_DEVICE_FUNC inline Packet
pset(const typename unpacket_traits<Packet>::type (&a)[N] /* a */);
/***************************************************************************
* Some generic implementations to be used by implementors
***************************************************************************/

@@ -305,42 +305,6 @@ EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a
(a.v[0] * a.v[3]) + (a.v[1] * a.v[2]));
}
template <>
struct conj_helper<Packet2cf, Packet2cf, false, true> {
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y,
const Packet2cf& c) const {
return padd(pmul(x, y), c);
}
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const {
return internal::pmul(a, pconj(b));
}
};
template <>
struct conj_helper<Packet2cf, Packet2cf, true, false> {
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y,
const Packet2cf& c) const {
return padd(pmul(x, y), c);
}
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const {
return internal::pmul(pconj(a), b);
}
};
template <>
struct conj_helper<Packet2cf, Packet2cf, true, true> {
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y,
const Packet2cf& c) const {
return padd(pmul(x, y), c);
}
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const {
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf, Packet4f)
template <>
@@ -644,42 +608,6 @@ EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd&
return pfirst(a);
}
template <>
struct conj_helper<Packet1cd, Packet1cd, false, true> {
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y,
const Packet1cd& c) const {
return padd(pmul(x, y), c);
}
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const {
return internal::pmul(a, pconj(b));
}
};
template <>
struct conj_helper<Packet1cd, Packet1cd, true, false> {
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y,
const Packet1cd& c) const {
return padd(pmul(x, y), c);
}
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const {
return internal::pmul(pconj(a), b);
}
};
template <>
struct conj_helper<Packet1cd, Packet1cd, true, true> {
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y,
const Packet1cd& c) const {
return padd(pmul(x, y), c);
}
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const {
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd, Packet2d)
template <>

@@ -28,10 +28,6 @@ namespace internal {
#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#endif
#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#define EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#endif
#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32
#endif

@@ -124,13 +124,6 @@ template<> EIGEN_STRONG_INLINE Packet1cf psub<Packet1cf>(const Packet1cf& a, con
template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{ return Packet2cf(psub<Packet4f>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pxor<Packet2cf>(const Packet2cf& a, const Packet2cf& b);
template<> EIGEN_STRONG_INLINE Packet2cf paddsub<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
Packet4f mask = {-0.0f, -0.0f, 0.0f, 0.0f};
return Packet2cf(padd(a.v, pxor(mask, b.v)));
}
template<> EIGEN_STRONG_INLINE Packet1cf pnegate(const Packet1cf& a) { return Packet1cf(pnegate<Packet2f>(a.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate<Packet4f>(a.v)); }
@@ -349,67 +342,13 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return s;
}
template<> struct conj_helper<Packet1cf,Packet1cf,false,true>
{
EIGEN_STRONG_INLINE Packet1cf pmadd(const Packet1cf& x, const Packet1cf& y, const Packet1cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cf pmul(const Packet1cf& a, const Packet1cf& b) const
{ return internal::pmul(a, pconj(b)); }
};
template<> struct conj_helper<Packet1cf,Packet1cf,true,false>
{
EIGEN_STRONG_INLINE Packet1cf pmadd(const Packet1cf& x, const Packet1cf& y, const Packet1cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cf pmul(const Packet1cf& a, const Packet1cf& b) const
{ return internal::pmul(pconj(a), b); }
};
template<> struct conj_helper<Packet1cf,Packet1cf,true,true>
{
EIGEN_STRONG_INLINE Packet1cf pmadd(const Packet1cf& x, const Packet1cf& y, const Packet1cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cf pmul(const Packet1cf& a, const Packet1cf& b) const
{ return pconj(internal::pmul(a,b)); }
};
template<> struct conj_helper<Packet2cf,Packet2cf,false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{ return internal::pmul(a, pconj(b)); }
};
template<> struct conj_helper<Packet2cf,Packet2cf,true,false>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{ return internal::pmul(pconj(a), b); }
};
template<> struct conj_helper<Packet2cf,Packet2cf,true,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{ return pconj(internal::pmul(a,b)); }
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cf,Packet2f)
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f)
template<> EIGEN_STRONG_INLINE Packet1cf pdiv<Packet1cf>(const Packet1cf& a, const Packet1cf& b)
{
// TODO optimize it for NEON
Packet1cf res = conj_helper<Packet1cf, Packet1cf, false, true>().pmul(a,b);
Packet1cf res = pmul(a, pconj(b));
Packet2f s, rev_s;
// this computes the norm
@@ -421,7 +360,7 @@ template<> EIGEN_STRONG_INLINE Packet1cf pdiv<Packet1cf>(const Packet1cf& a, con
template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
// TODO optimize it for NEON
Packet2cf res = conj_helper<Packet2cf, Packet2cf, false, true>().pmul(a,b);
Packet2cf res = pmul(a,pconj(b));
Packet4f s, rev_s;
// this computes the norm
@@ -610,39 +549,12 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Pack
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{ return internal::pmul(a, pconj(b)); }
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{ return internal::pmul(pconj(a), b); }
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{ return pconj(internal::pmul(a,b)); }
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d)
template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
// TODO optimize it for NEON
Packet1cd res = conj_helper<Packet1cd,Packet1cd,false,true>().pmul(a,b);
Packet1cd res = pmul(a,pconj(b));
Packet2d s = pmul<Packet2d>(b.v, b.v);
Packet2d rev_s = preverse<Packet2d>(s);

@@ -24,10 +24,6 @@ namespace internal {
#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#endif
#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#define EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#endif
#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
#if EIGEN_ARCH_ARM64
#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32
@@ -36,7 +32,7 @@ namespace internal {
#endif
#endif
#if EIGEN_COMP_MSVC
#if EIGEN_COMP_MSVC_STRICT
// In MSVC's arm_neon.h header file, all NEON vector types
// are aliases to the same underlying type __n128.
@@ -82,7 +78,7 @@ typedef uint32x4_t Packet4ui;
typedef int64x2_t Packet2l;
typedef uint64x2_t Packet2ul;
#endif // EIGEN_COMP_MSVC
#endif // EIGEN_COMP_MSVC_STRICT
EIGEN_STRONG_INLINE Packet4f shuffle1(const Packet4f& m, int mask){
const float* a = reinterpret_cast<const float*>(&m);
@@ -866,12 +862,12 @@ template<> EIGEN_STRONG_INLINE Packet2ul psub<Packet2ul>(const Packet2ul& a, con
template<> EIGEN_STRONG_INLINE Packet2f pxor<Packet2f>(const Packet2f& a, const Packet2f& b);
template<> EIGEN_STRONG_INLINE Packet2f paddsub<Packet2f>(const Packet2f& a, const Packet2f & b) {
Packet2f mask = {-0.0f, 0.0f};
Packet2f mask = {numext::bit_cast<float>(0x80000000u), 0.0f};
return padd(a, pxor(mask, b));
}
template<> EIGEN_STRONG_INLINE Packet4f pxor<Packet4f>(const Packet4f& a, const Packet4f& b);
template<> EIGEN_STRONG_INLINE Packet4f paddsub<Packet4f>(const Packet4f& a, const Packet4f& b) {
Packet4f mask = {-0.0f, 0.0f, -0.0f, 0.0f};
Packet4f mask = {numext::bit_cast<float>(0x80000000u), 0.0f, numext::bit_cast<float>(0x80000000u), 0.0f};
return padd(a, pxor(mask, b));
}
@@ -2774,22 +2770,167 @@ template<> EIGEN_STRONG_INLINE bool predux_any(const Packet4f& x)
return vget_lane_u32(vpmax_u32(tmp, tmp), 0);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet2f, 2>& kernel)
{
const float32x2x2_t z = vzip_f32(kernel.packet[0], kernel.packet[1]);
kernel.packet[0] = z.val[0];
kernel.packet[1] = z.val[1];
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4f, 4>& kernel)
{
const float32x4x2_t tmp1 = vzipq_f32(kernel.packet[0], kernel.packet[1]);
const float32x4x2_t tmp2 = vzipq_f32(kernel.packet[2], kernel.packet[3]);
// Helpers for ptranspose.
namespace detail {
template<typename Packet>
void zip_in_place(Packet& p1, Packet& p2);
kernel.packet[0] = vcombine_f32(vget_low_f32(tmp1.val[0]), vget_low_f32(tmp2.val[0]));
kernel.packet[1] = vcombine_f32(vget_high_f32(tmp1.val[0]), vget_high_f32(tmp2.val[0]));
kernel.packet[2] = vcombine_f32(vget_low_f32(tmp1.val[1]), vget_low_f32(tmp2.val[1]));
kernel.packet[3] = vcombine_f32(vget_high_f32(tmp1.val[1]), vget_high_f32(tmp2.val[1]));
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet2f>(Packet2f& p1, Packet2f& p2) {
const float32x2x2_t tmp = vzip_f32(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet4f>(Packet4f& p1, Packet4f& p2) {
const float32x4x2_t tmp = vzipq_f32(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet8c>(Packet8c& p1, Packet8c& p2) {
const int8x8x2_t tmp = vzip_s8(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet16c>(Packet16c& p1, Packet16c& p2) {
const int8x16x2_t tmp = vzipq_s8(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet8uc>(Packet8uc& p1, Packet8uc& p2) {
const uint8x8x2_t tmp = vzip_u8(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet16uc>(Packet16uc& p1, Packet16uc& p2) {
const uint8x16x2_t tmp = vzipq_u8(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet2i>(Packet2i& p1, Packet2i& p2) {
const int32x2x2_t tmp = vzip_s32(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet4i>(Packet4i& p1, Packet4i& p2) {
const int32x4x2_t tmp = vzipq_s32(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet2ui>(Packet2ui& p1, Packet2ui& p2) {
const uint32x2x2_t tmp = vzip_u32(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet4ui>(Packet4ui& p1, Packet4ui& p2) {
const uint32x4x2_t tmp = vzipq_u32(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet4s>(Packet4s& p1, Packet4s& p2) {
const int16x4x2_t tmp = vzip_s16(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet8s>(Packet8s& p1, Packet8s& p2) {
const int16x8x2_t tmp = vzipq_s16(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet4us>(Packet4us& p1, Packet4us& p2) {
const uint16x4x2_t tmp = vzip_u16(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet8us>(Packet8us& p1, Packet8us& p2) {
const uint16x8x2_t tmp = vzipq_u16(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
template<typename Packet>
EIGEN_ALWAYS_INLINE void ptranspose_impl(PacketBlock<Packet, 2>& kernel) {
zip_in_place(kernel.packet[0], kernel.packet[1]);
}
template<typename Packet>
EIGEN_ALWAYS_INLINE void ptranspose_impl(PacketBlock<Packet, 4>& kernel) {
zip_in_place(kernel.packet[0], kernel.packet[2]);
zip_in_place(kernel.packet[1], kernel.packet[3]);
zip_in_place(kernel.packet[0], kernel.packet[1]);
zip_in_place(kernel.packet[2], kernel.packet[3]);
}
template<typename Packet>
EIGEN_ALWAYS_INLINE void ptranspose_impl(PacketBlock<Packet, 8>& kernel) {
zip_in_place(kernel.packet[0], kernel.packet[4]);
zip_in_place(kernel.packet[1], kernel.packet[5]);
zip_in_place(kernel.packet[2], kernel.packet[6]);
zip_in_place(kernel.packet[3], kernel.packet[7]);
zip_in_place(kernel.packet[0], kernel.packet[2]);
zip_in_place(kernel.packet[1], kernel.packet[3]);
zip_in_place(kernel.packet[4], kernel.packet[6]);
zip_in_place(kernel.packet[5], kernel.packet[7]);
zip_in_place(kernel.packet[0], kernel.packet[1]);
zip_in_place(kernel.packet[2], kernel.packet[3]);
zip_in_place(kernel.packet[4], kernel.packet[5]);
zip_in_place(kernel.packet[6], kernel.packet[7]);
}
template<typename Packet>
EIGEN_ALWAYS_INLINE void ptranspose_impl(PacketBlock<Packet, 16>& kernel) {
EIGEN_UNROLL_LOOP
for (int i=0; i<4; ++i) {
const int m = (1 << i);
EIGEN_UNROLL_LOOP
for (int j=0; j<m; ++j) {
const int n = (1 << (3-i));
EIGEN_UNROLL_LOOP
for (int k=0; k<n; ++k) {
const int idx = 2*j*n+k;
zip_in_place(kernel.packet[idx], kernel.packet[idx + n]);
}
}
}
}
} // namespace detail
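Note on the `detail::ptranspose_impl` helpers above: each overload transposes a square block with log2(N) rounds of pairwise zips, pairing row `idx` with row `idx + n` for n = N/2, N/4, ..., 1. The following is a scalar model of that schedule using `std::array` in place of NEON registers, assuming a zip interleaves the low halves of its inputs into the first output and the high halves into the second. All names are hypothetical, and the sketch covers only the square N x N case.

```cpp
#include <array>
#include <cstddef>

// Scalar model of a NEON zip: interleave the low halves of p1 and p2 into p1
// and the high halves into p2.
template <typename T, std::size_t N>
void zip_in_place(std::array<T, N>& p1, std::array<T, N>& p2) {
  std::array<T, N> lo{};
  std::array<T, N> hi{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    lo[2 * i] = p1[i];
    lo[2 * i + 1] = p2[i];
    hi[2 * i] = p1[N / 2 + i];
    hi[2 * i + 1] = p2[N / 2 + i];
  }
  p1 = lo;
  p2 = hi;
}

// Transposes an N x N block with log2(N) zip rounds, following the same
// index pattern as the 16-packet loop above (idx = 2*j*n + k).
template <typename T, std::size_t N>
void transpose_by_zips(std::array<std::array<T, N>, N>& rows) {
  static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
  for (std::size_t n = N / 2; n >= 1; n /= 2) {
    for (std::size_t base = 0; base < N; base += 2 * n) {
      for (std::size_t k = 0; k < n; ++k) {
        zip_in_place(rows[base + k], rows[base + k + n]);
      }
    }
  }
}

// Test helper: entry (i, j) holds 10*i + j so a transposed entry is easy to spot.
template <std::size_t N>
std::array<std::array<int, N>, N> make_index_matrix() {
  std::array<std::array<int, N>, N> m{};
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      m[i][j] = static_cast<int>(10 * i + j);
    }
  }
  return m;
}
```

For N = 4 the loop issues zips on rows (0,2), (1,3), (0,1), (2,3), which is the same order as the 4-packet overload in the hunk.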
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet2f, 2>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4f, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4c, 4>& kernel)
{
const int8x8_t a = vreinterpret_s8_s32(vset_lane_s32(kernel.packet[2], vdup_n_s32(kernel.packet[0]), 1));
@@ -2803,83 +2944,22 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4c, 4>&
kernel.packet[2] = vget_lane_s32(vreinterpret_s32_s16(zip16.val[1]), 0);
kernel.packet[3] = vget_lane_s32(vreinterpret_s32_s16(zip16.val[1]), 1);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8c, 8>& kernel)
{
int8x8x2_t zip8[4];
uint16x4x2_t zip16[4];
EIGEN_UNROLL_LOOP
for (int i = 0; i != 4; i++)
zip8[i] = vzip_s8(kernel.packet[i*2], kernel.packet[i*2+1]);
EIGEN_UNROLL_LOOP
for (int i = 0; i != 2; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
zip16[i*2+j] = vzip_u16(vreinterpret_u16_s8(zip8[i*2].val[j]), vreinterpret_u16_s8(zip8[i*2+1].val[j]));
}
EIGEN_UNROLL_LOOP
for (int i = 0; i != 2; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
{
const uint32x2x2_t z = vzip_u32(vreinterpret_u32_u16(zip16[i].val[j]), vreinterpret_u32_u16(zip16[i+2].val[j]));
EIGEN_UNROLL_LOOP
for (int k = 0; k != 2; k++)
kernel.packet[i*4+j*2+k] = vreinterpret_s8_u32(z.val[k]);
}
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8c, 8>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16c, 16>& kernel)
{
int8x16x2_t zip8[8];
uint16x8x2_t zip16[8];
uint32x4x2_t zip32[8];
EIGEN_UNROLL_LOOP
for (int i = 0; i != 8; i++)
zip8[i] = vzipq_s8(kernel.packet[i*2], kernel.packet[i*2+1]);
EIGEN_UNROLL_LOOP
for (int i = 0; i != 4; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
{
zip16[i*2+j] = vzipq_u16(vreinterpretq_u16_s8(zip8[i*2].val[j]),
vreinterpretq_u16_s8(zip8[i*2+1].val[j]));
}
}
EIGEN_UNROLL_LOOP
for (int i = 0; i != 2; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
{
EIGEN_UNROLL_LOOP
for (int k = 0; k != 2; k++)
zip32[i*4+j*2+k] = vzipq_u32(vreinterpretq_u32_u16(zip16[i*4+j].val[k]),
vreinterpretq_u32_u16(zip16[i*4+j+2].val[k]));
}
}
EIGEN_UNROLL_LOOP
for (int i = 0; i != 4; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
{
kernel.packet[i*4+j*2] = vreinterpretq_s8_u32(vcombine_u32(vget_low_u32(zip32[i].val[j]),
vget_low_u32(zip32[i+4].val[j])));
kernel.packet[i*4+j*2+1] = vreinterpretq_s8_u32(vcombine_u32(vget_high_u32(zip32[i].val[j]),
vget_high_u32(zip32[i+4].val[j])));
}
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8c, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16c, 16>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16c, 8>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16c, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4uc, 4>& kernel)
{
const uint8x8_t a = vreinterpret_u8_u32(vset_lane_u32(kernel.packet[2], vdup_n_u32(kernel.packet[0]), 1));
@@ -2893,233 +2973,62 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4uc, 4>&
kernel.packet[2] = vget_lane_u32(vreinterpret_u32_u16(zip16.val[1]), 0);
kernel.packet[3] = vget_lane_u32(vreinterpret_u32_u16(zip16.val[1]), 1);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8uc, 8>& kernel)
{
uint8x8x2_t zip8[4];
uint16x4x2_t zip16[4];
EIGEN_UNROLL_LOOP
for (int i = 0; i != 4; i++)
zip8[i] = vzip_u8(kernel.packet[i*2], kernel.packet[i*2+1]);
EIGEN_UNROLL_LOOP
for (int i = 0; i != 2; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
zip16[i*2+j] = vzip_u16(vreinterpret_u16_u8(zip8[i*2].val[j]), vreinterpret_u16_u8(zip8[i*2+1].val[j]));
}
EIGEN_UNROLL_LOOP
for (int i = 0; i != 2; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
{
const uint32x2x2_t z = vzip_u32(vreinterpret_u32_u16(zip16[i].val[j]), vreinterpret_u32_u16(zip16[i+2].val[j]));
EIGEN_UNROLL_LOOP
for (int k = 0; k != 2; k++)
kernel.packet[i*4+j*2+k] = vreinterpret_u8_u32(z.val[k]);
}
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8uc, 8>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16uc, 16>& kernel)
{
uint8x16x2_t zip8[8];
uint16x8x2_t zip16[8];
uint32x4x2_t zip32[8];
EIGEN_UNROLL_LOOP
for (int i = 0; i != 8; i++)
zip8[i] = vzipq_u8(kernel.packet[i*2], kernel.packet[i*2+1]);
EIGEN_UNROLL_LOOP
for (int i = 0; i != 4; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
zip16[i*2+j] = vzipq_u16(vreinterpretq_u16_u8(zip8[i*2].val[j]),
vreinterpretq_u16_u8(zip8[i*2+1].val[j]));
}
EIGEN_UNROLL_LOOP
for (int i = 0; i != 2; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
{
EIGEN_UNROLL_LOOP
for (int k = 0; k != 2; k++)
zip32[i*4+j*2+k] = vzipq_u32(vreinterpretq_u32_u16(zip16[i*4+j].val[k]),
vreinterpretq_u32_u16(zip16[i*4+j+2].val[k]));
}
}
EIGEN_UNROLL_LOOP
for (int i = 0; i != 4; i++)
{
EIGEN_UNROLL_LOOP
for (int j = 0; j != 2; j++)
{
kernel.packet[i*4+j*2] = vreinterpretq_u8_u32(vcombine_u32(vget_low_u32(zip32[i].val[j]),
vget_low_u32(zip32[i+4].val[j])));
kernel.packet[i*4+j*2+1] = vreinterpretq_u8_u32(vcombine_u32(vget_high_u32(zip32[i].val[j]),
vget_high_u32(zip32[i+4].val[j])));
}
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8uc, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4s, 4>& kernel)
{
const int16x4x2_t zip16_1 = vzip_s16(kernel.packet[0], kernel.packet[1]);
const int16x4x2_t zip16_2 = vzip_s16(kernel.packet[2], kernel.packet[3]);
const uint32x2x2_t zip32_1 = vzip_u32(vreinterpret_u32_s16(zip16_1.val[0]), vreinterpret_u32_s16(zip16_2.val[0]));
const uint32x2x2_t zip32_2 = vzip_u32(vreinterpret_u32_s16(zip16_1.val[1]), vreinterpret_u32_s16(zip16_2.val[1]));
kernel.packet[0] = vreinterpret_s16_u32(zip32_1.val[0]);
kernel.packet[1] = vreinterpret_s16_u32(zip32_1.val[1]);
kernel.packet[2] = vreinterpret_s16_u32(zip32_2.val[0]);
kernel.packet[3] = vreinterpret_s16_u32(zip32_2.val[1]);
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16uc, 16>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16uc, 8>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16uc, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8s, 4>& kernel)
{
const int16x8x2_t zip16_1 = vzipq_s16(kernel.packet[0], kernel.packet[1]);
const int16x8x2_t zip16_2 = vzipq_s16(kernel.packet[2], kernel.packet[3]);
const uint32x4x2_t zip32_1 = vzipq_u32(vreinterpretq_u32_s16(zip16_1.val[0]), vreinterpretq_u32_s16(zip16_2.val[0]));
const uint32x4x2_t zip32_2 = vzipq_u32(vreinterpretq_u32_s16(zip16_1.val[1]), vreinterpretq_u32_s16(zip16_2.val[1]));
kernel.packet[0] = vreinterpretq_s16_u32(zip32_1.val[0]);
kernel.packet[1] = vreinterpretq_s16_u32(zip32_1.val[1]);
kernel.packet[2] = vreinterpretq_s16_u32(zip32_2.val[0]);
kernel.packet[3] = vreinterpretq_s16_u32(zip32_2.val[1]);
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4s, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8s, 8>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8s, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16c, 4>& kernel)
{
const int8x16x2_t zip8_1 = vzipq_s8(kernel.packet[0], kernel.packet[1]);
const int8x16x2_t zip8_2 = vzipq_s8(kernel.packet[2], kernel.packet[3]);
const int16x8x2_t zip16_1 = vzipq_s16(vreinterpretq_s16_s8(zip8_1.val[0]), vreinterpretq_s16_s8(zip8_2.val[0]));
const int16x8x2_t zip16_2 = vzipq_s16(vreinterpretq_s16_s8(zip8_1.val[1]), vreinterpretq_s16_s8(zip8_2.val[1]));
kernel.packet[0] = vreinterpretq_s8_s16(zip16_1.val[0]);
kernel.packet[1] = vreinterpretq_s8_s16(zip16_1.val[1]);
kernel.packet[2] = vreinterpretq_s8_s16(zip16_2.val[0]);
kernel.packet[3] = vreinterpretq_s8_s16(zip16_2.val[1]);
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4us, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8us, 8>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8us, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet16uc, 4>& kernel)
{
const uint8x16x2_t zip8_1 = vzipq_u8(kernel.packet[0], kernel.packet[1]);
const uint8x16x2_t zip8_2 = vzipq_u8(kernel.packet[2], kernel.packet[3]);
const uint16x8x2_t zip16_1 = vzipq_u16(vreinterpretq_u16_u8(zip8_1.val[0]), vreinterpretq_u16_u8(zip8_2.val[0]));
const uint16x8x2_t zip16_2 = vzipq_u16(vreinterpretq_u16_u8(zip8_1.val[1]), vreinterpretq_u16_u8(zip8_2.val[1]));
kernel.packet[0] = vreinterpretq_u8_u16(zip16_1.val[0]);
kernel.packet[1] = vreinterpretq_u8_u16(zip16_1.val[1]);
kernel.packet[2] = vreinterpretq_u8_u16(zip16_2.val[0]);
kernel.packet[3] = vreinterpretq_u8_u16(zip16_2.val[1]);
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet2i, 2>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4i, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet2ui, 2>& kernel) {
detail::zip_in_place(kernel.packet[0], kernel.packet[1]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4ui, 4>& kernel) {
detail::ptranspose_impl(kernel);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8s, 8>& kernel)
{
const int16x8x2_t zip16_1 = vzipq_s16(kernel.packet[0], kernel.packet[1]);
const int16x8x2_t zip16_2 = vzipq_s16(kernel.packet[2], kernel.packet[3]);
const int16x8x2_t zip16_3 = vzipq_s16(kernel.packet[4], kernel.packet[5]);
const int16x8x2_t zip16_4 = vzipq_s16(kernel.packet[6], kernel.packet[7]);
const uint32x4x2_t zip32_1 = vzipq_u32(vreinterpretq_u32_s16(zip16_1.val[0]), vreinterpretq_u32_s16(zip16_2.val[0]));
const uint32x4x2_t zip32_2 = vzipq_u32(vreinterpretq_u32_s16(zip16_1.val[1]), vreinterpretq_u32_s16(zip16_2.val[1]));
const uint32x4x2_t zip32_3 = vzipq_u32(vreinterpretq_u32_s16(zip16_3.val[0]), vreinterpretq_u32_s16(zip16_4.val[0]));
const uint32x4x2_t zip32_4 = vzipq_u32(vreinterpretq_u32_s16(zip16_3.val[1]), vreinterpretq_u32_s16(zip16_4.val[1]));
kernel.packet[0] = vreinterpretq_s16_u32(vcombine_u32(vget_low_u32(zip32_1.val[0]), vget_low_u32(zip32_3.val[0])));
kernel.packet[1] = vreinterpretq_s16_u32(vcombine_u32(vget_high_u32(zip32_1.val[0]), vget_high_u32(zip32_3.val[0])));
kernel.packet[2] = vreinterpretq_s16_u32(vcombine_u32(vget_low_u32(zip32_1.val[1]), vget_low_u32(zip32_3.val[1])));
kernel.packet[3] = vreinterpretq_s16_u32(vcombine_u32(vget_high_u32(zip32_1.val[1]), vget_high_u32(zip32_3.val[1])));
kernel.packet[4] = vreinterpretq_s16_u32(vcombine_u32(vget_low_u32(zip32_2.val[0]), vget_low_u32(zip32_4.val[0])));
kernel.packet[5] = vreinterpretq_s16_u32(vcombine_u32(vget_high_u32(zip32_2.val[0]), vget_high_u32(zip32_4.val[0])));
kernel.packet[6] = vreinterpretq_s16_u32(vcombine_u32(vget_low_u32(zip32_2.val[1]), vget_low_u32(zip32_4.val[1])));
kernel.packet[7] = vreinterpretq_s16_u32(vcombine_u32(vget_high_u32(zip32_2.val[1]), vget_high_u32(zip32_4.val[1])));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4us, 4>& kernel)
{
const uint16x4x2_t zip16_1 = vzip_u16(kernel.packet[0], kernel.packet[1]);
const uint16x4x2_t zip16_2 = vzip_u16(kernel.packet[2], kernel.packet[3]);
const uint32x2x2_t zip32_1 = vzip_u32(vreinterpret_u32_u16(zip16_1.val[0]), vreinterpret_u32_u16(zip16_2.val[0]));
const uint32x2x2_t zip32_2 = vzip_u32(vreinterpret_u32_u16(zip16_1.val[1]), vreinterpret_u32_u16(zip16_2.val[1]));
kernel.packet[0] = vreinterpret_u16_u32(zip32_1.val[0]);
kernel.packet[1] = vreinterpret_u16_u32(zip32_1.val[1]);
kernel.packet[2] = vreinterpret_u16_u32(zip32_2.val[0]);
kernel.packet[3] = vreinterpret_u16_u32(zip32_2.val[1]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet8us, 8>& kernel)
{
const uint16x8x2_t zip16_1 = vzipq_u16(kernel.packet[0], kernel.packet[1]);
const uint16x8x2_t zip16_2 = vzipq_u16(kernel.packet[2], kernel.packet[3]);
const uint16x8x2_t zip16_3 = vzipq_u16(kernel.packet[4], kernel.packet[5]);
const uint16x8x2_t zip16_4 = vzipq_u16(kernel.packet[6], kernel.packet[7]);
const uint32x4x2_t zip32_1 = vzipq_u32(vreinterpretq_u32_u16(zip16_1.val[0]), vreinterpretq_u32_u16(zip16_2.val[0]));
const uint32x4x2_t zip32_2 = vzipq_u32(vreinterpretq_u32_u16(zip16_1.val[1]), vreinterpretq_u32_u16(zip16_2.val[1]));
const uint32x4x2_t zip32_3 = vzipq_u32(vreinterpretq_u32_u16(zip16_3.val[0]), vreinterpretq_u32_u16(zip16_4.val[0]));
const uint32x4x2_t zip32_4 = vzipq_u32(vreinterpretq_u32_u16(zip16_3.val[1]), vreinterpretq_u32_u16(zip16_4.val[1]));
kernel.packet[0] = vreinterpretq_u16_u32(vcombine_u32(vget_low_u32(zip32_1.val[0]), vget_low_u32(zip32_3.val[0])));
kernel.packet[1] = vreinterpretq_u16_u32(vcombine_u32(vget_high_u32(zip32_1.val[0]), vget_high_u32(zip32_3.val[0])));
kernel.packet[2] = vreinterpretq_u16_u32(vcombine_u32(vget_low_u32(zip32_1.val[1]), vget_low_u32(zip32_3.val[1])));
kernel.packet[3] = vreinterpretq_u16_u32(vcombine_u32(vget_high_u32(zip32_1.val[1]), vget_high_u32(zip32_3.val[1])));
kernel.packet[4] = vreinterpretq_u16_u32(vcombine_u32(vget_low_u32(zip32_2.val[0]), vget_low_u32(zip32_4.val[0])));
kernel.packet[5] = vreinterpretq_u16_u32(vcombine_u32(vget_high_u32(zip32_2.val[0]), vget_high_u32(zip32_4.val[0])));
kernel.packet[6] = vreinterpretq_u16_u32(vcombine_u32(vget_low_u32(zip32_2.val[1]), vget_low_u32(zip32_4.val[1])));
kernel.packet[7] = vreinterpretq_u16_u32(vcombine_u32(vget_high_u32(zip32_2.val[1]), vget_high_u32(zip32_4.val[1])));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet2i, 2>& kernel)
{
const int32x2x2_t z = vzip_s32(kernel.packet[0], kernel.packet[1]);
kernel.packet[0] = z.val[0];
kernel.packet[1] = z.val[1];
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4i, 4>& kernel)
{
const int32x4x2_t tmp1 = vzipq_s32(kernel.packet[0], kernel.packet[1]);
const int32x4x2_t tmp2 = vzipq_s32(kernel.packet[2], kernel.packet[3]);
kernel.packet[0] = vcombine_s32(vget_low_s32(tmp1.val[0]), vget_low_s32(tmp2.val[0]));
kernel.packet[1] = vcombine_s32(vget_high_s32(tmp1.val[0]), vget_high_s32(tmp2.val[0]));
kernel.packet[2] = vcombine_s32(vget_low_s32(tmp1.val[1]), vget_low_s32(tmp2.val[1]));
kernel.packet[3] = vcombine_s32(vget_high_s32(tmp1.val[1]), vget_high_s32(tmp2.val[1]));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet2ui, 2>& kernel)
{
const uint32x2x2_t z = vzip_u32(kernel.packet[0], kernel.packet[1]);
kernel.packet[0] = z.val[0];
kernel.packet[1] = z.val[1];
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4ui, 4>& kernel)
{
const uint32x4x2_t tmp1 = vzipq_u32(kernel.packet[0], kernel.packet[1]);
const uint32x4x2_t tmp2 = vzipq_u32(kernel.packet[2], kernel.packet[3]);
kernel.packet[0] = vcombine_u32(vget_low_u32(tmp1.val[0]), vget_low_u32(tmp2.val[0]));
kernel.packet[1] = vcombine_u32(vget_high_u32(tmp1.val[0]), vget_high_u32(tmp2.val[0]));
kernel.packet[2] = vcombine_u32(vget_low_u32(tmp1.val[1]), vget_low_u32(tmp2.val[1]));
kernel.packet[3] = vcombine_u32(vget_high_u32(tmp1.val[1]), vget_high_u32(tmp2.val[1]));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void
ptranspose(PacketBlock<Packet2l, 2>& kernel)
{
#if EIGEN_ARCH_ARM64
const int64x2_t tmp1 = vzip1q_s64(kernel.packet[0], kernel.packet[1]);
const int64x2_t tmp2 = vzip2q_s64(kernel.packet[0], kernel.packet[1]);
kernel.packet[1] = vzip2q_s64(kernel.packet[0], kernel.packet[1]);
kernel.packet[0] = tmp1;
kernel.packet[1] = tmp2;
#else
const int64x1_t tmp[2][2] = {
{ vget_low_s64(kernel.packet[0]), vget_high_s64(kernel.packet[0]) },
@@ -3135,10 +3044,8 @@ ptranspose(PacketBlock<Packet2ul, 2>& kernel)
{
#if EIGEN_ARCH_ARM64
const uint64x2_t tmp1 = vzip1q_u64(kernel.packet[0], kernel.packet[1]);
const uint64x2_t tmp2 = vzip2q_u64(kernel.packet[0], kernel.packet[1]);
kernel.packet[1] = vzip2q_u64(kernel.packet[0], kernel.packet[1]);
kernel.packet[0] = tmp1;
kernel.packet[1] = tmp2;
#else
const uint64x1_t tmp[2][2] = {
{ vget_low_u64(kernel.packet[0]), vget_high_u64(kernel.packet[0]) },
@@ -3468,6 +3375,15 @@ template<> struct unpacket_traits<Packet4bf>
};
};
namespace detail {
template<>
EIGEN_ALWAYS_INLINE void zip_in_place<Packet4bf>(Packet4bf& p1, Packet4bf& p2) {
const uint16x4x2_t tmp = vzip_u16(p1, p2);
p1 = tmp.val[0];
p2 = tmp.val[1];
}
} // namespace detail
EIGEN_STRONG_INLINE Packet4bf F32ToBf16(const Packet4f& p)
{
// See the scalar implementation in BFloat16.h for a comprehensible explanation
@@ -3674,16 +3590,7 @@ template<> EIGEN_STRONG_INLINE Packet4bf preverse<Packet4bf>(const Packet4bf& a)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet4bf, 4>& kernel)
{
PacketBlock<Packet4us, 4> k;
k.packet[0] = kernel.packet[0];
k.packet[1] = kernel.packet[1];
k.packet[2] = kernel.packet[2];
k.packet[3] = kernel.packet[3];
ptranspose(k);
kernel.packet[0] = k.packet[0];
kernel.packet[1] = k.packet[1];
kernel.packet[2] = k.packet[2];
kernel.packet[3] = k.packet[3];
detail::ptranspose_impl(kernel);
}
template<> EIGEN_STRONG_INLINE Packet4bf pabsdiff<Packet4bf>(const Packet4bf& a, const Packet4bf& b)
@@ -3701,6 +3608,11 @@ template<> EIGEN_STRONG_INLINE Packet4bf pcmp_lt<Packet4bf>(const Packet4bf& a,
return F32MaskToBf16Mask(pcmp_lt<Packet4f>(Bf16ToF32(a), Bf16ToF32(b)));
}
template<> EIGEN_STRONG_INLINE Packet4bf pcmp_lt_or_nan<Packet4bf>(const Packet4bf& a, const Packet4bf& b)
{
return F32MaskToBf16Mask(pcmp_lt_or_nan<Packet4f>(Bf16ToF32(a), Bf16ToF32(b)));
}
template<> EIGEN_STRONG_INLINE Packet4bf pcmp_le<Packet4bf>(const Packet4bf& a, const Packet4bf& b)
{
return F32MaskToBf16Mask(pcmp_le<Packet4f>(Bf16ToF32(a), Bf16ToF32(b)));
@@ -3835,7 +3747,7 @@ template<> EIGEN_STRONG_INLINE Packet2d psub<Packet2d>(const Packet2d& a, const
template<> EIGEN_STRONG_INLINE Packet2d pxor<Packet2d>(const Packet2d& , const Packet2d& );
template<> EIGEN_STRONG_INLINE Packet2d paddsub<Packet2d>(const Packet2d& a, const Packet2d& b){
const Packet2d mask = {-0.0,0.0};
const Packet2d mask = {numext::bit_cast<double>(0x8000000000000000ull),0.0};
return padd(a, pxor(mask, b));
}
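The `paddsub` hunk above builds its mask from the bit pattern of negative zero rather than from the literal `-0.0`. A minimal scalar sketch of why XOR-ing with that mask turns the add into a subtract in lane 0, assuming IEEE-754 doubles. `bit_cast_sketch`, `xor_bits` and `paddsub_sketch` are hypothetical stand-ins for illustration, not Eigen's `numext::bit_cast`, `pxor` or `paddsub`:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bit cast between same-sized trivially copyable types.
template <typename To, typename From>
To bit_cast_sketch(const From& from) {
  static_assert(sizeof(To) == sizeof(From), "size mismatch");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// XOR of the raw bit patterns of two doubles (scalar analogue of a packet xor).
double xor_bits(double a, double b) {
  const std::uint64_t bits =
      bit_cast_sketch<std::uint64_t>(a) ^ bit_cast_sketch<std::uint64_t>(b);
  return bit_cast_sketch<double>(bits);
}

// Scalar model of paddsub on a 2-lane packet: lane 0 subtracts, lane 1 adds.
void paddsub_sketch(const double a[2], const double b[2], double out[2]) {
  // Only the sign bit is set in lane 0; lane 1 is all zero bits.
  const double mask[2] = {bit_cast_sketch<double>(0x8000000000000000ull), 0.0};
  for (int i = 0; i < 2; ++i) out[i] = a[i] + xor_bits(mask[i], b[i]);
}
```

Spelling the constant as an integer bit pattern keeps the sign bit explicit and does not depend on how a compiler treats the `-0.0` literal.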

@@ -19,7 +19,7 @@ struct Packet2cf
{
EIGEN_STRONG_INLINE Packet2cf() {}
EIGEN_STRONG_INLINE explicit Packet2cf(const __m128& a) : v(a) {}
__m128 v;
Packet4f v;
};
// Use the packet_traits defined in AVX/PacketMath.h instead if we're going
@@ -66,12 +66,6 @@ template<> struct unpacket_traits<Packet2cf> {
template<> EIGEN_STRONG_INLINE Packet2cf padd<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_sub_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pxor<Packet2cf>(const Packet2cf& a, const Packet2cf& b);
template<> EIGEN_STRONG_INLINE Packet2cf paddsub<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
const Packet4f mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x80000000,0x0,0x0));
return Packet2cf(padd(a.v, pxor(mask, b.v)));
}
template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a)
{
@@ -171,74 +165,21 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return pfirst(pmul(a, Packet2cf(_mm_movehl_ps(a.v,a.v))));
}
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
EIGEN_STRONG_INLINE Packet2cf pcplxflip/* <Packet2cf> */(const Packet2cf& x)
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
#ifdef EIGEN_VECTORIZE_SSE3
return internal::pmul(a, pconj(b));
#else
const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000));
return Packet2cf(_mm_add_ps(_mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v), mask),
_mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3),
vec4f_swizzle1(b.v, 1, 0, 3, 2))));
#endif
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
#ifdef EIGEN_VECTORIZE_SSE3
return internal::pmul(pconj(a), b);
#else
const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000));
return Packet2cf(_mm_add_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v),
_mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3),
vec4f_swizzle1(b.v, 1, 0, 3, 2)), mask)));
#endif
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
#ifdef EIGEN_VECTORIZE_SSE3
return pconj(internal::pmul(a, b));
#else
const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000));
return Packet2cf(_mm_sub_ps(_mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v), mask),
_mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3),
vec4f_swizzle1(b.v, 1, 0, 3, 2))));
#endif
}
};
return Packet2cf(vec4f_swizzle1(x.v, 1, 0, 3, 2));
}
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f)
template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
// TODO optimize it for SSE3 and 4
Packet2cf res = conj_helper<Packet2cf,Packet2cf,false,true>().pmul(a,b);
Packet2cf res = pmul(a, pconj(b));
__m128 s = _mm_mul_ps(b.v,b.v);
return Packet2cf(_mm_div_ps(res.v,_mm_add_ps(s,_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(s), 0xb1)))));
return Packet2cf(_mm_div_ps(res.v,_mm_add_ps(s,vec4f_swizzle1(s, 1, 0, 3, 2))));
}
EIGEN_STRONG_INLINE Packet2cf pcplxflip/* <Packet2cf> */(const Packet2cf& x)
{
return Packet2cf(vec4f_swizzle1(x.v, 1, 0, 3, 2));
}
//---------- double ----------
@@ -246,7 +187,7 @@ struct Packet1cd
{
EIGEN_STRONG_INLINE Packet1cd() {}
EIGEN_STRONG_INLINE explicit Packet1cd(const __m128d& a) : v(a) {}
__m128d v;
Packet2d v;
};
// Use the packet_traits defined in AVX/PacketMath.h instead if we're going
@@ -354,66 +295,12 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const
return pfirst(a);
}
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
#ifdef EIGEN_VECTORIZE_SSE3
return internal::pmul(a, pconj(b));
#else
const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
return Packet1cd(_mm_add_pd(_mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v), mask),
_mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
vec2d_swizzle1(b.v, 1, 0))));
#endif
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
#ifdef EIGEN_VECTORIZE_SSE3
return internal::pmul(pconj(a), b);
#else
const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
return Packet1cd(_mm_add_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v),
_mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
vec2d_swizzle1(b.v, 1, 0)), mask)));
#endif
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
#ifdef EIGEN_VECTORIZE_SSE3
return pconj(internal::pmul(a, b));
#else
const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
return Packet1cd(_mm_sub_pd(_mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v), mask),
_mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
vec2d_swizzle1(b.v, 1, 0))));
#endif
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d)
template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
// TODO optimize it for SSE3 and 4
Packet1cd res = conj_helper<Packet1cd,Packet1cd,false,true>().pmul(a,b);
Packet1cd res = pmul(a,pconj(b));
__m128d s = _mm_mul_pd(b.v,b.v);
return Packet1cd(_mm_div_pd(res.v, _mm_add_pd(s,_mm_shuffle_pd(s, s, 0x1))));
}
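The `pdiv` hunks above compute a complex quotient as `a * conj(b)` divided lane-wise by `|b|^2`, where the denominator is formed by squaring the lanes of `b` and adding the swapped pair. A scalar sketch of the same arithmetic, assuming interleaved (real, imag) storage. `pdiv_sketch` is a hypothetical name and this is not Eigen's SIMD code:

```cpp
#include <complex>

// Scalar model of one complex value stored as two lanes: (real, imag).
std::complex<double> pdiv_sketch(const std::complex<double>& a,
                                 const std::complex<double>& b) {
  // res = pmul(a, pconj(b))
  const std::complex<double> res = a * std::conj(b);
  // s = b.v * b.v lane-wise; s + swap(s) puts |b|^2 in both lanes.
  const double s[2] = {b.real() * b.real(), b.imag() * b.imag()};
  const double norm2 = s[0] + s[1];
  // Both lanes are divided by the same real denominator.
  return std::complex<double>(res.real() / norm2, res.imag() / norm2);
}
```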

@@ -22,10 +22,6 @@ namespace internal
#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#endif
#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#define EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#endif
#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32
template <typename Scalar, int SVEVectorLength>

@@ -165,45 +165,12 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const
{
return pfirst(a);
}
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d)
template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
// TODO optimize it for AltiVec
Packet1cd res = conj_helper<Packet1cd,Packet1cd,false,true>().pmul(a,b);
Packet1cd res = pmul(a,pconj(b));
Packet2d s = vec_madd(b.v, b.v, p2d_ZERO_);
return Packet1cd(pdiv(res.v, s + vec_perm(s, s, p16uc_REVERSE64)));
}
@@ -337,39 +304,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return res;
}
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f)
template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
@@ -456,45 +390,12 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return pfirst<Packet2cf>(prod);
}
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f)
template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
// TODO optimize it for AltiVec
Packet2cf res = conj_helper<Packet2cf,Packet2cf,false,true>().pmul(a, b);
Packet2cf res = pmul(a, pconj(b));
Packet4f s = pmul<Packet4f>(b.v, b.v);
return Packet2cf(pdiv(res.v, padd<Packet4f>(s, vec_perm(s, s, p16uc_COMPLEX32_REV))));
}

@@ -22,10 +22,6 @@ namespace internal {
#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#endif
#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#define EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#endif
#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32
#endif
@@ -94,8 +90,9 @@ static _EIGEN_DECLARE_CONST_FAST_Packet2d(ZERO, 0);
static _EIGEN_DECLARE_CONST_FAST_Packet2l(ZERO, 0);
static _EIGEN_DECLARE_CONST_FAST_Packet2l(ONE, 1);
static Packet2d p2d_ONE = { 1.0, 1.0 };
static Packet2d p2d_ZERO_ = { -0.0, -0.0 };
static Packet2d p2d_ONE = { 1.0, 1.0 };
static Packet2d p2d_ZERO_ = { numext::bit_cast<double>(0x8000000000000000ull),
numext::bit_cast<double>(0x8000000000000000ull) };
#if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12)
#define _EIGEN_DECLARE_CONST_FAST_Packet4f(NAME,X) \

@@ -50,7 +50,7 @@ struct scalar_sum_op : binary_op_base<LhsScalar,RhsScalar>
template<typename LhsScalar,typename RhsScalar>
struct functor_traits<scalar_sum_op<LhsScalar,RhsScalar> > {
enum {
Cost = (NumTraits<LhsScalar>::AddCost+NumTraits<RhsScalar>::AddCost)/2, // rough estimate!
Cost = (int(NumTraits<LhsScalar>::AddCost) + int(NumTraits<RhsScalar>::AddCost)) / 2, // rough estimate!
PacketAccess = is_same<LhsScalar,RhsScalar>::value && packet_traits<LhsScalar>::HasAdd && packet_traits<RhsScalar>::HasAdd
// TODO vectorize mixed sum
};
@@ -88,7 +88,7 @@ struct scalar_product_op : binary_op_base<LhsScalar,RhsScalar>
template<typename LhsScalar,typename RhsScalar>
struct functor_traits<scalar_product_op<LhsScalar,RhsScalar> > {
enum {
Cost = (NumTraits<LhsScalar>::MulCost + NumTraits<RhsScalar>::MulCost)/2, // rough estimate!
Cost = (int(NumTraits<LhsScalar>::MulCost) + int(NumTraits<RhsScalar>::MulCost))/2, // rough estimate!
PacketAccess = is_same<LhsScalar,RhsScalar>::value && packet_traits<LhsScalar>::HasMul && packet_traits<RhsScalar>::HasMul
// TODO vectorize mixed product
};
@@ -364,7 +364,7 @@ struct scalar_difference_op : binary_op_base<LhsScalar,RhsScalar>
template<typename LhsScalar,typename RhsScalar>
struct functor_traits<scalar_difference_op<LhsScalar,RhsScalar> > {
enum {
Cost = (NumTraits<LhsScalar>::AddCost+NumTraits<RhsScalar>::AddCost)/2,
Cost = (int(NumTraits<LhsScalar>::AddCost) + int(NumTraits<RhsScalar>::AddCost)) / 2,
PacketAccess = is_same<LhsScalar,RhsScalar>::value && packet_traits<LhsScalar>::HasSub && packet_traits<RhsScalar>::HasSub
};
};
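The `int(...)` casts in the hunks above make the cost arithmetic operate on plain integers instead of on enumerators of two different unnamed enum types. Arithmetic between distinct enumeration types is deprecated in C++20, so this is presumably meant to silence the related compiler warnings. A minimal sketch, assuming hypothetical trait structs `TraitsA` and `TraitsB` in place of `NumTraits` specializations:

```cpp
// Hypothetical stand-ins for two NumTraits specializations.
struct TraitsA { enum { AddCost = 1 }; };
struct TraitsB { enum { AddCost = 4 }; };

template <typename Lhs, typename Rhs>
struct sum_cost_sketch {
  // Cast each enumerator to int before adding, so no arithmetic happens
  // between two distinct enumeration types.
  enum { Cost = (int(Lhs::AddCost) + int(Rhs::AddCost)) / 2 };
};
```

The computed value is unchanged by the casts; only the operand types of the `+` differ.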

@@ -12,6 +12,28 @@
namespace Eigen {
// Portable replacements for certain functors.
namespace numext {
template<typename T = void>
struct equal_to {
typedef bool result_type;
EIGEN_DEVICE_FUNC bool operator()(const T& lhs, const T& rhs) const {
return lhs == rhs;
}
};
template<typename T = void>
struct not_equal_to {
typedef bool result_type;
EIGEN_DEVICE_FUNC bool operator()(const T& lhs, const T& rhs) const {
return lhs != rhs;
}
};
}
namespace internal {
// default functor traits for STL functors:
@@ -68,10 +90,18 @@ template<typename T>
struct functor_traits<std::equal_to<T> >
{ enum { Cost = 1, PacketAccess = false }; };
template<typename T>
struct functor_traits<numext::equal_to<T> >
: functor_traits<std::equal_to<T> > {};
template<typename T>
struct functor_traits<std::not_equal_to<T> >
{ enum { Cost = 1, PacketAccess = false }; };
template<typename T>
struct functor_traits<numext::not_equal_to<T> >
: functor_traits<std::not_equal_to<T> > {};
#if (EIGEN_COMP_CXXVER < 11)
// std::binder* are deprecated since c++11 and will be removed in c++17
template<typename T>

@@ -109,7 +109,7 @@ struct functor_traits<scalar_abs2_op<Scalar> >
template<typename Scalar> struct scalar_conjugate_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_conjugate_op)
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { using numext::conj; return conj(a); }
EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return numext::conj(a); }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::pconj(a); }
};
@@ -138,7 +138,7 @@ struct functor_traits<scalar_conjugate_op<Scalar> >
template<typename Scalar> struct scalar_arg_op {
EIGEN_EMPTY_STRUCT_CTOR(scalar_arg_op)
typedef typename NumTraits<Scalar>::Real result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { using numext::arg; return arg(a); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { return numext::arg(a); }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
{ return internal::parg(a); }

@@ -44,7 +44,7 @@ inline std::ptrdiff_t manage_caching_sizes_helper(std::ptrdiff_t a, std::ptrdiff
#endif // defined(EIGEN_DEFAULT_L2_CACHE_SIZE)
#if defined(EIGEN_DEFAULT_L3_CACHE_SIZE)
#define EIGEN_SET_DEFAULT_L3_CACHE_SIZE(val) EIGEN_SET_DEFAULT_L3_CACHE_SIZE
#define EIGEN_SET_DEFAULT_L3_CACHE_SIZE(val) EIGEN_DEFAULT_L3_CACHE_SIZE
#else
#define EIGEN_SET_DEFAULT_L3_CACHE_SIZE(val) val
#endif // defined(EIGEN_DEFAULT_L3_CACHE_SIZE)
@@ -349,36 +349,6 @@ inline void computeProductBlockingSizes(Index& k, Index& m, Index& n, Index num_
computeProductBlockingSizes<LhsScalar,RhsScalar,1,Index>(k, m, n, num_threads);
}
#ifdef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
#define CJMADD(CJ,A,B,C,T) C = CJ.pmadd(A,B,C);
#else
// FIXME (a bit overkill maybe ?)
template<typename CJ, typename A, typename B, typename C, typename T> struct gebp_madd_selector {
EIGEN_ALWAYS_INLINE static void run(const CJ& cj, A& a, B& b, C& c, T& /*t*/)
{
c = cj.pmadd(a,b,c);
}
};
template<typename CJ, typename T> struct gebp_madd_selector<CJ,T,T,T,T> {
EIGEN_ALWAYS_INLINE static void run(const CJ& cj, T& a, T& b, T& c, T& t)
{
t = b; t = cj.pmul(a,t); c = padd(c,t);
}
};
template<typename CJ, typename A, typename B, typename C, typename T>
EIGEN_STRONG_INLINE void gebp_madd(const CJ& cj, A& a, B& b, C& c, T& t)
{
gebp_madd_selector<CJ,A,B,C,T>::run(cj,a,b,c,t);
}
#define CJMADD(CJ,A,B,C,T) gebp_madd(CJ,A,B,C,T);
// #define CJMADD(CJ,A,B,C,T) T = B; T = CJ.pmul(A,T); C = padd(C,T);
#endif
template <typename RhsPacket, typename RhsPacketx4, int registers_taken>
struct RhsPanelHelper {
private:
@@ -1673,8 +1643,8 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
EIGEN_GEBGP_ONESTEP(6);
EIGEN_GEBGP_ONESTEP(7);
blB += pk*RhsProgress;
blA += pk*3*Traits::LhsProgress;
blB += int(pk) * int(RhsProgress);
blA += int(pk) * 3 * int(Traits::LhsProgress);
EIGEN_ASM_COMMENT("end gebp micro kernel 3pX1");
}
@@ -1885,8 +1855,8 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
EIGEN_GEBGP_ONESTEP(6);
EIGEN_GEBGP_ONESTEP(7);
blB += pk*RhsProgress;
blA += pk*2*Traits::LhsProgress;
blB += int(pk) * int(RhsProgress);
blA += int(pk) * 2 * int(Traits::LhsProgress);
EIGEN_ASM_COMMENT("end gebp micro kernel 2pX1");
}
@@ -2060,14 +2030,14 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
B_0 = blB[0];
B_1 = blB[1];
CJMADD(cj,A0,B_0,C0, B_0);
CJMADD(cj,A0,B_1,C1, B_1);
C0 = cj.pmadd(A0,B_0,C0);
C1 = cj.pmadd(A0,B_1,C1);
B_0 = blB[2];
B_1 = blB[3];
CJMADD(cj,A0,B_0,C2, B_0);
CJMADD(cj,A0,B_1,C3, B_1);
C2 = cj.pmadd(A0,B_0,C2);
C3 = cj.pmadd(A0,B_1,C3);
blB += 4;
}
res(i, j2 + 0) += alpha * C0;
@@ -2092,7 +2062,7 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
{
LhsScalar A0 = blA[k];
RhsScalar B_0 = blB[k];
CJMADD(cj, A0, B_0, C0, B_0);
C0 = cj.pmadd(A0, B_0, C0);
}
res(i, j2) += alpha * C0;
}
@@ -2101,8 +2071,6 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
}
#undef CJMADD
// pack a block of the lhs
// The traversal is as follows (mr==4):
// 0 4 8 12 ...
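The gebp hunks above drop the `CJMADD` macro and call `cj.pmadd(a, b, c)` directly. A scalar sketch of what such a helper computes, modelled on the generic `conj_helper` template that appears in this diff. `conj_if_sketch` and `conj_helper_sketch` are hypothetical, simplified names restricted to `std::complex<double>`:

```cpp
#include <complex>

typedef std::complex<double> Cplx;

// Conjugates its argument when Conjugate is true, otherwise returns it unchanged.
template <bool Conjugate>
struct conj_if_sketch {
  Cplx operator()(const Cplx& x) const { return Conjugate ? std::conj(x) : x; }
};

template <bool ConjLhs, bool ConjRhs>
struct conj_helper_sketch {
  // Product of the optionally conjugated operands.
  Cplx pmul(const Cplx& x, const Cplx& y) const {
    return conj_if_sketch<ConjLhs>()(x) * conj_if_sketch<ConjRhs>()(y);
  }
  // Accumulate step: c + conj_if(x) * conj_if(y).
  Cplx pmadd(const Cplx& x, const Cplx& y, const Cplx& c) const {
    return c + pmul(x, y);
  }
};
```

With a helper of this shape, an accumulation step is written as `C0 = cj.pmadd(A0, B_0, C0);`, with no temporary argument and no macro.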

@@ -80,8 +80,8 @@ EIGEN_DEVICE_FUNC SelfAdjointView<MatrixType,UpLo>& SelfAdjointView<MatrixType,U
if (IsRowMajor)
actualAlpha = numext::conj(actualAlpha);
typedef typename internal::remove_all<typename internal::conj_expr_if<IsRowMajor ^ UBlasTraits::NeedToConjugate,_ActualUType>::type>::type UType;
typedef typename internal::remove_all<typename internal::conj_expr_if<IsRowMajor ^ VBlasTraits::NeedToConjugate,_ActualVType>::type>::type VType;
typedef typename internal::remove_all<typename internal::conj_expr_if<int(IsRowMajor) ^ int(UBlasTraits::NeedToConjugate), _ActualUType>::type>::type UType;
typedef typename internal::remove_all<typename internal::conj_expr_if<int(IsRowMajor) ^ int(VBlasTraits::NeedToConjugate), _ActualVType>::type>::type VType;
internal::selfadjoint_rank2_update_selector<Scalar, Index, UType, VType,
(IsRowMajor ? int(UpLo==Upper ? Lower : Upper) : UpLo)>
::run(_expression().const_cast_derived().data(),_expression().outerStride(),UType(actualU),VType(actualV),actualAlpha);

@@ -39,90 +39,6 @@ template<typename Index,
typename RhsScalar, typename RhsMapper, bool ConjugateRhs, int Version=Specialized>
struct general_matrix_vector_product;
template<bool Conjugate> struct conj_if;
template<> struct conj_if<true> {
template<typename T>
inline T operator()(const T& x) const { return numext::conj(x); }
template<typename T>
inline T pconj(const T& x) const { return internal::pconj(x); }
};
template<> struct conj_if<false> {
template<typename T>
inline const T& operator()(const T& x) const { return x; }
template<typename T>
inline const T& pconj(const T& x) const { return x; }
};
// Generic implementation for custom complex types.
template<typename LhsScalar, typename RhsScalar, bool ConjLhs, bool ConjRhs>
struct conj_helper
{
typedef typename ScalarBinaryOpTraits<LhsScalar,RhsScalar>::ReturnType Scalar;
EIGEN_STRONG_INLINE Scalar pmadd(const LhsScalar& x, const RhsScalar& y, const Scalar& c) const
{ return padd(c, pmul(x,y)); }
EIGEN_STRONG_INLINE Scalar pmul(const LhsScalar& x, const RhsScalar& y) const
{ return conj_if<ConjLhs>()(x) * conj_if<ConjRhs>()(y); }
};
template<typename Scalar> struct conj_helper<Scalar,Scalar,false,false>
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const Scalar& y, const Scalar& c) const { return internal::pmadd(x,y,c); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const Scalar& y) const { return internal::pmul(x,y); }
};
template<typename RealScalar> struct conj_helper<std::complex<RealScalar>, std::complex<RealScalar>, false,true>
{
typedef std::complex<RealScalar> Scalar;
EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const Scalar& y, const Scalar& c) const
{ return c + pmul(x,y); }
EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const Scalar& y) const
{ return Scalar(numext::real(x)*numext::real(y) + numext::imag(x)*numext::imag(y), numext::imag(x)*numext::real(y) - numext::real(x)*numext::imag(y)); }
};
template<typename RealScalar> struct conj_helper<std::complex<RealScalar>, std::complex<RealScalar>, true,false>
{
typedef std::complex<RealScalar> Scalar;
EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const Scalar& y, const Scalar& c) const
{ return c + pmul(x,y); }
EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const Scalar& y) const
{ return Scalar(numext::real(x)*numext::real(y) + numext::imag(x)*numext::imag(y), numext::real(x)*numext::imag(y) - numext::imag(x)*numext::real(y)); }
};
template<typename RealScalar> struct conj_helper<std::complex<RealScalar>, std::complex<RealScalar>, true,true>
{
typedef std::complex<RealScalar> Scalar;
EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const Scalar& y, const Scalar& c) const
{ return c + pmul(x,y); }
EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const Scalar& y) const
{ return Scalar(numext::real(x)*numext::real(y) - numext::imag(x)*numext::imag(y), - numext::real(x)*numext::imag(y) - numext::imag(x)*numext::real(y)); }
};
template<typename RealScalar,bool Conj> struct conj_helper<std::complex<RealScalar>, RealScalar, Conj,false>
{
typedef std::complex<RealScalar> Scalar;
EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const RealScalar& y, const Scalar& c) const
{ return padd(c, pmul(x,y)); }
EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const RealScalar& y) const
{ return conj_if<Conj>()(x)*y; }
};
template<typename RealScalar,bool Conj> struct conj_helper<RealScalar, std::complex<RealScalar>, false,Conj>
{
typedef std::complex<RealScalar> Scalar;
EIGEN_STRONG_INLINE Scalar pmadd(const RealScalar& x, const Scalar& y, const Scalar& c) const
{ return padd(c, pmul(x,y)); }
EIGEN_STRONG_INLINE Scalar pmul(const RealScalar& x, const Scalar& y) const
{ return x*conj_if<Conj>()(y); }
};
template<typename From,typename To> struct get_factor {
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE To run(const From& x) { return To(x); }
};
@@ -602,7 +518,7 @@ struct blas_traits<const T>
template<typename T, bool HasUsableDirectAccess=blas_traits<T>::HasUsableDirectAccess>
struct extract_data_selector {
static const typename T::Scalar* run(const T& m)
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static const typename T::Scalar* run(const T& m)
{
return blas_traits<T>::extract(m).data();
}
@@ -613,7 +529,8 @@ struct extract_data_selector<T,false> {
static typename T::Scalar* run(const T&) { return 0; }
};
template<typename T> const typename T::Scalar* extract_data(const T& m)
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const typename T::Scalar* extract_data(const T& m)
{
return extract_data_selector<T>::run(m);
}

@@ -157,7 +157,7 @@ const unsigned int DirectAccessBit = 0x40;
/** \deprecated \ingroup flags
*
* means the first coefficient packet is guaranteed to be aligned.
* An expression cannot has the AlignedBit without the PacketAccessBit flag.
* An expression cannot have the AlignedBit without the PacketAccessBit flag.
* In other words, this means we are allowed to perform an aligned packet access to the first element regardless
* of the expression kind:
* \code

@@ -77,7 +77,7 @@ public:
template<int M>
FixedInt<N&M> operator&( FixedInt<M>) const { return FixedInt<N&M>(); }
#if EIGEN_HAS_CXX14
#if EIGEN_HAS_CXX14_VARIABLE_TEMPLATES
// Needed in C++14 to allow fix<N>():
FixedInt operator() () const { return *this; }
@@ -184,7 +184,7 @@ template<int N, int DynamicKey> struct cleanup_index_type<std::integral_constant
#ifndef EIGEN_PARSED_BY_DOXYGEN
#if EIGEN_HAS_CXX14
#if EIGEN_HAS_CXX14_VARIABLE_TEMPLATES
template<int N>
static const internal::FixedInt<N> fix{};
#else

@@ -16,8 +16,8 @@
//------------------------------------------------------------------------------------------
#define EIGEN_WORLD_VERSION 3
#define EIGEN_MAJOR_VERSION 3
#define EIGEN_MINOR_VERSION 90
#define EIGEN_MAJOR_VERSION 4
#define EIGEN_MINOR_VERSION 0
#define EIGEN_VERSION_AT_LEAST(x,y,z) (EIGEN_WORLD_VERSION>x || (EIGEN_WORLD_VERSION>=x && \
(EIGEN_MAJOR_VERSION>y || (EIGEN_MAJOR_VERSION>=y && \
@@ -162,8 +162,8 @@
/// \internal EIGEN_COMP_IBM set to xlc version if the compiler is IBM XL C++
// XLC version
// 3.1 0x0301
// 4.5 0x0405
// 3.1 0x0301
// 4.5 0x0405
// 5.0 0x0500
// 12.1 0x0C01
#if defined(__IBMCPP__) || defined(__xlc__) || defined(__ibmxl__)
@@ -637,6 +637,14 @@
#define EIGEN_COMP_CXXVER 03
#endif
#ifndef EIGEN_HAS_CXX14_VARIABLE_TEMPLATES
#if defined(__cpp_variable_templates) && __cpp_variable_templates >= 201304 && EIGEN_MAX_CPP_VER>=14
#define EIGEN_HAS_CXX14_VARIABLE_TEMPLATES 1
#else
#define EIGEN_HAS_CXX14_VARIABLE_TEMPLATES 0
#endif
#endif
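
The hunk above gates a feature on the standard feature-test macro `__cpp_variable_templates` rather than on a blanket C++14 check. A minimal standalone sketch of that pattern follows; `SKETCH_HAS_VARIABLE_TEMPLATES`, `FixedIntSketch`, `fix_sketch`, and `fix_value_sketch` are hypothetical names introduced only for illustration and are not part of the library.

```cpp
#include <cassert>

// Hypothetical macro name; mirrors the detection pattern in the hunk above.
#if defined(__cpp_variable_templates) && __cpp_variable_templates >= 201304
#define SKETCH_HAS_VARIABLE_TEMPLATES 1
#else
#define SKETCH_HAS_VARIABLE_TEMPLATES 0
#endif

// Hypothetical stand-in for a compile-time integer wrapper.
template <int N>
struct FixedIntSketch {
  enum { value = N };
};

#if SKETCH_HAS_VARIABLE_TEMPLATES
// Only declared when the compiler reports variable-template support.
template <int N>
static const FixedIntSketch<N> fix_sketch{};
#endif

// Works on both paths, so callers do not need their own #if.
template <int N>
inline int fix_value_sketch() {
#if SKETCH_HAS_VARIABLE_TEMPLATES
  return fix_sketch<N>.value;
#else
  return FixedIntSketch<N>::value;
#endif
}
```

Testing the specific feature macro, as sketched here, should keep the variable-template declaration out of builds where the compiler accepts a C++14 flag but does not implement that particular feature.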
// The macros EIGEN_HAS_CXX?? define a rough estimate of available c++ features
// but in practice we should not rely on them but rather on the availability of
@@ -833,7 +841,7 @@
#endif
#endif
// NOTE: the required Apple's clang version is very conservative
// NOTE: the required Apple's clang version is very conservative
// and it could be that XCode 9 works just fine.
// NOTE: the MSVC version is based on https://en.cppreference.com/w/cpp/compiler_support
// and not tested.
@@ -962,7 +970,7 @@
#endif
#define EIGEN_DEVICE_FUNC __attribute__((flatten)) __attribute__((always_inline))
// All functions callable from CUDA/HIP code must be qualified with __device__
#elif defined(EIGEN_GPUCC)
#elif defined(EIGEN_GPUCC)
#define EIGEN_DEVICE_FUNC __host__ __device__
#else
#define EIGEN_DEVICE_FUNC
@@ -989,7 +997,7 @@
#else
#define eigen_plain_assert(x)
#endif
#else
#else
#if EIGEN_SAFE_TO_USE_STANDARD_ASSERT_MACRO
namespace Eigen {
namespace internal {
@@ -1177,8 +1185,12 @@ namespace Eigen {
#define EIGEN_USING_STD(FUNC) using std::FUNC;
#endif
#if EIGEN_COMP_MSVC_STRICT && (EIGEN_COMP_MSVC < 1900 || EIGEN_COMP_NVCC)
// for older MSVC versions, as well as 1900 && CUDA 8, using the base operator is sufficient (cf Bugs 1000, 1324)
#if EIGEN_COMP_MSVC_STRICT && (EIGEN_COMP_MSVC < 1900 || (EIGEN_COMP_MSVC == 1900 && EIGEN_COMP_NVCC))
// For older MSVC versions, as well as 1900 && CUDA 8, using the base operator is necessary,
// otherwise we get duplicate definition errors
// For later MSVC versions, we require explicit operator= definition, otherwise we get
// use of implicitly deleted operator errors.
// (cf Bugs 920, 1000, 1324, 2291)
#define EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \
using Base::operator =;
#elif EIGEN_COMP_CLANG // workaround clang bug (see http://forum.kde.org/viewtopic.php?f=74&t=102653)

@@ -566,6 +566,17 @@ template<typename T> struct smart_memmove_helper<T,false> {
}
};
#if EIGEN_HAS_RVALUE_REFERENCES
template<typename T> EIGEN_DEVICE_FUNC T* smart_move(T* start, T* end, T* target)
{
return std::move(start, end, target);
}
#else
template<typename T> EIGEN_DEVICE_FUNC T* smart_move(T* start, T* end, T* target)
{
return std::copy(start, end, target);
}
#endif
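
The `smart_move` helper added above forwards to the range form of `std::move` from `<algorithm>` when rvalue references are available. As a rough standalone illustration of what that range algorithm does, the sketch below moves elements between two raw arrays; `move_range_sketch` is a hypothetical name, and the sketch assumes a C++11 or later compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: moves [start, end) into the range beginning at target
// and returns the end of the destination range, like std::copy would.
template <typename T>
T* move_range_sketch(T* start, T* end, T* target) {
  return std::move(start, end, target);
}
```

For types with cheap move construction, such as `std::string`, this can avoid deep copies; the moved-from elements remain valid but hold unspecified values.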
/*****************************************************************************
*** Implementation of runtime stack allocation (falling back to malloc) ***

@@ -194,12 +194,12 @@ template<> struct make_unsigned<signed __int64> { typedef unsigned __int64 typ
template<> struct make_unsigned<unsigned __int64> { typedef unsigned __int64 type; };
#endif
// Some platforms define int64_t as long long even for C++03. In this case we
// are missing the definition for make_unsigned. If we just define it, we get
// duplicated definitions for platforms defining int64_t as signed long for
// C++03. We therefore add the specialization for C++03 long long for these
// platforms only.
#if EIGEN_OS_MAC
// Some platforms define int64_t as `long long` even for C++03, where
// `long long` is not guaranteed by the standard. In this case we are missing
// the definition for make_unsigned. If we just define it, we run into issues
// where `long long` doesn't exist in some compilers for C++03. We therefore add
// the specialization for these platforms only.
#if EIGEN_OS_MAC || EIGEN_COMP_MINGW
template<> struct make_unsigned<unsigned long long> { typedef unsigned long long type; };
template<> struct make_unsigned<long long> { typedef unsigned long long type; };
#endif
@@ -715,20 +715,25 @@ class meta_sqrt<Y, InfX, SupX, true> { public: enum { ret = (SupX*SupX <= Y) ?
/** \internal Computes the least common multiple of two positive integers A and B
* at compile-time. It implements a naive algorithm testing all multiples of A.
* It thus works better if A>=B.
* at compile-time.
*/
template<int A, int B, int K=1, bool Done = ((A*K)%B)==0>
template<int A, int B, int K=1, bool Done = ((A*K)%B)==0, bool Big=(A>=B)>
struct meta_least_common_multiple
{
enum { ret = meta_least_common_multiple<A,B,K+1>::ret };
};
template<int A, int B, int K, bool Done>
struct meta_least_common_multiple<A,B,K,Done,false>
{
enum { ret = meta_least_common_multiple<B,A,K>::ret };
};
template<int A, int B, int K>
struct meta_least_common_multiple<A,B,K,true>
struct meta_least_common_multiple<A,B,K,true,true>
{
enum { ret = A*K };
};
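
The revised metafunction above adds a `Big` parameter so that the arguments are swapped when `A < B` before multiples of the larger value are tested. A self-contained re-creation of the same three-case structure is sketched below under the hypothetical name `lcm_sketch`; it is meant only to make the recursion easy to trace.

```cpp
#include <cassert>

// Primary template: A >= B and A*K is not yet a multiple of B, so try K+1.
template <int A, int B, int K = 1,
          bool Done = ((A * K) % B) == 0, bool Big = (A >= B)>
struct lcm_sketch {
  enum { ret = lcm_sketch<A, B, K + 1>::ret };
};

// A < B: swap the arguments so that the larger value is the one multiplied.
template <int A, int B, int K, bool Done>
struct lcm_sketch<A, B, K, Done, false> {
  enum { ret = lcm_sketch<B, A, K>::ret };
};

// A >= B and A*K is a multiple of B: the search is finished.
template <int A, int B, int K>
struct lcm_sketch<A, B, K, true, true> {
  enum { ret = A * K };
};
```

Tracing `lcm_sketch<4, 6>`: the swap case selects `lcm_sketch<6, 4, 1>`, the primary template advances to `K = 2`, and the terminal case yields `6 * 2`.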
/** \internal determines whether the product of two numeric types is allowed and what the return type is */
template<typename T, typename U> struct scalar_product_traits
{

@@ -184,19 +184,7 @@ template<typename T> struct functor_traits
template<typename T> struct packet_traits;
template<typename T> struct unpacket_traits
{
typedef T type;
typedef T half;
enum
{
size = 1,
alignment = 1,
vectorizable = false,
masked_load_available=false,
masked_store_available=false
};
};
template<typename T> struct unpacket_traits;
template<int Size, typename PacketType,
bool Stop = Size==Dynamic || (Size%unpacket_traits<PacketType>::size)==0 || is_same<PacketType,typename unpacket_traits<PacketType>::half>::value>
@@ -611,9 +599,9 @@ template<typename ExpressionType, typename Scalar = typename ExpressionType::Sca
struct plain_row_type
{
typedef Matrix<Scalar, 1, ExpressionType::ColsAtCompileTime,
ExpressionType::PlainObject::Options | RowMajor, 1, ExpressionType::MaxColsAtCompileTime> MatrixRowType;
int(ExpressionType::PlainObject::Options) | int(RowMajor), 1, ExpressionType::MaxColsAtCompileTime> MatrixRowType;
typedef Array<Scalar, 1, ExpressionType::ColsAtCompileTime,
ExpressionType::PlainObject::Options | RowMajor, 1, ExpressionType::MaxColsAtCompileTime> ArrayRowType;
int(ExpressionType::PlainObject::Options) | int(RowMajor), 1, ExpressionType::MaxColsAtCompileTime> ArrayRowType;
typedef typename conditional<
is_same< typename traits<ExpressionType>::XprKind, MatrixXpr >::value,

@@ -267,7 +267,7 @@ template<typename _MatrixType> class HessenbergDecomposition
private:
typedef Matrix<Scalar, 1, Size, Options | RowMajor, 1, MaxSize> VectorType;
typedef Matrix<Scalar, 1, Size, int(Options) | int(RowMajor), 1, MaxSize> VectorType;
typedef typename NumTraits<Scalar>::Real RealScalar;
static void _compute(MatrixType& matA, CoeffVectorType& hCoeffs, VectorType& temp);

@@ -125,6 +125,7 @@ template<typename _MatrixType> class SelfAdjointEigenSolver
: m_eivec(),
m_eivalues(),
m_subdiag(),
m_hcoeffs(),
m_info(InvalidInput),
m_isInitialized(false),
m_eigenvectorsOk(false)
@@ -147,6 +148,7 @@ template<typename _MatrixType> class SelfAdjointEigenSolver
: m_eivec(size, size),
m_eivalues(size),
m_subdiag(size > 1 ? size - 1 : 1),
m_hcoeffs(size > 1 ? size - 1 : 1),
m_isInitialized(false),
m_eigenvectorsOk(false)
{}
@@ -172,6 +174,7 @@ template<typename _MatrixType> class SelfAdjointEigenSolver
: m_eivec(matrix.rows(), matrix.cols()),
m_eivalues(matrix.cols()),
m_subdiag(matrix.rows() > 1 ? matrix.rows() - 1 : 1),
m_hcoeffs(matrix.cols() > 1 ? matrix.cols() - 1 : 1),
m_isInitialized(false),
m_eigenvectorsOk(false)
{
@@ -378,6 +381,7 @@ template<typename _MatrixType> class SelfAdjointEigenSolver
EigenvectorsType m_eivec;
RealVectorType m_eivalues;
typename TridiagonalizationType::SubDiagonalType m_subdiag;
typename TridiagonalizationType::CoeffVectorType m_hcoeffs;
ComputationInfo m_info;
bool m_isInitialized;
bool m_eigenvectorsOk;
@@ -450,7 +454,8 @@ SelfAdjointEigenSolver<MatrixType>& SelfAdjointEigenSolver<MatrixType>
if(scale==RealScalar(0)) scale = RealScalar(1);
mat.template triangularView<Lower>() /= scale;
m_subdiag.resize(n-1);
internal::tridiagonalization_inplace(mat, diag, m_subdiag, computeEigenvectors);
m_hcoeffs.resize(n-1);
internal::tridiagonalization_inplace(mat, diag, m_subdiag, m_hcoeffs, computeEigenvectors);
m_info = internal::computeFromTridiagonal_impl(diag, m_subdiag, m_maxIterations, computeEigenvectors, m_eivec);

@@ -425,12 +425,13 @@ struct tridiagonalization_inplace_selector;
*
* \sa class Tridiagonalization
*/
template<typename MatrixType, typename DiagonalType, typename SubDiagonalType>
template<typename MatrixType, typename DiagonalType, typename SubDiagonalType, typename CoeffVectorType>
EIGEN_DEVICE_FUNC
void tridiagonalization_inplace(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag, bool extractQ)
void tridiagonalization_inplace(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag,
CoeffVectorType& hcoeffs, bool extractQ)
{
eigen_assert(mat.cols()==mat.rows() && diag.size()==mat.rows() && subdiag.size()==mat.rows()-1);
tridiagonalization_inplace_selector<MatrixType>::run(mat, diag, subdiag, extractQ);
tridiagonalization_inplace_selector<MatrixType>::run(mat, diag, subdiag, hcoeffs, extractQ);
}
/** \internal
@@ -443,10 +444,9 @@ struct tridiagonalization_inplace_selector
typedef typename Tridiagonalization<MatrixType>::HouseholderSequenceType HouseholderSequenceType;
template<typename DiagonalType, typename SubDiagonalType>
static EIGEN_DEVICE_FUNC
void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag, bool extractQ)
void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag, CoeffVectorType& hCoeffs, bool extractQ)
{
CoeffVectorType hCoeffs(mat.cols()-1);
tridiagonalization_inplace(mat,hCoeffs);
tridiagonalization_inplace(mat, hCoeffs);
diag = mat.diagonal().real();
subdiag = mat.template diagonal<-1>().real();
if(extractQ)
@@ -466,8 +466,8 @@ struct tridiagonalization_inplace_selector<MatrixType,3,false>
typedef typename MatrixType::Scalar Scalar;
typedef typename MatrixType::RealScalar RealScalar;
template<typename DiagonalType, typename SubDiagonalType>
static void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag, bool extractQ)
template<typename DiagonalType, typename SubDiagonalType, typename CoeffVectorType>
static void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag, CoeffVectorType&, bool extractQ)
{
using std::sqrt;
const RealScalar tol = (std::numeric_limits<RealScalar>::min)();
@@ -511,9 +511,9 @@ struct tridiagonalization_inplace_selector<MatrixType,1,IsComplex>
{
typedef typename MatrixType::Scalar Scalar;
template<typename DiagonalType, typename SubDiagonalType>
template<typename DiagonalType, typename SubDiagonalType, typename CoeffVectorType>
static EIGEN_DEVICE_FUNC
void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType&, bool extractQ)
void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType&, CoeffVectorType&, bool extractQ)
{
diag(0,0) = numext::real(mat(0,0));
if(extractQ)

@@ -28,8 +28,9 @@ struct quat_product<Architecture::Target, Derived, OtherDerived, float>
evaluator<typename Derived::Coefficients> ae(_a.coeffs());
evaluator<typename OtherDerived::Coefficients> be(_b.coeffs());
Quaternion<float> res;
float arr[4] = {0.f, 0.f, 0.f, -0.f};
const Packet4f mask = pset<Packet4f>(arr);
const float neg_zero = numext::bit_cast<float>(0x80000000u);
const float arr[4] = {0.f, 0.f, 0.f, neg_zero};
const Packet4f mask = ploadu<Packet4f>(arr);
Packet4f a = ae.template packet<AAlignment,Packet4f>(0);
Packet4f b = be.template packet<BAlignment,Packet4f>(0);
Packet4f s1 = pmul(vec4f_swizzle1(a,1,2,0,2),vec4f_swizzle1(b,2,0,1,2));
@@ -55,8 +56,9 @@ struct quat_conj<Architecture::Target, Derived, float>
{
evaluator<typename Derived::Coefficients> qe(q.coeffs());
Quaternion<float> res;
float arr[4] = {-0.f,-0.f,-0.f,0.f};
const Packet4f mask = pset<Packet4f>(arr);
const float neg_zero = numext::bit_cast<float>(0x80000000u);
const float arr[4] = {neg_zero, neg_zero, neg_zero,0.f};
const Packet4f mask = ploadu<Packet4f>(arr);
pstoret<float,Packet4f,ResAlignment>(&res.x(), pxor(mask, qe.template packet<traits<Derived>::Alignment,Packet4f>(0)));
return res;
}
@@ -146,10 +148,11 @@ struct quat_conj<Architecture::Target, Derived, double>
{
evaluator<typename Derived::Coefficients> qe(q.coeffs());
Quaternion<double> res;
double arr1[2] = {-0.0, -0.0};
double arr2[2] = {-0.0, 0.0};
const Packet2d mask0 = pset<Packet2d>(arr1);
const Packet2d mask2 = pset<Packet2d>(arr2);
const double neg_zero = numext::bit_cast<double>(0x8000000000000000ull);
const double arr1[2] = {neg_zero, neg_zero};
const double arr2[2] = {neg_zero, 0.0};
const Packet2d mask0 = ploadu<Packet2d>(arr1);
const Packet2d mask2 = ploadu<Packet2d>(arr2);
pstoret<double,Packet2d,ResAlignment>(&res.x(), pxor(mask0, qe.template packet<traits<Derived>::Alignment,Packet2d>(0)));
pstoret<double,Packet2d,ResAlignment>(&res.z(), pxor(mask2, qe.template packet<traits<Derived>::Alignment,Packet2d>(2)));
return res;
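
The hunks above replace the literal `-0.f` / `-0.0` with a value built from the explicit bit pattern of negative zero, then XOR that mask into the packet to flip signs. The diff does not state the motivation; one plausible reason (an assumption here) is that a negative-zero literal may lose its sign bit under some floating-point optimization modes. The scalar sketch below shows the same idea without SIMD, assuming 32-bit IEEE-754 `float`; all function names are hypothetical, and `std::memcpy` stands in for the library's own bit-cast helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (assumes sizeof(float) == 4).
inline float bits_to_float_sketch(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Reinterpret a float as its 32-bit pattern.
inline std::uint32_t float_to_bits_sketch(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Negative zero: only the sign bit is set.
inline float neg_zero_sketch() { return bits_to_float_sketch(0x80000000u); }

// XOR with the sign mask flips the sign, the scalar analogue of pxor(mask, x).
inline float flip_sign_sketch(float x) {
  return bits_to_float_sketch(float_to_bits_sketch(x) ^
                              float_to_bits_sketch(neg_zero_sketch()));
}
```

In the conjugate kernels above, lanes whose mask entry is `0.f` are left unchanged by the XOR, while lanes holding negative zero have their sign flipped.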

@@ -77,10 +77,11 @@ inline void compute_inverse_size2_helper(
const MatrixType& matrix, const typename ResultType::Scalar& invdet,
ResultType& result)
{
typename ResultType::Scalar temp = matrix.coeff(0,0);
result.coeffRef(0,0) = matrix.coeff(1,1) * invdet;
result.coeffRef(1,0) = -matrix.coeff(1,0) * invdet;
result.coeffRef(0,1) = -matrix.coeff(0,1) * invdet;
result.coeffRef(1,1) = matrix.coeff(0,0) * invdet;
result.coeffRef(1,1) = temp * invdet;
}
template<typename MatrixType, typename ResultType>
@@ -143,13 +144,18 @@ inline void compute_inverse_size3_helper(
const Matrix<typename ResultType::Scalar,3,1>& cofactors_col0,
ResultType& result)
{
result.row(0) = cofactors_col0 * invdet;
result.coeffRef(1,0) = cofactor_3x3<MatrixType,0,1>(matrix) * invdet;
result.coeffRef(1,1) = cofactor_3x3<MatrixType,1,1>(matrix) * invdet;
// Compute cofactors in a way that avoids aliasing issues.
typedef typename ResultType::Scalar Scalar;
const Scalar c01 = cofactor_3x3<MatrixType,0,1>(matrix) * invdet;
const Scalar c11 = cofactor_3x3<MatrixType,1,1>(matrix) * invdet;
const Scalar c02 = cofactor_3x3<MatrixType,0,2>(matrix) * invdet;
result.coeffRef(1,2) = cofactor_3x3<MatrixType,2,1>(matrix) * invdet;
result.coeffRef(2,0) = cofactor_3x3<MatrixType,0,2>(matrix) * invdet;
result.coeffRef(2,1) = cofactor_3x3<MatrixType,1,2>(matrix) * invdet;
result.coeffRef(2,2) = cofactor_3x3<MatrixType,2,2>(matrix) * invdet;
result.coeffRef(1,0) = c01;
result.coeffRef(1,1) = c11;
result.coeffRef(2,0) = c02;
result.row(0) = cofactors_col0 * invdet;
}
template<typename MatrixType, typename ResultType>
@@ -181,14 +187,13 @@ struct compute_inverse_and_det_with_check<MatrixType, ResultType, 3>
bool& invertible
)
{
using std::abs;
typedef typename ResultType::Scalar Scalar;
Matrix<Scalar,3,1> cofactors_col0;
cofactors_col0.coeffRef(0) = cofactor_3x3<MatrixType,0,0>(matrix);
cofactors_col0.coeffRef(1) = cofactor_3x3<MatrixType,1,0>(matrix);
cofactors_col0.coeffRef(2) = cofactor_3x3<MatrixType,2,0>(matrix);
determinant = (cofactors_col0.cwiseProduct(matrix.col(0))).sum();
invertible = abs(determinant) > absDeterminantThreshold;
invertible = Eigen::numext::abs(determinant) > absDeterminantThreshold;
if(!invertible) return;
const Scalar invdet = Scalar(1) / determinant;
compute_inverse_size3_helper(matrix, invdet, cofactors_col0, inverse);
@@ -273,7 +278,13 @@ struct compute_inverse_and_det_with_check<MatrixType, ResultType, 4>
using std::abs;
determinant = matrix.determinant();
invertible = abs(determinant) > absDeterminantThreshold;
if(invertible) compute_inverse<MatrixType, ResultType>::run(matrix, inverse);
if(invertible && extract_data(matrix) != extract_data(inverse)) {
compute_inverse<MatrixType, ResultType>::run(matrix, inverse);
}
else if(invertible) {
MatrixType matrix_t = matrix;
compute_inverse<MatrixType, ResultType>::run(matrix_t, inverse);
}
}
};
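
The 2x2 and 3x3 helpers above read every input coefficient they still need before overwriting the corresponding output coefficient, so the result may share storage with the input. A plain-array sketch of the 2x2 case follows; `inverse2x2_sketch` and the row-major layout are assumptions made for illustration only.

```cpp
#include <cassert>

// Row-major 2x2 layout: m[0]=a, m[1]=b, m[2]=c, m[3]=d.
// `out` may point to the same storage as `m`.
inline void inverse2x2_sketch(const double* m, double invdet, double* out) {
  const double a = m[0];    // saved first: out[0] is about to overwrite it
  out[0] = m[3] * invdet;   // d / det
  out[2] = -m[2] * invdet;  // -c / det (read and written at the same index)
  out[1] = -m[1] * invdet;  // -b / det (read and written at the same index)
  out[3] = a * invdet;      // a / det, using the saved copy
}
```

Without the saved copy, the last assignment would read the already-overwritten first coefficient whenever the call is made in place.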
@@ -347,6 +358,8 @@ inline const Inverse<Derived> MatrixBase<Derived>::inverse() const
*
* This is only for fixed-size square matrices of size up to 4x4.
*
* Notice that it will trigger a copy of input matrix when trying to do the inverse in place.
*
* \param inverse Reference to the matrix in which to store the inverse.
* \param determinant Reference to the variable in which to store the determinant.
* \param invertible Reference to the bool variable in which to store whether the matrix is invertible.
@@ -387,6 +400,8 @@ inline void MatrixBase<Derived>::computeInverseAndDetWithCheck(
*
* This is only for fixed-size square matrices of size up to 4x4.
*
* Notice that it will trigger a copy of input matrix when trying to do the inverse in place.
*
* \param inverse Reference to the matrix in which to store the inverse.
* \param invertible Reference to the bool variable in which to store whether the matrix is invertible.
* \param absDeterminantThreshold Optional parameter controlling the invertibility check.

@@ -504,8 +504,13 @@ struct partial_lu_impl
template<typename MatrixType, typename TranspositionType>
void partial_lu_inplace(MatrixType& lu, TranspositionType& row_transpositions, typename TranspositionType::StorageIndex& nb_transpositions)
{
// Special-case of zero matrix.
if (lu.rows() == 0 || lu.cols() == 0) {
nb_transpositions = 0;
return;
}
eigen_assert(lu.cols() == row_transpositions.size());
eigen_assert((&row_transpositions.coeffRef(1)-&row_transpositions.coeffRef(0)) == 1);
eigen_assert(row_transpositions.size() < 2 || (&row_transpositions.coeffRef(1)-&row_transpositions.coeffRef(0)) == 1);
partial_lu_impl
< typename MatrixType::Scalar, MatrixType::Flags&RowMajorBit?RowMajor:ColMajor,

@@ -54,10 +54,12 @@ struct compute_inverse_size4<Architecture::Target, float, MatrixType, ResultType
{
ActualMatrixType matrix(mat);
Packet4f _L1 = matrix.template packet<MatrixAlignment>(0);
Packet4f _L2 = matrix.template packet<MatrixAlignment>(4);
Packet4f _L3 = matrix.template packet<MatrixAlignment>(8);
Packet4f _L4 = matrix.template packet<MatrixAlignment>(12);
const float* data = matrix.data();
const Index stride = matrix.innerStride();
Packet4f _L1 = ploadt<Packet4f,MatrixAlignment>(data);
Packet4f _L2 = ploadt<Packet4f,MatrixAlignment>(data + stride*4);
Packet4f _L3 = ploadt<Packet4f,MatrixAlignment>(data + stride*8);
Packet4f _L4 = ploadt<Packet4f,MatrixAlignment>(data + stride*12);
// Four 2x2 sub-matrices of the input matrix
// input = [[A, B],
@@ -141,8 +143,8 @@ struct compute_inverse_size4<Architecture::Target, float, MatrixType, ResultType
iC = psub(iC, pmul(vec4f_swizzle2(A, A, 1, 0, 3, 2), vec4f_swizzle2(DC, DC, 2, 1, 2, 1)));
iC = psub(pmul(B, vec4f_duplane(dC, 0)), iC);
const float sign_mask[4] = {0.0f, -0.0f, -0.0f, 0.0f};
const Packet4f p4f_sign_PNNP = pset<Packet4f>(sign_mask);
const float sign_mask[4] = {0.0f, numext::bit_cast<float>(0x80000000u), numext::bit_cast<float>(0x80000000u), 0.0f};
const Packet4f p4f_sign_PNNP = ploadu<Packet4f>(sign_mask);
rd = pxor(rd, p4f_sign_PNNP);
iA = pmul(iA, rd);
iB = pmul(iB, rd);
@@ -189,25 +191,26 @@ struct compute_inverse_size4<Architecture::Target, double, MatrixType, ResultTyp
Packet2d A1, A2, B1, B2, C1, C2, D1, D2;
const double* data = matrix.data();
const Index stride = matrix.innerStride();
if (StorageOrdersMatch)
{
A1 = matrix.template packet<MatrixAlignment>(0);
B1 = matrix.template packet<MatrixAlignment>(2);
A2 = matrix.template packet<MatrixAlignment>(4);
B2 = matrix.template packet<MatrixAlignment>(6);
C1 = matrix.template packet<MatrixAlignment>(8);
D1 = matrix.template packet<MatrixAlignment>(10);
C2 = matrix.template packet<MatrixAlignment>(12);
D2 = matrix.template packet<MatrixAlignment>(14);
A1 = ploadt<Packet2d,MatrixAlignment>(data + stride*0);
B1 = ploadt<Packet2d,MatrixAlignment>(data + stride*2);
A2 = ploadt<Packet2d,MatrixAlignment>(data + stride*4);
B2 = ploadt<Packet2d,MatrixAlignment>(data + stride*6);
C1 = ploadt<Packet2d,MatrixAlignment>(data + stride*8);
D1 = ploadt<Packet2d,MatrixAlignment>(data + stride*10);
C2 = ploadt<Packet2d,MatrixAlignment>(data + stride*12);
D2 = ploadt<Packet2d,MatrixAlignment>(data + stride*14);
}
else
{
Packet2d temp;
A1 = matrix.template packet<MatrixAlignment>(0);
C1 = matrix.template packet<MatrixAlignment>(2);
A2 = matrix.template packet<MatrixAlignment>(4);
C2 = matrix.template packet<MatrixAlignment>(6);
A1 = ploadt<Packet2d,MatrixAlignment>(data + stride*0);
C1 = ploadt<Packet2d,MatrixAlignment>(data + stride*2);
A2 = ploadt<Packet2d,MatrixAlignment>(data + stride*4);
C2 = ploadt<Packet2d,MatrixAlignment>(data + stride*6);
temp = A1;
A1 = vec2d_unpacklo(A1, A2);
A2 = vec2d_unpackhi(temp, A2);
@@ -216,10 +219,10 @@ struct compute_inverse_size4<Architecture::Target, double, MatrixType, ResultTyp
C1 = vec2d_unpacklo(C1, C2);
C2 = vec2d_unpackhi(temp, C2);
B1 = matrix.template packet<MatrixAlignment>(8);
D1 = matrix.template packet<MatrixAlignment>(10);
B2 = matrix.template packet<MatrixAlignment>(12);
D2 = matrix.template packet<MatrixAlignment>(14);
B1 = ploadt<Packet2d,MatrixAlignment>(data + stride*8);
D1 = ploadt<Packet2d,MatrixAlignment>(data + stride*10);
B2 = ploadt<Packet2d,MatrixAlignment>(data + stride*12);
D2 = ploadt<Packet2d,MatrixAlignment>(data + stride*14);
temp = B1;
B1 = vec2d_unpacklo(B1, B2);
@@ -323,10 +326,10 @@ struct compute_inverse_size4<Architecture::Target, double, MatrixType, ResultTyp
iC1 = psub(pmul(B1, dC), iC1);
iC2 = psub(pmul(B2, dC), iC2);
const double sign_mask1[2] = {0.0, -0.0};
const double sign_mask2[2] = {-0.0, 0.0};
const Packet2d sign_PN = pset<Packet2d>(sign_mask1);
const Packet2d sign_NP = pset<Packet2d>(sign_mask2);
const double sign_mask1[2] = {0.0, numext::bit_cast<double>(0x8000000000000000ull)};
const double sign_mask2[2] = {numext::bit_cast<double>(0x8000000000000000ull), 0.0};
const Packet2d sign_PN = ploadu<Packet2d>(sign_mask1);
const Packet2d sign_NP = ploadu<Packet2d>(sign_mask2);
d1 = pxor(rd, sign_PN);
d2 = pxor(rd, sign_NP);

@@ -112,12 +112,12 @@ public:
ColsAtCompileTime = MatrixType::ColsAtCompileTime,
MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
TrOptions = RowsAtCompileTime==1 ? (MatrixType::Options & ~(RowMajor))
: ColsAtCompileTime==1 ? (MatrixType::Options | RowMajor)
: MatrixType::Options
Options = MatrixType::Options
};
typedef Matrix<Scalar, ColsAtCompileTime, RowsAtCompileTime, TrOptions, MaxColsAtCompileTime, MaxRowsAtCompileTime>
TransposeTypeWithSameStorageOrder;
typedef typename internal::make_proper_matrix_type<
Scalar, ColsAtCompileTime, RowsAtCompileTime, Options, MaxColsAtCompileTime, MaxRowsAtCompileTime
>::type TransposeTypeWithSameStorageOrder;
void allocate(const JacobiSVD<MatrixType, FullPivHouseholderQRPreconditioner>& svd)
{
@@ -202,13 +202,12 @@ public:
ColsAtCompileTime = MatrixType::ColsAtCompileTime,
MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
TrOptions = RowsAtCompileTime==1 ? (MatrixType::Options & ~(RowMajor))
: ColsAtCompileTime==1 ? (MatrixType::Options | RowMajor)
: MatrixType::Options
Options = MatrixType::Options
};
typedef Matrix<Scalar, ColsAtCompileTime, RowsAtCompileTime, TrOptions, MaxColsAtCompileTime, MaxRowsAtCompileTime>
TransposeTypeWithSameStorageOrder;
typedef typename internal::make_proper_matrix_type<
Scalar, ColsAtCompileTime, RowsAtCompileTime, Options, MaxColsAtCompileTime, MaxRowsAtCompileTime
>::type TransposeTypeWithSameStorageOrder;
void allocate(const JacobiSVD<MatrixType, ColPivHouseholderQRPreconditioner>& svd)
{
@@ -303,8 +302,9 @@ public:
Options = MatrixType::Options
};
typedef Matrix<Scalar, ColsAtCompileTime, RowsAtCompileTime, Options, MaxColsAtCompileTime, MaxRowsAtCompileTime>
TransposeTypeWithSameStorageOrder;
typedef typename internal::make_proper_matrix_type<
Scalar, ColsAtCompileTime, RowsAtCompileTime, Options, MaxColsAtCompileTime, MaxRowsAtCompileTime
>::type TransposeTypeWithSameStorageOrder;
void allocate(const JacobiSVD<MatrixType, HouseholderQRPreconditioner>& svd)
{

@@ -218,7 +218,7 @@ class SimplicialCholeskyBase : public SparseSolverBase<Derived>
CholMatrixType tmp(size,size);
ConstCholMatrixPtr pmat;
if(m_P.size()==0 && (UpLo&Upper)==Upper)
if(m_P.size() == 0 && (int(UpLo) & int(Upper)) == Upper)
{
// If there is no ordering, try to directly use the input matrix without any copy
internal::simplicial_cholesky_grab_input<CholMatrixType,MatrixType>::run(a, pmat, tmp);

@@ -126,7 +126,7 @@ public:
enum {
CoeffReadCost = evaluator<Lhs>::CoeffReadCost + evaluator<Rhs>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
CoeffReadCost = int(evaluator<Lhs>::CoeffReadCost) + int(evaluator<Rhs>::CoeffReadCost) + int(functor_traits<BinaryOp>::Cost),
Flags = XprType::Flags
};
@@ -211,7 +211,7 @@ public:
enum {
CoeffReadCost = evaluator<Lhs>::CoeffReadCost + evaluator<Rhs>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
CoeffReadCost = int(evaluator<Lhs>::CoeffReadCost) + int(evaluator<Rhs>::CoeffReadCost) + int(functor_traits<BinaryOp>::Cost),
Flags = XprType::Flags
};
@@ -298,7 +298,7 @@ public:
enum {
CoeffReadCost = evaluator<Lhs>::CoeffReadCost + evaluator<Rhs>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
CoeffReadCost = int(evaluator<Lhs>::CoeffReadCost) + int(evaluator<Rhs>::CoeffReadCost) + int(functor_traits<BinaryOp>::Cost),
Flags = XprType::Flags
};
@@ -457,7 +457,7 @@ public:
enum {
CoeffReadCost = evaluator<LhsArg>::CoeffReadCost + evaluator<RhsArg>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
CoeffReadCost = int(evaluator<LhsArg>::CoeffReadCost) + int(evaluator<RhsArg>::CoeffReadCost) + int(functor_traits<BinaryOp>::Cost),
Flags = XprType::Flags
};
@@ -530,7 +530,7 @@ public:
enum {
CoeffReadCost = evaluator<LhsArg>::CoeffReadCost + evaluator<RhsArg>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
CoeffReadCost = int(evaluator<LhsArg>::CoeffReadCost) + int(evaluator<RhsArg>::CoeffReadCost) + int(functor_traits<BinaryOp>::Cost),
Flags = XprType::Flags
};
@@ -604,7 +604,7 @@ public:
enum {
CoeffReadCost = evaluator<LhsArg>::CoeffReadCost + evaluator<RhsArg>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
CoeffReadCost = int(evaluator<LhsArg>::CoeffReadCost) + int(evaluator<RhsArg>::CoeffReadCost) + int(functor_traits<BinaryOp>::Cost),
Flags = XprType::Flags
};
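The hunks above wrap every enumerator in `int(...)` before adding them. The diff does not state the reason; a plausible one is that arithmetic between enumerators of different enumeration types is deprecated in C++20 and makes recent compilers warn. The sketch below reproduces the pattern with hypothetical stand-in types (`LhsEvaluator`, `RhsEvaluator` and `OpTraits` are not Eigen names):

```cpp
// Hypothetical stand-ins for evaluator<Lhs>, evaluator<Rhs> and
// functor_traits<BinaryOp>; each one owns a distinct unnamed enum type.
struct LhsEvaluator { enum { CoeffReadCost = 1 }; };
struct RhsEvaluator { enum { CoeffReadCost = 2 }; };
struct OpTraits     { enum { Cost = 3 }; };

struct BinaryEvaluator {
  enum {
    // Every operand is converted to int first, so the sum is ordinary
    // integer arithmetic instead of arithmetic between unrelated enum types.
    CoeffReadCost = int(LhsEvaluator::CoeffReadCost)
                  + int(RhsEvaluator::CoeffReadCost)
                  + int(OpTraits::Cost)
  };
};

static_assert(int(BinaryEvaluator::CoeffReadCost) == 6,
              "the three costs add up as plain integers");
```

The value of the enumerator is unchanged by the casts; only the types taking part in the addition differ.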


@@ -24,7 +24,7 @@ struct unary_evaluator<CwiseUnaryOp<UnaryOp,ArgType>, IteratorBased>
class InnerIterator;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost + functor_traits<UnaryOp>::Cost,
CoeffReadCost = int(evaluator<ArgType>::CoeffReadCost) + int(functor_traits<UnaryOp>::Cost),
Flags = XprType::Flags
};
@@ -79,7 +79,7 @@ struct unary_evaluator<CwiseUnaryView<ViewOp,ArgType>, IteratorBased>
class InnerIterator;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost + functor_traits<ViewOp>::Cost,
CoeffReadCost = int(evaluator<ArgType>::CoeffReadCost) + int(functor_traits<ViewOp>::Cost),
Flags = XprType::Flags
};


@@ -497,6 +497,45 @@ ceil() const
return CeilReturnType(derived());
}
template<int N> struct ShiftRightXpr {
typedef CwiseUnaryOp<internal::scalar_shift_right_op<Scalar, N>, const Derived> Type;
};
/** \returns an expression of \c *this with the \a Scalar type arithmetically
* shifted right by \a N bit positions.
*
* The template parameter \a N specifies the number of bit positions to shift.
*
* \sa shiftLeft()
*/
template<int N>
EIGEN_DEVICE_FUNC
typename ShiftRightXpr<N>::Type
shiftRight() const
{
return typename ShiftRightXpr<N>::Type(derived());
}
template<int N> struct ShiftLeftXpr {
typedef CwiseUnaryOp<internal::scalar_shift_left_op<Scalar, N>, const Derived> Type;
};
/** \returns an expression of \c *this with the \a Scalar type logically
* shifted left by \a N bit positions.
*
* The template parameter \a N specifies the number of bit positions to shift.
*
* \sa shiftRight()
*/
template<int N>
EIGEN_DEVICE_FUNC
typename ShiftLeftXpr<N>::Type
shiftLeft() const
{
return typename ShiftLeftXpr<N>::Type(derived());
}
/** \returns an expression of the coefficient-wise isnan of *this.
*
* Example: \include Cwise_isNaN.cpp
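The `shiftRight<N>()` and `shiftLeft<N>()` members added in this hunk take the shift count as a template parameter and apply the shift coefficient-wise. As a rough illustration of what such a compile-time shift functor computes for one coefficient, here is a standalone sketch. The `sketch` namespace and both functor names are hypothetical and are not the `internal::scalar_shift_*_op` implementations, which this diff does not show.

```cpp
#include <cstdint>

namespace sketch {

// Shifts a scalar right by a compile-time number of bit positions.
// For signed types mainstream compilers perform an arithmetic shift
// (the sign bit is replicated); C++20 makes that behavior mandatory.
template <typename Scalar, int N>
struct shift_right_op {
  Scalar operator()(Scalar a) const { return static_cast<Scalar>(a >> N); }
};

// Shifts a scalar left by a compile-time number of bit positions,
// filling the vacated low bits with zeros.
template <typename Scalar, int N>
struct shift_left_op {
  Scalar operator()(Scalar a) const { return static_cast<Scalar>(a << N); }
};

}  // namespace sketch
```

Because `N` is a template argument, the shift count is fixed at compile time; with the members from the hunk the corresponding call on an integer array expression would be written as `array.shiftRight<2>()`.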


@@ -64,49 +64,6 @@ cast() const
return typename CastXpr<NewType>::Type(derived());
}
template<int N> struct ShiftRightXpr {
typedef CwiseUnaryOp<internal::scalar_shift_right_op<Scalar, N>, const Derived> Type;
};
/// \returns an expression of \c *this with the \a Scalar type arithmetically
/// shifted right by \a N bit positions.
///
/// The template parameter \a N specifies the number of bit positions to shift.
///
EIGEN_DOC_UNARY_ADDONS(cast,conversion function)
///
/// \sa class CwiseUnaryOp
///
template<int N>
EIGEN_DEVICE_FUNC
typename ShiftRightXpr<N>::Type
shift_right() const
{
return typename ShiftRightXpr<N>::Type(derived());
}
template<int N> struct ShiftLeftXpr {
typedef CwiseUnaryOp<internal::scalar_shift_left_op<Scalar, N>, const Derived> Type;
};
/// \returns an expression of \c *this with the \a Scalar type logically
/// shifted left by \a N bit positions.
///
/// The template parameter \a N specifies the number of bit positions to shift.
///
EIGEN_DOC_UNARY_ADDONS(cast,conversion function)
///
/// \sa class CwiseUnaryOp
///
template<int N>
EIGEN_DEVICE_FUNC
typename ShiftLeftXpr<N>::Type
shift_left() const
{
return typename ShiftLeftXpr<N>::Type(derived());
}
/// \returns an expression of the complex conjugate of \c *this.
///
EIGEN_DOC_UNARY_ADDONS(conjugate,complex conjugate)


@@ -39,10 +39,10 @@ cwiseProduct(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
inline const CwiseBinaryOp<std::equal_to<Scalar>, const Derived, const OtherDerived>
inline const CwiseBinaryOp<numext::equal_to<Scalar>, const Derived, const OtherDerived>
cwiseEqual(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
{
return CwiseBinaryOp<std::equal_to<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
return CwiseBinaryOp<numext::equal_to<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
}
/** \returns an expression of the coefficient-wise != operator of *this and \a other
@@ -59,10 +59,10 @@ cwiseEqual(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
inline const CwiseBinaryOp<std::not_equal_to<Scalar>, const Derived, const OtherDerived>
inline const CwiseBinaryOp<numext::not_equal_to<Scalar>, const Derived, const OtherDerived>
cwiseNotEqual(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
{
return CwiseBinaryOp<std::not_equal_to<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
return CwiseBinaryOp<numext::not_equal_to<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
}
/** \returns an expression of the coefficient-wise min of *this and \a other
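The two hunks above replace `std::equal_to` and `std::not_equal_to` with functors from the `numext` namespace. The diff does not give the motivation; one plausible reading is that a library-owned functor can carry the same device annotation as the surrounding `EIGEN_DEVICE_FUNC` members. The sketch below shows what such a replacement could look like. The `sketch` namespace and the `SKETCH_DEVICE_FUNC` macro are hypothetical placeholders, not Eigen definitions.

```cpp
// Placeholder for a device annotation such as __host__ __device__;
// it expands to nothing in an ordinary host-only build.
#define SKETCH_DEVICE_FUNC

namespace sketch {

// Minimal counterpart of std::equal_to whose call operator can be annotated.
template <typename T>
struct equal_to {
  typedef bool result_type;
  SKETCH_DEVICE_FUNC bool operator()(const T& lhs, const T& rhs) const {
    return lhs == rhs;
  }
};

// Minimal counterpart of std::not_equal_to.
template <typename T>
struct not_equal_to {
  typedef bool result_type;
  SKETCH_DEVICE_FUNC bool operator()(const T& lhs, const T& rhs) const {
    return lhs != rhs;
  }
};

}  // namespace sketch
```

The functors keep the call signature of their standard counterparts, so an expression type such as `CwiseBinaryOp` only needs the template argument swapped, which matches the one-token change per line seen in the hunk.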


@@ -16,13 +16,13 @@ void benchBasic_loop(const MatrixType& I, MatrixType& m, int iterations)
{
asm("#begin_bench_loop LazyEval");
if (MatrixType::SizeAtCompileTime!=Eigen::Dynamic) asm("#fixedsize");
m = (I + 0.00005 * (m + m.lazy() * m)).eval();
m = (I + 0.00005 * (m + m.lazyProduct(m))).eval();
}
else if (Mode==OmpEval)
{
asm("#begin_bench_loop OmpEval");
if (MatrixType::SizeAtCompileTime!=Eigen::Dynamic) asm("#fixedsize");
m = (I + 0.00005 * (m + m.lazy() * m)).evalOMP();
m = (I + 0.00005 * (m + m.lazyProduct(m))).eval();
}
else
{


@@ -26,20 +26,27 @@ else()
set(EigenBlas_SRCS ${EigenBlas_SRCS} f2c/complexdots.c)
endif()
add_library(eigen_blas_static ${EigenBlas_SRCS})
add_library(eigen_blas SHARED ${EigenBlas_SRCS})
set(EIGEN_BLAS_TARGETS "")
if(EIGEN_STANDARD_LIBRARIES_TO_LINK_TO)
target_link_libraries(eigen_blas_static ${EIGEN_STANDARD_LIBRARIES_TO_LINK_TO})
target_link_libraries(eigen_blas ${EIGEN_STANDARD_LIBRARIES_TO_LINK_TO})
add_library(eigen_blas_static ${EigenBlas_SRCS})
list(APPEND EIGEN_BLAS_TARGETS eigen_blas_static)
if (EIGEN_BUILD_SHARED_LIBS)
add_library(eigen_blas SHARED ${EigenBlas_SRCS})
list(APPEND EIGEN_BLAS_TARGETS eigen_blas)
endif()
add_dependencies(blas eigen_blas eigen_blas_static)
foreach(target IN LISTS EIGEN_BLAS_TARGETS)
if(EIGEN_STANDARD_LIBRARIES_TO_LINK_TO)
target_link_libraries(${target} ${EIGEN_STANDARD_LIBRARIES_TO_LINK_TO})
endif()
install(TARGETS eigen_blas eigen_blas_static
RUNTIME DESTINATION bin
LIBRARY DESTINATION lib
ARCHIVE DESTINATION lib)
add_dependencies(blas ${target})
install(TARGETS ${target}
RUNTIME DESTINATION bin
LIBRARY DESTINATION lib
ARCHIVE DESTINATION lib)
endforeach()
if(EIGEN_Fortran_COMPILER_WORKS)


@@ -478,6 +478,7 @@ macro(ei_get_compilerver VAR)
execute_process(COMMAND ${CMAKE_CXX_COMPILER} ${EIGEN_CXX_FLAG_VERSION}
OUTPUT_VARIABLE eigen_cxx_compiler_version_string OUTPUT_STRIP_TRAILING_WHITESPACE)
string(REGEX REPLACE "^[ \n\r]+" "" eigen_cxx_compiler_version_string ${eigen_cxx_compiler_version_string})
string(REGEX REPLACE "[\n\r].*" "" eigen_cxx_compiler_version_string ${eigen_cxx_compiler_version_string})
ei_get_compilerver_from_cxx_version_string("${eigen_cxx_compiler_version_string}" CNAME CVER)
@@ -487,9 +488,10 @@ macro(ei_get_compilerver VAR)
endmacro()
# Extract compiler name and version from a raw version string
# WARNING: if you edit thid macro, then please test it by uncommenting
# WARNING: if you edit this macro, then please test it by uncommenting
# the testing macro call in ei_init_testing() of the EigenTesting.cmake file.
# See also the ei_test_get_compilerver_from_cxx_version_string macro at the end of the file
# See also the ei_test_get_compilerver_from_cxx_version_string macro at the end
# of the file
macro(ei_get_compilerver_from_cxx_version_string VERSTRING CNAME CVER)
# extract possible compiler names
string(REGEX MATCH "g\\+\\+" ei_has_gpp ${VERSTRING})
@@ -497,6 +499,7 @@ macro(ei_get_compilerver_from_cxx_version_string VERSTRING CNAME CVER)
string(REGEX MATCH "gcc|GCC" ei_has_gcc ${VERSTRING})
string(REGEX MATCH "icpc|ICC" ei_has_icpc ${VERSTRING})
string(REGEX MATCH "clang|CLANG" ei_has_clang ${VERSTRING})
string(REGEX MATCH "mingw32" ei_has_mingw ${VERSTRING})
# combine them
if((ei_has_llvm) AND (ei_has_gpp OR ei_has_gcc))
@@ -505,6 +508,8 @@ macro(ei_get_compilerver_from_cxx_version_string VERSTRING CNAME CVER)
set(${CNAME} "llvm-clang++")
elseif(ei_has_clang)
set(${CNAME} "clang++")
elseif ((ei_has_mingw) AND (ei_has_gpp OR ei_has_gcc))
set(${CNAME} "mingw32-g++")
elseif(ei_has_icpc)
set(${CNAME} "icpc")
elseif(ei_has_gpp OR ei_has_gcc)
@@ -525,11 +530,17 @@ macro(ei_get_compilerver_from_cxx_version_string VERSTRING CNAME CVER)
if(NOT eicver)
# try to extract 2:
string(REGEX MATCH "[^0-9][0-9]+\\.[0-9]+" eicver ${VERSTRING})
else()
set(eicver " _")
if (NOT eicver AND ei_has_mingw)
# try to extract 1 number plus suffix:
string(REGEX MATCH "[^0-9][0-9]+-win32" eicver ${VERSTRING})
endif()
endif()
endif()
endif()
if (NOT eicver)
set(eicver " _")
endif()
string(REGEX REPLACE ".(.*)" "\\1" ${CVER} ${eicver})
@@ -654,6 +665,7 @@ macro(ei_test_get_compilerver_from_cxx_version_string)
ei_test1_get_compilerver_from_cxx_version_string("i686-apple-darwin11-llvm-g++-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)" "llvm-g++" "4.2.1")
ei_test1_get_compilerver_from_cxx_version_string("g++-mp-4.4 (GCC) 4.4.6" "g++" "4.4.6")
ei_test1_get_compilerver_from_cxx_version_string("g++-mp-4.4 (GCC) 2011" "g++" "4.4")
ei_test1_get_compilerver_from_cxx_version_string("x86_64-w64-mingw32-g++ (GCC) 10-win32 20210110" "mingw32-g++" "10-win32")
endmacro()
# Split all tests listed in EIGEN_TESTS_LIST into num_splits many targets
@@ -767,4 +779,4 @@ macro(ei_add_smoke_tests smoke_test_list)
set_property(TEST ${test} PROPERTY LABELS "${test_labels};smoketest")
endif()
endforeach()
endmacro(ei_add_smoke_tests)
endmacro(ei_add_smoke_tests)
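The MinGW fallback added to `ei_get_compilerver_from_cxx_version_string` above matches one non-digit character followed by the version token and then drops that first character. The same idea can be sketched in C++ with `std::regex`. This is a simplified translation that covers only the fallback steps visible in the hunk; `extract_version` and its `is_mingw` parameter are hypothetical names, and a capture group replaces the final `REGEX REPLACE` step.

```cpp
#include <regex>
#include <string>

// Returns the version token found in a compiler version string, or "_"
// when nothing matches (the CMake macro falls back to " _" and then
// strips the leading character).
std::string extract_version(const std::string& version_string, bool is_mingw) {
  // The leading [^0-9] keeps the match from starting inside a longer number.
  static const std::regex three_numbers("[^0-9]([0-9]+\\.[0-9]+\\.[0-9]+)");
  static const std::regex two_numbers("[^0-9]([0-9]+\\.[0-9]+)");
  static const std::regex mingw_suffix("[^0-9]([0-9]+-win32)");

  std::smatch match;
  if (std::regex_search(version_string, match, three_numbers)) return match[1].str();
  if (std::regex_search(version_string, match, two_numbers)) return match[1].str();
  // One number plus the "-win32" suffix, tried only for MinGW compilers.
  if (is_mingw && std::regex_search(version_string, match, mingw_suffix)) {
    return match[1].str();
  }
  return "_";
}
```

The inputs used to exercise the sketch are the strings from the `ei_test1_get_compilerver_from_cxx_version_string` calls in the hunk, including the new MinGW case that is expected to yield `10-win32`.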


@@ -147,6 +147,7 @@ mark_as_advanced(BLAS_VERBOSE)
include(CheckFunctionExists)
include(CheckFortranFunctionExists)
include(CMakeFindDependencyMacro)
set(_blas_ORIG_CMAKE_FIND_LIBRARY_SUFFIXES ${CMAKE_FIND_LIBRARY_SUFFIXES})
@@ -509,9 +510,9 @@ if (BLA_VENDOR MATCHES "Intel*" OR BLA_VENDOR STREQUAL "All")
if (_LANGUAGES_ MATCHES C OR _LANGUAGES_ MATCHES CXX)
if(BLAS_FIND_QUIETLY OR NOT BLAS_FIND_REQUIRED)
find_package(Threads)
find_dependency(Threads)
else()
find_package(Threads REQUIRED)
find_dependency(Threads REQUIRED)
endif()
set(BLAS_SEARCH_LIBS "")


@@ -41,18 +41,19 @@
# License text for the above reference.)
# macro to factorize this call
include(CMakeFindDependencyMacro)
macro(find_package_blas)
if(BLASEXT_FIND_REQUIRED)
if(BLASEXT_FIND_QUIETLY)
find_package(BLAS REQUIRED QUIET)
find_dependency(BLAS REQUIRED QUIET)
else()
find_package(BLAS REQUIRED)
find_dependency(BLAS REQUIRED)
endif()
else()
if(BLASEXT_FIND_QUIETLY)
find_package(BLAS QUIET)
find_dependency(BLAS QUIET)
else()
find_package(BLAS)
find_dependency(BLAS)
endif()
endif()
endmacro()
@@ -316,7 +317,7 @@ if(BLA_VENDOR MATCHES "Intel*")
"\n (see BLAS_SEQ_LIBRARIES and BLAS_PAR_LIBRARIES)")
message(STATUS "BLAS sequential libraries stored in BLAS_SEQ_LIBRARIES")
endif()
find_package_handle_standard_args(BLAS DEFAULT_MSG
find_package_handle_standard_args(BLASEXT DEFAULT_MSG
BLAS_SEQ_LIBRARIES
BLAS_LIBRARY_DIRS
BLAS_INCLUDE_DIRS)
@@ -324,14 +325,14 @@ if(BLA_VENDOR MATCHES "Intel*")
if(NOT BLASEXT_FIND_QUIETLY)
message(STATUS "BLAS parallel libraries stored in BLAS_PAR_LIBRARIES")
endif()
find_package_handle_standard_args(BLAS DEFAULT_MSG
find_package_handle_standard_args(BLASEXT DEFAULT_MSG
BLAS_PAR_LIBRARIES)
endif()
else()
if(NOT BLASEXT_FIND_QUIETLY)
message(STATUS "BLAS sequential libraries stored in BLAS_SEQ_LIBRARIES")
endif()
find_package_handle_standard_args(BLAS DEFAULT_MSG
find_package_handle_standard_args(BLASEXT DEFAULT_MSG
BLAS_SEQ_LIBRARIES
BLAS_LIBRARY_DIRS
BLAS_INCLUDE_DIRS)
@@ -343,14 +344,14 @@ elseif(BLA_VENDOR MATCHES "ACML*")
"\n (see BLAS_SEQ_LIBRARIES and BLAS_PAR_LIBRARIES)")
message(STATUS "BLAS sequential libraries stored in BLAS_SEQ_LIBRARIES")
endif()
find_package_handle_standard_args(BLAS DEFAULT_MSG
find_package_handle_standard_args(BLASEXT DEFAULT_MSG
BLAS_SEQ_LIBRARIES
BLAS_LIBRARY_DIRS)
if(BLAS_PAR_LIBRARIES)
if(NOT BLASEXT_FIND_QUIETLY)
message(STATUS "BLAS parallel libraries stored in BLAS_PAR_LIBRARIES")
endif()
find_package_handle_standard_args(BLAS DEFAULT_MSG
find_package_handle_standard_args(BLASEXT DEFAULT_MSG
BLAS_PAR_LIBRARIES)
endif()
elseif(BLA_VENDOR MATCHES "IBMESSL*")
@@ -360,21 +361,24 @@ elseif(BLA_VENDOR MATCHES "IBMESSL*")
"\n (see BLAS_SEQ_LIBRARIES and BLAS_PAR_LIBRARIES)")
message(STATUS "BLAS sequential libraries stored in BLAS_SEQ_LIBRARIES")
endif()
find_package_handle_standard_args(BLAS DEFAULT_MSG
find_package_handle_standard_args(BLASEXT DEFAULT_MSG
BLAS_SEQ_LIBRARIES
BLAS_LIBRARY_DIRS)
if(BLAS_PAR_LIBRARIES)
if(NOT BLASEXT_FIND_QUIETLY)
message(STATUS "BLAS parallel libraries stored in BLAS_PAR_LIBRARIES")
endif()
find_package_handle_standard_args(BLAS DEFAULT_MSG
find_package_handle_standard_args(BLASEXT DEFAULT_MSG
BLAS_PAR_LIBRARIES)
endif()
else()
if(NOT BLASEXT_FIND_QUIETLY)
message(STATUS "BLAS sequential libraries stored in BLAS_SEQ_LIBRARIES")
endif()
find_package_handle_standard_args(BLAS DEFAULT_MSG
find_package_handle_standard_args(BLASEXT DEFAULT_MSG
BLAS_SEQ_LIBRARIES
BLAS_LIBRARY_DIRS)
endif()
# Callers expect BLAS_FOUND to be set as well.
set(BLAS_FOUND ${BLASEXT_FOUND})


@@ -41,7 +41,8 @@ set(COMPUTECPP_BITCODE "spir64" CACHE STRING
"Bitcode type to use as SYCL target in compute++")
mark_as_advanced(COMPUTECPP_BITCODE)
find_package(OpenCL REQUIRED)
include(CMakeFindDependencyMacro)
find_dependency(OpenCL REQUIRED)
# Find ComputeCpp package


@@ -22,7 +22,8 @@ if( NOT FFTW_ROOT AND ENV{FFTWDIR} )
endif()
# Check if we can use PkgConfig
find_package(PkgConfig)
include(CMakeFindDependencyMacro)
find_dependency(PkgConfig)
#Determine from PKG
if( PKG_CONFIG_FOUND AND NOT FFTW_ROOT )


@@ -65,8 +65,9 @@ endif()
# Optionally use pkg-config to detect include/library dirs (if pkg-config is available)
# -------------------------------------------------------------------------------------
include(FindPkgConfig)
find_package(PkgConfig QUIET)
include(CMakeFindDependencyMacro)
# include(FindPkgConfig)
find_dependency(PkgConfig QUIET)
if( PKG_CONFIG_EXECUTABLE AND NOT HWLOC_GIVEN_BY_USER )
pkg_search_module(HWLOC hwloc)


@@ -26,6 +26,7 @@
include(CheckFunctionExists)
include(CMakeFindDependencyMacro)
# This macro checks for the existence of the combination of fortran libraries
# given by _list. If the combination is found, this macro checks (using the
@@ -88,7 +89,7 @@ macro(check_lapack_libraries DEFINITIONS LIBRARIES _prefix _name _flags _list _b
set(${LIBRARIES} ${_libraries_found})
# Some C++ linkers require the f2c library to link with Fortran libraries.
# I do not know which ones, thus I just add the f2c library if it is available.
find_package( F2C QUIET )
find_dependency( F2C QUIET )
if ( F2C_FOUND )
set(${DEFINITIONS} ${${DEFINITIONS}} ${F2C_DEFINITIONS})
set(${LIBRARIES} ${${LIBRARIES}} ${F2C_LIBRARIES})
@@ -135,9 +136,9 @@ endmacro()
# LAPACK requires BLAS
if(LAPACK_FIND_QUIETLY OR NOT LAPACK_FIND_REQUIRED)
find_package(BLAS)
find_dependency(BLAS)
else()
find_package(BLAS REQUIRED)
find_dependency(BLAS REQUIRED)
endif()
if (NOT BLAS_FOUND)

cmake/FindMPREAL.cmake

@@ -0,0 +1,103 @@
# Try to find the MPFR C++ (MPREAL) library
# See http://www.holoborodko.com/pavel/mpreal/
#
# This module supports requiring a minimum version, e.g. you can do
# find_package(MPREAL 1.8.6)
# to require version 1.8.6 or newer of MPREAL C++.
#
# Once done this will define
#
# MPREAL_FOUND - system has MPREAL lib with correct version
# MPREAL_INCLUDES - MPREAL required include directories
# MPREAL_LIBRARIES - MPREAL required libraries
# MPREAL_VERSION - MPREAL version
# Copyright (c) 2020 The Eigen Authors.
# Redistribution and use is allowed according to the terms of the BSD license.
include(CMakeFindDependencyMacro)
find_dependency(MPFR)
find_dependency(GMP)
# Set MPREAL_INCLUDES
find_path(MPREAL_INCLUDES
NAMES
mpreal.h
PATHS
$ENV{GMPDIR}
${INCLUDE_INSTALL_DIR}
)
# Set MPREAL_FIND_VERSION to 1.0.0 if no minimum version is specified
if(NOT MPREAL_FIND_VERSION)
if(NOT MPREAL_FIND_VERSION_MAJOR)
set(MPREAL_FIND_VERSION_MAJOR 1)
endif()
if(NOT MPREAL_FIND_VERSION_MINOR)
set(MPREAL_FIND_VERSION_MINOR 0)
endif()
if(NOT MPREAL_FIND_VERSION_PATCH)
set(MPREAL_FIND_VERSION_PATCH 0)
endif()
set(MPREAL_FIND_VERSION "${MPREAL_FIND_VERSION_MAJOR}.${MPREAL_FIND_VERSION_MINOR}.${MPREAL_FIND_VERSION_PATCH}")
endif()
# Check bugs
# - https://github.com/advanpix/mpreal/issues/7
# - https://github.com/advanpix/mpreal/issues/9
set(MPREAL_TEST_PROGRAM "
#include <mpreal.h>
#include <algorithm>
int main(int argc, char** argv) {
const mpfr::mpreal one = 1.0;
const mpfr::mpreal zero = 0.0;
using namespace std;
const mpfr::mpreal smaller = min(one, zero);
return 0;
}")
if(MPREAL_INCLUDES)
# Set MPREAL_VERSION
file(READ "${MPREAL_INCLUDES}/mpreal.h" _mpreal_version_header)
string(REGEX MATCH "define[ \t]+MPREAL_VERSION_MAJOR[ \t]+([0-9]+)" _mpreal_major_version_match "${_mpreal_version_header}")
set(MPREAL_MAJOR_VERSION "${CMAKE_MATCH_1}")
string(REGEX MATCH "define[ \t]+MPREAL_VERSION_MINOR[ \t]+([0-9]+)" _mpreal_minor_version_match "${_mpreal_version_header}")
set(MPREAL_MINOR_VERSION "${CMAKE_MATCH_1}")
string(REGEX MATCH "define[ \t]+MPREAL_VERSION_PATCHLEVEL[ \t]+([0-9]+)" _mpreal_patchlevel_version_match "${_mpreal_version_header}")
set(MPREAL_PATCHLEVEL_VERSION "${CMAKE_MATCH_1}")
set(MPREAL_VERSION ${MPREAL_MAJOR_VERSION}.${MPREAL_MINOR_VERSION}.${MPREAL_PATCHLEVEL_VERSION})
# Check whether found version exceeds minimum version
if(${MPREAL_VERSION} VERSION_LESS ${MPREAL_FIND_VERSION})
set(MPREAL_VERSION_OK FALSE)
message(STATUS "MPREAL version ${MPREAL_VERSION} found in ${MPREAL_INCLUDES}, "
"but at least version ${MPREAL_FIND_VERSION} is required")
else()
set(MPREAL_VERSION_OK TRUE)
list(APPEND MPREAL_INCLUDES "${MPFR_INCLUDES}" "${GMP_INCLUDES}")
list(REMOVE_DUPLICATES MPREAL_INCLUDES)
list(APPEND MPREAL_LIBRARIES "${MPFR_LIBRARIES}" "${GMP_LIBRARIES}")
list(REMOVE_DUPLICATES MPREAL_LIBRARIES)
# Make sure it compiles with the current compiler.
unset(MPREAL_WORKS CACHE)
include(CheckCXXSourceCompiles)
set(CMAKE_REQUIRED_INCLUDES "${MPREAL_INCLUDES}")
set(CMAKE_REQUIRED_LIBRARIES "${MPREAL_LIBRARIES}")
check_cxx_source_compiles("${MPREAL_TEST_PROGRAM}" MPREAL_WORKS)
endif()
endif()
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(MPREAL DEFAULT_MSG
MPREAL_INCLUDES MPREAL_VERSION_OK MPREAL_WORKS)
mark_as_advanced(MPREAL_INCLUDES)


@@ -118,7 +118,7 @@ if( PASTIX_FIND_COMPONENTS )
if (${component} STREQUAL "SCOTCH")
set(PASTIX_LOOK_FOR_SCOTCH ON)
endif()
if (${component} STREQUAL "SCOTCH")
if (${component} STREQUAL "PTSCOTCH")
set(PASTIX_LOOK_FOR_PTSCOTCH ON)
endif()
if (${component} STREQUAL "METIS")
@@ -133,14 +133,14 @@ endif()
# Required dependencies
# ---------------------
include(CMakeFindDependencyMacro)
if (NOT PASTIX_FIND_QUIETLY)
message(STATUS "Looking for PASTIX - Try to detect pthread")
endif()
if (PASTIX_FIND_REQUIRED)
find_package(Threads REQUIRED QUIET)
find_dependency(Threads REQUIRED QUIET)
else()
find_package(Threads QUIET)
find_dependency(Threads QUIET)
endif()
set(PASTIX_EXTRA_LIBRARIES "")
if( THREADS_FOUND )
@@ -198,9 +198,9 @@ if (NOT PASTIX_FIND_QUIETLY)
message(STATUS "Looking for PASTIX - Try to detect HWLOC")
endif()
if (PASTIX_FIND_REQUIRED)
find_package(HWLOC REQUIRED QUIET)
find_dependency(HWLOC REQUIRED QUIET)
else()
find_package(HWLOC QUIET)
find_dependency(HWLOC QUIET)
endif()
# PASTIX depends on BLAS
@@ -209,9 +209,9 @@ if (NOT PASTIX_FIND_QUIETLY)
message(STATUS "Looking for PASTIX - Try to detect BLAS")
endif()
if (PASTIX_FIND_REQUIRED)
find_package(BLASEXT REQUIRED QUIET)
find_dependency(BLASEXT REQUIRED QUIET)
else()
find_package(BLASEXT QUIET)
find_dependency(BLASEXT QUIET)
endif()
# Optional dependencies
@@ -230,9 +230,9 @@ if (NOT MPI_FOUND AND PASTIX_LOOK_FOR_MPI)
set(MPI_C_COMPILER mpicc)
endif()
if (PASTIX_FIND_REQUIRED AND PASTIX_FIND_REQUIRED_MPI)
find_package(MPI REQUIRED QUIET)
find_dependency(MPI REQUIRED QUIET)
else()
find_package(MPI QUIET)
find_dependency(MPI QUIET)
endif()
if (MPI_FOUND)
mark_as_advanced(MPI_LIBRARY)
@@ -272,10 +272,10 @@ if( NOT STARPU_FOUND AND PASTIX_LOOK_FOR_STARPU)
endif()
# set the list of optional dependencies we may discover
if (PASTIX_FIND_REQUIRED AND PASTIX_FIND_REQUIRED_STARPU)
find_package(STARPU ${PASTIX_STARPU_VERSION} REQUIRED
find_dependency(STARPU ${PASTIX_STARPU_VERSION} REQUIRED
COMPONENTS ${STARPU_COMPONENT_LIST})
else()
find_package(STARPU ${PASTIX_STARPU_VERSION}
find_dependency(STARPU ${PASTIX_STARPU_VERSION}
COMPONENTS ${STARPU_COMPONENT_LIST})
endif()
@@ -288,9 +288,9 @@ if (NOT SCOTCH_FOUND AND PASTIX_LOOK_FOR_SCOTCH)
message(STATUS "Looking for PASTIX - Try to detect SCOTCH")
endif()
if (PASTIX_FIND_REQUIRED AND PASTIX_FIND_REQUIRED_SCOTCH)
find_package(SCOTCH REQUIRED QUIET)
find_dependency(SCOTCH REQUIRED QUIET)
else()
find_package(SCOTCH QUIET)
find_dependency(SCOTCH QUIET)
endif()
endif()
@@ -301,9 +301,9 @@ if (NOT PTSCOTCH_FOUND AND PASTIX_LOOK_FOR_PTSCOTCH)
message(STATUS "Looking for PASTIX - Try to detect PTSCOTCH")
endif()
if (PASTIX_FIND_REQUIRED AND PASTIX_FIND_REQUIRED_PTSCOTCH)
find_package(PTSCOTCH REQUIRED QUIET)
find_dependency(PTSCOTCH REQUIRED QUIET)
else()
find_package(PTSCOTCH QUIET)
find_dependency(PTSCOTCH QUIET)
endif()
endif()
@@ -314,9 +314,9 @@ if (NOT METIS_FOUND AND PASTIX_LOOK_FOR_METIS)
message(STATUS "Looking for PASTIX - Try to detect METIS")
endif()
if (PASTIX_FIND_REQUIRED AND PASTIX_FIND_REQUIRED_METIS)
find_package(METIS REQUIRED QUIET)
find_dependency(METIS REQUIRED QUIET)
else()
find_package(METIS QUIET)
find_dependency(METIS QUIET)
endif()
endif()


@@ -79,20 +79,21 @@ if( PTSCOTCH_FIND_COMPONENTS )
endif()
# PTSCOTCH depends on Threads, try to find it
include(CMakeFindDependencyMacro)
if (NOT THREADS_FOUND)
if (PTSCOTCH_FIND_REQUIRED)
find_package(Threads REQUIRED)
find_dependency(Threads REQUIRED)
else()
find_package(Threads)
find_dependency(Threads)
endif()
endif()
# PTSCOTCH depends on MPI, try to find it
if (NOT MPI_FOUND)
if (PTSCOTCH_FIND_REQUIRED)
find_package(MPI REQUIRED)
find_dependency(MPI REQUIRED)
else()
find_package(MPI)
find_dependency(MPI)
endif()
endif()
@@ -148,18 +149,18 @@ else()
foreach(ptscotch_hdr ${PTSCOTCH_hdrs_to_find})
set(PTSCOTCH_${ptscotch_hdr}_DIRS "PTSCOTCH_${ptscotch_hdr}_DIRS-NOTFOUND")
find_path(PTSCOTCH_${ptscotch_hdr}_DIRS
NAMES ${ptscotch_hdr}
HINTS ${PTSCOTCH_DIR}
PATH_SUFFIXES "include" "include/scotch")
NAMES ${ptscotch_hdr}
HINTS ${PTSCOTCH_DIR}
PATH_SUFFIXES "include" "include/scotch")
mark_as_advanced(PTSCOTCH_${ptscotch_hdr}_DIRS)
endforeach()
else()
foreach(ptscotch_hdr ${PTSCOTCH_hdrs_to_find})
set(PTSCOTCH_${ptscotch_hdr}_DIRS "PTSCOTCH_${ptscotch_hdr}_DIRS-NOTFOUND")
find_path(PTSCOTCH_${ptscotch_hdr}_DIRS
NAMES ${ptscotch_hdr}
HINTS ${_inc_env}
PATH_SUFFIXES "scotch")
NAMES ${ptscotch_hdr}
HINTS ${_inc_env}
PATH_SUFFIXES "scotch")
mark_as_advanced(PTSCOTCH_${ptscotch_hdr}_DIRS)
endforeach()
endif()
@@ -171,7 +172,6 @@ foreach(ptscotch_hdr ${PTSCOTCH_hdrs_to_find})
if (PTSCOTCH_${ptscotch_hdr}_DIRS)
list(APPEND PTSCOTCH_INCLUDE_DIRS "${PTSCOTCH_${ptscotch_hdr}_DIRS}")
else ()
set(PTSCOTCH_INCLUDE_DIRS "PTSCOTCH_INCLUDE_DIRS-NOTFOUND")
if (NOT PTSCOTCH_FIND_QUIETLY)
message(STATUS "Looking for ptscotch -- ${ptscotch_hdr} not found")
endif()
@@ -229,16 +229,16 @@ else()
foreach(ptscotch_lib ${PTSCOTCH_libs_to_find})
set(PTSCOTCH_${ptscotch_lib}_LIBRARY "PTSCOTCH_${ptscotch_lib}_LIBRARY-NOTFOUND")
find_library(PTSCOTCH_${ptscotch_lib}_LIBRARY
NAMES ${ptscotch_lib}
HINTS ${PTSCOTCH_DIR}
PATH_SUFFIXES lib lib32 lib64)
NAMES ${ptscotch_lib}
HINTS ${PTSCOTCH_DIR}
PATH_SUFFIXES lib lib32 lib64)
endforeach()
else()
foreach(ptscotch_lib ${PTSCOTCH_libs_to_find})
set(PTSCOTCH_${ptscotch_lib}_LIBRARY "PTSCOTCH_${ptscotch_lib}_LIBRARY-NOTFOUND")
find_library(PTSCOTCH_${ptscotch_lib}_LIBRARY
NAMES ${ptscotch_lib}
HINTS ${_lib_env})
NAMES ${ptscotch_lib}
HINTS ${_lib_env})
endforeach()
endif()
endif()
@@ -255,7 +255,6 @@ foreach(ptscotch_lib ${PTSCOTCH_libs_to_find})
list(APPEND PTSCOTCH_LIBRARIES "${PTSCOTCH_${ptscotch_lib}_LIBRARY}")
list(APPEND PTSCOTCH_LIBRARY_DIRS "${${ptscotch_lib}_lib_path}")
else ()
list(APPEND PTSCOTCH_LIBRARIES "${PTSCOTCH_${ptscotch_lib}_LIBRARY}")
if (NOT PTSCOTCH_FIND_QUIETLY)
message(STATUS "Looking for ptscotch -- lib ${ptscotch_lib} not found")
endif()


@@ -71,11 +71,12 @@ if( SCOTCH_FIND_COMPONENTS )
endif()
# SCOTCH may depend on Threads, try to find it
include(CMakeFindDependencyMacro)
if (NOT THREADS_FOUND)
if (SCOTCH_FIND_REQUIRED)
find_package(Threads REQUIRED)
find_dependency(Threads REQUIRED)
else()
find_package(Threads)
find_dependency(Threads)
endif()
endif()


@@ -57,18 +57,19 @@ mark_as_advanced(TRISYCL_DEBUG_STRUCTORS)
mark_as_advanced(TRISYCL_TRACE_KERNEL)
#triSYCL definitions
set(CL_SYCL_LANGUAGE_VERSION 220 CACHE VERSION
set(CL_SYCL_LANGUAGE_VERSION 220 CACHE STRING
"Host language version to be used by triSYCL (default is: 220)")
set(TRISYCL_CL_LANGUAGE_VERSION 220 CACHE VERSION
set(TRISYCL_CL_LANGUAGE_VERSION 220 CACHE STRING
"Device language version to be used by triSYCL (default is: 220)")
#set(TRISYCL_COMPILE_OPTIONS "-std=c++1z -Wall -Wextra")
set(CMAKE_CXX_STANDARD 14)
# triSYCL now requires c++17
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
# Find OpenCL package
include(CMakeFindDependencyMacro)
if(TRISYCL_OPENCL)
find_package(OpenCL REQUIRED)
find_dependency(OpenCL REQUIRED)
if(UNIX)
set(BOOST_COMPUTE_INCPATH /usr/include/compute CACHE PATH
"Path to Boost.Compute headers (default is: /usr/include/compute)")
@@ -77,11 +78,11 @@ endif()
# Find OpenMP package
if(TRISYCL_OPENMP)
find_package(OpenMP REQUIRED)
find_dependency(OpenMP REQUIRED)
endif()
# Find Boost
find_package(Boost 1.58 REQUIRED COMPONENTS chrono log)
find_dependency(Boost 1.58 REQUIRED COMPONENTS chrono log)
# If debug or trace we need boost log
if(TRISYCL_DEBUG OR TRISYCL_DEBUG_STRUCTORS OR TRISYCL_TRACE_KERNEL)
@@ -90,9 +91,23 @@ else()
set(LOG_NEEDED OFF)
endif()
find_package(Threads REQUIRED)
find_dependency(Threads REQUIRED)
# Find triSYCL directory
if (TRISYCL_INCLUDES AND TRISYCL_LIBRARIES)
set(TRISYCL_FIND_QUIETLY TRUE)
endif ()
find_path(TRISYCL_INCLUDE_DIR
NAMES sycl.hpp
PATHS $ENV{TRISYCLDIR} $ENV{TRISYCLDIR}/include ${INCLUDE_INSTALL_DIR}
PATH_SUFFIXES triSYCL
)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(TriSYCL DEFAULT_MSG
TRISYCL_INCLUDE_DIR)
if(NOT TRISYCL_INCLUDE_DIR)
message(FATAL_ERROR
"triSYCL include directory - Not found! (please set TRISYCL_INCLUDE_DIR)")
@@ -100,36 +115,42 @@ else()
message(STATUS "triSYCL include directory - Found ${TRISYCL_INCLUDE_DIR}")
endif()
include(CMakeParseArguments)
#######################
# add_sycl_to_target
#######################
#
# Sets the proper flags and includes for the target compilation.
#
# targetName : Name of the target to add a SYCL to.
# sourceFile : Source file to be compiled for SYCL.
# binaryDir : Intermediate directory to output the integration header.
#
function(add_sycl_to_target targetName sourceFile binaryDir)
function(add_sycl_to_target)
set(options)
set(one_value_args
TARGET
)
set(multi_value_args
SOURCES
)
cmake_parse_arguments(ADD_SYCL_ARGS
"${options}"
"${one_value_args}"
"${multi_value_args}"
${ARGN}
)
# Add include directories to the "#include <>" paths
target_include_directories (${targetName} PUBLIC
target_include_directories (${ADD_SYCL_ARGS_TARGET} PUBLIC
${TRISYCL_INCLUDE_DIR}
${Boost_INCLUDE_DIRS}
$<$<BOOL:${TRISYCL_OPENCL}>:${OpenCL_INCLUDE_DIRS}>
$<$<BOOL:${TRISYCL_OPENCL}>:${BOOST_COMPUTE_INCPATH}>)
# Link dependencies
target_link_libraries(${targetName} PUBLIC
target_link_libraries(${ADD_SYCL_ARGS_TARGET}
$<$<BOOL:${TRISYCL_OPENCL}>:${OpenCL_LIBRARIES}>
Threads::Threads
$<$<BOOL:${LOG_NEEDED}>:Boost::log>
Boost::chrono)
# Compile definitions
target_compile_definitions(${targetName} PUBLIC
target_compile_definitions(${ADD_SYCL_ARGS_TARGET} PUBLIC
EIGEN_SYCL_TRISYCL
$<$<BOOL:${TRISYCL_NO_ASYNC}>:TRISYCL_NO_ASYNC>
$<$<BOOL:${TRISYCL_OPENCL}>:TRISYCL_OPENCL>
$<$<BOOL:${TRISYCL_DEBUG}>:TRISYCL_DEBUG>
@@ -138,13 +159,13 @@ function(add_sycl_to_target targetName sourceFile binaryDir)
$<$<BOOL:${LOG_NEEDED}>:BOOST_LOG_DYN_LINK>)
# C++ and OpenMP requirements
target_compile_options(${targetName} PUBLIC
target_compile_options(${ADD_SYCL_ARGS_TARGET} PUBLIC
${TRISYCL_COMPILE_OPTIONS}
$<$<BOOL:${TRISYCL_OPENMP}>:${OpenMP_CXX_FLAGS}>)
if(${TRISYCL_OPENMP} AND (NOT WIN32))
# Does not support generator expressions
set_target_properties(${targetName}
set_target_properties(${ADD_SYCL_ARGS_TARGET}
PROPERTIES
LINK_FLAGS ${OpenMP_CXX_FLAGS})
endif()


@@ -30,14 +30,17 @@ computing least squares solutions:
</table>
This is the example from the page \link TutorialLinearAlgebra Linear algebra and decompositions \endlink.
If you just need to solve the least squares problem, but are not interested in the SVD per se, a
faster alternative method is CompleteOrthogonalDecomposition.
\section LeastSquaresQR Using the QR decomposition
The solve() method in QR decomposition classes also computes the least squares solution. There are
three QR decomposition classes: HouseholderQR (no pivoting, so fast but unstable),
ColPivHouseholderQR (column pivoting, thus a bit slower but more accurate) and FullPivHouseholderQR
(full pivoting, so slowest and most stable). Here is an example with column pivoting:
three QR decomposition classes: HouseholderQR (no pivoting, fast but unstable if your matrix is
not full rank), ColPivHouseholderQR (column pivoting, thus a bit slower but more stable) and
FullPivHouseholderQR (full pivoting, so slowest and slightly more stable than ColPivHouseholderQR).
Here is an example with column pivoting:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
@@ -61,9 +64,11 @@ Finding the least squares solution of \a Ax = \a b is equivalent to solving the
</tr>
</table>
If the matrix \a A is ill-conditioned, then this is not a good method, because the condition number
This method is usually the fastest, especially when \a A is "tall and skinny". However, if the
matrix \a A is even mildly ill-conditioned, this is not a good method, because the condition number
of <i>A</i><sup>T</sup><i>A</i> is the square of the condition number of \a A. This means that you
lose twice as many digits using normal equation than if you use the other methods.
lose roughly twice as many digits of accuracy using the normal equation, compared to the more stable
methods mentioned above.
*/
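The statement above, that the condition number of <i>A</i><sup>T</sup><i>A</i> is the square of the condition number of \a A, can be checked by hand for a diagonal matrix, where the singular values are the absolute diagonal entries. The sketch below does this for a 2x2 case in plain C++ without Eigen; `cond_diag` and `cond_normal_equations` are hypothetical helper names chosen for this illustration.

```cpp
#include <algorithm>
#include <cmath>

// 2-norm condition number of diag(d0, d1): the ratio of the larger to the
// smaller absolute diagonal entry. Both entries are assumed to be nonzero.
double cond_diag(double d0, double d1) {
  const double a = std::fabs(d0);
  const double b = std::fabs(d1);
  return std::max(a, b) / std::min(a, b);
}

// For A = diag(d0, d1) the normal-equations matrix is
// A^T A = diag(d0*d0, d1*d1), so its condition number is cond(A) squared.
double cond_normal_equations(double d0, double d1) {
  return cond_diag(d0 * d0, d1 * d1);
}
```

For `A = diag(1, 1e-4)` the condition number is about 1e4, while that of the normal-equations matrix is about 1e8. In double precision, with roughly 16 significant digits, that is the difference between losing about 4 digits and losing about 8, which is the "twice as many digits" mentioned in the text.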


@@ -72,7 +72,7 @@ To get an overview of the true relative speed of the different decompositions, c
<td>Orthogonalization</td>
<td>Yes</td>
<td>Excellent</td>
<td><em>Soon: blocking</em></td>
<td><em>-</em></td>
</tr>
<tr>
@@ -88,6 +88,18 @@ To get an overview of the true relative speed of the different decompositions, c
</tr>
<tr class="alt">
<td>CompleteOrthogonalDecomposition</td>
<td>-</td>
<td>Fast</td>
<td>Good</td>
<td>Yes</td>
<td>Orthogonalization</td>
<td>Yes</td>
<td>Excellent</td>
<td><em>-</em></td>
</tr>
<tr>
<td>LLT</td>
<td>Positive definite</td>
<td>Very fast</td>
@@ -99,7 +111,7 @@ To get an overview of the true relative speed of the different decompositions, c
<td>Blocking</td>
</tr>
<tr>
<tr class="alt">
<td>LDLT</td>
<td>Positive or negative semidefinite<sup><a href="#note1">1</a></sup></td>
<td>Very fast</td>


@@ -167,6 +167,20 @@ matrix.rightCols(q);\endcode </td>
<td>\code
matrix.rightCols<q>();\endcode </td>
</tr>
<tr><td>%Block containing the q columns starting from i
\link DenseBase::middleCols() * \endlink</td>
<td>\code
matrix.middleCols(i,q);\endcode </td>
<td>\code
matrix.middleCols<q>(i);\endcode </td>
</tr>
<tr><td>%Block containing the q rows starting from i
\link DenseBase::middleRows() * \endlink</td>
<td>\code
matrix.middleRows(i,q);\endcode </td>
<td>\code
matrix.middleRows<q>(i);\endcode </td>
</tr>
</table>
Here is a simple example illustrating the use of the operations presented above:

View File

@@ -14,7 +14,7 @@ QR, %SVD, eigendecompositions... After reading this page, don't miss our
\f[ Ax \: = \: b \f]
Where \a A and \a b are matrices (\a b could be a vector, as a special case). You want to find a solution \a x.
\b The \b solution: You can choose between various decompositions, depending on what your matrix \a A looks like,
\b The \b solution: You can choose between various decompositions, depending on the properties of your matrix \a A,
and depending on whether you favor speed or accuracy. However, let's start with an example that works in all cases,
and is a good compromise:
<table class="example">
@@ -34,7 +34,7 @@ Vector3f x = dec.solve(b);
Here, ColPivHouseholderQR is a QR decomposition with column pivoting. It's a good compromise for this tutorial, as it
works for all matrices while being quite fast. Here is a table of some other decompositions that you can choose from,
depending on your matrix and the trade-off you want to make:
depending on your matrix, the problem you are trying to solve, and the trade-off you want to make:
<table class="manual">
<tr>
@@ -128,11 +128,13 @@ depending on your matrix and the trade-off you want to make:
</table>
To get an overview of the true relative speed of the different decompositions, check this \link DenseDecompositionBenchmark benchmark \endlink.
All of these decompositions offer a solve() method that works as in the above example.
All of these decompositions offer a solve() method that works as in the above example.
For example, if your matrix is positive definite, the above table says that a very good
choice is then the LLT or LDLT decomposition. Here's an example, also demonstrating that using a general
matrix (not a vector) as right hand side is possible.
If you know more about the properties of your matrix, you can use the above table to select the best method.
For example, a good choice for solving linear systems with a non-symmetric matrix of full rank is PartialPivLU.
If you know that your matrix is also symmetric and positive definite, the above table says that
a very good choice is the LLT or LDLT decomposition. Here's an example, also demonstrating that using a general
matrix (not a vector) as right hand side is possible:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
@@ -146,7 +148,34 @@ For a \ref TopicLinearAlgebraDecompositions "much more complete table" comparing
supports many other decompositions), see our special page on
\ref TopicLinearAlgebraDecompositions "this topic".
\section TutorialLinAlgSolutionExists Checking if a solution really exists
\section TutorialLinAlgLeastsquares Least squares solving
The most general and accurate method to solve under- or over-determined linear systems
in the least squares sense is the SVD decomposition. Eigen provides two implementations.
The recommended one is the BDCSVD class, which scales well for large problems
and automatically falls back to the JacobiSVD class for smaller problems.
For both classes, their solve() method solves the linear system in the least-squares
sense.
Here is an example:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
<tr>
<td>\include TutorialLinAlgSVDSolve.cpp </td>
<td>\verbinclude TutorialLinAlgSVDSolve.out </td>
</tr>
</table>
An alternative to the SVD, which is usually faster and about as accurate, is CompleteOrthogonalDecomposition.
Again, if you know more about the problem, the table above contains methods that are potentially faster.
If your matrix is full rank, HouseholderQR is the method of choice. If your matrix is full rank and well conditioned,
using the Cholesky decomposition (LLT) on the matrix of the normal equations can be faster still.
Our page on \link LeastSquares least squares solving \endlink has more details.
\section TutorialLinAlgSolutionExists Checking if a matrix is singular
Only you know what error margin you want to allow for a solution to be considered valid.
So Eigen lets you do this computation for yourself, if you want to, as in this example:
@@ -179,11 +208,11 @@ very rare. The call to info() is to check for this possibility.
\section TutorialLinAlgInverse Computing inverse and determinant
First of all, make sure that you really want this. While inverse and determinant are fundamental mathematical concepts,
in \em numerical linear algebra they are not as popular as in pure mathematics. Inverse computations are often
in \em numerical linear algebra they are not as useful as in pure mathematics. Inverse computations are often
advantageously replaced by solve() operations, and the determinant is often \em not a good way of checking if a matrix
is invertible.
However, for \em very \em small matrices, the above is not true, and inverse and determinant can be very useful.
However, for \em very \em small matrices, the above may not be true, and inverse and determinant can be very useful.
While certain decompositions, such as PartialPivLU and FullPivLU, offer inverse() and determinant() methods, you can also
call inverse() and determinant() directly on a matrix. If your matrix is of a very small fixed size (at most 4x4) this
@@ -198,28 +227,6 @@ Here is an example:
</tr>
</table>
\section TutorialLinAlgLeastsquares Least squares solving
The most accurate method to do least squares solving is with a SVD decomposition.
Eigen provides two implementations.
The recommended one is the BDCSVD class, which scale well for large problems
and automatically fall-back to the JacobiSVD class for smaller problems.
For both classes, their solve() method is doing least-squares solving.
Here is an example:
<table class="example">
<tr><th>Example:</th><th>Output:</th></tr>
<tr>
<td>\include TutorialLinAlgSVDSolve.cpp </td>
<td>\verbinclude TutorialLinAlgSVDSolve.out </td>
</tr>
</table>
Another methods, potentially faster but less reliable, are to use a Cholesky decomposition of the
normal matrix or a QR decomposition. Our page on \link LeastSquares least squares solving \endlink
has more details.
\section TutorialLinAlgSeparateComputation Separating the computation from the construction
In the above examples, the decomposition was computed at the same time that the decomposition object was constructed.

View File

@@ -60,7 +60,7 @@ On the other hand, inserting elements with increasing inner indices in a given i
The case where no empty space is available is a special case, and is referred to as the \em compressed mode.
It corresponds to the widely used Compressed Column (or Row) Storage schemes (CCS or CRS).
Any SparseMatrix can be turned to this form by calling the SparseMatrix::makeCompressed() function.
In this case, one can remark that the \c InnerNNZs array is redundant with \c OuterStarts because we the equality: \c InnerNNZs[j] = \c OuterStarts[j+1]-\c OuterStarts[j].
In this case, one can remark that the \c InnerNNZs array is redundant with \c OuterStarts because we have the equality: \c InnerNNZs[j] = \c OuterStarts[j+1]-\c OuterStarts[j].
Therefore, in practice a call to SparseMatrix::makeCompressed() frees this buffer.
It is worth noting that most of our wrappers to external libraries require compressed matrices as inputs.

View File

@@ -1,4 +1,4 @@
MatrixXcf ones = MatrixXcf::Ones(3,3);
ComplexEigenSolver<MatrixXcf> ces(ones);
cout << "The first eigenvector of the 3x3 matrix of ones is:"
<< endl << ces.eigenvectors().col(1) << endl;
<< endl << ces.eigenvectors().col(0) << endl;

View File

@@ -1,4 +1,4 @@
MatrixXd ones = MatrixXd::Ones(3,3);
SelfAdjointEigenSolver<MatrixXd> es(ones);
cout << "The first eigenvector of the 3x3 matrix of ones is:"
<< endl << es.eigenvectors().col(1) << endl;
<< endl << es.eigenvectors().col(0) << endl;

View File

@@ -4,7 +4,8 @@ cout << "Here is a random symmetric 5x5 matrix:" << endl << A << endl << endl;
VectorXd diag(5);
VectorXd subdiag(4);
internal::tridiagonalization_inplace(A, diag, subdiag, true);
VectorXd hcoeffs(4); // Scratch space for householder reflector.
internal::tridiagonalization_inplace(A, diag, subdiag, hcoeffs, true);
cout << "The orthogonal matrix Q is:" << endl << A << endl;
cout << "The diagonal of the tridiagonal matrix T is:" << endl << diag << endl;
cout << "The subdiagonal of the tridiagonal matrix T is:" << endl << subdiag << endl;

View File

@@ -88,25 +88,29 @@ endif()
endif()
set(EIGEN_LAPACK_TARGETS "")
add_library(eigen_lapack_static ${EigenLapack_SRCS} ${ReferenceLapack_SRCS})
add_library(eigen_lapack SHARED ${EigenLapack_SRCS})
list(APPEND EIGEN_LAPACK_TARGETS eigen_lapack_static)
target_link_libraries(eigen_lapack eigen_blas)
if(EIGEN_STANDARD_LIBRARIES_TO_LINK_TO)
target_link_libraries(eigen_lapack_static ${EIGEN_STANDARD_LIBRARIES_TO_LINK_TO})
target_link_libraries(eigen_lapack ${EIGEN_STANDARD_LIBRARIES_TO_LINK_TO})
if (EIGEN_BUILD_SHARED_LIBS)
add_library(eigen_lapack SHARED ${EigenLapack_SRCS})
list(APPEND EIGEN_LAPACK_TARGETS eigen_lapack)
target_link_libraries(eigen_lapack eigen_blas)
endif()
add_dependencies(lapack eigen_lapack eigen_lapack_static)
foreach(target IN LISTS EIGEN_LAPACK_TARGETS)
if(EIGEN_STANDARD_LIBRARIES_TO_LINK_TO)
target_link_libraries(${target} ${EIGEN_STANDARD_LIBRARIES_TO_LINK_TO})
endif()
add_dependencies(lapack ${target})
install(TARGETS ${target}
RUNTIME DESTINATION bin
LIBRARY DESTINATION lib
ARCHIVE DESTINATION lib)
endforeach()
install(TARGETS eigen_lapack eigen_lapack_static
RUNTIME DESTINATION bin
LIBRARY DESTINATION lib
ARCHIVE DESTINATION lib)
get_filename_component(eigen_full_path_to_testing_lapack "./testing/" ABSOLUTE)
if(EXISTS ${eigen_full_path_to_testing_lapack})

View File

@@ -126,7 +126,7 @@ template<>
struct NumTraits<AnnoyingScalar> : NumTraits<float>
{
enum {
RequireInitialization = true
RequireInitialization = 1,
};
typedef AnnoyingScalar Real;
typedef AnnoyingScalar Nested;
@@ -145,10 +145,6 @@ bool (isfinite)(const AnnoyingScalar& x) {
}
namespace internal {
template<> EIGEN_STRONG_INLINE AnnoyingScalar pcmp_eq(const AnnoyingScalar& a, const AnnoyingScalar& b)
{ return AnnoyingScalar(pcmp_eq(*a.v, *b.v)); }
template<> EIGEN_STRONG_INLINE AnnoyingScalar pselect(const AnnoyingScalar& mask, const AnnoyingScalar& a, const AnnoyingScalar& b)
{ return numext::equal_strict(*mask.v, 0.f) ? b : a; }
template<> EIGEN_STRONG_INLINE double cast(const AnnoyingScalar& x) { return double(*x.v); }
template<> EIGEN_STRONG_INLINE float cast(const AnnoyingScalar& x) { return *x.v; }
}

View File

@@ -164,7 +164,6 @@ ei_add_test(nullary)
ei_add_test(mixingtypes)
ei_add_test(io)
ei_add_test(packetmath "-DEIGEN_FAST_MATH=1")
ei_add_test(unalignedassert)
ei_add_test(vectorization_logic)
ei_add_test(basicstuff)
ei_add_test(constructor)

test/SafeScalar.h

@@ -0,0 +1,30 @@
// A Scalar that asserts for uninitialized access.
template<typename T>
class SafeScalar {
public:
SafeScalar() : initialized_(false) {}
SafeScalar(const SafeScalar& other) {
*this = other;
}
SafeScalar& operator=(const SafeScalar& other) {
val_ = T(other);
initialized_ = true;
return *this;
}
SafeScalar(T val) : val_(val), initialized_(true) {}
SafeScalar& operator=(T val) {
val_ = val;
initialized_ = true;
return *this;  // A non-void function must return a value.
}
operator T() const {
VERIFY(initialized_ && "Uninitialized access.");
return val_;
}
private:
T val_;
bool initialized_;
};
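The idea behind this helper can be tried outside the test suite with a small stand-alone variant. The sketch below is an assumption-laden illustration, not the file above: `CheckedScalar` and `readOfUninitializedThrows` are hypothetical names, and the check throws `std::logic_error` instead of using the test framework's `VERIFY` macro, so that the behavior can be observed without aborting. It also relies on the compiler-generated copy operations, so unlike the class above it does not flag copying from an uninitialized value.

```cpp
#include <stdexcept>

// Hypothetical stand-alone variant: a scalar wrapper that rejects reads
// before the first write.
template <typename T>
class CheckedScalar {
 public:
  CheckedScalar() : val_(), initialized_(false) {}
  CheckedScalar(T val) : val_(val), initialized_(true) {}

  CheckedScalar& operator=(T val) {
    val_ = val;
    initialized_ = true;
    return *this;
  }

  // Reading the value is only allowed once something has been stored.
  operator T() const {
    if (!initialized_) {
      throw std::logic_error("Uninitialized access.");
    }
    return val_;
  }

 private:
  T val_;
  bool initialized_;
};

// Returns true if reading a default-constructed value is rejected.
inline bool readOfUninitializedThrows() {
  CheckedScalar<float> x;
  try {
    float v = x;  // Triggers the conversion operator.
    (void)v;
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

A value that has been assigned, for example `CheckedScalar<float> y; y = 2.5f;`, converts back to `float` without error.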

View File

@@ -626,6 +626,41 @@ template<typename ArrayType> void min_max(const ArrayType& m)
}
}
template<int N>
struct shift_left {
template<typename Scalar>
Scalar operator()(const Scalar& v) const {
return v << N;
}
};
template<int N>
struct arithmetic_shift_right {
template<typename Scalar>
Scalar operator()(const Scalar& v) const {
return v >> N;
}
};
template<typename ArrayType> void array_integer(const ArrayType& m)
{
Index rows = m.rows();
Index cols = m.cols();
ArrayType m1 = ArrayType::Random(rows, cols),
m2(rows, cols);
m2 = m1.template shiftLeft<2>();
VERIFY( (m2 == m1.unaryExpr(shift_left<2>())).all() );
m2 = m1.template shiftLeft<9>();
VERIFY( (m2 == m1.unaryExpr(shift_left<9>())).all() );
m2 = m1.template shiftRight<2>();
VERIFY( (m2 == m1.unaryExpr(arithmetic_shift_right<2>())).all() );
m2 = m1.template shiftRight<9>();
VERIFY( (m2 == m1.unaryExpr(arithmetic_shift_right<9>())).all() );
}
EIGEN_DECLARE_TEST(array_cwise)
{
for(int i = 0; i < g_repeat; i++) {
@@ -636,6 +671,8 @@ EIGEN_DECLARE_TEST(array_cwise)
CALL_SUBTEST_5( array(ArrayXXf(internal::random<int>(1,EIGEN_TEST_MAX_SIZE), internal::random<int>(1,EIGEN_TEST_MAX_SIZE))) );
CALL_SUBTEST_6( array(ArrayXXi(internal::random<int>(1,EIGEN_TEST_MAX_SIZE), internal::random<int>(1,EIGEN_TEST_MAX_SIZE))) );
CALL_SUBTEST_6( array(Array<Index,Dynamic,Dynamic>(internal::random<int>(1,EIGEN_TEST_MAX_SIZE), internal::random<int>(1,EIGEN_TEST_MAX_SIZE))) );
CALL_SUBTEST_6( array_integer(ArrayXXi(internal::random<int>(1,EIGEN_TEST_MAX_SIZE), internal::random<int>(1,EIGEN_TEST_MAX_SIZE))) );
CALL_SUBTEST_6( array_integer(Array<Index,Dynamic,Dynamic>(internal::random<int>(1,EIGEN_TEST_MAX_SIZE), internal::random<int>(1,EIGEN_TEST_MAX_SIZE))) );
}
for(int i = 0; i < g_repeat; i++) {
CALL_SUBTEST_1( comparisons(Array<float, 1, 1>()) );
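The reference functors used by `array_integer` above can be exercised without Eigen to see what the test compares against. The following is a minimal sketch using only the standard library; `ShiftLeftBy`, `ShiftRightBy` and `mapValues` are hypothetical names for this illustration. It sticks to non-negative inputs, since shifting negative signed integers has implementation-defined or undefined behavior before C++20.

```cpp
#include <algorithm>
#include <vector>

// Functor that shifts its argument left by a compile-time amount N.
template <int N>
struct ShiftLeftBy {
  template <typename Scalar>
  Scalar operator()(const Scalar& v) const {
    return v << N;
  }
};

// Functor that shifts its argument right by a compile-time amount N.
template <int N>
struct ShiftRightBy {
  template <typename Scalar>
  Scalar operator()(const Scalar& v) const {
    return v >> N;
  }
};

// Applies op to every element, playing the role of unaryExpr() in the test.
template <typename Op>
std::vector<int> mapValues(const std::vector<int>& in, Op op) {
  std::vector<int> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(), op);
  return out;
}
```

For example, `mapValues({1, 5, 12}, ShiftLeftBy<2>())` multiplies each element by four, and `mapValues({8, 20, 1023}, ShiftRightBy<2>())` divides each by four, rounding down.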

View File

@@ -32,18 +32,6 @@ float BinaryToFloat(uint32_t sign, uint32_t exponent, uint32_t high_mantissa,
return dest;
}
void test_truncate(float input, float expected_truncation, float expected_rounding){
bfloat16 truncated = Eigen::bfloat16_impl::truncate_to_bfloat16(input);
bfloat16 rounded = Eigen::bfloat16_impl::float_to_bfloat16_rtne<false>(input);
if ((numext::isnan)(input)){
VERIFY((numext::isnan)(static_cast<float>(truncated)) || (numext::isinf)(static_cast<float>(truncated)));
VERIFY((numext::isnan)(static_cast<float>(rounded)) || (numext::isinf)(static_cast<float>(rounded)));
return;
}
VERIFY_IS_EQUAL(expected_truncation, static_cast<float>(truncated));
VERIFY_IS_EQUAL(expected_rounding, static_cast<float>(rounded));
}
template<typename T>
void test_roundtrip() {
// Representable T round trip via bfloat16
@@ -122,31 +110,6 @@ void test_conversion()
VERIFY_BFLOAT16_BITS_EQUAL(bfloat16(0.0f), 0x0000);
VERIFY_BFLOAT16_BITS_EQUAL(bfloat16(-0.0f), 0x8000);
// Flush denormals to zero
for (float denorm = -std::numeric_limits<float>::denorm_min();
denorm < std::numeric_limits<float>::denorm_min();
denorm = nextafterf(denorm, 1.0f)) {
bfloat16 bf_trunc = Eigen::bfloat16_impl::truncate_to_bfloat16(denorm);
VERIFY_IS_EQUAL(static_cast<float>(bf_trunc), 0.0f);
// Implicit conversion of denormls to bool is correct
VERIFY_IS_EQUAL(static_cast<bool>(bfloat16(denorm)), false);
VERIFY_IS_EQUAL(bfloat16(denorm), false);
if (std::signbit(denorm)) {
VERIFY_BFLOAT16_BITS_EQUAL(bf_trunc, 0x8000);
} else {
VERIFY_BFLOAT16_BITS_EQUAL(bf_trunc, 0x0000);
}
bfloat16 bf_round = Eigen::bfloat16_impl::float_to_bfloat16_rtne<false>(denorm);
VERIFY_IS_EQUAL(static_cast<float>(bf_round), 0.0f);
if (std::signbit(denorm)) {
VERIFY_BFLOAT16_BITS_EQUAL(bf_round, 0x8000);
} else {
VERIFY_BFLOAT16_BITS_EQUAL(bf_round, 0x0000);
}
}
// Default is zero
VERIFY_IS_EQUAL(static_cast<float>(bfloat16()), 0.0f);
@@ -156,52 +119,6 @@ void test_conversion()
test_roundtrip<std::complex<float> >();
test_roundtrip<std::complex<double> >();
// Truncate test
test_truncate(
BinaryToFloat(0, 0x80, 0x48, 0xf5c3),
BinaryToFloat(0, 0x80, 0x48, 0x0000),
BinaryToFloat(0, 0x80, 0x49, 0x0000));
test_truncate(
BinaryToFloat(1, 0x80, 0x48, 0xf5c3),
BinaryToFloat(1, 0x80, 0x48, 0x0000),
BinaryToFloat(1, 0x80, 0x49, 0x0000));
test_truncate(
BinaryToFloat(0, 0x80, 0x48, 0x8000),
BinaryToFloat(0, 0x80, 0x48, 0x0000),
BinaryToFloat(0, 0x80, 0x48, 0x0000));
test_truncate(
BinaryToFloat(0, 0xff, 0x00, 0x0001),
BinaryToFloat(0, 0xff, 0x40, 0x0000),
BinaryToFloat(0, 0xff, 0x40, 0x0000));
test_truncate(
BinaryToFloat(0, 0xff, 0x7f, 0xffff),
BinaryToFloat(0, 0xff, 0x40, 0x0000),
BinaryToFloat(0, 0xff, 0x40, 0x0000));
test_truncate(
BinaryToFloat(1, 0x80, 0x48, 0xc000),
BinaryToFloat(1, 0x80, 0x48, 0x0000),
BinaryToFloat(1, 0x80, 0x49, 0x0000));
test_truncate(
BinaryToFloat(0, 0x80, 0x48, 0x0000),
BinaryToFloat(0, 0x80, 0x48, 0x0000),
BinaryToFloat(0, 0x80, 0x48, 0x0000));
test_truncate(
BinaryToFloat(0, 0x80, 0x48, 0x4000),
BinaryToFloat(0, 0x80, 0x48, 0x0000),
BinaryToFloat(0, 0x80, 0x48, 0x0000));
test_truncate(
BinaryToFloat(0, 0x80, 0x48, 0x8000),
BinaryToFloat(0, 0x80, 0x48, 0x0000),
BinaryToFloat(0, 0x80, 0x48, 0x0000));
test_truncate(
BinaryToFloat(0, 0x00, 0x48, 0x8000),
BinaryToFloat(0, 0x00, 0x00, 0x0000),
BinaryToFloat(0, 0x00, 0x00, 0x0000));
test_truncate(
BinaryToFloat(0, 0x00, 0x7f, 0xc000),
BinaryToFloat(0, 0x00, 0x00, 0x0000),
BinaryToFloat(0, 0x00, 0x00, 0x0000));
// Conversion
Array<float,1,100> a;
for (int i = 0; i < 100; i++) a(i) = i + 1.25;
@@ -250,12 +167,6 @@ void test_conversion()
VERIFY_BFLOAT16_BITS_EQUAL(bfloat16(BinaryToFloat(0x0, 0xff, 0x40, 0x0)), 0x7fc0);
VERIFY_BFLOAT16_BITS_EQUAL(bfloat16(BinaryToFloat(0x1, 0xff, 0x40, 0x0)), 0xffc0);
VERIFY_BFLOAT16_BITS_EQUAL(Eigen::bfloat16_impl::truncate_to_bfloat16(
BinaryToFloat(0x0, 0xff, 0x40, 0x0)),
0x7fc0);
VERIFY_BFLOAT16_BITS_EQUAL(Eigen::bfloat16_impl::truncate_to_bfloat16(
BinaryToFloat(0x1, 0xff, 0x40, 0x0)),
0xffc0);
}
void test_numtraits()

View File

@@ -115,9 +115,11 @@ template<int> void noncopyable()
{
typedef Eigen::Matrix<AnnoyingScalar,Dynamic,1> VectorType;
typedef Eigen::Matrix<AnnoyingScalar,Dynamic,Dynamic> MatrixType;
{
#ifndef EIGEN_TEST_ANNOYING_SCALAR_DONT_THROW
AnnoyingScalar::dont_throw = true;
#endif
int n = 50;
VectorType v0(n), v1(n);
MatrixType m0(n,n), m1(n,n), m2(n,n);
@@ -148,6 +150,7 @@ EIGEN_DECLARE_TEST(conservative_resize)
CALL_SUBTEST_4((run_matrix_tests<std::complex<float>, Eigen::ColMajor>()));
CALL_SUBTEST_5((run_matrix_tests<std::complex<double>, Eigen::RowMajor>()));
CALL_SUBTEST_5((run_matrix_tests<std::complex<double>, Eigen::ColMajor>()));
CALL_SUBTEST_1((run_matrix_tests<int, Eigen::RowMajor | Eigen::DontAlign>()));
CALL_SUBTEST_1((run_vector_tests<int>()));
CALL_SUBTEST_2((run_vector_tests<float>()));
@@ -155,7 +158,9 @@ EIGEN_DECLARE_TEST(conservative_resize)
CALL_SUBTEST_4((run_vector_tests<std::complex<float> >()));
CALL_SUBTEST_5((run_vector_tests<std::complex<double> >()));
#ifndef EIGEN_TEST_ANNOYING_SCALAR_DONT_THROW
AnnoyingScalar::dont_throw = true;
#endif
CALL_SUBTEST_6(( run_vector_tests<AnnoyingScalar>() ));
CALL_SUBTEST_6(( noncopyable<0>() ));
}
