Compare commits

...

915 Commits

Author SHA1 Message Date
Antonio Sanchez
28ded8800c Fix input type for min/max.
We're missing an input base class specifier for the default `NanPropagation=PropagateFast`
case for consistency.

Fixes #2990.
2025-10-20 12:26:11 -07:00
Antonio Sánchez
0295f81a83 Make eigen_packet_wrapper trivial for c++11. 2025-10-02 17:53:19 +00:00
Antonio Sanchez
d71c30c478 Fix docs build job. 2025-09-29 16:25:14 -07:00
Antonio Sánchez
79488684e1 Extend the range of supported CMake package config versions
Modified to be backward-compatible with Eigen 3.4.0, in that the following
will still accept 3.4.1:
```
find_package(Eigen3 3.3)
```

(cherry picked from commit 027dc5bc8d)
2025-09-29 10:47:01 -07:00
Antonio Sanchez
b66188b5df Run smoketests on small runners. 2025-09-23 13:22:29 -07:00
Antonio Sanchez
5c81034fc1 Run pipeline on merge requests 2025-09-12 14:14:52 -07:00
Antonio Sanchez
c3f9707824 Move more jobs to gitlab runners.
(cherry picked from commit 5d4485e767)
2025-08-29 11:34:35 -07:00
Antonio Sánchez
4fc1cfeda5 Move GPU ci jobs to gitlab-hosted runners.
(cherry picked from commit 52f570a409)
2025-08-28 11:29:30 -07:00
Alexander Vieth
cd7263e7f6 Restore EIGEN_INCLUDE_DIR in CMake again (for 3.4.x) 2025-08-18 19:53:08 +00:00
Alexander Vieth
eb57d4bdf1 Fix compilation with clang and c++03 on ARM 2025-08-17 16:00:53 +00:00
Antonio Sanchez
4a2c4901ce Update CI configuration from master. 2025-07-07 14:26:40 -07:00
Antonio Sanchez
68f4e58cfa Don't expire docs pages job 2025-02-24 08:24:04 -08:00
Antonio Sanchez
0e607fd350 Fix c++03 build and tests 2025-02-18 10:41:23 -08:00
C. Antonio Sanchez
13507d1efd Remove nightly tag deploy on non-default branches 2025-02-17 18:48:09 -08:00
C. Antonio Sanchez
85ffda9539 Fix arm32 packetmath tests 2025-02-17 17:49:08 -08:00
Charles Schlosser
72f77ccb3e Fix arm32 float division and related bugs
(cherry picked from commit 81b48065ea)
2025-02-17 14:24:36 -08:00
Antonio Sanchez
526a6328e2 Default eigen_packet_wrapper constructor.
This makes it trivial, allowing use of `memcpy`.

Fixes #2326

(cherry picked from commit cb50730993)
2025-02-17 13:00:15 -08:00
C. Antonio Sanchez
7b378c2d91 Fix cherry-pick bug for NEON make_packet 2025-02-17 12:59:57 -08:00
Antonio Sánchez
129e003cdf Disable FP16 arithmetic for arm32.
(cherry picked from commit 7465b7651e)
2025-02-17 12:18:51 -08:00
Antonio Sánchez
6161ce5cde Fix arm builds.
(cherry picked from commit 2c8011c2dd)
2025-02-17 12:18:02 -08:00
Antonio Sánchez
be62728876 More NEON packetmath fixes.
(cherry picked from commit 384269937f)
2025-02-17 12:16:09 -08:00
Antonio Sánchez
1426855b68 Fix NEON make_packet2f.
(cherry picked from commit 2dfbf1b251)
2025-02-17 12:15:51 -08:00
Antonio Sánchez
b2deb94e4a Fix MSVC arm build.
(cherry picked from commit 0a5392d606)
2025-02-17 12:15:28 -08:00
Antonio Sánchez
c23abcf25c Fix arm32 issues.
(cherry picked from commit a73970a864)
2025-02-17 12:11:20 -08:00
Antonio Sánchez
f23b8c0d78 Fix more hard-coded magic bounds.
(cherry picked from commit ae5280aa8d)
2025-02-17 07:28:41 -08:00
Antonio Sánchez
d60c3a3341 Slightly adjust error bound for nonlinear tests.
(cherry picked from commit 42aa3d17cd)
2025-02-17 07:28:21 -08:00
C. Antonio Sanchez
57c8d7c93f Fix failing builds and update CI on push.
Specifically:
- Fixed ctz on 32-bit arm (where `uint64_t` is `unsigned long long`)
- Fixed build of random_cpp11 snippet when C++11 is disabled
- Updated CI scripts to run windows on push, and added a no-c++11 test
2025-02-17 07:20:12 -08:00
C. Antonio Sanchez
ab92609cad Add missing ci scripts 2025-02-16 15:06:16 -08:00
C. Antonio Sanchez
551e95a409 Run pipelines on push 2025-02-16 14:50:58 -08:00
C. Antonio Sanchez
2924f58188 Remove deprecated check in meta test 2025-02-16 14:42:15 -08:00
C. Antonio Sanchez
f1922b6dac Update cmake configuration from master 2025-02-16 14:41:46 -08:00
C. Antonio Sanchez
052d91349a Split bdcsvd tests 2025-02-16 14:41:42 -08:00
Antonio Sánchez
72e38684c1 Disable deprecated warnings for SVD tests on MSVC.
(cherry picked from commit d58e629130)
2025-02-16 14:41:34 -08:00
Antonio Sánchez
bb1dbb4df6 Disable deprecated warnings in SVD tests.
(cherry picked from commit f0b81fefb7)
2025-02-16 14:41:32 -08:00
C. Antonio Sanchez
0a5abc042e Copy CI configuration from master
(minus loongarch)
2025-02-16 08:00:49 -08:00
C. Antonio Sanchez
42d9cc0b1d Fix Tensor docs 2025-02-15 22:46:57 -08:00
C. Antonio Sanchez
7312765992 Fix all doxygen warnings. 2025-02-15 21:10:48 -08:00
C. Antonio Sanchez
88cd53774e Fix altivec and vsx builds 2025-02-15 13:18:14 -08:00
Chip Kerchner
c0378fedd8 Fix non-VSX PowerPC build
(cherry picked from commit 9e0afe0f02)
2025-02-15 10:14:40 -08:00
Chip Kerchner
414f0a1756 Fix pre-POWER8_VECTOR bugs in pcmp_lt and pnegate and reactivate psqrt.
(cherry picked from commit 4a58f30aa0)
2025-02-15 09:17:56 -08:00
Sergey Fedorov
3adc78e39c Altivec fixes for Darwin: do not use unsupported VSX insns
(cherry picked from commit 4d05765345)
2025-02-11 20:55:29 -08:00
Morris Hafner
e67c494cba Use old syntax for CMake's separate_arguments() to restore compatiblity with old CMake versions. 2024-11-13 17:01:13 +00:00
Morris Hafner
3e7bcf54f7 cherry-pick !1682 Add nvc++ support into 3.4 2024-11-04 17:55:47 +00:00
Antonio Sánchez
9df21dc8b4 Work around VS2015 compile bug. 2024-03-15 18:07:02 +00:00
Antonio Sánchez
157756130a Restore C++03 compatibility. 2024-03-15 17:55:04 +00:00
Antonio Sanchez
7893285e59 Fix tridiagonalize snippet for 3.4.
Fixes #2770.
2024-03-12 22:04:46 -07:00
Antonio Sanchez
3ee06ec52f Fix real schur and polynomial solver.
For certain inputs, the real schur decomposition might get stuck in a cycle.
Exceptional shifts are supposed to knock us out of that - but previously
they were only ever applied at iteration 10 and 30, which doesn't help if
the cycle starts after cycle 30.  Modified to apply a shift every 16 iterations
(for reference, LAPACK seems to do it every 6 iterations).

Also added an assert in polynomial solver to verify that the schur decomposition
was successful.

Fixes #2633.
2024-02-16 13:11:54 -08:00
Antonio Sánchez
287c801780 Use stableNorm in ComplexEigenSolver.
(cherry picked from commit 0f0c76dc29)
2024-01-30 08:39:35 -08:00
Antonio Sanchez
42b04a08c4 Fix preshear transformation.
Fixes #2777.  The `preshear` function seems to have always used an invalid constructor
internally, and has been broken for a while.  Fixed the implementation and added a test.

(cherry picked from commit 45da84e21570bf70238cf489ad862b2f09242c5f)
2024-01-29 12:30:06 -08:00
Rasmus Munk Larsen
b86ac5f1e7 Use padd instead of +.
(cherry picked from commit bbfc4d54cd)
2024-01-29 10:50:55 -08:00
Rasmus Munk Larsen
380a9483e0 Implement a generic vectorized version of Smith's algorithms for complex division.
(cherry picked from commit 9312a5bf5c)
2024-01-26 20:42:52 -08:00
Charles Schlosser
25270e35db Fix compiler warnings in 3.4 2023-12-21 00:57:21 +00:00
Antonio Sanchez
ebf968b272 Remove c++11 from ctz/clz 2023-12-20 14:18:48 -08:00
Charles Schlosser
bd57b99f44 fix msvc clz
(cherry picked from commit 2c4541f735)
2023-12-14 13:38:18 -08:00
Antonio Sánchez
b8f894947a Add internal ctz/clz implementation.
(cherry picked from commit 75e273afcc)
2023-12-14 13:37:03 -08:00
Antonio Sanchez
4be2870267 Only apply ASM work-around for min/max on GNUC strict.
Fixes #2742.
2023-11-27 10:08:18 -08:00
Charles Schlosser
f7085a1096 replace using with typedef 2023-11-24 19:42:54 +00:00
Charles Schlosser
63291e34bf Update file GeneralMatrixVector.h
(cherry picked from commit 283dec7f25)
2023-11-24 19:07:33 +00:00
Charles Schlosser
23886fd7db Gemv microoptimization
(cherry picked from commit d1b03fb5c9)
2023-11-24 19:07:17 +00:00
Charles Schlosser
7c6020e424 Fix -Waggressive-loop-optimizations
(cherry picked from commit 4e9e493b4a)
2023-11-24 19:06:40 +00:00
arthurfeeney
2e3f1d8044 Fix implicit conversion warning in GEBP kernel's packing
(cherry picked from commit 937c3d73cb)
2023-11-18 18:17:21 +00:00
Silvio Traversaro
fc5575264f Backport "disambiguate overloads for empty index list" to 3.4 branch 2023-11-10 04:03:11 +00:00
Antonio Sanchez
bae907b8f6 Update version to 3.4.1
Tests all pass: https://gitlab.com/libeigen/eigen_ci_cross_testing/-/pipelines/1060764169
2023-11-06 13:53:54 -08:00
Charles Schlosser
cf207eacd5 Patch SparseLU
(cherry picked from commit a8bab0d8ae)
2023-11-02 21:17:17 -07:00
Chip Kerchner
e734787bb7 Fix pre-POWER8_VECTOR bugs in pcmp_lt and pnegate and reactivate psqrt.
(cherry picked from commit 4a58f30aa0)
2023-10-25 15:19:37 -07:00
Antonio Sanchez
1217390db4 Fix windows+CUDA builds 2023-10-25 20:55:59 +00:00
Antonio Sanchez
7176ae1623 Make 3.4.1 compatible with c++03 2023-10-16 15:38:25 -07:00
Antonio Sánchez
0db5928f00 Eliminate use of _res.
(cherry picked from commit 5bdf58b8df)
2023-10-16 13:38:17 -07:00
Erik Schultheis
764b132a79 ensure that eigen::internal::size is not found by ADL, rename to ssize and...
(cherry picked from commit 9210e71fb3)
2023-08-24 12:42:34 -07:00
Fabian Keßler
d0bfdc1658 optimize cmake scripts for subproject use
(cherry picked from commit 19cacd3ecb)
2023-07-26 12:01:28 -07:00
Antonio Sánchez
75ebef26b6 Adds new CMake Options for controlling build components.
(cherry picked from commit cf82186416)
2023-07-26 11:52:47 -07:00
Charles Schlosser
208e44c979 fix warnings in tensorreduction and memory 2023-07-19 16:48:07 +00:00
Antonio Sánchez
17d57fb168 Fix up PowerPC MMA flags so it builds by default.
(cherry picked from commit 591906477b)
2023-07-11 16:27:32 -07:00
Antonio Sánchez
6973687c70 Fix up PowerPC MMA flags so it builds by default.
(cherry picked from commit 65eeedf964)
2023-07-11 16:20:57 -07:00
Antonio Sanchez
ac561cd038 Reduce tensor_contract_gpu test.
The original test times out after 60 minutes on Windows, even when
setting flags to optimize for speed.  Reducing the number of
contractions performed from 3600->27 for subtests 8,9 allow the
two to run in just over a minute each.

(cherry picked from commit be9e7d205f)
2023-07-11 11:27:31 -07:00
Antonio Sanchez
554982beef Disable Tree reduction for GPU.
For moderately sized inputs, running the Tree reduction quickly
fills/overflows the GPU thread stack space, leading to memory errors.
This was happening in the `cxx11_tensor_complex_gpu` test, for example.
Disabling tree reduction on GPU fixes this.

(cherry picked from commit 24ebb37f38)
2023-07-10 16:09:30 -07:00
Antonio Sanchez
89a71f3126 Fix gpu special function tests.
Some checks used incorrect values, partly from copy-paste errors,
partly from the change in behaviour introduced in !398.

Modified results to match scipy, simplified tests by updating
`VERIFY_IS_CWISE_APPROX` to work for scalars.

(cherry picked from commit 701f5d1c91)
2023-07-10 15:57:08 -07:00
Antonio Sanchez
a605d6b996 Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS
Also add a missing space for clang.

(cherry picked from commit 846d34384a)
2023-07-10 15:30:41 -07:00
Antonio Sanchez
dfcd6de20a Clean up CUDA CMake files.
- Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt
- Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed
to the cuda compiler (nvcc or clang).

The latter is to support passing custom flags (e.g. `-arch=` to nvcc,
or to disable cuda-specific warnings).

(cherry picked from commit 7b00e8b186)
2023-07-10 15:30:41 -07:00
Antonio Sanchez
1ec1b16d36 Add buildtests_gpu and check_gpu to simplify GPU testing.
This is in preparation of adding GPU tests to the CI, allowing
us to limit building/testing of GPU-specific tests for a given
GPU-capable runner.

GPU tests are tagged with the label "gpu".  The new targets
```
make buildtests_gpu
make check_gpu
```
allow building and running only the gpu tests.

(cherry picked from commit 16f9a20a6f)
2023-07-10 15:30:41 -07:00
Antonio Sánchez
0f39c851a5 Fix use of arg function in CUDA.
(cherry picked from commit 63dcb429cd)
2023-07-10 15:30:41 -07:00
Kevin Leonardic
daa0b70a65 Fix argument for _mm256_cvtps_ph imm parameter
(cherry picked from commit d4b05454a7)
2023-07-10 15:30:41 -07:00
Antonio Sánchez
33ba98b641 Ensure EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC is always defined on arm.
(cherry picked from commit 31cd2ad371)
2023-07-10 15:30:41 -07:00
Antonio Sánchez
e6e921f0e3 Disable FP16 arithmetic for arm32.
(cherry picked from commit 7465b7651e)
2023-07-10 15:30:41 -07:00
Alexander Shaposhnikov
ebfdd6bdea Do not set EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC for cuda compilation
(cherry picked from commit 316eab8deb)
2023-07-10 15:30:41 -07:00
Alejandro Acosta
357bb11066 Replace usage of CudaStreamDevice with GpuStreamDevice in tensor benchmarks GPU
(cherry picked from commit 07e4604b19)
2023-07-10 15:30:40 -07:00
Rasmus Munk Larsen
9b3d104c02 Add missing braces in Umeyama.h
(cherry picked from commit 1321821e86)
2023-07-10 14:52:08 -07:00
Rasmus Munk Larsen
af3ca50f0b Work around compiler bug in Umeyama.h.
(cherry picked from commit 524c329ab2)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
26b8fabd80 Return NaN in ndtri for values outside valid input range.
(cherry picked from commit 1f79a6078f)
2023-07-10 14:52:08 -07:00
Charles Schlosser
385a0b38f8 JacobiSVD: set m_nonzeroSingularValues to zero if not finite
(cherry picked from commit fdc749de2a)
2023-07-10 14:52:08 -07:00
Antonio Sanchez
a4ecfd8ead Fix boolean bitwise and warning.
(cherry picked from commit 70410310a4)
2023-07-10 14:52:08 -07:00
Rasmus Munk Larsen
f296720d7d Make sure we return +/-1 above the clamping point for Erf().
(cherry picked from commit b378014fef)
2023-07-10 14:52:08 -07:00
Rob Conde
f04d02dbf6 exclude Eigen/Core and Eigen/src/Core from being ignored due to core ignore rule
(cherry picked from commit 990a282fc4)
2023-07-10 14:52:08 -07:00
Rohit Goswami
6f9bffe8dd DOC: Update documentation for 3.4.x
(cherry picked from commit b0eded878d)
2023-07-10 14:52:08 -07:00
Rasmus Munk Larsen
d4c24eca96 Don't crash on empty tensor contraction.
(cherry picked from commit b0f877f8e0)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
72b0759451 Fix arm builds.
(cherry picked from commit 2c8011c2dd)
2023-07-10 14:52:08 -07:00
Jonas Schulze
34d0d83278 Fix some typos
(cherry picked from commit 81cb6a51d0)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
63e8b31c94 Fix parsing of command-line arguments when already specified as a cmake list.
(cherry picked from commit 555cec17ed)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
99473f255b Fix failing MSVC tests due to compiler bugs.
(cherry picked from commit 394aabb0a3)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
2ce5dc428f Guard use of long double on GPU device.
(cherry picked from commit bc5cdc7a67)
2023-07-10 14:52:08 -07:00
Chip Kerchner
8f1b6198c2 Fix epsilon and dummy_precision values in long double for double doubles. Prevented some algorithms from converging on PPC.
(cherry picked from commit 54459214a1)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
dae8c6d7ad Guard complex sqrt on old MSVC compilers.
(cherry picked from commit a16fb889dd)
2023-07-10 14:52:07 -07:00
Antonio Sánchez
2dfdaa2abf More NEON packetmath fixes.
(cherry picked from commit 384269937f)
2023-07-10 14:52:03 -07:00
Antonio Sánchez
a659b5dbb2 Fix NEON make_packet2f.
(cherry picked from commit 2dfbf1b251)
2023-07-10 14:34:09 -07:00
Antonio Sánchez
879854382c Fix MSVC arm build.
(cherry picked from commit 0a5392d606)
2023-07-10 14:34:09 -07:00
Jeremy Nimmer
90dce8dfa3 Fix undefined behavior in Block access
(cherry picked from commit a1cdcdb038)
2023-07-10 14:34:09 -07:00
Martin Burchell
b26ada1e03 Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm
(cherry picked from commit c54785b071)
2023-07-10 14:34:09 -07:00
Antonio Sánchez
f5593b4baa Fix reshape strides when input has non-zero inner stride.
(cherry picked from commit 2260e11eb0)
2023-07-10 14:34:09 -07:00
Alexandre Hoffmann
3eb0c8b69e Changing BiCGSTAB parameters initialization so that it works with custom types
(cherry picked from commit 23524ab6fc)
2023-07-10 14:34:09 -07:00
Antonio Sánchez
26adb0e5af Fix sparseLU solver when destination has a non-unit stride.
(cherry picked from commit ab2b26fbc2)
2023-07-10 14:34:09 -07:00
Antonio Sánchez
5547205092 Correct pnegate for floating-point zero.
(cherry picked from commit 8588d8c74b)
2023-07-10 14:34:04 -07:00
Antonio Sánchez
771e91860b Fix typo in CholmodSupport
(cherry picked from commit 7dc6db75d4)
2023-07-10 12:26:39 -07:00
Antonio Sánchez
4786edba26 Fix pragma check for disabling fastmath.
(cherry picked from commit c27d1abe46)
2023-07-10 10:09:09 -07:00
Antonio Sánchez
15e23ab849 Explicitly state that indices must be sorted.
(cherry picked from commit bf48d46338)
2023-07-10 10:09:09 -07:00
Laurent Rineau
af6e7cc66a Eigen/Sparse: fix warnings -Wunused-but-set-variable
(cherry picked from commit 7846c7387c)
2023-07-10 10:09:09 -07:00
Rasmus Munk Larsen
3fbb1c1b48 Guard GCC-specific pragmas with "#ifdef EIGEN_COMP_GNUC"
(cherry picked from commit 5ceed0d57f)
2023-07-10 10:09:09 -07:00
Antonio Sánchez
28cd280726 Fix 4x4 inverse when compiling with -Ofast.
(cherry picked from commit 7d6a9925cc)
2023-07-10 10:09:09 -07:00
Antonio Sánchez
8cc3ec8e47 Fix realloc for non-trivial types.
(cherry picked from commit 311ba66f7c)
2023-07-10 10:09:02 -07:00
Gilles Aouizerate
d641062a05 fix typo in doc/TutorialSparse.dox
(cherry picked from commit 6e83e906c2)
2023-07-07 15:21:18 -07:00
Michael Palomas
1000cf9fbc fixed msvc compilation error in GeneralizedEigenSolver.h
(cherry picked from commit 525f066671)
2023-07-07 15:21:18 -07:00
Antonio Sánchez
fd2817e3d6 Add asserts for index-out-of-bounds in IndexedView.
(cherry picked from commit f241a2c18a)
2023-07-07 15:21:18 -07:00
Antonio Sánchez
11dacc4802 Fix some cmake issues.
(cherry picked from commit f5364331eb)
2023-07-07 15:21:18 -07:00
Antonio Sánchez
ab6f39e1e3 Fix mixingtypes tests.
(cherry picked from commit d816044b6e)
2023-07-07 15:21:18 -07:00
Gilles Aouizerate
6576ee4fb1 2 typos fix in the 3rd table.
(cherry picked from commit 94cc83faa1)
2023-07-07 15:21:18 -07:00
Arthur
68f35d76b8 Fix GeneralizedEigenSolver::info() and Asserts
(cherry picked from commit a7c1cac18b)
2023-07-07 15:21:18 -07:00
Matthew Sterrett
d0e2b3e58d Removed unnecessary checks for FP16C
(cherry picked from commit 39fcc89798)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
669dc8fadf Eliminate bool bitwise warnings.
(cherry picked from commit b8e93bf589)
2023-07-07 15:21:17 -07:00
Lexi Bromfield
33a602eb37 Don't double-define Half functions on aarch64
(cherry picked from commit 66ea0c09fd)
2023-07-07 15:21:17 -07:00
Rasmus Munk Larsen
a9490cd3c5 Fix code and unit test for a few corner cases in vectorized pow()
(cherry picked from commit 7a87ed1b6a)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
61efca2e90 Use numext::sqrt in ConjugateGradient.
(cherry picked from commit 7896c7dc6b)
2023-07-07 15:21:17 -07:00
Alexander Richardson
a5469a6f0f Avoid including <sstream> with EIGEN_NO_IO
(cherry picked from commit b7668c0371)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
6aaa45db5f Include immintrin.h header for enscripten.
(cherry picked from commit 34780d8bd1)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
ea57f9b78f AutoDiff depends on Core, so include appropriate header.
(cherry picked from commit e1165dbf9a)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
f55a112cb1 Fix ODR violations.
(cherry picked from commit bb51d9f4fa)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
a11bdf3965 Skip f16/bf16 bessel specializations on AVX512 if unavailable.
(cherry picked from commit 8ed3b9dcd6)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
b9ac284e52 Use numext::sqrt in Householder.h.
(cherry picked from commit 0e083b172e)
2023-07-07 15:21:17 -07:00
sfalmo
3df7d7fec9 Fix row vs column vector typo in Matrix class tutorial
(cherry picked from commit 9960a30422)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
80c5b8b3c3 Fix ambiguous comparisons for c++20 (again again)
(cherry picked from commit 8c2e0e3cb8)
2023-07-07 15:21:17 -07:00
Antonio Sanchez
848db4ed2d Fix BDCSVD condition for failing with numerical issue.
(cherry picked from commit 481a4a8c31)
2023-07-07 15:21:17 -07:00
Rohan Ghige
5cb7505a44 Fix 'Incorrect reference code in STL_interface.hh for ata_product' eigen/isses/2425
(cherry picked from commit 798fc1c577)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
af912a7b5c Fix MSVC+CUDA issues.
(cherry picked from commit 5ed7a86ae9)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
86d958e8f2 Consider inf/nan in scalar test_isApprox.
(cherry picked from commit 0c859cf35d)
2023-07-07 15:21:17 -07:00
Erik Schultheis
8e7bd5397f fixed order of arguments in blas syrk
(cherry picked from commit 1ddd3e29cb)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
8a21df2d9c Disable f16c scalar conversions for MSVC.
(cherry picked from commit 73b2c13bf2)
2023-07-07 15:21:12 -07:00
Antonio Sanchez
ac78f84b72 Eliminate trace unused warning.
(cherry picked from commit 9bc9992dd3)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
973b04f3e1 Fix AVX512 builds with MSVC.
(cherry picked from commit 9a14d91a99)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
16844d7529 Work around MSVC compiler bug dropping const.
(cherry picked from commit 3ca1228d45)
2023-07-07 15:06:18 -07:00
Tobias Schlüter
5cb2dfec1d Fix RowMajorBit <-> RowMajor mixup.
(cherry picked from commit 40eb34bc5d)
2023-07-07 15:06:18 -07:00
Øystein Sørensen
c473d69d22 Completed a missing parenthesis in tutorial.
(cherry picked from commit c062983464)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
6469fbf93a Work around g++-10 docker issue for geo_orthomethods_4.
(cherry picked from commit 9deaa19121)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
3a4a4e9fde Disable schur non-convergence test.
(cherry picked from commit 01b5bc48cc)
2023-07-07 15:06:18 -07:00
Arthur
fab848d4f7 Remove workarounds for bad GCC-4 warnings
(cherry picked from commit 514f90c9ff)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
b158fcaa74 Fix edge-case in zeta for large inputs.
(cherry picked from commit 9296bb4b93)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
b6d9b6f48d Remove duplicate IsRowMajor declaration.
(cherry picked from commit 0ae94456a0)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
b1f06aac61 Update vectorization_logic tests for all platforms.
(cherry picked from commit 27d8f29be3)
2023-07-07 15:06:08 -07:00
Antonio Sanchez
f6954e4485 Fix enum conversion warnings in BooleanRedux.
(cherry picked from commit 55c7400db5)
2023-07-07 11:51:10 -07:00
Antonio Sánchez
b30a2a527e Remove poor non-convergence checks in NonLinearOptimization.
(cherry picked from commit d819a33bf6)
2023-07-07 11:50:25 -07:00
Antonio Sanchez
bc1b354b32 Adjust tolerance of matrix_power test for MSVC.
(cherry picked from commit 1c2690ed24)
2023-07-07 11:50:02 -07:00
Yury Gitman
bd0d873b16 Fix any/all reduction in the case of row-major layout
(cherry picked from commit bf6726a0c6)
2023-07-07 11:48:49 -07:00
Antonio Sánchez
e0fe006915 Fix mixingtypes for g++-11.
(cherry picked from commit 19c39bea29)
2023-07-07 11:47:23 -07:00
Antonio Sánchez
d259104c8d Fix frexp packetmath tests for MSVC.
(cherry picked from commit 2ed4bee78f)
2023-07-07 11:47:09 -07:00
Antonio Sánchez
cd543434bf Fix gcc-5 packetmath_12 bug.
(cherry picked from commit 8970719771)
2023-07-07 11:46:15 -07:00
Antonio Sánchez
36be6747e0 Modify test expression to avoid numerical differences (#2402).
(cherry picked from commit ae86a146b1)
2023-07-07 11:45:56 -07:00
Martin Heistermann
d1ed3fe5c9 Fix for crash bug in SPQRSupport: Initialize pointers to nullptr to avoid free() calls of invalid pointers.
(cherry picked from commit 550af3938c)
2023-07-07 11:44:55 -07:00
Antonio Sanchez
21e0ad056e Fix ODR failures in TensorRandom.
(cherry picked from commit bded5028a5)
2023-07-07 11:43:03 -07:00
Antonio Sánchez
709d704819 Fix collision with resolve.h.
(cherry picked from commit 94bed2b80c)
2023-07-07 11:40:44 -07:00
Antonio Sánchez
995714142d Restrict GCC<6.3 maxpd workaround to only gcc.
(cherry picked from commit 4bffbe84f9)
2023-07-07 11:39:27 -07:00
Antonio Sánchez
730a781221 Define EIGEN_HAS_AVX512_MATH in PacketMath.
(cherry picked from commit e7f4a901ee)
2023-07-07 11:39:13 -07:00
Antonio Sánchez
77b2807322 Fix AVX512 math function consistency, enable for ICC.
(cherry picked from commit 96da541cba)
2023-07-07 11:37:49 -07:00
Antonio Sánchez
52e545324e Fix ODR violations.
(cherry picked from commit cafeadffef)
2023-07-07 11:37:31 -07:00
Stephen Pierce
0cd4719f3e Silence some MSVC warnings
(cherry picked from commit 81c928ba55)
2023-07-07 11:30:40 -07:00
Erik Schultheis
770ed0794e fix broken asserts
(cherry picked from commit 5a0a165c09)
2023-07-07 11:25:03 -07:00
Antonio Sánchez
e7248b26a1 Prevent BDCSVD crash caused by index out of bounds.
(cherry picked from commit 028ab12586)
2022-05-19 22:30:33 +00:00
Antonio Sanchez
a1e1612c28 Fix cwise NaN propagation for scalar input.
Was missing a template parameter.  Updated tests.

Fixes #2474.
2022-04-15 22:11:22 -07:00
Antonio Sánchez
f3aaba8705 Revert "Replace call to FixedDimensions() with a singleton instance of"
This reverts commit 19e6496ce0

(cherry picked from commit f7b31f864c)
2022-04-10 15:34:11 +00:00
Antonio Sánchez
34e5f34b39 Update warning suppression to latest. 2022-03-21 15:56:03 +00:00
Antonio Sánchez
4612627355 Revert "ensure that eigen::internal::size is not found by ADL, rename to ssize and..."
This reverts commit bd72e4a8c4
2022-01-18 16:08:59 +00:00
Antonio Sánchez
3e71c621c9 Revert "fix compilation issue with gcc < 10 and -std=c++2a"
This reverts commit b5d218d857
2022-01-18 16:08:37 +00:00
Jörg Buchwald
b5d218d857 fix compilation issue with gcc < 10 and -std=c++2a
(cherry picked from commit d1bf056394)
2022-01-13 01:43:43 +00:00
Erik Schultheis
bd72e4a8c4 ensure that eigen::internal::size is not found by ADL, rename to ssize and...
(cherry picked from commit 9210e71fb3)
2022-01-11 16:43:21 +00:00
David Tellenbach
3af8c262ac Include immintrin.h if F16C is available and vectorization is disabled
If EIGEN_DONT_VECTORIZE is defined, immintrin.h is not included even if F16C is available. Trying to use F16C intrinsics thus fails.

This fixes issue #2395.

(cherry picked from commit c06c3e52a0)
2021-12-25 22:53:23 +01:00
Antonio Sanchez
7e3bc4177e Fix tensor broadcast off-by-one error.
Caught by JAX unit tests.  Triggered if broadcast is smaller than packet
size.


(cherry picked from commit ffb78e23a1)
2021-11-16 18:41:25 +00:00
Minh Quan HO
c379a21191 nestbyvalue test: fix uninitialized matrix
- Doing computation with uninitialized (zero-ed ? but thanks Linux) matrix, or
worse NaN on other non-linux systems.
- This commit fixes it by initializing to Random().


(cherry picked from commit 4284c68fbb)
2021-11-05 18:19:27 +00:00
Gengxin Xie
6f57470bcc Bug Fix: correct the bug that won't define EIGEN_HAS_FP16_C
if the compiler isn't clang


(cherry picked from commit 5c642950a5)
2021-11-04 22:54:01 +00:00
Lennart Steffen
df53e28179 Included note on inner stride for compile-time vectors. See https://gitlab.com/libeigen/eigen/-/issues/2355#note_711078126
(cherry picked from commit 163f11e24a)
2021-11-03 23:35:40 +00:00
Chip Kerchner
fbdaff81bd Invert rows and depth in non-vectorized portion of packing (PowerPC).
(cherry picked from commit 9cf34ee0ae)
2021-11-03 23:34:47 +00:00
Nico
71320af66a Fix -Wbitwise-instead-of-logical clang warning
& and | short-circuit, && and || don't. When both arguments to those
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.

Here, it is inconsequential, so switch to && and || to suppress the warning.

(cherry picked from commit b17bcddbca)
2021-11-03 23:32:57 +00:00
Maxiwell S. Garcia
962a596d21 test: fix boostmutiprec test to compile with older Boost versions
Eigen boostmultiprec test redefines a symbol that is already defined
inside Boot Math [1]. Boost has fixed it recently [2], but this
patch avoids errors if Boost version was less than 1.77.

https://github.com/boostorg/math/blob/boost-1.76.0/include/boost/math/policies/policy.hpp#L18
6830712302 (diff-c7a8e5911c2e6be4138e1a966d762200f147792ac16ad96fdcc724313d11f839)


(cherry picked from commit 99600bd1a6)
2021-11-03 23:31:48 +00:00
Antonio Sanchez
0ab1f8ec03 Fix broadcasting oob error.
For vectorized 1-dimensional inputs that do not take the special
blocking path (e.g. `std::complex<...>`), there was an
index-out-of-bounds error causing the broadcast size to be
computed incorrectly.  Here we fix this, and make other minor
cleanup changes.

Fixes #2351.


(cherry picked from commit a500da1dc0)
2021-11-03 23:30:47 +00:00
Alex Druinsky
b0fe14213e Fix vectorized reductions for Eigen::half
Fixes compiler errors in expressions that look like

  Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff()

The error comes from the code that creates the initial value for
vectorized reductions. The fix is to specify the scalar type of the
reduction's initial value.

The cahnge is necessary for Eigen::half because unlike other types,
Eigen::half scalars cannot be implicitly created from integers.


(cherry picked from commit d0e3791b1a)
2021-11-03 23:29:55 +00:00
Andreas Krebbel
23469c3cda ZVector: Move alignas qualifier to come first
We currently have plenty of type definitions with the alignment
qualifier coming after the type.  The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];

Turn this into:
EIGEN_ALIGN16 int ai[4];


(cherry picked from commit 8faafc3aaa)
2021-11-03 23:29:10 +00:00
Antonio Sanchez
18824d10ea Fix ZVector build.
Cross-compiled via `s390x-linux-gnu-g++`, run via qemu.  This allows the
packetmath tests to pass.


(cherry picked from commit 40bbe8a4d0)
2021-11-03 23:28:26 +00:00
Antonio Sanchez
f9b2e92040 Remove bad "take" impl that causes g++-11 crash.
For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes
g++-11 to ICE with
```
sorry, unimplemented: unexpected AST of kind nontype_argument_pack
```
It does work with other versions of gcc, and with clang.
I filed a GCC bug
[here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999).

Technically we should never actually run into this case, since you
can't take n > 0 elements from an empty list.  Commenting it out
allows our Eigen tests to pass.


(cherry picked from commit 8f8c2ba2fe)
2021-11-03 23:26:34 +00:00
Xinle Liu
9c193db5c7 Fix BDCSVD's total deflation in branch 3.4, similar to that of master in MR 707.
(cherry picked from commit 4d045eba53f9a32d052eb942448ba62def066529)
2021-11-03 17:58:57 +00:00
Antonio Sanchez
6b6ba41269 Fix min/max nan-propagation for scalar "other".
Copied input type from `EIGEN_MAKE_CWISE_BINARY_OP`.

Fixes #2362.


(cherry picked from commit 03d4cbb307)
2021-10-28 17:16:49 +00:00
Rasmus Munk Larsen
96007cae8c Remove license column in tables for builtin sparse solvers since all are MPL2 now.
(cherry picked from commit 68e0d023c0)
2021-10-26 18:11:02 +00:00
Rasmus Munk Larsen
5d918b82a8 Add nan-propagation options to matrix and array plugins. 2021-10-21 13:48:50 -07:00
Antonio Sanchez
05c9d7ce20 Disable MSVC constant condition warning.
We use extensive use of `if (CONSTANT)`, and cannot use c++17's `if
constexpr`.

(cherry picked from commit 5bf35383e0)
2021-10-11 10:00:29 -07:00
Antonio Sanchez
943ef50a2d Disable testing of complex compound assignment operators for MSVC.
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).

Trying to use one of these on device will currently lead to a
duplicate definition error.  This is still probably preferable
to no error though.  If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.

The only proper solution would be to define our own custom `Complex`
type.

(cherry picked from commit f0f1d7938b)
2021-10-11 10:00:29 -07:00
Antonio Sanchez
7ea4adb5f0 Disable another device warning
(cherry picked from commit e9e90892fe)
2021-10-11 10:00:29 -07:00
Antonio Sanchez
71498b32c9 Disable more NVCC warnings.
The 2979 warning is yet another "calling a __host__ function from a
__host__ device__ function.  Although we probably should eventually
address these, they are flooding the logs.  Most of these are
harmless since we only call the original from the host.
In cases where these are actually called from device, an error is generated
instead anyways.

The 2977 warning is a bit strange - although the warning suggests the
`__device__` annotation is ignored, this doesn't actually seem to be
the case.  Without the `__device__` declarations, the kernel actually
fails to run when attempting to construct such objects.  Again,
these warnings are flooding the logs, so disabling for now.

(cherry picked from commit 86c0decc48)
2021-10-11 10:00:29 -07:00
Antonio Sanchez
ebd5c6d44b Add -mfma for AVX512DQ tests.
(cherry picked from commit 76bb29c0c2)
2021-10-11 10:00:29 -07:00
Rasmus Munk Larsen
a8eb797a43 Remove -fabi-version=6 flag from AVX512 builds. It was added to fix builds with gcc 4.9, but these don't even work today, and the flag breaks compilation with newer versions of gcc.
(cherry picked from commit 1239adfcab)
2021-10-11 10:00:29 -07:00
Alexander Grund
929bc0e191 Fix alias violation in BFloat16
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast


(cherry picked from commit b5eaa42695)
2021-09-20 14:25:58 +00:00
Antonio Sanchez
f046e326d9 Fix strict aliasing bug causing product_small failure.
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.

Fixes #2327.


(cherry picked from commit 3c724c44cf)
2021-09-19 18:06:17 +00:00
Ryan Pavlik
3335e0767c Fix typos in copyright dates 2021-09-15 13:26:50 -05:00
Antonio Sanchez
3395f4e604 Fix tridiagonalization_inplace_selector.
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.


(cherry picked from commit ebd4b17d2f)
2021-09-08 15:47:39 +00:00
Antonio Sanchez
f03d3e7072 Missing EIGEN_DEVICE_FUNCs to get gpu_basic passing with CUDA 9.
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored.  Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.


(cherry picked from commit 998bab4b04)
2021-09-02 03:21:43 +00:00
Maxiwell S. Garcia
b8cf1ed753 Rename 'vec_all_nan' of cxx11_tensor_expr test because this symbol is used by altivec.h
(cherry picked from commit 09fc0f97b5)
2021-09-01 17:26:59 +00:00
Rasmus Munk Larsen
9263475740 Add missing dependency on LAPACK test suite binaries to target buildtests, so make check will work correctly when EIGEN_ENABLE_LAPACK_TESTS is ON.
(cherry picked from commit 6f429a202d)
2021-09-01 16:41:47 +00:00
Rasmus Munk Larsen
0fdc99c65e Allow old Fortran code for LAPACK tests to compile despite argument mismatch errors (REAL passed to COMPLEX workspace argument) with GNU Fortran 10.
(cherry picked from commit 7e096ddcb0)
2021-09-01 16:41:28 +00:00
Antonio Sanchez
07cc362238 Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang.
Clang doesn't like !621, needs the "g" constraint back.
The "g" constraint also works for GCC >= 5.

This fixes our gitlab CI.


(cherry picked from commit 3a6296d4f1)
2021-09-01 16:40:08 +00:00
Antonio Sanchez
4ef67cbfb2 GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315).
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".

Tested `r` instead, and that seems to work, even with latest compilers.

Also fixed some minor macro issues to eliminate warnings on armv7.

Fixes #2315.


(cherry picked from commit ff07a8a639)
2021-08-31 21:23:28 +00:00
Antonio Sanchez
c2b6df6e60 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.


(cherry picked from commit cc3573ab44)
2021-08-31 21:23:11 +00:00
Adam Kallai
277d369060 win: include intrin header in Windows on ARM
intrin header is needed for _BitScanReverse and
_BitScanReverse64


(cherry picked from commit 1415817d8d)
2021-08-31 21:22:37 +00:00
Antonio Sanchez
7aee90b8d3 Fix fix<N> when variable templates are not supported.
There were some typos that checked `EIGEN_HAS_CXX14` that should have
checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch
in some of the `Eigen::fix<N>` assumptions.

Also fixed the `symbolic_index` test when
`EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0.

Fixes #2308


(cherry picked from commit 5db9e5c779)
2021-08-30 16:23:35 +00:00
Rasmus Munk Larsen
3147391d94 Change version to 3.4.0. 2021-08-18 13:41:58 -07:00
Antonio Sanchez
115591b9e3 Workaround VS 2017 arg bug.
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs.  It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).


(cherry picked from commit 2b410ecbef)
2021-08-18 19:04:50 +00:00
Antonio Sanchez
fd100138dd Remove unaligned assert tests.
Manually constructing an unaligned object declared as aligned
invokes UB, so we cannot technically check for alignment from
within the constructor.  Newer versions of clang optimize away
this check.

Removing the affected tests.


(cherry picked from commit 0c4ae56e37)
2021-08-18 18:39:04 +00:00
Jakob Struye
1ec173b54e Clearer doc for squaredNorm
(cherry picked from commit 53a29c7e35)
2021-08-18 15:12:36 +00:00
Antonio Sanchez
aef926abf6 Renamed shift_left/shift_right to shiftLeft/shiftRight.
For naming consistency.  Also moved to ArrayCwiseUnaryOps, and added
test.


(cherry picked from commit fc9d352432)
2021-08-18 14:44:31 +00:00
Antonio Sanchez
f1032255d3 Add missing PPC packet comparisons.
This is to fix the packetmath tests on the ppc pipeline.


(cherry picked from commit 2cc6ee0d2e)
2021-08-17 15:33:55 +00:00
Chip-Kerchner
f57dec64ef Fix unaligned loads in ploadLhs & ploadRhs for P8.
(cherry picked from commit 8dcf3e38ba)
2021-08-17 12:48:36 +00:00
Rasmus Munk Larsen
926e1a8226 Update documentation for matrix decompositions and least squares solvers.
(cherry picked from commit 7e6f94961c)
2021-08-16 22:11:38 +00:00
andiwand
cd474d4cd0 minor doc fix in Map.h
(cherry picked from commit 5c6b3efead)
2021-08-16 14:26:39 +00:00
Chip-Kerchner
0b56b62f30 Reverse compare logic ƒin F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).
(cherry picked from commit e07227c411)
2021-08-13 18:01:15 +00:00
Chip Kerchner
44cc96e1a1 Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+
(cherry picked from commit 66499f0f17)
2021-08-12 21:39:17 +00:00
Rasmus Munk Larsen
576e451b10 Add CompleteOrthogonalDecomposition to the table of linear algeba decompositions.
(cherry picked from commit 96e3b4fc95)
2021-08-12 16:49:40 +00:00
Antonio Sanchez
0d89012708 Update code snippet for tridiagonalize_inplace.
(cherry picked from commit fb1718ad14)
2021-08-12 15:37:32 +00:00
Rasmus Munk Larsen
6d2506040c * revise the meta_least_common_multiple function template, add a bool variable to check whether the A is larger than B.
* This can make less compile_time if A is smaller than B. and avoid failure in compile if we get a little A and a great B.

Authored by @awoniu.

(cherry picked from commit 8ce341caf2)
2021-08-11 18:11:26 +00:00
Nikolay Tverdokhleb
cb44a003de Do not set AnnoyingScalar::dont_throw if not defined EIGEN_TEST_ANNOYING_SCALAR_DONT_THROW.
- Because that member is not declared if the macro is defined.


(cherry picked from commit f1b899eef7)
2021-08-11 16:39:44 +00:00
ChipKerchner
13d7658c5d Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl).
(cherry picked from commit 413bc491f1)
2021-08-10 20:40:54 +00:00
jenswehner
338924602d added includes for unordered_map
(cherry picked from commit e3e74001f7)
2021-08-10 16:10:03 +00:00
Gauri Deshpande
93bff85a42 remove denormal flushing in fp32tobf16 for avx & avx512
(cherry picked from commit e6a5a594a7)
2021-08-09 22:15:42 +00:00
Rasmus Munk Larsen
4e0357c6dd Avoid memory allocation in tridiagonalization_inplace_selector::run.
(cherry picked from commit a5a7faeb45)
2021-08-06 21:48:00 +00:00
Daniel N. Miller (APD)
1e9f623f3e Do not build shared libs if not supported
(cherry picked from commit 09d7122468)
2021-08-06 21:47:37 +00:00
Jens Wehner
4240b480e0 updated documentation for middleCol and middleRow
(cherry picked from commit 4d870c49b7)
2021-08-05 17:53:36 +00:00
Antonio Sanchez
5b83d3c4bc Make inverse 3x3 faster and avoid gcc bug.
There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory.  I'm *pretty* sure it's
a false positive, but it's hard to trigger.  The same warning
does not trigger with clang or later compiler versions.

In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.

```
name                                            old cpu/op  new cpu/op  delta
BM_Inverse3x3<DynamicMatrix3T<float>>            423ns ± 2%   433ns ± 3%   +2.32%    (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>>           425ns ± 2%   427ns ± 3%   +0.48%    (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>>            7.10ns ± 2%  0.80ns ± 1%  -88.67%  (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>>           7.45ns ± 2%  1.34ns ± 1%  -82.01%  (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>>     409ns ± 3%   419ns ± 3%   +2.40%   (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>>    414ns ± 3%   413ns ± 2%     ~       (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>>     7.57ns ± 1%  0.80ns ± 1%  -89.37%  (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>>    9.09ns ± 1%  2.58ns ±41%  -71.60%  (p=0.000 n=113+116)
```


(cherry picked from commit 5ad8b9bfe2)
2021-08-04 22:06:52 +00:00
Antonio Sanchez
46ecdcd745 Fix MPReal detection and support.
The latest version of `mpreal` has a bug that breaks `min`/`max`.
It also breaks with the latest dev version of `mpfr`. Here we
add `FindMPREAL.cmake` which searches for the library and tests if
compilation works.

Removed our internal copy of `mpreal.h` under `unsupported/test`, as
it is out-of-sync with the latest, and similarly breaks with
the latest `mpfr`.  It would be best to use the installed version
of `mpreal` anyways, since that's what we actually want to test.

Fixes #2282.


(cherry picked from commit 31f796ebef)
2021-08-03 18:13:12 +00:00
Antonio Sanchez
9a1691a14e Fix cmake warnings, FindPASTIX/FindPTSCOTCH.
We were getting a lot of warnings due to nested `find_package` calls
within `Find***.cmake` files.  The recommended approach is to use
[`find_dependency`](https://cmake.org/cmake/help/latest/module/CMakeFindDependencyMacro.html)
in package configuration files. I made this change for all instances.

Case mismatches between `Find<Package>.cmake` and calling
`find_package(<PACKAGE>`) also lead to warnings. Fixed for
`FindPASTIX.cmake` and `FindSCOTCH.cmake`.

`FindBLASEXT.cmake` was broken due to calling `find_package_handle_standard_args(BLAS ...)`.
The package name must match, otherwise the `find_package(BLASEXT)` falsely thinks
the package wasn't found.  I changed to `BLASEXT`, but then also copied that value
to `BLAS_FOUND` for compatibility.

`FindPastix.cmake` had a typo that incorrectly added `PTSCOTCH` when looking for
the `SCOTCH` component.

`FindPTSCOTCH` incorrectly added `***-NOTFOUND` to include/library lists,
corrupting them.  This led to cmake errors down-the-line.

Fixes #2288.


(cherry picked from commit 1cdec38653)
2021-08-03 17:48:20 +00:00
Antonio Sanchez
bb33880e57 Fix TriSycl CMake files.
This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was
broken by commit 00f32752, which modified `add_sycl_to_target` for ComputeCPP.
This makes the corresponding modifications for trisycl to make them consistent.

Also, trisycl now requires c++17.


(cherry picked from commit 8cf6cb27ba)
2021-08-03 17:25:17 +00:00
Antonio Sanchez
237c59a2aa Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset.
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.

This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.

Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.

For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.

Fixes #2201.


(cherry picked from commit 3d98a6ef5c)
2021-08-03 16:32:59 +00:00
Antonio Sanchez
3dc42eeaec Enable equality comparisons on GPU.
Since `std::equal_to::operator()` is not a device function, it
fails on GPU.  On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).

Replacing this with a portable version enables comparisons on device.

Addresses #2292 - would need to be cherry-picked.  The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.


(cherry picked from commit 7880f10526)
2021-08-03 16:15:44 +00:00
hyunggi-sv
7adc1545b4 fix:typo in dox (has->have)
(cherry picked from commit 02a0e79c70)
2021-08-03 00:54:41 +00:00
Antonio Sanchez
c0c7b695cd Fix assignment operator issue for latest MSVC+NVCC.
Details are scattered across #920, #1000, #1324, #2291.

Summary: some MSVC versions have a bug that requires omitting explicit
`operator=` definitions (leads to duplicate definition errors), and
some MSVC versions require adding explicit `operator=` definitions
(otherwise implicitly deleted errors).  This mess tries to cover
all the cases encountered.

Fixes #2291.


(cherry picked from commit 9816fe59b4)
2021-08-03 00:52:21 +00:00
Alexander Karatarakis
c334eece44 _DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier
(cherry picked from commit f357283d31)
2021-07-29 18:18:47 +00:00
Jonas Harsch
5ccb72b2e4 Fixed typo in TutorialSparse.dox
(cherry picked from commit 5b81764c0f)
2021-07-26 14:33:10 +00:00
arthurfeeney
9c90d5d832 Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector.
(cherry picked from commit a77638387d)
2021-07-22 18:01:55 +00:00
Antonio Sanchez
5d37114fc0 Fix explicit default cache size typo.
(cherry picked from commit 297f0f563d)
2021-07-20 18:42:25 +00:00
Rohit Santhanam
930696fc53 Enable extract et. al. for HIP GPU.
(cherry picked from commit beea14a18f)
2021-07-09 16:14:19 +00:00
Rasmus Munk Larsen
56966fd2e6 Defer to std::fill_n when filling a dense object with a constant value.
(cherry picked from commit 0c361c4899)
2021-07-09 03:59:56 +00:00
Jonas Harsch
5a3c9eddb4 Removed superfluous boolean degenerate in TensorMorphing.h.
(cherry picked from commit e9c9a3130b)
2021-07-08 18:34:10 +00:00
Guoqiang QI
69ec4907da Make a copy of input matrix when try to do the inverse in place, this fixes #2285.
(cherry picked from commit 4bcd42c271)
2021-07-08 17:07:54 +00:00
Antonio Sanchez
7571704a43 Fix CMake directory issues.
Allows absolute and relative paths for
- `INCLUDE_INSTALL_DIR`
- `CMAKEPACKAGE_INSTALL_DIR`
- `PKGCONFIG_INSTALL_DIR`

Type should be `PATH` not `STRING`.  Contrary to !211, these don't
seem to be made absolute if user-defined - according to the doc any
directories should use `PATH` type, which allows a file dialog
to be used via the GUI.  It also better handles file separators.

If user provides an absolute path, it will be made relative to
`CMAKE_INSTALL_PREFIX` so that the `configure_packet_config_file` will
work.

Fixes #2155 and #2269.


(cherry picked from commit f44f05532d)
2021-07-07 17:44:00 +00:00
Antonio Sanchez
84955d109f Fix Tensor documentation page.
The extra [TOC] tag is generating a huge floating duplicated
table-of-contents, which obscures the majority of the page
(see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html).
Remove it.

Also, headers do not support markup (see
[doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so
backticks like
```
```
end up generating titles that looks like
```
Constructor <tt>Tensor<double,2></tt>
```
Removing backticks for now.  To generate proper formatted headers, we
must directly use html instead of markdown, i.e.
```
<h2>Constructor <code>Tensor&lt;double,2&gt;</code></h2>
```
which is ugly.

Fixes #2254.


(cherry picked from commit f5a9873bbb)
2021-07-07 17:18:20 +00:00
Jonas Harsch
601814b575 Don't crash when attempting to shuffle an empty tensor.
(cherry picked from commit aab747021b)
2021-07-02 21:08:38 +00:00
Rasmus Munk Larsen
05bab8139a Fix breakage of conj_helper in conjunction with custom types introduced in !537.
(cherry picked from commit 7b35638ddb)
2021-07-02 20:59:50 +00:00
Chip Kerchner
eebde572d9 Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow
(cherry picked from commit 91e99ec1e0)
2021-07-01 23:32:38 +00:00
Antonio Sanchez
8190739f12 Fix compile issues for gcc 4.8.
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.


(cherry picked from commit 6035da5283)
2021-07-01 23:18:10 +00:00
Antonio Sanchez
b6db013435 Fix inverse nullptr/asan errors for LU.
For empty or single-column matrices, the current `PartialPivLU`
currently dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.


(cherry picked from commit 154f00e9ea)
2021-07-01 22:57:25 +00:00
Dan Miller
1f6b1c1a1f Fix duplicate definitions on Mac
(cherry picked from commit eb04775903)
2021-07-01 20:49:05 +00:00
Alexander Karatarakis
517294d6e1 Make DenseStorage<> trivially_copyable
(cherry picked from commit 60400334a9)
2021-07-01 20:48:47 +00:00
大河メタル
94e2250b36 Correct declarations for aarch64-pc-windows-msvc
(cherry picked from commit c81da59a25)
2021-06-30 04:10:04 +00:00
Antonio Sanchez
d82d915047 Modify tensor argmin/argmax to always return first occurence.
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable.  Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).

This is otherwise causing unpredictable results in some TF tests.


(cherry picked from commit 3a087ccb99)
2021-06-29 23:28:37 +00:00
Rasmus Munk Larsen
380d0e4916 Get rid of redundant pabs instruction in complex square root.
(cherry picked from commit 5aebbe9098)
2021-06-29 23:27:09 +00:00
Rohit Santhanam
e83af2cc24 Commit 52a5f982 broke conjhelper functionality for HIP GPUs.
This commit addresses this.


(cherry picked from commit 2d132d1736)
2021-06-25 19:56:18 +00:00
Rasmus Munk Larsen
413ff2b531 Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.
(cherry picked from commit bffd267d17)
2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen
a235ddef39 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
(cherry picked from commit 52a5f98212)
2021-06-24 23:30:42 +00:00
Rasmus Munk Larsen
4780d8dfb2 Fix typo in SelfAdjointEigenSolver_eigenvectors.cpp
(cherry picked from commit c8a2b4d20a)
2021-06-21 19:07:17 +00:00
Rasmus Munk Larsen
fd5d23fdf3 Update ComplexEigenSolver_eigenvectors.cpp
(cherry picked from commit ea62c937ed)
2021-06-21 19:06:54 +00:00
Antonio Sanchez
a2040ef796 Rewrite balancer to avoid overflows.
The previous balancer overflowed for large row/column norms.
Modified to prevent that.

Fixes #2273.


(cherry picked from commit e9ab4278b7)
2021-06-21 18:14:53 +00:00
Antonio Sanchez
c2c0f6f64b Fix fix<> for gcc-4.9.3.
There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`
replacement.

Fixes ##2267


(cherry picked from commit 35a367d557)
2021-06-21 17:26:07 +00:00
Antonio Sanchez
ee4e099aa2 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.


(cherry picked from commit 12e8d57108)
2021-06-17 17:11:08 +00:00
Chip-Kerchner
9fc93ce31a EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropiate.
(cherry picked from commit ef1fd341a8)
2021-06-16 22:14:17 +00:00
Antonio Sanchez
1374f49f28 Add missing ppc pcmp_lt_or_nan<Packet8bf>
(cherry picked from commit 9e94c59570)
2021-06-15 22:12:22 +00:00
Antonio Sanchez
2d6eaaf687 Fix placement of permanent GPU defines.
(cherry picked from commit 954879183b)
2021-06-15 19:18:20 +00:00
Rasmus Munk Larsen
47722a66f2 Fix more enum arithmetic.
(cherry picked from commit 13fb5ab92c)
2021-06-15 16:40:35 +00:00
Antonio Sanchez
5e75331b9f Fix checking of version number for mingw.
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.

Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.

Related to #2268.  CMake and build passes for me after this.


(cherry picked from commit ad82d20cf6)
2021-06-12 00:02:26 +00:00
Antonio Sanchez
b5fc69bdd8 Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.


(cherry picked from commit 514977f31b)
2021-06-11 17:48:37 +00:00
Antonio Sanchez
4b683b65df Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.


(cherry picked from commit 6aec83263d)
2021-06-11 17:19:29 +00:00
Rasmus Munk Larsen
1cb1ffd5b2 Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled.
(cherry picked from commit fc87e2cbaa)
2021-06-11 02:57:02 +00:00
Rasmus Munk Larsen
4b502a7215 Fix c++20 warnings about using enums in arithmetic expressions.
(cherry picked from commit f64b2954c7)
2021-06-11 02:35:19 +00:00
Nicolas Cornu
85868564df Fix parsing of version for nvhpc
As the first line of the version is empty it crashes,
so delete first line if it is empty


(cherry picked from commit 001a57519a)
2021-06-10 18:50:22 +00:00
Rohit Santhanam
cbb6ae6296 Removed dead code from GPU float16 unit test.
(cherry picked from commit c8d40a7bf1)
2021-06-10 17:16:47 +00:00
Cyril Kaiser
573570b6c9 Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.
(cherry picked from commit 91cd67f057)
2021-05-26 19:45:25 +00:00
Antonio Sanchez
98cf1e076f Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.


(cherry picked from commit dba753a986)
2021-05-25 19:09:50 +00:00
Antonio Sanchez
ee2a8f7139 Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3.  This was changed in commit 11f55b29 to optimize the
evaluator:

> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.

though I cannot reproduce the 16 byte result.  Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.

https://godbolt.org/z/MsjTc1PGe

This change modifies the code slightly to allow non-class types.  The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.

Fixes #2251


(cherry picked from commit ebb300d0b4)
2021-05-25 18:19:53 +00:00
Jakub Lichman
3835046309 predux_half_dowto4 test extended to all applicable packets
(cherry picked from commit 12471fcb5d)
2021-05-21 16:58:16 +00:00
Steve Bronder
4fbd01cd4b Adds macro for checking if C++14 variable templates are supported
(cherry picked from commit 1720057023)
2021-05-21 16:43:30 +00:00
Niall Murphy
a883a8797c Use derived object type in conservative_resize_like_impl
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.


(cherry picked from commit 391094c507)
2021-05-20 23:43:57 +00:00
Jakub Lichman
0bd9e9bc45 ptranpose test for non-square kernels added
(cherry picked from commit 8877f8d9b2)
2021-05-20 19:27:20 +00:00
Guoqiang QI
77c66e368c Ensure all generated matrices for inverse_4x4 testes are invertible, this fix #2248 .
(cherry picked from commit 3e006bfd31)
2021-05-13 15:03:47 +00:00
guoqiangqi
2f908f8255 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 .
(cherry picked from commit 3d9051ea84)
2021-05-12 17:02:19 +00:00
Nathan Luehr
82f13830e6 Fix calls to device functions from host code
(cherry picked from commit 972cf0c28a)
2021-05-12 17:01:45 +00:00
Nathan Luehr
d1825cbb68 Device implementation of log for std::complex types.
(cherry picked from commit 7e6a1c129c)
2021-05-11 22:31:53 +00:00
Nathan Luehr
d9288f078d Fix ambiguity due to argument dependent lookup.
(cherry picked from commit 6753f0f197)
2021-05-11 22:00:36 +00:00
Rohit Santhanam
85ebd6aff8 Fix for issue where numext::imag and numext::real are used before they are defined.
(cherry picked from commit 39ec31c0ad)
2021-05-10 20:14:10 +00:00
Antonio Sanchez
2947c0cc84 Restore ABI compatibility for conj with 3.3, fix conflict with boost.
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version.  This changes
restores the second parameter.  This also restores ABI compatibility.

The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.

We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.

Fixes #2112.


(cherry picked from commit c0eb5f89a4)
2021-05-07 18:38:23 +00:00
Antonio Sanchez
25424f4cf1 Clean up gpu device properties.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.

Related to !481, which already changed the API to address a static
linkage issue.


(cherry picked from commit 0eba8a1fe3)
2021-05-07 18:13:40 +00:00
Antonio Sanchez
42acbd5700 Fix numext::arg return type.
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.

Related to !477, which uncovered the issue.


(cherry picked from commit 90e9a33e1c)
2021-05-07 17:52:07 +00:00
Christoph Hertzberg
9e0dc8f09b Revert addition of unused paddsub<Packet2cf>. This fixes #2242
(cherry picked from commit 722ca0b665)
2021-05-07 16:23:03 +00:00
Antonio Sanchez
da19f7a910 Simplify TensorRandom and remove time-dependence.
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.

Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be.  This is leading to
multiple build breakages.

The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).

Fixes #1602.


(cherry picked from commit e3b7f59659)
2021-05-05 23:37:48 +00:00
Antonio Sanchez
fc2cc10842 Better CUDA complex division.
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.


(cherry picked from commit 1c013be2cc)
2021-04-29 17:58:45 +00:00
Antonio Sanchez
a33855f6ee Add missing pcmp_lt_or_nan for NEON Packet4bf.
(cherry picked from commit 172db7bfc3)
2021-04-27 21:15:08 +00:00
Theo Fletcher
83df5df61b Added complex matrix unit tests for SelfAdjointEigenSolve
(cherry picked from commit 2ced0cc233)
2021-04-26 19:18:53 +00:00
Jakub Lichman
ac3c5aad31 Tests added and AVX512 bug fixed for pcmp_lt_or_nan
(cherry picked from commit d87648a6be)
2021-04-26 18:07:55 +00:00
Jakub Lichman
63abb10000 Tests for pcmp_lt and pcmp_le added
(cherry picked from commit 1115f5462e)
2021-04-23 19:52:23 +00:00
Turing Eret
baf601a0e3 Fix for issue with static global variables in TensorDeviceGpu.h
m_deviceProperties and m_devicePropInitialized are defined as global
statics which will define multiple copies which can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which defines those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.

This fixes issue #1475.


(cherry picked from commit 3804ca0d90)
2021-04-23 19:06:16 +00:00
Antonio Sanchez
587a691516 Check existence of BSD random before use.
`TensorRandom` currently relies on BSD `random()`, which is not always
available.  The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
               || /* Glibc since 2.19: */ _DEFAULT_SOURCE
	       || /* Glibc <= 2.19: */ _SVID_SOURCE ||  _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.


(cherry picked from commit 045c0609b5)
2021-04-23 00:35:05 +00:00
Antonio Sanchez
8830d66c02 DenseStorage safely copy/swap.
Fixes #2229.

For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set.  Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.


(cherry picked from commit d213a0bcea)
2021-04-22 21:05:50 +00:00
Rasmus Munk Larsen
54425a39b2 Make vectorized compute_inverse_size4 compile with AVX.
(cherry picked from commit 85a76a16ea)
2021-04-22 17:25:25 +00:00
Jakub Lichman
34d0be9ec1 Compilation of basicbenchmark fixed
(cherry picked from commit d72c794ccd)
2021-04-21 12:09:42 +02:00
Jakub Lichman
42a8bdd4d7 HasExp added for AVX512 Packet8d
(cherry picked from commit 2b1dfd1ba0)
2021-04-21 12:09:21 +02:00
Chip-Kerchner
28564957ac Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).
(cherry picked from commit 06c2760bd1)
2021-04-21 01:05:21 +00:00
Antonio Sanchez
ab7fe215f9 Fix ldexp for AVX512 (#2215)
Wrong shuffle was used.  Need to interleave low/high halves with a
`permute` instruction.

Fixes #2215.


(cherry picked from commit 1d79c68ba0)
2021-04-20 20:52:26 +00:00
David Tellenbach
1f4c0311cd Bump to 3.3.91 (3.4-rc1) 2021-04-18 23:43:12 +02:00
David Tellenbach
3e819d83bf Before 3.4 branch 2021-04-18 23:36:14 +02:00
Antonio Sanchez
69adf26aa3 Modify googlehash use to account for namespace issues.
The namespace declaration for googlehash is a configurable macro that
can be disabled.  In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.

Here we play a bit of gynastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace polution.  Symbols within
the `::google` namespace are imported into `Eigen::google`.

We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
2021-04-12 19:00:39 -07:00
Christoph Hertzberg
9357feedc7 Avoid using uninitialized inputs and if available, use slightly more efficient movsd instruction for pset1<Packet2cf>. 2021-04-13 01:36:59 +02:00
Rasmus Munk Larsen
a2c0542010 Fix typo in TensorDimensions.h 2021-04-12 18:59:56 +00:00
Rohit Santhanam
dfd6720d82 Fix for float16 GPU unit test. 2021-04-12 10:19:06 +00:00
Christoph Hertzberg
1e1c8a735c Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for std::result_of and std::invoke_result.
Fixes #2209
2021-04-12 01:26:15 +00:00
Jens Wehner
f6fc66aa75 fixed doxygen for unsupported iterative solver module 2021-04-11 16:26:14 +00:00
Christoph Hertzberg
d58678069c Make iterators default constructible and assignable, by making... 2021-04-09 17:03:28 +00:00
Rohit Santhanam
2859db0220 This fixes an issue where the compiler was not choosing the GPU specific specialization of ScanLauncher.
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.

The segmantation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to the computation that it presumed would execute on the GPU.

But because of the issue, the computation was scheduled to execute on the CPU so a situation was constructed where the CPU attempted to access a GPU memory location.

The fix expands the GPU specific ScanLauncher specialization to handle cases where vectorization is enabled.

Previously, the GPU specialization is chosen only if Vectorization is not used.
2021-04-08 15:14:48 +00:00
Antonio Sanchez
fcb5106c6e Scaled epsilon the wrong way.
Should have been 0.5 to widen the bounds, since this is inverse
precision.  Setting to 0.5, however, leads to many more failing
tests at Google, so reverting to 1 for now.
2021-04-07 15:08:39 -07:00
Christoph Hertzberg
6197ce1a35 Replace -2147483648 by -0.0f or -0.0 constants (this should fix #2189).
Also, remove unnecessary `pgather` operations.
2021-04-07 11:25:27 +00:00
Rasmus Munk Larsen
22edb46823 Align local arrays to Packet boundary. 2021-04-06 16:22:36 +00:00
Antonio Sanchez
ace7f132ed Fix clang tidy warnings in AnnoyingScalar.
Clang-tidy complains that full specializations in headers can cause ODR
violations.  Marked these as `inline` to fix.

It also complains about renaming arguments in specializations.  Set the
argument names to match.
2021-04-05 12:49:38 -07:00
Antonio Sanchez
90187a33e1 Fix SelfAdjoingEigenSolver (#2191)
Adjust the relaxation step to use the condition
```
abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))
```
for setting the subdiagonal entry to zero.

Also adjust Wilkinson shift for small `e = subdiag[end-1]` -
I couldn't find a reference for the original, and it was not
consistent with the Wilkinson definition.

Fixes #2191.
2021-04-05 11:19:09 -07:00
Rasmus Munk Larsen
3ddc0974ce Fix two bugs in commit 2021-04-02 22:06:27 +00:00
Chip Kerchner
c24bee6120 Fix address of temporary object errors in clang11.
This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
2021-04-02 16:27:08 +00:00
David Tellenbach
e4233b6e3d Add CI infrastructure for pre-merge smoke tests.
This patch adds pre-merge smoke tests for x86 Linux using gcc-10 and clang-10.

Closes #2188.
2021-04-01 00:08:37 +00:00
David Tellenbach
ae95b74af9 Add CMake infrastructure for smoke testing
Necessary CMake changes to implement pre-merge smoke tests
running via CI.
2021-03-31 22:09:00 +00:00
Rasmus Munk Larsen
5bbc9cea93 Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input.
Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.
2021-03-31 21:09:19 +00:00
Guoqiang QI
b5a926a0f6 Add GitLab templates for issues and merge requests
This patch adds GitLab templates for bug reports, feature and merge requests.

This closes #2117.
2021-03-31 16:01:12 +00:00
Antonio Sanchez
78ee3d6261 Fix CUDA constexpr issues for numeric_limits.
Some CUDA/HIP constants fail on device with `constexpr` since they
internally rely on non-constexpr functions, e.g.
```
\#define CUDART_INF_F            __int_as_float(0x7f800000)
```
This fails for cuda-clang (though passes with nvcc). These constants are
currently used by `device::numeric_limits`.  For portability, we
need to remove `constexpr` from the affected functions.

For C++11 or higher, we should be able to rely on the `std::numeric_limits`
versions anyways, since the methods themselves are now `constexpr`, so
should be supported on device (clang/hipcc natively, nvcc with
`--expr-relaxed-constexpr`).
2021-03-30 18:01:27 +00:00
Antonio Sanchez
af1247fbc1 Use Index type in loop over coefficients.
Previously was `int`.  Brought up by Kyle Snow (Polaris Geospatial
Services) on the mailing list.
2021-03-29 17:40:55 +00:00
Antonio Sanchez
87729ea39f Eliminate round_impl double-promotion warnings for c++03. 2021-03-25 16:52:19 +00:00
Deven Desai
748489ef9c Un-defining EIGEN_HAS_CONSTEXPR on the HIP platform
The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit

e7b8643d70

```
In file included from /home/rocm-user/eigen/test/main.h:360:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:162:
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
  static float (max)() {
                ^
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression
    return HIPRT_MAX_NORMAL_F;
           ^
/home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F'
#define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff)
                           ^
/opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here
__device__ static inline float __int_as_float(int x) {
                               ^
```

The problem seems to that some of the constants defined in the HIP `math_constants.h` have a call to `__int_as_float` routine which is not declared `constexpr` in the HIP runtime header file.

Working around this issue for now, be skipping the const_expr support (enabled via the above commit) on HIP
2021-03-25 13:45:52 +00:00
Chip Kerchner
d59ef212e1 Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3). 2021-03-25 11:08:19 +00:00
Steve Bronder
e7b8643d70 Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()""
This reverts commit 5f0b4a4010.
2021-03-24 18:14:56 +00:00
Antonio Sanchez
5521c65afb Eliminate mixingtypes_7 warning.
`g_called` is not used in subtest 7, so was generating a
`-Wunneeded-internal-declaration` warnings.  Here we silence
it by initializing the static variable.
2021-03-24 11:05:41 -07:00
Christoph Hertzberg
69a4f70956 Revert "Uses _mm512_abs_pd for Packet8d pabs"
This reverts commit f019b97aca
2021-03-23 18:52:19 +00:00
David Tellenbach
824272cde8 Re-enable CI for Power 2021-03-22 19:28:25 +01:00
David Tellenbach
4811e81966 Remove yet another comma at end of enum 2021-03-18 23:30:00 +01:00
Steve Bronder
f019b97aca Uses _mm512_abs_pd for Packet8d pabs 2021-03-18 15:47:52 +00:00
David Tellenbach
0cc9b5eb40 Split test commainitializer into two substests 2021-03-18 13:28:51 +01:00
Antonio Sanchez
c3fbc6cec7 Use singleton pattern for static registered tests.
The original fails with nvcc+msvc - there's a static order of initialization
issue leading to registered tests being cleared.  The test then fails on
```
VERIFY(EigenTest::all().size()>0);
```
since `EigenTest` no longer contains any tests.  The singleton pattern
fixes this.
2021-03-18 00:56:31 +00:00
Niek Bouman
ed964ba3f1 Proposed fix for issue #2187 2021-03-18 00:55:36 +00:00
Antonio Sanchez
8dfe1029a5 Augment NumTraits with min/max_exponent() again.
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase where possible.  Also replaced some other `numeric_limits`
usages in affected tests with the `NumTraits` equivalent.

The previous MR !443 failed for c++03 due to lack of `constexpr`.
Because of this, we need to keep around the `std::numeric_limits`
version in enum expressions until the switch to c++11.

Fixes #2148
2021-03-16 20:12:46 -07:00
David Tellenbach
eb71e5db98 Fix another warning on missing commas 2021-03-17 03:07:04 +01:00
David Tellenbach
df4bc2731c Revert "Augment NumTraits with min/max_exponent()."
This reverts commit 75ce9cd2a7.
2021-03-17 03:06:08 +01:00
Antonio Sanchez
75ce9cd2a7 Augment NumTraits with min/max_exponent().
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase.  Also replaced some other `numeric_limits` usages in
affected tests with the `NumTraits` equivalent.

Fixes #2148
2021-03-17 01:00:41 +00:00
David Tellenbach
9fb7062440 Silence warning on comma at end of enumerator list 2021-03-17 01:46:52 +01:00
Theo Fletcher
b8502a9dd6 Updated SelfAdjointEigenSolver documentation to include that the eigenvectors matrix is unitary. 2021-03-16 18:48:02 +00:00
Rasmus Munk Larsen
2e83cbbba9 Add NaN propagation options to minCoeff/maxCoeff visitors. 2021-03-16 17:02:50 +00:00
Jens Wehner
c0a889890f Fixed output of complex matrices 2021-03-15 21:51:55 +00:00
Antonio Sanchez
f612df2736 Add fmod(half, half).
This is to support TensorFlow's `tf.math.floormod` for half.
2021-03-15 13:32:24 -07:00
Antonio Sanchez
14b7ebea11 Fix numext::round pre c++11 for large inputs.
This is to resolve an issue for large inputs when +0.5 can
actually lead to +1 if the input doesn't have enough precision
to resolve the addition - leading to an off-by-one error.

See discussion on 9a663973.
2021-03-15 19:08:04 +00:00
Chip Kerchner
c9d4367fa4 Fix pround and add print 2021-03-15 19:07:43 +00:00
Antonio Sanchez
d24f9f9b55 Fix NVCC+ICC issues.
NVCC does not understand `__forceinline`, so we need to use `inline`
when compiling for GPU.

ICC specializes `std::complex` operators for `float` and `double`
by default, which cannot be used on device and conflict with Eigen's
workaround in CUDA/Complex.h.  This can be prevented by defining
`_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
Added this define to the tests and to `Eigen/Core`, but this will
not work if the user includes `<complex>` before `<Eigen/Core>`.

ICC also seems to generate a duplicate `Map` symbol in
`PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
  static ConstMapType Map(const Scalar *data)

```
I tracked this down to `friend class Eigen::Map`.  Putting the `friend`
statements at the bottom of the class seems to resolve this issue.

Fixes #2180
2021-03-15 18:42:04 +00:00
Antonio Sanchez
14487ed14e Add increment/decrement operators to Eigen::half.
This is for consistency with bfloat16, and to support initialization
with `std::iota`.
2021-03-15 10:52:23 -07:00
Antonio Sanchez
b271110788 Bump up rand histogram threshold.
The previous one sometimes fails for MSVC which has a poor random
number generator.

Fixes #2182
2021-03-10 22:17:03 -08:00
Antonio Sanchez
d098c4d64c Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang.
Doesn't seem to correctly select the register type, and most types
lead to compiler crashes.
2021-03-10 16:05:01 -08:00
Antonio Sanchez
543e34ab9d Re-implement move assignments.
The original swap approach leads to potential undefined behavior (reading
uninitialized memory) and results in unnecessary copying of data for static
storage.

Here we pass down the move assignment to the underlying storage.  Static
storage does a one-way copy, dynamic storage does a swap.

Modified the tests to no longer read from the moved-from matrix/tensor,
since that can lead to UB. Added a test to ensure we do not access
uninitialized memory in a move.

Fixes: #2119
2021-03-10 16:55:20 +00:00
Ben Niu
b8d1857f0d [MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both the native ARM64 (_M_ARM64 is defined) or ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can emulated by ARM64 instructions, it is still a good to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons. 2021-03-10 10:21:31 +00:00
Antonio Sanchez
853a5c4b84 Fix ambiguous call to CUDA __half constructor. 2021-03-08 21:06:28 -08:00
Antonio Sanchez
94327dbfba Fix typo: DEVICE -> GPU 2021-03-08 11:21:00 -08:00
Antonio Sanchez
1296abdf82 Fix non-trivial Half constructor for CUDA.
Both CUDA and HIP require trivial default constructors for types used
in shared memory. Otherwise failing with
```
error: initialization is not supported for __shared__ variables.
```
2021-03-08 07:32:54 -08:00
Antonio Sanchez
6045243141 Revert stack allocation limit change that crept in.
This was accidentally introduced when copying changes between repos.
2021-03-05 14:29:37 -08:00
Deven Desai
1a96d49afe Changing the Eigen::half implementation for HIP
Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build.

In the ROCm Tensorflow build,
* files that do not contain ant GPU code get compiled via gcc, and
* files that contnain GPU code get compiled via hipcc.

In certain case, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half has a "pass-by-value" argument, its value was getting corrupted, when received by the function.

The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data-store, and for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc / hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function.

Changing the Eigen::half argument to be "pass by reference" seems to workaround the error.

In order to fix it such that we do not run into it again in TF, this commit changes the Eigne::half implementation to use the same `__half_raw` implementation as the non-GPU compile, during host compile phase of the hipcc compile.
2021-03-05 19:27:13 +00:00
Antonio Sanchez
2468253c9a Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.

Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise.  This simplifies checks for supported
features.

Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.

Fixes: #2170
2021-03-05 18:33:18 +00:00
Antonio Sanchez
82d61af3a4 Fix rint SSE/NEON again, using optimization barrier.
This is a new version of !423, which failed for MSVC.

Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to
prevent operations involving `X` from crossing that barrier. Should
work on most `GNUC` compatible compilers (MSVC doesn't seem to need
this). This is a modified version adapted from what was used in
`psincos_float` and tested on more platforms
(see #1674, https://godbolt.org/z/73ezTG).

Modified `rint` to use the barrier to prevent the add/subtract rounding
trick from being optimized away.

Also fixed an edge case for large inputs that get bumped up a power of two
and ends up rounding away more than just the fractional part.  If we are
over `2^digits` then just return the input.  This edge case was missed in
the test since the test was comparing approximate equality, which was still
satisfied.  Adding a strict equality option catches it.
2021-03-05 08:54:12 -08:00
David Tellenbach
5f0b4a4010 Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"
This reverts commit 6cbb3038ac because it
breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
2021-03-05 13:16:43 +01:00
Steve Bronder
6cbb3038ac Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size() 2021-03-04 18:58:08 +00:00
David Tellenbach
5bfc67f9e7 Deactive CI for Power due to problems with GitLab runner 2021-03-04 17:33:40 +01:00
Eugene Zhulenev
a6601070f2 Add log2 operation to TensorBase 2021-03-04 00:13:36 +00:00
Antonio Sánchez
9a663973b4 Revert "Fix rint for SSE/NEON."
This reverts commit e72dfeb8b9
2021-03-03 18:51:51 +00:00
Antonio Sanchez
e72dfeb8b9 Fix rint for SSE/NEON.
It seems *sometimes* with aggressive optimizations the combination
`psub(padd(a, b), b)` trick to force rounding is compiled away. Here
we replace with inline assembly to prevent this (I tried `volatile`,
but that leads to additional loads from memory).

Also fixed an edge case for large inputs `a` where adding `b` bumps
the value up a power of two and ends up rounding away more than
just the fractional part.  If we are over `2^digits` then just return
the input.  This edge case was missed in the test since the test was
comparing approximate equality, which was still satisfied.  Adding
a strict equality option catches it.
2021-03-03 09:41:46 -08:00
Christoph Hertzberg
199c5f2b47 geo_alignedbox_5 was failing with AVX enabled, due to storing Vector4d in a std::vector without using an aligned allocator.
Got rid of using `std::vector` and simplified the code.
Avoid leading `_`
2021-03-01 03:59:21 +01:00
Antonio Sanchez
1e0c7d4f49 Add print for SSE/NEON, use NEON rounding intrinsics if available.
In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the
current rounding mode.

For NEON, we use the provided intrinsics for rint/floor/ceil if
available (armv8).

Related to #1969.
2021-02-27 22:42:07 +00:00
David Tellenbach
976ae0ca6f Document that using raw function pointers doesn't work with unaryExpr. 2021-02-27 22:58:42 +01:00
Antonio Sanchez
c65c2b31d4 Make half/bfloat16 constructor take inputs by value, fix powerpc test.
Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.

Changing `half` to take inputs by value fixes this.  Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down-the-road.

Also eliminated some compile warnings for powerpc.
2021-02-27 21:32:06 +00:00
Christoph Hertzberg
39a590dfb6 Remove unused include 2021-02-27 19:02:33 +01:00
Christoph Hertzberg
8f686ac4ec clang 10 aggressively warns about precision loss when converting int to float (or long to double)
(cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
2660d01fa7 Inherit from no_assignment_operator to avoid implicit copy constructor warnings
(cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
a3521d743c Fix some enum-enum conversion warnings
(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
ca528593f4 Fixed/masked more implicit copy constructor warnings
(cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
81b5fe2f0a ReturnByValue is already non-copyable
(cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
4fb3459a23 Fix double-promotion warnings
(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
2021-02-27 18:44:26 +01:00
Jens Wehner
4bfcee47b9 Idrs iterative linear solver 2021-02-27 12:09:33 +00:00
Antonio Sanchez
29ebd84cb7 Fix NEON sqrt for 32-bit, add prsqrt.
With !406, we accidentally broke arm 32-bit NEON builds, since
`vsqrt_f32` is only available for 64-bit.

Here we add back the `rsqrt` implementation for 32-bit, relying
on a `prsqrt` implementation with better handling of edge cases.

Note that several of the 32-bit NEON packet tests are currently
failing - either due to denormal handling (NEON versions flush
to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
2021-02-26 14:08:40 -08:00
Rasmus Munk Larsen
fe19714f80 Merge branch 'rmlarsen1/eigen-nan_prop' 2021-02-26 09:21:24 -08:00
Rasmus Munk Larsen
e67672024d Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop 2021-02-26 09:12:44 -08:00
Rasmus Munk Larsen
5e7d4c33d6 Add TODO. 2021-02-26 09:08:45 -08:00
Rasmus Munk Larsen
fb5b59641a Defer default for minCoeff/maxCoeff to templated variant. 2021-02-26 09:07:00 -08:00
Antonio Sanchez
e19829c3b0 Fix floor/ceil for NEON fp16.
Forgot to test this.  Fixes bug introduced in !416.
2021-02-25 20:39:56 -08:00
Antonio Sanchez
5529db7524 Fix SSE/NEON pfloor/pceil for saturated values.
The original will saturate if the input does not fit into an integer
type.  Here we fix this, returning the input if it doesn't have
enough precision to have a fractional part.

Also added `pceil` for NEON.

Fixes #1969.
2021-02-25 14:39:26 -08:00
Rasmus Munk Larsen
51eba8c3e2 Fix indentation. 2021-02-25 18:21:21 +00:00
Rasmus Munk Larsen
5297b7162a Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-25 18:21:21 +00:00
Antonio Sanchez
ecb7b19dfa Disable new/delete test for HIP 2021-02-25 08:04:05 -08:00
Chip-Kerchner
6eebe97bab Fix clang compile when no MMA flags are set. Simplify MMA compiler detection. 2021-02-24 20:43:23 -06:00
Rasmus Munk Larsen
f284c8592b Don't crash when attempting to slice an empty tensor. 2021-02-24 18:12:51 -08:00
Rasmus Munk Larsen
4cb0592af7 Fix indentation. 2021-02-24 17:59:36 -08:00
Rasmus Munk Larsen
6b34568c74 Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop 2021-02-24 17:54:58 -08:00
Rasmus Munk Larsen
0065f9d322 Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-25 01:54:36 +00:00
Rasmus Munk Larsen
841c8986f8 Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-24 17:49:20 -08:00
Rasmus Munk Larsen
113e61f364 Remove unused function scalar_cmp_with_cast. 2021-02-24 23:59:35 +00:00
Rasmus Munk Larsen
98ca58b02c Cast anonymous enums to int when used in expressions. 2021-02-24 23:50:06 +00:00
Chip-Kerchner
c31ead8a15 Having forward template function declarations in a P10 file causes bad code in certain situations. 2021-02-24 23:43:30 +00:00
Guoqiang QI
f44197fabd Some improvements for kissfft from Martin Reinecke(pocketfft author):
1.Only computing about half of the factors and use complex conjugate symmetry for the rest instead of all to save time.
2.All twiddles are calculated in double because that gives the maximum achievable precision when doing float transforms.
3.Reducing all angles to the range 0<angle<pi/4 which gives even more precision.
2021-02-24 21:36:47 +00:00
Antonio Sanchez
a31effc3bc Add invoke_result and eliminate result_of warnings for C++17+.
The `std::result_of` meta struct is deprecated in C++17 and removed
in C++20. It was still slipping through due to a faulty definition of
`EIGEN_HAS_STD_RESULT_OF`.

Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and
`Eigen::internal::invoke_result` implementation with fallback for
pre C++17.

Replaces the `result_of` definition with one based on `std::invoke_result`
for C++17 and higher.

For completeness, added nullary op support for c++03.

Fixes #1850.
2021-02-24 21:36:14 +00:00
Chip-Kerchner
8523d447a1 Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins. 2021-02-24 20:49:15 +00:00
Antonio Sanchez
5908aeeaba Fix CUDA device new and delete, and add test.
HIP does not support new/delete on device, so test is skipped.
2021-02-24 11:31:41 -08:00
Antonio Sanchez
119763cf38 Eliminate CMake FindPackageHandleStandardArgs warnings.
CMake complains that the package name does not match when the case
differs, e.g.:
```
CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
  The package name passed to `find_package_handle_standard_args` (UMFPACK)
  does not match the name of the calling package (Umfpack).  This can lead to
  problems in calling code that expects `find_package` result variables
  (e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args)
  bench/spbench/CMakeLists.txt:24 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.
```
Here we rename the libraries to match their true cases.
2021-02-24 09:52:05 +00:00
Antonio Sanchez
6cf0ab5e99 Disable fast psqrt for NEON.
Accuracy is too poor - requires at least two Newton iterations, but then
it is no longer significantly faster than `vsqrt`.

Fixes #2094.
2021-02-23 19:52:55 -08:00
Antonio Sanchez
aba3998278 Fix check if GPU compile phase for std::hash 2021-02-23 19:52:08 -08:00
Antonio Sanchez
db5691ff2b Fix some CUDA warnings.
Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on GPU.

`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`.  Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.

Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability,
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.

Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang.  Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen
88d4c6d4c8 Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that
make pow<double> accurate the 1 ULP. Speed for AVX-512 is within 0.5% of the currect
implementation.
2021-02-23 23:11:03 +00:00
Adam Shapiro
2ac0b78739 Fixed sparse conservativeResize() when both num cols and rows decreased.
The previous implementation caused a buffer overflow trying to calculate non-
zero counts for columns that no longer exist.
2021-02-23 21:32:39 +00:00
Chip-Kerchner
10c77b0ff4 Fix compilation errors with later versions of GCC and use of MMA. 2021-02-22 15:01:47 -06:00
Christoph Hertzberg
73922b0174 Fixes Bug #1925. Packets should be passed by const reference, even to inline functions. 2021-02-20 18:56:42 +01:00
Antonio Sanchez
5f9cfb2529 Add missing adolc isinf/isnan.
Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match install layout.

Fixed: #2157
2021-02-19 22:26:56 +00:00
Christoph Hertzberg
ce4af0b38f Missing change regarding #1910 2021-02-19 20:51:35 +01:00
Christoph Hertzberg
a7749c09bc Bug #1910: Make SparseCholesky work for RowMajor matrices 2021-02-19 19:36:18 +01:00
Antonio Sánchez
128eebf05e Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)."
This reverts commit 12fd3dd655
2021-02-19 17:09:16 +00:00
frgossen
33e0af0130 Return nan at poles of polygamma, digamma, and zeta if limit is not defined 2021-02-19 16:35:11 +00:00
Rasmus Munk Larsen
7f09d3487d Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp. 2021-02-18 20:49:18 +00:00
Masaki Murooka
12fd3dd655 add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC). 2021-02-17 22:55:47 +00:00
David Tellenbach
aa8b22e776 Bump to 3.4.99 2021-02-17 23:23:17 +01:00
David Tellenbach
5336ad8591 Define internal::make_unsigned for [unsigned]long long on macOS.
macOS defines int64_t as long long even for C++03 and therefore expects
a template specialization

  internal::make_unsigned<long long>,

for C++03. Since other platforms define int64_t as long for C++03 we
cannot add the specialization for all cases.
2021-02-17 23:03:10 +01:00
Antonio Sanchez
0845df7f77 Fix uninitialized warning on AVX. 2021-02-17 13:13:39 -08:00
Chip Kerchner
9b51dc7972 Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product 2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen
be0574e215 New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double. 2021-02-17 02:50:32 +00:00
Antonio Sanchez
7ff0b7a980 Updated pfrexp implementation.
The original implementation fails for 0, denormals, inf, and NaN.

See #2150
2021-02-17 02:23:24 +00:00
David Tellenbach
9ad4096ccb Document possible inconsistencies when using Matrix<bool, ...> 2021-02-17 00:50:26 +01:00
Ashutosh Sharma
f702792a7c missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel) 2021-02-16 16:33:59 +00:00
Jan van Dijk
db61b8d478 Avoid -Wunused warnings in NDEBUG builds.
In two places in SuperLUSupport.h, a local variable 'size' is
created that is used only inside an eigen_assert. Remove these,
just fetch the required values inside the assert statements.
This avoids annoying -Wunused warnings (and -Werror=unused errors)
in NDEBUG builds.
2021-02-12 18:35:35 +00:00
David Tellenbach
622c598944 Don't allow all test jobs to fail but only the currently failing ones. 2021-02-12 14:01:17 +01:00
Antonio Sanchez
90ee821c56 Use vrsqrts for rsqrt Newton iterations.
It's slightly faster and slightly more accurate, allowing our current
packetmath tests to pass for sqrt with a single iteration.
2021-02-11 11:33:51 -08:00
Antonio Sanchez
9fde9cce5d Adjust bounds for pexp_float/double
The original clamping bounds on `_x` actually produce finite values:
```
  exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38

  exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.

This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
2021-02-10 22:48:05 +00:00
Antonio Sanchez
4cb563a01e Fix ldexp implementations.
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits.  See #2131 for a complete discussion,
and !375 for other possible implementations.

Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.

The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.

Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.

Fixes #2131.
2021-02-10 22:45:41 +00:00
Ashutosh Sharma
7eb07da538 loop less ptranspose 2021-02-10 10:21:37 -08:00
David Tellenbach
36200b7855 Remove vim specific comments to recognoize correct file-type.
As discussed in #2143 we remove editor specific comments.
2021-02-09 09:13:09 +01:00
David Tellenbach
54589635ad Replace nullptr by NULL in SparseLU.h to be C++03 compliant. 2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas
984d010b7b add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves 2021-02-08 22:00:31 +00:00
Nikolaus Demmel
b578930657 Fix documentation typos in LDLT.h 2021-02-08 21:07:29 +00:00
Antonio Sanchez
66841ea070 Enable bdcsvd on host.
Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation
is skipped, leading to a linker error.  This prevents it from running on
the host as well.

Seems it was disabled 6 years ago (5384e891) to match `jacobiSvd`, but
`jacobiSvd` is now enabled on host.  Tested and runs fine on host, but
will not compile/run for device (though it's not labelled as a device
function, so this should be fine).

Fixes #2139
2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen
6e3b795f81 Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one. 2021-02-05 16:58:49 -08:00
Antonio Sanchez
abcde69a79 Disable vectorized pow for half/bfloat16.
We are potentially seeing some accuracy issues with these.  Ideally we
would hand off to `float`, but that's not trivial with the current
setup.

We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
2021-02-05 12:17:34 -08:00
Antonio Sanchez
f85038b7f3 Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.

By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).

This is a replacement of !379.  See there for further discussion.

Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.

Fixes #2138.
2021-02-03 09:01:48 -08:00
Antonio Sanchez
56c8b14d87 Eliminate implicit conversions from float to double. 2021-02-01 15:31:01 -08:00
Antonio Sanchez
fb4548e27b Implement bit_* for device.
Unfortunately `std::bit_and` and the like are host-only functions prior
to c++14 (since they are not `constexpr`).  They also never exist in the
global namespace, so the current implementation  always fails to compile via
NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global
namespace on device.

To overcome these limitations, we implement these functionals here.
2021-02-01 13:27:45 -08:00
Antonio Sanchez
1615a27993 Fix altivec packetmath.
Allows the altivec packetmath tests to pass.  There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.

Related to !370, #2128

cc: @ChipKerchner @pdrocaldeira

Tested on `_BIG_ENDIAN` running on QEMU with VSX.  Couldn't figure out build
flags to get it to work for little endian.
2021-01-28 18:37:09 +00:00
Chip Kerchner
1414e2212c Fix clang compilation for AltiVec from previous check-in 2021-01-28 18:36:40 +00:00
David Tellenbach
170a504c2f Add the following functions
DenseBase::setConstant(NoChange_t, Index, const Scalar&)
  DenseBase::setConstant(Index, NoChange_t, const Scalar&)

to close #663.
2021-01-28 15:13:07 +01:00
David Tellenbach
598e1b6e54 Add the following functions:
DenseBase::setZero(NoChange_t, Index)
  DenseBase::setZero(Index, NoChange_t)
  DenseBase::setOnes(NoChange_t, Index)
  DenseBase::setOnes(Index, NoChange_t)
  DenseBase::setRandom(NoChange_t, Index)
  DenseBase::setRandom(Index, NoChange_t)

This closes #663.
2021-01-28 01:10:36 +01:00
Gael Guennebaud
0668c68b03 Allow for negative strides.
Note that using a stride of -1 is still not possible because it would
clash with the definition of Eigen::Dynamic.

This fixes #747.
2021-01-27 23:32:12 +01:00
Samir Benmendil
288d456c29 Replace language_support module with builtin CheckLanguage
The workaround_9220 function was introduced a long time ago to
workaround a CMake issue with enable_language(OPTIONAL). Since then
CMake has clarified that the OPTIONAL keywords has not been
implemented[0].

A CheckLanguage module is now provided with CMake to check if a language
can be enabled. Use that instead.

[0] https://cmake.org/cmake/help/v3.18/command/enable_language.html
2021-01-27 13:26:40 +00:00
Antonio Sanchez
3f4684f87d Include <cstdint> in one place, remove custom typedefs
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.

To avoid this, we remove the custom `typedef` definitions for win32.  The
Tensor module requires C++11 anyways, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.

Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
2021-01-26 14:23:05 -08:00
Chip Kerchner
0784d9f87b Fix sqrt, ldexp and frexp compilation errors. 2021-01-25 15:22:19 -06:00
Gmc2
a4edb1079c fix test of ExtractVolumePatchesOp 2021-01-25 03:23:46 +00:00
Antonio Sanchez
4c42d5ee41 Eliminate implicit conversion warning in test/array_cwise.cpp 2021-01-23 11:54:00 -08:00
Antonio Sanchez
e0d13ead90 Replace std::isnan with numext::isnan for c++03 2021-01-23 11:02:35 -08:00
Florian Maurin
c35965b381 Remove unused variable in SparseLU.h 2021-01-22 22:24:11 +00:00
Antonio Sanchez
f0e46ed5d4 Fix pow and other cwise ops for half/bfloat16.
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
in `GenericPacketMathFunctions` to remove `constexpr`.

While adding tests for half/bfloat16, found other issues related to
implicit conversions.

Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types.  These seem to be  implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Antonio Sanchez
f19bcffee6 Specialize std::complex operators for use on GPU device.
NVCC and older versions of clang do not fully support `std::complex` on device,
leading to either compile errors (Cannot call `__host__` function) or worse,
runtime errors (Illegal instruction).  For most functions, we can
implement specialized `numext` versions. Here we specialize the standard
operators (with the exception of stream operators and member function operators
with a scalar that are already specialized in `<complex>`) so they can be used
in device code as well.

To import these operators into the current scope, use
`EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into
the `Eigen`, `Eigen:internal`, and `Eigen::numext` namespaces.

This allow us to remove specializations of the
sum/difference/product/quotient ops, and allow us to treat complex
numbers like most other scalars (e.g. in tests).
2021-01-22 18:19:19 +00:00
David Tellenbach
65e2169c45 Add support for Arm SVE
This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently *sizeless*. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is *length agnostic*).

During compilation the flag `-msve-vector-bits=N` has to be set where `N` is a power of two in the range of `128`to `2048`, indicating the length of an SVE vector.

Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`.

This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements.

This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.
2021-01-21 21:11:57 +00:00
Antonio Sanchez
b2126fd6b5 Fix pfrexp/pldexp for half.
The recent addition of vectorized pow (!330) relies on `pfrexp` and
`pldexp`.  This was missing for `Eigen::half` and `Eigen::bfloat16`.
Adding tests for these packet ops also exposed an issue with handling
negative values in `pfrexp`, returning an incorrect exponent.

Added the missing implementations, corrected the exponent in `pfrexp1`,
and added `packetmath` tests.
2021-01-21 19:32:28 +00:00
Antonio Sanchez
25d8498f8b Fix stable_norm_1 test.
Test enters an infinite loop if size is 1x1 when choosing to select
unique indices for adding `inf` and `NaN` to the input. Here we
revert to non-unique indices, and split the `hypotNorm` check into
two cases: one where both `inf` and `NaN` are added, and one where
only `NaN` is added.
2021-01-21 09:44:42 -08:00
David Tellenbach
660c6b857c Remove std::cerr in iterative solver since we don't have iostream.
This fixes #2123
2021-01-21 11:40:05 +01:00
Antonio Sanchez
d5b7981119 Fix signed-unsigned comparison.
Hex literals are interpreted as unsigned, leading to a comparison between
signed max supported function `abcd[0]`  (which was negative) to the unsigned
literal `0x80000006`.  Should not change result since signed is
implicitly converted to unsigned for the comparison, but eliminates the
warning.
2021-01-20 08:34:00 -08:00
Ivan Popivanov
e409795d6b Proper CPUID 2021-01-18 17:10:11 +00:00
Rasmus Munk Larsen
cdd8fdc32e Vectorize pow(x, y). This closes https://gitlab.com/libeigen/eigen/-/issues/2085, which also contains a description of the algorithm.
I ran some testing (comparing to `std::pow(double(x), double(y)))` for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]`, and `y` in `{2, sqrt(2), -sqrt(2)}` I get the following error statistics:

```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```

If I widen the range to all normal float I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:

```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```

which seems reasonable, since these results are subnormals with only couple of significant bits left.
2021-01-18 13:25:16 +00:00
Antonio Sanchez
bde6741641 Improved std::complex sqrt and rsqrt.
Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously
`complex_sqrt` was only used for CUDA and MSVC), and implements
custom `complex_rsqrt`.

Also introduces `numext::rsqrt` to simplify implementation, and modified
`numext::hypot` to adhere to IEEE IEC 6059 for special cases.

The `complex_sqrt` and `complex_rsqrt` implementations were found to be
significantly faster than `std::sqrt<std::complex<T>>` and
`1/numext::sqrt<std::complex<T>>`.

Benchmark file attached.
```
GCC 10, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           9.21 ns         9.21 ns     73225448
BM_StdSqrt<std::complex<float>>        17.1 ns         17.1 ns     40966545
BM_Sqrt<std::complex<double>>          8.53 ns         8.53 ns     81111062
BM_StdSqrt<std::complex<double>>       21.5 ns         21.5 ns     32757248
BM_Rsqrt<std::complex<float>>          10.3 ns         10.3 ns     68047474
BM_DivSqrt<std::complex<float>>        16.3 ns         16.3 ns     42770127
BM_Rsqrt<std::complex<double>>         11.3 ns         11.3 ns     61322028
BM_DivSqrt<std::complex<double>>       16.5 ns         16.5 ns     42200711

Clang 11, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           7.46 ns         7.45 ns     90742042
BM_StdSqrt<std::complex<float>>        16.6 ns         16.6 ns     42369878
BM_Sqrt<std::complex<double>>          8.49 ns         8.49 ns     81629030
BM_StdSqrt<std::complex<double>>       21.8 ns         21.7 ns     31809588
BM_Rsqrt<std::complex<float>>          8.39 ns         8.39 ns     82933666
BM_DivSqrt<std::complex<float>>        14.4 ns         14.4 ns     48638676
BM_Rsqrt<std::complex<double>>         9.83 ns         9.82 ns     70068956
BM_DivSqrt<std::complex<double>>       15.7 ns         15.7 ns     44487798

Clang 9, Pixel 2, aarch64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           24.2 ns         24.1 ns     28616031
BM_StdSqrt<std::complex<float>>         104 ns          103 ns      6826926
BM_Sqrt<std::complex<double>>          31.8 ns         31.8 ns     22157591
BM_StdSqrt<std::complex<double>>        128 ns          128 ns      5437375
BM_Rsqrt<std::complex<float>>          31.9 ns         31.8 ns     22384383
BM_DivSqrt<std::complex<float>>        99.2 ns         98.9 ns      7250438
BM_Rsqrt<std::complex<double>>         46.0 ns         45.8 ns     15338689
BM_DivSqrt<std::complex<double>>        119 ns          119 ns      5898944
```
2021-01-17 08:50:57 -08:00
Maozhou, Ge
21a8a2487c fix paddings of TensorVolumePatchOp 2021-01-15 11:51:49 +08:00
Guoqiang QI
38ae5353ab 1)provide a better generic paddsub op implementation
2)make paddsub op support the Packet2cf/Packet4f/Packet2f in NEON
3)make paddsub op support the Packet2cf/Packet4f in SSE
2021-01-13 22:54:03 +00:00
Antonio Sanchez
352f1422d3 Remove inf local variable.
Apparently `inf` is a macro on iOS for `std::numeric_limits<T>::infinity()`,
causing a compile error here. We don't need the local anyways since it's
only used in one spot.
2021-01-12 10:33:15 -08:00
Antonio Sanchez
2044084979 Remove TODO from Transform::computeScaleRotation()
Upon investigation, `JacobiSVD` is significantly faster than `BDCSVD`
for small matrices (twice as fast for 2x2, 20% faster for 3x3,
1% faster for 10x10).  Since the majority of cases will be small,
let's stick with `JacobiSVD`.  See !361.
2021-01-11 11:30:01 -08:00
Antonio Sanchez
3daf92c7a5 Transform::computeScalingRotation flush determinant to +/- 1.
In the previous code, in attempting to correct for a negative
determinant, we end up multiplying and dividing by a number that
is often very near, but not exactly +/-1.  By flushing to +/-1,
we can replace a division with a multiplication, and results
are more numerically consistent.
2021-01-11 10:13:38 -08:00
Antonio Sanchez
587fd6ab70 Only specialize complex sqrt_impl for CUDA if not MSVC.
We already specialize `sqrt_impl` on windows due to MSVC's mishandling
of `inf` (!355).
2021-01-11 09:15:45 -08:00
Deven Desai
2a6addb4f9 Fix for breakage in ROCm support - 210108
The following commit breaks ROCm support for Eigen
f149e0ebc3

All unit tests fail with the following error

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:19:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:166:
/home/rocm-user/eigen/Eigen/src/Core/MathFunctionsImpl.h:105:35: error: __host__ __device__ function 'complex_sqrt' cannot overload __host__ function 'complex_sqrt'
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
                                  ^
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:342:38: note: previous declaration is here
template<typename T> std::complex<T> complex_sqrt(const std::complex<T>& a_x);
                                     ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16625: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2

```

The error message is accurate, and the fix (provided in thsi commit) is trivial.
2021-01-08 18:04:40 +00:00
Antonio Sanchez
f149e0ebc3 Fix MSVC complex sqrt and packetmath test.
MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`.
Here we replace it with a custom version (currently used on GPU).

Also fixed the `packetmath` test, which previously skipped several
corner cases since `CHECK_CWISE1` only tests the first `PacketSize`
elements.
2021-01-08 01:17:19 +00:00
Antonio Sanchez
8d9cfba799 Fix rand test for MSVC.
MSVC's uniform random number generator is not quite as uniform as
others, requiring a slightly wider threshold on the histogram test.
After inspecting histograms for several runs, there's no obvious
bias -- just some bins end up having slightly more less elements
(often > 2% but less than 2.5%).
2021-01-07 12:48:40 -08:00
Essex Edwards
e741b43668 Make Transform::computeRotationScaling(0,&S) continuous 2021-01-07 17:45:14 +00:00
David Tellenbach
0bdc0dba20 Add missing #endif directive in Macros.h 2021-01-07 12:32:41 +01:00
shrek1402
cb654b1c45 #define was defined incorrectly because the result_of function was deprecated in c++17 and removed in c++20. Also, EIGEN_COMP_MSVC (which is _MSC_VER) only affects result_of indirectly, which can cause errors. 2021-01-07 10:12:25 +00:00
Antonio Sanchez
52d1dd979a Fix Ref initialization.
Since `eigen_assert` is a macro, the statements can become noops (e.g.
when compiling for GPU), so they may not execute the contained logic -- which
in this case is the entire `Ref` construction.  We need to separate the assert
from statements which have consequences.

Fixes #2113
2021-01-06 13:14:20 -08:00
Antonio Sanchez
166fcdecdb Allow CwiseUnaryView to be used on device.
Added `EIGEN_DEVICE_FUNC` to methods.
2021-01-06 09:16:52 -08:00
Antonio Sanchez
bb1de9dbde Fix Ref Stride checks.
The existing `Ref` class failed to consider cases where the Ref's
`Stride` setting *could* match the underlying referred object's stride,
but **didn't** at runtime.  This led to trying to set invalid stride values,
causing runtime failures in some cases, and garbage due to mismatched
strides in others.

Here we add the missing runtime checks.  This involves computing the
strides necessary to align with the referred object's storage, and
verifying we can actually set those strides at runtime.

In the `const` case, if it *may* be possible to refer to the original
storage at compile-time but fails at runtime, then we defer to the
`construct(...)` method that makes a copy.

Added more tests to check these cases.

Fixes #2093.
2021-01-05 10:41:25 -08:00
Christoph Hertzberg
12dda34b15 Eliminate boolean product warnings by factoring out a
`combine_scalar_factors` helper function.
2021-01-05 18:15:30 +00:00
Antonio Sanchez
070d303d56 Add CUDA complex sqrt.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.

Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.

Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
2020-12-22 23:25:23 -08:00
rgreenblatt
fdf2ee62c5 Fix missing EIGEN_DEVICE_FUNC 2020-12-20 23:22:53 -05:00
Rasmus Munk Larsen
05754100fe * Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup.
* Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA.
* Remove iterative psqrt<double> for NEON, because the initial rsqrt apprimation is not accurate enough for convergence in 2 Newton-Raphson steps and with 3 steps, just calling the builtin sqrt insn is faster.

The following benchmarks were compiled with clang "-O2 -fast-math -mfma" and with and without -mavx.

AVX+FMA (float)

name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 1%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.08ns ± 1%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  95.5ns ± 0%    ~
BM_eigen_sqrt_float/4k     776ns ± 0%   763ns ± 0%  -1.67%
BM_eigen_sqrt_float/32k   6.57µs ± 1%  6.13µs ± 0%  -6.69%
BM_eigen_sqrt_float/256k  83.7µs ± 3%  83.3µs ± 2%    ~
BM_eigen_sqrt_float/1M     335µs ± 2%   332µs ± 2%    ~

SSE+FMA (float)
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 0%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.06ns ± 0%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  96.3ns ± 4%    ~
BM_eigen_sqrt_float/4k     774ns ± 0%   763ns ± 0%  -1.50%
BM_eigen_sqrt_float/32k   6.58µs ± 2%  6.11µs ± 0%  -7.06%
BM_eigen_sqrt_float/256k  82.7µs ± 1%  82.6µs ± 1%    ~
BM_eigen_sqrt_float/1M     330µs ± 1%   329µs ± 2%    ~

SSE+FMA (double)
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.08ns ± 0%   -6.68%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.65%
BM_eigen_sqrt_double/512     417ns ± 0%   374ns ± 1%  -10.29%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -11.00%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.7µs ± 0%  -11.07%
BM_eigen_sqrt_double/256k    213µs ± 0%   206µs ± 1%   -3.31%
BM_eigen_sqrt_double/1M      862µs ± 0%   870µs ± 2%   +0.96%

AVX+FMA (double)
name                        old cpu/op  new cpu/op  delta
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.06ns ± 0%   -6.95%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.80%
BM_eigen_sqrt_double/512     417ns ± 0%   373ns ± 1%  -10.59%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -10.79%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.8µs ± 0%  -10.94%
BM_eigen_sqrt_double/256k    214µs ± 0%   208µs ± 2%   -2.76%
BM_eigen_sqrt_double/1M      866µs ± 3%   923µs ± 7%     ~
2020-12-16 18:16:11 +00:00
Turing Eret
3bee9422d6 Merge branch 'lambdaknight/eigen-master' 2020-12-16 09:18:24 -07:00
Turing Eret
19e6496ce0 Replace call to FixedDimensions() with a singleton instance of
FixedDimensions.
2020-12-16 07:34:44 -07:00
Rasmus Munk Larsen
6cee8d347e Add an additional step of Newton-Raphson for psqrt<double> on Arm, which otherwise has an error of ~1000 ulps. 2020-12-15 04:06:41 +00:00
Turing Eret
bc7d1599fb TensorStorage with FixedDimensions now has zero instance memory overhead.
Removed m_dimension as instance member of TensorStorage with
FixedDimensions and instead use the template parameter. This
means that the sizeof a pure fixed-size storage is exactly
equal to the data it is storing.
2020-12-14 07:19:34 -07:00
Alexander Grund
cf0b5b0344 Remove code checking for CMake < 3.5
As the CMake version is at least 3.5 the code checking for earlier versions can be removed.
2020-12-14 09:57:44 +00:00
David Tellenbach
751f18f2c0 Remove comma at the end of enumeration list to silence C++03 warnings 2020-12-13 18:11:02 +01:00
Antonio Sanchez
5dc2fbabee Fix implicit cast to double.
Triggers `-Wimplicit-float-conversion`, causing a bunch of build errors
in Google due to `-Wall`.
2020-12-12 09:26:20 -08:00
Antonio Sanchez
55967f87d1 Fix NEON pmax<PropagateNumbers,Packet4bf>.
Simple typo, the max impl called pmin instead of pmax for floats.
2020-12-11 21:50:52 -08:00
Antonio Sanchez
839aa505c3 Fix typo in AVX512 packet math. 2020-12-11 21:35:44 -08:00
David Tellenbach
536c8a79f2 Remove unused macro in Half.h 2020-12-12 00:53:26 +01:00
Antonio Sanchez
8c9976d7f0 Fix more SSE/AVX packet conversions for peven.
MSVC doesn't like function-style casts and forces us to use intrinsics.
2020-12-11 15:46:42 -08:00
Antonio Sanchez
c6efc4e0ba Replace M_LOG2E and M_LN2 with custom macros.
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included.  However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).

To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
2020-12-11 14:34:31 -08:00
Antonio Sanchez
e82722a4a7 Fix MSVC SSE casts.
MSVC doesn't like __m128(__m128i) c-style casts, so packets need to be
converted using intrinsic methods.
2020-12-11 08:52:59 -08:00
Deven Desai
f3d2ea48f5 Fix for broken ROCm/HIP Support
The following commit introduced a breakage in ROCm/HIP support for Eigen.

5ec4907434 (1958e65719641efe5483abc4ce0b61806270f6f3_525_517)

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:222:
/home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'?
  return half2half2(from);
         ^~~~~~~~~~
         __half2half2
/opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here
            __half2 __half2half2(__half x)
                    ^
1 error generated when compiling for gfx900.

```

The cause seems to be a copy-paster error, and the fix is trivial
2020-12-11 16:14:57 +00:00
David Tellenbach
c7eb3a74cb Don't guard psqrt for std::complex<float> with EIGEN_ARCH_ARM64 2020-12-11 12:41:52 +01:00
Everton Constantino
bccf055a7c Add Armv8 guard on PropagateNumbers implementation. 2020-12-10 22:01:55 -03:00
Antonio Sanchez
82c0c18a83 Remove private access of std::deque::_M_impl.
This no longer works on gcc or clang, so we should just remove the hack.
The default should compile to similar code anyways.
2020-12-10 14:59:34 -08:00
David Tellenbach
00be0a7ff3 Fix vectorization of complex sqrt on NEON 2020-12-10 15:23:23 +00:00
David Tellenbach
8eb461a431 Remove comma at end of enumerator list in NEON PacketMath 2020-12-10 15:22:55 +01:00
David Tellenbach
2e8f850c78 Fix a typo in SparseMatrix documentation.
This fixes issue #2091.
2020-12-09 14:48:24 +01:00
Rasmus Munk Larsen
125cc9a5df Implement vectorized complex square root.
Closes #1905

Measured speedup for sqrt of `complex<float>` on Skylake:

SSE:
```
name                      old time/op             new time/op  delta
BM_eigen_sqrt_ctype/1     49.4ns ± 0%             54.3ns ± 0%  +10.01%
BM_eigen_sqrt_ctype/8      332ns ± 0%               50ns ± 1%  -84.97%
BM_eigen_sqrt_ctype/64    2.81µs ± 1%             0.38µs ± 0%  -86.49%
BM_eigen_sqrt_ctype/512   23.8µs ± 0%              3.0µs ± 0%  -87.32%
BM_eigen_sqrt_ctype/4k     202µs ± 0%               24µs ± 2%  -88.03%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%             0.19ms ± 0%  -88.18%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%              1.5ms ± 1%  -88.20%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%              6.2ms ± 0%  -88.18%
```

AVX2:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.6ns ± 0%  55.6ns ± 0%   +3.71%
BM_eigen_sqrt_ctype/8      334ns ± 0%    27ns ± 0%  -91.86%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.22µs ± 2%  -92.28%
BM_eigen_sqrt_ctype/512   23.8µs ± 1%   1.7µs ± 1%  -92.81%
BM_eigen_sqrt_ctype/4k     201µs ± 0%    14µs ± 1%  -93.24%
BM_eigen_sqrt_ctype/32k   1.62ms ± 0%  0.11ms ± 1%  -93.29%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.9ms ± 1%  -93.31%
BM_eigen_sqrt_ctype/1M    52.0ms ± 0%   3.5ms ± 1%  -93.31%
```

AVX512:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.7ns ± 0%  56.2ns ± 1%   +4.75%
BM_eigen_sqrt_ctype/8      334ns ± 0%    18ns ± 2%  -94.63%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.12µs ± 1%  -95.54%
BM_eigen_sqrt_ctype/512   23.9µs ± 1%   1.0µs ± 1%  -95.89%
BM_eigen_sqrt_ctype/4k     202µs ± 0%     8µs ± 1%  -96.13%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%  0.06ms ± 1%  -96.15%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.5ms ± 4%  -96.11%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%   2.0ms ± 1%  -96.13%
```
2020-12-08 18:13:35 -08:00
Antonio Sanchez
8cfe0db108 Fix host/device calls for __half.
The previous code had `__host__ __device__` functions calling `__device__`
functions (e.g. `__low2half`) which caused build failures in tensorflow.
Also tried to simplify the `#ifdef` guards to make them more clear.
2020-12-08 20:31:02 +00:00
Everton Constantino
baf9d762b7 - Enabling PropagateNaN and PropagateNumbers for NEON.
- Adding propagate tests to bfloat16.
2020-12-08 17:05:05 +00:00
Antonio Sanchez
634bd79b0e Fix unused warning on new dense_assignment_loop impl. 2020-12-07 19:14:21 -08:00
Antonio Sanchez
655c3a4042 Add specialization for compile-time zero-sized dense assignment.
In the current `dense_assignment_loop` implementations, if the
destination's inner or outer size is zero at compile time and if the kernel
involves a product, we currently get a compile error (#2080).  This is
triggered by attempting to multiply a non-existent row by a column (or
vice-versa).

To address this, we add a specialization for zero-sized assignments
(`AllAtOnceTraversal`) which evaluates to a no-op. We also add a static
check to ensure the size is in-fact zero. This now seems to be the only
existing use of `AllAtOnceTraversal`.

Fixes #2080.
2020-12-07 08:38:43 -08:00
Antonio Sanchez
5ec4907434 Clean up #ifs in GPU PacketPath.
Removed redundant checks and redundant code for CUDA/HIP.

Note: there are several issues here of calling `__device__` functions
from `__host__ __device__` functions, in particular `__low2half`.
We do not address that here -- only modifying this file enough
to get our current tests to compile.

Fixed: #1847
2020-12-04 16:14:03 -08:00
Rasmus Munk Larsen
f9fac1d5b0 Add log2() to Eigen. 2020-12-04 21:45:09 +00:00
Antonio Sanchez
2dbac2f99f Fix bad NEON fp16 check 2020-12-04 13:42:18 -08:00
Antonio Sanchez
e2f21465fe Special function implementations for half/bfloat16 packets.
Current implementations fail to consider half-float packets, only
half-float scalars.  Added specializations for packets on AVX, AVX512 and
NEON.  Added tests to `special_packetmath`.

The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.

Tested with AVX, AVX512.
2020-12-04 10:16:29 -08:00
David Tellenbach
305b8bd277 Remove duplicate #if clause 2020-12-04 18:55:46 +01:00
Antonio Sanchez
9ee9ac81de Fix shfl* macros for CUDA/HIP
The `shfl*` functions are `__device__` only, and adjusted `#ifdef`s so
they are defined whenever the corresponding CUDA/HIP ones are.

Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.

Fixes #2083
2020-12-04 17:18:32 +00:00
shrek1402
a9a2f2bebf The function 'prefetch' did not work correctly on the win64 platform 2020-12-04 17:18:08 +00:00
Rasmus Munk Larsen
f23dc5b971 Revert "Add log2() operator to Eigen"
This reverts commit 4d91519a9b.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b Add log2() operator to Eigen 2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen
25d8ae7465 Small cleanup of generic plog implementations:
Adding the term e*ln(2) is split into two step for no obvious reason.
This dates back to the original Cephes code from which the algorithm is adapted.
It appears that this was done in Cephes to prevent the compiler from reordering
the addition of the 3 terms in the approximation

  log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)

which must be added in reverse order since |x| < (sqrt(2)-1).

This allows rewriting the code to just 2 pmadd and 1 padd instructions,
which on a Skylake processor speeds up the code by 5-7%.
2020-12-03 19:40:40 +00:00
Antonio Sanchez
eb4d4ae070 Include chrono in main for c++11.
Hack to fix tensor tests, since min/max are overridden by `main.h`.
2020-12-03 11:27:32 -08:00
Rasmus Munk Larsen
71c85df4c1 Clean up the Tensor header and get rid of the EIGEN_SLEEP macro. 2020-12-02 11:04:04 -08:00
Antonio Sanchez
70fbcf82ed Fix typo in F32MaskToBf16Mask. 2020-12-02 07:58:34 -08:00
Antonio Sanchez
2627e2f2e6 Fix neon cmp* functions for bf16.
The current impl corrupts the comparison masks when converting
from float back to bfloat16.  The resulting masks are then
no longer all zeros or all ones, which breaks when used with
`pselect` (e.g. in `pmin<PropagateNumbers>`).  This was
causing `packetmath_15` to fail on arm.

Introducing a simple `F32MaskToBf16Mask` corrects this (takes
the lower 16-bits for each float mask).
2020-12-02 01:29:34 +00:00
Antonio Sanchez
ddd48b242c Implement CUDA __shfl* for Eigen::half
Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test are broken, as well as several ops in Tensorflow. The gpu functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float.
Here we add the required specializations.
2020-12-01 14:36:52 -08:00
Rasmus Munk Larsen
e57281a741 Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available. 2020-12-01 11:31:47 -08:00
Antonio Sanchez
1992af3de2 Fix #2077, EIGEN_CONSTEXPR in Half.
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`.  This shouldn't affect anything else, since
it is only used in `a bit_cast<uint16_t,half>()` which is not itself
`constexpr`.

Fixes #2077.
2020-12-01 03:10:21 +00:00
acxz
7b80609d49 add EIGEN_DEVICE_FUNC to methods 2020-12-01 03:08:47 +00:00
Antonio Sanchez
89f90b585d AVX512 missing ops.
This allows the `packetmath` tests to pass for AVX512 on skylake.
Made `half` and `bfloat16` consistent in terms of ops they support.

Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
2020-11-30 16:28:57 +00:00
Florian Maurin
c5985c46f5 Fix typo in doc 2020-11-30 10:53:29 +00:00
Jim Lersch
68f69414f7 Workaround for doxygen class template titles in which the template
part of the class signature is lost due to a problem with forward
declarations.  The problem is probably caused by doxygen bug #7689.
It is confirmed to be fixed in doxygen >= 1.8.19.
2020-11-27 19:52:16 -07:00
Jim Lersch
a7170f2aca Fix doxygen class blocks that were not associated with the correct classes. 2020-11-27 08:48:11 -07:00
David Tellenbach
550e8f8f57 Include CMakeDependentOption to be able to use cmake_dependent_option 2020-11-27 13:21:49 +01:00
Bowie Owens
9842366bba Make inclusion of doc sub-directory optional by adjusting options.
Allows exclusion of doc and related targets to help when using eigen via add_subdirectory().

Requested by:

https://gitlab.com/libeigen/eigen/-/issues/1842

Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on EIGEN_BUILD_DOC. This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
2020-11-27 08:11:49 +11:00
filippobrizzi
aa56e1d980 check for include dirs set 2020-11-26 10:22:46 +00:00
Andreas Krebbel
1e74f93d55 Fix some packet-functions in the IBM ZVector packet-math. 2020-11-25 14:11:23 +00:00
Rasmus Munk Larsen
79818216ed Revert "Fix Half NaN definition and test."
This reverts commit c770746d70.
2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen
c770746d70 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-24 20:53:07 +00:00
Antonio Sanchez
22f67b5958 Fix boolean float conversion and product warnings.
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
    Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```

Details:

- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).

- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)

- Deprecated above specialized ops for bool.

- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Antonio Sanchez
a3b300f1af Implement missing AVX half ops.
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`.  Allows `packetmath_13` to pass.

Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
2020-11-24 16:46:41 +00:00
Antonio Sanchez
38abf2be42 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-23 14:13:59 -08:00
Antonio Sanchez
4cf01d2cf5 Update AVX half packets, disable test.
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail.  This disables the test.

Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
2020-11-21 09:05:10 -08:00
Antonio Sanchez
fd1dcb6b45 Fixes duplicate symbol when building blas
Missing inline breaks blas, since symbol generated in
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, `double.cpp`

Changed rest of inlines to `EIGEN_STRONG_INLINE`.
2020-11-20 09:37:40 -08:00
David Tellenbach
6c9c3f9a1a Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool
Both, Eigen::half and Eigen::Bfloat16 are implicitly convertible to
float and can hence be converted to bool via the conversion chain

  Eigen::{half,bfloat16} -> float -> bool

We thus remove the explicit cast operator to bool.
2020-11-19 18:49:09 +01:00
Antonio Sanchez
a8fdcae55d Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix.
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.

Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.

Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.

We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.

Related to #2048
2020-11-18 23:15:33 +00:00
David Tellenbach
11e4056f6b Re-enable Arm Neon Eigen::half packets of size 8
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
2020-11-18 23:02:21 +00:00
Antonio Sanchez
17268b155d Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`.  Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
2020-11-18 20:32:35 +00:00
Antonio Sanchez
41d5d5334b Initialize primitives to fix -Wuninitialized-const-reference.
The `meta` test generates warnings with the latest version of clang due
to passing uninitialized variables as const reference arguments.
```
test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference]
    VERIFY(( check_is_convertible(a.dot(b), f) ));
```
We don't actually use the variables, but initializing them eliminates the
new warning.

Fixes #2067.
2020-11-18 20:23:20 +00:00
Antonio Sanchez
3669498f5a Fix rule-of-3 for the Tensor module.
Adds copy constructors to Tensor ops, inherits assignment operators from
`TensorBase`.

Addresses #1863
2020-11-18 18:14:53 +00:00
Antonio Sanchez
60218829b7 EOF newline added to InverseSize4.
Causing build breakages due to `-Wnewline-eof -Werror` that seems to be
common across Google.
2020-11-18 07:58:33 -08:00
Rasmus Munk Larsen
2d63706545 Add missing parens around macro argument. 2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen
6bba58f109 Replace SSE_SHUFFLE_MASK macro with shuffle_mask. 2020-11-17 15:28:37 -08:00
David Tellenbach
e9b55c4db8 Avoid promotion of Arm __fp16 to float in Neon PacketMath
Using overloaded arithmetic operators for Arm __fp16 always
causes a promotion to float. We replace operator* by vmulh_f16
to avoid this.
2020-11-17 20:19:44 +01:00
Antonio Sanchez
117a4c0617 Fix missing EIGEN_CONSTEXPR pop_macro in Half.
`EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if
`EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
2020-11-17 08:29:33 -08:00
Guoqiang QI
394f564055 Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. 2020-11-17 12:27:01 +00:00
Antonio Sanchez
8e9cc5b10a Eliminate double-promotion warnings.
Clang currently complains about implicit conversions, e.g.
```
test/packetmath.cpp:680:59: warning: implicit conversion increases floating-point precision: 'typename Eigen::internal::random_retval<typename Eigen::internal::global_math_functions_filtering_base<double>::type>::type' (aka 'double') to 'long double' [-Wdouble-promotion]
          data1[0] = Scalar((2 * k + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
                                                        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/packetmath.cpp:681:40: warning: implicit conversion increases floating-point precision: 'float' to 'long double' [-Wdouble-promotion]
          data1[1] = Scalar((2 * k + 2 + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
```

Modified to explicitly cast to double.
2020-11-16 10:39:09 -08:00
acxz
9175f50d6f Add EIGEN_DEVICE_FUNC to TranspositionsBase
Fixes #2057.
2020-11-16 15:37:40 +00:00
Martin Vonheim Larsen
280f4f2407 Enable MathJax in Doxygen.in
Note that HTTPS must be used against the MathJax CDN when hosted on `eigen.tuxfamily.org` (which uses HTTPS) in order to avoid `Mixed Content`-errors from browsers. Using HTTPS for MathJax also works if the Eigen docs are hosted on plain HTTP.
2020-11-16 12:59:13 +00:00
Antonio Sanchez
bb69a8db5d Explicit casts of S -> std::complex<T>
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`.  This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`

The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).

To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`.  We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
2020-11-14 05:50:42 +00:00
Christoph Hertzberg
90f6d9d23e Suppress ignored-attributes warning (same as in vectorization_logic). Remove redundant include and using namespace. 2020-11-13 16:21:53 +01:00
guoqiangqi
8324e5e049 Fix typo in NEON/PacketMath.h 2020-11-13 00:46:41 +00:00
Antonio Sanchez
852513e7a6 Disable testing of OpenGL by default.
The `OpenGLSupport` module contains mostly deprecated features, and the
test is highly GL context-dependent, relies on deprecated GLUT, and
requires a display.  Until the module is updated to support modern
OpenGL and the test to use newer windowing frameworks (e.g. GLFW)
it's probably best to disable the test by default.

The test can be enabled with `cmake -DEIGEN_TEST_OPENGL=ON`.

See #2053 for more details.
2020-11-12 16:15:40 -08:00
Rasmus Munk Larsen
bec72345d6 Simplify expression for inner product fallback in Gemv product evaluator. 2020-11-12 23:43:15 +00:00
Rasmus Munk Larsen
276db21f26 Remove redundant branch for handling dynamic vector*vector. This will be handled by the equivalent branch in the specialization for GemvProduct. 2020-11-12 21:54:56 +00:00
Rasmus Munk Larsen
cf12474a8b Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime.
This speeds up inner products where the one or or both arguments is dynamic for small and medium-sized vectors (up to 32k).

name                           old time/op             new time/op   delta
BM_VecVecStatStat<float>/1     1.64ns ± 0%             1.64ns ± 0%     ~
BM_VecVecStatStat<float>/8     2.99ns ± 0%             2.99ns ± 0%     ~
BM_VecVecStatStat<float>/64    7.00ns ± 1%             7.04ns ± 0%   +0.66%
BM_VecVecStatStat<float>/512   61.6ns ± 0%             61.6ns ± 0%     ~
BM_VecVecStatStat<float>/4k     551ns ± 0%              553ns ± 1%   +0.26%
BM_VecVecStatStat<float>/32k   4.45µs ± 0%             4.45µs ± 0%     ~
BM_VecVecStatStat<float>/256k  77.9µs ± 0%             78.1µs ± 1%     ~
BM_VecVecStatStat<float>/1M     312µs ± 0%              312µs ± 1%     ~
BM_VecVecDynStat<float>/1      13.3ns ± 1%              4.6ns ± 0%  -65.35%
BM_VecVecDynStat<float>/8      14.4ns ± 0%              6.2ns ± 0%  -57.00%
BM_VecVecDynStat<float>/64     24.0ns ± 0%             10.2ns ± 3%  -57.57%
BM_VecVecDynStat<float>/512     138ns ± 0%               68ns ± 0%  -50.52%
BM_VecVecDynStat<float>/4k     1.11µs ± 0%             0.56µs ± 0%  -49.72%
BM_VecVecDynStat<float>/32k    8.89µs ± 0%             4.46µs ± 0%  -49.89%
BM_VecVecDynStat<float>/256k   78.2µs ± 0%             78.1µs ± 1%     ~
BM_VecVecDynStat<float>/1M      313µs ± 0%              312µs ± 1%     ~
BM_VecVecDynDyn<float>/1       10.4ns ± 0%             10.5ns ± 0%   +0.91%
BM_VecVecDynDyn<float>/8       12.0ns ± 3%             11.9ns ± 0%     ~
BM_VecVecDynDyn<float>/64      37.4ns ± 0%             19.6ns ± 1%  -47.57%
BM_VecVecDynDyn<float>/512      159ns ± 0%               81ns ± 0%  -49.07%
BM_VecVecDynDyn<float>/4k      1.13µs ± 0%             0.58µs ± 1%  -49.11%
BM_VecVecDynDyn<float>/32k     8.91µs ± 0%             5.06µs ±12%  -43.23%
BM_VecVecDynDyn<float>/256k    78.2µs ± 0%             78.2µs ± 1%     ~
BM_VecVecDynDyn<float>/1M       313µs ± 0%              312µs ± 1%     ~
2020-11-12 18:02:37 +00:00
Pedro Caldeira
c29935b323 Add support for dynamic dispatch of MMA instructions for POWER 10 2020-11-12 11:31:15 -03:00
acxz
b714dd9701 remove annotation for first declaration of default con/destruction 2020-11-12 04:34:12 +00:00
mehdi-goli
e24a1f57e3 [SYCL Function pointer Issue]: SYCL does not support function pointer inside the kernel, due to the portability issue of a function pointer and memory address space among host and accelerators. To fix the issue, function pointers have been replaced by function objects. 2020-11-12 01:50:28 +00:00
Antonio Sanchez
6961468915 Address issues with openglsupport test.
The existing test fails on several systems due to GL runtime version mismatches,
the use of deprecated features, and memory errors due to improper use of GLUT.
The test was modified to:

- Run within a display function, allowing proper GLUT cleanup.
- Generate dynamic shaders with a supported GLSL version string and output variables.
- Report shader compilation errors.
- Check GL context version before launching version-specific tests.

Note that most of the existing `OpenGLSupport` module and tests rely on deprecated
features (e.g. fixed-function pipeline). The test was modified to allow it to
pass on various systems. We might want to consider removing the module or re-writing
it entirely to support modern OpenGL.  This is beyond the scope of this patch.

Testing of legacy GL (for platforms that support it) can be enabled by defining
`EIGEN_LEGACY_OPENGL`.  Otherwise, the test will try to create a modern context.

Tested on
- MacBook Air (2019), macOS Catalina 10.15.7 (OpenGL 2.1, 4.1)
- Debian 10.6, NVidia Quadro K1200 (OpenGL 3.1, 3.3)
2020-11-11 15:54:43 -08:00
Everton Constantino
348a48682e Fix erroneous forward declaration of boost nvp. 2020-11-10 13:07:34 -03:00
guoqiangqi
82fe059f35 Fix issue2045 which get a error case _mm256_set_m128d op not supported by gcc 7.x 2020-11-04 09:21:39 +08:00
Deven Desai
9d11e2c03e CMakefile update for ROCm 4.0
Starting with ROCm 4.0, the `hipconfig --platform` command will return `amd` (prior return value was `hcc`). Updating the CMakeLists.txt files in the test dirs to account for this change.
2020-10-29 18:06:31 +00:00
Deven Desai
39a038f2e4 Fix for ROCm (and CUDA?) breakage - 201029
The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error

e265f7ed8e

```

Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
  return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
                                                                           ^~~~~~
                                                                           Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
          ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```

The fix is in this commit is trivial. Please review and merge
2020-10-29 15:34:05 +00:00
David Tellenbach
f895755c0e Remove unused functions in Half.h.
The following functions have been removed:

  Eigen::half fabsh(const Eigen::half&)
  Eigen::half exph(const Eigen::half&)
  Eigen::half sqrth(const Eigen::half&)
  Eigen::half powh(const Eigen::half&, const Eigen::half&)
  Eigen::half floorh(const Eigen::half&)
  Eigen::half ceilh(const Eigen::half&)
2020-10-29 07:37:52 +01:00
David Tellenbach
09f015852b Replace numext::as_uint with numext::bit_cast<numext::uint32_t> 2020-10-29 07:28:28 +01:00
David Tellenbach
e265f7ed8e Add support for Armv8.2-a __fp16
Armv8.2-a provides a native half-precision floating point (__fp16 aka.
float16_t). This patch introduces

* __fp16 as underlying type of Eigen::half if this type is available
* the packet types Packet4hf and Packet8hf representing float16x4_t and
  float16x8_t respectively
* packet-math for the above packets with corresponding scalar type Eigen::half

The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com>.

This closes #1940.
2020-10-28 20:15:09 +00:00
mehdi-goli
a725a3233c [SYCL clean up the code] : removing exrta #pragma unroll in SYCL which was causing issues in embeded systems 2020-10-28 08:34:49 +00:00
mehdi-goli
b9ff791fed [Missing SYCL math op]: Addin the missing LDEXP Function for SYCL. 2020-10-28 08:32:57 +00:00
mehdi-goli
61461d682a [Fixing expf issue]: Eigen uses the packet type operation for scaler type float on Sigmoid function(https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/functors/UnaryFunctors.h#L990). As a result SYCL backend breaks since SYCL backend only supports packet operation for vectorized type float4 and double2. The issue has been fixed by adding scalar type float to packet operation pexp for SYCL backend. 2020-10-28 08:30:34 +00:00
Christoph Hertzberg
ecb7bc9514 Bug #2036 make sure find_standard_math_library_test_program actually compiles (and is guaranteed to call math functions) 2020-10-24 15:22:21 +02:00
Susi Lehtola
09f595a269 Make sure compiler does not optimize away calls to math functions 2020-10-24 06:16:50 +00:00
guoqiangqi
28aef8e816 Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f> 2020-10-20 11:37:09 +08:00
guoqiangqi
4a77eda1fd remove unnecessary specialize template of pexp for scale float/double 2020-10-19 00:51:42 +00:00
Antonio Sanchez
d9f0d9eb76 Fix missing pfirst<Packet16b> for MSVC.
It was only defined under one `#ifdef` case.  This fixes the `packetmath_14`
test for MSVC.
2020-10-16 16:22:00 -07:00
Rasmus Munk Larsen
21edea5edd Fix the specialization of pfrexp for AVX to be faster when AVX2/AVX512DQ is not available, and avoid undefined behavior in C++. Also mask off the sign bit when extracting the exponent. 2020-10-15 18:39:58 -07:00
Deven Desai
011e0db31d Fix for ROCm/HIP breakage - 201013
The following commit seems to have introduced regressions in ROCm/HIP support.

183a208212

It causes some unit-tests to fail with the following error

```
...
Eigen/src/Core/GenericPacketMath.h:322:3: error: no member named 'bit_and' in the global namespace; did you mean 'std::bit_and'?
...
Eigen/src/Core/GenericPacketMath.h:329:3: error: no member named 'bit_or' in the global namespace; did you mean 'std::bit_or'?
...
Eigen/src/Core/GenericPacketMath.h:336:3: error: no member named 'bit_xor' in the global namespace; did you mean 'std::bit_xor'?
...
```

The error occurs because, when compiling the device code in HIP/CUDA, the compiler will pick up the some of the std functions (whose calls are prefixed by EIGEN_USING_STD) from the global namespace (i.e. use ::bit_xor instead of std::bit_xor). For this to work, those functions must be declared in the global namespace in the HIP/CUDA header files. The `bit_and`, `bit_or` and `bit_xor` routines are not declared in the HIP header file that contain the decls for the std math functions ( `math_functions.h` ), and this is the cause of the error above.

It seems that the newer HIP compilers do support the calling of `std::` math routines within device code, and the ideal fix here would have been to change all calls to std math functions in EIGEN to use the `std::` namespace (instead of the global namespace ), when compiling  with HIP compiler. However it seems there was a recent commit to remove the EIGEN_USING_STD_MATH macro and collapse it uses into the EIGEN_USING_STD macro ( 4091f6b25c ).

Replacing all std math calls will essentially require re-surrecting the EIGEN_USING_STD_MATH macro, so not choosing that option.

Also HIP compilers only have support std math calls within device code, and not all std functions (specifically not for malloc/free which are prefixed via EIGEN_USING_STD). So modyfing EIGEN_USE_STD implementation to use std:: namspace for HIP will not work either.

Hence going for the ugly solution of special casing the three calls that breaking the HIP compile, to explicitly use the std:: namespace
2020-10-15 12:17:35 +00:00
Rasmus Munk Larsen
6ea8091705 Revert change from 4e4d3f32d1 that broke BFloat16.h build with older compilers. 2020-10-15 01:20:08 +00:00
Guoqiang QI
4700713faf Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops,also unified AVX512 plog<Packet16f> op with generic api 2020-10-15 00:54:45 +00:00
Rasmus Munk Larsen
af6f43d7ff Add specializations for pmin/pmax with prescribed NaN propagation semantics for SSE/AVX/AVX512. 2020-10-14 23:11:24 +00:00
Rasmus Munk Larsen
274ef12b61 Remove leftover debug print statement in cxx11_tensor_expr.cpp 2020-10-14 22:59:51 +00:00
Rasmus Munk Larsen
208b3626d1 Revert generic implementation of predux, since it break compilation of predux_any with MSVC. 2020-10-14 21:41:28 +00:00
David Tellenbach
e3e2cf9d24 Add MatrixBase::cwiseArg() 2020-10-14 01:56:42 +00:00
Rasmus Munk Larsen
61fc78bbda Get rid of nested template specialization in TensorReductionGpu.h, which was broken by c6953f799b. 2020-10-13 23:53:11 +00:00
Rasmus Munk Larsen
c6953f799b Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN, and PropagateNumbers semantics. Add (slow) generic implementations for most reductions. 2020-10-13 21:48:31 +00:00
acxz
807e51528d undefine EIGEN_CONSTEXPR before redefinition 2020-10-12 20:28:56 -04:00
Rasmus Munk Larsen
9a4d04c05f Make bitwise_helper a device function to unbreak GPU builds. 2020-10-10 01:45:20 +00:00
Rasmus Munk Larsen
4e4d3f32d1 Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512. 2020-10-09 20:05:49 +00:00
David Tellenbach
7a8d3d5b81 Disable test exceptions when using OpenMP. 2020-10-09 17:49:07 +02:00
David Tellenbach
9022f5aa8a Mention problems when using potentially throwing scalars and OpenMP 2020-10-09 17:04:25 +02:00
Karl Ljungkvist
d199c17b14 Fix typo in Tutorial_BlockOperations_block_assignment.cpp 2020-10-09 07:51:36 +00:00
David Tellenbach
4091f6b25c Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD 2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen
183a208212 Implement generic bitwise logical packet ops that work for all types. 2020-10-08 22:45:20 +00:00
David Tellenbach
8f8d77b516 Add EIGEN prefix for HAS_LGAMMA_R 2020-10-08 18:32:19 +02:00
Eugene Zhulenev
2279f2c62f Use lgamma_r if it is available (update check for glibc 2.19+) 2020-10-08 00:26:45 +00:00
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it various across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
David Tellenbach
f66f3393e3 Use reinterpret_cast instead of C-style cast in Inverse_NEON.h 2020-10-04 00:35:09 +02:00
Rasmus Munk Larsen
22c971a225 Don't cast away const in Inverse_NEON.h. 2020-10-02 15:06:34 -07:00
Rasmus Munk Larsen
f93841b53e Use EIGEN_USING_STD to fix CUDA compilation error on BFloat16.h. 2020-10-02 14:47:15 -07:00
Rasmus Munk Larsen
ee714f79f7 Fix CUDA build breakage and incorrect result for absdiff on HIP with long double arguments. 2020-10-02 21:05:35 +00:00
janos
f7b185a8b1 dont use =* might not return a Scalar 2020-10-02 14:36:51 +02:00
Rasmus Munk Larsen
9078f47cd6 Fix build breakage with MSVC 2019, which does not support MMX intrinsics for 64 bit builds, see:
https://stackoverflow.com/questions/60933486/mmx-intrinsics-like-mm-cvtpd-pi32-not-found-with-msvc-2019-for-64bit-targets-c

Instead use the equivalent SSE2 intrinsics.
2020-10-01 12:37:55 -07:00
Rasmus Munk Larsen
3b445d9bf2 Add a generic packet ops corresponding to {std}::fmin and {std}::fmax. The non-sensical NaN-propagation rules for std::min std::max implemented by pmin and pmax in Eigen is a longstanding source og confusion and bug report. This change is a first step towards addressing it, as discussing in issue #564. 2020-10-01 16:54:31 +00:00
Rasmus Munk Larsen
44b9d4e412 Specialize pldexp_double and pfdexp_double and get rid of Packet2l definition for SSE. SSE does not support conversion between 64 bit integers and double and the existing implementation of casting between Packet2d and Packer2l results in undefined behavior when casting NaN to int. Since pldexp and pfdexp only manipulate exponent fields that fit in 32 bit, this change provides specializations that use existing instructions _mm_cvtpd_pi32 and _mm_cvtsi32_pd instead. 2020-09-30 13:33:44 -07:00
Antonio Sanchez
d5a0d89491 Fix alignedbox 32-bit precision test failure.
The current `test/geo_alignedbox` tests fail on 32-bit arm due to small floating-point errors.

In particular, the following is not guaranteed to hold:
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
VERIFY(transformedC.contains(c));
```
since `c.transformed(identity)` is ever-so-slightly different from `c`. Instead, we replace this test with one that checks an identity transform is within floating-point precision of `c`.

Also updated the condition on `AlignedBox::transform(...)` to only accept `Affine`, `AffineCompact`, and `Isometry` modes explicitly.  Otherwise, invalid combinations of modes would also incorrectly pass the assertion.
2020-09-30 08:42:03 -07:00
David Tellenbach
30960d485e Fix failure in GEBP kernel when compiling with OpenMP and FMA
Fixes #1995
2020-09-30 01:26:07 +02:00
Rasmus Munk Larsen
f9d1500f74 Revert !182. 2020-09-29 13:56:17 -07:00
Rasmus Munk Larsen
068121ec02 Add missing newline at the end of Inverse_NEON.h 2020-09-29 15:32:52 +00:00
Rasmus Munk Larsen
74ff5719b3 Fix compilation of 64 bit constant arguments to pset1frombits in TypeCasting.h on platforms where uint64_t != unsigned long. 2020-09-28 22:47:11 +00:00
Rasmus Munk Larsen
3a0b23e473 Fix compilation of pset1frombits calls on iOS. 2020-09-28 22:30:36 +00:00
Christoph Hertzberg
6b0c0b587e Provide a more efficient Packet2l->Packet2d cast method 2020-09-28 22:14:02 +00:00
Martin Pecka
6425e875a1 Added AlignedBox::transform(AffineTransform). 2020-09-28 18:06:23 +00:00
Alexander Grund
a967fadb21 Make relative path variables of type STRING
When the type is PATH an absolute path is expected and user-defined
values are converted into absolute paths relative to the current directory.

Fixes #1990
2020-09-28 16:39:48 +00:00
Zhuyie
e4b24e7fb2 Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined 2020-09-25 09:36:43 +00:00
Deven Desai
ce5c59729d Fix for ROCm/HIP breakage - 200921
The following commit causes regressions in the ROCm/HIP support for Eigen
e55182ac09

I suspect the same breakages occur on the CUDA side too.

The above commit puts the EIGEN_CONSTEXPR attribute on `half_base` constructor. `half_base` is derived from `__half_raw`.

When compiling with GPU support, the definition of `__half_raw` gets picked up from the GPU Compiler specific header files (`hip_fp16.h`, `cuda_fp16.h`). Properly supporting the above commit would require adding the `constexpr` attribute to the `__half_raw` constructor (and other `*half*` routines) in those header files. While that is something we can explore in the future, for now we need to undo the above commit when compiling with GPU support, which is what this commit does.

This commit also reverts a small change in the `raw_uint16_to_half` routine made by the above commit. Similar to the case above, that change was leading to compile errors due to the fact that `__half_raw` has a different definition when compiling with DPU support.
2020-09-22 22:26:45 +00:00
David Tellenbach
b8a13f13ca Add CI configuration for ppc64le 2020-09-22 00:26:23 +00:00
Guoqiang QI
821702e771 Fix the #issue1997 and #issue1991 bug triggered by unsupport a[index](type a: __i28d) ops with MSVC compiler 2020-09-21 15:49:00 +00:00
David Tellenbach
493a7c773c Remove EIGEN_CONSTEXPR from NumTraits<boost::multiprecision::number<...>> 2020-09-21 12:43:41 +02:00
Павел Мацула
38e4a67394 Fix using FindStandardMathLibrary.cmake with -Wall (-Wunused-value) added to CMAKE_CXX_FLAG 2020-09-19 16:13:16 +00:00
Rasmus Munk Larsen
c4b99f78c7 Fix breakage in pcast<Packet2l, Packet2d> due to _mm_cvtsi128_si64 not being available on 32 bit x86.
If SSE 4.1 is available use the faster _mm_extract_epi64 intrinsic.
2020-09-18 18:13:20 -07:00
guoqiangqi
9aad16b443 Fix undefined reference to pset1frombits bug on different platforms 2020-09-19 00:53:21 +00:00
David Tellenbach
c4aa8e0db2 Rename variable to avoid shadowing of a previously declared one 2020-09-18 22:53:15 +02:00
Rasmus Munk Larsen
e55182ac09 Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr.
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HASH_CONSTEXPR is 1.
2020-09-18 17:38:58 +00:00
Rasmus Munk Larsen
14022f5eb5 Fix more mildly embarrassing typos in ARM intrinsics in PacketMath.h.
'vmvnq_u64' does not exist for some reason.
2020-09-18 04:14:13 +00:00
Rasmus Munk Larsen
a5b226920f Fix typo in PacketMath.h 2020-09-18 01:22:23 +00:00
Rasmus Munk Larsen
3af744b023 Add missing packet op pcmp_lt_or_nan for Packet2d on ARM. 2020-09-18 01:07:01 +00:00
Rasmus Munk Larsen
31a6b88ff3 Disable double version of compute_inverse_size4 on Inverse_NEON.h if Packet2d is not supported. 2020-09-17 23:51:06 +00:00
Brad King
880fa43b2b Add support for CastXML on ARM aarch64
CastXML simulates the preprocessors of other compilers, but actually
parses the translation unit with an internal Clang compiler.
Use the same `vld1q_u64` workaround that we do for Clang.

Fixes: #1979
2020-09-16 13:40:23 -04:00
daravi
6f0f6f792e Fix compiler error due to c++20 operator== generation rules 2020-09-16 02:06:53 +00:00
Benoit Jacob
cc0c38ace8 Remove old Clang compiler bug work-arounds. The two LLVM bugs referenced in the comments here have long been fixed. The workarounds were now detrimental because (1) they prevented using fused mul-add on Clang/ARM32 and (2) the unnecessary 'volatile' in 'asm volatile' prevented legitimate reordering by the compiler. 2020-09-15 20:54:14 -04:00
Tim Shen
bb56a62582 Make bfloat16(float(-nan)) produce -nan, not nan. 2020-09-15 13:24:23 -07:00
Guoqiang QI
3012e755e9 Add plog ops support packet2d for NEON 2020-09-15 17:10:35 +00:00
Rasmus Munk Larsen
e4fb0ddf78 Add EIGEN_UNUSED_VARIABLE to unused variable in Memory.h 2020-09-15 01:18:55 +00:00
Pedro Caldeira
65e400896b Fix bfloat16 round on gcc 4.8 2020-09-14 10:43:59 -03:00
Rasmus Munk Larsen
5636f80d11 Fix issue #1968. Don't discard return value from "new" in C++17. 2020-09-13 17:38:45 +00:00
Guoqiang QI
7c5d48f313 Unified sse pldexp_double api 2020-09-12 10:56:55 +00:00
Rasmus Munk Larsen
71e08c702b Make blueNorm threadsafe if C++11 atomics are available. 2020-09-12 01:23:29 +00:00
David Tellenbach
adc861cabd New CI infrastructure, including AArch64 runners 2020-09-11 18:11:49 +00:00
Niels Dekker
5328c9be43 Fix half_impl::float_to_half_rtne(float) warning: '<<' causes overflow
Fixed Visual Studio 2019 Code Analysis (C++ Core Guidelines) warning
C26450 from inside `half_impl::float_to_half_rtne(float)`:
> Arithmetic overflow: '<<' operation causes overflow at compile time.
2020-09-10 16:22:28 +02:00
Pedro Caldeira
35d149e34c Add missing functions for Packet8bf in Altivec architecture.
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Guoqiang QI
85428a3440 Add Neon psqrt<Packet2d> and pexp<Packet2d> 2020-09-08 09:04:03 +00:00
Alexander Neumann
5272106826 remove semi triggering -Wextra-semi-stmt 2020-09-07 11:42:30 +02:00
Stephen Zheng
5f25bcf7d6 Add Inverse_NEON.h
Implemented fast size-4 matrix inverse (mimicking Inverse_SSE.h) using NEON intrinsics.

```
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_float                 -0.1285         -0.1275           568           495           572           499
BM_double                -0.2265         -0.2254           638           494           641           496
```
2020-09-04 10:55:47 +00:00
Everton Constantino
6fe88a3c9d MatrixProuct enhancements:
- Changes to Altivec/MatrixProduct
  Adapting code to gcc 10.
  Generic code style and performance enhancements.
  Adding PanelMode support.
  Adding stride/offset support.
  Enabling float64, std::complex and std::complex.
  Fixing lack of symm_pack.
  Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
2020-09-02 18:21:36 -03:00
Everton Constantino
6568856275 Changing u/int8_t to un/signed char because clang does not understand
it.

Implementing pcmp_eq to Packet8 and Packet16.
2020-09-02 17:02:15 -03:00
Gael Guennebaud
27e6648074 fix #1901: warning in Mode==(Upper|Lower) 2020-09-02 15:43:58 +02:00
Hans Johnson
5b9bfc892a BUG: cmake_minimum_required must be the first command
https://cmake.org/cmake/help/v3.5/command/project.html

Note: Call the cmake_minimum_required() command at the beginning of the
top-level CMakeLists.txt file even before calling the project() command.
It is important to establish version and policy settings before invoking
other commands whose behavior they may affect. See also policy CMP0000.
2020-08-28 22:57:16 +00:00
Chip Kerchner
e5886457c8 Change Packet8s and Packet8us to use vector commands on Power for pmadd, pmul and psub. 2020-08-28 19:27:32 +00:00
Gael Guennebaud
25424d91f6 Fix #1974: assertion when reserving an empty sparse matrix 2020-08-26 12:32:20 +02:00
Guoqiang QI
8bb0febaf9 add psqrt ops support packet2f/packet4f for NEON 2020-08-21 03:17:15 +00:00
Georg Jäger
1b1082334b adding attributes to constructors to support hip-clang on ROCm 3.5 2020-08-20 16:48:11 +02:00
Deven Desai
603e213d13 Fixing a CUDA / P100 regression introduced by PR 181
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified.

That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only
2020-08-20 00:29:57 +00:00
David Tellenbach
c060114a25 Fix nightly CI configuration 2020-08-19 20:52:34 +02:00
David Tellenbach
fe8c3ef3cb Add possibility to split test suit build targets and improved CI configuration
- Introduce CMake option `EIGEN_SPLIT_TESTSUITE` that allows to divide the single test build target into several subtargets
- Add CI pipeline for merge request that can be run by GitLab's shared runners
- Add nightly CI pipeline
2020-08-19 18:27:45 +00:00
Rasmus Munk Larsen
d10b27fe37 Add missing inline keyword in Quaternion.h. 2020-08-14 17:51:04 +00:00
David Tellenbach
d4a727d092 Disable min/max NaN propagation in test cxx11_tensor_expr
The current pmin/pmax implementation for Arm Neon propagate NaNs
differently than std::min/std::max.

See issue https://gitlab.com/libeigen/eigen/-/issues/1937
2020-08-14 16:16:27 +00:00
David Tellenbach
d2bb6cf396 Fix compilation error in blasutil test 2020-08-14 18:15:18 +02:00
David Tellenbach
c6820a6316 Replace the call to int64_t in the blasutil test by explicit types
Some platforms define int64_t to be long long even for C++03. If this is
the case we miss the definition of internal::make_unsigned for this
type. If we just define the template we get duplicated definitions
errors for platforms defining int64_t as signed long for C++03.

We need to find a way to distinguish both cases at compile-time.
2020-08-14 17:24:37 +02:00
David Tellenbach
8ba1b0f41a bfloat16 packetmath for Arm Neon backend 2020-08-13 15:48:40 +00:00
Pedro Caldeira
704798d1df Add support for Bfloat16 to use vector instructions on Altivec
architecture
2020-08-10 13:22:01 -05:00
Deven Desai
46f8a18567 Adding an explicit launch_bounds(1024) attribute for GPU kernels.
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.

This compiler change introduce a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (default value means the value assumed by the compiler as the `__launch_bounds attribute__` value, when it is not explicitly specified by the user)

Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__luanch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.

This commit adds an explicit `__launch_bounds(1024)__` attribute to every GPU kernel that currently does not have it explicitly specified (and hence will end up getting the default value of 256 with the change to hip-clang)
2020-08-05 01:46:34 +00:00
Zachary Garrett
21122498ec Temporarily turn off the NEON implementation of pfloor as it does not work for large values.
The NEON implementation mimics the SSE implementation, but didn't mention the caveat that due to the unsigned of signed integer conversions, not all values in the original floating point  represented are supported.
2020-08-04 16:28:23 +00:00
David Tellenbach
23b7f0572b Disable CI buildstage again 2020-08-03 15:41:43 +02:00
Gael Guennebaud
d0f5d4bc50 add a banner to advertise the survey 2020-07-29 19:01:38 +02:00
David Tellenbach
5e484fa11d Fix StlDeque for GCC 10
StlDeque extends std::deque by accessing some of its internal members.
Since GCC 10 these are not accessible anymore.
2020-07-29 12:31:13 +00:00
Teng Lu
3ec4f0b641 Fix undefine BF16 union behavior in AVX512. 2020-07-29 02:20:21 +00:00
Rasmus Munk Larsen
b92206676c Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned. 2020-07-28 19:19:37 +00:00
David Tellenbach
99da2e1a8d Fix clang-tidy warnings in generic bfloat16 implementation
See !172 for related discussions.
2020-07-27 16:00:24 +02:00
qxxxb
649fd1c2ae Fix CMake install command 2020-07-25 16:35:13 -04:00
David Tellenbach
e48d8e4725 Don't allow failure for CI build stage anymore 2020-07-24 21:12:15 +02:00
David Tellenbach
b8ca93842c Improve CI configuration
- Fix docker Fedora image to Fedora:31
  - Fix gcc version to gcc-9.2.1
  - Use GitLab CI dag
  - Fix usage of build cache
  - Introduce build artificats
2020-07-24 15:58:44 +00:00
Gael Guennebaud
fb0c6868ad Add missing footer declaration 2020-07-24 10:28:44 +02:00
David Tellenbach
c1ffe452fc Fix bfloat16 casts
If we have explicit conversion operators available (C++11) we define
explicit casts from bfloat16 to other types. If not (C++03), we don't
define conversion operators but rely on implicit conversion chains from
bfloat16 over float to other types.
2020-07-23 20:55:06 +00:00
Gael Guennebaud
2ce2f51989 remove piwik tracker 2020-07-23 13:51:39 +02:00
Rasmus Munk Larsen
1b84f21e32 Revert change that made conversion from bfloat16 to {float, double} implicit.
Add roundtrip tests for casting between bfloat16 and complex types.
2020-07-22 18:09:00 -07:00
David Tellenbach
38b91f256b Fix cast of blfoat16 to std::complex<T>
This fixes https://gitlab.com/libeigen/eigen/-/issues/1951
2020-07-22 19:00:17 +00:00
Rasmus Munk Larsen
bed7fbe854 Make sure we take the little-endian path if __BYTE_ORDER__ is not defined. 2020-07-22 18:54:38 +00:00
Niels Dekker
0e1a33a461 Faster conversion from integer types to bfloat16
Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types.

A reduction of more than 20% of the runtime duration of conversion from int to bfloat16 was observed, using Visual C++ 2019 on Windows 10.
2020-07-22 19:25:49 +02:00
Rasmus Munk Larsen
acab22c205 Avoid division by zero in nonZerosEstimate() for empty blocks. 2020-07-22 01:38:30 +00:00
Rasmus Munk Larsen
ac2eca6b11 Update tensor reduction test to avoid undefined division of bfloat16 by int. 2020-07-22 00:35:51 +00:00
Rasmus Munk Larsen
0aeaf5f451 Make numext::as_uint a device function. 2020-07-22 00:33:41 +00:00
Alexander Turkin
60faa9f897 user-defined copy operations removed in favor of compiler-generated ones 2020-07-20 14:59:35 +03:00
Niels Dekker
b11f817bcf Avoid undefined behavior by union type punning in float_to_bfloat16_rtne
Use `numext::as_uint`, instead of union based type punning, to avoid undefined behavior.
See also C++ Core Guidelines: "Don't use a union for type punning"
https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning

`numext::as_uint` was suggested by David Tellenbach
2020-07-14 19:55:20 +02:00
Sheng Yang
56b3e3f3f8 AVX path for BF16 2020-07-14 01:34:03 +00:00
Niels Dekker
4ab32e2de2 Allow implicit conversion from bfloat16 to float and double
Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also support implicit conversion from a smaller to a larger floating point type.

Intel's OneDLL bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp
2020-07-11 13:32:28 +02:00
Rasmus Munk Larsen
dcf7655b3d Guard operator<< test by EIGEN_NO_IO. 2020-07-09 19:54:48 +00:00
Rasmus Munk Larsen
ed00df445d Guard operator<< by EIGEN_NO_IO. 2020-07-09 19:52:44 +00:00
Rasmus Munk Larsen
fb77b7288c Add operator<< to print a quaternion. 2020-07-09 12:49:58 -07:00
David Tellenbach
ee4715ff48 Fix test basic stuff
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
2020-07-09 17:24:00 +00:00
Forrest Voight
8889a2c1c6 Add operator==/operator!= to Quaternion. Fixes #1876. 2020-07-07 20:16:54 +00:00
Rasmus Munk Larsen
6964ae8d52 Change the sign operator in Eigen to return NaN for NaN arguments, not zero. 2020-07-07 01:54:04 +00:00
David Tellenbach
cb63153183 Make test packetmath C++98 compliant 2020-07-01 20:41:59 +02:00
Sheng Yang
116c5235ac BF16 for scalar_cmp_with_cast_op 2020-07-01 18:33:42 +00:00
Kan Chen
8731452b97 Delete duplicate test cases in vectorization_logic.cpp 2020-07-01 00:51:15 +00:00
Antonio Sanchez
9cb8771e9c Fix tensor casts for large packets and casts to/from std::complex
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.

We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing to make it consistent with `Eigen:bfloat16`, and
generalize the overload to work for any complex type.

Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
2020-06-30 18:53:55 +00:00
Antonio Sanchez
145e51516f Fix denormal check pre c++11.
`float_denorm_style` is an old-style `enum`, so the `denorm_present`
symbol only exists in the `std` namespace prior to c++11.
2020-06-30 17:28:30 +00:00
David Tellenbach
689b57070d Report custom C++ flags in CMake testing summary 2020-06-30 17:18:54 +00:00
David Tellenbach
f3b8d441f6 Remote CI tags to enable shared runners 2020-06-29 22:15:41 +02:00
Christoph Grüninger
dc0b81fb1d Pass CMAKE_MAKE_PROGRAM to Fortran language support test
Otherwise the Make (or Ninja) program is used, which is
installed system wide.
2020-06-27 23:52:38 +02:00
David Tellenbach
13d25f5ed8 Add initial CI configuration file.
The initial CI configuration consists of jobs to build and run tests and
to build docs.
2020-06-27 00:03:35 +00:00
Antonio Sanchez
7222f0b6b5 Fix packetmath_1 float tests for arm/aarch64.
Added missing `pmadd<Packet2f>` for NEON. This leads to significant
improvement in precision than previous `pmul+padd`, which was causing
the `pcos` tests to fail. Also added an approx test with
`std::sin`/`std::cos` since otherwise returning any `a^2+b^2=1` would
pass.

Modified `log(denorm)` tests.  Denorms are not always supported by all
systems (returns `::min`), are always flushed to zero on 32-bit arm,
and configurably flush to zero on sse/avx/aarch64. This leads to
inconsistent results across different systems (i.e. `-inf` vs `nan`).
Added a check for existence and exclude ARM.

Removed logistic exactness test, since scalar and vectorized versions
follow different code-paths due to differences in `pexp` and `pmadd`,
which result in slightly different values. For example, exactness always
fails on arm, aarch64, and altivec.
2020-06-24 14:03:35 -07:00
Simon Pfreundschuh
14f84978e8 Replaced call to deprecated 'load' function with appropriate call to 'on'. 2020-06-23 11:23:13 +02:00
Antonio Sanchez
ff4e7a0820 Add missing Packet2l/Packet2ul ops for NEON.
The current multiply (`pmul`) and comparison operators (`pcmp_lt`,
`pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and
`Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests
in clang. Here we add and test the missing ops.

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-22 11:24:43 -07:00
Antonio Sanchez
03ebdf6acb Added missing NEON pcasts, update packetmath tests.
The NEON `pcast` operators are all implemented and tested for existing
packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting
between `int64_t` and `int8_t` in `GenericPacketMath.h`.

Removed incorrect `HasHalfPacket`  definition for NEON's
`Packet2l`/`Packet2ul`.

Adjustments were also made to the `packetmath` tests. These include
- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for
  packets that are vectorizable)
- added 8:1 cast tests
- random number generation
  - original had uninteresting 0 to 0 casts for many casts between
    floating-point and integers, and exhibited signed overflow
    undefined behavior

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-21 09:32:31 -07:00
Teng Lu
386d809bde Support BFloat16 in Eigen 2020-06-20 19:16:24 +00:00
Rasmus Munk Larsen
6b9c92fe7e Add Apache 2.0 license text in COPYING.APACHE. 2020-06-18 12:45:27 -07:00
Nicolas Mellado
cf7adf3a5d Update things you can do message using cmake commands
Print cmake commands instead of make commands, which should work for any generator.
2020-06-16 21:04:33 +00:00
Ilya Tokar
231ce21535 Run two independent chains, when reducing tensors.
Running two chains exposes more instruction level parallelism,
by allowing to execute both chains at the same time.

Results are a bit noisy, but for medium length we almost hit
theoretical upper bound of 2x.

BM_fullReduction_16T/3        [using 16 threads]       17.3ns ±11%        17.4ns ± 9%        ~           (p=0.178 n=18+19)
BM_fullReduction_16T/4        [using 16 threads]       17.6ns ±17%        17.0ns ±18%        ~           (p=0.835 n=20+19)
BM_fullReduction_16T/7        [using 16 threads]       18.9ns ±12%        18.2ns ±10%        ~           (p=0.756 n=20+18)
BM_fullReduction_16T/8        [using 16 threads]       19.8ns ±13%        19.4ns ±21%        ~           (p=0.512 n=20+20)
BM_fullReduction_16T/10       [using 16 threads]       23.5ns ±15%        20.8ns ±24%     -11.37%        (p=0.000 n=20+19)
BM_fullReduction_16T/15       [using 16 threads]       35.8ns ±21%        26.9ns ±17%     -24.76%        (p=0.000 n=20+19)
BM_fullReduction_16T/16       [using 16 threads]       38.7ns ±22%        27.7ns ±18%     -28.40%        (p=0.000 n=20+19)
BM_fullReduction_16T/31       [using 16 threads]        146ns ±17%          74ns ±11%     -49.05%        (p=0.000 n=20+18)
BM_fullReduction_16T/32       [using 16 threads]        154ns ±19%          84ns ±30%     -45.79%        (p=0.000 n=20+19)
BM_fullReduction_16T/64       [using 16 threads]        603ns ± 8%         308ns ±12%     -48.94%        (p=0.000 n=17+17)
BM_fullReduction_16T/128      [using 16 threads]       2.44µs ±13%        1.22µs ± 1%     -50.29%        (p=0.000 n=17+17)
BM_fullReduction_16T/256      [using 16 threads]       9.84µs ±14%        5.13µs ±30%     -47.82%        (p=0.000 n=19+19)
BM_fullReduction_16T/512      [using 16 threads]       78.0µs ± 9%        56.1µs ±17%     -28.02%        (p=0.000 n=18+20)
BM_fullReduction_16T/1k       [using 16 threads]        325µs ± 5%         263µs ± 4%     -19.00%        (p=0.000 n=20+16)
BM_fullReduction_16T/2k       [using 16 threads]       1.09ms ± 3%        0.99ms ± 1%      -9.04%        (p=0.000 n=20+20)
BM_fullReduction_16T/4k       [using 16 threads]       7.66ms ± 3%        7.57ms ± 3%      -1.24%        (p=0.017 n=20+20)
BM_fullReduction_16T/10k      [using 16 threads]       65.3ms ± 4%        65.0ms ± 3%        ~           (p=0.718 n=20+20)
2020-06-16 15:55:11 -04:00
Pedro Caldeira
a475bf14d4 Fix pscatter and pgather for Altivec Complex double 2020-06-16 16:41:02 -03:00
David Tellenbach
c6c84ed961 Fix unused variable warning on Arm 2020-06-15 00:14:58 +02:00
Sebastien Boisvert
6228f27234 Fix #1818: SparseLU: add methods nnzL() and nnzU()
Now this compiles without errors:

$ clang++ -I ../../ test_sparseLU.cpp -std=c++03
2020-06-11 23:49:49 +00:00
Sebastien Boisvert
39cbd6578f Fix #1911: add benchmark for move semantics with fixed-size matrix
$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
        -o bench_move_semantics

$ ./bench_move_semantics
float copy semantics: 1755.97 ms
float move semantics: 55.063 ms
double copy semantics: 2457.65 ms
double move semantics: 55.034 ms
2020-06-11 23:43:25 +00:00
Antonio Sanchez
a7d2552af8 Remove HasCast and fix packetmath cast tests.
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.

Here we remove the unsed `HasCast`, and modify the packet cast tests to
better reflect their usage.
2020-06-11 17:26:56 +00:00
Sebastien Boisvert
463ec86648 Fix #1757: remove the word 'suicide' 2020-06-11 00:56:54 +00:00
ShengYang1
b5d66b5e73 Implement scalar_cmp_with_cast_op 2020-06-09 08:12:07 +08:00
Rasmus Munk Larsen
c4059ffcb6 Fix static analyzer warning in SelfadjointProduct.h.
Fix compiler warnings in GeneralBlockPanelKernel.h.
2020-06-08 11:48:44 -07:00
Thales Sabino
1fcaaf460f Update FindComputeCpp.cmake to fix build problems on Windows
- Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows
- Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows
2020-06-05 20:51:20 +00:00
David Tellenbach
3ce18d3c8f Revert ".gitlab-ci.yml: initial commit"
This reverts commit 95177362ed to
disable GitLab CI temporarily.
2020-06-05 22:43:49 +02:00
Rasmus Munk Larsen
c2ab36f47a Fix broken packetmath test for logistic on Arm. 2020-06-04 16:24:47 -07:00
Rasmus Munk Larsen
537e2b322f Fix typo in previous update to generic predux_any. 2020-06-04 22:25:05 +00:00
Rasmus Munk Larsen
fdc1cbdce3 Avoid implicit float equality comparison in generic predux_any, but use numext::not_equal_strict to avoid breaking builds that compile with -Werror=float-equal. 2020-06-04 22:15:56 +00:00
Rasmus Munk Larsen
daf9bbeca2 Fix compilation error in logistic packet op. 2020-06-03 00:57:41 +00:00
n0mend
6d2a9a524b Update run instructions for benchCholesky 2020-06-01 18:31:46 +00:00
Gael Guennebaud
029a76e115 Bug #1777: make the scalar and packet path consistent for the logistic function + respective unit test 2020-05-31 00:53:37 +02:00
Gael Guennebaud
99b7f7cb9c Fix #556: warnings with mingw 2020-05-31 00:39:44 +02:00
Gael Guennebaud
72782d13e0 Bug #1767: increase required cmake version to 3.5.0 2020-05-31 00:31:09 +02:00
Gael Guennebaud
867a756509 Fix #1833: compilation issue of "array!=scalar" with c++20 2020-05-30 23:53:58 +02:00
Gael Guennebaud
ab615e4114 Save one extra temporary when assigning a sparse product to a row-major sparse matrix 2020-05-30 23:15:12 +02:00
Christoph Junghans
95177362ed .gitlab-ci.yml: initial commit 2020-05-29 09:23:25 -06:00
Kan Chen
8d1302f566 Add support for PacketBlock<Packet8s,4> and PacketBlock<Packet16uc,4> ptranspose on NEON 2020-05-29 00:33:45 +00:00
Antonio Sánchez
8719b9c5bc Disable test for 32-bit systems (e.g. ARM, i386)
Both i386 and 32-bit ARM do not define __uint128_t. On most systems, if
__uint128_t is defined, then so is the macro __SIZEOF_INT128__.

https://stackoverflow.com/questions/18531782/how-to-know-if-uint128-t-is-defined1
2020-05-28 17:40:15 +00:00
Yong Tang
8e1df5b082 Fix incorrect usage of if defined(EIGEN_ARCH_PPC) => if EIGEN_ARCH_PPC
This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)`
in `Eigen/Core` header.

In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined
as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true.
This causes issues when building on non PPC platform and `MatrixProduct.h` is not
available.

This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2020-05-28 05:53:44 -07:00
Kan Chen
4e7046063b Fix #1874: it works on both MSVC 2017 and other platforms. 2020-05-21 18:42:56 +08:00
Pedro Caldeira
2d67af2d2b Add pscatter for Packet16{u}c (int8) 2020-05-20 17:29:34 -03:00
David Tellenbach
5328cd62b3 Guard usage of decltype since it's a C++11 feature
This fixes https://gitlab.com/libeigen/eigen/-/issues/1897
2020-05-20 16:04:16 +02:00
Rasmus Munk Larsen
cc86a31e20 Add guard around specialization for bool, which is only currently implemented for SSE. 2020-05-19 16:21:56 -07:00
Everton Constantino
8a7f360ec3 - Vectorizing MMA packing.
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Rasmus Munk Larsen
a145e4adf5 Add newline at the end of StlIterators.h. 2020-05-15 20:36:00 +00:00
Gael Guennebaud
8ce9630ddb Fix #1874: workaround MSVC 2017 compilation issue. 2020-05-15 20:47:32 +02:00
Rasmus Munk Larsen
9b411757ab Add missing packet ops for bool, and make it pass the same packet op unit tests as other arithmetic types.
This change also contains a few minor cleanups:
  1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan,
     which can be done in other ways.
  2. Remove the "HasInsert" enum, which is no longer needed since we removed the
     corresponding packet ops.
  3. Add faster pselect op for Packet4i when SSE4.1 is supported.

Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.

Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                        Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------------
BM_TransposeInPlace<float>/4            9.77           9.77    71670320
BM_TransposeInPlace<float>/8           21.9           21.9     31929525
BM_TransposeInPlace<float>/16          66.6           66.6     10000000
BM_TransposeInPlace<float>/32         243            243        2879561
BM_TransposeInPlace<float>/59         844            844         829767
BM_TransposeInPlace<float>/64         933            933         750567
BM_TransposeInPlace<float>/128       3944           3945         177405
BM_TransposeInPlace<float>/256      16853          16853          41457
BM_TransposeInPlace<float>/512     204952         204968           3448
BM_TransposeInPlace<float>/1k     1053889        1053861            664
BM_TransposeInPlace<bool>/4            14.4           14.4     48637301
BM_TransposeInPlace<bool>/8            36.0           36.0     19370222
BM_TransposeInPlace<bool>/16           31.5           31.5     22178902
BM_TransposeInPlace<bool>/32          111            111        6272048
BM_TransposeInPlace<bool>/59          626            626        1000000
BM_TransposeInPlace<bool>/64          428            428        1632689
BM_TransposeInPlace<bool>/128        1677           1677         417377
BM_TransposeInPlace<bool>/256        7126           7126          96264
BM_TransposeInPlace<bool>/512       29021          29024          24165
BM_TransposeInPlace<bool>/1k       116321         116330           6068
2020-05-14 22:39:13 +00:00
Felipe Attanasio
d640276d31 Added support for reverse iterators for Vectorwise operations. 2020-05-14 22:38:20 +00:00
Christopher Moore
fa8fd4b4d5 Indexed view should have RowMajorBit when there is staticly a single row 2020-05-14 22:11:19 +00:00
Christopher Moore
a187ffea28 Resolve "IndexedView of a vector should allow linear access" 2020-05-13 19:24:42 +00:00
Mark Eberlein
ba9d18b938 Add KLU support to spbenchsolver 2020-05-11 21:50:27 +00:00
Pedro Caldeira
5fdc179241 Altivec template functions to better code reusability 2020-05-11 21:04:51 +00:00
mehdi-goli
d3e81db6c5 Eigen moved the scanLauncehr function inside the internal namespace.
This commit applies the following changes:
    - Moving the `scamLauncher` specialization inside internal namespace to fix compiler crash on TensorScan for SYCL backend.
    - Replacing  `SYCL/sycl.hpp` to `CL/sycl.hpp` in order to follow SYCL 1.2.1 standard.
    - minor fixes: commenting out an unused variable to avoid compiler warnings.
2020-05-11 16:10:33 +01:00
Rasmus Munk Larsen
c1d944dd91 Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
I cannot measure any performance changes for SSE, AVX, or AVX512.

name                                 old time/op             new time/op             delta
BM_LinSpace<float>/1                 1.63ns ± 0%             1.63ns ± 0%   ~             (p=0.762 n=5+5)
BM_LinSpace<float>/8                 4.92ns ± 3%             4.89ns ± 3%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/64                34.6ns ± 0%             34.6ns ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/512                217ns ± 0%              217ns ± 0%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/4k                1.68µs ± 0%             1.68µs ± 0%   ~             (p=1.000 n=5+5)
BM_LinSpace<float>/32k               13.3µs ± 0%             13.3µs ± 0%   ~             (p=0.905 n=5+4)
BM_LinSpace<float>/256k               107µs ± 0%              107µs ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/1M                 427µs ± 0%              427µs ± 0%   ~             (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
David Tellenbach
5c4e19fbe7 Possibility to specify user-defined default cache sizes for GEBP kernel
Some architectures have no convinient way to determine cache sizes at
runtime. Eigen's GEBP kernel falls back to default cache values in this
case which might not be correct in all situations.

This patch introduces three preprocessor directives

  `EIGEN_DEFAULT_L1_CACHE_SIZE`
  `EIGEN_DEFAULT_L2_CACHE_SIZE`
  `EIGEN_DEFAULT_L3_CACHE_SIZE`

to give users the possibility to set these default values explicitly.
2020-05-08 12:54:36 +02:00
Rasmus Munk Larsen
225ab040e0 Remove unused packet op "palign".
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen
74ec8e6618 Make size odd for transposeInPlace test to make sure we hit the scalar path. 2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen
49f1aeb60d Remove traits declaring NEON vectorized casts that do not actually have packet op implementations. 2020-05-07 09:49:22 -07:00
Rasmus Munk Larsen
2fd8a5a08f Add parallelization of TensorScanOp for types without packet ops.
Clean up the code a bit and do a few micro-optimizations to improve performance for small tensors.

Benchmark numbers for Tensor<uint32_t>:

name                                                       old time/op             new time/op             delta
BM_cumSumRowReduction_1T/8   [using 1 threads]             76.5ns ± 0%             61.3ns ± 4%    -19.80%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/64  [using 1 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.77%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/256 [using 1 threads]             39.8µs ± 0%             39.6µs ± 0%     -0.60%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/4k  [using 1 threads]             13.9ms ± 0%             13.4ms ± 1%     -4.19%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/8   [using 2 threads]             76.8ns ± 0%             59.1ns ± 0%    -23.09%          (p=0.016 n=5+4)
BM_cumSumRowReduction_2T/64  [using 2 threads]             2.47µs ± 1%             2.41µs ± 1%     -2.53%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/256 [using 2 threads]             39.8µs ± 0%             34.7µs ± 6%    -12.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/4k  [using 2 threads]             13.8ms ± 1%              7.2ms ± 6%    -47.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/8   [using 8 threads]             76.4ns ± 0%             61.8ns ± 3%    -19.02%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/64  [using 8 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.84%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/256 [using 8 threads]             39.8µs ± 0%             28.3µs ±11%    -28.75%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/4k  [using 8 threads]             13.8ms ± 0%              2.7ms ± 5%    -80.39%          (p=0.008 n=5+5)
BM_cumSumColReduction_1T/8   [using 1 threads]             59.1ns ± 0%             80.3ns ± 0%    +35.94%          (p=0.029 n=4+4)
BM_cumSumColReduction_1T/64  [using 1 threads]             3.06µs ± 0%             3.08µs ± 1%       ~             (p=0.114 n=4+4)
BM_cumSumColReduction_1T/256 [using 1 threads]              175µs ± 0%              176µs ± 0%       ~             (p=0.190 n=4+5)
BM_cumSumColReduction_1T/4k  [using 1 threads]              824ms ± 1%              844ms ± 1%     +2.37%          (p=0.008 n=5+5)
BM_cumSumColReduction_2T/8   [using 2 threads]             59.0ns ± 0%             90.7ns ± 0%    +53.74%          (p=0.029 n=4+4)
BM_cumSumColReduction_2T/64  [using 2 threads]             3.06µs ± 0%             3.10µs ± 0%     +1.08%          (p=0.016 n=4+5)
BM_cumSumColReduction_2T/256 [using 2 threads]              176µs ± 0%              189µs ±18%       ~             (p=0.151 n=5+5)
BM_cumSumColReduction_2T/4k  [using 2 threads]              836ms ± 2%              611ms ±14%    -26.92%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/8   [using 8 threads]             59.3ns ± 2%             90.6ns ± 0%    +52.79%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/64  [using 8 threads]             3.07µs ± 0%             3.10µs ± 0%     +0.99%          (p=0.016 n=5+4)
BM_cumSumColReduction_8T/256 [using 8 threads]              176µs ± 0%               80µs ±19%    -54.51%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/4k  [using 8 threads]              827ms ± 2%              180ms ±14%    -78.24%          (p=0.008 n=5+5)
2020-05-06 14:48:37 -07:00
Rasmus Munk Larsen
0e59f786e1 Fix accidental copy of loop variable. 2020-05-05 21:35:38 +00:00
Rasmus Munk Larsen
7b76c85daf Vectorize and parallelize TensorScanOp.
TensorScanOp is used in TensorFlow for a number of operations, such as cumulative logexp reduction and cumulative sum and product reductions.

The benchmarks numbers below are for cumulative row- and column reductions of NxN matrices.

name                                                         old time/op             new time/op     delta
BM_cumSumRowReduction_1T/4    [using 1 threads ]             25.1ns ± 1%             35.2ns ± 1%    +40.45%
BM_cumSumRowReduction_1T/8    [using 1 threads ]             73.4ns ± 0%             82.7ns ± 3%    +12.74%
BM_cumSumRowReduction_1T/32   [using 1 threads ]              988ns ± 0%              832ns ± 0%    -15.77%
BM_cumSumRowReduction_1T/64   [using 1 threads ]             4.07µs ± 2%             3.47µs ± 0%    -14.70%
BM_cumSumRowReduction_1T/128  [using 1 threads ]             18.0µs ± 0%             16.8µs ± 0%     -6.58%
BM_cumSumRowReduction_1T/512  [using 1 threads ]              287µs ± 0%              281µs ± 0%     -2.22%
BM_cumSumRowReduction_1T/2k   [using 1 threads ]             4.78ms ± 1%             4.78ms ± 2%       ~
BM_cumSumRowReduction_1T/10k  [using 1 threads ]              117ms ± 1%              117ms ± 1%       ~
BM_cumSumRowReduction_8T/4    [using 8 threads ]             25.0ns ± 0%             35.2ns ± 0%    +40.82%
BM_cumSumRowReduction_8T/8    [using 8 threads ]             77.2ns ±16%             81.3ns ± 0%       ~
BM_cumSumRowReduction_8T/32   [using 8 threads ]              988ns ± 0%              833ns ± 0%    -15.67%
BM_cumSumRowReduction_8T/64   [using 8 threads ]             4.08µs ± 2%             3.47µs ± 0%    -14.95%
BM_cumSumRowReduction_8T/128  [using 8 threads ]             18.0µs ± 0%             17.3µs ±10%       ~
BM_cumSumRowReduction_8T/512  [using 8 threads ]              287µs ± 0%               58µs ± 6%    -79.92%
BM_cumSumRowReduction_8T/2k   [using 8 threads ]             4.79ms ± 1%             0.64ms ± 1%    -86.58%
BM_cumSumRowReduction_8T/10k  [using 8 threads ]              117ms ± 1%               18ms ± 6%    -84.50%

BM_cumSumColReduction_1T/4    [using 1 threads ]             23.9ns ± 0%             33.4ns ± 1%    +39.68%
BM_cumSumColReduction_1T/8    [using 1 threads ]             71.6ns ± 1%             49.1ns ± 3%    -31.40%
BM_cumSumColReduction_1T/32   [using 1 threads ]              973ns ± 0%              165ns ± 2%    -83.10%
BM_cumSumColReduction_1T/64   [using 1 threads ]             4.06µs ± 1%             0.57µs ± 1%    -85.94%
BM_cumSumColReduction_1T/128  [using 1 threads ]             33.4µs ± 1%              4.1µs ± 1%    -87.67%
BM_cumSumColReduction_1T/512  [using 1 threads ]             1.72ms ± 4%             0.21ms ± 5%    -87.91%
BM_cumSumColReduction_1T/2k   [using 1 threads ]              119ms ±53%               11ms ±35%    -90.42%
BM_cumSumColReduction_1T/10k  [using 1 threads ]              1.59s ±67%              0.35s ±49%    -77.96%
BM_cumSumColReduction_8T/4    [using 8 threads ]             23.8ns ± 0%             33.3ns ± 0%    +40.06%
BM_cumSumColReduction_8T/8    [using 8 threads ]             71.6ns ± 1%             49.2ns ± 5%    -31.33%
BM_cumSumColReduction_8T/32   [using 8 threads ]             1.01µs ±12%             0.17µs ± 3%    -82.93%
BM_cumSumColReduction_8T/64   [using 8 threads ]             4.15µs ± 4%             0.58µs ± 1%    -86.09%
BM_cumSumColReduction_8T/128  [using 8 threads ]             33.5µs ± 0%              4.1µs ± 4%    -87.65%
BM_cumSumColReduction_8T/512  [using 8 threads ]             1.71ms ± 3%             0.06ms ±16%    -96.21%
BM_cumSumColReduction_8T/2k   [using 8 threads ]             97.1ms ±14%              3.0ms ±23%    -96.88%
BM_cumSumColReduction_8T/10k  [using 8 threads ]              1.97s ± 8%              0.06s ± 2%    -96.74%
2020-05-05 00:19:43 +00:00
Xiaoxiang Cao
a74a278abd Fix confusing template param name for Stride fwd decl. 2020-04-30 01:43:05 +00:00
Rasmus Munk Larsen
923ee9aba3 Fix the embarrassingly incomplete fix to the embarrassing bug in blocked transpose. 2020-04-29 17:27:36 +00:00
Rasmus Munk Larsen
a32923a439 Fix (embarrassing) bug in blocked transpose. 2020-04-29 17:02:27 +00:00
Rasmus Munk Larsen
1e41406c36 Add missing transpose in cleanup loop. Without it, we trip an assertion in debug mode. 2020-04-29 01:30:51 +00:00
Rasmus Munk Larsen
fbe7916c55 Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile. 2020-04-29 00:58:41 +00:00
Clément Grégoire
82f54ad144 Fix perf monitoring merge function 2020-04-28 17:02:59 +00:00
Rasmus Munk Larsen
ab773c7e91 Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests

This speeds up matmul for boolean matrices by about 10x

name                            old time/op             new time/op             delta
BM_MatMul<bool>/8                267ns ± 0%              479ns ± 0%  +79.25%          (p=0.008 n=5+5)
BM_MatMul<bool>/32              6.42µs ± 0%             0.87µs ± 0%  -86.50%          (p=0.008 n=5+5)
BM_MatMul<bool>/64              43.3µs ± 0%              5.9µs ± 0%  -86.42%          (p=0.008 n=5+5)
BM_MatMul<bool>/128              315µs ± 0%               44µs ± 0%  -85.98%          (p=0.008 n=5+5)
BM_MatMul<bool>/256             2.41ms ± 0%             0.34ms ± 0%  -85.68%          (p=0.008 n=5+5)
BM_MatMul<bool>/512             18.8ms ± 0%              2.7ms ± 0%  -85.53%          (p=0.008 n=5+5)
BM_MatMul<bool>/1k               149ms ± 0%               22ms ± 0%  -85.40%          (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
b47c777993 Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
 10 / 10 [====================================================================================================================================================================================================================] 100.00% 2m50s
(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)

name                                       old time/op             new time/op             delta
BM_TransposeInPlace<float>/4               9.84ns ± 0%             6.51ns ± 0%  -33.80%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8               23.6ns ± 1%             17.6ns ± 0%  -25.26%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16              78.8ns ± 0%             60.3ns ± 0%  -23.50%          (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32               302ns ± 0%              229ns ± 0%  -24.40%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59              1.03µs ± 0%             0.84µs ± 1%  -17.87%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64              1.20µs ± 0%             0.89µs ± 1%  -25.81%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128             8.96µs ± 0%             3.82µs ± 2%  -57.33%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256              152µs ± 3%               17µs ± 2%  -89.06%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512              837µs ± 1%              208µs ± 0%  -75.15%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k              4.28ms ± 2%             1.08ms ± 2%  -74.72%          (p=0.008 n=5+5)
2020-04-28 16:08:16 +00:00
Pedro Caldeira
29f0917a43 Add support to vector instructions to Packet16uc and Packet16c 2020-04-27 12:48:08 -03:00
Rasmus Munk Larsen
e80ec24357 Remove unused packet op "preduxp". 2020-04-23 18:17:14 +00:00
René Wagner
0aebe19aca BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers.
This enables operator== on Eigen matrices in device code.
2020-04-23 17:25:08 +02:00
Eugene Zhulenev
3c02fefec5 Add async evaluation support to TensorSlicingOp.
Device::memcpy is not async-safe and might lead to deadlocks. Always evaluate slice expression in async mode.
2020-04-22 19:55:01 +00:00
Pedro Caldeira
0c67b855d2 Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec vector operations 2020-04-21 14:52:46 -03:00
Rasmus Munk Larsen
e8f40e4670 Fix bug in ptrue for Packet16b. 2020-04-20 21:45:10 +00:00
Rasmus Munk Larsen
2f6ddaa25c Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical and of two NxN tensors:

name                                               old time/op             new time/op             delta
BM_booleanAnd_1T/3   [using 1 threads]             14.6ns ± 0%             14.4ns ± 0%   -0.96%
BM_booleanAnd_1T/4   [using 1 threads]             20.5ns ±12%              9.0ns ± 0%  -56.07%
BM_booleanAnd_1T/7   [using 1 threads]             41.7ns ± 0%             10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8   [using 1 threads]             52.1ns ± 0%             10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10  [using 1 threads]             76.3ns ± 0%             13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15  [using 1 threads]              167ns ± 0%               16ns ± 0%  -90.45%
BM_booleanAnd_1T/16  [using 1 threads]              188ns ± 0%               16ns ± 0%  -91.57%
BM_booleanAnd_1T/31  [using 1 threads]              667ns ± 0%               34ns ± 0%  -94.83%
BM_booleanAnd_1T/32  [using 1 threads]              710ns ± 0%               35ns ± 0%  -95.01%
BM_booleanAnd_1T/64  [using 1 threads]             2.80µs ± 0%             0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128 [using 1 threads]             11.2µs ± 0%              0.4µs ± 0%  -96.11%
BM_booleanAnd_1T/256 [using 1 threads]             44.6µs ± 0%              2.5µs ± 0%  -94.31%
BM_booleanAnd_1T/512 [using 1 threads]              178µs ± 0%               10µs ± 0%  -94.35%
BM_booleanAnd_1T/1k  [using 1 threads]              717µs ± 0%               78µs ± 1%  -89.07%
BM_booleanAnd_1T/2k  [using 1 threads]             2.87ms ± 0%             0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k  [using 1 threads]             11.7ms ± 0%              1.9ms ± 4%  -83.55%
BM_booleanAnd_1T/10k [using 1 threads]             70.3ms ± 0%             17.2ms ± 4%  -75.48%
2020-04-20 20:16:28 +00:00
dlazenby
00f6340153 Update PreprocessorDirectives.dox - Added line for the new VectorwiseOp plugin directive (and re-alphabatized the plugin section) 2020-04-17 21:43:37 +00:00
Rasmus Munk Larsen
5ab87d8aba Move eigen_packet_wrapper to GenericPacketMath.h and use it for SSE/AVX/AVX512 as it is already used for NEON.
This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i.
Use this machanism to define packets for half and clean up the packet op implementations.
2020-04-15 18:17:19 +00:00
Rasmus Munk Larsen
4aae8ac693 Fix typo in TypeCasting.h 2020-04-14 02:55:51 +00:00
Rasmus Munk Larsen
1d674003b2 Fix big in vectorized casting of
{uint8, int8} -> {int16, uint16, int32, uint32, float} 
 {uint16, int16} -> {int32, uint32, int64, uint64, float} 

for NEON. These conversions were advertised as vectorized, but not actually implemented.
2020-04-14 02:11:06 +00:00
Changming Sun
b1aa07a8d3 Fix a bug in TensorIndexList.h 2020-04-13 18:22:03 +00:00
Christoph Hertzberg
d46d726e9d CommaInitializer wrongfully asserted for 0-sized blocks
commainitialier unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.
2020-04-13 16:41:20 +02:00
Antonio Sanchez
c854e189e6 Fixed commainitializer test.
The removed `finished()` call was responsible for enforcing that the
initializer was provided the correct number of values. Putting it back in
to restore previous behavior.
2020-04-10 13:53:26 -07:00
jangsoopark
39142904cc Resolve C4346 when building eigen on windows 2020-04-08 14:55:39 +09:00
Rasmus Munk Larsen
f0577a2bfd Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down the the scalar path.
Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.

Measured improvement with AVX+FMA:

name                           old time/op             new time/op             delta
BM_MatMul_ATB/8                 139ns ± 1%              129ns ± 1%   -7.49%          (p=0.008 n=5+5)
BM_MatMul_ATB/32               1.46µs ± 1%             1.22µs ± 0%  -16.72%          (p=0.008 n=5+5)
BM_MatMul_ATB/64               8.43µs ± 1%             7.41µs ± 0%  -12.04%          (p=0.008 n=5+5)
BM_MatMul_ATB/128              56.8µs ± 1%             52.9µs ± 1%   -6.83%          (p=0.008 n=5+5)
BM_MatMul_ATB/256               407µs ± 1%              395µs ± 3%   -2.94%          (p=0.032 n=5+5)
BM_MatMul_ATB/512              3.27ms ± 3%             3.18ms ± 1%     ~             (p=0.056 n=5+5)


Measured improvement for AVX512:

name                          old time/op             new time/op             delta
BM_MatMul_ATB/8                167ns ± 1%              154ns ± 1%   -7.63%          (p=0.008 n=5+5)
BM_MatMul_ATB/32              1.08µs ± 1%             0.83µs ± 3%  -23.58%          (p=0.008 n=5+5)
BM_MatMul_ATB/64              6.21µs ± 1%             5.06µs ± 1%  -18.47%          (p=0.008 n=5+5)
BM_MatMul_ATB/128             36.1µs ± 2%             31.3µs ± 1%  -13.32%          (p=0.008 n=5+5)
BM_MatMul_ATB/256              263µs ± 2%              242µs ± 2%   -7.92%          (p=0.008 n=5+5)
BM_MatMul_ATB/512             1.95ms ± 2%             1.91ms ± 2%     ~             (p=0.095 n=5+5)
BM_MatMul_ATB/1k              15.4ms ± 4%             14.8ms ± 2%     ~             (p=0.095 n=5+5)
2020-04-07 22:09:51 +00:00
Antonio Sanchez
8e875719b3 Replace norm() with squaredNorm() to address integer overflows
For random matrices with integer coefficients, many of the tests here lead to
integer overflows. When taking the norm() of a row/column, the squaredNorm()
often overflows to a negative value, leading to domain errors when taking the
sqrt(). This leads to a crash on some systems. By replacing the norm() call by
a squaredNorm(), the values still overflow, but at least there is no domain
error.

Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
2020-04-07 19:48:28 +00:00
Antonio Sanchez
9dda5eb7d2 Missing struct definition in NumTraits 2020-04-07 09:01:11 -07:00
Akshay Naresh Modi
bcc0e9e15c Add numeric_limits min and max for bool
This will allow (among other things) computation of argmax and argmin of bool tensors
2020-04-06 23:38:57 +00:00
Bernardo Bahia Monteiro
54a0a9c9dd Bugfix: conjugate_gradient did not compile with lazy-evaluated RealScalar
The error generated by the compiler was:

    no matching function for call to 'maxi'
    RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);

The important part in the following notes was:

    candidate template ignored: deduced conflicting
    types for parameter 'T'"
    ('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
    EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)

I am using CoDiPack to provide the RealScalar type.
This bug was introduced in bc000deaa Fix conjugate-gradient for very small rhs
2020-03-29 19:44:12 -04:00
Rasmus Munk Larsen
4fd5d1477b Fix packetmath test build for AVX. 2020-03-27 17:05:39 +00:00
Rasmus Munk Larsen
393dbd8ee9 Fix bug in 52d54278be 2020-03-27 16:42:18 +00:00
Rasmus Munk Larsen
55c8fe8d0f Fix bug in 52d54278be 2020-03-27 16:41:15 +00:00
Joel Holdsworth
6d2dbfc453 NEON: Fixed MSVC types definitions 2020-03-26 20:19:58 +00:00
Joel Holdsworth
52d54278be Additional NEON packet-math operations 2020-03-26 20:18:19 +00:00
Everton Constantino
deb93ed1bf Adhere to recommended load/store intrinsics for pp64le 2020-03-23 15:18:15 -03:00
Aaron Franke
5c22c7a7de Make file formatting comply with POSIX and Unix standards
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
Everton Constantino
5afdaa473a Fixing float32's pround halfway criteria to match STL's criteria. 2020-03-21 22:30:54 -05:00
Alessio M
96cd1ff718 Fixed:
- access violation when initializing 0x0 matrices
- exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert if configured to throw
2020-03-21 05:11:21 +00:00
dlazenby
cc954777f2 Update VectorwiseOp.h to allow Plugins similar to MatrixBase.h or ArrayBase.h 2020-03-20 19:30:01 +00:00
Masaki Murooka
55ecd58a3c Bug https://gitlab.com/libeigen/eigen/-/issues/1415: add missing EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base. 2020-03-20 13:37:37 +09:00
Rasmus Munk Larsen
4da2c6b197 Remove reference to non-existent unary_op_base class. 2020-03-19 18:23:06 +00:00
Rasmus Munk Larsen
eda90baf35 Add missing arguments to numext::absdiff(). 2020-03-19 18:16:55 +00:00
Joel Holdsworth
d5c665742b Add absolute_difference coefficient-wise binary Array function 2020-03-19 17:45:20 +00:00
Everton Constantino
6ff5a14091 Reenabling packetmath unsigned tests, adding dummy pabs for relevant unsigned
types.
2020-03-19 17:31:49 +00:00
Joel Holdsworth
232f904082 Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions 2020-03-19 17:24:06 +00:00
Joel Holdsworth
54aa8fa186 Implement integer square-root for NEON 2020-03-19 17:05:13 +00:00
Allan Leal
37ccb86916 Update NullaryFunctors.h 2020-03-16 11:59:02 +00:00
Deven Desai
7158ed4e0e Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as the Eigen::Half packet type 2020-03-12 01:06:24 +00:00
Joel Holdsworth
d53ae40f7b NEON: Added int64_t and uint64_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
4b9ecf2924 NEON: Added int8_t and uint8_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
ceaabd4e16 NEON: Added int16_t and uint16_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
d5d3cf9339 NEON: Added uint32_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
eacf97f727 NEON: Implemented half-size vectors 2020-03-10 22:46:19 +00:00
Joel Holdsworth
5f411b729e NEON: Set packet_traits<double> flags 2020-03-10 22:46:19 +00:00
Joel Holdsworth
88337acae2 test/packetmath: Add tests for all integer types 2020-03-10 22:46:19 +00:00
Joel Holdsworth
9e68977578 test/packetmath: Made negate non-mandatory 2020-03-10 22:46:19 +00:00
Sami Kama
b733b8b680 remove duplicate pset1 for half and add some comments about why we need expose pmul/add/div/min/max on host 2020-03-10 20:28:43 +00:00
Ram-Z
a45d28256d Don't restrict CMAKE_BUILD_TYPE
This prevents projects that add Eigen using `add_subdirectory` from using their own custom CMAKE_BUILD_TYPE and have Eigen respect the same custom flags.
2020-02-28 20:46:53 +00:00
Cédric Hubert
98bfc5aaa8 Update MarketIO.h 2020-02-28 12:41:51 +00:00
Rasmus Munk Larsen
52a2fbbb00 Revert "avoid selecting half-packets when unnecessary"
This reverts commit 5ca10480b0
2020-02-25 01:07:43 +00:00
Rasmus Munk Larsen
235bcfe08d Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE"
This reverts commit 44df2109c8
2020-02-25 01:07:28 +00:00
Rasmus Munk Larsen
d7a42eade6 Revert "do not pick full-packet if it'd result in more operations"
This reverts commit e9cc0cd353
2020-02-25 01:07:15 +00:00
Rasmus Munk Larsen
6ac37768a9 Revert "add some static checks for packet-picking logic"
This reverts commit 7769600245
2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen
87cfa4862f Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX."
This reverts commit b625adffd8
2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen
b625adffd8 Disable test in test/vectorization_logic.cpp, which is currently failing with AVX. 2020-02-24 23:28:25 +00:00
Tobias Bosch
f0ce88cff7 Include <sstream> explicitly, and don't rely on the implicit include via <complex>.
This implicit dependency does no longer exist in a recent llbm release (sha 78be61871704).
2020-02-24 23:09:36 +00:00
Ilya Tokar
eb6cc29583 Avoid a division in NonBlockingThreadPool::Steal.
Looking at profiles we spend ~10-20% of Steal on simply computing
random % size. We can reduce random 32-bit int into [0, size) range with
a single multiplication and shift. This transformation is described in
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
2020-02-14 16:02:57 -05:00
Francesco Mazzoli
7769600245 add some static checks for packet-picking logic 2020-02-07 18:16:16 +01:00
Francesco Mazzoli
e9cc0cd353 do not pick full-packet if it'd result in more operations
See comment and
<https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952>.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
44df2109c8 Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE
See comment for details.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
5ca10480b0 avoid selecting half-packets when unnecessary
See
<https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation>
for an explanation of the problem this solves.

In short, for some reason, before this commit the half-packet is
selected when the array / matrix size is not a multiple of
`unpacket_traits<PacketType>::size`, where `PacketType` starts out
being the full Packet.

For example, for some data of 100 `float`s, `Packet4f` will be
selected rather than `Packet8f`, because 100 is not a multiple of 8,
the size of `Packet8f`.

This commit switches to selecting the half-packet if the size is
less than the packet size, which seems to make more sense.

As I stated in the SO post I'm not sure that I'm understanding the
issue correctly, but this fix resolves the issue in my program. Moreover,
`make check` passes, with the exception of line 614 and 616 in
`test/packetmath.cpp`, which however also fail on master on my machine:

    CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0);
    ...
    CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);
2020-02-07 18:16:16 +01:00
Eugene Zhulenev
f584bd9b30 Fail at compile time if default executor tries to use non-default device 2020-02-06 22:43:24 +00:00
Eugene Zhulenev
3fda850c46 Remove dead code from TensorReduction.h 2020-01-29 18:45:31 +00:00
Jeff Daily
b5df8cabd7 fix hip-clang compilation due to new HIP scalar accessor 2020-01-20 21:08:52 +00:00
Deven Desai
6d284bb1b7 Fix for HIP breakage - 200115. Adding a missing EIGEN_DEVICE_FUNC attr 2020-01-16 00:51:43 +00:00
Srinivas Vasudevan
f6c6de5d63 Ensure Igamma does not NaN or Inf for large values. 2020-01-14 21:32:48 +00:00
Rasmus Munk Larsen
6601abce86 Remove rogue include in TypeCasting.h. Meta.h is already included by the top-level header in Eigen/Core. 2020-01-14 21:03:53 +00:00
Eugene Zhulenev
b9362fb8f7 Convert StridedLinearBufferCopy::Kind to enum class 2020-01-13 11:43:24 -08:00
Everton Constantino
5a8b97b401 Switching unpacket_traits<Packet4i> to vectorizable=true. 2020-01-13 16:08:20 -03:00
Everton Constantino
42838c28b8 Adding correct cache sizes for PPC architecture. 2020-01-13 16:58:14 +00:00
Christoph Hertzberg
1d0c45122a Removing executable bit from file mode 2020-01-11 15:02:29 +01:00
Christoph Hertzberg
35219cea68 Bug #1790: Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal). 2020-01-11 14:57:22 +01:00
Srinivas Vasudevan
2e099e8d8f Added special_packetmath test and tweaked bounds on tests.
Refactor shared packetmath code to header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00
Rasmus Munk Larsen
e1ecfc162d call Explicitly ::rint and ::rintf for targets without c++11. Without this, the Windows build breaks when trying to compile numext::rint<double>. 2020-01-10 21:14:08 +00:00
Joel Holdsworth
da5a7afed0 Improvements to the tidiness and completeness of the NEON implementation 2020-01-10 18:31:15 +00:00
Anuj Rawat
452371cead Fix for gcc build error when using Eigen headers with AVX512 2020-01-10 18:05:42 +00:00
mehdi-goli
601f89dfd0 Adding RInt vector support for SYCL. 2020-01-10 18:00:36 +00:00
Matthew Powelson
2ea5a715cf Properly initialize b vector in SplineFitting
InterpolateWithDerivative does not initialize the be vector correctly. This issue is discussed In stackoverflow question 48382939.
2020-01-09 21:29:04 +00:00
Rasmus Munk Larsen
9254974115 Don't add EIGEN_DEVICE_FUNC to random() since ::rand is not available in Cuda. 2020-01-09 21:23:09 +00:00
Rasmus Munk Larsen
a3ec89b5bd Add missing EIGEN_DEVICE_FUNC annotations in MathFunctions.h. 2020-01-09 21:06:34 +00:00
Christoph Hertzberg
8333e03590 Use data.data() instead of &data (since it is not obvious that Array is trivially copyable) 2020-01-09 11:38:19 +01:00
Rasmus Munk Larsen
e6fcee995b Don't use the rational approximation to the logistic function on GPUs as it appears to be slightly slower. 2020-01-09 00:04:26 +00:00
Rasmus Munk Larsen
4217a9f090 The upper limits for where to use the rational approximation to the logistic function were not set carefully enough in the original commit, and some arguments would cause the function to return values greater than 1. This change set the versions found by scanning all floating point numbers (using std::nextafterf()). 2020-01-08 22:21:37 +00:00
Christoph Hertzberg
9623c0c4b9 Fix formatting 2020-01-08 13:58:18 +01:00
Ilya Tokar
19876ced76 Bug #1785: Introduce numext::rint.
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e. g. floor (tested SSE/AVX).
2020-01-07 21:22:44 +00:00
mehdi-goli
d0ae052da4 [SYCL Backend]
* Adding Missing operations for vector comparison in SYCL. This caused compiler error for vector comparison when compiling SYCL
 * Fixing the compiler error for placement new in TensorForcedEval.h This caused compiler error when compiling SYCL backend
 * Reducing the SYCL warning by  removing the abort function inside the kernel
 * Adding Strong inline to functions inside SYCL interop.
2020-01-07 15:13:37 +00:00
Everton Constantino
eedb7eeacf Protecting integer_types's long long test with a check to see if we have CXX11 support. 2020-01-07 14:35:35 +00:00
Christoph Hertzberg
bcbaad6d87 Bug #1800: Guard against misleading indentation 2020-01-03 13:47:43 +01:00
Janek Kozicki
00de570793 Fix -Werror -Wfloat-conversion warning. 2019-12-23 23:52:44 +01:00
Deven Desai
636e2bb3fa Fix for HIP breakage - 191220
The breakage was introduced by the following commit :

ae07801dd8

After the commit, HIPCC errors out on some tests with the following error

```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_device_1.dir/cxx11_tensor_device_1_generated_cxx11_tensor_device.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_device.cu:17:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor💯
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:129:12: error: no matching constructor for initialization of 'Eigen::internal::TensorBlockResourceRequirements'
    return {merge(lhs.shape_type, rhs.shape_type),           // shape_type
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided
struct TensorBlockResourceRequirements {
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 5 arguments, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 3 were provided
...
...
```

The fix is to explicitly decalre the (implicitly called) constructor as a device func
2019-12-20 21:28:00 +00:00
Christoph Hertzberg
1e9664b147 Bug #1796: Make matrix squareroot usable for Map and Ref types 2019-12-20 18:10:22 +01:00
Christoph Hertzberg
d86544d654 Reduce code duplication and avoid confusing Doxygen 2019-12-19 19:48:39 +01:00
Christoph Hertzberg
dde279f57d Hide recursive meta templates from Doxygen 2019-12-19 19:47:23 +01:00
Christoph Hertzberg
c21771ac04 Use double-braces initialization (as everywhere else in the test-suite). 2019-12-19 19:20:48 +01:00
Christoph Hertzberg
a3273aeff8 Fix trivial shadow warning 2019-12-19 19:13:11 +01:00
Christoph Hertzberg
870e53c0f2 Bug #1788: Fix rule-of-three violations inside the stable modules.
This fixes deprecated-copy warnings when compiling with GCC>=9
Also protect some additional Base-constructors from getting called by user code code (#1587)
2019-12-19 17:30:11 +01:00
Christoph Hertzberg
6965f6de7f Fix unit-test which I broke in previous fix 2019-12-19 13:42:14 +01:00
Eugene Zhulenev
7a65219a2e Fix TensorPadding bug in squeezed reads from inner dimension 2019-12-19 05:43:57 +00:00
Eugene Zhulenev
73e55525e5 Return const data pointer from TensorRef evaluator.data() 2019-12-18 23:19:36 +00:00
Eugene Zhulenev
ae07801dd8 Tensor block evaluation cost model 2019-12-18 20:07:00 +00:00
Christoph Hertzberg
72166d0e6e Fix some maybe-unitialized warnings 2019-12-18 18:26:20 +01:00
Christoph Hertzberg
5a3eaf88ac Workaround class-memaccess warnings on newer GCC versions 2019-12-18 16:37:26 +01:00
Jeff Daily
de07c4d1c2 fix compilation due to new HIP scalar accessor 2019-12-17 20:27:30 +00:00
Eugene Zhulenev
788bef6ab5 Reduce block evaluation overhead for small tensor expressions 2019-12-17 19:06:14 +00:00
Rasmus Munk Larsen
7252163335 Add default definition for EIGEN_PREDICT_* 2019-12-16 22:31:59 +00:00
Rasmus Munk Larsen
a566074480 Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae), but uses the more accurate approximation 1/(1+exp(-1)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
  - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_predict and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
  - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below repeated calls

   u = v.logistic()  (u = v.tanh(), respectively)

where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467           4468         155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347           2347         299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467           1467         476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805            805         858696  model_time: 1463

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589           2590         270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428           1428         489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059           1059         662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673            673        1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2391           2391         292624  model_time: 4242
AVX
BM_eigen_tanh_float        1256           1256         554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float         823            823         866267  model_time: 1609
AVX512
BM_eigen_tanh_float         443            443        1578999  model_time: 805

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2588           2588         273531  model_time: 4242
AVX
BM_eigen_tanh_float        1536           1536         452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float        1007           1007         694681  model_time: 1609
AVX512
BM_eigen_tanh_float         471            471        1472178  model_time: 805
2019-12-16 21:33:42 +00:00
Christoph Hertzberg
8e5da71466 Resolve double-promotion warnings when compiling with clang.
`sin` was calling `sin(double)` instead of `std::sin(float)`
2019-12-13 22:46:40 +01:00
Christoph Hertzberg
9b7a2b43c2 Renamed .hgignore to .gitignore (removing hg-specific "syntax" line) 2019-12-13 19:40:57 +01:00
Ilya Tokar
06e99aaf40 Bug 1785: fix pround on x86 to use the same rounding mode as std::round.
This also adds pset1frombits helper to Packet[24]d.
Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after,
stil an order of magnitude faster than scalar version: 33.8µs ± 2%.
2019-12-12 17:38:53 -05:00
Rasmus Munk Larsen
73a8d572f5 Clamp tanh approximation outside [-c, c] where c is the smallest value where the approximation is exactly +/-1. Without FMA, c = 7.90531110763549805, with FMA c = 7.99881172180175781. 2019-12-12 19:34:25 +00:00
Srinivas Vasudevan
88062b7fed Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one. 2019-12-12 01:56:54 +00:00
Eugene Zhulenev
381f8f3139 Initialize non-trivially constructible types when allocating a temp buffer. 2019-12-12 01:31:30 +00:00
Eugene Zhulenev
64272c7f40 Squeeze reads from two inner dimensions in TensorPadding 2019-12-11 16:54:51 -08:00
Eugene Zhulenev
963ba1015b Add back accidentally deleted default constructor to TensorExecutorTilingContext. 2019-12-11 18:47:55 +00:00
Joel Holdsworth
1b6e0395e6 Added io test 2019-12-11 18:22:57 +00:00
Joel Holdsworth
3c0ef9f394 IO: Fixed printing of char and unsigned char matrices 2019-12-11 18:22:57 +00:00
Joel Holdsworth
e87af0ed37 Added Eigen::numext typedefs for uint8_t, int8_t, uint16_t and int16_t 2019-12-11 18:22:57 +00:00
Gael Guennebaud
15b3bcfca0 Bug 1786: fix compilation with MSVC 2019-12-11 16:16:38 +01:00
Eugene Zhulenev
c9220c035f Remove block memory allocation required by removed block evaluation API 2019-12-10 17:15:55 -08:00
Eugene Zhulenev
1c879eb010 Remove V2 suffix from TensorBlock 2019-12-10 15:40:23 -08:00
Eugene Zhulenev
dbca11e880 Remove TensorBlock.h and old TensorBlock/BlockMapper 2019-12-10 14:31:44 -08:00
Deven Desai
c49f0d851a Fix for HIP breakage detected on 191210
The following commit introduces compile errors when running eigen with hipcc

2918f85ba9

hipcc errors out because it requies the device attribute on the methods within the TensorBlockV2ResourceRequirements struct instroduced by the commit above. The fix is to add the device attribute to those methods
2019-12-10 22:14:05 +00:00
Eugene Zhulenev
2918f85ba9 Do not use std::vector in getResourceRequirements 2019-12-09 16:19:55 -08:00
Artem Belevich
8056a05b54 Undo the block size change.
.z *is* used by the EigenContractionKernelInternal().
2019-12-09 11:10:29 -08:00
Eugene Zhulenev
dbb703d44e Add async evaluation support to TensorSelectOp 2019-12-09 18:36:13 +00:00
Janek Kozicki
11d6465326 fix AlignedVector3 inconsisent interface with other Vector classes, default constructor and operator- were missing. 2019-12-06 21:07:39 +01:00
Eugene Zhulenev
bb7ccac3af Add recursive work splitting to EvalShardedByInnerDimContext 2019-12-05 14:51:49 -08:00
Artem Belevich
25230d1862 Improve performance of contraction kernels
* Force-inline implementations. They pass around pointers to shared memory
  blocks. Without inlining compiler must operate via generic pointers.
  Inlining allows compiler to detect that we're operating on shared memory
  which allows generation of substantially faster code.

* Fixed a long-standing typo which resulted in launching 8x more kernels
  than we needed (.z dimension of the block is unused by the kernel).
2019-12-05 12:48:34 -08:00
Gael Guennebaud
08eeb648ea update hg to git hashes 2019-12-05 16:33:24 +01:00
Rasmus Munk Larsen
366cf005b0 Add missing initialization in cxx11_tensor_trace.cpp. 2019-12-04 23:56:37 +00:00
Gael Guennebaud
c488b8b32f Replace calls to "hg" by calls to "git" 2019-12-04 11:24:06 +01:00
Gael Guennebaud
8fbe0e4699 Update old links to bitbucket to point to gitlab.com 2019-12-04 10:57:07 +01:00
Gael Guennebaud
114a15c66a Added tag before-git-migration for changeset a7c7d329d8 2019-12-04 10:06:00 +01:00
603 changed files with 44306 additions and 20954 deletions

19
.clang-format Normal file
View File

@@ -0,0 +1,19 @@
---
BasedOnStyle: Google
ColumnLimit: 120
---
Language: Cpp
BasedOnStyle: Google
ColumnLimit: 120
StatementMacros:
- EIGEN_STATIC_ASSERT
- EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
- EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
SortIncludes: false
AttributeMacros:
- EIGEN_STRONG_INLINE
- EIGEN_ALWAYS_INLINE
- EIGEN_DEVICE_FUNC
- EIGEN_DONT_INLINE
- EIGEN_DEPRECATED
- EIGEN_UNUSED

View File

@@ -1,4 +1,3 @@
syntax: glob
qrc_*cxx
*.orig
*.pyc
@@ -36,3 +35,7 @@ lapack/reference
.*project
.settings
Makefile
!ci/build.gitlab-ci.yml
!scripts/buildtests.in
!Eigen/Core
!Eigen/src/Core

34
.gitlab-ci.yml Normal file
View File

@@ -0,0 +1,34 @@
# This file is part of Eigen, a lightweight C++ template library
# for linear algebra.
#
# Copyright (C) 2023, The Eigen Authors
#
# This Source Code Form is subject to the terms of the Mozilla
# Public License v. 2.0. If a copy of the MPL was not distributed
# with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
stages:
- checkformat
- build
- test
- deploy
variables:
# CMake build directory.
EIGEN_CI_BUILDDIR: .build
# Specify the CMake build target.
EIGEN_CI_BUILD_TARGET: ""
# If a test regex is specified, that will be selected.
# Otherwise, we will try a label if specified.
EIGEN_CI_CTEST_REGEX: ""
EIGEN_CI_CTEST_LABEL: ""
EIGEN_CI_CTEST_ARGS: ""
include:
- "/ci/checkformat.gitlab-ci.yml"
- "/ci/common.gitlab-ci.yml"
- "/ci/build.linux.gitlab-ci.yml"
- "/ci/build.windows.gitlab-ci.yml"
- "/ci/test.linux.gitlab-ci.yml"
- "/ci/test.windows.gitlab-ci.yml"
- "/ci/deploy.gitlab-ci.yml"

View File

@@ -0,0 +1,69 @@
<!--
Please read this!
Before opening a new issue, make sure to search for keywords in the issues
filtered by "bug::confirmed" or "bug::unconfirmed" and "bugzilla" label:
- https://gitlab.com/libeigen/eigen/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=bug%3A%3Aconfirmed
- https://gitlab.com/libeigen/eigen/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=bug%3A%3Aunconfirmed
- https://gitlab.com/libeigen/eigen/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=bugzilla
and verify the issue you're about to submit isn't a duplicate. -->
### Summary
<!-- Summarize the bug encountered concisely. -->
### Environment
<!-- Please provide your development environment here -->
- **Operating System** : Windows/Linux
- **Architecture** : x64/Arm64/PowerPC ...
- **Eigen Version** : 3.3.9
- **Compiler Version** : Gcc7.0
- **Compile Flags** : -O3 -march=native
- **Vector Extension** : SSE/AVX/NEON ...
### Minimal Example
<!-- If possible, please create a minimal example here that exhibits the problematic behavior.
You can also link to [godbolt](https://godbolt.org). But please note that you need to click
the "Share" button in the top right-hand corner of the godbolt page where you reproduce the sample
code to get the share link instead of in your browser address bar.
You can read [the guidelines on stackoverflow](https://stackoverflow.com/help/minimal-reproducible-example)
on how to create a good minimal example. -->
```cpp
//show your code here
```
### Steps to reproduce
<!-- Describe how one can reproduce the issue - this is very important. Please use an ordered list. -->
1. first step
2. second step
3. ...
### What is the current *bug* behavior?
<!-- Describe what actually happens. -->
### What is the expected *correct* behavior?
<!-- Describe what you should see instead. -->
### Relevant logs
<!-- Add relevant code snippets or program output within blocks marked by " ``` " -->
<!-- OPTIONAL: remove this section if you are not reporting a compilation warning issue.-->
### Warning Messages
<!-- Show us the warning messages you got! -->
<!-- OPTIONAL: remove this section if you are not reporting a performance issue. -->
### Benchmark scripts and results
<!-- Please share any benchmark scripts - either standalone, or using [Google Benchmark](https://github.com/google/benchmark). -->
### Anything else that might help
<!-- It will be better to provide us more information to help narrow down the cause.
Including but not limited to the following:
- lines of code that might help us diagnose the problem.
- potential ways to address the issue.
- last known working/first broken version (release number or commit hash). -->
- [ ] Have a plan to fix this issue.

View File

@@ -0,0 +1,7 @@
### Describe the feature you would like to be implemented.
### Would such a feature be useful for other users? Why?
### Any hints on how to implement the requested feature?
### Additional resources

View File

@@ -0,0 +1,26 @@
<!--
Thanks for contributing a merge request! Please name and fully describe your MR as you would for a commit message.
If the MR fixes an issue, please include "Fixes #issue" in the commit message and the MR description.
In addition, we recommend that first-time contributors read our [contribution guidelines](https://eigen.tuxfamily.org/index.php?title=Contributing_to_Eigen) and [git page](https://eigen.tuxfamily.org/index.php?title=Git), which will help you submit a more standardized MR.
Before submitting the MR, you also need to complete the following checks:
- Make one PR per feature/bugfix (don't mix multiple changes into one PR). Avoid committing unrelated changes.
- Rebase before committing
- For code changes, run the test suite (at least the tests that are likely affected by the change).
See our [test guidelines](https://eigen.tuxfamily.org/index.php?title=Tests).
- If possible, add a test (both for bug-fixes as well as new features)
- Make sure new features are documented
Note that we are a team of volunteers; we appreciate your patience during the review process.
Again, thanks for contributing! -->
### Reference issue
<!-- You can link to a specific issue using the gitlab syntax #<issue number> -->
### What does this implement/fix?
<!--Please explain your changes.-->
### Additional information
<!--Any additional information you think is important.-->

File diff suppressed because it is too large Load Diff

203
COPYING.APACHE Normal file
View File

@@ -0,0 +1,203 @@
/*
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

View File

@@ -23,4 +23,4 @@
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
*/

View File

@@ -1,52 +1,51 @@
Minpack Copyright Notice (1999) University of Chicago. All rights reserved
Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the
following conditions are met:
1. Redistributions of source code must retain the above
copyright notice, this list of conditions and the following
disclaimer.
2. Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
3. The end-user documentation included with the
redistribution, if any, must include the following
acknowledgment:
"This product includes software developed by the
University of Chicago, as Operator of Argonne National
Laboratory.
Alternately, this acknowledgment may appear in the software
itself, if and wherever such third-party acknowledgments
normally appear.
4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
BE CORRECTED.
5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
POSSIBILITY OF SUCH LOSS OR DAMAGES.
Minpack Copyright Notice (1999) University of Chicago. All rights reserved
Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the
following conditions are met:
1. Redistributions of source code must retain the above
copyright notice, this list of conditions and the following
disclaimer.
2. Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
3. The end-user documentation included with the
redistribution, if any, must include the following
acknowledgment:
"This product includes software developed by the
University of Chicago, as Operator of Argonne National
Laboratory.
Alternately, this acknowledgment may appear in the software
itself, if and wherever such third-party acknowledgments
normally appear.
4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
BE CORRECTED.
5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
POSSIBILITY OF SUCH LOSS OR DAMAGES.

View File

@@ -43,4 +43,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_CHOLESKY_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -22,7 +22,7 @@ extern "C" {
* This module provides an interface to the Cholmod library which is part of the <a href="http://www.suitesparse.com">suitesparse</a> package.
* It provides the two following main factorization classes:
* - class CholmodSupernodalLLT: a supernodal LLT Cholesky factorization.
* - class CholmodDecomposiiton: a general L(D)LT Cholesky factorization with automatic or explicit runtime selection of the underlying factorization method (supernodal or simplicial).
* - class CholmodDecomposition: a general L(D)LT Cholesky factorization with automatic or explicit runtime selection of the underlying factorization method (supernodal or simplicial).
*
* For the sake of completeness, this module also propose the two following classes:
* - class CholmodSimplicialLLT

View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_CORE_H
#define EIGEN_CORE_H
// first thing Eigen does: stop the compiler from committing suicide
// first thing Eigen does: stop the compiler from reporting useless warnings.
#include "src/Core/util/DisableStupidWarnings.h"
// then include this file where all our macros are defined. It's really important to do it first because
@@ -22,7 +22,7 @@
#include "src/Core/util/ConfigureVectorization.h"
// We need cuda_runtime.h/hip_runtime.h to ensure that
// the EIGEN_USING_STD_MATH macro works properly on the device side
// the EIGEN_USING_STD macro works properly on the device side
#if defined(EIGEN_CUDACC)
#include <cuda_runtime.h>
#elif defined(EIGEN_HIPCC)
@@ -36,10 +36,17 @@
// Disable the ipa-cp-clone optimization flag with MinGW 6.x or newer (enabled by default with -O3)
// See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=556 for details.
#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6)
#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6) && EIGEN_GNUC_AT_MOST(5,5)
#pragma GCC optimize ("-fno-ipa-cp-clone")
#endif
// Prevent ICC from specializing std::complex operators that silently fail
// on device. This allows us to use our own device-compatible specializations
// instead.
#if defined(EIGEN_COMP_ICC) && defined(EIGEN_GPU_COMPILE_PHASE) \
&& !defined(_OVERRIDE_COMPLEX_SPECIALIZATION_)
#define _OVERRIDE_COMPLEX_SPECIALIZATION_ 1
#endif
#include <complex>
// this include file manages BLAS and MKL related macros
@@ -51,6 +58,10 @@
#define EIGEN_HAS_GPU_FP16
#endif
#if defined(EIGEN_HAS_CUDA_BF16) || defined(EIGEN_HAS_HIP_BF16)
#define EIGEN_HAS_GPU_BF16
#endif
#if (defined _OPENMP) && (!defined EIGEN_DONT_PARALLELIZE)
#define EIGEN_HAS_OPENMP
#endif
@@ -73,6 +84,7 @@
#include <cassert>
#include <functional>
#ifndef EIGEN_NO_IO
#include <sstream>
#include <iosfwd>
#endif
#include <cstring>
@@ -97,7 +109,8 @@
#endif
// required for __cpuid, needs to be included after cmath
#if EIGEN_COMP_MSVC && EIGEN_ARCH_i386_OR_x86_64 && !EIGEN_OS_WINCE
// also required for _BitScanReverse on Windows on ARM
#if EIGEN_COMP_MSVC && (EIGEN_ARCH_i386_OR_x86_64 || EIGEN_ARCH_ARM64) && !EIGEN_OS_WINCE
#include <intrin.h>
#endif
@@ -107,7 +120,7 @@
#undef isnan
#undef isinf
#undef isfinite
#include <SYCL/sycl.hpp>
#include <CL/sycl.hpp>
#include <map>
#include <memory>
#include <utility>
@@ -162,6 +175,7 @@ using std::ptrdiff_t;
#include "src/Core/arch/Default/ConjHelper.h"
// Generic half float support
#include "src/Core/arch/Default/Half.h"
#include "src/Core/arch/Default/BFloat16.h"
#include "src/Core/arch/Default/TypeCasting.h"
#include "src/Core/arch/Default/GenericPacketMathFunctionsFwd.h"
@@ -202,6 +216,10 @@ using std::ptrdiff_t;
#include "src/Core/arch/NEON/TypeCasting.h"
#include "src/Core/arch/NEON/MathFunctions.h"
#include "src/Core/arch/NEON/Complex.h"
#elif defined EIGEN_VECTORIZE_SVE
#include "src/Core/arch/SVE/PacketMath.h"
#include "src/Core/arch/SVE/TypeCasting.h"
#include "src/Core/arch/SVE/MathFunctions.h"
#elif defined EIGEN_VECTORIZE_ZVECTOR
#include "src/Core/arch/ZVector/PacketMath.h"
#include "src/Core/arch/ZVector/MathFunctions.h"
@@ -329,6 +347,12 @@ using std::ptrdiff_t;
#include "src/Core/CoreIterators.h"
#include "src/Core/ConditionEstimator.h"
#if defined(EIGEN_VECTORIZE_VSX)
#include "src/Core/arch/AltiVec/MatrixProduct.h"
#elif defined EIGEN_VECTORIZE_NEON
#include "src/Core/arch/NEON/GeneralBlockPanelKernel.h"
#endif
#include "src/Core/BooleanRedux.h"
#include "src/Core/Select.h"
#include "src/Core/VectorwiseOp.h"

View File

@@ -58,4 +58,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_EIGENVALUES_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -50,11 +50,10 @@
#include "src/Geometry/Umeyama.h"
// Use the SSE optimized version whenever possible.
#if defined EIGEN_VECTORIZE_SSE
#include "src/Geometry/arch/Geometry_SSE.h"
#if (defined EIGEN_VECTORIZE_SSE) || (defined EIGEN_VECTORIZE_NEON)
#include "src/Geometry/arch/Geometry_SIMD.h"
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_GEOMETRY_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -27,4 +27,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_HOUSEHOLDER_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -29,5 +29,4 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_JACOBI_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -38,13 +38,10 @@
#include "src/LU/Determinant.h"
#include "src/LU/InverseImpl.h"
// Use the SSE optimized version whenever possible. At the moment the
// SSE version doesn't compile when AVX is enabled
#if defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX
#include "src/LU/arch/Inverse_SSE.h"
#if defined EIGEN_VECTORIZE_SSE || defined EIGEN_VECTORIZE_NEON
#include "src/LU/arch/InverseSize4.h"
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_LU_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -48,4 +48,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_QR_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -37,4 +37,3 @@ void *qRealloc(void *ptr, std::size_t size)
#endif
#endif // EIGEN_QTMALLOC_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -48,4 +48,3 @@
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SVD_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

View File

@@ -25,8 +25,6 @@
#include "src/Core/util/DisableStupidWarnings.h"
#include "src/SparseLU/SparseLU_gemm_kernel.h"
#include "src/SparseLU/SparseLU_Structs.h"
#include "src/SparseLU/SparseLU_SupernodalMatrix.h"
#include "src/SparseLU/SparseLUImpl.h"

View File

@@ -45,7 +45,7 @@ namespace internal {
* matrix \f$ A \f$ such that \f$ A = P^TLDL^*P \f$, where P is a permutation matrix, L
* is lower triangular with a unit diagonal and D is a diagonal matrix.
*
* The decomposition uses pivoting to ensure stability, so that L will have
* The decomposition uses pivoting to ensure stability, so that D will have
* zeros in the bottom right rank(A) - n submatrix. Avoiding the square root
* on D also stabilizes the computation.
*
@@ -53,7 +53,7 @@ namespace internal {
* decomposition to determine whether a system of equations has a solution.
*
* This class supports the \link InplaceDecomposition inplace decomposition \endlink mechanism.
*
*
* \sa MatrixBase::ldlt(), SelfAdjointView::ldlt(), class LLT
*/
template<typename _MatrixType, int _UpLo> class LDLT
@@ -200,7 +200,7 @@ template<typename _MatrixType, int _UpLo> class LDLT
* \f$ L^* y_4 = y_3 \f$ and \f$ P x = y_4 \f$ in succession. If the matrix \f$ A \f$ is singular, then
* \f$ D \f$ will also be singular (all the other matrices are invertible). In that case, the
* least-square solution of \f$ D y_3 = y_2 \f$ is computed. This does not mean that this function
* computes the least-square solution of \f$ A x = b \f$ is \f$ A \f$ is singular.
* computes the least-square solution of \f$ A x = b \f$ if \f$ A \f$ is singular.
*
* \sa MatrixBase::ldlt(), SelfAdjointView::ldlt()
*/
@@ -246,8 +246,8 @@ template<typename _MatrixType, int _UpLo> class LDLT
*/
const LDLT& adjoint() const { return *this; };
inline Index rows() const { return m_matrix.rows(); }
inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC inline EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC inline EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
/** \brief Reports whether previous computation was successful.
*

View File

@@ -199,10 +199,10 @@ template<typename _MatrixType, int _UpLo> class LLT
* This method is provided for compatibility with other matrix decompositions, thus enabling generic code such as:
* \code x = decomposition.adjoint().solve(b) \endcode
*/
const LLT& adjoint() const { return *this; };
const LLT& adjoint() const EIGEN_NOEXCEPT { return *this; };
inline Index rows() const { return m_matrix.rows(); }
inline Index cols() const { return m_matrix.cols(); }
inline EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
inline EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
template<typename VectorType>
LLT & rankUpdate(const VectorType& vec, const RealScalar& sigma = 1);

View File

@@ -172,7 +172,8 @@ seqN(FirstType first, SizeType size) {
return ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,typename internal::cleanup_index_type<SizeType>::type>(first,size);
}
#ifdef EIGEN_PARSED_BY_DOXYGEN
#if EIGEN_HAS_CXX11
/** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and with positive (or negative) increment \a incr
*
@@ -183,24 +184,6 @@ seqN(FirstType first, SizeType size) {
*
* \sa seqN(FirstType,SizeType,IncrType), seq(FirstType,LastType)
*/
template<typename FirstType,typename LastType, typename IncrType>
auto seq(FirstType f, LastType l, IncrType incr);
/** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and unit increment
*
* It is essentially an alias to:
* \code
* seqN(f,l-f+1);
* \endcode
*
* \sa seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType)
*/
template<typename FirstType,typename LastType>
auto seq(FirstType f, LastType l);
#else // EIGEN_PARSED_BY_DOXYGEN
#if EIGEN_HAS_CXX11
template<typename FirstType,typename LastType>
auto seq(FirstType f, LastType l) -> decltype(seqN(typename internal::cleanup_index_type<FirstType>::type(f),
( typename internal::cleanup_index_type<LastType>::type(l)
@@ -211,6 +194,15 @@ auto seq(FirstType f, LastType l) -> decltype(seqN(typename internal::cleanup_in
-typename internal::cleanup_index_type<FirstType>::type(f)+fix<1>()));
}
/** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and unit increment
*
* It is essentially an alias to:
* \code
* seqN(f,l-f+1);
* \endcode
*
* \sa seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType)
*/
template<typename FirstType,typename LastType, typename IncrType>
auto seq(FirstType f, LastType l, IncrType incr)
-> decltype(seqN(typename internal::cleanup_index_type<FirstType>::type(f),
@@ -317,26 +309,12 @@ seq(const symbolic::BaseExpr<FirstTypeDerived> &f, const symbolic::BaseExpr<Last
}
#endif // EIGEN_HAS_CXX11
#endif // EIGEN_PARSED_BY_DOXYGEN
#if EIGEN_HAS_CXX11 || defined(EIGEN_PARSED_BY_DOXYGEN)
/** \cpp11
* \returns a symbolic ArithmeticSequence representing the last \a size elements with increment \a incr.
*
* It is a shortcut for: \code seqN(last-(size-fix<1>)*incr, size, incr) \endcode
*
* \sa lastN(SizeType), seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */
template<typename SizeType,typename IncrType>
auto lastN(SizeType size, IncrType incr)
-> decltype(seqN(Eigen::last-(size-fix<1>())*incr, size, incr))
{
return seqN(Eigen::last-(size-fix<1>())*incr, size, incr);
}
#if EIGEN_HAS_CXX11
/** \cpp11
* \returns a symbolic ArithmeticSequence representing the last \a size elements with a unit increment.
*
* \anchor indexing_lastN
*
* It is a shortcut for: \code seq(last+fix<1>-size, last) \endcode
*
* \sa lastN(SizeType,IncrType, seqN(FirstType,SizeType), seq(FirstType,LastType) */
@@ -346,6 +324,21 @@ auto lastN(SizeType size)
{
return seqN(Eigen::last+fix<1>()-size, size);
}
/** \cpp11
* \returns a symbolic ArithmeticSequence representing the last \a size elements with increment \a incr.
*
* \anchor indexing_lastN_with_incr
*
* It is a shortcut for: \code seqN(last-(size-fix<1>)*incr, size, incr) \endcode
*
* \sa lastN(SizeType), seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */
template<typename SizeType,typename IncrType>
auto lastN(SizeType size, IncrType incr)
-> decltype(seqN(Eigen::last-(size-fix<1>())*incr, size, incr))
{
return seqN(Eigen::last-(size-fix<1>())*incr, size, incr);
}
#endif
namespace internal {

View File

@@ -117,7 +117,7 @@ class Array
{
return Base::_set(other);
}
/** Default constructor.
*
* For fixed-size matrices, does nothing.
@@ -157,13 +157,21 @@ class Array
EIGEN_DEVICE_FUNC
Array& operator=(Array&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_assignable<Scalar>::value)
{
other.swap(*this);
Base::operator=(std::move(other));
return *this;
}
#endif
#if EIGEN_HAS_CXX11
/** \copydoc PlainObjectBase(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
/** \brief Construct a row of column vector with fixed size from an arbitrary number of coefficients. \cpp11
*
* \only_for_vectors
*
* This constructor is for 1D array or vectors with more than 4 coefficients.
* There exists C++98 analogue constructors for fixed-size array/vector having 1, 2, 3, or 4 coefficients.
*
* \warning To construct a column (resp. row) vector of fixed length, the number of values passed to this
* constructor must match the the fixed number of rows (resp. columns) of \c *this.
*
* Example: \include Array_variadic_ctor_cxx11.cpp
* Output: \verbinclude Array_variadic_ctor_cxx11.out
@@ -177,24 +185,24 @@ class Array
: Base(a0, a1, a2, a3, args...) {}
/** \brief Constructs an array and initializes it from the coefficients given as initializer-lists grouped by row. \cpp11
*
*
* In the general case, the constructor takes a list of rows, each row being represented as a list of coefficients:
*
*
* Example: \include Array_initializer_list_23_cxx11.cpp
* Output: \verbinclude Array_initializer_list_23_cxx11.out
*
*
* Each of the inner initializer lists must contain the exact same number of elements, otherwise an assertion is triggered.
*
*
* In the case of a compile-time column 1D array, implicit transposition from a single row is allowed.
* Therefore <code> Array<int,Dynamic,1>{{1,2,3,4,5}}</code> is legal and the more verbose syntax
* <code>Array<int,Dynamic,1>{{1},{2},{3},{4},{5}}</code> can be avoided:
*
*
* Example: \include Array_initializer_list_vector_cxx11.cpp
* Output: \verbinclude Array_initializer_list_vector_cxx11.out
*
*
* In the case of fixed-sized arrays, the initializer list sizes must exactly match the array sizes,
* and implicit transposition is allowed for compile-time 1D arrays only.
*
*
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC
@@ -241,7 +249,7 @@ class Array
/** constructs an initialized 2D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args) */
Array(const Scalar& val0, const Scalar& val1);
#endif // end EIGEN_PARSED_BY_DOXYGEN
#endif // end EIGEN_PARSED_BY_DOXYGEN
/** constructs an initialized 3D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
@@ -288,8 +296,10 @@ class Array
: Base(other.derived())
{ }
EIGEN_DEVICE_FUNC inline Index innerStride() const { return 1; }
EIGEN_DEVICE_FUNC inline Index outerStride() const { return this->innerSize(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT{ return 1; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return this->innerSize(); }
#ifdef EIGEN_ARRAY_PLUGIN
#include EIGEN_ARRAY_PLUGIN
@@ -322,7 +332,7 @@ class Array
* template parameter, i.e.:
* - `ArrayRowsCols<Type>` where `Rows` and `Cols` can be \c 2,\c 3,\c 4, or \c X for fixed or dynamic size.
* - `ArraySize<Type>` where `Size` can be \c 2,\c 3,\c 4 or \c X for fixed or dynamic size 1D arrays.
*
*
* \sa class Array
*/
@@ -367,7 +377,7 @@ using Array##SizeSuffix##SizeSuffix = Array<Type, Size, Size>; \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##SizeSuffix = Array<Type, Size, 1>;
using Array##SizeSuffix = Array<Type, Size, 1>;
#define EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Size) \
/** \ingroup arraytypedefs */ \
@@ -391,7 +401,7 @@ EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(4)
#undef EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS
#endif // EIGEN_HAS_CXX11
#define EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, SizeSuffix) \
using Eigen::Matrix##SizeSuffix##TypeSuffix; \
using Eigen::Vector##SizeSuffix##TypeSuffix; \

View File

@@ -153,8 +153,8 @@ template<typename Derived> class ArrayBase
// inline void evalTo(Dest& dst) const { dst = matrix(); }
protected:
EIGEN_DEVICE_FUNC
ArrayBase() : Base() {}
EIGEN_DEFAULT_COPY_CONSTRUCTOR(ArrayBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(ArrayBase)
private:
explicit ArrayBase(Index);

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_ARRAYWRAPPER_H
#define EIGEN_ARRAYWRAPPER_H
namespace Eigen {
namespace Eigen {
/** \class ArrayWrapper
* \ingroup Core_Module
@@ -60,14 +60,14 @@ class ArrayWrapper : public ArrayBase<ArrayWrapper<ExpressionType> >
EIGEN_DEVICE_FUNC
explicit EIGEN_STRONG_INLINE ArrayWrapper(ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC
inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC
inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC
inline Index outerStride() const { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC
inline Index innerStride() const { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC
inline ScalarWithConstIfNotLvalue* data() { return m_expression.data(); }
@@ -91,8 +91,8 @@ class ArrayWrapper : public ArrayBase<ArrayWrapper<ExpressionType> >
inline void evalTo(Dest& dst) const { dst = m_expression; }
EIGEN_DEVICE_FUNC
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
{
return m_expression;
}
@@ -158,14 +158,14 @@ class MatrixWrapper : public MatrixBase<MatrixWrapper<ExpressionType> >
EIGEN_DEVICE_FUNC
explicit inline MatrixWrapper(ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC
inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC
inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC
inline Index outerStride() const { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC
inline Index innerStride() const { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC
inline ScalarWithConstIfNotLvalue* data() { return m_expression.data(); }
@@ -185,8 +185,8 @@ class MatrixWrapper : public MatrixBase<MatrixWrapper<ExpressionType> >
}
EIGEN_DEVICE_FUNC
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
{
return m_expression;
}

View File

@@ -17,7 +17,7 @@ namespace Eigen {
// This implementation is based on Assign.h
namespace internal {
/***************************************************************************
* Part 1 : the logic deciding a strategy for traversal and unrolling *
***************************************************************************/
@@ -29,12 +29,12 @@ struct copy_using_evaluator_traits
{
typedef typename DstEvaluator::XprType Dst;
typedef typename Dst::Scalar DstScalar;
enum {
DstFlags = DstEvaluator::Flags,
SrcFlags = SrcEvaluator::Flags
};
public:
enum {
DstAlignment = DstEvaluator::Alignment,
@@ -99,7 +99,8 @@ private:
public:
enum {
Traversal = (int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize)) ? int(LinearVectorizedTraversal)
Traversal = int(Dst::SizeAtCompileTime) == 0 ? int(AllAtOnceTraversal) // If compile-size is zero, traversing will fail at compile-time.
: (int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize)) ? int(LinearVectorizedTraversal)
: int(MayInnerVectorize) ? int(InnerVectorizedTraversal)
: int(MayLinearVectorize) ? int(LinearVectorizedTraversal)
: int(MaySliceVectorize) ? int(SliceVectorizedTraversal)
@@ -137,7 +138,7 @@ public:
? int(CompleteUnrolling)
: int(NoUnrolling) )
: int(Traversal) == int(LinearTraversal)
? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling)
? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling)
: int(NoUnrolling) )
#if EIGEN_UNALIGNED_VECTORIZE
: int(Traversal) == int(SliceVectorizedTraversal)
@@ -199,7 +200,7 @@ struct copy_using_evaluator_DefaultTraversal_CompleteUnrolling
// FIXME: this is not very clean, perhaps this information should be provided by the kernel?
typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
typedef typename DstEvaluatorType::XprType DstXprType;
enum {
outer = Index / DstXprType::InnerSizeAtCompileTime,
inner = Index % DstXprType::InnerSizeAtCompileTime
@@ -265,7 +266,7 @@ struct copy_using_evaluator_innervec_CompleteUnrolling
typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
typedef typename DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum {
outer = Index / DstXprType::InnerSizeAtCompileTime,
inner = Index % DstXprType::InnerSizeAtCompileTime,
@@ -316,6 +317,22 @@ template<typename Kernel,
int Unrolling = Kernel::AssignmentTraits::Unrolling>
struct dense_assignment_loop;
/************************
***** Special Cases *****
************************/
// Zero-sized assignment is a no-op.
template<typename Kernel, int Unrolling>
struct dense_assignment_loop<Kernel, AllAtOnceTraversal, Unrolling>
{
EIGEN_DEVICE_FUNC static void EIGEN_STRONG_INLINE run(Kernel& /*kernel*/)
{
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
EIGEN_STATIC_ASSERT(int(DstXprType::SizeAtCompileTime) == 0,
EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT)
}
};
/************************
*** Default traversal ***
************************/
@@ -430,10 +447,10 @@ struct dense_assignment_loop<Kernel, LinearVectorizedTraversal, CompleteUnrollin
{
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum { size = DstXprType::SizeAtCompileTime,
packetSize =unpacket_traits<PacketType>::size,
alignedSize = (size/packetSize)*packetSize };
alignedSize = (int(size)/packetSize)*packetSize };
copy_using_evaluator_innervec_CompleteUnrolling<Kernel, 0, alignedSize>::run(kernel);
copy_using_evaluator_DefaultTraversal_CompleteUnrolling<Kernel, alignedSize, size>::run(kernel);
@@ -572,14 +589,15 @@ struct dense_assignment_loop<Kernel, SliceVectorizedTraversal, InnerUnrolling>
typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
typedef typename Kernel::PacketType PacketType;
enum { size = DstXprType::InnerSizeAtCompileTime,
enum { innerSize = DstXprType::InnerSizeAtCompileTime,
packetSize =unpacket_traits<PacketType>::size,
vectorizableSize = (size/packetSize)*packetSize };
vectorizableSize = (int(innerSize) / int(packetSize)) * int(packetSize),
size = DstXprType::SizeAtCompileTime };
for(Index outer = 0; outer < kernel.outerSize(); ++outer)
{
copy_using_evaluator_innervec_InnerUnrolling<Kernel, 0, vectorizableSize, 0, 0>::run(kernel, outer);
copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, vectorizableSize, size>::run(kernel, outer);
copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, vectorizableSize, innerSize>::run(kernel, outer);
}
}
};
@@ -603,14 +621,14 @@ protected:
typedef typename DstEvaluatorTypeT::XprType DstXprType;
typedef typename SrcEvaluatorTypeT::XprType SrcXprType;
public:
typedef DstEvaluatorTypeT DstEvaluatorType;
typedef SrcEvaluatorTypeT SrcEvaluatorType;
typedef typename DstEvaluatorType::Scalar Scalar;
typedef copy_using_evaluator_traits<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor> AssignmentTraits;
typedef typename AssignmentTraits::PacketType PacketType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
generic_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
: m_dst(dst), m_src(src), m_functor(func), m_dstExpr(dstExpr)
@@ -619,58 +637,58 @@ public:
AssignmentTraits::debug();
#endif
}
EIGEN_DEVICE_FUNC Index size() const { return m_dstExpr.size(); }
EIGEN_DEVICE_FUNC Index innerSize() const { return m_dstExpr.innerSize(); }
EIGEN_DEVICE_FUNC Index outerSize() const { return m_dstExpr.outerSize(); }
EIGEN_DEVICE_FUNC Index rows() const { return m_dstExpr.rows(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_dstExpr.cols(); }
EIGEN_DEVICE_FUNC Index outerStride() const { return m_dstExpr.outerStride(); }
EIGEN_DEVICE_FUNC DstEvaluatorType& dstEvaluator() { return m_dst; }
EIGEN_DEVICE_FUNC const SrcEvaluatorType& srcEvaluator() const { return m_src; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index size() const EIGEN_NOEXCEPT { return m_dstExpr.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index innerSize() const EIGEN_NOEXCEPT { return m_dstExpr.innerSize(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index outerSize() const EIGEN_NOEXCEPT { return m_dstExpr.outerSize(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_dstExpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_dstExpr.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index outerStride() const EIGEN_NOEXCEPT { return m_dstExpr.outerStride(); }
EIGEN_DEVICE_FUNC DstEvaluatorType& dstEvaluator() EIGEN_NOEXCEPT { return m_dst; }
EIGEN_DEVICE_FUNC const SrcEvaluatorType& srcEvaluator() const EIGEN_NOEXCEPT { return m_src; }
/// Assign src(row,col) to dst(row,col) through the assignment functor.
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Index row, Index col)
{
m_functor.assignCoeff(m_dst.coeffRef(row,col), m_src.coeff(row,col));
}
/// \sa assignCoeff(Index,Index)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Index index)
{
m_functor.assignCoeff(m_dst.coeffRef(index), m_src.coeff(index));
}
/// \sa assignCoeff(Index,Index)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeffByOuterInner(Index outer, Index inner)
{
Index row = rowIndexByOuterInner(outer, inner);
Index col = colIndexByOuterInner(outer, inner);
Index row = rowIndexByOuterInner(outer, inner);
Index col = colIndexByOuterInner(outer, inner);
assignCoeff(row, col);
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacket(Index row, Index col)
{
m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(row,col), m_src.template packet<LoadMode,PacketType>(row,col));
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacket(Index index)
{
m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(index), m_src.template packet<LoadMode,PacketType>(index));
}
template<int StoreMode, int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacketByOuterInner(Index outer, Index inner)
{
Index row = rowIndexByOuterInner(outer, inner);
Index row = rowIndexByOuterInner(outer, inner);
Index col = colIndexByOuterInner(outer, inner);
assignPacket<StoreMode,LoadMode,PacketType>(row, col);
}
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index rowIndexByOuterInner(Index outer, Index inner)
{
typedef typename DstEvaluatorType::ExpressionTraits Traits;
@@ -693,7 +711,7 @@ public:
{
return m_dstExpr.data();
}
protected:
DstEvaluatorType& m_dst;
const SrcEvaluatorType& m_src;
@@ -716,13 +734,13 @@ protected:
typedef typename Base::DstXprType DstXprType;
typedef copy_using_evaluator_traits<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor, 4> AssignmentTraits;
typedef typename AssignmentTraits::PacketType PacketType;
EIGEN_DEVICE_FUNC restricted_packet_dense_assignment_kernel(DstEvaluatorTypeT &dst, const SrcEvaluatorTypeT &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{
}
};
/***************************************************************************
* Part 5 : Entry point for dense rectangular assignment
***************************************************************************/
@@ -760,13 +778,23 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType
resize_if_allowed(dst, src, func);
DstEvaluatorType dstEvaluator(dst);
typedef generic_dense_assignment_kernel<DstEvaluatorType,SrcEvaluatorType,Functor> Kernel;
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
dense_assignment_loop<Kernel>::run(kernel);
}
// Specialization for filling the destination with a constant value.
#ifndef EIGEN_GPU_COMPILE_PHASE
template<typename DstXprType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType& dst, const Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<typename DstXprType::Scalar>, DstXprType>& src, const internal::assign_op<typename DstXprType::Scalar,typename DstXprType::Scalar>& func)
{
resize_if_allowed(dst, src, func);
std::fill_n(dst.data(), dst.size(), src.functor()());
}
#endif
template<typename DstXprType, typename SrcXprType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType& dst, const SrcXprType& src)
{
@@ -788,7 +816,7 @@ struct EigenBase2EigenBase {};
template<typename,typename> struct AssignmentKind { typedef EigenBase2EigenBase Kind; };
template<> struct AssignmentKind<DenseShape,DenseShape> { typedef Dense2Dense Kind; };
// This is the main assignment class
template< typename DstXprType, typename SrcXprType, typename Functor,
typename Kind = typename AssignmentKind< typename evaluator_traits<DstXprType>::Shape , typename evaluator_traits<SrcXprType>::Shape >::Kind,
@@ -813,7 +841,7 @@ void call_assignment(const Dst& dst, const Src& src)
{
call_assignment(dst, src, internal::assign_op<typename Dst::Scalar,typename Src::Scalar>());
}
// Deal with "assume-aliasing"
template<typename Dst, typename Src, typename Func>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -853,12 +881,12 @@ void call_assignment_no_alias(Dst& dst, const Src& src, const Func& func)
typedef typename internal::conditional<NeedToTranspose, Transpose<Dst>, Dst>::type ActualDstTypeCleaned;
typedef typename internal::conditional<NeedToTranspose, Transpose<Dst>, Dst&>::type ActualDstType;
ActualDstType actualDst(dst);
// TODO check whether this is the right place to perform these checks:
EIGEN_STATIC_ASSERT_LVALUE(Dst)
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(ActualDstTypeCleaned,Src)
EIGEN_CHECK_BINARY_COMPATIBILIY(Func,typename ActualDstTypeCleaned::Scalar,typename Src::Scalar);
Assignment<ActualDstTypeCleaned,Src,Func>::run(actualDst, src, func);
}
@@ -875,7 +903,7 @@ void call_restricted_packet_assignment_no_alias(Dst& dst, const Src& src, const
SrcEvaluatorType srcEvaluator(src);
resize_if_allowed(dst, src, func);
DstEvaluatorType dstEvaluator(dst);
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
@@ -922,7 +950,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, Dense2Dense, Weak>
#ifndef EIGEN_NO_DEBUG
internal::check_for_aliasing(dst, src);
#endif
call_dense_assignment_loop(dst, src, func);
}
};

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_BANDMATRIX_H
#define EIGEN_BANDMATRIX_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -45,7 +45,7 @@ class BandMatrixBase : public EigenBase<Derived>
};
public:
using Base::derived;
using Base::rows;
using Base::cols;
@@ -55,10 +55,10 @@ class BandMatrixBase : public EigenBase<Derived>
/** \returns the number of sub diagonals */
inline Index subs() const { return derived().subs(); }
/** \returns an expression of the underlying coefficient matrix */
inline const CoefficientsType& coeffs() const { return derived().coeffs(); }
/** \returns an expression of the underlying coefficient matrix */
inline CoefficientsType& coeffs() { return derived().coeffs(); }
@@ -67,7 +67,7 @@ class BandMatrixBase : public EigenBase<Derived>
* \warning the internal storage must be column major. */
inline Block<CoefficientsType,Dynamic,1> col(Index i)
{
EIGEN_STATIC_ASSERT((Options&RowMajor)==0,THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
EIGEN_STATIC_ASSERT((int(Options) & int(RowMajor)) == 0, THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
Index start = 0;
Index len = coeffs().rows();
if (i<=supers())
@@ -90,7 +90,7 @@ class BandMatrixBase : public EigenBase<Derived>
template<int Index> struct DiagonalIntReturnType {
enum {
ReturnOpposite = (Options&SelfAdjoint) && (((Index)>0 && Supers==0) || ((Index)<0 && Subs==0)),
ReturnOpposite = (int(Options) & int(SelfAdjoint)) && (((Index) > 0 && Supers == 0) || ((Index) < 0 && Subs == 0)),
Conjugate = ReturnOpposite && NumTraits<Scalar>::IsComplex,
ActualIndex = ReturnOpposite ? -Index : Index,
DiagonalSize = (RowsAtCompileTime==Dynamic || ColsAtCompileTime==Dynamic)
@@ -130,7 +130,7 @@ class BandMatrixBase : public EigenBase<Derived>
eigen_assert((i<0 && -i<=subs()) || (i>=0 && i<=supers()));
return Block<const CoefficientsType,1,Dynamic>(coeffs(), supers()-i, std::max<Index>(0,i), 1, diagonalLength(i));
}
template<typename Dest> inline void evalTo(Dest& dst) const
{
dst.resize(rows(),cols());
@@ -192,7 +192,7 @@ struct traits<BandMatrix<_Scalar,_Rows,_Cols,_Supers,_Subs,_Options> >
Options = _Options,
DataRowsAtCompileTime = ((Supers!=Dynamic) && (Subs!=Dynamic)) ? 1 + Supers + Subs : Dynamic
};
typedef Matrix<Scalar,DataRowsAtCompileTime,ColsAtCompileTime,Options&RowMajor?RowMajor:ColMajor> CoefficientsType;
typedef Matrix<Scalar, DataRowsAtCompileTime, ColsAtCompileTime, int(Options) & int(RowMajor) ? RowMajor : ColMajor> CoefficientsType;
};
template<typename _Scalar, int Rows, int Cols, int Supers, int Subs, int Options>
@@ -211,16 +211,16 @@ class BandMatrix : public BandMatrixBase<BandMatrix<_Scalar,Rows,Cols,Supers,Sub
}
/** \returns the number of columns */
inline Index rows() const { return m_rows.value(); }
inline EIGEN_CONSTEXPR Index rows() const { return m_rows.value(); }
/** \returns the number of rows */
inline Index cols() const { return m_coeffs.cols(); }
inline EIGEN_CONSTEXPR Index cols() const { return m_coeffs.cols(); }
/** \returns the number of super diagonals */
inline Index supers() const { return m_supers.value(); }
inline EIGEN_CONSTEXPR Index supers() const { return m_supers.value(); }
/** \returns the number of sub diagonals */
inline Index subs() const { return m_subs.value(); }
inline EIGEN_CONSTEXPR Index subs() const { return m_subs.value(); }
inline const CoefficientsType& coeffs() const { return m_coeffs; }
inline CoefficientsType& coeffs() { return m_coeffs; }
@@ -275,16 +275,16 @@ class BandMatrixWrapper : public BandMatrixBase<BandMatrixWrapper<_CoefficientsT
}
/** \returns the number of columns */
inline Index rows() const { return m_rows.value(); }
inline EIGEN_CONSTEXPR Index rows() const { return m_rows.value(); }
/** \returns the number of rows */
inline Index cols() const { return m_coeffs.cols(); }
inline EIGEN_CONSTEXPR Index cols() const { return m_coeffs.cols(); }
/** \returns the number of super diagonals */
inline Index supers() const { return m_supers.value(); }
inline EIGEN_CONSTEXPR Index supers() const { return m_supers.value(); }
/** \returns the number of sub diagonals */
inline Index subs() const { return m_subs.value(); }
inline EIGEN_CONSTEXPR Index subs() const { return m_subs.value(); }
inline const CoefficientsType& coeffs() const { return m_coeffs; }

View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_BLOCK_H
#define EIGEN_BLOCK_H
namespace Eigen {
namespace Eigen {
namespace internal {
template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
@@ -52,7 +52,7 @@ struct traits<Block<XprType, BlockRows, BlockCols, InnerPanel> > : traits<XprTyp
FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0,
Flags = (traits<XprType>::Flags & (DirectAccessBit | (InnerPanel?CompressedAccessBit:0))) | FlagsLvalueBit | FlagsRowMajorBit,
// FIXME DirectAccessBit should not be handled by expressions
//
//
// Alignment is needed by MapBase's assertions
// We can sefely set it to false here. Internal alignment errors will be detected by an eigen_internal_assert in the respective evaluator
Alignment = 0
@@ -61,7 +61,7 @@ struct traits<Block<XprType, BlockRows, BlockCols, InnerPanel> > : traits<XprTyp
template<typename XprType, int BlockRows=Dynamic, int BlockCols=Dynamic, bool InnerPanel = false,
bool HasDirectAccess = internal::has_direct_access<XprType>::ret> class BlockImpl_dense;
} // end namespace internal
template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, typename StorageKind> class BlockImpl;
@@ -109,9 +109,9 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel> class
typedef Impl Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(Block)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Block)
typedef typename internal::remove_all<XprType>::type NestedExpression;
/** Column or Row constructor
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -147,7 +147,7 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel> class
&& startCol >= 0 && blockCols >= 0 && startCol <= xpr.cols() - blockCols);
}
};
// The generic default implementation for dense block simplu forward to the internal::BlockImpl_dense
// that must be specialized for direct and non-direct access...
template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
@@ -260,19 +260,19 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, bool H
}
template<int LoadMode>
inline PacketScalar packet(Index rowId, Index colId) const
EIGEN_DEVICE_FUNC inline PacketScalar packet(Index rowId, Index colId) const
{
return m_xpr.template packet<Unaligned>(rowId + m_startRow.value(), colId + m_startCol.value());
}
template<int LoadMode>
inline void writePacket(Index rowId, Index colId, const PacketScalar& val)
EIGEN_DEVICE_FUNC inline void writePacket(Index rowId, Index colId, const PacketScalar& val)
{
m_xpr.template writePacket<Unaligned>(rowId + m_startRow.value(), colId + m_startCol.value(), val);
}
template<int LoadMode>
inline PacketScalar packet(Index index) const
EIGEN_DEVICE_FUNC inline PacketScalar packet(Index index) const
{
return m_xpr.template packet<Unaligned>
(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
@@ -280,7 +280,7 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, bool H
}
template<int LoadMode>
inline void writePacket(Index index, const PacketScalar& val)
EIGEN_DEVICE_FUNC inline void writePacket(Index index, const PacketScalar& val)
{
m_xpr.template writePacket<Unaligned>
(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
@@ -296,23 +296,23 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, bool H
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const
{
return m_xpr;
{
return m_xpr;
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
XprType& nestedExpression() { return m_xpr; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startRow() const
{
return m_startRow.value();
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
StorageIndex startRow() const EIGEN_NOEXCEPT
{
return m_startRow.value();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startCol() const
{
return m_startCol.value();
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
StorageIndex startCol() const EIGEN_NOEXCEPT
{
return m_startCol.value();
}
protected:
@@ -334,6 +334,17 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
enum {
XprTypeIsRowMajor = (int(traits<XprType>::Flags)&RowMajorBit) != 0
};
/** \internal Returns base+offset (unless base is null, in which case returns null).
* Adding an offset to nullptr is undefined behavior, so we must avoid it.
*/
template <typename Scalar>
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE
static Scalar* add_to_nullable_pointer(Scalar* base, Index offset)
{
return base != NULL ? base+offset : NULL;
}
public:
typedef MapBase<BlockType> Base;
@@ -344,8 +355,9 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
BlockImpl_dense(XprType& xpr, Index i)
: Base(xpr.data() + i * ( ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && (!XprTypeIsRowMajor))
|| ((BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) && ( XprTypeIsRowMajor)) ? xpr.innerStride() : xpr.outerStride()),
: Base((BlockRows == 0 || BlockCols == 0) ? NULL : add_to_nullable_pointer(xpr.data(),
i * ( ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && (!XprTypeIsRowMajor))
|| ((BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) && ( XprTypeIsRowMajor)) ? xpr.innerStride() : xpr.outerStride())),
BlockRows==1 ? 1 : xpr.rows(),
BlockCols==1 ? 1 : xpr.cols()),
m_xpr(xpr),
@@ -359,7 +371,8 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
BlockImpl_dense(XprType& xpr, Index startRow, Index startCol)
: Base(xpr.data()+xpr.innerStride()*(XprTypeIsRowMajor?startCol:startRow) + xpr.outerStride()*(XprTypeIsRowMajor?startRow:startCol)),
: Base((BlockRows == 0 || BlockCols == 0) ? NULL : add_to_nullable_pointer(xpr.data(),
xpr.innerStride()*(XprTypeIsRowMajor?startCol:startRow) + xpr.outerStride()*(XprTypeIsRowMajor?startRow:startCol))),
m_xpr(xpr), m_startRow(startRow), m_startCol(startCol)
{
init();
@@ -371,24 +384,26 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
BlockImpl_dense(XprType& xpr,
Index startRow, Index startCol,
Index blockRows, Index blockCols)
: Base(xpr.data()+xpr.innerStride()*(XprTypeIsRowMajor?startCol:startRow) + xpr.outerStride()*(XprTypeIsRowMajor?startRow:startCol), blockRows, blockCols),
: Base((blockRows == 0 || blockCols == 0) ? NULL : add_to_nullable_pointer(xpr.data(),
xpr.innerStride()*(XprTypeIsRowMajor?startCol:startRow) + xpr.outerStride()*(XprTypeIsRowMajor?startRow:startCol)),
blockRows, blockCols),
m_xpr(xpr), m_startRow(startRow), m_startCol(startCol)
{
init();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const
{
return m_xpr;
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const EIGEN_NOEXCEPT
{
return m_xpr;
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
XprType& nestedExpression() { return m_xpr; }
/** \sa MapBase::innerStride() */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index innerStride() const EIGEN_NOEXCEPT
{
return internal::traits<BlockType>::HasSameStorageOrderAsXprType
? m_xpr.innerStride()
@@ -396,23 +411,19 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
}
/** \sa MapBase::outerStride() */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index outerStride() const EIGEN_NOEXCEPT
{
return m_outerStride;
return internal::traits<BlockType>::HasSameStorageOrderAsXprType
? m_xpr.outerStride()
: m_xpr.innerStride();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startRow() const
{
return m_startRow.value();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
StorageIndex startRow() const EIGEN_NOEXCEPT { return m_startRow.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startCol() const
{
return m_startCol.value();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
StorageIndex startCol() const EIGEN_NOEXCEPT { return m_startCol.value(); }
#ifndef __SUNPRO_CC
// FIXME sunstudio is not friendly with the above friend...

View File

@@ -14,56 +14,58 @@ namespace Eigen {
namespace internal {
template<typename Derived, int UnrollCount, int Rows>
template<typename Derived, int UnrollCount, int InnerSize>
struct all_unroller
{
enum {
col = (UnrollCount-1) / Rows,
row = (UnrollCount-1) % Rows
IsRowMajor = (int(Derived::Flags) & int(RowMajor)),
i = (UnrollCount-1) / InnerSize,
j = (UnrollCount-1) % InnerSize
};
static inline bool run(const Derived &mat)
EIGEN_DEVICE_FUNC static inline bool run(const Derived &mat)
{
return all_unroller<Derived, UnrollCount-1, Rows>::run(mat) && mat.coeff(row, col);
return all_unroller<Derived, UnrollCount-1, InnerSize>::run(mat) && mat.coeff(IsRowMajor ? i : j, IsRowMajor ? j : i);
}
};
template<typename Derived, int Rows>
struct all_unroller<Derived, 0, Rows>
template<typename Derived, int InnerSize>
struct all_unroller<Derived, 0, InnerSize>
{
static inline bool run(const Derived &/*mat*/) { return true; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &/*mat*/) { return true; }
};
template<typename Derived, int Rows>
struct all_unroller<Derived, Dynamic, Rows>
template<typename Derived, int InnerSize>
struct all_unroller<Derived, Dynamic, InnerSize>
{
static inline bool run(const Derived &) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &) { return false; }
};
template<typename Derived, int UnrollCount, int Rows>
template<typename Derived, int UnrollCount, int InnerSize>
struct any_unroller
{
enum {
col = (UnrollCount-1) / Rows,
row = (UnrollCount-1) % Rows
IsRowMajor = (int(Derived::Flags) & int(RowMajor)),
i = (UnrollCount-1) / InnerSize,
j = (UnrollCount-1) % InnerSize
};
static inline bool run(const Derived &mat)
EIGEN_DEVICE_FUNC static inline bool run(const Derived &mat)
{
return any_unroller<Derived, UnrollCount-1, Rows>::run(mat) || mat.coeff(row, col);
return any_unroller<Derived, UnrollCount-1, InnerSize>::run(mat) || mat.coeff(IsRowMajor ? i : j, IsRowMajor ? j : i);
}
};
template<typename Derived, int Rows>
struct any_unroller<Derived, 0, Rows>
template<typename Derived, int InnerSize>
struct any_unroller<Derived, 0, InnerSize>
{
static inline bool run(const Derived & /*mat*/) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived & /*mat*/) { return false; }
};
template<typename Derived, int Rows>
struct any_unroller<Derived, Dynamic, Rows>
template<typename Derived, int InnerSize>
struct any_unroller<Derived, Dynamic, InnerSize>
{
static inline bool run(const Derived &) { return false; }
EIGEN_DEVICE_FUNC static inline bool run(const Derived &) { return false; }
};
} // end namespace internal
@@ -81,16 +83,16 @@ EIGEN_DEVICE_FUNC inline bool DenseBase<Derived>::all() const
typedef internal::evaluator<Derived> Evaluator;
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * (Evaluator::CoeffReadCost + NumTraits<Scalar>::AddCost) <= EIGEN_UNROLLING_LIMIT
&& SizeAtCompileTime * (int(Evaluator::CoeffReadCost) + int(NumTraits<Scalar>::AddCost)) <= EIGEN_UNROLLING_LIMIT
};
Evaluator evaluator(derived());
if(unroll)
return internal::all_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic, internal::traits<Derived>::RowsAtCompileTime>::run(evaluator);
return internal::all_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic, InnerSizeAtCompileTime>::run(evaluator);
else
{
for(Index j = 0; j < cols(); ++j)
for(Index i = 0; i < rows(); ++i)
if (!evaluator.coeff(i, j)) return false;
for(Index i = 0; i < derived().outerSize(); ++i)
for(Index j = 0; j < derived().innerSize(); ++j)
if (!evaluator.coeff(IsRowMajor ? i : j, IsRowMajor ? j : i)) return false;
return true;
}
}
@@ -105,16 +107,16 @@ EIGEN_DEVICE_FUNC inline bool DenseBase<Derived>::any() const
typedef internal::evaluator<Derived> Evaluator;
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * (Evaluator::CoeffReadCost + NumTraits<Scalar>::AddCost) <= EIGEN_UNROLLING_LIMIT
&& SizeAtCompileTime * (int(Evaluator::CoeffReadCost) + int(NumTraits<Scalar>::AddCost)) <= EIGEN_UNROLLING_LIMIT
};
Evaluator evaluator(derived());
if(unroll)
return internal::any_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic, internal::traits<Derived>::RowsAtCompileTime>::run(evaluator);
return internal::any_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic, InnerSizeAtCompileTime>::run(evaluator);
else
{
for(Index j = 0; j < cols(); ++j)
for(Index i = 0; i < rows(); ++i)
if (evaluator.coeff(i, j)) return true;
for(Index i = 0; i < derived().outerSize(); ++i)
for(Index j = 0; j < derived().innerSize(); ++j)
if (evaluator.coeff(IsRowMajor ? i : j, IsRowMajor ? j : i)) return true;
return false;
}
}
@@ -156,7 +158,7 @@ inline bool DenseBase<Derived>::allFinite() const
return !((derived()-derived()).hasNaN());
#endif
}
} // end namespace Eigen
#endif // EIGEN_ALLANDANY_H

View File

@@ -33,6 +33,8 @@ struct CommaInitializer
inline CommaInitializer(XprType& xpr, const Scalar& s)
: m_xpr(xpr), m_row(0), m_col(1), m_currentBlockRows(1)
{
eigen_assert(m_xpr.rows() > 0 && m_xpr.cols() > 0
&& "Cannot comma-initialize a 0x0 matrix (operator<<)");
m_xpr.coeffRef(0,0) = s;
}
@@ -41,6 +43,8 @@ struct CommaInitializer
inline CommaInitializer(XprType& xpr, const DenseBase<OtherDerived>& other)
: m_xpr(xpr), m_row(0), m_col(other.cols()), m_currentBlockRows(other.rows())
{
eigen_assert(m_xpr.rows() >= other.rows() && m_xpr.cols() >= other.cols()
&& "Cannot comma-initialize a 0x0 matrix (operator<<)");
m_xpr.block(0, 0, other.rows(), other.cols()) = other;
}
@@ -103,7 +107,7 @@ struct CommaInitializer
EIGEN_EXCEPTION_SPEC(Eigen::eigen_assert_exception)
#endif
{
finished();
finished();
}
/** \returns the built matrix once all its coefficients have been set.

View File

@@ -14,7 +14,7 @@
#define EIGEN_COREEVALUATORS_H
namespace Eigen {
namespace internal {
// This class returns the evaluator kind from the expression storage kind.
@@ -63,8 +63,8 @@ template< typename T,
template< typename T,
typename Kind = typename evaluator_traits<typename T::NestedExpression>::Kind,
typename Scalar = typename T::Scalar> struct unary_evaluator;
// evaluator_traits<T> contains traits for evaluator<T>
// evaluator_traits<T> contains traits for evaluator<T>
template<typename T>
struct evaluator_traits_base
@@ -111,7 +111,7 @@ struct evaluator_base
{
// TODO that's not very nice to have to propagate all these traits. They are currently only needed to handle outer,inner indices.
typedef traits<ExpressionType> ExpressionTraits;
enum {
Alignment = 0
};
@@ -143,8 +143,8 @@ public:
#endif
eigen_internal_assert(outerStride==OuterStride);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index outerStride() const { return OuterStride; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index outerStride() const EIGEN_NOEXCEPT { return OuterStride; }
const Scalar *data;
};
@@ -172,7 +172,7 @@ struct evaluator<PlainObjectBase<Derived> >
IsVectorAtCompileTime = PlainObjectType::IsVectorAtCompileTime,
RowsAtCompileTime = PlainObjectType::RowsAtCompileTime,
ColsAtCompileTime = PlainObjectType::ColsAtCompileTime,
CoeffReadCost = NumTraits<Scalar>::ReadCost,
Flags = traits<Derived>::EvaluatorFlags,
Alignment = traits<Derived>::Alignment
@@ -274,13 +274,13 @@ struct evaluator<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
: evaluator<PlainObjectBase<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> > >
{
typedef Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> XprType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
evaluator() {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& m)
: evaluator<PlainObjectBase<XprType> >(m)
: evaluator<PlainObjectBase<XprType> >(m)
{ }
};
@@ -292,10 +292,10 @@ struct evaluator<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
evaluator() {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& m)
: evaluator<PlainObjectBase<XprType> >(m)
: evaluator<PlainObjectBase<XprType> >(m)
{ }
};
@@ -306,9 +306,9 @@ struct unary_evaluator<Transpose<ArgType>, IndexBased>
: evaluator_base<Transpose<ArgType> >
{
typedef Transpose<ArgType> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
Flags = evaluator<ArgType>::Flags ^ RowMajorBit,
Alignment = evaluator<ArgType>::Alignment
};
@@ -499,10 +499,10 @@ struct evaluator<CwiseNullaryOp<NullaryOp,PlainObjectType> >
{
typedef CwiseNullaryOp<NullaryOp,PlainObjectType> XprType;
typedef typename internal::remove_all<PlainObjectType>::type PlainObjectTypeCleaned;
enum {
CoeffReadCost = internal::functor_traits<NullaryOp>::Cost,
Flags = (evaluator<PlainObjectTypeCleaned>::Flags
& ( HereditaryBits
| (functor_has_linear_access<NullaryOp>::ret ? LinearAccessBit : 0)
@@ -559,10 +559,10 @@ struct unary_evaluator<CwiseUnaryOp<UnaryOp, ArgType>, IndexBased >
: evaluator_base<CwiseUnaryOp<UnaryOp, ArgType> >
{
typedef CwiseUnaryOp<UnaryOp, ArgType> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost + functor_traits<UnaryOp>::Cost,
CoeffReadCost = int(evaluator<ArgType>::CoeffReadCost) + int(functor_traits<UnaryOp>::Cost),
Flags = evaluator<ArgType>::Flags
& (HereditaryBits | LinearAccessBit | (functor_traits<UnaryOp>::PacketAccess ? PacketAccessBit : 0)),
Alignment = evaluator<ArgType>::Alignment
@@ -606,13 +606,13 @@ struct unary_evaluator<CwiseUnaryOp<UnaryOp, ArgType>, IndexBased >
protected:
// this helper permits to completely eliminate the functor if it is empty
class Data : private UnaryOp
struct Data
{
public:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : UnaryOp(xpr.functor()), argImpl(xpr.nestedExpression()) {}
Data(const XprType& xpr) : op(xpr.functor()), argImpl(xpr.nestedExpression()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const UnaryOp& func() const { return static_cast<const UnaryOp&>(*this); }
const UnaryOp& func() const { return op; }
UnaryOp op;
evaluator<ArgType> argImpl;
};
@@ -628,7 +628,7 @@ struct evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> >
{
typedef CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> XprType;
typedef ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> > Base;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr) : Base(xpr) {}
};
@@ -637,10 +637,10 @@ struct ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>, IndexBased
: evaluator_base<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> >
{
typedef CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> XprType;
enum {
CoeffReadCost = evaluator<Arg1>::CoeffReadCost + evaluator<Arg2>::CoeffReadCost + evaluator<Arg3>::CoeffReadCost + functor_traits<TernaryOp>::Cost,
CoeffReadCost = int(evaluator<Arg1>::CoeffReadCost) + int(evaluator<Arg2>::CoeffReadCost) + int(evaluator<Arg3>::CoeffReadCost) + int(functor_traits<TernaryOp>::Cost),
Arg1Flags = evaluator<Arg1>::Flags,
Arg2Flags = evaluator<Arg2>::Flags,
Arg3Flags = evaluator<Arg3>::Flags,
@@ -700,12 +700,13 @@ struct ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>, IndexBased
protected:
// this helper permits to completely eliminate the functor if it is empty
struct Data : private TernaryOp
struct Data
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : TernaryOp(xpr.functor()), arg1Impl(xpr.arg1()), arg2Impl(xpr.arg2()), arg3Impl(xpr.arg3()) {}
Data(const XprType& xpr) : op(xpr.functor()), arg1Impl(xpr.arg1()), arg2Impl(xpr.arg2()), arg3Impl(xpr.arg3()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const TernaryOp& func() const { return static_cast<const TernaryOp&>(*this); }
const TernaryOp& func() const { return op; }
TernaryOp op;
evaluator<Arg1> arg1Impl;
evaluator<Arg2> arg2Impl;
evaluator<Arg3> arg3Impl;
@@ -723,7 +724,7 @@ struct evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
{
typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> XprType;
typedef binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs> > Base;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& xpr) : Base(xpr) {}
};
@@ -733,10 +734,10 @@ struct binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs>, IndexBased, IndexBase
: evaluator_base<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
{
typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> XprType;
enum {
CoeffReadCost = evaluator<Lhs>::CoeffReadCost + evaluator<Rhs>::CoeffReadCost + functor_traits<BinaryOp>::Cost,
CoeffReadCost = int(evaluator<Lhs>::CoeffReadCost) + int(evaluator<Rhs>::CoeffReadCost) + int(functor_traits<BinaryOp>::Cost),
LhsFlags = evaluator<Lhs>::Flags,
RhsFlags = evaluator<Rhs>::Flags,
SameType = is_same<typename Lhs::Scalar,typename Rhs::Scalar>::value,
@@ -793,12 +794,13 @@ struct binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs>, IndexBased, IndexBase
protected:
// this helper permits to completely eliminate the functor if it is empty
struct Data : private BinaryOp
struct Data
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : BinaryOp(xpr.functor()), lhsImpl(xpr.lhs()), rhsImpl(xpr.rhs()) {}
Data(const XprType& xpr) : op(xpr.functor()), lhsImpl(xpr.lhs()), rhsImpl(xpr.rhs()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const BinaryOp& func() const { return static_cast<const BinaryOp&>(*this); }
const BinaryOp& func() const { return op; }
BinaryOp op;
evaluator<Lhs> lhsImpl;
evaluator<Rhs> rhsImpl;
};
@@ -813,12 +815,12 @@ struct unary_evaluator<CwiseUnaryView<UnaryOp, ArgType>, IndexBased>
: evaluator_base<CwiseUnaryView<UnaryOp, ArgType> >
{
typedef CwiseUnaryView<UnaryOp, ArgType> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost + functor_traits<UnaryOp>::Cost,
CoeffReadCost = int(evaluator<ArgType>::CoeffReadCost) + int(functor_traits<UnaryOp>::Cost),
Flags = (evaluator<ArgType>::Flags & (HereditaryBits | LinearAccessBit | DirectAccessBit)),
Alignment = 0 // FIXME it is not very clear why alignment is necessarily lost...
};
@@ -858,12 +860,13 @@ struct unary_evaluator<CwiseUnaryView<UnaryOp, ArgType>, IndexBased>
protected:
// this helper permits to completely eliminate the functor if it is empty
struct Data : private UnaryOp
struct Data
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : UnaryOp(xpr.functor()), argImpl(xpr.nestedExpression()) {}
Data(const XprType& xpr) : op(xpr.functor()), argImpl(xpr.nestedExpression()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const UnaryOp& func() const { return static_cast<const UnaryOp&>(*this); }
const UnaryOp& func() const { return op; }
UnaryOp op;
evaluator<ArgType> argImpl;
};
@@ -884,7 +887,7 @@ struct mapbase_evaluator : evaluator_base<Derived>
typedef typename XprType::PointerType PointerType;
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
enum {
IsRowMajor = XprType::RowsAtCompileTime,
ColsAtCompileTime = XprType::ColsAtCompileTime,
@@ -956,17 +959,21 @@ struct mapbase_evaluator : evaluator_base<Derived>
internal::pstoret<Scalar, PacketType, StoreMode>(m_data + index * m_innerStride.value(), x);
}
protected:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rowStride() const { return XprType::IsRowMajor ? m_outerStride.value() : m_innerStride.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index colStride() const { return XprType::IsRowMajor ? m_innerStride.value() : m_outerStride.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rowStride() const EIGEN_NOEXCEPT {
return XprType::IsRowMajor ? m_outerStride.value() : m_innerStride.value();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index colStride() const EIGEN_NOEXCEPT {
return XprType::IsRowMajor ? m_innerStride.value() : m_outerStride.value();
}
PointerType m_data;
const internal::variable_if_dynamic<Index, XprType::InnerStrideAtCompileTime> m_innerStride;
const internal::variable_if_dynamic<Index, XprType::OuterStrideAtCompileTime> m_outerStride;
};
template<typename PlainObjectType, int MapOptions, typename StrideType>
template<typename PlainObjectType, int MapOptions, typename StrideType>
struct evaluator<Map<PlainObjectType, MapOptions, StrideType> >
: public mapbase_evaluator<Map<PlainObjectType, MapOptions, StrideType>, PlainObjectType>
{
@@ -974,7 +981,7 @@ struct evaluator<Map<PlainObjectType, MapOptions, StrideType> >
typedef typename XprType::Scalar Scalar;
// TODO: should check for smaller packet types once we can handle multi-sized packet types
typedef typename packet_traits<Scalar>::type PacketScalar;
enum {
InnerStrideAtCompileTime = StrideType::InnerStrideAtCompileTime == 0
? int(PlainObjectType::InnerStrideAtCompileTime)
@@ -986,27 +993,27 @@ struct evaluator<Map<PlainObjectType, MapOptions, StrideType> >
HasNoOuterStride = StrideType::OuterStrideAtCompileTime == 0,
HasNoStride = HasNoInnerStride && HasNoOuterStride,
IsDynamicSize = PlainObjectType::SizeAtCompileTime==Dynamic,
PacketAccessMask = bool(HasNoInnerStride) ? ~int(0) : ~int(PacketAccessBit),
LinearAccessMask = bool(HasNoStride) || bool(PlainObjectType::IsVectorAtCompileTime) ? ~int(0) : ~int(LinearAccessBit),
Flags = int( evaluator<PlainObjectType>::Flags) & (LinearAccessMask&PacketAccessMask),
Alignment = int(MapOptions)&int(AlignedMask)
};
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& map)
: mapbase_evaluator<XprType, PlainObjectType>(map)
: mapbase_evaluator<XprType, PlainObjectType>(map)
{ }
};
// -------------------- Ref --------------------
template<typename PlainObjectType, int RefOptions, typename StrideType>
template<typename PlainObjectType, int RefOptions, typename StrideType>
struct evaluator<Ref<PlainObjectType, RefOptions, StrideType> >
: public mapbase_evaluator<Ref<PlainObjectType, RefOptions, StrideType>, PlainObjectType>
{
typedef Ref<PlainObjectType, RefOptions, StrideType> XprType;
enum {
Flags = evaluator<Map<PlainObjectType, RefOptions, StrideType> >::Flags,
Alignment = evaluator<Map<PlainObjectType, RefOptions, StrideType> >::Alignment
@@ -1014,7 +1021,7 @@ struct evaluator<Ref<PlainObjectType, RefOptions, StrideType> >
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& ref)
: mapbase_evaluator<XprType, PlainObjectType>(ref)
: mapbase_evaluator<XprType, PlainObjectType>(ref)
{ }
};
@@ -1022,8 +1029,8 @@ struct evaluator<Ref<PlainObjectType, RefOptions, StrideType> >
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel,
bool HasDirectAccess = internal::has_direct_access<ArgType>::ret> struct block_evaluator;
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
struct evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
: block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel>
{
@@ -1031,15 +1038,15 @@ struct evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
typedef typename XprType::Scalar Scalar;
// TODO: should check for smaller packet types once we can handle multi-sized packet types
typedef typename packet_traits<Scalar>::type PacketScalar;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
RowsAtCompileTime = traits<XprType>::RowsAtCompileTime,
ColsAtCompileTime = traits<XprType>::ColsAtCompileTime,
MaxRowsAtCompileTime = traits<XprType>::MaxRowsAtCompileTime,
MaxColsAtCompileTime = traits<XprType>::MaxColsAtCompileTime,
ArgTypeIsRowMajor = (int(evaluator<ArgType>::Flags)&RowMajorBit) != 0,
IsRowMajor = (MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1) ? 1
: (MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1) ? 0
@@ -1053,14 +1060,14 @@ struct evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
? int(outer_stride_at_compile_time<ArgType>::ret)
: int(inner_stride_at_compile_time<ArgType>::ret),
MaskPacketAccessBit = (InnerStrideAtCompileTime == 1 || HasSameStorageOrderAsArgType) ? PacketAccessBit : 0,
FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1 || (InnerPanel && (evaluator<ArgType>::Flags&LinearAccessBit))) ? LinearAccessBit : 0,
FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1 || (InnerPanel && (evaluator<ArgType>::Flags&LinearAccessBit))) ? LinearAccessBit : 0,
FlagsRowMajorBit = XprType::Flags&RowMajorBit,
Flags0 = evaluator<ArgType>::Flags & ( (HereditaryBits & ~RowMajorBit) |
DirectAccessBit |
MaskPacketAccessBit),
Flags = Flags0 | FlagsLinearAccessBit | FlagsRowMajorBit,
PacketAlignment = unpacket_traits<PacketScalar>::alignment,
Alignment0 = (InnerPanel && (OuterStrideAtCompileTime!=Dynamic)
&& (OuterStrideAtCompileTime!=0)
@@ -1084,7 +1091,7 @@ struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /*HasDirectAcc
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit block_evaluator(const XprType& block)
: unary_evaluator<XprType>(block)
: unary_evaluator<XprType>(block)
{}
};
@@ -1096,12 +1103,12 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& block)
: m_argImpl(block.nestedExpression()),
m_startRow(block.startRow()),
: m_argImpl(block.nestedExpression()),
m_startRow(block.startRow()),
m_startCol(block.startCol()),
m_linear_offset(ForwardLinearAccess?(ArgType::IsRowMajor ? block.startRow()*block.nestedExpression().cols() + block.startCol() : block.startCol()*block.nestedExpression().rows() + block.startRow()):0)
{ }
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
@@ -1109,13 +1116,13 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
RowsAtCompileTime = XprType::RowsAtCompileTime,
ForwardLinearAccess = (InnerPanel || int(XprType::IsRowMajor)==int(ArgType::IsRowMajor)) && bool(evaluator<ArgType>::Flags&LinearAccessBit)
};
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
return m_argImpl.coeff(m_startRow.value() + row, m_startCol.value() + col);
{
return m_argImpl.coeff(m_startRow.value() + row, m_startCol.value() + col);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
@@ -1124,44 +1131,44 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index row, Index col)
{
return m_argImpl.coeffRef(m_startRow.value() + row, m_startCol.value() + col);
{
return m_argImpl.coeffRef(m_startRow.value() + row, m_startCol.value() + col);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index index)
{
return linear_coeffRef_impl(index, bool_constant<ForwardLinearAccess>());
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index row, Index col) const
{
return m_argImpl.template packet<LoadMode,PacketType>(m_startRow.value() + row, m_startCol.value() + col);
PacketType packet(Index row, Index col) const
{
return m_argImpl.template packet<LoadMode,PacketType>(m_startRow.value() + row, m_startCol.value() + col);
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index index) const
{
PacketType packet(Index index) const
{
if (ForwardLinearAccess)
return m_argImpl.template packet<LoadMode,PacketType>(m_linear_offset.value() + index);
else
return packet<LoadMode,PacketType>(RowsAtCompileTime == 1 ? 0 : index,
RowsAtCompileTime == 1 ? index : 0);
}
template<int StoreMode, typename PacketType>
EIGEN_STRONG_INLINE
void writePacket(Index row, Index col, const PacketType& x)
void writePacket(Index row, Index col, const PacketType& x)
{
return m_argImpl.template writePacket<StoreMode,PacketType>(m_startRow.value() + row, m_startCol.value() + col, x);
return m_argImpl.template writePacket<StoreMode,PacketType>(m_startRow.value() + row, m_startCol.value() + col, x);
}
template<int StoreMode, typename PacketType>
EIGEN_STRONG_INLINE
void writePacket(Index index, const PacketType& x)
void writePacket(Index index, const PacketType& x)
{
if (ForwardLinearAccess)
return m_argImpl.template writePacket<StoreMode,PacketType>(m_linear_offset.value() + index, x);
@@ -1170,12 +1177,12 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
RowsAtCompileTime == 1 ? index : 0,
x);
}
protected:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType linear_coeff_impl(Index index, internal::true_type /* ForwardLinearAccess */) const
{
return m_argImpl.coeff(m_linear_offset.value() + index);
return m_argImpl.coeff(m_linear_offset.value() + index);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType linear_coeff_impl(Index index, internal::false_type /* not ForwardLinearAccess */) const
@@ -1186,7 +1193,7 @@ protected:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& linear_coeffRef_impl(Index index, internal::true_type /* ForwardLinearAccess */)
{
return m_argImpl.coeffRef(m_linear_offset.value() + index);
return m_argImpl.coeffRef(m_linear_offset.value() + index);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& linear_coeffRef_impl(Index index, internal::false_type /* not ForwardLinearAccess */)
@@ -1200,10 +1207,10 @@ protected:
const variable_if_dynamic<Index, ForwardLinearAccess ? Dynamic : 0> m_linear_offset;
};
// TODO: This evaluator does not actually use the child evaluator;
// TODO: This evaluator does not actually use the child evaluator;
// all action is via the data() as returned by the Block expression.
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /* HasDirectAccess */ true>
: mapbase_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>,
typename Block<ArgType, BlockRows, BlockCols, InnerPanel>::PlainObject>
@@ -1213,7 +1220,7 @@ struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /* HasDirectAc
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit block_evaluator(const XprType& block)
: mapbase_evaluator<XprType, typename XprType::PlainObject>(block)
: mapbase_evaluator<XprType, typename XprType::PlainObject>(block)
{
// TODO: for the 3.3 release, this should be turned to an internal assertion, but let's keep it as is for the beta lifetime
eigen_assert(((internal::UIntPtr(block.data()) % EIGEN_PLAIN_ENUM_MAX(1,evaluator<XprType>::Alignment)) == 0) && "data is not aligned");
@@ -1236,7 +1243,7 @@ struct evaluator<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
evaluator<ElseMatrixType>::CoeffReadCost),
Flags = (unsigned int)evaluator<ThenMatrixType>::Flags & evaluator<ElseMatrixType>::Flags & HereditaryBits,
Alignment = EIGEN_PLAIN_ENUM_MIN(evaluator<ThenMatrixType>::Alignment, evaluator<ElseMatrixType>::Alignment)
};
@@ -1248,7 +1255,7 @@ struct evaluator<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
{
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
typedef typename XprType::CoeffReturnType CoeffReturnType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -1268,7 +1275,7 @@ struct evaluator<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
else
return m_elseImpl.coeff(index);
}
protected:
evaluator<ConditionMatrixType> m_conditionImpl;
evaluator<ThenMatrixType> m_thenImpl;
@@ -1278,7 +1285,7 @@ protected:
// -------------------- Replicate --------------------
template<typename ArgType, int RowFactor, int ColFactor>
template<typename ArgType, int RowFactor, int ColFactor>
struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
: evaluator_base<Replicate<ArgType, RowFactor, ColFactor> >
{
@@ -1289,12 +1296,12 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
};
typedef typename internal::nested_eval<ArgType,Factor>::type ArgTypeNested;
typedef typename internal::remove_all<ArgTypeNested>::type ArgTypeNestedCleaned;
enum {
CoeffReadCost = evaluator<ArgTypeNestedCleaned>::CoeffReadCost,
LinearAccessMask = XprType::IsVectorAtCompileTime ? LinearAccessBit : 0,
Flags = (evaluator<ArgTypeNestedCleaned>::Flags & (HereditaryBits|LinearAccessMask) & ~RowMajorBit) | (traits<XprType>::Flags & RowMajorBit),
Alignment = evaluator<ArgTypeNestedCleaned>::Alignment
};
@@ -1305,7 +1312,7 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
m_rows(replicate.nestedExpression().rows()),
m_cols(replicate.nestedExpression().cols())
{}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
@@ -1316,10 +1323,10 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
const Index actual_col = internal::traits<XprType>::ColsAtCompileTime==1 ? 0
: ColFactor==1 ? col
: col % m_cols.value();
return m_argImpl.coeff(actual_row, actual_col);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
@@ -1327,7 +1334,7 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
const Index actual_index = internal::traits<XprType>::RowsAtCompileTime==1
? (ColFactor==1 ? index : index%m_cols.value())
: (RowFactor==1 ? index : index%m_rows.value());
return m_argImpl.coeff(actual_index);
}
@@ -1344,7 +1351,7 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
return m_argImpl.template packet<LoadMode,PacketType>(actual_row, actual_col);
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index index) const
@@ -1355,7 +1362,7 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
return m_argImpl.template packet<LoadMode,PacketType>(actual_index);
}
protected:
const ArgTypeNested m_arg;
evaluator<ArgTypeNestedCleaned> m_argImpl;
@@ -1487,9 +1494,9 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
ReversePacket = (Direction == BothDirections)
|| ((Direction == Vertical) && IsColMajor)
|| ((Direction == Horizontal) && IsRowMajor),
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
// let's enable LinearAccess only with vectorization because of the product overhead
// FIXME enable DirectAccess with negative strides?
Flags0 = evaluator<ArgType>::Flags,
@@ -1498,7 +1505,7 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
? LinearAccessBit : 0,
Flags = int(Flags0) & (HereditaryBits | PacketAccessBit | LinearAccess),
Alignment = 0 // FIXME in some rare cases, Alignment could be preserved, like a Vector4f.
};
@@ -1508,7 +1515,7 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
m_rows(ReverseRow ? reverse.nestedExpression().rows() : 1),
m_cols(ReverseCol ? reverse.nestedExpression().cols() : 1)
{ }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
@@ -1583,7 +1590,7 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
m_argImpl.template writePacket<LoadMode>
(m_rows.value() * m_cols.value() - index - PacketSize, preverse(x));
}
protected:
evaluator<ArgType> m_argImpl;
@@ -1601,12 +1608,12 @@ struct evaluator<Diagonal<ArgType, DiagIndex> >
: evaluator_base<Diagonal<ArgType, DiagIndex> >
{
typedef Diagonal<ArgType, DiagIndex> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
Flags = (unsigned int)(evaluator<ArgType>::Flags & (HereditaryBits | DirectAccessBit) & ~RowMajorBit) | LinearAccessBit,
Alignment = 0
};
@@ -1615,7 +1622,7 @@ struct evaluator<Diagonal<ArgType, DiagIndex> >
: m_argImpl(diagonal.nestedExpression()),
m_index(diagonal.index())
{ }
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
@@ -1648,8 +1655,10 @@ protected:
const internal::variable_if_dynamicindex<Index, XprType::DiagIndex> m_index;
private:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rowOffset() const { return m_index.value() > 0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index colOffset() const { return m_index.value() > 0 ? m_index.value() : 0; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rowOffset() const { return m_index.value() > 0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index colOffset() const { return m_index.value() > 0 ? m_index.value() : 0; }
};
@@ -1673,25 +1682,25 @@ class EvalToTemp
: public dense_xpr_base<EvalToTemp<ArgType> >::type
{
public:
typedef typename dense_xpr_base<EvalToTemp>::type Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(EvalToTemp)
explicit EvalToTemp(const ArgType& arg)
: m_arg(arg)
{ }
const ArgType& arg() const
{
return m_arg;
}
Index rows() const
EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT
{
return m_arg.rows();
}
Index cols() const
EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT
{
return m_arg.cols();
}
@@ -1699,7 +1708,7 @@ class EvalToTemp
private:
const ArgType& m_arg;
};
template<typename ArgType>
struct evaluator<EvalToTemp<ArgType> >
: public evaluator<typename ArgType::PlainObject>
@@ -1707,7 +1716,7 @@ struct evaluator<EvalToTemp<ArgType> >
typedef EvalToTemp<ArgType> XprType;
typedef typename ArgType::PlainObject PlainObject;
typedef evaluator<PlainObject> Base;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr)
: m_result(xpr.arg())
{

View File

@@ -74,7 +74,7 @@ class CwiseBinaryOpImpl;
* \sa MatrixBase::binaryExpr(const MatrixBase<OtherDerived> &,const CustomBinaryOp &) const, class CwiseUnaryOp, class CwiseNullaryOp
*/
template<typename BinaryOp, typename LhsType, typename RhsType>
class CwiseBinaryOp :
class CwiseBinaryOp :
public CwiseBinaryOpImpl<
BinaryOp, LhsType, RhsType,
typename internal::cwise_promote_storage_type<typename internal::traits<LhsType>::StorageKind,
@@ -83,7 +83,7 @@ class CwiseBinaryOp :
internal::no_assignment_operator
{
public:
typedef typename internal::remove_all<BinaryOp>::type Functor;
typedef typename internal::remove_all<LhsType>::type Lhs;
typedef typename internal::remove_all<RhsType>::type Rhs;
@@ -102,7 +102,7 @@ class CwiseBinaryOp :
#if EIGEN_COMP_MSVC && EIGEN_HAS_CXX11
//Required for Visual Studio or the Copy constructor will probably not get inlined!
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
EIGEN_STRONG_INLINE
CwiseBinaryOp(const CwiseBinaryOp<BinaryOp,LhsType,RhsType>&) = default;
#endif
@@ -116,21 +116,15 @@ class CwiseBinaryOp :
eigen_assert(aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols());
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT {
// return the fixed size type if available to enable compile time optimizations
if (internal::traits<typename internal::remove_all<LhsNested>::type>::RowsAtCompileTime==Dynamic)
return m_rhs.rows();
else
return m_lhs.rows();
return internal::traits<typename internal::remove_all<LhsNested>::type>::RowsAtCompileTime==Dynamic ? m_rhs.rows() : m_lhs.rows();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT {
// return the fixed size type if available to enable compile time optimizations
if (internal::traits<typename internal::remove_all<LhsNested>::type>::ColsAtCompileTime==Dynamic)
return m_rhs.cols();
else
return m_lhs.cols();
return internal::traits<typename internal::remove_all<LhsNested>::type>::ColsAtCompileTime==Dynamic ? m_rhs.cols() : m_lhs.cols();
}
/** \returns the left hand side nested expression */

View File

@@ -74,10 +74,10 @@ class CwiseNullaryOp : public internal::dense_xpr_base< CwiseNullaryOp<NullaryOp
&& (ColsAtCompileTime == Dynamic || ColsAtCompileTime == cols));
}
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index rows() const { return m_rows.value(); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index cols() const { return m_cols.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const { return m_rows.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const { return m_cols.value(); }
/** \returns the functor representing the nullary operation */
EIGEN_DEVICE_FUNC
@@ -131,7 +131,7 @@ DenseBase<Derived>::NullaryExpr(Index rows, Index cols, const CustomNullaryOp& f
*
* Here is an example with C++11 random generators: \include random_cpp11.cpp
* Output: \verbinclude random_cpp11.out
*
*
* \sa class CwiseNullaryOp
*/
template<typename Derived>
@@ -292,7 +292,7 @@ DenseBase<Derived>::LinSpaced(Index size, const Scalar& low, const Scalar& high)
}
/**
* \copydoc DenseBase::LinSpaced(Index, const Scalar&, const Scalar&)
* \copydoc DenseBase::LinSpaced(Index, const DenseBase::Scalar&, const DenseBase::Scalar&)
* Special version for fixed size types which does not require the size parameter.
*/
template<typename Derived>
@@ -383,6 +383,33 @@ PlainObjectBase<Derived>::setConstant(Index rows, Index cols, const Scalar& val)
return setConstant(val);
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to the given value \a val. For the parameter
* of type NoChange_t, just pass the special value \c NoChange.
*
* \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setConstant(NoChange_t, Index cols, const Scalar& val)
{
return setConstant(rows(), cols, val);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to the given value \a val. For the parameter
* of type NoChange_t, just pass the special value \c NoChange.
*
* \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setConstant(Index rows, NoChange_t, const Scalar& val)
{
return setConstant(rows, cols(), val);
}
/**
* \brief Sets a linearly spaced vector.
*
@@ -556,6 +583,32 @@ PlainObjectBase<Derived>::setZero(Index rows, Index cols)
return setConstant(Scalar(0));
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to zero. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa DenseBase::setZero(), setZero(Index), setZero(Index, Index), setZero(Index, NoChange_t), class CwiseNullaryOp, DenseBase::Zero()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setZero(NoChange_t, Index cols)
{
return setZero(rows(), cols);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to zero. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa DenseBase::setZero(), setZero(Index), setZero(Index, Index), setZero(NoChange_t, Index), class CwiseNullaryOp, DenseBase::Zero()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setZero(Index rows, NoChange_t)
{
return setZero(rows, cols());
}
// ones:
/** \returns an expression of a matrix where all coefficients equal one.
@@ -682,6 +735,32 @@ PlainObjectBase<Derived>::setOnes(Index rows, Index cols)
return setConstant(Scalar(1));
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to one. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa MatrixBase::setOnes(), setOnes(Index), setOnes(Index, Index), setOnes(NoChange_t, Index), class CwiseNullaryOp, MatrixBase::Ones()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setOnes(Index rows, NoChange_t)
{
return setOnes(rows, cols());
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to one. For the parameter of type NoChange_t,
* just pass the special value \c NoChange.
*
* \sa MatrixBase::setOnes(), setOnes(Index), setOnes(Index, Index), setOnes(Index, NoChange_t) class CwiseNullaryOp, MatrixBase::Ones()
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setOnes(NoChange_t, Index cols)
{
return setOnes(rows(), cols);
}
// Identity:
/** \returns an expression of the identity matrix (not necessarily square).

View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_CWISE_UNARY_OP_H
#define EIGEN_CWISE_UNARY_OP_H
namespace Eigen {
namespace Eigen {
namespace internal {
template<typename UnaryOp, typename XprType>
@@ -24,7 +24,7 @@ struct traits<CwiseUnaryOp<UnaryOp, XprType> >
typedef typename XprType::Nested XprTypeNested;
typedef typename remove_reference<XprTypeNested>::type _XprTypeNested;
enum {
Flags = _XprTypeNested::Flags & RowMajorBit
Flags = _XprTypeNested::Flags & RowMajorBit
};
};
}
@@ -65,10 +65,10 @@ class CwiseUnaryOp : public CwiseUnaryOpImpl<UnaryOp, XprType, typename internal
explicit CwiseUnaryOp(const XprType& xpr, const UnaryOp& func = UnaryOp())
: m_xpr(xpr), m_functor(func) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_xpr.cols(); }
/** \returns the functor representing the unary operation */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE

View File

@@ -64,23 +64,25 @@ class CwiseUnaryView : public CwiseUnaryViewImpl<ViewOp, MatrixType, typename in
typedef typename internal::ref_selector<MatrixType>::non_const_type MatrixTypeNested;
typedef typename internal::remove_all<MatrixType>::type NestedExpression;
explicit inline CwiseUnaryView(MatrixType& mat, const ViewOp& func = ViewOp())
explicit EIGEN_DEVICE_FUNC inline CwiseUnaryView(MatrixType& mat, const ViewOp& func = ViewOp())
: m_matrix(mat), m_functor(func) {}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryView)
EIGEN_STRONG_INLINE Index rows() const { return m_matrix.rows(); }
EIGEN_STRONG_INLINE Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
/** \returns the functor representing unary operation */
const ViewOp& functor() const { return m_functor; }
EIGEN_DEVICE_FUNC const ViewOp& functor() const { return m_functor; }
/** \returns the nested expression */
const typename internal::remove_all<MatrixTypeNested>::type&
EIGEN_DEVICE_FUNC const typename internal::remove_all<MatrixTypeNested>::type&
nestedExpression() const { return m_matrix; }
/** \returns the nested expression */
typename internal::remove_reference<MatrixTypeNested>::type&
EIGEN_DEVICE_FUNC typename internal::remove_reference<MatrixTypeNested>::type&
nestedExpression() { return m_matrix; }
protected:
@@ -108,19 +110,21 @@ class CwiseUnaryViewImpl<ViewOp,MatrixType,Dense>
EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryViewImpl)
EIGEN_DEVICE_FUNC inline Scalar* data() { return &(this->coeffRef(0)); }
EIGEN_DEVICE_FUNC inline const Scalar* data() const { return &(this->coeff(0)); }
EIGEN_DEVICE_FUNC inline Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const
{
return derived().nestedExpression().innerStride() * sizeof(typename internal::traits<MatrixType>::Scalar) / sizeof(Scalar);
}
EIGEN_DEVICE_FUNC inline Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const
{
return derived().nestedExpression().outerStride() * sizeof(typename internal::traits<MatrixType>::Scalar) / sizeof(Scalar);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(CwiseUnaryViewImpl)
};
} // end namespace Eigen

View File

@@ -14,15 +14,15 @@
namespace Eigen {
namespace internal {
// The index type defined by EIGEN_DEFAULT_DENSE_INDEX_TYPE must be a signed type.
// This dummy function simply aims at checking that at compile time.
static inline void check_DenseIndex_is_signed() {
EIGEN_STATIC_ASSERT(NumTraits<DenseIndex>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE);
EIGEN_STATIC_ASSERT(NumTraits<DenseIndex>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE)
}
} // end namespace internal
/** \class DenseBase
* \ingroup Core_Module
*
@@ -64,12 +64,12 @@ template<typename Derived> class DenseBase
/** The numeric type of the expression' coefficients, e.g. float, double, int or std::complex<float>, etc. */
typedef typename internal::traits<Derived>::Scalar Scalar;
/** The numeric type of the expression' coefficients, e.g. float, double, int or std::complex<float>, etc.
*
* It is an alias for the Scalar type */
typedef Scalar value_type;
typedef typename NumTraits<Scalar>::Real RealScalar;
typedef DenseCoeffsBase<Derived, internal::accessors_level<Derived>::value> Base;
@@ -158,7 +158,7 @@ template<typename Derived> class DenseBase
* a row-vector (if there is only one row). */
NumDimensions = int(MaxSizeAtCompileTime) == 1 ? 0 : bool(IsVectorAtCompileTime) ? 1 : 2,
/**< This value is equal to Tensor::NumDimensions, i.e. 0 for scalars, 1 for vectors,
/**< This value is equal to Tensor::NumDimensions, i.e. 0 for scalars, 1 for vectors,
* and 2 for matrices.
*/
@@ -175,11 +175,11 @@ template<typename Derived> class DenseBase
InnerStrideAtCompileTime = internal::inner_stride_at_compile_time<Derived>::ret,
OuterStrideAtCompileTime = internal::outer_stride_at_compile_time<Derived>::ret
};
typedef typename internal::find_best_packet<Scalar,SizeAtCompileTime>::type PacketScalar;
enum { IsPlainObjectBase = 0 };
/** The plain matrix type corresponding to this expression.
* \sa PlainObject */
typedef Matrix<typename internal::traits<Derived>::Scalar,
@@ -189,7 +189,7 @@ template<typename Derived> class DenseBase
internal::traits<Derived>::MaxRowsAtCompileTime,
internal::traits<Derived>::MaxColsAtCompileTime
> PlainMatrix;
/** The plain array type corresponding to this expression.
* \sa PlainObject */
typedef Array<typename internal::traits<Derived>::Scalar,
@@ -211,7 +211,7 @@ template<typename Derived> class DenseBase
/** \returns the number of nonzero coefficients which is in practice the number
* of stored coefficients. */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index nonZeros() const { return size(); }
/** \returns the outer size.
@@ -219,7 +219,7 @@ template<typename Derived> class DenseBase
* \note For a vector, this returns just 1. For a matrix (non-vector), this is the major dimension
* with respect to the \ref TopicStorageOrders "storage order", i.e., the number of columns for a
* column-major matrix, and the number of rows for a row-major matrix. */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index outerSize() const
{
return IsVectorAtCompileTime ? 1
@@ -229,9 +229,9 @@ template<typename Derived> class DenseBase
/** \returns the inner size.
*
* \note For a vector, this is just the size. For a matrix (non-vector), this is the minor dimension
* with respect to the \ref TopicStorageOrders "storage order", i.e., the number of rows for a
* with respect to the \ref TopicStorageOrders "storage order", i.e., the number of rows for a
* column-major matrix, and the number of columns for a row-major matrix. */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index innerSize() const
{
return IsVectorAtCompileTime ? this->size()
@@ -324,9 +324,9 @@ template<typename Derived> class DenseBase
typedef Transpose<Derived> TransposeReturnType;
EIGEN_DEVICE_FUNC
TransposeReturnType transpose();
typedef typename internal::add_const<Transpose<const Derived> >::type ConstTransposeReturnType;
typedef Transpose<const Derived> ConstTransposeReturnType;
EIGEN_DEVICE_FUNC
ConstTransposeReturnType transpose() const;
const ConstTransposeReturnType transpose() const;
EIGEN_DEVICE_FUNC
void transposeInPlace();
@@ -411,7 +411,7 @@ template<typename Derived> class DenseBase
// size types on MSVC.
return typename internal::eval<Derived>::type(derived());
}
/** swaps *this with the expression \a other.
*
*/
@@ -449,18 +449,58 @@ template<typename Derived> class DenseBase
EIGEN_DEVICE_FUNC Scalar prod() const;
template<int NaNPropagation>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar minCoeff() const;
template<int NaNPropagation>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar maxCoeff() const;
template<typename IndexType> EIGEN_DEVICE_FUNC
// By default, the fastest version with undefined NaN propagation semantics is
// used.
// TODO(rmlarsen): Replace with default template argument when we move to
// c++11 or beyond.
EIGEN_DEVICE_FUNC inline typename internal::traits<Derived>::Scalar minCoeff() const {
return minCoeff<PropagateFast>();
}
EIGEN_DEVICE_FUNC inline typename internal::traits<Derived>::Scalar maxCoeff() const {
return maxCoeff<PropagateFast>();
}
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar minCoeff(IndexType* row, IndexType* col) const;
template<typename IndexType> EIGEN_DEVICE_FUNC
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar maxCoeff(IndexType* row, IndexType* col) const;
template<typename IndexType> EIGEN_DEVICE_FUNC
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar minCoeff(IndexType* index) const;
template<typename IndexType> EIGEN_DEVICE_FUNC
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar maxCoeff(IndexType* index) const;
// TODO(rmlarsen): Replace these methods with a default template argument.
template<typename IndexType>
EIGEN_DEVICE_FUNC inline
typename internal::traits<Derived>::Scalar minCoeff(IndexType* row, IndexType* col) const {
return minCoeff<PropagateFast>(row, col);
}
template<typename IndexType>
EIGEN_DEVICE_FUNC inline
typename internal::traits<Derived>::Scalar maxCoeff(IndexType* row, IndexType* col) const {
return maxCoeff<PropagateFast>(row, col);
}
template<typename IndexType>
EIGEN_DEVICE_FUNC inline
typename internal::traits<Derived>::Scalar minCoeff(IndexType* index) const {
return minCoeff<PropagateFast>(index);
}
template<typename IndexType>
EIGEN_DEVICE_FUNC inline
typename internal::traits<Derived>::Scalar maxCoeff(IndexType* index) const {
return maxCoeff<PropagateFast>(index);
}
template<typename BinaryOp>
EIGEN_DEVICE_FUNC
Scalar redux(const BinaryOp& func) const;
@@ -530,16 +570,16 @@ template<typename Derived> class DenseBase
static const RandomReturnType Random();
template<typename ThenDerived,typename ElseDerived>
const Select<Derived,ThenDerived,ElseDerived>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived,ElseDerived>
select(const DenseBase<ThenDerived>& thenMatrix,
const DenseBase<ElseDerived>& elseMatrix) const;
template<typename ThenDerived>
inline const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
select(const DenseBase<ThenDerived>& thenMatrix, const typename ThenDerived::Scalar& elseScalar) const;
template<typename ElseDerived>
inline const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
inline EIGEN_DEVICE_FUNC const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
select(const typename ElseDerived::Scalar& thenScalar, const DenseBase<ElseDerived>& elseMatrix) const;
template<int p> RealScalar lpNorm() const;
@@ -636,11 +676,12 @@ template<typename Derived> class DenseBase
}
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(DenseBase)
/** Default constructor. Do nothing. */
EIGEN_DEVICE_FUNC DenseBase()
{
/* Just checks for self-consistency of the flags.
* Only do it when debugging Eigen, as this borders on paranoiac and could slow compilation down
* Only do it when debugging Eigen, as this borders on paranoia and could slow compilation down
*/
#ifdef EIGEN_INTERNAL_DEBUGGING
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, int(IsRowMajor))

View File

@@ -27,7 +27,7 @@ template<typename T> struct add_const_on_value_type_if_arithmetic
*
* This class defines the \c operator() \c const function and friends, which can be used to read specific
* entries of a matrix or array.
*
*
* \sa DenseCoeffsBase<Derived, WriteAccessors>, DenseCoeffsBase<Derived, DirectAccessors>,
* \ref TopicClassHierarchy
*/
@@ -295,7 +295,7 @@ class DenseCoeffsBase<Derived,ReadOnlyAccessors> : public EigenBase<Derived>
* This class defines the non-const \c operator() function and friends, which can be used to write specific
* entries of a matrix or array. This class inherits DenseCoeffsBase<Derived, ReadOnlyAccessors> which
* defines the const variant for reading specific entries.
*
*
* \sa DenseCoeffsBase<Derived, DirectAccessors>, \ref TopicClassHierarchy
*/
template<typename Derived>
@@ -495,7 +495,7 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
*
* \sa outerStride(), rowStride(), colStride()
*/
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const
{
return derived().innerStride();
@@ -506,14 +506,14 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
*
* \sa innerStride(), rowStride(), colStride()
*/
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const
{
return derived().outerStride();
}
// FIXME shall we remove it ?
inline Index stride() const
EIGEN_CONSTEXPR inline Index stride() const
{
return Derived::IsVectorAtCompileTime ? innerStride() : outerStride();
}
@@ -522,7 +522,7 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
*
* \sa innerStride(), outerStride(), colStride()
*/
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rowStride() const
{
return Derived::IsRowMajor ? outerStride() : innerStride();
@@ -532,7 +532,7 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
*
* \sa innerStride(), outerStride(), rowStride()
*/
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index colStride() const
{
return Derived::IsRowMajor ? innerStride() : outerStride();
@@ -570,8 +570,8 @@ class DenseCoeffsBase<Derived, DirectWriteAccessors>
*
* \sa outerStride(), rowStride(), colStride()
*/
EIGEN_DEVICE_FUNC
inline Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT
{
return derived().innerStride();
}
@@ -581,14 +581,14 @@ class DenseCoeffsBase<Derived, DirectWriteAccessors>
*
* \sa innerStride(), rowStride(), colStride()
*/
EIGEN_DEVICE_FUNC
inline Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT
{
return derived().outerStride();
}
// FIXME shall we remove it ?
inline Index stride() const
EIGEN_CONSTEXPR inline Index stride() const EIGEN_NOEXCEPT
{
return Derived::IsVectorAtCompileTime ? innerStride() : outerStride();
}
@@ -597,8 +597,8 @@ class DenseCoeffsBase<Derived, DirectWriteAccessors>
*
* \sa innerStride(), outerStride(), colStride()
*/
EIGEN_DEVICE_FUNC
inline Index rowStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rowStride() const EIGEN_NOEXCEPT
{
return Derived::IsRowMajor ? outerStride() : innerStride();
}
@@ -607,8 +607,8 @@ class DenseCoeffsBase<Derived, DirectWriteAccessors>
*
* \sa innerStride(), outerStride(), rowStride()
*/
EIGEN_DEVICE_FUNC
inline Index colStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index colStride() const EIGEN_NOEXCEPT
{
return Derived::IsRowMajor ? innerStride() : outerStride();
}
@@ -619,7 +619,7 @@ namespace internal {
template<int Alignment, typename Derived, bool JustReturnZero>
struct first_aligned_impl
{
static inline Index run(const Derived&)
static EIGEN_CONSTEXPR inline Index run(const Derived&) EIGEN_NOEXCEPT
{ return 0; }
};

View File

@@ -47,20 +47,20 @@ struct plain_array
EIGEN_DEVICE_FUNC
plain_array()
{
{
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
{
check_static_allocation_size<T,Size>();
}
};
#if defined(EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT)
#define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask)
#elif EIGEN_GNUC_AT_LEAST(4,7)
#elif EIGEN_GNUC_AT_LEAST(4,7)
// GCC 4.7 is too aggressive in its optimizations and remove the alignment test based on the fact the array is declared to be aligned.
// See this bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53900
// Hiding the origin of the array pointer behind a function argument seems to do the trick even if the function is inlined:
@@ -85,15 +85,15 @@ struct plain_array<T, Size, MatrixOrArrayOptions, 8>
EIGEN_ALIGN_TO_BOUNDARY(8) T array[Size];
EIGEN_DEVICE_FUNC
plain_array()
plain_array()
{
EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(7);
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
@@ -104,15 +104,15 @@ struct plain_array<T, Size, MatrixOrArrayOptions, 16>
EIGEN_ALIGN_TO_BOUNDARY(16) T array[Size];
EIGEN_DEVICE_FUNC
plain_array()
{
plain_array()
{
EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(15);
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
@@ -123,15 +123,15 @@ struct plain_array<T, Size, MatrixOrArrayOptions, 32>
EIGEN_ALIGN_TO_BOUNDARY(32) T array[Size];
EIGEN_DEVICE_FUNC
plain_array()
plain_array()
{
EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(31);
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
@@ -142,15 +142,15 @@ struct plain_array<T, Size, MatrixOrArrayOptions, 64>
EIGEN_ALIGN_TO_BOUNDARY(64) T array[Size];
EIGEN_DEVICE_FUNC
plain_array()
{
plain_array()
{
EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(63);
check_static_allocation_size<T,Size>();
}
EIGEN_DEVICE_FUNC
plain_array(constructor_without_unaligned_array_assert)
{
plain_array(constructor_without_unaligned_array_assert)
{
check_static_allocation_size<T,Size>();
}
};
@@ -163,6 +163,30 @@ struct plain_array<T, 0, MatrixOrArrayOptions, Alignment>
EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) {}
};
struct plain_array_helper {
template<typename T, int Size, int MatrixOrArrayOptions, int Alignment>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
static void copy(const plain_array<T, Size, MatrixOrArrayOptions, Alignment>& src, const Eigen::Index size,
plain_array<T, Size, MatrixOrArrayOptions, Alignment>& dst) {
smart_copy(src.array, src.array + size, dst.array);
}
template<typename T, int Size, int MatrixOrArrayOptions, int Alignment>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
static void swap(plain_array<T, Size, MatrixOrArrayOptions, Alignment>& a, const Eigen::Index a_size,
plain_array<T, Size, MatrixOrArrayOptions, Alignment>& b, const Eigen::Index b_size) {
if (a_size < b_size) {
std::swap_ranges(b.array, b.array + a_size, a.array);
smart_move(b.array + a_size, b.array + b_size, a.array + a_size);
} else if (a_size > b_size) {
std::swap_ranges(a.array, a.array + b_size, b.array);
smart_move(a.array + b_size, a.array + a_size, b.array + b_size);
} else {
std::swap_ranges(a.array, a.array + a_size, b.array);
}
}
};
} // end namespace internal
/** \internal
@@ -190,16 +214,41 @@ template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseSt
EIGEN_DEVICE_FUNC
explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()) {}
EIGEN_DEVICE_FUNC
#if !EIGEN_HAS_CXX11 || defined(EIGEN_DENSE_STORAGE_CTOR_PLUGIN)
EIGEN_DEVICE_FUNC
DenseStorage(const DenseStorage& other) : m_data(other.m_data) {
EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(Index size = Size)
}
EIGEN_DEVICE_FUNC
#else
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage&) = default;
#endif
#if !EIGEN_HAS_CXX11
EIGEN_DEVICE_FUNC
DenseStorage& operator=(const DenseStorage& other)
{
{
if (this != &other) m_data = other.m_data;
return *this;
return *this;
}
#else
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage&) = default;
#endif
#if EIGEN_HAS_RVALUE_REFERENCES
#if !EIGEN_HAS_CXX11
EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT
: m_data(std::move(other.m_data))
{
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT
{
if (this != &other)
m_data = std::move(other.m_data);
return *this;
}
#else
EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&&) = default;
EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&&) = default;
#endif
#endif
EIGEN_DEVICE_FUNC DenseStorage(Index size, Index rows, Index cols) {
EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({})
eigen_internal_assert(size==rows*cols && rows==_Rows && cols==_Cols);
@@ -210,8 +259,8 @@ template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseSt
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) {
numext::swap(m_data, other.m_data);
}
EIGEN_DEVICE_FUNC static Index rows(void) {return _Rows;}
EIGEN_DEVICE_FUNC static Index cols(void) {return _Cols;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) EIGEN_NOEXCEPT {return _Cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index,Index,Index) {}
EIGEN_DEVICE_FUNC void resize(Index,Index,Index) {}
EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
@@ -228,8 +277,8 @@ template<typename T, int _Rows, int _Cols, int _Options> class DenseStorage<T, 0
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage&) { return *this; }
EIGEN_DEVICE_FUNC DenseStorage(Index,Index,Index) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& ) {}
EIGEN_DEVICE_FUNC static Index rows(void) {return _Rows;}
EIGEN_DEVICE_FUNC static Index cols(void) {return _Cols;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) EIGEN_NOEXCEPT {return _Cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index,Index,Index) {}
EIGEN_DEVICE_FUNC void resize(Index,Index,Index) {}
EIGEN_DEVICE_FUNC const T *data() const { return 0; }
@@ -256,21 +305,25 @@ template<typename T, int Size, int _Options> class DenseStorage<T, Size, Dynamic
EIGEN_DEVICE_FUNC DenseStorage() : m_rows(0), m_cols(0) {}
EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0), m_cols(0) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_rows(other.m_rows), m_cols(other.m_cols) {}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(other.m_rows), m_cols(other.m_cols)
{
internal::plain_array_helper::copy(other.m_data, m_rows * m_cols, m_data);
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other)
{
m_data = other.m_data;
m_rows = other.m_rows;
m_cols = other.m_cols;
internal::plain_array_helper::copy(other.m_data, m_rows * m_cols, m_data);
}
return *this;
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index cols) : m_rows(rows), m_cols(cols) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
{
numext::swap(m_data,other.m_data);
internal::plain_array_helper::swap(m_data, m_rows * m_cols, other.m_data, other.m_rows * other.m_cols);
numext::swap(m_rows,other.m_rows);
numext::swap(m_cols,other.m_cols);
}
@@ -291,24 +344,29 @@ template<typename T, int Size, int _Cols, int _Options> class DenseStorage<T, Si
EIGEN_DEVICE_FUNC DenseStorage() : m_rows(0) {}
EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_rows(other.m_rows) {}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other)
: m_data(internal::constructor_without_unaligned_array_assert()), m_rows(other.m_rows)
{
internal::plain_array_helper::copy(other.m_data, m_rows * _Cols, m_data);
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other)
{
m_data = other.m_data;
m_rows = other.m_rows;
internal::plain_array_helper::copy(other.m_data, m_rows * _Cols, m_data);
}
return *this;
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index) : m_rows(rows) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
{
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
{
internal::plain_array_helper::swap(m_data, m_rows * _Cols, other.m_data, other.m_rows * _Cols);
numext::swap(m_rows, other.m_rows);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return _Cols;}
EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols(void) const EIGEN_NOEXCEPT {return _Cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index, Index rows, Index) { m_rows = rows; }
EIGEN_DEVICE_FUNC void resize(Index, Index rows, Index) { m_rows = rows; }
EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
@@ -324,23 +382,27 @@ template<typename T, int Size, int _Rows, int _Options> class DenseStorage<T, Si
EIGEN_DEVICE_FUNC DenseStorage() : m_cols(0) {}
EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert)
: m_data(internal::constructor_without_unaligned_array_assert()), m_cols(0) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_cols(other.m_cols) {}
EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other)
: m_data(internal::constructor_without_unaligned_array_assert()), m_cols(other.m_cols)
{
internal::plain_array_helper::copy(other.m_data, _Rows * m_cols, m_data);
}
EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other)
{
if (this != &other)
{
m_data = other.m_data;
m_cols = other.m_cols;
internal::plain_array_helper::copy(other.m_data, _Rows * m_cols, m_data);
}
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index, Index cols) : m_cols(cols) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) {
numext::swap(m_data,other.m_data);
numext::swap(m_cols,other.m_cols);
internal::plain_array_helper::swap(m_data, _Rows * m_cols, other.m_data, _Rows * other.m_cols);
numext::swap(m_cols, other.m_cols);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows(void) const EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index, Index, Index cols) { m_cols = cols; }
EIGEN_DEVICE_FUNC void resize(Index, Index, Index cols) { m_cols = cols; }
EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
@@ -407,8 +469,8 @@ template<typename T, int _Options> class DenseStorage<T, Dynamic, Dynamic, Dynam
numext::swap(m_rows,other.m_rows);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;}
EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;}
void conservativeResize(Index size, Index rows, Index cols)
{
m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, m_rows*m_cols);
@@ -462,7 +524,7 @@ template<typename T, int _Rows, int _Options> class DenseStorage<T, Dynamic, _Ro
this->swap(tmp);
}
return *this;
}
}
#if EIGEN_HAS_RVALUE_REFERENCES
EIGEN_DEVICE_FUNC
DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT
@@ -485,8 +547,8 @@ template<typename T, int _Rows, int _Options> class DenseStorage<T, Dynamic, _Ro
numext::swap(m_data,other.m_data);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC static Index rows(void) {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index size, Index, Index cols)
{
m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, _Rows*m_cols);
@@ -538,7 +600,7 @@ template<typename T, int _Cols, int _Options> class DenseStorage<T, Dynamic, Dyn
this->swap(tmp);
}
return *this;
}
}
#if EIGEN_HAS_RVALUE_REFERENCES
EIGEN_DEVICE_FUNC
DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT
@@ -561,8 +623,8 @@ template<typename T, int _Cols, int _Options> class DenseStorage<T, Dynamic, Dyn
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC static Index cols(void) {return _Cols;}
EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;}
EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) {return _Cols;}
void conservativeResize(Index size, Index rows, Index)
{
m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, m_rows*_Cols);

View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_DIAGONAL_H
#define EIGEN_DIAGONAL_H
namespace Eigen {
namespace Eigen {
/** \class Diagonal
* \ingroup Core_Module
@@ -84,20 +84,16 @@ template<typename MatrixType, int _DiagIndex> class Diagonal
: numext::mini<Index>(m_matrix.rows(),m_matrix.cols()-m_index.value());
}
EIGEN_DEVICE_FUNC
inline Index cols() const { return 1; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return 1; }
EIGEN_DEVICE_FUNC
inline Index innerStride() const
{
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT {
return m_matrix.outerStride() + 1;
}
EIGEN_DEVICE_FUNC
inline Index outerStride() const
{
return 0;
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return 0; }
typedef typename internal::conditional<
internal::is_lvalue<MatrixType>::value,
@@ -149,8 +145,8 @@ template<typename MatrixType, int _DiagIndex> class Diagonal
}
EIGEN_DEVICE_FUNC
inline const typename internal::remove_all<typename MatrixType::Nested>::type&
nestedExpression() const
inline const typename internal::remove_all<typename MatrixType::Nested>::type&
nestedExpression() const
{
return m_matrix;
}
@@ -167,12 +163,12 @@ template<typename MatrixType, int _DiagIndex> class Diagonal
private:
// some compilers may fail to optimize std::max etc in case of compile-time constants...
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index absDiagIndex() const { return m_index.value()>0 ? m_index.value() : -m_index.value(); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index rowOffset() const { return m_index.value()>0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index colOffset() const { return m_index.value()>0 ? m_index.value() : 0; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index absDiagIndex() const EIGEN_NOEXCEPT { return m_index.value()>0 ? m_index.value() : -m_index.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rowOffset() const EIGEN_NOEXCEPT { return m_index.value()>0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index colOffset() const EIGEN_NOEXCEPT { return m_index.value()>0 ? m_index.value() : 0; }
// trigger a compile-time error if someone try to call packet
template<int LoadMode> typename MatrixType::PacketReturnType packet(Index) const;
template<int LoadMode> typename MatrixType::PacketReturnType packet(Index,Index) const;
@@ -195,7 +191,8 @@ MatrixBase<Derived>::diagonal()
/** This is the const version of diagonal(). */
template<typename Derived>
EIGEN_DEVICE_FUNC inline typename MatrixBase<Derived>::ConstDiagonalReturnType
EIGEN_DEVICE_FUNC inline
const typename MatrixBase<Derived>::ConstDiagonalReturnType
MatrixBase<Derived>::diagonal() const
{
return ConstDiagonalReturnType(derived());
@@ -213,18 +210,18 @@ MatrixBase<Derived>::diagonal() const
*
* \sa MatrixBase::diagonal(), class Diagonal */
template<typename Derived>
EIGEN_DEVICE_FUNC inline typename MatrixBase<Derived>::DiagonalDynamicIndexReturnType
EIGEN_DEVICE_FUNC inline Diagonal<Derived, DynamicIndex>
MatrixBase<Derived>::diagonal(Index index)
{
return DiagonalDynamicIndexReturnType(derived(), index);
return Diagonal<Derived, DynamicIndex>(derived(), index);
}
/** This is the const version of diagonal(Index). */
template<typename Derived>
EIGEN_DEVICE_FUNC inline typename MatrixBase<Derived>::ConstDiagonalDynamicIndexReturnType
EIGEN_DEVICE_FUNC inline const Diagonal<const Derived, DynamicIndex>
MatrixBase<Derived>::diagonal(Index index) const
{
return ConstDiagonalDynamicIndexReturnType(derived(), index);
return Diagonal<const Derived, DynamicIndex>(derived(), index);
}
/** \returns an expression of the \a DiagIndex-th sub or super diagonal of the matrix \c *this
@@ -241,20 +238,20 @@ MatrixBase<Derived>::diagonal(Index index) const
template<typename Derived>
template<int Index_>
EIGEN_DEVICE_FUNC
inline typename MatrixBase<Derived>::template DiagonalIndexReturnType<Index_>::Type
inline Diagonal<Derived, Index_>
MatrixBase<Derived>::diagonal()
{
return typename DiagonalIndexReturnType<Index_>::Type(derived());
return Diagonal<Derived, Index_>(derived());
}
/** This is the const version of diagonal<int>(). */
template<typename Derived>
template<int Index_>
EIGEN_DEVICE_FUNC
inline typename MatrixBase<Derived>::template ConstDiagonalIndexReturnType<Index_>::Type
inline const Diagonal<const Derived, Index_>
MatrixBase<Derived>::diagonal() const
{
return typename ConstDiagonalIndexReturnType<Index_>::Type(derived());
return Diagonal<const Derived, Index_>(derived());
}
} // end namespace Eigen

View File

@@ -18,14 +18,9 @@ namespace internal {
// with mismatched types, the compiler emits errors about failing to instantiate cwiseProduct BEFORE
// looking at the static assertions. Thus this is a trick to get better compile errors.
template<typename T, typename U,
// the NeedToTranspose condition here is taken straight from Assign.h
bool NeedToTranspose = T::IsVectorAtCompileTime
&& U::IsVectorAtCompileTime
&& ((int(T::RowsAtCompileTime) == 1 && int(U::ColsAtCompileTime) == 1)
| // FIXME | instead of || to please GCC 4.4.0 stupid warning "suggest parentheses around &&".
// revert to || as soon as not needed anymore.
(int(T::ColsAtCompileTime) == 1 && int(U::RowsAtCompileTime) == 1))
>
bool NeedToTranspose = T::IsVectorAtCompileTime && U::IsVectorAtCompileTime &&
((int(T::RowsAtCompileTime) == 1 && int(U::ColsAtCompileTime) == 1) ||
(int(T::ColsAtCompileTime) == 1 && int(U::RowsAtCompileTime) == 1))>
struct dot_nocheck
{
typedef scalar_conj_product_op<typename traits<T>::Scalar,typename traits<U>::Scalar> conj_prod;
@@ -86,7 +81,7 @@ MatrixBase<Derived>::dot(const MatrixBase<OtherDerived>& other) const
//---------- implementation of L2 norm and related functions ----------
/** \returns, for vectors, the squared \em l2 norm of \c *this, and for matrices the Frobenius norm.
/** \returns, for vectors, the squared \em l2 norm of \c *this, and for matrices the squared Frobenius norm.
* In both cases, it consists in the sum of the square of all the matrix entries.
* For vectors, this is also equals to the dot product of \c *this with itself.
*
@@ -207,7 +202,7 @@ struct lpNorm_selector
EIGEN_DEVICE_FUNC
static inline RealScalar run(const MatrixBase<Derived>& m)
{
EIGEN_USING_STD_MATH(pow)
EIGEN_USING_STD(pow)
return pow(m.cwiseAbs().array().pow(p).sum(), RealScalar(1)/p);
}
};

View File

@@ -15,7 +15,7 @@ namespace Eigen {
/** \class EigenBase
* \ingroup Core_Module
*
*
* Common base class for all classes T such that MatrixBase has an operator=(T) and a constructor MatrixBase(T).
*
* In other words, an EigenBase object is an object that can be copied into a MatrixBase.
@@ -29,7 +29,7 @@ namespace Eigen {
template<typename Derived> struct EigenBase
{
// typedef typename internal::plain_matrix_type<Derived>::type PlainObject;
/** \brief The interface type of indices
* \details To change this, \c \#define the preprocessor symbol \c EIGEN_DEFAULT_DENSE_INDEX_TYPE.
* \sa StorageIndex, \ref TopicPreprocessorDirectives.
@@ -56,15 +56,15 @@ template<typename Derived> struct EigenBase
{ return *static_cast<const Derived*>(this); }
/** \returns the number of rows. \sa cols(), RowsAtCompileTime */
EIGEN_DEVICE_FUNC
inline Index rows() const { return derived().rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return derived().rows(); }
/** \returns the number of columns. \sa rows(), ColsAtCompileTime*/
EIGEN_DEVICE_FUNC
inline Index cols() const { return derived().cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return derived().cols(); }
/** \returns the number of coefficients, which is rows()*cols().
* \sa rows(), cols(), SizeAtCompileTime. */
EIGEN_DEVICE_FUNC
inline Index size() const { return rows() * cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index size() const EIGEN_NOEXCEPT { return rows() * cols(); }
/** \internal Don't use it, but do the equivalent: \code dst = *this; \endcode */
template<typename Dest>

View File

@@ -41,10 +41,14 @@ template<typename ExpressionType> class ForceAlignedAccess
EIGEN_DEVICE_FUNC explicit inline ForceAlignedAccess(const ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC inline Index outerStride() const { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC inline Index innerStride() const { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index row, Index col) const
{

View File

@@ -228,8 +228,7 @@ template<> struct gemv_dense_selector<OnTheRight,ColMajor,true>
ActualLhsType actualLhs = LhsBlasTraits::extract(lhs);
ActualRhsType actualRhs = RhsBlasTraits::extract(rhs);
ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(lhs)
* RhsBlasTraits::extractScalarFactor(rhs);
ResScalar actualAlpha = combine_scalar_factors(alpha, lhs, rhs);
// make sure Dest is a compile-time vector type (bug 1166)
typedef typename conditional<Dest::IsVectorAtCompileTime, Dest, typename Dest::ColXpr>::type ActualDest;
@@ -320,8 +319,7 @@ template<> struct gemv_dense_selector<OnTheRight,RowMajor,true>
typename add_const<ActualLhsType>::type actualLhs = LhsBlasTraits::extract(lhs);
typename add_const<ActualRhsType>::type actualRhs = RhsBlasTraits::extract(rhs);
ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(lhs)
* RhsBlasTraits::extractScalarFactor(rhs);
ResScalar actualAlpha = combine_scalar_factors(alpha, lhs, rhs);
enum {
// FIXME find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1

File diff suppressed because it is too large Load Diff

View File

@@ -81,14 +81,16 @@ namespace Eigen
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(expm1,scalar_expm1_op,exponential of a value minus 1,\sa ArrayBase::expm1)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log,scalar_log_op,natural logarithm,\sa Eigen::log10 DOXCOMMA ArrayBase::log)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log1p,scalar_log1p_op,natural logarithm of 1 plus the value,\sa ArrayBase::log1p)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10,scalar_log10_op,base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10,scalar_log10_op,base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log10)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log2,scalar_log2_op,base 2 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs,scalar_abs_op,absolute value,\sa ArrayBase::abs DOXCOMMA MatrixBase::cwiseAbs)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs2,scalar_abs2_op,squared absolute value,\sa ArrayBase::abs2 DOXCOMMA MatrixBase::cwiseAbs2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(arg,scalar_arg_op,complex argument,\sa ArrayBase::arg)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(arg,scalar_arg_op,complex argument,\sa ArrayBase::arg DOXCOMMA MatrixBase::cwiseArg)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sqrt,scalar_sqrt_op,square root,\sa ArrayBase::sqrt DOXCOMMA MatrixBase::cwiseSqrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rsqrt,scalar_rsqrt_op,reciprocal square root,\sa ArrayBase::rsqrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(square,scalar_square_op,square (power 2),\sa Eigen::abs2 DOXCOMMA Eigen::pow DOXCOMMA ArrayBase::square)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cube,scalar_cube_op,cube (power 3),\sa Eigen::pow DOXCOMMA ArrayBase::cube)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rint,scalar_rint_op,nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(round,scalar_round_op,nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(floor,scalar_floor_op,nearest integer not greater than the giben value,\sa Eigen::ceil DOXCOMMA ArrayBase::floor)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(ceil,scalar_ceil_op,nearest integer not less than the giben value,\sa Eigen::floor DOXCOMMA ArrayBase::ceil)

View File

@@ -130,6 +130,9 @@ struct significant_decimals_impl
template<typename Derived>
std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt)
{
using internal::is_same;
using internal::conditional;
if(_m.size() == 0)
{
s << fmt.matPrefix << fmt.matSuffix;
@@ -138,6 +141,22 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
typename Derived::Nested m = _m;
typedef typename Derived::Scalar Scalar;
typedef typename
conditional<
is_same<Scalar, char>::value ||
is_same<Scalar, unsigned char>::value ||
is_same<Scalar, numext::int8_t>::value ||
is_same<Scalar, numext::uint8_t>::value,
int,
typename conditional<
is_same<Scalar, std::complex<char> >::value ||
is_same<Scalar, std::complex<unsigned char> >::value ||
is_same<Scalar, std::complex<numext::int8_t> >::value ||
is_same<Scalar, std::complex<numext::uint8_t> >::value,
std::complex<int>,
const Scalar&
>::type
>::type PrintType;
Index width = 0;
@@ -174,7 +193,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
{
std::stringstream sstr;
sstr.copyfmt(s);
sstr << m.coeff(i,j);
sstr << static_cast<PrintType>(m.coeff(i,j));
width = std::max<Index>(width, Index(sstr.str().length()));
}
}
@@ -190,7 +209,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
s.fill(fmt.fill);
s.width(width);
}
s << m.coeff(i, 0);
s << static_cast<PrintType>(m.coeff(i, 0));
for(Index j = 1; j < m.cols(); ++j)
{
s << fmt.coeffSeparator;
@@ -198,7 +217,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
s.fill(fmt.fill);
s.width(width);
}
s << m.coeff(i, j);
s << static_cast<PrintType>(m.coeff(i, j));
}
s << fmt.rowSuffix;
if( i < m.rows() - 1)

View File

@@ -54,7 +54,8 @@ struct traits<IndexedView<XprType, RowIndices, ColIndices> >
DirectAccessMask = (int(InnerIncr)!=UndefinedIncr && int(OuterIncr)!=UndefinedIncr && InnerIncr>=0 && OuterIncr>=0) ? DirectAccessBit : 0,
FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0,
FlagsLvalueBit = is_lvalue<XprType>::value ? LvalueBit : 0,
Flags = (traits<XprType>::Flags & (HereditaryBits | DirectAccessMask)) | FlagsLvalueBit | FlagsRowMajorBit
FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1) ? LinearAccessBit : 0,
Flags = (traits<XprType>::Flags & (HereditaryBits | DirectAccessMask )) | FlagsLvalueBit | FlagsRowMajorBit | FlagsLinearAccessBit
};
typedef Block<XprType,RowsAtCompileTime,ColsAtCompileTime,IsInnerPannel> BlockType;
@@ -121,10 +122,10 @@ public:
{}
/** \returns number of rows */
Index rows() const { return internal::size(m_rowIndices); }
Index rows() const { return internal::index_list_size(m_rowIndices); }
/** \returns number of columns */
Index cols() const { return internal::size(m_colIndices); }
Index cols() const { return internal::index_list_size(m_colIndices); }
/** \returns the nested expression */
const typename internal::remove_all<XprType>::type&
@@ -168,7 +169,11 @@ struct unary_evaluator<IndexedView<ArgType, RowIndices, ColIndices>, IndexBased>
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost /* TODO + cost of row/col index */,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits /*| LinearAccessBit | DirectAccessBit*/)),
FlagsLinearAccessBit = (traits<XprType>::RowsAtCompileTime == 1 || traits<XprType>::ColsAtCompileTime == 1) ? LinearAccessBit : 0,
FlagsRowMajorBit = traits<XprType>::FlagsRowMajorBit,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits & ~RowMajorBit /*| LinearAccessBit | DirectAccessBit*/)) | FlagsLinearAccessBit | FlagsRowMajorBit,
Alignment = 0
};
@@ -184,15 +189,50 @@ struct unary_evaluator<IndexedView<ArgType, RowIndices, ColIndices>, IndexBased>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
eigen_assert(m_xpr.rowIndices()[row] >= 0 && m_xpr.rowIndices()[row] < m_xpr.nestedExpression().rows()
&& m_xpr.colIndices()[col] >= 0 && m_xpr.colIndices()[col] < m_xpr.nestedExpression().cols());
return m_argImpl.coeff(m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index row, Index col)
{
eigen_assert(m_xpr.rowIndices()[row] >= 0 && m_xpr.rowIndices()[row] < m_xpr.nestedExpression().rows()
&& m_xpr.colIndices()[col] >= 0 && m_xpr.colIndices()[col] < m_xpr.nestedExpression().cols());
return m_argImpl.coeffRef(m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index index)
{
EIGEN_STATIC_ASSERT_LVALUE(XprType)
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
eigen_assert(m_xpr.rowIndices()[row] >= 0 && m_xpr.rowIndices()[row] < m_xpr.nestedExpression().rows()
&& m_xpr.colIndices()[col] >= 0 && m_xpr.colIndices()[col] < m_xpr.nestedExpression().cols());
return m_argImpl.coeffRef( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar& coeffRef(Index index) const
{
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
eigen_assert(m_xpr.rowIndices()[row] >= 0 && m_xpr.rowIndices()[row] < m_xpr.nestedExpression().rows()
&& m_xpr.colIndices()[col] >= 0 && m_xpr.colIndices()[col] < m_xpr.nestedExpression().cols());
return m_argImpl.coeffRef( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const CoeffReturnType coeff(Index index) const
{
Index row = XprType::RowsAtCompileTime == 1 ? 0 : index;
Index col = XprType::RowsAtCompileTime == 1 ? index : 0;
eigen_assert(m_xpr.rowIndices()[row] >= 0 && m_xpr.rowIndices()[row] < m_xpr.nestedExpression().rows()
&& m_xpr.colIndices()[col] >= 0 && m_xpr.colIndices()[col] < m_xpr.nestedExpression().cols());
return m_argImpl.coeff( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
protected:
evaluator<ArgType> m_argImpl;

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_INVERSE_H
#define EIGEN_INVERSE_H
namespace Eigen {
namespace Eigen {
template<typename XprType,typename StorageKind> class InverseImpl;
@@ -49,13 +49,13 @@ public:
typedef typename internal::remove_all<XprTypeNested>::type XprTypeNestedCleaned;
typedef typename internal::ref_selector<Inverse>::type Nested;
typedef typename internal::remove_all<XprType>::type NestedExpression;
explicit EIGEN_DEVICE_FUNC Inverse(const XprType &xpr)
: m_xpr(xpr)
{}
EIGEN_DEVICE_FUNC Index rows() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC const XprTypeNestedCleaned& nestedExpression() const { return m_xpr; }
@@ -81,7 +81,7 @@ namespace internal {
/** \internal
* \brief Default evaluator for Inverse expression.
*
*
* This default evaluator for Inverse expression simply evaluate the inverse into a temporary
* by a call to internal::call_assignment_no_alias.
* Therefore, inverse implementers only have to specialize Assignment<Dst,Inverse<...>, ...> for
@@ -96,7 +96,7 @@ struct unary_evaluator<Inverse<ArgType> >
typedef Inverse<ArgType> InverseType;
typedef typename InverseType::PlainObject PlainObject;
typedef evaluator<PlainObject> Base;
enum { Flags = Base::Flags | EvalBeforeNestingBit };
unary_evaluator(const InverseType& inv_xpr)
@@ -105,11 +105,11 @@ struct unary_evaluator<Inverse<ArgType> >
::new (static_cast<Base*>(this)) Base(m_result);
internal::call_assignment_no_alias(m_result, inv_xpr);
}
protected:
PlainObject m_result;
};
} // end namespace internal
} // end namespace Eigen

View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_MAP_H
#define EIGEN_MAP_H
namespace Eigen {
namespace Eigen {
namespace internal {
template<typename PlainObjectType, int MapOptions, typename StrideType>
@@ -47,7 +47,7 @@ private:
* \brief A matrix or vector expression mapping an existing array of data.
*
* \tparam PlainObjectType the equivalent matrix type of the mapped data
* \tparam MapOptions specifies the pointer alignment in bytes. It can be: \c #Aligned128, , \c #Aligned64, \c #Aligned32, \c #Aligned16, \c #Aligned8 or \c #Unaligned.
* \tparam MapOptions specifies the pointer alignment in bytes. It can be: \c #Aligned128, \c #Aligned64, \c #Aligned32, \c #Aligned16, \c #Aligned8 or \c #Unaligned.
* The default is \c #Unaligned.
* \tparam StrideType optionally specifies strides. By default, Map assumes the memory layout
* of an ordinary, contiguous array. This can be overridden by specifying strides.
@@ -104,13 +104,13 @@ template<typename PlainObjectType, int MapOptions, typename StrideType> class Ma
EIGEN_DEVICE_FUNC
inline PointerType cast_to_pointer_type(PointerArgType ptr) { return ptr; }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const
{
return StrideType::InnerStrideAtCompileTime != 0 ? m_stride.inner() : 1;
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const
{
return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer()

View File

@@ -15,7 +15,7 @@
EIGEN_STATIC_ASSERT((int(internal::evaluator<Derived>::Flags) & LinearAccessBit) || Derived::IsVectorAtCompileTime, \
YOU_ARE_TRYING_TO_USE_AN_INDEX_BASED_ACCESSOR_ON_AN_EXPRESSION_THAT_DOES_NOT_SUPPORT_THAT)
namespace Eigen {
namespace Eigen {
/** \ingroup Core_Module
*
@@ -87,9 +87,11 @@ template<typename Derived> class MapBase<Derived, ReadOnlyAccessors>
typedef typename Base::CoeffReturnType CoeffReturnType;
/** \copydoc DenseBase::rows() */
EIGEN_DEVICE_FUNC inline Index rows() const { return m_rows.value(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_rows.value(); }
/** \copydoc DenseBase::cols() */
EIGEN_DEVICE_FUNC inline Index cols() const { return m_cols.value(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_cols.value(); }
/** Returns a pointer to the first coefficient of the matrix or vector.
*
@@ -182,6 +184,8 @@ template<typename Derived> class MapBase<Derived, ReadOnlyAccessors>
#endif
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase)
template<typename T>
EIGEN_DEVICE_FUNC
@@ -294,6 +298,9 @@ template<typename Derived> class MapBase<Derived, WriteAccessors>
// In theory we could simply refer to Base:Base::operator=, but MSVC does not like Base::Base,
// see bugs 821 and 920.
using ReadOnlyMapBase::Base::operator=;
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase)
};
#undef EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS

File diff suppressed because it is too large Load Diff

View File

@@ -17,19 +17,28 @@ namespace internal {
/** \internal \returns the hyperbolic tan of \a a (coeff-wise)
Doesn't do anything fancy, just a 13/6-degree rational interpolant which
is accurate up to a couple of ulp in the range [-9, 9], outside of which
the tanh(x) = +/-1.
is accurate up to a couple of ulps in the (approximate) range [-8, 8],
outside of which tanh(x) = +/-1 in single precision. The input is clamped
to the range [-c, c]. The value c is chosen as the smallest value where
the approximation evaluates to exactly 1. In the reange [-0.0004, 0.0004]
the approxmation tanh(x) ~= x is used for better accuracy as x tends to zero.
This implementation works on both scalars and packets.
*/
template<typename T>
T generic_fast_tanh_float(const T& a_x)
{
// Clamp the inputs to the range [-9, 9] since anything outside
// this range is +/-1.0f in single-precision.
const T plus_9 = pset1<T>(9.f);
const T minus_9 = pset1<T>(-9.f);
const T x = pmax(pmin(a_x, plus_9), minus_9);
// Clamp the inputs to the range [-c, c]
#ifdef EIGEN_VECTORIZE_FMA
const T plus_clamp = pset1<T>(7.99881172180175781f);
const T minus_clamp = pset1<T>(-7.99881172180175781f);
#else
const T plus_clamp = pset1<T>(7.90531110763549805f);
const T minus_clamp = pset1<T>(-7.90531110763549805f);
#endif
const T tiny = pset1<T>(0.0004f);
const T x = pmax(pmin(a_x, plus_clamp), minus_clamp);
const T tiny_mask = pcmp_lt(pabs(a_x), tiny);
// The monomial coefficients of the numerator polynomial (odd).
const T alpha_1 = pset1<T>(4.89352455891786e-03f);
const T alpha_3 = pset1<T>(6.37261928875436e-04f);
@@ -57,20 +66,26 @@ T generic_fast_tanh_float(const T& a_x)
p = pmadd(x2, p, alpha_1);
p = pmul(x, p);
// Evaluate the denominator polynomial p.
// Evaluate the denominator polynomial q.
T q = pmadd(x2, beta_6, beta_4);
q = pmadd(x2, q, beta_2);
q = pmadd(x2, q, beta_0);
// Divide the numerator by the denominator.
return pdiv(p, q);
return pselect(tiny_mask, x, pdiv(p, q));
}
template<typename RealScalar>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
RealScalar positive_real_hypot(const RealScalar& x, const RealScalar& y)
{
EIGEN_USING_STD_MATH(sqrt);
// IEEE IEC 6059 special cases.
if ((numext::isinf)(x) || (numext::isinf)(y))
return NumTraits<RealScalar>::infinity();
if ((numext::isnan)(x) || (numext::isnan)(y))
return NumTraits<RealScalar>::quiet_NaN();
EIGEN_USING_STD(sqrt);
RealScalar p, qp;
p = numext::maxi(x,y);
if(p==RealScalar(0)) return RealScalar(0);
@@ -85,11 +100,99 @@ struct hypot_impl
static EIGEN_DEVICE_FUNC
inline RealScalar run(const Scalar& x, const Scalar& y)
{
EIGEN_USING_STD_MATH(abs);
EIGEN_USING_STD(abs);
return positive_real_hypot<RealScalar>(abs(x), abs(y));
}
};
// Generic complex sqrt implementation that correctly handles corner cases
// according to https://en.cppreference.com/w/cpp/numeric/complex/sqrt
template<typename T>
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
// Computes the principal sqrt of the input.
//
// For a complex square root of the number x + i*y. We want to find real
// numbers u and v such that
// (u + i*v)^2 = x + i*y <=>
// u^2 - v^2 + i*2*u*v = x + i*v.
// By equating the real and imaginary parts we get:
// u^2 - v^2 = x
// 2*u*v = y.
//
// For x >= 0, this has the numerically stable solution
// u = sqrt(0.5 * (x + sqrt(x^2 + y^2)))
// v = y / (2 * u)
// and for x < 0,
// v = sign(y) * sqrt(0.5 * (-x + sqrt(x^2 + y^2)))
// u = y / (2 * v)
//
// Letting w = sqrt(0.5 * (|x| + |z|)),
// if x == 0: u = w, v = sign(y) * w
// if x > 0: u = w, v = y / (2 * w)
// if x < 0: u = |y| / (2 * w), v = sign(y) * w
const T x = numext::real(z);
const T y = numext::imag(z);
const T zero = T(0);
const T w = numext::sqrt(T(0.5) * (numext::abs(x) + numext::hypot(x, y)));
return
(numext::isinf)(y) ? std::complex<T>(NumTraits<T>::infinity(), y)
: x == zero ? std::complex<T>(w, y < zero ? -w : w)
: x > zero ? std::complex<T>(w, y / (2 * w))
: std::complex<T>(numext::abs(y) / (2 * w), y < zero ? -w : w );
}
// Generic complex rsqrt implementation.
template<typename T>
EIGEN_DEVICE_FUNC std::complex<T> complex_rsqrt(const std::complex<T>& z) {
// Computes the principal reciprocal sqrt of the input.
//
// For a complex reciprocal square root of the number z = x + i*y. We want to
// find real numbers u and v such that
// (u + i*v)^2 = 1 / (x + i*y) <=>
// u^2 - v^2 + i*2*u*v = x/|z|^2 - i*v/|z|^2.
// By equating the real and imaginary parts we get:
// u^2 - v^2 = x/|z|^2
// 2*u*v = y/|z|^2.
//
// For x >= 0, this has the numerically stable solution
// u = sqrt(0.5 * (x + |z|)) / |z|
// v = -y / (2 * u * |z|)
// and for x < 0,
// v = -sign(y) * sqrt(0.5 * (-x + |z|)) / |z|
// u = -y / (2 * v * |z|)
//
// Letting w = sqrt(0.5 * (|x| + |z|)),
// if x == 0: u = w / |z|, v = -sign(y) * w / |z|
// if x > 0: u = w / |z|, v = -y / (2 * w * |z|)
// if x < 0: u = |y| / (2 * w * |z|), v = -sign(y) * w / |z|
const T x = numext::real(z);
const T y = numext::imag(z);
const T zero = T(0);
const T abs_z = numext::hypot(x, y);
const T w = numext::sqrt(T(0.5) * (numext::abs(x) + abs_z));
const T woz = w / abs_z;
// Corner cases consistent with 1/sqrt(z) on gcc/clang.
return
abs_z == zero ? std::complex<T>(NumTraits<T>::infinity(), NumTraits<T>::quiet_NaN())
: ((numext::isinf)(x) || (numext::isinf)(y)) ? std::complex<T>(zero, zero)
: x == zero ? std::complex<T>(woz, y < zero ? woz : -woz)
: x > zero ? std::complex<T>(woz, -y / (2 * w * abs_z))
: std::complex<T>(numext::abs(y) / (2 * w * abs_z), y < zero ? woz : -woz );
}
template<typename T>
EIGEN_DEVICE_FUNC std::complex<T> complex_log(const std::complex<T>& z) {
// Computes complex log.
T a = numext::abs(z);
EIGEN_USING_STD(atan2);
T b = atan2(z.imag(), z.real());
return std::complex<T>(numext::log(a), b);
}
} // end namespace internal
} // end namespace Eigen

View File

@@ -29,7 +29,7 @@ private:
required_alignment = unpacket_traits<PacketScalar>::alignment,
packet_access_bit = (packet_traits<_Scalar>::Vectorizable && (EIGEN_UNALIGNED_VECTORIZE || (actual_alignment>=required_alignment))) ? PacketAccessBit : 0
};
public:
typedef _Scalar Scalar;
typedef Dense StorageKind;
@@ -44,7 +44,7 @@ public:
Options = _Options,
InnerStrideAtCompileTime = 1,
OuterStrideAtCompileTime = (Options&RowMajor) ? ColsAtCompileTime : RowsAtCompileTime,
// FIXME, the following flag in only used to define NeedsToAlign in PlainObjectBase
EvaluatorFlags = LinearAccessBit | DirectAccessBit | packet_access_bit | row_major_bit,
Alignment = actual_alignment
@@ -225,8 +225,6 @@ class Matrix
return Base::_set(other);
}
/* Here, doxygen failed to copy the brief information when using \copydoc */
/**
* \brief Copies the generic expression \a other into *this.
* \copydetails DenseBase::operator=(const EigenBase<OtherDerived> &other)
@@ -278,13 +276,21 @@ class Matrix
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Matrix& operator=(Matrix&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_assignable<Scalar>::value)
{
other.swap(*this);
Base::operator=(std::move(other));
return *this;
}
#endif
#if EIGEN_HAS_CXX11
/** \copydoc PlainObjectBase(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&... args)
/** \brief Construct a row of column vector with fixed size from an arbitrary number of coefficients. \cpp11
*
* \only_for_vectors
*
* This constructor is for 1D array or vectors with more than 4 coefficients.
* There exists C++98 analogue constructors for fixed-size array/vector having 1, 2, 3, or 4 coefficients.
*
* \warning To construct a column (resp. row) vector of fixed length, the number of values passed to this
* constructor must match the the fixed number of rows (resp. columns) of \c *this.
*
* Example: \include Matrix_variadic_ctor_cxx11.cpp
* Output: \verbinclude Matrix_variadic_ctor_cxx11.out
@@ -297,24 +303,26 @@ class Matrix
: Base(a0, a1, a2, a3, args...) {}
/** \brief Constructs a Matrix and initializes it from the coefficients given as initializer-lists grouped by row. \cpp11
*
*
* \anchor matrix_constructor_initializer_list
*
* In the general case, the constructor takes a list of rows, each row being represented as a list of coefficients:
*
*
* Example: \include Matrix_initializer_list_23_cxx11.cpp
* Output: \verbinclude Matrix_initializer_list_23_cxx11.out
*
*
* Each of the inner initializer lists must contain the exact same number of elements, otherwise an assertion is triggered.
*
*
* In the case of a compile-time column vector, implicit transposition from a single row is allowed.
* Therefore <code>VectorXd{{1,2,3,4,5}}</code> is legal and the more verbose syntax
* <code>RowVectorXd{{1},{2},{3},{4},{5}}</code> can be avoided:
*
*
* Example: \include Matrix_initializer_list_vector_cxx11.cpp
* Output: \verbinclude Matrix_initializer_list_vector_cxx11.out
*
*
* In the case of fixed-sized matrices, the initializer list sizes must exactly match the matrix sizes,
* and implicit transposition is allowed for compile-time vectors only.
*
*
* \sa Matrix(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC
@@ -351,7 +359,7 @@ class Matrix
* This is useful for dynamic-size vectors. For fixed-size vectors,
* it is redundant to pass these parameters, so one should use the default constructor
* Matrix() instead.
*
*
* \warning This constructor is disabled for fixed-size \c 1x1 matrices. For instance,
* calling Matrix<double,1,1>(1) will call the initialization constructor: Matrix(const Scalar&).
* For fixed-size \c 1x1 matrices it is therefore recommended to use the default
@@ -367,7 +375,7 @@ class Matrix
* This is useful for dynamic-size matrices. For fixed-size matrices,
* it is redundant to pass these parameters, so one should use the default constructor
* Matrix() instead.
*
*
* \warning This constructor is disabled for fixed-size \c 1x2 and \c 2x1 vectors. For instance,
* calling Matrix2f(2,1) will call the initialization constructor: Matrix(const Scalar& x, const Scalar& y).
* For fixed-size \c 1x2 or \c 2x1 vectors it is therefore recommended to use the default
@@ -376,7 +384,7 @@ class Matrix
*/
EIGEN_DEVICE_FUNC
Matrix(Index rows, Index cols);
/** \brief Constructs an initialized 2D vector with given coefficients
* \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...) */
Matrix(const Scalar& x, const Scalar& y);
@@ -423,8 +431,10 @@ class Matrix
: Base(other.derived())
{ }
EIGEN_DEVICE_FUNC inline Index innerStride() const { return 1; }
EIGEN_DEVICE_FUNC inline Index outerStride() const { return this->innerSize(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return 1; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return this->innerSize(); }
/////////// Geometry module ///////////
@@ -463,14 +473,14 @@ class Matrix
*
* There are also \c VectorSizeType and \c RowVectorSizeType which are self-explanatory. For example, \c Vector4cf is
* a fixed-size vector of 4 complex floats.
*
*
* With \cpp11, template alias are also defined for common sizes.
* They follow the same pattern as above except that the scalar type suffix is replaced by a
* template parameter, i.e.:
* - `MatrixSize<Type>` where `Size` can be \c 2,\c 3,\c 4 for fixed size square matrices or \c X for dynamic size.
* - `MatrixXSize<Type>` and `MatrixSizeX<Type>` where `Size` can be \c 2,\c 3,\c 4 for hybrid dynamic/fixed matrices.
* - `VectorSize<Type>` and `RowVectorSize<Type>` for column and row vectors.
*
*
* With \cpp11, you can also use fully generic column and row vector types: `Vector<Type,Size>` and `RowVector<Type,Size>`.
*
* \sa class Matrix
@@ -478,16 +488,21 @@ class Matrix
#define EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Size, SizeSuffix) \
/** \ingroup matrixtypedefs */ \
/** \brief \noop */ \
typedef Matrix<Type, Size, Size> Matrix##SizeSuffix##TypeSuffix; \
/** \ingroup matrixtypedefs */ \
/** \brief \noop */ \
typedef Matrix<Type, Size, 1> Vector##SizeSuffix##TypeSuffix; \
/** \ingroup matrixtypedefs */ \
/** \brief \noop */ \
typedef Matrix<Type, 1, Size> RowVector##SizeSuffix##TypeSuffix;
#define EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, Size) \
/** \ingroup matrixtypedefs */ \
/** \brief \noop */ \
typedef Matrix<Type, Size, Dynamic> Matrix##Size##X##TypeSuffix; \
/** \ingroup matrixtypedefs */ \
/** \brief \noop */ \
typedef Matrix<Type, Dynamic, Size> Matrix##X##Size##TypeSuffix;
#define EIGEN_MAKE_TYPEDEFS_ALL_SIZES(Type, TypeSuffix) \

View File

@@ -206,28 +206,22 @@ template<typename Derived> class MatrixBase
EIGEN_DEVICE_FUNC
DiagonalReturnType diagonal();
typedef typename internal::add_const<Diagonal<const Derived> >::type ConstDiagonalReturnType;
typedef Diagonal<const Derived> ConstDiagonalReturnType;
EIGEN_DEVICE_FUNC
ConstDiagonalReturnType diagonal() const;
template<int Index> struct DiagonalIndexReturnType { typedef Diagonal<Derived,Index> Type; };
template<int Index> struct ConstDiagonalIndexReturnType { typedef const Diagonal<const Derived,Index> Type; };
const ConstDiagonalReturnType diagonal() const;
template<int Index>
EIGEN_DEVICE_FUNC
typename DiagonalIndexReturnType<Index>::Type diagonal();
Diagonal<Derived, Index> diagonal();
template<int Index>
EIGEN_DEVICE_FUNC
typename ConstDiagonalIndexReturnType<Index>::Type diagonal() const;
typedef Diagonal<Derived,DynamicIndex> DiagonalDynamicIndexReturnType;
typedef typename internal::add_const<Diagonal<const Derived,DynamicIndex> >::type ConstDiagonalDynamicIndexReturnType;
const Diagonal<const Derived, Index> diagonal() const;
EIGEN_DEVICE_FUNC
DiagonalDynamicIndexReturnType diagonal(Index index);
Diagonal<Derived, DynamicIndex> diagonal(Index index);
EIGEN_DEVICE_FUNC
ConstDiagonalDynamicIndexReturnType diagonal(Index index) const;
const Diagonal<const Derived, DynamicIndex> diagonal(Index index) const;
template<unsigned int Mode> struct TriangularViewReturnType { typedef TriangularView<Derived, Mode> Type; };
template<unsigned int Mode> struct ConstTriangularViewReturnType { typedef const TriangularView<const Derived, Mode> Type; };
@@ -481,7 +475,8 @@ template<typename Derived> class MatrixBase
EIGEN_MATRIX_FUNCTION_1(MatrixComplexPowerReturnValue, pow, power to \c p, const std::complex<RealScalar>& p)
protected:
EIGEN_DEVICE_FUNC MatrixBase() : Base() {}
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MatrixBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MatrixBase)
private:
EIGEN_DEVICE_FUNC explicit MatrixBase(int);

View File

@@ -45,8 +45,8 @@ template<typename ExpressionType> class NestByValue
EIGEN_DEVICE_FUNC explicit inline NestByValue(const ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); }
EIGEN_DEVICE_FUNC operator const ExpressionType&() const { return m_expression; }

View File

@@ -21,14 +21,14 @@ template< typename T,
bool is_integer = NumTraits<T>::IsInteger>
struct default_digits10_impl
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return std::numeric_limits<T>::digits10; }
};
template<typename T>
struct default_digits10_impl<T,false,false> // Floating point
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() {
using std::log10;
using std::ceil;
@@ -40,7 +40,7 @@ struct default_digits10_impl<T,false,false> // Floating point
template<typename T>
struct default_digits10_impl<T,false,true> // Integer
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return 0; }
};
@@ -52,14 +52,14 @@ template< typename T,
bool is_integer = NumTraits<T>::IsInteger>
struct default_digits_impl
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return std::numeric_limits<T>::digits; }
};
template<typename T>
struct default_digits_impl<T,false,false> // Floating point
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() {
using std::log;
using std::ceil;
@@ -71,12 +71,34 @@ struct default_digits_impl<T,false,false> // Floating point
template<typename T>
struct default_digits_impl<T,false,true> // Integer
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static int run() { return 0; }
};
} // end namespace internal
namespace numext {
/** \internal bit-wise cast without changing the underlying bit representation. */
// TODO: Replace by std::bit_cast (available in C++20)
template <typename Tgt, typename Src>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Tgt bit_cast(const Src& src) {
#if EIGEN_HAS_TYPE_TRAITS
// The behaviour of memcpy is not specified for non-trivially copyable types
EIGEN_STATIC_ASSERT(std::is_trivially_copyable<Src>::value, THIS_TYPE_IS_NOT_SUPPORTED);
EIGEN_STATIC_ASSERT(std::is_trivially_copyable<Tgt>::value && std::is_default_constructible<Tgt>::value,
THIS_TYPE_IS_NOT_SUPPORTED);
#endif
EIGEN_STATIC_ASSERT(sizeof(Src) == sizeof(Tgt), THIS_TYPE_IS_NOT_SUPPORTED);
Tgt tgt;
EIGEN_USING_STD(memcpy)
memcpy(&tgt, &src, sizeof(Tgt));
return tgt;
}
} // namespace numext
// clang-format off
/** \class NumTraits
* \ingroup Core_Module
*
@@ -88,36 +110,47 @@ struct default_digits_impl<T,false,true> // Integer
*
* The provided data consists of:
* \li A typedef \c Real, giving the "real part" type of \a T. If \a T is already real,
* then \c Real is just a typedef to \a T. If \a T is \c std::complex<U> then \c Real
* then \c Real is just a typedef to \a T. If \a T is `std::complex<U>` then \c Real
* is a typedef to \a U.
* \li A typedef \c NonInteger, giving the type that should be used for operations producing non-integral values,
* such as quotients, square roots, etc. If \a T is a floating-point type, then this typedef just gives
* \a T again. Note however that many Eigen functions such as internal::sqrt simply refuse to
* \a T again. Note however that many Eigen functions such as `internal::sqrt` simply refuse to
* take integers. Outside of a few cases, Eigen doesn't do automatic type promotion. Thus, this typedef is
* only intended as a helper for code that needs to explicitly promote types.
* \li A typedef \c Literal giving the type to use for numeric literals such as "2" or "0.5". For instance, for \c std::complex<U>, Literal is defined as \c U.
* \li A typedef \c Literal giving the type to use for numeric literals such as "2" or "0.5". For instance, for `std::complex<U>`,
* Literal is defined as \c U.
* Of course, this type must be fully compatible with \a T. In doubt, just use \a T here.
* \li A typedef \a Nested giving the type to use to nest a value inside of the expression tree. If you don't know what
* \li A typedef \c Nested giving the type to use to nest a value inside of the expression tree. If you don't know what
* this means, just use \a T here.
* \li An enum value \a IsComplex. It is equal to 1 if \a T is a \c std::complex
* \li An enum value \c IsComplex. It is equal to 1 if \a T is a \c std::complex
* type, and to 0 otherwise.
* \li An enum value \a IsInteger. It is equal to \c 1 if \a T is an integer type such as \c int,
* \li An enum value \c IsInteger. It is equal to \c 1 if \a T is an integer type such as \c int,
* and to \c 0 otherwise.
* \li Enum values ReadCost, AddCost and MulCost representing a rough estimate of the number of CPU cycles needed
* \li Enum values \c ReadCost, \c AddCost and \c MulCost representing a rough estimate of the number of CPU cycles needed
* to by move / add / mul instructions respectively, assuming the data is already stored in CPU registers.
* Stay vague here. No need to do architecture-specific stuff. If you don't know what this means, just use \c Eigen::HugeCost.
* \li An enum value \a IsSigned. It is equal to \c 1 if \a T is a signed type and to 0 if \a T is unsigned.
* \li An enum value \a RequireInitialization. It is equal to \c 1 if the constructor of the numeric type \a T must
* \li An enum value \c IsSigned. It is equal to \c 1 if \a T is a signed type and to 0 if \a T is unsigned.
* \li An enum value \c RequireInitialization. It is equal to \c 1 if the constructor of the numeric type \a T must
* be called, and to 0 if it is safe not to call it. Default is 0 if \a T is an arithmetic type, and 1 otherwise.
* \li An epsilon() function which, unlike <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/epsilon">std::numeric_limits::epsilon()</a>,
* it returns a \a Real instead of a \a T.
* \li A dummy_precision() function returning a weak epsilon value. It is mainly used as a default
* \li An `epsilon()` function which, unlike <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/epsilon">`std::numeric_limits::epsilon()`</a>,
* it returns a \c Real instead of a \a T.
* \li A `dummy_precision()` function returning a weak epsilon value. It is mainly used as a default
* value by the fuzzy comparison operators.
* \li highest() and lowest() functions returning the highest and lowest possible values respectively.
* \li digits10() function returning the number of decimal digits that can be represented without change. This is
* \li `highest()` and `lowest()` functions returning the highest and lowest possible values respectively.
* \li `digits()` function returning the number of radix digits (non-sign digits for integers, mantissa for floating-point). This is
* the analogue of <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/digits">std::numeric_limits<T>::digits</a>
* which is used as the default implementation if specialized.
* \li `digits10()` function returning the number of decimal digits that can be represented without change. This is
* the analogue of <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/digits10">std::numeric_limits<T>::digits10</a>
* which is used as the default implementation if specialized.
* \li `min_exponent()` and `max_exponent()` functions returning the highest and lowest possible values, respectively,
* such that the radix raised to the power exponent-1 is a normalized floating-point number. These are equivalent to
* <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/min_exponent">`std::numeric_limits<T>::min_exponent`</a>/
* <a href="http://en.cppreference.com/w/cpp/types/numeric_limits/max_exponent">`std::numeric_limits<T>::max_exponent`</a>.
* \li `infinity()` function returning a representation of positive infinity, if available.
* \li `quiet_NaN` function returning a non-signaling "not-a-number", if available.
*/
// clang-format on
template<typename T> struct GenericNumTraits
{
@@ -140,49 +173,60 @@ template<typename T> struct GenericNumTraits
typedef T Nested;
typedef T Literal;
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real epsilon()
{
return numext::numeric_limits<T>::epsilon();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits10()
{
return internal::default_digits10_impl<T>::run();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits()
{
return internal::default_digits_impl<T>::run();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int min_exponent()
{
return numext::numeric_limits<T>::min_exponent;
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int max_exponent()
{
return numext::numeric_limits<T>::max_exponent;
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real dummy_precision()
{
// make sure to override this for floating-point types
return Real(0);
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T highest() {
return (numext::numeric_limits<T>::max)();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T lowest() {
return IsInteger ? (numext::numeric_limits<T>::min)()
: static_cast<T>(-(numext::numeric_limits<T>::max)());
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T infinity() {
return numext::numeric_limits<T>::infinity();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline T quiet_NaN() {
return numext::numeric_limits<T>::quiet_NaN();
}
@@ -194,21 +238,35 @@ template<typename T> struct NumTraits : GenericNumTraits<T>
template<> struct NumTraits<float>
: GenericNumTraits<float>
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline float dummy_precision() { return 1e-5f; }
};
template<> struct NumTraits<double> : GenericNumTraits<double>
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline double dummy_precision() { return 1e-12; }
};
// GPU devices treat `long double` as `double`.
#ifndef EIGEN_GPU_COMPILE_PHASE
template<> struct NumTraits<long double>
: GenericNumTraits<long double>
{
static inline long double dummy_precision() { return 1e-15l; }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline long double dummy_precision() { return static_cast<long double>(1e-15l); }
#if defined(EIGEN_ARCH_PPC) && (__LDBL_MANT_DIG__ == 106)
// PowerPC double double causes issues with some values
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline long double epsilon()
{
// 2^(-(__LDBL_MANT_DIG__)+1)
return static_cast<long double>(2.4651903288156618919116517665087e-32l);
}
#endif
};
#endif
template<typename _Real> struct NumTraits<std::complex<_Real> >
: GenericNumTraits<std::complex<_Real> >
@@ -223,11 +281,11 @@ template<typename _Real> struct NumTraits<std::complex<_Real> >
MulCost = 4 * NumTraits<Real>::MulCost + 2 * NumTraits<Real>::AddCost
};
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real epsilon() { return NumTraits<Real>::epsilon(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline Real dummy_precision() { return NumTraits<Real>::dummy_precision(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline int digits10() { return NumTraits<Real>::digits10(); }
};
@@ -247,16 +305,17 @@ struct NumTraits<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
IsInteger = NumTraits<Scalar>::IsInteger,
IsSigned = NumTraits<Scalar>::IsSigned,
RequireInitialization = 1,
ReadCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::ReadCost,
AddCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::AddCost,
MulCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::MulCost
ReadCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits<Scalar>::ReadCost),
AddCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits<Scalar>::AddCost),
MulCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits<Scalar>::MulCost)
};
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline RealScalar epsilon() { return NumTraits<RealScalar>::epsilon(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
static inline RealScalar dummy_precision() { return NumTraits<RealScalar>::dummy_precision(); }
EIGEN_CONSTEXPR
static inline int digits10() { return NumTraits<Scalar>::digits10(); }
};
@@ -270,6 +329,7 @@ template<> struct NumTraits<std::string>
MulCost = HugeCost
};
EIGEN_CONSTEXPR
static inline int digits10() { return 0; }
private:
@@ -284,6 +344,8 @@ private:
// Empty specialization for void to allow template specialization based on NumTraits<T>::Real with T==void and SFINAE.
template<> struct NumTraits<void> {};
template<> struct NumTraits<bool> : GenericNumTraits<bool> {};
} // end namespace Eigen
#endif // EIGEN_NUMTRAITS_H

View File

@@ -54,12 +54,17 @@ struct packetwise_redux_traits
/* Value to be returned when size==0 , by default let's return 0 */
template<typename PacketType,typename Func>
EIGEN_DEVICE_FUNC
PacketType packetwise_redux_empty_value(const Func& ) { return pset1<PacketType>(0); }
PacketType packetwise_redux_empty_value(const Func& ) {
const typename unpacket_traits<PacketType>::type zero(0);
return pset1<PacketType>(zero);
}
/* For products the default is 1 */
template<typename PacketType,typename Scalar>
EIGEN_DEVICE_FUNC
PacketType packetwise_redux_empty_value(const scalar_product_op<Scalar,Scalar>& ) { return pset1<PacketType>(1); }
PacketType packetwise_redux_empty_value(const scalar_product_op<Scalar,Scalar>& ) {
return pset1<PacketType>(Scalar(1));
}
/* Perform the actual reduction */
template<typename Func, typename Evaluator,
@@ -145,7 +150,7 @@ struct evaluator<PartialReduxExpr<ArgType, MemberOp, Direction> >
enum {
CoeffReadCost = TraversalSize==Dynamic ? HugeCost
: TraversalSize==0 ? 1
: TraversalSize * evaluator<ArgType>::CoeffReadCost + int(CostOpType::value),
: int(TraversalSize) * int(evaluator<ArgType>::CoeffReadCost) + int(CostOpType::value),
_ArgFlags = evaluator<ArgType>::Flags,

View File

@@ -13,10 +13,10 @@
#if defined(EIGEN_INITIALIZE_MATRICES_BY_ZERO)
# define EIGEN_INITIALIZE_COEFFS
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(int i=0;i<base().size();++i) coeffRef(i)=Scalar(0);
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(Index i=0;i<base().size();++i) coeffRef(i)=Scalar(0);
#elif defined(EIGEN_INITIALIZE_MATRICES_BY_NAN)
# define EIGEN_INITIALIZE_COEFFS
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(int i=0;i<base().size();++i) coeffRef(i)=std::numeric_limits<Scalar>::quiet_NaN();
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(Index i=0;i<base().size();++i) coeffRef(i)=std::numeric_limits<Scalar>::quiet_NaN();
#else
# undef EIGEN_INITIALIZE_COEFFS
# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
@@ -118,16 +118,8 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
using Base::IsVectorAtCompileTime;
using Base::Flags;
template<typename PlainObjectType, int MapOptions, typename StrideType> friend class Eigen::Map;
friend class Eigen::Map<Derived, Unaligned>;
typedef Eigen::Map<Derived, Unaligned> MapType;
friend class Eigen::Map<const Derived, Unaligned>;
typedef const Eigen::Map<const Derived, Unaligned> ConstMapType;
#if EIGEN_MAX_ALIGN_BYTES>0
// for EIGEN_MAX_ALIGN_BYTES==0, AlignedMax==Unaligned, and many compilers generate warnings for friend-ing a class twice.
friend class Eigen::Map<Derived, AlignedMax>;
friend class Eigen::Map<const Derived, AlignedMax>;
#endif
typedef Eigen::Map<Derived, AlignedMax> AlignedMapType;
typedef const Eigen::Map<const Derived, AlignedMax> ConstAlignedMapType;
template<typename StrideType> struct StridedMapType { typedef Eigen::Map<Derived, Unaligned, StrideType> type; };
@@ -147,10 +139,10 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
EIGEN_DEVICE_FUNC
const Base& base() const { return *static_cast<const Base*>(this); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index rows() const { return m_storage.rows(); }
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index cols() const { return m_storage.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_storage.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_storage.cols(); }
/** This is an overloaded version of DenseCoeffsBase<Derived,ReadOnlyAccessors>::coeff(Index,Index) const
* provided to by-pass the creation of an evaluator of the expression, thus saving compilation efforts.
@@ -508,8 +500,8 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
EIGEN_DEVICE_FUNC
PlainObjectBase& operator=(PlainObjectBase&& other) EIGEN_NOEXCEPT
{
using std::swap;
swap(m_storage, other.m_storage);
_check_template_params();
m_storage = std::move(other.m_storage);
return *this;
}
#endif
@@ -530,11 +522,11 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
/** \brief Construct a row of column vector with fixed size from an arbitrary number of coefficients. \cpp11
*
* \only_for_vectors
*
*
* This constructor is for 1D array or vectors with more than 4 coefficients.
* There exists C++98 analogue constructors for fixed-size array/vector having 1, 2, 3, or 4 coefficients.
*
* \warning To construct a column (resp. row) vector of fixed length, the number of values passed to this
*
* \warning To construct a column (resp. row) vector of fixed length, the number of values passed to this
* constructor must match the the fixed number of rows (resp. columns) of \c *this.
*/
template <typename... ArgTypes>
@@ -548,7 +540,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
m_storage.data()[1] = a1;
m_storage.data()[2] = a2;
m_storage.data()[3] = a3;
int i = 4;
Index i = 4;
auto x = {(m_storage.data()[i++] = args, 0)...};
static_cast<void>(x);
}
@@ -576,7 +568,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
eigen_assert(list.size() == static_cast<size_t>(RowsAtCompileTime) || RowsAtCompileTime == Dynamic);
eigen_assert(list_size == static_cast<size_t>(ColsAtCompileTime) || ColsAtCompileTime == Dynamic);
resize(list.size(), list_size);
Index row_index = 0;
for (const std::initializer_list<Scalar>& row : list) {
eigen_assert(list_size == row.size());
@@ -717,18 +709,26 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
using Base::setConstant;
EIGEN_DEVICE_FUNC Derived& setConstant(Index size, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, Index cols, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(NoChange_t, Index cols, const Scalar& val);
EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, NoChange_t, const Scalar& val);
using Base::setZero;
EIGEN_DEVICE_FUNC Derived& setZero(Index size);
EIGEN_DEVICE_FUNC Derived& setZero(Index rows, Index cols);
EIGEN_DEVICE_FUNC Derived& setZero(NoChange_t, Index cols);
EIGEN_DEVICE_FUNC Derived& setZero(Index rows, NoChange_t);
using Base::setOnes;
EIGEN_DEVICE_FUNC Derived& setOnes(Index size);
EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, Index cols);
EIGEN_DEVICE_FUNC Derived& setOnes(NoChange_t, Index cols);
EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, NoChange_t);
using Base::setRandom;
Derived& setRandom(Index size);
Derived& setRandom(Index rows, Index cols);
Derived& setRandom(NoChange_t, Index cols);
Derived& setRandom(Index rows, NoChange_t);
#ifdef EIGEN_PLAINOBJECTBASE_PLUGIN
#include EIGEN_PLAINOBJECTBASE_PLUGIN
@@ -967,8 +967,8 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE void _check_template_params()
{
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, (Options&RowMajor)==RowMajor)
&& EIGEN_IMPLIES(MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1, (Options&RowMajor)==0)
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, (int(Options)&RowMajor)==RowMajor)
&& EIGEN_IMPLIES(MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1, (int(Options)&RowMajor)==0)
&& ((RowsAtCompileTime == Dynamic) || (RowsAtCompileTime >= 0))
&& ((ColsAtCompileTime == Dynamic) || (ColsAtCompileTime >= 0))
&& ((MaxRowsAtCompileTime == Dynamic) || (MaxRowsAtCompileTime >= 0))
@@ -980,6 +980,17 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
}
enum { IsPlainObjectBase = 1 };
#endif
public:
// These apparently need to be down here for nvcc+icc to prevent duplicate
// Map symbol.
template<typename PlainObjectType, int MapOptions, typename StrideType> friend class Eigen::Map;
friend class Eigen::Map<Derived, Unaligned>;
friend class Eigen::Map<const Derived, Unaligned>;
#if EIGEN_MAX_ALIGN_BYTES>0
// for EIGEN_MAX_ALIGN_BYTES==0, AlignedMax==Unaligned, and many compilers generate warnings for friend-ing a class twice.
friend class Eigen::Map<Derived, AlignedMax>;
friend class Eigen::Map<const Derived, AlignedMax>;
#endif
};
@@ -1008,7 +1019,7 @@ struct conservative_resize_like_impl
else
{
// The storage order does not allow us to use reallocation.
typename Derived::PlainObject tmp(rows,cols);
Derived tmp(rows,cols);
const Index common_rows = numext::mini(rows, _this.rows());
const Index common_cols = numext::mini(cols, _this.cols());
tmp.block(0,0,common_rows,common_cols) = _this.block(0,0,common_rows,common_cols);
@@ -1043,7 +1054,7 @@ struct conservative_resize_like_impl
else
{
// The storage order does not allow us to use reallocation.
typename Derived::PlainObject tmp(other);
Derived tmp(other);
const Index common_rows = numext::mini(tmp.rows(), _this.rows());
const Index common_cols = numext::mini(tmp.cols(), _this.cols());
tmp.block(0,0,common_rows,common_cols) = _this.block(0,0,common_rows,common_cols);

View File

@@ -23,25 +23,25 @@ struct traits<Product<Lhs, Rhs, Option> >
typedef typename remove_all<Rhs>::type RhsCleaned;
typedef traits<LhsCleaned> LhsTraits;
typedef traits<RhsCleaned> RhsTraits;
typedef MatrixXpr XprKind;
typedef typename ScalarBinaryOpTraits<typename traits<LhsCleaned>::Scalar, typename traits<RhsCleaned>::Scalar>::ReturnType Scalar;
typedef typename product_promote_storage_type<typename LhsTraits::StorageKind,
typename RhsTraits::StorageKind,
internal::product_type<Lhs,Rhs>::ret>::ret StorageKind;
typedef typename promote_index_type<typename LhsTraits::StorageIndex,
typename RhsTraits::StorageIndex>::type StorageIndex;
enum {
RowsAtCompileTime = LhsTraits::RowsAtCompileTime,
ColsAtCompileTime = RhsTraits::ColsAtCompileTime,
MaxRowsAtCompileTime = LhsTraits::MaxRowsAtCompileTime,
MaxColsAtCompileTime = RhsTraits::MaxColsAtCompileTime,
// FIXME: only needed by GeneralMatrixMatrixTriangular
InnerSize = EIGEN_SIZE_MIN_PREFER_FIXED(LhsTraits::ColsAtCompileTime, RhsTraits::RowsAtCompileTime),
// The storage order is somewhat arbitrary here. The correct one will be determined through the evaluator.
Flags = (MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1) ? RowMajorBit
: (MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1) ? 0
@@ -74,10 +74,10 @@ class Product : public ProductImpl<_Lhs,_Rhs,Option,
internal::product_type<_Lhs,_Rhs>::ret>::ret>
{
public:
typedef _Lhs Lhs;
typedef _Rhs Rhs;
typedef typename ProductImpl<
Lhs, Rhs, Option,
typename internal::product_promote_storage_type<typename internal::traits<Lhs>::StorageKind,
@@ -98,10 +98,10 @@ class Product : public ProductImpl<_Lhs,_Rhs,Option,
&& "if you wanted a coeff-wise or a dot product use the respective explicit functions");
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const { return m_lhs.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_lhs.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const LhsNestedCleaned& lhs() const { return m_lhs; }
@@ -115,7 +115,7 @@ class Product : public ProductImpl<_Lhs,_Rhs,Option,
};
namespace internal {
template<typename Lhs, typename Rhs, int Option, int ProductTag = internal::product_type<Lhs,Rhs>::ret>
class dense_product_base
: public internal::dense_xpr_base<Product<Lhs,Rhs,Option> >::type
@@ -131,7 +131,7 @@ class dense_product_base<Lhs, Rhs, Option, InnerProduct>
public:
using Base::derived;
typedef typename Base::Scalar Scalar;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE operator const Scalar() const
{
return internal::evaluator<ProductXpr>(derived()).coeff(0,0);
@@ -153,25 +153,25 @@ class ProductImpl<Lhs,Rhs,Option,Dense>
: public internal::dense_product_base<Lhs,Rhs,Option>
{
typedef Product<Lhs, Rhs, Option> Derived;
public:
typedef typename internal::dense_product_base<Lhs, Rhs, Option> Base;
EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
protected:
enum {
IsOneByOne = (RowsAtCompileTime == 1 || RowsAtCompileTime == Dynamic) &&
IsOneByOne = (RowsAtCompileTime == 1 || RowsAtCompileTime == Dynamic) &&
(ColsAtCompileTime == 1 || ColsAtCompileTime == Dynamic),
EnableCoeff = IsOneByOne || Option==LazyProduct
};
public:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar coeff(Index row, Index col) const
{
EIGEN_STATIC_ASSERT(EnableCoeff, THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS);
eigen_assert( (Option==LazyProduct) || (this->rows() == 1 && this->cols() == 1) );
return internal::evaluator<Derived>(derived()).coeff(row,col);
}
@@ -179,11 +179,11 @@ class ProductImpl<Lhs,Rhs,Option,Dense>
{
EIGEN_STATIC_ASSERT(EnableCoeff, THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS);
eigen_assert( (Option==LazyProduct) || (this->rows() == 1 && this->cols() == 1) );
return internal::evaluator<Derived>(derived()).coeff(i);
}
};
} // end namespace Eigen

View File

@@ -14,7 +14,7 @@
#define EIGEN_PRODUCTEVALUATORS_H
namespace Eigen {
namespace internal {
/** \internal
@@ -22,19 +22,19 @@ namespace internal {
* Since products require special treatments to handle all possible cases,
* we simply defer the evaluation logic to a product_evaluator class
* which offers more partial specialization possibilities.
*
*
* \sa class product_evaluator
*/
template<typename Lhs, typename Rhs, int Options>
struct evaluator<Product<Lhs, Rhs, Options> >
struct evaluator<Product<Lhs, Rhs, Options> >
: public product_evaluator<Product<Lhs, Rhs, Options> >
{
typedef Product<Lhs, Rhs, Options> XprType;
typedef product_evaluator<XprType> Base;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& xpr) : Base(xpr) {}
};
// Catch "scalar * ( A * B )" and transform it to "(A*scalar) * B"
// TODO we should apply that rule only if that's really helpful
template<typename Lhs, typename Rhs, typename Scalar1, typename Scalar2, typename Plain1>
@@ -62,12 +62,12 @@ struct evaluator<CwiseBinaryOp<internal::scalar_product_op<Scalar1,Scalar2>,
template<typename Lhs, typename Rhs, int DiagIndex>
struct evaluator<Diagonal<const Product<Lhs, Rhs, DefaultProduct>, DiagIndex> >
struct evaluator<Diagonal<const Product<Lhs, Rhs, DefaultProduct>, DiagIndex> >
: public evaluator<Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex> >
{
typedef Diagonal<const Product<Lhs, Rhs, DefaultProduct>, DiagIndex> XprType;
typedef evaluator<Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex> > Base;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& xpr)
: Base(Diagonal<const Product<Lhs, Rhs, LazyProduct>, DiagIndex>(
Product<Lhs, Rhs, LazyProduct>(xpr.nestedExpression().lhs(), xpr.nestedExpression().rhs()),
@@ -108,23 +108,23 @@ struct product_evaluator<Product<Lhs, Rhs, Options>, ProductTag, LhsShape, RhsSh
: m_result(xpr.rows(), xpr.cols())
{
::new (static_cast<Base*>(this)) Base(m_result);
// FIXME shall we handle nested_eval here?,
// if so, then we must take care at removing the call to nested_eval in the specializations (e.g., in permutation_matrix_product, transposition_matrix_product, etc.)
// typedef typename internal::nested_eval<Lhs,Rhs::ColsAtCompileTime>::type LhsNested;
// typedef typename internal::nested_eval<Rhs,Lhs::RowsAtCompileTime>::type RhsNested;
// typedef typename internal::remove_all<LhsNested>::type LhsNestedCleaned;
// typedef typename internal::remove_all<RhsNested>::type RhsNestedCleaned;
//
//
// const LhsNested lhs(xpr.lhs());
// const RhsNested rhs(xpr.rhs());
//
//
// generic_product_impl<LhsNestedCleaned, RhsNestedCleaned>::evalTo(m_result, lhs, rhs);
generic_product_impl<Lhs, Rhs, LhsShape, RhsShape, ProductTag>::evalTo(m_result, xpr.lhs(), xpr.rhs());
}
protected:
protected:
PlainObject m_result;
};
@@ -250,13 +250,13 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,InnerProduct>
{
dst.coeffRef(0,0) = (lhs.transpose().cwiseProduct(rhs)).sum();
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
dst.coeffRef(0,0) += (lhs.transpose().cwiseProduct(rhs)).sum();
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ dst.coeffRef(0,0) -= (lhs.transpose().cwiseProduct(rhs)).sum(); }
@@ -298,7 +298,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,OuterProduct>
{
template<typename T> struct is_row_major : internal::conditional<(int(T::Flags)&RowMajorBit), internal::true_type, internal::false_type>::type {};
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
// TODO it would be nice to be able to exploit our *_assign_op functors for that purpose
struct set { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() = src; } };
struct add { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() += src; } };
@@ -310,31 +310,31 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,OuterProduct>
dst.const_cast_derived() += m_scale * src;
}
};
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, set(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, add(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, sub(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
internal::outer_product_selector_run(dst, lhs, rhs, adds(alpha), is_row_major<Dst>());
}
};
@@ -343,7 +343,7 @@ template<typename Lhs, typename Rhs, typename Derived>
struct generic_product_impl_base
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ dst.setZero(); scaleAndAddTo(dst, lhs, rhs, Scalar(1)); }
@@ -355,7 +355,7 @@ struct generic_product_impl_base
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ scaleAndAddTo(dst, lhs, rhs, Scalar(-1)); }
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{ Derived::scaleAndAddTo(dst,lhs,rhs,alpha); }
@@ -375,6 +375,11 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemvProduct>
template<typename Dest>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
// Fallback to inner product if both the lhs and rhs is a runtime vector.
if (lhs.rows() == 1 && rhs.cols() == 1) {
dst.coeffRef(0,0) += alpha * lhs.row(0).conjugate().dot(rhs.col(0));
return;
}
LhsNested actual_lhs(lhs);
RhsNested actual_rhs(rhs);
internal::gemv_dense_selector<Side,
@@ -385,10 +390,10 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemvProduct>
};
template<typename Lhs, typename Rhs>
struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
@@ -403,7 +408,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
// dst.noalias() += lhs.lazyProduct(rhs);
call_assignment_no_alias(dst, lhs.lazyProduct(rhs), internal::add_assign_op<typename Dst::Scalar,Scalar>());
}
template<typename Dst>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
@@ -436,8 +441,8 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
};
// FIXME: in c++11 this should be auto, and extractScalarFactor should also return auto
// this is important for real*complex_mat
Scalar actualAlpha = blas_traits<Lhs>::extractScalarFactor(lhs)
* blas_traits<Rhs>::extractScalarFactor(rhs);
Scalar actualAlpha = combine_scalar_factors<Scalar>(lhs, rhs);
eval_dynamic_impl(dst,
blas_traits<Lhs>::extract(lhs).template conjugateIf<ConjLhs>(),
blas_traits<Rhs>::extract(rhs).template conjugateIf<ConjRhs>(),
@@ -520,7 +525,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
typedef typename internal::nested_eval<Lhs,Rhs::ColsAtCompileTime>::type LhsNested;
typedef typename internal::nested_eval<Rhs,Lhs::RowsAtCompileTime>::type RhsNested;
typedef typename internal::remove_all<LhsNested>::type LhsNestedCleaned;
typedef typename internal::remove_all<RhsNested>::type RhsNestedCleaned;
@@ -539,19 +544,19 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
typedef typename find_best_packet<Scalar,ColsAtCompileTime>::type RhsVecPacketType;
enum {
LhsCoeffReadCost = LhsEtorType::CoeffReadCost,
RhsCoeffReadCost = RhsEtorType::CoeffReadCost,
CoeffReadCost = InnerSize==0 ? NumTraits<Scalar>::ReadCost
: InnerSize == Dynamic ? HugeCost
: InnerSize * (NumTraits<Scalar>::MulCost + LhsCoeffReadCost + RhsCoeffReadCost)
: InnerSize * (NumTraits<Scalar>::MulCost + int(LhsCoeffReadCost) + int(RhsCoeffReadCost))
+ (InnerSize - 1) * NumTraits<Scalar>::AddCost,
Unroll = CoeffReadCost <= EIGEN_UNROLLING_LIMIT,
LhsFlags = LhsEtorType::Flags,
RhsFlags = RhsEtorType::Flags,
LhsRowMajor = LhsFlags & RowMajorBit,
RhsRowMajor = RhsFlags & RowMajorBit,
@@ -561,7 +566,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
// Here, we don't care about alignment larger than the usable packet size.
LhsAlignment = EIGEN_PLAIN_ENUM_MIN(LhsEtorType::Alignment,LhsVecPacketSize*int(sizeof(typename LhsNestedCleaned::Scalar))),
RhsAlignment = EIGEN_PLAIN_ENUM_MIN(RhsEtorType::Alignment,RhsVecPacketSize*int(sizeof(typename RhsNestedCleaned::Scalar))),
SameType = is_same<typename LhsNestedCleaned::Scalar,typename RhsNestedCleaned::Scalar>::value,
CanVectorizeRhs = bool(RhsRowMajor) && (RhsFlags & PacketAccessBit) && (ColsAtCompileTime!=1),
@@ -571,12 +576,12 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
: (MaxColsAtCompileTime==1&&MaxRowsAtCompileTime!=1) ? 0
: (bool(RhsRowMajor) && !CanVectorizeLhs),
Flags = ((unsigned int)(LhsFlags | RhsFlags) & HereditaryBits & ~RowMajorBit)
Flags = ((int(LhsFlags) | int(RhsFlags)) & HereditaryBits & ~RowMajorBit)
| (EvalToRowMajor ? RowMajorBit : 0)
// TODO enable vectorization for mixed types
| (SameType && (CanVectorizeLhs || CanVectorizeRhs) ? PacketAccessBit : 0)
| (XprType::IsVectorAtCompileTime ? LinearAccessBit : 0),
LhsOuterStrideBytes = int(LhsNestedCleaned::OuterStrideAtCompileTime) * int(sizeof(typename LhsNestedCleaned::Scalar)),
RhsOuterStrideBytes = int(RhsNestedCleaned::OuterStrideAtCompileTime) * int(sizeof(typename RhsNestedCleaned::Scalar)),
@@ -592,10 +597,10 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
CanVectorizeInner = SameType
&& LhsRowMajor
&& (!RhsRowMajor)
&& (LhsFlags & RhsFlags & ActualPacketAccessBit)
&& (InnerSize % packet_traits<Scalar>::size == 0)
&& (int(LhsFlags) & int(RhsFlags) & ActualPacketAccessBit)
&& (int(InnerSize) % packet_traits<Scalar>::size == 0)
};
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index row, Index col) const
{
return (m_lhs.row(row).transpose().cwiseProduct( m_rhs.col(col) )).sum();
@@ -637,7 +642,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
protected:
typename internal::add_const_on_value_type<LhsNested>::type m_lhs;
typename internal::add_const_on_value_type<RhsNested>::type m_rhs;
LhsEtorType m_lhsImpl;
RhsEtorType m_rhsImpl;
@@ -668,7 +673,7 @@ struct product_evaluator<Product<Lhs, Rhs, DefaultProduct>, LazyCoeffBasedProduc
template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
{
etor_product_packet_impl<RowMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, innerDim, res);
res = pmadd(pset1<Packet>(lhs.coeff(row, Index(UnrollingIndex-1))), rhs.template packet<LoadMode,Packet>(Index(UnrollingIndex-1), col), res);
@@ -678,7 +683,7 @@ struct etor_product_packet_impl<RowMajor, UnrollingIndex, Lhs, Rhs, Packet, Load
template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
{
etor_product_packet_impl<ColMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, innerDim, res);
res = pmadd(lhs.template packet<LoadMode,Packet>(row, Index(UnrollingIndex-1)), pset1<Packet>(rhs.coeff(Index(UnrollingIndex-1), col)), res);
@@ -688,7 +693,7 @@ struct etor_product_packet_impl<ColMajor, UnrollingIndex, Lhs, Rhs, Packet, Load
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, 1, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
{
res = pmul(pset1<Packet>(lhs.coeff(row, Index(0))),rhs.template packet<LoadMode,Packet>(Index(0), col));
}
@@ -697,7 +702,7 @@ struct etor_product_packet_impl<RowMajor, 1, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, 1, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
{
res = pmul(lhs.template packet<LoadMode,Packet>(row, Index(0)), pset1<Packet>(rhs.coeff(Index(0), col)));
}
@@ -706,7 +711,7 @@ struct etor_product_packet_impl<ColMajor, 1, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, 0, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
}
@@ -715,7 +720,7 @@ struct etor_product_packet_impl<RowMajor, 0, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, 0, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& /*lhs*/, const Rhs& /*rhs*/, Index /*innerDim*/, Packet &res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
}
@@ -724,7 +729,7 @@ struct etor_product_packet_impl<ColMajor, 0, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<RowMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
for(Index i = 0; i < innerDim; ++i)
@@ -735,7 +740,7 @@ struct etor_product_packet_impl<RowMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
struct etor_product_packet_impl<ColMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
{
static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
{
res = pset1<Packet>(typename unpacket_traits<Packet>::type(0));
for(Index i = 0; i < innerDim; ++i)
@@ -757,7 +762,7 @@ struct generic_product_impl<Lhs,Rhs,TriangularShape,DenseShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,TriangularShape,DenseShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -771,7 +776,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,TriangularShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,DenseShape,TriangularShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -792,7 +797,7 @@ struct generic_product_impl<Lhs,Rhs,SelfAdjointShape,DenseShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,SelfAdjointShape,DenseShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static EIGEN_DEVICE_FUNC
void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
@@ -806,7 +811,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag>
: generic_product_impl_base<Lhs,Rhs,generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag> >
{
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
@@ -818,7 +823,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,SelfAdjointShape,ProductTag>
/***************************************************************************
* Diagonal products
***************************************************************************/
template<typename MatrixType, typename DiagonalType, typename Derived, int ProductOrder>
struct diagonal_product_evaluator_base
: evaluator_base<Derived>
@@ -826,11 +831,11 @@ struct diagonal_product_evaluator_base
typedef typename ScalarBinaryOpTraits<typename MatrixType::Scalar, typename DiagonalType::Scalar>::ReturnType Scalar;
public:
enum {
CoeffReadCost = NumTraits<Scalar>::MulCost + evaluator<MatrixType>::CoeffReadCost + evaluator<DiagonalType>::CoeffReadCost,
CoeffReadCost = int(NumTraits<Scalar>::MulCost) + int(evaluator<MatrixType>::CoeffReadCost) + int(evaluator<DiagonalType>::CoeffReadCost),
MatrixFlags = evaluator<MatrixType>::Flags,
DiagFlags = evaluator<DiagonalType>::Flags,
_StorageOrder = (Derived::MaxRowsAtCompileTime==1 && Derived::MaxColsAtCompileTime!=1) ? RowMajor
: (Derived::MaxColsAtCompileTime==1 && Derived::MaxRowsAtCompileTime!=1) ? ColMajor
: MatrixFlags & RowMajorBit ? RowMajor : ColMajor,
@@ -853,14 +858,14 @@ public:
|| (DiagonalType::SizeAtCompileTime==Dynamic && MatrixType::RowsAtCompileTime==1 && ProductOrder==OnTheLeft)
|| (DiagonalType::SizeAtCompileTime==Dynamic && MatrixType::ColsAtCompileTime==1 && ProductOrder==OnTheRight)
};
diagonal_product_evaluator_base(const MatrixType &mat, const DiagonalType &diag)
EIGEN_DEVICE_FUNC diagonal_product_evaluator_base(const MatrixType &mat, const DiagonalType &diag)
: m_diagImpl(diag), m_matImpl(mat)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(NumTraits<Scalar>::MulCost);
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index idx) const
{
if(AsScalarProduct)
@@ -868,7 +873,7 @@ public:
else
return m_diagImpl.coeff(idx) * m_matImpl.coeff(idx);
}
protected:
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet_impl(Index row, Index col, Index id, internal::true_type) const
@@ -876,7 +881,7 @@ protected:
return internal::pmul(m_matImpl.template packet<LoadMode,PacketType>(row, col),
internal::pset1<PacketType>(m_diagImpl.coeff(id)));
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet_impl(Index row, Index col, Index id, internal::false_type) const
{
@@ -887,7 +892,7 @@ protected:
return internal::pmul(m_matImpl.template packet<LoadMode,PacketType>(row, col),
m_diagImpl.template packet<DiagonalPacketLoadMode,PacketType>(id));
}
evaluator<DiagonalType> m_diagImpl;
evaluator<MatrixType> m_matImpl;
};
@@ -902,24 +907,24 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DiagonalSha
using Base::m_matImpl;
using Base::coeff;
typedef typename Base::Scalar Scalar;
typedef Product<Lhs, Rhs, ProductKind> XprType;
typedef typename XprType::PlainObject PlainObject;
typedef typename Lhs::DiagonalVectorType DiagonalType;
enum { StorageOrder = Base::_StorageOrder };
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
: Base(xpr.rhs(), xpr.lhs().diagonal())
{
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index row, Index col) const
{
return m_diagImpl.coeff(row) * m_matImpl.coeff(row, col);
}
#ifndef EIGEN_GPUCC
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const
@@ -929,7 +934,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DiagonalSha
return this->template packet_impl<LoadMode,PacketType>(row,col, row,
typename internal::conditional<int(StorageOrder)==RowMajor, internal::true_type, internal::false_type>::type());
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index idx) const
{
@@ -948,22 +953,22 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DenseShape,
using Base::m_matImpl;
using Base::coeff;
typedef typename Base::Scalar Scalar;
typedef Product<Lhs, Rhs, ProductKind> XprType;
typedef typename XprType::PlainObject PlainObject;
enum { StorageOrder = Base::_StorageOrder };
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
: Base(xpr.lhs(), xpr.rhs().diagonal())
{
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index row, Index col) const
{
return m_matImpl.coeff(row, col) * m_diagImpl.coeff(col);
}
#ifndef EIGEN_GPUCC
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const
@@ -971,7 +976,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DenseShape,
return this->template packet_impl<LoadMode,PacketType>(row,col, col,
typename internal::conditional<int(StorageOrder)==ColMajor, internal::true_type, internal::false_type>::type());
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index idx) const
{
@@ -999,7 +1004,7 @@ struct permutation_matrix_product<ExpressionType, Side, Transposed, DenseShape>
typedef typename remove_all<MatrixType>::type MatrixTypeCleaned;
template<typename Dest, typename PermutationType>
static inline void run(Dest& dst, const PermutationType& perm, const ExpressionType& xpr)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Dest& dst, const PermutationType& perm, const ExpressionType& xpr)
{
MatrixType mat(xpr);
const Index n = Side==OnTheLeft ? mat.rows() : mat.cols();
@@ -1053,7 +1058,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, PermutationShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
permutation_matrix_product<Rhs, OnTheLeft, false, MatrixShape>::run(dst, lhs, rhs);
}
@@ -1063,7 +1068,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, MatrixShape, PermutationShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
permutation_matrix_product<Lhs, OnTheRight, false, MatrixShape>::run(dst, rhs, lhs);
}
@@ -1073,7 +1078,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Inverse<Lhs>, Rhs, PermutationShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Inverse<Lhs>& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Inverse<Lhs>& lhs, const Rhs& rhs)
{
permutation_matrix_product<Rhs, OnTheLeft, true, MatrixShape>::run(dst, lhs.nestedExpression(), rhs);
}
@@ -1083,7 +1088,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Inverse<Rhs>, MatrixShape, PermutationShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Inverse<Rhs>& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Inverse<Rhs>& rhs)
{
permutation_matrix_product<Lhs, OnTheRight, true, MatrixShape>::run(dst, rhs.nestedExpression(), lhs);
}
@@ -1105,9 +1110,9 @@ struct transposition_matrix_product
{
typedef typename nested_eval<ExpressionType, 1>::type MatrixType;
typedef typename remove_all<MatrixType>::type MatrixTypeCleaned;
template<typename Dest, typename TranspositionType>
static inline void run(Dest& dst, const TranspositionType& tr, const ExpressionType& xpr)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Dest& dst, const TranspositionType& tr, const ExpressionType& xpr)
{
MatrixType mat(xpr);
typedef typename TranspositionType::StorageIndex StorageIndex;
@@ -1130,7 +1135,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, TranspositionsShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
transposition_matrix_product<Rhs, OnTheLeft, false, MatrixShape>::run(dst, lhs, rhs);
}
@@ -1140,7 +1145,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Rhs, MatrixShape, TranspositionsShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Rhs& rhs)
{
transposition_matrix_product<Lhs, OnTheRight, false, MatrixShape>::run(dst, rhs, lhs);
}
@@ -1151,7 +1156,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Transpose<Lhs>, Rhs, TranspositionsShape, MatrixShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Transpose<Lhs>& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Transpose<Lhs>& lhs, const Rhs& rhs)
{
transposition_matrix_product<Rhs, OnTheLeft, true, MatrixShape>::run(dst, lhs.nestedExpression(), rhs);
}
@@ -1161,7 +1166,7 @@ template<typename Lhs, typename Rhs, int ProductTag, typename MatrixShape>
struct generic_product_impl<Lhs, Transpose<Rhs>, MatrixShape, TranspositionsShape, ProductTag>
{
template<typename Dest>
static void evalTo(Dest& dst, const Lhs& lhs, const Transpose<Rhs>& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dest& dst, const Lhs& lhs, const Transpose<Rhs>& rhs)
{
transposition_matrix_product<Lhs, OnTheRight, true, MatrixShape>::run(dst, rhs.nestedExpression(), lhs);
}

View File

@@ -177,6 +177,42 @@ PlainObjectBase<Derived>::setRandom(Index rows, Index cols)
return setRandom();
}
/** Resizes to the given size, changing only the number of columns, and sets all
* coefficients in this expression to random values. For the parameter of type
* NoChange_t, just pass the special value \c NoChange.
*
* Numbers are uniformly spread through their whole definition range for integer types,
* and in the [-1:1] range for floating point scalar types.
*
* \not_reentrant
*
* \sa DenseBase::setRandom(), setRandom(Index), setRandom(Index, NoChange_t), class CwiseNullaryOp, DenseBase::Random()
*/
template<typename Derived>
EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setRandom(NoChange_t, Index cols)
{
return setRandom(rows(), cols);
}
/** Resizes to the given size, changing only the number of rows, and sets all
* coefficients in this expression to random values. For the parameter of type
* NoChange_t, just pass the special value \c NoChange.
*
* Numbers are uniformly spread through their whole definition range for integer types,
* and in the [-1:1] range for floating point scalar types.
*
* \not_reentrant
*
* \sa DenseBase::setRandom(), setRandom(Index), setRandom(NoChange_t, Index), class CwiseNullaryOp, DenseBase::Random()
*/
template<typename Derived>
EIGEN_STRONG_INLINE Derived&
PlainObjectBase<Derived>::setRandom(Index rows, NoChange_t)
{
return setRandom(rows, cols());
}
} // end namespace Eigen
#endif // EIGEN_RANDOM_H

View File

@@ -58,7 +58,7 @@ public:
public:
enum {
Cost = Evaluator::SizeAtCompileTime == Dynamic ? HugeCost
: Evaluator::SizeAtCompileTime * Evaluator::CoeffReadCost + (Evaluator::SizeAtCompileTime-1) * functor_traits<Func>::Cost,
: int(Evaluator::SizeAtCompileTime) * int(Evaluator::CoeffReadCost) + (Evaluator::SizeAtCompileTime-1) * functor_traits<Func>::Cost,
UnrollingLimit = EIGEN_UNROLLING_LIMIT * (int(Traversal) == int(DefaultTraversal) ? 1 : int(PacketSize))
};
@@ -331,7 +331,7 @@ struct redux_impl<Func, Evaluator, LinearVectorizedTraversal, CompleteUnrolling>
enum {
PacketSize = redux_traits<Func, Evaluator>::PacketSize,
Size = Evaluator::SizeAtCompileTime,
VectorizedSize = (Size / PacketSize) * PacketSize
VectorizedSize = (int(Size) / int(PacketSize)) * int(PacketSize)
};
template<typename XprType>
@@ -419,25 +419,33 @@ DenseBase<Derived>::redux(const Func& func) const
}
/** \returns the minimum of all coefficients of \c *this.
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is minimum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
* \warning the result is undefined if \c *this contains NaN.
*/
template<typename Derived>
template<int NaNPropagation>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff() const
{
return derived().redux(Eigen::internal::scalar_min_op<Scalar,Scalar>());
return derived().redux(Eigen::internal::scalar_min_op<Scalar,Scalar, NaNPropagation>());
}
/** \returns the maximum of all coefficients of \c *this.
/** \returns the maximum of all coefficients of \c *this.
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
* \warning the result is undefined if \c *this contains NaN.
*/
template<typename Derived>
template<int NaNPropagation>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff() const
{
return derived().redux(Eigen::internal::scalar_max_op<Scalar,Scalar>());
return derived().redux(Eigen::internal::scalar_max_op<Scalar,Scalar, NaNPropagation>());
}
/** \returns the sum of all coefficients of \c *this

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_REF_H
#define EIGEN_REF_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -48,7 +48,7 @@ struct traits<Ref<_PlainObjectType, _Options, _StrideType> >
};
typedef typename internal::conditional<MatchAtCompileTime,internal::true_type,internal::false_type>::type type;
};
};
template<typename Derived>
@@ -67,12 +67,12 @@ public:
typedef MapBase<Derived> Base;
EIGEN_DENSE_PUBLIC_INTERFACE(RefBase)
EIGEN_DEVICE_FUNC inline Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const
{
return StrideType::InnerStrideAtCompileTime != 0 ? m_stride.inner() : 1;
}
EIGEN_DEVICE_FUNC inline Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const
{
return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer()
: IsVectorAtCompileTime ? this->size()
@@ -86,36 +86,122 @@ public:
m_stride(StrideType::OuterStrideAtCompileTime==Dynamic?0:StrideType::OuterStrideAtCompileTime,
StrideType::InnerStrideAtCompileTime==Dynamic?0:StrideType::InnerStrideAtCompileTime)
{}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(RefBase)
protected:
typedef Stride<StrideType::OuterStrideAtCompileTime,StrideType::InnerStrideAtCompileTime> StrideBase;
template<typename Expression>
EIGEN_DEVICE_FUNC void construct(Expression& expr)
{
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(PlainObjectType,Expression);
// Resolves inner stride if default 0.
static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index resolveInnerStride(Index inner) {
return inner == 0 ? 1 : inner;
}
// Resolves outer stride if default 0.
static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index resolveOuterStride(Index inner, Index outer, Index rows, Index cols, bool isVectorAtCompileTime, bool isRowMajor) {
return outer == 0 ? isVectorAtCompileTime ? inner * rows * cols : isRowMajor ? inner * cols : inner * rows : outer;
}
// Returns true if construction is valid, false if there is a stride mismatch,
// and fails if there is a size mismatch.
template<typename Expression>
EIGEN_DEVICE_FUNC bool construct(Expression& expr)
{
// Check matrix sizes. If this is a compile-time vector, we do allow
// implicitly transposing.
EIGEN_STATIC_ASSERT(
EIGEN_PREDICATE_SAME_MATRIX_SIZE(PlainObjectType, Expression)
// If it is a vector, the transpose sizes might match.
|| ( PlainObjectType::IsVectorAtCompileTime
&& ((int(PlainObjectType::RowsAtCompileTime)==Eigen::Dynamic
|| int(Expression::ColsAtCompileTime)==Eigen::Dynamic
|| int(PlainObjectType::RowsAtCompileTime)==int(Expression::ColsAtCompileTime))
&& (int(PlainObjectType::ColsAtCompileTime)==Eigen::Dynamic
|| int(Expression::RowsAtCompileTime)==Eigen::Dynamic
|| int(PlainObjectType::ColsAtCompileTime)==int(Expression::RowsAtCompileTime)))),
YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES
)
// Determine runtime rows and columns.
Index rows = expr.rows();
Index cols = expr.cols();
if(PlainObjectType::RowsAtCompileTime==1)
{
eigen_assert(expr.rows()==1 || expr.cols()==1);
::new (static_cast<Base*>(this)) Base(expr.data(), 1, expr.size());
rows = 1;
cols = expr.size();
}
else if(PlainObjectType::ColsAtCompileTime==1)
{
eigen_assert(expr.rows()==1 || expr.cols()==1);
::new (static_cast<Base*>(this)) Base(expr.data(), expr.size(), 1);
rows = expr.size();
cols = 1;
}
else
::new (static_cast<Base*>(this)) Base(expr.data(), expr.rows(), expr.cols());
if(Expression::IsVectorAtCompileTime && (!PlainObjectType::IsVectorAtCompileTime) && ((Expression::Flags&RowMajorBit)!=(PlainObjectType::Flags&RowMajorBit)))
::new (&m_stride) StrideBase(expr.innerStride(), StrideType::InnerStrideAtCompileTime==0?0:1);
else
::new (&m_stride) StrideBase(StrideType::OuterStrideAtCompileTime==0?0:expr.outerStride(),
StrideType::InnerStrideAtCompileTime==0?0:expr.innerStride());
// Verify that the sizes are valid.
eigen_assert(
(PlainObjectType::RowsAtCompileTime == Dynamic) || (PlainObjectType::RowsAtCompileTime == rows));
eigen_assert(
(PlainObjectType::ColsAtCompileTime == Dynamic) || (PlainObjectType::ColsAtCompileTime == cols));
// If this is a vector, we might be transposing, which means that stride should swap.
const bool transpose = PlainObjectType::IsVectorAtCompileTime && (rows != expr.rows());
// If the storage format differs, we also need to swap the stride.
const bool row_major = ((PlainObjectType::Flags)&RowMajorBit) != 0;
const bool expr_row_major = (Expression::Flags&RowMajorBit) != 0;
const bool storage_differs = (row_major != expr_row_major);
const bool swap_stride = (transpose != storage_differs);
// Determine expr's actual strides, resolving any defaults if zero.
const Index expr_inner_actual = resolveInnerStride(expr.innerStride());
const Index expr_outer_actual = resolveOuterStride(expr_inner_actual,
expr.outerStride(),
expr.rows(),
expr.cols(),
Expression::IsVectorAtCompileTime != 0,
expr_row_major);
// If this is a column-major row vector or row-major column vector, the inner-stride
// is arbitrary, so set it to either the compile-time inner stride or 1.
const bool row_vector = (rows == 1);
const bool col_vector = (cols == 1);
const Index inner_stride =
( (!row_major && row_vector) || (row_major && col_vector) ) ?
( StrideType::InnerStrideAtCompileTime > 0 ? Index(StrideType::InnerStrideAtCompileTime) : 1)
: swap_stride ? expr_outer_actual : expr_inner_actual;
// If this is a column-major column vector or row-major row vector, the outer-stride
// is arbitrary, so set it to either the compile-time outer stride or vector size.
const Index outer_stride =
( (!row_major && col_vector) || (row_major && row_vector) ) ?
( StrideType::OuterStrideAtCompileTime > 0 ? Index(StrideType::OuterStrideAtCompileTime) : rows * cols * inner_stride)
: swap_stride ? expr_inner_actual : expr_outer_actual;
// Check if given inner/outer strides are compatible with compile-time strides.
const bool inner_valid = (StrideType::InnerStrideAtCompileTime == Dynamic)
|| (resolveInnerStride(Index(StrideType::InnerStrideAtCompileTime)) == inner_stride);
if (!inner_valid) {
return false;
}
const bool outer_valid = (StrideType::OuterStrideAtCompileTime == Dynamic)
|| (resolveOuterStride(
inner_stride,
Index(StrideType::OuterStrideAtCompileTime),
rows, cols, PlainObjectType::IsVectorAtCompileTime != 0,
row_major)
== outer_stride);
if (!outer_valid) {
return false;
}
::new (static_cast<Base*>(this)) Base(expr.data(), rows, cols);
::new (&m_stride) StrideBase(
(StrideType::OuterStrideAtCompileTime == 0) ? 0 : outer_stride,
(StrideType::InnerStrideAtCompileTime == 0) ? 0 : inner_stride );
return true;
}
StrideBase m_stride;
@@ -212,7 +298,10 @@ template<typename PlainObjectType, int Options, typename StrideType> class Ref
typename internal::enable_if<bool(Traits::template match<Derived>::MatchAtCompileTime),Derived>::type* = 0)
{
EIGEN_STATIC_ASSERT(bool(Traits::template match<Derived>::MatchAtCompileTime), STORAGE_LAYOUT_DOES_NOT_MATCH);
Base::construct(expr.derived());
// Construction must pass since we will not create temprary storage in the non-const case.
const bool success = Base::construct(expr.derived());
EIGEN_UNUSED_VARIABLE(success)
eigen_assert(success);
}
template<typename Derived>
EIGEN_DEVICE_FUNC inline Ref(const DenseBase<Derived>& expr,
@@ -223,10 +312,13 @@ template<typename PlainObjectType, int Options, typename StrideType> class Ref
inline Ref(DenseBase<Derived>& expr)
#endif
{
EIGEN_STATIC_ASSERT(bool(internal::is_lvalue<Derived>::value), THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
EIGEN_STATIC_ASSERT(bool(Traits::template match<Derived>::MatchAtCompileTime), STORAGE_LAYOUT_DOES_NOT_MATCH);
EIGEN_STATIC_ASSERT((static_cast<bool>(internal::is_lvalue<Derived>::value)), THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
EIGEN_STATIC_ASSERT((static_cast<bool>(Traits::template match<Derived>::MatchAtCompileTime)), STORAGE_LAYOUT_DOES_NOT_MATCH);
EIGEN_STATIC_ASSERT(!Derived::IsPlainObjectBase,THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
Base::construct(expr.const_cast_derived());
// Construction must pass since we will not create temporary storage in the non-const case.
const bool success = Base::construct(expr.const_cast_derived());
EIGEN_UNUSED_VARIABLE(success)
eigen_assert(success);
}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Ref)
@@ -267,7 +359,10 @@ template<typename TPlainObjectType, int Options, typename StrideType> class Ref<
template<typename Expression>
EIGEN_DEVICE_FUNC void construct(const Expression& expr,internal::true_type)
{
Base::construct(expr);
// Check if we can use the underlying expr's storage directly, otherwise call the copy version.
if (!Base::construct(expr)) {
construct(expr, internal::false_type());
}
}
template<typename Expression>

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_REPLICATE_H
#define EIGEN_REPLICATE_H
namespace Eigen {
namespace Eigen {
namespace internal {
template<typename MatrixType,int RowFactor,int ColFactor>
@@ -35,7 +35,7 @@ struct traits<Replicate<MatrixType,RowFactor,ColFactor> >
IsRowMajor = MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1 ? 1
: MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1 ? 0
: (MatrixType::Flags & RowMajorBit) ? 1 : 0,
// FIXME enable DirectAccess with negative strides?
Flags = IsRowMajor ? RowMajorBit : 0
};
@@ -88,15 +88,15 @@ template<typename MatrixType,int RowFactor,int ColFactor> class Replicate
THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE)
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const { return m_matrix.rows() * m_rowFactor.value(); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const { return m_matrix.cols() * m_colFactor.value(); }
EIGEN_DEVICE_FUNC
const _MatrixTypeNested& nestedExpression() const
{
return m_matrix;
{
return m_matrix;
}
protected:

View File

@@ -12,7 +12,6 @@
#define EIGEN_RESHAPED_H
namespace Eigen {
namespace internal {
/** \class Reshaped
* \ingroup Core_Module
@@ -44,6 +43,8 @@ namespace internal {
* \sa DenseBase::reshaped(NRowsType,NColsType)
*/
namespace internal {
template<typename XprType, int Rows, int Cols, int Order>
struct traits<Reshaped<XprType, Rows, Cols, Order> > : traits<XprType>
{
@@ -239,17 +240,17 @@ class ReshapedImpl_dense<XprType, Rows, Cols, Order, true>
XprType& nestedExpression() { return m_xpr; }
/** \sa MapBase::innerStride() */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const
{
return m_xpr.innerStride();
}
/** \sa MapBase::outerStride() */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const
{
return ((Flags&RowMajorBit)==RowMajorBit) ? this->cols() : this->rows();
return (((Flags&RowMajorBit)==RowMajorBit) ? this->cols() : this->rows()) * m_xpr.innerStride();
}
protected:

View File

@@ -60,8 +60,10 @@ template<typename Derived> class ReturnByValue
EIGEN_DEVICE_FUNC
inline void evalTo(Dest& dst) const
{ static_cast<const Derived*>(this)->evalTo(dst); }
EIGEN_DEVICE_FUNC inline Index rows() const { return static_cast<const Derived*>(this)->rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return static_cast<const Derived*>(this)->cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return static_cast<const Derived*>(this)->rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return static_cast<const Derived*>(this)->cols(); }
#ifndef EIGEN_PARSED_BY_DOXYGEN
#define Unusable YOU_ARE_TRYING_TO_ACCESS_A_SINGLE_COEFFICIENT_IN_A_SPECIAL_EXPRESSION_WHERE_THAT_IS_NOT_ALLOWED_BECAUSE_THAT_WOULD_BE_INEFFICIENT
@@ -90,7 +92,7 @@ namespace internal {
// Expression is evaluated in a temporary; default implementation of Assignment is bypassed so that
// when a ReturnByValue expression is assigned, the evaluator is not constructed.
// TODO: Finalize port to new regime; ReturnByValue should not exist in the expression world
template<typename Derived>
struct evaluator<ReturnByValue<Derived> >
: public evaluator<typename internal::traits<Derived>::ReturnType>
@@ -98,7 +100,7 @@ struct evaluator<ReturnByValue<Derived> >
typedef ReturnByValue<Derived> XprType;
typedef typename internal::traits<Derived>::ReturnType PlainObject;
typedef evaluator<PlainObject> Base;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr)
: m_result(xpr.rows(), xpr.cols())
{

View File

@@ -12,7 +12,7 @@
#ifndef EIGEN_REVERSE_H
#define EIGEN_REVERSE_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -44,7 +44,7 @@ template<typename PacketType> struct reverse_packet_cond<PacketType,false>
static inline PacketType run(const PacketType& x) { return x; }
};
} // end namespace internal
} // end namespace internal
/** \class Reverse
* \ingroup Core_Module
@@ -89,8 +89,10 @@ template<typename MatrixType, int Direction> class Reverse
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Reverse)
EIGEN_DEVICE_FUNC inline Index rows() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC inline Index innerStride() const
{
@@ -98,7 +100,7 @@ template<typename MatrixType, int Direction> class Reverse
}
EIGEN_DEVICE_FUNC const typename internal::remove_all<typename MatrixType::Nested>::type&
nestedExpression() const
nestedExpression() const
{
return m_matrix;
}
@@ -161,7 +163,7 @@ EIGEN_DEVICE_FUNC inline void DenseBase<Derived>::reverseInPlace()
}
namespace internal {
template<int Direction>
struct vectorwise_reverse_inplace_impl;

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_SELECT_H
#define EIGEN_SELECT_H
namespace Eigen {
namespace Eigen {
/** \class Select
* \ingroup Core_Module
@@ -67,8 +67,10 @@ class Select : public internal::dense_xpr_base< Select<ConditionMatrixType, Then
eigen_assert(m_condition.cols() == m_then.cols() && m_condition.cols() == m_else.cols());
}
inline EIGEN_DEVICE_FUNC Index rows() const { return m_condition.rows(); }
inline EIGEN_DEVICE_FUNC Index cols() const { return m_condition.cols(); }
inline EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_condition.rows(); }
inline EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_condition.cols(); }
inline EIGEN_DEVICE_FUNC
const Scalar coeff(Index i, Index j) const
@@ -120,7 +122,7 @@ class Select : public internal::dense_xpr_base< Select<ConditionMatrixType, Then
*/
template<typename Derived>
template<typename ThenDerived,typename ElseDerived>
inline const Select<Derived,ThenDerived,ElseDerived>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived,ElseDerived>
DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
const DenseBase<ElseDerived>& elseMatrix) const
{
@@ -134,7 +136,7 @@ DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
*/
template<typename Derived>
template<typename ThenDerived>
inline const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
inline EIGEN_DEVICE_FUNC const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
const typename ThenDerived::Scalar& elseScalar) const
{
@@ -149,7 +151,7 @@ DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
*/
template<typename Derived>
template<typename ElseDerived>
inline const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
inline EIGEN_DEVICE_FUNC const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
DenseBase<Derived>::select(const typename ElseDerived::Scalar& thenScalar,
const DenseBase<ElseDerived>& elseMatrix) const
{

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_SELFADJOINTMATRIX_H
#define EIGEN_SELFADJOINTMATRIX_H
namespace Eigen {
namespace Eigen {
/** \class SelfAdjointView
* \ingroup Core_Module
@@ -58,7 +58,7 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
typedef MatrixTypeNestedCleaned NestedExpression;
/** \brief The type of coefficients in this matrix */
typedef typename internal::traits<SelfAdjointView>::Scalar Scalar;
typedef typename internal::traits<SelfAdjointView>::Scalar Scalar;
typedef typename MatrixType::StorageIndex StorageIndex;
typedef typename internal::remove_all<typename MatrixType::ConjugateReturnType>::type MatrixConjugateReturnType;
typedef SelfAdjointView<typename internal::add_const<MatrixType>::type, UpLo> ConstSelfAdjointView;
@@ -66,7 +66,7 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
enum {
Mode = internal::traits<SelfAdjointView>::Mode,
Flags = internal::traits<SelfAdjointView>::Flags,
TransposeMode = ((Mode & Upper) ? Lower : 0) | ((Mode & Lower) ? Upper : 0)
TransposeMode = ((int(Mode) & int(Upper)) ? Lower : 0) | ((int(Mode) & int(Lower)) ? Upper : 0)
};
typedef typename MatrixType::PlainObject PlainObject;
@@ -76,14 +76,14 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
EIGEN_STATIC_ASSERT(UpLo==Lower || UpLo==Upper,SELFADJOINTVIEW_ACCEPTS_UPPER_AND_LOWER_MODE_ONLY);
}
EIGEN_DEVICE_FUNC
inline Index rows() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC
inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC
inline Index outerStride() const { return m_matrix.outerStride(); }
EIGEN_DEVICE_FUNC
inline Index innerStride() const { return m_matrix.innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return m_matrix.outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return m_matrix.innerStride(); }
/** \sa MatrixBase::coeff()
* \warning the coordinates must fit into the referenced triangular part
@@ -132,7 +132,7 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
{
return Product<OtherDerived,SelfAdjointView>(lhs.derived(),rhs);
}
friend EIGEN_DEVICE_FUNC
const SelfAdjointView<const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(Scalar,MatrixType,product),UpLo>
operator*(const Scalar& s, const SelfAdjointView& mat)
@@ -300,17 +300,17 @@ protected:
using Base::m_src;
using Base::m_functor;
public:
typedef typename Base::DstEvaluatorType DstEvaluatorType;
typedef typename Base::SrcEvaluatorType SrcEvaluatorType;
typedef typename Base::Scalar Scalar;
typedef typename Base::AssignmentTraits AssignmentTraits;
EIGEN_DEVICE_FUNC triangular_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{}
EIGEN_DEVICE_FUNC void assignCoeff(Index row, Index col)
{
eigen_internal_assert(row!=col);
@@ -318,12 +318,12 @@ public:
m_functor.assignCoeff(m_dst.coeffRef(row,col), tmp);
m_functor.assignCoeff(m_dst.coeffRef(col,row), numext::conj(tmp));
}
EIGEN_DEVICE_FUNC void assignDiagonalCoeff(Index id)
{
Base::assignCoeff(id,id);
}
EIGEN_DEVICE_FUNC void assignOppositeCoeff(Index, Index)
{ eigen_internal_assert(false && "should never be called"); }
};

View File

@@ -13,7 +13,7 @@
namespace Eigen {
template<typename Decomposition, typename RhsType, typename StorageKind> class SolveImpl;
/** \class Solve
* \ingroup Core_Module
*
@@ -64,13 +64,13 @@ class Solve : public SolveImpl<Decomposition,RhsType,typename internal::traits<R
public:
typedef typename internal::traits<Solve>::PlainObject PlainObject;
typedef typename internal::traits<Solve>::StorageIndex StorageIndex;
Solve(const Decomposition &dec, const RhsType &rhs)
: m_dec(dec), m_rhs(rhs)
{}
EIGEN_DEVICE_FUNC Index rows() const { return m_dec.cols(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_dec.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC const Decomposition& dec() const { return m_dec; }
EIGEN_DEVICE_FUNC const RhsType& rhs() const { return m_rhs; }
@@ -87,14 +87,14 @@ class SolveImpl<Decomposition,RhsType,Dense>
: public MatrixBase<Solve<Decomposition,RhsType> >
{
typedef Solve<Decomposition,RhsType> Derived;
public:
typedef MatrixBase<Solve<Decomposition,RhsType> > Base;
EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
private:
Scalar coeff(Index row, Index col) const;
Scalar coeff(Index i) const;
};
@@ -119,15 +119,15 @@ struct evaluator<Solve<Decomposition,RhsType> >
typedef evaluator<PlainObject> Base;
enum { Flags = Base::Flags | EvalBeforeNestingBit };
EIGEN_DEVICE_FUNC explicit evaluator(const SolveType& solve)
: m_result(solve.rows(), solve.cols())
{
::new (static_cast<Base*>(this)) Base(m_result);
solve.dec()._solve_impl(solve.rhs(), m_result);
}
protected:
protected:
PlainObject m_result;
};
@@ -176,7 +176,7 @@ struct Assignment<DstXprType, Solve<CwiseUnaryOp<internal::scalar_conjugate_op<t
Index dstCols = src.cols();
if((dst.rows()!=dstRows) || (dst.cols()!=dstCols))
dst.resize(dstRows, dstCols);
src.dec().nestedExpression().nestedExpression().template _solve_impl_transposed<true>(src.rhs(), dst);
}
};

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_SOLVETRIANGULAR_H
#define EIGEN_SOLVETRIANGULAR_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -54,7 +54,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,1>
typedef blas_traits<Lhs> LhsProductTraits;
typedef typename LhsProductTraits::ExtractType ActualLhsType;
typedef Map<Matrix<RhsScalar,Dynamic,1>, Aligned> MappedRhs;
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
ActualLhsType actualLhs = LhsProductTraits::extract(lhs);
@@ -64,7 +64,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,1>
ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhs,rhs.size(),
(useRhsDirectly ? rhs.data() : 0));
if(!useRhsDirectly)
MappedRhs(actualRhs,rhs.size()) = rhs;
@@ -85,7 +85,7 @@ struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,Dynamic>
typedef blas_traits<Lhs> LhsProductTraits;
typedef typename LhsProductTraits::DirectLinearAccessType ActualLhsType;
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
typename internal::add_const_on_value_type<ActualLhsType>::type actualLhs = LhsProductTraits::extract(lhs);
@@ -118,7 +118,7 @@ struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,false> {
DiagIndex = IsLower ? LoopIndex : Size - LoopIndex - 1,
StartIndex = IsLower ? 0 : DiagIndex+1
};
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
if (LoopIndex>0)
rhs.coeffRef(DiagIndex) -= lhs.row(DiagIndex).template segment<LoopIndex>(StartIndex).transpose()
@@ -133,22 +133,22 @@ struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,false> {
template<typename Lhs, typename Rhs, int Mode, int LoopIndex, int Size>
struct triangular_solver_unroller<Lhs,Rhs,Mode,LoopIndex,Size,true> {
static void run(const Lhs&, Rhs&) {}
static EIGEN_DEVICE_FUNC void run(const Lhs&, Rhs&) {}
};
template<typename Lhs, typename Rhs, int Mode>
struct triangular_solver_selector<Lhs,Rhs,OnTheLeft,Mode,CompleteUnrolling,1> {
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{ triangular_solver_unroller<Lhs,Rhs,Mode,0,Rhs::SizeAtCompileTime>::run(lhs,rhs); }
};
template<typename Lhs, typename Rhs, int Mode>
struct triangular_solver_selector<Lhs,Rhs,OnTheRight,Mode,CompleteUnrolling,1> {
static void run(const Lhs& lhs, Rhs& rhs)
static EIGEN_DEVICE_FUNC void run(const Lhs& lhs, Rhs& rhs)
{
Transpose<const Lhs> trLhs(lhs);
Transpose<Rhs> trRhs(rhs);
triangular_solver_unroller<Transpose<const Lhs>,Transpose<Rhs>,
((Mode&Upper)==Upper ? Lower : Upper) | (Mode&UnitDiag),
0,Rhs::SizeAtCompileTime>::run(trLhs,trRhs);
@@ -168,7 +168,7 @@ EIGEN_DEVICE_FUNC void TriangularViewImpl<MatrixType,Mode,Dense>::solveInPlace(c
{
OtherDerived& other = _other.const_cast_derived();
eigen_assert( derived().cols() == derived().rows() && ((Side==OnTheLeft && derived().cols() == other.rows()) || (Side==OnTheRight && derived().cols() == other.cols())) );
eigen_assert((!(Mode & ZeroDiag)) && bool(Mode & (Upper|Lower)));
eigen_assert((!(int(Mode) & int(ZeroDiag))) && bool(int(Mode) & (int(Upper) | int(Lower))));
// If solving for a 0x0 matrix, nothing to do, simply return.
if (derived().cols() == 0)
return;
@@ -213,8 +213,8 @@ template<int Side, typename TriangularType, typename Rhs> struct triangular_solv
: m_triangularMatrix(tri), m_rhs(rhs)
{}
inline Index rows() const { return m_rhs.rows(); }
inline Index cols() const { return m_rhs.cols(); }
inline EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_rhs.rows(); }
inline EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); }
template<typename Dest> inline void evalTo(Dest& dst) const
{

View File

@@ -110,7 +110,7 @@ class SolverBase : public EigenBase<Derived>
}
/** \internal the return type of transpose() */
typedef typename internal::add_const<Transpose<const Derived> >::type ConstTransposeReturnType;
typedef Transpose<const Derived> ConstTransposeReturnType;
/** \returns an expression of the transposed of the factored matrix.
*
* A typical usage is to solve for the transposed problem A^T x = b:
@@ -118,16 +118,16 @@ class SolverBase : public EigenBase<Derived>
*
* \sa adjoint(), solve()
*/
inline ConstTransposeReturnType transpose() const
inline const ConstTransposeReturnType transpose() const
{
return ConstTransposeReturnType(derived());
}
/** \internal the return type of adjoint() */
typedef typename internal::conditional<NumTraits<Scalar>::IsComplex,
CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>, ConstTransposeReturnType>,
ConstTransposeReturnType
>::type AdjointReturnType;
CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>, const ConstTransposeReturnType>,
const ConstTransposeReturnType
>::type AdjointReturnType;
/** \returns an expression of the adjoint of the factored matrix
*
* A typical usage is to solve for the adjoint problem A' x = b:
@@ -137,7 +137,7 @@ class SolverBase : public EigenBase<Derived>
*
* \sa transpose(), solve()
*/
inline AdjointReturnType adjoint() const
inline const AdjointReturnType adjoint() const
{
return AdjointReturnType(derived().transpose());
}

View File

@@ -123,41 +123,28 @@ blueNorm_impl(const EigenBase<Derived>& _vec)
using std::pow;
using std::sqrt;
using std::abs;
// This program calculates the machine-dependent constants
// bl, b2, slm, s2m, relerr overfl
// from the "basic" machine-dependent numbers
// nbig, ibeta, it, iemin, iemax, rbig.
// The following define the basic machine-dependent constants.
// For portability, the PORT subprograms "ilmaeh" and "rlmach"
// are used. For any specific computer, each of the assignment
// statements can be replaced
static const int ibeta = std::numeric_limits<RealScalar>::radix; // base for floating-point numbers
static const int it = NumTraits<RealScalar>::digits(); // number of base-beta digits in mantissa
static const int iemin = NumTraits<RealScalar>::min_exponent(); // minimum exponent
static const int iemax = NumTraits<RealScalar>::max_exponent(); // maximum exponent
static const RealScalar rbig = NumTraits<RealScalar>::highest(); // largest floating-point number
static const RealScalar b1 = RealScalar(pow(RealScalar(ibeta),RealScalar(-((1-iemin)/2)))); // lower boundary of midrange
static const RealScalar b2 = RealScalar(pow(RealScalar(ibeta),RealScalar((iemax + 1 - it)/2))); // upper boundary of midrange
static const RealScalar s1m = RealScalar(pow(RealScalar(ibeta),RealScalar((2-iemin)/2))); // scaling factor for lower range
static const RealScalar s2m = RealScalar(pow(RealScalar(ibeta),RealScalar(- ((iemax+it)/2)))); // scaling factor for upper range
static const RealScalar eps = RealScalar(pow(double(ibeta), 1-it));
static const RealScalar relerr = sqrt(eps); // tolerance for neglecting asml
const Derived& vec(_vec.derived());
static bool initialized = false;
static RealScalar b1, b2, s1m, s2m, rbig, relerr;
if(!initialized)
{
int ibeta, it, iemin, iemax, iexp;
RealScalar eps;
// This program calculates the machine-dependent constants
// bl, b2, slm, s2m, relerr overfl
// from the "basic" machine-dependent numbers
// nbig, ibeta, it, iemin, iemax, rbig.
// The following define the basic machine-dependent constants.
// For portability, the PORT subprograms "ilmaeh" and "rlmach"
// are used. For any specific computer, each of the assignment
// statements can be replaced
ibeta = std::numeric_limits<RealScalar>::radix; // base for floating-point numbers
it = NumTraits<RealScalar>::digits(); // number of base-beta digits in mantissa
iemin = std::numeric_limits<RealScalar>::min_exponent; // minimum exponent
iemax = std::numeric_limits<RealScalar>::max_exponent; // maximum exponent
rbig = (std::numeric_limits<RealScalar>::max)(); // largest floating-point number
iexp = -((1-iemin)/2);
b1 = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // lower boundary of midrange
iexp = (iemax + 1 - it)/2;
b2 = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // upper boundary of midrange
iexp = (2-iemin)/2;
s1m = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // scaling factor for lower range
iexp = - ((iemax+it)/2);
s2m = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // scaling factor for upper range
eps = RealScalar(pow(double(ibeta), 1-it));
relerr = sqrt(eps); // tolerance for neglecting asml
initialized = true;
}
Index n = vec.size();
RealScalar ab2 = b2 / RealScalar(n);
RealScalar asml = RealScalar(0);
@@ -166,9 +153,9 @@ blueNorm_impl(const EigenBase<Derived>& _vec)
for(Index j=0; j<vec.outerSize(); ++j)
{
for(typename Derived::InnerIterator it(vec, j); it; ++it)
for(typename Derived::InnerIterator iter(vec, j); iter; ++iter)
{
RealScalar ax = abs(it.value());
RealScalar ax = abs(iter.value());
if(ax > ab2) abig += numext::abs2(ax*s2m);
else if(ax < b1) asml += numext::abs2(ax*s1m);
else amed += numext::abs2(ax);

View File

@@ -7,6 +7,9 @@
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_STLITERATORS_H
#define EIGEN_STLITERATORS_H
namespace Eigen {
namespace internal {
@@ -30,10 +33,10 @@ public:
typedef Index difference_type;
typedef std::random_access_iterator_tag iterator_category;
indexed_based_stl_iterator_base() : mp_xpr(0), m_index(0) {}
indexed_based_stl_iterator_base(XprType& xpr, Index index) : mp_xpr(&xpr), m_index(index) {}
indexed_based_stl_iterator_base() EIGEN_NO_THROW : mp_xpr(0), m_index(0) {}
indexed_based_stl_iterator_base(XprType& xpr, Index index) EIGEN_NO_THROW : mp_xpr(&xpr), m_index(index) {}
indexed_based_stl_iterator_base(const non_const_iterator& other)
indexed_based_stl_iterator_base(const non_const_iterator& other) EIGEN_NO_THROW
: mp_xpr(other.mp_xpr), m_index(other.m_index)
{}
@@ -93,6 +96,85 @@ protected:
Index m_index;
};
template<typename Derived>
class indexed_based_stl_reverse_iterator_base
{
protected:
typedef indexed_based_stl_iterator_traits<Derived> traits;
typedef typename traits::XprType XprType;
typedef indexed_based_stl_reverse_iterator_base<typename traits::non_const_iterator> non_const_iterator;
typedef indexed_based_stl_reverse_iterator_base<typename traits::const_iterator> const_iterator;
typedef typename internal::conditional<internal::is_const<XprType>::value,non_const_iterator,const_iterator>::type other_iterator;
// NOTE: in C++03 we cannot declare friend classes through typedefs because we need to write friend class:
friend class indexed_based_stl_reverse_iterator_base<typename traits::const_iterator>;
friend class indexed_based_stl_reverse_iterator_base<typename traits::non_const_iterator>;
public:
typedef Index difference_type;
typedef std::random_access_iterator_tag iterator_category;
indexed_based_stl_reverse_iterator_base() : mp_xpr(0), m_index(0) {}
indexed_based_stl_reverse_iterator_base(XprType& xpr, Index index) : mp_xpr(&xpr), m_index(index) {}
indexed_based_stl_reverse_iterator_base(const non_const_iterator& other)
: mp_xpr(other.mp_xpr), m_index(other.m_index)
{}
indexed_based_stl_reverse_iterator_base& operator=(const non_const_iterator& other)
{
mp_xpr = other.mp_xpr;
m_index = other.m_index;
return *this;
}
Derived& operator++() { --m_index; return derived(); }
Derived& operator--() { ++m_index; return derived(); }
Derived operator++(int) { Derived prev(derived()); operator++(); return prev;}
Derived operator--(int) { Derived prev(derived()); operator--(); return prev;}
friend Derived operator+(const indexed_based_stl_reverse_iterator_base& a, Index b) { Derived ret(a.derived()); ret += b; return ret; }
friend Derived operator-(const indexed_based_stl_reverse_iterator_base& a, Index b) { Derived ret(a.derived()); ret -= b; return ret; }
friend Derived operator+(Index a, const indexed_based_stl_reverse_iterator_base& b) { Derived ret(b.derived()); ret += a; return ret; }
friend Derived operator-(Index a, const indexed_based_stl_reverse_iterator_base& b) { Derived ret(b.derived()); ret -= a; return ret; }
Derived& operator+=(Index b) { m_index -= b; return derived(); }
Derived& operator-=(Index b) { m_index += b; return derived(); }
difference_type operator-(const indexed_based_stl_reverse_iterator_base& other) const
{
eigen_assert(mp_xpr == other.mp_xpr);
return other.m_index - m_index;
}
difference_type operator-(const other_iterator& other) const
{
eigen_assert(mp_xpr == other.mp_xpr);
return other.m_index - m_index;
}
bool operator==(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index == other.m_index; }
bool operator!=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index != other.m_index; }
bool operator< (const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index > other.m_index; }
bool operator<=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index >= other.m_index; }
bool operator> (const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index < other.m_index; }
bool operator>=(const indexed_based_stl_reverse_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index <= other.m_index; }
bool operator==(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index == other.m_index; }
bool operator!=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index != other.m_index; }
bool operator< (const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index > other.m_index; }
bool operator<=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index >= other.m_index; }
bool operator> (const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index < other.m_index; }
bool operator>=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index <= other.m_index; }
protected:
Derived& derived() { return static_cast<Derived&>(*this); }
const Derived& derived() const { return static_cast<const Derived&>(*this); }
XprType *mp_xpr;
Index m_index;
};
template<typename XprType>
class pointer_based_stl_iterator
{
@@ -111,17 +193,17 @@ public:
typedef typename internal::conditional<bool(is_lvalue), value_type&, const value_type&>::type reference;
pointer_based_stl_iterator() : m_ptr(0) {}
pointer_based_stl_iterator(XprType& xpr, Index index) : m_incr(xpr.innerStride())
pointer_based_stl_iterator() EIGEN_NO_THROW : m_ptr(0) {}
pointer_based_stl_iterator(XprType& xpr, Index index) EIGEN_NO_THROW : m_incr(xpr.innerStride())
{
m_ptr = xpr.data() + index * m_incr.value();
}
pointer_based_stl_iterator(const non_const_iterator& other)
pointer_based_stl_iterator(const non_const_iterator& other) EIGEN_NO_THROW
: m_ptr(other.m_ptr), m_incr(other.m_incr)
{}
pointer_based_stl_iterator& operator=(const non_const_iterator& other)
pointer_based_stl_iterator& operator=(const non_const_iterator& other) EIGEN_NO_THROW
{
m_ptr = other.m_ptr;
m_incr.setValue(other.m_incr);
@@ -267,6 +349,54 @@ public:
pointer operator->() const { return (*mp_xpr).template subVector<Direction>(m_index); }
};
template<typename _XprType, DirectionType Direction>
struct indexed_based_stl_iterator_traits<subvector_stl_reverse_iterator<_XprType,Direction> >
{
typedef _XprType XprType;
typedef subvector_stl_reverse_iterator<typename internal::remove_const<XprType>::type, Direction> non_const_iterator;
typedef subvector_stl_reverse_iterator<typename internal::add_const<XprType>::type, Direction> const_iterator;
};
template<typename XprType, DirectionType Direction>
class subvector_stl_reverse_iterator : public indexed_based_stl_reverse_iterator_base<subvector_stl_reverse_iterator<XprType,Direction> >
{
protected:
enum { is_lvalue = internal::is_lvalue<XprType>::value };
typedef indexed_based_stl_reverse_iterator_base<subvector_stl_reverse_iterator> Base;
using Base::m_index;
using Base::mp_xpr;
typedef typename internal::conditional<Direction==Vertical,typename XprType::ColXpr,typename XprType::RowXpr>::type SubVectorType;
typedef typename internal::conditional<Direction==Vertical,typename XprType::ConstColXpr,typename XprType::ConstRowXpr>::type ConstSubVectorType;
public:
typedef typename internal::conditional<bool(is_lvalue), SubVectorType, ConstSubVectorType>::type reference;
typedef typename reference::PlainObject value_type;
private:
class subvector_stl_reverse_iterator_ptr
{
public:
subvector_stl_reverse_iterator_ptr(const reference &subvector) : m_subvector(subvector) {}
reference* operator->() { return &m_subvector; }
private:
reference m_subvector;
};
public:
typedef subvector_stl_reverse_iterator_ptr pointer;
subvector_stl_reverse_iterator() : Base() {}
subvector_stl_reverse_iterator(XprType& xpr, Index index) : Base(xpr,index) {}
reference operator*() const { return (*mp_xpr).template subVector<Direction>(m_index); }
reference operator[](Index i) const { return (*mp_xpr).template subVector<Direction>(m_index+i); }
pointer operator->() const { return (*mp_xpr).template subVector<Direction>(m_index); }
};
} // namespace internal
@@ -329,3 +459,5 @@ inline typename DenseBase<Derived>::const_iterator DenseBase<Derived>::cend() co
}
} // namespace Eigen
#endif // EIGEN_STLITERATORS_H

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_STRIDE_H
#define EIGEN_STRIDE_H
namespace Eigen {
namespace Eigen {
/** \class Stride
* \ingroup Core_Module
@@ -38,6 +38,14 @@ namespace Eigen {
* \include Map_general_stride.cpp
* Output: \verbinclude Map_general_stride.out
*
* Both strides can be negative. However, a negative stride of -1 cannot be specified at compile time
* because of the ambiguity with Dynamic which is defined to -1 (historically, negative strides were
* not allowed).
*
* Note that for compile-time vectors (ColsAtCompileTime==1 or RowsAtCompile==1),
* the inner stride is the pointer increment between two consecutive elements,
* regardless of storage layout.
*
* \sa class InnerStride, class OuterStride, \ref TopicStorageOrders
*/
template<int _OuterStrideAtCompileTime, int _InnerStrideAtCompileTime>
@@ -55,6 +63,8 @@ class Stride
Stride()
: m_outer(OuterStrideAtCompileTime), m_inner(InnerStrideAtCompileTime)
{
// FIXME: for Eigen 4 we should use DynamicIndex instead of Dynamic.
// FIXME: for Eigen 4 we should also unify this API with fix<>
eigen_assert(InnerStrideAtCompileTime != Dynamic && OuterStrideAtCompileTime != Dynamic);
}
@@ -63,7 +73,6 @@ class Stride
Stride(Index outerStride, Index innerStride)
: m_outer(outerStride), m_inner(innerStride)
{
eigen_assert(innerStride>=0 && outerStride>=0);
}
/** Copy constructor */
@@ -73,10 +82,10 @@ class Stride
{}
/** \returns the outer stride */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outer() const { return m_outer.value(); }
/** \returns the inner stride */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index inner() const { return m_inner.value(); }
protected:

View File

@@ -11,7 +11,7 @@
#ifndef EIGEN_TRANSPOSE_H
#define EIGEN_TRANSPOSE_H
namespace Eigen {
namespace Eigen {
namespace internal {
template<typename MatrixType>
@@ -65,10 +65,10 @@ template<typename MatrixType> class Transpose
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Transpose)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
/** \returns the nested expression */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -153,6 +153,8 @@ template<typename MatrixType> class TransposeImpl<MatrixType,Dense>
{
return derived().nestedExpression().coeffRef(index);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TransposeImpl)
};
/** \returns an expression of the transpose of *this.
@@ -176,7 +178,7 @@ template<typename MatrixType> class TransposeImpl<MatrixType,Dense>
* \sa transposeInPlace(), adjoint() */
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Transpose<Derived>
typename DenseBase<Derived>::TransposeReturnType
DenseBase<Derived>::transpose()
{
return TransposeReturnType(derived());
@@ -189,7 +191,7 @@ DenseBase<Derived>::transpose()
* \sa transposeInPlace(), adjoint() */
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
typename DenseBase<Derived>::ConstTransposeReturnType
const typename DenseBase<Derived>::ConstTransposeReturnType
DenseBase<Derived>::transpose() const
{
return ConstTransposeReturnType(derived());
@@ -241,7 +243,6 @@ struct inplace_transpose_selector<MatrixType,true,false> { // square matrix
}
};
// TODO: vectorized path is currently limited to LargestPacketSize x LargestPacketSize cases only.
template<typename MatrixType>
struct inplace_transpose_selector<MatrixType,true,true> { // PacketSize x PacketSize
static void run(MatrixType& m) {
@@ -258,16 +259,66 @@ struct inplace_transpose_selector<MatrixType,true,true> { // PacketSize x Packet
}
};
template <typename MatrixType, Index Alignment>
void BlockedInPlaceTranspose(MatrixType& m) {
typedef typename MatrixType::Scalar Scalar;
typedef typename internal::packet_traits<typename MatrixType::Scalar>::type Packet;
const Index PacketSize = internal::packet_traits<Scalar>::size;
eigen_assert(m.rows() == m.cols());
int row_start = 0;
for (; row_start + PacketSize <= m.rows(); row_start += PacketSize) {
for (int col_start = row_start; col_start + PacketSize <= m.cols(); col_start += PacketSize) {
PacketBlock<Packet> A;
if (row_start == col_start) {
for (Index i=0; i<PacketSize; ++i)
A.packet[i] = m.template packetByOuterInner<Alignment>(row_start + i,col_start);
internal::ptranspose(A);
for (Index i=0; i<PacketSize; ++i)
m.template writePacket<Alignment>(m.rowIndexByOuterInner(row_start + i, col_start), m.colIndexByOuterInner(row_start + i,col_start), A.packet[i]);
} else {
PacketBlock<Packet> B;
for (Index i=0; i<PacketSize; ++i) {
A.packet[i] = m.template packetByOuterInner<Alignment>(row_start + i,col_start);
B.packet[i] = m.template packetByOuterInner<Alignment>(col_start + i, row_start);
}
internal::ptranspose(A);
internal::ptranspose(B);
for (Index i=0; i<PacketSize; ++i) {
m.template writePacket<Alignment>(m.rowIndexByOuterInner(row_start + i, col_start), m.colIndexByOuterInner(row_start + i,col_start), B.packet[i]);
m.template writePacket<Alignment>(m.rowIndexByOuterInner(col_start + i, row_start), m.colIndexByOuterInner(col_start + i,row_start), A.packet[i]);
}
}
}
}
for (Index row = row_start; row < m.rows(); ++row) {
m.matrix().row(row).head(row).swap(
m.matrix().col(row).head(row).transpose());
}
}
template<typename MatrixType,bool MatchPacketSize>
struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non square matrix
struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non square or dynamic matrix
static void run(MatrixType& m) {
if (m.rows()==m.cols())
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose().template triangularView<StrictlyUpper>());
else
typedef typename MatrixType::Scalar Scalar;
if (m.rows() == m.cols()) {
const Index PacketSize = internal::packet_traits<Scalar>::size;
if (!NumTraits<Scalar>::IsComplex && m.rows() >= PacketSize) {
if ((m.rows() % PacketSize) == 0)
BlockedInPlaceTranspose<MatrixType,internal::evaluator<MatrixType>::Alignment>(m);
else
BlockedInPlaceTranspose<MatrixType,Unaligned>(m);
}
else {
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose().template triangularView<StrictlyUpper>());
}
} else {
m = m.transpose().eval();
}
}
};
} // end namespace internal
/** This is the "in place" version of transpose(): it replaces \c *this by its own transpose.
@@ -285,7 +336,7 @@ struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non squ
* Notice however that this method is only useful if you want to replace a matrix by its own transpose.
* If you just need the transpose of a matrix, use transpose().
*
* \note if the matrix is not square, then \c *this must be a resizable matrix.
* \note if the matrix is not square, then \c *this must be a resizable matrix.
* This excludes (non-square) fixed-size matrices, block-expressions and maps.
*
* \sa transpose(), adjoint(), adjointInPlace() */

View File

@@ -10,20 +10,22 @@
#ifndef EIGEN_TRANSPOSITIONS_H
#define EIGEN_TRANSPOSITIONS_H
namespace Eigen {
namespace Eigen {
template<typename Derived>
class TranspositionsBase
{
typedef internal::traits<Derived> Traits;
public:
typedef typename Traits::IndicesType IndicesType;
typedef typename IndicesType::Scalar StorageIndex;
typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3
EIGEN_DEVICE_FUNC
Derived& derived() { return *static_cast<Derived*>(this); }
EIGEN_DEVICE_FUNC
const Derived& derived() const { return *static_cast<const Derived*>(this); }
/** Copies the \a other transpositions into \c *this */
@@ -35,13 +37,17 @@ class TranspositionsBase
}
/** \returns the number of transpositions */
EIGEN_DEVICE_FUNC
Index size() const { return indices().size(); }
/** \returns the number of rows of the equivalent permutation matrix */
EIGEN_DEVICE_FUNC
Index rows() const { return indices().size(); }
/** \returns the number of columns of the equivalent permutation matrix */
EIGEN_DEVICE_FUNC
Index cols() const { return indices().size(); }
/** Direct access to the underlying index vector */
EIGEN_DEVICE_FUNC
inline const StorageIndex& coeff(Index i) const { return indices().coeff(i); }
/** Direct access to the underlying index vector */
inline StorageIndex& coeffRef(Index i) { return indices().coeffRef(i); }
@@ -55,8 +61,10 @@ class TranspositionsBase
inline StorageIndex& operator[](Index i) { return indices()(i); }
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return derived().indices(); }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return derived().indices(); }
/** Resizes to given size. */
@@ -178,8 +186,10 @@ class Transpositions : public TranspositionsBase<Transpositions<SizeAtCompileTim
{}
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -237,9 +247,11 @@ class Map<Transpositions<SizeAtCompileTime,MaxSizeAtCompileTime,_StorageIndex>,P
#endif
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -279,9 +291,11 @@ class TranspositionsWrapper
}
/** const version of indices(). */
EIGEN_DEVICE_FUNC
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the transpositions. */
EIGEN_DEVICE_FUNC
IndicesType& indices() { return m_indices; }
protected:
@@ -335,9 +349,12 @@ class Transpose<TranspositionsBase<TranspositionsDerived> >
explicit Transpose(const TranspositionType& t) : m_transpositions(t) {}
Index size() const { return m_transpositions.size(); }
Index rows() const { return m_transpositions.size(); }
Index cols() const { return m_transpositions.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index size() const EIGEN_NOEXCEPT { return m_transpositions.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return m_transpositions.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return m_transpositions.size(); }
/** \returns the \a matrix with the inverse transpositions applied to the columns.
*/
@@ -356,7 +373,8 @@ class Transpose<TranspositionsBase<TranspositionsDerived> >
{
return Product<Transpose, OtherDerived, AliasFreeProduct>(*this, matrix.derived());
}
EIGEN_DEVICE_FUNC
const TranspositionType& nestedExpression() const { return m_transpositions; }
protected:

View File

@@ -11,12 +11,12 @@
#ifndef EIGEN_TRIANGULARMATRIX_H
#define EIGEN_TRIANGULARMATRIX_H
namespace Eigen {
namespace Eigen {
namespace internal {
template<int Side, typename TriangularType, typename Rhs> struct triangular_solve_retval;
}
/** \class TriangularBase
@@ -34,16 +34,16 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
MaxRowsAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime,
MaxColsAtCompileTime = internal::traits<Derived>::MaxColsAtCompileTime,
SizeAtCompileTime = (internal::size_at_compile_time<internal::traits<Derived>::RowsAtCompileTime,
internal::traits<Derived>::ColsAtCompileTime>::ret),
/**< This is equal to the number of coefficients, i.e. the number of
* rows times the number of columns, or to \a Dynamic if this is not
* known at compile-time. \sa RowsAtCompileTime, ColsAtCompileTime */
MaxSizeAtCompileTime = (internal::size_at_compile_time<internal::traits<Derived>::MaxRowsAtCompileTime,
internal::traits<Derived>::MaxColsAtCompileTime>::ret)
};
typedef typename internal::traits<Derived>::Scalar Scalar;
typedef typename internal::traits<Derived>::StorageKind StorageKind;
@@ -53,17 +53,17 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
typedef Derived const& Nested;
EIGEN_DEVICE_FUNC
inline TriangularBase() { eigen_assert(!((Mode&UnitDiag) && (Mode&ZeroDiag))); }
inline TriangularBase() { eigen_assert(!((int(Mode) & int(UnitDiag)) && (int(Mode) & int(ZeroDiag)))); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return derived().rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return derived().cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index outerStride() const EIGEN_NOEXCEPT { return derived().outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index innerStride() const EIGEN_NOEXCEPT { return derived().innerStride(); }
EIGEN_DEVICE_FUNC
inline Index rows() const { return derived().rows(); }
EIGEN_DEVICE_FUNC
inline Index cols() const { return derived().cols(); }
EIGEN_DEVICE_FUNC
inline Index outerStride() const { return derived().outerStride(); }
EIGEN_DEVICE_FUNC
inline Index innerStride() const { return derived().innerStride(); }
// dummy resize function
EIGEN_DEVICE_FUNC
void resize(Index rows, Index cols)
@@ -100,12 +100,10 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
return coeffRef(row,col);
}
#ifndef EIGEN_PARSED_BY_DOXYGEN
EIGEN_DEVICE_FUNC
inline const Derived& derived() const { return *static_cast<const Derived*>(this); }
EIGEN_DEVICE_FUNC
inline Derived& derived() { return *static_cast<Derived*>(this); }
#endif // not EIGEN_PARSED_BY_DOXYGEN
template<typename DenseDerived>
EIGEN_DEVICE_FUNC
@@ -156,7 +154,7 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
* \param MatrixType the type of the object in which we are taking the triangular part
* \param Mode the kind of triangular matrix expression to construct. Can be #Upper,
* #Lower, #UnitUpper, #UnitLower, #StrictlyUpper, or #StrictlyLower.
* This is in fact a bit field; it must have either #Upper or #Lower,
* This is in fact a bit field; it must have either #Upper or #Lower,
* and additionally it may have #UnitDiag or #ZeroDiag or neither.
*
* This class represents a triangular part of a matrix, not necessarily square. Strictly speaking, for rectangular
@@ -199,7 +197,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
typedef typename internal::remove_all<typename MatrixType::ConjugateReturnType>::type MatrixConjugateReturnType;
typedef TriangularView<typename internal::add_const<MatrixType>::type, _Mode> ConstTriangularView;
public:
typedef typename internal::traits<TriangularView>::StorageKind StorageKind;
@@ -218,17 +216,15 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
EIGEN_DEVICE_FUNC
explicit inline TriangularView(MatrixType& matrix) : m_matrix(matrix)
{}
using Base::operator=;
TriangularView& operator=(const TriangularView &other)
{ return Base::operator=(other); }
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(TriangularView)
/** \copydoc EigenBase::rows() */
EIGEN_DEVICE_FUNC
inline Index rows() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); }
/** \copydoc EigenBase::cols() */
EIGEN_DEVICE_FUNC
inline Index cols() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); }
/** \returns a const reference to the nested expression */
EIGEN_DEVICE_FUNC
@@ -237,7 +233,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
/** \returns a reference to the nested expression */
EIGEN_DEVICE_FUNC
NestedExpression& nestedExpression() { return m_matrix; }
typedef TriangularView<const MatrixConjugateReturnType,Mode> ConjugateReturnType;
/** \sa MatrixBase::conjugate() const */
EIGEN_DEVICE_FUNC
@@ -271,7 +267,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
typename MatrixType::TransposeReturnType tmp(m_matrix);
return TransposeReturnType(tmp);
}
typedef TriangularView<const typename MatrixType::ConstTransposeReturnType,TransposeMode> ConstTransposeReturnType;
/** \sa MatrixBase::transpose() const */
EIGEN_DEVICE_FUNC
@@ -282,10 +278,10 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
template<typename Other>
EIGEN_DEVICE_FUNC
inline const Solve<TriangularView, Other>
inline const Solve<TriangularView, Other>
solve(const MatrixBase<Other>& other) const
{ return Solve<TriangularView, Other>(*this, other.derived()); }
// workaround MSVC ICE
#if EIGEN_COMP_MSVC
template<int Side, typename Other>
@@ -329,7 +325,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
else
return m_matrix.diagonal().prod();
}
protected:
MatrixTypeNested m_matrix;
@@ -391,7 +387,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
internal::call_assignment_no_alias(derived(), other.derived(), internal::sub_assign_op<Scalar,typename Other::Scalar>());
return derived();
}
/** \sa MatrixBase::operator*=() */
EIGEN_DEVICE_FUNC
TriangularViewType& operator*=(const typename internal::traits<MatrixType>::Scalar& other) { return *this = derived().nestedExpression() * other; }
@@ -444,7 +440,6 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
EIGEN_DEVICE_FUNC
TriangularViewType& operator=(const MatrixBase<OtherDerived>& other);
#ifndef EIGEN_PARSED_BY_DOXYGEN
EIGEN_DEVICE_FUNC
TriangularViewType& operator=(const TriangularViewImpl& other)
{ return *this = other.derived().nestedExpression(); }
@@ -458,7 +453,6 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
/** \deprecated */
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC
void lazyAssign(const MatrixBase<OtherDerived>& other);
#endif
/** Efficient triangular matrix times vector/matrix product */
template<typename OtherDerived>
@@ -526,11 +520,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
/** Swaps the coefficients of the common triangular parts of two matrices */
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
#ifdef EIGEN_PARSED_BY_DOXYGEN
void swap(TriangularBase<OtherDerived> &other)
#else
void swap(TriangularBase<OtherDerived> const & other)
#endif
{
EIGEN_STATIC_ASSERT_LVALUE(OtherDerived);
call_assignment(derived(), other.const_cast_derived(), internal::swap_assign_op<Scalar>());
@@ -554,9 +544,14 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
this->solveInPlace(dst);
}
template<typename ProductType>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE TriangularViewType& _assignProduct(const ProductType& prod, const Scalar& alpha, bool beta);
template <typename ProductType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TriangularViewType& _assignProduct(const ProductType& prod, const Scalar& alpha,
bool beta);
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(TriangularViewImpl)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TriangularViewImpl)
};
/***************************************************************************
@@ -713,7 +708,7 @@ bool MatrixBase<Derived>::isLowerTriangular(const RealScalar& prec) const
namespace internal {
// TODO currently a triangular expression has the form TriangularView<.,.>
// in the future triangular-ness should be defined by the expression traits
// such that Transpose<TriangularView<.,.> > is valid. (currently TriangularBase::transpose() is overloaded to make it work)
@@ -742,7 +737,7 @@ struct Dense2Triangular {};
template<typename Kernel, unsigned int Mode, int UnrollCount, bool ClearOpposite> struct triangular_assignment_loop;
/** \internal Specialization of the dense assignment kernel for triangular matrices.
* The main difference is that the triangular, diagonal, and opposite parts are processed through three different functions.
* \tparam UpLo must be either Lower or Upper
@@ -759,17 +754,17 @@ protected:
using Base::m_src;
using Base::m_functor;
public:
typedef typename Base::DstEvaluatorType DstEvaluatorType;
typedef typename Base::SrcEvaluatorType SrcEvaluatorType;
typedef typename Base::Scalar Scalar;
typedef typename Base::AssignmentTraits AssignmentTraits;
EIGEN_DEVICE_FUNC triangular_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{}
#ifdef EIGEN_INTERNAL_DEBUGGING
EIGEN_DEVICE_FUNC void assignCoeff(Index row, Index col)
{
@@ -779,16 +774,16 @@ public:
#else
using Base::assignCoeff;
#endif
EIGEN_DEVICE_FUNC void assignDiagonalCoeff(Index id)
{
if(Mode==UnitDiag && SetOpposite) m_functor.assignCoeff(m_dst.coeffRef(id,id), Scalar(1));
else if(Mode==ZeroDiag && SetOpposite) m_functor.assignCoeff(m_dst.coeffRef(id,id), Scalar(0));
else if(Mode==0) Base::assignCoeff(id,id);
}
EIGEN_DEVICE_FUNC void assignOppositeCoeff(Index row, Index col)
{
{
eigen_internal_assert(row!=col);
if(SetOpposite)
m_functor.assignCoeff(m_dst.coeffRef(row,col), Scalar(0));
@@ -809,17 +804,17 @@ void call_triangular_assignment_loop(DstXprType& dst, const SrcXprType& src, con
if((dst.rows()!=dstRows) || (dst.cols()!=dstCols))
dst.resize(dstRows, dstCols);
DstEvaluatorType dstEvaluator(dst);
typedef triangular_dense_assignment_kernel< Mode&(Lower|Upper),Mode&(UnitDiag|ZeroDiag|SelfAdjoint),SetOpposite,
DstEvaluatorType,SrcEvaluatorType,Functor> Kernel;
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
enum {
unroll = DstXprType::SizeAtCompileTime != Dynamic
&& SrcEvaluatorType::CoeffReadCost < HugeCost
&& DstXprType::SizeAtCompileTime * (DstEvaluatorType::CoeffReadCost+SrcEvaluatorType::CoeffReadCost) / 2 <= EIGEN_UNROLLING_LIMIT
&& DstXprType::SizeAtCompileTime * (int(DstEvaluatorType::CoeffReadCost) + int(SrcEvaluatorType::CoeffReadCost)) / 2 <= EIGEN_UNROLLING_LIMIT
};
triangular_assignment_loop<Kernel, Mode, unroll ? int(DstXprType::SizeAtCompileTime) : Dynamic, SetOpposite>::run(kernel);
}
@@ -841,8 +836,8 @@ struct Assignment<DstXprType, SrcXprType, Functor, Triangular2Triangular>
EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func)
{
eigen_assert(int(DstXprType::Mode) == int(SrcXprType::Mode));
call_triangular_assignment_loop<DstXprType::Mode, false>(dst, src, func);
call_triangular_assignment_loop<DstXprType::Mode, false>(dst, src, func);
}
};
@@ -851,7 +846,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, Triangular2Dense>
{
EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func)
{
call_triangular_assignment_loop<SrcXprType::Mode, (SrcXprType::Mode&SelfAdjoint)==0>(dst, src, func);
call_triangular_assignment_loop<SrcXprType::Mode, (int(SrcXprType::Mode) & int(SelfAdjoint)) == 0>(dst, src, func);
}
};
@@ -860,7 +855,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, Dense2Triangular>
{
EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func)
{
call_triangular_assignment_loop<DstXprType::Mode, false>(dst, src, func);
call_triangular_assignment_loop<DstXprType::Mode, false>(dst, src, func);
}
};
@@ -871,19 +866,19 @@ struct triangular_assignment_loop
// FIXME: this is not very clean, perhaps this information should be provided by the kernel?
typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
typedef typename DstEvaluatorType::XprType DstXprType;
enum {
col = (UnrollCount-1) / DstXprType::RowsAtCompileTime,
row = (UnrollCount-1) % DstXprType::RowsAtCompileTime
};
typedef typename Kernel::Scalar Scalar;
EIGEN_DEVICE_FUNC
static inline void run(Kernel &kernel)
{
triangular_assignment_loop<Kernel, Mode, UnrollCount-1, SetOpposite>::run(kernel);
if(row==col)
kernel.assignDiagonalCoeff(row);
else if( ((Mode&Lower) && row>col) || ((Mode&Upper) && row<col) )
@@ -926,10 +921,10 @@ struct triangular_assignment_loop<Kernel, Mode, Dynamic, SetOpposite>
}
else
i = maxi;
if(i<kernel.rows()) // then i==j
kernel.assignDiagonalCoeff(i++);
if (((Mode&Upper) && SetOpposite) || (Mode&Lower))
{
for(; i < kernel.rows(); ++i)
@@ -949,11 +944,11 @@ template<typename DenseDerived>
EIGEN_DEVICE_FUNC void TriangularBase<Derived>::evalToLazy(MatrixBase<DenseDerived> &other) const
{
other.derived().resize(this->rows(), this->cols());
internal::call_triangular_assignment_loop<Derived::Mode,(Derived::Mode&SelfAdjoint)==0 /* SetOpposite */>(other.derived(), derived().nestedExpression());
internal::call_triangular_assignment_loop<Derived::Mode, (int(Derived::Mode) & int(SelfAdjoint)) == 0 /* SetOpposite */>(other.derived(), derived().nestedExpression());
}
namespace internal {
// Triangular = Product
template< typename DstXprType, typename Lhs, typename Rhs, typename Scalar>
struct Assignment<DstXprType, Product<Lhs,Rhs,DefaultProduct>, internal::assign_op<Scalar,typename Product<Lhs,Rhs,DefaultProduct>::Scalar>, Dense2Triangular>
@@ -966,7 +961,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,DefaultProduct>, internal::assign_
if((dst.rows()!=dstRows) || (dst.cols()!=dstCols))
dst.resize(dstRows, dstCols);
dst._assignProduct(src, 1, 0);
dst._assignProduct(src, Scalar(1), false);
}
};
@@ -977,7 +972,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,DefaultProduct>, internal::add_ass
typedef Product<Lhs,Rhs,DefaultProduct> SrcXprType;
static void run(DstXprType &dst, const SrcXprType &src, const internal::add_assign_op<Scalar,typename SrcXprType::Scalar> &)
{
dst._assignProduct(src, 1, 1);
dst._assignProduct(src, Scalar(1), true);
}
};
@@ -988,7 +983,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,DefaultProduct>, internal::sub_ass
typedef Product<Lhs,Rhs,DefaultProduct> SrcXprType;
static void run(DstXprType &dst, const SrcXprType &src, const internal::sub_assign_op<Scalar,typename SrcXprType::Scalar> &)
{
dst._assignProduct(src, -1, 1);
dst._assignProduct(src, Scalar(-1), true);
}
};

View File

@@ -65,10 +65,10 @@ class PartialReduxExpr : public internal::dense_xpr_base< PartialReduxExpr<Matri
explicit PartialReduxExpr(const MatrixType& mat, const MemberOp& func = MemberOp())
: m_matrix(mat), m_functor(func) {}
EIGEN_DEVICE_FUNC
Index rows() const { return (Direction==Vertical ? 1 : m_matrix.rows()); }
EIGEN_DEVICE_FUNC
Index cols() const { return (Direction==Horizontal ? 1 : m_matrix.cols()); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index rows() const EIGEN_NOEXCEPT { return (Direction==Vertical ? 1 : m_matrix.rows()); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
Index cols() const EIGEN_NOEXCEPT { return (Direction==Horizontal ? 1 : m_matrix.cols()); }
EIGEN_DEVICE_FUNC
typename MatrixType::Nested nestedExpression() const { return m_matrix; }
@@ -134,7 +134,7 @@ struct member_redux {
typedef typename result_of<
BinaryOp(const Scalar&,const Scalar&)
>::type result_type;
enum { Vectorizable = functor_traits<BinaryOp>::PacketAccess };
template<int Size> struct Cost { enum { value = (Size-1) * functor_traits<BinaryOp>::Cost }; };
EIGEN_DEVICE_FUNC explicit member_redux(const BinaryOp func) : m_functor(func) {}
@@ -162,17 +162,17 @@ struct member_redux {
* where `foo` is any method of `VectorwiseOp`. This expression is equivalent to applying `foo()` to each
* column of `A` and then re-assemble the outputs in a matrix expression:
* \code [A.col(0).foo(), A.col(1).foo(), ..., A.col(A.cols()-1).foo()] \endcode
*
*
* Example: \include MatrixBase_colwise.cpp
* Output: \verbinclude MatrixBase_colwise.out
*
* The begin() and end() methods are obviously exceptions to the previous rule as they
* return STL-compatible begin/end iterators to the rows or columns of the nested expression.
* Typical use cases include for-range-loop and calls to STL algorithms:
*
*
* Example: \include MatrixBase_colwise_iterator_cxx11.cpp
* Output: \verbinclude MatrixBase_colwise_iterator_cxx11.out
*
*
* For a partial reduction on an empty input, some rules apply.
* For the sake of clarity, let's consider a vertical reduction:
* - If the number of columns is zero, then a 1x0 row-major vector expression is returned.
@@ -180,7 +180,7 @@ struct member_redux {
* - a row vector of zeros is returned for sum-like reductions (sum, squaredNorm, norm, etc.)
* - a row vector of ones is returned for a product reduction (e.g., <code>MatrixXd(n,0).colwise().prod()</code>)
* - an assert is triggered for all other reductions (minCoeff,maxCoeff,redux(bin_op))
*
*
* \sa DenseBase::colwise(), DenseBase::rowwise(), class PartialReduxExpr
*/
template<typename ExpressionType, int Direction> class VectorwiseOp
@@ -216,7 +216,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
};
protected:
template<typename OtherDerived> struct ExtendedType {
typedef Replicate<OtherDerived,
isVertical ? 1 : ExpressionType::RowsAtCompileTime,
@@ -279,27 +279,47 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
/** This is the const version of iterator (aka read-only) */
random_access_iterator_type const_iterator;
#else
typedef internal::subvector_stl_iterator<ExpressionType, DirectionType(Direction)> iterator;
typedef internal::subvector_stl_iterator<const ExpressionType, DirectionType(Direction)> const_iterator;
typedef internal::subvector_stl_iterator<ExpressionType, DirectionType(Direction)> iterator;
typedef internal::subvector_stl_iterator<const ExpressionType, DirectionType(Direction)> const_iterator;
typedef internal::subvector_stl_reverse_iterator<ExpressionType, DirectionType(Direction)> reverse_iterator;
typedef internal::subvector_stl_reverse_iterator<const ExpressionType, DirectionType(Direction)> const_reverse_iterator;
#endif
/** returns an iterator to the first row (rowwise) or column (colwise) of the nested expression.
* \sa end(), cbegin()
*/
iterator begin() { return iterator (m_matrix, 0); }
iterator begin() { return iterator (m_matrix, 0); }
/** const version of begin() */
const_iterator begin() const { return const_iterator(m_matrix, 0); }
const_iterator begin() const { return const_iterator(m_matrix, 0); }
/** const version of begin() */
const_iterator cbegin() const { return const_iterator(m_matrix, 0); }
const_iterator cbegin() const { return const_iterator(m_matrix, 0); }
/** returns a reverse iterator to the last row (rowwise) or column (colwise) of the nested expression.
* \sa rend(), crbegin()
*/
reverse_iterator rbegin() { return reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** const version of rbegin() */
const_reverse_iterator rbegin() const { return const_reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** const version of rbegin() */
const_reverse_iterator crbegin() const { return const_reverse_iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()-1); }
/** returns an iterator to the row (resp. column) following the last row (resp. column) of the nested expression
* \sa begin(), cend()
*/
iterator end() { return iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
iterator end() { return iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** const version of end() */
const_iterator end() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
const_iterator end() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** const version of end() */
const_iterator cend() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
const_iterator cend() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** returns a reverse iterator to the row (resp. column) before the first row (resp. column) of the nested expression
* \sa begin(), cend()
*/
reverse_iterator rend() { return reverse_iterator (m_matrix, -1); }
/** const version of rend() */
const_reverse_iterator rend() const { return const_reverse_iterator (m_matrix, -1); }
/** const version of rend() */
const_reverse_iterator crend() const { return const_reverse_iterator (m_matrix, -1); }
/** \returns a row or column vector expression of \c *this reduxed by \a func
*
@@ -308,7 +328,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
*
* \sa class VectorwiseOp, DenseBase::colwise(), DenseBase::rowwise()
*/
template<typename BinaryOp>
@@ -345,7 +365,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
*
* \warning the result is undefined if \c *this contains NaN.
*
* Example: \include PartialRedux_minCoeff.cpp
@@ -364,7 +384,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
*
* \warning the result is undefined if \c *this contains NaN.
*
* Example: \include PartialRedux_maxCoeff.cpp
@@ -719,6 +739,10 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
EIGEN_DEVICE_FUNC
const HNormalizedReturnType hnormalized() const;
# ifdef EIGEN_VECTORWISEOP_PLUGIN
# include EIGEN_VECTORWISEOP_PLUGIN
# endif
protected:
Index redux_length() const
{

View File

@@ -10,7 +10,7 @@
#ifndef EIGEN_VISITOR_H
#define EIGEN_VISITOR_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -70,22 +70,22 @@ class visitor_evaluator
public:
EIGEN_DEVICE_FUNC
explicit visitor_evaluator(const XprType &xpr) : m_evaluator(xpr), m_xpr(xpr) {}
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
enum {
RowsAtCompileTime = XprType::RowsAtCompileTime,
CoeffReadCost = internal::evaluator<XprType>::CoeffReadCost
};
EIGEN_DEVICE_FUNC Index rows() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC Index size() const { return m_xpr.size(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index size() const EIGEN_NOEXCEPT { return m_xpr.size(); }
EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index row, Index col) const
{ return m_evaluator.coeff(row, col); }
protected:
internal::evaluator<XprType> m_evaluator;
const XprType &m_xpr;
@@ -106,7 +106,7 @@ protected:
*
* \note compared to one or two \em for \em loops, visitors offer automatic
* unrolling for small fixed size matrix.
*
*
* \note if the matrix is empty, then the visitor is left unchanged.
*
* \sa minCoeff(Index*,Index*), maxCoeff(Index*,Index*), DenseBase::redux()
@@ -118,13 +118,13 @@ void DenseBase<Derived>::visit(Visitor& visitor) const
{
if(size()==0)
return;
typedef typename internal::visitor_evaluator<Derived> ThisEvaluator;
ThisEvaluator thisEval(derived());
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * ThisEvaluator::CoeffReadCost + (SizeAtCompileTime-1) * internal::functor_traits<Visitor>::Cost <= EIGEN_UNROLLING_LIMIT
&& SizeAtCompileTime * int(ThisEvaluator::CoeffReadCost) + (SizeAtCompileTime-1) * int(internal::functor_traits<Visitor>::Cost) <= EIGEN_UNROLLING_LIMIT
};
return internal::visitor_impl<Visitor, ThisEvaluator, unroll ? int(SizeAtCompileTime) : Dynamic>::run(thisEval, visitor);
}
@@ -157,7 +157,7 @@ struct coeff_visitor
*
* \sa DenseBase::minCoeff(Index*, Index*)
*/
template <typename Derived>
template <typename Derived, int NaNPropagation>
struct min_coeff_visitor : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
@@ -173,8 +173,40 @@ struct min_coeff_visitor : coeff_visitor<Derived>
}
};
template<typename Scalar>
struct functor_traits<min_coeff_visitor<Scalar> > {
template <typename Derived>
struct min_coeff_visitor<Derived, PropagateNumbers> : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
if((numext::isnan)(this->res) || (!(numext::isnan)(value) && value < this->res))
{
this->res = value;
this->row = i;
this->col = j;
}
}
};
template <typename Derived>
struct min_coeff_visitor<Derived, PropagateNaN> : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
if((numext::isnan)(value) || value < this->res)
{
this->res = value;
this->row = i;
this->col = j;
}
}
};
template<typename Scalar, int NaNPropagation>
struct functor_traits<min_coeff_visitor<Scalar, NaNPropagation> > {
enum {
Cost = NumTraits<Scalar>::AddCost
};
@@ -185,10 +217,10 @@ struct functor_traits<min_coeff_visitor<Scalar> > {
*
* \sa DenseBase::maxCoeff(Index*, Index*)
*/
template <typename Derived>
template <typename Derived, int NaNPropagation>
struct max_coeff_visitor : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
@@ -201,8 +233,40 @@ struct max_coeff_visitor : coeff_visitor<Derived>
}
};
template<typename Scalar>
struct functor_traits<max_coeff_visitor<Scalar> > {
template <typename Derived>
struct max_coeff_visitor<Derived, PropagateNumbers> : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
if((numext::isnan)(this->res) || (!(numext::isnan)(value) && value > this->res))
{
this->res = value;
this->row = i;
this->col = j;
}
}
};
template <typename Derived>
struct max_coeff_visitor<Derived, PropagateNaN> : coeff_visitor<Derived>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
void operator() (const Scalar& value, Index i, Index j)
{
if((numext::isnan)(value) || value > this->res)
{
this->res = value;
this->row = i;
this->col = j;
}
}
};
template<typename Scalar, int NaNPropagation>
struct functor_traits<max_coeff_visitor<Scalar, NaNPropagation> > {
enum {
Cost = NumTraits<Scalar>::AddCost
};
@@ -212,22 +276,24 @@ struct functor_traits<max_coeff_visitor<Scalar> > {
/** \fn DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
* \returns the minimum of all coefficients of *this and puts in *row and *col its location.
*
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(Index*), DenseBase::maxCoeff(Index*,Index*), DenseBase::visit(), DenseBase::minCoeff()
*/
template<typename Derived>
template<typename IndexType>
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
internal::min_coeff_visitor<Derived> minVisitor;
internal::min_coeff_visitor<Derived, NaNPropagation> minVisitor;
this->visit(minVisitor);
*rowId = minVisitor.row;
if (colId) *colId = minVisitor.col;
@@ -235,15 +301,17 @@ DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
}
/** \returns the minimum of all coefficients of *this and puts in *index its location.
*
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::visit(), DenseBase::minCoeff()
*/
template<typename Derived>
template<typename IndexType>
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff(IndexType* index) const
@@ -251,7 +319,7 @@ DenseBase<Derived>::minCoeff(IndexType* index) const
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
internal::min_coeff_visitor<Derived> minVisitor;
internal::min_coeff_visitor<Derived, NaNPropagation> minVisitor;
this->visit(minVisitor);
*index = IndexType((RowsAtCompileTime==1) ? minVisitor.col : minVisitor.row);
return minVisitor.res;
@@ -259,22 +327,24 @@ DenseBase<Derived>::minCoeff(IndexType* index) const
/** \fn DenseBase<Derived>::maxCoeff(IndexType* rowId, IndexType* colId) const
* \returns the maximum of all coefficients of *this and puts in *row and *col its location.
*
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visit(), DenseBase::maxCoeff()
*/
template<typename Derived>
template<typename IndexType>
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff(IndexType* rowPtr, IndexType* colPtr) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
internal::max_coeff_visitor<Derived> maxVisitor;
internal::max_coeff_visitor<Derived, NaNPropagation> maxVisitor;
this->visit(maxVisitor);
*rowPtr = maxVisitor.row;
if (colPtr) *colPtr = maxVisitor.col;
@@ -282,15 +352,17 @@ DenseBase<Derived>::maxCoeff(IndexType* rowPtr, IndexType* colPtr) const
}
/** \returns the maximum of all coefficients of *this and puts in *index its location.
*
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \sa DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visitor(), DenseBase::maxCoeff()
*/
template<typename Derived>
template<typename IndexType>
template<int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff(IndexType* index) const
@@ -298,7 +370,7 @@ DenseBase<Derived>::maxCoeff(IndexType* index) const
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
internal::max_coeff_visitor<Derived> maxVisitor;
internal::max_coeff_visitor<Derived, NaNPropagation> maxVisitor;
this->visit(maxVisitor);
*index = (RowsAtCompileTime==1) ? maxVisitor.col : maxVisitor.row;
return maxVisitor.res;

View File

@@ -38,6 +38,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
@@ -47,7 +48,18 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
};
#endif
template<> struct unpacket_traits<Packet4cf> { typedef std::complex<float> type; enum {size=4, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet4cf> {
typedef std::complex<float> type;
typedef Packet2cf half;
typedef Packet8f as_real;
enum {
size=4,
alignment=Aligned32,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet4cf padd<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf psub<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_sub_ps(a.v,b.v)); }
@@ -76,7 +88,6 @@ EIGEN_STRONG_INLINE Packet4cf pcmp_eq(const Packet4cf& a, const Packet4cf& b) {
}
template<> EIGEN_STRONG_INLINE Packet4cf ptrue<Packet4cf>(const Packet4cf& a) { return Packet4cf(ptrue(Packet8f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cf pnot<Packet4cf>(const Packet4cf& a) { return Packet4cf(pnot(Packet8f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cf pand <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_and_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf por <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_or_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf pxor <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_xor_ps(a.v,b.v)); }
@@ -88,7 +99,9 @@ template<> EIGEN_STRONG_INLINE Packet4cf ploadu<Packet4cf>(const std::complex<fl
template<> EIGEN_STRONG_INLINE Packet4cf pset1<Packet4cf>(const std::complex<float>& from)
{
return Packet4cf(_mm256_castpd_ps(_mm256_broadcast_sd((const double*)(const void*)&from)));
const float re = std::real(from);
const float im = std::imag(from);
return Packet4cf(_mm256_set_ps(im, re, im, re, im, re, im, re));
}
template<> EIGEN_STRONG_INLINE Packet4cf ploaddup<Packet4cf>(const std::complex<float>* from)
@@ -150,79 +163,18 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet4cf>(const Packe
Packet2cf(_mm256_extractf128_ps(a.v,1))));
}
template<> EIGEN_STRONG_INLINE Packet4cf preduxp<Packet4cf>(const Packet4cf* vecs)
{
Packet8f t0 = _mm256_shuffle_ps(vecs[0].v, vecs[0].v, _MM_SHUFFLE(3, 1, 2 ,0));
Packet8f t1 = _mm256_shuffle_ps(vecs[1].v, vecs[1].v, _MM_SHUFFLE(3, 1, 2 ,0));
t0 = _mm256_hadd_ps(t0,t1);
Packet8f t2 = _mm256_shuffle_ps(vecs[2].v, vecs[2].v, _MM_SHUFFLE(3, 1, 2 ,0));
Packet8f t3 = _mm256_shuffle_ps(vecs[3].v, vecs[3].v, _MM_SHUFFLE(3, 1, 2 ,0));
t2 = _mm256_hadd_ps(t2,t3);
t1 = _mm256_permute2f128_ps(t0,t2, 0 + (2<<4));
t3 = _mm256_permute2f128_ps(t0,t2, 1 + (3<<4));
return Packet4cf(_mm256_add_ps(t1,t3));
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet4cf>(const Packet4cf& a)
{
return predux_mul(pmul(Packet2cf(_mm256_extractf128_ps(a.v, 0)),
Packet2cf(_mm256_extractf128_ps(a.v, 1))));
}
template<int Offset>
struct palign_impl<Offset,Packet4cf>
{
static EIGEN_STRONG_INLINE void run(Packet4cf& first, const Packet4cf& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet8f>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet4cf, Packet4cf, false,true>
{
EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet4cf, Packet4cf, true,false>
{
EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet4cf, Packet4cf, true,true>
{
EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet4cf,Packet8f)
template<> EIGEN_STRONG_INLINE Packet4cf pdiv<Packet4cf>(const Packet4cf& a, const Packet4cf& b)
{
Packet4cf num = pmul(a, pconj(b));
__m256 tmp = _mm256_mul_ps(b.v, b.v);
__m256 tmp2 = _mm256_shuffle_ps(tmp,tmp,0xB1);
__m256 denom = _mm256_add_ps(tmp, tmp2);
return Packet4cf(_mm256_div_ps(num.v, denom));
return pdiv_complex(a, b);
}
template<> EIGEN_STRONG_INLINE Packet4cf pcplxflip<Packet4cf>(const Packet4cf& x)
@@ -254,6 +206,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
@@ -263,7 +216,18 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
};
#endif
template<> struct unpacket_traits<Packet2cd> { typedef std::complex<double> type; enum {size=2, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet2cd> {
typedef std::complex<double> type;
typedef Packet1cd half;
typedef Packet4d as_real;
enum {
size=2,
alignment=Aligned32,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
};
template<> EIGEN_STRONG_INLINE Packet2cd padd<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_add_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd psub<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_sub_pd(a.v,b.v)); }
@@ -291,7 +255,6 @@ EIGEN_STRONG_INLINE Packet2cd pcmp_eq(const Packet2cd& a, const Packet2cd& b) {
}
template<> EIGEN_STRONG_INLINE Packet2cd ptrue<Packet2cd>(const Packet2cd& a) { return Packet2cd(ptrue(Packet4d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cd pnot<Packet2cd>(const Packet2cd& a) { return Packet2cd(pnot(Packet4d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cd pand <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_and_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd por <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_or_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd pxor <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_xor_pd(a.v,b.v)); }
@@ -347,71 +310,17 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet2cd>(const Pack
Packet1cd(_mm256_extractf128_pd(a.v,1))));
}
template<> EIGEN_STRONG_INLINE Packet2cd preduxp<Packet2cd>(const Packet2cd* vecs)
{
Packet4d t0 = _mm256_permute2f128_pd(vecs[0].v,vecs[1].v, 0 + (2<<4));
Packet4d t1 = _mm256_permute2f128_pd(vecs[0].v,vecs[1].v, 1 + (3<<4));
return Packet2cd(_mm256_add_pd(t0,t1));
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet2cd>(const Packet2cd& a)
{
return predux(pmul(Packet1cd(_mm256_extractf128_pd(a.v,0)),
Packet1cd(_mm256_extractf128_pd(a.v,1))));
}
template<int Offset>
struct palign_impl<Offset,Packet2cd>
{
static EIGEN_STRONG_INLINE void run(Packet2cd& first, const Packet2cd& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet4d>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet2cd, Packet2cd, false,true>
{
EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet2cd, Packet2cd, true,false>
{
EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet2cd, Packet2cd, true,true>
{
EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cd,Packet4d)
template<> EIGEN_STRONG_INLINE Packet2cd pdiv<Packet2cd>(const Packet2cd& a, const Packet2cd& b)
{
Packet2cd num = pmul(a, pconj(b));
__m256d tmp = _mm256_mul_pd(b.v, b.v);
__m256d denom = _mm256_hadd_pd(tmp, tmp);
return Packet2cd(_mm256_div_pd(num.v, denom));
return pdiv_complex(a, b);
}
template<> EIGEN_STRONG_INLINE Packet2cd pcplxflip<Packet2cd>(const Packet2cd& x)
@@ -444,24 +353,12 @@ ptranspose(PacketBlock<Packet2cd,2>& kernel) {
kernel.packet[0].v = tmp;
}
template<> EIGEN_STRONG_INLINE Packet4cf pinsertfirst(const Packet4cf& a, std::complex<float> b)
{
return Packet4cf(_mm256_blend_ps(a.v,pset1<Packet4cf>(b).v,1|2));
template<> EIGEN_STRONG_INLINE Packet2cd psqrt<Packet2cd>(const Packet2cd& a) {
return psqrt_complex<Packet2cd>(a);
}
template<> EIGEN_STRONG_INLINE Packet2cd pinsertfirst(const Packet2cd& a, std::complex<double> b)
{
return Packet2cd(_mm256_blend_pd(a.v,pset1<Packet2cd>(b).v,1|2));
}
template<> EIGEN_STRONG_INLINE Packet4cf pinsertlast(const Packet4cf& a, std::complex<float> b)
{
return Packet4cf(_mm256_blend_ps(a.v,pset1<Packet4cf>(b).v,(1<<7)|(1<<6)));
}
template<> EIGEN_STRONG_INLINE Packet2cd pinsertlast(const Packet2cd& a, std::complex<double> b)
{
return Packet2cd(_mm256_blend_pd(a.v,pset1<Packet2cd>(b).v,(1<<3)|(1<<2)));
template<> EIGEN_STRONG_INLINE Packet4cf psqrt<Packet4cf>(const Packet4cf& a) {
return psqrt_complex<Packet4cf>(a);
}
} // end namespace internal

View File

@@ -36,6 +36,24 @@ plog<Packet8f>(const Packet8f& _x) {
return plog_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
plog<Packet4d>(const Packet4d& _x) {
return plog_double(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
plog2<Packet8f>(const Packet8f& _x) {
return plog2_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
plog2<Packet4d>(const Packet4d& _x) {
return plog2_double(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f plog1p<Packet8f>(const Packet8f& _x) {
return generic_plog1p(_x);
@@ -58,15 +76,15 @@ pexp<Packet8f>(const Packet8f& _x) {
// Hyperbolic Tangent function.
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
ptanh<Packet8f>(const Packet8f& x) {
return internal::generic_fast_tanh_float(x);
ptanh<Packet8f>(const Packet8f& _x) {
return internal::generic_fast_tanh_float(_x);
}
// Exponential function for doubles.
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
pexp<Packet4d>(const Packet4d& x) {
return pexp_double(x);
pexp<Packet4d>(const Packet4d& _x) {
return pexp_double(_x);
}
// Functions for sqrt.
@@ -79,33 +97,36 @@ pexp<Packet4d>(const Packet4d& x) {
// For detail see here: http://www.beyond3d.com/content/articles/8/
#if EIGEN_FAST_MATH
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
psqrt<Packet8f>(const Packet8f& _x) {
Packet8f half = pmul(_x, pset1<Packet8f>(.5f));
Packet8f denormal_mask = _mm256_and_ps(
_mm256_cmp_ps(_x, pset1<Packet8f>((std::numeric_limits<float>::min)()),
_CMP_LT_OQ),
_mm256_cmp_ps(_x, _mm256_setzero_ps(), _CMP_GE_OQ));
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& _x) {
Packet8f minus_half_x = pmul(_x, pset1<Packet8f>(-0.5f));
Packet8f denormal_mask = pandnot(
pcmp_lt(_x, pset1<Packet8f>((std::numeric_limits<float>::min)())),
pcmp_lt(_x, pzero(_x)));
// Compute approximate reciprocal sqrt.
Packet8f x = _mm256_rsqrt_ps(_x);
// Do a single step of Newton's iteration.
x = pmul(x, psub(pset1<Packet8f>(1.5f), pmul(half, pmul(x,x))));
x = pmul(x, pmadd(minus_half_x, pmul(x,x), pset1<Packet8f>(1.5f)));
// Flush results for denormals to zero.
return _mm256_andnot_ps(denormal_mask, pmul(_x,x));
return pandnot(pmul(_x,x), denormal_mask);
}
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& x) {
return _mm256_sqrt_ps(x);
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d psqrt<Packet4d>(const Packet4d& x) {
return _mm256_sqrt_pd(x);
}
#if EIGEN_FAST_MATH
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f psqrt<Packet8f>(const Packet8f& _x) {
return _mm256_sqrt_ps(_x);
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d psqrt<Packet4d>(const Packet4d& _x) {
return _mm256_sqrt_pd(_x);
}
#if EIGEN_FAST_MATH
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inf, 0x7f800000);
@@ -140,18 +161,65 @@ Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
#else
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f prsqrt<Packet8f>(const Packet8f& x) {
Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
_EIGEN_DECLARE_CONST_Packet8f(one, 1.0f);
return _mm256_div_ps(p8f_one, _mm256_sqrt_ps(x));
return _mm256_div_ps(p8f_one, _mm256_sqrt_ps(_x));
}
#endif
template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4d prsqrt<Packet4d>(const Packet4d& x) {
Packet4d prsqrt<Packet4d>(const Packet4d& _x) {
_EIGEN_DECLARE_CONST_Packet4d(one, 1.0);
return _mm256_div_pd(p4d_one, _mm256_sqrt_pd(x));
return _mm256_div_pd(p4d_one, _mm256_sqrt_pd(_x));
}
F16_PACKET_FUNCTION(Packet8f, Packet8h, psin)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pcos)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog2)
F16_PACKET_FUNCTION(Packet8f, Packet8h, plog1p)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pexpm1)
F16_PACKET_FUNCTION(Packet8f, Packet8h, pexp)
F16_PACKET_FUNCTION(Packet8f, Packet8h, ptanh)
F16_PACKET_FUNCTION(Packet8f, Packet8h, psqrt)
F16_PACKET_FUNCTION(Packet8f, Packet8h, prsqrt)
template <>
EIGEN_STRONG_INLINE Packet8h pfrexp(const Packet8h& a, Packet8h& exponent) {
Packet8f fexponent;
const Packet8h out = float2half(pfrexp<Packet8f>(half2float(a), fexponent));
exponent = float2half(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet8h pldexp(const Packet8h& a, const Packet8h& exponent) {
return float2half(pldexp<Packet8f>(half2float(a), half2float(exponent)));
}
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, psin)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pcos)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog2)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog1p)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pexpm1)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pexp)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, ptanh)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, psqrt)
BF16_PACKET_FUNCTION(Packet8f, Packet8bf, prsqrt)
template <>
EIGEN_STRONG_INLINE Packet8bf pfrexp(const Packet8bf& a, Packet8bf& exponent) {
Packet8f fexponent;
const Packet8bf out = F32ToBf16(pfrexp<Packet8f>(Bf16ToF32(a), fexponent));
exponent = F32ToBf16(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet8bf pldexp(const Packet8bf& a, const Packet8bf& exponent) {
return F32ToBf16(pldexp<Packet8f>(Bf16ToF32(a), Bf16ToF32(exponent)));
}
} // end namespace internal

File diff suppressed because it is too large Load Diff

View File

@@ -35,6 +35,46 @@ struct type_casting_traits<int, float> {
};
#ifndef EIGEN_VECTORIZE_AVX512
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<bfloat16, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<float, bfloat16> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
#endif // EIGEN_VECTORIZE_AVX512
template<> EIGEN_STRONG_INLINE Packet8i pcast<Packet8f, Packet8i>(const Packet8f& a) {
return _mm256_cvttps_epi32(a);
@@ -52,36 +92,22 @@ template<> EIGEN_STRONG_INLINE Packet8f preinterpret<Packet8f,Packet8i>(const Pa
return _mm256_castsi256_ps(a);
}
#ifndef EIGEN_VECTORIZE_AVX512
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8h, Packet8f>(const Packet8h& a) {
return half2float(a);
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
#endif // EIGEN_VECTORIZE_AVX512
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8bf, Packet8f>(const Packet8bf& a) {
return Bf16ToF32(a);
}
template<> EIGEN_STRONG_INLINE Packet8h pcast<Packet8f, Packet8h>(const Packet8f& a) {
return float2half(a);
}
template<> EIGEN_STRONG_INLINE Packet8bf pcast<Packet8f, Packet8bf>(const Packet8f& a) {
return F32ToBf16(a);
}
} // end namespace internal
} // end namespace Eigen

View File

@@ -37,17 +37,19 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = EIGEN_HAS_AVX512_MATH,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasReduxp = 0
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet8cf> {
typedef std::complex<float> type;
typedef Packet4cf half;
typedef Packet16f as_real;
enum {
size = 8,
alignment=unpacket_traits<Packet16f>::alignment,
@@ -55,11 +57,9 @@ template<> struct unpacket_traits<Packet8cf> {
masked_load_available=false,
masked_store_available=false
};
typedef Packet4cf half;
};
template<> EIGEN_STRONG_INLINE Packet8cf ptrue<Packet8cf>(const Packet8cf& a) { return Packet8cf(ptrue(Packet16f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet8cf pnot<Packet8cf>(const Packet8cf& a) { return Packet8cf(pnot(Packet16f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet8cf padd<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf psub<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_sub_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf pnegate(const Packet8cf& a)
@@ -97,7 +97,9 @@ template<> EIGEN_STRONG_INLINE Packet8cf ploadu<Packet8cf>(const std::complex<fl
template<> EIGEN_STRONG_INLINE Packet8cf pset1<Packet8cf>(const std::complex<float>& from)
{
return Packet8cf(_mm512_castpd_ps(pload1<Packet8d>((const double*)(const void*)&from)));
const float re = std::real(from);
const float im = std::imag(from);
return Packet8cf(_mm512_set_ps(im, re, im, re, im, re, im, re, im, re, im, re, im, re, im, re));
}
template<> EIGEN_STRONG_INLINE Packet8cf ploaddup<Packet8cf>(const std::complex<float>* from)
@@ -153,58 +155,11 @@ EIGEN_STRONG_INLINE Packet4cf predux_half_dowto4<Packet8cf>(const Packet8cf& a)
return Packet4cf(res);
}
template<int Offset>
struct palign_impl<Offset,Packet8cf>
{
static EIGEN_STRONG_INLINE void run(Packet8cf& first, const Packet8cf& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet16f>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, false,true>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, true,false>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, true,true>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet8cf,Packet16f)
template<> EIGEN_STRONG_INLINE Packet8cf pdiv<Packet8cf>(const Packet8cf& a, const Packet8cf& b)
{
Packet8cf num = pmul(a, pconj(b));
__m512 tmp = _mm512_mul_ps(b.v, b.v);
__m512 tmp2 = _mm512_shuffle_ps(tmp,tmp,0xB1);
__m512 denom = _mm512_add_ps(tmp, tmp2);
return Packet8cf(_mm512_div_ps(num.v, denom));
return pdiv_complex(a, b);
}
template<> EIGEN_STRONG_INLINE Packet8cf pcplxflip<Packet8cf>(const Packet8cf& x)
@@ -235,17 +190,19 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasSqrt = EIGEN_HAS_AVX512_MATH,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasReduxp = 0
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet4cd> {
typedef std::complex<double> type;
typedef Packet2cd half;
typedef Packet8d as_real;
enum {
size = 4,
alignment = unpacket_traits<Packet8d>::alignment,
@@ -253,7 +210,6 @@ template<> struct unpacket_traits<Packet4cd> {
masked_load_available=false,
masked_store_available=false
};
typedef Packet2cd half;
};
template<> EIGEN_STRONG_INLINE Packet4cd padd<Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(_mm512_add_pd(a.v,b.v)); }
@@ -277,7 +233,6 @@ template<> EIGEN_STRONG_INLINE Packet4cd pmul<Packet4cd>(const Packet4cd& a, con
}
template<> EIGEN_STRONG_INLINE Packet4cd ptrue<Packet4cd>(const Packet4cd& a) { return Packet4cd(ptrue(Packet8d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cd pnot<Packet4cd>(const Packet4cd& a) { return Packet4cd(pnot(Packet8d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cd pand <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pand(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd por <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(por(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd pxor <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pxor(a.v,b.v)); }
@@ -296,11 +251,7 @@ template<> EIGEN_STRONG_INLINE Packet4cd ploadu<Packet4cd>(const std::complex<do
template<> EIGEN_STRONG_INLINE Packet4cd pset1<Packet4cd>(const std::complex<double>& from)
{
#ifdef EIGEN_VECTORIZE_AVX512DQ
return Packet4cd(_mm512_broadcast_f64x2(pset1<Packet1cd>(from).v));
#else
return Packet4cd(_mm512_castps_pd(_mm512_broadcast_f32x4( _mm_castpd_ps(pset1<Packet1cd>(from).v))));
#endif
}
template<> EIGEN_STRONG_INLINE Packet4cd ploaddup<Packet4cd>(const std::complex<double>* from) {
@@ -337,7 +288,7 @@ template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet4cd>(const Pack
}
template<> EIGEN_STRONG_INLINE Packet4cd preverse(const Packet4cd& a) {
return Packet4cd(_mm512_shuffle_f64x2(a.v, a.v, EIGEN_SSE_SHUFFLE_MASK(3,2,1,0)));
return Packet4cd(_mm512_shuffle_f64x2(a.v, a.v, (shuffle_mask<3,2,1,0>::mask)));
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet4cd>(const Packet4cd& a)
@@ -352,57 +303,11 @@ template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet4cd>(const
Packet2cd(_mm512_extractf64x4_pd(a.v,1))));
}
template<int Offset>
struct palign_impl<Offset,Packet4cd>
{
static EIGEN_STRONG_INLINE void run(Packet4cd& first, const Packet4cd& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet8d>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet4cd, Packet4cd, false,true>
{
EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet4cd, Packet4cd, true,false>
{
EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet4cd, Packet4cd, true,true>
{
EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet4cd,Packet8d)
template<> EIGEN_STRONG_INLINE Packet4cd pdiv<Packet4cd>(const Packet4cd& a, const Packet4cd& b)
{
Packet4cd num = pmul(a, pconj(b));
__m512d tmp = _mm512_mul_pd(b.v, b.v);
__m512d denom = padd(_mm512_permute_pd(tmp,0x55), tmp);
return Packet4cd(_mm512_div_pd(num.v, denom));
return pdiv_complex(a, b);
}
template<> EIGEN_STRONG_INLINE Packet4cd pcplxflip<Packet4cd>(const Packet4cd& x)
@@ -450,43 +355,30 @@ ptranspose(PacketBlock<Packet8cf,8>& kernel) {
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<Packet4cd,4>& kernel) {
__m512d T0 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, EIGEN_SSE_SHUFFLE_MASK(0,1,0,1)); // [a0 a1 b0 b1]
__m512d T1 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, EIGEN_SSE_SHUFFLE_MASK(2,3,2,3)); // [a2 a3 b2 b3]
__m512d T2 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, EIGEN_SSE_SHUFFLE_MASK(0,1,0,1)); // [c0 c1 d0 d1]
__m512d T3 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, EIGEN_SSE_SHUFFLE_MASK(2,3,2,3)); // [c2 c3 d2 d3]
__m512d T0 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, (shuffle_mask<0,1,0,1>::mask)); // [a0 a1 b0 b1]
__m512d T1 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, (shuffle_mask<2,3,2,3>::mask)); // [a2 a3 b2 b3]
__m512d T2 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, (shuffle_mask<0,1,0,1>::mask)); // [c0 c1 d0 d1]
__m512d T3 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, (shuffle_mask<2,3,2,3>::mask)); // [c2 c3 d2 d3]
kernel.packet[3] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, EIGEN_SSE_SHUFFLE_MASK(1,3,1,3))); // [a3 b3 c3 d3]
kernel.packet[2] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, EIGEN_SSE_SHUFFLE_MASK(0,2,0,2))); // [a2 b2 c2 d2]
kernel.packet[1] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, EIGEN_SSE_SHUFFLE_MASK(1,3,1,3))); // [a1 b1 c1 d1]
kernel.packet[0] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, EIGEN_SSE_SHUFFLE_MASK(0,2,0,2))); // [a0 b0 c0 d0]
kernel.packet[3] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, (shuffle_mask<1,3,1,3>::mask))); // [a3 b3 c3 d3]
kernel.packet[2] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, (shuffle_mask<0,2,0,2>::mask))); // [a2 b2 c2 d2]
kernel.packet[1] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, (shuffle_mask<1,3,1,3>::mask))); // [a1 b1 c1 d1]
kernel.packet[0] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, (shuffle_mask<0,2,0,2>::mask))); // [a0 b0 c0 d0]
}
template<> EIGEN_STRONG_INLINE Packet8cf pinsertfirst(const Packet8cf& a, std::complex<float> b)
{
Packet2cf tmp = Packet2cf(_mm512_extractf32x4_ps(a.v,0));
tmp = pinsertfirst(tmp, b);
return Packet8cf( _mm512_insertf32x4(a.v, tmp.v, 0) );
#if EIGEN_HAS_AVX512_MATH
template<> EIGEN_STRONG_INLINE Packet4cd psqrt<Packet4cd>(const Packet4cd& a) {
return psqrt_complex<Packet4cd>(a);
}
template<> EIGEN_STRONG_INLINE Packet4cd pinsertfirst(const Packet4cd& a, std::complex<double> b)
{
return Packet4cd(_mm512_castsi512_pd( _mm512_inserti32x4(_mm512_castpd_si512(a.v), _mm_castpd_si128(pset1<Packet1cd>(b).v), 0) ));
template<> EIGEN_STRONG_INLINE Packet8cf psqrt<Packet8cf>(const Packet8cf& a) {
return psqrt_complex<Packet8cf>(a);
}
template<> EIGEN_STRONG_INLINE Packet8cf pinsertlast(const Packet8cf& a, std::complex<float> b)
{
Packet2cf tmp = Packet2cf(_mm512_extractf32x4_ps(a.v,3) );
tmp = pinsertlast(tmp, b);
return Packet8cf( _mm512_insertf32x4(a.v, tmp.v, 3) );
}
template<> EIGEN_STRONG_INLINE Packet4cd pinsertlast(const Packet4cd& a, std::complex<double> b)
{
return Packet4cd(_mm512_castsi512_pd( _mm512_inserti32x4(_mm512_castpd_si512(a.v), _mm_castpd_si128(pset1<Packet1cd>(b).v), 3) ));
}
#endif
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_COMPLEX_AVX512_H

View File

@@ -14,8 +14,7 @@ namespace Eigen {
namespace internal {
// Disable the code for older versions of gcc that don't support many of the required avx512 instrinsics.
#if EIGEN_GNUC_AT_LEAST(5, 3) || EIGEN_COMP_CLANG || EIGEN_COMP_MSVC >= 1923
#if EIGEN_HAS_AVX512_MATH
#define _EIGEN_DECLARE_CONST_Packet16f(NAME, X) \
const Packet16f p16f_##NAME = pset1<Packet16f>(X)
@@ -29,106 +28,41 @@ namespace internal {
#define _EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(NAME, X) \
const Packet8d p8d_##NAME = _mm512_castsi512_pd(_mm512_set1_epi64(X))
// Natural logarithm
// Computes log(x) as log(2^e * m) = C*e + log(m), where the constant C =log(2)
// and m is in the range [sqrt(1/2),sqrt(2)). In this range, the logarithm can
// be easily approximated by a polynomial centered on m=1 for stability.
#if defined(EIGEN_VECTORIZE_AVX512DQ)
#define _EIGEN_DECLARE_CONST_Packet16bf(NAME, X) \
const Packet16bf p16bf_##NAME = pset1<Packet16bf>(X)
#define _EIGEN_DECLARE_CONST_Packet16bf_FROM_INT(NAME, X) \
const Packet16bf p16bf_##NAME = preinterpret<Packet16bf,Packet16i>(pset1<Packet16i>(X))
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
plog<Packet16f>(const Packet16f& _x) {
Packet16f x = _x;
_EIGEN_DECLARE_CONST_Packet16f(1, 1.0f);
_EIGEN_DECLARE_CONST_Packet16f(half, 0.5f);
_EIGEN_DECLARE_CONST_Packet16f(126f, 126.0f);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(inv_mant_mask, ~0x7f800000);
// The smallest non denormalized float number.
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(min_norm_pos, 0x00800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(minus_inf, 0xff800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(pos_inf, 0x7f800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(nan, 0x7fc00000);
// Polynomial coefficients.
_EIGEN_DECLARE_CONST_Packet16f(cephes_SQRTHF, 0.707106781186547524f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p0, 7.0376836292E-2f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p1, -1.1514610310E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p2, 1.1676998740E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p3, -1.2420140846E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p4, +1.4249322787E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p5, -1.6668057665E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p6, +2.0000714765E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p7, -2.4999993993E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_p8, +3.3333331174E-1f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_q1, -2.12194440e-4f);
_EIGEN_DECLARE_CONST_Packet16f(cephes_log_q2, 0.693359375f);
// invalid_mask is set to true when x is NaN
__mmask16 invalid_mask = _mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_NGE_UQ);
__mmask16 iszero_mask = _mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_EQ_OQ);
// Truncate input values to the minimum positive normal.
x = pmax(x, p16f_min_norm_pos);
// Extract the shifted exponents.
Packet16f emm0 = _mm512_cvtepi32_ps(_mm512_srli_epi32(preinterpret<Packet16i,Packet16f>(x), 23));
Packet16f e = _mm512_sub_ps(emm0, p16f_126f);
// Set the exponents to -1, i.e. x are in the range [0.5,1).
x = _mm512_and_ps(x, p16f_inv_mant_mask);
x = _mm512_or_ps(x, p16f_half);
// part2: Shift the inputs from the range [0.5,1) to [sqrt(1/2),sqrt(2))
// and shift by -1. The values are then centered around 0, which improves
// the stability of the polynomial evaluation.
// if( x < SQRTHF ) {
// e -= 1;
// x = x + x - 1.0;
// } else { x = x - 1.0; }
__mmask16 mask = _mm512_cmp_ps_mask(x, p16f_cephes_SQRTHF, _CMP_LT_OQ);
Packet16f tmp = _mm512_mask_blend_ps(mask, _mm512_setzero_ps(), x);
x = psub(x, p16f_1);
e = psub(e, _mm512_mask_blend_ps(mask, _mm512_setzero_ps(), p16f_1));
x = padd(x, tmp);
Packet16f x2 = pmul(x, x);
Packet16f x3 = pmul(x2, x);
// Evaluate the polynomial approximant of degree 8 in three parts, probably
// to improve instruction-level parallelism.
Packet16f y, y1, y2;
y = pmadd(p16f_cephes_log_p0, x, p16f_cephes_log_p1);
y1 = pmadd(p16f_cephes_log_p3, x, p16f_cephes_log_p4);
y2 = pmadd(p16f_cephes_log_p6, x, p16f_cephes_log_p7);
y = pmadd(y, x, p16f_cephes_log_p2);
y1 = pmadd(y1, x, p16f_cephes_log_p5);
y2 = pmadd(y2, x, p16f_cephes_log_p8);
y = pmadd(y, x3, y1);
y = pmadd(y, x3, y2);
y = pmul(y, x3);
// Add the logarithm of the exponent back to the result of the interpolation.
y1 = pmul(e, p16f_cephes_log_q1);
tmp = pmul(x2, p16f_half);
y = padd(y, y1);
x = psub(x, tmp);
y2 = pmul(e, p16f_cephes_log_q2);
x = padd(x, y);
x = padd(x, y2);
__mmask16 pos_inf_mask = _mm512_cmp_ps_mask(_x,p16f_pos_inf,_CMP_EQ_OQ);
// Filter out invalid inputs, i.e.:
// - negative arg will be NAN,
// - 0 will be -INF.
// - +INF will be +INF
return _mm512_mask_blend_ps(iszero_mask,
_mm512_mask_blend_ps(invalid_mask,
_mm512_mask_blend_ps(pos_inf_mask,x,p16f_pos_inf),
p16f_nan),
p16f_minus_inf);
return plog_float(_x);
}
#endif
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
plog<Packet8d>(const Packet8d& _x) {
return plog_double(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog)
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
plog2<Packet16f>(const Packet16f& _x) {
return plog2_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
plog2<Packet8d>(const Packet8d& _x) {
return plog2_double(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog2)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog2)
// Exponential function. Works by writing "x = m*log(2) + r" where
// "m = floor(x/log(2)+1/2)" and "r" is the remainder. The result is then
@@ -164,17 +98,17 @@ pexp<Packet16f>(const Packet16f& _x) {
_EIGEN_DECLARE_CONST_Packet16f(nln2, -0.6931471805599453f);
Packet16f r = _mm512_fmadd_ps(m, p16f_nln2, x);
Packet16f r2 = pmul(r, r);
Packet16f r3 = pmul(r2, r);
// TODO(gonnet): Split into odd/even polynomials and try to exploit
// instruction-level parallelism.
Packet16f y = p16f_cephes_exp_p0;
y = pmadd(y, r, p16f_cephes_exp_p1);
y = pmadd(y, r, p16f_cephes_exp_p2);
y = pmadd(y, r, p16f_cephes_exp_p3);
y = pmadd(y, r, p16f_cephes_exp_p4);
y = pmadd(y, r, p16f_cephes_exp_p5);
y = pmadd(y, r2, r);
y = padd(y, p16f_1);
// Evaluate the polynomial approximant,improved by instruction-level parallelism.
Packet16f y, y1, y2;
y = pmadd(p16f_cephes_exp_p0, r, p16f_cephes_exp_p1);
y1 = pmadd(p16f_cephes_exp_p3, r, p16f_cephes_exp_p4);
y2 = padd(r, p16f_1);
y = pmadd(y, r, p16f_cephes_exp_p2);
y1 = pmadd(y1, r, p16f_cephes_exp_p5);
y = pmadd(y, r3, y1);
y = pmadd(y, r2, y2);
// Build emm0 = 2^m.
Packet16i emm0 = _mm512_cvttps_epi32(padd(m, p16f_127));
@@ -184,75 +118,40 @@ pexp<Packet16f>(const Packet16f& _x) {
return pmax(pmul(y, _mm512_castsi512_ps(emm0)), _x);
}
/*template <>
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
pexp<Packet8d>(const Packet8d& _x) {
Packet8d x = _x;
return pexp_double(_x);
}
_EIGEN_DECLARE_CONST_Packet8d(1, 1.0);
_EIGEN_DECLARE_CONST_Packet8d(2, 2.0);
F16_PACKET_FUNCTION(Packet16f, Packet16h, pexp)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexp)
_EIGEN_DECLARE_CONST_Packet8d(exp_hi, 709.437);
_EIGEN_DECLARE_CONST_Packet8d(exp_lo, -709.436139303);
template <>
EIGEN_STRONG_INLINE Packet16h pfrexp(const Packet16h& a, Packet16h& exponent) {
Packet16f fexponent;
const Packet16h out = float2half(pfrexp<Packet16f>(half2float(a), fexponent));
exponent = float2half(fexponent);
return out;
}
_EIGEN_DECLARE_CONST_Packet8d(cephes_LOG2EF, 1.4426950408889634073599);
template <>
EIGEN_STRONG_INLINE Packet16h pldexp(const Packet16h& a, const Packet16h& exponent) {
return float2half(pldexp<Packet16f>(half2float(a), half2float(exponent)));
}
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_p0, 1.26177193074810590878e-4);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_p1, 3.02994407707441961300e-2);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_p2, 9.99999999999999999910e-1);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_q0, 3.00198505138664455042e-6);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_q1, 2.52448340349684104192e-3);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_q2, 2.27265548208155028766e-1);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_q3, 2.00000000000000000009e0);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_C1, 0.693145751953125);
_EIGEN_DECLARE_CONST_Packet8d(cephes_exp_C2, 1.42860682030941723212e-6);
// clamp x
x = pmax(pmin(x, p8d_exp_hi), p8d_exp_lo);
// Express exp(x) as exp(g + n*log(2)).
const Packet8d n =
_mm512_mul_round_pd(p8d_cephes_LOG2EF, x, _MM_FROUND_TO_NEAREST_INT);
// Get the remainder modulo log(2), i.e. the "g" described above. Subtract
// n*log(2) out in two steps, i.e. n*C1 + n*C2, C1+C2=log2 to get the last
// digits right.
const Packet8d nC1 = pmul(n, p8d_cephes_exp_C1);
const Packet8d nC2 = pmul(n, p8d_cephes_exp_C2);
x = psub(x, nC1);
x = psub(x, nC2);
const Packet8d x2 = pmul(x, x);
// Evaluate the numerator polynomial of the rational interpolant.
Packet8d px = p8d_cephes_exp_p0;
px = pmadd(px, x2, p8d_cephes_exp_p1);
px = pmadd(px, x2, p8d_cephes_exp_p2);
px = pmul(px, x);
// Evaluate the denominator polynomial of the rational interpolant.
Packet8d qx = p8d_cephes_exp_q0;
qx = pmadd(qx, x2, p8d_cephes_exp_q1);
qx = pmadd(qx, x2, p8d_cephes_exp_q2);
qx = pmadd(qx, x2, p8d_cephes_exp_q3);
// I don't really get this bit, copied from the SSE2 routines, so...
// TODO(gonnet): Figure out what is going on here, perhaps find a better
// rational interpolant?
x = _mm512_div_pd(px, psub(qx, px));
x = pmadd(p8d_2, x, p8d_1);
// Build e=2^n.
const Packet8d e = _mm512_castsi512_pd(_mm512_slli_epi64(
_mm512_add_epi64(_mm512_cvtpd_epi64(n), _mm512_set1_epi64(1023)), 52));
// Construct the result 2^n * exp(g) = e * x. The max is used to catch
// non-finite values in the input.
return pmax(pmul(x, e), _x);
}*/
template <>
EIGEN_STRONG_INLINE Packet16bf pfrexp(const Packet16bf& a, Packet16bf& exponent) {
Packet16f fexponent;
const Packet16bf out = F32ToBf16(pfrexp<Packet16f>(Bf16ToF32(a), fexponent));
exponent = F32ToBf16(fexponent);
return out;
}
template <>
EIGEN_STRONG_INLINE Packet16bf pldexp(const Packet16bf& a, const Packet16bf& exponent) {
return F32ToBf16(pldexp<Packet16f>(Bf16ToF32(a), Bf16ToF32(exponent)));
}
// Functions for sqrt.
// The EIGEN_FAST_MATH version uses the _mm_rsqrt_ps approximation and one step
@@ -303,12 +202,16 @@ template <>
EIGEN_STRONG_INLINE Packet16f psqrt<Packet16f>(const Packet16f& x) {
return _mm512_sqrt_ps(x);
}
template <>
EIGEN_STRONG_INLINE Packet8d psqrt<Packet8d>(const Packet8d& x) {
return _mm512_sqrt_pd(x);
}
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, psqrt)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, psqrt)
// prsqrt for float.
#if defined(EIGEN_VECTORIZE_AVX512ER)
@@ -316,7 +219,6 @@ template <>
EIGEN_STRONG_INLINE Packet16f prsqrt<Packet16f>(const Packet16f& x) {
return _mm512_rsqrt28_ps(x);
}
#elif EIGEN_FAST_MATH
template <>
@@ -332,7 +234,7 @@ prsqrt<Packet16f>(const Packet16f& _x) {
__mmask16 inf_mask = _mm512_cmp_ps_mask(_x, p16f_inf, _CMP_EQ_OQ);
__mmask16 not_pos_mask = _mm512_cmp_ps_mask(_x, _mm512_setzero_ps(), _CMP_LE_OQ);
__mmask16 not_finite_pos_mask = not_pos_mask | inf_mask;
// Compute an approximate result using the rsqrt intrinsic, forcing +inf
// for denormals for consistency with AVX and SSE implementations.
Packet16f y_approx = _mm512_rsqrt14_ps(_x);
@@ -347,8 +249,7 @@ prsqrt<Packet16f>(const Packet16f& _x) {
// For other arguments, choose the output of the intrinsic. This will
// return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(0) = +inf.
return _mm512_mask_blend_ps(not_finite_pos_mask, y_newton, y_approx);
}
}
#else
template <>
@@ -356,9 +257,11 @@ EIGEN_STRONG_INLINE Packet16f prsqrt<Packet16f>(const Packet16f& x) {
_EIGEN_DECLARE_CONST_Packet16f(one, 1.0f);
return _mm512_div_ps(p16f_one, _mm512_sqrt_ps(x));
}
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, prsqrt)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, prsqrt)
// prsqrt for double.
#if EIGEN_FAST_MATH
template <>
@@ -406,19 +309,23 @@ EIGEN_STRONG_INLINE Packet8d prsqrt<Packet8d>(const Packet8d& x) {
}
#endif
#if defined(EIGEN_VECTORIZE_AVX512DQ)
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16f plog1p<Packet16f>(const Packet16f& _x) {
return generic_plog1p(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, plog1p)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog1p)
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16f pexpm1<Packet16f>(const Packet16f& _x) {
return generic_expm1(_x);
}
#endif
#endif
F16_PACKET_FUNCTION(Packet16f, Packet16h, pexpm1)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexpm1)
#endif // EIGEN_HAS_AVX512_MATH
template <>
@@ -439,6 +346,14 @@ ptanh<Packet16f>(const Packet16f& _x) {
return internal::generic_fast_tanh_float(_x);
}
F16_PACKET_FUNCTION(Packet16f, Packet16h, psin)
F16_PACKET_FUNCTION(Packet16f, Packet16h, pcos)
F16_PACKET_FUNCTION(Packet16f, Packet16h, ptanh)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, psin)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pcos)
BF16_PACKET_FUNCTION(Packet16f, Packet16bf, ptanh)
} // end namespace internal
} // end namespace Eigen

File diff suppressed because it is too large Load Diff

View File

@@ -14,6 +14,22 @@ namespace Eigen {
namespace internal {
template<> EIGEN_STRONG_INLINE Packet16i pcast<Packet16f, Packet16i>(const Packet16f& a) {
return _mm512_cvttps_epi32(a);
}
template<> EIGEN_STRONG_INLINE Packet16f pcast<Packet16i, Packet16f>(const Packet16i& a) {
return _mm512_cvtepi32_ps(a);
}
template<> EIGEN_STRONG_INLINE Packet16i preinterpret<Packet16i, Packet16f>(const Packet16f& a) {
return _mm512_castps_si512(a);
}
template<> EIGEN_STRONG_INLINE Packet16f preinterpret<Packet16f, Packet16i>(const Packet16i& a) {
return _mm512_castsi512_ps(a);
}
template <>
struct type_casting_traits<half, float> {
enum {
@@ -40,6 +56,32 @@ template<> EIGEN_STRONG_INLINE Packet16h pcast<Packet16f, Packet16h>(const Packe
return float2half(a);
}
template <>
struct type_casting_traits<bfloat16, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16f pcast<Packet16bf, Packet16f>(const Packet16bf& a) {
return Bf16ToF32(a);
}
template <>
struct type_casting_traits<float, bfloat16> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16bf pcast<Packet16f, Packet16bf>(const Packet16f& a) {
return F32ToBf16(a);
}
} // end namespace internal
} // end namespace Eigen

View File

@@ -15,8 +15,10 @@ namespace Eigen {
namespace internal {
static Packet4ui p4ui_CONJ_XOR = vec_mergeh((Packet4ui)p4i_ZERO, (Packet4ui)p4f_MZERO);//{ 0x00000000, 0x80000000, 0x00000000, 0x80000000 };
#ifdef __VSX__
inline Packet4ui p4ui_CONJ_XOR() {
return vec_mergeh((Packet4ui)p4i_ZERO, (Packet4ui)p4f_MZERO);//{ 0x00000000, 0x80000000, 0x00000000, 0x80000000 };
}
#ifdef EIGEN_VECTORIZE_VSX
#if defined(_BIG_ENDIAN)
static Packet2ul p2ul_CONJ_XOR1 = (Packet2ul) vec_sld((Packet4ui) p2d_MZERO, (Packet4ui) p2l_ZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 };
static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2l_ZERO, (Packet4ui) p2d_MZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 };
@@ -29,8 +31,54 @@ static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2d_MZERO, (P
//---------- float ----------
struct Packet2cf
{
EIGEN_STRONG_INLINE explicit Packet2cf() : v(p4f_ZERO) {}
EIGEN_STRONG_INLINE explicit Packet2cf() {}
EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {}
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b)
{
Packet4f v1, v2;
// Permute and multiply the real parts of a and b
v1 = vec_perm(a.v, a.v, p16uc_PSET32_WODD);
// Get the imaginary parts of a
v2 = vec_perm(a.v, a.v, p16uc_PSET32_WEVEN);
// multiply a_re * b
v1 = vec_madd(v1, b.v, p4f_ZERO);
// multiply a_im * b and get the conjugate result
v2 = vec_madd(v2, b.v, p4f_ZERO);
v2 = reinterpret_cast<Packet4f>(pxor(v2, reinterpret_cast<Packet4f>(p4ui_CONJ_XOR())));
// permute back to a proper order
v2 = vec_perm(v2, v2, p16uc_COMPLEX32_REV);
return Packet2cf(padd<Packet4f>(v1, v2));
}
EIGEN_STRONG_INLINE Packet2cf& operator*=(const Packet2cf& b) {
v = pmul(Packet2cf(*this), b).v;
return *this;
}
EIGEN_STRONG_INLINE Packet2cf operator*(const Packet2cf& b) const {
return Packet2cf(*this) *= b;
}
EIGEN_STRONG_INLINE Packet2cf& operator+=(const Packet2cf& b) {
v = padd(v, b.v);
return *this;
}
EIGEN_STRONG_INLINE Packet2cf operator+(const Packet2cf& b) const {
return Packet2cf(*this) += b;
}
EIGEN_STRONG_INLINE Packet2cf& operator-=(const Packet2cf& b) {
v = psub(v, b.v);
return *this;
}
EIGEN_STRONG_INLINE Packet2cf operator-(const Packet2cf& b) const {
return Packet2cf(*this) -= b;
}
EIGEN_STRONG_INLINE Packet2cf operator-(void) const {
return Packet2cf(-v);
}
Packet4f v;
};
@@ -38,6 +86,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
{
typedef Packet2cf type;
typedef Packet2cf half;
typedef Packet4f as_real;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
@@ -53,14 +102,15 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
#ifdef __VSX__
HasSqrt = 1,
#ifdef EIGEN_VECTORIZE_VSX
HasBlend = 1,
#endif
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; typedef Packet4f as_real; };
template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
{
@@ -80,6 +130,25 @@ template<> EIGEN_STRONG_INLINE Packet2cf ploaddup<Packet2cf>(const std::complex<
template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { pstore((float*)to, from.v); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { pstoreu((float*)to, from.v); }
EIGEN_STRONG_INLINE Packet2cf pload2(const std::complex<float>& from0, const std::complex<float>& from1)
{
Packet4f res0, res1;
#ifdef EIGEN_VECTORIZE_VSX
__asm__ ("lxsdx %x0,%y1" : "=wa" (res0) : "Z" (from0));
__asm__ ("lxsdx %x0,%y1" : "=wa" (res1) : "Z" (from1));
#ifdef _BIG_ENDIAN
__asm__ ("xxpermdi %x0, %x1, %x2, 0" : "=wa" (res0) : "wa" (res0), "wa" (res1));
#else
__asm__ ("xxpermdi %x0, %x2, %x1, 0" : "=wa" (res0) : "wa" (res0), "wa" (res1));
#endif
#else
*reinterpret_cast<std::complex<float> *>(&res0) = from0;
*reinterpret_cast<std::complex<float> *>(&res1) = from1;
res0 = vec_perm(res0, res1, p16uc_TRANSPOSE64_HI);
#endif
return Packet2cf(res0);
}
template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather<std::complex<float>, Packet2cf>(const std::complex<float>* from, Index stride)
{
EIGEN_ALIGN16 std::complex<float> af[2];
@@ -98,26 +167,7 @@ template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet2cf
template<> EIGEN_STRONG_INLINE Packet2cf padd<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(a.v + b.v); }
template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(a.v - b.v); }
template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate(a.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { return Packet2cf(pxor<Packet4f>(a.v, reinterpret_cast<Packet4f>(p4ui_CONJ_XOR))); }
template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
Packet4f v1, v2;
// Permute and multiply the real parts of a and b
v1 = vec_perm(a.v, a.v, p16uc_PSET32_WODD);
// Get the imaginary parts of a
v2 = vec_perm(a.v, a.v, p16uc_PSET32_WEVEN);
// multiply a_re * b
v1 = vec_madd(v1, b.v, p4f_ZERO);
// multiply a_im * b and get the conjugate result
v2 = vec_madd(v2, b.v, p4f_ZERO);
v2 = reinterpret_cast<Packet4f>(pxor(v2, reinterpret_cast<Packet4f>(p4ui_CONJ_XOR)));
// permute back to a proper order
v2 = vec_perm(v2, v2, p16uc_COMPLEX32_REV);
return Packet2cf(padd<Packet4f>(v1, v2));
}
template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { return Packet2cf(pxor<Packet4f>(a.v, reinterpret_cast<Packet4f>(p4ui_CONJ_XOR()))); }
template<> EIGEN_STRONG_INLINE Packet2cf pand <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pand<Packet4f>(a.v, b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cf por <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(por<Packet4f>(a.v, b.v)); }
@@ -149,22 +199,6 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packe
return pfirst<Packet2cf>(Packet2cf(b));
}
template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
{
Packet4f b1, b2;
#ifdef _BIG_ENDIAN
b1 = vec_sld(vecs[0].v, vecs[1].v, 8);
b2 = vec_sld(vecs[1].v, vecs[0].v, 8);
#else
b1 = vec_sld(vecs[1].v, vecs[0].v, 8);
b2 = vec_sld(vecs[0].v, vecs[1].v, 8);
#endif
b2 = vec_sld(b2, b2, 8);
b2 = padd<Packet4f>(b1, b2);
return Packet2cf(b2);
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
{
Packet4f b;
@@ -175,63 +209,11 @@ template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const P
return pfirst<Packet2cf>(prod);
}
template<int Offset>
struct palign_impl<Offset,Packet2cf>
{
static EIGEN_STRONG_INLINE void run(Packet2cf& first, const Packet2cf& second)
{
if (Offset==1)
{
#ifdef _BIG_ENDIAN
first.v = vec_sld(first.v, second.v, 8);
#else
first.v = vec_sld(second.v, first.v, 8);
#endif
}
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
{
EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f)
template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
{
// TODO optimize it for AltiVec
Packet2cf res = conj_helper<Packet2cf,Packet2cf,false,true>().pmul(a, b);
Packet4f s = pmul<Packet4f>(b.v, b.v);
return Packet2cf(pdiv(res.v, padd<Packet4f>(s, vec_perm(s, s, p16uc_COMPLEX32_REV))));
return pdiv_complex(a, b);
}
template<> EIGEN_STRONG_INLINE Packet2cf pcplxflip<Packet2cf>(const Packet2cf& x)
@@ -251,20 +233,70 @@ template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packe
return Packet2cf(vec_and(eq, vec_perm(eq, eq, p16uc_COMPLEX32_REV)));
}
#ifdef __VSX__
#ifdef EIGEN_VECTORIZE_VSX
template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, const Packet2cf& thenPacket, const Packet2cf& elsePacket) {
Packet2cf result;
result.v = reinterpret_cast<Packet4f>(pblend<Packet2d>(ifPacket, reinterpret_cast<Packet2d>(thenPacket.v), reinterpret_cast<Packet2d>(elsePacket.v)));
return result;
}
template<> EIGEN_STRONG_INLINE Packet2cf psqrt<Packet2cf>(const Packet2cf& a)
{
return psqrt_complex<Packet2cf>(a);
}
#endif
//---------- double ----------
#ifdef __VSX__
#ifdef EIGEN_VECTORIZE_VSX
struct Packet1cd
{
EIGEN_STRONG_INLINE Packet1cd() {}
EIGEN_STRONG_INLINE explicit Packet1cd(const Packet2d& a) : v(a) {}
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b)
{
Packet2d a_re, a_im, v1, v2;
// Permute and multiply the real parts of a and b
a_re = vec_perm(a.v, a.v, p16uc_PSET64_HI);
// Get the imaginary parts of a
a_im = vec_perm(a.v, a.v, p16uc_PSET64_LO);
// multiply a_re * b
v1 = vec_madd(a_re, b.v, p2d_ZERO);
// multiply a_im * b and get the conjugate result
v2 = vec_madd(a_im, b.v, p2d_ZERO);
v2 = reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4ui>(v2), reinterpret_cast<Packet4ui>(v2), 8));
v2 = pxor(v2, reinterpret_cast<Packet2d>(p2ul_CONJ_XOR1));
return Packet1cd(padd<Packet2d>(v1, v2));
}
EIGEN_STRONG_INLINE Packet1cd& operator*=(const Packet1cd& b) {
v = pmul(Packet1cd(*this), b).v;
return *this;
}
EIGEN_STRONG_INLINE Packet1cd operator*(const Packet1cd& b) const {
return Packet1cd(*this) *= b;
}
EIGEN_STRONG_INLINE Packet1cd& operator+=(const Packet1cd& b) {
v = padd(v, b.v);
return *this;
}
EIGEN_STRONG_INLINE Packet1cd operator+(const Packet1cd& b) const {
return Packet1cd(*this) += b;
}
EIGEN_STRONG_INLINE Packet1cd& operator-=(const Packet1cd& b) {
v = psub(v, b.v);
return *this;
}
EIGEN_STRONG_INLINE Packet1cd operator-(const Packet1cd& b) const {
return Packet1cd(*this) -= b;
}
EIGEN_STRONG_INLINE Packet1cd operator-(void) const {
return Packet1cd(-v);
}
Packet2d v;
};
@@ -272,6 +304,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
{
typedef Packet1cd type;
typedef Packet1cd half;
typedef Packet2d as_real;
enum {
Vectorizable = 1,
AlignedOnScalar = 0,
@@ -287,11 +320,12 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSqrt = 1,
HasSetLinear = 0
};
};
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; typedef Packet2d as_real; };
template<> EIGEN_STRONG_INLINE Packet1cd pload <Packet1cd>(const std::complex<double>* from) { return Packet1cd(pload<Packet2d>((const double*)from)); }
template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from) { return Packet1cd(ploadu<Packet2d>((const double*)from)); }
@@ -301,19 +335,13 @@ template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<
template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<double>& from)
{ /* here we really have to use unaligned loads :( */ return ploadu<Packet1cd>(&from); }
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, Index stride)
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, Index)
{
EIGEN_ALIGN16 std::complex<double> af[2];
af[0] = from[0*stride];
af[1] = from[1*stride];
return pload<Packet1cd>(af);
return pload<Packet1cd>(from);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, Index stride)
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, Index)
{
EIGEN_ALIGN16 std::complex<double> af[2];
pstore<std::complex<double> >(af, from);
to[0*stride] = af[0];
to[1*stride] = af[1];
pstore<std::complex<double> >(to, from);
}
template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(a.v + b.v); }
@@ -321,24 +349,6 @@ template<> EIGEN_STRONG_INLINE Packet1cd psub<Packet1cd>(const Packet1cd& a, con
template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate(Packet2d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { return Packet1cd(pxor(a.v, reinterpret_cast<Packet2d>(p2ul_CONJ_XOR2))); }
template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
Packet2d a_re, a_im, v1, v2;
// Permute and multiply the real parts of a and b
a_re = vec_perm(a.v, a.v, p16uc_PSET64_HI);
// Get the imaginary parts of a
a_im = vec_perm(a.v, a.v, p16uc_PSET64_LO);
// multiply a_re * b
v1 = vec_madd(a_re, b.v, p2d_ZERO);
// multiply a_im * b and get the conjugate result
v2 = vec_madd(a_im, b.v, p2d_ZERO);
v2 = reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4ui>(v2), reinterpret_cast<Packet4ui>(v2), 8));
v2 = pxor(v2, reinterpret_cast<Packet2d>(p2ul_CONJ_XOR1));
return Packet1cd(padd<Packet2d>(v1, v2));
}
template<> EIGEN_STRONG_INLINE Packet1cd pand <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(pand(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd por <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(por(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet1cd pxor <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(pxor(a.v,b.v)); }
@@ -359,61 +369,14 @@ template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Pac
template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; }
template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs) { return vecs[0]; }
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
template<int Offset>
struct palign_impl<Offset,Packet1cd>
{
static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
{
// FIXME is it sure we never have to align a Packet1cd?
// Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16 bytes boundary...
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
{
EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d)
template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
// TODO optimize it for AltiVec
Packet1cd res = conj_helper<Packet1cd,Packet1cd,false,true>().pmul(a,b);
Packet2d s = pmul<Packet2d>(b.v, b.v);
return Packet1cd(pdiv(res.v, padd<Packet2d>(s, vec_perm(s, s, p16uc_REVERSE64))));
return pdiv_complex(a, b);
}
EIGEN_STRONG_INLINE Packet1cd pcplxflip/*<Packet1cd>*/(const Packet1cd& x)
@@ -439,7 +402,12 @@ template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packe
return Packet1cd(vec_and(eq, eq_swapped));
}
#endif // __VSX__
template<> EIGEN_STRONG_INLINE Packet1cd psqrt<Packet1cd>(const Packet1cd& a)
{
return psqrt_complex<Packet1cd>(a);
}
#endif // EIGEN_VECTORIZE_VSX
} // end namespace internal
} // end namespace Eigen

View File

@@ -40,16 +40,14 @@ Packet4f pcos<Packet4f>(const Packet4f& _x)
return pcos_float(_x);
}
#ifdef EIGEN_VECTORIZE_VSX
#ifndef EIGEN_COMP_CLANG
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f prsqrt<Packet4f>(const Packet4f& x)
{
return vec_rsqrt(x);
}
#endif
#ifdef __VSX__
#ifndef EIGEN_COMP_CLANG
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet2d prsqrt<Packet2d>(const Packet2d& x)
{
@@ -57,7 +55,7 @@ Packet2d prsqrt<Packet2d>(const Packet2d& x)
}
#endif
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet4f psqrt<Packet4f>(const Packet4f& x)
{
return vec_sqrt(x);
@@ -69,12 +67,43 @@ Packet2d psqrt<Packet2d>(const Packet2d& x)
return vec_sqrt(x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
#if !EIGEN_COMP_CLANG
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet4f prsqrt<Packet4f>(const Packet4f& x)
{
return pset1<Packet4f>(1.0f) / psqrt<Packet4f>(x);
// vec_rsqrt returns different results from the generic version
// return vec_rsqrt(x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet2d prsqrt<Packet2d>(const Packet2d& x)
{
return pset1<Packet2d>(1.0) / psqrt<Packet2d>(x);
// vec_rsqrt returns different results from the generic version
// return vec_rsqrt(x);
}
#endif
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet2d pexp<Packet2d>(const Packet2d& _x)
{
return pexp_double(_x);
}
#endif
template<> EIGEN_STRONG_INLINE Packet8bf psqrt<Packet8bf> (const Packet8bf& a){
BF16_TO_F32_UNARY_OP_WRAPPER(vec_sqrt, a);
}
template<> EIGEN_STRONG_INLINE Packet8bf prsqrt<Packet8bf> (const Packet8bf& a){
BF16_TO_F32_UNARY_OP_WRAPPER(prsqrt<Packet4f>, a);
}
template<> EIGEN_STRONG_INLINE Packet8bf pexp<Packet8bf> (const Packet8bf& a){
BF16_TO_F32_UNARY_OP_WRAPPER(pexp_float, a);
}
#endif // EIGEN_VECTORIZE_VSX
// Hyperbolic Tangent function.
template <>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,159 @@
//#define EIGEN_POWER_USE_PREFETCH // Use prefetching in gemm routines
#ifdef EIGEN_POWER_USE_PREFETCH
#define EIGEN_POWER_PREFETCH(p) prefetch(p)
#else
#define EIGEN_POWER_PREFETCH(p)
#endif
namespace Eigen {
namespace internal {
template<typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accRows, const Index accCols>
EIGEN_ALWAYS_INLINE void gemm_extra_row(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index row,
Index col,
Index rows,
Index cols,
Index remaining_rows,
const Packet& pAlpha,
const Packet& pMask);
template<typename Scalar, typename Packet, typename DataMapper, typename Index, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void gemm_extra_cols(
const DataMapper& res,
const Scalar* blockA,
const Scalar* blockB,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index offsetB,
Index col,
Index rows,
Index cols,
Index remaining_rows,
const Packet& pAlpha,
const Packet& pMask);
template<typename Packet>
EIGEN_ALWAYS_INLINE Packet bmask(const int remaining_rows);
template<typename Scalar, typename Packet, typename Packetc, typename DataMapper, typename Index, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_ALWAYS_INLINE void gemm_complex_extra_row(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index row,
Index col,
Index rows,
Index cols,
Index remaining_rows,
const Packet& pAlphaReal,
const Packet& pAlphaImag,
const Packet& pMask);
template<typename Scalar, typename Packet, typename Packetc, typename DataMapper, typename Index, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_STRONG_INLINE void gemm_complex_extra_cols(
const DataMapper& res,
const Scalar* blockA,
const Scalar* blockB,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index offsetB,
Index col,
Index rows,
Index cols,
Index remaining_rows,
const Packet& pAlphaReal,
const Packet& pAlphaImag,
const Packet& pMask);
template<typename Scalar, typename Packet>
EIGEN_ALWAYS_INLINE Packet ploadLhs(const Scalar* lhs);
template<typename DataMapper, typename Packet, typename Index, const Index accCols, int StorageOrder, bool Complex, int N>
EIGEN_ALWAYS_INLINE void bload(PacketBlock<Packet,N>& acc, const DataMapper& res, Index row, Index col);
template<typename Packet, int N>
EIGEN_ALWAYS_INLINE void bscale(PacketBlock<Packet,N>& acc, PacketBlock<Packet,N>& accZ, const Packet& pAlpha);
template<typename Packet, int N>
EIGEN_ALWAYS_INLINE void bscalec(PacketBlock<Packet,N>& aReal, PacketBlock<Packet,N>& aImag, const Packet& bReal, const Packet& bImag, PacketBlock<Packet,N>& cReal, PacketBlock<Packet,N>& cImag);
// Grab two decouples real/imaginary PacketBlocks and return two coupled (real/imaginary pairs) PacketBlocks.
template<typename Packet, typename Packetc, int N>
EIGEN_ALWAYS_INLINE void bcouple_common(PacketBlock<Packet,N>& taccReal, PacketBlock<Packet,N>& taccImag, PacketBlock<Packetc, N>& acc1, PacketBlock<Packetc, N>& acc2)
{
acc1.packet[0].v = vec_mergeh(taccReal.packet[0], taccImag.packet[0]);
if (N > 1) {
acc1.packet[1].v = vec_mergeh(taccReal.packet[1], taccImag.packet[1]);
}
if (N > 2) {
acc1.packet[2].v = vec_mergeh(taccReal.packet[2], taccImag.packet[2]);
}
if (N > 3) {
acc1.packet[3].v = vec_mergeh(taccReal.packet[3], taccImag.packet[3]);
}
acc2.packet[0].v = vec_mergel(taccReal.packet[0], taccImag.packet[0]);
if (N > 1) {
acc2.packet[1].v = vec_mergel(taccReal.packet[1], taccImag.packet[1]);
}
if (N > 2) {
acc2.packet[2].v = vec_mergel(taccReal.packet[2], taccImag.packet[2]);
}
if (N > 3) {
acc2.packet[3].v = vec_mergel(taccReal.packet[3], taccImag.packet[3]);
}
}
template<typename Packet, typename Packetc, int N>
EIGEN_ALWAYS_INLINE void bcouple(PacketBlock<Packet,N>& taccReal, PacketBlock<Packet,N>& taccImag, PacketBlock<Packetc,N*2>& tRes, PacketBlock<Packetc, N>& acc1, PacketBlock<Packetc, N>& acc2)
{
bcouple_common<Packet, Packetc, N>(taccReal, taccImag, acc1, acc2);
acc1.packet[0] = padd<Packetc>(tRes.packet[0], acc1.packet[0]);
if (N > 1) {
acc1.packet[1] = padd<Packetc>(tRes.packet[1], acc1.packet[1]);
}
if (N > 2) {
acc1.packet[2] = padd<Packetc>(tRes.packet[2], acc1.packet[2]);
}
if (N > 3) {
acc1.packet[3] = padd<Packetc>(tRes.packet[3], acc1.packet[3]);
}
acc2.packet[0] = padd<Packetc>(tRes.packet[0+N], acc2.packet[0]);
if (N > 1) {
acc2.packet[1] = padd<Packetc>(tRes.packet[1+N], acc2.packet[1]);
}
if (N > 2) {
acc2.packet[2] = padd<Packetc>(tRes.packet[2+N], acc2.packet[2]);
}
if (N > 3) {
acc2.packet[3] = padd<Packetc>(tRes.packet[3+N], acc2.packet[3]);
}
}
// This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled.
template<typename Scalar, typename Packet>
EIGEN_ALWAYS_INLINE Packet ploadRhs(const Scalar* rhs)
{
return ploadu<Packet>(rhs);
}
} // end namespace internal
} // end namespace Eigen

View File

@@ -0,0 +1,627 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2020 Everton Constantino (everton.constantino@ibm.com)
// Copyright (C) 2021 Chip Kerchner (chip.kerchner@ibm.com)
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H
#define EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H
// If using dynamic dispatch, set the CPU target.
#if defined(EIGEN_ALTIVEC_MMA_DYNAMIC_DISPATCH)
#pragma GCC push_options
#pragma GCC target("cpu=power10,htm")
#endif
#ifdef __has_builtin
#if !__has_builtin(__builtin_vsx_assemble_pair)
#define __builtin_vsx_assemble_pair __builtin_mma_assemble_pair
#endif
#endif
namespace Eigen {
namespace internal {
template<typename Scalar, typename Packet>
EIGEN_ALWAYS_INLINE void bsetzeroMMA(__vector_quad* acc)
{
__builtin_mma_xxsetaccz(acc);
}
template<typename DataMapper, typename Index, typename Packet, const Index accCols>
EIGEN_ALWAYS_INLINE void storeAccumulator(Index i, const DataMapper& data, const Packet& alpha, __vector_quad* acc)
{
PacketBlock<Packet, 4> result;
__builtin_mma_disassemble_acc(&result.packet, acc);
PacketBlock<Packet, 4> tRes;
bload<DataMapper, Packet, Index, accCols, ColMajor, false, 4>(tRes, data, i, 0);
bscale<Packet, 4>(tRes, result, alpha);
data.template storePacketBlock<Packet, 4>(i, 0, tRes);
}
template<typename DataMapper, typename Index, typename Packet, typename Packetc, const Index accColsC>
EIGEN_ALWAYS_INLINE void storeComplexAccumulator(Index i, const DataMapper& data, const Packet& alphaReal, const Packet& alphaImag, __vector_quad* accReal, __vector_quad* accImag)
{
PacketBlock<Packet, 4> resultReal, resultImag;
__builtin_mma_disassemble_acc(&resultReal.packet, accReal);
__builtin_mma_disassemble_acc(&resultImag.packet, accImag);
PacketBlock<Packetc, 8> tRes;
bload<DataMapper, Packetc, Index, accColsC, ColMajor, true, 4>(tRes, data, i, 0);
PacketBlock<Packet,4> taccReal, taccImag;
bscalec<Packet,4>(resultReal, resultImag, alphaReal, alphaImag, taccReal, taccImag);
PacketBlock<Packetc, 4> acc1, acc2;
bcouple<Packet, Packetc, 4>(taccReal, taccImag, tRes, acc1, acc2);
data.template storePacketBlock<Packetc, 4>(i, 0, acc1);
data.template storePacketBlock<Packetc, 4>(i + accColsC, 0, acc2);
}
// Defaults to float32, since Eigen still supports C++03 we can't use default template arguments
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const RhsPacket& a, const LhsPacket& b)
{
if(NegativeAccumulate)
{
__builtin_mma_xvf32gernp(acc, (__vector unsigned char)a, (__vector unsigned char)b);
} else {
__builtin_mma_xvf32gerpp(acc, (__vector unsigned char)a, (__vector unsigned char)b);
}
}
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const PacketBlock<Packet2d,2>& a, const Packet2d& b)
{
__vector_pair* a0 = (__vector_pair *)(&a.packet[0]);
if(NegativeAccumulate)
{
__builtin_mma_xvf64gernp(acc, *a0, (__vector unsigned char)b);
} else {
__builtin_mma_xvf64gerpp(acc, *a0, (__vector unsigned char)b);
}
}
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const __vector_pair& a, const Packet2d& b)
{
if(NegativeAccumulate)
{
__builtin_mma_xvf64gernp(acc, (__vector_pair)a, (__vector unsigned char)b);
} else {
__builtin_mma_xvf64gerpp(acc, (__vector_pair)a, (__vector unsigned char)b);
}
}
template<typename LhsPacket, typename RhsPacket, bool NegativeAccumulate>
EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad*, const __vector_pair&, const Packet4f&)
{
// Just for compilation
}
template<typename Scalar, typename Packet, typename RhsPacket, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_ALWAYS_INLINE void pgercMMA(__vector_quad* accReal, __vector_quad* accImag, const Packet& lhsV, const Packet& lhsVi, const RhsPacket& rhsV, const RhsPacket& rhsVi)
{
pgerMMA<Packet, RhsPacket, false>(accReal, rhsV, lhsV);
if(LhsIsReal) {
pgerMMA<Packet, RhsPacket, ConjugateRhs>(accImag, rhsVi, lhsV);
} else {
if(!RhsIsReal) {
pgerMMA<Packet, RhsPacket, ConjugateLhs == ConjugateRhs>(accReal, rhsVi, lhsVi);
pgerMMA<Packet, RhsPacket, ConjugateRhs>(accImag, rhsVi, lhsV);
} else {
EIGEN_UNUSED_VARIABLE(rhsVi);
}
pgerMMA<Packet, RhsPacket, ConjugateLhs>(accImag, rhsV, lhsVi);
}
}
// This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled.
template<typename Scalar, typename Packet>
EIGEN_ALWAYS_INLINE void ploadRhsMMA(const Scalar* rhs, Packet& rhsV)
{
rhsV = ploadRhs<Scalar, Packet>(rhs);
}
template<>
EIGEN_ALWAYS_INLINE void ploadRhsMMA<double, PacketBlock<Packet2d, 2> >(const double* rhs, PacketBlock<Packet2d, 2>& rhsV)
{
rhsV.packet[0] = ploadRhs<double, Packet2d>((const double *)((Packet2d *)rhs ));
rhsV.packet[1] = ploadRhs<double, Packet2d>((const double *)(((Packet2d *)rhs) + 1));
}
template<>
EIGEN_ALWAYS_INLINE void ploadRhsMMA<double, __vector_pair>(const double* rhs, __vector_pair& rhsV)
{
#if EIGEN_COMP_LLVM
__builtin_vsx_assemble_pair(&rhsV,
(__vector unsigned char)(ploadRhs<double, Packet2d>((const double *)(((Packet2d *)rhs) + 1))),
(__vector unsigned char)(ploadRhs<double, Packet2d>((const double *)((Packet2d *)rhs ))));
#else
__asm__ ("lxvp %x0,%1" : "=wa" (rhsV) : "Y" (*rhs));
#endif
}
template<>
EIGEN_ALWAYS_INLINE void ploadRhsMMA(const float*, __vector_pair&)
{
// Just for compilation
}
// PEEL_MMA loop factor.
#define PEEL_MMA 7
#define MICRO_MMA_UNROLL(func) \
func(0) func(1) func(2) func(3) func(4) func(5) func(6) func(7)
#define MICRO_MMA_LOAD_ONE(iter) \
if (unroll_factor > iter) { \
lhsV##iter = ploadLhs<Scalar, Packet>(lhs_ptr##iter); \
lhs_ptr##iter += accCols; \
} else { \
EIGEN_UNUSED_VARIABLE(lhsV##iter); \
}
#define MICRO_MMA_WORK_ONE(iter, type, peel) \
if (unroll_factor > iter) { \
pgerMMA<Packet, type, false>(&accZero##iter, rhsV##peel, lhsV##iter); \
}
#define MICRO_MMA_TYPE_PEEL(func, func2, type, peel) \
if (PEEL_MMA > peel) { \
Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4, lhsV5, lhsV6, lhsV7; \
ploadRhsMMA<Scalar, type>(rhs_ptr + (accRows * peel), rhsV##peel); \
MICRO_MMA_UNROLL(func2); \
func(0,type,peel) func(1,type,peel) func(2,type,peel) func(3,type,peel) \
func(4,type,peel) func(5,type,peel) func(6,type,peel) func(7,type,peel) \
} else { \
EIGEN_UNUSED_VARIABLE(rhsV##peel); \
}
#define MICRO_MMA_UNROLL_TYPE_PEEL(func, func2, type) \
type rhsV0, rhsV1, rhsV2, rhsV3, rhsV4, rhsV5, rhsV6, rhsV7; \
MICRO_MMA_TYPE_PEEL(func,func2,type,0); MICRO_MMA_TYPE_PEEL(func,func2,type,1); \
MICRO_MMA_TYPE_PEEL(func,func2,type,2); MICRO_MMA_TYPE_PEEL(func,func2,type,3); \
MICRO_MMA_TYPE_PEEL(func,func2,type,4); MICRO_MMA_TYPE_PEEL(func,func2,type,5); \
MICRO_MMA_TYPE_PEEL(func,func2,type,6); MICRO_MMA_TYPE_PEEL(func,func2,type,7);
#define MICRO_MMA_UNROLL_TYPE_ONE(func, func2, type) \
type rhsV0; \
MICRO_MMA_TYPE_PEEL(func,func2,type,0);
#define MICRO_MMA_ONE_PEEL \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_MMA_UNROLL_TYPE_PEEL(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, RhsPacket); \
} else { \
MICRO_MMA_UNROLL_TYPE_PEEL(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, __vector_pair); \
} \
rhs_ptr += (accRows * PEEL_MMA);
#define MICRO_MMA_ONE \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_MMA_UNROLL_TYPE_ONE(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, RhsPacket); \
} else { \
MICRO_MMA_UNROLL_TYPE_ONE(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, __vector_pair); \
} \
rhs_ptr += accRows;
#define MICRO_MMA_DST_PTR_ONE(iter) \
if (unroll_factor > iter) { \
bsetzeroMMA<Scalar, Packet>(&accZero##iter); \
} else { \
EIGEN_UNUSED_VARIABLE(accZero##iter); \
}
#define MICRO_MMA_DST_PTR MICRO_MMA_UNROLL(MICRO_MMA_DST_PTR_ONE)
#define MICRO_MMA_SRC_PTR_ONE(iter) \
if (unroll_factor > iter) { \
lhs_ptr##iter = lhs_base + ( (row/accCols) + iter )*strideA*accCols; \
} else { \
EIGEN_UNUSED_VARIABLE(lhs_ptr##iter); \
}
#define MICRO_MMA_SRC_PTR MICRO_MMA_UNROLL(MICRO_MMA_SRC_PTR_ONE)
#define MICRO_MMA_PREFETCH_ONE(iter) \
if (unroll_factor > iter) { \
EIGEN_POWER_PREFETCH(lhs_ptr##iter); \
}
#define MICRO_MMA_PREFETCH MICRO_MMA_UNROLL(MICRO_MMA_PREFETCH_ONE)
#define MICRO_MMA_STORE_ONE(iter) \
if (unroll_factor > iter) { \
storeAccumulator<DataMapper, Index, Packet, accCols>(row + iter*accCols, res, pAlpha, &accZero##iter); \
}
#define MICRO_MMA_STORE MICRO_MMA_UNROLL(MICRO_MMA_STORE_ONE)
template<int unroll_factor, typename Scalar, typename Packet, typename RhsPacket, typename DataMapper, typename Index, const Index accRows, const Index accCols>
EIGEN_ALWAYS_INLINE void gemm_unrolled_MMA_iteration(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index& row,
const Packet& pAlpha)
{
const Scalar* rhs_ptr = rhs_base;
const Scalar* lhs_ptr0 = NULL, * lhs_ptr1 = NULL, * lhs_ptr2 = NULL, * lhs_ptr3 = NULL, * lhs_ptr4 = NULL, * lhs_ptr5 = NULL, * lhs_ptr6 = NULL, * lhs_ptr7 = NULL;
__vector_quad accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7;
MICRO_MMA_SRC_PTR
MICRO_MMA_DST_PTR
Index k = 0;
for(; k + PEEL_MMA <= depth; k+= PEEL_MMA)
{
EIGEN_POWER_PREFETCH(rhs_ptr);
MICRO_MMA_PREFETCH
MICRO_MMA_ONE_PEEL
}
for(; k < depth; k++)
{
MICRO_MMA_ONE
}
MICRO_MMA_STORE
row += unroll_factor*accCols;
}
template<typename Scalar, typename Packet, typename RhsPacket, typename DataMapper, typename Index, const Index accRows, const Index accCols>
EIGEN_ALWAYS_INLINE void gemmMMA_cols(
const DataMapper& res,
const Scalar* blockA,
const Scalar* blockB,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index offsetB,
Index col,
Index rows,
Index cols,
Index remaining_rows,
const Packet& pAlpha,
const Packet& pMask)
{
const DataMapper res3 = res.getSubMapper(0, col);
const Scalar* rhs_base = blockB + col*strideB + accRows*offsetB;
const Scalar* lhs_base = blockA + accCols*offsetA;
Index row = 0;
#define MAX_MMA_UNROLL 7
while(row + MAX_MMA_UNROLL*accCols <= rows) {
gemm_unrolled_MMA_iteration<MAX_MMA_UNROLL, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res3, lhs_base, rhs_base, depth, strideA, row, pAlpha);
}
switch( (rows-row)/accCols ) {
#if MAX_MMA_UNROLL > 7
case 7:
gemm_unrolled_MMA_iteration<7, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res3, lhs_base, rhs_base, depth, strideA, row, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 6
case 6:
gemm_unrolled_MMA_iteration<6, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res3, lhs_base, rhs_base, depth, strideA, row, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 5
case 5:
gemm_unrolled_MMA_iteration<5, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res3, lhs_base, rhs_base, depth, strideA, row, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 4
case 4:
gemm_unrolled_MMA_iteration<4, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res3, lhs_base, rhs_base, depth, strideA, row, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 3
case 3:
gemm_unrolled_MMA_iteration<3, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res3, lhs_base, rhs_base, depth, strideA, row, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 2
case 2:
gemm_unrolled_MMA_iteration<2, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res3, lhs_base, rhs_base, depth, strideA, row, pAlpha);
break;
#endif
#if MAX_MMA_UNROLL > 1
case 1:
gemm_unrolled_MMA_iteration<1, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res3, lhs_base, rhs_base, depth, strideA, row, pAlpha);
break;
#endif
default:
break;
}
#undef MAX_MMA_UNROLL
if(remaining_rows > 0)
{
gemm_extra_row<Scalar, Packet, DataMapper, Index, accRows, accCols>(res3, blockA, rhs_base, depth, strideA, offsetA, row, col, rows, cols, remaining_rows, pAlpha, pMask);
}
}
template<typename Scalar, typename Index, typename Packet, typename RhsPacket, typename DataMapper, const Index accRows, const Index accCols>
void gemmMMA(const DataMapper& res, const Scalar* blockA, const Scalar* blockB, Index rows, Index depth, Index cols, Scalar alpha, Index strideA, Index strideB, Index offsetA, Index offsetB)
{
const Index remaining_rows = rows % accCols;
if( strideA == -1 ) strideA = depth;
if( strideB == -1 ) strideB = depth;
const Packet pAlpha = pset1<Packet>(alpha);
const Packet pMask = bmask<Packet>((const int)(remaining_rows));
Index col = 0;
for(; col + accRows <= cols; col += accRows)
{
gemmMMA_cols<Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, blockA, blockB, depth, strideA, offsetA, strideB, offsetB, col, rows, cols, remaining_rows, pAlpha, pMask);
}
gemm_extra_cols<Scalar, Packet, DataMapper, Index, accCols>(res, blockA, blockB, depth, strideA, offsetA, strideB, offsetB, col, rows, cols, remaining_rows, pAlpha, pMask);
}
#define accColsC (accCols / 2)
#define advanceRows ((LhsIsReal) ? 1 : 2)
#define advanceCols ((RhsIsReal) ? 1 : 2)
// PEEL_COMPLEX_MMA loop factor.
#define PEEL_COMPLEX_MMA 3
#define MICRO_COMPLEX_MMA_UNROLL(func) \
func(0) func(1) func(2) func(3)
#define MICRO_COMPLEX_MMA_LOAD_ONE(iter) \
if (unroll_factor > iter) { \
lhsV##iter = ploadLhs<Scalar, Packet>(lhs_ptr_real##iter); \
if(!LhsIsReal) { \
lhsVi##iter = ploadLhs<Scalar, Packet>(lhs_ptr_real##iter + imag_delta); \
} else { \
EIGEN_UNUSED_VARIABLE(lhsVi##iter); \
} \
lhs_ptr_real##iter += accCols; \
} else { \
EIGEN_UNUSED_VARIABLE(lhsV##iter); \
EIGEN_UNUSED_VARIABLE(lhsVi##iter); \
}
#define MICRO_COMPLEX_MMA_WORK_ONE(iter, type, peel) \
if (unroll_factor > iter) { \
pgercMMA<Scalar, Packet, type, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(&accReal##iter, &accImag##iter, lhsV##iter, lhsVi##iter, rhsV##peel, rhsVi##peel); \
}
#define MICRO_COMPLEX_MMA_TYPE_PEEL(func, func2, type, peel) \
if (PEEL_COMPLEX_MMA > peel) { \
Packet lhsV0, lhsV1, lhsV2, lhsV3; \
Packet lhsVi0, lhsVi1, lhsVi2, lhsVi3; \
ploadRhsMMA<Scalar, type>(rhs_ptr_real + (accRows * peel), rhsV##peel); \
if(!RhsIsReal) { \
ploadRhsMMA<Scalar, type>(rhs_ptr_imag + (accRows * peel), rhsVi##peel); \
} else { \
EIGEN_UNUSED_VARIABLE(rhsVi##peel); \
} \
MICRO_COMPLEX_MMA_UNROLL(func2); \
func(0,type,peel) func(1,type,peel) func(2,type,peel) func(3,type,peel) \
} else { \
EIGEN_UNUSED_VARIABLE(rhsV##peel); \
EIGEN_UNUSED_VARIABLE(rhsVi##peel); \
}
#define MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(func, func2, type) \
type rhsV0, rhsV1, rhsV2, rhsV3; \
type rhsVi0, rhsVi1, rhsVi2, rhsVi3; \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,0); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,1); \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,2); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,3);
#define MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(func, func2, type) \
type rhsV0, rhsVi0; \
MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,0);
#define MICRO_COMPLEX_MMA_ONE_PEEL \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, RhsPacket); \
} else { \
MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, __vector_pair); \
} \
rhs_ptr_real += (accRows * PEEL_COMPLEX_MMA); \
if(!RhsIsReal) rhs_ptr_imag += (accRows * PEEL_COMPLEX_MMA);
#define MICRO_COMPLEX_MMA_ONE \
if (sizeof(Scalar) == sizeof(float)) { \
MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, RhsPacket); \
} else { \
MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, __vector_pair); \
} \
rhs_ptr_real += accRows; \
if(!RhsIsReal) rhs_ptr_imag += accRows;
#define MICRO_COMPLEX_MMA_DST_PTR_ONE(iter) \
if (unroll_factor > iter) { \
bsetzeroMMA<Scalar, Packet>(&accReal##iter); \
bsetzeroMMA<Scalar, Packet>(&accImag##iter); \
} else { \
EIGEN_UNUSED_VARIABLE(accReal##iter); \
EIGEN_UNUSED_VARIABLE(accImag##iter); \
}
#define MICRO_COMPLEX_MMA_DST_PTR MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_DST_PTR_ONE)
#define MICRO_COMPLEX_MMA_SRC_PTR_ONE(iter) \
if (unroll_factor > iter) { \
lhs_ptr_real##iter = lhs_base + ( ((advanceRows*row)/accCols) + iter*advanceRows )*strideA*accCols; \
} else { \
EIGEN_UNUSED_VARIABLE(lhs_ptr_real##iter); \
}
#define MICRO_COMPLEX_MMA_SRC_PTR MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_SRC_PTR_ONE)
#define MICRO_COMPLEX_MMA_PREFETCH_ONE(iter) \
if (unroll_factor > iter) { \
EIGEN_POWER_PREFETCH(lhs_ptr_real##iter); \
}
#define MICRO_COMPLEX_MMA_PREFETCH MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_PREFETCH_ONE)
#define MICRO_COMPLEX_MMA_STORE_ONE(iter) \
if (unroll_factor > iter) { \
storeComplexAccumulator<DataMapper, Index, Packet, Packetc, accColsC>(row + iter*accCols, res, pAlphaReal, pAlphaImag, &accReal##iter, &accImag##iter); \
}
#define MICRO_COMPLEX_MMA_STORE MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_STORE_ONE)
template<int unroll_factor, typename Scalar, typename Packet, typename Packetc, typename RhsPacket, typename DataMapper, typename Index, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_ALWAYS_INLINE void gemm_complex_unrolled_MMA_iteration(
const DataMapper& res,
const Scalar* lhs_base,
const Scalar* rhs_base,
Index depth,
Index strideA,
Index strideB,
Index& row,
const Packet& pAlphaReal,
const Packet& pAlphaImag)
{
const Scalar* rhs_ptr_real = rhs_base;
const Scalar* rhs_ptr_imag = NULL;
const Index imag_delta = accCols*strideA;
if(!RhsIsReal) {
rhs_ptr_imag = rhs_base + accRows*strideB;
} else {
EIGEN_UNUSED_VARIABLE(rhs_ptr_imag);
}
const Scalar* lhs_ptr_real0 = NULL, * lhs_ptr_real1 = NULL;
const Scalar* lhs_ptr_real2 = NULL, * lhs_ptr_real3 = NULL;
__vector_quad accReal0, accImag0, accReal1, accImag1, accReal2, accImag2, accReal3, accImag3;
MICRO_COMPLEX_MMA_SRC_PTR
MICRO_COMPLEX_MMA_DST_PTR
Index k = 0;
for(; k + PEEL_COMPLEX_MMA <= depth; k+= PEEL_COMPLEX_MMA)
{
EIGEN_POWER_PREFETCH(rhs_ptr_real);
if(!RhsIsReal) {
EIGEN_POWER_PREFETCH(rhs_ptr_imag);
}
MICRO_COMPLEX_MMA_PREFETCH
MICRO_COMPLEX_MMA_ONE_PEEL
}
for(; k < depth; k++)
{
MICRO_COMPLEX_MMA_ONE
}
MICRO_COMPLEX_MMA_STORE
row += unroll_factor*accCols;
}
template<typename Scalar, typename Packet, typename Packetc, typename RhsPacket, typename DataMapper, typename Index, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
EIGEN_ALWAYS_INLINE void gemmMMA_complex_cols(
const DataMapper& res,
const Scalar* blockA,
const Scalar* blockB,
Index depth,
Index strideA,
Index offsetA,
Index strideB,
Index offsetB,
Index col,
Index rows,
Index cols,
Index remaining_rows,
const Packet& pAlphaReal,
const Packet& pAlphaImag,
const Packet& pMask)
{
const DataMapper res3 = res.getSubMapper(0, col);
const Scalar* rhs_base = blockB + advanceCols*col*strideB + accRows*offsetB;
const Scalar* lhs_base = blockA + accCols*offsetA;
Index row = 0;
#define MAX_COMPLEX_MMA_UNROLL 4
while(row + MAX_COMPLEX_MMA_UNROLL*accCols <= rows) {
gemm_complex_unrolled_MMA_iteration<MAX_COMPLEX_MMA_UNROLL, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res3, lhs_base, rhs_base, depth, strideA, strideB, row, pAlphaReal, pAlphaImag);
}
switch( (rows-row)/accCols ) {
#if MAX_COMPLEX_MMA_UNROLL > 4
case 4:
gemm_complex_unrolled_MMA_iteration<4, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res3, lhs_base, rhs_base, depth, strideA, strideB, row, pAlphaReal, pAlphaImag);
break;
#endif
#if MAX_COMPLEX_MMA_UNROLL > 3
case 3:
gemm_complex_unrolled_MMA_iteration<3, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res3, lhs_base, rhs_base, depth, strideA, strideB, row, pAlphaReal, pAlphaImag);
break;
#endif
#if MAX_COMPLEX_MMA_UNROLL > 2
case 2:
gemm_complex_unrolled_MMA_iteration<2, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res3, lhs_base, rhs_base, depth, strideA, strideB, row, pAlphaReal, pAlphaImag);
break;
#endif
#if MAX_COMPLEX_MMA_UNROLL > 1
case 1:
gemm_complex_unrolled_MMA_iteration<1, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res3, lhs_base, rhs_base, depth, strideA, strideB, row, pAlphaReal, pAlphaImag);
break;
#endif
default:
break;
}
#undef MAX_COMPLEX_MMA_UNROLL
if(remaining_rows > 0)
{
gemm_complex_extra_row<Scalar, Packet, Packetc, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res3, blockA, rhs_base, depth, strideA, offsetA, strideB, row, col, rows, cols, remaining_rows, pAlphaReal, pAlphaImag, pMask);
}
}
template<typename LhsScalar, typename RhsScalar, typename Scalarc, typename Scalar, typename Index, typename Packet, typename Packetc, typename RhsPacket, typename DataMapper, const Index accRows, const Index accCols, bool ConjugateLhs, bool ConjugateRhs, bool LhsIsReal, bool RhsIsReal>
void gemm_complexMMA(const DataMapper& res, const LhsScalar* blockAc, const RhsScalar* blockBc, Index rows, Index depth, Index cols, Scalarc alpha, Index strideA, Index strideB, Index offsetA, Index offsetB)
{
const Index remaining_rows = rows % accCols;
if( strideA == -1 ) strideA = depth;
if( strideB == -1 ) strideB = depth;
const Packet pAlphaReal = pset1<Packet>(alpha.real());
const Packet pAlphaImag = pset1<Packet>(alpha.imag());
const Packet pMask = bmask<Packet>((const int)(remaining_rows));
const Scalar* blockA = (Scalar *) blockAc;
const Scalar* blockB = (Scalar *) blockBc;
Index col = 0;
for(; col + accRows <= cols; col += accRows)
{
gemmMMA_complex_cols<Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, blockA, blockB, depth, strideA, offsetA, strideB, offsetB, col, rows, cols, remaining_rows, pAlphaReal, pAlphaImag, pMask);
}
gemm_complex_extra_cols<Scalar, Packet, Packetc, DataMapper, Index, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, blockA, blockB, depth, strideA, offsetA, strideB, offsetB, col, rows, cols, remaining_rows, pAlphaReal, pAlphaImag, pMask);
}
#undef accColsC
#undef advanceRows
#undef advanceCols
} // end namespace internal
} // end namespace Eigen
#if defined(EIGEN_ALTIVEC_MMA_DYNAMIC_DISPATCH)
#pragma GCC pop_options
#endif
#endif // EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -2,6 +2,7 @@
// for linear algebra.
//
// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
// Copyright (C) 2021 C. Antonio Sanchez <cantonios@google.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
@@ -10,94 +11,259 @@
#ifndef EIGEN_COMPLEX_CUDA_H
#define EIGEN_COMPLEX_CUDA_H
// clang-format off
// Many std::complex methods such as operator+, operator-, operator* and
// operator/ are not constexpr. Due to this, GCC and older versions of clang do
// not treat them as device functions and thus Eigen functors making use of
// these operators fail to compile. Here, we manually specialize these
// operators and functors for complex types when building for CUDA to enable
// their use on-device.
//
// NOTES:
// - Compound assignment operators +=,-=,*=,/=(Scalar) will not work on device,
// since they are already specialized in the standard. Using them will result
// in silent kernel failures.
// - Compiling with MSVC and using +=,-=,*=,/=(std::complex<Scalar>) will lead
// to duplicate definition errors, since these are already specialized in
// Visual Studio's <complex> header (contrary to the standard). This is
// preferable to removing such definitions, which will lead to silent kernel
// failures.
// - Compiling with ICC requires defining _USE_COMPLEX_SPECIALIZATION_ prior
// to the first inclusion of <complex>.
#if defined(EIGEN_CUDACC) && defined(EIGEN_GPU_COMPILE_PHASE)
// ICC already specializes std::complex<float> and std::complex<double>
// operators, preventing us from making them device functions here.
// This will lead to silent runtime errors if the operators are used on device.
//
// To allow std::complex operator use on device, define _OVERRIDE_COMPLEX_SPECIALIZATION_
// prior to first inclusion of <complex>. This prevents ICC from adding
// its own specializations, so our custom ones below can be used instead.
#if !(defined(EIGEN_COMP_ICC) && defined(_USE_COMPLEX_SPECIALIZATION_))
// Import Eigen's internal operator specializations.
#define EIGEN_USING_STD_COMPLEX_OPERATORS \
using Eigen::complex_operator_detail::operator+; \
using Eigen::complex_operator_detail::operator-; \
using Eigen::complex_operator_detail::operator*; \
using Eigen::complex_operator_detail::operator/; \
using Eigen::complex_operator_detail::operator+=; \
using Eigen::complex_operator_detail::operator-=; \
using Eigen::complex_operator_detail::operator*=; \
using Eigen::complex_operator_detail::operator/=; \
using Eigen::complex_operator_detail::operator==; \
using Eigen::complex_operator_detail::operator!=;
namespace Eigen {
namespace internal {
// Specialized std::complex overloads.
namespace complex_operator_detail {
#if defined(EIGEN_CUDACC) && defined(EIGEN_USE_GPU)
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_multiply(const std::complex<T>& a, const std::complex<T>& b) {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
return std::complex<T>(
a_real * b_real - a_imag * b_imag,
a_imag * b_real + a_real * b_imag);
}
// Many std::complex methods such as operator+, operator-, operator* and
// operator/ are not constexpr. Due to this, clang does not treat them as device
// functions and thus Eigen functors making use of these operators fail to
// compile. Here, we manually specialize these functors for complex types when
// building for CUDA to avoid non-constexpr methods.
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide_fast(const std::complex<T>& a, const std::complex<T>& b) {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
const T norm = (b_real * b_real + b_imag * b_imag);
return std::complex<T>((a_real * b_real + a_imag * b_imag) / norm,
(a_imag * b_real - a_real * b_imag) / norm);
}
// Sum
template<typename T> struct scalar_sum_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
typedef typename std::complex<T> result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_sum_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
return std::complex<T>(numext::real(a) + numext::real(b),
numext::imag(a) + numext::imag(b));
}
};
template<typename T> struct scalar_sum_op<std::complex<T>, std::complex<T> > : scalar_sum_op<const std::complex<T>, const std::complex<T> > {};
// Difference
template<typename T> struct scalar_difference_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
typedef typename std::complex<T> result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_difference_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
return std::complex<T>(numext::real(a) - numext::real(b),
numext::imag(a) - numext::imag(b));
}
};
template<typename T> struct scalar_difference_op<std::complex<T>, std::complex<T> > : scalar_difference_op<const std::complex<T>, const std::complex<T> > {};
// Product
template<typename T> struct scalar_product_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
enum {
Vectorizable = packet_traits<std::complex<T> >::HasMul
};
typedef typename std::complex<T> result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_product_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
return std::complex<T>(a_real * b_real - a_imag * b_imag,
a_real * b_imag + a_imag * b_real);
}
};
template<typename T> struct scalar_product_op<std::complex<T>, std::complex<T> > : scalar_product_op<const std::complex<T>, const std::complex<T> > {};
// Quotient
template<typename T> struct scalar_quotient_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
enum {
Vectorizable = packet_traits<std::complex<T> >::HasDiv
};
typedef typename std::complex<T> result_type;
EIGEN_EMPTY_STRUCT_CTOR(scalar_quotient_op)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex<T> operator() (const std::complex<T>& a, const std::complex<T>& b) const {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
const T norm = T(1) / (b_real * b_real + b_imag * b_imag);
return std::complex<T>((a_real * b_real + a_imag * b_imag) * norm,
(a_imag * b_real - a_real * b_imag) * norm);
}
};
template<typename T> struct scalar_quotient_op<std::complex<T>, std::complex<T> > : scalar_quotient_op<const std::complex<T>, const std::complex<T> > {};
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide_stable(const std::complex<T>& a, const std::complex<T>& b) {
const T a_real = numext::real(a);
const T a_imag = numext::imag(a);
const T b_real = numext::real(b);
const T b_imag = numext::imag(b);
// Smith's complex division (https://arxiv.org/pdf/1210.4539.pdf),
// guards against over/under-flow.
const bool scale_imag = numext::abs(b_imag) <= numext::abs(b_real);
const T rscale = scale_imag ? T(1) : b_real / b_imag;
const T iscale = scale_imag ? b_imag / b_real : T(1);
const T denominator = b_real * rscale + b_imag * iscale;
return std::complex<T>((a_real * rscale + a_imag * iscale) / denominator,
(a_imag * rscale - a_real * iscale) / denominator);
}
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
std::complex<T> complex_divide(const std::complex<T>& a, const std::complex<T>& b) {
#if EIGEN_FAST_MATH
return complex_divide_fast(a, b);
#else
return complex_divide_stable(a, b);
#endif
}
} // end namespace internal
// NOTE: We cannot specialize compound assignment operators with Scalar T,
// (i.e. operator@=(const T&), for @=+,-,*,/)
// since they are already specialized for float/double/long double within
// the standard <complex> header. We also do not specialize the stream
// operators.
#define EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(T) \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a) { return a; } \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a) { \
return std::complex<T>(-numext::real(a), -numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a, const std::complex<T>& b) { \
return std::complex<T>(numext::real(a) + numext::real(b), numext::imag(a) + numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) + b, numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator+(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a + numext::real(b), numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a, const std::complex<T>& b) { \
return std::complex<T>(numext::real(a) - numext::real(b), numext::imag(a) - numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) - b, numext::imag(a)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator-(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a - numext::real(b), -numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const std::complex<T>& a, const std::complex<T>& b) { \
return complex_multiply(a, b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) * b, numext::imag(a) * b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator*(const T& a, const std::complex<T>& b) { \
return std::complex<T>(a * numext::real(b), a * numext::imag(b)); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const std::complex<T>& a, const std::complex<T>& b) { \
return complex_divide(a, b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const std::complex<T>& a, const T& b) { \
return std::complex<T>(numext::real(a) / b, numext::imag(a) / b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T> operator/(const T& a, const std::complex<T>& b) { \
return complex_divide(std::complex<T>(a, 0), b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator+=(std::complex<T>& a, const std::complex<T>& b) { \
numext::real_ref(a) += numext::real(b); \
numext::imag_ref(a) += numext::imag(b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator-=(std::complex<T>& a, const std::complex<T>& b) { \
numext::real_ref(a) -= numext::real(b); \
numext::imag_ref(a) -= numext::imag(b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator*=(std::complex<T>& a, const std::complex<T>& b) { \
a = complex_multiply(a, b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
std::complex<T>& operator/=(std::complex<T>& a, const std::complex<T>& b) { \
a = complex_divide(a, b); \
return a; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const std::complex<T>& a, const std::complex<T>& b) { \
return numext::real(a) == numext::real(b) && numext::imag(a) == numext::imag(b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const std::complex<T>& a, const T& b) { \
return numext::real(a) == b && numext::imag(a) == 0; \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator==(const T& a, const std::complex<T>& b) { \
return a == numext::real(b) && 0 == numext::imag(b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const std::complex<T>& a, const std::complex<T>& b) { \
return !(a == b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const std::complex<T>& a, const T& b) { \
return !(a == b); \
} \
\
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
bool operator!=(const T& a, const std::complex<T>& b) { \
return !(a == b); \
}
} // end namespace Eigen
// Do not specialize for long double, since that reduces to double on device.
EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(float)
EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(double)
#endif // EIGEN_COMPLEX_CUDA_H
#undef EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS
} // namespace complex_operator_detail
EIGEN_USING_STD_COMPLEX_OPERATORS
namespace numext {
EIGEN_USING_STD_COMPLEX_OPERATORS
} // namespace numext
namespace internal {
EIGEN_USING_STD_COMPLEX_OPERATORS
} // namespace internal
} // namespace Eigen
#endif // !(EIGEN_COMP_ICC && _USE_COMPLEX_SPECIALIZATION_)
#endif // EIGEN_CUDACC && EIGEN_GPU_COMPILE_PHASE
#endif // EIGEN_COMPLEX_CUDA_H

View File

@@ -0,0 +1,688 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef EIGEN_BFLOAT16_H
#define EIGEN_BFLOAT16_H
#define BF16_PACKET_FUNCTION(PACKET_F, PACKET_BF16, METHOD) \
template <> \
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED \
PACKET_BF16 METHOD<PACKET_BF16>(const PACKET_BF16& _x) { \
return F32ToBf16(METHOD<PACKET_F>(Bf16ToF32(_x))); \
}
namespace Eigen {
struct bfloat16;
namespace bfloat16_impl {
// Make our own __bfloat16_raw definition.
struct __bfloat16_raw {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw() : value(0) {}
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw(unsigned short raw) : value(raw) {}
unsigned short value;
};
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw raw_uint16_to_bfloat16(unsigned short value);
template <bool AssumeArgumentIsNormalOrInfinityOrZero>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne(float ff);
// Forward declarations of template specializations, to avoid Visual C++ 2019 errors, saying:
// > error C2908: explicit specialization; 'float_to_bfloat16_rtne' has already been instantiated
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<false>(float ff);
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<true>(float ff);
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float bfloat16_to_float(__bfloat16_raw h);
struct bfloat16_base : public __bfloat16_raw {
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16_base() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16_base(const __bfloat16_raw& h) : __bfloat16_raw(h) {}
};
} // namespace bfloat16_impl
// Class definition.
struct bfloat16 : public bfloat16_impl::bfloat16_base {
typedef bfloat16_impl::__bfloat16_raw __bfloat16_raw;
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16() {}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const __bfloat16_raw& h) : bfloat16_impl::bfloat16_base(h) {}
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(bool b)
: bfloat16_impl::bfloat16_base(bfloat16_impl::raw_uint16_to_bfloat16(b ? 0x3f80 : 0)) {}
template<class T>
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(T val)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<internal::is_integral<T>::value>(static_cast<float>(val))) {}
explicit EIGEN_DEVICE_FUNC bfloat16(float f)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<false>(f)) {}
// Following the convention of numpy, converting between complex and
// float will lead to loss of imag value.
template<typename RealScalar>
explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const std::complex<RealScalar>& val)
: bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne<false>(static_cast<float>(val.real()))) {}
EIGEN_DEVICE_FUNC operator float() const { // NOLINT: Allow implicit conversion to float, because it is lossless.
return bfloat16_impl::bfloat16_to_float(*this);
}
};
} // namespace Eigen
namespace std {
template<>
struct numeric_limits<Eigen::bfloat16> {
static const bool is_specialized = true;
static const bool is_signed = true;
static const bool is_integer = false;
static const bool is_exact = false;
static const bool has_infinity = true;
static const bool has_quiet_NaN = true;
static const bool has_signaling_NaN = true;
static const float_denorm_style has_denorm = std::denorm_absent;
static const bool has_denorm_loss = false;
static const std::float_round_style round_style = numeric_limits<float>::round_style;
static const bool is_iec559 = false;
static const bool is_bounded = true;
static const bool is_modulo = false;
static const int digits = 8;
static const int digits10 = 2;
static const int max_digits10 = 4;
static const int radix = 2;
static const int min_exponent = numeric_limits<float>::min_exponent;
static const int min_exponent10 = numeric_limits<float>::min_exponent10;
static const int max_exponent = numeric_limits<float>::max_exponent;
static const int max_exponent10 = numeric_limits<float>::max_exponent10;
static const bool traps = numeric_limits<float>::traps;
static const bool tinyness_before = numeric_limits<float>::tinyness_before;
static Eigen::bfloat16 (min)() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x0080); }
static Eigen::bfloat16 lowest() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0xff7f); }
static Eigen::bfloat16 (max)() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f7f); }
static Eigen::bfloat16 epsilon() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x3c00); }
static Eigen::bfloat16 round_error() { return Eigen::bfloat16(0x3f00); }
static Eigen::bfloat16 infinity() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f80); }
static Eigen::bfloat16 quiet_NaN() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7fc0); }
static Eigen::bfloat16 signaling_NaN() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f81); }
static Eigen::bfloat16 denorm_min() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x0001); }
};
// If std::numeric_limits<T> is specialized, should also specialize
// std::numeric_limits<const T>, std::numeric_limits<volatile T>, and
// std::numeric_limits<const volatile T>
// https://stackoverflow.com/a/16519653/
template<>
struct numeric_limits<const Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
template<>
struct numeric_limits<volatile Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
template<>
struct numeric_limits<const volatile Eigen::bfloat16> : numeric_limits<Eigen::bfloat16> {};
} // namespace std
namespace Eigen {
namespace bfloat16_impl {
// We need to distinguish clang as the CUDA compiler from clang as the host compiler,
// invoked by NVCC (e.g. on MacOS). The former needs to see both host and device implementation
// of the functions, while the latter can only deal with one of them.
#if !defined(EIGEN_HAS_NATIVE_BF16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for bfloat16 floats
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
// We need to provide emulated *host-side* BF16 operators for clang.
#pragma push_macro("EIGEN_DEVICE_FUNC")
#undef EIGEN_DEVICE_FUNC
#if defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_NATIVE_BF16)
#define EIGEN_DEVICE_FUNC __host__
#else // both host and device need emulated ops.
#define EIGEN_DEVICE_FUNC __host__ __device__
#endif
#endif
// Definitions for CPUs, mostly working through conversion
// to/from fp32.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) + float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const bfloat16& a, const int& b) {
return bfloat16(float(a) + static_cast<float>(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const int& a, const bfloat16& b) {
return bfloat16(static_cast<float>(a) + float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator * (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) * float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator - (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) - float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator / (const bfloat16& a, const bfloat16& b) {
return bfloat16(float(a) / float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator - (const bfloat16& a) {
bfloat16 result;
result.value = a.value ^ 0x8000;
return result;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator += (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) + float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator *= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) * float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator -= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) - float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator /= (bfloat16& a, const bfloat16& b) {
a = bfloat16(float(a) / float(b));
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator++(bfloat16& a) {
a += bfloat16(1);
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator--(bfloat16& a) {
a -= bfloat16(1);
return a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator++(bfloat16& a, int) {
bfloat16 original_value = a;
++a;
return original_value;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator--(bfloat16& a, int) {
bfloat16 original_value = a;
--a;
return original_value;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator == (const bfloat16& a, const bfloat16& b) {
return numext::equal_strict(float(a),float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator != (const bfloat16& a, const bfloat16& b) {
return numext::not_equal_strict(float(a), float(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator < (const bfloat16& a, const bfloat16& b) {
return float(a) < float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator <= (const bfloat16& a, const bfloat16& b) {
return float(a) <= float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator > (const bfloat16& a, const bfloat16& b) {
return float(a) > float(b);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const bfloat16& a, const bfloat16& b) {
return float(a) >= float(b);
}
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
#pragma pop_macro("EIGEN_DEVICE_FUNC")
#endif
#endif // Emulate support for bfloat16 floats
// Division by an index. Do it in full float precision to avoid accuracy
// issues in converting the denominator to bfloat16.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator / (const bfloat16& a, Index b) {
return bfloat16(static_cast<float>(a) / static_cast<float>(b));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw truncate_to_bfloat16(const float v) {
__bfloat16_raw output;
if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(v)) {
output.value = std::signbit(v) ? 0xFFC0: 0x7FC0;
return output;
}
output.value = static_cast<numext::uint16_t>(numext::bit_cast<numext::uint32_t>(v) >> 16);
return output;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw raw_uint16_to_bfloat16(numext::uint16_t value) {
return __bfloat16_raw(value);
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR numext::uint16_t raw_bfloat16_as_uint16(const __bfloat16_raw& bf) {
return bf.value;
}
// float_to_bfloat16_rtne template specialization that does not make any
// assumption about the value of its function argument (ff).
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<false>(float ff) {
#if (defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_HIP_BF16))
// Nothing to do here
#else
__bfloat16_raw output;
if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(ff)) {
// If the value is a NaN, squash it to a qNaN with msb of fraction set,
// this makes sure after truncation we don't end up with an inf.
//
// qNaN magic: All exponent bits set + most significant bit of fraction
// set.
output.value = std::signbit(ff) ? 0xFFC0: 0x7FC0;
} else {
// Fast rounding algorithm that rounds a half value to nearest even. This
// reduces expected error when we convert a large number of floats. Here
// is how it works:
//
// Definitions:
// To convert a float 32 to bfloat16, a float 32 can be viewed as 32 bits
// with the following tags:
//
// Sign | Exp (8 bits) | Frac (23 bits)
// S EEEEEEEE FFFFFFLRTTTTTTTTTTTTTTT
//
// S: Sign bit.
// E: Exponent bits.
// F: First 6 bits of fraction.
// L: Least significant bit of resulting bfloat16 if we truncate away the
// rest of the float32. This is also the 7th bit of fraction
// R: Rounding bit, 8th bit of fraction.
// T: Sticky bits, rest of fraction, 15 bits.
//
// To round half to nearest even, there are 3 cases where we want to round
// down (simply truncate the result of the bits away, which consists of
// rounding bit and sticky bits) and two cases where we want to round up
// (truncate then add one to the result).
//
// The fast converting algorithm simply adds lsb (L) to 0x7fff (15 bits of
// 1s) as the rounding bias, adds the rounding bias to the input, then
// truncates the last 16 bits away.
//
// To understand how it works, we can analyze this algorithm case by case:
//
// 1. L = 0, R = 0:
// Expect: round down, this is less than half value.
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding rounding bias to input may create any carry, depending on
// whether there is any value set to 1 in T bits.
// - R may be set to 1 if there is a carry.
// - L remains 0.
// - Note that this case also handles Inf and -Inf, where all fraction
// bits, including L, R and Ts are all 0. The output remains Inf after
// this algorithm.
//
// 2. L = 1, R = 0:
// Expect: round down, this is less than half value.
//
// Algorithm:
// - Rounding bias: 0x7fff + 1 = 0x8000
// - Adding rounding bias to input doesn't change sticky bits but
// adds 1 to rounding bit.
// - L remains 1.
//
// 3. L = 0, R = 1, all of T are 0:
// Expect: round down, this is exactly at half, the result is already
// even (L=0).
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding rounding bias to input sets all sticky bits to 1, but
// doesn't create a carry.
// - R remains 1.
// - L remains 0.
//
// 4. L = 1, R = 1:
// Expect: round up, this is exactly at half, the result needs to be
// round to the next even number.
//
// Algorithm:
// - Rounding bias: 0x7fff + 1 = 0x8000
// - Adding rounding bias to input doesn't change sticky bits, but
// creates a carry from rounding bit.
// - The carry sets L to 0, creates another carry bit and propagate
// forward to F bits.
// - If all the F bits are 1, a carry then propagates to the exponent
// bits, which then creates the minimum value with the next exponent
// value. Note that we won't have the case where exponents are all 1,
// since that's either a NaN (handled in the other if condition) or inf
// (handled in case 1).
//
// 5. L = 0, R = 1, any of T is 1:
// Expect: round up, this is greater than half.
//
// Algorithm:
// - Rounding bias: 0x7fff + 0 = 0x7fff
// - Adding rounding bias to input creates a carry from sticky bits,
// sets rounding bit to 0, then create another carry.
// - The second carry sets L to 1.
//
// Examples:
//
// Exact half value that is already even:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1000000000000000
//
// This falls into case 3. We truncate the rest of 16 bits and no
// carry is created into F and L:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
//
// Exact half value, round to next even number:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1000000000000000
//
// This falls into case 4. We create a carry from R and T,
// which then propagates into L and F:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
//
//
// Max denormal value round to min normal value:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1111111111111111
//
// This falls into case 4. We create a carry from R and T,
// propagate into L and F, which then propagates into exponent
// bits:
//
// Output:
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
//
// Max normal value round to Inf:
// Input:
// Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit)
// S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT
// 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1111111111111111
//
// This falls into case 4. We create a carry from R and T,
// propagate into L and F, which then propagates into exponent
// bits:
//
// Sign | Exp (8 bit) | Frac (first 7 bit)
// S E E E E E E E E F F F F F F L
// 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
// At this point, ff must be either a normal float, or +/-infinity.
output = float_to_bfloat16_rtne<true>(ff);
}
return output;
#endif
}
// float_to_bfloat16_rtne template specialization that assumes that its function
// argument (ff) is either a normal floating point number, or +/-infinity, or
// zero. Used to improve the runtime performance of conversion from an integer
// type to bfloat16.
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne<true>(float ff) {
#if (defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_HIP_BF16))
// Nothing to do here
#else
numext::uint32_t input = numext::bit_cast<numext::uint32_t>(ff);
__bfloat16_raw output;
// Least significant bit of resulting bfloat.
numext::uint32_t lsb = (input >> 16) & 1;
numext::uint32_t rounding_bias = 0x7fff + lsb;
input += rounding_bias;
output.value = static_cast<numext::uint16_t>(input >> 16);
return output;
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float bfloat16_to_float(__bfloat16_raw h) {
return numext::bit_cast<float>(static_cast<numext::uint32_t>(h.value) << 16);
}
// --- standard functions ---
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isinf)(const bfloat16& a) {
EIGEN_USING_STD(isinf);
return (isinf)(float(a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isnan)(const bfloat16& a) {
EIGEN_USING_STD(isnan);
return (isnan)(float(a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isfinite)(const bfloat16& a) {
return !(isinf EIGEN_NOT_A_MACRO (a)) && !(isnan EIGEN_NOT_A_MACRO (a));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 abs(const bfloat16& a) {
bfloat16 result;
result.value = a.value & 0x7FFF;
return result;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 exp(const bfloat16& a) {
return bfloat16(::expf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 expm1(const bfloat16& a) {
return bfloat16(numext::expm1(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log(const bfloat16& a) {
return bfloat16(::logf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log1p(const bfloat16& a) {
return bfloat16(numext::log1p(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log10(const bfloat16& a) {
return bfloat16(::log10f(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log2(const bfloat16& a) {
return bfloat16(static_cast<float>(EIGEN_LOG2E) * ::logf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sqrt(const bfloat16& a) {
return bfloat16(::sqrtf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 pow(const bfloat16& a, const bfloat16& b) {
return bfloat16(::powf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sin(const bfloat16& a) {
return bfloat16(::sinf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 cos(const bfloat16& a) {
return bfloat16(::cosf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 tan(const bfloat16& a) {
return bfloat16(::tanf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 asin(const bfloat16& a) {
return bfloat16(::asinf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 acos(const bfloat16& a) {
return bfloat16(::acosf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 atan(const bfloat16& a) {
return bfloat16(::atanf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sinh(const bfloat16& a) {
return bfloat16(::sinhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 cosh(const bfloat16& a) {
return bfloat16(::coshf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 tanh(const bfloat16& a) {
return bfloat16(::tanhf(float(a)));
}
#if EIGEN_HAS_CXX11_MATH
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 asinh(const bfloat16& a) {
return bfloat16(::asinhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 acosh(const bfloat16& a) {
return bfloat16(::acoshf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 atanh(const bfloat16& a) {
return bfloat16(::atanhf(float(a)));
}
#endif
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 floor(const bfloat16& a) {
return bfloat16(::floorf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 ceil(const bfloat16& a) {
return bfloat16(::ceilf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 rint(const bfloat16& a) {
return bfloat16(::rintf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 round(const bfloat16& a) {
return bfloat16(::roundf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmod(const bfloat16& a, const bfloat16& b) {
return bfloat16(::fmodf(float(a), float(b)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 (min)(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return f2 < f1 ? b : a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 (max)(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return f1 < f2 ? b : a;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmin(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return bfloat16(::fminf(f1, f2));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmax(const bfloat16& a, const bfloat16& b) {
const float f1 = static_cast<float>(a);
const float f2 = static_cast<float>(b);
return bfloat16(::fmaxf(f1, f2));
}
#ifndef EIGEN_NO_IO
EIGEN_ALWAYS_INLINE std::ostream& operator << (std::ostream& os, const bfloat16& v) {
os << static_cast<float>(v);
return os;
}
#endif
} // namespace bfloat16_impl
namespace internal {
template<>
struct random_default_impl<bfloat16, false, false>
{
static inline bfloat16 run(const bfloat16& x, const bfloat16& y)
{
return x + (y-x) * bfloat16(float(std::rand()) / float(RAND_MAX));
}
static inline bfloat16 run()
{
return run(bfloat16(-1.f), bfloat16(1.f));
}
};
template<> struct is_arithmetic<bfloat16> { enum { value = true }; };
} // namespace internal
template<> struct NumTraits<Eigen::bfloat16>
: GenericNumTraits<Eigen::bfloat16>
{
enum {
IsSigned = true,
IsInteger = false,
IsComplex = false,
RequireInitialization = false
};
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 epsilon() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x3c00);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 dummy_precision() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x3D4D); // bfloat16(5e-2f);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 highest() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7F7F);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 lowest() {
return bfloat16_impl::raw_uint16_to_bfloat16(0xFF7F);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 infinity() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7f80);
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 quiet_NaN() {
return bfloat16_impl::raw_uint16_to_bfloat16(0x7fc0);
}
};
} // namespace Eigen
namespace Eigen {
namespace numext {
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isnan)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isnan)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isinf)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isinf)(h);
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
bool (isfinite)(const Eigen::bfloat16& h) {
return (bfloat16_impl::isfinite)(h);
}
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bit_cast<Eigen::bfloat16, uint16_t>(const uint16_t& src) {
return Eigen::bfloat16(Eigen::bfloat16_impl::raw_uint16_to_bfloat16(src));
}
template <>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC uint16_t bit_cast<uint16_t, Eigen::bfloat16>(const Eigen::bfloat16& src) {
return Eigen::bfloat16_impl::raw_bfloat16_as_uint16(src);
}
} // namespace numext
} // namespace Eigen
#if EIGEN_HAS_STD_HASH
namespace std {
template <>
struct hash<Eigen::bfloat16> {
EIGEN_STRONG_INLINE std::size_t operator()(const Eigen::bfloat16& a) const {
return static_cast<std::size_t>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(a));
}
};
} // namespace std
#endif
#endif // EIGEN_BFLOAT16_H

Some files were not shown because too many files have changed in this diff Show More