Compare commits


1989 Commits

Author SHA1 Message Date
Rasmus Larsen
a7c7d329d8 Merged in ezhulenev/eigen-01 (pull request PR-769)
Capture TensorMap by value inside tensor expression AST
2019-12-04 00:49:10 +00:00
Rasmus Larsen
cacf433975 Merged in anshuljl/eigen-2/Anshul-Jaiswal/update-configurevectorizationh-to-not-op-1573079916090 (pull request PR-754)
Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.

Approved-by: Deven Desai <deven.desai.amd@gmail.com>
2019-12-04 00:45:42 +00:00
Eugene Zhulenev
8f4536e852 Capture TensorMap by value inside tensor expression AST 2019-12-03 16:39:05 -08:00
Rasmus Munk Larsen
4e696901f8 Remove __host__ annotation for device-only function. 2019-12-03 14:33:19 -08:00
Rasmus Munk Larsen
ead81559c8 Use EIGEN_DEVICE_FUNC macro instead of __device__. 2019-12-03 12:08:22 -08:00
Gael Guennebaud
6358599ecb Fix QuaternionBase::cast for quaternion map and wrapper. 2019-12-03 14:51:14 +01:00
Gael Guennebaud
7745f69013 bug #1776: fix vector-wise STL iterator's operator-> using a proxy as pointer type.
This changeset fixes also the value_type definition.
2019-12-03 14:40:15 +01:00
Rasmus Munk Larsen
66f07efeae Revert the specialization for scalar_logistic_op<float> introduced in:
77b447c24e


While providing a 50% speedup on Haswell+ processors, the large relative error outside [-18, 18] in this approximation causes problems, e.g., when computing gradients of activation functions like softplus in neural networks.
2019-12-02 17:00:58 -08:00
Rasmus Larsen
3b15373bb3 Merged in ezhulenev/eigen-02 (pull request PR-767)
Fix shadow warnings in AlignedBox and SparseBlock
2019-12-02 18:23:11 +00:00
Deven Desai
312c8e77ff Fix for the HIP build+test errors.
Recent changes have introduced the following build error when compiling with HIPCC

---------

unsupported/test/../../Eigen/src/Core/GenericPacketMath.h:254:58: error:  'ldexp':  no overloaded function has restriction specifiers that are compatible with the ambient context 'pldexp'

---------

The fix for the error is to pick the math function(s) from the global namespace (where they are declared as device functions in the HIP header files) when compiling with HIPCC.
2019-12-02 17:41:32 +00:00
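A minimal sketch of the fix described in the commit above, assuming a __HIP_DEVICE_COMPILE__-style guard; the real code lives in Eigen's GenericPacketMath.h and uses Eigen's own macros:

```cpp
#include <cmath>

// Illustrative only: when HIPCC compiles device code, the global-namespace
// math functions are declared as device functions by the HIP headers, so we
// pick them there instead of from namespace std.
template <typename Scalar>
inline Scalar pldexp_sketch(const Scalar& a, int exponent) {
#if defined(__HIP_DEVICE_COMPILE__)
  return ::ldexp(a, exponent);   // device-qualified overload from the HIP headers
#else
  using std::ldexp;              // host path: the usual std:: overloads
  return ldexp(a, exponent);
#endif
}
```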
Rasmus Larsen
956131d0e6 Merged in codeplaysoftware/eigen/SYCL-Backend (pull request PR-691)
SYCL Backend

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-11-28 16:19:25 +00:00
Mehdi Goli
00f32752f7 [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch.
* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when the local memory version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for the SYCL backend on the host to allow SSE optimization on the host
* Adding Test for SYCL
* Modifying SYCL CMake
2019-11-28 10:08:54 +00:00
Eugene Zhulenev
82a47338df Fix shadow warnings in AlignedBox and SparseBlock 2019-11-27 16:22:27 -08:00
Rasmus Munk Larsen
ea51a9eace Add missing EIGEN_DEVICE_FUNC attribute to template specializations for pexp to fix GPU build. 2019-11-27 10:17:09 -08:00
Rasmus Munk Larsen
5a3ebda36b Fix warning due to missing cast for exponent arguments for std::frexp and std::ldexp. 2019-11-26 16:18:29 -08:00
Rasmus Larsen
2df57be856 Merged in realjhol/eigen/fix-warnings (pull request PR-760)
Fix warnings
2019-11-26 23:24:23 +00:00
Eugene Zhulenev
5496d0da0b Add async evaluation support to TensorReverse 2019-11-26 15:02:24 -08:00
Eugene Zhulenev
bc66c88255 Add async evaluation support to TensorPadding/TensorImagePatch/TensorShuffling 2019-11-26 11:41:57 -08:00
Gael Guennebaud
c79b6ffe1f Add an explicit example for auto and re-evaluation 2019-11-20 17:31:23 +01:00
Hans Johnson
e78ed6e7f3 COMP: Simplify install commands for Eigen
Confirm that the install directory is identical
before and after this simplifying patch.

```bash
hg clone <<Eigen>>
mkdir eigen-bld
cd eigen-bld
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/pre_eigen_modernize
make install
find /tmp/pre_eigen_modernize >/tmp/bef

#  Apply this patch

cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/post_eigen_modernize
make install
find /tmp/post_eigen_modernize |sed 's/post_e/pre_e/g' >/tmp/aft
diff /tmp/bef /tmp/aft
```
2019-11-17 15:14:25 -06:00
Hans Johnson
9d5cdc98c3 COMP: target_compile_definitions requires cmake 2.8.11
Features committed in 2016 have required CMake version 2.8.11.

`sergiu Tue Nov 22 12:25:06 2016 +0100:   target_compile_definitions`

Set the minimum cmake version to the minimum version that
is capable of compiling or installing the code base.
2019-11-17 14:59:32 -06:00
Gael Guennebaud
e5778b87b9 Fix duplicate symbol linking error. 2019-11-20 17:23:19 +01:00
Joel Holdsworth
86eb41f1cb SparseRef: Fixed alignment warning on ARM GCC 2019-11-07 14:34:06 +00:00
Anshul Jaiswal
c1a67cb5af Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda. 2019-11-06 22:40:38 +00:00
Rasmus Munk Larsen
cc3d0e6a40 Add EIGEN_HAS_INTRINSIC_INT128 macro
Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
2019-11-06 14:24:33 -08:00
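A hedged sketch of the gating such a macro enables; the exact conditions Eigen tests are not reproduced here, only the idea of centralizing the check instead of testing __SIZEOF_INT128__ at every use site:

```cpp
// Hypothetical macro name: one central test can exclude configurations
// (e.g. Clang targeting Windows) where __SIZEOF_INT128__ is defined but
// 128-bit integer arithmetic is unreliable.
#if defined(__SIZEOF_INT128__) && !(defined(__clang__) && defined(_MSC_VER))
  #define MY_HAS_INTRINSIC_INT128 1
#else
  #define MY_HAS_INTRINSIC_INT128 0
#endif

#if MY_HAS_INTRINSIC_INT128
typedef __uint128_t my_uint128;   // safe to rely on 128-bit arithmetic here
#endif
```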
Rasmus Munk Larsen
ee404667e2 Rollback of PR-746 and partial rollback of 668ab3fc47.

std::array is still not supported in CUDA device code on Windows.
2019-11-05 17:17:58 -08:00
Joel Holdsworth
743c925286 test/packetmath: Silence alignment warnings 2019-11-05 19:06:12 +00:00
Rasmus Larsen
0c9745903a Merged in ezhulenev/eigen-01 (pull request PR-746)
Remove internal::smart_copy and replace with std::copy
2019-11-04 20:18:38 +00:00
Hans Johnson
8c8cab1afd STYLE: Convert CMake-language commands to lower case
Ancient CMake versions required upper-case commands.  Later command names
became case-insensitive.  Now the preferred style is lower-case.
2019-10-31 11:36:37 -05:00
Hans Johnson
6fb3e5f176 STYLE: Remove CMake-language block-end command arguments
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen
f1e8307308 1. Fix a bug in psqrt and make it return 0 for +inf arguments.
2. Simplify handling of special cases by taking advantage of the fact that the
   builtin vrsqrt approximation handles negative, zero and +inf arguments correctly.
   This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:

Before: y = y * (1.5 - x/2 * y^2)
After: y = y * (1.5 - y * (x/2) * y)

Forming y^2 can overflow for very large or very small (denormalized) values of x, even though x*y^2 ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.

4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.

Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
2019-11-15 17:09:46 -08:00
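An illustrative scalar sketch of the Newton-Raphson rewrite in item 3 above; the packet code applies the same regrouping with pmul/pmadd, and the function names here are not Eigen's:

```cpp
// One Newton-Raphson refinement step for y ~= 1/sqrt(x).
float rsqrt_step_before(float x, float y) {
  // Forming y*y first overflows when x is denormal (y is huge) and
  // underflows when x is near FLT_MAX (y is tiny).
  return y * (1.5f - (0.5f * x) * (y * y));
}

float rsqrt_step_after(float x, float y) {
  // Grouping as y * (x/2) * y keeps the intermediate near sqrt(x)/2,
  // which stays representable across the whole range of x.
  return y * (1.5f - y * (0.5f * x) * y);
}
```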
Gael Guennebaud
2cb2915f90 bug #1744: fix compilation with MSVC 2017 and AVX512, plog1p/pexpm1 require plog/pexp, but the later was disabled on some compilers 2019-11-15 13:39:51 +01:00
Gael Guennebaud
c3f6fcf2c0 bug #1747: one more fix for MSVC regarding the Bessel implementation. 2019-11-15 11:12:35 +01:00
Gael Guennebaud
b9837ca9ae bug #1281: fix AutoDiffScalar's make_coherent for nested expression of constant ADs. 2019-11-14 14:58:08 +01:00
Gael Guennebaud
0fb6e24408 Fix case issue with Lapack unit tests 2019-11-14 14:16:05 +01:00
Gael Guennebaud
8af045a287 bug #1774: fix VectorwiseOp::begin()/end() return types regarding constness. 2019-11-14 11:45:52 +01:00
Sakshi Goynar
75b4c0a3e0 PR 751: Fixed compilation issue when compiling using MSVC with /arch:AVX512 flag 2019-10-31 16:09:16 -07:00
Gael Guennebaud
8496f86f84 Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices. 2019-11-13 21:16:53 +01:00
Gael Guennebaud
002e5b6db6 Move to my.cdash.org 2019-11-13 13:33:49 +01:00
Eugene Zhulenev
13c3327f5c Remove legacy block evaluation support 2019-11-12 10:12:28 -08:00
Gael Guennebaud
71aa53dd6d Disable AVX on broken xcode versions. See PR 748.
Patch adapted from Hans Johnson's PR 748.
2019-11-12 11:40:38 +01:00
Rasmus Munk Larsen
0ed0338593 Fix a race in async tensor evaluation: Don't run on_done() until after device.deallocate() / evaluator.cleanup() complete, since the device might be destroyed after on_done() runs. 2019-11-11 12:26:41 -08:00
Eugene Zhulenev
c952b8dfda Break loop dependence in TensorGenerator block access 2019-11-11 10:32:57 -08:00
Rasmus Munk Larsen
ebf04fb3e8 Fix data race in cxx11_tensor_notification test. 2019-11-08 17:44:50 -08:00
Eugene Zhulenev
73ecb2c57d Cleanup includes in Tensor module after switch to C++11 and above 2019-10-29 15:49:54 -07:00
Eugene Zhulenev
e7ed4bd388 Remove internal::smart_copy and replace with std::copy 2019-10-29 11:25:24 -07:00
Eugene Zhulenev
fbc0a9a3ec Fix CXX11Meta compilation with MSVC 2019-10-28 18:30:10 -07:00
Eugene Zhulenev
bd864ab42b Prevent potential ODR violation in TensorExecutor 2019-10-28 15:45:09 -07:00
Mehdi Goli
6332aff0b2 This PR fixes:
* The specialization of the array class in a different namespace for GCC<=6.4
* The implicit call to `std::array` constructor using the initializer list for GCC <=6.1
2019-10-23 15:56:56 +01:00
Rasmus Larsen
8e4e29ae99 Merged in deven-amd/eigen-hip-fix-191018 (pull request PR-738)
Fix for the HIP build+test errors.
2019-10-22 22:18:38 +00:00
Rasmus Munk Larsen
97c0c5d485 Add block evaluation V2 to TensorAsyncExecutor.
Add async evaluation to a number of ops.
2019-10-22 12:42:44 -07:00
Deven Desai
102cf2a72d Fix for the HIP build+test errors.
The errors were introduced by this commit :

After the above mentioned commit, some of the tests started failing with the following error


```
Built target cxx11_tensor_reduction
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:117:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:155:5: error: the field type is not amp-compatible
    DestinationBufferKind m_kind;
    ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:211:3: error: the field type is not amp-compatible
  DestinationBuffer m_destination;
  ^
```


For some reason HIPCC does not like device code containing enum types whose underlying type is not explicitly declared. The fix is trivial: explicitly state "int" as the base type.
2019-10-22 19:21:27 +00:00
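A small sketch of the enum fix described above; the enumerator names are only illustrative, while DestinationBufferKind and m_kind come from the quoted error:

```cpp
// Before: HIPCC rejects a device-side field whose enum type has no explicit
// underlying type ("the field type is not amp-compatible").
//   enum DestinationBufferKind { kEmpty, kContiguous, kStrided };

// After: spelling out "int" as the underlying type satisfies HIPCC.
enum DestinationBufferKind : int { kEmpty, kContiguous, kStrided };

struct DestinationBuffer {
  DestinationBufferKind m_kind;  // now accepted inside device code
};
```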
Rasmus Munk Larsen
668ab3fc47 Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers. 2019-10-18 16:42:00 -07:00
Eugene Zhulenev
df0e8b8137 Propagate block evaluation preference through rvalue tensor expressions 2019-10-17 11:17:33 -07:00
Eugene Zhulenev
0d2a14ce11 Cleanup Tensor block destination and materialized block storage allocation 2019-10-16 17:14:37 -07:00
Eugene Zhulenev
02431cbe71 TensorBroadcasting support for random/uniform blocks 2019-10-16 13:26:28 -07:00
Eugene Zhulenev
d380c23b2c Block evaluation for TensorGenerator/TensorReverse/TensorShuffling 2019-10-14 14:31:59 -07:00
Gael Guennebaud
39fb9eeccf bug #1747: fix compilation with MSVC 2019-10-14 22:50:23 +02:00
Eugene Zhulenev
a411e9f344 Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op 2019-10-10 10:56:58 -07:00
Rasmus Larsen
b03eb63d7c Merged in ezhulenev/eigen-01 (pull request PR-726)
Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing
2019-10-10 16:58:11 +00:00
Gael Guennebaud
e7d8ba747c bug #1752: make is_convertible equivalent to the std c++11 equivalent and fallback to std::is_convertible when c++11 is enabled. 2019-10-10 17:41:47 +02:00
Gael Guennebaud
fb557aec5c bug #1752: disable some is_convertible tests for recent compilers. 2019-10-10 11:40:21 +02:00
Eugene Zhulenev
33e1746139 Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing 2019-10-09 12:45:31 -07:00
Gael Guennebaud
f0a4642bab Implement c++03 compatible fix for changeset 7a43af1a33 2019-10-09 16:00:57 +02:00
Gael Guennebaud
196de2efe3 Explicitly bypass resize and memmoves when there is already the exact right number of elements available. 2019-10-08 21:44:33 +02:00
Gael Guennebaud
36da231a41 Disable an expected warning in unit test 2019-10-08 16:28:14 +02:00
Gael Guennebaud
d1def335dc fix one more possible conflicts with real/imag 2019-10-08 16:19:10 +02:00
Gael Guennebaud
87427d2eaa PR 719: fix real/imag namespace conflict 2019-10-08 09:15:17 +02:00
Gael Guennebaud
7a43af1a33 Fix compilation of FFTW unit test 2019-10-08 08:58:35 +02:00
Eugene Zhulenev
f74ab8cb8d Add block evaluation to TensorEvalTo and fix few small bugs 2019-10-07 15:34:26 -07:00
Brian Zhao
3afb640b56 Fixing incorrect size in Tensor documentation. 2019-10-04 21:30:35 -07:00
Rasmus Munk Larsen
20c4a9118f Use "pdiv" rather than operator/ to support packet types. 2019-10-04 16:54:03 -07:00
Rasmus Larsen
d1dd51cb5f Merged in ezhulenev/eigen-01 (pull request PR-723)
Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-10-04 17:19:13 +00:00
Eugene Zhulenev
98bdd7252e Fix compilation warnings and errors with clang in TensorBlockV2 code and tests 2019-10-04 10:15:33 -07:00
Rasmus Munk Larsen
fab4e3a753 Address comments on Chebyshev evaluation code:
1. Use pmadd when possible.
2. Add casts to avoid c++03 warnings.
2019-10-02 12:48:17 -07:00
Eugene Zhulenev
60ae24ee1a Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect 2019-10-02 12:44:06 -07:00
Eugene Zhulenev
6e40454a6e Add beta to TensorContractionKernel and make memset optional 2019-10-02 11:06:02 -07:00
Rasmus Munk Larsen
bd0fac456f Prevent infinite loop in the nvcc compiler while unrolling the recurrent templates for Chebyshev polynomial evaluation. 2019-10-01 13:15:30 -07:00
Gael Guennebaud
9549ba8313 Fix perf issue in SimplicialLDLT::solve for complexes (again, m_diag is real) 2019-10-01 12:54:25 +02:00
Gael Guennebaud
c8b2c603b0 Fix speed issue with SimplicialLDLT for complexes: the diagonal is real! 2019-09-30 16:14:34 +02:00
Rasmus Munk Larsen
13ef08e5ac Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h. 2019-09-27 13:56:04 -07:00
Eugene Zhulenev
7c8bc0d928 Fix cxx11_tensor_block_io test 2019-09-25 11:48:11 -07:00
Eugene Zhulenev
0c845e28c9 Fix erf in c++03 2019-09-25 11:31:45 -07:00
Eugene Zhulenev
71d5bedf72 Fix compilation warnings and errors with clang in TensorBlockV2 2019-09-25 11:25:22 -07:00
Deven Desai
5e186b1987 Fix for the HIP build+test errors.
The errors were introduced by this commit : d38e6fbc27


After the above mentioned commit, some of the tests started failing with the following error


```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
  return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
                     ^~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
float erf(const float &x) { return ::erff(x); }
      ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
    erf(const Scalar& x) {
    ^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
  return make_double2(erf(a.x), erf(a.y));
                      ^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
    erf(const Scalar& x) {
    ^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
  return make_double2(erf(a.x), erf(a.y));
                                ^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
    erf(const Scalar& x) {
    ^
3 errors generated.
```


This PR fixes the compile error by removing the "old" implementation of "erf" (assuming that the "new" implementation is what we want going forward; from a GPU point of view both implementations are the same).

This PR also fixes what seems like a cut-and-paste error in the aforementioned commit.
2019-09-25 15:39:13 +00:00
Eugene Zhulenev
f35b9ab510 Fix a bug in a packed block type in TensorContractionThreadPool 2019-09-24 16:54:36 -07:00
Rasmus Larsen
d38e6fbc27 Merged in rmlarsen/eigen (pull request PR-704)
Add generic PacketMath implementation of the Error Function (erf).
2019-09-24 23:40:29 +00:00
Rasmus Munk Larsen
591a554c68 Add TODO to cleanup FMA cost modelling. 2019-09-24 16:39:25 -07:00
Eugene Zhulenev
c64396b4c6 Choose TensorBlock StridedLinearCopy type statically 2019-09-24 16:04:29 -07:00
Eugene Zhulenev
c97b208468 Add new TensorBlock api implementation + tests 2019-09-24 15:17:35 -07:00
Eugene Zhulenev
ef9dfee7bd Tensor block evaluation V2 support for unary/binary/broadcsting 2019-09-24 12:52:45 -07:00
Christoph Hertzberg
efd9867ff0 bug #1746: Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types 2019-09-24 11:09:58 +02:00
Christoph Hertzberg
e4c1b3c1d2 Fix implicit conversion warnings and use pnegate to negate packets 2019-09-23 16:07:43 +02:00
Christoph Hertzberg
ba0736fa8e Fix (or mask away) conversion warnings introduced in 553caeb6a3.
2019-09-23 15:58:05 +02:00
Rasmus Munk Larsen
1d5af0693c Add support for asynchronous evaluation of tensor casting expressions. 2019-09-19 13:54:49 -07:00
Rasmus Munk Larsen
6de5ed08d8 Add generic PacketMath implementation of the Error Function (erf). 2019-09-19 12:48:30 -07:00
Rasmus Munk Larsen
28b6786498 Fix build on setups without AVX512DQ. 2019-09-19 12:36:09 -07:00
Deven Desai
e02d429637 Fix for the HIP build+test errors.
The errors were introduced by this commit : 6e215cf109


The fix is switching to using ::<math_func> instead of std::<math_func> when compiling for GPU
2019-09-18 18:44:20 +00:00
Srinivas Vasudevan
df0816b71f Merging eigen/eigen. 2019-09-16 19:33:29 -04:00
Srinivas Vasudevan
6e215cf109 Add Bessel functions to SpecialFunctions.
- Split SpecialFunctions files into a separate BesselFunctions file.

In particular add:
    - Modified bessel functions of the second kind k0, k1, k0e, k1e
    - Bessel functions of the first kind j0, j1
    - Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Eugene Zhulenev
7c73296849 Revert accidental change to GCC diagnostics 2019-09-13 14:30:58 -07:00
Eugene Zhulenev
bf8866b466 Fix maybe-uninitialized warnings in TensorContractionThreadPool 2019-09-13 14:29:55 -07:00
Eugene Zhulenev
553caeb6a3 Use ThreadLocal container in TensorContractionThreadPool 2019-09-13 12:14:44 -07:00
Srinivas Vasudevan
facdec5aa7 Add packetized versions of i0e and i1e special functions.
- In particular, refactor the i0e and i1e code so the scalar and vectorized paths share code.
  - Move chebevl to GenericPacketMathFunctions.


A brief benchmark with building Eigen with FMA, AVX and AVX2 flags

Before:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            57.3           57.3     10000000
BM_eigen_i0e_double/8           398            398        1748554
BM_eigen_i0e_double/64         3184           3184         218961
BM_eigen_i0e_double/512       25579          25579          27330
BM_eigen_i0e_double/4k       205043         205042           3418
BM_eigen_i0e_double/32k     1646038        1646176            422
BM_eigen_i0e_double/256k   13180959       13182613             53
BM_eigen_i0e_double/1M     52684617       52706132             10
BM_eigen_i0e_float/1             28.4           28.4     24636711
BM_eigen_i0e_float/8             75.7           75.7      9207634
BM_eigen_i0e_float/64           512            512        1000000
BM_eigen_i0e_float/512         4194           4194         166359
BM_eigen_i0e_float/4k         32756          32761          21373
BM_eigen_i0e_float/32k       261133         261153           2678
BM_eigen_i0e_float/256k     2087938        2088231            333
BM_eigen_i0e_float/1M       8380409        8381234             84
BM_eigen_i1e_double/1            56.3           56.3     10000000
BM_eigen_i1e_double/8           397            397        1772376
BM_eigen_i1e_double/64         3114           3115         223881
BM_eigen_i1e_double/512       25358          25361          27761
BM_eigen_i1e_double/4k       203543         203593           3462
BM_eigen_i1e_double/32k     1613649        1613803            428
BM_eigen_i1e_double/256k   12910625       12910374             54
BM_eigen_i1e_double/1M     51723824       51723991             10
BM_eigen_i1e_float/1             28.3           28.3     24683049
BM_eigen_i1e_float/8             74.8           74.9      9366216
BM_eigen_i1e_float/64           505            505        1000000
BM_eigen_i1e_float/512         4068           4068         171690
BM_eigen_i1e_float/4k         31803          31806          21948
BM_eigen_i1e_float/32k       253637         253692           2763
BM_eigen_i1e_float/256k     2019711        2019918            346
BM_eigen_i1e_float/1M       8238681        8238713             86


After:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            15.8           15.8     44097476
BM_eigen_i0e_double/8            99.3           99.3      7014884
BM_eigen_i0e_double/64          777            777         886612
BM_eigen_i0e_double/512        6180           6181         100000
BM_eigen_i0e_double/4k        48136          48140          14678
BM_eigen_i0e_double/32k      385936         385943           1801
BM_eigen_i0e_double/256k    3293324        3293551            228
BM_eigen_i0e_double/1M     12423600       12424458             57
BM_eigen_i0e_float/1             16.3           16.3     43038042
BM_eigen_i0e_float/8             30.1           30.1     23456931
BM_eigen_i0e_float/64           169            169        4132875
BM_eigen_i0e_float/512         1338           1339         516860
BM_eigen_i0e_float/4k         10191          10191          68513
BM_eigen_i0e_float/32k        81338          81337           8531
BM_eigen_i0e_float/256k      651807         651984           1000
BM_eigen_i0e_float/1M       2633821        2634187            268
BM_eigen_i1e_double/1            16.2           16.2     42352499
BM_eigen_i1e_double/8           110            110        6316524
BM_eigen_i1e_double/64          822            822         851065
BM_eigen_i1e_double/512        6480           6481         100000
BM_eigen_i1e_double/4k        51843          51843          10000
BM_eigen_i1e_double/32k      414854         414852           1680
BM_eigen_i1e_double/256k    3320001        3320568            212
BM_eigen_i1e_double/1M     13442795       13442391             53
BM_eigen_i1e_float/1             17.6           17.6     41025735
BM_eigen_i1e_float/8             35.5           35.5     19597891
BM_eigen_i1e_float/64           240            240        2924237
BM_eigen_i1e_float/512         1424           1424         485953
BM_eigen_i1e_float/4k         10722          10723          65162
BM_eigen_i1e_float/32k        86286          86297           8048
BM_eigen_i1e_float/256k      691821         691868           1000
BM_eigen_i1e_float/1M       2777336        2777747            256


This shows anywhere from a 50% to 75% improvement on these operations.

I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).

Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
Srinivas Vasudevan
b052ec6992 Merged eigen/eigen into default 2019-09-11 18:01:54 -07:00
Deven Desai
cdb377d0cb Fix for the HIP build+test errors introduced by the ndtri support.
The fixes needed are
 * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
 * switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
 * removing an errant "j" from a testcase (don't know how that made it in to begin with!)
2019-09-06 16:03:49 +00:00
Gael Guennebaud
747c6a51ca bug #1736: fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i) 2019-09-11 15:40:07 +02:00
Gael Guennebaud
031f17117d bug #1741: fix self-adjoint*matrix, triangular*matrix, and triangular^1*matrix with a destination having a non-trivial inner-stride 2019-09-11 15:04:25 +02:00
Gael Guennebaud
459b2bcc08 Fix compilation of BLAS backend and frontend 2019-09-11 10:02:37 +02:00
Rasmus Larsen
97f1e1d89f Merged in ezhulenev/eigen-01 (pull request PR-698)
ThreadLocal container that does not rely on thread local storage

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-09-10 23:19:33 +00:00
Eugene Zhulenev
d918bd9a8b Update ThreadLocal to use separate Initialize/Release callables 2019-09-10 16:13:32 -07:00
Gael Guennebaud
afa8d13532 Fix some implicit literal to Scalar conversions in SparseCore 2019-09-11 00:03:07 +02:00
Gael Guennebaud
c06e6fd115 bug #1741: fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride 2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956 bug #1741: fix C.noalias() = A*C; with C.innerStride()!=1 2019-09-10 16:25:24 +02:00
Eugene Zhulenev
e3dec4dcc1 ThreadLocal container that does not rely on thread local storage 2019-09-09 15:18:14 -07:00
Gael Guennebaud
17226100c5 Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.
Another solution would have been to make pshift* fully generic template functions with
partial specialization which is always a mess in c++03.
2019-09-06 09:26:04 +02:00
Gael Guennebaud
55b63d4ea3 Fix compilation without vector engine available (e.g., x86 with SSE disabled):
-> ppolevl is required by ndtri even for the scalar path
2019-09-05 18:16:46 +02:00
Srinivas Vasudevan
a9cf823db7 Merged eigen/eigen 2019-09-04 23:50:52 -04:00
Gael Guennebaud
e6c183f8fd Fix doc issues regarding ndtri 2019-09-04 23:00:21 +02:00
Gael Guennebaud
5702a57926 Fix possible warning regarding strict equality comparisons 2019-09-04 22:57:04 +02:00
Srinivas Vasudevan
99036a3615 Merging from eigen/eigen. 2019-09-03 15:34:47 -04:00
Eugene Zhulenev
a8d264fa9c Add test for const TensorMap underlying data mutation 2019-09-03 11:38:39 -07:00
Eugene Zhulenev
f68f2bba09 TensorMap constness should not change underlying storage constness 2019-09-03 11:08:09 -07:00
Gael Guennebaud
8e7e3d9bc8 Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688) 2019-09-03 13:09:03 +02:00
Srinivas Vasudevan
e38dd48a27 PR 681: Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Eugene Zhulenev
f59bed7a13 Change typedefs from private to protected to fix MSVC compilation 2019-09-03 19:11:36 -07:00
Eugene Zhulenev
47fefa235f Allow move-only done callback in TensorAsyncDevice 2019-09-03 17:20:56 -07:00
Srinivas Vasudevan
18ceb3413d Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
d55d392e7b Fix bugs in log1p and expm1 where repeated using statements would clobber each other.
Add specializations for complex types since std::log1p and std::expm1 do not support complex.
2019-08-08 16:27:32 -07:00
Rasmus Munk Larsen
85928e5f47 Guard against repeated definition of EIGEN_MPL2_ONLY 2019-08-07 14:19:00 -07:00
Rasmus Munk Larsen
facc4e4536 Disable tests for contraction with output kernels when using libxsmm, which does not support this. 2019-08-07 14:11:15 -07:00
Rasmus Munk Larsen
eab7e52db2 [Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
A few benchmark numbers:

name                              old time/op             new time/op             delta
BM_Xent_16_10000_cpu              448µs ± 3%              389µs ± 2%  -13.21%          (p=0.008 n=5+5)
BM_Xent_32_10000_cpu              575µs ± 6%              454µs ± 3%  -21.00%          (p=0.008 n=5+5)
BM_Xent_64_10000_cpu              933µs ± 4%              712µs ± 1%  -23.71%          (p=0.008 n=5+5)
2019-08-07 12:57:42 -07:00
Rasmus Munk Larsen
0987126165 Clean up unnecessary namespace specifiers in TensorBlock.h. 2019-08-07 12:12:52 -07:00
Gael Guennebaud
0050644b23 Fix doc regarding alignment and c++17 2019-08-04 01:09:41 +02:00
Rasmus Munk Larsen
e2999d4c38 Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662.
The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:


Benchmark                       Time(ns)        CPU(ns)     Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4     128173         231326           2922  1.610G items/s

VS

Benchmark                       Time(ns)        CPU(ns)     Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4     146683         246914           2719  1.509G items/s
2019-08-02 11:18:13 -07:00
Alberto Luaces
c694be1214 Fixed Tensor documentation formatting. 2019-07-23 09:24:06 +00:00
Gael Guennebaud
15f3d9d272 More colamd cleanup:
- Move the colamd implementation into its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix signed/unsigned warning
- move some ugly free functions to member functions
2019-09-03 00:50:51 +02:00
Anshul Jaiswal
a4d1a6cd7d Eigen_Colamd.h updated to replace constexpr with consts and enums. 2019-08-17 05:29:23 +00:00
Anshul Jaiswal
283558face Ordering.h edited to fix dependencies on Eigen_Colamd.h 2019-08-15 20:21:56 +00:00
Anshul Jaiswal
39f30923c2 Eigen_Colamd.h edited replacing macros with constexprs and functions. 2019-08-15 20:15:19 +00:00
Anshul Jaiswal
0a6b553ecf Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions 2019-07-21 04:53:31 +00:00
Kyle Vedder
f22b7283a3 Added leading asterisk for Doxygen to consume as it was removing asterisk intended to be part of the code. 2019-07-18 18:12:14 +00:00
Michael Grupp
6e17491f45 Fix typo in Umeyama method documentation 2019-07-17 11:20:41 +00:00
Christoph Hertzberg
e0f5a2a456 Remove {} accidentally added in previous commit 2019-07-18 20:22:17 +02:00
Christoph Hertzberg
ea6d7eb32f Move variadic constructors outside #ifndef EIGEN_PARSED_BY_DOXYGEN block, to make it actually appear in the generated documentation. 2019-07-12 19:46:37 +02:00
Christoph Hertzberg
9237883ff1 Escape \# inside doxygen docu 2019-07-12 19:45:13 +02:00
Christoph Hertzberg
c2671e5315 Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING
Also, document LinSpaced only where it is implemented
2019-07-12 19:43:32 +02:00
Eugene Zhulenev
3cd148f983 Fix expression evaluation heuristic for TensorSliceOp 2019-07-09 12:10:26 -07:00
Rasmus Munk Larsen
23b958818e Fix compiler for unsigned integers. 2019-07-09 11:18:25 -07:00
Eugene Zhulenev
6083014594 Add outer/inner chipping optimization for chipping dimension specified at runtime 2019-07-03 11:35:25 -07:00
Deven Desai
7eb2e0a95b adding the EIGEN_DEVICE_FUNC attribute to the constCast routine.
Not having this attribute results in the following failures in the `--config=rocm` TF build.

```
In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
In file included from ./tensorflow/core/framework/register_types.h:20:
In file included from ./tensorflow/core/framework/numeric_types.h:20:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error:  'Eigen::constCast':  no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
    typename Storage::Type result = constCast(m_impl.data());
                                    ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error:  'Eigen::constCast':  no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::Gpu\
Device>::data' requested here
    return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());

```

Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
2019-07-02 20:02:46 +00:00
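A minimal sketch of the annotation pattern described above, with a stand-in macro instead of Eigen's EIGEN_DEVICE_FUNC:

```cpp
// Stand-in for EIGEN_DEVICE_FUNC: expands to the host/device qualifiers only
// when a CUDA/HIP compiler is active, so the helper can be called from both
// host and device code.
#if defined(__CUDACC__) || defined(__HIPCC__)
  #define DEVICE_FUNC __host__ __device__
#else
  #define DEVICE_FUNC
#endif

template <typename T>
DEVICE_FUNC T* const_cast_sketch(const T* data) {
  return const_cast<T*>(data);
}
```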
Gael Guennebaud
ef8aca6a89 Merged in codeplaysoftware/eigen (pull request PR-667)
[SYCL] :

Approved-by: Gael Guennebaud <g.gael@free.fr>
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-07-02 12:45:23 +00:00
Eugene Zhulenev
4ac93f8edc Allocate non-const scalar buffer for block evaluation with DefaultDevice 2019-07-01 10:55:19 -07:00
Mehdi Goli
9ea490c82c [SYCL] :
* Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
  * Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
  * Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
  * Fixing the SYCL device macro in SpecialFunctionsImpl.h.
2019-07-01 16:27:28 +01:00
Eugene Zhulenev
81a03bec75 Fix TensorReverse on GPU with m_stride[i]==0 2019-06-28 15:50:39 -07:00
Rasmus Munk Larsen
8053eeb51e Fix CUDA compilation error for pselect<half>. 2019-06-28 12:07:29 -07:00
Rasmus Munk Larsen
74a9dd1102 Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit. 2019-06-28 11:56:21 -07:00
Rasmus Munk Larsen
70d4020ad9 Remove comma causing warning in c++03 mode. 2019-06-28 11:39:45 -07:00
Eugene Zhulenev
6e7c76481a Merge with Eigen head 2019-06-28 11:22:46 -07:00
Eugene Zhulenev
878845cb25 Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred 2019-06-28 11:13:44 -07:00
Rasmus Munk Larsen
1f61aee5ca [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:11:56 -07:00
Mehdi Goli
7d08fa805a [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:08:23 +01:00
Mehdi Goli
16a56b2ddd [SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL  backend in Core
*  Supporting Vectorization
2019-06-27 12:25:09 +01:00
Christoph Hertzberg
adec097c61 Remove extra comma (causes warnings in C++03) 2019-06-26 16:14:28 +02:00
Eugene Zhulenev
229db81572 Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp 2019-06-25 15:41:37 -07:00
Deven Desai
ba506d5bd2 fix for a ROCm/HIP specific compile error introduced by a recent commit. 2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen
c9394d7a0e Remove extra "one" in comment. 2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen
b8f8dac4eb Update comment as suggested by tra@google.com. 2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen
e5e63c2cad Fix grammar. 2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen
302a404b7e Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause. 2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen
b5237f53b1 Fix CUDA build on Mac. 2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen
988f24b730 Various fixes for packet ops.
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
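On item 3, a scalar sketch of negating an IEEE binary16 value by flipping its sign bit; Eigen's pnegate does the same with a packet XOR against a sign-bit mask, and the struct here is only illustrative:

```cpp
#include <cstdint>

struct half_bits { std::uint16_t x; };  // raw binary16 bit pattern

inline half_bits negate(half_bits h) {
  h.x ^= static_cast<std::uint16_t>(0x8000);  // flip only the sign bit
  return h;
}
```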
Christoph Hertzberg
e0be7f30e1 bug #1724: Mask buggy warnings with g++-7
(grafted from 427f2f66d6)
2019-06-14 14:57:46 +02:00
Anshul Jaiswal
fab51d133e Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD
to prevent conflicts with other libraries / code.
2019-06-08 21:09:06 +00:00
Eugene Zhulenev
79c402e40e Fix shadow warnings in TensorContractionThreadPool 2019-08-30 15:38:31 -07:00
Eugene Zhulenev
edf2ec28d8 Fix block mapper type name in TensorExecutor 2019-08-30 15:29:25 -07:00
Eugene Zhulenev
f0b36fb9a4 evalSubExprsIfNeededAsync + async TensorContractionThreadPool 2019-08-30 15:13:38 -07:00
Eugene Zhulenev
619cea9491 Revert accidentally removed <memory> header from ThreadPool 2019-08-30 14:51:17 -07:00
Eugene Zhulenev
66665e7e76 Asynchronous expression evaluation with TensorAsyncDevice 2019-08-30 14:49:40 -07:00
Rasmus Munk Larsen
f6c51d9209 Fix missing header inclusion and colliding definitions for half type casting, which broke
build with -march=native on Haswell/Skylake.
2019-08-30 14:03:29 -07:00
Eugene Zhulenev
bc40d4522c Const correctness in TensorMap<const Tensor<T, ...>> expressions 2019-08-28 17:46:05 -07:00
Rasmus Munk Larsen
1187bb65ad Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf. 2019-08-28 12:20:21 -07:00
Eugene Zhulenev
6e77f9bef3 Remove shadow warnings in TensorDeviceThreadPool 2019-08-28 10:32:19 -07:00
Rasmus Munk Larsen
9aba527405 Revert changes to std_falback::log1p that broke handling of arguments less than -1. Fix packet op accordingly. 2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
b021cdea6d Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories. 2019-08-27 11:30:31 -07:00
Rasmus Larsen
84fefdf321 Merged in ezhulenev/eigen-01 (pull request PR-683)
Asynchronous parallelFor in Eigen ThreadPoolDevice
2019-08-26 21:49:17 +00:00
maratek
8b5ab0e4dd Fix get_random_seed on Native Client
Newlib in the Native Client SDK does not provide the ::random function.
Implement get_random_seed for NaCl using ::rand, similarly to the Windows version.
2019-08-23 15:25:56 -07:00
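A rough sketch of the fallback described above, assuming only that std::rand and std::time are available; the actual Eigen helper may combine its sources differently:

```cpp
#include <cstdint>
#include <cstdlib>
#include <ctime>

// NaCl's newlib lacks ::random, so seed from rand() mixed with the clock.
inline std::uint64_t get_random_seed_sketch() {
  return static_cast<std::uint64_t>(std::time(nullptr)) ^
         static_cast<std::uint64_t>(std::rand());
}
```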
Eugene Zhulenev
6901788013 Asynchronous parallelFor in Eigen ThreadPoolDevice 2019-08-22 10:50:51 -07:00
Christoph Hertzberg
2fb24384c9 Merged in jaopaulolc/eigen (pull request PR-679)
Fixes for Altivec/VSX and compilation with clang on PowerPC
2019-08-22 15:57:33 +00:00
Rasmus Larsen
57f6b62597 Merged in rmlarsen/eigen (pull request PR-680)
Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
2019-08-22 00:25:29 +00:00
Eugene Zhulenev
071311821e Remove XSMM support from Tensor module 2019-08-19 11:44:25 -07:00
João P. L. de Carvalho
5ac7984ffa Fix debug macros in p{load,store}u 2019-08-14 11:59:12 -06:00
João P. L. de Carvalho
db9147ae40 Add missing pcmp_XX methods for double/Packet2d
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen
a3298b22ec Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)

The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.

Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
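For reference, Kahan's compensated formula mentioned above in scalar form; the vectorized path uses packet equivalents, and the infinity check mirrors the "handle infinite arguments" part of the change:

```cpp
#include <cmath>
#include <limits>

double log1p_kahan(double x) {
  if (x == std::numeric_limits<double>::infinity()) return x;  // log1p(+inf) = +inf
  const double u = 1.0 + x;
  if (u == 1.0) return x;                // 1+x rounded back to 1: log(1+x) ~= x
  return std::log(u) * (x / (u - 1.0));  // rescale to undo the rounding error in u
}
```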
João P. L. de Carvalho
787f6ef025 Fix packed load/store for PowerPC's VSX
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.

For double/Packet2d, vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when casted to float/Packet4f.

Silencing some weird warnings thrown by some GCC versions. Such warnings are not thrown by Clang.
2019-08-09 16:02:55 -06:00
João P. L. de Carvalho
4d29aa0294 Fix offset argument of ploadu/pstoreu for Altivec
If no offset is given, then it should be zero.

Also passes full address to vec_vsx_ld/st builtins.

Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.

Removes unnecessary casts.
2019-08-09 15:59:26 -06:00
João P. L. de Carvalho
66d073c38e bug #1718: Add cast to successfully compile with clang on PowerPC
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
2019-08-09 15:56:26 -06:00
Rasmus Munk Larsen
6d432eae5d Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off. 2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen
f715f6e816 Add workaround for choosing the right include files with FP16C support with clang. 2019-06-05 13:36:37 -07:00
Justin Carpentier
ffaf658ecd PR 655: Fix missing Eigen namespace in Macros 2019-06-05 09:51:59 +02:00
Mehdi Goli
0b24e1cb5c [SYCL] Adding the SYCL memory model. The SYCL memory model provides :
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
  * an interface for placeholder accessor to behave like a pointer on both host and device
2019-07-01 16:02:30 +01:00
Rasmus Larsen
c1b0aea653 Merged in Artem-B/eigen (pull request PR-654)
Minor build improvements

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-31 22:27:04 +00:00
Rasmus Munk Larsen
b08527b0c1 Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures. 2019-05-31 15:26:06 -07:00
tra
b4c49bf00e Minor build improvements
* Allow specifying multiple GPU architectures. E.g.:
  cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
* Pass CUDA SDK path to clang. Without it, it will default to /usr/local/cuda
which may not be the right location, if cmake was invoked with
-DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
2019-05-31 14:08:34 -07:00
Christoph Hertzberg
5614400581 digits10() needs to return an integer
Problem reported on https://stackoverflow.com/questions/56395899
2019-05-31 15:45:41 +02:00
Rasmus Larsen
36e0a2b93f Merged in deven-amd/eigen-hip-fix-190524 (pull request PR-649)
fix for HIP build errors that were introduced by a commit earlier this week
2019-05-24 16:05:31 +00:00
Deven Desai
2c38930161 fix for HIP build errors that were introduced by a commit earlier this week 2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves
56bc4974fb GEMV: remove double declaration of constant.
That was hurting users with compilers that would refuse to proceed with
that:

"""
./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
         LhsPacketSize = Traits::LhsPacketSize,
         ^
./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
  static const Index LhsPacketSize = Traits::LhsPacketSize;
"""
2019-05-23 14:50:29 -07:00
Christoph Hertzberg
ac21a08c13 Cast Index to RealScalar
This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types).
Reported by Peter Anderson-Sprecher via eMail
2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen
3eb5ad0ed0 Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177
and are part of LLVM 3.8.0.
2019-05-20 17:19:20 -07:00
Rasmus Larsen
e92486b8c3 Merged in rmlarsen/eigen (pull request PR-643)
Make Eigen build with cuda 10 and clang.

Approved-by: Justin Lebar <justin.lebar@gmail.com>
2019-05-20 17:02:39 +00:00
Rasmus Munk Larsen
fd595d42a7 Merge 2019-05-20 09:39:11 -07:00
Gael Guennebaud
cc7ecbb124 Merged in scramsby/eigen (pull request PR-646)
Eigen: Fix MSVC C++17 language standard detection logic
2019-05-20 07:19:10 +00:00
Eugene Zhulenev
01654d97fa Prevent potential division by zero in TensorExecutor 2019-05-17 14:02:25 -07:00
Rasmus Larsen
78d3015722 Merged in ezhulenev/eigen-01 (pull request PR-644)
Always evaluate Tensor expressions with broadcasting via tiled evaluation code path
2019-05-17 19:44:25 +00:00
Rasmus Larsen
bf9cbed8d0 Merged in glchaves/eigen (pull request PR-635)
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-17 19:40:50 +00:00
Eugene Zhulenev
96a276803c Always evaluate Tensor expressions with broadcasting via tiled evaluation code path 2019-05-16 16:15:45 -07:00
Rasmus Munk Larsen
ab0a30e429 Make Eigen build with cuda 10 and clang. 2019-05-15 13:32:15 -07:00
Rasmus Munk Larsen
734a50dc60 Make Eigen build with cuda 10 and clang. 2019-05-15 13:32:15 -07:00
Rasmus Larsen
c8d8d5c0fc Merged in rmlarsen/eigen_threadpool (pull request PR-640)
Fix deadlocks in thread pool.

Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2019-05-13 20:04:35 +00:00
Christoph Hertzberg
5f32b79edc Collapsed revision from PR-641
* SparseLU.h - corrected example, it didn't compile
* Changed encoding back to UTF8
2019-05-13 19:02:30 +02:00
Anuj Rawat
ad372084f5 Removing unused API to fix compile error in TensorFlow due to
AVX512VL, AVX512BW usage
2019-05-12 14:43:10 +00:00
Christoph Hertzberg
4ccd1ece92 bug #1707: Fix deprecation warnings, or disable warnings when testing deprecated functions 2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen
d3ef7cf03e Fix build with clang on Windows. 2019-05-09 11:07:04 -07:00
Rasmus Munk Larsen
e5ac8cbd7a A) fix deadlocks in thread pool caused by EventCount
This fixed 2 deadlocks caused by sloppiness in the EventCount logic.
Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm:
01da8caf00

bug #1 (Prewait):
Prewait must not consume existing signals.
Consider the following scenario.
There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty.
Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait.
Thread 2 checks the queue and now is going to call Prewait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded).
Now thread 2 resumes and calls Prewait and takes away the signal.
Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks.
As the result we have 2 tasks, but only 1 thread is running.

bug #2 (CancelWait):
CancelWait must not take away a signal if it's not sure that the signal was meant for this thread.
When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm):
(a) the registered waiter notices presence of the new task and does not block
(b) the signaler notices presence of the waiters and wakes it
(c) both: the waiter notices the presence of the new task and the signaler notices the presence of the waiter
[the only situation that must not be possible is that neither of them notices the other, because that would lead to a deadlock]
CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signal for us and we would take away a signal meant for somebody else.
Consider:
Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded).
Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1).
Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks.
As the result we have 2 tasks, but only 1 thread is running.

Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2.

This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running.



B) fix deadlock in thread pool caused by RunQueue

This fixed a deadlock caused by sloppiness in the RunQueue logic.
Most likely this was introduced with the non-blocking thread pool.
The deadlock only affects workloads that require parallelism.
Most computational tasks don't require parallelism.

PopBack must not fail spuriously. If it does, it can effectively lead to single thread consuming several wake up signals.
Consider 2 worker threads are blocked.
External thread submits a task. One of the threads is woken.
It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock).
The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait).
Now external thread submits another task and signals EventCount again.
The signal is consumed by the first thread again. But now we have 2 tasks pending but only 1 worker thread running.

It may be possible to fix this in a different way: make EventCount::CancelWait forward the wakeup signal to a blocked thread rather than consuming it. But this looks more complex and I am not 100% sure that it will fix the bug.
It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
2019-05-08 10:16:46 -07:00
Michael Tesch
c5019f722b Use pade for matrix exponential also for complex values. 2019-05-08 17:04:55 +02:00
Eugene Zhulenev
45b40d91ca Fix AVX512 & GCC 6.3 compilation 2019-05-07 16:44:55 -07:00
Christoph Hertzberg
e6667a7060 Fix stupid shadow-warnings (with old clang versions) 2019-05-07 18:32:19 +02:00
Christoph Hertzberg
e54dc24d62 Restore C++03 compatibility 2019-05-07 18:30:44 +02:00
Christoph Hertzberg
cca76c272c Restore C++03 compatibility 2019-05-06 16:18:22 +02:00
Rasmus Munk Larsen
8e33844fc7 Fix traits for scalar_logistic_op. 2019-05-03 15:49:09 -07:00
Scott Ramsby
ff06ef7584 Eigen: Fix MSVC C++17 language standard detection logic
To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17).
See these web pages for more details:
https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
2019-05-03 14:14:09 -07:00
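A sketch of the detection logic the commit describes; macro names other than _MSVC_LANG and __cplusplus are placeholders:

```cpp
// On MSVC, _MSVC_LANG tracks the /std:c++NN switch, whereas __cplusplus may
// stay at 199711L unless /Zc:__cplusplus is passed.
#if defined(_MSVC_LANG)
  #define MY_CPLUSPLUS _MSVC_LANG
#else
  #define MY_CPLUSPLUS __cplusplus
#endif

#if MY_CPLUSPLUS >= 201703L
  #define MY_HAS_CXX17 1
#else
  #define MY_HAS_CXX17 0
#endif
```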
Eugene Zhulenev
e9f0eb8a5e Add masked_store_available to unpacket_traits 2019-05-02 14:52:58 -07:00
Eugene Zhulenev
96e30e936a Add masked pstoreu for Packet16h 2019-05-02 14:11:01 -07:00
Eugene Zhulenev
b4010f02f9 Add masked pstoreu to AVX and AVX512 PacketMath 2019-05-02 13:14:18 -07:00
Gael Guennebaud
578407f42f Fix regression in changeset ae33e866c7 2019-05-02 15:45:21 +02:00
Rasmus Larsen
ac50afaffa Merged in ezhulenev/eigen-01 (pull request PR-633)
Check if gpu_assert was overridden in TensorGpuHipCudaDefines
2019-04-29 16:29:35 +00:00
Gustavo Lima Chaves
d4dcb71bcb Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
We take advantage of smaller SIMD registers as well, in that case.

Gains up to 3x for select input sizes.
2019-04-26 14:12:39 -07:00
Andy May
ae33e866c7 Fix compilation with PGI version 19 2019-04-25 21:23:19 +01:00
Gael Guennebaud
665ac22cc6 Merged in ezhulenev/eigen-01 (pull request PR-632)
Fix doxygen warnings
2019-04-25 20:02:20 +00:00
Eugene Zhulenev
01d7e6ee9b Check if gpu_assert was overridden in TensorGpuHipCudaDefines 2019-04-25 11:19:17 -07:00
Eugene Zhulenev
8ead5bb3d8 Fix doxygen warnings to enable static code analysis 2019-04-24 12:42:28 -07:00
Eugene Zhulenev
07355d47c6 Get rid of SequentialLinSpacedReturnType deprecation warnings in DenseBase.h 2019-04-24 11:01:35 -07:00
Rasmus Munk Larsen
144ca33321 Remove deprecation annotation from typedef Eigen::Index Index, as it would generate too many build warnings. 2019-04-24 08:50:07 -07:00
Eugene Zhulenev
a7b7f3ca8a Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix few other doxygen warnings 2019-04-23 17:23:19 -07:00
Eugene Zhulenev
68a2a8c445 Use packet ops instead of AVX2 intrinsics 2019-04-23 11:41:02 -07:00
Anuj Rawat
8c7a6feb8e Adding lowlevel APIs for optimized RHS packet load in TensorFlow
SpatialConvolution

Low-level APIs are added in order to optimize packet load in gemm_pack_rhs
in TensorFlow SpatialConvolution. The optimization is for the scenario where a
packet is split across 2 adjacent columns. In this case we read it as two
'partial' packets and then merge these into one. Currently this only works for
Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
packet types (such as Packet8d) as well.

This optimization shows significant speedup in SpatialConvolution with
certain parameters. Some examples are below.

Benchmark parameters are specified as:
Batch size, Input dim, Depth, Num of filters, Filter dim

Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.

AVX512:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128,   24x24,  3, 64,   5x5 |2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128,   24x24,  1, 64,   8x8 |2.00X, 1.98X, 1.93X, 1.91X, 1.91X
 32,   24x24,  3, 64,   5x5 |2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128,   24x24,  3, 64,   3x3 |1.51X, 1.45X, 1.45X, 1.67X, 1.57X
 32,   14x14, 24, 64,   5x5 |1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128,  3, 96, 11x11 |2.17X, 2.18X, 2.19X, 2.20X, 2.18X

AVX2:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128,   24x24,  3, 64,   5x5 | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
 32,   24x24,  3, 64,   5x5 | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128,   24x24,  1, 64,   5x5 | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128,   24x24,  3, 64,   3x3 | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128,  3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X

In the higher level benchmark cifar10, we observe a runtime improvement
of around 6% for AVX512 on Intel Skylake server (8 cores).

On lower level PackRhs micro-benchmarks specified in TensorFlow
tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe
the following runtime numbers:

AVX512:

Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  |  41350                     | 15073                   | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  |   7277                     |  7341                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  |   8675                     |  8681                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  |  24155                     | 16079                   | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  |  25052                     | 17152                   | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) |  18269                     | 18345                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) |  19468                     | 19872                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 156060                     | 42432                   | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 132701                     | 36944                   | 3.59X

AVX2:

Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  | 26233                      | 12393                   | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  |  6091                      |  6062                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  |  7427                      |  7408                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  | 23453                      | 20826                   | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  | 23167                      | 22091                   | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422                      | 23682                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165                      | 23663                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 72689                      | 44969                   | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 61732                      | 39779                   | 1.55X

All benchmarks on Intel Skylake server with 8 cores.
2019-04-20 06:46:43 +00:00
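As a rough illustration of the "two partial packets merged into one" idea, here is a staging-buffer sketch; the helper name is made up, and the actual low-level API merges in registers rather than through memory.

    #include <Eigen/Core>

    // Load one packet whose elements straddle two adjacent columns: the first
    // n0 elements come from the tail of col0, the rest from the start of col1.
    template <typename Packet, typename Scalar>
    Packet load_across_columns(const Scalar* col0, const Scalar* col1, int n0) {
      const int size = Eigen::internal::unpacket_traits<Packet>::size;
      Scalar buf[Eigen::internal::unpacket_traits<Packet>::size];
      for (int i = 0; i < n0; ++i)    buf[i] = col0[i];
      for (int i = n0; i < size; ++i) buf[i] = col1[i - n0];
      return Eigen::internal::ploadu<Packet>(buf);
    }
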
Christoph Hertzberg
4270c62812 Split the implementation of i?amax/min into two. Based on PR-627 by Sameer Agarwal.
Like the Netlib reference implementation, I*AMAX now uses the L1-norm instead of the L2-norm for each element. Changed I*MIN accordingly.
2019-04-15 17:18:03 +02:00
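A hedged scalar sketch of the new element ranking (not the BLAS routine itself): each complex element is scored by |re| + |im| instead of its modulus.

    #include <cmath>
    #include <complex>
    #include <vector>

    // Returns the 0-based index of the element with the largest |re| + |im|,
    // matching the Netlib reference behaviour described above.
    int izamax_like(const std::vector<std::complex<double> >& x) {
      int best = 0;
      double best_score = -1.0;
      for (int i = 0; i < static_cast<int>(x.size()); ++i) {
        const double score = std::abs(x[i].real()) + std::abs(x[i].imag());
        if (score > best_score) { best_score = score; best = i; }
      }
      return best;
    }
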
Rasmus Munk Larsen
039ee52125 Tweak cost model for tensor contraction when parallelizing over the inner dimension.
https://bitbucket.org/snippets/rmlarsen/MexxLo
2019-04-12 13:35:10 -07:00
Jonathon Koyle
9a3f06d836 Update ThreadPoolDevice example to include ThreadPool creation and passing a pointer into the constructor. 2019-04-10 10:02:33 -06:00
Deven Desai
66a885b61e adding EIGEN_DEVICE_FUNC to the recently added TensorContractionKernel constructor. Not having the EIGEN_DEVICE_FUNC attribute on it was leading to compiler errors when compiling Eigen in the ROCm/HIP path 2019-04-08 13:45:08 +00:00
Eugene Zhulenev
629ddebd15 Add missing semicolon 2019-04-02 15:04:26 -07:00
Eugene Zhulenev
4e2f6de1a8 Add support for custom packed Lhs/Rhs blocks in tensor contractions 2019-04-01 11:47:31 -07:00
Gael Guennebaud
45e65fbb77 bug #1695: fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign. 2019-03-27 20:16:58 +01:00
William D. Irons
8de66719f9 Collapsed revision from PR-619
* Add support for pcmp_eq in AltiVec/Complex.h
* Fixed implementation of pcmp_eq for double

The new logic is based on the logic from NEON for double.
2019-03-26 18:14:49 +00:00
Gael Guennebaud
f11364290e ICC does not support -fno-unsafe-math-optimizations 2019-03-22 09:26:24 +01:00
David Tellenbach
3031d57200 PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTEN 2019-03-21 02:21:04 +01:00
Deven Desai
51e399fc15 Updates requested in the PR feedback. Also dropping code within #ifdef EIGEN_HAS_OLD_HIP_FP16 2019-03-19 21:45:25 +00:00
Deven Desai
2dbea5510f Merged eigen/eigen into default 2019-03-19 16:52:38 -04:00
Rasmus Larsen
5c93b38c5f Merged in rmlarsen/eigen (pull request PR-618)
Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>.

Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-03-18 15:51:55 +00:00
Gael Guennebaud
48898a988a fix unit test in c++03: c++03 does not allow passing local or anonymous enum as template param 2019-03-18 11:38:36 +01:00
Gael Guennebaud
cf7e2e277f bug #1692: enable enum as sizes of Matrix and Array 2019-03-17 21:59:30 +01:00
Rasmus Munk Larsen
e42f9aa68a Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_<float>. 2019-03-15 17:15:14 -07:00
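An illustrative scalar version of the clamping (the real functor is vectorized; the form below is an assumption for exposition):

    #include <algorithm>
    #include <cmath>

    // logistic(x) = 1 / (1 + exp(-x)); outside roughly [-18, 18] the float
    // result saturates to 0 or 1, so clamping keeps the scalar and vectorized
    // paths bit-for-bit consistent.
    float logistic_clamped(float x) {
      const float xc = std::min(std::max(x, -18.0f), 18.0f);
      return 1.0f / (1.0f + std::exp(-xc));
    }
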
Rasmus Larsen
1936aac43f Merged in tellenbach/eigen/sykline_consistent_include_guards (pull request PR-617)
Fix include guard comments for Skyline module
2019-03-15 20:04:56 +00:00
David Tellenbach
bd9c2ae3fd Fix include guard comments 2019-03-15 15:29:17 +01:00
Rasmus Munk Larsen
8450a6d519 Clean up half packet traits and add a few more missing packet ops. 2019-03-14 15:18:06 -07:00
David Tellenbach
b013176e52 Remove undefined std::complex<int> 2019-03-14 11:40:28 +01:00
David Tellenbach
97f9a46cb9 PR 593: Add variadic ctor for DiagonalMatrix with unit tests 2019-03-14 10:18:24 +01:00
Gael Guennebaud
45ab514fe2 revert debug stuff 2019-03-14 10:08:12 +01:00
Rasmus Munk Larsen
6a34003141 Remove EIGEN_MPL2_ONLY guard in IncompleteCholesky that is no longer needed after the AMD reordering code was relicensed to MPL2. 2019-03-13 11:52:41 -07:00
Gael Guennebaud
d7d2f0680e bug #1684: partially workaround clang's 6/7 bug #40815 2019-03-13 10:40:01 +01:00
Rasmus Larsen
690f0795d0 Merged in rmlarsen/eigen (pull request PR-615)
Clean up PacketMathHalf.h and add a few missing logical packet ops.
2019-03-12 16:09:48 +00:00
Thomas Capricelli
1901433674 erm.. use proper id 2019-03-12 13:53:38 +01:00
Thomas Capricelli
90302aa8c9 update tracking code 2019-03-12 13:47:01 +01:00
Rasmus Munk Larsen
77f7d4a894 Clean up PacketMathHalf.h and add a few missing logical packet ops. 2019-03-11 17:51:16 -07:00
Eugene Zhulenev
001f10e3c9 Fix segfaults with cuda compilation 2019-03-11 09:43:33 -07:00
Eugene Zhulenev
899c16fa2c Fix a bug in TensorGenerator for 1d tensors 2019-03-11 09:42:01 -07:00
Eugene Zhulenev
0f8bfff23d Fix a data race in NonBlockingThreadPool 2019-03-11 09:38:44 -07:00
Gael Guennebaud
656d9bc66b Apply SSE's pmin/pmax fix for GCC <= 5 to AVX's pmin/pmax 2019-03-10 21:19:18 +01:00
Gael Guennebaud
2df4f00246 Change license from LGPL to MPL2 with agreement from David Harmon. 2019-03-07 18:17:10 +01:00
Rasmus Munk Larsen
3c3f639fe2 Merge. 2019-03-06 11:54:30 -08:00
Rasmus Munk Larsen
f4ec8edea8 Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local. 2019-03-06 11:52:04 -08:00
Rasmus Munk Larsen
41cdc370d0 Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
Found with -Wundefined-func-template.

Author: tkoeppe@google.com
2019-03-06 11:42:22 -08:00
Rasmus Munk Larsen
cc407c9d4d Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
Found with -Wundefined-func-template.

Author: tkoeppe@google.com
2019-03-06 11:40:06 -08:00
Eugene Zhulenev
1bc2a0a57c Add missing return to NonBlockingThreadPool::LocalSteal 2019-03-06 10:49:49 -08:00
Eugene Zhulenev
4e4dcd9026 Remove redundant steal loop 2019-03-06 10:39:07 -08:00
Rasmus Larsen
4d808e834a Merged in rmlarsen/eigen_threadpool (pull request PR-606)
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239


Approved-by: Sameer Agarwal <sameeragarwal@google.com>
2019-03-06 17:59:03 +00:00
Rasmus Larsen
2ea18e505f Merged in ezhulenev/eigen-01 (pull request PR-610)
Block evaluation for TensorGeneratorOp
2019-03-06 16:49:38 +00:00
Eugene Zhulenev
25abaa2e41 Check that inner block dimension is continuous 2019-03-05 17:34:35 -08:00
Eugene Zhulenev
5d9a6686ed Block evaluation for TensorGeneratorOp 2019-03-05 16:35:21 -08:00
Rasmus Larsen
b4861f4778 Merged in ezhulenev/eigen-01 (pull request PR-609)
Tune tensor contraction threadpool heuristics
2019-03-05 23:54:40 +00:00
Gael Guennebaud
bfbf7da047 bug #1689 fix used-but-marked-unused warning 2019-03-05 23:46:24 +01:00
Eugene Zhulenev
a407e022e6 Tune tensor contraction threadpool heuristics 2019-03-05 14:19:59 -08:00
Eugene Zhulenev
56c6373f82 Add an extra check for the RunQueue size estimate 2019-03-05 11:51:26 -08:00
Eugene Zhulenev
b1a8627493 Do not create Tensor<const T> in cxx11_tensor_forced_eval test 2019-03-05 11:19:25 -08:00
Rasmus Munk Larsen
0318fc7f44 Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239 2019-03-05 10:24:54 -08:00
Eugene Zhulenev
efb5080d31 Do not initialize invalid fast_strides in TensorGeneratorOp 2019-03-04 16:58:49 -08:00
Eugene Zhulenev
b95941e5c2 Add tiled evaluation for TensorForcedEvalOp 2019-03-04 16:02:22 -08:00
Eugene Zhulenev
694084ecbd Use fast divisors in TensorGeneratorOp 2019-03-04 11:10:21 -08:00
Gael Guennebaud
b0d406d91c Enable construction of Ref<VectorType> from a runtime vector. 2019-03-03 15:25:25 +01:00
Sam Hasinoff
9ba81cf0ff Fully qualify Eigen::internal::aligned_free
This helps avoid a conflict on certain Windows toolchains
(potentially due to some ADL name resolution bug) in the case
where aligned_free is defined in the global namespace. In any
case, tightening this up is harmless.
2019-03-02 17:42:16 +00:00
Gael Guennebaud
22144e949d bug #1629: fix compilation of PardisoSupport (regression introduced in changeset a7842daef2
)
2019-03-02 22:44:47 +01:00
Bernhard M. Wiedemann
b071672e78 Do not keep latex logs
to make package builds more reproducible.
See https://reproducible-builds.org/ for why this is good.
2019-02-27 11:09:00 +01:00
Rasmus Munk Larsen
cf4a1c81fa Fix specialization for conjugate on non-complex types in TensorBase.h. 2019-03-01 14:21:09 -08:00
Sameer Agarwal
c181dfb8ab Consistently use EIGEN_BLAS_FUNC in BLAS.
Previously, for a few functions, either BLASFUNC or EIGEN_CAT
was being used. This change uses EIGEN_BLAS_FUNC consistently
everywhere.

Also introduce EIGEN_BLAS_FUNC_SUFFIX, which by default is
equal to "_"; this allows the user to inject a new suffix as
needed.
2019-02-27 11:30:58 -08:00
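A standalone sketch of how such suffix injection can be wired up; the macro names follow the message above, but the exact definitions in Eigen's BLAS layer may differ.

    /* Default suffix is "_"; a user may pre-define EIGEN_BLAS_FUNC_SUFFIX
       to change how BLAS symbol names are decorated. */
    #ifndef EIGEN_BLAS_FUNC_SUFFIX
    #define EIGEN_BLAS_FUNC_SUFFIX _
    #endif

    #define BLAS_CAT2(a, b) a ## b
    #define BLAS_CAT(a, b) BLAS_CAT2(a, b)
    #define EIGEN_BLAS_FUNC(X) BLAS_CAT(X, EIGEN_BLAS_FUNC_SUFFIX)

    /* e.g. EIGEN_BLAS_FUNC(dgemm) expands to dgemm_ */
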
Rasmus Larsen
9558f4c25f Merged in rmlarsen/eigen_threadpool (pull request PR-596)
Improve EventCount used by the non-blocking threadpool.

Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-02-26 20:37:26 +00:00
Rasmus Larsen
2ca1e73239 Merged in rmlarsen/eigen (pull request PR-597)
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2.

Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-02-25 17:02:16 +00:00
Gael Guennebaud
e409dbba14 Enable SSE vectorization of Quaternion and cross3() with AVX 2019-02-23 10:45:40 +01:00
Rasmus Munk Larsen
6560692c67 Improve EventCount used by the non-blocking threadpool.
The current algorithm requires threads to commit/cancel waiting in order
they called Prewait. Spinning caused by that serialization can consume
lots of CPU time on some workloads. Restructure the algorithm to not
require that serialization and remove spin waits from Commit/CancelWait.
Note: this reduces max number of threads from 2^16 to 2^14 to leave
more space for ABA counter (which is now 22 bits).
Implementation details are explained in comments.
2019-02-22 13:56:26 -08:00
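For context, a self-contained toy showing the Prewait / CommitWait / CancelWait protocol named above (a mutex/condition-variable stand-in, not Eigen's lock-free EventCount):

    #include <condition_variable>
    #include <mutex>

    // Single-waiter toy: a waiter records the epoch in Prewait, re-checks its
    // work predicate, then either CancelWait (work found) or CommitWait (block
    // until a waker calls Notify).
    class ToyEventCount {
      std::mutex mu_;
      std::condition_variable cv_;
      unsigned long epoch_ = 0;
      unsigned long observed_ = 0;
     public:
      void Prewait()    { std::lock_guard<std::mutex> l(mu_); observed_ = epoch_; }
      void CancelWait() { /* nothing to undo in this toy version */ }
      void CommitWait() {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return epoch_ != observed_; });
      }
      void Notify() {
        { std::lock_guard<std::mutex> l(mu_); ++epoch_; }
        cv_.notify_one();
      }
    };
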
Gael Guennebaud
0b25a5c431 fix alignment in ploadquad 2019-02-22 21:39:36 +01:00
Rasmus Munk Larsen
1dc1677d52 Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Google LLC executed a license agreement with the author of the code from which these files are derived to allow the Eigen project to distribute the code and derived works under MPL2. 2019-02-22 12:33:57 -08:00
Gael Guennebaud
0cb4ba98e7 update wrt recent changes 2019-02-21 17:19:36 +01:00
Gael Guennebaud
cca6c207f4 AVX512: implement faster ploadquad<Packet16f> thus speeding up GEMM 2019-02-21 17:18:28 +01:00
Gael Guennebaud
1c09ee8541 bug #1674: workaround clang fast-math aggressive optimizations 2019-02-22 15:48:53 +01:00
Gael Guennebaud
7e3084bb6f Fix compilation on ARM. 2019-02-22 14:56:12 +01:00
Gael Guennebaud
32502f3c45 bug #1684: add simplified regression test for respective clang's bug (this also reveal the same bug in Apples's clang) 2019-02-22 10:29:06 +01:00
Gael Guennebaud
42c23f14ac Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes. 2019-02-21 22:44:40 +01:00
Rasmus Munk Larsen
4d7f317102 Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU. 2019-02-21 13:32:13 -08:00
Gael Guennebaud
2a39659d79 Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases. 2019-02-20 15:23:23 +01:00
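Assumed usage of the new aliases (requires a build that ships them):

    #include <Eigen/Dense>

    void alias_example() {
      // Vector<T, N> is shorthand for Matrix<T, N, 1>; RowVector<T, N> for Matrix<T, 1, N>.
      Eigen::Vector<float, 5>            v5;       // fixed-size column vector
      Eigen::RowVector<double, 3>        r3;       // fixed-size row vector
      Eigen::Vector<int, Eigen::Dynamic> vd(10);   // Dynamic works too
      v5.setZero(); r3.setZero(); vd.setZero();
    }
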
Gael Guennebaud
302377110a Update documentation of Matrix and Array type aliases. 2019-02-20 15:18:48 +01:00
Gael Guennebaud
475295b5ff Enable documentation of Array's typedefs 2019-02-20 15:18:07 +01:00
Gael Guennebaud
44b54fa4a3 Protect c++11 type alias with Eigen's macro, and add respective unit test. 2019-02-20 14:43:05 +01:00
Gael Guennebaud
7195f008ce Merged in ra_bauke/eigen (pull request PR-180)
alias template for matrix and array classes, see also bug #864

Approved-by: Heiko Bauke <heiko.bauke@mail.de>
2019-02-20 13:22:39 +00:00
Gael Guennebaud
4e8047cdcf Fix compilation with gcc and remove TR1 stuff. 2019-02-20 13:59:34 +01:00
Gael Guennebaud
844e5447f8 Update documentation regarding alignment issue. 2019-02-20 13:54:04 +01:00
Gael Guennebaud
edd413c184 bug #1409: make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:
- this helps clang 5 and 6 to support alignas in STL's containers.
 - this makes the public API of our (and users) classes cleaner
2019-02-20 13:52:11 +01:00
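For context, the typical usage this change simplifies; per the message above, the macro expands to nothing in C++17 mode, where the language handles over-aligned new itself.

    #include <Eigen/Dense>

    struct MyState {
      EIGEN_MAKE_ALIGNED_OPERATOR_NEW  // no-op in C++17 mode per the change above
      Eigen::Vector4f velocity;        // fixed-size vectorizable member
      float mass = 1.0f;
    };
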
Gael Guennebaud
3b5deeb546 bug #899: make sparseqr unit test more stable by 1) trying with larger threshold and 2) relax rank computation for rank-deficient problems. 2019-02-19 22:57:51 +01:00
Gael Guennebaud
482c5fb321 bug #899: remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing. 2019-02-19 22:52:15 +01:00
Gael Guennebaud
9ac1634fdf Fix conversion warnings 2019-02-19 21:59:53 +01:00
Gael Guennebaud
292d61970a Fix C++17 compilation 2019-02-19 21:59:41 +01:00
Rasmus Munk Larsen
071629a440 Fix incorrect value of NumDimensions in TensorContraction traits.
Reported here: #1671
2019-02-19 10:49:54 -08:00
Christoph Hertzberg
a1646fc960 Commas at the end of enumerator lists are not allowed in C++03 2019-02-19 14:32:25 +01:00
Gael Guennebaud
2cfc025bda fix unit compilation in c++17: std::ptr_fun has been removed. 2019-02-19 14:05:22 +01:00
Gael Guennebaud
ab78cabd39 Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode. 2019-02-19 14:04:35 +01:00
Gael Guennebaud
115da6a1ea Fix conversion warnings 2019-02-19 14:00:15 +01:00
Gael Guennebaud
7d10c78738 bug #1046: add unit tests for correct propagation of alignment through std::alignment_of 2019-02-19 10:31:56 +01:00
Gael Guennebaud
7580112c31 Fix harmless Scalar vs RealScalar cast. 2019-02-18 22:12:28 +01:00
Gael Guennebaud
e23bf40dc2 Add unit test for LinSpaced and complex numbers. 2019-02-18 22:03:47 +01:00
Gael Guennebaud
796db94e6e bug #1194: implement slightly faster and SIMD friendly 4x4 determinant. 2019-02-18 16:21:27 +01:00
Gael Guennebaud
31b6e080a9 Fix regression: .conjugate() was popped out but not re-introduced. 2019-02-18 14:45:55 +01:00
Gael Guennebaud
c69d0d08d0 Set cost of conjugate to 0 (in practice it boils down to a no-op).
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed-size and small, or if * falls back to lazyProduct)
2019-02-18 14:43:07 +01:00
Gael Guennebaud
512b74aaa1 GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
Before, only s*A*B was caught, which was both inconsistent with GEMM, sub-optimal,
and could even lead to compilation errors (https://stackoverflow.com/questions/54738495).
2019-02-18 11:47:54 +01:00
Christoph Hertzberg
ec032ac03b Guard C++11-style default constructor. Also, this is only needed for MSVC 2019-02-16 09:44:05 +01:00
Gael Guennebaud
902a7793f7 Add possibility to bench row-major lhs and rhs 2019-02-15 16:52:34 +01:00
Gael Guennebaud
83309068b4 bug #1680: improve MSVC inlining by declaring many triavial constructors and accessors as STRONG_INLINE. 2019-02-15 16:35:35 +01:00
Gael Guennebaud
0505248f25 bug #1680: make all "block" methods strong-inline and device-functions (some were missing EIGEN_DEVICE_FUNC) 2019-02-15 16:33:56 +01:00
Gael Guennebaud
559320745e bug #1678: Fix lack of __FMA__ macro on MSVC with AVX512 2019-02-15 10:30:28 +01:00
Gael Guennebaud
d85ae650bf bug #1678: workaround MSVC compilation issues with AVX512 2019-02-15 10:24:17 +01:00
Gael Guennebaud
f2970819a2 bug #1679: avoid possible division by 0 in complex-schur 2019-02-15 09:39:25 +01:00
Rasmus Munk Larsen
65e23ca7e9 Revert b55b5c7280
.
2019-02-14 13:46:13 -08:00
Rasmus Larsen
efeabee445 Merged in ezhulenev/eigen-01 (pull request PR-590)
Do not generate no-op cast() and conjugate() expressions
2019-02-14 21:16:12 +00:00
Eugene Zhulenev
7b837559a7 Fix signed-unsigned return in RunQueue 2019-02-14 10:40:21 -08:00
Eugene Zhulenev
f0d42d2265 Fix signed-unsigned comparison warning in RunQueue 2019-02-14 10:27:28 -08:00
Eugene Zhulenev
106ba7bb1a Do not generate no-op cast() and conjugate() expressions 2019-02-14 09:51:51 -08:00
Eugene Zhulenev
8c2f30c790 Speedup Tensor ThreadPool RunQueue::Empty() 2019-02-13 10:20:53 -08:00
Gael Guennebaud
bdcb5f3304 Let's properly use Score instead of std::abs, and remove deprecated FIXME ( a /= b does a/b and not a * (1/b) as it was a long time ago...) 2019-02-11 22:56:19 +01:00
Gael Guennebaud
2edfc6807d Fix compilation of empty products of the form: Mx0 * 0xN 2019-02-11 18:24:07 +01:00
Gael Guennebaud
eb46f34a8c Speed up 2x2 LU by a factor 2, and other small fixed sizes by about 10%.
Not sure that's so critical, but this does not add much complexity to the code base.
2019-02-11 17:59:35 +01:00
Gael Guennebaud
dada863d23 Enable unit tests of PartialPivLU on fixed size matrices, and increase tested matrix size (blocking was not tested!) 2019-02-11 17:56:20 +01:00
Gael Guennebaud
ab6e6edc32 Speedup PartialPivLU for small matrices by passing compile-time sizes when available.
This changeset also makes better use of Map<>+OuterStride and Ref<>, yielding a surprising speedup for small dynamic sizes as well.
The table below reports times in micro seconds for 10 random matrices:
           | ------ float --------- | ------- double ------- |
     size  | before   after  ratio  |  before   after  ratio |
fixed   1  | 0.34     0.11   2.93   |  0.35     0.11   3.06  |
fixed   2  | 0.81     0.24   3.38   |  0.91     0.25   3.60  |
fixed   3  | 1.49     0.49   3.04   |  1.68     0.55   3.01  |
fixed   4  | 2.31     0.70   3.28   |  2.45     1.08   2.27  |
fixed   5  | 3.49     1.11   3.13   |  3.84     2.24   1.71  |
fixed   6  | 4.76     1.64   2.88   |  4.87     2.84   1.71  |
dyn     1  | 0.50     0.40   1.23   |  0.51     0.40   1.26  |
dyn     2  | 1.08     0.85   1.27   |  1.04     0.69   1.49  |
dyn     3  | 1.76     1.26   1.40   |  1.84     1.14   1.60  |
dyn     4  | 2.57     1.75   1.46   |  2.67     1.66   1.60  |
dyn     5  | 3.80     2.64   1.43   |  4.00     2.48   1.61  |
dyn     6  | 5.06     3.43   1.47   |  5.15     3.21   1.60  |
2019-02-11 13:58:24 +01:00
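A small usage sketch for context: declaring the decomposition on a fixed-size matrix type lets the compile-time dimensions propagate as described above.

    #include <Eigen/Dense>

    int main() {
      Eigen::Matrix4f A = Eigen::Matrix4f::Random();
      Eigen::Vector4f b = Eigen::Vector4f::Random();
      Eigen::PartialPivLU<Eigen::Matrix4f> lu(A);  // fixed 4x4: sizes known at compile time
      Eigen::Vector4f x = lu.solve(b);
      return x.allFinite() ? 0 : 1;
    }
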
Eugene Zhulenev
21eb97d3e0 Add PacketConv implementation for non-vectorizable src expressions 2019-02-08 15:47:25 -08:00
Eugene Zhulenev
1e36166ed1 Optimize TensorConversion evaluator: do not convert same type 2019-02-08 15:13:24 -08:00
Steven Peters
953ca5ba2f Spline.h: fix spelling "spang" -> "span" 2019-02-08 06:23:24 +00:00
Eugene Zhulenev
59998117bb Don't do parallel_pack if we can use thread_local memory in tensor contractions 2019-02-07 09:21:25 -08:00
Gael Guennebaud
013cc3a6b3 Make GEMM fallback to GEMV for runtime vectors.
This is a more general and simpler version of changeset 4c0fa6ce0f
2019-02-07 16:24:09 +01:00
Gael Guennebaud
fa2fcb4895 Backed out changeset 4c0fa6ce0f 2019-02-07 16:07:08 +01:00
Gael Guennebaud
b3c4344a68 bug #1676: workaround GCC's bug in c++17 mode. 2019-02-07 15:21:35 +01:00
Rasmus Larsen
3091c03898 Merged in ezhulenev/eigen-01 (pull request PR-581)
Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing

Approved-by: Rasmus Larsen <rmlarsen@google.com>
Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-02-05 22:45:20 +00:00
Eugene Zhulenev
8491127082 Do not reduce parallelism too much in contractions with small number of threads 2019-02-04 12:59:33 -08:00
Eugene Zhulenev
eb21bab769 Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing 2019-02-04 10:43:16 -08:00
Eugene Zhulenev
6d0f6265a9 Remove duplicated comment line 2019-02-04 10:30:25 -08:00
Eugene Zhulenev
690b2c45b1 Fix GeneralBlockPanelKernel Android compilation 2019-02-04 10:29:15 -08:00
Gael Guennebaud
871e2e5339 bug #1674: disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise) 2019-02-03 08:54:47 +01:00
Rasmus Larsen
e7b481ea74 Merged in rmlarsen/eigen (pull request PR-578)
Speed up Eigen matrix*vector and vector*matrix multiplication.

Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2019-02-02 01:53:44 +00:00
Sameer Agarwal
b55b5c7280 Speed up row-major matrix-vector product on ARM
The row-major matrix-vector multiplication code uses a threshold to
check if processing 8 rows at a time would thrash the cache.

This change introduces two modifications to this logic.

1. A smaller threshold for ARM and ARM64 devices.

The value of this threshold was determined empirically using a Pixel2
phone, by benchmarking a large number of matrix-vector products in the
range [1..4096]x[1..4096] and measuring performance separately on
small and little cores with frequency pinning.

On big (out-of-order) cores, this change has little to no impact. But
on the small (in-order) cores, the matrix-vector products are up to
700% faster. Especially on large matrices.

The motivation for this change was some internal code at Google which
was using hand-written NEON for implementing similar functionality,
processing the matrix one row at a time, which exhibited substantially
better performance than Eigen.

With the current change, Eigen handily beats that code.

2. Make the logic for choosing the number of simultaneous rows apply
uniformly to 8, 4 and 2 rows instead of just 8 rows.

Since the default threshold for non-ARM devices is essentially
unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM
performance. This was verified by running the same set of benchmarks
on a Xeon desktop.
2019-02-01 15:23:53 -08:00
Rasmus Munk Larsen
4c0fa6ce0f Speed up Eigen matrix*vector and vector*matrix multiplication.
This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.

The benchmarks below test

c.noalias()= n_by_n_matrix * n_by_1_matrix;
c.noalias()= 1_by_n_matrix * n_by_n_matrix;
respectively.

Benchmark measurements:

SSE:
Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64                            1096       312    +71.5%
BM_MatVec/128                           4581      1464    +68.0%
BM_MatVec/256                          18534      5710    +69.2%
BM_MatVec/512                         118083     24162    +79.5%
BM_MatVec/1k                          704106    173346    +75.4%
BM_MatVec/2k                         3080828    742728    +75.9%
BM_MatVec/4k                        25421512   4530117    +82.2%
BM_VecMat/32                             352       130    +63.1%
BM_VecMat/64                            1213       425    +65.0%
BM_VecMat/128                           4640      1564    +66.3%
BM_VecMat/256                          17902      5884    +67.1%
BM_VecMat/512                          70466     24000    +65.9%
BM_VecMat/1k                          340150    161263    +52.6%
BM_VecMat/2k                         1420590    645576    +54.6%
BM_VecMat/4k                         8083859   4364327    +46.0%

AVX2:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64                             619       120    +80.6%
BM_MatVec/128                           9693       752    +92.2%
BM_MatVec/256                          38356      2773    +92.8%
BM_MatVec/512                          69006     12803    +81.4%
BM_MatVec/1k                          443810    160378    +63.9%
BM_MatVec/2k                         2633553    646594    +75.4%
BM_MatVec/4k                        16211095   4327148    +73.3%
BM_VecMat/64                             925       227    +75.5%
BM_VecMat/128                           3438       830    +75.9%
BM_VecMat/256                          13427      2936    +78.1%
BM_VecMat/512                          53944     12473    +76.9%
BM_VecMat/1k                          302264    157076    +48.0%
BM_VecMat/2k                         1396811    675778    +51.6%
BM_VecMat/4k                         8962246   4459010    +50.2%

AVX512:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64                             401       111    +72.3%
BM_MatVec/128                           1846       513    +72.2%
BM_MatVec/256                          36739      1927    +94.8%
BM_MatVec/512                          54490      9227    +83.1%
BM_MatVec/1k                          487374    161457    +66.9%
BM_MatVec/2k                         2016270    643824    +68.1%
BM_MatVec/4k                        13204300   4077412    +69.1%
BM_VecMat/32                             324       106    +67.3%
BM_VecMat/64                            1034       246    +76.2%
BM_VecMat/128                           3576       802    +77.6%
BM_VecMat/256                          13411      2561    +80.9%
BM_VecMat/512                          58686     10037    +82.9%
BM_VecMat/1k                          320862    163750    +49.0%
BM_VecMat/2k                         1406719    651397    +53.7%
BM_VecMat/4k                         7785179   4124677    +47.0%
2019-01-31 14:24:08 -08:00
Gael Guennebaud
7ef879f6bf GEBP: improves pipelining in the 1pX4 path with FMA.
Prior to this change, a product with an LHS having 8 rows was faster with AVX-only than with AVX+FMA.
With AVX+FMA I measured a speedup of about x1.25 in such cases.
2019-01-30 23:45:12 +01:00
Gael Guennebaud
de77bf5d6c Fix compilation with ARM64. 2019-01-30 16:48:20 +01:00
Gael Guennebaud
d586686924 Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper. 2019-01-30 16:48:01 +01:00
Gael Guennebaud
eb4c6bb22d Fix conflicts and merge 2019-01-30 15:57:08 +01:00
Gael Guennebaud
e3622a0396 Slightly extend discussions on auto and move the content of the Pit falls wiki page here.
http://eigen.tuxfamily.org/index.php?title=Pit_Falls
2019-01-30 13:09:21 +01:00
Gael Guennebaud
df12fae8b8 According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101, the previous GCC issue is fixed in GCC trunk (will be gcc 9). 2019-01-30 11:52:28 +01:00
Gael Guennebaud
3775926bba ARM64 & GEBP: add specialization for double +30% speed up 2019-01-30 11:49:06 +01:00
Gael Guennebaud
be5b0f664a ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in generating good ASM 2019-01-30 11:48:25 +01:00
Christoph Hertzberg
a7779a9b42 Hide some annoying unused variable warnings in g++8.1 2019-01-29 16:48:21 +01:00
Gael Guennebaud
efe02292a6 Add recent gemm related changesets and various cleanups in perf-monitoring 2019-01-29 11:53:47 +01:00
Gael Guennebaud
8a06c699d0 bug #1669: fix PartialPivLU/inverse with zero-sized matrices. 2019-01-29 10:27:13 +01:00
Gael Guennebaud
a2a07e62b9 Fix compilation with c++03 (local class cannot be template arguments), and make SparseMatrix::assignDiagonal truly protected. 2019-01-29 10:10:07 +01:00
Gael Guennebaud
f489f44519 bug #1574: implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs. 2019-01-28 17:29:50 +01:00
Gael Guennebaud
803fa79767 Move evaluator<SparseCompressedBase>::find(i,j) to a more general and reusable SparseCompressedBase::lower_bound(i,j) function 2019-01-28 17:24:44 +01:00
Gael Guennebaud
53560f9186 bug #1672: fix unit test compilation with MSVC by adding overloads of test_is* for long long (and factorize copy/paste code through a macro) 2019-01-28 13:47:28 +01:00
Christoph Hertzberg
c9825b967e Renaming even more I identifiers 2019-01-26 13:22:13 +01:00
Christoph Hertzberg
5a52e35f9a Renaming some more I identifiers 2019-01-26 13:18:21 +01:00
Rasmus Munk Larsen
71429883ee Fix compilation error in NEON GEBP specialization of madd. 2019-01-25 17:00:21 -08:00
Christoph Hertzberg
934b8a1304 Avoid I as an identifier, since it may clash with the C-header complex.h 2019-01-25 14:54:39 +01:00
Gael Guennebaud
ec8a387972 cleanup 2019-01-24 10:24:45 +01:00
Gael Guennebaud
6908ce2a15 More thoroughly check variadic template ctor of fixed-size vectors 2019-01-24 10:24:28 +01:00
David Tellenbach
237b03b372 PR 574: use variadic template instead of initializer_list to implement fixed-size vector ctor from coefficients. 2019-01-23 00:07:19 +01:00
Christoph Hertzberg
bd6dadcda8 Tell doxygen that cxx11 math is available 2019-01-24 00:14:02 +01:00
Gael Guennebaud
c64d5d3827 Bypass inline asm for non compatible compilers. 2019-01-23 23:43:13 +01:00
Christoph Hertzberg
e16913a45f Fix name of tutorial snippet. 2019-01-23 10:35:06 +01:00
Gael Guennebaud
80f81f9c4b Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing. 2019-01-22 17:08:47 +01:00
David Tellenbach
db152b9ee6 PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc)
- {1,2,3,4,5,...} for fixed-size vectors only
- {{1,2,3},{4,5,6}} for the general cases
- {{1,2,3,4,5,....}} is allowed for both row and column-vector
2019-01-21 16:25:57 +01:00
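Usage sketch of the three forms listed above (requires a build containing this change):

    #include <Eigen/Dense>

    int main() {
      Eigen::Vector3d v {1.0, 2.0, 3.0};            // flat list: fixed-size vectors only
      Eigen::Matrix<double, 2, 3> m {{1, 2, 3},
                                     {4, 5, 6}};    // nested list: general case, row by row
      Eigen::RowVector3i r {{7, 8, 9}};             // single nested list: row or column vector
      return static_cast<int>(v.sum() + m.sum() + r.sum());
    }
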
Gael Guennebaud
543529da6a Add more extensive tests of Array ctors, including {} variants 2019-01-22 15:30:50 +01:00
nluehr
92774f0275 Replace host_define.h with cuda_runtime_api.h 2019-01-18 16:10:09 -06:00
Gael Guennebaud
d18f49cbb3 Fix compilation of unit tests with gcc and c++17 2019-01-18 11:12:42 +01:00
Christoph Hertzberg
da0a41b9ce Mask unused-parameter warnings, when building with NDEBUG 2019-01-18 10:41:14 +01:00
Rasmus Munk Larsen
2eccbaf3f7 Add missing logical packet ops for GPU and NEON. 2019-01-17 17:45:08 -08:00
Christoph Hertzberg
d575505d25 After fixing bug #1557, boostmultiprec_7 failed with NumericalIssue instead of NoConvergence (all that matters here is no Success) 2019-01-17 19:14:07 +01:00
Gael Guennebaud
ee3662abc5 Remove some useless const_cast 2019-01-17 18:27:49 +01:00
Gael Guennebaud
0fe6b7d687 Make nestByValue works again (broken since 3.3) and add unit tests. 2019-01-17 18:27:25 +01:00
Gael Guennebaud
4b7cf7ff82 Extend reshaped unit tests and remove useless const_cast 2019-01-17 17:35:32 +01:00
Gael Guennebaud
b57c9787b1 Cleanup useless const_cast and add missing broadcast assignment tests 2019-01-17 16:55:42 +01:00
Gael Guennebaud
be05d0030d Make FullPivLU use conjugateIf<> 2019-01-17 12:01:00 +01:00
Patrick Peltzer
bba2f05064 Boosttest only available for Boost version >= 1.53.0 2019-01-17 11:54:37 +01:00
Patrick Peltzer
15e53d5d93 PR 567: makes all dense solvers inherit SolverBase (LU, Cholesky, QR, SVD).
This changeset also includes:
 * add HouseholderSequence::conjugateIf
 * define int as the StorageIndex type for all dense solvers
 * dedicated unit tests, including assertion checking
 * _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks
 * CompleteOrthogonalDecompositions: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions
 * Cholesky: add missing assertions
 * FullPivHouseholderQR: Corrected Scalar type in _solve_impl()
 * BDCSVD: Unambiguous return type for ternary operator
 * SVDBase: Corrected Scalar type in _solve_impl()
2019-01-17 01:17:39 +01:00
Gael Guennebaud
7f32109c11 Add conjugateIf<bool> members to DenseBase, TriangularView, SelfadjointView, and make PartialPivLU use it. 2019-01-17 11:33:43 +01:00
Gael Guennebaud
7b35c26b1c Doc: remove link to porting guide 2019-01-17 10:35:50 +01:00
Gael Guennebaud
4759d9e86d Doc: add manual page on STL iterators 2019-01-17 10:35:14 +01:00
Gael Guennebaud
562985bac4 bug #1646: fix false aliasing detection for A.row(0) = A.col(0);
This changeset completely disables the detection for vectors, for which our current mechanism cannot detect any positive aliasing anyway.
2019-01-17 00:14:27 +01:00
Rasmus Munk Larsen
7401e2541d Fix compilation error for logical packet ops with older compilers. 2019-01-16 14:43:33 -08:00
Rasmus Munk Larsen
ee550a2ac3 Fix flaky test for tensor fft. 2019-01-16 14:03:12 -08:00
Gael Guennebaud
0f028f61cb GEBP: fix swapped kernel mode with AVX512 and complex scalars 2019-01-16 22:26:38 +01:00
Gael Guennebaud
e118ce86fd GEBP: cleanup logic to choose between a 4 packets of 1 packet 2019-01-16 21:47:42 +01:00
Gael Guennebaud
70e133333d bug #1661: fix regression in GEBP and AVX512 2019-01-16 21:22:20 +01:00
Gael Guennebaud
ce88e297dc Add a comment stating this doc page is partly obsolete. 2019-01-16 16:29:02 +01:00
Gael Guennebaud
729d1291c2 bug #1585: update doc on lazy-evaluation 2019-01-16 16:28:17 +01:00
Gael Guennebaud
c8e40edac9 Remove Eigen2ToEigen3 migration page (obsolete since 3.3) 2019-01-16 16:27:00 +01:00
Gael Guennebaud
aeffdf909e bug #1617: add unit tests for empty triangular solve. 2019-01-16 15:24:59 +01:00
Gael Guennebaud
502f717980 bug #1646: disable aliasing detection for empty and 1x1 expression 2019-01-16 14:33:45 +01:00
Gael Guennebaud
0b466b6933 bug #1633: use proper type for madd temporaries, factorize RhsPacketx4. 2019-01-16 13:50:13 +01:00
Renjie Liu
dbfcceabf5 Bug: 1633: refactor gebp kernel and optimize for neon 2019-01-16 12:51:36 +08:00
Gael Guennebaud
2b70b2f570 Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry 2019-01-15 22:50:42 +01:00
Gael Guennebaud
2c2c114995 Silent maybe-uninitialized warnings by gcc 2019-01-15 16:53:15 +01:00
Gael Guennebaud
6ec6bf0b0d Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index*,Index*) on empty matrices by an assertion (+ doc & unit tests) 2019-01-15 15:21:14 +01:00
Gael Guennebaud
027e44ed24 bug #1592: makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests) 2019-01-15 15:13:24 +01:00
Gael Guennebaud
f8bc5cb39e Fix detection of vector-at-compile-time: use Rows/Cols instead of MaxRows/MaxCols.
This fixes VectorXd(n).middleCol(0,0).outerSize(), which was equal to 1.
2019-01-15 15:09:49 +01:00
Gael Guennebaud
32d7232aec fix always true warning with gcc 4.7 2019-01-15 11:18:48 +01:00
Gael Guennebaud
6cf7afa3d9 Typo 2019-01-15 11:04:37 +01:00
Gael Guennebaud
e7d4d4f192 cleanup 2019-01-15 10:51:03 +01:00
Rasmus Larsen
7b3aab0936 Merged in rmlarsen/eigen (pull request PR-570)
Add support for inverse hyperbolic functions. Fix cost of division.
2019-01-14 21:31:33 +00:00
Rasmus Munk Larsen
8bf00c2baf Remove extra <tr>. 2019-01-14 13:29:29 -08:00
Rasmus Munk Larsen
ec7fe83554 Merge. 2019-01-14 13:26:58 -08:00
Rasmus Munk Larsen
2ea4efc0c3 Merge. 2019-01-14 13:26:58 -08:00
Rasmus Munk Larsen
2c5843dbbb Update documentation. 2019-01-14 13:26:34 -08:00
Gael Guennebaud
250dcd1fdb bug #1652: fix position of EIGEN_ALIGN16 attributes in Neon and Altivec 2019-01-14 21:45:56 +01:00
Rasmus Larsen
5a59452aae Merged eigen/eigen into default 2019-01-14 10:23:23 -08:00
Gael Guennebaud
3c9e6d206d AVX512: fix pgather/pscatter for Packet4cd and unaligned pointers 2019-01-14 17:57:28 +01:00
Gael Guennebaud
61b6eb05fe AVX512 (r)sqrt(double) was mistakenly disabled with clang and others 2019-01-14 17:28:47 +01:00
Gael Guennebaud
ccddeaad90 fix warning 2019-01-14 16:51:16 +01:00
Gael Guennebaud
d4881751d3 Doc: add Isometry in the list of supported Mode of Transform<> 2019-01-14 16:38:26 +01:00
Greg Coombe
9d988a1e1a Initialize isometric transforms like affine transforms.
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
Gael Guennebaud
4356a55a61 PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and calls it instead of falling back to std::sin/std::cos.
This makes both the small and huge argument cases faster because:
- for small inputs this removes the last pselect
- for large inputs only the reduction part follows a scalar path,
the rest use the same SIMD path as the small-argument case.
2019-01-14 13:54:01 +01:00
Gael Guennebaud
f566724023 Fix StorageIndex FIXME in dense LU solvers 2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen
1c6e6e2c3f Merge. 2019-01-11 17:47:11 -08:00
Rasmus Larsen
0ba3b45419 Merged eigen/eigen into default 2019-01-11 17:46:04 -08:00
Rasmus Munk Larsen
28ba1b2c32 Add support for inverse hyperbolic functions.
Fix cost of division.
2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen
89c4001d6f Fix warnings in ptrue for complex and half types. 2019-01-11 14:10:57 -08:00
Rasmus Munk Larsen
a49d01edba Fix warnings in ptrue for complex and half types. 2019-01-11 13:18:17 -08:00
Eugene Zhulenev
1e6d15b55b Fix shorten-64-to-32 warning in TensorContractionThreadPool 2019-01-11 11:41:53 -08:00
Rasmus Munk Larsen
df29511ac0 Fix merge. 2019-01-11 10:36:36 -08:00
Rasmus Munk Larsen
8e71ed4cc9 Merge. 2019-01-11 10:35:07 -08:00
Rasmus Munk Larsen
fff5a5b579 Resolve. 2019-01-11 10:28:52 -08:00
Rasmus Munk Larsen
9396ace46b Merge. 2019-01-11 10:28:52 -08:00
Rasmus Larsen
74882471d0 Merged eigen/eigen into default 2019-01-11 10:20:55 -08:00
Rasmus Munk Larsen
e9936cf2b9 Merge. 2019-01-11 09:58:33 -08:00
Gael Guennebaud
9005f0111f Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fixes a compilation issue with gcc-4.7. 2019-01-11 17:10:54 +01:00
Mark D Ryan
3c9add6598 Remove reinterpret_cast from AVX512 complex implementation
The reinterpret_casts used in ptranspose(PacketBlock<Packet8cf,4>&)
ptranspose(PacketBlock<Packet8cf,8>&) don't appear to be working
correctly.  They're used to convert the kernel parameters to
PacketBlock<Packet8d,T>& so that the complex number versions of
ptranspose can be written using the existing double implementations.
Unfortunately, they don't seem to work and are responsible for 9 unit
test failures in the AVX512 build of tensorflow master.  This commit
fixes the issue by manually initialising PacketBlock<Packet8d,T>
variables with the contents of the kernel parameter before calling
the double version of ptranspose, and then copying the resulting
values back into the kernel parameter before returning.
2019-01-11 14:02:09 +01:00
Christoph Hertzberg
0522460a0d bug #1656: Enable failtests only if BUILD_TESTING is enabled 2019-01-11 11:07:56 +01:00
Eugene Zhulenev
0abe03764c Fix shorten-64-to-32 warning in TensorContractionThreadPool 2019-01-10 10:27:55 -08:00
Rasmus Munk Larsen
fcfced13ed Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate. 2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
ce38c342c3 merge. 2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
a05ec7993e merge 2019-01-09 17:17:30 -08:00
Rasmus Munk Larsen
e15bb785ad Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
f6ba6071c5 Fix typo. 2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f04442526 Collapsed revision
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f178429b9 Collapsed revision
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
1119c73d22 Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
e00521b514 Undo useless diffs. 2019-01-09 16:32:53 -08:00
Rasmus Munk Larsen
f2767112c8 Simplify a bit. 2019-01-09 16:29:18 -08:00
Rasmus Munk Larsen
cb955df9a6 Add packet op "pones". Write pnot(a) as pxor(pones(a), a). 2019-01-09 16:17:08 -08:00
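A hedged sketch of the relationship stated above, assuming a build where pxor and ptrue (the later name of pones, see the rename entry higher up in this log) are available in Eigen::internal:

    #include <Eigen/Core>

    // Generic fallback expressing pnot via an all-ones mask.
    template <typename Packet>
    Packet generic_pnot(const Packet& a) {
      return Eigen::internal::pxor(Eigen::internal::ptrue(a), a);
    }
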
Rasmus Larsen
cb3c059fa4 Merged eigen/eigen into default 2019-01-09 15:04:17 -08:00
Gael Guennebaud
d812f411c3 bug #1654: fix compilation with cuda and no c++11 2019-01-09 18:00:05 +01:00
Gael Guennebaud
3492a1ca74 fix plog(+inf) with AVX512 2019-01-09 16:53:37 +01:00
Gael Guennebaud
47810cf5b7 Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSX 2019-01-09 16:40:42 +01:00
Gael Guennebaud
3f14e0d19e fix warning 2019-01-09 15:45:21 +01:00
Gael Guennebaud
aeec68f77b Add missing pcmp_lt and others for AVX512 2019-01-09 15:36:41 +01:00
Gael Guennebaud
e6b217b8dd bug #1652: implements a much more accurate version of vectorized sin/cos. This new version achieves the same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
  - FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Eugene Zhulenev
e70ffef967 Optimize evalShardedByInnerDim 2019-01-08 16:26:31 -08:00
Rasmus Munk Larsen
055f0b73db Add support for pcmp_eq and pnot, including for complex types. 2019-01-07 16:53:36 -08:00
Eugene Zhulenev
190d053e41 Explicitly set fill character when printing aligned data to ostream 2019-01-03 14:55:28 -08:00
Mark D Ryan
bc5dd4cafd PR560: Fix the AVX512f only builds
Commit c53eececb0
 introduced AVX512 support for complex numbers but required
avx512dq to build.  Commit 1d683ae2f5
 fixed some, but it would seem not all,
of the hard avx512dq dependencies.  Build failures are still evident on
Eigen and TensorFlow when compiling with just avx512f and no avx512dq
using gcc 7.3.  Looking at the code there does indeed seem to be a problem.
Commit c53eececb0
 calls avx512dq intrinsics directly, e.g, _mm512_extractf32x8_ps
and _mm512_and_ps.  This commit fixes the issue by replacing the direct
intrinsic calls with the various wrapper functions that are safe to use on
avx512f only builds.
2019-01-03 14:33:04 +01:00
Gael Guennebaud
697fba3bb0 Fix unit test 2018-12-27 11:20:47 +01:00
Gael Guennebaud
60d3fe9a89 One more stupid AVX 512 fix (I don't have direct access to AVX512 machines) 2018-12-24 13:05:03 +01:00
Gael Guennebaud
4aa667b510 Add EIGEN_STRONG_INLINE where required 2018-12-24 10:45:01 +01:00
Gael Guennebaud
961ff567e8 Add missing pcmp_lt_or_nan for AVX512 2018-12-23 22:13:29 +01:00
Gael Guennebaud
0f6f75bd8a Implement a faster fix for sin/cos of large entries that also correctly handle INF input. 2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8 Make sure that psin/pcos return numbers in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate) 2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb Fix plog(+INF): it returned ~87 instead of +INF 2018-12-23 15:40:52 +01:00
Christoph Hertzberg
6dd93f7e3b Make code compile again for older compilers.
See https://stackoverflow.com/questions/7411515/
2018-12-22 13:09:07 +01:00
Gustavo Lima Chaves
1024a70e82 gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs

The patch works by altering the gebp lhs packing routines to also
consider ½ and ¼ packet-length rows when packing, besides the original
whole-packet and row-by-row attempts. Finally, gebp itself will try
to fit a fraction of a packet at a time if:

i) ½ and/or ¼ packets are available for the current context (e.g. AVX2
   and SSE-sized SIMD register for x86)

ii) The matrix's height is favorable to it (it may be too small
    in that dimension to take full advantage of the current/maximum
    packet width, or the last rows may take
    advantage of smaller packets before gebp goes row-by-row)

This helps mitigate huge slowdowns one had on AVX512 builds when
compared to AVX2 ones, for some dimensions. Gains top out at an extra 1x
in throughput. This patch is a complement to changeset 4ad359237a
.

Since packing is changed, Eigen users that rely on very
low-level API usage, like TensorFlow, will have to be adapted to work
with the changes.
2018-12-21 11:03:18 -08:00
Gustavo Lima Chaves
e763fcd09e Introducing "vectorized" byte on unpacket_traits structs
This is a preparation to a change on gebp_traits, where a new template
argument will be introduced to dictate the packet size, so it won't be
bound to the current/max packet size only anymore.

By having packet types defined early on gebp_traits, one now has to
act on packet types, not scalars anymore, for the enum values defined
on that class. One approach for reaching the vectorizable/size
properties one needs there could be getting the packet's scalar again
with unpacket_traits<>, then the size/Vectorizable enum entries from
packet_traits<>. It turns out guards like "#ifndef
EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet
variations of packet_traits<> for some types (and it makes sense to
keep that). In other words, one can't go back to the scalar and create
a new PacketType, as this will always lead to the maximum packet type
for the architecture.

The less costly/invasive solution for that, thus, is to add the
vectorizable info on every unpacket_traits struct as well.
2018-12-19 14:24:44 -08:00
Gael Guennebaud
efa4c9c40f bug #1615: slightly increase the default unrolling limit to compensate for changeset 101ea26f5e
.
This solves a performance regression with clang and 3x3 matrix products.
2018-12-13 10:42:39 +01:00
Gael Guennebaud
f20c991679 add changesets related to matrix product perf. 2018-12-13 10:33:29 +01:00
Rasmus Munk Larsen
dd6d65898a Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0. 2018-12-12 14:45:31 -08:00
Gael Guennebaud
f582ea3579 Fix compilation with expression template scalar type. 2018-12-12 22:47:00 +01:00
Gael Guennebaud
cfc70dc13f Add regression test for bug #1174 2018-12-12 18:03:31 +01:00
Gael Guennebaud
2de8da70fd bug #1557: fix RealSchur and EigenSolver for matrices with only zeros on the diagonal. 2018-12-12 17:30:08 +01:00
Gael Guennebaud
72c0bbe2bd Simplify handling of tests that must fail to compile.
Each test is now a normal ctest target, and build properties (compiler+flags) are preserved (instead of starting a new build-dir from scratch).
2018-12-12 15:48:36 +01:00
Gael Guennebaud
37c91e1836 bug #1644: fix warning 2018-12-11 22:07:20 +01:00
Gael Guennebaud
f159cf3d75 Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.
With a 6pX4 kernel (not committed yet), this provides a +20% speedup.
2018-12-11 15:36:27 +01:00
Gael Guennebaud
0a7e7af6fd Properly set the number of registers for AVX512 2018-12-11 15:33:17 +01:00
Gael Guennebaud
7166496f70 bug #1643: fix compilation issue with gcc and no optimization 2018-12-11 13:24:42 +01:00
Gael Guennebaud
0d90637838 enable spilling workaround on architectures with SSE/AVX 2018-12-10 23:22:44 +01:00
Gael Guennebaud
cf697272e1 Remove debug code. 2018-12-09 23:05:46 +01:00
Gael Guennebaud
450dc97c6b Various fixes in polynomial solver and its unit tests:
- cleanup noise in imaginary part of real roots
 - take into account the magnitude of the derivative to check roots.
 - use <= instead of < at appropriate places
2018-12-09 22:54:39 +01:00
Gael Guennebaud
348bb386d1 Enable "old" CMP0026 policy (not perfect, but better than dozens of warning) 2018-12-08 18:59:51 +01:00
Gael Guennebaud
bff90bf270 workaround "may be used uninitialized" warning 2018-12-08 18:58:28 +01:00
Gael Guennebaud
81c27325ae bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512 2018-12-08 14:27:48 +01:00
Gael Guennebaud
426bce7529 fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target 2018-12-08 09:44:21 +01:00
Gael Guennebaud
cd25b538ab Fix noise in sparse_basic_3 (numerical cancellation) 2018-12-08 00:13:37 +01:00
Gael Guennebaud
efaf03bf96 Fix noise in lu unit test 2018-12-08 00:05:03 +01:00
Gael Guennebaud
956678a4ef bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling. 2018-12-07 18:03:36 +01:00
Gael Guennebaud
7b6d0ff1f6 Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error. 2018-12-07 15:14:50 +01:00
Gael Guennebaud
f233c6194d bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA 2018-12-07 10:01:09 +01:00
Gael Guennebaud
ae59a7652b bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA 2018-12-07 09:23:28 +01:00
Gael Guennebaud
4e7746fe22 bug #1636: fix gemm performance issue with gcc>=6 and no FMA 2018-12-07 09:15:46 +01:00
Gael Guennebaud
cbf2f4b7a0 AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only 2018-12-06 18:21:56 +01:00
Gael Guennebaud
1d683ae2f5 Fix compilation with avx512f only, i.e., no AVX512DQ 2018-12-06 18:11:07 +01:00
Gael Guennebaud
aab749b1c3 fix test regarding AVX512 vectorization of complexes. 2018-12-06 16:55:00 +01:00
Gael Guennebaud
c53eececb0 Implement AVX512 vectorization of std::complex<float/double> 2018-12-06 15:58:06 +01:00
Gael Guennebaud
3fba59ea59 temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! 2018-12-06 00:13:26 +01:00
Gael Guennebaud
1ac2695ef7 bug #1636: fix compilation with some ABI versions. 2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen
47d8b741b2 #elif -> #else to fix GPU build. 2018-12-05 13:19:31 -08:00
Rasmus Munk Larsen
8a02883d58 Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)
Fix tensor contraction on AVX512 builds

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-05 18:19:32 +00:00
Gael Guennebaud
acc3459a49 Add help messages in the quick ref/ascii docs regarding slicing, indexing, and reshaping. 2018-12-05 17:17:23 +01:00
Gael Guennebaud
e2e897298a Fix page nesting 2018-12-05 17:13:46 +01:00
Christoph Hertzberg
c1d356e8b4 bug #1635: Use infinity from NumTraits instead of creating it manually. 2018-12-05 15:01:04 +01:00
Mark D Ryan
36f8f6d0be Fix evalShardedByInnerDim for AVX512 builds
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size.  While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16.  The result is slightly
incorrect float32 contractions on AVX512 builds.

This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
2018-12-05 12:29:03 +01:00
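A tiny worked example of the constraint (numbers chosen for illustration): a boundary of k = 40 is already a multiple of 8, so AVX builds were fine, but on AVX512 it must be rounded down to 32.

    #include <cstdio>

    // Round a k-dimension shard boundary down to a multiple of the packet size.
    long round_to_packet(long k, long packet_size) {
      return (k / packet_size) * packet_size;
    }

    int main() {
      std::printf("%ld %ld\n", round_to_packet(40, 8), round_to_packet(40, 16));  // prints: 40 32
      return 0;
    }
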
Rasmus Munk Larsen
b57b31cce9 Merged in ezhulenev/eigen-01 (pull request PR-553)
Do not disable alignment with EIGEN_GPUCC

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-04 23:47:19 +00:00
Eugene Zhulenev
0bb15bb6d6 Update checks in ConfigureVectorization.h 2018-12-03 17:10:40 -08:00
Eugene Zhulenev
fd0fbfa9b5 Do not disable alignment with EIGEN_GPUCC 2018-12-03 15:54:10 -08:00
Christoph Hertzberg
919414b9fe bug #785: Make Cholesky decomposition work for empty matrices 2018-12-03 16:18:15 +01:00
Gael Guennebaud
0ea7ae7213 Add missing padd for Packet8i (it was implicitly generated by clang and gcc) 2018-11-30 21:52:25 +01:00
Gael Guennebaud
ab4df3e6ff bug #1634: remove double copy in move-ctor of non movable Matrix/Array 2018-11-30 21:25:51 +01:00
Gael Guennebaud
c785464430 Add packet sin and cos to Altivec/VSX and NEON 2018-11-30 16:21:33 +01:00
Gael Guennebaud
69ace742be Several improvements regarding packet-bitwise operations:
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
fa87f9d876 Add psin/pcos on AVX512 -> almost for free, at last! 2018-11-30 14:33:13 +01:00
Gael Guennebaud
c68bd2fa7a Cleanup 2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303 Fix pandnot order in AVX512 2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6 Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) 2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d Disable gcc's fma workaround for gcc >= 8 (based on GEMM benchmarks) 2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7 same for pmax 2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6 pmin/pmax on SSE: make sure to use AVX instructions when AVX is enabled, and disable the gcc workaround for fixed gcc versions 2018-11-28 17:14:20 +01:00
Gael Guennebaud
aa6097395b Add missing SSE/AVX type-casting in AVX512 mode 2018-11-28 16:09:08 +01:00
Gael Guennebaud
48fe78c375 bug #1630: fix linspaced when requesting smaller packet size than default one. 2018-11-28 13:15:06 +01:00
Eugene Zhulenev
80f1651f35 Use explicit packet type in SSE/PacketMath pldexp 2018-11-27 17:25:49 -08:00
Benoit Jacob
a4159dba08 do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4). 2018-11-27 16:53:14 -05:00
Gael Guennebaud
b131a4db24 bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. 2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21 Update pshiftleft to pass the shift as a true compile-time integer. 2018-11-27 22:57:30 +01:00
Gael Guennebaud
fa7fd61eda Unify SSE/AVX psin functions.
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
 - add packet-float to packet-int type traits
 - add packet float<->int reinterpret casts
 - add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
Rasmus Munk Larsen
08edbc8cfe Merged in bjacob/eigen/fixbuild (pull request PR-549)
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 20:14:12 +00:00
Benoit Jacob
7b1cb8a440 fix the build on 64-bit ARM when NEON is disabled 2018-11-27 11:11:02 -05:00
Gael Guennebaud
b5695a6008 Unify Altivec/VSX pexp(double) with default implementation 2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e cleanup 2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10 Unify SSE and AVX pexp for double. 2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054 Unify NEON's pexp with generic implementation 2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc Unify Altivec/VSX's pexp with generic implementation 2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5 Unify SSE and AVX implementation of pexp 2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47 Unify Altivec/VSX's plog with generic implementation, and enable it! 2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8 Unify NEON's plog with generic implementation 2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114 First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" 2018-11-26 14:14:07 +01:00
Gael Guennebaud
382279eb7f Extend unit test to recursively check half-packet types and non packet types 2018-11-26 14:10:07 +01:00
Gael Guennebaud
0836a715d6 bug #1611: fix plog(0) on NEON 2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4 Fix typos 2018-11-23 22:22:14 +00:00
Gael Guennebaud
e3b22a6bd0 merge 2018-11-23 16:06:21 +01:00
Gael Guennebaud
ccabdd88c9 Fix reserved usage of double __ in macro names 2018-11-23 16:01:47 +01:00
Gael Guennebaud
572d62697d check two ctors 2018-11-23 15:37:09 +01:00
Gael Guennebaud
354f14293b Fix double = bool ! 2018-11-23 15:12:06 +01:00
Gael Guennebaud
a7842daef2 Fix several uninitialized member from ctor 2018-11-23 15:10:28 +01:00
Christoph Hertzberg
ea60a172cf Add default constructor to Bar to make test compile again with clang-3.8 2018-11-23 14:24:22 +01:00
Christoph Hertzberg
806352d844 Small typo found by Patrick Huber (pull request PR-547) 2018-11-23 12:34:27 +00:00
Gael Guennebaud
a476054879 bug #1624: improve matrix-matrix product on ARM 64, 20% speedup 2018-11-23 10:25:19 +01:00
Gael Guennebaud
c685fe9838 Move regression test to right unit test file 2018-11-21 15:59:47 +01:00
Gael Guennebaud
4b2cebade8 Workaround weird MSVC bug 2018-11-21 15:53:37 +01:00
Christoph Hertzberg
0ec8afde57 Fixed most conversion warnings in MatrixFunctions module 2018-11-20 16:23:28 +01:00
Deven Desai
e7e6809e6b ROCm/HIP specific fixes + updates
1. Eigen/src/Core/arch/GPU/Half.h

   Updating the HIPCC implementation of half so that it can be declared as a __shared__ variable


2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h

   introducing an EIGEN_USE_STD(func) macro that calls
   - std::func by default
   - ::func when Eigen is being compiled with HIPCC

   This change was requested in the previous HIP PR
   (https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff)


3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h

   Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC


4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h

   Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
2018-11-19 18:13:59 +00:00
Gael Guennebaud
6a510fe69c Make MaxPacketSize a true upper bound, even for fixed-size inputs 2018-11-16 11:25:32 +01:00
Gael Guennebaud
43c987b1c1 Add explicit regression test for bug #1622 2018-11-16 11:24:51 +01:00
Mark D Ryan
670d56441c PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
Commit aa110e681b
 optimised the multiplication of small dynamically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation.  However, it
introduced a mismatch between the packet size and the requestedAlignment.  This
mismatch can lead to crashes when the destination is not aligned.  This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize

This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment

A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046 Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES 2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f typo 2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a help doxygen linking to DenseBase::NullaryExpr 2018-11-14 14:42:59 +01:00
Gael Guennebaud
4263f23c28 Improve doc on multi-threading and warn about hyper-threading 2018-11-14 14:42:29 +01:00
Gael Guennebaud
db529ae4ec doxygen does not like \addtogroup and \ingroup in the same line 2018-11-14 14:42:06 +01:00
Rasmus Munk Larsen
72928a2c8a Merged in rmlarsen/eigen2 (pull request PR-543)
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.

Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626 Remove accidental changes. 2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65 Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth. 2018-11-12 17:46:02 -08:00
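A rough sketch of the strategy described in the commit above (illustrative only; `parallel_memcpy` and `kMaxCopyThreads` are not Eigen names): split the copy across at most four threads, since more threads mostly contend for memory bandwidth.
```
#include <algorithm>
#include <cstring>
#include <thread>
#include <vector>

void parallel_memcpy(char* dst, const char* src, std::size_t n, int num_threads) {
  const int kMaxCopyThreads = 4;  // cap described in the commit message
  const int t = std::max(1, std::min(num_threads, kMaxCopyThreads));
  const std::size_t chunk = (n + t - 1) / t;
  std::vector<std::thread> workers;
  for (int i = 0; i < t; ++i) {
    const std::size_t begin = std::min(n, i * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    if (begin < end)
      workers.emplace_back([=] { std::memcpy(dst + begin, src + begin, end - begin); });
  }
  for (std::thread& w : workers) w.join();
}
```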
Rasmus Munk Larsen
77b447c24e Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX. 2018-11-12 13:42:24 -08:00
Gael Guennebaud
c81bdbdadc Add manual doc on STL-compatible iterators 2018-11-12 22:06:33 +01:00
Gael Guennebaud
0105146915 Fix warning in c++03 2018-11-10 09:11:38 +01:00
Rasmus Munk Larsen
93f9988a7e A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) support matrix exponential on platforms with 113 bits of mantissa for long doubles. 2018-11-09 14:15:32 -08:00
Gael Guennebaud
784a3f13cf bug #1619: fix mixing of const and non-const generic iterators 2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba bug #1619: make const and non-const iterators compatible 2018-11-09 16:49:19 +01:00
Gael Guennebaud
fbd6e7b025 add missing ref to a.zeta(b) 2018-11-09 13:53:42 +01:00
Gael Guennebaud
dffd1e11de Limit the size of the toc 2018-11-09 13:52:34 +01:00
Gael Guennebaud
a88e0a0e95 Update doxy hacks wrt doxygen 1.8.13/14 2018-11-09 13:52:10 +01:00
Gael Guennebaud
bd9a00718f Let doxygen sees lastN 2018-11-09 11:35:48 +01:00
Gael Guennebaud
d7c644213c Add and update manual pages for slicing, indexing, and reshaping. 2018-11-09 11:35:27 +01:00
Gael Guennebaud
a368848473 Recent Xcode versions do support EIGEN_HAS_STATIC_ARRAY_TEMPLATE 2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6 Fix max-size in indexed-view 2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff Merged in glchaves/eigen (pull request PR-539)
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gael Guennebaud
995730fc6c Add option to disable plot generation 2018-11-07 00:41:16 +01:00
Gustavo Lima Chaves
4ad359237a Vectorize row-by-row gebp loop iterations on 16 packets as well
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
2018-11-06 10:48:42 -08:00
Gael Guennebaud
9d318b92c6 add unit tests for bug #1619 2018-11-01 15:14:50 +01:00
Matthieu Vigne
8d7a73e48e bug #1617: Fix SolveTriangular.solveInPlace crashing for empty matrix.
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
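How the bug above would surface for a user — a sketch, not the actual unit test that was added:
```
#include <Eigen/Dense>
#include <iostream>

int main() {
  // Before the fix, FullPivLU::kernel() on the zero matrix crashed because
  // SolveTriangular.solveInPlace could not handle an empty matrix; now it
  // returns a basis of the full space.
  Eigen::MatrixXd A = Eigen::MatrixXd::Zero(3, 3);
  Eigen::FullPivLU<Eigen::MatrixXd> lu(A);
  std::cout << "kernel:\n" << lu.kernel() << "\n";
  return 0;
}
```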
Christoph Hertzberg
66b28e290d bug #1618: Use different power-of-2 check to avoid MSVC warning 2018-11-01 13:23:19 +01:00
Rasmus Munk Larsen
07fcdd1438 Merged in ezhulenev/eigen-02 (pull request PR-534)
Fix cxx11_tensor_{block_access, reduction} tests
2018-10-25 18:34:35 +00:00
Eugene Zhulenev
8a977c1f46 Fix cxx11_tensor_{block_access, reduction} tests 2018-10-25 11:31:29 -07:00
Halie Murray-Davis
fb62d6d96e Fix typo in tutorial documentation. 2018-10-25 04:55:34 +00:00
Christoph Hertzberg
b5f077d22c Document EIGEN_NO_IO preprocessor directive 2018-10-25 16:49:25 +02:00
Christian von Schultz
4a40b3785d Collapsed revision (based on pull request PR-325)
* Support compiling without IO streams

Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
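A minimal usage sketch for the directive described above, assuming only that EIGEN_NO_IO must be defined before any Eigen header is included:
```
// Build Eigen without any <iostream>-based IO support.
#define EIGEN_NO_IO
#include <Eigen/Core>

int main() {
  Eigen::Matrix2f m;
  m << 1, 2,
       3, 4;              // comma-initializer does not depend on IO streams
  // std::cout << m;      // would not compile: stream output is disabled
  return m(0, 0) == 1.f ? 0 : 1;
}
```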
Rasmus Munk Larsen
14054e217f Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC). This breaks several targets in the TensorFlow Cuda build, e.g.,
INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code"

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code

4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen
954b4ca9d0 Suppress compiler warning about unused global variable. 2018-10-22 13:48:56 -07:00
Rasmus Munk Larsen
9caafca550 Merged in rmlarsen/eigen (pull request PR-532)
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-19 21:37:14 +00:00
Christoph Hertzberg
449ff74672 Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen
39fec15d5c Merged eigen/eigen into default 2018-10-19 09:48:19 -07:00
Christoph Hertzberg
40fa6f98bf bug #1606: Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11.
Grafted manually from a4afa90d16
2018-10-19 17:20:51 +02:00
Rasmus Munk Larsen
d8f285852b Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available. 2018-10-18 16:55:02 -07:00
Rasmus Munk Larsen
dda68f56ec Fix GPU build due to gpu_assert not always being defined. 2018-10-18 16:29:29 -07:00
Gael Guennebaud
1dcf5a6ed8 fix typo in doc 2018-10-17 09:29:36 +02:00
Eugene Zhulenev
9e96e91936 Move from rvalue arguments in ThreadPool enqueue* methods 2018-10-16 16:48:32 -07:00
Eugene Zhulenev
217d839816 Reduce thread scheduling overhead in parallelFor 2018-10-16 14:53:06 -07:00
Rasmus Munk Larsen
d52763bb4f Merged in ezhulenev/eigen-02 (pull request PR-528)
[TensorBlockIO] Check if it's allowed to squeeze inner dimensions

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-10-16 15:39:40 +00:00
Gael Guennebaud
0f780bb0b4 Fix float-to-double warning 2018-10-16 09:19:45 +02:00
Eugene Zhulenev
900c7c61bb Check if it's allowed to squeeze inner dimensions in TensorBlockIO 2018-10-15 16:52:33 -07:00
Gael Guennebaud
a39e0f7438 bug #1612: fix regression in "outer-vectorization" of partial reductions for PacketSize==1 (aka complex<double>) 2018-10-16 01:04:25 +02:00
Gael Guennebaud
e3b85771d7 Show call stack in case of failing sparse solving. 2018-10-16 00:43:44 +02:00
Gael Guennebaud
d2d570c116 Remove useless (and broken) resize 2018-10-16 00:42:48 +02:00
Gael Guennebaud
f0fb95135d Iterative solvers: unify and fix handling of multiple rhs.
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
2747b98cfc DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve 2018-10-15 23:46:00 +02:00
Gael Guennebaud
d835a0bf53 relax number of iterations checks to avoid false negatives 2018-10-15 10:23:32 +02:00
Gael Guennebaud
3a33db4de5 merge 2018-10-15 09:22:27 +02:00
Rasmus Munk Larsen
0ed811a9c1 Suppress unused variable compiler warning in sparse subtest 3. 2018-10-12 13:41:57 -07:00
Mark D Ryan
aa110e681b PR 526: Speed up multiplication of small, dynamically sized matrices
The Packet16f, Packet8f and Packet8d types are too large to use with dynamically
sized matrices typically processed by the SliceVectorizedTraversal specialization of
the dense_assignment_loop.  Using these types is likely to lead to little or no
vectorization.  Significant slowdown in the multiplication of these small matrices can
be observed when building with AVX and AVX512 enabled.

This patch introduces a new dense_assignment_kernel that is used when
computing small products whose operands have dynamic dimensions.  It ensures that the
PacketSize used is no larger than 4, thereby increasing the chance that vectorized
instructions will be used when computing the product.

I tested all 969 possible combinations of M, K, and N that are handled by the
dense_assignment_loop on x86 builds.  Although a few combinations are slowed down
by this patch they are far outnumbered by the cases that are sped up, as the
following results demonstrate.


Disabling Packet8d on AVX512 builds:

Total Cases:             969
Better:                  511
Worse:                   85
Same:                    373
Max Improvement:         169.00% (4 8 6)
Max Degradation:         36.50% (8 5 3)
Median Improvement:      35.46%
Median Degradation:      17.41%
Total FLOPs Improvement: 19.42%


Disabling Packet16f and Packet8f on AVX512 builds:

Total Cases:             969
Better:                  658
Worse:                   5
Same:                    306
Max Improvement:         214.05% (8 6 5)
Max Degradation:         22.26% (16 2 1)
Median Improvement:      60.05%
Median Degradation:      13.32%
Total FLOPs Improvement: 59.58%


Disabling Packet8f on AVX builds:

Total Cases:             969
Better:                  663
Worse:                   96
Same:                    210
Max Improvement:         155.29% (4 10 5)
Max Degradation:         35.12% (8 3 2)
Median Improvement:      34.28%
Median Degradation:      15.05%
Total FLOPs Improvement: 26.02%
2018-10-12 15:20:21 +02:00
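The kind of product targeted by the change above, for illustration only: both operands are small and dynamically sized, so the assignment previously fell into SliceVectorizedTraversal with full-width AVX/AVX512 packets and barely vectorized.
```
#include <Eigen/Dense>

Eigen::MatrixXf small_product() {
  Eigen::MatrixXf a = Eigen::MatrixXf::Random(4, 8);
  Eigen::MatrixXf b = Eigen::MatrixXf::Random(8, 6);
  return a * b;  // sizes known only at run time; packet size is now capped at 4
}
```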
Eugene Zhulenev
d9392f9e55 Fix code format 2018-11-02 14:51:35 -07:00
Eugene Zhulenev
118520f04a Workaround nvcc+msvc compiler bug 2018-11-02 14:48:28 -07:00
Christoph Hertzberg
24dc076519 Explicitly convert 0 to Scalar for custom types 2018-10-12 10:22:19 +02:00
Gael Guennebaud
8214cf1896 Make sparse_basic includable from sparse_extra, but disable it since sparse_basic(DynamicSparseMatrix) does not compile at all anyways 2018-10-11 10:27:23 +02:00
Gael Guennebaud
43633fbaba Fix warning with AVX512f 2018-10-11 10:13:48 +02:00
Gael Guennebaud
97e2c808e9 Fix avx512 plog(NaN) to return NaN instead of +inf 2018-10-11 10:13:13 +02:00
Gael Guennebaud
b3f66d29a5 Enable avx512 plog with clang 2018-10-11 10:12:21 +02:00
Gael Guennebaud
2ef1b39674 Relaxed fastmath unit test: if std::foo fails, then let's only trigger a warning if numext::foo fails too.
A true error will be triggered only if std::foo works but our numext::foo fails.
2018-10-11 09:45:30 +02:00
Gael Guennebaud
1d5a6363ea relax numerical tests from equal to approx (x87) 2018-10-11 09:29:56 +02:00
Gael Guennebaud
f0aa7e40fc Fix regression in changeset 5335659c47 2018-10-10 23:47:30 +02:00
Gael Guennebaud
ce243ee45b bug #520: add diagmat +/- diagmat operators. 2018-10-10 23:38:22 +02:00
Gael Guennebaud
5335659c47 Merged in ezhulenev/eigen-02 (pull request PR-525)
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 20:59:00 +00:00
Gael Guennebaud
eec0dfd688 bug #632: add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense.
They are rewritten as two compound assignments to bypass the hybrid dense-sparse iterator.
2018-10-10 22:50:15 +02:00
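An illustration of the rewrite described above (a sketch of the effect, not Eigen's internal code): `res = dense + sparse` is evaluated as two compound assignments so no hybrid dense-sparse iterator is needed.
```
#include <Eigen/Dense>
#include <Eigen/Sparse>

Eigen::MatrixXd add(const Eigen::MatrixXd& d, const Eigen::SparseMatrix<double>& s) {
  Eigen::MatrixXd res = d;  // first pass: copy the dense operand
  res += s;                 // second pass: accumulate the sparse operand
  return res;               // same result as res = d + s
}
```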
Eugene Zhulenev
8e6dc2c81d Fix bug in partial reduction of expressions requiring evaluation 2018-10-10 13:23:52 -07:00
Gael Guennebaud
76ceae49c1 bug #1609: add inplace transposition unit test 2018-10-10 21:48:58 +02:00
Eugene Zhulenev
2bf1a31d81 Use void type if stl-style iterators are not supported 2018-10-10 10:31:40 -07:00
Christoph Hertzberg
f3130ee1ba Avoid empty macro arguments 2018-10-10 08:23:40 +02:00
Rasmus Munk Larsen
e8918743c1 Merged in ezhulenev/eigen-01 (pull request PR-523)
Compile time detection for unimplemented stl-style iterators
2018-10-09 23:42:01 +00:00
Eugene Zhulenev
befcac883d Hide stl-container detection test under #if 2018-10-09 15:36:01 -07:00
Eugene Zhulenev
c0ca8a9fa3 Compile time detection for unimplemented stl-style iterators 2018-10-09 15:28:23 -07:00
Gael Guennebaud
1dd1f8e454 bug #65: add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean() 2018-10-09 23:36:50 +02:00
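A usage example for the newly vectorized case mentioned above: a row-wise reduction of a column-major matrix reduces along the outer dimension.
```
#include <Eigen/Dense>

Eigen::VectorXd row_means(const Eigen::MatrixXd& colmajor_mat) {
  return colmajor_mat.rowwise().mean();  // partial reduction along the outer dimension
}
```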
Gael Guennebaud
bfa2a81a50 Make redux_vec_unroller more flexible regarding packet-type 2018-10-09 23:30:41 +02:00
Gael Guennebaud
c0c3be26ed Extend unit tests for partial reductions 2018-10-09 22:54:54 +02:00
Christoph Hertzberg
3f2c8b7ff0 Fix a lot of Doxygen warnings in Tensor module 2018-10-09 20:22:47 +02:00
Christoph Hertzberg
f6359ad795 Small Doxygen fixes 2018-10-09 19:33:35 +02:00
Gael Guennebaud
7a882c05ab Fix compilation on CUDA 2018-10-09 17:02:16 +02:00
Gael Guennebaud
93a6192e98 fix mpreal for mpfr<4.0.0 2018-10-09 09:15:22 +02:00
Rasmus Munk Larsen
d16634c4d4 Fix out-of bounds access in TensorArgMax.h. 2018-10-08 16:41:36 -07:00
Rasmus Munk Larsen
1a737e1d6a Fix contraction test. 2018-10-08 16:37:07 -07:00
Gael Guennebaud
e00487f7d2 bug #1603: add parenthesis around ternary operator in function body as well as a harmless attempt to make MSVC happy. 2018-10-08 22:27:04 +02:00
Gael Guennebaud
2eda9783de typo 2018-10-08 21:37:46 +02:00
Gael Guennebaud
c6e2dde714 fix c++11 deprecated warning 2018-10-08 18:26:05 +02:00
Gael Guennebaud
6cc9b2c831 fix warning in mpreal.h 2018-10-08 18:25:37 +02:00
Gael Guennebaud
649d4758a6 merge 2018-10-08 17:35:18 +02:00
Gael Guennebaud
aa5820056e Unify c++11 usage in doc's examples and snippets 2018-10-08 17:32:54 +02:00
Gael Guennebaud
e29bfe8479 Update included mpreal header to 3.6.5 and fix deprecated warnings. 2018-10-08 17:09:23 +02:00
Gael Guennebaud
64b1a15318 Workaround stupid warning 2018-10-08 12:01:18 +02:00
Gael Guennebaud
c9643f4a6f Disable C++11 deprecated warning when limiting Eigen to C++98 2018-10-08 10:43:43 +02:00
Gael Guennebaud
774bb9d6f7 fix a doxygen issue 2018-10-08 09:30:15 +02:00
Gael Guennebaud
6c3f6cd52b Fix maybe-uninitialized warning 2018-10-07 23:29:51 +02:00
Gael Guennebaud
bcb7c66b53 Workaround gcc's alloc-size-larger-than= warning 2018-10-07 21:55:59 +02:00
Gael Guennebaud
16b2001ece Fix gcc 8.1 warning: "maybe use uninitialized" 2018-10-07 21:54:49 +02:00
Gael Guennebaud
6512c5e136 Implement a better workaround for GCC's bug #87544 2018-10-07 15:00:05 +02:00
Gael Guennebaud
409132bb81 Workaround gcc bug making it trigger an invalid warning 2018-10-07 09:23:15 +02:00
Gael Guennebaud
c6a1ab4036 Workaround MSVC compilation issue 2018-10-06 13:49:17 +02:00
Gael Guennebaud
e21766c6f5 Clarify doc of rowwise/colwise/vectorwise. 2018-10-05 23:12:09 +02:00
Gael Guennebaud
d92f004ab7 Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns 2018-10-05 23:11:21 +02:00
Gael Guennebaud
91613bf2c2 Add support for c++11 snippets 2018-10-05 23:08:39 +02:00
Gael Guennebaud
3e64b1fc86 Move iterators to internal, improve doc, make unit test c++03 friendly 2018-10-03 15:13:15 +02:00
Gael Guennebaud
2b2b4d0580 fix unused warning 2018-10-03 14:16:21 +02:00
Gael Guennebaud
8a1e98240e add unit tests 2018-10-03 11:56:27 +02:00
Gael Guennebaud
5f26f57598 Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>).
This means that AutoOrder is now allowed, and it always returns a column-vector.
2018-10-03 11:41:47 +02:00
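A usage sketch of the behavior described above, assuming the development-branch reshaped API of the time:
```
#include <Eigen/Dense>

// Flatten a matrix into a column vector, visiting coefficients in row-major
// order; per the commit above this is an alias for
// reshaped<RowMajor>(AutoSize, fix<1>).
Eigen::VectorXd flatten_row_major(const Eigen::MatrixXd& A) {
  return A.reshaped<Eigen::RowMajor>();
}
```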
Gael Guennebaud
0481900e25 Add pointer-based iterator for direct-access expressions 2018-10-02 23:44:36 +02:00
Christoph Hertzberg
c5f1d0a72a Fix shadow warning 2018-10-02 19:01:08 +02:00
Christoph Hertzberg
b92c71235d Move struct outside of method for C++03 compatibility. 2018-10-02 18:59:10 +02:00
Christoph Hertzberg
051f9c1aff Make code compile in C++03 mode again 2018-10-02 18:36:30 +02:00
Christoph Hertzberg
b786ce8c72 Fix conversion warning ... again 2018-10-02 18:35:25 +02:00
Gael Guennebaud
8c38528168 Factorize RowsProxy/ColsProxy and related iterators using subVector<>(Index) 2018-10-02 14:03:26 +02:00
Gael Guennebaud
12487531ce Add templated subVector<Vertical/Horizontal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns) 2018-10-02 14:02:34 +02:00
Gael Guennebaud
37e29fc893 Use Index instead of ptrdiff_t or int, fix random-accessors. 2018-10-02 13:29:32 +02:00
Gael Guennebaud
de2efbc43c bug #1605: workaround ABI issue with vector types (aka __m128) versus scalar types (aka float) 2018-10-01 23:45:55 +02:00
Gael Guennebaud
b0c66adfb1 bug #231: initial implementation of STL iterators for dense expressions 2018-10-01 23:21:37 +02:00
Christoph Hertzberg
564ca71e39 Merged in deven-amd/eigen/HIP_fixes (pull request PR-518)
PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)
2018-10-01 16:51:04 +00:00
Deven Desai
94898488a6 This commit contains the following (HIP specific) updates:
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
  Changing "pass-by-reference" argument to be "pass-by-value" instead
  (in a  __global__ function decl).
  "pass-by-reference" arguments to __global__ functions are unwise,
  and will be explicitly flagged as errors by the newer versions of HIP.

- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
   Changes introduced in recent commits break the HIP compile.
  Adding EIGEN_DEVICE_FUNC attribute to some functions and
  calling ::malloc/free instead of the corresponding std:: versions
  to get the HIP compile working again

- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
   A change introduced in a recent commit breaks the HIP compile
   (link stage errors out due to a failure to inline a function).
   Disabling the recently introduced code (only for the HIP compile) to get
   the Eigen nightly testing going again.
   Will submit another PR once we have the proper fix.

- Eigen/src/Core/util/ConfigureVectorization.h
  Enabling GPU VECTOR support when HIP compiler is in use
  (for both the host and device compile phases)
2018-10-01 14:28:37 +00:00
Rasmus Munk Larsen
2088c0897f Merged eigen/eigen into default 2018-09-28 16:00:46 -07:00
Rasmus Munk Larsen
31629bb964 Get rid of unused variable warning. 2018-09-28 16:00:09 -07:00
Eugene Zhulenev
bb13d5d917 Fix bug in copy optimization in Tensor slicing. 2018-09-28 14:34:42 -07:00
Rasmus Munk Larsen
104e8fa074 Fix a few warnings and rename a variable to not shadow "last". 2018-09-28 12:00:08 -07:00
Rasmus Munk Larsen
7c1b47840a Merged in ezhulenev/eigen-01 (pull request PR-514)
Add tests for evalShardedByInnerDim contraction + fix bugs
2018-09-28 18:37:54 +00:00
Eugene Zhulenev
524c81f3fa Add tests for evalShardedByInnerDim contraction + fix bugs 2018-09-28 11:24:08 -07:00
Christoph Hertzberg
86ba50be39 Fix integer conversion warnings 2018-09-28 19:33:39 +02:00
Eugene Zhulenev
e95696acb3 Optimize TensorBlockCopyOp 2018-09-27 14:49:26 -07:00
Eugene Zhulenev
9f33e71e9d Revert code lost in merge 2018-09-27 12:08:17 -07:00
Eugene Zhulenev
a7a3e9f2b6 Merge with eigen/eigen default 2018-09-27 12:05:06 -07:00
Eugene Zhulenev
9f4988959f Remove explicit mkldnn support and redundant TensorContractionKernelBlocking 2018-09-27 11:49:19 -07:00
Rasmus Munk Larsen
1e5750a5b8 Merged in rmlarsen/eigen4 (pull request PR-511)
Parallelize tensor contraction over the inner dimension.
2018-09-27 17:18:32 +00:00
Gael Guennebaud
af3ad4b513 oops, I've been too fast in previous copy/paste 2018-09-27 09:28:57 +02:00
Gael Guennebaud
24b163a877 #pragma GCC diagnostic push/pop is not supported prior to gcc 4.6 2018-09-27 09:23:54 +02:00
Eugene Zhulenev
b314376f9c Test mkldnn pack for doubles 2018-09-26 18:22:24 -07:00
Eugene Zhulenev
22ed98a331 Conditionally add mkldnn test 2018-09-26 17:57:37 -07:00
Rasmus Munk Larsen
d956204ab2 Remove "false &&" left over from test. 2018-09-26 17:03:30 -07:00
Rasmus Munk Larsen
3815aeed7a Parallelize tensor contraction over the inner dimension in cases where one or both of the outer dimensions (m and n) are small but k is large. This speeds up individual matmul microbenchmarks by up to 85%.
Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server.

Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_Matmul_1_80_13522_1                  387457    396013     -2.2%
BM_Matmul_1_80_13522_2                  406487    230789    +43.2%
BM_Matmul_1_80_13522_4                  395821    123211    +68.9%
BM_Matmul_1_80_13522_6                  391625     97002    +75.2%
BM_Matmul_1_80_13522_8                  408986    113828    +72.2%
BM_Matmul_1_80_13522_16                 399988     67600    +83.1%
BM_Matmul_1_80_13522_22                 411546     60044    +85.4%
BM_Matmul_1_80_13522_32                 393528     57312    +85.4%
BM_Matmul_1_80_13522_44                 390047     63525    +83.7%
BM_Matmul_1_80_13522_88                 387876     63592    +83.6%
BM_Matmul_1_1500_500_1                  245359    248119     -1.1%
BM_Matmul_1_1500_500_2                  401833    143271    +64.3%
BM_Matmul_1_1500_500_4                  210519    100231    +52.4%
BM_Matmul_1_1500_500_6                  251582     86575    +65.6%
BM_Matmul_1_1500_500_8                  211499     80444    +62.0%
BM_Matmul_3_250_512_1                    70297     68551     +2.5%
BM_Matmul_3_250_512_2                    70141     52450    +25.2%
BM_Matmul_3_250_512_4                    67872     58204    +14.2%
BM_Matmul_3_250_512_6                    71378     63340    +11.3%
BM_Matmul_3_250_512_8                    69595     41652    +40.2%
BM_Matmul_3_250_512_16                   72055     42549    +40.9%
BM_Matmul_3_250_512_22                   70158     54023    +23.0%
BM_Matmul_3_250_512_32                   71541     56042    +21.7%
BM_Matmul_3_250_512_44                   71843     57019    +20.6%
BM_Matmul_3_250_512_88                   69951     54045    +22.7%
BM_Matmul_3_1500_512_1                  369328    374284     -1.4%
BM_Matmul_3_1500_512_2                  428656    223603    +47.8%
BM_Matmul_3_1500_512_4                  205599    139508    +32.1%
BM_Matmul_3_1500_512_6                  214278    139071    +35.1%
BM_Matmul_3_1500_512_8                  184149    142338    +22.7%
BM_Matmul_3_1500_512_16                 156462    156983     -0.3%
BM_Matmul_3_1500_512_22                 163905    158259     +3.4%
BM_Matmul_3_1500_512_32                 155314    157662     -1.5%
BM_Matmul_3_1500_512_44                 235434    158657    +32.6%
BM_Matmul_3_1500_512_88                 156779    160275     -2.2%
BM_Matmul_1500_4_512_1                  363358    349528     +3.8%
BM_Matmul_1500_4_512_2                  303134    263319    +13.1%
BM_Matmul_1500_4_512_4                  176208    130086    +26.2%
BM_Matmul_1500_4_512_6                  148026    115449    +22.0%
BM_Matmul_1500_4_512_8                  131656     98421    +25.2%
BM_Matmul_1500_4_512_16                 134011     82861    +38.2%
BM_Matmul_1500_4_512_22                 134950     85685    +36.5%
BM_Matmul_1500_4_512_32                 133165     90081    +32.4%
BM_Matmul_1500_4_512_44                 133203     90644    +32.0%
BM_Matmul_1500_4_512_88                 134106    100566    +25.0%
BM_Matmul_4_1500_512_1                  439243    435058     +1.0%
BM_Matmul_4_1500_512_2                  451830    257032    +43.1%
BM_Matmul_4_1500_512_4                  276434    164513    +40.5%
BM_Matmul_4_1500_512_6                  182542    144827    +20.7%
BM_Matmul_4_1500_512_8                  179411    166256     +7.3%
BM_Matmul_4_1500_512_16                 158101    155560     +1.6%
BM_Matmul_4_1500_512_22                 152435    155448     -1.9%
BM_Matmul_4_1500_512_32                 155150    149538     +3.6%
BM_Matmul_4_1500_512_44                 193842    149777    +22.7%
BM_Matmul_4_1500_512_88                 149544    154468     -3.3%
2018-09-26 16:47:13 -07:00
Eugene Zhulenev
71cd3fbd6a Support multiple contraction kernel types in TensorContractionThreadPool 2018-09-26 11:08:47 -07:00
Christoph Hertzberg
0a3356f4ec Don't deactivate BVH test for clang (probably, this was failing for very old versions of clang) 2018-09-25 20:26:16 +02:00
Gael Guennebaud
41c3a2ffc1 Fix documentation of reshape to vectors. 2018-09-25 16:35:44 +02:00
Christoph Hertzberg
2c083ace3e Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides 2018-09-24 18:01:17 +02:00
Gael Guennebaud
626942d9dd fix alignment issue in ploaddup for AVX512 2018-09-28 16:57:32 +02:00
Gael Guennebaud
84a1101b36 Merge with default. 2018-09-23 21:52:58 +02:00
Gael Guennebaud
795e12393b Fix logic in diagonal*dense product in a corner case.
The problem was for: diag(1x1) * mat(1,n)
2018-09-22 16:44:33 +02:00
Gael Guennebaud
bac36d0996 Demangle Traversal and Unrolling in Redux 2018-09-21 23:03:45 +02:00
Gael Guennebaud
c696dbcaa6 Fix shadowing of last and all 2018-09-21 23:02:33 +02:00
Christoph Hertzberg
e3c8289047 Replace unused PREDICATE by corresponding STATIC_ASSERT 2018-09-21 21:15:51 +02:00
Gael Guennebaud
1bf12880ae Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all) 2018-09-21 16:50:04 +02:00
Gael Guennebaud
4291f167ee Add missing plugins to DynamicSparseMatrix -- fix sparse_extra_3 2018-09-21 14:53:43 +02:00
Gael Guennebaud
03a0cb2b72 fix unalignedcount for avx512 2018-09-21 14:40:26 +02:00
Gael Guennebaud
371068992a Add more debug output 2018-09-21 14:32:39 +02:00
Gael Guennebaud
91716f03a7 Fix vectorization logic unit test for AVX512 2018-09-21 14:32:24 +02:00
Gael Guennebaud
b00e48a867 Improve slice-vectorization logic for redux (significant speed-up for reduction of blocks) 2018-09-21 13:45:56 +02:00
Gael Guennebaud
a488d59787 merge with default Eigen 2018-09-21 11:51:49 +02:00
Gael Guennebaud
47720e7970 Doc fixes 2018-09-21 11:48:22 +02:00
Gael Guennebaud
3ec2985914 Merged indexing cleanup (pull request PR-506) 2018-09-21 09:36:05 +00:00
Gael Guennebaud
651e5d4866 Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or AVX with malloc aligned on 8 bytes only.
This change also makes it future-proof for AVX1024
2018-09-21 10:33:22 +02:00
Eugene Zhulenev
719e438a20 Collapsed revision
* Split cxx11_tensor_executor test
* Register test parts with EIGEN_SUFFIXES
* Fix EIGEN_SUFFIXES in cxx11_tensor_executor test
2018-09-20 15:19:12 -07:00
Gael Guennebaud
f0ef3467de Fix doc 2018-09-20 22:57:28 +02:00
Gael Guennebaud
617f75f117 Add indexing namespace 2018-09-20 22:57:10 +02:00
Gael Guennebaud
0c56d22e2e Fix shadowing 2018-09-20 22:56:21 +02:00
Rasmus Munk Larsen
8e2be7777e Merged eigen/eigen into default 2018-09-20 11:41:15 -07:00
Rasmus Munk Larsen
5d2e759329 Initialize BlockIteratorState in a C++03 compatible way. 2018-09-20 11:40:43 -07:00
Gael Guennebaud
e04faca930 merge 2018-09-20 18:33:54 +02:00
Gael Guennebaud
d37188b9c1 Fix MPrealSupport 2018-09-20 18:30:10 +02:00
Gael Guennebaud
3c6dc93f99 Fix GPU support. 2018-09-20 18:29:21 +02:00
Gael Guennebaud
e0f6d352fb Rename test/array.cpp to test/array_cwise.cpp to avoid conflicts with the array header. 2018-09-20 18:07:32 +02:00
Gael Guennebaud
eeeb18814f Fix warning 2018-09-20 17:48:56 +02:00
Gael Guennebaud
9419f506d0 Fix regression introduced by the previous fix for AVX512.
It broke the complex-complex case on SSE.
2018-09-20 17:32:34 +02:00
Christoph Hertzberg
a0166ab651 Workaround for spurious "array subscript is above array bounds" warnings with g++4.x 2018-09-20 17:08:43 +02:00
Gael Guennebaud
e38d1ab4d1 Workaround increases required alignment warning 2018-09-20 17:07:33 +02:00
Christoph Hertzberg
c50250cb24 Avoid warning "suggest braces around initialization of subobject".
This test is not run in C++03 mode, so no compatibility is lost.
2018-09-20 17:03:42 +02:00
Gael Guennebaud
71496b0e25 Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512).
This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.
2018-09-20 17:01:24 +02:00
Gael Guennebaud
5a30eed17e Fix warnings in AVX512 2018-09-20 16:58:51 +02:00
Gael Guennebaud
2cf6d3050c Disable ignoring attributes warning 2018-09-20 11:38:19 +02:00
Rasmus Munk Larsen
44d8274383 Cast to longer type. 2018-09-19 13:31:42 -07:00
Rasmus Munk Larsen
d638b62dda Silence compiler warning. 2018-09-19 13:27:55 -07:00
Rasmus Munk Larsen
db9c9df59a Silence more compiler warnings. 2018-09-19 11:50:27 -07:00
Rasmus Munk Larsen
febd09dcc0 Silence compiler warnings in ThreadPoolInterface.h. 2018-09-19 11:11:04 -07:00
Gael Guennebaud
c3a19527a2 Fix doc wrt previous change 2018-09-19 11:49:26 +02:00
Gael Guennebaud
dfa8439e4d Update reshaped API to use RowMajor/ColMajor directly as integral values instead of introducing RowOrder/ColOrder types.
The API changed from A.reshaped(rows,cols,RowOrder) to A.template reshaped<RowOrder>(rows,cols).
2018-09-19 11:49:26 +02:00
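A usage sketch of the new spelling, assuming the development-branch API of the time; inside a template the `template` keyword is required, as the commit message shows.
```
#include <Eigen/Dense>

template <typename Derived>
Eigen::Matrix<typename Derived::Scalar, Eigen::Dynamic, Eigen::Dynamic>
reshape_row_major(const Eigen::MatrixBase<Derived>& A, Eigen::Index rows, Eigen::Index cols) {
  // was: A.reshaped(rows, cols, RowOrder)
  return A.template reshaped<Eigen::RowMajor>(rows, cols);
}
```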
luz.paz"
f67b19a884 [PATCH 1/2] Misc. typos
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelists consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
 CMakeLists.txt                                | 26 +++++++++----------
 Eigen/src/Core/GenericPacketMath.h            |  2 +-
 Eigen/src/SparseLU/SparseLU.h                 |  2 +-
 bench/bench_norm.cpp                          |  2 +-
 doc/HiPerformance.dox                         |  2 +-
 doc/QuickStartGuide.dox                       |  2 +-
 .../Eigen/CXX11/src/Tensor/TensorChipping.h   |  6 ++---
 .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h  |  2 +-
 .../src/Tensor/TensorForwardDeclarations.h    |  4 +--
 .../src/Tensor/TensorGpuHipCudaDefines.h      |  2 +-
 .../Eigen/CXX11/src/Tensor/TensorReduction.h  |  2 +-
 .../CXX11/src/Tensor/TensorReductionGpu.h     |  2 +-
 .../test/cxx11_tensor_concatenation.cpp       |  2 +-
 unsupported/test/cxx11_tensor_executor.cpp    |  2 +-
 14 files changed, 29 insertions(+), 29 deletions(-)
2018-09-18 04:15:01 -04:00
Gael Guennebaud
297ca62319 ease transition by adding placeholders::all/last/end as deprecated 2018-09-17 16:24:52 +02:00
Gael Guennebaud
2014c7ae28 Move all, last, end from Eigen::placeholders namespace to Eigen::, and rename end to lastp1 to avoid conflicts with std::end. 2018-09-15 14:35:10 +02:00
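A brief usage sketch after this rename: `last` (and `lastp1`, the renamed `end`) live directly in namespace Eigen, so symbolic indexing no longer needs Eigen::placeholders.
```
#include <Eigen/Dense>

Eigen::VectorXd drop_ends(const Eigen::VectorXd& v) {
  using Eigen::seq;
  using Eigen::last;
  return v(seq(1, last - 1));  // everything except the first and last coefficients
}
```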
Gael Guennebaud
82772e8d9d Rename Symbolic namespace to symbolic to be consistent with numext namespace 2018-09-15 14:16:20 +02:00
Rasmus Munk Larsen
400512bfad Merged in ezhulenev/eigen-02 (pull request PR-501)
Enable DSizes type promotion with c++03
2018-09-19 00:50:04 +00:00
Eugene Zhulenev
c4627039ac Support static dimensions (aka IndexList) in Tensor::resize(...) 2018-09-18 14:25:21 -07:00
Gael Guennebaud
3e8188fc77 bug #1600: initialize m_info to InvalidInput by default, even though m_info is not accessible until it has been initialized (assert) 2018-09-18 21:24:48 +02:00
Eugene Zhulenev
218a7b9840 Enable DSizes type promotion with c++03 compilers 2018-09-18 10:57:00 -07:00
Ravi Kiran
1f0c941c3d Collapsed revision
* Merged eigen/eigen into default
2018-09-17 18:29:12 -07:00
Rasmus Munk Larsen
03a88c57e1 Merged in ezhulenev/eigen-02 (pull request PR-498)
Add DSizes index type promotion
2018-09-17 21:58:38 +00:00
Rasmus Munk Larsen
5ca0e4a245 Merged in ezhulenev/eigen-01 (pull request PR-497)
Fix warnings in IndexList array_prod
2018-09-17 20:15:06 +00:00
Eugene Zhulenev
a5cd4e9ad1 Replace deprecated Eigen::DenseIndex with Eigen::Index in TensorIndexList 2018-09-17 10:58:07 -07:00
Gael Guennebaud
b311bfb752 bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
Gael Guennebaud
72f19c827a typo 2018-09-16 22:10:34 +02:00
Eugene Zhulenev
66f056776f Add DSizes index type promotion 2018-09-15 15:17:38 -07:00
Eugene Zhulenev
f313126dab Fix warnings in IndexList array_prod 2018-09-15 13:47:54 -07:00
Christoph Hertzberg
42705ba574 Fix weird error for building with g++-4.7 in C++03 mode. 2018-09-15 12:43:41 +02:00
Rasmus Munk Larsen
c2383f95af Merged in ezhulenev/eigen/fix_dsizes (pull request PR-494)
Fix DSizes IndexList constructor
2018-09-15 02:36:19 +00:00
Rasmus Munk Larsen
30290cdd56 Merged in ezhulenev/eigen/moar_eigen_fixes_3 (pull request PR-493)
Const cast scalar pointer in TensorSlicingOp evaluator

Approved-by: Sameer Agarwal <sameeragarwal@google.com>
2018-09-15 02:35:07 +00:00
Eugene Zhulenev
f7d0053cf0 Fix DSizes IndexList constructor 2018-09-14 19:19:13 -07:00
Rasmus Munk Larsen
601e289d27 Merged in ezhulenev/eigen/moar_eigen_fixes_1 (pull request PR-492)
Explicitly construct tensor block dimensions from evaluator dimensions
2018-09-15 01:36:21 +00:00
Eugene Zhulenev
71070a1e84 Const cast scalar pointer in TensorSlicingOp evaluator 2018-09-14 17:17:50 -07:00
Eugene Zhulenev
4863375723 Explicitly construct tensor block dimensions from evaluator dimensions 2018-09-14 16:55:05 -07:00
Rasmus Munk Larsen
14e35855e1 Merged in chtz/eigen-maxsizevector (pull request PR-490)
Let MaxSizeVector respect alignment of objects

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-09-14 23:29:24 +00:00
Rasmus Munk Larsen
281e631839 Merged in ezhulenev/eigen/indexlist_to_dsize (pull request PR-491)
Support reshaping with static shapes and dimensions conversion in tensor broadcasting

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-09-14 22:45:52 +00:00
Eugene Zhulenev
1b8d70a22b Support reshaping with static shapes and dimensions conversion in tensor broadcasting 2018-09-14 15:25:27 -07:00
Christoph Hertzberg
007f165c69 bug #1598: Let MaxSizeVector respect alignment of objects and add a unit test
Also revert 8b3d9ed081
2018-09-14 20:21:56 +02:00
Christoph Hertzberg
d7378aae8e Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the possibility for alignments larger than the standard alignment. 2018-09-14 20:17:47 +02:00
Rasmus Munk Larsen
9b864cdb37 Merged in rmlarsen/eigen3 (pull request PR-480)
Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set.
2018-09-14 00:05:09 +00:00
Rasmus Munk Larsen
d0eef5fe6c Don't use bracket syntax in ctor. 2018-09-13 17:04:05 -07:00
Rasmus Munk Larsen
6313dde390 Fix merge error. 2018-09-13 16:42:05 -07:00
Rasmus Munk Larsen
0db590d22d Backed out changeset 01197e4452 2018-09-13 16:20:57 -07:00
Rasmus Munk Larsen
b3f4c067d9 Merge 2018-09-13 16:18:52 -07:00
Rasmus Munk Larsen
2b07018140 Enable vectorized version on GPUs. The underlying bug has been fixed. 2018-09-13 16:12:22 -07:00
Rasmus Munk Larsen
53568e3549 Merged in ezhulenev/eigen/tiled_evalution_support (pull request PR-444)
Tiled evaluation for Tensor ops

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
Approved-by: Gael Guennebaud <g.gael@free.fr>
2018-09-13 22:05:47 +00:00
Eugene Zhulenev
01197e4452 Fix warnings 2018-09-13 15:03:36 -07:00
Gael Guennebaud
1141bcf794 Fix conjugate-gradient for very small rhs 2018-09-13 23:53:28 +02:00
Gael Guennebaud
7f3b17e403 MSVC 2015 supports c++11 thread-local-storage 2018-09-13 18:15:07 +02:00
Eugene Zhulenev
d138fe341d Fix static_assert in test to conform to the c++11 standard 2018-09-11 17:23:18 -07:00
Rasmus Munk Larsen
e289f44c56 Don't vectorize the MeanReducer unless pdiv is available. 2018-09-11 14:09:00 -07:00
Eugene Zhulenev
55bb7e7935 Merge with upstream eigen/default 2018-09-11 13:33:06 -07:00
Eugene Zhulenev
81b38a155a Fix compilation of tiled evaluation code with c++03 2018-09-11 13:32:32 -07:00
Rasmus Munk Larsen
5da960702f Merged eigen/eigen into default 2018-09-11 10:08:46 -07:00
Rasmus Munk Larsen
46f88fc454 Use numerically stable tree reduction in TensorReduction. 2018-09-11 10:08:10 -07:00
Justin Carpentier
4827bec776 LLT: correct doc and add missing reference for the return type of rankUpdate
---
 Eigen/src/Cholesky/LLT.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
2018-09-11 09:33:21 +02:00
Rasmus Munk Larsen
3d057e0453 Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set. 2018-09-06 12:59:36 -07:00
cgs1019
c6066ac411 Make param name and docs consistent for JacobiRotation::makeGivens
Previously the rendered math in the doc string called the optional return value
'r', while the actual parameter and the doc string text referred to the
parameter as 'z'. This changeset renames all the z's to r's to match the math.
2018-09-06 11:04:17 -04:00
Alexey Frunze
edeee16a16 Fix build failures in matrix_power and matrix_exponential tests.
This fixes the static assertion complaining about double being
used in place of long double. This happened on MIPS32, where
double and long double have the same type representation.
This can be simulated on x86 as well if we pass -mlong-double-64
to g++.
2018-08-31 14:11:10 -07:00
Deven Desai
c64fe9ea1f Updates to fix HIP-clang specific compile errors.
Compiling the Eigen unit tests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc) results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"
2018-08-30 20:22:16 +00:00
Rasmus Munk Larsen
8b3d9ed081 Use padding instead of alignment attribute, which MaxSizeVector does not respect. This leads to undefined behavior and hard-to-trace bugs. 2018-09-05 11:20:06 -07:00
Gael Guennebaud
5927eef612 Enable std::result_of for msvc 2015 and later 2018-09-13 09:44:46 +02:00
Christoph Hertzberg
3adece4827 Fix misleading indentation of errorCode and make it loop-local 2018-09-12 14:41:38 +02:00
Christoph Hertzberg
7e9c9fbb2d Disable type-limits warnings for g++ < 4.8 2018-09-12 14:40:39 +02:00
Christoph Hertzberg
ba2c8efdcf EIGEN_UNUSED is not supported by g++4.7 (and not portable) 2018-09-12 11:49:10 +02:00
Christoph Hertzberg
ff4e835d6b "sparse_product.cpp" must be included before "sparse_basic.cpp", otherwise EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN has no effect 2018-08-30 20:10:11 +02:00
Christoph Hertzberg
023ed6b9a8 Product of empty array must be 1 and not 0. 2018-08-30 17:14:52 +02:00
Christoph Hertzberg
c2f4e8c08e Fix integer conversion warning 2018-08-30 17:12:53 +02:00
Christoph Hertzberg
ddbc564386 Fixed a few more shadowing warnings when compiling with g++ (and c++03) 2018-08-30 16:33:03 +02:00
Deven Desai
946c3e2544 adding EIGEN_DEVICE_FUNC attribute to fix some GPU unit tests that are broken in HIP mode 2018-08-27 23:04:08 +00:00
Mehdi Goli
7ec8b40ad9 Collapsed revision
* Separating SYCL math function.
* Converting function overload to function specialisation.
* Applying the suggested design.
2018-08-28 14:20:48 +01:00
Christoph Hertzberg
20ba2eee6d gcc thinks this may not be initialized 2018-08-28 18:33:24 +02:00
Christoph Hertzberg
73ca600bca Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
Christoph Hertzberg
ef4d79fed8 Disable/ReenableStupidWarnings did not work properly when included recursively 2018-08-28 18:26:22 +02:00
Gael Guennebaud
befaf83f5f bug #1590: fix collision with some system headers defining the macro FP32 2018-08-28 13:21:28 +02:00
Christoph Hertzberg
42f3ee4fb8 Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop
Workaround: Don't include "DisableStupidWarnings.h" before including other main-headers
2018-08-28 11:44:15 +02:00
Eugene Zhulenev
c144bb355b Merge with upstream eigen/default 2018-08-27 14:34:07 -07:00
Gael Guennebaud
5747288676 Disable a bonus unit-test which is broken with gcc 4.7 2018-08-27 13:07:34 +02:00
Gael Guennebaud
d5ed64512f bug #1573: workaround gcc 4.7 and 4.8 bug 2018-08-27 10:38:20 +02:00
Christoph Hertzberg
b1653d1599 Fix some trivial C++11 vs C++03 compatibility warnings 2018-08-25 12:21:00 +02:00
Christoph Hertzberg
42123ff38b Make unit test C++03 compatible 2018-08-25 11:53:28 +02:00
Christoph Hertzberg
4b1ad086b5 Fix shadow warnings in doc-snippets 2018-08-25 10:07:17 +02:00
Christoph Hertzberg
117bc5d505 Fix some shadow warnings 2018-08-25 09:06:08 +02:00
Christoph Hertzberg
f155e97adb Previous fix broke compilation for clang 2018-08-25 00:10:46 +02:00
Christoph Hertzberg
209b4972ec Fix conversion warning 2018-08-25 00:02:46 +02:00
Christoph Hertzberg
495f6c3c3a Fix missing-braces warnings 2018-08-24 23:56:13 +02:00
Christoph Hertzberg
5aaedbeced Fixed more sign-compare and type-limits warnings 2018-08-24 23:54:12 +02:00
Christoph Hertzberg
8295f02b36 Hide "maybe uninitialized" warning on gcc 2018-08-24 23:22:20 +02:00
Christoph Hertzberg
f7675b826b Fix several integer conversion and sign-compare warnings 2018-08-24 22:58:55 +02:00
Christoph Hertzberg
949b0ad9cb Merged in rmlarsen/eigen3 (pull request PR-468)
Add support for emulating thread local.
2018-08-24 17:29:03 +00:00
Rasmus Munk Larsen
744e2fe0de Address comments about EIGEN_THREAD_LOCAL. 2018-08-24 10:24:54 -07:00
Christoph Hertzberg
ad4a08fb68 Use Intel cast intrinsics, since MSVC does not allow direct casting.
Reported by David Winkler.
2018-08-24 19:04:33 +02:00
Rasmus Munk Larsen
8d9bc5cc02 Fix g++ compilation. 2018-08-23 13:06:39 -07:00
Rasmus Munk Larsen
e9f9d70611 Don't rely on __has_feature for g++.
Don't use __thread.
Only use thread_local for gcc 4.8 or newer.
2018-08-23 12:59:46 -07:00
Rasmus Munk Larsen
668690978f Pad PerThread when we emulate thread_local to prevent false sharing. 2018-08-23 12:54:33 -07:00
Rasmus Munk Larsen
6cedc5a9b3 rename mu. 2018-08-23 12:11:58 -07:00
Rasmus Munk Larsen
6e0464004a Store std::unique_ptr instead of raw pointers in per_thread_map_. 2018-08-23 12:10:08 -07:00
Rasmus Munk Larsen
e51d9e473a Protect #undef max with #ifdef max. 2018-08-23 11:42:05 -07:00
Rasmus Munk Larsen
d35880ed91 merge 2018-08-23 11:36:49 -07:00
Christoph Hertzberg
a709c8efb4 Replace pointers by values or unique_ptr for better leak-safety 2018-08-23 19:41:59 +02:00
Christoph Hertzberg
39335cf51e Make MaxSizeVector leak-safe 2018-08-23 19:37:56 +02:00
Benoit Steiner
ff8e0ecc2f Updated one more line of code to avoid making the test dependent on cxx11 features. 2018-08-17 15:15:52 -07:00
Benoit Steiner
43d9dd9b28 Removed more dependencies on cxx11. 2018-08-17 08:49:32 -07:00
Gael Guennebaud
f76c802973 Add missing empty line 2018-08-17 17:16:12 +02:00
Christoph Hertzberg
41f1cc67b8 Assertion depended on a not yet initialized value 2018-08-17 16:42:53 +02:00
Christoph Hertzberg
4713465eef Silence double-promotion warning 2018-08-17 16:39:43 +02:00
Christoph Hertzberg
595cae9b09 Silence logical-op-parentheses warning 2018-08-17 16:30:32 +02:00
Christoph Hertzberg
c9b25fbefa Silence unused parameter warning 2018-08-17 16:28:28 +02:00
Christoph Hertzberg
dbdeceabdd Silence double-promotion warning (when converting double to complex<long double>) 2018-08-17 16:26:11 +02:00
Benoit Steiner
19df4d5752 Merged in codeplaysoftware/eigen-upstream-pure/Pointer_type_creation (pull request PR-461)
Creating a pointer type in TensorCustomOp.h
2018-08-16 18:28:33 +00:00
Benoit Steiner
f641cf1253 Adding missing at method in Eigen::array 2018-08-16 11:24:37 -07:00
Benoit Steiner
ede580ccda Avoid using the auto keyword to make the tensor block access test more portable 2018-08-16 10:49:47 -07:00
Benoit Steiner
e23c8c294e Use actual types instead of the auto keyword to make the code more portable 2018-08-16 10:41:01 -07:00
Mehdi Goli
80f1a76dec Removing the noise. 2018-08-16 13:33:24 +01:00
Mehdi Goli
d0b01ebbf6 Reverting the unintended delete from the code. 2018-08-16 13:21:36 +01:00
Mehdi Goli
161dcbae9b Using PointerType struct and specializing it per device for TensorCustomOp.h 2018-08-16 00:07:02 +01:00
Sameer Agarwal
f197c3f55b Removed an used variable (PacketSize) from TensorExecutor 2018-08-15 11:24:57 -07:00
Benoit Steiner
4181556907 Fixed the tensor contraction code. 2018-08-15 09:34:47 -07:00
Benoit Steiner
b6f96cf7dd Removed dependencies on cxx11 language features from the tensor_block_access test 2018-08-15 08:54:31 -07:00
Benoit Steiner
fbb834144d Fixed more compilation errors 2018-08-15 08:52:58 -07:00
Benoit Steiner
6bb3f1b43e Made the tensor_block_access test compile again 2018-08-14 14:26:59 -07:00
Benoit Steiner
43ec0082a6 Made the kronecker_product test compile again 2018-08-14 14:08:36 -07:00
Benoit Steiner
ab3f481141 Cleaned up the code and make it compile with more compilers 2018-08-14 14:05:46 -07:00
Rasmus Munk Larsen
fa0bcbf230 merge 2018-08-14 12:18:31 -07:00
Rasmus Munk Larsen
15d4f515e2 Use plain_assert in destructors to avoid throwing in CXX11 tests where main.h overwrites eigen_assert with a throwing version. 2018-08-14 12:17:46 -07:00
Rasmus Munk Larsen
aebdb06424 Fix a few compiler warnings in CXX11 tests. 2018-08-14 12:06:39 -07:00
Rasmus Munk Larsen
2a98bd9c8e Merged eigen/eigen into default 2018-08-14 12:02:09 -07:00
Benoit Steiner
59bba77ead Fixed compilation errors with gcc 4.7 and 4.8 2018-08-14 10:54:48 -07:00
Mehdi Goli
a97aaa2bcf Merge with upstream. 2018-08-14 17:49:29 +01:00
Mehdi Goli
8ba799805b Merge with upstream 2018-08-14 09:43:45 +01:00
Rasmus Munk Larsen
6d6e7b7027 merge 2018-08-13 15:34:50 -07:00
Rasmus Munk Larsen
9bb75d8d31 Add Barrier.h. 2018-08-13 15:34:03 -07:00
Rasmus Munk Larsen
2e1adc0324 Merged eigen/eigen into default 2018-08-13 15:32:00 -07:00
Rasmus Munk Larsen
8278ae6313 Add support for thread local support on platforms that do not support it through emulation using a hash map. 2018-08-13 15:31:23 -07:00
Benoit Steiner
501be70b27 Code cleanup 2018-08-13 15:16:40 -07:00
Benoit Steiner
3d3711f22f Fixed compilation errors. 2018-08-13 15:16:06 -07:00
Gael Guennebaud
3ec60215df Merged in rmlarsen/eigen2 (pull request PR-466)
Move sigmoid functor to core and rename it to 'logistic'.
2018-08-13 21:28:20 +00:00
Rasmus Munk Larsen
0f1b2e08a5 Call logistic functor from Tensor::sigmoid. 2018-08-13 11:52:58 -07:00
Rasmus Munk Larsen
d6e283ba96 sigmoid -> logistic 2018-08-13 11:14:50 -07:00
Benoit Steiner
26239ee580 Use NULL instead of nullptr to avoid adding a cxx11 requirement. 2018-08-13 11:05:51 -07:00
Benoit Steiner
3810ec228f Don't use the auto keyword since it's not always supported properly. 2018-08-13 10:46:09 -07:00
Benoit Steiner
e6d5be811d Fixed syntax of nested template chevrons to make it compatible with c++98 mode. 2018-08-13 10:29:21 -07:00
Mehdi Goli
1aa86aad14 Merge with upstream. 2018-08-13 15:40:31 +01:00
Eugene Zhulenev
35d90e8960 Fix BlockAccess enum in CwiseUnaryOp evaluator 2018-08-10 17:37:58 -07:00
Eugene Zhulenev
855b68896b Merge with eigen/default 2018-08-10 17:18:42 -07:00
Eugene Zhulenev
f2209d06e4 Add block evaluationto CwiseUnaryOp and add PreferBlockAccess enum to all evaluators 2018-08-10 16:53:36 -07:00
Benoit Steiner
c8ea398675 Avoided language features that are only available in cxx11 mode. 2018-08-10 13:02:41 -07:00
Benoit Steiner
4be4286224 Made the code compile with gcc 5.4. 2018-08-10 11:32:58 -07:00
Justin Carpentier
eabc7a4031 PR 465: Fix issue in RowMajor assignment in plain_matrix_type_row_major::type
The type should be RowMajor
2018-08-10 14:30:06 +02:00
Rasmus Munk Larsen
c49e93440f SuiteSparse defines the macro SuiteSparse_long to control what type is used for 64-bit integers. The default value of this macro is long on non-MSVC platforms and __int64 on MSVC. CholmodSupport defaults to using long for the long variants of CHOLMOD functions. This creates problems when SuiteSparse_long is different from long. So the correct thing to do here is
to use SuiteSparse_long as the type instead of long.
2018-08-13 15:53:31 -07:00
Mehdi Goli
3a2e1b1fc6 Merge with upstream. 2018-08-10 12:28:38 +01:00
Rasmus Munk Larsen
bfc5091dd5 Cast to diagonalSize to RealScalar instead Scalar. 2018-08-09 14:46:17 -07:00
Rasmus Munk Larsen
8603d80029 Cast diagonalSize() to Scalar before multiplication. Without this, automatic differentiation in Ceres breaks because Scalar is a custom type that does not support multiplication by Index. 2018-08-09 11:09:10 -07:00
Eugene Zhulenev
cfaedb38cd Fix bug in a test + compilation errors 2018-08-09 09:44:07 -07:00
Mehdi Goli
ea8fa5e86f Merge with upstream 2018-08-09 14:07:56 +01:00
Mehdi Goli
8c083bfd0e Properly fixing the PointerType for TensorCustomOp.h, as the output type here should be based on CoeffReturnType, not the Scalar type. Therefore, similar to the reduction and evalTo functions, it should have its own MakePointer class. In this case, for other devices the type defaults to CoeffReturnType and no changes are required in users' code. However, in SYCL, on the device, we can reconstruct the device type. 2018-08-09 13:57:43 +01:00
Alexey Frunze
050bcf6126 bug #1584: Improve random (avoid undefined behavior). 2018-08-08 20:19:32 -07:00
Eugene Zhulenev
1c8b9e10a7 Merged with upstream eigen 2018-08-08 16:57:58 -07:00
Benoit Steiner
131ed1191f Merged in codeplaysoftware/eigen-upstream-pure/Fixing_compiler_warning (pull request PR-462)
Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation.
2018-08-08 18:14:15 +00:00
Benoit Steiner
1285c080b3 Merged in codeplaysoftware/eigen-upstream-pure/disabling_assert_in_sycl (pull request PR-459)
Disabling assert inside SYCL kernel.
2018-08-08 18:12:42 +00:00
Benoit Steiner
c4b2845be9 Merged in rmlarsen/eigen3 (pull request PR-458)
Fix init order.
2018-08-08 18:11:49 +00:00
Benoit Steiner
7124172b83 Merged in codeplaysoftware/eigen-upstream-pure/EIGEN_UNROLL_LOOP (pull request PR-460)
Adding EIGEN_UNROLL_LOOP macro.
2018-08-08 18:10:54 +00:00
Mehdi Goli
532a0be05c Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation. 2018-08-08 12:12:26 +01:00
Mehdi Goli
67711eaa31 Fixing typo. 2018-08-08 11:38:10 +01:00
Mehdi Goli
3055e3a7c2 Creating a pointer type in TensorCustomOp.h 2018-08-08 11:19:02 +01:00
Mehdi Goli
22031ab59a Adding EIGEN_UNROLL_LOOP macro. 2018-08-08 11:07:27 +01:00
Mehdi Goli
908b906d79 Disabling assert inside SYCL kernel. 2018-08-08 10:01:10 +01:00
Rasmus Munk Larsen
693fb1d41e Fix init order. 2018-08-07 17:18:51 -07:00
Benoit Steiner
10d286f55b Silenced a couple of compilation warnings. 2018-08-06 16:00:29 -07:00
Benoit Steiner
d011d05fd6 Fixed compilation errors. 2018-08-06 13:40:51 -07:00
Rasmus Munk Larsen
36e7e7dd8f Forward declare NoOpOutputKernel as struct rather than class to be consistent with implementation. 2018-08-06 13:16:32 -07:00
Rasmus Munk Larsen
fa68342ef8 Move sigmoid functor to core. 2018-08-03 17:31:23 -07:00
Gael Guennebaud
09c81ac033 bug #1451: fix numeric_limits<AutoDiffScalar<Der>> with a reference as derivative type 2018-08-04 00:17:37 +02:00
luz.paz"
43fd42a33b Fix doxy and misc. typos
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
---
 Eigen/src/Core/ProductEvaluators.h |  4 ++--
 Eigen/src/Core/arch/GPU/Half.h     |  2 +-
 Eigen/src/Core/util/Memory.h       |  2 +-
 Eigen/src/Geometry/Hyperplane.h    |  2 +-
 Eigen/src/Geometry/Transform.h     |  2 +-
 Eigen/src/Geometry/Translation.h   | 12 ++++++------
 doc/PreprocessorDirectives.dox     |  2 +-
 doc/TutorialGeometry.dox           |  2 +-
 test/boostmultiprec.cpp            |  2 +-
 test/triangular.cpp                |  2 +-
 10 files changed, 16 insertions(+), 16 deletions(-)
2018-08-01 21:34:47 -04:00
Jean-Christophe Fillion-Robin
2cbd9dd498 [PATCH] cmake: Support source include with add_subdirectory and
find_package use
This commit allows the sources of the project to be included in a parent
project's CMakeLists.txt and supports use of "find_package(Eigen3 CONFIG REQUIRED)".

Here is an example for testing the changes. It is not particularly
useful in itself. This change will support one of the scenarios for
creating a custom 3D Slicer application bundling associated plugins.

/tmp/eigen-git-mirror  # Eigen sources

/tmp/test/CMakeLists.txt:

  cmake_minimum_required(VERSION 3.12)
  project(test)
  add_subdirectory("/tmp/eigen-git-mirror" "eigen-git-mirror")
  find_package(Eigen3 CONFIG REQUIRED)

and configuring it using:

  mkdir /tmp/test-build && cd $_
  cmake \
    -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY:BOOL=1 \
    -DEigen3_DIR:PATH=/tmp/test-build/eigen-git-mirror \
    /tmp/test

Co-authored-by: Pablo Hernandez <pablo.hernandez@kitware.com>
---
 CMakeLists.txt              | 1 +
 cmake/Eigen3Config.cmake.in | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
2018-09-07 15:50:19 -04:00
Christoph Hertzberg
a80a290079 Fix 'template argument uses local type'-warnings (when compiled in C++03 mode) 2018-09-10 18:57:28 +02:00
Jiandong Ruan
6dcd2642aa bug #1526 - CUDA compilation fails on CUDA 9.x SDK when arch is set to compute_60 and/or above 2018-09-08 12:05:33 -07:00
Christoph Hertzberg
edfb7962fd Use static const int instead of enum to avoid numerous local-type-template-args warnings in C++03 mode 2018-09-07 14:08:39 +02:00
Alexey Frunze
ec38f07b79 bug #1595: Don't use C++11's std::isnan() in MIPS/MSA packet math.
This removes reliance on C++11 and improves generated code.
2018-09-06 15:40:09 -07:00
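A hedged sketch of the C++98-friendly NaN test such a change relies on (the helper name is illustrative, not the actual MSA packet-math code):

  // NaN is the only value that compares unequal to itself, so no C++11
  // std::isnan overload is needed (assumes IEEE semantics, i.e. no -ffast-math).
  static inline bool is_nan_portable(float x) { return x != x; }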
Eugene Zhulenev
1b0373ae10 Replace all using declarations with typedefs in Tensor ops 2018-08-01 15:55:46 -07:00
Rasmus Munk Larsen
7f8b53fd0e bug #1580: Fix cuda clang build. STL is not supported, so std::equal_to and std::not_equal_to break compilation.
Update the definition of EIGEN_CONSTEXPR_ARE_DEVICE_FUNC to exclude clang.
See also PR 450.
2018-08-01 12:36:24 -07:00
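A hedged sketch of the kind of device-safe functor such a fix implies (the functor name is hypothetical; EIGEN_DEVICE_FUNC is Eigen's existing host/device annotation macro):

  #include <Eigen/Core>
  template <typename T>
  struct device_equal_to {
    // Usable inside device code, unlike std::equal_to under clang-cuda.
    EIGEN_DEVICE_FUNC bool operator()(const T& a, const T& b) const { return a == b; }
  };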
Rasmus Munk Larsen
bcb29f890c Fix initialization order. 2018-08-03 10:18:53 -07:00
Benoit Steiner
cf17794ef4 Merged in codeplaysoftware/eigen-upstream-pure/SYCL-required-changes (pull request PR-454)
SYCL required changes
2018-08-03 16:17:30 +00:00
Mehdi Goli
3074b1ff9e Fixing the compilation error. 2018-08-03 17:13:44 +01:00
Mehdi Goli
225fa112aa Merge with upstream. 2018-08-03 17:04:08 +01:00
Mehdi Goli
01358300d5 Creating separate SYCL required PR for uncontroversial files. 2018-08-03 16:59:15 +01:00
Gustavo Lima Chaves
2bf1cc8cf7 Fix 256 bit packet size assumptions in unit tests.
Like in change 2606abed53, we have hit the threshold again. With
AVX512 builds we would never have Vector8f packets aligned at 64
bytes (the new value of EIGEN_MAX_ALIGN_BYTES after change 405859f18d,
for AVX512-enabled builds).

This makes test/dynalloc.cpp pass for those builds.
2018-08-02 15:55:36 -07:00
Benoit Steiner
dd5875e30d Merged in codeplaysoftware/eigen-upstream-pure/constructor_error_clang (pull request PR-451)
Fixing ambiguous constructor error for Clang compiler.
2018-08-02 20:46:03 +00:00
Benoit Steiner
113d8343d6 Merged in codeplaysoftware/eigen-upstream-pure/Fixing_visual_studio_error_For_tensor_trace (pull request PR-452)
Fixing compilation error for cxx11_tensor_trace.cpp on Microsoft Visual Studio.
2018-08-02 17:54:26 +00:00
Mehdi Goli
516d2621b9 fixing compilation error for cxx11_tensor_trace.cpp error on Microsoft Visual Studio. 2018-08-02 14:30:48 +01:00
Mehdi Goli
40d6d020a0 Fixing ambiguous constructor error for Clang compiler. 2018-08-02 13:34:53 +01:00
Gael Guennebaud
62169419ab Fix two regressions introduced in previous merges: bad usage of EIGEN_HAS_VARIADIC_TEMPLATES and linking issue. 2018-08-01 23:35:34 +02:00
Eugene Zhulenev
64abdf1d7e Fix typo + get rid of redundant member variables for block sizes 2018-08-01 12:35:19 -07:00
Benoit Steiner
93b9e36e10 Merged in paultucker/eigen (pull request PR-431)
Optional ThreadPoolDevice allocator

Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-08-01 19:14:34 +00:00
Eugene Zhulenev
385b3ff12f Merged latest changes from upstream/eigen 2018-08-01 11:59:04 -07:00
Benoit Steiner
17221115c9 Merged in codeplaysoftware/eigen-upstream-pure/eigen_variadic_assert (pull request PR-447)
Adding variadic version of assert which can take a parameter pack as its input.
2018-08-01 16:41:54 +00:00
Benoit Steiner
0360c36170 Merged in codeplaysoftware/eigen-upstream-pure/separating_internal_memory_allocation (pull request PR-446)
Distinguishing internal memory allocation/deallocation from explicit user memory allocation/deallocation.
2018-08-01 16:13:15 +00:00
Mehdi Goli
c6a5c70712 Correcting the position of allocate_temp/deallocate_temp in TensorDeviceGpu.h 2018-08-01 16:56:26 +01:00
Benoit Steiner
9ca1c09131 Merged in codeplaysoftware/eigen-upstream-pure/new-arch-SYCL-headers (pull request PR-448)
Adding new arch/SYCL headers, used for SYCL vectorization.
2018-08-01 15:50:54 +00:00
Benoit Steiner
45f75f1ace Merged in codeplaysoftware/eigen-upstream-pure/using_PacketType_class (pull request PR-449)
Enabling per device specialisation of packetSize.
2018-08-01 15:43:03 +00:00
Benoit Steiner
90e632fd66 Merged in codeplaysoftware/eigen-upstream-pure/EIGEN_STRONG_INLINE_MACRO (pull request PR-445)
Replacing ad-hoc inline keyword with EIGEN_STRONG_INLINE MACRO.
2018-08-01 15:41:06 +00:00
Mehdi Goli
af96018b49 Using the suggested modification. 2018-08-01 16:04:44 +01:00
Mehdi Goli
b512a9536f Enabling per device specialisation of packetsize. 2018-08-01 13:39:13 +01:00
Mehdi Goli
c84509d7cc Adding new arch/SYCL headers, used for SYCL vectorization. 2018-08-01 12:40:54 +01:00
Mehdi Goli
3a197a60e6 variadic version of assert which can take a parameter pack as its input. 2018-08-01 12:19:14 +01:00
Mehdi Goli
d7a8414848 Distinguishing internal memory allocation/deallocation from explicit user memory allocation/deallocation. 2018-08-01 11:56:30 +01:00
Mehdi Goli
9e219bb3d3 Converting ad-hoc inline keyword to EIGEN_STRONG_INLINE MACRO. 2018-08-01 10:47:49 +01:00
Eugene Zhulenev
83c0a16baf Add block evaluation support to TensorOps 2018-07-31 15:56:31 -07:00
Benoit Steiner
edf46bd7a2 Merged in yuefengz/eigen (pull request PR-370)
Use device's allocate function instead of internal::aligned_malloc.
2018-07-31 22:38:28 +00:00
Paul Tucker
385f7b8d0c Change getAllocator() to allocator() in ThreadPoolDevice. 2018-07-31 13:52:18 -07:00
Mark D Ryan
6f5b126e6d Fix tensor contraction for AVX512 machines
This patch modifies the TensorContraction class to ensure that the kc_ field is
always a multiple of the packet_size, if the packet_size is > 8.  Without this
change spatial convolutions in Tensorflow do not work properly as the code that
re-arranges the input matrices can assert if kc_ is not a multiple of the
packet_size.  This leads to a unit test failure,
//tensorflow/python/kernel_tests:conv_ops_test, on AVX512 builds of tensorflow.
2018-07-31 09:33:37 +01:00
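A hedged, self-contained sketch of the constraint described above (an illustrative helper, not the actual TensorContraction code):

  // Keep the blocking depth kc a multiple of the SIMD packet size so the
  // packing code never has to handle a partial packet.
  inline long round_down_to_packet_multiple(long kc, long packet_size) {
    return (packet_size > 8) ? (kc / packet_size) * packet_size : kc;
  }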
Gael Guennebaud
d6568425f8 Close branch tiling_3. 2018-07-31 08:13:43 +00:00
Gael Guennebaud
678a0dcb12 Merged in ezhulenev/eigen/tiling_3 (pull request PR-438)
Tiled tensor executor
2018-07-31 08:13:00 +00:00
Gael Guennebaud
679eece876 Speedup trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437. 2018-07-31 10:10:14 +02:00
Gael Guennebaud
723856dec1 bug #1577: fix msvc compilation of unit test, msvc defines ptrdiff_t as long long 2018-07-30 14:52:15 +02:00
Eugene Zhulenev
966c2a7bb6 Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible 2018-07-27 12:45:17 -07:00
Eugene Zhulenev
6913221c43 Add tiled evaluation support to TensorExecutor 2018-07-25 13:51:10 -07:00
Alexey Frunze
7b91c11207 bug #1578: Improve prefetching in matrix multiplication on MIPS. 2018-07-24 18:36:44 -07:00
Patrik Huber
f5cace5e9f Fix two small typos in the documentation 2018-07-26 19:55:19 +00:00
Gael Guennebaud
34539c4af4 Merged in rmlarsen/eigen1 (pull request PR-441)
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-30 11:26:24 +00:00
Mark D Ryan
bc615e4585 Re-enable FMA for fast sqrt functions 2018-07-30 13:21:00 +02:00
Mark D Ryan
96b030a8e4 Re-enable FMA for fast sqrt functions
This commit re-enables the use of FMA for the FAST sqrt functions.
Doing so improves the performance of both algorithms.  The float32
version is now 88% the speed of the original function, while the
double version is 90%.
2018-07-30 10:19:51 +01:00
Rasmus Munk Larsen
e478532625 Reduce the number of template specializations of classes related to tensor contraction to reduce binary size. 2018-07-27 12:36:34 -07:00
Rasmus Munk Larsen
2ebcb911b2 Add pcast packet op for NEON. 2018-07-26 14:28:48 -07:00
Christoph Hertzberg
397b0547e1 Disable static assertions only when necessary and disable double-promotion warnings in that case as well 2018-07-26 00:01:24 +02:00
Christoph Hertzberg
5e79402b4a fix warnings for doc-eigen-prerequisites 2018-07-24 21:59:15 +02:00
Christoph Hertzberg
5f79b7f9a9 Removed several shadowing types and use global Index typedef everywhere 2018-07-25 21:47:45 +02:00
Christoph Hertzberg
44ee201337 Rename variable which shadows class name 2018-07-25 20:26:15 +02:00
Gustavo Lima Chaves
705f66a9ca Account for missing change on commit "Remove SimpleThreadPool and..."
"... always use {NonBlocking}ThreadPool". It seems the non-blocking
implementation was me the default/only one, but a reference to the old
name was left unmodified. Fix that.
2018-07-23 16:29:09 -07:00
Christoph Hertzberg
fd4fe7cbc5 Fixed issue which made documentation not getting built anymore 2018-07-24 22:56:15 +02:00
Christoph Hertzberg
636126ef40 Allow to filter out build-error messages 2018-07-24 20:12:49 +02:00
Eugene Zhulenev
d55efa6f0f TensorBlockIO 2018-07-23 15:50:55 -07:00
Eugene Zhulenev
34a75c3c5c Initial support of TensorBlock 2018-07-20 17:37:20 -07:00
Gael Guennebaud
2c2de9da7d Merged in glchaves/eigen (pull request PR-433)
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded  block
2018-07-23 19:38:55 +00:00
Gael Guennebaud
4ca3e48f42 fix typo 2018-07-23 16:51:57 +02:00
Gael Guennebaud
c747cde69a Add lastN shortcuts to seq/seqN. 2018-07-23 16:20:25 +02:00
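A hedged usage sketch of the new shortcuts (assuming Eigen's placeholder-based indexing API; the vector contents are illustrative):

  #include <Eigen/Dense>
  double lastN_demo() {
    Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(10, 0, 9);
    // lastN(3) selects the last 3 coefficients; seqN(2, 4) selects 4 starting at index 2.
    return v(Eigen::lastN(3)).sum() + v(Eigen::seqN(2, 4)).sum();
  }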
Gustavo Lima Chaves
02eaaacbc5 Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded
block

Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail
right away without this, as this test seems to rely on those language
features. The skip under compilation with MSVC was kept.
2018-07-20 16:08:40 -07:00
Eugene Zhulenev
2bf864f1eb Disable type traits for stdlibc++ <= 4.9.3 2018-07-20 10:11:44 -07:00
Gael Guennebaud
de70671937 Oopps, EIGEN_COMP_MSVC is not available before including Eigen. 2018-07-20 17:51:17 +02:00
Gael Guennebaud
56a750b6cc Disable optimization for sparse_product unit test with MSVC 2013, otherwise it takes several hours to build. 2018-07-20 08:36:38 -07:00
Paul Tucker
d4afccde5a Add test coverage for ThreadPoolDevice optional allocator. 2018-07-19 17:43:44 -07:00
Eugene Zhulenev
c58b874727 PR430: Convert count to the reducer type in MeanReducer
Without explicit conversion Tensorflow fails to compile, pset1 template deduction fails.

cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this)
  ->Eigen::internal::MeanReducer<Eigen::half>::packetCount_'
(type 'const DenseIndex {aka const long int}')
to type 'const type& {aka const Eigen::half&}'
     return pdiv(vaccum, pset1<Packet>(packetCount_));
Honestly I'm not sure why it works in Eigen tests, because the Eigen::half constructor is explicit, nor why it stopped working in TF; I didn't find any relevant changes since the previous Eigen upgrade.

static_cast<T>(packetCount_) - breaks cxx11_tensor_reductions test for Eigen::half, also quite surprising.
2018-07-19 17:37:03 -07:00
Gael Guennebaud
2424e3b7ac Pass by const ref. 2018-07-19 18:48:19 +02:00
Gael Guennebaud
509a5fa77f Fix IsRelocatable without C++11 2018-07-19 18:47:38 +02:00
Gael Guennebaud
2ca2592009 Fix determination of EIGEN_HAS_TYPE_TRAITS 2018-07-19 18:47:18 +02:00
Gael Guennebaud
5e5987996f Fix stupid error in Quaternion move ctor 2018-07-19 18:33:53 +02:00
Paul Tucker
4e9848fa86 Actually add optional Allocator* arg to ThreadPoolDevice(). 2018-07-16 17:53:36 -07:00
Paul Tucker
b3e7c9132d Add optional Allocator argument to ThreadPoolDevice constructor.
When supplied, this allocator will be used in place of
internal::aligned_malloc. This permits, e.g., use of a NUMA-node-specific
allocator where the thread pool is also restricted to a single NUMA node.
2018-07-16 17:26:05 -07:00
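A hedged usage sketch, assuming the optional allocator argument described above and Eigen's Allocator interface (allocate/deallocate); MyNumaAllocator is hypothetical:

  #define EIGEN_USE_THREADS
  #include <unsupported/Eigen/CXX11/Tensor>

  struct MyNumaAllocator : Eigen::Allocator {
    // A real implementation would allocate on a specific NUMA node.
    void* allocate(size_t num_bytes) const override { return Eigen::internal::aligned_malloc(num_bytes); }
    void deallocate(void* buffer) const override { Eigen::internal::aligned_free(buffer); }
  };

  void device_demo() {
    Eigen::ThreadPool pool(8);
    MyNumaAllocator alloc;
    Eigen::ThreadPoolDevice device(&pool, /*num_cores=*/8, &alloc);  // allocations routed through 'alloc'
  }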
Gael Guennebaud
40797dbea3 bug #1572: use c++11 atomic instead of volatile if c++11 is available, and disable multi-threaded GEMM on non-x86 without c++11. 2018-07-17 00:11:20 +02:00
Gael Guennebaud
add5757488 Simplify handling of non-split tests and include split_test_helper.h instead of re-generating it. This also allows us to modify it without breaking existing build folders. 2018-07-16 18:55:40 +02:00
Gael Guennebaud
901c7d31f0 Fix usage of EIGEN_SPLIT_LARGE_TESTS=ON: some unit tests, such as indexed_view have to be split unconditionally. 2018-07-16 18:35:05 +02:00
Gael Guennebaud
f2b52f9946 Add the cmake option "EIGEN_DASHBOARD_BUILD_TARGET" to control the build target in dashboard mode (e.g., ctest -D Experimental) 2018-07-16 17:59:30 +02:00
Gael Guennebaud
23d82c1ac5 Merged in rmlarsen/eigen2 (pull request PR-422)
Optimize the case where broadcasting is a no-op.
2018-07-14 11:42:58 +00:00
Gael Guennebaud
a87cff20df Fix GeneralizedEigenSolver when requesting for eigenvalues only. 2018-07-14 09:38:49 +02:00
Rasmus Munk Larsen
3a9cf4e290 Get rid of alias for m_broadcast. 2018-07-13 16:24:48 -07:00
Rasmus Munk Larsen
4222550e17 Optimize the case where broadcasting is a no-op. 2018-07-13 16:12:38 -07:00
Rasmus Munk Larsen
4a3952fd55 Relax the condition to not only work on Android. 2018-07-13 11:24:07 -07:00
Rasmus Munk Larsen
02a9443db9 Clang produces incorrect Thumb2 assembler when using alloca.
Don't define EIGEN_ALLOCA when generating Thumb with clang.
2018-07-13 11:03:04 -07:00
Gael Guennebaud
20991c3203 bug #1571: fix is_convertible<from,to> with "from" a reference. 2018-07-13 17:47:28 +02:00
Gael Guennebaud
1920129d71 Remove clang warning 2018-07-13 16:05:35 +02:00
Gael Guennebaud
195c9c054b Print more debug info in gpu_basic 2018-07-13 16:05:07 +02:00
Gael Guennebaud
06eb24cf4d Introduce gpu_assert for assertion in device-code, and disable them with clang-cuda. 2018-07-13 16:04:27 +02:00
Gael Guennebaud
5fd03ddbfb Make EIGEN_TEST_CUDA_CLANG more friendly with OSX 2018-07-13 16:03:14 +02:00
Gael Guennebaud
86d9c0255c Forward declaring std::array does not work with all std libs, so let's just include <array> 2018-07-13 13:06:44 +02:00
David Hyde
d908afe35f bug #1558: fix a corner case in MINRES when both v_new and w_new vanish. 2018-07-08 22:06:38 -07:00
Eugene Zhulenev
6e654f3379 Reduce number of allocations in TensorContractionThreadPool. 2018-07-16 14:26:39 -07:00
Gael Guennebaud
7ccb623746 bug #1569: fix Tensor<half>::mean() on AVX with respective unit test. 2018-07-19 13:15:40 +02:00
Alexey Frunze
1f523e7304 Add MIPS changes missing from previous merge. 2018-07-18 12:27:50 -07:00
Eugene Zhulenev
e3c2d61739 Assert that no output kernel is defined for GPU contraction 2018-07-18 14:34:22 -07:00
Eugene Zhulenev
086ded5c85 Disable type traits for GCC < 5.1.0 2018-07-18 16:32:55 -07:00
Eugene Zhulenev
79d4129cce Specify default output kernel for TensorContractionOp 2018-07-18 14:21:01 -07:00
Gael Guennebaud
6e5a3b898f Add regression for bugs #1573 and #1575 2018-07-18 23:34:34 +02:00
Gael Guennebaud
863580fe88 bug #1432: fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable 2018-07-18 23:33:07 +02:00
Gael Guennebaud
053ed97c72 Generalize ScalarWithExceptions to a full non-copyable and throwing scalar type to be used in other unit tests. 2018-07-18 23:27:37 +02:00
Gael Guennebaud
a503fc8725 bug #1575: fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted. 2018-07-18 23:26:13 +02:00
Gael Guennebaud
308725c3c9 More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA 2018-07-18 13:51:36 +02:00
Alexey Frunze
3875fb05aa Add support for MIPS SIMD (MSA) 2018-07-06 16:04:30 -07:00
Gael Guennebaud
44ea5f7623 Add unit test for -Tensor<complex> on GPU 2018-07-12 17:19:38 +02:00
Gael Guennebaud
12e1ebb68b Remove local Index typedef from unit-tests 2018-07-12 17:16:40 +02:00
Gael Guennebaud
63185be8b2 Disable eigenvalues test for clang-cuda 2018-07-12 17:03:14 +02:00
Gael Guennebaud
bec013b2c9 fix unused warning 2018-07-12 17:02:18 +02:00
Gael Guennebaud
5c73c9223a Fix shadowing typedefs 2018-07-12 17:01:07 +02:00
Gael Guennebaud
98728312c8 Fix compilation regarding std::array 2018-07-12 17:00:37 +02:00
Gael Guennebaud
eb3d8f68bb fix unused warning 2018-07-12 16:59:47 +02:00
Gael Guennebaud
006e18e52b Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate places (Macros.h),
and alignment/vectorization logic is now in util/ConfigureVectorization.h
2018-07-12 16:57:41 +02:00
Thales Sabino
9a6a43319f Fix cxx11_tensor_fft not building on Windows.
The type used in Eigen::DSizes needs to be at least 8 bytes long. Internally Tensor tries to convert this to an __int64 on Windows and this fails to build. On Linux, long and long long are both 8 byte integer types.
* * *
Changing from "long long" to "std::int64_t".
2018-07-12 11:20:59 +01:00
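A hedged sketch of the resulting pattern in the test (the dimension values are illustrative):

  #include <cstdint>
  #include <unsupported/Eigen/CXX11/Tensor>
  // A 64-bit index type keeps the internal conversion to __int64 on Windows well-defined.
  Eigen::DSizes<std::int64_t, 2> fft_dims(4, 8);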
Gael Guennebaud
b347eb0b1c Fix doc 2018-07-12 11:56:18 +02:00
Mark D Ryan
e79c5149bf Fix AVX512 implementations of psqrt
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb fixed the AVX2 version of this function. The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN. Fixing the issue requires adding
some additional instructions that slow down the algorithms. A
similar test to the one used in 3ed67cb0bb shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packet8d function runs at 68% of the original.
2018-06-25 05:05:02 -07:00
Yuefeng Zhou
1eff6cf8a7 Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances. 2018-02-20 16:50:05 -08:00
Gael Guennebaud
adb134d47e Fix implicit conversion from 0.0 to scalar 2018-02-16 22:26:01 +04:00
Gael Guennebaud
937ad18221 add unit test for SimplicialCholesky and Boost multiprec. 2018-02-16 22:25:11 +04:00
Julian Kent
6d451cf2b6 Add missing consts for rows and cols functions in SparseLU 2018-02-10 13:44:05 +01:00
Daniele E. Domenichelli
a12b8a8c75 FindEigen3: Set Eigen3_FOUND variable 2018-07-11 16:31:50 +02:00
Gael Guennebaud
8bdb214fd0 remove double ;; 2018-07-12 11:17:53 +02:00
Gael Guennebaud
a9060378d3 bug #1570: fix warning 2018-07-12 11:07:09 +02:00
Gael Guennebaud
6cd6551b26 Add deprecated header files for TensorFlow 2018-07-12 10:50:53 +02:00
Gael Guennebaud
da0c604078 Merged in deven-amd/eigen (pull request PR-402)
Adding support for using Eigen in HIP kernels.
2018-07-12 08:07:16 +00:00
Gael Guennebaud
a4ea611ca7 Remove useless specialization thanks to is_convertible being more robust. 2018-07-12 09:59:44 +02:00
Gael Guennebaud
8a40dda5a6 Add some basic unit-tests 2018-07-12 09:59:00 +02:00
Gael Guennebaud
8ef267ccbd spellcheck 2018-07-12 09:58:29 +02:00
Gael Guennebaud
21cf4a1a8b Make is_convertible more robust and conformant to std::is_convertible 2018-07-12 09:57:19 +02:00
Gael Guennebaud
8a5955a052 Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product. 2018-07-11 17:16:50 +02:00
Gael Guennebaud
d193cc87f4 Fix regression in 9357838f94 2018-07-11 17:09:23 +02:00
Gael Guennebaud
fb33687736 Fix double ;; 2018-07-11 17:08:30 +02:00
Deven Desai
876f392c39 Updates corresponding to the latest round of PR feedback
The major changes are

1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
    The above three changes effectively enable the Eigen "Packet" layer for the HIP platform

4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places

The change has been tested on the HIP and CUDA platforms.
2018-07-11 10:39:54 -04:00
Deven Desai
1fe0b74904 deleting hip specific files that are no longer required 2018-07-11 09:28:44 -04:00
Deven Desai
dec47a6493 renaming CUDA* to GPU* for some header files 2018-07-11 09:26:54 -04:00
Deven Desai
471cfe5ff7 renaming CUDA* to GPU* for some header files 2018-07-11 09:22:04 -04:00
Deven Desai
38807a2575 merging updates from upstream 2018-07-11 09:17:33 -04:00
Gael Guennebaud
f00d08cc0a Optimize extraction of Q in SparseQR by exploiting the structure of the identity matrix. 2018-07-11 14:01:47 +02:00
Gael Guennebaud
1625476091 Add internall::is_identity compile-time helper 2018-07-11 14:00:24 +02:00
Gael Guennebaud
fe723d6129 Fix conversion warning 2018-07-10 09:10:32 +02:00
Gael Guennebaud
9357838f94 bug #1543: improve linear indexing for general block expressions 2018-07-10 09:10:15 +02:00
Gael Guennebaud
de9e31a06d Introduce the macro ei_declare_local_nested_eval to help allocate local temporaries on the stack via alloca, and let outer-products make good use of it.
If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
2018-07-09 15:41:14 +02:00
Gael Guennebaud
6190aa5632 bug #1567: add optimized path for tensor broadcasting and 'Channel First' shape 2018-07-09 11:23:16 +02:00
Gael Guennebaud
ec323b7e66 Skip null numerators in triangular-vector-solve (as in BLAS TRSV). 2018-07-09 11:13:19 +02:00
Gael Guennebaud
359dd77ec3 Fix legitimate "declaration shadows a typedef" warning 2018-07-09 11:03:39 +02:00
Deven Desai
e2b2c61533 merging from master 2018-06-20 16:47:45 -04:00
Deven Desai
1bb6fa99a3 merging the CUDA and HIP implementation for the Tensor directory and the unit tests 2018-06-20 16:44:58 -04:00
Deven Desai
cfdabbcc8f removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories 2018-06-20 12:57:02 -04:00
Deven Desai
7e41c8f1a9 renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories 2018-06-20 12:52:30 -04:00
Deven Desai
ee73ae0a80 Merged eigen/eigen into default 2018-06-20 12:37:11 -04:00
Mark D Ryan
90a53ca6fd Fix the Packet16h version of ptranspose
The AVX512 version of ptranspose for PacketBlock<Packet16h,16> was
reordering the PacketBlock argument incorrectly. This led to errors in
the multiplication of matrices composed of 16-bit floats on AVX512
machines, if at least one of the matrices was using RowMajor order. This
error is responsible for one tensorflow unit test failure on AVX512
machines:

//tensorflow/python/kernel_tests:batch_matmul_op_test
2018-06-16 15:13:06 -07:00
Gael Guennebaud
1f54164eca Fix a few issues with Packet16h 2018-07-07 00:15:07 +02:00
Gael Guennebaud
f2dc048df9 complete implementation of Packet16h (AVX512) 2018-07-06 17:43:11 +02:00
Gael Guennebaud
a937c50208 palign is not used anymore, so let's relax the unit test 2018-07-06 17:41:52 +02:00
Gael Guennebaud
56a33ae57d test product kernel with half-floats. 2018-07-06 17:14:04 +02:00
Gael Guennebaud
f4d623ffa7 Complete Packet8h implementation and test it in packetmath unit test 2018-07-06 17:13:36 +02:00
Gael Guennebaud
a8ab6060df Add unitests for inverse and selfadjoint-eigenvalues on CUDA 2018-07-06 09:58:45 +02:00
Gael Guennebaud
b8271bb368 fix md5sum of lapack_addons 2018-06-15 14:21:29 +02:00
Deven Desai
b6cc0961b1 updates based on PR feedback
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)

1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`

After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.

2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
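A simplified sketch of the unified guards described in point 2 (not the exact Eigen code):

  #if defined(EIGEN_CUDACC) || defined(EIGEN_HIPCC)
    #define EIGEN_GPUCC
  #endif
  #if defined(EIGEN_HAS_CUDA_FP16) || defined(EIGEN_HAS_HIP_FP16)
    #define EIGEN_HAS_GPU_FP16
  #endif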
Deven Desai
ba972fb6b4 moving Half headers from CUDA dir to GPU dir, removing the HIP versions 2018-06-13 12:26:18 -04:00
Deven Desai
d1d22ef0f4 syncing this fork with upstream 2018-06-13 12:09:52 -04:00
Benoit Steiner
d3a380af4d Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403)
Derivative of the incomplete Gamma function and the sample of a Gamma random variable

Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-11 17:57:47 +00:00
Andrea Bocci
f7124b3e46 Extend CUDA support to matrix inversion and selfadjointeigensolver 2018-06-11 18:33:24 +02:00
Gael Guennebaud
0537123953 bug #1565: help MSVC generate not-too-bad ASM in reductions. 2018-07-05 09:21:26 +02:00
Gael Guennebaud
6a241bd8ee Implement custom inplace triangular product to avoid a temporary 2018-07-03 14:02:46 +02:00
Gael Guennebaud
3ae2083e23 Make is_same_dense compatible with different scalar types. 2018-07-03 13:21:43 +02:00
Gael Guennebaud
67ec37f7b0 Activate dgmres unit test 2018-07-02 12:54:14 +02:00
Gael Guennebaud
047677a08d Fix regression in changeset f05dea6b23: computeFromHessenberg can take any expression for matrixQ, not only a HouseholderSequence.
2018-07-02 12:18:25 +02:00
Gael Guennebaud
d625564936 Simplify redux_evaluator using inheritance, and properly rename parameters in reducers. 2018-07-02 11:50:41 +02:00
Gael Guennebaud
d428a199ab bug #1562: optimize evaluation of small products of the form s*A*B by rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x... 2018-07-02 11:41:09 +02:00
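A hedged illustration of what the rewrite amounts to (user code does not change; this only spells out the lazyProduct form the evaluator now produces for small fixed-size products):

  #include <Eigen/Dense>
  Eigen::Matrix3d scaled_product(const Eigen::Matrix3d& A, const Eigen::Matrix3d& B, double s) {
    // Evaluated in one pass, without a costly temporary for A*B:
    return s * A.lazyProduct(B);
  }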
Gael Guennebaud
a7b313a16c Fix unit test 2018-07-01 22:45:47 +02:00
Gael Guennebaud
0cdacf3fa4 update comment 2018-06-29 11:28:36 +02:00
Gael Guennebaud
54f6eeda90 Merged in net147/eigen (pull request PR-411)
Use std::complex constructor instead of assignment from scalar
2018-06-28 21:01:04 +00:00
Gael Guennebaud
9a81de1d35 Fix order of EIGEN_DEVICE_FUNC and returned type 2018-06-28 00:20:59 +02:00
Jonathan Liu
b7689bded9 Use std::complex constructor instead of assignment from scalar
Fixes GCC conversion to non-scalar type requested compile error when
using boost::multiprecision::cpp_dec_float_50 as scalar type.
2018-06-28 00:32:37 +10:00
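A hedged sketch of the pattern change (Scalar stands for a user-defined type such as boost::multiprecision::cpp_dec_float_50):

  #include <complex>
  template <typename Scalar>
  std::complex<Scalar> zero_complex() {
    // Constructing from two Scalars compiles for such types, whereas
    // 'std::complex<Scalar> c; c = Scalar(0);' can trigger GCC's
    // "conversion to non-scalar type requested" error.
    return std::complex<Scalar>(Scalar(0), Scalar(0));
  }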
Gael Guennebaud
f9d337780d First step towards a generic vectorised quaternion product 2018-06-25 14:26:51 +02:00
Gael Guennebaud
ee5864f72e bug #1560 fix product with a 1x1 diagonal matrix 2018-06-25 10:30:12 +02:00
Rasmus Munk Larsen
2f62cc68cd merge 2018-06-22 15:09:44 -07:00
Rasmus Munk Larsen
bda71ad394 Fix typo in pbend for AltiVec. 2018-06-22 15:04:35 -07:00
Benoit Steiner
b6ffcd22e3 Merged in rmlarsen/eigen2 (pull request PR-409)
Fix oversharding bug in parallelFor.
2018-06-21 18:34:57 +00:00
Gael Guennebaud
4cc32d80fd bug #1555: compilation fix with XLC 2018-06-21 10:28:38 +02:00
Rasmus Munk Larsen
5418154a45 Fix oversharding bug in parallelFor. 2018-06-20 17:51:48 -07:00
Gael Guennebaud
cb4c9a6a94 bug #1531: add dedicated unit testing for NumDimensions 2018-06-08 17:11:45 +02:00
Gael Guennebaud
d6813fb1c5 bug #1531: expose NumDimensions for solve and sparse expressions. 2018-06-08 16:55:10 +02:00
Gael Guennebaud
89d65bb9d6 bug #1531: expose NumDimensions for compatibility with Tensor 2018-06-08 16:50:17 +02:00
Gael Guennebaud
f05dea6b23 bug #1550: prevent avoidable memory allocation in RealSchur 2018-06-08 10:14:57 +02:00
Gael Guennebaud
7933267c67 fix prototype 2018-06-08 09:56:01 +02:00
Gael Guennebaud
f4d1461874 Fix the way matrix folder is passed to the tests. 2018-06-08 09:55:46 +02:00
Benoit Steiner
522d3ca54d Don't use std::equal_to inside cuda kernels since it's not supported. 2018-06-07 13:02:07 -07:00
Christoph Hertzberg
7d7bb91537 Missing line during manual rebase of PR-374 2018-06-07 20:30:09 +02:00
Michael Figurnov
30fa3d0454 Merge from eigen/eigen 2018-06-07 17:57:56 +01:00
Benoit Steiner
d2b0a4a59b Merged in mfigurnov/eigen/fix-bessel (pull request PR-404)
Fix compilation of special functions without C99 math.
2018-06-07 16:12:42 +00:00
Michael Figurnov
6c71c7d360 Merge from eigen/eigen. 2018-06-07 15:54:18 +01:00
Gael Guennebaud
c25034710e Fix some warnings in dox examples 2018-06-07 16:09:22 +02:00
Gael Guennebaud
37348d03ae Fix int versus Index 2018-06-07 15:56:43 +02:00
Gael Guennebaud
c723ffd763 Fix warning 2018-06-07 15:56:20 +02:00
Gael Guennebaud
af7c83b9a2 Fix warning 2018-06-07 15:45:24 +02:00
Gael Guennebaud
7fe29aceeb Fix MSVC warning C4290: C++ exception specification ignored except to indicate a function is not __declspec(nothrow) 2018-06-07 15:36:20 +02:00
Michael Figurnov
aa813d417b Fix compilation of special functions without C99 math.
The commit with Bessel functions i0e and i1e placed the ifdef/endif incorrectly,
causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not
actually require C99 math, so now they are always available.
2018-06-07 14:35:07 +01:00
Gael Guennebaud
55774b48e4 Fix short vs long 2018-06-07 15:26:25 +02:00
Christoph Hertzberg
e5f9f4768f Avoid unnecessary C++11 dependency 2018-06-07 15:03:50 +02:00
Gael Guennebaud
b3fd93207b Fix typos found using codespell 2018-06-07 14:43:02 +02:00
Michael Figurnov
5172a32849 Updated the stopping criteria in igammac_cf_impl.
Previously, when computing the derivative, it used a relative error threshold. Now it uses an absolute error threshold. The behavior for computing the value is unchanged. This makes more sense since we do not expect the derivative to often be close to zero. This change makes the derivatives about 30% faster across the board. The error for the igamma_der_a is almost unchanged, while for gamma_sample_der_alpha it is a bit worse for float32 and unchanged for float64.
2018-06-07 12:03:58 +01:00
Michael Figurnov
4bd158fa37 Derivative of the incomplete Gamma function and the sample of a Gamma random variable.
In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter
* gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter

The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.
2018-06-06 18:49:26 +01:00
Deven Desai
8fbd47052b Adding support for using Eigen in HIP kernels.
This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.

Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor)


Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
2018-06-06 10:12:58 -04:00
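A hedged sketch of an application translation unit following the description above (compiled with hipcc):

  #define EIGEN_USE_HIP   // switch Eigen to the HIP variants of its GPU headers
  #include <unsupported/Eigen/CXX11/Tensor>
  // ... device code using Eigen tensors, as with CUDA ...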
Benoit Steiner
e206f8d4a4 Merged in mfigurnov/eigen (pull request PR-400)
Exponentially scaled modified Bessel functions of order zero and one.

Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-05 17:05:21 +00:00
Penporn Koanantakool
e2ed0cf8ab Add a ThreadPoolInterface* getter for ThreadPoolDevice. 2018-06-02 12:07:49 -07:00
Gael Guennebaud
84868da904 Don't run hg on non mercurial clone 2018-05-31 21:21:57 +02:00
Michael Figurnov
f216854453 Exponentially scaled modified Bessel functions of order zero and one.
The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(|x|) i0e(x)

The code is ported from Cephes and tested against SciPy.
2018-05-31 15:34:53 +01:00
Gael Guennebaud
6af1433cb5 Doc: add aliasing to common pitfalls. 2018-05-29 22:37:47 +02:00
Katrin Leinweber
ea94543190 Hyperlink DOIs against preferred resolver 2018-05-24 18:55:40 +02:00
Gael Guennebaud
999b552c16 Search for sequential Pastix. 2018-05-29 20:49:25 +02:00
Gael Guennebaud
eef4b7bd87 Fix handling of path names containing spaces and the likes. 2018-05-29 20:49:06 +02:00
Gael Guennebaud
647b724a36 Define pcast<> for SSE types even when AVX is enabled (otherwise floats are silently reinterpreted as ints instead of being converted). 2018-05-29 20:46:46 +02:00
Gael Guennebaud
49262dfee6 Fix compilation and SSE support with PGI compiler 2018-05-29 15:09:31 +02:00
Christoph Hertzberg
750af06362 Add an option to test with external BLAS library 2018-05-22 21:04:32 +02:00
Christoph Hertzberg
d06a753d10 Make qr_fullpivoting unit test run for fixed-sized matrices 2018-05-22 20:29:17 +02:00
Gael Guennebaud
f0862b062f Fix internal::is_integral<size_t/ptrdiff_t> with MSVC 2013 and older. 2018-05-22 19:29:51 +02:00
Gael Guennebaud
36e413a534 Workaround a MSVC 2013 compilation issue with MatrixBase(Index,int) 2018-05-22 18:51:35 +02:00
Gael Guennebaud
725bd92903 fix stupid typo 2018-05-18 17:46:43 +02:00
Gael Guennebaud
a382bc9364 is_convertible<T,Index> does not seem to work well with MSVC 2013, so let's rather use __is_enum(T) for old MSVC versions 2018-05-18 17:02:27 +02:00
Gael Guennebaud
4dd767f455 add some internal checks 2018-05-18 13:59:55 +02:00
Gael Guennebaud
345c0ab450 check that all integer types are properly handled by mat(i,j) 2018-05-18 13:46:46 +02:00
Mark D Ryan
405859f18d Set EIGEN_IDEAL_MAX_ALIGN_BYTES correctly for AVX512 builds
bug #1548

The macro EIGEN_IDEAL_MAX_ALIGN_BYTES is being incorrectly set to 32
on AVX512 builds.  It should be set to 64.  In the current code it is
only set to 64 if the macro EIGEN_VECTORIZE_AVX512 is defined.  This
macro does get defined in AVX512 builds in Core, but only after Macros.h,
the file that defines EIGEN_IDEAL_MAX_ALIGN_BYTES, has been included.
This commit fixes the issue by setting EIGEN_IDEAL_MAX_ALIGN_BYTES to
64 if __AVX512F__ is defined.
2018-05-17 17:04:00 +01:00
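A simplified sketch of the fix described above (not the exact Macros.h code):

  // __AVX512F__ comes from the compiler itself, so it is already visible in
  // Macros.h, before EIGEN_VECTORIZE_AVX512 gets defined in Core.
  #if defined(__AVX512F__)
    #define EIGEN_IDEAL_MAX_ALIGN_BYTES 64
  #endif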
Vamsi Sripathi
6293ad3f39 Performance improvements to tensor broadcast operation
1. Added new packet functions using SIMD for NByOne, OneByN cases
  2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD
  3. Added 4 test cases to cover the new packet functions
2018-05-23 14:02:05 -07:00
Gael Guennebaud
7134fa7a2e Fix compilation with MSVC by reverting to char* for _mm_prefetch except for PGI (the later being the one that has the wrong prototype). 2018-06-07 09:33:10 +02:00
Jeff Trull
e7147f69ae Add tests for sparseQR results (value and size) covering bugs #1522 and #1544 2018-04-21 10:26:30 -07:00
Robert Lukierski
b2053990d0 Adding EIGEN_DEVICE_FUNC to Products, especially Dense2Dense Assignment
specializations. Otherwise causes problems with small fixed size matrix multiplication (call to
0x00 in call_assignment_no_alias in debug mode or trap in release with CUDA 9.1).
2018-03-14 16:19:43 +00:00
Jeff Trull
9f0c5c3669 Make sparse QR result sizes consistent with dense QR, with the following rules:
1) Q is always square
2) Q*R*P' is valid and recovers the original matrix

This implies that Q is square, with dimension equal to the number of rows of the original matrix,
and that R has the same size as the original matrix.
2018-02-15 15:00:31 -08:00
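A hedged usage sketch reflecting the stated rules (standard SparseQR usage; A is assumed column-major and compressed):

  #include <Eigen/SparseQR>
  void qr_demo(const Eigen::SparseMatrix<double>& A) {
    Eigen::SparseQR<Eigen::SparseMatrix<double>, Eigen::COLAMDOrdering<int> > qr(A);
    Eigen::SparseMatrix<double> Q;
    Q = qr.matrixQ();  // rule 1: Q is square, rows(A) x rows(A)
    // rule 2: qr.matrixR() has the same size as A, and
    // Q * qr.matrixR() * qr.colsPermutation().transpose() recovers A.
  }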
Christoph Hertzberg
d655900953 bug #1544: Generate correct Q matrix in complex case. Original patch was by Jeff Trull in PR-386. 2018-05-17 19:17:01 +02:00
Benoit Steiner
0371380d5b Merged in rmlarsen/eigen2 (pull request PR-393)
Rename scalar_clip_op to scalar_clamp_op to prevent collision with existing functor in TensorFlow.
2018-05-16 21:45:42 +00:00
Rasmus Munk Larsen
b8d36774fa Rename clip2 to clamp. 2018-05-16 14:04:48 -07:00
Rasmus Munk Larsen
812480baa3 Rename scalar_clip_op to scalar_clip2_op to prevent collision with existing functor in TensorFlow. 2018-05-16 09:49:24 -07:00
Benoit Steiner
1403c2c15b Merged in didierjansen/eigen (pull request PR-360)
Fix bugs and typos in the contraction example of the tensor README
2018-05-16 01:16:36 +00:00
Benoit Steiner
ad355b3f05 Merged in rmlarsen/eigen2 (pull request PR-392)
Add vectorized clip functor for Eigen Tensors
2018-05-16 01:15:56 +00:00
Christoph Hertzberg
0272f2451a Fix "suggest parentheses around comparison" warning 2018-05-15 19:35:53 +02:00
Rasmus Munk Larsen
afec3021f7 Use numext::maxi & numext::mini. 2018-05-14 16:35:39 -07:00
Rasmus Munk Larsen
b8c8e5f436 Add vectorized clip functor for Eigen Tensors. 2018-05-14 16:07:13 -07:00
Benoit Steiner
6118c6ff4f Enable RawAccess to tensor slices whenever possible.
Avoid 32-bit integer overflow in TensorSlicingOp
2018-04-30 11:28:12 -07:00
Gael Guennebaud
6e7118265d Fix compilation with NEON+MSVC 2018-04-26 10:50:41 +02:00
Gael Guennebaud
097dd4616d Fix unit test for SIMD engine not supporting sqrt 2018-04-26 10:47:39 +02:00
Gael Guennebaud
8810baaed4 Add multi-threading for sparse-row-major * dense-row-major 2018-04-25 10:14:48 +02:00
Gael Guennebaud
2f3287da7d Fix "used uninitialized" warnings 2018-04-24 17:17:25 +02:00
Gael Guennebaud
3ffd449ef5 Workaround warning 2018-04-24 17:11:51 +02:00
Gael Guennebaud
e8ca5166a9 bug #1428: attempt to make NEON vectorization compilable by MSVC.
The workaround is to wrap NEON packet types to make them different c++ types.
2018-04-24 11:19:49 +02:00
Benoit Steiner
6f5935421a fix AVX512 plog 2018-04-23 15:49:26 +00:00
Gael Guennebaud
e9da464e20 Add specializations of is_arithmetic for long long in c++11 2018-04-23 16:26:29 +02:00
Gael Guennebaud
a57e6e5f0f workaround MSVC 2013 compilation issue (ambiguous call) 2018-04-23 15:31:51 +02:00
Gael Guennebaud
11123175db typo in doc 2018-04-23 15:30:35 +02:00
Gael Guennebaud
5679e439e0 bug #1543: fix linear indexing in generic block evaluation (this completes the fix in commit 12efc7d41b)
2018-04-23 14:40:16 +02:00
Gael Guennebaud
35b31353ab Fix unit test 2018-04-22 22:49:08 +02:00
Christoph Hertzberg
34e499ad36 Disable -Wshadow when compiling with g++ 2018-04-21 22:08:26 +02:00
Jayaram Bobba
b7b868d1c4 fix AVX512 plog 2018-04-20 13:39:18 -07:00
Gael Guennebaud
686fb57233 fix const cast in NEON 2018-04-18 18:46:34 +02:00
Dmitriy Korchemkin
02d2f1cb4a Cast zeros to Scalar in RealSchur 2018-04-18 13:52:46 +03:00
Christoph Hertzberg
50633d1a83 Renamed .trans() et al. to .reverseFlag() et al. Adapted documentation of .setReverseFlag() 2018-04-17 11:30:27 +02:00
nicolov
39c2cba810 Add a specialization of Eigen::numext::conj for std::complex<T> to be used when compiling a cuda kernel. This fixes the compilation of TensorFlow 1.4 with clang 6.0 used as CUDA compiler with libc++.
This follows the previous change in 2a69290ddb, which mentions OSX (I guess because it uses libc++ too).
2018-04-13 22:29:10 +00:00
Christoph Hertzberg
775766d175 Add parenthesis to fix compiler warnings 2018-04-15 18:43:56 +02:00
Christoph Hertzberg
42715533f1 bug #1493: Make representation of HouseholderSequence consistent and working for complex numbers. Made corresponding unit test actually test that. Also simplify implementation of QR decompositions 2018-04-15 10:15:28 +02:00
Christoph Hertzberg
c9ecfff2e6 Add links where to make PRs and report bugs into README.md 2018-04-13 21:05:28 +00:00
Christoph Hertzberg
c8b19702bc Limit test size for sparse Cholesky solvers to EIGEN_TEST_MAX_SIZE 2018-04-13 20:36:58 +02:00
Christoph Hertzberg
2cbb00b18e No need to make noise, if KLU is found 2018-04-13 19:14:25 +02:00
Christoph Hertzberg
84dcd998a9 Recent Adolc versions require C++11 2018-04-13 19:10:23 +02:00
Christoph Hertzberg
4d392d93aa Make hypot_impl compile again for types with expression-templates (e.g., boost::multiprecision) 2018-04-13 19:01:37 +02:00
Christoph Hertzberg
072e111ec0 SelfAdjointView<...,Mode> causes a static assert since commit d820ab9edc 2018-04-13 19:00:34 +02:00
Gael Guennebaud
7a9089c33c fix linking issue 2018-04-13 08:51:47 +02:00
Gael Guennebaud
e43ca0320d bug #1520: workaround some -Wfloat-equal warnings by calling std::equal_to 2018-04-11 15:24:13 +02:00
Weiming Zhao
b0eda3cb9f Avoid using memcpy for non-POD elements 2018-04-11 11:37:06 +02:00
Gael Guennebaud
79266fec75 extend doxygen splitter for huge screens 2018-04-11 11:31:17 +02:00
Gael Guennebaud
426052ef6e Update header/footer for doxygen 1.8.13 2018-04-11 11:30:34 +02:00
Gael Guennebaud
9c8decffbf Fix javascript hacks for doxygen 1.8.13 2018-04-11 11:30:14 +02:00
Gael Guennebaud
e798466871 bug #1538: update manual pages regarding BDCSVD. 2018-04-11 10:46:11 +02:00
Gael Guennebaud
c91906b065 Umfpack: UF_long has been removed in recent versions of SuiteSparse; also fix a few long-to-int conversion issues. 2018-04-11 09:59:59 +02:00
Gael Guennebaud
0050709ea7 Merged in v_huber/eigen (pull request PR-378)
Add interface to umfpack_*l_* functions
2018-04-11 07:43:04 +00:00
Guillaume Jacob
8c1652055a Fix code sample output in block(int, int, int, int) doxygen 2018-04-09 17:23:59 +02:00
vhuber
08008f67e1 Add unitTest 2018-04-09 17:07:46 +02:00
Gael Guennebaud
add15924ac Fix MKL backend for symmetric eigenvalues on row-major matrices. 2018-04-09 13:29:26 +02:00
Gael Guennebaud
04b1628e55 Add missing empty line. 2018-04-09 13:28:31 +02:00
Gael Guennebaud
c2624c0318 Fix cmake scripts with no fortran compiler 2018-04-07 08:45:19 +02:00
Gael Guennebaud
2f833b1c64 bug #1509: fix computeInverseWithCheck for complexes 2018-04-04 15:47:46 +02:00
Gael Guennebaud
b903fa74fd Extend list of MSVC versions 2018-04-04 15:14:09 +02:00
Gael Guennebaud
403f09ccef Make stableNorm and blueNorm compatible with 2D matrices. 2018-04-04 15:13:31 +02:00
Gael Guennebaud
4213b63f5c Factorize code between numext::hypot and the scalar_hypot_op functor. 2018-04-04 15:12:43 +02:00
Gael Guennebaud
368dd4cd9d Make innerVector() and innerVectors() methods available to all expressions supported by Block.
Before, only SparseBase exposed such methods.
2018-04-04 15:09:21 +02:00
Gael Guennebaud
e116f6847e bug #1521: avoid signalling NaN in hypot and make it std::complex<> friendly. 2018-04-04 13:47:23 +02:00
Gael Guennebaud
73729025a4 bug #1521: add unit test dedicated to numext::hypot 2018-04-04 13:45:34 +02:00
Gael Guennebaud
13f5df9f67 Add a note on vec_min vs asm 2018-04-04 13:10:38 +02:00
Gael Guennebaud
e91e314347 bug #1494: makes pmin/pmax behave on Altivec/VSX as on x86 regarding NaNs 2018-04-04 11:39:19 +02:00
Gael Guennebaud
112c899304 comment unreachable code 2018-04-03 23:16:43 +02:00
Gael Guennebaud
a1292395d6 Fix compilation of product with inverse transpositions (e.g., mat * Transpositions().inverse()) 2018-04-03 23:06:44 +02:00
Gael Guennebaud
8c7b5158a1 commit 45e9c9996da790b55ed9c4b0dfeae49492ac5c46 (HEAD -> memory_fix)
Author: George Burgess IV <gbiv@google.com>
Date:   Thu Mar 1 11:20:24 2018 -0800

    Prefer `::operator new` to `new`

    The C++ standard allows compilers much flexibility with `new`
    expressions, including eliding them entirely
    (https://godbolt.org/g/yS6i91). However, calls to `operator new` are
    required to be treated like opaque function calls.

    Since we're calling `new` for side-effects other than allocating heap
    memory, we should prefer the less flexible version.

    Signed-off-by: George Burgess IV <gbiv@google.com>
2018-04-03 17:15:38 +02:00
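An illustrative contrast for the rationale above (plain C++, not Eigen code):

  #include <new>
  void opaque_allocation_demo() {
    void* p = ::operator new(sizeof(int));  // opaque call: never elided, side effects preserved
    ::operator delete(p);
    // By contrast, a plain 'new int(42)' expression may legally be elided by the compiler.
  }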
Gael Guennebaud
dd4cc6bd9e bug #1527: fix support for MKL's VML (destination was not properly resized) 2018-04-03 17:11:15 +02:00
Gael Guennebaud
c5b56f1fb2 bug #1528: better use numeric_limits::min() instead of 1/highest(), which would underflow. 2018-04-03 16:49:35 +02:00
Gael Guennebaud
8d0ffe3655 bug #1516: add assertion for out-of-range diagonal index in MatrixBase::diagonal(i) 2018-04-03 16:15:43 +02:00
Gael Guennebaud
407e3e2621 bug #1532: disable stl::*_negate in C++17 (they are deprecated) 2018-04-03 15:59:30 +02:00
Gael Guennebaud
40b4bf3d32 AVX512: _mm512_rsqrt28_ps is available for AVX512ER only 2018-04-03 14:36:27 +02:00
Gael Guennebaud
584951ca4d Rename predux_downto4 to be more accurate on its semantic. 2018-04-03 14:28:38 +02:00
Gael Guennebaud
67bac6368c protect calls to isnan 2018-04-03 14:19:04 +02:00
Gael Guennebaud
d43b2f01f4 Fix unit testing of predux_downto4 (bad name), and add unit testing of prsqrt 2018-04-03 14:14:00 +02:00
Gael Guennebaud
7b0630315f AVX512: fix psqrt and prsqrt 2018-04-03 14:12:50 +02:00
Gael Guennebaud
6719409cd9 AVX512: add missing pinsertfirst and pinsertlast, implement pblend for Packet8d, fix compilation without AVX512DQ 2018-04-03 14:11:56 +02:00
Gael Guennebaud
524119d32a Fix uninitialized output argument. 2018-04-03 10:56:10 +02:00
vhuber
267a144da5 Remove unnecessary define 2018-03-30 23:04:53 +02:00
vhuber
baf9a5a776 Add interface to umfpack_*l_* functions 2018-03-30 18:53:34 +02:00
luz.paz
e3912f5e63 MIsc. source and comment typos
Found using `codespell` and `grep` from downstream FreeCAD
2018-03-11 10:01:44 -04:00
Gael Guennebaud
5deeb19e7b bug #1517: fix triangular product with unit diagonal and nested scaling factor: (s*A).triangularView<UpperUnit>()*B 2018-02-09 16:52:35 +01:00
Gael Guennebaud
12efc7d41b Fix linear indexing in generic block evaluation. 2018-02-09 16:45:49 +01:00
Gael Guennebaud
f4a6863c75 Fix typo 2018-02-09 16:43:49 +01:00
Viktor Csomor
000840cae0 Added a move constructor and move assignment operator to Tensor and wrote some tests. 2018-02-07 19:10:54 +01:00
Gael Guennebaud
3a2dc3869e Fix weird issue with MSVC 2013 2018-07-18 02:26:43 -07:00
Eugene Zhulenev
c95aacab90 Fix TensorContractionOp evaluators for GPU and SYCL 2018-07-17 14:09:37 -07:00
Gael Guennebaud
038b55464b Merged in deven-amd/eigen (pull request PR-425)
applying EIGEN_DECLARE_TEST to *gpu* unit tests
2018-07-17 21:14:40 +00:00
Deven Desai
f124f07965 applying EIGEN_DECLARE_TEST to *gpu* tests
Also, a few minor fixes for GPU tests running in HIP mode.

1. Adding an include for hip/hip_runtime.h in the Macros.h file
   For HIP __host__ and __device__ are macros which are defined in hip headers.
   Their definitions need to be included before their use in the file.

2. Fixing the compile failure in TensorContractionGpu introduced by the commit to
   "Fuse computations into the Tensor contractions using output kernel"

3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
2018-07-17 14:16:48 -04:00
Gael Guennebaud
dff3a92d52 Remove usage of #if EIGEN_TEST_PART_XX in unit tests that does not require them (splitting can thus be avoided for them) 2018-07-17 15:52:58 +02:00
Gael Guennebaud
82f0ce2726 Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }.
This provides several advantages:
- more flexibility in designing unit tests
- unit tests can be glued to speed up compilation
- unit tests are compiled with same predefined macros, which is a requirement for zapcc
2018-07-17 14:46:15 +02:00
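A minimal sketch of the new declaration style named above (written inside a test file that includes the suite's main.h; check_something is a hypothetical test function, while g_repeat and CALL_SUBTEST are the existing harness helpers):

  EIGEN_DECLARE_TEST(mytest)
  {
    for (int i = 0; i < g_repeat; ++i) {
      CALL_SUBTEST( check_something<float>() );
    }
  }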
Gael Guennebaud
37f4bdd97d Fix VERIFY_EVALUATION_COUNT(EXPR,N) with a complex expression as N 2018-07-17 13:20:49 +02:00
Gael Guennebaud
2b2cd85694 bug #1573: add noexcept move constructor and move assignment operator to Quaternion 2018-07-17 11:11:33 +02:00
Eugene Zhulenev
43206ac4de Call OutputKernel in evalGemv 2018-07-12 14:52:23 -07:00
Eugene Zhulenev
e204ecdaaf Remove SimpleThreadPool and always use {NonBlocking}ThreadPool 2018-07-16 15:06:57 -07:00
Eugene Zhulenev
b324ed55d9 Call OutputKernel in evalGemv 2018-07-12 14:52:23 -07:00
Eugene Zhulenev
01fd4096d3 Fuse computations into the Tensor contractions using output kernel 2018-07-10 13:16:38 -07:00
Gael Guennebaud
5539587b1f Some warning fixes 2018-07-17 10:29:12 +02:00
Benoit Steiner
8f55956a57 Update the padding computation for PADDING_SAME to be consistent with TensorFlow. 2018-01-30 20:22:12 +00:00
Gael Guennebaud
09a16ba42f bug #1412: fix compilation with nvcc+MSVC 2018-01-17 23:13:16 +01:00
Lee.Deokjae
5b3c367926 Fix typos in the contraction example of tensor README 2018-01-06 14:36:19 +09:00
Eugene Chereshnev
f558ad2955 Fix incorrect ldvt in LAPACKE call from JacobiSVD 2018-01-03 12:55:52 -08:00
Benoit Steiner
22de74aa76 Disable use of recurrence for computing twiddle factors. 2018-01-09 18:32:52 +00:00
Gael Guennebaud
73629f8b68 Fix gcc7 warning 2018-01-09 08:59:27 +01:00
RJ Ryan
59985cfd26 Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689 2017-12-31 10:44:56 -05:00
nluehr
f9bdcea022 For cuda 9.1 replace math_functions.hpp with cuda_runtime.h 2017-12-18 16:51:15 -08:00
Gael Guennebaud
06bf1047f9 Fix compilation of stableNorm with some expressions as input 2017-12-15 15:15:37 +01:00
Gael Guennebaud
73214c4bd0 Workaround nvcc 9.0 issue. See PR 351.
https://bitbucket.org/eigen/eigen/pull-requests/351
2017-12-15 14:10:59 +01:00
Gael Guennebaud
31e0bda2e3 Fix cmake warning 2017-12-14 15:48:27 +01:00
Gael Guennebaud
26a2c6fc16 fix unit test 2017-12-14 15:11:04 +01:00
Gael Guennebaud
546ab97d76 Add possibility to overwrite EIGEN_STRONG_INLINE. 2017-12-14 14:47:38 +01:00
Gael Guennebaud
9c3aed9d48 Fix packet and alignment propagation logic of Block<Xpr> expressions. In particular, (A+B).col(j) lost vectorisation. 2017-12-14 14:24:33 +01:00
Gael Guennebaud
76c7dae600 ignore all *build* sub directories 2017-12-14 14:22:14 +01:00
Gael Guennebaud
b2cacd189e fix header inclusion 2017-12-14 10:01:02 +01:00
Yangzihao Wang
3122477c86 Update the padding computation for PADDING_SAME to be consistent with TensorFlow. 2017-12-12 11:15:24 -08:00
Benoit Steiner
393b7c4959 Merged in ncluehr/eigen/float2half-fix (pull request PR-349)
Replace __float2half_rn with __float2half
2017-12-01 00:29:51 +00:00
nluehr
aefd5fd5c4 Replace __float2half_rn with __float2half
The latter provides a consistent definition for CUDA 8.0 and 9.0.
2017-11-28 10:15:46 -08:00
Gael Guennebaud
d0b028e173 clarify Pastix requirements 2017-11-27 22:11:57 +01:00
Gael Guennebaud
3587e481fb silent MSVC warning 2017-11-27 21:53:02 +01:00
Benoit Steiner
3a327cd3c7 Merged in ncluehr/eigen/predux_fp16_fix (pull request PR-348)
Fix incorrect integer cast in half2 predux.
2017-11-21 21:11:45 +00:00
nluehr
dd6de618c3 Fix incorrect integer cast in predux<half2>().
Bug corrupts results on Maxwell and earlier GPU architectures.
2017-11-21 10:47:00 -08:00
Gael Guennebaud
3dc6ff73ca Handle PGI compiler 2017-11-17 22:54:39 +01:00
Zvi Rackover
599a88da27 Disable gcc-specific workaround for Clang to allow build with AVX512
There is currently a workaround for an issue in gcc that requires invoking gcc with the -fabi-version flag. This workaround is not needed for Clang and moreover is not supported.
2017-11-16 19:53:38 +00:00
Gael Guennebaud
672bdc126b bug #1479: fix failure detection in LDLT 2017-11-16 17:55:24 +01:00
Basil Fierz
624df50945 Adds missing EIGEN_STRONG_INLINE to support MSVC properly inlining small vector calculations
When working with MSVC, small vector operations are often not properly inlined. This behaviour is observed even on the most recent compiler versions.
2017-10-26 22:44:28 +02:00
Benoit Steiner
746a6b7b81 Merged in zzp11/eigen/zzp11/a-small-mistake-quickreferencedox-edited-1510217281963 (pull request PR-346)
a small mistake QuickReference.dox edited online with Bitbucket
2018-03-23 01:02:34 +00:00
Benoit Steiner
d2631ef61d Merged in facaiy/eigen/ENH/exp_support_complex_for_gpu (pull request PR-359)
ENH: exp supports complex type for cuda
2018-03-23 00:59:15 +00:00
Benoit Steiner
8fcbd6d4c9 Merged in dtrebbien/eigen (pull request PR-369)
Move up the specialization of std::numeric_limits
2018-03-23 00:54:58 +00:00
Rasmus Munk Larsen
e900b010c8 Improve robustness of igamma and igammac to bad inputs.
Check for nan inputs and propagate them immediately. Limit the number of internal iterations to 2000 (same number as used by scipy.special.gammainc). This prevents an infinite loop when the function is called with nan or very large arguments.

Original change by mfirgunov@google.com
2018-03-19 09:04:54 -07:00
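
For illustration, a minimal sketch of the two guards described above (immediate NaN propagation and a bounded iteration count), using an assumed helper for the series part of the lower incomplete gamma function; this is not Eigen's actual implementation:

    #include <cmath>
    #include <limits>

    // Series part of the lower incomplete gamma function; the caller still multiplies by
    // the usual x^a * exp(-x) / Gamma(a) prefactor. Hypothetical helper for illustration.
    double igamma_series_sketch(double a, double x) {
      if (std::isnan(a) || std::isnan(x))
        return std::numeric_limits<double>::quiet_NaN();  // propagate NaN inputs immediately
      const int max_iters = 2000;  // same cap as scipy.special.gammainc
      double term = 1.0 / a, sum = term;
      for (int k = 1; k < max_iters; ++k) {  // bounded loop: no hang on huge arguments
        term *= x / (a + k);
        sum += term;
        if (std::abs(term) < std::abs(sum) * 1e-16)
          break;
      }
      return sum;
    }
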
Gael Guennebaud
f7d17689a5 Add static assertion for fixed sizes Ref<> 2018-03-09 10:11:13 +01:00
Gael Guennebaud
f6be7289d7 Implement better static assertion checking to make sure that the first assertion is a static one and not a runtime one. 2018-03-09 10:00:51 +01:00
Gael Guennebaud
d820ab9edc Add static assertion on selfadjoint-view's UpLo parameter. 2018-03-09 09:33:43 +01:00
Daniel Trebbien
0c57be407d Move up the specialization of std::numeric_limits
This fixes a compilation error seen when building TensorFlow on macOS:
https://github.com/tensorflow/tensorflow/issues/17067
2018-02-18 15:35:45 -08:00
Yan Facai (颜发才)
42a8334668 ENH: exp supports complex type for cuda 2018-01-04 16:01:01 +08:00
zhouzhaoping
912e9965ef a small mistake QuickReference.dox edited online with Bitbucket 2017-11-09 08:49:01 +00:00
Gael Guennebaud
4c03b3511e Fix issue with boost::multiprec in previous commit 2017-11-08 23:28:01 +01:00
Gael Guennebaud
e9d2888e74 Improve debugging tests and output in BDCSVD 2017-11-08 10:26:03 +01:00
Gael Guennebaud
e8468ea91b Fix overflow issues in BDCSVD 2017-11-08 10:24:28 +01:00
Benoit Steiner
3949615176 Merged in JonasMu/eigen (pull request PR-329)
Added an example for a contraction to a scalar value to README.md

Approved-by: Jonas Harsch <jonas.harsch@gmail.com>
2017-10-27 07:27:46 +00:00
Christoph Hertzberg
11ddac57e5 Merged in guillaume_michel/eigen (pull request PR-334)
- Add support for NEON plog PacketMath function
2017-10-23 13:22:22 +00:00
Benoit Steiner
a6d875bac8 Removed unnecessary #include 2017-10-22 08:12:45 -07:00
Benoit Steiner
f16ba2a630 Merged in LaFeuille/eigen-1/LaFeuille/typo-fix-alignmeent-alignment-1505889397887 (pull request PR-335)
Typo fix alignmeent ->alignment
2017-10-21 01:59:55 +00:00
Benoit Steiner
ee6ad21b25 Merged in henryiii/eigen/henryiii/device (pull request PR-343)
Fixing missing inlines on device functions for newer CUDA cards
2017-10-21 01:58:22 +00:00
Henry Schreiner
9bb26eb8f1 Restore __device__ 2017-10-21 00:50:38 +00:00
Henry Schreiner
4245475d22 Fixing missing inlines on device functions for newer CUDA cards 2017-10-20 03:20:13 +00:00
Benoit Steiner
8eb4b9d254 Merged in benoitsteiner/opencl (pull request PR-341) 2017-10-17 16:39:28 +00:00
Rasmus Munk Larsen
2dd63ed395 Merge 2017-10-13 15:58:52 -07:00
Rasmus Munk Larsen
f349507e02 Specialize ThreadPoolDevice::enqueueNotification for the case with no args. As an example this reduces binary size of a TensorFlow demo app for Android by about 2.5%. 2017-10-13 15:58:12 -07:00
Benoit Steiner
688451409d Merged in mehdi_goli/upstr_benoit/ComputeCppNewReleaseFix (pull request PR-16)
Changes required for new ComputeCpp CE version.
2017-10-13 20:56:01 +00:00
Konstantinos Margaritis
0e6e027e91 check both z13 and z14 arches 2017-10-12 15:38:34 -04:00
Konstantinos Margaritis
6c3475f110 remove debugging 2017-10-12 15:34:55 -04:00
Konstantinos Margaritis
df7644aec3 Merged eigen/eigen into default 2017-10-12 22:23:13 +03:00
Konstantinos Margaritis
98e52cc770 rollback 374f750ad4 2017-10-12 15:22:10 -04:00
Konstantinos Margaritis
c4ad358565 explicitly set conjugate mask 2017-10-11 11:05:29 -04:00
Konstantinos Margaritis
380d41fd76 added some extra debugging 2017-10-11 10:40:12 -04:00
Konstantinos Margaritis
d0b7b9d0d3 some Packet2cf pmul fixes 2017-10-11 10:17:22 -04:00
Konstantinos Margaritis
df173f5620 initial pexp() for 32-bit floats, commented out due to vec_cts() 2017-10-11 09:40:49 -04:00
Konstantinos Margaritis
3dcae2a27f initial pexp() for 32-bit floats, commented out due to vec_cts() 2017-10-11 09:40:45 -04:00
Konstantinos Margaritis
c2a2246489 fix predux_mul for z14/float 2017-10-10 13:38:32 -04:00
Konstantinos Margaritis
374f750ad4 eliminate 'enumeral and non-enumeral type in conditional expression' warning 2017-10-09 16:56:30 -04:00
Konstantinos Margaritis
bc30305d29 complete z14 port 2017-10-09 16:55:10 -04:00
Gael Guennebaud
0e85a677e3 bug #1472: fix warning 2017-09-26 10:53:33 +02:00
Gael Guennebaud
8579195169 bug #1468 (1/2) : add missing std:: to memcpy 2017-09-22 09:23:24 +02:00
Gael Guennebaud
f92567fecc Add link to a useful example. 2017-09-20 10:22:23 +02:00
Gael Guennebaud
7ad07fc6f2 Update documentation for aligned_allocator 2017-09-20 10:22:00 +02:00
LaFeuille
7c9b07dc5c Typo fix alignmeent ->alignment 2017-09-20 06:38:39 +00:00
Mehdi Goli
2062ac9958 Changes required for new ComputeCpp CE version. 2017-09-18 18:17:39 +01:00
Christoph Hertzberg
23f8b00bc8 clang provides __has_feature(is_enum) (but not <type_traits>) in C++03 mode 2017-09-14 19:26:03 +02:00
Christoph Hertzberg
0c9ad2f525 std::integral_constant is not C++03 compatible 2017-09-14 19:23:38 +02:00
Rasmus Munk Larsen
1b7294f6fc Fix cut-and-paste error. 2017-09-08 16:35:58 -07:00
Rasmus Munk Larsen
94e2213b38 Avoid undefined behavior in Eigen::TensorCostModel::numThreads.
If the cost is large enough then the thread count can be larger than the maximum
representable int, so just casting it to an int is undefined behavior.

Contributed by phurst@google.com.
2017-09-08 15:49:55 -07:00
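
A hedged sketch of the idea behind this fix, with assumed parameter names rather than the actual TensorCostModel interface: clamp the estimate while it is still a double, because converting an out-of-range floating-point value to int is undefined behavior:

    #include <algorithm>
    #include <cmath>

    int num_threads_sketch(double total_cost, double cost_per_thread, int max_threads) {
      double estimate = std::ceil(total_cost / cost_per_thread);
      // Clamp before the cast; a huge cost would otherwise overflow int.
      estimate = std::min(estimate, static_cast<double>(max_threads));
      return static_cast<int>(std::max(1.0, estimate));
    }
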
Gael Guennebaud
6d42309f13 Fix compilation of Vector::operator()(enum) by treating enums as Index 2017-09-07 14:34:30 +02:00
Benoit Steiner
ea4e65bf41 Fixed compilation with cuda_clang. 2017-09-07 09:13:52 +00:00
Gael Guennebaud
a91918a105 Merged in infinitei/eigen (pull request PR-328)
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.

Approved-by: Tal Hadad <tal_hd@hotmail.com>
Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu>
2017-09-06 08:42:14 +00:00
Gael Guennebaud
9c353dd145 Add C++11 max_digits10 for half. 2017-09-06 10:22:47 +02:00
Gael Guennebaud
b35d1ce4a5 Implement true compile-time "if" for apply_rotation_in_the_plane. This fixes a compilation issue for vectorized real type with missing vectorization for complexes, e.g. AVX512. 2017-09-06 10:02:49 +02:00
Gael Guennebaud
80142362ac Fix mixing types in sparse matrix products. 2017-09-02 22:50:20 +02:00
Jonas Harsch
810b70ad09 Merged in JonasMu/added-an-example-for-a-contraction-to-a--1504265366851 (pull request PR-1)
Added an example for a contraction to a scalar value
2017-09-01 12:01:39 +00:00
Jonas Harsch
a34fb212cd Close branch JonasMu/added-an-example-for-a-contraction-to-a--1504265366851 2017-09-01 12:01:39 +00:00
Jonas Harsch
a991c80365 Added an example for a contraction to a scalar value, e.g. a double contraction of two second-order tensors and how you can get the value of the result. I lost one day to get this done, so I think it will help others. I also added Eigen:: to the IndexPair and array in the same example. 2017-09-01 11:30:26 +00:00
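
A self-contained sketch in the spirit of that README example (a double contraction of two rank-2 tensors down to a scalar); the wording of the shipped example may differ:

    #include <unsupported/Eigen/CXX11/Tensor>
    #include <iostream>

    int main() {
      Eigen::Tensor<double, 2> a(2, 2), b(2, 2);
      a.setValues({{1.0, 2.0}, {3.0, 4.0}});
      b.setValues({{5.0, 6.0}, {7.0, 8.0}});
      // Contract over both index pairs: the result is a rank-0 tensor.
      Eigen::array<Eigen::IndexPair<int>, 2> dims =
          {Eigen::IndexPair<int>(0, 0), Eigen::IndexPair<int>(1, 1)};
      Eigen::Tensor<double, 0> c = a.contract(b, dims);
      std::cout << c(0) << std::endl;  // read the scalar value of the result
      return 0;
    }
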
Benoit Steiner
a4089991eb Added support for CUDA 9.0. 2017-08-31 02:49:39 +00:00
Abhijit Kundu
6d991a9595 bug #1464 : Fixes construction of EulerAngles from 3D vector expression. 2017-08-30 13:26:30 -04:00
Gael Guennebaud
304ef29571 Handle min/max/inf/etc issue in cuda_fp16.h directly in test/main.h 2017-08-24 11:26:41 +02:00
Konstantinos Margaritis
1affe3d8df Merged eigen/eigen into default 2017-08-24 12:24:01 +03:00
Gael Guennebaud
21633e585b bug #1462: remove all occurrences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER 2017-08-24 11:06:47 +02:00
Gael Guennebaud
12249849b5 Make the threshold from gemm to coeff-based-product configurable, and add some explanations. 2017-08-24 10:43:21 +02:00
Gael Guennebaud
39864ebe1e bug #336: improve doc for PlainObjectBase::Map 2017-08-22 17:18:43 +02:00
Gael Guennebaud
600e52fc7f Add missing scalar conversion 2017-08-22 17:06:57 +02:00
Gael Guennebaud
9deee79922 bug #1457: add setUnit() methods for consistency. 2017-08-22 16:48:07 +02:00
Gael Guennebaud
bc4dae9aeb bug #1449: fix redux_3 unit test 2017-08-22 15:59:08 +02:00
Gael Guennebaud
bc91a2df8b bug #1461: fix compilation of Map<const Quaternion>::x() 2017-08-22 15:10:42 +02:00
Gael Guennebaud
fc39d5954b Merged in dtrebbien/eigen/patch-1 (pull request PR-312)
Work around a compilation error seen with nvcc V8.0.61
2017-08-22 12:17:37 +00:00
Gael Guennebaud
b223918ea9 Doc: warn about constness in LLT::solveInPlace 2017-08-22 14:12:47 +02:00
Konstantinos Margaritis
4ce5ec5197 initial support for z14 2017-08-07 05:54:29 -04:00
Konstantinos Margaritis
e1e71ca4e4 initial support for z14 2017-08-06 19:53:18 -04:00
Benoit Steiner
84d7be103a Fixing Argmax that was breaking upstream TensorFlow. 2017-07-22 03:19:34 +00:00
Benoit Steiner
f0b154a4b0 Code cleanup 2017-07-10 09:54:09 -07:00
Benoit Steiner
575cda76b3 Fixed syntax errors generated by xcode 2017-07-09 11:39:01 -07:00
Benoit Steiner
5ac27d5b51 Avoid relying on cxx11 features when possible. 2017-07-08 21:58:44 -07:00
Benoit Steiner
c5a241ab9b Merged in benoitsteiner/opencl (pull request PR-323)
Improved support for OpenCL
2017-07-07 16:27:33 +00:00
Benoit Steiner
b7ae4dd9ef Merged in hughperkins/eigen/add-endif-labels-TensorReductionCuda.h (pull request PR-315)
Add labels to #ifdef, in TensorReductionCuda.h
2017-07-07 04:23:52 +00:00
Benoit Steiner
9daed67952 Merged in tntnatbry/eigen (pull request PR-319)
Tensor Trace op
2017-07-07 04:18:03 +00:00
Benoit Steiner
6795512e59 Improved the randomness of the tensor random generator 2017-07-06 21:12:45 -07:00
Benoit Steiner
dc524ac716 Fixed compilation warning 2017-07-06 21:11:15 -07:00
Benoit Steiner
62b4634ebe Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14)
Applying Benoit's comment for Fixing ImageVolumePatch.

* Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file.

* Fixing deallocation of the memory in the ImagePatch test for SYCL.

* Fixing the automerge issue.
2017-07-06 05:08:13 +00:00
Benoit Steiner
c92faf9d84 Merged in mehdi_goli/upstr_benoit/HiperbolicOP (pull request PR-13)
Adding hyperbolic operations for sycl.

* Adding hyperbolic operations.

* Adding the hyperbolic operations for CPU as well.
2017-07-06 05:05:57 +00:00
Benoit Steiner
53725c10b8 Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10)
DataDependancy

* Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code.

* Applying Ronnan's Comments.

* Applying benoit's comments
2017-06-28 17:55:23 +00:00
Gael Guennebaud
c010b17360 Fix warning 2017-06-27 14:29:57 +02:00
Gael Guennebaud
561f777075 Fix a gcc7 warning about bool * bool in abs2 default implementation. 2017-06-27 12:05:17 +02:00
Gael Guennebaud
b651ce0ffa Fix a gcc7 warning: Wint-in-bool-context 2017-06-26 09:58:28 +02:00
Christoph Hertzberg
157040d44f Make sure CMAKE_Fortran_COMPILER is set before checking for Fortran functions 2017-06-20 16:58:03 +02:00
Gael Guennebaud
24fe1de9b4 merge 2017-06-15 10:17:39 +02:00
Gael Guennebaud
b240080e64 bug #1436: fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing. 2017-06-15 10:16:30 +02:00
Benoit Steiner
3baef62b9a Added missing __device__ qualifier 2017-06-13 12:56:55 -07:00
Benoit Steiner
449936828c Added missing __device__ qualifier 2017-06-13 12:54:57 -07:00
Benoit Steiner
b8e805497e Merged in benoitsteiner/opencl (pull request PR-318)
Improved support for OpenCL
2017-06-13 05:01:10 +00:00
Gael Guennebaud
9fbdf02059 Enable Array(EigenBase<>) ctor for compatible scalar types only. This prevents nested arrays from appearing convertible from/to simple arrays. 2017-06-12 22:30:32 +02:00
Gael Guennebaud
e43d8fe9d7 Fix compilation of streaming nested Array, i.e., cout << Array<Array<>> 2017-06-12 22:26:26 +02:00
Gael Guennebaud
d9d7bd6d62 Fix 1x1 case in Solve expression with EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION==RowMajor 2017-06-12 22:25:02 +02:00
Androbin42
95ecb2b5d6 Make buildtests.in more robust 2017-06-12 17:11:06 +00:00
Androbin42
3f7fb5a6d6 Make eigen_monitor_perf.sh more robust 2017-06-12 17:07:56 +00:00
Gael Guennebaud
7f42a93349 Merged in alainvaucher/eigen/find-module-imported-target (pull request PR-324)
In the CMake find module, define the Eigen imported target as when installing with CMake

* In the CMake find module, define the Eigen imported target

* Add quotes to the imported location, in case there are spaces in the path.

Approved-by: Alain Vaucher <acvaucher@gmail.com>
2017-11-15 20:45:09 +00:00
Gael Guennebaud
7cc503f9f5 bug #1485: fix linking issue of non template functions 2017-11-15 21:33:37 +01:00
Gael Guennebaud
103c0aa6ad Add KLU in the list of third-party sparse solvers 2017-11-10 14:13:29 +01:00
Gael Guennebaud
00bc67c374 Move KLU support to official 2017-11-10 14:11:22 +01:00
Gael Guennebaud
b82cd93c01 KLU: truly disable unimplemented code, add proper static assertions in solve 2017-11-10 14:09:01 +01:00
Gael Guennebaud
6365f937d6 KLU depends on BTF but not on libSuiteSparse nor Cholmod 2017-11-10 13:58:52 +01:00
Gael Guennebaud
8cf63ccb99 Merged in kylemacfarlan/eigen (pull request PR-337)
Add support for SuiteSparse's KLU routines
2017-11-10 10:43:17 +00:00
Gael Guennebaud
1495b98a8e Merged in spraetor/eigen (pull request PR-305)
Issue with mpreal and std::numeric_limits::digits
2017-11-10 10:28:54 +00:00
Gael Guennebaud
fc45324380 Merged in jkflying/eigen-fix-scaling (pull request PR-302)
Make scaling work with non-square matrices
2017-11-10 10:11:36 +00:00
Gael Guennebaud
d306b96fb7 Merged in carpent/eigen (pull request PR-342)
Use col method for column-major matrix
2017-11-10 10:09:53 +00:00
Gael Guennebaud
1b2dcf9a47 Check that Schur decomposition succeeds. 2017-11-10 10:26:09 +01:00
Gael Guennebaud
0a1cc73942 bug #1484: restore deleted line for 128 bits long doubles, and improve dispatching logic. 2017-11-10 10:25:41 +01:00
Gael Guennebaud
f86bb89d39 Add EIGEN_MKL_NO_DIRECT_CALL option 2017-11-09 11:07:45 +01:00
Gael Guennebaud
5fa79f96b8 Patch from Konstantin Arturov to enable MKL's direct call by default 2017-11-09 10:58:38 +01:00
Justin Carpentier
a020d9b134 Use col method for column-major matrix 2017-10-17 21:51:27 +02:00
Kyle Vedder
c0e1d510fd Add support for SuiteSparse's KLU routines 2017-10-04 21:01:23 -05:00
Gael Guennebaud
6dcf966558 Avoid implicit scalar conversion with accuracy loss in pow(scalar,array) 2017-06-12 16:47:22 +02:00
Gael Guennebaud
50e09cca0f fix typo 2017-06-11 15:30:36 +02:00
Gael Guennebaud
a4fd4233ad Fix compilation with some compilers 2017-06-09 23:02:02 +02:00
Gael Guennebaud
c3e2afce0d Enable MSVC 2010 workaround for MSVC only 2017-06-09 16:25:18 +02:00
Gael Guennebaud
731c8c704d bug #1403: more scalar conversions fixes in BDCSVD 2017-06-09 15:45:49 +02:00
Gael Guennebaud
1bbcf19029 bug #1403: fix implicit scalar type conversion. 2017-06-09 14:44:02 +02:00
Gael Guennebaud
ba5cab576a bug #1405: enable StrictlyLower/StrictlyUpper triangularView as the destination of matrix*matrix products. 2017-06-09 14:38:04 +02:00
Gael Guennebaud
90168c003d bug #1414: doxygen, add EigenBase to CoreModule 2017-06-09 14:01:44 +02:00
Gael Guennebaud
26f552c18d fix compilation of Half in C++98 (issue introduced in previous commit) 2017-06-09 13:36:58 +02:00
Gael Guennebaud
1d59ca2458 Fix compilation with gcc 4.3 and ARM NEON 2017-06-09 13:20:52 +02:00
Gael Guennebaud
fb1ee04087 bug #1410: fix lvalue propagation of Array/Matrix-Wrapper with a const nested expression. 2017-06-09 13:13:03 +02:00
Gael Guennebaud
723a59ac26 add regression test for aliasing in product rewriting 2017-06-09 12:54:40 +02:00
Gael Guennebaud
8640093af1 fix compilation in C++98 2017-06-09 12:45:01 +02:00
Gael Guennebaud
a7be4cd1b1 Fix LeastSquareDiagonalPreconditioner for complexes (issue introduced in previous commit) 2017-06-09 11:57:53 +02:00
Gael Guennebaud
498aa95a8b bug #1424: add numext::abs specialization for unsigned integer types. 2017-06-09 11:53:49 +02:00
Gael Guennebaud
d588822779 Add missing std::numeric_limits specialization for half, and complete NumTraits<half> 2017-06-09 11:51:53 +02:00
Gael Guennebaud
682b2ef17e bug #1423: fix LSCG's Jacobi preconditioner for row-major matrices. 2017-06-08 15:06:27 +02:00
Gael Guennebaud
4bbc320468 bug #1435: fix aliasing issue in expressions like: A = C - B*A; 2017-06-08 12:55:25 +02:00
Hugh Perkins
9341f258d4 Add labels to #ifdef, in TensorReductionCuda.h 2017-06-06 15:51:06 +01:00
Benoit Steiner
1e736b9ead Merged in mehdi_goli/opencl/SYCLAlignAllocator (pull request PR-7)
Fixing SYCL alignment issue required by TensorFlow.
2017-05-26 17:23:00 +00:00
Benoit Steiner
9dee55ec33 Merged eigen/eigen into default 2017-05-26 09:01:04 -07:00
Mehdi Goli
0370d3576e Applying Ronnan's comments. 2017-05-26 16:01:48 +01:00
Benoit Steiner
615aff4d6e Merged in a-doumoulakis/opencl (pull request PR-12)
Enable triSYCL with Eigen
2017-05-25 18:18:23 +00:00
a-doumoulakis
c3bd860de8 Modification upon request
- Remove warning suppression
2017-05-25 18:46:18 +01:00
Mehdi Goli
e3f964ed55 Applying Benoit's comment; removing dead code. 2017-05-25 11:17:26 +01:00
Benoit Steiner
df90010cdd Merged in mehdi_goli/opencl/CmakeFixForUbuntu16.04 (pull request PR-11)
CmakeFixForUbuntu16.04
2017-05-24 19:03:57 +00:00
a-doumoulakis
fb853a857a Restore misplaced comment 2017-05-24 17:50:15 +01:00
a-doumoulakis
7a8ba565f8 Merge changes from upstream 2017-05-24 17:45:29 +01:00
Mehdi Goli
daf99daadd Merged in DuncanMcBain/opencl/default (pull request PR-2)
Update FindComputeCpp.cmake with new changes from SDK
2017-05-24 13:59:53 +00:00
Mehdi Goli
9ef5c948ba Fixing Cmake for gcc>=5. 2017-05-24 13:11:16 +01:00
Duncan McBain
0cb3c7c7dd Update FindComputeCpp.cmake with new changes from SDK 2017-05-24 12:24:21 +01:00
Mmanu Chaturvedi
2971503fed Specializing numeric_limits For AutoDiffScalar 2017-05-23 17:12:36 -04:00
Gael Guennebaud
26e8f9171e Fix compilation of matrix log with Map as input 2017-06-07 10:51:23 +02:00
Gael Guennebaud
f2a553fb7b bug #1411: fix usage of alignment information in vectorization of quaternion product and conjugate. 2017-06-07 10:10:30 +02:00
Christoph Hertzberg
e018142604 Make sure CholmodSupport works when included in multiple compilation units (issue was reported on stackoverflow.com) 2017-06-06 19:23:14 +02:00
Gael Guennebaud
8508db52ab bug #1417: make LinSpace compatible with std::complex 2017-06-06 17:25:56 +02:00
Mehdi Goli
9aa7c30163 Merge with Benoit. 2017-05-23 10:51:34 +01:00
Mehdi Goli
b42d775f13 Temporary branch for sync with upstream 2017-05-23 10:51:14 +01:00
Benoit Steiner
615733381e Merged in mehdi_goli/opencl/FixingCmakeDependency (pull request PR-2)
Fixing Cmake Dependency for SYCL
2017-05-22 17:43:06 +00:00
Benoit Steiner
1500a67c41 Merged in mehdi_goli/opencl/TensorSupportedDevice (pull request PR-6)
Fixing supported device list.
2017-05-22 16:22:21 +00:00
Mehdi Goli
76c0fc1f95 Fixing SYCL alignment issue required by TensorFlow. 2017-05-22 16:49:32 +01:00
Mehdi Goli
2d17128d6f Fixing supported device list. 2017-05-22 16:40:33 +01:00
Mehdi Goli
61d7f3664a Fixing Cmake Dependency for SYCL 2017-05-22 14:58:28 +01:00
a-doumoulakis
a5226ce4f7 Add cmake file FindTriSYCL.cmake 2017-05-17 17:59:30 +01:00
a-doumoulakis
052426b824 Add support for triSYCL
Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options

Fix contraction kernel with correct nd_item dimension
2017-05-05 19:26:27 +01:00
Abhijit Kundu
4343db84d8 updated warning number for nvcc release 8 (V8.0.61) for the stupid warning message 'calling a __host__ function from a __host__ __device__ function is not allowed'. 2017-05-01 10:36:27 -04:00
Abhijit Kundu
9bc0a35731 Fixed nested angle brackets >> issue when compiling with cuda 8 2017-04-27 03:09:03 -04:00
Gael Guennebaud
891ac03483 Fix dense * sparse-selfadjoint-view product. 2017-04-25 13:58:10 +02:00
RJ Ryan
949a2da38c Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer.
Improves support for std::complex types when compiling for CUDA.

Expands on e2e9cdd169 and 2bda1b0d93.
2017-04-14 13:23:35 -07:00
Gael Guennebaud
d9084ac8e1 Improve mixing of complex and real in the vectorized path of apply_rotation_in_the_plane 2017-04-14 11:05:13 +02:00
Gael Guennebaud
f75dfdda7e Fix unwanted Real to Scalar to Real conversions in column-pivoting QR. 2017-04-14 10:34:30 +02:00
Gael Guennebaud
0f83aeb6b2 Improve cmake scripts for Pastix and BLAS detection. 2017-04-14 10:22:12 +02:00
Benoit Steiner
0d08165a7f Merged in benoitsteiner/opencl (pull request PR-309)
OpenCL improvements
2017-04-05 14:28:08 +00:00
Benoit Steiner
068cc09708 Preserve file naming conventions 2017-04-04 10:09:10 -07:00
Benoit Steiner
c302ea7bc4 Deleted empty line of code 2017-04-04 10:05:16 -07:00
Benoit Steiner
a5a0c8fac1 Guard sycl specific code under an EIGEN_USE_SYCL ifdef 2017-04-04 10:03:21 -07:00
Benoit Steiner
a1304b95b7 Code cleanup 2017-04-04 10:00:46 -07:00
Benoit Steiner
66c63826bd Guard the sycl specific code with EIGEN_USE_SYCL 2017-04-04 09:59:09 -07:00
Benoit Steiner
e3e343390a Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL 2017-04-04 09:56:33 -07:00
Benoit Steiner
63840d4666 Gate the sycl specific code under an EIGEN_USE_SYCL define 2017-04-04 09:54:31 -07:00
Benoit Steiner
bc050ea9f0 Fixed compilation error when sycl is enabled. 2017-04-04 09:47:04 -07:00
Gagan Goel
4910630c96 fix typos in the Tensor readme 2017-03-31 20:32:16 -04:00
Benoit Steiner
c1b3d5ecb6 Restored code compatibility with compilers that don't support c++11
Gated more sycl code under #ifdef sycl
2017-03-31 08:31:28 -07:00
Benoit Steiner
e2d5d4e7b3 Restore the old constructors to retain compatibility with non c++11 compilers. 2017-03-31 08:26:13 -07:00
Benoit Steiner
73fcaa319f Gate the sycl specific code under #ifdef sycl 2017-03-31 08:22:25 -07:00
Mehdi Goli
bd64ee8555 Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax. 2017-03-28 16:50:34 +01:00
Simon Praetorius
511810797e Issue with mpreal and std::numeric_limits, i.e. digits is not a constant. Added a digits() traits in NumTraits with fallback to static constant. Specialization for mpreal added in MPRealSupport. 2017-03-24 17:45:56 +01:00
Luke Iwanski
a91417a7a5 Introduces align allocator for SYCL buffer 2017-03-20 14:48:54 +00:00
Gael Guennebaud
aae19c70ac update has_ReturnType to be more consistent with other has_ helpers 2017-03-17 17:33:15 +01:00
Benoit Steiner
f8a622ef3c Merged eigen/eigen into default 2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b Silenced compilation warning 2017-03-15 20:02:39 -07:00
Luke Iwanski
9597d6f6ab Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing zombie thread 2017-03-15 19:28:09 +00:00
Luke Iwanski
c06861d15e Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration 2017-03-15 19:26:08 +00:00
Benoit Steiner
7f31bb6822 Merged in ilya-biryukov/eigen/fix_clang_cuda_compilation (pull request PR-304)
Fixed compilation with cuda-clang
2017-03-15 16:48:52 +00:00
Gael Guennebaud
89fd0c3881 better check array index before using it 2017-03-15 15:18:03 +01:00
Benoit Jacob
61160a21d2 ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32. 2017-03-15 06:57:25 -04:00
Benoit Steiner
f0f3591118 Made the reduction code compile with cuda-clang 2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496 Adding synchronisation to convolution kernel for sycl backend. 2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b Get rid of Init(). 2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094 Use C++11 ctor forwarding to simplify code a bit. 2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6 Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.

* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.

* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.

* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
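
A rough sketch of the two scheduling ideas described above, with hypothetical names; this is not the actual thread pool code:

    struct ThreadPoolConfigSketch {
      int num_threads;
      bool allow_spinning;  // construction-time hint; I/O-heavy pools can turn spinning off
    };

    int steal_loop_iterations(const ThreadPoolConfigSketch& cfg) {
      if (!cfg.allow_spinning || cfg.num_threads == 1)
        return 0;  // a single-thread pool runs a plain FIFO worker loop, no stealing
      const int base_spin = 5000;          // assumed constant, for illustration only
      return base_spin / cfg.num_threads;  // one iteration costs ~O(num_threads), so the
                                           // total spin time stays roughly constant
    }
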
Luke Iwanski
1b32a10053 Use the device name instead of the vendor to distinguish devices 2017-03-08 18:26:34 +00:00
Mehdi Goli
aadb7405a7 Fixing typo in sycl Benchmark. 2017-03-08 18:20:06 +00:00
Gael Guennebaud
970ff78294 bug #1401: fix compilation of "cond ? x : -x" with x an AutoDiffScalar 2017-03-08 16:16:53 +01:00
Mehdi Goli
5e9a1e7a7a Adding sycl Benchmarks. 2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533 Fixing potential race condition on sycl device. 2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95 Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch. 2017-03-07 14:27:10 +00:00
Gael Guennebaud
e5156e4d25 fix typo 2017-03-07 11:25:58 +01:00
Gael Guennebaud
5694315fbb remove UTF8 symbol 2017-03-07 10:53:47 +01:00
Gael Guennebaud
e958c2baac remove UTF8 symbols 2017-03-07 10:47:40 +01:00
Gael Guennebaud
d967718525 do not include std header within extern C 2017-03-07 10:16:39 +01:00
Gael Guennebaud
659087b622 bug #1400: fix stableNorm with EIGEN_DONT_ALIGN_STATICALLY 2017-03-07 10:02:34 +01:00
Ilya Biryukov
1c03d43a5c Fixed compilation with cuda-clang 2017-03-06 12:01:12 +01:00
Julian Kent
bbe717fa2f Make scaling work with non-square matrices 2017-03-03 12:58:51 +01:00
Benoit Steiner
a71943b9a4 Made the Tensor code compile with clang 3.9 2017-03-02 10:47:29 -08:00
Benoit Steiner
09ae0e6586 Adjusted the EIGEN_DEVICE_FUNC qualifiers to make sure that:
* they're used consistently between the declaration and the definition of a function
  * we avoid calling host only methods from host device methods.
2017-03-01 11:47:47 -08:00
Benoit Steiner
1e2d046651 Silenced a couple of compilation warnings 2017-03-01 10:13:42 -08:00
Benoit Steiner
c1d87ec110 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-03-01 10:08:50 -08:00
Benoit Steiner
3a3f040baa Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 17:06:15 -08:00
Benoit Steiner
7b61944669 Made most of the packet math primitives usable within CUDA kernel when compiling with clang 2017-02-28 17:05:28 -08:00
Benoit Steiner
c92406d613 Silenced clang compilation warning. 2017-02-28 17:03:11 -08:00
Benoit Steiner
857adbbd52 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 16:42:00 -08:00
Benoit Steiner
c36bc2d445 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 14:58:45 -08:00
Benoit Steiner
4a7df114c8 Added missing EIGEN_DEVICE_FUNC 2017-02-28 14:00:15 -08:00
Benoit Steiner
de7b0fdea9 Made the TensorStorage class compile with clang 3.9 2017-02-28 13:52:22 -08:00
Benoit Steiner
765f4cc4b4 Deleted extra EIGEN_DEVICE_FUNC: the QR and Cholesky code isn't ready to run on GPU yet. 2017-02-28 11:57:00 -08:00
Benoit Steiner
e993c94f07 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:56:45 -08:00
Benoit Steiner
33443ec2b0 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:50:10 -08:00
Benoit Steiner
f3e9c42876 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:46:30 -08:00
Mehdi Goli
8296b87d7b Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used. 2017-02-28 17:16:14 +00:00
Gael Guennebaud
4e98a7b2f0 bug #1396: add some missing EIGEN_DEVICE_FUNC 2017-02-28 09:47:38 +01:00
Gael Guennebaud
478a9f53be Fix typo. 2017-02-28 09:32:45 +01:00
Benoit Steiner
889c606f8f Added missing EIGEN_DEVICE_FUNC to the SelfCwise binary ops 2017-02-27 17:17:47 -08:00
Benoit Steiner
193939d6aa Added missing EIGEN_DEVICE_FUNC qualifiers to several nullary op methods. 2017-02-27 17:11:47 -08:00
Benoit Steiner
ed4dc9d01a Declared the plset, ploadt_ro, and ploaddup packet primitives as usable within a gpu kernel 2017-02-27 16:57:01 -08:00
Benoit Steiner
b1fc7c9a09 Added missing EIGEN_DEVICE_FUNC qualifiers. 2017-02-27 16:48:30 -08:00
Benoit Steiner
554116bec1 Added EIGEN_DEVICE_FUNC to make the prototype of the EigenBase override match that of DenseBase 2017-02-27 16:45:31 -08:00
Benoit Steiner
34d9fce93b Avoid unnecessary float to double conversions. 2017-02-27 16:33:33 -08:00
Benoit Steiner
e0bd6f5738 Merged eigen/eigen into default 2017-02-26 10:02:14 -08:00
Mehdi Goli
2fa2b617a9 Adding TensorVolumePatchOP.h for sycl 2017-02-24 19:16:24 +00:00
Mehdi Goli
0b7875f137 Converting fixed float type into template type for TensorContraction. 2017-02-24 18:13:30 +00:00
Mehdi Goli
89dfd51fae Adding Sycl Backend for TensorGenerator.h. 2017-02-22 16:36:24 +00:00
Gael Guennebaud
5c68ba41a8 typos 2017-02-21 17:10:55 +01:00
Gael Guennebaud
b0f55ef85a merge 2017-02-21 17:04:10 +01:00
Gael Guennebaud
d29e9d7119 Improve documentation of reshaped 2017-02-21 17:03:10 +01:00
Gael Guennebaud
9b6e365018 Fix linking issue. 2017-02-21 16:52:22 +01:00
Gael Guennebaud
3d200257d7 Add support for automatic-size deduction in reshaped, e.g.:
mat.reshaped(4,AutoSize); <-> mat.reshaped(4,mat.size()/4);
2017-02-21 15:57:25 +01:00
Gael Guennebaud
f8179385bd Add missing const version of mat(all). 2017-02-21 13:56:26 +01:00
Gael Guennebaud
1e3aa470fa Fix long to int conversion 2017-02-21 13:56:01 +01:00
Gael Guennebaud
b3fc0007ae Add support for mat(all) as an alias to mat.reshaped(mat.size(),fix<1>); 2017-02-21 13:49:09 +01:00
Mehdi Goli
4f07ac16b0 Reducing the number of warnings. 2017-02-21 10:09:47 +00:00
Gael Guennebaud
76687f385c bug #1394: fix compilation of SelfAdjointEigenSolver<Matrix>(sparse*sparse); 2017-02-20 14:27:26 +01:00
Gael Guennebaud
d8b1f6cebd bug #1380: fix for Map<> as input of matrix exponential 2017-02-20 14:06:06 +01:00
Gael Guennebaud
6572825703 bug #1395: fix the use of compile-time vectors as inputs of JacobiSVD. 2017-02-20 13:44:37 +01:00
Mehdi Goli
79ebc8f761 Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h. 2017-02-20 12:11:05 +00:00
Gael Guennebaud
9081c8f6ea Add support for RowOrder reshaped 2017-02-20 11:46:21 +01:00
Gael Guennebaud
a811a04696 Silence warning. 2017-02-20 10:14:21 +01:00
Gael Guennebaud
63798df038 Fix usage of CUDACC_VER 2017-02-20 08:16:36 +01:00
Gael Guennebaud
deefa54a54 Fix tracking of temporaries in unit tests 2017-02-19 10:32:54 +01:00
Gael Guennebaud
f8a55cc062 Fix compilation. 2017-02-18 10:08:13 +01:00
Gael Guennebaud
cbbf88c4d7 Use int32_t instead of int in NEON code. Some platforms with 16-bit int support ARM NEON. 2017-02-17 14:39:02 +01:00
Gael Guennebaud
582b5e39bf bug #1393: enable Matrix/Array explicit ctor from types with conversion operators (was ok with 3.2) 2017-02-17 14:10:57 +01:00
Benoit Steiner
cfa0568ef7 Size indices are signed. 2017-02-16 10:13:34 -08:00
Mehdi Goli
91982b91c0 Adding TensorLayoutSwapOp for sycl. 2017-02-15 16:28:12 +00:00
Mehdi Goli
b1e312edd6 Adding TensorPatch.h for sycl backend. 2017-02-15 10:13:01 +00:00
Benoit Steiner
31a25ab226 Merged eigen/eigen into default 2017-02-14 15:36:21 -08:00
Mehdi Goli
0d153ded29 Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test 2017-02-13 17:25:12 +00:00
Gael Guennebaud
5937c4ae32 Fall back is_integral to std::is_integral in c++11 2017-02-13 17:14:26 +01:00
Gael Guennebaud
7073430946 Fix overflow and make use of long long in c++11 only. 2017-02-13 17:14:04 +01:00
Jonathan Hseu
3453b00a1e Fix vector indexing with uint64_t 2017-02-11 21:45:32 -08:00
Gael Guennebaud
e7ebe52bfb bug #1391: include IO.h before DenseBase to enable its usage in DenseBase plugins. 2017-02-13 09:46:20 +01:00
Gael Guennebaud
b3750990d5 Workaround some gcc 4.7 warnings 2017-02-11 23:24:06 +01:00
Gael Guennebaud
4b22048cea Fallback Reshaped to MapBase when possible (same storage order and linear access to the nested expression) 2017-02-11 15:32:53 +01:00
Gael Guennebaud
83d6a529c3 Use Eigen::fix<N> to pass compile-time sizes. 2017-02-11 15:31:28 +01:00
Gael Guennebaud
c16ee72b20 bug #1392: fix #include <Eigen/Sparse> with mpl2-only 2017-02-11 10:35:01 +01:00
Gael Guennebaud
e43016367a Forgot to include a file in previous commit 2017-02-11 10:34:18 +01:00
Gael Guennebaud
6486d4fc95 Workaround gcc 4.7 issue in c++11. 2017-02-11 10:29:10 +01:00
Gael Guennebaud
4a4a72951f Fix previous commits: disable only problematic indexed view methods for old compilers instead of disabling everything.
Tested with gcc 4.7 (c++03) and gcc 4.8 (c++03 & c++11)
2017-02-11 10:28:44 +01:00
Benoit Steiner
fad776492f Merged eigen/eigen into default 2017-02-10 14:27:43 -08:00
Benoit Steiner
1ef30b8090 Fixed bug introduced in previous commit 2017-02-10 13:35:10 -08:00
Benoit Steiner
769208a17f Pulled latest updates from upstream 2017-02-10 13:11:40 -08:00
Benoit Steiner
8b3cc54c42 Added a new EIGEN_HAS_INDEXED_VIEW define that is set to 0 for older compilers that are known to fail to compile the indexed views (I used the define from the indexed_views.cpp test).
Only include the indexed view methods when the compiler supports the code.
This makes it possible to use Eigen again in complex code bases such as TensorFlow and with older compilers such as gcc 4.8
2017-02-10 13:08:49 -08:00
Gael Guennebaud
a1ff24f96a Fix pruning in (sparse*sparse).pruned() when the result is nearly dense. 2017-02-10 13:59:32 +01:00
Gael Guennebaud
0256c52359 Include clang in the list of non strict MSVC (just to be sure) 2017-02-10 13:41:52 +01:00
Alexander Neumann
dd58462e63 fixed inlining issue with clang-cl on visual studio
(grafted from 7962ac1a58)
2017-02-08 23:50:38 +01:00
Gael Guennebaud
fc8fd5fd24 Improve multi-threading heuristic for matrix products with a small number of columns. 2017-02-07 17:19:59 +01:00
Mehdi Goli
0ee97b60c2 Adding mean to TensorReductionSycl.h 2017-02-07 15:43:17 +00:00
Mehdi Goli
42bd5c4e7b Fixing TensorReductionSycl for min and max. 2017-02-06 18:05:23 +00:00
Gael Guennebaud
4254b3eda3 bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
Mehdi Goli
bc128f9f3b Reducing the warnings in Sycl backend. 2017-02-02 10:43:47 +00:00
Benoit Steiner
442e9cbb30 Silenced several compilation warnings 2017-02-01 15:50:58 -08:00
Benoit Steiner
2db75c07a6 fixed the ordering of the template and EIGEN_DEVICE_FUNC keywords in a few more places to get more of the Eigen codebase to compile with nvcc again. 2017-02-01 15:41:29 -08:00
Benoit Steiner
fcd257039b Replaced EIGEN_DEVICE_FUNC template<foo> with template<foo> EIGEN_DEVICE_FUNC to make the code compile with nvcc8. 2017-02-01 15:30:49 -08:00
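
A schematic example of the reordering, using a toy function rather than actual Eigen code (EIGEN_DEVICE_FUNC expands to __host__ __device__ when compiling for CUDA):

    #ifndef EIGEN_DEVICE_FUNC
    #define EIGEN_DEVICE_FUNC  // empty fallback so the snippet also builds without CUDA
    #endif

    // nvcc 8 rejects:  EIGEN_DEVICE_FUNC template <typename T> T twice(const T& x);
    // Accepted order:  template-parameter list first, then the qualifier.
    template <typename T>
    EIGEN_DEVICE_FUNC T twice(const T& x) { return x + x; }
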
Gael Guennebaud
84090027c4 Disable a part of the unit test for gcc 4.8 2017-02-01 23:37:44 +01:00
Gael Guennebaud
0eceea4efd Define EIGEN_COMP_GNUC to reflect version number: 47, 48, 49, 50, 60, ... 2017-02-01 23:36:40 +01:00
Mehdi Goli
ff53050034 Converting ptrdiff_t type to int64_t type in cxx11_tensor_contract_sycl.cpp in order to be the same as other tests. 2017-02-01 15:36:03 +00:00
Mehdi Goli
bab29936a1 Reducing warnings in Sycl backend. 2017-02-01 15:29:53 +00:00
Gael Guennebaud
645a8e32a5 Fix compilation of JacobiSVD for vector types 2017-01-31 16:22:54 +01:00
Mehdi Goli
48a20b7d95 Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h 2017-01-31 14:06:36 +00:00
Gael Guennebaud
53026d29d4 bug #478: fix regression in the eigen decomposition of zero matrices. 2017-01-31 14:22:42 +01:00
Benoit Steiner
fbc39fd02c Merge latest changes from upstream 2017-01-30 15:25:57 -08:00
Gael Guennebaud
63de19c000 bug #1380: fix matrix exponential with Map<> 2017-01-30 13:55:27 +01:00
Gael Guennebaud
c86911ac73 bug #1384: fix evaluation of "sparse/scalar" that used the wrong evaluation path. 2017-01-30 13:38:24 +01:00
Mehdi Goli
82ce92419e Fixing the buffer type in memcpy. 2017-01-30 11:38:20 +00:00
Gael Guennebaud
24409f3acd Use fix<> API to specify compile-time reshaped sizes. 2017-01-29 15:20:35 +01:00
Gael Guennebaud
9036cda364 Cleanup initial reshape implementation:
- reshape -> reshaped
 - make it compatible with evaluators.
2017-01-29 14:57:45 +01:00
Gael Guennebaud
0e89baa5d8 import yoco xiao's work on reshape 2017-01-29 14:29:31 +01:00
Gael Guennebaud
d024e9942d MSVC 1900 release is not c++14 compatible enough for us. The 1910 update seems to be fine though. 2017-01-27 22:17:59 +01:00
Gael Guennebaud
83592659ba merge 2017-01-27 21:59:59 +01:00
Gael Guennebaud
4a351be163 Fix warning 2017-01-27 11:59:35 +01:00
Gael Guennebaud
251ad3e04f Fix unnamed type as template parameter issue. 2017-01-27 11:57:52 +01:00
Rasmus Munk Larsen
edaa0fc5d1 Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.
2017-01-26 12:46:06 -08:00
Gael Guennebaud
25a1703579 Merged in ggael/eigen-flexidexing (pull request PR-294)
generalized operator() for indexed access and slicing
2017-01-26 08:04:23 +00:00
Gael Guennebaud
98dfe0c13f Fix useless ';' warning 2017-01-25 22:55:04 +01:00
Gael Guennebaud
28351073d8 Fix unnamed type as template argument (ok in c++11 only) 2017-01-25 22:54:51 +01:00
Gael Guennebaud
607be65a03 Fix duplicates of array_size between unsupported and Core 2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen
7d39c6d50a Merged eigen/eigen into default 2017-01-25 09:22:26 -08:00
Rasmus Munk Larsen
5c9ed4ba0d Reverse arguments for pmin in AVX. 2017-01-25 09:21:57 -08:00
Gael Guennebaud
850ca961d2 bug #1383: fix regression in LinSpaced for integers and high<low 2017-01-25 18:13:53 +01:00
Gael Guennebaud
296d24be4d bug #1381: fix sparse.diagonal() used as an rvalue.
The problem was that if "sparse" is not const, then sparse.diagonal() must have the
LValueBit flag, meaning that sparse.diagonal().coeff(i) must return a const reference,
const Scalar&. However, sparse::coeff() cannot return a reference for a non-existing
zero coefficient. The trick is to return a reference to a local member of
evaluator<SparseMatrix>.
2017-01-25 17:39:01 +01:00
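
A simplified sketch of that trick with assumed names (not the real evaluator<SparseMatrix> code): coeff(i) must hand out a const Scalar&, so implicit zeros are served from a member of the evaluator itself:

    #include <map>

    template <typename Scalar>
    struct DiagonalEvaluatorSketch {
      std::map<int, Scalar> stored;  // explicitly stored diagonal coefficients
      Scalar zero = Scalar(0);

      const Scalar& coeff(int i) const {
        auto it = stored.find(i);
        // Implicit zeros have no storage, so return a reference to the local member.
        return it != stored.end() ? it->second : zero;
      }
    };
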
Gael Guennebaud
d06a48959a bug #1383: Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0. 2017-01-25 15:27:13 +01:00
Rasmus Munk Larsen
ae3e43a125 Remove extra space. 2017-01-24 16:16:39 -08:00
Benoit Steiner
e96c77668d Merged in rmlarsen/eigen2 (pull request PR-292)
Adds a fast memcpy function to Eigen.
2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen
3be5ee2352 Update copy helper to use fast_memcpy. 2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen
e6b1020221 Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.

2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux

Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.

The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.

Measured improvements in wall clock time:

Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                          3.48      2.39    +31.3%
BM_memcpy_1T/8                          12.3      6.51    +47.0%
BM_memcpy_1T/64                          371       383     -3.2%
BM_memcpy_1T/512                       66922     66720     +0.3%
BM_memcpy_1T/4k                      9892867   6849682    +30.8%
BM_memcpy_1T/5k                     14951099  10332856    +30.9%
BM_memcpy_2T/2                          3.50      2.46    +29.7%
BM_memcpy_2T/8                          12.3      7.66    +37.7%
BM_memcpy_2T/64                          371       376     -1.3%
BM_memcpy_2T/512                       66652     66788     -0.2%
BM_memcpy_2T/4k                      6145012   6117776     +0.4%
BM_memcpy_2T/5k                      9181478   9010942     +1.9%
BM_memcpy_4T/2                          3.47      2.47    +31.0%
BM_memcpy_4T/8                          12.3      6.67    +45.8%
BM_memcpy_4T/64                          374       376     -0.5%
BM_memcpy_4T/512                       67833     68019     -0.3%
BM_memcpy_4T/4k                      5057425   5188253     -2.6%
BM_memcpy_4T/5k                      7555638   7779468     -3.0%
BM_memcpy_6T/2                          3.51      2.50    +28.8%
BM_memcpy_6T/8                          12.3      7.61    +38.1%
BM_memcpy_6T/64                          373       378     -1.3%
BM_memcpy_6T/512                       66871     66774     +0.1%
BM_memcpy_6T/4k                      5112975   5233502     -2.4%
BM_memcpy_6T/5k                      7614180   7772246     -2.1%
BM_memcpy_8T/2                          3.47      2.41    +30.5%
BM_memcpy_8T/8                          12.4      10.5    +15.3%
BM_memcpy_8T/64                          372       388     -4.3%
BM_memcpy_8T/512                       67373     66588     +1.2%
BM_memcpy_8T/4k                      5148462   5254897     -2.1%
BM_memcpy_8T/5k                      7660989   7799058     -1.8%
BM_memcpy_12T/2                         3.50      2.40    +31.4%
BM_memcpy_12T/8                         12.4      7.55    +39.1%
BM_memcpy_12T/64                         374       378     -1.1%
BM_memcpy_12T/512                      67132     66683     +0.7%
BM_memcpy_12T/4k                     5185125   5292920     -2.1%
BM_memcpy_12T/5k                     7717284   7942684     -2.9%
BM_slicingSmallPieces_1T/2              47.3      47.5     +0.4%
BM_slicingSmallPieces_1T/8              53.6      52.3     +2.4%
BM_slicingSmallPieces_1T/64              491       476     +3.1%
BM_slicingSmallPieces_1T/512           21734     18814    +13.4%
BM_slicingSmallPieces_1T/4k           394660    396760     -0.5%
BM_slicingSmallPieces_1T/5k           218722    209244     +4.3%
BM_slicingSmallPieces_2T/2              80.7      79.9     +1.0%
BM_slicingSmallPieces_2T/8              54.2      53.1     +2.0%
BM_slicingSmallPieces_2T/64              497       477     +4.0%
BM_slicingSmallPieces_2T/512           21732     18822    +13.4%
BM_slicingSmallPieces_2T/4k           392885    390490     +0.6%
BM_slicingSmallPieces_2T/5k           221988    208678     +6.0%
BM_slicingSmallPieces_4T/2              80.8      80.1     +0.9%
BM_slicingSmallPieces_4T/8              54.1      53.2     +1.7%
BM_slicingSmallPieces_4T/64              493       476     +3.4%
BM_slicingSmallPieces_4T/512           21702     18758    +13.6%
BM_slicingSmallPieces_4T/4k           393962    404023     -2.6%
BM_slicingSmallPieces_4T/5k           249667    211732    +15.2%
BM_slicingSmallPieces_6T/2              80.5      80.1     +0.5%
BM_slicingSmallPieces_6T/8              54.4      53.4     +1.8%
BM_slicingSmallPieces_6T/64              488       478     +2.0%
BM_slicingSmallPieces_6T/512           21719     18841    +13.3%
BM_slicingSmallPieces_6T/4k           394950    397583     -0.7%
BM_slicingSmallPieces_6T/5k           223080    210148     +5.8%
BM_slicingSmallPieces_8T/2              81.2      80.4     +1.0%
BM_slicingSmallPieces_8T/8              58.1      53.5     +7.9%
BM_slicingSmallPieces_8T/64              489       480     +1.8%
BM_slicingSmallPieces_8T/512           21586     18798    +12.9%
BM_slicingSmallPieces_8T/4k           394592    400165     -1.4%
BM_slicingSmallPieces_8T/5k           219688    208301     +5.2%
BM_slicingSmallPieces_12T/2             80.2      79.8     +0.7%
BM_slicingSmallPieces_12T/8             54.4      53.4     +1.8%
BM_slicingSmallPieces_12T/64             488       476     +2.5%
BM_slicingSmallPieces_12T/512          21931     18831    +14.1%
BM_slicingSmallPieces_12T/4k          393962    396541     -0.7%
BM_slicingSmallPieces_12T/5k          218803    207965     +5.0%
2017-01-24 13:55:18 -08:00
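
A hedged sketch of the size-based dispatch this change introduced (the change was later reverted, see the revert of PR-292 above); the threshold below is an assumption for illustration, not the value used in the patch:

    #include <cstddef>
    #include <cstring>

    inline void fast_copy_sketch(void* dst, const void* src, std::size_t n) {
      const std::size_t large_threshold = 32 * 1024;  // assumed cutoff
      if (n <= large_threshold)
        std::memcpy(dst, src, n);   // small sizes: compilers inline memcpy well
      else
        std::memmove(dst, src, n);  // large sizes: memmove measured faster on the tested setup
    }
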
Rasmus Munk Larsen
7b6aaa3440 Fix NaN propagation for AVX512. 2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen
5e144bbaa4 Make NaN propagation consistent between pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
2017-01-24 13:32:50 -08:00
Gael Guennebaud
d83db761a2 Add support for std::integral_constant 2017-01-24 16:28:12 +01:00
Gael Guennebaud
bc10201854 Add test for multiple symbols 2017-01-24 16:27:51 +01:00
Gael Guennebaud
c43d254d13 Fix seq().reverse() in c++98 2017-01-24 11:36:43 +01:00
Gael Guennebaud
5783158e8f Add unit test for FixedInt and Symbolic 2017-01-24 10:55:12 +01:00
Gael Guennebaud
ddd83f82d8 Add support for "SymbolicExpr op fix<N>" in C++98/11 mode. 2017-01-24 10:54:42 +01:00
Gael Guennebaud
228fef1b3a Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|) 2017-01-24 10:53:51 +01:00
Gael Guennebaud
bb52f74e62 Add internal doc 2017-01-24 10:13:35 +01:00
Gael Guennebaud
41c523a0ab Rename fix_t to FixedInt 2017-01-24 09:39:49 +01:00
Gael Guennebaud
156e6234f1 bug #1375: fix cmake installation with cmake 2.8 2017-01-24 09:16:40 +01:00
Gael Guennebaud
ba3f977946 bug #1376: add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j)) 2017-01-23 22:06:08 +01:00
Gael Guennebaud
b0db4eff36 bug #1382: move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!) 2017-01-23 22:03:57 +01:00
Gael Guennebaud
ca79c1545a Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t 2017-01-23 22:02:53 +01:00
Gael Guennebaud
4b607b5692 Use Index instead of size_t 2017-01-23 22:00:33 +01:00
Luke Iwanski
bf44fed9b7 Allows AMD APU 2017-01-23 15:56:45 +00:00
Gael Guennebaud
0fe278f7be bug #1379: fix compilation in sparse*diagonal*dense with openmp 2017-01-21 23:27:01 +01:00
Gael Guennebaud
22a172751e bug #1378: fix doc (DiagonalIndex vs Diagonal) 2017-01-21 22:09:59 +01:00
Mehdi Goli
602f8c27f5 Reverting back to the previous TensorDeviceSycl.h as the total number of buffers is not enough for tensorflow. 2017-01-20 18:23:20 +00:00
Gael Guennebaud
4d302a080c Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only) 2017-01-19 20:34:18 +01:00
Gael Guennebaud
54f3fbee24 Exploit fixed values in seq and reverse with C++98 compatibility 2017-01-19 19:57:32 +01:00
Gael Guennebaud
7691723e34 Add support for fixed-value in symbolic expression, c++11 only for now. 2017-01-19 19:25:29 +01:00
Benoit Steiner
924600a0e8 Made sure that enabling avx2 instructions enables avx and sse instructions as well. 2017-01-19 09:54:48 -08:00
Mehdi Goli
77cc4d06c7 Removing unused variables 2017-01-19 17:06:21 +00:00
Mehdi Goli
837fdbdcb2 Merging with Benoit's upstream. 2017-01-19 11:34:34 +00:00
Mehdi Goli
6bdd15f572 Adding non-dereferenceable pointer tracking for the ComputeCpp backend; adding TensorConvolutionOp for ComputeCpp; fixing typos; modifying TensorDeviceSycl to use the LegacyPointer class. 2017-01-19 11:30:59 +00:00
Benoit Steiner
aa7fb88dfa Merged in LaFeuille/eigen (pull request PR-289)
Fix a typo
2017-01-18 16:44:39 -08:00
Gael Guennebaud
e84ed7b6ef Remove dead code 2017-01-18 23:18:28 +01:00
Gael Guennebaud
f3ccbe0419 Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimizes the usage of last and end. 2017-01-18 23:16:32 +01:00
Mehdi Goli
c6f7b33834 Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcopy. 2017-01-18 10:45:28 +00:00
Gael Guennebaud
15471432fe Add a .reverse() member to ArithmeticSequence. 2017-01-18 11:35:27 +01:00
Gael Guennebaud
e4f8dd860a Add missing operator* 2017-01-18 10:49:01 +01:00
Gael Guennebaud
198507141b Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n) 2017-01-18 09:43:58 +01:00
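
A usage sketch of block expressions taking compile-time sizes, following the documented fix<N> API (fix<N> as a value requires C++14; with C++11 one writes fix<N>()):

    #include <Eigen/Dense>

    void block_demo(const Eigen::MatrixXd& A) {
      // Compile-time 3x4 block at (1,2); equivalent to A.block<3,4>(1,2).
      auto b1 = A.block(1, 2, Eigen::fix<3>, Eigen::fix<4>);
      // Compile-time and runtime sizes can be mixed freely.
      auto b2 = A.topLeftCorner(Eigen::fix<2>, A.cols());
      (void)b1;
      (void)b2;
    }
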
Gael Guennebaud
5484ddd353 Merge the generic and dynamic overloads of block() 2017-01-17 22:11:46 +01:00
Gael Guennebaud
655ba783f8 Defer set-to-zero in triangular = product so that no aliasing issue occurs in the common case
A.triangularView() = B*A.selfadjointView()*B.adjoint()
that used to work in 3.2.
2017-01-17 18:03:35 +01:00
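
For reference, the pattern kept working by this change, written out in compilable form (the template arguments on the views are added here; A and B are assumed square and of the same size):

    #include <Eigen/Dense>

    void rank_update_like(Eigen::MatrixXd& A, const Eigen::MatrixXd& B) {
      // Evaluated directly into the lower triangle of A without a prior set-to-zero,
      // so the self-adjoint view of A on the right-hand side is not clobbered first.
      A.triangularView<Eigen::Lower>() =
          B * A.selfadjointView<Eigen::Lower>() * B.adjoint();
    }
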
Gael Guennebaud
5e36ec3b6f Fix regression when passing enums to operator() 2017-01-17 17:10:16 +01:00
Gael Guennebaud
f7852c3d16 Fix -Wunnamed-type-template-args 2017-01-17 16:05:58 +01:00
Gael Guennebaud
4f36dcfda8 Add a generic block() method compatible with Eigen::fix 2017-01-17 11:34:28 +01:00
Gael Guennebaud
71e5b71356 Add a get_runtime_value helper to deal with pointer-to-function hack,
plus some refactoring to make the internals more consistent.
2017-01-17 11:33:57 +01:00
Gael Guennebaud
59801a3250 Add \newin{3.x} doxygen command 2017-01-17 10:31:28 +01:00
Gael Guennebaud
23bfcfc15f Add missing overload of get_compile_time for c++98/11 2017-01-17 10:30:21 +01:00
Gael Guennebaud
edff32c2c2 Disambiguate the two versions of fix for doxygen 2017-01-17 10:29:33 +01:00
Gael Guennebaud
4989922be2 Add support for symbolic expressions as arguments of operator() 2017-01-16 22:21:23 +01:00
Gael Guennebaud
12e22a2844 typos in doc 2017-01-16 16:31:19 +01:00
Gael Guennebaud
e70c4c97fa Typo 2017-01-16 16:20:16 +01:00
Gael Guennebaud
a9232af845 Introduce a variable_or_fixed<N> proxy returned by fix<N>(val) to pass both a compile-time and runtime fallback value in case N means "runtime".
This mechanism is used by the seq/seqN functions. The proxy object is immediately converted to pure compile-time (as fix<N>) or pure runtime (i.e., an Index) to avoid redundant template instantiations.
2017-01-16 16:17:01 +01:00
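
A minimal sketch of such a proxy with assumed names (Eigen's internal type differs): it carries both the compile-time value N and a runtime fallback, and immediately decays to one or the other at the call site:

    template <int N>
    struct VariableOrFixedSketch {
      explicit VariableOrFixedSketch(int fallback) : value(fallback) {}
      static const int compile_time = N;       // used when N is a real compile-time size
      operator int() const { return value; }   // decays to a plain runtime index otherwise
      int value;
    };
    // A hypothetical fix<N>(n) would return VariableOrFixedSketch<N>(n), and seq/seqN
    // would pick either ::compile_time or the int conversion, never keeping the proxy.
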
Gael Guennebaud
6e97698161 Introduce an EIGEN_HAS_CXX14 macro 2017-01-16 16:13:37 +01:00
Mehdi Goli
e46e722381 Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree. 2017-01-16 13:58:49 +00:00
Luke Iwanski
23778a15d8 Reverting unintentional change to Eigen/Geometry 2017-01-16 11:05:56 +00:00
LaFeuille
1b19b80c06 Fix a typo 2017-01-13 07:24:55 +00:00
Fraser Cormack
8245d3c7ad Fix case-sensitivity of file include 2017-01-12 12:13:18 +00:00
Gael Guennebaud
752bd92ba5 Large code refactoring:
- generalize some utilities and move them to Meta (size(), array_size())
 - move handling of all and single indices to IndexedViewHelper.h
 - several cleanup changes
2017-01-11 17:24:02 +01:00
Gael Guennebaud
f93d1c58e0 Make get_compile_time compatible with variable_if_dynamic 2017-01-11 17:08:59 +01:00
Gael Guennebaud
c020d307a6 Make variable_if_dynamic<T> implicitly convertible to T 2017-01-11 17:08:05 +01:00
Gael Guennebaud
43c617e2ee merge 2017-01-11 14:33:37 +01:00
Gael Guennebaud
152cd57bb7 Enable generation of doc for static variables in Eigen's namespace. 2017-01-11 14:29:20 +01:00
Gael Guennebaud
b1dc0fa813 Move fix and symbolic to their own file, and improve doxygen compatibility 2017-01-11 14:28:28 +01:00
Gael Guennebaud
04397f17e2 Add 1D overloads of operator() 2017-01-11 13:17:09 +01:00
Gael Guennebaud
45199b9773 Fix typo 2017-01-11 09:34:08 +01:00
Gael Guennebaud
1b5570988b Add doc to seq, seqN, ArithmeticSequence, operator(), etc. 2017-01-10 22:58:58 +01:00
Gael Guennebaud
17eac60446 Factorize const and non-const version of the generic operator() method. 2017-01-10 21:45:55 +01:00
Gael Guennebaud
d072fc4b14 add writeable IndexedView 2017-01-10 17:10:35 +01:00
Gael Guennebaud
c9d5e5c6da Simplify Symbolic API: std::tuple is now used internally and automatically built. 2017-01-10 16:55:07 +01:00
Gael Guennebaud
407e7b7a93 Simplify symbolic API by using "symbol=value" to associate a runtime value to a symbol. 2017-01-10 16:45:32 +01:00
Gael Guennebaud
96e6cf9aa2 Fix linking issue. 2017-01-10 16:35:46 +01:00
Gael Guennebaud
e63678bc89 Fix ambiguous call 2017-01-10 16:33:40 +01:00
Gael Guennebaud
8e247744a4 Fix linking issue 2017-01-10 16:32:06 +01:00
Gael Guennebaud
b47a7e5c3a Add doc for IndexedView 2017-01-10 16:28:57 +01:00
Gael Guennebaud
87963f441c Fallback to Block<> when possible (Index, all, seq with > increment).
This is important to take advantage of the optimized implementations (evaluator, products, etc.),
and to support sparse matrices.
2017-01-10 14:25:30 +01:00
Gael Guennebaud
a98c7efb16 Add a more generic evaluation mechanism and minimalistic doc. 2017-01-10 11:46:29 +01:00
Gael Guennebaud
13d954f270 Cleanup Eigen's namespace 2017-01-10 11:06:02 +01:00
Gael Guennebaud
9eaab4f9e0 Refactoring: move all symbolic stuff into its own namespace 2017-01-10 10:57:08 +01:00
Gael Guennebaud
acd08900c9 Move 'last' and 'end' to their own namespace 2017-01-10 10:31:07 +01:00
Gael Guennebaud
1df2377d78 Implement c++98 version of seq() 2017-01-10 10:28:45 +01:00
Gael Guennebaud
ecd9cc5412 Isolate legacy code (we keep it for performance comparison purposes) 2017-01-10 09:34:25 +01:00
Gael Guennebaud
b50c3e967e Add a minimalistic symbolic scalar type with expression template and make use of it to define the last placeholder and to unify the return type of seq and seqN. 2017-01-09 23:42:16 +01:00
Gael Guennebaud
68064e14fa Rename span/range to seqN/seq 2017-01-09 17:35:21 +01:00
Gael Guennebaud
ad3eef7608 Add link to SO 2017-01-09 13:01:39 +01:00
Gael Guennebaud
75aef5b37f Fix extraction of compile-time size of std::array with gcc 2017-01-06 22:04:49 +01:00
Gael Guennebaud
233dff1b35 Add support for plain arrays for columns and both rows/columns 2017-01-06 22:01:53 +01:00
Gael Guennebaud
76e183bd52 Propagate compile-time size for plain arrays 2017-01-06 22:01:23 +01:00
Gael Guennebaud
3264d3c761 Add support for plain-array as indices, e.g., mat({1,2,3,4}) 2017-01-06 21:53:32 +01:00
Gael Guennebaud
831fffe874 Add missing doc of SparseView 2017-01-06 18:01:29 +01:00
Gael Guennebaud
a875167d99 Propagate compile-time increment and strides.
Had to introduce an UndefinedIncr constant for non-structured lists of indices.
2017-01-06 15:54:55 +01:00
Gael Guennebaud
e383d6159a MSVC 2015 has all we want about c++11 and MSVC 2017 fails on binder1st/binder2nd 2017-01-06 15:44:13 +01:00
Gael Guennebaud
fad1fa75b3 Propagate compile-time size with "all" and add c++11 array unit test 2017-01-06 13:29:33 +01:00
Gael Guennebaud
3730e3ca9e Use "fix" for compile-time values, propagate compile-time sizes for span, clean some cleanup. 2017-01-06 13:10:10 +01:00
Gael Guennebaud
60e99ad8d7 Add unit test for indexed views 2017-01-06 11:59:08 +01:00
Gael Guennebaud
ac7e4ac9c0 Initial commit to add a generic index-based view of matrices.
This version already works as a read-only expression.
Numerous refactoring, renaming, extension, tuning passes are expected...
2017-01-06 00:01:44 +01:00
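
A small usage sketch of the index-based views this series introduces, using the API as documented in later Eigen releases (spellings may have differed in this initial read-only version):

    #include <Eigen/Dense>

    void indexed_view_demo(const Eigen::MatrixXd& A) {
      using namespace Eigen;
      auto top3  = A(seqN(0, 3), all);           // rows 0,1,2, every column
      auto last2 = A(all, seq(last - 1, last));  // the last two columns
      auto evens = A(seq(0, last, 2), 0);        // every other row of column 0
      (void)top3;
      (void)last2;
      (void)evens;
    }
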
Gael Guennebaud
f3f026c9aa Convert integers to real numbers when computing relative L2 error 2017-01-05 13:36:08 +01:00
Jim Radford
0c226644d8 LLT: const the arg to solveInPlace() to allow passing .transpose(), .block(), etc. 2017-01-04 14:42:57 -08:00
Jim Radford
be281e5289 LLT: avoid making a copy when decomposing in place 2017-01-04 14:43:56 -08:00
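
A usage sketch of in-place decomposition via Ref<>, as described in Eigen's "Inplace matrix decompositions" documentation (the values are just an example):

    #include <Eigen/Dense>

    void llt_inplace_demo() {
      Eigen::MatrixXd A(3, 3);
      A << 4, 1, 0,
           1, 3, 1,
           0, 1, 2;
      // Factorize directly inside A's storage: no extra copy of the matrix is made.
      Eigen::LLT<Eigen::Ref<Eigen::MatrixXd>> llt(A);
      Eigen::VectorXd b = Eigen::VectorXd::Ones(3);
      Eigen::VectorXd x = llt.solve(b);
      (void)x;
    }
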
Gael Guennebaud
e27f17bf5c bug #1453: fix Map with non-default inner-stride but no outer-stride. 2017-08-22 13:27:37 +02:00
Gael Guennebaud
21d0a0bcf5 bug #1456: add perf recommendation for LLT and storage format 2017-08-22 12:46:35 +02:00
Gael Guennebaud
2c3d70d915 Re-enable hidden doc in LLT 2017-08-22 12:04:09 +02:00
Gael Guennebaud
a6e7a41a55 bug #1455: Cholesky module depends on Jacobi for rank-updates. 2017-08-22 11:37:32 +02:00
Gael Guennebaud
e6021cc8cc bug #1458: fix documentation of LLT and LDLT info() method. 2017-08-22 11:32:55 +02:00
Gael Guennebaud
2810ba194b Clarify MKL_DIRECT_CALL doc. 2017-08-17 22:12:26 +02:00
Gael Guennebaud
f727844658 use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
Gael Guennebaud
8c858bd891 Clarify doc regarding the usage of MKL_DIRECT_CALL 2017-08-17 12:17:45 +02:00
Gael Guennebaud
b95f92843c Fix support for MKL's BLAS when using MKL_DIRECT_CALL. 2017-08-17 12:07:10 +02:00
Gael Guennebaud
89c01a494a Add unit test for has_ReturnType 2017-08-17 11:55:00 +02:00
Gael Guennebaud
687bedfcad Make NoAlias and JacobiRotation compatible with CUDA. 2017-08-17 11:51:22 +02:00
Gael Guennebaud
1f4b24d2df Do not preallocate more space than the matrix size (when the sparse matrix boils down to a vector 2017-07-20 10:13:48 +02:00
Gael Guennebaud
d580a90c9a Disable BDCSVD preallocation check. 2017-07-20 10:03:54 +02:00
Gael Guennebaud
55d7181557 Fix lazyness of operator* with CUDA 2017-07-20 09:47:28 +02:00
Gael Guennebaud
cda47c42c2 Fix compilation in c++98 mode. 2017-07-17 21:08:20 +02:00
Gael Guennebaud
a74b9ba7cd Update documentation for CUDA 2017-07-17 11:05:26 +02:00
Gael Guennebaud
3182bdbae6 Disable vectorization when compiled by nvcc, even if EIGEN_NO_CUDA is defined 2017-07-17 11:01:28 +02:00
Gael Guennebaud
9f8136ff74 disable nvcc boolean-expr-is-constant warning 2017-07-17 10:43:18 +02:00
Gael Guennebaud
bbd97b4095 Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases 2017-07-17 01:02:51 +02:00
Gael Guennebaud
2299717fd5 Fix and workaround several doxygen issues/warnings 2017-01-04 23:27:33 +01:00
Luke Iwanski
90c5bc8d64 Fixes auto appearance in functor template argument for reduction. 2017-01-04 22:18:44 +00:00
Gael Guennebaud
ee6f7f6c0c Add doc for sparse triangular solve functions 2017-01-04 23:10:36 +01:00
Gael Guennebaud
5165de97a4 Add missing snippet files. 2017-01-04 23:08:27 +01:00
Gael Guennebaud
a0a36ad0ef bug #1336: workaround doxygen failing to include numerous members of MatrixBase in Matrix 2017-01-04 22:02:39 +01:00
Gael Guennebaud
29a1a58113 Document selfadjointView 2017-01-04 22:01:50 +01:00
Gael Guennebaud
a5ebc92f8d bug #1336: fix doxygen issue regarding EIGEN_CWISE_BINARY_RETURN_TYPE 2017-01-04 18:21:44 +01:00
Gael Guennebaud
45b289505c Add debug output 2017-01-03 11:31:02 +01:00
Gael Guennebaud
5838f078a7 Fix inclusion 2017-01-03 11:30:27 +01:00
Gael Guennebaud
8702562177 bug #1370: add doc for StorageIndex 2017-01-03 11:25:41 +01:00
Gael Guennebaud
575c078759 bug #1370: rename _Index to _StorageIndex in SparseMatrix, and add a warning in the doc regarding the 3.2 to 3.3 change of SparseMatrix::Index 2017-01-03 11:19:14 +01:00
NeroBurner
c4fc2611ba add cmake-option to enable/disable creation of tests
* * *
disable unsupported/test when tests are disabled
* * *
rename EIGEN_ENABLE_TESTS to BUILD_TESTING
* * *
consider BUILD_TESTING in blas
2017-01-02 09:09:21 +01:00
Valentin Roussellet
d3c5525c23 Added += and + operators to inner iterators
Fix #1340
2016-12-28 18:29:30 +01:00
Gael Guennebaud
5c27962453 Move common cwise-unary method from MatrixBase/ArrayBase to the common DenseBase class. 2017-01-02 22:27:07 +01:00
Marco Falke
4ebf69394d doc: Fix trivial typo in AsciiQuickReference.txt
* * *
fixup!
2017-01-01 13:25:48 +00:00
Gael Guennebaud
8d7810a476 bug #1365: fix another type mismatch warning
(sync is set from and compared to an Index)
2016-12-28 23:35:43 +01:00
Gael Guennebaud
97812ff0d3 bug #1369: fix type mismatch warning.
Returned values of omp thread id and numbers are int,
so let's use int instead of Index here.
2016-12-28 23:29:35 +01:00
Gael Guennebaud
7713e20fd2 Fix compilation 2016-12-27 22:04:58 +01:00
Gael Guennebaud
ab69a7f6d1 Cleanup because trait<CwiseBinaryOp>::Flags now expose the correct storage order 2016-12-27 16:55:47 +01:00
Gael Guennebaud
d32a43e33a Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly. 2016-12-27 16:35:45 +01:00
Gael Guennebaud
7136267461 Add missing .outer() member to iterators of evaluators of cwise sparse binary expression 2016-12-27 16:34:30 +01:00
Gael Guennebaud
fe0ee72390 Fix check of storage order mismatch for "sparse cwiseop sparse". 2016-12-27 16:33:19 +01:00
Gael Guennebaud
6b8f637ab1 Harmless typo 2016-12-27 16:31:17 +01:00
Benoit Steiner
3eda02d78d Fixed the sycl benchmarking code 2016-12-22 10:37:05 -08:00
Mehdi Goli
8b1c2108ba Reverting asynchronous exec to synchronous exec due to a random race condition. 2016-12-22 16:45:38 +00:00
Benoit Steiner
354baa0fb1 Avoid using horizontal adds since they're not very efficient. 2016-12-21 20:55:07 -08:00
Benoit Steiner
d7825b6707 Use native AVX512 types instead of Eigen Packets whenever possible. 2016-12-21 20:06:18 -08:00
Benoit Steiner
660da83e18 Pulled latest update from trunk 2016-12-21 16:43:27 -08:00
Benoit Steiner
4236aebe10 Simplified the contraction code 2016-12-21 16:42:56 -08:00
Benoit Steiner
3cfa16f41d Merged in benoitsteiner/opencl (pull request PR-279)
Fix for auto appearing in functor template argument.
2016-12-21 15:08:54 -08:00
Benoit Steiner
519d63d350 Added support for libxsmm kernel in multithreaded contractions 2016-12-21 15:06:06 -08:00
Benoit Steiner
0657228569 Simplified the way we link libxsmm 2016-12-21 14:40:08 -08:00
Benoit Steiner
bbca405f04 Pulled latest updates from trunk 2016-12-21 13:45:28 -08:00
Benoit Steiner
b91be60220 Automatically include and link libxsmm when present. 2016-12-21 13:44:59 -08:00
Gael Guennebaud
c6882a72ed Merged in joaoruileal/eigen (pull request PR-276)
Minor improvements to Umfpack support
2016-12-21 21:39:48 +01:00
Benoit Steiner
f9eff17e91 Leverage libxsmm kernels within single threaded contractions 2016-12-21 12:32:06 -08:00
Benoit Steiner
c19fe5e9ed Added support for libxsmm in the eigen makefiles 2016-12-21 10:43:40 -08:00
Benoit Steiner
a34d4ebd74 Merged in benoitsteiner/opencl (pull request PR-278) 2016-12-21 08:24:17 -08:00
Luke Iwanski
c55ecfd820 Fix for auto appearing in functor template argument. 2016-12-21 15:42:51 +00:00
Joao Rui Leal
c8c89b5e19 renamed methods umfpackReportControl(), umfpackReportInfo(), and umfpackReportStatus() from UmfPackLU to printUmfpackControl(), printUmfpackInfo(), and printUmfpackStatus() 2016-12-21 09:16:28 +00:00
Benoit Steiner
0f577d4744 Merged eigen/eigen into default 2016-12-20 17:02:06 -08:00
Gael Guennebaud
f2f9df8aa5 Remove MSVC warning 4127 - conditional expression is constant from the disabled list as we now have a local workaround. 2016-12-20 22:53:19 +01:00
Gael Guennebaud
2b3fc981b8 bug #1362: workaround constant conditional warning produced by MSVC 2016-12-20 22:52:27 +01:00
Luke Iwanski
29186f766f Fixed order of initialisation in ExecExprFunctorKernel functor. 2016-12-20 21:32:42 +00:00
Gael Guennebaud
94e8d8902f Fix bug #1367: compilation fix for gcc 4.1! 2016-12-20 22:17:01 +01:00
Gael Guennebaud
e8d6862f14 Properly adjust precision when saving to Market format. 2016-12-20 22:10:33 +01:00
Gael Guennebaud
e2f4ee1c2b Speed up parsing of sparse Market file. 2016-12-20 21:56:21 +01:00
Luke Iwanski
8245851d1b Matching parameters order between lambda and the functor. 2016-12-20 16:18:15 +00:00
Gael Guennebaud
684cfc762d Add transpose, adjoint, conjugate methods to SelfAdjointView (useful to write generic code) 2016-12-20 16:33:53 +01:00
Gael Guennebaud
8bd0d3aa34 merge 2016-12-20 15:56:00 +01:00
Gael Guennebaud
11f55b2979 Optimize storage layout of Cwise* and PlainObjectBase evaluator to remove the functor or outer-stride if they are empty.
For instance, sizeof("(A-B).cwiseAbs2()") with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.
In theory, evaluators should be completely optimized away by the compiler, but this might help in some cases.
2016-12-20 15:55:40 +01:00
Gael Guennebaud
5271474b15 Remove common "noncopyable" base class from evaluator_base to get a chance to get EBO (Empty Base Optimization)
Note: we should probably get rid of this class and define a macro instead.
2016-12-20 15:51:30 +01:00
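A stand-alone sketch of the empty-base-optimization idea behind the two changesets above (hypothetical names, not Eigen's actual evaluator types):

    #include <cstdio>

    struct SubFunctor {  // stateless functor, in the spirit of Eigen's scalar_difference_op
      double operator()(double a, double b) const { return a - b; }
    };

    // Storing the functor as a data member costs sizeof(functor) plus padding...
    template <class Functor>
    struct evaluator_with_member {
      Functor functor;
      const double* lhs;
      const double* rhs;
    };

    // ...whereas inheriting from an empty functor lets EBO remove that cost entirely.
    template <class Functor>
    struct evaluator_with_ebo : private Functor {
      const double* lhs;
      const double* rhs;
      const Functor& functor() const { return *this; }
    };

    int main() {
      std::printf("%zu vs %zu\n",
                  sizeof(evaluator_with_member<SubFunctor>),  // typically 24 on x86-64
                  sizeof(evaluator_with_ebo<SubFunctor>));    // typically 16
      return 0;
    }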
Christoph Hertzberg
1c024e5585 Added some possible temporaries to .hgignore 2016-12-20 14:45:44 +01:00
Gael Guennebaud
316673bbde Clean-up usage of ExpressionTraits in all/any implementation. 2016-12-20 14:38:05 +01:00
Benoit Steiner
548ed30a1c Added an OpenCL regression test 2016-12-19 18:56:26 -08:00
Christoph Hertzberg
10c6bcdc2e Add support for long indexes and for (real-valued) row-major matrices to CholmodSupport module 2016-12-19 14:07:42 +01:00
Gael Guennebaud
f5d644b415 Make sure that HyperPlane::transform maintains a unit normal vector in the Affine case. 2016-12-20 09:35:00 +01:00
Benoit Steiner
27ceb43bf6 Fixed race condition in the tensor_shuffling_sycl test 2016-12-19 15:34:42 -08:00
Benoit Steiner
923acadfac Fixed compilation errors with gcc6 when compiling the AVX512 intrinsics 2016-12-19 13:02:27 -08:00
Benoit Jacob
751e097c57 Use 32 registers on ARM64 2016-12-19 13:44:46 -05:00
Benoit Steiner
fb1d0138ec Include SSE packet instructions when compiling with avx512 enabled. 2016-12-19 07:32:48 -08:00
Joao Rui Leal
95b804c0fe it is now possible to change Umfpack control settings before factorizations; added access to the report functions of Umfpack 2016-12-19 10:45:59 +00:00
Gael Guennebaud
8c0e701504 bug #1360: fix sign issue with pmull on altivec 2016-12-18 22:13:19 +00:00
Gael Guennebaud
fc94258e77 Fix unused warning 2016-12-18 22:11:48 +00:00
Benoit Steiner
0e0d92d34b Merged in benoitsteiner/opencl (pull request PR-275)
Improved support for OpenCL
2016-12-17 10:14:17 -08:00
Benoit Steiner
9e03dfb452 Made sure EIGEN_HAS_C99_MATH is defined when compiling OpenCL code 2016-12-17 09:23:37 -08:00
Benoit Steiner
70d0172f0c Merged eigen/eigen into default 2016-12-16 17:37:04 -08:00
Benoit Steiner
8910442e19 Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl. 2016-12-16 15:45:04 -08:00
Luke Iwanski
54db66c5df struct -> class in order to silence compilation warning. 2016-12-16 20:25:20 +00:00
Mehdi Goli
35bae513a0 Converting all parallel-for lambdas to functors in order to prevent kernel name duplication errors; adding TensorConcatenationOp backend for sycl. 2016-12-16 19:46:45 +00:00
ermak
d60cca32e5 Transformation methods added to ParametrizedLine class. 2016-12-17 00:45:13 +07:00
Jeff Trull
7949849ebc refactor common row/column iteration code into its own class 2016-12-08 19:40:15 -08:00
Jeff Trull
d7bc64328b add display of entries to gdb sparse matrix prettyprinter 2016-12-08 18:50:17 -08:00
Jeff Trull
ff424927bc Introduce a simple pretty printer for sparse matrices (no contents) 2016-12-08 09:45:27 -08:00
Jeff Trull
5ce5418631 Correct prettyprinter comment - Quaternions are in fact supported 2016-12-08 07:31:16 -08:00
Rafael Guglielmetti
8f11df2667 NumTraits.h:
For the values ReadCost, AddCost and MulCost, add information about the value Eigen::HugeCost
2016-12-16 09:07:12 +00:00
Gael Guennebaud
7d5303a083 Partly revert changeset 642dddcce2, just in case the x87 issue pops up again
2016-12-16 09:25:14 +01:00
Benoit Steiner
2f7c2459b7 Merged in benoitsteiner/opencl (pull request PR-272)
Adding asynchandler to sycl queue as lack of it can cause undefined behaviour.
2016-12-15 17:46:40 -08:00
Mehdi Goli
c5e8546306 Adding asynchandler to sycl queue as lack of it can cause undefined behaviour. 2016-12-15 16:59:57 +00:00
Christoph Hertzberg
4247d35d4b Fixed bug which (extremely rarely) could end in an infinite loop 2016-12-15 17:22:12 +01:00
Christoph Hertzberg
642dddcce2 Fix nonnull-compare warning 2016-12-15 17:16:56 +01:00
Benoit Steiner
1324ffef2f Reenabled the use of constexpr on OpenCL devices 2016-12-15 06:49:38 -08:00
Gael Guennebaud
5d00fdf0e8 bug #1363: fix mingw's ABI issue 2016-12-15 11:58:31 +01:00
Benoit Steiner
2c2e218471 Avoid using #define since they can conflict with user code 2016-12-14 19:49:15 -08:00
Benoit Steiner
3beb180ee5 Don't call EnvThread::OnCancel by default since it doesn't do anything. 2016-12-14 18:33:39 -08:00
Benoit Steiner
9ff5d0f821 Merged eigen/eigen into default 2016-12-14 17:32:16 -08:00
Mehdi Goli
730eb9fe1c Adding asynchronous execution as it improves the performance. 2016-12-14 17:38:53 +00:00
Gael Guennebaud
11b492e993 bug #1358: fix compilation for sparse += sparse.selfadjointView(); 2016-12-14 17:53:47 +01:00
Gael Guennebaud
e67397bfa7 bug #1359: fix compilation of col_major_sparse.row() *= scalar
(used to work in 3.2.9 though the expression is not really writable)
2016-12-14 17:05:26 +01:00
Gael Guennebaud
98d7458275 bug #1359: fix sparse /= scalar and *= scalar implementation.
InnerIterators must be obtained from an evaluator.
2016-12-14 17:03:13 +01:00
Mehdi Goli
2d4a091beb Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h 2016-12-14 15:30:37 +00:00
Gael Guennebaud
c817ce3ba3 bug #1361: fix compilation issue in mat=perm.inverse() 2016-12-13 23:10:27 +01:00
Benoit Steiner
a432fc102d Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool 2016-12-12 15:24:16 -08:00
Benoit Steiner
8ae68924ed Made ThreadPoolInterface::Cancel() an optional functionality 2016-12-12 11:58:38 -08:00
Gael Guennebaud
57acb05eef Update and extend doc on alignment issues. 2016-12-11 22:45:32 +01:00
Benoit Steiner
76fca22134 Use a more accurate timer to sleep on Linux systems. 2016-12-09 15:12:24 -08:00
Benoit Steiner
4deafd35b7 Introduce a portable EIGEN_SLEEP macro. 2016-12-09 14:52:15 -08:00
Benoit Steiner
aafa97f4d2 Fixed build error with MSVC 2016-12-09 14:42:32 -08:00
Benoit Steiner
2f5b7a199b Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms. 2016-12-09 13:05:14 -08:00
Benoit Steiner
3d59a47720 Added a message to ease the detection of platforms on which thread cancellation isn't supported. 2016-12-08 14:51:46 -08:00
Benoit Steiner
28ee8f42b2 Added a Flush method to the RunQueue 2016-12-08 14:07:56 -08:00
Benoit Steiner
69ef267a77 Added the new threadpool cancel method to the threadpool interface based class. 2016-12-08 14:03:25 -08:00
Benoit Steiner
7bfff85355 Added support for thread cancellation on Linux 2016-12-08 08:12:49 -08:00
Benoit Steiner
6811e6cf49 Merged in srvasude/eigen/fix_cuda_exp (pull request PR-268)
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
2016-12-08 05:14:11 -08:00
Gael Guennebaud
747202d338 typo 2016-12-08 12:48:15 +01:00
Gael Guennebaud
bb297abb9e make sure we use the right eigen version 2016-12-08 12:00:11 +01:00
Gael Guennebaud
8b4b00d277 fix usage of custom compiler 2016-12-08 11:59:39 +01:00
Gael Guennebaud
7105596899 Add missing include and use -O3 2016-12-07 16:56:08 +01:00
Gael Guennebaud
780f3c1adf Fix call to convert on linux 2016-12-07 16:30:11 +01:00
Gael Guennebaud
3855ab472f Cleanup file structure 2016-12-07 14:23:49 +01:00
Gael Guennebaud
59a59fa8e7 Update perf monitoring scripts to generate html/svg outputs 2016-12-07 13:36:56 +01:00
Angelos Mantzaflaris
7694684992 Remove superfluous const's (can cause warnings on some Intel compilers)
(grafted from e236d3443c)
2016-12-07 00:37:48 +01:00
Gael Guennebaud
f2c506b03d Add a script example to run and upload performance tests 2016-12-06 16:46:52 +01:00
Gael Guennebaud
1b4e085a7f generate png file for web upload 2016-12-06 16:46:22 +01:00
Gael Guennebaud
f725f1cebc Mention the CMAKE_PREFIX_PATH variable. 2016-12-06 15:23:45 +01:00
Gael Guennebaud
f90c4aebc5 Update monitored changeset lists 2016-12-06 15:07:46 +01:00
Gael Guennebaud
eb621413c1 Revert vec/y to vec*(1/y) in row-major TRSM:
- div is extremely costly
- this is consistent with the column-major case
- this is consistent with all other BLAS implementations
2016-12-06 15:04:50 +01:00
Gael Guennebaud
8365c2c941 Fix BLAS backend for symmetric rank K updates. 2016-12-06 14:47:09 +01:00
Gael Guennebaud
0c4d05b009 Explain how to choose your favorite Eigen version 2016-12-06 11:34:06 +01:00
Silvio Traversaro
e049a2a72a Added relocatable cmake support also for CMake before 3.0 and after 2.8.8 2016-12-06 10:37:34 +01:00
Srinivas Vasudevan
e6c8b5500c Change comparisons to use Scalar instead of RealScalar. 2016-12-05 14:01:45 -08:00
Srinivas Vasudevan
f7d7c33a28 Fix expm1 CUDA implementation (do not shadow exp CUDA implementation). 2016-12-05 12:19:01 -08:00
Silvio Traversaro
18481b518f Make CMake config file relocatable 2016-12-05 10:39:52 +01:00
Gael Guennebaud
c68c8631e7 fix compilation of BTL's blaze interface 2016-12-05 23:02:16 +01:00
Gael Guennebaud
1ff1d4a124 Add performance monitoring for LLT 2016-12-05 23:01:52 +01:00
Srinivas Vasudevan
09ee7f0c80 Fix small nit where I changed name of plog1p to pexpm1. 2016-12-02 15:30:12 -08:00
Srinivas Vasudevan
a0d3ac760f Sync from Head. 2016-12-02 14:14:45 -08:00
Srinivas Vasudevan
218764ee1f Added support for expm1 in Eigen. 2016-12-02 14:13:01 -08:00
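A quick usage sketch, assuming the array-wise entry point added here is exposed as .expm1() as in later releases (illustrative, not a test from the tree):

    #include <Eigen/Core>
    #include <iostream>

    int main() {
      Eigen::ArrayXd x(3);
      x << 1e-12, 1e-8, 1.0;

      // expm1 keeps full precision for tiny arguments, where exp(x) - 1 cancels badly.
      std::cout << x.expm1().transpose() << "\n";
      std::cout << (x.exp() - 1).transpose() << "\n";
      return 0;
    }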
Gael Guennebaud
66f65ccc36 Ease compiler job to generate clean and efficient code in mat*vec. 2016-12-02 22:41:26 +01:00
Gael Guennebaud
fe696022ec Operators += and -= do not resize! 2016-12-02 22:40:25 +01:00
Angelos Mantzaflaris
18de92329e use numext::abs
(grafted from 0a08d4c60b)
2016-12-02 11:48:06 +01:00
Angelos Mantzaflaris
e8a6aa518e 1. Add explicit template to abs2 (resolves deduction for some arithmetic types)
2. Avoid signed-unsigned conversion in comparison (warning in case Scalar is unsigned)
(grafted from 4086187e49)
2016-12-02 11:39:18 +01:00
Gael Guennebaud
a6b971e291 Fix memory leak in Ref<Sparse> 2016-12-05 16:59:30 +01:00
Gael Guennebaud
8640ffac65 Optimize SparseLU::solve for rhs vectors 2016-12-05 15:41:14 +01:00
Gael Guennebaud
62acd67903 remove temporary in SparseLU::solve 2016-12-05 15:11:57 +01:00
Gael Guennebaud
0db6d5b3f4 bug #1356: fix calls to evaluator::coeffRef(0,0) to get the address of the destination
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
2016-12-05 15:08:09 +01:00
Gael Guennebaud
91003f3b86 typo 2016-12-05 13:51:07 +01:00
Gael Guennebaud
445c015751 extend monitoring benchmarks with transpose matrix-vector and triangular matrix-vectors. 2016-12-05 13:36:26 +01:00
Gael Guennebaud
e3f613cbd4 Improve performance of row-major-dense-matrix * vector products for recent CPUs.
This revised version does not bother with aligned loads/stores,
and rather processes 8 rows at once for better instruction pipelining.
2016-12-05 13:02:01 +01:00
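Roughly, the kernel works along these lines (a simplified sketch, not Eigen's code; 4 rows per pass instead of 8, and gemv_row_major is a made-up name):

    #include <cstddef>

    // y = A * x for a row-major, dense A. Several rows are handled per pass so the
    // partial sums are independent and the multiply-adds can be pipelined.
    void gemv_row_major(const double* A, const double* x, double* y,
                        std::size_t rows, std::size_t cols) {
      std::size_t r = 0;
      for (; r + 4 <= rows; r += 4) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;  // independent accumulators
        for (std::size_t c = 0; c < cols; ++c) {
          const double xc = x[c];
          s0 += A[(r + 0) * cols + c] * xc;
          s1 += A[(r + 1) * cols + c] * xc;
          s2 += A[(r + 2) * cols + c] * xc;
          s3 += A[(r + 3) * cols + c] * xc;
        }
        y[r + 0] = s0; y[r + 1] = s1; y[r + 2] = s2; y[r + 3] = s3;
      }
      for (; r < rows; ++r) {  // leftover rows
        double s = 0;
        for (std::size_t c = 0; c < cols; ++c) s += A[r * cols + c] * x[c];
        y[r] = s;
      }
    }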
Gael Guennebaud
3abc827354 Clean debugging code 2016-12-05 12:59:32 +01:00
Benoit Steiner
462c28e77a Merged in srvasude/eigen (pull request PR-265)
Add Expm1 support to Eigen.
2016-12-05 02:31:11 +00:00
Gael Guennebaud
4465d20403 Add missing generic load methods. 2016-12-03 21:25:04 +01:00
Gael Guennebaud
6a5fe86098 Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs.
The previous code was optimized for the Intel core2 architecture, for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panel of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
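In spirit, the new kernel resembles the following sketch (hypothetical names, not the actual implementation): walk a vertical panel of 4 columns at a time and fold the updates into y with fused multiply-adds.

    #include <cmath>
    #include <cstddef>

    // y += A * x for a column-major, dense A; y must be zero-initialized by the caller.
    void gemv_col_major(const double* A, const double* x, double* y,
                        std::size_t rows, std::size_t cols) {
      std::size_t c = 0;
      for (; c + 4 <= cols; c += 4) {
        const double x0 = x[c], x1 = x[c + 1], x2 = x[c + 2], x3 = x[c + 3];
        const double* a0 = A + (c + 0) * rows;
        const double* a1 = A + (c + 1) * rows;
        const double* a2 = A + (c + 2) * rows;
        const double* a3 = A + (c + 3) * rows;
        for (std::size_t r = 0; r < rows; ++r) {
          // std::fma lowers to an FMA instruction when the target provides one.
          double acc = y[r];
          acc = std::fma(a0[r], x0, acc);
          acc = std::fma(a1[r], x1, acc);
          acc = std::fma(a2[r], x2, acc);
          acc = std::fma(a3[r], x3, acc);
          y[r] = acc;
        }
      }
      for (; c < cols; ++c)  // leftover columns
        for (std::size_t r = 0; r < rows; ++r)
          y[r] += A[c * rows + r] * x[c];
    }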
Benoit Steiner
2bfece5cd1 Merged eigen/eigen into default 2016-12-02 16:30:14 -08:00
Mehdi Goli
592acc5bfa Making default numeric_list work with sycl. 2016-12-02 17:58:30 +00:00
Gael Guennebaud
8dfb3e00b8 merge 2016-12-02 11:34:21 +01:00
Gael Guennebaud
4c0d5f3c01 Add perf monitoring for gemv 2016-12-02 11:34:12 +01:00
Gael Guennebaud
d2718d662c Re-enable A^T*A action in BTL 2016-12-02 11:32:03 +01:00
Christoph Hertzberg
22f7d398e2 bug #1355: Fixed wrong line-endings on two files 2016-12-02 11:22:05 +01:00
Gael Guennebaud
27873008d4 Clean up SparseCore module regarding ReverseInnerIterator 2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris
8c24723a09 typo UIntPtr
(grafted from b6f04a2dd4)
2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris
aeba0d8655 fix two warnings (unused typedef, unused variable) and a typo
(grafted from a9aa3bcf50)
2016-12-01 21:23:43 +01:00
Gael Guennebaud
181138a1cb fix member order 2016-12-01 17:06:20 +01:00
Gael Guennebaud
9f297d57ae Merged in rmlarsen/eigen (pull request PR-256)
Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.
2016-12-01 15:27:33 +00:00
Gael Guennebaud
f95e3b84a5 merge 2016-12-01 16:18:57 +01:00
Benoit Steiner
7ff26ddcbb Merged eigen/eigen into default 2016-12-01 07:13:17 -08:00
Gael Guennebaud
037b46762d Fix misleading-indentation warnings. 2016-12-01 16:05:42 +01:00
Mehdi Goli
79aa2b784e Adding sycl backend for TensorPadding.h; disabling __unit128 for sycl in TensorIntDiv.h; disabling cache size for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code. 2016-12-01 13:02:27 +00:00
Benoit Steiner
a70393fd02 Cleaned up forward declarations 2016-11-30 21:59:07 -08:00
Benoit Steiner
e073de96dc Moved the MemCopyFunctor back to TensorSyclDevice since it's the only caller and it makes TensorFlow compile again 2016-11-30 21:36:52 -08:00
Benoit Steiner
fca27350eb Added the deallocate_all() method back 2016-11-30 20:45:20 -08:00
Benoit Steiner
e633a8371f Simplified includes 2016-11-30 20:21:18 -08:00
Benoit Steiner
7cd33df4ce Improved formatting 2016-11-30 20:20:44 -08:00
Benoit Steiner
fd1dc3363e Merged eigen/eigen into default 2016-11-30 20:16:17 -08:00
Benoit Steiner
f5107010ee Updated the Sizes class to work on AMD GPUs without requiring a separate implementation 2016-11-30 19:57:28 -08:00
Benoit Steiner
e37c2c52d3 Added an implementation of numeric_list that works with sycl 2016-11-30 19:55:15 -08:00
Gael Guennebaud
8df272af88 Fix selection of product implementation for dynamic-size matrices with fixed max size. 2016-11-30 22:21:33 +01:00
Benoit Steiner
faa2ff99c6 Pulled latest update from trunk 2016-11-30 09:31:24 -08:00
Benoit Steiner
df3da0780d Updated customIndices2Array to handle various index sizes. 2016-11-30 09:30:12 -08:00
Gael Guennebaud
c927af60ed Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times. 2016-11-30 17:59:13 +01:00
Luke Iwanski
26fff1c5b1 Added EIGEN_STRONG_INLINE to get_sycl_supported_device(). 2016-11-30 16:55:22 +00:00
Gael Guennebaud
ab4ef5e66e bug #1351: fix compilation of random with old compilers 2016-11-30 17:37:53 +01:00
Sergiu Deitsch
5e3c5c42f6 cmake: remove architecture dependency from Eigen3ConfigVersion.cmake
Also, install Eigen3*.cmake under $prefix/share/eigen3/cmake by default.
(grafted from 86ab00cdcf)
2016-11-30 15:46:46 +01:00
Sergiu Deitsch
3440b46e2f doc: mention the NO_MODULE option and target availability
(grafted from 65f09be8d2)
2016-11-30 15:41:38 +01:00
Rasmus Munk Larsen
a0329f64fb Add a default constructor for the "fake" __half class when not using the
__half class provided by CUDA.
2016-11-29 13:18:09 -08:00
Mehdi Goli
577ce78085 Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend. 2016-11-29 15:30:42 +00:00
Benoit Steiner
3011dc94ef Call internal::array_prod to compute the total size of the tensor. 2016-11-28 09:00:31 -08:00
Benoit Steiner
02080e2b67 Merged eigen/eigen into default 2016-11-27 07:27:30 -08:00
Benoit Steiner
9fd081cddc Fixed compilation warnings 2016-11-26 20:22:25 -08:00
Benoit Steiner
9f8fbd9434 Merged eigen/eigen into default 2016-11-26 11:28:25 -08:00
Benoit Steiner
67b2c41f30 Avoided unnecessary type conversion 2016-11-26 11:27:29 -08:00
Benoit Steiner
7fe704596a Added missing array_get method for numeric_list 2016-11-26 11:26:07 -08:00
Mehdi Goli
7318daf887 Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h. 2016-11-25 16:19:07 +00:00
Benoit Steiner
7ad37606dd Fixed the documentation of Scalar Tensors 2016-11-24 12:31:43 -08:00
Benoit Steiner
3be1afca11 Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5 2016-11-23 18:49:51 -08:00
Gael Guennebaud
308961c05e Fix compilation. 2016-11-23 22:17:52 +01:00
Gael Guennebaud
21d0286d81 bug #1348: Document EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES,
and reflect in the doc that EIGEN_DONT_ALIGN* are deprecated.
2016-11-23 22:15:03 +01:00
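For reference, the replacement macros are meant to be defined before any Eigen header is included, e.g. (a minimal sketch):

    // Cap (here: disable) static alignment of fixed-size Eigen objects instead of
    // using the deprecated EIGEN_DONT_ALIGN* switches.
    #define EIGEN_MAX_STATIC_ALIGN_BYTES 0
    #include <Eigen/Dense>

    int main() {
      Eigen::Vector4f v = Eigen::Vector4f::Zero();  // no 16-byte alignment requirement now
      return static_cast<int>(v.sum());
    }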
Mehdi Goli
b8cc5635d5 Removing unsupported device from test case; cleaning the tensor device sycl. 2016-11-23 16:30:41 +00:00
Gael Guennebaud
7f6333c32b Merged in tal500/eigen-eulerangles (pull request PR-237)
Euler angles
2016-11-23 15:17:38 +00:00
Gael Guennebaud
f12b368417 Extend polynomial solver unit tests to complexes 2016-11-23 16:05:45 +01:00
Gael Guennebaud
56e5ec07c6 Automatically switch between EigenSolver and ComplexEigenSolver, and fix a few Real versus Scalar issues. 2016-11-23 16:05:10 +01:00
Gael Guennebaud
9246587122 Patch from Oleg Shirokobrod to extend polynomial solver to complexes 2016-11-23 15:42:26 +01:00
Gael Guennebaud
e340866c81 Fix compilation with gcc and old ABI version 2016-11-23 14:04:57 +01:00
Gael Guennebaud
a91de27e98 Fix compilation issue with MSVC:
MSVC always messes up shadowed template arguments, for instance in:
  struct B { typedef float T; };
  template<typename T> struct A : B {
    T g;
  };
The type of A<double>::g will be float and not double.
2016-11-23 12:24:48 +01:00
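One way to sidestep the issue, sketched here with hypothetical names (not the fix applied in Eigen's sources), is simply not to reuse an identifier that a base class already exposes:

    struct B { typedef float T; };

    template <typename Scalar_>  // distinct name instead of 'T'
    struct A : B {
      typedef Scalar_ T;         // re-export under the expected name if needed
      T g;
    };

    static_assert(sizeof(A<double>::T) == sizeof(double),
                  "with the renamed parameter, T is double as intended");

    int main() {
      A<double> a;
      a.g = 3.0;
      return static_cast<int>(a.g);
    }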
Gael Guennebaud
74637fa4e3 Optimize predux<Packet8f> (AVX) 2016-11-22 21:57:52 +01:00
Gael Guennebaud
178c084856 Disable usage of SSE3 _mm_hadd_ps that is extremely slow. 2016-11-22 21:53:14 +01:00
Gael Guennebaud
7dd894e40e Optimize predux<Packet4d> (AVX) 2016-11-22 21:41:30 +01:00
Gael Guennebaud
f3fb0a1940 Disable usage of SSE3 haddpd that is extremely slow. 2016-11-22 16:58:31 +01:00
Sergiu Deitsch
5c516e4e0a cmake: added Eigen3::Eigen imported target
(grafted from a287140f72)
2016-11-22 12:25:06 +01:00
Gael Guennebaud
6a84246a6a Fix regression in assignment of sparse block to sparse block. 2016-11-21 21:46:42 +01:00
Benoit Steiner
f11da1d83b Made the QueueInterface thread safe 2016-11-20 13:17:08 -08:00
Benoit Steiner
ed839c5851 Enable the use of constant expressions with clang >= 3.6 2016-11-20 10:34:49 -08:00
Benoit Steiner
6d781e3e52 Merged eigen/eigen into default 2016-11-20 10:12:54 -08:00
Benoit Steiner
79a07b891b Fixed a typo 2016-11-20 07:07:41 -08:00
Gael Guennebaud
465ede0f20 Fix compilation issue in mat = permutation (regression introduced in 8193ffb3d3)
2016-11-20 09:41:37 +01:00
Benoit Steiner
81151bd474 Fixed merge conflicts 2016-11-19 19:12:59 -08:00
Benoit Steiner
9265ca707e Made it possible to check the state of a sycl device without synchronization 2016-11-19 10:56:24 -08:00
Benoit Steiner
2d1aec15a7 Added missing include 2016-11-19 08:09:54 -08:00
Luke Iwanski
af67335e0e Added test for cwiseMin, cwiseMax and operator%. 2016-11-19 13:37:27 +00:00
Benoit Steiner
1bdf1b9ce0 Merged in benoitsteiner/opencl (pull request PR-253)
OpenCL improvements
2016-11-19 04:44:43 +00:00
Benoit Steiner
a357fe1fb9 Code cleanup 2016-11-18 16:58:09 -08:00
Benoit Steiner
1c6eafb46b Updated cxx11_tensor_device_sycl to run only on the OpenCL devices available on the host 2016-11-18 16:43:27 -08:00
Benoit Steiner
ca754caa23 Only runs the cxx11_tensor_reduction_sycl on devices that are available. 2016-11-18 16:31:14 -08:00
Benoit Steiner
dc601d79d1 Added the ability to run test exclusively OpenCL devices that are listed by sycl::device::get_devices(). 2016-11-18 16:26:50 -08:00
Benoit Steiner
8649e16c2a Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio 2016-11-18 14:18:34 -08:00
Benoit Steiner
110b7f8d9f Deleted unnecessary semicolons 2016-11-18 14:06:17 -08:00
Benoit Steiner
b5e3285e16 Test broadcasting on OpenCL devices with 64 bit indexing 2016-11-18 13:44:20 -08:00
Gael Guennebaud
164414c563 Merged in ChunW/eigen (pull request PR-252)
Workaround for error in VS2012 with /clr
2016-11-18 21:07:29 +00:00
Benoit Steiner
37c2c516a6 Cleaned up the sycl device code 2016-11-18 12:38:06 -08:00
Benoit Steiner
7335c49204 Fixed the cxx11_tensor_device_sycl test 2016-11-18 12:37:13 -08:00
Mehdi Goli
15e226d7d3 adding Benoit changes on the TensorDeviceSycl.h 2016-11-18 16:34:54 +00:00
Mehdi Goli
622805a0c5 Modifying TensorDeviceSycl.h to always create buffers of type uint8_t and convert them to the actual type at execution time on the device; adding the queue interface class to separate the lifespan of the sycl queue and the buffers created for that queue from Eigen::SyclDevice; modifying sycl tests to support the evaluation of the results for both row major and column major data layout on all different devices that are supported by Sycl{CPU; GPU; and Host}. 2016-11-18 16:20:42 +00:00
Luke Iwanski
5159675c33 Added isnan, isfinite and isinf for SYCL device. Plus test for that. 2016-11-18 16:01:48 +00:00
Tal Hadad
76b2a3e6e7 Allow constructing EulerAngles from a 3D vector directly.
Use an assignment template struct to distinguish between a 3D vector and a 3D rotation matrix.
2016-11-18 15:01:06 +02:00
Luke Iwanski
927bd62d2a Now testing out (+=, =) in.FUNC() and out (+=, =) out.FUNC() 2016-11-18 11:16:42 +00:00
Gael Guennebaud
8193ffb3d3 bug #1343: fix compilation regression in mat+=selfadjoint_view.
Generic EigenBase2EigenBase assignment was incomplete.
2016-11-18 10:17:34 +01:00
Gael Guennebaud
cebff7e3a2 bug #1343: fix compilation regression in array = matrix_product 2016-11-18 10:09:33 +01:00
Benoit Steiner
7c30078b9f Merged eigen/eigen into default 2016-11-17 22:53:37 -08:00
Benoit Steiner
553f50b246 Added a way to detect errors generated by the opencl device from the host 2016-11-17 21:51:48 -08:00
Benoit Steiner
72a45d32e9 Cleanup 2016-11-17 21:29:15 -08:00
Benoit Steiner
4349fc640e Created a test to check that the sycl runtime can successfully report errors (like division by 0).
Small cleanup
2016-11-17 20:27:54 -08:00
Benoit Steiner
a6a3fd0703 Made TensorDeviceCuda.h compile on windows 2016-11-17 16:15:27 -08:00
Chun Wang
0d0948c3b9 Workaround for error in VS2012 with /clr 2016-11-17 17:54:27 -05:00
Benoit Steiner
004344cf54 Avoid calling log(0) or 1/0 2016-11-17 11:56:44 -08:00
Konstantinos Margaritis
a1d5c503fa replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f 2016-11-17 13:27:45 -05:00
Konstantinos Margaritis
672aa97d4d implement float/std::complex<float> for ZVector as well, minor fixes to ZVector 2016-11-17 13:27:33 -05:00
Konstantinos Margaritis
8290e21fb5 replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f 2016-11-17 13:21:15 -05:00
Luke Iwanski
7878756dea Fixed existing test. 2016-11-17 17:46:55 +00:00
Luke Iwanski
c5130dedbe Specialised basic math functions for SYCL device. 2016-11-17 11:47:13 +00:00
Benoit Steiner
f2e8b73256 Enable the use of AVX512 instruction by default 2016-11-16 21:28:04 -08:00
Gael Guennebaud
7b09e4dd8c bump default branch to 3.3.90 2016-11-16 22:20:58 +01:00
Benoit Steiner
dff9a049c4 Optimized the computation of exp, sqrt, ceil and floor for fp16 on Pascal GPUs 2016-11-16 09:01:51 -08:00
Benoit Steiner
b5c75351e3 Merged eigen/eigen into default 2016-11-14 15:54:44 -08:00
Rasmus Munk Larsen
32df1b1046 Reduce dispatch overhead in parallelFor by only calling thread_pool.Schedule() for one of the two recursive calls in handleRange. This avoids going through the schedule path to push both recursive calls onto another thread-queue in the binary tree, and instead executes one of them on the main thread. At the leaf level this will still activate a full complement of threads, but will save up to 50% of the overhead in Schedule (random number generation, insertion in queue which includes signaling via atomics). 2016-11-14 14:18:16 -08:00
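A stripped-down sketch of that dispatch pattern (ThreadPoolLike and handleRange here are illustrative stand-ins, not the actual Eigen ThreadPool classes):

    #include <functional>

    struct ThreadPoolLike {
      // Stand-in: a real pool would enqueue the task on a worker; here we just run it.
      void Schedule(std::function<void()> fn) { fn(); }
    };

    void handleRange(ThreadPoolLike& pool, long begin, long end, long grain,
                     const std::function<void(long, long)>& body) {
      if (end - begin <= grain) { body(begin, end); return; }
      const long mid = begin + (end - begin) / 2;
      // Only one half goes through Schedule()...
      pool.Schedule([&pool, mid, end, grain, body] {
        handleRange(pool, mid, end, grain, body);
      });
      // ...the other half is executed directly on the current thread, which
      // roughly halves the per-split scheduling overhead.
      handleRange(pool, begin, mid, grain, body);
    }

    int main() {
      ThreadPoolLike pool;
      long sum = 0;
      handleRange(pool, 0, 100, 8,
                  [&](long b, long e) { for (long i = b; i < e; ++i) sum += i; });
      return sum == 4950 ? 0 : 1;  // sum of 0..99
    }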
Mehdi Goli
05e8c2a1d9 Adding extra test for non-fixed size to broadcast; Replacing stcl with sycl. 2016-11-14 18:13:53 +00:00
Mehdi Goli
f8ca893976 Adding TensorFixedSize; adding sycl device memcpy; adding initial stage of slicing. 2016-11-14 17:51:57 +00:00
Gael Guennebaud
0ee92aa38e Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products. 2016-11-14 18:47:41 +01:00
Gael Guennebaud
2e334f5da0 bug #426: move operator && and || to MatrixBase and SparseMatrixBase. 2016-11-14 18:47:02 +01:00
Gael Guennebaud
a048aba14c Merged in olesalscheider/eigen (pull request PR-248)
Make sure not to call numext::maxi on expression templates
2016-11-14 13:25:53 +00:00
Gael Guennebaud
eedb87f4ba Fix regression in SparseMatrix::ReverseInnerIterator 2016-11-14 14:05:53 +01:00
Niels Ole Salscheider
51fef87408 Make sure not to call numext::maxi on expression templates 2016-11-12 12:20:57 +01:00
Mehdi Goli
a5c3f15682 Adding comment to TensorDeviceSycl.h and cleaning the code. 2016-11-11 19:06:34 +00:00
Benoit Steiner
f4722aa479 Merged in benoitsteiner/opencl (pull request PR-247) 2016-11-11 00:01:28 +00:00
Mehdi Goli
3be3963021 Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor. 2016-11-10 19:16:31 +00:00
Mehdi Goli
12387abad5 adding the missing in eigen_assert! 2016-11-10 18:58:08 +00:00
Mehdi Goli
2e704d4257 Adding Memset; optimising MecopyDeviceToHost by removing double copying; 2016-11-10 18:45:12 +00:00
Benoit Steiner
75c080b176 Added a test to validate memory transfers between host and sycl device 2016-11-09 06:23:42 -08:00
Tal Hadad
15eca2432a Euler tests: Tighter precision when no roll exists and clean code. 2016-10-18 23:24:57 +03:00
Tal Hadad
6f4f12d1ed Add isApprox() and cast() functions.
test cases included
2016-10-17 22:23:47 +03:00
Tal Hadad
7402cfd4cc Add safety for near-pole cases and test them better. 2016-10-17 20:42:08 +03:00
Tal Hadad
58f5d7d058 Fix calc bug, docs and better testing.
Test code changes:
* better coded
* rand and manual numbers
* singularity checking
2016-10-16 14:39:26 +03:00
Tal Hadad
078a202621 Merge Hongkai Dai's correct range calculation, and remove ranges from the API.
Docs updated.
2016-10-14 16:03:28 +03:00
Hongkai Dai
014d9f1d9b implement euler angles with the right ranges 2016-10-13 14:45:51 -07:00
Heiko Bauke
e19b58e672 alias template for matrix and array classes 2016-04-23 00:08:51 +02:00
yoco
15f273b63c fix reshape flag and test case 2014-02-10 22:49:13 +08:00
yoco
b64a09acc1 fix reshape's Max[Row/Col]AtCompileTime 2014-02-04 05:54:50 +08:00
yoco
f8ad87f226 Make Reshape always non-direct-access 2014-02-04 05:19:56 +08:00
yoco
515bbf8bb2 Improve reshape test case
- simplify test code
- add reshape chain
2014-02-04 02:50:23 +08:00
yoco
009047db27 Fix Reshape traits flag calculation bug 2014-02-04 02:21:41 +08:00
yoco
2b89080903 Remove reshape InnerPanel, add test, fix bug 2014-01-20 01:43:28 +08:00
yoco
03723abda0 Remove useless reshape row/col ctor 2014-01-20 00:22:16 +08:00
yoco
342c8e5321 Fix Reshape DirectAccessBit bug 2014-01-20 00:15:19 +08:00
yoco
24e1c0f2a1 add reshape test for const and static-size matrix 2014-01-18 23:27:53 +08:00
yoco
150796337a Add unit-test for reshape
- add unittest for dynamic & fixed-size reshape
- fix fixed-size reshape bugs found while testing
2014-01-18 16:10:46 +08:00
yoco
497a7b0ce1 remove c++11, make c++03 compatible 2014-01-18 13:20:14 +08:00
yoco
9c832fad60 add reshape() snippets 2014-01-18 04:53:46 +08:00
yoco
1e1d0c15b5 add example code for Reshape class 2014-01-18 04:43:29 +08:00
yoco
fe2ad0647a reshape now supported
- add member function to plugin
- add forward declaration
- add documentation
- add include
2014-01-18 04:21:20 +08:00
yoco
7bd58ad0b6 add Eigen/src/Core/Reshape.h 2014-01-18 04:16:44 +08:00
891 changed files with 74189 additions and 20601 deletions

View File

@@ -1,3 +1,4 @@
syntax: glob
qrc_*cxx
*.orig
*.pyc

View File

@@ -1,6 +1,6 @@
project(Eigen3)
cmake_minimum_required(VERSION 2.8.5)
cmake_minimum_required(VERSION 2.8.11)
# guard against in-source builds
@@ -8,6 +8,7 @@ if(${CMAKE_SOURCE_DIR} STREQUAL ${CMAKE_BINARY_DIR})
message(FATAL_ERROR "In-source builds not allowed. Please make a new directory (called a build directory) and run CMake from there. You may need to remove CMakeCache.txt. ")
endif()
# Alias Eigen_*_DIR to Eigen3_*_DIR:
set(Eigen_SOURCE_DIR ${Eigen3_SOURCE_DIR})
@@ -28,7 +29,7 @@ endif()
#############################################################################
# retrieve version infomation #
# retrieve version information #
#############################################################################
# automatically parse the version number
@@ -53,13 +54,13 @@ endif()
if(EIGEN_BRANCH_OUTPUT MATCHES "default")
string(REGEX MATCH "^changeset: *[0-9]*:([0-9;a-f]+).*" EIGEN_HG_CHANGESET_MATCH "${EIGEN_HGTIP_OUTPUT}")
set(EIGEN_HG_CHANGESET "${CMAKE_MATCH_1}")
endif(EIGEN_BRANCH_OUTPUT MATCHES "default")
endif()
#...and show it next to the version number
if(EIGEN_HG_CHANGESET)
set(EIGEN_VERSION "${EIGEN_VERSION_NUMBER} (mercurial changeset ${EIGEN_HG_CHANGESET})")
else(EIGEN_HG_CHANGESET)
else()
set(EIGEN_VERSION "${EIGEN_VERSION_NUMBER}")
endif(EIGEN_HG_CHANGESET)
endif()
include(CheckCXXCompilerFlag)
@@ -78,7 +79,7 @@ macro(ei_add_cxx_compiler_flag FLAG)
if(COMPILER_SUPPORT_${SFLAG})
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${FLAG}")
endif()
endmacro(ei_add_cxx_compiler_flag)
endmacro()
check_cxx_compiler_flag("-std=c++11" EIGEN_COMPILER_SUPPORT_CPP11)
@@ -134,7 +135,7 @@ if(NOT WIN32 OR NOT CMAKE_HOST_SYSTEM_NAME MATCHES Windows)
option(EIGEN_BUILD_PKGCONFIG "Build pkg-config .pc file for Eigen" ON)
endif()
set(CMAKE_INCLUDE_CURRENT_DIR ON)
set(CMAKE_INCLUDE_CURRENT_DIR OFF)
option(EIGEN_SPLIT_LARGE_TESTS "Split large tests into smaller executables" ON)
@@ -158,7 +159,7 @@ if(NOT MSVC)
ei_add_cxx_compiler_flag("-Wall")
ei_add_cxx_compiler_flag("-Wextra")
#ei_add_cxx_compiler_flag("-Weverything") # clang
ei_add_cxx_compiler_flag("-Wundef")
ei_add_cxx_compiler_flag("-Wcast-align")
ei_add_cxx_compiler_flag("-Wchar-subscripts")
@@ -173,29 +174,25 @@ if(NOT MSVC)
ei_add_cxx_compiler_flag("-Wc++11-extensions")
ei_add_cxx_compiler_flag("-Wdouble-promotion")
# ei_add_cxx_compiler_flag("-Wconversion")
# -Wshadow is insanely too strict with gcc, hopefully it will become usable with gcc 6
# if(NOT CMAKE_COMPILER_IS_GNUCXX OR (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "5.0.0"))
if(NOT CMAKE_COMPILER_IS_GNUCXX)
ei_add_cxx_compiler_flag("-Wshadow")
endif()
ei_add_cxx_compiler_flag("-Wshadow")
ei_add_cxx_compiler_flag("-Wno-psabi")
ei_add_cxx_compiler_flag("-Wno-variadic-macros")
ei_add_cxx_compiler_flag("-Wno-long-long")
ei_add_cxx_compiler_flag("-fno-check-new")
ei_add_cxx_compiler_flag("-fno-common")
ei_add_cxx_compiler_flag("-fstrict-aliasing")
ei_add_cxx_compiler_flag("-wd981") # disable ICC's "operands are evaluated in unspecified order" remark
ei_add_cxx_compiler_flag("-wd2304") # disable ICC's "warning #2304: non-explicit constructor with single argument may cause implicit type conversion" produced by -Wnon-virtual-dtor
# The -ansi flag must be added last, otherwise it is also used as a linker flag by check_cxx_compiler_flag making it fails
# Moreover we should not set both -strict-ansi and -ansi
check_cxx_compiler_flag("-strict-ansi" COMPILER_SUPPORT_STRICTANSI)
ei_add_cxx_compiler_flag("-Qunused-arguments") # disable clang warning: argument unused during compilation: '-ansi'
if(COMPILER_SUPPORT_STRICTANSI)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -strict-ansi")
else()
@@ -206,7 +203,7 @@ if(NOT MSVC)
ei_add_cxx_compiler_flag("-pie")
ei_add_cxx_compiler_flag("-fPIE")
endif()
set(CMAKE_REQUIRED_FLAGS "")
option(EIGEN_TEST_SSE2 "Enable/Disable SSE2 in tests/examples" OFF)
@@ -253,7 +250,10 @@ if(NOT MSVC)
option(EIGEN_TEST_AVX512 "Enable/Disable AVX512 in tests/examples" OFF)
if(EIGEN_TEST_AVX512)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f -fabi-version=6 -DEIGEN_ENABLE_AVX512")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f -mfma -DEIGEN_ENABLE_AVX512")
if (NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fabi-version=6")
endif()
message(STATUS "Enabling AVX512 in tests/examples")
endif()
@@ -275,6 +275,12 @@ if(NOT MSVC)
message(STATUS "Enabling VSX in tests/examples")
endif()
option(EIGEN_TEST_MSA "Enable/Disable MSA in tests/examples" OFF)
if(EIGEN_TEST_MSA)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mmsa")
message(STATUS "Enabling MSA in tests/examples")
endif()
option(EIGEN_TEST_NEON "Enable/Disable Neon in tests/examples" OFF)
if(EIGEN_TEST_NEON)
if(EIGEN_TEST_FMA)
@@ -292,12 +298,18 @@ if(NOT MSVC)
message(STATUS "Enabling NEON in tests/examples")
endif()
option(EIGEN_TEST_ZVECTOR "Enable/Disable S390X(zEC13) ZVECTOR in tests/examples" OFF)
if(EIGEN_TEST_ZVECTOR)
option(EIGEN_TEST_Z13 "Enable/Disable S390X(zEC13) ZVECTOR in tests/examples" OFF)
if(EIGEN_TEST_Z13)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=z13 -mzvector")
message(STATUS "Enabling S390X(zEC13) ZVECTOR in tests/examples")
endif()
option(EIGEN_TEST_Z14 "Enable/Disable S390X(zEC14) ZVECTOR in tests/examples" OFF)
if(EIGEN_TEST_Z14)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=z14 -mzvector")
message(STATUS "Enabling S390X(zEC13) ZVECTOR in tests/examples")
endif()
check_cxx_compiler_flag("-fopenmp" COMPILER_SUPPORT_OPENMP)
if(COMPILER_SUPPORT_OPENMP)
option(EIGEN_TEST_OPENMP "Enable/Disable OpenMP in tests/examples" OFF)
@@ -307,7 +319,7 @@ if(NOT MSVC)
endif()
endif()
else(NOT MSVC)
else()
# C4127 - conditional expression is constant
# C4714 - marked as __forceinline not inlined (I failed to deactivate it selectively)
@@ -315,7 +327,7 @@ else(NOT MSVC)
# because we are oftentimes returning objects that have a destructor or may
# throw exceptions - in particular in the unit tests we are throwing extra many
# exceptions to cover indexing errors.
# C4505 - unreferenced local function has been removed (impossible to deactive selectively)
# C4505 - unreferenced local function has been removed (impossible to deactivate selectively)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /EHsc /wd4127 /wd4505 /wd4714")
# replace all /Wx by /W4
@@ -335,10 +347,23 @@ else(NOT MSVC)
if(NOT CMAKE_CL_64)
# arch is not supported on 64 bit systems, SSE is enabled automatically.
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:SSE2")
endif(NOT CMAKE_CL_64)
endif()
message(STATUS "Enabling SSE2 in tests/examples")
endif(EIGEN_TEST_SSE2)
endif(NOT MSVC)
endif()
option(EIGEN_TEST_AVX "Enable/Disable AVX in tests/examples" OFF)
if(EIGEN_TEST_AVX)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
message(STATUS "Enabling AVX in tests/examples")
endif()
option(EIGEN_TEST_FMA "Enable/Disable FMA/AVX2 in tests/examples" OFF)
if(EIGEN_TEST_FMA AND NOT EIGEN_TEST_NEON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
message(STATUS "Enabling FMA/AVX2 in tests/examples")
endif()
endif()
option(EIGEN_TEST_NO_EXPLICIT_VECTORIZATION "Disable explicit vectorization in tests/examples" OFF)
option(EIGEN_TEST_X87 "Force using X87 instructions. Implies no vectorization." OFF)
@@ -382,7 +407,7 @@ endif()
set(EIGEN_CUDA_COMPUTE_ARCH 30 CACHE STRING "The CUDA compute architecture level to target when compiling CUDA code")
include_directories(${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_BINARY_DIR})
include_directories(${CMAKE_CURRENT_SOURCE_DIR})
# Backward compatibility support for EIGEN_INCLUDE_INSTALL_DIR
if(EIGEN_INCLUDE_INSTALL_DIR)
@@ -391,27 +416,22 @@ endif()
if(EIGEN_INCLUDE_INSTALL_DIR AND NOT INCLUDE_INSTALL_DIR)
set(INCLUDE_INSTALL_DIR ${EIGEN_INCLUDE_INSTALL_DIR}
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed")
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed")
else()
set(INCLUDE_INSTALL_DIR
"${CMAKE_INSTALL_INCLUDEDIR}/eigen3"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen header files are installed"
)
endif()
set(CMAKEPACKAGE_INSTALL_DIR
"${CMAKE_INSTALL_DATADIR}/eigen3/cmake"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where Eigen3Config.cmake is installed"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where Eigen3Config.cmake is installed"
)
set(PKGCONFIG_INSTALL_DIR
"${CMAKE_INSTALL_DATADIR}/pkgconfig"
CACHE STRING "The directory relative to CMAKE_PREFIX_PATH where eigen3.pc is installed"
CACHE PATH "The directory relative to CMAKE_PREFIX_PATH where eigen3.pc is installed"
)
foreach(var INCLUDE_INSTALL_DIR CMAKEPACKAGE_INSTALL_DIR PKGCONFIG_INSTALL_DIR)
if(IS_ABSOLUTE "${${var}}")
message(FATAL_ERROR "${var} must be relative to CMAKE_PREFIX_PATH. Got: ${${var}}")
endif()
endforeach()
# similar to set_target_properties but append the property instead of overwriting it
macro(ei_add_target_property target prop value)
@@ -420,9 +440,9 @@ macro(ei_add_target_property target prop value)
# if the property wasn't previously set, ${previous} is now "previous-NOTFOUND" which cmake allows catching with plain if()
if(NOT previous)
set(previous "")
endif(NOT previous)
endif()
set_target_properties(${target} PROPERTIES ${prop} "${previous} ${value}")
endmacro(ei_add_target_property)
endmacro()
install(FILES
signature_of_eigen3_matrix_library
@@ -436,7 +456,7 @@ if(EIGEN_BUILD_PKGCONFIG)
)
endif()
add_subdirectory(Eigen)
install(DIRECTORY Eigen DESTINATION ${INCLUDE_INSTALL_DIR} COMPONENT Devel)
add_subdirectory(doc EXCLUDE_FROM_ALL)
@@ -449,6 +469,8 @@ if(BUILD_TESTING)
else()
add_subdirectory(test EXCLUDE_FROM_ALL)
endif()
add_subdirectory(failtest)
endif()
if(EIGEN_LEAVE_TEST_IN_ALL_TARGET)
@@ -461,9 +483,31 @@ endif()
# add SYCL
option(EIGEN_TEST_SYCL "Add Sycl support." OFF)
option(EIGEN_SYCL_TRISYCL "Use the triSYCL Sycl implementation (ComputeCPP by default)." OFF)
if(EIGEN_TEST_SYCL)
set (CMAKE_MODULE_PATH "${CMAKE_ROOT}/Modules" "cmake/Modules/" "${CMAKE_MODULE_PATH}")
include(FindComputeCpp)
if(EIGEN_SYCL_TRISYCL)
message(STATUS "Using triSYCL")
include(FindTriSYCL)
else()
message(STATUS "Using ComputeCPP SYCL")
include(FindComputeCpp)
set(COMPUTECPP_DRIVER_DEFAULT_VALUE OFF)
if (NOT MSVC)
set(COMPUTECPP_DRIVER_DEFAULT_VALUE ON)
endif()
option(COMPUTECPP_USE_COMPILER_DRIVER
"Use ComputeCpp driver instead of a 2 steps compilation"
${COMPUTECPP_DRIVER_DEFAULT_VALUE}
)
endif(EIGEN_SYCL_TRISYCL)
option(EIGEN_DONT_VECTORIZE_SYCL "Don't use vectorisation in the SYCL tests." OFF)
if(EIGEN_DONT_VECTORIZE_SYCL)
message(STATUS "Disabling SYCL vectorization in tests/examples")
# When disabling SYCL vectorization, also disable Eigen default vectorization
add_definitions(-DEIGEN_DONT_VECTORIZE=1)
add_definitions(-DEIGEN_DONT_VECTORIZE_SYCL=1)
endif()
endif()
add_subdirectory(unsupported)
@@ -476,11 +520,11 @@ add_subdirectory(scripts EXCLUDE_FROM_ALL)
# TODO: consider also replacing EIGEN_BUILD_BTL by a custom target "make btl"?
if(EIGEN_BUILD_BTL)
add_subdirectory(bench/btl EXCLUDE_FROM_ALL)
endif(EIGEN_BUILD_BTL)
endif()
if(NOT WIN32)
add_subdirectory(bench/spbench EXCLUDE_FROM_ALL)
endif(NOT WIN32)
endif()
configure_file(scripts/cdashtesting.cmake.in cdashtesting.cmake @ONLY)
@@ -492,11 +536,6 @@ message(STATUS "")
message(STATUS "Configured Eigen ${EIGEN_VERSION_NUMBER}")
message(STATUS "")
option(EIGEN_FAILTEST "Enable failtests." OFF)
if(EIGEN_FAILTEST)
add_subdirectory(failtest)
endif()
string(TOLOWER "${CMAKE_GENERATOR}" cmake_generator_tolower)
if(cmake_generator_tolower MATCHES "makefile")
message(STATUS "Some things you can do now:")
@@ -513,8 +552,10 @@ if(cmake_generator_tolower MATCHES "makefile")
message(STATUS " | Or:")
message(STATUS " | cmake . -DINCLUDE_INSTALL_DIR=yourdir")
message(STATUS "make doc | Generate the API documentation, requires Doxygen & LaTeX")
message(STATUS "make check | Build and run the unit-tests. Read this page:")
message(STATUS " | http://eigen.tuxfamily.org/index.php?title=Tests")
if(BUILD_TESTING)
message(STATUS "make check | Build and run the unit-tests. Read this page:")
message(STATUS " | http://eigen.tuxfamily.org/index.php?title=Tests")
endif()
message(STATUS "make blas | Build BLAS library (not the same thing as Eigen)")
message(STATUS "make uninstall| Removes files installed by make install")
message(STATUS "--------------+--------------------------------------------------------------")
@@ -540,7 +581,7 @@ if (NOT CMAKE_VERSION VERSION_LESS 3.0)
# Imported target support
add_library (eigen INTERFACE)
add_library (Eigen3::Eigen ALIAS eigen)
target_compile_definitions (eigen INTERFACE ${EIGEN_DEFINITIONS})
target_include_directories (eigen INTERFACE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
@@ -579,13 +620,13 @@ if (NOT CMAKE_VERSION VERSION_LESS 3.0)
install (EXPORT Eigen3Targets NAMESPACE Eigen3:: DESTINATION ${CMAKEPACKAGE_INSTALL_DIR})
else (NOT CMAKE_VERSION VERSION_LESS 3.0)
else ()
# Fallback to legacy Eigen3Config.cmake without the imported target
# If CMakePackageConfigHelpers module is available (CMake >= 2.8.8)
# create a relocatable Config file, otherwise leave the hardcoded paths
# create a relocatable Config file, otherwise leave the hardcoded paths
include(CMakePackageConfigHelpers OPTIONAL RESULT_VARIABLE CPCH_PATH)
if(CPCH_PATH)
configure_package_config_file (
${CMAKE_CURRENT_SOURCE_DIR}/cmake/Eigen3ConfigLegacy.cmake.in
@@ -594,7 +635,7 @@ else (NOT CMAKE_VERSION VERSION_LESS 3.0)
INSTALL_DESTINATION ${CMAKEPACKAGE_INSTALL_DIR}
NO_CHECK_REQUIRED_COMPONENTS_MACRO # Eigen does not provide components
)
else()
else()
# The PACKAGE_* variables are defined by the configure_package_config_file
# but without it we define them manually to the hardcoded paths
set(PACKAGE_INIT "")
@@ -609,7 +650,7 @@ else (NOT CMAKE_VERSION VERSION_LESS 3.0)
VERSION ${EIGEN_VERSION_NUMBER}
COMPATIBILITY SameMajorVersion )
endif (NOT CMAKE_VERSION VERSION_LESS 3.0)
endif ()
install ( FILES ${CMAKE_CURRENT_SOURCE_DIR}/cmake/UseEigen3.cmake
${CMAKE_CURRENT_BINARY_DIR}/Eigen3Config.cmake

View File

@@ -2,8 +2,8 @@
## Then modify the CMakeLists.txt file in the root directory of your
## project to incorporate the testing dashboard.
## # The following are required to uses Dart and the Cdash dashboard
## ENABLE_TESTING()
## INCLUDE(CTest)
## enable_testing()
## include(CTest)
set(CTEST_PROJECT_NAME "Eigen")
set(CTEST_NIGHTLY_START_TIME "00:00:00 UTC")
@@ -11,3 +11,7 @@ set(CTEST_DROP_METHOD "http")
set(CTEST_DROP_SITE "my.cdash.org")
set(CTEST_DROP_LOCATION "/submit.php?project=Eigen")
set(CTEST_DROP_SITE_CDASH TRUE)
#set(CTEST_PROJECT_SUBPROJECTS
#Official
#Unsupported
#)

View File

@@ -1,19 +0,0 @@
include(RegexUtils)
test_escape_string_as_regex()
file(GLOB Eigen_directory_files "*")
escape_string_as_regex(ESCAPED_CMAKE_CURRENT_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}")
foreach(f ${Eigen_directory_files})
if(NOT f MATCHES "\\.txt" AND NOT f MATCHES "${ESCAPED_CMAKE_CURRENT_SOURCE_DIR}/[.].+" AND NOT f MATCHES "${ESCAPED_CMAKE_CURRENT_SOURCE_DIR}/src")
list(APPEND Eigen_directory_files_to_install ${f})
endif()
endforeach(f ${Eigen_directory_files})
install(FILES
${Eigen_directory_files_to_install}
DESTINATION ${INCLUDE_INSTALL_DIR}/Eigen COMPONENT Devel
)
install(DIRECTORY src DESTINATION ${INCLUDE_INSTALL_DIR}/Eigen COMPONENT Devel FILES_MATCHING PATTERN "*.h")

View File

@@ -14,79 +14,26 @@
// first thing Eigen does: stop the compiler from committing suicide
#include "src/Core/util/DisableStupidWarnings.h"
#if defined(__CUDACC__) && !defined(EIGEN_NO_CUDA)
#define EIGEN_CUDACC __CUDACC__
// then include this file where all our macros are defined. It's really important to do it first because
// it's where we do all the compiler/OS/arch detections and define most defaults.
#include "src/Core/util/Macros.h"
// This detects SSE/AVX/NEON/etc. and configure alignment settings
#include "src/Core/util/ConfigureVectorization.h"
// We need cuda_runtime.h/hip_runtime.h to ensure that
// the EIGEN_USING_STD_MATH macro works properly on the device side
#if defined(EIGEN_CUDACC)
#include <cuda_runtime.h>
#elif defined(EIGEN_HIPCC)
#include <hip/hip_runtime.h>
#endif
#if defined(__CUDA_ARCH__) && !defined(EIGEN_NO_CUDA)
#define EIGEN_CUDA_ARCH __CUDA_ARCH__
#endif
#if defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 9)
#define EIGEN_CUDACC_VER ((__CUDACC_VER_MAJOR__ * 10000) + (__CUDACC_VER_MINOR__ * 100))
#elif defined(__CUDACC_VER__)
#define EIGEN_CUDACC_VER __CUDACC_VER__
#else
#define EIGEN_CUDACC_VER 0
#endif
// Handle NVCC/CUDA/SYCL
#if defined(__CUDACC__) || defined(__SYCL_DEVICE_ONLY__)
// Do not try asserts on CUDA and SYCL!
#ifndef EIGEN_NO_DEBUG
#define EIGEN_NO_DEBUG
#endif
#ifdef EIGEN_INTERNAL_DEBUGGING
#undef EIGEN_INTERNAL_DEBUGGING
#endif
#ifdef EIGEN_EXCEPTIONS
#undef EIGEN_EXCEPTIONS
#endif
// All functions callable from CUDA code must be qualified with __device__
#ifdef __CUDACC__
// Do not try to vectorize on CUDA and SYCL!
#ifndef EIGEN_DONT_VECTORIZE
#define EIGEN_DONT_VECTORIZE
#endif
#define EIGEN_DEVICE_FUNC __host__ __device__
// We need cuda_runtime.h to ensure that that EIGEN_USING_STD_MATH macro
// works properly on the device side
#include <cuda_runtime.h>
#else
#define EIGEN_DEVICE_FUNC
#endif
#else
#define EIGEN_DEVICE_FUNC
#endif
// When compiling CUDA device code with NVCC, pull in math functions from the
// global namespace. In host mode, and when device doee with clang, use the
// std versions.
#if defined(__CUDA_ARCH__) && defined(__NVCC__)
#define EIGEN_USING_STD_MATH(FUNC) using ::FUNC;
#else
#define EIGEN_USING_STD_MATH(FUNC) using std::FUNC;
#endif
#if (defined(_CPPUNWIND) || defined(__EXCEPTIONS)) && !defined(__CUDA_ARCH__) && !defined(EIGEN_EXCEPTIONS) && !defined(EIGEN_USE_SYCL)
#define EIGEN_EXCEPTIONS
#endif
#ifdef EIGEN_EXCEPTIONS
#include <new>
#endif
// then include this file where all our macros are defined. It's really important to do it first because
// it's where we do all the alignment settings (platform detection and honoring the user's will if he
// defined e.g. EIGEN_DONT_ALIGN) so it needs to be done before we do anything with vectorization.
#include "src/Core/util/Macros.h"
// Disable the ipa-cp-clone optimization flag with MinGW 6.x or newer (enabled by default with -O3)
// See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=556 for details.
#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6)
@@ -99,163 +46,9 @@
// and inclusion of their respective header files
#include "src/Core/util/MKL_support.h"
// if alignment is disabled, then disable vectorization. Note: EIGEN_MAX_ALIGN_BYTES is the proper check, it takes into
// account both the user's will (EIGEN_MAX_ALIGN_BYTES,EIGEN_DONT_ALIGN) and our own platform checks
#if EIGEN_MAX_ALIGN_BYTES==0
#ifndef EIGEN_DONT_VECTORIZE
#define EIGEN_DONT_VECTORIZE
#endif
#endif
#if EIGEN_COMP_MSVC
#include <malloc.h> // for _aligned_malloc -- need it regardless of whether vectorization is enabled
#if (EIGEN_COMP_MSVC >= 1500) // 2008 or later
// Remember that usage of defined() in a #define is undefined by the standard.
// a user reported that in 64-bit mode, MSVC doesn't care to define _M_IX86_FP.
#if (defined(_M_IX86_FP) && (_M_IX86_FP >= 2)) || EIGEN_ARCH_x86_64
#define EIGEN_SSE2_ON_MSVC_2008_OR_LATER
#endif
#endif
#else
// Remember that usage of defined() in a #define is undefined by the standard
#if (defined __SSE2__) && ( (!EIGEN_COMP_GNUC) || EIGEN_COMP_ICC || EIGEN_GNUC_AT_LEAST(4,2) )
#define EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC
#endif
#endif
#ifndef EIGEN_DONT_VECTORIZE
#if defined (EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC) || defined(EIGEN_SSE2_ON_MSVC_2008_OR_LATER)
// Defines symbols for compile-time detection of which instructions are
// used.
// EIGEN_VECTORIZE_YY is defined if and only if the instruction set YY is used
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_SSE
#define EIGEN_VECTORIZE_SSE2
// Detect sse3/ssse3/sse4:
// gcc and icc defines __SSE3__, ...
// there is no way to know about this on msvc. You can define EIGEN_VECTORIZE_SSE* if you
// want to force the use of those instructions with msvc.
#ifdef __SSE3__
#define EIGEN_VECTORIZE_SSE3
#endif
#ifdef __SSSE3__
#define EIGEN_VECTORIZE_SSSE3
#endif
#ifdef __SSE4_1__
#define EIGEN_VECTORIZE_SSE4_1
#endif
#ifdef __SSE4_2__
#define EIGEN_VECTORIZE_SSE4_2
#endif
#ifdef __AVX__
#define EIGEN_VECTORIZE_AVX
#define EIGEN_VECTORIZE_SSE3
#define EIGEN_VECTORIZE_SSSE3
#define EIGEN_VECTORIZE_SSE4_1
#define EIGEN_VECTORIZE_SSE4_2
#endif
#ifdef __AVX2__
#define EIGEN_VECTORIZE_AVX2
#endif
#ifdef __FMA__
#define EIGEN_VECTORIZE_FMA
#endif
#if defined(__AVX512F__) && defined(EIGEN_ENABLE_AVX512)
#define EIGEN_VECTORIZE_AVX512
#define EIGEN_VECTORIZE_AVX2
#define EIGEN_VECTORIZE_AVX
#define EIGEN_VECTORIZE_FMA
#ifdef __AVX512DQ__
#define EIGEN_VECTORIZE_AVX512DQ
#endif
#ifdef __AVX512ER__
#define EIGEN_VECTORIZE_AVX512ER
#endif
#endif
// include files
// This extern "C" works around a MINGW-w64 compilation issue
// https://sourceforge.net/tracker/index.php?func=detail&aid=3018394&group_id=202880&atid=983354
// In essence, intrin.h is included by windows.h and also declares intrinsics (just as emmintrin.h etc. below do).
// However, intrin.h uses an extern "C" declaration, and g++ thus complains of duplicate declarations
// with conflicting linkage. The linkage for intrinsics doesn't matter, but at that stage the compiler doesn't know;
// so, to avoid compile errors when windows.h is included after Eigen/Core, ensure intrinsics are extern "C" here too.
// Notice that since these are C headers, the extern "C" is theoretically needed anyway.
extern "C" {
// In theory we should only include immintrin.h and not the other *mmintrin.h header files directly.
// Doing so triggers some issues with ICC. However, old gcc versions seem to not have this file, thus:
#if EIGEN_COMP_ICC >= 1110
#include <immintrin.h>
#else
#include <mmintrin.h>
#include <emmintrin.h>
#include <xmmintrin.h>
#ifdef EIGEN_VECTORIZE_SSE3
#include <pmmintrin.h>
#endif
#ifdef EIGEN_VECTORIZE_SSSE3
#include <tmmintrin.h>
#endif
#ifdef EIGEN_VECTORIZE_SSE4_1
#include <smmintrin.h>
#endif
#ifdef EIGEN_VECTORIZE_SSE4_2
#include <nmmintrin.h>
#endif
#if defined(EIGEN_VECTORIZE_AVX) || defined(EIGEN_VECTORIZE_AVX512)
#include <immintrin.h>
#endif
#endif
} // end extern "C"
#elif defined __VSX__
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_VSX
#include <altivec.h>
// We need to #undef all these ugly tokens defined in <altivec.h>
// => use __vector instead of vector
#undef bool
#undef vector
#undef pixel
#elif defined __ALTIVEC__
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_ALTIVEC
#include <altivec.h>
// We need to #undef all these ugly tokens defined in <altivec.h>
// => use __vector instead of vector
#undef bool
#undef vector
#undef pixel
#elif (defined __ARM_NEON) || (defined __ARM_NEON__)
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_NEON
#include <arm_neon.h>
#elif (defined __s390x__ && defined __VEC__)
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_ZVECTOR
#include <vecintrin.h>
#endif
#endif
#if defined(__F16C__) && !defined(EIGEN_COMP_CLANG)
// We can use the optimized fp16 to float and float to fp16 conversion routines
#define EIGEN_HAS_FP16_C
#endif
#if defined __CUDACC__
#define EIGEN_VECTORIZE_CUDA
#include <vector_types.h>
#if EIGEN_CUDACC_VER >= 70500
#define EIGEN_HAS_CUDA_FP16
#endif
#endif
#if defined EIGEN_HAS_CUDA_FP16
#include <host_defines.h>
#include <cuda_fp16.h>
#if defined(EIGEN_HAS_CUDA_FP16) || defined(EIGEN_HAS_HIP_FP16)
#define EIGEN_HAS_GPU_FP16
#endif
#if (defined _OPENMP) && (!defined EIGEN_DONT_PARALLELIZE)
@@ -279,7 +72,6 @@
#include <cmath>
#include <cassert>
#include <functional>
#include <sstream>
#ifndef EIGEN_NO_IO
#include <iosfwd>
#endif
@@ -290,6 +82,10 @@
// for min/max:
#include <algorithm>
#if EIGEN_HAS_CXX11
#include <array>
#endif
// for std::is_nothrow_move_assignable
#ifdef EIGEN_INCLUDE_TYPE_TRAITS
#include <type_traits>
@@ -305,38 +101,25 @@
#include <intrin.h>
#endif
/** \brief Namespace containing all symbols from the %Eigen library. */
namespace Eigen {
inline static const char *SimdInstructionSetsInUse(void) {
#if defined(EIGEN_VECTORIZE_AVX512)
return "AVX512, FMA, AVX2, AVX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2";
#elif defined(EIGEN_VECTORIZE_AVX)
return "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2";
#elif defined(EIGEN_VECTORIZE_SSE4_2)
return "SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2";
#elif defined(EIGEN_VECTORIZE_SSE4_1)
return "SSE, SSE2, SSE3, SSSE3, SSE4.1";
#elif defined(EIGEN_VECTORIZE_SSSE3)
return "SSE, SSE2, SSE3, SSSE3";
#elif defined(EIGEN_VECTORIZE_SSE3)
return "SSE, SSE2, SSE3";
#elif defined(EIGEN_VECTORIZE_SSE2)
return "SSE, SSE2";
#elif defined(EIGEN_VECTORIZE_ALTIVEC)
return "AltiVec";
#elif defined(EIGEN_VECTORIZE_VSX)
return "VSX";
#elif defined(EIGEN_VECTORIZE_NEON)
return "ARM NEON";
#elif defined(EIGEN_VECTORIZE_ZVECTOR)
return "S390X ZVECTOR";
#else
return "None";
#if defined(EIGEN_USE_SYCL)
#undef min
#undef max
#undef isnan
#undef isinf
#undef isfinite
#include <SYCL/sycl.hpp>
#include <map>
#include <memory>
#include <utility>
#include <thread>
#ifndef EIGEN_SYCL_LOCAL_THREAD_DIM0
#define EIGEN_SYCL_LOCAL_THREAD_DIM0 16
#endif
#ifndef EIGEN_SYCL_LOCAL_THREAD_DIM1
#define EIGEN_SYCL_LOCAL_THREAD_DIM1 16
#endif
#endif
}
} // end namespace Eigen
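A small usage sketch for the helper defined above; only SimdInstructionSetsInUse() is taken from this file, the rest is illustrative:
#include <iostream>
#include <Eigen/Core>

int main() {
  // Prints e.g. "SSE, SSE2" or "None", depending on which macros were selected above.
  std::cout << Eigen::SimdInstructionSetsInUse() << std::endl;
  return 0;
}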
#if defined EIGEN2_SUPPORT_STAGE40_FULL_EIGEN3_STRICTNESS || defined EIGEN2_SUPPORT_STAGE30_FULL_EIGEN3_API || defined EIGEN2_SUPPORT_STAGE20_RESOLVE_API_CONFLICTS || defined EIGEN2_SUPPORT_STAGE10_FULL_EIGEN2_API || defined EIGEN2_SUPPORT
// This will generate an error message:
@@ -345,7 +128,7 @@ inline static const char *SimdInstructionSetsInUse(void) {
namespace Eigen {
// we use size_t frequently and we'll never remember to prepend it with std:: everytime just to
// we use size_t frequently and we'll never remember to prepend it with std:: every time just to
// ensure QNX/QCC support
using std::size_t;
// gcc 4.6.0 wants std:: for ptrdiff_t
@@ -369,60 +152,85 @@ using std::ptrdiff_t;
#include "src/Core/util/StaticAssert.h"
#include "src/Core/util/XprHelper.h"
#include "src/Core/util/Memory.h"
#include "src/Core/util/IntegralConstant.h"
#include "src/Core/util/SymbolicIndex.h"
#include "src/Core/NumTraits.h"
#include "src/Core/MathFunctions.h"
#include "src/Core/GenericPacketMath.h"
#include "src/Core/MathFunctionsImpl.h"
#include "src/Core/arch/Default/ConjHelper.h"
// Generic half float support
#include "src/Core/arch/Default/Half.h"
#include "src/Core/arch/Default/TypeCasting.h"
#include "src/Core/arch/Default/GenericPacketMathFunctionsFwd.h"
#if defined EIGEN_VECTORIZE_AVX512
#include "src/Core/arch/SSE/PacketMath.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/SSE/TypeCasting.h"
#include "src/Core/arch/SSE/Complex.h"
#include "src/Core/arch/AVX/PacketMath.h"
#include "src/Core/arch/AVX/MathFunctions.h"
#include "src/Core/arch/AVX/TypeCasting.h"
#include "src/Core/arch/AVX/Complex.h"
#include "src/Core/arch/AVX512/PacketMath.h"
#include "src/Core/arch/AVX512/TypeCasting.h"
#include "src/Core/arch/AVX512/Complex.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/AVX/MathFunctions.h"
#include "src/Core/arch/AVX512/MathFunctions.h"
#elif defined EIGEN_VECTORIZE_AVX
// Use AVX for floats and doubles, SSE for integers
#include "src/Core/arch/SSE/PacketMath.h"
#include "src/Core/arch/SSE/Complex.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/AVX/PacketMath.h"
#include "src/Core/arch/AVX/MathFunctions.h"
#include "src/Core/arch/AVX/Complex.h"
#include "src/Core/arch/AVX/TypeCasting.h"
#include "src/Core/arch/SSE/TypeCasting.h"
#include "src/Core/arch/SSE/Complex.h"
#include "src/Core/arch/AVX/PacketMath.h"
#include "src/Core/arch/AVX/TypeCasting.h"
#include "src/Core/arch/AVX/Complex.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/AVX/MathFunctions.h"
#elif defined EIGEN_VECTORIZE_SSE
#include "src/Core/arch/SSE/PacketMath.h"
#include "src/Core/arch/SSE/TypeCasting.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/SSE/Complex.h"
#include "src/Core/arch/SSE/TypeCasting.h"
#elif defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX)
#include "src/Core/arch/AltiVec/PacketMath.h"
#include "src/Core/arch/AltiVec/MathFunctions.h"
#include "src/Core/arch/AltiVec/Complex.h"
#elif defined EIGEN_VECTORIZE_NEON
#include "src/Core/arch/NEON/PacketMath.h"
#include "src/Core/arch/NEON/TypeCasting.h"
#include "src/Core/arch/NEON/MathFunctions.h"
#include "src/Core/arch/NEON/Complex.h"
#elif defined EIGEN_VECTORIZE_ZVECTOR
#include "src/Core/arch/ZVector/PacketMath.h"
#include "src/Core/arch/ZVector/MathFunctions.h"
#include "src/Core/arch/ZVector/Complex.h"
#elif defined EIGEN_VECTORIZE_MSA
#include "src/Core/arch/MSA/PacketMath.h"
#include "src/Core/arch/MSA/MathFunctions.h"
#include "src/Core/arch/MSA/Complex.h"
#endif
// Half float support
#include "src/Core/arch/CUDA/Half.h"
#include "src/Core/arch/CUDA/PacketMathHalf.h"
#include "src/Core/arch/CUDA/TypeCasting.h"
#if defined EIGEN_VECTORIZE_GPU
#include "src/Core/arch/GPU/PacketMath.h"
#include "src/Core/arch/GPU/MathFunctions.h"
#include "src/Core/arch/GPU/TypeCasting.h"
#endif
#if defined EIGEN_VECTORIZE_CUDA
#include "src/Core/arch/CUDA/PacketMath.h"
#include "src/Core/arch/CUDA/MathFunctions.h"
#if defined(EIGEN_USE_SYCL)
#include "src/Core/arch/SYCL/SyclMemoryModel.h"
#include "src/Core/arch/SYCL/InteropHeaders.h"
#if !defined(EIGEN_DONT_VECTORIZE_SYCL)
#include "src/Core/arch/SYCL/PacketMath.h"
#include "src/Core/arch/SYCL/MathFunctions.h"
#include "src/Core/arch/SYCL/TypeCasting.h"
#endif
#endif
#include "src/Core/arch/Default/Settings.h"
// This file provides generic implementations valid for scalar as well
#include "src/Core/arch/Default/GenericPacketMathFunctions.h"
#include "src/Core/functors/TernaryFunctors.h"
#include "src/Core/functors/BinaryFunctors.h"
@@ -433,9 +241,16 @@ using std::ptrdiff_t;
// Specialized functors to enable the processing of complex numbers
// on CUDA devices
#ifdef EIGEN_CUDACC
#include "src/Core/arch/CUDA/Complex.h"
#endif
#include "src/Core/IO.h"
#include "src/Core/util/IndexedViewHelper.h"
#include "src/Core/util/ReshapedHelper.h"
#include "src/Core/ArithmeticSequence.h"
#ifndef EIGEN_NO_IO
#include "src/Core/IO.h"
#endif
#include "src/Core/DenseCoeffsBase.h"
#include "src/Core/DenseBase.h"
#include "src/Core/MatrixBase.h"
@@ -476,6 +291,8 @@ using std::ptrdiff_t;
#include "src/Core/Ref.h"
#include "src/Core/Block.h"
#include "src/Core/VectorBlock.h"
#include "src/Core/IndexedView.h"
#include "src/Core/Reshaped.h"
#include "src/Core/Transpose.h"
#include "src/Core/DiagonalMatrix.h"
#include "src/Core/Diagonal.h"
@@ -515,10 +332,12 @@ using std::ptrdiff_t;
#include "src/Core/BooleanRedux.h"
#include "src/Core/Select.h"
#include "src/Core/VectorwiseOp.h"
#include "src/Core/PartialReduxEvaluator.h"
#include "src/Core/Random.h"
#include "src/Core/Replicate.h"
#include "src/Core/Reverse.h"
#include "src/Core/ArrayWrapper.h"
#include "src/Core/StlIterators.h"
#ifdef EIGEN_USE_BLAS
#include "src/Core/products/GeneralMatrixMatrix_BLAS.h"


@@ -49,9 +49,8 @@
#include "src/Geometry/AlignedBox.h"
#include "src/Geometry/Umeyama.h"
// Use the SSE optimized version whenever possible. At the moment the
// SSE version doesn't compile when AVX is enabled
#if defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX
// Use the SSE optimized version whenever possible.
#if defined EIGEN_VECTORIZE_SSE
#include "src/Geometry/arch/Geometry_SSE.h"
#endif
@@ -59,4 +58,3 @@
#endif // EIGEN_GEOMETRY_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */

Eigen/KLUSupport Normal file

@@ -0,0 +1,41 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_KLUSUPPORT_MODULE_H
#define EIGEN_KLUSUPPORT_MODULE_H
#include <Eigen/SparseCore>
#include <Eigen/src/Core/util/DisableStupidWarnings.h>
extern "C" {
#include <btf.h>
#include <klu.h>
}
/** \ingroup Support_modules
* \defgroup KLUSupport_Module KLUSupport module
*
* This module provides an interface to the KLU library which is part of the <a href="http://www.suitesparse.com">suitesparse</a> package.
* It provides the following factorization class:
* - class KLU: a sparse LU factorization, well-suited for circuit simulation.
*
* \code
* #include <Eigen/KLUSupport>
* \endcode
*
* In order to use this module, the klu and btf headers must be accessible from the include paths, and your binary must be linked to the klu library and its dependencies.
* The dependencies depend on how KLU has been compiled.
* For a cmake based project, you can use our FindKLU.cmake module to help you in this task.
*
*/
#include "src/KLUSupport/KLUSupport.h"
#include <Eigen/src/Core/util/ReenableStupidWarnings.h>
#endif // EIGEN_KLUSUPPORT_MODULE_H
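A minimal usage sketch of the KLU solver exposed by this module, assuming the usual Eigen sparse-solver interface (compute()/solve()/info()) and that the klu/btf libraries are installed and linked:
#include <cassert>
#include <Eigen/Sparse>
#include <Eigen/KLUSupport>

Eigen::VectorXd solve_with_klu(const Eigen::SparseMatrix<double>& A,
                               const Eigen::VectorXd& b) {
  Eigen::KLU<Eigen::SparseMatrix<double> > solver;
  solver.compute(A);                        // analyzePattern() followed by factorize()
  assert(solver.info() == Eigen::Success);  // check the factorization succeeded
  return solver.solve(b);
}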


@@ -63,10 +63,7 @@
* \endcode
*/
#ifndef EIGEN_MPL2_ONLY
#include "src/OrderingMethods/Amd.h"
#endif
#include "src/OrderingMethods/Ordering.h"
#include "src/Core/util/ReenableStupidWarnings.h"


@@ -36,6 +36,7 @@ extern "C" {
* \endcode
*
* In order to use this module, the PaSTiX headers must be accessible from the include paths, and your binary must be linked to the PaSTiX library and its dependencies.
* This wrapper requires PaStiX version 5.x compiled without MPI support.
* The dependencies depend on how PaSTiX has been compiled.
* For a cmake based project, you can use our FindPaSTiX.cmake module to help you in this task.
*

Eigen/PardisoSupport Executable file → Normal file


@@ -25,9 +25,7 @@
#include "SparseCore"
#include "OrderingMethods"
#ifndef EIGEN_MPL2_ONLY
#include "SparseCholesky"
#endif
#include "SparseLU"
#include "SparseQR"
#include "IterativeLinearSolvers"


@@ -30,16 +30,8 @@
* \endcode
*/
#ifdef EIGEN_MPL2_ONLY
#error The SparseCholesky module has nothing to offer in MPL2 only mode
#endif
#include "src/SparseCholesky/SimplicialCholesky.h"
#ifndef EIGEN_MPL2_ONLY
#include "src/SparseCholesky/SimplicialCholesky_impl.h"
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SPARSECHOLESKY_MODULE_H


@@ -23,6 +23,8 @@
// Ordering interface
#include "OrderingMethods"
#include "src/Core/util/DisableStupidWarnings.h"
#include "src/SparseLU/SparseLU_gemm_kernel.h"
#include "src/SparseLU/SparseLU_Structs.h"
@@ -43,4 +45,6 @@
#include "src/SparseLU/SparseLU_Utils.h"
#include "src/SparseLU/SparseLU.h"
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SPARSELU_MODULE_H


@@ -16,6 +16,15 @@
namespace Eigen {
namespace internal {
template<typename _MatrixType, int _UpLo> struct traits<LDLT<_MatrixType, _UpLo> >
: traits<_MatrixType>
{
typedef MatrixXpr XprKind;
typedef SolverStorage StorageKind;
typedef int StorageIndex;
enum { Flags = 0 };
};
template<typename MatrixType, int UpLo> struct LDLT_Traits;
// PositiveSemiDef means positive semi-definite and non-zero; same for NegativeSemiDef
@@ -48,20 +57,19 @@ namespace internal {
* \sa MatrixBase::ldlt(), SelfAdjointView::ldlt(), class LLT
*/
template<typename _MatrixType, int _UpLo> class LDLT
: public SolverBase<LDLT<_MatrixType, _UpLo> >
{
public:
typedef _MatrixType MatrixType;
typedef SolverBase<LDLT> Base;
friend class SolverBase<LDLT>;
EIGEN_GENERIC_PUBLIC_INTERFACE(LDLT)
enum {
RowsAtCompileTime = MatrixType::RowsAtCompileTime,
ColsAtCompileTime = MatrixType::ColsAtCompileTime,
MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
UpLo = _UpLo
};
typedef typename MatrixType::Scalar Scalar;
typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3
typedef typename MatrixType::StorageIndex StorageIndex;
typedef Matrix<Scalar, RowsAtCompileTime, 1, 0, MaxRowsAtCompileTime, 1> TmpMatrixType;
typedef Transpositions<RowsAtCompileTime, MaxRowsAtCompileTime> TranspositionType;
@@ -180,6 +188,7 @@ template<typename _MatrixType, int _UpLo> class LDLT
return m_sign == internal::NegativeSemiDef || m_sign == internal::ZeroSign;
}
#ifdef EIGEN_PARSED_BY_DOXYGEN
/** \returns a solution x of \f$ A x = b \f$ using the current decomposition of A.
*
* This function also supports in-place solves using the syntax <tt>x = decompositionObject.solve(x)</tt> .
@@ -197,13 +206,8 @@ template<typename _MatrixType, int _UpLo> class LDLT
*/
template<typename Rhs>
inline const Solve<LDLT, Rhs>
solve(const MatrixBase<Rhs>& b) const
{
eigen_assert(m_isInitialized && "LDLT is not initialized.");
eigen_assert(m_matrix.rows()==b.rows()
&& "LDLT::solve(): invalid number of rows of the right hand side matrix b");
return Solve<LDLT, Rhs>(*this, b.derived());
}
solve(const MatrixBase<Rhs>& b) const;
#endif
template<typename Derived>
bool solveInPlace(MatrixBase<Derived> &bAndX) const;
@@ -247,7 +251,7 @@ template<typename _MatrixType, int _UpLo> class LDLT
/** \brief Reports whether previous computation was successful.
*
* \returns \c Success if computation was succesful,
* \returns \c Success if computation was successful,
* \c NumericalIssue if the factorization failed because of a zero pivot.
*/
ComputationInfo info() const
@@ -258,8 +262,10 @@ template<typename _MatrixType, int _UpLo> class LDLT
#ifndef EIGEN_PARSED_BY_DOXYGEN
template<typename RhsType, typename DstType>
EIGEN_DEVICE_FUNC
void _solve_impl(const RhsType &rhs, DstType &dst) const;
template<bool Conjugate, typename RhsType, typename DstType>
void _solve_impl_transposed(const RhsType &rhs, DstType &dst) const;
#endif
protected:
@@ -560,14 +566,22 @@ template<typename _MatrixType, int _UpLo>
template<typename RhsType, typename DstType>
void LDLT<_MatrixType,_UpLo>::_solve_impl(const RhsType &rhs, DstType &dst) const
{
eigen_assert(rhs.rows() == rows());
_solve_impl_transposed<true>(rhs, dst);
}
template<typename _MatrixType,int _UpLo>
template<bool Conjugate, typename RhsType, typename DstType>
void LDLT<_MatrixType,_UpLo>::_solve_impl_transposed(const RhsType &rhs, DstType &dst) const
{
// dst = P b
dst = m_transpositions * rhs;
// dst = L^-1 (P b)
matrixL().solveInPlace(dst);
// dst = L^-*T (P b)
matrixL().template conjugateIf<!Conjugate>().solveInPlace(dst);
// dst = D^-1 (L^-1 P b)
// dst = D^-* (L^-1 P b)
// dst = D^-1 (L^-*T P b)
// more precisely, use pseudo-inverse of D (see bug 241)
using std::abs;
const typename Diagonal<const MatrixType>::RealReturnType vecD(vectorD());
@@ -579,7 +593,6 @@ void LDLT<_MatrixType,_UpLo>::_solve_impl(const RhsType &rhs, DstType &dst) cons
// Moreover, Lapack's xSYTRS routines use 0 for the tolerance.
// Using numeric_limits::min() gives us more robustness to denormals.
RealScalar tolerance = (std::numeric_limits<RealScalar>::min)();
for (Index i = 0; i < vecD.size(); ++i)
{
if(abs(vecD(i)) > tolerance)
@@ -588,10 +601,12 @@ void LDLT<_MatrixType,_UpLo>::_solve_impl(const RhsType &rhs, DstType &dst) cons
dst.row(i).setZero();
}
// dst = L^-T (D^-1 L^-1 P b)
matrixU().solveInPlace(dst);
// dst = L^-* (D^-* L^-1 P b)
// dst = L^-T (D^-1 L^-*T P b)
matrixL().transpose().template conjugateIf<Conjugate>().solveInPlace(dst);
// dst = P^-1 (L^-T D^-1 L^-1 P b) = A^-1 b
// dst = P^T (L^-* D^-* L^-1 P b) = A^-1 b
// dst = P^-T (L^-T D^-1 L^-*T P b) = A^-1 b
dst = m_transpositions.transpose() * dst;
}
#endif
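A short usage sketch of the solve() and in-place solve documented above (illustrative, not part of this diff):
#include <Eigen/Cholesky>

Eigen::VectorXd solve_spd(const Eigen::MatrixXd& A, Eigen::VectorXd b) {
  Eigen::LDLT<Eigen::MatrixXd> ldlt(A);  // A is assumed symmetric (semi-)definite
  Eigen::VectorXd x = ldlt.solve(b);     // ordinary solve
  b = ldlt.solve(b);                     // in-place style solve, as mentioned in the docs above
  return x;
}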


@@ -13,6 +13,16 @@
namespace Eigen {
namespace internal{
template<typename _MatrixType, int _UpLo> struct traits<LLT<_MatrixType, _UpLo> >
: traits<_MatrixType>
{
typedef MatrixXpr XprKind;
typedef SolverStorage StorageKind;
typedef int StorageIndex;
enum { Flags = 0 };
};
template<typename MatrixType, int UpLo> struct LLT_Traits;
}
@@ -54,18 +64,17 @@ template<typename MatrixType, int UpLo> struct LLT_Traits;
* \sa MatrixBase::llt(), SelfAdjointView::llt(), class LDLT
*/
template<typename _MatrixType, int _UpLo> class LLT
: public SolverBase<LLT<_MatrixType, _UpLo> >
{
public:
typedef _MatrixType MatrixType;
typedef SolverBase<LLT> Base;
friend class SolverBase<LLT>;
EIGEN_GENERIC_PUBLIC_INTERFACE(LLT)
enum {
RowsAtCompileTime = MatrixType::RowsAtCompileTime,
ColsAtCompileTime = MatrixType::ColsAtCompileTime,
MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
};
typedef typename MatrixType::Scalar Scalar;
typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3
typedef typename MatrixType::StorageIndex StorageIndex;
enum {
PacketSize = internal::packet_traits<Scalar>::size,
@@ -100,7 +109,7 @@ template<typename _MatrixType, int _UpLo> class LLT
compute(matrix.derived());
}
/** \brief Constructs a LDLT factorization from a given matrix
/** \brief Constructs a LLT factorization from a given matrix
*
* This overloaded constructor is provided for \link InplaceDecomposition inplace decomposition \endlink when
* \c MatrixType is a Eigen::Ref.
@@ -129,6 +138,7 @@ template<typename _MatrixType, int _UpLo> class LLT
return Traits::getL(m_matrix);
}
#ifdef EIGEN_PARSED_BY_DOXYGEN
/** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
*
* Since this LLT class assumes anyway that the matrix A is invertible, the solution
@@ -141,13 +151,8 @@ template<typename _MatrixType, int _UpLo> class LLT
*/
template<typename Rhs>
inline const Solve<LLT, Rhs>
solve(const MatrixBase<Rhs>& b) const
{
eigen_assert(m_isInitialized && "LLT is not initialized.");
eigen_assert(m_matrix.rows()==b.rows()
&& "LLT::solve(): invalid number of rows of the right hand side matrix b");
return Solve<LLT, Rhs>(*this, b.derived());
}
solve(const MatrixBase<Rhs>& b) const;
#endif
template<typename Derived>
void solveInPlace(const MatrixBase<Derived> &bAndX) const;
@@ -180,7 +185,7 @@ template<typename _MatrixType, int _UpLo> class LLT
/** \brief Reports whether previous computation was successful.
*
* \returns \c Success if computation was succesful,
* \returns \c Success if computation was successful,
* \c NumericalIssue if the matrix appears not to be positive definite.
*/
ComputationInfo info() const
@@ -200,12 +205,14 @@ template<typename _MatrixType, int _UpLo> class LLT
inline Index cols() const { return m_matrix.cols(); }
template<typename VectorType>
LLT rankUpdate(const VectorType& vec, const RealScalar& sigma = 1);
LLT & rankUpdate(const VectorType& vec, const RealScalar& sigma = 1);
#ifndef EIGEN_PARSED_BY_DOXYGEN
template<typename RhsType, typename DstType>
EIGEN_DEVICE_FUNC
void _solve_impl(const RhsType &rhs, DstType &dst) const;
template<bool Conjugate, typename RhsType, typename DstType>
void _solve_impl_transposed(const RhsType &rhs, DstType &dst) const;
#endif
protected:
@@ -459,7 +466,7 @@ LLT<MatrixType,_UpLo>& LLT<MatrixType,_UpLo>::compute(const EigenBase<InputType>
*/
template<typename _MatrixType, int _UpLo>
template<typename VectorType>
LLT<_MatrixType,_UpLo> LLT<_MatrixType,_UpLo>::rankUpdate(const VectorType& v, const RealScalar& sigma)
LLT<_MatrixType,_UpLo> & LLT<_MatrixType,_UpLo>::rankUpdate(const VectorType& v, const RealScalar& sigma)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorType);
eigen_assert(v.size()==m_matrix.cols());
@@ -477,8 +484,17 @@ template<typename _MatrixType,int _UpLo>
template<typename RhsType, typename DstType>
void LLT<_MatrixType,_UpLo>::_solve_impl(const RhsType &rhs, DstType &dst) const
{
dst = rhs;
solveInPlace(dst);
_solve_impl_transposed<true>(rhs, dst);
}
template<typename _MatrixType,int _UpLo>
template<bool Conjugate, typename RhsType, typename DstType>
void LLT<_MatrixType,_UpLo>::_solve_impl_transposed(const RhsType &rhs, DstType &dst) const
{
dst = rhs;
matrixL().template conjugateIf<!Conjugate>().solveInPlace(dst);
matrixU().template conjugateIf<!Conjugate>().solveInPlace(dst);
}
#endif
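Since rankUpdate() now returns a reference (see the signature change above), successive updates can presumably be chained; a hypothetical sketch:
#include <Eigen/Cholesky>

void chained_rank_updates(const Eigen::MatrixXd& A,
                          const Eigen::VectorXd& v1, const Eigen::VectorXd& v2) {
  Eigen::LLT<Eigen::MatrixXd> llt(A);       // A assumed symmetric positive definite
  llt.rankUpdate(v1).rankUpdate(v2, -1.0);  // refactors A + v1*v1^T - v2*v2^T incrementally
}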


@@ -10,7 +10,7 @@
#ifndef EIGEN_CHOLMODSUPPORT_H
#define EIGEN_CHOLMODSUPPORT_H
namespace Eigen {
namespace Eigen {
namespace internal {
@@ -32,7 +32,7 @@ template<> struct cholmod_configure_matrix<std::complex<double> > {
}
};
// Other scalar types are not yet suppotred by Cholmod
// Other scalar types are not yet supported by Cholmod
// template<> struct cholmod_configure_matrix<float> {
// template<typename CholmodType>
// static void run(CholmodType& mat) {
@@ -79,12 +79,12 @@ cholmod_sparse viewAsCholmod(Ref<SparseMatrix<_Scalar,_Options,_StorageIndex> >
res.dtype = 0;
res.stype = -1;
if (internal::is_same<_StorageIndex,int>::value)
{
res.itype = CHOLMOD_INT;
}
else if (internal::is_same<_StorageIndex,long>::value)
else if (internal::is_same<_StorageIndex,SuiteSparse_long>::value)
{
res.itype = CHOLMOD_LONG;
}
@@ -95,9 +95,9 @@ cholmod_sparse viewAsCholmod(Ref<SparseMatrix<_Scalar,_Options,_StorageIndex> >
// setup res.xtype
internal::cholmod_configure_matrix<_Scalar>::run(res);
res.stype = 0;
return res;
}
@@ -121,9 +121,12 @@ template<typename _Scalar, int _Options, typename _Index, unsigned int UpLo>
cholmod_sparse viewAsCholmod(const SparseSelfAdjointView<const SparseMatrix<_Scalar,_Options,_Index>, UpLo>& mat)
{
cholmod_sparse res = viewAsCholmod(Ref<SparseMatrix<_Scalar,_Options,_Index> >(mat.matrix().const_cast_derived()));
if(UpLo==Upper) res.stype = 1;
if(UpLo==Lower) res.stype = -1;
// swap stype for rowmajor matrices (only works for real matrices)
EIGEN_STATIC_ASSERT((_Options & RowMajorBit) == 0 || NumTraits<_Scalar>::IsComplex == 0, THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
if(_Options & RowMajorBit) res.stype *=-1;
return res;
}
@@ -159,6 +162,44 @@ MappedSparseMatrix<Scalar,Flags,StorageIndex> viewAsEigen(cholmod_sparse& cm)
static_cast<StorageIndex*>(cm.p), static_cast<StorageIndex*>(cm.i),static_cast<Scalar*>(cm.x) );
}
namespace internal {
// template specializations for int and long that call the correct cholmod method
#define EIGEN_CHOLMOD_SPECIALIZE0(ret, name) \
template<typename _StorageIndex> inline ret cm_ ## name (cholmod_common &Common) { return cholmod_ ## name (&Common); } \
template<> inline ret cm_ ## name<SuiteSparse_long> (cholmod_common &Common) { return cholmod_l_ ## name (&Common); }
#define EIGEN_CHOLMOD_SPECIALIZE1(ret, name, t1, a1) \
template<typename _StorageIndex> inline ret cm_ ## name (t1& a1, cholmod_common &Common) { return cholmod_ ## name (&a1, &Common); } \
template<> inline ret cm_ ## name<SuiteSparse_long> (t1& a1, cholmod_common &Common) { return cholmod_l_ ## name (&a1, &Common); }
EIGEN_CHOLMOD_SPECIALIZE0(int, start)
EIGEN_CHOLMOD_SPECIALIZE0(int, finish)
EIGEN_CHOLMOD_SPECIALIZE1(int, free_factor, cholmod_factor*, L)
EIGEN_CHOLMOD_SPECIALIZE1(int, free_dense, cholmod_dense*, X)
EIGEN_CHOLMOD_SPECIALIZE1(int, free_sparse, cholmod_sparse*, A)
EIGEN_CHOLMOD_SPECIALIZE1(cholmod_factor*, analyze, cholmod_sparse, A)
template<typename _StorageIndex> inline cholmod_dense* cm_solve (int sys, cholmod_factor& L, cholmod_dense& B, cholmod_common &Common) { return cholmod_solve (sys, &L, &B, &Common); }
template<> inline cholmod_dense* cm_solve<SuiteSparse_long> (int sys, cholmod_factor& L, cholmod_dense& B, cholmod_common &Common) { return cholmod_l_solve (sys, &L, &B, &Common); }
template<typename _StorageIndex> inline cholmod_sparse* cm_spsolve (int sys, cholmod_factor& L, cholmod_sparse& B, cholmod_common &Common) { return cholmod_spsolve (sys, &L, &B, &Common); }
template<> inline cholmod_sparse* cm_spsolve<SuiteSparse_long> (int sys, cholmod_factor& L, cholmod_sparse& B, cholmod_common &Common) { return cholmod_l_spsolve (sys, &L, &B, &Common); }
template<typename _StorageIndex>
inline int cm_factorize_p (cholmod_sparse* A, double beta[2], _StorageIndex* fset, std::size_t fsize, cholmod_factor* L, cholmod_common &Common) { return cholmod_factorize_p (A, beta, fset, fsize, L, &Common); }
template<>
inline int cm_factorize_p<SuiteSparse_long> (cholmod_sparse* A, double beta[2], SuiteSparse_long* fset, std::size_t fsize, cholmod_factor* L, cholmod_common &Common) { return cholmod_l_factorize_p (A, beta, fset, fsize, L, &Common); }
#undef EIGEN_CHOLMOD_SPECIALIZE0
#undef EIGEN_CHOLMOD_SPECIALIZE1
} // namespace internal
enum CholmodMode {
CholmodAuto, CholmodSimplicialLLt, CholmodSupernodalLLt, CholmodLDLt
};
@@ -195,7 +236,7 @@ class CholmodBase : public SparseSolverBase<Derived>
{
EIGEN_STATIC_ASSERT((internal::is_same<double,RealScalar>::value), CHOLMOD_SUPPORTS_DOUBLE_PRECISION_ONLY);
m_shiftOffset[0] = m_shiftOffset[1] = 0.0;
cholmod_start(&m_cholmod);
internal::cm_start<StorageIndex>(m_cholmod);
}
explicit CholmodBase(const MatrixType& matrix)
@@ -203,23 +244,23 @@ class CholmodBase : public SparseSolverBase<Derived>
{
EIGEN_STATIC_ASSERT((internal::is_same<double,RealScalar>::value), CHOLMOD_SUPPORTS_DOUBLE_PRECISION_ONLY);
m_shiftOffset[0] = m_shiftOffset[1] = 0.0;
cholmod_start(&m_cholmod);
internal::cm_start<StorageIndex>(m_cholmod);
compute(matrix);
}
~CholmodBase()
{
if(m_cholmodFactor)
cholmod_free_factor(&m_cholmodFactor, &m_cholmod);
cholmod_finish(&m_cholmod);
internal::cm_free_factor<StorageIndex>(m_cholmodFactor, m_cholmod);
internal::cm_finish<StorageIndex>(m_cholmod);
}
inline StorageIndex cols() const { return internal::convert_index<StorageIndex, Index>(m_cholmodFactor->n); }
inline StorageIndex rows() const { return internal::convert_index<StorageIndex, Index>(m_cholmodFactor->n); }
/** \brief Reports whether previous computation was successful.
*
* \returns \c Success if computation was succesful,
* \returns \c Success if computation was successful,
* \c NumericalIssue if the matrix appears to be negative.
*/
ComputationInfo info() const
@@ -235,29 +276,29 @@ class CholmodBase : public SparseSolverBase<Derived>
factorize(matrix);
return derived();
}
/** Performs a symbolic decomposition on the sparsity pattern of \a matrix.
*
* This function is particularly useful when solving for several problems having the same structure.
*
*
* \sa factorize()
*/
void analyzePattern(const MatrixType& matrix)
{
if(m_cholmodFactor)
{
cholmod_free_factor(&m_cholmodFactor, &m_cholmod);
internal::cm_free_factor<StorageIndex>(m_cholmodFactor, m_cholmod);
m_cholmodFactor = 0;
}
cholmod_sparse A = viewAsCholmod(matrix.template selfadjointView<UpLo>());
m_cholmodFactor = cholmod_analyze(&A, &m_cholmod);
m_cholmodFactor = internal::cm_analyze<StorageIndex>(A, m_cholmod);
this->m_isInitialized = true;
this->m_info = Success;
m_analysisIsOk = true;
m_factorizationIsOk = false;
}
/** Performs a numeric decomposition of \a matrix
*
* The given matrix must have the same sparsity pattern as the matrix on which the symbolic decomposition has been performed.
@@ -268,17 +309,17 @@ class CholmodBase : public SparseSolverBase<Derived>
{
eigen_assert(m_analysisIsOk && "You must first call analyzePattern()");
cholmod_sparse A = viewAsCholmod(matrix.template selfadjointView<UpLo>());
cholmod_factorize_p(&A, m_shiftOffset, 0, 0, m_cholmodFactor, &m_cholmod);
internal::cm_factorize_p<StorageIndex>(&A, m_shiftOffset, 0, 0, m_cholmodFactor, m_cholmod);
// If the factorization failed, minor is the column at which it did. On success minor == n.
this->m_info = (m_cholmodFactor->minor == m_cholmodFactor->n ? Success : NumericalIssue);
m_factorizationIsOk = true;
}
/** Returns a reference to the Cholmod configuration structure, giving full control over the performed operations.
* See the Cholmod user guide for details. */
cholmod_common& cholmod() { return m_cholmod; }
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** \internal */
template<typename Rhs,typename Dest>
@@ -288,22 +329,23 @@ class CholmodBase : public SparseSolverBase<Derived>
const Index size = m_cholmodFactor->n;
EIGEN_UNUSED_VARIABLE(size);
eigen_assert(size==b.rows());
// Cholmod needs column-major stoarge without inner-stride, which corresponds to the default behavior of Ref.
// Cholmod needs column-major storage without inner-stride, which corresponds to the default behavior of Ref.
Ref<const Matrix<typename Rhs::Scalar,Dynamic,Dynamic,ColMajor> > b_ref(b.derived());
cholmod_dense b_cd = viewAsCholmod(b_ref);
cholmod_dense* x_cd = cholmod_solve(CHOLMOD_A, m_cholmodFactor, &b_cd, &m_cholmod);
cholmod_dense* x_cd = internal::cm_solve<StorageIndex>(CHOLMOD_A, *m_cholmodFactor, b_cd, m_cholmod);
if(!x_cd)
{
this->m_info = NumericalIssue;
return;
}
// TODO optimize this copy by swapping when possible (be careful with alignment, etc.)
// NOTE Actually, the copy can be avoided by calling cholmod_solve2 instead of cholmod_solve
dest = Matrix<Scalar,Dest::RowsAtCompileTime,Dest::ColsAtCompileTime>::Map(reinterpret_cast<Scalar*>(x_cd->x),b.rows(),b.cols());
cholmod_free_dense(&x_cd, &m_cholmod);
internal::cm_free_dense<StorageIndex>(x_cd, m_cholmod);
}
/** \internal */
template<typename RhsDerived, typename DestDerived>
void _solve_impl(const SparseMatrixBase<RhsDerived> &b, SparseMatrixBase<DestDerived> &dest) const
@@ -316,19 +358,20 @@ class CholmodBase : public SparseSolverBase<Derived>
// note: cs stands for Cholmod Sparse
Ref<SparseMatrix<typename RhsDerived::Scalar,ColMajor,typename RhsDerived::StorageIndex> > b_ref(b.const_cast_derived());
cholmod_sparse b_cs = viewAsCholmod(b_ref);
cholmod_sparse* x_cs = cholmod_spsolve(CHOLMOD_A, m_cholmodFactor, &b_cs, &m_cholmod);
cholmod_sparse* x_cs = internal::cm_spsolve<StorageIndex>(CHOLMOD_A, *m_cholmodFactor, b_cs, m_cholmod);
if(!x_cs)
{
this->m_info = NumericalIssue;
return;
}
// TODO optimize this copy by swapping when possible (be careful with alignment, etc.)
// NOTE cholmod_spsolve in fact just calls the dense solver for blocks of 4 columns at a time (similar to Eigen's sparse solver)
dest.derived() = viewAsEigen<typename DestDerived::Scalar,ColMajor,typename DestDerived::StorageIndex>(*x_cs);
cholmod_free_sparse(&x_cs, &m_cholmod);
internal::cm_free_sparse<StorageIndex>(x_cs, m_cholmod);
}
#endif // EIGEN_PARSED_BY_DOXYGEN
/** Sets the shift parameter that will be used to adjust the diagonal coefficients during the numerical factorization.
*
* During the numerical factorization, an offset term is added to the diagonal coefficients:\n
@@ -343,7 +386,7 @@ class CholmodBase : public SparseSolverBase<Derived>
m_shiftOffset[0] = double(offset);
return derived();
}
/** \returns the determinant of the underlying matrix from the current factorization */
Scalar determinant() const
{
@@ -398,7 +441,7 @@ class CholmodBase : public SparseSolverBase<Derived>
template<typename Stream>
void dumpMemory(Stream& /*s*/)
{}
protected:
mutable cholmod_common m_cholmod;
cholmod_factor* m_cholmodFactor;
@@ -435,11 +478,11 @@ class CholmodSimplicialLLT : public CholmodBase<_MatrixType, _UpLo, CholmodSimpl
{
typedef CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLLT> Base;
using Base::m_cholmod;
public:
typedef _MatrixType MatrixType;
CholmodSimplicialLLT() : Base() { init(); }
CholmodSimplicialLLT(const MatrixType& matrix) : Base()
@@ -486,11 +529,11 @@ class CholmodSimplicialLDLT : public CholmodBase<_MatrixType, _UpLo, CholmodSimp
{
typedef CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLDLT> Base;
using Base::m_cholmod;
public:
typedef _MatrixType MatrixType;
CholmodSimplicialLDLT() : Base() { init(); }
CholmodSimplicialLDLT(const MatrixType& matrix) : Base()
@@ -535,11 +578,11 @@ class CholmodSupernodalLLT : public CholmodBase<_MatrixType, _UpLo, CholmodSuper
{
typedef CholmodBase<_MatrixType, _UpLo, CholmodSupernodalLLT> Base;
using Base::m_cholmod;
public:
typedef _MatrixType MatrixType;
CholmodSupernodalLLT() : Base() { init(); }
CholmodSupernodalLLT(const MatrixType& matrix) : Base()
@@ -586,11 +629,11 @@ class CholmodDecomposition : public CholmodBase<_MatrixType, _UpLo, CholmodDecom
{
typedef CholmodBase<_MatrixType, _UpLo, CholmodDecomposition> Base;
using Base::m_cholmod;
public:
typedef _MatrixType MatrixType;
CholmodDecomposition() : Base() { init(); }
CholmodDecomposition(const MatrixType& matrix) : Base()
@@ -600,7 +643,7 @@ class CholmodDecomposition : public CholmodBase<_MatrixType, _UpLo, CholmodDecom
}
~CholmodDecomposition() {}
void setMode(CholmodMode mode)
{
switch(mode)


@@ -0,0 +1,413 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2017 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_ARITHMETIC_SEQUENCE_H
#define EIGEN_ARITHMETIC_SEQUENCE_H
namespace Eigen {
namespace internal {
#if (!EIGEN_HAS_CXX11) || !((!EIGEN_COMP_GNUC) || EIGEN_COMP_GNUC>=48)
template<typename T> struct aseq_negate {};
template<> struct aseq_negate<Index> {
typedef Index type;
};
template<int N> struct aseq_negate<FixedInt<N> > {
typedef FixedInt<-N> type;
};
// Compilation error in the following case:
template<> struct aseq_negate<FixedInt<DynamicIndex> > {};
template<typename FirstType,typename SizeType,typename IncrType,
bool FirstIsSymbolic=symbolic::is_symbolic<FirstType>::value,
bool SizeIsSymbolic =symbolic::is_symbolic<SizeType>::value>
struct aseq_reverse_first_type {
typedef Index type;
};
template<typename FirstType,typename SizeType,typename IncrType>
struct aseq_reverse_first_type<FirstType,SizeType,IncrType,true,true> {
typedef symbolic::AddExpr<FirstType,
symbolic::ProductExpr<symbolic::AddExpr<SizeType,symbolic::ValueExpr<FixedInt<-1> > >,
symbolic::ValueExpr<IncrType> >
> type;
};
template<typename SizeType,typename IncrType,typename EnableIf = void>
struct aseq_reverse_first_type_aux {
typedef Index type;
};
template<typename SizeType,typename IncrType>
struct aseq_reverse_first_type_aux<SizeType,IncrType,typename internal::enable_if<bool((SizeType::value+IncrType::value)|0x1)>::type> {
typedef FixedInt<(SizeType::value-1)*IncrType::value> type;
};
template<typename FirstType,typename SizeType,typename IncrType>
struct aseq_reverse_first_type<FirstType,SizeType,IncrType,true,false> {
typedef typename aseq_reverse_first_type_aux<SizeType,IncrType>::type Aux;
typedef symbolic::AddExpr<FirstType,symbolic::ValueExpr<Aux> > type;
};
template<typename FirstType,typename SizeType,typename IncrType>
struct aseq_reverse_first_type<FirstType,SizeType,IncrType,false,true> {
typedef symbolic::AddExpr<symbolic::ProductExpr<symbolic::AddExpr<SizeType,symbolic::ValueExpr<FixedInt<-1> > >,
symbolic::ValueExpr<IncrType> >,
symbolic::ValueExpr<> > type;
};
#endif
// Helper to cleanup the type of the increment:
template<typename T> struct cleanup_seq_incr {
typedef typename cleanup_index_type<T,DynamicIndex>::type type;
};
}
//--------------------------------------------------------------------------------
// seq(first,last,incr) and seqN(first,size,incr)
//--------------------------------------------------------------------------------
template<typename FirstType=Index,typename SizeType=Index,typename IncrType=internal::FixedInt<1> >
class ArithmeticSequence;
template<typename FirstType,typename SizeType,typename IncrType>
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,
typename internal::cleanup_index_type<SizeType>::type,
typename internal::cleanup_seq_incr<IncrType>::type >
seqN(FirstType first, SizeType size, IncrType incr);
/** \class ArithmeticSequence
* \ingroup Core_Module
*
* This class represents an arithmetic progression \f$ a_0, a_1, a_2, ..., a_{n-1}\f$ defined by
* its \em first value \f$ a_0 \f$, its \em size (aka length) \em n, and the \em increment (aka stride)
* that is equal to \f$ a_{i+1}-a_{i}\f$ for any \em i.
*
* It is internally used as the return type of the Eigen::seq and Eigen::seqN functions, and as the input arguments
* of DenseBase::operator()(const RowIndices&, const ColIndices&), and most of the time this is the
* only way it is used.
*
* \tparam FirstType type of the first element, usually an Index,
* but internally it can be a symbolic expression
* \tparam SizeType type representing the size of the sequence, usually an Index
* or a compile time integral constant. Internally, it can also be a symbolic expression
* \tparam IncrType type of the increment, can be a runtime Index, or a compile time integral constant (default is compile-time 1)
*
* \sa Eigen::seq, Eigen::seqN, DenseBase::operator()(const RowIndices&, const ColIndices&), class IndexedView
*/
template<typename FirstType,typename SizeType,typename IncrType>
class ArithmeticSequence
{
public:
ArithmeticSequence(FirstType first, SizeType size) : m_first(first), m_size(size) {}
ArithmeticSequence(FirstType first, SizeType size, IncrType incr) : m_first(first), m_size(size), m_incr(incr) {}
enum {
SizeAtCompileTime = internal::get_fixed_value<SizeType>::value,
IncrAtCompileTime = internal::get_fixed_value<IncrType,DynamicIndex>::value
};
/** \returns the size, i.e., number of elements, of the sequence */
Index size() const { return m_size; }
/** \returns the first element \f$ a_0 \f$ in the sequence */
Index first() const { return m_first; }
/** \returns the value \f$ a_i \f$ at index \a i in the sequence. */
Index operator[](Index i) const { return m_first + i * m_incr; }
const FirstType& firstObject() const { return m_first; }
const SizeType& sizeObject() const { return m_size; }
const IncrType& incrObject() const { return m_incr; }
protected:
FirstType m_first;
SizeType m_size;
IncrType m_incr;
public:
#if EIGEN_HAS_CXX11 && ((!EIGEN_COMP_GNUC) || EIGEN_COMP_GNUC>=48)
auto reverse() const -> decltype(Eigen::seqN(m_first+(m_size+fix<-1>())*m_incr,m_size,-m_incr)) {
return seqN(m_first+(m_size+fix<-1>())*m_incr,m_size,-m_incr);
}
#else
protected:
typedef typename internal::aseq_negate<IncrType>::type ReverseIncrType;
typedef typename internal::aseq_reverse_first_type<FirstType,SizeType,IncrType>::type ReverseFirstType;
public:
ArithmeticSequence<ReverseFirstType,SizeType,ReverseIncrType>
reverse() const {
return seqN(m_first+(m_size+fix<-1>())*m_incr,m_size,-m_incr);
}
#endif
};
/** \returns an ArithmeticSequence starting at \a first, of length \a size, and increment \a incr
*
* \sa seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */
template<typename FirstType,typename SizeType,typename IncrType>
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,typename internal::cleanup_index_type<SizeType>::type,typename internal::cleanup_seq_incr<IncrType>::type >
seqN(FirstType first, SizeType size, IncrType incr) {
return ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,typename internal::cleanup_index_type<SizeType>::type,typename internal::cleanup_seq_incr<IncrType>::type>(first,size,incr);
}
/** \returns an ArithmeticSequence starting at \a first, of length \a size, and unit increment
*
* \sa seqN(FirstType,SizeType,IncrType), seq(FirstType,LastType) */
template<typename FirstType,typename SizeType>
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,typename internal::cleanup_index_type<SizeType>::type >
seqN(FirstType first, SizeType size) {
return ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,typename internal::cleanup_index_type<SizeType>::type>(first,size);
}
#ifdef EIGEN_PARSED_BY_DOXYGEN
/** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and with positive (or negative) increment \a incr
*
* It is essentially an alias to:
* \code
* seqN(f, (l-f+incr)/incr, incr);
* \endcode
*
* \sa seqN(FirstType,SizeType,IncrType), seq(FirstType,LastType)
*/
template<typename FirstType,typename LastType, typename IncrType>
auto seq(FirstType f, LastType l, IncrType incr);
/** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and unit increment
*
* It is essentially an alias to:
* \code
* seqN(f,l-f+1);
* \endcode
*
* \sa seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType)
*/
template<typename FirstType,typename LastType>
auto seq(FirstType f, LastType l);
#else // EIGEN_PARSED_BY_DOXYGEN
#if EIGEN_HAS_CXX11
template<typename FirstType,typename LastType>
auto seq(FirstType f, LastType l) -> decltype(seqN(typename internal::cleanup_index_type<FirstType>::type(f),
( typename internal::cleanup_index_type<LastType>::type(l)
- typename internal::cleanup_index_type<FirstType>::type(f)+fix<1>())))
{
return seqN(typename internal::cleanup_index_type<FirstType>::type(f),
(typename internal::cleanup_index_type<LastType>::type(l)
-typename internal::cleanup_index_type<FirstType>::type(f)+fix<1>()));
}
template<typename FirstType,typename LastType, typename IncrType>
auto seq(FirstType f, LastType l, IncrType incr)
-> decltype(seqN(typename internal::cleanup_index_type<FirstType>::type(f),
( typename internal::cleanup_index_type<LastType>::type(l)
- typename internal::cleanup_index_type<FirstType>::type(f)+typename internal::cleanup_seq_incr<IncrType>::type(incr)
) / typename internal::cleanup_seq_incr<IncrType>::type(incr),
typename internal::cleanup_seq_incr<IncrType>::type(incr)))
{
typedef typename internal::cleanup_seq_incr<IncrType>::type CleanedIncrType;
return seqN(typename internal::cleanup_index_type<FirstType>::type(f),
( typename internal::cleanup_index_type<LastType>::type(l)
-typename internal::cleanup_index_type<FirstType>::type(f)+CleanedIncrType(incr)) / CleanedIncrType(incr),
CleanedIncrType(incr));
}
#else // EIGEN_HAS_CXX11
template<typename FirstType,typename LastType>
typename internal::enable_if<!(symbolic::is_symbolic<FirstType>::value || symbolic::is_symbolic<LastType>::value),
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,Index> >::type
seq(FirstType f, LastType l)
{
return seqN(typename internal::cleanup_index_type<FirstType>::type(f),
Index((typename internal::cleanup_index_type<LastType>::type(l)-typename internal::cleanup_index_type<FirstType>::type(f)+fix<1>())));
}
template<typename FirstTypeDerived,typename LastType>
typename internal::enable_if<!symbolic::is_symbolic<LastType>::value,
ArithmeticSequence<FirstTypeDerived, symbolic::AddExpr<symbolic::AddExpr<symbolic::NegateExpr<FirstTypeDerived>,symbolic::ValueExpr<> >,
symbolic::ValueExpr<internal::FixedInt<1> > > > >::type
seq(const symbolic::BaseExpr<FirstTypeDerived> &f, LastType l)
{
return seqN(f.derived(),(typename internal::cleanup_index_type<LastType>::type(l)-f.derived()+fix<1>()));
}
template<typename FirstType,typename LastTypeDerived>
typename internal::enable_if<!symbolic::is_symbolic<FirstType>::value,
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,
symbolic::AddExpr<symbolic::AddExpr<LastTypeDerived,symbolic::ValueExpr<> >,
symbolic::ValueExpr<internal::FixedInt<1> > > > >::type
seq(FirstType f, const symbolic::BaseExpr<LastTypeDerived> &l)
{
return seqN(typename internal::cleanup_index_type<FirstType>::type(f),(l.derived()-typename internal::cleanup_index_type<FirstType>::type(f)+fix<1>()));
}
template<typename FirstTypeDerived,typename LastTypeDerived>
ArithmeticSequence<FirstTypeDerived,
symbolic::AddExpr<symbolic::AddExpr<LastTypeDerived,symbolic::NegateExpr<FirstTypeDerived> >,symbolic::ValueExpr<internal::FixedInt<1> > > >
seq(const symbolic::BaseExpr<FirstTypeDerived> &f, const symbolic::BaseExpr<LastTypeDerived> &l)
{
return seqN(f.derived(),(l.derived()-f.derived()+fix<1>()));
}
template<typename FirstType,typename LastType, typename IncrType>
typename internal::enable_if<!(symbolic::is_symbolic<FirstType>::value || symbolic::is_symbolic<LastType>::value),
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,Index,typename internal::cleanup_seq_incr<IncrType>::type> >::type
seq(FirstType f, LastType l, IncrType incr)
{
typedef typename internal::cleanup_seq_incr<IncrType>::type CleanedIncrType;
return seqN(typename internal::cleanup_index_type<FirstType>::type(f),
Index((typename internal::cleanup_index_type<LastType>::type(l)-typename internal::cleanup_index_type<FirstType>::type(f)+CleanedIncrType(incr))/CleanedIncrType(incr)), incr);
}
template<typename FirstTypeDerived,typename LastType, typename IncrType>
typename internal::enable_if<!symbolic::is_symbolic<LastType>::value,
ArithmeticSequence<FirstTypeDerived,
symbolic::QuotientExpr<symbolic::AddExpr<symbolic::AddExpr<symbolic::NegateExpr<FirstTypeDerived>,
symbolic::ValueExpr<> >,
symbolic::ValueExpr<typename internal::cleanup_seq_incr<IncrType>::type> >,
symbolic::ValueExpr<typename internal::cleanup_seq_incr<IncrType>::type> >,
typename internal::cleanup_seq_incr<IncrType>::type> >::type
seq(const symbolic::BaseExpr<FirstTypeDerived> &f, LastType l, IncrType incr)
{
typedef typename internal::cleanup_seq_incr<IncrType>::type CleanedIncrType;
return seqN(f.derived(),(typename internal::cleanup_index_type<LastType>::type(l)-f.derived()+CleanedIncrType(incr))/CleanedIncrType(incr), incr);
}
template<typename FirstType,typename LastTypeDerived, typename IncrType>
typename internal::enable_if<!symbolic::is_symbolic<FirstType>::value,
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,
symbolic::QuotientExpr<symbolic::AddExpr<symbolic::AddExpr<LastTypeDerived,symbolic::ValueExpr<> >,
symbolic::ValueExpr<typename internal::cleanup_seq_incr<IncrType>::type> >,
symbolic::ValueExpr<typename internal::cleanup_seq_incr<IncrType>::type> >,
typename internal::cleanup_seq_incr<IncrType>::type> >::type
seq(FirstType f, const symbolic::BaseExpr<LastTypeDerived> &l, IncrType incr)
{
typedef typename internal::cleanup_seq_incr<IncrType>::type CleanedIncrType;
return seqN(typename internal::cleanup_index_type<FirstType>::type(f),
(l.derived()-typename internal::cleanup_index_type<FirstType>::type(f)+CleanedIncrType(incr))/CleanedIncrType(incr), incr);
}
template<typename FirstTypeDerived,typename LastTypeDerived, typename IncrType>
ArithmeticSequence<FirstTypeDerived,
symbolic::QuotientExpr<symbolic::AddExpr<symbolic::AddExpr<LastTypeDerived,
symbolic::NegateExpr<FirstTypeDerived> >,
symbolic::ValueExpr<typename internal::cleanup_seq_incr<IncrType>::type> >,
symbolic::ValueExpr<typename internal::cleanup_seq_incr<IncrType>::type> >,
typename internal::cleanup_seq_incr<IncrType>::type>
seq(const symbolic::BaseExpr<FirstTypeDerived> &f, const symbolic::BaseExpr<LastTypeDerived> &l, IncrType incr)
{
typedef typename internal::cleanup_seq_incr<IncrType>::type CleanedIncrType;
return seqN(f.derived(),(l.derived()-f.derived()+CleanedIncrType(incr))/CleanedIncrType(incr), incr);
}
#endif // EIGEN_HAS_CXX11
#endif // EIGEN_PARSED_BY_DOXYGEN
#if EIGEN_HAS_CXX11 || defined(EIGEN_PARSED_BY_DOXYGEN)
/** \cpp11
* \returns a symbolic ArithmeticSequence representing the last \a size elements with increment \a incr.
*
* It is a shortcut for: \code seqN(last-(size-fix<1>)*incr, size, incr) \endcode
*
* \sa lastN(SizeType), seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */
template<typename SizeType,typename IncrType>
auto lastN(SizeType size, IncrType incr)
-> decltype(seqN(Eigen::last-(size-fix<1>())*incr, size, incr))
{
return seqN(Eigen::last-(size-fix<1>())*incr, size, incr);
}
/** \cpp11
* \returns a symbolic ArithmeticSequence representing the last \a size elements with a unit increment.
*
* It is a shortcut for: \code seq(last+fix<1>-size, last) \endcode
*
* \sa lastN(SizeType,IncrType), seqN(FirstType,SizeType), seq(FirstType,LastType) */
template<typename SizeType>
auto lastN(SizeType size)
-> decltype(seqN(Eigen::last+fix<1>()-size, size))
{
return seqN(Eigen::last+fix<1>()-size, size);
}
#endif
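A short sketch of these helpers used for slicing, which is their primary purpose; it assumes C++11 and a writable dynamic-size matrix:
#include <Eigen/Dense>

void slicing_demo(Eigen::MatrixXd& M) {
  using namespace Eigen;
  auto topRows  = M(seq(0, 2), all);            // rows 0..2, all columns
  auto evenCols = M(all, seq(0, last, 2));      // every other column
  auto tail     = M(lastN(3), all);             // last 3 rows (C++11 only)
  topRows(0, 0) = evenCols(0, 0) + tail(0, 0);  // the slices are views into M
}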
namespace internal {
// Convert a symbolic span into a usable one (i.e., remove last/end "keywords")
template<typename T>
struct make_size_type {
typedef typename internal::conditional<symbolic::is_symbolic<T>::value, Index, T>::type type;
};
template<typename FirstType,typename SizeType,typename IncrType,int XprSize>
struct IndexedViewCompatibleType<ArithmeticSequence<FirstType,SizeType,IncrType>, XprSize> {
typedef ArithmeticSequence<Index,typename make_size_type<SizeType>::type,IncrType> type;
};
template<typename FirstType,typename SizeType,typename IncrType>
ArithmeticSequence<Index,typename make_size_type<SizeType>::type,IncrType>
makeIndexedViewCompatible(const ArithmeticSequence<FirstType,SizeType,IncrType>& ids, Index size,SpecializedType) {
return ArithmeticSequence<Index,typename make_size_type<SizeType>::type,IncrType>(
eval_expr_given_size(ids.firstObject(),size),eval_expr_given_size(ids.sizeObject(),size),ids.incrObject());
}
template<typename FirstType,typename SizeType,typename IncrType>
struct get_compile_time_incr<ArithmeticSequence<FirstType,SizeType,IncrType> > {
enum { value = get_fixed_value<IncrType,DynamicIndex>::value };
};
} // end namespace internal
/** \namespace Eigen::indexing
* \ingroup Core_Module
*
* The sole purpose of this namespace is to be able to import all functions
* and symbols that are expected to be used within operator() for indexing
* and slicing. If you already imported the whole Eigen namespace:
* \code using namespace Eigen; \endcode
* then you are already all set. Otherwise, if you don't want/cannot import
* the whole Eigen namespace, the following line:
* \code using namespace Eigen::indexing; \endcode
* is equivalent to:
* \code
using Eigen::all;
using Eigen::seq;
using Eigen::seqN;
using Eigen::lastN; // c++11 only
using Eigen::last;
using Eigen::lastp1;
using Eigen::fix;
\endcode
*/
namespace indexing {
using Eigen::all;
using Eigen::seq;
using Eigen::seqN;
#if EIGEN_HAS_CXX11
using Eigen::lastN;
#endif
using Eigen::last;
using Eigen::lastp1;
using Eigen::fix;
}
} // end namespace Eigen
#endif // EIGEN_ARITHMETIC_SEQUENCE_H


@@ -162,6 +162,45 @@ class Array
}
#endif
#if EIGEN_HAS_CXX11
/** \copydoc PlainObjectBase(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*
* Example: \include Array_variadic_ctor_cxx11.cpp
* Output: \verbinclude Array_variadic_ctor_cxx11.out
*
* \sa Array(const std::initializer_list<std::initializer_list<Scalar>>&)
* \sa Array(const Scalar&), Array(const Scalar&,const Scalar&)
*/
template <typename... ArgTypes>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
: Base(a0, a1, a2, a3, args...) {}
/** \brief Constructs an array and initializes it from the coefficients given as initializer-lists grouped by row. \cpp11
*
* In the general case, the constructor takes a list of rows, each row being represented as a list of coefficients:
*
* Example: \include Array_initializer_list_23_cxx11.cpp
* Output: \verbinclude Array_initializer_list_23_cxx11.out
*
* Each of the inner initializer lists must contain the exact same number of elements, otherwise an assertion is triggered.
*
* In the case of a compile-time column 1D array, implicit transposition from a single row is allowed.
* Therefore <code> Array<int,Dynamic,1>{{1,2,3,4,5}}</code> is legal and the more verbose syntax
* <code>Array<int,Dynamic,1>{{1},{2},{3},{4},{5}}</code> can be avoided:
*
* Example: \include Array_initializer_list_vector_cxx11.cpp
* Output: \verbinclude Array_initializer_list_vector_cxx11.out
*
* In the case of fixed-sized arrays, the initializer list sizes must exactly match the array sizes,
* and implicit transposition is allowed for compile-time 1D arrays only.
*
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Array(const std::initializer_list<std::initializer_list<Scalar>>& list) : Base(list) {}
#endif // end EIGEN_HAS_CXX11
#ifndef EIGEN_PARSED_BY_DOXYGEN
template<typename T>
EIGEN_DEVICE_FUNC
@@ -178,6 +217,7 @@ class Array
Base::_check_template_params();
this->template _init2<T0,T1>(val0, val1);
}
#else
/** \brief Constructs a fixed-sized array initialized with coefficients starting at \a data */
EIGEN_DEVICE_FUNC explicit Array(const Scalar *data);
@@ -189,7 +229,8 @@ class Array
*/
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE explicit Array(Index dim);
/** constructs an initialized 1x1 Array with the given coefficient */
/** constructs an initialized 1x1 Array with the given coefficient
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args) */
Array(const Scalar& value);
/** constructs an uninitialized array with \a rows rows and \a cols columns.
*
@@ -197,11 +238,14 @@ class Array
* it is redundant to pass these parameters, so one should use the default constructor
* Array() instead. */
Array(Index rows, Index cols);
/** constructs an initialized 2D vector with given coefficients */
/** constructs an initialized 2D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args) */
Array(const Scalar& val0, const Scalar& val1);
#endif
#endif // end EIGEN_PARSED_BY_DOXYGEN
/** constructs an initialized 3D vector with given coefficients */
/** constructs an initialized 3D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Array(const Scalar& val0, const Scalar& val1, const Scalar& val2)
{
@@ -211,7 +255,9 @@ class Array
m_storage.data()[1] = val1;
m_storage.data()[2] = val2;
}
/** constructs an initialized 4D vector with given coefficients */
/** constructs an initialized 4D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Array(const Scalar& val0, const Scalar& val1, const Scalar& val2, const Scalar& val3)
{
@@ -258,7 +304,7 @@ class Array
/** \defgroup arraytypedefs Global array typedefs
* \ingroup Core_Module
*
* Eigen defines several typedef shortcuts for most common 1D and 2D array types.
* %Eigen defines several typedef shortcuts for most common 1D and 2D array types.
*
* The general patterns are the following:
*
@@ -271,6 +317,12 @@ class Array
* There are also \c ArraySizeType which are self-explanatory. For example, \c Array4cf is
* a fixed-size 1D array of 4 complex floats.
*
* With \cpp11, template aliases are also defined for common sizes.
* They follow the same pattern as above except that the scalar type suffix is replaced by a
* template parameter, i.e.:
* - `ArrayRowsCols<Type>` where `Rows` and `Cols` can be \c 2,\c 3,\c 4, or \c X for fixed or dynamic size.
* - `ArraySize<Type>` where `Size` can be \c 2,\c 3,\c 4 or \c X for fixed or dynamic size 1D arrays.
*
* \sa class Array
*/
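A short sketch of the aliases this naming scheme yields (C++11 only; the instantiations below are illustrative):
#include <Eigen/Core>
int main() {
  Eigen::ArrayXX<double> m(3, 4);  // Array<double, Dynamic, Dynamic>
  Eigen::Array3<float> v;          // Array<float, 3, 1>
  Eigen::Array4X<int> r(4, 10);    // Array<int, 4, Dynamic>
  m.setZero(); v.setZero(); r.setZero();
  return 0;
}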
@@ -303,9 +355,43 @@ EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(std::complex<double>, cd)
#undef EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES
#undef EIGEN_MAKE_ARRAY_TYPEDEFS
#undef EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS
#undef EIGEN_MAKE_ARRAY_TYPEDEFS_LARGE
#if EIGEN_HAS_CXX11
#define EIGEN_MAKE_ARRAY_TYPEDEFS(Size, SizeSuffix) \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##SizeSuffix##SizeSuffix = Array<Type, Size, Size>; \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##SizeSuffix = Array<Type, Size, 1>;
#define EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Size) \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##Size##X = Array<Type, Size, Dynamic>; \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##X##Size = Array<Type, Dynamic, Size>;
EIGEN_MAKE_ARRAY_TYPEDEFS(2, 2)
EIGEN_MAKE_ARRAY_TYPEDEFS(3, 3)
EIGEN_MAKE_ARRAY_TYPEDEFS(4, 4)
EIGEN_MAKE_ARRAY_TYPEDEFS(Dynamic, X)
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(2)
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(3)
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(4)
#undef EIGEN_MAKE_ARRAY_TYPEDEFS
#undef EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS
#endif // EIGEN_HAS_CXX11
#define EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, SizeSuffix) \
using Eigen::Matrix##SizeSuffix##TypeSuffix; \
using Eigen::Vector##SizeSuffix##TypeSuffix; \

View File

@@ -69,6 +69,7 @@ template<typename Derived> class ArrayBase
using Base::coeff;
using Base::coeffRef;
using Base::lazyAssign;
using Base::operator-;
using Base::operator=;
using Base::operator+=;
using Base::operator-=;
@@ -88,7 +89,6 @@ template<typename Derived> class ArrayBase
#define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::ArrayBase
#define EIGEN_DOC_UNARY_ADDONS(X,Y)
# include "../plugins/CommonCwiseUnaryOps.h"
# include "../plugins/MatrixCwiseUnaryOps.h"
# include "../plugins/ArrayCwiseUnaryOps.h"
# include "../plugins/CommonCwiseBinaryOps.h"
@@ -153,8 +153,8 @@ template<typename Derived> class ArrayBase
// inline void evalTo(Dest& dst) const { dst = matrix(); }
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(ArrayBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(ArrayBase)
EIGEN_DEVICE_FUNC
ArrayBase() : Base() {}
private:
explicit ArrayBase(Index);

View File

@@ -90,8 +90,8 @@ class ArrayWrapper : public ArrayBase<ArrayWrapper<ExpressionType> >
EIGEN_DEVICE_FUNC
inline void evalTo(Dest& dst) const { dst = m_expression; }
const typename internal::remove_all<NestedExpressionType>::type&
EIGEN_DEVICE_FUNC
const typename internal::remove_all<NestedExpressionType>::type&
nestedExpression() const
{
return m_expression;

View File

@@ -16,7 +16,7 @@ namespace Eigen {
template<typename Derived>
template<typename OtherDerived>
EIGEN_STRONG_INLINE Derived& DenseBase<Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase<Derived>
::lazyAssign(const DenseBase<OtherDerived>& other)
{
enum{

View File

@@ -24,7 +24,7 @@ namespace internal {
// copy_using_evaluator_traits is based on assign_traits
template <typename DstEvaluator, typename SrcEvaluator, typename AssignFunc>
template <typename DstEvaluator, typename SrcEvaluator, typename AssignFunc, int MaxPacketSize = -1>
struct copy_using_evaluator_traits
{
typedef typename DstEvaluator::XprType Dst;
@@ -51,13 +51,15 @@ private:
InnerMaxSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::MaxSizeAtCompileTime)
: int(DstFlags)&RowMajorBit ? int(Dst::MaxColsAtCompileTime)
: int(Dst::MaxRowsAtCompileTime),
RestrictedInnerSize = EIGEN_SIZE_MIN_PREFER_FIXED(InnerSize,MaxPacketSize),
RestrictedLinearSize = EIGEN_SIZE_MIN_PREFER_FIXED(Dst::SizeAtCompileTime,MaxPacketSize),
OuterStride = int(outer_stride_at_compile_time<Dst>::ret),
MaxSizeAtCompileTime = Dst::SizeAtCompileTime
};
// TODO distinguish between linear traversal and inner-traversals
typedef typename find_best_packet<DstScalar,Dst::SizeAtCompileTime>::type LinearPacketType;
typedef typename find_best_packet<DstScalar,InnerSize>::type InnerPacketType;
typedef typename find_best_packet<DstScalar,RestrictedLinearSize>::type LinearPacketType;
typedef typename find_best_packet<DstScalar,RestrictedInnerSize>::type InnerPacketType;
enum {
LinearPacketSize = unpacket_traits<LinearPacketType>::size,
@@ -97,7 +99,7 @@ private:
public:
enum {
Traversal = int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize) ? int(LinearVectorizedTraversal)
Traversal = (int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize)) ? int(LinearVectorizedTraversal)
: int(MayInnerVectorize) ? int(InnerVectorizedTraversal)
: int(MayLinearVectorize) ? int(LinearVectorizedTraversal)
: int(MaySliceVectorize) ? int(SliceVectorizedTraversal)
@@ -172,6 +174,8 @@ public:
EIGEN_DEBUG_VAR(MaySliceVectorize)
std::cerr << "Traversal" << " = " << Traversal << " (" << demangle_traversal(Traversal) << ")" << std::endl;
EIGEN_DEBUG_VAR(SrcEvaluator::CoeffReadCost)
EIGEN_DEBUG_VAR(DstEvaluator::CoeffReadCost)
EIGEN_DEBUG_VAR(Dst::SizeAtCompileTime)
EIGEN_DEBUG_VAR(UnrollingLimit)
EIGEN_DEBUG_VAR(MayUnrollCompletely)
EIGEN_DEBUG_VAR(MayUnrollInner)
@@ -530,7 +534,7 @@ struct dense_assignment_loop<Kernel, SliceVectorizedTraversal, NoUnrolling>
const Scalar *dst_ptr = kernel.dstDataPtr();
if((!bool(dstIsAligned)) && (UIntPtr(dst_ptr) % sizeof(Scalar))>0)
{
// the pointer is not aligend-on scalar, so alignment is not possible
// the pointer is not aligned-on scalar, so alignment is not possible
return dense_assignment_loop<Kernel,DefaultTraversal,NoUnrolling>::run(kernel);
}
const Index packetAlignedMask = packetSize - 1;
@@ -607,7 +611,8 @@ public:
typedef typename AssignmentTraits::PacketType PacketType;
EIGEN_DEVICE_FUNC generic_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
generic_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
: m_dst(dst), m_src(src), m_functor(func), m_dstExpr(dstExpr)
{
#ifdef EIGEN_DEBUG_ASSIGN
@@ -697,6 +702,27 @@ protected:
DstXprType& m_dstExpr;
};
// Special kernel used when computing small products whose operands have dynamic dimensions. It ensures that the
// PacketSize used is no larger than 4, thereby increasing the chance that vectorized instructions will be used
// when computing the product.
template<typename DstEvaluatorTypeT, typename SrcEvaluatorTypeT, typename Functor>
class restricted_packet_dense_assignment_kernel : public generic_dense_assignment_kernel<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor, BuiltIn>
{
protected:
typedef generic_dense_assignment_kernel<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor, BuiltIn> Base;
public:
typedef typename Base::Scalar Scalar;
typedef typename Base::DstXprType DstXprType;
typedef copy_using_evaluator_traits<DstEvaluatorTypeT, SrcEvaluatorTypeT, Functor, 4> AssignmentTraits;
typedef typename AssignmentTraits::PacketType PacketType;
EIGEN_DEVICE_FUNC restricted_packet_dense_assignment_kernel(DstEvaluatorTypeT &dst, const SrcEvaluatorTypeT &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{
}
};
/***************************************************************************
* Part 5 : Entry point for dense rectangular assignment
***************************************************************************/
@@ -756,7 +782,7 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType
// AssignmentKind must define a Kind typedef.
template<typename DstShape, typename SrcShape> struct AssignmentKind;
// Assignement kind defined in this file:
// Assignment kind defined in this file:
struct Dense2Dense {};
struct EigenBase2EigenBase {};
@@ -835,6 +861,27 @@ void call_assignment_no_alias(Dst& dst, const Src& src, const Func& func)
Assignment<ActualDstTypeCleaned,Src,Func>::run(actualDst, src, func);
}
template<typename Dst, typename Src, typename Func>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void call_restricted_packet_assignment_no_alias(Dst& dst, const Src& src, const Func& func)
{
typedef evaluator<Dst> DstEvaluatorType;
typedef evaluator<Src> SrcEvaluatorType;
typedef restricted_packet_dense_assignment_kernel<DstEvaluatorType,SrcEvaluatorType,Func> Kernel;
EIGEN_STATIC_ASSERT_LVALUE(Dst)
EIGEN_CHECK_BINARY_COMPATIBILIY(Func,typename Dst::Scalar,typename Src::Scalar);
SrcEvaluatorType srcEvaluator(src);
resize_if_allowed(dst, src, func);
DstEvaluatorType dstEvaluator(dst);
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
dense_assignment_loop<Kernel>::run(kernel);
}
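A hypothetical illustration of this internal entry point (not taken from the diff; the expression and functor below are only examples): it assigns a small dynamic-size expression while capping the packet size at 4 lanes, as described in the comment above.
#include <Eigen/Dense>
int main() {
  Eigen::MatrixXf a(3, 3), b(3, 3), c(3, 3);
  a.setRandom(); b.setRandom();
  // Bypass the usual assignment dispatch and use the restricted (PacketSize <= 4) kernel.
  Eigen::internal::call_restricted_packet_assignment_no_alias(
      c, a + b, Eigen::internal::assign_op<float, float>());
  return 0;
}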
template<typename Dst, typename Src>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void call_assignment_no_alias(Dst& dst, const Src& src)
@@ -899,7 +946,7 @@ struct Assignment<DstXprType, SrcXprType, Functor, EigenBase2EigenBase, Weak>
src.evalTo(dst);
}
// NOTE The following two functions are templated to avoid their instanciation if not needed
// NOTE The following two functions are templated to avoid their instantiation if not needed
// This is needed because some expressions supports evalTo only and/or have 'void' as scalar type.
template<typename SrcScalarType>
EIGEN_DEVICE_FUNC

View File

@@ -68,16 +68,16 @@ class vml_assign_traits
#define EIGEN_PP_EXPAND(ARG) ARG
#if !defined (EIGEN_FAST_MATH) || (EIGEN_FAST_MATH != 1)
#define EIGEN_VMLMODE_EXPAND_LA , VML_HA
#define EIGEN_VMLMODE_EXPAND_xLA , VML_HA
#else
#define EIGEN_VMLMODE_EXPAND_LA , VML_LA
#define EIGEN_VMLMODE_EXPAND_xLA , VML_LA
#endif
#define EIGEN_VMLMODE_EXPAND__
#define EIGEN_VMLMODE_EXPAND_x_
#define EIGEN_VMLMODE_PREFIX_LA vm
#define EIGEN_VMLMODE_PREFIX__ v
#define EIGEN_VMLMODE_PREFIX(VMLMODE) EIGEN_CAT(EIGEN_VMLMODE_PREFIX_,VMLMODE)
#define EIGEN_VMLMODE_PREFIX_xLA vm
#define EIGEN_VMLMODE_PREFIX_x_ v
#define EIGEN_VMLMODE_PREFIX(VMLMODE) EIGEN_CAT(EIGEN_VMLMODE_PREFIX_x,VMLMODE)
#define EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, VMLOP, EIGENTYPE, VMLTYPE, VMLMODE) \
template< typename DstXprType, typename SrcXprNested> \
@@ -89,7 +89,7 @@ class vml_assign_traits
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \
if(vml_assign_traits<DstXprType,SrcXprNested>::Traversal==LinearTraversal) { \
VMLOP(dst.size(), (const VMLTYPE*)src.nestedExpression().data(), \
(VMLTYPE*)dst.data() EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_##VMLMODE) ); \
(VMLTYPE*)dst.data() EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE) ); \
} else { \
const Index outerSize = dst.outerSize(); \
for(Index outer = 0; outer < outerSize; ++outer) { \
@@ -97,7 +97,7 @@ class vml_assign_traits
&(src.nestedExpression().coeffRef(0, outer)); \
EIGENTYPE *dst_ptr = dst.IsRowMajor ? &(dst.coeffRef(outer,0)) : &(dst.coeffRef(0, outer)); \
VMLOP( dst.innerSize(), (const VMLTYPE*)src_ptr, \
(VMLTYPE*)dst_ptr EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_##VMLMODE)); \
(VMLTYPE*)dst_ptr EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE)); \
} \
} \
} \
@@ -152,7 +152,7 @@ EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(ceil, Ceil, _)
if(vml_assign_traits<DstXprType,SrcXprNested>::Traversal==LinearTraversal) \
{ \
VMLOP( dst.size(), (const VMLTYPE*)src.lhs().data(), exponent, \
(VMLTYPE*)dst.data() EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_##VMLMODE) ); \
(VMLTYPE*)dst.data() EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE) ); \
} else { \
const Index outerSize = dst.outerSize(); \
for(Index outer = 0; outer < outerSize; ++outer) { \
@@ -160,7 +160,7 @@ EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(ceil, Ceil, _)
&(src.lhs().coeffRef(0, outer)); \
EIGENTYPE *dst_ptr = dst.IsRowMajor ? &(dst.coeffRef(outer,0)) : &(dst.coeffRef(0, outer)); \
VMLOP( dst.innerSize(), (const VMLTYPE*)src_ptr, exponent, \
(VMLTYPE*)dst_ptr EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_##VMLMODE)); \
(VMLTYPE*)dst_ptr EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE)); \
} \
} \
} \

View File

@@ -114,8 +114,8 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel> class
/** Column or Row constructor
*/
EIGEN_DEVICE_FUNC
inline Block(XprType& xpr, Index i) : Impl(xpr,i)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Block(XprType& xpr, Index i) : Impl(xpr,i)
{
eigen_assert( (i>=0) && (
((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && i<xpr.rows())
@@ -124,8 +124,8 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel> class
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC
inline Block(XprType& xpr, Index startRow, Index startCol)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Block(XprType& xpr, Index startRow, Index startCol)
: Impl(xpr, startRow, startCol)
{
EIGEN_STATIC_ASSERT(RowsAtCompileTime!=Dynamic && ColsAtCompileTime!=Dynamic,THIS_METHOD_IS_ONLY_FOR_FIXED_SIZE)
@@ -135,8 +135,8 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel> class
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC
inline Block(XprType& xpr,
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Block(XprType& xpr,
Index startRow, Index startCol,
Index blockRows, Index blockCols)
: Impl(xpr, startRow, startCol, blockRows, blockCols)
@@ -159,10 +159,10 @@ class BlockImpl<XprType, BlockRows, BlockCols, InnerPanel, Dense>
public:
typedef Impl Base;
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl)
EIGEN_DEVICE_FUNC inline BlockImpl(XprType& xpr, Index i) : Impl(xpr,i) {}
EIGEN_DEVICE_FUNC inline BlockImpl(XprType& xpr, Index startRow, Index startCol) : Impl(xpr, startRow, startCol) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index i) : Impl(xpr,i) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index startRow, Index startCol) : Impl(xpr, startRow, startCol) {}
EIGEN_DEVICE_FUNC
inline BlockImpl(XprType& xpr, Index startRow, Index startCol, Index blockRows, Index blockCols)
EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index startRow, Index startCol, Index blockRows, Index blockCols)
: Impl(xpr, startRow, startCol, blockRows, blockCols) {}
};
@@ -294,22 +294,22 @@ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, bool H
EIGEN_DEVICE_FUNC inline Index outerStride() const;
#endif
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const
{
return m_xpr;
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
XprType& nestedExpression() { return m_xpr; }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startRow() const
{
return m_startRow.value();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startCol() const
{
return m_startCol.value();
@@ -342,8 +342,8 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
/** Column or Row constructor
*/
EIGEN_DEVICE_FUNC
inline BlockImpl_dense(XprType& xpr, Index i)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
BlockImpl_dense(XprType& xpr, Index i)
: Base(xpr.data() + i * ( ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && (!XprTypeIsRowMajor))
|| ((BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) && ( XprTypeIsRowMajor)) ? xpr.innerStride() : xpr.outerStride()),
BlockRows==1 ? 1 : xpr.rows(),
@@ -357,8 +357,8 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC
inline BlockImpl_dense(XprType& xpr, Index startRow, Index startCol)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
BlockImpl_dense(XprType& xpr, Index startRow, Index startCol)
: Base(xpr.data()+xpr.innerStride()*(XprTypeIsRowMajor?startCol:startRow) + xpr.outerStride()*(XprTypeIsRowMajor?startRow:startCol)),
m_xpr(xpr), m_startRow(startRow), m_startCol(startCol)
{
@@ -367,8 +367,8 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC
inline BlockImpl_dense(XprType& xpr,
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
BlockImpl_dense(XprType& xpr,
Index startRow, Index startCol,
Index blockRows, Index blockCols)
: Base(xpr.data()+xpr.innerStride()*(XprTypeIsRowMajor?startCol:startRow) + xpr.outerStride()*(XprTypeIsRowMajor?startRow:startCol), blockRows, blockCols),
@@ -377,18 +377,18 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
init();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const
{
return m_xpr;
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
XprType& nestedExpression() { return m_xpr; }
/** \sa MapBase::innerStride() */
EIGEN_DEVICE_FUNC
inline Index innerStride() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index innerStride() const
{
return internal::traits<BlockType>::HasSameStorageOrderAsXprType
? m_xpr.innerStride()
@@ -396,19 +396,19 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
}
/** \sa MapBase::outerStride() */
EIGEN_DEVICE_FUNC
inline Index outerStride() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index outerStride() const
{
return m_outerStride;
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startRow() const
{
return m_startRow.value();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
StorageIndex startCol() const
{
return m_startCol.value();
@@ -422,8 +422,8 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** \internal used by allowAligned() */
EIGEN_DEVICE_FUNC
inline BlockImpl_dense(XprType& xpr, const Scalar* data, Index blockRows, Index blockCols)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
BlockImpl_dense(XprType& xpr, const Scalar* data, Index blockRows, Index blockCols)
: Base(data, blockRows, blockCols), m_xpr(xpr)
{
init();
@@ -431,7 +431,7 @@ class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
#endif
protected:
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void init()
{
m_outerStride = internal::traits<BlockType>::HasSameStorageOrderAsXprType

View File

@@ -14,56 +14,54 @@ namespace Eigen {
namespace internal {
template<typename Derived, int UnrollCount>
template<typename Derived, int UnrollCount, int Rows>
struct all_unroller
{
typedef typename Derived::ExpressionTraits Traits;
enum {
col = (UnrollCount-1) / Traits::RowsAtCompileTime,
row = (UnrollCount-1) % Traits::RowsAtCompileTime
col = (UnrollCount-1) / Rows,
row = (UnrollCount-1) % Rows
};
static inline bool run(const Derived &mat)
{
return all_unroller<Derived, UnrollCount-1>::run(mat) && mat.coeff(row, col);
return all_unroller<Derived, UnrollCount-1, Rows>::run(mat) && mat.coeff(row, col);
}
};
template<typename Derived>
struct all_unroller<Derived, 0>
template<typename Derived, int Rows>
struct all_unroller<Derived, 0, Rows>
{
static inline bool run(const Derived &/*mat*/) { return true; }
};
template<typename Derived>
struct all_unroller<Derived, Dynamic>
template<typename Derived, int Rows>
struct all_unroller<Derived, Dynamic, Rows>
{
static inline bool run(const Derived &) { return false; }
};
template<typename Derived, int UnrollCount>
template<typename Derived, int UnrollCount, int Rows>
struct any_unroller
{
typedef typename Derived::ExpressionTraits Traits;
enum {
col = (UnrollCount-1) / Traits::RowsAtCompileTime,
row = (UnrollCount-1) % Traits::RowsAtCompileTime
col = (UnrollCount-1) / Rows,
row = (UnrollCount-1) % Rows
};
static inline bool run(const Derived &mat)
{
return any_unroller<Derived, UnrollCount-1>::run(mat) || mat.coeff(row, col);
return any_unroller<Derived, UnrollCount-1, Rows>::run(mat) || mat.coeff(row, col);
}
};
template<typename Derived>
struct any_unroller<Derived, 0>
template<typename Derived, int Rows>
struct any_unroller<Derived, 0, Rows>
{
static inline bool run(const Derived & /*mat*/) { return false; }
};
template<typename Derived>
struct any_unroller<Derived, Dynamic>
template<typename Derived, int Rows>
struct any_unroller<Derived, Dynamic, Rows>
{
static inline bool run(const Derived &) { return false; }
};
@@ -78,7 +76,7 @@ struct any_unroller<Derived, Dynamic>
* \sa any(), Cwise::operator<()
*/
template<typename Derived>
inline bool DenseBase<Derived>::all() const
EIGEN_DEVICE_FUNC inline bool DenseBase<Derived>::all() const
{
typedef internal::evaluator<Derived> Evaluator;
enum {
@@ -87,7 +85,7 @@ inline bool DenseBase<Derived>::all() const
};
Evaluator evaluator(derived());
if(unroll)
return internal::all_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic>::run(evaluator);
return internal::all_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic, internal::traits<Derived>::RowsAtCompileTime>::run(evaluator);
else
{
for(Index j = 0; j < cols(); ++j)
@@ -102,7 +100,7 @@ inline bool DenseBase<Derived>::all() const
* \sa all()
*/
template<typename Derived>
inline bool DenseBase<Derived>::any() const
EIGEN_DEVICE_FUNC inline bool DenseBase<Derived>::any() const
{
typedef internal::evaluator<Derived> Evaluator;
enum {
@@ -111,7 +109,7 @@ inline bool DenseBase<Derived>::any() const
};
Evaluator evaluator(derived());
if(unroll)
return internal::any_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic>::run(evaluator);
return internal::any_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic, internal::traits<Derived>::RowsAtCompileTime>::run(evaluator);
else
{
for(Index j = 0; j < cols(); ++j)
@@ -126,7 +124,7 @@ inline bool DenseBase<Derived>::any() const
* \sa all(), any()
*/
template<typename Derived>
inline Eigen::Index DenseBase<Derived>::count() const
EIGEN_DEVICE_FUNC inline Eigen::Index DenseBase<Derived>::count() const
{
return derived().template cast<bool>().template cast<Index>().sum();
}
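A small usage sketch of the reductions touched above:
#include <Eigen/Dense>
#include <cassert>
int main() {
  Eigen::Array3i a(1, 2, 3);
  assert((a > 0).all());          // every coefficient is positive
  assert((a > 2).any());          // at least one coefficient exceeds 2
  assert((a > 1).count() == 2);   // number of coefficients greater than 1
  return 0;
}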

View File

@@ -141,7 +141,7 @@ struct CommaInitializer
* \sa CommaInitializer::finished(), class CommaInitializer
*/
template<typename Derived>
inline CommaInitializer<Derived> DenseBase<Derived>::operator<< (const Scalar& s)
EIGEN_DEVICE_FUNC inline CommaInitializer<Derived> DenseBase<Derived>::operator<< (const Scalar& s)
{
return CommaInitializer<Derived>(*static_cast<Derived*>(this), s);
}
@@ -149,7 +149,7 @@ inline CommaInitializer<Derived> DenseBase<Derived>::operator<< (const Scalar& s
/** \sa operator<<(const Scalar&) */
template<typename Derived>
template<typename OtherDerived>
inline CommaInitializer<Derived>
EIGEN_DEVICE_FUNC inline CommaInitializer<Derived>
DenseBase<Derived>::operator<<(const DenseBase<OtherDerived>& other)
{
return CommaInitializer<Derived>(*static_cast<Derived *>(this), other);

View File

@@ -90,7 +90,8 @@ template<typename T>
struct evaluator : public unary_evaluator<T>
{
typedef unary_evaluator<T> Base;
EIGEN_DEVICE_FUNC explicit evaluator(const T& xpr) : Base(xpr) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const T& xpr) : Base(xpr) {}
};
@@ -99,14 +100,14 @@ template<typename T>
struct evaluator<const T>
: evaluator<T>
{
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const T& xpr) : evaluator<T>(xpr) {}
};
// ---------- base class for all evaluators ----------
template<typename ExpressionType>
struct evaluator_base : public noncopyable
struct evaluator_base
{
// TODO that's not very nice to have to propagate all these traits. They are currently only needed to handle outer,inner indices.
typedef traits<ExpressionType> ExpressionTraits;
@@ -114,6 +115,14 @@ struct evaluator_base : public noncopyable
enum {
Alignment = 0
};
// noncopyable:
// Don't make this class inherit noncopyable as this kills EBO (Empty Base Optimization)
// and makes complex evaluators much larger than they should be.
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE evaluator_base() {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ~evaluator_base() {}
private:
EIGEN_DEVICE_FUNC evaluator_base(const evaluator_base&);
EIGEN_DEVICE_FUNC const evaluator_base& operator=(const evaluator_base&);
};
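A minimal sketch of the EBO point made in the comment above (a standalone example, not Eigen code): an empty private base adds no storage, whereas an empty member does.
#include <cstdio>
struct Empty {};
struct WithEmptyBase : private Empty { double* p; };   // EBO: typically sizeof(double*)
struct WithEmptyMember { Empty e; double* p; };        // member occupies at least one byte plus padding
int main() {
  std::printf("%zu vs %zu\n", sizeof(WithEmptyBase), sizeof(WithEmptyMember));
  return 0;
}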
// -------------------- Matrix and Array --------------------
@@ -123,6 +132,33 @@ struct evaluator_base : public noncopyable
// Here we directly specialize evaluator. This is not really a unary expression, and it is, by definition, dense,
// so no need for more sophisticated dispatching.
// this helper makes it possible to completely eliminate m_outerStride when it is known at compile time.
template<typename Scalar,int OuterStride> class plainobjectbase_evaluator_data {
public:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
plainobjectbase_evaluator_data(const Scalar* ptr, Index outerStride) : data(ptr)
{
#ifndef EIGEN_INTERNAL_DEBUGGING
EIGEN_UNUSED_VARIABLE(outerStride);
#endif
eigen_internal_assert(outerStride==OuterStride);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index outerStride() const { return OuterStride; }
const Scalar *data;
};
template<typename Scalar> class plainobjectbase_evaluator_data<Scalar,Dynamic> {
public:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
plainobjectbase_evaluator_data(const Scalar* ptr, Index outerStride) : data(ptr), m_outerStride(outerStride) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index outerStride() const { return m_outerStride; }
const Scalar *data;
protected:
Index m_outerStride;
};
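Assuming this revision of the header, the effect can be observed by comparing the sizes of the two specializations (illustrative only):
#include <Eigen/Core>
#include <cstdio>
int main() {
  using Eigen::internal::plainobjectbase_evaluator_data;
  // Compile-time outer stride: only the data pointer is stored.
  std::printf("fixed:   %zu\n", sizeof(plainobjectbase_evaluator_data<float, 4>));
  // Dynamic outer stride: data pointer plus a runtime Index.
  std::printf("dynamic: %zu\n", sizeof(plainobjectbase_evaluator_data<float, Eigen::Dynamic>));
  return 0;
}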
template<typename Derived>
struct evaluator<PlainObjectBase<Derived> >
: evaluator_base<Derived>
@@ -141,18 +177,23 @@ struct evaluator<PlainObjectBase<Derived> >
Flags = traits<Derived>::EvaluatorFlags,
Alignment = traits<Derived>::Alignment
};
EIGEN_DEVICE_FUNC evaluator()
: m_data(0),
m_outerStride(IsVectorAtCompileTime ? 0
: int(IsRowMajor) ? ColsAtCompileTime
: RowsAtCompileTime)
enum {
// We do not need to know the outer stride for vectors
OuterStrideAtCompileTime = IsVectorAtCompileTime ? 0
: int(IsRowMajor) ? ColsAtCompileTime
: RowsAtCompileTime
};
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
evaluator()
: m_d(0,OuterStrideAtCompileTime)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
EIGEN_DEVICE_FUNC explicit evaluator(const PlainObjectType& m)
: m_data(m.data()), m_outerStride(IsVectorAtCompileTime ? 0 : m.outerStride())
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const PlainObjectType& m)
: m_d(m.data(),IsVectorAtCompileTime ? 0 : m.outerStride())
{
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
@@ -161,30 +202,30 @@ struct evaluator<PlainObjectBase<Derived> >
CoeffReturnType coeff(Index row, Index col) const
{
if (IsRowMajor)
return m_data[row * m_outerStride.value() + col];
return m_d.data[row * m_d.outerStride() + col];
else
return m_data[row + col * m_outerStride.value()];
return m_d.data[row + col * m_d.outerStride()];
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
return m_data[index];
return m_d.data[index];
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index row, Index col)
{
if (IsRowMajor)
return const_cast<Scalar*>(m_data)[row * m_outerStride.value() + col];
return const_cast<Scalar*>(m_d.data)[row * m_d.outerStride() + col];
else
return const_cast<Scalar*>(m_data)[row + col * m_outerStride.value()];
return const_cast<Scalar*>(m_d.data)[row + col * m_d.outerStride()];
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index index)
{
return const_cast<Scalar*>(m_data)[index];
return const_cast<Scalar*>(m_d.data)[index];
}
template<int LoadMode, typename PacketType>
@@ -192,16 +233,16 @@ struct evaluator<PlainObjectBase<Derived> >
PacketType packet(Index row, Index col) const
{
if (IsRowMajor)
return ploadt<PacketType, LoadMode>(m_data + row * m_outerStride.value() + col);
return ploadt<PacketType, LoadMode>(m_d.data + row * m_d.outerStride() + col);
else
return ploadt<PacketType, LoadMode>(m_data + row + col * m_outerStride.value());
return ploadt<PacketType, LoadMode>(m_d.data + row + col * m_d.outerStride());
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index index) const
{
return ploadt<PacketType, LoadMode>(m_data + index);
return ploadt<PacketType, LoadMode>(m_d.data + index);
}
template<int StoreMode,typename PacketType>
@@ -210,26 +251,22 @@ struct evaluator<PlainObjectBase<Derived> >
{
if (IsRowMajor)
return pstoret<Scalar, PacketType, StoreMode>
(const_cast<Scalar*>(m_data) + row * m_outerStride.value() + col, x);
(const_cast<Scalar*>(m_d.data) + row * m_d.outerStride() + col, x);
else
return pstoret<Scalar, PacketType, StoreMode>
(const_cast<Scalar*>(m_data) + row + col * m_outerStride.value(), x);
(const_cast<Scalar*>(m_d.data) + row + col * m_d.outerStride(), x);
}
template<int StoreMode, typename PacketType>
EIGEN_STRONG_INLINE
void writePacket(Index index, const PacketType& x)
{
return pstoret<Scalar, PacketType, StoreMode>(const_cast<Scalar*>(m_data) + index, x);
return pstoret<Scalar, PacketType, StoreMode>(const_cast<Scalar*>(m_d.data) + index, x);
}
protected:
const Scalar *m_data;
// We do not need to know the outer stride for vectors
variable_if_dynamic<Index, IsVectorAtCompileTime ? 0
: int(IsRowMajor) ? ColsAtCompileTime
: RowsAtCompileTime> m_outerStride;
plainobjectbase_evaluator_data<Scalar,OuterStrideAtCompileTime> m_d;
};
template<typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
@@ -238,9 +275,11 @@ struct evaluator<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
{
typedef Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> XprType;
EIGEN_DEVICE_FUNC evaluator() {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
evaluator() {}
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& m)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& m)
: evaluator<PlainObjectBase<XprType> >(m)
{ }
};
@@ -251,9 +290,11 @@ struct evaluator<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
{
typedef Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> XprType;
EIGEN_DEVICE_FUNC evaluator() {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
evaluator() {}
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& m)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& m)
: evaluator<PlainObjectBase<XprType> >(m)
{ }
};
@@ -272,7 +313,8 @@ struct unary_evaluator<Transpose<ArgType>, IndexBased>
Alignment = evaluator<ArgType>::Alignment
};
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& t) : m_argImpl(t.nestedExpression()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& t) : m_argImpl(t.nestedExpression()) {}
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
@@ -527,9 +569,7 @@ struct unary_evaluator<CwiseUnaryOp<UnaryOp, ArgType>, IndexBased >
};
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& op)
: m_functor(op.functor()),
m_argImpl(op.nestedExpression())
explicit unary_evaluator(const XprType& op) : m_d(op)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(functor_traits<UnaryOp>::Cost);
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
@@ -540,32 +580,43 @@ struct unary_evaluator<CwiseUnaryOp<UnaryOp, ArgType>, IndexBased >
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
return m_functor(m_argImpl.coeff(row, col));
return m_d.func()(m_d.argImpl.coeff(row, col));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
return m_functor(m_argImpl.coeff(index));
return m_d.func()(m_d.argImpl.coeff(index));
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index row, Index col) const
{
return m_functor.packetOp(m_argImpl.template packet<LoadMode, PacketType>(row, col));
return m_d.func().packetOp(m_d.argImpl.template packet<LoadMode, PacketType>(row, col));
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index index) const
{
return m_functor.packetOp(m_argImpl.template packet<LoadMode, PacketType>(index));
return m_d.func().packetOp(m_d.argImpl.template packet<LoadMode, PacketType>(index));
}
protected:
const UnaryOp m_functor;
evaluator<ArgType> m_argImpl;
// this helper makes it possible to completely eliminate the functor if it is empty
class Data : private UnaryOp
{
public:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : UnaryOp(xpr.functor()), argImpl(xpr.nestedExpression()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const UnaryOp& func() const { return static_cast<const UnaryOp&>(*this); }
evaluator<ArgType> argImpl;
};
Data m_d;
};
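A rough way to see the benefit (assuming this revision): since Data privately inherits the usually-empty functor, the evaluator of a unary expression should be no larger than the evaluator of its argument.
#include <Eigen/Core>
#include <cstdio>
int main() {
  Eigen::ArrayXf v(8);
  auto expr = -v;   // CwiseUnaryOp with an empty scalar_opposite_op functor
  std::printf("%zu vs %zu\n",
              sizeof(Eigen::internal::evaluator<decltype(expr)>),
              sizeof(Eigen::internal::evaluator<Eigen::ArrayXf>));
  return 0;
}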
// -------------------- CwiseTernaryOp --------------------
@@ -609,11 +660,7 @@ struct ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>, IndexBased
evaluator<Arg3>::Alignment)
};
EIGEN_DEVICE_FUNC explicit ternary_evaluator(const XprType& xpr)
: m_functor(xpr.functor()),
m_arg1Impl(xpr.arg1()),
m_arg2Impl(xpr.arg2()),
m_arg3Impl(xpr.arg3())
EIGEN_DEVICE_FUNC explicit ternary_evaluator(const XprType& xpr) : m_d(xpr)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(functor_traits<TernaryOp>::Cost);
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
@@ -624,38 +671,47 @@ struct ternary_evaluator<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>, IndexBased
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
return m_functor(m_arg1Impl.coeff(row, col), m_arg2Impl.coeff(row, col), m_arg3Impl.coeff(row, col));
return m_d.func()(m_d.arg1Impl.coeff(row, col), m_d.arg2Impl.coeff(row, col), m_d.arg3Impl.coeff(row, col));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
return m_functor(m_arg1Impl.coeff(index), m_arg2Impl.coeff(index), m_arg3Impl.coeff(index));
return m_d.func()(m_d.arg1Impl.coeff(index), m_d.arg2Impl.coeff(index), m_d.arg3Impl.coeff(index));
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index row, Index col) const
{
return m_functor.packetOp(m_arg1Impl.template packet<LoadMode,PacketType>(row, col),
m_arg2Impl.template packet<LoadMode,PacketType>(row, col),
m_arg3Impl.template packet<LoadMode,PacketType>(row, col));
return m_d.func().packetOp(m_d.arg1Impl.template packet<LoadMode,PacketType>(row, col),
m_d.arg2Impl.template packet<LoadMode,PacketType>(row, col),
m_d.arg3Impl.template packet<LoadMode,PacketType>(row, col));
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index index) const
{
return m_functor.packetOp(m_arg1Impl.template packet<LoadMode,PacketType>(index),
m_arg2Impl.template packet<LoadMode,PacketType>(index),
m_arg3Impl.template packet<LoadMode,PacketType>(index));
return m_d.func().packetOp(m_d.arg1Impl.template packet<LoadMode,PacketType>(index),
m_d.arg2Impl.template packet<LoadMode,PacketType>(index),
m_d.arg3Impl.template packet<LoadMode,PacketType>(index));
}
protected:
const TernaryOp m_functor;
evaluator<Arg1> m_arg1Impl;
evaluator<Arg2> m_arg2Impl;
evaluator<Arg3> m_arg3Impl;
// this helper makes it possible to completely eliminate the functor if it is empty
struct Data : private TernaryOp
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : TernaryOp(xpr.functor()), arg1Impl(xpr.arg1()), arg2Impl(xpr.arg2()), arg3Impl(xpr.arg3()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const TernaryOp& func() const { return static_cast<const TernaryOp&>(*this); }
evaluator<Arg1> arg1Impl;
evaluator<Arg2> arg2Impl;
evaluator<Arg3> arg3Impl;
};
Data m_d;
};
// -------------------- CwiseBinaryOp --------------------
@@ -668,7 +724,8 @@ struct evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> XprType;
typedef binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs> > Base;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr) : Base(xpr) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& xpr) : Base(xpr) {}
};
template<typename BinaryOp, typename Lhs, typename Rhs>
@@ -696,10 +753,8 @@ struct binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs>, IndexBased, IndexBase
Alignment = EIGEN_PLAIN_ENUM_MIN(evaluator<Lhs>::Alignment,evaluator<Rhs>::Alignment)
};
EIGEN_DEVICE_FUNC explicit binary_evaluator(const XprType& xpr)
: m_functor(xpr.functor()),
m_lhsImpl(xpr.lhs()),
m_rhsImpl(xpr.rhs())
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit binary_evaluator(const XprType& xpr) : m_d(xpr)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(functor_traits<BinaryOp>::Cost);
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
@@ -710,35 +765,45 @@ struct binary_evaluator<CwiseBinaryOp<BinaryOp, Lhs, Rhs>, IndexBased, IndexBase
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
return m_functor(m_lhsImpl.coeff(row, col), m_rhsImpl.coeff(row, col));
return m_d.func()(m_d.lhsImpl.coeff(row, col), m_d.rhsImpl.coeff(row, col));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
return m_functor(m_lhsImpl.coeff(index), m_rhsImpl.coeff(index));
return m_d.func()(m_d.lhsImpl.coeff(index), m_d.rhsImpl.coeff(index));
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index row, Index col) const
{
return m_functor.packetOp(m_lhsImpl.template packet<LoadMode,PacketType>(row, col),
m_rhsImpl.template packet<LoadMode,PacketType>(row, col));
return m_d.func().packetOp(m_d.lhsImpl.template packet<LoadMode,PacketType>(row, col),
m_d.rhsImpl.template packet<LoadMode,PacketType>(row, col));
}
template<int LoadMode, typename PacketType>
EIGEN_STRONG_INLINE
PacketType packet(Index index) const
{
return m_functor.packetOp(m_lhsImpl.template packet<LoadMode,PacketType>(index),
m_rhsImpl.template packet<LoadMode,PacketType>(index));
return m_d.func().packetOp(m_d.lhsImpl.template packet<LoadMode,PacketType>(index),
m_d.rhsImpl.template packet<LoadMode,PacketType>(index));
}
protected:
const BinaryOp m_functor;
evaluator<Lhs> m_lhsImpl;
evaluator<Rhs> m_rhsImpl;
// this helper makes it possible to completely eliminate the functor if it is empty
struct Data : private BinaryOp
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : BinaryOp(xpr.functor()), lhsImpl(xpr.lhs()), rhsImpl(xpr.rhs()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const BinaryOp& func() const { return static_cast<const BinaryOp&>(*this); }
evaluator<Lhs> lhsImpl;
evaluator<Rhs> rhsImpl;
};
Data m_d;
};
// -------------------- CwiseUnaryView --------------------
@@ -757,9 +822,7 @@ struct unary_evaluator<CwiseUnaryView<UnaryOp, ArgType>, IndexBased>
Alignment = 0 // FIXME it is not very clear why alignment is necessarily lost...
};
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& op)
: m_unaryOp(op.functor()),
m_argImpl(op.nestedExpression())
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& op) : m_d(op)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(functor_traits<UnaryOp>::Cost);
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
@@ -771,30 +834,40 @@ struct unary_evaluator<CwiseUnaryView<UnaryOp, ArgType>, IndexBased>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
return m_unaryOp(m_argImpl.coeff(row, col));
return m_d.func()(m_d.argImpl.coeff(row, col));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
return m_unaryOp(m_argImpl.coeff(index));
return m_d.func()(m_d.argImpl.coeff(index));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index row, Index col)
{
return m_unaryOp(m_argImpl.coeffRef(row, col));
return m_d.func()(m_d.argImpl.coeffRef(row, col));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index index)
{
return m_unaryOp(m_argImpl.coeffRef(index));
return m_d.func()(m_d.argImpl.coeffRef(index));
}
protected:
const UnaryOp m_unaryOp;
evaluator<ArgType> m_argImpl;
// this helper makes it possible to completely eliminate the functor if it is empty
struct Data : private UnaryOp
{
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Data(const XprType& xpr) : UnaryOp(xpr.functor()), argImpl(xpr.nestedExpression()) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const UnaryOp& func() const { return static_cast<const UnaryOp&>(*this); }
evaluator<ArgType> argImpl;
};
Data m_d;
};
// -------------------- Map --------------------
@@ -818,7 +891,8 @@ struct mapbase_evaluator : evaluator_base<Derived>
CoeffReadCost = NumTraits<Scalar>::ReadCost
};
EIGEN_DEVICE_FUNC explicit mapbase_evaluator(const XprType& map)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit mapbase_evaluator(const XprType& map)
: m_data(const_cast<PointerType>(map.data())),
m_innerStride(map.innerStride()),
m_outerStride(map.outerStride())
@@ -882,10 +956,10 @@ struct mapbase_evaluator : evaluator_base<Derived>
internal::pstoret<Scalar, PacketType, StoreMode>(m_data + index * m_innerStride.value(), x);
}
protected:
EIGEN_DEVICE_FUNC
inline Index rowStride() const { return XprType::IsRowMajor ? m_outerStride.value() : m_innerStride.value(); }
EIGEN_DEVICE_FUNC
inline Index colStride() const { return XprType::IsRowMajor ? m_innerStride.value() : m_outerStride.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rowStride() const { return XprType::IsRowMajor ? m_outerStride.value() : m_innerStride.value(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index colStride() const { return XprType::IsRowMajor ? m_innerStride.value() : m_outerStride.value(); }
PointerType m_data;
const internal::variable_if_dynamic<Index, XprType::InnerStrideAtCompileTime> m_innerStride;
@@ -938,7 +1012,8 @@ struct evaluator<Ref<PlainObjectType, RefOptions, StrideType> >
Alignment = evaluator<Map<PlainObjectType, RefOptions, StrideType> >::Alignment
};
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& ref)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& ref)
: mapbase_evaluator<XprType, PlainObjectType>(ref)
{ }
};
@@ -993,7 +1068,8 @@ struct evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
Alignment = EIGEN_PLAIN_ENUM_MIN(evaluator<ArgType>::Alignment, Alignment0)
};
typedef block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel> block_evaluator_type;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& block) : block_evaluator_type(block)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& block) : block_evaluator_type(block)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
@@ -1006,7 +1082,8 @@ struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /*HasDirectAcc
{
typedef Block<ArgType, BlockRows, BlockCols, InnerPanel> XprType;
EIGEN_DEVICE_FUNC explicit block_evaluator(const XprType& block)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit block_evaluator(const XprType& block)
: unary_evaluator<XprType>(block)
{}
};
@@ -1017,11 +1094,12 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
{
typedef Block<ArgType, BlockRows, BlockCols, InnerPanel> XprType;
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& block)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& block)
: m_argImpl(block.nestedExpression()),
m_startRow(block.startRow()),
m_startCol(block.startCol()),
m_linear_offset(InnerPanel?(XprType::IsRowMajor ? block.startRow()*block.cols() : block.startCol()*block.rows()):0)
m_linear_offset(ForwardLinearAccess?(ArgType::IsRowMajor ? block.startRow()*block.nestedExpression().cols() + block.startCol() : block.startCol()*block.nestedExpression().rows() + block.startRow()):0)
{ }
typedef typename XprType::Scalar Scalar;
@@ -1029,7 +1107,7 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
enum {
RowsAtCompileTime = XprType::RowsAtCompileTime,
ForwardLinearAccess = InnerPanel && bool(evaluator<ArgType>::Flags&LinearAccessBit)
ForwardLinearAccess = (InnerPanel || int(XprType::IsRowMajor)==int(ArgType::IsRowMajor)) && bool(evaluator<ArgType>::Flags&LinearAccessBit)
};
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -1040,11 +1118,8 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index index) const
{
if (ForwardLinearAccess)
return m_argImpl.coeff(m_linear_offset.value() + index);
else
return coeff(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0);
{
return linear_coeff_impl(index, bool_constant<ForwardLinearAccess>());
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
@@ -1055,11 +1130,8 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index index)
{
if (ForwardLinearAccess)
return m_argImpl.coeffRef(m_linear_offset.value() + index);
else
return coeffRef(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0);
{
return linear_coeffRef_impl(index, bool_constant<ForwardLinearAccess>());
}
template<int LoadMode, typename PacketType>
@@ -1100,10 +1172,32 @@ struct unary_evaluator<Block<ArgType, BlockRows, BlockCols, InnerPanel>, IndexBa
}
protected:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType linear_coeff_impl(Index index, internal::true_type /* ForwardLinearAccess */) const
{
return m_argImpl.coeff(m_linear_offset.value() + index);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType linear_coeff_impl(Index index, internal::false_type /* not ForwardLinearAccess */) const
{
return coeff(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& linear_coeffRef_impl(Index index, internal::true_type /* ForwardLinearAccess */)
{
return m_argImpl.coeffRef(m_linear_offset.value() + index);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& linear_coeffRef_impl(Index index, internal::false_type /* not ForwardLinearAccess */)
{
return coeffRef(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0);
}
evaluator<ArgType> m_argImpl;
const variable_if_dynamic<Index, (ArgType::RowsAtCompileTime == 1 && BlockRows==1) ? 0 : Dynamic> m_startRow;
const variable_if_dynamic<Index, (ArgType::ColsAtCompileTime == 1 && BlockCols==1) ? 0 : Dynamic> m_startCol;
const variable_if_dynamic<Index, InnerPanel ? Dynamic : 0> m_linear_offset;
const variable_if_dynamic<Index, ForwardLinearAccess ? Dynamic : 0> m_linear_offset;
};
// TODO: This evaluator does not actually use the child evaluator;
@@ -1117,7 +1211,8 @@ struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /* HasDirectAc
typedef Block<ArgType, BlockRows, BlockCols, InnerPanel> XprType;
typedef typename XprType::Scalar Scalar;
EIGEN_DEVICE_FUNC explicit block_evaluator(const XprType& block)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit block_evaluator(const XprType& block)
: mapbase_evaluator<XprType, typename XprType::PlainObject>(block)
{
// TODO: for the 3.3 release, this should be turned to an internal assertion, but let's keep it as is for the beta lifetime
@@ -1145,7 +1240,8 @@ struct evaluator<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
Alignment = EIGEN_PLAIN_ENUM_MIN(evaluator<ThenMatrixType>::Alignment, evaluator<ElseMatrixType>::Alignment)
};
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& select)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& select)
: m_conditionImpl(select.conditionMatrix()),
m_thenImpl(select.thenMatrix()),
m_elseImpl(select.elseMatrix())
@@ -1202,7 +1298,8 @@ struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
Alignment = evaluator<ArgTypeNestedCleaned>::Alignment
};
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& replicate)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& replicate)
: m_arg(replicate.nestedExpression()),
m_argImpl(m_arg),
m_rows(replicate.nestedExpression().rows()),
@@ -1266,64 +1363,6 @@ protected:
const variable_if_dynamic<Index, ArgType::ColsAtCompileTime> m_cols;
};
// -------------------- PartialReduxExpr --------------------
template< typename ArgType, typename MemberOp, int Direction>
struct evaluator<PartialReduxExpr<ArgType, MemberOp, Direction> >
: evaluator_base<PartialReduxExpr<ArgType, MemberOp, Direction> >
{
typedef PartialReduxExpr<ArgType, MemberOp, Direction> XprType;
typedef typename internal::nested_eval<ArgType,1>::type ArgTypeNested;
typedef typename internal::remove_all<ArgTypeNested>::type ArgTypeNestedCleaned;
typedef typename ArgType::Scalar InputScalar;
typedef typename XprType::Scalar Scalar;
enum {
TraversalSize = Direction==int(Vertical) ? int(ArgType::RowsAtCompileTime) : int(ArgType::ColsAtCompileTime)
};
typedef typename MemberOp::template Cost<InputScalar,int(TraversalSize)> CostOpType;
enum {
CoeffReadCost = TraversalSize==Dynamic ? HugeCost
: TraversalSize * evaluator<ArgType>::CoeffReadCost + int(CostOpType::value),
Flags = (traits<XprType>::Flags&RowMajorBit) | (evaluator<ArgType>::Flags&(HereditaryBits&(~RowMajorBit))) | LinearAccessBit,
Alignment = 0 // FIXME this will need to be improved once PartialReduxExpr is vectorized
};
EIGEN_DEVICE_FUNC explicit evaluator(const XprType xpr)
: m_arg(xpr.nestedExpression()), m_functor(xpr.functor())
{
EIGEN_INTERNAL_CHECK_COST_VALUE(TraversalSize==Dynamic ? HugeCost : int(CostOpType::value));
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
typedef typename XprType::CoeffReturnType CoeffReturnType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar coeff(Index i, Index j) const
{
if (Direction==Vertical)
return m_functor(m_arg.col(j));
else
return m_functor(m_arg.row(i));
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar coeff(Index index) const
{
if (Direction==Vertical)
return m_functor(m_arg.col(index));
else
return m_functor(m_arg.row(index));
}
protected:
typename internal::add_const_on_value_type<ArgTypeNested>::type m_arg;
const MemberOp m_functor;
};
// -------------------- MatrixWrapper and ArrayWrapper --------------------
//
// evaluator_wrapper_base<T> is a common base class for the
@@ -1340,7 +1379,8 @@ struct evaluator_wrapper_base
Alignment = evaluator<ArgType>::Alignment
};
EIGEN_DEVICE_FUNC explicit evaluator_wrapper_base(const ArgType& arg) : m_argImpl(arg) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator_wrapper_base(const ArgType& arg) : m_argImpl(arg) {}
typedef typename ArgType::Scalar Scalar;
typedef typename ArgType::CoeffReturnType CoeffReturnType;
@@ -1407,7 +1447,8 @@ struct unary_evaluator<MatrixWrapper<TArgType> >
{
typedef MatrixWrapper<TArgType> XprType;
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& wrapper)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& wrapper)
: evaluator_wrapper_base<MatrixWrapper<TArgType> >(wrapper.nestedExpression())
{ }
};
@@ -1418,7 +1459,8 @@ struct unary_evaluator<ArrayWrapper<TArgType> >
{
typedef ArrayWrapper<TArgType> XprType;
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& wrapper)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& wrapper)
: evaluator_wrapper_base<ArrayWrapper<TArgType> >(wrapper.nestedExpression())
{ }
};
@@ -1460,7 +1502,8 @@ struct unary_evaluator<Reverse<ArgType, Direction> >
Alignment = 0 // FIXME in some rare cases, Alignment could be preserved, like a Vector4f.
};
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& reverse)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& reverse)
: m_argImpl(reverse.nestedExpression()),
m_rows(ReverseRow ? reverse.nestedExpression().rows() : 1),
m_cols(ReverseCol ? reverse.nestedExpression().cols() : 1)
@@ -1567,7 +1610,8 @@ struct evaluator<Diagonal<ArgType, DiagIndex> >
Alignment = 0
};
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& diagonal)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit evaluator(const XprType& diagonal)
: m_argImpl(diagonal.nestedExpression()),
m_index(diagonal.index())
{ }

View File

@@ -48,6 +48,11 @@ public:
* Explicit zeros are not skipped over. To skip explicit zeros, see class SparseView
*/
EIGEN_STRONG_INLINE InnerIterator& operator++() { m_iter.operator++(); return *this; }
EIGEN_STRONG_INLINE InnerIterator& operator+=(Index i) { m_iter.operator+=(i); return *this; }
EIGEN_STRONG_INLINE InnerIterator operator+(Index i)
{ InnerIterator result(*this); result+=i; return result; }
/// \returns the column or row index of the current coefficient.
EIGEN_STRONG_INLINE Index index() const { return m_iter.index(); }
/// \returns the row index of the current coefficient.

View File

@@ -100,8 +100,14 @@ class CwiseBinaryOp :
typedef typename internal::remove_reference<LhsNested>::type _LhsNested;
typedef typename internal::remove_reference<RhsNested>::type _RhsNested;
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE CwiseBinaryOp(const Lhs& aLhs, const Rhs& aRhs, const BinaryOp& func = BinaryOp())
#if EIGEN_COMP_MSVC && EIGEN_HAS_CXX11
// Required for Visual Studio, or the copy constructor will probably not get inlined!
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CwiseBinaryOp(const CwiseBinaryOp<BinaryOp,LhsType,RhsType>&) = default;
#endif
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CwiseBinaryOp(const Lhs& aLhs, const Rhs& aRhs, const BinaryOp& func = BinaryOp())
: m_lhs(aLhs), m_rhs(aRhs), m_functor(func)
{
EIGEN_CHECK_BINARY_COMPATIBILIY(BinaryOp,typename Lhs::Scalar,typename Rhs::Scalar);
@@ -110,16 +116,16 @@ class CwiseBinaryOp :
eigen_assert(aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols());
}
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index rows() const {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const {
// return the fixed size type if available to enable compile time optimizations
if (internal::traits<typename internal::remove_all<LhsNested>::type>::RowsAtCompileTime==Dynamic)
return m_rhs.rows();
else
return m_lhs.rows();
}
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index cols() const {
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const {
// return the fixed size type if available to enable compile time optimizations
if (internal::traits<typename internal::remove_all<LhsNested>::type>::ColsAtCompileTime==Dynamic)
return m_rhs.cols();
@@ -128,13 +134,13 @@ class CwiseBinaryOp :
}
/** \returns the left hand side nested expression */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const _LhsNested& lhs() const { return m_lhs; }
/** \returns the right hand side nested expression */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const _RhsNested& rhs() const { return m_rhs; }
/** \returns the functor representing the binary operation */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const BinaryOp& functor() const { return m_functor; }
protected:
@@ -158,7 +164,7 @@ public:
*/
template<typename Derived>
template<typename OtherDerived>
EIGEN_STRONG_INLINE Derived &
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived &
MatrixBase<Derived>::operator-=(const MatrixBase<OtherDerived> &other)
{
call_assignment(derived(), other.derived(), internal::sub_assign_op<Scalar,typename OtherDerived::Scalar>());
@@ -171,7 +177,7 @@ MatrixBase<Derived>::operator-=(const MatrixBase<OtherDerived> &other)
*/
template<typename Derived>
template<typename OtherDerived>
EIGEN_STRONG_INLINE Derived &
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived &
MatrixBase<Derived>::operator+=(const MatrixBase<OtherDerived>& other)
{
call_assignment(derived(), other.derived(), internal::add_assign_op<Scalar,typename OtherDerived::Scalar>());
@@ -181,4 +187,3 @@ MatrixBase<Derived>::operator+=(const MatrixBase<OtherDerived>& other)
} // end namespace Eigen
#endif // EIGEN_CWISE_BINARY_OP_H

View File

@@ -105,7 +105,12 @@ class CwiseNullaryOp : public internal::dense_xpr_base< CwiseNullaryOp<NullaryOp
*/
template<typename Derived>
template<typename CustomNullaryOp>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CwiseNullaryOp<CustomNullaryOp, typename DenseBase<Derived>::PlainObject>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
#ifndef EIGEN_PARSED_BY_DOXYGEN
const CwiseNullaryOp<CustomNullaryOp,typename DenseBase<Derived>::PlainObject>
#else
const CwiseNullaryOp<CustomNullaryOp,PlainObject>
#endif
DenseBase<Derived>::NullaryExpr(Index rows, Index cols, const CustomNullaryOp& func)
{
return CwiseNullaryOp<CustomNullaryOp, PlainObject>(rows, cols, func);
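A usage sketch of NullaryExpr with a custom functor (the functor below is illustrative):
#include <Eigen/Dense>
#include <iostream>
struct Kronecker {
  double operator()(Eigen::Index i, Eigen::Index j) const { return i == j ? 1.0 : 0.0; }
};
int main() {
  Eigen::MatrixXd id = Eigen::MatrixXd::NullaryExpr(3, 3, Kronecker());
  std::cout << id << std::endl;   // 3x3 identity built coefficient by coefficient
  return 0;
}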
@@ -131,7 +136,12 @@ DenseBase<Derived>::NullaryExpr(Index rows, Index cols, const CustomNullaryOp& f
*/
template<typename Derived>
template<typename CustomNullaryOp>
EIGEN_STRONG_INLINE const CwiseNullaryOp<CustomNullaryOp, typename DenseBase<Derived>::PlainObject>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
#ifndef EIGEN_PARSED_BY_DOXYGEN
const CwiseNullaryOp<CustomNullaryOp, typename DenseBase<Derived>::PlainObject>
#else
const CwiseNullaryOp<CustomNullaryOp, PlainObject>
#endif
DenseBase<Derived>::NullaryExpr(Index size, const CustomNullaryOp& func)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
@@ -150,7 +160,12 @@ DenseBase<Derived>::NullaryExpr(Index size, const CustomNullaryOp& func)
*/
template<typename Derived>
template<typename CustomNullaryOp>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CwiseNullaryOp<CustomNullaryOp, typename DenseBase<Derived>::PlainObject>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
#ifndef EIGEN_PARSED_BY_DOXYGEN
const CwiseNullaryOp<CustomNullaryOp, typename DenseBase<Derived>::PlainObject>
#else
const CwiseNullaryOp<CustomNullaryOp, PlainObject>
#endif
DenseBase<Derived>::NullaryExpr(const CustomNullaryOp& func)
{
return CwiseNullaryOp<CustomNullaryOp, PlainObject>(RowsAtCompileTime, ColsAtCompileTime, func);
@@ -170,7 +185,7 @@ DenseBase<Derived>::NullaryExpr(const CustomNullaryOp& func)
* \sa class CwiseNullaryOp
*/
template<typename Derived>
EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
DenseBase<Derived>::Constant(Index rows, Index cols, const Scalar& value)
{
return DenseBase<Derived>::NullaryExpr(rows, cols, internal::scalar_constant_op<Scalar>(value));
@@ -217,27 +232,32 @@ DenseBase<Derived>::Constant(const Scalar& value)
/** \deprecated because of accuracy loss. In Eigen 3.3, it is an alias for LinSpaced(Index,const Scalar&,const Scalar&)
*
* \sa LinSpaced(Index,Scalar,Scalar), setLinSpaced(Index,const Scalar&,const Scalar&)
* \only_for_vectors
*
* Example: \include DenseBase_LinSpaced_seq_deprecated.cpp
* Output: \verbinclude DenseBase_LinSpaced_seq_deprecated.out
*
* \sa LinSpaced(Index,const Scalar&, const Scalar&), setLinSpaced(Index,const Scalar&,const Scalar&)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase<Derived>::RandomAccessLinSpacedReturnType
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase<Derived>::RandomAccessLinSpacedReturnType
DenseBase<Derived>::LinSpaced(Sequential_t, Index size, const Scalar& low, const Scalar& high)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
return DenseBase<Derived>::NullaryExpr(size, internal::linspaced_op<Scalar,PacketScalar>(low,high,size));
return DenseBase<Derived>::NullaryExpr(size, internal::linspaced_op<Scalar>(low,high,size));
}
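For reference, a small sketch contrasting the deprecated spelling with the overload it now aliases (assuming the Eigen::Sequential tag taken by this overload):

Eigen::VectorXd a = Eigen::VectorXd::LinSpaced(Eigen::Sequential, 5, 0.0, 1.0); // deprecated spelling
Eigen::VectorXd b = Eigen::VectorXd::LinSpaced(5, 0.0, 1.0);                    // preferred equivalent
// Both yield 0, 0.25, 0.5, 0.75, 1.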
/** \deprecated because of accuracy loss. In Eigen 3.3, it is an alias for LinSpaced(const Scalar&,const Scalar&)
*
* \sa LinSpaced(Scalar,Scalar)
* \sa LinSpaced(const Scalar&, const Scalar&)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase<Derived>::RandomAccessLinSpacedReturnType
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase<Derived>::RandomAccessLinSpacedReturnType
DenseBase<Derived>::LinSpaced(Sequential_t, const Scalar& low, const Scalar& high)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived)
return DenseBase<Derived>::NullaryExpr(Derived::SizeAtCompileTime, internal::linspaced_op<Scalar,PacketScalar>(low,high,Derived::SizeAtCompileTime));
return DenseBase<Derived>::NullaryExpr(Derived::SizeAtCompileTime, internal::linspaced_op<Scalar>(low,high,Derived::SizeAtCompileTime));
}
/**
@@ -268,7 +288,7 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase<Derived>::RandomA
DenseBase<Derived>::LinSpaced(Index size, const Scalar& low, const Scalar& high)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
return DenseBase<Derived>::NullaryExpr(size, internal::linspaced_op<Scalar,PacketScalar>(low,high,size));
return DenseBase<Derived>::NullaryExpr(size, internal::linspaced_op<Scalar>(low,high,size));
}
/**
@@ -281,7 +301,7 @@ DenseBase<Derived>::LinSpaced(const Scalar& low, const Scalar& high)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived)
return DenseBase<Derived>::NullaryExpr(Derived::SizeAtCompileTime, internal::linspaced_op<Scalar,PacketScalar>(low,high,Derived::SizeAtCompileTime));
return DenseBase<Derived>::NullaryExpr(Derived::SizeAtCompileTime, internal::linspaced_op<Scalar>(low,high,Derived::SizeAtCompileTime));
}
/** \returns true if all coefficients in this matrix are approximately equal to \a val, to within precision \a prec */
@@ -383,7 +403,7 @@ template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::setLinSpaced(Index newSize, const Scalar& low, const Scalar& high)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
return derived() = Derived::NullaryExpr(newSize, internal::linspaced_op<Scalar,PacketScalar>(low,high,newSize));
return derived() = Derived::NullaryExpr(newSize, internal::linspaced_op<Scalar>(low,high,newSize));
}
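A quick usage sketch of the in-place variant:

Eigen::VectorXf v(4);
v.setLinSpaced(4, 0.f, 3.f); // v is now 0, 1, 2, 3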
/**
@@ -861,6 +881,42 @@ template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::BasisReturnType MatrixBase<Derived>::UnitW()
{ return Derived::Unit(3); }
/** \brief Set the coefficients of \c *this to the i-th unit (basis) vector
*
* \param i index of the unique coefficient to be set to 1
*
* \only_for_vectors
*
* \sa MatrixBase::setIdentity(), class CwiseNullaryOp, MatrixBase::Unit(Index,Index)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::setUnit(Index i)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
eigen_assert(i<size());
derived().setZero();
derived().coeffRef(i) = Scalar(1);
return derived();
}
/** \brief Resizes to the given \a newSize, and writes the i-th unit (basis) vector into *this.
*
* \param newSize the new size of the vector
* \param i index of the unique coefficient to be set to 1
*
* \only_for_vectors
*
* \sa MatrixBase::setIdentity(), class CwiseNullaryOp, MatrixBase::Unit(Index,Index)
*/
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::setUnit(Index newSize, Index i)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
eigen_assert(i<newSize);
derived().resize(newSize);
return setUnit(i);
}
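A short sketch of the two new overloads:

Eigen::VectorXd v(3);
v.setUnit(1);    // v == (0, 1, 0), size unchanged
v.setUnit(5, 2); // v resized to 5, then v == (0, 0, 1, 0, 0)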
} // end namespace Eigen
#endif // EIGEN_CWISE_NULLARY_OP_H

View File

@@ -81,7 +81,7 @@ class CwiseUnaryView : public CwiseUnaryViewImpl<ViewOp, MatrixType, typename in
/** \returns the nested expression */
typename internal::remove_reference<MatrixTypeNested>::type&
nestedExpression() { return m_matrix.const_cast_derived(); }
nestedExpression() { return m_matrix; }
protected:
MatrixTypeNested m_matrix;
@@ -121,8 +121,6 @@ class CwiseUnaryViewImpl<ViewOp,MatrixType,Dense>
{
return derived().nestedExpression().outerStride() * sizeof(typename internal::traits<MatrixType>::Scalar) / sizeof(Scalar);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(CwiseUnaryViewImpl)
};
} // end namespace Eigen

View File

@@ -150,13 +150,18 @@ template<typename Derived> class DenseBase
* \sa SizeAtCompileTime, MaxRowsAtCompileTime, MaxColsAtCompileTime
*/
IsVectorAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime == 1
|| internal::traits<Derived>::MaxColsAtCompileTime == 1,
IsVectorAtCompileTime = internal::traits<Derived>::RowsAtCompileTime == 1
|| internal::traits<Derived>::ColsAtCompileTime == 1,
/**< This is set to true if either the number of rows or the number of
* columns is known at compile-time to be equal to 1. Indeed, in that case,
* we are dealing with a column-vector (if there is only one column) or with
* a row-vector (if there is only one row). */
NumDimensions = int(MaxSizeAtCompileTime) == 1 ? 0 : bool(IsVectorAtCompileTime) ? 1 : 2,
/**< This value is equal to Tensor::NumDimensions, i.e. 0 for scalars, 1 for vectors,
* and 2 for matrices.
*/
Flags = internal::traits<Derived>::Flags,
/**< This stores expression \ref flags flags which may or may not be inherited by new expressions
* constructed from this one. See the \ref flags "list of flags".
@@ -261,9 +266,9 @@ template<typename Derived> class DenseBase
/** \internal Represents a matrix with all coefficients equal to one another*/
typedef CwiseNullaryOp<internal::scalar_constant_op<Scalar>,PlainObject> ConstantReturnType;
/** \internal \deprecated Represents a vector with linearly spaced coefficients that allows sequential access only. */
typedef CwiseNullaryOp<internal::linspaced_op<Scalar,PacketScalar>,PlainObject> SequentialLinSpacedReturnType;
EIGEN_DEPRECATED typedef CwiseNullaryOp<internal::linspaced_op<Scalar>,PlainObject> SequentialLinSpacedReturnType;
/** \internal Represents a vector with linearly spaced coefficients that allows random access. */
typedef CwiseNullaryOp<internal::linspaced_op<Scalar,PacketScalar>,PlainObject> RandomAccessLinSpacedReturnType;
typedef CwiseNullaryOp<internal::linspaced_op<Scalar>,PlainObject> RandomAccessLinSpacedReturnType;
/** \internal the return type of MatrixBase::eigenvalues() */
typedef Matrix<typename NumTraits<typename internal::traits<Derived>::Scalar>::Real, internal::traits<Derived>::ColsAtCompileTime, 1> EigenvaluesReturnType;
@@ -297,17 +302,17 @@ template<typename Derived> class DenseBase
Derived& operator=(const ReturnByValue<OtherDerived>& func);
/** \internal
* Copies \a other into *this without evaluating other. \returns a reference to *this.
* \deprecated */
* Copies \a other into *this without evaluating other. \returns a reference to *this. */
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
/** \deprecated */
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC
Derived& lazyAssign(const DenseBase<OtherDerived>& other);
EIGEN_DEVICE_FUNC
CommaInitializer<Derived> operator<< (const Scalar& s);
/** \deprecated it now returns \c *this */
template<unsigned int Added,unsigned int Removed>
/** \deprecated it now returns \c *this */
EIGEN_DEPRECATED
const Derived& flagged() const
{ return derived(); }
@@ -332,12 +337,13 @@ template<typename Derived> class DenseBase
EIGEN_DEVICE_FUNC static const ConstantReturnType
Constant(const Scalar& value);
EIGEN_DEVICE_FUNC static const SequentialLinSpacedReturnType
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType
LinSpaced(Sequential_t, Index size, const Scalar& low, const Scalar& high);
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType
LinSpaced(Sequential_t, const Scalar& low, const Scalar& high);
EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType
LinSpaced(Index size, const Scalar& low, const Scalar& high);
EIGEN_DEVICE_FUNC static const SequentialLinSpacedReturnType
LinSpaced(Sequential_t, const Scalar& low, const Scalar& high);
EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType
LinSpaced(const Scalar& low, const Scalar& high);
@@ -369,7 +375,7 @@ template<typename Derived> class DenseBase
template<typename OtherDerived> EIGEN_DEVICE_FUNC
bool isApprox(const DenseBase<OtherDerived>& other,
const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
bool isMuchSmallerThan(const RealScalar& other,
const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
template<typename OtherDerived> EIGEN_DEVICE_FUNC
@@ -380,7 +386,7 @@ template<typename Derived> class DenseBase
EIGEN_DEVICE_FUNC bool isConstant(const Scalar& value, const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
EIGEN_DEVICE_FUNC bool isZero(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
EIGEN_DEVICE_FUNC bool isOnes(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
inline bool hasNaN() const;
inline bool allFinite() const;
@@ -394,8 +400,8 @@ template<typename Derived> class DenseBase
*
* Notice that in the case of a plain matrix or vector (not an expression) this function just returns
* a const reference, in order to avoid a useless copy.
*
* \warning Be carefull with eval() and the auto C++ keyword, as detailed in this \link TopicPitfalls_auto_keyword page \endlink.
*
* \warning Be careful with eval() and the auto C++ keyword, as detailed in this \link TopicPitfalls_auto_keyword page \endlink.
*/
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE EvalReturnType eval() const
@@ -410,7 +416,7 @@ template<typename Derived> class DenseBase
*
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void swap(const DenseBase<OtherDerived>& other)
{
EIGEN_STATIC_ASSERT(!OtherDerived::IsPlainObjectBase,THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
@@ -422,7 +428,7 @@ template<typename Derived> class DenseBase
*
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void swap(PlainObjectBase<OtherDerived>& other)
{
eigen_assert(rows()==other.rows() && cols()==other.cols());
@@ -493,7 +499,7 @@ template<typename Derived> class DenseBase
typedef VectorwiseOp<Derived, Vertical> ColwiseReturnType;
typedef const VectorwiseOp<const Derived, Vertical> ConstColwiseReturnType;
/** \returns a VectorwiseOp wrapper of *this providing additional partial reduction operations
/** \returns a VectorwiseOp wrapper of *this for broadcasting and partial reductions
*
* Example: \include MatrixBase_rowwise.cpp
* Output: \verbinclude MatrixBase_rowwise.out
@@ -506,7 +512,7 @@ template<typename Derived> class DenseBase
}
EIGEN_DEVICE_FUNC RowwiseReturnType rowwise();
/** \returns a VectorwiseOp wrapper of *this providing additional partial reduction operations
/** \returns a VectorwiseOp wrapper of *this for broadcasting and partial reductions
*
* Example: \include MatrixBase_colwise.cpp
* Output: \verbinclude MatrixBase_colwise.out
@@ -567,16 +573,59 @@ template<typename Derived> class DenseBase
}
EIGEN_DEVICE_FUNC void reverseInPlace();
#ifdef EIGEN_PARSED_BY_DOXYGEN
/** STL-like <a href="https://en.cppreference.com/w/cpp/named_req/RandomAccessIterator">RandomAccessIterator</a>
* iterator type as returned by the begin() and end() methods.
*/
typedef random_access_iterator_type iterator;
/** This is the const version of iterator (aka read-only) */
typedef random_access_iterator_type const_iterator;
#else
typedef typename internal::conditional< (Flags&DirectAccessBit)==DirectAccessBit,
internal::pointer_based_stl_iterator<Derived>,
internal::generic_randaccess_stl_iterator<Derived>
>::type iterator_type;
typedef typename internal::conditional< (Flags&DirectAccessBit)==DirectAccessBit,
internal::pointer_based_stl_iterator<const Derived>,
internal::generic_randaccess_stl_iterator<const Derived>
>::type const_iterator_type;
// STL-style iterators are supported only for vectors.
typedef typename internal::conditional< IsVectorAtCompileTime,
iterator_type,
void
>::type iterator;
typedef typename internal::conditional< IsVectorAtCompileTime,
const_iterator_type,
void
>::type const_iterator;
#endif
inline iterator begin();
inline const_iterator begin() const;
inline const_iterator cbegin() const;
inline iterator end();
inline const_iterator end() const;
inline const_iterator cend() const;
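Since the iterators are only provided for vectors, a minimal sketch of the intended use (needs <algorithm> and <iostream>):

Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(4, 0.0, 3.0);
for (double x : v)             // range-based for via begin()/end()
  std::cout << x << ' ';
std::sort(v.begin(), v.end()); // random-access iterators, so STL algorithms work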
#define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::DenseBase
#define EIGEN_DOC_BLOCK_ADDONS_NOT_INNER_PANEL
#define EIGEN_DOC_BLOCK_ADDONS_INNER_PANEL_IF(COND)
#define EIGEN_DOC_UNARY_ADDONS(X,Y)
# include "../plugins/CommonCwiseUnaryOps.h"
# include "../plugins/BlockMethods.h"
# include "../plugins/IndexedViewMethods.h"
# include "../plugins/ReshapedMethods.h"
# ifdef EIGEN_DENSEBASE_PLUGIN
# include EIGEN_DENSEBASE_PLUGIN
# endif
#undef EIGEN_CURRENT_STORAGE_BASE_CLASS
#undef EIGEN_DOC_BLOCK_ADDONS_NOT_INNER_PANEL
#undef EIGEN_DOC_BLOCK_ADDONS_INNER_PANEL_IF
#undef EIGEN_DOC_UNARY_ADDONS
// disable the use of evalTo for dense objects with a nice compilation error
template<typename Dest>
@@ -587,12 +636,11 @@ template<typename Derived> class DenseBase
}
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(DenseBase)
/** Default constructor. Do nothing. */
EIGEN_DEVICE_FUNC DenseBase()
{
/* Just checks for self-consistency of the flags.
* Only do it when debugging Eigen, as this borders on paranoia and could slow compilation down
* Only do it when debugging Eigen, as this borders on paranoiac and could slow compilation down
*/
#ifdef EIGEN_INTERNAL_DEBUGGING
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, int(IsRowMajor))

View File

@@ -22,7 +22,8 @@ template<typename T> struct add_const_on_value_type_if_arithmetic
/** \brief Base class providing read-only coefficient access to matrices and arrays.
* \ingroup Core_Module
* \tparam Derived Type of the derived class
* \tparam #ReadOnlyAccessors Constant indicating read-only access
*
* \note #ReadOnlyAccessors Constant indicating read-only access
*
* This class defines the \c operator() \c const function and friends, which can be used to read specific
* entries of a matrix or array.
@@ -288,7 +289,8 @@ class DenseCoeffsBase<Derived,ReadOnlyAccessors> : public EigenBase<Derived>
/** \brief Base class providing read/write coefficient access to matrices and arrays.
* \ingroup Core_Module
* \tparam Derived Type of the derived class
* \tparam #WriteAccessors Constant indicating read/write access
*
* \note #WriteAccessors Constant indicating read/write access
*
* This class defines the non-const \c operator() function and friends, which can be used to write specific
* entries of a matrix or array. This class inherits DenseCoeffsBase<Derived, ReadOnlyAccessors> which
@@ -466,7 +468,8 @@ class DenseCoeffsBase<Derived, WriteAccessors> : public DenseCoeffsBase<Derived,
/** \brief Base class providing direct read-only coefficient access to matrices and arrays.
* \ingroup Core_Module
* \tparam Derived Type of the derived class
* \tparam #DirectAccessors Constant indicating direct access
*
* \note #DirectAccessors Constant indicating direct access
*
* This class defines functions to work with strides which can be used to access entries directly. This class
* inherits DenseCoeffsBase<Derived, ReadOnlyAccessors> which defines functions to access entries read-only using
@@ -539,7 +542,8 @@ class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived
/** \brief Base class providing direct read/write coefficient access to matrices and arrays.
* \ingroup Core_Module
* \tparam Derived Type of the derived class
* \tparam #DirectWriteAccessors Constant indicating direct access
*
* \note #DirectWriteAccessors Constant indicating direct access
*
* This class defines functions to work with strides which can be used to access entries directly. This class
* inherits DenseCoeffsBase<Derived, WriteAccessors> which defines functions to access entries read/write using

View File

@@ -61,7 +61,7 @@ struct plain_array
#if defined(EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT)
#define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask)
#elif EIGEN_GNUC_AT_LEAST(4,7)
// GCC 4.7 is too aggressive in its optimizations and remove the alignement test based on the fact the array is declared to be aligned.
// GCC 4.7 is too aggressive in its optimizations and remove the alignment test based on the fact the array is declared to be aligned.
// See this bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53900
// Hiding the origin of the array pointer behind a function argument seems to do the trick even if the function is inlined:
template<typename PtrType>
@@ -207,7 +207,9 @@ template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseSt
EIGEN_UNUSED_VARIABLE(rows);
EIGEN_UNUSED_VARIABLE(cols);
}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { std::swap(m_data,other.m_data); }
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) {
numext::swap(m_data, other.m_data);
}
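numext::swap is used here because std::swap is not a device function; conceptually it amounts to the following sketch (the name below is illustrative, the real definition lives elsewhere in Eigen):

// Conceptual equivalent only.
template<typename T>
EIGEN_DEVICE_FUNC void swap_sketch(T& a, T& b) { T tmp = a; a = b; b = tmp; }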
EIGEN_DEVICE_FUNC static Index rows(void) {return _Rows;}
EIGEN_DEVICE_FUNC static Index cols(void) {return _Cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index,Index,Index) {}
@@ -267,7 +269,11 @@ template<typename T, int Size, int _Options> class DenseStorage<T, Size, Dynamic
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index cols) : m_rows(rows), m_cols(cols) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
{ std::swap(m_data,other.m_data); std::swap(m_rows,other.m_rows); std::swap(m_cols,other.m_cols); }
{
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC Index rows() const {return m_rows;}
EIGEN_DEVICE_FUNC Index cols() const {return m_cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index, Index rows, Index cols) { m_rows = rows; m_cols = cols; }
@@ -296,7 +302,11 @@ template<typename T, int Size, int _Cols, int _Options> class DenseStorage<T, Si
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index) : m_rows(rows) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { std::swap(m_data,other.m_data); std::swap(m_rows,other.m_rows); }
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
{
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return _Cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index, Index rows, Index) { m_rows = rows; }
@@ -325,11 +335,14 @@ template<typename T, int Size, int _Rows, int _Options> class DenseStorage<T, Si
return *this;
}
EIGEN_DEVICE_FUNC DenseStorage(Index, Index, Index cols) : m_cols(cols) {}
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { std::swap(m_data,other.m_data); std::swap(m_cols,other.m_cols); }
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) {
numext::swap(m_data,other.m_data);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
void conservativeResize(Index, Index, Index cols) { m_cols = cols; }
void resize(Index, Index, Index cols) { m_cols = cols; }
EIGEN_DEVICE_FUNC void conservativeResize(Index, Index, Index cols) { m_cols = cols; }
EIGEN_DEVICE_FUNC void resize(Index, Index, Index cols) { m_cols = cols; }
EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
EIGEN_DEVICE_FUNC T *data() { return m_data.array; }
};
@@ -381,16 +394,19 @@ template<typename T, int _Options> class DenseStorage<T, Dynamic, Dynamic, Dynam
EIGEN_DEVICE_FUNC
DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT
{
using std::swap;
swap(m_data, other.m_data);
swap(m_rows, other.m_rows);
swap(m_cols, other.m_cols);
numext::swap(m_data, other.m_data);
numext::swap(m_rows, other.m_rows);
numext::swap(m_cols, other.m_cols);
return *this;
}
#endif
EIGEN_DEVICE_FUNC ~DenseStorage() { internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, m_rows*m_cols); }
EIGEN_DEVICE_FUNC void swap(DenseStorage& other)
{ std::swap(m_data,other.m_data); std::swap(m_rows,other.m_rows); std::swap(m_cols,other.m_cols); }
{
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
void conservativeResize(Index size, Index rows, Index cols)
@@ -459,14 +475,16 @@ template<typename T, int _Rows, int _Options> class DenseStorage<T, Dynamic, _Ro
EIGEN_DEVICE_FUNC
DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT
{
using std::swap;
swap(m_data, other.m_data);
swap(m_cols, other.m_cols);
numext::swap(m_data, other.m_data);
numext::swap(m_cols, other.m_cols);
return *this;
}
#endif
EIGEN_DEVICE_FUNC ~DenseStorage() { internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, _Rows*m_cols); }
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { std::swap(m_data,other.m_data); std::swap(m_cols,other.m_cols); }
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) {
numext::swap(m_data,other.m_data);
numext::swap(m_cols,other.m_cols);
}
EIGEN_DEVICE_FUNC static Index rows(void) {return _Rows;}
EIGEN_DEVICE_FUNC Index cols(void) const {return m_cols;}
EIGEN_DEVICE_FUNC void conservativeResize(Index size, Index, Index cols)
@@ -533,14 +551,16 @@ template<typename T, int _Cols, int _Options> class DenseStorage<T, Dynamic, Dyn
EIGEN_DEVICE_FUNC
DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT
{
using std::swap;
swap(m_data, other.m_data);
swap(m_rows, other.m_rows);
numext::swap(m_data, other.m_data);
numext::swap(m_rows, other.m_rows);
return *this;
}
#endif
EIGEN_DEVICE_FUNC ~DenseStorage() { internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, _Cols*m_rows); }
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { std::swap(m_data,other.m_data); std::swap(m_rows,other.m_rows); }
EIGEN_DEVICE_FUNC void swap(DenseStorage& other) {
numext::swap(m_data,other.m_data);
numext::swap(m_rows,other.m_rows);
}
EIGEN_DEVICE_FUNC Index rows(void) const {return m_rows;}
EIGEN_DEVICE_FUNC static Index cols(void) {return _Cols;}
void conservativeResize(Index size, Index rows, Index)

View File

@@ -187,7 +187,7 @@ template<typename MatrixType, int _DiagIndex> class Diagonal
*
* \sa class Diagonal */
template<typename Derived>
inline typename MatrixBase<Derived>::DiagonalReturnType
EIGEN_DEVICE_FUNC inline typename MatrixBase<Derived>::DiagonalReturnType
MatrixBase<Derived>::diagonal()
{
return DiagonalReturnType(derived());
@@ -195,7 +195,7 @@ MatrixBase<Derived>::diagonal()
/** This is the const version of diagonal(). */
template<typename Derived>
inline typename MatrixBase<Derived>::ConstDiagonalReturnType
EIGEN_DEVICE_FUNC inline typename MatrixBase<Derived>::ConstDiagonalReturnType
MatrixBase<Derived>::diagonal() const
{
return ConstDiagonalReturnType(derived());
@@ -213,7 +213,7 @@ MatrixBase<Derived>::diagonal() const
*
* \sa MatrixBase::diagonal(), class Diagonal */
template<typename Derived>
inline typename MatrixBase<Derived>::DiagonalDynamicIndexReturnType
EIGEN_DEVICE_FUNC inline typename MatrixBase<Derived>::DiagonalDynamicIndexReturnType
MatrixBase<Derived>::diagonal(Index index)
{
return DiagonalDynamicIndexReturnType(derived(), index);
@@ -221,7 +221,7 @@ MatrixBase<Derived>::diagonal(Index index)
/** This is the const version of diagonal(Index). */
template<typename Derived>
inline typename MatrixBase<Derived>::ConstDiagonalDynamicIndexReturnType
EIGEN_DEVICE_FUNC inline typename MatrixBase<Derived>::ConstDiagonalDynamicIndexReturnType
MatrixBase<Derived>::diagonal(Index index) const
{
return ConstDiagonalDynamicIndexReturnType(derived(), index);
@@ -240,6 +240,7 @@ MatrixBase<Derived>::diagonal(Index index) const
* \sa MatrixBase::diagonal(), class Diagonal */
template<typename Derived>
template<int Index_>
EIGEN_DEVICE_FUNC
inline typename MatrixBase<Derived>::template DiagonalIndexReturnType<Index_>::Type
MatrixBase<Derived>::diagonal()
{
@@ -249,6 +250,7 @@ MatrixBase<Derived>::diagonal()
/** This is the const version of diagonal<int>(). */
template<typename Derived>
template<int Index_>
EIGEN_DEVICE_FUNC
inline typename MatrixBase<Derived>::template ConstDiagonalIndexReturnType<Index_>::Type
MatrixBase<Derived>::diagonal() const
{

View File

@@ -44,7 +44,7 @@ class DiagonalBase : public EigenBase<Derived>
EIGEN_DEVICE_FUNC
DenseMatrixType toDenseMatrix() const { return derived(); }
EIGEN_DEVICE_FUNC
inline const DiagonalVectorType& diagonal() const { return derived().diagonal(); }
EIGEN_DEVICE_FUNC
@@ -83,6 +83,30 @@ class DiagonalBase : public EigenBase<Derived>
{
return DiagonalWrapper<const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(Scalar,DiagonalVectorType,product) >(scalar * other.diagonal());
}
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
#ifdef EIGEN_PARSED_BY_DOXYGEN
inline unspecified_expression_type
#else
inline const DiagonalWrapper<const EIGEN_CWISE_BINARY_RETURN_TYPE(DiagonalVectorType,typename OtherDerived::DiagonalVectorType,sum) >
#endif
operator+(const DiagonalBase<OtherDerived>& other) const
{
return (diagonal() + other.diagonal()).asDiagonal();
}
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
#ifdef EIGEN_PARSED_BY_DOXYGEN
inline unspecified_expression_type
#else
inline const DiagonalWrapper<const EIGEN_CWISE_BINARY_RETURN_TYPE(DiagonalVectorType,typename OtherDerived::DiagonalVectorType,difference) >
#endif
operator-(const DiagonalBase<OtherDerived>& other) const
{
return (diagonal() - other.diagonal()).asDiagonal();
}
};
#endif
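A sketch of what the new operators enable, keeping the result in diagonal form instead of forcing a dense expression:

Eigen::Vector3d d1(1, 2, 3), d2(10, 20, 30);
auto sum  = d1.asDiagonal() + d2.asDiagonal(); // still a diagonal expression
auto diff = d1.asDiagonal() - d2.asDiagonal();
Eigen::Matrix3d m = sum;                       // dense 3x3 with 11, 22, 33 on the diagonal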
@@ -154,6 +178,30 @@ class DiagonalMatrix
EIGEN_DEVICE_FUNC
inline DiagonalMatrix(const Scalar& x, const Scalar& y, const Scalar& z) : m_diagonal(x,y,z) {}
#if EIGEN_HAS_CXX11
/** \brief Construct a diagonal matrix with fixed size from an arbitrary number of coefficients. \cpp11
*
* There exist C++98 analogue constructors for fixed-size diagonal matrices having 2 or 3 coefficients.
*
* \warning To construct a diagonal matrix of fixed size, the number of values passed to this
* constructor must match the fixed dimension of \c *this.
*
* \sa DiagonalMatrix(const Scalar&, const Scalar&)
* \sa DiagonalMatrix(const Scalar&, const Scalar&, const Scalar&)
*/
template <typename... ArgTypes>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
DiagonalMatrix(const Scalar& a0, const Scalar& a1, const Scalar& a2, const ArgTypes&... args)
: m_diagonal(a0, a1, a2, args...) {}
/** \brief Constructs a DiagonalMatrix and initializes it from the elements given by an initializer list of initializer
* lists \cpp11
*/
EIGEN_DEVICE_FUNC
explicit EIGEN_STRONG_INLINE DiagonalMatrix(const std::initializer_list<std::initializer_list<Scalar>>& list)
: m_diagonal(list) {}
#endif // EIGEN_HAS_CXX11
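Usage sketch of the two new C++11 constructors:

Eigen::DiagonalMatrix<double, 4> d1(1.0, 2.0, 3.0, 4.0);    // variadic: the count must match the fixed size
Eigen::DiagonalMatrix<double, 3> d2({{1.0}, {2.0}, {3.0}}); // initializer list of initializer lists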
/** Copy constructor. */
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
@@ -273,7 +321,7 @@ class DiagonalWrapper
* \sa class DiagonalWrapper, class DiagonalMatrix, diagonal(), isDiagonal()
**/
template<typename Derived>
inline const DiagonalWrapper<const Derived>
EIGEN_DEVICE_FUNC inline const DiagonalWrapper<const Derived>
MatrixBase<Derived>::asDiagonal() const
{
return DiagonalWrapper<const Derived>(derived());

View File

@@ -17,7 +17,7 @@ namespace Eigen {
*/
template<typename Derived>
template<typename DiagonalDerived>
inline const Product<Derived, DiagonalDerived, LazyProduct>
EIGEN_DEVICE_FUNC inline const Product<Derived, DiagonalDerived, LazyProduct>
MatrixBase<Derived>::operator*(const DiagonalBase<DiagonalDerived> &a_diagonal) const
{
return Product<Derived, DiagonalDerived, LazyProduct>(derived(),a_diagonal.derived());

View File

@@ -93,7 +93,7 @@ MatrixBase<Derived>::dot(const MatrixBase<OtherDerived>& other) const
* \sa dot(), norm(), lpNorm()
*/
template<typename Derived>
EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real MatrixBase<Derived>::squaredNorm() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real MatrixBase<Derived>::squaredNorm() const
{
return numext::real((*this).cwiseAbs2().sum());
}
@@ -105,7 +105,7 @@ EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scala
* \sa lpNorm(), dot(), squaredNorm()
*/
template<typename Derived>
EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real MatrixBase<Derived>::norm() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real MatrixBase<Derived>::norm() const
{
return numext::sqrt(squaredNorm());
}
@@ -120,7 +120,7 @@ EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scala
* \sa norm(), normalize()
*/
template<typename Derived>
EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::PlainObject
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::PlainObject
MatrixBase<Derived>::normalized() const
{
typedef typename internal::nested_eval<Derived,2>::type _Nested;
@@ -142,7 +142,7 @@ MatrixBase<Derived>::normalized() const
* \sa norm(), normalized()
*/
template<typename Derived>
EIGEN_STRONG_INLINE void MatrixBase<Derived>::normalize()
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void MatrixBase<Derived>::normalize()
{
RealScalar z = squaredNorm();
// NOTE: after extensive benchmarking, this conditional does not impact performance, at least on recent x86 CPU
@@ -163,7 +163,7 @@ EIGEN_STRONG_INLINE void MatrixBase<Derived>::normalize()
* \sa stableNorm(), stableNormalize(), normalized()
*/
template<typename Derived>
EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::PlainObject
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::PlainObject
MatrixBase<Derived>::stableNormalized() const
{
typedef typename internal::nested_eval<Derived,3>::type _Nested;
@@ -188,7 +188,7 @@ MatrixBase<Derived>::stableNormalized() const
* \sa stableNorm(), stableNormalized(), normalize()
*/
template<typename Derived>
EIGEN_STRONG_INLINE void MatrixBase<Derived>::stableNormalize()
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void MatrixBase<Derived>::stableNormalize()
{
RealScalar w = cwiseAbs().maxCoeff();
RealScalar z = (derived()/w).squaredNorm();
@@ -260,9 +260,9 @@ struct lpNorm_selector<Derived, Infinity>
template<typename Derived>
template<int p>
#ifndef EIGEN_PARSED_BY_DOXYGEN
inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
EIGEN_DEVICE_FUNC inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
#else
MatrixBase<Derived>::RealScalar
EIGEN_DEVICE_FUNC MatrixBase<Derived>::RealScalar
#endif
MatrixBase<Derived>::lpNorm() const
{

View File

@@ -32,8 +32,9 @@ template<typename Derived> struct EigenBase
/** \brief The interface type of indices
* \details To change this, \c \#define the preprocessor symbol \c EIGEN_DEFAULT_DENSE_INDEX_TYPE.
* \deprecated Since Eigen 3.3, its usage is deprecated. Use Eigen::Index instead.
* \sa StorageIndex, \ref TopicPreprocessorDirectives.
* DEPRECATED: Since Eigen 3.3, its usage is deprecated. Use Eigen::Index instead.
* Deprecation is not marked with a doxygen comment because there are too many existing usages to add the deprecation attribute.
*/
typedef Eigen::Index Index;

View File

@@ -100,7 +100,7 @@ struct isMuchSmallerThan_scalar_selector<Derived, true>
*/
template<typename Derived>
template<typename OtherDerived>
bool DenseBase<Derived>::isApprox(
EIGEN_DEVICE_FUNC bool DenseBase<Derived>::isApprox(
const DenseBase<OtherDerived>& other,
const RealScalar& prec
) const
@@ -122,7 +122,7 @@ bool DenseBase<Derived>::isApprox(
* \sa isApprox(), isMuchSmallerThan(const DenseBase<OtherDerived>&, RealScalar) const
*/
template<typename Derived>
bool DenseBase<Derived>::isMuchSmallerThan(
EIGEN_DEVICE_FUNC bool DenseBase<Derived>::isMuchSmallerThan(
const typename NumTraits<Scalar>::Real& other,
const RealScalar& prec
) const
@@ -142,7 +142,7 @@ bool DenseBase<Derived>::isMuchSmallerThan(
*/
template<typename Derived>
template<typename OtherDerived>
bool DenseBase<Derived>::isMuchSmallerThan(
EIGEN_DEVICE_FUNC bool DenseBase<Derived>::isMuchSmallerThan(
const DenseBase<OtherDerived>& other,
const RealScalar& prec
) const

View File

@@ -18,6 +18,16 @@ enum {
Small = 3
};
// Define the threshold value to fall back from the generic matrix-matrix product
// implementation (heavy) to the lightweight coeff-based product one.
// See generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemmProduct>
// in products/GeneralMatrixMatrix.h for more details.
// TODO This threshold should also be used in the compile-time selector below.
#ifndef EIGEN_GEMM_TO_COEFFBASED_THRESHOLD
// This default value has been obtained on a Haswell architecture.
#define EIGEN_GEMM_TO_COEFFBASED_THRESHOLD 20
#endif
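The threshold can be tuned per translation unit; a sketch:

// Must be defined before including any Eigen header.
#define EIGEN_GEMM_TO_COEFFBASED_THRESHOLD 32
#include <Eigen/Dense>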
namespace internal {
template<int Rows, int Cols, int Depth> struct product_type_selector;
@@ -25,7 +35,7 @@ template<int Rows, int Cols, int Depth> struct product_type_selector;
template<int Size, int MaxSize> struct product_size_category
{
enum {
#ifndef EIGEN_CUDA_ARCH
#ifndef EIGEN_GPU_COMPILE_PHASE
is_large = MaxSize == Dynamic ||
Size >= EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD ||
(Size==Dynamic && MaxSize>=EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD),
@@ -153,13 +163,13 @@ template<typename Scalar,int Size,int MaxSize,bool Cond> struct gemv_static_vect
template<typename Scalar,int Size,int MaxSize>
struct gemv_static_vector_if<Scalar,Size,MaxSize,false>
{
EIGEN_STRONG_INLINE Scalar* data() { eigen_internal_assert(false && "should never be called"); return 0; }
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Scalar* data() { eigen_internal_assert(false && "should never be called"); return 0; }
};
template<typename Scalar,int Size>
struct gemv_static_vector_if<Scalar,Size,Dynamic,true>
{
EIGEN_STRONG_INLINE Scalar* data() { return 0; }
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Scalar* data() { return 0; }
};
template<typename Scalar,int Size,int MaxSize>
@@ -229,7 +239,7 @@ template<> struct gemv_dense_selector<OnTheRight,ColMajor,true>
// on, the other hand it is good for the cache to pack the vector anyways...
EvalToDestAtCompileTime = (ActualDest::InnerStrideAtCompileTime==1),
ComplexByReal = (NumTraits<LhsScalar>::IsComplex) && (!NumTraits<RhsScalar>::IsComplex),
MightCannotUseDest = (!EvalToDestAtCompileTime) || ComplexByReal
MightCannotUseDest = ((!EvalToDestAtCompileTime) || ComplexByReal) && (ActualDest::MaxSizeAtCompileTime!=0)
};
typedef const_blas_data_mapper<LhsScalar,Index,ColMajor> LhsMapper;
@@ -316,7 +326,7 @@ template<> struct gemv_dense_selector<OnTheRight,RowMajor,true>
enum {
// FIXME find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1
// on, the other hand it is good for the cache to pack the vector anyways...
DirectlyUseRhs = ActualRhsTypeCleaned::InnerStrideAtCompileTime==1
DirectlyUseRhs = ActualRhsTypeCleaned::InnerStrideAtCompileTime==1 || ActualRhsTypeCleaned::MaxSizeAtCompileTime==0
};
gemv_static_vector_if<RhsScalar,ActualRhsTypeCleaned::SizeAtCompileTime,ActualRhsTypeCleaned::MaxSizeAtCompileTime,!DirectlyUseRhs> static_rhs;
@@ -386,7 +396,8 @@ template<> struct gemv_dense_selector<OnTheRight,RowMajor,false>
*/
template<typename Derived>
template<typename OtherDerived>
inline const Product<Derived, OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Product<Derived, OtherDerived>
MatrixBase<Derived>::operator*(const MatrixBase<OtherDerived> &other) const
{
// A note regarding the function declaration: In MSVC, this function will sometimes
@@ -428,6 +439,7 @@ MatrixBase<Derived>::operator*(const MatrixBase<OtherDerived> &other) const
*/
template<typename Derived>
template<typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Product<Derived,OtherDerived,LazyProduct>
MatrixBase<Derived>::lazyProduct(const MatrixBase<OtherDerived> &other) const
{

View File

@@ -56,11 +56,13 @@ struct default_packet_traits
HasConj = 1,
HasSetLinear = 1,
HasBlend = 0,
HasReduxp = 1,
HasDiv = 0,
HasSqrt = 0,
HasRsqrt = 0,
HasExp = 0,
HasExpm1 = 0,
HasLog = 0,
HasLog1p = 0,
HasLog10 = 0,
@@ -81,7 +83,11 @@ struct default_packet_traits
HasPolygamma = 0,
HasErf = 0,
HasErfc = 0,
HasNdtri = 0,
HasBessel = 0,
HasIGamma = 0,
HasIGammaDerA = 0,
HasGammaSampleDerAlpha = 0,
HasIGammac = 0,
HasBetaInc = 0,
@@ -146,15 +152,18 @@ pcast(const SrcPacket& a, const SrcPacket& /*b*/, const SrcPacket& /*c*/, const
return static_cast<TgtPacket>(a);
}
/** \internal \returns reinterpret_cast<Target>(a) */
template <typename Target, typename Packet>
EIGEN_DEVICE_FUNC inline Target
preinterpret(const Packet& a); /* { return reinterpret_cast<const Target&>(a); } */
/** \internal \returns a + b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
padd(const Packet& a,
const Packet& b) { return a+b; }
padd(const Packet& a, const Packet& b) { return a+b; }
/** \internal \returns a - b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
psub(const Packet& a,
const Packet& b) { return a-b; }
psub(const Packet& a, const Packet& b) { return a-b; }
/** \internal \returns -a (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -167,23 +176,19 @@ pconj(const Packet& a) { return numext::conj(a); }
/** \internal \returns a * b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmul(const Packet& a,
const Packet& b) { return a*b; }
pmul(const Packet& a, const Packet& b) { return a*b; }
/** \internal \returns a / b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pdiv(const Packet& a,
const Packet& b) { return a/b; }
pdiv(const Packet& a, const Packet& b) { return a/b; }
/** \internal \returns the min of \a a and \a b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmin(const Packet& a,
const Packet& b) { return numext::mini(a, b); }
pmin(const Packet& a, const Packet& b) { return numext::mini(a, b); }
/** \internal \returns the max of \a a and \a b (coeff-wise) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pmax(const Packet& a,
const Packet& b) { return numext::maxi(a, b); }
pmax(const Packet& a, const Packet& b) { return numext::maxi(a, b); }
/** \internal \returns the absolute value of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -207,7 +212,101 @@ pxor(const Packet& a, const Packet& b) { return a ^ b; }
/** \internal \returns the bitwise andnot of \a a and \a b */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pandnot(const Packet& a, const Packet& b) { return a & (!b); }
pandnot(const Packet& a, const Packet& b) { return a & (~b); }
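The switch from ! to ~ matters for integer packets; a one-line scalar illustration of the difference:

unsigned a = 0xF, b = 0x6;
unsigned wrong = a & (!b); // !b is 0 for any non-zero b, so this is always 0
unsigned right = a & (~b); // 0x9: the intended bitwise and-not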
/** \internal \returns ones */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
ptrue(const Packet& /*a*/) { Packet b; memset((void*)&b, 0xff, sizeof(b)); return b;}
template <typename RealScalar>
EIGEN_DEVICE_FUNC inline std::complex<RealScalar> ptrue(const std::complex<RealScalar>& /*a*/) {
RealScalar b;
b = ptrue(b);
return std::complex<RealScalar>(b, b);
}
/** \internal \returns the bitwise not of \a a */
template <typename Packet> EIGEN_DEVICE_FUNC inline Packet
pnot(const Packet& a) { return pxor(ptrue(a), a);}
/** \internal \returns \a a shifted by N bits to the right */
template<int N> EIGEN_DEVICE_FUNC inline int
pshiftright(const int& a) { return a >> N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
pshiftright(const long int& a) { return a >> N; }
/** \internal \returns \a a shifted by N bits to the left */
template<int N> EIGEN_DEVICE_FUNC inline int
pshiftleft(const int& a) { return a << N; }
template<int N> EIGEN_DEVICE_FUNC inline long int
pshiftleft(const long int& a) { return a << N; }
/** \internal \returns the significand and exponent of the underlying floating point numbers
* See https://en.cppreference.com/w/cpp/numeric/math/frexp
*/
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet pfrexp(const Packet& a, Packet& exponent) {
int exp;
EIGEN_USING_STD_MATH(frexp);
Packet result = frexp(a, &exp);
exponent = static_cast<Packet>(exp);
return result;
}
/** \internal \returns a * 2^exponent
* See https://en.cppreference.com/w/cpp/numeric/math/ldexp
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pldexp(const Packet &a, const Packet &exponent) {
EIGEN_USING_STD_MATH(ldexp);
return ldexp(a, static_cast<int>(exponent));
}
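pfrexp and pldexp mirror the std functions and are inverses of each other in the following sense (scalar sketch, needs <cmath>):

double a = 48.0;
int e;
double m = std::frexp(a, &e);   // m == 0.75, e == 6, since 48 = 0.75 * 2^6
double back = std::ldexp(m, e); // back == 48.0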
/** \internal \returns zeros */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pzero(const Packet& a) { return pxor(a,a); }
template<> EIGEN_DEVICE_FUNC inline float pzero<float>(const float& a) {
EIGEN_UNUSED_VARIABLE(a);
return 0.f;
}
template<> EIGEN_DEVICE_FUNC inline double pzero<double>(const double& a) {
EIGEN_UNUSED_VARIABLE(a);
return 0.;
}
/** \internal \returns bits of \a or \b according to the input bit mask \a mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pselect(const Packet& mask, const Packet& a, const Packet& b) {
return por(pand(a,mask),pandnot(b,mask));
}
template<> EIGEN_DEVICE_FUNC inline float pselect<float>(
const float& mask, const float& a, const float&b) {
return numext::equal_strict(mask,0.f) ? b : a;
}
template<> EIGEN_DEVICE_FUNC inline double pselect<double>(
const double& mask, const double& a, const double& b) {
return numext::equal_strict(mask,0.) ? b : a;
}
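The generic bit-mask formulation is just a branch-free per-lane ternary; an equivalent scalar sketch (needs <cstdint>):

// result = mask ? a : b, with mask assumed all-ones or all-zeros per lane (as produced by pcmp_*).
uint32_t select(uint32_t mask, uint32_t a, uint32_t b) {
  return (a & mask) | (b & ~mask);
}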
/** \internal \returns a <= b as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pcmp_le(const Packet& a, const Packet& b) { return a<=b ? ptrue(a) : pzero(a); }
/** \internal \returns a < b as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pcmp_lt(const Packet& a, const Packet& b) { return a<b ? ptrue(a) : pzero(a); }
/** \internal \returns a == b as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pcmp_eq(const Packet& a, const Packet& b) { return a==b ? ptrue(a) : pzero(a); }
/** \internal \returns a < b or a==NaN or b==NaN as a bit mask */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pcmp_lt_or_nan(const Packet& a, const Packet& b) { return pnot(pcmp_le(b,a)); }
/** \internal \returns a packet version of \a *from, from must be 16 bytes aligned */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
@@ -217,10 +316,22 @@ pload(const typename unpacket_traits<Packet>::type* from) { return *from; }
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
ploadu(const typename unpacket_traits<Packet>::type* from) { return *from; }
/** \internal \returns a packet version of \a *from, (un-aligned masked load)
* There is no generic implementation. We only have implementations for specialized
* cases. Generic case should not be called.
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline
typename enable_if<unpacket_traits<Packet>::masked_load_available, Packet>::type
ploadu(const typename unpacket_traits<Packet>::type* from, typename unpacket_traits<Packet>::mask_t umask);
/** \internal \returns a packet with constant coefficients \a a, e.g.: (a,a,a,a) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pset1(const typename unpacket_traits<Packet>::type& a) { return a; }
/** \internal \returns a packet with constant coefficients set from bits */
template<typename Packet,typename BitsType> EIGEN_DEVICE_FUNC inline Packet
pset1frombits(BitsType a);
/** \internal \returns a packet with constant coefficients \a a[0], e.g.: (a[0],a[0],a[0],a[0]) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
pload1(const typename unpacket_traits<Packet>::type *a) { return pset1<Packet>(*a); }
@@ -289,6 +400,15 @@ template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline void pstore(
template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline void pstoreu(Scalar* to, const Packet& from)
{ (*to) = from; }
/** \internal copy the packet \a from to \a *to, (un-aligned store with a mask)
* There is no generic implementation. We only have implementations for specialized
* cases. Generic case should not be called.
*/
template<typename Scalar, typename Packet>
EIGEN_DEVICE_FUNC inline
typename enable_if<unpacket_traits<Packet>::masked_store_available, void>::type
pstoreu(Scalar* to, const Packet& from, typename unpacket_traits<Packet>::mask_t umask);
template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline Packet pgather(const Scalar* from, Index /*stride*/)
{ return ploadu<Packet>(from); }
@@ -298,7 +418,9 @@ template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline void pstoreu
/** \internal tries to do cache prefetching of \a addr */
template<typename Scalar> EIGEN_DEVICE_FUNC inline void prefetch(const Scalar* addr)
{
#ifdef __CUDA_ARCH__
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// do nothing
#elif defined(EIGEN_CUDA_ARCH)
#if defined(__LP64__)
// 64-bit pointer operand constraint for inlined asm
asm(" prefetch.L1 [ %1 ];" : "=l"(addr) : "l"(addr));
@@ -323,27 +445,48 @@ preduxp(const Packet* vecs) { return vecs[0]; }
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux(const Packet& a)
{ return a; }
/** \internal \returns the sum of the elements of \a a by block of 4 elements.
/** \internal \returns the sum of the elements of upper and lower half of \a a if \a a is larger than 4.
* For a packet {a0, a1, a2, a3, a4, a5, a6, a7}, it returns a half packet {a0+a4, a1+a5, a2+a6, a3+a7}
* For packet-size smaller or equal to 4, this boils down to a noop.
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline
typename conditional<(unpacket_traits<Packet>::size%8)==0,typename unpacket_traits<Packet>::half,Packet>::type
predux_downto4(const Packet& a)
predux_half_dowto4(const Packet& a)
{ return a; }
/** \internal \returns the product of the elements of \a a*/
/** \internal \returns the product of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_mul(const Packet& a)
{ return a; }
/** \internal \returns the min of the elements of \a a*/
/** \internal \returns the min of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_min(const Packet& a)
{ return a; }
/** \internal \returns the max of the elements of \a a*/
/** \internal \returns the max of the elements of \a a */
template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_max(const Packet& a)
{ return a; }
/** \internal \returns true if all coeffs of \a a mean "true"
* It is supposed to be called on values returned by pcmp_*.
*/
// not needed yet
// template<typename Packet> EIGEN_DEVICE_FUNC inline bool predux_all(const Packet& a)
// { return bool(a); }
/** \internal \returns true if any coeff of \a a means "true"
* It is supposed to be called on values returned by pcmp_*.
*/
template<typename Packet> EIGEN_DEVICE_FUNC inline bool predux_any(const Packet& a)
{
// Dirty but generic implementation where "true" is assumed to be non-zero and the same in every lane.
// It is expected that "true" is either:
// - Scalar(1)
// - bits full of ones (NaN for floats),
// - or the first bit equal to 1 (1 for ints, the smallest denormal for floats).
// For all these cases, taking the sum is just fine, and this boils down to a no-op for scalars.
return bool(predux(a));
}
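A typical use is to test a comparison result without extracting individual lanes; a sketch within this internal API (a and b are packets of the same type):

// Does any lane of packet a compare less than the corresponding lane of b?
// pcmp_lt yields all-ones lanes where the predicate holds, zeros elsewhere.
bool any_less = predux_any(pcmp_lt(a, b));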
/** \internal \returns the reversed elements of \a a*/
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet preverse(const Packet& a)
{ return a; }
@@ -351,7 +494,7 @@ template<typename Packet> EIGEN_DEVICE_FUNC inline Packet preverse(const Packet&
/** \internal \returns \a a with real and imaginary part flipped (for complex type only) */
template<typename Packet> EIGEN_DEVICE_FUNC inline Packet pcplxflip(const Packet& a)
{
return Packet(a.imag(),a.real());
return Packet(numext::imag(a),numext::real(a));
}
/**************************
@@ -360,47 +503,51 @@ template<typename Packet> EIGEN_DEVICE_FUNC inline Packet pcplxflip(const Packet
/** \internal \returns the sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psin(const Packet& a) { using std::sin; return sin(a); }
Packet psin(const Packet& a) { EIGEN_USING_STD_MATH(sin); return sin(a); }
/** \internal \returns the cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcos(const Packet& a) { using std::cos; return cos(a); }
Packet pcos(const Packet& a) { EIGEN_USING_STD_MATH(cos); return cos(a); }
/** \internal \returns the tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptan(const Packet& a) { using std::tan; return tan(a); }
Packet ptan(const Packet& a) { EIGEN_USING_STD_MATH(tan); return tan(a); }
/** \internal \returns the arc sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pasin(const Packet& a) { using std::asin; return asin(a); }
Packet pasin(const Packet& a) { EIGEN_USING_STD_MATH(asin); return asin(a); }
/** \internal \returns the arc cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pacos(const Packet& a) { using std::acos; return acos(a); }
Packet pacos(const Packet& a) { EIGEN_USING_STD_MATH(acos); return acos(a); }
/** \internal \returns the arc tangent of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet patan(const Packet& a) { using std::atan; return atan(a); }
Packet patan(const Packet& a) { EIGEN_USING_STD_MATH(atan); return atan(a); }
/** \internal \returns the hyperbolic sine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psinh(const Packet& a) { using std::sinh; return sinh(a); }
Packet psinh(const Packet& a) { EIGEN_USING_STD_MATH(sinh); return sinh(a); }
/** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pcosh(const Packet& a) { using std::cosh; return cosh(a); }
Packet pcosh(const Packet& a) { EIGEN_USING_STD_MATH(cosh); return cosh(a); }
/** \internal \returns the hyperbolic tan of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet ptanh(const Packet& a) { using std::tanh; return tanh(a); }
Packet ptanh(const Packet& a) { EIGEN_USING_STD_MATH(tanh); return tanh(a); }
/** \internal \returns the exp of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexp(const Packet& a) { using std::exp; return exp(a); }
Packet pexp(const Packet& a) { EIGEN_USING_STD_MATH(exp); return exp(a); }
/** \internal \returns the expm1 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet pexpm1(const Packet& a) { return numext::expm1(a); }
/** \internal \returns the log of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog(const Packet& a) { using std::log; return log(a); }
Packet plog(const Packet& a) { EIGEN_USING_STD_MATH(log); return log(a); }
/** \internal \returns the log1p of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
@@ -408,11 +555,11 @@ Packet plog1p(const Packet& a) { return numext::log1p(a); }
/** \internal \returns the log10 of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet plog10(const Packet& a) { using std::log10; return log10(a); }
Packet plog10(const Packet& a) { EIGEN_USING_STD_MATH(log10); return log10(a); }
/** \internal \returns the square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
Packet psqrt(const Packet& a) { using std::sqrt; return sqrt(a); }
Packet psqrt(const Packet& a) { EIGEN_USING_STD_MATH(sqrt); return sqrt(a); }
/** \internal \returns the reciprocal square-root of \a a (coeff-wise) */
template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
@@ -436,7 +583,7 @@ Packet pceil(const Packet& a) { using numext::ceil; return ceil(a); }
* The following functions might not have to be overwritten for vectorized types
***************************************************************************/
/** \internal copy a packet with constant coeficient \a a (e.g., [a,a,a,a]) to \a *to. \a to must be 16 bytes aligned */
/** \internal copy a packet with constant coefficient \a a (e.g., [a,a,a,a]) to \a *to. \a to must be 16 bytes aligned */
// NOTE: this function must really be templated on the packet type (think about different packet types for the same scalar type)
template<typename Packet>
inline void pstore1(typename unpacket_traits<Packet>::type* to, const typename unpacket_traits<Packet>::type& a)
@@ -518,7 +665,7 @@ inline void palign(PacketType& first, const PacketType& second)
***************************************************************************/
// Eigen+CUDA does not support complexes.
#ifndef __CUDACC__
#if !defined(EIGEN_GPUCC)
template<> inline std::complex<float> pmul(const std::complex<float>& a, const std::complex<float>& b)
{ return std::complex<float>(a.real()*b.real() - a.imag()*b.imag(), a.imag()*b.real() + a.real()*b.imag()); }
@@ -583,6 +730,22 @@ pinsertlast(const Packet& a, typename unpacket_traits<Packet>::type b)
return pblend(mask, pset1<Packet>(b), a);
}
/***************************************************************************
* Some generic implementations to be used by implementors
***************************************************************************/
/** Default implementation of pfrexp for float.
* It is expected to be called by implementers of template<> pfrexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE Packet
pfrexp_float(const Packet& a, Packet& exponent);
/** Default implementation of pldexp for float.
* It is expected to be called by implementers of template<> pldexp.
*/
template<typename Packet> EIGEN_STRONG_INLINE Packet
pldexp_float(Packet a, Packet exponent);
} // end namespace internal
} // end namespace Eigen
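For illustration only (not part of this changeset): a SIMD backend providing pfrexp/pldexp specializations is expected to forward to the float helpers declared above. Assuming an SSE-style Packet4f type, a minimal sketch could look like:
template<> EIGEN_STRONG_INLINE Packet4f pfrexp<Packet4f>(const Packet4f& a, Packet4f& exponent) {
  // split each lane of a into a mantissa in [0.5, 1) and an integral exponent
  return pfrexp_float(a, exponent);
}
template<> EIGEN_STRONG_INLINE Packet4f pldexp<Packet4f>(const Packet4f& a, const Packet4f& exponent) {
  // rebuild a * 2^exponent lane-wise
  return pldexp_float(a, exponent);
}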

View File

@@ -66,11 +66,19 @@ namespace Eigen
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sinh,scalar_sinh_op,hyperbolic sine,\sa ArrayBase::sinh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cosh,scalar_cosh_op,hyperbolic cosine,\sa ArrayBase::cosh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(tanh,scalar_tanh_op,hyperbolic tangent,\sa ArrayBase::tanh)
#if EIGEN_HAS_CXX11_MATH
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(asinh,scalar_asinh_op,inverse hyperbolic sine,\sa ArrayBase::asinh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(acosh,scalar_acosh_op,inverse hyperbolic cosine,\sa ArrayBase::acosh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(atanh,scalar_atanh_op,inverse hyperbolic tangent,\sa ArrayBase::atanh)
#endif
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(logistic,scalar_logistic_op,logistic function,\sa ArrayBase::logistic)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(lgamma,scalar_lgamma_op,natural logarithm of the gamma function,\sa ArrayBase::lgamma)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(digamma,scalar_digamma_op,derivative of lgamma,\sa ArrayBase::digamma)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(erf,scalar_erf_op,error function,\sa ArrayBase::erf)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(erfc,scalar_erfc_op,complementary error function,\sa ArrayBase::erfc)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(ndtri,scalar_ndtri_op,inverse normal distribution function,\sa ArrayBase::ndtri)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(exp,scalar_exp_op,exponential,\sa ArrayBase::exp)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(expm1,scalar_expm1_op,exponential of a value minus 1,\sa ArrayBase::expm1)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log,scalar_log_op,natural logarithm,\sa Eigen::log10 DOXCOMMA ArrayBase::log)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log1p,scalar_log1p_op,natural logarithm of 1 plus the value,\sa ArrayBase::log1p)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10,scalar_log10_op,base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log10)
@@ -88,7 +96,7 @@ namespace Eigen
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(isinf,scalar_isinf_op,infinite value test,\sa Eigen::isnan DOXCOMMA Eigen::isfinite DOXCOMMA ArrayBase::isinf)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(isfinite,scalar_isfinite_op,finite value test,\sa Eigen::isinf DOXCOMMA Eigen::isnan DOXCOMMA ArrayBase::isfinite)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sign,scalar_sign_op,sign (or 0),\sa ArrayBase::sign)
/** \returns an expression of the coefficient-wise power of \a x to the given constant \a exponent.
*
* \tparam ScalarExponent is the scalar type of \a exponent. It must be compatible with the scalar type of the given expression (\c Derived::Scalar).
@@ -102,17 +110,18 @@ namespace Eigen
inline const CwiseBinaryOp<internal::scalar_pow_op<Derived::Scalar,ScalarExponent>,Derived,Constant<ScalarExponent> >
pow(const Eigen::ArrayBase<Derived>& x, const ScalarExponent& exponent);
#else
template<typename Derived,typename ScalarExponent>
inline typename internal::enable_if< !(internal::is_same<typename Derived::Scalar,ScalarExponent>::value) && EIGEN_SCALAR_BINARY_SUPPORTED(pow,typename Derived::Scalar,ScalarExponent),
const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,ScalarExponent,pow) >::type
pow(const Eigen::ArrayBase<Derived>& x, const ScalarExponent& exponent) {
return x.derived().pow(exponent);
}
template<typename Derived>
inline const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,typename Derived::Scalar,pow)
pow(const Eigen::ArrayBase<Derived>& x, const typename Derived::Scalar& exponent) {
return x.derived().pow(exponent);
template <typename Derived,typename ScalarExponent>
EIGEN_DEVICE_FUNC inline
EIGEN_MSVC10_WORKAROUND_BINARYOP_RETURN_TYPE(
const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,typename internal::promote_scalar_arg<typename Derived::Scalar
EIGEN_COMMA ScalarExponent EIGEN_COMMA
EIGEN_SCALAR_BINARY_SUPPORTED(pow,typename Derived::Scalar,ScalarExponent)>::type,pow))
pow(const Eigen::ArrayBase<Derived>& x, const ScalarExponent& exponent)
{
typedef typename internal::promote_scalar_arg<typename Derived::Scalar,ScalarExponent,
EIGEN_SCALAR_BINARY_SUPPORTED(pow,typename Derived::Scalar,ScalarExponent)>::type PromotedExponent;
return EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,PromotedExponent,pow)(x.derived(),
typename internal::plain_constant_type<Derived,PromotedExponent>::type(x.derived().rows(), x.derived().cols(), internal::scalar_constant_op<PromotedExponent>(exponent)));
}
#endif
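Illustrative usage (an assumption, not shown in the patch): with the promote_scalar_arg machinery above, an integer exponent mixes cleanly with a double-valued array:
Eigen::ArrayXd a = Eigen::ArrayXd::LinSpaced(4, 1.0, 4.0);
Eigen::ArrayXd b = Eigen::pow(a, 2);   // the int literal is promoted to double; coefficient-wise square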
@@ -122,21 +131,21 @@ namespace Eigen
*
* Example: \include Cwise_array_power_array.cpp
* Output: \verbinclude Cwise_array_power_array.out
*
*
* \sa ArrayBase::pow()
*
* \relates ArrayBase
*/
template<typename Derived,typename ExponentDerived>
inline const Eigen::CwiseBinaryOp<Eigen::internal::scalar_pow_op<typename Derived::Scalar, typename ExponentDerived::Scalar>, const Derived, const ExponentDerived>
pow(const Eigen::ArrayBase<Derived>& x, const Eigen::ArrayBase<ExponentDerived>& exponents)
pow(const Eigen::ArrayBase<Derived>& x, const Eigen::ArrayBase<ExponentDerived>& exponents)
{
return Eigen::CwiseBinaryOp<Eigen::internal::scalar_pow_op<typename Derived::Scalar, typename ExponentDerived::Scalar>, const Derived, const ExponentDerived>(
x.derived(),
exponents.derived()
);
}
/** \returns an expression of the coefficient-wise power of the scalar \a x to the given array of \a exponents.
*
* This function computes the coefficient-wise power between a scalar and an array of exponents.
@@ -145,7 +154,7 @@ namespace Eigen
*
* Example: \include Cwise_scalar_power_array.cpp
* Output: \verbinclude Cwise_scalar_power_array.out
*
*
* \sa ArrayBase::pow()
*
* \relates ArrayBase
@@ -155,21 +164,17 @@ namespace Eigen
inline const CwiseBinaryOp<internal::scalar_pow_op<Scalar,Derived::Scalar>,Constant<Scalar>,Derived>
pow(const Scalar& x, const Eigen::ArrayBase<Derived>& exponents);
#else
template<typename Scalar, typename Derived>
inline typename internal::enable_if< !(internal::is_same<typename Derived::Scalar,Scalar>::value) && EIGEN_SCALAR_BINARY_SUPPORTED(pow,Scalar,typename Derived::Scalar),
const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(Scalar,Derived,pow) >::type
pow(const Scalar& x, const Eigen::ArrayBase<Derived>& exponents)
{
return EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(Scalar,Derived,pow)(
typename internal::plain_constant_type<Derived,Scalar>::type(exponents.rows(), exponents.cols(), x), exponents.derived() );
}
template<typename Derived>
inline const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(typename Derived::Scalar,Derived,pow)
pow(const typename Derived::Scalar& x, const Eigen::ArrayBase<Derived>& exponents)
{
return EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(typename Derived::Scalar,Derived,pow)(
typename internal::plain_constant_type<Derived,typename Derived::Scalar>::type(exponents.rows(), exponents.cols(), x), exponents.derived() );
template <typename Scalar, typename Derived>
EIGEN_DEVICE_FUNC inline
EIGEN_MSVC10_WORKAROUND_BINARYOP_RETURN_TYPE(
const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(typename internal::promote_scalar_arg<typename Derived::Scalar
EIGEN_COMMA Scalar EIGEN_COMMA
EIGEN_SCALAR_BINARY_SUPPORTED(pow,Scalar,typename Derived::Scalar)>::type,Derived,pow))
pow(const Scalar& x, const Eigen::ArrayBase<Derived>& exponents) {
typedef typename internal::promote_scalar_arg<typename Derived::Scalar,Scalar,
EIGEN_SCALAR_BINARY_SUPPORTED(pow,Scalar,typename Derived::Scalar)>::type PromotedScalar;
return EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(PromotedScalar,Derived,pow)(
typename internal::plain_constant_type<Derived,PromotedScalar>::type(exponents.derived().rows(), exponents.derived().cols(), internal::scalar_constant_op<PromotedScalar>(x)), exponents.derived());
}
#endif

View File

@@ -41,6 +41,7 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
* - \b rowSuffix string printed at the end of each row
* - \b matPrefix string printed at the beginning of the matrix
* - \b matSuffix string printed at the end of the matrix
* - \b fill character printed to fill the empty space in aligned columns
*
* Example: \include IOFormat.cpp
* Output: \verbinclude IOFormat.out
@@ -53,9 +54,9 @@ struct IOFormat
IOFormat(int _precision = StreamPrecision, int _flags = 0,
const std::string& _coeffSeparator = " ",
const std::string& _rowSeparator = "\n", const std::string& _rowPrefix="", const std::string& _rowSuffix="",
const std::string& _matPrefix="", const std::string& _matSuffix="")
const std::string& _matPrefix="", const std::string& _matSuffix="", const char _fill=' ')
: matPrefix(_matPrefix), matSuffix(_matSuffix), rowPrefix(_rowPrefix), rowSuffix(_rowSuffix), rowSeparator(_rowSeparator),
rowSpacer(""), coeffSeparator(_coeffSeparator), precision(_precision), flags(_flags)
rowSpacer(""), coeffSeparator(_coeffSeparator), fill(_fill), precision(_precision), flags(_flags)
{
// TODO check if rowPrefix, rowSuffix or rowSeparator contains a newline
// don't add rowSpacer if columns are not to be aligned
@@ -71,6 +72,7 @@ struct IOFormat
std::string matPrefix, matSuffix;
std::string rowPrefix, rowSuffix, rowSeparator, rowSpacer;
std::string coeffSeparator;
char fill;
int precision;
int flags;
};
@@ -176,18 +178,26 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
width = std::max<Index>(width, Index(sstr.str().length()));
}
}
std::streamsize old_width = s.width();
char old_fill_character = s.fill();
s << fmt.matPrefix;
for(Index i = 0; i < m.rows(); ++i)
{
if (i)
s << fmt.rowSpacer;
s << fmt.rowPrefix;
if(width) s.width(width);
if(width) {
s.fill(fmt.fill);
s.width(width);
}
s << m.coeff(i, 0);
for(Index j = 1; j < m.cols(); ++j)
{
s << fmt.coeffSeparator;
if (width) s.width(width);
if(width) {
s.fill(fmt.fill);
s.width(width);
}
s << m.coeff(i, j);
}
s << fmt.rowSuffix;
@@ -196,6 +206,10 @@ std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat&
}
s << fmt.matSuffix;
if(explicit_precision) s.precision(old_precision);
if(width) {
s.fill(old_fill_character);
s.width(old_width);
}
return s;
}
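A hypothetical use of the new fill parameter (sketch, not taken from the patch): pad aligned columns with a visible character instead of spaces.
Eigen::IOFormat padded(Eigen::StreamPrecision, 0, " ", "\n", "", "", "[", "]", '.');
Eigen::Matrix2d M;
M << 1, 10.5, 100, 0.25;
std::cout << M.format(padded) << "\n";  // shorter entries are padded with '.' up to the widest entry in the matrix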

View File

@@ -0,0 +1,207 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2017 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_INDEXED_VIEW_H
#define EIGEN_INDEXED_VIEW_H
namespace Eigen {
namespace internal {
template<typename XprType, typename RowIndices, typename ColIndices>
struct traits<IndexedView<XprType, RowIndices, ColIndices> >
: traits<XprType>
{
enum {
RowsAtCompileTime = int(array_size<RowIndices>::value),
ColsAtCompileTime = int(array_size<ColIndices>::value),
MaxRowsAtCompileTime = RowsAtCompileTime != Dynamic ? int(RowsAtCompileTime) : Dynamic,
MaxColsAtCompileTime = ColsAtCompileTime != Dynamic ? int(ColsAtCompileTime) : Dynamic,
XprTypeIsRowMajor = (int(traits<XprType>::Flags)&RowMajorBit) != 0,
IsRowMajor = (MaxRowsAtCompileTime==1&&MaxColsAtCompileTime!=1) ? 1
: (MaxColsAtCompileTime==1&&MaxRowsAtCompileTime!=1) ? 0
: XprTypeIsRowMajor,
RowIncr = int(get_compile_time_incr<RowIndices>::value),
ColIncr = int(get_compile_time_incr<ColIndices>::value),
InnerIncr = IsRowMajor ? ColIncr : RowIncr,
OuterIncr = IsRowMajor ? RowIncr : ColIncr,
HasSameStorageOrderAsXprType = (IsRowMajor == XprTypeIsRowMajor),
XprInnerStride = HasSameStorageOrderAsXprType ? int(inner_stride_at_compile_time<XprType>::ret) : int(outer_stride_at_compile_time<XprType>::ret),
XprOuterstride = HasSameStorageOrderAsXprType ? int(outer_stride_at_compile_time<XprType>::ret) : int(inner_stride_at_compile_time<XprType>::ret),
InnerSize = XprTypeIsRowMajor ? ColsAtCompileTime : RowsAtCompileTime,
IsBlockAlike = InnerIncr==1 && OuterIncr==1,
IsInnerPannel = HasSameStorageOrderAsXprType && is_same<AllRange<InnerSize>,typename conditional<XprTypeIsRowMajor,ColIndices,RowIndices>::type>::value,
InnerStrideAtCompileTime = InnerIncr<0 || InnerIncr==DynamicIndex || XprInnerStride==Dynamic ? Dynamic : XprInnerStride * InnerIncr,
OuterStrideAtCompileTime = OuterIncr<0 || OuterIncr==DynamicIndex || XprOuterstride==Dynamic ? Dynamic : XprOuterstride * OuterIncr,
ReturnAsScalar = is_same<RowIndices,SingleRange>::value && is_same<ColIndices,SingleRange>::value,
ReturnAsBlock = (!ReturnAsScalar) && IsBlockAlike,
ReturnAsIndexedView = (!ReturnAsScalar) && (!ReturnAsBlock),
// FIXME we deal with compile-time strides if and only if we have DirectAccessBit flag,
// but this is too strict regarding negative strides...
DirectAccessMask = (int(InnerIncr)!=UndefinedIncr && int(OuterIncr)!=UndefinedIncr && InnerIncr>=0 && OuterIncr>=0) ? DirectAccessBit : 0,
FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0,
FlagsLvalueBit = is_lvalue<XprType>::value ? LvalueBit : 0,
Flags = (traits<XprType>::Flags & (HereditaryBits | DirectAccessMask)) | FlagsLvalueBit | FlagsRowMajorBit
};
typedef Block<XprType,RowsAtCompileTime,ColsAtCompileTime,IsInnerPannel> BlockType;
};
}
template<typename XprType, typename RowIndices, typename ColIndices, typename StorageKind>
class IndexedViewImpl;
/** \class IndexedView
* \ingroup Core_Module
*
* \brief Expression of a non-sequential sub-matrix defined by arbitrary sequences of row and column indices
*
* \tparam XprType the type of the expression in which we are taking the intersections of sub-rows and sub-columns
* \tparam RowIndices the type of the object defining the sequence of row indices
* \tparam ColIndices the type of the object defining the sequence of column indices
*
* This class represents an expression of a sub-matrix (or sub-vector) defined as the intersection
 * of subsets of rows and columns that are themselves defined by generic sequences of row indices \f$ \{r_0,r_1,..r_{m-1}\} \f$
* and column indices \f$ \{c_0,c_1,..c_{n-1} \}\f$. Let \f$ A \f$ be the nested matrix, then the resulting matrix \f$ B \f$ has \c m
* rows and \c n columns, and its entries are given by: \f$ B(i,j) = A(r_i,c_j) \f$.
*
* The \c RowIndices and \c ColIndices types must be compatible with the following API:
* \code
* <integral type> operator[](Index) const;
* Index size() const;
* \endcode
*
* Typical supported types thus include:
* - std::vector<int>
* - std::valarray<int>
 * - std::array<int,N>
* - Plain C arrays: int[N]
* - Eigen::ArrayXi
* - decltype(ArrayXi::LinSpaced(...))
* - Any view/expressions of the previous types
* - Eigen::ArithmeticSequence
* - Eigen::internal::AllRange (helper for Eigen::all)
* - Eigen::internal::SingleRange (helper for single index)
* - etc.
*
* In typical usages of %Eigen, this class should never be used directly. It is the return type of
* DenseBase::operator()(const RowIndices&, const ColIndices&).
*
* \sa class Block
*/
template<typename XprType, typename RowIndices, typename ColIndices>
class IndexedView : public IndexedViewImpl<XprType, RowIndices, ColIndices, typename internal::traits<XprType>::StorageKind>
{
public:
typedef typename IndexedViewImpl<XprType, RowIndices, ColIndices, typename internal::traits<XprType>::StorageKind>::Base Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(IndexedView)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(IndexedView)
typedef typename internal::ref_selector<XprType>::non_const_type MatrixTypeNested;
typedef typename internal::remove_all<XprType>::type NestedExpression;
template<typename T0, typename T1>
IndexedView(XprType& xpr, const T0& rowIndices, const T1& colIndices)
: m_xpr(xpr), m_rowIndices(rowIndices), m_colIndices(colIndices)
{}
/** \returns number of rows */
Index rows() const { return internal::size(m_rowIndices); }
/** \returns number of columns */
Index cols() const { return internal::size(m_colIndices); }
/** \returns the nested expression */
const typename internal::remove_all<XprType>::type&
nestedExpression() const { return m_xpr; }
/** \returns the nested expression */
typename internal::remove_reference<XprType>::type&
nestedExpression() { return m_xpr; }
/** \returns a const reference to the object storing/generating the row indices */
const RowIndices& rowIndices() const { return m_rowIndices; }
/** \returns a const reference to the object storing/generating the column indices */
const ColIndices& colIndices() const { return m_colIndices; }
protected:
MatrixTypeNested m_xpr;
RowIndices m_rowIndices;
ColIndices m_colIndices;
};
// Generic API dispatcher
template<typename XprType, typename RowIndices, typename ColIndices, typename StorageKind>
class IndexedViewImpl
: public internal::generic_xpr_base<IndexedView<XprType, RowIndices, ColIndices> >::type
{
public:
typedef typename internal::generic_xpr_base<IndexedView<XprType, RowIndices, ColIndices> >::type Base;
};
namespace internal {
template<typename ArgType, typename RowIndices, typename ColIndices>
struct unary_evaluator<IndexedView<ArgType, RowIndices, ColIndices>, IndexBased>
: evaluator_base<IndexedView<ArgType, RowIndices, ColIndices> >
{
typedef IndexedView<ArgType, RowIndices, ColIndices> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost /* TODO + cost of row/col index */,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits /*| LinearAccessBit | DirectAccessBit*/)),
Alignment = 0
};
EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& xpr) : m_argImpl(xpr.nestedExpression()), m_xpr(xpr)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
return m_argImpl.coeff(m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Scalar& coeffRef(Index row, Index col)
{
return m_argImpl.coeffRef(m_xpr.rowIndices()[row], m_xpr.colIndices()[col]);
}
protected:
evaluator<ArgType> m_argImpl;
const XprType& m_xpr;
};
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_INDEXED_VIEW_H
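Illustrative usage based on the documentation above (an assumption, not part of the file): an IndexedView is normally obtained through DenseBase::operator() with arbitrary index collections.
Eigen::MatrixXd A = Eigen::MatrixXd::Random(5, 5);
std::vector<int> rows{4, 2, 0};                  // assumes <vector> is included
auto B = A(rows, Eigen::seq(1, 3, 2));           // B(i,j) == A(rows[i], 1 + 2*j); a 3x2 indexed view of A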

View File

@@ -1,7 +1,7 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2014 Gael Guennebaud <gael.guennebaud@inria.fr>
// Copyright (C) 2014-2019 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
@@ -44,7 +44,6 @@ class Inverse : public InverseImpl<XprType,typename internal::traits<XprType>::S
{
public:
typedef typename XprType::StorageIndex StorageIndex;
typedef typename XprType::PlainObject PlainObject;
typedef typename XprType::Scalar Scalar;
typedef typename internal::ref_selector<XprType>::type XprTypeNested;
typedef typename internal::remove_all<XprTypeNested>::type XprTypeNestedCleaned;
@@ -55,8 +54,8 @@ public:
: m_xpr(xpr)
{}
EIGEN_DEVICE_FUNC Index rows() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC Index rows() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC const XprTypeNestedCleaned& nestedExpression() const { return m_xpr; }

View File

@@ -113,10 +113,10 @@ template<typename PlainObjectType, int MapOptions, typename StrideType> class Ma
EIGEN_DEVICE_FUNC
inline Index outerStride() const
{
return int(StrideType::OuterStrideAtCompileTime) != 0 ? m_stride.outer()
: int(internal::traits<Map>::OuterStrideAtCompileTime) != Dynamic ? Index(internal::traits<Map>::OuterStrideAtCompileTime)
return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer()
: internal::traits<Map>::OuterStrideAtCompileTime != Dynamic ? Index(internal::traits<Map>::OuterStrideAtCompileTime)
: IsVectorAtCompileTime ? (this->size() * innerStride())
: (int(Flags)&RowMajorBit) ? (this->cols() * innerStride())
: int(Flags)&RowMajorBit ? (this->cols() * innerStride())
: (this->rows() * innerStride());
}

View File

@@ -182,8 +182,6 @@ template<typename Derived> class MapBase<Derived, ReadOnlyAccessors>
#endif
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase)
template<typename T>
EIGEN_DEVICE_FUNC
@@ -296,9 +294,6 @@ template<typename Derived> class MapBase<Derived, WriteAccessors>
// In theory we could simply refer to Base:Base::operator=, but MSVC does not like Base::Base,
// see bugs 821 and 920.
using ReadOnlyMapBase::Base::operator=;
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase)
};
#undef EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS

View File

@@ -14,7 +14,6 @@
// TODO this should better be moved to NumTraits
#define EIGEN_PI 3.141592653589793238462643383279502884197169399375105820974944592307816406L
namespace Eigen {
// On WINCE, std::abs is defined for int only, so let's defined our own overloads:
@@ -97,7 +96,7 @@ struct real_default_impl<Scalar,true>
template<typename Scalar> struct real_impl : real_default_impl<Scalar> {};
#ifdef __CUDA_ARCH__
#if defined(EIGEN_GPU_COMPILE_PHASE)
template<typename T>
struct real_impl<std::complex<T> >
{
@@ -145,7 +144,7 @@ struct imag_default_impl<Scalar,true>
template<typename Scalar> struct imag_impl : imag_default_impl<Scalar> {};
#ifdef __CUDA_ARCH__
#if defined(EIGEN_GPU_COMPILE_PHASE)
template<typename T>
struct imag_impl<std::complex<T> >
{
@@ -239,7 +238,7 @@ struct imag_ref_retval
****************************************************************************/
template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
struct conj_impl
struct conj_default_impl
{
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
@@ -249,7 +248,7 @@ struct conj_impl
};
template<typename Scalar>
struct conj_impl<Scalar,true>
struct conj_default_impl<Scalar,true>
{
EIGEN_DEVICE_FUNC
static inline Scalar run(const Scalar& x)
@@ -259,6 +258,20 @@ struct conj_impl<Scalar,true>
}
};
template<typename Scalar> struct conj_impl : conj_default_impl<Scalar> {};
#if defined(EIGEN_GPU_COMPILE_PHASE)
template<typename T>
struct conj_impl<std::complex<T> >
{
EIGEN_DEVICE_FUNC
static inline std::complex<T> run(const std::complex<T>& x)
{
return std::complex<T>(x.real(), -x.imag());
}
};
#endif
template<typename Scalar>
struct conj_retval
{
@@ -392,7 +405,7 @@ inline NewType cast(const OldType& x)
static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
using std::round;
EIGEN_USING_STD_MATH(round);
return round(x);
}
};
@@ -425,7 +438,12 @@ struct round_retval
struct arg_impl {
static inline Scalar run(const Scalar& x)
{
#if defined(EIGEN_HIP_DEVICE_COMPILE)
// HIP does not seem to have a native device side implementation for the math routine "arg"
using std::arg;
#else
EIGEN_USING_STD_MATH(arg);
#endif
return arg(x);
}
};
@@ -461,6 +479,66 @@ struct arg_retval
typedef typename NumTraits<Scalar>::Real type;
};
/****************************************************************************
* Implementation of expm1 *
****************************************************************************/
// This implementation is based on GSL Math's expm1.
namespace std_fallback {
// fallback expm1 implementation in case there is no expm1(Scalar) function in namespace of Scalar,
// or no suitable std::expm1 function is available. Implementation
// attributed to Kahan. See: http://www.plunk.org/~hatch/rightway.php.
template<typename Scalar>
EIGEN_DEVICE_FUNC inline Scalar expm1(const Scalar& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_USING_STD_MATH(exp);
Scalar u = exp(x);
if (numext::equal_strict(u, Scalar(1))) {
return x;
}
Scalar um1 = u - RealScalar(1);
if (numext::equal_strict(um1, Scalar(-1))) {
return RealScalar(-1);
}
EIGEN_USING_STD_MATH(log);
Scalar logu = log(u);
return numext::equal_strict(u, logu) ? u : (u - RealScalar(1)) * x / logu;
}
}
template<typename Scalar>
struct expm1_impl {
EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
#if EIGEN_HAS_CXX11_MATH
using std::expm1;
#else
using std_fallback::expm1;
#endif
return expm1(x);
}
};
// Specialization for complex types that are not supported by std::expm1.
template <typename RealScalar>
struct expm1_impl<std::complex<RealScalar> > {
EIGEN_DEVICE_FUNC static inline std::complex<RealScalar> run(
const std::complex<RealScalar>& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(RealScalar)
return std_fallback::expm1(x);
}
};
template<typename Scalar>
struct expm1_retval
{
typedef Scalar type;
};
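For context, a worked illustration (not from the patch) of why the computation is routed through expm1 rather than exp(x)-1:
double y = Eigen::numext::expm1(1e-20);  // ~1e-20, whereas std::exp(1e-20) - 1.0 rounds to exactly 0.0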
/****************************************************************************
* Implementation of log1p *
****************************************************************************/
@@ -474,23 +552,36 @@ namespace std_fallback {
typedef typename NumTraits<Scalar>::Real RealScalar;
EIGEN_USING_STD_MATH(log);
Scalar x1p = RealScalar(1) + x;
return numext::equal_strict(x1p, Scalar(1)) ? x : x * ( log(x1p) / (x1p - RealScalar(1)) );
Scalar log_1p = log(x1p);
const bool is_small = numext::equal_strict(x1p, Scalar(1));
const bool is_inf = numext::equal_strict(x1p, log_1p);
return (is_small || is_inf) ? x : x * (log_1p / (x1p - RealScalar(1)));
}
}
template<typename Scalar>
struct log1p_impl {
static inline Scalar run(const Scalar& x)
EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x)
{
EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
#if EIGEN_HAS_CXX11_MATH
using std::log1p;
#endif
#else
using std_fallback::log1p;
#endif
return log1p(x);
}
};
// Specialization for complex types that are not supported by std::log1p.
template <typename RealScalar>
struct log1p_impl<std::complex<RealScalar> > {
EIGEN_DEVICE_FUNC static inline std::complex<RealScalar> run(
const std::complex<RealScalar>& x) {
EIGEN_STATIC_ASSERT_NON_INTEGER(RealScalar)
return std_fallback::log1p(x);
}
};
template<typename Scalar>
struct log1p_retval
@@ -687,7 +778,7 @@ inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random()
return EIGEN_MATHFUNC_IMPL(random, Scalar)::run();
}
// Implementatin of is* functions
// Implementation of is* functions
// std::is* do not work with fast-math and gcc, std::is* are available on MSVC 2013 and newer, as well as in clang.
#if (EIGEN_HAS_CXX11_MATH && !(EIGEN_COMP_GNUC_STRICT && __FINITE_MATH_ONLY__)) || (EIGEN_COMP_MSVC>=1800) || (EIGEN_COMP_CLANG)
@@ -716,7 +807,7 @@ EIGEN_DEVICE_FUNC
typename internal::enable_if<(!internal::is_integral<T>::value)&&(!NumTraits<T>::IsComplex),bool>::type
isfinite_impl(const T& x)
{
#ifdef __CUDA_ARCH__
#if defined(EIGEN_GPU_COMPILE_PHASE)
return (::isfinite)(x);
#elif EIGEN_USE_STD_FPCLASSIFY
using std::isfinite;
@@ -731,7 +822,7 @@ EIGEN_DEVICE_FUNC
typename internal::enable_if<(!internal::is_integral<T>::value)&&(!NumTraits<T>::IsComplex),bool>::type
isinf_impl(const T& x)
{
#ifdef __CUDA_ARCH__
#if defined(EIGEN_GPU_COMPILE_PHASE)
return (::isinf)(x);
#elif EIGEN_USE_STD_FPCLASSIFY
using std::isinf;
@@ -746,7 +837,7 @@ EIGEN_DEVICE_FUNC
typename internal::enable_if<(!internal::is_integral<T>::value)&&(!NumTraits<T>::IsComplex),bool>::type
isnan_impl(const T& x)
{
#ifdef __CUDA_ARCH__
#if defined(EIGEN_GPU_COMPILE_PHASE)
return (::isnan)(x);
#elif EIGEN_USE_STD_FPCLASSIFY
using std::isnan;
@@ -803,7 +894,6 @@ template<typename T> EIGEN_DEVICE_FUNC bool isnan_impl(const std::complex<T>& x)
template<typename T> EIGEN_DEVICE_FUNC bool isinf_impl(const std::complex<T>& x);
template<typename T> T generic_fast_tanh_float(const T& a_x);
} // end namespace internal
/****************************************************************************
@@ -812,7 +902,7 @@ template<typename T> T generic_fast_tanh_float(const T& a_x);
namespace numext {
#ifndef __CUDA_ARCH__
#if (!defined(EIGEN_GPUCC) || defined(EIGEN_CONSTEXPR_ARE_DEVICE_FUNC))
template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE T mini(const T& x, const T& y)
@@ -841,6 +931,24 @@ EIGEN_ALWAYS_INLINE float mini(const float& x, const float& y)
{
return fminf(x, y);
}
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE double mini(const double& x, const double& y)
{
return fmin(x, y);
}
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE long double mini(const long double& x, const long double& y)
{
#if defined(EIGEN_HIPCC)
// no "fminl" on HIP yet
return (x < y) ? x : y;
#else
return fminl(x, y);
#endif
}
template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
@@ -853,6 +961,92 @@ EIGEN_ALWAYS_INLINE float maxi(const float& x, const float& y)
{
return fmaxf(x, y);
}
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE double maxi(const double& x, const double& y)
{
return fmax(x, y);
}
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE long double maxi(const long double& x, const long double& y)
{
#if defined(EIGEN_HIPCC)
// no "fmaxl" on HIP yet
return (x > y) ? x : y;
#else
return fmaxl(x, y);
#endif
}
#endif
#if defined(SYCL_DEVICE_ONLY)
#define SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_BINARY(NAME, FUNC) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_char) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_short) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_int) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_long)
#define SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_UNARY(NAME, FUNC) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_char) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_short) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_int) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_long)
#define SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_BINARY(NAME, FUNC) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_uchar) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_ushort) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_uint) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_ulong)
#define SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_UNARY(NAME, FUNC) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_uchar) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_ushort) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_uint) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_ulong)
#define SYCL_SPECIALIZE_INTEGER_TYPES_BINARY(NAME, FUNC) \
SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_BINARY(NAME, FUNC) \
SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_BINARY(NAME, FUNC)
#define SYCL_SPECIALIZE_INTEGER_TYPES_UNARY(NAME, FUNC) \
SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_UNARY(NAME, FUNC) \
SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_UNARY(NAME, FUNC)
#define SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(NAME, FUNC) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_float) \
SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC,cl::sycl::cl_double)
#define SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(NAME, FUNC) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_float) \
SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC,cl::sycl::cl_double)
#define SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(NAME, FUNC, RET_TYPE) \
SYCL_SPECIALIZE_GEN_UNARY_FUNC(NAME, FUNC, RET_TYPE, cl::sycl::cl_float) \
SYCL_SPECIALIZE_GEN_UNARY_FUNC(NAME, FUNC, RET_TYPE, cl::sycl::cl_double)
#define SYCL_SPECIALIZE_GEN_UNARY_FUNC(NAME, FUNC, RET_TYPE, ARG_TYPE) \
template<> \
EIGEN_DEVICE_FUNC \
EIGEN_ALWAYS_INLINE RET_TYPE NAME(const ARG_TYPE& x) { \
return cl::sycl::FUNC(x); \
}
#define SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, TYPE) \
SYCL_SPECIALIZE_GEN_UNARY_FUNC(NAME, FUNC, TYPE, TYPE)
#define SYCL_SPECIALIZE_GEN1_BINARY_FUNC(NAME, FUNC, RET_TYPE, ARG_TYPE1, ARG_TYPE2) \
template<> \
EIGEN_DEVICE_FUNC \
EIGEN_ALWAYS_INLINE RET_TYPE NAME(const ARG_TYPE1& x, const ARG_TYPE2& y) { \
return cl::sycl::FUNC(x, y); \
}
#define SYCL_SPECIALIZE_GEN2_BINARY_FUNC(NAME, FUNC, RET_TYPE, ARG_TYPE) \
SYCL_SPECIALIZE_GEN1_BINARY_FUNC(NAME, FUNC, RET_TYPE, ARG_TYPE, ARG_TYPE)
#define SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, TYPE) \
SYCL_SPECIALIZE_GEN2_BINARY_FUNC(NAME, FUNC, TYPE, TYPE)
SYCL_SPECIALIZE_INTEGER_TYPES_BINARY(mini, min)
SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(mini, fmin)
SYCL_SPECIALIZE_INTEGER_TYPES_BINARY(maxi, max)
SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(maxi, fmax)
#endif
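To see what these macros generate (derived from the definitions above), SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(mini, fmin) expands to specializations of the form
template<>
EIGEN_DEVICE_FUNC
EIGEN_ALWAYS_INLINE cl::sycl::cl_float mini(const cl::sycl::cl_float& x, const cl::sycl::cl_float& y) {
  return cl::sycl::fmin(x, y);
}
plus an analogous one for cl::sycl::cl_double.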
@@ -936,6 +1130,10 @@ inline EIGEN_MATHFUNC_RETVAL(hypot, Scalar) hypot(const Scalar& x, const Scalar&
return EIGEN_MATHFUNC_IMPL(hypot, Scalar)::run(x, y);
}
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(hypot, hypot)
#endif
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(log1p, Scalar) log1p(const Scalar& x)
@@ -943,7 +1141,11 @@ inline EIGEN_MATHFUNC_RETVAL(log1p, Scalar) log1p(const Scalar& x)
return EIGEN_MATHFUNC_IMPL(log1p, Scalar)::run(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(log1p, log1p)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float log1p(const float &x) { return ::log1pf(x); }
@@ -958,10 +1160,20 @@ inline typename internal::pow_impl<ScalarX,ScalarY>::result_type pow(const Scala
return internal::pow_impl<ScalarX,ScalarY>::run(x, y);
}
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(pow, pow)
#endif
template<typename T> EIGEN_DEVICE_FUNC bool (isnan) (const T &x) { return internal::isnan_impl(x); }
template<typename T> EIGEN_DEVICE_FUNC bool (isinf) (const T &x) { return internal::isinf_impl(x); }
template<typename T> EIGEN_DEVICE_FUNC bool (isfinite)(const T &x) { return internal::isfinite_impl(x); }
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isnan, isnan, bool)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isinf, isinf, bool)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isfinite, isfinite, bool)
#endif
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(round, Scalar) round(const Scalar& x)
@@ -969,6 +1181,10 @@ inline EIGEN_MATHFUNC_RETVAL(round, Scalar) round(const Scalar& x)
return EIGEN_MATHFUNC_IMPL(round, Scalar)::run(x);
}
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(round, round)
#endif
template<typename T>
EIGEN_DEVICE_FUNC
T (floor)(const T& x)
@@ -977,7 +1193,11 @@ T (floor)(const T& x)
return floor(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(floor, floor)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float floor(const float &x) { return ::floorf(x); }
@@ -993,7 +1213,11 @@ T (ceil)(const T& x)
return ceil(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(ceil, ceil)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float ceil(const float &x) { return ::ceilf(x); }
@@ -1034,6 +1258,10 @@ T sqrt(const T &x)
return sqrt(x);
}
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(sqrt, sqrt)
#endif
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T log(const T &x) {
@@ -1041,7 +1269,12 @@ T log(const T &x) {
return log(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(log, log)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float log(const float &x) { return ::logf(x); }
@@ -1064,12 +1297,12 @@ abs(const T &x) {
return x;
}
#if defined(__SYCL_DEVICE_ONLY__)
EIGEN_ALWAYS_INLINE float abs(float x) { return cl::sycl::fabs(x); }
EIGEN_ALWAYS_INLINE double abs(double x) { return cl::sycl::fabs(x); }
#endif // defined(__SYCL_DEVICE_ONLY__)
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_INTEGER_TYPES_UNARY(abs, abs)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(abs, fabs)
#endif
#ifdef __CUDACC__
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float abs(const float &x) { return ::fabsf(x); }
@@ -1094,12 +1327,51 @@ T exp(const T &x) {
return exp(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(exp, exp)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float exp(const float &x) { return ::expf(x); }
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
double exp(const double &x) { return ::exp(x); }
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
std::complex<float> exp(const std::complex<float>& x) {
float com = ::expf(x.real());
float res_real = com * ::cosf(x.imag());
float res_imag = com * ::sinf(x.imag());
return std::complex<float>(res_real, res_imag);
}
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
std::complex<double> exp(const std::complex<double>& x) {
double com = ::exp(x.real());
double res_real = com * ::cos(x.imag());
double res_imag = com * ::sin(x.imag());
return std::complex<double>(res_real, res_imag);
}
#endif
template<typename Scalar>
EIGEN_DEVICE_FUNC
inline EIGEN_MATHFUNC_RETVAL(expm1, Scalar) expm1(const Scalar& x)
{
return EIGEN_MATHFUNC_IMPL(expm1, Scalar)::run(x);
}
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(expm1, expm1)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float expm1(const float &x) { return ::expm1f(x); }
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
double expm1(const double &x) { return ::expm1(x); }
#endif
template<typename T>
@@ -1109,7 +1381,11 @@ T cos(const T &x) {
return cos(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(cos,cos)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float cos(const float &x) { return ::cosf(x); }
@@ -1124,7 +1400,11 @@ T sin(const T &x) {
return sin(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(sin, sin)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float sin(const float &x) { return ::sinf(x); }
@@ -1139,7 +1419,11 @@ T tan(const T &x) {
return tan(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(tan, tan)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float tan(const float &x) { return ::tanf(x); }
@@ -1154,7 +1438,21 @@ T acos(const T &x) {
return acos(x);
}
#ifdef __CUDACC__
#if EIGEN_HAS_CXX11_MATH
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T acosh(const T &x) {
EIGEN_USING_STD_MATH(acosh);
return acosh(x);
}
#endif
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(acos, acos)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(acosh, acosh)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float acos(const float &x) { return ::acosf(x); }
@@ -1169,7 +1467,21 @@ T asin(const T &x) {
return asin(x);
}
#ifdef __CUDACC__
#if EIGEN_HAS_CXX11_MATH
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T asinh(const T &x) {
EIGEN_USING_STD_MATH(asinh);
return asinh(x);
}
#endif
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(asin, asin)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(asinh, asinh)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float asin(const float &x) { return ::asinf(x); }
@@ -1184,7 +1496,21 @@ T atan(const T &x) {
return atan(x);
}
#ifdef __CUDACC__
#if EIGEN_HAS_CXX11_MATH
template<typename T>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
T atanh(const T &x) {
EIGEN_USING_STD_MATH(atanh);
return atanh(x);
}
#endif
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(atan, atan)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(atanh, atanh)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float atan(const float &x) { return ::atanf(x); }
@@ -1200,7 +1526,11 @@ T cosh(const T &x) {
return cosh(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(cosh, cosh)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float cosh(const float &x) { return ::coshf(x); }
@@ -1215,7 +1545,11 @@ T sinh(const T &x) {
return sinh(x);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(sinh, sinh)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float sinh(const float &x) { return ::sinhf(x); }
@@ -1230,12 +1564,16 @@ T tanh(const T &x) {
return tanh(x);
}
#if (!defined(__CUDACC__)) && EIGEN_FAST_MATH
#if (!defined(EIGEN_GPUCC)) && EIGEN_FAST_MATH && !defined(SYCL_DEVICE_ONLY)
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float tanh(float x) { return internal::generic_fast_tanh_float(x); }
#endif
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(tanh, tanh)
#endif
#if defined(EIGEN_GPUCC)
template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float tanh(const float &x) { return ::tanhf(x); }
@@ -1250,7 +1588,11 @@ T fmod(const T& a, const T& b) {
return fmod(a, b);
}
#ifdef __CUDACC__
#if defined(SYCL_DEVICE_ONLY)
SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(fmod, fmod)
#endif
#if defined(EIGEN_GPUCC)
template <>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
float fmod(const float& a, const float& b) {
@@ -1264,6 +1606,23 @@ double fmod(const double& a, const double& b) {
}
#endif
#if defined(SYCL_DEVICE_ONLY)
#undef SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_BINARY
#undef SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_UNARY
#undef SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_BINARY
#undef SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_UNARY
#undef SYCL_SPECIALIZE_INTEGER_TYPES_BINARY
#undef SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_UNARY
#undef SYCL_SPECIALIZE_FLOATING_TYPES_BINARY
#undef SYCL_SPECIALIZE_FLOATING_TYPES_UNARY
#undef SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE
#undef SYCL_SPECIALIZE_GEN_UNARY_FUNC
#undef SYCL_SPECIALIZE_UNARY_FUNC
#undef SYCL_SPECIALIZE_GEN1_BINARY_FUNC
#undef SYCL_SPECIALIZE_GEN2_BINARY_FUNC
#undef SYCL_SPECIALIZE_BINARY_FUNC
#endif
} // end namespace numext
namespace internal {
@@ -1392,13 +1751,13 @@ template<> struct random_impl<bool>
template<> struct scalar_fuzzy_impl<bool>
{
typedef bool RealScalar;
template<typename OtherScalar> EIGEN_DEVICE_FUNC
static inline bool isMuchSmallerThan(const bool& x, const bool&, const bool&)
{
return !x;
}
EIGEN_DEVICE_FUNC
static inline bool isApprox(bool x, bool y, bool)
{
@@ -1410,10 +1769,10 @@ template<> struct scalar_fuzzy_impl<bool>
{
return (!x) || y;
}
};
} // end namespace internal
} // end namespace Eigen

View File

@@ -29,12 +29,7 @@ T generic_fast_tanh_float(const T& a_x)
// this range is +/-1.0f in single-precision.
const T plus_9 = pset1<T>(9.f);
const T minus_9 = pset1<T>(-9.f);
// NOTE GCC prior to 6.3 might improperly optimize this max/min
// step such that if a_x is nan, x will be either 9 or -9,
// and tanh will return 1 or -1 instead of nan.
// This is supposed to be fixed in gcc6.3,
// see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72867
const T x = pmax(minus_9,pmin(plus_9,a_x));
const T x = pmax(pmin(a_x, plus_9), minus_9);
// The monomial coefficients of the numerator polynomial (odd).
const T alpha_1 = pset1<T>(4.89352455891786e-03f);
const T alpha_3 = pset1<T>(6.37261928875436e-04f);
@@ -72,14 +67,14 @@ T generic_fast_tanh_float(const T& a_x)
}
template<typename RealScalar>
EIGEN_STRONG_INLINE
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
RealScalar positive_real_hypot(const RealScalar& x, const RealScalar& y)
{
EIGEN_USING_STD_MATH(sqrt);
RealScalar p, qp;
p = numext::maxi(x,y);
if(p==RealScalar(0)) return RealScalar(0);
qp = numext::mini(y,x) / p;
qp = numext::mini(y,x) / p;
return p * sqrt(RealScalar(1) + qp*qp);
}
@@ -87,7 +82,8 @@ template<typename Scalar>
struct hypot_impl
{
typedef typename NumTraits<Scalar>::Real RealScalar;
static inline RealScalar run(const Scalar& x, const Scalar& y)
static EIGEN_DEVICE_FUNC
inline RealScalar run(const Scalar& x, const Scalar& y)
{
EIGEN_USING_STD_MATH(abs);
return positive_real_hypot<RealScalar>(abs(x), abs(y));

View File

@@ -255,27 +255,27 @@ class Matrix
*
* \sa resize(Index,Index)
*/
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Matrix() : Base()
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Matrix() : Base()
{
Base::_check_template_params();
EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
}
// FIXME is it still needed
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit Matrix(internal::constructor_without_unaligned_array_assert)
: Base(internal::constructor_without_unaligned_array_assert())
{ Base::_check_template_params(); EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED }
#if EIGEN_HAS_RVALUE_REFERENCES
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Matrix(Matrix&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_constructible<Scalar>::value)
: Base(std::move(other))
{
Base::_check_template_params();
}
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Matrix& operator=(Matrix&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_assignable<Scalar>::value)
{
other.swap(*this);
@@ -283,25 +283,65 @@ class Matrix
}
#endif
#ifndef EIGEN_PARSED_BY_DOXYGEN
#if EIGEN_HAS_CXX11
/** \copydoc PlainObjectBase(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&... args)
*
* Example: \include Matrix_variadic_ctor_cxx11.cpp
* Output: \verbinclude Matrix_variadic_ctor_cxx11.out
*
* \sa Matrix(const std::initializer_list<std::initializer_list<Scalar>>&)
*/
template <typename... ArgTypes>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Matrix(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
: Base(a0, a1, a2, a3, args...) {}
/** \brief Constructs a Matrix and initializes it from the coefficients given as initializer-lists grouped by row. \cpp11
*
* In the general case, the constructor takes a list of rows, each row being represented as a list of coefficients:
*
* Example: \include Matrix_initializer_list_23_cxx11.cpp
* Output: \verbinclude Matrix_initializer_list_23_cxx11.out
*
* Each of the inner initializer lists must contain the exact same number of elements, otherwise an assertion is triggered.
*
* In the case of a compile-time column vector, implicit transposition from a single row is allowed.
* Therefore <code>VectorXd{{1,2,3,4,5}}</code> is legal and the more verbose syntax
 * <code>VectorXd{{1},{2},{3},{4},{5}}</code> can be avoided:
*
* Example: \include Matrix_initializer_list_vector_cxx11.cpp
* Output: \verbinclude Matrix_initializer_list_vector_cxx11.out
*
* In the case of fixed-sized matrices, the initializer list sizes must exactly match the matrix sizes,
* and implicit transposition is allowed for compile-time vectors only.
*
* \sa Matrix(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC
explicit EIGEN_STRONG_INLINE Matrix(const std::initializer_list<std::initializer_list<Scalar>>& list) : Base(list) {}
#endif // end EIGEN_HAS_CXX11
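Illustrative only (an assumption mirroring the documentation above): with these C++11 constructors one can write
Eigen::Matrix<double, 2, 3> M{{1, 2, 3},
                              {4, 5, 6}};   // coefficients grouped by row
Eigen::VectorXd v{{1, 2, 3, 4, 5}};         // a single row is implicitly transposed into a column vector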
#ifndef EIGEN_PARSED_BY_DOXYGEN
// This constructor is for both 1x1 matrices and dynamic vectors
template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE explicit Matrix(const T& x)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit Matrix(const T& x)
{
Base::_check_template_params();
Base::template _init1<T>(x);
}
template<typename T0, typename T1>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Matrix(const T0& x, const T1& y)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Matrix(const T0& x, const T1& y)
{
Base::_check_template_params();
Base::template _init2<T0,T1>(x, y);
}
#else
#else
/** \brief Constructs a fixed-sized matrix initialized with coefficients starting at \a data */
EIGEN_DEVICE_FUNC
explicit Matrix(const Scalar *data);
@@ -319,7 +359,8 @@ class Matrix
* \c EIGEN_INITIALIZE_MATRICES_BY_{ZERO,\c NAN} macros (see \ref TopicPreprocessorDirectives).
*/
EIGEN_STRONG_INLINE explicit Matrix(Index dim);
/** \brief Constructs an initialized 1x1 matrix with the given coefficient */
/** \brief Constructs an initialized 1x1 matrix with the given coefficient
* \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...) */
Matrix(const Scalar& x);
/** \brief Constructs an uninitialized matrix with \a rows rows and \a cols columns.
*
@@ -336,11 +377,14 @@ class Matrix
EIGEN_DEVICE_FUNC
Matrix(Index rows, Index cols);
/** \brief Constructs an initialized 2D vector with given coefficients */
/** \brief Constructs an initialized 2D vector with given coefficients
* \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...) */
Matrix(const Scalar& x, const Scalar& y);
#endif
#endif // end EIGEN_PARSED_BY_DOXYGEN
/** \brief Constructs an initialized 3D vector with given coefficients */
/** \brief Constructs an initialized 3D vector with given coefficients
* \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...)
*/
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Matrix(const Scalar& x, const Scalar& y, const Scalar& z)
{
@@ -350,7 +394,9 @@ class Matrix
m_storage.data()[1] = y;
m_storage.data()[2] = z;
}
/** \brief Constructs an initialized 4D vector with given coefficients */
/** \brief Constructs an initialized 4D vector with given coefficients
* \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...)
*/
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Matrix(const Scalar& x, const Scalar& y, const Scalar& z, const Scalar& w)
{
@@ -405,7 +451,7 @@ class Matrix
*
* \ingroup Core_Module
*
* Eigen defines several typedef shortcuts for most common matrix and vector types.
* %Eigen defines several typedef shortcuts for most common matrix and vector types.
*
* The general patterns are the following:
*
@@ -417,6 +463,15 @@ class Matrix
*
* There are also \c VectorSizeType and \c RowVectorSizeType which are self-explanatory. For example, \c Vector4cf is
* a fixed-size vector of 4 complex floats.
*
* With \cpp11, template alias are also defined for common sizes.
* They follow the same pattern as above except that the scalar type suffix is replaced by a
* template parameter, i.e.:
* - `MatrixSize<Type>` where `Size` can be \c 2,\c 3,\c 4 for fixed size square matrices or \c X for dynamic size.
* - `MatrixXSize<Type>` and `MatrixSizeX<Type>` where `Size` can be \c 2,\c 3,\c 4 for hybrid dynamic/fixed matrices.
* - `VectorSize<Type>` and `RowVectorSize<Type>` for column and row vectors.
*
* With \cpp11, you can also use fully generic column and row vector types: `Vector<Type,Size>` and `RowVector<Type,Size>`.
*
* \sa class Matrix
*/
@@ -454,6 +509,55 @@ EIGEN_MAKE_TYPEDEFS_ALL_SIZES(std::complex<double>, cd)
#undef EIGEN_MAKE_TYPEDEFS
#undef EIGEN_MAKE_FIXED_TYPEDEFS
#if EIGEN_HAS_CXX11
#define EIGEN_MAKE_TYPEDEFS(Size, SizeSuffix) \
/** \ingroup matrixtypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Matrix##SizeSuffix = Matrix<Type, Size, Size>; \
/** \ingroup matrixtypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Vector##SizeSuffix = Matrix<Type, Size, 1>; \
/** \ingroup matrixtypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using RowVector##SizeSuffix = Matrix<Type, 1, Size>;
#define EIGEN_MAKE_FIXED_TYPEDEFS(Size) \
/** \ingroup matrixtypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Matrix##Size##X = Matrix<Type, Size, Dynamic>; \
/** \ingroup matrixtypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Matrix##X##Size = Matrix<Type, Dynamic, Size>;
EIGEN_MAKE_TYPEDEFS(2, 2)
EIGEN_MAKE_TYPEDEFS(3, 3)
EIGEN_MAKE_TYPEDEFS(4, 4)
EIGEN_MAKE_TYPEDEFS(Dynamic, X)
EIGEN_MAKE_FIXED_TYPEDEFS(2)
EIGEN_MAKE_FIXED_TYPEDEFS(3)
EIGEN_MAKE_FIXED_TYPEDEFS(4)
/** \ingroup matrixtypedefs
* \brief \cpp11 */
template <typename Type, int Size>
using Vector = Matrix<Type, Size, 1>;
/** \ingroup matrixtypedefs
* \brief \cpp11 */
template <typename Type, int Size>
using RowVector = Matrix<Type, 1, Size>;
#undef EIGEN_MAKE_TYPEDEFS
#undef EIGEN_MAKE_FIXED_TYPEDEFS
#endif // EIGEN_HAS_CXX11
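For illustration (derived from the alias macros above), the generated C++11 template aliases allow, e.g.:
Eigen::Matrix3<float>   R;             // same as Eigen::Matrix3f
Eigen::Vector4<double>  q;             // same as Eigen::Vector4d
Eigen::MatrixX3<float>  pts(100, 3);   // dynamic rows, 3 columns
Eigen::Vector<double, 6> twist;        // fully generic fixed-size column vector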
} // end namespace Eigen
#endif // EIGEN_MATRIX_H

View File

@@ -76,6 +76,7 @@ template<typename Derived> class MatrixBase
using Base::coeffRef;
using Base::lazyAssign;
using Base::eval;
using Base::operator-;
using Base::operator+=;
using Base::operator-=;
using Base::operator*=;
@@ -122,7 +123,6 @@ template<typename Derived> class MatrixBase
#define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::MatrixBase
#define EIGEN_DOC_UNARY_ADDONS(X,Y)
# include "../plugins/CommonCwiseUnaryOps.h"
# include "../plugins/CommonCwiseBinaryOps.h"
# include "../plugins/MatrixCwiseUnaryOps.h"
# include "../plugins/MatrixCwiseBinaryOps.h"
@@ -268,6 +268,8 @@ template<typename Derived> class MatrixBase
Derived& setIdentity();
EIGEN_DEVICE_FUNC
Derived& setIdentity(Index rows, Index cols);
EIGEN_DEVICE_FUNC Derived& setUnit(Index i);
EIGEN_DEVICE_FUNC Derived& setUnit(Index newSize, Index i);
bool isIdentity(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
bool isDiagonal(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
@@ -296,7 +298,7 @@ template<typename Derived> class MatrixBase
EIGEN_DEVICE_FUNC inline bool operator!=(const MatrixBase<OtherDerived>& other) const
{ return cwiseNotEqual(other).any(); }
NoAlias<Derived,Eigen::MatrixBase > noalias();
NoAlias<Derived,Eigen::MatrixBase > EIGEN_DEVICE_FUNC noalias();
// TODO forceAlignedAccess is temporarily disabled
// Need to find a nicer workaround.
@@ -326,6 +328,7 @@ template<typename Derived> class MatrixBase
inline const PartialPivLU<PlainObject> lu() const;
EIGEN_DEVICE_FUNC
inline const Inverse<Derived> inverse() const;
template<typename ResultType>
@@ -335,12 +338,15 @@ template<typename Derived> class MatrixBase
bool& invertible,
const RealScalar& absDeterminantThreshold = NumTraits<Scalar>::dummy_precision()
) const;
template<typename ResultType>
inline void computeInverseWithCheck(
ResultType& inverse,
bool& invertible,
const RealScalar& absDeterminantThreshold = NumTraits<Scalar>::dummy_precision()
) const;
EIGEN_DEVICE_FUNC
Scalar determinant() const;
/////////// Cholesky module ///////////
@@ -412,15 +418,19 @@ template<typename Derived> class MatrixBase
////////// Householder module ///////////
EIGEN_DEVICE_FUNC
void makeHouseholderInPlace(Scalar& tau, RealScalar& beta);
template<typename EssentialPart>
EIGEN_DEVICE_FUNC
void makeHouseholder(EssentialPart& essential,
Scalar& tau, RealScalar& beta) const;
template<typename EssentialPart>
EIGEN_DEVICE_FUNC
void applyHouseholderOnTheLeft(const EssentialPart& essential,
const Scalar& tau,
Scalar* workspace);
template<typename EssentialPart>
EIGEN_DEVICE_FUNC
void applyHouseholderOnTheRight(const EssentialPart& essential,
const Scalar& tau,
Scalar* workspace);
@@ -428,8 +438,10 @@ template<typename Derived> class MatrixBase
///////// Jacobi module /////////
template<typename OtherScalar>
EIGEN_DEVICE_FUNC
void applyOnTheLeft(Index p, Index q, const JacobiRotation<OtherScalar>& j);
template<typename OtherScalar>
EIGEN_DEVICE_FUNC
void applyOnTheRight(Index p, Index q, const JacobiRotation<OtherScalar>& j);
///////// SparseCore module /////////
@@ -456,6 +468,11 @@ template<typename Derived> class MatrixBase
const MatrixFunctionReturnValue<Derived> matrixFunction(StemFunction f) const;
EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, cosh, hyperbolic cosine)
EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, sinh, hyperbolic sine)
#if EIGEN_HAS_CXX11_MATH
EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, atanh, inverse hyperbolic tangent)
EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, acosh, inverse hyperbolic cosine)
EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, asinh, inverse hyperbolic sine)
#endif
EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, cos, cosine)
EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, sin, sine)
EIGEN_MATRIX_FUNCTION(MatrixSquareRootReturnValue, sqrt, square root)
@@ -464,8 +481,7 @@ template<typename Derived> class MatrixBase
EIGEN_MATRIX_FUNCTION_1(MatrixComplexPowerReturnValue, pow, power to \c p, const std::complex<RealScalar>& p)
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(MatrixBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MatrixBase)
EIGEN_DEVICE_FUNC MatrixBase() : Base() {}
private:
EIGEN_DEVICE_FUNC explicit MatrixBase(int);

View File

@@ -16,7 +16,11 @@ namespace Eigen {
namespace internal {
template<typename ExpressionType>
struct traits<NestByValue<ExpressionType> > : public traits<ExpressionType>
{};
{
enum {
Flags = traits<ExpressionType>::Flags & ~NestByRefBit
};
};
}
/** \class NestByValue
@@ -43,55 +47,11 @@ template<typename ExpressionType> class NestByValue
EIGEN_DEVICE_FUNC inline Index rows() const { return m_expression.rows(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_expression.cols(); }
EIGEN_DEVICE_FUNC inline Index outerStride() const { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC inline Index innerStride() const { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index row, Index col) const
{
return m_expression.coeff(row, col);
}
EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index row, Index col)
{
return m_expression.const_cast_derived().coeffRef(row, col);
}
EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index index) const
{
return m_expression.coeff(index);
}
EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index index)
{
return m_expression.const_cast_derived().coeffRef(index);
}
template<int LoadMode>
inline const PacketScalar packet(Index row, Index col) const
{
return m_expression.template packet<LoadMode>(row, col);
}
template<int LoadMode>
inline void writePacket(Index row, Index col, const PacketScalar& x)
{
m_expression.const_cast_derived().template writePacket<LoadMode>(row, col, x);
}
template<int LoadMode>
inline const PacketScalar packet(Index index) const
{
return m_expression.template packet<LoadMode>(index);
}
template<int LoadMode>
inline void writePacket(Index index, const PacketScalar& x)
{
m_expression.const_cast_derived().template writePacket<LoadMode>(index, x);
}
EIGEN_DEVICE_FUNC operator const ExpressionType&() const { return m_expression; }
EIGEN_DEVICE_FUNC const ExpressionType& nestedExpression() const { return m_expression; }
protected:
const ExpressionType m_expression;
};
@@ -99,12 +59,27 @@ template<typename ExpressionType> class NestByValue
/** \returns an expression of the temporary version of *this.
*/
template<typename Derived>
inline const NestByValue<Derived>
EIGEN_DEVICE_FUNC inline const NestByValue<Derived>
DenseBase<Derived>::nestByValue() const
{
return NestByValue<Derived>(derived());
}
namespace internal {
// Evaluator of Solve -> eval into a temporary
template<typename ArgType>
struct evaluator<NestByValue<ArgType> >
: public evaluator<ArgType>
{
typedef evaluator<ArgType> Base;
EIGEN_DEVICE_FUNC explicit evaluator(const NestByValue<ArgType>& xpr)
: Base(xpr.nestedExpression())
{}
};
}
} // end namespace Eigen
#endif // EIGEN_NESTBYVALUE_H
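A small usage sketch, not part of the diff: nestByValue() wraps an expression so that enclosing expressions store the wrapper by value (the new traits clear NestByRefBit), and the added evaluator simply forwards to the nested expression's evaluator.

#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::MatrixXd a = Eigen::MatrixXd::Random(3, 3);
  Eigen::MatrixXd b = Eigen::MatrixXd::Random(3, 3);
  // The wrapped sum expression is copied into the product expression below
  // instead of being referenced, which is exactly what nestByValue() is for.
  Eigen::MatrixXd c = (a + b).nestByValue() * b;
  std::cout << c << "\n";
  return 0;
}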

View File

@@ -33,6 +33,7 @@ class NoAlias
public:
typedef typename ExpressionType::Scalar Scalar;
EIGEN_DEVICE_FUNC
explicit NoAlias(ExpressionType& expression) : m_expression(expression) {}
template<typename OtherDerived>
@@ -74,10 +75,10 @@ class NoAlias
*
* More precisely, noalias() allows to bypass the EvalBeforeAssignBit flag.
* Currently, even though several expressions may alias, only product
* expressions have this flag. Therefore, noalias() is only usefull when
* expressions have this flag. Therefore, noalias() is only useful when
* the source expression contains a matrix product.
*
* Here are some examples where noalias is usefull:
* Here are some examples where noalias is useful:
* \code
* D.noalias() = A * B;
* D.noalias() += A.transpose() * B;
@@ -98,7 +99,7 @@ class NoAlias
* \sa class NoAlias
*/
template<typename Derived>
NoAlias<Derived,MatrixBase> MatrixBase<Derived>::noalias()
NoAlias<Derived,MatrixBase> EIGEN_DEVICE_FUNC MatrixBase<Derived>::noalias()
{
return NoAlias<Derived, Eigen::MatrixBase >(derived());
}
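A minimal self-contained example, not part of the diff, of the pattern the documentation above describes:

#include <Eigen/Dense>

int main() {
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 4);
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(4, 4);
  Eigen::MatrixXd D(4, 4);
  // Without noalias(), D = A * B evaluates the product into a temporary first
  // (to be safe against aliasing). noalias() promises that D does not alias A
  // or B, so the product is written directly into D.
  D.noalias() = A * B;
  D.noalias() += A.transpose() * B;
  return 0;
}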

View File

@@ -21,12 +21,14 @@ template< typename T,
bool is_integer = NumTraits<T>::IsInteger>
struct default_digits10_impl
{
EIGEN_DEVICE_FUNC
static int run() { return std::numeric_limits<T>::digits10; }
};
template<typename T>
struct default_digits10_impl<T,false,false> // Floating point
{
EIGEN_DEVICE_FUNC
static int run() {
using std::log10;
using std::ceil;
@@ -38,6 +40,38 @@ struct default_digits10_impl<T,false,false> // Floating point
template<typename T>
struct default_digits10_impl<T,false,true> // Integer
{
EIGEN_DEVICE_FUNC
static int run() { return 0; }
};
// default implementation of digits(), based on numeric_limits if specialized,
// 0 for integer types, and log2(epsilon()) otherwise.
template< typename T,
bool use_numeric_limits = std::numeric_limits<T>::is_specialized,
bool is_integer = NumTraits<T>::IsInteger>
struct default_digits_impl
{
EIGEN_DEVICE_FUNC
static int run() { return std::numeric_limits<T>::digits; }
};
template<typename T>
struct default_digits_impl<T,false,false> // Floating point
{
EIGEN_DEVICE_FUNC
static int run() {
using std::log;
using std::ceil;
typedef typename NumTraits<T>::Real Real;
return int(ceil(-log(NumTraits<Real>::epsilon())/log(static_cast<Real>(2))));
}
};
template<typename T>
struct default_digits_impl<T,false,true> // Integer
{
EIGEN_DEVICE_FUNC
static int run() { return 0; }
};
@@ -71,7 +105,7 @@ struct default_digits10_impl<T,false,true> // Integer
* and to \c 0 otherwise.
* \li Enum values ReadCost, AddCost and MulCost representing a rough estimate of the number of CPU cycles needed
* to by move / add / mul instructions respectively, assuming the data is already stored in CPU registers.
* Stay vague here. No need to do architecture-specific stuff.
* Stay vague here. No need to do architecture-specific stuff. If you don't know what this means, just use \c Eigen::HugeCost.
* \li An enum value \a IsSigned. It is equal to \c 1 if \a T is a signed type and to 0 if \a T is unsigned.
* \li An enum value \a RequireInitialization. It is equal to \c 1 if the constructor of the numeric type \a T must
* be called, and to 0 if it is safe not to call it. Default is 0 if \a T is an arithmetic type, and 1 otherwise.
@@ -118,6 +152,12 @@ template<typename T> struct GenericNumTraits
return internal::default_digits10_impl<T>::run();
}
EIGEN_DEVICE_FUNC
static inline int digits()
{
return internal::default_digits_impl<T>::run();
}
EIGEN_DEVICE_FUNC
static inline Real dummy_precision()
{
@@ -133,7 +173,8 @@ template<typename T> struct GenericNumTraits
EIGEN_DEVICE_FUNC
static inline T lowest() {
return IsInteger ? (numext::numeric_limits<T>::min)() : (-(numext::numeric_limits<T>::max)());
return IsInteger ? (numext::numeric_limits<T>::min)()
: static_cast<T>(-(numext::numeric_limits<T>::max)());
}
EIGEN_DEVICE_FUNC
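A small sketch, not part of the diff, exercising the NumTraits members touched in this file (digits() is the accessor added above; digits10() and lowest() are shown for comparison):

#include <Eigen/Core>
#include <iostream>

int main() {
  // digits10(): decimal digits guaranteed to survive a text round trip.
  std::cout << Eigen::NumTraits<double>::digits10() << "\n";  // typically 15
  // digits(): radix digits of the mantissa; the new accessor defaults to
  // std::numeric_limits<double>::digits (53) or ceil(-log2(epsilon())).
  std::cout << Eigen::NumTraits<double>::digits() << "\n";
  // lowest(): most negative finite value for floating-point types.
  std::cout << Eigen::NumTraits<float>::lowest() << "\n";
  return 0;
}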

View File

@@ -0,0 +1,232 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2011-2018 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_PARTIALREDUX_H
#define EIGEN_PARTIALREDUX_H
namespace Eigen {
namespace internal {
/***************************************************************************
*
* This file provides evaluators for partial reductions.
* There are two modes:
*
* - scalar path: simply calls the respective function on the column or row.
* -> nothing special here, all the tricky part is handled by the return
* types of VectorwiseOp's members. They embed the functor calling the
* respective DenseBase's member function.
*
* - vectorized path: implements a packet-wise reduction followed by
* some (optional) processing of the outcome, e.g., division by n for mean.
*
* For the vectorized path let's observe that the packet-size and outer-unrolling
* are both decided by the assignment logic. So all we have to do is to decide
* on the inner unrolling.
*
* For the unrolling, we can reuse "internal::redux_vec_unroller" from Redux.h,
* but we need to be careful to specify the correct increment.
*
***************************************************************************/
/* logic deciding a strategy for unrolling of vectorized paths */
template<typename Func, typename Evaluator>
struct packetwise_redux_traits
{
enum {
OuterSize = int(Evaluator::IsRowMajor) ? Evaluator::RowsAtCompileTime : Evaluator::ColsAtCompileTime,
Cost = OuterSize == Dynamic ? HugeCost
: OuterSize * Evaluator::CoeffReadCost + (OuterSize-1) * functor_traits<Func>::Cost,
Unrolling = Cost <= EIGEN_UNROLLING_LIMIT ? CompleteUnrolling : NoUnrolling
};
};
/* Value to be returned when size==0, by default let's return 0 */
template<typename PacketType,typename Func>
EIGEN_DEVICE_FUNC
PacketType packetwise_redux_empty_value(const Func& ) { return pset1<PacketType>(0); }
/* For products the default is 1 */
template<typename PacketType,typename Scalar>
EIGEN_DEVICE_FUNC
PacketType packetwise_redux_empty_value(const scalar_product_op<Scalar,Scalar>& ) { return pset1<PacketType>(1); }
/* Perform the actual reduction */
template<typename Func, typename Evaluator,
int Unrolling = packetwise_redux_traits<Func, Evaluator>::Unrolling
>
struct packetwise_redux_impl;
/* Perform the actual reduction with unrolling */
template<typename Func, typename Evaluator>
struct packetwise_redux_impl<Func, Evaluator, CompleteUnrolling>
{
typedef redux_novec_unroller<Func,Evaluator, 0, Evaluator::SizeAtCompileTime> Base;
typedef typename Evaluator::Scalar Scalar;
template<typename PacketType>
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE
PacketType run(const Evaluator &eval, const Func& func, Index /*size*/)
{
return redux_vec_unroller<Func, Evaluator, 0, packetwise_redux_traits<Func, Evaluator>::OuterSize>::template run<PacketType>(eval,func);
}
};
/* Add a specialization of redux_vec_unroller for size==0 at compiletime.
* This specialization is not required for general reductions, which is
* why it is defined here.
*/
template<typename Func, typename Evaluator, int Start>
struct redux_vec_unroller<Func, Evaluator, Start, 0>
{
template<typename PacketType>
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE PacketType run(const Evaluator &, const Func& f)
{
return packetwise_redux_empty_value<PacketType>(f);
}
};
/* Perform the actual reduction for dynamic sizes */
template<typename Func, typename Evaluator>
struct packetwise_redux_impl<Func, Evaluator, NoUnrolling>
{
typedef typename Evaluator::Scalar Scalar;
typedef typename redux_traits<Func, Evaluator>::PacketType PacketScalar;
template<typename PacketType>
EIGEN_DEVICE_FUNC
static PacketType run(const Evaluator &eval, const Func& func, Index size)
{
if(size==0)
return packetwise_redux_empty_value<PacketType>(func);
const Index size4 = (size-1)&(~3);
PacketType p = eval.template packetByOuterInner<Unaligned,PacketType>(0,0);
Index i = 1;
// This loop is optimized for instruction pipelining:
// - each iteration generates two independent instructions
// - thanks to branch prediction and out-of-order execution we have independent instructions across loops
for(; i<size4; i+=4)
p = func.packetOp(p,
func.packetOp(
func.packetOp(eval.template packetByOuterInner<Unaligned,PacketType>(i+0,0),eval.template packetByOuterInner<Unaligned,PacketType>(i+1,0)),
func.packetOp(eval.template packetByOuterInner<Unaligned,PacketType>(i+2,0),eval.template packetByOuterInner<Unaligned,PacketType>(i+3,0))));
for(; i<size; ++i)
p = func.packetOp(p, eval.template packetByOuterInner<Unaligned,PacketType>(i,0));
return p;
}
};
template< typename ArgType, typename MemberOp, int Direction>
struct evaluator<PartialReduxExpr<ArgType, MemberOp, Direction> >
: evaluator_base<PartialReduxExpr<ArgType, MemberOp, Direction> >
{
typedef PartialReduxExpr<ArgType, MemberOp, Direction> XprType;
typedef typename internal::nested_eval<ArgType,1>::type ArgTypeNested;
typedef typename internal::add_const_on_value_type<ArgTypeNested>::type ConstArgTypeNested;
typedef typename internal::remove_all<ArgTypeNested>::type ArgTypeNestedCleaned;
typedef typename ArgType::Scalar InputScalar;
typedef typename XprType::Scalar Scalar;
enum {
TraversalSize = Direction==int(Vertical) ? int(ArgType::RowsAtCompileTime) : int(ArgType::ColsAtCompileTime)
};
typedef typename MemberOp::template Cost<int(TraversalSize)> CostOpType;
enum {
CoeffReadCost = TraversalSize==Dynamic ? HugeCost
: TraversalSize==0 ? 1
: TraversalSize * evaluator<ArgType>::CoeffReadCost + int(CostOpType::value),
_ArgFlags = evaluator<ArgType>::Flags,
_Vectorizable = bool(int(_ArgFlags)&PacketAccessBit)
&& bool(MemberOp::Vectorizable)
&& (Direction==int(Vertical) ? bool(_ArgFlags&RowMajorBit) : (_ArgFlags&RowMajorBit)==0)
&& (TraversalSize!=0),
Flags = (traits<XprType>::Flags&RowMajorBit)
| (evaluator<ArgType>::Flags&(HereditaryBits&(~RowMajorBit)))
| (_Vectorizable ? PacketAccessBit : 0)
| LinearAccessBit,
Alignment = 0 // FIXME this will need to be improved once PartialReduxExpr is vectorized
};
EIGEN_DEVICE_FUNC explicit evaluator(const XprType xpr)
: m_arg(xpr.nestedExpression()), m_functor(xpr.functor())
{
EIGEN_INTERNAL_CHECK_COST_VALUE(TraversalSize==Dynamic ? HugeCost : (TraversalSize==0 ? 1 : int(CostOpType::value)));
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
typedef typename XprType::CoeffReturnType CoeffReturnType;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar coeff(Index i, Index j) const
{
return coeff(Direction==Vertical ? j : i);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar coeff(Index index) const
{
return m_functor(m_arg.template subVector<DirectionType(Direction)>(index));
}
template<int LoadMode,typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
PacketType packet(Index i, Index j) const
{
return packet<LoadMode,PacketType>(Direction==Vertical ? j : i);
}
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC
PacketType packet(Index idx) const
{
enum { PacketSize = internal::unpacket_traits<PacketType>::size };
typedef Block<const ArgTypeNestedCleaned,
Direction==Vertical ? int(ArgType::RowsAtCompileTime) : int(PacketSize),
Direction==Vertical ? int(PacketSize) : int(ArgType::ColsAtCompileTime),
true /* InnerPanel */> PanelType;
PanelType panel(m_arg,
Direction==Vertical ? 0 : idx,
Direction==Vertical ? idx : 0,
Direction==Vertical ? m_arg.rows() : Index(PacketSize),
Direction==Vertical ? Index(PacketSize) : m_arg.cols());
// FIXME
// See bug 1612: currently, if PacketSize==1 (i.e. complex<double> with 128-bit registers) then the storage order of the panel gets reversed
// and methods like packetByOuterInner do not make sense anymore in this context.
// So let's just bypass "vectorization" in this case:
if(PacketSize==1)
return internal::pset1<PacketType>(coeff(idx));
typedef typename internal::redux_evaluator<PanelType> PanelEvaluator;
PanelEvaluator panel_eval(panel);
typedef typename MemberOp::BinaryOp BinaryOp;
PacketType p = internal::packetwise_redux_impl<BinaryOp,PanelEvaluator>::template run<PacketType>(panel_eval,m_functor.binaryFunc(),m_arg.outerSize());
return p;
}
protected:
ConstArgTypeNested m_arg;
const MemberOp m_functor;
};
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_PARTIALREDUX_H
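A user-level sketch, not part of the new file above: colwise()/rowwise() reductions build PartialReduxExpr nodes whose evaluation goes through the evaluator and packetwise_redux_impl defined here.

#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::MatrixXf m = Eigen::MatrixXf::Random(8, 1000);
  // Each coefficient of s is a reduction over one column of m; when the panel
  // layout allows it, whole packets of column sums are computed at once.
  Eigen::RowVectorXf s = m.colwise().sum();
  Eigen::VectorXf r = m.rowwise().maxCoeff();
  std::cout << s.head(3) << "\n" << r.transpose() << "\n";
  return 0;
}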

View File

@@ -87,25 +87,14 @@ class PermutationBase : public EigenBase<Derived>
return derived();
}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** This is a special case of the templated operator=. Its purpose is to
* prevent a default operator= from hiding the templated operator=.
*/
Derived& operator=(const PermutationBase& other)
{
indices() = other.indices();
return derived();
}
#endif
/** \returns the number of rows */
inline Index rows() const { return Index(indices().size()); }
inline EIGEN_DEVICE_FUNC Index rows() const { return Index(indices().size()); }
/** \returns the number of columns */
inline Index cols() const { return Index(indices().size()); }
inline EIGEN_DEVICE_FUNC Index cols() const { return Index(indices().size()); }
/** \returns the size of a side of the respective square matrix, i.e., the number of indices */
inline Index size() const { return Index(indices().size()); }
inline EIGEN_DEVICE_FUNC Index size() const { return Index(indices().size()); }
#ifndef EIGEN_PARSED_BY_DOXYGEN
template<typename DenseDerived>
@@ -333,12 +322,6 @@ class PermutationMatrix : public PermutationBase<PermutationMatrix<SizeAtCompile
inline PermutationMatrix(const PermutationBase<OtherDerived>& other)
: m_indices(other.indices()) {}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** Standard copy constructor. Defined only to prevent a default copy constructor
* from hiding the other templated constructor */
inline PermutationMatrix(const PermutationMatrix& other) : m_indices(other.indices()) {}
#endif
/** Generic constructor from expression of the indices. The indices
* array has the meaning that the permutations sends each integer i to indices[i].
*
@@ -373,17 +356,6 @@ class PermutationMatrix : public PermutationBase<PermutationMatrix<SizeAtCompile
return Base::operator=(tr.derived());
}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** This is a special case of the templated operator=. Its purpose is to
* prevent a default operator= from hiding the templated operator=.
*/
PermutationMatrix& operator=(const PermutationMatrix& other)
{
m_indices = other.m_indices;
return *this;
}
#endif
/** const version of indices(). */
const IndicesType& indices() const { return m_indices; }
/** \returns a reference to the stored array representing the permutation. */

View File

@@ -104,7 +104,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
typedef typename internal::traits<Derived>::StorageKind StorageKind;
typedef typename internal::traits<Derived>::Scalar Scalar;
typedef typename internal::packet_traits<Scalar>::type PacketScalar;
typedef typename NumTraits<Scalar>::Real RealScalar;
typedef Derived DenseType;
@@ -358,7 +358,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
* remain row-vectors and vectors remain vectors.
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE void resizeLike(const EigenBase<OtherDerived>& _other)
{
const OtherDerived& other = _other.derived();
@@ -383,7 +383,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
* of rows and/or of columns, you can use conservativeResize(NoChange_t, Index) or
* conservativeResize(Index, NoChange_t).
*
* Matrices are resized relative to the top-left element. In case values need to be
* Matrices are resized relative to the top-left element. In case values need to be
* appended to the matrix they will be uninitialized.
*/
EIGEN_DEVICE_FUNC
@@ -440,7 +440,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
* of rows and/or of columns, you can use conservativeResize(NoChange_t, Index) or
* conservativeResize(Index, NoChange_t).
*
* Matrices are resized relative to the top-left element. In case values need to be
* Matrices are resized relative to the top-left element. In case values need to be
* appended to the matrix they will copied from \c other.
*/
template<typename OtherDerived>
@@ -526,6 +526,71 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
// EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
}
#if EIGEN_HAS_CXX11
/** \brief Construct a row or column vector with fixed size from an arbitrary number of coefficients. \cpp11
*
* \only_for_vectors
*
* This constructor is for 1D arrays or vectors with more than 4 coefficients.
* There exist C++98 analogue constructors for fixed-size arrays/vectors having 1, 2, 3, or 4 coefficients.
*
* \warning To construct a column (resp. row) vector of fixed length, the number of values passed to this
* constructor must match the fixed number of rows (resp. columns) of \c *this.
*/
template <typename... ArgTypes>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
PlainObjectBase(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
: m_storage()
{
_check_template_params();
EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, sizeof...(args) + 4);
m_storage.data()[0] = a0;
m_storage.data()[1] = a1;
m_storage.data()[2] = a2;
m_storage.data()[3] = a3;
int i = 4;
auto x = {(m_storage.data()[i++] = args, 0)...};
static_cast<void>(x);
}
/** \brief Constructs a Matrix or Array and initializes it by elements given by an initializer list of initializer
* lists \cpp11
*/
EIGEN_DEVICE_FUNC
explicit EIGEN_STRONG_INLINE PlainObjectBase(const std::initializer_list<std::initializer_list<Scalar>>& list)
: m_storage()
{
_check_template_params();
size_t list_size = 0;
if (list.begin() != list.end()) {
list_size = list.begin()->size();
}
// This is to allow syntax like VectorXi {{1, 2, 3, 4}}
if (ColsAtCompileTime == 1 && list.size() == 1) {
eigen_assert(list_size == static_cast<size_t>(RowsAtCompileTime) || RowsAtCompileTime == Dynamic);
resize(list_size, ColsAtCompileTime);
std::copy(list.begin()->begin(), list.begin()->end(), m_storage.data());
} else {
eigen_assert(list.size() == static_cast<size_t>(RowsAtCompileTime) || RowsAtCompileTime == Dynamic);
eigen_assert(list_size == static_cast<size_t>(ColsAtCompileTime) || ColsAtCompileTime == Dynamic);
resize(list.size(), list_size);
Index row_index = 0;
for (const std::initializer_list<Scalar>& row : list) {
eigen_assert(list_size == row.size());
Index col_index = 0;
for (const Scalar& e : row) {
coeffRef(row_index, col_index) = e;
++col_index;
}
++row_index;
}
}
}
#endif // end EIGEN_HAS_CXX11
/** \sa PlainObjectBase::operator=(const EigenBase<OtherDerived>&) */
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
@@ -564,7 +629,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
* \copydetails DenseBase::operator=(const EigenBase<OtherDerived> &other)
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Derived& operator=(const EigenBase<OtherDerived> &other)
{
_resize_to_match(other);
@@ -678,7 +743,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
* remain row-vectors and vectors remain vectors.
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE void _resize_to_match(const EigenBase<OtherDerived>& other)
{
#ifdef EIGEN_NO_AUTOMATIC_RESIZING
@@ -705,10 +770,10 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
*
* \internal
*/
// aliasing is dealt once in internall::call_assignment
// aliasing is dealt once in internal::call_assignment
// so at this stage we have to assume aliasing... and resizing has to be done later.
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Derived& _set(const DenseBase<OtherDerived>& other)
{
internal::call_assignment(this->derived(), other.derived());
@@ -721,7 +786,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
* \sa operator=(const MatrixBase<OtherDerived>&), _set()
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Derived& _set_noalias(const DenseBase<OtherDerived>& other)
{
// I don't think we need this resize call since the lazyAssign will anyways resize
@@ -744,18 +809,18 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED)
resize(rows,cols);
}
template<typename T0, typename T1>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE void _init2(const T0& val0, const T1& val1, typename internal::enable_if<Base::SizeAtCompileTime==2,T0>::type* = 0)
{
EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, 2)
m_storage.data()[0] = Scalar(val0);
m_storage.data()[1] = Scalar(val1);
}
template<typename T0, typename T1>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE void _init2(const Index& val0, const Index& val1,
typename internal::enable_if< (!internal::is_same<Index,Scalar>::value)
&& (internal::is_same<T0,Index>::value)
@@ -781,8 +846,8 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED)
resize(size);
}
// We have a 1x1 matrix/array => the argument is interpreted as the value of the unique coefficient (case where scalar type can be implicitely converted)
// We have a 1x1 matrix/array => the argument is interpreted as the value of the unique coefficient (case where scalar type can be implicitly converted)
template<typename T>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE void _init1(const Scalar& val0, typename internal::enable_if<Base::SizeAtCompileTime==1 && internal::is_convertible<T, Scalar>::value,T>::type* = 0)
@@ -790,7 +855,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, 1)
m_storage.data()[0] = val0;
}
// We have a 1x1 matrix/array => the argument is interpreted as the value of the unique coefficient (case where scalar type match the index type)
template<typename T>
EIGEN_DEVICE_FUNC
@@ -846,7 +911,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
{
this->derived() = r;
}
// For fixed-size Array<Scalar,...>
template<typename T>
EIGEN_DEVICE_FUNC
@@ -858,7 +923,7 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
{
Base::setConstant(val0);
}
// For fixed-size Array<Index,...>
template<typename T>
EIGEN_DEVICE_FUNC
@@ -872,34 +937,34 @@ class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
{
Base::setConstant(val0);
}
template<typename MatrixTypeA, typename MatrixTypeB, bool SwapPointers>
friend struct internal::matrix_swap_impl;
public:
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** \internal
* \brief Override DenseBase::swap() since for dynamic-sized matrices
* of same type it is enough to swap the data pointers.
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void swap(DenseBase<OtherDerived> & other)
{
enum { SwapPointers = internal::is_same<Derived, OtherDerived>::value && Base::SizeAtCompileTime==Dynamic };
internal::matrix_swap_impl<Derived, OtherDerived, bool(SwapPointers)>::run(this->derived(), other.derived());
}
/** \internal
* \brief const version forwarded to DenseBase::swap
*/
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void swap(DenseBase<OtherDerived> const & other)
{ Base::swap(other.derived()); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE void _check_template_params()
{
EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, (Options&RowMajor)==RowMajor)
@@ -923,13 +988,19 @@ namespace internal {
template <typename Derived, typename OtherDerived, bool IsVector>
struct conservative_resize_like_impl
{
#if EIGEN_HAS_TYPE_TRAITS
static const bool IsRelocatable = std::is_trivially_copyable<typename Derived::Scalar>::value;
#else
static const bool IsRelocatable = !NumTraits<typename Derived::Scalar>::RequireInitialization;
#endif
static void run(DenseBase<Derived>& _this, Index rows, Index cols)
{
if (_this.rows() == rows && _this.cols() == cols) return;
EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(Derived)
if ( ( Derived::IsRowMajor && _this.cols() == cols) || // row-major and we change only the number of rows
(!Derived::IsRowMajor && _this.rows() == rows) ) // column-major and we change only the number of columns
if ( IsRelocatable
&& (( Derived::IsRowMajor && _this.cols() == cols) || // row-major and we change only the number of rows
(!Derived::IsRowMajor && _this.rows() == rows) )) // column-major and we change only the number of columns
{
internal::check_rows_cols_for_overflow<Derived::MaxSizeAtCompileTime>::run(rows, cols);
_this.derived().m_storage.conservativeResize(rows*cols,rows,cols);
@@ -957,8 +1028,9 @@ struct conservative_resize_like_impl
EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(Derived)
EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(OtherDerived)
if ( ( Derived::IsRowMajor && _this.cols() == other.cols()) || // row-major and we change only the number of rows
(!Derived::IsRowMajor && _this.rows() == other.rows()) ) // column-major and we change only the number of columns
if ( IsRelocatable &&
(( Derived::IsRowMajor && _this.cols() == other.cols()) || // row-major and we change only the number of rows
(!Derived::IsRowMajor && _this.rows() == other.rows()) )) // column-major and we change only the number of columns
{
const Index new_rows = other.rows() - _this.rows();
const Index new_cols = other.cols() - _this.cols();
@@ -986,13 +1058,18 @@ template <typename Derived, typename OtherDerived>
struct conservative_resize_like_impl<Derived,OtherDerived,true>
: conservative_resize_like_impl<Derived,OtherDerived,false>
{
using conservative_resize_like_impl<Derived,OtherDerived,false>::run;
typedef conservative_resize_like_impl<Derived,OtherDerived,false> Base;
using Base::run;
using Base::IsRelocatable;
static void run(DenseBase<Derived>& _this, Index size)
{
const Index new_rows = Derived::RowsAtCompileTime==1 ? 1 : size;
const Index new_cols = Derived::RowsAtCompileTime==1 ? size : 1;
_this.derived().m_storage.conservativeResize(size,new_rows,new_cols);
if(IsRelocatable)
_this.derived().m_storage.conservativeResize(size,new_rows,new_cols);
else
Base::run(_this.derived(), new_rows, new_cols);
}
static void run(DenseBase<Derived>& _this, const DenseBase<OtherDerived>& other)
@@ -1003,7 +1080,10 @@ struct conservative_resize_like_impl<Derived,OtherDerived,true>
const Index new_rows = Derived::RowsAtCompileTime==1 ? 1 : other.rows();
const Index new_cols = Derived::RowsAtCompileTime==1 ? other.cols() : 1;
_this.derived().m_storage.conservativeResize(other.size(),new_rows,new_cols);
if(IsRelocatable)
_this.derived().m_storage.conservativeResize(other.size(),new_rows,new_cols);
else
Base::run(_this.derived(), new_rows, new_cols);
if (num_new_elements > 0)
_this.tail(num_new_elements) = other.tail(num_new_elements);
@@ -1014,7 +1094,7 @@ template<typename MatrixTypeA, typename MatrixTypeB, bool SwapPointers>
struct matrix_swap_impl
{
EIGEN_DEVICE_FUNC
static inline void run(MatrixTypeA& a, MatrixTypeB& b)
static EIGEN_STRONG_INLINE void run(MatrixTypeA& a, MatrixTypeB& b)
{
a.base().swap(b);
}
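A usage sketch, not part of the diff, of the two C++11 constructors added to PlainObjectBase above, assuming Matrix/Array expose matching constructors:

#include <Eigen/Dense>
#include <iostream>

int main() {
  // Variadic constructor for fixed-size vectors with more than 4 coefficients.
  Eigen::Matrix<int, 6, 1> v(1, 2, 3, 4, 5, 6);
  // Initializer list of initializer lists: one inner list per row.
  Eigen::MatrixXi m{{1, 2, 3}, {4, 5, 6}};
  // A single nested list on a (column) vector type fills the whole vector.
  Eigen::VectorXi w{{7, 8, 9, 10}};
  std::cout << v.transpose() << "\n" << m << "\n" << w.transpose() << "\n";
  return 0;
}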

View File

@@ -90,18 +90,23 @@ class Product : public ProductImpl<_Lhs,_Rhs,Option,
typedef typename internal::remove_all<LhsNested>::type LhsNestedCleaned;
typedef typename internal::remove_all<RhsNested>::type RhsNestedCleaned;
EIGEN_DEVICE_FUNC Product(const Lhs& lhs, const Rhs& rhs) : m_lhs(lhs), m_rhs(rhs)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Product(const Lhs& lhs, const Rhs& rhs) : m_lhs(lhs), m_rhs(rhs)
{
eigen_assert(lhs.cols() == rhs.rows()
&& "invalid matrix product"
&& "if you wanted a coeff-wise or a dot product use the respective explicit functions");
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rows() const { return m_lhs.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index cols() const { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const { return m_lhs.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const { return m_rhs.cols(); }
EIGEN_DEVICE_FUNC const LhsNestedCleaned& lhs() const { return m_lhs; }
EIGEN_DEVICE_FUNC const RhsNestedCleaned& rhs() const { return m_rhs; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const LhsNestedCleaned& lhs() const { return m_lhs; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const RhsNestedCleaned& rhs() const { return m_rhs; }
protected:
@@ -116,7 +121,7 @@ class dense_product_base
: public internal::dense_xpr_base<Product<Lhs,Rhs,Option> >::type
{};
/** Convertion to scalar for inner-products */
/** Conversion to scalar for inner-products */
template<typename Lhs, typename Rhs, int Option>
class dense_product_base<Lhs, Rhs, Option, InnerProduct>
: public internal::dense_xpr_base<Product<Lhs,Rhs,Option> >::type
@@ -127,7 +132,7 @@ public:
using Base::derived;
typedef typename Base::Scalar Scalar;
EIGEN_STRONG_INLINE operator const Scalar() const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE operator const Scalar() const
{
return internal::evaluator<ProductXpr>(derived()).coeff(0,0);
}
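A tiny example, not part of the diff, relying on the scalar conversion operator shown above for inner products:

#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::Vector3d u(1, 2, 3), v(4, 5, 6);
  // u.transpose()*v is a 1x1 inner-product expression; the conversion operator
  // lets it be used directly as a scalar.
  double d = u.transpose() * v;
  std::cout << d << " " << u.dot(v) << "\n";
  return 0;
}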

View File

@@ -20,7 +20,7 @@ namespace internal {
/** \internal
* Evaluator of a product expression.
* Since products require special treatments to handle all possible cases,
* we simply deffer the evaluation logic to a product_evaluator class
* we simply defer the evaluation logic to a product_evaluator class
* which offers more partial specialization possibilities.
*
* \sa class product_evaluator
@@ -128,7 +128,7 @@ protected:
PlainObject m_result;
};
// The following three shortcuts are enabled only if the scalar types match excatly.
// The following three shortcuts are enabled only if the scalar types match exactly.
// TODO: we could enable them for different scalar types when the product is not vectorized.
// Dense = Product
@@ -137,7 +137,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,Options>, internal::assign_op<Scal
typename enable_if<(Options==DefaultProduct || Options==AliasFreeProduct)>::type>
{
typedef Product<Lhs,Rhs,Options> SrcXprType;
static EIGEN_STRONG_INLINE
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op<Scalar,Scalar> &)
{
Index dstRows = src.rows();
@@ -155,7 +155,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,Options>, internal::add_assign_op<
typename enable_if<(Options==DefaultProduct || Options==AliasFreeProduct)>::type>
{
typedef Product<Lhs,Rhs,Options> SrcXprType;
static EIGEN_STRONG_INLINE
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void run(DstXprType &dst, const SrcXprType &src, const internal::add_assign_op<Scalar,Scalar> &)
{
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols());
@@ -170,7 +170,7 @@ struct Assignment<DstXprType, Product<Lhs,Rhs,Options>, internal::sub_assign_op<
typename enable_if<(Options==DefaultProduct || Options==AliasFreeProduct)>::type>
{
typedef Product<Lhs,Rhs,Options> SrcXprType;
static EIGEN_STRONG_INLINE
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void run(DstXprType &dst, const SrcXprType &src, const internal::sub_assign_op<Scalar,Scalar> &)
{
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols());
@@ -190,7 +190,7 @@ struct Assignment<DstXprType, CwiseBinaryOp<internal::scalar_product_op<ScalarBi
typedef CwiseBinaryOp<internal::scalar_product_op<ScalarBis,Scalar>,
const CwiseNullaryOp<internal::scalar_constant_op<ScalarBis>,Plain>,
const Product<Lhs,Rhs,DefaultProduct> > SrcXprType;
static EIGEN_STRONG_INLINE
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void run(DstXprType &dst, const SrcXprType &src, const AssignFunc& func)
{
call_assignment_no_alias(dst, (src.lhs().functor().m_other * src.rhs().lhs())*src.rhs().rhs(), func);
@@ -217,7 +217,7 @@ template<typename DstXprType, typename OtherXpr, typename ProductType, typename
struct assignment_from_xpr_op_product
{
template<typename SrcXprType, typename InitialFunc>
static EIGEN_STRONG_INLINE
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void run(DstXprType &dst, const SrcXprType &src, const InitialFunc& /*func*/)
{
call_assignment_no_alias(dst, src.lhs(), Func1());
@@ -246,19 +246,19 @@ template<typename Lhs, typename Rhs>
struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,InnerProduct>
{
template<typename Dst>
static EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
dst.coeffRef(0,0) = (lhs.transpose().cwiseProduct(rhs)).sum();
}
template<typename Dst>
static EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
dst.coeffRef(0,0) += (lhs.transpose().cwiseProduct(rhs)).sum();
}
template<typename Dst>
static EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ dst.coeffRef(0,0) -= (lhs.transpose().cwiseProduct(rhs)).sum(); }
};
@@ -269,10 +269,10 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,InnerProduct>
// Column major result
template<typename Dst, typename Lhs, typename Rhs, typename Func>
void outer_product_selector_run(Dst& dst, const Lhs &lhs, const Rhs &rhs, const Func& func, const false_type&)
void EIGEN_DEVICE_FUNC outer_product_selector_run(Dst& dst, const Lhs &lhs, const Rhs &rhs, const Func& func, const false_type&)
{
evaluator<Rhs> rhsEval(rhs);
typename nested_eval<Lhs,Rhs::SizeAtCompileTime>::type actual_lhs(lhs);
ei_declare_local_nested_eval(Lhs,lhs,Rhs::SizeAtCompileTime,actual_lhs);
// FIXME if cols is large enough, then it might be useful to make sure that lhs is sequentially stored
// FIXME not very good if rhs is real and lhs complex while alpha is real too
const Index cols = dst.cols();
@@ -282,10 +282,10 @@ void outer_product_selector_run(Dst& dst, const Lhs &lhs, const Rhs &rhs, const
// Row major result
template<typename Dst, typename Lhs, typename Rhs, typename Func>
void outer_product_selector_run(Dst& dst, const Lhs &lhs, const Rhs &rhs, const Func& func, const true_type&)
void EIGEN_DEVICE_FUNC outer_product_selector_run(Dst& dst, const Lhs &lhs, const Rhs &rhs, const Func& func, const true_type&)
{
evaluator<Lhs> lhsEval(lhs);
typename nested_eval<Rhs,Lhs::SizeAtCompileTime>::type actual_rhs(rhs);
ei_declare_local_nested_eval(Rhs,rhs,Lhs::SizeAtCompileTime,actual_rhs);
// FIXME if rows is large enough, then it might be useful to make sure that rhs is sequentially stored
// FIXME not very good if lhs is real and rhs complex while alpha is real too
const Index rows = dst.rows();
@@ -300,37 +300,37 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,OuterProduct>
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
// TODO it would be nice to be able to exploit our *_assign_op functors for that purpose
struct set { template<typename Dst, typename Src> void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() = src; } };
struct add { template<typename Dst, typename Src> void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() += src; } };
struct sub { template<typename Dst, typename Src> void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() -= src; } };
struct set { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() = src; } };
struct add { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() += src; } };
struct sub { template<typename Dst, typename Src> EIGEN_DEVICE_FUNC void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() -= src; } };
struct adds {
Scalar m_scale;
explicit adds(const Scalar& s) : m_scale(s) {}
template<typename Dst, typename Src> void operator()(const Dst& dst, const Src& src) const {
template<typename Dst, typename Src> void EIGEN_DEVICE_FUNC operator()(const Dst& dst, const Src& src) const {
dst.const_cast_derived() += m_scale * src;
}
};
template<typename Dst>
static EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, set(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, add(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
internal::outer_product_selector_run(dst, lhs, rhs, sub(), is_row_major<Dst>());
}
template<typename Dst>
static EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
internal::outer_product_selector_run(dst, lhs, rhs, adds(alpha), is_row_major<Dst>());
}
@@ -345,19 +345,19 @@ struct generic_product_impl_base
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dst>
static EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ dst.setZero(); scaleAndAddTo(dst, lhs, rhs, Scalar(1)); }
template<typename Dst>
static EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ scaleAndAddTo(dst,lhs, rhs, Scalar(1)); }
template<typename Dst>
static EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{ scaleAndAddTo(dst, lhs, rhs, Scalar(-1)); }
template<typename Dst>
static EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{ Derived::scaleAndAddTo(dst,lhs,rhs,alpha); }
};
@@ -373,7 +373,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemvProduct>
typedef typename internal::remove_all<typename internal::conditional<int(Side)==OnTheRight,LhsNested,RhsNested>::type>::type MatrixType;
template<typename Dest>
static EIGEN_STRONG_INLINE void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
LhsNested actual_lhs(lhs);
RhsNested actual_rhs(rhs);
@@ -390,7 +390,7 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dst>
static EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
// Same as: dst.noalias() = lhs.lazyProduct(rhs);
// but easier on the compiler side
@@ -398,48 +398,71 @@ struct generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,CoeffBasedProductMode>
}
template<typename Dst>
static EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
// dst.noalias() += lhs.lazyProduct(rhs);
call_assignment_no_alias(dst, lhs.lazyProduct(rhs), internal::add_assign_op<typename Dst::Scalar,Scalar>());
}
template<typename Dst>
static EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs)
{
// dst.noalias() -= lhs.lazyProduct(rhs);
call_assignment_no_alias(dst, lhs.lazyProduct(rhs), internal::sub_assign_op<typename Dst::Scalar,Scalar>());
}
// Catch "dst {,+,-}= (s*A)*B" and evaluate it lazily by moving out the scalar factor:
// dst {,+,-}= s * (A.lazyProduct(B))
// This is a huge benefit for heap-allocated matrix types as it save one costly allocation.
// For them, this strategy is also faster than simply by-passing the heap allocation through
// stack allocation.
// For fixed sizes matrices, this is less obvious, it is sometimes x2 faster, but sometimes x3 slower,
// and the behavior depends also a lot on the compiler... so let's be conservative and enable them for dynamic-size only,
// that is when coming from generic_product_impl<...,GemmProduct> in file GeneralMatrixMatrix.h
template<typename Dst, typename Scalar1, typename Scalar2, typename Plain1, typename Xpr2, typename Func>
// This is a special evaluation path called from generic_product_impl<...,GemmProduct> in file GeneralMatrixMatrix.h
// This variant tries to extract scalar multiples from both the LHS and RHS and factor them out. For instance:
// dst {,+,-}= (s1*A)*(B*s2)
// will be rewritten as:
// dst {,+,-}= (s1*s2) * (A.lazyProduct(B))
// There are at least four benefits of doing so:
// 1 - huge performance gain for heap-allocated matrix types as it saves costly allocations.
// 2 - it is faster than simply by-passing the heap allocation through stack allocation.
// 3 - it makes this fallback consistent with the heavy GEMM routine.
// 4 - it fully by-passes huge stack allocation attempts when multiplying huge fixed-size matrices.
// (see https://stackoverflow.com/questions/54738495)
// For small fixed-size matrices, however, the gains are less obvious: it is sometimes x2 faster, but sometimes x3 slower,
// and the behavior also depends a lot on the compiler... This is why this re-writing strategy is currently
// enabled only when falling back from the main GEMM.
template<typename Dst, typename Func>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void eval_dynamic(Dst& dst, const CwiseBinaryOp<internal::scalar_product_op<Scalar1,Scalar2>,
const CwiseNullaryOp<internal::scalar_constant_op<Scalar1>, Plain1>, Xpr2>& lhs, const Rhs& rhs, const Func &func)
void eval_dynamic(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Func &func)
{
call_assignment_no_alias(dst, lhs.lhs().functor().m_other * lhs.rhs().lazyProduct(rhs), func);
enum {
HasScalarFactor = blas_traits<Lhs>::HasScalarFactor || blas_traits<Rhs>::HasScalarFactor,
ConjLhs = blas_traits<Lhs>::NeedToConjugate,
ConjRhs = blas_traits<Rhs>::NeedToConjugate
};
// FIXME: in c++11 this should be auto, and extractScalarFactor should also return auto
// this is important for real*complex_mat
Scalar actualAlpha = blas_traits<Lhs>::extractScalarFactor(lhs)
* blas_traits<Rhs>::extractScalarFactor(rhs);
eval_dynamic_impl(dst,
blas_traits<Lhs>::extract(lhs).template conjugateIf<ConjLhs>(),
blas_traits<Rhs>::extract(rhs).template conjugateIf<ConjRhs>(),
func,
actualAlpha,
typename conditional<HasScalarFactor,true_type,false_type>::type());
}
// Here, we we always have LhsT==Lhs, but we need to make it a template type to make the above
// overload more specialized.
template<typename Dst, typename LhsT, typename Func>
protected:
template<typename Dst, typename LhsT, typename RhsT, typename Func, typename Scalar>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void eval_dynamic(Dst& dst, const LhsT& lhs, const Rhs& rhs, const Func &func)
void eval_dynamic_impl(Dst& dst, const LhsT& lhs, const RhsT& rhs, const Func &func, const Scalar& s /* == 1 */, false_type)
{
call_assignment_no_alias(dst, lhs.lazyProduct(rhs), func);
EIGEN_UNUSED_VARIABLE(s);
eigen_internal_assert(s==Scalar(1));
call_restricted_packet_assignment_no_alias(dst, lhs.lazyProduct(rhs), func);
}
template<typename Dst, typename LhsT, typename RhsT, typename Func, typename Scalar>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void eval_dynamic_impl(Dst& dst, const LhsT& lhs, const RhsT& rhs, const Func &func, const Scalar& s, true_type)
{
call_restricted_packet_assignment_no_alias(dst, s * lhs.lazyProduct(rhs), func);
}
// template<typename Dst>
// static inline void scaleAndAddTo(Dst& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
// { dst.noalias() += alpha * lhs.lazyProduct(rhs); }
};
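A user-level sketch, not part of the diff, of the rewriting described in the comment above: scalar factors on both operands are extracted so the fallback evaluates a single lazy product scaled once.

#include <Eigen/Dense>

int main() {
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(8, 8);
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(8, 8);
  Eigen::MatrixXd C(8, 8);
  // Conceptually evaluated as (2.0 * 3.0) * A.lazyProduct(B) when this
  // coefficient-based fallback is taken, instead of materializing 2.0*A and B*3.0.
  C.noalias() = (2.0 * A) * (B * 3.0);
  return 0;
}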
// This specialization enforces the use of a coefficient-based evaluation strategy
@@ -582,7 +605,8 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
* which is why we don't set the LinearAccessBit.
* TODO: this seems possible when the result is a vector
*/
EIGEN_DEVICE_FUNC const CoeffReturnType coeff(Index index) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const CoeffReturnType coeff(Index index) const
{
const Index row = (RowsAtCompileTime == 1 || MaxRowsAtCompileTime==1) ? 0 : index;
const Index col = (RowsAtCompileTime == 1 || MaxRowsAtCompileTime==1) ? index : 0;
@@ -590,6 +614,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
}
template<int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const PacketType packet(Index row, Index col) const
{
PacketType res;
@@ -601,6 +626,7 @@ struct product_evaluator<Product<Lhs, Rhs, LazyProduct>, ProductTag, DenseShape,
}
template<int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const PacketType packet(Index index) const
{
const Index row = (RowsAtCompileTime == 1 || MaxRowsAtCompileTime==1) ? 0 : index;
@@ -629,7 +655,8 @@ struct product_evaluator<Product<Lhs, Rhs, DefaultProduct>, LazyCoeffBasedProduc
enum {
Flags = Base::Flags | EvalBeforeNestingBit
};
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit product_evaluator(const XprType& xpr)
: Base(BaseProduct(xpr.lhs(),xpr.rhs()))
{}
};
@@ -767,7 +794,8 @@ struct generic_product_impl<Lhs,Rhs,SelfAdjointShape,DenseShape,ProductTag>
typedef typename Product<Lhs,Rhs>::Scalar Scalar;
template<typename Dest>
static void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
static EIGEN_DEVICE_FUNC
void scaleAndAddTo(Dest& dst, const Lhs& lhs, const Rhs& rhs, const Scalar& alpha)
{
selfadjoint_product_impl<typename Lhs::MatrixType,Lhs::Mode,false,Rhs,0,Rhs::IsVectorAtCompileTime>::run(dst, lhs.nestedExpression(), rhs, alpha);
}
@@ -802,13 +830,21 @@ public:
MatrixFlags = evaluator<MatrixType>::Flags,
DiagFlags = evaluator<DiagonalType>::Flags,
_StorageOrder = MatrixFlags & RowMajorBit ? RowMajor : ColMajor,
_StorageOrder = (Derived::MaxRowsAtCompileTime==1 && Derived::MaxColsAtCompileTime!=1) ? RowMajor
: (Derived::MaxColsAtCompileTime==1 && Derived::MaxRowsAtCompileTime!=1) ? ColMajor
: MatrixFlags & RowMajorBit ? RowMajor : ColMajor,
_SameStorageOrder = _StorageOrder == (MatrixFlags & RowMajorBit ? RowMajor : ColMajor),
_ScalarAccessOnDiag = !((int(_StorageOrder) == ColMajor && int(ProductOrder) == OnTheLeft)
||(int(_StorageOrder) == RowMajor && int(ProductOrder) == OnTheRight)),
_SameTypes = is_same<typename MatrixType::Scalar, typename DiagonalType::Scalar>::value,
// FIXME currently we need same types, but in the future the next rule should be the one
//_Vectorizable = bool(int(MatrixFlags)&PacketAccessBit) && ((!_PacketOnDiag) || (_SameTypes && bool(int(DiagFlags)&PacketAccessBit))),
_Vectorizable = bool(int(MatrixFlags)&PacketAccessBit) && _SameTypes && (_ScalarAccessOnDiag || (bool(int(DiagFlags)&PacketAccessBit))),
_Vectorizable = bool(int(MatrixFlags)&PacketAccessBit)
&& _SameTypes
&& (_SameStorageOrder || (MatrixFlags&LinearAccessBit)==LinearAccessBit)
&& (_ScalarAccessOnDiag || (bool(int(DiagFlags)&PacketAccessBit))),
_LinearAccessMask = (MatrixType::RowsAtCompileTime==1 || MatrixType::ColsAtCompileTime==1) ? LinearAccessBit : 0,
Flags = ((HereditaryBits|_LinearAccessMask) & (unsigned int)(MatrixFlags)) | (_Vectorizable ? PacketAccessBit : 0),
Alignment = evaluator<MatrixType>::Alignment,
@@ -869,10 +905,10 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DiagonalSha
typedef Product<Lhs, Rhs, ProductKind> XprType;
typedef typename XprType::PlainObject PlainObject;
typedef typename Lhs::DiagonalVectorType DiagonalType;
enum {
StorageOrder = int(Rhs::Flags) & RowMajorBit ? RowMajor : ColMajor
};
enum { StorageOrder = Base::_StorageOrder };
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
: Base(xpr.rhs(), xpr.lhs().diagonal())
@@ -884,7 +920,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DiagonalSha
return m_diagImpl.coeff(row) * m_matImpl.coeff(row, col);
}
#ifndef __CUDACC__
#ifndef EIGEN_GPUCC
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const
{
@@ -916,7 +952,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DenseShape,
typedef Product<Lhs, Rhs, ProductKind> XprType;
typedef typename XprType::PlainObject PlainObject;
enum { StorageOrder = int(Lhs::Flags) & RowMajorBit ? RowMajor : ColMajor };
enum { StorageOrder = Base::_StorageOrder };
EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr)
: Base(xpr.lhs(), xpr.rhs().diagonal())
@@ -928,7 +964,7 @@ struct product_evaluator<Product<Lhs, Rhs, ProductKind>, ProductTag, DenseShape,
return m_matImpl.coeff(row, col) * m_diagImpl.coeff(col);
}
#ifndef __CUDACC__
#ifndef EIGEN_GPUCC
template<int LoadMode,typename PacketType>
EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const
{

View File

@@ -128,7 +128,7 @@ DenseBase<Derived>::Random()
* \sa class CwiseNullaryOp, setRandom(Index), setRandom(Index,Index)
*/
template<typename Derived>
inline Derived& DenseBase<Derived>::setRandom()
EIGEN_DEVICE_FUNC inline Derived& DenseBase<Derived>::setRandom()
{
return *this = Random(rows(), cols());
}

View File

@@ -23,23 +23,29 @@ namespace internal {
* Part 1 : the logic deciding a strategy for vectorization and unrolling
***************************************************************************/
template<typename Func, typename Derived>
template<typename Func, typename Evaluator>
struct redux_traits
{
public:
typedef typename find_best_packet<typename Derived::Scalar,Derived::SizeAtCompileTime>::type PacketType;
typedef typename find_best_packet<typename Evaluator::Scalar,Evaluator::SizeAtCompileTime>::type PacketType;
enum {
PacketSize = unpacket_traits<PacketType>::size,
InnerMaxSize = int(Derived::IsRowMajor)
? Derived::MaxColsAtCompileTime
: Derived::MaxRowsAtCompileTime
InnerMaxSize = int(Evaluator::IsRowMajor)
? Evaluator::MaxColsAtCompileTime
: Evaluator::MaxRowsAtCompileTime,
OuterMaxSize = int(Evaluator::IsRowMajor)
? Evaluator::MaxRowsAtCompileTime
: Evaluator::MaxColsAtCompileTime,
SliceVectorizedWork = int(InnerMaxSize)==Dynamic ? Dynamic
: int(OuterMaxSize)==Dynamic ? (int(InnerMaxSize)>=int(PacketSize) ? Dynamic : 0)
: (int(InnerMaxSize)/int(PacketSize)) * int(OuterMaxSize)
};
enum {
MightVectorize = (int(Derived::Flags)&ActualPacketAccessBit)
MightVectorize = (int(Evaluator::Flags)&ActualPacketAccessBit)
&& (functor_traits<Func>::PacketAccess),
MayLinearVectorize = bool(MightVectorize) && (int(Derived::Flags)&LinearAccessBit),
MaySliceVectorize = bool(MightVectorize) && int(InnerMaxSize)>=3*PacketSize
MayLinearVectorize = bool(MightVectorize) && (int(Evaluator::Flags)&LinearAccessBit),
MaySliceVectorize = bool(MightVectorize) && (int(SliceVectorizedWork)==Dynamic || int(SliceVectorizedWork)>=3)
};
public:
@@ -51,8 +57,8 @@ public:
public:
enum {
Cost = Derived::SizeAtCompileTime == Dynamic ? HugeCost
: Derived::SizeAtCompileTime * Derived::CoeffReadCost + (Derived::SizeAtCompileTime-1) * functor_traits<Func>::Cost,
Cost = Evaluator::SizeAtCompileTime == Dynamic ? HugeCost
: Evaluator::SizeAtCompileTime * Evaluator::CoeffReadCost + (Evaluator::SizeAtCompileTime-1) * functor_traits<Func>::Cost,
UnrollingLimit = EIGEN_UNROLLING_LIMIT * (int(Traversal) == int(DefaultTraversal) ? 1 : int(PacketSize))
};
@@ -64,18 +70,20 @@ public:
#ifdef EIGEN_DEBUG_ASSIGN
static void debug()
{
std::cerr << "Xpr: " << typeid(typename Derived::XprType).name() << std::endl;
std::cerr << "Xpr: " << typeid(typename Evaluator::XprType).name() << std::endl;
std::cerr.setf(std::ios::hex, std::ios::basefield);
EIGEN_DEBUG_VAR(Derived::Flags)
EIGEN_DEBUG_VAR(Evaluator::Flags)
std::cerr.unsetf(std::ios::hex);
EIGEN_DEBUG_VAR(InnerMaxSize)
EIGEN_DEBUG_VAR(OuterMaxSize)
EIGEN_DEBUG_VAR(SliceVectorizedWork)
EIGEN_DEBUG_VAR(PacketSize)
EIGEN_DEBUG_VAR(MightVectorize)
EIGEN_DEBUG_VAR(MayLinearVectorize)
EIGEN_DEBUG_VAR(MaySliceVectorize)
EIGEN_DEBUG_VAR(Traversal)
std::cerr << "Traversal" << " = " << Traversal << " (" << demangle_traversal(Traversal) << ")" << std::endl;
EIGEN_DEBUG_VAR(UnrollingLimit)
EIGEN_DEBUG_VAR(Unrolling)
std::cerr << "Unrolling" << " = " << Unrolling << " (" << demangle_unrolling(Unrolling) << ")" << std::endl;
std::cerr << std::endl;
}
#endif
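A small illustration, not part of the diff, of the decisions made by redux_traits above:

#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::MatrixXf m = Eigen::MatrixXf::Random(100, 100);
  // A plain matrix has linear, packet-friendly storage, so the full reduction
  // below is expected to use the linearly vectorized traversal.
  std::cout << m.sum() << "\n";
  // A general block has no linear access; whether its reduction is (slice-)
  // vectorized is what the new SliceVectorizedWork heuristic decides.
  std::cout << m.block(1, 1, 98, 98).sum() << "\n";
  return 0;
}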
@@ -87,88 +95,86 @@ public:
/*** no vectorization ***/
template<typename Func, typename Derived, int Start, int Length>
template<typename Func, typename Evaluator, int Start, int Length>
struct redux_novec_unroller
{
enum {
HalfLength = Length/2
};
typedef typename Derived::Scalar Scalar;
typedef typename Evaluator::Scalar Scalar;
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE Scalar run(const Derived &mat, const Func& func)
static EIGEN_STRONG_INLINE Scalar run(const Evaluator &eval, const Func& func)
{
return func(redux_novec_unroller<Func, Derived, Start, HalfLength>::run(mat,func),
redux_novec_unroller<Func, Derived, Start+HalfLength, Length-HalfLength>::run(mat,func));
return func(redux_novec_unroller<Func, Evaluator, Start, HalfLength>::run(eval,func),
redux_novec_unroller<Func, Evaluator, Start+HalfLength, Length-HalfLength>::run(eval,func));
}
};
template<typename Func, typename Derived, int Start>
struct redux_novec_unroller<Func, Derived, Start, 1>
template<typename Func, typename Evaluator, int Start>
struct redux_novec_unroller<Func, Evaluator, Start, 1>
{
enum {
outer = Start / Derived::InnerSizeAtCompileTime,
inner = Start % Derived::InnerSizeAtCompileTime
outer = Start / Evaluator::InnerSizeAtCompileTime,
inner = Start % Evaluator::InnerSizeAtCompileTime
};
typedef typename Derived::Scalar Scalar;
typedef typename Evaluator::Scalar Scalar;
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE Scalar run(const Derived &mat, const Func&)
static EIGEN_STRONG_INLINE Scalar run(const Evaluator &eval, const Func&)
{
return mat.coeffByOuterInner(outer, inner);
return eval.coeffByOuterInner(outer, inner);
}
};
// This is actually dead code and will never be called. It is required
// to prevent false warnings regarding failed inlining though
// for 0 length run() will never be called at all.
template<typename Func, typename Derived, int Start>
struct redux_novec_unroller<Func, Derived, Start, 0>
template<typename Func, typename Evaluator, int Start>
struct redux_novec_unroller<Func, Evaluator, Start, 0>
{
typedef typename Derived::Scalar Scalar;
typedef typename Evaluator::Scalar Scalar;
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE Scalar run(const Derived&, const Func&) { return Scalar(); }
static EIGEN_STRONG_INLINE Scalar run(const Evaluator&, const Func&) { return Scalar(); }
};
/*** vectorization ***/
template<typename Func, typename Derived, int Start, int Length>
template<typename Func, typename Evaluator, int Start, int Length>
struct redux_vec_unroller
{
enum {
PacketSize = redux_traits<Func, Derived>::PacketSize,
HalfLength = Length/2
};
typedef typename Derived::Scalar Scalar;
typedef typename redux_traits<Func, Derived>::PacketType PacketScalar;
static EIGEN_STRONG_INLINE PacketScalar run(const Derived &mat, const Func& func)
template<typename PacketType>
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE PacketType run(const Evaluator &eval, const Func& func)
{
enum {
PacketSize = unpacket_traits<PacketType>::size,
HalfLength = Length/2
};
return func.packetOp(
redux_vec_unroller<Func, Derived, Start, HalfLength>::run(mat,func),
redux_vec_unroller<Func, Derived, Start+HalfLength, Length-HalfLength>::run(mat,func) );
redux_vec_unroller<Func, Evaluator, Start, HalfLength>::template run<PacketType>(eval,func),
redux_vec_unroller<Func, Evaluator, Start+HalfLength, Length-HalfLength>::template run<PacketType>(eval,func) );
}
};
template<typename Func, typename Derived, int Start>
struct redux_vec_unroller<Func, Derived, Start, 1>
template<typename Func, typename Evaluator, int Start>
struct redux_vec_unroller<Func, Evaluator, Start, 1>
{
enum {
index = Start * redux_traits<Func, Derived>::PacketSize,
outer = index / int(Derived::InnerSizeAtCompileTime),
inner = index % int(Derived::InnerSizeAtCompileTime),
alignment = Derived::Alignment
};
typedef typename Derived::Scalar Scalar;
typedef typename redux_traits<Func, Derived>::PacketType PacketScalar;
static EIGEN_STRONG_INLINE PacketScalar run(const Derived &mat, const Func&)
template<typename PacketType>
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE PacketType run(const Evaluator &eval, const Func&)
{
return mat.template packetByOuterInner<alignment,PacketScalar>(outer, inner);
enum {
PacketSize = unpacket_traits<PacketType>::size,
index = Start * PacketSize,
outer = index / int(Evaluator::InnerSizeAtCompileTime),
inner = index % int(Evaluator::InnerSizeAtCompileTime),
alignment = Evaluator::Alignment
};
return eval.template packetByOuterInner<alignment,PacketType>(outer, inner);
}
};
@@ -176,53 +182,65 @@ struct redux_vec_unroller<Func, Derived, Start, 1>
* Part 3 : implementation of all cases
***************************************************************************/
template<typename Func, typename Derived,
int Traversal = redux_traits<Func, Derived>::Traversal,
int Unrolling = redux_traits<Func, Derived>::Unrolling
template<typename Func, typename Evaluator,
int Traversal = redux_traits<Func, Evaluator>::Traversal,
int Unrolling = redux_traits<Func, Evaluator>::Unrolling
>
struct redux_impl;
template<typename Func, typename Derived>
struct redux_impl<Func, Derived, DefaultTraversal, NoUnrolling>
template<typename Func, typename Evaluator>
struct redux_impl<Func, Evaluator, DefaultTraversal, NoUnrolling>
{
typedef typename Derived::Scalar Scalar;
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE Scalar run(const Derived &mat, const Func& func)
typedef typename Evaluator::Scalar Scalar;
template<typename XprType>
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE
Scalar run(const Evaluator &eval, const Func& func, const XprType& xpr)
{
eigen_assert(mat.rows()>0 && mat.cols()>0 && "you are using an empty matrix");
eigen_assert(xpr.rows()>0 && xpr.cols()>0 && "you are using an empty matrix");
Scalar res;
res = mat.coeffByOuterInner(0, 0);
for(Index i = 1; i < mat.innerSize(); ++i)
res = func(res, mat.coeffByOuterInner(0, i));
for(Index i = 1; i < mat.outerSize(); ++i)
for(Index j = 0; j < mat.innerSize(); ++j)
res = func(res, mat.coeffByOuterInner(i, j));
res = eval.coeffByOuterInner(0, 0);
for(Index i = 1; i < xpr.innerSize(); ++i)
res = func(res, eval.coeffByOuterInner(0, i));
for(Index i = 1; i < xpr.outerSize(); ++i)
for(Index j = 0; j < xpr.innerSize(); ++j)
res = func(res, eval.coeffByOuterInner(i, j));
return res;
}
};
template<typename Func, typename Derived>
struct redux_impl<Func,Derived, DefaultTraversal, CompleteUnrolling>
: public redux_novec_unroller<Func,Derived, 0, Derived::SizeAtCompileTime>
{};
template<typename Func, typename Derived>
struct redux_impl<Func, Derived, LinearVectorizedTraversal, NoUnrolling>
template<typename Func, typename Evaluator>
struct redux_impl<Func,Evaluator, DefaultTraversal, CompleteUnrolling>
: redux_novec_unroller<Func,Evaluator, 0, Evaluator::SizeAtCompileTime>
{
typedef typename Derived::Scalar Scalar;
typedef typename redux_traits<Func, Derived>::PacketType PacketScalar;
static Scalar run(const Derived &mat, const Func& func)
typedef redux_novec_unroller<Func,Evaluator, 0, Evaluator::SizeAtCompileTime> Base;
typedef typename Evaluator::Scalar Scalar;
template<typename XprType>
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE
Scalar run(const Evaluator &eval, const Func& func, const XprType& /*xpr*/)
{
const Index size = mat.size();
return Base::run(eval,func);
}
};
template<typename Func, typename Evaluator>
struct redux_impl<Func, Evaluator, LinearVectorizedTraversal, NoUnrolling>
{
typedef typename Evaluator::Scalar Scalar;
typedef typename redux_traits<Func, Evaluator>::PacketType PacketScalar;
template<typename XprType>
static Scalar run(const Evaluator &eval, const Func& func, const XprType& xpr)
{
const Index size = xpr.size();
const Index packetSize = redux_traits<Func, Derived>::PacketSize;
const Index packetSize = redux_traits<Func, Evaluator>::PacketSize;
const int packetAlignment = unpacket_traits<PacketScalar>::alignment;
enum {
alignment0 = (bool(Derived::Flags & DirectAccessBit) && bool(packet_traits<Scalar>::AlignedOnScalar)) ? int(packetAlignment) : int(Unaligned),
alignment = EIGEN_PLAIN_ENUM_MAX(alignment0, Derived::Alignment)
alignment0 = (bool(Evaluator::Flags & DirectAccessBit) && bool(packet_traits<Scalar>::AlignedOnScalar)) ? int(packetAlignment) : int(Unaligned),
alignment = EIGEN_PLAIN_ENUM_MAX(alignment0, Evaluator::Alignment)
};
const Index alignedStart = internal::first_default_aligned(mat.nestedExpression());
const Index alignedStart = internal::first_default_aligned(xpr);
const Index alignedSize2 = ((size-alignedStart)/(2*packetSize))*(2*packetSize);
const Index alignedSize = ((size-alignedStart)/(packetSize))*(packetSize);
const Index alignedEnd2 = alignedStart + alignedSize2;
@@ -230,34 +248,34 @@ struct redux_impl<Func, Derived, LinearVectorizedTraversal, NoUnrolling>
Scalar res;
if(alignedSize)
{
PacketScalar packet_res0 = mat.template packet<alignment,PacketScalar>(alignedStart);
PacketScalar packet_res0 = eval.template packet<alignment,PacketScalar>(alignedStart);
if(alignedSize>packetSize) // we have at least two packets to partly unroll the loop
{
PacketScalar packet_res1 = mat.template packet<alignment,PacketScalar>(alignedStart+packetSize);
PacketScalar packet_res1 = eval.template packet<alignment,PacketScalar>(alignedStart+packetSize);
for(Index index = alignedStart + 2*packetSize; index < alignedEnd2; index += 2*packetSize)
{
packet_res0 = func.packetOp(packet_res0, mat.template packet<alignment,PacketScalar>(index));
packet_res1 = func.packetOp(packet_res1, mat.template packet<alignment,PacketScalar>(index+packetSize));
packet_res0 = func.packetOp(packet_res0, eval.template packet<alignment,PacketScalar>(index));
packet_res1 = func.packetOp(packet_res1, eval.template packet<alignment,PacketScalar>(index+packetSize));
}
packet_res0 = func.packetOp(packet_res0,packet_res1);
if(alignedEnd>alignedEnd2)
packet_res0 = func.packetOp(packet_res0, mat.template packet<alignment,PacketScalar>(alignedEnd2));
packet_res0 = func.packetOp(packet_res0, eval.template packet<alignment,PacketScalar>(alignedEnd2));
}
res = func.predux(packet_res0);
for(Index index = 0; index < alignedStart; ++index)
res = func(res,mat.coeff(index));
res = func(res,eval.coeff(index));
for(Index index = alignedEnd; index < size; ++index)
res = func(res,mat.coeff(index));
res = func(res,eval.coeff(index));
}
else // too small to vectorize anything.
// since this is dynamic-size hence inefficient anyway for such small sizes, don't try to optimize.
{
res = mat.coeff(0);
res = eval.coeff(0);
for(Index index = 1; index < size; ++index)
res = func(res,mat.coeff(index));
res = func(res,eval.coeff(index));
}
return res;
@@ -265,130 +283,108 @@ struct redux_impl<Func, Derived, LinearVectorizedTraversal, NoUnrolling>
};
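The loop above keeps two independent packet accumulators (packet_res0/packet_res1) to hide instruction latency before the final predux. A minimal scalar sketch of the same pattern, in plain C++ for illustration only (not the Eigen kernel itself):

#include <cstddef>

// Sum n floats with two independent accumulators plus a scalar tail,
// mirroring the packet_res0/packet_res1 structure of the kernel above.
float sum_two_accumulators(const float* x, std::size_t n)
{
  float acc0 = 0.f, acc1 = 0.f;
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2) {  // main loop, unrolled by two
    acc0 += x[i];
    acc1 += x[i + 1];
  }
  float res = acc0 + acc1;      // the "predux" step: combine the accumulators
  for (; i < n; ++i)            // leftover tail handled scalar-wise
    res += x[i];
  return res;
}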
// NOTE: for SliceVectorizedTraversal we simply bypass unrolling
template<typename Func, typename Derived, int Unrolling>
struct redux_impl<Func, Derived, SliceVectorizedTraversal, Unrolling>
template<typename Func, typename Evaluator, int Unrolling>
struct redux_impl<Func, Evaluator, SliceVectorizedTraversal, Unrolling>
{
typedef typename Derived::Scalar Scalar;
typedef typename redux_traits<Func, Derived>::PacketType PacketType;
typedef typename Evaluator::Scalar Scalar;
typedef typename redux_traits<Func, Evaluator>::PacketType PacketType;
EIGEN_DEVICE_FUNC static Scalar run(const Derived &mat, const Func& func)
template<typename XprType>
EIGEN_DEVICE_FUNC static Scalar run(const Evaluator &eval, const Func& func, const XprType& xpr)
{
eigen_assert(mat.rows()>0 && mat.cols()>0 && "you are using an empty matrix");
const Index innerSize = mat.innerSize();
const Index outerSize = mat.outerSize();
eigen_assert(xpr.rows()>0 && xpr.cols()>0 && "you are using an empty matrix");
const Index innerSize = xpr.innerSize();
const Index outerSize = xpr.outerSize();
enum {
packetSize = redux_traits<Func, Derived>::PacketSize
packetSize = redux_traits<Func, Evaluator>::PacketSize
};
const Index packetedInnerSize = ((innerSize)/packetSize)*packetSize;
Scalar res;
if(packetedInnerSize)
{
PacketType packet_res = mat.template packet<Unaligned,PacketType>(0,0);
PacketType packet_res = eval.template packet<Unaligned,PacketType>(0,0);
for(Index j=0; j<outerSize; ++j)
for(Index i=(j==0?packetSize:0); i<packetedInnerSize; i+=Index(packetSize))
packet_res = func.packetOp(packet_res, mat.template packetByOuterInner<Unaligned,PacketType>(j,i));
packet_res = func.packetOp(packet_res, eval.template packetByOuterInner<Unaligned,PacketType>(j,i));
res = func.predux(packet_res);
for(Index j=0; j<outerSize; ++j)
for(Index i=packetedInnerSize; i<innerSize; ++i)
res = func(res, mat.coeffByOuterInner(j,i));
res = func(res, eval.coeffByOuterInner(j,i));
}
else // too small to vectorize anything.
// since this is dynamic-size hence inefficient anyway for such small sizes, don't try to optimize.
{
res = redux_impl<Func, Derived, DefaultTraversal, NoUnrolling>::run(mat, func);
res = redux_impl<Func, Evaluator, DefaultTraversal, NoUnrolling>::run(eval, func, xpr);
}
return res;
}
};
template<typename Func, typename Derived>
struct redux_impl<Func, Derived, LinearVectorizedTraversal, CompleteUnrolling>
template<typename Func, typename Evaluator>
struct redux_impl<Func, Evaluator, LinearVectorizedTraversal, CompleteUnrolling>
{
typedef typename Derived::Scalar Scalar;
typedef typename Evaluator::Scalar Scalar;
typedef typename redux_traits<Func, Derived>::PacketType PacketScalar;
typedef typename redux_traits<Func, Evaluator>::PacketType PacketType;
enum {
PacketSize = redux_traits<Func, Derived>::PacketSize,
Size = Derived::SizeAtCompileTime,
PacketSize = redux_traits<Func, Evaluator>::PacketSize,
Size = Evaluator::SizeAtCompileTime,
VectorizedSize = (Size / PacketSize) * PacketSize
};
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Derived &mat, const Func& func)
template<typename XprType>
EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE
Scalar run(const Evaluator &eval, const Func& func, const XprType &xpr)
{
eigen_assert(mat.rows()>0 && mat.cols()>0 && "you are using an empty matrix");
EIGEN_ONLY_USED_FOR_DEBUG(xpr)
eigen_assert(xpr.rows()>0 && xpr.cols()>0 && "you are using an empty matrix");
if (VectorizedSize > 0) {
Scalar res = func.predux(redux_vec_unroller<Func, Derived, 0, Size / PacketSize>::run(mat,func));
Scalar res = func.predux(redux_vec_unroller<Func, Evaluator, 0, Size / PacketSize>::template run<PacketType>(eval,func));
if (VectorizedSize != Size)
res = func(res,redux_novec_unroller<Func, Derived, VectorizedSize, Size-VectorizedSize>::run(mat,func));
res = func(res,redux_novec_unroller<Func, Evaluator, VectorizedSize, Size-VectorizedSize>::run(eval,func));
return res;
}
else {
return redux_novec_unroller<Func, Derived, 0, Size>::run(mat,func);
return redux_novec_unroller<Func, Evaluator, 0, Size>::run(eval,func);
}
}
};
// evaluator adaptor
template<typename _XprType>
class redux_evaluator
class redux_evaluator : public internal::evaluator<_XprType>
{
typedef internal::evaluator<_XprType> Base;
public:
typedef _XprType XprType;
EIGEN_DEVICE_FUNC explicit redux_evaluator(const XprType &xpr) : m_evaluator(xpr), m_xpr(xpr) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit redux_evaluator(const XprType &xpr) : Base(xpr) {}
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
typedef typename XprType::PacketScalar PacketScalar;
typedef typename XprType::PacketReturnType PacketReturnType;
enum {
MaxRowsAtCompileTime = XprType::MaxRowsAtCompileTime,
MaxColsAtCompileTime = XprType::MaxColsAtCompileTime,
// TODO we should not remove DirectAccessBit and rather find an elegant way to query the alignment offset at runtime from the evaluator
Flags = evaluator<XprType>::Flags & ~DirectAccessBit,
Flags = Base::Flags & ~DirectAccessBit,
IsRowMajor = XprType::IsRowMajor,
SizeAtCompileTime = XprType::SizeAtCompileTime,
InnerSizeAtCompileTime = XprType::InnerSizeAtCompileTime,
CoeffReadCost = evaluator<XprType>::CoeffReadCost,
Alignment = evaluator<XprType>::Alignment
InnerSizeAtCompileTime = XprType::InnerSizeAtCompileTime
};
EIGEN_DEVICE_FUNC Index rows() const { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC Index cols() const { return m_xpr.cols(); }
EIGEN_DEVICE_FUNC Index size() const { return m_xpr.size(); }
EIGEN_DEVICE_FUNC Index innerSize() const { return m_xpr.innerSize(); }
EIGEN_DEVICE_FUNC Index outerSize() const { return m_xpr.outerSize(); }
EIGEN_DEVICE_FUNC
CoeffReturnType coeff(Index row, Index col) const
{ return m_evaluator.coeff(row, col); }
EIGEN_DEVICE_FUNC
CoeffReturnType coeff(Index index) const
{ return m_evaluator.coeff(index); }
template<int LoadMode, typename PacketType>
PacketType packet(Index row, Index col) const
{ return m_evaluator.template packet<LoadMode,PacketType>(row, col); }
template<int LoadMode, typename PacketType>
PacketType packet(Index index) const
{ return m_evaluator.template packet<LoadMode,PacketType>(index); }
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeffByOuterInner(Index outer, Index inner) const
{ return m_evaluator.coeff(IsRowMajor ? outer : inner, IsRowMajor ? inner : outer); }
{ return Base::coeff(IsRowMajor ? outer : inner, IsRowMajor ? inner : outer); }
template<int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
PacketType packetByOuterInner(Index outer, Index inner) const
{ return m_evaluator.template packet<LoadMode,PacketType>(IsRowMajor ? outer : inner, IsRowMajor ? inner : outer); }
{ return Base::template packet<LoadMode,PacketType>(IsRowMajor ? outer : inner, IsRowMajor ? inner : outer); }
const XprType & nestedExpression() const { return m_xpr; }
protected:
internal::evaluator<XprType> m_evaluator;
const XprType &m_xpr;
};
} // end namespace internal
@@ -403,36 +399,42 @@ protected:
* The template parameter \a BinaryOp is the type of the functor \a func which must be
* an associative operator. Both current C++98 and C++11 functor styles are handled.
*
* \warning the matrix must not be empty, otherwise an assertion is triggered.
*
* \sa DenseBase::sum(), DenseBase::minCoeff(), DenseBase::maxCoeff(), MatrixBase::colwise(), MatrixBase::rowwise()
*/
template<typename Derived>
template<typename Func>
EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::redux(const Func& func) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
typedef typename internal::redux_evaluator<Derived> ThisEvaluator;
ThisEvaluator thisEval(derived());
return internal::redux_impl<Func, ThisEvaluator>::run(thisEval, func);
// The initial expression is passed to the reducer as an additional argument instead of
// passing it as a member of redux_evaluator to help
return internal::redux_impl<Func, ThisEvaluator>::run(thisEval, func, derived());
}
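As the documentation above notes, both C++98 and C++11 functor styles are accepted; a minimal usage sketch of the refactored entry point (assuming a C++11 compiler, illustration only):

#include <Eigen/Dense>
#include <iostream>

int main()
{
  Eigen::VectorXf v(4);
  v << 1.f, 2.f, 3.f, 4.f;

  // redux() folds all coefficients with an associative binary functor;
  // these two calls reproduce sum() and maxCoeff().
  float total = v.redux([](float a, float b) { return a + b; });         // 10
  float mx    = v.redux([](float a, float b) { return a > b ? a : b; }); // 4

  std::cout << total << " " << mx << "\n";
}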
/** \returns the minimum of all coefficients of \c *this.
* \warning the matrix must not be empty, otherwise an assertion is triggered.
* \warning the result is undefined if \c *this contains NaN.
*/
template<typename Derived>
EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff() const
{
return derived().redux(Eigen::internal::scalar_min_op<Scalar,Scalar>());
}
/** \returns the maximum of all coefficients of \c *this.
* \warning the matrix must not be empty, otherwise an assertion is triggered.
* \warning the result is undefined if \c *this contains NaN.
*/
template<typename Derived>
EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff() const
{
return derived().redux(Eigen::internal::scalar_max_op<Scalar,Scalar>());
@@ -445,7 +447,7 @@ DenseBase<Derived>::maxCoeff() const
* \sa trace(), prod(), mean()
*/
template<typename Derived>
EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::sum() const
{
if(SizeAtCompileTime==0 || (SizeAtCompileTime==Dynamic && size()==0))
@@ -458,7 +460,7 @@ DenseBase<Derived>::sum() const
* \sa trace(), prod(), sum()
*/
template<typename Derived>
EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::mean() const
{
#ifdef __INTEL_COMPILER
@@ -479,7 +481,7 @@ DenseBase<Derived>::mean() const
* \sa sum(), mean(), trace()
*/
template<typename Derived>
EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
DenseBase<Derived>::prod() const
{
if(SizeAtCompileTime==0 || (SizeAtCompileTime==Dynamic && size()==0))
@@ -494,7 +496,7 @@ DenseBase<Derived>::prod() const
* \sa diagonal(), sum()
*/
template<typename Derived>
EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
MatrixBase<Derived>::trace() const
{
return derived().diagonal().sum();
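For reference, a short sketch of what the device-enabled reductions above compute on a concrete matrix (standard Eigen API, illustration only):

#include <Eigen/Dense>
#include <iostream>

int main()
{
  Eigen::Matrix2d m;
  m << 1, 2,
       3, 4;

  std::cout << m.sum()   << "\n";  // 1+2+3+4 = 10
  std::cout << m.mean()  << "\n";  // 10/4    = 2.5
  std::cout << m.prod()  << "\n";  // 1*2*3*4 = 24
  std::cout << m.trace() << "\n";  // 1+4     = 5 (sum of the diagonal, as above)
}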

View File

@@ -187,6 +187,8 @@ protected:
* void foo(const Ref<MatrixXf,0,Stride<> >& A) { foo_impl(A); }
* \endcode
*
* See also the following stackoverflow questions for further references:
* - <a href="http://stackoverflow.com/questions/21132538/correct-usage-of-the-eigenref-class">Correct usage of the Eigen::Ref<> class</a>
*
* \sa PlainObjectBase::Map(), \ref TopicStorageOrders
*/

View File

@@ -115,7 +115,7 @@ template<typename MatrixType,int RowFactor,int ColFactor> class Replicate
*/
template<typename Derived>
template<int RowFactor, int ColFactor>
const Replicate<Derived,RowFactor,ColFactor>
EIGEN_DEVICE_FUNC const Replicate<Derived,RowFactor,ColFactor>
DenseBase<Derived>::replicate() const
{
return Replicate<Derived,RowFactor,ColFactor>(derived());
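A short sketch of the replicate expression returned by the now device-callable function above (standard Eigen API, illustration only):

#include <Eigen/Dense>
#include <iostream>

int main()
{
  Eigen::Vector2i v(1, 2);

  // Tile v twice vertically and three times horizontally: a 4x3 matrix
  // in which every column reads 1, 2, 1, 2.
  Eigen::MatrixXi tiled = v.replicate(2, 3);
  std::cout << tiled << "\n\n";

  // Compile-time factors use the template form declared above.
  std::cout << v.replicate<2, 3>() << "\n";
}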
@@ -130,7 +130,7 @@ DenseBase<Derived>::replicate() const
* \sa VectorwiseOp::replicate(), DenseBase::replicate(), class Replicate
*/
template<typename ExpressionType, int Direction>
const typename VectorwiseOp<ExpressionType,Direction>::ReplicateReturnType
EIGEN_DEVICE_FUNC const typename VectorwiseOp<ExpressionType,Direction>::ReplicateReturnType
VectorwiseOp<ExpressionType,Direction>::replicate(Index factor) const
{
return typename VectorwiseOp<ExpressionType,Direction>::ReplicateReturnType

Eigen/src/Core/Reshaped.h (new file, 453 lines)
View File

@@ -0,0 +1,453 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2008-2017 Gael Guennebaud <gael.guennebaud@inria.fr>
// Copyright (C) 2014 yoco <peter.xiau@gmail.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_RESHAPED_H
#define EIGEN_RESHAPED_H
namespace Eigen {
namespace internal {
/** \class Reshaped
* \ingroup Core_Module
*
* \brief Expression of a fixed-size or dynamic-size reshape
*
* \tparam XprType the type of the expression in which we are taking a reshape
* \tparam Rows the number of rows of the reshape we are taking at compile time (optional)
* \tparam Cols the number of columns of the reshape we are taking at compile time (optional)
* \tparam Order can be ColMajor or RowMajor, default is ColMajor.
*
* This class represents an expression of either a fixed-size or dynamic-size reshape.
* It is the return type of DenseBase::reshaped(NRowsType,NColsType) and
* most of the time this is the only way it is used.
*
* However, in C++98, if you want to directly manipulate reshaped expressions,
* for instance if you want to write a function returning such an expression, you
* will need to use this class. In C++11, it is advised to use the \em auto
* keyword for such use cases.
*
* Here is an example illustrating the dynamic case:
* \include class_Reshaped.cpp
* Output: \verbinclude class_Reshaped.out
*
* Here is an example illustrating the fixed-size case:
* \include class_FixedReshaped.cpp
* Output: \verbinclude class_FixedReshaped.out
*
* \sa DenseBase::reshaped(NRowsType,NColsType)
*/
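A brief usage sketch of the expression documented above, assuming the DenseBase::reshaped() entry points introduced alongside this file (illustration only):

#include <Eigen/Dense>
#include <iostream>

int main()
{
  Eigen::Matrix4i m;
  m << 1, 5,  9, 13,
       2, 6, 10, 14,
       3, 7, 11, 15,
       4, 8, 12, 16;

  // View the same 16 coefficients as a 2x8 matrix (column-major by default).
  std::cout << m.reshaped(2, 8) << "\n\n";

  // Flattened (linear) view of all coefficients as a column vector.
  std::cout << m.reshaped().transpose() << "\n";
}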
template<typename XprType, int Rows, int Cols, int Order>
struct traits<Reshaped<XprType, Rows, Cols, Order> > : traits<XprType>
{
typedef typename traits<XprType>::Scalar Scalar;
typedef typename traits<XprType>::StorageKind StorageKind;
typedef typename traits<XprType>::XprKind XprKind;
enum{
MatrixRows = traits<XprType>::RowsAtCompileTime,
MatrixCols = traits<XprType>::ColsAtCompileTime,
RowsAtCompileTime = Rows,
ColsAtCompileTime = Cols,
MaxRowsAtCompileTime = Rows,
MaxColsAtCompileTime = Cols,
XpxStorageOrder = ((int(traits<XprType>::Flags) & RowMajorBit) == RowMajorBit) ? RowMajor : ColMajor,
ReshapedStorageOrder = (RowsAtCompileTime == 1 && ColsAtCompileTime != 1) ? RowMajor
: (ColsAtCompileTime == 1 && RowsAtCompileTime != 1) ? ColMajor
: XpxStorageOrder,
HasSameStorageOrderAsXprType = (ReshapedStorageOrder == XpxStorageOrder),
InnerSize = (ReshapedStorageOrder==int(RowMajor)) ? int(ColsAtCompileTime) : int(RowsAtCompileTime),
InnerStrideAtCompileTime = HasSameStorageOrderAsXprType
? int(inner_stride_at_compile_time<XprType>::ret)
: Dynamic,
OuterStrideAtCompileTime = Dynamic,
HasDirectAccess = internal::has_direct_access<XprType>::ret
&& (Order==int(XpxStorageOrder))
&& ((evaluator<XprType>::Flags&LinearAccessBit)==LinearAccessBit),
MaskPacketAccessBit = (InnerSize == Dynamic || (InnerSize % packet_traits<Scalar>::size) == 0)
&& (InnerStrideAtCompileTime == 1)
? PacketAccessBit : 0,
//MaskAlignedBit = ((OuterStrideAtCompileTime!=Dynamic) && (((OuterStrideAtCompileTime * int(sizeof(Scalar))) % 16) == 0)) ? AlignedBit : 0,
FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1) ? LinearAccessBit : 0,
FlagsLvalueBit = is_lvalue<XprType>::value ? LvalueBit : 0,
FlagsRowMajorBit = (ReshapedStorageOrder==int(RowMajor)) ? RowMajorBit : 0,
FlagsDirectAccessBit = HasDirectAccess ? DirectAccessBit : 0,
Flags0 = traits<XprType>::Flags & ( (HereditaryBits & ~RowMajorBit) | MaskPacketAccessBit),
Flags = (Flags0 | FlagsLinearAccessBit | FlagsLvalueBit | FlagsRowMajorBit | FlagsDirectAccessBit)
};
};
template<typename XprType, int Rows, int Cols, int Order, bool HasDirectAccess> class ReshapedImpl_dense;
} // end namespace internal
template<typename XprType, int Rows, int Cols, int Order, typename StorageKind> class ReshapedImpl;
template<typename XprType, int Rows, int Cols, int Order> class Reshaped
: public ReshapedImpl<XprType, Rows, Cols, Order, typename internal::traits<XprType>::StorageKind>
{
typedef ReshapedImpl<XprType, Rows, Cols, Order, typename internal::traits<XprType>::StorageKind> Impl;
public:
//typedef typename Impl::Base Base;
typedef Impl Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(Reshaped)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Reshaped)
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC
inline Reshaped(XprType& xpr)
: Impl(xpr)
{
EIGEN_STATIC_ASSERT(RowsAtCompileTime!=Dynamic && ColsAtCompileTime!=Dynamic,THIS_METHOD_IS_ONLY_FOR_FIXED_SIZE)
eigen_assert(Rows * Cols == xpr.rows() * xpr.cols());
}
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC
inline Reshaped(XprType& xpr,
Index reshapeRows, Index reshapeCols)
: Impl(xpr, reshapeRows, reshapeCols)
{
eigen_assert((RowsAtCompileTime==Dynamic || RowsAtCompileTime==reshapeRows)
&& (ColsAtCompileTime==Dynamic || ColsAtCompileTime==reshapeCols));
eigen_assert(reshapeRows * reshapeCols == xpr.rows() * xpr.cols());
}
};
// The generic default implementation for dense reshape simply forwards to the internal::ReshapedImpl_dense
// that must be specialized for direct and non-direct access...
template<typename XprType, int Rows, int Cols, int Order>
class ReshapedImpl<XprType, Rows, Cols, Order, Dense>
: public internal::ReshapedImpl_dense<XprType, Rows, Cols, Order,internal::traits<Reshaped<XprType,Rows,Cols,Order> >::HasDirectAccess>
{
typedef internal::ReshapedImpl_dense<XprType, Rows, Cols, Order,internal::traits<Reshaped<XprType,Rows,Cols,Order> >::HasDirectAccess> Impl;
public:
typedef Impl Base;
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ReshapedImpl)
EIGEN_DEVICE_FUNC inline ReshapedImpl(XprType& xpr) : Impl(xpr) {}
EIGEN_DEVICE_FUNC inline ReshapedImpl(XprType& xpr, Index reshapeRows, Index reshapeCols)
: Impl(xpr, reshapeRows, reshapeCols) {}
};
namespace internal {
/** \internal Internal implementation of dense Reshaped in the general case. */
template<typename XprType, int Rows, int Cols, int Order>
class ReshapedImpl_dense<XprType,Rows,Cols,Order,false>
: public internal::dense_xpr_base<Reshaped<XprType, Rows, Cols, Order> >::type
{
typedef Reshaped<XprType, Rows, Cols, Order> ReshapedType;
public:
typedef typename internal::dense_xpr_base<ReshapedType>::type Base;
EIGEN_DENSE_PUBLIC_INTERFACE(ReshapedType)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ReshapedImpl_dense)
typedef typename internal::ref_selector<XprType>::non_const_type MatrixTypeNested;
typedef typename internal::remove_all<XprType>::type NestedExpression;
class InnerIterator;
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC
inline ReshapedImpl_dense(XprType& xpr)
: m_xpr(xpr), m_rows(Rows), m_cols(Cols)
{}
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC
inline ReshapedImpl_dense(XprType& xpr, Index nRows, Index nCols)
: m_xpr(xpr), m_rows(nRows), m_cols(nCols)
{}
EIGEN_DEVICE_FUNC Index rows() const { return m_rows; }
EIGEN_DEVICE_FUNC Index cols() const { return m_cols; }
#ifdef EIGEN_PARSED_BY_DOXYGEN
/** \sa MapBase::data() */
EIGEN_DEVICE_FUNC inline const Scalar* data() const;
EIGEN_DEVICE_FUNC inline Index innerStride() const;
EIGEN_DEVICE_FUNC inline Index outerStride() const;
#endif
/** \returns the nested expression */
EIGEN_DEVICE_FUNC
const typename internal::remove_all<XprType>::type&
nestedExpression() const { return m_xpr; }
/** \returns the nested expression */
EIGEN_DEVICE_FUNC
typename internal::remove_reference<XprType>::type&
nestedExpression() { return m_xpr; }
protected:
MatrixTypeNested m_xpr;
const internal::variable_if_dynamic<Index, Rows> m_rows;
const internal::variable_if_dynamic<Index, Cols> m_cols;
};
/** \internal Internal implementation of dense Reshaped in the direct access case. */
template<typename XprType, int Rows, int Cols, int Order>
class ReshapedImpl_dense<XprType, Rows, Cols, Order, true>
: public MapBase<Reshaped<XprType, Rows, Cols, Order> >
{
typedef Reshaped<XprType, Rows, Cols, Order> ReshapedType;
typedef typename internal::ref_selector<XprType>::non_const_type XprTypeNested;
public:
typedef MapBase<ReshapedType> Base;
EIGEN_DENSE_PUBLIC_INTERFACE(ReshapedType)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ReshapedImpl_dense)
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC
inline ReshapedImpl_dense(XprType& xpr)
: Base(xpr.data()), m_xpr(xpr)
{}
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC
inline ReshapedImpl_dense(XprType& xpr, Index nRows, Index nCols)
: Base(xpr.data(), nRows, nCols),
m_xpr(xpr)
{}
EIGEN_DEVICE_FUNC
const typename internal::remove_all<XprTypeNested>::type& nestedExpression() const
{
return m_xpr;
}
EIGEN_DEVICE_FUNC
XprType& nestedExpression() { return m_xpr; }
/** \sa MapBase::innerStride() */
EIGEN_DEVICE_FUNC
inline Index innerStride() const
{
return m_xpr.innerStride();
}
/** \sa MapBase::outerStride() */
EIGEN_DEVICE_FUNC
inline Index outerStride() const
{
return ((Flags&RowMajorBit)==RowMajorBit) ? this->cols() : this->rows();
}
protected:
XprTypeNested m_xpr;
};
// Evaluators
template<typename ArgType, int Rows, int Cols, int Order, bool HasDirectAccess> struct reshaped_evaluator;
template<typename ArgType, int Rows, int Cols, int Order>
struct evaluator<Reshaped<ArgType, Rows, Cols, Order> >
: reshaped_evaluator<ArgType, Rows, Cols, Order, traits<Reshaped<ArgType,Rows,Cols,Order> >::HasDirectAccess>
{
typedef Reshaped<ArgType, Rows, Cols, Order> XprType;
typedef typename XprType::Scalar Scalar;
// TODO: should check for smaller packet types
typedef typename packet_traits<Scalar>::type PacketScalar;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost,
HasDirectAccess = traits<XprType>::HasDirectAccess,
// RowsAtCompileTime = traits<XprType>::RowsAtCompileTime,
// ColsAtCompileTime = traits<XprType>::ColsAtCompileTime,
// MaxRowsAtCompileTime = traits<XprType>::MaxRowsAtCompileTime,
// MaxColsAtCompileTime = traits<XprType>::MaxColsAtCompileTime,
//
// InnerStrideAtCompileTime = traits<XprType>::HasSameStorageOrderAsXprType
// ? int(inner_stride_at_compile_time<ArgType>::ret)
// : Dynamic,
// OuterStrideAtCompileTime = Dynamic,
FlagsLinearAccessBit = (traits<XprType>::RowsAtCompileTime == 1 || traits<XprType>::ColsAtCompileTime == 1 || HasDirectAccess) ? LinearAccessBit : 0,
FlagsRowMajorBit = (traits<XprType>::ReshapedStorageOrder==int(RowMajor)) ? RowMajorBit : 0,
FlagsDirectAccessBit = HasDirectAccess ? DirectAccessBit : 0,
Flags0 = evaluator<ArgType>::Flags & (HereditaryBits & ~RowMajorBit),
Flags = Flags0 | FlagsLinearAccessBit | FlagsRowMajorBit | FlagsDirectAccessBit,
PacketAlignment = unpacket_traits<PacketScalar>::alignment,
Alignment = evaluator<ArgType>::Alignment
};
typedef reshaped_evaluator<ArgType, Rows, Cols, Order, HasDirectAccess> reshaped_evaluator_type;
EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr) : reshaped_evaluator_type(xpr)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
};
template<typename ArgType, int Rows, int Cols, int Order>
struct reshaped_evaluator<ArgType, Rows, Cols, Order, /* HasDirectAccess */ false>
: evaluator_base<Reshaped<ArgType, Rows, Cols, Order> >
{
typedef Reshaped<ArgType, Rows, Cols, Order> XprType;
enum {
CoeffReadCost = evaluator<ArgType>::CoeffReadCost /* TODO + cost of index computations */,
Flags = (evaluator<ArgType>::Flags & (HereditaryBits /*| LinearAccessBit | DirectAccessBit*/)),
Alignment = 0
};
EIGEN_DEVICE_FUNC explicit reshaped_evaluator(const XprType& xpr) : m_argImpl(xpr.nestedExpression()), m_xpr(xpr)
{
EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost);
}
typedef typename XprType::Scalar Scalar;
typedef typename XprType::CoeffReturnType CoeffReturnType;
typedef std::pair<Index, Index> RowCol;
inline RowCol index_remap(Index rowId, Index colId) const
{
if(Order==ColMajor)
{
const Index nth_elem_idx = colId * m_xpr.rows() + rowId;
return RowCol(nth_elem_idx % m_xpr.nestedExpression().rows(),
nth_elem_idx / m_xpr.nestedExpression().rows());
}
else
{
const Index nth_elem_idx = colId + rowId * m_xpr.cols();
return RowCol(nth_elem_idx / m_xpr.nestedExpression().cols(),
nth_elem_idx % m_xpr.nestedExpression().cols());
}
}
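// Worked example (illustrative): viewing a 4x4 nested expression as a 2x8
// ColMajor reshape, the coefficient at (rowId=1, colId=5) gives
// nth_elem_idx = 5*2 + 1 = 11, which maps back to row 11 % 4 = 3 and
// column 11 / 4 = 2 of the nested expression.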
EIGEN_DEVICE_FUNC
inline Scalar& coeffRef(Index rowId, Index colId)
{
EIGEN_STATIC_ASSERT_LVALUE(XprType)
const RowCol row_col = index_remap(rowId, colId);
return m_argImpl.coeffRef(row_col.first, row_col.second);
}
EIGEN_DEVICE_FUNC
inline const Scalar& coeffRef(Index rowId, Index colId) const
{
const RowCol row_col = index_remap(rowId, colId);
return m_argImpl.coeffRef(row_col.first, row_col.second);
}
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index rowId, Index colId) const
{
const RowCol row_col = index_remap(rowId, colId);
return m_argImpl.coeff(row_col.first, row_col.second);
}
EIGEN_DEVICE_FUNC
inline Scalar& coeffRef(Index index)
{
EIGEN_STATIC_ASSERT_LVALUE(XprType)
const RowCol row_col = index_remap(Rows == 1 ? 0 : index,
Rows == 1 ? index : 0);
return m_argImpl.coeffRef(row_col.first, row_col.second);
}
EIGEN_DEVICE_FUNC
inline const Scalar& coeffRef(Index index) const
{
const RowCol row_col = index_remap(Rows == 1 ? 0 : index,
Rows == 1 ? index : 0);
return m_argImpl.coeffRef(row_col.first, row_col.second);
}
EIGEN_DEVICE_FUNC
inline const CoeffReturnType coeff(Index index) const
{
const RowCol row_col = index_remap(Rows == 1 ? 0 : index,
Rows == 1 ? index : 0);
return m_argImpl.coeff(row_col.first, row_col.second);
}
#if 0
EIGEN_DEVICE_FUNC
template<int LoadMode>
inline PacketScalar packet(Index rowId, Index colId) const
{
const RowCol row_col = index_remap(rowId, colId);
return m_argImpl.template packet<Unaligned>(row_col.first, row_col.second);
}
template<int LoadMode>
EIGEN_DEVICE_FUNC
inline void writePacket(Index rowId, Index colId, const PacketScalar& val)
{
const RowCol row_col = index_remap(rowId, colId);
m_argImpl.const_cast_derived().template writePacket<Unaligned>
(row_col.first, row_col.second, val);
}
template<int LoadMode>
EIGEN_DEVICE_FUNC
inline PacketScalar packet(Index index) const
{
const RowCol row_col = index_remap(RowsAtCompileTime == 1 ? 0 : index,
RowsAtCompileTime == 1 ? index : 0);
return m_argImpl.template packet<Unaligned>(row_col.first, row_col.second);
}
template<int LoadMode>
EIGEN_DEVICE_FUNC
inline void writePacket(Index index, const PacketScalar& val)
{
const RowCol row_col = index_remap(RowsAtCompileTime == 1 ? 0 : index,
RowsAtCompileTime == 1 ? index : 0);
return m_argImpl.template packet<Unaligned>(row_col.first, row_col.second, val);
}
#endif
protected:
evaluator<ArgType> m_argImpl;
const XprType& m_xpr;
};
template<typename ArgType, int Rows, int Cols, int Order>
struct reshaped_evaluator<ArgType, Rows, Cols, Order, /* HasDirectAccess */ true>
: mapbase_evaluator<Reshaped<ArgType, Rows, Cols, Order>,
typename Reshaped<ArgType, Rows, Cols, Order>::PlainObject>
{
typedef Reshaped<ArgType, Rows, Cols, Order> XprType;
typedef typename XprType::Scalar Scalar;
EIGEN_DEVICE_FUNC explicit reshaped_evaluator(const XprType& xpr)
: mapbase_evaluator<XprType, typename XprType::PlainObject>(xpr)
{
// TODO: for the 3.4 release, this should be turned to an internal assertion, but let's keep it as is for the beta lifetime
eigen_assert(((internal::UIntPtr(xpr.data()) % EIGEN_PLAIN_ENUM_MAX(1,evaluator<XprType>::Alignment)) == 0) && "data is not aligned");
}
};
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_RESHAPED_H

View File

@@ -79,7 +79,7 @@ template<typename Derived> class ReturnByValue
template<typename Derived>
template<typename OtherDerived>
Derived& DenseBase<Derived>::operator=(const ReturnByValue<OtherDerived>& other)
EIGEN_DEVICE_FUNC Derived& DenseBase<Derived>::operator=(const ReturnByValue<OtherDerived>& other)
{
other.evalTo(derived());
return derived();

View File

@@ -114,7 +114,7 @@ template<typename MatrixType, int Direction> class Reverse
*
*/
template<typename Derived>
inline typename DenseBase<Derived>::ReverseReturnType
EIGEN_DEVICE_FUNC inline typename DenseBase<Derived>::ReverseReturnType
DenseBase<Derived>::reverse()
{
return ReverseReturnType(derived());
@@ -136,7 +136,7 @@ DenseBase<Derived>::reverse()
*
* \sa VectorwiseOp::reverseInPlace(), reverse() */
template<typename Derived>
inline void DenseBase<Derived>::reverseInPlace()
EIGEN_DEVICE_FUNC inline void DenseBase<Derived>::reverseInPlace()
{
if(cols()>rows())
{
@@ -171,8 +171,10 @@ struct vectorwise_reverse_inplace_impl<Vertical>
template<typename ExpressionType>
static void run(ExpressionType &xpr)
{
const int HalfAtCompileTime = ExpressionType::RowsAtCompileTime==Dynamic?Dynamic:ExpressionType::RowsAtCompileTime/2;
Index half = xpr.rows()/2;
xpr.topRows(half).swap(xpr.bottomRows(half).colwise().reverse());
xpr.topRows(fix<HalfAtCompileTime>(half))
.swap(xpr.bottomRows(fix<HalfAtCompileTime>(half)).colwise().reverse());
}
};
@@ -182,8 +184,10 @@ struct vectorwise_reverse_inplace_impl<Horizontal>
template<typename ExpressionType>
static void run(ExpressionType &xpr)
{
const int HalfAtCompileTime = ExpressionType::ColsAtCompileTime==Dynamic?Dynamic:ExpressionType::ColsAtCompileTime/2;
Index half = xpr.cols()/2;
xpr.leftCols(half).swap(xpr.rightCols(half).rowwise().reverse());
xpr.leftCols(fix<HalfAtCompileTime>(half))
.swap(xpr.rightCols(fix<HalfAtCompileTime>(half)).rowwise().reverse());
}
};
@@ -201,9 +205,9 @@ struct vectorwise_reverse_inplace_impl<Horizontal>
*
* \sa DenseBase::reverseInPlace(), reverse() */
template<typename ExpressionType, int Direction>
void VectorwiseOp<ExpressionType,Direction>::reverseInPlace()
EIGEN_DEVICE_FUNC void VectorwiseOp<ExpressionType,Direction>::reverseInPlace()
{
internal::vectorwise_reverse_inplace_impl<Direction>::run(_expression().const_cast_derived());
internal::vectorwise_reverse_inplace_impl<Direction>::run(m_matrix);
}
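A small sketch of the in-place vectorwise reverse performed by the fixed-size-aware implementation above (standard Eigen API, illustration only):

#include <Eigen/Dense>
#include <iostream>

int main()
{
  Eigen::MatrixXi m(2, 3);
  m << 1, 2, 3,
       4, 5, 6;

  m.colwise().reverseInPlace();  // reverse each column: the two rows swap
  std::cout << m << "\n\n";      // 4 5 6
                                 // 1 2 3

  m.rowwise().reverseInPlace();  // reverse each row: columns are mirrored
  std::cout << m << "\n";        // 6 5 4
                                 // 3 2 1
}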
} // end namespace Eigen

View File

@@ -61,6 +61,7 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
typedef typename internal::traits<SelfAdjointView>::Scalar Scalar;
typedef typename MatrixType::StorageIndex StorageIndex;
typedef typename internal::remove_all<typename MatrixType::ConjugateReturnType>::type MatrixConjugateReturnType;
typedef SelfAdjointView<typename internal::add_const<MatrixType>::type, UpLo> ConstSelfAdjointView;
enum {
Mode = internal::traits<SelfAdjointView>::Mode,
@@ -197,6 +198,18 @@ template<typename _MatrixType, unsigned int UpLo> class SelfAdjointView
inline const ConjugateReturnType conjugate() const
{ return ConjugateReturnType(m_matrix.conjugate()); }
/** \returns an expression of the complex conjugate of \c *this if Cond==true,
* returns \c *this otherwise.
*/
template<bool Cond>
EIGEN_DEVICE_FUNC
inline typename internal::conditional<Cond,ConjugateReturnType,ConstSelfAdjointView>::type
conjugateIf() const
{
typedef typename internal::conditional<Cond,ConjugateReturnType,ConstSelfAdjointView>::type ReturnType;
return ReturnType(m_matrix.template conjugateIf<Cond>());
}
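A sketch of how conjugateIf<Cond>() is meant to be used in generic code, relying on the MatrixBase-level conjugateIf() that the forwarding call above assumes (C++14 auto return for brevity, illustration only):

#include <Eigen/Dense>

// Form A^H for complex scalars but only A^T for real scalars,
// without paying for a no-op conjugation in the real case.
template<typename Derived>
auto transposeOrAdjoint(const Eigen::MatrixBase<Derived>& A)
{
  enum { IsComplex = Eigen::NumTraits<typename Derived::Scalar>::IsComplex };
  return A.transpose().template conjugateIf<bool(IsComplex)>();
}

int main()
{
  Eigen::Matrix2cd C = Eigen::Matrix2cd::Random();
  Eigen::Matrix2d  R = Eigen::Matrix2d::Random();
  Eigen::Matrix2cd Ch = transposeOrAdjoint(C);  // adjoint of C
  Eigen::Matrix2d  Rt = transposeOrAdjoint(R);  // plain transpose of R
  (void)Ch; (void)Rt;
}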
typedef SelfAdjointView<const typename MatrixType::AdjointReturnType,TransposeMode> AdjointReturnType;
/** \sa MatrixBase::adjoint() const */
EIGEN_DEVICE_FUNC
@@ -324,7 +337,7 @@ public:
/** This is the const version of MatrixBase::selfadjointView() */
template<typename Derived>
template<unsigned int UpLo>
typename MatrixBase<Derived>::template ConstSelfAdjointViewReturnType<UpLo>::Type
EIGEN_DEVICE_FUNC typename MatrixBase<Derived>::template ConstSelfAdjointViewReturnType<UpLo>::Type
MatrixBase<Derived>::selfadjointView() const
{
return typename ConstSelfAdjointViewReturnType<UpLo>::Type(derived());
@@ -341,7 +354,7 @@ MatrixBase<Derived>::selfadjointView() const
*/
template<typename Derived>
template<unsigned int UpLo>
typename MatrixBase<Derived>::template SelfAdjointViewReturnType<UpLo>::Type
EIGEN_DEVICE_FUNC typename MatrixBase<Derived>::template SelfAdjointViewReturnType<UpLo>::Type
MatrixBase<Derived>::selfadjointView()
{
return typename SelfAdjointViewReturnType<UpLo>::Type(derived());

View File

@@ -19,7 +19,7 @@ template<typename Decomposition, typename RhsType, typename StorageKind> class S
*
* \brief Pseudo expression representing a solving operation
*
* \tparam Decomposition the type of the matrix or decomposion object
* \tparam Decomposition the type of the matrix or decomposition object
* \tparam Rhstype the type of the right-hand side
*
* This class represents an expression of A.solve(B)
@@ -181,7 +181,7 @@ struct Assignment<DstXprType, Solve<CwiseUnaryOp<internal::scalar_conjugate_op<t
}
};
} // end namepsace internal
} // end namespace internal
} // end namespace Eigen

View File

@@ -164,7 +164,7 @@ struct triangular_solver_selector<Lhs,Rhs,OnTheRight,Mode,CompleteUnrolling,1> {
#ifndef EIGEN_PARSED_BY_DOXYGEN
template<typename MatrixType, unsigned int Mode>
template<int Side, typename OtherDerived>
void TriangularViewImpl<MatrixType,Mode,Dense>::solveInPlace(const MatrixBase<OtherDerived>& _other) const
EIGEN_DEVICE_FUNC void TriangularViewImpl<MatrixType,Mode,Dense>::solveInPlace(const MatrixBase<OtherDerived>& _other) const
{
OtherDerived& other = _other.const_cast_derived();
eigen_assert( derived().cols() == derived().rows() && ((Side==OnTheLeft && derived().cols() == other.rows()) || (Side==OnTheRight && derived().cols() == other.cols())) );

View File

@@ -14,8 +14,35 @@ namespace Eigen {
namespace internal {
template<typename Derived>
struct solve_assertion {
template<bool Transpose_, typename Rhs>
static void run(const Derived& solver, const Rhs& b) { solver.template _check_solve_assertion<Transpose_>(b); }
};
template<typename Derived>
struct solve_assertion<Transpose<Derived> >
{
typedef Transpose<Derived> type;
template<bool Transpose_, typename Rhs>
static void run(const type& transpose, const Rhs& b)
{
internal::solve_assertion<typename internal::remove_all<Derived>::type>::template run<true>(transpose.nestedExpression(), b);
}
};
template<typename Scalar, typename Derived>
struct solve_assertion<CwiseUnaryOp<Eigen::internal::scalar_conjugate_op<Scalar>, const Transpose<Derived> > >
{
typedef CwiseUnaryOp<Eigen::internal::scalar_conjugate_op<Scalar>, const Transpose<Derived> > type;
template<bool Transpose_, typename Rhs>
static void run(const type& adjoint, const Rhs& b)
{
internal::solve_assertion<typename internal::remove_all<Transpose<Derived> >::type>::template run<true>(adjoint.nestedExpression(), b);
}
};
} // end namespace internal
/** \class SolverBase
@@ -35,7 +62,7 @@ namespace internal {
*
* \warning Currently, any other usage of transpose() and adjoint() are not supported and will produce compilation errors.
*
* \sa class PartialPivLU, class FullPivLU
* \sa class PartialPivLU, class FullPivLU, class HouseholderQR, class ColPivHouseholderQR, class FullPivHouseholderQR, class CompleteOrthogonalDecomposition, class LLT, class LDLT, class SVDBase
*/
template<typename Derived>
class SolverBase : public EigenBase<Derived>
@@ -46,6 +73,9 @@ class SolverBase : public EigenBase<Derived>
typedef typename internal::traits<Derived>::Scalar Scalar;
typedef Scalar CoeffReturnType;
template<typename Derived_>
friend struct internal::solve_assertion;
enum {
RowsAtCompileTime = internal::traits<Derived>::RowsAtCompileTime,
ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
@@ -56,7 +86,8 @@ class SolverBase : public EigenBase<Derived>
MaxSizeAtCompileTime = (internal::size_at_compile_time<internal::traits<Derived>::MaxRowsAtCompileTime,
internal::traits<Derived>::MaxColsAtCompileTime>::ret),
IsVectorAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime == 1
|| internal::traits<Derived>::MaxColsAtCompileTime == 1
|| internal::traits<Derived>::MaxColsAtCompileTime == 1,
NumDimensions = int(MaxSizeAtCompileTime) == 1 ? 0 : bool(IsVectorAtCompileTime) ? 1 : 2
};
/** Default constructor */
@@ -74,7 +105,7 @@ class SolverBase : public EigenBase<Derived>
inline const Solve<Derived, Rhs>
solve(const MatrixBase<Rhs>& b) const
{
eigen_assert(derived().rows()==b.rows() && "solve(): invalid number of rows of the right hand side matrix b");
internal::solve_assertion<typename internal::remove_all<Derived>::type>::template run<false>(derived(), b);
return Solve<Derived, Rhs>(derived(), b.derived());
}
@@ -112,6 +143,13 @@ class SolverBase : public EigenBase<Derived>
}
protected:
template<bool Transpose_, typename Rhs>
void _check_solve_assertion(const Rhs& b) const {
EIGEN_ONLY_USED_FOR_DEBUG(b);
eigen_assert(derived().m_isInitialized && "Solver is not initialized.");
eigen_assert((Transpose_?derived().cols():derived().rows())==b.rows() && "SolverBase::solve(): invalid number of rows of the right hand side matrix b");
}
};
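At the API level, the extended assertion machinery above supports solving with the transpose or adjoint of a decomposition without refactorizing; a usage sketch (standard SolverBase API, illustration only):

#include <Eigen/Dense>

int main()
{
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 4);
  Eigen::VectorXd b = Eigen::VectorXd::Random(4);

  Eigen::PartialPivLU<Eigen::MatrixXd> lu(A);
  Eigen::VectorXd x1 = lu.solve(b);              // A    x1 = b
  Eigen::VectorXd x2 = lu.transpose().solve(b);  // A^T  x2 = b
  Eigen::VectorXd x3 = lu.adjoint().solve(b);    // A^H  x3 = b (same as x2 for real A)
  (void)x1; (void)x2; (void)x3;
}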
namespace internal {

View File

@@ -50,6 +50,71 @@ inline void stable_norm_kernel(const ExpressionType& bl, Scalar& ssq, Scalar& sc
ssq += (bl*invScale).squaredNorm();
}
template<typename VectorType, typename RealScalar>
void stable_norm_impl_inner_step(const VectorType &vec, RealScalar& ssq, RealScalar& scale, RealScalar& invScale)
{
typedef typename VectorType::Scalar Scalar;
const Index blockSize = 4096;
typedef typename internal::nested_eval<VectorType,2>::type VectorTypeCopy;
typedef typename internal::remove_all<VectorTypeCopy>::type VectorTypeCopyClean;
const VectorTypeCopy copy(vec);
enum {
CanAlign = ( (int(VectorTypeCopyClean::Flags)&DirectAccessBit)
|| (int(internal::evaluator<VectorTypeCopyClean>::Alignment)>0) // FIXME Alignment)>0 might not be enough
) && (blockSize*sizeof(Scalar)*2<EIGEN_STACK_ALLOCATION_LIMIT)
&& (EIGEN_MAX_STATIC_ALIGN_BYTES>0) // if we cannot allocate on the stack, then let's not bother about this optimization
};
typedef typename internal::conditional<CanAlign, Ref<const Matrix<Scalar,Dynamic,1,0,blockSize,1>, internal::evaluator<VectorTypeCopyClean>::Alignment>,
typename VectorTypeCopyClean::ConstSegmentReturnType>::type SegmentWrapper;
Index n = vec.size();
Index bi = internal::first_default_aligned(copy);
if (bi>0)
internal::stable_norm_kernel(copy.head(bi), ssq, scale, invScale);
for (; bi<n; bi+=blockSize)
internal::stable_norm_kernel(SegmentWrapper(copy.segment(bi,numext::mini(blockSize, n - bi))), ssq, scale, invScale);
}
template<typename VectorType>
typename VectorType::RealScalar
stable_norm_impl(const VectorType &vec, typename enable_if<VectorType::IsVectorAtCompileTime>::type* = 0 )
{
using std::sqrt;
using std::abs;
Index n = vec.size();
if(n==1)
return abs(vec.coeff(0));
typedef typename VectorType::RealScalar RealScalar;
RealScalar scale(0);
RealScalar invScale(1);
RealScalar ssq(0); // sum of squares
stable_norm_impl_inner_step(vec, ssq, scale, invScale);
return scale * sqrt(ssq);
}
template<typename MatrixType>
typename MatrixType::RealScalar
stable_norm_impl(const MatrixType &mat, typename enable_if<!MatrixType::IsVectorAtCompileTime>::type* = 0 )
{
using std::sqrt;
typedef typename MatrixType::RealScalar RealScalar;
RealScalar scale(0);
RealScalar invScale(1);
RealScalar ssq(0); // sum of squares
for(Index j=0; j<mat.outerSize(); ++j)
stable_norm_impl_inner_step(mat.innerVector(j), ssq, scale, invScale);
return scale * sqrt(ssq);
}
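The point of the scaled accumulation above is overflow/underflow safety; a small sketch contrasting norm() with the stable variants (standard Eigen API, illustration only):

#include <Eigen/Dense>
#include <iostream>

int main()
{
  Eigen::Vector3d v(1e200, 1e200, 1e200);

  std::cout << v.norm()       << "\n";  // squaredNorm() overflows: prints inf
  std::cout << v.stableNorm() << "\n";  // scaled accumulation: ~1.7320508e200
  std::cout << v.blueNorm()   << "\n";  // Blue's algorithm, same value
}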
template<typename Derived>
inline typename NumTraits<typename traits<Derived>::Scalar>::Real
blueNorm_impl(const EigenBase<Derived>& _vec)
@@ -74,7 +139,7 @@ blueNorm_impl(const EigenBase<Derived>& _vec)
// are used. For any specific computer, each of the assignment
// statements can be replaced
ibeta = std::numeric_limits<RealScalar>::radix; // base for floating-point numbers
it = std::numeric_limits<RealScalar>::digits; // number of base-beta digits in mantissa
it = NumTraits<RealScalar>::digits(); // number of base-beta digits in mantissa
iemin = std::numeric_limits<RealScalar>::min_exponent; // minimum exponent
iemax = std::numeric_limits<RealScalar>::max_exponent; // maximum exponent
rbig = (std::numeric_limits<RealScalar>::max)(); // largest floating-point number
@@ -98,12 +163,16 @@ blueNorm_impl(const EigenBase<Derived>& _vec)
RealScalar asml = RealScalar(0);
RealScalar amed = RealScalar(0);
RealScalar abig = RealScalar(0);
for(typename Derived::InnerIterator it(vec, 0); it; ++it)
for(Index j=0; j<vec.outerSize(); ++j)
{
RealScalar ax = abs(it.value());
if(ax > ab2) abig += numext::abs2(ax*s2m);
else if(ax < b1) asml += numext::abs2(ax*s1m);
else amed += numext::abs2(ax);
for(typename Derived::InnerIterator it(vec, j); it; ++it)
{
RealScalar ax = abs(it.value());
if(ax > ab2) abig += numext::abs2(ax*s2m);
else if(ax < b1) asml += numext::abs2(ax*s1m);
else amed += numext::abs2(ax);
}
}
if(amed!=amed)
return amed; // we got a NaN
@@ -156,36 +225,7 @@ template<typename Derived>
inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
MatrixBase<Derived>::stableNorm() const
{
using std::sqrt;
using std::abs;
const Index blockSize = 4096;
RealScalar scale(0);
RealScalar invScale(1);
RealScalar ssq(0); // sum of square
typedef typename internal::nested_eval<Derived,2>::type DerivedCopy;
typedef typename internal::remove_all<DerivedCopy>::type DerivedCopyClean;
const DerivedCopy copy(derived());
enum {
CanAlign = ( (int(DerivedCopyClean::Flags)&DirectAccessBit)
|| (int(internal::evaluator<DerivedCopyClean>::Alignment)>0) // FIXME Alignment)>0 might not be enough
) && (blockSize*sizeof(Scalar)*2<EIGEN_STACK_ALLOCATION_LIMIT)
&& (EIGEN_MAX_STATIC_ALIGN_BYTES>0) // if we cannot allocate on the stack, then let's not bother about this optimization
};
typedef typename internal::conditional<CanAlign, Ref<const Matrix<Scalar,Dynamic,1,0,blockSize,1>, internal::evaluator<DerivedCopyClean>::Alignment>,
typename DerivedCopyClean::ConstSegmentReturnType>::type SegmentWrapper;
Index n = size();
if(n==1)
return abs(this->coeff(0));
Index bi = internal::first_default_aligned(copy);
if (bi>0)
internal::stable_norm_kernel(copy.head(bi), ssq, scale, invScale);
for (; bi<n; bi+=blockSize)
internal::stable_norm_kernel(SegmentWrapper(copy.segment(bi,numext::mini(blockSize, n - bi))), ssq, scale, invScale);
return scale * sqrt(ssq);
return internal::stable_norm_impl(derived());
}
/** \returns the \em l2 norm of \c *this using the Blue's algorithm.
@@ -213,7 +253,10 @@ template<typename Derived>
inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
MatrixBase<Derived>::hypotNorm() const
{
return this->cwiseAbs().redux(internal::scalar_hypot_op<RealScalar>());
if(size()==1)
return numext::abs(coeff(0,0));
else
return this->cwiseAbs().redux(internal::scalar_hypot_op<RealScalar>());
}
} // end namespace Eigen

View File

@@ -0,0 +1,331 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2018 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
namespace Eigen {
namespace internal {
template<typename IteratorType>
struct indexed_based_stl_iterator_traits;
template<typename Derived>
class indexed_based_stl_iterator_base
{
protected:
typedef indexed_based_stl_iterator_traits<Derived> traits;
typedef typename traits::XprType XprType;
typedef indexed_based_stl_iterator_base<typename traits::non_const_iterator> non_const_iterator;
typedef indexed_based_stl_iterator_base<typename traits::const_iterator> const_iterator;
typedef typename internal::conditional<internal::is_const<XprType>::value,non_const_iterator,const_iterator>::type other_iterator;
// NOTE: in C++03 we cannot declare friend classes through typedefs because we need to write friend class:
friend class indexed_based_stl_iterator_base<typename traits::const_iterator>;
friend class indexed_based_stl_iterator_base<typename traits::non_const_iterator>;
public:
typedef Index difference_type;
typedef std::random_access_iterator_tag iterator_category;
indexed_based_stl_iterator_base() : mp_xpr(0), m_index(0) {}
indexed_based_stl_iterator_base(XprType& xpr, Index index) : mp_xpr(&xpr), m_index(index) {}
indexed_based_stl_iterator_base(const non_const_iterator& other)
: mp_xpr(other.mp_xpr), m_index(other.m_index)
{}
indexed_based_stl_iterator_base& operator=(const non_const_iterator& other)
{
mp_xpr = other.mp_xpr;
m_index = other.m_index;
return *this;
}
Derived& operator++() { ++m_index; return derived(); }
Derived& operator--() { --m_index; return derived(); }
Derived operator++(int) { Derived prev(derived()); operator++(); return prev;}
Derived operator--(int) { Derived prev(derived()); operator--(); return prev;}
friend Derived operator+(const indexed_based_stl_iterator_base& a, Index b) { Derived ret(a.derived()); ret += b; return ret; }
friend Derived operator-(const indexed_based_stl_iterator_base& a, Index b) { Derived ret(a.derived()); ret -= b; return ret; }
friend Derived operator+(Index a, const indexed_based_stl_iterator_base& b) { Derived ret(b.derived()); ret += a; return ret; }
friend Derived operator-(Index a, const indexed_based_stl_iterator_base& b) { Derived ret(b.derived()); ret -= a; return ret; }
Derived& operator+=(Index b) { m_index += b; return derived(); }
Derived& operator-=(Index b) { m_index -= b; return derived(); }
difference_type operator-(const indexed_based_stl_iterator_base& other) const
{
eigen_assert(mp_xpr == other.mp_xpr);
return m_index - other.m_index;
}
difference_type operator-(const other_iterator& other) const
{
eigen_assert(mp_xpr == other.mp_xpr);
return m_index - other.m_index;
}
bool operator==(const indexed_based_stl_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index == other.m_index; }
bool operator!=(const indexed_based_stl_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index != other.m_index; }
bool operator< (const indexed_based_stl_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index < other.m_index; }
bool operator<=(const indexed_based_stl_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index <= other.m_index; }
bool operator> (const indexed_based_stl_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index > other.m_index; }
bool operator>=(const indexed_based_stl_iterator_base& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index >= other.m_index; }
bool operator==(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index == other.m_index; }
bool operator!=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index != other.m_index; }
bool operator< (const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index < other.m_index; }
bool operator<=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index <= other.m_index; }
bool operator> (const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index > other.m_index; }
bool operator>=(const other_iterator& other) const { eigen_assert(mp_xpr == other.mp_xpr); return m_index >= other.m_index; }
protected:
Derived& derived() { return static_cast<Derived&>(*this); }
const Derived& derived() const { return static_cast<const Derived&>(*this); }
XprType *mp_xpr;
Index m_index;
};
template<typename XprType>
class pointer_based_stl_iterator
{
enum { is_lvalue = internal::is_lvalue<XprType>::value };
typedef pointer_based_stl_iterator<typename internal::remove_const<XprType>::type> non_const_iterator;
typedef pointer_based_stl_iterator<typename internal::add_const<XprType>::type> const_iterator;
typedef typename internal::conditional<internal::is_const<XprType>::value,non_const_iterator,const_iterator>::type other_iterator;
// NOTE: in C++03 we cannot declare friend classes through typedefs because we need to write friend class:
friend class pointer_based_stl_iterator<typename internal::add_const<XprType>::type>;
friend class pointer_based_stl_iterator<typename internal::remove_const<XprType>::type>;
public:
typedef Index difference_type;
typedef typename XprType::Scalar value_type;
typedef std::random_access_iterator_tag iterator_category;
typedef typename internal::conditional<bool(is_lvalue), value_type*, const value_type*>::type pointer;
typedef typename internal::conditional<bool(is_lvalue), value_type&, const value_type&>::type reference;
pointer_based_stl_iterator() : m_ptr(0) {}
pointer_based_stl_iterator(XprType& xpr, Index index) : m_incr(xpr.innerStride())
{
m_ptr = xpr.data() + index * m_incr.value();
}
pointer_based_stl_iterator(const non_const_iterator& other)
: m_ptr(other.m_ptr), m_incr(other.m_incr)
{}
pointer_based_stl_iterator& operator=(const non_const_iterator& other)
{
m_ptr = other.m_ptr;
m_incr.setValue(other.m_incr.value());
return *this;
}
reference operator*() const { return *m_ptr; }
reference operator[](Index i) const { return *(m_ptr+i*m_incr.value()); }
pointer operator->() const { return m_ptr; }
pointer_based_stl_iterator& operator++() { m_ptr += m_incr.value(); return *this; }
pointer_based_stl_iterator& operator--() { m_ptr -= m_incr.value(); return *this; }
pointer_based_stl_iterator operator++(int) { pointer_based_stl_iterator prev(*this); operator++(); return prev;}
pointer_based_stl_iterator operator--(int) { pointer_based_stl_iterator prev(*this); operator--(); return prev;}
friend pointer_based_stl_iterator operator+(const pointer_based_stl_iterator& a, Index b) { pointer_based_stl_iterator ret(a); ret += b; return ret; }
friend pointer_based_stl_iterator operator-(const pointer_based_stl_iterator& a, Index b) { pointer_based_stl_iterator ret(a); ret -= b; return ret; }
friend pointer_based_stl_iterator operator+(Index a, const pointer_based_stl_iterator& b) { pointer_based_stl_iterator ret(b); ret += a; return ret; }
friend pointer_based_stl_iterator operator-(Index a, const pointer_based_stl_iterator& b) { pointer_based_stl_iterator ret(b); ret -= a; return ret; }
pointer_based_stl_iterator& operator+=(Index b) { m_ptr += b*m_incr.value(); return *this; }
pointer_based_stl_iterator& operator-=(Index b) { m_ptr -= b*m_incr.value(); return *this; }
difference_type operator-(const pointer_based_stl_iterator& other) const {
return (m_ptr - other.m_ptr)/m_incr.value();
}
difference_type operator-(const other_iterator& other) const {
return (m_ptr - other.m_ptr)/m_incr.value();
}
bool operator==(const pointer_based_stl_iterator& other) const { return m_ptr == other.m_ptr; }
bool operator!=(const pointer_based_stl_iterator& other) const { return m_ptr != other.m_ptr; }
bool operator< (const pointer_based_stl_iterator& other) const { return m_ptr < other.m_ptr; }
bool operator<=(const pointer_based_stl_iterator& other) const { return m_ptr <= other.m_ptr; }
bool operator> (const pointer_based_stl_iterator& other) const { return m_ptr > other.m_ptr; }
bool operator>=(const pointer_based_stl_iterator& other) const { return m_ptr >= other.m_ptr; }
bool operator==(const other_iterator& other) const { return m_ptr == other.m_ptr; }
bool operator!=(const other_iterator& other) const { return m_ptr != other.m_ptr; }
bool operator< (const other_iterator& other) const { return m_ptr < other.m_ptr; }
bool operator<=(const other_iterator& other) const { return m_ptr <= other.m_ptr; }
bool operator> (const other_iterator& other) const { return m_ptr > other.m_ptr; }
bool operator>=(const other_iterator& other) const { return m_ptr >= other.m_ptr; }
protected:
pointer m_ptr;
internal::variable_if_dynamic<Index, XprType::InnerStrideAtCompileTime> m_incr;
};
template<typename _XprType>
struct indexed_based_stl_iterator_traits<generic_randaccess_stl_iterator<_XprType> >
{
typedef _XprType XprType;
typedef generic_randaccess_stl_iterator<typename internal::remove_const<XprType>::type> non_const_iterator;
typedef generic_randaccess_stl_iterator<typename internal::add_const<XprType>::type> const_iterator;
};
template<typename XprType>
class generic_randaccess_stl_iterator : public indexed_based_stl_iterator_base<generic_randaccess_stl_iterator<XprType> >
{
public:
typedef typename XprType::Scalar value_type;
protected:
enum {
has_direct_access = (internal::traits<XprType>::Flags & DirectAccessBit) ? 1 : 0,
is_lvalue = internal::is_lvalue<XprType>::value
};
typedef indexed_based_stl_iterator_base<generic_randaccess_stl_iterator> Base;
using Base::m_index;
using Base::mp_xpr;
// TODO currently const Transpose/Reshape expressions never return const references,
// so let's return by value too.
//typedef typename internal::conditional<bool(has_direct_access), const value_type&, const value_type>::type read_only_ref_t;
typedef const value_type read_only_ref_t;
public:
typedef typename internal::conditional<bool(is_lvalue), value_type *, const value_type *>::type pointer;
typedef typename internal::conditional<bool(is_lvalue), value_type&, read_only_ref_t>::type reference;
generic_randaccess_stl_iterator() : Base() {}
generic_randaccess_stl_iterator(XprType& xpr, Index index) : Base(xpr,index) {}
generic_randaccess_stl_iterator(const typename Base::non_const_iterator& other) : Base(other) {}
using Base::operator=;
reference operator*() const { return (*mp_xpr)(m_index); }
reference operator[](Index i) const { return (*mp_xpr)(m_index+i); }
pointer operator->() const { return &((*mp_xpr)(m_index)); }
};
template<typename _XprType, DirectionType Direction>
struct indexed_based_stl_iterator_traits<subvector_stl_iterator<_XprType,Direction> >
{
typedef _XprType XprType;
typedef subvector_stl_iterator<typename internal::remove_const<XprType>::type, Direction> non_const_iterator;
typedef subvector_stl_iterator<typename internal::add_const<XprType>::type, Direction> const_iterator;
};
template<typename XprType, DirectionType Direction>
class subvector_stl_iterator : public indexed_based_stl_iterator_base<subvector_stl_iterator<XprType,Direction> >
{
protected:
enum { is_lvalue = internal::is_lvalue<XprType>::value };
typedef indexed_based_stl_iterator_base<subvector_stl_iterator> Base;
using Base::m_index;
using Base::mp_xpr;
typedef typename internal::conditional<Direction==Vertical,typename XprType::ColXpr,typename XprType::RowXpr>::type SubVectorType;
typedef typename internal::conditional<Direction==Vertical,typename XprType::ConstColXpr,typename XprType::ConstRowXpr>::type ConstSubVectorType;
public:
typedef typename internal::conditional<bool(is_lvalue), SubVectorType, ConstSubVectorType>::type reference;
typedef typename reference::PlainObject value_type;
private:
class subvector_stl_iterator_ptr
{
public:
subvector_stl_iterator_ptr(const reference &subvector) : m_subvector(subvector) {}
reference* operator->() { return &m_subvector; }
private:
reference m_subvector;
};
public:
typedef subvector_stl_iterator_ptr pointer;
subvector_stl_iterator() : Base() {}
subvector_stl_iterator(XprType& xpr, Index index) : Base(xpr,index) {}
reference operator*() const { return (*mp_xpr).template subVector<Direction>(m_index); }
reference operator[](Index i) const { return (*mp_xpr).template subVector<Direction>(m_index+i); }
pointer operator->() const { return (*mp_xpr).template subVector<Direction>(m_index); }
};
} // namespace internal
/** returns an iterator to the first element of the 1D vector or array
* \only_for_vectors
* \sa end(), cbegin()
*/
template<typename Derived>
inline typename DenseBase<Derived>::iterator DenseBase<Derived>::begin()
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
return iterator(derived(), 0);
}
/** const version of begin() */
template<typename Derived>
inline typename DenseBase<Derived>::const_iterator DenseBase<Derived>::begin() const
{
return cbegin();
}
/** returns a read-only const_iterator to the first element of the 1D vector or array
* \only_for_vectors
* \sa cend(), begin()
*/
template<typename Derived>
inline typename DenseBase<Derived>::const_iterator DenseBase<Derived>::cbegin() const
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
return const_iterator(derived(), 0);
}
/** returns an iterator to the element following the last element of the 1D vector or array
* \only_for_vectors
* \sa begin(), cend()
*/
template<typename Derived>
inline typename DenseBase<Derived>::iterator DenseBase<Derived>::end()
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
return iterator(derived(), size());
}
/** const version of end() */
template<typename Derived>
inline typename DenseBase<Derived>::const_iterator DenseBase<Derived>::end() const
{
return cend();
}
/** returns a read-only const_iterator to the element following the last element of the 1D vector or array
* \only_for_vectors
* \sa cbegin(), end()
*/
template<typename Derived>
inline typename DenseBase<Derived>::const_iterator DenseBase<Derived>::cend() const
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
return const_iterator(derived(), size());
}
} // namespace Eigen
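A minimal usage sketch (illustrative, not part of this header) of the 1D iterators defined above; names and sizes are arbitrary, and only a C++11 compiler plus Eigen/Dense is assumed:

#include <Eigen/Dense>
#include <algorithm>
#include <iostream>

int main() {
  Eigen::VectorXd v = Eigen::VectorXd::Random(8);
  std::sort(v.begin(), v.end());     // random-access pointer_based_stl_iterator
  for (double x : v)                 // range-for over the coefficients
    std::cout << x << ' ';
  std::cout << '\n';
}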

View File

@@ -30,12 +30,13 @@ public:
typedef typename Base::DstXprType DstXprType;
typedef swap_assign_op<Scalar> Functor;
EIGEN_DEVICE_FUNC generic_dense_assignment_kernel(DstEvaluatorTypeT &dst, const SrcEvaluatorTypeT &src, const Functor &func, DstXprType& dstExpr)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
generic_dense_assignment_kernel(DstEvaluatorTypeT &dst, const SrcEvaluatorTypeT &src, const Functor &func, DstXprType& dstExpr)
: Base(dst, src, func, dstExpr)
{}
template<int StoreMode, int LoadMode, typename PacketType>
void assignPacket(Index row, Index col)
EIGEN_STRONG_INLINE void assignPacket(Index row, Index col)
{
PacketType tmp = m_src.template packet<LoadMode,PacketType>(row,col);
const_cast<SrcEvaluatorTypeT&>(m_src).template writePacket<LoadMode>(row,col, m_dst.template packet<StoreMode,PacketType>(row,col));
@@ -43,7 +44,7 @@ public:
}
template<int StoreMode, int LoadMode, typename PacketType>
void assignPacket(Index index)
EIGEN_STRONG_INLINE void assignPacket(Index index)
{
PacketType tmp = m_src.template packet<LoadMode,PacketType>(index);
const_cast<SrcEvaluatorTypeT&>(m_src).template writePacket<LoadMode>(index, m_dst.template packet<StoreMode,PacketType>(index));
@@ -52,7 +53,7 @@ public:
// TODO find a simple way not to have to copy/paste this function from generic_dense_assignment_kernel, by simple I mean no CRTP (Gael)
template<int StoreMode, int LoadMode, typename PacketType>
void assignPacketByOuterInner(Index outer, Index inner)
EIGEN_STRONG_INLINE void assignPacketByOuterInner(Index outer, Index inner)
{
Index row = Base::rowIndexByOuterInner(outer, inner);
Index col = Base::colIndexByOuterInner(outer, inner);
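For context, the swap kernel above exchanges coefficients packet by packet; a scalar sketch of the same three-step exchange (hypothetical helper, not Eigen API; the final write-back happens in the part of the function not shown in this hunk):

template <typename T>
void swap_one(T& dst, T& src) {
  T tmp = src;   // tmp = m_src.packet(...)
  src = dst;     // m_src.writePacket(..., m_dst.packet(...))
  dst = tmp;     // m_dst.writePacket(..., tmp), presumably in the elided lines
}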

View File

@@ -61,24 +61,27 @@ template<typename MatrixType> class Transpose
typedef typename internal::remove_all<MatrixType>::type NestedExpression;
EIGEN_DEVICE_FUNC
explicit inline Transpose(MatrixType& matrix) : m_matrix(matrix) {}
explicit EIGEN_STRONG_INLINE Transpose(MatrixType& matrix) : m_matrix(matrix) {}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Transpose)
EIGEN_DEVICE_FUNC inline Index rows() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC inline Index cols() const { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index rows() const { return m_matrix.cols(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index cols() const { return m_matrix.rows(); }
/** \returns the nested expression */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const typename internal::remove_all<MatrixTypeNested>::type&
nestedExpression() const { return m_matrix; }
/** \returns the nested expression */
EIGEN_DEVICE_FUNC
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
typename internal::remove_reference<MatrixTypeNested>::type&
nestedExpression() { return m_matrix; }
/** \internal */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void resize(Index nrows, Index ncols) {
m_matrix.resize(ncols,nrows);
}
@@ -122,8 +125,10 @@ template<typename MatrixType> class TransposeImpl<MatrixType,Dense>
EIGEN_DENSE_PUBLIC_INTERFACE(Transpose<MatrixType>)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(TransposeImpl)
EIGEN_DEVICE_FUNC inline Index innerStride() const { return derived().nestedExpression().innerStride(); }
EIGEN_DEVICE_FUNC inline Index outerStride() const { return derived().nestedExpression().outerStride(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index innerStride() const { return derived().nestedExpression().innerStride(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Index outerStride() const { return derived().nestedExpression().outerStride(); }
typedef typename internal::conditional<
internal::is_lvalue<MatrixType>::value,
@@ -131,23 +136,23 @@ template<typename MatrixType> class TransposeImpl<MatrixType,Dense>
const Scalar
>::type ScalarWithConstIfNotLvalue;
EIGEN_DEVICE_FUNC inline ScalarWithConstIfNotLvalue* data() { return derived().nestedExpression().data(); }
EIGEN_DEVICE_FUNC inline const Scalar* data() const { return derived().nestedExpression().data(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
ScalarWithConstIfNotLvalue* data() { return derived().nestedExpression().data(); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar* data() const { return derived().nestedExpression().data(); }
// FIXME: shall we keep the const version of coeffRef?
EIGEN_DEVICE_FUNC
inline const Scalar& coeffRef(Index rowId, Index colId) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar& coeffRef(Index rowId, Index colId) const
{
return derived().nestedExpression().coeffRef(colId, rowId);
}
EIGEN_DEVICE_FUNC
inline const Scalar& coeffRef(Index index) const
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
const Scalar& coeffRef(Index index) const
{
return derived().nestedExpression().coeffRef(index);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TransposeImpl)
};
/** \returns an expression of the transpose of *this.
@@ -170,7 +175,8 @@ template<typename MatrixType> class TransposeImpl<MatrixType,Dense>
*
* \sa transposeInPlace(), adjoint() */
template<typename Derived>
inline Transpose<Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
Transpose<Derived>
DenseBase<Derived>::transpose()
{
return TransposeReturnType(derived());
@@ -182,7 +188,8 @@ DenseBase<Derived>::transpose()
*
* \sa transposeInPlace(), adjoint() */
template<typename Derived>
inline typename DenseBase<Derived>::ConstTransposeReturnType
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
typename DenseBase<Derived>::ConstTransposeReturnType
DenseBase<Derived>::transpose() const
{
return ConstTransposeReturnType(derived());
@@ -208,7 +215,7 @@ DenseBase<Derived>::transpose() const
*
* \sa adjointInPlace(), transpose(), conjugate(), class Transpose, class internal::scalar_conjugate_op */
template<typename Derived>
inline const typename MatrixBase<Derived>::AdjointReturnType
EIGEN_DEVICE_FUNC inline const typename MatrixBase<Derived>::AdjointReturnType
MatrixBase<Derived>::adjoint() const
{
return AdjointReturnType(this->transpose());
@@ -230,7 +237,7 @@ struct inplace_transpose_selector;
template<typename MatrixType>
struct inplace_transpose_selector<MatrixType,true,false> { // square matrix
static void run(MatrixType& m) {
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose());
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose().template triangularView<StrictlyUpper>());
}
};
@@ -255,7 +262,7 @@ template<typename MatrixType,bool MatchPacketSize>
struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non square matrix
static void run(MatrixType& m) {
if (m.rows()==m.cols())
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose());
m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose().template triangularView<StrictlyUpper>());
else
m = m.transpose().eval();
}
@@ -283,7 +290,7 @@ struct inplace_transpose_selector<MatrixType,false,MatchPacketSize> { // non squ
*
* \sa transpose(), adjoint(), adjointInPlace() */
template<typename Derived>
inline void DenseBase<Derived>::transposeInPlace()
EIGEN_DEVICE_FUNC inline void DenseBase<Derived>::transposeInPlace()
{
eigen_assert((rows() == cols() || (RowsAtCompileTime == Dynamic && ColsAtCompileTime == Dynamic))
&& "transposeInPlace() called on a non-square non-resizable matrix");
@@ -314,7 +321,7 @@ inline void DenseBase<Derived>::transposeInPlace()
*
* \sa transpose(), adjoint(), transposeInPlace() */
template<typename Derived>
inline void MatrixBase<Derived>::adjointInPlace()
EIGEN_DEVICE_FUNC inline void MatrixBase<Derived>::adjointInPlace()
{
derived() = adjoint().eval();
}
@@ -393,7 +400,8 @@ struct checkTransposeAliasing_impl<Derived, OtherDerived, false>
template<typename Dst, typename Src>
void check_for_aliasing(const Dst &dst, const Src &src)
{
internal::checkTransposeAliasing_impl<Dst, Src>::run(dst, src);
if((!Dst::IsVectorAtCompileTime) && dst.rows()>1 && dst.cols()>1)
internal::checkTransposeAliasing_impl<Dst, Src>::run(dst, src);
}
} // end namespace internal
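A usage sketch of the transposition API touched above (illustrative, not part of the diff): transposeInPlace() or an explicit eval() avoids the aliasing hazard of assigning a matrix to its own transpose.

#include <Eigen/Dense>

int main() {
  Eigen::MatrixXd m = Eigen::MatrixXd::Random(3, 3);
  m.transposeInPlace();                  // safe in-place transpose
  Eigen::MatrixXd t = m.transpose();     // fine: evaluated into a distinct matrix
  // m = m.transpose();                  // aliasing; write m = m.transpose().eval() instead
  (void)t;
}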

View File

@@ -33,17 +33,6 @@ class TranspositionsBase
indices() = other.indices();
return derived();
}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** This is a special case of the templated operator=. Its purpose is to
* prevent a default operator= from hiding the templated operator=.
*/
Derived& operator=(const TranspositionsBase& other)
{
indices() = other.indices();
return derived();
}
#endif
/** \returns the number of transpositions */
Index size() const { return indices().size(); }
@@ -84,7 +73,7 @@ class TranspositionsBase
}
// FIXME: do we want such methods ?
// might be usefull when the target matrix expression is complex, e.g.:
// might be useful when the target matrix expression is complex, e.g.:
// object.matrix().block(..,..,..,..) = trans * object.matrix().block(..,..,..,..);
/*
template<typename MatrixType>
@@ -171,12 +160,6 @@ class Transpositions : public TranspositionsBase<Transpositions<SizeAtCompileTim
inline Transpositions(const TranspositionsBase<OtherDerived>& other)
: m_indices(other.indices()) {}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** Standard copy constructor. Defined only to prevent a default copy constructor
* from hiding the other templated constructor */
inline Transpositions(const Transpositions& other) : m_indices(other.indices()) {}
#endif
/** Generic constructor from expression of the transposition indices. */
template<typename Other>
explicit inline Transpositions(const MatrixBase<Other>& indices) : m_indices(indices)
@@ -189,17 +172,6 @@ class Transpositions : public TranspositionsBase<Transpositions<SizeAtCompileTim
return Base::operator=(other);
}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** This is a special case of the templated operator=. Its purpose is to
* prevent a default operator= from hiding the templated operator=.
*/
Transpositions& operator=(const Transpositions& other)
{
m_indices = other.m_indices;
return *this;
}
#endif
/** Constructs an uninitialized permutation matrix of given size.
*/
inline Transpositions(Index size) : m_indices(size)
@@ -306,17 +278,6 @@ class TranspositionsWrapper
return Base::operator=(other);
}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** This is a special case of the templated operator=. Its purpose is to
* prevent a default operator= from hiding the templated operator=.
*/
TranspositionsWrapper& operator=(const TranspositionsWrapper& other)
{
m_indices = other.m_indices;
return *this;
}
#endif
/** const version of indices(). */
const IndicesType& indices() const { return m_indices; }

View File

@@ -65,6 +65,7 @@ template<typename Derived> class TriangularBase : public EigenBase<Derived>
inline Index innerStride() const { return derived().innerStride(); }
// dummy resize function
EIGEN_DEVICE_FUNC
void resize(Index rows, Index cols)
{
EIGEN_UNUSED_VARIABLE(rows);
@@ -197,6 +198,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
typedef typename internal::traits<TriangularView>::MatrixTypeNestedNonRef MatrixTypeNestedNonRef;
typedef typename internal::remove_all<typename MatrixType::ConjugateReturnType>::type MatrixConjugateReturnType;
typedef TriangularView<typename internal::add_const<MatrixType>::type, _Mode> ConstTriangularView;
public:
@@ -217,7 +219,9 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
explicit inline TriangularView(MatrixType& matrix) : m_matrix(matrix)
{}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(TriangularView)
using Base::operator=;
TriangularView& operator=(const TriangularView &other)
{ return Base::operator=(other); }
/** \copydoc EigenBase::rows() */
EIGEN_DEVICE_FUNC
@@ -240,6 +244,18 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularView
inline const ConjugateReturnType conjugate() const
{ return ConjugateReturnType(m_matrix.conjugate()); }
/** \returns an expression of the complex conjugate of \c *this if Cond==true,
* returns \c *this otherwise.
*/
template<bool Cond>
EIGEN_DEVICE_FUNC
inline typename internal::conditional<Cond,ConjugateReturnType,ConstTriangularView>::type
conjugateIf() const
{
typedef typename internal::conditional<Cond,ConjugateReturnType,ConstTriangularView>::type ReturnType;
return ReturnType(m_matrix.template conjugateIf<Cond>());
}
typedef TriangularView<const typename MatrixType::AdjointReturnType,TransposeMode> AdjointReturnType;
/** \sa MatrixBase::adjoint() const */
EIGEN_DEVICE_FUNC
@@ -433,14 +449,14 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
TriangularViewType& operator=(const TriangularViewImpl& other)
{ return *this = other.derived().nestedExpression(); }
/** \deprecated */
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
/** \deprecated */
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC
void lazyAssign(const TriangularBase<OtherDerived>& other);
/** \deprecated */
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
/** \deprecated */
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC
void lazyAssign(const MatrixBase<OtherDerived>& other);
#endif
@@ -468,7 +484,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
* \a Side==OnTheLeft (the default), or the right-inverse-multiply \a other * inverse(\c *this) if
* \a Side==OnTheRight.
*
* Note that the template parameter \c Side can be ommitted, in which case \c Side==OnTheLeft
* Note that the template parameter \c Side can be omitted, in which case \c Side==OnTheLeft
*
* The matrix \c *this must be triangular and invertible (i.e., all the coefficients of the
* diagonal must be non zero). It works as a forward (resp. backward) substitution if \c *this
@@ -486,7 +502,6 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
* \sa TriangularView::solveInPlace()
*/
template<int Side, typename Other>
EIGEN_DEVICE_FUNC
inline const internal::triangular_solve_retval<Side,TriangularViewType, Other>
solve(const MatrixBase<Other>& other) const;
@@ -495,7 +510,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
* \warning The parameter is only marked 'const' to make the C++ compiler accept a temporary expression here.
* This function will const_cast it, so constness isn't honored here.
*
* Note that the template parameter \c Side can be ommitted, in which case \c Side==OnTheLeft
* Note that the template parameter \c Side can be omitted, in which case \c Side==OnTheLeft
*
* See TriangularView::solve() for the details.
*/
@@ -521,10 +536,10 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
call_assignment(derived(), other.const_cast_derived(), internal::swap_assign_op<Scalar>());
}
/** \deprecated
* Shortcut for \code (*this).swap(other.triangularView<(*this)::Mode>()) \endcode */
/** Shortcut for \code (*this).swap(other.triangularView<(*this)::Mode>()) \endcode */
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
/** \deprecated */
EIGEN_DEPRECATED EIGEN_DEVICE_FUNC
void swap(MatrixBase<OtherDerived> const & other)
{
EIGEN_STATIC_ASSERT_LVALUE(OtherDerived);
@@ -542,10 +557,6 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
template<typename ProductType>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE TriangularViewType& _assignProduct(const ProductType& prod, const Scalar& alpha, bool beta);
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(TriangularViewImpl)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TriangularViewImpl)
};
/***************************************************************************
@@ -556,7 +567,7 @@ template<typename _MatrixType, unsigned int _Mode> class TriangularViewImpl<_Mat
// FIXME should we keep that possibility
template<typename MatrixType, unsigned int Mode>
template<typename OtherDerived>
inline TriangularView<MatrixType, Mode>&
EIGEN_DEVICE_FUNC inline TriangularView<MatrixType, Mode>&
TriangularViewImpl<MatrixType, Mode, Dense>::operator=(const MatrixBase<OtherDerived>& other)
{
internal::call_assignment_no_alias(derived(), other.derived(), internal::assign_op<Scalar,typename OtherDerived::Scalar>());
@@ -566,7 +577,7 @@ TriangularViewImpl<MatrixType, Mode, Dense>::operator=(const MatrixBase<OtherDer
// FIXME should we keep that possibility
template<typename MatrixType, unsigned int Mode>
template<typename OtherDerived>
void TriangularViewImpl<MatrixType, Mode, Dense>::lazyAssign(const MatrixBase<OtherDerived>& other)
EIGEN_DEVICE_FUNC void TriangularViewImpl<MatrixType, Mode, Dense>::lazyAssign(const MatrixBase<OtherDerived>& other)
{
internal::call_assignment_no_alias(derived(), other.template triangularView<Mode>());
}
@@ -575,7 +586,7 @@ void TriangularViewImpl<MatrixType, Mode, Dense>::lazyAssign(const MatrixBase<Ot
template<typename MatrixType, unsigned int Mode>
template<typename OtherDerived>
inline TriangularView<MatrixType, Mode>&
EIGEN_DEVICE_FUNC inline TriangularView<MatrixType, Mode>&
TriangularViewImpl<MatrixType, Mode, Dense>::operator=(const TriangularBase<OtherDerived>& other)
{
eigen_assert(Mode == int(OtherDerived::Mode));
@@ -585,7 +596,7 @@ TriangularViewImpl<MatrixType, Mode, Dense>::operator=(const TriangularBase<Othe
template<typename MatrixType, unsigned int Mode>
template<typename OtherDerived>
void TriangularViewImpl<MatrixType, Mode, Dense>::lazyAssign(const TriangularBase<OtherDerived>& other)
EIGEN_DEVICE_FUNC void TriangularViewImpl<MatrixType, Mode, Dense>::lazyAssign(const TriangularBase<OtherDerived>& other)
{
eigen_assert(Mode == int(OtherDerived::Mode));
internal::call_assignment_no_alias(derived(), other.derived());
@@ -600,7 +611,7 @@ void TriangularViewImpl<MatrixType, Mode, Dense>::lazyAssign(const TriangularBas
* If the matrix is triangular, the opposite part is set to zero. */
template<typename Derived>
template<typename DenseDerived>
void TriangularBase<Derived>::evalTo(MatrixBase<DenseDerived> &other) const
EIGEN_DEVICE_FUNC void TriangularBase<Derived>::evalTo(MatrixBase<DenseDerived> &other) const
{
evalToLazy(other.derived());
}
@@ -626,6 +637,7 @@ void TriangularBase<Derived>::evalTo(MatrixBase<DenseDerived> &other) const
*/
template<typename Derived>
template<unsigned int Mode>
EIGEN_DEVICE_FUNC
typename MatrixBase<Derived>::template TriangularViewReturnType<Mode>::Type
MatrixBase<Derived>::triangularView()
{
@@ -635,6 +647,7 @@ MatrixBase<Derived>::triangularView()
/** This is the const version of MatrixBase::triangularView() */
template<typename Derived>
template<unsigned int Mode>
EIGEN_DEVICE_FUNC
typename MatrixBase<Derived>::template ConstTriangularViewReturnType<Mode>::Type
MatrixBase<Derived>::triangularView() const
{
@@ -717,6 +730,7 @@ struct unary_evaluator<TriangularView<MatrixType,Mode>, IndexBased>
{
typedef TriangularView<MatrixType,Mode> XprType;
typedef evaluator<typename internal::remove_all<MatrixType>::type> Base;
EIGEN_DEVICE_FUNC
unary_evaluator(const XprType &xpr) : Base(xpr.nestedExpression()) {}
};
@@ -932,7 +946,7 @@ struct triangular_assignment_loop<Kernel, Mode, Dynamic, SetOpposite>
* If the matrix is triangular, the opposite part is set to zero. */
template<typename Derived>
template<typename DenseDerived>
void TriangularBase<Derived>::evalToLazy(MatrixBase<DenseDerived> &other) const
EIGEN_DEVICE_FUNC void TriangularBase<Derived>::evalToLazy(MatrixBase<DenseDerived> &other) const
{
other.derived().resize(this->rows(), this->cols());
internal::call_triangular_assignment_loop<Derived::Mode,(Derived::Mode&SelfAdjoint)==0 /* SetOpposite */>(other.derived(), derived().nestedExpression());
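A short usage sketch of the TriangularView API discussed above (names and values are illustrative):

#include <Eigen/Dense>

int main() {
  Eigen::Matrix3d A = Eigen::Matrix3d::Random();
  A.diagonal().array() += 4.0;           // keep the diagonal away from zero
  Eigen::Vector3d b(1.0, 2.0, 3.0);
  // Solve the upper-triangular system U x = b using only the upper part of A.
  Eigen::Vector3d x = A.triangularView<Eigen::Upper>().solve(b);
  (void)x;
}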

View File

@@ -35,7 +35,7 @@ struct traits<VectorBlock<VectorType, Size> >
* It is the return type of DenseBase::segment(Index,Index) and DenseBase::segment<int>(Index) and
* most of the time this is the only way it is used.
*
* However, if you want to directly maniputate sub-vector expressions,
* However, if you want to directly manipulate sub-vector expressions,
* for instance if you want to write a function returning such an expression, you
* will need to use this class.
*
@@ -71,8 +71,8 @@ template<typename VectorType, int Size> class VectorBlock
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC
inline VectorBlock(VectorType& vector, Index start, Index size)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
VectorBlock(VectorType& vector, Index start, Index size)
: Base(vector,
IsColVector ? start : 0, IsColVector ? 0 : start,
IsColVector ? size : 1, IsColVector ? 1 : size)
@@ -82,8 +82,8 @@ template<typename VectorType, int Size> class VectorBlock
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC
inline VectorBlock(VectorType& vector, Index start)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
VectorBlock(VectorType& vector, Index start)
: Base(vector, IsColVector ? start : 0, IsColVector ? 0 : start)
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorBlock);
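In practice VectorBlock is obtained through segment()/head()/tail() rather than constructed directly; a brief sketch (illustrative names and sizes):

#include <Eigen/Dense>

int main() {
  Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(6, 0.0, 5.0);
  v.segment(2, 3).setZero();      // dynamic-size VectorBlock covering elements 2,3,4
  v.head<2>().setConstant(1.0);   // fixed-size VectorBlock over the first two elements
}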

View File

@@ -1,7 +1,7 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
// Copyright (C) 2008-2019 Gael Guennebaud <gael.guennebaud@inria.fr>
// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
//
// This Source Code Form is subject to the terms of the Mozilla
@@ -81,39 +81,46 @@ class PartialReduxExpr : public internal::dense_xpr_base< PartialReduxExpr<Matri
const MemberOp m_functor;
};
#define EIGEN_MEMBER_FUNCTOR(MEMBER,COST) \
template <typename ResultType> \
struct member_##MEMBER { \
EIGEN_EMPTY_STRUCT_CTOR(member_##MEMBER) \
typedef ResultType result_type; \
template<typename Scalar, int Size> struct Cost \
{ enum { value = COST }; }; \
template<typename XprType> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
ResultType operator()(const XprType& mat) const \
{ return mat.MEMBER(); } \
template<typename A,typename B> struct partial_redux_dummy_func;
#define EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(MEMBER,COST,VECTORIZABLE,BINARYOP) \
template <typename ResultType,typename Scalar> \
struct member_##MEMBER { \
EIGEN_EMPTY_STRUCT_CTOR(member_##MEMBER) \
typedef ResultType result_type; \
typedef BINARYOP<Scalar,Scalar> BinaryOp; \
template<int Size> struct Cost { enum { value = COST }; }; \
enum { Vectorizable = VECTORIZABLE }; \
template<typename XprType> \
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \
ResultType operator()(const XprType& mat) const \
{ return mat.MEMBER(); } \
BinaryOp binaryFunc() const { return BinaryOp(); } \
}
#define EIGEN_MEMBER_FUNCTOR(MEMBER,COST) \
EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(MEMBER,COST,0,partial_redux_dummy_func)
namespace internal {
EIGEN_MEMBER_FUNCTOR(squaredNorm, Size * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(norm, (Size+5) * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(stableNorm, (Size+5) * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(blueNorm, (Size+5) * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(hypotNorm, (Size-1) * functor_traits<scalar_hypot_op<Scalar> >::Cost );
EIGEN_MEMBER_FUNCTOR(sum, (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(mean, (Size-1)*NumTraits<Scalar>::AddCost + NumTraits<Scalar>::MulCost);
EIGEN_MEMBER_FUNCTOR(minCoeff, (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(maxCoeff, (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(all, (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(any, (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(count, (Size-1)*NumTraits<Scalar>::AddCost);
EIGEN_MEMBER_FUNCTOR(prod, (Size-1)*NumTraits<Scalar>::MulCost);
template <int p, typename ResultType>
EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(sum, (Size-1)*NumTraits<Scalar>::AddCost, 1, internal::scalar_sum_op);
EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(minCoeff, (Size-1)*NumTraits<Scalar>::AddCost, 1, internal::scalar_min_op);
EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(maxCoeff, (Size-1)*NumTraits<Scalar>::AddCost, 1, internal::scalar_max_op);
EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(prod, (Size-1)*NumTraits<Scalar>::MulCost, 1, internal::scalar_product_op);
template <int p, typename ResultType,typename Scalar>
struct member_lpnorm {
typedef ResultType result_type;
template<typename Scalar, int Size> struct Cost
enum { Vectorizable = 0 };
template<int Size> struct Cost
{ enum { value = (Size+5) * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost }; };
EIGEN_DEVICE_FUNC member_lpnorm() {}
template<typename XprType>
@@ -121,17 +128,20 @@ struct member_lpnorm {
{ return mat.template lpNorm<p>(); }
};
template <typename BinaryOp, typename Scalar>
template <typename BinaryOpT, typename Scalar>
struct member_redux {
typedef BinaryOpT BinaryOp;
typedef typename result_of<
BinaryOp(const Scalar&,const Scalar&)
>::type result_type;
template<typename _Scalar, int Size> struct Cost
{ enum { value = (Size-1) * functor_traits<BinaryOp>::Cost }; };
enum { Vectorizable = functor_traits<BinaryOp>::PacketAccess };
template<int Size> struct Cost { enum { value = (Size-1) * functor_traits<BinaryOp>::Cost }; };
EIGEN_DEVICE_FUNC explicit member_redux(const BinaryOp func) : m_functor(func) {}
template<typename Derived>
EIGEN_DEVICE_FUNC inline result_type operator()(const DenseBase<Derived>& mat) const
{ return mat.redux(m_functor); }
const BinaryOp& binaryFunc() const { return m_functor; }
const BinaryOp m_functor;
};
}
@@ -139,18 +149,38 @@ struct member_redux {
/** \class VectorwiseOp
* \ingroup Core_Module
*
* \brief Pseudo expression providing partial reduction operations
* \brief Pseudo expression providing broadcasting and partial reduction operations
*
* \tparam ExpressionType the type of the object on which to do partial reductions
* \tparam Direction indicates the direction of the redux (#Vertical or #Horizontal)
* \tparam Direction indicates whether to operate on columns (#Vertical) or rows (#Horizontal)
*
* This class represents a pseudo expression with partial reduction features.
* This class represents a pseudo expression with broadcasting and partial reduction features.
* It is the return type of DenseBase::colwise() and DenseBase::rowwise()
* and most of the time this is the only way it is used.
* and most of the time this is the only way it is explicitly used.
*
* To understand the logic of rowwise/colwise expression, let's consider a generic case `A.colwise().foo()`
* where `foo` is any method of `VectorwiseOp`. This expression is equivalent to applying `foo()` to each
* column of `A` and then re-assemble the outputs in a matrix expression:
* \code [A.col(0).foo(), A.col(1).foo(), ..., A.col(A.cols()-1).foo()] \endcode
*
* Example: \include MatrixBase_colwise.cpp
* Output: \verbinclude MatrixBase_colwise.out
*
* The begin() and end() methods are obviously exceptions to the previous rule as they
* return STL-compatible begin/end iterators to the rows or columns of the nested expression.
* Typical use cases include for-range-loop and calls to STL algorithms:
*
* Example: \include MatrixBase_colwise_iterator_cxx11.cpp
* Output: \verbinclude MatrixBase_colwise_iterator_cxx11.out
*
* For a partial reduction on an empty input, some rules apply.
* For the sake of clarity, let's consider a vertical reduction:
* - If the number of columns is zero, then a 1x0 row-major vector expression is returned.
* - Otherwise, if the number of rows is zero, then
* - a row vector of zeros is returned for sum-like reductions (sum, squaredNorm, norm, etc.)
* - a row vector of ones is returned for a product reduction (e.g., <code>MatrixXd(n,0).colwise().prod()</code>)
* - an assert is triggered for all other reductions (minCoeff,maxCoeff,redux(bin_op))
*
* \sa DenseBase::colwise(), DenseBase::rowwise(), class PartialReduxExpr
*/
template<typename ExpressionType, int Direction> class VectorwiseOp
@@ -163,11 +193,11 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
typedef typename internal::ref_selector<ExpressionType>::non_const_type ExpressionTypeNested;
typedef typename internal::remove_all<ExpressionTypeNested>::type ExpressionTypeNestedCleaned;
template<template<typename _Scalar> class Functor,
typename Scalar_=Scalar> struct ReturnType
template<template<typename OutScalar,typename InputScalar> class Functor,
typename ReturnScalar=Scalar> struct ReturnType
{
typedef PartialReduxExpr<ExpressionType,
Functor<Scalar_>,
Functor<ReturnScalar,Scalar>,
Direction
> Type;
};
@@ -186,24 +216,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
};
protected:
typedef typename internal::conditional<isVertical,
typename ExpressionType::ColXpr,
typename ExpressionType::RowXpr>::type SubVector;
/** \internal
* \returns the i-th subvector according to the \c Direction */
EIGEN_DEVICE_FUNC
SubVector subVector(Index i)
{
return SubVector(m_matrix.derived(),i);
}
/** \internal
* \returns the number of subvectors in the direction \c Direction */
EIGEN_DEVICE_FUNC
Index subVectors() const
{ return isVertical?m_matrix.cols():m_matrix.rows(); }
template<typename OtherDerived> struct ExtendedType {
typedef Replicate<OtherDerived,
isVertical ? 1 : ExpressionType::RowsAtCompileTime,
@@ -258,42 +271,81 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
EIGEN_DEVICE_FUNC
inline const ExpressionType& _expression() const { return m_matrix; }
#ifdef EIGEN_PARSED_BY_DOXYGEN
/** STL-like <a href="https://en.cppreference.com/w/cpp/named_req/RandomAccessIterator">RandomAccessIterator</a>
* iterator type over the columns or rows as returned by the begin() and end() methods.
*/
random_access_iterator_type iterator;
/** This is the const version of iterator (aka read-only) */
random_access_iterator_type const_iterator;
#else
typedef internal::subvector_stl_iterator<ExpressionType, DirectionType(Direction)> iterator;
typedef internal::subvector_stl_iterator<const ExpressionType, DirectionType(Direction)> const_iterator;
#endif
/** returns an iterator to the first row (rowwise) or column (colwise) of the nested expression.
* \sa end(), cbegin()
*/
iterator begin() { return iterator (m_matrix, 0); }
/** const version of begin() */
const_iterator begin() const { return const_iterator(m_matrix, 0); }
/** const version of begin() */
const_iterator cbegin() const { return const_iterator(m_matrix, 0); }
/** returns an iterator to the row (resp. column) following the last row (resp. column) of the nested expression
* \sa begin(), cend()
*/
iterator end() { return iterator (m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** const version of end() */
const_iterator end() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** const version of end() */
const_iterator cend() const { return const_iterator(m_matrix, m_matrix.template subVectors<DirectionType(Direction)>()); }
/** \returns a row or column vector expression of \c *this reduxed by \a func
*
* The template parameter \a BinaryOp is the type of the functor
* of the custom redux operator. Note that func must be an associative operator.
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
* \sa class VectorwiseOp, DenseBase::colwise(), DenseBase::rowwise()
*/
template<typename BinaryOp>
EIGEN_DEVICE_FUNC
const typename ReduxReturnType<BinaryOp>::Type
redux(const BinaryOp& func = BinaryOp()) const
{ return typename ReduxReturnType<BinaryOp>::Type(_expression(), internal::member_redux<BinaryOp,Scalar>(func)); }
{
eigen_assert(redux_length()>0 && "you are using an empty matrix");
return typename ReduxReturnType<BinaryOp>::Type(_expression(), internal::member_redux<BinaryOp,Scalar>(func));
}
typedef typename ReturnType<internal::member_minCoeff>::Type MinCoeffReturnType;
typedef typename ReturnType<internal::member_maxCoeff>::Type MaxCoeffReturnType;
typedef typename ReturnType<internal::member_squaredNorm,RealScalar>::Type SquaredNormReturnType;
typedef typename ReturnType<internal::member_norm,RealScalar>::Type NormReturnType;
typedef PartialReduxExpr<const CwiseUnaryOp<internal::scalar_abs2_op<Scalar>, const ExpressionTypeNestedCleaned>,internal::member_sum<RealScalar,RealScalar>,Direction> SquaredNormReturnType;
typedef CwiseUnaryOp<internal::scalar_sqrt_op<RealScalar>, const SquaredNormReturnType> NormReturnType;
typedef typename ReturnType<internal::member_blueNorm,RealScalar>::Type BlueNormReturnType;
typedef typename ReturnType<internal::member_stableNorm,RealScalar>::Type StableNormReturnType;
typedef typename ReturnType<internal::member_hypotNorm,RealScalar>::Type HypotNormReturnType;
typedef typename ReturnType<internal::member_sum>::Type SumReturnType;
typedef typename ReturnType<internal::member_mean>::Type MeanReturnType;
typedef EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(SumReturnType,Scalar,quotient) MeanReturnType;
typedef typename ReturnType<internal::member_all>::Type AllReturnType;
typedef typename ReturnType<internal::member_any>::Type AnyReturnType;
typedef PartialReduxExpr<ExpressionType, internal::member_count<Index>, Direction> CountReturnType;
typedef PartialReduxExpr<ExpressionType, internal::member_count<Index,Scalar>, Direction> CountReturnType;
typedef typename ReturnType<internal::member_prod>::Type ProdReturnType;
typedef Reverse<const ExpressionType, Direction> ConstReverseReturnType;
typedef Reverse<ExpressionType, Direction> ReverseReturnType;
template<int p> struct LpNormReturnType {
typedef PartialReduxExpr<ExpressionType, internal::member_lpnorm<p,RealScalar>,Direction> Type;
typedef PartialReduxExpr<ExpressionType, internal::member_lpnorm<p,RealScalar,Scalar>,Direction> Type;
};
/** \returns a row (or column) vector expression of the smallest coefficient
* of each column (or row) of the referenced expression.
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* Example: \include PartialRedux_minCoeff.cpp
@@ -302,11 +354,17 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
* \sa DenseBase::minCoeff() */
EIGEN_DEVICE_FUNC
const MinCoeffReturnType minCoeff() const
{ return MinCoeffReturnType(_expression()); }
{
eigen_assert(redux_length()>0 && "you are using an empty matrix");
return MinCoeffReturnType(_expression());
}
/** \returns a row (or column) vector expression of the largest coefficient
* of each column (or row) of the referenced expression.
*
* \warning the size along the reduction direction must be strictly positive,
* otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* Example: \include PartialRedux_maxCoeff.cpp
@@ -315,7 +373,10 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
* \sa DenseBase::maxCoeff() */
EIGEN_DEVICE_FUNC
const MaxCoeffReturnType maxCoeff() const
{ return MaxCoeffReturnType(_expression()); }
{
eigen_assert(redux_length()>0 && "you are using an empty matrix");
return MaxCoeffReturnType(_expression());
}
/** \returns a row (or column) vector expression of the squared norm
* of each column (or row) of the referenced expression.
@@ -327,7 +388,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
* \sa DenseBase::squaredNorm() */
EIGEN_DEVICE_FUNC
const SquaredNormReturnType squaredNorm() const
{ return SquaredNormReturnType(_expression()); }
{ return SquaredNormReturnType(m_matrix.cwiseAbs2()); }
/** \returns a row (or column) vector expression of the norm
* of each column (or row) of the referenced expression.
@@ -339,7 +400,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
* \sa DenseBase::norm() */
EIGEN_DEVICE_FUNC
const NormReturnType norm() const
{ return NormReturnType(_expression()); }
{ return NormReturnType(squaredNorm()); }
/** \returns a row (or column) vector expression of the norm
* of each column (or row) of the referenced expression.
@@ -404,7 +465,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
* \sa DenseBase::mean() */
EIGEN_DEVICE_FUNC
const MeanReturnType mean() const
{ return MeanReturnType(_expression()); }
{ return sum() / Scalar(Direction==Vertical?m_matrix.rows():m_matrix.cols()); }
/** \returns a row (or column) vector expression representing
* whether \b all coefficients of each respective column (or row) are \c true.
@@ -500,7 +561,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
//eigen_assert((m_matrix.isNull()) == (other.isNull())); FIXME
return const_cast<ExpressionType&>(m_matrix = extendedTo(other.derived()));
return m_matrix = extendedTo(other.derived());
}
/** Adds the vector \a other to each subvector of \c *this */
@@ -510,7 +571,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
return const_cast<ExpressionType&>(m_matrix += extendedTo(other.derived()));
return m_matrix += extendedTo(other.derived());
}
/** Subtracts the vector \a other from each subvector of \c *this */
@@ -520,7 +581,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
{
EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
return const_cast<ExpressionType&>(m_matrix -= extendedTo(other.derived()));
return m_matrix -= extendedTo(other.derived());
}
/** Multiplies each subvector of \c *this by the vector \a other */
@@ -532,7 +593,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType)
EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
m_matrix *= extendedTo(other.derived());
return const_cast<ExpressionType&>(m_matrix);
return m_matrix;
}
/** Divides each subvector of \c *this by the vector \a other */
@@ -544,7 +605,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType)
EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
m_matrix /= extendedTo(other.derived());
return const_cast<ExpressionType&>(m_matrix);
return m_matrix;
}
/** Returns the expression of the sum of the vector \a other and each subvector of \c *this */
@@ -609,7 +670,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
EIGEN_DEVICE_FUNC
CwiseBinaryOp<internal::scalar_quotient_op<Scalar>,
const ExpressionTypeNestedCleaned,
const typename OppositeExtendedType<typename ReturnType<internal::member_norm,RealScalar>::Type>::Type>
const typename OppositeExtendedType<NormReturnType>::Type>
normalized() const { return m_matrix.cwiseQuotient(extendedToOpposite(this->norm())); }
@@ -659,6 +720,10 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
const HNormalizedReturnType hnormalized() const;
protected:
Index redux_length() const
{
return Direction==Vertical ? m_matrix.rows() : m_matrix.cols();
}
ExpressionTypeNested m_matrix;
};
@@ -670,7 +735,7 @@ template<typename ExpressionType, int Direction> class VectorwiseOp
* \sa rowwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting
*/
template<typename Derived>
inline typename DenseBase<Derived>::ColwiseReturnType
EIGEN_DEVICE_FUNC inline typename DenseBase<Derived>::ColwiseReturnType
DenseBase<Derived>::colwise()
{
return ColwiseReturnType(derived());
@@ -684,7 +749,7 @@ DenseBase<Derived>::colwise()
* \sa colwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting
*/
template<typename Derived>
inline typename DenseBase<Derived>::RowwiseReturnType
EIGEN_DEVICE_FUNC inline typename DenseBase<Derived>::RowwiseReturnType
DenseBase<Derived>::rowwise()
{
return RowwiseReturnType(derived());
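A usage sketch of the broadcasting, partial-reduction, and per-column iteration features documented above (illustrative; the empty-matrix line follows the reduction rules listed in the class documentation):

#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 3);
  Eigen::RowVector3d colSums = A.colwise().sum();       // partial reduction per column
  A.rowwise() += Eigen::RowVector3d(1.0, 2.0, 3.0);     // broadcast a row over every row
  for (auto col : A.colwise())                          // STL-style iteration over columns
    std::cout << col.norm() << ' ';
  std::cout << '\n';
  Eigen::MatrixXd empty(0, 3);
  Eigen::RowVector3d zeros = empty.colwise().sum();     // rows()==0: sum-like reductions give zeros
  (void)colSums; (void)zeros;
}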

View File

@@ -40,6 +40,14 @@ struct visitor_impl<Visitor, Derived, 1>
}
};
// This specialization enables visitors on empty matrices at compile-time
template<typename Visitor, typename Derived>
struct visitor_impl<Visitor, Derived, 0> {
EIGEN_DEVICE_FUNC
static inline void run(const Derived &/*mat*/, Visitor& /*visitor*/)
{}
};
template<typename Visitor, typename Derived>
struct visitor_impl<Visitor, Derived, Dynamic>
{
@@ -98,6 +106,8 @@ protected:
*
* \note compared to one or two \em for \em loops, visitors offer automatic
* unrolling for small fixed size matrix.
*
* \note if the matrix is empty, then the visitor is left unchanged.
*
* \sa minCoeff(Index*,Index*), maxCoeff(Index*,Index*), DenseBase::redux()
*/
@@ -106,6 +116,9 @@ template<typename Visitor>
EIGEN_DEVICE_FUNC
void DenseBase<Derived>::visit(Visitor& visitor) const
{
if(size()==0)
return;
typedef typename internal::visitor_evaluator<Derived> ThisEvaluator;
ThisEvaluator thisEval(derived());
@@ -124,6 +137,9 @@ namespace internal {
template <typename Derived>
struct coeff_visitor
{
// default initialization to avoid countless invalid maybe-uninitialized warnings by gcc
EIGEN_DEVICE_FUNC
coeff_visitor() : row(-1), col(-1), res(0) {}
typedef typename Derived::Scalar Scalar;
Index row, col;
Scalar res;
@@ -196,6 +212,9 @@ struct functor_traits<max_coeff_visitor<Scalar> > {
/** \fn DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
* \returns the minimum of all coefficients of *this and puts in *row and *col its location.
*
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(Index*), DenseBase::maxCoeff(Index*,Index*), DenseBase::visit(), DenseBase::minCoeff()
@@ -206,6 +225,8 @@ EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
internal::min_coeff_visitor<Derived> minVisitor;
this->visit(minVisitor);
*rowId = minVisitor.row;
@@ -214,6 +235,9 @@ DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
}
/** \returns the minimum of all coefficients of *this and puts in *index its location.
*
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::visit(), DenseBase::minCoeff()
@@ -224,6 +248,8 @@ EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::minCoeff(IndexType* index) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
internal::min_coeff_visitor<Derived> minVisitor;
this->visit(minVisitor);
@@ -233,6 +259,9 @@ DenseBase<Derived>::minCoeff(IndexType* index) const
/** \fn DenseBase<Derived>::maxCoeff(IndexType* rowId, IndexType* colId) const
* \returns the maximum of all coefficients of *this and puts in *row and *col its location.
*
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visit(), DenseBase::maxCoeff()
@@ -243,6 +272,8 @@ EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff(IndexType* rowPtr, IndexType* colPtr) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
internal::max_coeff_visitor<Derived> maxVisitor;
this->visit(maxVisitor);
*rowPtr = maxVisitor.row;
@@ -251,6 +282,9 @@ DenseBase<Derived>::maxCoeff(IndexType* rowPtr, IndexType* colPtr) const
}
/** \returns the maximum of all coefficients of *this and puts in *index its location.
*
* \warning the matrix must be not empty, otherwise an assertion is triggered.
*
* \warning the result is undefined if \c *this contains NaN.
*
* \sa DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visitor(), DenseBase::maxCoeff()
@@ -261,6 +295,8 @@ EIGEN_DEVICE_FUNC
typename internal::traits<Derived>::Scalar
DenseBase<Derived>::maxCoeff(IndexType* index) const
{
eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix");
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
internal::max_coeff_visitor<Derived> maxVisitor;
this->visit(maxVisitor);
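A usage sketch of the index-returning visitors above (illustrative; after this change the matrix must be non-empty or the new assertion fires):

#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::MatrixXd m = Eigen::MatrixXd::Random(4, 4);
  Eigen::Index i, j, k, l;
  double mn = m.minCoeff(&i, &j);   // min_coeff_visitor under the hood
  double mx = m.maxCoeff(&k, &l);   // max_coeff_visitor under the hood
  std::cout << "min " << mn << " at (" << i << ',' << j << "), "
            << "max " << mx << " at (" << k << ',' << l << ")\n";
}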

View File

@@ -22,6 +22,7 @@ struct Packet4cf
__m256 v;
};
#ifndef EIGEN_VECTORIZE_AVX512
template<> struct packet_traits<std::complex<float> > : default_packet_traits
{
typedef Packet4cf type;
@@ -44,8 +45,9 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
HasSetLinear = 0
};
};
#endif
template<> struct unpacket_traits<Packet4cf> { typedef std::complex<float> type; enum {size=4, alignment=Aligned32}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet4cf> { typedef std::complex<float> type; enum {size=4, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> EIGEN_STRONG_INLINE Packet4cf padd<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf psub<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_sub_ps(a.v,b.v)); }
@@ -67,10 +69,18 @@ template<> EIGEN_STRONG_INLINE Packet4cf pmul<Packet4cf>(const Packet4cf& a, con
return Packet4cf(result);
}
template <>
EIGEN_STRONG_INLINE Packet4cf pcmp_eq(const Packet4cf& a, const Packet4cf& b) {
__m256 eq = _mm256_cmp_ps(a.v, b.v, _CMP_EQ_OQ);
return Packet4cf(_mm256_and_ps(eq, _mm256_permute_ps(eq, 0xb1)));
}
template<> EIGEN_STRONG_INLINE Packet4cf ptrue<Packet4cf>(const Packet4cf& a) { return Packet4cf(ptrue(Packet8f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cf pnot<Packet4cf>(const Packet4cf& a) { return Packet4cf(pnot(Packet8f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cf pand <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_and_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf por <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_or_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf pxor <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_xor_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf pandnot<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_andnot_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf pandnot<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_andnot_ps(b.v,a.v)); }
template<> EIGEN_STRONG_INLINE Packet4cf pload <Packet4cf>(const std::complex<float>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet4cf(pload<Packet8f>(&numext::real_ref(*from))); }
template<> EIGEN_STRONG_INLINE Packet4cf ploadu<Packet4cf>(const std::complex<float>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet4cf(ploadu<Packet8f>(&numext::real_ref(*from))); }
@@ -228,6 +238,7 @@ struct Packet2cd
__m256d v;
};
#ifndef EIGEN_VECTORIZE_AVX512
template<> struct packet_traits<std::complex<double> > : default_packet_traits
{
typedef Packet2cd type;
@@ -250,8 +261,9 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
HasSetLinear = 0
};
};
#endif
template<> struct unpacket_traits<Packet2cd> { typedef std::complex<double> type; enum {size=2, alignment=Aligned32}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet2cd> { typedef std::complex<double> type; enum {size=2, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> EIGEN_STRONG_INLINE Packet2cd padd<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_add_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd psub<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_sub_pd(a.v,b.v)); }
@@ -272,10 +284,18 @@ template<> EIGEN_STRONG_INLINE Packet2cd pmul<Packet2cd>(const Packet2cd& a, con
return Packet2cd(_mm256_addsub_pd(even, odd));
}
template <>
EIGEN_STRONG_INLINE Packet2cd pcmp_eq(const Packet2cd& a, const Packet2cd& b) {
__m256d eq = _mm256_cmp_pd(a.v, b.v, _CMP_EQ_OQ);
return Packet2cd(pand(eq, _mm256_permute_pd(eq, 0x5)));
}
template<> EIGEN_STRONG_INLINE Packet2cd ptrue<Packet2cd>(const Packet2cd& a) { return Packet2cd(ptrue(Packet4d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cd pnot<Packet2cd>(const Packet2cd& a) { return Packet2cd(pnot(Packet4d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet2cd pand <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_and_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd por <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_or_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd pxor <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_xor_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd pandnot<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_andnot_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd pandnot<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_andnot_pd(b.v,a.v)); }
template<> EIGEN_STRONG_INLINE Packet2cd pload <Packet2cd>(const std::complex<double>* from)
{ EIGEN_DEBUG_ALIGNED_LOAD return Packet2cd(pload<Packet4d>((const double*)from)); }


@@ -10,7 +10,7 @@
#ifndef EIGEN_MATH_FUNCTIONS_AVX_H
#define EIGEN_MATH_FUNCTIONS_AVX_H
/* The sin, cos, exp, and log functions of this file are loosely derived from
/* The sin and cos functions of this file are loosely derived from
* Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/
*/
@@ -18,187 +18,32 @@ namespace Eigen {
namespace internal {
inline Packet8i pshiftleft(Packet8i v, int n)
{
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_slli_epi32(v, n);
#else
__m128i lo = _mm_slli_epi32(_mm256_extractf128_si256(v, 0), n);
__m128i hi = _mm_slli_epi32(_mm256_extractf128_si256(v, 1), n);
return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1);
#endif
}
inline Packet8f pshiftright(Packet8f v, int n)
{
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_cvtepi32_ps(_mm256_srli_epi32(_mm256_castps_si256(v), n));
#else
__m128i lo = _mm_srli_epi32(_mm256_extractf128_si256(_mm256_castps_si256(v), 0), n);
__m128i hi = _mm_srli_epi32(_mm256_extractf128_si256(_mm256_castps_si256(v), 1), n);
return _mm256_cvtepi32_ps(_mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1));
#endif
}
// Sine function
// Computes sin(x) by wrapping x to the interval [-Pi/4,3*Pi/4] and
// evaluating interpolants in [-Pi/4,Pi/4] or [Pi/4,3*Pi/4]. The interpolants
// are (anti-)symmetric and thus have only odd/even coefficients
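// In scalar form the reduction reads (illustrative sketch, names hypothetical):
//   k = floor(x/Pi + 1/4);     // chosen so that r = x - k*Pi lies in [-Pi/4, 3*Pi/4)
//   z = 4*r/Pi;                // z in [-1, 3); Pi is subtracted in three pieces below
//   sin(x) = (-1)^k * P(z);    // odd polynomial for z in [-1,1], even polynomial
//                              // in (z-2) for z in (1,3)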
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
psin<Packet8f>(const Packet8f& _x) {
Packet8f x = _x;
// Some useful values.
_EIGEN_DECLARE_CONST_Packet8i(one, 1);
_EIGEN_DECLARE_CONST_Packet8f(one, 1.0f);
_EIGEN_DECLARE_CONST_Packet8f(two, 2.0f);
_EIGEN_DECLARE_CONST_Packet8f(one_over_four, 0.25f);
_EIGEN_DECLARE_CONST_Packet8f(one_over_pi, 3.183098861837907e-01f);
_EIGEN_DECLARE_CONST_Packet8f(neg_pi_first, -3.140625000000000e+00f);
_EIGEN_DECLARE_CONST_Packet8f(neg_pi_second, -9.670257568359375e-04f);
_EIGEN_DECLARE_CONST_Packet8f(neg_pi_third, -6.278329571784980e-07f);
_EIGEN_DECLARE_CONST_Packet8f(four_over_pi, 1.273239544735163e+00f);
// Map x from [-Pi/4,3*Pi/4] to z in [-1,3] and subtract the shifted period.
Packet8f z = pmul(x, p8f_one_over_pi);
Packet8f shift = _mm256_floor_ps(padd(z, p8f_one_over_four));
x = pmadd(shift, p8f_neg_pi_first, x);
x = pmadd(shift, p8f_neg_pi_second, x);
x = pmadd(shift, p8f_neg_pi_third, x);
z = pmul(x, p8f_four_over_pi);
// Make a mask for the entries that need flipping, i.e. wherever the shift
// is odd.
Packet8i shift_ints = _mm256_cvtps_epi32(shift);
Packet8i shift_isodd = _mm256_castps_si256(_mm256_and_ps(_mm256_castsi256_ps(shift_ints), _mm256_castsi256_ps(p8i_one)));
Packet8i sign_flip_mask = pshiftleft(shift_isodd, 31);
// Create a mask for which interpolant to use, i.e. if z > 1, then the mask
// is set to ones for that entry.
Packet8f ival_mask = _mm256_cmp_ps(z, p8f_one, _CMP_GT_OQ);
// Evaluate the polynomial for the interval [1,3] in z.
_EIGEN_DECLARE_CONST_Packet8f(coeff_right_0, 9.999999724233232e-01f);
_EIGEN_DECLARE_CONST_Packet8f(coeff_right_2, -3.084242535619928e-01f);
_EIGEN_DECLARE_CONST_Packet8f(coeff_right_4, 1.584991525700324e-02f);
_EIGEN_DECLARE_CONST_Packet8f(coeff_right_6, -3.188805084631342e-04f);
Packet8f z_minus_two = psub(z, p8f_two);
Packet8f z_minus_two2 = pmul(z_minus_two, z_minus_two);
Packet8f right = pmadd(p8f_coeff_right_6, z_minus_two2, p8f_coeff_right_4);
right = pmadd(right, z_minus_two2, p8f_coeff_right_2);
right = pmadd(right, z_minus_two2, p8f_coeff_right_0);
// Evaluate the polynomial for the interval [-1,1] in z.
_EIGEN_DECLARE_CONST_Packet8f(coeff_left_1, 7.853981525427295e-01f);
_EIGEN_DECLARE_CONST_Packet8f(coeff_left_3, -8.074536727092352e-02f);
_EIGEN_DECLARE_CONST_Packet8f(coeff_left_5, 2.489871967827018e-03f);
_EIGEN_DECLARE_CONST_Packet8f(coeff_left_7, -3.587725841214251e-05f);
Packet8f z2 = pmul(z, z);
Packet8f left = pmadd(p8f_coeff_left_7, z2, p8f_coeff_left_5);
left = pmadd(left, z2, p8f_coeff_left_3);
left = pmadd(left, z2, p8f_coeff_left_1);
left = pmul(left, z);
// Assemble the results, i.e. select the left and right polynomials.
left = _mm256_andnot_ps(ival_mask, left);
right = _mm256_and_ps(ival_mask, right);
Packet8f res = _mm256_or_ps(left, right);
// Flip the sign on the odd intervals and return the result.
res = _mm256_xor_ps(res, _mm256_castsi256_ps(sign_flip_mask));
return res;
return psin_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
pcos<Packet8f>(const Packet8f& _x) {
return pcos_float(_x);
}
// Natural logarithm
// Computes log(x) as log(2^e * m) = C*e + log(m), where the constant C =log(2)
// and m is in the range [sqrt(1/2),sqrt(2)). In this range, the logarithm can
// be easily approximated by a polynomial centered on m=1 for stability.
// TODO(gonnet): Further reduce the interval allowing for lower-degree
// polynomial interpolants -> ... -> profit!
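// In scalar form (illustrative sketch, names hypothetical): write x = 2^e * m with
// m in [sqrt(1/2), sqrt(2)); e is read off the exponent bits and m is obtained by
// masking the mantissa and adjusting the exponent, then
//   log(x) = e*log(2) + p(m - 1),
// where p uses the Cephes coefficients declared below and the e*log(2) term is
// added in two pieces (q1, q2) to preserve precision.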
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
plog<Packet8f>(const Packet8f& _x) {
Packet8f x = _x;
_EIGEN_DECLARE_CONST_Packet8f(1, 1.0f);
_EIGEN_DECLARE_CONST_Packet8f(half, 0.5f);
_EIGEN_DECLARE_CONST_Packet8f(126f, 126.0f);
return plog_float(_x);
}
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inv_mant_mask, ~0x7f800000);
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f plog1p<Packet8f>(const Packet8f& _x) {
return generic_plog1p(_x);
}
// The smallest non denormalized float number.
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(min_norm_pos, 0x00800000);
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(minus_inf, 0xff800000);
// Polynomial coefficients.
_EIGEN_DECLARE_CONST_Packet8f(cephes_SQRTHF, 0.707106781186547524f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p0, 7.0376836292E-2f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p1, -1.1514610310E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p2, 1.1676998740E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p3, -1.2420140846E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p4, +1.4249322787E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p5, -1.6668057665E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p6, +2.0000714765E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p7, -2.4999993993E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_p8, +3.3333331174E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_q1, -2.12194440e-4f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_log_q2, 0.693359375f);
Packet8f invalid_mask = _mm256_cmp_ps(x, _mm256_setzero_ps(), _CMP_NGE_UQ); // not greater equal is true if x is NaN
Packet8f iszero_mask = _mm256_cmp_ps(x, _mm256_setzero_ps(), _CMP_EQ_OQ);
// Truncate input values to the minimum positive normal.
x = pmax(x, p8f_min_norm_pos);
Packet8f emm0 = pshiftright(x,23);
Packet8f e = _mm256_sub_ps(emm0, p8f_126f);
// Set the exponents to -1, i.e. x are in the range [0.5,1).
x = _mm256_and_ps(x, p8f_inv_mant_mask);
x = _mm256_or_ps(x, p8f_half);
// part2: Shift the inputs from the range [0.5,1) to [sqrt(1/2),sqrt(2))
// and shift by -1. The values are then centered around 0, which improves
// the stability of the polynomial evaluation.
// if( x < SQRTHF ) {
// e -= 1;
// x = x + x - 1.0;
// } else { x = x - 1.0; }
Packet8f mask = _mm256_cmp_ps(x, p8f_cephes_SQRTHF, _CMP_LT_OQ);
Packet8f tmp = _mm256_and_ps(x, mask);
x = psub(x, p8f_1);
e = psub(e, _mm256_and_ps(p8f_1, mask));
x = padd(x, tmp);
Packet8f x2 = pmul(x, x);
Packet8f x3 = pmul(x2, x);
// Evaluate the polynomial approximant of degree 8 in three parts, probably
// to improve instruction-level parallelism.
Packet8f y, y1, y2;
y = pmadd(p8f_cephes_log_p0, x, p8f_cephes_log_p1);
y1 = pmadd(p8f_cephes_log_p3, x, p8f_cephes_log_p4);
y2 = pmadd(p8f_cephes_log_p6, x, p8f_cephes_log_p7);
y = pmadd(y, x, p8f_cephes_log_p2);
y1 = pmadd(y1, x, p8f_cephes_log_p5);
y2 = pmadd(y2, x, p8f_cephes_log_p8);
y = pmadd(y, x3, y1);
y = pmadd(y, x3, y2);
y = pmul(y, x3);
// Add the logarithm of the exponent back to the result of the interpolation.
y1 = pmul(e, p8f_cephes_log_q1);
tmp = pmul(x2, p8f_half);
y = padd(y, y1);
x = psub(x, tmp);
y2 = pmul(e, p8f_cephes_log_q2);
x = padd(x, y);
x = padd(x, y2);
// Filter out invalid inputs, i.e. negative arg will be NAN, 0 will be -INF.
return _mm256_or_ps(
_mm256_andnot_ps(iszero_mask, _mm256_or_ps(x, invalid_mask)),
_mm256_and_ps(iszero_mask, p8f_minus_inf));
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f pexpm1<Packet8f>(const Packet8f& _x) {
return generic_expm1(_x);
}
// Exponential function. Works by writing "x = m*log(2) + r" where
@@ -207,62 +52,7 @@ plog<Packet8f>(const Packet8f& _x) {
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
pexp<Packet8f>(const Packet8f& _x) {
_EIGEN_DECLARE_CONST_Packet8f(1, 1.0f);
_EIGEN_DECLARE_CONST_Packet8f(half, 0.5f);
_EIGEN_DECLARE_CONST_Packet8f(127, 127.0f);
_EIGEN_DECLARE_CONST_Packet8f(exp_hi, 88.3762626647950f);
_EIGEN_DECLARE_CONST_Packet8f(exp_lo, -88.3762626647949f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_LOG2EF, 1.44269504088896341f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p0, 1.9875691500E-4f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p1, 1.3981999507E-3f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p2, 8.3334519073E-3f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p3, 4.1665795894E-2f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p4, 1.6666665459E-1f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p5, 5.0000001201E-1f);
// Clamp x.
Packet8f x = pmax(pmin(_x, p8f_exp_hi), p8f_exp_lo);
// Express exp(x) as exp(m*ln(2) + r), start by extracting
// m = floor(x/ln(2) + 0.5).
Packet8f m = _mm256_floor_ps(pmadd(x, p8f_cephes_LOG2EF, p8f_half));
// Get r = x - m*ln(2). If no FMA instructions are available, m*ln(2) is
// subtracted out in two parts, m*C1+m*C2 = m*ln(2), to avoid accumulating
// truncation errors. Note that we don't use the "pmadd" function here to
// ensure that a precision-preserving FMA instruction is used.
#ifdef EIGEN_VECTORIZE_FMA
_EIGEN_DECLARE_CONST_Packet8f(nln2, -0.6931471805599453f);
Packet8f r = _mm256_fmadd_ps(m, p8f_nln2, x);
#else
_EIGEN_DECLARE_CONST_Packet8f(cephes_exp_C1, 0.693359375f);
_EIGEN_DECLARE_CONST_Packet8f(cephes_exp_C2, -2.12194440e-4f);
Packet8f r = psub(x, pmul(m, p8f_cephes_exp_C1));
r = psub(r, pmul(m, p8f_cephes_exp_C2));
#endif
Packet8f r2 = pmul(r, r);
// TODO(gonnet): Split into odd/even polynomials and try to exploit
// instruction-level parallelism.
Packet8f y = p8f_cephes_exp_p0;
y = pmadd(y, r, p8f_cephes_exp_p1);
y = pmadd(y, r, p8f_cephes_exp_p2);
y = pmadd(y, r, p8f_cephes_exp_p3);
y = pmadd(y, r, p8f_cephes_exp_p4);
y = pmadd(y, r, p8f_cephes_exp_p5);
y = pmadd(y, r2, r);
y = padd(y, p8f_1);
// Build emm0 = 2^m.
Packet8i emm0 = _mm256_cvttps_epi32(padd(m, p8f_127));
emm0 = pshiftleft(emm0, 23);
// Return 2^m * exp(r).
return pmax(pmul(y, _mm256_castsi256_ps(emm0)), _x);
return pexp_float(_x);
}
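// In scalar form the decomposition above is (illustrative sketch):
//   m = floor(x*log2(e) + 1/2);   r = x - m*log(2);   // so |r| <= log(2)/2
//   exp(x) = 2^m * exp(r),
// with exp(r) approximated by the Cephes polynomial and 2^m obtained by placing
// the biased exponent (m + 127) into the float exponent field via a 23-bit shift.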
// Hyperbolic Tangent function.
@@ -272,84 +62,11 @@ ptanh<Packet8f>(const Packet8f& x) {
return internal::generic_fast_tanh_float(x);
}
// Exponential function for doubles.
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
pexp<Packet4d>(const Packet4d& _x) {
Packet4d x = _x;
_EIGEN_DECLARE_CONST_Packet4d(1, 1.0);
_EIGEN_DECLARE_CONST_Packet4d(2, 2.0);
_EIGEN_DECLARE_CONST_Packet4d(half, 0.5);
_EIGEN_DECLARE_CONST_Packet4d(exp_hi, 709.437);
_EIGEN_DECLARE_CONST_Packet4d(exp_lo, -709.436139303);
_EIGEN_DECLARE_CONST_Packet4d(cephes_LOG2EF, 1.4426950408889634073599);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_p0, 1.26177193074810590878e-4);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_p1, 3.02994407707441961300e-2);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_p2, 9.99999999999999999910e-1);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_q0, 3.00198505138664455042e-6);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_q1, 2.52448340349684104192e-3);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_q2, 2.27265548208155028766e-1);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_q3, 2.00000000000000000009e0);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_C1, 0.693145751953125);
_EIGEN_DECLARE_CONST_Packet4d(cephes_exp_C2, 1.42860682030941723212e-6);
_EIGEN_DECLARE_CONST_Packet4i(1023, 1023);
Packet4d tmp, fx;
// clamp x
x = pmax(pmin(x, p4d_exp_hi), p4d_exp_lo);
// Express exp(x) as exp(g + n*log(2)).
fx = pmadd(p4d_cephes_LOG2EF, x, p4d_half);
// Get the integer modulus of log(2), i.e. the "n" described above.
fx = _mm256_floor_pd(fx);
// Get the remainder modulo log(2), i.e. the "g" described above. Subtract
// n*log(2) out in two steps, i.e. n*C1 + n*C2, C1+C2=log2 to get the last
// digits right.
tmp = pmul(fx, p4d_cephes_exp_C1);
Packet4d z = pmul(fx, p4d_cephes_exp_C2);
x = psub(x, tmp);
x = psub(x, z);
Packet4d x2 = pmul(x, x);
// Evaluate the numerator polynomial of the rational interpolant.
Packet4d px = p4d_cephes_exp_p0;
px = pmadd(px, x2, p4d_cephes_exp_p1);
px = pmadd(px, x2, p4d_cephes_exp_p2);
px = pmul(px, x);
// Evaluate the denominator polynomial of the rational interpolant.
Packet4d qx = p4d_cephes_exp_q0;
qx = pmadd(qx, x2, p4d_cephes_exp_q1);
qx = pmadd(qx, x2, p4d_cephes_exp_q2);
qx = pmadd(qx, x2, p4d_cephes_exp_q3);
// I don't really get this bit, copied from the SSE2 routines, so...
// TODO(gonnet): Figure out what is going on here, perhaps find a better
// rational interpolant?
x = _mm256_div_pd(px, psub(qx, px));
x = pmadd(p4d_2, x, p4d_1);
// Build e=2^n by constructing the exponents in a 128-bit vector and
// shifting them to where they belong in double-precision values.
__m128i emm0 = _mm256_cvtpd_epi32(fx);
emm0 = _mm_add_epi32(emm0, p4i_1023);
emm0 = _mm_shuffle_epi32(emm0, _MM_SHUFFLE(3, 1, 2, 0));
__m128i lo = _mm_slli_epi64(emm0, 52);
__m128i hi = _mm_slli_epi64(_mm_srli_epi64(emm0, 32), 52);
__m256i e = _mm256_insertf128_si256(_mm256_setzero_si256(), lo, 0);
e = _mm256_insertf128_si256(e, hi, 1);
// Construct the result 2^n * exp(g) = e * x. The max is used to catch
// non-finite values in the input.
return pmax(pmul(x, _mm256_castsi256_pd(e)), _x);
pexp<Packet4d>(const Packet4d& x) {
return pexp_double(x);
}
// Functions for sqrt.
@@ -392,7 +109,6 @@ Packet4d psqrt<Packet4d>(const Packet4d& x) {
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inf, 0x7f800000);
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(nan, 0x7fc00000);
_EIGEN_DECLARE_CONST_Packet8f(one_point_five, 1.5f);
_EIGEN_DECLARE_CONST_Packet8f(minus_half, -0.5f);
_EIGEN_DECLARE_CONST_Packet8f_FROM_INT(flt_min, 0x00800000);
@@ -401,20 +117,25 @@ Packet8f prsqrt<Packet8f>(const Packet8f& _x) {
// select only the inverse sqrt of positive normal inputs (denormals are
// flushed to zero and cause infs as well).
Packet8f le_zero_mask = _mm256_cmp_ps(_x, p8f_flt_min, _CMP_LT_OQ);
Packet8f x = _mm256_andnot_ps(le_zero_mask, _mm256_rsqrt_ps(_x));
Packet8f lt_min_mask = _mm256_cmp_ps(_x, p8f_flt_min, _CMP_LT_OQ);
Packet8f inf_mask = _mm256_cmp_ps(_x, p8f_inf, _CMP_EQ_OQ);
Packet8f not_normal_finite_mask = _mm256_or_ps(lt_min_mask, inf_mask);
// Fill in NaNs and Infs for the negative/zero entries.
Packet8f neg_mask = _mm256_cmp_ps(_x, _mm256_setzero_ps(), _CMP_LT_OQ);
Packet8f zero_mask = _mm256_andnot_ps(neg_mask, le_zero_mask);
Packet8f infs_and_nans = _mm256_or_ps(_mm256_and_ps(neg_mask, p8f_nan),
_mm256_and_ps(zero_mask, p8f_inf));
// Compute an approximate result using the rsqrt intrinsic.
Packet8f y_approx = _mm256_rsqrt_ps(_x);
// Do a single step of Newton's iteration.
x = pmul(x, pmadd(neg_half, pmul(x, x), p8f_one_point_five));
// Do a single step of Newton-Raphson iteration to improve the approximation.
// This uses the formula y_{n+1} = y_n * (1.5 - y_n * (0.5 * x) * y_n).
// It is essential to evaluate the inner term like this because forming
// y_n^2 may over- or underflow.
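// (For reference, this is one Newton step on f(y) = 1/y^2 - x:
//  y - f(y)/f'(y) = y * (1.5 - 0.5*x*y^2); here neg_half is assumed to hold
//  -0.5*x from the unchanged lines above this hunk.)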
Packet8f y_newton = pmul(y_approx, pmadd(y_approx, pmul(neg_half, y_approx), p8f_one_point_five));
// Insert NaNs and Infs in all the right places.
return _mm256_or_ps(x, infs_and_nans);
// Select the result of the Newton-Raphson step for positive normal arguments.
// For other arguments, choose the output of the intrinsic. This will
// return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(x) = +inf if
// x is zero or a positive denormalized float (equivalent to flushing positive
// denormalized inputs to zero).
return pselect<Packet8f>(not_normal_finite_mask, y_approx, y_newton);
}
#else


@@ -18,11 +18,11 @@ namespace internal {
#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8
#endif
#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS (2*sizeof(void*))
#if !defined(EIGEN_VECTORIZE_AVX512) && !defined(EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS)
#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 16
#endif
#ifdef __FMA__
#ifdef EIGEN_VECTORIZE_FMA
#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
#endif
@@ -31,10 +31,14 @@ namespace internal {
typedef __m256 Packet8f;
typedef __m256i Packet8i;
typedef __m256d Packet4d;
typedef struct {
__m128i x;
} Packet8h;
template<> struct is_arithmetic<__m256> { enum { value = true }; };
template<> struct is_arithmetic<__m256i> { enum { value = true }; };
template<> struct is_arithmetic<__m256d> { enum { value = true }; };
template<> struct is_arithmetic<Packet8h> { enum { value = true }; };
#define _EIGEN_DECLARE_CONST_Packet8f(NAME,X) \
const Packet8f p8f_##NAME = pset1<Packet8f>(X)
@@ -58,17 +62,22 @@ template<> struct packet_traits<float> : default_packet_traits
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
size=8,
size = 8,
HasHalfPacket = 1,
HasDiv = 1,
HasSin = EIGEN_FAST_MATH,
HasCos = 0,
HasLog = 1,
HasExp = 1,
HasDiv = 1,
HasSin = EIGEN_FAST_MATH,
HasCos = EIGEN_FAST_MATH,
HasLog = 1,
HasLog1p = 1,
HasExpm1 = 1,
HasExp = 1,
HasNdtri = 1,
HasBessel = 1,
HasSqrt = 1,
HasRsqrt = 1,
HasTanh = EIGEN_FAST_MATH,
HasTanh = EIGEN_FAST_MATH,
HasErf = EIGEN_FAST_MATH,
HasBlend = 1,
HasRound = 1,
HasFloor = 1,
@@ -95,6 +104,35 @@ template<> struct packet_traits<double> : default_packet_traits
HasCeil = 1
};
};
template <>
struct packet_traits<Eigen::half> : default_packet_traits {
typedef Packet8h type;
// There is no half-size packet for Packet8h.
typedef Packet8h half;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
size = 8,
HasHalfPacket = 0,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasConj = 0,
HasSetLinear = 0,
HasSqrt = 0,
HasRsqrt = 0,
HasExp = 0,
HasLog = 0,
HasBlend = 0
};
};
#endif
template<> struct scalar_div_cost<float,true> { enum { value = 14 }; };
@@ -113,14 +151,30 @@ template<> struct packet_traits<int> : default_packet_traits
};
*/
template<> struct unpacket_traits<Packet8f> { typedef float type; typedef Packet4f half; enum {size=8, alignment=Aligned32}; };
template<> struct unpacket_traits<Packet4d> { typedef double type; typedef Packet2d half; enum {size=4, alignment=Aligned32}; };
template<> struct unpacket_traits<Packet8i> { typedef int type; typedef Packet4i half; enum {size=8, alignment=Aligned32}; };
template<> struct unpacket_traits<Packet8f> {
typedef float type;
typedef Packet4f half;
typedef Packet8i integer_packet;
typedef uint8_t mask_t;
enum {size=8, alignment=Aligned32, vectorizable=true, masked_load_available=true, masked_store_available=true};
};
template<> struct unpacket_traits<Packet4d> {
typedef double type;
typedef Packet2d half;
enum {size=4, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false};
};
template<> struct unpacket_traits<Packet8i> { typedef int type; typedef Packet4i half; enum {size=8, alignment=Aligned32, vectorizable=false, masked_load_available=false, masked_store_available=false}; };
template<> EIGEN_STRONG_INLINE Packet8f pset1<Packet8f>(const float& from) { return _mm256_set1_ps(from); }
template<> EIGEN_STRONG_INLINE Packet4d pset1<Packet4d>(const double& from) { return _mm256_set1_pd(from); }
template<> EIGEN_STRONG_INLINE Packet8i pset1<Packet8i>(const int& from) { return _mm256_set1_epi32(from); }
template<> EIGEN_STRONG_INLINE Packet8f pset1frombits<Packet8f>(unsigned int from) { return _mm256_castsi256_ps(pset1<Packet8i>(from)); }
template<> EIGEN_STRONG_INLINE Packet8f pzero(const Packet8f& /*a*/) { return _mm256_setzero_ps(); }
template<> EIGEN_STRONG_INLINE Packet4d pzero(const Packet4d& /*a*/) { return _mm256_setzero_pd(); }
template<> EIGEN_STRONG_INLINE Packet8i pzero(const Packet8i& /*a*/) { return _mm256_setzero_si256(); }
template<> EIGEN_STRONG_INLINE Packet8f pload1<Packet8f>(const float* from) { return _mm256_broadcast_ss(from); }
template<> EIGEN_STRONG_INLINE Packet4d pload1<Packet4d>(const double* from) { return _mm256_broadcast_sd(from); }
@@ -129,6 +183,15 @@ template<> EIGEN_STRONG_INLINE Packet4d plset<Packet4d>(const double& a) { retur
template<> EIGEN_STRONG_INLINE Packet8f padd<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_add_ps(a,b); }
template<> EIGEN_STRONG_INLINE Packet4d padd<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_add_pd(a,b); }
template<> EIGEN_STRONG_INLINE Packet8i padd<Packet8i>(const Packet8i& a, const Packet8i& b) {
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_add_epi32(a,b);
#else
__m128i lo = _mm_add_epi32(_mm256_extractf128_si256(a, 0), _mm256_extractf128_si256(b, 0));
__m128i hi = _mm_add_epi32(_mm256_extractf128_si256(a, 1), _mm256_extractf128_si256(b, 1));
return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1);
#endif
}
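// AVX1 provides no 256-bit integer arithmetic, so the fallback above (and the
// similar ones for pcmp_eq and the shift helpers below) splits the packet into
// two 128-bit halves, applies the SSE intrinsic to each half, and reassembles
// the result with _mm256_insertf128_si256; purely bitwise ops instead reuse the
// float-domain intrinsics via casts.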
template<> EIGEN_STRONG_INLINE Packet8f psub<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_sub_ps(a,b); }
template<> EIGEN_STRONG_INLINE Packet4d psub<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_sub_pd(a,b); }
@@ -157,7 +220,7 @@ template<> EIGEN_STRONG_INLINE Packet8i pdiv<Packet8i>(const Packet8i& /*a*/, co
return pset1<Packet8i>(0);
}
#ifdef __FMA__
#ifdef EIGEN_VECTORIZE_FMA
template<> EIGEN_STRONG_INLINE Packet8f pmadd(const Packet8f& a, const Packet8f& b, const Packet8f& c) {
#if ( (EIGEN_COMP_GNUC_STRICT && EIGEN_COMP_GNUC<80) || (EIGEN_COMP_CLANG) )
// Clang stupidly generates a vfmadd213ps instruction plus some vmovaps on registers,
@@ -184,11 +247,69 @@ template<> EIGEN_STRONG_INLINE Packet4d pmadd(const Packet4d& a, const Packet4d&
}
#endif
template<> EIGEN_STRONG_INLINE Packet8f pmin<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_min_ps(a,b); }
template<> EIGEN_STRONG_INLINE Packet4d pmin<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_min_pd(a,b); }
template<> EIGEN_STRONG_INLINE Packet8f pmin<Packet8f>(const Packet8f& a, const Packet8f& b) {
#if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63
// There appears to be a bug in GCC, by which the optimizer may flip
// the argument order in calls to _mm_min_ps/_mm_max_ps, so we have to
// resort to inline ASM here. This is supposed to be fixed in gcc6.3,
// see also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72867
Packet8f res;
asm("vminps %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b));
return res;
#else
// Arguments are swapped to match NaN propagation behavior of std::min.
return _mm256_min_ps(b,a);
#endif
}
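// NaN behaviour implied by the swap (assumption, based on the comment above):
// the min/max instructions return their second operand whenever either input
// is NaN, so passing the operands as (b,a) gives pmin(NaN, x) == NaN and
// pmin(x, NaN) == x, which is what std::min(a,b) = (b < a ? b : a) yields.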
template<> EIGEN_STRONG_INLINE Packet4d pmin<Packet4d>(const Packet4d& a, const Packet4d& b) {
#if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63
// See pmin above
Packet4d res;
asm("vminpd %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b));
return res;
#else
// Arguments are swapped to match NaN propagation behavior of std::min.
return _mm256_min_pd(b,a);
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f pmax<Packet8f>(const Packet8f& a, const Packet8f& b) {
#if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63
// See pmin above
Packet8f res;
asm("vmaxps %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b));
return res;
#else
// Arguments are swapped to match NaN propagation behavior of std::max.
return _mm256_max_ps(b,a);
#endif
}
template<> EIGEN_STRONG_INLINE Packet4d pmax<Packet4d>(const Packet4d& a, const Packet4d& b) {
#if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63
// See pmin above
Packet4d res;
asm("vmaxpd %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b));
return res;
#else
// Arguments are swapped to match NaN propagation behavior of std::max.
return _mm256_max_pd(b,a);
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f pmax<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_max_ps(a,b); }
template<> EIGEN_STRONG_INLINE Packet4d pmax<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_max_pd(a,b); }
template<> EIGEN_STRONG_INLINE Packet8f pcmp_le(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_LE_OQ); }
template<> EIGEN_STRONG_INLINE Packet8f pcmp_lt(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_LT_OQ); }
template<> EIGEN_STRONG_INLINE Packet8f pcmp_eq(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_EQ_OQ); }
template<> EIGEN_STRONG_INLINE Packet4d pcmp_eq(const Packet4d& a, const Packet4d& b) { return _mm256_cmp_pd(a,b,_CMP_EQ_OQ); }
template<> EIGEN_STRONG_INLINE Packet8f pcmp_lt_or_nan(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a, b, _CMP_NGE_UQ); }
template<> EIGEN_STRONG_INLINE Packet8i pcmp_eq(const Packet8i& a, const Packet8i& b) {
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_cmpeq_epi32(a,b);
#else
__m128i lo = _mm_cmpeq_epi32(_mm256_extractf128_si256(a, 0), _mm256_extractf128_si256(b, 0));
__m128i hi = _mm_cmpeq_epi32(_mm256_extractf128_si256(a, 1), _mm256_extractf128_si256(b, 1));
return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1);
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f pround<Packet8f>(const Packet8f& a) { return _mm256_round_ps(a, _MM_FROUND_CUR_DIRECTION); }
template<> EIGEN_STRONG_INLINE Packet4d pround<Packet4d>(const Packet4d& a) { return _mm256_round_pd(a, _MM_FROUND_CUR_DIRECTION); }
@@ -199,17 +320,101 @@ template<> EIGEN_STRONG_INLINE Packet4d pceil<Packet4d>(const Packet4d& a) { ret
template<> EIGEN_STRONG_INLINE Packet8f pfloor<Packet8f>(const Packet8f& a) { return _mm256_floor_ps(a); }
template<> EIGEN_STRONG_INLINE Packet4d pfloor<Packet4d>(const Packet4d& a) { return _mm256_floor_pd(a); }
template<> EIGEN_STRONG_INLINE Packet8i ptrue<Packet8i>(const Packet8i& a) {
#ifdef EIGEN_VECTORIZE_AVX2
// vpcmpeqd has lower latency than the more general vcmpps
return _mm256_cmpeq_epi32(a,a);
#else
const __m256 b = _mm256_castsi256_ps(a);
return _mm256_castps_si256(_mm256_cmp_ps(b,b,_CMP_TRUE_UQ));
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f ptrue<Packet8f>(const Packet8f& a) {
#ifdef EIGEN_VECTORIZE_AVX2
// vpcmpeqd has lower latency than the more general vcmpps
const __m256i b = _mm256_castps_si256(a);
return _mm256_castsi256_ps(_mm256_cmpeq_epi32(b,b));
#else
return _mm256_cmp_ps(a,a,_CMP_TRUE_UQ);
#endif
}
template<> EIGEN_STRONG_INLINE Packet4d ptrue<Packet4d>(const Packet4d& a) {
#ifdef EIGEN_VECTORIZE_AVX2
// vpcmpeqq has lower latency than the more general vcmppd
const __m256i b = _mm256_castpd_si256(a);
return _mm256_castsi256_pd(_mm256_cmpeq_epi64(b,b));
#else
return _mm256_cmp_pd(a,a,_CMP_TRUE_UQ);
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f pand<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_and_ps(a,b); }
template<> EIGEN_STRONG_INLINE Packet4d pand<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_and_pd(a,b); }
template<> EIGEN_STRONG_INLINE Packet8i pand<Packet8i>(const Packet8i& a, const Packet8i& b) {
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_and_si256(a,b);
#else
return _mm256_castps_si256(_mm256_and_ps(_mm256_castsi256_ps(a),_mm256_castsi256_ps(b)));
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f por<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_or_ps(a,b); }
template<> EIGEN_STRONG_INLINE Packet4d por<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_or_pd(a,b); }
template<> EIGEN_STRONG_INLINE Packet8i por<Packet8i>(const Packet8i& a, const Packet8i& b) {
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_or_si256(a,b);
#else
return _mm256_castps_si256(_mm256_or_ps(_mm256_castsi256_ps(a),_mm256_castsi256_ps(b)));
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f pxor<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_xor_ps(a,b); }
template<> EIGEN_STRONG_INLINE Packet4d pxor<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_xor_pd(a,b); }
template<> EIGEN_STRONG_INLINE Packet8i pxor<Packet8i>(const Packet8i& a, const Packet8i& b) {
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_xor_si256(a,b);
#else
return _mm256_castps_si256(_mm256_xor_ps(_mm256_castsi256_ps(a),_mm256_castsi256_ps(b)));
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f pandnot<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_andnot_ps(a,b); }
template<> EIGEN_STRONG_INLINE Packet4d pandnot<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_andnot_pd(a,b); }
template<> EIGEN_STRONG_INLINE Packet8f pandnot<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_andnot_ps(b,a); }
template<> EIGEN_STRONG_INLINE Packet4d pandnot<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_andnot_pd(b,a); }
template<> EIGEN_STRONG_INLINE Packet8i pandnot<Packet8i>(const Packet8i& a, const Packet8i& b) {
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_andnot_si256(b,a);
#else
return _mm256_castps_si256(_mm256_andnot_ps(_mm256_castsi256_ps(b),_mm256_castsi256_ps(a)));
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f pselect<Packet8f>(const Packet8f& mask, const Packet8f& a, const Packet8f& b)
{ return _mm256_blendv_ps(b,a,mask); }
template<> EIGEN_STRONG_INLINE Packet4d pselect<Packet4d>(const Packet4d& mask, const Packet4d& a, const Packet4d& b)
{ return _mm256_blendv_pd(b,a,mask); }
template<int N> EIGEN_STRONG_INLINE Packet8i pshiftright(Packet8i a) {
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_srli_epi32(a, N);
#else
__m128i lo = _mm_srli_epi32(_mm256_extractf128_si256(a, 0), N);
__m128i hi = _mm_srli_epi32(_mm256_extractf128_si256(a, 1), N);
return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1);
#endif
}
template<int N> EIGEN_STRONG_INLINE Packet8i pshiftleft(Packet8i a) {
#ifdef EIGEN_VECTORIZE_AVX2
return _mm256_slli_epi32(a, N);
#else
__m128i lo = _mm_slli_epi32(_mm256_extractf128_si256(a, 0), N);
__m128i hi = _mm_slli_epi32(_mm256_extractf128_si256(a, 1), N);
return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1);
#endif
}
template<> EIGEN_STRONG_INLINE Packet8f pload<Packet8f>(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_ps(from); }
template<> EIGEN_STRONG_INLINE Packet4d pload<Packet4d>(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_pd(from); }
@@ -219,6 +424,14 @@ template<> EIGEN_STRONG_INLINE Packet8f ploadu<Packet8f>(const float* from) { EI
template<> EIGEN_STRONG_INLINE Packet4d ploadu<Packet4d>(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_pd(from); }
template<> EIGEN_STRONG_INLINE Packet8i ploadu<Packet8i>(const int* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(from)); }
template<> EIGEN_STRONG_INLINE Packet8f ploadu<Packet8f>(const float* from, uint8_t umask) {
Packet8i mask = _mm256_set1_epi8(static_cast<char>(umask));
const Packet8i bit_mask = _mm256_set_epi32(0xffffff7f, 0xffffffbf, 0xffffffdf, 0xffffffef, 0xfffffff7, 0xfffffffb, 0xfffffffd, 0xfffffffe);
mask = por<Packet8i>(mask, bit_mask);
mask = pcmp_eq<Packet8i>(mask, _mm256_set1_epi32(0xffffffff));
EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_maskload_ps(from, mask);
}
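// How the 8-bit umask becomes a lane mask (sketch of the trick above): umask is
// broadcast into every byte, then OR-ed with bit_mask so that lane i has every
// bit set except bit i, whose value is bit i of umask; comparing against
// all-ones therefore yields an all-ones lane exactly when bit i of umask is
// set, which is the sign-bit-per-lane form _mm256_maskload_ps expects.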
// Loads 4 floats from memory and returns the packet {a0, a0, a1, a1, a2, a2, a3, a3}
template<> EIGEN_STRONG_INLINE Packet8f ploaddup<Packet8f>(const float* from)
{
@@ -226,7 +439,7 @@ template<> EIGEN_STRONG_INLINE Packet8f ploaddup<Packet8f>(const float* from)
// Packet8f tmp = _mm256_castps128_ps256(_mm_loadu_ps(from));
// tmp = _mm256_insertf128_ps(tmp, _mm_movehl_ps(_mm256_castps256_ps128(tmp),_mm256_castps256_ps128(tmp)), 1);
// return _mm256_unpacklo_ps(tmp,tmp);
// _mm256_insertf128_ps is very slow on Haswell, thus:
Packet8f tmp = _mm256_broadcast_ps((const __m128*)(const void*)from);
// mimic an "inplace" permutation of the lower 128bits using a blend
@@ -256,6 +469,14 @@ template<> EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const Packet8f&
template<> EIGEN_STRONG_INLINE void pstoreu<double>(double* to, const Packet4d& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_pd(to, from); }
template<> EIGEN_STRONG_INLINE void pstoreu<int>(int* to, const Packet8i& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_si256(reinterpret_cast<__m256i*>(to), from); }
template<> EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const Packet8f& from, uint8_t umask) {
Packet8i mask = _mm256_set1_epi8(static_cast<char>(umask));
const Packet8i bit_mask = _mm256_set_epi32(0xffffff7f, 0xffffffbf, 0xffffffdf, 0xffffffef, 0xfffffff7, 0xfffffffb, 0xfffffffd, 0xfffffffe);
mask = por<Packet8i>(mask, bit_mask);
mask = pcmp_eq<Packet8i>(mask, _mm256_set1_epi32(0xffffffff));
EIGEN_DEBUG_UNALIGNED_STORE return _mm256_maskstore_ps(to, mask, from);
}
// NOTE: leverage _mm256_i32gather_ps and _mm256_i32gather_pd if AVX2 instructions are available
// NOTE: for the record the following seems to be slower: return _mm256_i32gather_ps(from, _mm256_set1_epi32(stride), 4);
template<> EIGEN_DEVICE_FUNC inline Packet8f pgather<float, Packet8f>(const float* from, Index stride)
@@ -354,6 +575,28 @@ template<> EIGEN_STRONG_INLINE Packet4d pabs(const Packet4d& a)
return _mm256_and_pd(a,mask);
}
template<> EIGEN_STRONG_INLINE Packet8f pfrexp<Packet8f>(const Packet8f& a, Packet8f& exponent) {
return pfrexp_float(a,exponent);
}
template<> EIGEN_STRONG_INLINE Packet8f pldexp<Packet8f>(const Packet8f& a, const Packet8f& exponent) {
return pldexp_float(a,exponent);
}
template<> EIGEN_STRONG_INLINE Packet4d pldexp<Packet4d>(const Packet4d& a, const Packet4d& exponent) {
// Build e=2^n by constructing the exponents in a 128-bit vector and
// shifting them to where they belong in double-precision values.
Packet4i cst_1023 = pset1<Packet4i>(1023);
__m128i emm0 = _mm256_cvtpd_epi32(exponent);
emm0 = _mm_add_epi32(emm0, cst_1023);
emm0 = _mm_shuffle_epi32(emm0, _MM_SHUFFLE(3, 1, 2, 0));
__m128i lo = _mm_slli_epi64(emm0, 52);
__m128i hi = _mm_slli_epi64(_mm_srli_epi64(emm0, 32), 52);
__m256i e = _mm256_insertf128_si256(_mm256_setzero_si256(), lo, 0);
e = _mm256_insertf128_si256(e, hi, 1);
return pmul(a,_mm256_castsi256_pd(e));
}
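// Sketch of the 2^n construction above: the four exponents are converted to
// int32, biased by 1023, reordered so that each ends up in the right 64-bit
// lane, and shifted left by 52 bits so that every lane holds the double 2^n;
// multiplying a by this packet is then equivalent to ldexp.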
// preduxp should be ok
// FIXME: why is this ok? why isn't the simple implementation working as expected?
template<> EIGEN_STRONG_INLINE Packet8f preduxp<Packet8f>(const Packet8f* vecs)
@@ -406,7 +649,7 @@ template<> EIGEN_STRONG_INLINE double predux<Packet4d>(const Packet4d& a)
return predux(Packet2d(_mm_add_pd(_mm256_castpd256_pd128(a),_mm256_extractf128_pd(a,1))));
}
template<> EIGEN_STRONG_INLINE Packet4f predux_downto4<Packet8f>(const Packet8f& a)
template<> EIGEN_STRONG_INLINE Packet4f predux_half_dowto4<Packet8f>(const Packet8f& a)
{
return _mm_add_ps(_mm256_castps256_ps128(a),_mm256_extractf128_ps(a,1));
}
@@ -450,6 +693,16 @@ template<> EIGEN_STRONG_INLINE double predux_max<Packet4d>(const Packet4d& a)
return pfirst(_mm256_max_pd(tmp, _mm256_shuffle_pd(tmp, tmp, 1)));
}
// not needed yet
// template<> EIGEN_STRONG_INLINE bool predux_all(const Packet8f& x)
// {
// return _mm256_movemask_ps(x)==0xFF;
// }
template<> EIGEN_STRONG_INLINE bool predux_any(const Packet8f& x)
{
return _mm256_movemask_ps(x)!=0;
}
template<int Offset>
struct palign_impl<Offset,Packet8f>
@@ -630,6 +883,335 @@ template<> EIGEN_STRONG_INLINE Packet4d pinsertlast(const Packet4d& a, double b)
return _mm256_blend_pd(a,pset1<Packet4d>(b),(1<<3));
}
// Packet math for Eigen::half
template<> struct unpacket_traits<Packet8h> { typedef Eigen::half type; enum {size=8, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet8h half; };
template<> EIGEN_STRONG_INLINE Packet8h pset1<Packet8h>(const Eigen::half& from) {
Packet8h result;
result.x = _mm_set1_epi16(from.x);
return result;
}
template<> EIGEN_STRONG_INLINE Eigen::half pfirst<Packet8h>(const Packet8h& from) {
return half_impl::raw_uint16_to_half(static_cast<unsigned short>(_mm_extract_epi16(from.x, 0)));
}
template<> EIGEN_STRONG_INLINE Packet8h pload<Packet8h>(const Eigen::half* from) {
Packet8h result;
result.x = _mm_load_si128(reinterpret_cast<const __m128i*>(from));
return result;
}
template<> EIGEN_STRONG_INLINE Packet8h ploadu<Packet8h>(const Eigen::half* from) {
Packet8h result;
result.x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(from));
return result;
}
template<> EIGEN_STRONG_INLINE void pstore<Eigen::half>(Eigen::half* to, const Packet8h& from) {
_mm_store_si128(reinterpret_cast<__m128i*>(to), from.x);
}
template<> EIGEN_STRONG_INLINE void pstoreu<Eigen::half>(Eigen::half* to, const Packet8h& from) {
_mm_storeu_si128(reinterpret_cast<__m128i*>(to), from.x);
}
template<> EIGEN_STRONG_INLINE Packet8h
ploaddup<Packet8h>(const Eigen::half* from) {
Packet8h result;
unsigned short a = from[0].x;
unsigned short b = from[1].x;
unsigned short c = from[2].x;
unsigned short d = from[3].x;
result.x = _mm_set_epi16(d, d, c, c, b, b, a, a);
return result;
}
template<> EIGEN_STRONG_INLINE Packet8h
ploadquad<Packet8h>(const Eigen::half* from) {
Packet8h result;
unsigned short a = from[0].x;
unsigned short b = from[1].x;
result.x = _mm_set_epi16(b, b, b, b, a, a, a, a);
return result;
}
EIGEN_STRONG_INLINE Packet8f half2float(const Packet8h& a) {
#ifdef EIGEN_HAS_FP16_C
return _mm256_cvtph_ps(a.x);
#else
EIGEN_ALIGN32 Eigen::half aux[8];
pstore(aux, a);
float f0(aux[0]);
float f1(aux[1]);
float f2(aux[2]);
float f3(aux[3]);
float f4(aux[4]);
float f5(aux[5]);
float f6(aux[6]);
float f7(aux[7]);
return _mm256_set_ps(f7, f6, f5, f4, f3, f2, f1, f0);
#endif
}
EIGEN_STRONG_INLINE Packet8h float2half(const Packet8f& a) {
#ifdef EIGEN_HAS_FP16_C
Packet8h result;
result.x = _mm256_cvtps_ph(a, _MM_FROUND_TO_NEAREST_INT|_MM_FROUND_NO_EXC);
return result;
#else
EIGEN_ALIGN32 float aux[8];
pstore(aux, a);
Eigen::half h0(aux[0]);
Eigen::half h1(aux[1]);
Eigen::half h2(aux[2]);
Eigen::half h3(aux[3]);
Eigen::half h4(aux[4]);
Eigen::half h5(aux[5]);
Eigen::half h6(aux[6]);
Eigen::half h7(aux[7]);
Packet8h result;
result.x = _mm_set_epi16(h7.x, h6.x, h5.x, h4.x, h3.x, h2.x, h1.x, h0.x);
return result;
#endif
}
template<> EIGEN_STRONG_INLINE Packet8h ptrue(const Packet8h& a) {
Packet8h r; r.x = _mm_cmpeq_epi32(a.x, a.x); return r;
}
template<> EIGEN_STRONG_INLINE Packet8h por(const Packet8h& a,const Packet8h& b) {
// in some cases Packet4i is a wrapper around __m128i, so we either need to
// cast to Packet4i or directly call the intrinsics as below:
Packet8h r; r.x = _mm_or_si128(a.x,b.x); return r;
}
template<> EIGEN_STRONG_INLINE Packet8h pxor(const Packet8h& a,const Packet8h& b) {
Packet8h r; r.x = _mm_xor_si128(a.x,b.x); return r;
}
template<> EIGEN_STRONG_INLINE Packet8h pand(const Packet8h& a,const Packet8h& b) {
Packet8h r; r.x = _mm_and_si128(a.x,b.x); return r;
}
template<> EIGEN_STRONG_INLINE Packet8h pandnot(const Packet8h& a,const Packet8h& b) {
Packet8h r; r.x = _mm_andnot_si128(b.x,a.x); return r;
}
template<> EIGEN_STRONG_INLINE Packet8h pselect(const Packet8h& mask, const Packet8h& a, const Packet8h& b) {
Packet8h r; r.x = _mm_blendv_epi8(b.x, a.x, mask.x); return r;
}
template<> EIGEN_STRONG_INLINE Packet8h pcmp_eq(const Packet8h& a,const Packet8h& b) {
Packet8f af = half2float(a);
Packet8f bf = half2float(b);
Packet8f rf = pcmp_eq(af, bf);
// Pack the 32-bit flags into 16-bits flags.
Packet8h result; result.x = _mm_packs_epi32(_mm256_extractf128_si256(_mm256_castps_si256(rf), 0),
_mm256_extractf128_si256(_mm256_castps_si256(rf), 1));
return result;
}
template<> EIGEN_STRONG_INLINE Packet8h pconj(const Packet8h& a) { return a; }
template<> EIGEN_STRONG_INLINE Packet8h pnegate(const Packet8h& a) {
Packet8h sign_mask; sign_mask.x = _mm_set1_epi16(static_cast<unsigned short>(0x8000));
Packet8h result; result.x = _mm_xor_si128(a.x, sign_mask.x);
return result;
}
template<> EIGEN_STRONG_INLINE Packet8h padd<Packet8h>(const Packet8h& a, const Packet8h& b) {
Packet8f af = half2float(a);
Packet8f bf = half2float(b);
Packet8f rf = padd(af, bf);
return float2half(rf);
}
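// Packet8h has no native fp16 arithmetic: padd above, psub/pmul/pdiv below and
// the reductions all widen the eight halves to a Packet8f with half2float,
// compute in single precision, and narrow the result back with float2half.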
template<> EIGEN_STRONG_INLINE Packet8h psub<Packet8h>(const Packet8h& a, const Packet8h& b) {
Packet8f af = half2float(a);
Packet8f bf = half2float(b);
Packet8f rf = psub(af, bf);
return float2half(rf);
}
template<> EIGEN_STRONG_INLINE Packet8h pmul<Packet8h>(const Packet8h& a, const Packet8h& b) {
Packet8f af = half2float(a);
Packet8f bf = half2float(b);
Packet8f rf = pmul(af, bf);
return float2half(rf);
}
template<> EIGEN_STRONG_INLINE Packet8h pdiv<Packet8h>(const Packet8h& a, const Packet8h& b) {
Packet8f af = half2float(a);
Packet8f bf = half2float(b);
Packet8f rf = pdiv(af, bf);
return float2half(rf);
}
template<> EIGEN_STRONG_INLINE Packet8h pgather<Eigen::half, Packet8h>(const Eigen::half* from, Index stride)
{
Packet8h result;
result.x = _mm_set_epi16(from[7*stride].x, from[6*stride].x, from[5*stride].x, from[4*stride].x, from[3*stride].x, from[2*stride].x, from[1*stride].x, from[0*stride].x);
return result;
}
template<> EIGEN_STRONG_INLINE void pscatter<Eigen::half, Packet8h>(Eigen::half* to, const Packet8h& from, Index stride)
{
EIGEN_ALIGN32 Eigen::half aux[8];
pstore(aux, from);
to[stride*0].x = aux[0].x;
to[stride*1].x = aux[1].x;
to[stride*2].x = aux[2].x;
to[stride*3].x = aux[3].x;
to[stride*4].x = aux[4].x;
to[stride*5].x = aux[5].x;
to[stride*6].x = aux[6].x;
to[stride*7].x = aux[7].x;
}
template<> EIGEN_STRONG_INLINE Eigen::half predux<Packet8h>(const Packet8h& a) {
Packet8f af = half2float(a);
float reduced = predux<Packet8f>(af);
return Eigen::half(reduced);
}
template<> EIGEN_STRONG_INLINE Eigen::half predux_max<Packet8h>(const Packet8h& a) {
Packet8f af = half2float(a);
float reduced = predux_max<Packet8f>(af);
return Eigen::half(reduced);
}
template<> EIGEN_STRONG_INLINE Eigen::half predux_min<Packet8h>(const Packet8h& a) {
Packet8f af = half2float(a);
float reduced = predux_min<Packet8f>(af);
return Eigen::half(reduced);
}
template<> EIGEN_STRONG_INLINE Eigen::half predux_mul<Packet8h>(const Packet8h& a) {
Packet8f af = half2float(a);
float reduced = predux_mul<Packet8f>(af);
return Eigen::half(reduced);
}
template<> EIGEN_STRONG_INLINE Packet8h preduxp<Packet8h>(const Packet8h* p) {
Packet8f pf[8];
pf[0] = half2float(p[0]);
pf[1] = half2float(p[1]);
pf[2] = half2float(p[2]);
pf[3] = half2float(p[3]);
pf[4] = half2float(p[4]);
pf[5] = half2float(p[5]);
pf[6] = half2float(p[6]);
pf[7] = half2float(p[7]);
Packet8f reduced = preduxp<Packet8f>(pf);
return float2half(reduced);
}
template<> EIGEN_STRONG_INLINE Packet8h preverse(const Packet8h& a)
{
__m128i m = _mm_setr_epi8(14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1);
Packet8h res;
res.x = _mm_shuffle_epi8(a.x,m);
return res;
}
template<> EIGEN_STRONG_INLINE Packet8h pinsertfirst(const Packet8h& a, Eigen::half b)
{
Packet8h res;
res.x = _mm_insert_epi16(a.x,int(b.x),0);
return res;
}
template<> EIGEN_STRONG_INLINE Packet8h pinsertlast(const Packet8h& a, Eigen::half b)
{
Packet8h res;
res.x = _mm_insert_epi16(a.x,int(b.x),7);
return res;
}
template<int Offset>
struct palign_impl<Offset,Packet8h>
{
static EIGEN_STRONG_INLINE void run(Packet8h& first, const Packet8h& second)
{
if (Offset!=0)
first.x = _mm_alignr_epi8(second.x,first.x, Offset*2);
}
};
EIGEN_STRONG_INLINE void
ptranspose(PacketBlock<Packet8h,8>& kernel) {
__m128i a = kernel.packet[0].x;
__m128i b = kernel.packet[1].x;
__m128i c = kernel.packet[2].x;
__m128i d = kernel.packet[3].x;
__m128i e = kernel.packet[4].x;
__m128i f = kernel.packet[5].x;
__m128i g = kernel.packet[6].x;
__m128i h = kernel.packet[7].x;
__m128i a03b03 = _mm_unpacklo_epi16(a, b);
__m128i c03d03 = _mm_unpacklo_epi16(c, d);
__m128i e03f03 = _mm_unpacklo_epi16(e, f);
__m128i g03h03 = _mm_unpacklo_epi16(g, h);
__m128i a47b47 = _mm_unpackhi_epi16(a, b);
__m128i c47d47 = _mm_unpackhi_epi16(c, d);
__m128i e47f47 = _mm_unpackhi_epi16(e, f);
__m128i g47h47 = _mm_unpackhi_epi16(g, h);
__m128i a01b01c01d01 = _mm_unpacklo_epi32(a03b03, c03d03);
__m128i a23b23c23d23 = _mm_unpackhi_epi32(a03b03, c03d03);
__m128i e01f01g01h01 = _mm_unpacklo_epi32(e03f03, g03h03);
__m128i e23f23g23h23 = _mm_unpackhi_epi32(e03f03, g03h03);
__m128i a45b45c45d45 = _mm_unpacklo_epi32(a47b47, c47d47);
__m128i a67b67c67d67 = _mm_unpackhi_epi32(a47b47, c47d47);
__m128i e45f45g45h45 = _mm_unpacklo_epi32(e47f47, g47h47);
__m128i e67f67g67h67 = _mm_unpackhi_epi32(e47f47, g47h47);
__m128i a0b0c0d0e0f0g0h0 = _mm_unpacklo_epi64(a01b01c01d01, e01f01g01h01);
__m128i a1b1c1d1e1f1g1h1 = _mm_unpackhi_epi64(a01b01c01d01, e01f01g01h01);
__m128i a2b2c2d2e2f2g2h2 = _mm_unpacklo_epi64(a23b23c23d23, e23f23g23h23);
__m128i a3b3c3d3e3f3g3h3 = _mm_unpackhi_epi64(a23b23c23d23, e23f23g23h23);
__m128i a4b4c4d4e4f4g4h4 = _mm_unpacklo_epi64(a45b45c45d45, e45f45g45h45);
__m128i a5b5c5d5e5f5g5h5 = _mm_unpackhi_epi64(a45b45c45d45, e45f45g45h45);
__m128i a6b6c6d6e6f6g6h6 = _mm_unpacklo_epi64(a67b67c67d67, e67f67g67h67);
__m128i a7b7c7d7e7f7g7h7 = _mm_unpackhi_epi64(a67b67c67d67, e67f67g67h67);
kernel.packet[0].x = a0b0c0d0e0f0g0h0;
kernel.packet[1].x = a1b1c1d1e1f1g1h1;
kernel.packet[2].x = a2b2c2d2e2f2g2h2;
kernel.packet[3].x = a3b3c3d3e3f3g3h3;
kernel.packet[4].x = a4b4c4d4e4f4g4h4;
kernel.packet[5].x = a5b5c5d5e5f5g5h5;
kernel.packet[6].x = a6b6c6d6e6f6g6h6;
kernel.packet[7].x = a7b7c7d7e7f7g7h7;
}
EIGEN_STRONG_INLINE void
ptranspose(PacketBlock<Packet8h,4>& kernel) {
EIGEN_ALIGN32 Eigen::half in[4][8];
pstore<Eigen::half>(in[0], kernel.packet[0]);
pstore<Eigen::half>(in[1], kernel.packet[1]);
pstore<Eigen::half>(in[2], kernel.packet[2]);
pstore<Eigen::half>(in[3], kernel.packet[3]);
EIGEN_ALIGN32 Eigen::half out[4][8];
for (int i = 0; i < 4; ++i) {
for (int j = 0; j < 4; ++j) {
out[i][j] = in[j][2*i];
}
for (int j = 0; j < 4; ++j) {
out[i][j+4] = in[j][2*i+1];
}
}
kernel.packet[0] = pload<Packet8h>(out[0]);
kernel.packet[1] = pload<Packet8h>(out[1]);
kernel.packet[2] = pload<Packet8h>(out[2]);
kernel.packet[3] = pload<Packet8h>(out[3]);
}
} // end namespace internal
} // end namespace Eigen


@@ -37,13 +37,51 @@ struct type_casting_traits<int, float> {
template<> EIGEN_STRONG_INLINE Packet8i pcast<Packet8f, Packet8i>(const Packet8f& a) {
return _mm256_cvtps_epi32(a);
return _mm256_cvttps_epi32(a);
}
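// _mm256_cvttps_epi32 truncates toward zero, matching the semantics of a C++
// float-to-int cast, whereas the previous _mm256_cvtps_epi32 rounds to nearest
// under the default rounding mode.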
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8i, Packet8f>(const Packet8i& a) {
return _mm256_cvtepi32_ps(a);
}
template<> EIGEN_STRONG_INLINE Packet8i preinterpret<Packet8i,Packet8f>(const Packet8f& a) {
return _mm256_castps_si256(a);
}
template<> EIGEN_STRONG_INLINE Packet8f preinterpret<Packet8f,Packet8i>(const Packet8i& a) {
return _mm256_castsi256_ps(a);
}
#ifndef EIGEN_VECTORIZE_AVX512
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8h, Packet8f>(const Packet8h& a) {
return half2float(a);
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
#endif // EIGEN_VECTORIZE_AVX512
template<> EIGEN_STRONG_INLINE Packet8h pcast<Packet8f, Packet8h>(const Packet8f& a) {
return float2half(a);
}
} // end namespace internal
} // end namespace Eigen


@@ -0,0 +1,492 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2018 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_COMPLEX_AVX512_H
#define EIGEN_COMPLEX_AVX512_H
namespace Eigen {
namespace internal {
//---------- float ----------
struct Packet8cf
{
EIGEN_STRONG_INLINE Packet8cf() {}
EIGEN_STRONG_INLINE explicit Packet8cf(const __m512& a) : v(a) {}
__m512 v;
};
template<> struct packet_traits<std::complex<float> > : default_packet_traits
{
typedef Packet8cf type;
typedef Packet4cf half;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
size = 8,
HasHalfPacket = 1,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasReduxp = 0
};
};
template<> struct unpacket_traits<Packet8cf> {
typedef std::complex<float> type;
enum {
size = 8,
alignment=unpacket_traits<Packet16f>::alignment,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
typedef Packet4cf half;
};
template<> EIGEN_STRONG_INLINE Packet8cf ptrue<Packet8cf>(const Packet8cf& a) { return Packet8cf(ptrue(Packet16f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet8cf pnot<Packet8cf>(const Packet8cf& a) { return Packet8cf(pnot(Packet16f(a.v))); }
template<> EIGEN_STRONG_INLINE Packet8cf padd<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_add_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf psub<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_sub_ps(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf pnegate(const Packet8cf& a)
{
return Packet8cf(pnegate(a.v));
}
template<> EIGEN_STRONG_INLINE Packet8cf pconj(const Packet8cf& a)
{
const __m512 mask = _mm512_castsi512_ps(_mm512_setr_epi32(
0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000,
0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000));
return Packet8cf(pxor(a.v,mask));
}
template<> EIGEN_STRONG_INLINE Packet8cf pmul<Packet8cf>(const Packet8cf& a, const Packet8cf& b)
{
__m512 tmp2 = _mm512_mul_ps(_mm512_movehdup_ps(a.v), _mm512_permute_ps(b.v, _MM_SHUFFLE(2,3,0,1)));
return Packet8cf(_mm512_fmaddsub_ps(_mm512_moveldup_ps(a.v), b.v, tmp2));
}
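// Standard interleaved complex product: moveldup/movehdup broadcast the real
// and imaginary parts of a within each pair, and fmaddsub produces
// ar*br - ai*bi in the even (real) lanes and ar*bi + ai*br in the odd
// (imaginary) lanes, i.e. the real and imaginary parts of a*b.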
template<> EIGEN_STRONG_INLINE Packet8cf pand <Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(pand(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf por <Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(por(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf pxor <Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(pxor(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet8cf pandnot<Packet8cf>(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(pandnot(a.v,b.v)); }
template <>
EIGEN_STRONG_INLINE Packet8cf pcmp_eq(const Packet8cf& a, const Packet8cf& b) {
__m512 eq = pcmp_eq<Packet16f>(a.v, b.v);
return Packet8cf(pand(eq, _mm512_permute_ps(eq, 0xB1)));
}
template<> EIGEN_STRONG_INLINE Packet8cf pload <Packet8cf>(const std::complex<float>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet8cf(pload<Packet16f>(&numext::real_ref(*from))); }
template<> EIGEN_STRONG_INLINE Packet8cf ploadu<Packet8cf>(const std::complex<float>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet8cf(ploadu<Packet16f>(&numext::real_ref(*from))); }
template<> EIGEN_STRONG_INLINE Packet8cf pset1<Packet8cf>(const std::complex<float>& from)
{
return Packet8cf(_mm512_castpd_ps(pload1<Packet8d>((const double*)(const void*)&from)));
}
template<> EIGEN_STRONG_INLINE Packet8cf ploaddup<Packet8cf>(const std::complex<float>* from)
{
return Packet8cf( _mm512_castpd_ps( ploaddup<Packet8d>((const double*)(const void*)from )) );
}
template<> EIGEN_STRONG_INLINE Packet8cf ploadquad<Packet8cf>(const std::complex<float>* from)
{
return Packet8cf( _mm512_castpd_ps( ploadquad<Packet8d>((const double*)(const void*)from )) );
}
template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float>* to, const Packet8cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore(&numext::real_ref(*to), from.v); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float>* to, const Packet8cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(&numext::real_ref(*to), from.v); }
template<> EIGEN_DEVICE_FUNC inline Packet8cf pgather<std::complex<float>, Packet8cf>(const std::complex<float>* from, Index stride)
{
return Packet8cf(_mm512_castpd_ps(pgather<double,Packet8d>((const double*)(const void*)from, stride)));
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet8cf>(std::complex<float>* to, const Packet8cf& from, Index stride)
{
pscatter((double*)(void*)to, _mm512_castps_pd(from.v), stride);
}
template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet8cf>(const Packet8cf& a)
{
return pfirst(Packet2cf(_mm512_castps512_ps128(a.v)));
}
template<> EIGEN_STRONG_INLINE Packet8cf preverse(const Packet8cf& a) {
return Packet8cf(_mm512_castsi512_ps(
_mm512_permutexvar_epi64( _mm512_set_epi32(0, 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7),
_mm512_castps_si512(a.v))));
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet8cf>(const Packet8cf& a)
{
return predux(padd(Packet4cf(extract256<0>(a.v)),
Packet4cf(extract256<1>(a.v))));
}
template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet8cf>(const Packet8cf& a)
{
return predux_mul(pmul(Packet4cf(extract256<0>(a.v)),
Packet4cf(extract256<1>(a.v))));
}
template <>
EIGEN_STRONG_INLINE Packet4cf predux_half_dowto4<Packet8cf>(const Packet8cf& a) {
__m256 lane0 = extract256<0>(a.v);
__m256 lane1 = extract256<1>(a.v);
__m256 res = _mm256_add_ps(lane0, lane1);
return Packet4cf(res);
}
template<int Offset>
struct palign_impl<Offset,Packet8cf>
{
static EIGEN_STRONG_INLINE void run(Packet8cf& first, const Packet8cf& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet16f>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, false,true>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, true,false>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet8cf, Packet8cf, true,true>
{
EIGEN_STRONG_INLINE Packet8cf pmadd(const Packet8cf& x, const Packet8cf& y, const Packet8cf& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet8cf,Packet16f)
template<> EIGEN_STRONG_INLINE Packet8cf pdiv<Packet8cf>(const Packet8cf& a, const Packet8cf& b)
{
Packet8cf num = pmul(a, pconj(b));
__m512 tmp = _mm512_mul_ps(b.v, b.v);
__m512 tmp2 = _mm512_shuffle_ps(tmp,tmp,0xB1);
__m512 denom = _mm512_add_ps(tmp, tmp2);
return Packet8cf(_mm512_div_ps(num.v, denom));
}
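// Complex division as a/b = a*conj(b) / |b|^2: tmp holds br^2 and bi^2, the
// 0xB1 shuffle swaps them within each pair, so denom carries |b|^2 in both
// lanes and the final division scales real and imaginary parts alike.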
template<> EIGEN_STRONG_INLINE Packet8cf pcplxflip<Packet8cf>(const Packet8cf& x)
{
return Packet8cf(_mm512_shuffle_ps(x.v, x.v, _MM_SHUFFLE(2, 3, 0 ,1)));
}
//---------- double ----------
struct Packet4cd
{
EIGEN_STRONG_INLINE Packet4cd() {}
EIGEN_STRONG_INLINE explicit Packet4cd(const __m512d& a) : v(a) {}
__m512d v;
};
template<> struct packet_traits<std::complex<double> > : default_packet_traits
{
typedef Packet4cd type;
typedef Packet2cd half;
enum {
Vectorizable = 1,
AlignedOnScalar = 0,
size = 4,
HasHalfPacket = 1,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasDiv = 1,
HasNegate = 1,
HasAbs = 0,
HasAbs2 = 0,
HasMin = 0,
HasMax = 0,
HasSetLinear = 0,
HasReduxp = 0
};
};
template<> struct unpacket_traits<Packet4cd> {
typedef std::complex<double> type;
enum {
size = 4,
alignment = unpacket_traits<Packet8d>::alignment,
vectorizable=true,
masked_load_available=false,
masked_store_available=false
};
typedef Packet2cd half;
};
template<> EIGEN_STRONG_INLINE Packet4cd padd<Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(_mm512_add_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd psub<Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(_mm512_sub_pd(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd pnegate(const Packet4cd& a) { return Packet4cd(pnegate(a.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd pconj(const Packet4cd& a)
{
const __m512d mask = _mm512_castsi512_pd(
_mm512_set_epi32(0x80000000,0x0,0x0,0x0,0x80000000,0x0,0x0,0x0,
0x80000000,0x0,0x0,0x0,0x80000000,0x0,0x0,0x0));
return Packet4cd(pxor(a.v,mask));
}
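// pconj above flips the sign bit of each imaginary part (the 0x80000000 word of
// every second double). The multiply below uses the interleaved layout: with
// a = (ar, ai) broadcast into tmp1 = (ar, ar) and tmp2 = (ai, ai), and
// tmp3 = (bi, br), fmaddsub yields (ar*br - ai*bi, ar*bi + ai*br) per complex.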
template<> EIGEN_STRONG_INLINE Packet4cd pmul<Packet4cd>(const Packet4cd& a, const Packet4cd& b)
{
__m512d tmp1 = _mm512_shuffle_pd(a.v,a.v,0x0);
__m512d tmp2 = _mm512_shuffle_pd(a.v,a.v,0xFF);
__m512d tmp3 = _mm512_shuffle_pd(b.v,b.v,0x55);
__m512d odd = _mm512_mul_pd(tmp2, tmp3);
return Packet4cd(_mm512_fmaddsub_pd(tmp1, b.v, odd));
}
template<> EIGEN_STRONG_INLINE Packet4cd ptrue<Packet4cd>(const Packet4cd& a) { return Packet4cd(ptrue(Packet8d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cd pnot<Packet4cd>(const Packet4cd& a) { return Packet4cd(pnot(Packet8d(a.v))); }
template<> EIGEN_STRONG_INLINE Packet4cd pand <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pand(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd por <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(por(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd pxor <Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pxor(a.v,b.v)); }
template<> EIGEN_STRONG_INLINE Packet4cd pandnot<Packet4cd>(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pandnot(a.v,b.v)); }
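// Complex equality requires both components to match: the element-wise compare
// is AND-ed with a copy whose real/imag lanes are swapped, so every lane of the
// result carries (re(a)==re(b)) && (im(a)==im(b)).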
template <>
EIGEN_STRONG_INLINE Packet4cd pcmp_eq(const Packet4cd& a, const Packet4cd& b) {
__m512d eq = pcmp_eq<Packet8d>(a.v, b.v);
return Packet4cd(pand(eq, _mm512_permute_pd(eq, 0x55)));
}
template<> EIGEN_STRONG_INLINE Packet4cd pload <Packet4cd>(const std::complex<double>* from)
{ EIGEN_DEBUG_ALIGNED_LOAD return Packet4cd(pload<Packet8d>((const double*)from)); }
template<> EIGEN_STRONG_INLINE Packet4cd ploadu<Packet4cd>(const std::complex<double>* from)
{ EIGEN_DEBUG_UNALIGNED_LOAD return Packet4cd(ploadu<Packet8d>((const double*)from)); }
template<> EIGEN_STRONG_INLINE Packet4cd pset1<Packet4cd>(const std::complex<double>& from)
{
#ifdef EIGEN_VECTORIZE_AVX512DQ
return Packet4cd(_mm512_broadcast_f64x2(pset1<Packet1cd>(from).v));
#else
return Packet4cd(_mm512_castps_pd(_mm512_broadcast_f32x4( _mm_castpd_ps(pset1<Packet1cd>(from).v))));
#endif
}
template<> EIGEN_STRONG_INLINE Packet4cd ploaddup<Packet4cd>(const std::complex<double>* from) {
return Packet4cd(_mm512_insertf64x4(
_mm512_castpd256_pd512(ploaddup<Packet2cd>(from).v), ploaddup<Packet2cd>(from+1).v, 1));
}
template<> EIGEN_STRONG_INLINE void pstore <std::complex<double> >(std::complex<double> * to, const Packet4cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); }
template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<double> * to, const Packet4cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); }
template<> EIGEN_DEVICE_FUNC inline Packet4cd pgather<std::complex<double>, Packet4cd>(const std::complex<double>* from, Index stride)
{
return Packet4cd(_mm512_insertf64x4(_mm512_castpd256_pd512(
_mm256_insertf128_pd(_mm256_castpd128_pd256(ploadu<Packet1cd>(from+0*stride).v), ploadu<Packet1cd>(from+1*stride).v,1)),
_mm256_insertf128_pd(_mm256_castpd128_pd256(ploadu<Packet1cd>(from+2*stride).v), ploadu<Packet1cd>(from+3*stride).v,1), 1));
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet4cd>(std::complex<double>* to, const Packet4cd& from, Index stride)
{
__m512i fromi = _mm512_castpd_si512(from.v);
double* tod = (double*)(void*)to;
_mm_storeu_pd(tod+0*stride, _mm_castsi128_pd(_mm512_extracti32x4_epi32(fromi,0)) );
_mm_storeu_pd(tod+2*stride, _mm_castsi128_pd(_mm512_extracti32x4_epi32(fromi,1)) );
_mm_storeu_pd(tod+4*stride, _mm_castsi128_pd(_mm512_extracti32x4_epi32(fromi,2)) );
_mm_storeu_pd(tod+6*stride, _mm_castsi128_pd(_mm512_extracti32x4_epi32(fromi,3)) );
}
template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet4cd>(const Packet4cd& a)
{
__m128d low = extract128<0>(a.v);
EIGEN_ALIGN16 double res[2];
_mm_store_pd(res, low);
return std::complex<double>(res[0],res[1]);
}
template<> EIGEN_STRONG_INLINE Packet4cd preverse(const Packet4cd& a) {
return Packet4cd(_mm512_shuffle_f64x2(a.v, a.v, EIGEN_SSE_SHUFFLE_MASK(3,2,1,0)));
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet4cd>(const Packet4cd& a)
{
return predux(padd(Packet2cd(_mm512_extractf64x4_pd(a.v,0)),
Packet2cd(_mm512_extractf64x4_pd(a.v,1))));
}
template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet4cd>(const Packet4cd& a)
{
return predux_mul(pmul(Packet2cd(_mm512_extractf64x4_pd(a.v,0)),
Packet2cd(_mm512_extractf64x4_pd(a.v,1))));
}
template<int Offset>
struct palign_impl<Offset,Packet4cd>
{
static EIGEN_STRONG_INLINE void run(Packet4cd& first, const Packet4cd& second)
{
if (Offset==0) return;
palign_impl<Offset*2,Packet8d>::run(first.v, second.v);
}
};
template<> struct conj_helper<Packet4cd, Packet4cd, false,true>
{
EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const
{
return internal::pmul(a, pconj(b));
}
};
template<> struct conj_helper<Packet4cd, Packet4cd, true,false>
{
EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const
{
return internal::pmul(pconj(a), b);
}
};
template<> struct conj_helper<Packet4cd, Packet4cd, true,true>
{
EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const
{ return padd(pmul(x,y),c); }
EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const
{
return pconj(internal::pmul(a, b));
}
};
EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet4cd,Packet8d)
template<> EIGEN_STRONG_INLINE Packet4cd pdiv<Packet4cd>(const Packet4cd& a, const Packet4cd& b)
{
Packet4cd num = pmul(a, pconj(b));
__m512d tmp = _mm512_mul_pd(b.v, b.v);
__m512d denom = padd(_mm512_permute_pd(tmp,0x55), tmp);
return Packet4cd(_mm512_div_pd(num.v, denom));
}
template<> EIGEN_STRONG_INLINE Packet4cd pcplxflip<Packet4cd>(const Packet4cd& x)
{
return Packet4cd(_mm512_permute_pd(x.v,0x55));
}
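// For the complex<float> transposes below, each (real, imag) pair is
// reinterpreted as a single 64-bit lane so the existing Packet8d kernels
// shuffle whole complex numbers without ever splitting a real/imaginary pair.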
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<Packet8cf,4>& kernel) {
PacketBlock<Packet8d,4> pb;
pb.packet[0] = _mm512_castps_pd(kernel.packet[0].v);
pb.packet[1] = _mm512_castps_pd(kernel.packet[1].v);
pb.packet[2] = _mm512_castps_pd(kernel.packet[2].v);
pb.packet[3] = _mm512_castps_pd(kernel.packet[3].v);
ptranspose(pb);
kernel.packet[0].v = _mm512_castpd_ps(pb.packet[0]);
kernel.packet[1].v = _mm512_castpd_ps(pb.packet[1]);
kernel.packet[2].v = _mm512_castpd_ps(pb.packet[2]);
kernel.packet[3].v = _mm512_castpd_ps(pb.packet[3]);
}
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<Packet8cf,8>& kernel) {
PacketBlock<Packet8d,8> pb;
pb.packet[0] = _mm512_castps_pd(kernel.packet[0].v);
pb.packet[1] = _mm512_castps_pd(kernel.packet[1].v);
pb.packet[2] = _mm512_castps_pd(kernel.packet[2].v);
pb.packet[3] = _mm512_castps_pd(kernel.packet[3].v);
pb.packet[4] = _mm512_castps_pd(kernel.packet[4].v);
pb.packet[5] = _mm512_castps_pd(kernel.packet[5].v);
pb.packet[6] = _mm512_castps_pd(kernel.packet[6].v);
pb.packet[7] = _mm512_castps_pd(kernel.packet[7].v);
ptranspose(pb);
kernel.packet[0].v = _mm512_castpd_ps(pb.packet[0]);
kernel.packet[1].v = _mm512_castpd_ps(pb.packet[1]);
kernel.packet[2].v = _mm512_castpd_ps(pb.packet[2]);
kernel.packet[3].v = _mm512_castpd_ps(pb.packet[3]);
kernel.packet[4].v = _mm512_castpd_ps(pb.packet[4]);
kernel.packet[5].v = _mm512_castpd_ps(pb.packet[5]);
kernel.packet[6].v = _mm512_castpd_ps(pb.packet[6]);
kernel.packet[7].v = _mm512_castpd_ps(pb.packet[7]);
}
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<Packet4cd,4>& kernel) {
__m512d T0 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, EIGEN_SSE_SHUFFLE_MASK(0,1,0,1)); // [a0 a1 b0 b1]
__m512d T1 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, EIGEN_SSE_SHUFFLE_MASK(2,3,2,3)); // [a2 a3 b2 b3]
__m512d T2 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, EIGEN_SSE_SHUFFLE_MASK(0,1,0,1)); // [c0 c1 d0 d1]
__m512d T3 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, EIGEN_SSE_SHUFFLE_MASK(2,3,2,3)); // [c2 c3 d2 d3]
kernel.packet[3] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, EIGEN_SSE_SHUFFLE_MASK(1,3,1,3))); // [a3 b3 c3 d3]
kernel.packet[2] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, EIGEN_SSE_SHUFFLE_MASK(0,2,0,2))); // [a2 b2 c2 d2]
kernel.packet[1] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, EIGEN_SSE_SHUFFLE_MASK(1,3,1,3))); // [a1 b1 c1 d1]
kernel.packet[0] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, EIGEN_SSE_SHUFFLE_MASK(0,2,0,2))); // [a0 b0 c0 d0]
}
template<> EIGEN_STRONG_INLINE Packet8cf pinsertfirst(const Packet8cf& a, std::complex<float> b)
{
Packet2cf tmp = Packet2cf(_mm512_extractf32x4_ps(a.v,0));
tmp = pinsertfirst(tmp, b);
return Packet8cf( _mm512_insertf32x4(a.v, tmp.v, 0) );
}
template<> EIGEN_STRONG_INLINE Packet4cd pinsertfirst(const Packet4cd& a, std::complex<double> b)
{
return Packet4cd(_mm512_castsi512_pd( _mm512_inserti32x4(_mm512_castpd_si512(a.v), _mm_castpd_si128(pset1<Packet1cd>(b).v), 0) ));
}
template<> EIGEN_STRONG_INLINE Packet8cf pinsertlast(const Packet8cf& a, std::complex<float> b)
{
Packet2cf tmp = Packet2cf(_mm512_extractf32x4_ps(a.v,3) );
tmp = pinsertlast(tmp, b);
return Packet8cf( _mm512_insertf32x4(a.v, tmp.v, 3) );
}
template<> EIGEN_STRONG_INLINE Packet4cd pinsertlast(const Packet4cd& a, std::complex<double> b)
{
return Packet4cd(_mm512_castsi512_pd( _mm512_inserti32x4(_mm512_castpd_si512(a.v), _mm_castpd_si128(pset1<Packet1cd>(b).v), 3) ));
}
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_COMPLEX_AVX512_H


@@ -15,13 +15,13 @@ namespace Eigen {
namespace internal {
// Disable the code for older versions of gcc that don't support many of the required avx512 intrinsics.
#if EIGEN_GNUC_AT_LEAST(5, 3)
#if EIGEN_GNUC_AT_LEAST(5, 3) || EIGEN_COMP_CLANG || EIGEN_COMP_MSVC >= 1923
#define _EIGEN_DECLARE_CONST_Packet16f(NAME, X) \
const Packet16f p16f_##NAME = pset1<Packet16f>(X)
#define _EIGEN_DECLARE_CONST_Packet16f_FROM_INT(NAME, X) \
const Packet16f p16f_##NAME = (__m512)pset1<Packet16i>(X)
const Packet16f p16f_##NAME = preinterpret<Packet16f,Packet16i>(pset1<Packet16i>(X))
#define _EIGEN_DECLARE_CONST_Packet8d(NAME, X) \
const Packet8d p8d_##NAME = pset1<Packet8d>(X)
@@ -29,7 +29,6 @@ namespace internal {
#define _EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(NAME, X) \
const Packet8d p8d_##NAME = _mm512_castsi512_pd(_mm512_set1_epi64(X))
// Natural logarithm
// Computes log(x) as log(2^e * m) = C*e + log(m), where the constant C = log(2)
// and m is in the range [sqrt(1/2),sqrt(2)). In this range, the logarithm can
@@ -73,7 +72,7 @@ plog<Packet16f>(const Packet16f& _x) {
x = pmax(x, p16f_min_norm_pos);
// Extract the shifted exponents.
Packet16f emm0 = _mm512_cvtepi32_ps(_mm512_srli_epi32((__m512i)x, 23));
Packet16f emm0 = _mm512_cvtepi32_ps(_mm512_srli_epi32(preinterpret<Packet16i,Packet16f>(x), 23));
Packet16f e = _mm512_sub_ps(emm0, p16f_126f);
// Set the exponents to -1, i.e. x are in the range [0.5,1).
@@ -129,7 +128,6 @@ plog<Packet16f>(const Packet16f& _x) {
p16f_nan),
p16f_minus_inf);
}
#endif
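// For reference, a minimal scalar sketch (helper name is hypothetical, not Eigen
// code) of the range reduction described above: split x = 2^e * m with std::frexp,
// fold m into [sqrt(1/2), sqrt(2)), and recombine as log(x) = e*log(2) + log(m).
// The vector kernel replaces std::log(m) with a short polynomial and handles the
// NaN/-inf special cases via the masks shown above.
#include <cmath>
inline float log_via_range_reduction(float x) {
  int e;
  float m = std::frexp(x, &e);            // x = m * 2^e with m in [0.5, 1)
  if (m < 0.707106781186547524f) {        // fold m into [sqrt(1/2), sqrt(2))
    m = m + m;
    --e;
  }
  // The vector kernel evaluates a polynomial around m - 1 here; std::log stands in.
  return float(e) * 0.69314718f + std::log(m);
}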
// Exponential function. Works by writing "x = m*log(2) + r" where
@@ -255,6 +253,7 @@ pexp<Packet8d>(const Packet8d& _x) {
return pmax(pmul(x, e), _x);
}*/
// Functions for sqrt.
// The EIGEN_FAST_MATH version uses the _mm_rsqrt_ps approximation and one step
// of Newton's method, at a cost of 1-2 bits of precision as opposed to the
@@ -310,77 +309,135 @@ EIGEN_STRONG_INLINE Packet8d psqrt<Packet8d>(const Packet8d& x) {
}
#endif
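// For reference, a scalar sketch (hypothetical helper, not Eigen code) of the
// EIGEN_FAST_MATH strategy mentioned above: take a rough 1/sqrt(x) estimate
// (here a stand-in for _mm512_rsqrt14_ps), apply one Newton-Raphson step, and
// multiply by x to obtain sqrt(x). Zero/denormal/infinity masking is omitted.
#include <cmath>
inline float fast_sqrt_sketch(float x) {
  float y = 1.0f / std::sqrt(x);           // stand-in for the hardware rsqrt estimate
  y = y * (1.5f - 0.5f * x * y * y);       // one Newton-Raphson step on 1/sqrt(x)
  return x * y;                            // sqrt(x) = x * (1/sqrt(x))
}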
// Functions for rsqrt.
// Almost identical to the sqrt routine, just leave out the last multiplication
// and fill in NaN/Inf where needed. Note that this function only exists as an
// iterative version for doubles since there is no instruction for directly
// computing the reciprocal square root in AVX-512.
#ifdef EIGEN_FAST_MATH
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
prsqrt<Packet16f>(const Packet16f& _x) {
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(inf, 0x7f800000);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(nan, 0x7fc00000);
_EIGEN_DECLARE_CONST_Packet16f(one_point_five, 1.5f);
_EIGEN_DECLARE_CONST_Packet16f(minus_half, -0.5f);
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(flt_min, 0x00800000);
// prsqrt for float.
#if defined(EIGEN_VECTORIZE_AVX512ER)
Packet16f neg_half = pmul(_x, p16f_minus_half);
// select only the inverse sqrt of positive normal inputs (denormals are
// flushed to zero and cause infs as well).
__mmask16 le_zero_mask = _mm512_cmp_ps_mask(_x, p16f_flt_min, _CMP_LT_OQ);
Packet16f x = _mm512_mask_blend_ps(le_zero_mask, _mm512_rsqrt14_ps(_x), _mm512_setzero_ps());
// Fill in NaNs and Infs for the negative/zero entries.
__mmask16 neg_mask = _mm512_cmp_ps_mask(_x, _mm512_setzero_ps(), _CMP_LT_OQ);
Packet16f infs_and_nans = _mm512_mask_blend_ps(
neg_mask, _mm512_mask_blend_ps(le_zero_mask, _mm512_setzero_ps(), p16f_inf), p16f_nan);
// Do a single step of Newton's iteration.
x = pmul(x, pmadd(neg_half, pmul(x, x), p16f_one_point_five));
// Insert NaNs and Infs in all the right places.
return _mm512_mask_blend_ps(le_zero_mask, x, infs_and_nans);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
prsqrt<Packet8d>(const Packet8d& _x) {
_EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(inf, 0x7ff0000000000000LL);
_EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(nan, 0x7ff1000000000000LL);
_EIGEN_DECLARE_CONST_Packet8d(one_point_five, 1.5);
_EIGEN_DECLARE_CONST_Packet8d(minus_half, -0.5);
_EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(dbl_min, 0x0010000000000000LL);
Packet8d neg_half = pmul(_x, p8d_minus_half);
// select only the inverse sqrt of positive normal inputs (denormals are
// flushed to zero and cause infs as well).
__mmask8 le_zero_mask = _mm512_cmp_pd_mask(_x, p8d_dbl_min, _CMP_LT_OQ);
Packet8d x = _mm512_mask_blend_pd(le_zero_mask, _mm512_rsqrt14_pd(_x), _mm512_setzero_pd());
// Fill in NaNs and Infs for the negative/zero entries.
__mmask8 neg_mask = _mm512_cmp_pd_mask(_x, _mm512_setzero_pd(), _CMP_LT_OQ);
Packet8d infs_and_nans = _mm512_mask_blend_pd(
neg_mask, _mm512_mask_blend_pd(le_zero_mask, _mm512_setzero_pd(), p8d_inf), p8d_nan);
// Do a first step of Newton's iteration.
x = pmul(x, pmadd(neg_half, pmul(x, x), p8d_one_point_five));
// Do a second step of Newton's iteration.
x = pmul(x, pmadd(neg_half, pmul(x, x), p8d_one_point_five));
// Insert NaNs and Infs in all the right places.
return _mm512_mask_blend_pd(le_zero_mask, x, infs_and_nans);
}
#elif defined(EIGEN_VECTORIZE_AVX512ER)
template <>
EIGEN_STRONG_INLINE Packet16f prsqrt<Packet16f>(const Packet16f& x) {
return _mm512_rsqrt28_ps(x);
}
#elif EIGEN_FAST_MATH
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
prsqrt<Packet16f>(const Packet16f& _x) {
_EIGEN_DECLARE_CONST_Packet16f_FROM_INT(inf, 0x7f800000);
_EIGEN_DECLARE_CONST_Packet16f(one_point_five, 1.5f);
_EIGEN_DECLARE_CONST_Packet16f(minus_half, -0.5f);
Packet16f neg_half = pmul(_x, p16f_minus_half);
// Identify infinite, negative and denormal arguments.
__mmask16 inf_mask = _mm512_cmp_ps_mask(_x, p16f_inf, _CMP_EQ_OQ);
__mmask16 not_pos_mask = _mm512_cmp_ps_mask(_x, _mm512_setzero_ps(), _CMP_LE_OQ);
__mmask16 not_finite_pos_mask = not_pos_mask | inf_mask;
// Compute an approximate result using the rsqrt intrinsic, forcing +inf
// for denormals for consistency with AVX and SSE implementations.
Packet16f y_approx = _mm512_rsqrt14_ps(_x);
// Do a single step of Newton-Raphson iteration to improve the approximation.
// This uses the formula y_{n+1} = y_n * (1.5 - y_n * (0.5 * x) * y_n).
// It is essential to evaluate the inner term like this because forming
// y_n^2 may over- or underflow.
Packet16f y_newton = pmul(y_approx, pmadd(y_approx, pmul(neg_half, y_approx), p16f_one_point_five));
// Select the result of the Newton-Raphson step for positive finite arguments.
// For other arguments, choose the output of the intrinsic. This will
// return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(0) = +inf.
return _mm512_mask_blend_ps(not_finite_pos_mask, y_newton, y_approx);
}
#else
template <>
EIGEN_STRONG_INLINE Packet16f prsqrt<Packet16f>(const Packet16f& x) {
_EIGEN_DECLARE_CONST_Packet16f(one, 1.0f);
return _mm512_div_ps(p16f_one, _mm512_sqrt_ps(x));
}
#endif
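// For reference, a scalar sketch (hypothetical helper, not Eigen code) of the
// selection logic above: refine the estimate only for positive finite inputs,
// otherwise return the raw estimate so that rsqrt(+inf) = 0, rsqrt(x < 0) = NaN
// and rsqrt(0) = +inf, matching the mask blend in the vector code.
#include <cmath>
inline float prsqrt_sketch(float x) {
  float y_approx = 1.0f / std::sqrt(x);    // stand-in for _mm512_rsqrt14_ps
  if (!(x > 0.0f) || std::isinf(x)) return y_approx;
  // y_{n+1} = y_n * (1.5 - y_n * (0.5*x) * y_n); grouped so y_n*y_n is never
  // formed explicitly, avoiding over/underflow for extreme inputs.
  float neg_half_x = -0.5f * x;
  return y_approx * (y_approx * (neg_half_x * y_approx) + 1.5f);
}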
// prsqrt for double.
#if EIGEN_FAST_MATH
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d
prsqrt<Packet8d>(const Packet8d& _x) {
_EIGEN_DECLARE_CONST_Packet8d(one_point_five, 1.5);
_EIGEN_DECLARE_CONST_Packet8d(minus_half, -0.5);
_EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(inf, 0x7ff0000000000000LL);
Packet8d neg_half = pmul(_x, p8d_minus_half);
// Identify infinite, negative and denormal arguments.
__mmask8 inf_mask = _mm512_cmp_pd_mask(_x, p8d_inf, _CMP_EQ_OQ);
__mmask8 not_pos_mask = _mm512_cmp_pd_mask(_x, _mm512_setzero_pd(), _CMP_LE_OQ);
__mmask8 not_finite_pos_mask = not_pos_mask | inf_mask;
// Compute an approximate result using the rsqrt intrinsic, forcing +inf
// for denormals for consistency with AVX and SSE implementations.
#if defined(EIGEN_VECTORIZE_AVX512ER)
Packet8d y_approx = _mm512_rsqrt28_pd(_x);
#else
Packet8d y_approx = _mm512_rsqrt14_pd(_x);
#endif
// Do one or two steps of Newton-Raphson to improve the approximation, depending on the
// starting accuracy (either 2^-14 or 2^-28, depending on whether AVX512ER is available).
// The Newton-Raphson algorithm has quadratic convergence and roughly doubles the number
// of correct digits for each step.
// This uses the formula y_{n+1} = y_n * (1.5 - y_n * (0.5 * x) * y_n).
// It is essential to evaluate the inner term like this because forming
// y_n^2 may over- or underflow.
Packet8d y_newton = pmul(y_approx, pmadd(neg_half, pmul(y_approx, y_approx), p8d_one_point_five));
#if !defined(EIGEN_VECTORIZE_AVX512ER)
y_newton = pmul(y_newton, pmadd(y_newton, pmul(neg_half, y_newton), p8d_one_point_five));
#endif
// Select the result of the Newton-Raphson step for positive finite arguments.
// For other arguments, choose the output of the intrinsic. This will
// return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(0) = +inf.
return _mm512_mask_blend_pd(not_finite_pos_mask, y_newton, y_approx);
}
#else
template <>
EIGEN_STRONG_INLINE Packet8d prsqrt<Packet8d>(const Packet8d& x) {
_EIGEN_DECLARE_CONST_Packet8d(one, 1.0);
return _mm512_div_pd(p8d_one, _mm512_sqrt_pd(x));
}
#endif
#if defined(EIGEN_VECTORIZE_AVX512DQ)
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16f plog1p<Packet16f>(const Packet16f& _x) {
return generic_plog1p(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet16f pexpm1<Packet16f>(const Packet16f& _x) {
return generic_expm1(_x);
}
#endif
#endif
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
psin<Packet16f>(const Packet16f& _x) {
return psin_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
pcos<Packet16f>(const Packet16f& _x) {
return pcos_float(_x);
}
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f
ptanh<Packet16f>(const Packet16f& _x) {
return internal::generic_fast_tanh_float(_x);
}
} // end namespace internal

File diff suppressed because it is too large


@@ -0,0 +1,47 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2019 Rasmus Munk Larsen <rmlarsen@google.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_TYPE_CASTING_AVX512_H
#define EIGEN_TYPE_CASTING_AVX512_H
namespace Eigen {
namespace internal {
template <>
struct type_casting_traits<half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
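// SrcCoeffRatio/TgtCoeffRatio describe how many source and target packets one
// vectorized cast consumes and produces; both are 1 here because Packet16h and
// Packet16f each hold 16 coefficients.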
template<> EIGEN_STRONG_INLINE Packet16f pcast<Packet16h, Packet16f>(const Packet16h& a) {
return half2float(a);
}
template <>
struct type_casting_traits<float, half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16h pcast<Packet16f, Packet16h>(const Packet16f& a) {
return float2half(a);
}
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_TYPE_CASTING_AVX512_H


@@ -60,7 +60,7 @@ template<> struct packet_traits<std::complex<float> > : default_packet_traits
};
};
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16}; typedef Packet2cf half; };
template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; };
template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
{
@@ -82,14 +82,14 @@ template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<f
template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather<std::complex<float>, Packet2cf>(const std::complex<float>* from, Index stride)
{
std::complex<float> EIGEN_ALIGN16 af[2];
EIGEN_ALIGN16 std::complex<float> af[2];
af[0] = from[0*stride];
af[1] = from[1*stride];
return pload<Packet2cf>(af);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet2cf>(std::complex<float>* to, const Packet2cf& from, Index stride)
{
std::complex<float> EIGEN_ALIGN16 af[2];
EIGEN_ALIGN16 std::complex<float> af[2];
pstore<std::complex<float> >((std::complex<float> *) af, from);
to[0*stride] = af[0];
to[1*stride] = af[1];
@@ -128,7 +128,7 @@ template<> EIGEN_STRONG_INLINE void prefetch<std::complex<float> >(const std::co
template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet2cf>(const Packet2cf& a)
{
std::complex<float> EIGEN_ALIGN16 res[2];
EIGEN_ALIGN16 std::complex<float> res[2];
pstore((float *)&res, a.v);
return res[0];
@@ -246,6 +246,11 @@ EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet2cf,2>& kernel)
kernel.packet[0].v = tmp;
}
template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b) {
Packet4f eq = reinterpret_cast<Packet4f>(vec_cmpeq(a.v,b.v));
return Packet2cf(vec_and(eq, vec_perm(eq, eq, p16uc_COMPLEX32_REV)));
}
#ifdef __VSX__
template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, const Packet2cf& thenPacket, const Packet2cf& elsePacket) {
Packet2cf result;
@@ -286,7 +291,7 @@ template<> struct packet_traits<std::complex<double> > : default_packet_traits
};
};
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16}; typedef Packet1cd half; };
template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; };
template<> EIGEN_STRONG_INLINE Packet1cd pload <Packet1cd>(const std::complex<double>* from) { return Packet1cd(pload<Packet2d>((const double*)from)); }
template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from) { return Packet1cd(ploadu<Packet2d>((const double*)from)); }
@@ -298,14 +303,14 @@ template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<dou
template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, Index stride)
{
std::complex<double> EIGEN_ALIGN16 af[2];
EIGEN_ALIGN16 std::complex<double> af[2];
af[0] = from[0*stride];
af[1] = from[1*stride];
return pload<Packet1cd>(af);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, Index stride)
{
std::complex<double> EIGEN_ALIGN16 af[2];
EIGEN_ALIGN16 std::complex<double> af[2];
pstore<std::complex<double> >(af, from);
to[0*stride] = af[0];
to[1*stride] = af[1];
@@ -345,7 +350,7 @@ template<> EIGEN_STRONG_INLINE void prefetch<std::complex<double> >(const std::c
template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Packet1cd& a)
{
std::complex<double> EIGEN_ALIGN16 res[2];
EIGEN_ALIGN16 std::complex<double> res[2];
pstore<std::complex<double> >(res, a);
return res[0];
@@ -422,6 +427,18 @@ EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet1cd,2>& kernel)
kernel.packet[1].v = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_LO);
kernel.packet[0].v = tmp;
}
template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packet1cd& b) {
// Compare real and imaginary parts of a and b to get the mask vector:
// [re(a)==re(b), im(a)==im(b)]
Packet2d eq = reinterpret_cast<Packet2d>(vec_cmpeq(a.v,b.v));
// Swap real/imag elements in the mask in to get:
// [im(a)==im(b), re(a)==re(b)]
Packet2d eq_swapped = reinterpret_cast<Packet2d>(vec_sld(reinterpret_cast<Packet4ui>(eq), reinterpret_cast<Packet4ui>(eq), 8));
// Return re(a)==re(b) & im(a)==im(b) by computing bitwise AND of eq and eq_swapped
return Packet1cd(vec_and(eq, eq_swapped));
}
#endif // __VSX__
} // end namespace internal


@@ -9,10 +9,6 @@
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
/* The sin, cos, exp, and log functions of this file come from
* Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/
*/
#ifndef EIGEN_MATH_FUNCTIONS_ALTIVEC_H
#define EIGEN_MATH_FUNCTIONS_ALTIVEC_H
@@ -20,180 +16,28 @@ namespace Eigen {
namespace internal {
static _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f);
static _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f);
static _EIGEN_DECLARE_CONST_Packet4i(0x7f, 0x7f);
static _EIGEN_DECLARE_CONST_Packet4i(23, 23);
static _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(inv_mant_mask, ~0x7f800000);
/* the smallest non-denormalized float number */
static _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(min_norm_pos, 0x00800000);
static _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(minus_inf, 0xff800000); // -1.f/0.f
static _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(minus_nan, 0xffffffff);
/* natural logarithm computed for 4 simultaneous floats;
returns NaN for x <= 0
*/
static _EIGEN_DECLARE_CONST_Packet4f(cephes_SQRTHF, 0.707106781186547524f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p0, 7.0376836292E-2f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p1, - 1.1514610310E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p2, 1.1676998740E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p3, - 1.2420140846E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p4, + 1.4249322787E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p5, - 1.6668057665E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p6, + 2.0000714765E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p7, - 2.4999993993E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p8, + 3.3333331174E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q1, -2.12194440e-4f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q2, 0.693359375f);
static _EIGEN_DECLARE_CONST_Packet4f(exp_hi, 88.3762626647950f);
static _EIGEN_DECLARE_CONST_Packet4f(exp_lo, -88.3762626647949f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_LOG2EF, 1.44269504088896341f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C1, 0.693359375f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C2, -2.12194440e-4f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p0, 1.9875691500E-4f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p1, 1.3981999507E-3f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p2, 8.3334519073E-3f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p3, 4.1665795894E-2f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p4, 1.6666665459E-1f);
static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p5, 5.0000001201E-1f);
#ifdef __VSX__
static _EIGEN_DECLARE_CONST_Packet2d(1 , 1.0);
static _EIGEN_DECLARE_CONST_Packet2d(2 , 2.0);
static _EIGEN_DECLARE_CONST_Packet2d(half, 0.5);
static _EIGEN_DECLARE_CONST_Packet2d(exp_hi, 709.437);
static _EIGEN_DECLARE_CONST_Packet2d(exp_lo, -709.436139303);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_LOG2EF, 1.4426950408889634073599);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p0, 1.26177193074810590878e-4);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p1, 3.02994407707441961300e-2);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p2, 9.99999999999999999910e-1);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q0, 3.00198505138664455042e-6);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q1, 2.52448340349684104192e-3);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q2, 2.27265548208155028766e-1);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q3, 2.00000000000000000009e0);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C1, 0.693145751953125);
static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C2, 1.42860682030941723212e-6);
#ifdef __POWER8_VECTOR__
static Packet2l p2l_1023 = { 1023, 1023 };
static Packet2ul p2ul_52 = { 52, 52 };
#endif
#endif
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f plog<Packet4f>(const Packet4f& _x)
{
Packet4f x = _x;
Packet4i emm0;
/* isvalid_mask is 0 if x < 0 or x is NaN. */
Packet4ui isvalid_mask = reinterpret_cast<Packet4ui>(vec_cmpge(x, p4f_ZERO));
Packet4ui iszero_mask = reinterpret_cast<Packet4ui>(vec_cmpeq(x, p4f_ZERO));
x = pmax(x, p4f_min_norm_pos); /* cut off denormalized stuff */
emm0 = vec_sr(reinterpret_cast<Packet4i>(x),
reinterpret_cast<Packet4ui>(p4i_23));
/* keep only the fractional part */
x = pand(x, p4f_inv_mant_mask);
x = por(x, p4f_half);
emm0 = psub(emm0, p4i_0x7f);
Packet4f e = padd(vec_ctf(emm0, 0), p4f_1);
/* part2:
if( x < SQRTHF ) {
e -= 1;
x = x + x - 1.0;
} else { x = x - 1.0; }
*/
Packet4f mask = reinterpret_cast<Packet4f>(vec_cmplt(x, p4f_cephes_SQRTHF));
Packet4f tmp = pand(x, mask);
x = psub(x, p4f_1);
e = psub(e, pand(p4f_1, mask));
x = padd(x, tmp);
Packet4f x2 = pmul(x,x);
Packet4f x3 = pmul(x2,x);
Packet4f y, y1, y2;
y = pmadd(p4f_cephes_log_p0, x, p4f_cephes_log_p1);
y1 = pmadd(p4f_cephes_log_p3, x, p4f_cephes_log_p4);
y2 = pmadd(p4f_cephes_log_p6, x, p4f_cephes_log_p7);
y = pmadd(y , x, p4f_cephes_log_p2);
y1 = pmadd(y1, x, p4f_cephes_log_p5);
y2 = pmadd(y2, x, p4f_cephes_log_p8);
y = pmadd(y, x3, y1);
y = pmadd(y, x3, y2);
y = pmul(y, x3);
y1 = pmul(e, p4f_cephes_log_q1);
tmp = pmul(x2, p4f_half);
y = padd(y, y1);
x = psub(x, tmp);
y2 = pmul(e, p4f_cephes_log_q2);
x = padd(x, y);
x = padd(x, y2);
// negative arg will be NAN, 0 will be -INF
x = vec_sel(x, p4f_minus_inf, iszero_mask);
x = vec_sel(p4f_minus_nan, x, isvalid_mask);
return x;
return plog_float(_x);
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f pexp<Packet4f>(const Packet4f& _x)
{
Packet4f x = _x;
return pexp_float(_x);
}
Packet4f tmp, fx;
Packet4i emm0;
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f psin<Packet4f>(const Packet4f& _x)
{
return psin_float(_x);
}
// clamp x
x = pmax(pmin(x, p4f_exp_hi), p4f_exp_lo);
// express exp(x) as exp(g + n*log(2))
fx = pmadd(x, p4f_cephes_LOG2EF, p4f_half);
fx = pfloor(fx);
tmp = pmul(fx, p4f_cephes_exp_C1);
Packet4f z = pmul(fx, p4f_cephes_exp_C2);
x = psub(x, tmp);
x = psub(x, z);
z = pmul(x,x);
Packet4f y = p4f_cephes_exp_p0;
y = pmadd(y, x, p4f_cephes_exp_p1);
y = pmadd(y, x, p4f_cephes_exp_p2);
y = pmadd(y, x, p4f_cephes_exp_p3);
y = pmadd(y, x, p4f_cephes_exp_p4);
y = pmadd(y, x, p4f_cephes_exp_p5);
y = pmadd(y, z, x);
y = padd(y, p4f_1);
// build 2^n
emm0 = vec_cts(fx, 0);
emm0 = vec_add(emm0, p4i_0x7f);
emm0 = vec_sl(emm0, reinterpret_cast<Packet4ui>(p4i_23));
// Altivec's max & min operators silently drop NaNs. Check for NaNs in the
// inputs and return them unmodified.
Packet4ui isnumber_mask = reinterpret_cast<Packet4ui>(vec_cmpeq(_x, _x));
return vec_sel(_x, pmax(pmul(y, reinterpret_cast<Packet4f>(emm0)), _x),
isnumber_mask);
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet4f pcos<Packet4f>(const Packet4f& _x)
{
return pcos_float(_x);
}
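// For reference, a scalar sketch (hypothetical helper, not Eigen code) of the
// Cephes-style range reduction used by the pexp kernels above: write
// x = g + n*log(2) with integer n, so exp(x) = 2^n * exp(g) and g is small
// enough for a short polynomial. log(2) is split into two constants (C1 + C2)
// so that g is computed without cancellation error.
#include <cmath>
inline float exp_range_reduction_sketch(float x) {
  const float LOG2EF = 1.44269504088896341f;
  const float C1 = 0.693359375f, C2 = -2.12194440e-4f;   // C1 + C2 ~= log(2)
  float n = std::floor(x * LOG2EF + 0.5f);                // n = round(x / log(2))
  float g = (x - n * C1) - n * C2;                        // g = x - n*log(2)
  // A real kernel clamps x and evaluates a polynomial in g; std::exp stands in.
  return std::ldexp(std::exp(g), static_cast<int>(n));    // scale by 2^n
}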
#ifndef EIGEN_COMP_CLANG
@@ -225,96 +69,20 @@ Packet2d psqrt<Packet2d>(const Packet2d& x)
return vec_sqrt(x);
}
// VSX support varies between different compilers and even different
// versions of the same compiler. For gcc version >= 4.9.3, we can use
// vec_cts to efficiently convert Packet2d to Packet2l. Otherwise, use
// a slow version that works with older compilers.
// Update: apparently vec_cts/vec_ctf intrinsics for 64-bit doubles
// are buggy, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70963
static inline Packet2l ConvertToPacket2l(const Packet2d& x) {
#if EIGEN_GNUC_AT_LEAST(5, 4) || \
(EIGEN_GNUC_AT(6, 1) && __GNUC_PATCHLEVEL__ >= 1)
return vec_cts(x, 0); // TODO: check clang version.
#else
double tmp[2];
memcpy(tmp, &x, sizeof(tmp));
Packet2l l = { static_cast<long long>(tmp[0]),
static_cast<long long>(tmp[1]) };
return l;
#endif
}
template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
Packet2d pexp<Packet2d>(const Packet2d& _x)
{
Packet2d x = _x;
Packet2d tmp, fx;
Packet2l emm0;
// clamp x
x = pmax(pmin(x, p2d_exp_hi), p2d_exp_lo);
/* express exp(x) as exp(g + n*log(2)) */
fx = pmadd(x, p2d_cephes_LOG2EF, p2d_half);
fx = pfloor(fx);
tmp = pmul(fx, p2d_cephes_exp_C1);
Packet2d z = pmul(fx, p2d_cephes_exp_C2);
x = psub(x, tmp);
x = psub(x, z);
Packet2d x2 = pmul(x,x);
Packet2d px = p2d_cephes_exp_p0;
px = pmadd(px, x2, p2d_cephes_exp_p1);
px = pmadd(px, x2, p2d_cephes_exp_p2);
px = pmul (px, x);
Packet2d qx = p2d_cephes_exp_q0;
qx = pmadd(qx, x2, p2d_cephes_exp_q1);
qx = pmadd(qx, x2, p2d_cephes_exp_q2);
qx = pmadd(qx, x2, p2d_cephes_exp_q3);
x = pdiv(px,psub(qx,px));
x = pmadd(p2d_2,x,p2d_1);
// build 2^n
emm0 = ConvertToPacket2l(fx);
#ifdef __POWER8_VECTOR__
emm0 = vec_add(emm0, p2l_1023);
emm0 = vec_sl(emm0, p2ul_52);
#else
// Code is a bit complex for POWER7. There is actually a
// vec_xxsldi intrinsic but it is not supported by some gcc versions.
// So we shift (52-32) bits and do a word swap with zeros.
_EIGEN_DECLARE_CONST_Packet4i(1023, 1023);
_EIGEN_DECLARE_CONST_Packet4i(20, 20); // 52 - 32
Packet4i emm04i = reinterpret_cast<Packet4i>(emm0);
emm04i = vec_add(emm04i, p4i_1023);
emm04i = vec_sl(emm04i, reinterpret_cast<Packet4ui>(p4i_20));
static const Packet16uc perm = {
0x14, 0x15, 0x16, 0x17, 0x00, 0x01, 0x02, 0x03,
0x1c, 0x1d, 0x1e, 0x1f, 0x08, 0x09, 0x0a, 0x0b };
#ifdef _BIG_ENDIAN
emm0 = reinterpret_cast<Packet2l>(vec_perm(p4i_ZERO, emm04i, perm));
#else
emm0 = reinterpret_cast<Packet2l>(vec_perm(emm04i, p4i_ZERO, perm));
#endif
#endif
// Altivec's max & min operators silently drop NaNs. Check for NaNs in the
// inputs and return them unmodified.
Packet2ul isnumber_mask = reinterpret_cast<Packet2ul>(vec_cmpeq(_x, _x));
return vec_sel(_x, pmax(pmul(x, reinterpret_cast<Packet2d>(emm0)), _x),
isnumber_mask);
return pexp_double(_x);
}
#endif
// Hyperbolic Tangent function.
template <>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f
ptanh<Packet4f>(const Packet4f& x) {
return internal::generic_fast_tanh_float(x);
}
} // end namespace internal
} // end namespace Eigen


@@ -83,15 +83,6 @@ static Packet4i p4i_COUNTDOWN = { 0, 1, 2, 3 };
static Packet16uc p16uc_REVERSE32 = { 12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3 };
static Packet16uc p16uc_DUPLICATE32_HI = { 0,1,2,3, 0,1,2,3, 4,5,6,7, 4,5,6,7 };
// Mask alignment
#ifdef __PPC64__
#define _EIGEN_MASK_ALIGNMENT 0xfffffffffffffff0
#else
#define _EIGEN_MASK_ALIGNMENT 0xfffffff0
#endif
#define _EIGEN_ALIGNED_PTR(x) ((std::ptrdiff_t)(x) & _EIGEN_MASK_ALIGNMENT)
// Handle endianness properly while loading constants
// Define global static constants:
#ifdef _BIG_ENDIAN
@@ -129,27 +120,27 @@ static Packet16uc p16uc_COMPLEX32_REV2 = vec_sld(p16uc_PSET64_HI, p16uc_PSET64_L
#define EIGEN_PPC_PREFETCH(ADDR) asm( " dcbt [%[addr]]\n" :: [addr] "r" (ADDR) : "cc" );
#endif
template<> struct packet_traits<float> : default_packet_traits
{
template <>
struct packet_traits<float> : default_packet_traits {
typedef Packet4f type;
typedef Packet4f half;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
size=4,
size = 4,
HasHalfPacket = 1,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasDiv = 1,
HasMin = 1,
HasMax = 1,
HasAbs = 1,
HasSin = 0,
HasCos = 0,
HasLog = 0,
HasExp = 1,
HasAdd = 1,
HasSub = 1,
HasMul = 1,
HasDiv = 1,
HasMin = 1,
HasMax = 1,
HasAbs = 1,
HasSin = EIGEN_FAST_MATH,
HasCos = EIGEN_FAST_MATH,
HasLog = 1,
HasExp = 1,
#ifdef __VSX__
HasSqrt = 1,
#if !EIGEN_COMP_CLANG
@@ -160,6 +151,8 @@ template<> struct packet_traits<float> : default_packet_traits
#else
HasSqrt = 0,
HasRsqrt = 0,
HasTanh = EIGEN_FAST_MATH,
HasErf = EIGEN_FAST_MATH,
#endif
HasRound = 1,
HasFloor = 1,
@@ -168,8 +161,8 @@ template<> struct packet_traits<float> : default_packet_traits
HasBlend = 1
};
};
template<> struct packet_traits<int> : default_packet_traits
{
template <>
struct packet_traits<int> : default_packet_traits {
typedef Packet4i type;
typedef Packet4i half;
enum {
@@ -186,9 +179,19 @@ template<> struct packet_traits<int> : default_packet_traits
};
};
template<> struct unpacket_traits<Packet4f> { typedef float type; enum {size=4, alignment=Aligned16}; typedef Packet4f half; };
template<> struct unpacket_traits<Packet4i> { typedef int type; enum {size=4, alignment=Aligned16}; typedef Packet4i half; };
template<> struct unpacket_traits<Packet4f>
{
typedef float type;
typedef Packet4f half;
typedef Packet4i integer_packet;
enum {size=4, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false};
};
template<> struct unpacket_traits<Packet4i>
{
typedef int type;
typedef Packet4i half;
enum {size=4, alignment=Aligned16, vectorizable=false, masked_load_available=false, masked_store_available=false};
};
inline std::ostream & operator <<(std::ostream & s, const Packet16uc & v)
{
@@ -238,42 +241,38 @@ inline std::ostream & operator <<(std::ostream & s, const Packet4ui & v)
// Need to define them first or we get specialization after instantiation errors
template<> EIGEN_STRONG_INLINE Packet4f pload<Packet4f>(const float* from)
{
// some versions of GCC throw "unused-but-set-parameter".
// ignoring these warnings for now.
EIGEN_UNUSED_VARIABLE(from);
EIGEN_DEBUG_ALIGNED_LOAD
#ifdef __VSX__
return vec_vsx_ld(0, from);
#else
return vec_ld(0, from);
#endif
}
template<> EIGEN_STRONG_INLINE Packet4i pload<Packet4i>(const int* from)
{
// some versions of GCC throw "unused-but-set-parameter".
// ignoring these warnings for now.
EIGEN_UNUSED_VARIABLE(from);
EIGEN_DEBUG_ALIGNED_LOAD
#ifdef __VSX__
return vec_vsx_ld(0, from);
#else
return vec_ld(0, from);
#endif
}
template<> EIGEN_STRONG_INLINE void pstore<float>(float* to, const Packet4f& from)
{
// some versions of GCC throw "unused-but-set-parameter" (float *to).
// ignoring these warnings for now.
EIGEN_UNUSED_VARIABLE(to);
EIGEN_DEBUG_ALIGNED_STORE
#ifdef __VSX__
vec_vsx_st(from, 0, to);
#else
vec_st(from, 0, to);
#endif
}
template<> EIGEN_STRONG_INLINE void pstore<int>(int* to, const Packet4i& from)
{
// some versions of GCC throw "unused-but-set-parameter" (float *to).
// ignoring these warnings for now.
EIGEN_UNUSED_VARIABLE(to);
EIGEN_DEBUG_ALIGNED_STORE
#ifdef __VSX__
vec_vsx_st(from, 0, to);
#else
vec_st(from, 0, to);
#endif
}
template<> EIGEN_STRONG_INLINE Packet4f pset1<Packet4f>(const float& from) {
@@ -285,6 +284,11 @@ template<> EIGEN_STRONG_INLINE Packet4i pset1<Packet4i>(const int& from) {
Packet4i v = {from, from, from, from};
return v;
}
template<> EIGEN_STRONG_INLINE Packet4f pset1frombits<Packet4f>(unsigned int from) {
return reinterpret_cast<Packet4f>(pset1<Packet4i>(from));
}
template<> EIGEN_STRONG_INLINE void
pbroadcast4<Packet4f>(const float *a,
Packet4f& a0, Packet4f& a1, Packet4f& a2, Packet4f& a3)
@@ -308,7 +312,7 @@ pbroadcast4<Packet4i>(const int *a,
template<> EIGEN_DEVICE_FUNC inline Packet4f pgather<float, Packet4f>(const float* from, Index stride)
{
float EIGEN_ALIGN16 af[4];
EIGEN_ALIGN16 float af[4];
af[0] = from[0*stride];
af[1] = from[1*stride];
af[2] = from[2*stride];
@@ -317,7 +321,7 @@ template<> EIGEN_DEVICE_FUNC inline Packet4f pgather<float, Packet4f>(const floa
}
template<> EIGEN_DEVICE_FUNC inline Packet4i pgather<int, Packet4i>(const int* from, Index stride)
{
int EIGEN_ALIGN16 ai[4];
EIGEN_ALIGN16 int ai[4];
ai[0] = from[0*stride];
ai[1] = from[1*stride];
ai[2] = from[2*stride];
@@ -326,7 +330,7 @@ template<> EIGEN_DEVICE_FUNC inline Packet4i pgather<int, Packet4i>(const int* f
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<float, Packet4f>(float* to, const Packet4f& from, Index stride)
{
float EIGEN_ALIGN16 af[4];
EIGEN_ALIGN16 float af[4];
pstore<float>(af, from);
to[0*stride] = af[0];
to[1*stride] = af[1];
@@ -335,7 +339,7 @@ template<> EIGEN_DEVICE_FUNC inline void pscatter<float, Packet4f>(float* to, co
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<int, Packet4i>(int* to, const Packet4i& from, Index stride)
{
int EIGEN_ALIGN16 ai[4];
EIGEN_ALIGN16 int ai[4];
pstore<int>((int *)ai, from);
to[0*stride] = ai[0];
to[1*stride] = ai[1];
@@ -391,6 +395,7 @@ template<> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i&
template<> EIGEN_STRONG_INLINE Packet4f pmin<Packet4f>(const Packet4f& a, const Packet4f& b)
{
#ifdef __VSX__
// NOTE: about 10% slower than vec_min, but consistent with std::min and SSE regarding NaN
Packet4f ret;
__asm__ ("xvcmpgesp %x0,%x1,%x2\n\txxsel %x0,%x1,%x2,%x0" : "=&wa" (ret) : "wa" (a), "wa" (b));
return ret;
@@ -403,6 +408,7 @@ template<> EIGEN_STRONG_INLINE Packet4i pmin<Packet4i>(const Packet4i& a, const
template<> EIGEN_STRONG_INLINE Packet4f pmax<Packet4f>(const Packet4f& a, const Packet4f& b)
{
#ifdef __VSX__
// NOTE: about 10% slower than vec_max, but consistent with std::max and SSE regarding NaN
Packet4f ret;
__asm__ ("xvcmpgtsp %x0,%x2,%x1\n\txxsel %x0,%x1,%x2,%x0" : "=&wa" (ret) : "wa" (a), "wa" (b));
return ret;
@@ -412,6 +418,15 @@ template<> EIGEN_STRONG_INLINE Packet4f pmax<Packet4f>(const Packet4f& a, const
}
template<> EIGEN_STRONG_INLINE Packet4i pmax<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_max(a, b); }
template<> EIGEN_STRONG_INLINE Packet4f pcmp_le(const Packet4f& a, const Packet4f& b) { return reinterpret_cast<Packet4f>(vec_cmple(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt(const Packet4f& a, const Packet4f& b) { return reinterpret_cast<Packet4f>(vec_cmplt(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4f pcmp_eq(const Packet4f& a, const Packet4f& b) { return reinterpret_cast<Packet4f>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt_or_nan(const Packet4f& a, const Packet4f& b) {
Packet4f c = reinterpret_cast<Packet4f>(vec_cmpge(a,b));
return vec_nor(c,c);
}
template<> EIGEN_STRONG_INLINE Packet4i pcmp_eq(const Packet4i& a, const Packet4i& b) { return reinterpret_cast<Packet4i>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet4f pand<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_and(a, b); }
template<> EIGEN_STRONG_INLINE Packet4i pand<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_and(a, b); }
@@ -424,6 +439,10 @@ template<> EIGEN_STRONG_INLINE Packet4i pxor<Packet4i>(const Packet4i& a, const
template<> EIGEN_STRONG_INLINE Packet4f pandnot<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_and(a, vec_nor(b, b)); }
template<> EIGEN_STRONG_INLINE Packet4i pandnot<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_and(a, vec_nor(b, b)); }
template<> EIGEN_STRONG_INLINE Packet4f pselect(const Packet4f& mask, const Packet4f& a, const Packet4f& b) {
return vec_sel(b, a, reinterpret_cast<Packet4ui>(mask));
}
template<> EIGEN_STRONG_INLINE Packet4f pround<Packet4f>(const Packet4f& a) { return vec_round(a); }
template<> EIGEN_STRONG_INLINE Packet4f pceil<Packet4f>(const Packet4f& a) { return vec_ceil(a); }
template<> EIGEN_STRONG_INLINE Packet4f pfloor<Packet4f>(const Packet4f& a) { return vec_floor(a); }
@@ -452,16 +471,16 @@ template<> EIGEN_STRONG_INLINE Packet4i ploadu<Packet4i>(const int* from)
return (Packet4i) vec_perm(MSQ, LSQ, mask); // align the data
}
#else
// We also need ot redefine little endian loading of Packet4i/Packet4f using VSX
// We also need to redefine little endian loading of Packet4i/Packet4f using VSX
template<> EIGEN_STRONG_INLINE Packet4i ploadu<Packet4i>(const int* from)
{
EIGEN_DEBUG_UNALIGNED_LOAD
return (Packet4i) vec_vsx_ld((long)from & 15, (const int*) _EIGEN_ALIGNED_PTR(from));
return vec_vsx_ld(0, from);
}
template<> EIGEN_STRONG_INLINE Packet4f ploadu<Packet4f>(const float* from)
{
EIGEN_DEBUG_UNALIGNED_LOAD
return (Packet4f) vec_vsx_ld((long)from & 15, (const float*) _EIGEN_ALIGNED_PTR(from));
return vec_vsx_ld(0, from);
}
#endif
@@ -518,24 +537,24 @@ template<> EIGEN_STRONG_INLINE void pstoreu<int>(int* to, const Packet4i& f
vec_st( MSQ, 0, (unsigned char *)to ); // Store the MSQ part
}
#else
// We also need ot redefine little endian loading of Packet4i/Packet4f using VSX
// We also need to redefine little endian loading of Packet4i/Packet4f using VSX
template<> EIGEN_STRONG_INLINE void pstoreu<int>(int* to, const Packet4i& from)
{
EIGEN_DEBUG_ALIGNED_STORE
vec_vsx_st(from, (long)to & 15, (int*) _EIGEN_ALIGNED_PTR(to));
EIGEN_DEBUG_UNALIGNED_STORE
vec_vsx_st(from, 0, to);
}
template<> EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const Packet4f& from)
{
EIGEN_DEBUG_ALIGNED_STORE
vec_vsx_st(from, (long)to & 15, (float*) _EIGEN_ALIGNED_PTR(to));
EIGEN_DEBUG_UNALIGNED_STORE
vec_vsx_st(from, 0, to);
}
#endif
template<> EIGEN_STRONG_INLINE void prefetch<float>(const float* addr) { EIGEN_PPC_PREFETCH(addr); }
template<> EIGEN_STRONG_INLINE void prefetch<int>(const int* addr) { EIGEN_PPC_PREFETCH(addr); }
template<> EIGEN_STRONG_INLINE float pfirst<Packet4f>(const Packet4f& a) { float EIGEN_ALIGN16 x; vec_ste(a, 0, &x); return x; }
template<> EIGEN_STRONG_INLINE int pfirst<Packet4i>(const Packet4i& a) { int EIGEN_ALIGN16 x; vec_ste(a, 0, &x); return x; }
template<> EIGEN_STRONG_INLINE float pfirst<Packet4f>(const Packet4f& a) { EIGEN_ALIGN16 float x; vec_ste(a, 0, &x); return x; }
template<> EIGEN_STRONG_INLINE int pfirst<Packet4i>(const Packet4i& a) { EIGEN_ALIGN16 int x; vec_ste(a, 0, &x); return x; }
template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a)
{
@@ -548,6 +567,19 @@ template<> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a)
template<> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a) { return vec_abs(a); }
template<> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a) { return vec_abs(a); }
template<int N> EIGEN_STRONG_INLINE Packet4i pshiftright(Packet4i a)
{ return vec_sr(a,reinterpret_cast<Packet4ui>(pset1<Packet4i>(N))); }
template<int N> EIGEN_STRONG_INLINE Packet4i pshiftleft(Packet4i a)
{ return vec_sl(a,reinterpret_cast<Packet4ui>(pset1<Packet4i>(N))); }
template<> EIGEN_STRONG_INLINE Packet4f pfrexp<Packet4f>(const Packet4f& a, Packet4f& exponent) {
return pfrexp_float(a,exponent);
}
template<> EIGEN_STRONG_INLINE Packet4f pldexp<Packet4f>(const Packet4f& a, const Packet4f& exponent) {
return pldexp_float(a,exponent);
}
template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
{
Packet4f b, sum;
@@ -676,6 +708,11 @@ template<> EIGEN_STRONG_INLINE int predux_max<Packet4i>(const Packet4i& a)
return pfirst(res);
}
template<> EIGEN_STRONG_INLINE bool predux_any(const Packet4f& x)
{
return vec_any_ne(x, pzero(x));
}
template<int Offset>
struct palign_impl<Offset,Packet4f>
{
@@ -769,6 +806,43 @@ template<> EIGEN_STRONG_INLINE Packet4f pblend(const Selector<4>& ifPacket, cons
}
template <>
struct type_casting_traits<float, int> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template <>
struct type_casting_traits<int, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
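// pcast converts values (vec_cts truncates float to int toward zero, vec_ctf
// converts int to float), while preinterpret below reuses the same bits under a
// different packet type without any conversion.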
template<> EIGEN_STRONG_INLINE Packet4i pcast<Packet4f, Packet4i>(const Packet4f& a) {
return vec_cts(a,0);
}
template<> EIGEN_STRONG_INLINE Packet4f pcast<Packet4i, Packet4f>(const Packet4i& a) {
return vec_ctf(a,0);
}
template<> EIGEN_STRONG_INLINE Packet4i preinterpret<Packet4i,Packet4f>(const Packet4f& a) {
return reinterpret_cast<Packet4i>(a);
}
template<> EIGEN_STRONG_INLINE Packet4f preinterpret<Packet4f,Packet4i>(const Packet4i& a) {
return reinterpret_cast<Packet4f>(a);
}
//---------- double ----------
#ifdef __VSX__
typedef __vector double Packet2d;
@@ -835,7 +909,7 @@ template<> struct packet_traits<double> : default_packet_traits
};
};
template<> struct unpacket_traits<Packet2d> { typedef double type; enum {size=2, alignment=Aligned16}; typedef Packet2d half; };
template<> struct unpacket_traits<Packet2d> { typedef double type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2d half; };
inline std::ostream & operator <<(std::ostream & s, const Packet2l & v)
{
@@ -863,21 +937,13 @@ inline std::ostream & operator <<(std::ostream & s, const Packet2d & v)
template<> EIGEN_STRONG_INLINE Packet2d pload<Packet2d>(const double* from)
{
EIGEN_DEBUG_ALIGNED_LOAD
#ifdef __VSX__
return vec_vsx_ld(0, from);
#else
return vec_ld(0, from);
#endif
return vec_xl(0, const_cast<double *>(from)); // cast needed by Clang
}
template<> EIGEN_STRONG_INLINE void pstore<double>(double* to, const Packet2d& from)
{
EIGEN_DEBUG_ALIGNED_STORE
#ifdef __VSX__
vec_vsx_st(from, 0, to);
#else
vec_st(from, 0, to);
#endif
vec_xst(from, 0, to);
}
template<> EIGEN_STRONG_INLINE Packet2d pset1<Packet2d>(const double& from) {
@@ -899,14 +965,14 @@ pbroadcast4<Packet2d>(const double *a,
template<> EIGEN_DEVICE_FUNC inline Packet2d pgather<double, Packet2d>(const double* from, Index stride)
{
double EIGEN_ALIGN16 af[2];
EIGEN_ALIGN16 double af[2];
af[0] = from[0*stride];
af[1] = from[1*stride];
return pload<Packet2d>(af);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<double, Packet2d>(double* to, const Packet2d& from, Index stride)
{
double EIGEN_ALIGN16 af[2];
EIGEN_ALIGN16 double af[2];
pstore<double>(af, from);
to[0*stride] = af[0];
to[1*stride] = af[1];
@@ -930,6 +996,7 @@ template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d&
template<> EIGEN_STRONG_INLINE Packet2d pmin<Packet2d>(const Packet2d& a, const Packet2d& b)
{
// NOTE: about 10% slower than vec_min, but consistent with std::min and SSE regarding NaN
Packet2d ret;
__asm__ ("xvcmpgedp %x0,%x1,%x2\n\txxsel %x0,%x1,%x2,%x0" : "=&wa" (ret) : "wa" (a), "wa" (b));
return ret;
@@ -937,11 +1004,20 @@ template<> EIGEN_STRONG_INLINE Packet2d pmin<Packet2d>(const Packet2d& a, const
template<> EIGEN_STRONG_INLINE Packet2d pmax<Packet2d>(const Packet2d& a, const Packet2d& b)
{
// NOTE: about 10% slower than vec_max, but consistent with std::max and SSE regarding NaN
Packet2d ret;
__asm__ ("xvcmpgtdp %x0,%x2,%x1\n\txxsel %x0,%x1,%x2,%x0" : "=&wa" (ret) : "wa" (a), "wa" (b));
return ret;
}
template<> EIGEN_STRONG_INLINE Packet2d pcmp_le(const Packet2d& a, const Packet2d& b) { return reinterpret_cast<Packet2d>(vec_cmple(a,b)); }
template<> EIGEN_STRONG_INLINE Packet2d pcmp_lt(const Packet2d& a, const Packet2d& b) { return reinterpret_cast<Packet2d>(vec_cmplt(a,b)); }
template<> EIGEN_STRONG_INLINE Packet2d pcmp_eq(const Packet2d& a, const Packet2d& b) { return reinterpret_cast<Packet2d>(vec_cmpeq(a,b)); }
template<> EIGEN_STRONG_INLINE Packet2d pcmp_lt_or_nan(const Packet2d& a, const Packet2d& b) {
Packet2d c = reinterpret_cast<Packet2d>(vec_cmpge(a,b));
return vec_nor(c,c);
}
template<> EIGEN_STRONG_INLINE Packet2d pand<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_and(a, b); }
template<> EIGEN_STRONG_INLINE Packet2d por<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_or(a, b); }
@@ -956,8 +1032,8 @@ template<> EIGEN_STRONG_INLINE Packet2d pfloor<Packet2d>(const Packet2d& a) { re
template<> EIGEN_STRONG_INLINE Packet2d ploadu<Packet2d>(const double* from)
{
EIGEN_DEBUG_ALIGNED_LOAD
return (Packet2d) vec_vsx_ld((long)from & 15, (const double*) _EIGEN_ALIGNED_PTR(from));
EIGEN_DEBUG_UNALIGNED_LOAD
return vec_vsx_ld(0, from);
}
template<> EIGEN_STRONG_INLINE Packet2d ploaddup<Packet2d>(const double* from)
@@ -970,13 +1046,13 @@ template<> EIGEN_STRONG_INLINE Packet2d ploaddup<Packet2d>(const double* from)
template<> EIGEN_STRONG_INLINE void pstoreu<double>(double* to, const Packet2d& from)
{
EIGEN_DEBUG_ALIGNED_STORE
vec_vsx_st((Packet4f)from, (long)to & 15, (float*) _EIGEN_ALIGNED_PTR(to));
EIGEN_DEBUG_UNALIGNED_STORE
vec_vsx_st(from, 0, to);
}
template<> EIGEN_STRONG_INLINE void prefetch<double>(const double* addr) { EIGEN_PPC_PREFETCH(addr); }
template<> EIGEN_STRONG_INLINE double pfirst<Packet2d>(const Packet2d& a) { double EIGEN_ALIGN16 x[2]; pstore<double>(x, a); return x[0]; }
template<> EIGEN_STRONG_INLINE double pfirst<Packet2d>(const Packet2d& a) { EIGEN_ALIGN16 double x[2]; pstore<double>(x, a); return x[0]; }
template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a)
{
@@ -984,6 +1060,59 @@ template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a)
}
template<> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a) { return vec_abs(a); }
// VSX support varies between different compilers and even different
// versions of the same compiler. For gcc version >= 4.9.3, we can use
// vec_cts to efficiently convert Packet2d to Packet2l. Otherwise, use
// a slow version that works with older compilers.
// Update: apparently vec_cts/vec_ctf intrinsics for 64-bit doubles
// are buggy, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70963
static inline Packet2l ConvertToPacket2l(const Packet2d& x) {
#if EIGEN_GNUC_AT_LEAST(5, 4) || \
(EIGEN_GNUC_AT(6, 1) && __GNUC_PATCHLEVEL__ >= 1)
return vec_cts(x, 0); // TODO: check clang version.
#else
double tmp[2];
memcpy(tmp, &x, sizeof(tmp));
Packet2l l = { static_cast<long long>(tmp[0]),
static_cast<long long>(tmp[1]) };
return l;
#endif
}
template<> EIGEN_STRONG_INLINE Packet2d pldexp<Packet2d>(const Packet2d& a, const Packet2d& exponent) {
// build 2^n
Packet2l emm0 = ConvertToPacket2l(exponent);
#ifdef __POWER8_VECTOR__
const Packet2l p2l_1023 = { 1023, 1023 };
const Packet2ul p2ul_52 = { 52, 52 };
emm0 = vec_add(emm0, p2l_1023);
emm0 = vec_sl(emm0, p2ul_52);
#else
// Code is a bit complex for POWER7. There is actually a
// vec_xxsldi intrinsic but it is not supported by some gcc versions.
// So we shift (52-32) bits and do a word swap with zeros.
const Packet4i p4i_1023 = pset1<Packet4i>(1023);
const Packet4i p4i_20 = pset1<Packet4i>(20); // 52 - 32
Packet4i emm04i = reinterpret_cast<Packet4i>(emm0);
emm04i = vec_add(emm04i, p4i_1023);
emm04i = vec_sl(emm04i, reinterpret_cast<Packet4ui>(p4i_20));
static const Packet16uc perm = {
0x14, 0x15, 0x16, 0x17, 0x00, 0x01, 0x02, 0x03,
0x1c, 0x1d, 0x1e, 0x1f, 0x08, 0x09, 0x0a, 0x0b };
#ifdef _BIG_ENDIAN
emm0 = reinterpret_cast<Packet2l>(vec_perm(p4i_ZERO, emm04i, perm));
#else
emm0 = reinterpret_cast<Packet2l>(vec_perm(emm04i, p4i_ZERO, perm));
#endif
#endif
return pmul(a, reinterpret_cast<Packet2d>(emm0));
}
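// For reference, a scalar sketch (hypothetical helper, not Eigen code) of the
// 2^n construction performed above: place the biased exponent (n + 1023) into
// bits 52..62 of an IEEE-754 double and reinterpret the bits. Multiplying a by
// this value implements ldexp(a, n) for n in the normal exponent range.
#include <cstdint>
#include <cstring>
inline double two_to_the_n_sketch(long long n) {
  const std::uint64_t bits = static_cast<std::uint64_t>(n + 1023) << 52;
  double result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}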
template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a)
{
Packet2d b, sum;


@@ -16,7 +16,7 @@ namespace Eigen {
namespace internal {
#if defined(__CUDACC__) && defined(EIGEN_USE_GPU)
#if defined(EIGEN_CUDACC) && defined(EIGEN_USE_GPU)
// Many std::complex methods such as operator+, operator-, operator* and
// operator/ are not constexpr. Due to this, clang does not treat them as device
@@ -55,7 +55,7 @@ template<typename T> struct scalar_difference_op<std::complex<T>, std::complex<T
// Product
template<typename T> struct scalar_product_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
enum {
Vectorizable = packet_traits<std::complex<T>>::HasMul
Vectorizable = packet_traits<std::complex<T> >::HasMul
};
typedef typename std::complex<T> result_type;
@@ -76,7 +76,7 @@ template<typename T> struct scalar_product_op<std::complex<T>, std::complex<T> >
// Quotient
template<typename T> struct scalar_quotient_op<const std::complex<T>, const std::complex<T> > : binary_op_base<const std::complex<T>, const std::complex<T> > {
enum {
Vectorizable = packet_traits<std::complex<T>>::HasDiv
Vectorizable = packet_traits<std::complex<T> >::HasDiv
};
typedef typename std::complex<T> result_type;


@@ -1,333 +0,0 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_PACKET_MATH_CUDA_H
#define EIGEN_PACKET_MATH_CUDA_H
namespace Eigen {
namespace internal {
// Make sure this is only available when targeting a GPU: we don't want to
// introduce conflicts between these packet_traits definitions and the ones
// we'll use on the host side (SSE, AVX, ...)
#if defined(__CUDACC__) && defined(EIGEN_USE_GPU)
template<> struct is_arithmetic<float4> { enum { value = true }; };
template<> struct is_arithmetic<double2> { enum { value = true }; };
template<> struct packet_traits<float> : default_packet_traits
{
typedef float4 type;
typedef float4 half;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
size=4,
HasHalfPacket = 0,
HasDiv = 1,
HasSin = 0,
HasCos = 0,
HasLog = 1,
HasExp = 1,
HasSqrt = 1,
HasRsqrt = 1,
HasLGamma = 1,
HasDiGamma = 1,
HasZeta = 1,
HasPolygamma = 1,
HasErf = 1,
HasErfc = 1,
HasIGamma = 1,
HasIGammac = 1,
HasBetaInc = 1,
HasBlend = 0,
};
};
template<> struct packet_traits<double> : default_packet_traits
{
typedef double2 type;
typedef double2 half;
enum {
Vectorizable = 1,
AlignedOnScalar = 1,
size=2,
HasHalfPacket = 0,
HasDiv = 1,
HasLog = 1,
HasExp = 1,
HasSqrt = 1,
HasRsqrt = 1,
HasLGamma = 1,
HasDiGamma = 1,
HasZeta = 1,
HasPolygamma = 1,
HasErf = 1,
HasErfc = 1,
HasIGamma = 1,
HasIGammac = 1,
HasBetaInc = 1,
HasBlend = 0,
};
};
template<> struct unpacket_traits<float4> { typedef float type; enum {size=4, alignment=Aligned16}; typedef float4 half; };
template<> struct unpacket_traits<double2> { typedef double type; enum {size=2, alignment=Aligned16}; typedef double2 half; };
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pset1<float4>(const float& from) {
return make_float4(from, from, from, from);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pset1<double2>(const double& from) {
return make_double2(from, from);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 plset<float4>(const float& a) {
return make_float4(a, a+1, a+2, a+3);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 plset<double2>(const double& a) {
return make_double2(a, a+1);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 padd<float4>(const float4& a, const float4& b) {
return make_float4(a.x+b.x, a.y+b.y, a.z+b.z, a.w+b.w);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 padd<double2>(const double2& a, const double2& b) {
return make_double2(a.x+b.x, a.y+b.y);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 psub<float4>(const float4& a, const float4& b) {
return make_float4(a.x-b.x, a.y-b.y, a.z-b.z, a.w-b.w);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 psub<double2>(const double2& a, const double2& b) {
return make_double2(a.x-b.x, a.y-b.y);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pnegate(const float4& a) {
return make_float4(-a.x, -a.y, -a.z, -a.w);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pnegate(const double2& a) {
return make_double2(-a.x, -a.y);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pconj(const float4& a) { return a; }
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pconj(const double2& a) { return a; }
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmul<float4>(const float4& a, const float4& b) {
return make_float4(a.x*b.x, a.y*b.y, a.z*b.z, a.w*b.w);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmul<double2>(const double2& a, const double2& b) {
return make_double2(a.x*b.x, a.y*b.y);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pdiv<float4>(const float4& a, const float4& b) {
return make_float4(a.x/b.x, a.y/b.y, a.z/b.z, a.w/b.w);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pdiv<double2>(const double2& a, const double2& b) {
return make_double2(a.x/b.x, a.y/b.y);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmin<float4>(const float4& a, const float4& b) {
return make_float4(fminf(a.x, b.x), fminf(a.y, b.y), fminf(a.z, b.z), fminf(a.w, b.w));
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmin<double2>(const double2& a, const double2& b) {
return make_double2(fmin(a.x, b.x), fmin(a.y, b.y));
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmax<float4>(const float4& a, const float4& b) {
return make_float4(fmaxf(a.x, b.x), fmaxf(a.y, b.y), fmaxf(a.z, b.z), fmaxf(a.w, b.w));
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmax<double2>(const double2& a, const double2& b) {
return make_double2(fmax(a.x, b.x), fmax(a.y, b.y));
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pload<float4>(const float* from) {
return *reinterpret_cast<const float4*>(from);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pload<double2>(const double* from) {
return *reinterpret_cast<const double2*>(from);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 ploadu<float4>(const float* from) {
return make_float4(from[0], from[1], from[2], from[3]);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 ploadu<double2>(const double* from) {
return make_double2(from[0], from[1]);
}
template<> EIGEN_STRONG_INLINE float4 ploaddup<float4>(const float* from) {
return make_float4(from[0], from[0], from[1], from[1]);
}
template<> EIGEN_STRONG_INLINE double2 ploaddup<double2>(const double* from) {
return make_double2(from[0], from[0]);
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstore<float>(float* to, const float4& from) {
*reinterpret_cast<float4*>(to) = from;
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstore<double>(double* to, const double2& from) {
*reinterpret_cast<double2*>(to) = from;
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const float4& from) {
to[0] = from.x;
to[1] = from.y;
to[2] = from.z;
to[3] = from.w;
}
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstoreu<double>(double* to, const double2& from) {
to[0] = from.x;
to[1] = from.y;
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float4 ploadt_ro<float4, Aligned>(const float* from) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
return __ldg((const float4*)from);
#else
return make_float4(from[0], from[1], from[2], from[3]);
#endif
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double2 ploadt_ro<double2, Aligned>(const double* from) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
return __ldg((const double2*)from);
#else
return make_double2(from[0], from[1]);
#endif
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float4 ploadt_ro<float4, Unaligned>(const float* from) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
return make_float4(__ldg(from+0), __ldg(from+1), __ldg(from+2), __ldg(from+3));
#else
return make_float4(from[0], from[1], from[2], from[3]);
#endif
}
template<>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double2 ploadt_ro<double2, Unaligned>(const double* from) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
return make_double2(__ldg(from+0), __ldg(from+1));
#else
return make_double2(from[0], from[1]);
#endif
}
template<> EIGEN_DEVICE_FUNC inline float4 pgather<float, float4>(const float* from, Index stride) {
return make_float4(from[0*stride], from[1*stride], from[2*stride], from[3*stride]);
}
template<> EIGEN_DEVICE_FUNC inline double2 pgather<double, double2>(const double* from, Index stride) {
return make_double2(from[0*stride], from[1*stride]);
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<float, float4>(float* to, const float4& from, Index stride) {
to[stride*0] = from.x;
to[stride*1] = from.y;
to[stride*2] = from.z;
to[stride*3] = from.w;
}
template<> EIGEN_DEVICE_FUNC inline void pscatter<double, double2>(double* to, const double2& from, Index stride) {
to[stride*0] = from.x;
to[stride*1] = from.y;
}
template<> EIGEN_DEVICE_FUNC inline float pfirst<float4>(const float4& a) {
return a.x;
}
template<> EIGEN_DEVICE_FUNC inline double pfirst<double2>(const double2& a) {
return a.x;
}
template<> EIGEN_DEVICE_FUNC inline float predux<float4>(const float4& a) {
return a.x + a.y + a.z + a.w;
}
template<> EIGEN_DEVICE_FUNC inline double predux<double2>(const double2& a) {
return a.x + a.y;
}
template<> EIGEN_DEVICE_FUNC inline float predux_max<float4>(const float4& a) {
return fmaxf(fmaxf(a.x, a.y), fmaxf(a.z, a.w));
}
template<> EIGEN_DEVICE_FUNC inline double predux_max<double2>(const double2& a) {
return fmax(a.x, a.y);
}
template<> EIGEN_DEVICE_FUNC inline float predux_min<float4>(const float4& a) {
return fminf(fminf(a.x, a.y), fminf(a.z, a.w));
}
template<> EIGEN_DEVICE_FUNC inline double predux_min<double2>(const double2& a) {
return fmin(a.x, a.y);
}
template<> EIGEN_DEVICE_FUNC inline float predux_mul<float4>(const float4& a) {
return a.x * a.y * a.z * a.w;
}
template<> EIGEN_DEVICE_FUNC inline double predux_mul<double2>(const double2& a) {
return a.x * a.y;
}
template<> EIGEN_DEVICE_FUNC inline float4 pabs<float4>(const float4& a) {
return make_float4(fabsf(a.x), fabsf(a.y), fabsf(a.z), fabsf(a.w));
}
template<> EIGEN_DEVICE_FUNC inline double2 pabs<double2>(const double2& a) {
return make_double2(fabs(a.x), fabs(a.y));
}
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<float4,4>& kernel) {
float tmp = kernel.packet[0].y;
kernel.packet[0].y = kernel.packet[1].x;
kernel.packet[1].x = tmp;
tmp = kernel.packet[0].z;
kernel.packet[0].z = kernel.packet[2].x;
kernel.packet[2].x = tmp;
tmp = kernel.packet[0].w;
kernel.packet[0].w = kernel.packet[3].x;
kernel.packet[3].x = tmp;
tmp = kernel.packet[1].z;
kernel.packet[1].z = kernel.packet[2].y;
kernel.packet[2].y = tmp;
tmp = kernel.packet[1].w;
kernel.packet[1].w = kernel.packet[3].y;
kernel.packet[3].y = tmp;
tmp = kernel.packet[2].w;
kernel.packet[2].w = kernel.packet[3].z;
kernel.packet[3].z = tmp;
}
EIGEN_DEVICE_FUNC inline void
ptranspose(PacketBlock<double2,2>& kernel) {
double tmp = kernel.packet[0].y;
kernel.packet[0].y = kernel.packet[1].x;
kernel.packet[1].x = tmp;
}
#endif
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_PACKET_MATH_CUDA_H

File diff suppressed because it is too large

View File

@@ -1,212 +0,0 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2016 Benoit Steiner <benoit.steiner.goog@gmail.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_TYPE_CASTING_CUDA_H
#define EIGEN_TYPE_CASTING_CUDA_H
namespace Eigen {
namespace internal {
template<>
struct scalar_cast_op<float, Eigen::half> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef Eigen::half result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half operator() (const float& a) const {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 300
return __float2half(a);
#else
return Eigen::half(a);
#endif
}
};
template<>
struct functor_traits<scalar_cast_op<float, Eigen::half> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<int, Eigen::half> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef Eigen::half result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half operator() (const int& a) const {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 300
return __float2half(static_cast<float>(a));
#else
return Eigen::half(static_cast<float>(a));
#endif
}
};
template<>
struct functor_traits<scalar_cast_op<int, Eigen::half> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<Eigen::half, float> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef float result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator() (const Eigen::half& a) const {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 300
return __half2float(a);
#else
return static_cast<float>(a);
#endif
}
};
template<>
struct functor_traits<scalar_cast_op<Eigen::half, float> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
#if defined(EIGEN_HAS_CUDA_FP16) && defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 300
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 2,
TgtCoeffRatio = 1
};
};
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcast<half2, float4>(const half2& a, const half2& b) {
float2 r1 = __half22float2(a);
float2 r2 = __half22float2(b);
return make_float4(r1.x, r1.y, r2.x, r2.y);
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 2
};
};
template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pcast<float4, half2>(const float4& a) {
// Simply discard the second half of the input
return __floats2half2_rn(a.x, a.y);
}
#elif defined EIGEN_VECTORIZE_AVX512
template <>
struct type_casting_traits<half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16f pcast<Packet16h, Packet16f>(const Packet16h& a) {
return half2float(a);
}
template <>
struct type_casting_traits<float, half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet16h pcast<Packet16f, Packet16h>(const Packet16f& a) {
return float2half(a);
}
#elif defined EIGEN_VECTORIZE_AVX
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8h, Packet8f>(const Packet8h& a) {
return half2float(a);
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet8h pcast<Packet8f, Packet8h>(const Packet8f& a) {
return float2half(a);
}
// Disable the following code since it's broken on too many platforms / compilers.
//#elif defined(EIGEN_VECTORIZE_SSE) && (!EIGEN_ARCH_x86_64) && (!EIGEN_COMP_MSVC)
#elif 0
template <>
struct type_casting_traits<Eigen::half, float> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet4f pcast<Packet4h, Packet4f>(const Packet4h& a) {
__int64_t a64 = _mm_cvtm64_si64(a.x);
Eigen::half h = raw_uint16_to_half(static_cast<unsigned short>(a64));
float f1 = static_cast<float>(h);
h = raw_uint16_to_half(static_cast<unsigned short>(a64 >> 16));
float f2 = static_cast<float>(h);
h = raw_uint16_to_half(static_cast<unsigned short>(a64 >> 32));
float f3 = static_cast<float>(h);
h = raw_uint16_to_half(static_cast<unsigned short>(a64 >> 48));
float f4 = static_cast<float>(h);
return _mm_set_ps(f4, f3, f2, f1);
}
template <>
struct type_casting_traits<float, Eigen::half> {
enum {
VectorizedCast = 1,
SrcCoeffRatio = 1,
TgtCoeffRatio = 1
};
};
template<> EIGEN_STRONG_INLINE Packet4h pcast<Packet4f, Packet4h>(const Packet4f& a) {
EIGEN_ALIGN16 float aux[4];
pstore(aux, a);
Eigen::half h0(aux[0]);
Eigen::half h1(aux[1]);
Eigen::half h2(aux[2]);
Eigen::half h3(aux[3]);
Packet4h result;
result.x = _mm_set_pi16(h3.x, h2.x, h1.x, h0.x);
return result;
}
#endif
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_TYPE_CASTING_CUDA_H

View File

@@ -0,0 +1,655 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2007 Julien Pommier
// Copyright (C) 2014 Pedro Gonnet (pedro.gonnet@gmail.com)
// Copyright (C) 2009-2019 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
/* The exp and log functions of this file initially come from
* Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/
*/
#ifndef EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_H
#define EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_H
namespace Eigen {
namespace internal {
template<typename Packet> EIGEN_STRONG_INLINE Packet
pfrexp_float(const Packet& a, Packet& exponent) {
typedef typename unpacket_traits<Packet>::integer_packet PacketI;
const Packet cst_126f = pset1<Packet>(126.0f);
const Packet cst_half = pset1<Packet>(0.5f);
const Packet cst_inv_mant_mask = pset1frombits<Packet>(~0x7f800000u);
exponent = psub(pcast<PacketI,Packet>(pshiftright<23>(preinterpret<PacketI>(a))), cst_126f);
return por(pand(a, cst_inv_mant_mask), cst_half);
}
template<typename Packet> EIGEN_STRONG_INLINE Packet
pldexp_float(Packet a, Packet exponent)
{
typedef typename unpacket_traits<Packet>::integer_packet PacketI;
const Packet cst_127 = pset1<Packet>(127.f);
// return a * 2^exponent
PacketI ei = pcast<Packet,PacketI>(padd(exponent, cst_127));
return pmul(a, preinterpret<Packet>(pshiftleft<23>(ei)));
}
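// [Editorial sketch, not part of this diff] Scalar versions of the frexp/ldexp
// bit tricks above, assuming IEEE-754 binary32 and a non-negative, normalized
// input; the helper names are illustrative only. Subtracting 126 (not 127)
// maps the significand into [0.5, 1), matching pfrexp_float.
#include <cstdint>
#include <cstring>
#include <cassert>

static float scalar_frexp(float a, float& exponent) {
  uint32_t bits;
  std::memcpy(&bits, &a, sizeof(bits));
  exponent = static_cast<float>(static_cast<int>(bits >> 23)) - 126.0f;
  // Clear the exponent field and replace it with that of 0.5f (0x3f000000).
  bits = (bits & ~0x7f800000u) | 0x3f000000u;
  float m;
  std::memcpy(&m, &bits, sizeof(m));
  return m;
}

static float scalar_ldexp(float a, float exponent) {
  // Build 2^exponent by writing (exponent + 127) into the exponent field.
  uint32_t ebits = static_cast<uint32_t>(static_cast<int>(exponent) + 127) << 23;
  float p;
  std::memcpy(&p, &ebits, sizeof(p));
  return a * p;
}

int main() {
  float e;
  float m = scalar_frexp(24.0f, e);      // 24 = 0.75 * 2^5
  assert(m == 0.75f && e == 5.0f);
  assert(scalar_ldexp(m, e) == 24.0f);
  return 0;
}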
// Natural logarithm
// Computes log(x) as log(2^e * m) = C*e + log(m), where the constant C =log(2)
// and m is in the range [sqrt(1/2),sqrt(2)). In this range, the logarithm can
// be easily approximated by a polynomial centered on m=1 for stability.
// TODO(gonnet): Further reduce the interval allowing for lower-degree
// polynomial interpolants -> ... -> profit!
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog_float(const Packet _x)
{
Packet x = _x;
const Packet cst_1 = pset1<Packet>(1.0f);
const Packet cst_half = pset1<Packet>(0.5f);
// The smallest positive normalized float number.
const Packet cst_min_norm_pos = pset1frombits<Packet>( 0x00800000u);
const Packet cst_minus_inf = pset1frombits<Packet>( 0xff800000u);
const Packet cst_pos_inf = pset1frombits<Packet>( 0x7f800000u);
// Polynomial coefficients.
const Packet cst_cephes_SQRTHF = pset1<Packet>(0.707106781186547524f);
const Packet cst_cephes_log_p0 = pset1<Packet>(7.0376836292E-2f);
const Packet cst_cephes_log_p1 = pset1<Packet>(-1.1514610310E-1f);
const Packet cst_cephes_log_p2 = pset1<Packet>(1.1676998740E-1f);
const Packet cst_cephes_log_p3 = pset1<Packet>(-1.2420140846E-1f);
const Packet cst_cephes_log_p4 = pset1<Packet>(+1.4249322787E-1f);
const Packet cst_cephes_log_p5 = pset1<Packet>(-1.6668057665E-1f);
const Packet cst_cephes_log_p6 = pset1<Packet>(+2.0000714765E-1f);
const Packet cst_cephes_log_p7 = pset1<Packet>(-2.4999993993E-1f);
const Packet cst_cephes_log_p8 = pset1<Packet>(+3.3333331174E-1f);
const Packet cst_cephes_log_q1 = pset1<Packet>(-2.12194440e-4f);
const Packet cst_cephes_log_q2 = pset1<Packet>(0.693359375f);
// Truncate input values to the minimum positive normal.
x = pmax(x, cst_min_norm_pos);
Packet e;
// extract the significand in the range [0.5,1) and the exponent
x = pfrexp(x,e);
// part2: Shift the inputs from the range [0.5,1) to [sqrt(1/2),sqrt(2))
// and shift by -1. The values are then centered around 0, which improves
// the stability of the polynomial evaluation.
// if( x < SQRTHF ) {
// e -= 1;
// x = x + x - 1.0;
// } else { x = x - 1.0; }
Packet mask = pcmp_lt(x, cst_cephes_SQRTHF);
Packet tmp = pand(x, mask);
x = psub(x, cst_1);
e = psub(e, pand(cst_1, mask));
x = padd(x, tmp);
Packet x2 = pmul(x, x);
Packet x3 = pmul(x2, x);
// Evaluate the polynomial approximant of degree 8 in three parts, probably
// to improve instruction-level parallelism.
Packet y, y1, y2;
y = pmadd(cst_cephes_log_p0, x, cst_cephes_log_p1);
y1 = pmadd(cst_cephes_log_p3, x, cst_cephes_log_p4);
y2 = pmadd(cst_cephes_log_p6, x, cst_cephes_log_p7);
y = pmadd(y, x, cst_cephes_log_p2);
y1 = pmadd(y1, x, cst_cephes_log_p5);
y2 = pmadd(y2, x, cst_cephes_log_p8);
y = pmadd(y, x3, y1);
y = pmadd(y, x3, y2);
y = pmul(y, x3);
// Add the logarithm of the exponent back to the result of the interpolation.
y1 = pmul(e, cst_cephes_log_q1);
tmp = pmul(x2, cst_half);
y = padd(y, y1);
x = psub(x, tmp);
y2 = pmul(e, cst_cephes_log_q2);
x = padd(x, y);
x = padd(x, y2);
Packet invalid_mask = pcmp_lt_or_nan(_x, pzero(_x));
Packet iszero_mask = pcmp_eq(_x,pzero(_x));
Packet pos_inf_mask = pcmp_eq(_x,cst_pos_inf);
// Filter out invalid inputs, i.e.:
// - negative arg will be NAN
// - 0 will be -INF
// - +INF will be +INF
return pselect(iszero_mask, cst_minus_inf,
por(pselect(pos_inf_mask,cst_pos_inf,x), invalid_mask));
}
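// [Editorial sketch, not part of this diff] The range reduction behind
// plog_float on a single value, with std::frexp in place of pfrexp and
// std::log standing in for the degree-8 polynomial, to show that the
// decomposition log(x) = e*log(2) + log(m) is exact.
#include <cmath>
#include <cstdio>

int main() {
  float x = 1000.0f;
  int e;
  float m = std::frexp(x, &e);              // x = m * 2^e, m in [0.5, 1)
  if (m < 0.707106781f) { m += m; --e; }    // shift m into [sqrt(1/2), sqrt(2))
  float log_x = static_cast<float>(e) * 0.693147181f + std::log(m);
  std::printf("%f vs %f\n", log_x, std::log(x)); // both ~6.907755
  return 0;
}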
/** \internal \returns log(1 + x) computed using W. Kahan's formula.
See: http://www.plunk.org/~hatch/rightway.php
*/
template<typename Packet>
Packet generic_plog1p(const Packet& x)
{
typedef typename unpacket_traits<Packet>::type ScalarType;
const Packet one = pset1<Packet>(ScalarType(1));
Packet xp1 = padd(x, one);
Packet small_mask = pcmp_eq(xp1, one);
Packet log1 = plog(xp1);
Packet inf_mask = pcmp_eq(xp1, log1);
Packet log_large = pmul(x, pdiv(log1, psub(xp1, one)));
return pselect(por(small_mask, inf_mask), x, log_large);
}
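// [Editorial sketch, not part of this diff] Kahan's log1p formula from
// generic_plog1p in scalar form; the helper name is illustrative only. When
// 1+x rounds to exactly 1 the input x itself is returned, otherwise the ratio
// log(xp1)/(xp1-1) cancels the rounding error committed when forming xp1.
#include <cmath>
#include <cstdio>

static double kahan_log1p(double x) {
  double xp1 = 1.0 + x;
  if (xp1 == 1.0) return x;               // small_mask branch
  if (std::isinf(xp1)) return x;          // inf_mask branch (x itself is +inf here)
  return x * std::log(xp1) / (xp1 - 1.0); // log_large branch
}

int main() {
  std::printf("%.17g\n", kahan_log1p(1e-20)); // 1e-20, while log(1.0 + 1e-20) evaluates to 0
  std::printf("%.17g\n", kahan_log1p(0.5));   // log(1.5) ~ 0.405465
  return 0;
}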
/** \internal \returns exp(x)-1 computed using W. Kahan's formula.
See: http://www.plunk.org/~hatch/rightway.php
*/
template<typename Packet>
Packet generic_expm1(const Packet& x)
{
typedef typename unpacket_traits<Packet>::type ScalarType;
const Packet one = pset1<Packet>(ScalarType(1));
const Packet neg_one = pset1<Packet>(ScalarType(-1));
Packet u = pexp(x);
Packet one_mask = pcmp_eq(u, one);
Packet u_minus_one = psub(u, one);
Packet neg_one_mask = pcmp_eq(u_minus_one, neg_one);
Packet logu = plog(u);
// The following comparison is to catch the case where
// exp(x) = +inf. It is written in this way to avoid having
// to form the constant +inf, which depends on the packet
// type.
Packet pos_inf_mask = pcmp_eq(logu, u);
Packet expm1 = pmul(u_minus_one, pdiv(x, logu));
expm1 = pselect(pos_inf_mask, u, expm1);
return pselect(one_mask,
x,
pselect(neg_one_mask,
neg_one,
expm1));
}
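// [Editorial sketch, not part of this diff] Scalar form of the expm1 formula
// above; the helper name is illustrative only. The masks of generic_expm1
// become plain branches here.
#include <cmath>
#include <cstdio>

static double kahan_expm1(double x) {
  double u = std::exp(x);
  if (u == 1.0) return x;           // one_mask: exp(x) rounded to exactly 1
  double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;     // neg_one_mask: x -> -inf
  double logu = std::log(u);
  if (logu == u) return u;          // pos_inf_mask: exp(x) overflowed to +inf
  // The rounding error in u affects um1 and log(u) in the same way, so the
  // ratio um1/log(u) stays accurate and multiplying by x recovers expm1(x).
  return um1 * x / logu;
}

int main() {
  std::printf("%.17g\n", kahan_expm1(1e-12));    // ~1.0000000000005e-12
  std::printf("%.17g\n", std::exp(1e-12) - 1.0); // the naive form loses digits
  return 0;
}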
// Exponential function. Works by writing "x = m*log(2) + r" where
// "m = floor(x/log(2)+1/2)" and "r" is the remainder. The result is then
// "exp(x) = 2^m*exp(r)" where exp(r) is in the range [-1,1).
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet pexp_float(const Packet _x)
{
const Packet cst_1 = pset1<Packet>(1.0f);
const Packet cst_half = pset1<Packet>(0.5f);
const Packet cst_exp_hi = pset1<Packet>( 88.3762626647950f);
const Packet cst_exp_lo = pset1<Packet>(-88.3762626647949f);
const Packet cst_cephes_LOG2EF = pset1<Packet>(1.44269504088896341f);
const Packet cst_cephes_exp_p0 = pset1<Packet>(1.9875691500E-4f);
const Packet cst_cephes_exp_p1 = pset1<Packet>(1.3981999507E-3f);
const Packet cst_cephes_exp_p2 = pset1<Packet>(8.3334519073E-3f);
const Packet cst_cephes_exp_p3 = pset1<Packet>(4.1665795894E-2f);
const Packet cst_cephes_exp_p4 = pset1<Packet>(1.6666665459E-1f);
const Packet cst_cephes_exp_p5 = pset1<Packet>(5.0000001201E-1f);
// Clamp x.
Packet x = pmax(pmin(_x, cst_exp_hi), cst_exp_lo);
// Express exp(x) as exp(m*ln(2) + r), start by extracting
// m = floor(x/ln(2) + 0.5).
Packet m = pfloor(pmadd(x, cst_cephes_LOG2EF, cst_half));
// Get r = x - m*ln(2). If no FMA instructions are available, m*ln(2) is
// subtracted out in two parts, m*C1+m*C2 = m*ln(2), to avoid accumulating
// truncation errors.
Packet r;
#ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
const Packet cst_nln2 = pset1<Packet>(-0.6931471805599453f);
r = pmadd(m, cst_nln2, x);
#else
const Packet cst_cephes_exp_C1 = pset1<Packet>(0.693359375f);
const Packet cst_cephes_exp_C2 = pset1<Packet>(-2.12194440e-4f);
r = psub(x, pmul(m, cst_cephes_exp_C1));
r = psub(r, pmul(m, cst_cephes_exp_C2));
#endif
Packet r2 = pmul(r, r);
// TODO(gonnet): Split into odd/even polynomials and try to exploit
// instruction-level parallelism.
Packet y = cst_cephes_exp_p0;
y = pmadd(y, r, cst_cephes_exp_p1);
y = pmadd(y, r, cst_cephes_exp_p2);
y = pmadd(y, r, cst_cephes_exp_p3);
y = pmadd(y, r, cst_cephes_exp_p4);
y = pmadd(y, r, cst_cephes_exp_p5);
y = pmadd(y, r2, r);
y = padd(y, cst_1);
// Return 2^m * exp(r).
return pmax(pldexp(y,m), _x);
}
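// [Editorial sketch, not part of this diff] The argument reduction used by
// pexp_float for one value, with std::exp standing in for the degree-5
// polynomial so the reduction itself can be checked: exp(x) = 2^m * exp(r).
#include <cmath>
#include <cstdio>

int main() {
  float x = 10.0f;
  float m = std::floor(x * 1.44269504f + 0.5f); // m = floor(x/ln(2) + 1/2)
  // Subtract m*ln(2) in two steps (C1 + C2 = ln(2)) as in the non-FMA path.
  float r = x - m * 0.693359375f;
  r -= m * -2.12194440e-4f;
  float y = std::ldexp(std::exp(r), static_cast<int>(m)); // 2^m * exp(r)
  std::printf("%f vs %f\n", y, std::exp(x)); // both ~22026.47
  return 0;
}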
// make it the default path for scalar float
template<>
EIGEN_DEVICE_FUNC inline float pexp(const float& a) { return pexp_float(a); }
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet pexp_double(const Packet _x)
{
Packet x = _x;
const Packet cst_1 = pset1<Packet>(1.0);
const Packet cst_2 = pset1<Packet>(2.0);
const Packet cst_half = pset1<Packet>(0.5);
const Packet cst_exp_hi = pset1<Packet>(709.437);
const Packet cst_exp_lo = pset1<Packet>(-709.436139303);
const Packet cst_cephes_LOG2EF = pset1<Packet>(1.4426950408889634073599);
const Packet cst_cephes_exp_p0 = pset1<Packet>(1.26177193074810590878e-4);
const Packet cst_cephes_exp_p1 = pset1<Packet>(3.02994407707441961300e-2);
const Packet cst_cephes_exp_p2 = pset1<Packet>(9.99999999999999999910e-1);
const Packet cst_cephes_exp_q0 = pset1<Packet>(3.00198505138664455042e-6);
const Packet cst_cephes_exp_q1 = pset1<Packet>(2.52448340349684104192e-3);
const Packet cst_cephes_exp_q2 = pset1<Packet>(2.27265548208155028766e-1);
const Packet cst_cephes_exp_q3 = pset1<Packet>(2.00000000000000000009e0);
const Packet cst_cephes_exp_C1 = pset1<Packet>(0.693145751953125);
const Packet cst_cephes_exp_C2 = pset1<Packet>(1.42860682030941723212e-6);
Packet tmp, fx;
// clamp x
x = pmax(pmin(x, cst_exp_hi), cst_exp_lo);
// Express exp(x) as exp(g + n*log(2)).
fx = pmadd(cst_cephes_LOG2EF, x, cst_half);
// Take the floor to get the integer "n" described above.
fx = pfloor(fx);
// Get the remainder modulo log(2), i.e. the "g" described above. Subtract
// n*log(2) out in two steps, i.e. n*C1 + n*C2, C1+C2=log2 to get the last
// digits right.
tmp = pmul(fx, cst_cephes_exp_C1);
Packet z = pmul(fx, cst_cephes_exp_C2);
x = psub(x, tmp);
x = psub(x, z);
Packet x2 = pmul(x, x);
// Evaluate the numerator polynomial of the rational interpolant.
Packet px = cst_cephes_exp_p0;
px = pmadd(px, x2, cst_cephes_exp_p1);
px = pmadd(px, x2, cst_cephes_exp_p2);
px = pmul(px, x);
// Evaluate the denominator polynomial of the rational interpolant.
Packet qx = cst_cephes_exp_q0;
qx = pmadd(qx, x2, cst_cephes_exp_q1);
qx = pmadd(qx, x2, cst_cephes_exp_q2);
qx = pmadd(qx, x2, cst_cephes_exp_q3);
// I don't really get this bit, copied from the SSE2 routines, so...
// TODO(gonnet): Figure out what is going on here, perhaps find a better
// rational interpolant?
x = pdiv(px, psub(qx, px));
x = pmadd(cst_2, x, cst_1);
// Construct the result 2^n * exp(g), i.e. 2^n * x. The max is used to catch
// non-finite values in the input.
return pmax(pldexp(x,fx), _x);
}
// make it the default path for scalar double
template<>
EIGEN_DEVICE_FUNC inline double pexp(const double& a) { return pexp_double(a); }
// The following code is inspired by the following stack-overflow answer:
// https://stackoverflow.com/questions/30463616/payne-hanek-algorithm-implementation-in-c/30465751#30465751
// It has been largely optimized:
// - By-pass calls to frexp.
// - Aligned loads of the required 96 bits of 2/pi. This is accomplished by
// (1) balancing the mantissa and exponent so that the required bits of 2/pi
// are aligned on 8 bits, and (2) replicating the storage of the bits of 2/pi.
// - Avoid a branch in rounding and extraction of the remaining fractional part.
// Overall, I measured a speed up higher than x2 on x86-64.
inline float trig_reduce_huge (float xf, int *quadrant)
{
using Eigen::numext::int32_t;
using Eigen::numext::uint32_t;
using Eigen::numext::int64_t;
using Eigen::numext::uint64_t;
const double pio2_62 = 3.4061215800865545e-19; // pi/2 * 2^-62
const uint64_t zero_dot_five = uint64_t(1) << 61; // 0.5 in 2.62-bit fixed-point format
// 192 bits of 2/pi for Payne-Hanek reduction
// Bits are introduced by packet of 8 to enable aligned reads.
static const uint32_t two_over_pi [] =
{
0x00000028, 0x000028be, 0x0028be60, 0x28be60db,
0xbe60db93, 0x60db9391, 0xdb939105, 0x9391054a,
0x91054a7f, 0x054a7f09, 0x4a7f09d5, 0x7f09d5f4,
0x09d5f47d, 0xd5f47d4d, 0xf47d4d37, 0x7d4d3770,
0x4d377036, 0x377036d8, 0x7036d8a5, 0x36d8a566,
0xd8a5664f, 0xa5664f10, 0x664f10e4, 0x4f10e410,
0x10e41000, 0xe4100000
};
uint32_t xi = numext::as_uint(xf);
// Below, -118 = -126 + 8.
// -126 is to get the exponent,
// +8 is to enable alignment of 2/pi's bits on 8 bits.
// This is possible because the fractional part of x has only 24 meaningful bits.
uint32_t e = (xi >> 23) - 118;
// Extract the mantissa and shift it to align it wrt the exponent
xi = ((xi & 0x007fffffu)| 0x00800000u) << (e & 0x7);
uint32_t i = e >> 3;
uint32_t twoopi_1 = two_over_pi[i-1];
uint32_t twoopi_2 = two_over_pi[i+3];
uint32_t twoopi_3 = two_over_pi[i+7];
// Compute x * 2/pi in 2.62-bit fixed-point format.
uint64_t p;
p = uint64_t(xi) * twoopi_3;
p = uint64_t(xi) * twoopi_2 + (p >> 32);
p = (uint64_t(xi * twoopi_1) << 32) + p;
// Round to nearest: add 0.5 and extract integral part.
uint64_t q = (p + zero_dot_five) >> 62;
*quadrant = int(q);
// Now it remains to compute "r = x - q*pi/2" with high accuracy. Since we have
// p = x/(pi/2) with high accuracy, we can more efficiently compute r as:
// r = (p-q)*pi/2,
// where the product can be carried out with sufficient accuracy using double precision.
p -= q<<62;
return float(double(int64_t(p)) * pio2_62);
}
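// [Editorial sketch, not part of this diff] What the caller does with the
// (reduced argument, quadrant) pair that trig_reduce_huge produces. The
// reduction below is done naively in double precision instead of fixed point;
// the point is the quadrant identity sin(q*pi/2 + r) = {sin,cos,-sin,-cos}(r)
// for q mod 4 = 0..3.
#include <cmath>
#include <cstdio>

int main() {
  const double half_pi = 1.5707963267948966;
  double x = 100000.0;                              // well past huge_th
  double q_d = std::floor(x / half_pi + 0.5);       // nearest multiple of pi/2
  int q = static_cast<int>(std::fmod(q_d, 4.0));    // quadrant in 0..3
  double r = x - q_d * half_pi;                     // remainder in [-pi/4, pi/4]
  static const double sign[4] = {1.0, 1.0, -1.0, -1.0};
  double s = (q % 2 == 0) ? std::sin(r) : std::cos(r);
  std::printf("%f vs %f\n", sign[q] * s, std::sin(x));
  return 0;
}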
template<bool ComputeSine,typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
#if EIGEN_GNUC_AT_LEAST(4,4) && EIGEN_COMP_GNUC_STRICT
__attribute__((optimize("-fno-unsafe-math-optimizations")))
#endif
Packet psincos_float(const Packet& _x)
{
// Workaround -ffast-math aggressive optimizations
// See bug 1674
#if EIGEN_COMP_CLANG && defined(EIGEN_VECTORIZE_SSE)
#define EIGEN_SINCOS_DONT_OPT(X) __asm__ ("" : "+x" (X));
#else
#define EIGEN_SINCOS_DONT_OPT(X)
#endif
typedef typename unpacket_traits<Packet>::integer_packet PacketI;
const Packet cst_2oPI = pset1<Packet>(0.636619746685028076171875f); // 2/PI
const Packet cst_rounding_magic = pset1<Packet>(12582912); // 2^23 for rounding
const PacketI csti_1 = pset1<PacketI>(1);
const Packet cst_sign_mask = pset1frombits<Packet>(0x80000000u);
Packet x = pabs(_x);
// Scale x by 2/Pi to find x's octant.
Packet y = pmul(x, cst_2oPI);
// Rounding trick:
Packet y_round = padd(y, cst_rounding_magic);
EIGEN_SINCOS_DONT_OPT(y_round)
PacketI y_int = preinterpret<PacketI>(y_round); // the last 23 bits represent the integer (if abs(x)<2^24)
y = psub(y_round, cst_rounding_magic); // nearest integer to x*2/pi
// Reduce x by y octants to get: -Pi/4 <= x <= +Pi/4
// using "Extended precision modular arithmetic"
#if defined(EIGEN_HAS_SINGLE_INSTRUCTION_MADD)
// This version requires true FMA for high accuracy
// It provides a max error of 1ULP up to (with absolute_error < 5.9605e-08):
const float huge_th = ComputeSine ? 117435.992f : 71476.0625f;
x = pmadd(y, pset1<Packet>(-1.57079601287841796875f), x);
x = pmadd(y, pset1<Packet>(-3.1391647326017846353352069854736328125e-07f), x);
x = pmadd(y, pset1<Packet>(-5.390302529957764765544681040410068817436695098876953125e-15f), x);
#else
// Without true FMA, the previous set of coefficients maintain 1ULP accuracy
// up to x<15.7 (for sin), but accuracy is immediately lost for x>15.7.
// We thus use one more iteration to maintain 2ULPs up to reasonably large inputs.
// The following set of coefficients maintain 1ULP up to 9.43 and 14.16 for sin and cos respectively.
// and 2 ULP up to:
const float huge_th = ComputeSine ? 25966.f : 18838.f;
x = pmadd(y, pset1<Packet>(-1.5703125), x); // = 0xbfc90000
EIGEN_SINCOS_DONT_OPT(x)
x = pmadd(y, pset1<Packet>(-0.000483989715576171875), x); // = 0xb9fdc000
EIGEN_SINCOS_DONT_OPT(x)
x = pmadd(y, pset1<Packet>(1.62865035235881805419921875e-07), x); // = 0x342ee000
x = pmadd(y, pset1<Packet>(5.5644315544167710640977020375430583953857421875e-11), x); // = 0x2e74b9ee
// For the record, the following set of coefficients maintain 2ULP up
// to a slightly larger range:
// const float huge_th = ComputeSine ? 51981.f : 39086.125f;
// but it slightly fails to maintain 1ULP for two values of sin below pi.
// x = pmadd(y, pset1<Packet>(-3.140625/2.), x);
// x = pmadd(y, pset1<Packet>(-0.00048351287841796875), x);
// x = pmadd(y, pset1<Packet>(-3.13855707645416259765625e-07), x);
// x = pmadd(y, pset1<Packet>(-6.0771006282767103812147979624569416046142578125e-11), x);
// For the record, with only 3 iterations it is possible to maintain
// 1 ULP up to 3PI (maybe more) and 2ULP up to 255.
// The coefficients are: 0xbfc90f80, 0xb7354480, 0x2e74b9ee
#endif
if(predux_any(pcmp_le(pset1<Packet>(huge_th),pabs(_x))))
{
const int PacketSize = unpacket_traits<Packet>::size;
EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) float vals[PacketSize];
EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) float x_cpy[PacketSize];
EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) int y_int2[PacketSize];
pstoreu(vals, pabs(_x));
pstoreu(x_cpy, x);
pstoreu(y_int2, y_int);
for(int k=0; k<PacketSize;++k)
{
float val = vals[k];
if(val>=huge_th && (numext::isfinite)(val))
x_cpy[k] = trig_reduce_huge(val,&y_int2[k]);
}
x = ploadu<Packet>(x_cpy);
y_int = ploadu<PacketI>(y_int2);
}
// Compute the sign to apply to the polynomial.
// sin: sign = second_bit(y_int) xor signbit(_x)
// cos: sign = second_bit(y_int+1)
Packet sign_bit = ComputeSine ? pxor(_x, preinterpret<Packet>(pshiftleft<30>(y_int)))
: preinterpret<Packet>(pshiftleft<30>(padd(y_int,csti_1)));
sign_bit = pand(sign_bit, cst_sign_mask); // clear all but left most bit
// Get the polynomial selection mask from the second bit of y_int
// We'll calculate both (sin and cos) polynomials and then select from the two.
Packet poly_mask = preinterpret<Packet>(pcmp_eq(pand(y_int, csti_1), pzero(y_int)));
Packet x2 = pmul(x,x);
// Evaluate the cos(x) polynomial. (-Pi/4 <= x <= Pi/4)
Packet y1 = pset1<Packet>(2.4372266125283204019069671630859375e-05f);
y1 = pmadd(y1, x2, pset1<Packet>(-0.00138865201734006404876708984375f ));
y1 = pmadd(y1, x2, pset1<Packet>(0.041666619479656219482421875f ));
y1 = pmadd(y1, x2, pset1<Packet>(-0.5f));
y1 = pmadd(y1, x2, pset1<Packet>(1.f));
// Evaluate the sin(x) polynomial. (Pi/4 <= x <= Pi/4)
// octave/matlab code to compute those coefficients:
// x = (0:0.0001:pi/4)';
// A = [x.^3 x.^5 x.^7];
// w = ((1.-(x/(pi/4)).^2).^5)*2000+1; # weights trading relative accuracy
// c = (A'*diag(w)*A)\(A'*diag(w)*(sin(x)-x)); # weighted LS, linear coeff forced to 1
// printf('%.64f\n %.64f\n%.64f\n', c(3), c(2), c(1))
//
Packet y2 = pset1<Packet>(-0.0001959234114083702898469196984621021329076029360294342041015625f);
y2 = pmadd(y2, x2, pset1<Packet>( 0.0083326873655616851693794799871284340042620897293090820312500000f));
y2 = pmadd(y2, x2, pset1<Packet>(-0.1666666203982298255503735617821803316473960876464843750000000000f));
y2 = pmul(y2, x2);
y2 = pmadd(y2, x, x);
// Select the correct result from the two polynomials.
y = ComputeSine ? pselect(poly_mask,y2,y1)
: pselect(poly_mask,y1,y2);
// Update the sign and filter huge inputs
return pxor(y, sign_bit);
#undef EIGEN_SINCOS_DONT_OPT
}
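// [Editorial sketch, not part of this diff] The "rounding magic" used above,
// in scalar form. Adding 12582912 = 1.5*2^23 to a non-negative float below
// 2^22 forces rounding onto the unit place, so the low 22 mantissa bits of
// the sum hold the nearest integer, and subtracting the constant back yields
// that integer as a float. (The packet code guards the trick against
// -ffast-math reassociation with EIGEN_SINCOS_DONT_OPT.)
#include <cstdint>
#include <cstring>
#include <cstdio>

int main() {
  const float magic = 12582912.0f;       // 1.5 * 2^23
  float y = 7.6f;
  float y_round = y + magic;             // rounds y to the nearest integer, here 8
  uint32_t bits;
  std::memcpy(&bits, &y_round, sizeof(bits));
  std::printf("low mantissa bits: %u\n", bits & 0x3fffffu); // 8
  std::printf("as float: %g\n", y_round - magic);           // 8
  return 0;
}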
template<typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet psin_float(const Packet& x)
{
return psincos_float<true>(x);
}
template<typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet pcos_float(const Packet& x)
{
return psincos_float<false>(x);
}
/* polevl (modified for Eigen)
*
* Evaluate polynomial
*
*
*
* SYNOPSIS:
*
* int N;
* Scalar x, y, coef[N+1];
*
* y = polevl<decltype(x), N>( x, coef);
*
*
*
* DESCRIPTION:
*
* Evaluates polynomial of degree N:
*
 * y = C_0 + C_1*x + C_2*x^2 + ... + C_N*x^N
*
* Coefficients are stored in reverse order:
*
 * coef[0] = C_N, ..., coef[N] = C_0.
*
* The function p1evl() assumes that coef[N] = 1.0 and is
* omitted from the array. Its calling arguments are
* otherwise the same as polevl().
*
*
* The Eigen implementation is templatized. For best speed, store
* coef as a const array (constexpr), e.g.
*
* const double coef[] = {1.0, 2.0, 3.0, ...};
*
*/
template <typename Packet, int N>
struct ppolevl {
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet run(const Packet& x, const typename unpacket_traits<Packet>::type coeff[]) {
EIGEN_STATIC_ASSERT((N > 0), YOU_MADE_A_PROGRAMMING_MISTAKE);
return pmadd(ppolevl<Packet, N-1>::run(x, coeff), x, pset1<Packet>(coeff[N]));
}
};
template <typename Packet>
struct ppolevl<Packet, 0> {
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet run(const Packet& x, const typename unpacket_traits<Packet>::type coeff[]) {
EIGEN_UNUSED_VARIABLE(x);
return pset1<Packet>(coeff[0]);
}
};
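// [Editorial sketch, not part of this diff] What the ppolevl recursion expands
// to for a concrete degree: plain Horner evaluation with the coefficients
// stored highest degree first. The helper name is illustrative only.
#include <cstdio>

template <int N>
static double polevl(double x, const double (&coef)[N + 1]) {
  double y = coef[0];                        // C_N
  for (int i = 1; i <= N; ++i) y = y * x + coef[i];
  return y;
}

int main() {
  const double coef[] = {3.0, 2.0, 1.0};     // 3*x^2 + 2*x + 1
  std::printf("%g\n", polevl<2>(2.0, coef)); // 17
  return 0;
}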
/* chbevl (modified for Eigen)
*
* Evaluate Chebyshev series
*
*
*
* SYNOPSIS:
*
* int N;
* Scalar x, y, coef[N], chebevl();
*
* y = chbevl( x, coef, N );
*
*
*
* DESCRIPTION:
*
* Evaluates the series
*
 * y = sum'_{i=0..N-1} coef[i] * T_i(x/2)
*
* of Chebyshev polynomials Ti at argument x/2.
*
* Coefficients are stored in reverse order, i.e. the zero
* order term is last in the array. Note N is the number of
* coefficients, not the order.
*
* If coefficients are for the interval a to b, x must
* have been transformed to x -> 2(2x - b - a)/(b-a) before
* entering the routine. This maps x from (a, b) to (-1, 1),
* over which the Chebyshev polynomials are defined.
*
* If the coefficients are for the inverted interval, in
* which (a, b) is mapped to (1/b, 1/a), the transformation
* required is x -> 2(2ab/x - b - a)/(b-a). If b is infinity,
* this becomes x -> 4a/x - 1.
*
*
*
* SPEED:
*
* Taking advantage of the recurrence properties of the
* Chebyshev polynomials, the routine requires one more
* addition per loop than evaluating a nested polynomial of
* the same degree.
*
*/
template <typename Packet, int N>
struct pchebevl {
EIGEN_DEVICE_FUNC
static EIGEN_STRONG_INLINE Packet run(Packet x, const typename unpacket_traits<Packet>::type coef[]) {
typedef typename unpacket_traits<Packet>::type Scalar;
Packet b0 = pset1<Packet>(coef[0]);
Packet b1 = pset1<Packet>(static_cast<Scalar>(0.f));
Packet b2;
for (int i = 1; i < N; i++) {
b2 = b1;
b1 = b0;
b0 = psub(pmadd(x, b1, pset1<Packet>(coef[i])), b2);
}
return pmul(pset1<Packet>(static_cast<Scalar>(0.5f)), psub(b0, b2));
}
};
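// [Editorial sketch, not part of this diff] Scalar form of the Clenshaw
// recurrence used by pchebevl, checked against a direct sum of Chebyshev
// polynomials T_i(x/2). The helper name and coefficients are illustrative.
#include <cstdio>

static double chbevl(double x, const double* coef, int N) {
  double b0 = coef[0], b1 = 0.0, b2 = 0.0;
  for (int i = 1; i < N; i++) {
    b2 = b1;
    b1 = b0;
    b0 = x * b1 - b2 + coef[i];  // Clenshaw step: 2*(x/2)*b1 - b2 + c_i, and 2*(x/2) = x
  }
  return 0.5 * (b0 - b2);
}

int main() {
  const double coef[] = {0.5, 0.25, 1.0};    // highest-order term first
  double x = 0.8, t = x / 2.0;
  // Direct evaluation; the primed sum halves the T_0 term.
  double direct = 0.5 * coef[2]              // T_0 = 1
                + coef[1] * t                // T_1 = t
                + coef[0] * (2 * t * t - 1); // T_2 = 2*t^2 - 1
  std::printf("%f vs %f\n", chbevl(x, coef, 3), direct); // both 0.260000
  return 0;
}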
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_H

View File

@@ -0,0 +1,69 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2019 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_FWD_H
#define EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_FWD_H
namespace Eigen {
namespace internal {
// Forward declarations of the generic math functions
// implemented in GenericPacketMathFunctions.h
// This is needed to workaround a circular dependency.
template<typename Packet> EIGEN_STRONG_INLINE Packet
pfrexp_float(const Packet& a, Packet& exponent);
template<typename Packet> EIGEN_STRONG_INLINE Packet
pldexp_float(Packet a, Packet exponent);
/** \internal \returns log(x) for single precision float */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet plog_float(const Packet _x);
/** \internal \returns log(1 + x) */
template<typename Packet>
Packet generic_plog1p(const Packet& x);
/** \internal \returns exp(x)-1 */
template<typename Packet>
Packet generic_expm1(const Packet& x);
/** \internal \returns exp(x) for single precision float */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet pexp_float(const Packet _x);
/** \internal \returns exp(x) for double precision real numbers */
template <typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet pexp_double(const Packet _x);
/** \internal \returns sin(x) for single precision float */
template<typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet psin_float(const Packet& x);
/** \internal \returns cos(x) for single precision float */
template<typename Packet>
EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
EIGEN_UNUSED
Packet pcos_float(const Packet& x);
template <typename Packet, int N> struct ppolevl;
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_FWD_H

View File

@@ -26,15 +26,15 @@
// Standard 16-bit float type, mostly useful for GPUs. Defines a new
// type Eigen::half (inheriting from CUDA's __half struct) with
// type Eigen::half (inheriting either from CUDA's or HIP's __half struct) with
// operator overloads such that it behaves basically as an arithmetic
// type. It will be quite slow on CPUs (so it is recommended to stay
// in float32_bits for CPUs, except for simple parameter conversions, I/O
// in fp32 for CPUs, except for simple parameter conversions, I/O
// to disk and the likes), but fast on GPUs.
#ifndef EIGEN_HALF_CUDA_H
#define EIGEN_HALF_CUDA_H
#ifndef EIGEN_HALF_H
#define EIGEN_HALF_H
#if __cplusplus > 199711L
#define EIGEN_EXPLICIT_CAST(tgt_type) explicit operator tgt_type()
@@ -42,7 +42,6 @@
#define EIGEN_EXPLICIT_CAST(tgt_type) operator tgt_type()
#endif
#include <sstream>
namespace Eigen {
@@ -50,16 +49,25 @@ struct half;
namespace half_impl {
#if !defined(EIGEN_HAS_CUDA_FP16)
#if !defined(EIGEN_HAS_GPU_FP16)
// Make our own __half_raw definition that is similar to CUDA's.
struct __half_raw {
EIGEN_DEVICE_FUNC __half_raw() : x(0) {}
explicit EIGEN_DEVICE_FUNC __half_raw(unsigned short raw) : x(raw) {}
unsigned short x;
};
#elif defined(EIGEN_CUDACC_VER) && EIGEN_CUDACC_VER < 90000
#elif defined(EIGEN_HAS_HIP_FP16)
// Nothing to do here
// HIP fp16 header file has a definition for __half_raw
#elif defined(EIGEN_HAS_CUDA_FP16)
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000
// In CUDA < 9.0, __half is the equivalent of CUDA 9's __half_raw
typedef __half __half_raw;
typedef __half __half_raw;
#endif // defined(EIGEN_HAS_CUDA_FP16)
#elif defined(SYCL_DEVICE_ONLY)
typedef cl::sycl::half __half_raw;
#endif
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw raw_uint16_to_half(unsigned short x);
@@ -68,10 +76,16 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h);
struct half_base : public __half_raw {
EIGEN_DEVICE_FUNC half_base() {}
EIGEN_DEVICE_FUNC half_base(const half_base& h) : __half_raw(h) {}
EIGEN_DEVICE_FUNC half_base(const __half_raw& h) : __half_raw(h) {}
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDACC_VER) && EIGEN_CUDACC_VER >= 90000
#if defined(EIGEN_HAS_GPU_FP16)
#if defined(EIGEN_HAS_HIP_FP16)
EIGEN_DEVICE_FUNC half_base(const __half& h) { x = __half_as_ushort(h); }
#elif defined(EIGEN_HAS_CUDA_FP16)
#if (defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000)
EIGEN_DEVICE_FUNC half_base(const __half& h) : __half_raw(*(__half_raw*)&h) {}
#endif
#endif
#endif
};
@@ -79,18 +93,38 @@ struct half_base : public __half_raw {
// Class definition.
struct half : public half_impl::half_base {
#if !defined(EIGEN_HAS_CUDA_FP16) || (defined(EIGEN_CUDACC_VER) && EIGEN_CUDACC_VER < 90000)
// Writing this out as separate #if-else blocks to make the code easier to follow
// The same applies to most #if-else blocks in this file
#if !defined(EIGEN_HAS_GPU_FP16)
typedef half_impl::__half_raw __half_raw;
#elif defined(EIGEN_HAS_HIP_FP16)
// Nothing to do here
// HIP fp16 header file has a definition for __half_raw
#elif defined(EIGEN_HAS_CUDA_FP16)
// Note that EIGEN_CUDA_SDK_VER is set to 0 even when compiling with HIP, so
// (EIGEN_CUDA_SDK_VER < 90000) is true even for HIP! So keeping this within
// #if defined(EIGEN_HAS_CUDA_FP16) is needed
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000
typedef half_impl::__half_raw __half_raw;
#endif
#endif
EIGEN_DEVICE_FUNC half() {}
EIGEN_DEVICE_FUNC half(const __half_raw& h) : half_impl::half_base(h) {}
EIGEN_DEVICE_FUNC half(const half& h) : half_impl::half_base(h) {}
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDACC_VER) && EIGEN_CUDACC_VER >= 90000
#if defined(EIGEN_HAS_GPU_FP16)
#if defined(EIGEN_HAS_HIP_FP16)
EIGEN_DEVICE_FUNC half(const __half& h) : half_impl::half_base(h) {}
#elif defined(EIGEN_HAS_CUDA_FP16)
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000
EIGEN_DEVICE_FUNC half(const __half& h) : half_impl::half_base(h) {}
#endif
#endif
#endif
explicit EIGEN_DEVICE_FUNC half(bool b)
: half_impl::half_base(half_impl::raw_uint16_to_half(b ? 0x3c00 : 0)) {}
template<class T>
@@ -139,11 +173,6 @@ struct half : public half_impl::half_base {
EIGEN_DEVICE_FUNC EIGEN_EXPLICIT_CAST(double) const {
return static_cast<double>(half_impl::half_to_float(*this));
}
EIGEN_DEVICE_FUNC half& operator=(const half& other) {
x = other.x;
return *this;
}
};
} // end namespace Eigen
@@ -202,15 +231,24 @@ namespace Eigen {
namespace half_impl {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
#define EIGEN_HAS_NATIVE_FP16
#endif
// Intrinsics for native fp16 support. Note that on current hardware,
// these are no faster than float32_bits arithmetic (you need to use the half2
// these are no faster than fp32 arithmetic (you need to use the half2
// versions to get the ALU speed increased), but you do save the
// conversion steps back and forth.
#if defined(EIGEN_HAS_NATIVE_FP16)
EIGEN_STRONG_INLINE __device__ half operator + (const half& a, const half& b) {
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000
return __hadd(::__half(a), ::__half(b));
#else
return __hadd(a, b);
#endif
}
EIGEN_STRONG_INLINE __device__ half operator * (const half& a, const half& b) {
return __hmul(a, b);
@@ -219,9 +257,13 @@ EIGEN_STRONG_INLINE __device__ half operator - (const half& a, const half& b) {
return __hsub(a, b);
}
EIGEN_STRONG_INLINE __device__ half operator / (const half& a, const half& b) {
#if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000
return __hdiv(a, b);
#else
float num = __half2float(a);
float denom = __half2float(b);
return __float2half(num / denom);
#endif
}
EIGEN_STRONG_INLINE __device__ half operator - (const half& a) {
return __hneg(a);
@@ -261,10 +303,26 @@ EIGEN_STRONG_INLINE __device__ bool operator >= (const half& a, const half& b) {
return __hge(a, b);
}
#else // Emulate support for half floats
#endif
// Definitions for CPUs and older CUDA, mostly working through conversion
// to/from float32_bits.
// We need to distinguish clang as the CUDA compiler from clang as the host compiler,
// invoked by NVCC (e.g. on MacOS). The former needs to see both host and device implementation
// of the functions, while the latter can only deal with one of them.
#if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats
#if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC)
// We need to provide emulated *host-side* FP16 operators for clang.
#pragma push_macro("EIGEN_DEVICE_FUNC")
#undef EIGEN_DEVICE_FUNC
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_HAS_NATIVE_FP16)
#define EIGEN_DEVICE_FUNC __host__
#else // both host and device need emulated ops.
#define EIGEN_DEVICE_FUNC __host__ __device__
#endif
#endif
// Definitions for CPUs and older HIP+CUDA, mostly working through conversion
// to/from fp32.
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator + (const half& a, const half& b) {
return half(float(a) + float(b));
@@ -318,6 +376,9 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const half& a, const hal
return float(a) >= float(b);
}
#if defined(__clang__) && defined(__CUDA__)
#pragma pop_macro("EIGEN_DEVICE_FUNC")
#endif
#endif // Emulate support for half floats
// Division by an index. Do it in full float precision to avoid accuracy
@@ -343,7 +404,8 @@ union float32_bits {
};
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
__half tmp_ff = __float2half(ff);
return *(__half_raw*)&tmp_ff;
@@ -399,7 +461,8 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) {
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h) {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __half2float(h);
#elif defined(EIGEN_HAS_FP16_C)
@@ -433,7 +496,8 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isinf)(const half& a) {
return (a.x & 0x7fff) == 0x7c00;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isnan)(const half& a) {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __hisnan(a);
#else
return (a.x & 0x7fff) > 0x7c00;
@@ -449,14 +513,19 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half abs(const half& a) {
return result;
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half exp(const half& a) {
#if EIGEN_CUDACC_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
return half(hexp(a));
#else
return half(::expf(float(a)));
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half expm1(const half& a) {
return half(numext::expm1(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log(const half& a) {
#if defined(EIGEN_HAS_CUDA_FP16) && EIGEN_CUDACC_VER >= 80000 && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530
#if (defined(EIGEN_HAS_CUDA_FP16) && EIGEN_CUDA_SDK_VER >= 80000 && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return half(::hlog(a));
#else
return half(::logf(float(a)));
@@ -469,7 +538,8 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log10(const half& a) {
return half(::log10f(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half sqrt(const half& a) {
#if EIGEN_CUDACC_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
return half(hsqrt(a));
#else
return half(::sqrtf(float(a)));
@@ -491,14 +561,16 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half tanh(const half& a) {
return half(::tanhf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half floor(const half& a) {
#if EIGEN_CUDACC_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
return half(hfloor(a));
#else
return half(::floorf(float(a)));
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half ceil(const half& a) {
#if EIGEN_CUDACC_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
return half(hceil(a));
#else
return half(::ceilf(float(a)));
@@ -506,7 +578,8 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half ceil(const half& a) {
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half (min)(const half& a, const half& b) {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __hlt(b, a) ? b : a;
#else
const float f1 = static_cast<float>(a);
@@ -515,7 +588,8 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half (min)(const half& a, const half& b) {
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half (max)(const half& a, const half& b) {
#if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __hlt(a, b) ? b : a;
#else
const float f1 = static_cast<float>(a);
@@ -524,10 +598,12 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half (max)(const half& a, const half& b) {
#endif
}
#ifndef EIGEN_NO_IO
EIGEN_ALWAYS_INLINE std::ostream& operator << (std::ostream& os, const half& v) {
os << static_cast<float>(v);
return os;
}
#endif
} // end namespace half_impl
@@ -593,7 +669,8 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half exph(const Eigen::half& a) {
return Eigen::half(::expf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half logh(const Eigen::half& a) {
#if EIGEN_CUDACC_VER >= 80000 && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530
#if (EIGEN_CUDA_SDK_VER >= 80000 && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
return Eigen::half(::hlog(a));
#else
return Eigen::half(::logf(float(a)));
@@ -627,9 +704,12 @@ struct hash<Eigen::half> {
// Add the missing shfl_xor intrinsic
#if defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300
#if (defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor(Eigen::half var, int laneMask, int width=warpSize) {
#if EIGEN_CUDACC_VER < 90000
#if (EIGEN_CUDA_SDK_VER < 90000) || \
defined(EIGEN_HAS_HIP_FP16)
return static_cast<Eigen::half>(__shfl_xor(static_cast<float>(var), laneMask, width));
#else
return static_cast<Eigen::half>(__shfl_xor_sync(0xFFFFFFFF, static_cast<float>(var), laneMask, width));
@@ -638,7 +718,8 @@ __device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor(Eigen::half var, int laneM
#endif
// ldg() has an overload for __half_raw, but we also need one for Eigen::half.
#if defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 350
#if (defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 350) || \
defined(EIGEN_HIP_DEVICE_COMPILE)
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half __ldg(const Eigen::half* ptr) {
return Eigen::half_impl::raw_uint16_to_half(
__ldg(reinterpret_cast<const unsigned short*>(ptr)));
@@ -646,7 +727,7 @@ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half __ldg(const Eigen::half* ptr)
#endif
#if defined(EIGEN_CUDA_ARCH)
#if defined(EIGEN_GPU_COMPILE_PHASE)
namespace Eigen {
namespace numext {
@@ -672,4 +753,4 @@ bool (isfinite)(const Eigen::half& h) {
} // namespace numext
#endif
#endif // EIGEN_HALF_CUDA_H
#endif // EIGEN_HALF_H
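// [Editorial sketch, not part of this diff] Eigen::half on the host, where the
// emulated operators defined above convert to float, compute, and convert
// back. A minimal usage example, assuming only that <Eigen/Core> is on the
// include path.
#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::half a(1.5f), b(0.25f);
  Eigen::half c = a + b;                       // host path: half(float(a) + float(b))
  std::cout << static_cast<float>(c) << "\n";  // 1.75
  std::cout << c << "\n";                      // uses the operator<< defined above
  return 0;
}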

View File

@@ -21,7 +21,7 @@
* it does not correspond to the number of iterations or the number of instructions
*/
#ifndef EIGEN_UNROLLING_LIMIT
#define EIGEN_UNROLLING_LIMIT 100
#define EIGEN_UNROLLING_LIMIT 110
#endif
/** Defines the threshold between a "small" and a "large" matrix.

View File

@@ -0,0 +1,77 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2016 Benoit Steiner <benoit.steiner.goog@gmail.com>
// Copyright (C) 2019 Rasmus Munk Larsen <rmlarsen@google.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_GENERIC_TYPE_CASTING_H
#define EIGEN_GENERIC_TYPE_CASTING_H
namespace Eigen {
namespace internal {
template<>
struct scalar_cast_op<float, Eigen::half> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef Eigen::half result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half operator() (const float& a) const {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __float2half(a);
#else
return Eigen::half(a);
#endif
}
};
template<>
struct functor_traits<scalar_cast_op<float, Eigen::half> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<int, Eigen::half> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef Eigen::half result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half operator() (const int& a) const {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __float2half(static_cast<float>(a));
#else
return Eigen::half(static_cast<float>(a));
#endif
}
};
template<>
struct functor_traits<scalar_cast_op<int, Eigen::half> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
template<>
struct scalar_cast_op<Eigen::half, float> {
EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
typedef float result_type;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator() (const Eigen::half& a) const {
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \
(defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE))
return __half2float(a);
#else
return static_cast<float>(a);
#endif
}
};
template<>
struct functor_traits<scalar_cast_op<Eigen::half, float> >
{ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false }; };
}
}
#endif // EIGEN_GENERIC_TYPE_CASTING_H
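// [Editorial sketch, not part of this diff] How these cast functors are
// reached from user code: Matrix::cast<>() routes through
// scalar_cast_op<float, Eigen::half> and back. On the host the #else branches
// above are taken, so the conversion goes through Eigen::half's float
// constructor and conversion operator.
#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Matrix<float, 2, 2> mf;
  mf << 1.5f, 2.5f, 3.5f, 4.5f;
  Eigen::Matrix<Eigen::half, 2, 2> mh = mf.cast<Eigen::half>();
  std::cout << mh.cast<float>() << std::endl;  // round-trips via scalar_cast_op<half, float>
  return 0;
}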

Some files were not shown because too many files have changed in this diff