Rasmus Larsen
a7c7d329d8
Merged in ezhulenev/eigen-01 (pull request PR-769)
...
Capture TensorMap by value inside tensor expression AST
2019-12-04 00:49:10 +00:00
Rasmus Larsen
cacf433975
Merged in anshuljl/eigen-2/Anshul-Jaiswal/update-configurevectorizationh-to-not-op-1573079916090 (pull request PR-754)
...
Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.
Approved-by: Deven Desai <deven.desai.amd@gmail.com>
2019-12-04 00:45:42 +00:00
Eugene Zhulenev
8f4536e852
Capture TensorMap by value inside tensor expression AST
2019-12-03 16:39:05 -08:00
Rasmus Munk Larsen
4e696901f8
Remove __host__ annotation for device-only function.
2019-12-03 14:33:19 -08:00
Rasmus Munk Larsen
ead81559c8
Use EIGEN_DEVICE_FUNC macro instead of __device__.
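A minimal sketch of why a portability macro is preferable to a raw `__device__` qualifier. `MY_DEVICE_FUNC` and `square` are hypothetical names used only for illustration; this is not Eigen's actual definition of `EIGEN_DEVICE_FUNC`.

```cpp
// MY_DEVICE_FUNC is a hypothetical stand-in for a macro such as EIGEN_DEVICE_FUNC.
// Under a GPU compiler it expands to the host/device qualifiers; elsewhere it is empty,
// so the same source also builds with an ordinary host compiler.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define MY_DEVICE_FUNC __host__ __device__
#else
#define MY_DEVICE_FUNC
#endif

// Hypothetical helper: callable from host code everywhere, and from device code
// when compiled by nvcc or hipcc.
MY_DEVICE_FUNC inline float square(float x) { return x * x; }
```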
2019-12-03 12:08:22 -08:00
Gael Guennebaud
6358599ecb
Fix QuaternionBase::cast for quaternion map and wrapper.
2019-12-03 14:51:14 +01:00
Gael Guennebaud
7745f69013
bug #1776 : fix vector-wise STL iterator's operator-> using a proxy as pointer type.
...
This changeset also fixes the value_type definition.
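The proxy-as-pointer idea can be sketched as follows. This is a generic illustration, not Eigen's vector-wise iterator: `ArrowProxy`, `Pair` and `DoublingIterator` are hypothetical names. When `operator*` returns a temporary, `operator->` cannot return a raw pointer to it, so it returns a small proxy object that owns the value and exposes its own `operator->`.

```cpp
// Owns a temporary value and lets `it->member` reach into it.
template <typename T>
struct ArrowProxy {
  T value;
  T* operator->() { return &value; }
};

// Value produced on the fly by the iterator (hypothetical example type).
struct Pair {
  int index;
  int doubled;
};

// Iterator whose elements are computed on demand rather than stored.
class DoublingIterator {
 public:
  using value_type = Pair;
  using pointer = ArrowProxy<Pair>;  // a proxy, not Pair*

  explicit DoublingIterator(int i) : i_(i) {}

  // Returns by value: there is no stored Pair to reference.
  Pair operator*() const { return Pair{i_, 2 * i_}; }

  // The proxy keeps the temporary alive for the duration of the `->` expression.
  pointer operator->() const { return pointer{**this}; }

  DoublingIterator& operator++() {
    ++i_;
    return *this;
  }
  bool operator!=(const DoublingIterator& other) const { return i_ != other.i_; }

 private:
  int i_;
};
```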
2019-12-03 14:40:15 +01:00
Rasmus Munk Larsen
66f07efeae
Revert the specialization for scalar_logistic_op<float> introduced in:
...
77b447c24e
While providing a 50% speedup on Haswell+ processors, the large relative error outside [-18, 18] in this approximation causes problems, e.g., when computing gradients of activation functions like softplus in neural networks.
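To illustrate why relative error matters for small outputs, here is a sketch under stated assumptions. `logistic_clamped` is a hypothetical stand-in that simply clamps its argument to [-18, 18]; it is not the reverted Eigen specialization, only an approximation with the same kind of limitation.

```cpp
#include <algorithm>
#include <cmath>

// Reference logistic function, written so that it stays accurate for negative x.
inline float logistic_reference(float x) {
  if (x >= 0.0f) return 1.0f / (1.0f + std::exp(-x));
  const float e = std::exp(x);
  return e / (1.0f + e);
}

// Hypothetical approximation: clamp the argument to [-18, 18] first.
inline float logistic_clamped(float x) {
  return logistic_reference(std::min(18.0f, std::max(-18.0f, x)));
}

// Relative error of the clamped version with respect to the reference.
inline float relative_error(float x) {
  const float ref = logistic_reference(x);
  return std::fabs(logistic_clamped(x) - ref) / ref;
}
```

Inside the clamping range the two functions agree exactly, while for strongly negative arguments the absolute error stays tiny but the relative error grows without bound, which is what hurts gradient computations that divide by or take logarithms of the result.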
2019-12-02 17:00:58 -08:00
Rasmus Larsen
3b15373bb3
Merged in ezhulenev/eigen-02 (pull request PR-767)
...
Fix shadow warnings in AlignedBox and SparseBlock
2019-12-02 18:23:11 +00:00
Deven Desai
312c8e77ff
Fix for the HIP build+test errors.
...
Recent changes have introduced the following build error when compiling with HIPCC
---------
unsupported/test/../../Eigen/src/Core/GenericPacketMath.h:254:58: error: 'ldexp': no overloaded function has restriction specifiers that are compatible with the ambient context 'pldexp'
---------
The fix for the error is to pick the math function(s) from the global namespace (where they are declared as device functions in the HIP header files) when compiling with HIPCC.
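The described fix can be sketched roughly as below. `portable_ldexp` is a hypothetical wrapper name, and the check on `__HIPCC__` is an assumption about how the compiler is detected; the actual change in GenericPacketMath.h may be structured differently.

```cpp
#include <cmath>

// Under HIPCC the device-capable overloads are declared in the global namespace,
// so select ::ldexp there and std::ldexp everywhere else.
inline double portable_ldexp(double x, int exponent) {
#if defined(__HIPCC__)
  return ::ldexp(x, exponent);
#else
  return std::ldexp(x, exponent);
#endif
}
```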
2019-12-02 17:41:32 +00:00
Rasmus Larsen
956131d0e6
Merged in codeplaysoftware/eigen/SYCL-Backend (pull request PR-691)
...
SYCL Backend
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-11-28 16:19:25 +00:00
Mehdi Goli
00f32752f7
[SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch.
...
* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when the local memory version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host
* Adding Test for SYCL
* Modifying SYCL CMake
2019-11-28 10:08:54 +00:00
Eugene Zhulenev
82a47338df
Fix shadow warnings in AlignedBox and SparseBlock
2019-11-27 16:22:27 -08:00
Rasmus Munk Larsen
ea51a9eace
Add missing EIGEN_DEVICE_FUNC attribute to template specializations for pexp to fix GPU build.
2019-11-27 10:17:09 -08:00
Rasmus Munk Larsen
5a3ebda36b
Fix warning due to missing cast for exponent arguments for std::frexp and std::ldexp.
2019-11-26 16:18:29 -08:00
Rasmus Larsen
2df57be856
Merged in realjhol/eigen/fix-warnings (pull request PR-760)
...
Fix warnings
2019-11-26 23:24:23 +00:00
Eugene Zhulenev
5496d0da0b
Add async evaluation support to TensorReverse
2019-11-26 15:02:24 -08:00
Eugene Zhulenev
bc66c88255
Add async evaluation support to TensorPadding/TensorImagePatch/TensorShuffling
2019-11-26 11:41:57 -08:00
Gael Guennebaud
c79b6ffe1f
Add an explicit example for auto and re-evaluation
2019-11-20 17:31:23 +01:00
Hans Johnson
e78ed6e7f3
COMP: Simplify install commands for Eigen
...
Confirm that install directory is identical
before and after this simplifying patch.
```bash
hg clone <<Eigen>>
mkdir eigen-bld
cd eigen-bld
# Install the unpatched tree into the directory that is listed below
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/pre_eigen_modernize
make install
find /tmp/pre_eigen_modernize >/tmp/bef
# Apply this patch
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/post_eigen_modernize
make install
find /tmp/post_eigen_modernize |sed 's/post_e/pre_e/g' >/tmp/aft
diff /tmp/bef /tmp/aft
```
2019-11-17 15:14:25 -06:00
Hans Johnson
9d5cdc98c3
COMP: target_compile_definitions requires cmake 2.8.11
...
Features committed in 2016 have required cmake version 2.8.11.
`sergiu Tue Nov 22 12:25:06 2016 +0100: target_compile_definitions`
Set the minimum cmake version to the minimum version that
is capable of compiling or installing the code base.
2019-11-17 14:59:32 -06:00
Gael Guennebaud
e5778b87b9
Fix duplicate symbol linking error.
2019-11-20 17:23:19 +01:00
Joel Holdsworth
86eb41f1cb
SparseRef: Fixed alignment warning on ARM GCC
2019-11-07 14:34:06 +00:00
Anshul Jaiswal
c1a67cb5af
Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.
2019-11-06 22:40:38 +00:00
Rasmus Munk Larsen
cc3d0e6a40
Add EIGEN_HAS_INTRINSIC_INT128 macro
...
Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
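A sketch of how such a feature macro might gate 128-bit arithmetic. `MY_HAS_INTRINSIC_INT128` and `mul_high_u64` are hypothetical names, and the detection condition (compiler advertises `__SIZEOF_INT128__`, but the target is not Clang on Windows) is an assumption based on the description above, not a copy of Eigen's definition.

```cpp
#include <cstdint>

// Hypothetical stand-in for EIGEN_HAS_INTRINSIC_INT128.
#if defined(__SIZEOF_INT128__) && !(defined(_WIN32) && defined(__clang__))
#define MY_HAS_INTRINSIC_INT128 1
#else
#define MY_HAS_INTRINSIC_INT128 0
#endif

// Upper 64 bits of the full 128-bit product a * b.
inline std::uint64_t mul_high_u64(std::uint64_t a, std::uint64_t b) {
#if MY_HAS_INTRINSIC_INT128
  return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
#else
  // Portable fallback: schoolbook multiplication on 32-bit halves.
  const std::uint64_t mask = 0xffffffffu;
  const std::uint64_t a_lo = a & mask, a_hi = a >> 32;
  const std::uint64_t b_lo = b & mask, b_hi = b >> 32;
  const std::uint64_t p0 = a_lo * b_lo;
  const std::uint64_t p1 = a_lo * b_hi;
  const std::uint64_t p2 = a_hi * b_lo;
  const std::uint64_t p3 = a_hi * b_hi;
  // Carry out of the middle 64 bits.
  const std::uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);
  return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
#endif
}
```

Routing every use through one macro means a platform with broken support can be excluded in a single place, and both branches are expected to produce the same result.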
2019-11-06 14:24:33 -08:00
Rasmus Munk Larsen
ee404667e2
Rollback of PR-746 and partial rollback of 668ab3fc47
...
.
std::array is still not supported in CUDA device code on Windows.
2019-11-05 17:17:58 -08:00
Joel Holdsworth
743c925286
test/packetmath: Silence alignment warnings
2019-11-05 19:06:12 +00:00
Rasmus Larsen
0c9745903a
Merged in ezhulenev/eigen-01 (pull request PR-746)
...
Remove internal::smart_copy and replace with std::copy
2019-11-04 20:18:38 +00:00
Hans Johnson
8c8cab1afd
STYLE: Convert CMake-language commands to lower case
...
Ancient CMake versions required upper-case commands. Later command names
became case-insensitive. Now the preferred style is lower-case.
2019-10-31 11:36:37 -05:00
Hans Johnson
6fb3e5f176
STYLE: Remove CMake-language block-end command arguments
...
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen
f1e8307308
1. Fix a bug in psqrt and make it return 0 for +inf arguments.
...
2. Simplify handling of special cases by taking advantage of the fact that the
builtin vrsqrt approximation handles negative, zero and +inf arguments correctly.
This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:
Before: y = y * (1.5 - x/2 * y^2)
After: y = y * (1.5 - y * (x/2) * y)
Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.
4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.
Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
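The difference between the two Newton-Raphson forms in item 3 can be reproduced with scalar floats. This is a sketch only: the function names are hypothetical and the real code operates on SIMD packets. It assumes default floating-point settings (no fast-math, denormals not flushed to zero).

```cpp
#include <cmath>

// One refinement step for y ~= 1/sqrt(x), written as in the "Before" formula.
inline float rsqrt_step_before(float x, float y) {
  const float y2 = y * y;  // overflows to +inf when y is very large
  return y * (1.5f - (x / 2.0f) * y2);
}

// The same step written as in the "After" formula.
inline float rsqrt_step_after(float x, float y) {
  const float t = y * (x / 2.0f);  // roughly sqrt(x)/2, which stays representable
  return y * (1.5f - t * y);       // t * y is close to 0.5 near convergence
}
```

For a denormal input such as `x = 1e-40f` with a starting guess near `1e20f`, the first form produces a non-finite value because `y * y` exceeds the float range, while the second converges toward `1e20f`.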
2019-11-15 17:09:46 -08:00
Gael Guennebaud
2cb2915f90
bug #1744 : fix compilation with MSVC 2017 and AVX512, plog1p/pexpm1 require plog/pexp, but the latter was disabled on some compilers
2019-11-15 13:39:51 +01:00
Gael Guennebaud
c3f6fcf2c0
bug #1747 : one more fix for MSVC regarding the Bessel implementation.
2019-11-15 11:12:35 +01:00
Gael Guennebaud
b9837ca9ae
bug #1281 : fix AutoDiffScalar's make_coherent for nested expression of constant ADs.
2019-11-14 14:58:08 +01:00
Gael Guennebaud
0fb6e24408
Fix case issue with Lapack unit tests
2019-11-14 14:16:05 +01:00
Gael Guennebaud
8af045a287
bug #1774 : fix VectorwiseOp::begin()/end() return types regarding constness.
2019-11-14 11:45:52 +01:00
Sakshi Goynar
75b4c0a3e0
PR 751: Fixed compilation issue when compiling using MSVC with /arch:AVX512 flag
2019-10-31 16:09:16 -07:00
Gael Guennebaud
8496f86f84
Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices.
2019-11-13 21:16:53 +01:00
Gael Guennebaud
002e5b6db6
Move to my.cdash.org
2019-11-13 13:33:49 +01:00
Eugene Zhulenev
13c3327f5c
Remove legacy block evaluation support
2019-11-12 10:12:28 -08:00
Gael Guennebaud
71aa53dd6d
Disable AVX on broken xcode versions. See PR 748.
...
Patch adapted from Hans Johnson's PR 748.
2019-11-12 11:40:38 +01:00
Rasmus Munk Larsen
0ed0338593
Fix a race in async tensor evaluation: Don't run on_done() until after device.deallocate() / evaluator.cleanup() complete, since the device might be destroyed after on_done() runs.
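The ordering constraint can be sketched as follows. `FakeDevice` and `finish_async` are hypothetical names for illustration and do not mirror the actual TensorAsyncExecutor code; the point is only that the last use of the device happens before the completion callback runs.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical device that records calls so the ordering can be inspected.
struct FakeDevice {
  std::vector<std::string>* log;
  void deallocate() { log->push_back("deallocate"); }
};

// Final step of an asynchronous evaluation.
inline void finish_async(FakeDevice& device, std::function<void()> on_done) {
  // Touch the device for the last time before signalling completion ...
  device.deallocate();
  // ... because the caller is free to destroy the device once on_done() has run.
  on_done();
}
```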
2019-11-11 12:26:41 -08:00
Eugene Zhulenev
c952b8dfda
Break loop dependence in TensorGenerator block access
2019-11-11 10:32:57 -08:00
Rasmus Munk Larsen
ebf04fb3e8
Fix data race in cxx11_tensor_notification test.
2019-11-08 17:44:50 -08:00
Eugene Zhulenev
73ecb2c57d
Cleanup includes in Tensor module after switch to C++11 and above
2019-10-29 15:49:54 -07:00
Eugene Zhulenev
e7ed4bd388
Remove internal::smart_copy and replace with std::copy
2019-10-29 11:25:24 -07:00
Eugene Zhulenev
fbc0a9a3ec
Fix CXX11Meta compilation with MSVC
2019-10-28 18:30:10 -07:00
Eugene Zhulenev
bd864ab42b
Prevent potential ODR in TensorExecutor
2019-10-28 15:45:09 -07:00
Mehdi Goli
6332aff0b2
This PR fixes:
...
* The specialization of array class in the different namespace for GCC<=6.4
* The implicit call to `std::array` constructor using the initializer list for GCC <=6.1
2019-10-23 15:56:56 +01:00
Rasmus Larsen
8e4e29ae99
Merged in deven-amd/eigen-hip-fix-191018 (pull request PR-738)
...
Fix for the HIP build+test errors.
2019-10-22 22:18:38 +00:00
Rasmus Munk Larsen
97c0c5d485
Add block evaluation V2 to TensorAsyncExecutor.
...
Add async evaluation to a number of ops.
2019-10-22 12:42:44 -07:00
Deven Desai
102cf2a72d
Fix for the HIP build+test errors.
...
The errors were introduced by this commit :
After the above mentioned commit, some of the tests started failing with the following error
```
Built target cxx11_tensor_reduction
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:117:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:155:5: error: the field type is not amp-compatible
DestinationBufferKind m_kind;
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:211:3: error: the field type is not amp-compatible
DestinationBuffer m_destination;
^
```
For some reason HIPCC does not like device code to contain enum types which do not have the base-type explicitly declared. The fix is trivial, explicitly state "int" as the basetype
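A minimal sketch of the kind of change described. The enum and enumerator names below are hypothetical and are not the ones in TensorBlockV2.h.

```cpp
#include <type_traits>

// Enum without an explicit underlying type: the form HIPCC rejected in device code.
enum ImplicitKind { kImplicitA, kImplicitB };

// Same enum with the underlying type spelled out, as the fix describes.
enum ExplicitKind : int { kExplicitA, kExplicitB };

// With the explicit form the underlying type is fixed by the declaration,
// not chosen by the implementation.
static_assert(std::is_same<std::underlying_type<ExplicitKind>::type, int>::value,
              "ExplicitKind is backed by int");
```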
2019-10-22 19:21:27 +00:00
Rasmus Munk Larsen
668ab3fc47
Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers.
2019-10-18 16:42:00 -07:00
Eugene Zhulenev
df0e8b8137
Propagate block evaluation preference through rvalue tensor expressions
2019-10-17 11:17:33 -07:00
Eugene Zhulenev
0d2a14ce11
Cleanup Tensor block destination and materialized block storage allocation
2019-10-16 17:14:37 -07:00
Eugene Zhulenev
02431cbe71
TensorBroadcasting support for random/uniform blocks
2019-10-16 13:26:28 -07:00
Eugene Zhulenev
d380c23b2c
Block evaluation for TensorGenerator/TensorReverse/TensorShuffling
2019-10-14 14:31:59 -07:00
Gael Guennebaud
39fb9eeccf
bug #1747 : fix compilation with MSVC
2019-10-14 22:50:23 +02:00
Eugene Zhulenev
a411e9f344
Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op
2019-10-10 10:56:58 -07:00
Rasmus Larsen
b03eb63d7c
Merged in ezhulenev/eigen-01 (pull request PR-726)
...
Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing
2019-10-10 16:58:11 +00:00
Gael Guennebaud
e7d8ba747c
bug #1752 : make is_convertible behave like its std c++11 counterpart and fall back to std::is_convertible when c++11 is enabled.
2019-10-10 17:41:47 +02:00
Gael Guennebaud
fb557aec5c
bug #1752 : disable some is_convertible tests for recent compilers.
2019-10-10 11:40:21 +02:00
Eugene Zhulenev
33e1746139
Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing
2019-10-09 12:45:31 -07:00
Gael Guennebaud
f0a4642bab
Implement c++03 compatible fix for changeset 7a43af1a33
2019-10-09 16:00:57 +02:00
Gael Guennebaud
196de2efe3
Explicitly bypass resize and memmoves when there is already the exact right number of elements available.
2019-10-08 21:44:33 +02:00
Gael Guennebaud
36da231a41
Disable an expected warning in unit test
2019-10-08 16:28:14 +02:00
Gael Guennebaud
d1def335dc
fix one more possible conflict with real/imag
2019-10-08 16:19:10 +02:00
Gael Guennebaud
87427d2eaa
PR 719: fix real/imag namespace conflict
2019-10-08 09:15:17 +02:00
Gael Guennebaud
7a43af1a33
Fix compilation of FFTW unit test
2019-10-08 08:58:35 +02:00
Eugene Zhulenev
f74ab8cb8d
Add block evaluation to TensorEvalTo and fix few small bugs
2019-10-07 15:34:26 -07:00
Brian Zhao
3afb640b56
Fixing incorrect size in Tensor documentation.
2019-10-04 21:30:35 -07:00
Rasmus Munk Larsen
20c4a9118f
Use "pdiv" rather than operator/ to support packet types.
2019-10-04 16:54:03 -07:00
Rasmus Larsen
d1dd51cb5f
Merged in ezhulenev/eigen-01 (pull request PR-723)
...
Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-10-04 17:19:13 +00:00
Eugene Zhulenev
98bdd7252e
Fix compilation warnings and errors with clang in TensorBlockV2 code and tests
2019-10-04 10:15:33 -07:00
Rasmus Munk Larsen
fab4e3a753
Address comments on Chebyshev evaluation code:
...
1. Use pmadd when possible.
2. Add casts to avoid c++03 warnings.
2019-10-02 12:48:17 -07:00
Eugene Zhulenev
60ae24ee1a
Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect
2019-10-02 12:44:06 -07:00
Eugene Zhulenev
6e40454a6e
Add beta to TensorContractionKernel and make memset optional
2019-10-02 11:06:02 -07:00
Rasmus Munk Larsen
bd0fac456f
Prevent infinite loop in the nvcc compiler while unrolling the recurrent templates for Chebyshev polynomial evaluation.
2019-10-01 13:15:30 -07:00
Gael Guennebaud
9549ba8313
Fix perf issue in SimplicialLDLT::solve for complexes (again, m_diag is real)
2019-10-01 12:54:25 +02:00
Gael Guennebaud
c8b2c603b0
Fix speed issue with SimplicialLDLT for complexes: the diagonal is real!
2019-09-30 16:14:34 +02:00
Rasmus Munk Larsen
13ef08e5ac
Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h.
2019-09-27 13:56:04 -07:00
Eugene Zhulenev
7c8bc0d928
Fix cxx11_tensor_block_io test
2019-09-25 11:48:11 -07:00
Eugene Zhulenev
0c845e28c9
Fix erf in c++03
2019-09-25 11:31:45 -07:00
Eugene Zhulenev
71d5bedf72
Fix compilation warnings and errors with clang in TensorBlockV2
2019-09-25 11:25:22 -07:00
Deven Desai
5e186b1987
Fix for the HIP build+test errors.
...
The errors were introduced by this commit : d38e6fbc27
After the above mentioned commit, some of the tests started failing with the following error
```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
^~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
float erf(const float &x) { return ::erff(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
erf(const Scalar& x) {
^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
return make_double2(erf(a.x), erf(a.y));
^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
erf(const Scalar& x) {
^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
return make_double2(erf(a.x), erf(a.y));
^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
erf(const Scalar& x) {
^
3 errors generated.
```
This PR fixes the compile error by removing the "old" implementation of "erf" (assuming that the "new" implementation is what we want going forward; from a GPU point of view both implementations are the same).
This PR also fixes what seems like a cut-n-paste error in the aforementioned commit
2019-09-25 15:39:13 +00:00
Eugene Zhulenev
f35b9ab510
Fix a bug in a packed block type in TensorContractionThreadPool
2019-09-24 16:54:36 -07:00
Rasmus Larsen
d38e6fbc27
Merged in rmlarsen/eigen (pull request PR-704)
...
Add generic PacketMath implementation of the Error Function (erf).
2019-09-24 23:40:29 +00:00
Rasmus Munk Larsen
591a554c68
Add TODO to cleanup FMA cost modelling.
2019-09-24 16:39:25 -07:00
Eugene Zhulenev
c64396b4c6
Choose TensorBlock StridedLinearCopy type statically
2019-09-24 16:04:29 -07:00
Eugene Zhulenev
c97b208468
Add new TensorBlock api implementation + tests
2019-09-24 15:17:35 -07:00
Eugene Zhulenev
ef9dfee7bd
Tensor block evaluation V2 support for unary/binary/broadcasting
2019-09-24 12:52:45 -07:00
Christoph Hertzberg
efd9867ff0
bug #1746 : Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types
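The underlying language rule can be illustrated with two small structs. These are hypothetical types, not PermutationMatrix or Transpositions: user-declared copy operations suppress the implicitly generated move operations, so `std::move` silently copies (and allocates), whereas a type without them gets moves that steal the buffer.

```cpp
#include <utility>
#include <vector>

// User-declared copy operations: no implicit move constructor is generated,
// so moving from this type falls back to the copy constructor.
struct WithUserCopy {
  std::vector<int> indices;
  WithUserCopy() = default;
  WithUserCopy(const WithUserCopy& other) : indices(other.indices) {}
  WithUserCopy& operator=(const WithUserCopy& other) {
    indices = other.indices;
    return *this;
  }
};

// No user-declared copy operations: the compiler generates move operations
// that take over the vector's heap buffer without allocating.
struct WithImplicitMove {
  std::vector<int> indices;
};
```

Moving a `WithImplicitMove` leaves the destination pointing at the source's original buffer, while "moving" a `WithUserCopy` produces a freshly allocated copy and leaves the source intact.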
2019-09-24 11:09:58 +02:00
Christoph Hertzberg
e4c1b3c1d2
Fix implicit conversion warnings and use pnegate to negate packets
2019-09-23 16:07:43 +02:00
Christoph Hertzberg
ba0736fa8e
Fix (or mask away) conversion warnings introduced in 553caeb6a3
...
.
2019-09-23 15:58:05 +02:00
Rasmus Munk Larsen
1d5af0693c
Add support for asynchronous evaluation of tensor casting expressions.
2019-09-19 13:54:49 -07:00
Rasmus Munk Larsen
6de5ed08d8
Add generic PacketMath implementation of the Error Function (erf).
2019-09-19 12:48:30 -07:00
Rasmus Munk Larsen
28b6786498
Fix build on setups without AVX512DQ.
2019-09-19 12:36:09 -07:00
Deven Desai
e02d429637
Fix for the HIP build+test errors.
...
The errors were introduced by this commit : 6e215cf109
The fix is switching to using ::<math_func> instead std::<math_func> when compiling for GPU
2019-09-18 18:44:20 +00:00
Srinivas Vasudevan
df0816b71f
Merging eigen/eigen.
2019-09-16 19:33:29 -04:00
Srinivas Vasudevan
6e215cf109
Add Bessel functions to SpecialFunctions.
...
- Split SpecialFunctions files into a separate BesselFunctions file.
In particular add:
- Modified bessel functions of the second kind k0, k1, k0e, k1e
- Bessel functions of the first kind j0, j1
- Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Eugene Zhulenev
7c73296849
Revert accidental change to GCC diagnostics
2019-09-13 14:30:58 -07:00
Eugene Zhulenev
bf8866b466
Fix maybe-uninitialized warnings in TensorContractionThreadPool
2019-09-13 14:29:55 -07:00
Eugene Zhulenev
553caeb6a3
Use ThreadLocal container in TensorContractionThreadPool
2019-09-13 12:14:44 -07:00
Srinivas Vasudevan
facdec5aa7
Add packetized versions of i0e and i1e special functions.
...
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
- Move chebevl to GenericPacketMathFunctions.
A brief benchmark with building Eigen with FMA, AVX and AVX2 flags
Before:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 57.3 57.3 10000000
BM_eigen_i0e_double/8 398 398 1748554
BM_eigen_i0e_double/64 3184 3184 218961
BM_eigen_i0e_double/512 25579 25579 27330
BM_eigen_i0e_double/4k 205043 205042 3418
BM_eigen_i0e_double/32k 1646038 1646176 422
BM_eigen_i0e_double/256k 13180959 13182613 53
BM_eigen_i0e_double/1M 52684617 52706132 10
BM_eigen_i0e_float/1 28.4 28.4 24636711
BM_eigen_i0e_float/8 75.7 75.7 9207634
BM_eigen_i0e_float/64 512 512 1000000
BM_eigen_i0e_float/512 4194 4194 166359
BM_eigen_i0e_float/4k 32756 32761 21373
BM_eigen_i0e_float/32k 261133 261153 2678
BM_eigen_i0e_float/256k 2087938 2088231 333
BM_eigen_i0e_float/1M 8380409 8381234 84
BM_eigen_i1e_double/1 56.3 56.3 10000000
BM_eigen_i1e_double/8 397 397 1772376
BM_eigen_i1e_double/64 3114 3115 223881
BM_eigen_i1e_double/512 25358 25361 27761
BM_eigen_i1e_double/4k 203543 203593 3462
BM_eigen_i1e_double/32k 1613649 1613803 428
BM_eigen_i1e_double/256k 12910625 12910374 54
BM_eigen_i1e_double/1M 51723824 51723991 10
BM_eigen_i1e_float/1 28.3 28.3 24683049
BM_eigen_i1e_float/8 74.8 74.9 9366216
BM_eigen_i1e_float/64 505 505 1000000
BM_eigen_i1e_float/512 4068 4068 171690
BM_eigen_i1e_float/4k 31803 31806 21948
BM_eigen_i1e_float/32k 253637 253692 2763
BM_eigen_i1e_float/256k 2019711 2019918 346
BM_eigen_i1e_float/1M 8238681 8238713 86
After:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 15.8 15.8 44097476
BM_eigen_i0e_double/8 99.3 99.3 7014884
BM_eigen_i0e_double/64 777 777 886612
BM_eigen_i0e_double/512 6180 6181 100000
BM_eigen_i0e_double/4k 48136 48140 14678
BM_eigen_i0e_double/32k 385936 385943 1801
BM_eigen_i0e_double/256k 3293324 3293551 228
BM_eigen_i0e_double/1M 12423600 12424458 57
BM_eigen_i0e_float/1 16.3 16.3 43038042
BM_eigen_i0e_float/8 30.1 30.1 23456931
BM_eigen_i0e_float/64 169 169 4132875
BM_eigen_i0e_float/512 1338 1339 516860
BM_eigen_i0e_float/4k 10191 10191 68513
BM_eigen_i0e_float/32k 81338 81337 8531
BM_eigen_i0e_float/256k 651807 651984 1000
BM_eigen_i0e_float/1M 2633821 2634187 268
BM_eigen_i1e_double/1 16.2 16.2 42352499
BM_eigen_i1e_double/8 110 110 6316524
BM_eigen_i1e_double/64 822 822 851065
BM_eigen_i1e_double/512 6480 6481 100000
BM_eigen_i1e_double/4k 51843 51843 10000
BM_eigen_i1e_double/32k 414854 414852 1680
BM_eigen_i1e_double/256k 3320001 3320568 212
BM_eigen_i1e_double/1M 13442795 13442391 53
BM_eigen_i1e_float/1 17.6 17.6 41025735
BM_eigen_i1e_float/8 35.5 35.5 19597891
BM_eigen_i1e_float/64 240 240 2924237
BM_eigen_i1e_float/512 1424 1424 485953
BM_eigen_i1e_float/4k 10722 10723 65162
BM_eigen_i1e_float/32k 86286 86297 8048
BM_eigen_i1e_float/256k 691821 691868 1000
BM_eigen_i1e_float/1M 2777336 2777747 256
This shows anywhere from a 50% to 75% improvement on these operations.
I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).
Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
Srinivas Vasudevan
b052ec6992
Merged eigen/eigen into default
2019-09-11 18:01:54 -07:00
Deven Desai
cdb377d0cb
Fix for the HIP build+test errors introduced by the ndtri support.
...
The fixes needed are
* adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
* switching to using ::<math_func> instead std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
* removing an errant "j" from a testcase (don't know how that made it in to begin with!)
2019-09-06 16:03:49 +00:00
Gael Guennebaud
747c6a51ca
bug #1736 : fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i)
2019-09-11 15:40:07 +02:00
Gael Guennebaud
031f17117d
bug #1741 : fix self-adjoint*matrix, triangular*matrix, and triangular^1*matrix with a destination having a non-trivial inner-stride
2019-09-11 15:04:25 +02:00
Gael Guennebaud
459b2bcc08
Fix compilation of BLAS backend and frontend
2019-09-11 10:02:37 +02:00
Rasmus Larsen
97f1e1d89f
Merged in ezhulenev/eigen-01 (pull request PR-698)
...
ThreadLocal container that does not rely on thread local storage
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-09-10 23:19:33 +00:00
Eugene Zhulenev
d918bd9a8b
Update ThreadLocal to use separate Initialize/Release callables
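A deliberately simplified sketch of a per-thread container that avoids the `thread_local` keyword and takes separate initialize and release callables. `SimpleThreadLocal` is a hypothetical class; it uses a mutex-protected map keyed by thread id, which is an assumption made for brevity and is not how Eigen's ThreadLocal is implemented.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

template <typename T>
class SimpleThreadLocal {
 public:
  using Callback = std::function<void(T&)>;

  SimpleThreadLocal(Callback initialize, Callback release)
      : initialize_(std::move(initialize)), release_(std::move(release)) {}

  // Every value that was handed out is released exactly once.
  ~SimpleThreadLocal() {
    for (auto& entry : values_) release_(entry.second);
  }

  // Returns the calling thread's value, creating and initializing it on first use.
  T& local() {
    std::lock_guard<std::mutex> lock(mutex_);
    const std::thread::id self = std::this_thread::get_id();
    auto it = values_.find(self);
    if (it == values_.end()) {
      it = values_.emplace(self, T()).first;
      initialize_(it->second);
    }
    // References into std::unordered_map stay valid across later insertions.
    return it->second;
  }

 private:
  Callback initialize_;
  Callback release_;
  std::mutex mutex_;
  std::unordered_map<std::thread::id, T> values_;
};
```

Keeping initialization and release as separate callables lets the owner decide how each per-thread value is set up and torn down without the container knowing anything about `T` beyond default construction.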
2019-09-10 16:13:32 -07:00
Gael Guennebaud
afa8d13532
Fix some implicit literal to Scalar conversions in SparseCore
2019-09-11 00:03:07 +02:00
Gael Guennebaud
c06e6fd115
bug #1741 : fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride
2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956
bug #1741 : fix C.noalias() = A*C; with C.innerStride()!=1
2019-09-10 16:25:24 +02:00
Eugene Zhulenev
e3dec4dcc1
ThreadLocal container that does not rely on thread local storage
2019-09-09 15:18:14 -07:00
Gael Guennebaud
17226100c5
Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.
...
Another solution would have been to make pshift* fully generic template functions with
partial specialization which is always a mess in c++03.
2019-09-06 09:26:04 +02:00
Gael Guennebaud
55b63d4ea3
Fix compilation without vector engine available (e.g., x86 with SSE disabled):
...
-> ppolevl is required by ndtri even for the scalar path
2019-09-05 18:16:46 +02:00
Srinivas Vasudevan
a9cf823db7
Merged eigen/eigen
2019-09-04 23:50:52 -04:00
Gael Guennebaud
e6c183f8fd
Fix doc issues regarding ndtri
2019-09-04 23:00:21 +02:00
Gael Guennebaud
5702a57926
Fix possible warning regarding strict equality comparisons
2019-09-04 22:57:04 +02:00
Srinivas Vasudevan
99036a3615
Merging from eigen/eigen.
2019-09-03 15:34:47 -04:00
Eugene Zhulenev
a8d264fa9c
Add test for const TensorMap underlying data mutation
2019-09-03 11:38:39 -07:00
Eugene Zhulenev
f68f2bba09
TensorMap constness should not change underlying storage constness
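The intended semantics resemble those of a pointer: constness of the view object is separate from constness of the elements it refers to. The sketch below uses a hypothetical `View` class, not Eigen's TensorMap.

```cpp
#include <cstddef>

// Hypothetical non-owning view over a contiguous buffer.
template <typename T>
class View {
 public:
  View(T* data, std::size_t size) : data_(data), size_(size) {}

  // A const member function that still returns a mutable reference when T is
  // non-const, just as a `T* const` pointer still allows writes through it.
  T& operator()(std::size_t i) const { return data_[i]; }

  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};
```

Under this model, read-only access is requested through the element type, for example `View<const float>`, rather than by making the view object itself const.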
2019-09-03 11:08:09 -07:00
Gael Guennebaud
8e7e3d9bc8
Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688)
2019-09-03 13:09:03 +02:00
Srinivas Vasudevan
e38dd48a27
PR 681: Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
Eugene Zhulenev
f59bed7a13
Change typedefs from private to protected to fix MSVC compilation
2019-09-03 19:11:36 -07:00
Eugene Zhulenev
47fefa235f
Allow move-only done callback in TensorAsyncDevice
2019-09-03 17:20:56 -07:00
Srinivas Vasudevan
18ceb3413d
Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
d55d392e7b
Fix bugs in log1p and expm1 where repeated using statements would clobber each other.
...
Add specializations for complex types since std::log1p and std::expm1 do not support complex.
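A sketch of what a complex fallback can look like. `log1p_any` is a hypothetical name, and the plain `log(1 + z)` formula is an assumption chosen for brevity; it loses accuracy for very small |z| and is not claimed to match Eigen's specialization.

```cpp
#include <cmath>
#include <complex>

// Real arguments: forward to the standard library.
inline double log1p_any(double x) { return std::log1p(x); }

// Complex arguments: std::log1p has no complex overload, so evaluate log(1 + z).
template <typename T>
std::complex<T> log1p_any(const std::complex<T>& z) {
  return std::log(std::complex<T>(T(1), T(0)) + z);
}
```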
2019-08-08 16:27:32 -07:00
Rasmus Munk Larsen
85928e5f47
Guard against repeated definition of EIGEN_MPL2_ONLY
2019-08-07 14:19:00 -07:00
Rasmus Munk Larsen
facc4e4536
Disable tests for contraction with output kernels when using libxsmm, which does not support this.
2019-08-07 14:11:15 -07:00
Rasmus Munk Larsen
eab7e52db2
[Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
...
A few benchmark numbers:
name old time/op new time/op delta
BM_Xent_16_10000_cpu 448µs ± 3% 389µs ± 2% -13.21% (p=0.008 n=5+5)
BM_Xent_32_10000_cpu 575µs ± 6% 454µs ± 3% -21.00% (p=0.008 n=5+5)
BM_Xent_64_10000_cpu 933µs ± 4% 712µs ± 1% -23.71% (p=0.008 n=5+5)
2019-08-07 12:57:42 -07:00
Rasmus Munk Larsen
0987126165
Clean up unnecessary namespace specifiers in TensorBlock.h.
2019-08-07 12:12:52 -07:00
Gael Guennebaud
0050644b23
Fix doc regarding alignment and c++17
2019-08-04 01:09:41 +02:00
Rasmus Munk Larsen
e2999d4c38
Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662 .
...
The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4 128173 231326 2922 1.610G items/s
VS
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4 146683 246914 2719 1.509G items/s
2019-08-02 11:18:13 -07:00
Alberto Luaces
c694be1214
Fixed Tensor documentation formatting.
2019-07-23 09:24:06 +00:00
Gael Guennebaud
15f3d9d272
More colamd cleanup:
...
- Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix signed/unsigned warning
- move some ugly free functions as member functions
2019-09-03 00:50:51 +02:00
Anshul Jaiswal
a4d1a6cd7d
Eigen_Colamd.h updated to replace constexpr with consts and enums.
2019-08-17 05:29:23 +00:00
Anshul Jaiswal
283558face
Ordering.h edited to fix dependencies on Eigen_Colamd.h
2019-08-15 20:21:56 +00:00
Anshul Jaiswal
39f30923c2
Eigen_Colamd.h edited replacing macros with constexprs and functions.
2019-08-15 20:15:19 +00:00
Anshul Jaiswal
0a6b553ecf
Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions
2019-07-21 04:53:31 +00:00
Kyle Vedder
f22b7283a3
Added leading asterisk for Doxygen to consume as it was removing asterisk intended to be part of the code.
2019-07-18 18:12:14 +00:00
Michael Grupp
6e17491f45
Fix typo in Umeyama method documentation
2019-07-17 11:20:41 +00:00
Christoph Hertzberg
e0f5a2a456
Remove {} accidentally added in previous commit
2019-07-18 20:22:17 +02:00
Christoph Hertzberg
ea6d7eb32f
Move variadic constructors outside #ifndef EIGEN_PARSED_BY_DOXYGEN block, to make it actually appear in the generated documentation.
2019-07-12 19:46:37 +02:00
Christoph Hertzberg
9237883ff1
Escape \# inside doxygen docu
2019-07-12 19:45:13 +02:00
Christoph Hertzberg
c2671e5315
Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING
...
Also, document LinSpaced only where it is implemented
2019-07-12 19:43:32 +02:00
Eugene Zhulenev
3cd148f983
Fix expression evaluation heuristic for TensorSliceOp
2019-07-09 12:10:26 -07:00
Rasmus Munk Larsen
23b958818e
Fix compiler for unsigned integers.
2019-07-09 11:18:25 -07:00
Eugene Zhulenev
6083014594
Add outer/inner chipping optimization for chipping dimension specified at runtime
2019-07-03 11:35:25 -07:00
Deven Desai
7eb2e0a95b
adding the EIGEN_DEVICE_FUNC attribute to the constCast routine.
...
Not having this attribute results in the following failures in the `--config=rocm` TF build.
```
In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
In file included from ./tensorflow/core/framework/register_types.h:20:
In file included from ./tensorflow/core/framework/numeric_types.h:20:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
typename Storage::Type result = constCast(m_impl.data());
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::GpuDevice>::data' requested here
return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());
```
Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
2019-07-02 20:02:46 +00:00
Gael Guennebaud
ef8aca6a89
Merged in codeplaysoftware/eigen (pull request PR-667)
...
[SYCL] :
Approved-by: Gael Guennebaud <g.gael@free.fr >
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-07-02 12:45:23 +00:00
Eugene Zhulenev
4ac93f8edc
Allocate non-const scalar buffer for block evaluation with DefaultDevice
2019-07-01 10:55:19 -07:00
Mehdi Goli
9ea490c82c
[SYCL] :
...
* Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
* Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
* Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
* Fixing the SYCL device macro in SpecialFunctionsImpl.h.
2019-07-01 16:27:28 +01:00
Eugene Zhulenev
81a03bec75
Fix TensorReverse on GPU with m_stride[i]==0
2019-06-28 15:50:39 -07:00
Rasmus Munk Larsen
8053eeb51e
Fix CUDA compilation error for pselect<half>.
2019-06-28 12:07:29 -07:00
Rasmus Munk Larsen
74a9dd1102
Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit.
2019-06-28 11:56:21 -07:00
Rasmus Munk Larsen
70d4020ad9
Remove comma causing warning in c++03 mode.
2019-06-28 11:39:45 -07:00
Eugene Zhulenev
6e7c76481a
Merge with Eigen head
2019-06-28 11:22:46 -07:00
Eugene Zhulenev
878845cb25
Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred
2019-06-28 11:13:44 -07:00
Rasmus Munk Larsen
1f61aee5ca
[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
...
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:11:56 -07:00
Mehdi Goli
7d08fa805a
[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
...
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:08:23 +01:00
Mehdi Goli
16a56b2ddd
[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
...
* Adding SYCL memory model
* Enabling/Disabling SYCL backend in Core
* Supporting Vectorization
2019-06-27 12:25:09 +01:00
Christoph Hertzberg
adec097c61
Remove extra comma (causes warnings in C++03)
2019-06-26 16:14:28 +02:00
Eugene Zhulenev
229db81572
Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp
2019-06-25 15:41:37 -07:00
Deven Desai
ba506d5bd2
Fix for a ROCm/HIP-specific compile error introduced by a recent commit.
2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen
c9394d7a0e
Remove extra "one" in comment.
2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen
b8f8dac4eb
Update comment as suggested by tra@google.com.
2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen
e5e63c2cad
Fix grammar.
2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen
302a404b7e
Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause.
2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen
b5237f53b1
Fix CUDA build on Mac.
2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen
988f24b730
Various fixes for packet ops.
...
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
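The XOR trick named in item 3 can be illustrated on raw binary16 bits. This is a minimal scalar sketch, not Eigen's packet code: the helper name `negate_half_bits` is hypothetical, and it assumes the half value is stored as IEEE 754 binary16 bits in a `uint16_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Eigen API): negate a binary16 value given as raw bits.
// Bit 15 is the sign bit, so flipping it negates the value without touching
// the exponent or mantissa. This also maps +0 to -0 and preserves NaN payloads.
inline std::uint16_t negate_half_bits(std::uint16_t bits) {
  const std::uint16_t kSignMask = 0x8000u;
  return static_cast<std::uint16_t>(bits ^ kSignMask);
}
```

A vectorized version would presumably apply the same mask to every lane with a single packet XOR, which is why this is cheaper than converting to float and back.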
Christoph Hertzberg
e0be7f30e1
bug #1724 : Mask buggy warnings with g++-7
...
(grafted from 427f2f66d6)
2019-06-14 14:57:46 +02:00
Anshul Jaiswal
fab51d133e
Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD
...
to prevent conflicts with other libraries / code.
2019-06-08 21:09:06 +00:00
Eugene Zhulenev
79c402e40e
Fix shadow warnings in TensorContractionThreadPool
2019-08-30 15:38:31 -07:00
Eugene Zhulenev
edf2ec28d8
Fix block mapper type name in TensorExecutor
2019-08-30 15:29:25 -07:00
Eugene Zhulenev
f0b36fb9a4
evalSubExprsIfNeededAsync + async TensorContractionThreadPool
2019-08-30 15:13:38 -07:00
Eugene Zhulenev
619cea9491
Revert accidentally removed <memory> header from ThreadPool
2019-08-30 14:51:17 -07:00
Eugene Zhulenev
66665e7e76
Asynchronous expression evaluation with TensorAsyncDevice
2019-08-30 14:49:40 -07:00
Rasmus Munk Larsen
f6c51d9209
Fix missing header inclusion and colliding definitions for half type casting, which broke
...
build with -march=native on Haswell/Skylake.
2019-08-30 14:03:29 -07:00
Eugene Zhulenev
bc40d4522c
Const correctness in TensorMap<const Tensor<T, ...>> expressions
2019-08-28 17:46:05 -07:00
Rasmus Munk Larsen
1187bb65ad
Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.
2019-08-28 12:20:21 -07:00
Eugene Zhulenev
6e77f9bef3
Remove shadow warnings in TensorDeviceThreadPool
2019-08-28 10:32:19 -07:00
Rasmus Munk Larsen
9aba527405
Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.
2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
b021cdea6d
Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
2019-08-27 11:30:31 -07:00
Rasmus Larsen
84fefdf321
Merged in ezhulenev/eigen-01 (pull request PR-683)
...
Asynchronous parallelFor in Eigen ThreadPoolDevice
2019-08-26 21:49:17 +00:00
maratek
8b5ab0e4dd
Fix get_random_seed on Native Client
...
Newlib in Native Client SDK does not provide ::random function.
Implement get_random_seed for NaCl using ::rand, similarly to Windows version.
2019-08-23 15:25:56 -07:00
Eugene Zhulenev
6901788013
Asynchronous parallelFor in Eigen ThreadPoolDevice
2019-08-22 10:50:51 -07:00
Christoph Hertzberg
2fb24384c9
Merged in jaopaulolc/eigen (pull request PR-679)
...
Fixes for Altivec/VSX and compilation with clang on PowerPC
2019-08-22 15:57:33 +00:00
Rasmus Larsen
57f6b62597
Merged in rmlarsen/eigen (pull request PR-680)
...
Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
2019-08-22 00:25:29 +00:00
Eugene Zhulenev
071311821e
Remove XSMM support from Tensor module
2019-08-19 11:44:25 -07:00
João P. L. de Carvalho
5ac7984ffa
Fix debug macros in p{load,store}u
2019-08-14 11:59:12 -06:00
João P. L. de Carvalho
db9147ae40
Add missing pcmp_XX methods for double/Packet2d
...
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen
a3298b22ec
Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
...
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)
The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
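For readers unfamiliar with Kahan's formulas, the following scalar sketch shows the idea, including the explicit check for +infinity that the message says costs an extra branch. It is an illustration under assumptions, not the code from this commit: the function names `kahan_log1p` and `kahan_expm1` are hypothetical, and the commit's vectorized path works on packets rather than on a single `double`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical scalar sketch of Kahan's log1p formula:
//   log1p(x) = x * log(u) / (u - 1), with u = 1 + x.
// The rounding error in u cancels between log(u) and (u - 1).
inline double kahan_log1p(double x) {
  const double u = 1.0 + x;
  if (u == 1.0) return x;  // x is too small to change 1.0; log1p(x) ~ x
  if (u == std::numeric_limits<double>::infinity()) return u;  // log1p(+inf) = +inf
  return x * (std::log(u) / (u - 1.0));  // NaN for x < -1, -inf for x == -1
}

// Hypothetical scalar sketch of Kahan's expm1 formula:
//   expm1(x) = (u - 1) * x / log(u), with u = exp(x).
inline double kahan_expm1(double x) {
  const double u = std::exp(x);
  if (u == 1.0) return x;  // exp(x) rounded to 1; expm1(x) ~ x
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;  // exp(x) is negligible next to 1
  if (u == std::numeric_limits<double>::infinity()) return u;  // overflow or x == +inf
  return um1 * x / std::log(u);
}
```

Without the infinity checks, both formulas would evaluate an `inf / inf` expression and return NaN for very large arguments.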
João P. L. de Carvalho
787f6ef025
Fix packed load/store for PowerPC's VSX
...
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
For double/Packet2d, vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f.
Silencing some weird warnings thrown by some GCC versions. Such warnings are not thrown by Clang.
2019-08-09 16:02:55 -06:00
João P. L. de Carvalho
4d29aa0294
Fix offset argument of ploadu/pstoreu for Altivec
...
If no offset is given, then it should be zero.
Also passes full address to vec_vsx_ld/st builtins.
Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
Removes unnecessary casts.
2019-08-09 15:59:26 -06:00
João P. L. de Carvalho
66d073c38e
bug #1718 : Add cast to successfully compile with clang on PowerPC
...
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
2019-08-09 15:56:26 -06:00
Rasmus Munk Larsen
6d432eae5d
Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off.
2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen
f715f6e816
Add workaround for choosing the right include files with FP16C support with clang.
2019-06-05 13:36:37 -07:00
Justin Carpentier
ffaf658ecd
PR 655: Fix missing Eigen namespace in Macros
2019-06-05 09:51:59 +02:00
Mehdi Goli
0b24e1cb5c
[SYCL] Adding the SYCL memory model. The SYCL memory model provides :
...
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
* an interface for placeholder accessor to behave like a pointer on both host and device
2019-07-01 16:02:30 +01:00
Rasmus Larsen
c1b0aea653
Merged in Artem-B/eigen (pull request PR-654)
...
Minor build improvements
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-05-31 22:27:04 +00:00
Rasmus Munk Larsen
b08527b0c1
Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures.
2019-05-31 15:26:06 -07:00
tra
b4c49bf00e
Minor build improvements
...
* Allow specifying multiple GPU architectures. E.g.:
cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
* Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda
which may not be the right location, if cmake was invoked with
-DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
2019-05-31 14:08:34 -07:00
Christoph Hertzberg
5614400581
digits10() needs to return an integer
...
Problem reported on https://stackoverflow.com/questions/56395899
2019-05-31 15:45:41 +02:00
Rasmus Larsen
36e0a2b93f
Merged in deven-amd/eigen-hip-fix-190524 (pull request PR-649)
...
fix for HIP build errors that were introduced by a commit earlier this week
2019-05-24 16:05:31 +00:00
Deven Desai
2c38930161
fix for HIP build errors that were introduced by a commit earlier this week
2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves
56bc4974fb
GEMV: remove double declaration of constant.
...
That was hurting users with compilers that would object to proceed with
that:
"""
./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
LhsPacketSize = Traits::LhsPacketSize,
^
./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
static const Index LhsPacketSize = Traits::LhsPacketSize;
"""
2019-05-23 14:50:29 -07:00
Christoph Hertzberg
ac21a08c13
Cast Index to RealScalar
...
This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types).
Reported by Peter Anderson-Sprecher via eMail
2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen
3eb5ad0ed0
Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177
...
and are part of LLVM 3.8.0.
2019-05-20 17:19:20 -07:00
Rasmus Larsen
e92486b8c3
Merged in rmlarsen/eigen (pull request PR-643)
...
Make Eigen build with cuda 10 and clang.
Approved-by: Justin Lebar <justin.lebar@gmail.com >
2019-05-20 17:02:39 +00:00
Rasmus Munk Larsen
fd595d42a7
Merge
2019-05-20 09:39:11 -07:00
Gael Guennebaud
cc7ecbb124
Merged in scramsby/eigen (pull request PR-646)
...
Eigen: Fix MSVC C++17 language standard detection logic
2019-05-20 07:19:10 +00:00
Eugene Zhulenev
01654d97fa
Prevent potential division by zero in TensorExecutor
2019-05-17 14:02:25 -07:00
Rasmus Larsen
78d3015722
Merged in ezhulenev/eigen-01 (pull request PR-644)
...
Always evaluate Tensor expressions with broadcasting via tiled evaluation code path
2019-05-17 19:44:25 +00:00
Rasmus Larsen
bf9cbed8d0
Merged in glchaves/eigen (pull request PR-635)
...
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-05-17 19:40:50 +00:00
Eugene Zhulenev
96a276803c
Always evaluate Tensor expressions with broadcasting via tiled evaluation code path
2019-05-16 16:15:45 -07:00
Rasmus Munk Larsen
ab0a30e429
Make Eigen build with cuda 10 and clang.
2019-05-15 13:32:15 -07:00
Rasmus Munk Larsen
734a50dc60
Make Eigen build with cuda 10 and clang.
2019-05-15 13:32:15 -07:00
Rasmus Larsen
c8d8d5c0fc
Merged in rmlarsen/eigen_threadpool (pull request PR-640)
...
Fix deadlocks in thread pool.
Approved-by: Eugene Zhulenev <ezhulenev@google.com >
2019-05-13 20:04:35 +00:00
Christoph Hertzberg
5f32b79edc
Collapsed revision from PR-641
...
* SparseLU.h - corrected example, it didn't compile
* Changed encoding back to UTF8
2019-05-13 19:02:30 +02:00
Anuj Rawat
ad372084f5
Removing unused API to fix compile error in TensorFlow due to
...
AVX512VL, AVX512BW usage
2019-05-12 14:43:10 +00:00
Christoph Hertzberg
4ccd1ece92
bug #1707 : Fix deprecation warnings, or disable warnings when testing deprecated functions
2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen
d3ef7cf03e
Fix build with clang on Windows.
2019-05-09 11:07:04 -07:00
Rasmus Munk Larsen
e5ac8cbd7a
A) fix deadlocks in thread pool caused by EventCount
...
This fixed 2 deadlocks caused by sloppiness in the EventCount logic.
Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm:
01da8caf00
bug #1 (Prewait):
Prewait must not consume existing signals.
Consider the following scenario.
There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty.
Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait.
Thread 2 checks the queue and now is going to call Prewait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded).
Now thread 2 resumes and calls Prewait and takes away the signal.
Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks.
As the result we have 2 tasks, but only 1 thread is running.
bug #2 (CancelWait):
CancelWait must not take away a signal if it's not sure that the signal was meant for this thread.
When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm):
(a) the registered waiter notices presence of the new task and does not block
(b) the signaler notices presence of the waiter and wakes it
(c) both the waiter notices presence of the new task and the signaler notices presence of the waiter
[the only outcome that must not be possible is that neither of them notices the other, because it would lead to a deadlock]
CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signal for us and we take away a signal meant for somebody else.
Consider:
Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded).
Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1).
Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks.
As the result we have 2 tasks, but only 1 thread is running.
Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2.
This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running.
B) fix deadlock in thread pool caused by RunQueue
This fixed a deadlock caused by sloppiness in the RunQueue logic.
Most likely this was introduced with the non-blocking thread pool.
The deadlock only affects workloads that require parallelism.
Most computational tasks don't require parallelism.
PopBack must not fail spuriously. If it does, it can effectively lead to single thread consuming several wake up signals.
Consider 2 worker threads are blocked.
External thread submits a task. One of the threads is woken.
It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock).
The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait).
Now external thread submits another task and signals EventCount again.
The signal is consumed by the first thread again. But now we have 2 tasks pending but only 1 worker thread running.
It may be possible to fix this in a different way: make EventCount::CancelWait forward wakeup signal to a blocked thread rather than consuming it. But this looks more complex and I am not 100% sure that it will fix the bug.
It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
2019-05-08 10:16:46 -07:00
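The Prewait / CommitWait / CancelWait protocol discussed in this message can be modeled with a much simpler, lock-based stand-in. The sketch below is an assumption-laden illustration, not Eigen's lock-free EventCount: the class name `SketchEventCount` is hypothetical, and it uses a mutex, a condition variable, and an epoch counter. Its point is the invariant the message argues for: a notification issued after one thread's Prewait cannot be consumed by another thread's Prewait or CancelWait.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical lock-based model of an event count (not Eigen's implementation).
class SketchEventCount {
 public:
  // Step 1: snapshot the current epoch before re-checking the work queue.
  std::uint64_t Prewait() {
    std::lock_guard<std::mutex> lock(mu_);
    return epoch_;
  }

  // Step 2a: the re-check found no work. Block until some Notify has
  // happened since the matching Prewait; return at once if one already has.
  void CommitWait(std::uint64_t seen_epoch) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return epoch_ != seen_epoch; });
  }

  // Step 2b: the re-check found work. Nothing is consumed here, so a
  // notification meant for another waiter is never taken away.
  void CancelWait() {}

  // Called by a producer after enqueueing a task.
  void Notify() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++epoch_;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::uint64_t epoch_ = 0;
};
```

The trade-off mirrors the one described in the message: waking every waiter on each notification is stricter and burns more CPU than a counted-signal scheme, but it rules out the lost-wakeup scenarios above.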
Michael Tesch
c5019f722b
Use pade for matrix exponential also for complex values.
2019-05-08 17:04:55 +02:00
Eugene Zhulenev
45b40d91ca
Fix AVX512 & GCC 6.3 compilation
2019-05-07 16:44:55 -07:00
Christoph Hertzberg
e6667a7060
Fix stupid shadow-warnings (with old clang versions)
2019-05-07 18:32:19 +02:00
Christoph Hertzberg
e54dc24d62
Restore C++03 compatibility
2019-05-07 18:30:44 +02:00
Christoph Hertzberg
cca76c272c
Restore C++03 compatibility
2019-05-06 16:18:22 +02:00
Rasmus Munk Larsen
8e33844fc7
Fix traits for scalar_logistic_op.
2019-05-03 15:49:09 -07:00
Scott Ramsby
ff06ef7584
Eigen: Fix MSVC C++17 language standard detection logic
...
To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17).
See these web pages for more details:
https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
2019-05-03 14:14:09 -07:00
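The detection logic described here can be sketched as a small preprocessor block. The macro names `SKETCH_CPLUSPLUS` and `SKETCH_HAS_CXX17` are hypothetical and are not the macros Eigen defines; only `_MSVC_LANG` and `__cplusplus` are real predefined macros.

```cpp
#include <cassert>

// Prefer _MSVC_LANG when present: it reflects the language standard selected
// with /std:, whereas _MSC_VER only identifies the compiler version.
#if defined(_MSVC_LANG)
#define SKETCH_CPLUSPLUS _MSVC_LANG
#else
#define SKETCH_CPLUSPLUS __cplusplus
#endif

// 201703L is the value both macros take for C++17.
#if SKETCH_CPLUSPLUS >= 201703L
#define SKETCH_HAS_CXX17 1
#else
#define SKETCH_HAS_CXX17 0
#endif

// Small runtime accessor so the result can be inspected or logged.
inline bool sketch_has_cxx17() { return SKETCH_HAS_CXX17 != 0; }
```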
Eugene Zhulenev
e9f0eb8a5e
Add masked_store_available to unpacket_traits
2019-05-02 14:52:58 -07:00
Eugene Zhulenev
96e30e936a
Add masked pstoreu for Packet16h
2019-05-02 14:11:01 -07:00
Eugene Zhulenev
b4010f02f9
Add masked pstoreu to AVX and AVX512 PacketMath
2019-05-02 13:14:18 -07:00
Gael Guennebaud
578407f42f
Fix regression in changeset ae33e866c7
2019-05-02 15:45:21 +02:00
Rasmus Larsen
ac50afaffa
Merged in ezhulenev/eigen-01 (pull request PR-633)
...
Check if gpu_assert was overridden in TensorGpuHipCudaDefines
2019-04-29 16:29:35 +00:00
Gustavo Lima Chaves
d4dcb71bcb
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
...
We take advantage of smaller SIMD registers as well, in that case.
Gains up to 3x for select input sizes.
2019-04-26 14:12:39 -07:00
Andy May
ae33e866c7
Fix compilation with PGI version 19
2019-04-25 21:23:19 +01:00
Gael Guennebaud
665ac22cc6
Merged in ezhulenev/eigen-01 (pull request PR-632)
...
Fix doxygen warnings
2019-04-25 20:02:20 +00:00
Eugene Zhulenev
01d7e6ee9b
Check if gpu_assert was overridden in TensorGpuHipCudaDefines
2019-04-25 11:19:17 -07:00
Eugene Zhulenev
8ead5bb3d8
Fix doxygen warnings to enable static code analysis
2019-04-24 12:42:28 -07:00
Eugene Zhulenev
07355d47c6
Get rid of SequentialLinSpacedReturnType deprecation warnings in DenseBase.h
2019-04-24 11:01:35 -07:00
Rasmus Munk Larsen
144ca33321
Remove deprecation annotation from typedef Eigen::Index Index, as it would generate too many build warnings.
2019-04-24 08:50:07 -07:00
Eugene Zhulenev
a7b7f3ca8a
Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix few other doxygen warnings
2019-04-23 17:23:19 -07:00
Eugene Zhulenev
68a2a8c445
Use packet ops instead of AVX2 intrinsics
2019-04-23 11:41:02 -07:00
Anuj Rawat
8c7a6feb8e
Adding lowlevel APIs for optimized RHS packet load in TensorFlow
...
SpatialConvolution
Low-level APIs are added in order to optimize packet load in gemm_pack_rhs
in TensorFlow SpatialConvolution. The optimization is for scenario when a
packet is split across 2 adjacent columns. In this case we read it as two
'partial' packets and then merge these into 1. Currently this only works for
Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
packet types (such as Packet8d) also.
This optimization shows significant speedup in SpatialConvolution with
certain parameters. Some examples are below.
Benchmark parameters are specified as:
Batch size, Input dim, Depth, Num of filters, Filter dim
Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.
AVX512:
Parameters | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5 |2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128, 24x24, 1, 64, 8x8 |2.00X, 1.98X, 1.93X, 1.91X, 1.91X
32, 24x24, 3, 64, 5x5 |2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128, 24x24, 3, 64, 3x3 |1.51X, 1.45X, 1.45X, 1.67X, 1.57X
32, 14x14, 24, 64, 5x5 |1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128, 3, 96, 11x11 |2.17X, 2.18X, 2.19X, 2.20X, 2.18X
AVX2:
Parameters | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5 | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
32, 24x24, 3, 64, 5x5 | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128, 24x24, 1, 64, 5x5 | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128, 24x24, 3, 64, 3x3 | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128, 3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X
In the higher level benchmark cifar10, we observe a runtime improvement
of around 6% for AVX512 on Intel Skylake server (8 cores).
On lower level PackRhs micro-benchmarks specified in TensorFlow
tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe
the following runtime numbers:
AVX512:
Parameters | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) | 41350 | 15073 | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) | 7277 | 7341 | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) | 8675 | 8681 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) | 24155 | 16079 | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) | 25052 | 17152 | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 18269 | 18345 | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 19468 | 19872 | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) | 156060 | 42432 | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) | 132701 | 36944 | 3.59X
AVX2:
Parameters | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) | 26233 | 12393 | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) | 6091 | 6062 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) | 7427 | 7408 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) | 23453 | 20826 | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) | 23167 | 22091 | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422 | 23682 | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165 | 23663 | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) | 72689 | 44969 | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) | 61732 | 39779 | 1.55X
All benchmarks on Intel Skylake server with 8 cores.
2019-04-20 06:46:43 +00:00
Christoph Hertzberg
4270c62812
Split the implementation of i?amax/min into two. Based on PR-627 by Sameer Agarwal.
...
Like the Netlib reference implementation, I*AMAX now uses the L1-norm instead of the L2-norm for each element. Changed I*MIN accordingly.
2019-04-15 17:18:03 +02:00
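The difference between the two norms matters only for complex inputs. The sketch below is a hedged illustration, not the code from this commit: the function name `sketch_iamax` is hypothetical, it returns a 0-based index (the BLAS routines return a 1-based one), and it returns 0 for an empty vector.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the element with the largest |re| + |im|,
// which is the per-element measure the Netlib reference uses for complex types.
inline std::size_t sketch_iamax(const std::vector<std::complex<double> >& v) {
  std::size_t best = 0;
  double best_val = -1.0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    const double l1 = std::fabs(v[i].real()) + std::fabs(v[i].imag());
    if (l1 > best_val) {  // strict comparison keeps the first maximum
      best_val = l1;
      best = i;
    }
  }
  return best;
}
```

For the input {6i, 3+4i}, the per-element L1 values are 6 and 7, so this sketch selects the second element, whereas a comparison by modulus (6 versus 5) would select the first.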
Rasmus Munk Larsen
039ee52125
Tweak cost model for tensor contraction when parallelizing over the inner dimension.
...
https://bitbucket.org/snippets/rmlarsen/MexxLo
2019-04-12 13:35:10 -07:00
Jonathon Koyle
9a3f06d836
Update ThreadPoolDevice example to include ThreadPool creation and passing pointer into constructor.
2019-04-10 10:02:33 -06:00
Deven Desai
66a885b61e
adding EIGEN_DEVICE_FUNC to the recently added TensorContractionKernel constructor. Not having the EIGEN_DEVICE_FUNC attribute on it was leading to compiler errors when compiling Eigen in the ROCm/HIP path
2019-04-08 13:45:08 +00:00
Eugene Zhulenev
629ddebd15
Add missing semicolon
2019-04-02 15:04:26 -07:00
Eugene Zhulenev
4e2f6de1a8
Add support for custom packed Lhs/Rhs blocks in tensor contractions
2019-04-01 11:47:31 -07:00
Gael Guennebaud
45e65fbb77
bug #1695 : fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign.
2019-03-27 20:16:58 +01:00
William D. Irons
8de66719f9
Collapsed revision from PR-619
...
* Add support for pcmp_eq in AltiVec/Complex.h
* Fixed implementation of pcmp_eq for double
The new logic is based on the logic from NEON for double.
2019-03-26 18:14:49 +00:00
Gael Guennebaud
f11364290e
ICC does not support -fno-unsafe-math-optimizations
2019-03-22 09:26:24 +01:00
David Tellenbach
3031d57200
PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTEN
2019-03-21 02:21:04 +01:00
Deven Desai
51e399fc15
updates requested in the PR feedback. Also dropping code within #ifdef EIGEN_HAS_OLD_HIP_FP16
2019-03-19 21:45:25 +00:00
Deven Desai
2dbea5510f
Merged eigen/eigen into default
2019-03-19 16:52:38 -04:00
Rasmus Larsen
5c93b38c5f
Merged in rmlarsen/eigen (pull request PR-618)
...
Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>.
Approved-by: Gael Guennebaud <g.gael@free.fr >
2019-03-18 15:51:55 +00:00
Gael Guennebaud
48898a988a
fix unit test in c++03: c++03 does not allow passing local or anonymous enum as template param
2019-03-18 11:38:36 +01:00
Gael Guennebaud
cf7e2e277f
bug #1692 : enable enum as sizes of Matrix and Array
2019-03-17 21:59:30 +01:00
Rasmus Munk Larsen
e42f9aa68a
Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>.
2019-03-15 17:15:14 -07:00
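One way to read "clipping outside [-18:18]" is that the argument is clamped to that interval before the logistic function is evaluated, so that any two code paths applying the same clamp agree outside the range. The sketch below assumes that reading; the function name `sketch_logistic_clipped` is hypothetical, and the commit's actual approximation is not reproduced here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar sketch: clamp the argument to [-18, 18], then evaluate
// the logistic function 1 / (1 + exp(-x)) on the clamped value.
inline float sketch_logistic_clipped(float x) {
  const float kLimit = 18.0f;  // assumed clipping bound, taken from the message
  const float clamped = std::fmin(std::fmax(x, -kLimit), kLimit);
  return 1.0f / (1.0f + std::exp(-clamped));
}
```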
Rasmus Larsen
1936aac43f
Merged in tellenbach/eigen/sykline_consistent_include_guards (pull request PR-617)
...
Fix include guard comments for Skyline module
2019-03-15 20:04:56 +00:00
David Tellenbach
bd9c2ae3fd
Fix include guard comments
2019-03-15 15:29:17 +01:00
Rasmus Munk Larsen
8450a6d519
Clean up half packet traits and add a few more missing packet ops.
2019-03-14 15:18:06 -07:00
David Tellenbach
b013176e52
Remove undefined std::complex<int>
2019-03-14 11:40:28 +01:00
David Tellenbach
97f9a46cb9
PR 593: Add variadic ctor for DiagonalMatrix with unit tests
2019-03-14 10:18:24 +01:00
Gael Guennebaud
45ab514fe2
revert debug stuff
2019-03-14 10:08:12 +01:00
Rasmus Munk Larsen
6a34003141
Remove EIGEN_MPL2_ONLY guard in IncompleteCholesky that is no longer needed after the AMD reordering code was relicensed to MPL2.
2019-03-13 11:52:41 -07:00
Gael Guennebaud
d7d2f0680e
bug #1684 : partially workaround clang's 6/7 bug #40815
2019-03-13 10:40:01 +01:00
Rasmus Larsen
690f0795d0
Merged in rmlarsen/eigen (pull request PR-615)
...
Clean up PacketMathHalf.h and add a few missing logical packet ops.
2019-03-12 16:09:48 +00:00
Thomas Capricelli
1901433674
erm.. use proper id
2019-03-12 13:53:38 +01:00
Thomas Capricelli
90302aa8c9
update tracking code
2019-03-12 13:47:01 +01:00
Rasmus Munk Larsen
77f7d4a894
Clean up PacketMathHalf.h and add a few missing logical packet ops.
2019-03-11 17:51:16 -07:00
Eugene Zhulenev
001f10e3c9
Fix segfaults with cuda compilation
2019-03-11 09:43:33 -07:00
Eugene Zhulenev
899c16fa2c
Fix a bug in TensorGenerator for 1d tensors
2019-03-11 09:42:01 -07:00
Eugene Zhulenev
0f8bfff23d
Fix a data race in NonBlockingThreadPool
2019-03-11 09:38:44 -07:00
Gael Guennebaud
656d9bc66b
Apply SSE's pmin/pmax fix for GCC <= 5 to AVX's pmin/pmax
2019-03-10 21:19:18 +01:00
Gael Guennebaud
2df4f00246
Change license from LGPL to MPL2 with agreement from David Harmon.
2019-03-07 18:17:10 +01:00
Rasmus Munk Larsen
3c3f639fe2
Merge.
2019-03-06 11:54:30 -08:00
Rasmus Munk Larsen
f4ec8edea8
Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local.
2019-03-06 11:52:04 -08:00
Rasmus Munk Larsen
41cdc370d0
Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
...
Found with -Wundefined-func-template.
Author: tkoeppe@google.com
2019-03-06 11:42:22 -08:00
Rasmus Munk Larsen
cc407c9d4d
Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
...
Found with -Wundefined-func-template.
Author: tkoeppe@google.com
2019-03-06 11:40:06 -08:00
Eugene Zhulenev
1bc2a0a57c
Add missing return to NonBlockingThreadPool::LocalSteal
2019-03-06 10:49:49 -08:00
Eugene Zhulenev
4e4dcd9026
Remove redundant steal loop
2019-03-06 10:39:07 -08:00
Rasmus Larsen
4d808e834a
Merged in rmlarsen/eigen_threadpool (pull request PR-606)
...
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239
Approved-by: Sameer Agarwal <sameeragarwal@google.com >
2019-03-06 17:59:03 +00:00
Rasmus Larsen
2ea18e505f
Merged in ezhulenev/eigen-01 (pull request PR-610)
...
Block evaluation for TensorGeneratorOp
2019-03-06 16:49:38 +00:00
Eugene Zhulenev
25abaa2e41
Check that inner block dimension is continuous
2019-03-05 17:34:35 -08:00
Eugene Zhulenev
5d9a6686ed
Block evaluation for TensorGeneratorOp
2019-03-05 16:35:21 -08:00
Rasmus Larsen
b4861f4778
Merged in ezhulenev/eigen-01 (pull request PR-609)
...
Tune tensor contraction threadpool heuristics
2019-03-05 23:54:40 +00:00
Gael Guennebaud
bfbf7da047
bug #1689 fix used-but-marked-unused warning
2019-03-05 23:46:24 +01:00
Eugene Zhulenev
a407e022e6
Tune tensor contraction threadpool heuristics
2019-03-05 14:19:59 -08:00
Eugene Zhulenev
56c6373f82
Add an extra check for the RunQueue size estimate
2019-03-05 11:51:26 -08:00
Eugene Zhulenev
b1a8627493
Do not create Tensor<const T> in cxx11_tensor_forced_eval test
2019-03-05 11:19:25 -08:00
Rasmus Munk Larsen
0318fc7f44
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239
2019-03-05 10:24:54 -08:00
Eugene Zhulenev
efb5080d31
Do not initialize invalid fast_strides in TensorGeneratorOp
2019-03-04 16:58:49 -08:00
Eugene Zhulenev
b95941e5c2
Add tiled evaluation for TensorForcedEvalOp
2019-03-04 16:02:22 -08:00
Eugene Zhulenev
694084ecbd
Use fast divisors in TensorGeneratorOp
2019-03-04 11:10:21 -08:00
Gael Guennebaud
b0d406d91c
Enable construction of Ref<VectorType> from a runtime vector.
2019-03-03 15:25:25 +01:00
Sam Hasinoff
9ba81cf0ff
Fully qualify Eigen::internal::aligned_free
...
This helps avoid a conflict on certain Windows toolchains
(potentially due to some ADL name resolution bug) in the case
where aligned_free is defined in the global namespace. In any
case, tightening this up is harmless.
2019-03-02 17:42:16 +00:00
Gael Guennebaud
22144e949d
bug #1629 : fix compilation of PardisoSupport (regression introduced in changeset a7842daef2)
2019-03-02 22:44:47 +01:00
Bernhard M. Wiedemann
b071672e78
Do not keep latex logs
...
to make package builds more reproducible.
See https://reproducible-builds.org/ for why this is good.
2019-02-27 11:09:00 +01:00
Rasmus Munk Larsen
cf4a1c81fa
Fix specialization for conjugate on non-complex types in TensorBase.h.
2019-03-01 14:21:09 -08:00
Sameer Agarwal
c181dfb8ab
Consistently use EIGEN_BLAS_FUNC in BLAS.
...
Previously, for a few functions, either BLASFUNC or EIGEN_CAT
was being used. This change uses EIGEN_BLAS_FUNC consistently
everywhere.
Also introduce EIGEN_BLAS_FUNC_SUFFIX, which by default is
equal to "_". This allows the user to inject a new suffix as
needed.
2019-02-27 11:30:58 -08:00
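The suffix mechanism described in c181dfb8ab can be sketched roughly as follows. This is an illustration under assumptions, not the Eigen source: the `SKETCH_*` macro names and the `sscal_sketch` function are hypothetical, and only the default suffix "_" is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-ins for the macros named in the commit message.
#ifndef SKETCH_BLAS_FUNC_SUFFIX
#define SKETCH_BLAS_FUNC_SUFFIX _  // default suffix, as stated in the commit
#endif

// Two-level concatenation so the suffix macro is expanded before pasting.
#define SKETCH_CAT2(a, b) a##b
#define SKETCH_CAT(a, b) SKETCH_CAT2(a, b)
#define SKETCH_BLAS_FUNC(name) SKETCH_CAT(name, SKETCH_BLAS_FUNC_SUFFIX)

// With the default suffix this defines `int sscal_sketch_(int, float, float*)`.
extern "C" int SKETCH_BLAS_FUNC(sscal_sketch)(int n, float alpha, float* x) {
  for (int i = 0; i < n; ++i) x[i] *= alpha;  // scale the vector in place
  return 0;
}

// Small self-check: the suffixed symbol is callable by its pasted name.
inline void sketch_blas_self_check() {
  float v[2] = {1.0f, 2.0f};
  sscal_sketch_(2, 3.0f, v);
  assert(v[0] == 3.0f && v[1] == 6.0f);
}
```

Defining `SKETCH_BLAS_FUNC_SUFFIX` before this code (for example to `_64`) would change the exported name without touching the function definitions, which is presumably the point of making the suffix injectable.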
Rasmus Larsen
9558f4c25f
Merged in rmlarsen/eigen_threadpool (pull request PR-596)
...
Improve EventCount used by the non-blocking threadpool.
Approved-by: Gael Guennebaud <g.gael@free.fr >
2019-02-26 20:37:26 +00:00
Rasmus Larsen
2ca1e73239
Merged in rmlarsen/eigen (pull request PR-597)
...
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2.
Approved-by: Gael Guennebaud <g.gael@free.fr >
2019-02-25 17:02:16 +00:00
Gael Guennebaud
e409dbba14
Enable SSE vectorization of Quaternion and cross3() with AVX
2019-02-23 10:45:40 +01:00
Rasmus Munk Larsen
6560692c67
Improve EventCount used by the non-blocking threadpool.
...
The current algorithm requires threads to commit/cancel waiting in the order
in which they called Prewait. Spinning caused by that serialization can consume
lots of CPU time on some workloads. Restructure the algorithm to not
require that serialization and remove spin waits from Commit/CancelWait.
Note: this reduces max number of threads from 2^16 to 2^14 to leave
more space for ABA counter (which is now 22 bits).
Implementation details are explained in comments.
2019-02-22 13:56:26 -08:00
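The bit budget mentioned in 6560692c67 can be checked with a little arithmetic. The sketch below assumes a 64-bit state word holding three fields of 14 bits each, with the ABA counter in the remaining high bits. That layout is an assumption for illustration and is not stated in the commit message; the names mirror common conventions but are hypothetical here.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (hypothetical): three kWaiterBits-wide fields packed into
// a 64-bit word, ABA counter in whatever bits remain at the top.
constexpr std::uint64_t kWaiterBits = 14;               // 2^14 threads max
constexpr std::uint64_t kEpochShift = 3 * kWaiterBits;  // 42 low bits used
constexpr std::uint64_t kEpochBits = 64 - kEpochShift;  // bits left for ABA
constexpr std::uint64_t kMaxThreads = std::uint64_t{1} << kWaiterBits;

static_assert(kEpochBits == 22, "matches the 22-bit counter in the commit");
static_assert(kMaxThreads == 16384, "2^14 threads");

// Bits left for the counter if each of the three fields were `waiter_bits` wide.
constexpr std::uint64_t epoch_bits_for(std::uint64_t waiter_bits) {
  return 64 - 3 * waiter_bits;
}

inline void layout_self_check() {
  assert(epoch_bits_for(kWaiterBits) == kEpochBits);
}
```

Under this assumed layout, keeping 16-bit fields would leave only 16 bits for the counter, which illustrates why shrinking the thread limit frees room for a wider ABA counter.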
Gael Guennebaud
0b25a5c431
fix alignment in ploadquad
2019-02-22 21:39:36 +01:00
Rasmus Munk Larsen
1dc1677d52
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Google LLC executed a license agreement with the author of the code from which these files are derived to allow the Eigen project to distribute the code and derived works under MPL2.
2019-02-22 12:33:57 -08:00
Gael Guennebaud
0cb4ba98e7
update wrt recent changes
2019-02-21 17:19:36 +01:00
Gael Guennebaud
cca6c207f4
AVX512: implement faster ploadquad<Packet16f> thus speeding up GEMM
2019-02-21 17:18:28 +01:00
Gael Guennebaud
1c09ee8541
bug #1674 : workaround clang fast-math aggressive optimizations
2019-02-22 15:48:53 +01:00
Gael Guennebaud
7e3084bb6f
Fix compilation on ARM.
2019-02-22 14:56:12 +01:00
Gael Guennebaud
32502f3c45
bug #1684 : add simplified regression test for respective clang's bug (this also reveals the same bug in Apple's clang)
2019-02-22 10:29:06 +01:00
Gael Guennebaud
42c23f14ac
Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes.
2019-02-21 22:44:40 +01:00
Rasmus Munk Larsen
4d7f317102
Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU.
2019-02-21 13:32:13 -08:00
Gael Guennebaud
2a39659d79
Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases.
2019-02-20 15:23:23 +01:00
Gael Guennebaud
302377110a
Update documentation of Matrix and Array type aliases.
2019-02-20 15:18:48 +01:00
Gael Guennebaud
475295b5ff
Enable documentation of Array's typedefs
2019-02-20 15:18:07 +01:00
Gael Guennebaud
44b54fa4a3
Protect c++11 type alias with Eigen's macro, and add respective unit test.
2019-02-20 14:43:05 +01:00
Gael Guennebaud
7195f008ce
Merged in ra_bauke/eigen (pull request PR-180)
...
alias template for matrix and array classes, see also bug #864
Approved-by: Heiko Bauke <heiko.bauke@mail.de >
2019-02-20 13:22:39 +00:00
Gael Guennebaud
4e8047cdcf
Fix compilation with gcc and remove TR1 stuff.
2019-02-20 13:59:34 +01:00
Gael Guennebaud
844e5447f8
Update documentation regarding alignment issue.
2019-02-20 13:54:04 +01:00
Gael Guennebaud
edd413c184
bug #1409 : make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:
...
- this helps clang 5 and 6 to support alignas in STL's containers.
- this makes the public API of our (and users) classes cleaner
2019-02-20 13:52:11 +01:00
Gael Guennebaud
3b5deeb546
bug #899 : make sparseqr unit test more stable by 1) trying with a larger threshold and 2) relaxing rank computation for rank-deficient problems.
2019-02-19 22:57:51 +01:00
Gael Guennebaud
482c5fb321
bug #899 : remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing.
2019-02-19 22:52:15 +01:00
Gael Guennebaud
9ac1634fdf
Fix conversion warnings
2019-02-19 21:59:53 +01:00
Gael Guennebaud
292d61970a
Fix C++17 compilation
2019-02-19 21:59:41 +01:00
Rasmus Munk Larsen
071629a440
Fix incorrect value of NumDimensions in TensorContraction traits.
...
Reported here: #1671
2019-02-19 10:49:54 -08:00
Christoph Hertzberg
a1646fc960
Commas at the end of enumerator lists are not allowed in C++03
2019-02-19 14:32:25 +01:00
Gael Guennebaud
2cfc025bda
fix unit compilation in c++17: std::ptr_fun has been removed.
2019-02-19 14:05:22 +01:00
Gael Guennebaud
ab78cabd39
Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode.
2019-02-19 14:04:35 +01:00
Gael Guennebaud
115da6a1ea
Fix conversion warnings
2019-02-19 14:00:15 +01:00
Gael Guennebaud
7d10c78738
bug #1046 : add unit tests for correct propagation of alignment through std::alignment_of
2019-02-19 10:31:56 +01:00
Gael Guennebaud
7580112c31
Fix harmless Scalar vs RealScalar cast.
2019-02-18 22:12:28 +01:00
Gael Guennebaud
e23bf40dc2
Add unit test for LinSpaced and complex numbers.
2019-02-18 22:03:47 +01:00
Gael Guennebaud
796db94e6e
bug #1194 : implement slightly faster and SIMD friendly 4x4 determinant.
2019-02-18 16:21:27 +01:00
Gael Guennebaud
31b6e080a9
Fix regression: .conjugate() was popped out but not re-introduced.
2019-02-18 14:45:55 +01:00
Gael Guennebaud
c69d0d08d0
Set cost of conjugate to 0 (in practice it boils down to a no-op).
...
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed and small, or * fall back to lazyProduct)
2019-02-18 14:43:07 +01:00
Gael Guennebaud
512b74aaa1
GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
...
Before only s*A*B was caught which was both inconsistent with GEMM, sub-optimal,
and could even lead to compilation-errors (https://stackoverflow.com/questions/54738495 ).
2019-02-18 11:47:54 +01:00
Christoph Hertzberg
ec032ac03b
Guard C++11-style default constructor. Also, this is only needed for MSVC
2019-02-16 09:44:05 +01:00
Gael Guennebaud
902a7793f7
Add possibility to bench row-major lhs and rhs
2019-02-15 16:52:34 +01:00
Gael Guennebaud
83309068b4
bug #1680 : improve MSVC inlining by declaring many trivial constructors and accessors as STRONG_INLINE.
2019-02-15 16:35:35 +01:00
Gael Guennebaud
0505248f25
bug #1680 : make all "block" methods strong-inline and device-functions (some were missing EIGEN_DEVICE_FUNC)
2019-02-15 16:33:56 +01:00
Gael Guennebaud
559320745e
bug #1678 : Fix lack of __FMA__ macro on MSVC with AVX512
2019-02-15 10:30:28 +01:00
Gael Guennebaud
d85ae650bf
bug #1678 : workaround MSVC compilation issues with AVX512
2019-02-15 10:24:17 +01:00
Gael Guennebaud
f2970819a2
bug #1679 : avoid possible division by 0 in complex-schur
2019-02-15 09:39:25 +01:00
Rasmus Munk Larsen
65e23ca7e9
Revert b55b5c7280.
2019-02-14 13:46:13 -08:00
Rasmus Larsen
efeabee445
Merged in ezhulenev/eigen-01 (pull request PR-590)
...
Do not generate no-op cast() and conjugate() expressions
2019-02-14 21:16:12 +00:00
Eugene Zhulenev
7b837559a7
Fix signed-unsigned return in RunQueue
2019-02-14 10:40:21 -08:00
Eugene Zhulenev
f0d42d2265
Fix signed-unsigned comparison warning in RunQueue
2019-02-14 10:27:28 -08:00
Eugene Zhulenev
106ba7bb1a
Do not generate no-op cast() and conjugate() expressions
2019-02-14 09:51:51 -08:00
Eugene Zhulenev
8c2f30c790
Speedup Tensor ThreadPool RunQueue::Empty()
2019-02-13 10:20:53 -08:00
Gael Guennebaud
bdcb5f3304
Let's properly use Score instead of std::abs, and remove deprecated FIXME ( a /= b does a/b and not a * (1/b) as it was a long time ago...)
2019-02-11 22:56:19 +01:00
Gael Guennebaud
2edfc6807d
Fix compilation of empty products of the form: Mx0 * 0xN
2019-02-11 18:24:07 +01:00
Gael Guennebaud
eb46f34a8c
Speed up 2x2 LU by a factor 2, and other small fixed sizes by about 10%.
...
Not sure that's so critical, but this does not complexify the code base much.
2019-02-11 17:59:35 +01:00
Gael Guennebaud
dada863d23
Enable unit tests of PartialPivLU on fixed size matrices, and increase tested matrix size (blocking was not tested!)
2019-02-11 17:56:20 +01:00
Gael Guennebaud
ab6e6edc32
Speedup PartialPivLU for small matrices by passing compile-time sizes when available.
...
This change set also makes a better use of Map<>+OuterStride and Ref<> yielding surprising speed up for small dynamic sizes as well.
The table below reports times in microseconds for 10 random matrices:
        | ------ float --------- | ------- double ------- |
size    | before   after   ratio | before   after   ratio |
fixed 1 |   0.34    0.11    2.93 |   0.35    0.11    3.06 |
fixed 2 |   0.81    0.24    3.38 |   0.91    0.25    3.60 |
fixed 3 |   1.49    0.49    3.04 |   1.68    0.55    3.01 |
fixed 4 |   2.31    0.70    3.28 |   2.45    1.08    2.27 |
fixed 5 |   3.49    1.11    3.13 |   3.84    2.24    1.71 |
fixed 6 |   4.76    1.64    2.88 |   4.87    2.84    1.71 |
dyn 1   |   0.50    0.40    1.23 |   0.51    0.40    1.26 |
dyn 2   |   1.08    0.85    1.27 |   1.04    0.69    1.49 |
dyn 3   |   1.76    1.26    1.40 |   1.84    1.14    1.60 |
dyn 4   |   2.57    1.75    1.46 |   2.67    1.66    1.60 |
dyn 5   |   3.80    2.64    1.43 |   4.00    2.48    1.61 |
dyn 6   |   5.06    3.43    1.47 |   5.15    3.21    1.60 |
2019-02-11 13:58:24 +01:00
Eugene Zhulenev
21eb97d3e0
Add PacketConv implementation for non-vectorizable src expressions
2019-02-08 15:47:25 -08:00
Eugene Zhulenev
1e36166ed1
Optimize TensorConversion evaluator: do not convert same type
2019-02-08 15:13:24 -08:00
Steven Peters
953ca5ba2f
Spline.h: fix spelling "spang" -> "span"
2019-02-08 06:23:24 +00:00
Eugene Zhulenev
59998117bb
Don't do parallel_pack if we can use thread_local memory in tensor contractions
2019-02-07 09:21:25 -08:00
Gael Guennebaud
013cc3a6b3
Make GEMM fallback to GEMV for runtime vectors.
...
This is a more general and simpler version of changeset 4c0fa6ce0f
2019-02-07 16:24:09 +01:00
Gael Guennebaud
fa2fcb4895
Backed out changeset 4c0fa6ce0f
2019-02-07 16:07:08 +01:00
Gael Guennebaud
b3c4344a68
bug #1676 : workaround GCC's bug in c++17 mode.
2019-02-07 15:21:35 +01:00
Rasmus Larsen
3091c03898
Merged in ezhulenev/eigen-01 (pull request PR-581)
...
Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing
Approved-by: Rasmus Larsen <rmlarsen@google.com >
Approved-by: Gael Guennebaud <g.gael@free.fr >
2019-02-05 22:45:20 +00:00
Eugene Zhulenev
8491127082
Do not reduce parallelism too much in contractions with small number of threads
2019-02-04 12:59:33 -08:00
Eugene Zhulenev
eb21bab769
Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing
2019-02-04 10:43:16 -08:00
Eugene Zhulenev
6d0f6265a9
Remove duplicated comment line
2019-02-04 10:30:25 -08:00
Eugene Zhulenev
690b2c45b1
Fix GeneralBlockPanelKernel Android compilation
2019-02-04 10:29:15 -08:00
Gael Guennebaud
871e2e5339
bug #1674 : disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise)
2019-02-03 08:54:47 +01:00
Rasmus Larsen
e7b481ea74
Merged in rmlarsen/eigen (pull request PR-578)
...
Speed up Eigen matrix*vector and vector*matrix multiplication.
Approved-by: Eugene Zhulenev <ezhulenev@google.com >
2019-02-02 01:53:44 +00:00
Sameer Agarwal
b55b5c7280
Speed up row-major matrix-vector product on ARM
...
The row-major matrix-vector multiplication code uses a threshold to
check if processing 8 rows at a time would thrash the cache.
This change introduces two modifications to this logic.
1. A smaller threshold for ARM and ARM64 devices.
The value of this threshold was determined empirically using a Pixel2
phone, by benchmarking a large number of matrix-vector products in the
range [1..4096]x[1..4096] and measuring performance separately on
big and little cores with frequency pinning.
On big (out-of-order) cores, this change has little to no impact, but
on the small (in-order) cores, the matrix-vector products are up to
700% faster, especially on large matrices.
The motivation for this change was some internal code at Google which
was using hand-written NEON for implementing similar functionality,
processing the matrix one row at a time, which exhibited substantially
better performance than Eigen.
With the current change, Eigen handily beats that code.
2. Make the logic for choosing number of simultaneous rows apply
uniformly to 8, 4 and 2 rows instead of just 8 rows.
Since the default threshold for non-ARM devices is essentially
unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM
performance. This was verified by running the same set of benchmarks
on a Xeon desktop.
2019-02-01 15:23:53 -08:00
Rasmus Munk Larsen
4c0fa6ce0f
Speed up Eigen matrix*vector and vector*matrix multiplication.
...
This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.
The benchmarks below test
c.noalias()= n_by_n_matrix * n_by_1_matrix;
c.noalias()= 1_by_n_matrix * n_by_n_matrix;
respectively.
Benchmark measurements:
SSE:
Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark       Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64         1096       312      +71.5%
BM_MatVec/128        4581      1464      +68.0%
BM_MatVec/256       18534      5710      +69.2%
BM_MatVec/512      118083     24162      +79.5%
BM_MatVec/1k       704106    173346      +75.4%
BM_MatVec/2k      3080828    742728      +75.9%
BM_MatVec/4k     25421512   4530117      +82.2%
BM_VecMat/32          352       130      +63.1%
BM_VecMat/64         1213       425      +65.0%
BM_VecMat/128        4640      1564      +66.3%
BM_VecMat/256       17902      5884      +67.1%
BM_VecMat/512       70466     24000      +65.9%
BM_VecMat/1k       340150    161263      +52.6%
BM_VecMat/2k      1420590    645576      +54.6%
BM_VecMat/4k      8083859   4364327      +46.0%
AVX2:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark       Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64          619       120      +80.6%
BM_MatVec/128        9693       752      +92.2%
BM_MatVec/256       38356      2773      +92.8%
BM_MatVec/512       69006     12803      +81.4%
BM_MatVec/1k       443810    160378      +63.9%
BM_MatVec/2k      2633553    646594      +75.4%
BM_MatVec/4k     16211095   4327148      +73.3%
BM_VecMat/64          925       227      +75.5%
BM_VecMat/128        3438       830      +75.9%
BM_VecMat/256       13427      2936      +78.1%
BM_VecMat/512       53944     12473      +76.9%
BM_VecMat/1k       302264    157076      +48.0%
BM_VecMat/2k      1396811    675778      +51.6%
BM_VecMat/4k      8962246   4459010      +50.2%
AVX512:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark       Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64          401       111      +72.3%
BM_MatVec/128        1846       513      +72.2%
BM_MatVec/256       36739      1927      +94.8%
BM_MatVec/512       54490      9227      +83.1%
BM_MatVec/1k       487374    161457      +66.9%
BM_MatVec/2k      2016270    643824      +68.1%
BM_MatVec/4k     13204300   4077412      +69.1%
BM_VecMat/32          324       106      +67.3%
BM_VecMat/64         1034       246      +76.2%
BM_VecMat/128        3576       802      +77.6%
BM_VecMat/256       13411      2561      +80.9%
BM_VecMat/512       58686     10037      +82.9%
BM_VecMat/1k       320862    163750      +49.0%
BM_VecMat/2k      1406719    651397      +53.7%
BM_VecMat/4k      7785179   4124677      +47.0%
2019-01-31 14:24:08 -08:00
Gael Guennebaud
7ef879f6bf
GEBP: improves pipelining in the 1pX4 path with FMA.
...
Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA.
With AVX+FMA I measured a speed up of about x1.25 in such cases.
2019-01-30 23:45:12 +01:00
Gael Guennebaud
de77bf5d6c
Fix compilation with ARM64.
2019-01-30 16:48:20 +01:00
Gael Guennebaud
d586686924
Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper.
2019-01-30 16:48:01 +01:00
Gael Guennebaud
eb4c6bb22d
Fix conflicts and merge
2019-01-30 15:57:08 +01:00
Gael Guennebaud
e3622a0396
Slightly extend discussions on auto and move the content of the Pit falls wiki page here.
...
http://eigen.tuxfamily.org/index.php?title=Pit_Falls
2019-01-30 13:09:21 +01:00
Gael Guennebaud
df12fae8b8
According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 , the previous GCC issue is fixed in GCC trunk (will be gcc 9).
2019-01-30 11:52:28 +01:00
Gael Guennebaud
3775926bba
ARM64 & GEBP: add specialization for double +30% speed up
2019-01-30 11:49:06 +01:00
Gael Guennebaud
be5b0f664a
ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in generating good ASM
2019-01-30 11:48:25 +01:00
Christoph Hertzberg
a7779a9b42
Hide some annoying unused variable warnings in g++8.1
2019-01-29 16:48:21 +01:00
Gael Guennebaud
efe02292a6
Add recent gemm related changesets and various cleanups in perf-monitoring
2019-01-29 11:53:47 +01:00
Gael Guennebaud
8a06c699d0
bug #1669 : fix PartialPivLU/inverse with zero-sized matrices.
2019-01-29 10:27:13 +01:00
Gael Guennebaud
a2a07e62b9
Fix compilation with c++03 (local classes cannot be template arguments), and make SparseMatrix::assignDiagonal truly protected.
2019-01-29 10:10:07 +01:00
Gael Guennebaud
f489f44519
bug #1574 : implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs.
2019-01-28 17:29:50 +01:00
Gael Guennebaud
803fa79767
Move evaluator<SparseCompressedBase>::find(i,j) to a more general and reusable SparseCompressedBase::lower_bound(i,j) function
2019-01-28 17:24:44 +01:00
Gael Guennebaud
53560f9186
bug #1672 : fix unit test compilation with MSVC by adding overloads of test_is* for long long (and factorize copy/paste code through a macro)
2019-01-28 13:47:28 +01:00
Christoph Hertzberg
c9825b967e
Renaming even more I identifiers
2019-01-26 13:22:13 +01:00
Christoph Hertzberg
5a52e35f9a
Renaming some more I identifiers
2019-01-26 13:18:21 +01:00
Rasmus Munk Larsen
71429883ee
Fix compilation error in NEON GEBP specialization of madd.
2019-01-25 17:00:21 -08:00
Christoph Hertzberg
934b8a1304
Avoid I as an identifier, since it may clash with the C-header complex.h
2019-01-25 14:54:39 +01:00
Gael Guennebaud
ec8a387972
cleanup
2019-01-24 10:24:45 +01:00
Gael Guennebaud
6908ce2a15
More thoroughly check variadic template ctor of fixed-size vectors
2019-01-24 10:24:28 +01:00
David Tellenbach
237b03b372
PR 574: use variadic template instead of initializer_list to implement fixed-size vector ctor from coefficients.
2019-01-23 00:07:19 +01:00
Christoph Hertzberg
bd6dadcda8
Tell doxygen that cxx11 math is available
2019-01-24 00:14:02 +01:00
Gael Guennebaud
c64d5d3827
Bypass inline asm for non compatible compilers.
2019-01-23 23:43:13 +01:00
Christoph Hertzberg
e16913a45f
Fix name of tutorial snippet.
2019-01-23 10:35:06 +01:00
Gael Guennebaud
80f81f9c4b
Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing.
2019-01-22 17:08:47 +01:00
David Tellenbach
db152b9ee6
PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc)
...
- {1,2,3,4,5,...} for fixed-size vectors only
- {{1,2,3},{4,5,6}} for the general cases
- {{1,2,3,4,5,....}} is allowed for both row and column-vector
2019-01-21 16:25:57 +01:00
Gael Guennebaud
543529da6a
Add more extensive tests of Array ctors, including {} variants
2019-01-22 15:30:50 +01:00
nluehr
92774f0275
Replace host_define.h with cuda_runtime_api.h
2019-01-18 16:10:09 -06:00
Gael Guennebaud
d18f49cbb3
Fix compilation of unit tests with gcc and c++17
2019-01-18 11:12:42 +01:00
Christoph Hertzberg
da0a41b9ce
Mask unused-parameter warnings, when building with NDEBUG
2019-01-18 10:41:14 +01:00
Rasmus Munk Larsen
2eccbaf3f7
Add missing logical packet ops for GPU and NEON.
2019-01-17 17:45:08 -08:00
Christoph Hertzberg
d575505d25
After fixing bug #1557 , boostmultiprec_7 failed with NumericalIssue instead of NoConvergence (all that matters here is no Success)
2019-01-17 19:14:07 +01:00
Gael Guennebaud
ee3662abc5
Remove some useless const_cast
2019-01-17 18:27:49 +01:00
Gael Guennebaud
0fe6b7d687
Make nestByValue work again (broken since 3.3) and add unit tests.
2019-01-17 18:27:25 +01:00
Gael Guennebaud
4b7cf7ff82
Extend reshaped unit tests and remove useless const_cast
2019-01-17 17:35:32 +01:00
Gael Guennebaud
b57c9787b1
Cleanup useless const_cast and add missing broadcast assignment tests
2019-01-17 16:55:42 +01:00
Gael Guennebaud
be05d0030d
Make FullPivLU use conjugateIf<>
2019-01-17 12:01:00 +01:00
Patrick Peltzer
bba2f05064
Boosttest only available for Boost version >= 1.53.0
2019-01-17 11:54:37 +01:00
Patrick Peltzer
15e53d5d93
PR 567: makes all dense solvers inherit SolverBase (LU,Cholesky,QR,SVD).
...
This changeset also includes:
* add HouseholderSequence::conjugateIf
* define int as the StorageIndex type for all dense solvers
* dedicated unit tests, including assertion checking
* _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks
* CompleteOrthogonalDecompositions: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions
* Cholesky: add missing assertions
* FullPivHouseholderQR: Corrected Scalar type in _solve_impl()
* BDCSVD: Unambiguous return type for ternary operator
* SVDBase: Corrected Scalar type in _solve_impl()
2019-01-17 01:17:39 +01:00
Gael Guennebaud
7f32109c11
Add conjugateIf<bool> members to DenseBase, TriangularView, SelfadjointView, and make PartialPivLU use it.
2019-01-17 11:33:43 +01:00
Gael Guennebaud
7b35c26b1c
Doc: remove link to porting guide
2019-01-17 10:35:50 +01:00
Gael Guennebaud
4759d9e86d
Doc: add manual page on STL iterators
2019-01-17 10:35:14 +01:00
Gael Guennebaud
562985bac4
bug #1646 : fix false aliasing detection for A.row(0) = A.col(0);
...
This changeset completely disables the detection for vectors, for which our current mechanism cannot detect any positive aliasing anyway.
2019-01-17 00:14:27 +01:00
Rasmus Munk Larsen
7401e2541d
Fix compilation error for logical packet ops with older compilers.
2019-01-16 14:43:33 -08:00
Rasmus Munk Larsen
ee550a2ac3
Fix flaky test for tensor fft.
2019-01-16 14:03:12 -08:00
Gael Guennebaud
0f028f61cb
GEBP: fix swapped kernel mode with AVX512 and complex scalars
2019-01-16 22:26:38 +01:00
Gael Guennebaud
e118ce86fd
GEBP: cleanup logic to choose between 4 packets or 1 packet
2019-01-16 21:47:42 +01:00
Gael Guennebaud
70e133333d
bug #1661 : fix regression in GEBP and AVX512
2019-01-16 21:22:20 +01:00
Gael Guennebaud
ce88e297dc
Add a comment stating this doc page is partly obsolete.
2019-01-16 16:29:02 +01:00
Gael Guennebaud
729d1291c2
bug #1585 : update doc on lazy-evaluation
2019-01-16 16:28:17 +01:00
Gael Guennebaud
c8e40edac9
Remove Eigen2ToEigen3 migration page (obsolete since 3.3)
2019-01-16 16:27:00 +01:00
Gael Guennebaud
aeffdf909e
bug #1617 : add unit tests for empty triangular solve.
2019-01-16 15:24:59 +01:00
Gael Guennebaud
502f717980
bug #1646 : disable aliasing detection for empty and 1x1 expression
2019-01-16 14:33:45 +01:00
Gael Guennebaud
0b466b6933
bug #1633 : use proper type for madd temporaries, factorize RhsPacketx4.
2019-01-16 13:50:13 +01:00
Renjie Liu
dbfcceabf5
Bug: 1633: refactor gebp kernel and optimize for neon
2019-01-16 12:51:36 +08:00
Gael Guennebaud
2b70b2f570
Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry
2019-01-15 22:50:42 +01:00
Gael Guennebaud
2c2c114995
Silence maybe-uninitialized warnings by gcc
2019-01-15 16:53:15 +01:00
Gael Guennebaud
6ec6bf0b0d
Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index*,Index*) on empty matrices by an assertion (+ doc & unit tests)
2019-01-15 15:21:14 +01:00
Gael Guennebaud
027e44ed24
bug #1592 : makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests)
2019-01-15 15:13:24 +01:00
Gael Guennebaud
f8bc5cb39e
Fix detection of vector-at-time: use Rows/Cols instead of MaxRow/MaxCols.
...
This fixes VectorXd(n).middleCol(0,0).outerSize() which was equal to 1.
2019-01-15 15:09:49 +01:00
Gael Guennebaud
32d7232aec
fix always true warning with gcc 4.7
2019-01-15 11:18:48 +01:00
Gael Guennebaud
6cf7afa3d9
Typo
2019-01-15 11:04:37 +01:00
Gael Guennebaud
e7d4d4f192
cleanup
2019-01-15 10:51:03 +01:00
Rasmus Larsen
7b3aab0936
Merged in rmlarsen/eigen (pull request PR-570)
...
Add support for inverse hyperbolic functions. Fix cost of division.
2019-01-14 21:31:33 +00:00
Rasmus Munk Larsen
8bf00c2baf
Remove extra <tr>.
2019-01-14 13:29:29 -08:00
Rasmus Munk Larsen
ec7fe83554
Merge.
2019-01-14 13:26:58 -08:00
Rasmus Munk Larsen
2ea4efc0c3
Merge.
2019-01-14 13:26:58 -08:00
Rasmus Munk Larsen
2c5843dbbb
Update documentation.
2019-01-14 13:26:34 -08:00
Gael Guennebaud
250dcd1fdb
bug #1652 : fix position of EIGEN_ALIGN16 attributes in Neon and Altivec
2019-01-14 21:45:56 +01:00
Rasmus Larsen
5a59452aae
Merged eigen/eigen into default
2019-01-14 10:23:23 -08:00
Gael Guennebaud
3c9e6d206d
AVX512: fix pgather/pscatter for Packet4cd and unaligned pointers
2019-01-14 17:57:28 +01:00
Gael Guennebaud
61b6eb05fe
AVX512 (r)sqrt(double) was mistakenly disabled with clang and others
2019-01-14 17:28:47 +01:00
Gael Guennebaud
ccddeaad90
fix warning
2019-01-14 16:51:16 +01:00
Gael Guennebaud
d4881751d3
Doc: add Isometry in the list of supported Mode of Transform<>
2019-01-14 16:38:26 +01:00
Greg Coombe
9d988a1e1a
Initialize isometric transforms like affine transforms.
...
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
Gael Guennebaud
4356a55a61
PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and call it instead of falling back to std::sin/std::cos.
...
This makes both the small and huge argument cases faster because:
- for small inputs this removes the last pselect
- for large inputs only the reduction part follows a scalar path,
the rest use the same SIMD path as the small-argument case.
2019-01-14 13:54:01 +01:00
Gael Guennebaud
f566724023
Fix StorageIndex FIXME in dense LU solvers
2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen
1c6e6e2c3f
Merge.
2019-01-11 17:47:11 -08:00
Rasmus Larsen
0ba3b45419
Merged eigen/eigen into default
2019-01-11 17:46:04 -08:00
Rasmus Munk Larsen
28ba1b2c32
Add support for inverse hyperbolic functions.
...
Fix cost of division.
2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen
89c4001d6f
Fix warnings in ptrue for complex and half types.
2019-01-11 14:10:57 -08:00
Rasmus Munk Larsen
a49d01edba
Fix warnings in ptrue for complex and half types.
2019-01-11 13:18:17 -08:00
Eugene Zhulenev
1e6d15b55b
Fix shorten-64-to-32 warning in TensorContractionThreadPool
2019-01-11 11:41:53 -08:00
Rasmus Munk Larsen
df29511ac0
Fix merge.
2019-01-11 10:36:36 -08:00
Rasmus Munk Larsen
8e71ed4cc9
Merge.
2019-01-11 10:35:07 -08:00
Rasmus Munk Larsen
fff5a5b579
Resolve.
2019-01-11 10:28:52 -08:00
Rasmus Munk Larsen
9396ace46b
Merge.
2019-01-11 10:28:52 -08:00
Rasmus Larsen
74882471d0
Merged eigen/eigen into default
2019-01-11 10:20:55 -08:00
Rasmus Munk Larsen
e9936cf2b9
Merge.
2019-01-11 09:58:33 -08:00
Gael Guennebaud
9005f0111f
Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fixes a compilation issue with gcc-4.7.
2019-01-11 17:10:54 +01:00
Mark D Ryan
3c9add6598
Remove reinterpret_cast from AVX512 complex implementation
...
The reinterpret_casts used in ptranspose(PacketBlock<Packet8cf,4>&) and
ptranspose(PacketBlock<Packet8cf,8>&) don't appear to be working
correctly. They're used to convert the kernel parameters to
PacketBlock<Packet8d,T>& so that the complex number versions of
ptranspose can be written using the existing double implementations.
Unfortunately, they don't seem to work and are responsible for 9 unit
test failures in the AVX512 build of tensorflow master. This commit
fixes the issue by manually initialising PacketBlock<Packet8d,T>
variables with the contents of the kernel parameter before calling
the double version of ptranspose, and then copying the resulting
values back into the kernel parameter before returning.
2019-01-11 14:02:09 +01:00
Christoph Hertzberg
0522460a0d
bug #1656 : Enable failtests only if BUILD_TESTING is enabled
2019-01-11 11:07:56 +01:00
Eugene Zhulenev
0abe03764c
Fix shorten-64-to-32 warning in TensorContractionThreadPool
2019-01-10 10:27:55 -08:00
Rasmus Munk Larsen
fcfced13ed
Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate.
2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
ce38c342c3
merge.
2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
a05ec7993e
merge
2019-01-09 17:17:30 -08:00
Rasmus Munk Larsen
e15bb785ad
Collapsed revision
...
* Add packet op "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
f6ba6071c5
Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f04442526
Collapsed revision
...
* Collapsed revision
* Add packet op "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f178429b9
Collapsed revision
...
* Collapsed revision
* Add packet op "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
1119c73d22
Collapsed revision
...
* Add packet op "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
e00521b514
Undo useless diffs.
2019-01-09 16:32:53 -08:00
Rasmus Munk Larsen
f2767112c8
Simplify a bit.
2019-01-09 16:29:18 -08:00
Rasmus Munk Larsen
cb955df9a6
Add packet op "pones". Write pnot(a) as pxor(pones(a), a).
2019-01-09 16:17:08 -08:00
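Note on cb955df9a6: the identity pnot(a) = pxor(pones(a), a) can be checked on a single lane. The functions below are hypothetical scalar models of the packet operations named in the message, not Eigen's implementations; a real packet applies the same bitwise operation to several lanes at once.

```cpp
#include <cstdint>

// One 32-bit lane stands in for a whole packet in this sketch.
using Lane = std::uint32_t;

// All bits set; the argument only selects the type, its value is ignored.
inline Lane pones(Lane /*a*/) { return ~Lane{0}; }

// Bitwise exclusive or.
inline Lane pxor(Lane a, Lane b) { return a ^ b; }

// Bitwise NOT written as XOR with an all-ones mask.
inline Lane pnot(Lane a) { return pxor(pones(a), a); }
```

XOR with 1 flips a bit and XOR with 0 keeps it, so XOR against an all-ones mask flips every bit, which is exactly NOT.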
Rasmus Larsen
cb3c059fa4
Merged eigen/eigen into default
2019-01-09 15:04:17 -08:00
Gael Guennebaud
d812f411c3
bug #1654 : fix compilation with cuda and no c++11
2019-01-09 18:00:05 +01:00
Gael Guennebaud
3492a1ca74
fix plog(+inf) with AVX512
2019-01-09 16:53:37 +01:00
Gael Guennebaud
47810cf5b7
Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSX
2019-01-09 16:40:42 +01:00
Gael Guennebaud
3f14e0d19e
fix warning
2019-01-09 15:45:21 +01:00
Gael Guennebaud
aeec68f77b
Add missing pcmp_lt and others for AVX512
2019-01-09 15:36:41 +01:00
Gael Guennebaud
e6b217b8dd
bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieves the same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
...
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
- FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Eugene Zhulenev
e70ffef967
Optimize evalShardedByInnerDim
2019-01-08 16:26:31 -08:00
Rasmus Munk Larsen
055f0b73db
Add support for pcmp_eq and pnot, including for complex types.
2019-01-07 16:53:36 -08:00
Eugene Zhulenev
190d053e41
Explicitly set fill character when printing aligned data to ostream
2019-01-03 14:55:28 -08:00
Mark D Ryan
bc5dd4cafd
PR560: Fix the AVX512f only builds
...
Commit c53eececb0
introduced AVX512 support for complex numbers but required
avx512dq to build. Commit 1d683ae2f5
fixed some, but it would seem not all,
of the hard avx512dq dependencies. Build failures are still evident on
Eigen and TensorFlow when compiling with just avx512f and no avx512dq
using gcc 7.3. Looking at the code there does indeed seem to be a problem.
Commit c53eececb0
calls avx512dq intrinsics directly, e.g, _mm512_extractf32x8_ps
and _mm512_and_ps. This commit fixes the issue by replacing the direct
intrinsic calls with the various wrapper functions that are safe to use on
avx512f only builds.
2019-01-03 14:33:04 +01:00
Gael Guennebaud
697fba3bb0
Fix unit test
2018-12-27 11:20:47 +01:00
Gael Guennebaud
60d3fe9a89
One more stupid AVX 512 fix (I don't have direct access to AVX512 machines)
2018-12-24 13:05:03 +01:00
Gael Guennebaud
4aa667b510
Add EIGEN_STRONG_INLINE where required
2018-12-24 10:45:01 +01:00
Gael Guennebaud
961ff567e8
Add missing pcmp_lt_or_nan for AVX512
2018-12-23 22:13:29 +01:00
Gael Guennebaud
0f6f75bd8a
Implement a faster fix for sin/cos of large entries that also correctly handles INF input.
2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8
Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)
2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb
Fix plog(+INF): it returned ~87 instead of +INF
2018-12-23 15:40:52 +01:00
Christoph Hertzberg
6dd93f7e3b
Make code compile again for older compilers.
...
See https://stackoverflow.com/questions/7411515/
2018-12-22 13:09:07 +01:00
Gustavo Lima Chaves
1024a70e82
gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs
...
The patch works by altering the gebp lhs packing routines to also
consider ½ and ¼ packet length rows when packing, besides the original
whole package and row-by-row attempts. Finally, gebp itself will try
to fit a fraction of a packet at a time if:
i) ½ and/or ¼ packets are available for the current context (e.g. AVX2
and SSE-sized SIMD register for x86)
ii) The matrix's height is favorable to it (it may be it's too small
in that dimension to take full advantage of the current/maximum
packet width or it may be the case that last rows may take
advantage of smaller packets before gebp goes row-by-row)
This helps mitigate huge slowdowns one had on AVX512 builds when
compared to AVX2 ones, for some dimensions. Gains top at an extra 1x
in throughput. This patch is a complement to changeset 4ad359237a.
Since packing is changed, Eigen users which would go for very
low-level API usage, like TensorFlow, will have to be adapted to work
fine with the changes.
2018-12-21 11:03:18 -08:00
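Note on 1024a70e82: as a rough illustration of the idea (not the actual gebp packing code), the sketch below greedily covers a number of lhs rows with full packets, then half packets, then quarter packets, and only then falls back to single rows. RowSplit and split_rows are hypothetical names, and the real kernel also peels several packets per round, which this simplification ignores.

```cpp
#include <cassert>

// How many steps of each width are needed to cover all rows.
struct RowSplit { int full, half, quarter, scalar; };

// Hypothetical helper: packet_size is the number of scalars in a full
// packet and is assumed to be a multiple of 4 (e.g. 16 floats on AVX512).
inline RowSplit split_rows(int rows, int packet_size) {
  assert(rows >= 0 && packet_size >= 4 && packet_size % 4 == 0);
  RowSplit s{};
  s.full    = rows / packet_size;        rows %= packet_size;
  s.half    = rows / (packet_size / 2);  rows %= packet_size / 2;
  s.quarter = rows / (packet_size / 4);  rows %= packet_size / 4;
  s.scalar  = rows;  // leftovers are handled row by row
  return s;
}
```

With 31 rows and 16-wide packets this yields one full, one half and one quarter packet plus 3 single rows, whereas without the fractional packets 15 rows would be processed one at a time.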
Gustavo Lima Chaves
e763fcd09e
Introducing "vectorized" byte on unpacket_traits structs
...
This is a preparation to a change on gebp_traits, where a new template
argument will be introduced to dictate the packet size, so it won't be
bound to the current/max packet size only anymore.
By having packet types defined early on gebp_traits, one has now to
act on packet types, not scalars anymore, for the enum values defined
on that class. One approach for reaching the vectorizable/size
properties one needs there could be getting the packet's scalar again
with unpacket_traits<>, then the size/Vectorizable enum entries from
packet_traits<>. It turns out guards like "#ifndef
EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet
variations of packet_traits<> for some types (and it makes sense to
keep that). In other words, one can't go back to the scalar and create
a new PacketType, as this will always lead to the maximum packet type
for the architecture.
The less costly/invasive solution for that, thus, is to add the
vectorizable info on every unpacket_traits struct as well.
2018-12-19 14:24:44 -08:00
Gael Guennebaud
efa4c9c40f
bug #1615 : slightly increase the default unrolling limit to compensate for changeset 101ea26f5e.
...
This solves a performance regression with clang and 3x3 matrix products.
2018-12-13 10:42:39 +01:00
Gael Guennebaud
f20c991679
add changesets related to matrix product perf.
2018-12-13 10:33:29 +01:00
Rasmus Munk Larsen
dd6d65898a
Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0.
2018-12-12 14:45:31 -08:00
Gael Guennebaud
f582ea3579
Fix compilation with expression template scalar type.
2018-12-12 22:47:00 +01:00
Gael Guennebaud
cfc70dc13f
Add regression test for bug #1174
2018-12-12 18:03:31 +01:00
Gael Guennebaud
2de8da70fd
bug #1557 : fix RealSchur and EigenSolver for matrices with only zeros on the diagonal.
2018-12-12 17:30:08 +01:00
Gael Guennebaud
72c0bbe2bd
Simplify handling of tests that must fail to compile.
...
Each test is now a normal ctest target, and build properties (compiler+flags) are preserved (instead of starting a new build-dir from scratch).
2018-12-12 15:48:36 +01:00
Gael Guennebaud
37c91e1836
bug #1644 : fix warning
2018-12-11 22:07:20 +01:00
Gael Guennebaud
f159cf3d75
Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.
...
With a 6pX4 kernel (not committed yet), this provides a +20% speedup.
2018-12-11 15:36:27 +01:00
Gael Guennebaud
0a7e7af6fd
Properly set the number of registers for AVX512
2018-12-11 15:33:17 +01:00
Gael Guennebaud
7166496f70
bug #1643 : fix compilation issue with gcc and no optimization
2018-12-11 13:24:42 +01:00
Gael Guennebaud
0d90637838
enable spilling workaround on architectures with SSE/AVX
2018-12-10 23:22:44 +01:00
Gael Guennebaud
cf697272e1
Remove debug code.
2018-12-09 23:05:46 +01:00
Gael Guennebaud
450dc97c6b
Various fixes in polynomial solver and its unit tests:
...
- cleanup noise in imaginary part of real roots
- take into account the magnitude of the derivative to check roots.
- use <= instead of < at appropriate places
2018-12-09 22:54:39 +01:00
Gael Guennebaud
348bb386d1
Enable "old" CMP0026 policy (not perfect, but better than dozens of warnings)
2018-12-08 18:59:51 +01:00
Gael Guennebaud
bff90bf270
workaround "may be used uninitialized" warning
2018-12-08 18:58:28 +01:00
Gael Guennebaud
81c27325ae
bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512
2018-12-08 14:27:48 +01:00
Gael Guennebaud
426bce7529
fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target
2018-12-08 09:44:21 +01:00
Gael Guennebaud
cd25b538ab
Fix noise in sparse_basic_3 (numerical cancellation)
2018-12-08 00:13:37 +01:00
Gael Guennebaud
efaf03bf96
Fix noise in lu unit test
2018-12-08 00:05:03 +01:00
Gael Guennebaud
956678a4ef
bug #1515 : disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.
2018-12-07 18:03:36 +01:00
Gael Guennebaud
7b6d0ff1f6
Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA to a #error.
2018-12-07 15:14:50 +01:00
Gael Guennebaud
f233c6194d
bug #1637 : workaround register spilling in gebp with clang>=6.0+AVX+FMA
2018-12-07 10:01:09 +01:00
Gael Guennebaud
ae59a7652b
bug #1638 : add a warning if avx512 is enabled without SSE/AVX FMA
2018-12-07 09:23:28 +01:00
Gael Guennebaud
4e7746fe22
bug #1636 : fix gemm performance issue with gcc>=6 and no FMA
2018-12-07 09:15:46 +01:00
Gael Guennebaud
cbf2f4b7a0
AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only
2018-12-06 18:21:56 +01:00
Gael Guennebaud
1d683ae2f5
Fix compilation with avx512f only, i.e., no AVX512DQ
2018-12-06 18:11:07 +01:00
Gael Guennebaud
aab749b1c3
fix test regarding AVX512 vectorization of complexes.
2018-12-06 16:55:00 +01:00
Gael Guennebaud
c53eececb0
Implement AVX512 vectorization of std::complex<float/double>
2018-12-06 15:58:06 +01:00
Gael Guennebaud
3fba59ea59
temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!
2018-12-06 00:13:26 +01:00
Gael Guennebaud
1ac2695ef7
bug #1636 : fix compilation with some ABI versions.
2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen
47d8b741b2
#elif -> #else to fix GPU build.
2018-12-05 13:19:31 -08:00
Rasmus Munk Larsen
8a02883d58
Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)
...
Fix tensor contraction on AVX512 builds
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-05 18:19:32 +00:00
Gael Guennebaud
acc3459a49
Add help messages in the quick ref/ascii docs regarding slicing, indexing, and reshaping.
2018-12-05 17:17:23 +01:00
Gael Guennebaud
e2e897298a
Fix page nesting
2018-12-05 17:13:46 +01:00
Christoph Hertzberg
c1d356e8b4
bug #1635 : Use infinity from Numtraits instead of creating it manually.
2018-12-05 15:01:04 +01:00
Mark D Ryan
36f8f6d0be
Fix evalShardedByInnerDim for AVX512 builds
...
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size. While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16. The result is slightly
incorrect float32 contractions on AVX512 builds.
This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
2018-12-05 12:29:03 +01:00
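Note on 36f8f6d0be: the alignment rule described above might be sketched like this. round_up_k is a hypothetical helper, not the code in evalShardedByInnerDim; it assumes the shard length along k is rounded up to a multiple of 8, or of the packet size when that is larger than 8.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Round k up to a multiple of max(8, packet_size), so that a kernel
// that only works on whole packets never sees a partial one.
inline std::int64_t round_up_k(std::int64_t k, std::int64_t packet_size) {
  assert(k >= 0 && packet_size > 0);
  const std::int64_t align = std::max<std::int64_t>(8, packet_size);
  return ((k + align - 1) / align) * align;
}
```

For a packet size of 8 (AVX floats) a length of 20 becomes 24, which was already handled; for a packet size of 16 (AVX512 floats) it becomes 32, which a fixed multiple of 8 would not guarantee.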
Rasmus Munk Larsen
b57b31cce9
Merged in ezhulenev/eigen-01 (pull request PR-553)
...
Do not disable alignment with EIGEN_GPUCC
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-04 23:47:19 +00:00
Eugene Zhulenev
0bb15bb6d6
Update checks in ConfigureVectorization.h
2018-12-03 17:10:40 -08:00
Eugene Zhulenev
fd0fbfa9b5
Do not disable alignment with EIGEN_GPUCC
2018-12-03 15:54:10 -08:00
Christoph Hertzberg
919414b9fe
bug #785 : Make Cholesky decomposition work for empty matrices
2018-12-03 16:18:15 +01:00
Gael Guennebaud
0ea7ae7213
Add missing padd for Packet8i (it was implicitly generated by clang and gcc)
2018-11-30 21:52:25 +01:00
Gael Guennebaud
ab4df3e6ff
bug #1634 : remove double copy in move-ctor of non movable Matrix/Array
2018-11-30 21:25:51 +01:00
Gael Guennebaud
c785464430
Add packet sin and cos to Altivec/VSX and NEON
2018-11-30 16:21:33 +01:00
Gael Guennebaud
69ace742be
Several improvements regarding packet-bitwise operations:
...
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
fa87f9d876
Add psin/pcos on AVX512 -> almost for free, at last!
2018-11-30 14:33:13 +01:00
Gael Guennebaud
c68bd2fa7a
Cleanup
2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303
Fix pandnot order in AVX512
2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6
Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)
2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d
Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks)
2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7
same for pmax
2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6
pmin/pmax on SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions
2018-11-28 17:14:20 +01:00
Gael Guennebaud
aa6097395b
Add missing SSE/AVX type-casting in AVX512 mode
2018-11-28 16:09:08 +01:00
Gael Guennebaud
48fe78c375
bug #1630 : fix linspaced when requesting smaller packet size than default one.
2018-11-28 13:15:06 +01:00
Eugene Zhulenev
80f1651f35
Use explicit packet type in SSE/PacketMath pldexp
2018-11-27 17:25:49 -08:00
Benoit Jacob
a4159dba08
do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4).
2018-11-27 16:53:14 -05:00
Gael Guennebaud
b131a4db24
bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.
2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21
Update pshiftleft to pass the shift as a true compile-time integer.
2018-11-27 22:57:30 +01:00
Gael Guennebaud
fa7fd61eda
Unify SSE/AVX psin functions.
...
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
- add packet-float to packet-int type traits
- add packet float<->int reinterpret casts
- add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
Rasmus Munk Larsen
08edbc8cfe
Merged in bjacob/eigen/fixbuild (pull request PR-549)
...
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 20:14:12 +00:00
Benoit Jacob
7b1cb8a440
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 11:11:02 -05:00
Gael Guennebaud
b5695a6008
Unify Altivec/VSX pexp(double) with default implementation
2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e
cleanup
2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10
Unify SSE and AVX pexp for double.
2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054
Unify NEON's pexp with generic implementation
2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc
Unify Altivec/VSX's pexp with generic implementation
2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5
Unify SSE and AVX implementation of pexp
2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47
Unify Altivec/VSX's plog with generic implementation, and enable it!
2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8
Unify NEON's plog with generic implementation
2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114
First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
...
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c
Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"
2018-11-26 14:14:07 +01:00
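Note on 5f6045077c: the operand order matters because the x86 andnot intrinsics (for example _mm_andnot_ps) negate their first operand, while the generic convention quoted above is "A and not B". The scalar model below is only an illustration with hypothetical function names; it shows that a wrapper has to swap the operands to match the generic meaning.

```cpp
#include <cstdint>

// Generic convention from the commit message: "A and not B".
inline std::uint32_t pandnot(std::uint32_t a, std::uint32_t b) {
  return a & ~b;
}

// Scalar model of the x86 andnot instructions: (not A) and B.
inline std::uint32_t x86_andnot(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// A wrapper built on the x86 form must pass the operands in reverse.
inline std::uint32_t pandnot_via_x86(std::uint32_t a, std::uint32_t b) {
  return x86_andnot(b, a);
}
```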
Gael Guennebaud
382279eb7f
Extend unit test to recursively check half-packet types and non packet types
2018-11-26 14:10:07 +01:00
Gael Guennebaud
0836a715d6
bug #1611 : fix plog(0) on NEON
2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4
Fix typos
2018-11-23 22:22:14 +00:00
Gael Guennebaud
e3b22a6bd0
merge
2018-11-23 16:06:21 +01:00
Gael Guennebaud
ccabdd88c9
Fix reserved usage of double __ in macro names
2018-11-23 16:01:47 +01:00
Gael Guennebaud
572d62697d
check two ctors
2018-11-23 15:37:09 +01:00
Gael Guennebaud
354f14293b
Fix double = bool !
2018-11-23 15:12:06 +01:00
Gael Guennebaud
a7842daef2
Fix several uninitialized members from ctor
2018-11-23 15:10:28 +01:00
Christoph Hertzberg
ea60a172cf
Add default constructor to Bar to make test compile again with clang-3.8
2018-11-23 14:24:22 +01:00
Christoph Hertzberg
806352d844
Small typo found by Patrik Huber (pull request PR-547)
2018-11-23 12:34:27 +00:00
Gael Guennebaud
a476054879
bug #1624 : improve matrix-matrix product on ARM 64, 20% speedup
2018-11-23 10:25:19 +01:00
Gael Guennebaud
c685fe9838
Move regression test to right unit test file
2018-11-21 15:59:47 +01:00
Gael Guennebaud
4b2cebade8
Workaround weird MSVC bug
2018-11-21 15:53:37 +01:00
Christoph Hertzberg
0ec8afde57
Fixed most conversion warnings in MatrixFunctions module
2018-11-20 16:23:28 +01:00
Deven Desai
e7e6809e6b
ROCm/HIP specific fixes + updates
...
1. Eigen/src/Core/arch/GPU/Half.h
Updating the HIPCC implementation of half so that it can be declared as a __shared__ variable
2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h
introducing an EIGEN_USE_STD(func) macro that calls
- std::func by default
- ::func when eigen is being compiled with HIPCC
This change was requested in the previous HIP PR
(https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff)
3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h
Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC
4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
2018-11-19 18:13:59 +00:00
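Note on e7e6809e6b: a macro of the kind described in item 2 could look roughly like the sketch below. MY_USE_STD and MY_DEVICE_COMPILER are hypothetical names chosen for illustration; the exact definition of EIGEN_USE_STD in Macros.h is not reproduced here and may differ.

```cpp
#include <cmath>

// Hypothetical macro: pull `func` from namespace std by default, or from
// the global namespace when a device compiler (which declares its math
// functions there) is in use. MY_DEVICE_COMPILER is an assumed flag.
#if defined(MY_DEVICE_COMPILER)
  #define MY_USE_STD(func) using ::func;
#else
  #define MY_USE_STD(func) using std::func;
#endif

inline double root(double x) {
  MY_USE_STD(sqrt)  // expands to a using-declaration in this scope
  return sqrt(x);   // unqualified call picks up whichever was imported
}
```

Keeping the choice inside one macro means call sites stay identical across host and device builds.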
Gael Guennebaud
6a510fe69c
Make MaxPacketSize a true upper bound, even for fixed-size inputs
2018-11-16 11:25:32 +01:00
Gael Guennebaud
43c987b1c1
Add explicit regression test for bug #1622
2018-11-16 11:24:51 +01:00
Mark D Ryan
670d56441c
PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
...
Commit aa110e681b
optimised the multiplication of small dynamically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation. However, it
introduced a mismatch between the packet size and the requestedAlignment. This
mismatch can lead to crashes when the destination is not aligned. This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize
This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment
A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046
Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES
2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f
typo
2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a
help doxygen linking to DenseBase::NullaryExpr
2018-11-14 14:42:59 +01:00
Gael Guennebaud
4263f23c28
Improve doc on multi-threading and warn about hyper-threading
2018-11-14 14:42:29 +01:00
Gael Guennebaud
db529ae4ec
doxygen does not like \addtogroup and \ingroup in the same line
2018-11-14 14:42:06 +01:00
Rasmus Munk Larsen
72928a2c8a
Merged in rmlarsen/eigen2 (pull request PR-543)
...
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626
Remove accidental changes.
2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
2018-11-12 17:46:02 -08:00
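Note on 719d9aee65: the thread cap can be illustrated by the chunk planning alone. The sketch below is not the TensorThreadPoolDevice code; Chunk, plan_copy and chunked_memcpy are hypothetical names, and the chunks are copied in a plain loop where the real implementation would hand each one to a pool thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

struct Chunk { std::size_t offset, length; };

// Split n bytes into at most 4 contiguous chunks, one per worker.
// With zero or one available thread this degenerates to a single chunk,
// i.e. a regular memcpy.
inline std::vector<Chunk> plan_copy(std::size_t n, std::size_t available_threads) {
  const std::size_t workers =
      std::max<std::size_t>(1, std::min<std::size_t>(available_threads, 4));
  const std::size_t step = (n + workers - 1) / workers;  // ceil(n / workers)
  std::vector<Chunk> chunks;
  for (std::size_t begin = 0; begin < n; begin += step)
    chunks.push_back({begin, std::min(step, n - begin)});
  return chunks;
}

// Sequential stand-in: each iteration is the work one thread would do.
inline void chunked_memcpy(void* dst, const void* src, std::size_t n,
                           std::size_t available_threads) {
  for (const Chunk& c : plan_copy(n, available_threads))
    std::memcpy(static_cast<char*>(dst) + c.offset,
                static_cast<const char*>(src) + c.offset, c.length);
}
```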
Rasmus Munk Larsen
77b447c24e
Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX.
2018-11-12 13:42:24 -08:00
Gael Guennebaud
c81bdbdadc
Add manual doc on STL-compatible iterators
2018-11-12 22:06:33 +01:00
Gael Guennebaud
0105146915
Fix warning in c++03
2018-11-10 09:11:38 +01:00
Rasmus Munk Larsen
93f9988a7e
A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) support matrix exponential on platforms with 113 bits of mantissa for long doubles.
2018-11-09 14:15:32 -08:00
Gael Guennebaud
784a3f13cf
bug #1619 : fix mixing of const and non-const generic iterators
2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba
bug #1619 : make const and non-const iterators compatible
2018-11-09 16:49:19 +01:00
Gael Guennebaud
fbd6e7b025
add missing ref to a.zeta(b)
2018-11-09 13:53:42 +01:00
Gael Guennebaud
dffd1e11de
Limit the size of the toc
2018-11-09 13:52:34 +01:00
Gael Guennebaud
a88e0a0e95
Update doxy hacks wrt doxygen 1.8.13/14
2018-11-09 13:52:10 +01:00
Gael Guennebaud
bd9a00718f
Let doxygen see lastN
2018-11-09 11:35:48 +01:00
Gael Guennebaud
d7c644213c
Add and update manual pages for slicing, indexing, and reshaping.
2018-11-09 11:35:27 +01:00
Gael Guennebaud
a368848473
Recent xcode versions do support EIGEN_HAS_STATIC_ARRAY_TEMPLATE
2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6
Fix max-size in indexed-view
2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff
Merged in glchaves/eigen (pull request PR-539)
...
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gael Guennebaud
995730fc6c
Add option to disable plot generation
2018-11-07 00:41:16 +01:00
Gustavo Lima Chaves
4ad359237a
Vectorize row-by-row gebp loop iterations on 16 packets as well
...
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
2018-11-06 10:48:42 -08:00
Gael Guennebaud
9d318b92c6
add unit tests for bug #1619
2018-11-01 15:14:50 +01:00
Matthieu Vigne
8d7a73e48e
bug #1617 : Fix SolveTriangular.solveInPlace crashing for empty matrix.
...
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Christoph Hertzberg
66b28e290d
bug #1618 : Use different power-of-2 check to avoid MSVC warning
2018-11-01 13:23:19 +01:00
Rasmus Munk Larsen
07fcdd1438
Merged in ezhulenev/eigen-02 (pull request PR-534)
...
Fix cxx11_tensor_{block_access, reduction} tests
2018-10-25 18:34:35 +00:00
Eugene Zhulenev
8a977c1f46
Fix cxx11_tensor_{block_access, reduction} tests
2018-10-25 11:31:29 -07:00
Halie Murray-Davis
fb62d6d96e
Fix typo in tutorial documentation.
2018-10-25 04:55:34 +00:00
Christoph Hertzberg
b5f077d22c
Document EIGEN_NO_IO preprocessor directive
2018-10-25 16:49:25 +02:00
Christian von Schultz
4a40b3785d
Collapsed revision (based on pull request PR-325)
...
* Support compiling without IO streams
Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
Rasmus Munk Larsen
14054e217f
Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC). This breaks several targets in the TensorFlow Cuda build, e.g.,
...
INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code"
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code
4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen
954b4ca9d0
Suppress compiler warning about unused global variable.
2018-10-22 13:48:56 -07:00
Rasmus Munk Larsen
9caafca550
Merged in rmlarsen/eigen (pull request PR-532)
...
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-19 21:37:14 +00:00
Christoph Hertzberg
449ff74672
Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
...
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen
39fec15d5c
Merged eigen/eigen into default
2018-10-19 09:48:19 -07:00
Christoph Hertzberg
40fa6f98bf
bug #1606 : Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11.
...
Grafted manually from a4afa90d16
2018-10-19 17:20:51 +02:00
Rasmus Munk Larsen
d8f285852b
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-18 16:55:02 -07:00
Rasmus Munk Larsen
dda68f56ec
Fix GPU build due to gpu_assert not always being defined.
2018-10-18 16:29:29 -07:00
Gael Guennebaud
1dcf5a6ed8
fix typo in doc
2018-10-17 09:29:36 +02:00
Eugene Zhulenev
9e96e91936
Move from rvalue arguments in ThreadPool enqueue* methods
2018-10-16 16:48:32 -07:00
Eugene Zhulenev
217d839816
Reduce thread scheduling overhead in parallelFor
2018-10-16 14:53:06 -07:00
Rasmus Munk Larsen
d52763bb4f
Merged in ezhulenev/eigen-02 (pull request PR-528)
...
[TensorBlockIO] Check if it's allowed to squeeze inner dimensions
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-10-16 15:39:40 +00:00
Gael Guennebaud
0f780bb0b4
Fix float-to-double warning
2018-10-16 09:19:45 +02:00
Eugene Zhulenev
900c7c61bb
Check if it's allowed to squeeze inner dimensions in TensorBlockIO
2018-10-15 16:52:33 -07:00
Gael Guennebaud
a39e0f7438
bug #1612 : fix regression in "outer-vectorization" of partial reductions for PacketSize==1 (aka complex<double>)
2018-10-16 01:04:25 +02:00
Gael Guennebaud
e3b85771d7
Show call stack in case of failing sparse solving.
2018-10-16 00:43:44 +02:00
Gael Guennebaud
d2d570c116
Remove useless (and broken) resize
2018-10-16 00:42:48 +02:00
Gael Guennebaud
f0fb95135d
Iterative solvers: unify and fix handling of multiple rhs.
...
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
2747b98cfc
DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve
2018-10-15 23:46:00 +02:00
Gael Guennebaud
d835a0bf53
relax number of iterations checks to avoid false negatives
2018-10-15 10:23:32 +02:00
Gael Guennebaud
3a33db4de5
merge
2018-10-15 09:22:27 +02:00
Rasmus Munk Larsen
0ed811a9c1
Suppress unused variable compiler warning in sparse subtest 3.
2018-10-12 13:41:57 -07:00
Mark D Ryan
aa110e681b
PR 526: Speed up multiplication of small, dynamically sized matrices
...
The Packet16f, Packet8f and Packet8d types are too large to use with dynamically
sized matrices typically processed by the SliceVectorizedTraversal specialization of
the dense_assignment_loop. Using these types is likely to lead to little or no
vectorization. Significant slowdown in the multiplication of these small matrices can
be observed when building with AVX and AVX512 enabled.
This patch introduces a new dense_assignment_kernel that is used when
computing small products whose operands have dynamic dimensions. It ensures that the
PacketSize used is no larger than 4, thereby increasing the chance that vectorized
instructions will be used when computing the product.
I tested all 969 possible combinations of M, K, and N that are handled by the
dense_assignment_loop on x86 builds. Although a few combinations are slowed down
by this patch they are far outnumbered by the cases that are sped up, as the
following results demonstrate.
Disabling Packet8d on AVX512 builds:
Total Cases: 969
Better: 511
Worse: 85
Same: 373
Max Improvement: 169.00% (4 8 6)
Max Degradation: 36.50% (8 5 3)
Median Improvement: 35.46%
Median Degradation: 17.41%
Total FLOPs Improvement: 19.42%
Disabling Packet16f and Packet8f on AVX512 builds:
Total Cases: 969
Better: 658
Worse: 5
Same: 306
Max Improvement: 214.05% (8 6 5)
Max Degradation: 22.26% (16 2 1)
Median Improvement: 60.05%
Median Degradation: 13.32%
Total FLOPs Improvement: 59.58%
Disabling Packet8f on AVX builds:
Total Cases: 969
Better: 663
Worse: 96
Same: 210
Max Improvement: 155.29% (4 10 5)
Max Degradation: 35.12% (8 3 2)
Median Improvement: 34.28%
Median Degradation: 15.05%
Total FLOPs Improvement: 26.02%
2018-10-12 15:20:21 +02:00
Eugene Zhulenev
d9392f9e55
Fix code format
2018-11-02 14:51:35 -07:00
Eugene Zhulenev
118520f04a
Workaround nvcc+msvc compiler bug
2018-11-02 14:48:28 -07:00
Christoph Hertzberg
24dc076519
Explicitly convert 0 to Scalar for custom types
2018-10-12 10:22:19 +02:00
Gael Guennebaud
8214cf1896
Make sparse_basic includable from sparse_extra, but disable it since sparse_basic(DynamicSparseMatrix) does not compile at all anyways
2018-10-11 10:27:23 +02:00
Gael Guennebaud
43633fbaba
Fix warning with AVX512f
2018-10-11 10:13:48 +02:00
Gael Guennebaud
97e2c808e9
Fix avx512 plog(NaN) to return NaN instead of +inf
2018-10-11 10:13:13 +02:00
Gael Guennebaud
b3f66d29a5
Enable avx512 plog with clang
2018-10-11 10:12:21 +02:00
Gael Guennebaud
2ef1b39674
Relaxed fastmath unit test: if std::foo fails, then let's only trigger a warning if numext::foo fails too.
...
A true error will be triggered only if std::foo works but our numext::foo fails.
2018-10-11 09:45:30 +02:00
Gael Guennebaud
1d5a6363ea
relax numerical tests from equal to approx (x87)
2018-10-11 09:29:56 +02:00
Gael Guennebaud
f0aa7e40fc
Fix regression in changeset 5335659c47
2018-10-10 23:47:30 +02:00
Gael Guennebaud
ce243ee45b
bug #520 : add diagmat +/- diagmat operators.
2018-10-10 23:38:22 +02:00
Gael Guennebaud
5335659c47
Merged in ezhulenev/eigen-02 (pull request PR-525)
...
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 20:59:00 +00:00
Gael Guennebaud
eec0dfd688
bug #632 : add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense.
...
They are rewritten as two compound assignments to bypass the hybrid dense-sparse iterator.
2018-10-10 22:50:15 +02:00
Eugene Zhulenev
8e6dc2c81d
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 13:23:52 -07:00
Gael Guennebaud
76ceae49c1
bug #1609 : add inplace transposition unit test
2018-10-10 21:48:58 +02:00
Eugene Zhulenev
2bf1a31d81
Use void type if stl-style iterators are not supported
2018-10-10 10:31:40 -07:00
Christoph Hertzberg
f3130ee1ba
Avoid empty macro arguments
2018-10-10 08:23:40 +02:00
Rasmus Munk Larsen
e8918743c1
Merged in ezhulenev/eigen-01 (pull request PR-523)
...
Compile time detection for unimplemented stl-style iterators
2018-10-09 23:42:01 +00:00
Eugene Zhulenev
befcac883d
Hide stl-container detection test under #if
2018-10-09 15:36:01 -07:00
Eugene Zhulenev
c0ca8a9fa3
Compile time detection for unimplemented stl-style iterators
2018-10-09 15:28:23 -07:00
Gael Guennebaud
1dd1f8e454
bug #65 : add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean()
2018-10-09 23:36:50 +02:00
Gael Guennebaud
bfa2a81a50
Make redux_vec_unroller more flexible regarding packet-type
2018-10-09 23:30:41 +02:00
Gael Guennebaud
c0c3be26ed
Extend unit tests for partial reductions
2018-10-09 22:54:54 +02:00
Christoph Hertzberg
3f2c8b7ff0
Fix a lot of Doxygen warnings in Tensor module
2018-10-09 20:22:47 +02:00
Christoph Hertzberg
f6359ad795
Small Doxygen fixes
2018-10-09 19:33:35 +02:00
Gael Guennebaud
7a882c05ab
Fix compilation on CUDA
2018-10-09 17:02:16 +02:00
Gael Guennebaud
93a6192e98
fix mpreal for mpfr<4.0.0
2018-10-09 09:15:22 +02:00
Rasmus Munk Larsen
d16634c4d4
Fix out-of bounds access in TensorArgMax.h.
2018-10-08 16:41:36 -07:00
Rasmus Munk Larsen
1a737e1d6a
Fix contraction test.
2018-10-08 16:37:07 -07:00
Gael Guennebaud
e00487f7d2
bug #1603 : add parenthesis around ternary operator in function body as well as a harmless attempt to make MSVC happy.
2018-10-08 22:27:04 +02:00
Gael Guennebaud
2eda9783de
typo
2018-10-08 21:37:46 +02:00
Gael Guennebaud
c6e2dde714
fix c++11 deprecated warning
2018-10-08 18:26:05 +02:00
Gael Guennebaud
6cc9b2c831
fix warning in mpreal.h
2018-10-08 18:25:37 +02:00
Gael Guennebaud
649d4758a6
merge
2018-10-08 17:35:18 +02:00
Gael Guennebaud
aa5820056e
Unify c++11 usage in doc's examples and snippets
2018-10-08 17:32:54 +02:00
Gael Guennebaud
e29bfe8479
Update included mpreal header to 3.6.5 and fix deprecated warnings.
2018-10-08 17:09:23 +02:00
Gael Guennebaud
64b1a15318
Workaround stupid warning
2018-10-08 12:01:18 +02:00
Gael Guennebaud
c9643f4a6f
Disable C++11 deprecated warning when limiting Eigen to C++98
2018-10-08 10:43:43 +02:00
Gael Guennebaud
774bb9d6f7
fix a doxygen issue
2018-10-08 09:30:15 +02:00
Gael Guennebaud
6c3f6cd52b
Fix maybe-uninitialized warning
2018-10-07 23:29:51 +02:00
Gael Guennebaud
bcb7c66b53
Workaround gcc's alloc-size-larger-than= warning
2018-10-07 21:55:59 +02:00
Gael Guennebaud
16b2001ece
Fix gcc 8.1 warning: "maybe use uninitialized"
2018-10-07 21:54:49 +02:00
Gael Guennebaud
6512c5e136
Implement a better workaround for GCC's bug #87544
2018-10-07 15:00:05 +02:00
Gael Guennebaud
409132bb81
Workaround gcc bug making it trigger an invalid warning
2018-10-07 09:23:15 +02:00
Gael Guennebaud
c6a1ab4036
Workaround MSVC compilation issue
2018-10-06 13:49:17 +02:00
Gael Guennebaud
e21766c6f5
Clarify doc of rowwise/colwise/vectorwise.
2018-10-05 23:12:09 +02:00
Gael Guennebaud
d92f004ab7
Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns
2018-10-05 23:11:21 +02:00
Gael Guennebaud
91613bf2c2
Add support for c++11 snippets
2018-10-05 23:08:39 +02:00
Gael Guennebaud
3e64b1fc86
Move iterators to internal, improve doc, make unit test c++03 friendly
2018-10-03 15:13:15 +02:00
Gael Guennebaud
2b2b4d0580
fix unused warning
2018-10-03 14:16:21 +02:00
Gael Guennebaud
8a1e98240e
add unit tests
2018-10-03 11:56:27 +02:00
Gael Guennebaud
5f26f57598
Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>).
...
This means that AutoOrder is now allowed, and it always returns a column-vector.
2018-10-03 11:41:47 +02:00
Gael Guennebaud
0481900e25
Add pointer-based iterator for direct-access expressions
2018-10-02 23:44:36 +02:00
Christoph Hertzberg
c5f1d0a72a
Fix shadow warning
2018-10-02 19:01:08 +02:00
Christoph Hertzberg
b92c71235d
Move struct outside of method for C++03 compatibility.
2018-10-02 18:59:10 +02:00
Christoph Hertzberg
051f9c1aff
Make code compile in C++03 mode again
2018-10-02 18:36:30 +02:00
Christoph Hertzberg
b786ce8c72
Fix conversion warning ... again
2018-10-02 18:35:25 +02:00
Gael Guennebaud
8c38528168
Factorize RowsProxy/ColsProxy and related iterators using subVector<>(Index)
2018-10-02 14:03:26 +02:00
Gael Guennebaud
12487531ce
Add templated subVector<Vertical/Horizontal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns)
2018-10-02 14:02:34 +02:00
Gael Guennebaud
37e29fc893
Use Index instead of ptrdiff_t or int, fix random-accessors.
2018-10-02 13:29:32 +02:00
Gael Guennebaud
de2efbc43c
bug #1605 : workaround ABI issue with vector types (aka __m128) versus scalar types (aka float)
2018-10-01 23:45:55 +02:00
Gael Guennebaud
b0c66adfb1
bug #231 : initial implementation of STL iterators for dense expressions
2018-10-01 23:21:37 +02:00
Christoph Hertzberg
564ca71e39
Merged in deven-amd/eigen/HIP_fixes (pull request PR-518)
...
PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)
2018-10-01 16:51:04 +00:00
Deven Desai
94898488a6
This commit contains the following (HIP specific) updates:
...
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
  Changing "pass-by-reference" argument to be "pass-by-value" instead
  (in a __global__ function decl).
  "pass-by-reference" arguments to __global__ functions are unwise,
  and will be explicitly flagged as errors by the newer versions of HIP.
- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
  Changes introduced in recent commits break the HIP compile.
  Adding EIGEN_DEVICE_FUNC attribute to some functions and
  calling ::malloc/free instead of the corresponding std:: versions
  to get the HIP compile working again.
- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
  A change introduced in a recent commit breaks the HIP compile
  (link stage errors out due to failure to inline a function).
  Disabling the recently introduced code (only for HIP compile), to get
  the eigen nightly testing going again.
  Will submit another PR once we have the proper fix.
- Eigen/src/Core/util/ConfigureVectorization.h
  Enabling GPU VECTOR support when HIP compiler is in use
  (for both the host and device compile phases)
2018-10-01 14:28:37 +00:00
Rasmus Munk Larsen
2088c0897f
Merged eigen/eigen into default
2018-09-28 16:00:46 -07:00
Rasmus Munk Larsen
31629bb964
Get rid of unused variable warning.
2018-09-28 16:00:09 -07:00
Eugene Zhulenev
bb13d5d917
Fix bug in copy optimization in Tensor slicing.
2018-09-28 14:34:42 -07:00
Rasmus Munk Larsen
104e8fa074
Fix a few warnings and rename a variable to not shadow "last".
2018-09-28 12:00:08 -07:00
Rasmus Munk Larsen
7c1b47840a
Merged in ezhulenev/eigen-01 (pull request PR-514)
...
Add tests for evalShardedByInnerDim contraction + fix bugs
2018-09-28 18:37:54 +00:00
Eugene Zhulenev
524c81f3fa
Add tests for evalShardedByInnerDim contraction + fix bugs
2018-09-28 11:24:08 -07:00
Christoph Hertzberg
86ba50be39
Fix integer conversion warnings
2018-09-28 19:33:39 +02:00
Eugene Zhulenev
e95696acb3
Optimize TensorBlockCopyOp
2018-09-27 14:49:26 -07:00
Eugene Zhulenev
9f33e71e9d
Revert code lost in merge
2018-09-27 12:08:17 -07:00
Eugene Zhulenev
a7a3e9f2b6
Merge with eigen/eigen default
2018-09-27 12:05:06 -07:00
Eugene Zhulenev
9f4988959f
Remove explicit mkldnn support and redundant TensorContractionKernelBlocking
2018-09-27 11:49:19 -07:00
Rasmus Munk Larsen
1e5750a5b8
Merged in rmlarsen/eigen4 (pull request PR-511)
...
Parallelize tensor contraction over the inner dimension.
2018-09-27 17:18:32 +00:00
Gael Guennebaud
af3ad4b513
oops, I've been too fast in previous copy/paste
2018-09-27 09:28:57 +02:00
Gael Guennebaud
24b163a877
#pragma GCC diagnostic push/pop is not supported prior to gcc 4.6
2018-09-27 09:23:54 +02:00
Eugene Zhulenev
b314376f9c
Test mkldnn pack for doubles
2018-09-26 18:22:24 -07:00
Eugene Zhulenev
22ed98a331
Conditionally add mkldnn test
2018-09-26 17:57:37 -07:00
Rasmus Munk Larsen
d956204ab2
Remove "false &&" left over from test.
2018-09-26 17:03:30 -07:00
Rasmus Munk Larsen
3815aeed7a
Parallelize tensor contraction over the inner dimension in cases where one or both of the outer dimensions (m and n) are small but k is large. This speeds up individual matmul microbenchmarks by up to 85%.
...
Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server.
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_Matmul_1_80_13522_1 387457 396013 -2.2%
BM_Matmul_1_80_13522_2 406487 230789 +43.2%
BM_Matmul_1_80_13522_4 395821 123211 +68.9%
BM_Matmul_1_80_13522_6 391625 97002 +75.2%
BM_Matmul_1_80_13522_8 408986 113828 +72.2%
BM_Matmul_1_80_13522_16 399988 67600 +83.1%
BM_Matmul_1_80_13522_22 411546 60044 +85.4%
BM_Matmul_1_80_13522_32 393528 57312 +85.4%
BM_Matmul_1_80_13522_44 390047 63525 +83.7%
BM_Matmul_1_80_13522_88 387876 63592 +83.6%
BM_Matmul_1_1500_500_1 245359 248119 -1.1%
BM_Matmul_1_1500_500_2 401833 143271 +64.3%
BM_Matmul_1_1500_500_4 210519 100231 +52.4%
BM_Matmul_1_1500_500_6 251582 86575 +65.6%
BM_Matmul_1_1500_500_8 211499 80444 +62.0%
BM_Matmul_3_250_512_1 70297 68551 +2.5%
BM_Matmul_3_250_512_2 70141 52450 +25.2%
BM_Matmul_3_250_512_4 67872 58204 +14.2%
BM_Matmul_3_250_512_6 71378 63340 +11.3%
BM_Matmul_3_250_512_8 69595 41652 +40.2%
BM_Matmul_3_250_512_16 72055 42549 +40.9%
BM_Matmul_3_250_512_22 70158 54023 +23.0%
BM_Matmul_3_250_512_32 71541 56042 +21.7%
BM_Matmul_3_250_512_44 71843 57019 +20.6%
BM_Matmul_3_250_512_88 69951 54045 +22.7%
BM_Matmul_3_1500_512_1 369328 374284 -1.4%
BM_Matmul_3_1500_512_2 428656 223603 +47.8%
BM_Matmul_3_1500_512_4 205599 139508 +32.1%
BM_Matmul_3_1500_512_6 214278 139071 +35.1%
BM_Matmul_3_1500_512_8 184149 142338 +22.7%
BM_Matmul_3_1500_512_16 156462 156983 -0.3%
BM_Matmul_3_1500_512_22 163905 158259 +3.4%
BM_Matmul_3_1500_512_32 155314 157662 -1.5%
BM_Matmul_3_1500_512_44 235434 158657 +32.6%
BM_Matmul_3_1500_512_88 156779 160275 -2.2%
BM_Matmul_1500_4_512_1 363358 349528 +3.8%
BM_Matmul_1500_4_512_2 303134 263319 +13.1%
BM_Matmul_1500_4_512_4 176208 130086 +26.2%
BM_Matmul_1500_4_512_6 148026 115449 +22.0%
BM_Matmul_1500_4_512_8 131656 98421 +25.2%
BM_Matmul_1500_4_512_16 134011 82861 +38.2%
BM_Matmul_1500_4_512_22 134950 85685 +36.5%
BM_Matmul_1500_4_512_32 133165 90081 +32.4%
BM_Matmul_1500_4_512_44 133203 90644 +32.0%
BM_Matmul_1500_4_512_88 134106 100566 +25.0%
BM_Matmul_4_1500_512_1 439243 435058 +1.0%
BM_Matmul_4_1500_512_2 451830 257032 +43.1%
BM_Matmul_4_1500_512_4 276434 164513 +40.5%
BM_Matmul_4_1500_512_6 182542 144827 +20.7%
BM_Matmul_4_1500_512_8 179411 166256 +7.3%
BM_Matmul_4_1500_512_16 158101 155560 +1.6%
BM_Matmul_4_1500_512_22 152435 155448 -1.9%
BM_Matmul_4_1500_512_32 155150 149538 +3.6%
BM_Matmul_4_1500_512_44 193842 149777 +22.7%
BM_Matmul_4_1500_512_88 149544 154468 -3.3%
2018-09-26 16:47:13 -07:00
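The idea of sharding a contraction by its inner dimension can be sketched in plain C++. This is only an illustration under stated assumptions, not the Tensor module's implementation: `matmul_sharded_by_k` and `num_shards` are hypothetical names, the matrices are assumed to be row-major, and the shard loop is written sequentially to keep the sketch dependency-free. Each shard writes to its own partial result, so the iterations are independent and could be handed to separate threads before the final summation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C (m x n) = A (m x k) * B (k x n), all row-major, with the inner
// dimension k split into `num_shards` independent ranges.
std::vector<double> matmul_sharded_by_k(const std::vector<double>& a,
                                        const std::vector<double>& b,
                                        std::size_t m, std::size_t k,
                                        std::size_t n, std::size_t num_shards) {
  assert(num_shards > 0 && a.size() == m * k && b.size() == k * n);
  const std::size_t shard = (k + num_shards - 1) / num_shards;
  // One private m x n partial result per shard, so shards never share writes.
  std::vector<std::vector<double> > partial(num_shards,
                                            std::vector<double>(m * n, 0.0));
  for (std::size_t s = 0; s < num_shards; ++s) {  // independent work items
    const std::size_t k_begin = std::min(k, s * shard);
    const std::size_t k_end = std::min(k, k_begin + shard);
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t p = k_begin; p < k_end; ++p)
        for (std::size_t j = 0; j < n; ++j)
          partial[s][i * n + j] += a[i * k + p] * b[p * n + j];
  }
  // Final reduction of the per-shard partial results into C.
  std::vector<double> c(m * n, 0.0);
  for (std::size_t s = 0; s < num_shards; ++s)
    for (std::size_t idx = 0; idx < m * n; ++idx) c[idx] += partial[s][idx];
  return c;
}
```

The extra memory and the final summation are the cost of this scheme, which may be why the single-thread rows in the table above show little or no gain.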
Eugene Zhulenev
71cd3fbd6a
Support multiple contraction kernel types in TensorContractionThreadPool
2018-09-26 11:08:47 -07:00
Christoph Hertzberg
0a3356f4ec
Don't deactivate BVH test for clang (probably, this was failing for very old versions of clang)
2018-09-25 20:26:16 +02:00
Gael Guennebaud
41c3a2ffc1
Fix documentation of reshape to vectors.
2018-09-25 16:35:44 +02:00
Christoph Hertzberg
2c083ace3e
Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides
2018-09-24 18:01:17 +02:00
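A minimal sketch of how such macros are commonly defined follows. The names `MY_OVERRIDE` and `MY_FINAL` are hypothetical stand-ins, and the `__cplusplus` check is an assumption about how C++11 support might be detected; the actual definitions in Eigen may differ.

```cpp
// Hypothetical stand-ins for the macros named in the commit: they expand to
// the C++11 keywords when available and to nothing in C++03 mode.
#if __cplusplus >= 201103L
#define MY_OVERRIDE override
#define MY_FINAL final
#else
#define MY_OVERRIDE
#define MY_FINAL
#endif

struct Base {
  virtual ~Base() {}
  virtual int id() const { return 0; }
};

struct Derived MY_FINAL : Base {
  // In C++11 mode a signature mismatch here becomes a compile error.
  int id() const MY_OVERRIDE { return 1; }
};

// Dispatches through the base interface.
int call_id(const Base& b) { return b.id(); }
```

With this pattern the same source compiles in both language modes, and the extra checking is active only where the compiler supports it.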
Gael Guennebaud
626942d9dd
fix alignment issue in ploaddup for AVX512
2018-09-28 16:57:32 +02:00
Gael Guennebaud
84a1101b36
Merge with default.
2018-09-23 21:52:58 +02:00
Gael Guennebaud
795e12393b
Fix logic in diagonal*dense product in a corner case.
...
The problem was for: diag(1x1) * mat(1,n)
2018-09-22 16:44:33 +02:00
Gael Guennebaud
bac36d0996
Demangle Traversal and Unrolling in Redux
2018-09-21 23:03:45 +02:00
Gael Guennebaud
c696dbcaa6
Fix shadowing of last and all
2018-09-21 23:02:33 +02:00
Christoph Hertzberg
e3c8289047
Replace unused PREDICATE by corresponding STATIC_ASSERT
2018-09-21 21:15:51 +02:00
Gael Guennebaud
1bf12880ae
Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all)
2018-09-21 16:50:04 +02:00
Gael Guennebaud
4291f167ee
Add missing plugins to DynamicSparseMatrix -- fix sparse_extra_3
2018-09-21 14:53:43 +02:00
Gael Guennebaud
03a0cb2b72
fix unalignedcount for avx512
2018-09-21 14:40:26 +02:00
Gael Guennebaud
371068992a
Add more debug output
2018-09-21 14:32:39 +02:00
Gael Guennebaud
91716f03a7
Fix vectorization logic unit test for AVX512
2018-09-21 14:32:24 +02:00
Gael Guennebaud
b00e48a867
Improve slice-vectorization logic for redux (significant speed-up for reduction of blocks)
2018-09-21 13:45:56 +02:00
Gael Guennebaud
a488d59787
merge with default Eigen
2018-09-21 11:51:49 +02:00
Gael Guennebaud
47720e7970
Doc fixes
2018-09-21 11:48:22 +02:00
Gael Guennebaud
3ec2985914
Merged indexing cleanup (pull request PR-506)
2018-09-21 09:36:05 +00:00
Gael Guennebaud
651e5d4866
Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or AVX with malloc aligned on 8 bytes only.
...
This change also makes it future-proof for AVX1024
2018-09-21 10:33:22 +02:00
Eugene Zhulenev
719e438a20
Collapsed revision
...
* Split cxx11_tensor_executor test
* Register test parts with EIGEN_SUFFIXES
* Fix EIGEN_SUFFIXES in cxx11_tensor_executor test
2018-09-20 15:19:12 -07:00
Gael Guennebaud
f0ef3467de
Fix doc
2018-09-20 22:57:28 +02:00
Gael Guennebaud
617f75f117
Add indexing namespace
2018-09-20 22:57:10 +02:00
Gael Guennebaud
0c56d22e2e
Fix shadowing
2018-09-20 22:56:21 +02:00
Rasmus Munk Larsen
8e2be7777e
Merged eigen/eigen into default
2018-09-20 11:41:15 -07:00
Rasmus Munk Larsen
5d2e759329
Initialize BlockIteratorState in a C++03 compatible way.
2018-09-20 11:40:43 -07:00
Gael Guennebaud
e04faca930
merge
2018-09-20 18:33:54 +02:00
Gael Guennebaud
d37188b9c1
Fix MPrealSupport
2018-09-20 18:30:10 +02:00
Gael Guennebaud
3c6dc93f99
Fix GPU support.
2018-09-20 18:29:21 +02:00
Gael Guennebaud
e0f6d352fb
Rename test/array.cpp to test/array_cwise.cpp to avoid conflicts with the array header.
2018-09-20 18:07:32 +02:00
Gael Guennebaud
eeeb18814f
Fix warning
2018-09-20 17:48:56 +02:00
Gael Guennebaud
9419f506d0
Fix regression introduced by the previous fix for AVX512.
...
It broke the complex-complex case on SSE.
2018-09-20 17:32:34 +02:00
Christoph Hertzberg
a0166ab651
Workaround for spurious "array subscript is above array bounds" warnings with g++4.x
2018-09-20 17:08:43 +02:00
Gael Guennebaud
e38d1ab4d1
Workaround increases required alignment warning
2018-09-20 17:07:33 +02:00
Christoph Hertzberg
c50250cb24
Avoid warning "suggest braces around initialization of subobject".
...
This test is not run in C++03 mode, so no compatibility is lost.
2018-09-20 17:03:42 +02:00
Gael Guennebaud
71496b0e25
Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512).
...
This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.
2018-09-20 17:01:24 +02:00
Gael Guennebaud
5a30eed17e
Fix warnings in AVX512
2018-09-20 16:58:51 +02:00
Gael Guennebaud
2cf6d3050c
Disable ignoring attributes warning
2018-09-20 11:38:19 +02:00
Rasmus Munk Larsen
44d8274383
Cast to longer type.
2018-09-19 13:31:42 -07:00
Rasmus Munk Larsen
d638b62dda
Silence compiler warning.
2018-09-19 13:27:55 -07:00
Rasmus Munk Larsen
db9c9df59a
Silence more compiler warnings.
2018-09-19 11:50:27 -07:00
Rasmus Munk Larsen
febd09dcc0
Silence compiler warnings in ThreadPoolInterface.h.
2018-09-19 11:11:04 -07:00
Gael Guennebaud
c3a19527a2
Fix doc wrt previous change
2018-09-19 11:49:26 +02:00
Gael Guennebaud
dfa8439e4d
Update reshaped API to use RowMajor/ColMajor directly as integral values instead of introducing RowOrder/ColOrder types.
...
The API changed from A.reshaped(rows,cols,RowOrder) to A.template reshaped<RowOrder>(rows,cols).
2018-09-19 11:49:26 +02:00
luz.paz
f67b19a884
[PATCH 1/2] Misc. typos
...
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelist consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
CMakeLists.txt | 26 +++++++++----------
Eigen/src/Core/GenericPacketMath.h | 2 +-
Eigen/src/SparseLU/SparseLU.h | 2 +-
bench/bench_norm.cpp | 2 +-
doc/HiPerformance.dox | 2 +-
doc/QuickStartGuide.dox | 2 +-
.../Eigen/CXX11/src/Tensor/TensorChipping.h | 6 ++---
.../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h | 2 +-
.../src/Tensor/TensorForwardDeclarations.h | 4 +--
.../src/Tensor/TensorGpuHipCudaDefines.h | 2 +-
.../Eigen/CXX11/src/Tensor/TensorReduction.h | 2 +-
.../CXX11/src/Tensor/TensorReductionGpu.h | 2 +-
.../test/cxx11_tensor_concatenation.cpp | 2 +-
unsupported/test/cxx11_tensor_executor.cpp | 2 +-
14 files changed, 29 insertions(+), 29 deletions(-)
2018-09-18 04:15:01 -04:00
Gael Guennebaud
297ca62319
ease transition by adding placeholders::all/last/end as deprecated
2018-09-17 16:24:52 +02:00
Gael Guennebaud
2014c7ae28
Move all, last, end from Eigen::placeholders namespace to Eigen::, and rename end to lastp1 to avoid conflicts with std::end.
2018-09-15 14:35:10 +02:00
Gael Guennebaud
82772e8d9d
Rename Symbolic namespace to symbolic to be consistent with numext namespace
2018-09-15 14:16:20 +02:00
Rasmus Munk Larsen
400512bfad
Merged in ezhulenev/eigen-02 (pull request PR-501)
...
Enable DSizes type promotion with c++03
2018-09-19 00:50:04 +00:00
Eugene Zhulenev
c4627039ac
Support static dimensions (aka IndexList) in Tensor::resize(...)
2018-09-18 14:25:21 -07:00
Gael Guennebaud
3e8188fc77
bug #1600 : initialize m_info to InvalidInput by default, even though m_info is not accessible until it has been initialized (assert)
2018-09-18 21:24:48 +02:00
Eugene Zhulenev
218a7b9840
Enable DSizes type promotion with c++03 compilers
2018-09-18 10:57:00 -07:00
Ravi Kiran
1f0c941c3d
Collapsed revision
...
* Merged eigen/eigen into default
2018-09-17 18:29:12 -07:00
Rasmus Munk Larsen
03a88c57e1
Merged in ezhulenev/eigen-02 (pull request PR-498)
...
Add DSizes index type promotion
2018-09-17 21:58:38 +00:00
Rasmus Munk Larsen
5ca0e4a245
Merged in ezhulenev/eigen-01 (pull request PR-497)
...
Fix warnings in IndexList array_prod
2018-09-17 20:15:06 +00:00
Eugene Zhulenev
a5cd4e9ad1
Replace deprecated Eigen::DenseIndex with Eigen::Index in TensorIndexList
2018-09-17 10:58:07 -07:00
Gael Guennebaud
b311bfb752
bug #1596 : fix inclusion of Eigen's header within unsupported modules.
2018-09-17 09:54:29 +02:00
Gael Guennebaud
72f19c827a
typo
2018-09-16 22:10:34 +02:00
Eugene Zhulenev
66f056776f
Add DSizes index type promotion
2018-09-15 15:17:38 -07:00
Eugene Zhulenev
f313126dab
Fix warnings in IndexList array_prod
2018-09-15 13:47:54 -07:00
Christoph Hertzberg
42705ba574
Fix weird error for building with g++-4.7 in C++03 mode.
2018-09-15 12:43:41 +02:00
Rasmus Munk Larsen
c2383f95af
Merged in ezhulenev/eigen/fix_dsizes (pull request PR-494)
...
Fix DSizes IndexList constructor
2018-09-15 02:36:19 +00:00
Rasmus Munk Larsen
30290cdd56
Merged in ezhulenev/eigen/moar_eigen_fixes_3 (pull request PR-493)
...
Const cast scalar pointer in TensorSlicingOp evaluator
Approved-by: Sameer Agarwal <sameeragarwal@google.com >
2018-09-15 02:35:07 +00:00
Eugene Zhulenev
f7d0053cf0
Fix DSizes IndexList constructor
2018-09-14 19:19:13 -07:00
Rasmus Munk Larsen
601e289d27
Merged in ezhulenev/eigen/moar_eigen_fixes_1 (pull request PR-492)
...
Explicitly construct tensor block dimensions from evaluator dimensions
2018-09-15 01:36:21 +00:00
Eugene Zhulenev
71070a1e84
Const cast scalar pointer in TensorSlicingOp evaluator
2018-09-14 17:17:50 -07:00
Eugene Zhulenev
4863375723
Explicitly construct tensor block dimensions from evaluator dimensions
2018-09-14 16:55:05 -07:00
Rasmus Munk Larsen
14e35855e1
Merged in chtz/eigen-maxsizevector (pull request PR-490)
...
Let MaxSizeVector respect alignment of objects
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
2018-09-14 23:29:24 +00:00
Rasmus Munk Larsen
281e631839
Merged in ezhulenev/eigen/indexlist_to_dsize (pull request PR-491)
...
Support reshaping with static shapes and dimensions conversion in tensor broadcasting
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
2018-09-14 22:45:52 +00:00
Eugene Zhulenev
1b8d70a22b
Support reshaping with static shapes and dimensions conversion in tensor broadcasting
2018-09-14 15:25:27 -07:00
Christoph Hertzberg
007f165c69
bug #1598 : Let MaxSizeVector respect alignment of objects and add a unit test
...
Also revert 8b3d9ed081
2018-09-14 20:21:56 +02:00
Christoph Hertzberg
d7378aae8e
Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the possibility for alignments larger than the standard alignment.
2018-09-14 20:17:47 +02:00
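The classic way to hand-roll an aligned allocation for an arbitrary alignment can be sketched as follows. This is an illustration only: `sketch_aligned_malloc` and `sketch_aligned_free` are hypothetical names, and the sketch assumes that `std::malloc` returns memory that is at least pointer-aligned and that `alignment` is a power of two no smaller than `sizeof(void*)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: over-allocate, step up to the next multiple of the
// requested alignment, and stash the original pointer just before the
// aligned block so that it can be freed later.
void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
  assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
  void* original = std::malloc(size + alignment);
  if (original == nullptr) return nullptr;
  const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
  // Round down to a multiple of `alignment`, then move one alignment up.
  const std::uintptr_t aligned =
      (raw & ~(static_cast<std::uintptr_t>(alignment) - 1)) + alignment;
  void** slot = reinterpret_cast<void**>(aligned) - 1;
  *slot = original;  // remembered for sketch_aligned_free
  return reinterpret_cast<void*>(aligned);
}

// Releases a block obtained from sketch_aligned_malloc.
void sketch_aligned_free(void* ptr) {
  if (ptr != nullptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
}
```

Because the alignment is a run-time parameter, the same routine can serve 16-, 32- or 64-byte requirements, at the cost of up to `alignment` wasted bytes per allocation.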
Rasmus Munk Larsen
9b864cdb37
Merged in rmlarsen/eigen3 (pull request PR-480)
...
Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set.
2018-09-14 00:05:09 +00:00
Rasmus Munk Larsen
d0eef5fe6c
Don't use bracket syntax in ctor.
2018-09-13 17:04:05 -07:00
Rasmus Munk Larsen
6313dde390
Fix merge error.
2018-09-13 16:42:05 -07:00
Rasmus Munk Larsen
0db590d22d
Backed out changeset 01197e4452
2018-09-13 16:20:57 -07:00
Rasmus Munk Larsen
b3f4c067d9
Merge
2018-09-13 16:18:52 -07:00
Rasmus Munk Larsen
2b07018140
Enable vectorized version on GPUs. The underlying bug has been fixed.
2018-09-13 16:12:22 -07:00
Rasmus Munk Larsen
53568e3549
Merged in ezhulenev/eigen/tiled_evalution_support (pull request PR-444)
...
Tiled evaluation for Tensor ops
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
Approved-by: Gael Guennebaud <g.gael@free.fr >
2018-09-13 22:05:47 +00:00
Eugene Zhulenev
01197e4452
Fix warnings
2018-09-13 15:03:36 -07:00
Gael Guennebaud
1141bcf794
Fix conjugate-gradient for very small rhs
2018-09-13 23:53:28 +02:00
Gael Guennebaud
7f3b17e403
MSVC 2015 supports c++11 thread-local-storage
2018-09-13 18:15:07 +02:00
Eugene Zhulenev
d138fe341d
Fix static_assert in test to conform to the c++11 standard
2018-09-11 17:23:18 -07:00
Rasmus Munk Larsen
e289f44c56
Don't vectorize the MeanReducer unless pdiv is available.
2018-09-11 14:09:00 -07:00
Eugene Zhulenev
55bb7e7935
Merge with upstream eigen/default
2018-09-11 13:33:06 -07:00
Eugene Zhulenev
81b38a155a
Fix compilation of tiled evaluation code with c++03
2018-09-11 13:32:32 -07:00
Rasmus Munk Larsen
5da960702f
Merged eigen/eigen into default
2018-09-11 10:08:46 -07:00
Rasmus Munk Larsen
46f88fc454
Use numerically stable tree reduction in TensorReduction.
2018-09-11 10:08:10 -07:00
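To illustrate what a tree-shaped reduction looks like, here is a minimal pairwise-summation sketch in plain C++. It is not the TensorReduction code; `tree_sum`, `naive_sum` and the leaf size of 8 are hypothetical choices made for this example. Splitting the range in halves keeps the operands of each addition at similar magnitude, so rounding error tends to grow with the depth of the tree rather than with the number of elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise (tree) summation over data[begin, end).
float tree_sum(const std::vector<float>& data, std::size_t begin,
               std::size_t end) {
  assert(begin <= end && end <= data.size());
  const std::size_t n = end - begin;
  if (n <= 8) {  // small leaf: plain sequential sum
    float s = 0.0f;
    for (std::size_t i = begin; i < end; ++i) s += data[i];
    return s;
  }
  const std::size_t mid = begin + n / 2;
  return tree_sum(data, begin, mid) + tree_sum(data, mid, end);
}

// Left-to-right accumulation, for comparison.
float naive_sum(const std::vector<float>& data) {
  float s = 0.0f;
  for (std::size_t i = 0; i < data.size(); ++i) s += data[i];
  return s;
}
```

Calling `tree_sum(v, 0, v.size())` and `naive_sum(v)` on a long vector of equal single-precision values is one way to observe the difference in accumulated error.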
Justin Carpentier
4827bec776
LLT: correct doc and add missing reference for the return type of rankUpdate
...
---
Eigen/src/Cholesky/LLT.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
2018-09-11 09:33:21 +02:00
Rasmus Munk Larsen
3d057e0453
Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set.
2018-09-06 12:59:36 -07:00
cgs1019
c6066ac411
Make param name and docs consistent for JacobiRotation::makeGivens
...
Previously the rendered math in the doc string called the optional return value
'r', while the actual parameter and the doc string text referred to the
parameter as 'z'. This changeset renames all the z's to r's to match the math.
2018-09-06 11:04:17 -04:00
Alexey Frunze
edeee16a16
Fix build failures in matrix_power and matrix_exponential tests.
...
This fixes the static assertion complaining about double being
used in place of long double. This happened on MIPS32, where
double and long double have the same type representation.
This can be simulated on x86 as well if we pass -mlong-double-64
to g++.
2018-08-31 14:11:10 -07:00
Deven Desai
c64fe9ea1f
Updates to fix HIP-clang specific compile errors.
...
Compiling the eigen unittests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc), results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"
2018-08-30 20:22:16 +00:00
Rasmus Munk Larsen
8b3d9ed081
Use padding instead of alignment attribute, which MaxSizeVector does not respect. This leads to undefined behavior and hard-to-trace bugs.
2018-09-05 11:20:06 -07:00
Gael Guennebaud
5927eef612
Enable std::result_of for msvc 2015 and later
2018-09-13 09:44:46 +02:00
Christoph Hertzberg
3adece4827
Fix misleading indentation of errorCode and make it loop-local
2018-09-12 14:41:38 +02:00
Christoph Hertzberg
7e9c9fbb2d
Disable type-limits warnings for g++ < 4.8
2018-09-12 14:40:39 +02:00
Christoph Hertzberg
ba2c8efdcf
EIGEN_UNUSED is not supported by g++4.7 (and not portable)
2018-09-12 11:49:10 +02:00
Christoph Hertzberg
ff4e835d6b
"sparse_product.cpp" must be included before "sparse_basic.cpp", otherwise EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN has no effect
2018-08-30 20:10:11 +02:00
Christoph Hertzberg
023ed6b9a8
Product of empty array must be 1 and not 0.
2018-08-30 17:14:52 +02:00
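The convention behind this fix is that a product starts from the multiplicative identity, just as a sum starts from zero. A minimal sketch, with the hypothetical helper name `array_product` (not the function touched by the commit):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Product of all entries. The accumulator starts at 1, so an empty input
// yields 1, in the same way that an empty sum yields 0.
long long array_product(const std::vector<long long>& dims) {
  long long result = 1;
  for (std::size_t i = 0; i < dims.size(); ++i) {
    assert(dims[i] >= 0);  // dimensions are non-negative in this sketch
    result *= dims[i];
  }
  return result;
}
```

One place where this matters is element counting: a shape with no dimensions describes a single scalar, so its element count should be 1 rather than 0.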
Christoph Hertzberg
c2f4e8c08e
Fix integer conversion warning
2018-08-30 17:12:53 +02:00
Christoph Hertzberg
ddbc564386
Fixed a few more shadowing warnings when compiling with g++ (and c++03)
2018-08-30 16:33:03 +02:00
Deven Desai
946c3e2544
adding EIGEN_DEVICE_FUNC attribute to fix some GPU unit tests that are broken in HIP mode
2018-08-27 23:04:08 +00:00
Mehdi Goli
7ec8b40ad9
Collapsed revision
...
* Separating SYCL math function.
* Converting function overload to function specialisation.
* Applying the suggested design.
2018-08-28 14:20:48 +01:00
Christoph Hertzberg
20ba2eee6d
gcc thinks this may not be initialized
2018-08-28 18:33:24 +02:00
Christoph Hertzberg
73ca600bca
Fix numerous shadow-warnings for GCC<=4.8
2018-08-28 18:32:39 +02:00
Christoph Hertzberg
ef4d79fed8
Disable/ReenableStupidWarnings did not work properly, when included recursively
2018-08-28 18:26:22 +02:00
Gael Guennebaud
befaf83f5f
bug #1590 : fix collision with some system headers defining the macro FP32
2018-08-28 13:21:28 +02:00
Christoph Hertzberg
42f3ee4fb8
Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop
...
Workaround: Don't include "DisableStupidWarnings.h" before including other main-headers
2018-08-28 11:44:15 +02:00
Eugene Zhulenev
c144bb355b
Merge with upstream eigen/default
2018-08-27 14:34:07 -07:00
Gael Guennebaud
5747288676
Disable a bonus unit-test which is broken with gcc 4.7
2018-08-27 13:07:34 +02:00
Gael Guennebaud
d5ed64512f
bug #1573 : workaround gcc 4.7 and 4.8 bug
2018-08-27 10:38:20 +02:00
Christoph Hertzberg
b1653d1599
Fix some trivial C++11 vs C++03 compatibility warnings
2018-08-25 12:21:00 +02:00
Christoph Hertzberg
42123ff38b
Make unit test C++03 compatible
2018-08-25 11:53:28 +02:00
Christoph Hertzberg
4b1ad086b5
Fix shadow warnings in doc-snippets
2018-08-25 10:07:17 +02:00
Christoph Hertzberg
117bc5d505
Fix some shadow warnings
2018-08-25 09:06:08 +02:00
Christoph Hertzberg
f155e97adb
Previous fix broke compilation for clang
2018-08-25 00:10:46 +02:00
Christoph Hertzberg
209b4972ec
Fix conversion warning
2018-08-25 00:02:46 +02:00
Christoph Hertzberg
495f6c3c3a
Fix missing-braces warnings
2018-08-24 23:56:13 +02:00
Christoph Hertzberg
5aaedbeced
Fixed more sign-compare and type-limits warnings
2018-08-24 23:54:12 +02:00
Christoph Hertzberg
8295f02b36
Hide "maybe uninitialized" warning on gcc
2018-08-24 23:22:20 +02:00
Christoph Hertzberg
f7675b826b
Fix several integer conversion and sign-compare warnings
2018-08-24 22:58:55 +02:00
Christoph Hertzberg
949b0ad9cb
Merged in rmlarsen/eigen3 (pull request PR-468)
...
Add support for emulating thread local.
2018-08-24 17:29:03 +00:00
Rasmus Munk Larsen
744e2fe0de
Address comments about EIGEN_THREAD_LOCAL.
2018-08-24 10:24:54 -07:00
Christoph Hertzberg
ad4a08fb68
Use Intel cast intrinsics, since MSVC does not allow direct casting.
...
Reported by David Winkler.
2018-08-24 19:04:33 +02:00
Rasmus Munk Larsen
8d9bc5cc02
Fix g++ compilation.
2018-08-23 13:06:39 -07:00
Rasmus Munk Larsen
e9f9d70611
Don't rely on __has_feature for g++.
...
Don't use __thread.
Only use thread_local for gcc 4.8 or newer.
2018-08-23 12:59:46 -07:00
Rasmus Munk Larsen
668690978f
Pad PerThread when we emulate thread_local to prevent false sharing.
2018-08-23 12:54:33 -07:00
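Padding a per-thread record up to a full cache line can be sketched as below. The names `PaddedSlot` and `kCacheLine` are hypothetical, and the 64-byte line size is an assumption that holds on many x86 parts but is not universal. Explicit padding, unlike an alignment attribute, still works when the container that stores the records ignores over-alignment.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache-line size; 64 bytes is common on x86 but not universal.
const std::size_t kCacheLine = 64;

// Hypothetical per-thread slot. The padding makes sizeof(PaddedSlot) a full
// cache line, so neighbouring array elements keep their `value` fields at
// least kCacheLine bytes apart.
struct PaddedSlot {
  long value;
  char pad[kCacheLine - sizeof(long)];
};

// Distance in bytes between the `value` fields of two adjacent elements.
std::size_t stride_between_neighbours() {
  PaddedSlot slots[2] = {};
  const char* first = reinterpret_cast<const char*>(&slots[0].value);
  const char* second = reinterpret_cast<const char*>(&slots[1].value);
  assert(second > first);
  return static_cast<std::size_t>(second - first);
}
```

Two threads that each update only their own slot then never write to the same cache line through `value`, which is the false-sharing pattern the commit message refers to.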
Rasmus Munk Larsen
6cedc5a9b3
rename mu.
2018-08-23 12:11:58 -07:00
Rasmus Munk Larsen
6e0464004a
Store std::unique_ptr instead of raw pointers in per_thread_map_.
2018-08-23 12:10:08 -07:00
Rasmus Munk Larsen
e51d9e473a
Protect #undef max with #ifdef max.
2018-08-23 11:42:05 -07:00
Rasmus Munk Larsen
d35880ed91
merge
2018-08-23 11:36:49 -07:00
Christoph Hertzberg
a709c8efb4
Replace pointers by values or unique_ptr for better leak-safety
2018-08-23 19:41:59 +02:00
Christoph Hertzberg
39335cf51e
Make MaxSizeVector leak-safe
2018-08-23 19:37:56 +02:00
Benoit Steiner
ff8e0ecc2f
Updated one more line of code to avoid making the test dependent on cxx11 features.
2018-08-17 15:15:52 -07:00
Benoit Steiner
43d9dd9b28
Removed more dependencies on cxx11.
2018-08-17 08:49:32 -07:00
Gael Guennebaud
f76c802973
Add missing empty line
2018-08-17 17:16:12 +02:00
Christoph Hertzberg
41f1cc67b8
Assertion depended on a not yet initialized value
2018-08-17 16:42:53 +02:00
Christoph Hertzberg
4713465eef
Silence double-promotion warning
2018-08-17 16:39:43 +02:00
Christoph Hertzberg
595cae9b09
Silence logical-op-parentheses warning
2018-08-17 16:30:32 +02:00
Christoph Hertzberg
c9b25fbefa
Silence unused parameter warning
2018-08-17 16:28:28 +02:00
Christoph Hertzberg
dbdeceabdd
Silence double-promotion warning (when converting double to complex<long double>)
2018-08-17 16:26:11 +02:00
Benoit Steiner
19df4d5752
Merged in codeplaysoftware/eigen-upstream-pure/Pointer_type_creation (pull request PR-461)
...
Creating a pointer type in TensorCustomOp.h
2018-08-16 18:28:33 +00:00
Benoit Steiner
f641cf1253
Adding missing at method in Eigen::array
2018-08-16 11:24:37 -07:00
Benoit Steiner
ede580ccda
Avoid using the auto keyword to make the tensor block access test more portable
2018-08-16 10:49:47 -07:00
Benoit Steiner
e23c8c294e
Use actual types instead of the auto keyword to make the code more portable
2018-08-16 10:41:01 -07:00
Mehdi Goli
80f1a76dec
removing the noises.
2018-08-16 13:33:24 +01:00
Mehdi Goli
d0b01ebbf6
Reverting the unintended delete from the code.
2018-08-16 13:21:36 +01:00
Mehdi Goli
161dcbae9b
Using PointerType struct and specializing it per device for TensorCustomOp.h
2018-08-16 00:07:02 +01:00
Sameer Agarwal
f197c3f55b
Removed an unused variable (PacketSize) from TensorExecutor
2018-08-15 11:24:57 -07:00
Benoit Steiner
4181556907
Fixed the tensor contraction code.
2018-08-15 09:34:47 -07:00
Benoit Steiner
b6f96cf7dd
Removed dependencies on cxx11 language features from the tensor_block_access test
2018-08-15 08:54:31 -07:00
Benoit Steiner
fbb834144d
Fixed more compilation errors
2018-08-15 08:52:58 -07:00
Benoit Steiner
6bb3f1b43e
Made the tensor_block_access test compile again
2018-08-14 14:26:59 -07:00
Benoit Steiner
43ec0082a6
Made the kronecker_product test compile again
2018-08-14 14:08:36 -07:00
Benoit Steiner
ab3f481141
Cleaned up the code and made it compile with more compilers
2018-08-14 14:05:46 -07:00
Rasmus Munk Larsen
fa0bcbf230
merge
2018-08-14 12:18:31 -07:00
Rasmus Munk Larsen
15d4f515e2
Use plain_assert in destructors to avoid throwing in CXX11 tests where main.h overwrites eigen_assert with a throwing version.
2018-08-14 12:17:46 -07:00
Rasmus Munk Larsen
aebdb06424
Fix a few compiler warnings in CXX11 tests.
2018-08-14 12:06:39 -07:00
Rasmus Munk Larsen
2a98bd9c8e
Merged eigen/eigen into default
2018-08-14 12:02:09 -07:00
Benoit Steiner
59bba77ead
Fixed compilation errors with gcc 4.7 and 4.8
2018-08-14 10:54:48 -07:00
Mehdi Goli
a97aaa2bcf
Merge with upstream.
2018-08-14 17:49:29 +01:00
Mehdi Goli
8ba799805b
Merge with upstream
2018-08-14 09:43:45 +01:00
Rasmus Munk Larsen
6d6e7b7027
merge
2018-08-13 15:34:50 -07:00
Rasmus Munk Larsen
9bb75d8d31
Add Barrier.h.
2018-08-13 15:34:03 -07:00
Rasmus Munk Larsen
2e1adc0324
Merged eigen/eigen into default
2018-08-13 15:32:00 -07:00
Rasmus Munk Larsen
8278ae6313
Add thread-local support, emulated with a hash map, on platforms that do not support it natively.
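A minimal sketch of the emulation idea, assuming a mutex-protected hash map keyed by std::thread::id. The class name PerThreadSlot, its local() method, and the demo helper are hypothetical and are not taken from Eigen's sources.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a thread_local variable on platforms without
// native support: each thread lazily gets its own T, looked up by thread id.
template <typename T>
class PerThreadSlot {
 public:
  // Returns the calling thread's instance, value-initializing it on first use.
  T& local() {
    std::lock_guard<std::mutex> lock(mu_);
    std::unique_ptr<T>& slot = map_[std::this_thread::get_id()];
    if (!slot) slot.reset(new T());
    return *slot;
  }

 private:
  std::mutex mu_;
  // unique_ptr keeps each T at a stable address even when the map rehashes.
  std::unordered_map<std::thread::id, std::unique_ptr<T> > map_;
};

// Hypothetical demo: a worker thread updates its own slot, so the value
// stored by the calling thread is left untouched.
inline int demo_per_thread_counter() {
  PerThreadSlot<int> counter;
  counter.local() = 1;
  std::thread worker([&counter] { counter.local() += 41; });
  worker.join();
  return counter.local();
}
```

Every lookup takes the lock, so this is slower than native thread_local storage; it is meant only as a fallback.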
2018-08-13 15:31:23 -07:00
Benoit Steiner
501be70b27
Code cleanup
2018-08-13 15:16:40 -07:00
Benoit Steiner
3d3711f22f
Fixed compilation errors.
2018-08-13 15:16:06 -07:00
Gael Guennebaud
3ec60215df
Merged in rmlarsen/eigen2 (pull request PR-466)
...
Move sigmoid functor to core and rename it to 'logistic'.
2018-08-13 21:28:20 +00:00
Rasmus Munk Larsen
0f1b2e08a5
Call logistic functor from Tensor::sigmoid.
2018-08-13 11:52:58 -07:00
Rasmus Munk Larsen
d6e283ba96
sigmoid -> logistic
2018-08-13 11:14:50 -07:00
Benoit Steiner
26239ee580
Use NULL instead of nullptr to avoid adding a cxx11 requirement.
2018-08-13 11:05:51 -07:00
Benoit Steiner
3810ec228f
Don't use the auto keyword since it's not always supported properly.
2018-08-13 10:46:09 -07:00
Benoit Steiner
e6d5be811d
Fixed syntax of nested templates chevrons to make it compatible with c++98 mode.
2018-08-13 10:29:21 -07:00
Mehdi Goli
1aa86aad14
Merge with upstream.
2018-08-13 15:40:31 +01:00
Eugene Zhulenev
35d90e8960
Fix BlockAccess enum in CwiseUnaryOp evaluator
2018-08-10 17:37:58 -07:00
Eugene Zhulenev
855b68896b
Merge with eigen/default
2018-08-10 17:18:42 -07:00
Eugene Zhulenev
f2209d06e4
Add block evaluation to CwiseUnaryOp and add PreferBlockAccess enum to all evaluators
2018-08-10 16:53:36 -07:00
Benoit Steiner
c8ea398675
Avoided language features that are only available in cxx11 mode.
2018-08-10 13:02:41 -07:00
Benoit Steiner
4be4286224
Made the code compile with gcc 5.4.
2018-08-10 11:32:58 -07:00
Justin Carpentier
eabc7a4031
PR 465: Fix issue in RowMajor assignment in plain_matrix_type_row_major::type
...
The type should be RowMajor
2018-08-10 14:30:06 +02:00
Rasmus Munk Larsen
c49e93440f
SuiteSparse defines the macro SuiteSparse_long to control what type is used for 64bit integers. The default value of this macro on non-MSVC platforms is long and __int64 on MSVC. CholmodSupport defaults to using long for the long variants of CHOLMOD functions. This creates problems when SuiteSparse_long is different than long. So the correct thing to do here is
...
to use SuiteSparse_long as the type instead of long.
2018-08-13 15:53:31 -07:00
Mehdi Goli
3a2e1b1fc6
Merge with upstream.
2018-08-10 12:28:38 +01:00
Rasmus Munk Larsen
bfc5091dd5
Cast diagonalSize to RealScalar instead of Scalar.
2018-08-09 14:46:17 -07:00
Rasmus Munk Larsen
8603d80029
Cast diagonalSize() to Scalar before multiplication. Without this, automatic differentiation in Ceres breaks because Scalar is a custom type that does not support multiplication by Index.
2018-08-09 11:09:10 -07:00
Eugene Zhulenev
cfaedb38cd
Fix bug in a test + compilation errors
2018-08-09 09:44:07 -07:00
Mehdi Goli
ea8fa5e86f
Merge with upstream
2018-08-09 14:07:56 +01:00
Mehdi Goli
8c083bfd0e
Properly fixing the PointerType for TensorCustomOp.h, as the output type here should be based on CoeffReturnType, not the Scalar type. Therefore, similar to the reduction and evalTo functions, it should have its own MakePointer class. In this case, for other devices the type defaults to CoeffReturnType and no changes are required in users' code. However, in SYCL, on the device, we can reconstruct the device type.
2018-08-09 13:57:43 +01:00
Alexey Frunze
050bcf6126
bug #1584 : Improve random (avoid undefined behavior).
2018-08-08 20:19:32 -07:00
Eugene Zhulenev
1c8b9e10a7
Merged with upstream eigen
2018-08-08 16:57:58 -07:00
Benoit Steiner
131ed1191f
Merged in codeplaysoftware/eigen-upstream-pure/Fixing_compiler_warning (pull request PR-462)
...
Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation.
2018-08-08 18:14:15 +00:00
Benoit Steiner
1285c080b3
Merged in codeplaysoftware/eigen-upstream-pure/disabling_assert_in_sycl (pull request PR-459)
...
Disabling assert inside SYCL kernel.
2018-08-08 18:12:42 +00:00
Benoit Steiner
c4b2845be9
Merged in rmlarsen/eigen3 (pull request PR-458)
...
Fix init order.
2018-08-08 18:11:49 +00:00
Benoit Steiner
7124172b83
Merged in codeplaysoftware/eigen-upstream-pure/EIGEN_UNROLL_LOOP (pull request PR-460)
...
Adding EIGEN_UNROLL_LOOP macro.
2018-08-08 18:10:54 +00:00
Mehdi Goli
532a0be05c
Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation.
2018-08-08 12:12:26 +01:00
Mehdi Goli
67711eaa31
Fixing typo.
2018-08-08 11:38:10 +01:00
Mehdi Goli
3055e3a7c2
Creating a pointer type in TensorCustomOp.h
2018-08-08 11:19:02 +01:00
Mehdi Goli
22031ab59a
Adding EIGEN_UNROLL_LOOP macro.
2018-08-08 11:07:27 +01:00
Mehdi Goli
908b906d79
Disabling assert inside SYCL kernel.
2018-08-08 10:01:10 +01:00
Rasmus Munk Larsen
693fb1d41e
Fix init order.
2018-08-07 17:18:51 -07:00
Benoit Steiner
10d286f55b
Silenced a couple of compilation warnings.
2018-08-06 16:00:29 -07:00
Benoit Steiner
d011d05fd6
Fixed compilation errors.
2018-08-06 13:40:51 -07:00
Rasmus Munk Larsen
36e7e7dd8f
Forward declare NoOpOutputKernel as struct rather than class to be consistent with implementation.
2018-08-06 13:16:32 -07:00
Rasmus Munk Larsen
fa68342ef8
Move sigmoid functor to core.
2018-08-03 17:31:23 -07:00
Gael Guennebaud
09c81ac033
bug #1451 : fix numeric_limits<AutoDiffScalar<Der>> with a reference as derivative type
2018-08-04 00:17:37 +02:00
luz.paz
43fd42a33b
Fix doxy and misc. typos
...
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
---
Eigen/src/Core/ProductEvaluators.h | 4 ++--
Eigen/src/Core/arch/GPU/Half.h | 2 +-
Eigen/src/Core/util/Memory.h | 2 +-
Eigen/src/Geometry/Hyperplane.h | 2 +-
Eigen/src/Geometry/Transform.h | 2 +-
Eigen/src/Geometry/Translation.h | 12 ++++++------
doc/PreprocessorDirectives.dox | 2 +-
doc/TutorialGeometry.dox | 2 +-
test/boostmultiprec.cpp | 2 +-
test/triangular.cpp | 2 +-
10 files changed, 16 insertions(+), 16 deletions(-)
2018-08-01 21:34:47 -04:00
Jean-Christophe Fillion-Robin
2cbd9dd498
[PATCH] cmake: Support source include with add_subdirectory and
...
find_package use
This commit allows the sources of the project to be included in a parent
project CMakeLists.txt and support use of "find_package(Eigen3 CONFIG REQUIRED)"
Here is an example that can be used to test the changes. It is not particularly
useful in itself. This change supports one of the scenarios for creating
a custom 3D Slicer application that bundles associated plugins.
/tmp/eigen-git-mirror # Eigen sources
/tmp/test/CMakeLists.txt:
cmake_minimum_required(VERSION 3.12)
project(test)
add_subdirectory("/tmp/eigen-git-mirror" "eigen-git-mirror")
find_package(Eigen3 CONFIG REQUIRED)
and configuring it using:
mkdir /tmp/test-build && cd $_
cmake \
-DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY:BOOL=1 \
-DEigen3_DIR:PATH=/tmp/test-build/eigen-git-mirror \
/tmp/test
Co-authored-by: Pablo Hernandez <pablo.hernandez@kitware.com >
---
CMakeLists.txt | 1 +
cmake/Eigen3Config.cmake.in | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
2018-09-07 15:50:19 -04:00
Christoph Hertzberg
a80a290079
Fix 'template argument uses local type'-warnings (when compiled in C++03 mode)
2018-09-10 18:57:28 +02:00
Jiandong Ruan
6dcd2642aa
bug #1526 - CUDA compilation fails on CUDA 9.x SDK when arch is set to compute_60 and/or above
2018-09-08 12:05:33 -07:00
Christoph Hertzberg
edfb7962fd
Use static const int instead of enum to avoid numerous local-type-template-args warnings in C++03 mode
2018-09-07 14:08:39 +02:00
Alexey Frunze
ec38f07b79
bug #1595 : Don't use C++11's std::isnan() in MIPS/MSA packet math.
...
This removes reliance on C++11 and improves generated code.
2018-09-06 15:40:09 -07:00
Eugene Zhulenev
1b0373ae10
Replace all using declarations with typedefs in Tensor ops
2018-08-01 15:55:46 -07:00
Rasmus Munk Larsen
7f8b53fd0e
bug #1580 : Fix cuda clang build. STL is not supported, so std::equal_to and std::not_equal_to break compilation.
...
Update the definition of EIGEN_CONSTEXPR_ARE_DEVICE_FUNC to exclude clang.
See also PR 450.
2018-08-01 12:36:24 -07:00
Rasmus Munk Larsen
bcb29f890c
Fix initialization order.
2018-08-03 10:18:53 -07:00
Benoit Steiner
cf17794ef4
Merged in codeplaysoftware/eigen-upstream-pure/SYCL-required-changes (pull request PR-454)
...
SYCL required changes
2018-08-03 16:17:30 +00:00
Mehdi Goli
3074b1ff9e
Fixing the compilation error.
2018-08-03 17:13:44 +01:00
Mehdi Goli
225fa112aa
Merge with upstream.
2018-08-03 17:04:08 +01:00
Mehdi Goli
01358300d5
Creating separate SYCL required PR for uncontroversial files.
2018-08-03 16:59:15 +01:00
Gustavo Lima Chaves
2bf1cc8cf7
Fix 256 bit packet size assumptions in unit tests.
...
Like in change 2606abed53, we have hit the threshold again. With
AVX512 builds we would never have Vector8f packets aligned at 64
bytes (the new value of EIGEN_MAX_ALIGN_BYTES after change 405859f18d,
for AVX512-enabled builds).
This makes test/dynalloc.cpp pass for those builds.
2018-08-02 15:55:36 -07:00
Benoit Steiner
dd5875e30d
Merged in codeplaysoftware/eigen-upstream-pure/constructor_error_clang (pull request PR-451)
...
Fixing ambiguous constructor error for Clang compiler.
2018-08-02 20:46:03 +00:00
Benoit Steiner
113d8343d6
Merged in codeplaysoftware/eigen-upstream-pure/Fixing_visual_studio_error_For_tensor_trace (pull request PR-452)
...
Fixing compilation error for cxx11_tensor_trace.cpp on Microsoft Visual Studio.
2018-08-02 17:54:26 +00:00
Mehdi Goli
516d2621b9
Fixing compilation error for cxx11_tensor_trace.cpp on Microsoft Visual Studio.
2018-08-02 14:30:48 +01:00
Mehdi Goli
40d6d020a0
Fixing ambiguous constructor error for Clang compiler.
2018-08-02 13:34:53 +01:00
Gael Guennebaud
62169419ab
Fix two regressions introduced in previous merges: bad usage of EIGEN_HAS_VARIADIC_TEMPLATES and linking issue.
2018-08-01 23:35:34 +02:00
Eugene Zhulenev
64abdf1d7e
Fix typo + get rid of redundant member variables for block sizes
2018-08-01 12:35:19 -07:00
Benoit Steiner
93b9e36e10
Merged in paultucker/eigen (pull request PR-431)
...
Optional ThreadPoolDevice allocator
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com >
2018-08-01 19:14:34 +00:00
Eugene Zhulenev
385b3ff12f
Merged latest changes from upstream/eigen
2018-08-01 11:59:04 -07:00
Benoit Steiner
17221115c9
Merged in codeplaysoftware/eigen-upstream-pure/eigen_variadic_assert (pull request PR-447)
...
Adding variadic version of assert which can take a parameter pack as its input.
2018-08-01 16:41:54 +00:00
Benoit Steiner
0360c36170
Merged in codeplaysoftware/eigen-upstream-pure/separating_internal_memory_allocation (pull request PR-446)
...
Distinguishing internal memory allocation/deallocation from explicit user memory allocation/deallocation.
2018-08-01 16:13:15 +00:00
Mehdi Goli
c6a5c70712
Correcting the position of allocate_temp/deallocate_temp in TensorDeviceGpu.h
2018-08-01 16:56:26 +01:00
Benoit Steiner
9ca1c09131
Merged in codeplaysoftware/eigen-upstream-pure/new-arch-SYCL-headers (pull request PR-448)
...
Adding new arch/SYCL headers, used for SYCL vectorization.
2018-08-01 15:50:54 +00:00
Benoit Steiner
45f75f1ace
Merged in codeplaysoftware/eigen-upstream-pure/using_PacketType_class (pull request PR-449)
...
Enabling per device specialisation of packetSize.
2018-08-01 15:43:03 +00:00
Benoit Steiner
90e632fd66
Merged in codeplaysoftware/eigen-upstream-pure/EIGEN_STRONG_INLINE_MACRO (pull request PR-445)
...
Replacing ad-hoc inline keyword with EIGEN_STRONG_INLINE MACRO.
2018-08-01 15:41:06 +00:00
Mehdi Goli
af96018b49
Using the suggested modification.
2018-08-01 16:04:44 +01:00
Mehdi Goli
b512a9536f
Enabling per device specialisation of packetsize.
2018-08-01 13:39:13 +01:00
Mehdi Goli
c84509d7cc
Adding new arch/SYCL headers, used for SYCL vectorization.
2018-08-01 12:40:54 +01:00
Mehdi Goli
3a197a60e6
variadic version of assert which can take a parameter pack as its input.
2018-08-01 12:19:14 +01:00
Mehdi Goli
d7a8414848
Distinguishing internal memory allocation/deallocation from explicit user memory allocation/deallocation.
2018-08-01 11:56:30 +01:00
Mehdi Goli
9e219bb3d3
Converting ad-hoc inline keyword to EIGEN_STRONG_INLINE MACRO.
2018-08-01 10:47:49 +01:00
Eugene Zhulenev
83c0a16baf
Add block evaluation support to TensorOps
2018-07-31 15:56:31 -07:00
Benoit Steiner
edf46bd7a2
Merged in yuefengz/eigen (pull request PR-370)
...
Use device's allocate function instead of internal::aligned_malloc.
2018-07-31 22:38:28 +00:00
Paul Tucker
385f7b8d0c
Change getAllocator() to allocator() in ThreadPoolDevice.
2018-07-31 13:52:18 -07:00
Mark D Ryan
6f5b126e6d
Fix tensor contraction for AVX512 machines
...
This patch modifies the TensorContraction class to ensure that the kc_ field is
always a multiple of the packet_size, if the packet_size is > 8. Without this
change spatial convolutions in Tensorflow do not work properly as the code that
re-arranges the input matrices can assert if kc_ is not a multiple of the
packet_size. This leads to a unit test failure,
//tensorflow/python/kernel_tests:conv_ops_test, on AVX512 builds of tensorflow.
2018-07-31 09:33:37 +01:00
Gael Guennebaud
d6568425f8
Close branch tiling_3.
2018-07-31 08:13:43 +00:00
Gael Guennebaud
678a0dcb12
Merged in ezhulenev/eigen/tiling_3 (pull request PR-438)
...
Tiled tensor executor
2018-07-31 08:13:00 +00:00
Gael Guennebaud
679eece876
Speedup trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437.
2018-07-31 10:10:14 +02:00
Gael Guennebaud
723856dec1
bug #1577 : fix msvc compilation of unit test, msvc defines ptrdiff_t as long long
2018-07-30 14:52:15 +02:00
Eugene Zhulenev
966c2a7bb6
Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible
2018-07-27 12:45:17 -07:00
Eugene Zhulenev
6913221c43
Add tiled evaluation support to TensorExecutor
2018-07-25 13:51:10 -07:00
Alexey Frunze
7b91c11207
bug #1578 : Improve prefetching in matrix multiplication on MIPS.
2018-07-24 18:36:44 -07:00
Patrik Huber
f5cace5e9f
Fix two small typos in the documentation
2018-07-26 19:55:19 +00:00
Gael Guennebaud
34539c4af4
Merged in rmlarsen/eigen1 (pull request PR-441)
...
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-30 11:26:24 +00:00
Mark D Ryan
bc615e4585
Re-enable FMA for fast sqrt functions
2018-07-30 13:21:00 +02:00
Mark D Ryan
96b030a8e4
Re-enable FMA for fast sqrt functions
...
This commit re-enables the use of FMA for the FAST sqrt functions.
Doing so improves the performance of both algorithms. The float32
version is now 88% the speed of the original function, while the
double version is 90%.
2018-07-30 10:19:51 +01:00
Rasmus Munk Larsen
e478532625
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-27 12:36:34 -07:00
Rasmus Munk Larsen
2ebcb911b2
Add pcast packet op for NEON.
2018-07-26 14:28:48 -07:00
Christoph Hertzberg
397b0547e1
Disable static assertions only when necessary and disable double-promotion warnings in that case as well
2018-07-26 00:01:24 +02:00
Christoph Hertzberg
5e79402b4a
fix warnings for doc-eigen-prerequisites
2018-07-24 21:59:15 +02:00
Christoph Hertzberg
5f79b7f9a9
Removed several shadowing types and use global Index typedef everywhere
2018-07-25 21:47:45 +02:00
Christoph Hertzberg
44ee201337
Rename variable which shadows class name
2018-07-25 20:26:15 +02:00
Gustavo Lima Chaves
705f66a9ca
Account for missing change on commit "Remove SimpleThreadPool and..."
...
"... always use {NonBlocking}ThreadPool". It seems the non-blocking
implementation was made the default/only one, but a reference to the old
name was left unmodified. Fix that.
2018-07-23 16:29:09 -07:00
Christoph Hertzberg
fd4fe7cbc5
Fixed issue which made documentation not getting built anymore
2018-07-24 22:56:15 +02:00
Christoph Hertzberg
636126ef40
Allow to filter out build-error messages
2018-07-24 20:12:49 +02:00
Eugene Zhulenev
d55efa6f0f
TensorBlockIO
2018-07-23 15:50:55 -07:00
Eugene Zhulenev
34a75c3c5c
Initial support of TensorBlock
2018-07-20 17:37:20 -07:00
Gael Guennebaud
2c2de9da7d
Merged in glchaves/eigen (pull request PR-433)
...
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block
2018-07-23 19:38:55 +00:00
Gael Guennebaud
4ca3e48f42
fix typo
2018-07-23 16:51:57 +02:00
Gael Guennebaud
c747cde69a
Add lastN shortcuts to seq/seqN.
2018-07-23 16:20:25 +02:00
Gustavo Lima Chaves
02eaaacbc5
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded
...
block
Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail
right away without this, as this test seems to rely on those language
features. The skip under compilation with MSVC was kept.
2018-07-20 16:08:40 -07:00
Eugene Zhulenev
2bf864f1eb
Disable type traits for stdlibc++ <= 4.9.3
2018-07-20 10:11:44 -07:00
Gael Guennebaud
de70671937
Oops, EIGEN_COMP_MSVC is not available before including Eigen.
2018-07-20 17:51:17 +02:00
Gael Guennebaud
56a750b6cc
Disable optimization for sparse_product unit test with MSVC 2013, otherwise it takes several hours to build.
2018-07-20 08:36:38 -07:00
Paul Tucker
d4afccde5a
Add test coverage for ThreadPoolDevice optional allocator.
2018-07-19 17:43:44 -07:00
Eugene Zhulenev
c58b874727
PR430: Convert count to the reducer type in MeanReducer
...
Without explicit conversion Tensorflow fails to compile, pset1 template deduction fails.
cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this)
->Eigen::internal::MeanReducer<Eigen::half>::packetCount_'
(type 'const DenseIndex {aka const long int}')
to type 'const type& {aka const Eigen::half&}'
return pdiv(vaccum, pset1<Packet>(packetCount_));
Honestly I’m not sure why it works in Eigen tests, because Eigen::half constructor is explicit, and why it stopped working in TF, I didn’t find any relevant changes since previous Eigen upgrade.
static_cast<T>(packetCount_) - breaks cxx11_tensor_reductions test for Eigen::half, also quite surprising.
2018-07-19 17:37:03 -07:00
Gael Guennebaud
2424e3b7ac
Pass by const ref.
2018-07-19 18:48:19 +02:00
Gael Guennebaud
509a5fa77f
Fix IsRelocatable without C++11
2018-07-19 18:47:38 +02:00
Gael Guennebaud
2ca2592009
Fix determination of EIGEN_HAS_TYPE_TRAITS
2018-07-19 18:47:18 +02:00
Gael Guennebaud
5e5987996f
Fix stupid error in Quaternion move ctor
2018-07-19 18:33:53 +02:00
Paul Tucker
4e9848fa86
Actually add optional Allocator* arg to ThreadPoolDevice().
2018-07-16 17:53:36 -07:00
Paul Tucker
b3e7c9132d
Add optional Allocator argument to ThreadPoolDevice constructor.
...
When supplied, this allocator will be used in place of
internal::aligned_malloc. This permits e.g. use of a NUMA-node specific
allocator where the thread-pool is also restricted to a single NUMA-node.
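A rough sketch of how such an optional allocator hook could look. The names AllocatorSketch, CountingAllocator, DeviceSketch, and live_after_one_allocation are hypothetical, std::malloc stands in for internal::aligned_malloc, and none of this is copied from Eigen.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator interface; a NUMA-node specific allocator would
// derive from it and place memory on the node used by the thread pool.
class AllocatorSketch {
 public:
  virtual ~AllocatorSketch() {}
  virtual void* allocate(std::size_t num_bytes) const = 0;
  virtual void deallocate(void* buffer) const = 0;
};

// Example implementation that only counts outstanding buffers.
class CountingAllocator : public AllocatorSketch {
 public:
  CountingAllocator() : live_(0) {}
  void* allocate(std::size_t num_bytes) const override {
    ++live_;
    return std::malloc(num_bytes);
  }
  void deallocate(void* buffer) const override {
    --live_;
    std::free(buffer);
  }
  int live() const { return live_; }

 private:
  mutable int live_;
};

// Hypothetical device: uses the supplied allocator when one is given and
// falls back to the default allocation path otherwise.
class DeviceSketch {
 public:
  explicit DeviceSketch(const AllocatorSketch* allocator = nullptr)
      : allocator_(allocator) {}
  void* allocate(std::size_t num_bytes) const {
    return allocator_ ? allocator_->allocate(num_bytes) : std::malloc(num_bytes);
  }
  void deallocate(void* buffer) const {
    if (allocator_) allocator_->deallocate(buffer);
    else std::free(buffer);
  }

 private:
  const AllocatorSketch* allocator_;  // not owned; may be null
};

// Hypothetical demo: number of buffers outstanding after one allocation.
inline int live_after_one_allocation() {
  CountingAllocator counting;
  DeviceSketch device(&counting);
  void* buffer = device.allocate(64);
  const int live = counting.live();
  device.deallocate(buffer);
  return live;
}
```

Making the argument optional keeps existing callers unchanged, since a device constructed without an allocator behaves as before.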
2018-07-16 17:26:05 -07:00
Gael Guennebaud
40797dbea3
bug #1572 : use c++11 atomic instead of volatile if c++11 is available, and disable multi-threaded GEMM on non-x86 without c++11.
2018-07-17 00:11:20 +02:00
Gael Guennebaud
add5757488
Simplify handling of non-split tests and include split_test_helper.h instead of re-generating it. This also allows us to modify it without breaking existing build folders.
2018-07-16 18:55:40 +02:00
Gael Guennebaud
901c7d31f0
Fix usage of EIGEN_SPLIT_LARGE_TESTS=ON: some unit tests, such as indexed_view have to be split unconditionally.
2018-07-16 18:35:05 +02:00
Gael Guennebaud
f2b52f9946
Add the cmake option "EIGEN_DASHBOARD_BUILD_TARGET" to control the build target in dashboard mode (e.g., ctest -D Experimental)
2018-07-16 17:59:30 +02:00
Gael Guennebaud
23d82c1ac5
Merged in rmlarsen/eigen2 (pull request PR-422)
...
Optimize the case where broadcasting is a no-op.
2018-07-14 11:42:58 +00:00
Gael Guennebaud
a87cff20df
Fix GeneralizedEigenSolver when requesting for eigenvalues only.
2018-07-14 09:38:49 +02:00
Rasmus Munk Larsen
3a9cf4e290
Get rid of alias for m_broadcast.
2018-07-13 16:24:48 -07:00
Rasmus Munk Larsen
4222550e17
Optimize the case where broadcasting is a no-op.
2018-07-13 16:12:38 -07:00
Rasmus Munk Larsen
4a3952fd55
Relax the condition to not only work on Android.
2018-07-13 11:24:07 -07:00
Rasmus Munk Larsen
02a9443db9
Clang produces incorrect Thumb2 assembler when using alloca.
...
Don't define EIGEN_ALLOCA when generating Thumb with clang.
2018-07-13 11:03:04 -07:00
Gael Guennebaud
20991c3203
bug #1571 : fix is_convertible<from,to> with "from" a reference.
2018-07-13 17:47:28 +02:00
Gael Guennebaud
1920129d71
Remove clang warning
2018-07-13 16:05:35 +02:00
Gael Guennebaud
195c9c054b
Print more debug info in gpu_basic
2018-07-13 16:05:07 +02:00
Gael Guennebaud
06eb24cf4d
Introduce gpu_assert for assertion in device-code, and disable them with clang-cuda.
2018-07-13 16:04:27 +02:00
Gael Guennebaud
5fd03ddbfb
Make EIGEN_TEST_CUDA_CLANG more friendly with OSX
2018-07-13 16:03:14 +02:00
Gael Guennebaud
86d9c0255c
Forward declaring std::array does not work with all std libs, so let's just include <array>
2018-07-13 13:06:44 +02:00
David Hyde
d908afe35f
bug #1558 : fix a corner case in MINRES when both v_new and w_new vanish.
2018-07-08 22:06:38 -07:00
Eugene Zhulenev
6e654f3379
Reduce number of allocations in TensorContractionThreadPool.
2018-07-16 14:26:39 -07:00
Gael Guennebaud
7ccb623746
bug #1569 : fix Tensor<half>::mean() on AVX with respective unit test.
2018-07-19 13:15:40 +02:00
Alexey Frunze
1f523e7304
Add MIPS changes missing from previous merge.
2018-07-18 12:27:50 -07:00
Eugene Zhulenev
e3c2d61739
Assert that no output kernel is defined for GPU contraction
2018-07-18 14:34:22 -07:00
Eugene Zhulenev
086ded5c85
Disable type traits for GCC < 5.1.0
2018-07-18 16:32:55 -07:00
Eugene Zhulenev
79d4129cce
Specify default output kernel for TensorContractionOp
2018-07-18 14:21:01 -07:00
Gael Guennebaud
6e5a3b898f
Add regression for bugs #1573 and #1575
2018-07-18 23:34:34 +02:00
Gael Guennebaud
863580fe88
bug #1432 : fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable
2018-07-18 23:33:07 +02:00
Gael Guennebaud
053ed97c72
Generalize ScalarWithExceptions to a full non-copyable and throwing scalar type to be used in other unit tests.
2018-07-18 23:27:37 +02:00
Gael Guennebaud
a503fc8725
bug #1575 : fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted.
2018-07-18 23:26:13 +02:00
Gael Guennebaud
308725c3c9
More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA
2018-07-18 13:51:36 +02:00
Alexey Frunze
3875fb05aa
Add support for MIPS SIMD (MSA)
2018-07-06 16:04:30 -07:00
Gael Guennebaud
44ea5f7623
Add unit test for -Tensor<complex> on GPU
2018-07-12 17:19:38 +02:00
Gael Guennebaud
12e1ebb68b
Remove local Index typedef from unit-tests
2018-07-12 17:16:40 +02:00
Gael Guennebaud
63185be8b2
Disable eigenvalues test for clang-cuda
2018-07-12 17:03:14 +02:00
Gael Guennebaud
bec013b2c9
fix unused warning
2018-07-12 17:02:18 +02:00
Gael Guennebaud
5c73c9223a
Fix shadowing typedefs
2018-07-12 17:01:07 +02:00
Gael Guennebaud
98728312c8
Fix compilation regarding std::array
2018-07-12 17:00:37 +02:00
Gael Guennebaud
eb3d8f68bb
fix unused warning
2018-07-12 16:59:47 +02:00
Gael Guennebaud
006e18e52b
Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate places (Macros.h),
...
and alignment/vectorization logic is now in util/ConfigureVectorization.h
2018-07-12 16:57:41 +02:00
Thales Sabino
9a6a43319f
Fix cxx11_tensor_fft not building on Windows.
...
The type used in Eigen::DSizes needs to be at least 8 bytes long. Internally Tensor tries to convert this to an __int64 on Windows and this fails to build. On Linux, long and long long are both 8 byte integer types.
* * *
Changing from "long long" to "std::int64_t".
2018-07-12 11:20:59 +01:00
Gael Guennebaud
b347eb0b1c
Fix doc
2018-07-12 11:56:18 +02:00
Mark D Ryan
e79c5149bf
Fix AVX512 implementations of psqrt
...
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb fixed the AVX2 version of this function. The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN. Fixing the issues requires adding
some additional instructions that slow down the algorithms. A
similar test to the one used in 3ed67cb0bb shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packet8d function runs at 68% of the original.
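A scalar illustration of the intended semantics, assuming a square root built from a reciprocal square root. The function name sqrt_via_rsqrt is hypothetical, and 1.0f / std::sqrt(x) merely stands in for a hardware rsqrt estimate; this is not the AVX512 packet code.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar model of a psqrt computed as x * rsqrt(x).
inline float sqrt_via_rsqrt(float x) {
  // Extra check of the kind the commit describes: negative inputs yield NaN.
  if (x < 0.0f) return std::numeric_limits<float>::quiet_NaN();
  // Zero and infinity are passed through to avoid 0 * inf and inf * 0.
  if (x == 0.0f || std::isinf(x)) return x;
  const float rsqrt = 1.0f / std::sqrt(x);  // stand-in for the rsqrt estimate
  return x * rsqrt;
}
```

The additional branches model why the corrected versions are slower: each special case costs extra compare and blend instructions in a vectorized implementation.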
2018-06-25 05:05:02 -07:00
Yuefeng Zhou
1eff6cf8a7
Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances.
2018-02-20 16:50:05 -08:00
Gael Guennebaud
adb134d47e
Fix implicit conversion from 0.0 to scalar
2018-02-16 22:26:01 +04:00
Gael Guennebaud
937ad18221
add unit test for SimplicialCholesky and Boost multiprec.
2018-02-16 22:25:11 +04:00
Julian Kent
6d451cf2b6
Add missing consts for rows and cols functions in SparseLU
2018-02-10 13:44:05 +01:00
Daniele E. Domenichelli
a12b8a8c75
FindEigen3: Set Eigen3_FOUND variable
2018-07-11 16:31:50 +02:00
Gael Guennebaud
8bdb214fd0
remove double ;;
2018-07-12 11:17:53 +02:00
Gael Guennebaud
a9060378d3
bug #1570 : fix warning
2018-07-12 11:07:09 +02:00
Gael Guennebaud
6cd6551b26
Add deprecated header files for TensorFlow
2018-07-12 10:50:53 +02:00
Gael Guennebaud
da0c604078
Merged in deven-amd/eigen (pull request PR-402)
...
Adding support for using Eigen in HIP kernels.
2018-07-12 08:07:16 +00:00
Gael Guennebaud
a4ea611ca7
Remove useless specialization thanks to is_convertible being more robust.
2018-07-12 09:59:44 +02:00
Gael Guennebaud
8a40dda5a6
Add some basic unit-tests
2018-07-12 09:59:00 +02:00
Gael Guennebaud
8ef267ccbd
spellcheck
2018-07-12 09:58:29 +02:00
Gael Guennebaud
21cf4a1a8b
Make is_convertible more robust and conformant to std::is_convertible
2018-07-12 09:57:19 +02:00
Gael Guennebaud
8a5955a052
Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product.
2018-07-11 17:16:50 +02:00
Gael Guennebaud
d193cc87f4
Fix regression in 9357838f94
2018-07-11 17:09:23 +02:00
Gael Guennebaud
fb33687736
Fix double ;;
2018-07-11 17:08:30 +02:00
Deven Desai
876f392c39
Updates corresponding to the latest round of PR feedback
...
The major changes are
1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunctions.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
The above three changes effectively enable the Eigen "Packet" layer for the HIP platform
4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places
The change has been tested on the HIP and CUDA platforms.
2018-07-11 10:39:54 -04:00
Deven Desai
1fe0b74904
deleting hip specific files that are no longer required
2018-07-11 09:28:44 -04:00
Deven Desai
dec47a6493
renaming CUDA* to GPU* for some header files
2018-07-11 09:26:54 -04:00
Deven Desai
471cfe5ff7
renaming CUDA* to GPU* for some header files
2018-07-11 09:22:04 -04:00
Deven Desai
38807a2575
merging updates from upstream
2018-07-11 09:17:33 -04:00
Gael Guennebaud
f00d08cc0a
Optimize extraction of Q in SparseQR by exploiting the structure of the identity matrix.
2018-07-11 14:01:47 +02:00
Gael Guennebaud
1625476091
Add internal::is_identity compile-time helper
2018-07-11 14:00:24 +02:00
Gael Guennebaud
fe723d6129
Fix conversion warning
2018-07-10 09:10:32 +02:00
Gael Guennebaud
9357838f94
bug #1543 : improve linear indexing for general block expressions
2018-07-10 09:10:15 +02:00
Gael Guennebaud
de9e31a06d
Introduce the macro ei_declare_local_nested_eval to help allocating on the stack local temporaries via alloca, and let outer-products make good use of it.
...
If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
2018-07-09 15:41:14 +02:00
Gael Guennebaud
6190aa5632
bug #1567 : add optimized path for tensor broadcasting and 'Channel First' shape
2018-07-09 11:23:16 +02:00
Gael Guennebaud
ec323b7e66
Skip null numerators in triangular-vector-solve (as in BLAS TRSV).
2018-07-09 11:13:19 +02:00
Gael Guennebaud
359dd77ec3
Fix legitimate "declaration shadows a typedef" warning
2018-07-09 11:03:39 +02:00
Deven Desai
e2b2c61533
merging from master
2018-06-20 16:47:45 -04:00
Deven Desai
1bb6fa99a3
merging the CUDA and HIP implementation for the Tensor directory and the unit tests
2018-06-20 16:44:58 -04:00
Deven Desai
cfdabbcc8f
removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories
2018-06-20 12:57:02 -04:00
Deven Desai
7e41c8f1a9
renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories
2018-06-20 12:52:30 -04:00
Deven Desai
ee73ae0a80
Merged eigen/eigen into default
2018-06-20 12:37:11 -04:00
Mark D Ryan
90a53ca6fd
Fix the Packet16h version of ptranspose
...
The AVX512 version of ptranspose for PacketBlock<Packet16h,16> was
reordering the PacketBlock argument incorrectly. This led to errors in
the multiplication of matrices composed of 16 bit floats on AVX512
machines, if at least one of the matrices was using RowMajor order. This
error is responsible for one tensorflow unit test failure on AVX512
machines:
//tensorflow/python/kernel_tests:batch_matmul_op_test
2018-06-16 15:13:06 -07:00
Gael Guennebaud
1f54164eca
Fix a few issues with Packet16h
2018-07-07 00:15:07 +02:00
Gael Guennebaud
f2dc048df9
complete implementation of Packet16h (AVX512)
2018-07-06 17:43:11 +02:00
Gael Guennebaud
a937c50208
palign is not used anymore, so let's relax the unit test
2018-07-06 17:41:52 +02:00
Gael Guennebaud
56a33ae57d
test product kernel with half-floats.
2018-07-06 17:14:04 +02:00
Gael Guennebaud
f4d623ffa7
Complete Packet8h implementation and test it in packetmath unit test
2018-07-06 17:13:36 +02:00
Gael Guennebaud
a8ab6060df
Add unit tests for inverse and selfadjoint-eigenvalues on CUDA
2018-07-06 09:58:45 +02:00
Gael Guennebaud
b8271bb368
fix md5sum of lapack_addons
2018-06-15 14:21:29 +02:00
Deven Desai
b6cc0961b1
updates based on PR feedback
...
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)
1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_COMPILE_PHASE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
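The alias macros described in the commit above can be pictured with the following minimal sketch. It is not Eigen's actual definition: the `SKETCH_*` names and `sketch_on_device` are hypothetical, and the sketch keys directly off the compiler-predefined macros `__CUDACC__`, `__HIPCC__`, `__CUDA_ARCH__` and `__HIP_DEVICE_COMPILE__` rather than Eigen's own wrappers.

```cpp
// Hypothetical SKETCH_* names; Eigen's real macros are EIGEN_GPUCC etc.
// 1 when the translation unit is compiled by a GPU compiler (nvcc or hipcc).
#if defined(__CUDACC__) || defined(__HIPCC__)
  #define SKETCH_GPUCC 1
#else
  #define SKETCH_GPUCC 0
#endif

// 1 only during the device-side compilation pass of either toolchain.
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
  #define SKETCH_GPU_COMPILE_PHASE 1
#else
  #define SKETCH_GPU_COMPILE_PHASE 0
#endif

// True only when a GPU compiler is currently generating device code.
constexpr bool sketch_on_device() {
  return SKETCH_GPUCC && SKETCH_GPU_COMPILE_PHASE;
}
```

With a plain host compiler both macros evaluate to 0, so code guarded by them falls back to the CPU path.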
Deven Desai
ba972fb6b4
moving Half headers from CUDA dir to GPU dir, removing the HIP versions
2018-06-13 12:26:18 -04:00
Deven Desai
d1d22ef0f4
syncing this fork with upstream
2018-06-13 12:09:52 -04:00
Benoit Steiner
d3a380af4d
Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403)
...
Derivative of the incomplete Gamma function and the sample of a Gamma random variable
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com >
2018-06-11 17:57:47 +00:00
Andrea Bocci
f7124b3e46
Extend CUDA support to matrix inversion and selfadjointeigensolver
2018-06-11 18:33:24 +02:00
Gael Guennebaud
0537123953
bug #1565 : help MSVC to generate not too bad ASM in reductions.
2018-07-05 09:21:26 +02:00
Gael Guennebaud
6a241bd8ee
Implement custom inplace triangular product to avoid a temporary
2018-07-03 14:02:46 +02:00
Gael Guennebaud
3ae2083e23
Make is_same_dense compatible with different scalar types.
2018-07-03 13:21:43 +02:00
Gael Guennebaud
67ec37f7b0
Activate dgmres unit test
2018-07-02 12:54:14 +02:00
Gael Guennebaud
047677a08d
Fix regression in changeset f05dea6b23
...
: computeFromHessenberg can take any expression for matrixQ, not only an HouseholderSequence.
2018-07-02 12:18:25 +02:00
Gael Guennebaud
d625564936
Simplify redux_evaluator using inheritance, and properly rename parameters in reducers.
2018-07-02 11:50:41 +02:00
Gael Guennebaud
d428a199ab
bug #1562 : optimize evaluation of small products of the form s*A*B by rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x...
2018-07-02 11:41:09 +02:00
Gael Guennebaud
a7b313a16c
Fix unit test
2018-07-01 22:45:47 +02:00
Gael Guennebaud
0cdacf3fa4
update comment
2018-06-29 11:28:36 +02:00
Gael Guennebaud
54f6eeda90
Merged in net147/eigen (pull request PR-411)
...
Use std::complex constructor instead of assignment from scalar
2018-06-28 21:01:04 +00:00
Gael Guennebaud
9a81de1d35
Fix order of EIGEN_DEVICE_FUNC and returned type
2018-06-28 00:20:59 +02:00
Jonathan Liu
b7689bded9
Use std::complex constructor instead of assignment from scalar
...
Fixes GCC conversion to non-scalar type requested compile error when
using boost::multiprecision::cpp_dec_float_50 as scalar type.
2018-06-28 00:32:37 +10:00
Gael Guennebaud
f9d337780d
First step towards a generic vectorised quaternion product
2018-06-25 14:26:51 +02:00
Gael Guennebaud
ee5864f72e
bug #1560 fix product with a 1x1 diagonal matrix
2018-06-25 10:30:12 +02:00
Rasmus Munk Larsen
2f62cc68cd
merge
2018-06-22 15:09:44 -07:00
Rasmus Munk Larsen
bda71ad394
Fix typo in pbend for AltiVec.
2018-06-22 15:04:35 -07:00
Benoit Steiner
b6ffcd22e3
Merged in rmlarsen/eigen2 (pull request PR-409)
...
Fix oversharding bug in parallelFor.
2018-06-21 18:34:57 +00:00
Gael Guennebaud
4cc32d80fd
bug #1555 : compilation fix with XLC
2018-06-21 10:28:38 +02:00
Rasmus Munk Larsen
5418154a45
Fix oversharding bug in parallelFor.
2018-06-20 17:51:48 -07:00
Gael Guennebaud
cb4c9a6a94
bug #1531 : make dedicated unit testing for NumDimensions
2018-06-08 17:11:45 +02:00
Gael Guennebaud
d6813fb1c5
bug #1531 : expose NumDimensions for solve and sparse expressions.
2018-06-08 16:55:10 +02:00
Gael Guennebaud
89d65bb9d6
bug #1531 : expose NumDimensions for compatibility with Tensor
2018-06-08 16:50:17 +02:00
Gael Guennebaud
f05dea6b23
bug #1550 : prevent avoidable memory allocation in RealSchur
2018-06-08 10:14:57 +02:00
Gael Guennebaud
7933267c67
fix prototype
2018-06-08 09:56:01 +02:00
Gael Guennebaud
f4d1461874
Fix the way matrix folder is passed to the tests.
2018-06-08 09:55:46 +02:00
Benoit Steiner
522d3ca54d
Don't use std::equal_to inside cuda kernels since it's not supported.
2018-06-07 13:02:07 -07:00
Christoph Hertzberg
7d7bb91537
Missing line during manual rebase of PR-374
2018-06-07 20:30:09 +02:00
Michael Figurnov
30fa3d0454
Merge from eigen/eigen
2018-06-07 17:57:56 +01:00
Benoit Steiner
d2b0a4a59b
Merged in mfigurnov/eigen/fix-bessel (pull request PR-404)
...
Fix compilation of special functions without C99 math.
2018-06-07 16:12:42 +00:00
Michael Figurnov
6c71c7d360
Merge from eigen/eigen.
2018-06-07 15:54:18 +01:00
Gael Guennebaud
c25034710e
Fix some warnings in dox examples
2018-06-07 16:09:22 +02:00
Gael Guennebaud
37348d03ae
Fix int versus Index
2018-06-07 15:56:43 +02:00
Gael Guennebaud
c723ffd763
Fix warning
2018-06-07 15:56:20 +02:00
Gael Guennebaud
af7c83b9a2
Fix warning
2018-06-07 15:45:24 +02:00
Gael Guennebaud
7fe29aceeb
Fix MSVC warning C4290: C++ exception specification ignored except to indicate a function is not __declspec(nothrow)
2018-06-07 15:36:20 +02:00
Michael Figurnov
aa813d417b
Fix compilation of special functions without C99 math.
...
The commit with Bessel functions i0e and i1e placed the ifdef/endif incorrectly,
causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not
actually require C99 math, so now they are always available.
2018-06-07 14:35:07 +01:00
Gael Guennebaud
55774b48e4
Fix short vs long
2018-06-07 15:26:25 +02:00
Christoph Hertzberg
e5f9f4768f
Avoid unnecessary C++11 dependency
2018-06-07 15:03:50 +02:00
Gael Guennebaud
b3fd93207b
Fix typos found using codespell
2018-06-07 14:43:02 +02:00
Michael Figurnov
5172a32849
Updated the stopping criteria in igammac_cf_impl.
...
Previously, when computing the derivative, it used a relative error threshold. Now it uses an absolute error threshold. The behavior for computing the value is unchanged. This makes more sense since we do not expect the derivative to often be close to zero. This change makes the derivatives about 30% faster across the board. The error for the igamma_der_a is almost unchanged, while for gamma_sample_der_alpha it is a bit worse for float32 and unchanged for float64.
2018-06-07 12:03:58 +01:00
Michael Figurnov
4bd158fa37
Derivative of the incomplete Gamma function and the sample of a Gamma random variable.
...
In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter
* gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter
The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.
2018-06-06 18:49:26 +01:00
Deven Desai
8fbd47052b
Adding support for using Eigen in HIP kernels.
...
This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.
Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor)
Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
2018-06-06 10:12:58 -04:00
Benoit Steiner
e206f8d4a4
Merged in mfigurnov/eigen (pull request PR-400)
...
Exponentially scaled modified Bessel functions of order zero and one.
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com >
2018-06-05 17:05:21 +00:00
Penporn Koanantakool
e2ed0cf8ab
Add a ThreadPoolInterface* getter for ThreadPoolDevice.
2018-06-02 12:07:49 -07:00
Gael Guennebaud
84868da904
Don't run hg on non mercurial clone
2018-05-31 21:21:57 +02:00
Michael Figurnov
f216854453
Exponentially scaled modified Bessel functions of order zero and one.
...
The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(|x|) i0e(x)
The code is ported from Cephes and tested against SciPy.
2018-05-31 15:34:53 +01:00
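The relation i0(x) = exp(|x|) i0e(x) quoted in the commit above can be checked numerically with a small sketch. This is not the Cephes port that the commit adds: `sketch_i0` and `sketch_i0e` are hypothetical helpers that use the plain power series for I0, which is only adequate for moderate |x| (the unscaled value overflows for large arguments, which is exactly the problem the scaled version avoids).

```cpp
#include <cmath>

// Power series I0(x) = sum_{k>=0} ((x/2)^(2k)) / (k!)^2.
// Illustration only; suitable for moderate |x|, not a production routine.
inline double sketch_i0(double x) {
  const double q = 0.25 * x * x;  // (x/2)^2
  double term = 1.0;              // k = 0 term
  double sum = 1.0;
  for (int k = 1; k < 500; ++k) {
    term *= q / (static_cast<double>(k) * k);  // ratio of consecutive terms
    sum += term;
    if (term < 1e-17 * sum) break;             // converged to double precision
  }
  return sum;
}

// Exponentially scaled variant: i0e(x) = exp(-|x|) * i0(x).
inline double sketch_i0e(double x) {
  return std::exp(-std::fabs(x)) * sketch_i0(x);
}
```

Multiplying `sketch_i0e(x)` by `std::exp(std::fabs(x))` recovers `sketch_i0(x)` up to rounding.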
Gael Guennebaud
6af1433cb5
Doc: add aliasing in common pitfalls.
2018-05-29 22:37:47 +02:00
Katrin Leinweber
ea94543190
Hyperlink DOIs against preferred resolver
2018-05-24 18:55:40 +02:00
Gael Guennebaud
999b552c16
Search for sequential Pastix.
2018-05-29 20:49:25 +02:00
Gael Guennebaud
eef4b7bd87
Fix handling of path names containing spaces and the likes.
2018-05-29 20:49:06 +02:00
Gael Guennebaud
647b724a36
Define pcast<> for SSE types even when AVX is enabled. (otherwise float are silently reinterpreted as int instead of being converted)
2018-05-29 20:46:46 +02:00
Gael Guennebaud
49262dfee6
Fix compilation and SSE support with PGI compiler
2018-05-29 15:09:31 +02:00
Christoph Hertzberg
750af06362
Add an option to test with external BLAS library
2018-05-22 21:04:32 +02:00
Christoph Hertzberg
d06a753d10
Make qr_fullpivoting unit test run for fixed-sized matrices
2018-05-22 20:29:17 +02:00
Gael Guennebaud
f0862b062f
Fix internal::is_integral<size_t/ptrdiff_t> with MSVC 2013 and older.
2018-05-22 19:29:51 +02:00
Gael Guennebaud
36e413a534
Workaround a MSVC 2013 compilation issue with MatrixBase(Index,int)
2018-05-22 18:51:35 +02:00
Gael Guennebaud
725bd92903
fix stupid typo
2018-05-18 17:46:43 +02:00
Gael Guennebaud
a382bc9364
is_convertible<T,Index> does not seem to work well with MSVC 2013, so let's rather use __is_enum(T) for old MSVC versions
2018-05-18 17:02:27 +02:00
Gael Guennebaud
4dd767f455
add some internal checks
2018-05-18 13:59:55 +02:00
Gael Guennebaud
345c0ab450
check that all integer types are properly handled by mat(i,j)
2018-05-18 13:46:46 +02:00
Mark D Ryan
405859f18d
Set EIGEN_IDEAL_MAX_ALIGN_BYTES correctly for AVX512 builds
...
bug #1548
The macro EIGEN_IDEAL_MAX_ALIGN_BYTES is being incorrectly set to 32
on AVX512 builds. It should be set to 64. In the current code it is
only set to 64 if the macro EIGEN_VECTORIZE_AVX512 is defined. This
macro does get defined in AVX512 builds in Core, but only after Macros.h,
the file that defines EIGEN_IDEAL_MAX_ALIGN_BYTES, has been included.
This commit fixes the issue by setting EIGEN_IDEAL_MAX_ALIGN_BYTES to
64 if __AVX512F__ is defined.
2018-05-17 17:04:00 +01:00
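The fix described above relies on a macro that the compiler predefines, so the result cannot depend on header inclusion order. A minimal sketch of that idea follows; `SKETCH_IDEAL_MAX_ALIGN_BYTES` and `SketchAlignedBuffer` are hypothetical names, and the 32/16 fallbacks are an assumption for illustration rather than a copy of Macros.h.

```cpp
#include <cstddef>

// Hypothetical macro; chosen purely from compiler-predefined ISA macros,
// so it is valid no matter which headers were included before this point.
#if defined(__AVX512F__)
  #define SKETCH_IDEAL_MAX_ALIGN_BYTES 64
#elif defined(__AVX__)
  #define SKETCH_IDEAL_MAX_ALIGN_BYTES 32
#else
  #define SKETCH_IDEAL_MAX_ALIGN_BYTES 16
#endif

// A buffer aligned to the widest vector register the build targets.
struct alignas(SKETCH_IDEAL_MAX_ALIGN_BYTES) SketchAlignedBuffer {
  float data[16];
};
```

Building with an AVX512 target flag such as `-mavx512f` on GCC or Clang defines `__AVX512F__`, which selects the 64-byte branch.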
Vamsi Sripathi
6293ad3f39
Performance improvements to tensor broadcast operation
...
1. Added new packet functions using SIMD for NByOne, OneByN cases
2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD
3. Added 4 test cases to cover the new packet functions
2018-05-23 14:02:05 -07:00
Gael Guennebaud
7134fa7a2e
Fix compilation with MSVC by reverting to char* for _mm_prefetch except for PGI (the latter being the one that has the wrong prototype).
2018-06-07 09:33:10 +02:00
Jeff Trull
e7147f69ae
Add tests for sparseQR results (value and size) covering bugs #1522 and #1544
2018-04-21 10:26:30 -07:00
Robert Lukierski
b2053990d0
Adding EIGEN_DEVICE_FUNC to Products, especially Dense2Dense Assignment
...
specializations. Otherwise causes problems with small fixed size matrix multiplication (call to
0x00 in call_assignment_no_alias in debug mode or trap in release with CUDA 9.1).
2018-03-14 16:19:43 +00:00
Jeff Trull
9f0c5c3669
Make sparse QR result sizes consistent with dense QR, with the following rules:
...
1) Q is always square
2) Q*R*P' is valid and recovers the original matrix
This implies that the size of Q is the number of rows in the original matrix, square,
and that the size of R is the size of the original matrix.
2018-02-15 15:00:31 -08:00
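The two sizing rules in the commit above can be written down as plain shape arithmetic. The sketch below uses no Eigen types; `SketchShape`, `SketchQRShapes` and both functions are hypothetical, and the n x n size of the permutation P is inferred from the requirement that Q*R*P' be a valid product.

```cpp
// Hypothetical shape bookkeeping for an m x n input matrix A.
struct SketchShape {
  long rows;
  long cols;
};

struct SketchQRShapes {
  SketchShape Q;  // always square: m x m
  SketchShape R;  // same size as the original matrix: m x n
  SketchShape P;  // column permutation: n x n (assumed)
};

inline SketchQRShapes sketch_qr_shapes(long m, long n) {
  return {{m, m}, {m, n}, {n, n}};
}

// Checks that Q*R*P' is conformable and has the shape of the original matrix.
inline bool sketch_product_recovers_shape(const SketchQRShapes& s, long m, long n) {
  const bool qr_ok = (s.Q.cols == s.R.rows);
  const bool rp_ok = (s.R.cols == s.P.cols);  // P' is P.cols x P.rows
  return qr_ok && rp_ok && s.Q.rows == m && s.P.rows == n;
}
```

For a 5 x 3 input this yields a 5 x 5 Q and a 5 x 3 R, for both tall and wide inputs the product keeps the original shape.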
Christoph Hertzberg
d655900953
bug #1544 : Generate correct Q matrix in complex case. Original patch was by Jeff Trull in PR-386.
2018-05-17 19:17:01 +02:00
Benoit Steiner
0371380d5b
Merged in rmlarsen/eigen2 (pull request PR-393)
...
Rename scalar_clip_op to scalar_clamp_op to prevent collision with existing functor in TensorFlow.
2018-05-16 21:45:42 +00:00
Rasmus Munk Larsen
b8d36774fa
Rename clip2 to clamp.
2018-05-16 14:04:48 -07:00
Rasmus Munk Larsen
812480baa3
Rename scalar_clip_op to scalar_clip2_op to prevent collision with existing functor in TensorFlow.
2018-05-16 09:49:24 -07:00
Benoit Steiner
1403c2c15b
Merged in didierjansen/eigen (pull request PR-360)
...
Fix bugs and typos in the contraction example of the tensor README
2018-05-16 01:16:36 +00:00
Benoit Steiner
ad355b3f05
Merged in rmlarsen/eigen2 (pull request PR-392)
...
Add vectorized clip functor for Eigen Tensors
2018-05-16 01:15:56 +00:00
Christoph Hertzberg
0272f2451a
Fix "suggest parentheses around comparison" warning
2018-05-15 19:35:53 +02:00
Rasmus Munk Larsen
afec3021f7
Use numext::maxi & numext::mini.
2018-05-14 16:35:39 -07:00
Rasmus Munk Larsen
b8c8e5f436
Add vectorized clip functor for Eigen Tensors.
2018-05-14 16:07:13 -07:00
Benoit Steiner
6118c6ff4f
Enable RawAccess to tensor slices whenever possible.
...
Avoid 32-bit integer overflow in TensorSlicingOp
2018-04-30 11:28:12 -07:00
Gael Guennebaud
6e7118265d
Fix compilation with NEON+MSVC
2018-04-26 10:50:41 +02:00
Gael Guennebaud
097dd4616d
Fix unit test for SIMD engine not supporting sqrt
2018-04-26 10:47:39 +02:00
Gael Guennebaud
8810baaed4
Add multi-threading for sparse-row-major * dense-row-major
2018-04-25 10:14:48 +02:00
Gael Guennebaud
2f3287da7d
Fix "used uninitialized" warnings
2018-04-24 17:17:25 +02:00
Gael Guennebaud
3ffd449ef5
Workaround warning
2018-04-24 17:11:51 +02:00
Gael Guennebaud
e8ca5166a9
bug #1428 : attempt to make NEON vectorization compilable by MSVC.
...
The workaround is to wrap NEON packet types to make them different c++ types.
2018-04-24 11:19:49 +02:00
Benoit Steiner
6f5935421a
fix AVX512 plog
2018-04-23 15:49:26 +00:00
Gael Guennebaud
e9da464e20
Add specializations of is_arithmetic for long long in c++11
2018-04-23 16:26:29 +02:00
Gael Guennebaud
a57e6e5f0f
workaround MSVC 2013 compilation issue (ambiguous call)
2018-04-23 15:31:51 +02:00
Gael Guennebaud
11123175db
typo in doc
2018-04-23 15:30:35 +02:00
Gael Guennebaud
5679e439e0
bug #1543 : fix linear indexing in generic block evaluation (this completes the fix in commit 12efc7d41b
...
)
2018-04-23 14:40:16 +02:00
Gael Guennebaud
35b31353ab
Fix unit test
2018-04-22 22:49:08 +02:00
Christoph Hertzberg
34e499ad36
Disable -Wshadow when compiling with g++
2018-04-21 22:08:26 +02:00
Jayaram Bobba
b7b868d1c4
fix AVX512 plog
2018-04-20 13:39:18 -07:00
Gael Guennebaud
686fb57233
fix const cast in NEON
2018-04-18 18:46:34 +02:00
Dmitriy Korchemkin
02d2f1cb4a
Cast zeros to Scalar in RealSchur
2018-04-18 13:52:46 +03:00
Christoph Hertzberg
50633d1a83
Renamed .trans() et al. to .reverseFlag() et al. Adapted documentation of .setReverseFlag()
2018-04-17 11:30:27 +02:00
nicolov
39c2cba810
Add a specialization of Eigen::numext::conj for std::complex<T> to be used when compiling a cuda kernel. This fixes the compilation of TensorFlow 1.4 with clang 6.0 used as CUDA compiler with libc++.
...
This follows the previous change in 2a69290ddb, which mentions OSX (I guess because it uses libc++ too).
2018-04-13 22:29:10 +00:00
Christoph Hertzberg
775766d175
Add parenthesis to fix compiler warnings
2018-04-15 18:43:56 +02:00
Christoph Hertzberg
42715533f1
bug #1493 : Make representation of HouseholderSequence consistent and working for complex numbers. Made corresponding unit test actually test that. Also simplify implementation of QR decompositions
2018-04-15 10:15:28 +02:00
Christoph Hertzberg
c9ecfff2e6
Add links where to make PRs and report bugs into README.md
2018-04-13 21:05:28 +00:00
Christoph Hertzberg
c8b19702bc
Limit test size for sparse Cholesky solvers to EIGEN_TEST_MAX_SIZE
2018-04-13 20:36:58 +02:00
Christoph Hertzberg
2cbb00b18e
No need to make noise, if KLU is found
2018-04-13 19:14:25 +02:00
Christoph Hertzberg
84dcd998a9
Recent Adolc versions require C++11
2018-04-13 19:10:23 +02:00
Christoph Hertzberg
4d392d93aa
Make hypot_impl compile again for types with expression-templates (e.g., boost::multiprecision)
2018-04-13 19:01:37 +02:00
Christoph Hertzberg
072e111ec0
SelfAdjointView<...,Mode> causes a static assert since commit d820ab9edc
2018-04-13 19:00:34 +02:00
Gael Guennebaud
7a9089c33c
fix linking issue
2018-04-13 08:51:47 +02:00
Gael Guennebaud
e43ca0320d
bug #1520 : workaround some -Wfloat-equal warnings by calling std::equal_to
2018-04-11 15:24:13 +02:00
Weiming Zhao
b0eda3cb9f
Avoid using memcpy for non-POD elements
2018-04-11 11:37:06 +02:00
Gael Guennebaud
79266fec75
extend doxygen splitter for huge screens
2018-04-11 11:31:17 +02:00
Gael Guennebaud
426052ef6e
Update header/footer for doxygen 1.8.13
2018-04-11 11:30:34 +02:00
Gael Guennebaud
9c8decffbf
Fix javascript hacks for doxygen 1.8.13
2018-04-11 11:30:14 +02:00
Gael Guennebaud
e798466871
bug #1538 : update manual pages regarding BDCSVD.
2018-04-11 10:46:11 +02:00
Gael Guennebaud
c91906b065
Umfpack: UF_long has been removed in recent versions of suitesparse, and fix a few long-to-int conversion issues.
2018-04-11 09:59:59 +02:00
Gael Guennebaud
0050709ea7
Merged in v_huber/eigen (pull request PR-378)
...
Add interface to umfpack_*l_* functions
2018-04-11 07:43:04 +00:00
Guillaume Jacob
8c1652055a
Fix code sample output in block(int, int, int, int) doxygen
2018-04-09 17:23:59 +02:00
vhuber
08008f67e1
Add unitTest
2018-04-09 17:07:46 +02:00
Gael Guennebaud
add15924ac
Fix MKL backend for symmetric eigenvalues on row-major matrices.
2018-04-09 13:29:26 +02:00
Gael Guennebaud
04b1628e55
Add missing empty line.
2018-04-09 13:28:31 +02:00
Gael Guennebaud
c2624c0318
Fix cmake scripts with no fortran compiler
2018-04-07 08:45:19 +02:00
Gael Guennebaud
2f833b1c64
bug #1509 : fix computeInverseWithCheck for complexes
2018-04-04 15:47:46 +02:00
Gael Guennebaud
b903fa74fd
Extend list of MSVC versions
2018-04-04 15:14:09 +02:00
Gael Guennebaud
403f09ccef
Make stableNorm and blueNorm compatible with 2D matrices.
2018-04-04 15:13:31 +02:00
Gael Guennebaud
4213b63f5c
Factorize code between numext::hypot and scalar_hypot_op functor.
2018-04-04 15:12:43 +02:00
Gael Guennebaud
368dd4cd9d
Make innerVector() and innerVectors() methods available to all expressions supported by Block.
...
Before, only SparseBase exposed such methods.
2018-04-04 15:09:21 +02:00
Gael Guennebaud
e116f6847e
bug #1521 : avoid signalling NaN in hypot and make it std::complex<> friendly.
2018-04-04 13:47:23 +02:00
Gael Guennebaud
73729025a4
bug #1521 : add unit test dedicated to numext::hypot
2018-04-04 13:45:34 +02:00
Gael Guennebaud
13f5df9f67
Add a note on vec_min vs asm
2018-04-04 13:10:38 +02:00
Gael Guennebaud
e91e314347
bug #1494 : makes pmin/pmax behave on Altivec/VSX as on x86 regarding NaNs
2018-04-04 11:39:19 +02:00
Gael Guennebaud
112c899304
comment unreachable code
2018-04-03 23:16:43 +02:00
Gael Guennebaud
a1292395d6
Fix compilation of product with inverse transpositions (e.g., mat * Transpositions().inverse())
2018-04-03 23:06:44 +02:00
Gael Guennebaud
8c7b5158a1
commit 45e9c9996da790b55ed9c4b0dfeae49492ac5c46 (HEAD -> memory_fix)
...
Author: George Burgess IV <gbiv@google.com >
Date: Thu Mar 1 11:20:24 2018 -0800
Prefer `::operator new` to `new`
The C++ standard allows compilers much flexibility with `new`
expressions, including eliding them entirely
(https://godbolt.org/g/yS6i91 ). However, calls to `operator new` are
required to be treated like opaque function calls.
Since we're calling `new` for side-effects other than allocating heap
memory, we should prefer the less flexible version.
Signed-off-by: George Burgess IV <gbiv@google.com >
2018-04-03 17:15:38 +02:00
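The distinction drawn in the commit above, between a `new` expression and a direct call to the allocation function, can be illustrated with a short sketch. `sketch_probe_allocator` is a hypothetical helper, not the code that was changed in Eigen; it only shows the form of call that the commit message describes as an opaque function call.

```cpp
#include <cstddef>
#include <new>

// Calls the global allocation function directly instead of using a
// `new` expression, because the call is made for its side effects
// (exercising the allocator) and the memory itself is never used.
inline bool sketch_probe_allocator(std::size_t bytes) {
  void* p = ::operator new(bytes, std::nothrow);  // returns nullptr on failure
  const bool ok = (p != nullptr);
  ::operator delete(p);                           // deleting nullptr is a no-op
  return ok;
}
```

The non-throwing overload is used here only so the sketch can report failure through its return value.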
Gael Guennebaud
dd4cc6bd9e
bug #1527 : fix support for MKL's VML (destination was not properly resized)
2018-04-03 17:11:15 +02:00
Gael Guennebaud
c5b56f1fb2
bug #1528 : better use numeric_limits::min() instead of 1/highest(), which can underflow.
2018-04-03 16:49:35 +02:00
Gael Guennebaud
8d0ffe3655
bug #1516 : add assertion for out-of-range diagonal index in MatrixBase::diagonal(i)
2018-04-03 16:15:43 +02:00
Gael Guennebaud
407e3e2621
bug #1532 : disable stl::*_negate in C++17 (they are deprecated)
2018-04-03 15:59:30 +02:00
Gael Guennebaud
40b4bf3d32
AVX512: _mm512_rsqrt28_ps is available for AVX512ER only
2018-04-03 14:36:27 +02:00
Gael Guennebaud
584951ca4d
Rename predux_downto4 to be more accurate on its semantic.
2018-04-03 14:28:38 +02:00
Gael Guennebaud
67bac6368c
protect calls to isnan
2018-04-03 14:19:04 +02:00
Gael Guennebaud
d43b2f01f4
Fix unit testing of predux_downto4 (bad name), and add unit testing of prsqrt
2018-04-03 14:14:00 +02:00
Gael Guennebaud
7b0630315f
AVX512: fix psqrt and prsqrt
2018-04-03 14:12:50 +02:00
Gael Guennebaud
6719409cd9
AVX512: add missing pinsertfirst and pinsertlast, implement pblend for Packet8d, fix compilation without AVX512DQ
2018-04-03 14:11:56 +02:00
Gael Guennebaud
524119d32a
Fix uninitialized output argument.
2018-04-03 10:56:10 +02:00
vhuber
267a144da5
Remove unnecessary define
2018-03-30 23:04:53 +02:00
vhuber
baf9a5a776
Add interface to umfpack_*l_* functions
2018-03-30 18:53:34 +02:00
luz.paz
e3912f5e63
Misc. source and comment typos
...
Found using `codespell` and `grep` from downstream FreeCAD
2018-03-11 10:01:44 -04:00
Gael Guennebaud
5deeb19e7b
bug #1517 : fix triangular product with unit diagonal and nested scaling factor: (s*A).triangularView<UpperUnit>()*B
2018-02-09 16:52:35 +01:00
Gael Guennebaud
12efc7d41b
Fix linear indexing in generic block evaluation.
2018-02-09 16:45:49 +01:00
Gael Guennebaud
f4a6863c75
Fix typo
2018-02-09 16:43:49 +01:00
Viktor Csomor
000840cae0
Added a move constructor and move assignment operator to Tensor and wrote some tests.
2018-02-07 19:10:54 +01:00
Gael Guennebaud
3a2dc3869e
Fix weird issue with MSVC 2013
2018-07-18 02:26:43 -07:00
Eugene Zhulenev
c95aacab90
Fix TensorContractionOp evaluators for GPU and SYCL
2018-07-17 14:09:37 -07:00
Gael Guennebaud
038b55464b
Merged in deven-amd/eigen (pull request PR-425)
...
applying EIGEN_DECLARE_TEST to *gpu unit tests
2018-07-17 21:14:40 +00:00
Deven Desai
f124f07965
applying EIGEN_DECLARE_TEST to *gpu* tests
...
Also, a few minor fixes for GPU tests running in HIP mode.
1. Adding an include for hip/hip_runtime.h in the Macros.h file
For HIP __host__ and __device__ are macros which are defined in hip headers.
Their definitions need to be included before their use in the file.
2. Fixing the compile failure in TensorContractionGpu introduced by the commit to
"Fuse computations into the Tensor contractions using output kernel"
3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
2018-07-17 14:16:48 -04:00
Gael Guennebaud
dff3a92d52
Remove usage of #if EIGEN_TEST_PART_XX in unit tests that do not require them (splitting can thus be avoided for them)
2018-07-17 15:52:58 +02:00
Gael Guennebaud
82f0ce2726
Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }.
...
This provides several advantages:
- more flexibility in designing unit tests
- unit tests can be glued to speed up compilation
- unit tests are compiled with same predefined macros, which is a requirement for zapcc
2018-07-17 14:46:15 +02:00
Gael Guennebaud
37f4bdd97d
Fix VERIFY_EVALUATION_COUNT(EXPR,N) with a complex expression as N
2018-07-17 13:20:49 +02:00
Gael Guennebaud
2b2cd85694
bug #1573 : add noexcept move constructor and move assignment operator to Quaternion
2018-07-17 11:11:33 +02:00
Eugene Zhulenev
43206ac4de
Call OutputKernel in evalGemv
2018-07-12 14:52:23 -07:00
Eugene Zhulenev
e204ecdaaf
Remove SimpleThreadPool and always use {NonBlocking}ThreadPool
2018-07-16 15:06:57 -07:00
Eugene Zhulenev
b324ed55d9
Call OutputKernel in evalGemv
2018-07-12 14:52:23 -07:00
Eugene Zhulenev
01fd4096d3
Fuse computations into the Tensor contractions using output kernel
2018-07-10 13:16:38 -07:00
Gael Guennebaud
5539587b1f
Some warning fixes
2018-07-17 10:29:12 +02:00
Benoit Steiner
8f55956a57
Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
2018-01-30 20:22:12 +00:00
Gael Guennebaud
09a16ba42f
bug #1412 : fix compilation with nvcc+MSVC
2018-01-17 23:13:16 +01:00
Lee.Deokjae
5b3c367926
Fix typos in the contraction example of tensor README
2018-01-06 14:36:19 +09:00
Eugene Chereshnev
f558ad2955
Fix incorrect ldvt in LAPACKE call from JacobiSVD
2018-01-03 12:55:52 -08:00
Benoit Steiner
22de74aa76
Disable use of recurrence for computing twiddle factors.
2018-01-09 18:32:52 +00:00
Gael Guennebaud
73629f8b68
Fix gcc7 warning
2018-01-09 08:59:27 +01:00
RJ Ryan
59985cfd26
Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689
2017-12-31 10:44:56 -05:00
nluehr
f9bdcea022
For cuda 9.1 replace math_functions.hpp with cuda_runtime.h
2017-12-18 16:51:15 -08:00
Gael Guennebaud
06bf1047f9
Fix compilation of stableNorm with some expressions as input
2017-12-15 15:15:37 +01:00
Gael Guennebaud
73214c4bd0
Workaround nvcc 9.0 issue. See PR 351.
...
https://bitbucket.org/eigen/eigen/pull-requests/351
2017-12-15 14:10:59 +01:00
Gael Guennebaud
31e0bda2e3
Fix cmake warning
2017-12-14 15:48:27 +01:00
Gael Guennebaud
26a2c6fc16
fix unit test
2017-12-14 15:11:04 +01:00
Gael Guennebaud
546ab97d76
Add possibility to overwrite EIGEN_STRONG_INLINE.
2017-12-14 14:47:38 +01:00
Gael Guennebaud
9c3aed9d48
Fix packet and alignment propagation logic of Block<Xpr> expressions. In particular, (A+B).col(j) lost vectorisation.
2017-12-14 14:24:33 +01:00
Gael Guennebaud
76c7dae600
ignore all *build* sub directories
2017-12-14 14:22:14 +01:00
Gael Guennebaud
b2cacd189e
fix header inclusion
2017-12-14 10:01:02 +01:00
Yangzihao Wang
3122477c86
Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
2017-12-12 11:15:24 -08:00
Benoit Steiner
393b7c4959
Merged in ncluehr/eigen/float2half-fix (pull request PR-349)
...
Replace __float2half_rn with __float2half
2017-12-01 00:29:51 +00:00
nluehr
aefd5fd5c4
Replace __float2half_rn with __float2half
...
The latter provides a consistent definition for CUDA 8.0 and 9.0.
2017-11-28 10:15:46 -08:00
Gael Guennebaud
d0b028e173
clarify Pastix requirements
2017-11-27 22:11:57 +01:00
Gael Guennebaud
3587e481fb
silence MSVC warning
2017-11-27 21:53:02 +01:00
Benoit Steiner
3a327cd3c7
Merged in ncluehr/eigen/predux_fp16_fix (pull request PR-348)
...
Fix incorrect integer cast in half2 predux.
2017-11-21 21:11:45 +00:00
nluehr
dd6de618c3
Fix incorrect integer cast in predux<half2>().
...
Bug corrupts results on Maxwell and earlier GPU architectures.
2017-11-21 10:47:00 -08:00
Gael Guennebaud
3dc6ff73ca
Handle PGI compiler
2017-11-17 22:54:39 +01:00
Zvi Rackover
599a88da27
Disable gcc-specific workaround for Clang to allow build with AVX512
...
There is currently a workaround for an issue in gcc that requires invoking gcc with the -fabi-version flag. This workaround is not needed for Clang and moreover is not supported.
2017-11-16 19:53:38 +00:00
Gael Guennebaud
672bdc126b
bug #1479 : fix failure detection in LDLT
2017-11-16 17:55:24 +01:00
Basil Fierz
624df50945
Adds missing EIGEN_STRONG_INLINE to support MSVC properly inlining small vector calculations
...
When working with MSVC often small vector operations are not properly inlined. This behaviour is observed even on the most recent compiler versions.
2017-10-26 22:44:28 +02:00
Benoit Steiner
746a6b7b81
Merged in zzp11/eigen/zzp11/a-small-mistake-quickreferencedox-edited-1510217281963 (pull request PR-346)
...
a small mistake QuickReference.dox edited online with Bitbucket
2018-03-23 01:02:34 +00:00
Benoit Steiner
d2631ef61d
Merged in facaiy/eigen/ENH/exp_support_complex_for_gpu (pull request PR-359)
...
ENH: exp supports complex type for cuda
2018-03-23 00:59:15 +00:00
Benoit Steiner
8fcbd6d4c9
Merged in dtrebbien/eigen (pull request PR-369)
...
Move up the specialization of std::numeric_limits
2018-03-23 00:54:58 +00:00
Rasmus Munk Larsen
e900b010c8
Improve robustness of igamma and igammac to bad inputs.
...
Check for nan inputs and propagate them immediately. Limit the number of internal iterations to 2000 (same number as used by scipy.special.gammainc). This prevents an infinite loop when the function is called with nan or very large arguments.
Original change by mfirgunov@google.com
2018-03-19 09:04:54 -07:00
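The two safeguards named in the commit above, immediate NaN propagation and a cap of 2000 internal iterations, can be sketched on a textbook series for the regularized lower incomplete gamma function. `sketch_igamma_series` is a hypothetical stand-in and not Eigen's implementation; the series is only accurate when x is not much larger than a, and the domain handling is an assumption made for the example.

```cpp
#include <cmath>
#include <limits>

// P(a, x) = x^a e^{-x} / Gamma(a + 1) * sum_{n>=0} x^n / ((a+1)...(a+n)).
// Illustration only: shows NaN propagation and a bounded iteration count.
inline double sketch_igamma_series(double a, double x) {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  if (std::isnan(a) || std::isnan(x)) return nan;  // propagate NaN immediately
  if (!(a > 0.0) || x < 0.0) return nan;           // outside the assumed domain
  if (x == 0.0) return 0.0;

  const int kMaxIter = 2000;  // iteration cap quoted in the commit message
  const double eps = std::numeric_limits<double>::epsilon();
  double denom = a;
  double term = 1.0;
  double sum = 1.0;
  for (int i = 0; i < kMaxIter; ++i) {
    denom += 1.0;
    term *= x / denom;
    sum += term;
    if (term <= eps * sum) break;  // converged
  }
  return sum * std::exp(a * std::log(x) - x - std::lgamma(a + 1.0));
}
```

For a = 1 the function reduces to 1 - exp(-x), which gives a quick sanity check; for very large x the loop still terminates, although the value returned by this naive series is no longer meaningful.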
Gael Guennebaud
f7d17689a5
Add static assertion for fixed sizes Ref<>
2018-03-09 10:11:13 +01:00
Gael Guennebaud
f6be7289d7
Implement better static assertion checking to make sure that the first assertion is a static one and not a runtime one.
2018-03-09 10:00:51 +01:00
Gael Guennebaud
d820ab9edc
Add static assertion on selfadjoint-view's UpLo parameter.
2018-03-09 09:33:43 +01:00
Daniel Trebbien
0c57be407d
Move up the specialization of std::numeric_limits
...
This fixes a compilation error seen when building TensorFlow on macOS:
https://github.com/tensorflow/tensorflow/issues/17067
2018-02-18 15:35:45 -08:00
Yan Facai (颜发才)
42a8334668
ENH: exp supports complex type for cuda
2018-01-04 16:01:01 +08:00
zhouzhaoping
912e9965ef
a small mistake QuickReference.dox edited online with Bitbucket
2017-11-09 08:49:01 +00:00
Gael Guennebaud
4c03b3511e
Fix issue with boost::multiprec in previous commit
2017-11-08 23:28:01 +01:00
Gael Guennebaud
e9d2888e74
Improve debugging tests and output in BDCSVD
2017-11-08 10:26:03 +01:00
Gael Guennebaud
e8468ea91b
Fix overflow issues in BDCSVD
2017-11-08 10:24:28 +01:00
Benoit Steiner
3949615176
Merged in JonasMu/eigen (pull request PR-329)
...
Added an example for a contraction to a scalar value to README.md
Approved-by: Jonas Harsch <jonas.harsch@gmail.com >
2017-10-27 07:27:46 +00:00
Christoph Hertzberg
11ddac57e5
Merged in guillaume_michel/eigen (pull request PR-334)
...
- Add support for NEON plog PacketMath function
2017-10-23 13:22:22 +00:00
Benoit Steiner
a6d875bac8
Removed unnecessary #include
2017-10-22 08:12:45 -07:00
Benoit Steiner
f16ba2a630
Merged in LaFeuille/eigen-1/LaFeuille/typo-fix-alignmeent-alignment-1505889397887 (pull request PR-335)
...
Typo fix alignmeent ->alignment
2017-10-21 01:59:55 +00:00
Benoit Steiner
ee6ad21b25
Merged in henryiii/eigen/henryiii/device (pull request PR-343)
...
Fixing missing inlines on device functions for newer CUDA cards
2017-10-21 01:58:22 +00:00
Henry Schreiner
9bb26eb8f1
Restore __device__
2017-10-21 00:50:38 +00:00
Henry Schreiner
4245475d22
Fixing missing inlines on device functions for newer CUDA cards
2017-10-20 03:20:13 +00:00
Benoit Steiner
8eb4b9d254
Merged in benoitsteiner/opencl (pull request PR-341)
2017-10-17 16:39:28 +00:00
Rasmus Munk Larsen
2dd63ed395
Merge
2017-10-13 15:58:52 -07:00
Rasmus Munk Larsen
f349507e02
Specialize ThreadPoolDevice::enqueueNotification for the case with no args. As an example, this reduces the binary size of a TensorFlow demo app for Android by about 2.5%.
2017-10-13 15:58:12 -07:00
Benoit Steiner
688451409d
Merged in mehdi_goli/upstr_benoit/ComputeCppNewReleaseFix (pull request PR-16)
...
Changes required for new ComputeCpp CE version.
2017-10-13 20:56:01 +00:00
Konstantinos Margaritis
0e6e027e91
check both z13 and z14 arches
2017-10-12 15:38:34 -04:00
Konstantinos Margaritis
6c3475f110
remove debugging
2017-10-12 15:34:55 -04:00
Konstantinos Margaritis
df7644aec3
Merged eigen/eigen into default
2017-10-12 22:23:13 +03:00
Konstantinos Margaritis
98e52cc770
rollback 374f750ad4
2017-10-12 15:22:10 -04:00
Konstantinos Margaritis
c4ad358565
explicitly set conjugate mask
2017-10-11 11:05:29 -04:00
Konstantinos Margaritis
380d41fd76
added some extra debugging
2017-10-11 10:40:12 -04:00
Konstantinos Margaritis
d0b7b9d0d3
some Packet2cf pmul fixes
2017-10-11 10:17:22 -04:00
Konstantinos Margaritis
df173f5620
initial pexp() for 32-bit floats, commented out due to vec_cts()
2017-10-11 09:40:49 -04:00
Konstantinos Margaritis
3dcae2a27f
initial pexp() for 32-bit floats, commented out due to vec_cts()
2017-10-11 09:40:45 -04:00
Konstantinos Margaritis
c2a2246489
fix predux_mul for z14/float
2017-10-10 13:38:32 -04:00
Konstantinos Margaritis
374f750ad4
eliminate 'enumeral and non-enumeral type in conditional expression' warning
2017-10-09 16:56:30 -04:00
Konstantinos Margaritis
bc30305d29
complete z14 port
2017-10-09 16:55:10 -04:00
Gael Guennebaud
0e85a677e3
bug #1472 : fix warning
2017-09-26 10:53:33 +02:00
Gael Guennebaud
8579195169
bug #1468 (1/2) : add missing std:: to memcpy
2017-09-22 09:23:24 +02:00
Gael Guennebaud
f92567fecc
Add link to a useful example.
2017-09-20 10:22:23 +02:00
Gael Guennebaud
7ad07fc6f2
Update documentation for aligned_allocator
2017-09-20 10:22:00 +02:00
LaFeuille
7c9b07dc5c
Typo fix alignmeent ->alignment
2017-09-20 06:38:39 +00:00
Mehdi Goli
2062ac9958
Changes required for new ComputeCpp CE version.
2017-09-18 18:17:39 +01:00
Christoph Hertzberg
23f8b00bc8
clang provides __has_feature(is_enum) (but not <type_traits>) in C++03 mode
2017-09-14 19:26:03 +02:00
Christoph Hertzberg
0c9ad2f525
std::integral_constant is not C++03 compatible
2017-09-14 19:23:38 +02:00
Rasmus Munk Larsen
1b7294f6fc
Fix cut-and-paste error.
2017-09-08 16:35:58 -07:00
Rasmus Munk Larsen
94e2213b38
Avoid undefined behavior in Eigen::TensorCostModel::numThreads.
...
If the cost is large enough then the thread count can be larger than the maximum
representable int, so just casting it to an int is undefined behavior.
Contributed by phurst@google.com.
2017-09-08 15:49:55 -07:00
Gael Guennebaud
6d42309f13
Fix compilation of Vector::operator()(enum) by treating enums as Index
2017-09-07 14:34:30 +02:00
Benoit Steiner
ea4e65bf41
Fixed compilation with cuda_clang.
2017-09-07 09:13:52 +00:00
Gael Guennebaud
a91918a105
Merged in infinitei/eigen (pull request PR-328)
...
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.
Approved-by: Tal Hadad <tal_hd@hotmail.com>
Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu>
2017-09-06 08:42:14 +00:00
Gael Guennebaud
9c353dd145
Add C++11 max_digits10 for half.
2017-09-06 10:22:47 +02:00
Gael Guennebaud
b35d1ce4a5
Implement true compile-time "if" for apply_rotation_in_the_plane. This fixes a compilation issue for vectorized real type with missing vectorization for complexes, e.g. AVX512.
2017-09-06 10:02:49 +02:00
Gael Guennebaud
80142362ac
Fix mixing types in sparse matrix products.
2017-09-02 22:50:20 +02:00
Jonas Harsch
810b70ad09
Merged in JonasMu/added-an-example-for-a-contraction-to-a--1504265366851 (pull request PR-1)
...
Added an example for a contraction to a scalar value
2017-09-01 12:01:39 +00:00
Jonas Harsch
a34fb212cd
Close branch JonasMu/added-an-example-for-a-contraction-to-a--1504265366851
2017-09-01 12:01:39 +00:00
Jonas Harsch
a991c80365
Added an example for a contraction to a scalar value, e.g. a double contraction of two second order tensors and how you can get the value of the result. I lost one day to get this done so I think it will help some guys. I also added Eigen:: to the IndexPair and array in the same example.
2017-09-01 11:30:26 +00:00
Benoit Steiner
a4089991eb
Added support for CUDA 9.0.
2017-08-31 02:49:39 +00:00
Abhijit Kundu
6d991a9595
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.
2017-08-30 13:26:30 -04:00
Gael Guennebaud
304ef29571
Handle min/max/inf/etc issue in cuda_fp16.h directly in test/main.h
2017-08-24 11:26:41 +02:00
Konstantinos Margaritis
1affe3d8df
Merged eigen/eigen into default
2017-08-24 12:24:01 +03:00
Gael Guennebaud
21633e585b
bug #1462 : remove all occurrences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER
2017-08-24 11:06:47 +02:00
Gael Guennebaud
12249849b5
Make the threshold from gemm to coeff-based-product configurable, and add some explanations.
2017-08-24 10:43:21 +02:00
Gael Guennebaud
39864ebe1e
bug #336 : improve doc for PlainObjectBase::Map
2017-08-22 17:18:43 +02:00
Gael Guennebaud
600e52fc7f
Add missing scalar conversion
2017-08-22 17:06:57 +02:00
Gael Guennebaud
9deee79922
bug #1457 : add setUnit() methods for consistency.
2017-08-22 16:48:07 +02:00
Gael Guennebaud
bc4dae9aeb
bug #1449 : fix redux_3 unit test
2017-08-22 15:59:08 +02:00
Gael Guennebaud
bc91a2df8b
bug #1461 : fix compilation of Map<const Quaternion>::x()
2017-08-22 15:10:42 +02:00
Gael Guennebaud
fc39d5954b
Merged in dtrebbien/eigen/patch-1 (pull request PR-312)
...
Work around a compilation error seen with nvcc V8.0.61
2017-08-22 12:17:37 +00:00
Gael Guennebaud
b223918ea9
Doc: warn about constness in LLT::solveInPlace
2017-08-22 14:12:47 +02:00
Konstantinos Margaritis
4ce5ec5197
initial support for z14
2017-08-07 05:54:29 -04:00
Konstantinos Margaritis
e1e71ca4e4
initial support for z14
2017-08-06 19:53:18 -04:00
Benoit Steiner
84d7be103a
Fixing Argmax that was breaking upstream TensorFlow.
2017-07-22 03:19:34 +00:00
Benoit Steiner
f0b154a4b0
Code cleanup
2017-07-10 09:54:09 -07:00
Benoit Steiner
575cda76b3
Fixed syntax errors generated by xcode
2017-07-09 11:39:01 -07:00
Benoit Steiner
5ac27d5b51
Avoid relying on cxx11 features when possible.
2017-07-08 21:58:44 -07:00
Benoit Steiner
c5a241ab9b
Merged in benoitsteiner/opencl (pull request PR-323)
...
Improved support for OpenCL
2017-07-07 16:27:33 +00:00
Benoit Steiner
b7ae4dd9ef
Merged in hughperkins/eigen/add-endif-labels-TensorReductionCuda.h (pull request PR-315)
...
Add labels to #ifdef, in TensorReductionCuda.h
2017-07-07 04:23:52 +00:00
Benoit Steiner
9daed67952
Merged in tntnatbry/eigen (pull request PR-319)
...
Tensor Trace op
2017-07-07 04:18:03 +00:00
Benoit Steiner
6795512e59
Improved the randomness of the tensor random generator
2017-07-06 21:12:45 -07:00
Benoit Steiner
dc524ac716
Fixed compilation warning
2017-07-06 21:11:15 -07:00
Benoit Steiner
62b4634ebe
Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14)
...
Applying Benoit's comment for Fixing ImageVolumePatch.
* Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file.
* Fixing deallocation of the memory in ImagePatch test for SYCL.
* Fixing the automerge issue.
2017-07-06 05:08:13 +00:00
Benoit Steiner
c92faf9d84
Merged in mehdi_goli/upstr_benoit/HiperbolicOP (pull request PR-13)
...
Adding hyperbolic operations for sycl.
* Adding hyperbolic operations.
* Adding the hyperbolic operations for CPU as well.
2017-07-06 05:05:57 +00:00
Benoit Steiner
53725c10b8
Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10)
...
DataDependancy
* Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code.
* Applying Ronnan's Comments.
* Applying benoit's comments
2017-06-28 17:55:23 +00:00
Gael Guennebaud
c010b17360
Fix warning
2017-06-27 14:29:57 +02:00
Gael Guennebaud
561f777075
Fix a gcc7 warning about bool * bool in abs2 default implementation.
2017-06-27 12:05:17 +02:00
Gael Guennebaud
b651ce0ffa
Fix a gcc7 warning: Wint-in-bool-context
2017-06-26 09:58:28 +02:00
Christoph Hertzberg
157040d44f
Make sure CMAKE_Fortran_COMPILER is set before checking for Fortran functions
2017-06-20 16:58:03 +02:00
Gael Guennebaud
24fe1de9b4
merge
2017-06-15 10:17:39 +02:00
Gael Guennebaud
b240080e64
bug #1436 : fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing.
2017-06-15 10:16:30 +02:00
Benoit Steiner
3baef62b9a
Added missing __device__ qualifier
2017-06-13 12:56:55 -07:00
Benoit Steiner
449936828c
Added missing __device__ qualifier
2017-06-13 12:54:57 -07:00
Benoit Steiner
b8e805497e
Merged in benoitsteiner/opencl (pull request PR-318)
...
Improved support for OpenCL
2017-06-13 05:01:10 +00:00
Gael Guennebaud
9fbdf02059
Enable Array(EigenBase<>) ctor for compatible scalar types only. This prevents nested arrays to look as being convertible from/to simple arrays.
2017-06-12 22:30:32 +02:00
Gael Guennebaud
e43d8fe9d7
Fix compilation of streaming nested Array, i.e., cout << Array<Array<>>
2017-06-12 22:26:26 +02:00
Gael Guennebaud
d9d7bd6d62
Fix 1x1 case in Solve expression with EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION==RowMajor
2017-06-12 22:25:02 +02:00
Androbin42
95ecb2b5d6
Make buildtests.in more robust
2017-06-12 17:11:06 +00:00
Androbin42
3f7fb5a6d6
Make eigen_monitor_perf.sh more robust
2017-06-12 17:07:56 +00:00
Gael Guennebaud
7f42a93349
Merged in alainvaucher/eigen/find-module-imported-target (pull request PR-324)
...
In the CMake find module, define the Eigen imported target as when installing with CMake
* In the CMake find module, define the Eigen imported target
* Add quotes to the imported location, in case there are spaces in the path.
Approved-by: Alain Vaucher <acvaucher@gmail.com>
2017-11-15 20:45:09 +00:00
Gael Guennebaud
7cc503f9f5
bug #1485 : fix linking issue of non template functions
2017-11-15 21:33:37 +01:00
Gael Guennebaud
103c0aa6ad
Add KLU in the list of third-party sparse solvers
2017-11-10 14:13:29 +01:00
Gael Guennebaud
00bc67c374
Move KLU support to official
2017-11-10 14:11:22 +01:00
Gael Guennebaud
b82cd93c01
KLU: truly disable unimplemented code, add proper static assertions in solve
2017-11-10 14:09:01 +01:00
Gael Guennebaud
6365f937d6
KLU depends on BTF but not on libSuiteSparse nor Cholmod
2017-11-10 13:58:52 +01:00
Gael Guennebaud
8cf63ccb99
Merged in kylemacfarlan/eigen (pull request PR-337)
...
Add support for SuiteSparse's KLU routines
2017-11-10 10:43:17 +00:00
Gael Guennebaud
1495b98a8e
Merged in spraetor/eigen (pull request PR-305)
...
Issue with mpreal and std::numeric_limits::digits
2017-11-10 10:28:54 +00:00
Gael Guennebaud
fc45324380
Merged in jkflying/eigen-fix-scaling (pull request PR-302)
...
Make scaling work with non-square matrices
2017-11-10 10:11:36 +00:00
Gael Guennebaud
d306b96fb7
Merged in carpent/eigen (pull request PR-342)
...
Use col method for column-major matrix
2017-11-10 10:09:53 +00:00
Gael Guennebaud
1b2dcf9a47
Check that Schur decomposition succeeds.
2017-11-10 10:26:09 +01:00
Gael Guennebaud
0a1cc73942
bug #1484 : restore deleted line for 128 bits long doubles, and improve dispatching logic.
2017-11-10 10:25:41 +01:00
Gael Guennebaud
f86bb89d39
Add EIGEN_MKL_NO_DIRECT_CALL option
2017-11-09 11:07:45 +01:00
Gael Guennebaud
5fa79f96b8
Patch from Konstantin Arturov to enable MKL's direct call by default
2017-11-09 10:58:38 +01:00
Justin Carpentier
a020d9b134
Use col method for column-major matrix
2017-10-17 21:51:27 +02:00
Kyle Vedder
c0e1d510fd
Add support for SuiteSparse's KLU routines
2017-10-04 21:01:23 -05:00
Gael Guennebaud
6dcf966558
Avoid implicit scalar conversion with accuracy loss in pow(scalar,array)
2017-06-12 16:47:22 +02:00
Gael Guennebaud
50e09cca0f
fix tipo
2017-06-11 15:30:36 +02:00
Gael Guennebaud
a4fd4233ad
Fix compilation with some compilers
2017-06-09 23:02:02 +02:00
Gael Guennebaud
c3e2afce0d
Enable MSVC 2010 workaround from MSVC only
2017-06-09 16:25:18 +02:00
Gael Guennebaud
731c8c704d
bug #1403 : more scalar conversions fixes in BDCSVD
2017-06-09 15:45:49 +02:00
Gael Guennebaud
1bbcf19029
bug #1403 : fix implicit scalar type conversion.
2017-06-09 14:44:02 +02:00
Gael Guennebaud
ba5cab576a
bug #1405 : enable StrictlyLower/StrictlyUpper triangularView as the destination of matrix*matrix products.
2017-06-09 14:38:04 +02:00
Gael Guennebaud
90168c003d
bug #1414 : doxygen, add EigenBase to CoreModule
2017-06-09 14:01:44 +02:00
Gael Guennebaud
26f552c18d
fix compilation of Half in C++98 (issue introduced in previous commit)
2017-06-09 13:36:58 +02:00
Gael Guennebaud
1d59ca2458
Fix compilation with gcc 4.3 and ARM NEON
2017-06-09 13:20:52 +02:00
Gael Guennebaud
fb1ee04087
bug #1410 : fix lvalue propagation of Array/Matrix-Wrapper with a const nested expression.
2017-06-09 13:13:03 +02:00
Gael Guennebaud
723a59ac26
add regression test for aliasing in product rewriting
2017-06-09 12:54:40 +02:00
Gael Guennebaud
8640093af1
fix compilation in C++98
2017-06-09 12:45:01 +02:00
Gael Guennebaud
a7be4cd1b1
Fix LeastSquareDiagonalPreconditioner for complexes (issue introduced in previous commit)
2017-06-09 11:57:53 +02:00
Gael Guennebaud
498aa95a8b
bug #1424 : add numext::abs specialization for unsigned integer types.
2017-06-09 11:53:49 +02:00
Gael Guennebaud
d588822779
Add missing std::numeric_limits specialization for half, and complete NumTraits<half>
2017-06-09 11:51:53 +02:00
Gael Guennebaud
682b2ef17e
bug #1423 : fix LSCG's Jacobi preconditioner for row-major matrices.
2017-06-08 15:06:27 +02:00
Gael Guennebaud
4bbc320468
bug #1435 : fix aliasing issue in expressions like: A = C - B*A;
2017-06-08 12:55:25 +02:00
Hugh Perkins
9341f258d4
Add labels to #ifdef, in TensorReductionCuda.h
2017-06-06 15:51:06 +01:00
Benoit Steiner
1e736b9ead
Merged in mehdi_goli/opencl/SYCLAlignAllocator (pull request PR-7)
...
Fixing SYCL alignment issue required by TensorFlow.
2017-05-26 17:23:00 +00:00
Benoit Steiner
9dee55ec33
Merged eigen/eigen into default
2017-05-26 09:01:04 -07:00
Mehdi Goli
0370d3576e
Applying Ronnan's comments.
2017-05-26 16:01:48 +01:00
Benoit Steiner
615aff4d6e
Merged in a-doumoulakis/opencl (pull request PR-12)
...
Enable triSYCL with Eigen
2017-05-25 18:18:23 +00:00
a-doumoulakis
c3bd860de8
Modification upon request
...
- Remove warning suppression
2017-05-25 18:46:18 +01:00
Mehdi Goli
e3f964ed55
Applying Benoit's comment; removing dead code.
2017-05-25 11:17:26 +01:00
Benoit Steiner
df90010cdd
Merged in mehdi_goli/opencl/CmakeFixForUbuntu16.04 (pull request PR-11)
...
CmakeFixForUbuntu16.04
2017-05-24 19:03:57 +00:00
a-doumoulakis
fb853a857a
Restore misplaced comment
2017-05-24 17:50:15 +01:00
a-doumoulakis
7a8ba565f8
Merge changed from upstream
2017-05-24 17:45:29 +01:00
Mehdi Goli
daf99daadd
Merged in DuncanMcBain/opencl/default (pull request PR-2)
...
Update FindComputeCpp.cmake with new changes from SDK
2017-05-24 13:59:53 +00:00
Mehdi Goli
9ef5c948ba
Fixing Cmake for gcc>=5.
2017-05-24 13:11:16 +01:00
Duncan McBain
0cb3c7c7dd
Update FindComputeCpp.cmake with new changes from SDK
2017-05-24 12:24:21 +01:00
Mmanu Chaturvedi
2971503fed
Specializing numeric_limits For AutoDiffScalar
2017-05-23 17:12:36 -04:00
Gael Guennebaud
26e8f9171e
Fix compilation of matrix log with Map as input
2017-06-07 10:51:23 +02:00
Gael Guennebaud
f2a553fb7b
bug #1411 : fix usage of alignment information in vectorization of quaternion product and conjugate.
2017-06-07 10:10:30 +02:00
Christoph Hertzberg
e018142604
Make sure CholmodSupport works when included in multiple compilation units (issue was reported on stackoverflow.com)
2017-06-06 19:23:14 +02:00
Gael Guennebaud
8508db52ab
bug #1417 : make LinSpace compatible with std::complex
2017-06-06 17:25:56 +02:00
Mehdi Goli
9aa7c30163
Merge with Benoit.
2017-05-23 10:51:34 +01:00
Mehdi Goli
b42d775f13
Temporary branch for sync with upstream
2017-05-23 10:51:14 +01:00
Benoit Steiner
615733381e
Merged in mehdi_goli/opencl/FixingCmakeDependency (pull request PR-2)
...
Fixing Cmake Dependency for SYCL
2017-05-22 17:43:06 +00:00
Benoit Steiner
1500a67c41
Merged in mehdi_goli/opencl/TensorSupportedDevice (pull request PR-6)
...
Fixing supported device list.
2017-05-22 16:22:21 +00:00
Mehdi Goli
76c0fc1f95
Fixing SYCL alignment issue required by TensorFlow.
2017-05-22 16:49:32 +01:00
Mehdi Goli
2d17128d6f
Fixing supported device list.
2017-05-22 16:40:33 +01:00
Mehdi Goli
61d7f3664a
Fixing Cmake Dependency for SYCL
2017-05-22 14:58:28 +01:00
a-doumoulakis
a5226ce4f7
Add cmake file FindTriSYCL.cmake
2017-05-17 17:59:30 +01:00
a-doumoulakis
052426b824
Add support for triSYCL
...
Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options
Fix contraction kernel with correct nd_item dimension
2017-05-05 19:26:27 +01:00
Abhijit Kundu
4343db84d8
updated warning number for nvcc release 8 (V8.0.61) for the stupid warning message 'calling a __host__ function from a __host__ __device__ function is not allowed'.
2017-05-01 10:36:27 -04:00
Abhijit Kundu
9bc0a35731
Fixed nested angle brackets >> issue when compiling with cuda 8
2017-04-27 03:09:03 -04:00
Gael Guennebaud
891ac03483
Fix dense * sparse-selfadjoint-view product.
2017-04-25 13:58:10 +02:00
RJ Ryan
949a2da38c
Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer.
...
Improves support for std::complex types when compiling for CUDA.
Expands on e2e9cdd169 and 2bda1b0d93.
2017-04-14 13:23:35 -07:00
Gael Guennebaud
d9084ac8e1
Improve mixing of complex and real in the vectorized path of apply_rotation_in_the_plane
2017-04-14 11:05:13 +02:00
Gael Guennebaud
f75dfdda7e
Fix unwanted Real to Scalar to Real conversions in column-pivoting QR.
2017-04-14 10:34:30 +02:00
Gael Guennebaud
0f83aeb6b2
Improve cmake scripts for Pastix and BLAS detection.
2017-04-14 10:22:12 +02:00
Benoit Steiner
0d08165a7f
Merged in benoitsteiner/opencl (pull request PR-309)
...
OpenCL improvements
2017-04-05 14:28:08 +00:00
Benoit Steiner
068cc09708
Preserve file naming conventions
2017-04-04 10:09:10 -07:00
Benoit Steiner
c302ea7bc4
Deleted empty line of code
2017-04-04 10:05:16 -07:00
Benoit Steiner
a5a0c8fac1
Guard sycl specific code under a EIGEN_USE_SYCL ifdef
2017-04-04 10:03:21 -07:00
Benoit Steiner
a1304b95b7
Code cleanup
2017-04-04 10:00:46 -07:00
Benoit Steiner
66c63826bd
Guard the sycl specific code with EIGEN_USE_SYCL
2017-04-04 09:59:09 -07:00
Benoit Steiner
e3e343390a
Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL
2017-04-04 09:56:33 -07:00
Benoit Steiner
63840d4666
Gate the sycl specific code under a EIGEN_USE_SYCL define
2017-04-04 09:54:31 -07:00
Benoit Steiner
bc050ea9f0
Fixed compilation error when sycl is enabled.
2017-04-04 09:47:04 -07:00
Gagan Goel
4910630c96
fix typos in the Tensor readme
2017-03-31 20:32:16 -04:00
Benoit Steiner
c1b3d5ecb6
Restored code compatibility with compilers that don't support c++11
...
Gated more sycl code under #ifdef sycl
2017-03-31 08:31:28 -07:00
Benoit Steiner
e2d5d4e7b3
Restore the old constructors to retain compatibility with non c++11 compilers.
2017-03-31 08:26:13 -07:00
Benoit Steiner
73fcaa319f
Gate the sycl specific code under #ifdef sycl
2017-03-31 08:22:25 -07:00
Mehdi Goli
bd64ee8555
Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax.
2017-03-28 16:50:34 +01:00
Simon Praetorius
511810797e
Issue with mpreal and std::numeric_limits, i.e. digits is not a constant. Added a digits() traits in NumTraits with fallback to static constant. Specialization for mpreal added in MPRealSupport.
2017-03-24 17:45:56 +01:00
Luke Iwanski
a91417a7a5
Introduces align allocator for SYCL buffer
2017-03-20 14:48:54 +00:00
Gael Guennebaud
aae19c70ac
update has_ReturnType to be more consistent with other has_ helpers
2017-03-17 17:33:15 +01:00
Benoit Steiner
f8a622ef3c
Merged eigen/eigen into default
2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b
Silenced compilation warning
2017-03-15 20:02:39 -07:00
Luke Iwanski
9597d6f6ab
Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing zombie thread
2017-03-15 19:28:09 +00:00
Luke Iwanski
c06861d15e
Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration
2017-03-15 19:26:08 +00:00
Benoit Steiner
7f31bb6822
Merged in ilya-biryukov/eigen/fix_clang_cuda_compilation (pull request PR-304)
...
Fixed compilation with cuda-clang
2017-03-15 16:48:52 +00:00
Gael Guennebaud
89fd0c3881
better check array index before using it
2017-03-15 15:18:03 +01:00
Benoit Jacob
61160a21d2
ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32.
2017-03-15 06:57:25 -04:00
Benoit Steiner
f0f3591118
Made the reduction code compile with cuda-clang
2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496
Adding synchronisation to convolution kernel for sycl backend.
2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b
Get rid of Init().
2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094
Use C++11 ctor forwarding to simplify code a bit.
2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6
Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
...
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.
* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.
* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.
* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
Luke Iwanski
1b32a10053
Use name to distinguish name instead of the vendor
2017-03-08 18:26:34 +00:00
Mehdi Goli
aadb7405a7
Fixing typo in sycl Benchmark.
2017-03-08 18:20:06 +00:00
Gael Guennebaud
970ff78294
bug #1401 : fix compilation of "cond ? x : -x" with x an AutoDiffScalar
2017-03-08 16:16:53 +01:00
Mehdi Goli
5e9a1e7a7a
Adding sycl Benchmarks.
2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533
Fixing potential race condition on sycl device.
2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95
Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch.
2017-03-07 14:27:10 +00:00
Gael Guennebaud
e5156e4d25
fix typo
2017-03-07 11:25:58 +01:00
Gael Guennebaud
5694315fbb
remove UTF8 symbol
2017-03-07 10:53:47 +01:00
Gael Guennebaud
e958c2baac
remove UTF8 symbols
2017-03-07 10:47:40 +01:00
Gael Guennebaud
d967718525
do not include std header within extern C
2017-03-07 10:16:39 +01:00
Gael Guennebaud
659087b622
bug #1400 : fix stableNorm with EIGEN_DONT_ALIGN_STATICALLY
2017-03-07 10:02:34 +01:00
Ilya Biryukov
1c03d43a5c
Fixed compilation with cuda-clang
2017-03-06 12:01:12 +01:00
Julian Kent
bbe717fa2f
Make scaling work with non-square matrices
2017-03-03 12:58:51 +01:00
Benoit Steiner
a71943b9a4
Made the Tensor code compile with clang 3.9
2017-03-02 10:47:29 -08:00
Benoit Steiner
09ae0e6586
Adjusted the EIGEN_DEVICE_FUNC qualifiers to make sure that:
...
* they're used consistently between the declaration and the definition of a function
* we avoid calling host only methods from host device methods.
2017-03-01 11:47:47 -08:00
Benoit Steiner
1e2d046651
Silenced a couple of compilation warnings
2017-03-01 10:13:42 -08:00
Benoit Steiner
c1d87ec110
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-03-01 10:08:50 -08:00
Benoit Steiner
3a3f040baa
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 17:06:15 -08:00
Benoit Steiner
7b61944669
Made most of the packet math primitives usable within CUDA kernel when compiling with clang
2017-02-28 17:05:28 -08:00
Benoit Steiner
c92406d613
Silenced clang compilation warning.
2017-02-28 17:03:11 -08:00
Benoit Steiner
857adbbd52
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 16:42:00 -08:00
Benoit Steiner
c36bc2d445
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 14:58:45 -08:00
Benoit Steiner
4a7df114c8
Added missing EIGEN_DEVICE_FUNC
2017-02-28 14:00:15 -08:00
Benoit Steiner
de7b0fdea9
Made the TensorStorage class compile with clang 3.9
2017-02-28 13:52:22 -08:00
Benoit Steiner
765f4cc4b4
Deleted extra: EIGEN_DEVICE_FUNC: the QR and Cholesky code isn't ready to run on GPU yet.
2017-02-28 11:57:00 -08:00
Benoit Steiner
e993c94f07
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:56:45 -08:00
Benoit Steiner
33443ec2b0
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:50:10 -08:00
Benoit Steiner
f3e9c42876
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:46:30 -08:00
Mehdi Goli
8296b87d7b
Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used.
2017-02-28 17:16:14 +00:00
Gael Guennebaud
4e98a7b2f0
bug #1396 : add some missing EIGEN_DEVICE_FUNC
2017-02-28 09:47:38 +01:00
Gael Guennebaud
478a9f53be
Fix typo.
2017-02-28 09:32:45 +01:00
Benoit Steiner
889c606f8f
Added missing EIGEN_DEVICE_FUNC to the SelfCwise binary ops
2017-02-27 17:17:47 -08:00
Benoit Steiner
193939d6aa
Added missing EIGEN_DEVICE_FUNC qualifiers to several nullary op methods.
2017-02-27 17:11:47 -08:00
Benoit Steiner
ed4dc9d01a
Declared the plset, ploadt_ro, and ploaddup packet primitives as usable within a gpu kernel
2017-02-27 16:57:01 -08:00
Benoit Steiner
b1fc7c9a09
Added missing EIGEN_DEVICE_FUNC qualifiers.
2017-02-27 16:48:30 -08:00
Benoit Steiner
554116bec1
Added EIGEN_DEVICE_FUNC to make the prototype of the EigenBase override match that of DenseBase
2017-02-27 16:45:31 -08:00
Benoit Steiner
34d9fce93b
Avoid unnecessary float to double conversions.
2017-02-27 16:33:33 -08:00
Benoit Steiner
e0bd6f5738
Merged eigen/eigen into default
2017-02-26 10:02:14 -08:00
Mehdi Goli
2fa2b617a9
Adding TensorVolumePatchOP.h for sycl
2017-02-24 19:16:24 +00:00
Mehdi Goli
0b7875f137
Converting fixed float type into template type for TensorContraction.
2017-02-24 18:13:30 +00:00
Mehdi Goli
89dfd51fae
Adding Sycl Backend for TensorGenerator.h.
2017-02-22 16:36:24 +00:00
Gael Guennebaud
5c68ba41a8
typos
2017-02-21 17:10:55 +01:00
Gael Guennebaud
b0f55ef85a
merge
2017-02-21 17:04:10 +01:00
Gael Guennebaud
d29e9d7119
Improve documentation of reshaped
2017-02-21 17:03:10 +01:00
Gael Guennebaud
9b6e365018
Fix linking issue.
2017-02-21 16:52:22 +01:00
Gael Guennebaud
3d200257d7
Add support for automatic-size deduction in reshaped, e.g.:
...
mat.reshaped(4,AutoSize); <-> mat.reshaped(4,mat.size()/4);
2017-02-21 15:57:25 +01:00
Gael Guennebaud
f8179385bd
Add missing const version of mat(all).
2017-02-21 13:56:26 +01:00
Gael Guennebaud
1e3aa470fa
Fix long to int conversion
2017-02-21 13:56:01 +01:00
Gael Guennebaud
b3fc0007ae
Add support for mat(all) as an alias to mat.reshaped(mat.size(),fix<1>);
2017-02-21 13:49:09 +01:00
Mehdi Goli
4f07ac16b0
Reducing the number of warnings.
2017-02-21 10:09:47 +00:00
Gael Guennebaud
76687f385c
bug #1394 : fix compilation of SelfAdjointEigenSolver<Matrix>(sparse*sparse);
2017-02-20 14:27:26 +01:00
Gael Guennebaud
d8b1f6cebd
bug #1380 : for Map<> as input of matrix exponential
2017-02-20 14:06:06 +01:00
Gael Guennebaud
6572825703
bug #1395 : fix the use of compile-time vectors as inputs of JacobiSVD.
2017-02-20 13:44:37 +01:00
Mehdi Goli
79ebc8f761
Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h.
2017-02-20 12:11:05 +00:00
Gael Guennebaud
9081c8f6ea
Add support for RowOrder reshaped
2017-02-20 11:46:21 +01:00
Gael Guennebaud
a811a04696
Silence warning.
2017-02-20 10:14:21 +01:00
Gael Guennebaud
63798df038
Fix usage of CUDACC_VER
2017-02-20 08:16:36 +01:00
Gael Guennebaud
deefa54a54
Fix tracking of temporaries in unit tests
2017-02-19 10:32:54 +01:00
Gael Guennebaud
f8a55cc062
Fix compilation.
2017-02-18 10:08:13 +01:00
Gael Guennebaud
cbbf88c4d7
Use int32_t instead of int in NEON code. Some platforms with 16-bit int support ARM NEON.
2017-02-17 14:39:02 +01:00
Gael Guennebaud
582b5e39bf
bug #1393 : enable Matrix/Array explicit ctor from types with conversion operators (was ok with 3.2)
2017-02-17 14:10:57 +01:00
Benoit Steiner
cfa0568ef7
Size indices are signed.
2017-02-16 10:13:34 -08:00
Mehdi Goli
91982b91c0
Adding TensorLayoutSwapOp for sycl.
2017-02-15 16:28:12 +00:00
Mehdi Goli
b1e312edd6
Adding TensorPatch.h for sycl backend.
2017-02-15 10:13:01 +00:00
Benoit Steiner
31a25ab226
Merged eigen/eigen into default
2017-02-14 15:36:21 -08:00
Mehdi Goli
0d153ded29
Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test
2017-02-13 17:25:12 +00:00
Gael Guennebaud
5937c4ae32
Fall back is_integral to std::is_integral in c++11
2017-02-13 17:14:26 +01:00
Gael Guennebaud
7073430946
Fix overflow and make use of long long in c++11 only.
2017-02-13 17:14:04 +01:00
Jonathan Hseu
3453b00a1e
Fix vector indexing with uint64_t
2017-02-11 21:45:32 -08:00
Gael Guennebaud
e7ebe52bfb
bug #1391 : include IO.h before DenseBase to enable its usage in DenseBase plugins.
2017-02-13 09:46:20 +01:00
Gael Guennebaud
b3750990d5
Workaround some gcc 4.7 warnings
2017-02-11 23:24:06 +01:00
Gael Guennebaud
4b22048cea
Fallback Reshaped to MapBase when possible (same storage order and linear access to the nested expression)
2017-02-11 15:32:53 +01:00
Gael Guennebaud
83d6a529c3
Use Eigen::fix<N> to pass compile-time sizes.
2017-02-11 15:31:28 +01:00
Gael Guennebaud
c16ee72b20
bug #1392 : fix #include <Eigen/Sparse> with mpl2-only
2017-02-11 10:35:01 +01:00
Gael Guennebaud
e43016367a
Forgot to include a file in previous commit
2017-02-11 10:34:18 +01:00
Gael Guennebaud
6486d4fc95
Workaround gcc 4.7 issue in c++11.
2017-02-11 10:29:10 +01:00
Gael Guennebaud
4a4a72951f
Fix previous commits: disable only problematic indexed view methods for old compilers instead of disabling everything.
...
Tested with gcc 4.7 (c++03) and gcc 4.8 (c++03 & c++11)
2017-02-11 10:28:44 +01:00
Benoit Steiner
fad776492f
Merged eigen/eigen into default
2017-02-10 14:27:43 -08:00
Benoit Steiner
1ef30b8090
Fixed bug introduced in previous commit
2017-02-10 13:35:10 -08:00
Benoit Steiner
769208a17f
Pulled latest updates from upstream
2017-02-10 13:11:40 -08:00
Benoit Steiner
8b3cc54c42
Added a new EIGEN_HAS_INDEXED_VIEW define that is set to 0 for older compilers that are known to fail to compile the indexed views (I used the define from the indexed_views.cpp test).
...
Only include the indexed view methods when the compiler supports the code.
This makes it possible to use Eigen again in complex code bases such as TensorFlow and older compilers such as gcc 4.8
2017-02-10 13:08:49 -08:00
Gael Guennebaud
a1ff24f96a
Fix pruning in (sparse*sparse).pruned() when the result is nearly dense.
2017-02-10 13:59:32 +01:00
Gael Guennebaud
0256c52359
Include clang in the list of non strict MSVC (just to be sure)
2017-02-10 13:41:52 +01:00
Alexander Neumann
dd58462e63
fixed inlining issue with clang-cl on visual studio
...
(grafted from 7962ac1a58)
2017-02-08 23:50:38 +01:00
Gael Guennebaud
fc8fd5fd24
Improve multi-threading heuristic for matrix products with a small number of columns.
2017-02-07 17:19:59 +01:00
Mehdi Goli
0ee97b60c2
Adding mean to TensorReductionSycl.h
2017-02-07 15:43:17 +00:00
Mehdi Goli
42bd5c4e7b
Fixing TensorReductionSycl for min and max.
2017-02-06 18:05:23 +00:00
Gael Guennebaud
4254b3eda3
bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)
2017-02-03 15:22:35 +01:00
Mehdi Goli
bc128f9f3b
Reducing the warnings in Sycl backend.
2017-02-02 10:43:47 +00:00
Benoit Steiner
442e9cbb30
Silenced several compilation warnings
2017-02-01 15:50:58 -08:00
Benoit Steiner
2db75c07a6
fixed the ordering of the template and EIGEN_DEVICE_FUNC keywords in a few more places to get more of the Eigen codebase to compile with nvcc again.
2017-02-01 15:41:29 -08:00
Benoit Steiner
fcd257039b
Replaced EIGEN_DEVICE_FUNC template<foo> with template<foo> EIGEN_DEVICE_FUNC to make the code compile with nvcc8.
2017-02-01 15:30:49 -08:00
Gael Guennebaud
84090027c4
Disable a part of the unit test for gcc 4.8
2017-02-01 23:37:44 +01:00
Gael Guennebaud
0eceea4efd
Define EIGEN_COMP_GNUC to reflect version number: 47, 48, 49, 50, 60, ...
2017-02-01 23:36:40 +01:00
Mehdi Goli
ff53050034
Converting ptrdiff_t type to int64_t type in cxx11_tensor_contract_sycl.cpp in order to be the same as other tests.
2017-02-01 15:36:03 +00:00
Mehdi Goli
bab29936a1
Reducing warnings in Sycl backend.
2017-02-01 15:29:53 +00:00
Gael Guennebaud
645a8e32a5
Fix compilation of JacobiSVD for vector types
2017-01-31 16:22:54 +01:00
Mehdi Goli
48a20b7d95
Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h
2017-01-31 14:06:36 +00:00
Gael Guennebaud
53026d29d4
bug #478 : fix regression in the eigen decomposition of zero matrices.
2017-01-31 14:22:42 +01:00
Benoit Steiner
fbc39fd02c
Merge latest changes from upstream
2017-01-30 15:25:57 -08:00
Gael Guennebaud
63de19c000
bug #1380 : fix matrix exponential with Map<>
2017-01-30 13:55:27 +01:00
Gael Guennebaud
c86911ac73
bug #1384 : fix evaluation of "sparse/scalar" that used the wrong evaluation path.
2017-01-30 13:38:24 +01:00
Mehdi Goli
82ce92419e
Fixing the buffer type in memcpy.
2017-01-30 11:38:20 +00:00
Gael Guennebaud
24409f3acd
Use fix<> API to specify compile-time reshaped sizes.
2017-01-29 15:20:35 +01:00
Gael Guennebaud
9036cda364
Cleanup initial reshape implementation:
...
- reshape -> reshaped
- make it compatible with evaluators.
2017-01-29 14:57:45 +01:00
Gael Guennebaud
0e89baa5d8
import yoco xiao's work on reshape
2017-01-29 14:29:31 +01:00
Gael Guennebaud
d024e9942d
MSVC 1900 release is not c++14 compatible enough for us. The 1910 update seems to be fine though.
2017-01-27 22:17:59 +01:00
Gael Guennebaud
83592659ba
merge
2017-01-27 21:59:59 +01:00
Gael Guennebaud
4a351be163
Fix warning
2017-01-27 11:59:35 +01:00
Gael Guennebaud
251ad3e04f
Fix unnamed type as template parameter issue.
2017-01-27 11:57:52 +01:00
Rasmus Munk Larsen
edaa0fc5d1
Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
...
This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.
2017-01-26 12:46:06 -08:00
Gael Guennebaud
25a1703579
Merged in ggael/eigen-flexidexing (pull request PR-294)
...
generalized operator() for indexed access and slicing
2017-01-26 08:04:23 +00:00
Gael Guennebaud
98dfe0c13f
Fix useless ';' warning
2017-01-25 22:55:04 +01:00
Gael Guennebaud
28351073d8
Fix unnamed type as template argument (ok in c++11 only)
2017-01-25 22:54:51 +01:00
Gael Guennebaud
607be65a03
Fix duplicates of array_size between unsupported and Core
2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen
7d39c6d50a
Merged eigen/eigen into default
2017-01-25 09:22:26 -08:00
Rasmus Munk Larsen
5c9ed4ba0d
Reverse arguments for pmin in AVX.
2017-01-25 09:21:57 -08:00
Gael Guennebaud
850ca961d2
bug #1383 : fix regression in LinSpaced for integers and high<low
2017-01-25 18:13:53 +01:00
Gael Guennebaud
296d24be4d
bug #1381 : fix sparse.diagonal() used as a rvalue.
...
The problem was that if "sparse" is not const, then sparse.diagonal() must have the
LValueBit flag, meaning that sparse.diagonal().coeff(i) must return a const reference,
const Scalar&. However, sparse::coeff() cannot return a reference for a non-existing
zero coefficient. The trick is to return a reference to a local member of
evaluator<SparseMatrix>.
2017-01-25 17:39:01 +01:00
Gael Guennebaud
d06a48959a
bug #1383 : Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0.
2017-01-25 15:27:13 +01:00
Rasmus Munk Larsen
ae3e43a125
Remove extra space.
2017-01-24 16:16:39 -08:00
Benoit Steiner
e96c77668d
Merged in rmlarsen/eigen2 (pull request PR-292)
...
Adds a fast memcpy function to Eigen.
2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen
3be5ee2352
Update copy helper to use fast_memcpy.
2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen
e6b1020221
Adds a fast memcpy function to Eigen. This takes advantage of the following:
...
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2 3.48 2.39 +31.3%
BM_memcpy_1T/8 12.3 6.51 +47.0%
BM_memcpy_1T/64 371 383 -3.2%
BM_memcpy_1T/512 66922 66720 +0.3%
BM_memcpy_1T/4k 9892867 6849682 +30.8%
BM_memcpy_1T/5k 14951099 10332856 +30.9%
BM_memcpy_2T/2 3.50 2.46 +29.7%
BM_memcpy_2T/8 12.3 7.66 +37.7%
BM_memcpy_2T/64 371 376 -1.3%
BM_memcpy_2T/512 66652 66788 -0.2%
BM_memcpy_2T/4k 6145012 6117776 +0.4%
BM_memcpy_2T/5k 9181478 9010942 +1.9%
BM_memcpy_4T/2 3.47 2.47 +31.0%
BM_memcpy_4T/8 12.3 6.67 +45.8%
BM_memcpy_4T/64 374 376 -0.5%
BM_memcpy_4T/512 67833 68019 -0.3%
BM_memcpy_4T/4k 5057425 5188253 -2.6%
BM_memcpy_4T/5k 7555638 7779468 -3.0%
BM_memcpy_6T/2 3.51 2.50 +28.8%
BM_memcpy_6T/8 12.3 7.61 +38.1%
BM_memcpy_6T/64 373 378 -1.3%
BM_memcpy_6T/512 66871 66774 +0.1%
BM_memcpy_6T/4k 5112975 5233502 -2.4%
BM_memcpy_6T/5k 7614180 7772246 -2.1%
BM_memcpy_8T/2 3.47 2.41 +30.5%
BM_memcpy_8T/8 12.4 10.5 +15.3%
BM_memcpy_8T/64 372 388 -4.3%
BM_memcpy_8T/512 67373 66588 +1.2%
BM_memcpy_8T/4k 5148462 5254897 -2.1%
BM_memcpy_8T/5k 7660989 7799058 -1.8%
BM_memcpy_12T/2 3.50 2.40 +31.4%
BM_memcpy_12T/8 12.4 7.55 +39.1%
BM_memcpy_12T/64 374 378 -1.1%
BM_memcpy_12T/512 67132 66683 +0.7%
BM_memcpy_12T/4k 5185125 5292920 -2.1%
BM_memcpy_12T/5k 7717284 7942684 -2.9%
BM_slicingSmallPieces_1T/2 47.3 47.5 -0.4%
BM_slicingSmallPieces_1T/8 53.6 52.3 +2.4%
BM_slicingSmallPieces_1T/64 491 476 +3.1%
BM_slicingSmallPieces_1T/512 21734 18814 +13.4%
BM_slicingSmallPieces_1T/4k 394660 396760 -0.5%
BM_slicingSmallPieces_1T/5k 218722 209244 +4.3%
BM_slicingSmallPieces_2T/2 80.7 79.9 +1.0%
BM_slicingSmallPieces_2T/8 54.2 53.1 +2.0%
BM_slicingSmallPieces_2T/64 497 477 +4.0%
BM_slicingSmallPieces_2T/512 21732 18822 +13.4%
BM_slicingSmallPieces_2T/4k 392885 390490 +0.6%
BM_slicingSmallPieces_2T/5k 221988 208678 +6.0%
BM_slicingSmallPieces_4T/2 80.8 80.1 +0.9%
BM_slicingSmallPieces_4T/8 54.1 53.2 +1.7%
BM_slicingSmallPieces_4T/64 493 476 +3.4%
BM_slicingSmallPieces_4T/512 21702 18758 +13.6%
BM_slicingSmallPieces_4T/4k 393962 404023 -2.6%
BM_slicingSmallPieces_4T/5k 249667 211732 +15.2%
BM_slicingSmallPieces_6T/2 80.5 80.1 +0.5%
BM_slicingSmallPieces_6T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_6T/64 488 478 +2.0%
BM_slicingSmallPieces_6T/512 21719 18841 +13.3%
BM_slicingSmallPieces_6T/4k 394950 397583 -0.7%
BM_slicingSmallPieces_6T/5k 223080 210148 +5.8%
BM_slicingSmallPieces_8T/2 81.2 80.4 +1.0%
BM_slicingSmallPieces_8T/8 58.1 53.5 +7.9%
BM_slicingSmallPieces_8T/64 489 480 +1.8%
BM_slicingSmallPieces_8T/512 21586 18798 +12.9%
BM_slicingSmallPieces_8T/4k 394592 400165 -1.4%
BM_slicingSmallPieces_8T/5k 219688 208301 +5.2%
BM_slicingSmallPieces_12T/2 80.2 79.8 +0.7%
BM_slicingSmallPieces_12T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_12T/64 488 476 +2.5%
BM_slicingSmallPieces_12T/512 21931 18831 +14.1%
BM_slicingSmallPieces_12T/4k 393962 396541 -0.7%
BM_slicingSmallPieces_12T/5k 218803 207965 +5.0%
2017-01-24 13:55:18 -08:00
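The two observations in the commit message above can be illustrated with a minimal sketch. It assumes a dispatch on the byte count: a few small constant sizes go to `memcpy` so the compiler can expand the copy inline, and everything else goes to `memmove`. The name `fast_memcpy_sketch` and the chosen set of sizes are hypothetical and not taken from Eigen's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper illustrating the idea described in the commit message;
// this is not Eigen's actual implementation.
inline void fast_memcpy_sketch(void* dst, const void* src, std::size_t size) {
  // Null pointers are only acceptable for an empty copy.
  assert(size == 0 || (dst != nullptr && src != nullptr));
  switch (size) {
    // With a compile-time constant size, compilers typically replace the
    // memcpy call with a handful of inline load/store instructions.
    case 1:  std::memcpy(dst, src, 1);  break;
    case 2:  std::memcpy(dst, src, 2);  break;
    case 4:  std::memcpy(dst, src, 4);  break;
    case 8:  std::memcpy(dst, src, 8);  break;
    case 16: std::memcpy(dst, src, 16); break;
    default:
      // All other sizes: memmove, which the commit message reports as
      // faster than memcpy for large blocks on the measured machine.
      std::memmove(dst, src, size);
      break;
  }
}
```

Calling `fast_memcpy_sketch(dst, src, n)` behaves like `memcpy(dst, src, n)` for non-overlapping buffers. Whether it is actually faster depends on the compiler, the C library, and the CPU, so any real use should be benchmarked first.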
Rasmus Munk Larsen
7b6aaa3440
Fix NaN propagation for AVX512.
2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen
5e144bbaa4
Make NaN propagation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
...
See #1373 for details.
2017-01-24 13:32:50 -08:00
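A scalar sketch may help show why operand order matters for the NaN propagation described above. It assumes the x86 behavior in which a packed max returns its second operand whenever either input is NaN, and it models that with a plain comparison. The helper names are hypothetical, and no SIMD intrinsics are used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scalar model of an x86-style packed max (assumption: if either input is NaN
// the comparison is false, so the second operand is returned).
inline float simd_style_max(float a, float b) { return a > b ? a : b; }

// std::max(a, b) is specified as (a < b) ? b : a, so it returns its FIRST
// argument when a NaN makes the comparison false. Swapping the operands of
// the SIMD-style max reproduces that behavior.
inline float pmax_like_std(float a, float b) { return simd_style_max(b, a); }

// Small self-check of the three variants on NaN inputs.
inline void demo_nan_propagation() {
  const float nan = std::nanf("");
  // std::max: the first argument wins when a NaN is involved.
  assert(std::isnan(std::max(nan, 1.0f)));
  assert(std::max(1.0f, nan) == 1.0f);
  // Unswapped SIMD-style max: the second argument wins, i.e. the opposite.
  assert(simd_style_max(nan, 1.0f) == 1.0f);
  assert(std::isnan(simd_style_max(1.0f, nan)));
  // Swapped version agrees with std::max in both orders.
  assert(std::isnan(pmax_like_std(nan, 1.0f)));
  assert(pmax_like_std(1.0f, nan) == 1.0f);
}
```

The same reasoning applies to min with the comparison reversed. Note that compiling with options such as `-ffast-math` can invalidate these NaN assumptions.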
Gael Guennebaud
d83db761a2
Add support for std::integral_constant
2017-01-24 16:28:12 +01:00
Gael Guennebaud
bc10201854
Add test for multiple symbols
2017-01-24 16:27:51 +01:00
Gael Guennebaud
c43d254d13
Fix seq().reverse() in c++98
2017-01-24 11:36:43 +01:00
Gael Guennebaud
5783158e8f
Add unit test for FixedInt and Symbolic
2017-01-24 10:55:12 +01:00
Gael Guennebaud
ddd83f82d8
Add support for "SymbolicExpr op fix<N>" in C++98/11 mode.
2017-01-24 10:54:42 +01:00
Gael Guennebaud
228fef1b3a
Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|)
2017-01-24 10:53:51 +01:00
Gael Guennebaud
bb52f74e62
Add internal doc
2017-01-24 10:13:35 +01:00
Gael Guennebaud
41c523a0ab
Rename fix_t to FixedInt
2017-01-24 09:39:49 +01:00
Gael Guennebaud
156e6234f1
bug #1375 : fix cmake installation with cmake 2.8
2017-01-24 09:16:40 +01:00
Gael Guennebaud
ba3f977946
bug #1376 : add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j))
2017-01-23 22:06:08 +01:00
Gael Guennebaud
b0db4eff36
bug #1382 : move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!)
2017-01-23 22:03:57 +01:00
Gael Guennebaud
ca79c1545a
Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t
2017-01-23 22:02:53 +01:00
Gael Guennebaud
4b607b5692
Use Index instead of size_t
2017-01-23 22:00:33 +01:00
Luke Iwanski
bf44fed9b7
Allows AMD APU
2017-01-23 15:56:45 +00:00
Gael Guennebaud
0fe278f7be
bug #1379 : fix compilation in sparse*diagonal*dense with openmp
2017-01-21 23:27:01 +01:00
Gael Guennebaud
22a172751e
bug #1378 : fix doc (DiagonalIndex vs Diagonal)
2017-01-21 22:09:59 +01:00
Mehdi Goli
602f8c27f5
Reverting back to the previous TensorDeviceSycl.h as the total number of buffers is not enough for tensorflow.
2017-01-20 18:23:20 +00:00
Gael Guennebaud
4d302a080c
Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only)
2017-01-19 20:34:18 +01:00
Gael Guennebaud
54f3fbee24
Exploit fixed values in seq and reverse with C++98 compatibility
2017-01-19 19:57:32 +01:00
Gael Guennebaud
7691723e34
Add support for fixed-value in symbolic expression, c++11 only for now.
2017-01-19 19:25:29 +01:00
Benoit Steiner
924600a0e8
Made sure that enabling avx2 instructions enables avx and sse instructions as well.
2017-01-19 09:54:48 -08:00
Mehdi Goli
77cc4d06c7
Removing unused variables
2017-01-19 17:06:21 +00:00
Mehdi Goli
837fdbdcb2
Merging with Benoit's upstream.
2017-01-19 11:34:34 +00:00
Mehdi Goli
6bdd15f572
Adding non-dereferenceable pointer track for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos. modifying TensorDeviceSycl to use the LegacyPointer class.
2017-01-19 11:30:59 +00:00
Benoit Steiner
aa7fb88dfa
Merged in LaFeuille/eigen (pull request PR-289)
...
Fix a typo
2017-01-18 16:44:39 -08:00
Gael Guennebaud
e84ed7b6ef
Remove dead code
2017-01-18 23:18:28 +01:00
Gael Guennebaud
f3ccbe0419
Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimizes the usage of last and end.
2017-01-18 23:16:32 +01:00
Mehdi Goli
c6f7b33834
Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcopy.
2017-01-18 10:45:28 +00:00
Gael Guennebaud
15471432fe
Add a .reverse() member to ArithmeticSequence.
2017-01-18 11:35:27 +01:00
Gael Guennebaud
e4f8dd860a
Add missing operator*
2017-01-18 10:49:01 +01:00
Gael Guennebaud
198507141b
Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n)
2017-01-18 09:43:58 +01:00
Gael Guennebaud
5484ddd353
Merge the generic and dynamic overloads of block()
2017-01-17 22:11:46 +01:00
Gael Guennebaud
655ba783f8
Defer set-to-zero in triangular = product so that no aliasing issue occurs in the common:
...
A.triangularView() = B*A.selfadjointView()*B.adjoint()
case that used to work in 3.2.
2017-01-17 18:03:35 +01:00
Gael Guennebaud
5e36ec3b6f
Fix regression when passing enums to operator()
2017-01-17 17:10:16 +01:00
Gael Guennebaud
f7852c3d16
Fix -Wunnamed-type-template-args
2017-01-17 16:05:58 +01:00
Gael Guennebaud
4f36dcfda8
Add a generic block() method compatible with Eigen::fix
2017-01-17 11:34:28 +01:00
Gael Guennebaud
71e5b71356
Add a get_runtime_value helper to deal with pointer-to-function hack,
...
plus some refactoring to make the internals more consistent.
2017-01-17 11:33:57 +01:00
Gael Guennebaud
59801a3250
Add \newin{3.x} doxygen command
2017-01-17 10:31:28 +01:00
Gael Guennebaud
23bfcfc15f
Add missing overload of get_compile_time for c++98/11
2017-01-17 10:30:21 +01:00
Gael Guennebaud
edff32c2c2
Disambiguate the two versions of fix for doxygen
2017-01-17 10:29:33 +01:00
Gael Guennebaud
4989922be2
Add support for symbolic expressions as arguments of operator()
2017-01-16 22:21:23 +01:00
Gael Guennebaud
12e22a2844
typos in doc
2017-01-16 16:31:19 +01:00
Gael Guennebaud
e70c4c97fa
Typo
2017-01-16 16:20:16 +01:00
Gael Guennebaud
a9232af845
Introduce a variable_or_fixed<N> proxy returned by fix<N>(val) to pass both a compile-time and runtime fallback value in case N means "runtime".
...
This mechanism is used by the seq/seqN functions. The proxy object is immediately converted to pure compile-time (as fix<N>) or pure runtime (i.e., an Index) to avoid redundant template instantiations.
2017-01-16 16:17:01 +01:00
Gael Guennebaud
6e97698161
Introduce a EIGEN_HAS_CXX14 macro
2017-01-16 16:13:37 +01:00
Mehdi Goli
e46e722381
Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree.
2017-01-16 13:58:49 +00:00
Luke Iwanski
23778a15d8
Reverting unintentional change to Eigen/Geometry
2017-01-16 11:05:56 +00:00
LaFeuille
1b19b80c06
Fix a typo
2017-01-13 07:24:55 +00:00
Fraser Cormack
8245d3c7ad
Fix case-sensitivity of file include
2017-01-12 12:13:18 +00:00
Gael Guennebaud
752bd92ba5
Large code refactoring:
...
- generalize some utilities and move them to Meta (size(), array_size())
- move handling of all and single indices to IndexedViewHelper.h
- several cleanup changes
2017-01-11 17:24:02 +01:00
Gael Guennebaud
f93d1c58e0
Make get_compile_time compatible with variable_if_dynamic
2017-01-11 17:08:59 +01:00
Gael Guennebaud
c020d307a6
Make variable_if_dynamic<T> implicitly convertible to T
2017-01-11 17:08:05 +01:00
Gael Guennebaud
43c617e2ee
merge
2017-01-11 14:33:37 +01:00
Gael Guennebaud
152cd57bb7
Enable generation of doc for static variables in Eigen's namespace.
2017-01-11 14:29:20 +01:00
Gael Guennebaud
b1dc0fa813
Move fix and symbolic to their own file, and improve doxygen compatibility
2017-01-11 14:28:28 +01:00
Gael Guennebaud
04397f17e2
Add 1D overloads of operator()
2017-01-11 13:17:09 +01:00
Gael Guennebaud
45199b9773
Fix typo
2017-01-11 09:34:08 +01:00
Gael Guennebaud
1b5570988b
Add doc to seq, seqN, ArithmeticSequence, operator(), etc.
2017-01-10 22:58:58 +01:00
Gael Guennebaud
17eac60446
Factorize const and non-const version of the generic operator() method.
2017-01-10 21:45:55 +01:00
Gael Guennebaud
d072fc4b14
add writeable IndexedView
2017-01-10 17:10:35 +01:00
Gael Guennebaud
c9d5e5c6da
Simplify Symbolic API: std::tuple is now used internally and automatically built.
2017-01-10 16:55:07 +01:00
Gael Guennebaud
407e7b7a93
Simplify symbolic API by using "symbol=value" to associate a runtime value to a symbol.
2017-01-10 16:45:32 +01:00
Gael Guennebaud
96e6cf9aa2
Fix linking issue.
2017-01-10 16:35:46 +01:00
Gael Guennebaud
e63678bc89
Fix ambiguous call
2017-01-10 16:33:40 +01:00
Gael Guennebaud
8e247744a4
Fix linking issue
2017-01-10 16:32:06 +01:00
Gael Guennebaud
b47a7e5c3a
Add doc for IndexedView
2017-01-10 16:28:57 +01:00
Gael Guennebaud
87963f441c
Fallback to Block<> when possible (Index, all, seq with > increment).
...
This is important to take advantage of the optimized implementations (evaluator, products, etc.),
and to support sparse matrices.
2017-01-10 14:25:30 +01:00
Gael Guennebaud
a98c7efb16
Add a more generic evaluation mechanism and minimalistic doc.
2017-01-10 11:46:29 +01:00
Gael Guennebaud
13d954f270
Cleanup Eigen's namespace
2017-01-10 11:06:02 +01:00
Gael Guennebaud
9eaab4f9e0
Refactoring: move all symbolic stuff into its own namespace
2017-01-10 10:57:08 +01:00
Gael Guennebaud
acd08900c9
Move 'last' and 'end' to their own namespace
2017-01-10 10:31:07 +01:00
Gael Guennebaud
1df2377d78
Implement c++98 version of seq()
2017-01-10 10:28:45 +01:00
Gael Guennebaud
ecd9cc5412
Isolate legacy code (we keep it for performance comparison purpose)
2017-01-10 09:34:25 +01:00
Gael Guennebaud
b50c3e967e
Add a minimalistic symbolic scalar type with expression template and make use of it to define the last placeholder and to unify the return type of seq and seqN.
2017-01-09 23:42:16 +01:00
Gael Guennebaud
68064e14fa
Rename span/range to seqN/seq
2017-01-09 17:35:21 +01:00
Gael Guennebaud
ad3eef7608
Add link to SO
2017-01-09 13:01:39 +01:00
Gael Guennebaud
75aef5b37f
Fix extraction of compile-time size of std::array with gcc
2017-01-06 22:04:49 +01:00
Gael Guennebaud
233dff1b35
Add support for plain arrays for columns and both rows/columns
2017-01-06 22:01:53 +01:00
Gael Guennebaud
76e183bd52
Propagate compile-time size for plain arrays
2017-01-06 22:01:23 +01:00
Gael Guennebaud
3264d3c761
Add support for plain-array as indices, e.g., mat({1,2,3,4})
2017-01-06 21:53:32 +01:00
Gael Guennebaud
831fffe874
Add missing doc of SparseView
2017-01-06 18:01:29 +01:00
Gael Guennebaud
a875167d99
Propagate compile-time increment and strides.
...
Had to introduce a UndefinedIncr constant for non structured list of indices.
2017-01-06 15:54:55 +01:00
Gael Guennebaud
e383d6159a
MSVC 2015 has all we want about c++11 and MSVC 2017 fails on binder1st/binder2nd
2017-01-06 15:44:13 +01:00
Gael Guennebaud
fad1fa75b3
Propagate compile-time size with "all" and add c++11 array unit test
2017-01-06 13:29:33 +01:00
Gael Guennebaud
3730e3ca9e
Use "fix" for compile-time values, propagate compile-time sizes for span, some cleanup.
2017-01-06 13:10:10 +01:00
Gael Guennebaud
60e99ad8d7
Add unit test for indexed views
2017-01-06 11:59:08 +01:00
Gael Guennebaud
ac7e4ac9c0
Initial commit to add a generic indexed-based view of matrices.
...
This version already works as a read-only expression.
Numerous refactoring, renaming, extension, tuning passes are expected...
2017-01-06 00:01:44 +01:00
Gael Guennebaud
f3f026c9aa
Convert integers to real numbers when computing relative L2 error
2017-01-05 13:36:08 +01:00
Jim Radford
0c226644d8
LLT: const the arg to solveInPlace() to allow passing .transpose(), .block(), etc.
2017-01-04 14:42:57 -08:00
Jim Radford
be281e5289
LLT: avoid making a copy when decomposing in place
2017-01-04 14:43:56 -08:00
Gael Guennebaud
e27f17bf5c
Bug 1453: fix Map with non-default inner-stride but no outer-stride.
2017-08-22 13:27:37 +02:00
Gael Guennebaud
21d0a0bcf5
bug #1456 : add perf recommendation for LLT and storage format
2017-08-22 12:46:35 +02:00
Gael Guennebaud
2c3d70d915
Re-enable hidden doc in LLT
2017-08-22 12:04:09 +02:00
Gael Guennebaud
a6e7a41a55
bug #1455 : Cholesky module depends on Jacobi for rank-updates.
2017-08-22 11:37:32 +02:00
Gael Guennebaud
e6021cc8cc
bug #1458 : fix documentation of LLT and LDLT info() method.
2017-08-22 11:32:55 +02:00
Gael Guennebaud
2810ba194b
Clarify MKL_DIRECT_CALL doc.
2017-08-17 22:12:26 +02:00
Gael Guennebaud
f727844658
use MKL's lapacke.h header when using MKL
2017-08-17 21:58:39 +02:00
Gael Guennebaud
8c858bd891
Clarify doc regarding the usage of MKL_DIRECT_CALL
2017-08-17 12:17:45 +02:00
Gael Guennebaud
b95f92843c
Fix support for MKL's BLAS when using MKL_DIRECT_CALL.
2017-08-17 12:07:10 +02:00
Gael Guennebaud
89c01a494a
Add unit test for has_ReturnType
2017-08-17 11:55:00 +02:00
Gael Guennebaud
687bedfcad
Make NoAlias and JacobiRotation compatible with CUDA.
2017-08-17 11:51:22 +02:00
Gael Guennebaud
1f4b24d2df
Do not preallocate more space than the matrix size (when the sparse matrix boils down to a vector)
2017-07-20 10:13:48 +02:00
Gael Guennebaud
d580a90c9a
Disable BDCSVD preallocation check.
2017-07-20 10:03:54 +02:00
Gael Guennebaud
55d7181557
Fix laziness of operator* with CUDA
2017-07-20 09:47:28 +02:00
Gael Guennebaud
cda47c42c2
Fix compilation in c++98 mode.
2017-07-17 21:08:20 +02:00
Gael Guennebaud
a74b9ba7cd
Update documentation for CUDA
2017-07-17 11:05:26 +02:00
Gael Guennebaud
3182bdbae6
Disable vectorization when compiled by nvcc, even if EIGEN_NO_CUDA is defined
2017-07-17 11:01:28 +02:00
Gael Guennebaud
9f8136ff74
disable nvcc boolean-expr-is-constant warning
2017-07-17 10:43:18 +02:00
Gael Guennebaud
bbd97b4095
Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases
2017-07-17 01:02:51 +02:00
Gael Guennebaud
2299717fd5
Fix and workaround several doxygen issues/warnings
2017-01-04 23:27:33 +01:00
Luke Iwanski
90c5bc8d64
Fixes auto appearance in functor template argument for reduction.
2017-01-04 22:18:44 +00:00
Gael Guennebaud
ee6f7f6c0c
Add doc for sparse triangular solve functions
2017-01-04 23:10:36 +01:00
Gael Guennebaud
5165de97a4
Add missing snippet files.
2017-01-04 23:08:27 +01:00
Gael Guennebaud
a0a36ad0ef
bug #1336 : workaround doxygen failing to include numerous members of MatrixBase in Matrix
2017-01-04 22:02:39 +01:00
Gael Guennebaud
29a1a58113
Document selfadjointView
2017-01-04 22:01:50 +01:00
Gael Guennebaud
a5ebc92f8d
bug #1336 : fix doxygen issue regarding EIGEN_CWISE_BINARY_RETURN_TYPE
2017-01-04 18:21:44 +01:00
Gael Guennebaud
45b289505c
Add debug output
2017-01-03 11:31:02 +01:00
Gael Guennebaud
5838f078a7
Fix inclusion
2017-01-03 11:30:27 +01:00
Gael Guennebaud
8702562177
bug #1370 : add doc for StorageIndex
2017-01-03 11:25:41 +01:00
Gael Guennebaud
575c078759
bug #1370 : rename _Index to _StorageIndex in SparseMatrix, and add a warning in the doc regarding the 3.2 to 3.3 change of SparseMatrix::Index
2017-01-03 11:19:14 +01:00
NeroBurner
c4fc2611ba
add cmake-option to enable/disable creation of tests
...
* * *
disable unsupported/test when tests are disabled
* * *
rename EIGEN_ENABLE_TESTS to BUILD_TESTING
* * *
consider BUILD_TESTING in blas
2017-01-02 09:09:21 +01:00
Valentin Roussellet
d3c5525c23
Added += and + operators to inner iterators
...
Fix #1340
2016-12-28 18:29:30 +01:00
Gael Guennebaud
5c27962453
Move common cwise-unary method from MatrixBase/ArrayBase to the common DenseBase class.
2017-01-02 22:27:07 +01:00
Marco Falke
4ebf69394d
doc: Fix trivial typo in AsciiQuickReference.txt
...
* * *
fixup!
2017-01-01 13:25:48 +00:00
Gael Guennebaud
8d7810a476
bug #1365 : fix another type mismatch warning
...
(sync is set from and compared to an Index)
2016-12-28 23:35:43 +01:00
Gael Guennebaud
97812ff0d3
bug #1369 : fix type mismatch warning.
...
Returned values of omp thread id and numbers are int,
so let's use int instead of Index here.
2016-12-28 23:29:35 +01:00
Gael Guennebaud
7713e20fd2
Fix compilation
2016-12-27 22:04:58 +01:00
Gael Guennebaud
ab69a7f6d1
Cleanup because traits<CwiseBinaryOp>::Flags now exposes the correct storage order
2016-12-27 16:55:47 +01:00
Gael Guennebaud
d32a43e33a
Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly.
2016-12-27 16:35:45 +01:00
Gael Guennebaud
7136267461
Add missing .outer() member to iterators of evaluators of cwise sparse binary expression
2016-12-27 16:34:30 +01:00
Gael Guennebaud
fe0ee72390
Fix check of storage order mismatch for "sparse cwiseop sparse".
2016-12-27 16:33:19 +01:00
Gael Guennebaud
6b8f637ab1
Harmless typo
2016-12-27 16:31:17 +01:00
Benoit Steiner
3eda02d78d
Fixed the sycl benchmarking code
2016-12-22 10:37:05 -08:00
Mehdi Goli
8b1c2108ba
Reverting asynchronous exec to Synchronous exec regarding random race condition.
2016-12-22 16:45:38 +00:00
Benoit Steiner
354baa0fb1
Avoid using horizontal adds since they're not very efficient.
2016-12-21 20:55:07 -08:00
Benoit Steiner
d7825b6707
Use native AVX512 types instead of Eigen Packets whenever possible.
2016-12-21 20:06:18 -08:00
Benoit Steiner
660da83e18
Pulled latest update from trunk
2016-12-21 16:43:27 -08:00
Benoit Steiner
4236aebe10
Simplified the contraction code
2016-12-21 16:42:56 -08:00
Benoit Steiner
3cfa16f41d
Merged in benoitsteiner/opencl (pull request PR-279)
...
Fix for auto appearing in functor template argument.
2016-12-21 15:08:54 -08:00
Benoit Steiner
519d63d350
Added support for libxsmm kernel in multithreaded contractions
2016-12-21 15:06:06 -08:00
Benoit Steiner
0657228569
Simplified the way we link libxsmm
2016-12-21 14:40:08 -08:00
Benoit Steiner
bbca405f04
Pulled latest updates from trunk
2016-12-21 13:45:28 -08:00
Benoit Steiner
b91be60220
Automatically include and link libxsmm when present.
2016-12-21 13:44:59 -08:00
Gael Guennebaud
c6882a72ed
Merged in joaoruileal/eigen (pull request PR-276)
...
Minor improvements to Umfpack support
2016-12-21 21:39:48 +01:00
Benoit Steiner
f9eff17e91
Leverage libxsmm kernels within single-threaded contractions
2016-12-21 12:32:06 -08:00
Benoit Steiner
c19fe5e9ed
Added support for libxsmm in the eigen makefiles
2016-12-21 10:43:40 -08:00
Benoit Steiner
a34d4ebd74
Merged in benoitsteiner/opencl (pull request PR-278)
2016-12-21 08:24:17 -08:00
Luke Iwanski
c55ecfd820
Fix for auto appearing in functor template argument.
2016-12-21 15:42:51 +00:00
Joao Rui Leal
c8c89b5e19
renamed methods umfpackReportControl(), umfpackReportInfo(), and umfpackReportStatus() from UmfPackLU to printUmfpackControl(), printUmfpackInfo(), and printUmfpackStatus()
2016-12-21 09:16:28 +00:00
Benoit Steiner
0f577d4744
Merged eigen/eigen into default
2016-12-20 17:02:06 -08:00
Gael Guennebaud
f2f9df8aa5
Remove MSVC warning 4127 - conditional expression is constant from the disabled list as we now have a local workaround.
2016-12-20 22:53:19 +01:00
Gael Guennebaud
2b3fc981b8
bug #1362 : workaround constant conditional warning produced by MSVC
2016-12-20 22:52:27 +01:00
Luke Iwanski
29186f766f
Fixed order of initialisation in ExecExprFunctorKernel functor.
2016-12-20 21:32:42 +00:00
Gael Guennebaud
94e8d8902f
Fix bug #1367 : compilation fix for gcc 4.1!
2016-12-20 22:17:01 +01:00
Gael Guennebaud
e8d6862f14
Properly adjust precision when saving to Market format.
2016-12-20 22:10:33 +01:00
Gael Guennebaud
e2f4ee1c2b
Speed up parsing of sparse Market file.
2016-12-20 21:56:21 +01:00
Luke Iwanski
8245851d1b
Matching parameters order between lambda and the functor.
2016-12-20 16:18:15 +00:00
Gael Guennebaud
684cfc762d
Add transpose, adjoint, conjugate methods to SelfAdjointView (useful to write generic code)
2016-12-20 16:33:53 +01:00
Gael Guennebaud
8bd0d3aa34
merge
2016-12-20 15:56:00 +01:00
Gael Guennebaud
11f55b2979
Optimize storage layout of Cwise* and PlainObjectBase evaluator to remove the functor or outer-stride if they are empty.
...
For instance, sizeof("(A-B).cwiseAbs2()") with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.
In theory, evaluators should be completely optimized away by the compiler, but this might help in some cases.
2016-12-20 15:55:40 +01:00
Gael Guennebaud
5271474b15
Remove common "noncopyable" base class from evaluator_base to get a chance to get EBO (Empty Base Optimization)
...
Note: we should probably get rid of this class and define a macro instead.
2016-12-20 15:51:30 +01:00
Christoph Hertzberg
1c024e5585
Added some possible temporaries to .hgignore
2016-12-20 14:45:44 +01:00
Gael Guennebaud
316673bbde
Clean-up usage of ExpressionTraits in all/any implementation.
2016-12-20 14:38:05 +01:00
Benoit Steiner
548ed30a1c
Added an OpenCL regression test
2016-12-19 18:56:26 -08:00
Christoph Hertzberg
10c6bcdc2e
Add support for long indexes and for (real-valued) row-major matrices to CholmodSupport module
2016-12-19 14:07:42 +01:00
Gael Guennebaud
f5d644b415
Make sure that HyperPlane::transform maintains a unit normal vector in the Affine case.
2016-12-20 09:35:00 +01:00
Benoit Steiner
27ceb43bf6
Fixed race condition in the tensor_shuffling_sycl test
2016-12-19 15:34:42 -08:00
Benoit Steiner
923acadfac
Fixed compilation errors with gcc6 when compiling the AVX512 intrinsics
2016-12-19 13:02:27 -08:00
Benoit Jacob
751e097c57
Use 32 registers on ARM64
2016-12-19 13:44:46 -05:00
Benoit Steiner
fb1d0138ec
Include SSE packet instructions when compiling with avx512 enabled.
2016-12-19 07:32:48 -08:00
Joao Rui Leal
95b804c0fe
it is now possible to change Umfpack control settings before factorizations; added access to the report functions of Umfpack
2016-12-19 10:45:59 +00:00
Gael Guennebaud
8c0e701504
bug #1360 : fix sign issue with pmull on altivec
2016-12-18 22:13:19 +00:00
Gael Guennebaud
fc94258e77
Fix unused warning
2016-12-18 22:11:48 +00:00
Benoit Steiner
0e0d92d34b
Merged in benoitsteiner/opencl (pull request PR-275)
...
Improved support for OpenCL
2016-12-17 10:14:17 -08:00
Benoit Steiner
9e03dfb452
Made sure EIGEN_HAS_C99_MATH is defined when compiling OpenCL code
2016-12-17 09:23:37 -08:00
Benoit Steiner
70d0172f0c
Merged eigen/eigen into default
2016-12-16 17:37:04 -08:00
Benoit Steiner
8910442e19
Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl.
2016-12-16 15:45:04 -08:00
Luke Iwanski
54db66c5df
struct -> class in order to silence compilation warning.
2016-12-16 20:25:20 +00:00
Mehdi Goli
35bae513a0
Converting all parallel for lambda to functor in order to prevent kernel duplication name error; adding TensorConcatenationOp backend for sycl.
2016-12-16 19:46:45 +00:00
ermak
d60cca32e5
Transformation methods added to ParametrizedLine class.
2016-12-17 00:45:13 +07:00
Jeff Trull
7949849ebc
refactor common row/column iteration code into its own class
2016-12-08 19:40:15 -08:00
Jeff Trull
d7bc64328b
add display of entries to gdb sparse matrix prettyprinter
2016-12-08 18:50:17 -08:00
Jeff Trull
ff424927bc
Introduce a simple pretty printer for sparse matrices (no contents)
2016-12-08 09:45:27 -08:00
Jeff Trull
5ce5418631
Correct prettyprinter comment - Quaternions are in fact supported
2016-12-08 07:31:16 -08:00
Rafael Guglielmetti
8f11df2667
NumTraits.h:
...
For the values 'ReadCost, AddCost and MulCost', information about value Eigen::HugeCost
2016-12-16 09:07:12 +00:00
Gael Guennebaud
7d5303a083
Partly revert changeset 642dddcce2, just in case the x87 issue pops up again
2016-12-16 09:25:14 +01:00
Benoit Steiner
2f7c2459b7
Merged in benoitsteiner/opencl (pull request PR-272)
...
Adding asynchandler to sycl queue as lack of it can cause undefined behaviour.
2016-12-15 17:46:40 -08:00
Mehdi Goli
c5e8546306
Adding asynchandler to sycl queue as lack of it can cause undefined behaviour.
2016-12-15 16:59:57 +00:00
Christoph Hertzberg
4247d35d4b
Fixed bug which (extremely rarely) could end in an infinite loop
2016-12-15 17:22:12 +01:00
Christoph Hertzberg
642dddcce2
Fix nonnull-compare warning
2016-12-15 17:16:56 +01:00
Benoit Steiner
1324ffef2f
Reenabled the use of constexpr on OpenCL devices
2016-12-15 06:49:38 -08:00
Gael Guennebaud
5d00fdf0e8
bug #1363 : fix mingw's ABI issue
2016-12-15 11:58:31 +01:00
Benoit Steiner
2c2e218471
Avoid using #define since they can conflict with user code
2016-12-14 19:49:15 -08:00
Benoit Steiner
3beb180ee5
Don't call EnvThread::OnCancel by default since it doesn't do anything.
2016-12-14 18:33:39 -08:00
Benoit Steiner
9ff5d0f821
Merged eigen/eigen into default
2016-12-14 17:32:16 -08:00
Mehdi Goli
730eb9fe1c
Adding asynchronous execution as it improves the performance.
2016-12-14 17:38:53 +00:00
Gael Guennebaud
11b492e993
bug #1358 : fix compilation for sparse += sparse.selfadjointView();
2016-12-14 17:53:47 +01:00
Gael Guennebaud
e67397bfa7
bug #1359 : fix compilation of col_major_sparse.row() *= scalar
...
(used to work in 3.2.9 though the expression is not really writable)
2016-12-14 17:05:26 +01:00
Gael Guennebaud
98d7458275
bug #1359 : fix sparse /=scalar and *=scalar implementation.
...
InnerIterators must be obtained from an evaluator.
2016-12-14 17:03:13 +01:00
Mehdi Goli
2d4a091beb
Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h
2016-12-14 15:30:37 +00:00
Gael Guennebaud
c817ce3ba3
bug #1361 : fix compilation issue in mat=perm.inverse()
2016-12-13 23:10:27 +01:00
Benoit Steiner
a432fc102d
Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool
2016-12-12 15:24:16 -08:00
Benoit Steiner
8ae68924ed
Made ThreadPoolInterface::Cancel() an optional functionality
2016-12-12 11:58:38 -08:00
Gael Guennebaud
57acb05eef
Update and extend doc on alignment issues.
2016-12-11 22:45:32 +01:00
Benoit Steiner
76fca22134
Use a more accurate timer to sleep on Linux systems.
2016-12-09 15:12:24 -08:00
Benoit Steiner
4deafd35b7
Introduce a portable EIGEN_SLEEP macro.
2016-12-09 14:52:15 -08:00
Benoit Steiner
aafa97f4d2
Fixed build error with MSVC
2016-12-09 14:42:32 -08:00
Benoit Steiner
2f5b7a199b
Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms.
2016-12-09 13:05:14 -08:00
Benoit Steiner
3d59a47720
Added a message to ease the detection of platforms on which thread cancellation isn't supported.
2016-12-08 14:51:46 -08:00
Benoit Steiner
28ee8f42b2
Added a Flush method to the RunQueue
2016-12-08 14:07:56 -08:00
Benoit Steiner
69ef267a77
Added the new threadpool cancel method to the threadpool interface based class.
2016-12-08 14:03:25 -08:00
Benoit Steiner
7bfff85355
Added support for thread cancellation on Linux
2016-12-08 08:12:49 -08:00
Benoit Steiner
6811e6cf49
Merged in srvasude/eigen/fix_cuda_exp (pull request PR-268)
...
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
2016-12-08 05:14:11 -08:00
Gael Guennebaud
747202d338
typo
2016-12-08 12:48:15 +01:00
Gael Guennebaud
bb297abb9e
make sure we use the right eigen version
2016-12-08 12:00:11 +01:00
Gael Guennebaud
8b4b00d277
fix usage of custom compiler
2016-12-08 11:59:39 +01:00
Gael Guennebaud
7105596899
Add missing include and use -O3
2016-12-07 16:56:08 +01:00
Gael Guennebaud
780f3c1adf
Fix call to convert on linux
2016-12-07 16:30:11 +01:00
Gael Guennebaud
3855ab472f
Cleanup file structure
2016-12-07 14:23:49 +01:00
Gael Guennebaud
59a59fa8e7
Update perf monitoring scripts to generate html/svg outputs
2016-12-07 13:36:56 +01:00
Angelos Mantzaflaris
7694684992
Remove superfluous const's (can cause warnings on some Intel compilers)
...
(grafted from e236d3443c)
2016-12-07 00:37:48 +01:00
Gael Guennebaud
f2c506b03d
Add a script example to run and upload performance tests
2016-12-06 16:46:52 +01:00
Gael Guennebaud
1b4e085a7f
generate png file for web upload
2016-12-06 16:46:22 +01:00
Gael Guennebaud
f725f1cebc
Mention the CMAKE_PREFIX_PATH variable.
2016-12-06 15:23:45 +01:00
Gael Guennebaud
f90c4aebc5
Update monitored changeset lists
2016-12-06 15:07:46 +01:00
Gael Guennebaud
eb621413c1
Revert vec/y to vec*(1/y) in row-major TRSM:
...
- div is extremely costly
- this is consistent with the column-major case
- this is consistent with all other BLAS implementations
2016-12-06 15:04:50 +01:00
Gael Guennebaud
8365c2c941
Fix BLAS backend for symmetric rank K updates.
2016-12-06 14:47:09 +01:00
Gael Guennebaud
0c4d05b009
Explain how to choose your favorite Eigen version
2016-12-06 11:34:06 +01:00
Silvio Traversaro
e049a2a72a
Added relocatable cmake support also for CMake before 3.0 and after 2.8.8
2016-12-06 10:37:34 +01:00
Srinivas Vasudevan
e6c8b5500c
Change comparisons to use Scalar instead of RealScalar.
2016-12-05 14:01:45 -08:00
Srinivas Vasudevan
f7d7c33a28
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
2016-12-05 12:19:01 -08:00
Silvio Traversaro
18481b518f
Make CMake config file relocatable
2016-12-05 10:39:52 +01:00
Gael Guennebaud
c68c8631e7
fix compilation of BTL's blaze interface
2016-12-05 23:02:16 +01:00
Gael Guennebaud
1ff1d4a124
Add performance monitoring for LLT
2016-12-05 23:01:52 +01:00
Srinivas Vasudevan
09ee7f0c80
Fix small nit where I changed name of plog1p to pexpm1.
2016-12-02 15:30:12 -08:00
Srinivas Vasudevan
a0d3ac760f
Sync from Head.
2016-12-02 14:14:45 -08:00
Srinivas Vasudevan
218764ee1f
Added support for expm1 in Eigen.
2016-12-02 14:13:01 -08:00
Gael Guennebaud
66f65ccc36
Ease compiler job to generate clean and efficient code in mat*vec.
2016-12-02 22:41:26 +01:00
Gael Guennebaud
fe696022ec
Operators += and -= do not resize!
2016-12-02 22:40:25 +01:00
Angelos Mantzaflaris
18de92329e
use numext::abs
...
(grafted from 0a08d4c60b)
2016-12-02 11:48:06 +01:00
Angelos Mantzaflaris
e8a6aa518e
1. Add explicit template to abs2 (resolves deduction for some arithmetic types)
...
2. Avoid signed-unsigned conversion in comparison (warning in case Scalar is unsigned)
(grafted from 4086187e49)
2016-12-02 11:39:18 +01:00
Gael Guennebaud
a6b971e291
Fix memory leak in Ref<Sparse>
2016-12-05 16:59:30 +01:00
Gael Guennebaud
8640ffac65
Optimize SparseLU::solve for rhs vectors
2016-12-05 15:41:14 +01:00
Gael Guennebaud
62acd67903
remove temporary in SparseLU::solve
2016-12-05 15:11:57 +01:00
Gael Guennebaud
0db6d5b3f4
bug #1356 : fix calls to evaluator::coeffRef(0,0) to get the address of the destination
...
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
2016-12-05 15:08:09 +01:00
Gael Guennebaud
91003f3b86
typo
2016-12-05 13:51:07 +01:00
Gael Guennebaud
445c015751
extend monitoring benchmarks with transpose matrix-vector and triangular matrix-vectors.
2016-12-05 13:36:26 +01:00
Gael Guennebaud
e3f613cbd4
Improve performance of row-major-dense-matrix * vector products for recent CPUs.
...
This revised version does not bother about aligned loads/stores,
and rather processes 8 rows at once for better instruction pipelining.
2016-12-05 13:02:01 +01:00
Gael Guennebaud
3abc827354
Clean debugging code
2016-12-05 12:59:32 +01:00
Benoit Steiner
462c28e77a
Merged in srvasude/eigen (pull request PR-265)
...
Add Expm1 support to Eigen.
2016-12-05 02:31:11 +00:00
Gael Guennebaud
4465d20403
Add missing generic load methods.
2016-12-03 21:25:04 +01:00
Gael Guennebaud
6a5fe86098
Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs.
...
The previous code has been optimized for Intel core2 for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panel of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
Benoit Steiner
2bfece5cd1
Merged eigen/eigen into default
2016-12-02 16:30:14 -08:00
Mehdi Goli
592acc5bfa
Making default numeric_list work with sycl.
2016-12-02 17:58:30 +00:00
Gael Guennebaud
8dfb3e00b8
merge
2016-12-02 11:34:21 +01:00
Gael Guennebaud
4c0d5f3c01
Add perf monitoring for gemv
2016-12-02 11:34:12 +01:00
Gael Guennebaud
d2718d662c
Re-enable A^T*A action in BTL
2016-12-02 11:32:03 +01:00
Christoph Hertzberg
22f7d398e2
bug #1355 : Fixed wrong line-endings on two files
2016-12-02 11:22:05 +01:00
Gael Guennebaud
27873008d4
Clean up SparseCore module regarding ReverseInnerIterator
2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris
8c24723a09
typo UIntPtr
...
(grafted from b6f04a2dd4)
2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris
aeba0d8655
fix two warnings(unused typedef, unused variable) and a typo
...
(grafted from a9aa3bcf50)
2016-12-01 21:23:43 +01:00
Gael Guennebaud
181138a1cb
fix member order
2016-12-01 17:06:20 +01:00
Gael Guennebaud
9f297d57ae
Merged in rmlarsen/eigen (pull request PR-256)
...
Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.
2016-12-01 15:27:33 +00:00
Gael Guennebaud
f95e3b84a5
merge
2016-12-01 16:18:57 +01:00
Benoit Steiner
7ff26ddcbb
Merged eigen/eigen into default
2016-12-01 07:13:17 -08:00
Gael Guennebaud
037b46762d
Fix misleading-indentation warnings.
2016-12-01 16:05:42 +01:00
Mehdi Goli
79aa2b784e
Adding sycl backend for TensorPadding.h; disabling __uint128 for sycl in TensorIntDiv.h; disabling cache size for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.
2016-12-01 13:02:27 +00:00
Benoit Steiner
a70393fd02
Cleaned up forward declarations
2016-11-30 21:59:07 -08:00
Benoit Steiner
e073de96dc
Moved the MemCopyFunctor back to TensorSyclDevice since it's the only caller and it makes TensorFlow compile again
2016-11-30 21:36:52 -08:00
Benoit Steiner
fca27350eb
Added the deallocate_all() method back
2016-11-30 20:45:20 -08:00
Benoit Steiner
e633a8371f
Simplified includes
2016-11-30 20:21:18 -08:00
Benoit Steiner
7cd33df4ce
Improved formatting
2016-11-30 20:20:44 -08:00
Benoit Steiner
fd1dc3363e
Merged eigen/eigen into default
2016-11-30 20:16:17 -08:00
Benoit Steiner
f5107010ee
Updated the Sizes class to work on AMD gpus without requiring a separate implementation
2016-11-30 19:57:28 -08:00
Benoit Steiner
e37c2c52d3
Added an implementation of numeric_list that works with sycl
2016-11-30 19:55:15 -08:00
Gael Guennebaud
8df272af88
Fix selection of product implementation for dynamic size matrices with fixed max size.
2016-11-30 22:21:33 +01:00
Benoit Steiner
faa2ff99c6
Pulled latest update from trunk
2016-11-30 09:31:24 -08:00
Benoit Steiner
df3da0780d
Updated customIndices2Array to handle various index sizes.
2016-11-30 09:30:12 -08:00
Gael Guennebaud
c927af60ed
Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times.
2016-11-30 17:59:13 +01:00
Luke Iwanski
26fff1c5b1
Added EIGEN_STRONG_INLINE to get_sycl_supported_device().
2016-11-30 16:55:22 +00:00
Gael Guennebaud
ab4ef5e66e
bug #1351 : fix compilation of random with old compilers
2016-11-30 17:37:53 +01:00
Sergiu Deitsch
5e3c5c42f6
cmake: remove architecture dependency from Eigen3ConfigVersion.cmake
...
Also, install Eigen3*.cmake under $prefix/share/eigen3/cmake by default.
(grafted from 86ab00cdcf)
2016-11-30 15:46:46 +01:00
Sergiu Deitsch
3440b46e2f
doc: mention the NO_MODULE option and target availability
...
(grafted from 65f09be8d2)
2016-11-30 15:41:38 +01:00
Rasmus Munk Larsen
a0329f64fb
Add a default constructor for the "fake" __half class when not using the
...
__half class provided by CUDA.
2016-11-29 13:18:09 -08:00
Mehdi Goli
577ce78085
Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend.
2016-11-29 15:30:42 +00:00
Benoit Steiner
3011dc94ef
Call internal::array_prod to compute the total size of the tensor.
2016-11-28 09:00:31 -08:00
Benoit Steiner
02080e2b67
Merged eigen/eigen into default
2016-11-27 07:27:30 -08:00
Benoit Steiner
9fd081cddc
Fixed compilation warnings
2016-11-26 20:22:25 -08:00
Benoit Steiner
9f8fbd9434
Merged eigen/eigen into default
2016-11-26 11:28:25 -08:00
Benoit Steiner
67b2c41f30
Avoided unnecessary type conversion
2016-11-26 11:27:29 -08:00
Benoit Steiner
7fe704596a
Added missing array_get method for numeric_list
2016-11-26 11:26:07 -08:00
Mehdi Goli
7318daf887
Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.
2016-11-25 16:19:07 +00:00
Benoit Steiner
7ad37606dd
Fixed the documentation of Scalar Tensors
2016-11-24 12:31:43 -08:00
Benoit Steiner
3be1afca11
Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5
2016-11-23 18:49:51 -08:00
Gael Guennebaud
308961c05e
Fix compilation.
2016-11-23 22:17:52 +01:00
Gael Guennebaud
21d0286d81
bug #1348 : Document EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES,
...
and reflect in the doc that EIGEN_DONT_ALIGN* are deprecated.
2016-11-23 22:15:03 +01:00
Mehdi Goli
b8cc5635d5
Removing unsupported device from test case; cleaning the tensor device sycl.
2016-11-23 16:30:41 +00:00
Gael Guennebaud
7f6333c32b
Merged in tal500/eigen-eulerangles (pull request PR-237)
...
Euler angles
2016-11-23 15:17:38 +00:00
Gael Guennebaud
f12b368417
Extend polynomial solver unit tests to complexes
2016-11-23 16:05:45 +01:00
Gael Guennebaud
56e5ec07c6
Automatically switch between EigenSolver and ComplexEigenSolver, and fix a few Real versus Scalar issues.
2016-11-23 16:05:10 +01:00
Gael Guennebaud
9246587122
Patch from Oleg Shirokobrod to extend polynomial solver to complexes
2016-11-23 15:42:26 +01:00
Gael Guennebaud
e340866c81
Fix compilation with gcc and old ABI version
2016-11-23 14:04:57 +01:00
Gael Guennebaud
a91de27e98
Fix compilation issue with MSVC:
...
MSVC always messes up with shadowed template arguments, for instance in:
struct B { typedef float T; };
template<typename T> struct A : B {
  T g;
};
The type of A<double>::g will be float and not double.
2016-11-23 12:24:48 +01:00
Gael Guennebaud
74637fa4e3
Optimize predux<Packet8f> (AVX)
2016-11-22 21:57:52 +01:00
Gael Guennebaud
178c084856
Disable usage of SSE3 _mm_hadd_ps that is extremely slow.
2016-11-22 21:53:14 +01:00
Gael Guennebaud
7dd894e40e
Optimize predux<Packet4d> (AVX)
2016-11-22 21:41:30 +01:00
Gael Guennebaud
f3fb0a1940
Disable usage of SSE3 haddpd that is extremely slow.
2016-11-22 16:58:31 +01:00
Sergiu Deitsch
5c516e4e0a
cmake: added Eigen3::Eigen imported target
...
(grafted from a287140f72)
2016-11-22 12:25:06 +01:00
Gael Guennebaud
6a84246a6a
Fix regression in assignment of sparse block to sparse block.
2016-11-21 21:46:42 +01:00
Benoit Steiner
f11da1d83b
Made the QueueInterface thread safe
2016-11-20 13:17:08 -08:00
Benoit Steiner
ed839c5851
Enable the use of constant expressions with clang >= 3.6
2016-11-20 10:34:49 -08:00
Benoit Steiner
6d781e3e52
Merged eigen/eigen into default
2016-11-20 10:12:54 -08:00
Benoit Steiner
79a07b891b
Fixed a typo
2016-11-20 07:07:41 -08:00
Gael Guennebaud
465ede0f20
Fix compilation issue in mat = permutation (regression introduced in 8193ffb3d3)
2016-11-20 09:41:37 +01:00
Benoit Steiner
81151bd474
Fixed merge conflicts
2016-11-19 19:12:59 -08:00
Benoit Steiner
9265ca707e
Made it possible to check the state of a sycl device without synchronization
2016-11-19 10:56:24 -08:00
Benoit Steiner
2d1aec15a7
Added missing include
2016-11-19 08:09:54 -08:00
Luke Iwanski
af67335e0e
Added test for cwiseMin, cwiseMax and operator%.
2016-11-19 13:37:27 +00:00
Benoit Steiner
1bdf1b9ce0
Merged in benoitsteiner/opencl (pull request PR-253)
...
OpenCL improvements
2016-11-19 04:44:43 +00:00
Benoit Steiner
a357fe1fb9
Code cleanup
2016-11-18 16:58:09 -08:00
Benoit Steiner
1c6eafb46b
Updated cxx11_tensor_device_sycl to run only on the OpenCL devices available on the host
2016-11-18 16:43:27 -08:00
Benoit Steiner
ca754caa23
Only runs the cxx11_tensor_reduction_sycl on devices that are available.
2016-11-18 16:31:14 -08:00
Benoit Steiner
dc601d79d1
Added the ability to run tests exclusively on OpenCL devices that are listed by sycl::device::get_devices().
2016-11-18 16:26:50 -08:00
Benoit Steiner
8649e16c2a
Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio
2016-11-18 14:18:34 -08:00
Benoit Steiner
110b7f8d9f
Deleted unnecessary semicolons
2016-11-18 14:06:17 -08:00
Benoit Steiner
b5e3285e16
Test broadcasting on OpenCL devices with 64 bit indexing
2016-11-18 13:44:20 -08:00
Gael Guennebaud
164414c563
Merged in ChunW/eigen (pull request PR-252)
...
Workaround for error in VS2012 with /clr
2016-11-18 21:07:29 +00:00
Benoit Steiner
37c2c516a6
Cleaned up the sycl device code
2016-11-18 12:38:06 -08:00
Benoit Steiner
7335c49204
Fixed the cxx11_tensor_device_sycl test
2016-11-18 12:37:13 -08:00
Mehdi Goli
15e226d7d3
adding Benoit changes on the TensorDeviceSycl.h
2016-11-18 16:34:54 +00:00
Mehdi Goli
622805a0c5
Modifying TensorDeviceSycl.h to always create buffers of type uint8_t and convert them to the actual type at execution on the device; adding the queue interface class to separate the lifespan of the sycl queue and the buffers created for that queue from Eigen::SyclDevice; modifying sycl tests to support the evaluation of the results for both row-major and column-major data layouts on all devices supported by Sycl (CPU, GPU, and Host).
2016-11-18 16:20:42 +00:00
Luke Iwanski
5159675c33
Added isnan, isfinite and isinf for SYCL device. Plus test for that.
2016-11-18 16:01:48 +00:00
Tal Hadad
76b2a3e6e7
Allow to construct EulerAngles from 3D vector directly.
...
Using assignment template struct to distinguish between 3D vector and 3D rotation matrix.
2016-11-18 15:01:06 +02:00
Luke Iwanski
927bd62d2a
Now testing out (+=, =) in.FUNC() and out (+=, =) out.FUNC()
2016-11-18 11:16:42 +00:00
Gael Guennebaud
8193ffb3d3
bug #1343 : fix compilation regression in mat+=selfadjoint_view.
...
Generic EigenBase2EigenBase assignment was incomplete.
2016-11-18 10:17:34 +01:00
Gael Guennebaud
cebff7e3a2
bug #1343 : fix compilation regression in array = matrix_product
2016-11-18 10:09:33 +01:00
Benoit Steiner
7c30078b9f
Merged eigen/eigen into default
2016-11-17 22:53:37 -08:00
Benoit Steiner
553f50b246
Added a way to detect errors generated by the opencl device from the host
2016-11-17 21:51:48 -08:00
Benoit Steiner
72a45d32e9
Cleanup
2016-11-17 21:29:15 -08:00
Benoit Steiner
4349fc640e
Created a test to check that the sycl runtime can successfully report errors (like division by 0).
...
Small cleanup
2016-11-17 20:27:54 -08:00
Benoit Steiner
a6a3fd0703
Made TensorDeviceCuda.h compile on windows
2016-11-17 16:15:27 -08:00
Chun Wang
0d0948c3b9
Workaround for error in VS2012 with /clr
2016-11-17 17:54:27 -05:00
Benoit Steiner
004344cf54
Avoid calling log(0) or 1/0
2016-11-17 11:56:44 -08:00
Konstantinos Margaritis
a1d5c503fa
replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f
2016-11-17 13:27:45 -05:00
Konstantinos Margaritis
672aa97d4d
implement float/std::complex<float> for ZVector as well, minor fixes to ZVector
2016-11-17 13:27:33 -05:00
Konstantinos Margaritis
8290e21fb5
replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f
2016-11-17 13:21:15 -05:00
Luke Iwanski
7878756dea
Fixed existing test.
2016-11-17 17:46:55 +00:00
Luke Iwanski
c5130dedbe
Specialised basic math functions for SYCL device.
2016-11-17 11:47:13 +00:00
Benoit Steiner
f2e8b73256
Enable the use of AVX512 instruction by default
2016-11-16 21:28:04 -08:00
Gael Guennebaud
7b09e4dd8c
bump default branch to 3.3.90
2016-11-16 22:20:58 +01:00
Benoit Steiner
dff9a049c4
Optimized the computation of exp, sqrt, ceil and floor for fp16 on Pascal GPUs
2016-11-16 09:01:51 -08:00
Benoit Steiner
b5c75351e3
Merged eigen/eigen into default
2016-11-14 15:54:44 -08:00
Rasmus Munk Larsen
32df1b1046
Reduce dispatch overhead in parallelFor by only calling thread_pool.Schedule() for one of the two recursive calls in handleRange. This avoids going through the schedule path to push both recursive calls onto another thread-queue in the binary tree, but instead executes one of them on the main thread. At the leaf level this will still activate a full complement of threads, but will save up to 50% of the overhead in Schedule (random number generation, insertion in queue which includes signaling via atomics).
2016-11-14 14:18:16 -08:00
Mehdi Goli
05e8c2a1d9
Adding extra test for non-fixed size to broadcast; Replacing stcl with sycl.
2016-11-14 18:13:53 +00:00
Mehdi Goli
f8ca893976
Adding TensorFixsize; adding sycl device memcpy; adding initial stage of slicing.
2016-11-14 17:51:57 +00:00
Gael Guennebaud
0ee92aa38e
Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products.
2016-11-14 18:47:41 +01:00
Gael Guennebaud
2e334f5da0
bug #426 : move operator && and || to MatrixBase and SparseMatrixBase.
2016-11-14 18:47:02 +01:00
Gael Guennebaud
a048aba14c
Merged in olesalscheider/eigen (pull request PR-248)
...
Make sure not to call numext::maxi on expression templates
2016-11-14 13:25:53 +00:00
Gael Guennebaud
eedb87f4ba
Fix regression in SparseMatrix::ReverseInnerIterator
2016-11-14 14:05:53 +01:00
Niels Ole Salscheider
51fef87408
Make sure not to call numext::maxi on expression templates
2016-11-12 12:20:57 +01:00
Mehdi Goli
a5c3f15682
Adding comment to TensorDeviceSycl.h and cleaning the code.
2016-11-11 19:06:34 +00:00
Benoit Steiner
f4722aa479
Merged in benoitsteiner/opencl (pull request PR-247)
2016-11-11 00:01:28 +00:00
Mehdi Goli
3be3963021
Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor.
2016-11-10 19:16:31 +00:00
Mehdi Goli
12387abad5
adding the missing in eigen_assert!
2016-11-10 18:58:08 +00:00
Mehdi Goli
2e704d4257
Adding Memset; optimising memcpyDeviceToHost by removing double copying.
2016-11-10 18:45:12 +00:00
Gael Guennebaud
eeac81b8c0
bump to 3.3.0
2016-11-10 13:55:14 +01:00
Gael Guennebaud
e80bc2ddb0
Fix printing of sparse expressions
2016-11-10 10:35:32 +01:00
Benoit Steiner
75c080b176
Added a test to validate memory transfers between host and sycl device
2016-11-09 06:23:42 -08:00
Benoit Steiner
db3903498d
Merged in benoitsteiner/opencl (pull request PR-246)
...
Improved support for OpenCL
2016-11-08 22:28:44 +00:00
Benoit Steiner
dcc14bee64
Fixed the formatting of the code
2016-11-08 14:24:46 -08:00
Benoit Steiner
b88c1117d4
Fixed the indentation of the cmake file
2016-11-08 14:22:36 -08:00
Luke Iwanski
912cb3d660
#if EIGEN_EXCEPTION -> #ifdef EIGEN_EXCEPTIONS.
2016-11-08 22:01:14 +00:00
Luke Iwanski
1b345b0895
Fix for SYCL queue initialisation.
2016-11-08 21:56:31 +00:00
Luke Iwanski
1b95717358
Use try/catch only when exceptions are enabled.
2016-11-08 21:08:53 +00:00
Mehdi Goli
d57430dd73
Converting all sycl buffers to uninitialised device-only buffers; adding memcpyHostToDevice and memcpyDeviceToHost on syclDevice; modifying all examples to obey the new rules; moving sycl queue creation to the device based on Benoit's suggestion; removing the sycl-specific condition for returning m_result in TensorReduction.h according to Benoit's suggestion.
2016-11-08 17:08:02 +00:00
Gael Guennebaud
73985ead27
Extend unit test to check sparse solvers with a SparseVector as the rhs and result.
2016-11-06 20:29:57 +01:00
Gael Guennebaud
436a111792
Generalize Cholmod support to handle any sparse type as the rhs and result of the solve method
2016-11-06 20:29:23 +01:00
Gael Guennebaud
afc55b1885
Generalize IterativeSolverBase::solve to handle any sparse type as the result (instead of SparseMatrix only)
2016-11-06 20:28:18 +01:00
Gael Guennebaud
a5c2d8a3cc
Generalize solve_sparse_through_dense_panels to handle SparseVector.
2016-11-06 15:20:58 +01:00
Gael Guennebaud
f8bfe10613
Add missing friend declaration
2016-11-06 15:20:30 +01:00
Gael Guennebaud
fc7180cda8
Add a default ctor to evaluator<SparseVector>.
...
Needed for evaluator<Solve>.
2016-11-06 15:20:00 +01:00
Gael Guennebaud
4d226ab5b5
Enable swapping between SparseMatrix and SparseVector
2016-11-06 15:15:03 +01:00
Benoit Steiner
ad086b03e4
Removed unnecessary statement
2016-11-05 12:43:27 -07:00
Benoit Steiner
dad177be01
Added missing includes
2016-11-05 10:04:42 -07:00
Gael Guennebaud
55b4fd1d40
Extend mpreal unit test to check LLT with complexes.
2016-11-05 11:28:53 +01:00
Gael Guennebaud
a354c3ca59
Fix compilation of LLT with complex<mpreal>.
2016-11-05 11:28:29 +01:00
Benoit Steiner
d46a36cc84
Merged eigen/eigen into default
2016-11-04 18:22:55 -07:00
Mehdi Goli
0ebe3808ca
Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size;
2016-11-04 18:18:19 +00:00
Gael Guennebaud
47d1b4a609
Added tag 3.3-rc2 for changeset ba05572dcb
2016-11-04 09:09:18 +01:00
Gael Guennebaud
ba05572dcb
bump to 3.3-rc2
2016-11-04 09:09:06 +01:00
Benoit Steiner
5c3995769c
Improved AVX512 configuration
2016-11-03 04:50:28 -07:00
Benoit Steiner
fbe672d599
Reenable the generation of dynamic blas libraries.
2016-11-03 04:08:43 -07:00
Benoit Steiner
ca0ba0d9a4
Improved AVX512 support
2016-11-03 04:00:49 -07:00
Benoit Steiner
c80587c92b
Merged eigen/eigen into default
2016-11-03 03:55:11 -07:00
Gael Guennebaud
3f1d0cdc22
bug #1337 : improve doc of homogeneous() and hnormalized()
2016-11-03 11:03:08 +01:00
Gael Guennebaud
78e93ac1ad
bug #1330 : Cholmod supports double precision only, so let's trigger a static assertion if the scalar type does not match this requirement.
2016-11-03 10:21:59 +01:00
Benoit Steiner
3e37166d0b
Merged in benoitsteiner/opencl (pull request PR-244)
...
Disable vectorization on device only when compiling for sycl
2016-11-02 22:01:03 +00:00
Benoit Steiner
0585b2965d
Disable vectorization on device only when compiling for sycl
2016-11-02 11:44:27 -07:00
Benoit Steiner
e6e77ed08b
Don't call lgamma_r when compiling for an Apple device, since the function isn't available on MacOS
2016-11-02 09:55:39 -07:00
Benoit Steiner
b238f387b4
Pulled latest updates from trunk
2016-11-02 08:53:13 -07:00
Benoit Steiner
c8db17301e
Special functions require math.h: make sure it is included.
2016-11-02 08:51:52 -07:00
Gael Guennebaud
a07bb428df
bug #1004 : improve accuracy of LinSpaced for abs(low) >> abs(high).
2016-11-02 11:34:38 +01:00
Gael Guennebaud
598de8b193
Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.
2016-11-02 10:38:13 +01:00
Benoit Steiner
e44519744e
Merged in benoitsteiner/opencl (pull request PR-243)
...
Fixed the ambiguity in calling make_tuple for sycl backend.
2016-11-02 02:56:58 +00:00
Rasmus Munk Larsen
0a6ae41555
Merged eigen/eigen into default
2016-11-01 15:37:00 -07:00
Rasmus Munk Larsen
b730952414
Don't attempt to use lgamma_r for CUDA devices.
...
Fix type in lgamma_impl<double>.
2016-11-01 15:34:19 -07:00
Benoit Steiner
7a0e96b80d
Gate the code that refers to cuda fp16 primitives more thoroughly
2016-11-01 12:08:09 -07:00
Mehdi Goli
51af6ae971
Fixed the ambiguity in calling make_tuple for sycl backend.
2016-10-31 16:35:51 +00:00
Benoit Steiner
0a9ad6fc72
Worked around Visual Studio compilation errors
2016-10-28 07:54:27 -07:00
Benoit Steiner
d5f88e2357
Sharded the tensor_image_patch test to help it run on low power devices
2016-10-27 21:48:21 -07:00
Benoit Steiner
0b4b0f11e8
Fixed a few more compilation warnings
2016-10-28 04:01:01 +00:00
Benoit Steiner
306daa24a3
Fixed a compilation warning
2016-10-28 03:50:31 +00:00
Benoit Steiner
8471cf1996
Fixed compilation warning
2016-10-28 03:46:08 +00:00
Benoit Steiner
b0c5bfdf78
Added missing template parameters
2016-10-28 03:43:41 +00:00
Rasmus Munk Larsen
2ebb314fa7
Use threadsafe versions of lgamma and lgammaf if possible.
2016-10-27 16:17:12 -07:00
Gael Guennebaud
530f20c21a
Workaround MSVC issue.
2016-10-27 21:51:37 +02:00
Gael Guennebaud
c3ce4f9ac0
Merged in enricodetoma/eigen (pull request PR-241)
...
Always enable /bigobj for tests to avoid a compile error in MSVC 2015
2016-10-27 19:21:28 +00:00
Benoit Steiner
7d64e6752c
Pulled latest updates from trunk
2016-10-26 18:48:06 -07:00
Benoit Steiner
0a4c4d40b4
Removed a template parameter for fixed sized tensors
2016-10-26 18:47:37 -07:00
Gael Guennebaud
3ecb343dc3
Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination after the creation of the evaluator of the source expression.
2016-10-26 22:50:41 +02:00
enrico.detoma
6ed571744b
Always enable /bigobj for tests to avoid a compile error in MSVC 2015
2016-10-26 22:48:46 +02:00
Gael Guennebaud
97feea9d39
add a generic EIGEN_HAS_CXX11
2016-10-26 15:53:13 +02:00
Gael Guennebaud
ca6a2a5248
Fix warning with ICC
2016-10-26 14:13:05 +02:00
Benoit Steiner
5f2dd503ff
Replaced tabs with spaces
2016-10-25 20:40:58 -07:00
Benoit Steiner
1644bafe29
Code cleanup
2016-10-25 20:36:14 -07:00
Gael Guennebaud
b15a5dc3f4
Fix ICC warnings
2016-10-25 22:20:24 +02:00
Gael Guennebaud
aad72f3c6d
Add missing inline keywords
2016-10-25 20:20:09 +02:00
Benoit Steiner
3e194a6a73
Fixed a typo
2016-10-25 08:42:15 -07:00
Gael Guennebaud
58146be99b
bug #1004 : one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity.
...
This version simply does low+i*step plus a branch to return high if i==size-1.
Vectorization is accomplished with a branch and the help of pinsertlast.
Some quick benchmark revealed that the overhead is really marginal, even when filling small vectors.
2016-10-25 16:53:09 +02:00
Gael Guennebaud
13fc18d3a2
Add a pinsertlast function replacing the last entry of a packet by a scalar.
...
(useful to vectorize LinSpaced)
2016-10-25 16:48:49 +02:00
Gael Guennebaud
2634f9386c
bug #1333 : fix bad usage of const_cast_derived. Better use .data() for that purpose.
2016-10-24 22:22:35 +02:00
Gael Guennebaud
9e8f07d7b5
Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors.
2016-10-24 22:16:48 +02:00
Gael Guennebaud
b027d7a8cf
bug #1004 : remove the inaccurate "sequential" path for LinSpaced, mark respective function as deprecated, and enforce strict interpolation of the higher range using a correction term.
...
Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.
2016-10-24 20:27:21 +02:00
Benoit Steiner
b11aab5fcc
Merged in benoitsteiner/opencl (pull request PR-238)
...
Added support for OpenCL to the Tensor Module
2016-10-24 15:30:45 +00:00
Gael Guennebaud
53c77061f0
bug #698 : rewrite LinSpaced for integer scalar types to avoid overflow and guarantee an even spacing when possible.
...
Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution.
This changeset also disables vectorization for this integer path.
2016-10-24 15:50:27 +02:00
Gael Guennebaud
e8e56c7642
Add unit test for overflow in LinSpaced
2016-10-24 15:43:51 +02:00
Gael Guennebaud
40f62974b7
bug #1328 : workaround a compilation issue with gcc 4.2
2016-10-20 19:19:37 +02:00
Benoit Steiner
cf20b30d65
Merge latest updates from trunk
2016-10-20 09:42:05 -07:00
Luke Iwanski
03b63e182c
Added SYCL include in Tensor.
2016-10-20 15:32:44 +01:00
Benoit Steiner
d3943cd50c
Fixed a few typos in the ternary tensor expressions types
2016-10-19 12:56:12 -07:00
Tal Hadad
15eca2432a
Euler tests: Tighter precision when no roll exists and clean code.
2016-10-18 23:24:57 +03:00
Tal Hadad
6f4f12d1ed
Add isApprox() and cast() functions.
...
test cases included
2016-10-17 22:23:47 +03:00
Tal Hadad
7402cfd4cc
Add safety for near pole cases and test them better.
2016-10-17 20:42:08 +03:00
Mehdi Goli
8fb162fc85
Fixing the typo regarding missing #if needed for proper handling of exceptions in Eigen/Core.
2016-10-16 12:52:34 +01:00
Tal Hadad
58f5d7d058
Fix calc bug, docs and better testing.
...
Test code changes:
* better coded
* rand and manual numbers
* singularity checking
2016-10-16 14:39:26 +03:00
Mehdi Goli
e36cb91c99
Fixing the code indentation in the TensorReduction.h file.
2016-10-14 18:03:00 +01:00
Luke Iwanski
2e188dd4d4
Merged ComputeCpp to default.
2016-10-14 16:47:40 +01:00
Mehdi Goli
15380f9a87
Applying Benoit's comment to return the missing line back in Eigen/Core
2016-10-14 16:39:41 +01:00
Gael Guennebaud
692b30ca95
Fix previous merge.
2016-10-14 17:16:28 +02:00
Gael Guennebaud
050c681bdd
Merged in rmlarsen/eigen2 (pull request PR-232)
...
Improve performance of parallelized matrix multiply for rectangular matrices
2016-10-14 14:51:09 +00:00
Tal Hadad
078a202621
Merge Hongkai Dai correct range calculation, and remove ranges from API.
...
Docs updated.
2016-10-14 16:03:28 +03:00
Luke Iwanski
e742da8b28
Merged ComputeCpp into default.
2016-10-14 13:36:51 +01:00
Mehdi Goli
524fa4c46f
Reducing the code by generalising sycl backend functions/structs.
2016-10-14 12:09:55 +01:00
Hongkai Dai
014d9f1d9b
implement euler angles with the right ranges
2016-10-13 14:45:51 -07:00
Benoit Steiner
737e4152c3
Merged in lukier/eigen (pull request PR-234)
...
Enabling CUDA in Geometry
2016-10-13 18:09:28 +00:00
Benoit Steiner
d0ee2267d6
Relaxed the resizing checks so that they don't fail with gcc >= 5.3
2016-10-13 10:59:46 -07:00
Robert Lukierski
a94791b69a
Fixes for min and abs after Benoit's comments, switched to numext.
2016-10-13 15:00:22 +01:00
Avi Ginsburg
ac63d6891c
Patch to allow VS2015 & CUDA 8.0 to compile with Eigen included. I'm not sure
...
whether to limit the check to this compiler combination
(` || (EIGEN_COMP_MSVC == 1900 && __CUDACC_VER__) `)
or to leave it as it is. I also don't know if this will have any effect on
including Eigen in device code (I'm not in my current project).
2016-10-13 08:47:32 +00:00
Benoit Steiner
7e4a6754b2
Merged eigen/eigen into default
2016-10-12 22:42:33 -07:00
Benoit Steiner
38b6048e14
Deleted redundant implementation of predux
2016-10-12 14:37:56 -07:00
Gael Guennebaud
e74612b9a0
Remove double ;;
2016-10-12 22:49:47 +02:00
Benoit Steiner
78d2926508
Merged eigen/eigen into default
2016-10-12 13:46:29 -07:00
Benoit Steiner
2e2f48e30e
Take advantage of AVX512 instructions whenever possible to speedup the processing of 16 bit floats.
2016-10-12 13:45:39 -07:00
Gael Guennebaud
f939c351cb
Fix SPQR for rectangular matrices
2016-10-12 22:39:33 +02:00
Gael Guennebaud
091d373ee9
Fix outer-stride.
2016-10-12 21:47:52 +02:00
Robert Lukierski
471075f7ad
Fixes min() warnings.
2016-10-12 18:59:05 +01:00
Gael Guennebaud
5c366fe1d7
Merged in rmlarsen/eigen (pull request PR-230)
...
Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1
2016-10-12 16:30:51 +00:00
Robert Lukierski
86711497c4
Adding EIGEN_DEVICE_FUNC in the Geometry module.
...
Additional CUDA necessary fixes in the Core (mostly usage of
EIGEN_USING_STD_MATH).
2016-10-12 16:35:17 +01:00
Rasmus Munk Larsen
47150af1c8
Fix copy-paste error: Must use _mm256_cmp_ps for AVX.
2016-10-12 08:34:39 -07:00
Gael Guennebaud
89e315152c
bug #1325 : fix compilation on NEON with clang
2016-10-12 16:55:47 +02:00
Benoit Steiner
7f0599b6eb
Manually define int16_t and uint16_t when compiling with Visual Studio
2016-10-08 22:56:32 -07:00
Benoit Steiner
5727e4d89c
Reenabled the use of variadic templates on tegra x1 provided that the latest version (i.e. JetPack 2.3) is used.
2016-10-08 22:19:03 +00:00
Benoit Steiner
5266ff8966
Cleaned up a regression test
2016-10-08 19:12:44 +00:00
Benoit Steiner
5c68051cd7
Merge the content of the ComputeCpp branch into the default branch
2016-10-07 11:04:16 -07:00
Gael Guennebaud
4860727ac2
Remove static qualifier of free-functions (inline is enough and this helps ICC to find the right overload)
2016-10-07 09:21:12 +02:00
Benoit Steiner
507b661106
Renamed predux_half into predux_downto4
2016-10-06 17:57:04 -07:00
Benoit Steiner
a498ff7df6
Fixed incorrect comment
2016-10-06 15:27:27 -07:00
Benoit Steiner
8ba3c41fcf
Reverted unnecessary change
2016-10-06 15:12:15 -07:00
Benoit Steiner
a7473d6d5a
Fixed compilation error with gcc >= 5.3
2016-10-06 14:33:22 -07:00
Benoit Steiner
5e64cea896
Silenced a compilation warning
2016-10-06 14:24:17 -07:00
Benoit Steiner
33fba3f08d
Merged in rryan/eigen/tensorfunctors (pull request PR-233)
...
Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*.
2016-10-06 12:29:19 -07:00
RJ Ryan
bfc264abe8
Add a test that GPU complex product reductions match CPU reductions.
2016-10-06 11:10:14 -07:00
RJ Ryan
e2e9cdd169
Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*.
2016-10-06 10:49:48 -07:00
Benoit Steiner
d485d12c51
Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.
2016-10-06 10:41:03 -07:00
Rasmus Munk Larsen
48c635e223
Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small.
...
Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.
Improvements in Wall time:
Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                   Base (ns)    New (ns)   Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1             3088        1610        +47.9%
BM_OuterishProd/64/4             3562        2414        +32.2%
BM_OuterishProd/64/32            8861        7815        +11.8%
BM_OuterishProd/128/1           11363        6504        +42.8%
BM_OuterishProd/128/4           11128        9794        +12.0%
BM_OuterishProd/128/64          27691       27396         +1.1%
BM_OuterishProd/256/1           33214       28123        +15.3%
BM_OuterishProd/256/4           34312       36818         -7.3%
BM_OuterishProd/256/128        174866      176398         -0.9%
BM_OuterishProd/512/1         7963684      104224        +98.7%
BM_OuterishProd/512/4         7987913      112867        +98.6%
BM_OuterishProd/512/256       8198378     1306500        +84.1%
BM_OuterishProd/1k/1          7356256      324432        +95.6%
BM_OuterishProd/1k/4          8129616      331621        +95.9%
BM_OuterishProd/1k/512       27265418     7517538        +72.4%
Improvements in CPU time:
Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                   Base (ns)    New (ns)   Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1             6169        1608        +73.9%
BM_OuterishProd/64/4             7117        2412        +66.1%
BM_OuterishProd/64/32           17702       15616        +11.8%
BM_OuterishProd/128/1           45415        6498        +85.7%
BM_OuterishProd/128/4           44459        9786        +78.0%
BM_OuterishProd/128/64         110657      109489         +1.1%
BM_OuterishProd/256/1          265158       28101        +89.4%
BM_OuterishProd/256/4          274234      183885        +32.9%
BM_OuterishProd/256/128       1397160     1408776         -0.8%
BM_OuterishProd/512/1        78947048      520703        +99.3%
BM_OuterishProd/512/4        86955578     1349742        +98.4%
BM_OuterishProd/512/256      74701613    15584661        +79.1%
BM_OuterishProd/1k/1         78352601     3877911        +95.1%
BM_OuterishProd/1k/4         78521643     3966221        +94.9%
BM_OuterishProd/1k/512      258104736    89480530        +65.3%
2016-10-06 10:33:10 -07:00
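The Improvement column in the benchmark tables above appears to be the relative reduction in time with respect to the baseline, i.e. (Base - New) / Base. The small helper below reproduces that arithmetic; `improvement_percent` is a hypothetical name introduced here for illustration and is not part of the benchmark tooling.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative time reduction in percent.
// Positive means the new code is faster, negative means it is slower.
inline double improvement_percent(double base_ns, double new_ns) {
  assert(base_ns > 0.0);
  return 100.0 * (base_ns - new_ns) / base_ns;
}
```

For example, the wall-time row BM_OuterishProd/64/1 (3088 ns to 1610 ns) gives roughly 47.9, and BM_OuterishProd/256/4 (34312 ns to 36818 ns) gives roughly -7.3, matching the table.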
Benoit Steiner
9f3276981c
Enabling AVX512 should also enable AVX2.
2016-10-06 10:29:48 -07:00
Gael Guennebaud
80b5133789
Fix compilation of qr.inverse() for column and full pivoting variants.
2016-10-06 09:55:50 +02:00
Benoit Steiner
4131074818
Deleted unnecessary CMakeLists.txt file
2016-10-05 18:54:35 -07:00
Benoit Steiner
cb5cd69872
Silenced a compilation warning.
2016-10-05 18:50:53 -07:00
Benoit Steiner
78b569f685
Merged latest updates from trunk
2016-10-05 18:48:55 -07:00
Benoit Steiner
9c2b6c049b
Silenced a few compilation warnings
2016-10-05 18:37:31 -07:00
Benoit Steiner
6f3cd529af
Pulled latest updates from trunk
2016-10-05 18:31:43 -07:00
Benoit Steiner
d7f9679a34
Fixed a couple of compilation warnings
2016-10-05 15:00:32 -07:00
Benoit Steiner
ae1385c7e4
Pull the latest updates from trunk
2016-10-05 14:54:36 -07:00
Benoit Steiner
73b0012945
Fixed compilation warnings
2016-10-05 14:24:24 -07:00
Benoit Steiner
c84084c0c0
Fixed compilation warning
2016-10-05 14:15:41 -07:00
Benoit Steiner
4387433acf
Increased the robustness of the reduction tests on fp16
2016-10-05 10:42:41 -07:00
Benoit Steiner
aad20d700d
Increase the tolerance to numerical noise.
2016-10-05 10:39:24 -07:00
Benoit Steiner
8b69d5d730
::rand() returns a signed integer on win32
2016-10-05 08:55:02 -07:00
Benoit Steiner
ed7a220b04
Fixed a typo that impacts windows builds
2016-10-05 08:51:31 -07:00
Benoit Steiner
ceee1c008b
Silenced compilation warning
2016-10-04 18:47:53 -07:00
Benoit Steiner
698ff69450
Properly characterize the CUDA packet primitives for fp16 as device only
2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen
7f67e6dfdb
Update comment for fast sqrt.
2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen
765615609d
Update comment for fast sqrt.
2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen
3ed67cb0bb
Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
...
Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
                 SSE       AVX
Fast=1           2.529G    4.380G
Fast=0           1.944G    1.898G
Fast=1 fixed     2.214G    3.739G
This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions are almost negligible.
2016-10-04 14:22:56 -07:00
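To illustrate the class of bug fixed in the changeset above, here is a scalar sketch of a fast square root built from an approximate inverse square root and one Newton step. It is an assumption-laden illustration, not Eigen's vectorized code: `fast_sqrt_sketch` is a hypothetical name, and the bit-trick initial estimate stands in for whatever approximation the packet code uses. The point is the ordering of the special cases: negative inputs must yield NaN before the "treat tiny inputs as zero" rule is applied, otherwise they would be flushed to (negative) zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical scalar sketch; not Eigen's psqrt. Infinite inputs are not handled.
inline float fast_sqrt_sketch(float x) {
  // Negative inputs (and NaN) return NaN instead of falling into the zero case.
  if (!(x >= 0.0f)) return std::numeric_limits<float>::quiet_NaN();
  // Zero and denormal inputs are flushed to zero.
  if (x < std::numeric_limits<float>::min()) return 0.0f;
  // Initial estimate of 1/sqrt(x) from the classic integer bit trick.
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  float y;
  std::memcpy(&y, &bits, sizeof y);
  // One Newton-Raphson refinement step for 1/sqrt(x).
  y = y * (1.5f - 0.5f * x * y * y);
  // sqrt(x) = x * (1/sqrt(x)).
  return x * y;
}
```

With a single refinement step the result is only accurate to a fraction of a percent, which is the usual trade-off of this family of approximations.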
Benoit Steiner
6af5ac7e27
Cleanup the cuda executor code.
2016-10-04 08:52:13 -07:00
Benoit Steiner
2f6d1607c8
Cleaned up the random number generation code.
2016-10-04 08:38:23 -07:00
Benoit Steiner
881b90e984
Use explicit type casting to generate packets of zeros.
2016-10-04 08:23:38 -07:00
Benoit Steiner
616a7a1912
Improved support for compiling CUDA code with clang as the host compiler
2016-10-03 17:09:33 -07:00
Benoit Steiner
409e887d78
Added support for constant std::complex numbers on GPU
2016-10-03 11:06:24 -07:00
Gael Guennebaud
9d6d0dff8f
bug #1317 : fix performance regression with some Block expressions and clang by helping it to remove dead code.
...
The trick is to get rid of the nested expression in the evaluator by copying only the required information (here, the strides).
2016-10-01 15:37:00 +02:00
Gael Guennebaud
8b84801f7f
bug #1310 : workaround a compilation regression from 3.2 regarding triangular * homogeneous
2016-09-30 22:49:59 +02:00
Benoit Steiner
422530946f
Renamed the SYCL tests to follow the standard naming convention.
2016-09-30 08:22:10 -07:00
Gael Guennebaud
67b4f45836
Fix angle range
2016-09-30 12:46:33 +02:00
Gael Guennebaud
27f3970453
Remove std:: prefix
2016-09-30 12:40:41 +02:00
Gael Guennebaud
3860a0bc8f
bug #1312 : Quaternion to AxisAngle conversion now ensures the angle will be in the range [-pi,pi]. This also increases accuracy when q.w is negative.
2016-09-29 23:23:35 +02:00
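One way to obtain a bounded angle from a unit quaternion, and to stay accurate when q.w is negative, is to use atan2 on the norm of the vector part and |w|, flipping the axis when w is negative. The sketch below assumes that approach; the types `QuatSketch` and `AxisAngleSketch` and the function name are hypothetical and this is not Eigen's code. The angle it returns lies in [0, pi], which is inside the range quoted in the message.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plain structs used only for this illustration.
struct QuatSketch { double w, x, y, z; };
struct AxisAngleSketch { double angle, ax, ay, az; };

inline AxisAngleSketch to_axis_angle_sketch(const QuatSketch& q) {
  // Precondition of this sketch: q is (approximately) a unit quaternion.
  assert(std::fabs(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z - 1.0) < 1e-6);
  double n = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z);
  if (n == 0.0) {
    // Identity rotation: the axis is arbitrary, pick the x axis.
    return AxisAngleSketch{0.0, 1.0, 0.0, 0.0};
  }
  // atan2(n, |w|) lies in [0, pi/2], so the angle lies in [0, pi].
  const double angle = 2.0 * std::atan2(n, std::fabs(q.w));
  // q and -q encode the same rotation; flip the axis when w is negative.
  if (q.w < 0.0) n = -n;
  return AxisAngleSketch{angle, q.x / n, q.y / n, q.z / n};
}
```

Because q and -q describe the same rotation, both inputs map to the same axis and angle in this sketch.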
Gael Guennebaud
33500050c3
bug #1308 : fix compilation of some small products involving nullary-expressions.
2016-09-29 09:40:44 +02:00
Benoit Steiner
27d7628f16
Updated the list of warnings to reflect the new message ids introduced in cuda 8.0
2016-09-28 17:42:59 -07:00
Benoit Steiner
2bda1b0d93
Updated the tensor sum and mean reducer to enable them to process complex numbers on cuda gpus.
2016-09-28 17:08:41 -07:00
Mehdi Goli
dd602e62c8
Converting alias template to nested struct in order to be compatible with CXX-03
2016-09-27 16:21:19 +01:00
Gael Guennebaud
f3a00dd2b5
Merged in sergiu/eigen (pull request PR-229)
...
Disabled MSVC level 4 warning C4714
2016-09-27 09:28:08 +02:00
Gael Guennebaud
892afb9416
Add debug info.
2016-09-26 23:53:57 +02:00
Gael Guennebaud
779774f98c
bug #1311 : fix alignment logic in some cases of (scalar*small).lazyProduct(small)
2016-09-26 23:53:40 +02:00
Benoit Steiner
6565f8d60f
Made the initialization of a CUDA device thread safe.
2016-09-26 11:00:32 -07:00
Gael Guennebaud
48dfe98abd
bug #1308 : fix compilation of vector * rowvector::nullary.
2016-09-25 14:54:35 +02:00
Sergiu Deitsch
fe29157d02
disabled MSVC level 4 warning C4714
...
The level 4 warning (/W4) warns about functions marked as __forceinline not
inlined, and generates a lot of noise.
2016-09-25 14:25:47 +02:00
Benoit Steiner
f6ac51a054
Made TensorEvalTo compatible with c++0x again.
2016-09-23 16:45:17 -07:00
Benoit Steiner
00d4e65f00
Deleted unused TensorMap data member
2016-09-23 16:44:45 -07:00
Gael Guennebaud
86caba838d
bug #1304 : fix Projective * scaling and Projective *= scaling
2016-09-23 13:41:21 +02:00
Gael Guennebaud
b9f7a17e47
Add missing file.
2016-09-23 10:26:08 +02:00
Benoit Steiner
1301d744f8
Made the gaussian generator usable on GPU
2016-09-22 19:04:44 -07:00
Benoit Steiner
2a69290ddb
Added a specialization of Eigen::numext::real and Eigen::numext::imag for std::complex<T> to be used when compiling a cuda kernel. This is unfortunately necessary to be able to process complex numbers from a CUDA kernel on MacOS.
2016-09-22 15:52:23 -07:00
Gael Guennebaud
3946768916
Added tag 3.3-rc1 for changeset 77e27fbeee
2016-09-22 22:38:36 +02:00
Gael Guennebaud
77e27fbeee
bump to 3.3-rc1
2016-09-22 22:37:39 +02:00
Gael Guennebaud
2ada122bc6
merge
2016-09-22 22:33:18 +02:00
Gael Guennebaud
8f2bdde373
merge
2016-09-22 22:32:55 +02:00
Gael Guennebaud
ba0f844d6b
Backout changeset ce3557ca69
2016-09-22 22:28:51 +02:00
Gael Guennebaud
9bcdc8b756
Add a nullary-functor example performing index-based sub-matrices.
2016-09-22 22:27:54 +02:00
Benoit Steiner
50e3bbfc90
Calls x.imag() instead of imag(x) when x is a complex number since the former
...
is a constexpr while the latter isn't. This fixes compilation errors triggered by nvcc on Mac.
2016-09-22 13:17:25 -07:00
Gael Guennebaud
ca3746c6f8
Bypass identity reflectors.
2016-09-22 22:07:13 +02:00
Felix Gruber
8bde7da086
fix documentation of LinSpaced
...
The index of the highest value in a LinSpace is size-1.
2016-09-22 14:50:07 +02:00
Gael Guennebaud
66cbabafed
Add a note regarding gcc bug #72867
2016-09-22 11:18:52 +02:00
Christoph Hertzberg
4b377715d7
Do not manually add absolute path to boost-library.
...
Also set C++ standard for blaze to C++14
2016-09-22 00:10:47 +02:00
Gael Guennebaud
aecc51a3e8
fix typo
2016-09-21 21:53:00 +02:00
Gael Guennebaud
1fc3a21ed0
Disable a failure test if extended double precision is in use (x87)
2016-09-21 20:09:07 +02:00
Gael Guennebaud
9fa2c8650e
Fix alignment of statically allocated temporaries in symv, and trmv.
2016-09-21 17:34:24 +02:00
Gael Guennebaud
ac5377e161
Improve cost estimation of complex division
2016-09-21 17:26:04 +02:00
Gael Guennebaud
5269d11935
Fix compilation with ICC.
2016-09-21 17:08:51 +02:00
Benoit Steiner
26f9907542
Added missing typedefs
2016-09-20 12:58:03 -07:00
RJ Ryan
608b1acd6d
Don't use c++11 features and fix include.
2016-09-20 07:49:05 -07:00
RJ Ryan
b2c6dc48d9
Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.
2016-09-20 07:18:20 -07:00
Benoit Steiner
8a66ca4b10
Pulled latest updates from trunk
2016-09-19 14:13:55 -07:00
Benoit Steiner
59e9edfbf1
Removed EIGEN_DEVICE_FUNC qualifiers for the lu(), fullPivLu(), partialPivLu(), and inverse() functions since they aren't ready to run on GPU
2016-09-19 14:13:20 -07:00
Gael Guennebaud
3ada6e4bed
Merged hongkai-dai/eigen/tip into default (bug #1298 )
2016-09-19 22:08:06 +02:00
Benoit Steiner
c3ca9b1e76
Deleted some unnecessary and confusing EIGEN_DEVICE_FUNC
2016-09-19 11:33:39 -07:00
Hongkai Dai
5dcc6d301a
remove ternary operator in euler angles
2016-09-19 10:30:30 -07:00
Luke Iwanski
c771df6bc3
Updated the owners of the file.
2016-09-19 14:09:25 +01:00
Luke Iwanski
b91e021172
Merged with default.
2016-09-19 14:03:54 +01:00
Luke Iwanski
cb81975714
Partial OpenCL support via SYCL compatible with ComputeCpp CE.
2016-09-19 12:44:13 +01:00
Gael Guennebaud
bf03820339
Silence warning.
2016-09-17 14:14:01 +02:00
Gael Guennebaud
de05a18fe0
fix compilation with boost::multiprec
2016-09-17 14:13:48 +02:00
Gael Guennebaud
4cc2c73e6a
Fix alignment of statically allocated temporaries in gemv.
2016-09-17 12:52:27 +02:00
Christoph Hertzberg
ce3557ca69
Make makeHouseholder more stable for cases where real(c0) is not very small (but the rest is).
2016-09-16 14:24:47 +02:00
Emil Fresk
6edd2e2851
Made AutoDiffJacobian more intuitive to use and updated for C++11
...
Changes:
* Removed unnecessary types from the Functor by inferring from its types
* Removed inputs() function reference, replaced with .rows()
* Updated the forward constructor to use variadic templates
* Added optional parameters to the Functor for passing parameters,
control signals, etc
* Has been tested with fixed size and dynamic matrices
Amendment by chtz: overload operator() for compatibility with not fully conforming compilers
2016-09-16 14:03:55 +02:00
Gael Guennebaud
4adeababf9
Fix underflow
2016-09-16 11:46:46 +02:00
Gael Guennebaud
18f6e47815
Fix order of "static inline".
2016-09-16 11:32:54 +02:00
Gael Guennebaud
ee62f168e6
Doc: add link from block methods to respective tutorial section.
2016-09-16 11:26:25 +02:00
Gael Guennebaud
ca7f061a5f
bug #828 : clarify documentation of SparseMatrixBase's methods returning a sub-matrix.
2016-09-16 11:23:19 +02:00
Gael Guennebaud
50e203c717
bug #828 : clarify documentation of SparseMatrixBase's unary methods.
2016-09-16 10:40:50 +02:00
Gael Guennebaud
fa9049a544
Let's be consistent and consider any denormal number as zero.
2016-09-15 11:24:03 +02:00
Gael Guennebaud
b33144e4df
merge
2016-09-15 11:22:16 +02:00
Benoit Steiner
c0d56a543e
Added several missing EIGEN_DEVICE_FUNC qualifiers
2016-09-14 14:06:21 -07:00
Benoit Steiner
488ad7dd1b
Added missing EIGEN_DEVICE_FUNC qualifiers
2016-09-14 13:35:00 -07:00
Benoit Steiner
779faaaeba
Fixed compilation warnings generated by nvcc 6.5 (and below) when compiling the EIGEN_THROW macro
2016-09-14 09:56:11 -07:00
Gael Guennebaud
1c8347e554
Fix product for custom complex type. (conjugation was ignored)
2016-09-14 18:28:49 +02:00
Benoit Steiner
ff47717f25
Suppress warning 2527 and 2529, which correspond to the "calling a __host__ function from a __host__ __device__ function is not allowed" message in nvcc 6.5.
2016-09-13 12:49:40 -07:00
Benoit Steiner
309190cf02
Suppress message 1222 when compiling with nvcc: this ensures that we don't get warnings about unknown warning messages when compiling with older versions of nvcc
2016-09-13 12:42:13 -07:00
Gael Guennebaud
c10620b2b0
Fix typo in doc.
2016-09-13 09:25:07 +02:00
Gael Guennebaud
73c8f2f697
bug #1285 : fix regression introduced in changeset 00c29c2cae
2016-09-13 07:58:39 +02:00
Benoit Steiner
e4d4d15588
Register the cxx11_tensor_device only for recent cuda architectures (i.e. >= 3.0) since the test instantiates contractions that require a modern gpu.
2016-09-12 19:01:52 -07:00
Benoit Steiner
4dfd888c92
CUDA contractions require arch >= 3.0: don't compile the cuda contraction tests on older architectures.
2016-09-12 18:49:01 -07:00
Benoit Steiner
028e299577
Fixed a bug impacting some outer reductions on GPU
2016-09-12 18:36:52 -07:00
Benoit Steiner
5f50f12d2c
Added the ability to compute the absolute value of a complex number on GPU, as well as a test to catch the problem.
2016-09-12 13:46:13 -07:00
Benoit Steiner
8321dcce76
Merged latest updates from trunk
2016-09-12 10:33:05 -07:00
Benoit Steiner
eb6ba00cc8
Properly size the list of waiters
2016-09-12 10:31:55 -07:00
Benoit Steiner
a618094b62
Added a resize method to MaxSizeVector
2016-09-12 10:30:53 -07:00
Gael Guennebaud
228ae29591
Fix compilation on 32 bits systems.
2016-09-09 22:34:38 +02:00
Gael Guennebaud
471eac5399
bug #1195 : move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)
2016-09-08 08:36:27 +02:00
Gael Guennebaud
d780983f59
Doc: explain minimal requirements on nullary functors
2016-09-06 23:14:52 +02:00
Gael Guennebaud
85fb517eaf
Generalize ScalarBinaryOpTraits to any complex-real combination as defined by NumTraits (instead of supporting std::complex only).
2016-09-06 17:23:15 +02:00
Gael Guennebaud
447f269561
Disable previous workaround.
2016-09-06 15:49:02 +02:00
Gael Guennebaud
b046a3f87d
Workaround MSVC instantiation failure of has_*ary_operator at the level of traits<Ref>::match so that the has_*ary_operator are really properly instantiated throughout the compilation unit.
2016-09-06 15:47:04 +02:00
Gael Guennebaud
3cb914f332
bug #1266 : remove CUDA guards on MatrixBase::<decomposition> definitions. (those used to break old nvcc versions that we probably don't care about anymore)
2016-09-06 09:55:50 +02:00
Gael Guennebaud
e1642f485c
bug #1288 : fix memory leak in arpack wrapper.
2016-09-05 18:01:30 +02:00
Gael Guennebaud
19a95b3309
Fix shadowing wrt Eigen::Index
2016-09-05 17:19:47 +02:00
Gael Guennebaud
dabc81751f
Fix compilation when cuda_fp16.h does not exist.
2016-09-05 17:14:20 +02:00
Gael Guennebaud
e13071dd13
Workaround a weird msvc 2012 compilation error.
2016-09-05 15:50:41 +02:00
Gael Guennebaud
d123717e21
Fix for msvc 2012 and older
2016-09-05 15:26:56 +02:00
Benoit Steiner
87a8a1975e
Fixed a regression test
2016-09-02 19:29:33 -07:00
Benoit Steiner
13df3441ae
Use MaxSizeVector instead of std::vector: xcode sometimes assumes that std::vector allocates aligned memory and therefore issues aligned instructions to initialize it. This can result in random crashes when compiling with AVX instructions enabled.
2016-09-02 19:25:47 -07:00
Benoit Steiner
373c340b71
Fixed a typo
2016-09-02 15:41:17 -07:00
Benoit Steiner
cadd124d73
Pulled latest update from trunk
2016-09-02 15:30:02 -07:00
Benoit Steiner
05b0518077
Made the index type an explicit template parameter to help some compilers compile the code.
2016-09-02 15:29:34 -07:00
Benoit Steiner
adf864fec0
Merged in rmlarsen/eigen (pull request PR-222)
...
Fix CUDA build broken by changes to min and max reduction.
2016-09-02 14:11:20 -07:00
Benoit Steiner
5a6be66cef
Turned the Index type used by the nullary wrapper into a template parameter.
2016-09-02 14:10:29 -07:00
Rasmus Munk Larsen
13e93ca8b7
Fix CUDA build broken by changes to min and max reduction.
2016-09-02 13:41:36 -07:00
Benoit Steiner
6c05c3dd49
Fix the cxx11_tensor_cuda.cu test on 32bit platforms.
2016-09-02 11:12:16 -07:00
Gael Guennebaud
49c0390ce0
merge
2016-09-02 15:24:14 +02:00
Gael Guennebaud
d6c8366d84
Fix compilation with MSVC 2012
2016-09-02 15:23:32 +02:00
Benoit Steiner
039e225f7f
Added a test for nullary expressions on CUDA
...
Also check that we can mix 64 and 32 bit indices in the same compilation unit
2016-09-01 13:28:12 -07:00
Benoit Steiner
c53f783705
Updated the contraction code to support constant inputs.
2016-09-01 11:41:27 -07:00
Gael Guennebaud
ef54723dbe
One more msvc fix iteration, the previous one was over-simplified for visual
2016-09-01 15:04:53 +02:00
Gael Guennebaud
46475eff9a
Adjust Tensor module wrt recent change in nullary functor
2016-09-01 13:40:45 +02:00
Gael Guennebaud
72a4d49315
Fix compilation with CUDA 8
2016-09-01 13:39:33 +02:00
Gael Guennebaud
f9f32e9e2d
Fix compilation with nvcc
2016-09-01 13:06:14 +02:00
Gael Guennebaud
3d946e42b3
Fix compilation with visual studio
2016-09-01 12:59:32 +02:00
Benoit Steiner
221f619bea
Merged in rmlarsen/eigen (pull request PR-221)
...
Fix bugs to make min- and max reducers work correctly with IEEE infinities.
2016-08-31 15:10:10 -07:00
Rasmus Munk Larsen
a1e092d1e8
Fix bugs to make min- and max reducers work correctly with IEEE infinities.
2016-08-31 15:04:16 -07:00
Gael Guennebaud
836fa25a82
Make sure sizeof is truly needed, thus improving SFINAE portability.
2016-08-31 23:40:18 +02:00
Gael Guennebaud
84cf6e42ca
minor tweaks in has_* helpers
2016-08-31 23:04:14 +02:00
Gael Guennebaud
7ae819123c
Simplify CwiseNullaryOp example.
2016-08-31 15:46:04 +02:00
Gael Guennebaud
218c37beb4
bug #1286 : automatically detect the available prototypes of functors passed to CwiseNullaryExpr such that functors only have to implement the operators that matter among:
...
operator()()
operator()(i)
operator()(i,j)
Linear access is also automatically detected based on the availability of operator()(i,j).
2016-08-31 15:45:25 +02:00
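Detecting which call operators a functor provides, as the changeset above describes, can be sketched with expression SFINAE. The traits below are a C++11 illustration only: the names `has_nullary_call`, `has_unary_call`, `has_binary_call` and `IndexSketch` are hypothetical, and this is not the mechanism Eigen itself uses, which also has to support older compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

typedef std::ptrdiff_t IndexSketch;  // stand-in index type for this sketch

// Primary templates: assume the call operator is absent.
template <typename F, typename = void>
struct has_nullary_call : std::false_type {};
template <typename F, typename = void>
struct has_unary_call : std::false_type {};
template <typename F, typename = void>
struct has_binary_call : std::false_type {};

// Specializations are selected only if the call expression is well-formed.
template <typename F>
struct has_nullary_call<F, decltype(void(std::declval<const F&>()()))>
    : std::true_type {};
template <typename F>
struct has_unary_call<F, decltype(void(std::declval<const F&>()(
                             std::declval<IndexSketch>())))>
    : std::true_type {};
template <typename F>
struct has_binary_call<F, decltype(void(std::declval<const F&>()(
                              std::declval<IndexSketch>(),
                              std::declval<IndexSketch>())))>
    : std::true_type {};

// Example functors, each implementing only one prototype.
struct ConstantSketch {
  double operator()() const { return 1.0; }
};
struct RampSketch {
  double operator()(IndexSketch i) const { return static_cast<double>(i); }
};
struct GridSketch {
  double operator()(IndexSketch i, IndexSketch j) const {
    return static_cast<double>(i + j);
  }
};

static_assert(has_nullary_call<ConstantSketch>::value, "operator()() detected");
static_assert(has_unary_call<RampSketch>::value, "operator()(i) detected");
static_assert(has_binary_call<GridSketch>::value, "operator()(i,j) detected");
```

An expression evaluator could then dispatch on these traits at compile time, so that a functor implementing only one of the three prototypes is still usable.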
Gael Guennebaud
efe2c225c9
bug #1283 : add regression unit test
2016-08-31 13:04:29 +02:00
Gael Guennebaud
3456247437
bug #1283 : quick fix for products involving uncommon general block access to vectors.
2016-08-31 08:17:15 +02:00
Gael Guennebaud
8c48d42530
Fix 4x4 inverse with non-linear destination
2016-08-30 23:16:38 +02:00
Gael Guennebaud
e7fbbc2748
Doc: add links and discourage users from writing their own expression (better use CwiseNullaryOp)
2016-08-30 15:57:46 +02:00
Gael Guennebaud
1e2ab8b0b3
Doc: add an example showing how custom expression can be advantageously implemented via CwiseNullaryOp.
2016-08-30 15:40:41 +02:00
Gael Guennebaud
9c9e23858e
Doc: split customizing-eigen page into sub-pages and re-structure a bit the different topics
2016-08-30 11:10:08 +02:00
Gael Guennebaud
cffe8bbff7
Doc: add link to example
2016-08-30 10:45:27 +02:00
Gael Guennebaud
c57317035a
Fix unit test for 1x1 matrices
2016-08-30 10:20:23 +02:00
Gael Guennebaud
1f84f0d33a
merge EulerAngles module
2016-08-30 10:01:53 +02:00
Gael Guennebaud
68e803a26e
Fix warning
2016-08-30 09:21:57 +02:00
Gael Guennebaud
e074f720c7
Include missing forward declaration of SparseMatrix
2016-08-29 18:56:46 +02:00
Gael Guennebaud
2915e1fc5d
Revert part of changeset 5b3a6f51d3
...
to keep accuracy of smallest eigenvalues.
2016-08-29 14:14:18 +02:00
Gael Guennebaud
7e029d1d6e
bug #1271 : add SparseMatrix::coeffs() methods returning a 1D view of the non zero coefficients.
2016-08-29 12:06:37 +02:00
Gael Guennebaud
a93e354d92
Add some pre-allocation unit tests (not working yet)
2016-08-29 11:08:44 +02:00
Gael Guennebaud
6cd7b9ea6b
Fix compilation with cuda 8
2016-08-29 11:06:08 +02:00
Gael Guennebaud
8f4b4ad5fb
use ::hlog if available.
2016-08-29 11:05:32 +02:00
Gael Guennebaud
35a8e94577
bug #1167 : simplify installation of header files using cmake's install(DIRECTORY ...) command.
2016-08-29 10:59:37 +02:00
Gael Guennebaud
0decc31aa8
Add generic implementation of conj_helper for custom complex types.
2016-08-29 09:42:29 +02:00
Gael Guennebaud
fd9caa1bc2
bug #1282 : fix implicit double to float conversion warning
2016-08-28 22:45:56 +02:00
Gael Guennebaud
68d1897e8a
Make sure that our log1p implementation is called as a last resort only.
2016-08-26 15:30:55 +02:00
Gael Guennebaud
fe60856fed
Add overload of numext::log1p for float/double in CUDA
2016-08-26 15:28:59 +02:00
Gael Guennebaud
0f56b5a6de
enable vectorization path when testing half on cuda, and add test for log1p
2016-08-26 14:55:51 +02:00
Gael Guennebaud
965e595f02
Add missing log1p method
2016-08-26 14:55:00 +02:00
Gael Guennebaud
1329c55875
Fix compilation with boost::multiprec.
2016-08-25 14:54:39 +02:00
Gael Guennebaud
441b7eaab2
Add support for non trivial scalar factor in sparse selfadjoint * dense products, and enable +=/-= assignment for such products.
...
This changeset also improves the performance by working on column of the result at once.
2016-08-24 13:06:34 +02:00
Gael Guennebaud
8132a12625
bug #1268 : detect failures in LDLT and report them through info()
2016-08-23 23:15:55 +02:00
Gael Guennebaud
bde9b456dc
Typo
2016-08-23 21:36:36 +02:00
Gael Guennebaud
326320ec7b
Fix compilation in non C++11 mode.
2016-08-23 19:28:57 +02:00
Gael Guennebaud
ea2e968257
Address several implicit scalar conversions.
2016-08-23 18:44:33 +02:00
Gael Guennebaud
0a6a50d1b0
Cleanup eigenvector extraction: leverage matrix products and compile-time sizes, remove numerous useless temporaries.
2016-08-23 18:14:37 +02:00
Gael Guennebaud
00b2666853
bug #645 : patch from Tobias Wood implementing the extraction of eigenvectors in GeneralizedEigenSolver
2016-08-23 17:37:38 +02:00
Gael Guennebaud
504a4404f1
Optimize expression matching "d?=a-b*c" as "d?=a; d?=b*c;"
2016-08-23 16:52:22 +02:00
Gael Guennebaud
e47a8928ec
Fix compilation in check_for_aliasing due to ambiguous specializations
2016-08-23 16:19:10 +02:00
Gael Guennebaud
6739f6bb1b
Merged in traversaro/eigen-1/traversaro/modify-findeigen3cmake-to-find-eigen3con-1469782761059 (pull request PR-213)
...
Modify FindEigen3.cmake to find Eigen3Config.cmake
2016-08-23 15:53:57 +02:00
Gael Guennebaud
ef3de20481
Cleanup cost of tanh
2016-08-23 14:39:55 +02:00
Gael Guennebaud
b3151bca40
Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available.
2016-08-23 14:24:08 +02:00
Gael Guennebaud
a4c266f827
Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only.
2016-08-23 14:23:08 +02:00
Gael Guennebaud
82147cefff
Fix possible overflow and bias in integer random generator
2016-08-23 13:25:31 +02:00
Silvio Traversaro
068ccab9fe
FindEigen3.cmake : search for package only if EIGEN3_INCLUDE_DIR is not already defined
2016-08-22 22:13:10 +00:00
Gael Guennebaud
581b6472d1
bug #1265 : remove outdated notes
2016-08-22 23:25:39 +02:00
Igor Babuschkin
59bacfe520
Fix compilation on CUDA 8 by removing call to h2log1p
2016-08-15 23:38:05 +01:00
Benoit Steiner
34ae80179a
Use array_prod instead of calling TotalSize since TotalSize is only available on DSize.
2016-08-15 10:29:14 -07:00
Benoit Steiner
2556565b4b
Merged in ibab/eigen/extend-log1p (pull request PR-218)
...
Fix compilation on CUDA 8 due to missing h2log1p function
2016-08-15 08:31:03 -07:00
Benoit Steiner
30dd6f5e34
Close branch extend-log1p
2016-08-15 08:31:03 -07:00
Benoit Steiner
fe73648c98
Fixed a bug in the documentation.
2016-08-12 10:00:43 -07:00
Christoph Hertzberg
9636a8ed43
bug #1273 : Add parentheses when redefining eigen_assert
2016-08-12 15:34:21 +02:00
Christoph Hertzberg
c83b754ee0
bug #1272 : Disable assertion when total number of columns is zero.
...
Also moved assertion to finished() method and adapted unit-test
2016-08-12 15:15:34 +02:00
Benoit Steiner
e3a8dfb02f
std::erfcf doesn't exist: use numext::erfc instead
2016-08-11 15:24:06 -07:00
Benoit Steiner
64e68cbe87
Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything.
2016-08-08 19:29:59 -07:00
Benoit Steiner
5157ce8cbf
Merged in ibab/eigen/extend-log1p (pull request PR-217)
...
Add log1p support for CUDA and half floats
2016-08-08 14:50:00 -07:00
Igor Babuschkin
aee693ac52
Add log1p support for CUDA and half floats
2016-08-08 20:24:59 +01:00
Benoit Steiner
72096f3bd4
Merged in suiyuan2009/eigen/fix_tanh_inconsistent_for_tensorflow (pull request PR-215)
...
Fix_tanh_inconsistent_for_tensorflow
2016-08-08 09:06:45 -07:00
Christoph Hertzberg
3e4a33d4ba
bug #1272 : Let CommaInitializer work for more border cases (enhances fix of bug #1242 ).
...
The unit test tests all combinations of 2x2 block-sizes from 0 to 3.
2016-08-08 17:26:48 +02:00
Ziming Dong
1031223c09
fix tanh inconsistent
2016-08-06 19:48:50 +08:00
Ziming Dong
5cf1e4c79b
create fix_tanh_inconsistent branch
2016-08-06 15:54:33 +08:00
Christoph Hertzberg
fe4b927e9c
Add aliases Eigen_*_DIR to Eigen3_*_DIR
...
This is to make configuring work again after project was renamed from Eigen to Eigen3
2016-08-05 15:21:14 +02:00
Benoit Steiner
fe778427f2
Fixed the constructors of the new half_base class.
2016-08-04 18:32:26 -07:00
Benoit Steiner
5eea1c7f97
Fixed cut and paste bug in debug message
2016-08-04 17:34:13 -07:00
Benoit Steiner
9506343349
Fixed the isnan, isfinite and isinf operations on GPU
2016-08-04 17:25:53 -07:00
Benoit Steiner
b50d8f8c4a
Extended a regression test to validate that basic fp16 support works with cuda 7.0
2016-08-03 16:50:13 -07:00
Benoit Steiner
fad9828769
Deleted redundant regression test.
2016-08-03 16:08:37 -07:00
Benoit Steiner
373bb12dc6
Check that it's possible to forward declare the half type.
2016-08-03 16:07:31 -07:00
Gael Guennebaud
17b9a55d98
Move Eigen::half_impl::half to Eigen::half while preserving the free functions to the Eigen::half_impl namespace together with ADL
2016-08-04 00:00:43 +02:00
Benoit Steiner
ca2cee2739
Merged in ibab/eigen (pull request PR-206)
...
Expose real and imag methods on Tensors
2016-08-03 11:53:04 -07:00
Benoit Steiner
d92df04ce8
Cleaned up the new float16 test a bit
2016-08-03 11:50:07 -07:00
Benoit Steiner
81099ef482
Added a test for fp16
2016-08-03 11:41:17 -07:00
Benoit Steiner
a20b58845f
CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running.
2016-08-03 10:00:43 -07:00
Gael Guennebaud
819d0cea1b
List PARDISO solver.
2016-08-02 23:32:41 +02:00
Christoph Hertzberg
f4404777ff
Change project name to Eigen3, to be compatible with FindEigen3.cmake and Eigen3Config.cmake.
...
This is related to pull-requests 214.
2016-08-02 17:08:57 +00:00
Benoit Steiner
fd220dd8b0
Use numext::conj instead of std::conj
2016-08-01 18:16:16 -07:00
Benoit Steiner
e256acec7c
Avoid unnecessary object copies
2016-08-01 17:03:39 -07:00
Gael Guennebaud
7995cec90c
Fix vectorization logic for coeff-based product for some corner cases.
2016-07-31 15:20:22 +02:00
Benoit Steiner
02fe89f5ef
half implementation has been moved to half_impl namespace
2016-07-29 15:09:34 -07:00
Benoit Steiner
2693fd54bf
bug #1266 : half implementation has been moved to half_impl namespace
2016-07-29 13:45:56 -07:00
Christoph Hertzberg
c5b893f434
bug #1266 : half implementation has been moved to half_impl namespace
2016-07-29 18:36:08 +02:00
Silvio Traversaro
5e51a361fe
Modify FindEigen3.cmake to find Eigen3Config.cmake
2016-07-29 08:59:38 +00:00
klimpel
ca5effa16c
MSVC-2010 is making problems with SFINAE again. But restricting to the variant for very old compilers (enum, template<typename C> for both function definitions) fixes the problem.
2016-07-28 15:58:17 +01:00
Gael Guennebaud
4057f9b1fc
Enable slice-vectorization+inner-unrolling when unaligned vectorization is allowed. For instance, this permits vectorizing 5x5 matrices (including product)
2016-07-28 13:47:33 +02:00
Gael Guennebaud
5fbe7aa604
Update and fix Cholesky mini benchmark
2016-07-28 11:26:30 +02:00
Gael Guennebaud
a72752caac
Vectorize more small product expressions by letting the general assignment logic decide on the sizes that are OK for vectorization.
2016-07-28 11:21:07 +02:00
Gael Guennebaud
cc2f6d68b1
bug #1264 : fix compilation
2016-07-27 23:30:47 +02:00
Gael Guennebaud
188590db82
Add instructions for LAPACKE+Accelerate
2016-07-27 15:07:35 +02:00
Gael Guennebaud
8972323c08
Bug 1261: add missing max(ADS,ADS) overload (same for min)
2016-07-27 14:52:48 +02:00
Gael Guennebaud
5d94dc85e5
bug #1260 : add regression test
2016-07-27 14:38:30 +02:00
Gael Guennebaud
0d7039319c
bug #1260 : remove doubtful specializations of ScalarBinaryOpTraits
2016-07-27 14:35:52 +02:00
Christoph Hertzberg
d3d7c6245d
Add brackets to block matrix and fixed some typos
2016-07-27 09:55:39 +02:00
Gael Guennebaud
0eece608b4
Added tag 3.3-beta2 for changeset f6b3cf8de9
2016-07-26 23:52:14 +02:00
Gael Guennebaud
f6b3cf8de9
Bump to 3.3-beta2
2016-07-26 23:51:59 +02:00
Gael Guennebaud
9d16b6e1cf
Formatting
2016-07-26 23:51:43 +02:00
Gael Guennebaud
fd2f989b1d
Fix testing of nearly zero input matrices.
2016-07-26 14:46:02 +02:00
Gael Guennebaud
c9e3e438eb
Add more very small numbers in the list of nearly "zero" values when testing SVD and EVD algorithms
2016-07-26 14:45:44 +02:00
Gael Guennebaud
95113cb15c
Improve robustness of 2x2 eigenvalue with shifting and scaling
2016-07-26 14:43:54 +02:00
Gael Guennebaud
7f7e84aa36
Fix compilation with MKL support
2016-07-26 13:31:29 +02:00
Gael Guennebaud
429028b652
Typo.
2016-07-26 12:12:53 +02:00
Gael Guennebaud
6b89fa802c
Typos.
2016-07-26 12:08:04 +02:00
Gael Guennebaud
c581c8fa79
Fix with expression template scalar types.
2016-07-26 11:33:28 +02:00
Gael Guennebaud
8021aed89e
Split BLAS/LAPACK versus MKL documentation
2016-07-26 11:11:59 +02:00
Gael Guennebaud
757971e7ea
bug #1258 : fix compilation of Map<SparseMatrix>::coeffRef
2016-07-26 09:40:19 +02:00
Gael Guennebaud
c9425492c8
Update doc.
2016-07-25 18:41:26 +02:00
Gael Guennebaud
0592b4cfbf
merge
2016-07-25 18:20:22 +02:00
Gael Guennebaud
9c663e4ee8
Clean references to MKL in LAPACKe support.
2016-07-25 18:20:08 +02:00
Gael Guennebaud
0c06077efa
Rename MKL files
2016-07-25 18:00:47 +02:00
Gael Guennebaud
4d54e3dd33
bug #173 : remove dependency to MKL for LAPACKe backend.
2016-07-25 17:55:07 +02:00
Benoit Steiner
3d3d34e442
Deleted dead code.
2016-07-25 08:53:37 -07:00
Gael Guennebaud
34b483e25d
bug #1249 : enable use of __builtin_prefetch for GCC, clang, and ICC only.
2016-07-25 15:17:45 +02:00
Gael Guennebaud
6d5daf32f5
bug #1255 : comment out broken and unused line.
2016-07-25 14:48:30 +02:00
Gael Guennebaud
f9598d73b5
bug #1250 : fix pow() for AutoDiffScalar with custom nested scalar type.
2016-07-25 14:42:19 +02:00
Gael Guennebaud
fd1117f2be
Implement digits10 for mpreal
2016-07-25 14:38:55 +02:00
Gael Guennebaud
9908020d36
Add minimal support for Array<string>, and fix Tensor<string>
2016-07-25 14:25:56 +02:00
Gael Guennebaud
4184a3e544
Extend boost.multiprec unit test with ET on, complexes, and general/generalized eigenvalue solvers.
2016-07-25 12:36:22 +02:00
Gael Guennebaud
1b2049fbda
Enforce scalar types in calls to max/min (helps with expression template scalar types)
2016-07-25 12:35:10 +02:00
Gael Guennebaud
b118bc76eb
Add digits10 overload for complex.
2016-07-25 12:33:21 +02:00
Gael Guennebaud
c96af5381f
Remove custom complex division function cdiv.
2016-07-25 12:31:58 +02:00
Gael Guennebaud
e1c7c5968a
Update doc.
2016-07-25 11:18:04 +02:00
Gael Guennebaud
8fffc81606
Add NumTraits::digits10() function based on numeric_limits::digits10 and make use of it for printing matrices.
2016-07-25 11:13:01 +02:00
Gael Guennebaud
5f03584752
merge
2016-07-23 17:52:44 +02:00
Gael Guennebaud
1b0353c659
Fix misuse of dummy_precision in eigenvalues solvers
2016-07-23 17:52:31 +02:00
Benoit Steiner
c6b0de2c21
Improved partial reductions in more cases
2016-07-22 17:18:20 -07:00
Gael Guennebaud
72744d93ef
Allows the compiler to inline outer products (the change from default to dont-inline in changeset 737bed19c1
...
was not motivated)
2016-07-22 17:02:28 +02:00
Gael Guennebaud
32d95e86c9
merge
2016-07-22 16:43:12 +02:00
Gael Guennebaud
60d5980a41
add a note
2016-07-22 15:46:23 +02:00
Gael Guennebaud
d7a0e52478
Fix testing of log nearby 1
2016-07-22 15:44:26 +02:00
Gael Guennebaud
7acf23c14c
Truly split unit test.
2016-07-22 15:41:23 +02:00
Gael Guennebaud
24af67a6cc
Fix boostmultiprec for C++03
2016-07-22 15:30:54 +02:00
Gael Guennebaud
395c835f4b
Fix CUDA compilation
2016-07-22 15:30:24 +02:00
Gael Guennebaud
d075d122ea
Move half unit test from unsupported to main tests
2016-07-22 14:34:19 +02:00
Gael Guennebaud
47afc9a365
More cleaning in half:
...
- put its definition and functions in its own half_impl namespace such that the free functions do not pollute the Eigen namespace while still making them visible for half through ADL.
- expose Eigen::half through a using statement
- move operator<< from std to half_float namespace
2016-07-22 14:33:28 +02:00
Gael Guennebaud
0f350a8b7e
Fix CUDA compilation
2016-07-21 18:47:07 +02:00
Gael Guennebaud
bf91a44f4a
Use ADL and log10 for printing matrices.
2016-07-21 15:48:24 +02:00
Gael Guennebaud
82798162c0
Extend unit testing of half with ADL and arrays.
2016-07-21 15:47:21 +02:00
Gael Guennebaud
87fbda812f
Add missing log10 and random generator for half.
2016-07-21 15:46:45 +02:00
Gael Guennebaud
01d12d3e82
Some cleanup in half: standard functions should be defined in the namespace of the class half to make ADL work, and thus the global is* functions can be removed.
2016-07-21 15:10:48 +02:00
Gael Guennebaud
007edee1ac
Add a doc page summarizing the true speed of Eigen's decompositions.
2016-07-21 12:32:02 +02:00
Gael Guennebaud
9b76be9d21
Update benchmark for dense solver to stress least-squares pb, and to output a HTML table
2016-07-21 12:30:53 +02:00
Gael Guennebaud
72950effdf
enable testing of Boost.Multiprecision with expression templates
2016-07-20 18:21:30 +02:00
Yi Lin
7b4abc2b1d
Fixed a code comment error
2016-07-20 22:28:54 +08:00
Gael Guennebaud
b64b9d0172
Add a unit test to stress our solvers with Boost.Multiprecision
2016-07-20 15:20:14 +02:00
Gael Guennebaud
5e4dda8a12
Enable custom scalar types in some unit tests.
2016-07-20 15:19:17 +02:00
Gael Guennebaud
87d480d785
Make use of EIGEN_TEST_MAX_SIZE
2016-07-20 15:14:20 +02:00
Gael Guennebaud
7722913475
Fix ambiguous specialization with custom scalar type
2016-07-20 15:13:44 +02:00
Gael Guennebaud
fd057f86b3
Complete the coeff-wise math function table.
2016-07-20 12:14:10 +02:00
Gael Guennebaud
9e8476ef22
Add missing Eigen::rsqrt global function
2016-07-20 11:59:49 +02:00
Gael Guennebaud
4b4c296d6e
Simplify ScalarBinaryOpTraits by removing the Defined enum, and extend its documentation.
2016-07-20 09:56:39 +02:00
Gael Guennebaud
e3bf874c83
Workaround MSVC 2010 compilation issue.
2016-07-18 15:17:25 +02:00
Gael Guennebaud
0f89c6d6b5
Add a summary of possible values for EIGEN_COMP_MSVC
2016-07-18 15:16:13 +02:00
Gael Guennebaud
18884f17d7
Remove static constant declaration: this enforces compiler to generate costly code for thread safety.
2016-07-18 15:05:17 +02:00
Gael Guennebaud
79574e384e
Make scalar_product_op the default (instead of void)
2016-07-18 12:03:05 +02:00
Gael Guennebaud
6a3c451c1c
Permits call to explicit ctor.
2016-07-18 12:02:20 +02:00
Gael Guennebaud
0c3fe4aca5
merge
2016-07-18 10:44:15 +02:00
Gael Guennebaud
db9b154193
Add missing non-const reverse method in VectorwiseOp.
2016-07-16 15:19:28 +02:00
Gael Guennebaud
461cd819c2
Workaround VS2015 bug
2016-07-13 18:46:01 +02:00
Gael Guennebaud
5ea0864c81
Fix regression in a previous commit: some diagonal entry might not be treated by the 2x2 real preconditioner.
2016-07-13 18:37:54 +02:00
Benoit Steiner
20f7ef2f89
An evalTo expression is aligned iff both the lhs and the rhs are aligned.
2016-07-12 10:56:42 -07:00
Gael Guennebaud
b4343aa67e
Avoid division by very small entries when extracting singular values, and explicitly handle the 1x1 complex case.
2016-07-12 17:22:03 +02:00
Gael Guennebaud
e2aa58b631
Consider denormals as zero in makeJacobi and 2x2 SVD.
...
This also fixes serious issues with x387 for which values can be much smaller than the smallest denormal!
2016-07-12 17:21:03 +02:00
Gael Guennebaud
263993a7b6
Fix test for nearly null input
2016-07-12 17:19:26 +02:00
Gael Guennebaud
9ab35d8ba4
Fix compilation of doc
2016-07-12 16:47:39 +02:00
Gael Guennebaud
19614497ae
Add some doxygen's images to support both old and recent doxygen versions
...
(with some vague definitions of old and recent ;) )
2016-07-12 16:45:43 +02:00
Gael Guennebaud
c98bac2966
Manually add -std=c++11 to nvcc for old cmake versions
2016-07-12 09:29:18 +02:00
Benoit Steiner
013a904237
Pulled latest updates from trunk
2016-07-11 14:29:05 -07:00
Benoit Steiner
40eb97516c
reverted unintended change.
2016-07-11 14:28:03 -07:00
Benoit Steiner
03b71c273e
Made the packetmath test compile again. A better fix would be to move the special function tests to the unsupported directory where the code now resides.
2016-07-11 13:50:24 -07:00
Benoit Steiner
3a2dd352ae
Improved the contraction mapper to properly support tensor products
2016-07-11 13:43:41 -07:00
Benoit Steiner
0bc020be9d
Improved the detection of packet size in the tensor scan evaluator.
2016-07-11 12:14:56 -07:00
Gael Guennebaud
a96a7ce3f7
Move CUDA's special functions to SpecialFunctions module.
2016-07-11 18:39:11 +02:00
Gael Guennebaud
bec35f4c55
Clarify that SpecialFunctions is unsupported
2016-07-11 18:38:40 +02:00
Gael Guennebaud
fd60966310
merge
2016-07-11 18:11:47 +02:00
Gael Guennebaud
7d636349dc
Fix configuration of CUDA:
...
- preserve user defined CUDA_NVCC_FLAGS
- remove the -ansi flag that conflicts with -std=c++11
- do not add -std=c++11 if already there
2016-07-11 18:09:04 +02:00
klimpel
8b3fc31b55
compile fix (SFINAE variant apparently didn't work for all compilers) for the following compiler/platform:
...
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
Copyright (C) 2006 Free Software Foundation, Inc.
2016-07-11 17:42:22 +02:00
Gael Guennebaud
3e348fdcf9
Workaround MSVC bug
2016-07-11 15:24:52 +02:00
Gael Guennebaud
131ee4bb8e
Split test_slice_in_expr which seems to be huge for visual
2016-07-11 11:46:55 +02:00
Gael Guennebaud
194daa3048
Fix assertion (it did not make sense for static_val types)
2016-07-11 11:39:27 +02:00
Gael Guennebaud
18c35747ce
Emulate _BitScanReverse64 for 32 bits builds
2016-07-11 11:38:04 +02:00
Konstantinos Margaritis
ef05463fcf
Merged kmargar/eigen/tip into default, Altivec/VSX port should be working ok now.
2016-07-10 16:11:46 +03:00
Konstantinos Margaritis
9f7caa7e7d
minor fixes for big endian altivec/vsx
2016-07-10 07:05:10 -03:00
Christoph Hertzberg
3c795c6923
bug #1119 : Adjust call to ?gssvx for SuperLU 5
...
Also improved corresponding cmake module to detect versions 5.x
Based on patch by Christoph Grüninger.
2016-07-10 02:29:57 +02:00
Gael Guennebaud
57113e00f9
Relax strict equality
2016-07-09 23:37:11 +02:00
Gael Guennebaud
599f8ba617
Change runtime to compile-time conditional.
2016-07-08 11:39:43 +02:00
Gael Guennebaud
544935101a
Fix warnings
2016-07-08 11:38:52 +02:00
Gael Guennebaud
59bf2774a3
Fix warnings
2016-07-08 11:38:11 +02:00
Gael Guennebaud
2f7e2614e7
bug #1232 : refactor special functions as a new SpecialFunctions module, currently in unsupported/.
2016-07-08 11:13:55 +02:00
Gael Guennebaud
8b7431d8fd
fix compilation with c++11
2016-07-07 15:18:23 +02:00
Gael Guennebaud
69378eed0b
Split huge unit test
2016-07-07 15:18:04 +02:00
Gael Guennebaud
c684e37d32
Prevent division by zero.
2016-07-07 11:03:01 +02:00
Gael Guennebaud
179ebb88f9
Fix warning
2016-07-07 09:16:40 +02:00
Gael Guennebaud
5d2dada197
Fix warnings
2016-07-07 09:05:15 +02:00
Gael Guennebaud
f5e780fb05
split huge unit test
2016-07-07 08:59:59 +02:00
Gael Guennebaud
66917299a9
Add debug output
2016-07-06 22:27:15 +02:00
Gael Guennebaud
5ca2457fa5
Fix unit test.
2016-07-06 22:25:24 +02:00
Gael Guennebaud
9b68ed4537
Relax is_equal to is_approx because scaling might modify last bit.
2016-07-06 15:02:49 +02:00
Gael Guennebaud
c3b23d7dbf
Fix support of Intel's VML
2016-07-06 14:07:32 +02:00
Gael Guennebaud
8ec4d6480d
Fix compilation with recent updates of icc 2016
2016-07-06 14:07:14 +02:00
Gael Guennebaud
5b3a6f51d3
Improve numerical robustness of RealSchur: add scaling and compare sub-diag entries to largest diagonal entry instead of the 2 neighbors.
2016-07-06 13:45:30 +02:00
Gael Guennebaud
d2b5a19e0f
Fix warning.
2016-07-06 11:05:30 +02:00
Gael Guennebaud
367ef66af3
Re-enable some specializations for Assignment<.,Product<>>
2016-07-05 22:58:14 +02:00
Gael Guennebaud
155d8d8603
Fix compilation with msvc
2016-07-05 14:43:42 +02:00
Gael Guennebaud
43696ede8f
Revert unwanted changes.
2016-07-04 22:40:36 +02:00
Gael Guennebaud
b39fd8217f
Fix nesting of SolveWithGuess, and add unit test.
2016-07-04 17:47:47 +02:00
Gael Guennebaud
ec02af1047
Fix template resolution.
2016-07-04 17:37:33 +02:00
Gael Guennebaud
fbcfc2f862
Add unit test for solveWithGuess, and fix template resolution.
2016-07-04 17:19:38 +02:00
Gael Guennebaud
7f7839c12f
Add documentation and examples for inplace decomposition.
2016-07-04 17:18:26 +02:00
Gael Guennebaud
32a41ee659
bug #707 : add inplace decomposition through Ref<> for Cholesky, LU and QR decompositions.
2016-07-04 15:13:35 +02:00
Gael Guennebaud
75e80792cc
Update relevant list of changesets.
2016-07-04 14:32:34 +02:00
Gael Guennebaud
dacc544b84
asm escape was not strong enough to prevent too aggressive compiler optimization; let's fall back to no-inline.
2016-07-04 14:32:15 +02:00
Gael Guennebaud
b74e45906c
Few fixes in perf-monitoring.
2016-07-04 14:30:50 +02:00
Gael Guennebaud
ce9fc0ce14
fix clang compilation
2016-07-04 12:59:02 +02:00
Gael Guennebaud
440020474c
Workaround compilation issue with msvc
2016-07-04 12:49:19 +02:00
Gael Guennebaud
e61cee7a50
Fix compilation of some unit tests with msvc
2016-07-04 11:49:03 +02:00
Gael Guennebaud
91b3039013
Change the semantic of the last template parameter of Assignment from "Scalar" to "SFINAE" only.
...
The previous "Scalar" semantic was obsolete since we allow for different scalar types in the source and destination expressions.
One can still specialize on scalar types through SFINAE and/or assignment functor.
2016-07-04 11:02:00 +02:00
Gael Guennebaud
0fa9e4a15c
Fix performance regression in dgemm introduced by changeset 5d51a7f12c
2016-07-02 17:35:08 +02:00
Gael Guennebaud
672076db5d
Fix performance regression introduced in changeset e56aabf205
...
.
Register blocking sizes are better handled by the cache size heuristics.
The current code introduced very small blocks, for instance for 9x9 matrix,
thus killing performance.
2016-07-02 15:40:56 +02:00
Igor Babuschkin
78f37ca03c
Expose real and imag methods on Tensors
2016-07-01 17:34:31 +01:00
Gael Guennebaud
d161b8f03a
Merged in carpent/eigen (pull request PR-204)
...
Use complete nested namespace Eigen::internal, thus making the custom static assertion macros available outside the Eigen's namespace.
2016-07-01 09:56:44 +02:00
Benoit Steiner
cb2d8b8fa6
Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu.
2016-06-29 15:42:01 -07:00
Benoit Steiner
b2a47641ce
Made the code compile when using CUDA architecture < 300
2016-06-29 15:32:47 -07:00
Benoit Steiner
b047ca765f
Merged in ibab/eigen/fix-tensor-scan-gpu (pull request PR-205)
...
Add missing CUDA kernel to tensor scan op
2016-06-29 14:52:19 -07:00
Igor Babuschkin
85699850d9
Add missing CUDA kernel to tensor scan op
...
The TensorScanOp implementation was missing a CUDA kernel launch.
This adds a simple placeholder implementation.
2016-06-29 11:54:35 +01:00
Justin Carpentier
6126886a67
Use complete nested namespace Eigen::internal
2016-06-28 20:09:25 +02:00
Benoit Jacob
328c5d876a
Undo changes in AltiVec --- I don't have any way to test there.
2016-06-28 11:15:25 -04:00
Benoit Jacob
38fb606052
Avoid global variables with static constructors in NEON/Complex.h
2016-06-28 11:12:49 -04:00
Benoit Steiner
1a9f92e781
Added a test to validate the tensor scan evaluation on GPU. The test is currently disabled since the code segfaults.
2016-06-27 16:02:52 -07:00
Benoit Steiner
75c333f94c
Don't store the scan axis in the evaluator of the tensor scan operation since it's only used in the constructor.
...
Also avoid taking references to values that may become stale after a copy construction.
2016-06-27 10:32:38 -07:00
xantares
c52c8d76da
Disable pkgconfig only for native windows builds
...
ie enable it for MinGW
2016-06-27 16:43:08 +00:00
Gael Guennebaud
d937a420a2
Fix compilation with MSVC by using our portable numext::log1p implementation.
2016-08-22 15:44:21 +02:00
Gael Guennebaud
2d5731e40a
bug #1270 : bypass custom asm for pmadd and recent clang version
2016-08-22 15:38:03 +02:00
Gael Guennebaud
49b005181a
Define EIGEN_COMP_CLANG to clang version as major*100+minor (e.g., 307 corresponds to clang 3.7)
2016-08-22 15:37:05 +02:00
Gael Guennebaud
130f891bb0
bug #1278 : ease parsing
2016-08-22 15:00:29 +02:00
Benoit Steiner
7944d4431f
Made the cost model cwiseMax and cwiseMin methods consts to help the PowerPC cuda compiler compile this code.
2016-08-18 13:46:36 -07:00
Benoit Steiner
647a51b426
Force the inlining of a simple accessor.
2016-08-18 12:31:02 -07:00
Benoit Steiner
a452dedb4f
Merged in ibab/eigen/double-tensor-reduction (pull request PR-216)
...
Enable efficient Tensor reduction for doubles on the GPU (continued)
2016-08-18 12:29:54 -07:00
Igor Babuschkin
18c67df31c
Fix remaining CUDA >= 300 checks
2016-08-18 17:18:30 +01:00
Igor Babuschkin
1569a7d7ab
Add the necessary CUDA >= 300 checks back
2016-08-18 17:15:12 +01:00
Benoit Steiner
2b17f34574
Properly detect the type of the result of a contraction.
2016-08-16 16:00:30 -07:00
Igor Babuschkin
841e075154
Remove CUDA >= 300 checks and enable outer reduction for doubles
2016-08-06 18:07:50 +01:00
Igor Babuschkin
0425118e2a
Merge upstream changes
2016-08-05 14:34:57 +01:00
Igor Babuschkin
9537e8b118
Make use of atomicExch for atomicExchCustom
2016-08-05 14:29:58 +01:00
Igor Babuschkin
eeb0d880ee
Enable efficient Tensor reduction for doubles
2016-07-01 19:08:26 +01:00
Gael Guennebaud
d476cadbb8
bug #1247 : fix regression in compilation of pow(integer,integer), and add respective unit tests.
2016-06-25 10:12:06 +02:00
Gael Guennebaud
cfff370549
Fix hyperbolic functions for autodiff.
2016-06-24 23:21:35 +02:00
Gael Guennebaud
c50c73cae2
Fix missing specialization.
2016-06-24 23:10:39 +02:00
Gael Guennebaud
3852351793
merge pull request 198
2016-06-24 11:48:17 +02:00
Gael Guennebaud
6dd9077070
Fix some unused typedef warnings.
2016-06-24 11:34:21 +02:00
Gael Guennebaud
ce90647fa5
Fix NumTraits<AutoDiff>
2016-06-24 11:34:02 +02:00
Gael Guennebaud
fa39f81b48
Fix instantiation of ScalarBinaryOpTraits for AutoDiff.
2016-06-24 11:33:30 +02:00
Gael Guennebaud
cd577a275c
Relax promote_scalar_arg logic to enable promotion to Expr::Scalar if conversion to Expr::Literal fails.
...
This is useful to cancel expression template at the scalar level, e.g. with AutoDiff<AutoDiff<>>.
This patch also defers calls to NumTraits in cases for which types are not directly compatible.
2016-06-24 11:28:54 +02:00
Gael Guennebaud
deb45ad4bc
bug #1245 : fix compilation with msvc
2016-06-24 09:52:25 +02:00
Rasmus Munk Larsen
a9c1e4d7b7
Return -1 from CurrentThreadId when called by thread outside the pool.
2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen
d39df320d2
Resolve merge.
2016-06-23 15:08:03 -07:00
Gael Guennebaud
361dbd246d
Add unit test for printing empty tensors
2016-06-23 18:54:30 +02:00
Gael Guennebaud
360a743a10
bug #1241 : do not emit anything for empty tensors
2016-06-23 18:47:31 +02:00
Gael Guennebaud
55fc04e8b5
Fix operator priority
2016-06-23 15:36:42 +02:00
Gael Guennebaud
bf2d5edecc
Fix warning.
2016-06-23 15:35:17 +02:00
Gael Guennebaud
7c6561485a
merge PR 194
2016-06-23 15:29:57 +02:00
Konstantinos Margaritis
be107e387b
fix compilation with clang 3.9, fix performance with pset1, use vector operators instead of intrinsics in some cases
2016-06-23 10:19:05 -03:00
Gael Guennebaud
76faf4a965
Introduce a NumTraits<T>::Literal type to be used for literals, and
...
improve mixing type support in operations between arrays and scalars:
- 2 * ArrayXcf is now optimized in the sense that the integer 2 is properly promoted to a float instead of a complex<float> (fix a regression)
- 2.1 * ArrayXi is now forbidden (previously, 2.1 was converted to 2)
- This mechanism should be applicable to any custom scalar type, assuming NumTraits<T>::Literal is properly defined (it defaults to T)
2016-06-23 14:27:20 +02:00
Gael Guennebaud
a3f7edf7e7
Bug 1242: fix comma init with empty matrices.
2016-06-23 10:25:04 +02:00
Benoit Steiner
a29a2cb4ff
Silenced a couple of compilation warnings generated by xcode
2016-06-22 16:43:02 -07:00
Benoit Steiner
f8fcd6b32d
Turned the constructor of the PerThread struct into what is effectively a constant expression to make the code compatible with a wider range of compilers
2016-06-22 16:03:11 -07:00
Benoit Steiner
c58df31747
Handle empty tensors in the print functions
2016-06-21 09:22:43 -07:00
Benoit Steiner
de32f8d656
Fixed the printing of rank-0 tensors
2016-06-20 10:46:45 -07:00
Konstantinos Margaritis
8c34b5a0e3
mostly cleanups and modernizing code
2016-06-19 16:13:17 -03:00
Konstantinos Margaritis
b410d46482
mostly cleanups and modernizing code
2016-06-19 16:12:52 -03:00
Konstantinos Margaritis
b80379bda0
fixed pexp<Packet2d>, was failing tests
2016-06-19 16:11:58 -03:00
Tal Hadad
8e198d6835
Complete docs and add ostream operator for EulerAngles.
2016-06-19 20:42:45 +03:00
Benoit Steiner
b055590e91
Made log1p_impl usable inside a GPU kernel
2016-06-16 11:37:40 -07:00
Geoffrey Lalonde
72c95383e0
Add autodiff coverage for standard library hyperbolic functions, and tests.
...
* * *
Corrected tanh derivative, moved test definitions.
* * *
Added more test cases, removed lingering lines
2016-06-15 23:33:19 -07:00
Gael Guennebaud
67c12531e5
Fix warnings with gcc
2016-06-15 18:11:33 +02:00
Gael Guennebaud
eb91345d64
Move scalar/expr to ArrayBase and fix documentation
2016-06-15 15:22:03 +02:00
Gael Guennebaud
4794834397
Propagate functor to ScalarBinaryOpTraits
2016-06-15 09:58:49 +02:00
Gael Guennebaud
c55035b9c0
Include the cost of stores in unrolling of triangular expressions.
2016-06-15 09:57:33 +02:00
Benoit Steiner
7d495d890a
Merged in ibab/eigen (pull request PR-197)
...
Implement exclusive scan option for Tensor library
2016-06-14 17:54:59 -07:00
Benoit Steiner
aedc5be1d6
Avoid generating pseudo random numbers that are multiple of 5: this helps
...
spread the load over multiple cpus without having to rely on work stealing.
2016-06-14 17:51:47 -07:00
Gael Guennebaud
4e7c3af874
Cleanup useless helper: internal::product_result_scalar
2016-06-15 00:04:10 +02:00
Gael Guennebaud
101ea26f5e
Include the cost of stores in unrolling (also fix infinite unrolling with expression costing 0 like Constant)
2016-06-15 00:01:16 +02:00
Igor Babuschkin
c4d10e921f
Implement exclusive scan option
2016-06-14 19:44:07 +01:00
Gael Guennebaud
76236cdea4
merge
2016-06-14 15:33:47 +02:00
Gael Guennebaud
1004c4df99
Cleanup unused functors.
2016-06-14 15:27:28 +02:00
Gael Guennebaud
70dad84b73
Generalize expr/expr and scalar/expr wrt scalar types.
2016-06-14 15:26:37 +02:00
Gael Guennebaud
62134082aa
Update AutoDiffScalar wrt to scalar-multiple.
2016-06-14 15:06:35 +02:00
Gael Guennebaud
5d38203735
Update Tensor module to use bind1st_op and bind2nd_op
2016-06-14 15:06:03 +02:00
Gael Guennebaud
396d9cfb6e
Generalize expr.pow(scalar), pow(expr,scalar) and pow(scalar,expr).
...
Internal: scalar_pow_op (unary) is removed, and scalar_binary_pow_op is renamed scalar_pow_op.
2016-06-14 14:10:07 +02:00
Gael Guennebaud
a9bb653a68
Update doc (scalar_add_op is now deprecated)
2016-06-14 12:07:00 +02:00
Gael Guennebaud
a8c08e8b8e
Implement expr+scalar, scalar+expr, expr-scalar, and scalar-expr as binary expressions, and generalize supported scalar types.
...
The following functors are now deprecated: scalar_add_op, scalar_sub_op, and scalar_rsub_op.
2016-06-14 12:06:10 +02:00
Gael Guennebaud
756ac4a93d
Fix doc.
2016-06-14 12:03:39 +02:00
Gael Guennebaud
f925dba3d9
Fix compilation of BVH example
2016-06-14 11:32:09 +02:00
Gael Guennebaud
12350d3ac7
Add unit test for AlignedBox::center
2016-06-14 11:31:52 +02:00
Gael Guennebaud
bcc0f38f98
Add unittesting plugins to scalar_product_op and scalar_quotient_op to help checking that types are properly propagated.
2016-06-14 11:31:27 +02:00
Gael Guennebaud
f57fd78e30
Generalize coeff-wise sparse products to support different scalar types
2016-06-14 11:29:54 +02:00
Gael Guennebaud
f5b1c73945
Set cost of constant expression to 0 (the cost should be amortized through the expression)
2016-06-14 11:29:06 +02:00
Gael Guennebaud
deb8306e60
Move MatrixBase::operator*(UniformScaling) as a free function in Scaling.h, and fix return type.
2016-06-14 11:28:03 +02:00
Gael Guennebaud
64fcfd314f
Implement scalar multiples and division by a scalar as a binary-expression with a constant expression.
...
This slightly complexifies the type of the expressions and implies that we now have to distinguish between scalar*expr and expr*scalar to catch scalar-multiple expression (e.g., see BlasUtil.h), but this brings several advantages:
- it makes it clear on which side the scalar is applied,
- it clearly reflects that we are dealing with a binary-expression,
- the complexity of the type is hidden through macros defined at the end of Macros.h,
- distinguishing between "scalar op expr" and "expr op scalar" is important to support non commutative fields (like quaternions)
- "scalar op expr" is now fully equivalent to "ConstantExpr(scalar) op expr"
- scalar_multiple_op, scalar_quotient1_op and scalar_quotient2_op are not used anymore in officially supported modules (still used in Tensor)
2016-06-14 11:26:57 +02:00
Gael Guennebaud
39781dc1e2
Fix compilation of evaluator unit test
2016-06-14 11:03:26 +02:00
Tal Hadad
6edfe8771b
Little bit docs
2016-06-13 22:03:19 +03:00
Tal Hadad
6e1c086593
Add static assertion
2016-06-13 21:55:17 +03:00
Gael Guennebaud
3c12e24164
Add bind1st_op and bind2nd_op helpers to turn binary functors into unary ones, and implement scalar_multiple2 and scalar_quotient2 on top of them.
2016-06-13 16:18:59 +02:00
Gael Guennebaud
7a9ef7bbb4
Add default template parameters for the second scalar type of binary functors.
...
This enhances backward compatibility.
2016-06-13 16:17:23 +02:00
Gael Guennebaud
2ca2ffb65e
check for mixing types in "array / scalar" expressions
2016-06-13 16:15:32 +02:00
Gael Guennebaud
4c61f00838
Add missing explicit scalar conversion
2016-06-12 22:42:13 +02:00
Tal Hadad
06206482d9
More docs, and minor code fixes
2016-06-12 23:40:17 +03:00
Gael Guennebaud
a3a4714aba
Add debug output.
2016-06-11 14:41:53 +02:00
Gael Guennebaud
83904a21c1
Make sure T(i+1,i)==0 when diagonalizing T(i:i+1,i:i+1)
2016-06-11 14:41:36 +02:00
Benoit Steiner
65d33e5898
Merged in ibab/eigen (pull request PR-195)
...
Add small fixes to TensorScanOp
2016-06-10 19:31:17 -07:00
Benoit Steiner
a05607875a
Don't refer to the half2 type unless it's been defined
2016-06-10 11:53:56 -07:00
Gael Guennebaud
fabae6c9a1
Cleanup
2016-06-10 15:58:33 +02:00
Gael Guennebaud
5de8d7036b
Add real.pow(complex), complex.pow(real) unit tests.
2016-06-10 15:58:22 +02:00
Gael Guennebaud
5fdd703629
Enable mixing types in numext::pow
2016-06-10 15:58:04 +02:00
Gael Guennebaud
2e238bafb6
Bug 279: enable mixing types for comparisons, min, and max.
2016-06-10 15:05:43 +02:00
Gael Guennebaud
0028049380
bug #1240 : Remove any assumption on NEON vector types.
2016-06-09 23:08:11 +02:00
Igor Babuschkin
86aedc9282
Add small fixes to TensorScanOp
2016-06-07 20:06:38 +01:00
Christoph Hertzberg
db0118342c
Fixed compilation of BVH_Example (required for make doc)
2016-06-07 19:17:18 +02:00
Benoit Steiner
84b2060a9e
Fixed compilation error with gcc 4.4
2016-06-06 17:16:19 -07:00
Gael Guennebaud
2c462f4201
Clean handling for void type in EIGEN_CHECK_BINARY_COMPATIBILIY
2016-06-06 23:11:38 +02:00
Gael Guennebaud
3d71d3918e
Disable shortcuts for res ?= prod when the scalar types do not match exactly.
2016-06-06 23:10:55 +02:00
Benoit Steiner
7ef9f47b58
Misc small improvements to the reduction code.
2016-06-06 14:09:46 -07:00
Benoit Steiner
ea75dba201
Added missing EIGEN_DEVICE_FUNC qualifiers to the unary array ops
2016-06-06 13:32:28 -07:00
Benoit Steiner
33f0340188
Implement result_of for the new ternary functors
2016-06-06 12:06:42 -07:00
Tal Hadad
e30133e439
Doc EulerAngles class, and minor fixes.
2016-06-06 22:01:40 +03:00
Gael Guennebaud
df24f4a01d
bug #1201 : improve code generation of affine*vec with MSVC
2016-06-06 16:46:46 +02:00
Benoit Steiner
9137f560f0
Moved assertions to the constructor to make the code more portable
2016-06-06 07:26:48 -07:00
Gael Guennebaud
66e99ab6a1
Relax mixing-type constraints for binary coefficient-wise operators:
...
- Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
- Remove the "functor_is_product_like" helper (was pretty ugly)
- Currently, OP is not used, but it is available to the user for fine grained tuning
- Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
- TODO: generalize all other binary operators (comparisons,pow,etc.)
- TODO: handle "scalar op array" operators (currently only * is handled)
- TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
2016-06-06 15:11:41 +02:00
Benoit Steiner
1f1e0b9e30
Silenced compilation warning
2016-06-05 12:59:11 -07:00
Benoit Steiner
5b95b4daf9
Moved static assertions into the class constructor to make the code more portable
2016-06-05 12:57:48 -07:00
Christoph Hertzberg
d7e3e4bb04
Removed executable bits from header files.
2016-06-05 10:15:41 +02:00
Eugene Brevdo
c53687dd14
Add randomized properties tests for betainc special function.
2016-06-05 11:10:30 -07:00
Rasmus Munk Larsen
f1f2ff8208
size_t -> int
2016-06-03 18:06:37 -07:00
Rasmus Munk Larsen
76308e7fd2
Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool.
2016-06-03 16:28:58 -07:00
Sean Templeton
bd21243821
Fix compile errors initializing packets on ARM DS-5 5.20
...
The ARM DS-5 5.20 compiler fails compiling with the following errors:
"src/Core/arch/NEON/PacketMath.h", line 113: Error: #146 : too many initializer values
Packet4f countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3);
^
"src/Core/arch/NEON/PacketMath.h", line 118: Error: #146 : too many initializer values
Packet4i countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3);
^
"src/Core/arch/NEON/Complex.h", line 30: Error: #146 : too many initializer values
static uint32x4_t p4ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET4(0x00000000, 0x80000000, 0x00000000, 0x80000000);
^
"src/Core/arch/NEON/Complex.h", line 31: Error: #146 : too many initializer values
static uint32x2_t p2ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET2(0x00000000, 0x80000000);
^
The vectors are implemented as two doubles, hence the too many initializer values error.
Changed the code to use intrinsic load functions which all compilers
implementing NEON should have.
2016-06-03 10:51:35 -05:00
Gael Guennebaud
1fc2746417
Make Array's ctor/assignment noexcept
2016-06-09 22:52:37 +02:00
Benoit Steiner
37638dafd7
Simplified the code that dispatches vectorized reductions on GPU
2016-06-09 10:29:52 -07:00
Benoit Steiner
66796e843d
Fixed definition of some of the reducer_traits
2016-06-09 08:50:01 -07:00
Benoit Steiner
4434b16694
Pulled latest updates from trunk
2016-06-09 08:25:47 -07:00
Benoit Steiner
14a112ee15
Use signed integers more consistently to encode the number of threads to use to evaluate a tensor expression.
2016-06-09 08:25:22 -07:00
Benoit Steiner
8f92c26319
Improved code formatting
2016-06-09 08:23:42 -07:00
Benoit Steiner
aa33446dac
Improved support for vectorization of 16-bit floats
2016-06-09 08:22:27 -07:00
Gael Guennebaud
e2b3836326
Include recent changesets that played with product's kernel
2016-06-09 17:13:33 +02:00
Gael Guennebaud
2bd59b0e0d
Take advantage that T is already diagonal in the extraction of generalized complex eigenvalues.
2016-06-09 17:12:03 +02:00
Gael Guennebaud
c1f9ca9254
Update RealQZ to reduce 2x2 diagonal block of T corresponding to non reduced diagonal block of S to positive diagonal form.
...
This step involves a real 2x2 SVD problem. The respective routine is thus in src/misc/ to be shared by both EVD and SVD modules.
2016-06-09 17:11:03 +02:00
Gael Guennebaud
15890c304e
Add unit test for non symmetric generalized eigenvalues
2016-06-09 16:17:27 +02:00
Gael Guennebaud
a20d2ec1c0
Fix shadow variable, and indexing.
2016-06-09 16:16:22 +02:00
Abhijit Kundu
0beabb4776
Fixed type conversion from int
2016-06-08 16:12:04 -04:00
Gael Guennebaud
df095cab10
Fixes for PARDISO: warnings, and defaults to metis+ in-core mode.
2016-06-08 18:31:19 +02:00
Gael Guennebaud
9fc8379328
Fix extraction of complex eigenvalue pairs in real generalized eigenvalue problems.
2016-06-08 16:39:11 +02:00
Christoph Hertzberg
9dd9d58273
Copied a regression test from 3.2 branch.
2016-06-08 15:36:42 +02:00
Benoit Steiner
8fd57a97f2
Enable the vectorization of adds and mults of fp16
2016-06-07 18:22:18 -07:00
Benoit Steiner
d6d39c7ddb
Added missing EIGEN_DEVICE_FUNC
2016-06-07 14:35:08 -07:00
Gael Guennebaud
8d97ba6b22
bug #725 : make move ctor/assignment noexcept.
2016-06-03 14:28:25 +02:00
Gael Guennebaud
e8b922ca63
Fix MatrixFunctions module.
2016-06-03 09:21:35 +02:00
Gael Guennebaud
82293f38d6
Fix unit test.
2016-06-03 08:12:14 +02:00
Gael Guennebaud
fe62c06d9b
Fix compilation.
2016-06-03 07:47:38 +02:00
Gael Guennebaud
969b8959a0
Fix compilation: Matrix does not indirectly live in the internal namespace anymore!
2016-06-03 07:44:58 +02:00
Gael Guennebaud
f2c2465acc
Fix function dependencies
2016-06-03 07:44:18 +02:00
Benoit Steiner
c3c8ad8046
Align the first element of the Waiter struct instead of padding it. This reduces its memory footprint a bit while achieving the goal of preventing false sharing
2016-06-02 21:17:41 -07:00
Eugene Brevdo
39baff850c
Add TernaryFunctors and the betainc SpecialFunction.
...
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.
Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.
Added unit tests in array.cpp and cxx11_tensor_cuda.cu
Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe. and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.
2016-06-02 17:04:19 -07:00
Benoit Steiner
02db4e1a82
Disable the tensor tests when using msvc since older versions of the compiler fail to handle this code
2016-06-04 08:21:17 -07:00
Benoit Steiner
c21eaedce6
Use array_prod to compute the number of elements contained in the input tensor expression
2016-06-04 07:47:04 -07:00
Benoit Steiner
36a4500822
Merged in ibab/eigen (pull request PR-192)
...
Add generic scan method
2016-06-03 17:28:33 -07:00
Benoit Steiner
c2a102345f
Improved the performance of full reductions.
...
AFTER:
BM_fullReduction/10 4541 4543 154017 21.0M items/s
BM_fullReduction/64 5191 5193 100000 752.5M items/s
BM_fullReduction/512 9588 9588 71361 25.5G items/s
BM_fullReduction/4k 244314 244281 2863 64.0G items/s
BM_fullReduction/5k 359382 359363 1946 64.8G items/s
BEFORE:
BM_fullReduction/10 9085 9087 74395 10.5M items/s
BM_fullReduction/64 9478 9478 72014 412.1M items/s
BM_fullReduction/512 14643 14646 46902 16.7G items/s
BM_fullReduction/4k 260338 260384 2678 60.0G items/s
BM_fullReduction/5k 385076 385178 1818 60.5G items/s
2016-06-03 17:27:08 -07:00
Igor Babuschkin
dc03b8f3a1
Add generic scan method
2016-06-03 17:37:04 +01:00
Gael Guennebaud
5b77481d58
merge
2016-06-02 22:21:45 +02:00
Gael Guennebaud
53feb73b45
Remove dead code.
2016-06-02 22:19:55 +02:00
Gael Guennebaud
2c00ac0b53
Implement generic scalar*expr and expr*scalar operator based on scalar_product_traits.
...
This is especially useful for custom scalar types, e.g., to enable float*expr<multi_prec> without conversion.
2016-06-02 22:16:37 +02:00
Rasmus Munk Larsen
811aadbe00
Add syntactic sugar to Eigen tensors to allow more natural syntax.
...
Specifically, this enables expressions involving:
scalar + tensor
scalar * tensor
scalar / tensor
scalar - tensor
2016-06-02 12:41:28 -07:00
Tal Hadad
52e4cbf539
Merged eigen/eigen into default
2016-06-02 22:15:20 +03:00
Tal Hadad
2aaaf22623
Fix Gael reports (except documentation)
...
- "Scalar angle(int) const" should be "const Vector& angles() const"
- then method "coeffs" could be removed.
- avoid one letter names like h, p, r -> use alpha(), beta(), gamma() ;)
- about the "fromRotation" methods:
- replace the ones which are not static by operator= (as in Quaternion)
- the others are actually static methods: use a capital F: FromRotation
- method "invert" should be removed.
- use a macro to define both float and double EulerAnglesXYZ* typedefs
- AddConstIf -> not used
- no need for NegateIfXor, compilers are extremely good at optimizing away branches based on compile time constants:
if(IsHeadingOpposite!=IsEven) res.alpha() = -res.alpha();
2016-06-02 22:12:57 +03:00
Benoit Steiner
6021c90fdf
Merged in ibab/eigen (pull request PR-189)
...
Add scan op to Tensor module
2016-06-02 08:08:11 -07:00
Gael Guennebaud
8b6f53222b
bug #1193 : fix lpNorm<Infinity> for empty input.
2016-06-02 15:29:59 +02:00
Gael Guennebaud
d616a81294
Disable MSVC's "decorated name length exceeded, name was truncated" warning in unit tests.
2016-06-02 14:48:38 +02:00
Gael Guennebaud
61a32f2a4c
Fix pointer to long conversion warning.
2016-06-02 14:45:45 +02:00
Igor Babuschkin
fbd7ed6ff7
Add tensor scan op
...
This is the initial implementation of a generic scan operation.
Based on this, cumsum and cumprod methods have been added to TensorBase.
2016-06-02 13:35:47 +01:00
Benoit Steiner
0ed08fd281
Use a single PacketSize variable
2016-06-01 21:19:05 -07:00
Benoit Steiner
8f6fedc55f
Fixed compilation warning
2016-06-01 21:14:46 -07:00
Benoit Steiner
c3cada38e2
Speedup a test
2016-06-01 21:13:00 -07:00
Gael Guennebaud
360e311b66
Doc: add some cross references (also fix empty macro argument warning)
2016-06-01 23:34:09 +02:00
Benoit Steiner
873e6ac54b
Silenced compilation warning generated by nvcc.
2016-06-01 14:20:50 -07:00
Benoit Steiner
d27b0ad4c8
Added support for mean reductions on fp16
2016-06-01 11:12:07 -07:00
Gael Guennebaud
cd221a62ee
Doc: start of a table summarizing coefficient-wise math functions.
2016-06-01 17:09:48 +02:00
Gael Guennebaud
3c69afca4c
Add missing ArrayBase::log1p
2016-06-01 17:08:47 +02:00
Gael Guennebaud
89099b0cf7
Expose log1p to Array.
2016-06-01 17:00:08 +02:00
Gael Guennebaud
afd33539dd
Doc: make the global unary math functions visible to doxygen (and document them)
2016-06-01 15:27:13 +02:00
Gael Guennebaud
77e652d8ad
Doc: improve documentation of Map<SparseMatrix>
2016-06-01 10:03:32 +02:00
Gael Guennebaud
da4970ead2
Doc: disable inlining of inherited members, workaround Doxygen's limited C++ parsing abilities, and improve doc of MapBase.
2016-06-01 09:38:49 +02:00
Benoit Steiner
099b354ca7
Pulled latest updates from trunk
2016-05-31 10:34:16 -07:00
Benoit Steiner
5aeb3687c4
Only enable optimized reductions of fp16 if the reduction functor supports them
2016-05-31 10:33:40 -07:00
Benoit Steiner
b6e306f189
Improved support for CUDA 8.0
2016-05-31 09:47:59 -07:00
Gael Guennebaud
1d3b253329
bug #1181 : help MSVC inlining.
2016-05-31 17:23:42 +02:00
Gael Guennebaud
d79eee05ef
Fix compilation with old icc
2016-05-31 17:13:51 +02:00
Gael Guennebaud
2c1b56f4c1
bug #1238 : fix SparseMatrix::sum() overload for un-compressed mode.
2016-05-31 10:56:53 +02:00
Benoit Steiner
c4bd3b1f21
Silenced some compilation warnings triggered by nvcc 8.0
2016-05-27 14:40:49 -07:00
Benoit Steiner
e2946d962d
Reimplement clamp as a static function.
2016-05-27 12:58:43 -07:00
Benoit Steiner
e96d36d4cd
Use NULL instead of nullptr to preserve the compatibility with cxx03
2016-05-27 12:54:06 -07:00
Benoit Steiner
abc815798b
Added a new operation to enable more powerful tensor indexing.
2016-05-27 12:22:25 -07:00
Benoit Steiner
5707537592
Fixed option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr' warning generated by nvcc 7.5
2016-05-27 10:47:53 -07:00
Benoit Steiner
3a5d6a3c38
Disable the use of MMX instructions since the code is broken on many platforms
2016-05-27 09:13:26 -07:00
Christoph Hertzberg
f2c86384f4
Cleaner implementation of dont_over_optimize.
2016-05-27 11:13:38 +02:00
Gael Guennebaud
22a035db95
Fix compilation when defaulting to row-major
2016-05-27 10:31:11 +02:00
Gael Guennebaud
e0cb73b46b
Fix compilation with old ICC version (use C99 types instead of C++11 ones)
2016-05-27 10:28:09 +02:00
Benoit Steiner
1ae2567861
Fixed some compilation warnings
2016-05-26 15:57:19 -07:00
Benoit Steiner
094f4a56c8
Deleted extra namespace
2016-05-26 14:49:51 -07:00
Benoit Steiner
1a47844529
Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g fp16 to float)
2016-05-26 14:37:09 -07:00
Benoit Steiner
36369ab63c
Resolved merge conflicts
2016-05-26 13:39:39 -07:00
Benoit Steiner
28fcb5ca2a
Merged latest reduction improvements
2016-05-26 12:19:33 -07:00
Benoit Steiner
b24cf21235
Merged latest code improvements
2016-05-26 11:57:50 -07:00
Benoit Steiner
c1c7f06c35
Improved the performance of inner reductions.
2016-05-26 11:53:59 -07:00
Benoit Steiner
22d02c9855
Improved the coverage of the fp16 reduction tests
2016-05-26 11:12:16 -07:00
Christoph Hertzberg
41dcd047d7
bug #1237 : Redefine eigen_assert instead of disabling assertions for documentation snippets
2016-05-26 18:13:33 +02:00
Benoit Steiner
8288b0aec2
Code cleanup.
2016-05-26 09:00:04 -07:00
Gael Guennebaud
7ff5fadcc0
Disable usage of MMX with msvc.
2016-05-26 17:58:46 +02:00
Gael Guennebaud
e8cef383b7
bug #1236 : fix possible integer overflow in density estimation.
2016-05-26 17:51:04 +02:00
Gael Guennebaud
35df3a32eb
Disabled GCC6's ignored-attributes warning in packetmath unit test.
2016-05-26 17:42:58 +02:00
Gael Guennebaud
db62719eda
Fix some conversion warnings in unit tests.
2016-05-26 17:42:12 +02:00
Gael Guennebaud
fdcad686ee
Fix numerous pointer-to-integer conversion warnings in unit tests.
2016-05-26 17:41:28 +02:00
Gael Guennebaud
30d97c03ce
Defer the allocation of the working space:
...
- it is not always needed,
- and this fixes a long-to-float conversion warning
2016-05-26 17:39:42 +02:00
Gael Guennebaud
e08f54e9eb
Fix copy ctor prototype.
2016-05-26 17:37:25 +02:00
Gael Guennebaud
c7f54b11ec
linspaced's divisor for integer is better stored as the underlying scalar type.
2016-05-26 17:36:54 +02:00
Gael Guennebaud
bebc5a2147
Fix/handle some int-to-long conversions.
2016-05-26 17:35:53 +02:00
Gael Guennebaud
00c29c2cae
Store permutation's determinant as char.
...
This also fixes some long to float conversion warnings
2016-05-26 17:34:23 +02:00
Gael Guennebaud
2f56d91063
Fix a pointer to integer conversion warning
2016-05-26 17:31:45 +02:00
Gael Guennebaud
2a44a70142
Handle some Index to int conversions in BLAS/LAPACK support.
2016-05-26 17:29:04 +02:00
Gael Guennebaud
f253e19296
Disable some long to float conversion warnings
2016-05-26 17:27:14 +02:00
Christoph Hertzberg
2ee306e44a
Temporary workaround for bug #1237 . The snippet (expectedly) failed with enabled assertions.
2016-05-26 16:16:41 +02:00
Gael Guennebaud
37197b602b
Remove debugging code.
2016-05-26 11:53:10 +02:00
Gael Guennebaud
27f0434233
Introduce internal's UIntPtr and IntPtr types for pointer to integer conversions.
...
This fixes "conversion from pointer to same-sized integral type" warnings by ICC.
Ideally, we would use the std::[u]intptr_t types all the time, but since they are C99/C++11 only,
let's be safe.
2016-05-26 10:52:12 +02:00
Gael Guennebaud
40e4637d79
Turn off ICC's conversion warning in is_convertible implementation
2016-05-26 10:48:43 +02:00
Gael Guennebaud
cc1ab64f29
Add missing inclusion of mmintrin.h
2016-05-26 09:51:50 +02:00
Benoit Steiner
2d7ed54ba2
Made the static storage class qualifier come first.
2016-05-25 22:16:15 -07:00
Benoit Steiner
e1fca8866e
Deleted unnecessary explicit qualifiers.
2016-05-25 22:15:26 -07:00
Benoit Steiner
9b0aaf5113
Don't mark inline functions as static since it confuses the ICC compiler
2016-05-25 22:10:11 -07:00
Benoit Steiner
3585ff585e
Silenced a compilation warning
2016-05-25 22:09:19 -07:00
Benoit Steiner
037a463fd5
Marked unused variables as such
2016-05-25 22:07:48 -07:00
Benoit Steiner
efeb89dcdb
Specify the rounding mode in the correct location
2016-05-25 17:53:24 -07:00
Benoit Steiner
457204cb83
Updated the README file for the tensor benchmarks
2016-05-25 16:13:41 -07:00
Benoit Steiner
0322c66a3f
Explicitly specify the rounding mode when converting floats to fp16
2016-05-25 15:56:15 -07:00
Benoit Steiner
3ac4045272
Made the IndexPair code compile in non cxx11 mode
2016-05-25 15:15:12 -07:00
Benoit Steiner
66556d0e05
Made the index pair list code more portable across various compilers
2016-05-25 14:34:27 -07:00
Benoit Steiner
034aa3b2c0
Improved the performance of tensor padding
2016-05-25 11:43:08 -07:00
Benoit Steiner
58026905ae
Added support for statically known lists of pairs of indices
2016-05-25 11:04:14 -07:00
Benoit Steiner
ed783872ab
Disable the use of MMX instructions on x86_64 since too many compilers only support them in 32bit mode
2016-05-25 08:27:26 -07:00
Benoit Steiner
bcfff64f9e
Use numext:: instead of std:: functions.
2016-05-25 08:08:21 -07:00
Gael Guennebaud
f57260a997
Fix typo in dont_over_optimize
2016-05-25 11:17:53 +02:00
Gael Guennebaud
2cd32be70b
Fix warning.
2016-05-25 11:15:54 +02:00
Gael Guennebaud
bbf9109e25
Fix compilation with ICC.
2016-05-25 10:00:55 +02:00
Gael Guennebaud
2a1bff67fd
Fix static/inline order.
2016-05-25 10:00:11 +02:00
Benoit Steiner
0835667329
There is no need to make the fp16 full reduction kernel a static function.
2016-05-24 23:11:56 -07:00
Benoit Steiner
b5d6b52a4d
Fixed compilation warning
2016-05-24 23:10:57 -07:00
Benoit Steiner
d041a528da
Cleaned up the fp16 code a little more
2016-05-24 22:43:26 -07:00
Benoit Steiner
cb26784d07
Pulled latest updates from trunk
2016-05-24 18:51:39 -07:00
Benoit Steiner
ff4a289572
Cleaned up the fp16 code
2016-05-24 18:50:09 -07:00
Gael Guennebaud
3f715e1701
update doc wrt to unaligned vectorization
2016-05-24 22:34:59 +02:00
Gael Guennebaud
9216abe28d
Document EIGEN_UNALIGNED_VECTORIZE.
2016-05-24 22:14:34 +02:00
Gael Guennebaud
0fd953c217
Workaround clang/llvm bug in code generation.
2016-05-24 21:55:46 +02:00
Gael Guennebaud
e68e165a23
bug #256 : enable vectorization with unaligned loads/stores.
...
This concerns all architectures and all sizes.
This new behavior can be disabled by defining EIGEN_UNALIGNED_VECTORIZE=0
2016-05-24 21:54:03 +02:00
Gael Guennebaud
78390e4189
Block<> should not disable vectorization based on inner-size, this is the responsibility of the assignment logic.
2016-05-24 17:14:01 +02:00
Gael Guennebaud
64bb7576eb
Clean propagation of Dest/Src alignments.
2016-05-24 17:12:12 +02:00
Benoit Jacob
40a16282c7
Remove now-unused protate PacketMath func
2016-05-24 11:01:18 -04:00
Benoit Jacob
6136f4fdd4
Remove the rotating kernel. It was only useful on some ARM CPUs (Qualcomm Krait) that are not as ubiquitous today as they were when I introduced it.
2016-05-24 10:00:32 -04:00
Benoit Steiner
e617711306
Don't attempt to use MMX instructions with visualstudio since they're only partially supported.
2016-05-24 06:43:58 -07:00
Benoit Steiner
334e76537f
Worked around missing clang intrinsic
2016-05-24 00:29:28 -07:00
Benoit Steiner
b517ab349b
Use the generic ploadquad intrinsics since it does the job
2016-05-24 00:11:17 -07:00
Benoit Steiner
646872cb3b
Worked around missing clang intrinsics
2016-05-24 00:07:08 -07:00
Benoit Steiner
3dfc391a61
Added missing EIGEN_DEVICE_FUNC qualifier
2016-05-23 20:56:59 -07:00
Benoit Steiner
3d0741f027
Include mmintrin.h to make it possible to use mmx instructions when needed. For example, this will enable the definition of a half packet for the Packet4f type.
2016-05-23 20:43:48 -07:00
Benoit Steiner
33a94f5dc7
Use the Index type instead of integers to specify the strides in pgather/pscatter
2016-05-23 20:37:30 -07:00
Benoit Steiner
6bc684ab6a
Added missing alignment in the fp16 packet traits
2016-05-23 20:32:30 -07:00
Benoit Steiner
283e33dea4
ptranspose is not a template.
2016-05-23 19:55:55 -07:00
Benoit Steiner
a5a3ba2b80
Avoid unnecessary float to double conversions
2016-05-23 17:16:09 -07:00
Benoit Steiner
5ba0ebe7c9
Avoid unnecessary float to double conversion.
2016-05-23 17:14:31 -07:00
Benoit Steiner
7d980d74e5
Started to vectorize the processing of 16bit floats on CPU.
2016-05-23 15:21:40 -07:00
Benoit Steiner
5d51a7f12c
Don't optimize the processing of the last rows of a matrix matrix product in cases that violate the assumptions made by the optimized code path.
2016-05-23 15:13:16 -07:00
Benoit Steiner
7aa5bc9558
Fixed a typo in the array.cpp test
2016-05-23 14:39:51 -07:00
Benoit Steiner
a09cbf9905
Merged in rmlarsen/eigen (pull request PR-188)
...
Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-23 12:55:12 -07:00
Christoph Hertzberg
88654762da
Replace multiple constructors of half-type by a generic/templated constructor. This fixes an incompatibility with long double, exposed by the previous commit.
2016-05-23 10:03:03 +02:00
Christoph Hertzberg
718521d5cf
Silenced several double-promotion warnings
2016-05-22 18:17:04 +02:00
Christoph Hertzberg
b5a7603822
fixed macro name
2016-05-22 16:49:29 +02:00
Christoph Hertzberg
25a03c02d6
Fix some sign-compare warnings
2016-05-22 16:42:27 +02:00
Christoph Hertzberg
0851d5d210
Identify clang++ even if it is not named llvm-clang++
2016-05-22 15:21:14 +02:00
Gael Guennebaud
6a15e14cda
Document EIGEN_MAX_CPP_VER and user controllable compiler features.
2016-05-20 15:26:09 +02:00
Gael Guennebaud
ccaace03c9
Make EIGEN_HAS_CONSTEXPR user configurable
2016-05-20 15:10:08 +02:00
Gael Guennebaud
c3410804cd
Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable
2016-05-20 15:05:38 +02:00
Gael Guennebaud
abd1c1af7a
Make EIGEN_HAS_STD_RESULT_OF user configurable
2016-05-20 15:01:27 +02:00
Gael Guennebaud
1395056fc0
Make EIGEN_HAS_C99_MATH user configurable
2016-05-20 14:58:19 +02:00
Gael Guennebaud
48bf5ec216
Make EIGEN_HAS_RVALUE_REFERENCES user configurable
2016-05-20 14:54:20 +02:00
Gael Guennebaud
f43ae88892
Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES
2016-05-20 14:48:51 +02:00
Gael Guennebaud
8d6bd5691b
polygamma is C99/C++11 only
2016-05-20 14:45:33 +02:00
Gael Guennebaud
998f2efc58
Add a EIGEN_MAX_CPP_VER option to limit the C++ version to be used.
2016-05-20 14:44:28 +02:00
Gael Guennebaud
c028d96089
Improve doc of special math functions
2016-05-20 14:18:48 +02:00
Gael Guennebaud
0ba32f99bd
Rename UniformRandom to UnitRandom.
2016-05-20 13:21:34 +02:00
Gael Guennebaud
7a9d9cde94
Fix coding practice in Quaternion::UniformRandom
2016-05-20 13:19:52 +02:00
Joseph Mirabel
eb0cc2573a
bug #823 : add static method to Quaternion for uniform random rotations.
2016-05-20 13:15:40 +02:00
Gael Guennebaud
2f656ce447
Remove std:: to enable custom scalar types.
2016-05-19 23:13:47 +02:00
Rasmus Larsen
b1e080c752
Merged eigen/eigen into default
2016-05-18 15:21:50 -07:00
Rasmus Munk Larsen
5624219b6b
Merge.
2016-05-18 15:16:06 -07:00
Rasmus Munk Larsen
7df811cfe5
Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-18 15:09:48 -07:00
Benoit Steiner
bb3ff8e9d9
Advertize the packet api of the tensor reducers iff the corresponding packet primitives are available.
2016-05-18 14:52:49 -07:00
Gael Guennebaud
84df9142e7
bug #1231 : fix compilation regression regarding complex_array/=real_array and add respective unit tests
2016-05-18 23:00:13 +02:00
Gael Guennebaud
21d692d054
Use coeff(i,j) instead of operator().
2016-05-18 17:09:20 +02:00
Gael Guennebaud
8456bbbadb
bug #1224 : fix regression in (dense*dense).sparseView() by specializing evaluator<SparseView<Product>> for sparse products only.
2016-05-18 16:53:28 +02:00
Gael Guennebaud
b507b82326
Use default sorting strategy for square products.
2016-05-18 16:51:54 +02:00
Gael Guennebaud
1fa15ceee6
Extend sparse*sparse product unit test to check that the expected implementation is used (conservative vs auto pruning).
2016-05-18 16:50:54 +02:00
Gael Guennebaud
548a487800
bug #1229 : bypass usage of Derived::Options which is available for plain matrix types only. Better use column-major storage anyway.
2016-05-18 16:44:05 +02:00
Gael Guennebaud
43790e009b
Pass argument by const ref instead of by value in pow(AutoDiffScalar...)
2016-05-18 16:28:02 +02:00
Gael Guennebaud
1fbfab27a9
bug #1223 : fix compilation of AutoDiffScalar's min/max operators, and add regression unit test.
2016-05-18 16:26:26 +02:00
Gael Guennebaud
448d9d943c
bug #1222 : fix compilation in AutoDiffScalar and add respective unit test
2016-05-18 16:00:11 +02:00
Gael Guennebaud
5a71eb5985
Bug 1213: add regression unit test.
2016-05-18 14:03:03 +02:00
Gael Guennebaud
747e3290c0
bug #1213 : rename some enums type for consistency.
2016-05-18 13:26:56 +02:00
Rasmus Munk Larsen
f519fca72b
Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor.
2016-05-17 16:06:00 -07:00
Benoit Steiner
86ae94462e
#if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly.
2016-05-17 14:06:15 -07:00
Benoit Steiner
997c335970
Fixed compilation error
2016-05-17 12:54:18 -07:00
Benoit Steiner
ebf6ada5ee
Fixed compilation error in the tensor thread pool
2016-05-17 12:33:46 -07:00
Rasmus Munk Larsen
0bb61b04ca
Merge upstream.
2016-05-17 10:26:10 -07:00
Rasmus Munk Larsen
0dbd68145f
Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h.
2016-05-17 10:25:19 -07:00
Rasmus Larsen
00228f2506
Merged eigen/eigen into default
2016-05-17 09:49:31 -07:00
Benoit Steiner
e7e64c3277
Enable the use of the packet api to evaluate tensor broadcasts. This speeds things up quite a bit:
...
Before:
BM_broadcasting/10 500000 3690 27.10 MFlops/s
BM_broadcasting/80 500000 4014 1594.24 MFlops/s
BM_broadcasting/640 100000 14770 27731.35 MFlops/s
BM_broadcasting/4K 5000 632711 39512.48 MFlops/s
After:
BM_broadcasting/10 500000 4287 23.33 MFlops/s
BM_broadcasting/80 500000 4455 1436.41 MFlops/s
BM_broadcasting/640 200000 10195 40173.01 MFlops/s
BM_broadcasting/4K 5000 423746 58997.57 MFlops/s
2016-05-17 09:24:35 -07:00
Benoit Steiner
5fa27574dd
Allow vectorized padding on GPU. This helps speed things up a little
...
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
2016-05-17 09:17:26 -07:00
Benoit Steiner
a910bcee43
Merged latest updates from trunk
2016-05-17 09:14:22 -07:00
Benoit Steiner
8d06c02ffd
Allow vectorized padding on GPU. This helps speed things up a little.
...
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
2016-05-17 09:13:27 -07:00
Benoit Steiner
86da77cb9b
Pulled latest updates from trunk.
2016-05-17 07:21:48 -07:00
Benoit Steiner
92fc6add43
Don't rely on c++11 extension when we don't have to.
2016-05-17 07:21:22 -07:00
Benoit Steiner
2d74ef9682
Avoid float to double conversion
2016-05-17 07:20:11 -07:00
David Dement
ccc7563ac5
made a fix to the GMRES solver so that it now correctly reports the error achieved in the solution process
2016-05-16 14:26:41 -04:00
Gael Guennebaud
575bc44c3f
Fix unit test.
2016-05-19 22:48:16 +02:00
Gael Guennebaud
ccb408ee6a
Improve unit tests of zeta, polygamma, and digamma
2016-05-19 18:34:41 +02:00
Gael Guennebaud
6761c64d60
zeta and polygamma are not unary functions, but binary ones.
2016-05-19 18:34:16 +02:00
Gael Guennebaud
7a54032408
zeta and digamma do not require C++11/C99
2016-05-19 17:36:47 +02:00
Gael Guennebaud
ce12562710
Add some c++11 flags in documentation
2016-05-19 17:35:30 +02:00
Gael Guennebaud
b6ed8244b4
bug #1201 : optimize affine*vector products
2016-05-19 16:09:15 +02:00
Gael Guennebaud
73693b5de6
bug #1221 : disable gcc 6 warning: ignoring attributes on template argument
2016-05-19 15:21:53 +02:00
Gael Guennebaud
df9a5e13c6
Fix SelfAdjointEigenSolver for some input expression types, and add new regression unit tests for sparse and selfadjointview inputs.
2016-05-19 13:07:33 +02:00
Gael Guennebaud
6a2916df80
DiagonalWrapper is a vector, so it must expose the LinearAccessBit flag.
2016-05-19 13:06:21 +02:00
Gael Guennebaud
a226f6af6b
Add support for SelfAdjointView::diagonal()
2016-05-19 13:05:33 +02:00
Gael Guennebaud
ee7da3c7c5
Fix SelfAdjointView::triangularView for complexes.
2016-05-19 13:01:51 +02:00
Gael Guennebaud
b6b8578a67
bug #1230 : add support for SelfadjointView::triangularView.
2016-05-19 11:36:38 +02:00
Benoit Steiner
a80d875916
Added missing costPerCoeff method
2016-05-16 09:31:10 -07:00
Benoit Steiner
83ef39e055
Turn on the cost model by default. This results in some significant speedups for smaller tensors. For example, below are the results for the various tensor reductions.
...
Before:
BM_colReduction_12T/10 1000000 1949 51.29 MFlops/s
BM_colReduction_12T/80 100000 15636 409.29 MFlops/s
BM_colReduction_12T/640 20000 95100 4307.01 MFlops/s
BM_colReduction_12T/4K 500 4573423 5466.36 MFlops/s
BM_colReduction_4T/10 1000000 1867 53.56 MFlops/s
BM_colReduction_4T/80 500000 5288 1210.11 MFlops/s
BM_colReduction_4T/640 10000 106924 3830.75 MFlops/s
BM_colReduction_4T/4K 500 9946374 2513.48 MFlops/s
BM_colReduction_8T/10 1000000 1912 52.30 MFlops/s
BM_colReduction_8T/80 200000 8354 766.09 MFlops/s
BM_colReduction_8T/640 20000 85063 4815.22 MFlops/s
BM_colReduction_8T/4K 500 5445216 4591.19 MFlops/s
BM_rowReduction_12T/10 1000000 2041 48.99 MFlops/s
BM_rowReduction_12T/80 100000 15426 414.87 MFlops/s
BM_rowReduction_12T/640 50000 39117 10470.98 MFlops/s
BM_rowReduction_12T/4K 500 3034298 8239.14 MFlops/s
BM_rowReduction_4T/10 1000000 1834 54.51 MFlops/s
BM_rowReduction_4T/80 500000 5406 1183.81 MFlops/s
BM_rowReduction_4T/640 50000 35017 11697.16 MFlops/s
BM_rowReduction_4T/4K 500 3428527 7291.76 MFlops/s
BM_rowReduction_8T/10 1000000 1925 51.95 MFlops/s
BM_rowReduction_8T/80 200000 8519 751.23 MFlops/s
BM_rowReduction_8T/640 50000 33441 12248.42 MFlops/s
BM_rowReduction_8T/4K 1000 2852841 8763.19 MFlops/s
After:
BM_colReduction_12T/10 50000000 59 1678.30 MFlops/s
BM_colReduction_12T/80 5000000 725 8822.71 MFlops/s
BM_colReduction_12T/640 20000 90882 4506.93 MFlops/s
BM_colReduction_12T/4K 500 4668855 5354.63 MFlops/s
BM_colReduction_4T/10 50000000 59 1687.37 MFlops/s
BM_colReduction_4T/80 5000000 737 8681.24 MFlops/s
BM_colReduction_4T/640 50000 108637 3770.34 MFlops/s
BM_colReduction_4T/4K 500 7912954 3159.38 MFlops/s
BM_colReduction_8T/10 50000000 60 1657.21 MFlops/s
BM_colReduction_8T/80 5000000 726 8812.48 MFlops/s
BM_colReduction_8T/640 20000 91451 4478.90 MFlops/s
BM_colReduction_8T/4K 500 5441692 4594.16 MFlops/s
BM_rowReduction_12T/10 20000000 93 1065.28 MFlops/s
BM_rowReduction_12T/80 2000000 950 6730.96 MFlops/s
BM_rowReduction_12T/640 50000 38196 10723.48 MFlops/s
BM_rowReduction_12T/4K 500 3019217 8280.29 MFlops/s
BM_rowReduction_4T/10 20000000 93 1064.30 MFlops/s
BM_rowReduction_4T/80 2000000 959 6667.71 MFlops/s
BM_rowReduction_4T/640 50000 37433 10941.96 MFlops/s
BM_rowReduction_4T/4K 500 3036476 8233.23 MFlops/s
BM_rowReduction_8T/10 20000000 93 1072.47 MFlops/s
BM_rowReduction_8T/80 2000000 959 6670.04 MFlops/s
BM_rowReduction_8T/640 50000 38069 10759.37 MFlops/s
BM_rowReduction_8T/4K 1000 2758988 9061.29 MFlops/s
2016-05-16 08:55:21 -07:00
Benoit Steiner
b789a26804
Fixed syntax error
2016-05-16 08:51:08 -07:00
Benoit Steiner
83dfb40f66
Turn on the new thread pool by default since it scales much better over multiple cores. It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define.
2016-05-13 17:23:15 -07:00
Benoit Steiner
97605c7b27
New multithreaded contraction that doesn't rely on the thread pool to run the closures in the order in which they are enqueued. This is needed in order to switch to the new non blocking thread pool since this new thread pool can execute the closures in any order.
2016-05-13 17:11:29 -07:00
Benoit Steiner
069a0b04d7
Added benchmarks for contraction on CPU.
2016-05-13 14:32:17 -07:00
Benoit Steiner
c4fc8b70ec
Removed unnecessary thread synchronization
2016-05-13 10:49:38 -07:00
Benoit Steiner
7aa3557d31
Fixed compilation errors triggered by old versions of gcc
2016-05-12 18:59:04 -07:00
Rasmus Munk Larsen
5005b27fc8
Disabled cost model by accident. Revert.
2016-05-12 16:55:21 -07:00
Rasmus Munk Larsen
989e419328
Address comments by bsteiner.
2016-05-12 16:54:19 -07:00
Rasmus Munk Larsen
e55deb21c5
Improvements to parallelFor.
...
Move some scalar functors from TensorFunctors. to Eigen core.
2016-05-12 14:07:22 -07:00
Benoit Steiner
ae9688f313
Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel.
2016-05-12 12:06:51 -07:00
Benoit Steiner
2a54b70d45
Fixed potential race condition in the non blocking thread pool
2016-05-12 11:45:48 -07:00
Benoit Steiner
a071629fec
Replace implicit cast with an explicit one
2016-05-12 10:40:07 -07:00
Benoit Steiner
2f9401b061
Worked around compilation errors with older versions of gcc
2016-05-11 23:39:20 -07:00
Benoit Steiner
09653e1f82
Improved the portability of the tensor code
2016-05-11 23:29:09 -07:00
Benoit Steiner
fae0493f98
Fixed a couple of bugs related to the Pascal family of GPUs
2016-05-11 23:02:26 -07:00
Benoit Steiner
886445ce4d
Avoid unnecessary conversions between floats and doubles
2016-05-11 23:00:03 -07:00
Benoit Steiner
595e890391
Added more tests for half floats
2016-05-11 21:27:15 -07:00
Benoit Steiner
b6a517c47d
Added the ability to load fp16 using the texture path.
...
Improved the performance of some reductions on fp16
2016-05-11 21:26:48 -07:00
Benoit Steiner
518149e868
Misc fixes for fp16
2016-05-11 20:11:14 -07:00
Benoit Steiner
56a1757d74
Made predux_min and predux_max on fp16 less noisy
2016-05-11 17:37:34 -07:00
Benoit Steiner
9091351dbe
__ldg is only available with cuda architectures >= 3.5
2016-05-11 15:22:13 -07:00
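The architecture gate described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: `SKETCH_DEVICE_FUNC` and `load_readonly` are hypothetical names chosen for this example, and the fallback to a plain load is an assumption about what a reasonable guard would do on older GPUs and on the host.

```cpp
#include <cassert>

// Hypothetical macro for this sketch; it only mimics the role of a
// host/device annotation and is not Eigen's EIGEN_DEVICE_FUNC.
#if defined(__CUDACC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// Loads a float through the read-only data cache when the target supports it.
SKETCH_DEVICE_FUNC inline float load_readonly(const float* ptr) {
  assert(ptr != nullptr);
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
  return __ldg(ptr);  // device code built for compute capability 3.5 or newer
#else
  return *ptr;        // older GPUs and host code: ordinary load
#endif
}
```

Because the guard is resolved by the preprocessor, the same function compiles as plain C++ on the host and never references `__ldg` on targets that lack it.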
Benoit Steiner
02f76dae2d
Fixed a typo
2016-05-11 15:08:38 -07:00
Christoph Hertzberg
131e5a1a4a
Do not copy for trivial 1x1 case. This also avoids a "maybe-uninitialized" warning in some situations.
2016-05-11 23:50:13 +02:00
Benoit Steiner
70195a5ff7
Added missing EIGEN_DEVICE_FUNC
2016-05-11 14:10:09 -07:00
Benoit Steiner
09a19c33a8
Added missing EIGEN_DEVICE_FUNC qualifiers
2016-05-11 14:07:43 -07:00
Christoph Hertzberg
1a1ce6ff61
Removed deprecated flag (which apparently was ignored anyway)
2016-05-11 23:05:37 +02:00
Christoph Hertzberg
2150f13d65
fixed some double-promotion and sign-compare warnings
2016-05-11 23:02:26 +02:00
Christoph Hertzberg
7268b10203
Split unit test
2016-05-11 19:41:53 +02:00
Christoph Hertzberg
8d4ef391b0
Don't flood test output with successful VERIFY_IS_NOT_EQUAL tests.
2016-05-11 19:40:45 +02:00
Christoph Hertzberg
bda21407dd
Fix help output of buildtests and check scripts
2016-05-11 19:39:09 +02:00
Christoph Hertzberg
33ca7e3c8d
bug #1207 : Add and fix logical-op warnings
2016-05-11 19:36:34 +02:00
Benoit Steiner
217d984abc
Fixed a typo in my previous commit
2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48
Fix potential race condition in the CUDA reduction code.
2016-05-11 10:08:51 -07:00
Benoit Steiner
cbb14ed47e
Added a few tests to validate the generation of random tensors on GPU.
2016-05-11 10:05:56 -07:00
Benoit Steiner
6a5717dc74
Explicitly initialize all the atomic variables.
2016-05-11 10:04:41 -07:00
Christoph Hertzberg
0f61343893
Workaround maybe-uninitialized warning
2016-05-11 09:00:18 +02:00
Christoph Hertzberg
3bfc9b47ca
Workaround "misleading-indentation" warnings
2016-05-11 08:41:36 +02:00
Benoit Steiner
4ede059de1
Properly gate the use of half2.
2016-05-10 17:04:01 -07:00
Benoit Steiner
bf185c3c28
Extended the tests for ptanh
2016-05-10 16:21:43 -07:00
Benoit Steiner
661e710092
Added support for fp16 to the sigmoid functor.
2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552
Small improvement to the full reduction of fp16
2016-05-10 11:58:18 -07:00
Benoit Steiner
0b9e3dcd06
Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This improves the performance by 10 to 30%.
2016-05-10 11:05:33 -07:00
Benoit Steiner
6bf8273bc0
Added a test to validate the new non blocking thread pool
2016-05-10 10:49:34 -07:00
Benoit Steiner
4013b8feca
Simplified the reduction code a little.
2016-05-10 09:40:42 -07:00
Benoit Steiner
75bd2bd32d
Fixed compilation warning
2016-05-09 19:24:41 -07:00
Benoit Steiner
4670d7d5ce
Improved the performance of full reductions on GPU:
...
Before:
BM_fullReduction/10 200000 11751 8.51 MFlops/s
BM_fullReduction/80 5000 523385 12.23 MFlops/s
BM_fullReduction/640 50 36179326 11.32 MFlops/s
BM_fullReduction/4K 1 2173517195 11.50 MFlops/s
After:
BM_fullReduction/10 500000 5987 16.70 MFlops/s
BM_fullReduction/80 200000 10636 601.73 MFlops/s
BM_fullReduction/640 50000 58428 7010.31 MFlops/s
BM_fullReduction/4K 1000 2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58
Added the ability to use a scratch buffer in cuda kernels
2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2
Added a new parallelFor api to the thread pool device.
2016-05-09 10:45:12 -07:00
Benoit Steiner
dc7dbc2df7
Optimized the non blocking thread pool:
...
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work,
instead of first noticing that there is a non-empty queue and
then doing another round of random stealing to re-discover the non-empty
queue.
* Steal only 1 task from a remote queue instead of half of tasks.
2016-05-09 10:17:17 -07:00
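The first bullet of the commit above, probing every queue exactly once in a pseudo-random order, can be sketched by stepping through the indices with an increment that is coprime with the number of queues. The sketch below assumes this is the technique meant; `steal_order` and `gcd_sketch` are hypothetical names, and the random number `r` is supplied by the caller rather than drawn from a real per-thread generator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Greatest common divisor (Euclid), written out to avoid requiring C++17.
static std::size_t gcd_sketch(std::size_t a, std::size_t b) {
  while (b != 0) {
    const std::size_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Returns the order in which n queues would be probed for a random value r.
// Since the increment is coprime with n, the walk visits every index once.
std::vector<std::size_t> steal_order(std::size_t n, std::size_t r) {
  assert(n > 0);
  // Collect all increments in [1, n] that are coprime with n.
  std::vector<std::size_t> coprimes;
  for (std::size_t i = 1; i <= n; ++i) {
    if (gcd_sketch(i, n) == 1) coprimes.push_back(i);
  }
  const std::size_t inc = coprimes[r % coprimes.size()];
  std::size_t victim = r % n;  // random starting queue
  std::vector<std::size_t> order;
  order.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    order.push_back(victim);
    victim = (victim + inc) % n;
  }
  return order;
}
```

Compared with drawing a fresh random index on every probe, this walk cannot miss a non-empty queue and needs only one random number per stealing round.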
Benoit Steiner
05c365fb16
Pulled latest updates from trunk
2016-05-07 13:39:04 -07:00
Benoit Steiner
691614bd2c
Worked around a bug in nvcc on tegra x1
2016-05-07 13:28:53 -07:00
Benoit Steiner
a2d94fc216
Merged latest updates from trunk
2016-05-06 19:17:57 -07:00
Benoit Steiner
8adf5cc70f
Added support for packet processing of fp16 on kepler and maxwell gpus
2016-05-06 19:16:43 -07:00
Benoit Steiner
1660e749b4
Avoid double promotion
2016-05-06 08:15:12 -07:00
Christoph Hertzberg
a11bd82dc3
bug #1213 : Give names to anonymous enums
2016-05-06 11:31:56 +02:00
Benoit Steiner
c54ae65c83
Marked a few tensor operations as read only
2016-05-05 17:18:47 -07:00
Benoit Steiner
69a8a4e1f3
Added a test to validate full reduction on tensor of half floats
2016-05-05 16:52:50 -07:00
Benoit Steiner
678a17ba79
Made the testing of contractions on fp16 more robust
2016-05-05 16:36:39 -07:00
Benoit Steiner
e3d053e14e
Refined the testing of log and exp on fp16
2016-05-05 16:24:15 -07:00
Benoit Steiner
9a48688d37
Further improved the testing of fp16
2016-05-05 15:58:05 -07:00
Benoit Steiner
0451940fa4
Relaxed the dummy precision for fp16
2016-05-05 15:40:01 -07:00
Benoit Steiner
910e013506
Relaxed an assertion that was tighter than necessary.
2016-05-05 15:38:16 -07:00
Benoit Steiner
f81e413180
Added a benchmark to measure the performance of full reductions of 16 bit floats
2016-05-05 14:15:11 -07:00
Benoit Steiner
28d5572658
Fixed some incorrect assertions
2016-05-05 10:02:26 -07:00
Benoit Steiner
2aba40d208
Avoid unnecessary type promotion
2016-05-05 09:26:57 -07:00
Benoit Steiner
a4d6e8fef0
Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code.
2016-05-05 09:25:55 -07:00
Benoit Steiner
7875437ca0
Avoided unnecessary type promotion
2016-05-05 09:08:42 -07:00
Benoit Steiner
f363e533aa
Added tests for full contractions using thread pools and gpu devices.
...
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58
Updated the contraction code to ensure that full contractions return a tensor of rank 0
2016-05-05 08:37:47 -07:00
Christoph Hertzberg
b300a84989
Fixed some signed/unsigned comparison warnings
2016-05-05 13:36:28 +02:00
Christoph Hertzberg
dacb469bc9
Enable and fix -Wdouble-conversion warnings
2016-05-05 13:35:45 +02:00
Benoit Steiner
62b710072e
Reduced the memory footprint of the cxx11_tensor_image_patch test
2016-05-04 21:08:22 -07:00
Benoit Steiner
dd2b45feed
Removed extraneous 'explicit' keywords
2016-05-04 16:57:52 -07:00
Ola Røer Thorsen
be78aea6b3
fix double-promotion/float-conversion in Core/SpecialFunctions.h
2016-05-04 10:52:08 +02:00
Gael Guennebaud
75a94b9662
Improve documentation of BDCSVD
2016-05-04 12:53:14 +02:00
Benoit Steiner
968ec1c2ae
Use numext::isfinite instead of std::isfinite
2016-05-03 19:56:40 -07:00
Gael Guennebaud
e2ca478485
bug #1214 : consider denormals as zero in D&C SVD. This also works around an infinite binary search when compiling with ICC's unsafe optimizations.
2016-05-03 23:15:29 +02:00
Benoit Steiner
f899e08946
Enabled a number of tests previously disabled by mistake
2016-05-03 14:07:47 -07:00
Benoit Steiner
4c05fb03a3
Merged eigen/eigen into default
2016-05-03 13:15:00 -07:00
Benoit Steiner
577a07a86e
Re-enabled the product_small test now that everything compiles correctly.
2016-05-03 13:11:38 -07:00
Benoit Steiner
2c5568a757
Added a test to validate the computation of exp and log on 16bit floats
2016-05-03 12:06:07 -07:00
Benoit Steiner
6c3e5b85bc
Fixed compilation error with cuda >= 7.5
2016-05-03 09:38:42 -07:00
Benoit Steiner
aad9a04da4
Deleted superfluous explicit keyword.
2016-05-03 09:37:19 -07:00
Benoit Steiner
da50419df8
Made a cast explicit
2016-05-02 19:50:22 -07:00
Benoit Steiner
73ef5371e4
Pulled latest updates from trunk
2016-05-01 14:48:57 -07:00
Benoit Steiner
8a9228ed9b
Fixed compilation error
2016-05-01 14:48:01 -07:00
Gael Guennebaud
b1bd53aa6b
Fix performance regression: with AVX, unaligned stores were emitted instead of aligned ones for fixed size assignment.
2016-05-01 23:25:06 +02:00
Benoit Steiner
d6c9596fd8
Added missing accessors to fixed sized tensors
2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e
Deleted trailing commas
2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2
Deleted useless trailing commas
2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb
Deleted unnecessary trailing commas.
2016-04-29 18:33:46 -07:00
Benoit Steiner
2b890ae618
Fixed compilation errors generated by clang
2016-04-29 18:30:40 -07:00
Benoit Steiner
d217217842
Added a few tests to ensure that the dimensions of rank 0 tensors are correctly computed
2016-04-29 18:15:34 -07:00
Benoit Steiner
f100d1494c
Return the proper size (ie 1) for tensors of rank 0
2016-04-29 18:14:33 -07:00
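The rule in the commit above follows from treating the total size as a product over the dimensions: a rank 0 tensor has no dimensions, and the empty product is 1. A minimal sketch, using a hypothetical `total_size` helper over a plain `std::vector` rather than Eigen's own dimension types:

```cpp
#include <cstddef>
#include <vector>

// Number of coefficients in a tensor with the given dimensions.
// For rank 0 (no dimensions) the loop never runs and the result is 1,
// which matches a scalar holding exactly one value.
std::size_t total_size(const std::vector<std::size_t>& dims) {
  std::size_t size = 1;
  for (std::size_t i = 0; i < dims.size(); ++i) {
    size *= dims[i];
  }
  return size;
}
```

Starting the accumulator at 1 rather than 0 is what makes the rank 0 case come out right without a special branch.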
Benoit Steiner
d14105f158
Made several tensor tests compatible with cxx03
2016-04-29 17:22:37 -07:00
Benoit Steiner
c0882ef4d9
Moved a number of tensor tests that don't require cxx11 to work properly outside the EIGEN_TEST_CXX11 test section
2016-04-29 17:13:51 -07:00
Benoit Steiner
9d1dbd1ec0
Fixed the cxx11_tensor_empty test to compile without requiring cxx11 support
2016-04-29 16:53:55 -07:00
Benoit Steiner
a8c0405cf5
Deleted unused default values for template parameters
2016-04-29 16:34:43 -07:00
Benoit Steiner
4f53178e62
Made a couple of tensor tests compile without requiring c++11 support.
2016-04-29 16:09:54 -07:00
Benoit Steiner
1131a984a6
Made the cxx11_tensor_forced_eval compile without c++11.
2016-04-29 15:48:59 -07:00
Benoit Steiner
46bcb70969
Don't turn on const expressions when compiling with gcc >= 4.8 unless the -std=c++11 option has been used
2016-04-29 15:20:59 -07:00
Benoit Steiner
c07404f6a1
Restore Tensor support for non c++11 compilers
2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021
Fixed include path
2016-04-29 15:11:09 -07:00
Benoit Steiner
3b8da4be5a
Extended the packetmath test to cover all the alignments made possible by avx512 instructions.
2016-04-29 14:13:43 -07:00
Benoit Steiner
2f28ccbea3
Update the makefile to make the tests compile with gcc 4.9
2016-04-29 14:11:09 -07:00
Benoit Steiner
7a4bd337d9
Resolved merge conflict
2016-04-29 13:42:22 -07:00
Benoit Steiner
07a247dcf4
Pulled latest updates from upstream
2016-04-29 13:41:26 -07:00
Benoit Steiner
fa5a8f055a
Implemented palign_impl for AVX512
2016-04-29 13:30:13 -07:00
Benoit Steiner
ef3ac9d05a
Fixed the AVX512 packet traits
2016-04-29 13:28:36 -07:00
Benoit Steiner
d7b75e8d86
Added pdiv packet primitives for avx512
2016-04-29 13:26:47 -07:00
Benoit Steiner
5e89ded685
Implemented preduxp for AVX512
2016-04-29 13:00:33 -07:00
Benoit Steiner
5f85662ad8
Implemented the pabs and preverse primitives for avx512.
2016-04-29 12:53:34 -07:00
Benoit Steiner
d37ee89ca8
Disabled some of the AVX512 primitives on compilers that don't support them
2016-04-29 12:50:29 -07:00
Gael Guennebaud
0f3c4c8ff4
Fix compilation of sparse.cast<>().transpose().
2016-04-29 18:26:08 +02:00
Benoit Steiner
a524a26fdc
Fixed a few memory leaks
2016-04-28 18:55:53 -07:00
Benoit Steiner
dacb23277e
Fixed the igamma and igammac implementations to make them callable from a gpu kernel.
2016-04-28 18:54:54 -07:00
Benoit Steiner
a5d4545083
Deleted unused variable
2016-04-28 14:14:48 -07:00
Justin Lebar
40d1e2f8c7
Eliminate mutual recursion in igamma{,c}_impl::Run.
...
Presently, igammac_impl::Run calls igamma_impl::Run, which in turn calls
igammac_impl::Run.
This isn't actually mutual recursion; the calls are guarded such that we never
get into a loop. Nonetheless, it's a stretch for clang to prove this. As a
result, clang emits a recursive call in both igammac_impl::Run and
igamma_impl::Run.
That this is suboptimal code is bad enough, but it's particularly bad when
compiling for CUDA/nvptx. nvptx allows recursion, but only begrudgingly: If
you have recursive calls in a kernel, it's on you to manually specify the
kernel's stack size. Otherwise, ptxas will dump a warning, make a guess, and
who knows if it's right.
This change explicitly eliminates the mutual recursion in igammac_impl::Run and
igamma_impl::Run.
2016-04-28 13:57:08 -07:00
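The restructuring described in the commit above can be sketched as follows: each public entry point calls only leaf helpers (a power series and a continued fraction), so the call graph has no cycle for the compiler to reason about. This is an illustration under stated assumptions, not Eigen's implementation: the `sketch` namespace, the helper names, the iteration caps, and the textbook series and modified Lentz continued fraction are all choices made for this example, and inputs are assumed to satisfy a > 0 and x > 0.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

namespace sketch {

// x^a * exp(-x) / Gamma(a), evaluated in log space.
inline double prefactor(double a, double x) {
  return std::exp(a * std::log(x) - x - std::lgamma(a));
}

// Power series for the lower regularized function P(a, x). Leaf helper.
inline double igamma_series(double a, double x) {
  double term = 1.0 / a;
  double sum = term;
  for (int n = 1; n < 1000; ++n) {
    term *= x / (a + n);
    sum += term;
    if (std::fabs(term) < std::fabs(sum) * std::numeric_limits<double>::epsilon()) break;
  }
  return sum * prefactor(a, x);
}

// Continued fraction (modified Lentz) for the upper function Q(a, x). Leaf helper.
inline double igammac_cf(double a, double x) {
  const double tiny = 1e-300;
  const double eps = std::numeric_limits<double>::epsilon();
  double b = x + 1.0 - a;
  double c = 1.0 / tiny;
  double d = 1.0 / b;
  double h = d;
  for (int i = 1; i < 1000; ++i) {
    const double an = -i * (i - a);
    b += 2.0;
    d = an * d + b;
    if (std::fabs(d) < tiny) d = tiny;
    c = b + an / c;
    if (std::fabs(c) < tiny) c = tiny;
    d = 1.0 / d;
    const double delta = d * c;
    h *= delta;
    if (std::fabs(delta - 1.0) < eps) break;
  }
  return h * prefactor(a, x);
}

// Entry points: each one picks a helper directly and never calls the other.
inline double igamma(double a, double x) {
  assert(a > 0.0 && x > 0.0);
  if (x > 1.0 && x > a) return 1.0 - igammac_cf(a, x);
  return igamma_series(a, x);
}

inline double igammac(double a, double x) {
  assert(a > 0.0 && x > 0.0);
  if (x < 1.0 || x < a) return 1.0 - igamma_series(a, x);
  return igammac_cf(a, x);
}

}  // namespace sketch
```

With this layout neither entry point appears in the other's body, so a device compiler has no recursive call to size a stack for.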
Konstantinos Margaritis
87294c84a6
define Packet2d constants with VSX only
2016-04-28 14:39:56 -03:00
Konstantinos Margaritis
6ed7a7281c
remove accidentally pasted code
2016-04-28 14:35:55 -03:00
Konstantinos Margaritis
62f9093b31
improve state of MathFunctions as well
2016-04-28 14:33:09 -03:00
Konstantinos Margaritis
8ed26120c8
bring Altivec/VSX to a better state, implement some of the missing functions
2016-04-28 14:32:42 -03:00
Konstantinos Margaritis
950158f6d1
add name to copyrights
2016-04-28 14:32:11 -03:00
Konstantinos Margaritis
ee0459300b
minor fix, add to copyright
2016-04-28 14:31:21 -03:00
Benoit Steiner
3ec81fc00f
Fixed compilation error with clang.
2016-04-27 19:32:12 -07:00
Benoit Steiner
2b917291d9
Merged in rmlarsen/eigen2 (pull request PR-183)
...
Detect cxx_constexpr support when compiling with clang.
2016-04-27 15:19:54 -07:00
Rasmus Munk Larsen
09b9e951e3
Depend on the more extensive support for constexpr in clang:
...
http://clang.llvm.org/docs/LanguageExtensions.html#c-1y-relaxed-constexpr
2016-04-27 14:59:11 -07:00
Rasmus Munk Larsen
1a325ef71c
Detect cxx_constexpr support when compiling with clang.
2016-04-27 14:33:51 -07:00
Benoit Steiner
1a97fd8b4e
Merged latest update from trunk
2016-04-27 14:22:45 -07:00
Benoit Steiner
c61170e87d
fpclassify isn't portable enough. In particular, the return values of the function are not available on all the platforms Eigen supports: remove it from Eigen.
2016-04-27 14:22:20 -07:00
Gael Guennebaud
318e65e0ae
Fix missing inclusion of Eigen/Core
2016-04-27 23:05:40 +02:00
Benoit Steiner
f629fe95c8
Made the index type a template parameter to evaluateProductBlockingSizes
...
Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes.
2016-04-27 13:11:19 -07:00
Benoit Steiner
66b215b742
Merged latest updates from trunk
2016-04-27 12:57:48 -07:00
Benoit Steiner
25141b69d4
Improved support for min and max on 16 bit floats when running on recent cuda gpus
2016-04-27 12:57:21 -07:00
Rasmus Larsen
ff33798acd
Merged eigen/eigen into default
2016-04-27 12:27:00 -07:00
Rasmus Munk Larsen
463738ccbe
Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases.
2016-04-27 12:26:18 -07:00
Benoit Steiner
6744d776ba
Added support for fpclassify in Eigen::Numext
2016-04-27 12:10:25 -07:00
Rasmus Munk Larsen
1f48f47ab7
Implement stricter argument checking for SYRK and SYR2K and real matrices. To implement the BLAS API they should return info=2 if op='C' is passed for a complex matrix. Without this change, the Eigen BLAS fails the strict zblat3 and cblat3 tests in LAPACK 3.5.
2016-04-27 19:59:44 +02:00
Gael Guennebaud
3dddd34133
Refactor the unsupported CXX11/Core module to internal headers only.
2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46
Fixed the partial evaluation of non vectorizable tensor subexpressions
2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260
Refined the cost of the striding operation.
2016-04-25 09:16:08 -07:00
Heiko Bauke
e19b58e672
alias template for matrix and array classes
2016-04-23 00:08:51 +02:00
Konstantinos Margaritis
3f80696ae1
Merged eigen/eigen into default
2016-04-22 15:05:21 +03:00
Benoit Steiner
5c372d19e3
Merged in rmlarsen/eigen (pull request PR-179)
...
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
2016-04-21 18:06:36 -07:00
Benoit Steiner
4bbc97be5e
Provide access to the base threadpool classes
2016-04-21 17:59:33 -07:00
Rasmus Munk Larsen
a3256d78d8
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
2016-04-21 16:49:28 -07:00
Benoit Steiner
33adce5c3a
Added the ability to switch to the new thread pool with a #define
2016-04-21 11:59:58 -07:00
Benoit Steiner
79b900375f
Use index list for the striding benchmarks
2016-04-21 11:58:27 -07:00
Benoit Steiner
f670613e4b
Fixed several compilation warnings
2016-04-21 11:03:02 -07:00
Benoit Steiner
6015422ee6
Added an option to enable the use of the F16C instruction set
2016-04-21 10:30:29 -07:00
Benoit Steiner
32ffce04fc
Use EIGEN_THREAD_YIELD instead of std::this_thread::yield to make the code more portable.
2016-04-21 08:47:28 -07:00
Konstantinos Margaritis
e5b2ef47d5
Merged eigen/eigen into default
2016-04-21 18:03:08 +03:00
Benoit Steiner
2dde1b1028
Don't crash when attempting to reduce empty tensors.
2016-04-20 18:08:20 -07:00
Benoit Steiner
a792cd357d
Added more tests
2016-04-20 17:33:58 -07:00
Benoit Steiner
80200a1828
Don't attempt to leverage the _cvtss_sh and _cvtsh_ss instructions when compiling with clang since it's unclear which versions of clang actually support these instructions.
2016-04-20 12:10:27 -07:00
Benoit Steiner
c7c2054bb5
Started to implement a portable way to yield.
2016-04-19 17:59:58 -07:00
Benoit Steiner
1d0238375d
Made sure all the required header files are included when trying to use fp16
2016-04-19 17:44:12 -07:00
Benoit Steiner
2b72163028
Implemented a more portable version of thread local variables
2016-04-19 15:56:02 -07:00
Benoit Steiner
04f954956d
Fixed a few typos
2016-04-19 15:27:09 -07:00
Benoit Steiner
5b1106c56b
Fixed a compilation error with nvcc 7.
2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db
Simplified the code that launches cuda kernels.
2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d
Don't take the address of a kernel on CUDA devices that don't support this feature.
2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058
Use numext::ceil instead of std::ceil
2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b
Avoid an unnecessary copy of the evaluator.
2016-04-19 13:54:28 -07:00
Benoit Steiner
f953c60705
Fixed 2 recent regression tests
2016-04-19 12:57:39 -07:00
Benoit Steiner
50968a0a3e
Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors.
2016-04-19 11:53:58 -07:00
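The idea in the commit above is that the element counter of a mean reduction should be as wide as the index type, so that it cannot wrap on tensors with more than 2^31 coefficients. A minimal sketch, assuming a pointer-sized signed counter (`std::ptrdiff_t`, which is what DenseIndex defaults to); `MeanReducerSketch` is a hypothetical type for illustration and does not mirror the real reducer's interface:

```cpp
#include <cstddef>

// Accumulates a running sum and a wide element count.
struct MeanReducerSketch {
  double sum;
  std::ptrdiff_t count;  // wide counter: does not overflow at 2^31 elements

  MeanReducerSketch() : sum(0.0), count(0) {}

  void reduce(double value) {
    sum += value;
    ++count;
  }

  // Mean of the values seen so far; the caller must have reduced at least one.
  double finalize() const {
    return sum / static_cast<double>(count);
  }
};
```

A 32-bit `int` counter would overflow silently after about 2.1 billion elements, which is undefined behavior for signed types and in practice yields a wrong divisor.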
Benoit Steiner
84543c8be2
Worked around the lack of a rand_r function on windows systems
2016-04-17 19:29:27 -07:00
Benoit Steiner
5fbcfe5eb4
Worked around the lack of a rand_r function on windows systems
2016-04-17 18:42:31 -07:00
Gael Guennebaud
e4fe611e2c
Enable lazy-coeff-based-product for vector*(1x1) products
2016-04-16 15:17:39 +02:00
Benoit Steiner
c8e8f93d6c
Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators.
2016-04-15 16:48:10 -07:00
Benoit Steiner
1a16fb1532
Deleted extraneous comma.
2016-04-15 15:50:13 -07:00
Benoit Steiner
7cff898e0a
Deleted unnecessary variable
2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a
Fixed a few compilation warnings
2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f
Merged in rmlarsen/eigen (pull request PR-178)
...
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Gael Guennebaud
2a7115daca
bug #1203 : by-pass large stack-allocation in stableNorm if EIGEN_STACK_ALLOCATION_LIMIT is too small
2016-04-15 22:34:11 +02:00
Rasmus Munk Larsen
3718bf654b
Get rid of void* casting when calling EvalRange::run.
2016-04-15 12:51:33 -07:00
Benoit Steiner
40c9923a8a
Fixed compilation errors with msvc
2016-04-15 11:27:52 -07:00
Benoit Steiner
1d23430628
Improved the matrix multiplication blocking in the case where mr is not a power of 2 (e.g on Haswell CPUs).
2016-04-15 10:53:31 -07:00
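One pitfall that arises when the register block height mr is not a power of two is rounding a block size down to a multiple of mr with a bit mask. The sketch below contrasts the general form with the mask shortcut. It is an illustration of that pitfall only and is not claimed to be what the commit above changed; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size down to a multiple of mr. Valid for any mr > 0.
inline std::ptrdiff_t round_down_to_multiple(std::ptrdiff_t size, std::ptrdiff_t mr) {
  assert(mr > 0 && size >= 0);
  return size - size % mr;
}

// Bit-mask shortcut. Only correct when mr is a power of two.
inline std::ptrdiff_t round_down_pow2_only(std::ptrdiff_t size, std::ptrdiff_t mr) {
  assert(mr > 0 && size >= 0);
  return size & ~(mr - 1);
}
```

For example, with size 100 and mr 12 the general form gives 96, while the mask form leaves 100 unchanged, which is not a multiple of 12.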
Gael Guennebaud
1e80bddde3
Fix trmv for mixing types.
2016-04-15 17:58:36 +02:00
Konstantinos Margaritis
0e8fc31087
remove pgather/pscatter for std::complex<double> for s390x
2016-04-15 07:08:57 -04:00
Benoit Steiner
a62e924656
Added ability to access the cache sizes from the tensor devices
2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426
Added support for exclusive or
2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen
07ac4f7e02
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default.
2016-04-14 18:28:23 -07:00
Benoit Steiner
9624a1ea3d
Added missing definition of PacketSize in the gpu evaluator of convolution
2016-04-14 17:16:58 -07:00
Benoit Steiner
6fbedf5a4e
Merged in rmlarsen/eigen (pull request PR-177)
...
Eigen Tensor cost model part 1.
2016-04-14 17:13:19 -07:00
Benoit Steiner
bebb89acfa
Enabled the new threadpool tests
2016-04-14 16:44:10 -07:00
Benoit Steiner
9c064b5a97
Cleanup
2016-04-14 16:41:31 -07:00
Benoit Steiner
1372156c41
Prepared the migration to the new non blocking thread pool
2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen
aeb5494a0b
Improvements to cost model.
2016-04-14 15:52:58 -07:00
Benoit Steiner
00dfe18487
Merged latest updates from trunk
2016-04-14 15:25:20 -07:00
Benoit Steiner
a8e8837ba7
Added tests for the non blocking thread pool
2016-04-14 15:23:49 -07:00
Benoit Steiner
78a51abc12
Added a more scalable non blocking thread pool
2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen
d2e95492e7
Merge upstream updates.
2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen
235e83aba6
Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions.
2016-04-14 13:57:35 -07:00
Gael Guennebaud
68897c52f3
Add extreme values to the imaginary part for SVD unit tests.
2016-04-14 22:47:30 +02:00
Gael Guennebaud
20f387fafa
Improve numerical robustness of JacobiSVD:
...
- avoid noise amplification in complex to real conversion
- compare off-diagonal entries to the current biggest diagonal entry: no need to bother about a 2x2 block containing ridiculously small entries compared to the rest of the matrix.
2016-04-14 22:46:55 +02:00
Benoit Steiner
7718749fee
Force the inlining of the << operator on half floats
2016-04-14 11:51:54 -07:00
Benoit Steiner
5379d2b594
Inline the << operator on half floats
2016-04-14 11:40:48 -07:00
Benoit Steiner
5912ad877c
Silenced a compilation warning
2016-04-14 11:40:14 -07:00
Benoit Steiner
2b6e3de02f
Added tests to validate flooring and ceiling of fp16
2016-04-14 11:39:18 -07:00
Benoit Steiner
6f23e945f6
Added simple test for numext::sqrt and numext::pow on fp16
2016-04-14 10:32:52 -07:00
Benoit Steiner
72510c80e1
Added basic test for trigonometric functions on fp16
2016-04-14 10:27:24 -07:00
Benoit Steiner
7b3d7acebe
Added support for fp16 to test_isApprox, test_isMuchSmallerThan, and test_isApproxOrLessThan
2016-04-14 10:25:50 -07:00
Benoit Steiner
5c13765ee3
Added ability to printf fp16
2016-04-14 10:24:52 -07:00
Benoit Steiner
c7167fee0e
Added support for fp16 to the sigmoid function
2016-04-14 10:08:33 -07:00
Benoit Steiner
f6003f0873
Made the test msvc friendly
2016-04-14 09:47:26 -07:00
Gael Guennebaud
3551dea887
Cleaning pass on rcond estimator.
2016-04-14 16:45:41 +02:00
Gael Guennebaud
d8a3bdaa24
remove useless include
2016-04-14 15:18:56 +02:00
Gael Guennebaud
d402adc3d7
Better use .data() than &coeffRef(0)
2016-04-14 15:18:08 +02:00
Gael Guennebaud
ea7087ef31
Merged in rmlarsen/eigen (pull request PR-174)
...
Add matrix condition number estimation module.
2016-04-14 15:11:33 +02:00
Benoit Steiner
36f5a10198
Properly gate the definition of the error and gamma functions for fp16
2016-04-13 18:44:48 -07:00
Benoit Steiner
10b69810d1
Improved support for trigonometric functions on GPU
2016-04-13 16:00:51 -07:00
Benoit Steiner
d6105b53b8
Added basic implementation of the lgamma, digamma, igamma, igammac, polygamma, and zeta function for fp16
2016-04-13 15:26:02 -07:00
Gael Guennebaud
703251f10f
merge
2016-04-13 23:45:10 +02:00
Gael Guennebaud
39211ba46b
Fix JacobiSVD for complex when the complex-to-real update already gives a diagonal 2x2 block.
2016-04-13 23:43:26 +02:00
Benoit Steiner
2986253259
Cleaned up the implementation of digamma
2016-04-13 14:24:06 -07:00
Benoit Steiner
d5de1a8220
Pulled latest updates from trunk
2016-04-13 14:17:11 -07:00
Benoit Steiner
87ca15c4e8
Added support for sin, cos, tan, and tanh on fp16
2016-04-13 14:12:38 -07:00
Gael Guennebaud
2c9e4fa417
Add debug output for random unit test
2016-04-13 22:56:12 +02:00
Gael Guennebaud
7d1391d049
Turn a convergence check into a warning
2016-04-13 22:50:54 +02:00
Gael Guennebaud
feef39e2d1
Fix underflow in JacobiSVD's complex to real preconditioner
2016-04-13 22:49:51 +02:00
Gael Guennebaud
f4e12272f1
Fix corner case in unit test.
2016-04-13 22:18:02 +02:00
Gael Guennebaud
a95e1a273e
Fix warning in unit tests
2016-04-13 22:00:38 +02:00
Benoit Steiner
bf3f6688f0
Added support for computing cos, sin, tan, and tanh on GPU.
2016-04-13 11:55:08 -07:00
Benoit Steiner
473c8380ea
Added constructors to convert unsigned integers into fp16
2016-04-13 11:03:37 -07:00
Gael Guennebaud
42a3352a3b
Workaround a division by zero when outerstride==0
2016-04-13 19:02:02 +02:00
Gael Guennebaud
6f960b83ff
Make use of is_same_dense helper instead of extract_data to detect whether inputs and outputs are the same.
2016-04-13 18:47:12 +02:00
Gael Guennebaud
b7716c0328
Fix incomplete previous patch on matrix comparison.
2016-04-13 18:32:56 +02:00
Gael Guennebaud
2630d97c62
Fix detection of same matrices when both matrices are not handled by extract_data.
2016-04-13 18:26:08 +02:00
Gael Guennebaud
512ba0ac76
Add regression unit tests for half-packet vectorization
2016-04-13 18:16:35 +02:00
Gael Guennebaud
06447e0a39
Improve half-packet vectorization logic to distinguish linear versus inner traversal modes.
2016-04-13 18:15:49 +02:00
Gael Guennebaud
bbb8854bf7
Enable half-packet in reduxions.
2016-04-13 13:02:34 +02:00
Benoit Steiner
e9b12cc1f7
Fixed compilation warnings generated by clang
2016-04-12 20:53:18 -07:00
Benoit Steiner
eaeb6ca93a
Enable the benchmarks for algebraic and transcendental functions on fp16.
2016-04-12 16:29:00 -07:00
Benoit Steiner
aa1ba8bbd2
Don't put a comma at the end of an enumerator list
2016-04-12 16:28:11 -07:00
Benoit Steiner
e49945ced4
Pulled latest update from trunk
2016-04-12 14:13:41 -07:00
Benoit Steiner
25d05c4b8f
Fixed the vectorization logic test
2016-04-12 14:13:25 -07:00
Benoit Steiner
53121c0119
Turned on the contraction benchmarks for fp16
2016-04-12 14:11:52 -07:00
Gael Guennebaud
b67c983291
Enable the use of half-packet in coeff-based product.
...
For instance, Matrix4f*Vector4f is now vectorized again when using AVX.
2016-04-12 23:03:03 +02:00
Benoit Steiner
e3a184785c
Fixed the zeta test
2016-04-12 11:12:36 -07:00
Benoit Steiner
3b76df64fc
Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide to vectorize or not depending on the capability of the target cuda architecture. In particular, this enables us to vectorize the processing of fp16 when running on devices of compute capability >= 5.3
2016-04-12 10:58:51 -07:00
Benoit Steiner
8bfe739cd2
Updated the AVX512 PacketMath to properly leverage the AVX512DQ instructions
2016-04-11 18:40:16 -07:00
Rasmus Larsen
6498dadc2f
Merged eigen/eigen into default
2016-04-11 17:42:05 -07:00
Benoit Steiner
d6e596174d
Pull latest updates from upstream
2016-04-11 17:20:17 -07:00
Benoit Steiner
748c4c4599
More accurate cost estimates for exp, log, tanh, and sqrt.
2016-04-11 13:11:04 -07:00
Benoit Steiner
833efb39bf
Added epsilon, dummy_precision, infinity and quiet_NaN NumTraits for fp16
2016-04-11 11:03:56 -07:00
Benoit Steiner
e939b087fe
Pulled latest update from trunk
2016-04-11 11:03:02 -07:00
Gael Guennebaud
1744b5b5d2
Update doc regarding the genericity of EIGEN_USE_BLAS
2016-04-11 17:16:07 +02:00
Gael Guennebaud
91bf925fc1
Improve constness of level2 blas API.
2016-04-11 17:13:01 +02:00
Gael Guennebaud
0483430283
Move LAPACK declarations from blas.h to lapack.h and fix compatibility with EIGEN_USE_MKL
2016-04-11 17:12:31 +02:00
Gael Guennebaud
097d1e8823
Cleanup obsolete assign_scalar_eig2mkl helper.
2016-04-11 16:09:29 +02:00
Gael Guennebaud
fec4c334ba
Remove all references to MKL in BLAS wrappers.
2016-04-11 16:04:09 +02:00
Gael Guennebaud
ddabc992fa
Fix long to int conversion in BLAS API.
2016-04-11 15:52:01 +02:00
Gael Guennebaud
8191f373be
Silence unused warning.
2016-04-11 15:37:16 +02:00
Gael Guennebaud
6a9ca88e7e
Relax dependency on MKL for EIGEN_USE_BLAS
2016-04-11 15:17:14 +02:00
Gael Guennebaud
4e8e5888d7
Improve constness of blas level-3 interface.
2016-04-11 15:12:44 +02:00
Gael Guennebaud
675e0a2224
Fix static/inline keywords order.
2016-04-11 15:06:20 +02:00
Gael Guennebaud
fc6a0ebb1c
Typos in doc.
2016-04-11 10:54:58 +02:00
Till Hoffmann
643b697649
Proper handling of domain errors.
2016-04-10 00:37:53 +01:00
Rasmus Munk Larsen
1f70bd4134
Merge.
2016-04-09 15:31:53 -07:00
Rasmus Munk Larsen
096e355f8e
Add short-circuit to avoid calling matrix norm for empty matrix.
2016-04-09 15:29:56 -07:00
Rasmus Larsen
be80fb49fc
Merged default ( 4a92b590a0
...
) into default
2016-04-09 13:13:01 -07:00
Rasmus Larsen
7a8176587b
Merged eigen/eigen into default
2016-04-09 12:47:41 -07:00
Rasmus Munk Larsen
4a92b590a0
Merge.
2016-04-09 12:47:24 -07:00
Rasmus Munk Larsen
ee6c69733a
A few tiny adjustments to short-circuit logic.
2016-04-09 12:45:49 -07:00
Till Hoffmann
7f4826890c
Merge upstream
2016-04-09 20:08:07 +01:00
Till Hoffmann
de057ebe54
Added nans to zeta function.
2016-04-09 20:07:36 +01:00
Gael Guennebaud
af2161cdb4
bug #1197 : fix/relax some LM unit tests
2016-04-09 11:14:02 +02:00
Gael Guennebaud
a05a683d83
bug #1160 : fix and relax some lm unit tests by turning failures to warnings
2016-04-09 10:49:19 +02:00
Benoit Steiner
5da90fc8dd
Use numext::abs instead of std::abs in scalar_fuzzy_default_impl to make it usable inside GPU kernels.
2016-04-08 19:40:48 -07:00
Benoit Steiner
01bd577288
Fixed the implementation of Eigen::numext::isfinite, Eigen::numext::isnan, and Eigen::numext::isinf on CUDA devices
2016-04-08 16:40:10 -07:00
Benoit Steiner
89a3dc35a3
Fixed isfinite_impl: NumTraits<T>::highest() and NumTraits<T>::lowest() are finite numbers.
2016-04-08 15:56:16 -07:00
Benoit Steiner
995f202cea
Disabled the use of half2 on cuda devices of compute capability < 5.3
2016-04-08 14:43:36 -07:00
Benoit Steiner
8d22967bd9
Initial support for taking the power of fp16
2016-04-08 14:22:39 -07:00
Benoit Steiner
3394379319
Fixed the packet_traits for half floats.
2016-04-08 13:33:59 -07:00
Benoit Steiner
0d2a532fc3
Created the new EIGEN_TEST_CUDA_CLANG option to compile the CUDA tests using clang instead of nvcc
2016-04-08 13:16:08 -07:00
Rasmus Larsen
0b81a18d12
Merged eigen/eigen into default
2016-04-08 12:58:57 -07:00
Benoit Steiner
2d072b38c1
Don't test the division by 0 on float16 when compiling with msvc since msvc detects and errors out on divisions by 0.
2016-04-08 12:50:25 -07:00
Benoit Jacob
cd2b667ac8
Add references to filed LLVM bugs
2016-04-08 08:12:47 -04:00
Benoit Steiner
3bd16457e1
Properly handle complex numbers.
2016-04-07 23:28:04 -07:00
Benoit Steiner
63102ee43d
Turn on the coeffWise benchmarks on fp16
2016-04-07 23:05:20 -07:00
Benoit Steiner
7c47d3e663
Fixed the type casting benchmarks for fp16
2016-04-07 22:50:25 -07:00
Benoit Steiner
166b56bc61
Fixed the type casting benchmark for float16
2016-04-07 22:45:54 -07:00
Benoit Steiner
2f2801f096
Merged in parthaEth/eigen (pull request PR-175)
...
Static casting scalar types so as to let Cholesky module of eigen work with ceres
2016-04-07 22:10:14 -07:00
Benoit Steiner
d962fe6a99
Renamed float16 into cxx11_float16 since the test relies on c++11 features
2016-04-07 20:28:32 -07:00
Rasmus Larsen
c34e55c62b
Merged eigen/eigen into default
2016-04-07 20:23:03 -07:00
Benoit Steiner
7d5b17087f
Added missing EIGEN_DEVICE_FUNC to the tensor conversion code.
2016-04-07 20:01:19 -07:00
Benoit Steiner
a6d08be9b2
Fixed the benchmarking of fp16 coefficient wise operations
2016-04-07 17:13:44 -07:00
Rasmus Munk Larsen
283c51cd5e
Widen short-circuiting ReciprocalConditionNumberEstimate so we don't call InverseMatrixL1NormEstimate for dec.rows() <= 1.
2016-04-07 16:45:40 -07:00
Rasmus Munk Larsen
d51803a728
Use Index instead of int for indexing and sizes.
2016-04-07 16:39:48 -07:00
Rasmus Munk Larsen
fd872aefb3
Remove transpose() method from LLT and LDLT classes as it would imply conjugation.
...
Explicitly cast constants to RealScalar in ConditionEstimator.h.
2016-04-07 16:28:44 -07:00
Rasmus Munk Larsen
0b5546d182
Use lpNorm<1>() to compute l1 norms in LLT and LDLT.
2016-04-07 15:49:30 -07:00
parthaEth
2d5bb375b7
Static casting scalar types so as to let Cholesky module of eigen work with ceres
2016-04-08 00:14:44 +02:00
Benoit Steiner
a02ec09511
Worked around numerical noise in the test for the zeta function.
2016-04-07 12:11:02 -07:00
Benoit Steiner
c912b1d28c
Fixed a typo in the polygamma test.
2016-04-07 11:51:07 -07:00
Benoit Steiner
74f64838c5
Updated the unary functors to use the numext implementation of typical functions instead of the one provided in the standard library. The standard library functions aren't supported officially by cuda, so we're better off using the numext implementations.
2016-04-07 11:42:14 -07:00
Benoit Steiner
737644366f
Move the functions operating on fp16 out of the std namespace and into the Eigen::numext namespace
2016-04-07 11:40:15 -07:00
Benoit Steiner
dc45aaeb93
Added tests for float16
2016-04-07 11:18:05 -07:00
Benoit Steiner
8db269e055
Fixed a typo in a test
2016-04-07 10:41:51 -07:00
Benoit Steiner
b89d3f78b2
Updated the isnan, isinf and isfinite functions to make them compatible with cuda devices.
2016-04-07 10:08:49 -07:00
Benoit Steiner
48308ed801
Added support for isinf, isnan, and isfinite checks to the tensor api
2016-04-07 09:48:36 -07:00
Benoit Steiner
cfb34d808b
Fixed a possible integer overflow.
2016-04-07 08:46:52 -07:00
Benoit Steiner
df838736e2
Fixed compilation warning triggered by msvc
2016-04-06 20:48:55 -07:00
Benoit Steiner
14ea7c7ec7
Fixed packet_traits<half>
2016-04-06 19:30:21 -07:00
Benoit Steiner
532fdf24cb
Added support for hardware conversion between fp16 and full floats whenever
...
possible.
2016-04-06 17:11:31 -07:00
Benoit Steiner
165150e896
Fixed the tests for the zeta and polygamma functions
2016-04-06 14:31:01 -07:00
Benoit Steiner
7be1eaad1e
Fixed typos in the implementation of the zeta and polygamma ops.
2016-04-06 14:15:37 -07:00
Benoit Steiner
58c1dbff19
Made the fp16 code more portable.
2016-04-06 13:44:08 -07:00
Benoit Steiner
cf7e73addd
Added some missing conversions to the Half class, and fixed the implementation of the < operator on cuda devices.
2016-04-06 09:59:51 -07:00
Benoit Steiner
10bdd8e378
Merged in tillahoffmann/eigen (pull request PR-173)
...
Added zeta function of two arguments and polygamma function
2016-04-06 09:40:17 -07:00
Benoit Steiner
7781f865cb
Renamed the EIGEN_TEST_NVCC cmake option into EIGEN_TEST_CUDA per the discussion in bug #1173 .
2016-04-06 09:35:23 -07:00
Benoit Steiner
72abfa11dd
Added support for isfinite on fp16
2016-04-06 09:07:30 -07:00
Rasmus Munk Larsen
4d07064a3d
Fix bug in alternate lower bound calculation due to missing parentheses.
...
Make a few expressions more concise.
2016-04-05 16:40:48 -07:00
Konstantinos Margaritis
2bba4ee2cf
Merged kmargar/eigen/tip into default
2016-04-05 22:22:08 +03:00
Konstantinos Margaritis
317384b397
complete the port, remove float support
2016-04-05 14:56:45 -04:00
tillahoffmann
726bd5f077
Merged eigen/eigen into default
2016-04-05 18:21:05 +01:00
Till Hoffmann
a350c25a39
Added accuracy comments.
2016-04-05 18:20:40 +01:00
Gael Guennebaud
4d7e230d2f
bug #1189 : fix pow/atan2 compilation for AutoDiffScalar
2016-04-05 14:49:41 +02:00
Konstantinos Margaritis
bc0ad363c6
add remaining includes
2016-04-05 06:01:17 -04:00
Konstantinos Margaritis
2d41dc9622
complete int/double specialized traits for ZVector
2016-04-05 06:00:51 -04:00
Konstantinos Margaritis
644d0f91d2
enable all tests again
2016-04-05 05:59:54 -04:00
Konstantinos Margaritis
988344daf1
enable the other includes as well
2016-04-05 05:59:30 -04:00
Rasmus Larsen
d7eeee0c1d
Merged eigen/eigen into default
2016-04-04 15:58:27 -07:00
Rasmus Munk Larsen
513c372960
Fix docstrings to list all supported decompositions.
2016-04-04 14:34:59 -07:00
Rasmus Munk Larsen
86e0ed81f8
Addresses comments on Eigen pull request PR-174.
...
* Get rid of code-duplication for real vs. complex matrices.
* Fix flipped arguments to select.
* Make the condition estimation functions free functions.
* Use Vector::Unit() to generate canonical unit vectors.
* Misc. cleanup.
2016-04-04 14:20:01 -07:00
Benoit Jacob
158fea0f5e
bug #1190 - Don't trust __ARM_FEATURE_FMA on Clang/ARM
2016-04-04 16:42:40 -04:00
Benoit Jacob
03f2997a11
bug #1191 - Prevent Clang/ARM from rewriting VMLA into VMUL+VADD
2016-04-04 16:41:47 -04:00
Till Hoffmann
b0143de177
Merge upstream.
2016-04-04 19:16:48 +01:00
Till Hoffmann
b97911dd18
Refactored code into type-specific helper functions.
2016-04-04 19:16:03 +01:00
Benoit Steiner
c4179dd470
Updated the scalar_abs_op struct to make it compatible with cuda devices.
2016-04-04 11:11:51 -07:00
Benoit Steiner
1108b4f218
Fixed the signature of numext::abs to make it compatible with complex numbers
2016-04-04 11:09:25 -07:00
tillahoffmann
b8245cc325
Merged eigen/eigen into default
2016-04-04 12:28:11 +01:00
Gael Guennebaud
2b457f8e5e
Fix cross-compiling windows version detection
2016-04-04 11:47:46 +02:00
Rasmus Larsen
30242b7565
Merged eigen/eigen into default
2016-04-01 17:19:36 -07:00
Rasmus Munk Larsen
9d51f7c457
Add rcond method to LDLT.
2016-04-01 16:48:38 -07:00
Rasmus Munk Larsen
f54137606e
Add condition estimation to Cholesky (LLT) factorization.
2016-04-01 16:19:45 -07:00
Rasmus Munk Larsen
fb8dccc23e
Replace "inline static" with "static inline" for consistency.
2016-04-01 12:48:18 -07:00
Rasmus Munk Larsen
91414e0042
Fix comments in ConditionEstimator and minor cleanup.
2016-04-01 11:58:17 -07:00
Rasmus Munk Larsen
1aa89fb855
Add matrix condition estimator module that implements the Higham/Hager algorithm from http://www.maths.manchester.ac.uk/~higham/narep/narep135.pdf used in LAPACK. Add rcond() methods to FullPivLU and PartialPivLU.
2016-04-01 10:27:59 -07:00
Till Hoffmann
80eba21ad0
Merge upstream.
2016-04-01 18:18:49 +01:00
Till Hoffmann
eb0ae602bd
Added CUDA tests.
2016-04-01 18:17:45 +01:00
Till Hoffmann
ffd770ce94
Fixed CUDA signature.
2016-04-01 17:58:24 +01:00
Till Hoffmann
3cb0a237c1
Fixed suggestions by Eugene Brevdo.
2016-04-01 17:51:39 +01:00
tillahoffmann
49960adbdd
Merged eigen/eigen into default
2016-04-01 14:36:15 +01:00
Till Hoffmann
57239f4a81
Added polygamma function.
2016-04-01 14:35:21 +01:00
Till Hoffmann
dd5d390daf
Added zeta function.
2016-04-01 13:32:29 +01:00
Benoit Steiner
3da495e6b9
Relaxed the condition used to gate the fft code.
2016-03-31 18:11:51 -07:00
Benoit Steiner
0ea7ab4f62
Hashing was only officially introduced in c++11. Therefore only define an implementation of the hash function for float16 if c++11 is enabled.
2016-03-31 14:44:55 -07:00
Benoit Steiner
92b7f7b650
Improved code formatting
2016-03-31 13:09:58 -07:00
Benoit Steiner
f197813f37
Added the ability to hash a fp16
2016-03-31 13:09:23 -07:00
Benoit Steiner
0f5cc504fe
Properly gate the fft code
2016-03-31 12:59:39 -07:00
Benoit Steiner
4c859181da
Made it possible to use the NumTraits for complex and Array in a cuda kernel.
2016-03-31 12:48:38 -07:00
Benoit Steiner
c36ab19902
Added __ldg primitive for fp16.
2016-03-31 10:55:03 -07:00
Benoit Steiner
b575fb1d02
Added NumTraits for half floats
2016-03-31 10:43:59 -07:00
Benoit Steiner
8c8a79cec1
Fixed a typo
2016-03-31 10:33:32 -07:00
Benoit Steiner
af4ef540bf
Fixed an off-by-one bug in a debug assertion
2016-03-30 18:37:19 -07:00
Benoit Steiner
791e5cfb69
Added NumTraits for type2index.
2016-03-30 18:36:36 -07:00
Benoit Steiner
4f1a7e51c1
Pull math functions from the global namespace only when compiling cuda code with nvcc. When compiling with clang, we want to use the std namespace.
2016-03-30 17:59:49 -07:00
Benoit Steiner
bc68fc2fe7
Enable constant expressions when compiling cuda code with clang.
2016-03-30 17:58:32 -07:00
Benoit Steiner
483aaad10a
Fixed compilation warning
2016-03-30 17:08:13 -07:00
Benoit Steiner
1b40abbf99
Added missing assignment operator to the TensorUInt128 class, and made misc small improvements
2016-03-30 13:17:03 -07:00
Benoit Jacob
01b5333e44
bug #1186 - vreinterpretq_u64_f64 fails to build on Android/Aarch64/Clang toolchain
2016-03-30 11:02:33 -04:00
Benoit Steiner
aa45ad2aac
Fixed the formatting of the README.
2016-03-29 15:06:13 -07:00
Benoit Steiner
56df5ef1d7
Attempt to fix the formatting of the README
2016-03-29 15:03:38 -07:00
Benoit Steiner
1bcd82e31b
Pulled latest updates from trunk
2016-03-29 13:36:18 -07:00
Gael Guennebaud
09ad31aa85
Add regression test for nesting type handling in blas_traits
2016-03-29 22:33:57 +02:00
Benoit Steiner
1841d6d4c3
Added missing cuda template specializations for numext::ceil
2016-03-29 13:29:34 -07:00
Benoit Steiner
7b7d2a9fa5
Use false instead of 0 as the expected value of a boolean
2016-03-29 11:50:17 -07:00
Benoit Steiner
e02b784ec3
Added support for standard mathematical functions and transcendentals (such as exp, log, abs, ...) on fp16
2016-03-29 09:20:36 -07:00
Benoit Steiner
c38295f0a0
Added support for fmod
2016-03-28 15:53:02 -07:00
Benoit Steiner
6772f653c3
Made it possible to customize the threadpool
2016-03-28 10:01:04 -07:00
Benoit Steiner
1bc81f7889
Fixed compilation warnings on arm
2016-03-28 09:21:04 -07:00
Benoit Steiner
78f83d6f6a
Prevent potential overflow.
2016-03-28 09:18:04 -07:00
Konstantinos Margaritis
01e7298fe6
actually include ZVector files, passes most basic tests (float still fails)
2016-03-28 10:58:02 -04:00
Konstantinos Margaritis
f48011119e
Merged eigen/eigen into default
2016-03-28 01:48:45 +03:00
Konstantinos Margaritis
ed6b9d08f1
some primitives ported, but missing intrinsics and crash with asm() are a problem
2016-03-27 18:47:49 -04:00
Benoit Steiner
74f91ed06c
Improved support for integer modulo
2016-03-25 17:21:56 -07:00
Benoit Steiner
65716e99a5
Improved the cost estimate of the quotient op
2016-03-25 11:13:53 -07:00
Benoit Steiner
d94f6ba965
Started to model the cost of divisions more accurately.
2016-03-25 11:02:56 -07:00
Benoit Steiner
a86c9f037b
Fixed compilation error on windows
2016-03-24 18:54:31 -07:00
Benoit Steiner
0968e925a0
Updated the benchmarking code to use Eigen::half instead of half
2016-03-24 18:00:33 -07:00
Benoit Steiner
044efea965
Made sure that the cxx11_tensor_cuda test can be compiled even without support for cxx11.
2016-03-23 20:02:11 -07:00
Benoit Steiner
2e4e4cb74d
Use numext::abs instead of abs to avoid incorrect conversion to integer of the argument
2016-03-23 16:57:12 -07:00
Benoit Steiner
41434a8a85
Avoid unnecessary conversions
2016-03-23 16:52:38 -07:00
Benoit Steiner
92693b50eb
Fixed compilation warning
2016-03-23 16:40:36 -07:00
Benoit Steiner
9bc9396e88
Use portable includes
2016-03-23 16:30:06 -07:00
Benoit Steiner
393bc3b16b
Added comment
2016-03-23 16:22:15 -07:00
Benoit Steiner
81d340984a
Removed executable bit from header files
2016-03-23 16:15:02 -07:00
Benoit Steiner
bff8cbad06
Removed executable bit from header files
2016-03-23 16:14:23 -07:00
Benoit Steiner
7a570e50ef
Fixed contractions of fp16
2016-03-23 16:00:06 -07:00
Benoit Steiner
7168afde5e
Made the tensor benchmarks compile on MacOS
2016-03-23 14:21:04 -07:00
Benoit Steiner
2062ee2d26
Added a test to verify that notifications are working properly
2016-03-23 13:39:00 -07:00
Benoit Steiner
fc3660285f
Made type conversion explicit
2016-03-23 09:56:50 -07:00
Benoit Steiner
0e68882604
Added the ability to divide a half float by an index
2016-03-23 09:46:42 -07:00
Benoit Steiner
6971146ca9
Added more conversion operators for half floats
2016-03-23 09:44:52 -07:00
Christoph Hertzberg
9642fd7a93
Replace all M_PI by EIGEN_PI and add a check to the testsuite.
2016-03-23 15:37:45 +01:00
Benoit Steiner
28e02996df
Merged patch 672 from Justin Lebar: Don't use long doubles with cuda
2016-03-22 16:53:57 -07:00
Benoit Steiner
3d1e857327
Fixed compilation error
2016-03-22 15:48:28 -07:00
Benoit Steiner
de7d92c259
Pulled latest updates from trunk
2016-03-22 15:24:49 -07:00
Benoit Steiner
002cf0d1c9
Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead
2016-03-22 15:24:23 -07:00
Benoit Steiner
bc2b802751
Fixed a couple of typos
2016-03-22 14:27:34 -07:00
Benoit Steiner
e7a468c5b7
Filter some compilation flags that nvcc warns about.
2016-03-22 14:26:50 -07:00
Benoit Steiner
6a31b7be3e
Avoid using std::vector whenever possible
2016-03-22 14:02:50 -07:00
Benoit Steiner
65a7113a36
Use an enum instead of a static const int to prevent possible link error
2016-03-22 09:33:54 -07:00
Benoit Steiner
f9ad25e4d8
Fixed contractions of 16 bit floats
2016-03-22 09:30:23 -07:00
Benoit Steiner
8ef3181f15
Worked around a constness related issue
2016-03-21 11:24:05 -07:00
Benoit Steiner
7a07d6aa2b
Small cleanup
2016-03-21 11:12:17 -07:00
Konstantinos Margaritis
a9a6710e15
add initial s390x(zEC13) ZVECTOR support
2016-03-21 13:46:47 -04:00
Benoit Steiner
e91f255301
Marked variables that are only used in debug mode as such
2016-03-21 10:02:00 -07:00
Benoit Steiner
db5c14de42
Explicitly cast the default value into the proper scalar type.
2016-03-21 09:52:58 -07:00
Christoph Hertzberg
b224771f40
bug #1178 : Simplified modification of the SSE control register for better portability
2016-03-20 10:57:08 +01:00
Benoit Steiner
8e03333f06
Renamed some class members to make the code more readable.
2016-03-18 15:21:04 -07:00
Benoit Steiner
6c08943d9f
Fixed a bug in the padding of extracted image patches.
2016-03-18 15:19:10 -07:00
Benoit Steiner
134d750eab
Completed the implementation of vectorized type casting of half floats.
2016-03-18 13:36:28 -07:00
Benoit Steiner
7bd551b3a9
Make all the conversions explicit
2016-03-18 12:20:08 -07:00
Benoit Steiner
bb0e73c191
Gate all the CUDA tests under the EIGEN_TEST_NVCC option
2016-03-18 12:17:37 -07:00
Benoit Steiner
2db4a04827
Fixed a typo
2016-03-18 12:08:01 -07:00
Benoit Steiner
dd514de8a9
Added a test to validate the fallback path for half floats
2016-03-18 12:02:39 -07:00
Benoit Steiner
9a7ece9caf
Worked around constness issue
2016-03-18 10:38:29 -07:00
Benoit Steiner
edc679f6c6
Fixed compilation warning
2016-03-18 07:12:34 -07:00
Benoit Steiner
53d498ef06
Fixed compilation warnings in the cuda tests
2016-03-18 07:04:54 -07:00
Benoit Steiner
e10e126cd0
pulled latest updates from trunk
2016-03-17 21:48:38 -07:00
Benoit Steiner
70eb70f5f8
Avoid mutable class members when possible
2016-03-17 21:47:18 -07:00
Benoit Steiner
7b98de1f15
Implemented some of the missing type casting for half floats
2016-03-17 21:45:45 -07:00
Benoit Steiner
afb81b7ded
Made sure to use the hard abi when compiling with NEON instructions to avoid the "gnu/stubs-soft.h: No such file or directory" error
2016-03-17 21:24:24 -07:00
Benoit Steiner
95b8961a9b
Allocate the mersenne twister used by the random number generators on the heap instead of on the stack since they tend to keep a lot of state (i.e. about 5k) around.
2016-03-17 15:23:51 -07:00
Benoit Steiner
f7329619da
Fix bug in tensor contraction. The code assumes that contraction axis indices for the LHS (after possibly swapping to ColMajor!) are increasing. Explicitly sort the contraction axis pairs to make it so.
2016-03-17 15:08:02 -07:00
Christoph Hertzberg
46aa9772fc
Merged in ebrevdo/eigen (pull request PR-169)
...
Bugfixes to cuda tests, igamma & igammac implemented, & tests for digamma, igamma, igammac on CPU & GPU.
2016-03-16 21:59:08 +01:00
Eugene Brevdo
f1f7181f53
Merge default branch.
2016-03-16 12:46:19 -07:00
Eugene Brevdo
1f69a1b65f
Change the header guard around certain numext functions to be CUDA specific.
2016-03-16 12:44:35 -07:00
Benoit Steiner
ab9b749b45
Improved a test
2016-03-14 20:03:13 -07:00
Benoit Steiner
5a51366ea5
Fixed a typo.
2016-03-14 09:25:16 -07:00
Benoit Steiner
fcf59e1c37
Properly gate the use of cuda intrinsics in the code
2016-03-14 09:13:44 -07:00
Benoit Steiner
97a1f1c273
Make sure we only use the half float intrinsics when compiling with a version of CUDA that is recent enough to provide them
2016-03-14 08:37:58 -07:00
Eugene Brevdo
9550be925d
Merge specfun branch.
2016-03-13 15:46:51 -07:00
Eugene Brevdo
b1a9afe9a9
Add tests in array.cpp that check igamma/igammac properties.
...
This adds to the set of existing tests, which compare a specific
set of values to third party calculated ground truth.
2016-03-13 15:45:34 -07:00
Benoit Steiner
e29c9676b1
Don't mark the cast operator as explicit, since this is a c++11 feature that's not supported by older compilers.
2016-03-12 00:15:58 -08:00
Benoit Steiner
eecd914864
Also replaced uint32_t with unsigned int to make the code more portable
2016-03-11 19:34:21 -08:00
Benoit Steiner
1ca8c1ec97
Replaced a couple more uint16_t with unsigned short
2016-03-11 19:28:28 -08:00
Benoit Steiner
0423b66187
Use unsigned short instead of uint16_t since it's more portable
2016-03-11 17:53:41 -08:00
Benoit Steiner
048c4d6efd
Made half floats usable on hardware that doesn't support them natively.
2016-03-11 17:21:42 -08:00
Benoit Steiner
b72ffcb05e
Made the comparison of Eigen::array GPU friendly
2016-03-11 16:37:59 -08:00
Benoit Steiner
25f69cb932
Added a comparison operator for Eigen::array
...
Alias Eigen::array to std::array when compiling with Visual Studio 2015
2016-03-11 15:20:37 -08:00
Benoit Steiner
c5b98a58b8
Updated the cxx11_meta test to work on the Eigen::array class when std::array isn't available.
2016-03-11 11:53:38 -08:00
Benoit Steiner
456e038a4e
Fixed the +=, -=, *= and /= operators to return a reference
2016-03-10 15:17:44 -08:00
Benoit Steiner
86d45a3c83
Worked around visual studio compilation warnings.
2016-03-09 21:29:39 -08:00
Benoit Steiner
8fd4241377
Fixed a typo.
2016-03-10 02:28:46 +00:00
Benoit Steiner
a685a6beed
Made the list reductions less ambiguous.
2016-03-09 17:41:52 -08:00
Benoit Steiner
3149b5b148
Avoid implicit cast
2016-03-09 17:35:17 -08:00
Benoit Steiner
b2100b83ad
Made sure to include the <random> header file when compiling with visual studio
2016-03-09 16:03:16 -08:00
Benoit Steiner
f05fb449b8
Avoid unnecessary conversion from 32bit int to 64bit unsigned int
2016-03-09 15:27:45 -08:00
Benoit Steiner
1d566417d2
Enable the random number generators when compiling with visual studio
2016-03-09 10:55:11 -08:00
Eugene Brevdo
836e92a051
Update MathFunctions/SpecialFunctions with intelligent header guards.
2016-03-09 09:04:45 -08:00
Benoit Steiner
b084133dbf
Fixed the integer division code on windows
2016-03-09 07:06:36 -08:00
Benoit Steiner
6d30683113
Fixed static assertion
2016-03-08 21:02:51 -08:00
Eugene Brevdo
5e7de771e3
Properly fix merge issues.
2016-03-08 17:35:05 -08:00
Eugene Brevdo
73220d2bb0
Resolve bad merge.
2016-03-08 17:28:21 -08:00
Eugene Brevdo
5f17de3393
Merge changes.
2016-03-08 17:22:26 -08:00
Eugene Brevdo
14f0fde51f
Add certain functions to numext (log, exp, tan) because CUDA doesn't support std::
...
Use these in SpecialFunctions.
2016-03-08 17:17:44 -08:00
Benoit Steiner
46177c8d64
Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues.
2016-03-08 16:37:27 -08:00
Benoit Steiner
6d6413f768
Simplified the full reduction code
2016-03-08 16:02:00 -08:00
Benoit Steiner
5a427a94a9
Fixed the tensor generator code
2016-03-08 13:28:06 -08:00
Benoit Steiner
a81b88bef7
Fixed the tensor concatenation code
2016-03-08 12:30:19 -08:00
Benoit Steiner
551ff11d0d
Fixed the tensor layout swapping code
2016-03-08 12:28:10 -08:00
Benoit Steiner
8768c063f5
Fixed the tensor chipping code.
2016-03-08 12:26:49 -08:00
Benoit Steiner
e09eb835db
Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit.
2016-03-08 12:07:33 -08:00
Benoit Steiner
3b614a2358
Use NumTraits::highest() and NumTraits::lowest() instead of the std::numeric_limits to make the tensor min and max functors more CUDA friendly.
2016-03-07 17:53:28 -08:00
Eugene Brevdo
dd6dcad6c2
Merge branch specfun.
2016-03-07 15:37:12 -08:00
Eugene Brevdo
0bb5de05a1
Finishing touches on igamma/igammac for GPU. Tests now pass.
2016-03-07 15:35:09 -08:00
Benoit Steiner
769685e74e
Added the ability to pad a tensor using a non-zero value
2016-03-07 14:45:37 -08:00
Benoit Steiner
7f87cc3a3b
Fix a couple of typos in the code.
2016-03-07 14:31:27 -08:00
Eugene Brevdo
5707004d6b
Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes.
...
0. Prior to this PR, not a single sharded CUDA test was actually being *run*.
Fixed that.
GPU tests are still failing for igamma/igammac.
1. Add calls for igamma/igammac to TensorBase
2. Fix up CUDA-specific calls of igamma/igammac
3. Add unit tests for digamma, igamma, igammac in CUDA.
2016-03-07 14:08:56 -08:00
Benoit Steiner
e5f25622e2
Added a test to validate the behavior of some of the tensor syntactic sugar.
2016-03-07 09:04:27 -08:00
Benoit Steiner
9f5740cbc1
Added missing include
2016-03-06 22:03:18 -08:00
Benoit Steiner
5238e03fe1
Don't try to compile the uint128 test with compilers that don't support uint128
2016-03-06 21:59:40 -08:00
Benoit Steiner
9a54c3e32b
Don't warn that msvc 2015 isn't c++11 compliant just because it doesn't claim to be.
2016-03-06 09:38:56 -08:00
Benoit Steiner
05bbca079a
Turn on some of the cxx11 features when compiling with visual studio 2015
2016-03-05 10:52:08 -08:00
Benoit Steiner
6093eb9ff5
Don't test our 128bit emulation code when compiling with msvc
2016-03-05 10:37:11 -08:00
Benoit Steiner
57b263c5b9
Avoid using initializer lists in test since not all versions of msvc support them
2016-03-05 08:35:26 -08:00
Benoit Steiner
23aed8f2e4
Use EIGEN_PI instead of redefining our own constant PI
2016-03-05 08:04:45 -08:00
Eugene Brevdo
0b9e0abc96
Make igamma and igammac work correctly.
...
This required replacing ::abs with std::abs.
Modified some unit tests.
2016-03-04 21:12:10 -08:00
Benoit Steiner
c23e0be18f
Use the CMAKE_CXX_STANDARD variable to turn on cxx11
2016-03-04 20:18:01 -08:00
Benoit Steiner
ec35068edc
Don't rely on the M_PI constant since not all compilers provide it.
2016-03-04 16:42:38 -08:00
Benoit Steiner
60d9df11c1
Fixed the computation of leading zeros when compiling with msvc.
2016-03-04 16:27:02 -08:00
Benoit Steiner
4e49fd5eb9
MSVC uses __uint128 while other compilers use __uint128_t to encode 128bit unsigned integers. Make the cxx11_tensor_uint128.cpp test work in both cases.
2016-03-04 14:49:18 -08:00
Benoit Steiner
667fcc2b53
Fixed syntax error
2016-03-04 14:37:51 -08:00
Benoit Steiner
4416a5dcff
Added missing include
2016-03-04 14:35:43 -08:00
Benoit Steiner
c561eeb7bf
Don't use implicit type conversions in initializer lists since not all compilers support them.
2016-03-04 14:12:45 -08:00
Benoit Steiner
174edf976b
Made the contraction test more portable
2016-03-04 14:11:13 -08:00
Benoit Steiner
2c50fc878e
Fixed a typo
2016-03-04 14:09:38 -08:00
Eugene Brevdo
7ea35bfa1c
Initial implementation of igamma and igammac.
2016-03-03 19:39:41 -08:00
Benoit Steiner
deea866bbd
Added tests to cover the new rounding, flooring and ceiling tensor operations.
2016-03-03 12:38:02 -08:00
Benoit Steiner
5cf4558c0a
Added support for rounding, flooring, and ceiling to the tensor api
2016-03-03 12:36:55 -08:00
Benoit Steiner
dac58d7c35
Added a test to validate the conversion of half floats into floats on Kepler GPUs.
...
Restricted the testing of the random number generation code to GPU architecture greater than or equal to 3.5.
2016-03-03 10:37:25 -08:00
Benoit Steiner
1032441c6f
Enable partial support for half floats on Kepler GPUs.
2016-03-03 10:34:20 -08:00
Benoit Steiner
1da10a7358
Enable the conversion between floats and half floats on older GPUs that support it.
2016-03-03 10:33:20 -08:00
Benoit Steiner
2de8cc9122
Merged in ebrevdo/eigen (pull request PR-167)
...
Add infinity() support to numext::numeric_limits, use it in lgamma.
I tested the code on my gtx-titan-black gpu, and it appears to work as expected.
2016-03-03 09:42:12 -08:00
Eugene Brevdo
ab3dc0b0fe
Small bugfix to numeric_limits for CUDA.
2016-03-02 21:48:46 -08:00
Eugene Brevdo
6afea46838
Add infinity() support to numext::numeric_limits, use it in lgamma.
...
This makes the infinity access a __device__ function, removing
nvcc warnings.
2016-03-02 21:35:48 -08:00
Gael Guennebaud
3fccef6f50
bug #537 : fix compilation with Apple's compiler
2016-03-02 13:22:46 +01:00
Benoit Steiner
fedaf19262
Pulled latest updates from trunk
2016-03-01 06:15:44 -08:00
Gael Guennebaud
dfa80b2060
Compilation fix
2016-03-01 12:48:56 +01:00
Gael Guennebaud
bee9efc203
Compilation fix
2016-03-01 12:47:27 +01:00
Benoit Steiner
68ac5c1738
Improved the performance of large outer reductions on cuda
2016-02-29 18:11:58 -08:00
Benoit Steiner
56a3ada670
Added benchmarks for full reduction
2016-02-29 14:57:52 -08:00
Benoit Steiner
b2075cb7a2
Made the signature of the inner and outer reducers consistent
2016-02-29 10:53:38 -08:00
Benoit Steiner
3284842045
Optimized the performance of narrow reductions on CUDA devices
2016-02-29 10:48:16 -08:00
Gael Guennebaud
e9bea614ec
Fix shortcoming in fixed-value deduction of startRow/startCol
2016-02-29 10:31:27 +01:00
Benoit Steiner
609b3337a7
Print some information to stderr when a CUDA kernel fails
2016-02-27 20:42:57 +00:00
Benoit Steiner
1031b31571
Improved the README
2016-02-27 20:22:04 +00:00
Gael Guennebaud
8e6faab51e
bug #1172 : make valuePtr and innerIndexPtr properly return null for empty matrices.
2016-02-27 14:55:40 +01:00
Benoit Steiner
ac2e6e0d03
Properly vectorized the random number generators
2016-02-26 13:52:24 -08:00
Benoit Steiner
caa54d888f
Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag
2016-02-26 12:38:18 -08:00
Benoit Steiner
93485d86bc
Added benchmarks for type casting of float16
2016-02-26 12:24:58 -08:00
Benoit Steiner
002824e32d
Added benchmarks for fp16
2016-02-26 12:21:25 -08:00
Benoit Steiner
2cd32cad27
Reverted previous commit since it caused more problems than it solved
2016-02-26 13:21:44 +00:00
Benoit Steiner
d9d05dd96e
Fixed handling of long doubles on aarch64
2016-02-26 04:13:58 -08:00
Benoit Steiner
af199b4658
Made the CUDA architecture level a build setting.
2016-02-25 09:06:18 -08:00
Benoit Steiner
c36c09169e
Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices.
2016-02-24 17:07:25 -08:00
Benoit Steiner
7a01cb8e4b
Marked the And and Or reducers as stateless.
2016-02-24 16:43:01 -08:00
Gael Guennebaud
91e1375ba9
merge
2016-02-23 11:09:05 +01:00
Gael Guennebaud
055000a424
Fix startRow()/startCol() for dense Block with direct access:
...
the initial implementation failed for empty rows/columns, for which they are ambiguous.
2016-02-23 11:07:59 +01:00
Benoit Steiner
1d9256f7db
Updated the padding code to work with half floats
2016-02-23 05:51:22 +00:00
Benoit Steiner
8cb9bfab87
Extended the tensor benchmark suite to support types other than floats
2016-02-23 05:28:02 +00:00
Benoit Steiner
f442a5a5b3
Updated the tensor benchmarking code to work with compilers that don't support cxx11.
2016-02-23 04:15:48 +00:00
Benoit Steiner
72d2cf642e
Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode.
2016-02-22 15:29:41 -08:00
Benoit Steiner
6270d851e3
Declare the half float type as arithmetic.
2016-02-22 13:59:33 -08:00
Benoit Steiner
5cd00068c0
include <iostream> in the tensor header since we now use it to better report cuda initialization errors
2016-02-22 13:59:03 -08:00
Benoit Steiner
257b640463
Fixed compilation warning generated by clang
2016-02-21 22:43:37 -08:00
Benoit Steiner
584832cb3c
Implemented the ptranspose function on half floats
2016-02-21 12:44:53 -08:00
Benoit Steiner
e644f60907
Pulled latest updates from trunk
2016-02-21 20:24:59 +00:00
Benoit Steiner
95fceb6452
Added the ability to compute the absolute value of a half float
2016-02-21 20:24:11 +00:00
Benoit Steiner
ed69cbeef0
Added some debugging information to the test to figure out why it fails sometimes
2016-02-21 11:20:20 -08:00
Benoit Steiner
96a24b05cc
Optimized casting of tensors in the case where the casting happens to be a no-op
2016-02-21 11:16:15 -08:00
Benoit Steiner
203490017f
Prevent unnecessary Index to int conversions
2016-02-21 08:49:36 -08:00
Benoit Steiner
9ff269a1d3
Moved some of the fp16 operators outside the Eigen namespace to workaround some nvcc limitations.
2016-02-20 07:47:23 +00:00
Benoit Steiner
1e6fe6f046
Fixed the float16 tensor test.
2016-02-20 07:44:17 +00:00
Rasmus Munk Larsen
8eb127022b
Get rid of duplicate code.
2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen
d5e2ec7447
Speed up tensor FFT by up to ~25-50%.
...
Benchmark                        Base (ns)   New (ns)  Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8          132        134        -1.5%
BM_tensor_fft_single_1D_cpu/9         1162       1229        -5.8%
BM_tensor_fft_single_1D_cpu/16         199        195        +2.0%
BM_tensor_fft_single_1D_cpu/17        2587       2267       +12.4%
BM_tensor_fft_single_1D_cpu/32         373        341        +8.6%
BM_tensor_fft_single_1D_cpu/33        5922       4879       +17.6%
BM_tensor_fft_single_1D_cpu/64         797        675       +15.3%
BM_tensor_fft_single_1D_cpu/65       13580      10481       +22.8%
BM_tensor_fft_single_1D_cpu/128       1753       1375       +21.6%
BM_tensor_fft_single_1D_cpu/129      31426      22789       +27.5%
BM_tensor_fft_single_1D_cpu/256       4005       3008       +24.9%
BM_tensor_fft_single_1D_cpu/257      70910      49549       +30.1%
BM_tensor_fft_single_1D_cpu/512       8989       6524       +27.4%
BM_tensor_fft_single_1D_cpu/513     165402     107751       +34.9%
BM_tensor_fft_single_1D_cpu/999     198293     115909       +41.5%
BM_tensor_fft_single_1D_cpu/1ki      21289      14143       +33.6%
BM_tensor_fft_single_1D_cpu/1k      361980     233355       +35.5%
BM_tensor_fft_double_1D_cpu/8          138        131        +5.1%
BM_tensor_fft_double_1D_cpu/9         1253       1133        +9.6%
BM_tensor_fft_double_1D_cpu/16         218        200        +8.3%
BM_tensor_fft_double_1D_cpu/17        2770       2392       +13.6%
BM_tensor_fft_double_1D_cpu/32         406        368        +9.4%
BM_tensor_fft_double_1D_cpu/33        6418       5153       +19.7%
BM_tensor_fft_double_1D_cpu/64         856        728       +15.0%
BM_tensor_fft_double_1D_cpu/65       14666      11148       +24.0%
BM_tensor_fft_double_1D_cpu/128       1913       1502       +21.5%
BM_tensor_fft_double_1D_cpu/129      36414      24072       +33.9%
BM_tensor_fft_double_1D_cpu/256       4226       3216       +23.9%
BM_tensor_fft_double_1D_cpu/257      86638      52059       +39.9%
BM_tensor_fft_double_1D_cpu/512       9397       6939       +26.2%
BM_tensor_fft_double_1D_cpu/513     203208     114090       +43.9%
BM_tensor_fft_double_1D_cpu/999     237841     125583       +47.2%
BM_tensor_fft_double_1D_cpu/1ki      20921      15392       +26.4%
BM_tensor_fft_double_1D_cpu/1k      455183     250763       +44.9%
BM_tensor_fft_single_2D_cpu/8         1051       1005        +4.4%
BM_tensor_fft_single_2D_cpu/9        16784      14837       +11.6%
BM_tensor_fft_single_2D_cpu/16        4074       3772        +7.4%
BM_tensor_fft_single_2D_cpu/17       75802      63884       +15.7%
BM_tensor_fft_single_2D_cpu/32       20580      16931       +17.7%
BM_tensor_fft_single_2D_cpu/33      345798     278579       +19.4%
BM_tensor_fft_single_2D_cpu/64       97548      81237       +16.7%
BM_tensor_fft_single_2D_cpu/65     1592701    1227048       +23.0%
BM_tensor_fft_single_2D_cpu/128     472318     384303       +18.6%
BM_tensor_fft_single_2D_cpu/129    7038351    5445308       +22.6%
BM_tensor_fft_single_2D_cpu/256    2309474    1850969       +19.9%
BM_tensor_fft_single_2D_cpu/257   31849182   23797538       +25.3%
BM_tensor_fft_single_2D_cpu/512   10395194    8077499       +22.3%
BM_tensor_fft_single_2D_cpu/513  144053843  104242541       +27.6%
BM_tensor_fft_single_2D_cpu/999  279885833  208389718       +25.5%
BM_tensor_fft_single_2D_cpu/1ki   45967677   36070985       +21.5%
BM_tensor_fft_single_2D_cpu/1k   619727095  456489500       +26.3%
BM_tensor_fft_double_2D_cpu/8         1110       1016        +8.5%
BM_tensor_fft_double_2D_cpu/9        17957      15768       +12.2%
BM_tensor_fft_double_2D_cpu/16        4558       4000       +12.2%
BM_tensor_fft_double_2D_cpu/17       79237      66901       +15.6%
BM_tensor_fft_double_2D_cpu/32       21494      17699       +17.7%
BM_tensor_fft_double_2D_cpu/33      357962     290357       +18.9%
BM_tensor_fft_double_2D_cpu/64      105179      87435       +16.9%
BM_tensor_fft_double_2D_cpu/65     1617143    1288006       +20.4%
BM_tensor_fft_double_2D_cpu/128     512848     419397       +18.2%
BM_tensor_fft_double_2D_cpu/129    7271322    5636884       +22.5%
BM_tensor_fft_double_2D_cpu/256    2415529    1922032       +20.4%
BM_tensor_fft_double_2D_cpu/257   32517952   24462177       +24.8%
BM_tensor_fft_double_2D_cpu/512   10724898    8287617       +22.7%
BM_tensor_fft_double_2D_cpu/513  146007419  108603266       +25.6%
BM_tensor_fft_double_2D_cpu/999  296351330  221885776       +25.1%
BM_tensor_fft_double_2D_cpu/1ki   59334166   48357539       +18.5%
BM_tensor_fft_double_2D_cpu/1k   666660132  483840349       +27.4%
2016-02-19 16:29:23 -08:00
Gael Guennebaud
d90a2dac5e
merge
2016-02-19 23:01:27 +01:00
Gael Guennebaud
485823b5f5
Add COD and BDCSVD in list of benched solvers.
2016-02-19 23:00:33 +01:00
Gael Guennebaud
2af04f1a57
Extend unit test to stress smart_copy with empty input/output.
2016-02-19 22:59:28 +01:00
Gael Guennebaud
6fa35bbd28
bug #1170 : skip calls to memcpy/memmove for empty input.
2016-02-19 22:58:52 +01:00
Benoit Steiner
46fc23f91c
Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues.
2016-02-19 13:44:22 -08:00
Gael Guennebaud
6f0992c05b
Fix nesting type and complete reflection methods of Block expressions.
2016-02-19 22:21:02 +01:00
Gael Guennebaud
f3643eec57
Add typedefs for the return type of all block methods.
2016-02-19 22:15:01 +01:00
Benoit Steiner
670db7988d
Updated the contraction code to make it compatible with half floats.
2016-02-19 13:03:26 -08:00
Benoit Steiner
180156ba1a
Added support for tensor reductions on half floats
2016-02-19 10:05:59 -08:00
Benoit Steiner
5c4901b83a
Implemented the scalar division of 2 half floats
2016-02-19 10:03:19 -08:00
Benoit Steiner
f268db1c4b
Added the ability to query the minor version of a cuda device
2016-02-19 16:31:04 +00:00
Benoit Steiner
a08d2ff0c9
Started to work on contractions and reductions using half floats
2016-02-19 15:59:59 +00:00
Benoit Steiner
f3352e0fb0
Don't make the array constructors explicit
2016-02-19 15:58:57 +00:00
Benoit Steiner
f7cb755299
Added support for operators +=, -=, *= and /= on CUDA half floats
2016-02-19 15:57:26 +00:00
Benoit Steiner
dc26459b99
Implemented protate() for CUDA
2016-02-19 15:16:54 +00:00
Benoit Steiner
cd042dbbfd
Fixed a bug in the tensor type converter
2016-02-19 15:03:26 +00:00
Benoit Steiner
ac5d706a94
Added support for simple coefficient wise tensor expression using half floats on CUDA devices
2016-02-19 08:19:12 +00:00
Benoit Steiner
0606a0a39b
FP16 on CUDA are only available starting with cuda 7.5. Disable them when using an older version of CUDA
2016-02-18 23:15:23 -08:00
Benoit Steiner
f36c0c2c65
Added regression test for float16
2016-02-19 06:23:28 +00:00
Benoit Steiner
7151bd8768
Reverted unintended changes introduced by a bad merge
2016-02-19 06:20:50 +00:00
Benoit Steiner
1304e1fb5e
Pulled latest updates from trunk
2016-02-19 06:17:02 +00:00
Benoit Steiner
17b9fbed34
Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa
2016-02-19 06:16:07 +00:00
Benoit Steiner
8ce46f9d89
Improved implementation of ptanh for SSE and AVX
2016-02-18 13:24:34 -08:00
Eugene Brevdo
832380c455
Merged eigen/eigen into default
2016-02-17 14:44:06 -08:00
Eugene Brevdo
06a2bc7c9c
Tiny bugfix in SpecialFunctions: some compilers don't like doubles
...
implicitly downcast to floats in an array constructor.
2016-02-17 14:41:59 -08:00
Gael Guennebaud
f6f057bb7d
bug #1166 : fix shortcoming in gemv when the destination is not a vector at compile-time.
2016-02-15 21:43:07 +01:00
Gael Guennebaud
8e1f1ba6a6
Import wiki's paragraph: "I disabled vectorization, but I'm still getting annoyed about alignment issues"
2016-02-12 22:16:59 +01:00
Gael Guennebaud
c8b4c4b48a
bug #795 : mention allocate_shared as a candidate for aligned_allocator.
2016-02-12 22:09:16 +01:00
Gael Guennebaud
6eff3e5185
Fix triangularView versus triangularPart.
2016-02-12 17:09:28 +01:00
Gael Guennebaud
4252af6897
Remove dead code.
2016-02-12 16:13:35 +01:00
Gael Guennebaud
2f5f56a820
Fix usage of evaluator in sparse * permutation products.
2016-02-12 16:13:16 +01:00
Gael Guennebaud
0a537cb2d8
bug #901 : fix triangular-view with unit diagonal of sparse rectangular matrices.
2016-02-12 15:58:31 +01:00
Gael Guennebaud
b35d1a122e
Fix unit test: accessing elements in a deque by offsetting a pointer to another element causes undefined behavior.
2016-02-12 15:31:16 +01:00
Benoit Steiner
9e3f3a2d27
Deleted outdated comment
2016-02-11 17:27:35 -08:00
Benoit Steiner
de345eff2e
Added a method to conjugate the content of a tensor or the result of a tensor expression.
2016-02-11 16:34:07 -08:00
Benoit Steiner
17e93ba148
Pulled latest updates from trunk
2016-02-11 15:05:38 -08:00
Benoit Steiner
3628f7655d
Made it possible to run the scalar_binary_pow_op functor on GPU
2016-02-11 15:05:03 -08:00
Hauke Heibel
eeac46f980
bug #774 : re-added comment referencing equations in the original paper
2016-02-11 19:38:37 +01:00
Benoit Steiner
c569cfe12a
Inline the +=, -=, *= and /= operators consistently between DenseBase.h and SelfCwiseBinaryOp.h
2016-02-11 09:33:32 -08:00
Gael Guennebaud
8cc9232b9a
bug #774 : fix a numerical issue producing unwanted reflections.
2016-02-11 15:32:56 +01:00
Gael Guennebaud
2d35c0cb5f
Merged in rmlarsen/eigen (pull request PR-163)
...
Implement complete orthogonal decomposition in Eigen.
2016-02-11 15:12:34 +01:00
Benoit Steiner
33e2373f01
Merged in nnyby/eigen/nnyby/doc-grammar-fix-linearly-space-linearly-1443742971203 (pull request PR-138)
...
[doc] grammar fix: "linearly space" -> "linearly spaced"
2016-02-10 23:29:59 -08:00
Benoit Steiner
6d8b1dce06
Avoid implicit cast from double to float.
2016-02-10 18:07:11 -08:00
Benoit Steiner
1dfaafe28a
Added a regression test for tanh
2016-02-10 17:41:47 -08:00
Rasmus Munk Larsen
b6fdf7468c
Rename inverse -> pseudoInverse.
2016-02-10 13:03:07 -08:00
Benoit Jacob
9d6f1ad398
I'm told to use __EMSCRIPTEN__ by an Emscripten dev.
2016-02-10 12:48:34 -05:00
Benoit Steiner
bfb3fcd94f
Optimized implementation of the tanh function for SSE
2016-02-10 08:52:30 -08:00
Benoit Steiner
2d523332b3
Optimized implementation of the hyperbolic tangent function for AVX
2016-02-10 08:48:05 -08:00
Benoit Jacob
e6ee18d6b4
Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC
2016-02-10 11:11:49 -05:00
Benoit Steiner
2ac59e5d36
Pulled latest updates from trunk
2016-02-10 08:03:02 -08:00
Benoit Steiner
9a21b38ccc
Worked around a few clang compilation warnings
2016-02-10 08:02:04 -08:00
Benoit Jacob
964a95bf5e
Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088
2016-02-10 10:37:22 -05:00
Benoit Steiner
72ab7879f7
Fixed clang compilation warnings
2016-02-10 06:48:28 -08:00
Benoit Steiner
e88535634d
Fixed some clang compilation warnings
2016-02-09 23:32:41 -08:00
Benoit Steiner
970751ece3
Disabling the nvcc warnings in addition to the clang warnings when clang is used as a frontend for nvcc
2016-02-09 20:55:50 -08:00
Benoit Steiner
6323851ea9
Fixed compilation warning
2016-02-09 20:43:41 -08:00
Rasmus Munk Larsen
bb8811c655
Enable inverse() method for computing pseudo-inverse.
2016-02-09 20:35:20 -08:00
Benoit Steiner
5cc0dd5f44
Fixed the code that disables the use of variadic templates when compiling with nvcc on ARM devices.
2016-02-09 10:32:01 -08:00
Benoit Steiner
a9cc6a06b9
Fixed compilation warning in the splines test
2016-02-09 05:10:06 +00:00
Benoit Steiner
d69946183d
Updated the TensorIntDivisor code to work properly on LLP64 systems
2016-02-08 21:03:59 -08:00
Benoit Steiner
24d291cf16
Worked around nvcc crash when compiling Eigen on Tegra X1
2016-02-09 02:34:02 +00:00
Rasmus Munk Larsen
53f60e0afc
Make applyZAdjointOnTheLeftInPlace protected.
2016-02-08 09:01:43 -08:00
Rasmus Munk Larsen
414efa47d3
Add missing calls to tests of COD.
...
Fix a few mistakes in 3.2 -> 3.3 port.
2016-02-08 08:50:34 -08:00
Gael Guennebaud
c2bf2f56ef
Remove custom unaligned loads for SSE. They were only useful for core2 CPU.
2016-02-08 14:29:12 +01:00
Gael Guennebaud
a4c76f8d34
Improve inlining
2016-02-08 11:33:02 +01:00
Rasmus Munk Larsen
16ec450ca1
Nevermind.
2016-02-06 17:54:01 -08:00
Rasmus Munk Larsen
019fff9a00
Add my name to copyright notice in ColPivHouseholder.h, mostly for previous work on stable norm downdate formula.
2016-02-06 17:48:42 -08:00
Rasmus Munk Larsen
86d6201d7b
Merge.
2016-02-06 16:36:56 -08:00
Rasmus Munk Larsen
d904c8ac8f
Implement complete orthogonal decomposition in Eigen.
2016-02-06 16:32:00 -08:00
Gael Guennebaud
010afe1619
Add examples for reshaping/slicing with Map.
2016-02-06 22:49:18 +01:00
Gael Guennebaud
8e599bc098
Fix warning in unit test
2016-02-06 20:26:59 +01:00
Gael Guennebaud
c6a12d1dc6
Fix warning with gcc < 4.8
2016-02-06 18:06:51 +01:00
Benoit Steiner
4d4211c04e
Avoid unnecessary type conversions
2016-02-05 18:19:41 -08:00
Benoit Steiner
d2cba52015
Only enable the cxx11_tensor_uint128 test on 64 bit machines since 32 bit systems don't support the __uint128_t type
2016-02-05 18:14:23 -08:00
Benoit Steiner
fb00a4af2b
Made the tensor fft test compile on tegra x1
2016-02-06 01:42:14 +00:00
Gael Guennebaud
5b2d287878
bug #779 : allow non aligned buffers for buffers smaller than the requested alignment.
2016-02-05 21:46:39 +01:00
Gael Guennebaud
e8e1d504d6
Add an explicit assertion on the alignment of the pointer returned by std::malloc
2016-02-05 21:38:16 +01:00
Gael Guennebaud
62a1c911cd
Remove posix_memalign, _mm_malloc, and _aligned_malloc special paths.
2016-02-05 21:24:35 +01:00
Rasmus Munk Larsen
093f2b3c01
Merge.
2016-02-04 14:32:19 -08:00
Benoit Steiner
3ca1ae2bb7
Commented out the version of pexp<Packet8d> since it fails to compile with gcc 5.3
2016-02-04 13:49:06 -08:00
Rasmus Munk Larsen
2e39cc40a4
Fix condition that made the unit test spam stdout with bogus error messages.
2016-02-04 12:56:14 -08:00
Benoit Steiner
23f69ab936
Added implementations of pexp, plog, psqrt, and prsqrt optimized for AVX512
2016-02-04 10:36:36 -08:00
Benoit Steiner
6c9cf117c1
Fixed indentation
2016-02-04 10:34:10 -08:00
Benoit Steiner
bcdcdace48
Pulled latest updates from trunk
2016-02-04 08:56:49 -08:00
Gael Guennebaud
659fc9c159
Remove dead code
2016-02-04 09:55:09 +01:00
Gael Guennebaud
d5d7798b9d
Improve heuristics for switching between coeff-based and general matrix product implementation.
2016-02-04 09:53:47 +01:00
Benoit Steiner
f535378995
Added support for vectorized type casting of int to char.
2016-02-03 18:58:29 -08:00
Benoit Steiner
4ab63a3f6f
Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements.
2016-02-03 17:23:07 -08:00
Benoit Steiner
727ff26960
Disable 2 more nvcc warning messages
2016-02-03 16:01:37 -08:00
Benoit Steiner
1cbb79cdfd
Made sure the dummy element of size 0 array is always initialized to silence some compiler warnings
2016-02-03 15:58:26 -08:00
Benoit Steiner
bcbde37a11
Made sure the code compiles when EIGEN_HAS_C99_MATH isn't defined
2016-02-03 14:53:08 -08:00
Benoit Steiner
f933f69021
Added a few comments
2016-02-03 14:12:18 -08:00
Benoit Steiner
5d82e47ef6
Properly disable nvcc warning messages in user code.
2016-02-03 14:10:06 -08:00
Benoit Steiner
af8436b196
Silenced the "calling a __host__ function from a __host__ __device__ function is not allowed" messages
2016-02-03 13:48:36 -08:00
Benoit Steiner
d7742d22e4
Revert the nvcc messages to their default severity instead of forcing them to be warnings
2016-02-03 13:47:28 -08:00
Benoit Steiner
ac26e1aaf3
Pulled latest updates from trunk
2016-02-03 12:52:20 -08:00
Benoit Steiner
492fe7ce02
Silenced some unhelpful warnings generated by nvcc.
2016-02-03 12:51:19 -08:00
Gael Guennebaud
b70db60e4d
Merged in rmlarsen/eigen (pull request PR-161)
...
Change Eigen's ColPivHouseholderQR to use numerically stable norm downdate formula
2016-02-03 21:37:06 +01:00
Rasmus Munk Larsen
5fb04ab2da
Fix bad line break. Don't repeat Kahan matrix test since it is deterministic.
2016-02-03 10:12:10 -08:00
Rasmus Munk Larsen
d9a6f86cc0
Make the array of directly computed column norms a member to avoid allocation in computeInPlace.
2016-02-03 09:55:30 -08:00
Gael Guennebaud
70dc14e4e1
bug #1161 : fix division by zero for huge scalar types
2016-02-03 18:25:41 +01:00
Damien R
c301f99208
bug #1164 : fix list and deque specializations such that our aligned allocator is automatically activated only when the user did not specify an allocator (or specified the default std::allocator).
2016-02-03 18:07:25 +01:00
Gael Guennebaud
eb6d9aea0e
Clarify error message when writing to a read-only sparse-sub-matrix.
2016-02-03 16:58:23 +01:00
Gael Guennebaud
040cf33e8f
merge
2016-02-03 16:09:51 +01:00
Gael Guennebaud
c85fbfd0b7
Clarify documentation on the restrictions of writable sparse block expressions.
2016-02-03 16:08:43 +01:00
Benoit Steiner
dc413dbe8a
Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
...
Add constructor for long types.
2016-02-02 20:58:06 -08:00
Ville Kallioniemi
783018d8f6
Use EIGEN_STATIC_ASSERT for backward compatibility.
2016-02-02 16:45:12 -07:00
Benoit Steiner
99cde88341
Don't try to use direct offsets when computing a tensor product, since the required stride isn't available.
2016-02-02 11:06:53 -08:00
Ville Kallioniemi
ff0a83aaf8
Use single template constructor to avoid overload resolution issues.
2016-02-02 00:33:25 -07:00
Ville Kallioniemi
aedea349aa
Replace separate low word constructors with a single templated constructor.
2016-02-01 20:25:02 -07:00
Ville Kallioniemi
f0fdefa96f
Rebase to latest.
2016-02-01 19:32:31 -07:00
Benoit Steiner
d93b71a301
Updated the packetmath test to call predux_half instead of predux4
2016-02-01 15:18:33 -08:00
Benoit Steiner
ef66f2887b
Updated the matrix multiplication code to make it compile with AVX512 enabled.
2016-02-01 14:38:05 -08:00
Benoit Steiner
85b6d82b49
Generalized predux4 to support AVX512 packets, and renamed it predux_half.
...
Disabled the implementation of pabs for avx512 since the corresponding intrinsics are not shipped with gcc
2016-02-01 14:35:51 -08:00
Benoit Steiner
64ce78c2ec
Cleaned up a tensor contraction test
2016-02-01 13:57:41 -08:00
Benoit Steiner
0ce5d32be5
Sharded the cxx11_tensor_contract_cuda test
2016-02-01 13:33:23 -08:00
Benoit Steiner
922b5f527b
Silenced a few compilation warnings
2016-02-01 13:30:49 -08:00
Benoit Steiner
6b5dff875e
Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations.
2016-02-01 12:46:32 -08:00
Rasmus Munk Larsen
00f9ef6c76
merging.
2016-02-01 11:10:30 -08:00
Benoit Steiner
264f8141f8
Sharded the tensor reduction test
2016-02-01 07:44:31 -08:00
Benoit Steiner
11bb71c8fc
Sharded the tensor device test
2016-02-01 07:34:59 -08:00
Gael Guennebaud
ff1157bcbf
bug #694 : document that SparseQR::matrixR is not sorted.
2016-02-01 16:09:34 +01:00
Gael Guennebaud
ec469700dc
bug #557 : make InnerIterator of sparse storage types more versatile by adding default-ctor, copy-ctor/assignment
2016-02-01 15:04:33 +01:00
Gael Guennebaud
6e0a86194c
Fix integer path for num_steps==1
2016-02-01 15:00:04 +01:00
Gael Guennebaud
e1d219e5c9
bug #698 : fix linspaced for integer types.
2016-02-01 14:25:34 +01:00
Gael Guennebaud
2c3224924b
Fix warning and replace min/max macros by calls to mini/maxi
2016-02-01 10:23:45 +01:00
Benoit Steiner
e80ed948e1
Fixed a number of compilation warnings generated by the cuda tests
2016-01-31 20:09:41 -08:00
Benoit Steiner
6720b38fbf
Fixed a few compilation warnings
2016-01-31 16:48:50 -08:00
Benoit Steiner
3f1ee45833
Fixed compilation errors triggered by duplicate inline declaration
2016-01-31 10:48:49 -08:00
Benoit Steiner
70be6f6531
Pulled latest changes from trunk
2016-01-31 10:44:45 -08:00
Benoit Steiner
4a2ddfb81d
Sharded the CUDA argmax tensor test
2016-01-31 10:44:15 -08:00
Gael Guennebaud
d142165942
bug #667 : declare several critical functions as FORCE_INLINE to make ICC happier.
2016-01-31 16:34:10 +01:00
Gael Guennebaud
a4e4542b89
Avoid overflow in unit test.
2016-01-30 22:26:17 +01:00
Gael Guennebaud
3ba8a3ab1a
Disable underflow unit test on the i387 FPU.
2016-01-30 22:14:04 +01:00
Benoit Steiner
483082ef6e
Fixed a few memory leaks in the cuda tests
2016-01-30 11:59:22 -08:00
Benoit Steiner
bd21aba181
Sharded the cxx11_tensor_cuda test and fixed a memory leak
2016-01-30 11:47:09 -08:00
Benoit Steiner
9de155d153
Added a test to cover threaded tensor shuffling
2016-01-30 10:56:47 -08:00
Benoit Steiner
32088c06a1
Made the comparison between single and multithreaded contraction results more resistant to numerical noise to prevent spurious test failures.
2016-01-30 10:51:14 -08:00
Benoit Steiner
2053478c56
Made sure to use a tensor of rank 0 to store the result of a full reduction in the tensor thread pool test
2016-01-30 10:46:36 -08:00
Benoit Steiner
d0db95f730
Sharded the tensor thread pool test
2016-01-30 10:43:57 -08:00
Benoit Steiner
ba27c8a7de
Made the CUDA contract test more robust to numerical noise.
2016-01-30 10:28:43 -08:00
Benoit Steiner
4281eb1e2c
Added 2 benchmarks to the suite of tensor benchmarks running on GPU
2016-01-30 10:20:43 -08:00
Gael Guennebaud
102fa96a96
Extend doc on dense+sparse
2016-01-30 14:58:21 +01:00
Gael Guennebaud
1bc207c528
backout changeset d4a9e61569
...
: the extended SparseView is not needed anymore
2016-01-30 14:43:21 +01:00
Gael Guennebaud
8ed1553d20
bug #632 : implement general coefficient-wise "dense op sparse" operations through specialized evaluators instead of using SparseView.
...
This makes it possible to deal with arbitrary storage order, and to bypass the more complex iterator of the sparse-sparse case.
2016-01-30 14:39:50 +01:00
Gael Guennebaud
699634890a
bug #946 : generalize Cholmod::solve to handle any rhs expression
2016-01-29 23:02:22 +01:00
Gael Guennebaud
15084cf1ac
bug #632 : add support for "dense +/- sparse" operations. The current implementation is based on SparseView to make the dense subexpression compatible with the sparse one.
2016-01-29 22:09:45 +01:00
Gael Guennebaud
d4a9e61569
Extend SparseView to allow keeping explicit zeros. This is equivalent to sparseView(1,-1) but faster because the test is removed at compile-time.
2016-01-29 22:07:56 +01:00
Gael Guennebaud
d8d37349c3
bug #696 : enable zero-sized block at compile-time by relaxing the respective assertion
2016-01-29 12:44:49 +01:00
Gael Guennebaud
e8ccc06fe5
merge
2016-01-29 09:40:38 +01:00
Benoit Steiner
963f2d2a8f
Marked several methods EIGEN_DEVICE_FUNC
2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0
Fixed a couple of compilation warnings.
2016-01-28 23:15:45 -08:00
Benoit Steiner
e4f83bae5d
Fixed the tensor benchmarks on apple devices
2016-01-28 21:08:07 -08:00
Benoit Steiner
10bea90c4a
Fixed clang related compilation error
2016-01-28 20:52:08 -08:00
Benoit Steiner
d3f533b395
Fixed compilation warning
2016-01-28 20:09:45 -08:00
Abhijit Kundu
3fde202215
Making ceil() functor generic w.r.t packet type
2016-01-28 21:27:00 -05:00
Benoit Steiner
211d350fc3
Fixed a typo
2016-01-28 17:13:04 -08:00
Benoit Steiner
bd2e5a788a
Made sure the number of floating point operations done by a benchmark is computed using 64 bit integers to avoid overflows.
2016-01-28 17:10:40 -08:00
Benoit Steiner
120e13b1b6
Added a readme to explain how to compile the tensor benchmarks.
2016-01-28 17:06:00 -08:00
Benoit Steiner
a68864b6bc
Updated the benchmarking code to print the number of flops processed instead of the number of bytes.
2016-01-28 16:51:40 -08:00
Benoit Steiner
8217281ae4
Merge latest updates from trunk
2016-01-28 16:20:53 -08:00
Benoit Steiner
c8d5f21941
Added extra tensor benchmarks
2016-01-28 16:20:36 -08:00
Benoit Steiner
7b3044d086
Made sure to call nvcc with the relaxed-constexpr flag.
2016-01-28 15:36:34 -08:00
Rasmus Munk Larsen
acce4dd050
Change Eigen's ColPivHouseholderQR to use the numerically stable norm downdate formula from http://www.netlib.org/lapack/lawnspdf/lawn176.pdf , which has been used in LAPACK's xGEQPF and xGEQP3 since 2006. With the old formula, the code chooses the wrong pivots and fails to correctly determine rank on graded matrices.
...
This change also adds additional checks for non-increasing diagonal in R11 to existing unit tests, and adds a new unit test with the Kahan matrix, which consistently fails for the original code.
Benchmark timings on Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz. Code compiled with AVX & FMA. I just ran on square matrices of 3 different sizes.
Before:
Benchmark                Time(ns)     CPU(ns)  Iterations
---------------------------------------------------------
BM_EigencolPivQR/64         53677       53627       12890
BM_EigencolPivQR/512     15265408    15250784          46
BM_EigencolPivQR/4k   15403556228 15388788368           2
After (non-vectorized version):
Benchmark                Time(ns)     CPU(ns)  Iterations  Degradation
----------------------------------------------------------------------
BM_EigencolPivQR/64         63736       63669       10844        18.5%
BM_EigencolPivQR/512     16052546    16037381          43         5.1%
BM_EigencolPivQR/4k   15149263620 15132025316           2        -2.0%
Performance-wise there seems to be a ~18.5% degradation for small (64x64) matrices, probably due to the cost of more O(min(m,n)^2) sqrt operations that are not needed for the unstable formula.
2016-01-28 15:07:26 -08:00
Gael Guennebaud
b908e071a8
bug #178 : get rid of some const_cast in SparseCore
2016-01-28 22:11:18 +01:00
Gael Guennebaud
c1d900af61
bug #178 : remove additional const on nested expression, and remove several const_cast.
2016-01-28 21:43:20 +01:00
Benoit Steiner
12f8bd12a2
Merged in jiayq/eigen (pull request PR-159)
...
Modifications to the tensor benchmarks to allow compilation in a standalone fashion.
2016-01-28 11:28:55 -08:00
Yangqing Jia
270c4e1ecd
bugfix
2016-01-28 11:11:45 -08:00
Yangqing Jia
c4e47630b1
benchmark modifications to make it compilable in a standalone fashion.
2016-01-28 10:35:14 -08:00
Gael Guennebaud
f50bb1e6f3
Fix compilation with gcc
2016-01-28 13:25:26 +01:00
Gael Guennebaud
ddf64babde
merge
2016-01-28 13:21:48 +01:00
Gael Guennebaud
df15fbc452
bug #1158 : PartialReduxExpr is a vector expression, and it thus must expose the LinearAccessBit flag
2016-01-28 13:16:30 +01:00
Gael Guennebaud
9bcadb7fd1
Disable stupid MSVC warning
2016-01-28 12:14:16 +01:00
Gael Guennebaud
b4d87fff4a
Fix MSVC warning.
2016-01-28 12:12:30 +01:00
Gael Guennebaud
2bad3e78d9
bug #96 , bug #1006 : fix by value argument in result_of.
2016-01-28 12:12:06 +01:00
Gael Guennebaud
7802a6bb1c
Fix unit test filename.
2016-01-28 09:35:37 +01:00
Benoit Steiner
4bf9eaf77a
Deleted an invalid assertion that prevented the assignment of empty tensors.
2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885
Fixed some compilation problems with nvcc + clang
2016-01-27 15:37:03 -08:00
Benoit Steiner
47ca9dc809
Fixed the tensor_cuda test
2016-01-27 14:58:48 -08:00
Benoit Steiner
55a5204319
Fixed the flags passed to nvcc to compile the tensor code.
2016-01-27 14:46:34 -08:00
Gael Guennebaud
4865e1e732
Update link to suitesparse.
2016-01-27 22:48:40 +01:00
Benoit Steiner
9dfbd4fe8d
Made the cuda tests compile using make check
2016-01-27 12:22:17 -08:00
Benoit Steiner
5973bcf939
Properly specify the namespace when calling cout/endl
2016-01-27 12:04:42 -08:00
Eugene Brevdo
c8d94ae944
digamma special function: merge shared code.
...
Moved type-specific code into a helper class digamma_impl_maybe_poly<Scalar>.
2016-01-27 09:52:29 -08:00
Gael Guennebaud
9c8f7dfe94
bug #1156 : fix several function declarations whose arguments were passed by value instead of being passed by reference
2016-01-27 18:34:42 +01:00
Gael Guennebaud
9aa6fae123
bug #1154 : move to dynamic scheduling for spmv products.
2016-01-27 18:03:51 +01:00
Gael Guennebaud
9ac8e8c6a1
Extend mixing type unit test with trmv, and the following not yet supported products: trmm, symv, symm
2016-01-27 17:29:53 +01:00
Gael Guennebaud
6da5d87f92
add nomalloc unit test for rank2 updates
2016-01-27 17:26:48 +01:00
Gael Guennebaud
9801c959e6
Fix tri = complex * real product, and add respective unit test.
2016-01-27 17:12:25 +01:00
Gael Guennebaud
21b5345782
Add meta_least_common_multiple helper.
2016-01-27 17:11:39 +01:00
Gael Guennebaud
fecea26d93
Extend doc on shifting strategy
2016-01-27 15:55:15 +01:00
Ville Kallioniemi
02db1228ed
Add constructor for long types.
2016-01-26 23:41:01 -07:00
Gael Guennebaud
412bb5a631
Remove redundant test.
2016-01-26 23:35:30 +01:00
Gael Guennebaud
0f8d26c6a9
Doc: add flip* and arrayfun MatLab equivalent.
2016-01-26 23:34:48 +01:00
Gael Guennebaud
cfa21f8123
Remove dead code.
2016-01-26 23:33:15 +01:00
Gael Guennebaud
6850eab33b
Re-enable blocking on rows in non-l3 blocking mode.
2016-01-26 23:32:48 +01:00
Gael Guennebaud
aa8c6a251e
Make sure that micro-panel-size is smaller than blocking sizes (otherwise we might get a buffer overflow)
2016-01-26 23:31:48 +01:00
Gael Guennebaud
5b0a9ee003
Make sure that block sizes are smaller than input matrix sizes.
2016-01-26 23:30:24 +01:00
Benoit Jacob
639b1d864a
bug #1152 : Fix data race in static initialization of blas
2016-01-26 11:44:16 -05:00
Christoph Hertzberg
44d4674955
bug #1153 : Don't rely on __GXX_EXPERIMENTAL_CXX0X__ to detect C++11 support
2016-01-26 16:45:33 +01:00
Hauke Heibel
5eb2790be0
Fixed minor typo in SplineFitting.
2016-01-25 22:17:52 +01:00
Gael Guennebaud
8328caa618
bug #51 : add block preallocation mechanism to selfadjoint*matrix product.
2016-01-25 22:06:42 +01:00
Gael Guennebaud
2f9e6314b1
update BLAS interface to general_matrix_matrix_triangular_product
2016-01-25 21:56:05 +01:00
Gael Guennebaud
e58827d2ed
bug #51 : make general_matrix_matrix_triangular_product use L3-blocking helper so that general symmetric rank-updates and general-matrix-to-triangular products do not trigger dynamic memory allocation for fixed size matrices.
2016-01-25 17:16:33 +01:00
Gael Guennebaud
c10021c00a
bug #1144 : clarify the doc about aliasing in case of resizing and matrix product.
2016-01-25 15:50:55 +01:00
Gael Guennebaud
b114e6fd3b
Improve documentation.
2016-01-25 11:56:25 +01:00
Gael Guennebaud
869b4443ac
Add SparseVector::conservativeResize() method.
2016-01-25 11:55:39 +01:00
Benoit Steiner
e3a15a03a4
Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression
2016-01-24 23:04:50 -08:00
Benoit Steiner
bd207ce11e
Added missing EIGEN_DEVICE_FUNC qualifier
2016-01-24 20:36:05 -08:00
Gael Guennebaud
acf6f7af6b
Merged in larsmans/eigen (pull request PR-156)
...
Documentation fixes
2016-01-24 22:28:49 +01:00
Lars Buitinck
cc482e32f1
Method is called visit, not visitor
2016-01-24 15:50:59 +01:00
Lars Buitinck
19e437daf0
Copyedit documentation: typos, spelling
2016-01-24 15:50:36 +01:00
Gael Guennebaud
1cf85bd875
bug #977 : add stableNormalize[d] methods: they are analogous to normalize[d] but with careful handling of under/over-flow
2016-01-23 22:40:11 +01:00
Gael Guennebaud
369d6d1ae3
Add link to reference paper.
2016-01-23 22:16:03 +01:00
Gael Guennebaud
0caa4b1531
bug #1150 : make IncompleteCholesky more robust by iteratively increasing the shift until the factorization succeeds (with at most 10 attempts).
2016-01-23 22:13:54 +01:00
Benoit Steiner
cb4e53ff7f
Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
...
Add ctor for long
2016-01-22 19:11:31 -08:00
Ville Kallioniemi
9f94e030c1
Re-add executable flags to minimize changeset.
2016-01-22 20:08:45 -07:00
Benoit Steiner
3aeeca32af
Leverage the new blocking code in the tensor contraction code.
2016-01-22 16:36:30 -08:00
Benoit Steiner
4beb447e27
Created a mechanism to enable contraction mappers to determine the best blocking strategy.
2016-01-22 14:37:26 -08:00
Gael Guennebaud
5358c38589
bug #1095 : add Cholmod*::logDeterminant/determinant (from patch of Joshua Pritikin)
2016-01-22 16:05:29 +01:00
Gael Guennebaud
6a44ccb58b
Backout changeset 690bc950f7
2016-01-22 15:03:53 +01:00
Gael Guennebaud
06971223ef
Unify std::numeric_limits and device::numeric_limits within numext namespace
2016-01-22 15:02:21 +01:00
Ville Kallioniemi
9b6c72958a
Update to latest default branch
2016-01-21 23:08:54 -07:00
Ville Kallioniemi
73aec9219b
Make use of 32 bit ints explicit and remove executable bit from headers.
2016-01-21 23:00:32 -07:00
Benoit Steiner
7b68cf2e0f
Pulled latest updates from trunk
2016-01-21 17:17:56 -08:00
Benoit Steiner
c33479324c
Fixed a constness bug
2016-01-21 17:08:11 -08:00
Gael Guennebaud
ee37eb4eed
bug #977 : avoid division by 0 in normalize() and normalized().
2016-01-21 20:43:42 +01:00
Gael Guennebaud
7cae8918c0
Fix compilation on old gcc+AVX
2016-01-21 20:30:32 +01:00
Gael Guennebaud
8dca9f97e3
Add numext::sqrt function to enable custom optimized implementation.
...
This changeset adds two specializations for float/double on SSE. Those
are mostly useful with GCC, for which std::sqrt adds an extra and costly
check on the result of _mm_sqrt_*. Clang does not add this burden.
In this changeset, only DenseBase::norm() makes use of it.
2016-01-21 20:18:51 +01:00
Gael Guennebaud
34340458cb
bug #1151 : remove useless critical section
2016-01-21 14:29:45 +01:00
Jan Prach
690bc950f7
fix clang warnings
...
"braces around scalar initializer"
2016-01-20 19:35:59 -08:00
Benoit Steiner
f2a842294f
Pulled latest updates from the trunk
2016-01-20 18:12:53 -08:00
Benoit Steiner
7ce932edd3
Small cleanup and small fix to the contraction of row major tensors
2016-01-20 18:12:08 -08:00
Gael Guennebaud
62f7e77711
add upper|lower case in incomplete_cholesky unit test
2016-01-21 00:02:59 +01:00
Benoit Steiner
47076bf00e
Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%.
2016-01-20 14:51:48 -08:00
Benoit Steiner
ebd3388ee6
Pulled latest updates from trunk
2016-01-20 13:56:43 -08:00
Gael Guennebaud
ed8ade9c65
bug #1149 : fix Pastix*::*parm()
2016-01-20 19:01:24 +01:00
Gael Guennebaud
4c5e96aab6
bug #1148 : silence Pastix by default
2016-01-20 18:56:17 +01:00
Gael Guennebaud
db237d0c75
bug #1145 : fix PastixSupport LLT/LDLT wrappers (missing resize prior to calls to selfAdjointView)
2016-01-20 18:49:01 +01:00
Gael Guennebaud
0b7169d1f7
bug #1147 : fix compilation of PastixSupport
2016-01-20 18:15:59 +01:00
Gael Guennebaud
234a1094b7
Add static assertion to y(), z(), w() accessors
2016-01-20 09:18:44 +01:00
Ville Kallioniemi
915e7667cd
Remove executable bit from header files
2016-01-19 21:17:29 -07:00
Ville Kallioniemi
2832175a68
Use explicitly 32 bit integer types in constructors.
2016-01-19 20:12:17 -07:00
Benoit Steiner
df79c00901
Improved the formatting of the code
2016-01-19 17:24:08 -08:00
Benoit Steiner
6d472d8375
Moved the contraction mapping code to its own file to make the code more manageable.
2016-01-19 17:22:05 -08:00
Benoit Steiner
b3b722905f
Improved code indentation
2016-01-19 17:09:47 -08:00
Benoit Steiner
5b7713dd33
Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.
2016-01-19 17:05:10 -08:00
Ville Kallioniemi
63fb66f53a
Add ctor for long
2016-01-17 21:25:36 -07:00
Eugene Brevdo
6a75e7e0d5
Digamma cleanup
...
* Added permission from cephes author to use his code
* Cleanup in ArrayCwiseUnaryOps
2016-01-15 16:32:21 -08:00
Benoit Steiner
34057cff23
Fixed a race condition that could affect some reductions on CUDA devices.
2016-01-15 15:11:56 -08:00
Benoit Steiner
0461f0153e
Made it possible to compare tensor dimensions inside a CUDA kernel.
2016-01-15 11:22:16 -08:00
Benoit Steiner
aed4cb1269
Use warp shuffles instead of shared memory access to speed up the inner reduction kernel.
2016-01-14 21:45:14 -08:00
Benoit Steiner
c1a42c2d0d
Don't disable the AVX implementations of plset when compiling with AVX512 enabled
2016-01-14 17:21:39 -08:00
Benoit Steiner
0366478df8
Added alignment requirement to the AVX512 packet traits.
2016-01-14 17:02:39 -08:00
Benoit Steiner
3cfd16f3af
Fixed the signature of the plset primitives for AVX512
2016-01-14 16:58:01 -08:00
Benoit Steiner
67f44365ea
Fixed the AVX512 signature of the ptranspose primitives
2016-01-14 16:51:11 -08:00
Benoit Steiner
a282eb1363
pscatter/pgather use Index instead of int to specify the stride
2016-01-14 16:39:39 -08:00
Benoit Steiner
7832485575
Deleted unnecessary commas and semicolons
2016-01-14 16:36:29 -08:00
Benoit Steiner
8fe2532e70
Fixed a boundary condition bug in the outer reduction kernel
2016-01-14 09:29:48 -08:00
Benoit Steiner
9f013a9d86
Properly record the rank of reduced tensors in the tensor traits.
2016-01-13 14:24:37 -08:00
Benoit Steiner
79b69b7444
Trigger the optimized matrix vector path more conservatively.
2016-01-12 15:21:09 -08:00
Benoit Steiner
d920d57f38
Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speed up LSTM neural networks.
2016-01-12 11:32:27 -08:00
Benoit Steiner
bd7d901da9
Reverted a previous change that tripped nvcc when compiling in debug mode.
2016-01-11 17:49:44 -08:00
Benoit Steiner
bbdabbb379
Made the blas utils usable from within a cuda kernel
2016-01-11 17:26:56 -08:00
Benoit Steiner
c5e6900400
Silenced a few compilation warnings.
2016-01-11 17:06:39 -08:00
Benoit Steiner
f894736d61
Updated the tensor traits: the alignment is not part of the Flags enum anymore
2016-01-11 16:42:18 -08:00
Benoit Steiner
4f7714d72c
Enabled the use of fixed dimensions from within a cuda kernel.
2016-01-11 16:01:00 -08:00
Benoit Steiner
01c55d37e6
Deleted unused variable.
2016-01-11 15:53:19 -08:00
Benoit Steiner
0504c56ea7
Silenced an nvcc compilation warning
2016-01-11 15:49:21 -08:00
Benoit Steiner
b523771a24
Silenced several compilation warnings triggered by nvcc.
2016-01-11 14:25:43 -08:00
Benoit Steiner
2c3b13eded
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
...
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
2016-01-11 11:43:37 -08:00
Benoit Steiner
2ccb1c8634
Fixed a bug in the dispatch of optimized reduction kernels.
2016-01-11 10:36:37 -08:00
Benoit Steiner
780623261e
Re-enabled the optimized reduction CUDA code.
2016-01-11 09:07:14 -08:00
Jeremy Barnes
91678f489a
Cleaned up double-defined macro from last commit
2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3
Alternative way of forcing instantiation of device kernels without
...
causing warnings or requiring device to device kernel invocations.
This allows Tensorflow to work on SM 3.0 (i.e., Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Gael Guennebaud
b557662e58
merge
2016-01-09 08:37:01 +01:00
Gael Guennebaud
8b9dc9f0df
bug #1144 : fix regression in x=y+A*x (aliasing), and move evaluator_traits::AssumeAliasing to evaluator_assume_aliasing.
2016-01-09 08:30:38 +01:00
Benoit Steiner
e76904af1b
Simplified the dispatch code.
2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac
Made it possible to use array of size 0 on CUDA devices
2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd
Reworked the dispatch of optimized cuda reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases
2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415
Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures.
2016-01-08 13:53:40 -08:00
Gael Guennebaud
f9d71a1729
extend matlab conversion table
2016-01-08 22:24:45 +01:00
Benoit Steiner
6639b7d6e8
Removed a couple of partial specializations that confuse nvcc and result in errors such as this:
...
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2
Fixed a typo.
2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818
Optimized the performance of broadcasting of scalars.
2016-01-06 18:47:45 -08:00
Gael Guennebaud
ee738321aa
rm remaining debug code
2016-01-06 14:49:40 +01:00
Christoph Hertzberg
54bf582303
bug #1143 : Work-around gcc bug
2016-01-06 11:59:24 +01:00
Benoit Steiner
99093c0fe0
Added support for AVX512 to the build files
2016-01-05 10:02:49 -08:00
Benoit Steiner
cfff40b1d4
Improved the performance of reductions on CUDA devices
2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf
Added a 'divup' util to compute the ceiling of the quotient of two integers
2016-01-04 16:29:26 -08:00
Gael Guennebaud
715f6f049f
Improve inline documentation of SparseCompressedBase and its derived classes
2016-01-03 21:56:30 +01:00
Gael Guennebaud
8b0d1eb0f7
Fix numerous doxygen shortcomings, and work around some clang -Wdocumentation warnings
2016-01-01 21:45:06 +01:00
Gael Guennebaud
9900782e88
Mark AlignedBit and EvalBeforeNestingBit with deprecated attribute, and remove the remaining usages of EvalBeforeNestingBit.
2015-12-30 16:47:49 +01:00
Gael Guennebaud
70404e07c2
Workaround clang -Wdocumentation warning about "/*<"
2015-12-30 16:46:45 +01:00
Gael Guennebaud
addb7066e8
Workaround "empty paragraph" warning with clang -Wdocumentation
2015-12-30 16:45:44 +01:00
Gael Guennebaud
eadc377b3f
Add missing doc of Derived template parameter
2015-12-30 16:43:19 +01:00
Gael Guennebaud
29bb599e03
Fix numerous doxygen issues in auto-link generation
2015-12-30 16:04:24 +01:00
Gael Guennebaud
162ccb2938
Fix links to Eigen2-to-Eigen3 porting helpers
2015-12-30 16:03:14 +01:00
Gael Guennebaud
5fae3750b5
Recent versions of doxygen misparsed Eigen/* headers
2015-12-30 16:02:05 +01:00
Gael Guennebaud
b84cefe61d
Add missing snippets for erf/erfc/lgamma functions.
2015-12-30 15:12:15 +01:00
Gael Guennebaud
16dd82ed51
Add missing snippet for sign/cwiseSign functions.
2015-12-30 15:11:42 +01:00
Gael Guennebaud
978c379ed7
Add missing ctor from uint
2015-12-30 12:52:38 +01:00
Gael Guennebaud
25f2b8d824
bug #1141 : add missing initialization of CholmodBase::m_*IsOk
2015-12-29 15:50:11 +01:00
Eugene Brevdo
f2471f31e0
Modify constants in SpecialFunctions to lowercase (avoid name conflicts).
2015-12-28 17:48:38 -08:00
Eugene Brevdo
afb35385bf
Change PI* to M_PI* in SpecialFunctions to avoid possible breakage
...
with external DEFINEs.
2015-12-28 17:34:06 -08:00
Eugene Brevdo
14897600b7
Protect digamma tests behind a EIGEN_HAS_C99_MATH check.
2015-12-24 21:28:18 -08:00
Eugene Brevdo
cef81c9084
Merged eigen/eigen into default
2015-12-24 21:17:33 -08:00
Eugene Brevdo
f7362772e3
Add digamma for CPU + CUDA. Includes tests.
2015-12-24 21:15:38 -08:00
Gael Guennebaud
d2e288ae50
Workaround compilers that do not even define _mm256_set_m128.
2015-12-24 16:53:43 +01:00
Benoit Steiner
bdcbc66a5c
Don't attempt to vectorize mean reductions of integers since we can't use
...
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5
Optimized the configuration of the outer reduction cuda kernel
2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b
Added missing define
2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810
Made sure the optimized gpu reduction code is actually compiled.
2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a
Optimized outer reduction on GPUs.
2015-12-22 15:06:17 -08:00
Benoit Steiner
3504ae47ca
Made it possible to run the lgamma, erf, and erfc functors on a CUDA gpu.
2015-12-21 15:20:06 -08:00
Benoit Steiner
1c3e78319d
Added missing const
2015-12-21 15:05:01 -08:00
Benoit Steiner
9f9d8d2f62
Disabled part of the matrix matrix peeling code that's incompatible with 512 bit registers
2015-12-21 13:04:52 -08:00
Benoit Steiner
b74887d5f2
Implemented most of the packet primitives for AVX512
2015-12-21 11:46:36 -08:00
Benoit Steiner
6ffb208c77
Make sure EIGEN_HAS_MM_MALLOC is set to 1 when using the avx512 instruction set.
2015-12-21 11:23:15 -08:00
Benoit Steiner
994d1c60b9
Free memory allocated using posix_memalign() with free() instead of std::free()
2015-12-21 11:21:39 -08:00
Benoit Steiner
b407948a77
Merged in connor-k/eigen (pull request PR-149)
...
[doc] Remove extra ';' in Advanced Initialization sample
2015-12-21 09:44:25 -08:00
Benoit Steiner
a6c243617b
Fixed a typo in previous change.
2015-12-21 09:05:45 -08:00
Benoit Steiner
51be91f15e
Added support for CUDA architectures that don't support 3.5 capabilities
2015-12-21 08:42:58 -08:00
connor-k
95dd423cca
[doc] Remove extra ';' in Tutorial_AdvancedInitialization_Join.cpp
2015-12-21 01:12:26 +00:00
Tal Hadad
c006ecace1
Fix comments
2015-12-20 20:07:06 +02:00
Tal Hadad
bfed274df3
Use RotationBase, test quaternions and support ranges.
2015-12-20 16:24:53 +02:00
Tal Hadad
b091b7e6ea
Remove unnecessary comment.
2015-12-20 13:00:07 +02:00
Tal Hadad
fabd8474ff
Merged eigen/eigen into default
2015-12-20 12:50:07 +02:00
Tal Hadad
6752a69aa5
Much better tests, and a little bit more functionality.
2015-12-20 12:49:12 +02:00
Benoit Steiner
6d777e1bc7
Fixed a typo.
2015-12-18 19:25:50 -08:00
Benoit Steiner
1b82969559
Add alignment requirement for local buffer used by the slicing op.
2015-12-18 14:36:35 -08:00
Benoit Steiner
75a7fa1919
Doubled the speed of full reductions on GPUs.
2015-12-18 14:07:31 -08:00
Gael Guennebaud
3abd8470ca
bug #1140 : remove custom definition and use of _mm256_setr_m128
2015-12-18 14:18:59 +01:00
Benoit Steiner
8dd17cbe80
Fixed a clang compilation warning triggered by the use of arrays of size 0.
2015-12-17 14:00:33 -08:00
Benoit Steiner
4aac55f684
Silenced some compilation warnings triggered by nvcc
2015-12-17 13:39:01 -08:00
Benoit Steiner
40e6250fc3
Made it possible to run tensor chipping operations on CUDA devices
2015-12-17 13:29:08 -08:00
Benoit Steiner
2ca55a3ae4
Fixed some compilation error triggered by the tensor code with msvc 2008
2015-12-16 20:45:58 -08:00
Gael Guennebaud
55aef139ff
Added tag 3.3-beta1 for changeset 9f9de1aaa9
2015-12-16 21:49:02 +01:00
Gael Guennebaud
9f9de1aaa9
bump to 3.3-beta1
2015-12-16 21:48:48 +01:00
Christoph Hertzberg
49d96aee64
bug #1120 : Make sure that SuperLU version is checked
2015-12-16 11:37:16 +01:00
Gael Guennebaud
ae8b217a01
Update doc to make it clear that only SuperLU 4.x is supported
2015-12-16 10:47:03 +01:00
Gael Guennebaud
35d8725c73
Disable AutoDiffScalar generic copy ctor for non compatible scalar types (fix ambiguous template instantiation)
2015-12-16 10:14:24 +01:00
Christoph Hertzberg
92655e7215
bug #1136 : Protect isinf for Intel compilers. Also don't distinguish GCC from ICC and don't rely on EIGEN_NOT_A_MACRO, which might not be defined when including this.
2015-12-15 11:34:52 +01:00
Benoit Steiner
17352e2792
Made the entire TensorFixedSize api callable from a CUDA kernel.
2015-12-14 15:20:31 -08:00
Benoit Steiner
75e19fc7ca
Marked the tensor constructors as EIGEN_DEVICE_FUNC: This makes it possible to call them from a CUDA kernel.
2015-12-14 15:12:55 -08:00
Gael Guennebaud
140f3a02a8
Fix MKL wrapper for ComplexSchur
2015-12-11 23:31:21 +01:00
Gael Guennebaud
4483c0fdf6
Fix unused variable warning.
2015-12-11 23:29:53 +01:00
Gael Guennebaud
774dba87c8
merge
2015-12-11 23:28:44 +01:00
Gael Guennebaud
c884a8e7f4
merge
2015-12-11 23:07:33 +01:00
Gael Guennebaud
4d708457d0
Increase axpy vector size
2015-12-11 23:07:22 +01:00
Benoit Steiner
b8861b0c25
Make sure the data is aligned on a 64 byte boundary when using avx512 instructions.
2015-12-11 09:19:57 -08:00
Gael Guennebaud
b60a8967f5
bug #1134 : fix JacobiSVD pre-allocation
...
(grafted from f22036f5f8)
2015-12-11 11:59:11 +01:00
Gael Guennebaud
ca39b1546e
Merged in ebrevdo/eigen (pull request PR-148)
...
Add special functions to eigen: lgamma, erf, erfc.
2015-12-11 11:52:09 +01:00
Gael Guennebaud
82152f2ae6
bug #1132 : add EIGEN_MAPBASE_PLUGIN
2015-12-11 11:43:49 +01:00
Gael Guennebaud
4519fd5d40
Fix MKL compilation issue
2015-12-11 11:11:38 +01:00
Gael Guennebaud
7385e6e2ef
Remove useless explicit
2015-12-11 11:11:19 +01:00
Gael Guennebaud
bcb4f126a7
Fix compilation of PardisoSupport
2015-12-11 11:11:00 +01:00
Gael Guennebaud
30b5c4cd14
Remove useless "explicit", and fix inline/static order.
2015-12-11 10:59:39 +01:00
Gael Guennebaud
79c1e6d0a6
Fix compilation of MKL support.
2015-12-11 10:55:07 +01:00
Gael Guennebaud
c684a07eba
merge
2015-12-11 10:06:38 +01:00
Gael Guennebaud
836da91b3f
Fix unit tests wrt EIGEN_DEFAULT_TO_ROW_MAJOR
2015-12-11 10:06:28 +01:00
Benoit Steiner
6af52a1227
Fixed a typo in the constructor of tensors of rank 5.
2015-12-10 23:31:12 -08:00
Benoit Steiner
2d8f2e4042
Made 2 tests compile without cxx11.
2015-12-10 23:20:04 -08:00
Benoit Steiner
8d28a161b2
Use the proper accessor to refer to the value of a scalar tensor
2015-12-10 22:53:56 -08:00
Benoit Steiner
8e00ea9a92
Fixed the coefficient accessors used for the 2d and 3d cases when compiling without cxx11 support.
2015-12-10 22:45:10 -08:00
Benoit Steiner
9db8316c93
Updated the cxx11_tensor_custom_op to not require cxx11.
2015-12-10 20:53:44 -08:00
Benoit Steiner
4e324ca6ae
Updated the cxx11_tensor_assign test to make it compile without support for cxx11
2015-12-10 20:47:25 -08:00
Benoit Steiner
6acf2bd472
Fixed compilation error triggered by MSVC 2008
2015-12-10 17:17:42 -08:00
Benoit Steiner
9a415fb1e2
Preliminary support for AVX512
2015-12-10 15:34:57 -08:00
Benoit Steiner
b820b097b8
Created EIGEN_HAS_C99_MATH define as Gael suggested.
2015-12-10 13:52:05 -08:00
Gael Guennebaud
df6f54ff63
Fix storage order of PartialRedux
2015-12-10 22:24:58 +01:00
Gael Guennebaud
d1862967a8
Make sure ADOLC is recent enough by searching for adtl.h
2015-12-10 22:23:21 +01:00
Mark Borgerding
22dd368ea0
sign(complex) compiles for GPU
2015-12-10 16:14:29 -05:00
Benoit Steiner
8314962ce2
Only test the lgamma, erf and erfc function when using a C99 compliant compiler
2015-12-10 13:13:45 -08:00
Benoit Steiner
58e06447de
Silence a compilation warning
2015-12-10 13:11:36 -08:00
Benoit Steiner
48877a6933
Only implement the lgamma, erf, and erfc functions when using a compiler compliant with the C99 specification.
2015-12-10 13:09:49 -08:00
Gael Guennebaud
46d2f6cd78
Workaround gcc issue with -O3 and the i387 FPU.
2015-12-10 21:33:43 +01:00
Gael Guennebaud
7ad1aaec1d
bug #1103 : fix neon vectorization of pmul(Packet1cd,Packet1cd)
2015-12-10 16:06:33 +01:00
Gael Guennebaud
b0a1d6f2e5
Improve handling of deprecated EIGEN_INCLUDE_INSTALL_DIR variable
2015-12-10 15:47:06 +01:00
Benoit Steiner
53b196aa5f
Simplified the implementation of lgamma, erf, and erfc
2015-12-08 14:17:34 -08:00
Benoit Steiner
e535450573
Cleanup
2015-12-08 14:06:39 -08:00
Benoit Steiner
b630d10b62
Only disable the erf, erfc, and lgamma tests for older versions of c++.
2015-12-07 17:08:08 -08:00
Benoit Steiner
b1ae39794c
Simplified the code a bit
2015-12-07 16:46:35 -08:00
Benoit Steiner
73b68d4370
Fixed a couple of typos
...
Cleaned up the code a bit.
2015-12-07 16:38:48 -08:00
Eugene Brevdo
fa4f933c0f
Add special functions to Eigen: lgamma, erf, erfc.
...
Includes CUDA support and unit tests.
2015-12-07 15:24:49 -08:00
Benoit Steiner
7dfe75f445
Fixed compilation warnings
2015-12-07 08:12:30 -08:00
Gael Guennebaud
ad3d68400e
Add matrix-free solver example
2015-12-07 12:33:38 +01:00
Gael Guennebaud
b37036afce
Implement wrapper for matrix-free iterative solvers
2015-12-07 12:23:22 +01:00
Benoit Steiner
f4ca8ad917
Use signed integers instead of unsigned ones more consistently in the codebase.
2015-12-04 18:14:16 -08:00
Benoit Steiner
490d26e4c1
Use integers instead of std::size_t to encode the number of dimensions in the Tensor class since most of the code currently already uses integers.
2015-12-04 10:15:11 -08:00
Benoit Steiner
d20efc974d
Made it possible to use the sigmoid functor within a CUDA kernel.
2015-12-04 09:38:15 -08:00
Benoit Steiner
e25e3a041b
Added rsqrt() method to the Array class: this method computes the coefficient-wise inverse square root much more efficiently than calling sqrt().inverse().
2015-12-03 18:16:35 -08:00
Benoit Steiner
029052d276
Deleted redundant code
2015-12-03 17:08:47 -08:00
Benoit Steiner
c41e9e4bd0
Merged in Unril/eigen-1/Unril/fixes-internal-compiler-error-while-comp-1449156092576 (pull request PR-147)
...
Fixes internal compiler error while compiling with VC2015 Update1 x64.
2015-12-03 14:26:14 -08:00
Gael Guennebaud
1562e13aba
Add missing Rotation2D::operator=(Matrix2x2)
2015-12-03 22:25:26 +01:00
Nikolay Fedorov
944647c0aa
Fixes internal compiler error while compiling with VC2015 Update1 x64.
2015-12-03 15:21:43 +00:00
Benoit Steiner
d2d4c45d55
Made it possible to leverage several binary functors in a CUDA kernel
...
Explicitly specified the return type of the various scalar_cmp_op functors.
2015-12-02 17:21:33 -08:00
Gael Guennebaud
c5b86893e7
bug #1123 : add missing documentation of angle() and axis()
2015-12-01 14:45:08 +01:00
Gael Guennebaud
0bb12fa614
Add LU::transpose().solve() and LU::adjoint().solve() API.
2015-12-01 14:38:47 +01:00
Rasmus Munk Larsen
1663d15da7
Add internal method _solve_impl_transposed() to LU decomposition classes that solves A^T x = b or A^* x = b.
2015-11-30 13:39:24 -08:00
Gael Guennebaud
274b2272b7
Make bench_gemm compatible with 3.2
2015-12-01 09:57:31 +01:00
Gael Guennebaud
6c02cbbb0f
Fix matrix to quaternion (and angleaxis) conversion for matrix expression.
2015-12-01 09:45:56 +01:00
Gael Guennebaud
844561939f
Do not check NeedsToAlign if no static alignment
2015-11-30 22:36:14 +01:00
Gael Guennebaud
1d906d883d
Fix degenerate cases in syrk and trsm
2015-11-30 22:20:31 +01:00
Gael Guennebaud
e7a1c48185
Update BLAS API unit tests
2015-11-30 22:19:20 +01:00
Gael Guennebaud
034ca5a22d
Clean hardcoded compilation options
2015-11-30 17:05:42 +01:00
Gael Guennebaud
fd727249ad
Update ADOL-C support.
2015-11-30 16:00:22 +01:00
Gael Guennebaud
6fcd316f23
Extend superlu cmake script to check version
2015-11-30 14:48:11 +01:00
Gael Guennebaud
afa11d646d
Fix UmfPackLU ctor for expressions
2015-11-27 22:04:22 +01:00
Gael Guennebaud
6bdeb8cfbe
bug #918 , umfpack: add access to umfpack return code and parameters
2015-11-27 21:58:36 +01:00
Gael Guennebaud
3f32f5ec22
ArrayBase::sign: add unit test and fix doc
2015-11-27 16:27:53 +01:00
Gael Guennebaud
da46b1ed54
bug #1112 : fix compilation on exotic architectures
2015-11-27 15:57:18 +01:00
Gael Guennebaud
1261d020c3
bug #1120 , superlu: mem_usage_t is now uniquely defined, so let's use it.
2015-11-27 10:39:09 +01:00
Gael Guennebaud
0ff127e896
Preserve CMAKE_CXX_FLAGS in BTL
2015-11-27 10:18:39 +01:00
Gael Guennebaud
ca001d7c2a
bug #1009 , part 2/2: add static assertion on LinearAccessBit in coeff(index)-like methods.
2015-11-27 10:06:47 +01:00
Gael Guennebaud
91a7059459
bug #1009 , part 1/2: make sure vector expressions expose LinearAccessBit flag.
2015-11-27 10:06:07 +01:00
Mark Borgerding
7ddcf97da7
added scalar_sign_op (both real,complex)
2015-11-24 17:15:07 -05:00
Benoit Steiner
44848ac39b
Fixed a bug in TensorArgMax.h
2015-11-23 15:58:47 -08:00
Benoit Steiner
547a8608e5
Fixed the implementation of Eigen::internal::count_leading_zeros for MSVC.
...
Also updated the code to silence bogus warnings generated by nvcc when compiling this function.
2015-11-23 12:17:45 -08:00
Benoit Steiner
562078780a
Don't create more cuda blocks than necessary
2015-11-23 11:00:10 -08:00
Benoit Steiner
df31ca3b9e
Made it possible to refer to a GPUDevice from code compiled with a regular C++ compiler
2015-11-23 10:03:53 -08:00
Benoit Steiner
1e04059012
Deleted unused variable.
2015-11-23 08:36:54 -08:00
Benoit Steiner
4286b2d494
Pulled latest updates from trunk
2015-11-23 08:28:34 -08:00
Gael Guennebaud
f9fff67a56
Disable "decorated name length exceeded, name was truncated" MSVC warning.
2015-11-23 15:03:24 +01:00
Gael Guennebaud
f3dca16a1d
bug #1117 : workaround unused-local-typedefs warning when EIGEN_NO_STATIC_ASSERT and NDEBUG are both defined.
2015-11-23 14:07:52 +01:00
Gael Guennebaud
31b661e4ca
Add a note on initParallel being optional in C++11.
2015-11-23 13:28:43 +01:00
Gael Guennebaud
8a2659f0cb
Improve numerical robustness of some unit tests
2015-11-23 10:53:55 +01:00
Gael Guennebaud
82bd4e546a
Merged in dr15jones/eigen (pull request PR-146)
...
Use a class constructor to initialize CPU cache sizes
2015-11-22 22:50:31 +01:00
Gael Guennebaud
35c17a3fc8
Use overload instead of template full specialization to please old MSVC
2015-11-22 22:09:57 +01:00
Gael Guennebaud
b265979a70
Make FullPivLU::solve use rank() instead of nonzeroPivots().
2015-11-21 15:03:04 +01:00
Benoit Steiner
9fa65d3838
Split TensorDeviceType.h in 3 files to make it more manageable
2015-11-20 17:42:50 -08:00
Benoit Steiner
a367804856
Added option to force the usage of the Eigen array class instead of the std::array class.
2015-11-20 12:41:40 -08:00
Benoit Steiner
86486eee2d
Pulled latest updates from trunk
2015-11-20 11:10:37 -08:00
Benoit Steiner
383d1cc2ed
Added proper support for fast 64bit integer division on CUDA
2015-11-20 11:09:46 -08:00
Chris Jones
4946d758c9
Use a class constructor to initialize CPU cache sizes
...
Using a static instance of a class to initialize the values for
the CPU cache sizes guarantees thread-safe initialization of the
values when using C++11. Therefore under C++11 it is no longer
necessary to call Eigen::initParallel() before calling any Eigen
functions on different threads.
2015-11-20 19:58:08 +01:00
Gael Guennebaud
027a846b34
Use .data() instead of &coeffRef(0).
2015-11-20 15:30:10 +01:00
Gael Guennebaud
4522ffd17c
Add regression using test for array<complex>/real
2015-11-20 15:29:32 +01:00
Gael Guennebaud
4fc36079e7
Fix overload instantiation for clang
2015-11-20 15:29:03 +01:00
Gael Guennebaud
4a985e793c
Workaround msvc broken complex/complex division in unit test
2015-11-20 14:52:08 +01:00
Gael Guennebaud
5c9c0dca4d
Add missing using statement to enable fast Array<complex> / real operations. (was ok for Matrix only)
2015-11-20 14:51:36 +01:00
Gael Guennebaud
e1b27bcb0b
Workaround MSVC missing overloads of std::fpclassify for integral types
2015-11-20 13:55:34 +01:00
Gael Guennebaud
e52d4f8d8d
Add is_integral<> type traits
2015-11-20 13:54:28 +01:00
Benoit Steiner
0ad7c7b1ad
Fixed another clang compilation warning
2015-11-19 15:52:51 -08:00
Benoit Steiner
66ff9b2c6c
Fixed compilation warning generated by clang
2015-11-19 15:40:32 -08:00
Benoit Steiner
f37a5f1c53
Fixed compilation error triggered by nvcc
2015-11-19 14:34:26 -08:00
Benoit Steiner
04f1284f9a
Shard the uint128 test
2015-11-19 14:08:08 -08:00
Benoit Steiner
e2859c6b71
Cleanup the integer division test
2015-11-19 14:07:50 -08:00
Benoit Steiner
f8df393165
Added support for 128bit integers on CUDA devices.
2015-11-19 13:57:27 -08:00
Benoit Steiner
7d1cedd0fe
Added numeric limits for unsigned integers
2015-11-18 17:17:44 -08:00
Gael Guennebaud
1994999105
Add regression unit test for prod.maxCoeff(i)
2015-11-18 23:29:07 +01:00
Benoit Steiner
1dd444ea71
Avoid using the version of TensorIntDiv optimized for 32-bit integers when the divisor can be equal to one since it isn't supported.
2015-11-18 11:37:58 -08:00
Benoit Jacob
4926251f13
bug #1115 : enable static alignment on ARM outside of old-GCC
2015-11-18 10:55:23 -05:00
Gael Guennebaud
a64156cae5
Workaround i387 issue in unit test
2015-11-16 13:33:54 +01:00
Benoit Steiner
bf792f59e3
Only enable the use of constexpr with nvcc if we're using version 7.5 or above
2015-11-13 12:24:22 -08:00
Benoit Steiner
f1fbd74db9
Added sanity check
2015-11-13 09:07:27 -08:00
Benoit Steiner
1e1755352d
Made it possible to compute atan, tanh, sinh and cosh on GPU
2015-11-12 20:19:38 -08:00
Benoit Steiner
7815b84be4
Fixed a compilation warning
2015-11-12 20:16:59 -08:00
Benoit Steiner
10a91930cc
Fixed a compilation warning triggered by nvcc
2015-11-12 20:10:52 -08:00
Benoit Steiner
ed4b37de02
Fixed a few compilation warnings
2015-11-12 20:08:01 -08:00
Benoit Steiner
b69248fa2a
Added a couple of missing EIGEN_DEVICE_FUNC
2015-11-12 20:01:50 -08:00
Benoit Steiner
0aaa5941df
Silenced some compilation warnings triggered by nvcc
2015-11-12 19:11:43 -08:00
Benoit Steiner
2c73633b28
Fixed a few more typos
2015-11-12 18:39:19 -08:00
Benoit Steiner
be08e82953
Fixed typos
2015-11-12 18:37:40 -08:00
Benoit Steiner
e4d45f3440
Only enable the use of const expression when nvcc is called with the -std=c++11 option
2015-11-12 18:18:35 -08:00
Benoit Steiner
150c12e138
Completed the IndexList rewrite
2015-11-12 18:11:56 -08:00
Benoit Steiner
8037826367
Simplified more of the IndexList code.
2015-11-12 17:19:45 -08:00
Benoit Steiner
e9ecfad796
Started to make the IndexList code compile with more compilers
2015-11-12 16:41:14 -08:00
Benoit Steiner
7a1316fcc5
Fixed compilation error with xcode.
2015-11-12 11:05:54 -08:00
Benoit Steiner
737d237722
Made it possible to run some of the CXXMeta functions on a CUDA device.
2015-11-12 09:02:59 -08:00
Benoit Steiner
1e072424e8
Moved the array code into its own file.
2015-11-12 08:57:04 -08:00
Benoit Steiner
aa5f1ca714
gen_numeric_list takes a size_t, not an int
2015-11-12 08:30:10 -08:00
Gael Guennebaud
dfbb889fe9
Fix missing Dynamic versus HugeCost changes
2015-11-12 12:09:48 +01:00
Gael Guennebaud
e701cb2c7c
Update EIGEN_FAST_MATH doc
2015-11-12 12:09:19 +01:00
Benoit Steiner
9fa10fe52d
Don't use std::array when compiling with nvcc since nvidia doesn't support the use of STL containers on GPU.
2015-11-11 15:38:30 -08:00
Benoit Steiner
c587293e48
Fixed a compilation warning
2015-11-11 15:35:12 -08:00
Benoit Steiner
7f1c29fb0c
Make it possible for a vectorized tensor expression to be executed in a CUDA kernel.
2015-11-11 15:22:50 -08:00
Benoit Steiner
4f471146fb
Allow the vectorized version of the Binary and the Nullary functors to run on GPU
2015-11-11 15:19:00 -08:00
Benoit Steiner
99f4778506
Disable SFINAE when compiling with nvcc
2015-11-11 15:04:58 -08:00
Benoit Steiner
5cb18e5b5e
Fixed CUDA compilation errors
2015-11-11 14:36:33 -08:00
Benoit Steiner
228edfe616
Use Eigen::NumTraits instead of std::numeric_limits
2015-11-11 09:26:23 -08:00
Taylor Braun-Jones
b836acb799
Further fixes for CMAKE_INSTALL_PREFIX correctness
...
And other related cmake cleanup, including:
- Use CMAKE_CURRENT_LIST_DIR to find UseEigen3.cmake
- Use INSTALL_DIR term consistently for variable names
- Drop unnecessary extra EIGEN_INCLUDE_INSTALL_DIR
- Fix some paths in generated eigen3.pc and Eigen3Config.cmake files
missing CMAKE_INSTALL_PREFIX
- Fix pkgconfig directory choice ignored if it doesn't exist at configure
time (bug #711 )
2015-11-07 21:29:24 -05:00
Gael Guennebaud
e73ef4f25e
bug #1109 : use noexcept instead of throw for C++11 compilers
2015-12-10 14:21:23 +01:00
Gael Guennebaud
145ad5d800
Use more explicit names.
2015-12-10 12:03:38 +01:00
Gael Guennebaud
75f0fe3795
Fix usage of "Index" as a compile time integral.
2015-12-10 12:01:06 +01:00
Gael Guennebaud
f248249c1f
bug #1113 : fix name conflict with C99's "I".
2015-12-10 11:57:57 +01:00
Gael Guennebaud
21ed29e2c9
Disable complex scalar types because the compiler might aggressively vectorize
...
the initialization of complex coeffs to 0 before we can check for alignedness
2015-12-09 20:46:09 +01:00
Gael Guennebaud
fbe18d5507
Forbid the creation of SparseCompressedBase object
2015-12-09 15:47:32 +01:00
Gael Guennebaud
dc73430d4b
bug #1074 : forbid the creation of PlainObjectBase object by making its ctor protected
2015-12-09 15:47:08 +01:00
Gael Guennebaud
1257fbd2f9
Fix sign-unsigned issue in enum
2015-12-09 10:06:42 +01:00
Gael Guennebaud
4549549992
Fix and clarify documentation of Transform wrt operator*(MatrixBase)
2015-12-08 16:21:49 +01:00
Gael Guennebaud
543bd28a24
Fix Alignment in coeff-based product, and enable unaligned vectorization
2015-12-08 11:28:05 +01:00
Gael Guennebaud
03ad4fc504
Extend unit test of coeff-based product to check many more combinations
2015-12-08 11:27:43 +01:00
Benoit Steiner
20e2ab1121
Fixed another compilation warning
2015-12-07 16:17:57 -08:00
Benoit Steiner
d573efe303
Code cleanup
2015-11-06 14:54:28 -08:00
Benoit Steiner
9fa283339f
Silenced a compilation warning
2015-11-06 11:44:22 -08:00
Benoit Steiner
53432a17b2
Added static assertions to avoid misuses of padding, broadcasting and concatenation ops.
2015-11-06 10:26:19 -08:00
Benoit Steiner
6857a35a11
Fixed typos
2015-11-06 09:42:05 -08:00
Benoit Steiner
33cbdc2d15
Added more missing EIGEN_DEVICE_FUNC
2015-11-06 09:29:59 -08:00
Benoit Steiner
d27e4f1cba
Added missing EIGEN_DEVICE_FUNC statements
2015-11-06 09:23:58 -08:00
Benoit Steiner
ed1962b464
Reimplement the tensor comparison operators by using the scalar_cmp_op functors. This makes them more cuda friendly.
2015-11-06 09:18:43 -08:00
Gael Guennebaud
bfd6ee64f3
bug #1105 : fix default preallocation when moving from compressed to uncompressed mode
2015-11-06 15:05:37 +01:00
Benoit Steiner
29038b982d
Added support for modulo operation
2015-11-05 19:39:48 -08:00
Benoit Steiner
fbcf8cc8c1
Pulled latest updates from trunk
2015-11-05 14:30:02 -08:00
Benoit Steiner
0d15ad8019
Updated the regressions tests that cover full reductions
2015-11-05 14:22:30 -08:00
Benoit Steiner
c75a19f815
Misc fixes to full reductions
2015-11-05 14:21:20 -08:00
Benoit Steiner
ec5a81b45a
Fixed a bug in the extraction of sizes of fixed sized tensors of rank 0
2015-11-05 13:39:48 -08:00
Gael Guennebaud
589b839ad0
Add unit test for Hessian via AutoDiffScalar
2015-11-05 14:54:05 +01:00
Gael Guennebaud
9ceaa8e445
bug #1063 : nest AutoDiffScalar by value to avoid dead references
2015-11-05 13:54:26 +01:00
Gael Guennebaud
ae87f094eb
Fix "," in non SSE4 mode
2015-11-05 12:08:36 +01:00
Gael Guennebaud
2844e7ae43
SPQR and UmfPack need to link to cholmod.
...
(grafted from 47592d31ea)
2015-11-05 12:05:02 +01:00
Gael Guennebaud
780eeb3be7
prevent stack overflow in unit test
2015-11-05 00:32:48 -08:00
Benoit Steiner
beedd9630d
Updated the reduction code so that full reductions now return a tensor of rank 0.
2015-11-04 13:57:36 -08:00
Gael Guennebaud
90323f1751
Fix AVX round/ceil/floor, and fix respective unit test
2015-11-04 22:15:57 +01:00
Gael Guennebaud
3dd24bdf99
Merged in aavenel/eigen (pull request PR-142)
...
Add round, ceil and floor for SSE4.1/AVX (Bug #70)
2015-11-04 18:26:38 +01:00
Gael Guennebaud
902750826b
Add support for dense.cwiseProduct(sparse)
...
This also fixes a regression regarding (dense*sparse).diagonal()
2015-11-04 17:42:07 +01:00
Gael Guennebaud
f6b1deebab
Fix compilation of sparse-triangular to dense assignment
2015-11-04 17:02:32 +01:00
Benoit Steiner
36cd6daaae
Made the CUDA implementation of ploadt_ro compatible with cuda implementations older than 3.5
2015-11-03 16:36:30 -08:00
Gael Guennebaud
29a94c8055
compilation issue
2015-11-02 16:11:59 +01:00
Alexandre Avenel
38832e0791
Merge
2015-11-01 10:55:42 +01:00
Alexandre Avenel
d46e2c10a6
Add round, ceil and floor for SSE4.1/AVX (Bug #70)
2015-11-01 10:49:27 +01:00
Gael Guennebaud
c0352197a1
bug #1099 : add missing include for CUDA
2015-10-31 18:06:28 +01:00
Gael Guennebaud
b32948c642
bug #1102 : fix multiple definition linking issue
2015-10-30 22:25:59 +01:00
Gael Guennebaud
5a2007f7e4
typo
2015-10-30 22:16:23 +01:00
Gael Guennebaud
8a3151de2e
Limit matrix size for other eigen and schur decompositions
2015-10-30 18:06:03 +01:00
Gael Guennebaud
fdf3030ff8
Limit matrix sizes for trmm unit test and complexes.
2015-10-30 15:07:50 +01:00
Gael Guennebaud
9285647dfe
Limit matrix size when testing for NaN: they can become prohibitively expensive when running on x87 fp unit
2015-10-30 14:44:22 +01:00
Gael Guennebaud
ddaaa2d381
bug #1101 : typo
2015-10-30 12:02:52 +01:00
Gael Guennebaud
c8c8821038
Bug 1100: remove explicit CMAKE_INSTALL_PREFIX prefix to please cmake install's DESTINATION argument
2015-10-30 12:00:34 +01:00
Gael Guennebaud
0e6cb08f92
Fix shadow warning
2015-10-30 11:44:22 +01:00
Gael Guennebaud
27c56bf60f
Workaround compilation issue with MSVC<=2013
2015-10-30 10:57:11 +01:00
Gael Guennebaud
213bd0253a
Fix gcc 4.4 compilation issue
2015-10-30 08:44:37 +01:00
Benoit Steiner
6a02c2a85d
Fixed a compilation warning
2015-10-29 20:21:29 -07:00
Benoit Steiner
ca12d4c3b3
Pulled latest updates from trunk
2015-10-29 17:57:48 -07:00
Benoit Steiner
31bdafac67
Added a few tests to cover rank-0 tensors
2015-10-29 17:56:48 -07:00
Benoit Steiner
ce19e38c1f
Added support for tensor maps of rank 0.
2015-10-29 17:49:04 -07:00
Benoit Steiner
3785c69294
Added support for fixed sized tensors of rank 0
2015-10-29 17:31:03 -07:00
Benoit Steiner
0d7a23d34e
Extended the reduction code so that reducing an empty set returns the neutral element for the operation
2015-10-29 17:29:49 -07:00
Benoit Steiner
1b0685d09a
Added support for rank-0 tensors
2015-10-29 17:27:38 -07:00
Benoit Steiner
c444a0a8c3
Consistently use the same index type in the fft codebase.
2015-10-29 16:39:47 -07:00
Benoit Steiner
09ea3a7acd
Silenced a few more compilation warnings
2015-10-29 16:22:52 -07:00
Benoit Steiner
0974a57910
Silenced compiler warning
2015-10-29 15:00:06 -07:00
Benoit Steiner
ac142773a7
Don't call internal::check_rows_cols_for_overflow twice in PlainObjectBase::resize since this is extremely expensive for small arrays
2015-10-29 13:13:39 -07:00
Gael Guennebaud
05a0ee25df
Fix warning.
2015-10-29 21:06:07 +01:00
Gael Guennebaud
7cfbe35e49
Fix duplicated declaration
2015-10-29 21:05:52 +01:00
Gael Guennebaud
568d488a27
Fuse the two similar specializations of Sparse2Dense Assignment.
...
This change also fixes a compilation issue with MSVC<=2013.
2015-10-29 13:16:15 +01:00
Gael Guennebaud
7a5f83ca60
Add overloads for real times sparse<complex> operations.
...
This avoids real to complex conversions, and also fixes a compilation issue with MSVC.
2015-10-29 03:55:39 -07:00
Gael Guennebaud
c688cc28d6
fix copy/paste typo
2015-10-28 20:20:05 +01:00
Gael Guennebaud
5b6cff5b0e
fix typo
2015-10-28 20:18:00 +01:00
Gael Guennebaud
6759a21e49
CUDA support: define more accurate min/max values for device::numeric_limits of float and double using values from cfloat header
2015-10-28 16:49:15 +01:00
Gael Guennebaud
28ddb5158d
Enable std::isfinite/nan/inf on MSVC 2013 and newer and clang. Fix isinf for gcc4.4 and older msvc with fast-math.
2015-10-28 16:27:20 +01:00
Ilya Popov
1a842c0dc4
Fix typo in TutorialSparse: laplace equation contains gradient symbol (\nabla) instead of laplacian (\Delta).
2015-10-28 09:52:55 +00:00
Gael Guennebaud
8531304858
Simplify cost computations based on HugeCost being smaller than the unrolling limit
2015-10-28 13:39:02 +01:00
Gael Guennebaud
1f11dd6ced
Add a unit test for large chains of products
2015-10-28 12:53:13 +01:00
Gael Guennebaud
902c2db5a5
Extend vectorwiseop unit test with column/row vectors as input.
2015-10-28 11:59:20 +01:00
Gael Guennebaud
77ff3386b7
Refactoring of the cost model:
...
- Dynamic is now an invalid value
- introduce a HugeCost constant to be used for runtime-cost values or arbitrarily huge cost
- add sanity checks for cost values: must be >=0 and not too large
This change provides several benefits:
- it fixes shortcomings in some cost computations where the Dynamic case was not properly handled.
- it simplifies cost computation logic, and should avoid future similar shortcomings.
- it makes it possible to distinguish between different levels of dynamic/huge/infinite cost
- it should enable further simplifications in the computation of costs (save compilation time)
2015-10-28 11:42:14 +01:00
Gael Guennebaud
827d8a9bad
Fix false negative in redux test
2015-10-27 21:37:03 +01:00
Gael Guennebaud
d4cf436cb1
Enable mpreal unit test for C++11 compiler only
2015-10-27 17:35:54 +01:00
Gael Guennebaud
946f8850e8
bug #1008 : add a unit test for fast-math mode and isinf/isnan/isfinite/etc. functions.
2015-10-27 16:44:45 +01:00
Gael Guennebaud
e3031d7bfa
bug #1008 : improve handling of fast-math mode for older gcc versions.
2015-10-27 16:43:23 +01:00
Gael Guennebaud
2475a1de48
bug #1008 : stabilize isfinite/isinf/isnan/hasNaN/allFinite functions for fast-math mode.
2015-10-27 15:39:50 +01:00
Gael Guennebaud
699c33e76a
merge
2015-10-27 11:10:11 +01:00
Gael Guennebaud
8c66b6bc61
Simplify evaluator::Flags for Map<>
2015-10-27 11:06:42 +01:00
Gael Guennebaud
12f50a4697
Fix assign vectorization logic with respect to fixed outer-stride
2015-10-27 11:04:19 +01:00
Gael Guennebaud
c1e0b6dde3
merge
2015-10-27 11:02:03 +01:00
Gael Guennebaud
73f692d16b
Fix ambiguous instantiation
2015-10-27 11:01:37 +01:00
Gael Guennebaud
0fc8954282
Improve readability of EIGEN_DEBUG_ASSIGN mode.
2015-10-27 10:38:49 +01:00
Benoit Steiner
1c8312c811
Started to add support for tensors of rank 0
2015-10-26 14:29:26 -07:00
Benoit Steiner
1f4c98abb1
Fixed compilation warning
2015-10-26 12:42:55 -07:00
Benoit Steiner
9dc236bc83
Fixed compilation warning
2015-10-26 12:41:48 -07:00
Benoit Steiner
9f721384e0
Added support for empty dimensions
2015-10-26 11:21:27 -07:00
Benoit Steiner
ded4336988
Pulled latest updates from trunk
2015-10-26 10:48:29 -07:00
Benoit Steiner
a3e144727c
Fixed compilation warning
2015-10-26 10:48:11 -07:00
Benoit Steiner
f8e7b9590d
Fixed compilation error triggered by gcc 4.7
2015-10-26 10:47:37 -07:00
Gael Guennebaud
e6f8c5c325
Add support to directly evaluate the product of two sparse matrices within a dense matrix.
2015-10-26 18:20:00 +01:00
Gael Guennebaud
a5324a131f
bug #1092 : fix iterative solver ctors for expressions as input
2015-10-26 16:16:24 +01:00
Gael Guennebaud
f93654ae16
bug #1098 : fix regression introduced when generalizing some compute() methods in changeset 7031a851d4.
2015-10-26 16:00:25 +01:00
Gael Guennebaud
af2e25d482
Merged in infinitei/eigen (pull request PR-140)
...
bug #1097 Added ArpackSupport to cmake install target
2015-10-26 15:31:39 +01:00
Gael Guennebaud
4704bdc9c0
Make the IterativeLinearSolvers module compatible with MPL2-only mode
...
by defaulting to COLAMDOrdering and NaturalOrdering for ILUT and ILLT respectively.
2015-10-26 15:17:52 +01:00
Gael Guennebaud
47d44c2f37
Add missing licence header to some top header files
2015-10-26 11:46:05 +01:00
Gael Guennebaud
8a211bb1a9
bug #1088 : fix setIdentity for non-compressed sparse-matrix
2015-10-25 22:01:58 +01:00
Gael Guennebaud
ac6b2266b9
Fix SparseMatrix::insert/coeffRef for non-empty compressed matrix
2015-10-25 22:00:38 +01:00
Abhijit Kundu
0ed41bdefa
ArpackSupport was missing here also.
2015-10-16 18:21:02 -07:00
Abhijit Kundu
1127ca8586
Added ArpackSupport to cmake install target
2015-10-16 16:41:33 -07:00
Gael Guennebaud
e99279f444
merge
2015-10-16 22:12:54 +02:00
Benoit Steiner
de1e9f29f4
Updated the custom indexing code: we can now use any container that provides the [] operator to index a tensor. Added unit tests to validate the use of std::map and a few more types as valid custom index containers
2015-10-15 14:58:49 -07:00
Benoit Steiner
6585efc553
Tightened the definition of isOfNormalIndex to take into account integer types in addition to arrays of indices
...
Only compile the custom index code when EIGEN_HAS_SFINAE is defined. For the time being, EIGEN_HAS_SFINAE is a synonym for EIGEN_HAS_VARIADIC_TEMPLATES, but this might evolve in the future.
Moved some code around.
2015-10-14 09:31:37 -07:00
Gael Guennebaud
c0adf6e38d
Fix perm*sparse return type and nesting, and add several sanity checks for perm*sparse
2015-10-14 10:16:48 +02:00
Gael Guennebaud
527fc4bc86
Fix ambiguous instantiation issues of product_evaluator.
2015-10-14 10:14:47 +02:00
Gael Guennebaud
2598f3987e
Add a plain_object_eval<> helper returning a plain object type based on evaluator's Flags,
...
and base nested_eval on it.
2015-10-14 10:12:58 +02:00
Gael Guennebaud
b4c79ee1d3
Update custom setFromTriplets API to allow passing a functor object, and add a collapseDuplicates method to clean up the API. Also add respective unit test
2015-10-13 11:30:41 +02:00
Gabriel Nützi
fc7478c04d
name changes 2
...
user: Gabriel Nützi <gnuetzi@gmx.ch>
branch 'default'
changed unsupported/Eigen/CXX11/src/Tensor/Tensor.h
changed unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h
2015-10-09 19:10:08 +02:00
Gabriel Nützi
7b34834f64
name changes
...
user: Gabriel Nützi <gnuetzi@gmx.ch>
branch 'default'
changed unsupported/Eigen/CXX11/src/Tensor/Tensor.h
2015-10-09 19:08:14 +02:00
Gabriel Nützi
6edae2d30d
added CustomIndex capability only to Tensor and not yet to TensorBase.
...
using SFINAE and is_base_of to select correct template which converts to array<Index,NumIndices>
user: Gabriel Nützi <gnuetzi@gmx.ch>
branch 'default'
added unsupported/Eigen/CXX11/src/Tensor/TensorMetaMacros.h
added unsupported/test/cxx11_tensor_customIndex.cpp
changed unsupported/Eigen/CXX11/Tensor
changed unsupported/Eigen/CXX11/src/Tensor/Tensor.h
changed unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h
changed unsupported/test/CMakeLists.txt
2015-10-09 18:52:48 +02:00
Calixte Denizet
b9d81c9150
Add a functor to setFromTriplets to handle duplicated entries
2015-10-06 13:29:41 +02:00
Gael Guennebaud
9acfc7c4f3
remove reference to internal method
2015-10-13 10:55:58 +02:00
Gael Guennebaud
a44d91a0b2
extend unit test for SparseMatrix::prune
2015-10-13 10:53:38 +02:00
Gael Guennebaud
ac22b66f1c
Fix macro issues
2015-10-13 10:18:09 +02:00
Gael Guennebaud
3e32f6b554
update mpreal.h
2015-10-13 09:58:54 +02:00
Gael Guennebaud
ea9749fd6c
Fix packetmath unit test for pdiv not being always defined
2015-10-13 09:53:46 +02:00
Gael Guennebaud
252e89b11b
bug #1086 : replace deprecated UF_long by SuiteSparse_long
2015-10-12 16:20:12 +02:00
Gael Guennebaud
6407e367ee
Add missing explicit keyword, and fix regression in DynamicSparseMatrix
2015-10-12 09:49:05 +02:00
Gael Guennebaud
63e29e7765
Workaround ICC issue with first_aligned
2015-10-11 22:47:28 +02:00
Gael Guennebaud
6163db814c
bug #1085 : workaround gcc default ABI issue
2015-10-10 22:38:55 +02:00
Gael Guennebaud
6536b4bad7
Implement temporary-free path for "D.noalias() ?= C + A*B". (I thought it was already implemented)
2015-10-09 15:28:09 +02:00
Gael Guennebaud
a4cc4c1e5e
Clarify note in nested_eval for evaluator creating temporaries.
2015-10-09 14:57:51 +02:00
Gael Guennebaud
ae38910693
The evaluator of Solve was missing the EvalBeforeNestingBit flag.
2015-10-09 14:57:19 +02:00
Gael Guennebaud
515ecddb97
Add unit test for nested_eval
2015-10-09 14:29:46 +02:00
Gael Guennebaud
78b8c344b5
Add unit test for CoeffReadCost
2015-10-09 14:28:48 +02:00
Gael Guennebaud
321cb56bf6
Add unit test to check nesting of complex expressions in redux()
2015-10-09 13:29:39 +02:00
Gael Guennebaud
2632b3446c
Improve documentation of TriangularView.
2015-10-09 12:10:58 +02:00
Gael Guennebaud
1429daf850
Add lvalue check for TriangularView::swap, and fix deprecated TriangularView::lazyAssign
2015-10-09 12:10:48 +02:00
Gael Guennebaud
72bd05b6d8
Cleaning in Redux.h
2015-10-09 12:07:42 +02:00
Gael Guennebaud
2c516ba38f
Remove auto references and referenced-by relation in doc.
2015-10-09 12:07:06 +02:00
Gael Guennebaud
041e038fef
Remove dead code in selfadjoint_matrix_vector_product
2015-10-09 10:42:14 +02:00
Gael Guennebaud
c2d68b984f
Optimize a bit complex selfadjoint * vector product.
2015-10-09 10:34:58 +02:00
Gael Guennebaud
1932a24760
Simplify EIGEN_DENSE_PUBLIC_INTERFACE
2015-10-09 10:21:54 +02:00
Gael Guennebaud
186ec1437c
Cleanup EIGEN_SPARSE_PUBLIC_INTERFACE, it is now a simple alias to EIGEN_GENERIC_PUBLIC_INTERFACE
2015-10-08 22:06:49 +02:00
Gael Guennebaud
c9718514f5
Fix nesting sub-expression in outer-products
2015-10-08 21:41:53 +02:00
Gael Guennebaud
4140ee039d
Fix propagation of AssumeAliasing for expressions such as: "scalar * (A*B)"
2015-10-08 21:41:27 +02:00
Gael Guennebaud
d866279364
Clean up the implementation of inverse permutations a bit
2015-10-08 18:36:39 +02:00
Gael Guennebaud
8d00a953af
Fix a nesting issue in some matrix-vector cases.
2015-10-08 17:36:57 +02:00
Gael Guennebaud
dd934ad057
Re-enable vectorization of LinSpaced, plus some cleaning
2015-10-08 17:27:01 +02:00
Gael Guennebaud
f6f6f50272
Clean evaluator<EvalToTemp>
2015-10-08 16:34:33 +02:00
Gael Guennebaud
67bfba07fd
Fix some CUDA issues
2015-10-08 16:30:28 +02:00
Gael Guennebaud
412c049ba4
Fix a warning
2015-10-08 16:27:54 +02:00
Gael Guennebaud
aa6b1aebf3
Properly implement PartialReduxExpr on top of evaluators, and fix multiple evaluation of nested expression
2015-10-08 15:57:05 +02:00
Gael Guennebaud
5cc7251188
Some cleaning in evaluators
2015-10-08 15:22:04 +02:00
Gael Guennebaud
e30bc89190
Add missing include of std vector
2015-10-08 15:20:50 +02:00
Gael Guennebaud
5d7ebfb275
Update sparse solver list to make it more complete
2015-10-08 11:33:17 +02:00
Gael Guennebaud
1b148d9e2e
Move IncompleteCholesky to official modules
2015-10-08 11:32:46 +02:00
Gael Guennebaud
632e7705b1
Improve doc of IncompleteCholesky
2015-10-08 10:54:36 +02:00
Gael Guennebaud
64242b8bf3
Doc: add link to doc of sparse solver concept
2015-10-08 10:50:39 +02:00
Gael Guennebaud
131db3c552
Fix return by value versus ref typo in IncompleteCholesky
2015-10-07 16:37:46 +02:00
Gael Guennebaud
13294b5152
Unify gemm and lazy_gemm benchmarks
2015-10-07 16:06:48 +02:00
Gael Guennebaud
247259f805
Add a performance regression benchmark for lazyProduct
2015-10-07 15:51:06 +02:00
Gael Guennebaud
c6eb17cbe9
Add helper routines to help bypass some compiler optimizations when benchmarking
2015-10-07 15:50:42 +02:00
Gael Guennebaud
f047ecc36a
_mm_hadd_epi32 is for SSSE3 only (and not SSE3)
2015-10-07 15:48:35 +02:00
Gael Guennebaud
aba1eda71e
Help clang to inline some functions, thus fixing some regressions
2015-10-07 15:44:12 +02:00
Gael Guennebaud
41cc1f9033
Remove debugging prod() and lazyprod() function, plus some cleaning in noalias assignment
2015-10-07 15:41:22 +02:00
Gael Guennebaud
ca0dd7ae26
Fix implicit cast in unit test
2015-10-07 15:36:12 +02:00
Gael Guennebaud
8bb51a87f7
Re-enable some invalid scalar type conversion checks by disabling explicit vectorization
2015-10-06 17:24:01 +02:00
Gael Guennebaud
27a94299aa
Add sparse vector to Ref<SparseMatrix> conversion unit tests, and improve output of sparse_ref unit test in case of failure.
2015-10-06 17:23:11 +02:00
Gael Guennebaud
2e0ece7b66
Fix wrong casting syntax
2015-10-06 17:22:12 +02:00
Gael Guennebaud
69a7897e72
Fix storage index type in empty permutations
2015-10-06 17:21:24 +02:00
Gael Guennebaud
26cde4db3c
Define Permutation*<>::Scalar to 'void', re-enable scalar type compatibility check in assignment while relaxing this test for void types.
2015-10-06 17:18:06 +02:00
Gael Guennebaud
fb51bab272
Some cleaning
2015-10-06 17:14:56 +02:00
Gael Guennebaud
2c676ddb40
Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks)
2015-10-06 15:43:27 +02:00
Gael Guennebaud
2d287a4898
Fix Ref<SparseMatrix> for Transpose<SparseVector>
2015-10-06 15:09:04 +02:00
Gael Guennebaud
752a0e5339
bug #1076 : fix scaling in IncompleteCholesky, improve doc, add read-only access to the different factors, remove debugging code.
2015-10-06 13:25:45 +02:00
Gael Guennebaud
f25bdc707f
Optimise assignment into a Block<SparseMatrix> by using Ref and avoiding useless updates in non-compressed mode. This makes row-by-row filling of a row-major sparse matrix very efficient.
2015-10-06 11:59:08 +02:00
Gael Guennebaud
945b80c83e
Optimize Ref<SparseMatrix> by removing useless default initialisation of SparseMapBase and SparseMatrix
2015-10-06 11:57:03 +02:00
Gael Guennebaud
9a070638de
Enable viewing a SparseVector as a Ref<SparseMatrix>
2015-10-06 11:53:19 +02:00
Gael Guennebaud
1b43860bc1
Make SparseVector derive from SparseCompressedBase, thus improving compatibility between sparse vectors and matrices
2015-10-06 11:41:03 +02:00
Gael Guennebaud
6100d1ae64
Improve counting of sparse temporaries
2015-10-06 11:32:02 +02:00
Gael Guennebaud
1879917d35
Propagate cmake generator
2015-10-05 16:18:22 +02:00
Gael Guennebaud
deb261f64b
Make abs2 compatible with custom complex types
2015-10-02 10:33:25 +02:00
nnyby
ccc7b0ffea
[doc] grammar fix: "linearly space" -> "linearly spaced"
2015-10-01 23:43:06 +00:00
Gael Guennebaud
75a60d3ac0
bug #1075 : fix AlignedBox::sample for runtime dimension
2015-09-30 11:44:02 +02:00
Gael Guennebaud
9136b95219
Merged in doug_kwan/eigen (pull request PR-137)
...
Specified signedness of char type in test
2015-09-30 11:37:04 +02:00
Gael Guennebaud
781e8c38bd
merge
2015-09-29 11:12:43 +02:00
Gael Guennebaud
b2b8c1d41e
Fix performance regression in sparse * dense product where "sparse" is an expression
2015-09-29 11:11:40 +02:00
Doug Kwan
239c9946cd
Specified signedness of char type in test so that test passes
...
consistently on different targets.
2015-09-28 14:26:10 -07:00
Benoit Steiner
d46bacb6bb
Call numext::mini instead of std::min in several places.
2015-09-28 10:40:41 -07:00
Gael Guennebaud
ceafed519f
Add support for permutation * homogeneous
2015-09-28 16:56:11 +02:00
Gael Guennebaud
ddb5650530
bug #1070 : propagate last three Matrix template arguments for NumTraits<AutoDiffScalar<>>::Real
2015-09-28 15:07:03 +02:00
Gael Guennebaud
02e940fc9f
bug #1071 : improve doc on lpNorm and add example for some operator norms
2015-09-28 11:55:36 +02:00
Gael Guennebaud
8c1ee3629f
Add support for row/col-wise lpNorm()
2015-09-28 11:36:00 +02:00
Gael Guennebaud
75861f6650
bug #1069 : fix AVX support on MSVC (use of non portable C-style cast)
2015-09-28 10:08:26 +02:00
Tal Hadad
5e0a178df2
Initial fork of unsupported module EulerAngles.
2015-09-27 16:51:24 +03:00
Gael Guennebaud
d16797cfc0
Fix bug #1067 : naming conflict
2015-09-19 21:44:14 +02:00
Benoit Steiner
13aee4463e
Cleaned up a test
2015-09-18 09:42:08 -07:00
Benoit Steiner
58a6453d48
Fixed compilation warning
2015-09-17 10:18:49 -07:00
Benoit Steiner
31afdcb4c2
Fix return type for TensorEvaluator<TensorSlicingOp>::data
2015-09-17 09:40:21 -07:00
Gael Guennebaud
9d993c709b
Fix typo in VectorwiseOp::any()
2015-09-16 22:31:19 +02:00
Christoph Hertzberg
43ba07d4d7
Merged in daalpa/eigen/daalpa/removed-documentation-that-did-not-match-1442148941751 (pull request PR-136)
...
Removed documentation that did not match the member function DenseBase::outerSize()
2015-09-13 16:35:32 +02:00
daalpa
fab96f2ff3
Removed documentation that did not match the member function DenseBase::outerSize()
2015-09-13 12:55:57 +00:00
Christoph Hertzberg
d6f762d955
Fixed cuda code: EIGEN_DEVICE_FUNC must come after template<...>
2015-09-10 11:46:27 +02:00
Gael Guennebaud
680d318352
Add unit tests for bug #981 : valid and invalid usage of ternary operator
2015-09-09 11:38:25 +02:00
Benoit Steiner
84e0c27b61
Fixed a compilation warning
2015-09-08 17:05:35 -07:00
Benoit Steiner
05f2f94f2b
Fixed a compilation warning
2015-09-08 17:04:03 -07:00
Benoit Steiner
98f8f0db9a
Added support for predux_mul for CUDA devices
2015-09-08 15:37:25 -07:00
Christoph Hertzberg
e3f69eb60d
Fixed minor regression caused by 7031a851d4
2015-09-08 10:53:10 +02:00
Gael Guennebaud
5bf971e5b8
MKL is now free of charge for open source
2015-09-07 11:23:55 +02:00
Gael Guennebaud
73a86cfcd3
Add EIGEN_QUATERNION_PLUGIN
2015-09-07 11:12:30 +02:00
Gael Guennebaud
7fad309631
Fix link and code formatting
2015-09-07 11:08:41 +02:00
Gael Guennebaud
7031a851d4
Generalize matrix ctor and compute() method of dense decomposition to 1) limit temporaries, 2) forward expressions to nested decompositions, 3) fix ambiguous ctor instantiation for square decomposition
2015-09-07 10:42:04 +02:00
Gael Guennebaud
1702fcb72e
Added tag 3.3-alpha1 for changeset f9303cc7c5
2015-09-04 17:27:20 +02:00
Sergiu Dotenco
85afb61417
use explicit Scalar types for AngleAxis initialization
...
(grafted from 89a222ce50)
2015-08-28 22:20:15 +02:00
Benoit Steiner
56983f6d43
Fixed compilation warning
2015-10-23 12:03:42 -07:00
Benoit Steiner
57857775b4
Added support for arrays of size 0
2015-10-23 10:20:51 -07:00
Benoit Steiner
c40c2ceb27
Reordered the code of fft constructor to prevent compilation warnings
2015-10-23 09:38:19 -07:00
Benoit Steiner
a586fdaa91
Reworked the tensor contraction mapper code to make it compile on Android
2015-10-23 09:33:41 -07:00
Benoit Steiner
29c3b7513e
Pulled latest updates from trunk
2015-10-23 09:16:14 -07:00
Benoit Steiner
9ea39ce13c
Refined the #ifdef __CUDACC__ guard to ensure that trying to compile GPU code with a non-CUDA compiler results in a linking error instead of bogus code.
2015-10-23 09:15:34 -07:00
Gael Guennebaud
c244081490
disable usage of INTMAX_T
2015-10-23 14:48:54 +02:00
Gael Guennebaud
0905ed5390
remove useless cstdint header
2015-10-23 14:41:25 +02:00
Gael Guennebaud
54b23cce16
Switch to MPL2
2015-10-23 10:36:33 +02:00
Benoit Steiner
ac99b49249
Added missing glue logic
2015-10-22 16:54:21 -07:00
Benoit Steiner
2dd9446613
Added mapping between a specific device and the corresponding packet type
2015-10-22 16:53:36 -07:00
Benoit Steiner
2495e2479f
Added tests for the fft code
2015-10-22 16:52:55 -07:00
Benoit Steiner
a147c62998
Added support for fourier transforms (code courtesy of thucjw@gmail.com)
2015-10-22 16:51:30 -07:00
Gael Guennebaud
71b473aab1
Remove invalid typename keyword
2015-10-22 21:58:18 +02:00
Gael Guennebaud
ebc1af1683
merge
2015-10-22 21:47:47 +02:00
Benoit Steiner
825146c8fd
Fixed incorrect expected value
2015-10-22 11:56:00 -07:00
Benoit Steiner
4cf7da63de
Added a constructor to simplify the construction of tensormap from tensor
2015-10-22 11:48:02 -07:00
Gael Guennebaud
0eb46508e2
Avoid any openmp calls if multi-threading is explicitly disabled at runtime.
2015-10-22 16:30:28 +02:00
Gael Guennebaud
6df8e99470
bug #1089 : add a warning when using a MatrixBase method which is implemented within another module by declaring them inline.
2015-10-22 16:10:28 +02:00
Gael Guennebaud
e78bc111f1
bug #1090 : fix a shortcoming in redux logic for which slice-vectorization plus unrolling might happen.
2015-10-21 20:58:33 +02:00
Benoit Steiner
b178cc3479
Added some syntactic sugar to make it simpler to compare a tensor to a scalar.
2015-10-21 11:28:28 -07:00
Gael Guennebaud
5ca2e25967
merge
2015-10-21 13:49:13 +02:00
Gael Guennebaud
8afd0ce955
add FIXME
2015-10-21 13:48:15 +02:00
Gael Guennebaud
8961265889
bug #1064 : add support for Ref<SparseVector>
2015-10-21 09:47:43 +02:00
Benoit Steiner
0af63493fd
Disable SFINAE for versions of gcc older than 4.8
2015-10-20 11:53:30 -07:00
Benoit Steiner
73b8e719ae
Removed bogus assertion
2015-10-20 11:42:34 -07:00
Benoit Steiner
eaf4b98180
Added support for boolean reductions (i.e. 'and' and 'or' reductions)
2015-10-20 11:41:22 -07:00
Benoit Steiner
f5c1587e4e
Fixed a bug in the tensor conversion op
2015-10-20 11:37:44 -07:00
Gael Guennebaud
fe630c9873
Improve numerical accuracy in LLT and triangular solve by using true scalar divisions (instead of x * (1/y))
2015-10-18 22:15:01 +02:00
yoco
15f273b63c
fix reshape flag and test case
2014-02-10 22:49:13 +08:00
yoco
b64a09acc1
fix reshape's Max[Row/Col]AtCompileTime
2014-02-04 05:54:50 +08:00
yoco
f8ad87f226
Reshape always non-directly-access
2014-02-04 05:19:56 +08:00
yoco
515bbf8bb2
Improve reshape test case
...
- simplify test code
- add reshape chain
2014-02-04 02:50:23 +08:00
yoco
009047db27
Fix Reshape traits flag calculation bug
2014-02-04 02:21:41 +08:00
yoco
2b89080903
Remove reshape InnerPanel, add test, fix bug
2014-01-20 01:43:28 +08:00
yoco
03723abda0
Remove useless reshape row/col ctor
2014-01-20 00:22:16 +08:00
yoco
342c8e5321
Fix Reshape DirectAccessBit bug
2014-01-20 00:15:19 +08:00
yoco
24e1c0f2a1
add reshape test for const and static-size matrix
2014-01-18 23:27:53 +08:00
yoco
150796337a
Add unit-test for reshape
...
- add unittest for dynamic & fixed-size reshape
- fix fixed-size reshape bugs found while testing
2014-01-18 16:10:46 +08:00
yoco
497a7b0ce1
remove c++11, make c++03 compatible
2014-01-18 13:20:14 +08:00
yoco
9c832fad60
add reshape() snippets
2014-01-18 04:53:46 +08:00
yoco
1e1d0c15b5
add example code for Reshape class
2014-01-18 04:43:29 +08:00
yoco
fe2ad0647a
reshape now supported
...
- add member function to plugin
- add forward declaration
- add documentation
- add include
2014-01-18 04:21:20 +08:00
yoco
7bd58ad0b6
add Eigen/src/Core/Reshape.h
2014-01-18 04:16:44 +08:00